GVProf: A Value Profiler for GPU-based Clusters
Updated Mar 24, 2024 - Python
KeSSie: Huge-context semantic recall for Large Language Models
The GPU Optimizer for ML Models enhances GPU performance for machine-learning workloads. It offers advanced scheduling, real-time monitoring, and efficient resource management through a user-friendly web interface and a robust API, integrating big-data technologies for seamless data processing and model optimization. @NVIDIA
Physics-based computation at scale — Hamiltonian dynamics, spectral theory, and statistical mechanics powering optimization, drug discovery, genomics, molecular proof, and agentic commerce.
Executive FinOps dashboard and automated governance engine using FOCUS 1.3 standards for AWS, Azure, and Snowflake.
Text-to-video generation application that converts natural-language (English) prompts into short animated videos using diffusion models and AnimateDiff, with GPU-aware optimization and an interactive Gradio UI that can be run on Google Colab (T4 GPU).
🤖 Ollama Consumer - A Python-based interactive chat interface for Ollama models with advanced model management, comprehensive benchmarking, vision support, and automatic error recovery. Features dynamic model switching, GPU optimization, and intelligent service monitoring for seamless AI model interactions.
Quantitative dataset of 119 neural architectures (2017-2025) scored on hardware compatibility and ecosystem friction. Validates the Transformer Attractor thesis.
Optimizing PyTorch Model Training by Wrapping Memory Mapped Tensors on Nvidia GPUs with TensorDict.
AI Infrastructure Senior Engineer Learning Track - Advanced ML infrastructure and technical leadership
Hybrid AI routing: local Ollama + cloud GitHub Copilot
Optimizing PyTorch Model Training by Wrapping Memory Mapped Tensors on an Nvidia GPU with TensorDict.
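The two memory-mapped-tensor projects above build on TensorDict, but the core idea they describe can be sketched without that dependency: keep the training data on disk as a memory map so the OS pages in only the slices each batch touches, and RAM usage stays flat regardless of dataset size. This is a minimal, hypothetical illustration using `numpy.memmap` (all names here are made up for the example); TensorDict wraps the same pattern for PyTorch tensors.

```python
import os
import tempfile
import numpy as np

def make_memmap_dataset(path, n_samples=1024, n_features=16):
    """Write a dataset to disk once, then reopen it as a read-only memory map."""
    data = np.random.rand(n_samples, n_features).astype(np.float32)
    mm = np.memmap(path, dtype=np.float32, mode="w+", shape=data.shape)
    mm[:] = data   # copy into the backing file
    mm.flush()     # make sure everything is on disk
    return np.memmap(path, dtype=np.float32, mode="r", shape=data.shape)

def iter_batches(mm, batch_size=128):
    """Yield contiguous batch views; only the touched pages are read from disk."""
    for start in range(0, mm.shape[0], batch_size):
        yield mm[start:start + batch_size]

path = os.path.join(tempfile.mkdtemp(), "train.dat")
ds = make_memmap_dataset(path)
batches = list(iter_batches(ds))
```

In a PyTorch training loop, each batch view would be converted to a tensor and copied to the GPU (ideally with a non-blocking transfer from pinned memory), which is the optimization these repos automate.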
Profile-first ML systems project optimizing a multi-camera end-to-end driving model for hardware efficiency using PyTorch, CUDA streams, NVTX instrumentation, and Nsight Systems.
LM Multi-Bin Dynamic Scheduler Simulator - Implementation combining Multi-Bin batching with SLA-constrained dynamic batching
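The multi-bin scheduler above combines two ideas: bucketing requests by predicted output length so short and long generations are not padded together in one batch, and dispatching a partial batch early when its oldest request would otherwise miss an SLA waiting budget. A toy sketch of that combination, with all class names, bin edges, and thresholds being illustrative assumptions rather than the repo's actual interface:

```python
from dataclasses import dataclass

@dataclass
class Request:
    rid: int
    predicted_len: int   # predicted output length in tokens
    arrival: float       # arrival time in seconds

class MultiBinScheduler:
    """Toy multi-bin batcher with an SLA escape hatch (illustrative only)."""

    def __init__(self, bin_edges=(64, 256, 1024), max_batch=4, sla_wait=0.5):
        self.bin_edges = bin_edges
        self.max_batch = max_batch
        self.sla_wait = sla_wait          # max seconds a request may wait
        self.bins = [[] for _ in range(len(bin_edges) + 1)]

    def _bin_index(self, predicted_len):
        # Place a request in the first bin whose edge covers its length.
        for i, edge in enumerate(self.bin_edges):
            if predicted_len <= edge:
                return i
        return len(self.bin_edges)        # overflow bin for very long requests

    def submit(self, req):
        self.bins[self._bin_index(req.predicted_len)].append(req)

    def dispatch(self, now):
        """Return batches that are full or whose oldest request hit the SLA."""
        batches = []
        for i, b in enumerate(self.bins):
            if b and (len(b) >= self.max_batch
                      or now - b[0].arrival >= self.sla_wait):
                batches.append(b[:self.max_batch])
                self.bins[i] = b[self.max_batch:]
        return batches

sched = MultiBinScheduler()
sched.submit(Request(0, predicted_len=50, arrival=0.0))
sched.submit(Request(1, predicted_len=900, arrival=0.0))
sched.submit(Request(2, predicted_len=60, arrival=0.1))
early = sched.dispatch(now=0.2)  # no bin is full, no SLA expired yet
late = sched.dispatch(now=0.6)   # SLA forces both occupied bins out
```

A real simulator would also model decode-time token budgets per batch; this sketch only shows the bin-plus-deadline control flow.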
Drop-in small-matrix acceleration for PyTorch on edge devices
High-performance CUDA implementation of LayerNorm for PyTorch achieving 1.46x speedup through kernel fusion. Optimized for large language models (4K-8K hidden dims) with vectorized memory access, warp-level primitives, and mixed precision support. Drop-in replacement for nn.LayerNorm with 25% memory reduction.
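To make the LayerNorm blurb above concrete, here is a NumPy reference of the computation such a fused kernel performs. The fusion claim means the per-row mean/variance reduction and the normalize-scale-shift step run in one CUDA kernel instead of several, avoiding intermediate global-memory traffic; this sketch shows only the math being fused, not the repo's actual kernel.

```python
import numpy as np

def layer_norm_ref(x, gamma, beta, eps=1e-5):
    """Reference LayerNorm over the last axis (what the fused kernel computes).

    In the fused CUDA version, each thread block reduces one row's mean and
    variance with warp-level primitives, then normalizes, scales by gamma,
    and shifts by beta within the same kernel launch.
    """
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps) * gamma + beta

# Hidden dim in the 4K-8K range the repo targets.
x = np.random.randn(2, 4096).astype(np.float32)
out = layer_norm_ref(x, np.ones(4096, np.float32), np.zeros(4096, np.float32))
```

With unit gamma and zero beta, each output row has mean ~0 and variance ~1, which is the invariant a drop-in `nn.LayerNorm` replacement must preserve bit-for-bit within floating-point tolerance.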
🚀 AWS FinOps Container Optimization for AI Workloads: Reference implementation of FinOps best practices for optimizing ECS/EKS-based AI workloads on AWS. Achieves cost savings through spot instances, autoscaling, and intelligent resource management. 🎯 Key features: • Spot instance strategies for AI training/inference and cost visibility
GPU-Optimized AI for Geospatial Annotation and Visual Search: Accelerating geospatial intelligence through distillation, segmentation, and GPU optimization.
An advanced hybrid scheduling framework that leverages Reinforcement Learning and ML to dynamically optimize CPU/GPU task allocation in real-time.
Enterprise-grade financial framework for modeling $5.4M+ cloud contract ROI and risk sensitivity. Features automated break-even analysis, 3-year commitment simulation for AI/GPU infrastructure, and a CFO-ready stress-test matrix to prove profitability even at 40% utilization. Designed for high-stakes C-Suite decision support.