goabiaryan/awesome-gpu-engineering

Awesome GPU Engineering

A curated list of resources for mastering GPU engineering, from architecture and kernel programming to large-scale distributed systems and AI acceleration.


📘 Foundational Books

  • Programming Massively Parallel Processors: A Hands-on Approach, by David B. Kirk & Wen-mei W. Hwu
    The canonical introduction to CUDA, memory hierarchies, and parallel patterns. Amazon; notes: Abi's Concise Notes
  • CUDA by Example, by Jason Sanders & Edward Kandrot
    A practical introduction to CUDA for beginners. Amazon
  • The Ultra-Scale Playbook: Training LLMs on GPU Clusters - Hugging Face Web Version

💻 GPU Programming Frameworks

  • CUDA — NVIDIA’s proprietary GPU programming platform.
  • ROCm — AMD’s open compute stack.
  • OpenCL — Cross-platform parallel computing standard.
  • SYCL / oneAPI — SYCL is the Khronos C++ standard for heterogeneous compute; oneAPI is Intel’s toolkit built around it.
  • Vulkan Compute — Low-level GPU compute API.
  • Kompute — Higher level general purpose GPU compute framework built on Vulkan.
  • Metal Performance Shaders — Apple’s library of tuned compute and image-processing kernels built on the Metal API.

🧩 Optimization and Performance

  • NVIDIA Nsight Systems — System-wide GPU profiler.
  • Nsight Compute — Kernel-level performance analysis.
  • Occupancy Calculator — NVIDIA spreadsheet for kernel configuration.
  • CUTLASS — CUDA templates for linear algebra subroutines.
  • TensorRT — High-performance deep learning inference.
  • OpenAI Triton — Python DSL for writing high-performance GPU kernels.
  • Roofline Model — Analytical model to reason about compute/memory bottlenecks.
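
As a rough illustration of the Roofline Model entry above, the sketch below computes the attainable throughput of a kernel as the minimum of the compute roof and the memory roof. The device numbers (`PEAK_GFLOPS`, `PEAK_BW_GBS`) and the function names are hypothetical, chosen only for illustration; they do not describe any specific GPU or any tool in this list.

```python
def attainable_gflops(peak_gflops, peak_bw_gbs, arithmetic_intensity):
    """Roofline bound in GFLOP/s.

    arithmetic_intensity is FLOPs per byte moved to or from memory.
    The bound is the lower of the compute roof and the memory roof.
    """
    return min(peak_gflops, peak_bw_gbs * arithmetic_intensity)


def ridge_point(peak_gflops, peak_bw_gbs):
    """Arithmetic intensity (FLOP/byte) where the two roofs meet."""
    return peak_gflops / peak_bw_gbs


# Hypothetical device, not a real GPU specification.
PEAK_GFLOPS = 10_000.0   # peak compute throughput, GFLOP/s
PEAK_BW_GBS = 500.0      # peak memory bandwidth, GB/s

# SAXPY (y = a*x + y) in float32: 2 FLOPs per element,
# 12 bytes moved per element (read x, read y, write y).
saxpy_intensity = 2 / 12

bound = attainable_gflops(PEAK_GFLOPS, PEAK_BW_GBS, saxpy_intensity)
ridge = ridge_point(PEAK_GFLOPS, PEAK_BW_GBS)
memory_bound = saxpy_intensity < ridge

print(f"ridge point: {ridge} FLOP/byte")
print(f"SAXPY bound: {bound:.1f} GFLOP/s, memory-bound: {memory_bound}")
```

A kernel whose arithmetic intensity falls below the ridge point is limited by memory bandwidth under this model, so reducing the bytes it moves tends to matter more than adding compute.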

🧠 Architecture and Low-Level Design

⚙️ Systems and Multi-GPU Engineering

🧪 Tutorials and Courses

📄 Research Papers and Articles

🧰 Tools and Utilities

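  • nvprof, nvvp, Nsight Systems / Compute — NVIDIA profiling tools; nvprof and nvvp are legacy tools superseded by the Nsight tools.
  • cuda-memcheck, compute-sanitizer — Memory and correctness tools; compute-sanitizer is the successor to the deprecated cuda-memcheck.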
  • GPGPU-Sim, Accel-Sim — GPU simulation frameworks.
  • Perfetto, Nsight UI — Visual profilers for tracing GPU workloads.

Learning Tools

🧑‍🔬 GPU for AI & ML

  • PyTorch CUDA Extensions — Custom kernels for PyTorch.
  • JAX + XLA — Array programs traced in Python and JIT-compiled for GPU through XLA.
  • TensorFlow XLA Compiler — JIT (and optionally ahead-of-time) compilation of TensorFlow graphs.
  • FlashAttention, FlashConv — Kernel optimization techniques for transformers and long-convolution sequence models.
  • DeepSpeed, FSDP, Megatron-LM — Distributed training systems.

🧱 GPU Systems Design Topics For Interview Prep

  • FlashAttention and PagedAttention
  • Matmul Operations
  • GPU scheduling algorithms and runtime systems
  • Memory oversubscription and unified memory models
  • Resource allocation in GPU clusters
  • GPU virtualization
  • Kernel fusion and graph execution
  • Dataflow optimization
  • Persistent threads model

🧑‍💻 Contributors

Contributions welcome!
Please read the contribution guidelines before submitting a pull request.

🧾 License

CC BY 4.0 — feel free to share and adapt with attribution.

⭐ Acknowledgements

Inspired by:


“GPU engineering is not just about writing kernels. It’s about understanding how systems work.” — Model Craft
