Reporium

wafer-ai/gpu-perf-engineering-resources

gpu-perf-engineering-resources

A curriculum for learning GPU performance engineering, from scratch to what the frontier AI labs do.

Upstream: wafer-ai/gpu-perf-engineering-resources

Builder

wafer-ai • individual

Stars

707

Using upstream star count

Forks

79

Using upstream fork count

Open Issues

0

Activity Score

0/100

0 commits in 30d

Created

Jan 12, 2026

Project creation date

README Summary

[Cover image: Performance Engineering for AI Infra]

AI Dev Skills

Unmapped

CUDA Programming, Deep Learning Systems Optimization, Distributed Computing, GPU Architecture and Programming, Hardware-Software Co-design, High-Performance Computing, Kernel Optimization, Memory Management Optimization, Parallel Computing, Performance Profiling, Tensor Operations Optimization

Tags

CUDA Programming, Deep Learning Systems Optimization, Distributed Computing, GPU Architecture and Programming, Hardware-Software Co-design, High-Performance Computing, Kernel Optimization, Memory Management Optimization, Parallel Computing, Performance Profiling, Tensor Operations Optimization, AI Agents, Batching, Benchmarking, Caching, Course, Data Science, DeepSeek, Evals, Forked, GPU / CUDA, Game Dev, Google AI, HuggingFace, KV Cache, LLM Serving, Large Language Models, Model Optimization, OpenAI, PyTorch, Python, Quantization, Research / Papers, SGLang, Speculative Decoding, TensorRT, Transformers, Tutorial, vLLM

Taxonomy

AI Trends

AI Infrastructure Optimization, Hardware-Efficient AI, Large Scale Model Training, GPU Computing Acceleration

Category

Inference & Serving, Foundation Models, AI Agents, Model Training, Evals & Benchmarking, Cloud & Platforms, Learning Resources, Industry: Gaming, Data Science & Analytics

Deployment Context

GPU Clusters, Cloud GPU Instances, On-premise GPU Systems, Data Center Infrastructure

Industries

AI Research, Cloud Computing, Semiconductor, High-Performance Computing

Skill Areas

GPU Architecture and Programming, CUDA Programming, Memory Management Optimization, Kernel Optimization, Parallel Computing, Performance Profiling, Hardware-Software Co-design, Deep Learning Systems Optimization, Tensor Operations Optimization, Distributed Computing, High-Performance Computing

Tag

AI Agents, Batching, Benchmarking, Caching, Course, Data Science, DeepSeek, Evals, Forked, GPU / CUDA, Game Dev, Google AI, HuggingFace, KV Cache, LLM Serving, Large Language Models, Model Optimization, OpenAI, PyTorch, Python, Quantization, Research / Papers, SGLang, Speculative Decoding, TensorRT, Transformers, Tutorial, vLLM

Use Cases

GPU Performance Optimization Training, AI Infrastructure Engineering Education, Deep Learning Systems Optimization Learning, CUDA Programming Skill Development, Performance Engineering Curriculum

Recent Activity

Updated 2 months ago

7 Days: 0
30 Days: 0
90 Days: 0

Merge pull request #3 from jmaczan/patch-1

emilio andere • Mar 2, 2026

bde36bd

Update PMPP reference to 5th edition

Jędrzej Maczan • Mar 2, 2026

f3c0f88

Quality

Quality: medium
Maturity: beta

Categories

Inference & Serving (Primary), Evals & Benchmarking, Cloud & Platforms, Learning Resources, Industry: Gaming, Data Science & Analytics, Foundation Models, AI Agents, Model Training, Search & Knowledge, Other AI / ML

PM Skills

Cost & Efficiency, Data & Evaluation, AI-Native Architecture

Languages

No language breakdown recorded.

Timeline

Project created: Jan 12, 2026
Forked: Feb 24, 2026
Your last push: 2 months ago
Upstream last push: 1 month ago
Tracked since: Mar 17, 2026

Similar Repos

pgvector cosine similarity · $0