Reporium

xlite-dev/LeetCUDA

leetcuda

📚LeetCUDA: Modern CUDA Learn Notes with PyTorch for Beginners🐑, 200+ CUDA Kernels, Tensor Cores, HGEMM, FA-2 MMA.🎉

View on GitHub ↗ · Upstream: xlite-dev/LeetCUDA ↗

Builder

xlite-dev • individual

Stars: 11,125 (upstream star count)
Forks: 1,124 (upstream fork count)
Open Issues: 0
Activity Score: 0/100 (0 commits in the last 30 days)
Created: Dec 17, 2022 (project creation date)

README Summary

📚 LeetCUDA: Modern CUDA Learn Notes with PyTorch for Beginners 🐑 …

AI Dev Skills

Unmapped

CUDA Programming, Deep Learning Optimization, FlashAttention Implementation, GPU Computing, Half-Precision Matrix Multiplication, Kernel Development, Memory Management, Parallel Computing, PyTorch CUDA Integration, Tensor Core Optimization

Tags

CUDA Programming, Deep Learning Optimization, FlashAttention Implementation, GPU Computing, Half-Precision Matrix Multiplication, Kernel Development, Memory Management, Parallel Computing, PyTorch CUDA Integration, Tensor Core Optimization, Benchmarking, C++, Caching, Computer Vision, DeepSeek, DeepSpeed, Docker, Embeddings, Evals, Forked, GPU / CUDA, KV Cache, LLM Serving, Large Language Models, Model Optimization, ONNX, OpenAI, PyTorch, Python, Qwen, TensorFlow, TensorRT, Transformers, vLLM

Taxonomy

AI Trends

GPU Optimization, Efficient Training, Hardware-Aware AI

Category

Inference & Serving, Foundation Models, RAG & Retrieval, Model Training, Evals & Benchmarking, Computer Vision, MLOps & Infrastructure

Deployment Context

Self-hosted

Industries

Education, Developer Tools

Modalities

Tensor

Skill Areas

CUDA Programming, GPU Computing, Tensor Core Optimization, Half-Precision Matrix Multiplication, FlashAttention Implementation, PyTorch CUDA Integration, Parallel Computing, Deep Learning Optimization, Memory Management, Kernel Development

Tag

C++, Caching, Computer Vision, DeepSeek, DeepSpeed, Docker, Embeddings, Evals, Forked, GPU / CUDA, KV Cache, LLM Serving, Large Language Models, Model Optimization, ONNX, OpenAI, PyTorch, Python, Qwen, TensorFlow, TensorRT, Transformers, vLLM, Benchmarking

Use Cases

CUDA Programming Education, GPU Kernel Development, Deep Learning Acceleration, Performance Optimization Learning, PyTorch CUDA Extension Development

Recent Activity

Updated 2 months ago

7 days: 0 · 30 days: 0 · 90 days: 2

Update README.md (#412) • DefTruth • Mar 19, 2026 • 3849b37
Update Cache-DiT release information in README • DefTruth • Mar 12, 2026 • 73379dc
Update README.md (#410) • DefTruth • Feb 25, 2026 • 13d66ab

Quality

Quality: medium
Maturity: research

Categories

Inference & Serving (primary), RAG & Retrieval, Evals & Benchmarking, MLOps & Infrastructure, Foundation Models, Model Training, Computer Vision, Edge & Mobile AI, Other AI / ML

PM Skills

Cost & Efficiency, Scale & Reliability, Data & Evaluation, Product Discovery

Languages

Cuda: 100.0%

Timeline

Project created: Dec 17, 2022
Forked: Mar 23, 2026
Your last push: 2 months ago
Upstream last push: 17 days ago
Tracked since: Mar 23, 2026
