karpathy/llm.c

llm.c

LLM training in simple, raw C/CUDA

Builder

karpathy

karpathy • individual

Stars

29,328

Using upstream star count

Forks

3,476

Using upstream fork count

Open Issues

Activity Score

0/100

0 commits in 30d

Created

Apr 8, 2024

Project creation date

README Summary

llm.c is a minimalist implementation of large language model training written in plain C and CUDA. It is designed to be both educational and performant, with no dependency on heavy frameworks such as PyTorch. The codebase is small and readable, and it demonstrates how to train GPT-style transformers from scratch in low-level code. By removing framework abstractions and exposing the core mathematical operations directly, the project aims to make LLM training more accessible and transparent.

AI Dev Skills

Unmapped

LLM Training Pipeline, CUDA Optimization, Transformer Architecture Implementation, Autoregressive Language Model Training, GPU Memory Optimization, Distributed Training, Tokenization and Preprocessing, Inference Optimization, Mixed Precision Training, Low-level Performance Tuning, Large Language Model Training, Low-level Deep Learning, Memory Management in Deep Learning, Numerical Optimization, Autograd Implementation, Batch Processing, Performance Profiling and Optimization

Taxonomy

AI Trends

Small Language Models, On-device AI, Efficient Model Training, Model Interpretability, Hardware-aware Machine Learning, Efficient Training, Educational AI

Deployment Context

Self-hosted, On-premise, Edge/Mobile

Modalities

Text

Skill Areas

LLM Training Pipeline, CUDA Optimization, Transformer Architecture Implementation, Autoregressive Language Model Training, GPU Memory Optimization, Distributed Training, Tokenization and Preprocessing, Inference Optimization, Mixed Precision Training, Low-level Performance Tuning, Large Language Model Training, Low-level Deep Learning, Memory Management in Deep Learning, Numerical Optimization, Autograd Implementation, Batch Processing, Performance Profiling and Optimization

Use Cases

Educational LLM training from scratch, Custom language model development, Performance-critical LLM inference, GPU kernel optimization research, Reproducible LLM training benchmarking, Minimal dependency production deployments, Educational understanding of LLM training mechanics, Systems-level optimization for neural network training, Performance benchmarking of training algorithms, Custom LLM training implementations, Research into efficient training methods

Recent Activity

Updated 9 months ago

Quality

Quality: high
Maturity: beta

PM Skills

Developer Platform

Languages

Cuda 100.0%

Timeline

Project created: Apr 8, 2024
Forked: Mar 28, 2026
Your last push: 9 months ago
Upstream last push: 9 months ago
Tracked since: Jun 26, 2025
