Reporium

NVIDIA/Megatron-LM

Megatron-LM

Ongoing research training transformer models at scale

Builder

NVIDIA • big-tech

Stars

16,507

Using upstream star count

Forks

4,013

Using upstream fork count

Open Issues

0

Activity Score

0/100

0 commits in 30d

Created

Mar 21, 2019

Project creation date

README Summary

Megatron-LM and Megatron Core

AI Dev Skills

Unmapped

Data Parallelism, Distributed Deep Learning, Distributed Systems, GPU Computing, Gradient Accumulation, High Performance Computing, Large Language Model Training, Memory Optimization, Mixed Precision Training, Model Parallelism, Pipeline Parallelism, Transformer Architecture

Tags

Data Parallelism, Distributed Deep Learning, Distributed Systems, GPU Computing, Gradient Accumulation, High Performance Computing, Large Language Model Training, Memory Optimization, Mixed Precision Training, Model Parallelism, Pipeline Parallelism, Transformer Architecture, Backend, Benchmarking, DeepSeek, Distillation, Evals, FSDP, Forked, HuggingFace, Large Language Models, Mistral, Model Optimization, Multimodal AI, OpenAI, Python, Quantization, RLHF, Reinforcement Learning, Research / Papers, Roadmap, TensorRT, Transformers, Tutorial

Taxonomy

AI Trends

Large Language Models, Foundation Models, Scaling Laws, Distributed AI Training

Category

Foundation Models, Model Training, Evals & Benchmarking, Inference & Serving, Dev Tools & Automation, Learning Resources

Deployment Context

Multi-GPU Clusters, High Performance Computing, Cloud Infrastructure

Modalities

Text

Skill Areas

Large Language Model Training, Distributed Deep Learning, Model Parallelism, Data Parallelism, Pipeline Parallelism, Transformer Architecture, GPU Computing, Mixed Precision Training, Gradient Accumulation, Distributed Systems, High Performance Computing, Memory Optimization

Tag

Backend, Benchmarking, DeepSeek, Distillation, Evals, FSDP, Forked, HuggingFace, Large Language Models, Mistral, Model Optimization, Multimodal AI, OpenAI, Python, Quantization, RLHF, Reinforcement Learning, Research / Papers, Roadmap, TensorRT, Transformers, Tutorial

Use Cases

Large Language Model Pre-training, Foundation Model Development, Multi-billion Parameter Model Training, Distributed Training Research, Scalable Transformer Training

Recent Activity

Updated 2 months ago

7 days: 0 • 30 days: 0 • 90 days: 20

[Megatron-FSDP] Support 'auto' argument which defaults to pre-MixedPrecisionPolicy be… (#3810)

Cory Ye • Mar 17, 2026

ff70b24

Use fp32 state dtypes for Mamba inference functional test (#3888)

Keshav Santhanam • Mar 16, 2026

72b10a8

Fix quantize.py script and support packed sequences in pretrain_gpt.py (#3564)

Asha Anoosheh • Mar 16, 2026

f89744b

Quality

Quality: high
Maturity: research

Categories

Evals & Benchmarking (primary), Inference & Serving, Dev Tools & Automation, Learning Resources, Foundation Models, Model Training, Multimodal AI, Search & Knowledge, Other AI / ML

PM Skills

Cost & Efficiency, User Experience, Data & Evaluation

Languages

Python 100.0%

Timeline

Project created: Mar 21, 2019
Forked: Mar 14, 2026
Your last push: 2 months ago
Upstream last push: 16 days ago
Tracked since: Mar 17, 2026
