Reporium

CelestialCreator/mtp-lm

mtp-lm

Source code accompanying the research paper on training multi-token prediction language models using self-distillation.

View on GitHub ↗ · Upstream: CelestialCreator/mtp-lm ↗

Builder

CelestialCreator • individual

Stars

4

Using upstream star count

Forks

2

Using upstream fork count

Open Issues

0

Activity Score

0/100

0 commits in 30d

Created

Mar 5, 2026

Project creation date

README Summary

This repository contains the code for the arXiv preprint: [[2602.06019] Multi-Token Prediction via Self-Distillation](https://arxiv.org/abs/2602.06019)

AI Dev Skills

Unmapped

Deep Learning Research, Knowledge Distillation, Language Model Training, Large Language Model Development, Multi-Token Prediction, Neural Network Optimization, Self-Distillation, Transformer Architecture

Tags

Deep Learning Research, Knowledge Distillation, Language Model Training, Large Language Model Development, Multi-Token Prediction, Neural Network Optimization, Self-Distillation, Transformer Architecture, AWS, Automation, Benchmarking, CLI Tool, Course, DeepSeek, Distillation, Embeddings, Evals, FSDP, Fine-Tuning, Forked, GPU / CUDA, Gemma, HuggingFace, Jupyter, LM Eval Harness, Large Language Models, Llama, LoRA / PEFT, Long Context, MMLU, Mistral, Mobile, OpenAI, Planning / CoT, PyTorch, Python, Quantization, Qwen, Research / Papers, Tool Use, Transformers, Tutorial, Weights & Biases

Taxonomy

AI Trends

Model Efficiency, Knowledge Distillation, Language Model Innovation, Research Reproducibility

category

Foundation Models, AI Agents, RAG & Retrieval, Model Training, Evals & Benchmarking, Observability & Monitoring, Inference & Serving, Dev Tools & Automation, Cloud & Platforms, Learning Resources, Data Science & Analytics

Deployment Context

Self-hosted

Modalities

Text

Skill Areas

Multi-Token Prediction, Self-Distillation, Language Model Training, Transformer Architecture, Knowledge Distillation, Large Language Model Development, Deep Learning Research, Neural Network Optimization

tag

AWS, Automation, Benchmarking, CLI Tool, Course, DeepSeek, Distillation, Embeddings, Evals, FSDP, Fine-Tuning, Forked, GPU / CUDA, Gemma, HuggingFace, Jupyter, LM Eval Harness, Large Language Models, Llama, LoRA / PEFT, Long Context, MMLU, Mistral, Mobile, OpenAI, Planning / CoT, PyTorch, Python, Quantization, Qwen, Research / Papers, Tool Use, Transformers, Tutorial, Weights & Biases

Use Cases

Language Model Research, Efficient Model Training, Academic Research Implementation, Model Compression Techniques

Recent Activity

Updated 3 months ago

7 Days

0

30 Days

0

90 Days

1

Reproduce MTP self-distillation on single RTX 5090 (Llama-3.2-1B)

Akshay Mhaskar • Mar 5, 2026

13cb9d1

initial public release

jwkirchenbauer • Feb 21, 2026

167413e

Quality

Quality: medium
Maturity: research

Categories

RAG & Retrieval (Primary), Evals & Benchmarking, Observability & Monitoring, Inference & Serving, Dev Tools & Automation, Cloud & Platforms, Learning Resources, Data Science & Analytics, Foundation Models, AI Agents, Model Training, Safety & Alignment, Edge & Mobile AI, Search & Knowledge, Other AI / ML

PM Skills

Cost & Efficiency, Data & Evaluation, Product Discovery, Developer Platform, AI-Native Architecture

Languages

Python 100.0%

Timeline

Project created
Mar 5, 2026
Forked
Mar 12, 2026
Your last push
3 months ago
Upstream last push
3 months ago
Tracked since
Mar 5, 2026

Similar Repos

pgvector cosine similarity · $0