Reporium

huggingface/trl

trl

Train transformer language models with reinforcement learning.

Builder

HuggingFace

huggingface • ai-lab

Stars

18,493

Using upstream star count

Forks

2,755

Using upstream fork count

Open Issues

0

Activity Score

0/100

0 commits in 30d

Created

Mar 27, 2020

Project creation date

README Summary

[Image: TRL Banner]

AI Dev Skills

Unmapped

Constitutional AI • Direct Preference Optimization (DPO) • Human Preference Learning • Language Model Training • Policy Gradient Methods • Proximal Policy Optimization (PPO) • Reinforcement Learning from Human Feedback (RLHF) • Reward Modeling • Transformer Fine-tuning

Tags

Constitutional AI • Direct Preference Optimization (DPO) • Human Preference Learning • Language Model Training • Policy Gradient Methods • Proximal Policy Optimization (PPO) • Reinforcement Learning from Human Feedback (RLHF) • Reward Modeling • Transformer Fine-tuning • AI Agents • Backend • CLI Tool • DPO • DeepSeek • DeepSpeed • FSDP • Fine-Tuning • Forked • GRPO • HuggingFace • Llama • LoRA / PEFT • PyTorch • Python • Quantization • Qwen • Reinforcement Learning • TRL • Transformers • Tutorial • Unsloth

Taxonomy

AI Trends

AI Safety • Human-AI Alignment • Constitutional AI • Large Language Models • Instruction Following

Category

Model Training • Foundation Models • AI Agents • Dev Tools & Automation • Learning Resources

Deployment Context

Cloud API • Self-hosted • On-premise

Modalities

Text

Skill Areas

Reinforcement Learning from Human Feedback (RLHF) • Proximal Policy Optimization (PPO) • Direct Preference Optimization (DPO) • Transformer Fine-tuning • Language Model Training • Human Preference Learning • Policy Gradient Methods • Reward Modeling • Constitutional AI

Tag

AI Agents • Backend • CLI Tool • DPO • DeepSeek • DeepSpeed • FSDP • Fine-Tuning • Forked • GRPO • HuggingFace • Llama • LoRA / PEFT • PyTorch • Python • Quantization • Qwen • Reinforcement Learning • TRL • Transformers • Tutorial • Unsloth

Use Cases

Language Model Alignment • Instruction Following Training • Conversational AI Development • AI Safety Training • Human Preference Learning • Chat Model Fine-tuning

Recent Activity

Updated 2 months ago

7 Days

0

30 Days

0

90 Days

20

Remove custom get_train/eval_dataloader from OnlineDPO (#5291)

Albert Villanova del Moral • Mar 16, 2026

d46131f

Remove TrainingArguments import from experimental trainers (#5290)

Albert Villanova del Moral • Mar 16, 2026

85cf8f4

Fix `accuracy_reward` crash when called from non-main thread (#5281)

Quentin Gallouédec • Mar 16, 2026

91e3da0

Quality

Quality: high
Maturity: production

Categories

Model Training (Primary) • AI Agents • Other AI / ML • Foundation Models • Dev Tools & Automation • Learning Resources

PM Skills

Cost & Efficiency • Developer Platform • AI-Native Architecture

Languages

Python 100.0%

Timeline

Project created: Mar 27, 2020
Forked: Mar 13, 2026
Your last push: 2 months ago
Upstream last push: 16 days ago
Tracked since: Mar 17, 2026

Similar Repos

pgvector cosine similarity · $0
