anthropics/hh-rlhf
hh-rlhf
Human preference data for "Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback"
Builder

Anthropic
anthropics • ai-lab
Stars
1,833
Using upstream star count
Forks
153
Using upstream fork count
Open Issues
0
Activity Score
0/100
0 commits in 30d
Created
Apr 10, 2022
Project creation date
README Summary
This repository contains human preference data used for training AI assistants through reinforcement learning from human feedback (RLHF). The dataset includes conversations between humans and AI assistants, with human annotations indicating which responses are more helpful and harmless. This data was used in Anthropic's research on training AI systems to be both helpful and safe through human feedback.
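Each record pairs the same conversation context with a preferred ("chosen") and a dispreferred ("rejected") assistant response. Below is a minimal sketch of reading those pairs with the Hugging Face `datasets` library; it assumes the data is also mirrored on the Hub as Anthropic/hh-rlhf and that the field names follow that mirror.

```python
# Minimal sketch: load the preference pairs via the Hugging Face Hub mirror.
# The dataset ID "Anthropic/hh-rlhf" and the "chosen"/"rejected" field names
# are assumptions based on that mirror, not taken from this page.
from datasets import load_dataset

dataset = load_dataset("Anthropic/hh-rlhf", split="train")

example = dataset[0]
print(example["chosen"])    # conversation ending with the preferred reply
print(example["rejected"])  # same context ending with the dispreferred reply
```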
AI Dev Skills
Unmapped
Tags
Taxonomy
Deployment Context
Modalities
Skill Areas
Recent Activity
Updated 10 months ago
7 Days
0
30 Days
0
90 Days
0
Quality
- Quality: high
- Maturity: research
Categories
PM Skills
Languages
No language breakdown recorded.
Timeline
- Project created: Apr 10, 2022
- Forked: Mar 14, 2026
- Your last push: 10 months ago
- Upstream last push: 10 months ago
- Tracked since: Jun 17, 2025
Similar Repos
Computed with pgvector cosine similarity.
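A minimal sketch of the kind of pgvector cosine-similarity lookup such a "similar repos" feature could run; the `repos` table, `embedding` column, and connection string are hypothetical, and `<=>` is pgvector's cosine-distance operator (1 − cosine similarity).

```python
# Hypothetical nearest-neighbour query using pgvector's cosine-distance
# operator (<=>); table, column, and connection details are illustrative.
import psycopg

QUERY = """
    SELECT r.full_name, 1 - (r.embedding <=> t.embedding) AS similarity
    FROM repos AS r,
         (SELECT embedding FROM repos WHERE full_name = %s) AS t
    WHERE r.full_name <> %s
    ORDER BY r.embedding <=> t.embedding
    LIMIT 5
"""

with psycopg.connect("postgresql://localhost/repo_index") as conn:
    for name, similarity in conn.execute(QUERY, ("anthropics/hh-rlhf", "anthropics/hh-rlhf")):
        print(f"{name}: {similarity:.3f}")
```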