openai/evals
Evals is a framework for evaluating LLMs and LLM systems, and an open-source registry of benchmarks.
Builder: OpenAI (openai • ai-lab)
Stars: 18,100 (upstream star count)
Forks: 2,909 (upstream fork count)
Open Issues: 0
Activity Score: 0/100 (0 commits in the last 30 days)
Created: Jan 23, 2023
README Summary
Evals is OpenAI's framework for evaluating Large Language Models (LLMs) and LLM systems through standardized benchmarks and metrics. It provides an open-source registry of evaluation benchmarks that can be used to assess model performance across various tasks. The framework enables researchers and developers to create, run, and share evaluations for language models in a consistent and reproducible manner.
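Per the framework's docs, most evals are defined declaratively (a YAML registry entry plus a JSONL file of samples) and run with the repo's `oaieval` CLI, while fully custom logic is written by subclassing `evals.Eval`. Below is a minimal sketch of such a subclass, modeled on the repo's custom-eval documentation; the class name and samples path are illustrative, and the exact API surface may differ between versions.

```python
import random

import evals
import evals.metrics


class ExactMatchEval(evals.Eval):
    """Illustrative custom eval: compares the model's completion
    against an expected answer by exact string match.
    (Class name and samples path are hypothetical.)"""

    def __init__(self, samples_jsonl: str, **kwargs):
        super().__init__(**kwargs)
        self.samples_jsonl = samples_jsonl  # e.g. "my_eval/samples.jsonl"

    def eval_sample(self, sample, rng: random.Random):
        # Each JSONL line is expected to carry an "input" prompt
        # and an "ideal" reference answer.
        prompt = sample["input"]
        result = self.completion_fn(prompt=prompt, max_tokens=32)
        sampled = result.get_completions()[0]
        # Records a "match" event that the recorder aggregates later.
        evals.record_and_check_match(
            prompt=prompt, sampled=sampled, expected=sample["ideal"]
        )

    def run(self, recorder):
        # Load samples, evaluate each one, and report overall accuracy.
        samples = evals.get_jsonl(self.samples_jsonl)
        self.eval_all_samples(recorder, samples)
        return {
            "accuracy": evals.metrics.get_accuracy(recorder.get_events("match"))
        }
```

Once registered, an eval is typically run from the command line, e.g. `oaieval gpt-3.5-turbo <eval_name>` as shown in the repo's README, where `<eval_name>` stands for whatever name the registry entry declares.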
AI Dev Skills: Unmapped
Tags
Taxonomy
- AI Trends
- Deployment Context
- Modalities
- Skill Areas
Recent Activity
Updated 5 months ago
- 7 days: 0
- 30 days: 0
- 90 days: 0
Quality
- Quality: high
- Maturity: production
Categories
PM Skills
Languages
Timeline
- Project created: Jan 23, 2023
- Forked: Mar 12, 2026
- Your last push: 5 months ago
- Upstream last push: 7 days ago
- Tracked since: Nov 3, 2025
Similar Repos
Computed via pgvector cosine similarity.