openai/evals

evals

Evals is a framework for evaluating LLMs and LLM systems, and an open-source registry of benchmarks.

Builder: OpenAI (openai • ai-lab)

Stars: 18,561 (upstream star count)
Forks: 2,971 (upstream fork count)
Activity Score: 0/100 (0 commits in 30d)
Created: Jan 23, 2023 (project creation date)

README Summary

> You can now configure and run Evals directly in the OpenAI Dashboard. [Get started →](https://platform.openai.com/docs/guides/evals)

AI Dev Skills

Unmapped

AI Safety Evaluation, Benchmark Design and Implementation, Capability Assessment, Evaluation Metrics and Scoring, Large Language Model Evaluation, LLM System Testing, Model Comparison and Analysis, Model Performance Assessment, Prompt Engineering

Taxonomy

AI Trends: AI Safety, Large Language Models, Model Evaluation and Testing, Responsible AI Development, AI Benchmarking

Recent Activity

Updated 7 months ago

Quality

Quality: high
Maturity: production

PM Skills

Data & Evaluation

Languages

Python 100.0%

Timeline

Project created: Jan 23, 2023
Forked: Mar 12, 2026
Your last push: 7 months ago
Upstream last push: 1 month ago
Tracked since: Nov 3, 2025

