
openai/human-eval

human-eval

Code for the paper "Evaluating Large Language Models Trained on Code"

Builder

OpenAI

openai • ai-lab

Stars

3,185

Using upstream star count

Forks

442

Using upstream fork count

Open Issues

0

Activity Score

0/100

0 commits in 30d

Created

Jul 6, 2021

Project creation date

README Summary

HumanEval is a dataset and evaluation framework for measuring the code generation capabilities of large language models. It consists of 164 hand-written programming problems with unit tests, designed to evaluate whether models can generate functionally correct Python code from natural language descriptions. The repository provides tools for running evaluations and measuring pass@k metrics for code completion tasks.
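The pass@k metric mentioned above comes from the paper: for each problem, generate n ≥ k samples, count the number c that pass the unit tests, and estimate pass@k = E[1 − C(n−c, k)/C(n, k)] over all problems. Below is a minimal standalone sketch of that estimator in Python; the repository ships its own implementation, so this version is illustrative only.

import numpy as np

def pass_at_k(n: int, c: int, k: int) -> float:
    # Unbiased per-problem pass@k: 1 - C(n - c, k) / C(n, k),
    # computed in a numerically stable product form.
    # n: total samples generated, c: samples that passed the tests.
    if n - c < k:
        return 1.0  # every size-k subset contains a passing sample
    return 1.0 - np.prod(1.0 - k / np.arange(n - c + 1, n + 1))

# Example: 200 samples for one problem, 37 of which pass its tests.
print(f"pass@1  = {pass_at_k(200, 37, 1):.4f}")   # 0.1850
print(f"pass@10 = {pass_at_k(200, 37, 10):.4f}")

Per the upstream README, the harness itself is invoked as evaluate_functional_correctness samples.jsonl, where each JSONL line holds a task_id and a completion; the completions are executed against each problem's unit tests (the README strongly recommends sandboxing, since this runs untrusted model-generated code).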

AI Dev Skills

Unmapped

Large Language Model Evaluation, Code Generation Assessment, Natural Language to Code Translation, Automated Code Testing, Machine Learning Benchmarking, Programming Language Understanding

Tags

Large Language Model Evaluation, Code Generation Assessment, Natural Language to Code Translation, Automated Code Testing, Machine Learning Benchmarking, Programming Language Understanding, AI-Assisted Programming, Code Generation Evaluation, Code Generation Models, AI Coding Assistant Assessment, Self-hosted, Developer Tools, Code, Model Performance Comparison, Text, Language Model Evaluation, Language Model Benchmarking, Python


Recent Activity

Updated 1 year ago

7 Days

0

30 Days

0

90 Days

0

Quality

high

Maturity

research

Categories

Evals & Benchmarking (Primary), Dev Tools & Automation, NLP & Text, Coding & Dev Tools, Other AI / ML, Foundation Models

PM Skills

Developer Platform

Languages

Python 100.0%

Timeline

Project created
Jul 6, 2021
Forked
Mar 14, 2026
Your last push
1 year ago
Upstream last push
1 year ago
Tracked since
Jan 17, 2025
