Reporium

bigcode-project/bigcode-evaluation-harness

bigcode-evaluation-harness

A framework for the evaluation of autoregressive code generation language models.

Upstream: bigcode-project/bigcode-evaluation-harness

Builder

bigcode-project • individual

Stars

1,047

Using upstream star count

Forks

263

Using upstream fork count

Open Issues

0

Activity Score

0/100

0 commits in 30d

Created

Aug 9, 2022

Project creation date

README Summary

Code Generation LM Evaluation Harness

AI Dev Skills

Unmapped

Autoregressive Language Modeling, Code Completion, Code Generation Evaluation, Few-shot Learning, Large Language Model Evaluation, Model Benchmarking, Natural Language to Code Translation, Program Synthesis

Tags

Autoregressive Language Modeling, Code Completion, Code Generation Evaluation, Few-shot Learning, Large Language Model Evaluation, Model Benchmarking, Natural Language to Code Translation, Program Synthesis, Benchmarking, C++, Docker, Evals, Forked, GPU / CUDA, HuggingFace, HumanEval, JavaScript, LM Eval Harness, OpenAI, PyTorch, Python, Research / Papers, Security, TensorFlow, Transformers

Taxonomy

AI Trends

Large Language Models, Code Generation AI, AI-Assisted Programming, Model Evaluation Standards

category

Foundation Models, Model Training, Evals & Benchmarking, Inference & Serving, MLOps & Infrastructure, Dev Tools & Automation, Learning Resources, Security & Safety

Deployment Context

Self-hosted, Research Environment, Cloud Computing

Industries

Developer Tools, Software Development, AI Research

Modalities

Code, Text

Skill Areas

Autoregressive Language Modeling, Code Generation Evaluation, Natural Language to Code Translation, Model Benchmarking, Few-shot Learning, Code Completion, Program Synthesis, Large Language Model Evaluation

tag

Benchmarking, C++, Docker, Evals, Forked, GPU / CUDA, HuggingFace, HumanEval, JavaScript, LM Eval Harness, OpenAI, PyTorch, Python, Research / Papers, Security, TensorFlow, Transformers

Use Cases

Code Generation Model Benchmarking, Automated Programming Assistant Evaluation, Code Completion System Testing, Natural Language to Code Translation Assessment, Programming Language Model Comparison

Recent Activity

Updated 10 months ago

7 Days

0

30 Days

0

90 Days

0

Quality

Quality: high
Maturity: research

Categories

Evals & Benchmarking (Primary), Inference & Serving, MLOps & Infrastructure, Dev Tools & Automation, Learning Resources, Security & Safety, Foundation Models, Model Training, Search & Knowledge, Other AI / ML

PM Skills

Scale & Reliability, Data & Evaluation

Languages

Python: 100.0%

Timeline

Project created: Aug 9, 2022
Forked: Mar 22, 2026
Your last push: 10 months ago
Upstream last push: 10 months ago
Tracked since: Jul 22, 2025

Similar Repos

pgvector cosine similarity · $0
