
EleutherAI/lm-evaluation-harness

lm-evaluation-harness

A framework for few-shot evaluation of language models.

Builder

EleutherAI • ai-lab

Stars

11,988

Using upstream star count

Forks

3,148

Using upstream fork count

Open Issues

0

Activity Score

0/100

0 commits in 30d

Created

Aug 28, 2020


README Summary

The Language Model Evaluation Harness is a unified framework for few-shot evaluation of autoregressive language models. It provides a comprehensive suite of evaluation tasks and metrics to assess language model performance across various domains and capabilities. The framework supports multiple model backends and enables standardized comparison of different language models.
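
In practice, a run boils down to pointing the harness at a model backend and a list of task names. The sketch below uses the Python entry point documented in the upstream README; the checkpoint and task choices are placeholders, and the exact keyword set may differ between harness versions.

    import lm_eval

    # Evaluate a Hugging Face checkpoint on HellaSwag with 5-shot prompting.
    # "hf" selects the Hugging Face transformers backend; other backends
    # are selected the same way, by name.
    results = lm_eval.simple_evaluate(
        model="hf",
        model_args="pretrained=EleutherAI/pythia-160m",
        tasks=["hellaswag"],
        num_fewshot=5,
        batch_size=8,
    )

    # Per-task metrics (accuracy, normalized accuracy, stderr, ...)
    print(results["results"]["hellaswag"])

The equivalent command-line invocation in recent versions is lm_eval --model hf --model_args pretrained=EleutherAI/pythia-160m --tasks hellaswag --num_fewshot 5.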

AI Dev Skills

Unmapped

Language Model Evaluation, Few-shot Learning, Benchmark Design, Model Performance Assessment, Statistical Analysis, Natural Language Processing, Prompt Engineering, Model Comparison, Standardized Testing Frameworks

Tags

Language Model Evaluation, Few-shot Learning, Benchmark Design, Model Performance Assessment, Statistical Analysis, Natural Language Processing, Prompt Engineering, Model Comparison, Standardized Testing Frameworks, Cloud API, Self-hosted, Research Validation, Capability Assessment, Text, Model Selection, Model Performance Comparison, Benchmark Standardization, Few-shot Task Evaluation, Responsible AI, Research Computing, AI Safety, Model Interpretability, Language Model Benchmarking, Python


Recent Activity

Updated 27 days ago

7 Days

0

30 Days

0

90 Days

0

Quality

Quality: high
Maturity: production

Categories

AI Agents (Primary), Evals & Benchmarking, NLP & Text, Search & Knowledge, Other AI / ML, Learning Resources, Safety & Alignment, Data Science & Analytics

PM Skills

Developer Platform

Languages

Python 100.0%

Timeline

Project created: Aug 28, 2020
Forked: Mar 13, 2026
Your last push: 27 days ago
Upstream last push: 12 days ago
Tracked since: Mar 17, 2026
