stanford-crfm/helm
helm
Holistic Evaluation of Language Models (HELM) is an open-source Python framework created by the Center for Research on Foundation Models (CRFM) at Stanford for holistic, reproducible, and transparent evaluation of foundation models, including large language models (LLMs) and multimodal models.
Builder

Stanford
stanford-crfm • research
Stars
2,735
Using upstream star count
Forks
369
Using upstream fork count
Open Issues
0
Activity Score
0/100
0 commits in 30d
Created
Nov 29, 2021
Project creation date
README Summary
HELM (Holistic Evaluation of Language Models) is Stanford CRFM's Python framework for evaluating foundation models, including LLMs and multimodal models. It emphasizes holistic assessment across multiple dimensions, reproducibility through standardized benchmarks, and transparency in evaluation methodology. It gives researchers and practitioners systematic tools for assessing model performance, capabilities, and limitations across diverse tasks and scenarios.
AI Dev Skills
Unmapped
Recent Activity
Updated 24 days ago
7 Days
0
30 Days
0
90 Days
0
Quality
- Quality
- high
- Maturity
- research
Timeline
- Project created
- Nov 29, 2021
- Forked
- Mar 22, 2026
- Your last push
- 24 days ago
- Upstream last push
- 7 days ago
- Tracked since
- Mar 20, 2026