THUDM/AgentBench
AgentBench
A Comprehensive Benchmark to Evaluate LLMs as Agents (ICLR'24)
Builder

THUDM
THUDM • individual
Stars
3,295
Using upstream star count
Forks
242
Using upstream fork count
Open Issues
0
Activity Score
0/100
0 commits in 30d
Created
Jul 28, 2023
Project creation date
README Summary
AgentBench is a comprehensive benchmark designed to evaluate Large Language Models (LLMs) as agents across multiple environments and tasks. It provides a systematic framework for assessing how well LLMs can perform autonomous decision-making and interact with various environments. The benchmark was accepted at ICLR 2024 and offers standardized evaluation protocols for agent capabilities.
AI Dev Skills
Unmapped
Tags
Taxonomy
Deployment Context
Modalities
Skill Areas
Recent Activity
Updated 2 months ago
7 Days
0
30 Days
0
90 Days
0
Quality
- Quality: high
- Maturity: research
Categories
PM Skills
Languages
Timeline
- Project created: Jul 28, 2023
- Forked: Mar 22, 2026
- Your last push: 2 months ago
- Upstream last push: 2 months ago
- Tracked since: Feb 8, 2026
Similar Repos
pgvector cosine similarity · $0
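The similar-repos list is ranked by pgvector cosine similarity between repository embeddings. As a minimal sketch (the embedding values below are hypothetical placeholders, not real model output), this is the same cosine similarity that pgvector's cosine-distance operator is built on, where distance = 1 - similarity:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors.
    pgvector's cosine-distance operator returns 1 minus this value."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# Hypothetical repo embeddings for illustration only; a real pipeline
# would produce these with an embedding model and store them in pgvector.
agentbench = [0.2, 0.7, 0.1]
candidate = [0.25, 0.65, 0.2]
print(round(cosine_similarity(agentbench, candidate), 3))
```

Candidates would then be ordered by descending similarity (equivalently, ascending cosine distance) to fill the list.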