
THUDM/AgentBench

AgentBench

A Comprehensive Benchmark to Evaluate LLMs as Agents (ICLR'24)

View on GitHub | Upstream: THUDM/AgentBench

Builder

THUDM

THUDM • individual

Stars

3,458

Using upstream star count

Forks

257

Using upstream fork count

Open Issues

Activity Score

0/100

0 commits in 30d

Created

Jul 28, 2023

Project creation date

README Summary

🌐 Leaderboard (new): https://docs.google.com/spreadsheets/d/e/2PACX-1vRR3Wl7wsCgHpwUw1_eUXW_fptAPLL3FkhnW_rua0O1Ji_GIVrpTjY5LaKAhwO-WeARjnY_KNw0SYNJ/pubhtml | 🐦 Twitter: https://twitter.com/thukeg | ✉️ Google Group: agentbench@googlegroups.com | 📃 Paper: https://arxiv.org/abs/2308.03688


AI Dev Skills

Unmapped

Taxonomy

AI Trends

Agentic AI, LLM Evaluation, Agent Benchmarking, Autonomous AI Systems

Recent Activity

Updated 3 months ago


Merge pull request #213 from mkimhi/agentbench-lite-suite

Shaw • Feb 8, 2026

d1e4a10

Docs: clarify Python 3.9 recommended for dependency install

Moshe Kimhi • Feb 8, 2026

a3cc91a

Add CI smoke test for lite preset YAML configs

Moshe Kimhi • Feb 8, 2026

d3571d7

Quality

research

Quality: high
Maturity: research

PM Skills

Cost & Efficiency, Scale & Reliability, Data & Evaluation, Product Discovery, Developer Platform, AI-Native Architecture

Languages

Python 100.0%

Timeline

Project created: Jul 28, 2023
Forked: Mar 22, 2026
Your last push: 3 months ago
Upstream last push: 3 months ago
Tracked since: Feb 8, 2026

Similar Repos

pgvector cosine similarity · $0


