pinchbench/skill

PinchBench is a benchmarking system for evaluating LLMs as OpenClaw coding agents. Made with 🦀 by the humans at https://kilo.ai

Builder

pinchbench • individual

Stars

906

Using upstream star count

Forks

95

Using upstream fork count

Open Issues

0

Activity Score

0/100

22 commits in 30d

Created

Feb 11, 2026

Project creation date

README Summary

PinchBench is a benchmarking system that evaluates how well large language models (LLMs) perform as OpenClaw coding agents. It is written in Python and developed by the team at kilo.ai to provide standardized metrics for LLM coding capabilities.

AI Dev Skills

Unmapped

Tags

LLM Evaluation and Benchmarking, Agentic AI Systems, Code Generation Models, Model Performance Metrics, Agent-Based Architecture, Software Engineering AI, Large Language Model Evaluation, Code Generation Benchmarking, LLM-as-Agent Architecture, Coding Task Evaluation, Model Performance Measurement, Benchmark Design and Methodology, Code Generation Assessment, Prompt Engineering, Model Comparison and Analysis, Software Engineering Agents

Recent Activity

Updated 20 days ago

7 Days

0

30 Days

22

90 Days

106

Quality

Quality
medium
Maturity
prototype

Categories

Dev Tools & Automation (Primary), Foundation Models, AI Agents, Coding & Dev Tools, Other AI / ML, Evals & Benchmarking

PM Skills

Developer Platform

Languages

Python 100.0%

Timeline

Project created
Feb 11, 2026
Forked
Mar 28, 2026
Your last push
20 days ago
Upstream last push
7 days ago
Tracked since
Mar 24, 2026
