pinchbench/skill
PinchBench is a benchmarking system for evaluating LLMs as OpenClaw coding agents. Made with 🦀 by the humans at https://kilo.ai
Builder

pinchbench • individual
Stars: 906 (using upstream star count)
Forks: 95 (using upstream fork count)
Open Issues: 0
Activity Score: 0/100 (22 commits in 30d)
Created: Feb 11, 2026 (project creation date)
README Summary
PinchBench is a benchmarking system designed to evaluate Large Language Model (LLM) performance as OpenClaw coding agents. The system is built using Python and developed by the team at kilo.ai to provide standardized evaluation metrics for LLM coding capabilities.
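The summary above says PinchBench reports standardized evaluation metrics for coding agents, but does not specify the scoring scheme. As a purely illustrative sketch (hypothetical function name; not PinchBench's actual implementation), coding-agent benchmarks commonly report a pass rate over tasks:

```python
# Hypothetical sketch of a pass-rate metric for a coding-agent
# benchmark. PinchBench's real scoring scheme is not documented here.
def pass_rate(results: list[bool]) -> float:
    """Fraction of benchmark tasks the agent solved (True = passed)."""
    if not results:
        return 0.0
    return sum(results) / len(results)

# An agent that solves 3 of 4 tasks scores 0.75.
print(pass_rate([True, True, False, True]))  # → 0.75
```

Per-task pass/fail keeps the metric comparable across models, which is the usual reason benchmarks prefer it over free-form quality scores.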
AI Dev Skills
Unmapped
LLM Evaluation and Benchmarking, Agentic AI Systems, Code Generation Models, Model Performance Metrics, Agent-Based Architecture, Software Engineering AI, Large Language Model Evaluation, Code Generation Benchmarking, LLM-as-Agent Architecture, Coding Task Evaluation, Model Performance Measurement, Benchmark Design and Methodology, Code Generation Assessment, Prompt Engineering, Model Comparison and Analysis, Software Engineering Agents
Use Cases
LLM Coding Agent Benchmarking, Autonomous Code Generation Evaluation, Model Comparison and Selection, Agent Performance Measurement, LLM Coding Agent Performance Evaluation, Model Comparison and Ranking, Code Generation Quality Assessment, Automated Programming Task Benchmarking, LLM Model Evaluation, Coding Agent Performance Comparison, Agent Capability Benchmarking
Recent Activity
Updated 20 days ago
Commits — 7 days: 0 · 30 days: 22 · 90 days: 106
Quality: medium
Maturity: prototype
Categories
Dev Tools & Automation (Primary), Foundation Models, AI Agents, Coding & Dev Tools, Other AI / ML, Evals & Benchmarking
PM Skills
Developer Platform
Languages
Python: 100.0%
Timeline
- Project created: Feb 11, 2026
- Forked: Mar 28, 2026
- Your last push: 20 days ago
- Upstream last push: 7 days ago
- Tracked since: Mar 24, 2026
Similar Repos
Found via pgvector cosine similarity.
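The page notes that similar repos are ranked by pgvector cosine similarity. As a minimal sketch of the underlying measure (the actual embedding model and schema used by this tracker are not documented here), cosine similarity compares embedding vectors by angle rather than magnitude; note that pgvector's SQL `<=>` operator returns cosine *distance* (1 − similarity), so `ORDER BY embedding <=> query` ranks nearest first:

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors:
    dot(a, b) / (|a| * |b|), in [-1, 1] for non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Parallel vectors → 1.0; orthogonal vectors → 0.0
print(cosine_similarity([1.0, 0.0], [2.0, 0.0]))  # → 1.0
print(cosine_similarity([1.0, 0.0], [0.0, 3.0]))  # → 0.0
```

Because magnitude is divided out, two repo descriptions of very different lengths can still score as near-identical if their embeddings point the same way, which is why cosine similarity is the common choice for this kind of "similar items" widget.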