
confident-ai/deepeval

deepeval

The LLM Evaluation Framework

Builder

Confident AI

confident-ai • startup

Stars

14,402

Using upstream star count

Forks

1,318

Using upstream fork count

Open Issues

0

Activity Score

0/100

0 commits in 30d

Created

Aug 10, 2023

Project creation date

README Summary

DeepEval is an LLM evaluation framework that provides unit testing for LLM outputs, with metrics such as hallucination, toxicity, and bias detection. It supports evaluation on both synthetic and real datasets and integrates with pytest, so LLM tests can be added to existing test workflows. The framework also includes confidence scoring and supports custom metrics.
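
The unit-testing pattern the summary describes (a metric produces a score, the score is compared against a threshold, and a pytest-style test fails when the threshold is missed) can be sketched roughly as follows. This is a standard-library illustration only and is not DeepEval's API: `EvalCase`, `ContextOverlapMetric`, `assert_case`, and the word-overlap scoring rule are all hypothetical names and assumptions chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class EvalCase:
    """One LLM interaction to evaluate (hypothetical stand-in type)."""
    prompt: str
    actual_output: str
    context: str


def _words(text: str) -> list[str]:
    # Lowercase each whitespace-separated token and strip basic punctuation.
    cleaned = (w.strip(".,!?").lower() for w in text.split())
    return [w for w in cleaned if w]


class ContextOverlapMetric:
    """Toy hallucination proxy (an assumption, not a real DeepEval metric):
    the share of output words that also occur in the supplied context."""

    def __init__(self, threshold: float = 0.7):
        self.threshold = threshold
        self.score = 0.0

    def measure(self, case: EvalCase) -> float:
        context_words = set(_words(case.context))
        output_words = _words(case.actual_output)
        if not output_words:
            self.score = 0.0
        else:
            supported = sum(1 for w in output_words if w in context_words)
            self.score = supported / len(output_words)
        return self.score

    def is_successful(self) -> bool:
        # A case passes when the last measured score meets the threshold.
        return self.score >= self.threshold


def assert_case(case: EvalCase, metrics: list) -> None:
    """Run every metric on the case and fail on the first one below threshold."""
    for metric in metrics:
        score = metric.measure(case)
        assert metric.is_successful(), (
            f"{type(metric).__name__} scored {score:.2f}, "
            f"below threshold {metric.threshold}"
        )


def test_refund_answer():
    # pytest collects functions named test_*; this one passes because every
    # output word is supported by the context.
    case = EvalCase(
        prompt="What is the refund window?",
        actual_output="Refunds are accepted within 30 days.",
        context="Refunds are accepted within 30 days of purchase.",
    )
    assert_case(case, [ContextOverlapMetric(threshold=0.7)])
```

Saved in a file such as `test_llm.py` (a hypothetical name), the test would be picked up by a plain `pytest` run. A real metric would typically replace the word-overlap rule with a model-based judgment, while the score-and-threshold shape stays the same.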

AI Dev Skills

Unmapped

LLM Evaluation Metrics, Model Performance Testing, Bias Detection and Measurement, Hallucination Detection, Toxicity Assessment, Retrieval-Augmented Generation Evaluation, Prompt Engineering Validation, AI Safety Testing, Model Benchmarking, Automated Testing Pipelines

Tags

LLM Evaluation Metrics, Model Performance Testing, Bias Detection and Measurement, Hallucination Detection, Toxicity Assessment, Retrieval-Augmented Generation Evaluation, Prompt Engineering Validation, AI Safety Testing, Model Benchmarking, Automated Testing Pipelines, Enterprise Software, AI Testing and Validation, Text, AI Safety Validation, CI/CD Pipelines, Research and Academia, Compound AI Systems, AI Safety, Prompt Optimization Testing, Bias and Fairness Testing, Self-hosted, Cloud API, Model Comparison and Selection, LLM Application Testing, Developer Tools, MLOps, AI/ML Platforms, Responsible AI, Development Environment, Model Quality Assurance, Automated Performance Benchmarking, RAG System Evaluation, LLM Evaluation, Python

Taxonomy

Recent Activity

Updated 2 months ago

7 Days

0

30 Days

0

90 Days

0

Quality

Quality: high
Maturity: production

Categories

MLOps & Infrastructure (Primary), Dev Tools & Automation, Learning Resources, Other AI / ML, RAG & Retrieval, Evals & Benchmarking, ML Platform & Infrastructure, Safety & Alignment, Search & Knowledge, Foundation Models, AI Agents

PM Skills

Scale & Reliability, Developer Platform

Languages

Python 100.0%

Timeline

Project created
Aug 10, 2023
Forked
Nov 8, 2025
Your last push
2 months ago
Upstream last push
8 days ago
Tracked since
Feb 6, 2026
