Reporium

truera/trulens

trulens

Evaluation and Tracking for LLM Experiments and AI Agents

Builder

truera • individual

Stars

3,352

Using upstream star count

Forks

284

Using upstream fork count

Open Issues

0

Activity Score

0/100

0 commits in 30d

Created

Nov 2, 2020

Project creation date

README Summary

**Don't just vibe-check your LLM app!** Systematically evaluate and track your LLM experiments with TruLens. As you develop your app, including prompts, models, retrievers, knowledge sources, and more, *TruLens* is the tool you need to understand its performance.

AI Dev Skills

Unmapped

AI Agent Monitoring, AI Quality Assurance, AI System Observability, Large Language Model Evaluation, LLM Application Testing, LLM Performance Metrics, Machine Learning Operations, Natural Language Processing Evaluation, Prompt Engineering Optimization, Retrieval-Augmented Generation Assessment

Tags

AI Agent Monitoring, AI Quality Assurance, AI System Observability, Large Language Model Evaluation, LLM Application Testing, LLM Performance Metrics, Machine Learning Operations, Natural Language Processing Evaluation, Prompt Engineering Optimization, Retrieval-Augmented Generation Assessment, Evals, Forked, Jupyter, LangChain, Large Language Models

Taxonomy

AI Trends

Agentic AI, AI Safety, AI Observability, LLMOps, Retrieval-Augmented Generation, AI Evaluation, Responsible AI

Category

Foundation Models, AI Agents, Evals & Benchmarking, Data Science & Analytics

Deployment Context

Self-hosted, Cloud API, On-premise

Industries

Developer Tools, Enterprise AI, AI/ML Platforms

Modalities

Text

Skill Areas

Large Language Model Evaluation, AI Agent Monitoring, Retrieval-Augmented Generation Assessment, LLM Application Testing, Machine Learning Operations, AI System Observability, Natural Language Processing Evaluation, Prompt Engineering Optimization, AI Quality Assurance, LLM Performance Metrics

Tag

Evals, Forked, Jupyter, LangChain, Large Language Models

Use Cases

LLM Application Evaluation, AI Agent Performance Tracking, RAG System Assessment, LLM Response Quality Monitoring, AI Application Testing, Model Performance Benchmarking, LLM Deployment Validation, AI System Debugging, Prompt Engineering Optimization

Recent Activity

Updated 2 months ago

7 Days: 0
30 Days: 0
90 Days: 7

[example]hybrid search rag evals example (#2376)

Tarun Jain • Mar 19, 2026

d956edf

the endpoint is Langchain but the docs misguides to LangChain (#2375)

Tarun Jain • Mar 18, 2026

a483029

Deprecate run_dashboard_sis and direct Snowflake users to Snowsight Evaluations. (#2370)

Josh Reini • Mar 17, 2026

65f53b0

Quality

Quality: high
Maturity: beta

Categories

Evals & Benchmarking (Primary), Data Science & Analytics, Foundation Models, AI Agents, Other AI / ML

PM Skills

Data & Evaluation

Languages

Python 100.0%

Timeline

Project created: Nov 2, 2020
Forked: Mar 22, 2026
Your last push: 2 months ago
Upstream last push: 17 days ago
Tracked since: Mar 21, 2026

Similar Repos

pgvector cosine similarity · $0
