
LMCache/LMCache

Supercharge Your LLM with the Fastest KV Cache Layer

Builder: LMCache (individual)

Stars: 7,864 (using upstream star count)

Forks: 1,063 (using upstream fork count)

Open Issues: 0

Activity Score: 0/100 (0 commits in the last 30 days)

Created: May 28, 2024

README Summary

LMCache is a high-performance key-value (KV) cache layer that accelerates Large Language Model (LLM) inference by caching and reusing the KV pairs computed by transformer attention layers. It provides a fast, distributed caching system that can significantly reduce redundant prefill computation and improve response times for LLM applications. The system is built in Python and integrates easily with existing LLM serving workflows.
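
The core idea behind a KV cache layer is prefix reuse: when a new request shares a token prefix with an earlier one, the attention key/value tensors already computed for that prefix can be fetched from cache instead of being recomputed during prefill. The Python sketch below illustrates the concept with a hypothetical `PrefixKVCache` class keyed by a hash of the token prefix; it is an illustrative assumption, not LMCache's actual API.

```python
import hashlib

import torch


class PrefixKVCache:
    """Minimal sketch of prefix-based KV reuse (illustrative only, not LMCache's API)."""

    def __init__(self):
        self._store = {}  # prefix hash -> (keys, values) tensors

    @staticmethod
    def _key(token_ids):
        # Hash the token prefix so identical prefixes map to the same entry.
        return hashlib.sha256(repr(token_ids).encode()).hexdigest()

    def get(self, token_ids):
        """Return cached (K, V) tensors for this exact token prefix, or None."""
        return self._store.get(self._key(token_ids))

    def put(self, token_ids, keys, values):
        """Store the attention K/V tensors computed during prefill of this prefix."""
        self._store[self._key(token_ids)] = (keys, values)


# On a cache hit, a serving engine can skip prefill for the shared prefix
# and only compute attention for the new suffix tokens.
cache = PrefixKVCache()
prefix = [101, 2023, 2003]              # token ids of a shared prompt prefix
k = torch.randn(1, 8, len(prefix), 64)  # (batch, heads, seq_len, head_dim)
v = torch.randn(1, 8, len(prefix), 64)
cache.put(prefix, k, v)
print("hit" if cache.get(prefix) is not None else "miss")
```

According to the project's own description, LMCache goes further than this single-process sketch: it stores reusable KV caches across tiers such as GPU memory, CPU DRAM, and local disk, and can share them across serving instances.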

AI Dev Skills

Unmapped: Transformer Architecture, Key-Value Caching, LLM Inference Optimization, Distributed Systems, Memory Management, Attention Mechanisms, Performance Engineering

Tags

Transformer Architecture, Key-Value Caching, LLM Inference Optimization, Distributed Systems, Memory Management, Attention Mechanisms, Performance Engineering, On-premise, Text, Self-hosted, Cloud Computing, LLM Inference Acceleration, Latency Optimization for Chatbots, Cloud API, LLM Optimization, Developer Tools, Cost Reduction for AI Applications, Cost-Effective AI, Efficient Model Serving, Inference Efficiency, Python


Recent Activity

Updated 5 months ago

7 Days: 0
30 Days: 0
90 Days: 0

Quality

Quality: medium
Maturity: prototype

Categories

Dev Tools & Automation (Primary), Inference & Serving, Other AI / ML, Foundation Models

PM Skills

Developer Platform

Languages

Python: 100.0%

Timeline

Project created: May 28, 2024
Forked: Nov 2, 2025
Your last push: 5 months ago
Upstream last push: 6 days ago
Tracked since: Nov 2, 2025
