Reporium

InternLM/lmdeploy

lmdeploy

LMDeploy is a toolkit for compressing, deploying, and serving LLMs.

View on GitHub↗ · Upstream: InternLM/lmdeploy↗

Builder

InternLM

InternLM • individual

Stars

7,875

Using upstream star count

Forks

699

Using upstream fork count

Open Issues

0

Activity Score

0/100

0 commits in 30d

Created

Jun 15, 2023

Project creation date

AI Dev Skills

Unmapped

Distributed Model Serving, GPU Acceleration, High-Performance Inference Serving, Large Language Model Deployment, Model Compression and Quantization, Model Optimization, Production ML Systems, Transformer Model Optimization

Tags

Distributed Model Serving, GPU Acceleration, High-Performance Inference Serving, Large Language Model Deployment, Model Compression and Quantization, Model Optimization, Production ML Systems, Transformer Model Optimization, Batching, Benchmarking, Caching, DeepSeek, DeepSpeed, Evals, Forked, GPT, GPU / CUDA, Gemma, HuggingFace, Inference, KV Cache, LLM Serving, Large Language Models, Llama, Long Context, Mistral, Multimodal AI, OpenAI, Phi, PyTorch, Python, Quantization, Qwen, Research / Papers, vLLM

Taxonomy

AI Trends

LLM Optimization, Efficient AI Inference, Production AI Systems, Model Compression, Enterprise AI Deployment

Category

Foundation Models, Model Training, Evals & Benchmarking, Inference & Serving, Learning Resources

Deployment Context

Cloud API, Self-hosted, On-premise, GPU Clusters

Industries

Developer Tools, Cloud Infrastructure, AI/ML Platform Services

Modalities

Text

Skill Areas

Large Language Model Deployment, Model Compression and Quantization, High-Performance Inference Serving, Model Optimization, Production ML Systems, GPU Acceleration, Distributed Model Serving, Transformer Model Optimization

Tag

Batching, Benchmarking, Caching, DeepSeek, DeepSpeed, Evals, Forked, GPT, GPU / CUDA, Gemma, HuggingFace, Inference, KV Cache, LLM Serving, Large Language Models, Llama, Long Context, Mistral, Multimodal AI, OpenAI, Phi, PyTorch, Python, Quantization, Qwen, Research / Papers, vLLM

Use Cases

Production LLM Serving, Model Compression for Deployment, High-Throughput Text Generation, Cost-Optimized LLM Inference, Enterprise LLM Deployment

Recent Activity

Updated 2 months ago

7 Days

0

30 Days

0

90 Days

20

fix multiround chat (#4438)

zxy • Mar 22, 2026

160f885

Optimize Qwen3.5 (#4434)

Li Zhang • Mar 21, 2026

764f35a

Make Intern-S1-Pro compatible with Transformers 5.0+ (#4435)

Lyu Han • Mar 19, 2026

09838bf

Quality

Quality: high
Maturity: production

Categories

Evals & Benchmarking (Primary), Inference & Serving, Learning Resources, Foundation Models, Model Training, Multimodal AI, Search & Knowledge, Other AI / ML

PM Skills

Cost & Efficiency, User Experience, Data & Evaluation

Languages

Python 100.0%

Timeline

Project created: Jun 15, 2023
Forked: Mar 22, 2026
Your last push: 2 months ago
Upstream last push: 16 days ago
Tracked since: Mar 22, 2026
