Reporium
GraphWikiTaxonomyStacksInsightsTrendsArchitectureAI-NativeFAQ
Ask anything about the repo library…
Loading repo…
←Library/inference
Library/inferenceForked

xorbitsai/inference

inference

Swap GPT for any LLM by changing a single line of code. Xinference lets you run open-source, speech, and multimodal models on cloud, on-prem, or your laptop — all through one unified, production-ready inference API.

View on GitHub↗Upstream xorbitsai/inference↗

Builder

xorbitsai

xorbitsai

xorbitsai • individual

Stars

9,318

Using upstream star count

Forks

827

Using upstream fork count

Open Issues

0

Activity Score

0/100

0 commits in 30d

Created

Jun 14, 2023

Project creation date

README Summary

<div align="center"> <img src="./assets/xorbits-logo.png" width="180px" alt="xorbits" />

Community Evaluation

Loading…

AI Dev Skills

Unmapped

API Gateway DesignDistributed Model InferenceLarge Language Model DeploymentModel Serving InfrastructureMultimodal AI SystemsMulti-Model OrchestrationProduction MLOpsSpeech-to-Text Integration

Tags

API Gateway DesignDistributed Model InferenceLarge Language Model DeploymentModel Serving InfrastructureMultimodal AI SystemsMulti-Model OrchestrationProduction MLOpsSpeech-to-Text IntegrationBatchingC++Data ScienceDockerEmbeddingsForkedGPU / CUDAHuggingFaceJupyterKV CacheKubernetesLLM ServingLangChainLarge Language ModelsLlamaIndexMCPMLOpsModel OptimizationMultimodal AIOpenAIPlanning / CoTPythonQuantizationQwenRAGSpeech to TextTensorRTTool UseTutorialllama.cppvLLM

Taxonomy

AI Trends

Model InteroperabilityOpen Source LLMsUnified AI InterfacesOn-premise AIHybrid Cloud AI

category

Foundation ModelsAI AgentsRAG & RetrievalInference & ServingGenerative MediaMLOps & InfrastructureLearning ResourcesData Science & Analytics

Deployment Context

Cloud APISelf-hostedOn-premiseLocal Development

Industries

Developer Tools

Modalities

TextAudioMultimodal

Skill Areas

Large Language Model DeploymentModel Serving InfrastructureAPI Gateway DesignMulti-Model OrchestrationSpeech-to-Text IntegrationMultimodal AI SystemsDistributed Model InferenceProduction MLOps

tag

BatchingC++Data ScienceDockerEmbeddingsForkedGPU / CUDAHuggingFaceJupyterKV CacheKubernetesLLM ServingLangChainLarge Language ModelsLlamaIndexMCPMLOpsModel OptimizationMultimodal AIOpenAIPlanning / CoTPythonQuantizationQwenRAGSpeech to TextTensorRTTool UseTutorialllama.cppvLLM

Use Cases

LLM Provider AbstractionModel A/B TestingMulti-Model ApplicationsSpeech Processing WorkflowsCross-Platform AI DeploymentProduction LLM Serving

Recent Activity

Updated 2 months ago

7 Days

0

30 Days

0

90 Days

20

ENH: update models JSON [llm] (#4710)

XprobeBot • Mar 21, 2026

8b97828

fix(qwen3.5): support tool calls (#4709)

llyycchhee • Mar 21, 2026

1e55151

ENH: update model "qwen3.5" JSON (#4707)

llyycchhee • Mar 21, 2026

a6b1345

Quality

production
Quality
high
Maturity
production

Categories

RAG & RetrievalPrimaryInference & ServingMLOps & InfrastructureLearning ResourcesData Science & AnalyticsFoundation ModelsAI AgentsGenerative MediaML Platform & InfrastructureMultimodal AISearch & KnowledgeOther AI / ML

PM Skills

Cost & EfficiencyUser ExperienceScale & ReliabilityData & EvaluationProduct DiscoveryDeveloper PlatformAI-Native Architecture

Languages

Python100.0%

Timeline

Project created
Jun 14, 2023
Forked
Mar 22, 2026
Your last push
2 months ago
Upstream last push
16 days ago
Tracked since
Mar 21, 2026

Similar Repos

pgvector cosine similarity · $0

Loading…