Reporium

vllm-project/vllm-omni

vllm-omni

A framework for efficient model inference with omni-modality models

Upstream: vllm-project/vllm-omni

Builder

vLLM

vllm-project • startup

Stars

4,841

Using upstream star count

Forks

1,018

Using upstream fork count

Open Issues

0

Activity Score

0/100

0 commits in 30d

Created

Sep 11, 2025

Project creation date

README Summary

Easy, fast, and cheap omni-modality model serving for everyone

AI Dev Skills

Unmapped

Attention Mechanisms, Batching Strategies, Distributed Computing, GPU Optimization, Large Language Model Inference, Memory Management, Model Serving Architecture, Multimodal AI, PyTorch Model Deployment, Transformer Architecture

Tags

Attention Mechanisms, Batching Strategies, Distributed Computing, GPU Optimization, Large Language Model Inference, Memory Management, Model Serving Architecture, Multimodal AI, PyTorch Model Deployment, Transformer Architecture, AI Agents, AI Safety, Anthropic / Claude, Benchmarking, Claude, Curated List, Forked, GPU / CUDA, HuggingFace, KV Cache, LLM Serving, MLOps, OpenAI, Qwen, Real-Time / Streaming, Research / Papers, Transformers, Tutorial, Video Generation, vLLM

Taxonomy

AI Trends

Multimodal Reasoning, Large Language Models, Model Serving Infrastructure, GPU-Accelerated Inference

Category

Foundation Models, AI Agents, Evals & Benchmarking, Inference & Serving, Generative Media, MLOps & Infrastructure, Learning Resources, Security & Safety

Deployment Context

Self-hosted, Cloud API, On-premise

Industries

Developer Tools, AI/ML Platforms, Cloud Infrastructure

Modalities

Text, Image, Multimodal

Skill Areas

Large Language Model Inference, Multimodal AI, Model Serving Architecture, GPU Optimization, Distributed Computing, PyTorch Model Deployment, Transformer Architecture, Attention Mechanisms, Memory Management, Batching Strategies

Tag

AI Agents, AI Safety, Anthropic / Claude, Benchmarking, Claude, Curated List, Forked, GPU / CUDA, HuggingFace, KV Cache, LLM Serving, MLOps, Multimodal AI, OpenAI, Qwen, Real-Time / Streaming, Research / Papers, Transformers, Tutorial, Video Generation, vLLM

Use Cases

Multimodal Chatbots, Visual Question Answering, Document Analysis with Images, AI Assistant Services, Content Understanding, Cross-modal Search

Recent Activity

Updated 2 months ago

7 Days

0

30 Days

0

90 Days

20

[Bugfix] Restore chunk-waiting requests on OmniNewRequestData rewrap failure (#1691)

Du Bin • Mar 22, 2026

b4a96b0

[CI] Add Flux2 Klein Tests (#2027)

Alex Brooks • Mar 22, 2026

a5574a2

[FP8] enable hunyuan-image-3 diffusion model with fp8 online quant (#1935)

Chendi.Xue • Mar 22, 2026

28aee51

Quality

Quality: medium
Maturity: prototype

Categories

Evals & Benchmarking (Primary), Inference & Serving, MLOps & Infrastructure, Learning Resources, Security & Safety, Foundation Models, AI Agents, Generative Media, ML Platform & Infrastructure, Safety & Alignment, Coding & Dev Tools, Multimodal AI, Search & Knowledge, Other AI / ML

PM Skills

Cost & Efficiency, Safety & Alignment, User Experience, Scale & Reliability, Data & Evaluation, AI-Native Architecture

Languages

Python 100.0%

Timeline

Project created: Sep 11, 2025
Forked: Mar 22, 2026
Your last push: 2 months ago
Upstream last push: 16 days ago
Tracked since: Mar 22, 2026

Similar Repos

pgvector cosine similarity · $0