Reporium

sgl-project/sglang

sglang

SGLang is a high-performance serving framework for large language models and multimodal models.

Upstream: sgl-project/sglang

Builder

sgl-project • individual

Stars

28,456

Using upstream star count

Forks

6,223

Using upstream fork count

Open Issues

0

Activity Score

0/100

1406 commits in 30d

Created

Jan 8, 2024

Project creation date

AI Dev Skills

Unmapped

Distributed Systems, GPU Acceleration, High-Performance Computing, Inference Engine Development, Large Language Model Serving, Model Inference Optimization, Model Serving Architecture, Multimodal Model Deployment

Tags

Distributed Systems, GPU Acceleration, High-Performance Computing, Inference Engine Development, Large Language Model Serving, Model Inference Optimization, Model Serving Architecture, Multimodal Model Deployment, Batching, Benchmarking, Caching, DeepSeek, Embeddings, Evals, Forked, GPT, Gemma, Google Cloud, HuggingFace, Image Generation, Inference, LLM Serving, Large Language Models, Llama, LoRA / PEFT, Mistral, Model Optimization, Multimodal AI, Open Source, OpenAI, PyTorch, Quantization, Qwen, Roadmap, SGLang, Speculative Decoding, TensorRT, Tutorial, vLLM

Taxonomy

AI Trends

Large Language Models, Multimodal AI, AI Infrastructure, Model Serving Optimization

Category

Inference & Serving, Foundation Models, RAG & Retrieval, Model Training, Evals & Benchmarking, Generative Media, Cloud & Platforms, Learning Resources

Deployment Context

Cloud API, Self-hosted, On-premise

Industries

Developer Tools, Cloud Infrastructure, AI Platform Services

Modalities

Text, Multimodal

Skill Areas

Large Language Model Serving, Model Inference Optimization, Multimodal Model Deployment, High-Performance Computing, Distributed Systems, GPU Acceleration, Model Serving Architecture, Inference Engine Development

Tag

Batching, Benchmarking, Caching, DeepSeek, Embeddings, Evals, Forked, GPT, Gemma, Google Cloud, HuggingFace, Image Generation, Inference, LLM Serving, Large Language Models, Llama, LoRA / PEFT, Mistral, Model Optimization, Multimodal AI, Open Source, OpenAI, PyTorch, Quantization, Qwen, Roadmap, SGLang, Speculative Decoding, TensorRT, Tutorial, vLLM

Use Cases

LLM API Serving, Multimodal Model Inference, Production AI Model Deployment, High-Throughput Text Generation, Scalable AI Model Hosting

Recent Activity

Updated 2 months ago

7 Days: 309 commits

30 Days: 1406 commits

90 Days: 4137 commits

[Diffusion] Opt qwen-image-edit with fuse_residual_layernorm_scale_shift_gate_select01_kernel (#2039

Xiaoyu Zhang • Mar 13, 2026

e00328d

[RadixTree][7/N Refactor]: Refactor mamba radix tree, release dup kvcache in insert func (#19429)

hzh0425 • Mar 13, 2026

197f807

[HTTP] Fix `/GET` HTTP route when ollama endpoint is not set. (#20494)

Liangsheng Yin • Mar 13, 2026

f605612

Quality

Quality: high
Maturity: beta

Categories

Foundation Models (Primary), RAG & Retrieval, Model Training, Evals & Benchmarking, Inference & Serving, Generative Media, Multimodal AI, Other AI / ML, Cloud & Platforms, Learning Resources

PM Skills

Cost & Efficiency, User Experience, Data & Evaluation, Product Discovery

Languages

Python 100.0%

Timeline

Project created: Jan 8, 2024
Forked: Mar 13, 2026
Your last push: 2 months ago
Upstream last push: 16 days ago
Tracked since: Mar 13, 2026
