deepseek-ai/DeepGEMM
DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling
Builder
DeepSeek
deepseek-ai • ai-lab
Stars
7,342
Forks
1,025
Open Issues
0
Activity Score
0/100
0 commits in 30d
Created
—
DeepGEMM is a unified, high-performance tensor core kernel library that brings the key computation primitives of modern large language models into a single, cohesive CUDA codebase: GEMMs (FP8, FP4, BF16), fused MoE with overlapped communication (Mega MoE), MQA scoring for the lightning indexer, HyperConnection (HC), and more. All kernels are compiled at runtime by a lightweight Just-In-Time (JIT) module, so no CUDA compilation is needed at installation time.
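DeepGEMM's kernels are CUDA, and the snippet below is not its API. It is a minimal plain-Python model of what "fine-grained scaling" means for an FP8 GEMM, assuming one scale per 128-element block along K and an emulated E4M3 format (3 mantissa bits, saturating at 448). For brevity it scales B per column per K block; real kernels may scale weights in larger tiles (e.g. 128x128). The names `round_to_e4m3`, `quantize_blocks`, `fp8_gemm_reference` and `BLOCK` are hypothetical, chosen for illustration.

```python
import math
import random

FP8_E4M3_MAX = 448.0  # largest finite magnitude in the E4M3 format
BLOCK = 128           # assumed scaling granularity along K (hypothetical constant)


def round_to_e4m3(x: float) -> float:
    """Emulate round-to-nearest-even into FP8 E4M3 (3 mantissa bits), saturating at +-448."""
    if x == 0.0:
        return 0.0
    sign = -1.0 if x < 0 else 1.0
    mag = min(abs(x), FP8_E4M3_MAX)
    _, e = math.frexp(mag)          # mag = m * 2**e with 0.5 <= m < 1
    exp = max(e - 1, -6)            # normals stop at 2**-6; below that spacing is fixed (subnormals)
    step = 2.0 ** (exp - 3)         # distance between neighbouring E4M3 values in this range
    q = round(mag / step) * step    # Python's round() breaks ties to even
    return sign * min(q, FP8_E4M3_MAX)


def quantize_blocks(vec):
    """Quantize a vector in BLOCK-wide chunks, one dequantization scale per chunk."""
    blocks, scales = [], []
    for start in range(0, len(vec), BLOCK):
        chunk = vec[start:start + BLOCK]
        amax = max(abs(v) for v in chunk) or 1.0  # avoid dividing by zero on all-zero chunks
        scale = amax / FP8_E4M3_MAX                 # maps the chunk's largest value onto 448
        blocks.append([round_to_e4m3(v / scale) for v in chunk])
        scales.append(scale)
    return blocks, scales


def fp8_gemm_reference(a, b):
    """Return C = A @ B for A (M x K) and B (K x N) given as nested lists of floats."""
    k, n = len(b), len(b[0])
    b_cols = [[b[kk][j] for kk in range(k)] for j in range(n)]  # columns of B, each of length K
    a_q = [quantize_blocks(row) for row in a]
    b_q = [quantize_blocks(col) for col in b_cols]
    c = []
    for a_blocks, a_scales in a_q:
        out_row = []
        for b_blocks, b_scales in b_q:
            acc = 0.0  # higher-precision accumulator (a Python float here)
            for xa, xb, sa, sb in zip(a_blocks, b_blocks, a_scales, b_scales):
                partial = sum(p * q for p, q in zip(xa, xb))  # FP8 x FP8 products over one K block
                acc += partial * sa * sb                       # apply both blocks' scales, then accumulate
            out_row.append(acc)
        c.append(out_row)
    return c


if __name__ == "__main__":
    random.seed(0)
    M, K, N = 2, 256, 3
    a = [[random.uniform(-1, 1) for _ in range(K)] for _ in range(M)]
    b = [[random.uniform(-1, 1) for _ in range(N)] for _ in range(K)]
    approx = fp8_gemm_reference(a, b)
    exact = [[sum(a[i][kk] * b[kk][j] for kk in range(K)) for j in range(N)] for i in range(M)]
    for i in range(M):
        print([round(x - y, 4) for x, y in zip(approx[i], exact[i])])  # per-entry quantization error
```

Because each K block carries its own scale, one outlier only coarsens the resolution of its own 128 values rather than the whole row; the price is an extra rescale per block in the accumulation loop.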
No AI dev skills recorded.
Updated 1 month ago
7 Days
0
30 Days
0
90 Days
2
Quality signals are not available for this repo yet.