deepseek-ai/DeepGEMM

DeepGEMM

DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling

Builder

DeepSeek

deepseek-ai • ai-lab

Stars

7,342

Using upstream star count

Forks

1,025

Using upstream fork count

Open Issues

Activity Score

0/100

0 commits in 30d

Created

—

README Summary

DeepGEMM is a unified, high-performance tensor core kernel library that brings together the key computation primitives of modern large language models — GEMMs (FP8, FP4, BF16), fused MoE with overlapped communication (Mega MoE), MQA scoring for the lightning indexer, HyperConnection (HC), and more — into a single, cohesive CUDA codebase. All kernels are compiled at runtime via a lightweight Just-In-Time (JIT) module, requiring no CUDA compilation during installation.
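
The "fine-grained scaling" in the project tagline can be illustrated with a small, library-independent sketch. It is not DeepGEMM's API and does not use real FP8 arithmetic: values are rounded to integers inside the FP8 E4M3 range to emulate quantization, and the function names (`quantize_blocks`, `scaled_dot`, `gemm_nt`) and the block size are hypothetical choices made for this example. The idea shown is that each block along the K dimension carries its own scale factor, and each block's partial sum is rescaled before accumulation.

```python
# Illustrative only: emulates per-block scaling with plain Python numbers.
# Not real FP8, and not DeepGEMM's API; all names here are hypothetical.

QMAX = 448.0  # largest finite magnitude of FP8 E4M3, used here as the quantized range


def quantize_blocks(row, block):
    """Split `row` into chunks of `block` elements; return (quantized values, per-block scales)."""
    q, scales = [], []
    for start in range(0, len(row), block):
        chunk = row[start:start + block]
        amax = max(abs(v) for v in chunk)
        scale = amax / QMAX if amax > 0 else 1.0  # avoid dividing by zero on all-zero blocks
        scales.append(scale)
        q.extend(round(v / scale) for v in chunk)
    return q, scales


def scaled_dot(qa, sa, qb, sb, block):
    """Dot product of two block-quantized vectors, rescaling each block's partial sum."""
    total = 0.0
    for i, (scale_a, scale_b) in enumerate(zip(sa, sb)):
        lo, hi = i * block, (i + 1) * block
        partial = sum(x * y for x, y in zip(qa[lo:hi], qb[lo:hi]))
        total += partial * scale_a * scale_b
    return total


def gemm_nt(a_rows, b_rows, block):
    """Compute C = A @ B^T, with A and B both given as lists of rows of length K."""
    qa = [quantize_blocks(r, block) for r in a_rows]
    qb = [quantize_blocks(r, block) for r in b_rows]
    return [[scaled_dot(a, sa, b, sb, block) for (b, sb) in qb] for (a, sa) in qa]


if __name__ == "__main__":
    a = [[1.0, 2.0, 300.0, 400.0]]   # small and large values in different blocks
    b = [[0.5, 0.25, 2.0, 1.0]]
    print(gemm_nt(a, b, block=2))
```

Because the small values (1.0, 2.0) and the large values (300.0, 400.0) fall into different blocks, each block uses the full quantized range on its own terms. With a single scale for the whole row, the small values would be rounded much more coarsely.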

AI Dev Skills

No AI dev skills recorded.

Recent Activity

Updated 1 month ago

[Public release 26/04] Introducing Mega MoE, FP4 Indexer and other features/fixes (#304)

Chenggang Zhao • Apr 17, 2026

7f2a703

Fix sync issue of TMEM alloc/dealloc (#292)

Ray Wang • Mar 22, 2026

d30fc36

fix: k_grouped_fp8_gemm_nt_contiguous crashes with n = 768 on H100 (#238)

Xin Qiu • Feb 25, 2026

35c4bc8

Quality

Quality signals are not available for this repo yet.

PM Skills

Safety & Alignment • Data & Evaluation

Languages

Cuda 100.0%

Timeline

Project created: —
Forked: Apr 21, 2026
Your last push: 1 month ago
Upstream last push: 25 days ago
Tracked since: Apr 17, 2026

