NVIDIA/cutlass
CUDA Templates and Python DSLs for High-Performance Linear Algebra
Builder: NVIDIA (big-tech)
Stars: 9,810 (upstream count)
Forks: 1,883 (upstream count)
Open issues: 0
Activity score: 0/100 (0 commits in the last 30 days)
Created: Nov 30, 2017
CUTLASS is a collection of abstractions for implementing high-performance matrix-matrix multiplication (GEMM) and related computations at all levels and scales within CUDA. It incorporates strategies for hierarchical decomposition and data movement. CUTLASS decomposes these "moving parts" into reusable, modular software components and abstractions.
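The hierarchical decomposition described above can be illustrated, very loosely, with a plain-Python sketch. This is not CUTLASS code and does not use its API. The output matrix is split into block tiles. Each block tile walks the K dimension in slices that are copied ("staged") into small local tiles, which models data movement into faster memory. Each block tile is then split into smaller inner tiles that compute individual outputs. The names `tiled_gemm` and `naive_gemm` and the default tile sizes are hypothetical choices made for illustration.

```python
def naive_gemm(A, B):
    """Reference C = A @ B using a plain triple loop over lists of lists."""
    M, K, N = len(A), len(B), len(B[0])
    return [[sum(A[i][k] * B[k][j] for k in range(K)) for j in range(N)]
            for i in range(M)]


def tiled_gemm(A, B, block=(4, 4, 4), inner=(2, 2)):
    """Hierarchically tiled GEMM sketch (hypothetical, CPU-only).

    block = (BM, BN, BK): output block tile size and K-slice depth.
    inner = (WM, WN): inner tile size within a block tile.
    """
    M, K = len(A), len(A[0])
    N = len(B[0])
    BM, BN, BK = block
    WM, WN = inner
    C = [[0.0] * N for _ in range(M)]

    # Level 1: iterate over output block tiles.
    for bm in range(0, M, BM):
        for bn in range(0, N, BN):
            # Walk the K dimension in slices of depth BK.
            for bk in range(0, K, BK):
                # Stage (copy) the A and B slices this block tile needs;
                # edge tiles are smaller when sizes do not divide evenly.
                a_tile = [row[bk:bk + BK] for row in A[bm:bm + BM]]
                b_tile = [row[bn:bn + BN] for row in B[bk:bk + BK]]
                rows, cols, depth = len(a_tile), len(b_tile[0]), len(b_tile)

                # Level 2: split the block tile into smaller inner tiles.
                for wm in range(0, rows, WM):
                    for wn in range(0, cols, WN):
                        # Level 3: compute each output element of the inner tile.
                        for i in range(wm, min(wm + WM, rows)):
                            for j in range(wn, min(wn + WN, cols)):
                                acc = 0.0
                                for k in range(depth):
                                    acc += a_tile[i][k] * b_tile[k][j]
                                # Accumulate this K slice's partial sum.
                                C[bm + i][bn + j] += acc
    return C


if __name__ == "__main__":
    print(tiled_gemm([[1, 2], [3, 4]], [[5, 6], [7, 8]]))
```

In CUTLASS itself, these levels correspond roughly to threadblock, warp, and thread or instruction tiles, with staging through shared memory and registers. The sketch reproduces only the loop structure and checks it against a naive triple loop. It makes no attempt to model GPU performance.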
Updated 2 months ago
Commits: 0 in the last 7 days, 0 in the last 30 days, 4 in the last 90 days
docs: Fix float16 documentation in elementwise_add notebook (#2949) (#3047)
Blake Ledden • Mar 12, 2026
Support for Group GEMM in CUTLASS Profiler for Geforce and Spark (#3092)
dePaul Miller • Mar 7, 2026
[fix] Boolean.__dsl_and__ emits arith.andi directly for i1 operands (#3087)
Johnsonms • Mar 5, 2026