NVIDIA/cutlass
cutlass
CUDA Templates and Python DSLs for High-Performance Linear Algebra
Builder

NVIDIA • big-tech
Stars
9,525
Using upstream star count
Forks
1,768
Using upstream fork count
Open Issues
0
Activity Score
0/100
0 commits in 30d
Created
Nov 30, 2017
README Summary
CUTLASS is a collection of CUDA C++ template abstractions for implementing high-performance matrix multiplication (GEMM) and related computations at all levels and scales within CUDA kernels. Its modular, composable API separates concerns so that it applies to a broad range of programs and provides a programming model for specializing and tuning CUDA kernels. The library also includes Python DSLs for code generation and supports mixed-precision computation optimized for modern GPU architectures.
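For readers unfamiliar with the operation the summary refers to, here is a minimal host-side sketch of GEMM in its standard form, D = alpha * (A x B) + beta * C, with separate input and accumulator types to illustrate what "mixed precision" means. This is plain C++ written for illustration only: `reference_gemm`, `ElementIn`, and `ElementAcc` are hypothetical names, the code uses no CUTLASS API, and it says nothing about how CUTLASS structures its kernels.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Reference GEMM on the CPU: D = alpha * (A x B) + beta * C, all row-major.
// ElementIn is the storage type of A and B; ElementAcc is the (typically
// wider) type used for accumulation and for C and D.
// Hypothetical illustration only; this is not CUTLASS code.
template <typename ElementIn, typename ElementAcc>
std::vector<ElementAcc> reference_gemm(
    std::size_t M, std::size_t N, std::size_t K,
    ElementAcc alpha,
    const std::vector<ElementIn>& A,   // M x K
    const std::vector<ElementIn>& B,   // K x N
    ElementAcc beta,
    const std::vector<ElementAcc>& C)  // M x N
{
    // Guard against mismatched shapes before indexing.
    assert(A.size() == M * K);
    assert(B.size() == K * N);
    assert(C.size() == M * N);

    std::vector<ElementAcc> D(M * N);
    for (std::size_t i = 0; i < M; ++i) {
        for (std::size_t j = 0; j < N; ++j) {
            // Accumulate the dot product in the wider type.
            ElementAcc acc = ElementAcc(0);
            for (std::size_t k = 0; k < K; ++k) {
                acc += static_cast<ElementAcc>(A[i * K + k]) *
                       static_cast<ElementAcc>(B[k * N + j]);
            }
            // Epilogue: scale the product and blend in the source matrix C.
            D[i * N + j] = alpha * acc + beta * C[i * N + j];
        }
    }
    return D;
}
```

Calling `reference_gemm<float, double>(2, 2, 2, 1.0, A, B, 0.5, C)` multiplies two 2x2 `float` matrices while accumulating in `double`. A GPU library performs the same arithmetic but tiles the loops across thread blocks, warps, and threads, which is the part this sketch deliberately leaves out.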
AI Dev Skills
Unmapped
Recent Activity
Updated 1 month ago
7 Days
0
30 Days
0
90 Days
0
Quality
- Quality: high
- Maturity: production
Timeline
- Project created: Nov 30, 2017
- Forked: Mar 14, 2026
- Your last push: 1 month ago
- Upstream last push: 11 days ago
- Tracked since: Mar 12, 2026