NVIDIA/cutlass

cutlass

CUDA Templates and Python DSLs for High-Performance Linear Algebra

Builder

NVIDIA

NVIDIA • big-tech

Stars

10,086

Using upstream star count

Forks

1,964

Using upstream fork count

Open Issues

Activity Score

0/100

0 commits in 30d

Created

Nov 30, 2017

Project creation date

README Summary

CUTLASS is a collection of abstractions for implementing high-performance matrix-matrix multiplication (GEMM) and related computations at all levels and scales within CUDA. It incorporates strategies for hierarchical decomposition and data movement. CUTLASS decomposes these "moving parts" into reusable, modular software components and abstractions.
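The hierarchical decomposition mentioned in the summary can be illustrated with a small CPU-only sketch. This is not CUTLASS code and uses none of its API: the function name `tiled_gemm` and the tile sizes `kTileM`, `kTileN`, `kTileK` are hypothetical, chosen only to show how a GEMM can be split into output tiles, K-slabs, and per-element accumulation. On a GPU, these levels would roughly correspond to threadblock, warp, and thread work, with data staged through shared memory and registers.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical tile sizes for illustration only; not CUTLASS defaults.
constexpr std::size_t kTileM = 4;
constexpr std::size_t kTileN = 4;
constexpr std::size_t kTileK = 8;

// Computes C = A * B for row-major A (M x K) and B (K x N); returns C (M x N).
std::vector<float> tiled_gemm(const std::vector<float>& A,
                              const std::vector<float>& B,
                              std::size_t M, std::size_t N, std::size_t K) {
    assert(A.size() == M * K && B.size() == K * N);
    std::vector<float> C(M * N, 0.0f);

    // Outer level: each (m0, n0) pair selects one output tile of C.
    for (std::size_t m0 = 0; m0 < M; m0 += kTileM) {
        for (std::size_t n0 = 0; n0 < N; n0 += kTileN) {
            const std::size_t m_end = std::min(m0 + kTileM, M);
            const std::size_t n_end = std::min(n0 + kTileN, N);

            // Middle level: walk the K dimension one slab at a time, so only
            // a small piece of A and B is touched per step.
            for (std::size_t k0 = 0; k0 < K; k0 += kTileK) {
                const std::size_t k_end = std::min(k0 + kTileK, K);

                // Inner level: accumulate this slab's partial products into
                // every element of the current output tile.
                for (std::size_t m = m0; m < m_end; ++m) {
                    for (std::size_t n = n0; n < n_end; ++n) {
                        float acc = C[m * N + n];
                        for (std::size_t k = k0; k < k_end; ++k) {
                            acc += A[m * K + k] * B[k * N + n];
                        }
                        C[m * N + n] = acc;
                    }
                }
            }
        }
    }
    return C;
}
```

Calling `tiled_gemm(A, B, M, N, K)` with flat row-major vectors returns the product as a flat row-major vector. The `std::min` clamps handle problem sizes that are not multiples of the tile sizes.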

AI Dev Skills

Unmapped

CUDA Programming, Deep Learning Acceleration, GPU Architecture Understanding, GPU Kernel Optimization, High-Performance Computing, Linear Algebra Operations, Matrix Multiplication Optimization, Memory Hierarchy Optimization, Performance Profiling and Optimization, Tensor Operations

Taxonomy

AI Trends

Hardware-Software Co-optimization, Efficient Deep Learning, GPU Acceleration, High-Performance AI Infrastructure

Recent Activity

Updated 4 months ago

docs: Fix float16 documentation in elementwise_add notebook (#2949) (#3047)

Blake Ledden • Mar 12, 2026

087c84d

Support for Group GEMM in CUTLASS Profiler for Geforce and Spark (#3092)

dePaul Miller • Mar 7, 2026

73c59c0

[fix] Boolean.__dsl_and__ emits arith.andi directly for i1 operands (#3087)

Johnsonms • Mar 5, 2026

e5fcd12

Quality

Quality: high
Maturity: production

PM Skills

Safety & Alignment, Developer Platform

Languages

C++ 100.0%

Timeline

Project created: Nov 30, 2017
Forked: Mar 14, 2026
Your last push: 4 months ago
Upstream last push: 2 months ago
Tracked since: Mar 12, 2026

