Dao-AILab/flash-attention
flash-attention
Fast and memory-efficient exact attention
Builder

Dao-AILab • individual
Stars
23,111
Using upstream star count
Forks
2,577
Using upstream fork count
Open Issues
0
Activity Score
0/100
0 commits in 30d
Created
May 19, 2022
Project creation date
README Summary
FlashAttention is a fast and memory-efficient implementation of exact attention that reduces memory usage from quadratic to linear in sequence length. It achieves 2-4x speedup over standard attention while being mathematically equivalent, making it ideal for training long sequence transformers. The repository provides CUDA kernels and Python interfaces for easy integration into existing PyTorch models.
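A minimal usage sketch, assuming the flash-attn package is installed (pip install flash-attn) and a CUDA GPU with fp16/bf16 support is available; the tensor layout (batch, seqlen, nheads, headdim) and the flash_attn_func call follow the package's documented Python interface:

    import torch
    from flash_attn import flash_attn_func

    # FlashAttention kernels require a CUDA device and fp16/bf16 inputs.
    batch, seqlen, nheads, headdim = 2, 1024, 8, 64
    q = torch.randn(batch, seqlen, nheads, headdim, device="cuda", dtype=torch.float16)
    k = torch.randn(batch, seqlen, nheads, headdim, device="cuda", dtype=torch.float16)
    v = torch.randn(batch, seqlen, nheads, headdim, device="cuda", dtype=torch.float16)

    # Exact (not approximate) attention, computed without materializing the
    # full seqlen x seqlen score matrix; causal=True applies autoregressive masking.
    out = flash_attn_func(q, k, v, dropout_p=0.0, causal=True)
    # out has shape (batch, seqlen, nheads, headdim)

In an existing PyTorch attention module, this call can stand in for the usual scaled-dot-product attention step, leaving the surrounding projections unchanged.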
AI Dev Skills
Unmapped
Tags
Taxonomy
AI Trends
Deployment Context
Skill Areas
Use Cases
Recent Activity
Updated 18 days ago
7 Days
0
30 Days
0
90 Days
0
Quality
- Quality: high
- Maturity: production
Categories
PM Skills
Languages
Timeline
- Project created: May 19, 2022
- Forked: Mar 28, 2026
- Your last push: 18 days ago
- Upstream last push: 7 days ago
- Tracked since: Mar 26, 2026
Similar Repos