Reporium

Dao-AILab/flash-attention

flash-attention

Fast and memory-efficient exact attention

Upstream: Dao-AILab/flash-attention

Builder

Dao-AILab • individual

Stars

23,987

Using upstream star count

Forks

2,779

Using upstream fork count

Open Issues

0

Activity Score

0/100

0 commits in 30d

Created

May 19, 2022

Project creation date

README Summary

FlashAttention: This repository provides the official implementation of FlashAttention and FlashAttention-2 from the following papers.

AI Dev Skills

Unmapped

Algorithm Optimization, Attention Mechanism Design, Attention Mechanism Optimization, Attention Mechanisms, Computational Complexity Analysis, CUDA/GPU Acceleration, CUDA Programming, GPU Kernel Development, GPU Kernel Programming, GPU Optimization, High-Performance Computing, Large Language Model Training, Low-level Performance Tuning, Memory Efficiency in Deep Learning, Memory-Efficient Deep Learning, Mixed Precision Training, Numerical Stability, Numerical Stability in Neural Networks, Transformer Architecture, Transformer Architecture Optimization

Tags

Algorithm Optimization, Attention Mechanism Design, Attention Mechanism Optimization, Attention Mechanisms, Computational Complexity Analysis, CUDA/GPU Acceleration, CUDA Programming, GPU Kernel Development, GPU Kernel Programming, GPU Optimization, High-Performance Computing, Large Language Model Training, Low-level Performance Tuning, Memory Efficiency in Deep Learning, Memory-Efficient Deep Learning, Mixed Precision Training, Numerical Stability, Numerical Stability in Neural Networks, Transformer Architecture, Transformer Architecture Optimization, Benchmarking, Context Engineering, Docker, Embeddings, Evals, Forked, GPU / CUDA, Gemma, HuggingFace, KV Cache, Mistral, OpenAI, PyTorch, Python, Research / Papers, Security, vLLM

Taxonomy

AI Trends

Large Language Models, Efficient AI, Transformer Optimization, Hardware-Aware Algorithm Design, Model Optimization, Scaling Transformer Models, Efficient Transformers, Hardware-aware Algorithm Design, Scaling Language Models, Inference Optimization

Category

Foundation Models, AI Agents, RAG & Retrieval, Model Training, Evals & Benchmarking, Inference & Serving, MLOps & Infrastructure, Dev Tools & Automation, Learning Resources, Security & Safety

Deployment Context

Cloud, On-premise, Self-hosted, Cloud GPU, Edge with GPU acceleration

Modalities

Text, Image, Code

Skill Areas

Transformer Architecture Optimization, Attention Mechanism Design, GPU Kernel Development, CUDA Programming, Memory-Efficient Deep Learning, Computational Complexity Analysis, High-Performance Computing, Algorithm Optimization, Large Language Model Training, Numerical Stability in Neural Networks, Transformer Architecture, Attention Mechanism Optimization, GPU Kernel Programming, CUDA/GPU Acceleration, Mixed Precision Training, Attention Mechanisms, GPU Optimization, Memory Efficiency in Deep Learning, Numerical Stability, Low-level Performance Tuning

Tag

Benchmarking, Context Engineering, Docker, Embeddings, Evals, Forked, GPU / CUDA, Gemma, HuggingFace, KV Cache, Mistral, OpenAI, PyTorch, Python, Research / Papers, Security, vLLM

Use Cases

Efficient training of large language models
Long-context sequence processing
Reducing GPU memory consumption
Accelerating transformer inference
Training with larger batch sizes on constrained hardware
Enabling longer sequence lengths in vision and language tasks
Training Large Language Models
Inference Optimization for Transformers
Long Context Processing
Memory-Constrained Model Training
Efficient Sequence Processing
Accelerating Large Language Model Inference
Reducing Training Time for Transformers
Efficient Fine-tuning of Foundation Models
Long-context Processing
Memory-constrained Deployment

Recent Activity

Updated 2 months ago

7 Days

0

30 Days

0

90 Days

20

[AMD ROCm] Update CK and add RDNA 3/4 support (#2400)

rocking • Mar 26, 2026

5301a35

[Fwd,Sm100] Clean up pipeline creation a bit

Tri Dao • Mar 26, 2026

4fcfdec

Fix edge case when tag has no delta from previous (#2394)

Driss Guessous • Mar 25, 2026

abd9943

Quality

Quality: high
Maturity: production

Categories

RAG & Retrieval (Primary), Evals & Benchmarking, Inference & Serving, MLOps & Infrastructure, Dev Tools & Automation, Learning Resources, Security & Safety, Foundation Models, AI Agents, Model Training, Search & Knowledge, Other AI / ML

PM Skills

Cost & Efficiency, Scale & Reliability, Data & Evaluation, Product Discovery, AI-Native Architecture

Languages

Python 100.0%

Timeline

Project created: May 19, 2022
Forked: Mar 28, 2026
Your last push: 2 months ago
Upstream last push: 17 days ago
Tracked since: Mar 26, 2026

Similar Repos

pgvector cosine similarity · $0
