Pre-built flash-attention wheels for Linux and Windows, built with GitHub Actions

flash-attention pre-built wheels

This repository provides pre-built wheels for flash-attention.

Since building flash-attention takes a very long time and is resource-intensive, I also build and provide wheels for combinations of CUDA and PyTorch that are not officially distributed.

The GitHub Actions workflow used for building can be found here.
The built packages are available on the release page.

This repository uses a self-hosted runner and AWS CodeBuild for building the wheels. If you find this project helpful, please consider sponsoring to help maintain the infrastructure!

Install

  1. Select the versions of Python, CUDA, PyTorch, and flash_attn.

flash_attn-[flash_attn Version]+cu[CUDA Version]torch[PyTorch Version]-cp[Python Version]-cp[Python Version]-linux_x86_64.whl

# Example: Python 3.12, CUDA 12.4, PyTorch 2.5, and flash_attn 2.6.3
flash_attn-2.6.3+cu124torch2.5-cp312-cp312-linux_x86_64.whl

  2. Find the corresponding wheel in the Packages section below and on the releases page.

  3. Install directly from the URL, or download the wheel and install it locally.

# Direct Install
pip install https://github.com/mjun0812/flash-attention-prebuild-wheels/releases/download/v0.0.0/flash_attn-2.6.3+cu124torch2.5-cp312-cp312-linux_x86_64.whl

# Download and Local Install
wget https://github.com/mjun0812/flash-attention-prebuild-wheels/releases/download/v0.0.0/flash_attn-2.6.3+cu124torch2.5-cp312-cp312-linux_x86_64.whl
pip install ./flash_attn-2.6.3+cu124torch2.5-cp312-cp312-linux_x86_64.whl
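The steps above can be sketched as a small script that derives the wheel filename from your local toolchain. This is a minimal sketch: the flash_attn, CUDA, and PyTorch values below are placeholders you should replace with a combination listed in the Packages section, and the torch check is guarded in case PyTorch is not installed yet.

```shell
# Compute the CPython tag (e.g. cp312) used in the wheel filename.
PY_TAG="cp$(python3 -c 'import sys; print(f"{sys.version_info.major}{sys.version_info.minor}")')"

# Show the installed PyTorch and CUDA versions, if PyTorch is available.
python3 -c "import torch; print('torch', torch.__version__, 'cuda', torch.version.cuda)" \
  2>/dev/null || echo "PyTorch not installed yet"

# Placeholder versions: replace with a combination from the Packages section.
FLASH_ATTN=2.6.3 CUDA_TAG=cu124 TORCH_TAG=torch2.5

# Assemble the wheel filename following the naming scheme above.
WHEEL="flash_attn-${FLASH_ATTN}+${CUDA_TAG}${TORCH_TAG}-${PY_TAG}-${PY_TAG}-linux_x86_64.whl"
echo "${WHEEL}"
```

The printed filename can then be appended to the release download URL and passed to pip install.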

Self build

If you cannot find the version you are looking for, you can fork this repository and create a wheel on GitHub Actions.

  1. Fork this repository.
  2. Edit the workflow file .github/workflows/build.yml to set the versions you want to build.
  3. Push a tag matching v*.*.* to trigger the build workflow.

Note that some version combinations may fail to build.
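The version combinations are typically declared as a GitHub Actions build matrix. A hypothetical sketch of what such a matrix could look like; the actual key names and structure in this repository's .github/workflows/build.yml may differ:

```yaml
# Hypothetical excerpt: check the real build.yml for the exact key names.
jobs:
  build:
    strategy:
      matrix:
        python-version: ["3.10", "3.11", "3.12"]
        torch-version: ["2.5.1"]
        cuda-version: ["12.4.1"]
        flash-attn-version: ["2.6.3"]
```

Each matrix entry produces one wheel, so adding a version to any list multiplies the number of build jobs.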

Self-Hosted Runner Build

Some version combinations cannot be built on GitHub-hosted runners due to job time limits. To build wheels for these versions, you can use a self-hosted runner.

git clone https://github.com/mjun0812/flash-attention-prebuild-wheels.git
cd flash-attention-prebuild-wheels/self-hosted-runner
cp env.template env

Edit the env file to set the required environment variables.

# Edit env
PERSONAL_ACCESS_TOKEN=[GitHub Personal Access Token]

Edit the compose.yml file if you are using a fork of this repository.

services:
  runner:
    privileged: true
    build:
      context: .
      dockerfile: Dockerfile
      args:
        REPOSITORY_URL: [Target Repository URL]
        PERSONAL_ACCESS_TOKEN: $PERSONAL_ACCESS_TOKEN
        GH_RUNNER_VERSION: 2.324.0
        RUNNER_NAME: self-hosted-runner
        RUNNER_GROUP: default
        RUNNER_LABELS: self-hosted
        TARGET_ARCH: x64

Then build and run the Docker container.

# Build and run
docker compose build
docker compose up -d

Packages

Linux x86_64

Flash-Attention 2.8.2

Packages for Flash-Attention 2.8.2
Python PyTorch CUDA package
3.12 2.8.0 12.9.1 Release1
3.12 2.8.0 12.8.1 Release1
3.12 2.8.0 12.4.1 Release1
3.12 2.7.1 12.9.1 Release1
3.12 2.7.1 12.8.1 Release1
3.12 2.7.1 12.4.1 Release1
3.12 2.6.0 12.9.1 Release1
3.12 2.6.0 12.8.1 Release1
3.12 2.6.0 12.4.1 Release1
3.12 2.5.1 12.9.1 Release1
3.12 2.5.1 12.8.1 Release1
3.12 2.5.1 12.4.1 Release1
3.11 2.8.0 12.9.1 Release1
3.11 2.8.0 12.8.1 Release1
3.11 2.8.0 12.4.1 Release1
3.11 2.7.1 12.9.1 Release1
3.11 2.7.1 12.8.1 Release1
3.11 2.7.1 12.4.1 Release1
3.11 2.6.0 12.9.1 Release1
3.11 2.6.0 12.8.1 Release1
3.11 2.6.0 12.4.1 Release1
3.11 2.5.1 12.9.1 Release1
3.11 2.5.1 12.8.1 Release1
3.11 2.5.1 12.4.1 Release1
3.10 2.8.0 12.9.1 Release1
3.10 2.8.0 12.8.1 Release1
3.10 2.8.0 12.4.1 Release1
3.10 2.7.1 12.9.1 Release1
3.10 2.7.1 12.8.1 Release1
3.10 2.7.1 12.4.1 Release1
3.10 2.6.0 12.9.1 Release1
3.10 2.6.0 12.8.1 Release1
3.10 2.6.0 12.4.1 Release1
3.10 2.5.1 12.9.1 Release1
3.10 2.5.1 12.8.1 Release1
3.10 2.5.1 12.4.1 Release1

Flash-Attention 2.8.1

Packages for Flash-Attention 2.8.1
Python PyTorch CUDA package
3.12 2.7.1 12.8.1 Release1
3.12 2.6.0 12.8.1 Release1
3.12 2.5.1 12.8.1 Release1
3.12 2.4.1 12.8.1 Release1
3.11 2.7.1 12.8.1 Release1
3.11 2.6.0 12.8.1 Release1
3.11 2.5.1 12.8.1 Release1
3.11 2.4.1 12.8.1 Release1
3.10 2.7.1 12.8.1 Release1
3.10 2.6.0 12.8.1 Release1
3.10 2.5.1 12.8.1 Release1
3.10 2.4.1 12.8.1 Release1

Flash-Attention 2.8.0

Packages for Flash-Attention 2.8.0
Python PyTorch CUDA package
3.12 2.7.1 12.8.1 Release1
3.12 2.7.1 12.4.1 Release1
3.12 2.6.0 12.8.1 Release1
3.12 2.6.0 12.4.1 Release1
3.12 2.5.1 12.8.1 Release1
3.12 2.5.1 12.4.1 Release1
3.12 2.4.1 12.8.1 Release1
3.12 2.4.1 12.4.1 Release1
3.11 2.7.1 12.8.1 Release1
3.11 2.7.1 12.4.1 Release1
3.11 2.6.0 12.8.1 Release1
3.11 2.6.0 12.4.1 Release1
3.11 2.5.1 12.8.1 Release1
3.11 2.5.1 12.4.1 Release1
3.11 2.4.1 12.8.1 Release1
3.11 2.4.1 12.4.1 Release1
3.10 2.7.1 12.8.1 Release1
3.10 2.7.1 12.4.1 Release1
3.10 2.6.0 12.8.1 Release1
3.10 2.6.0 12.4.1 Release1
3.10 2.5.1 12.8.1 Release1
3.10 2.5.1 12.4.1 Release1
3.10 2.4.1 12.8.1 Release1
3.10 2.4.1 12.4.1 Release1

Flash-Attention 2.7.4

Packages for Flash-Attention 2.7.4
Python PyTorch CUDA package
3.12 2.8 12.9 Release1
3.12 2.8 12.8 Release1
3.12 2.8 12.4 Release1
3.12 2.8.0.dev20250523 12.8.1 Release1
3.12 2.7 12.9 Release1
3.12 2.7 12.8 Release1
3.12 2.7 12.4 Release1
3.12 2.7.1 12.8.1 Release1
3.12 2.7.0 12.8.1 Release1
3.12 2.6 12.9 Release1
3.12 2.6 12.8 Release1
3.12 2.6 12.4 Release1
3.12 2.5 12.9 Release1
3.12 2.5 12.8 Release1
3.12 2.5 12.4 Release1
3.11 2.8 12.9 Release1
3.11 2.8 12.8 Release1
3.11 2.8 12.4 Release1
3.11 2.8.0.dev20250523 12.8.1 Release1
3.11 2.7 12.9 Release1
3.11 2.7 12.8 Release1
3.11 2.7 12.4 Release1
3.11 2.7.1 12.8.1 Release1
3.11 2.7.0 12.8.1 Release1
3.11 2.6 12.9 Release1
3.11 2.6 12.8 Release1
3.11 2.6 12.4 Release1
3.11 2.5 12.9 Release1
3.11 2.5 12.8 Release1
3.11 2.5 12.4 Release1
3.10 2.8 12.9 Release1
3.10 2.8 12.8 Release1
3.10 2.8 12.4 Release1
3.10 2.8.0.dev20250523 12.8.1 Release1
3.10 2.7 12.9 Release1
3.10 2.7 12.8 Release1
3.10 2.7 12.4 Release1
3.10 2.7.1 12.8.1 Release1
3.10 2.7.0 12.8.1 Release1
3.10 2.6 12.9 Release1
3.10 2.6 12.8 Release1
3.10 2.6 12.4 Release1
3.10 2.5 12.9 Release1
3.10 2.5 12.8 Release1
3.10 2.5 12.4 Release1

Flash-Attention 2.7.4.post1

Packages for Flash-Attention 2.7.4.post1
Python PyTorch CUDA package
3.12 2.7.0 12.6.3 Release1
3.12 2.7.0 12.4.1 Release1
3.12 2.7.0 11.8.0 Release1
3.12 2.6.0 12.6.3 Release1, Release2, Release3
3.12 2.6.0 12.4.1 Release1, Release2, Release3
3.12 2.6.0 11.8.0 Release1
3.12 2.5.1 12.6.3 Release1, Release2, Release3
3.12 2.5.1 12.4.1 Release1, Release2, Release3
3.12 2.5.1 11.8.0 Release1
3.12 2.4.1 12.6.3 Release1, Release2, Release3
3.12 2.4.1 12.4.1 Release1, Release2, Release3
3.12 2.4.1 11.8.0 Release1
3.12 2.3.1 12.6.3 Release1, Release2
3.12 2.3.1 12.4.1 Release1, Release2
3.12 2.2.2 12.6.3 Release1, Release2
3.12 2.2.2 12.4.1 Release1, Release2
3.12 2.1.2 12.6.3 Release1
3.12 2.1.2 12.4.1 Release1
3.12 2.0.1 12.6.3 Release1
3.12 2.0.1 12.4.1 Release1
3.11 2.7.0 12.6.3 Release1
3.11 2.7.0 12.4.1 Release1
3.11 2.7.0 11.8.0 Release1
3.11 2.6.0 12.6.3 Release1, Release2, Release3
3.11 2.6.0 12.4.1 Release1, Release2, Release3
3.11 2.6.0 11.8.0 Release1
3.11 2.5.1 12.6.3 Release1, Release2, Release3
3.11 2.5.1 12.4.1 Release1, Release2, Release3
3.11 2.5.1 11.8.0 Release1
3.11 2.4.1 12.6.3 Release1, Release2, Release3
3.11 2.4.1 12.4.1 Release1, Release2, Release3
3.11 2.4.1 11.8.0 Release1
3.11 2.3.1 12.6.3 Release1, Release2
3.11 2.3.1 12.4.1 Release1, Release2
3.11 2.2.2 12.6.3 Release1, Release2
3.11 2.2.2 12.4.1 Release1, Release2
3.11 2.1.2 12.6.3 Release1
3.11 2.1.2 12.4.1 Release1
3.11 2.0.1 12.6.3 Release1
3.11 2.0.1 12.4.1 Release1
3.10 2.7.0 12.6.3 Release1
3.10 2.7.0 12.4.1 Release1
3.10 2.7.0 11.8.0 Release1
3.10 2.6.0 12.6.3 Release1, Release2, Release3
3.10 2.6.0 12.4.1 Release1, Release2, Release3
3.10 2.6.0 11.8.0 Release1
3.10 2.5.1 12.6.3 Release1, Release2, Release3
3.10 2.5.1 12.4.1 Release1, Release2, Release3
3.10 2.5.1 11.8.0 Release1
3.10 2.4.1 12.6.3 Release1, Release2, Release3
3.10 2.4.1 12.4.1 Release1, Release2, Release3
3.10 2.4.1 11.8.0 Release1
3.10 2.3.1 12.6.3 Release1, Release2
3.10 2.3.1 12.4.1 Release1, Release2
3.10 2.2.2 12.6.3 Release1, Release2
3.10 2.2.2 12.4.1 Release1, Release2
3.10 2.1.2 12.6.3 Release1
3.10 2.1.2 12.4.1 Release1
3.10 2.0.1 12.6.3 Release1
3.10 2.0.1 12.4.1 Release1

Flash-Attention 2.7.3

Packages for Flash-Attention 2.7.3
Python PyTorch CUDA package
3.12 2.5.1 12.4.1 Release1
3.12 2.5.1 12.1.1 Release1
3.12 2.5.1 11.8.0 Release1
3.12 2.4.1 12.4.1 Release1
3.12 2.4.1 12.1.1 Release1
3.12 2.4.1 11.8.0 Release1
3.12 2.3.1 12.4.1 Release1
3.12 2.3.1 12.1.1 Release1
3.12 2.3.1 11.8.0 Release1
3.12 2.2.2 12.4.1 Release1
3.12 2.2.2 12.1.1 Release1
3.12 2.2.2 11.8.0 Release1
3.12 2.1.2 12.4.1 Release1
3.12 2.1.2 12.1.1 Release1
3.12 2.1.2 11.8.0 Release1
3.12 2.0.1 12.4.1 Release1
3.12 2.0.1 12.1.1 Release1
3.12 2.0.1 11.8.0 Release1
3.11 2.5.1 12.4.1 Release1
3.11 2.5.1 12.1.1 Release1
3.11 2.5.1 11.8.0 Release1
3.11 2.4.1 12.4.1 Release1
3.11 2.4.1 12.1.1 Release1
3.11 2.4.1 11.8.0 Release1
3.11 2.3.1 12.4.1 Release1
3.11 2.3.1 12.1.1 Release1
3.11 2.3.1 11.8.0 Release1
3.11 2.2.2 12.4.1 Release1
3.11 2.2.2 12.1.1 Release1
3.11 2.2.2 11.8.0 Release1
3.11 2.1.2 12.4.1 Release1
3.11 2.1.2 12.1.1 Release1
3.11 2.1.2 11.8.0 Release1
3.11 2.0.1 12.4.1 Release1
3.11 2.0.1 12.1.1 Release1
3.11 2.0.1 11.8.0 Release1
3.10 2.5.1 12.4.1 Release1
3.10 2.5.1 12.1.1 Release1
3.10 2.5.1 11.8.0 Release1
3.10 2.4.1 12.4.1 Release1
3.10 2.4.1 12.1.1 Release1
3.10 2.4.1 11.8.0 Release1
3.10 2.3.1 12.4.1 Release1
3.10 2.3.1 12.1.1 Release1
3.10 2.3.1 11.8.0 Release1
3.10 2.2.2 12.4.1 Release1
3.10 2.2.2 12.1.1 Release1
3.10 2.2.2 11.8.0 Release1
3.10 2.1.2 12.4.1 Release1
3.10 2.1.2 12.1.1 Release1
3.10 2.1.2 11.8.0 Release1
3.10 2.0.1 12.4.1 Release1
3.10 2.0.1 12.1.1 Release1
3.10 2.0.1 11.8.0 Release1

Flash-Attention 2.7.2.post1

Packages for Flash-Attention 2.7.2.post1
Python PyTorch CUDA package
3.12 2.5.1 12.4.1 Release1
3.12 2.5.1 12.1.1 Release1
3.12 2.5.1 11.8.0 Release1
3.12 2.4.1 12.4.1 Release1
3.12 2.4.1 12.1.1 Release1
3.12 2.4.1 11.8.0 Release1
3.12 2.3.1 12.4.1 Release1
3.12 2.3.1 12.1.1 Release1
3.12 2.3.1 11.8.0 Release1
3.12 2.2.2 12.4.1 Release1
3.12 2.2.2 12.1.1 Release1
3.12 2.2.2 11.8.0 Release1
3.12 2.1.2 12.4.1 Release1
3.12 2.1.2 12.1.1 Release1
3.12 2.1.2 11.8.0 Release1
3.12 2.0.1 12.4.1 Release1
3.12 2.0.1 12.1.1 Release1
3.12 2.0.1 11.8.0 Release1
3.11 2.5.1 12.4.1 Release1
3.11 2.5.1 12.1.1 Release1
3.11 2.5.1 11.8.0 Release1
3.11 2.4.1 12.4.1 Release1
3.11 2.4.1 12.1.1 Release1
3.11 2.4.1 11.8.0 Release1
3.11 2.3.1 12.4.1 Release1
3.11 2.3.1 12.1.1 Release1
3.11 2.3.1 11.8.0 Release1
3.11 2.2.2 12.4.1 Release1
3.11 2.2.2 12.1.1 Release1
3.11 2.2.2 11.8.0 Release1
3.11 2.1.2 12.4.1 Release1
3.11 2.1.2 12.1.1 Release1
3.11 2.1.2 11.8.0 Release1
3.11 2.0.1 12.4.1 Release1
3.11 2.0.1 12.1.1 Release1
3.11 2.0.1 11.8.0 Release1
3.10 2.5.1 12.4.1 Release1
3.10 2.5.1 12.1.1 Release1
3.10 2.5.1 11.8.0 Release1
3.10 2.4.1 12.4.1 Release1
3.10 2.4.1 12.1.1 Release1
3.10 2.4.1 11.8.0 Release1
3.10 2.3.1 12.4.1 Release1
3.10 2.3.1 12.1.1 Release1
3.10 2.3.1 11.8.0 Release1
3.10 2.2.2 12.4.1 Release1
3.10 2.2.2 12.1.1 Release1
3.10 2.2.2 11.8.0 Release1
3.10 2.1.2 12.4.1 Release1
3.10 2.1.2 12.1.1 Release1
3.10 2.1.2 11.8.0 Release1
3.10 2.0.1 12.4.1 Release1
3.10 2.0.1 12.1.1 Release1
3.10 2.0.1 11.8.0 Release1

Flash-Attention 2.7.0.post2

Packages for Flash-Attention 2.7.0.post2
Python PyTorch CUDA package
3.12 2.5.1 12.4.1 Release1
3.12 2.5.1 12.1.1 Release1
3.12 2.5.1 11.8.0 Release1
3.12 2.4.1 12.4.1 Release1
3.12 2.4.1 12.1.1 Release1
3.12 2.4.1 11.8.0 Release1
3.12 2.3.1 12.4.1 Release1
3.12 2.3.1 12.1.1 Release1
3.12 2.3.1 11.8.0 Release1
3.12 2.2.2 12.4.1 Release1
3.12 2.2.2 12.1.1 Release1
3.12 2.2.2 11.8.0 Release1
3.12 2.1.2 12.4.1 Release1
3.12 2.1.2 12.1.1 Release1
3.12 2.1.2 11.8.0 Release1
3.12 2.0.1 12.4.1 Release1
3.12 2.0.1 12.1.1 Release1
3.12 2.0.1 11.8.0 Release1
3.11 2.5.1 12.4.1 Release1
3.11 2.5.1 12.1.1 Release1
3.11 2.5.1 11.8.0 Release1
3.11 2.4.1 12.4.1 Release1
3.11 2.4.1 12.1.1 Release1
3.11 2.4.1 11.8.0 Release1
3.11 2.3.1 12.4.1 Release1
3.11 2.3.1 12.1.1 Release1
3.11 2.3.1 11.8.0 Release1
3.11 2.2.2 12.4.1 Release1
3.11 2.2.2 12.1.1 Release1
3.11 2.2.2 11.8.0 Release1
3.11 2.1.2 12.4.1 Release1
3.11 2.1.2 12.1.1 Release1
3.11 2.1.2 11.8.0 Release1
3.11 2.0.1 12.4.1 Release1
3.11 2.0.1 12.1.1 Release1
3.11 2.0.1 11.8.0 Release1
3.10 2.5.1 12.4.1 Release1
3.10 2.5.1 12.1.1 Release1
3.10 2.5.1 11.8.0 Release1
3.10 2.4.1 12.4.1 Release1
3.10 2.4.1 12.1.1 Release1
3.10 2.4.1 11.8.0 Release1
3.10 2.3.1 12.4.1 Release1
3.10 2.3.1 12.1.1 Release1
3.10 2.3.1 11.8.0 Release1
3.10 2.2.2 12.4.1 Release1
3.10 2.2.2 12.1.1 Release1
3.10 2.2.2 11.8.0 Release1
3.10 2.1.2 12.4.1 Release1
3.10 2.1.2 12.1.1 Release1
3.10 2.1.2 11.8.0 Release1
3.10 2.0.1 12.4.1 Release1
3.10 2.0.1 12.1.1 Release1
3.10 2.0.1 11.8.0 Release1

Flash-Attention 2.6.3

Packages for Flash-Attention 2.6.3
Python PyTorch CUDA package
3.12 2.8.0 12.9.1 Release1
3.12 2.8.0 12.8.1 Release1
3.12 2.8.0 12.4.1 Release1
3.12 2.8.0.dev20250523 12.8.1 Release1, Release2
3.12 2.7.1 12.9.1 Release1
3.12 2.7.1 12.8.1 Release1, Release2
3.12 2.7.1 12.4.1 Release1
3.12 2.7.0 12.8.1 Release1, Release2
3.12 2.7.0 12.6.3 Release1
3.12 2.7.0 12.4.1 Release1
3.12 2.7.0 11.8.0 Release1
3.12 2.6.0 12.9.1 Release1
3.12 2.6.0 12.8.1 Release1
3.12 2.6.0 12.6.3 Release1, Release2, Release3
3.12 2.6.0 12.4.1 Release1, Release2, Release3, Release4
3.12 2.6.0 11.8.0 Release1
3.12 2.5.1 12.9.1 Release1
3.12 2.5.1 12.8.1 Release1
3.12 2.5.1 12.6.3 Release1, Release2, Release3
3.12 2.5.1 12.4.1 Release1, Release2, Release3, Release4, Release5
3.12 2.5.1 12.1.1 Release1
3.12 2.5.1 11.8.0 Release1, Release2
3.12 2.4.1 12.6.3 Release1, Release2, Release3
3.12 2.4.1 12.4.1 Release1, Release2, Release3, Release4
3.12 2.4.1 12.1.1 Release1
3.12 2.4.1 11.8.0 Release1, Release2
3.12 2.3.1 12.6.3 Release1, Release2
3.12 2.3.1 12.4.1 Release1, Release2, Release3
3.12 2.3.1 12.1.1 Release1
3.12 2.3.1 11.8.0 Release1
3.12 2.2.2 12.6.3 Release1, Release2
3.12 2.2.2 12.4.1 Release1, Release2, Release3
3.12 2.2.2 12.1.1 Release1
3.12 2.2.2 11.8.0 Release1
3.12 2.1.2 12.6.3 Release1
3.12 2.1.2 12.4.1 Release1, Release2
3.12 2.1.2 12.1.1 Release1
3.12 2.1.2 11.8.0 Release1
3.12 2.0.1 12.6.3 Release1
3.12 2.0.1 12.4.1 Release1, Release2
3.12 2.0.1 12.1.1 Release1
3.12 2.0.1 11.8.0 Release1
3.11 2.8.0 12.9.1 Release1
3.11 2.8.0 12.8.1 Release1
3.11 2.8.0 12.4.1 Release1
3.11 2.8.0.dev20250523 12.8.1 Release1, Release2
3.11 2.7.1 12.9.1 Release1
3.11 2.7.1 12.8.1 Release1, Release2
3.11 2.7.1 12.4.1 Release1
3.11 2.7.0 12.8.1 Release1, Release2
3.11 2.7.0 12.6.3 Release1
3.11 2.7.0 12.4.1 Release1
3.11 2.7.0 11.8.0 Release1
3.11 2.6.0 12.9.1 Release1
3.11 2.6.0 12.8.1 Release1
3.11 2.6.0 12.6.3 Release1, Release2, Release3
3.11 2.6.0 12.4.1 Release1, Release2, Release3, Release4
3.11 2.6.0 11.8.0 Release1
3.11 2.5.1 12.9.1 Release1
3.11 2.5.1 12.8.1 Release1
3.11 2.5.1 12.6.3 Release1, Release2, Release3
3.11 2.5.1 12.4.1 Release1, Release2, Release3, Release4, Release5
3.11 2.5.1 12.1.1 Release1
3.11 2.5.1 11.8.0 Release1, Release2
3.11 2.4.1 12.6.3 Release1, Release2, Release3
3.11 2.4.1 12.4.1 Release1, Release2, Release3, Release4
3.11 2.4.1 12.1.1 Release1
3.11 2.4.1 11.8.0 Release1, Release2
3.11 2.3.1 12.6.3 Release1, Release2
3.11 2.3.1 12.4.1 Release1, Release2, Release3
3.11 2.3.1 12.1.1 Release1
3.11 2.3.1 11.8.0 Release1
3.11 2.2.2 12.6.3 Release1, Release2
3.11 2.2.2 12.4.1 Release1, Release2, Release3
3.11 2.2.2 12.1.1 Release1
3.11 2.2.2 11.8.0 Release1
3.11 2.1.2 12.6.3 Release1
3.11 2.1.2 12.4.1 Release1, Release2
3.11 2.1.2 12.1.1 Release1
3.11 2.1.2 11.8.0 Release1
3.11 2.0.1 12.6.3 Release1
3.11 2.0.1 12.4.1 Release1, Release2
3.11 2.0.1 12.1.1 Release1
3.11 2.0.1 11.8.0 Release1
3.10 2.8.0 12.9.1 Release1
3.10 2.8.0 12.8.1 Release1
3.10 2.8.0 12.4.1 Release1
3.10 2.8.0.dev20250523 12.8.1 Release1, Release2
3.10 2.7.1 12.9.1 Release1
3.10 2.7.1 12.8.1 Release1, Release2
3.10 2.7.1 12.4.1 Release1
3.10 2.7.0 12.8.1 Release1, Release2
3.10 2.7.0 12.6.3 Release1
3.10 2.7.0 12.4.1 Release1
3.10 2.7.0 11.8.0 Release1
3.10 2.6.0 12.9.1 Release1
3.10 2.6.0 12.8.1 Release1
3.10 2.6.0 12.6.3 Release1, Release2, Release3
3.10 2.6.0 12.4.1 Release1, Release2, Release3, Release4
3.10 2.6.0 11.8.0 Release1
3.10 2.5.1 12.9.1 Release1
3.10 2.5.1 12.8.1 Release1
3.10 2.5.1 12.6.3 Release1, Release2, Release3
3.10 2.5.1 12.4.1 Release1, Release2, Release3, Release4, Release5
3.10 2.5.1 12.1.1 Release1
3.10 2.5.1 11.8.0 Release1, Release2
3.10 2.4.1 12.6.3 Release1, Release2, Release3
3.10 2.4.1 12.4.1 Release1, Release2, Release3, Release4
3.10 2.4.1 12.1.1 Release1
3.10 2.4.1 11.8.0 Release1, Release2
3.10 2.3.1 12.6.3 Release1, Release2
3.10 2.3.1 12.4.1 Release1, Release2, Release3
3.10 2.3.1 12.1.1 Release1
3.10 2.3.1 11.8.0 Release1
3.10 2.2.2 12.6.3 Release1, Release2
3.10 2.2.2 12.4.1 Release1, Release2, Release3
3.10 2.2.2 12.1.1 Release1
3.10 2.2.2 11.8.0 Release1
3.10 2.1.2 12.6.3 Release1
3.10 2.1.2 12.4.1 Release1, Release2
3.10 2.1.2 12.1.1 Release1
3.10 2.1.2 11.8.0 Release1
3.10 2.0.1 12.6.3 Release1
3.10 2.0.1 12.4.1 Release1, Release2
3.10 2.0.1 12.1.1 Release1
3.10 2.0.1 11.8.0 Release1

Flash-Attention 2.5.9

Packages for Flash-Attention 2.5.9
Python PyTorch CUDA package
3.12 2.8.0.dev20250523 12.8.1 Release1, Release2
3.12 2.7.1 12.8.1 Release1
3.12 2.7.0 12.8.1 Release1, Release2
3.12 2.7.0 12.6.3 Release1
3.12 2.7.0 12.4.1 Release1
3.12 2.7.0 11.8.0 Release1
3.12 2.6.0 12.6.3 Release1, Release2
3.12 2.6.0 12.4.1 Release1, Release2
3.12 2.6.0 11.8.0 Release1
3.12 2.5.1 12.6.3 Release1, Release2
3.12 2.5.1 12.4.1 Release1, Release2
3.12 2.5.1 11.8.0 Release1
3.12 2.4.1 12.6.3 Release1, Release2
3.12 2.4.1 12.4.1 Release1, Release2
3.12 2.4.1 11.8.0 Release1
3.12 2.3.1 12.6.3 Release1
3.12 2.3.1 12.4.1 Release1
3.12 2.2.2 12.6.3 Release1
3.12 2.2.2 12.4.1 Release1
3.11 2.8.0.dev20250523 12.8.1 Release1, Release2
3.11 2.7.1 12.8.1 Release1
3.11 2.7.0 12.8.1 Release1, Release2
3.11 2.7.0 12.6.3 Release1
3.11 2.7.0 12.4.1 Release1
3.11 2.7.0 11.8.0 Release1
3.11 2.6.0 12.6.3 Release1, Release2
3.11 2.6.0 12.4.1 Release1, Release2
3.11 2.6.0 11.8.0 Release1
3.11 2.5.1 12.6.3 Release1, Release2
3.11 2.5.1 12.4.1 Release1, Release2
3.11 2.5.1 11.8.0 Release1
3.11 2.4.1 12.6.3 Release1, Release2
3.11 2.4.1 12.4.1 Release1, Release2
3.11 2.4.1 11.8.0 Release1
3.11 2.3.1 12.6.3 Release1
3.11 2.3.1 12.4.1 Release1
3.11 2.2.2 12.6.3 Release1
3.11 2.2.2 12.4.1 Release1
3.10 2.8.0.dev20250523 12.8.1 Release1, Release2
3.10 2.7.1 12.8.1 Release1
3.10 2.7.0 12.8.1 Release1, Release2
3.10 2.7.0 12.6.3 Release1
3.10 2.7.0 12.4.1 Release1
3.10 2.7.0 11.8.0 Release1
3.10 2.6.0 12.6.3 Release1, Release2
3.10 2.6.0 12.4.1 Release1, Release2
3.10 2.6.0 11.8.0 Release1
3.10 2.5.1 12.6.3 Release1, Release2
3.10 2.5.1 12.4.1 Release1, Release2
3.10 2.5.1 11.8.0 Release1
3.10 2.4.1 12.6.3 Release1, Release2
3.10 2.4.1 12.4.1 Release1, Release2
3.10 2.4.1 11.8.0 Release1
3.10 2.3.1 12.6.3 Release1
3.10 2.3.1 12.4.1 Release1
3.10 2.2.2 12.6.3 Release1
3.10 2.2.2 12.4.1 Release1

Flash-Attention 2.5.6

Packages for Flash-Attention 2.5.6
Python PyTorch CUDA package
3.12 2.5.1 12.4.1 Release1
3.12 2.5.1 12.1.1 Release1
3.12 2.5.1 11.8.0 Release1
3.12 2.4.1 12.4.1 Release1
3.12 2.4.1 12.1.1 Release1
3.12 2.4.1 11.8.0 Release1
3.12 2.3.1 12.4.1 Release1
3.12 2.3.1 12.1.1 Release1
3.12 2.3.1 11.8.0 Release1
3.12 2.2.2 12.4.1 Release1
3.12 2.2.2 12.1.1 Release1
3.12 2.2.2 11.8.0 Release1
3.12 2.1.2 12.4.1 Release1
3.12 2.1.2 12.1.1 Release1
3.12 2.1.2 11.8.0 Release1
3.12 2.0.1 12.4.1 Release1
3.12 2.0.1 12.1.1 Release1
3.12 2.0.1 11.8.0 Release1
3.11 2.5.1 12.4.1 Release1
3.11 2.5.1 12.1.1 Release1
3.11 2.5.1 11.8.0 Release1
3.11 2.4.1 12.4.1 Release1
3.11 2.4.1 12.1.1 Release1
3.11 2.4.1 11.8.0 Release1
3.11 2.3.1 12.4.1 Release1
3.11 2.3.1 12.1.1 Release1
3.11 2.3.1 11.8.0 Release1
3.11 2.2.2 12.4.1 Release1
3.11 2.2.2 12.1.1 Release1
3.11 2.2.2 11.8.0 Release1
3.11 2.1.2 12.4.1 Release1
3.11 2.1.2 12.1.1 Release1
3.11 2.1.2 11.8.0 Release1
3.11 2.0.1 12.4.1 Release1
3.11 2.0.1 12.1.1 Release1
3.11 2.0.1 11.8.0 Release1
3.10 2.5.1 12.4.1 Release1
3.10 2.5.1 12.1.1 Release1
3.10 2.5.1 11.8.0 Release1
3.10 2.4.1 12.4.1 Release1
3.10 2.4.1 12.1.1 Release1
3.10 2.4.1 11.8.0 Release1
3.10 2.3.1 12.4.1 Release1
3.10 2.3.1 12.1.1 Release1
3.10 2.3.1 11.8.0 Release1
3.10 2.2.2 12.4.1 Release1
3.10 2.2.2 12.1.1 Release1
3.10 2.2.2 11.8.0 Release1
3.10 2.1.2 12.4.1 Release1
3.10 2.1.2 12.1.1 Release1
3.10 2.1.2 11.8.0 Release1
3.10 2.0.1 12.4.1 Release1
3.10 2.0.1 12.1.1 Release1
3.10 2.0.1 11.8.0 Release1

Flash-Attention 2.4.3

Packages for Flash-Attention 2.4.3
Python PyTorch CUDA package
3.12 2.8.0.dev20250523 12.8.1 Release1, Release2
3.12 2.7.1 12.8.1 Release1
3.12 2.7.0 12.8.1 Release1, Release2
3.12 2.7.0 12.6.3 Release1
3.12 2.7.0 12.4.1 Release1
3.12 2.7.0 11.8.0 Release1
3.12 2.6.0 12.6.3 Release1, Release2
3.12 2.6.0 12.4.1 Release1, Release2
3.12 2.6.0 11.8.0 Release1
3.12 2.5.1 12.6.3 Release1, Release2
3.12 2.5.1 12.4.1 Release1, Release2, Release3
3.12 2.5.1 12.1.1 Release1
3.12 2.5.1 11.8.0 Release1, Release2
3.12 2.4.1 12.6.3 Release1, Release2
3.12 2.4.1 12.4.1 Release1, Release2, Release3
3.12 2.4.1 12.1.1 Release1
3.12 2.4.1 11.8.0 Release1, Release2
3.12 2.3.1 12.6.3 Release1
3.12 2.3.1 12.4.1 Release1, Release2
3.12 2.3.1 12.1.1 Release1
3.12 2.3.1 11.8.0 Release1
3.12 2.2.2 12.6.3 Release1
3.12 2.2.2 12.4.1 Release1, Release2
3.12 2.2.2 12.1.1 Release1
3.12 2.2.2 11.8.0 Release1
3.12 2.1.2 12.4.1 Release1
3.12 2.1.2 12.1.1 Release1
3.12 2.1.2 11.8.0 Release1
3.12 2.0.1 12.4.1 Release1
3.12 2.0.1 12.1.1 Release1
3.12 2.0.1 11.8.0 Release1
3.11 2.8.0.dev20250523 12.8.1 Release1, Release2
3.11 2.7.1 12.8.1 Release1
3.11 2.7.0 12.8.1 Release1, Release2
3.11 2.7.0 12.6.3 Release1
3.11 2.7.0 12.4.1 Release1
3.11 2.7.0 11.8.0 Release1
3.11 2.6.0 12.6.3 Release1, Release2
3.11 2.6.0 12.4.1 Release1, Release2
3.11 2.6.0 11.8.0 Release1
3.11 2.5.1 12.6.3 Release1, Release2
3.11 2.5.1 12.4.1 Release1, Release2, Release3
3.11 2.5.1 12.1.1 Release1
3.11 2.5.1 11.8.0 Release1, Release2
3.11 2.4.1 12.6.3 Release1, Release2
3.11 2.4.1 12.4.1 Release1, Release2, Release3
3.11 2.4.1 12.1.1 Release1
3.11 2.4.1 11.8.0 Release1, Release2
3.11 2.3.1 12.6.3 Release1
3.11 2.3.1 12.4.1 Release1, Release2
3.11 2.3.1 12.1.1 Release1
3.11 2.3.1 11.8.0 Release1
3.11 2.2.2 12.6.3 Release1
3.11 2.2.2 12.4.1 Release1, Release2
3.11 2.2.2 12.1.1 Release1
3.11 2.2.2 11.8.0 Release1
3.11 2.1.2 12.4.1 Release1
3.11 2.1.2 12.1.1 Release1
3.11 2.1.2 11.8.0 Release1
3.11 2.0.1 12.4.1 Release1
3.11 2.0.1 12.1.1 Release1
3.11 2.0.1 11.8.0 Release1
3.10 2.8.0.dev20250523 12.8.1 Release1, Release2
3.10 2.7.1 12.8.1 Release1
3.10 2.7.0 12.8.1 Release1, Release2
3.10 2.7.0 12.6.3 Release1
3.10 2.7.0 12.4.1 Release1
3.10 2.7.0 11.8.0 Release1
3.10 2.6.0 12.6.3 Release1, Release2
3.10 2.6.0 12.4.1 Release1, Release2
3.10 2.6.0 11.8.0 Release1
3.10 2.5.1 12.6.3 Release1, Release2
3.10 2.5.1 12.4.1 Release1, Release2, Release3
3.10 2.5.1 12.1.1 Release1
3.10 2.5.1 11.8.0 Release1, Release2
3.10 2.4.1 12.6.3 Release1, Release2
3.10 2.4.1 12.4.1 Release1, Release2, Release3
3.10 2.4.1 12.1.1 Release1
3.10 2.4.1 11.8.0 Release1, Release2
3.10 2.3.1 12.6.3 Release1
3.10 2.3.1 12.4.1 Release1, Release2
3.10 2.3.1 12.1.1 Release1
3.10 2.3.1 11.8.0 Release1
3.10 2.2.2 12.6.3 Release1
3.10 2.2.2 12.4.1 Release1, Release2
3.10 2.2.2 12.1.1 Release1
3.10 2.2.2 11.8.0 Release1
3.10 2.1.2 12.4.1 Release1
3.10 2.1.2 12.1.1 Release1
3.10 2.1.2 11.8.0 Release1
3.10 2.0.1 12.4.1 Release1
3.10 2.0.1 12.1.1 Release1
3.10 2.0.1 11.8.0 Release1

Windows x86_64

Flash-Attention 2.8.2

Packages for Flash-Attention 2.8.2
Python PyTorch CUDA package
3.12 2.8 12.8 Release1
3.12 2.7 12.8 Release1
3.11 2.8 12.8 Release1
3.11 2.7 12.8 Release1
3.10 2.8 12.8 Release1
3.10 2.7 12.8 Release1

Flash-Attention 2.7.4

Packages for Flash-Attention 2.7.4
Python PyTorch CUDA package
3.12 2.8 12.8 Release1
3.12 2.7 12.8 Release1
3.12 2.6.0 12.4.1 Release1
3.12 2.5.1 12.4.1 Release1
3.12 2.4.1 12.4.1 Release1
3.11 2.8 12.8 Release1
3.11 2.7 12.8 Release1, Release2
3.11 2.6.0 12.4.1 Release1
3.11 2.5.1 12.4.1 Release1
3.11 2.4.1 12.4.1 Release1
3.10 2.8 12.8 Release1
3.10 2.7 12.8 Release1
3.10 2.6.0 12.4.1 Release1
3.10 2.5.1 12.4.1 Release1
3.10 2.4.1 12.4.1 Release1

Flash-Attention 2.6.3

Packages for Flash-Attention 2.6.3
Python PyTorch CUDA package
3.12 2.6.0 12.4.1 Release1
3.12 2.5.1 12.4.1 Release1
3.12 2.4.1 12.4.1 Release1
3.11 2.6.0 12.6.3 Release1
3.11 2.6.0 12.4.1 Release1
3.11 2.5.1 12.4.1 Release1
3.11 2.4.1 12.4.1 Release1
3.10 2.6.0 12.4.1 Release1
3.10 2.5.1 12.4.1 Release1
3.10 2.4.1 12.4.1 Release1

Flash-Attention 2.5.9

Packages for Flash-Attention 2.5.9
Python PyTorch CUDA package
3.12 2.6.0 12.4.1 Release1
3.12 2.5.1 12.4.1 Release1
3.12 2.4.1 12.4.1 Release1
3.11 2.6.0 12.4.1 Release1
3.11 2.5.1 12.4.1 Release1
3.11 2.4.1 12.4.1 Release1
3.10 2.6.0 12.4.1 Release1
3.10 2.5.1 12.4.1 Release1
3.10 2.4.1 12.4.1 Release1

History

v0.4.10

Release

Windows x86_64

Flash-Attention Python PyTorch CUDA
2.7.4, 2.8.2 3.10, 3.11, 3.12 2.7, 2.8 12.8

v0.4.9

Release

Windows x86_64

Flash-Attention Python PyTorch CUDA
2.7.4 3.11 2.7 12.8

v0.3.18

Release

Linux x86_64

Flash-Attention Python PyTorch CUDA
2.7.4 3.10, 3.11, 3.12 2.5, 2.6, 2.7, 2.8 12.4, 12.8, 12.9

v0.3.14

Release

Linux x86_64

Flash-Attention Python PyTorch CUDA
2.6.3, 2.8.2 3.10, 3.11, 3.12 2.5.1, 2.6.0, 2.7.1, 2.8.0 12.4.1, 12.8.1, 12.9.1

v0.3.13

Release

Linux x86_64

Flash-Attention Python PyTorch CUDA
2.8.1 3.10, 3.11, 3.12 2.4.1, 2.5.1, 2.6.0, 2.7.1 12.8.1

v0.3.12

Release

Linux x86_64

Flash-Attention Python PyTorch CUDA
2.8.0 3.10, 3.11, 3.12 2.4.1, 2.5.1, 2.6.0, 2.7.1 12.4.1, 12.8.1

v0.3.10

Release

Linux x86_64

Flash-Attention Python PyTorch CUDA
2.7.4 3.10, 3.11, 3.12 2.7.1 12.8.1

v0.3.9

Release

Linux x86_64

Flash-Attention Python PyTorch CUDA
2.4.3, 2.5.9, 2.6.3 3.10, 3.11, 3.12 2.7.1 12.8.1

Windows x86_64

Flash-Attention Python PyTorch CUDA
2.5.9, 2.6.3, 2.7.4 3.10, 3.11, 3.12 2.4.1, 2.5.1, 2.6.0 12.4.1

Important

⚠️ Building flash-attn v2.7.4 with CUDA 12.8 on Windows cannot be completed because of GitHub Actions’ processing-time limits. In the future, I plan to add a self-hosted Windows runner to resolve this issue.

v0.3.1

Release

Windows x86_64

Flash-Attention Python PyTorch CUDA
2.6.3 3.11 2.6.0 12.6.3

Starting with this version, wheels for Windows are released.
However, they have not been tested thoroughly yet, so reports on whether they work are welcome.

v0.2.1

Release

Flash-Attention Python PyTorch CUDA
2.4.3, 2.5.9, 2.6.3, 2.7.4 3.10, 3.11, 3.12 2.8.0.dev20250523 12.8.1

v0.2.0

Release

Flash-Attention Python PyTorch CUDA
2.4.3, 2.5.9, 2.6.3 3.10, 3.11, 3.12 2.8.0.dev20250523 12.8.1

v0.1.0

Release

Flash-Attention Python PyTorch CUDA
2.4.3, 2.5.9, 2.6.3, 2.7.4 3.10, 3.11, 3.12 2.7.0 12.8.1

v2.7.4 and v2.7.4.post1 are the same version.

From this release, self-hosted runners are used to build some wheels.

v0.0.9

Release

Flash-Attention Python PyTorch CUDA
2.4.3, 2.5.9, 2.6.3 3.10, 3.11, 3.12 2.7.0 12.8.1

v0.0.8

Release

Flash-Attention Python PyTorch CUDA
2.4.3, 2.5.9, 2.6.3, 2.7.4.post1 3.10, 3.11, 3.12 2.4.1, 2.5.1, 2.6.0, 2.7.0 11.8.0, 12.4.1, 12.6.3

v0.0.7

Skipped for experimental reasons.

v0.0.6

Release

Flash-Attention Python PyTorch CUDA
2.4.3, 2.5.9, 2.6.3, 2.7.4.post1 3.10, 3.11, 3.12 2.2.2, 2.3.1, 2.4.1, 2.5.1, 2.6.0 12.4.1, 12.6.3

v0.0.5

Release

Flash-Attention Python PyTorch CUDA
2.6.3, 2.7.4.post1 3.10, 3.11, 3.12 2.0.1, 2.1.2, 2.2.2, 2.3.1, 2.4.1, 2.5.1, 2.6.0 12.4.1, 12.6.3

v0.0.4

Release

Flash-Attention Python PyTorch CUDA
2.7.3 3.10, 3.11, 3.12 2.0.1, 2.1.2, 2.2.2, 2.3.1, 2.4.1, 2.5.1 11.8.0, 12.1.1, 12.4.1

v0.0.3

Release

Flash-Attention Python PyTorch CUDA
2.7.2.post1 3.10, 3.11, 3.12 2.0.1, 2.1.2, 2.2.2, 2.3.1, 2.4.1, 2.5.1 11.8.0, 12.1.1, 12.4.1

v0.0.2

Release

Flash-Attention Python PyTorch CUDA
2.4.3, 2.5.6, 2.6.3, 2.7.0.post2 3.10, 3.11, 3.12 2.0.1, 2.1.2, 2.2.2, 2.3.1, 2.4.1, 2.5.1 11.8.0, 12.1.1, 12.4.1

v0.0.1

Release

Flash-Attention Python PyTorch CUDA
1.0.9, 2.4.3, 2.5.6, 2.5.9, 2.6.3 3.10, 3.11, 3.12 2.0.1, 2.1.2, 2.2.2, 2.3.1, 2.4.1, 2.5.0 11.8.0, 12.1.1, 12.4.1

v0.0.0

Release

Flash-Attention Python PyTorch CUDA
2.4.3, 2.5.6, 2.5.9, 2.6.3 3.11, 3.12 2.0.1, 2.1.2, 2.2.2, 2.3.1, 2.4.1, 2.5.0 11.8.0, 12.1.1, 12.4.1

Original Repository

repo

@inproceedings{dao2022flashattention,
  title={Flash{A}ttention: Fast and Memory-Efficient Exact Attention with {IO}-Awareness},
  author={Dao, Tri and Fu, Daniel Y. and Ermon, Stefano and Rudra, Atri and R{\'e}, Christopher},
  booktitle={Advances in Neural Information Processing Systems (NeurIPS)},
  year={2022}
}
@inproceedings{dao2023flashattention2,
  title={Flash{A}ttention-2: Faster Attention with Better Parallelism and Work Partitioning},
  author={Dao, Tri},
  booktitle={International Conference on Learning Representations (ICLR)},
  year={2024}
}
