fix race condition bug in cute _flash_attn_fwd in multiple gpu env #1793

beiw-nv · 2025-08-01T13:29:18Z

In multi-GPU runs, the cute implementation of _flash_attn_fwd returns incorrect values on any GPU other than device 0. This can be fixed with the torch.cuda.device context manager, as suggested in issue 1782.
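The diff itself is not reproduced in this thread, so the following is only a minimal sketch of the kind of guard described above. It assumes the fix amounts to making the input tensor's GPU the current CUDA device for the duration of the launch. The names `device_guard`, `run_on_tensor_device`, and `kernel_launcher` are hypothetical and do not come from the flash-attention code base.

```python
import contextlib

import torch


def device_guard(t: torch.Tensor):
    """Return a context manager that makes t's GPU the current CUDA device.

    For CPU tensors this is a no-op, so the helper is safe to call anywhere.
    """
    if t.is_cuda:
        # torch.cuda.device accepts a torch.device (or an int index) and
        # restores the previously selected device on exit.
        return torch.cuda.device(t.device)
    return contextlib.nullcontext()


def run_on_tensor_device(kernel_launcher, q: torch.Tensor, *args, **kwargs):
    """Call kernel_launcher with q's device selected as the current device.

    kernel_launcher is a hypothetical stand-in for whatever compiles and
    launches the forward kernel.
    """
    with device_guard(q):
        return kernel_launcher(q, *args, **kwargs)
```

With a guard like this, a launcher that implicitly targets the current device would see the device that holds `q` rather than device 0.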

tridao · 2025-08-01T14:59:00Z

I'm not sure it's a "race condition". I suspect it just launches the kernel on CUDA device 0 even when the data is on CUDA device 1.
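One way to check this hypothesis is to compare the device that holds the tensor with the process's current CUDA device just before the launch. The helper below is an illustrative sketch, not part of the PR; `launch_device_mismatch` is a hypothetical name.

```python
import torch


def launch_device_mismatch(t: torch.Tensor) -> bool:
    """Return True if t lives on a CUDA device other than the current one."""
    # The is_cuda check short-circuits, so this is safe on CPU-only machines.
    return t.is_cuda and t.device.index != torch.cuda.current_device()


if torch.cuda.device_count() > 1:
    q = torch.empty(1, device="cuda:1")
    # The current device defaults to 0, so a launcher that relies on it
    # would target the wrong GPU for q.
    print(launch_device_mismatch(q))
    with torch.cuda.device(q.device):
        # Inside the guard the current device matches q's device.
        print(launch_device_mismatch(q))
```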

beiw-nv · 2025-08-01T15:14:00Z

I see. When do you expect we will have Blackwell support for _flash_attn_bwd?

tridao · 2025-08-01T15:24:05Z

3-4 weeks

fix race condition bug in _flash_attn_fwd in multiple gpu env

fafc477

Merge branch 'main' into main

ff88962
