Fix illegal memory access through off-by-one error in num_splits_dynamic_ptr init #1747

klondenberg-bioptimus · 2025-07-10T11:55:29Z

There is an off-by-one error in flash_api.cpp / set_params_fprop() which can lead to memory access violations in our codebase.

Error description

In set_params_fprop, if scheduler_needs_semaphore == False and use_dynamic_split == True, the tile_count_semaphore tensor is allocated with a size equal to the batch size, which is theoretically sufficient:

flash-attention/hopper/flash_api.cpp

Line 959 in adf27d1

    
           int metadata_size = int(scheduler_needs_semaphore) + int(use_dynamic_split) * params.b;

But on line 975, a bit further below, an offset of 1 is applied to the raw data pointer of the tile_count_semaphore tensor to initialize num_splits_dynamic_ptr, even if scheduler_needs_semaphore == False.

If num_splits_dynamic_ptr is then accessed at its supposedly last valid element, at an index equal to the batch size - 1, an illegal memory access occurs, because that element lies one position past the end of the tensor. Since it is just an off-by-one error, it is rarely detectable, but it led to (rare) crashes and numerical issues in our CI. We could detect it by running some of our tests with "compute-sanitizer --padding 128 ... " while setting PYTORCH_NO_CUDA_MEMORY_CACHING=1 to disable PyTorch's caching allocator (without that, the access usually still hit memory that belonged to a valid allocation even though it was out of bounds).

flash-attention/hopper/flash_api.cpp

Line 975 in adf27d1

    
           params.num_splits_dynamic_ptr = use_dynamic_split ? tile_count_semaphore.data_ptr<int>() + 1 : nullptr;
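
The index arithmetic behind the two quoted lines can be checked in isolation. The following is a minimal host-side sketch, not code from flash_api.cpp: the helper names (metadata_size, offset_as_written, offset_semaphore_aware, last_index_touched, in_bounds) are hypothetical, and offset_semaphore_aware is only one possible way to make the offset consistent with the allocation; it is not necessarily what the commit in this PR does.

```cpp
// Model of the sizes and offsets involved; no CUDA or PyTorch required.

// Number of ints allocated for the metadata tensor (mirrors line 959).
constexpr int metadata_size(bool scheduler_needs_semaphore,
                            bool use_dynamic_split, int batch) {
    return int(scheduler_needs_semaphore) + int(use_dynamic_split) * batch;
}

// Offset applied to the tensor's data pointer on line 975: always 1.
constexpr int offset_as_written() { return 1; }

// Hypothetical alternative: skip a slot only when the semaphore occupies it.
constexpr int offset_semaphore_aware(bool scheduler_needs_semaphore) {
    return scheduler_needs_semaphore ? 1 : 0;
}

// Tensor index reached when reading num_splits_dynamic_ptr[batch - 1].
constexpr int last_index_touched(int offset, int batch) {
    return offset + batch - 1;
}

// True if that access stays inside the allocation (dynamic split enabled).
constexpr bool in_bounds(int offset, bool scheduler_needs_semaphore, int batch) {
    return last_index_touched(offset, batch) <
           metadata_size(scheduler_needs_semaphore, true, batch);
}

// With a semaphore slot, the fixed offset of 1 is fine.
static_assert(in_bounds(offset_as_written(), true, 4), "expected in bounds");
// Without it, the last element is one past the end of the tensor.
static_assert(!in_bounds(offset_as_written(), false, 4), "expected out of bounds");
```

For a batch size of 4 without the semaphore, the tensor holds 4 ints (indices 0 to 3), while the last access through the offset pointer reaches index 4. Either growing the allocation by one element or making the offset depend on scheduler_needs_semaphore would bring the access back in bounds in this model.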

Fix illegal memory accesses through num_splits_dynamic_ptr

a924b8a
