**yuxiaoguo** commented:

The TMA descriptor for `attn_lse_intermediates` is initialized from the hardware's raw SM count in `make_globals` (latency/scheduler.py). However, the buffer itself is later allocated with the SM count rounded up to a multiple of 16 (demos/low-latency-llama/attention_reduction.cu). Because the descriptor's declared extent no longer matches the allocated size, creating the TMA descriptor fails.
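A minimal sketch of the size mismatch, assuming an SM count of 132 (e.g., an H100); the `round_up` helper and the variable names are illustrative stand-ins, not the project's actual code:

```python
def round_up(x: int, multiple: int) -> int:
    """Round x up to the nearest multiple (hypothetical helper,
    mirroring the rounding done in attention_reduction.cu)."""
    return ((x + multiple - 1) // multiple) * multiple

num_sms = 132  # assumed SM count; e.g., an H100 reports 132 SMs

# Extent the TMA descriptor is built with in make_globals:
descriptor_rows = num_sms               # 132

# Extent the buffer is actually allocated with:
allocated_rows = round_up(num_sms, 16)  # 144

# Descriptor creation fails because the two extents disagree:
assert descriptor_rows == allocated_rows, (
    f"descriptor extent {descriptor_rows} != allocation {allocated_rows}"
)
```

Presumably the fix is to apply the same rounding on both sides, either rounding the SM count in `make_globals` as well, or allocating exactly `num_sms` rows in attention_reduction.cu, so the descriptor and the allocation agree.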
