cudawarped
Contributor

copy_dram_to_sram_async did not support 2D thread blocks before modular/modular#5068. As a result, the async copy in the Idiomatic LayoutTensor tiling solution currently uses all 9 threads of the 2D thread block to perform the copy, even though the load layout explicitly specifies 3. This PR adds extra parameters to fix this.
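
To make the 9-versus-3 count concrete, here is a small Python model of a 3x3 thread block. It is only a sketch under stated assumptions: the source does not say how the copy maps threads internally, so both the "x index only" behaviour and the "flatten and guard" fix below are hypothetical, and the names `BLOCK_DIM`, `LOAD_THREADS` and `copies_issued` are made up for illustration. Only the numbers 9 and 3 come from the description above.

```python
from collections import Counter

BLOCK_DIM = (3, 3)   # 2D thread block: 9 threads in total (from the PR description)
LOAD_THREADS = 3     # threads the load layout asks for (from the PR description)


def copies_issued(use_linear_id: bool) -> Counter:
    """Count how many threads write each of the LOAD_THREADS layout slots."""
    issued = Counter()
    for ty in range(BLOCK_DIM[1]):
        for tx in range(BLOCK_DIM[0]):
            if use_linear_id:
                # Hypothetical fix: flatten the 2D index and let only the
                # first LOAD_THREADS threads take part in the copy.
                tid = ty * BLOCK_DIM[0] + tx
                if tid >= LOAD_THREADS:
                    continue
            else:
                # Hypothetical 1D assumption: only the x index is consulted,
                # so every row of the block maps onto the same slots.
                tid = tx
            issued[tid % LOAD_THREADS] += 1
    return issued


x_only = copies_issued(use_linear_id=False)
linear = copies_issued(use_linear_id=True)
print(sum(x_only.values()), sum(linear.values()))  # prints "9 3"
```

In this model the x-only mapping has every slot written by three threads, while the flattened mapping has each slot written by exactly one.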

Collaborator

@ehsanmok ehsanmok left a comment

Thanks for the update! Could you update the solution too?

@cudawarped
Contributor Author

@ehsanmok Of course. Which part of the solution did I miss?

Collaborator

@ehsanmok ehsanmok left a comment

Ah, missed that! Thanks again.

@ehsanmok ehsanmok merged commit 2fac053 into modular:main Aug 19, 2025