[Triton-MLIR] A detailed question about mma_layout #686

goostavz · 2022-09-21T10:27:00Z

goostavz
Sep 21, 2022
Collaborator

1. In the legacy code:
contig_per_thread of mma_layout for sm<80 is always set to {1, 1}, which means that convert_layout(mma->blocked) can never use vectorized stores to shared memory.
However, as I understand it, contig_per_thread should be 2 in the row direction according to the definition of mma_layout ver1.
I guess there is some history behind the legacy code setting contig_per_thread to {1, 1}, or am I missing something?
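
To make the concern concrete, here is a minimal sketch of how a per-thread contiguity value could bound the width of a vectorized store. It is not taken from the Triton code base: the function names, the `order` argument (dimension indices from fastest-varying to slowest), the 128-bit cap on a single store, and the figure of 8 values per thread are all assumptions made for illustration.

```python
def store_vector_width(contig_per_thread, order, elem_bits, max_vector_bits=128):
    """Number of elements one thread could write with a single store.

    Hypothetical helper, not Triton's implementation.
    contig_per_thread: per-dimension count of contiguous elements a thread owns.
    order: dimension indices, fastest-varying first (order[0] is contiguous in memory).
    elem_bits: size of one element in bits.
    """
    contig = contig_per_thread[order[0]]
    max_elems = max_vector_bits // elem_bits
    # Pick the largest power of two that divides the contiguous run
    # and still fits in one store instruction.
    width = 1
    while width * 2 <= max_elems and contig % (width * 2) == 0:
        width *= 2
    return width


def stores_per_thread(elems_per_thread, width):
    """How many store instructions a thread issues for its elements."""
    return elems_per_thread // width


if __name__ == "__main__":
    order = (1, 0)           # assumed: dimension 1 is contiguous in memory
    elems_per_thread = 8     # assumed number of values held per thread
    for contig in [(1, 1), (1, 2)]:
        w = store_vector_width(contig, order, elem_bits=32)
        print(contig, "width:", w, "stores:", stores_per_thread(elems_per_thread, w))
```

Under these assumptions, {1, 1} forces scalar stores, while a contiguity of 2 along the fastest-varying dimension would halve the number of store instructions.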

Answered by ptillet

ptillet · 2022-09-21T17:18:33Z

ptillet
Sep 21, 2022
Maintainer

Ah, I somewhat remember that. You are right that because contig_per_thread=1, shared memory ops aren't well vectorized on V100. I don't think there is any deep reason for it; this happened at a time when I was very busy with OpenAI stuff, and I probably just wanted to get good A100 perf ASAP without risking breaking the V100 codegen. :D
