For Q2, the source code has a detailed explanation of contiguity and divisibility that might resolve your confusion: triton/include/triton/Analysis/AxisInfo.h, lines 38 to 105 (commit 65d9862). I think it would be worth adding this to the documentation.
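To make those two properties concrete, here is a small pure-Python checker. It is a simplified reading of the definitions in that comment (contiguity: the values come in runs of consecutive integers; divisibility: the first value of each run is divisible by some number), not Triton's actual analysis, and `satisfies_hints` is a hypothetical name. If I read the comment correctly, AxisInfo tracks divisibility as a power of two; this sketch accepts any integer.

```python
def satisfies_hints(values, multiple, contiguous):
    """Hypothetical checker: True if `values` splits into groups of
    `contiguous` consecutive integers whose first element is a multiple
    of `multiple`. A simplified model of the contiguity/divisibility
    facts described in AxisInfo.h, not Triton's implementation."""
    if len(values) % contiguous != 0:
        return False
    for start in range(0, len(values), contiguous):
        group = values[start:start + contiguous]
        # Contiguity: each element is the previous one plus 1.
        if any(b - a != 1 for a, b in zip(group, group[1:])):
            return False
        # Divisibility: the first element of the group is a multiple of `multiple`.
        if group[0] % multiple != 0:
            return False
    return True


BLOCK_M = 8
pid_m = 1
# Plays the role of pid_m * BLOCK_M + tl.arange(0, BLOCK_M).
rows = [pid_m * BLOCK_M + i for i in range(BLOCK_M)]
print(rows)                                  # [8, 9, 10, 11, 12, 13, 14, 15]
print(satisfies_hints(rows, 8, 8))           # True
print(satisfies_hints([1, 2, 3, 4], 4, 4))   # False: 1 is not a multiple of 4
print(satisfies_hints([0, 1, 4, 5], 2, 2))   # True: groups [0, 1] and [4, 5]
```

In the matmul line from Q2, tl.multiple_of(..., BLOCK_M) and tl.max_contiguous(..., BLOCK_M) together roughly promise what satisfies_hints(rows, BLOCK_M, BLOCK_M) checks here.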
Hi there!
I am very new to Triton and started with the tutorials on the Triton website. While studying them, I got quite confused by some of the APIs. I already checked the official documentation (https://triton-lang.org/main/python-api/triton.language.html), but it wasn't enough for me.
1. arange()
The results look like this:
This very simple kernel is what confuses me about arange.
Because there is only 1 block with 1 warp (32 threads), device_print runs 32 times, and each print shows a single integer from 0 to 7, so the sequence 0~7 appears 4 times.
However, the documentation describes arange like this:
Returns contiguous values within the half-open interval [start, end)
So here is my question: why does each device_print call show me just one integer instead of the whole array 0~7? I expected a single call to print the whole array, not just 1 integer.
Is there something I'm missing?
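One possible way to picture this: tl.arange(0, 8) describes a single block-level tensor whose elements are spread across the threads of the program, and device_print prints from every thread, one line per element that thread holds. With 8 elements and 32 threads, each element would be held by 4 threads, which matches the 32 lines above. The pure-Python model below sketches that idea under a simplified round-robin mapping; the mapping, the function names, and the printed format are all hypothetical and are not Triton's real layout logic or output format.

```python
WARP_SIZE = 32  # threads per warp on NVIDIA GPUs


def elements_per_thread(block_size, num_threads=WARP_SIZE):
    """Hypothetical 1-D mapping: which element indices each thread holds.
    Simplified round-robin model, not Triton's real layout logic."""
    owners = []
    for tid in range(num_threads):
        if block_size >= num_threads:
            # More elements than threads: each thread holds several.
            owners.append(list(range(tid, block_size, num_threads)))
        else:
            # Fewer elements than threads: elements are replicated.
            owners.append([tid % block_size])
    return owners


def simulate_device_print(block, num_threads=WARP_SIZE):
    """One printed line per (thread, element it holds)."""
    lines = []
    for tid, idxs in enumerate(elements_per_thread(len(block), num_threads)):
        for idx in idxs:
            lines.append(f"thread {tid} idx ({idx}) x: {block[idx]}")
    return lines


x = list(range(0, 8))  # the whole block that tl.arange(0, 8) describes
lines = simulate_device_print(x)
print(len(lines))               # 32
print(lines[0], "|", lines[8])  # thread 0 idx (0) x: 0 | thread 8 idx (0) x: 0
```

In this model, a 64-element block gives each thread two elements (thread 0 holds indices 0 and 32), so 64 lines are printed. Real Triton layouts can instead give a thread several contiguous elements, so the exact thread-to-element mapping may differ.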
2. max_contiguous && multiple_of
While reading some code, I came across this line:
ram = tl.max_contiguous(tl.multiple_of(offset_m % M, BLOCK_M), BLOCK_M)
This line seems fairly common: I found it in PyTorch's torch.mm() API and in some other matrix multiplication code. But it is also confusing, even after reading the docs for these two functions.
My question is, what is the return type of each of them?
By the way, when I check the output of device_print("ram", ram), it looks exactly the same as the arange() results in question 1.
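As far as I understand, both functions return a tensor (tl.tensor) with exactly the same values as their input; they only attach hints that the compiler may use (for example to vectorize memory accesses), and the hints are not checked at runtime. That would explain why device_print("ram", ram) shows the same numbers as arange when pid_m is 0 and M is at least BLOCK_M. The pure-Python stand-ins below model only this value-preserving behaviour; `TensorModel`, `multiple_of_model`, `max_contiguous_model` and the `hints` dict are hypothetical names, not Triton APIs.

```python
from dataclasses import dataclass, field


@dataclass
class TensorModel:
    """Stand-in for a block tensor: values plus compiler hints (hypothetical)."""
    values: list
    hints: dict = field(default_factory=dict)


def multiple_of_model(x, n):
    # Same values; only records a divisibility hint (assumed unchecked).
    return TensorModel(list(x.values), {**x.hints, "divisibility": n})


def max_contiguous_model(x, n):
    # Same values; only records a contiguity hint (assumed unchecked).
    return TensorModel(list(x.values), {**x.hints, "contiguity": n})


BLOCK_M, M, pid_m = 8, 100, 0
offset_m = TensorModel([pid_m * BLOCK_M + i for i in range(BLOCK_M)])
# Mirrors: ram = tl.max_contiguous(tl.multiple_of(offset_m % M, BLOCK_M), BLOCK_M)
ram = max_contiguous_model(
    multiple_of_model(TensorModel([v % M for v in offset_m.values]), BLOCK_M),
    BLOCK_M,
)
print(ram.values)  # [0, 1, 2, 3, 4, 5, 6, 7], same as tl.arange(0, 8)
print(ram.hints)   # {'divisibility': 8, 'contiguity': 8}
```

Because the hints are promises rather than checks, a wrong hint would not show up in device_print; it could only affect how the compiler generates code.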
Thanks for reading.