docs/wave-intrinsics.md
77 additions & 8 deletions
@@ -1,9 +1,13 @@
+
 Wave Intrinsics
 ===============

-Slang has support for Wave intrinsics introduced to HLSL in [SM6.0](https://docs.microsoft.com/en-us/windows/win32/direct3dhlsl/hlsl-shader-model-6-0-features-for-direct3d-12) and [SM6.5](https://github.com/microsoft/DirectX-Specs/blob/master/d3d/HLSL_ShaderModel6_5.md). All intrinsics are available on D3D12, and a subset on Vulkan.
+Slang has support for Wave intrinsics introduced to HLSL in [SM6.0](https://docs.microsoft.com/en-us/windows/win32/direct3dhlsl/hlsl-shader-model-6-0-features-for-direct3d-12) and [SM6.5](https://github.com/microsoft/DirectX-Specs/blob/master/d3d/HLSL_ShaderModel6_5.md). All intrinsics are available on D3D12 and Vulkan.
+
+On GLSL targets such as Vulkan, wave intrinsics map to the ['subgroup' extension](https://github.com/KhronosGroup/GLSL/blob/master/extensions/khr/GL_KHR_shader_subgroup.txt). Vulkan supports a number of masked wave operations through `SPV_NV_shader_subgroup_partitioned` that are not supported by HLSL.
+
+There is no subgroup support for Matrix types, so Matrix is currently not a supported type for Wave intrinsics on Vulkan, though it may be in the future.

-On GLSL targets such as Vulkan wave intrinsics map to ['subgroup' extension] (https://github.com/KhronosGroup/GLSL/blob/master/extensions/khr/GL_KHR_shader_subgroup.txt). There is no subgroup support for Matrix types, and currently this means that Matrix is not a supported type for Wave intrinsics on Vulkan, but may be in the future.

 Also introduced are some 'non standard' Wave intrinsics which are only available in Slang. All WaveMask intrinsics are non standard. Other non standard intrinsics more accurately expose behaviours that are either not distinguished in HLSL or currently unavailable there. Two examples are `WaveShuffle` and `WaveBroadcastLaneAt`.

@@ -31,7 +35,10 @@ Using WaveMask intrinsics is generally more verbose and prone to error than the
 * Might allow for higher performance (for example it gives more control of divergence)
 * Maps most closely to CUDA

-On D3D12 and Vulkan the WaveMask intrinsics can be used, but the mask is effectively ignored. For this to work across targets including CUDA, the mask must be calculated such that it exactly matches that of HLSL defined 'active' lanes, else the behavior is undefined.
+For this to work across targets including CUDA, the mask must be calculated such that it exactly matches the HLSL-defined 'active' lanes, else the behavior is undefined.
+
+On D3D12 and Vulkan the WaveMask intrinsics can be used, but the mask may be ignored depending on the target's support for partitioned/masked wave intrinsics. SPIRV provides support for a wide variety of operations through the `SPV_NV_shader_subgroup_partitioned` extension, while HLSL only provides a small subset of operations through the `WaveMultiPrefix*` intrinsics. The difference between Slang's `WaveMask` intrinsics and these targets' partitioned wave intrinsics is that the latter accept a `uint4` mask instead of a `uint` mask. `WaveMask*` intrinsics are effectively translated to `WaveMulti*` intrinsics when targeting SPIRV/GLSL and HLSL. Please consult [Wave Multi Intrinsics](#wave-multi-intrinsics) for more details, including which masked operations are supported by each target.
+

 The WaveMask intrinsics are a non standard Slang feature, and may change in the future.
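
As a rough illustration of the rule that the mask must match the HLSL 'active' lanes, the sketch below derives the mask for a branch from the same condition that controls the branch. It is a minimal sketch, not taken from this change: `WaveMaskBallot` and `WaveMaskSum` are assumed to follow the `WaveMask`-first-parameter naming pattern of Slang's WaveMask intrinsics, and `outputBuffer`, `computeMain` and the 4-lane launch are hypothetical.

```hlsl
// Hypothetical output buffer, one int per lane.
RWStructuredBuffer<int> outputBuffer;

[shader("compute")]
[numthreads(4, 1, 1)]
void computeMain(uint3 dispatchThreadID : SV_DispatchThreadID)
{
    // Assumption: exactly 4 lanes are launched and all are active on entry,
    // so the initial mask has the low four bits set.
    const WaveMask mask0 = 0xf;

    int idx = int(dispatchThreadID.x);
    int value = 0;

    // Build the branch mask from the same condition as the branch itself,
    // so it matches the lanes HLSL would treat as 'active' inside the branch.
    const WaveMask mask1 = WaveMaskBallot(mask0, idx >= 2);
    if (idx >= 2)
    {
        // Sum idx over the lanes named by mask1 only.
        value = WaveMaskSum(mask1, idx);
    }

    outputBuffer[idx] = value;
}
```

On targets that ignore the mask, the same code relies on the implicit active lanes, which is why the two must agree.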
-The standard 'Multi' intrinsics were added to HLSL is SM6.5, they can specify a mask of lanes via uint4. They introduce some intrinsics that work in a similar fashion to the `WaveMask` intrinsics. The available intrisnics is currently significantly restricted compared to WaveMask.
+The standard 'Multi' intrinsics were added to HLSL in SM 6.5 and are available in SPIRV through `SPV_NV_shader_subgroup_partitioned`. They specify a mask of lanes via a `uint4`. SPIRV provides non-prefix (reduction) and prefix (scan) intrinsics for arithmetic and min/max operations, while HLSL only provides a subset of these, namely exclusive prefix arithmetic operations.
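
To make the `uint4` mask concrete, here is a minimal sketch using the two SM 6.5 intrinsics `WaveMatch` and `WaveMultiPrefixSum`: lanes holding the same key form one partition, and each lane receives the exclusive prefix sum within its partition. The buffer names and entry point are hypothetical, and the sketch assumes the target supports both intrinsics.

```hlsl
// Hypothetical buffers: one key and one value per lane.
StructuredBuffer<uint> keys;
StructuredBuffer<uint> values;
RWStructuredBuffer<uint> prefixSums;

[shader("compute")]
[numthreads(32, 1, 1)]
void computeMain(uint3 dispatchThreadID : SV_DispatchThreadID)
{
    uint idx = dispatchThreadID.x;

    // WaveMatch returns, for each lane, a uint4 bitmask of the lanes
    // whose key equals this lane's key.
    uint4 mask = WaveMatch(keys[idx]);

    // Exclusive prefix sum over only the lanes named in the mask.
    prefixSums[idx] = WaveMultiPrefixSum(values[idx], mask);
}
```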
 Synchronizes all lanes to the same GroupMemoryBarrierWithWaveSync in program flow. Orders group shared memory accesses such that writes before the barrier are visible to accesses after it.

+
 Wave Rotate Intrinsics
 ======================

@@ -250,16 +258,77 @@ T WaveRotate(T value, uint delta);
 T WaveClusteredRotate(T value, uint delta, constexpr uint clusterSize);
 ```

+Wave Multi Intrinsics
+======================
+
+`WaveMulti` intrinsics take an explicit `uint4` mask of lanes to operate on. They correspond to the subgroup partitioned intrinsics provided by `SPV_NV_shader_subgroup_partitioned` and the `WaveMultiPrefix*` intrinsics provided by HLSL SM 6.5. HLSL's `WaveMulti*` intrinsics only provide exclusive prefix arithmetic operations, while Vulkan's `SPV_NV_shader_subgroup_partitioned` provides both inclusive/exclusive prefix (scan) and non-prefix (reduction) arithmetic and min/max operations.
+
+Slang adds new `WaveMulti*` intrinsics in addition to HLSL's `WaveMultiPrefix*` to allow generating all partitioned intrinsics supported in SPIRV. The new `WaveMulti*` intrinsics, which are not standard HLSL, are only supported when targeting SPIRV, GLSL and CUDA. The inclusive variants of HLSL's `WaveMultiPrefix*` intrinsics are emulated by Slang by performing an additional operation in the current invocation. Metal and WGSL targets do not support `WaveMulti` intrinsics.
+```
+// Across lane ops. These are only supported when targeting SPIRV, GLSL and CUDA.
+
+T WaveMultiSum(T value, uint4 mask);
+
+T WaveMultiProduct(T value, uint4 mask);
+
+T WaveMultiMin(T value, uint4 mask);
+
+T WaveMultiMax(T value, uint4 mask);
+
+T WaveMultiBitAnd(T value, uint4 mask);
+
+T WaveMultiBitOr(T value, uint4 mask);
+
+T WaveMultiBitXor(T value, uint4 mask);
+
+
+// Prefix arithmetic operations. Supported when targeting SPIRV, GLSL, CUDA and HLSL.
+// In addition to these non-standard intrinsics, the standard `WaveMultiPrefix*`
+// intrinsics provided by SM 6.5 are detailed in the `Standard Wave Intrinsics` section.
+
+T WaveMultiPrefixInclusiveSum(T value, uint4 mask);
+
+T WaveMultiPrefixInclusiveProduct(T value, uint4 mask);
+
+T WaveMultiPrefixInclusiveBitAnd(T value, uint4 mask);
+
+T WaveMultiPrefixInclusiveBitOr(T value, uint4 mask);
+
+T WaveMultiPrefixInclusiveBitXor(T value, uint4 mask);
+
+T WaveMultiPrefixExclusiveSum(T value, uint4 mask);
+
+T WaveMultiPrefixExclusiveProduct(T value, uint4 mask);
+
+T WaveMultiPrefixExclusiveBitAnd(T value, uint4 mask);
+
+T WaveMultiPrefixExclusiveBitOr(T value, uint4 mask);
+
+T WaveMultiPrefixExclusiveBitXor(T value, uint4 mask);
+
+
+// Prefix min/max operations. Supported when targeting SPIRV and GLSL.
+
+T WaveMultiPrefixInclusiveMin(T value, uint4 mask);
+
+T WaveMultiPrefixInclusiveMax(T value, uint4 mask);
+
+T WaveMultiPrefixExclusiveMin(T value, uint4 mask);
+
+T WaveMultiPrefixExclusiveMax(T value, uint4 mask);
+```
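
The signatures above might be combined as in the following sketch, which computes a per-partition reduction and both prefix variants for the same mask. It assumes a SPIRV, GLSL or CUDA target, where the reduction `WaveMultiSum` is listed as supported, and assumes `WaveMatch` is available there to build the mask. The buffer names and entry point are hypothetical.

```hlsl
// Hypothetical buffers: one key and one value per lane.
StructuredBuffer<uint> keys;
StructuredBuffer<uint> values;
RWStructuredBuffer<uint3> results;

[shader("compute")]
[numthreads(32, 1, 1)]
void computeMain(uint3 dispatchThreadID : SV_DispatchThreadID)
{
    uint idx = dispatchThreadID.x;
    uint value = values[idx];

    // Lanes with equal keys form one partition (assumes WaveMatch is supported).
    uint4 mask = WaveMatch(keys[idx]);

    // Reduction over the partition: every lane in it gets the same total.
    uint total = WaveMultiSum(value, mask);

    // Scans over the partition. The inclusive result is expected to equal
    // the exclusive result combined with this lane's own value.
    uint exclusive = WaveMultiPrefixExclusiveSum(value, mask);
    uint inclusive = WaveMultiPrefixInclusiveSum(value, mask);

    results[idx] = uint3(total, exclusive, inclusive);
}
```

The relation `inclusive == exclusive + value` is the "additional operation in the current invocation" that the text describes for emulating the inclusive variants on top of HLSL's exclusive `WaveMultiPrefix*` intrinsics.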
+
+
 Wave Mask Intrinsics
 ====================

 CUDA has a different programming model for inter warp/wave communication based around masks of active lanes. This is because the CUDA programming model allows for divergence that is more granular than just on program flow, and because there is no implied reconvergence at the end of a conditional.

-In the future Slang may have the capability to work out the masks required such that the regular HLSL Wave intrinsics work. As it stands there does not appear to be any way to implement the regular Wave intrinsics directly. To work around this problem we introduce 'WaveMask' intrinsics, which are essentially the same as the regular HLSL Wave intrinsics with the first parameter as the WaveMask which identifies the participating lanes.
+In the future Slang may have the capability to work out the masks required such that the regular HLSL Wave intrinsics work. As it stands there does not appear to be any way to implement the regular Wave intrinsics directly. To work around this problem we introduce 'WaveMask' intrinsics, which are essentially the same as the regular HLSL Wave intrinsics with the first parameter as the WaveMask which identifies the participating lanes.

-The WaveMask intrinsics will work across targets, but *only* if on CUDA targets the mask captures exactly the same lanes as the 'Active' lanes concept in HLSL. If the masks deviate then the behavior is undefined. On non CUDA based targets currently the mask is ignored. This behavior may change on GLSL which has an extension to support a more CUDA like behavior.
+The WaveMask intrinsics will work across targets, but *only* if on CUDA targets the mask captures exactly the same lanes as the 'Active' lanes concept in HLSL. If the masks deviate then the behavior is undefined. On non CUDA based targets the mask currently *may* be ignored, depending on the intrinsics supported by the target.

-Most of the `WaveMask` functions are identical to the regular Wave intrinsics, but they take a WaveMask as the first parameter, and the intrinsic name starts with `WaveMask`.
+Most of the `WaveMask` functions are identical to the regular Wave intrinsics, but they take a WaveMask as the first parameter, and the intrinsic name starts with `WaveMask`. Also note that the `WaveMask` functions were introduced in Slang before the `WaveMulti` intrinsics, and the two effectively function the same other than the mask width in bits (`uint` vs `uint4`). The `WaveMulti` intrinsics map more closely to SPIRV and HLSL, and are recommended over the `WaveMask` intrinsics whenever possible. We plan to deprecate the `WaveMask` intrinsics some time in the future.