Commit 0476b57

Add full support for SPV_NV_shader_subgroup_partitioned (#7103)
* Properly implement WaveMask* variants of WaveMultiPrefix* intrinsics
* More partitioned intrinsics
* More partitioned intrinsics and cleaned up non-prefixed WaveMask* implementations
* Refactor HLSL WaveMultiPrefix* implementations
* fix cap atoms
* Clean up implementation
* Add GLSL intrinsics and cleanup
* Add tests
* Fix affected capability test
* Update and fix tests
* Move expected.txt file
* Refactor WaveMask* to call WaveMulti*
* Refactor SPIRV/GLSL preamble code
* Enable emit-via-glsl tests
* remove wave_multi_prefix capability in favor of subgroup_partitioned
* Update docs
* Update cap atoms doc
1 parent 554be7a commit 0476b57

23 files changed, +2138 -775 lines changed

docs/command-line-slangc-reference.md

Lines changed: 0 additions & 1 deletion
@@ -1306,7 +1306,6 @@ A capability describes an optional feature that a target may or may not support.
 * `atomicfloat2`
 * `fragmentshaderbarycentric`
 * `shadermemorycontrol`
-* `wave_multi_prefix`
 * `bufferreference`
 * `bufferreference_int64`
 * `cooperative_vector`

docs/user-guide/a3-02-reference-capability-atoms.md

Lines changed: 0 additions & 3 deletions
@@ -964,9 +964,6 @@ Compound Capabilities
 `shadermemorycontrol`
 > (gfx targets) Capabilities needed to use memory barriers
 
-`wave_multi_prefix`
-> Capabilities needed to use HLSL tier wave operations
-
 `bufferreference`
 > Capabilities needed to use GLSL buffer-reference's
 

docs/wave-intrinsics.md

Lines changed: 77 additions & 8 deletions
@@ -1,9 +1,13 @@
+
 Wave Intrinsics
 ===============
 
-Slang has support for Wave intrinsics introduced to HLSL in [SM6.0](https://docs.microsoft.com/en-us/windows/win32/direct3dhlsl/hlsl-shader-model-6-0-features-for-direct3d-12) and [SM6.5](https://github.com/microsoft/DirectX-Specs/blob/master/d3d/HLSL_ShaderModel6_5.md). All intrinsics are available on D3D12, and a subset on Vulkan.
+Slang has support for Wave intrinsics introduced to HLSL in [SM6.0](https://docs.microsoft.com/en-us/windows/win32/direct3dhlsl/hlsl-shader-model-6-0-features-for-direct3d-12) and [SM6.5](https://github.com/microsoft/DirectX-Specs/blob/master/d3d/HLSL_ShaderModel6_5.md). All intrinsics are available on D3D12 and Vulkan.
+
+On GLSL targets such as Vulkan wave intrinsics map to ['subgroup' extension] (https://github.com/KhronosGroup/GLSL/blob/master/extensions/khr/GL_KHR_shader_subgroup.txt). Vulkan supports a number of masked wave operations through `SPV_NV_shader_subgroup_partitioned` that are not supported by HLSL.
+
+There is no subgroup support for Matrix types, and currently this means that Matrix is not a supported type for Wave intrinsics on Vulkan, but may be in the future.
 
-On GLSL targets such as Vulkan wave intrinsics map to ['subgroup' extension] (https://github.com/KhronosGroup/GLSL/blob/master/extensions/khr/GL_KHR_shader_subgroup.txt). There is no subgroup support for Matrix types, and currently this means that Matrix is not a supported type for Wave intrinsics on Vulkan, but may be in the future.
 
 Also introduced are some 'non standard' Wave intrinsics which are only available on Slang. All WaveMask intrinsics are non standard. Other non standard intrinsics expose more accurately different behaviours which are either not distinguished on HLSL, or perhaps currently unavailable. Two examples would be `WaveShuffle` and `WaveBroadcastLaneAt`.
 

@@ -31,7 +35,10 @@ Using WaveMask intrinsics is generally more verbose and prone to error than the
 * Might allow for higher performance (for example it gives more control of divergence)
 * Maps most closely to CUDA
 
-On D3D12 and Vulkan the WaveMask intrinsics can be used, but the mask is effectively ignored. For this to work across targets including CUDA, the mask must be calculated such that it exactly matches that of HLSL defined 'active' lanes, else the behavior is undefined.
+For this to work across targets including CUDA, the mask must be calculated such that it exactly matches that of HLSL defined 'active' lanes, else the behavior is undefined.
+
+On D3D12 and Vulkan the WaveMask intrinsics can be used, but the mask may be ignored depending on target's support for partitioned/masked wave intrinsics. SPIRV provides support for a wide variety of operations through the `SPV_NV_shader_subgroup_partitioned` extension while HLSL only provides a small subset of operations through `WaveMultiPrefix*` intrinsics. The difference between Slang's `WaveMask` and these targets' partitioned wave intrinsics is that they accept a `uint4` mask instead of a `uint` mask. `WaveMask*` intrinsics effectively gets translated to `WaveMulti*` intrinsics when targeting SPIRV/GLSL and HLSL. Please consult [Wave Multi Intrinsics](#wave-multi-intrinsics) for more details, including what masked operations are supported by each target.
+
 
 The WaveMask intrinsics are a non standard Slang feature, and may change in the future.
 
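
Note on the hunk above: the added text says `WaveMask` takes a `uint` mask while the partitioned intrinsics take a `uint4` mask. The following is a minimal host-side Python sketch of that relationship, not Slang's implementation. The helper names `widen_mask` and `lane_in_mask` are hypothetical, and the layout (lanes 0-31 in the first component, zeros in the other three) is an assumption made for illustration.

```python
def widen_mask(mask32):
    """Model a 32-bit WaveMask as a uint4-style tuple (x, y, z, w).

    Assumption: lanes 0-31 live in the first component; the rest are zero.
    """
    if not 0 <= mask32 <= 0xFFFFFFFF:
        raise ValueError("mask must fit in 32 bits")
    return (mask32, 0, 0, 0)


def lane_in_mask(mask4, lane):
    """Return True if `lane` (0-127) is set in the four-word mask."""
    word, bit = divmod(lane, 32)
    return (mask4[word] >> bit) & 1 == 1


mask4 = widen_mask(0b1011)
active = [lane for lane in range(128) if lane_in_mask(mask4, lane)]
print(active)  # [0, 1, 3]
```

Under this model a `uint` mask can only name the first 32 lanes, which is one way to read the diff's remark that the two families differ in mask width.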

@@ -103,10 +110,10 @@ void computeMain(uint3 dispatchThreadID : SV_DispatchThreadID)
     outputBuffer[idx] = value;
 }
 ```
-
 ## WaveMulti
 
-The standard 'Multi' intrinsics were added to HLSL is SM6.5, they can specify a mask of lanes via uint4. They introduce some intrinsics that work in a similar fashion to the `WaveMask` intrinsics. The available intrisnics is currently significantly restricted compared to WaveMask.
+The standard 'Multi' intrinsics were added to HLSL is SM 6.5 and are available in SPIRV through `SPV_NV_shader_subgroup_partitioned`, they can specify a mask of lanes via uint4. SPIRV provide non-prefix (reduction) and prefix (scan) intrinsics for arithmetic and min/max operations, while HLSL only provides a subset of these, namely exclusive prefix arithmetic operations.
+
 
 Standard Wave intrinsics
 =========================
@@ -236,6 +243,7 @@ void GroupMemoryBarrierWithWaveSync();
 
 Synchronizes all lanes to the same GroupMemoryBarrierWithWaveSync in program flow. Orders group shared memory accesses such that accesses after the barrier can be seen by writes before.
 
+
 Wave Rotate Intrinsics
 ======================
 

@@ -250,16 +258,77 @@ T WaveRotate(T value, uint delta);
 T WaveClusteredRotate(T value, uint delta, constexpr uint clusterSize);
 ```
 
+Wave Multi Intrinsics
+======================
+
+`WaveMulti` intrinsics take an explicit `uint4` mask of lanes to operate on. They correspond to the subgroup partitioned intrinsics provided by `SPV_NV_shader_subgroup_partitioned` and the `WaveMultiPrefix*` intrinsics provided by HLSL SM 6.5. HLSL's `WaveMulti*` intrinsics only provide operations for exclusive prefix arithmetic operations, while Vulkan's `SPV_NV_shader_subgroup_partitioned` provides operations for both inclusive/exclusive prefix (scan) and non-prefix (reduction) arithmetic and min/max operations.
+
+Slang adds new `WaveMulti*` intrinsics in addition to HLSL's `WaveMultiPrefix*` to allow generating all partitioned intrinsics supported in SPIRV. The new, non-standard HLSL, `WaveMulti*` intrinsics are only supported when targeting SPIRV, GLSL and CUDA. The inclusive variants of HLSL's `WaveMultiPrefix*` intrinsics are emulated by Slang by performing an additional operation in the current invocation. Metal and WGSL targets do not support `WaveMulti` intrinsics.
+```
+// Across lane ops. These are only supported when targeting SPIRV, GLSL and CUDA.
+
+T WaveMultiSum(T value, uint4 mask);
+
+T WaveMultiProduct(T value, uint4 mask);
+
+T WaveMultiMin(T value, uint4 mask);
+
+T WaveMultiMax(T value, uint4 mask);
+
+T WaveMultiBitAnd(T value, uint4 mask);
+
+T WaveMultiBitOr(T value, uint4 mask);
+
+T WaveMultiBitXor(T value, uint4 mask);
+
+
+// Prefix arithmetic operations. Supported when targeting SPIRV, GLSL, CUDA and HLSL.
+// In addition to these non-HLSL standard intrinsics are the standard `WaveMultiPrefix*`
+// intrinsics provided by SM 6.5, detailed in the `Standard Wave Intrinsics` section.
+
+T WaveMultiPrefixInclusiveSum(T value, uint4 mask);
+
+T WaveMultiPrefixInclusiveProduct(T value, uint4 mask);
+
+T WaveMultiPrefixInclusiveBitAnd(T value, uint4 mask);
+
+T WaveMultiPrefixInclusiveBitOr(T value, uint4 mask);
+
+T WaveMultiPrefixInclusiveBitXor(T value, uint4 mask);
+
+T WaveMultiPrefixExclusiveSum(T value, uint4 mask);
+
+T WaveMultiPrefixExclusiveProduct(T value, uint4 mask);
+
+T WaveMultiPrefixExclusiveBitAnd(T value, uint4 mask);
+
+T WaveMultiPrefixExclusiveBitOr(T value, uint4 mask);
+
+T WaveMultiPrefixExclusiveBitXor(T value, uint4 mask);
+
+
+// Prefix min/max operations. Supported when targeting SPIRV and GLSL.
+
+T WaveMultiPrefixInclusiveMin(T value, uint4 mask);
+
+T WaveMultiPrefixInclusiveMax(T value, uint4 mask);
+
+T WaveMultiPrefixExclusiveMin(T value, uint4 mask);
+
+T WaveMultiPrefixExclusiveMax(T value, uint4 mask);
+```
+
+
 Wave Mask Intrinsics
 ====================
 
 CUDA has a different programming model for inter warp/wave communication based around masks of active lanes. This is because the CUDA programming model allows for divergence that is more granualar than just on program flow, and that there isn't implied reconvergence at the end of a conditional.
 
-In the future Slang may have the capability to work out the masks required such that the regular HLSL Wave intrinsics work. As it stands there does not appear to be any way to implement the regular Wave intrinsics directly. To work around this problem we introduce 'WaveMask' intrinsics, which are essentially the same as the regular HLSL Wave intrinsics with the first parameter as the WaveMask which identifies the participating lanes.
+In the future Slang may have the capability to work out the masks required such that the regular HLSL Wave intrinsics work. As it stands there does not appear to be any way to implement the regular Wave intrinsics directly. To work around this problem we introduce 'WaveMask' intrinsics, which are essentially the same as the regular HLSL Wave intrinsics with the first parameter as the WaveMask which identifies the participating lanes.
 
-The WaveMask intrinsics will work across targets, but *only* if on CUDA targets the mask captures exactly the same lanes as the 'Active' lanes concept in HLSL. If the masks deviate then the behavior is undefined. On non CUDA based targets currently the mask is ignored. This behavior may change on GLSL which has an extension to support a more CUDA like behavior.
+The WaveMask intrinsics will work across targets, but *only* if on CUDA targets the mask captures exactly the same lanes as the 'Active' lanes concept in HLSL. If the masks deviate then the behavior is undefined. On non CUDA based targets currently the mask *may* be ignored depending on the intrinsics supported by the target.
 
-Most of the `WaveMask` functions are identical to the regular Wave intrinsics, but they take a WaveMask as the first parameter, and the intrinsic name starts with `WaveMask`.
+Most of the `WaveMask` functions are identical to the regular Wave intrinsics, but they take a WaveMask as the first parameter, and the intrinsic name starts with `WaveMask`. Also note that the `WaveMask` functions are introduced in Slang before the `WaveMulti` intrinsics, and they effectively function the same other than the mask width in bits (`uint` vs `uint4`). The `WaveMulti` intrinsics map closer to SPIRV and HLSL, and are recommended to be used over `WaveMask` intrinsics whenever possible. We plan to deprecate the `WaveMask` intrinsics some time in the future.
 
 ```
 WaveMask WaveGetConvergedMask();
0 commit comments

Comments
 (0)