Commit f173692
Docs: Update fusion API doc and add link to samples (#3870)

* Update fusion API doc and add link to samples
* Add new rows to table
* Update graphs

1 parent: 98f608d

File tree: 7 files changed (+111, -11 lines)

docs/data/how-to/bn_activ_fused.png (binary, 138 KB added)

docs/data/how-to/cba.png (binary, 184 KB added)

docs/data/how-to/na.png (binary, 47.6 KB removed; file not shown)

docs/how-to/use-fusion-api.rst (98 additions, 10 deletions)
@@ -269,22 +269,110 @@ need to worry about additional cleanup.
 Supported fusions
 =================================================
 
-The following tables outline the supported fusions for ``FP32`` and ``FP16``, including any applicable
+The following tables outline the supported fusions for ``FP32``, ``FP16``, and ``BFP16``, including any applicable
 constraints.
 
 .. note::
 
-   Fusion Plans with grouped convolutions are not supported.
+   Fusion Plans with grouped convolutions are supported in the inference direction for
+   convolution, bias, and activation.
+
+The following abbreviations apply to the combination column in the tables below:
+
+* **C**: Convolution
+* **B**: Bias
+* **N**: Batch Normalization
+* **A**: Activation
+
+For example, CBA refers to convolution plus bias plus activation.
+
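To make the CBA abbreviation concrete, here is a plain-Python sketch of the math only (illustrative, not MIOpen API code; all names are hypothetical) showing that a fused convolution + bias + activation pass computes the same values as the three operations applied in sequence:

```python
# Illustrative sketch of what a CBA (convolution + bias + activation)
# fusion computes. Plain Python for clarity; this is NOT MIOpen API code.

def conv1d(x, w):
    """Valid-mode 1-D convolution (cross-correlation) of x with filter w."""
    k = len(w)
    return [sum(x[i + j] * w[j] for j in range(k)) for i in range(len(x) - k + 1)]

def relu(v):
    return [max(0.0, e) for e in v]

def cba_fused(x, w, bias):
    # A fused kernel produces the same result as three separate passes,
    # but without writing the intermediate tensors back to memory.
    k = len(w)
    return [max(0.0, sum(x[i + j] * w[j] for j in range(k)) + bias)
            for i in range(len(x) - k + 1)]

x = [1.0, -2.0, 3.0, 0.5, -1.0]
w = [0.5, 1.0, -0.5]
bias = 0.1

unfused = relu([v + bias for v in conv1d(x, w)])
fused = cba_fused(x, w, bias)
assert fused == unfused
```

The point of fusing is that the intermediate convolution and bias results never round-trip through memory between stages.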
+Convolution-based FP32 fusion for inference
+-------------------------------------------
+
+The following table applies to single-precision floating point.
+
+.. csv-table::
+   :header: "Combination","Conv algo","Stride","Filter dims","N mode","Activations","Other constraints"
+   :widths: 15, 15, 15, 20, 12, 20, 20
+
+   "CBNA","Direct","1 and 2","3x3, 5x5, 7x7, 9x9, 11x11","All","All","stride and padding must be either 1 or 2"
+   "CBA","Direct","--","1x1","--","All","stride and padding not supported"
+   "CBA","Winograd","1","1x1, 2x2","N/A","Relu, Leaky Relu","c >= 18"
+   "CBA","Winograd","1","3x3","--","Relu, Leaky Relu","c >= 18 and c is even"
+   "CBA","Winograd","1","4x4, 5x5, 6x6","--","Relu, Leaky Relu","4 x c >= 18"
+   "CBA","Winograd","1","7x7, 8x8, 9x9","--","Relu, Leaky Relu","12 x c >= 18"
+   "CBA","Winograd","1","10x10, 11x11, 12x12","--","Relu, Leaky Relu","16 x c >= 18"
+   "CBA","Winograd","1","larger filter sizes","--","Relu, Leaky Relu","none"
+   "CBA","Winograd","2","1x1","--","Relu, Leaky Relu","2 x c >= 18"
+   "CBA","Winograd","2","2x2, 3x3, 4x4, 5x5, 6x6","--","Relu, Leaky Relu","4 x c >= 18"
+   "CBA","Winograd","2","7x7","--","Relu, Leaky Relu","12 x c >= 18"
+   "CBA","Winograd","2","8x8, 9x9, 10x10, 11x11, 12x12","--","Relu, Leaky Relu","16 x c >= 18"
+   "CBA","Winograd","2","larger filter sizes","--","Relu, Leaky Relu","none"
+   "CBA","CK","--","--","--","Relu, Clipped Relu, CLAMP","none"
+   "NA","--","--","--","All","All","padding not supported"
+   "CA","Direct","--","1x1","--","All","stride and padding not supported"
+   "CA","CK","--","--","--","Relu, Clipped Relu, CLAMP","none"
 
-**C = convolution, B = bias, N = batch normalization, A = activation**
+.. note::
 
-.. image:: ../data/how-to/fp32fusions.png
-   :width: 800
-   :alt: Convolution based fp32 fusion
+   N mode is either spatial or per activation. For CBA, other asymmetric kernels
+   are supported but for brevity are not enumerated here.
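The Winograd CBA rows reduce to a per-stride, per-filter-size channel-count rule. The sketch below is one illustrative reading of the "Other constraints" column, assuming ``c`` denotes the convolution's input channel count (the table does not define it); it is not MIOpen validation code:

```python
# Hypothetical helper encoding the FP32 Winograd CBA constraint rows.
# Assumption: "c" in the table is the input channel count. Illustrative only.

def winograd_cba_multiplier(stride, filter_size):
    """Return m such that the row's constraint is m * c >= 18,
    or None where the table lists no channel constraint ("none")."""
    if stride == 1:
        if filter_size in (1, 2, 3):
            return 1              # "c >= 18" (3x3 also requires even c)
        if filter_size in (4, 5, 6):
            return 4
        if filter_size in (7, 8, 9):
            return 12
        if filter_size in (10, 11, 12):
            return 16
        return None               # larger filter sizes: no constraint
    if stride == 2:
        if filter_size == 1:
            return 2
        if filter_size in (2, 3, 4, 5, 6):
            return 4
        if filter_size == 7:
            return 12
        if filter_size in (8, 9, 10, 11, 12):
            return 16
        return None
    raise ValueError("the table covers strides 1 and 2 only")

def winograd_cba_supported(stride, filter_size, c):
    m = winograd_cba_multiplier(stride, filter_size)
    if m is None:
        return True
    if stride == 1 and filter_size == 3 and c % 2 != 0:
        return False              # "c >= 18 and c is even"
    return m * c >= 18

print(winograd_cba_supported(1, 3, 18))   # True: c >= 18 and even
print(winograd_cba_supported(1, 3, 19))   # False: odd channel count
print(winograd_cba_supported(2, 7, 2))    # True: 12 x 2 >= 18
```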
 
-.. image:: ../data/how-to/fp16fusions.png
-   :width: 800
-   :alt: Convolution based fp16 fusion
+
+
+Convolution-based FP16 fusion for inference
+-------------------------------------------
+
+The following table applies to half-precision floating point.
+
+.. csv-table::
+   :header: "Combination","Conv algo","Stride","Filter dims","N mode","Activations","Other constraints"
+   :widths: 15, 15, 15, 20, 12, 20, 20
+
+   "CBNA","Direct","1 and 2","3x3, 5x5, 7x7, 9x9, 11x11","All","All","stride and padding must be either 1 or 2"
+   "CBA","Direct","--","1x1","--","All","stride and padding not supported"
+   "CBA","CK","--","--","--","Relu, Clipped Relu, CLAMP","none"
+   "CA","Direct","--","1x1","--","All","stride and padding not supported"
+   "CA","CK","--","--","--","Relu, Clipped Relu, CLAMP","none"
+
+.. note::
+
+   N mode is either spatial or per activation.
+
+
+Convolution-based BFP16 fusion for inference
+--------------------------------------------
+
+The following table applies to half-precision block floating point.
+
+.. csv-table::
+   :header: "Combination","Conv algo","Stride","Filter dims","N mode","Activations","Other constraints"
+   :widths: 15, 15, 15, 20, 12, 20, 20
+
+   "CBNA","Direct","1 and 2","3x3, 5x5, 7x7, 9x9, 11x11","All","All","stride and padding must be either 1 or 2"
+   "CBA","Direct","--","1x1","--","All","stride and padding not supported"
+   "CBA","CK","--","--","--","Relu, Clipped Relu, CLAMP","none"
+   "CA","Direct","--","1x1","--","All","stride and padding not supported"
+   "CA","CK","--","--","--","Relu, Clipped Relu, CLAMP","none"
+
+.. note::
+
+   N mode is either spatial or per activation.
+
+
+Batch Normalization-based fusion for FP32, BFP16, and FP16 for inference and training
+-------------------------------------------------------------------------------------
+
+The following table applies to both full-precision and half-precision floating point.
+
+.. csv-table::
+   :header: "Combination","N mode","Activations","Constraints"
+   :widths: 30, 15, 15, 15
+
+   "NA for inference","All","All","None"
+   "NA forward training","All","All","None"
+   "NA backward training","All","All","None"
+
+.. note::
+
+   N mode is either spatial or per activation.
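As with CBA, the NA rows describe a batch normalization followed by an activation. A minimal plain-Python sketch of that math (spatial-mode normalization over a 1-D input with a ReLU activation; illustrative only, not MIOpen API code):

```python
# Illustrative sketch of what an NA (batch normalization + activation)
# fusion computes. Plain Python for clarity; this is NOT MIOpen API code.
import math

def batchnorm_then_relu(x, gamma, beta, eps=1e-5):
    """Normalize a 1-D list with batchnorm, then apply ReLU, as a fused
    kernel would: one pass over the normalized values, no intermediate."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    inv_std = 1.0 / math.sqrt(var + eps)
    return [max(0.0, gamma * (v - mean) * inv_std + beta) for v in x]

# Values below the batch mean normalize to negatives and are clipped to 0.
out = batchnorm_then_relu([1.0, 2.0, 3.0, 4.0], gamma=1.0, beta=0.0)
```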
 
 Comparing performance with non-fused kernels
 =================================================
@@ -298,6 +386,6 @@ non-fused version. All configurations have a batch size of 64:
 
 The following graph depicts the speedup obtained by fusing BatchNorm (in spatial mode) with activation:
 
-.. image:: ../data/how-to/na.png
+.. image:: ../data/how-to/bn_activ_fused.png
    :width: 800
    :alt: BatchNorm activation fusion

docs/how-to/use-nhwc-batchnorm-in-pytorch.rst (1 addition, 1 deletion)

@@ -46,7 +46,7 @@ and is performed for each training batch.
 For more information on Batch Normalization, see `Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift <https://arxiv.org/abs/1502.03167>`_.
 
 Enabling or disabling NHWC Batch Normalization for MIOpen using PyTorch
-=============================================================
+=======================================================================
 
 The PyTorch open-source tensor library provides support for using NHWC Batch Normalization with MIOpen.
 In addition to Batch Normalization, NHWC support is also available for convolution and other MIOpen features.

docs/index.rst (7 additions, 0 deletions)

@@ -27,6 +27,9 @@ The MIOpen public repository is located at `<https://github.com/ROCm/MIOpen>`_.
 * :doc:`Build MIOpen for embedded systems <./install/embed>`
 * :doc:`Build MIOpen using Docker <./install/docker-build>`
 
+.. grid:: 2
+   :gutter: 3
+
 .. grid-item-card:: Conceptual
 
    * :doc:`Find database <./conceptual/finddb>`
@@ -43,6 +46,10 @@ The MIOpen public repository is located at `<https://github.com/ROCm/MIOpen>`_.
 * :doc:`Use the find APIs and immediate mode <./how-to/find-and-immediate>`
 * :doc:`Use NHWC Batch Normalization with PyTorch <./how-to/use-nhwc-batchnorm-in-pytorch>`
 
+.. grid-item-card:: Samples
+
+   * `MIOpen samples <https://github.com/ROCm/MIOpen/tree/develop/samples>`_
+
 .. grid-item-card:: Reference
 
    * :doc:`API library <reference/index>`

docs/sphinx/_toc.yml.in (5 additions, 0 deletions)

@@ -43,6 +43,11 @@ subtrees:
   - file: how-to/use-nhwc-batchnorm-in-pytorch.rst
     title: Use NHWC Batch Normalization with PyTorch
 
+- caption: Samples
+  entries:
+  - url: https://github.com/ROCm/MIOpen/tree/develop/samples
+    title: MIOpen samples
+
 - caption: Reference
   entries:
   - file: reference/index
