
Commit 0f10081

Authored by Sunny-Anand, tungld, chentong319, and christopherlmunoz

Cherry pick updates from main for z17 and fix for ZHighConstantPropagation in QuantizedStick (#3133)
* Fix an error in ZHighConstantPropagation for QuantizedStick (#3112)

  Signed-off-by: Tung D. Le <[email protected]>
  Signed-off-by: Sunny Anand <[email protected]>

* Add z17 for -march (#3113)

  * done
  * convert
  * fix
  * format

  Signed-off-by: Tong Chen <[email protected]>
  Signed-off-by: Sunny Anand <[email protected]>

* update zdnn1.1.2 (#3130)

  Signed-off-by: Sunny Anand <[email protected]>

* Updating supported ops on NNPA md for z17. (#3120)

  * starting to update new z17 NNPA ops

  Signed-off-by: Christopher Munoz <[email protected]>
  Co-authored-by: Sunny Anand <[email protected]>
  Co-authored-by: Tung D. Le <[email protected]>

---------

Signed-off-by: Tung D. Le <[email protected]>
Signed-off-by: Sunny Anand <[email protected]>
Signed-off-by: Tong Chen <[email protected]>
Signed-off-by: Christopher Munoz <[email protected]>
Co-authored-by: Tung D. Le <[email protected]>
Co-authored-by: Tong Chen <[email protected]>
Co-authored-by: Christopher Munoz <[email protected]>
1 parent 660bd8e commit 0f10081

File tree

11 files changed: +161 -92 lines changed

docs/Quantization-NNPA.md

Lines changed: 8 additions & 5 deletions

@@ -10,11 +10,7 @@ There are two approaches to using quantization in the onnx-mlir compiler, depend
 - The input model is a non-quantized model, e.g. operations operate on float32 data types. In this case, the onnx-mlir compiler provides several quantization options in order to quantize the model during compilation, then run the compiled model on NNPA. The remaining of this document describes this approach.
 - In this approach, the compiler only supports dynamic quantization.
 
-In both approaches, the following constraints are applied:
-- Only per-tensor quantization is supported, meaning `scale` and `zero_point` are computed per-tensor and are scalar values.
-- Target quantization data type is 8-bit signed-integer.
-
-Quantization requires NNPA in IBM Telum II, meaning that the following compile flags must be specified to enable quantization: `-maccel=NNPA -march=arch15`.
+Quantization requires NNPA in IBM Telum II, meaning that the following compile flags must be specified to enable quantization: `-maccel=NNPA -march=z17`.
 
 # Dynamic quantization by the compiler
 
@@ -61,5 +57,12 @@ quantized_x = x/scale + zero_point
 It is often the case that symmetric quantization leads to better inference performance but poorer accuracy than asymmetric quantization.
 Users may want to experiment with different quantization schemes to find the best combination for their own model.
 
+# Limitations
+
+- Only per-tensor quantization is supported, meaning scale and zero_point are computed per-tensor and are scalar values. Per-channel quantization is not supported yet.
+- Target quantization data type is 8-bit signed-integer.
+- Asymmetric quantization for weights is not yet supported.
+- Blocked quantization is not supported.
+
 # Resources
 - [A visual guide to quantization](https://www.maartengrootendorst.com/blog/quantization/)
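For reference, the scheme this document describes (`quantized_x = x/scale + zero_point`, per-tensor, 8-bit signed) can be sketched in a few lines. This is a minimal illustration with hypothetical names, not onnx-mlir's implementation:

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <vector>

// Minimal sketch of per-tensor asymmetric dynamic quantization to int8,
// following quantized_x = x / scale + zero_point as described above.
// Names are illustrative only.
void quantizeTensorI8(const std::vector<float> &x, float &scale,
    int8_t &zeroPoint, std::vector<int8_t> &out) {
  const float qmin = -128.0f, qmax = 127.0f;
  // Per-tensor range, widened to include 0 so that 0.0f maps exactly.
  float minVal = 0.0f, maxVal = 0.0f;
  for (float v : x) {
    minVal = std::min(minVal, v);
    maxVal = std::max(maxVal, v);
  }
  scale = (maxVal - minVal) / (qmax - qmin);
  if (scale == 0.0f)
    scale = 1.0f; // all-zero tensor; any scale works
  zeroPoint = static_cast<int8_t>(std::round(qmin - minVal / scale));
  out.resize(x.size());
  for (size_t i = 0; i < x.size(); ++i)
    out[i] = static_cast<int8_t>(
        std::clamp(std::round(x[i] / scale + zeroPoint), qmin, qmax));
}
```

Symmetric quantization is the special case where the range is forced to [-max|x|, max|x|], making `zero_point` 0; as the document notes, that often trades accuracy for speed.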

docs/SupportedONNXOps-NNPA.md

Lines changed: 40 additions & 28 deletions

@@ -8,38 +8,50 @@ Onnx-mlir currently supports ONNX operations targeting up to opset 22. Limitatio
 * Operations are defined by the [ONNX Standard](https://github.com/onnx/onnx/blob/main/docs/Operators.md).
 * **Supported Opsets** indicates the lowest and highest opset a model may have for onnx-mlir to support compiling a model with the operator.
 * A * indicates onnx-mlir is compatible with the latest version of that operator available as of opset 22.
-* A ^ indicates onnx-mlir is compatible with the latest level of the NNPA Architecture which is z16.
+* A ^ indicates onnx-mlir is compatible with the latest level of the NNPA Architecture which is z17.
+
+
+NNPA for z16 and z17 have hardware limitations in dimension index size and tensor size, which are described in [NNPALimit.cpp](../src/Accelerators/NNPA/Support/NNPALimit.cpp). They are large enough for normal use cases, but if your model exceeds the limitations, CPU is used instead of NNPA. NNPA currently only supports DLFLOAT16 as its data type. Common data formats like FP32, FP16, BFLOAT need to undergo data conversions to the NNPA internal format DLFLOAT16. Hence ONNX ops which update their tensors to BFLOAT16 will not be natively supported on NNPA. Onnx-mlir with NNPA utilizes hardware when possible. To accomplish this, the compiler converts ONNX ops to [ZHigh](Dialects/zhigh.md) ops, then [ZLow](Dialects/zlow.md) ops, which are processed by the [IBM Z Deep Neural Network Library (zDNN)](https://github.com/IBM/zDNN).
+
+
+Refer to the [Quantization-NNPA.md](https://github.com/onnx/onnx-mlir/blob/main/docs/Quantization-NNPA.md#limitations) page for limitations pertaining to quantization support on z17.
 
 
-NNPA has hardware limitations in dimension index size and tensor size, which are described in [NNPALimit.hpp](../src/Accelerators/NNPA/Support/NNPALimit.hpp). They are large enough for normal use cases, but if your model exceeds the limitations, CPU is used instead of NNPA. NNPA currently only support DLFLOAT16 as its data type. Common data formats like FP32, FP16, BFLOAT need to undergo data conversions to the NNPA internal format DLFLOAT16. Hence ONNX ops which updated their tensors to BFLOAT16 will not be natively supported on NNPA. Onnx-mlir with NNPA utilizes hardware when possible. To accomplish this, the compiler converts ONNX ops to [ZHigh](Dialects/zhigh.md) ops, [ZLow](Dialects/zlow.md) ops, and are processed by the [IBM Z Deep Neural Network Library (zDNN)](https://github.com/IBM/zDNN).
 
 
 
 | Op |Supported Opsets (inclusive) |Minimum NNPA Level(Inclusive) |Limitations |Notes |
 | --- |--- |--- |--- |--- |
-| **Add** |6 - * |z16 |- Shape of input tensors must be the same since broadcasting is not supported.<br>- Input tensors must have static dimensions. | |
-| **AveragePool** |6 - * |z16 |- `auto_pad` must be `NOTSET`, `VALID`, and `SAME_UPPER`. If `NOTSET` is used, `pads` must be set so that the padding valid type or same upper.<br>- `ceil_mode` must be default value(0) <br>- Input and output tensors must be 4D tensors (N x C x H x W).<br>- `kernel_shape` must be static.<br>- `count_include_pad` must be default value(0).<br>- `ceil_mode` must be default value(0). | |
-| **BatchNormalization** |6 - * |z16 |Input and output tensor must be 4D(N x C x H x W). | |
-| **Conv** |6 - * |z16 |- `auto_pad` must be `NOTSET`, `VALID`, and `SAME_UPPER`. If `NOTSET` is used, `pads` must be set so that the padding valid type or same upper.<br>- Dimension in Height and weight must be static.<br>- `group` must be default value(1).<br>- `dilations` must be default value(1).<br>- Input and output tensors must have 4D (N x C x H x W).<br>- `kernel_shape` must be static. | |
-| **ConvTranspose** |6 - * |z16 |- 1D and 3D not supported because Conv1D and Conv3D not supported in zDNN. non-default `dilations` not supported because dilated convolution not supported in zDNN. | |
-| **Div** |6 - * |z16 |- Shape of input tensors must be the same since broadcasting is not supported.<br>- Input tensors must have static dimensions. | |
-| **Exp** |6 - * |z16 |Input tensor must have 4 dimensions. | |
-| **GRU** |7 - * |z16 |- `direction` and `hidden_size` in `W` must have static dimensions.<br>- `R` must have static dimensions.<br>- If `B` and `initial_h` are given, they must have static dimensions.<br>- `sequence_lens` is not supported for bidirectional GRU.<br>- `activations` must be `["Sigmoid", "Tanh", "Tanh"]`.<br>- `clip` is not supported.<br>- `linear_before_reset` must be 1.<br>- `layout` is not supported. | |
-| **Gemm** |6 - * |z16 |- `alpha` and `beta` must be default value(1).<br>- Rank of `C` must be 1 or 2. If the rank is 1, the dimension of `C` must be the same with the seconde dimension of `B`.<br>. | |
-| **GlobalAveragePool** |6 - * |z16 |- Input shape must be 4D tensor(NCHW).<br>- Dimensions in `H` and `W` must be static. | |
-| **LSTM** |7 - * |z16 |- `direction` and `hidden_size` in `W` must have static dimensions.<br>- `R` must have static dimensions.<br>- `B` and `initial_h` have static dimensions if given. `B`'s direction dim must be 1 or 2.<br>- `P`(peepholes), `activation_alpha`, and `activation_beta` are not supported.<br>- `activations` must be `["Sigmoid", "Tanh", "Tanh"]`.<br>- `clip` is not supported.<br>- `input_forget` must be default value(0).<br>- `layout` is not supported. | |
+| **Add** |6 - * |z16 - ^ |Shape of input tensors must be the same since broadcasting is not supported. | |
+| **AveragePool** |6 - * |z16 - ^ |- `auto_pad` must be `NOTSET`, `VALID`, and `SAME_UPPER`. If `NOTSET` is used, `pads` must be set so that the padding is valid type or same upper.<br>- `ceil_mode` must be default value(0) <br>- Input and output tensors must be 4D tensors (N x C x H x W).<br>- `kernel_shape` must be static.<br>- `count_include_pad` must be default value(0).<br>- `ceil_mode` must be default value(0). | |
+| **BatchNormalization** |6 - * |z16 - ^ |Input and output tensor must be 4D(N x C x H x W). | |
+| **Conv** |6 - * |z16 - ^ |- `auto_pad` must be `NOTSET`, `VALID`, and `SAME_UPPER`. If `NOTSET` is used, `pads` must be set so that the padding is valid type or same upper.<br>- Dimensions in height and width must be static.<br>- `group` must be default value(1).<br>- `dilations` must be default value(1).<br>- Input and output tensors must have 4D (N x C x H x W).<br>- `kernel_shape` must be static. | |
+| **ConvTranspose** |6 - * |z16 - ^ |- 1D and 3D not supported because Conv1D and Conv3D not supported in zDNN. Non-default `dilations` not supported because dilated convolution not supported in zDNN. | |
+| **Div** |6 - * |z16 - ^ |Shape of input tensors must be the same since broadcasting is not supported. | |
+| **Exp** |6 - * |z16 - ^ |Input tensor must have 4 dimensions. | |
+| **GRU** |7 - * |z16 - ^ |- `direction` and `hidden_size` in `W` must have static dimensions.<br>- `R` must have static dimensions.<br>- If `B` and `initial_h` are given, they must have static dimensions.<br>- `sequence_lens` is not supported for bidirectional GRU.<br>- `activations` must be `["Sigmoid", "Tanh", "Tanh"]`.<br>- `clip` is not supported.<br>- `linear_before_reset` must be 1.<br>- `layout` is not supported. | |
+| **Gelu** |20 - * |z17 - ^ |Input tensor must be less than or equal to 4 dimensions. | |
+| **Gemm** |6 - * |z16 - ^ |- `alpha` and `beta` must be default value(1).<br>- Rank of `C` must be 1 or 2. If the rank is 1, the dimension of `C` must be the same as the second dimension of `B`. | |
+| **GlobalAveragePool** |6 - * |z16 - ^ |- Input shape must be 4D tensor(NCHW).<br>- Dimensions in `H` and `W` must be static. | |
+| **LSTM** |7 - * |z16 - ^ |- `direction` and `hidden_size` in `W` must have static dimensions.<br>- `R` must have static dimensions.<br>- `B` and `initial_h` have static dimensions if given. `B`'s direction dim must be 1 or 2.<br>- `P`(peepholes), `activation_alpha`, and `activation_beta` are not supported.<br>- `activations` must be `["Sigmoid", "Tanh", "Tanh"]`.<br>- `clip` is not supported.<br>- `input_forget` must be default value(0).<br>- `layout` is not supported. | |
+| **LeakyRelu** |6 - * |z17 - ^ |Input tensor must be less than or equal to 4 dimensions. | |
 | **Log** |6 - * |z16 |Input tensor must have 4 dimensions. | |
-| **LogSoftmax** |6 - * |z16 | | |
-| **MatMul** |6 - * |z16 |Ranks of input tensors must be (Rank of A, Rank of B) = (M, N), where M >= 2 and N >= 2. | |
-| **Max** |6 - * |z16 |- Shape of input tensors must be the same since broadcasting is not supported.<br>- Input tensors must have static dimensions. | |
-| **MaxPool** |6 - * |z16 |- `auto_pad` must be `NOTSET`, `VALID`, and `SAME_UPPER`. If `NOTSET` is used, `pads` must be set so that the padding valid type or same upper.<br>- `ceil_mode` must be default value(0) <br>- Input and output tensors must be 4D tensors(N x C x H x W).<br>- `kernel_shape` must be static.<br>- `ceil_mode` must be default value(0).<br>- `dilations` must be default value(1). | |
-| **Min** |6 - * |z16 |- Shape of input tensors must be the same since broadcasting is not supported.<br>- Input tensors must have static dimensions. | |
-| **Mul** |6 - * |z16 |- Shape of input tensors should be the same since broadcasting is not supported.<br>- Input tensors must have static dimensions. | |
-| **Pow** |7 - * |z16 |- Exponent should be a scalar integer and less or equal to 64. | |
-| **ReduceMean** |6 - * |z16 |- `keepdims` must be 1.<br>- Input tensor must be 4D tensors and `axis` must be [2, 3]. | |
-| **Relu** |6 - * |z16 |Input tensor must be less than or equal to 4 dimensions. | |
-| **Sigmoid** |6 - * |z16 |Input tensor must be less than or equal to 4 dimensions. | |
-| **Softmax** |6 - * |z16 |- `axis` must be the last dimension, i.e. `rank - 1` or -1. | |
-| **Softplus** |6 - * |z16 |The operations immediately before and after the Softplus operation must be executed on the NNPA. Otherwise, Softplus is executed on the CPU. This limitation is set to avoid performance degradation. | |
-| **Sub** |6 - * |z16 |- Shape of input tensors should be the same since broadcasting is not supported.<br>- Input tensors must have static dimensions. | |
-| **Sum** |6 - * |z16 |- All inputs must have the same static shape (Broadcasting not supported.)<br>- Single input not supported. | |
-| **Tanh** |6 - * |z16 |Input tensor must be less than or equal to 4 dimensions. | |
+| **LogSoftmax** |6 - * |z16 - ^ | | |
+| **MatMul** |6 - * |z16 - ^ |Ranks of input tensors must be (Rank of A, Rank of B) = (M, N), where M >= 2 and N >= 2. | |
+| **MatMulInteger** |10 - * |z17 - ^ | | |
+| **Max** |6 - * |z16 - ^ |Shape of input tensors must be the same since broadcasting is not supported. | |
+| **MaxPool** |6 - * |z16 - ^ |- `auto_pad` must be `NOTSET`, `VALID`, and `SAME_UPPER`. If `NOTSET` is used, `pads` must be set so that the padding is valid type or same upper.<br>- `ceil_mode` must be default value(0) <br>- Input and output tensors must be 4D tensors(N x C x H x W).<br>- `kernel_shape` must be static.<br>- `ceil_mode` must be default value(0).<br>- `dilations` must be default value(1). | |
+| **Min** |6 - * |z16 - ^ |Shape of input tensors must be the same since broadcasting is not supported. | |
+| **Mul** |6 - * |z16 - ^ |Shape of input tensors should be the same since broadcasting is not supported. | |
+| **Pow** |7 - * |z16 - ^ |- Exponent should be a scalar integer and less than or equal to 64. | |
+| **QLinearMatMul** |10 - * |z17 - ^ |Only supports i8 and ui8 for zeropoint, and f32 for scale. | |
+| **ReduceMax** |6 - * |z17 - ^ |- `keepdims` must be 1.<br>- `noop_with_empty_axes` must be 0.<br>- Does not support reduction over multiple axes.<br>- We do not support `do_not_keepdims` backend tests.<br>- Only supports reduction over the innermost dimension. | |
+| **ReduceMean** |6 - * |z16 - ^ |- `keepdims` must be 1.<br>- Input tensor must be a 4D tensor and `axis` must be [2, 3]. | |
+| **ReduceMin** |6 - * |z17 - ^ |- `keepdims` must be 1.<br>- `noop_with_empty_axes` must be 0.<br>- Does not support reduction over multiple axes.<br>- We do not support `do_not_keepdims` backend tests.<br>- Only supports reduction over the innermost dimension. | |
+| **Relu** |6 - * |z16 - ^ |Input tensor must be less than or equal to 4 dimensions. | |
+| **Sigmoid** |6 - * |z16 - ^ |Input tensor must be less than or equal to 4 dimensions. | |
+| **Softmax** |6 - * |z16 - ^ |- `axis` must be the last dimension, i.e. `rank - 1` or -1. | |
+| **Softplus** |6 - * |z16 - ^ |The operations immediately before and after the Softplus operation must be executed on the NNPA. Otherwise, Softplus is executed on the CPU. This limitation is set to avoid performance degradation. | |
+| **Sqrt** |6 - * |z17 - ^ |Input tensor must be less than or equal to 4 dimensions. | |
+| **Sub** |6 - * |z16 - ^ |Shape of input tensors should be the same since broadcasting is not supported. | |
+| **Sum** |6 - * |z16 - ^ |- Shape of input tensors must be the same since broadcasting is not supported.<br>- Single input not supported. | |
+| **Tanh** |6 - * |z16 - ^ |Input tensor must be less than or equal to 4 dimensions. | |
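The recurring legality constraints in this table are mechanical: most element-wise ops require rank at most 4, and the new ReduceMax/ReduceMin entries additionally require `keepdims == 1` and a single, innermost reduction axis. A minimal C++ sketch of such pre-checks, with hypothetical names and plain integer shapes rather than the compiler's actual legality interface:

```cpp
#include <cstdint>
#include <vector>

// Hypothetical eligibility checks mirroring the table's constraints.
// Rank limit shared by many element-wise ops (Relu, Tanh, Sqrt, Gelu, ...).
bool rankOkForNNPA(const std::vector<int64_t> &shape) {
  return shape.size() <= 4;
}

// ReduceMax/ReduceMin: keepdims == 1, exactly one axis, innermost only.
bool reduceOkForNNPA(
    int64_t rank, const std::vector<int64_t> &axes, int64_t keepdims) {
  return keepdims == 1 && axes.size() == 1 &&
         (axes[0] == rank - 1 || axes[0] == -1);
}
```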

docs/SupportedOps-NNPA-supplement.md

Lines changed: 20 additions & 0 deletions

@@ -0,0 +1,20 @@
+<!--- File created to explain NNPA hardware operations and limitations. -->
+<!-- This file was created manually; refer to https://github.com/onnx/onnx-mlir/issues/3125 for more information -->
+
+# Supported Operations for Target *NNPA*.
+
+This document highlights operations that are performed on NNPA hardware that are not explicitly supported by ONNX.
+
+* **Minimum NNPA Level(Inclusive)** indicates the lowest and highest NNPA level a model may have for onnx-mlir to support compiling a model with the operator.
+* A ^ indicates onnx-mlir is compatible with the latest level of the NNPA Architecture which is z17.
+* Refer to [SupportedONNXOps-NNPA.md](https://github.com/onnx/onnx-mlir/blob/main/docs/SupportedONNXOps-NNPA.md) for ONNX supported operations.
+
+* **Improvements**
+  * Transposed MatMul - Optimization of the pattern MatMul followed by transpose, consolidated and executed on NNPA.
+  * Maximum Dimension Index Size (MDIS) - /*e1*/ 2097152, /*e2*/ 1048576, /*e3*/ 32768, /*e4*/ 32768.
+  * Stickification - Performs data conversions to NNPA's internal format, DLFLOAT16, on the NNPA.
+  * MatMul Broadcast - Adds Bcast1 support to the MatMul operations, broadcasting input_a over input_b and input_c.
+
+| Op |Minimum NNPA Level(Inclusive) |Limitations |Notes |
+| --- |--- |--- |--- |
+| **Invsqrt** |z17 - ^ | Input tensor must be less than or equal to 4 dimensions. | |
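The MDIS entries above are per-axis upper bounds, so checking whether a shape fits on the accelerator is one comparison per dimension index. A sketch, assuming a hypothetical helper name (not a zDNN or onnx-mlir API); per the docs, shapes that exceed a bound fall back to CPU:

```cpp
#include <array>
#include <cstdint>

// z17 MDIS bounds from the table above, indexed e1..e4.
constexpr std::array<int64_t, 4> kMDIS = {
    /*e1*/ 2097152, /*e2*/ 1048576, /*e3*/ 32768, /*e4*/ 32768};

// Hypothetical check: true if every dimension index is within its bound.
bool fitsWithinMDIS(int64_t e1, int64_t e2, int64_t e3, int64_t e4) {
  return e1 <= kMDIS[0] && e2 <= kMDIS[1] && e3 <= kMDIS[2] && e4 <= kMDIS[3];
}
```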

src/Accelerators/NNPA/CMakeLists.txt

Lines changed: 1 addition & 1 deletion

@@ -33,7 +33,7 @@ else()
 endif()
 
 include(zdnn.cmake)
-setup_zdnn(v1.1.1)
+setup_zdnn(v1.1.2)
 
 add_subdirectory(Dialect)
 add_subdirectory(Conversion)

src/Accelerators/NNPA/Runtime/zDNNExtension/Elementwise.c

Lines changed: 2 additions & 2 deletions

@@ -304,8 +304,8 @@ zdnn_status zdnn_tanh_ext(const zdnn_ztensor *input, zdnn_ztensor *output) {
 }
 
 // -----------------------------------------------------------------------------
-// Extension Functions for arch15
-// arch15 specific zdnn functions but with the `_ext` postfix.
+// Extension Functions for arch15/z17
+// arch15/z17 specific zdnn functions but with the `_ext` postfix.
 // Retrieve the zdnn status message
 // -----------------------------------------------------------------------------

src/Accelerators/NNPA/Support/NNPALimit.cpp

Lines changed: 2 additions & 0 deletions

@@ -30,6 +30,8 @@ static NNPALevel getNNPAFromTargetFlag(std::string str) {
       if (str[1] == '1') {
         if (str[2] == '6')
           return NNPALevel::M14;
+        if (str[2] == '7')
+          return NNPALevel::M15;
       }
     }
   } else if (str.size() == 6) {
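For context, `getNNPAFromTargetFlag` walks the flag character by character; the new branch gives `z17` the same treatment `z16` already had. A simplified, self-contained rendition of the mapping (string comparisons instead of indexed character checks; enum names mirror the `NNPALevel` values in the diff):

```cpp
#include <string>

enum class Level { NONE, M14, M15 };

// Simplified stand-in for getNNPAFromTargetFlag: z16/arch14 -> M14,
// z17/arch15 -> M15. Illustrative only; not the parser's actual logic.
Level levelFromMarch(const std::string &s) {
  if (s == "z16" || s == "arch14")
    return Level::M14;
  if (s == "z17" || s == "arch15")
    return Level::M15;
  return Level::NONE;
}
```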

src/Accelerators/NNPA/Transform/ZHigh/ZHighConstPropagation.cpp

Lines changed: 4 additions & 2 deletions

@@ -357,6 +357,8 @@ static void replaceOpAndGC(
     // v is consumed by only the current stick op.
     if (!v.hasOneUse())
       continue;
+    if (llvm::any_of(newValues, [&v](Value nv) { return nv == v; }))
+      continue;
     if (auto cop = v.getDefiningOp<ONNXConstantOp>()) {
       if (auto disposableAttr =
               mlir::dyn_cast<DisposableElementsAttr>(cop.getValueAttr())) {
@@ -463,8 +465,8 @@ struct ConstantQuantizedStickPattern
   LogicalResult matchAndRewrite(
       ZHighQuantizedStickOp stickOp, PatternRewriter &rewriter) const override {
     Value input = stickOp.getIn();
-    Value recscale = stickOp.getRecScale();
-    Value offset = stickOp.getOffset();
+    Value recscale = stickOp.getInRecScale();
+    Value offset = stickOp.getInOffset();
     Value output = stickOp.getOut();
     StringAttr layout = stickOp.getLayoutAttr();
     StringAttr quantizedType = stickOp.getQuantizedTypeAttr();
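These two hunks carry the #3112 fix. The first adds a guard so that, before garbage-collecting a value consumed by the stick op, the pass skips any value that also appears in `newValues` (the replacement values), preventing it from deleting a constant that is still needed as a replacement. The guard is a plain `llvm::any_of` membership test; a standalone illustration with `int` standing in for `mlir::Value`:

```cpp
#include <algorithm>
#include <vector>

// llvm::any_of is a range-based wrapper around std::any_of; the guard in the
// diff asks: is v one of the values we are about to substitute in?
bool isReplacementValue(const std::vector<int> &newValues, int v) {
  return std::any_of(newValues.begin(), newValues.end(),
      [v](int nv) { return nv == v; });
}
```

The second hunk switches the pattern to the quantized stick op's input accessors (`getInRecScale`/`getInOffset`); it appears the pattern had been reading the op's results rather than the operands actually feeding it.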

src/Compiler/CompilerOptions.cpp

Lines changed: 4 additions & 1 deletion

@@ -879,7 +879,7 @@ static int64_t decodeZArchNum(std::string str) {
     return 13;
   if (str == "arch14" || str == "z16") // Z16 and equivalents.
     return 14;
-  if (str == "arch15")
+  if (str == "arch15" || str == "z17") // Z17 and equivalents.
     return 15;
   return -1;
 }
@@ -1386,6 +1386,9 @@ void initCompilerConfig() {
     // Fast math option is enabled (in general)
     setLLVMOption(getLLVMOption() + " --enable-unsafe-fp-math");
   }
+
+  if (march == "z17")
+    march = "arch15";
 }
 
 } // namespace onnx_mlir
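Together the two hunks make `z17` a user-visible alias for `arch15`: `decodeZArchNum` accepts either spelling, and `initCompilerConfig()` then rewrites the flag so the rest of the compiler only ever sees `arch15`. A condensed sketch of that flow (free functions with simplified signatures, not the real option plumbing):

```cpp
#include <cstdint>
#include <string>

// z16/arch14 and z17/arch15 decode to the same architecture numbers...
int64_t decodeZArchNumSketch(const std::string &str) {
  if (str == "arch14" || str == "z16")
    return 14;
  if (str == "arch15" || str == "z17")
    return 15;
  return -1; // unknown
}

// ...and the march string itself is canonicalized once, up front.
void canonicalizeMarch(std::string &march) {
  if (march == "z17")
    march = "arch15";
}
```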
