copybara-service[bot]

Flip transpose of (slow) dynamic gio right-hand GEMM operands.

If a `fully-connected` or `batch-matrix-multiply` op will use dynamic `gio` packing for its right-hand operand, we add a transpose of the last two dimensions of the operand and flip the `XNN_FLAG_TRANSPOSE_WEIGHTS` flag.
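The equivalence this rewrite relies on can be sketched with a toy model in plain Python. This is not XNNPACK code: `fully_connected`, `flip_transpose`, and the `TRANSPOSE_WEIGHTS` constant are hypothetical stand-ins, and the sketch assumes the flag means "weights are stored as `[input, output]`" rather than the default `[output, input]`.

```python
# Toy model only; every name below is hypothetical, not an XNNPACK API.
TRANSPOSE_WEIGHTS = 1  # stand-in for XNN_FLAG_TRANSPOSE_WEIGHTS


def transpose_last_two(w):
    """Swap the rows and columns of a 2-D list of lists."""
    return [list(col) for col in zip(*w)]


def fully_connected(x, w, flags=0):
    """Multiply x ([M][K]) by w.

    Assumed flag meaning: w is [N][K] (output-major) by default and
    [K][N] (input-major) when TRANSPOSE_WEIGHTS is set.
    """
    if flags & TRANSPOSE_WEIGHTS:
        k_dim, n_dim = len(w), len(w[0])
        weight = lambda k, n: w[k][n]
    else:
        n_dim, k_dim = len(w), len(w[0])
        weight = lambda k, n: w[n][k]
    return [
        [sum(row[k] * weight(k, n) for k in range(k_dim)) for n in range(n_dim)]
        for row in x
    ]


def flip_transpose(w, flags):
    """Transpose the operand and toggle the flag; the product is unchanged."""
    return transpose_last_two(w), flags ^ TRANSPOSE_WEIGHTS


x = [[1, 2, 3], [4, 5, 6]]        # M=2, K=3
w_io = [[1, 0], [0, 1], [2, 2]]   # K=3, N=2, input-major layout

before = fully_connected(x, w_io, TRANSPOSE_WEIGHTS)
w_oi, new_flags = flip_transpose(w_io, TRANSPOSE_WEIGHTS)
after = fully_connected(x, w_oi, new_flags)
```

Both calls compute the same product; only the layout in which the right-hand operand is handed to the multiply differs, which is what lets the packing step pick a different kernel.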

If the right-hand operand is generated by a `transpose` op, then the transposition is done therein, potentially skipping the `transpose` altogether if it becomes a no-op.
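Folding the extra transposition into an existing `transpose` can be pictured as editing that op's permutation. The sketch below is an illustration under assumptions, not the actual graph-rewrite code: the helper names are hypothetical, and it assumes the convention that output dimension `i` of a transpose reads input dimension `perm[i]`.

```python
def fold_swap_into_perm(perm):
    """Permutation equal to `perm` followed by a swap of the last two dims."""
    folded = list(perm)
    folded[-1], folded[-2] = folded[-2], folded[-1]
    return folded


def is_identity(perm):
    """True if the permutation leaves every dimension in place (a no-op)."""
    return list(perm) == list(range(len(perm)))


def permuted_shape(shape, perm):
    """Shape after a transpose where output dim i reads input dim perm[i]."""
    return [shape[p] for p in perm]


shape = [2, 3, 4]

# Producer already swaps the last two dims: folding cancels it out.
cancelled = fold_swap_into_perm([0, 2, 1])
can_skip = is_identity(cancelled)

# Producer does something else: folding yields a single combined transpose.
combined = fold_swap_into_perm([1, 0, 2])
two_step = permuted_shape(permuted_shape(shape, [1, 0, 2]), [0, 2, 1])
one_step = permuted_shape(shape, combined)
```

When the folded permutation is the identity, the producer op no longer does any work and can be dropped, which is the "no-op" case mentioned above.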

We do this because, in almost all cases, we only have generic (unoptimized) `gio` packing kernels, but we do have optimized `transpose` and `goi` kernels.

PiperOrigin-RevId: 811769415