@hujiajie (Contributor) commented on Aug 9, 2024

### Description

WebGPU's supported types have no direct mapping for int8 and uint8. This complicates memory access in the indices helper, and the details depend on how the helper is configured. The following table summarizes, for each configuration, the expected element type of the storage (`u32` or `atomic<u32>`) and how many values each (atomic) `u32` encodes (1, 2, or 4).

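| usage    | components = 1   | components = 2   | components = 4 |
| -------- | ---------------- | ---------------- | -------------- |
| input    | `u32`, 4         | `u32`, 4         | `u32`, 4       |
| output   | `atomic<u32>`, 4 | `atomic<u32>`, 4 | `u32`, 4       |
| internal | `u32`, 1         | `u32`, 2         | `u32`, 4       |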
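
To make "how many values each `u32` encodes" concrete, here is a minimal TypeScript sketch of the 4-values-per-word case. The helper names (`packInt8x4`, `unpackInt8`) and the lane order (lane i in bits 8i to 8i+7) are assumptions for illustration only; they are not taken from this PR or from the indices helper.

```typescript
// Hypothetical helpers for illustration; not part of the PR.
// Assumption: lane i occupies bits [8*i, 8*i + 7] of the 32-bit word.

/** Packs four int8 values (-128..127) into one unsigned 32-bit word. */
function packInt8x4(values: readonly number[]): number {
  if (values.length !== 4) {
    throw new Error('expected exactly 4 values');
  }
  let word = 0;
  for (let lane = 0; lane < 4; lane++) {
    // Keep only the low 8 bits (two's complement) and move them into place.
    word |= (values[lane] & 0xff) << (8 * lane);
  }
  return word >>> 0; // reinterpret the result as unsigned
}

/** Extracts the sign-extended int8 stored in the given lane (0..3). */
function unpackInt8(word: number, lane: number): number {
  const byte = (word >>> (8 * lane)) & 0xff;
  return (byte << 24) >> 24; // sign-extend from 8 bits
}
```

Reading one element under this layout means loading the whole word and extracting one lane; writing one element means changing 8 bits of a word that holds three other elements.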

### Motivation and Context

This is a prerequisite to ease support for int8 and uint8 inside individual operators.

See also the code comments, which explain why each configuration takes the path it does.

@hujiajie marked this pull request as draft on August 15, 2024, 01:22

@hujiajie (Contributor, Author) commented

This PR is no longer being pursued.

#22755 introduces the explicit concept of `atomicOutput`, which effectively restricts the pre-existing output usage to non-atomic access. However, the combo of (u)int8 + non-atomic output + components = 1 or 2 cannot be handled in a data-race-free way, as explained in the comments of this PR. It seems quite error-prone to provide a non-atomic helper and ask op implementers to watch out for data races whenever they call it.
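
The kind of data race described above can be modelled with a small deterministic TypeScript sketch. It assumes two invocations that each write one 8-bit lane of the same 32-bit word; the function names and the lane layout are hypothetical and are not taken from this PR or from #22755. It is a CPU-side model of one bad interleaving, not shader code.

```typescript
// Illustrative model only; names are hypothetical and not taken from the PR.
// Assumption: lane i occupies bits [8*i, 8*i + 7] of the shared 32-bit word.

/** Returns `word` with the byte in `lane` replaced by the low 8 bits of `value`. */
function withByte(word: number, lane: number, value: number): number {
  const shift = 8 * lane;
  const cleared = word & ~(0xff << shift);
  return (cleared | ((value & 0xff) << shift)) >>> 0;
}

/**
 * Non-atomic interleaving: both invocations load the word before either
 * stores, so the second store overwrites the first invocation's lane.
 */
function racyWrites(initial: number): number {
  let storage = initial;
  const seenByA = storage; // invocation A loads
  const seenByB = storage; // invocation B loads the same stale word
  storage = withByte(seenByA, 0, 0x11); // A stores lane 0
  storage = withByte(seenByB, 1, 0x22); // B stores lane 1, dropping A's byte
  return storage;
}

/**
 * Indivisible updates: each read-modify-write completes before the next one
 * starts, which is the guarantee an atomic operation would provide.
 */
function serializedWrites(initial: number): number {
  let storage = initial;
  storage = withByte(storage, 0, 0x11);
  storage = withByte(storage, 1, 0x22);
  return storage;
}
```

In this model, `racyWrites(0)` loses the lane-0 byte while `serializedWrites(0)` keeps both. With components = 4, a single invocation owns the whole word, so no other invocation touches it.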

I also don't love the idea of forcing components to 4, as is done for bool. The interface just looks poorly designed to me if it consumes a combo of data type + usage + components and I then have to remember a list of (perhaps not so obvious) exceptions.

@hujiajie closed this on Nov 28, 2024