GH-41298: [Format][Docs] Add a canonical extension type specification for UUID #41299
@@ -272,6 +272,17 @@ JSON
 In the future, additional fields may be added, but they are not required
 to interpret the array.
 
+UUID
+====
+
+* Extension name: ``arrow.uuid``.
+
+* The storage type of the extension is ``FixedSizeBinary`` with a length of 16 bytes.
+
+.. note::
+   A specific UUID version is not required or guaranteed. This extension represents
+   UUIDs as FixedSizeBinary(16) with big-endian notation and does not interpret the bytes in any way.
+
 =========================
 Community Extension Types
 =========================
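
For illustration only, here is a minimal Python sketch of what "16 bytes, big-endian" means for a single value. It uses only the standard-library `uuid` module; the helper names `uuid_to_storage` and `storage_to_uuid` and the constant `UUID_BYTE_WIDTH` are hypothetical and are not part of the proposed specification or of any Arrow API.

```python
import uuid

# Storage width stated in the proposed spec: FixedSizeBinary(16).
UUID_BYTE_WIDTH = 16


def uuid_to_storage(value: uuid.UUID) -> bytes:
    """Return the 16-byte big-endian form of a UUID (hypothetical helper)."""
    # uuid.UUID.bytes yields the fields in big-endian order;
    # uuid.UUID.bytes_le would give the mixed-endian variant instead.
    return value.bytes


def storage_to_uuid(raw: bytes) -> uuid.UUID:
    """Rebuild a UUID from its 16-byte big-endian form (hypothetical helper)."""
    if len(raw) != UUID_BYTE_WIDTH:
        raise ValueError(f"expected {UUID_BYTE_WIDTH} bytes, got {len(raw)}")
    # No version or variant check: the spec does not interpret the bytes.
    return uuid.UUID(bytes=raw)


example = uuid.UUID("00112233-4455-6677-8899-aabbccddeeff")
stored = uuid_to_storage(example)
print(stored.hex())  # 00112233445566778899aabbccddeeff
```

The round trip `storage_to_uuid(uuid_to_storage(u)) == u` holds for any UUID version, which matches the note that no specific version is required.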
Isn't `FixedSizeBinary` a logical type? Is it specified somewhere what physical layout should be used? Unless I'm mistaken, there are two options: a fixed-size primitive layout (`FixedSizePrimitive<[u8; 16]>`) or a fixed-size list layout (`FixedSizeList<u8>(16)`)?

I assumed `FixedSizeBinary` maps to a single physical type (fixed-size primitive). If this is ambiguous, we should indeed specify (preferably one) physical type.

AFAIK we want to use a logical type here, since those are the concrete types that are actually used in the various specs like IPC and the C Data Interface (the physical layouts are just described in the format docs, but it is the logical types that are listed in the flatbuffer Schema.fbs).
While "storage type" might sound like it points to a physical layout, the docs also mention that extension types annotate any built-in logical type: https://arrow.apache.org/docs/dev/format/Columnar.html#extension-types
Sidenote: there is not really such a thing as a "physical type", only physical layouts. On top of the various layouts we just have "types", which are sometimes called "logical types", but we are not very consistent about that terminology.
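
To make the layout question concrete, here is a rough Python sketch of how several 16-byte values would sit in one contiguous values buffer, assuming `FixedSizeBinary(16)` uses the fixed-size primitive layout as the second comment assumes. It uses only the standard library, ignores the validity bitmap, and the names `BYTE_WIDTH`, `pack_values` and `slot` are hypothetical, not Arrow APIs.

```python
import uuid

# Assumed slot width for FixedSizeBinary(16).
BYTE_WIDTH = 16


def pack_values(values):
    """Concatenate equally sized values into one contiguous buffer."""
    for v in values:
        if len(v) != BYTE_WIDTH:
            raise ValueError(f"expected {BYTE_WIDTH} bytes, got {len(v)}")
    return b"".join(values)


def slot(buffer, i):
    """Read the i-th value: it starts at byte offset i * BYTE_WIDTH."""
    start = i * BYTE_WIDTH
    return buffer[start:start + BYTE_WIDTH]


ids = [uuid.UUID(int=1), uuid.UUID(int=2), uuid.UUID(int=3)]
values_buffer = pack_values([u.bytes for u in ids])

print(len(values_buffer))                       # 48
print(uuid.UUID(bytes=slot(values_buffer, 1)))  # 00000000-0000-0000-0000-000000000002
```

No per-element offsets or child array are needed in this sketch, which is the practical difference from a list-based layout.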