Merged
11 changes: 11 additions & 0 deletions docs/source/format/CanonicalExtensions.rst
@@ -272,6 +272,17 @@ JSON
In the future, additional fields may be added, but they are not required
to interpret the array.

UUID
====

* Extension name: ``arrow.uuid``.

* The storage type of the extension is ``FixedSizeBinary`` with a length of 16 bytes.

Member:

Isn't FixedSizeBinary a logical type? Is it specified somewhere what physical layout should be used? Unless I'm mistaken there are two options: fixed-size primitive layout (FixedSizePrimitive<[u8; 16]>) or fixed-size list layout (FixedSizeList<u8>(16))?

Member Author:

I assumed FixedSizeBinary maps to a single physical type (fixed-size primitive). If this is ambiguous, we should indeed specify a physical type (preferably exactly one).

Member:

AFAIK we want to use a logical type here, since those are the concrete types actually used in the various specs such as IPC and the C Data Interface (the physical layouts are only described in the format docs, but it is the logical types that are listed in the flatbuffer Schema.fbs).
While "storage type" might sound like it points to a physical layout, the docs also mention that extension types annotate any built-in logical type: https://arrow.apache.org/docs/dev/format/Columnar.html#extension-types

Sidenote: there is not really such a thing as a "physical type", only a physical layout. On top of the various layouts we just have "types", which are sometimes called "logical types", but we are not very consistent about that terminology.


.. note::
   A specific UUID version is not required or guaranteed. This extension represents
   UUIDs as ``FixedSizeBinary(16)`` with big-endian notation and does not interpret the bytes in any way.

Contributor:

It seems that specifying the version should be optional metadata?

Contributor:

UUID versioning seems like a concern for the generator and consumers. It could become very difficult for Arrow to guarantee anything about versions without validating the values one by one.
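As a rough illustration of the ``arrow.uuid`` storage described above, the following Python sketch (standard library only, not the pyarrow API) packs UUIDs into one contiguous buffer with a fixed stride of 16 bytes, roughly how the data buffer of a ``FixedSizeBinary(16)`` array is laid out, and reads a value back by offset. The helper names ``pack_uuids`` and ``value_at`` are hypothetical, and validity bitmaps and nulls are ignored. ``uuid.UUID.bytes`` yields the big-endian byte order; ``uuid.UUID.bytes_le`` would not match it.

```python
import uuid

# Hypothetical sketch: models only the data buffer of a FixedSizeBinary(16)
# column as a plain bytes object. Not pyarrow; validity bitmaps are ignored.
BYTE_WIDTH = 16  # one UUID = 128 bits = 16 bytes


def pack_uuids(values):
    """Concatenate UUIDs into one contiguous buffer, 16 bytes per value.

    uuid.UUID.bytes returns the 128-bit value in big-endian byte order.
    No version check is done: the bytes are stored as-is.
    """
    return b"".join(u.bytes for u in values)


def value_at(buffer, i):
    """Return the i-th UUID; value i occupies bytes [16*i, 16*i + 16)."""
    start = i * BYTE_WIDTH
    return uuid.UUID(bytes=buffer[start:start + BYTE_WIDTH])


if __name__ == "__main__":
    ids = [
        uuid.UUID("12345678-1234-5678-1234-567812345678"),
        uuid.UUID(int=1),
    ]
    buf = pack_uuids(ids)
    print(len(buf))          # 32
    print(buf[:4].hex())     # 12345678 (most significant byte first)
    print(value_at(buf, 1))  # 00000000-0000-0000-0000-000000000001
```

Because every value has the same width, no offsets buffer is needed: the position of value ``i`` follows directly from ``i * 16``.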

=========================
Community Extension Types
=========================