-
Notifications
You must be signed in to change notification settings - Fork 70
change UUID <-> Arrow mapping to (de)serialize to/from 16-byte FixedSizeBinary #103
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Failing tests? |
Codecov Report
@@ Coverage Diff @@
## main #103 +/- ##
==========================================
- Coverage 84.07% 84.02% -0.05%
==========================================
Files 23 23
Lines 2669 2680 +11
==========================================
+ Hits 2244 2252 +8
- Misses 425 428 +3
Continue to review full report at Codecov.
|
|
Ah, was missing some methods for FixedSizeListType types it seems. Fixed IIUC. It's a shame that this entails a significant perf hit for I imagine we could optimize further to lessen/remove the overhead, though I think it'd preferable to do that in a follow-up |
Co-authored-by: SimonDanisch <[email protected]>
|
Thanks to @SimonDanisch the UInt128 <-> NTuple{16,UInt8} cast employed here is now non-allocating 🔥 Reduces allocations for I'll dig into it in a separate PR when I get a chance but AFAICT this is now good to go pending green CI. |
|
@quinnj This is now ready for review, recommend looking at #104 at the same time (ref #104 (comment), which I can take care of once either is merged) 🙂 thanks! |
* add isbitstype optimized path for FixedSizeList getindex * reuse _unsafe_cast strategy from #103 * rebase + DRY _unsafe_cast
…izeBinary (apache#103) * change UUID <-> Arrow mapping to (de)serialize to/from 16-byte FixedSizeBinary * fix tests * optimize UInt128 <-> NTuple{16,UInt8} casting Co-authored-by: SimonDanisch <[email protected]> Co-authored-by: SimonDanisch <[email protected]>
* add isbitstype optimized path for FixedSizeList getindex * reuse _unsafe_cast strategy from apache#103 * rebase + DRY _unsafe_cast
|
BTW the PR title here says FixedSizeBinary, but what was implemented was FixedSizeList; why is that? Asking since arrow is considering a canonical extension for uuids that uses FixedSizeBinary. |
IIRC back when this PR was authored, Arrow.jl would (de)serialize Julia values with type idk if that is still true nowadays though here's the relevant |
...to make tables written out by Arrow.jl more portable.
I'm dumb and didn't realize e.g.
pyarrowdoesn't support 128-bit ints, so currently UUID-containing tables will fail to be read by e.g.pa.feather.read_feather(...).There's even pyarrow documentation that suggests to Python users that UUIDs can be written as FixedSizeBinary columns, so this seems like a more friendly/correct way to do this.