ARROW-661: [C++] Add LargeRecordBatch metadata type, IPC support, associated refactoring #404
Commits:
- …s for IPC metadata and convert to flatbuffers later
- … record batch read/write path
- … aligned bitmaps
Java builds are failing intermittently due to Maven Central flakiness.

Green build: https://travis-ci.org/wesm/arrow/builds/212816101
xhochy left a comment:
+1, LGTM
@xhochy thanks for merging this patch. Since this changed Message.fbs, I want to make sure that @julienledem takes a look and understands the issue. It would be good to provide a reasonable guarantee that data stored in a

Since this is marked experimental, we aren't making any forward or API compatibility guarantees on this functionality.
This looks fine. If the feature becomes real, we should consider just changing the length field to long in FieldNode and specifying the restriction in the metadata that supporting length > 2^31 - 1 is optional.
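The suggestion above might look something like this sketch of Message.fbs; the exact field layout is an assumption for illustration, not the actual committed schema:

```
// Hypothetical sketch only: widen FieldNode's length to long and document
// that supporting length > 2^31 - 1 is optional for implementations.
struct FieldNode {
  length: long;      // was: int
  null_count: long;
}
```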
If others feel that would be acceptable, maintaining less code is always preferable from my perspective. I believe the requirement to store vectors with more than INT32_MAX elements on the C++/Python side is not going to go away. The downside is that if you encounter a RecordBatch in the wild in Java, you may get an exception if it's too big. I'm not sure how concerning that is.
I'd recommend that the C++ side not allow writing vectors with more than INT32_MAX entries by default; you'd have to explicitly enable it. This way people don't inadvertently create things that won't be cross-language compatible.
I'm OK with that. I will open a JIRA about changing the RecordBatch types from int to long.
See ARROW-679.
This patch enables writing record batches exceeding 2^31 - 1 rows.

This also does a fair amount of refactoring and code consolidation related to ongoing code cleaning in arrow_ipc.

These APIs are marked experimental. This does add a LargeRecordBatch flatbuffer type to the Message union, but I've indicated that Arrow implementations (e.g. Java) are not required to implement this type. It's strictly to enable C++ users to write very large datasets that have been embedded for convenience in Arrow's structured data model.

cc @pcmoritz @robertnishihara