-
Notifications
You must be signed in to change notification settings - Fork 4k
GH-41728:[Java][Vector] Add ExtensionWriter and fix StructVector construction with ExtensionTypes #41731
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This class is currently more skeleton like as the concrete details of the ExtensionWriter interface will need to be nailed down.
934466e to
b4bda84
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is where some design decisions have to be made of how to actually expose the ExtensionWriter to the user and how the user should actually extend some ExtensionWriter.
As mentioned above I would make this one interface BaswWriter exposing all the possible writers and leaving them abstract in the concrete implementations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This sounds reasonable to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This TransferPair is likely a very common pattern for extension types, so one could create a concrete ExtensionWriter which actually implements this and something like UuidWriter could inherit from.
We do something similar for our writers: https://github.com/xtdb/xtdb/blob/e9a15ee12244367c1d3d1a75c00e4f5c0e10008f/core/src/main/kotlin/xtdb/vector/extensions/XtExtensionVector.kt#L30-L40
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems to be fine, let's get this one working and further evaluate.
b4bda84 to
f4f0e4d
Compare
|
@FiV0 I will take a look at this PR later today or earlier tomorrow. |
f4f0e4d to
7cb7fd5
Compare
| reader.loadNextBatch(); | ||
| final VectorSchemaRoot readerRoot = reader.getVectorSchemaRoot(); | ||
| Assert.assertEquals(root.getSchema(), readerRoot.getSchema()); | ||
| assertEquals(root.getSchema(), readerRoot.getSchema()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a thought: since we are updating test cases, shall we use JUnit5? juputer.api.?
| package org.apache.arrow.vector; | ||
|
|
||
| import static org.junit.Assert.*; | ||
| import static org.junit.Assert.assertEquals; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could we use JUnit5?
| final StructVector toVector = (StructVector) tp.getTo(); | ||
| assertEquals(s1.getField(), toVector.getField()); | ||
| assertEquals(s1.getField().getChildren().get(0), toVector.getField().getChildren().get(0)); | ||
| // also fails but probably another issue |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I also debugged the code. It seems the copy didn't work properly. I guess you're still not finished with this work, right?
vibhatha
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The overall idea looks fine to me, but need to run another pass.
And the added test case seems to need some work as we don't get the expected results from the transferPair AFAIU.
|
I think the main part of moving this forward is the design work on the writer or more specifically
I think all of this should have a clear answer before moving forward with this proposal. Maybe have a look at what other language implementations do in this case. |
|
They don't really have answers for this. Does it work to just ask users to build the underlying type directly for now? |
|
@lidavidm Ok, I will then create a Test for |
|
Thanks @FiV0! |
|
|
Based on changes from apache/arrow#41731. Added writer ExtensionWriter with 3 methods: - write method for writing values from Extension holders; - writeExtensionType method for writing values (arguments is Object because we don't know exact type); - addExtensionTypeFactory method - because exact vector and value type are unknown, user should create their own extension type vector, writer for it and ExtensionTypeFactory where it should map vector and writer.
|
Migrated to apache/arrow-java#697 |
Based on changes from apache/arrow#41731. ## What's Changed Added writer ExtensionWriter with 3 methods: - write method for writing values from Extension holders; - writeExtensionType method for writing values (arguments is Object because we don't know exact type); - addExtensionTypeFactory method - because the exact vector and value type are unknown, the user should create their own extension type vector, write for it, and ExtensionTypeFactory, which should map the vector and writer. Closes #87. Co-authored-by: Finn Völkel <[email protected]>
Based on changes from apache/arrow#41731. ## What's Changed Added writer ExtensionWriter with 3 methods: - write method for writing values from Extension holders; - writeExtensionType method for writing values (arguments is Object because we don't know exact type); - addExtensionTypeFactory method - because the exact vector and value type are unknown, the user should create their own extension type vector, write for it, and ExtensionTypeFactory, which should map the vector and writer. Closes apache#87. Co-authored-by: Finn Völkel <[email protected]>
Based on changes from apache#41731. Added writer ExtensionWriter with 3 methods: - write method for writing values from Extension holders; - writeExtensionType method for writing values (arguments is Object because we don't know exact type); - addExtensionTypeFactory method - because the exact vector and value type are unknown, the user should create their own extension type vector, write for it, and ExtensionTypeFactory, which should map the vector and writer. Closes dremio#87. Co-authored-by: Finn Völkel <[email protected]>
* GH-87: [Vector] Add ExtensionWriter (apache#697) Based on changes from apache#41731. Added writer ExtensionWriter with 3 methods: - write method for writing values from Extension holders; - writeExtensionType method for writing values (arguments is Object because we don't know exact type); - addExtensionTypeFactory method - because the exact vector and value type are unknown, the user should create their own extension type vector, write for it, and ExtensionTypeFactory, which should map the vector and writer. Closes #87. Co-authored-by: Finn Völkel <[email protected]> * just checking * Update Brewfile * revert * Fix for Cmake 3.xx installation * Fix for Cmake 3.xx installation * update thrift * update thrift url * update thrift url * Update github.yml * fix format --------- Co-authored-by: Finn Völkel <[email protected]>
Based on changes from apache/arrow#41731. ## What's Changed Added writer ExtensionWriter with 3 methods: - write method for writing values from Extension holders; - writeExtensionType method for writing values (arguments is Object because we don't know exact type); - addExtensionTypeFactory method - because the exact vector and value type are unknown, the user should create their own extension type vector, write for it, and ExtensionTypeFactory, which should map the vector and writer. Closes apache#87. Co-authored-by: Finn Völkel <[email protected]>
Rationale for this change
For me personally I mainly want to fix apache/arrow-java#87, but this change also includes some larger changes to the writer interface. This will likely include some design work and some back and forth as I can't make these decisions.
What changes are included in this PR?
A commit that extends the writer interfaces with an
ExtensionWriteras well as dealing with theMinorType.EXTENSIONTYPEin various places. TheExtensionWriterwas left rather skeleton like as this needs some design decisions to be made before implementing the concrete details. I will leave some comments throughout the code of where I think stuff needs to be decided.This next bit are my 2cents (hence taken with a grain of salt) of how I would do things as I have the impression that one will eventually hit a wall with the current writer interfaces. For example the
PromotableWritercurrently makes the assumption that the parent vector is a struct or a list vector, making it difficult to extend this to composite extension types.I would just create one single writer interface (
VectorWriteror whatever you want to call) implemented by all concrete writers. If a writer doesn't support an operation just keep it abstract or throw an UOE.Are these changes tested?
There are two tests that test the issue of apache/arrow-java#87. I have not tested (because of the lack of a concrete implementation) the
ExtensionWriterinterface.Are there any user-facing changes?
Yes.
ExtensionWriterwill be exposed for the first time. This is only an additive change and shouldn't break any existing API.