-
Notifications
You must be signed in to change notification settings - Fork 4k
ARROW-7539: [Java] FieldVector getFieldBuffers API should not set reader/writer indices #6156
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -225,7 +225,6 @@ public void loadFieldBuffers(ArrowFieldNode fieldNode, List<ArrowBuf> ownBuffers | |
| @Override | ||
| public List<ArrowBuf> getFieldBuffers() { | ||
| List<ArrowBuf> result = new ArrayList<>(2); | ||
| setReaderAndWriterIndex(); | ||
| result.add(validityBuffer); | ||
| result.add(offsetBuffer); | ||
|
|
||
|
|
@@ -662,15 +661,11 @@ public void reset() { | |
| public ArrowBuf[] getBuffers(boolean clear) { | ||
| setReaderAndWriterIndex(); | ||
| final ArrowBuf[] buffers; | ||
| if (getBufferSize() == 0) { | ||
| buffers = new ArrowBuf[0]; | ||
| } else { | ||
| List<ArrowBuf> list = new ArrayList<>(); | ||
| list.add(offsetBuffer); | ||
| list.add(validityBuffer); | ||
| list.addAll(Arrays.asList(vector.getBuffers(false))); | ||
| buffers = list.toArray(new ArrowBuf[list.size()]); | ||
| } | ||
| List<ArrowBuf> list = new ArrayList<>(); | ||
| list.add(validityBuffer); | ||
| list.add(offsetBuffer); | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. In this change, the order of validity and offset buffers is changed in the getBuffers method for ListVector which creates problems with serialization/deserialization resulting in test failures in Dremio. This would break backward compatibility with existing serialised files.
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @projjal Thanks for your testing result! |
||
| list.addAll(Arrays.asList(vector.getBuffers(false))); | ||
| buffers = list.toArray(new ArrowBuf[list.size()]); | ||
| if (clear) { | ||
| for (ArrowBuf buffer : buffers) { | ||
| buffer.getReferenceManager().retain(); | ||
|
|
||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe this if check came from an earlier refactor that happened in Arrow and several tests had failed in Dremio. I think we should keep it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the old implementation, VectorUnloader.java checks the typeBufferCount with getFieldBuffers, it's ok since getFieldBuffers returns all buffers without check buffer size. If we keep this if, then the expected buffer count check in VectorUnloader is invalid, should we remove that check directly?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am okay with this change but it would be great if someone from Dremio can bless this (based on my previous comment)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we keep this 'if' here, then I guess we should make some change in VectorUnloader.
If vector.getBufferSize ==0, then no buffers would be sent via IPC, however, the VectorLoader depends on TypeBufferCount to decide how many buffers to load into a vector, and in this case, something is wrong.
To solve this, we may add a check in VectorUnloader, if vector bufferSize==0, we should append it's field buffers(by call getFieldBuffers) also, even their writerIndex/readerIndex=0. In this way, we could keep the 'if (getBufferSize() == 0)' in vectors and the IPC also works well.
Do you think we should keep this PR or update it as I suggested above?