Skip to content

Conversation

@FiV0
Copy link
Contributor

@FiV0 FiV0 commented May 20, 2024

Rationale for this change

For me personally I mainly want to fix apache/arrow-java#87, but this change also includes some larger changes to the writer interface. This will likely include some design work and some back and forth as I can't make these decisions.

What changes are included in this PR?

A commit that extends the writer interfaces with an ExtensionWriter as well as dealing with the MinorType.EXTENSIONTYPE in various places. The ExtensionWriter was left rather skeleton like as this needs some design decisions to be made before implementing the concrete details. I will leave some comments throughout the code of where I think stuff needs to be decided.

This next bit are my 2cents (hence taken with a grain of salt) of how I would do things as I have the impression that one will eventually hit a wall with the current writer interfaces. For example the PromotableWriter currently makes the assumption that the parent vector is a struct or a list vector, making it difficult to extend this to composite extension types.
I would just create one single writer interface (VectorWriter or whatever you want to call) implemented by all concrete writers. If a writer doesn't support an operation just keep it abstract or throw an UOE.

Are these changes tested?

There are two tests that test the issue of apache/arrow-java#87. I have not tested (because of the lack of a concrete implementation) the ExtensionWriter interface.

Are there any user-facing changes?

Yes. ExtensionWriter will be exposed for the first time. This is only an additive change and shouldn't break any existing API.

@FiV0 FiV0 requested a review from lidavidm as a code owner May 20, 2024 11:45
@github-actions
Copy link

⚠️ GitHub issue apache/arrow-java#87 has been automatically assigned in GitHub to PR creator.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This class is currently more skeleton like as the concrete details of the ExtensionWriter interface will need to be nailed down.

@github-actions github-actions bot added awaiting committer review Awaiting committer review and removed awaiting review Awaiting review labels May 20, 2024
@FiV0 FiV0 force-pushed the structs-and-extensions branch from 934466e to b4bda84 Compare May 20, 2024 11:48
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is where some design decisions have to be made of how to actually expose the ExtensionWriter to the user and how the user should actually extend some ExtensionWriter.

As mentioned above I would make this one interface BaswWriter exposing all the possible writers and leaving them abstract in the concrete implementations.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This sounds reasonable to me.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This TransferPair is likely a very common pattern for extension types, so one could create a concrete ExtensionWriter which actually implements this and something like UuidWriter could inherit from.

We do something similar for our writers: https://github.com/xtdb/xtdb/blob/e9a15ee12244367c1d3d1a75c00e4f5c0e10008f/core/src/main/kotlin/xtdb/vector/extensions/XtExtensionVector.kt#L30-L40

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems to be fine, let's get this one working and further evaluate.

@lidavidm
Copy link
Member

CC @vibhatha @danepitkin

@FiV0 FiV0 force-pushed the structs-and-extensions branch from b4bda84 to f4f0e4d Compare May 21, 2024 10:06
@vibhatha
Copy link
Collaborator

@FiV0 I will take a look at this PR later today or earlier tomorrow.

@FiV0 FiV0 force-pushed the structs-and-extensions branch from f4f0e4d to 7cb7fd5 Compare May 21, 2024 14:25
reader.loadNextBatch();
final VectorSchemaRoot readerRoot = reader.getVectorSchemaRoot();
Assert.assertEquals(root.getSchema(), readerRoot.getSchema());
assertEquals(root.getSchema(), readerRoot.getSchema());
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a thought: since we are updating test cases, shall we use JUnit5? juputer.api.?

package org.apache.arrow.vector;

import static org.junit.Assert.*;
import static org.junit.Assert.assertEquals;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could we use JUnit5?

final StructVector toVector = (StructVector) tp.getTo();
assertEquals(s1.getField(), toVector.getField());
assertEquals(s1.getField().getChildren().get(0), toVector.getField().getChildren().get(0));
// also fails but probably another issue
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I also debugged the code. It seems the copy didn't work properly. I guess you're still not finished with this work, right?

Copy link
Collaborator

@vibhatha vibhatha left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The overall idea looks fine to me, but need to run another pass.

And the added test case seems to need some work as we don't get the expected results from the transferPair AFAIU.

@FiV0
Copy link
Contributor Author

FiV0 commented May 23, 2024

I think the main part of moving this forward is the design work on the writer or more specifically ExtensionWriter interface. I think it would be good to have some rough idea of how this will look in the future for the user, as otherwise we will need to change this again when we realize that the current API doesn't work. For example.

  • How do you expect people to write to a UuidVector?
  • How do you expect people to write to a composite extension vector ( for example a SetVector as an extension to a ListVector)?
  • How do you expect people to write to a Union with an extension leg? How should a union with two extensions legs of different type be dealt with? etc..

I think all of this should have a clear answer before moving forward with this proposal. Maybe have a look at what other language implementations do in this case.

@lidavidm
Copy link
Member

They don't really have answers for this.

Does it work to just ask users to build the underlying type directly for now?

@FiV0
Copy link
Contributor Author

FiV0 commented May 24, 2024

@lidavidm Ok, I will then create a Test for ExtensionWriter (when I find the time) with the cases outlined above, so that we have at least some idea of what it will look like.

@lidavidm
Copy link
Member

Thanks @FiV0!

@github-actions
Copy link

⚠️ GitHub issue #41728 has no components, please add labels for components.

xxlaykxx added a commit to xxlaykxx/arrow-java that referenced this pull request Apr 1, 2025
Based on changes from apache/arrow#41731.

Added writer ExtensionWriter with 3 methods:
- write method  for writing values from Extension holders;
- writeExtensionType method for writing values (arguments is Object because we don't know exact type);
- addExtensionTypeFactory method - because exact vector and value type are unknown, user should create their own extension type vector, writer for it and ExtensionTypeFactory where it should map vector and writer.
@lidavidm
Copy link
Member

lidavidm commented Apr 3, 2025

Migrated to apache/arrow-java#697

@lidavidm lidavidm closed this Apr 3, 2025
lidavidm pushed a commit to apache/arrow-java that referenced this pull request Apr 9, 2025
Based on changes from apache/arrow#41731.

## What's Changed

Added writer ExtensionWriter with 3 methods:
- write method  for writing values from Extension holders;
- writeExtensionType method for writing values (arguments is Object
because we don't know exact type);
- addExtensionTypeFactory method - because the exact vector and value
type are unknown, the user should create their own extension type
vector, write for it, and ExtensionTypeFactory, which should map the
vector and writer.

Closes #87.

Co-authored-by: Finn Völkel <[email protected]>
xxlaykxx added a commit to dremio/arrow-java that referenced this pull request Apr 23, 2025
Based on changes from apache/arrow#41731.

## What's Changed

Added writer ExtensionWriter with 3 methods:
- write method  for writing values from Extension holders;
- writeExtensionType method for writing values (arguments is Object
because we don't know exact type);
- addExtensionTypeFactory method - because the exact vector and value
type are unknown, the user should create their own extension type
vector, write for it, and ExtensionTypeFactory, which should map the
vector and writer.

Closes apache#87.

Co-authored-by: Finn Völkel <[email protected]>
xxlaykxx added a commit to xxlaykxx/arrow that referenced this pull request May 5, 2025
Based on changes from apache#41731.

Added writer ExtensionWriter with 3 methods:
- write method  for writing values from Extension holders;
- writeExtensionType method for writing values (arguments is Object
because we don't know exact type);
- addExtensionTypeFactory method - because the exact vector and value
type are unknown, the user should create their own extension type
vector, write for it, and ExtensionTypeFactory, which should map the
vector and writer.

Closes dremio#87.

Co-authored-by: Finn Völkel <[email protected]>
xxlaykxx added a commit to dremio/arrow that referenced this pull request Jun 23, 2025
* GH-87: [Vector] Add ExtensionWriter (apache#697)

Based on changes from apache#41731.

Added writer ExtensionWriter with 3 methods:
- write method  for writing values from Extension holders;
- writeExtensionType method for writing values (arguments is Object
because we don't know exact type);
- addExtensionTypeFactory method - because the exact vector and value
type are unknown, the user should create their own extension type
vector, write for it, and ExtensionTypeFactory, which should map the
vector and writer.

Closes #87.

Co-authored-by: Finn Völkel <[email protected]>

* just checking

* Update Brewfile

* revert

* Fix for Cmake 3.xx installation

* Fix for Cmake 3.xx installation

* update thrift

* update thrift url

* update thrift url

* Update github.yml

* fix format

---------

Co-authored-by: Finn Völkel <[email protected]>
timhurskidremio pushed a commit to timhurskidremio/dremio-arrow-java that referenced this pull request Dec 5, 2025
Based on changes from apache/arrow#41731.

## What's Changed

Added writer ExtensionWriter with 3 methods:
- write method  for writing values from Extension holders;
- writeExtensionType method for writing values (arguments is Object
because we don't know exact type);
- addExtensionTypeFactory method - because the exact vector and value
type are unknown, the user should create their own extension type
vector, write for it, and ExtensionTypeFactory, which should map the
vector and writer.

Closes apache#87.

Co-authored-by: Finn Völkel <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Java] StructVector throws with ExtensionType

3 participants