Skip to content

Conversation

@brancz
Copy link
Contributor

@brancz brancz commented Sep 8, 2023

Rationale for this change

We've observed that starting with an initial dictionary can be unnecessarily expensive, so we explored ways to achieve the same (or similar) behavior.

What changes are included in this PR?

Prepending dictionaries instead of inserting them into the memotable.

Are these changes tested?

No this is just a sketch for discussion purposes.

Are there any user-facing changes?

n/a for the moment

@brancz brancz requested a review from zeroshade as a code owner September 8, 2023 15:39
@brancz
Copy link
Contributor Author

brancz commented Sep 8, 2023

cc @zeroshade

@brancz
Copy link
Contributor Author

brancz commented Sep 8, 2023

This doesn't work, it's just a sketch for discussion purposes.

@github-actions
Copy link

github-actions bot commented Sep 8, 2023

⚠️ GitHub issue apache/arrow-go#47 has been automatically assigned in GitHub to PR creator.

@zeroshade
Copy link
Member

@brancz can you throw together a sketched example as to when/why someone would use this (that could eventually be turned into a unit test)?

@brancz
Copy link
Contributor Author

brancz commented Sep 21, 2023

@zeroshade added one of my use cases for this. Note that in this simplistic example, I could have also built the indices entirely separately without a dictionary builder and just call new dictionary array with the indices and dictionary, however, when you have deeply nested lists and structs then you don't have this option as you can't control the list and struct offsets, so you rely on the builders that the record builder exposes to you, which in this case is the binary dictionary builder.

@zeroshade
Copy link
Member

Thanks @brancz I'll try to take a look at this over the weekend or early next week and see if it makes sense as an approach to pursue

@zeroshade
Copy link
Member

@brancz Sorry for the long delay here, but I can see the sense and idea behind your suggestion here. It also doesn't seem like it would be very problematic to allow so I'd be in favor of you fleshing this out to a viable PR that works (as opposed to just a sketch).

@github-actions
Copy link

⚠️ GitHub issue #37538 has no components, please add labels for components.

Copy link
Member

@raulcd raulcd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR. This PR seems stale, the Go implementation was moved to https://github.com/apache/arrow-go please open a PR there.

@raulcd raulcd closed this May 20, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Go] Allow prepending dictionary

3 participants