Skip to content

Conversation

@alamb
Copy link
Contributor

@alamb alamb commented Jan 5, 2024

Draft

Which issue does this PR close?

Closes #8738

Rationale for this change

This is a second sketch of how to close #8738

#8291 / #7647 changed DataFusion's Grouping operator so that it never dictionary encoded the output grouping columns.

Previously, the types of the input grouping expressions were the same as the types of the output group by

The idea I think is that since the values in the group columns are unique, there is no reason to dictionary encode them (as each dictionary entry would have a single value). I actually am not sure about this for reasons I will explain shortly.

What changes are included in this PR?

This PR changes LogicalPlan::Aggregate and LogicalPlan::Distinct (which both use the HashAggregate ExecutionPlan) to report a schema that is dictionary unencoded.

Are these changes tested?

Yes, there is a regression test added in #8750

Are there any user-facing changes?

@github-actions github-actions bot added logical-expr Logical plan and expressions optimizer Optimizer rules labels Jan 5, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

logical-expr Logical plan and expressions optimizer Optimizer rules

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Regression "RowConverter column schema mismatch, expected Utf8 got Dictionary(Int32, Utf8)" after upgrade

1 participant