-
Notifications
You must be signed in to change notification settings - Fork 4k
Description
Describe the bug, including details regarding any error messages, version, and platform.
The helpful ursabot conbench integration noted some regressions after #14518. That was a rather small change that affected usage of head() that I had thought was rare; however, it appears that it actually gets used frequently which was surprising because I had assumed that sink nodes with options (rather than a record batch reader) was supposed to be used.
Some quick inspection suggests that we're generating a nested exec plan for every head() here:
Line 230 in ec9a8a3
| collapse.arrow_dplyr_query(x) |
...which means that all of these will go through a record batch reader instead of the appropriate sink node:
Lines 79 to 86 in ec9a8a3
| if (is_collapsed(.data)) { | |
| # We have a nested query. | |
| if (has_head_tail(.data$.data)) { | |
| # head and tail are not ExecNodes; at best we can handle them via | |
| # SinkNode, so if there are any steps done after head/tail, we need to | |
| # evaluate the query up to then and then do a new query for the rest. | |
| # as_record_batch_reader() will build and run an ExecPlan | |
| node <- self$SourceNode(as_record_batch_reader(.data$.data)) |
Reprex:
library(arrow, warn.conflicts = FALSE)
#> Some features are not enabled in this build of Arrow. Run `arrow_info()` for more information.
as_arrow_table(mtcars) |>
arrow:::as_adq() |>
head(10) |>
show_exec_plan()
#> Warning: The `ExecPlan` cannot be printed for a nested query.(this query should probably not be nested, but instead be collected with the appropriate sink node options?)
Component(s)
R