Description
Proposed approach:
- add functionality to allow binding registration with the pkg::fun() name:
  - modify register_binding() to register 2 identical copies for each pkg::fun binding, fun and pkg::fun
  - add a binding for the :: operator, which helps with retrieving bindings from the function registry
  - add generic unit tests for the pkg::fun functionality
- register nse_funcs requiring indirect mapping
  - register each binding with and without the pkg:: prefix
  - add / update unit tests for the nse_funcs bindings to include at least one pkg::fun() call for each binding
- register nse_funcs requiring direct mapping (unary and binary bindings)
  - register each binding with and without the pkg:: prefix
  - add / update unit tests for the nse_funcs bindings to include at least one pkg::fun() call for each binding
- register agg_funcs for use with summarise()
- document changes in the Writing bindings documentation
- going forward we should be using pkg::fun when defining a binding, which will register 2 copies of the same binding
Different implementation options are outlined and discussed in the design document.
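As a rough illustration of the "register 2 copies" idea above, here is a minimal sketch using a toy registry. It is not the arrow package's actual register_binding() implementation; toy_registry and register_toy_binding are hypothetical names invented for this example.

```r
# Toy function registry: an environment mapping names to binding functions.
# All names here (toy_registry, register_toy_binding) are hypothetical.
toy_registry <- new.env(parent = emptyenv())

register_toy_binding <- function(fun_name, fun) {
  # Always register under the name exactly as given.
  assign(fun_name, fun, envir = toy_registry)
  # If the name has the form "pkg::fun", also register the bare "fun".
  if (grepl("::", fun_name, fixed = TRUE)) {
    unqualified <- sub("^.*?::", "", fun_name, perl = TRUE)
    assign(unqualified, fun, envir = toy_registry)
  }
  invisible(fun)
}

# Defining the binding once with its pkg::fun name yields two registry entries.
register_toy_binding("lubridate::year", function(x) {
  as.integer(format(x, "%Y"))
})

ls(toy_registry)
```

With this shape, a binding author only writes the prefixed name, and both spellings resolve to the same function object.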
Description:
Currently we implement a number of functions from packages like lubridate, which work well when called without namespacing (e.g. year()). However, if someone calls lubridate::year(), the expression is reported as not supported (e.g. Warning: Expression lubridate::year(time_hour) not supported in Arrow) and the data is pulled into R. Would it be possible for us to check whether we have an Arrow binding that matches the function itself, regardless of the namespace prefix?
{code:r}
library(arrow, warn.conflicts = FALSE)
library(dplyr, warn.conflicts = FALSE)

ds <- InMemoryDataset$create(nycflights13::flights)
ds %>%
mutate(year = lubridate::year(time_hour)) %>%
collect()
#> Warning: Expression lubridate::year(time_hour) not supported in Arrow; pulling
#> data into R
#> # A tibble: 336,776 × 19
#> year month day dep_time sched_dep_time dep_delay arr_time sched_arr_time
#>
#> 1 2013 1 1 517 515 2 830 819
#> 2 2013 1 1 533 529 4 850 830
#> 3 2013 1 1 542 540 2 923 850
#> 4 2013 1 1 544 545 -1 1004 1022
#> 5 2013 1 1 554 600 -6 812 837
#> 6 2013 1 1 554 558 -4 740 728
#> 7 2013 1 1 555 600 -5 913 854
#> 8 2013 1 1 557 600 -3 709 723
#> 9 2013 1 1 557 600 -3 838 846
#> 10 2013 1 1 558 600 -2 753 745
#> # … with 336,766 more rows, and 11 more variables: arr_delay ,
#> # carrier , flight , tailnum , origin , dest ,
#> # air_time , distance , hour , minute , time_hour
ds %>%
mutate(year = year(time_hour)) %>%
collect()
#> # A tibble: 336,776 × 19
#> year month day dep_time sched_dep_time dep_delay arr_time sched_arr_time
#>
#> 1 2013 1 1 517 515 2 830 819
#> 2 2013 1 1 533 529 4 850 830
#> 3 2013 1 1 542 540 2 923 850
#> 4 2013 1 1 544 545 -1 1004 1022
#> 5 2013 1 1 554 600 -6 812 837
#> 6 2013 1 1 554 558 -4 740 728
#> 7 2013 1 1 555 600 -5 913 854
#> 8 2013 1 1 557 600 -3 709 723
#> 9 2013 1 1 557 600 -3 838 846
#> 10 2013 1 1 558 600 -2 753 745
#> # … with 336,766 more rows, and 11 more variables: arr_delay ,
#> # carrier , flight , tailnum , origin , dest ,
#> # air_time , distance , hour , minute , time_hour
{code}
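One way a binding for the :: operator could make the first call above resolve is sketched below in plain R. This is an assumption-laden toy, not the arrow package's code: toy_registry, toy_double_colon and eval_with_bindings are hypothetical names, and the "binding" simply extracts the year from a Date rather than building an Arrow expression.

```r
# Hypothetical registry holding both spellings of one binding.
toy_registry <- new.env(parent = emptyenv())
year_binding <- function(x) as.integer(format(x, "%Y"))
assign("year", year_binding, envir = toy_registry)
assign("lubridate::year", year_binding, envir = toy_registry)

# Replacement for `::`, used only while evaluating the user's expression.
toy_double_colon <- function(pkg, name) {
  # The arguments arrive as unevaluated symbols, so capture them as strings.
  pkg <- as.character(substitute(pkg))
  name <- as.character(substitute(name))
  key <- paste0(pkg, "::", name)
  if (exists(key, envir = toy_registry, inherits = FALSE)) {
    get(key, envir = toy_registry)
  } else {
    # No binding registered: fall back to the real exported function.
    getExportedValue(pkg, name)
  }
}

# Evaluate an expression against `data`, with the registry and the
# overridden `::` visible ahead of base R.
eval_with_bindings <- function(expr, data) {
  mask <- list2env(as.list(toy_registry), parent = baseenv())
  assign("::", toy_double_colon, envir = mask)
  eval(expr, envir = data, enclos = mask)
}

flights <- list(time_hour = as.Date(c("2013-01-01", "2014-06-15")))
eval_with_bindings(quote(lubridate::year(time_hour)), flights)
eval_with_bindings(quote(year(time_hour)), flights)
```

Because the lookup happens in the registry first, the prefixed call in this sketch does not require lubridate to be installed; unregistered pkg::fun calls still reach the real package through the fallback.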
Reporter: Jonathan Keane / @jonkeane
Assignee: Dragoș Moldovan-Grünfeld / @dragosmg
Subtasks:
- [R] Add functionality to allow binding registration with the pkg::fun name
- [R] Register nse_funcs requiring indirect mapping
- [R] Register nse_funcs requiring direct mapping (unary & binary)
- [R] [Doc] Document changes in the Writing bindings documentation
Related issues:
- [R] dplyr n function cannot be called with dplyr::n() (is duplicated by)
- [R] Create a function registry for our NSE funcs (relates to)
PRs and other links:
Note: This issue was originally created as ARROW-14575. Please see the migration documentation for further details.