[R] Allow functions with {{pkg::}} prefixes #30124

@asfimport

Description

Proposed approach:

  • Add functionality to allow registering a binding under its pkg::fun() name:
    • Modify register_binding() to register two identical copies of each pkg::fun binding, one under fun and one under pkg::fun.
    • Add a binding for the :: operator, which helps retrieve bindings from the function registry.
    • Add generic unit tests for the pkg::fun functionality.
  • Register the nse_funcs bindings that require indirect mapping:
    • Register each binding with and without the pkg:: prefix.
    • Add or update the unit tests for these bindings so that each includes at least one pkg::fun() call.
  • Register the nse_funcs bindings that require direct mapping (unary and binary bindings):
    • Register each binding with and without the pkg:: prefix.
    • Add or update the unit tests for these bindings so that each includes at least one pkg::fun() call.
  • Register the agg_funcs bindings for use with summarise().
  • Document the changes in the Writing bindings documentation:
    • Going forward, bindings should be defined using the pkg::fun name, which registers two copies of the same binding.
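
A minimal base R sketch of the dual registration idea described above. The registry environment `binding_registry` and the helper `register_binding_sketch()` are hypothetical stand-ins, not arrow's actual register_binding() or its internal registry, and the binding body is a placeholder rather than code that builds an Arrow expression.

```r
# Hypothetical stand-in for a binding registry (not arrow's actual internals)
binding_registry <- new.env(parent = emptyenv())

# Hypothetical helper: store `fun` under "pkg::fun" and also under "fun"
register_binding_sketch <- function(fun_name, fun, registry = binding_registry) {
  # Strip an optional "pkg::" prefix to get the unqualified name
  unqualified_name <- sub("^[^:]+::", "", fun_name)
  assign(fun_name, fun, envir = registry)
  if (unqualified_name != fun_name) {
    # Second, identical copy under the bare function name
    assign(unqualified_name, fun, envir = registry)
  }
  invisible(fun_name)
}

# Placeholder binding body; a real binding would build an Arrow expression
register_binding_sketch("lubridate::year", function(x) paste0("year(", x, ")"))

ls(binding_registry)
identical(binding_registry[["year"]], binding_registry[["lubridate::year"]])
#> [1] TRUE
```

With both keys present, a lookup succeeds whether the user writes year() or lubridate::year(), and both keys point at the same function object.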

Different implementation options are outlined and discussed in the design document.

Description:
Currently we implement a number of functions from packages like lubridate that work well when called without a namespace (e.g. {{year()}}). However, if someone calls {{lubridate::year()}}, the expression is reported as not supported (e.g. {{Warning: Expression lubridate::year(time_hour) not supported in Arrow}}) and the data is pulled into R. Could we check whether Arrow has a binding that matches the unprefixed function name and use it instead?
{code:r}
library(arrow, warn.conflicts = FALSE)
library(dplyr, warn.conflicts = FALSE)

ds <- InMemoryDataset$create(nycflights13::flights)

# Namespaced call: not recognized, so evaluation falls back to R
ds %>%
  mutate(year = lubridate::year(time_hour)) %>%
  collect()
#> Warning: Expression lubridate::year(time_hour) not supported in Arrow; pulling
#> data into R
#> # A tibble: 336,776 × 19
#> year month day dep_time sched_dep_time dep_delay arr_time sched_arr_time
#>
#> 1 2013 1 1 517 515 2 830 819
#> 2 2013 1 1 533 529 4 850 830
#> 3 2013 1 1 542 540 2 923 850
#> 4 2013 1 1 544 545 -1 1004 1022
#> 5 2013 1 1 554 600 -6 812 837
#> 6 2013 1 1 554 558 -4 740 728
#> 7 2013 1 1 555 600 -5 913 854
#> 8 2013 1 1 557 600 -3 709 723
#> 9 2013 1 1 557 600 -3 838 846
#> 10 2013 1 1 558 600 -2 753 745
#> # … with 336,766 more rows, and 11 more variables: arr_delay ,
#> # carrier , flight , tailnum , origin , dest ,
#> # air_time , distance , hour , minute , time_hour

# Unprefixed call: evaluated by Arrow without a warning
ds %>%
  mutate(year = year(time_hour)) %>%
  collect()
#> # A tibble: 336,776 × 19
#> year month day dep_time sched_dep_time dep_delay arr_time sched_arr_time
#>
#> 1 2013 1 1 517 515 2 830 819
#> 2 2013 1 1 533 529 4 850 830
#> 3 2013 1 1 542 540 2 923 850
#> 4 2013 1 1 544 545 -1 1004 1022
#> 5 2013 1 1 554 600 -6 812 837
#> 6 2013 1 1 554 558 -4 740 728
#> 7 2013 1 1 555 600 -5 913 854
#> 8 2013 1 1 557 600 -3 709 723
#> 9 2013 1 1 557 600 -3 838 846
#> 10 2013 1 1 558 600 -2 753 745
#> # … with 336,766 more rows, and 11 more variables: arr_delay ,
#> # carrier , flight , tailnum , origin , dest ,
#> # air_time , distance , hour , minute , time_hour
{code}
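
One detail that makes namespaced calls harder to match: in {{lubridate::year(time_hour)}} the head of the call is not the symbol year but itself a call to the :: operator. A minimal base R sketch of how a translation step could derive a registry key from such a call; `binding_key()` is a hypothetical helper, not part of arrow.

```r
# Hypothetical helper (not part of arrow): compute the registry key for a call,
# keeping the "pkg::" prefix when the call is namespaced
binding_key <- function(expr) {
  call_head <- expr[[1]]
  # For pkg::fun(...), the head of the call is itself a call to `::`
  if (is.call(call_head) && identical(call_head[[1]], as.name("::"))) {
    pkg <- as.character(call_head[[2]])
    fun <- as.character(call_head[[3]])
    return(paste0(pkg, "::", fun))
  }
  # Plain fun(...): the head is just the function's symbol
  as.character(call_head)
}

binding_key(quote(lubridate::year(time_hour)))
#> [1] "lubridate::year"
binding_key(quote(year(time_hour)))
#> [1] "year"
```

Under the proposed approach, either key would match one of the two copies registered for each binding.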

Reporter: Jonathan Keane / @jonkeane
Assignee: Dragoș Moldovan-Grünfeld / @dragosmg

Note: This issue was originally created as ARROW-14575. Please see the migration documentation for further details.