-
Notifications
You must be signed in to change notification settings - Fork 4k
ARROW-14575: [R] Allow functions with pkg:: prefixes
#13160
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
|
paleolimbot
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a few thoughts on the approach! This will be very exciting to have implemented.
pkg:: prefixes
|
It sounds like having a defused version of the Another design choice would be the location for the bindings. Do we want?
I am in favour of the arrow:::register_binding("::", function(lhs, rhs) {
lhs_name <- as.character(substitute(lhs))
rhs_name <- as.character(substitute(rhs))
arrow:::nse_funcs[[lhs_name]][[rhs_name]]
}) |
|
Given that currently we use a "single-bucket" approach, I'd keep it that way in the short term, unless there's a specific reason to change it? We can update it later if necessary. |
Yaya, that's what I'm working on. I went for the mixed bucket approach yesterday evening and earlier today and got really messy really quickly. |
|
I quite liked your suggestion of the single bucket but registering twice (i.e., do |
|
(Since starting to write that note I now see that you both have already hashed that out!) |
|
@paleolimbot @thisisnic @jonkeane Would you mind having another look? I can't request a review from @paleolimbot |
paleolimbot
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking good...I imagine the next step is to add the function prefixes (including base::) to all the regsiter_binding() calls and add some tests (maybe one per package?) to make sure that when somebody does call lubridate::as_datetime() that it works as we expect.
Yep, that's the plan. There are some tests failing - I need to deal with that too. |
|
outstanding issues:
arrow/r/tests/testthat/test-dplyr-funcs-datetime.R Lines 918 to 922 in b7507c3
|
|
I have no issue with either |
|
I think you're safe to make a user prefix |
a49eda5 to
8944f5b
Compare
|
I think this is ready for another look, with the following caveats:
|
paleolimbot
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few next steps for you...getting this to work in summarise() will be a set of changes with a different scope because it does its own inspection of the syntax tree. I think you should open another JIRA for it and implement it in a separate PR. I think the filter() issue you identified is actually a general issue and I think one of the review comments I left will fix it.
81aa495 to
642c23e
Compare
|
@nealrichardson I think I addressed all the feedback. I'm hoping I can maybe run the CI over the weekend if the r-hub image gets updated. |
nealrichardson
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very nice work! CI failures are unrelated, and since this fixes the lint issue that has appeared elsewhere, I'm inclined to merge sooner than later.
|
@github-actions crossbow submit test-r-minimal-build test-r-versions |
|
Revision: ac316ee Submitted crossbow builds: ursacomputing/crossbow @ actions-6a3852ff25
|
|
Benchmark runs are scheduled for baseline = 29cc263 and contender = 3e0eea1. 3e0eea1 is a master commit associated with this PR. Results will be available as each benchmark for each run completes. |
…Hub issue numbers (#34260) Rewrite the Jira issue numbers to the GitHub issue numbers, so that the GitHub issue numbers are automatically linked to the issues by pkgdown's auto-linking feature. Issue numbers have been rewritten based on the following correspondence. Also, the pkgdown settings have been changed and updated to link to GitHub. I generated the Changelog page using the `pkgdown::build_news()` function and verified that the links work correctly. --- ARROW-6338 #5198 ARROW-6364 #5201 ARROW-6323 #5169 ARROW-6278 #5141 ARROW-6360 #5329 ARROW-6533 #5450 ARROW-6348 #5223 ARROW-6337 #5399 ARROW-10850 #9128 ARROW-10624 #9092 ARROW-10386 #8549 ARROW-6994 #23308 ARROW-12774 #10320 ARROW-12670 #10287 ARROW-16828 #13484 ARROW-14989 #13482 ARROW-16977 #13514 ARROW-13404 #10999 ARROW-16887 #13601 ARROW-15906 #13206 ARROW-15280 #13171 ARROW-16144 #13183 ARROW-16511 #13105 ARROW-16085 #13088 ARROW-16715 #13555 ARROW-16268 #13550 ARROW-16700 #13518 ARROW-16807 #13583 ARROW-16871 #13517 ARROW-16415 #13190 ARROW-14821 #12154 ARROW-16439 #13174 ARROW-16394 #13118 ARROW-16516 #13163 ARROW-16395 #13627 ARROW-14848 #12589 ARROW-16407 #13196 ARROW-16653 #13506 ARROW-14575 #13160 ARROW-15271 #13170 ARROW-16703 #13650 ARROW-16444 #13397 ARROW-15016 #13541 ARROW-16776 #13563 ARROW-15622 #13090 ARROW-18131 #14484 ARROW-18305 #14581 ARROW-18285 #14615 * Closes: #33631 Authored-by: SHIMA Tatsuya <[email protected]> Signed-off-by: Sutou Kouhei <[email protected]>
This PR will allow the use of namespacing with bindings:
Created on 2022-05-14 by the reprex package (v2.0.1)
The approach (option 1 from the design doc):
add functionality to allow binding registration with the
pkg::fun()name;register_binding()to register 2 identical copies for eachpkg::funbinding, namelyfunandpkg::fun.::operator, which helps with retrieving bindings from the function registry.pkg::funfunctionality.register
nse_funcsrequiring indirect mappingpkg::prefix.nse_funcsbindings to include at least onepkg::fun()call for each bindingunit tests for conditional bindings
"dplyr::coalesce""dplyr::if_else""base::ifelse""dplyr::case_when"unit tests for date/time bindings
"base::strptime""base::strftime""lubridate::format_ISO8601""lubridate::is.Date""lubridate::is.instant""lubridate::is.timepoint""lubridate::is.POSIXct""lubridate::date""lubridate::second""lubridate::wday""lubridate::week""lubridate::month""lubridate::am""lubridate::pm""lubridate::tz""lubridate::semester""lubridate::make_datetime""lubridate::make_date""base::ISOdatetime""base::ISOdate""base::as.Date""lubridate::as_date""lubridate::as_datetime""lubridate::decimal_date""lubridate::date_decimal""base::difftime""base::as.difftime""lubridate::make_difftime""lubridate::dminutes""lubridate::dhours""lubridate::ddays""lubridate::dweeks""lubridate::dmonths""lubridate::dyears""lubridate::dseconds""lubridate::dmilliseconds""lubridate::dmicroseconds""lubridate::dnanoseconds""lubridate::dpicoseconds""lubridate::parse_date_time""lubridate::ymd""lubridate::ydm""lubridate::mdy""lubridate::myd""lubridate::dmy""lubridate::dym""lubridate::ym""lubridate::my""lubridate::yq""lubridate::fast_strptime"unit tests for math bindings
"base::log""base::logb""base::pmin""base::pmax""base::trunc""base::round""base::sqrt""base::exp"unit tests for string bindings
"base::paste""base::paste0""stringr::str_c""base::grepl""stringr::str_detect""stringr::str_like""stringr::str_count""base::startsWith""base::endsWith""stringr::str_starts""stringr::str_ends""base::sub""base::gsub""stringr::str_replace""stringr::str_replace_all""base::strsplit""stringr::str_split""base::nchar""stringr::str_to_lower""stringr::str_to_upper""stringr::str_to_title""stringr::str_trim""base::substr""base::substring""stringr::str_sub""stringr::str_pad"unit tests for type bindings
"base::as.character""base::as.double""base::as.integer""bit64::as.integer64""base::as.logical""base::as.numeric""methods::is""tibble::tibble""base::data.frame""base::is.character""base::is.numeric""base::is.double""base::is.integer""bit64::is.integer64""base::is.logical""base::is.factor""base::is.list""rlang::is_character""rlang::is_double""rlang::is_integer""rlang::is_list""rlang::is_logical""base::is.na""base::is.nan""dplyr::between""base::is.finite""base::is.infinite""base::format"register
nse_funcsrequiring direct mapping (unary and binary bindings)nse_funcsbindings to include at least onepkg::fun()call for each bindingUnary and binary bindings unit tests
arithmetic functions
"base::abs""base::ceiling""base::floor""base::log10""base::log1p""base::log2""base::sign"trigonometric functions
"base::acos""base::asin""base::cos""base::sin""base::tan"string functions
"stringr::str_length""stringi::stri_reverse""base::tolower""base::toupper"date and time functions
"lubridate::day""lubridate::dst""lubridate::hour""lubridate::isoweek""lubridate::epiweek""lubridate::isoyear""lubridate::epiyear""lubridate::minute""lubridate::quarter""lubridate::mday""lubridate::yday""lubridate::year""lubridate::leap_year"type conversion functions
"base::as.factor"binary functions
"base::strrep""stringr::str_dup"aggregating functions
agg_funcsagg_funcsunit tests for aggregating bindings
"base::sum""base::any""base::all""base::mean""stats::sd""stats::var""stats::quantile""stats::median""dplyr::n_distinct""dplyr::n""base::min""base::max"namespace qualified bindings work inside the {dplyr} action verbs:
filter()mutate()transmute()group_by()summarise()document changes in the Writing bindings article.
pkg::funwhen defining a binding, which will register 2 copies of the same binding.Bindings that will not be registered with a
pkg::prefix:cast()ordictionary_encode(), and"!","==","!=",">",">=","<","<=","&", etc.)