GForce as.double / as.numeric #2934

@HughParsonage

Is there a way to use GForce optimization with sum without triggering the warning that a group's sum has been coerced to numeric? If not, I propose that as.double (and its synonym as.numeric) be supported by GForce, so that one does not have to choose between avoiding the warning and keeping the optimization.

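That is, in the example below, the expression xsum = as.numeric(sum(x)) should be GForce-optimized. At present, wrapping sum(x) in as.numeric() disables GForce: the verbose output of the second call reports "GForce is on, left j unchanged" and then runs j with GForce FALSE.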
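For reference, here is a quick base-R check (no data.table needed) of why the first call warns. y = c(TRUE, FALSE) is recycled over the 5e6 rows, so each group holds 2.5e6 copies of 1e9L and each group sum is 2.5e15, far above the largest value an R integer can hold. The variable names are illustrative only.

```r
# Rebuild the grouping column from the example: y is recycled over 5e6 rows
n_rows <- 5e6
y <- rep(c(TRUE, FALSE), length.out = n_rows)

# Rows per group (FALSE, TRUE): 2500000 each
rows_per_group <- as.vector(table(y))

# Every row contributes x = 1e9, so each group sums to 2.5e15
group_sum <- 1e9 * rows_per_group

# Both group sums exceed the integer maximum (2147483647)
overflows <- group_sum > .Machine$integer.max

# Summing as double from the start is exact here, since 2.5e15 < 2^53
exact_sum <- sum(as.numeric(rep(1e9L, rows_per_group[1])))
```

Note that even three rows of 1e9L would exceed .Machine$integer.max, so any realistic group size in this example forces the coercion.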

# Minimal reproducible example

```r
library(data.table)
dt <- data.table(x = rep(1e9L, 5e6), y = c(TRUE, FALSE))
bench::system_time(dt[, .(xsum = sum(x)), by = "y", verbose = TRUE])
#> Detected that j uses these columns: x 
#> Finding groups using forderv ... 0.010sec 
#> Finding group sizes from the positions (can be avoided to save RAM) ... 0.000sec 
#> Getting back original order ... 0.000sec 
#> lapply optimization is on, j unchanged as 'list(sum(x))'
#> GForce optimized j to 'list(gsum(x))'
#> Making each group and running j (GForce TRUE) ...
#> Warning in gsum(x): Group 1 summed to more than type 'integer' can hold so
#> the result has been coerced to 'numeric' automatically, for convenience.
#> 0.050sec
#> process    real 
#>   125ms   128ms
bench::system_time(dt[, .(xsum = as.numeric(sum(x))), by = "y", verbose = TRUE])
#> Detected that j uses these columns: x 
#> Finding groups using forderv ... 0.020sec 
#> Finding group sizes from the positions (can be avoided to save RAM) ... 0.000sec 
#> Getting back original order ... 0.000sec 
#> lapply optimization is on, j unchanged as 'list(as.numeric(sum(x)))'
#> GForce is on, left j unchanged
#> Old mean optimization is on, left j unchanged.
#> Making each group and running j (GForce FALSE) ... 
#>   collecting discontiguous groups took 0.053s for 2 groups
#>   eval(j) took 0.004s for 2 calls
#> 0.040sec
#> process    real 
#>   141ms   149ms
sessionInfo()
#> R version 3.5.0 (2018-04-23)
#> Platform: x86_64-w64-mingw32/x64 (64-bit)
#> Running under: Windows 10 x64 (build 17134)
#> 
#> Matrix products: default
#> 
#> locale:
#> [1] LC_COLLATE=English_Australia.1252  LC_CTYPE=English_Australia.1252   
#> [3] LC_MONETARY=English_Australia.1252 LC_NUMERIC=C                      
#> [5] LC_TIME=English_Australia.1252    
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#> [1] data.table_1.11.4
#> 
#> loaded via a namespace (and not attached):
#>  [1] compiler_3.5.0  backports_1.1.2 bench_1.0.1     magrittr_1.5   
#>  [5] rprojroot_1.3-2 tools_3.5.0     htmltools_0.3.6 yaml_2.1.19    
#>  [9] Rcpp_0.12.17    stringi_1.2.2   rmarkdown_1.9   knitr_1.20.3   
#> [13] stringr_1.3.1   digest_0.6.15   evaluate_0.10.3
```

Created on 2018-06-14 by the reprex package (v0.2.0).
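Until as.double/as.numeric is handled by GForce, one possible workaround (not part of the original report; a sketch assuming a data.table version with GForce support for sum) is to convert the integer column to double once, by reference, before grouping. j is then a plain sum(x) on a double column, so GForce can still apply and the integer-coercion warning should not arise. The small table below is a hypothetical example sized so that each group would still overflow an integer.

```r
library(data.table)

# Hypothetical small example: 5 rows per group, each 1e9L, so each group
# sum (5e9) exceeds .Machine$integer.max
dt <- data.table(x = rep(1e9L, 10), y = c(TRUE, FALSE))

# Replace the whole column by reference; with no i, := may change its type
dt[, x := as.numeric(x)]

# j is now plain sum(x) on a double column, so it is eligible for GForce
# and no integer-overflow coercion warning is expected
res <- dt[, .(xsum = sum(x)), by = "y", verbose = TRUE]
res
```

The trade-off is memory: a double column takes 8 bytes per row versus 4 for an integer column. Wrapping the original call in suppressWarnings() also keeps GForce, but it silences every warning raised during the call, not only this one.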

Labels: GForce (issues relating to optimized grouping calculations), enhancement
