Labels: GForce (issues relating to optimized grouping calculations), enhancement
Description
Is there a way to use the GForce optimization with sum on an integer column without triggering the warning that a group's sum was coerced to numeric? If not, I propose that as.double be included in GForce, so that one does not have to choose between warnings and optimization.
That is, in the example below, the expression xsum = as.numeric(sum(x)) should be GForced. At present it is not: the first call is optimized to gsum(x) but warns, while the second call avoids the warning but runs with GForce FALSE.
# Minimal reproducible example
library(data.table)
dt <- data.table(x = rep(1e9L, 5e6), y = c(TRUE, FALSE))
bench::system_time(dt[, .(xsum = sum(x)), by = "y", verbose = TRUE])
#> Detected that j uses these columns: x
#> Finding groups using forderv ... 0.010sec
#> Finding group sizes from the positions (can be avoided to save RAM) ... 0.000sec
#> Getting back original order ... 0.000sec
#> lapply optimization is on, j unchanged as 'list(sum(x))'
#> GForce optimized j to 'list(gsum(x))'
#> Making each group and running j (GForce TRUE) ...
#> Warning in gsum(x): Group 1 summed to more than type 'integer' can hold so
#> the result has been coerced to 'numeric' automatically, for convenience.
#> 0.050sec
#> process real
#> 125ms 128ms
bench::system_time(dt[, .(xsum = as.numeric(sum(x))), by = "y", verbose = TRUE])
#> Detected that j uses these columns: x
#> Finding groups using forderv ... 0.020sec
#> Finding group sizes from the positions (can be avoided to save RAM) ... 0.000sec
#> Getting back original order ... 0.000sec
#> lapply optimization is on, j unchanged as 'list(as.numeric(sum(x)))'
#> GForce is on, left j unchanged
#> Old mean optimization is on, left j unchanged.
#> Making each group and running j (GForce FALSE) ...
#> collecting discontiguous groups took 0.053s for 2 groups
#> eval(j) took 0.004s for 2 calls
#> 0.040sec
#> process real
#> 141ms 149ms
sessionInfo()
#> R version 3.5.0 (2018-04-23)
#> Platform: x86_64-w64-mingw32/x64 (64-bit)
#> Running under: Windows 10 x64 (build 17134)
#>
#> Matrix products: default
#>
#> locale:
#> [1] LC_COLLATE=English_Australia.1252 LC_CTYPE=English_Australia.1252
#> [3] LC_MONETARY=English_Australia.1252 LC_NUMERIC=C
#> [5] LC_TIME=English_Australia.1252
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> other attached packages:
#> [1] data.table_1.11.4
#>
#> loaded via a namespace (and not attached):
#> [1] compiler_3.5.0 backports_1.1.2 bench_1.0.1 magrittr_1.5
#> [5] rprojroot_1.3-2 tools_3.5.0 htmltools_0.3.6 yaml_2.1.19
#> [9] Rcpp_0.12.17 stringi_1.2.2 rmarkdown_1.9 knitr_1.20.3
#> [13] stringr_1.3.1 digest_0.6.15 evaluate_0.10.3

Created on 2018-06-14 by the reprex package (v0.2.0).
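
A possible stopgap, sketched here under the assumption that the extra memory for a double copy of x is acceptable: convert the column before grouping, so that j remains a bare sum() call on a column name, which is the form the verbose output above shows being optimized to gsum. The column name xd is made up for illustration, and the data is a scaled-down version of the reprex. This does not replace the requested feature; it only avoids the integer overflow that triggers the warning.

```r
library(data.table)

# Scaled-down reprex data: each group's integer sum (5e9) exceeds the integer range
dt <- data.table(x = rep(1e9L, 10), y = c(TRUE, FALSE))

# Assumption: a temporary double copy of x is affordable (hypothetical column name xd)
dt[, xd := as.double(x)]

# j is a plain sum() of a column name; the input is already double,
# so no integer-to-numeric coercion is needed for the result
res <- dt[, .(xsum = sum(xd)), by = "y"]

# Drop the temporary column again
dt[, xd := NULL]
```

Adding verbose = TRUE to the grouped call should show whether GForce was applied on a given data.table version.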