-
Notifications
You must be signed in to change notification settings - Fork 1k
tables() supports recursive search for nested data.tables via recursive=TRUE argument #7141
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## master #7141 +/- ##
=========================================
Coverage ? 98.79%
=========================================
Files ? 81
Lines ? 15286
Branches ? 0
=========================================
Hits ? 15102
Misses ? 184
Partials ? 0 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
Generated via commit dec9834 Download link for the artifact containing the test results: ↓ atime-results.zip
|
hi @MichaelChirico ,can you please review it if has met the required conditions. working of the fix -
The Work Loop: The code then loops through this agenda one item at a time until it's empty. Inside the loop:
|
I will take some time to review as this PR is non-trivial, thanks for your patience. |
inst/tests/tests.Rraw
Outdated
@@ -21365,3 +21365,28 @@ test(2328.1, levels(droplevels(DT)$f), character()) | |||
DT[, i := integer()] | |||
DT[, f2 := factor()] | |||
test(2328.2, droplevels(DT), data.table(f=factor(), i=integer(), f2=factor())) | |||
|
|||
#2606 | |||
test(2329.1, { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please avoid using many expressions inside test()
; also, note that the default for y
is TRUE
, so omit it if that's the target value you're testing.
For tables()
, it's a bit troublesome to test because it's specific to the current environment. I recommend trying something like this for these tests:
local({
dt1 <- data.table(a=1)
tables()
})
That is assured to only pick up objects defined within the local({...})
scope.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I updated it 👍
R/tables.R
Outdated
found_items = lapply(w, function(i) list(name=names[i], obj=obj[[i]])) | ||
} | ||
} | ||
if (!length(found_items)) { # MODIFIED: Check `found_items` instead of `w` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we don't need this comment
inst/tests/tests.Rraw
Outdated
@@ -21426,3 +21426,27 @@ test(2330.7, as.data.table(list(z), keep.rownames=TRUE), data.table(rn=rep("", 3 | |||
|
|||
M <- matrix(1:6, nrow=3, dimnames=list(rep("", 3), c("V1", "V2"))) # test of list(M) for empty-rowname'd matrix input | |||
test(2330.8, as.data.table(list(M), keep.rownames=TRUE), data.table(rn=rep("", 3), V1=1:3, V2=4:6)) | |||
|
|||
#2606 Recursive tables() naming convention | |||
test(2331.1, local({ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would combine these three into one test:
local({
lst_named = list(...)
lst_unnamed = list(...)
nested = list(...)
test(, tables(recursive=TRUE)$NAME, c(...))
})
Please also include tests that cover:
- Mixed named/unnamed simple lists:
list(data.table(..), b=data.table(...))
- Mixing named/unnamed in a nested setting too
inst/tests/tests.Rraw
Outdated
"nested$l1$l2" %in% tables(recursive = TRUE)$NAME | ||
})) | ||
test(2331.4, local({ | ||
cycle <- list() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for cycles, I don't think list()
is the case we should care about, AFAIK it's not possible to create an address cycle in a list() because of R's copy-on-write semantics. Rather, we should be testing the behavior of an environment with such a cycle, c.f.
l = list(1)
l[[2]] = l
address(l) == address(l[[2]])
# [1] FALSE
e =list2env(list(a = 1))
e$cpy = e
address(e) == address(e$cpy)
# [1] TRUE
inst/tests/tests.Rraw
Outdated
@@ -21489,3 +21489,45 @@ if (test_bit64) { | |||
# support rotate idiom | |||
M1 = M2 = matrix(1:4, nrow=2) | |||
test(2332.81, {M1[]=frev(M1); M1}, {M2[]=rev(M2); M2}) | |||
|
|||
#2606 Recursive tables() naming convention | |||
test(2333.1, local({ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please read my response more carefully, this is not what I recommended
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I’ve merged the three tests into a single test case as suggested. Could you please let me know if there’s anything that might still be missing ? —thanks again!
inst/tests/tests.Rraw
Outdated
@@ -21523,3 +21523,40 @@ test(2332.81, {M1[]=frev(M1); M1}, {M2[]=rev(M2); M2}) | |||
|
|||
# regression test of edge case report #4964 | |||
test(2333, as.expression(data.table(a = 1))[["a"]], 1) | |||
|
|||
#2606 Recursive tables() naming convention | |||
test(2334.1, local({ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
again: don't use local()
inside test()
, but the other way around.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i have submitted the modified test do they align with the requirements .
thankks
R/tables.R
Outdated
} else { | ||
w = which(vapply_1b(objs, is.data.table)) | ||
if (length(w)) { | ||
found_items = lapply(w, function(i) list(name=names[i], obj=obj[[i]])) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
does this lead to new copies of the data.table? please check the performance, e.g. consider adding an {atime} test here that confirms tables()
performs the same under !recursive
before and after this change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No copies are made. found_items stores a list of names and references (pointers) to the original data.tables.
.ci/atime/tests.R
Outdated
@@ -286,5 +286,18 @@ test.list <- atime::atime_test_list( | |||
Slow = "548410d23dd74b625e8ea9aeb1a5d2e9dddd2927", # Parent of the first commit in the PR (https://github.com/Rdatatable/data.table/commit/548410d23dd74b625e8ea9aeb1a5d2e9dddd2927) | |||
Fast = "c0b32a60466bed0e63420ec105bc75c34590865e"), # Commit in the PR (https://github.com/Rdatatable/data.table/pull/7144/commits) that uses a much faster implementation | |||
|
|||
"tables() !recursive refactor in #2606" = atime::atime_test( | |||
N = as.integer(10^seq(1, 4, by=0.5)), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you should omit N unless you need it to be different from the global definition above (and I don't think you do in this case).
N = as.integer(10^seq(1, 7, by=0.25)),
.ci/atime/tests.R
Outdated
"before" = "5bb645082aa5c4a295cdd211a5a75c849d590b75", | ||
"after" = "8978cf201d8d228506e1e96d3eda7e542471720a"), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please remove double quotes from "before" and "after" (they are only necessary for non-syntactic names)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also please add comments to explain where these version numbers come from.
atime job errors with this output * DONE (data.table.8978cf201d8d228506e1e96d3eda7e542471720a)
Registered S3 methods overwritten by 'data.table.5bb645082aa5c4a295cdd211a5a75c849d590b75':
method from
all.equal.data.table data.table.c0b32a60466bed0e63420ec105bc75c34590865e
...
sort_by.data.table data.table.c0b32a60466bed0e63420ec105bc75c34590865e
Error: Error in `[.data.frame`(x, i) : undefined columns selected
Calls: <Anonymous> ... <Anonymous> -> system.time -> do.call -> <Anonymous>
Timing stopped at: 0.06 0.002 0.062
Execution halted
Error: Process completed with exit code 1. this suggests there is some issue with the version of data.table it is trying to run, 5bb6450, which is the "before" historical version introduced in the new atime test. |
.ci/atime/tests.R
Outdated
before = "2b191aec3df51675a7ab5e6701384a8b89470af6", # The merge-base parent commit for PR #2606 | ||
after = "66024e6ea5304ae059b18c71de9973715ddb77ce"), # The final commit of PR #2606 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please use the same template for comments as the other test cases (inlcude URLs)
also "final commit" does not make sense for a PR which has not yet been merged
.ci/atime/tests.R
Outdated
@@ -300,8 +300,8 @@ test.list <- atime::atime_test_list( | |||
data.table::tables(env = test_env, silent = TRUE, index = TRUE) | |||
NULL | |||
}, | |||
before = "2b191aec3df51675a7ab5e6701384a8b89470af6", # The merge-base parent commit for PR #2606 | |||
after = "66024e6ea5304ae059b18c71de9973715ddb77ce"), # The final commit of PR #2606 | |||
before = "2b191aec3df51675a7ab5e6701384a8b89470af6", # Merge-base parent commit for PR #2606 (https://github.com/Rdatatable/data.table/commit/2b191aec3df51675a7ab5e6701384a8b89470af6 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please read https://github.com/Rdatatable/data.table/wiki/Performance-testing#documentation-of-historical-commits
the link should be to a PR page (on a page where we can see how the commit id is related to the PR), not to the commit page (on which we do not see anything related to PR2606).
also there is no PR2606, but there is an issue 2606.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also there is no PR2606, but there is an issue 2606.
that was a type error it was meant to be 7141
.ci/atime/tests.R
Outdated
before = "2b191aec3df51675a7ab5e6701384a8b89470af6", # Merge-base parent commit for PR #2606 (https://github.com/Rdatatable/data.table/commit/2b191aec3df51675a7ab5e6701384a8b89470af6 | ||
after = "66024e6ea5304ae059b18c71de9973715ddb77ce"), # Commit in the PR (https://github.com/Rdatatable/data.table/pull/7141/commits) | ||
before = "2b191aec3df51675a7ab5e6701384a8b89470af6", # Parent of the first commit in the PR (https://github.com/Rdatatable/data.table/commit/2b191aec3df51675a7ab5e6701384a8b89470af6) | ||
after = "66024e6ea5304ae059b1e96d3eda7e542471720a"), # Commit in the PR (https://github.com/Rdatatable/data.table/pull/7141/commits) that uses a much faster implementation |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this is much faster then please change before to Slow and after to Fast
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It was only for checking, to ensure there’s no performance regression after adding the recursive argument.
closes #2606
This PR adds a new recursive=TRUE argument to the tables() function to detect data.table objects nested within plain lists,
Changes introduced:
When recursive=TRUE, tables():
The recursive logic is implemented iteratively for performance and stack safety.
hi @MichaelChirico can you please have a look when you have time.
thanks..