Skip to content

Conversation

venom1204
Copy link
Contributor

closes #2606

This PR adds a new recursive=TRUE argument to the tables() function to detect data.table objects nested within plain lists,

Changes introduced:

  • Added recursive=FALSE (default) to preserve backward compatibility.

When recursive=TRUE, tables():

  • Recursively traverses plain lists in the environment to find nested data.tables.
  • Skips data.frame and data.table containers to avoid descending into list-columns.
  • Handles cyclic list references safely using an internal visited environment.
  • Generates user-friendly names for nested objects (e.g., mylist$dt, nested[[1]][[2]]).

The recursive logic is implemented iteratively for performance and stack safety.

hi @MichaelChirico can you please have a look when you have time.
thanks..

@venom1204 venom1204 requested a review from MichaelChirico as a code owner July 8, 2025 16:52
Copy link

codecov bot commented Jul 8, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
⚠️ Please upload report for BASE (master@c27ec26). Learn more about missing BASE report.

Additional details and impacted files
@@            Coverage Diff            @@
##             master    #7141   +/-   ##
=========================================
  Coverage          ?   98.79%           
=========================================
  Files             ?       81           
  Lines             ?    15286           
  Branches          ?        0           
=========================================
  Hits              ?    15102           
  Misses            ?      184           
  Partials          ?        0           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Copy link

github-actions bot commented Jul 8, 2025

  • HEAD=issue_2606 slower P<0.001 for memrecycle regression fixed in #5463
  • HEAD=issue_2606 slower P<0.001 for DT[by,verbose=TRUE] improved in #6296
    Comparison Plot

Generated via commit dec9834

Download link for the artifact containing the test results: ↓ atime-results.zip

Task Duration
R setup and installing dependencies 2 minutes and 59 seconds
Installing different package versions 1 minutes and 19 seconds
Running and plotting the test cases 2 minutes and 34 seconds

@venom1204
Copy link
Contributor Author

hi @MichaelChirico ,can you please review it if has met the required conditions.

working of the fix -

  • Activation: The new search logic only runs if recursive is TRUE. Otherwise, the function behaves exactly as it did before.

  • The "To-Do List": It creates a to-do list (an agenda) that starts with all the objects in your current workspace.

The Work Loop: The code then loops through this agenda one item at a time until it's empty. Inside the loop:

  • If it's a data.table: It's a match! The item is saved to a final results list.

  • If it's a list: This is where it goes deeper. All of the list's contents are added to the front of the to-do list, each with a descriptive name (like my_list$item).

  • If it's anything else: It's simply ignored.

  • The Safety Net (Cycle Detection): This is the most important part. Before exploring a list, the code checks a high-speed "memory" to see if it has visited that exact list before. If so, it skips it. This prevents the code from ever getting stuck in an infinite loop.

  • Displaying Results: After the search is complete, the function takes all the data.tables it found and formats them into the same neat summary table as before.

@MichaelChirico
Copy link
Member

I will take some time to review as this PR is non-trivial, thanks for your patience.

@@ -21365,3 +21365,28 @@ test(2328.1, levels(droplevels(DT)$f), character())
DT[, i := integer()]
DT[, f2 := factor()]
test(2328.2, droplevels(DT), data.table(f=factor(), i=integer(), f2=factor()))

#2606
test(2329.1, {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please avoid using many expressions inside test(); also, note that the default for y is TRUE, so omit it if that's the target value you're testing.

For tables(), it's a bit troublesome to test because it's specific to the current environment. I recommend trying something like this for these tests:

local({
  dt1 <- data.table(a=1)
  tables()
})

That is assured to only pick up objects defined within the local({...}) scope.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I updated it 👍

R/tables.R Outdated
found_items = lapply(w, function(i) list(name=names[i], obj=obj[[i]]))
}
}
if (!length(found_items)) { # MODIFIED: Check `found_items` instead of `w`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we don't need this comment

@@ -21426,3 +21426,27 @@ test(2330.7, as.data.table(list(z), keep.rownames=TRUE), data.table(rn=rep("", 3

M <- matrix(1:6, nrow=3, dimnames=list(rep("", 3), c("V1", "V2"))) # test of list(M) for empty-rowname'd matrix input
test(2330.8, as.data.table(list(M), keep.rownames=TRUE), data.table(rn=rep("", 3), V1=1:3, V2=4:6))

#2606 Recursive tables() naming convention
test(2331.1, local({
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would combine these three into one test:

local({
  lst_named = list(...)
  lst_unnamed = list(...)
  nested = list(...)
  test(, tables(recursive=TRUE)$NAME, c(...))
})

Please also include tests that cover:

  • Mixed named/unnamed simple lists: list(data.table(..), b=data.table(...))
  • Mixing named/unnamed in a nested setting too

"nested$l1$l2" %in% tables(recursive = TRUE)$NAME
}))
test(2331.4, local({
cycle <- list()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for cycles, I don't think list() is the case we should care about, AFAIK it's not possible to create an address cycle in a list() because of R's copy-on-write semantics. Rather, we should be testing the behavior of an environment with such a cycle, c.f.

l = list(1)
l[[2]] = l
address(l) == address(l[[2]])
# [1] FALSE

e =list2env(list(a = 1))
e$cpy = e
address(e) == address(e$cpy)
# [1] TRUE

@@ -21489,3 +21489,45 @@ if (test_bit64) {
# support rotate idiom
M1 = M2 = matrix(1:4, nrow=2)
test(2332.81, {M1[]=frev(M1); M1}, {M2[]=rev(M2); M2})

#2606 Recursive tables() naming convention
test(2333.1, local({
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please read my response more carefully, this is not what I recommended

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I’ve merged the three tests into a single test case as suggested. Could you please let me know if there’s anything that might still be missing ? —thanks again!

@@ -21523,3 +21523,40 @@ test(2332.81, {M1[]=frev(M1); M1}, {M2[]=rev(M2); M2})

# regression test of edge case report #4964
test(2333, as.expression(data.table(a = 1))[["a"]], 1)

#2606 Recursive tables() naming convention
test(2334.1, local({
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

again: don't use local() inside test(), but the other way around.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i have submitted the modified test do they align with the requirements .
thankks

R/tables.R Outdated
} else {
w = which(vapply_1b(objs, is.data.table))
if (length(w)) {
found_items = lapply(w, function(i) list(name=names[i], obj=obj[[i]]))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does this lead to new copies of the data.table? please check the performance, e.g. consider adding an {atime} test here that confirms tables() performs the same under !recursive before and after this change.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No copies are made. found_items stores a list of names and references (pointers) to the original data.tables.

@@ -286,5 +286,18 @@ test.list <- atime::atime_test_list(
Slow = "548410d23dd74b625e8ea9aeb1a5d2e9dddd2927", # Parent of the first commit in the PR (https://github.com/Rdatatable/data.table/commit/548410d23dd74b625e8ea9aeb1a5d2e9dddd2927)
Fast = "c0b32a60466bed0e63420ec105bc75c34590865e"), # Commit in the PR (https://github.com/Rdatatable/data.table/pull/7144/commits) that uses a much faster implementation

"tables() !recursive refactor in #2606" = atime::atime_test(
N = as.integer(10^seq(1, 4, by=0.5)),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you should omit N unless you need it to be different from the global definition above (and I don't think you do in this case).

  N = as.integer(10^seq(1, 7, by=0.25)),

Comment on lines 299 to 300
"before" = "5bb645082aa5c4a295cdd211a5a75c849d590b75",
"after" = "8978cf201d8d228506e1e96d3eda7e542471720a"),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please remove double quotes from "before" and "after" (they are only necessary for non-syntactic names)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also please add comments to explain where these version numbers come from.

@tdhock
Copy link
Member

tdhock commented Jul 21, 2025

atime job errors with this output

* DONE (data.table.8978cf201d8d228506e1e96d3eda7e542471720a)
Registered S3 methods overwritten by 'data.table.5bb645082aa5c4a295cdd211a5a75c849d590b75':
  method                   from                                               
  all.equal.data.table     data.table.c0b32a60466bed0e63420ec105bc75c34590865e
...
  sort_by.data.table       data.table.c0b32a60466bed0e63420ec105bc75c34590865e
Error: Error in `[.data.frame`(x, i) : undefined columns selected
Calls: <Anonymous> ... <Anonymous> -> system.time -> do.call -> <Anonymous>
Timing stopped at: 0.06 0.002 0.062
Execution halted
Error: Process completed with exit code 1.

this suggests there is some issue with the version of data.table it is trying to run, 5bb6450, which is the "before" historical version introduced in the new atime test.

@venom1204 venom1204 requested a review from tdhock July 21, 2025 17:03
Comment on lines 303 to 304
before = "2b191aec3df51675a7ab5e6701384a8b89470af6", # The merge-base parent commit for PR #2606
after = "66024e6ea5304ae059b18c71de9973715ddb77ce"), # The final commit of PR #2606
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please use the same template for comments as the other test cases (inlcude URLs)
also "final commit" does not make sense for a PR which has not yet been merged

@venom1204 venom1204 requested a review from tdhock July 21, 2025 17:21
@@ -300,8 +300,8 @@ test.list <- atime::atime_test_list(
data.table::tables(env = test_env, silent = TRUE, index = TRUE)
NULL
},
before = "2b191aec3df51675a7ab5e6701384a8b89470af6", # The merge-base parent commit for PR #2606
after = "66024e6ea5304ae059b18c71de9973715ddb77ce"), # The final commit of PR #2606
before = "2b191aec3df51675a7ab5e6701384a8b89470af6", # Merge-base parent commit for PR #2606 (https://github.com/Rdatatable/data.table/commit/2b191aec3df51675a7ab5e6701384a8b89470af6
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please read https://github.com/Rdatatable/data.table/wiki/Performance-testing#documentation-of-historical-commits
the link should be to a PR page (on a page where we can see how the commit id is related to the PR), not to the commit page (on which we do not see anything related to PR2606).

also there is no PR2606, but there is an issue 2606.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also there is no PR2606, but there is an issue 2606.
that was a type error it was meant to be 7141

before = "2b191aec3df51675a7ab5e6701384a8b89470af6", # Merge-base parent commit for PR #2606 (https://github.com/Rdatatable/data.table/commit/2b191aec3df51675a7ab5e6701384a8b89470af6
after = "66024e6ea5304ae059b18c71de9973715ddb77ce"), # Commit in the PR (https://github.com/Rdatatable/data.table/pull/7141/commits)
before = "2b191aec3df51675a7ab5e6701384a8b89470af6", # Parent of the first commit in the PR (https://github.com/Rdatatable/data.table/commit/2b191aec3df51675a7ab5e6701384a8b89470af6)
after = "66024e6ea5304ae059b1e96d3eda7e542471720a"), # Commit in the PR (https://github.com/Rdatatable/data.table/pull/7141/commits) that uses a much faster implementation
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this is much faster then please change before to Slow and after to Fast

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was only for checking, to ensure there’s no performance regression after adding the recursive argument.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

tables could look for en-list-ed data.tables as well
3 participants