@venom1204 venom1204 commented Jul 11, 2025

Closes #5829

Changes made:

  • Added a check to handle cases where a function (closure) is mistakenly used on the RHS of :=.
  • Specifically checks: is.function(jval)
  • Placed inside the if (length(newnames) > 0) block, so it only triggers when creating a new column.
  • Prevents malformed data.table objects and confusing errors.

At present, this only covers the function case. Other exotic RHS types (e.g., environments, external pointers) are not yet handled.
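
A minimal sketch of the guard described above, written as a standalone base R function so it can be run outside data.table. The helper name `check_rhs_not_function` and the error message are hypothetical; in the PR the check sits inline next to the `newnames` computation and uses `lhs`, `names_x` and `jval` from the surrounding scope.

```r
# Hypothetical standalone version of the check; not data.table's actual internals.
check_rhs_not_function = function(lhs, names_x, jval) {
  # Columns named on the LHS that do not exist yet
  newnames = setdiff(lhs, names_x)
  # Only guard the "new column" path, mirroring the placement described above
  if (length(newnames) > 0L && is.function(jval)) {
    stop("RHS of `:=` is a function; did you forget to call it? (hypothetical message)")
  }
  invisible(newnames)
}

check_rhs_not_function("b", "a", 1:3)        # passes silently
try(check_rhs_not_function("b", "a", mean))  # signals the error
```

With this placement, `check_rhs_not_function("a", "a", mean)` returns without error, because the guard only runs when `newnames` is non-empty.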

Is there anything else that should be checked or added to this condition?
I closed the previous PR and opened this one; the full summary is provided above.

Hi @MichaelChirico @tdhock, could you please review this when you have time?

Thanks for your time.


github-actions bot commented Jul 11, 2025

  • HEAD=issue_5829 slower P<0.001 for memrecycle regression fixed in #5463
  • HEAD=issue_5829 slower P<0.001 for setDT improved in #5427

Generated via commit c8cec80


Task | Duration
--- | ---
R setup and installing dependencies | 2 minutes and 51 seconds
Installing different package versions | 20 seconds
Running and plotting the test cases | 2 minutes and 34 seconds


codecov bot commented Jul 11, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 98.51%. Comparing base (2f0d12f) to head (c8cec80).

Additional details and impacted files
@@           Coverage Diff           @@
##           master    #7161   +/-   ##
=======================================
  Coverage   98.50%   98.51%           
=======================================
  Files          81       81           
  Lines       15032    15036    +4     
=======================================
+ Hits        14808    14812    +4     
  Misses        224      224           

@tdhock
Member

tdhock commented Jul 11, 2025

Why did you open a new PR? https://github.com/Rdatatable/data.table/pull/6742/files has the same changes, right? Can you please close one or the other?

@venom1204
Contributor Author

I closed the previous one. I thought I would make a branch in this repo, so ...

@@ -1411,7 +1411,13 @@ replace_dot_alias = function(e) {
}

if (!is.null(lhs)) {
# TODO?: use set() here now that it can add new columns. Then remove newnames and alloc logic above.
newnames = setdiff(lhs, names(x))
Member

What's the difference between this assignment and the one above?

newnames=setdiff(lhs, names_x)

Contributor Author

both compute setdiff(lhs, names(x)), but the one above uses the cached names_x. I’ll update the later one to use names_x as well for consistency and to avoid unnecessary recomputation.

Member

Sorry, I don't understand: why recalculate newnames at all if it's just the same value as above?

Contributor Author

Got it, thanks. The second newnames assignment was redundant. I will take care of it.

@@ -1411,7 +1411,13 @@ replace_dot_alias = function(e) {
}

if (!is.null(lhs)) {
# TODO?: use set() here now that it can add new columns. Then remove newnames and alloc logic above.
newnames = setdiff(lhs, names(x))
if (length(newnames) > 0) {
Member

Does this error not happen when overwriting an existing column?

# TODO?: use set() here now that it can add new columns. Then remove newnames and alloc logic above.
newnames = setdiff(lhs, names(x))
if (length(newnames) > 0) {
if (is.function(jval)) {
Member

@MichaelChirico MichaelChirico Jul 11, 2025

I don't think this will catch the following case (which you should add as a new regression test):

DT=data.table(a=1:10)
DT[,c('a', 'b'):=.(1:10, mean)]
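
To illustrate why a bare `is.function(jval)` check would miss this case, here is a small base R sketch. It assumes the RHS reaches the check as an ordinary list (inside data.table, `.()` is an alias for `list()`). The helper `has_function_rhs` is hypothetical and only shows one way to look inside a list RHS; it is not the fix adopted in the PR.

```r
jval_direct = mean
jval_list = list(1:10, mean)

is.function(jval_direct)  # TRUE: a bare function is caught
is.function(jval_list)    # FALSE: the list wrapper hides the function

# Hypothetical helper: TRUE if the RHS is a function or a list containing one
has_function_rhs = function(jval) {
  if (is.function(jval)) return(TRUE)
  is.list(jval) && any(vapply(jval, is.function, logical(1L)))
}

has_function_rhs(jval_direct)  # TRUE
has_function_rhs(jval_list)    # TRUE
has_function_rhs(list(1:10))   # FALSE
```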

Member

@MichaelChirico MichaelChirico left a comment

Overall I suspect it will be more natural to put the fix here inside C

@venom1204
Contributor Author

venom1204 commented Jul 12, 2025

Okay, that makes sense. I'll try to implement it.

So far, I'm aware of these cases to handle:

  • Direct assignment of a function: DT[, b := mean]

  • Function inside a list: DT[, b := list(mean)]

  • Function assigned to a new column

Are there any other specific cases or edge conditions I should be aware of? Let me know so I can cover them in the implementation.

@MichaelChirico
Member

> Function assigned to a new column

I'm not sure how this differs from the first case.

You might try triggering this error from set() as well as :=.
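
A rough exploration script for the cases discussed in this thread, covering both `:=` and `set()`. It assumes data.table is installed. The helper `outcome` and the labels are hypothetical, and no particular result is claimed: the script only records whether each call errors and with what message, which can differ between data.table versions.

```r
library(data.table)

# Hypothetical helper: return the error message if `expr` fails, otherwise "no error"
outcome = function(expr) {
  tryCatch({ expr; "no error" }, error = function(e) conditionMessage(e))
}

results = c(
  walrus_new       = outcome(data.table(a = 1:3)[, b := mean]),
  walrus_list      = outcome(data.table(a = 1:3)[, b := list(mean)]),
  walrus_overwrite = outcome(data.table(a = 1:3)[, a := mean]),
  set_new          = outcome(set(data.table(a = 1:3), j = "b", value = mean)),
  set_list         = outcome(set(data.table(a = 1:3), j = "b", value = list(mean)))
)
print(results)
```

Each entry could then be turned into a regression test once the intended error message is settled.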

Successfully merging this pull request may close these issues.

various errors when RHS of := is closure/function
3 participants