-
Notifications
You must be signed in to change notification settings - Fork 1k
Migrate most uses of SETLENGTH to the resizable API
#7451
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
Thanks to Luke Tierney for introducing the API and helping with the migration.
Make sure to set the GROWABLE_BIT on the resizable vectors to avoid problems when they are duplicated or garbage-collected.
Now that data.table objects have the GROWABLE_BIT set, R will reset TRUELENGTH when duplicating them, causing our code to take a different branch.
Now that (1) we depend on R >= 3.4 and (2) data.table objects have the GROWABLE_BIT set, there is no need to adjust allocated memory counts by hand.
Since adaptive application of rolling functions requires us to resize the argument to match the window size, make sure to allocate it as such.
- Don't SET_TRUELENGTH by hand. All of our resizable vectors now have the GROWABLE_BIT set, so when they are duplicated, TRUELENGTH is reset to 0. - Use a combination of R_isResizable and R_maxLength to replace other uses of TRUELENGTH.
|
No obvious timing issues in HEAD=resizable-API Generated via commit 5caf9cc Download link for the artifact containing the test results: ↓ atime-results.zip
|
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## master #7451 +/- ##
==========================================
- Coverage 99.07% 99.06% -0.01%
==========================================
Files 85 86 +1
Lines 16610 16600 -10
==========================================
- Hits 16456 16445 -11
- Misses 154 155 +1 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
ben-schwen
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, TY!
|
@aitap place for allocate as resizeable is perfect. I am not sure about by.column usage there, for by.column=TRUE it seems that columns will not be resizeable but only list of pointers to columns. Am I wrong? |
src/frollapply.c
Outdated
| } | ||
| return x; | ||
| SEXP copyAsGrowable(SEXP x, SEXP by_column) { | ||
| if (LOGICAL_RO(by_column)[0]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Guard length(by_column) == 0 or use a different macro. (I can see the R logic requires TRUE or FALSE but it's best not to trust anything about a SEXP coming from R.)
| for (int i = 0; i < ncol; i++) { | ||
| SEXP d = VECTOR_ELT(dest, i); | ||
| SETLENGTH(d, nrow); | ||
| R_resizeVector(d, nrow); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably overly defensive but assume R_resizeVector strips all but values. So names might be lost in corner cases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lost naming is fine in this use case
|
|
||
| int overAlloc = checkOverAlloc(GetOption1(install("datatable.alloccol"))); | ||
| SEXP ans = PROTECT(allocVector(VECSXP, LENGTH(cols)+overAlloc)); nprotect++; // doing alloc.col directly here; eventually alloc.col can be deprecated. | ||
| SEXP ans = PROTECT(R_allocResizableVector(VECSXP, LENGTH(cols)+overAlloc)); nprotect++; // doing alloc.col directly here; eventually alloc.col can be deprecated. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the comment still applicable?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's been there since 8e272c8. I guess not. Should I remove it altogether?
| # define R_duplicateAsResizable(x) R_duplicateAsResizable_(x) | ||
| # define R_resizeVector(x, newlen) SETLENGTH(x, newlen) | ||
| # define R_maxLength(x) TRUELENGTH(x) | ||
| # define R_isResizable(x) R_isResizable_(x) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this need
if (ALTREP(x)) return false;There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch. We also checked for ALTREP(x) in the proposed version
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could also teach copyAsPlain to allocate resizable vectors and use it instead of R_duplicateAsResizable, but for now this works.
src/data.table.h
Outdated
| } | ||
| # define R_duplicateAsResizable(x) R_duplicateAsResizable_(x) | ||
| # define R_resizeVector(x, newlen) SETLENGTH(x, newlen) | ||
| # define R_maxLength(x) TRUELENGTH(x) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
return R_isResizable_(x) ? (R_xlen_t)TRUELENGTH(x) : 0;
might be nicer
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
WRE doesn't document what's the R_maxLength() of a non-resizable vector, but in R-devel this function returns xlength(), not 0 (which necessitates R_isResizable and the LEVELS trick below). So far we're just lucky/careful that the TRUELENGTH() calls replaced with R_maxLength() don't contradict each other too much.
I can make a more precise backport of R_maxLength(), of course.
| return ret; | ||
| } | ||
| # define R_duplicateAsResizable(x) R_duplicateAsResizable_(x) | ||
| # define R_resizeVector(x, newlen) SETLENGTH(x, newlen) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be guarded by R_isResizable_(x) and new_len <= TRUELENGTH(x), and no attributes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. It would be cleaner to guard for them, so we not only catch them via R-devels runner
if (!R_isResizable(x))
error(_("not a resizable vector"));
if (newlen > XTRUELENGTH(x))
error(_("'newlen' is too large"));There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we should introduce extra checks here, this is called in tight loops, and it is better to check before those loops.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's a better benchmark for this, frollapply(adaptive = TRUE), or foo[bar, by = baz]?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
frolladaptive willa stress more, unless grouping is by unique column, then probably similarly
If this sounds confusing, let's rename the argument. |

The new resizable vectors API wraps
SETLENGTH,TRUELENGTH, and theGROWABLE_BIT. Since all the vectors we want to resize are now growable, this frees us from:TRUELENGTHfor duplicated vectors (R resetsTRUELENGTHto 0 when duplicating a growable vector)On the other hand,
data.table::truelengthmust be emulated usingR_maxLength.@jangorecki, I had to introduce
copyAsGrowableso that adaptivefrollapplywould work. Would that be fine, or should we find a way to make that copy earlier?@TysonStanley, if merged, this will be needed for cherry-picking into the patch branch.
There will be one more batch of fixes in
src/dogroups.cthat depends on both this PR (for resizable API) and #6694 (for the other use ofTRUELENGTH).Fixes: #990
Closes: #6697
Many thanks to Benjamin Schwendinger for proposing the API and Luke Tierney for making it a reality.