-
Notifications
You must be signed in to change notification settings - Fork 1k
Growable ALTREP vectors #6697
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: truehash
Are you sure you want to change the base?
Growable ALTREP vectors #6697
Conversation
The resulting data.tables now have GROWABLE_BIT set, therefore: - the finalizer is not needed on R >= 3.4 - duplicates of data.tables (which are not over-allocated) now have TRUELENGTH of 0 instead of whatever it was before, which is detected earlier in selfrefok() As a result, assign.c only uses TRUELENGTH on R < 3.4.
Since dogroups() relies on being able to shrink vectors it hasn't allocated, introduce make_growable() and is_growable() to adapt. Since dogroups() relies on SD and SDall having shared columns, use the setgrowable() wrapper on the R side at the time when SD and SDall are being created. (In the ALTREP case, setgrowable() will re-create the columns.)
This helps avoid false positive warnings about unreachable places in the code.
Currently broken: - 1 errors out of 11637 - won't work with expression vectors at all
|
We did not support expressions prior to #5596 and still not fully it support. My guess is that nobody really uses this anyway |
| // out-of-memory. In that case the user will receive hard halt and know to rerun. | ||
| if (length(newcolnames)) { | ||
| oldtncol = TRUELENGTH(dt); // TO DO: oldtncol can be just called tl now, as we won't realloc here any more. | ||
| oldtncol = is_growable(dt) ? growable_max_size(dt) : 0; // TO DO: oldtncol can be just called tl now, as we won't realloc here any more. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| oldtncol = is_growable(dt) ? growable_max_size(dt) : 0; // TO DO: oldtncol can be just called tl now, as we won't realloc here any more. | |
| oldtncol = truelength(x); // TO DO: oldtncol can be just called tl now, as we won't realloc here any more. |
| // Can growVector at this point easily enough, but it shouldn't happen in first place so leave it as | ||
| // strong error message for now. | ||
| else if (TRUELENGTH(names) != oldtncol) | ||
| else if (growable_max_size(names) != oldtncol) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe also use truelength(names) here?
| # segfault when selfref is not ok before set #6410 | ||
| df = data.frame(a=1:3) | ||
| setDT(df) | ||
| attr(df, "att") = 1 | ||
| test(2291.1, set(df, NULL, "new", "new"), error="attributes .* have been reassigned") | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So this will only bite us on R < 4.3.0 and will be detected earlier on the ALTREP version?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
By setting GROWABLE_BIT on R ≥ 3.4.0 for data.table lists, we prevent the situation where a shallow duplicate of a data.table has a misleading TRUELENGTH. When duplicating a growable list, R will notice the bit and reset TRUELENGTH to 0. This causes set to fail an earlier check for non-zero TRUELENGTH instead of failing the .internal.selfref check, so the error is different ("This data.table has either been loaded from disk...").
Perhaps the test should account for that, especially if we're moving from R-3.3 and thus will no longer hit the old behaviour (where "attributes .* have been reassigned" is still reachable).
Co-Authored-By: Benjamin Schwendinger <[email protected]>
|
Thank you very much for sparing the time to review! The biggest problem here are the crashes from performing ALTREP operations on objects that the current code wrongly assumes to be resizeable (exposed by the |
| (void)call; \ | ||
| indx = PROTECT(coerceVector(indx, REALSXP)); \ | ||
| double * ii = REAL(indx); \ | ||
| R_xlen_t rlen = XLENGTH(indx), mylen = XLENGTH(x); \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe I'm misunderstanding extract_subset but shouldn't this only be valid for the actual size and not the full overallocated length?
Hence, mylen = altall_Length(x) ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
XLENGTH(x) goes through the ALTREP dispatch and lands in altall_Length, but we could definitely save a function call and make the code clearer.
5927188 to
c16fd97
Compare
Ongoing work towards #6180. Depends on #6694. Detailed motivation in a pending blog post.
Good news:
Bad news:
Unfortunately, since there is no
altexpressionALTREP class, best I was able to supportEXPRSXPcolumns was by creatingVECSXPlists for their contents.How long do we have and does anyone have a plan B?
collapse