GH-34775: [R] arrow_table: as.data.frame() sometimes returns a tbl and sometimes a data.frame #35173
… return data.frames
…mes and update tests to use it
8a3d84b to 652952c
paleolimbot
left a comment
This looks good! It seems like a cleaner solution than what we currently have. I like the idea of dropping metadata on the way in where possible: I seem to remember that we can skip some calls from C++ into R if there is no metadata to restore, which speeds things up a bit.
Benchmark runs are scheduled for baseline = 2ee0345 and contender = 205ceb9. 205ceb9 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.

['Python', 'R'] benchmarks have high level of regressions.
…tbl and sometimes a data.frame (apache#35173)

Features of this PR:

* Ensures that calling `as.data.frame()` on Arrow objects returns base R `data.frame` objects.
* Drops the `class` attribute metadata of input objects of `data.frame` class (i.e. objects that don't inherit from any classes other than `data.frame`). This sacrifices roundtrip class fidelity for `data.frame` objects (i.e. if we input a base R data.frame, convert it to an Arrow Table, and then convert it back to R, we get a tibble). However, we now have consistency in the type of returned objects, retain roundtrip fidelity for other (non-class) metadata, and guarantee that `as.data.frame()` returns a base R data.frame. Users who wish to input and return a `data.frame` object can call `as.data.frame()` on the returned object.
* Implements `dplyr::collect()` for StructArrays so that these objects can still be returned as tibbles if needed.
* Renames `expect_data_frame()` to `expect_equal_data_frame()` for clarity, and updates it to convert both the object and expected object to data.frames.
* Closes: apache#34775

Authored-by: Nic Crane <[email protected]>
Signed-off-by: Nic Crane <[email protected]>
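The behaviour described in the commit message can be sketched in R. This is a minimal illustration, not the code from the PR: `equal_as_data_frame()` is a hypothetical helper that only mimics the idea behind `expect_equal_data_frame()` (convert both sides to base R data.frames before comparing), and the Arrow part assumes that the arrow and dplyr packages are installed and that the installed arrow version includes this change.

```r
# Hypothetical helper (not the arrow test helper): compare two tabular
# objects after converting both to base R data.frames, so that a tibble
# and a data.frame with the same contents compare as equal.
equal_as_data_frame <- function(object, expected) {
  isTRUE(all.equal(as.data.frame(object), as.data.frame(expected)))
}

df <- data.frame(x = 1:3, y = c("a", "b", "c"))

# Simulate a tibble-classed object without depending on the tibble package.
tbl_like <- df
class(tbl_like) <- c("tbl_df", "tbl", "data.frame")

same_contents <- equal_as_data_frame(tbl_like, df)

# The Arrow roundtrip only runs when arrow and dplyr are available.
if (requireNamespace("arrow", quietly = TRUE) &&
    requireNamespace("dplyr", quietly = TRUE)) {
  tab <- arrow::arrow_table(df)

  # Per the PR description, this should be a base R data.frame.
  as_df <- as.data.frame(tab)

  # collect() is the route to take when a tibble is wanted instead.
  as_tbl <- dplyr::collect(tab)

  print(class(as_df))
  print(class(as_tbl))
}
```

Comparing after conversion means tests written this way check the data rather than whether a particular code path happened to return a tibble or a data.frame.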