
[R] auto splice data frames in record_batch() and table() #22146

@asfimport

Description


ARROW-3814 (https://github.com/apache/arrow/pull/3565/files#diff-95ad459e0128bfecf0d72ebd6d6ee8aaR94) changed the API of record_batch() and arrow::table() so that you can no longer pass a data.frame to these functions without massaging it yourself. That broke the sparklyr integration tests with an opaque "cannot infer type from data" error. It is also unfortunate that there is no longer a direct way to go from a data.frame to a record batch, which sounds like a common need.

To follow best practices (cf. the tibble package, for example), we should (1) add an as_record_batch() function, whose data.frame method is probably just as_record_batch.data.frame <- function(x) record_batch(!!!x); and (2) if a user supplies a single, unnamed data.frame as the argument to record_batch(), raise an error telling them to use as_record_batch(). We may later decide to call as_record_batch() automatically, but in case that is too magical and prevents some legitimate use case, let's hold off for now. It's easier to add magic than to remove it.
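The two proposed pieces can be sketched in base R so that the argument handling is visible without arrow installed. This is a minimal illustration under stated assumptions, not the arrow implementation: make_batch() and as_batch() are hypothetical stand-ins for record_batch() and as_record_batch() (renamed so they cannot mask the real functions), and do.call(f, as.list(x)) is used as the base-R equivalent of splicing the columns with !!!x.

```r
# Hypothetical stand-in for arrow::record_batch() (NOT the arrow API):
# it only collects its arguments as named columns.
make_batch <- function(...) {
  cols <- list(...)
  nms <- names(cols)
  # Point (2): reject a single, unnamed data.frame and point the user
  # at the conversion function instead of failing with an opaque error
  if (length(cols) == 1 && (is.null(nms) || !nzchar(nms[1])) &&
      is.data.frame(cols[[1]])) {
    stop("To convert a data.frame, use as_batch() instead", call. = FALSE)
  }
  structure(cols, class = "fake_batch")
}

# Point (1): an S3 generic plus a data.frame method (hypothetical names).
# do.call() on the column list passes each column as a named argument,
# which is what record_batch(!!!x) does via rlang splicing.
as_batch <- function(x, ...) UseMethod("as_batch")
as_batch.data.frame <- function(x, ...) do.call(make_batch, as.list(x))
```

With df <- data.frame(x = 1:3, y = c("a", "b", "c")), as_batch(df) yields a batch with columns x and y, while make_batch(df) stops with the pointer message. A named argument such as make_batch(df = df) is deliberately not rejected, which keeps the guard narrow, in line with the "hold off on magic" reasoning above.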

Once this function exists, the sparklyr tests can try to use as_record_batch() and, if that function doesn't exist, fall back to record_batch(). A missing as_record_batch() means an older released version of arrow is installed, in which record_batch(df) should still work.
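The fallback could look roughly like the sketch below. It assumes the caller passes the arrow namespace (in real use, asNamespace("arrow")); pick_converter() is a hypothetical helper, not part of sparklyr or arrow. Note that exists() on a namespace also finds non-exported objects, so a stricter check might be preferable in practice.

```r
# Hypothetical helper (not from sparklyr or arrow): prefer as_record_batch()
# and fall back to record_batch() when the installed arrow predates it.
pick_converter <- function(ns) {
  if (exists("as_record_batch", envir = ns, inherits = FALSE)) {
    get("as_record_batch", envir = ns, inherits = FALSE)
  } else {
    # Older arrow releases: record_batch(df) still accepts a data.frame
    get("record_batch", envir = ns, inherits = FALSE)
  }
}

# Intended use (requires arrow to be installed):
# to_batch <- pick_converter(asNamespace("arrow"))
# batch <- to_batch(df)
```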

cc @javierluraschi

Reporter: Neal Richardson / @nealrichardson
Assignee: Romain Francois / @romainfrancois


Note: This issue was originally created as ARROW-5718. Please see the migration documentation for further details.
