Closed
Description
Currently, this code fails:
dataset <- open_dataset("some/folder/with/parquet/files")
write_csv_arrow(dataset, sink = "dataset.csv")
with this error message:
Error: x must be an object of class 'data.frame', 'RecordBatch', or 'Table', not 'FileSystemDataset'.
In ARROW-14741, support was added for reading from a RecordBatchReader, so we should now be able to extend write_csv_arrow() to support this behaviour.
Note: we would need to make sure that whichever write_csv(record_batch_reader) function is used can take a filesystem= argument.
Reporter: Nicola Crane / @thisisnic
Assignee: Nicola Crane / @thisisnic
Related issues:
- [C++] segfault when writing CSV from RecordBatchReader (is blocked by)
- write_parquet() / write_csv_arrow() cannot stream a dataset object back to S3 (is duplicated by)
- [R] Refactor do_exec_plan to return a RecordBatchReader (relates to)
- [C++] Allow CSV Writer to take a RecordBatchReader as input (depends upon)
Note: This issue was originally created as ARROW-15040. Please see the migration documentation for further details.