Closed
Description
The docstring for InMemoryDataset indicates that you can create one from a RecordBatchReader:
arrow/python/pyarrow/_dataset.pyx
Lines 990 to 999 in 5a52916
```
cdef class InMemoryDataset(Dataset):
    """
    A Dataset wrapping in-memory data.

    Parameters
    ----------
    source : RecordBatch, Table, list, tuple
        The data for this dataset. Can be a RecordBatch, Table, list of
        RecordBatch/Table, iterable of RecordBatch, or a RecordBatchReader
        If an iterable is provided, the schema must also be provided.
```
However, if you try this you currently get an error saying you cannot:
```
>>> ds = ds.InMemoryDataset(rbr)
Traceback (most recent call last):
  File "<python-input-37>", line 1, in <module>
    ds = ds.InMemoryDataset(rbr)
  File "pyarrow/_dataset.pyx", line 1038, in pyarrow._dataset.InMemoryDataset.__init__
TypeError: Expected a table, batch, or list of tables/batches instead of the given type: RecordBatchReader
```

I don't think we allow simple construction of an InMemoryDataset from a RecordBatchReader because that would violate the assumption Datasets make about sources being re-readable (not one-shot like a RecordBatchReader). But I don't see why the InMemoryDataset constructor can't consume the RecordBatchReader and construct a Table from it.
Component(s)
Python