Description
Describe the enhancement requested
Hello Arrow team,
First, I love Arrow - thank you so much for making this great project.
I am manipulating multi-dimensional array data (time-series-like) produced by sensors as numpy arrays of type complex64. I would like to manipulate them in Arrow for recording (Feather and/or Parquet formats) and, in the future, for distributed computing (cuDF, Dask, Spark - most likely frameworks built on top of Arrow, but also CuPy/SciPy). This would also allow me to attach column names and other schema metadata. I think it could be superior to (and faster than) manipulating raw numpy arrays.
It would be great to just call `pa.array(np.array([1 + 2 * 1j, 3 + 4 * 1j], dtype=np.complex64), type=pa.complex64())`, but that type doesn't exist in Arrow. I haven't found a way to zero-copy a complex64 numpy array into a pyarrow array: my understanding is that only primitive types support zero-copy between Arrow and numpy, and my attempts with `pa.binary(8)` or struct types on top of numpy views have so far resulted in copies. I would also need to read the data back from Feather/Parquet and, if needed, convert it back to a numpy array, landing on np.complex64 again.
I think this has come up once or twice already: I found this thread: https://www.mail-archive.com/[email protected]/msg23352.html and this PR: #10452, and thought I would also +1 the request, just in case.
If first-class support isn't on the table, do you see an alternative way to get zero-copy behavior?
Thanks!
Component(s)
Python