Closed
Labels
Component: C++
Critical Fix (Bugfixes for security vulnerabilities, crashes, or invalid data.)
Type: bug
Description
Describe the bug, including details regarding any error messages, version, and platform.
Steps to reproduce:
- Create a ListArray in Rust
- Slice it (at index > 0)
- Send it to C++ via the C data interface
- Perform IPC serialization of the array (wrapped in a RecordBatch)
- The resulting message yields invalid data when deserialized, in either C++ or Rust, because its offsets buffer points past the end of its child data.
Here is a standalone python reproduction:
```python
import pyarrow as pa

# This ListArray represents [[3, 4, 5]]. It was sliced the way Rust slices
# ListArrays.
# The C++ slicing would have resulted in offsets_buffer = [0, 2, 5] and
# top-level offset = 1.
list_array = pa.ListArray.from_arrays(offsets=pa.array([2, 5]), values=[1, 2, 3, 4, 5])
list_array.validate()
assert list_array == pa.array([[3, 4, 5]])

table = pa.table({"col": list_array})
sink = pa.BufferOutputStream()
pa.ipc.new_stream(sink, table.schema).write_table(table)

reader = pa.ipc.RecordBatchStreamReader(sink.getvalue())
table_deserialized = pa.Table.from_batches(list(reader))

# This raises pyarrow.lib.ArrowInvalid: In chunk 0: Invalid: First or last list offset out of bounds
table_deserialized.column(0).validate()
```

The gist of the issue is that:
- Rust and C++ slice ListArray differently
- C++ bumps the top-level offset of the ArrayData
- Rust, however, does not maintain a top-level offset. Instead, it slices the offsets buffer
- Upon IPC serialization of a ListArray, C++ only looks at the top-level offset to decide whether to rebuild the offsets buffer. However, it properly rebuilds the child data
- This leads to a corrupt serialized message: the offsets are written unchanged and no longer match the rebuilt child data
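
The mismatch described in the list above can be modelled without pyarrow. The sketch below is a toy model in plain Python, not the actual C++ IPC writer: `serialize`, `is_valid`, and the `rebase_on_nonzero_first_offset` flag are hypothetical names introduced only for illustration, and the flag shows one possible remedy, not necessarily the fix mentioned in this report.

```python
def serialize(offsets, values, array_offset=0, rebase_on_nonzero_first_offset=False):
    """Toy model of writing a list column (assumption: not the real C++ writer).

    The child values are always trimmed to the range the offsets reference.
    The offsets are rebased to start at 0 only if the top-level offset is
    non-zero, unless the hypothetical flag also checks the first offset value.
    """
    window = offsets[array_offset:]
    first, last = window[0], window[-1]
    trimmed_values = values[first:last]  # child data is rebuilt in every case

    needs_rebase = array_offset != 0
    if rebase_on_nonzero_first_offset and first != 0:
        needs_rebase = True
    if needs_rebase:
        window = [o - first for o in window]
    return window, trimmed_values


def is_valid(offsets, values):
    """Offsets must be non-decreasing and stay within the child values."""
    in_bounds = offsets[0] >= 0 and offsets[-1] <= len(values)
    monotonic = all(a <= b for a, b in zip(offsets, offsets[1:]))
    return in_bounds and monotonic


values = [1, 2, 3, 4, 5]

# C++-style slice: full offsets buffer plus a top-level offset of 1.
cpp_style = serialize([0, 2, 5], values, array_offset=1)

# Rust-style slice: sliced offsets buffer, top-level offset of 0.
rust_style = serialize([2, 5], values, array_offset=0)

# Rust-style slice with the hypothetical extra check enabled.
rust_style_checked = serialize([2, 5], values, rebase_on_nonzero_first_offset=True)

print(cpp_style, is_valid(*cpp_style))
print(rust_style, is_valid(*rust_style))
print(rust_style_checked, is_valid(*rust_style_checked))
```

In this model the Rust-style input keeps offsets `[2, 5]` next to a three-element child array, which is the same out-of-bounds condition that `validate()` reports in the reproduction.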
I have a test+fix for this.
Component(s)
C++