Closed
Description
Describe the bug, including details regarding any error messages, version, and platform.
Hi,
We have a Parquet file that read fine with pyarrow 13.0.0, but with 14.0.0 I now get an error when reading it via pandas.read_parquet. The relevant part of the traceback is:
```
  File "/opt/venv/lib/python3.10/site-packages/pyarrow/parquet/core.py", line 3003, in read_table
    return dataset.read(columns=columns, use_threads=use_threads,
  File "/opt/venv/lib/python3.10/site-packages/pyarrow/parquet/core.py", line 2631, in read
    table = self._dataset.to_table(
  File "pyarrow/_dataset.pyx", line 556, in pyarrow._dataset.Dataset.to_table
  File "pyarrow/_dataset.pyx", line 3713, in pyarrow._dataset.Scanner.to_table
  File "pyarrow/error.pxi", line 154, in pyarrow.lib.pyarrow_internal_check_status
  File "pyarrow/error.pxi", line 91, in pyarrow.lib.check_status
pyarrow.lib.ArrowCapacityError: array cannot contain more than 2147483646 bytes, have 2148480400
```
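For reference, the numbers in that error can be checked with simple arithmetic. The sketch below assumes the limit comes from the 32-bit signed offsets used by Arrow's `string`/`binary` arrays (the `large_string`/`large_binary` types use 64-bit offsets instead), with the cap set one below the int32 maximum. It only reproduces the arithmetic, not the failing read.

```python
# Assumption: the limit reflects 32-bit signed offsets in string/binary arrays,
# capped at INT32_MAX - 1 bytes of character data per array.
INT32_MAX = 2**31 - 1      # largest value a 32-bit signed offset can hold
limit = INT32_MAX - 1      # the limit quoted in the error message
have = 2148480400          # bytes the reader tried to place in a single array

# How far over the limit the single array would have been.
overshoot = have - limit
print(f"limit={limit} have={have} overshoot={overshoot} bytes "
      f"({overshoot / 2**20:.2f} MiB)")
```

In other words, the data is only slightly (under 1 MiB) over what a single 32-bit-offset array can hold.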
Is this intended behavior? I skimmed through the changelog but did not find any mention of this change. Thanks.
Component(s)
Python