Reading parquet file behavior change from 13.0.0 to 14.0.0 #38577

@jhwang7628

Describe the bug, including details regarding any error messages, version, and platform.

Hi,

We have a Parquet file that read fine with pyarrow 13.0.0, but with 14.0.0 I now get an error when reading it via pandas.read_parquet. The relevant part of the traceback is:

  File "/opt/venv/lib/python3.10/site-packages/pyarrow/parquet/core.py", line 3003, in read_table
    return dataset.read(columns=columns, use_threads=use_threads,   
  File "/opt/venv/lib/python3.10/site-packages/pyarrow/parquet/core.py", line 2631, in read
    table = self._dataset.to_table(  
  File "pyarrow/_dataset.pyx", line 556, in pyarrow._dataset.Dataset.to_table
  File "pyarrow/_dataset.pyx", line 3713, in pyarrow._dataset.Scanner.to_table
  File "pyarrow/error.pxi", line 154, in pyarrow.lib.pyarrow_internal_check_status
  File "pyarrow/error.pxi", line 91, in pyarrow.lib.check_status
pyarrow.lib.ArrowCapacityError: array cannot contain more than 2147483646 bytes, have 2148480400
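
For context, the limit in the message is one below the largest signed 32-bit integer. That would fit with Arrow's regular string and binary arrays using 32-bit offsets, although the traceback itself does not say which column or type is involved, so treat that link as an assumption. A quick check of the two numbers from the error:

```python
# Numbers copied from the ArrowCapacityError message above.
reported_limit = 2_147_483_646
reported_size = 2_148_480_400

# Largest value a signed 32-bit integer can hold.
INT32_MAX = 2**31 - 1

# How far the limit sits below INT32_MAX, and how far the array overshoots it.
gap_to_int32_max = INT32_MAX - reported_limit
overshoot_bytes = reported_size - reported_limit

print(gap_to_int32_max)   # 1
print(overshoot_bytes)    # 996754
```

So the array being built is only about 1 MB over the cap.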

Is this intended behavior? I skimmed the changelog but did not find anything about it. Thanks.

Component(s)

Python
