Skip to content

One test fails with pandas 3.0 dev version #3291

@seisman

Description

@seisman

In the GMT Dev tests workflow, one test fails (see https://github.com/GenericMappingTools/pygmt/actions/runs/9540235948/job/26291686710).

__________________________ test_compute_bins_outfile ___________________________
[gw1] linux -- Python 3.12.3 /home/runner/micromamba/envs/pygmt/bin/python

grid = <xarray.DataArray 'z' (lat: 14, lon: 8)> Size: 448B
array([[347.5, 344.5, 386. , 640.5, 617. , 579. , 646.5, 671. ],
 ...3.5 -22.5 -21.5 -20.5 ... -12.5 -11.5 -10.5
Attributes:
    long_name:     elevation (m)
    actual_range:  [190. 981.]
expected_df =    start   stop  bin_id
0  345.5  519.5       0
1  519.5  726.5       1
region = [-52, -48, -22, -18]

    def test_compute_bins_outfile(grid, expected_df, region):
        """
        Test grdhisteq.compute_bins with ``outfile``.
        """
        with GMTTempFile(suffix=".txt") as tmpfile:
            with pytest.warns(RuntimeWarning) as record:
                result = grdhisteq.compute_bins(
                    grid=grid,
                    divisions=2,
                    region=region,
                    outfile=tmpfile.name,
                )
                assert len(record) == 1  # check that only one warning was raised
            assert result is None  # return value is None
            assert Path(tmpfile.name).stat().st_size > 0
            temp_df = pd.read_csv(
                filepath_or_buffer=tmpfile.name,
                sep="\t",
                header=None,
                names=["start", "stop", "bin_id"],
                dtype={"start": np.float32, "stop": np.float32, "bin_id": np.uint32},
                index_col="bin_id",
            )
>           pd.testing.assert_frame_equal(
                left=temp_df, right=expected_df.set_index("bin_id")
            )

../pygmt/tests/test_grdhisteq.py:130: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

left = RangeIndex(start=0, stop=2, step=1, name='bin_id')
right = Index([0, 1], dtype='uint32', name='bin_id'), obj = 'DataFrame.index'

    def _check_types(left, right, obj: str = "Index") -> None:
        if not exact:
            return
    
        assert_class_equal(left, right, exact=exact, obj=obj)
        assert_attr_equal("inferred_type", left, right, obj=obj)
    
        # Skip exact dtype checking when `check_categorical` is False
        if isinstance(left.dtype, CategoricalDtype) and isinstance(
            right.dtype, CategoricalDtype
        ):
            if check_categorical:
                assert_attr_equal("dtype", left, right, obj=obj)
                assert_index_equal(left.categories, right.categories, exact=exact)
            return
    
>       assert_attr_equal("dtype", left, right, obj=obj)
E       AssertionError: DataFrame.index are different
E       
E       Attribute "dtype" are different
E       [left]:  int64
E       [right]: uint32

The fail is likely due to changes/bugs in pandas dev version (3.x). To reproduce the issue:

>>> import numpy as np
>>> import pandas as pd
>>> df = pd.read_csv("text.dat", sep=r"\s+",
...                 header=None,
...                 names=["start", "stop", "bin_id"],
...                 dtype={"start": np.float32, "stop": np.float32, "bin_id": np.uint32},
...                 index_col="bin_id")
>>> df.index.dtype

pandas 2.x returns dtype('uint32') but pandas 3.x returns dtype('int64').

The test data is:

345.5  519.5       0
519.5  726.5       1

Need to read the pandas documentation to understand if it's a desired feature or an upstream bug.

Metadata

Metadata

Assignees

No one assigned

    Labels

    upstreamBug or missing feature of upstream core GMT

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions