BUG: Passing string as axis argument leads to incorrect behavior #5094

@noloerino

Modin version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest released version of Modin.

  • I have confirmed this bug exists on the main branch of Modin. (In order to do this you can follow this guide.)

Reproducible Example

import modin.pandas
import pandas
import numpy as np

def example(pd):
    df = pd.concat([pd.DataFrame({0: [1], 1: [3]}), pd.DataFrame({0: [2], 1: [4]})])
    df = df.rolling(window=2, axis="index")
    result = df.aggregate(np.sum)
    print(result)

print("Pandas:")
example(pandas)
# Pandas:
#     0   1
# 0 NaN NaN
# 0 3.0 7.0
print("Modin:")
example(modin.pandas)
# Modin:
#     0   1
# 0 NaN NaN
# 0 NaN NaN

Issue Description

Many query compiler, dataframe, and partition manager functions assume that the axis argument will be one of 0, 1, None, or a member of the modin.core.dataframe.base.dataframe.utils.Axis enum. However, the pandas API (and some of our own test cases) allows passing axis="index" or axis="columns", which can break code that relies on that assumption.

The provided example is one such case: on this line in the partition manager, a comparison is made for axis == 0, but axis here is still the string "index", so the comparison evaluates to False. This can manifest itself in more insidious ways in other functions (for example, this line in the dataframe attempts to XOR axis even though it may be a string), though I'm not sure what the appropriate API call to hit that case would be.
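To illustrate the two failure modes described above without depending on Modin internals, here is a minimal sketch in plain Python. The variable names are illustrative only and do not correspond to actual Modin code.

```python
# The caller passed the string form and nothing has normalized it yet.
axis = "index"

# An equality check against the integer is simply False, so code that
# branches on `axis == 0` silently takes the column-wise path.
row_branch_taken = (axis == 0)

# Arithmetic on the string fails loudly instead of silently.
try:
    axis ^ 1
    xor_error = None
except TypeError as exc:
    xor_error = exc

print(row_branch_taken)
print(type(xor_error).__name__)
```

The first case is the more dangerous one because no exception is raised and the result is just wrong, as in the rolling example.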

Expected Behavior

The above code should match pandas. The axis argument should be normalized to an integer or Axis enum before it is used in comparisons in the query compiler/dataframe/partition manager.
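One way such normalization could look is sketched below. The helper name `normalize_axis` and the alias table are hypothetical, not an existing Modin or pandas API, and the sketch only covers the integer and string forms mentioned in this issue (it does not handle the Axis enum).

```python
from typing import Optional, Union

# Hypothetical alias table: maps every accepted spelling to an integer axis.
_AXIS_ALIASES = {0: 0, 1: 1, "index": 0, "columns": 1}


def normalize_axis(axis: Union[int, str, None]) -> Optional[int]:
    """Return 0 or 1 for a recognized axis value; pass None through."""
    if axis is None:
        return None
    try:
        return _AXIS_ALIASES[axis]
    except (KeyError, TypeError):
        # KeyError: unknown name; TypeError: unhashable input such as a list.
        raise ValueError(f"No axis named {axis!r}") from None


print(normalize_axis("index"))
print(normalize_axis("columns"))
```

Calling a helper like this once at the API boundary would let downstream checks such as axis == 0 behave the same for axis=0 and axis="index".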

Error Logs

N/A

Installed Versions

INSTALLED VERSIONS

commit : 7871c7b
python : 3.10.4.final.0
python-bits : 64
OS : Darwin
OS-release : 21.6.0
Version : Darwin Kernel Version 21.6.0: Wed Aug 10 14:28:23 PDT 2022; root:xnu-8020.141.5~2/RELEASE_ARM64_T6000
machine : x86_64
processor : i386
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8

Modin dependencies

modin : 0.15.0+171.g7871c7bc
ray : 1.13.0
dask : 2022.8.0
distributed : 2022.8.0
hdk : None

pandas dependencies

pandas : 1.5.0
numpy : 1.23.1
pytz : 2022.1
dateutil : 2.8.2
setuptools : 61.2.0
pip : 22.1.2
Cython : None
pytest : 7.1.2
hypothesis : None
sphinx : 5.1.1
blosc : None
feather : 0.4.1
xlsxwriter : None
lxml.etree : 4.9.1
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 3.1.2
IPython : 8.4.0
pandas_datareader: None
bs4 : 4.11.1
bottleneck : None
brotli : None
fastparquet : 0.8.2
fsspec : 2022.7.1
gcsfs : None
matplotlib : 3.5.3
numba : None
numexpr : 2.8.3
odfpy : None
openpyxl : 3.0.10
pandas_gbq : 0.17.8
pyarrow : 8.0.0
pyreadstat : None
pyxlsb : None
s3fs : 2022.7.1
scipy : 1.9.0
snappy : None
sqlalchemy : 1.4.40
tables : 3.7.0
tabulate : None
xarray : 2022.6.0
xlrd : 2.0.1
xlwt : None
zstandard : None
tzdata : None

Metadata

Assignees

No one assigned

    Labels

    P2 (Minor bugs or low-priority feature requests), bug 🦗 (Something isn't working), pandas concordance 🐼 (Functionality that does not match pandas)
