-
Notifications
You must be signed in to change notification settings - Fork 666
Description
Modin version checks
-
I have checked that this issue has not already been reported.
-
I have confirmed this bug exists on the latest released version of Modin.
-
I have confirmed this bug exists on the main branch of Modin. (In order to do this you can follow this guide.)
Reproducible Example
import modin.pandas
import pandas
import numpy as np
def example(pd):
df = pd.concat([pd.DataFrame({0: [1], 1: [3]}), pd.DataFrame({0: [2], 1: [4]})])
df = df.rolling(window=2, axis="index")
result = df.aggregate(np.sum)
print(result)
print("Pandas:")
example(pandas)
# Pandas:
# 0 1
# 0 NaN NaN
# 0 3.0 7.0
print("Modin:")
example(modin.pandas)
# Modin:
# 0 1
# 0 NaN NaN
# 0 NaN NaN
Issue Description
Many query compiler, dataframe, and partition manager functions assume that the axis
argument of a function will either be 0, 1, None
, or modin.core.dataframe.base.dataframe.utils.Axis
. However, the pandas API (and some of our own test cases) allow passing axis="index"
or axis="columns"
, which has the potential to break some of our code.
The provided example is one such case: on this line in the partition manager, a comparison is made for axis == 0
, but axis
here is still the string "index"
. This can manifest itself in more insidious ways in other functions (for example, this line in the dataframe attempts to XOR axis
even though it may be a string), though I'm not sure what the appropriate API call to hit that case would be.
Expected Behavior
The above code should match pandas. The axis
argument should be normalized to an integer or Axis
enum before it is used in comparisons in the query compiler/dataframe/partition manager.
Error Logs
N/A
Installed Versions
INSTALLED VERSIONS
commit : 7871c7b
python : 3.10.4.final.0
python-bits : 64
OS : Darwin
OS-release : 21.6.0
Version : Darwin Kernel Version 21.6.0: Wed Aug 10 14:28:23 PDT 2022; root:xnu-8020.141.5~2/RELEASE_ARM64_T6000
machine : x86_64
processor : i386
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8
Modin dependencies
modin : 0.15.0+171.g7871c7bc
ray : 1.13.0
dask : 2022.8.0
distributed : 2022.8.0
hdk : None
pandas dependencies
pandas : 1.5.0
numpy : 1.23.1
pytz : 2022.1
dateutil : 2.8.2
setuptools : 61.2.0
pip : 22.1.2
Cython : None
pytest : 7.1.2
hypothesis : None
sphinx : 5.1.1
blosc : None
feather : 0.4.1
xlsxwriter : None
lxml.etree : 4.9.1
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 3.1.2
IPython : 8.4.0
pandas_datareader: None
bs4 : 4.11.1
bottleneck : None
brotli : None
fastparquet : 0.8.2
fsspec : 2022.7.1
gcsfs : None
matplotlib : 3.5.3
numba : None
numexpr : 2.8.3
odfpy : None
openpyxl : 3.0.10
pandas_gbq : 0.17.8
pyarrow : 8.0.0
pyreadstat : None
pyxlsb : None
s3fs : 2022.7.1
scipy : 1.9.0
snappy : None
sqlalchemy : 1.4.40
tables : 3.7.0
tabulate : None
xarray : 2022.6.0
xlrd : 2.0.1
xlwt : None
zstandard : None
tzdata : None