Description
System information
- OS Platform and Distribution (e.g., Linux Ubuntu 16.04): macOS Monterey 12.5.1
- Modin version (modin.__version__): 5ff947b9 (latest master on my machine)
- Python version: 3.10
- Code we can use to reproduce:
import modin.pandas as pd
import numpy as np
df = pd.DataFrame({"a": [np.nan] * 100, "b": [np.nan] * 100})
df.first_valid_index()
Exception: IndexError: only integers, slices (:), ellipsis (...), numpy.newaxis (None) and integer or boolean arrays are valid indices, coming from the index operation on this line in the query compiler.
Describe the problem
Per the pandas docs, first_valid_index should have the following behavior:
If all elements are non-NA/null, returns None. Also returns None for empty Series/DataFrame.
We currently do not check for this case, which leads to an exception when None is returned (I'm not sure why the pandas error message for the IndexError lists None as a valid index).
This error currently does not affect empty dataframes, as those will default to pandas for this method.
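To make the missing check concrete, here is a minimal pure-Python sketch of the intended behavior. It is not Modin's query compiler code: the helper names first_valid_position and first_valid_index_sketch are hypothetical, and columns are modeled as plain lists, with None and NaN treated as missing. The point is only the guard that returns None before the index is subscripted.

```python
import math


def _is_missing(value):
    # Treat None and float NaN as missing values (an assumption for this sketch).
    return value is None or (isinstance(value, float) and math.isnan(value))


def first_valid_position(column):
    """Return the position of the first non-missing value, or None if there is none."""
    for pos, value in enumerate(column):
        if not _is_missing(value):
            return pos
    return None


def first_valid_index_sketch(index, columns):
    """Return the label of the first row with any non-missing value, else None."""
    positions = [first_valid_position(col) for col in columns]
    positions = [p for p in positions if p is not None]
    if not positions:
        # Guard: every column is entirely missing (or there are no columns),
        # so return None instead of indexing with None.
        return None
    return index[min(positions)]


nan = float("nan")
all_missing = first_valid_index_sketch(list(range(100)), [[nan] * 100, [nan] * 100])
some_valid = first_valid_index_sketch(["x", "y", "z"], [[nan, 1.0, nan], [nan, nan, 2.0]])
```

With the guard in place, all_missing is None rather than raising, and some_valid is the label of the earliest row that holds a value in any column.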
This bug affects last_valid_index and possibly other functions as well; I'll investigate further. I plan to fix this (and other similar issues) along with #4909, since the changes to Map will affect Reduce and TreeReduce operators as well.