BUG: first_valid_index errors on dataframe with only None/NaN values #4912

@noloerino

System information

  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): macOS Monterey 12.5.1
  • Modin version (modin.__version__): 5ff947b9 (latest master on my machine)
  • Python version: 3.10
  • Code we can use to reproduce:
import modin.pandas as pd
import numpy as np
df = pd.DataFrame({"a": [np.nan] * 100, "b": [np.nan] * 100})
df.first_valid_index()

Exception: IndexError: only integers, slices (:), ellipsis (...), numpy.newaxis (None) and integer or boolean arrays are valid indices, coming from the index operation on this line in the query compiler.

Describe the problem

Per pandas docs, first_valid_index should have the following behavior:

If all elements are non-NA/null, returns None. Also returns None for empty Series/DataFrame.
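Note that the quoted text says "non-NA/null", but the intended meaning is that None is returned when all elements are NA/null. We currently do not check for this case, which leads to the exception above when None is returned (I'm not sure why the pandas error message for the IndexError lists None as a valid index).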
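To illustrate the kind of check that is missing, here is a minimal sketch. It is not Modin's query compiler code: `first_valid_label`, `labels`, and `valid_mask` are hypothetical names, and the sketch assumes the lookup boils down to finding the first position with a valid value and then indexing the row labels with it.

```python
import numpy as np


def first_valid_label(labels, valid_mask):
    """Return the first label whose row holds a valid value, or None.

    Hypothetical helper: `labels` are the row labels and `valid_mask` is a
    boolean array marking rows with at least one non-NA value.
    """
    positions = np.flatnonzero(valid_mask)
    # Guard: if no row is valid there is no position to index with,
    # so return None instead of attempting the index operation.
    if positions.size == 0:
        return None
    return labels[positions[0]]


labels = np.arange(100)
all_na = np.zeros(100, dtype=bool)
print(first_valid_label(labels, all_na))  # None

one_valid = all_na.copy()
one_valid[3] = True
print(first_valid_label(labels, one_valid))  # 3
```

The point of the sketch is only the early return: the None case is handled before anything is used as an index.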

This error currently does not affect empty dataframes, as those will default to pandas for this method.

This bug affects last_valid_index and possibly other functions as well; I'll investigate further. I plan to fix this (and other similar issues) along with #4909, since the changes to Map will affect Reduce and TreeReduce operators as well.
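For reference, the behavior Modin should match can be checked against plain pandas. The snippet below is a sketch of such a comparison baseline, not a test from the Modin suite; the variable names are illustrative.

```python
import numpy as np
import pandas

# Same shape of data as the reproducer, built with plain pandas.
all_nan = pandas.DataFrame({"a": [np.nan] * 100, "b": [np.nan] * 100})
empty = pandas.DataFrame()
partly = pandas.DataFrame({"a": [np.nan, 1.0, np.nan], "b": [np.nan, np.nan, 2.0]})

# All-NaN and empty frames have no valid index in either direction.
print(all_nan.first_valid_index(), all_nan.last_valid_index())  # None None
print(empty.first_valid_index(), empty.last_valid_index())      # None None

# A row counts as valid as soon as any column holds a non-NA value.
print(partly.first_valid_index(), partly.last_valid_index())    # 1 2
```

Running the same calls through `modin.pandas` should give identical results once the fix is in place.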

Labels

  • P2: Minor bugs or low-priority feature requests
  • bug 🦗: Something isn't working
  • pandas concordance 🐼: Functionality that does not match pandas
