
Conversation


@philastrophist philastrophist commented Jun 20, 2025

This change adds support for generating numpy.ndarray and pandas.Series with any Python object as an element.
Effectively, hypothesis can now generate np.array([MyObject()], dtype=object).
The first use case for this is with Pandas and Pandera, where it is possible, and sometimes required, to have columns which themselves contain structured datatypes.
Pandera seems to be waiting for this change to support PythonDict, PythonTypedDict, PythonNamedTuple, etc.

  • Accept dtype.kind = 'O' in from_dtype
  • Add the base case of any type
  • Use .iat instead of .iloc to set values in pandas strategies (this allows setting of dictionaries as elements etc)
  • Construct Series rather than setting elements in pandas strategies (this allows dictionaries as elements etc)
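
A minimal usage sketch of what this enables (MyObject stands in for any user-defined class, as in the description above; the alias names are illustrative, not from the PR):

import hypothesis.extra.numpy as npst
import hypothesis.extra.pandas as pdst
import hypothesis.strategies as st


class MyObject:
    pass


# Object-dtype containers whose elements are arbitrary Python objects.
arrays_of_objects = npst.arrays("O", shape=3, elements=st.builds(MyObject))
series_of_objects = pdst.series(dtype="O", elements=st.builds(MyObject))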

@Zac-HD Zac-HD requested a review from Liam-DeVoe June 26, 2025 02:08
Shaun Read added 3 commits July 2, 2025 14:46
@philastrophist (Author)

Some form of timeout error in CI


Zac-HD commented Jul 3, 2025

@tybug FAILED hypothesis-python/tests/watchdog/test_database.py::test_database_listener_directory_move - Exception: timing out after waiting 1s for condition lambda: set(events) on Windows CI

(I've hit retry, should be OK soon 🤞)

@Zac-HD Zac-HD (Member) left a comment

Thanks so much for your PR, Shaun!

This is looking good, and I'm excited to ship it soon! Small comments below about testing and code-comments; and I can always push something to the changelog when I work out what I wanted for that.

@philastrophist (Author)

Some interesting error is occurring outside of the changes in this PR...

@philastrophist philastrophist requested a review from Zac-HD July 3, 2025 09:16
@Liam-DeVoe (Member)

sorry for dropping the requested review here, I'd want to be confident I understand the pandas interactions first and I don't have that requisite knowledge at the moment 😅

That failure might be a real crosshair failure, but I'm not sure it's worth pursuing with such a non-reproducer.

@philastrophist (Author)

> sorry for dropping the requested review here, I'd want to be confident I understand the pandas interactions first and I don't have that requisite knowledge at the moment 😅

As far as I understand, at and iat are more basic indexers than loc and iloc, in that they can only access a single entry rather than possibly a subset of entries.
But ignoring vector access here, loc will transform dicts into a Series and then set them. There's an interesting note in their source here:

# TODO(EA): ExtensionBlock.setitem this causes issues with
# setting for extensionarrays that store dicts. Need to decide
# if it's worth supporting that.

Seems to be vaguely related.

But the important points are:

  1. loc applies transformations to the given values, stopping us from inserting dicts into a Series using iloc/loc. This may or may not be a bug. Either way, editing this logic within pandas is likely to be fraught, and it's difficult to tell what other transforms might be applied.
  2. at is the intended way to set single values within a DataFrame/Series according to the docs. It's technically faster, but more importantly it doesn't perform any checks or transformations on the value, and the logic is a lot simpler. The reason ruff warns against it is that "iloc is more idiomatic and versatile". We know that, in our use case, we will only ever be setting a Series element by integer index, which is exactly what iat is for.

From the docstrings:

DataFrame.iat : Access a single value for a row/column label pair by integer position(s).
DataFrame.iloc : Access a group of rows and columns by integer position(s).
Similar to ``iloc``, in that both provide integer-based lookups. Use
    ``iat`` if you only need to get or set a single value in a DataFrame
    or Series.

Demonstration:

import pandas as pd

s = pd.Series([1, 2, 3], dtype=object)  # object dtype so we don't get mismatch warnings

s.iloc[0] = {'a': 1}
print('series with iloc:\n', s)
print('entry type with iloc:', type(s.iloc[0]))

s.iat[0] = {'a': 1}
print('with iat:\n', s)
print('entry type with iat:', type(s.iat[0]))

prints out:

series with iloc:
 0    a    1
dtype: int64
1                      2
2                      3
dtype: object
entry type with iloc: <class 'pandas.core.series.Series'>
with iat:
 0    {'a': 1}
1           2
2           3
dtype: object
entry type with iat: <class 'dict'>

@philastrophist (Author)

When do you think we could merge this?

@Liam-DeVoe (Member)

I'll take a look today, thanks for your patience (and contribution!)

@Liam-DeVoe Liam-DeVoe (Member) left a comment

Looking good! I updated the changelog to be a bit more concise, and would like to improve our testing:

  • I'd like to see a test combining dtype="O" with a strategy that generates a custom (data)class, for both numpy and pandas
    • A test for combining custom objects and normal types in the same dtype="O" array/series would be nice as well

@philastrophist (Author)

I'm back again!
Could you clarify "custom (data)class, for both numpy and pandas"?

@Liam-DeVoe (Member)

> Could you clarify "custom (data)class, for both numpy and pandas"?

As in: I'd like to see a test which defines a class or dataclass A with a bunch of fields of different types, and passes elements=st.builds(A) to the pandas and numpy strategies which have newly-added support for dtype="O". Then check that you can pull out elements of type A from the pandas series or numpy array. I want to make sure that supplying complicated classes to dtype="O" is well supported!
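
A hedged sketch of that kind of test (the Record class and its fields are illustrative, not from the PR; st.builds infers field strategies from the type annotations):

from dataclasses import dataclass

import hypothesis.extra.numpy as npst
import hypothesis.extra.pandas as pdst
import hypothesis.strategies as st
from hypothesis import given


@dataclass
class Record:
    name: str
    count: int
    flag: bool


@given(npst.arrays("O", shape=5, elements=st.builds(Record)))
def test_numpy_object_array_of_dataclasses(arr):
    # Every element pulled back out of the array should still be a Record.
    assert all(isinstance(x, Record) for x in arr)


@given(pdst.series(dtype="O", elements=st.builds(Record)))
def test_pandas_object_series_of_dataclasses(s):
    assert all(isinstance(x, Record) for x in s)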

@philastrophist (Author)

Changes:

  • Sped up the hot path in numpy's set_element by skipping for non-object cases
  • Dropped using iat with pandas and instead construct the Series from a list (type errors are still raised by pandas), to avoid pandas coercing values we don't want it to coerce (much cleaner anyway; see the sketch below)
  • Made the tests a bit more sophisticated (checking exact parity between the elements that go into the pandas/numpy strategy and their values when accessed back out of those numpy/pandas containers)
  • Removed the overflow-check pre-filter, since overflow only happens when pandas errors and tries to display the erroring row using string.ljust
  • Removed assert_safe_equals, since we can just assert list equality
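
A minimal illustration of the construct-from-a-list approach (plain pandas, not the actual strategy code):

import pandas as pd

values = [{"a": 1}, {"b": 2}, {"c": 3}]

# Building the Series directly from a Python list sidesteps the item-assignment
# coercion shown earlier, so dict elements are stored as dicts.
s = pd.Series(values, dtype=object)
assert all(isinstance(x, dict) for x in s)

# pandas still raises if the values don't fit a non-object dtype, e.g.
# pd.Series(values, dtype="int64")  # raises, since the dicts can't be cast to int64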

@philastrophist philastrophist requested a review from Zac-HD August 27, 2025 18:04

Zac-HD commented Aug 27, 2025

(looks like you merged master mid-release-process, and that's where the conflicts are coming from)

@philastrophist (Author)

> (looks like you merged master mid-release-process, and that's where the conflicts are coming from)

Ok finally figured that out

@philastrophist (Author)

Can I get a review?

@philastrophist (Author)

> Can I get a review?

Bump

@Liam-DeVoe Liam-DeVoe (Member) left a comment

I've made several direct changes here, since by the time I was deep enough in the review to give actionable feedback it was less effort to do so myself. I have one comment about the pandas changes, and then I think this is close to being ready.

@Zac-HD Zac-HD (Member) left a comment

@Liam-DeVoe if you've got some time coming up, I think getting this in should be higher priority than the dropping-py39 cleanups - it's been slow because it's a big complicated subtle change but it'd be great to ship it!


Liam-DeVoe commented Oct 17, 2025

OK, I've spent a bit of time understanding the context around this pull.

  • we actually already support dtype="object". What we don't support is automatic inference of a strategy for dtype="object". That's the core of what this PR adds. nps.arrays("O", shape=(1,), elements=st.just(object())) works today (see the sketch at the end of this comment)
  • I believe the equality code in set_element is just for nice error messages, when numpy unexpectedly converts an element in an array due to dtypes. This can't happen in dtype="object", so we don't need to filter out un-equatable objects.
  • I believe the object-conversion behavior that the pandas changes here work around is a bug, which I've filed: BUG: assignment to Series.iloc with dtype="object" converts dictionary to Series pandas-dev/pandas#62723. I'm not a regular pandas user, so I may be mistaken. But not being able to store a dict in an object-dtype pandas Series was pretty surprising to me.

(this means the pandas-coercion behavior is possibly an unrelated latent bug uncovered by the tests in this pull? unsure yet.)
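
A minimal sketch of the distinction in the first bullet (the inference path is what this PR adds):

import hypothesis.extra.numpy as nps
import hypothesis.strategies as st

# Already worked before this PR: the caller supplies the elements strategy.
explicit = nps.arrays("O", shape=(1,), elements=st.just(object()))

# New with this PR: no elements strategy given, so from_dtype has to infer one
# for dtype.kind == "O" (falling back to st.from_type(object)).
inferred = nps.arrays("O", shape=(1,))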

@Liam-DeVoe Liam-DeVoe changed the title from 'Support for numpy.ndarray and pandas.Series with any python object as entry' to 'Automatically infer a strategy for dtype="object"' Oct 17, 2025
Comment on lines +216 to 218
elif dtype.kind == "O":
    return st.from_type(object)
else:

it's not actually clear to me whether we want st.from_type(object) or from_type(type).flatmap(st.from_type) here. Should we make the former simply register to the latter?
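
For context, a rough sketch of what the two options construct (which one from_dtype should use is the open question here):

import hypothesis.strategies as st

# Option 1: ask for instances of `object` directly.
option_1 = st.from_type(object)

# Option 2: draw a type first, then an instance of that type -- the pattern
# behind the "everything_except" recipe in the Hypothesis docs.
option_2 = st.from_type(type).flatmap(st.from_type)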

@Liam-DeVoe Liam-DeVoe force-pushed the allow_objects_in_numpy_arrays_and_pandas_series branch from 566aa1f to 1ec77f5 on October 18, 2025 16:14