-
Notifications
You must be signed in to change notification settings - Fork 627
Automatically infer a strategy for dtype="object"
#4444
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Automatically infer a strategy for dtype="object"
#4444
Conversation
- Use `.iat` instead of `.iloc` to set values in pandas strategies
…rage since we now actually cover all types and this line is now not covered
|
Some form of timeout error in CI |
|
@tybug (I've hit retry, should be OK soon 🤞) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks so much for your PR, Shaun!
This is looking good, and I'm excited to ship it soon! Small comments below about testing and code-comments; and I can always push something to the changelog when I work out what I wanted for that.
|
Some interesting error is occurring outside of the changes in this PR... |
|
sorry for dropping the requested review here, I'd want to be confident I understand the pandas interactions first and I don't have that requisite knowledge at the moment 😅 That failure might be a real crosshair failure, but I'm not sure it's worth pursuing with such a non-reproducer. |
As far as I understand Seems to be vaguely related. But the important points are:
From the docstrings: Demonstration: prints out: |
|
When do you think we could merge this? |
|
I'll take a look today, thanks for your patience (and contribution!) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking good! I updated the changelog to be a bit more concise, and would like to improve our testing:
- I'd like to see a test combining
dtype="O"with a strategy that generates a custom (data)class, for both numpy and pandas- A test for combining custom objects and normal types in the same dtype="O" array/series would be nice as well
|
I'm back again! |
As in: I'd like to see a test which defines a class or dataclass |
|
Changes:
|
|
(looks like you merged master mid-release-process, and that's where the conflicts are coming from) |
…numpy_arrays_and_pandas_series
Ok finally figured that out |
|
Can I get a review? |
Bump |
…hub.com:philastrophist/hypothesis into allow_objects_in_numpy_arrays_and_pandas_series
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've made several direct changes here, since by the time I was deep enough in the review to give actionable feedback it was less effort to o so myself. I have one comment about the pandas changes, and then I think this is close to being ready.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Liam-DeVoe if you've got some time coming up, I think getting this in should be higher priority than the dropping-py39 cleanups - it's been slow because it's a big complicated subtle change but it'd be great to ship it!
|
OK, I've spent a bit understanding the context around this pull.
(this means the pandas-coercion behavior is possibly an unrelated latent bug uncovered by the tests in this pull? unsure yet.) |
dtype="object"
| elif dtype.kind == "O": | ||
| return st.from_type(object) | ||
| else: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it's not actually clear to me whether we want st.from_type(object) or from_type(type).flatmap(st.from_type) here. Should we make the former simply register to the latter?
566aa1f to
1ec77f5
Compare
This change would add support for generating numpy.ndarray and pandas.Series with any python object as an element.
Effectively, hypothesis can now generate
np.array([MyObject()], dtype=object).The first use-case for this is with Pandas and Pandera where it is possible and sometimes required to have columns which themselves contain structured datatypes.
Pandera seems to be waiting for this change to support PythonDict, PythonTypedDict, PythonNamedTuple etc.
from_dtypeUse.iatinstead of.ilocto set values in pandas strategies (this allows setting of dictionaries as elements etc)