ARROW-7663: [Python] Raise better error message when passing mixed-type (int/string) Pandas dataframe to pyarrow Table #8044
We lose the more specific traceback and ZeroDivisionError message, in favor of
In [11]: class MyBrokenInt:
    ...:     def __init__(self):
    ...:         1/0
In [12]: pa.array([MyBrokenInt()], type=pa.int64())
---------------------------------------------------------------------------
ArrowInvalid Traceback (most recent call last)
<ipython-input-12-1cf156b165b3> in <module>
----> 1 pa.array([MyBrokenInt()], type=pa.int64())
~/git_repo/arrow/python/pyarrow/array.pxi in pyarrow.lib.array()
269 else:
270 # ConvertPySequence does strict conversion if type is explicitly passed
--> 271 return _sequence_to_array(obj, mask, size, type, pool, c_from_pandas)
272
273
~/git_repo/arrow/python/pyarrow/array.pxi in pyarrow.lib._sequence_to_array()
38
39 with nogil:
---> 40 check_status(ConvertPySequence(sequence, mask, options, &out))
41
42 if out.get().num_chunks() == 1:
~/git_repo/arrow/python/pyarrow/error.pxi in pyarrow.lib.check_status()
82
83 if status.IsInvalid():
---> 84 raise ArrowInvalid(message)
85 elif status.IsIOError():
86 # Note: OSError constructor is
ArrowInvalid: Could not convert <__main__.MyBrokenInt object at 0x7fc331394290> with type MyBrokenInt: tried to convert to int
but this is the same message as what we get on master for
In [11]: class MyBrokenInt:
    ...:     def __init__(self):
    ...:         1/1
so maybe it's ok?
I think that is fine, personally
9a73767 to 21166d3
jorisvandenbossche
left a comment
Thanks for working on this!
python/pyarrow/tests/test_compute.py
Outdated
What did the error message say before, and what does it show now?
On master it's
TypeError: an integer is required (got type pyarrow.lib.Int8Array)
versus on this branch
ArrowInvalid: Could not convert [
  5
] with type pyarrow.lib.Int8Array: tried to convert to int
Hmm, for this case I find the original error message clearer ..
That's the consequence of the scalar(..) conversion using the array conversion under the hood, I suppose?
But OK, I suppose this is fine (it's maybe mainly the multiline repr of the array in the middle of the sentence that makes it more confusing)
I think that is fine, personally
What are the cases that this couldn't be converted, but that obj is an integer? When the integer is too big to fit in a C int?
Yes, and also when converting a negative integer to a uint:
pa.scalar(-1, type='uint8')
No other tests are touched if I recompile without this check
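Both cases mentioned here (an integer too large for the target C type, and a negative value for an unsigned type) come down to a range check on a value that is already a Python int. A minimal pure-Python sketch of that check follows. The helper names `int_range` and `fits` are hypothetical and used only for illustration; this is not pyarrow's conversion code.

```python
def int_range(bits, signed):
    """Return the inclusive (min, max) of a fixed-width integer type."""
    if signed:
        return -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    return 0, (1 << bits) - 1


def fits(value, bits, signed):
    """Check whether a Python int is representable in the given type."""
    lo, hi = int_range(bits, signed)
    return lo <= value <= hi


# -1 is a valid Python int but has no uint8 representation.
print(fits(-1, 8, signed=False))
# 2**63 is one past the largest int64 value.
print(fits(2**63, 64, signed=True))
```

In both calls the object is an integer, so a "tried to convert to int" style message would be misleading; that is the situation the check discussed above is meant to separate out.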
716bd51 to 0735885
@jorisvandenbossche was there more to be done here?
Thanks for the ping. I think all good. @arw2019 can you just rebase to ensure it's still all passing with latest master?
@jorisvandenbossche Rebased and seeing some failures. They're ones also popping up in other, unrelated, PRs, so not sure they're to do with this patch? I'm happy to investigate, though
There are some known failures on Mac and Appveyor at the moment, so nothing to worry about for this PR.
Thanks @arw2019 !
Thanks @jorisvandenbossche for reviewing!
This PR homogenizes error messages for mixed-type Pandas inputs to pa.Table.
The message for a Pandas column with int followed by string is now the same as for double followed by string:
As a side effect, this snippet [xref #5866, ARROW-7168] now throws an ArrowInvalid (has been FutureWarning since 0.16):
Finally, this does break a test [xref #4484, ARROW-4036] - see code comment
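To illustrate the idea of one uniform message for a mixed-type column, here is a small pure-Python sketch. It assumes a hypothetical `convert_ints` helper and uses `ValueError` as a stand-in for `ArrowInvalid`; the message format is copied from the traceback quoted in the conversation, and none of this reflects how pyarrow implements the conversion internally.

```python
def convert_ints(values):
    """Strictly convert a sequence to ints, failing on the first non-int."""
    out = []
    for v in values:
        if not isinstance(v, int):
            # Same wording regardless of which non-int type was found.
            raise ValueError(
                f"Could not convert {v!r} with type {type(v).__name__}: "
                "tried to convert to int"
            )
        out.append(v)
    return out


try:
    convert_ints([1, "a"])  # int followed by string
except ValueError as exc:
    print(exc)
```

The point of the sketch is only that the failing value and its type are reported the same way whichever element breaks the conversion.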