Skip to content

Conversation

@jorisvandenbossche
Copy link
Member

https://issues.apache.org/jira/browse/ARROW-7168

This change ensures that if you specify a type in pa.array, we ensure the output actually has this type when converting to dictionary array (as we also do for other types).

The PR now implements this change, but we might want to do this with a deprecation first, as this can break people's code.

@github-actions
Copy link

# (the type of the 'codes' in pandas is not part of the data type)
typ = pa.dictionary(index_type=pa.int16(), value_type=pa.string())
result = pa.array(cat, type=typ)
assert result.type.equals(typ)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't you check also the array value? (for example using result.to_pylist())

typ = pa.dictionary(index_type=pa.int16(), value_type=pa.string())
result = pa.array(cat, type=typ)
assert result.type.equals(typ)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, should you perhaps check passing a mask argument to pa.array()?

@jorisvandenbossche
Copy link
Member Author

@pitrou thanks! Added some additional tests

@jorisvandenbossche
Copy link
Member Author

Also added some additional code to do it with a warning for now, falling back to the previous behaviour if the new behaviour fails.

Copy link
Member

@pitrou pitrou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure the warning is necessary, and it's more maintenance work in the future, but I'll let you make the final choice. +1 otherwise.

@jorisvandenbossche
Copy link
Member Author

We had some regressions with the changes in 0.15 related to following the types in the schema and the pandas index etc, so thought to be more cautious this time. But certainly not attached to it (I can't really judge if it will be something easy to run into or not).

I think the extra maintenance work is limited, we will only need to remove the warnings and replace with an error at some point.

@pitrou pitrou closed this in 09eac5d Nov 21, 2019
@pitrou
Copy link
Member

pitrou commented Nov 21, 2019

Ok, merged as is :-)

pitrou pushed a commit that referenced this pull request Dec 4, 2019
I guess this should be in #5866

Closes #5952 from mrkn/ARROW-7306 and squashes the following commits:

e0793a3 <Kenta Murata> Add Result-returning version of FileSystemFromUri

Authored-by: Kenta Murata <[email protected]>
Signed-off-by: Antoine Pitrou <[email protected]>
jorisvandenbossche pushed a commit that referenced this pull request Sep 15, 2020
…pe (int/string) Pandas dataframe to pyarrow Table

This PR homogenizes error messages for mixed-type `Pandas` inputs to `pa.Table`.

The message for `Pandas` column with `int` followed by `string`  is now
```
In [2]: table = pa.Table.from_pandas(pd.DataFrame({'a': [ 19, 'a']}))
(... traceback...)
ArrowInvalid: ('Could not convert a with type str: tried to convert to int', 'Conversion failed for column a with type object')
```
the same as for `double` followed by `string`:
```
In [3]: table = pa.Table.from_pandas(pd.DataFrame({'a': [ 19.0, 'a']}))
(... traceback...)
ArrowInvalid: ('Could not convert a with type str: tried to convert to double', 'Conversion failed for column a with type object')
```

As a side effect, this snippet [xref #5866, ARROW-7168] now throws an `ArrowInvalid` (has been `FutureWarning` since 0.16):
```
In [8]: cat = pd.Categorical.from_codes(np.array([0, 1], dtype='int8'), np.array(['a', 'b'], dtype=object))
   ...: typ = pa.dictionary(index_type=pa.int8(), value_type=pa.int64())
   ...: result = pa.array(cat, type=typ)
(... traceback...)
ArrowInvalid: Could not convert a with type str: tried to convert to int
```
Finally, this *does* break a test [xref #4484, ARROW-4036] - see code comment

Closes #8044 from arw2019/ARROW-7663

Authored-by: arw2019 <[email protected]>
Signed-off-by: Joris Van den Bossche <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants