Skip to content

Conversation

@AlenkaF
Copy link
Member

@AlenkaF AlenkaF commented Apr 4, 2023

Rationale for this change

In the C++ the fixed shape tensor canonical extension type is implementated #8510 so we can add bindings to the extension type in Python.

What changes are included in this PR?

Binding for fixed shape tensor canonical extension type.

Are these changes tested?

Yes.

Are there any user-facing changes?

No.

@github-actions
Copy link

github-actions bot commented Apr 4, 2023

@github-actions github-actions bot added awaiting changes Awaiting changes and removed awaiting review Awaiting review labels Apr 4, 2023
@AlenkaF AlenkaF force-pushed the python-binding-tensor-extension-type branch from cb8145c to 1bdba1d Compare April 5, 2023 06:23
@github-actions github-actions bot added awaiting change review Awaiting change review and removed awaiting changes Awaiting changes labels Apr 5, 2023
Copy link
Member

@jorisvandenbossche jorisvandenbossche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(some quick comments)

@github-actions github-actions bot added awaiting changes Awaiting changes and removed awaiting change review Awaiting change review labels Apr 5, 2023
Co-authored-by: Joris Van den Bossche <[email protected]>
@github-actions github-actions bot added awaiting change review Awaiting change review and removed awaiting changes Awaiting changes labels Apr 5, 2023
@github-actions github-actions bot added awaiting changes Awaiting changes and removed awaiting change review Awaiting change review labels Apr 5, 2023
@github-actions github-actions bot added awaiting change review Awaiting change review and removed awaiting changes Awaiting changes labels Apr 5, 2023
@github-actions github-actions bot added awaiting changes Awaiting changes and removed awaiting change review Awaiting change review labels Apr 5, 2023
@github-actions github-actions bot added the awaiting change review Awaiting change review label Apr 11, 2023
@AlenkaF
Copy link
Member Author

AlenkaF commented Apr 11, 2023

I suggest we then first use the 3rd option: "Return always the actual C-contiguous array and raise error if permutation is non-trivial" and as Joris mentioned, finetune it in a follow-up PR (after the work on to/from_tensor in C++ is finished).

The check for to_numpy_ndarray was added with the last commit: 1ebb829

@rok do you agree with the current state of this PR? If yes we could merge it today and get it into 12.0.0.

Copy link
Member

@rok rok left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@github-actions github-actions bot added awaiting merge Awaiting merge and removed awaiting change review Awaiting change review labels Apr 11, 2023
@AlenkaF
Copy link
Member Author

AlenkaF commented Apr 11, 2023

@jorisvandenbossche if u agree we can go ahead and merge this.

@github-actions github-actions bot added awaiting changes Awaiting changes and removed awaiting merge Awaiting merge labels Apr 11, 2023
Co-authored-by: Joris Van den Bossche <[email protected]>
@github-actions github-actions bot added awaiting change review Awaiting change review and removed awaiting changes Awaiting changes labels Apr 11, 2023
@jorisvandenbossche jorisvandenbossche merged commit d21c1c7 into apache:main Apr 11, 2023
@github-actions github-actions bot added awaiting merge Awaiting merge and removed awaiting change review Awaiting change review labels Apr 11, 2023
@jorisvandenbossche
Copy link
Member

pyarrow->numpy conversion should (imo) use permutation data to create ndarray with correct strides while using the same physical layout (zero-copy) of the data buffer. I don't see any good reason to not do this in pyarrow but I might be missing something?

@rok see my comments above?
Potential reasons that someone might not want to have the permuted ndarray:

  • For a specific use case, someone might not care about the permutation and just wants the physical C-contiguous array
  • Because you want to do the handling of permutation / dim_names yourself, and therefore first want to get the numpy array as the physical array (see GH-34882: [Python] Binding for FixedShapeTensorType #34883 (comment) above for a possible reason for this, eg if you need to handle the potential combination of both permutation and/or dim_names, this might be easier to just all do yourself)

@AlenkaF AlenkaF deleted the python-binding-tensor-extension-type branch April 11, 2023 15:19
@rok
Copy link
Member

rok commented Apr 11, 2023

  • For a specific use case, someone might not care about the permutation and just wants the physical C-contiguous array

Wouldn't that not be possible with tensor_ext_arr.storage.to_numpy()? Or would that not be zero-copy?

  • Because you want to do the handling of permutation / dim_names yourself, and therefore first want to get the numpy array as the physical array (see GH-34882: [Python] Binding for FixedShapeTensorType #34883 (comment) above for a possible reason for this, eg if you need to handle the potential combination of both permutation and/or dim_names, this might be easier to just all do yourself)

How about we have pyarrow pass the permutation/strides information by default and have users use lower level API for manual handling of permutation / dim_names? I'm assuming underlying buffer doesn't get actually permuted in either case of course.

@AlenkaF
Copy link
Member Author

AlenkaF commented Apr 11, 2023

Short interruption of the continuation of the thread 😊

Thank you Joris and Rok for all the help and reviews! 🙏
The docs for this PR are ready for review: #34957

@jorisvandenbossche
Copy link
Member

For a specific use case, someone might not care about the permutation and just wants the physical C-contiguous array

Wouldn't that not be possible with tensor_ext_arr.storage.to_numpy()? Or would that not be zero-copy?

to_numpy() gives a 1D object-dtype array of arrays, so that's not convenient to work with if you want the nD array (it can actually be "zero-copy", but you can't get the zero-copy nD array from it)

How about we have pyarrow pass the permutation/strides information by default and have users use lower level API for manual handling of permutation / dim_names?

What do you mean exactly with "pass the permutation/strides information"? (I suppose you mean returning the permuted/logical array by default?)
Users could indeed do this conversion to an nD array manually (the to_numpy_ndarray implementation isn't that long), but to me that feels a bit the point of providing this method ;) (it could be a keyword argument to the function, to indicate if you want the permuted logical array or not)

@rok
Copy link
Member

rok commented Apr 11, 2023

What do you mean exactly with "pass the permutation/strides information"? (I suppose you mean returning the permuted/logical array by default?)

I think so yes :D.

Users could indeed do this conversion to an nD array manually (the to_numpy_ndarray implementation isn't that long), but to me that feels a bit the point of providing this method ;) (it could be a keyword argument to the function, to indicate if you want the permuted logical array or not)

I suppose having this as an opt-in behavior (with the keyword) would work yeah. Just want to make sure this is not on a default path as that would be surprising.

AlenkaF added a commit that referenced this pull request Apr 12, 2023
…rType (#34957)

### Rationale for this change

This PR adds examples of the use of `FixedShapeTensorType`to the PyArrow user guide. Should be reviewed and merged after #34883 is done.
* Closes: #34956

Lead-authored-by: Alenka Frim <[email protected]>
Co-authored-by: Alenka Frim <[email protected]>
Co-authored-by: Rok Mihevc <[email protected]>
Signed-off-by: Alenka Frim <[email protected]>
@ursabot
Copy link

ursabot commented Apr 12, 2023

Benchmark runs are scheduled for baseline = e488942 and contender = d21c1c7. d21c1c7 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Finished ⬇️0.0% ⬆️0.0%] ec2-t3-xlarge-us-east-2
[Failed] test-mac-arm
[Finished ⬇️0.26% ⬆️0.0%] ursa-i9-9960x
[Finished ⬇️0.09% ⬆️0.06%] ursa-thinkcentre-m75q
Buildkite builds:
[Finished] d21c1c79 ec2-t3-xlarge-us-east-2
[Failed] d21c1c79 test-mac-arm
[Finished] d21c1c79 ursa-i9-9960x
[Finished] d21c1c79 ursa-thinkcentre-m75q
[Finished] e488942c ec2-t3-xlarge-us-east-2
[Failed] e488942c test-mac-arm
[Finished] e488942c ursa-i9-9960x
[Finished] e488942c ursa-thinkcentre-m75q
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
test-mac-arm: Supported benchmark langs: C++, Python, R
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java

liujiacheng777 pushed a commit to LoongArch-Python/arrow that referenced this pull request May 11, 2023
### Rationale for this change
In the C++ the fixed shape tensor canonical extension type is implementated apache#8510 so we can add bindings to the extension type in Python.

### What changes are included in this PR?
Binding for fixed shape tensor canonical extension type.

### Are these changes tested?
Yes.

### Are there any user-facing changes?
No.
* Closes: apache#34882

Lead-authored-by: Alenka Frim <[email protected]>
Co-authored-by: Alenka Frim <[email protected]>
Co-authored-by: Joris Van den Bossche <[email protected]>
Co-authored-by: Rok Mihevc <[email protected]>
Signed-off-by: Joris Van den Bossche <[email protected]>
liujiacheng777 pushed a commit to LoongArch-Python/arrow that referenced this pull request May 11, 2023
…eTensorType (apache#34957)

### Rationale for this change

This PR adds examples of the use of `FixedShapeTensorType`to the PyArrow user guide. Should be reviewed and merged after apache#34883 is done.
* Closes: apache#34956

Lead-authored-by: Alenka Frim <[email protected]>
Co-authored-by: Alenka Frim <[email protected]>
Co-authored-by: Rok Mihevc <[email protected]>
Signed-off-by: Alenka Frim <[email protected]>
ArgusLi pushed a commit to Bit-Quill/arrow that referenced this pull request May 15, 2023
### Rationale for this change
In the C++ the fixed shape tensor canonical extension type is implementated apache#8510 so we can add bindings to the extension type in Python.

### What changes are included in this PR?
Binding for fixed shape tensor canonical extension type.

### Are these changes tested?
Yes.

### Are there any user-facing changes?
No.
* Closes: apache#34882

Lead-authored-by: Alenka Frim <[email protected]>
Co-authored-by: Alenka Frim <[email protected]>
Co-authored-by: Joris Van den Bossche <[email protected]>
Co-authored-by: Rok Mihevc <[email protected]>
Signed-off-by: Joris Van den Bossche <[email protected]>
ArgusLi pushed a commit to Bit-Quill/arrow that referenced this pull request May 15, 2023
…eTensorType (apache#34957)

### Rationale for this change

This PR adds examples of the use of `FixedShapeTensorType`to the PyArrow user guide. Should be reviewed and merged after apache#34883 is done.
* Closes: apache#34956

Lead-authored-by: Alenka Frim <[email protected]>
Co-authored-by: Alenka Frim <[email protected]>
Co-authored-by: Rok Mihevc <[email protected]>
Signed-off-by: Alenka Frim <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Python] Binding for FixedShapeTensorType

4 participants