alexeykudinkin
Contributor

Why are these changes needed?

Context

This change skips the unnecessary blanket conversion to NumPy (previously applied to every chunk of data) before converting to PyArrow.

That blanket conversion causes problems when batches contain Arrow-native Scalars: because of it, they are ultimately serialized as the ArrowPythonObjectType extension.

Changes

We revisit the conversion logic and now convert the passed-in column values to NumPy only in the following cases:

  • The column name is TENSOR_COLUMN_NAME (for compatibility)
  • The provided column values are already represented by a tensor (NumPy, torch, etc.)
  • The provided column values are a list of ndarrays (this preserves compatibility with the previous behavior, where all column values were blindly converted to NumPy, so a list of ndarrays was converted to a tensor)
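
The three cases above can be sketched as a single predicate. This is only an illustration of the decision described in this PR, not the code that was merged: the names should_convert_to_numpy and is_tensor_like are hypothetical, the value assigned to TENSOR_COLUMN_NAME is an assumed placeholder, and the torch check is a duck-typed stand-in so the sketch does not need torch installed.

```python
import numpy as np

# Assumption: placeholder value standing in for Ray Data's TENSOR_COLUMN_NAME
# constant; the actual value is not stated in this PR description.
TENSOR_COLUMN_NAME = "__value__"


def is_tensor_like(values) -> bool:
    """Hypothetical helper: True for NumPy arrays and torch-like tensors."""
    if isinstance(values, np.ndarray):
        return True
    # Duck-typed stand-in for an isinstance(values, torch.Tensor) check:
    # looks only at the top-level module name of the object's type.
    return type(values).__module__.split(".")[0] == "torch"


def should_convert_to_numpy(col_name: str, values) -> bool:
    """Hypothetical predicate mirroring the three cases listed above."""
    # Case 1: the dedicated tensor column is always converted (compatibility).
    if col_name == TENSOR_COLUMN_NAME:
        return True
    # Case 2: the values are already a tensor.
    if is_tensor_like(values):
        return True
    # Case 3: a non-empty list whose elements are all ndarrays.
    if (
        isinstance(values, list)
        and len(values) > 0
        and all(isinstance(v, np.ndarray) for v in values)
    ):
        return True
    # Everything else (plain Python lists, Arrow-native values, ...) is left
    # as-is and handed to the Arrow conversion without a NumPy detour.
    return False
```

Under these assumptions, should_convert_to_numpy("x", [1, 2, 3]) returns False, while should_convert_to_numpy("x", [np.zeros(2), np.ones(2)]) returns True.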

Related issue number

Checks

  • I've signed off every commit(by using the -s flag, i.e., git commit -s) in this PR.
  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
    • I've added any new APIs to the API Reference. For example, if I added a
      method in Tune, I've added it in doc/source/tune/api/ under the
      corresponding .rst file.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

@alexeykudinkin alexeykudinkin requested a review from a team as a code owner March 11, 2025 05:17
@alexeykudinkin alexeykudinkin requested a review from raulchen March 11, 2025 05:17
@alexeykudinkin alexeykudinkin added the go add ONLY when ready to merge, run all tests label Mar 11, 2025
@raulchen
Contributor

lint is failing. please fix

@bveeramani bveeramani merged commit f6347c0 into master Mar 12, 2025
5 checks passed
@bveeramani bveeramani deleted the ak/arw-cnv-opt-fix branch March 12, 2025 22:14
qinyiyan pushed a commit to qinyiyan/ray that referenced this pull request Mar 13, 2025
…as blocks (ray-project#51238)
Signed-off-by: Alexey Kudinkin <[email protected]>
park12sj pushed a commit to park12sj/ray that referenced this pull request Mar 18, 2025
…as blocks (ray-project#51238)
Signed-off-by: Alexey Kudinkin <[email protected]>
dhakshin32 pushed a commit to dhakshin32/ray that referenced this pull request Mar 27, 2025
…as blocks (ray-project#51238)
Signed-off-by: Alexey Kudinkin <[email protected]>
Signed-off-by: Dhakshin Suriakannu <[email protected]>
Labels
community-backlog go add ONLY when ready to merge, run all tests

4 participants