[data.llm] Return a batch of rows in the udf instead of row by row #54329

kouroshHakha · 2025-07-03T21:01:04Z

As mentioned in this PR, there is a lot of overhead when each row is returned as its own batch (bsize=1) from an async generator UDF. This PR fixes that by returning the rows as a single batch.

One could argue that we sacrifice some throughput by making the items in a batch synchronize with each other, but the overhead we pay today from yielding at the row level is even bigger.
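To illustrate the trade-off, here is a minimal sketch in plain `asyncio`, not the actual Ray Data LLM code. The names `process_row`, `udf_row_by_row`, `udf_batched`, `collect`, and the `"rows"` key are all hypothetical and chosen only for this example. The first UDF yields one batch per finished row, and the second waits for all rows and yields a single batch.

```python
import asyncio
from typing import Any, AsyncIterator, Dict, List

Row = Dict[str, Any]
Batch = Dict[str, List[Row]]


async def process_row(row: Row) -> Row:
    # Hypothetical per-row work; stands in for an async model call.
    await asyncio.sleep(0)
    return {**row, "out": row["x"] * 2}


async def udf_row_by_row(batch: Batch) -> AsyncIterator[Batch]:
    # Row-level pattern: every finished row is yielded as its own
    # batch of size 1, so the consumer handles one batch per row.
    tasks = [asyncio.create_task(process_row(r)) for r in batch["rows"]]
    for finished in asyncio.as_completed(tasks):
        yield {"rows": [await finished]}


async def udf_batched(batch: Batch) -> AsyncIterator[Batch]:
    # Batch-level pattern: rows still run concurrently, but the UDF
    # waits for all of them and yields a single batch (input order kept).
    rows = await asyncio.gather(*(process_row(r) for r in batch["rows"]))
    yield {"rows": list(rows)}


async def collect(agen: AsyncIterator[Batch]) -> List[Batch]:
    # Drain an async generator into a list of batches.
    return [b async for b in agen]
```

With an input of N rows, `collect(udf_row_by_row(batch))` returns N one-row batches, while `collect(udf_batched(batch))` returns one N-row batch. The batched version cannot emit any row until the slowest row in the batch finishes, which is the synchronization cost mentioned above.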

Signed-off-by: Kourosh Hakhamaneshi <[email protected]>

kouroshHakha · 2025-07-03T21:02:06Z

Running release tests to ensure correctness: https://buildkite.com/ray-project/release/builds/47502

Signed-off-by: Kourosh Hakhamaneshi <[email protected]>

…ay-project#54329) Signed-off-by: Kourosh Hakhamaneshi <[email protected]> Signed-off-by: doyoung <[email protected]>

…ay-project#54329) Signed-off-by: Kourosh Hakhamaneshi <[email protected]> Signed-off-by: ChanChan Mao <[email protected]>

kouroshHakha added 2 commits July 3, 2025 13:55

Wip

f1f6297

Signed-off-by: Kourosh Hakhamaneshi <[email protected]>

wip

992c84b

Signed-off-by: Kourosh Hakhamaneshi <[email protected]>

alexeykudinkin added the "go" label (add ONLY when ready to merge, run all tests) Jul 3, 2025

alexeykudinkin approved these changes Jul 3, 2025

tests

6fb0ae0

Signed-off-by: Kourosh Hakhamaneshi <[email protected]>

kouroshHakha marked this pull request as ready for review July 7, 2025 01:58

kouroshHakha requested a review from a team as a code owner July 7, 2025 01:58

kouroshHakha added 3 commits July 7, 2025 10:27

wip

5039c85

Signed-off-by: Kourosh Hakhamaneshi <[email protected]>

test_chat_template_stage

7322cb3

Signed-off-by: Kourosh Hakhamaneshi <[email protected]>

wip

3d34163

Signed-off-by: Kourosh Hakhamaneshi <[email protected]>

lk-chen approved these changes Jul 7, 2025

kouroshHakha merged commit 0803021 into ray-project:master Jul 7, 2025
5 checks passed

lk-chen mentioned this pull request Jul 8, 2025

[llm.data] Fix AttributeError for the shallow copy of data batch transfer #54419

Closed

8 tasks

kouroshHakha mentioned this pull request Jul 8, 2025

[wip][debug][do not merge] fixing ray data llm problems #54359

Closed

8 tasks
