
Conversation

ProFrenchToast (Contributor) commented Oct 6, 2025

This PR contains:

  • New features
  • Changes to dev-tools e.g. CI config / github tooling
  • Docs
  • Bug fixes
  • Code refactor

What is the current behavior? (You can also link to an open issue here)

Currently in batch mode, if an eval is stopped mid-sample, any batches that are in flight are not resumed when the eval is retried. This means all of the requests have to be re-sent when the sample runs again.

What is the new behavior?

Now the current set of in-flight batches is saved in the eval stats of the eval log. When the log is retried, each batch that was in flight when the log was closed is checked: every request from a completed batch is added to the cache, and batches that are still in progress are added to the current set of in-flight batches. Adding completed requests to the cache means that, when a sample resumes, those requests are filled automatically without creating a new batch, so whatever sample was in progress effectively resumes from where it was (assuming the rest of the input is the same).
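
Below is a minimal sketch of that retry-time logic, with hypothetical names throughout (`inflight_batches`, `check_batch`, `batch_results`, `cache_store`); the actual fields and helpers in the PR may differ.

```python
# Sketch only -- all names here are assumptions, not the PR's actual API.

def resume_inflight_batches(eval_log, provider, cache_store):
    """On retry, reconcile batches that were in flight when the log closed."""
    still_inflight = []
    for batch in eval_log.stats.inflight_batches:   # saved in eval stats
        status = provider.check_batch(batch.id)     # poll the provider
        if status.completed:
            # Requests from completed batches go straight into the cache,
            # so the resumed sample is filled without re-sending anything.
            for request, response in provider.batch_results(batch.id):
                cache_store.put(request.cache_key, response)
        else:
            # Batches still in progress carry over as current in-flight work.
            still_inflight.append(batch)
    return still_inflight
```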

This was born out of a desire to run large batches asynchronously for simple one-step evals such as big MCQ datasets. The ideal use case: run the eval, send off all the requests in a large batch, close the process, then resume the next day and pick up the completed samples.
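
For illustration, that workflow might look like the following from the Python API. This is a hedged sketch: `eval` and `eval_retry` are inspect_ai entry points, but the `batch` and `resume_inflight_batches` arguments shown here are assumptions, not confirmed names from this PR.

```python
# Hypothetical end-to-end usage of the workflow above; argument names
# (batch, resume_inflight_batches) are illustrative assumptions.
from inspect_ai import eval, eval_retry

# Day 1: kick off a large single-step MCQ eval in batch mode, then exit.
logs = eval("big_mcq_task.py", model="openai/gpt-4o", batch=True)

# Day 2: retry the log; completed batch requests are pulled into the
# cache, so finished samples resume without re-sending any requests.
eval_retry(logs, resume_inflight_batches=True)
```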

Does this PR introduce a breaking change? (What changes might users need to make in their application due to this PR?)

Changing the information stored in the logs might break something related to reading the logs, but I have not checked.
As part of this change I removed the source field from the cache key calculation. This produces different keys than before, meaning generate will need to be called again for previously cached requests.
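
An illustrative sketch of what the key change amounts to (not the actual inspect_ai implementation): the message `source` field no longer feeds the hash, so keys computed before and after this PR will not match.

```python
import hashlib
import json

def cache_key(messages: list[dict], config: dict, epoch: int) -> str:
    """Hypothetical cache key: hash of messages/config/epoch, minus source."""
    payload = {
        "messages": [
            # The source field is now dropped before hashing, so messages
            # that differ only in source produce the same key.
            {k: v for k, v in m.items() if k != "source"}
            for m in messages
        ],
        "config": config,
        "epoch": epoch,
    }
    return hashlib.sha256(
        json.dumps(payload, sort_keys=True).encode()
    ).hexdigest()
```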

Other information:

TODO:

  1. Plan how to resume batches that have not yet completed.
    -- The problem here is how to marry up the requests in a batch with a specific generate call in a sample.
    -- One option: if we would call generate with a given input, epoch, etc. and there is an in-flight batch with the same input, epoch, etc., then don't make a new request and instead just wait for that in-flight batch to finish (see the sketch after this list).
    -- This approach sounds like it will have problems, but I can't describe why yet.
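
A rough sketch of the matching idea from the TODO, with hypothetical names throughout: before issuing a new request, look for an in-flight batch request with the same (input, epoch) identity and await it instead.

```python
# Sketch only -- inflight_index, input_hash, await_batch_result, etc.
# are assumptions, not names from this PR.

async def generate_or_await(request, inflight_index, provider):
    """Reuse a matching in-flight batch request instead of re-sending it."""
    match = inflight_index.get((request.input_hash, request.epoch))
    if match is not None:
        # An equivalent request is already in an in-flight batch:
        # wait for that batch rather than creating a new request.
        return await provider.await_batch_result(match.batch_id, match.request_id)
    return await provider.generate(request)
```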

Added the functionality to read the in-flight batches from the log file and add any completed batches to the cache.
This is done because the in-flight batches don't record the source of a message and so would have a different key than the real messages. I don't see a reason we need the source in the key, so I just removed it.
ProFrenchToast (Contributor, Author) commented Oct 6, 2025

Added the epoch to batch requests and added an argument to trigger the resume feature.

Now I just need to plan what we are gonna do with the still-in-progress batches and how to test this feature.
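
For reference, carrying the epoch on each batched request might look something like this (field names are hypothetical); recording the epoch lets a retried eval tell apart otherwise-identical requests issued in different epochs.

```python
from dataclasses import dataclass

@dataclass
class BatchRequest:
    """Hypothetical record of a single request inside a provider batch."""
    request_id: str
    input_hash: str   # identity of the generate() input
    epoch: int        # which epoch of the sample issued this request
    batch_id: str     # provider batch this request was submitted in
```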

jjallaire (Collaborator) commented:

Thank you for this! There are some non-obvious additional things that will need to be done here, and we may want to move to storing the intermediate batches somewhere besides the log file (e.g. a sqlite database in the user's data dir).

@epatey I think we should pick this up once we are through the scanner pipeline work.

@ProFrenchToast We will plan on taking this from here (we could go back and forth on all of the related/required other changes but I think it will be more efficient for us to just do the work).

ProFrenchToast (Contributor, Author) commented:

> @ProFrenchToast We will plan on taking this from here (we could go back and forth on all of the related/required other changes but I think it will be more efficient for us to just do the work).

Yeah, it was getting to the point where some design decisions would be needed that I didn't feel qualified to make. Happy to let you guys take over lol.

