Skip to content

feat: Add support for multiple histories #170

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Jun 30, 2025
Merged

Conversation

bradhilton
Copy link
Collaborator

No description provided.

):
results.append(result)
trajectory_results: list[TokenizedResult] = []
for history in [
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably doesn't matter but we could make Trajectory inherit from History to avoid having to create a new one here.

Copy link
Contributor

@corbt corbt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do think it would be good to have an explicit way to mark an assistant message as trainable or not. I guess we get that right now by passing Choice vs Message objects but if we end up standardizing on using Message objects everywhere (or someone wants to pass allow_training_without_logprobs for any reason) we'll have an issue. That can come later though I suppose.

Also, just checking that these are being masked out somewhere so the different histories in the same trajectory don't attend to each other?

@bradhilton
Copy link
Collaborator Author

bradhilton commented Jun 28, 2025

@corbt no masking, here's a caveat I shared in our discussion:

For the qwen 3 reasoning use case you'll want to add an additional message and choices list for each history. You'll also want to convert previous choice objects to plain assistant messages so that you don't multiplicatively train on earlier messages with the wrong contents to boot

I also mused "maybe we could add support for auto-splitting reasoning trajectories once we figure out what works," but I'm not ready to tackle that atm

@corbt
Copy link
Contributor

corbt commented Jun 28, 2025

Ah I may have been unclear; my question was about attention masking, not loss masking. Are we masking to make sure the different message histories within a trajectory aren't able to attend to each other?

@bradhilton
Copy link
Collaborator Author

@corbt yes, each history will yield it's own TokenizedResult, which gets a unique token group id in the packing step

Copy link
Contributor

@corbt corbt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sweet, excited to have this in. @arcticfly we'll definitely have to document how this works.

@bradhilton bradhilton marked this pull request as ready for review June 30, 2025 18:46
@bradhilton bradhilton merged commit d96e539 into main Jun 30, 2025
1 check passed
@bradhilton bradhilton deleted the feat/multiple-histories branch June 30, 2025 18:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants