
@bigmouthmodels

Hey Ekin! I played around with this today and love it; it would speed up my main workflow a lot!

My only suggestion is that we go bigger and aim to give read_eval_logs a skip_sample_validation flag so that users still get an EvalLog back. This is important to me because I have lots of analysis code that takes EvalLog, EvalSample, and List[ChatMessage] as input, and I would probably cast to these types before downstream analysis in the future.

Here's a suggestion of how we could adapt the approach you've taken to do this:

  • Use model_construct rather than model_validate to populate EvalSample from the JSON dict (first sketch below).
  • Use multiprocessing to parallelise reading the samples from disk (second sketch below). It's annoying that the GIL pushes us to processes rather than threads, which means we can't share the single ZipFile handle, but that's life.
  • Update the Recorder ABC to take a skip_sample_validation flag (third sketch below). There's a todo here: I haven't made a call about what we do with the JSONRecorder, which straightforwardly loads everything from JSON. I suggest we skip validation for all fields in that case, and detail this in the docs.
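
For the model_construct point, here's a minimal sketch of the contrast (import path from memory; zf is assumed to be an open ZipFile over the log):

```python
import json

from inspect_ai.log import EvalSample

with zf.open(sample_name) as f:
    data = json.load(f)

# model_validate recursively validates and constructs every nested
# ChatMessage / Event model -- this is where the time goes
sample = EvalSample.model_validate(data)

# model_construct sets fields directly with no validation; nested
# dicts stay plain dicts unless we construct them explicitly
sample = EvalSample.model_construct(**data)
```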
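For the multiprocessing point, a sketch of the worker shape (helper names are mine, not in the PR): each process reopens the zip, since the handle can't be pickled across the process boundary.

```python
import json
from concurrent.futures import ProcessPoolExecutor
from zipfile import ZipFile

from inspect_ai.log import EvalSample


def _read_sample(log_path: str, sample_name: str) -> EvalSample:
    # each worker opens its own ZipFile handle
    with ZipFile(log_path) as zf, zf.open(sample_name) as f:
        return EvalSample.model_construct(**json.load(f))


def read_samples(log_path: str, sample_names: list[str]) -> list[EvalSample]:
    with ProcessPoolExecutor() as pool:
        futures = [pool.submit(_read_sample, log_path, name) for name in sample_names]
        return [future.result() for future in futures]
```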
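And for the Recorder change, roughly this shape (I'm writing the signature from memory, so it may not match the ABC exactly):

```python
from abc import ABC, abstractmethod

from inspect_ai.log import EvalLog


class Recorder(ABC):
    @abstractmethod
    async def read_log(
        self, location: str, skip_sample_validation: bool = False
    ) -> EvalLog: ...
```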

I've included a benchmarking test that measures execution time on a realistic (for me!) log file: 5 samples, a few hundred messages each, with lots of big messages and events. The average speedup was ~2.5x. I expect this to be bigger for logs with more and longer samples, and potentially slightly below 1x for logs with only a handful of short samples.

(Screenshot of the benchmark timings attached, 2025-08-07.)

Let me know what you think! I can then add tests and docs. It could also be that we include some logic so that multiprocessing is only used when there are more than k samples, since for smaller logs I suspect the start-up overhead isn't worth it; e.g. the sketch below.
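
Something like (threshold value and helper names hypothetical):

```python
MIN_SAMPLES_FOR_POOL = 8  # hypothetical crossover, worth benchmarking

if len(sample_names) > MIN_SAMPLES_FOR_POOL:
    samples = read_samples(log_path, sample_names)  # process pool, as above
else:
    samples = [_read_sample(log_path, name) for name in sample_names]
```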

@bigmouthmodels (Author)

Edit: looking at the benchmarking data you shared again (attached for reference), the difference in speed between model_construct(json.load(f)) and plain json.load(f) is huge! This makes me think we probably do want to let the user skip constructing the ChatMessages and Events entirely, like you've done.

(Benchmarking data attached.)

One way we could get the speed of json.load(f) while still conforming to the existing ChatMessage and Event interfaces would be to write a wrapper that presents the same interface as both, but simply grabs the requested field from the underlying dict; rough sketch below.
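
Rough sketch of that idea (class name hypothetical; a real version would also need to satisfy any isinstance checks in downstream code):

```python
class LazyModel:
    """Stands in for a ChatMessage or Event: attribute access pulls
    the field straight out of the raw JSON dict, with no validation."""

    def __init__(self, data: dict):
        self._data = data

    def __getattr__(self, name: str):
        # __getattr__ only fires when normal lookup fails, so _data
        # itself resolves normally
        try:
            return self._data[name]
        except KeyError:
            raise AttributeError(name) from None


msg = LazyModel({"role": "assistant", "content": "hi"})
assert msg.role == "assistant"
```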
