
v0.4.9.1

@baberabb baberabb released this 04 Aug 11:36
d021bf8

lm-eval v0.4.9.1 Release Notes

This v0.4.9.1 release is a quick patch to bring in some new tasks and fixes. Looking ahead, we're gearing up for some bigger updates to tackle common community pain points. We'll do our best to keep things from breaking, but we anticipate that a few changes might not be fully backward-compatible. We're excited to share more soon!

Enhanced Reasoning Model Handling

  • Better support for reasoning models with a think_end_token argument to strip intermediate reasoning from outputs for the hf, vllm, and sglang model backends. A related enable_thinking argument was also added for specific models that support it (e.g., Qwen).
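
To illustrate what stripping intermediate reasoning means here, the following is a minimal sketch of the idea, not lm-eval's actual implementation. The function name `strip_reasoning`, the default token `</think>`, and the choice to split on the last occurrence of the token are all assumptions made for illustration.

```python
def strip_reasoning(text: str, think_end_token: str = "</think>") -> str:
    """Drop everything up to and including the last think_end_token.

    Hypothetical helper: the name, the default token, and the
    "last occurrence" rule are assumptions, not lm-eval's API.
    """
    # rpartition returns ("", "", text) when the token is absent,
    # so an empty separator means there is nothing to strip.
    _, sep, answer = text.rpartition(think_end_token)
    return answer.lstrip() if sep else text


raw = "<think>2 + 2 is 4</think>\nThe answer is 4."
print(strip_reasoning(raw))
```

Outputs that do not contain the token are returned unchanged, so the same post-processing step can be applied to models that do not emit reasoning traces.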

New Benchmarks & Tasks

Fixes & Improvements

Tasks & Benchmarks:

Backend & Stability:

  • Reduce CLI loading time from 2.2s to 0.05s by @stakodiak. (#3099)
  • Fixed a process hang caused by mp.Pool in bootstrap_stderr and introduced DISABLE_MULTIPROC envar by @ankitgola005 and @neel04. (#3135, #3106)
  • Added image hashing and the LMEVAL_HASHMM envar by @artemorloff. (#2973)
  • TaskManager: include-path precedence handling now prioritizes the custom dir over the default by @parkhs21. (#3068)
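
As a rough illustration of the DISABLE_MULTIPROC item above, the sketch below shows one way an environment variable could gate a serial bootstrap of the standard error instead of a process pool. It is not the code from the linked PRs: the helper names, the accepted values of the variable, and the use of the mean as the bootstrapped statistic are all assumptions.

```python
import os
import random
import statistics


def multiproc_disabled(env=os.environ) -> bool:
    # Assumption: any value other than empty, "0", or "false" disables
    # multiprocessing. The real accepted values may differ.
    value = env.get("DISABLE_MULTIPROC", "").strip().lower()
    return value not in ("", "0", "false")


def bootstrap_stderr_serial(values, iters=1000, seed=1234) -> float:
    """Estimate the standard error of the mean by resampling in-process.

    Hypothetical serial path: no worker pool is created, so there is
    nothing that can hang on process start-up or teardown.
    """
    rng = random.Random(seed)
    n = len(values)
    # Resample with replacement and record the mean of each resample.
    means = [statistics.fmean(rng.choices(values, k=n)) for _ in range(iters)]
    return statistics.stdev(means)


if multiproc_disabled():
    print(bootstrap_stderr_serial([0.0, 1.0, 1.0, 0.0, 1.0]))
```

A serial path like this trades speed for robustness, which can be preferable in environments where spawning worker processes is restricted.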

Housekeeping:

  • Pinned datasets < 4.0.0 temporarily to maintain compatibility with trust_remote_code by @baberabb. (#3172)
  • Removed models from Neural Magic and other unneeded files by @baberabb. (#3112, #3113, #3108)

What's Changed

New Contributors

Full Changelog: v0.4.9...v0.4.9.1