v0.17.1

@Yunnglin Yunnglin released this 21 Jul 02:10

New Features

  • Model stress testing now supports randomly generated image-text data, so multimodal models can be stress tested. For usage instructions, see here.
  • Added support for τ-bench, which evaluates the performance and reliability of AI agents in realistic environments with dynamic user and tool interactions. For usage instructions, see here.
  • Added support for "Humanity's Last Exam", a high-difficulty evaluation benchmark. For usage instructions, see here.

What's Changed

Full Changelog: v0.17.0...v0.17.1