fix(evals): miscellaneous fixes and ergonomic improvements #9879

ehutt · 2025-10-10T23:56:53Z

Changes include:

- Dataframe eval methods return JSON instead of strings in the score/details columns
- Adds a default tqdm progress bar formatter to dataframe evals, plus a flag to hide the bar
- Moves LLM to a top-level import, since it is commonly used with Evaluators
- Updates the tutorial notebooks to use the new import statements
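To illustrate why dict-valued score cells are more convenient than string-valued ones, here is a minimal sketch using plain Python rows rather than a real dataframe. The column name `hallucination_score` and the fields `score`, `label`, and `explanation` are assumptions made up for this example; they are not taken from the PR.

```python
import json

# Hypothetical score payload; the field names are illustrative assumptions.
payload = {"score": 1.0, "label": "factual", "explanation": "Matches the reference."}

# Old behaviour (as described above): the cell holds a JSON string.
old_row = {"hallucination_score": json.dumps(payload)}

# New behaviour (as described above): the cell holds a JSON-serializable dict.
new_row = {"hallucination_score": dict(payload)}


def read_label(cell):
    """Return the label from a score cell, accepting either format."""
    if isinstance(cell, str):
        # String cells need an explicit parse step before field access.
        cell = json.loads(cell)
    return cell["label"]


old_label = read_label(old_row["hallucination_score"])
new_label = read_label(new_row["hallucination_score"])
```

With dict cells the parse step disappears, and the row can still be written out with `json.dumps` when a string form is needed.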

Bug fix:

Running bind_evaluator on an LLMEvaluator raised an error because it attempted to deep copy the LLM object, which can't be pickled. The method now makes a shallow copy of the evaluator and deep copies only the input_mapping, which is the only property the original and the copy should not share.
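The copy strategy described above can be sketched with the standard library alone. `FakeLLM`, `FakeEvaluator`, and `bind` are hypothetical stand-ins written for this example, not the phoenix.evals classes or the actual `bind_evaluator` implementation; the lock simply reproduces the kind of attribute that makes a deep copy fail.

```python
import copy
import threading


class FakeLLM:
    """Hypothetical stand-in for an LLM client that holds a lock."""

    def __init__(self):
        self._lock = threading.Lock()


class FakeEvaluator:
    """Hypothetical evaluator with an LLM and an input mapping."""

    def __init__(self, llm, input_mapping):
        self.llm = llm
        self.input_mapping = input_mapping


def bind(evaluator, input_mapping):
    """Shallow copy the evaluator, deep copy only the mapping."""
    bound = copy.copy(evaluator)  # shares the LLM with the original
    bound.input_mapping = copy.deepcopy(input_mapping)  # independent copy
    return bound


original = FakeEvaluator(FakeLLM(), {})
mapping = {"input": {"path": "question"}}
bound = bind(original, mapping)

# Deep copying the whole evaluator fails because the lock cannot be pickled.
try:
    copy.deepcopy(original)
    deepcopy_failed = False
except TypeError:
    deepcopy_failed = True

# Mutating the caller's mapping afterwards does not affect the bound copy.
mapping["input"]["path"] = "changed"
```

Sharing the LLM between the original and the bound evaluator is intentional in this sketch: only the mapping needs to be independent.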

Note

Switch dataframe eval outputs to dicts, add default/hideable tqdm progress bar, expose LLM at top level, and fix bind_evaluator copying; update notebooks/tests accordingly.

Evals 2.0 (Core):
- Return JSON-serializable dicts (not strings) in {score}_score and {evaluator}_execution_details columns for dataframe evals.
- Add progress bar controls: default formatter via default_tqdm_progress_bar_formatter(...) and hide_tqdm_bar flag for sync/async dataframe evals.
- Fix bind_evaluator(...): use shallow copy of evaluator (avoids deepcopying LLM with locks) and deep copy only the input_mapping.
API/Exports:
- Expose LLM at top-level import (from phoenix.evals import LLM).
Utils:
- Enhance to_annotation_dataframe to parse dict-backed scores; export default_tqdm_progress_bar_formatter.
Docs/Tutorials/Examples:
- Update imports to use top-level LLM; minor notebook cleanup.
Tests:
- Adjust expectations to handle dict outputs for scores and execution details; add progress bar coverage.
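As a rough illustration of the progress bar controls listed above, the sketch below shows how a default bar format and a hide flag could map onto tqdm's `bar_format` and `disable` arguments. `progress_bar_kwargs` is a hypothetical helper invented for this example; it is not `default_tqdm_progress_bar_formatter`, whose signature is not shown in this PR description.

```python
def progress_bar_kwargs(title, hide=False):
    """Build keyword arguments for tqdm (hypothetical helper, not the phoenix API)."""
    # The placeholders below are standard tqdm bar_format fields.
    bar_format = (
        title + " |{bar}| {n_fmt}/{total_fmt} ({percentage:3.1f}%) "
        "| {elapsed}<{remaining} | {rate_fmt}"
    )
    # tqdm's `disable` argument suppresses the bar entirely when True.
    return {"bar_format": bar_format, "disable": hide}


visible = progress_bar_kwargs("Evaluating")
hidden = progress_bar_kwargs("Evaluating", hide=True)

# Assumed usage, if tqdm is installed:
#   from tqdm import tqdm
#   for row in tqdm(rows, **visible):
#       ...
```

Keeping the format and the hide flag as separate arguments means callers can silence the bar in batch jobs without touching the formatter.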

Written by Cursor Bugbot for commit 6d7c3f0. This will update automatically on new commits.

review-notebook-app · 2025-10-10T23:56:58Z

Check out this pull request on ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.

Powered by ReviewNB

ehutt added 8 commits October 10, 2025 15:43

return scores/details as json in dataframe evals; only deep copy inpu…

1a19a53

…t mapping when binding to avoid issues copying LLM

evaluate dataframe returns json score/details columns; annotation uti…

b0a81ce

…lity updated to match

move LLM to top level import

a74207c

update doc strings with LLM as top level import

d64ca5c

move LLM to top level import

3fd536c

update docstring

b49c3a7

set default progress bar for dataframe evals

750604d

update docstrings

61b77a7

ehutt requested review from a team as code owners October 10, 2025 23:56

github-project-automation bot added this to phoenix Oct 10, 2025

github-project-automation bot moved this to 📘 Todo in phoenix Oct 10, 2025

dosubot bot added the size:XXL label (This PR changes 1000+ lines, ignoring generated files.) Oct 10, 2025

update tests

e750af8

axiomofjoy approved these changes Oct 20, 2025

packages/phoenix-evals/src/phoenix/evals/metrics/document_relevance.py (outdated, resolved)

packages/phoenix-evals/src/phoenix/evals/evaluators.py (outdated, resolved)

packages/phoenix-evals/src/phoenix/evals/utils.py (outdated, resolved)

github-project-automation bot moved this from 📘 Todo to 👍 Approved in phoenix Oct 20, 2025

ehutt and others added 4 commits October 20, 2025 16:16

Update packages/phoenix-evals/src/phoenix/evals/evaluators.py

507ceb3

Co-authored-by: Xander Song

update relevance metric docstring

dd4ea7e

add json utility function

2b3bb45

small fix

6d7c3f0

ehutt merged commit 5546179 into main Oct 22, 2025
50 checks passed

ehutt deleted the ehutt/random-fixes branch October 22, 2025 18:41

github-project-automation bot moved this from 👍 Approved to ✅ Done in phoenix Oct 22, 2025

mikeldking mentioned this pull request Oct 22, 2025

chore(main): release arize-phoenix-evals 2.6.0 #9832

Open
