Add LLM-as-Judge task success evaluation to optimizer #4

MahtabSarvmaili · 2025-07-22T03:41:22Z

- Implement LLMJudgeSuccess evaluator for automatic task completion assessment
- Add server-specific optimization with automatic server detection
- Enhanced documentation with evaluation criteria and output format
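For readers unfamiliar with the pattern, the sketch below shows one way a judge reply could be turned into a success flag. It is not the code in this PR: the JSON shape (`score`, `reasoning`), the 0.0 to 1.0 scale, the `threshold` default, and the names `JudgeVerdict` and `parse_judge_output` are all assumptions made for illustration.

```python
import json
import math
from dataclasses import dataclass


@dataclass
class JudgeVerdict:
    """Hypothetical container for one judge decision (not the PR's actual type)."""
    score: float        # assumed to be on a 0.0-1.0 scale
    reasoning: str
    is_successful: bool


def parse_judge_output(raw: str, threshold: float = 0.5) -> JudgeVerdict:
    """Parse a judge reply assumed to be a JSON object with 'score' and 'reasoning'.

    Anything that cannot be parsed is treated as a failed task, so a
    malformed reply never counts as a success.
    """
    try:
        data = json.loads(raw)
        score = float(data["score"])
        if not math.isfinite(score):
            raise ValueError("non-finite score")
    except (json.JSONDecodeError, KeyError, TypeError, ValueError):
        return JudgeVerdict(0.0, "unparseable judge output", False)

    # Clamp to the assumed scale before comparing against the threshold.
    score = max(0.0, min(1.0, score))
    return JudgeVerdict(score, str(data.get("reasoning", "")), score >= threshold)
```

Treating unparseable replies as failures is a conservative choice made here for the sketch; a real evaluator might retry the judge call instead.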


petersonbill64 added 5 commits July 17, 2025 20:02

Add server-specific optimization with automatic server detection and …

f356c23

…multi-server processing support

initial step - needs revision - LLMAsJudge to evaluate the successful…

fc0c7de

…ness of user requested task.

LLM Judge Successful - needs revision

9010396

include LLM-as-Judge task success evaluation with scoring criteria an…

8a07e29

…d output format.

Merge branch 'main' into mahtab_prompt_optimize

67351ff

MahtabSarvmaili requested a review from saqadri July 22, 2025 03:41

petersonbill64 added 17 commits July 28, 2025 11:08

Merge branch 'main' into mahtab_prompt_optimize

451c5d1

- Fixing the truncated message
- using an LLM to summarize the long message

ec51b2f

- evaluating "is_sucessful" based on the LLM-as-Judge output
- removing the correct tool
- removing the limit on the number of successful and failed samples for optimization

.

7d80afd

Merge branch 'main' into mahtab_prompt_optimize

19fef96

- inclusion of server name

fd65079

- extracting the user's queries from the trace file

- Enhanced trace processing

cf2d0e9

- Updated response extraction
- Updated core_trace_process.py to use new extraction functions
- Added helper functions for detecting different span types

Merge branch 'main' into mahtab_prompt_optimize

70be221

Enhance trace processing and dataset extraction

6f0cbb4

minor changes

ae011bb

minor changes

b840bd1

Merge branch 'main' into mahtab_prompt_optimize

5ba742e

.

2a37e8d

Merge branch 'main' into mahtab_prompt_optimize

5d9ffb5

Merge branch 'main' into mahtab_prompt_optimize

29c8fbc

improved prompt for tool docstring optimizer

9c6e99b

- OPTIM_PROCESS.md posted

52de158

- sorting the successful and failed examples based on the scores

Merge branch 'main' into mahtab_prompt_optimize

0436604
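Two of the commits above mention dropping the cap on the number of successful and failed samples and sorting both groups by score. A minimal sketch of that step follows, under stated assumptions: the dictionary keys `success` and `score`, the function name `split_and_rank`, and the sort directions (strongest successes first, weakest failures first) are hypothetical and are not taken from the PR's code.

```python
from typing import Any, Dict, List, Tuple

Example = Dict[str, Any]


def split_and_rank(examples: List[Example]) -> Tuple[List[Example], List[Example]]:
    """Split judged examples into successes and failures, each sorted by score.

    No cap is applied to either group; every example is kept.
    """
    successes = [e for e in examples if e["success"]]
    failures = [e for e in examples if not e["success"]]

    # Assumed ordering: clearest successes and clearest failures come first.
    successes.sort(key=lambda e: e["score"], reverse=True)
    failures.sort(key=lambda e: e["score"])
    return successes, failures


if __name__ == "__main__":
    judged = [
        {"query": "a", "score": 0.9, "success": True},
        {"query": "b", "score": 0.1, "success": False},
        {"query": "c", "score": 0.6, "success": True},
        {"query": "d", "score": 0.4, "success": False},
    ]
    good, bad = split_and_rank(judged)
    print([e["query"] for e in good], [e["query"] for e in bad])
```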
