Description
When I use the EquivalenceEvaluator, my test comes back with an Inconclusive rating. While debugging, I found that if I use the debugger to increase MaxOutputTokens from 1 to 2, it starts working as I'd expect. I'm not sure why it needs more than 1 output token, since the response should be a number from 1 to 5, but that's what I'm seeing.
"Failed to parse numeric score for 'Equivalence' from the following text:"
Reproduction Steps
var messages = new List<ChatMessage>
{
    new(ChatRole.System, "Your are a helpful assistant."),
    new(ChatRole.Assistant, "What's the 3rd planet from the sun?")
};
var response = new ChatResponse(new ChatMessage(ChatRole.Assistant, "The Earth is the 3rd planet."));
var chatConfig = new ChatConfiguration(chatClient.AsBuilder().Build());
var equivalenceEvaluatorContext = new EquivalenceEvaluatorContext("The 3rd planet from the sun is the Earth.");
var equivalenceEvaluator = new EquivalenceEvaluator();
var evaluationResult =
    await equivalenceEvaluator.EvaluateAsync(messages, response, chatConfig, additionalContext: [equivalenceEvaluatorContext]);
Debug.WriteLine(evaluationResult.Metrics.Single().Value.Diagnostics?.Single().Message);
Expected behavior
evaluationResult.Metrics.Single().Value.Interpretation.Rating to be Exceptional
Actual behavior
evaluationResult.Metrics.Single().Value.Diagnostics?.Single().Message is: Failed to parse numeric score for 'Equivalence' from the following text:
In the debugger, one can see that the model stopped generating output because the max token limit was hit.
Regression?
No response
Known Workarounds
Attach the debugger here and change _clientOptions.MaxOutputTokens to 2 before it's used.
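A code-level alternative to patching the value in the debugger might look like the sketch below. I have not verified it against this setup. It assumes the evaluator passes its token limit to the IChatClient through ChatOptions.MaxOutputTokens, so that a ConfigureOptions callback on the builder can raise it before the request reaches the Anthropic client. CreateChatConfiguration and minOutputTokens are hypothetical names used only for illustration.

```csharp
using Microsoft.Extensions.AI;
using Microsoft.Extensions.AI.Evaluation;

public static class EvaluationSetup
{
    // Hypothetical helper: wraps the client so that any request arriving with a
    // MaxOutputTokens value below minOutputTokens is raised to that minimum.
    // Assumption: the evaluator supplies its limit via ChatOptions.MaxOutputTokens.
    public static ChatConfiguration CreateChatConfiguration(
        IChatClient chatClient, int minOutputTokens = 2)
    {
        IChatClient wrapped = chatClient
            .AsBuilder()
            .ConfigureOptions(options =>
            {
                // Only touch limits that are set and too small; leave others alone.
                if (options.MaxOutputTokens is int limit && limit < minOutputTokens)
                {
                    options.MaxOutputTokens = minOutputTokens;
                }
            })
            .Build();

        return new ChatConfiguration(wrapped);
    }
}
```

In the reproduction above, chatConfig would then be created with EvaluationSetup.CreateChatConfiguration(chatClient) instead of new ChatConfiguration(chatClient.AsBuilder().Build()).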
Configuration
Using .NET 9, and getting the IChatClient from AnthropicClient.Messages.AsBuilder().Build() with the Anthropic.SDK NuGet package, using the AnthropicModels.Claude4Sonnet model ID.
Other information
No response