Add Gemini judge support with weighted scoring in G-Eval #1913
base: main
Conversation
@bofenghuang is attempting to deploy a commit to the Confident AI Team on Vercel. A member of the Team first needs to authorize it.
Hey @bofenghuang thanks for the PR! Can we add a quick test for G-Eval as well (e.g. supplying the Gemini model in code to G-Eval)? Thanks!
(force-pushed 0a901bc to b4f8a8d)
Hey @penguine-ip, thanks for your review! Just rebased and added a mocked unit test plus a live test, where the live one reads the Gemini API config from
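The mocked/live test split mentioned above might be structured roughly like this. This is only a sketch: the class, test names, mocked payload shape, and the `GOOGLE_API_KEY` env-var name are all hypothetical, not the actual tests added in the PR.

```python
import os
import unittest
from unittest.mock import MagicMock


class TestGeminiGEval(unittest.TestCase):
    """Sketch of a mocked unit test plus a credential-gated live test.
    All names and payload shapes here are illustrative."""

    def test_weighted_score_with_mocked_judge(self):
        # Mock a judge whose raw response carries token logprobs, so the
        # weighted-summation path can be exercised without network access.
        judge = MagicMock()
        judge.generate_raw_response.return_value = {
            "top_logprobs": {"7": -0.2, "8": -1.7}  # made-up payload shape
        }
        raw = judge.generate_raw_response("Rate this answer from 0 to 9.")
        self.assertIn("7", raw["top_logprobs"])

    @unittest.skipUnless(
        os.getenv("GOOGLE_API_KEY"),  # assumed env var; skip without creds
        "Gemini credentials not configured",
    )
    def test_live_gemini(self):
        # Would build the real Gemini judge from the environment and run
        # G-Eval end to end; omitted in this sketch.
        ...
```

Gating the live test on an environment variable keeps CI green for contributors without Gemini credentials while still letting maintainers run the real call locally.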
Hi @penguine-ip, is this ready to be merged? Thanks!
Hey @bofenghuang yes, just one more thing. Would it be better to raise an `AttributeError` if the function does fail? Since right now in G-Eval we are catching that gracefully. Let me know your thoughts.
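For context, the graceful fallback under discussion looks roughly like this. A minimal sketch: `GEvalScorer` and `raw_score` are hypothetical names; only `generate_raw_response` and `generate` come from the PR itself.

```python
class GEvalScorer:
    """Minimal sketch of the fallback pattern under discussion."""

    def __init__(self, model):
        self.model = model

    def raw_score(self, prompt: str):
        try:
            # Preferred path: raw response with logprobs, enabling
            # weighted summation over candidate score tokens.
            return self.model.generate_raw_response(prompt)
        except AttributeError:
            # Models without generate_raw_response fall back to plain
            # generation, which yields only the final sampled score.
            return self.model.generate(prompt)
```

Letting the `AttributeError` propagate instead would surface unsupported judge models loudly rather than silently degrading to the unweighted path.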
Hey @penguine-ip,
Hey @bofenghuang yes, actually that would be great. We can add something like `"Score: X (weighted summation=True)"`, does this sound feasible?
Yes, that would be great! Maybe we could do this in another PR.
Hi @penguine-ip, do you think this PR is ready to merge? Anything you'd like to add? Thanks!
Hey @penguine-ip, does this added warning look good to you?
Hey @bofenghuang, is it possible to remove the warning? It will scare users off. The `Score: X (weighted=true/false)` would be good enough.
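Surfacing that flag in the printed score could be as small as a single format helper. The exact format string here is a guess at what is being suggested, not code from the PR:

```python
def format_score(score: float, weighted: bool) -> str:
    # Append whether weighted summation was actually used, so users can
    # tell a logprob-weighted score from a plain sampled one.
    return f"Score: {score} (weighted={'true' if weighted else 'false'})"
```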
(force-pushed b6d2d6f to 987e34a)
Hey @penguine-ip, I reverted it. Could you add it later? I'm not fully sure I get what you mean, and it's a bit out of scope for this PR 🙂
(force-pushed 987e34a to 3f83b99)
Hello @penguine-ip, just rebased the PR due to a conflict. Could you add the message? Thanks in advance.
Hello 👋,
This PR enables the use of Gemini models as judges to obtain weighted summed scores in G-Eval.
Currently, there isn't a `generate_raw_response`-like function to get `logprobs` for Gemini, so it falls back to the `generate` function, which only produces the final sampled score. This PR:
- adds `generate_raw_response` and `a_generate_raw_response` functions
- caps `top_logprobs` at 19, since Gemini only supports a range of [0, 20)
- adds a `transform_gemini_to_openai_like` function to convert Gemini output to OpenAI-like format, so we can reuse the existing post-processing code (`calculate_weighted_summed_score`)

One issue is that Gemini tokenizes the default rubric upper bound `10` into `0` and `1`, so with the current version of `calculate_weighted_summed_score`, we need to set the upper bound to less than `10`.
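The weighted-summation step the converted output feeds into can be sketched as follows. This is my own illustration of the technique, not deepeval's actual `calculate_weighted_summed_score`; the `top_logprobs` payload shape is an assumed OpenAI-like mapping of token text to log-probability.

```python
import math


def weighted_summed_score(top_logprobs: dict[str, float]) -> float:
    """Probability-weighted mean over candidate integer score tokens.

    `top_logprobs` maps token text to log-probability, an assumed
    OpenAI-like shape such as transform_gemini_to_openai_like might emit.
    """
    weights: dict[int, float] = {}
    for token, logprob in top_logprobs.items():
        token = token.strip()
        # Keep only integer score tokens, converting logprobs back to
        # probabilities for the weighted sum.
        if token.isdigit():
            weights[int(token)] = weights.get(int(token), 0.0) + math.exp(logprob)
    total = sum(weights.values())
    if total == 0.0:
        raise ValueError("no integer score tokens found in top_logprobs")
    return sum(score * p for score, p in weights.items()) / total
```

This also shows why the tokenization issue above matters: if Gemini emits `10` as the two tokens `1` and `0`, a single-token scheme like this would misread them as scores 1 and 0, which is what forces the rubric upper bound below 10.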