Add bootstrap resampling #11
…so computes the 0.95 confidence interval around the mean BLEU score.
Thanks for this! I'd like to simply modify the one-line output for computer-readability, i.e., `Mean BLEU score: 30.99 +/- 0.20`. Any objections?
Perfectly fine, go ahead :)
…On Tue, Nov 7, 2017, 6:16 AM Matt Post ***@***.***> wrote:
Thanks for this! I'd like to simply modify the one-line output for
computer-readability, i.e.,
cat newstest2017.uedin-nmt.4955.cs-en | ./sacrebleu.py -t wmt17 -l cs-en -b 10
Mean BLEU score: 30.99 +/- 0.20
BLEU+case.mixed+lang.cs-en+numrefs.1+test.wmt17+tok.13a+version.1.0.4 = 30.99 +/- 0.20 n=10 62.4/36.9/24.4/16.4 (BP = 1.000 ratio = 1.004 hyp_len = 61946 ref_len = 61718)
Any objections?
So, don't hate me, but I had to refactor the main class to make an API. Do you want to try to rebase off master? If you don't get to it today, I'll do that next. Then I'll push this out as version 1.1.0 and can hopefully leave it alone for a while... (This should actually be easier to implement now that compute_bleu() is factored out.)
Another issue: have you tested this against the Moses implementation to ensure the results are the same?
I'll look into this shortly.
Hello, there's this
As for the numpy dependency: despite the name sacreBLEU, it would be nice to also add a character-based metric, e.g. chrF3, and there is a numpy implementation (probably much faster than the original pure-Python one): https://github.com/awslabs/sockeye/pull/216/files
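For readers unfamiliar with the metric mentioned above: chrF is a character n-gram F-score, where chrF3 weights recall three times as heavily as precision. The following is a minimal, self-contained sketch of the idea (not the sockeye or sacreBLEU implementation); the averaging of per-order precisions and recalls here is a simplification of the published definition.

```python
from collections import Counter

def chrf(hypothesis: str, reference: str, n: int = 6, beta: float = 3.0) -> float:
    """Simplified character n-gram F-score (chrF); beta=3 gives chrF3.

    Sketch only: strips spaces, averages per-order precision/recall,
    and omits the smoothing and corpus-level aggregation a real
    implementation would need.
    """
    hyp = hypothesis.replace(" ", "")
    ref = reference.replace(" ", "")
    precisions, recalls = [], []
    for order in range(1, n + 1):
        hyp_ngrams = Counter(hyp[i:i + order] for i in range(len(hyp) - order + 1))
        ref_ngrams = Counter(ref[i:i + order] for i in range(len(ref) - order + 1))
        # Clipped overlap: count each n-gram at most as often as it
        # appears in the other side.
        overlap = sum((hyp_ngrams & ref_ngrams).values())
        if hyp_ngrams:
            precisions.append(overlap / sum(hyp_ngrams.values()))
        if ref_ngrams:
            recalls.append(overlap / sum(ref_ngrams.values()))
    p = sum(precisions) / len(precisions) if precisions else 0.0
    r = sum(recalls) / len(recalls) if recalls else 0.0
    if p + r == 0.0:
        return 0.0
    # F-beta: beta > 1 weights recall more heavily than precision.
    return (1 + beta ** 2) * p * r / (beta ** 2 * p + r)
```

Identical strings score 1.0 and fully disjoint strings score 0.0; anything in between lands in (0, 1).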
I'm going to close this now in light of its age. Please feel free to re-open it if you have the time and inclination! |
Want to pick this up again, @cfedermann? |
This includes the SIGPIPE fix. Call with `--bootstrap-trials $n` or `-b $n`. Any `n > 1` will result in bootstrap resampling to determine the BLEU score. If `numpy` is available, the code also computes the 0.95 confidence interval around the final BLEU score. Uses a fixed random seed of `12345` to guarantee reproducible scores. This could later be made configurable, in which case the sacreBLEU signature needs to be updated. Not needed for now, though.
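To make the mechanism concrete, here is a minimal sketch of bootstrap resampling with a fixed seed, assuming per-sample scores are already available. The real PR resamples sentence-level statistics and recomputes corpus BLEU each trial rather than averaging sentence scores, but the confidence-interval arithmetic is the same idea; `bootstrap_ci` and its parameters are hypothetical names, not the PR's API.

```python
import math
import random

def bootstrap_ci(scores, n_trials=1000, seed=12345, z=1.96):
    """Bootstrap a list of per-sample scores.

    Returns (mean of trial means, half-width of an approximate 0.95
    confidence interval via the normal approximation, z = 1.96).
    The fixed seed makes the resampling, and hence the scores,
    reproducible across runs.
    """
    rng = random.Random(seed)
    trial_means = []
    for _ in range(n_trials):
        # Resample with replacement to the original size.
        sample = [rng.choice(scores) for _ in scores]
        trial_means.append(sum(sample) / len(sample))
    mean = sum(trial_means) / len(trial_means)
    var = sum((m - mean) ** 2 for m in trial_means) / len(trial_means)
    return mean, z * math.sqrt(var)
```

With a constant score list the interval collapses to zero width, and two runs with the same seed return identical results, which is the reproducibility property the fixed seed `12345` is there to guarantee.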