docs: add MTEB evaluation guide and update usage.rst #3477
Conversation
@tomaarsen Please share your feedback and let me know if there is anything else I should change.
@KennethEnevoldsen can you look too?
Co-authored-by: Roman Solomatin <[email protected]>
@Samoed and @tomaarsen
@Samoed, what is required for this PR to merge?
* Using it during training risks **overfitting** to public benchmarks.
* It writes to disk and caches aggressively.
* Official guidance recommends using SentenceTransformer's built-in evaluators like:
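For illustration, evaluating during training with one of the built-in evaluators could look like the sketch below (the model checkpoint and the STSb dev split are assumptions for this example, not taken from the PR diff):

```python
from datasets import load_dataset
from sentence_transformers import SentenceTransformer
from sentence_transformers.evaluation import EmbeddingSimilarityEvaluator

# Any Sentence Transformers checkpoint works; this one is just an example
model = SentenceTransformer("all-MiniLM-L6-v2")

# A small dev split with gold similarity scores (assumed dataset for the sketch)
stsb = load_dataset("sentence-transformers/stsb", split="validation")

# Built-in evaluator: correlates embedding cosine similarities with gold scores
dev_evaluator = EmbeddingSimilarityEvaluator(
    sentences1=stsb["sentence1"],
    sentences2=stsb["sentence2"],
    scores=stsb["score"],
    name="sts-dev",
)
print(dev_evaluator(model))
```

In recent library versions, an evaluator like this can also be handed to the trainer so it runs periodically during training, which is what makes the built-in evaluators a better fit for training loops than MTEB.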
What if you want to evaluate during training?
@KennethEnevoldsen OK, so the thought here was to just show that we do evaluation at testing time. I also thought about covering evaluation during training, but the idea was to keep things simple; if this works, I would open another issue and get it done there.
Hmm, I am not quite sure what you mean, but from my reading, it sounds like you do not recommend MTEB for evaluation (even after training), which I don't think is the intention?
Thanks for the clarification, @KennethEnevoldsen !
You're absolutely right — my intention was not to discourage the use of MTEB for evaluation after training. I’ve now updated the wording to clarify that MTEB is recommended for post-training evaluation, but not ideal during training loops due to the risk of overfitting and aggressive caching.
Let me know if the revised phrasing works better — happy to tweak further! 🙌
**Important**: MTEB is for *post-training* benchmarking only.

* Using it during training risks **overfitting** to public benchmarks.
* It writes to disk and caches aggressively.
Is this not ideal?
Yes, @KennethEnevoldsen, from my understanding and based on the discussion we had at the start, this looks ideal to me.
saved the "results" in a variable Co-authored-by: Kenneth Enevoldsen <[email protected]>
To give more direction while submitting. Co-authored-by: Kenneth Enevoldsen <[email protected]>
@KennethEnevoldsen and @Samoed All review suggestions have been incorporated.
Let me know if anything else is required; happy to revise 🙌
Hello @sahibpreetsingh12, thanks for the PR. I've had a detailed look at it now, and I'm afraid I see a lot of issues/quirks. For example:
Overall, the documentation page reads very "AI generated", which I'd like to avoid, if possible. I've overhauled it now, with working code, etc. The overall message from your version still remains. I hope that's okay. Thank you @Samoed and @KennethEnevoldsen for taking the time to review the PR here thus far. Feel free to let me know what you think of the new version; I think we're pretty close to ready now.
Since these are my early days doing documentation, and open source in general, thank you @tomaarsen, @KennethEnevoldsen, and @Samoed!
And keep "Speeding up Inference" as the last item in Usage
This PR resolves #3332.
Summary
Adds a new documentation page for evaluating SentenceTransformer models using the Massive Text Embedding Benchmark (MTEB), along with relevant task examples and best practices.
Changes
* Added mteb_evaluation.md in docs/sentence_transformer/usage/
* Linked from usage.rst to include in sidebar navigation
Notes
Following the guidance in the discussion, MTEB is documented as a post-training evaluation tool, not integrated as an evaluator to avoid benchmark overfitting.
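For reference, a minimal post-training run along these lines might look like the following sketch (the chosen task and output folder are illustrative assumptions, not prescriptions from the guide):

```python
import mteb
from sentence_transformers import SentenceTransformer

# Evaluate a finished checkpoint, not a model that is still training
model = SentenceTransformer("all-MiniLM-L6-v2")

# Select one or a few tasks; running the entire benchmark is expensive
tasks = mteb.get_tasks(tasks=["Banking77Classification"])
evaluation = mteb.MTEB(tasks=tasks)

# MTEB writes JSON results to disk and caches aggressively,
# so keep it out of the training loop
results = evaluation.run(model, output_folder="results/mteb")
```

Storing the return value in a variable makes the scores available programmatically as well as on disk.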
Let me know if you'd like any section adjusted. Thank you!