
docs: add MTEB evaluation guide and update usage.rst #3477


Merged

Conversation

sahibpreetsingh12
Contributor

This PR resolves #3332.

Summary

Adds a new documentation page for evaluating SentenceTransformer models using the Massive Text Embedding Benchmark (MTEB), along with relevant task examples and best practices.

Changes

  • mteb_evaluation.md in docs/sentence_transformer/usage/:

    • Installation steps
    • Minimal working example (sketched below)
    • Task-type breakdown (STS, Classification, Retrieval, etc.)
    • Notes on output handling
    • Warnings about not using MTEB during training
    • Leaderboard + export instructions
  • Linked from usage.rst to include in sidebar navigation
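
For context, the minimal working example on the page looks roughly like the following sketch. It assumes the current `mteb` API style (`import mteb` with `mteb.get_tasks` and `mteb.MTEB`), and `all-MiniLM-L6-v2` is just an arbitrary small model choice:

```python
import mteb
from sentence_transformers import SentenceTransformer

# Any Sentence Transformers model works; this one is a small example choice
model = SentenceTransformer("all-MiniLM-L6-v2")

# Select one or more MTEB tasks by name
tasks = mteb.get_tasks(tasks=["STS22.v2"])

# Run the benchmark; results are also cached to the output folder on disk
evaluation = mteb.MTEB(tasks=tasks)
results = evaluation.run(model, output_folder="results")
```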

Notes

Following the guidance in the discussion, MTEB is documented as a post-training evaluation tool, not integrated as an evaluator to avoid benchmark overfitting.

Let me know if you'd like any section adjusted. Thank you!

@sahibpreetsingh12
Contributor Author

@tomaarsen Please share your feedback and let me know if there is anything else I should change.

Contributor

@Samoed left a comment


@KennethEnevoldsen can you look too?

@sahibpreetsingh12
Contributor Author

sahibpreetsingh12 commented Jul 31, 2025

@Samoed and @tomaarsen
If anything else is required from my side, please do share.
Since I am new to this, what can I do in the future to make the unit tests run successfully? I just committed the changes from the UI, and if this merges I will pull the changes afterwards.

@sahibpreetsingh12
Contributor Author

@Samoed, what is required for this PR to be merged? I am happy to contribute.


* Using it during training risks **overfitting** to public benchmarks.
* It writes to disk and caches aggressively.
* Official guidance recommends using SentenceTransformer's built-in evaluators like:
Contributor

If you want to evaluate during training?

Contributor Author

@KennethEnevoldsen OK, so the thought here was to just show that we do evaluation at testing time. I did also think about evaluation during training, but the idea was to keep things simple; if this works, I would open another issue and get that done separately.

Contributor

Hmm, I am not quite sure what you mean, but from my reading, it sounds like you do not recommend MTEB for evaluation (even after training), which I don't think is the intention?

Contributor Author

Thanks for the clarification, @KennethEnevoldsen !

You're absolutely right — my intention was not to discourage the use of MTEB for evaluation after training. I’ve now updated the wording to clarify that MTEB is recommended for post-training evaluation, but not ideal during training loops due to the risk of overfitting and aggressive caching.

Let me know if the revised phrasing works better — happy to tweak further! 🙌
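
For the during-training case, the page points readers to Sentence Transformers' built-in evaluators instead. As a rough sketch (the dev data here is hypothetical; `EmbeddingSimilarityEvaluator` is one such built-in evaluator):

```python
from sentence_transformers import SentenceTransformer
from sentence_transformers.evaluation import EmbeddingSimilarityEvaluator

model = SentenceTransformer("all-MiniLM-L6-v2")

# Hypothetical dev data; in practice this comes from your own held-out split
sentences1 = ["A man is eating food.", "A plane is taking off."]
sentences2 = ["A man is eating a piece of bread.", "An airplane is departing."]
gold_scores = [0.8, 0.9]  # gold similarity scores, scaled to [0, 1]

dev_evaluator = EmbeddingSimilarityEvaluator(sentences1, sentences2, gold_scores, name="dev")
print(dev_evaluator(model))  # the evaluator can also be handed to a training loop
```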

**Important**: MTEB is for *post-training* benchmarking only.

* Using it during training risks **overfitting** to public benchmarks.
* It writes to disk and caches aggressively.
Contributor

Is this not ideal?

Contributor Author

Yes, @KennethEnevoldsen, from my understanding and based on the discussion we had at the start, this looks ideal to me.

@sahibpreetsingh12
Contributor Author

@KennethEnevoldsen and @Samoed

All review suggestions have been incorporated:

  • Replaced STSBenchmark with STS22.v2
  • Clarified task examples with disclaimer and link to full list
  • Added filtering by task_type, domain, language
  • Used .to_dataframe() for result printing
  • Explicitly looped over main_score values

Let me know if anything else is required — happy to revise 🙌
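
For reference, the revised snippets follow roughly this shape (a sketch, assuming `mteb.get_tasks` supports the `task_types`/`domains`/`languages` filtering parameters and that `evaluation.run` returns task results whose `scores` mapping carries a `main_score` per split):

```python
import mteb
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")

# Filter the task list instead of hand-picking arbitrary examples
tasks = mteb.get_tasks(task_types=["STS"], languages=["eng"])

evaluation = mteb.MTEB(tasks=tasks)
results = evaluation.run(model, output_folder="results")

# Explicitly loop over the main_score values per task and split
for task_result in results:
    for split, split_scores in task_result.scores.items():
        for entry in split_scores:
            print(task_result.task_name, split, entry["main_score"])
```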

@tomaarsen
Collaborator

tomaarsen commented Aug 4, 2025

Hello @sahibpreetsingh12,

Thanks for the PR. I've had a detailed look at it now, and I'm afraid I see a lot of issues/quirks. For example:

  • Very little of the code runs
  • The code snippets don't match the mteb suggested format of import mteb, mteb.get_tasks, but instead use e.g. from mteb import get_tasks
  • There's text in a code block in the Quick Start
  • The section titles are unusually long compared to related documentation in Sentence Transformers, they also use emojis unlike other documentation
  • There's links to Sentence Transformers documentation, even though this is Sentence Transformers documentation
  • The list of tasks/examples is a bit arbitrary
  • There's a second file under .ipynb_checkpoints that likely should not have been included

Overall, the documentation page reads very "AI generated", which I'd like to avoid, if possible. I've overhauled it now, with working code, etc. The overall message from your version still remains. I hope that's okay.

Thank you @Samoed and @KennethEnevoldsen for taking the time to review the PR here thus far.

Feel free to let me know what you think of the new version, I think we're pretty close to ready now.

  • Tom Aarsen

@sahibpreetsingh12
Contributor Author


These are my early days writing documentation, and doing open source at all. Yes, I took some help in checking how to put documentation together, but that help was limited to setting up a base template. I will take this feedback on board and definitely improve from here; these suggestions help me improve.

Thanks @tomaarsen @KennethEnevoldsen and @Samoed

And keep "Speeding up Inference" as the last item in Usage
@tomaarsen merged commit 6e7d64e into UKPLab:master Aug 6, 2025
7 of 9 checks passed