
Commit 64e6ece

Update loss overview docs tables for distillation
1 parent af58c7f commit 64e6ece

4 files changed (+36, −28 lines)

docs/cross_encoder/loss_overview.md

Lines changed: 7 additions & 4 deletions
@@ -31,10 +31,13 @@ Loss functions play a critical role in the performance of your fine-tuned Cross
 These loss functions are specifically designed to be used when distilling the knowledge from one model into another.
 For example, when finetuning a small model to behave more like a larger & stronger one, or when finetuning a model to become multi-lingual.
 
-| Texts | Labels | Appropriate Loss Functions |
-|----------------------------------------------|---------------------------------------------------------------|--------------------------------------------------------------------------------------------|
-| `(sentence_A, sentence_B) pairs` | `similarity score` | <a href="../package_reference/cross_encoder/losses.html#mseloss">`MSELoss`</a> |
-| `(query, passage_one, passage_two) triplets` | `gold_sim(query, passage_one) - gold_sim(query, passage_two)` | <a href="../package_reference/cross_encoder/losses.html#marginmseloss">`MarginMSELoss`</a> |
+| Texts | Labels | Appropriate Loss Functions |
+|---------------------------------------------------|---------------------------------------------------------------------------|--------------------------------------------------------------------------------------------|
+| `(sentence_A, sentence_B) pairs` | `similarity score` | <a href="../package_reference/cross_encoder/losses.html#mseloss">`MSELoss`</a> |
+| `(query, passage_one, passage_two) triplets` | `gold_sim(query, passage_one) - gold_sim(query, passage_two)` | <a href="../package_reference/cross_encoder/losses.html#marginmseloss">`MarginMSELoss`</a> |
+| `(query, positive, negative_1, ..., negative_n)` | `[gold_sim(query, positive) - gold_sim(query, negative_i) for i in 1..n]` | <a href="../package_reference/cross_encoder/losses.html#marginmseloss">`MarginMSELoss`</a> |
+| `(query, positive, negative)` | `[gold_sim(query, positive), gold_sim(query, negative)]` | <a href="../package_reference/cross_encoder/losses.html#marginmseloss">`MarginMSELoss`</a> |
+| `(query, positive, negative_1, ..., negative_n) ` | `[gold_sim(query, positive), gold_sim(query, negative_i)...] ` | <a href="../package_reference/cross_encoder/losses.html#marginmseloss">`MarginMSELoss`</a> |
 
 ## Commonly used Loss Functions
 In practice, not all loss functions get used equally often. The most common scenarios are:
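
For reference, the `(query, passage_one, passage_two)` + margin setup from the updated table, as a minimal runnable sketch. This assumes the `CrossEncoderTrainer` API (sentence-transformers v4+); the base model, toy texts, and teacher scores are illustrative, not part of this commit.

```python
from datasets import Dataset
from sentence_transformers.cross_encoder import CrossEncoder, CrossEncoderTrainer
from sentence_transformers.cross_encoder.losses import MarginMSELoss

# Toy example: the label is the teacher margin,
# gold_sim(query, passage_one) - gold_sim(query, passage_two).
train_dataset = Dataset.from_dict({
    "query": ["how do I bake bread?"],
    "passage_one": ["Knead the dough, let it rise, then bake at 220C."],
    "passage_two": ["Paris is the capital of France."],
    "label": [7.3],  # hypothetical teacher score difference
})

student = CrossEncoder("microsoft/MiniLM-L12-H384-uncased", num_labels=1)
loss = MarginMSELoss(student)

trainer = CrossEncoderTrainer(model=student, train_dataset=train_dataset, loss=loss)
trainer.train()
```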

docs/sentence_transformer/loss_overview.md

Lines changed: 8 additions & 8 deletions
@@ -37,14 +37,14 @@ For example, models trained with <a href="../package_reference/sentence_transfor
 These loss functions are specifically designed to be used when distilling the knowledge from one model into another.
 For example, when finetuning a small model to behave more like a larger & stronger one, or when finetuning a model to become multi-lingual.
 
-| Texts | Labels | Appropriate Loss Functions |
-|----------------------------------------------|---------------------------------------------------------------|---------------------------------------------------------------------------------------------------|
-| `sentence` | `model sentence embeddings` | <a href="../package_reference/sentence_transformer/losses.html#mseloss">`MSELoss`</a> |
-| `sentence_1, sentence_2, ..., sentence_N` | `model sentence embeddings` | <a href="../package_reference/sentence_transformer/losses.html#mseloss">`MSELoss`</a> |
-| `(query, passage_one, passage_two) triplets` | `gold_sim(query, passage_one) - gold_sim(query, passage_two)` | <a href="../package_reference/sentence_transformer/losses.html#marginmseloss">`MarginMSELoss`</a> |
-| `(query, positive, negative_1, ..., negative_n)` | `[gold_sim(query, positive) - gold_sim(query, negative_i) for i in 1..n]` | <a href="../package_reference/sentence_transformer/losses.html#marginmseloss">`MarginMSELoss`</a> |
-| `(query, positive, negative)` | `[gold_sim(query, positive), gold_sim(query, negative)]` | <a href="../package_reference/sentence_transformer/losses.html#distilkldivloss">`DistillKLDivLoss`</a> |
-| `(query, positive, negative_1, ..., negative_n) ` | `[gold_sim(query, positive), gold_sim(query, negative_i)...] ` | <a href="../package_reference/sentence_transformer/losses.html#distilkldivloss">`DistillKLDivLoss`</a> |
+| Texts | Labels | Appropriate Loss Functions |
+|---------------------------------------------------|---------------------------------------------------------------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
+| `sentence` | `model sentence embeddings` | <a href="../package_reference/sentence_transformer/losses.html#mseloss">`MSELoss`</a> |
+| `(sentence_1, sentence_2, ..., sentence_N)` | `model sentence embeddings` | <a href="../package_reference/sentence_transformer/losses.html#mseloss">`MSELoss`</a> |
+| `(query, passage_one, passage_two)` | `gold_sim(query, passage_one) - gold_sim(query, passage_two)` | <a href="../package_reference/sentence_transformer/losses.html#marginmseloss">`MarginMSELoss`</a> |
+| `(query, positive, negative_1, ..., negative_n)` | `[gold_sim(query, positive) - gold_sim(query, negative_i) for i in 1..n]` | <a href="../package_reference/sentence_transformer/losses.html#marginmseloss">`MarginMSELoss`</a> |
+| `(query, positive, negative)` | `[gold_sim(query, positive), gold_sim(query, negative)]` | <a href="../package_reference/sentence_transformer/losses.html#distillkldivloss">`DistillKLDivLoss`</a><br><a href="../package_reference/sentence_transformer/losses.html#marginmseloss">`MarginMSELoss`</a> |
+| `(query, positive, negative_1, ..., negative_n) ` | `[gold_sim(query, positive), gold_sim(query, negative_i)...] ` | <a href="../package_reference/sentence_transformer/losses.html#distillkldivloss">`DistillKLDivLoss`</a><br><a href="../package_reference/sentence_transformer/losses.html#marginmseloss">`MarginMSELoss`</a> |
 
 ## Commonly used Loss Functions
 In practice, not all loss functions get used equally often. The most common scenarios are:
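
The difference between the two rows that now list both losses: `DistillKLDivLoss` consumes the raw teacher scores `[gold_sim(query, positive), gold_sim(query, negative)]`, while `MarginMSELoss` would take their difference. A minimal sketch of the KL-divergence variant, assuming the `SentenceTransformerTrainer` API (v3+); the model name, texts, and scores are illustrative.

```python
from datasets import Dataset
from sentence_transformers import SentenceTransformer, SentenceTransformerTrainer, losses

# Toy example: the label holds the teacher's scores for the
# positive and the negative, in that order.
train_dataset = Dataset.from_dict({
    "query": ["how do I bake bread?"],
    "positive": ["Knead the dough, let it rise, then bake at 220C."],
    "negative": ["Paris is the capital of France."],
    "label": [[8.7, 1.2]],  # hypothetical [gold_sim(q, pos), gold_sim(q, neg)]
})

student = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")
loss = losses.DistillKLDivLoss(student)

trainer = SentenceTransformerTrainer(model=student, train_dataset=train_dataset, loss=loss)
trainer.train()
```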

docs/sparse_encoder/loss_overview.md

Lines changed: 10 additions & 11 deletions
@@ -12,7 +12,7 @@
 
 The <a href="../package_reference/sparse_encoder/losses.html#spladeloss"><code>SpladeLoss</code></a> implements a specialized loss function for SPLADE (Sparse Lexical and Expansion) models. It combines a main loss function with regularization terms to control efficiency:
 
-- Supports all the losses mention below as main loss but three principal loss types: <a href="../package_reference/sparse_encoder/losses.html#sparsemultiplenegativesrankingloss"><code>SparseMultipleNegativesRankingLoss</code></a>, <a href="../package_reference/sparse_encoder/losses.html#sparsemarginmseloss"><code>SparseMarginMSELoss</code></a> and <a href="../package_reference/sparse_encoder/losses.html#sparsedistilkldivloss"><code>SparseDistillKLDivLoss</code></a>.
+- Supports all the losses mention below as main loss but three principal loss types: <a href="../package_reference/sparse_encoder/losses.html#sparsemultiplenegativesrankingloss"><code>SparseMultipleNegativesRankingLoss</code></a>, <a href="../package_reference/sparse_encoder/losses.html#sparsemarginmseloss"><code>SparseMarginMSELoss</code></a> and <a href="../package_reference/sparse_encoder/losses.html#sparsedistillkldivloss"><code>SparseDistillKLDivLoss</code></a>.
 - Uses <a href="../package_reference/sparse_encoder/losses.html#flopsloss"><code>FlopsLoss</code></a> for regularization to control sparsity by default, but supports custom regularizers.
 - Balances effectiveness (via the main loss) with efficiency by regularizing both query and document representations.
 - Allows using different regularizers for queries and documents via the `query_regularizer` and `document_regularizer` parameters, enabling fine-grained control over sparsity patterns for different types of inputs.
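
A minimal sketch of the wrapping this hunk describes, assuming the v5 `SparseEncoderTrainer` API; the checkpoint and the regularizer-weight parameter names (`query_regularizer_weight`, `document_regularizer_weight`) are assumptions for illustration, not values from this commit.

```python
from datasets import Dataset
from sentence_transformers import SparseEncoder, SparseEncoderTrainer
from sentence_transformers.sparse_encoder.losses import (
    SpladeLoss,
    SparseMultipleNegativesRankingLoss,
)

# Unlabeled (anchor, positive) pairs; in-batch negatives supply the contrast.
train_dataset = Dataset.from_dict({
    "query": ["how do I bake bread?", "what is the capital of France?"],
    "document": ["Knead the dough, let it rise, then bake at 220C.",
                 "Paris is the capital of France."],
})

model = SparseEncoder("naver/splade-cocondenser-ensembledistil")
loss = SpladeLoss(
    model=model,
    loss=SparseMultipleNegativesRankingLoss(model),  # main effectiveness loss
    query_regularizer_weight=5e-5,     # FlopsLoss strength on queries (assumed name)
    document_regularizer_weight=3e-5,  # FlopsLoss strength on documents (assumed name)
)

trainer = SparseEncoderTrainer(model=model, train_dataset=train_dataset, loss=loss)
trainer.train()
```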
@@ -51,23 +51,22 @@ Loss functions play a critical role in the performance of your fine-tuned model.
 ## Distillation
 These loss functions are specifically designed to be used when distilling the knowledge from one model into another. This is rather commonly used when training Sparse embedding models.
 
-| Texts | Labels | Appropriate Loss Functions |
-|---------------------------------------------------|---------------------------------------------------------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
-| `sentence` | `model sentence embeddings` | <a href="../package_reference/sparse_encoder/losses.html#sparsemseloss">`SparseMSELoss`</a> |
-| `sentence_1, sentence_2, ..., sentence_N` | `model sentence embeddings` | <a href="../package_reference/sparse_encoder/losses.html#sparsemseloss">`SparseMSELoss`</a> |
-| `(query, passage_one, passage_two) triplets` | `gold_sim(query, passage_one) - gold_sim(query, passage_two)` | <a href="../package_reference/sparse_encoder/losses.html#sparsemarginmseloss">`SparseMarginMSELoss`</a> |
-| `(query, positive, negative) triplets` | `[gold_sim(query, positive), gold_sim(query, negative)]` | <a href="../package_reference/sparse_encoder/losses.html#sparsedistilkldivloss">`SparseDistillKLDivLoss`</a><br><a href="../package_reference/sparse_encoder/losses.html#sparsemarginmseloss">`SparseMarginMSELoss`</a> |
-| `(query, positive, negative_1, ..., negative_n)` | `[gold_sim(query, positive) - gold_sim(query, negative_i) for i in 1..n]` | <a href="../package_reference/sparse_encoder/losses.html#sparsemarginmseloss">`SparseMarginMSELoss`</a> |
-| `(query, positive, negative_1, ..., negative_n) ` | `[gold_sim(query, positive), gold_sim(query, negative_i)...] ` | <a href="../package_reference/sparse_encoder/losses.html#sparsedistilkldivloss">`SparseDistillKLDivLoss`</a><br><a href="../package_reference/sparse_encoder/losses.html#sparsemarginmseloss">`SparseMarginMSELoss`</a> |
-
+| Texts | Labels | Appropriate Loss Functions |
+|---------------------------------------------------|---------------------------------------------------------------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
+| `sentence` | `model sentence embeddings` | <a href="../package_reference/sparse_encoder/losses.html#sparsemseloss">`SparseMSELoss`</a> |
+| `(sentence_1, sentence_2, ..., sentence_N)` | `model sentence embeddings` | <a href="../package_reference/sparse_encoder/losses.html#sparsemseloss">`SparseMSELoss`</a> |
+| `(query, passage_one, passage_two)` | `gold_sim(query, passage_one) - gold_sim(query, passage_two)` | <a href="../package_reference/sparse_encoder/losses.html#sparsemarginmseloss">`SparseMarginMSELoss`</a> |
+| `(query, positive, negative_1, ..., negative_n)` | `[gold_sim(query, positive) - gold_sim(query, negative_i) for i in 1..n]` | <a href="../package_reference/sparse_encoder/losses.html#sparsemarginmseloss">`SparseMarginMSELoss`</a> |
+| `(query, positive, negative)` | `[gold_sim(query, positive), gold_sim(query, negative)]` | <a href="../package_reference/sparse_encoder/losses.html#sparsedistillkldivloss">`SparseDistillKLDivLoss`</a><br><a href="../package_reference/sparse_encoder/losses.html#sparsemarginmseloss">`SparseMarginMSELoss`</a> |
+| `(query, positive, negative_1, ..., negative_n) ` | `[gold_sim(query, positive), gold_sim(query, negative_i)...] ` | <a href="../package_reference/sparse_encoder/losses.html#sparsedistillkldivloss">`SparseDistillKLDivLoss`</a><br><a href="../package_reference/sparse_encoder/losses.html#sparsemarginmseloss">`SparseMarginMSELoss`</a> |
 
 ## Commonly used Loss Functions
 
 In practice, not all loss functions get used equally often. The most common scenarios are:
 
 * `(anchor, positive) pairs` without any labels: <a href="../package_reference/sparse_encoder/losses.html#sparsemultiplenegativesrankingloss"><code>SparseMultipleNegativesRankingLoss</code></a> (a.k.a. InfoNCE or in-batch negatives loss) is commonly used to train the top performing embedding models. This data is often relatively cheap to obtain, and the models are generally very performant. Here for our sparse retrieval tasks, this format works well with <a href="../package_reference/sparse_encoder/losses.html#spladeloss"><code>SpladeLoss</code></a> or <a href="../package_reference/sparse_encoder/losses.html#csrloss"><code>CSRLoss</code></a>, both typically using InfoNCE as their underlying loss function.
 
-* `(query, positive, negative_1, ..., negative_n)` format: This structure with multiple negatives is particularly effective with <a href="../package_reference/sparse_encoder/losses.html#spladeloss"><code>SpladeLoss</code></a> configured with <a href="../package_reference/sparse_encoder/losses.html#sparsemarginmseloss"><code>SparseMarginMSELoss</code></a>, especially in knowledge distillation scenarios where a teacher model provides similarity scores. The strongest models are trained with distillation losses like <a href="../package_reference/sparse_encoder/losses.html#sparsedistilkldivloss"><code>SparseDistillKLDivLoss</code></a> or <a href="../package_reference/sparse_encoder/losses.html#sparsemarginmseloss"><code>SparseMarginMSELoss</code></a>.
+* `(query, positive, negative_1, ..., negative_n)` format: This structure with multiple negatives is particularly effective with <a href="../package_reference/sparse_encoder/losses.html#spladeloss"><code>SpladeLoss</code></a> configured with <a href="../package_reference/sparse_encoder/losses.html#sparsemarginmseloss"><code>SparseMarginMSELoss</code></a>, especially in knowledge distillation scenarios where a teacher model provides similarity scores. The strongest models are trained with distillation losses like <a href="../package_reference/sparse_encoder/losses.html#sparsedistillkldivloss"><code>SparseDistillKLDivLoss</code></a> or <a href="../package_reference/sparse_encoder/losses.html#sparsemarginmseloss"><code>SparseMarginMSELoss</code></a>.
 
 ## Custom Loss Functions
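
Tying the two sparse-encoder hunks together: a distillation sketch pairing `SpladeLoss` with `SparseMarginMSELoss` on the `(query, positive, negative_1, ..., negative_n)` format, as the last bullet recommends. Same caveats as above: v5 API assumed, checkpoint, data, and weights hypothetical.

```python
from datasets import Dataset
from sentence_transformers import SparseEncoder, SparseEncoderTrainer
from sentence_transformers.sparse_encoder.losses import SpladeLoss, SparseMarginMSELoss

# Labels are teacher margins:
# [gold_sim(query, positive) - gold_sim(query, negative_i) for i in 1..n]
train_dataset = Dataset.from_dict({
    "query": ["how do I bake bread?"],
    "positive": ["Knead the dough, let it rise, then bake at 220C."],
    "negative_1": ["Paris is the capital of France."],
    "negative_2": ["Stock markets fell sharply on Monday."],
    "label": [[7.2, 8.1]],  # hypothetical teacher margins
})

model = SparseEncoder("naver/splade-cocondenser-ensembledistil")
loss = SpladeLoss(
    model=model,
    loss=SparseMarginMSELoss(model),   # distillation main loss
    query_regularizer_weight=5e-5,     # assumed parameter names, as above
    document_regularizer_weight=3e-5,
)

trainer = SparseEncoderTrainer(model=model, train_dataset=train_dataset, loss=loss)
trainer.train()
```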
