10 changes: 2 additions & 8 deletions docs/concepts/metrics/available_metrics/traditional.md
@@ -29,7 +29,7 @@ scorer = NonLLMStringSimilarity(distance_measure=DistanceMeasure.HAMMING)

## BLEU Score

-The `BleuScore` score is a metric used to evaluate the quality of `response` by comparing it with `reference`. It measures the similarity between the response and the reference based on n-gram precision and brevity penalty. BLEU score was originally designed to evaluate machine translation systems, but it is also used in other natural language processing tasks. Since it was designed to evaluate machine translation systems, it expects the response and reference to contain same number of sentences. The comparison is done at sentence level. BLEU score ranges from 0 to 1, where 1 indicates a perfect match between the response and the reference. This is a non LLM based metric.
+The `BleuScore` metric evaluates the quality of a `response` by comparing it with a `reference`. It measures the similarity between the two based on n-gram precision and a brevity penalty. BLEU was originally designed to evaluate machine translation systems, but it is also used in other natural language processing tasks. The score ranges from 0 to 1, where 1 indicates a perfect match between the response and the reference. This is a non-LLM-based metric.

### Example
```python
@@ -44,12 +44,6 @@ sample = SingleTurnSample(
scorer = BleuScore()
await scorer.single_turn_ascore(sample)
```
-Custom weights may be supplied to fine-tune the BLEU score further. A tuple of float weights for unigrams, bigrams, trigrams and so on can be given by
-
-```python
-scorer = BleuScore(weights=(0.25, 0.25, 0.25, 0.25))
-```
-


## ROUGE Score
@@ -110,4 +104,4 @@ sample = SingleTurnSample(
)
scorer = StringPresence()
await scorer.single_turn_ascore(sample)
-```
+```
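The docs paragraph above refers to n-gram precision and the brevity penalty without spelling them out. The sketch below shows the textbook formula for sentence-level BLEU with a single reference; it is an illustration only, not the sacrebleu implementation this PR switches to (which adds its own tokenization and smoothing):

```python
import math
from collections import Counter

def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def bleu(response, reference, max_n=4):
    """Textbook BLEU: geometric mean of modified n-gram precisions
    times a brevity penalty. No smoothing, single reference."""
    hyp, ref = response.split(), reference.split()
    precisions = []
    for n in range(1, max_n + 1):
        hyp_counts = Counter(ngrams(hyp, n))
        ref_counts = Counter(ngrams(ref, n))
        # clip each n-gram count by its count in the reference
        overlap = sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
        total = max(sum(hyp_counts.values()), 1)
        precisions.append(overlap / total)
    if min(precisions) == 0:
        return 0.0
    geo_mean = math.exp(sum(math.log(p) for p in precisions) / max_n)
    # brevity penalty: punish hypotheses shorter than the reference
    brevity = 1.0 if len(hyp) >= len(ref) else math.exp(1 - len(ref) / len(hyp))
    return brevity * geo_mean

# identical strings score 1.0
print(bleu("the cat sat on the mat", "the cat sat on the mat"))
```

With no smoothing, any response shorter than four tokens has no 4-grams and scores 0, which is one reason production implementations such as sacrebleu offer smoothing.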
3 changes: 2 additions & 1 deletion requirements/dev.txt
@@ -11,7 +11,8 @@ transformers
fastembed
graphene
rouge_score
+sacrebleu
nltk
rapidfuzz
pandas
-datacompy
+datacompy
13 changes: 5 additions & 8 deletions src/ragas/metrics/_bleu_score.py
@@ -15,21 +15,18 @@ class BleuScore(SingleTurnMetric):
    _required_columns: t.Dict[MetricType, t.Set[str]] = field(
        default_factory=lambda: {MetricType.SINGLE_TURN: {"reference", "response"}}
    )
-    weights: t.Tuple[float, ...] = (0.25, 0.25, 0.25, 0.25)
    sentence_segmenter: t.Optional[HasSegmentMethod] = None
    language: str = "english"

    def __post_init__(self):
        try:
-            from nltk.tokenize import word_tokenize
-            from nltk.translate.bleu_score import corpus_bleu
+            from sacrebleu import corpus_bleu
        except ImportError:
            raise ImportError(
-                "nltk is required for bleu score. Please install it using `pip install nltk`"
+                "sacrebleu is required for bleu score. Please install it using `pip install sacrebleu`"
            )
        if not self.sentence_segmenter:
            self.sentence_segmenter = get_segmenter(language=self.language, clean=False)
-        self.word_tokenizer = word_tokenize
        self.corpus_bleu = corpus_bleu

    def init(self, run_config: RunConfig):
@@ -46,10 +43,10 @@ async def _single_turn_ascore(
        response_sentences = self.sentence_segmenter.segment(sample.response)

        reference = [
-            [self.word_tokenizer(reference)] for reference in reference_sentences
+            [reference] for reference in reference_sentences
        ]
-        response = [self.word_tokenizer(response) for response in response_sentences]
-        score = self.corpus_bleu(reference, response, weights=self.weights)
+        response = response_sentences
+        score = self.corpus_bleu(response, reference).score / 100
        assert isinstance(score, float), "Expecting a float"
        return score
