BLEU¶
- class mmeval.metrics.BLEU(n_gram: int = 4, smooth: bool = False, ngram_weights: Optional[Sequence[float]] = None, tokenizer_fn: Optional[Union[Callable, str]] = None, **kwargs)[source]¶
Bilingual Evaluation Understudy metric.
This metric, proposed in BLEU: a Method for Automatic Evaluation of Machine Translation, is a tool for evaluating the quality of machine translation. The closer a candidate translation is to the human reference translation, the higher the score.
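For reference, the score defined in that paper is the weighted geometric mean of the modified n-gram precisions p_n, scaled by a brevity penalty: BLEU = BP · exp(Σ_{n=1}^{N} w_n · log p_n), where BP equals 1 if the candidate length c exceeds the reference length r and exp(1 − r/c) otherwise; the weights w_n correspond to ngram_weights below.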
- Parameters
n_gram (int) – The maximum n-gram order, i.e. the maximum number of words in a phrase considered when counting matching word fragments. Defaults to 4.
smooth (bool) – Whether to apply smoothing when computing the n-gram precisions. Defaults to False.
ngram_weights (Sequence[float], optional) – Weights used for unigrams, bigrams, etc. to calculate BLEU score. If not provided, uniform weights are used. Defaults to None.
tokenizer_fn (Union[Callable, str, None]) – A user-defined tokenizer function. Defaults to None. New in version 0.3.0.
**kwargs – Keyword parameters passed to BaseMetric.
Examples
>>> from mmeval import BLEU
>>> predictions = ['the cat is on the mat', 'There is a big tree near the park here']
>>> references = [['a cat is on the mat'], ['A big tree is growing near the park here']]
>>> bleu = BLEU()
>>> bleu_results = bleu(predictions, references)
>>> bleu_results
{'bleu': 0.5226045319355426}
>>> # Calculate BLEU with smoothing:
>>> from mmeval import BLEU
>>> predictions = ['the cat is on the mat', 'There is a big tree near the park here']
>>> references = [['a cat is on the mat'], ['A big tree is growing near the park here']]
>>> bleu = BLEU(smooth=True)
>>> bleu_results = bleu(predictions, references)
>>> bleu_results
{'bleu': 0.566315716093867}
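The other constructor arguments compose the same way. The sketch below reuses the predictions and references from the examples above; the n-gram weight values and the whitespace tokenizer are illustrative assumptions, not library defaults, and the resulting score is omitted.
>>> # Calculate BLEU with custom n-gram weights and a user-defined tokenizer:
>>> from mmeval import BLEU
>>> my_tokenizer = lambda sentence: sentence.lower().split()
>>> bleu = BLEU(n_gram=2, ngram_weights=[0.6, 0.4], tokenizer_fn=my_tokenizer)
>>> bleu_results = bleu(predictions, references)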
- add(predictions: Sequence[str], references: Sequence[Sequence[str]]) → None[source]¶
Add the intermediate results to self._results.
- Parameters
predictions (Sequence[str]) – An iterable of predicted sentences.
references (Sequence[Sequence[str]]) – An iterable of reference sentences, one inner sequence of references per prediction.
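Because the intermediate statistics simply accumulate, add can be called batch by batch and the corpus-level score retrieved afterwards with BaseMetric.compute; the following sketch assumes that incremental workflow and should reproduce the single-call result from the examples above.
>>> bleu = BLEU()
>>> bleu.add(['the cat is on the mat'], [['a cat is on the mat']])
>>> bleu.add(['There is a big tree near the park here'], [['A big tree is growing near the park here']])
>>> bleu.compute()
{'bleu': 0.5226045319355426}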
- compute_metric(results: List[Tuple[int, int, numpy.ndarray, numpy.ndarray]]) → dict[source]¶
Compute the BLEU metric.
This method would be invoked in BaseMetric.compute after distributed synchronization.
- Parameters
results (List[Tuple[int, int, np.ndarray, np.ndarray]]) – A list of intermediate-result tuples, each consisting of pred_len, references_len, precision_matches and precision_total. This list has already been synced across all ranks.
- Returns
The computed bleu score.
- Return type
Dict[str, float]
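To make the aggregation concrete, here is a minimal sketch of how such tuples can be reduced to a corpus-level BLEU score; the variable names mirror the tuple fields above, while the aggregation itself (in particular, the absence of smoothing) is an assumption rather than a transcript of compute_metric.
>>> import numpy as np
>>> def corpus_bleu_from_stats(results, weights=(0.25, 0.25, 0.25, 0.25)):
...     # Accumulate corpus-level statistics across all result tuples.
...     pred_len = sum(r[0] for r in results)
...     references_len = sum(r[1] for r in results)
...     matches = sum(r[2] for r in results)         # matched n-grams per order
...     total = sum(r[3] for r in results)           # candidate n-grams per order
...     precisions = matches / np.maximum(total, 1)  # modified n-gram precisions
...     if np.any(precisions == 0):                  # unsmoothed: any zero precision gives 0
...         return 0.0
...     log_p = sum(w * np.log(p) for w, p in zip(weights, precisions))
...     # Brevity penalty for candidates shorter than the references.
...     bp = 1.0 if pred_len > references_len else np.exp(1 - references_len / pred_len)
...     return float(bp * np.exp(log_p))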