BLEU¶
- class mmeval.metrics.BLEU(n_gram: int = 4, smooth: bool = False, ngram_weights: Optional[Sequence[float]] = None, tokenizer_fn: Optional[Union[Callable, str]] = None, **kwargs)[source]¶
Bilingual Evaluation Understudy metric.
This metric, proposed in BLEU: a Method for Automatic Evaluation of Machine Translation, is a tool for evaluating the quality of machine translation. The closer a candidate translation is to the human reference translation, the higher the score.
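For reference, the score defined in that paper is the weighted geometric mean of the modified n-gram precisions p_n, scaled by a brevity penalty: BLEU = BP · exp(Σ_{n=1}^{N} w_n · log p_n), where BP equals 1 if the candidate length c exceeds the reference length r and exp(1 − r/c) otherwise; the weights w_n correspond to ngram_weights below.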
- Parameters
n_gram (int) – The maximum n-gram order, i.e. the maximum number of words in a phrase considered when counting matching word fragments. Defaults to 4.
smooth (bool) – Whether to apply smoothing when computing the n-gram precisions. Defaults to False.
ngram_weights (Sequence[float], optional) – Weights used for unigrams, bigrams, etc. to calculate BLEU score. If not provided, uniform weights are used. Defaults to None.
tokenizer_fn (Union[Callable, str, None]) – A user-defined tokenizer function. Defaults to None. New in version 0.3.0.
**kwargs – Keyword parameters passed to BaseMetric.
Examples
>>> from mmeval import BLEU
>>> predictions = ['the cat is on the mat', 'There is a big tree near the park here']
>>> references = [['a cat is on the mat'], ['A big tree is growing near the park here']]
>>> bleu = BLEU()
>>> bleu_results = bleu(predictions, references)
>>> bleu_results
{'bleu': 0.5226045319355426}
>>> # Calculate BLEU with smoothing:
>>> from mmeval import BLEU
>>> predictions = ['the cat is on the mat', 'There is a big tree near the park here']
>>> references = [['a cat is on the mat'], ['A big tree is growing near the park here']]
>>> bleu = BLEU(smooth=True)
>>> bleu_results = bleu(predictions, references)
>>> bleu_results
{'bleu': 0.566315716093867}
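The other constructor arguments compose the same way. The sketch below reuses the predictions and references from the examples above; the n-gram weight values and the whitespace tokenizer are illustrative assumptions, not library defaults, and the resulting score is omitted.
>>> # Calculate BLEU with custom n-gram weights and a user-defined tokenizer:
>>> from mmeval import BLEU
>>> my_tokenizer = lambda sentence: sentence.lower().split()
>>> bleu = BLEU(n_gram=2, ngram_weights=[0.6, 0.4], tokenizer_fn=my_tokenizer)
>>> bleu_results = bleu(predictions, references)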
- add(predictions: Sequence[str], references: Sequence[Sequence[str]]) → None[source]¶
Add the intermediate results to self._results.
- Parameters
predictions (Sequence[str]) – An iterable of predicted sentences.
references (Sequence[Sequence[str]]) – An iterable of reference sentences, one inner sequence of references per prediction.
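Because the intermediate statistics simply accumulate, add can be called batch by batch and the corpus-level score retrieved afterwards with BaseMetric.compute; the following sketch assumes that incremental workflow and should reproduce the single-call result from the examples above.
>>> bleu = BLEU()
>>> bleu.add(['the cat is on the mat'], [['a cat is on the mat']])
>>> bleu.add(['There is a big tree near the park here'], [['A big tree is growing near the park here']])
>>> bleu.compute()
{'bleu': 0.5226045319355426}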
- compute_metric(results: List[Tuple[int, int, numpy.ndarray, numpy.ndarray]]) → dict[source]¶
Compute the BLEU metric.
This method would be invoked in BaseMetric.compute after distributed synchronization.
- Parameters
results (List[Tuple[int, int, np.ndarray, np.ndarray]]) – A list of intermediate-result tuples, each consisting of pred_len, references_len, precision_matches and precision_total. This list has already been synced across all ranks.
- Returns
The computed bleu score.
- Return type
Dict[str, float]
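To make the aggregation concrete, here is a minimal sketch of how such tuples can be reduced to a corpus-level BLEU score; the variable names mirror the tuple fields above, while the aggregation itself (in particular, the absence of smoothing) is an assumption rather than a transcript of compute_metric.
>>> import numpy as np
>>> def corpus_bleu_from_stats(results, weights=(0.25, 0.25, 0.25, 0.25)):
...     # Accumulate corpus-level statistics across all result tuples.
...     pred_len = sum(r[0] for r in results)
...     references_len = sum(r[1] for r in results)
...     matches = sum(r[2] for r in results)         # matched n-grams per order
...     total = sum(r[3] for r in results)           # candidate n-grams per order
...     precisions = matches / np.maximum(total, 1)  # modified n-gram precisions
...     if np.any(precisions == 0):                  # unsmoothed: any zero precision gives 0
...         return 0.0
...     log_p = sum(w * np.log(p) for w, p in zip(weights, precisions))
...     # Brevity penalty for candidates shorter than the references.
...     bp = 1.0 if pred_len > references_len else np.exp(1 - references_len / pred_len)
...     return float(bp * np.exp(log_p))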