ROUGE
- class mmeval.metrics.ROUGE(rouge_keys: Union[List, Tuple, int, str] = (1, 2, 'L'), use_stemmer: bool = False, normalizer: Optional[Callable] = None, tokenizer: Optional[Union[Callable, str]] = None, accumulate: str = 'best', lowercase: bool = True, **kwargs: Any)
Calculate the Rouge score, which is used for automatic summarization.
This metric, proposed in ROUGE: A Package for Automatic Evaluation of Summaries, is a common evaluation indicator in fields such as machine translation, automatic summarization, and question and answer generation.
- Parameters
rouge_keys (List or Tuple or int or str) – A list of rouge types to calculate. The allowed keys are 'L' and the integers 1 through 9. Defaults to (1, 2, 'L').
use_stemmer (bool) – Whether to use the Porter stemmer to strip word suffixes and improve matching. Defaults to False.
normalizer (Callable, optional) – A user's own normalizer function. If None, any non-alphanumeric characters are replaced with spaces by default. Defaults to None.
tokenizer (Callable or str, optional) – A user's own tokenizer function. Defaults to None.
accumulate (str) – Useful in the case of multi-reference rouge scores. 'avg' takes the average over all references with respect to each prediction, while 'best' takes the best fmeasure score obtained between a prediction and its multiple corresponding references; see the sketch after this parameter list. Defaults to 'best'.
lowercase (bool) – If True, all characters are lowercased. Defaults to True.
**kwargs – Keyword parameters passed to BaseMetric.
Examples
>>> from mmeval import ROUGE
>>> predictions = ['the cat is on the mat']
>>> references = [['a cat is on the mat']]
>>> metric = ROUGE(rouge_keys='L')
>>> metric.add(predictions, references)
>>> metric.compute()
{'rougeL_fmeasure': 0.8333333, 'rougeL_precision': 0.8333333, 'rougeL_recall': 0.8333333}
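When several rouge_keys are requested, the returned dictionary contains one fmeasure/precision/recall triple per key. A brief sketch, assuming the keys follow the rougeL_* naming pattern shown above:

>>> metric = ROUGE(rouge_keys=(1, 2, 'L'))
>>> metric.add(predictions, references)
>>> results = metric.compute()
>>> # expected keys, by the naming pattern above: rouge1_fmeasure,
>>> # rouge1_precision, rouge1_recall, rouge2_fmeasure, ..., rougeL_recall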
- add(predictions: Sequence[str], references: Sequence[Sequence[str]]) → None
Add the intermediate results to self._results.
- Parameters
predictions (Sequence[str]) – An iterable of predicted sentences.
references (Sequence[Sequence[str]]) – An iterable of referenced sentences. Each predicted sentence may correspond to multiple referenced sentences.
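add can be called once per batch; each call appends intermediate results to self._results, and the scores are aggregated when BaseMetric.compute runs. A minimal sketch, assuming the usual BaseMetric.reset() is available to clear stored results between evaluation runs:

>>> from mmeval import ROUGE
>>> metric = ROUGE(rouge_keys='L')
>>> metric.add(['the cat is on the mat'], [['a cat is on the mat']])  # batch 1
>>> metric.add(['a dog sat on the log'], [['the dog sat on a log']])  # batch 2
>>> results = metric.compute()  # aggregates over both batches
>>> metric.reset()  # clear self._results before the next evaluation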
- compute_metric(results: List[Any]) → dict
Compute the rouge metric.
This method would be invoked in BaseMetric.compute after distributed synchronization.
- Parameters
results (List) – A list of the processed intermediate results. This list has already been synced across all ranks.
- Returns
The computed rouge score.
- Return type
Dict[str, float]
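Users normally call BaseMetric.compute, which synchronizes self._results across ranks and then delegates to this method. In a single-process setting, the same computation can be sketched by passing the accumulated results directly (note that _results is an internal attribute):

>>> from mmeval import ROUGE
>>> metric = ROUGE(rouge_keys='L')
>>> metric.add(['the cat is on the mat'], [['a cat is on the mat']])
>>> scores = metric.compute_metric(metric._results)  # skips the distributed sync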