ROUGE
- class mmeval.metrics.ROUGE(rouge_keys: Union[List, Tuple, int, str] = (1, 2, 'L'), use_stemmer: bool = False, normalizer: Optional[Callable] = None, tokenizer: Optional[Union[Callable, str]] = None, accumulate: str = 'best', lowercase: bool = True, **kwargs: Any)
Calculate the Rouge score, which is used for automatic summarization.
This metric, proposed in ROUGE: A Package for Automatic Evaluation of Summaries, is a common evaluation indicator in fields such as machine translation, automatic summarization, and question and answer generation.
- Parameters
rouge_keys (List or Tuple or int or str) – A list of rouge types to calculate. The allowed keys are 'L' and the integers 1 through 9. Defaults to (1, 2, 'L').
use_stemmer (bool) – Whether to use the Porter stemmer to strip word suffixes and improve matching. Defaults to False.
normalizer (Callable, optional) – A user's own normalizer function. If None, any non-alphanumeric characters are replaced with spaces by default. Defaults to None.
tokenizer (Callable or str, optional) – A user's own tokenizer function. Defaults to None.
accumulate (str) – Useful in the case of multi-reference rouge scores. 'avg' takes the average over all references with respect to each prediction, while 'best' takes the best fmeasure score obtained between a prediction and its multiple corresponding references; see the sketch after this parameter list. Defaults to 'best'.
lowercase (bool) – If True, all characters are lowercased. Defaults to True.
**kwargs – Keyword parameters passed to BaseMetric.
Examples
>>> from mmeval import ROUGE
>>> predictions = ['the cat is on the mat']
>>> references = [['a cat is on the mat']]
>>> metric = ROUGE(rouge_keys='L')
>>> metric.add(predictions, references)
>>> metric.compute()
{'rougeL_fmeasure': 0.8333333, 'rougeL_precision': 0.8333333, 'rougeL_recall': 0.8333333}
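When several rouge_keys are requested, the returned dictionary contains one fmeasure/precision/recall triple per key. A brief sketch, assuming the keys follow the rougeL_* naming pattern shown above:

>>> metric = ROUGE(rouge_keys=(1, 2, 'L'))
>>> metric.add(predictions, references)
>>> results = metric.compute()
>>> # expected keys, by the naming pattern above: rouge1_fmeasure,
>>> # rouge1_precision, rouge1_recall, rouge2_fmeasure, ..., rougeL_recall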
- add(predictions: Sequence[str], references: Sequence[Sequence[str]]) → None
Add the intermediate results to self._results.
- Parameters
predictions (Sequence[str]) – An iterable of predicted sentences.
references (Sequence[Sequence[str]]) – An iterable of referenced sentences. Each predicted sentence may correspond to multiple referenced sentences.
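add can be called once per batch; each call appends intermediate results to self._results, and the scores are aggregated when BaseMetric.compute runs. A minimal sketch, assuming the usual BaseMetric.reset() is available to clear stored results between evaluation runs:

>>> from mmeval import ROUGE
>>> metric = ROUGE(rouge_keys='L')
>>> metric.add(['the cat is on the mat'], [['a cat is on the mat']])  # batch 1
>>> metric.add(['a dog sat on the log'], [['the dog sat on a log']])  # batch 2
>>> results = metric.compute()  # aggregates over both batches
>>> metric.reset()  # clear self._results before the next evaluation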
- compute_metric(results: List[Any]) → dict
Compute the rouge metric.
This method would be invoked in BaseMetric.compute after distributed synchronization.
- Parameters
results (List) – A list of the processed intermediate results. This list has already been synced across all ranks.
- Returns
The computed rouge score.
- Return type
Dict[str, float]
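Users normally call BaseMetric.compute, which synchronizes self._results across ranks and then delegates to this method. In a single-process setting, the same computation can be sketched by passing the accumulated results directly (note that _results is an internal attribute):

>>> from mmeval import ROUGE
>>> metric = ROUGE(rouge_keys='L')
>>> metric.add(['the cat is on the mat'], [['a cat is on the mat']])
>>> scores = metric.compute_metric(metric._results)  # skips the distributed sync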