Perplexity
- class mmeval.metrics.Perplexity(ignore_labels: Optional[Union[int, List[int]]] = None, **kwargs)[source]
Perplexity measures how well a language model predicts a text sample.
It is commonly used to evaluate the quality of a language model. It is defined as 2 to the power of the model's cross-entropy on the sample (equivalently, the exponential of the average negative log-likelihood per token when the natural logarithm is used).
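In symbols, for a sample X of N tokens with model probabilities p(x_i | x_{<i}), the two formulations agree:

$$\operatorname{PPL}(X) = \exp\Bigl(-\frac{1}{N}\sum_{i=1}^{N}\ln p(x_i \mid x_{<i})\Bigr) = 2^{-\frac{1}{N}\sum_{i=1}^{N}\log_2 p(x_i \mid x_{<i})}$$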
- Parameters
ignore_labels (int or list[int], optional) – Integer or list of integers specifying target class indices to ignore. If given, tokens whose target is one of these indices do not contribute to the returned score. Defaults to None.
**kwargs – Keyword parameters passed to BaseMetric.
Examples
>>> from mmeval import Perplexity
>>> import numpy as np
>>>
>>> preds = np.random.rand(2, 4, 2)
>>> targets = np.random.randint(low=0, high=2, size=(2, 4))
>>> metric = Perplexity()
>>> metric(preds, targets)
{'perplexity': ...}
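Padded positions can be excluded from the score through ignore_labels. A minimal sketch, reusing the arrays above and treating label 0 as a stand-in padding index for illustration:

>>> metric = Perplexity(ignore_labels=0)  # tokens whose target is 0 are skipped
>>> metric(preds, targets)
{'perplexity': ...}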
- add(predictions: Sequence, targets: Sequence) → None[source]
Add the intermediate results to self._results.
- Parameters
predictions (Sequence) – Probabilities assigned to each token in a sequence with shape [batch_size, seq_len, vocab_size].
targets (Sequence) – Ground truth values with shape [batch_size, seq_len].
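In rough outline, each add() step reduces a batch to a pair of sufficient statistics: the summed negative log-likelihood of the target tokens and the number of tokens scored. A minimal NumPy sketch of that bookkeeping, assuming predictions already hold normalized probabilities (the helper name perplexity_update is hypothetical, not the library's internal API):

import numpy as np

def perplexity_update(preds, targets, ignore_labels=None):
    # preds: [batch_size, seq_len, vocab_size] probabilities;
    # targets: [batch_size, seq_len] token indices.
    preds = preds.reshape(-1, preds.shape[-1])
    targets = targets.reshape(-1)

    # Mask out positions whose target is an ignored label.
    mask = np.ones_like(targets, dtype=bool)
    if ignore_labels is not None:
        for label in np.atleast_1d(ignore_labels):
            mask &= targets != label

    # Probability the model assigned to each ground-truth token.
    valid_preds = preds[mask]
    valid_targets = targets[mask]
    token_probs = valid_preds[np.arange(valid_targets.shape[0]), valid_targets]

    total = -np.log(token_probs).sum()  # summed negative log-likelihood
    count = int(mask.sum())             # number of scored tokens
    return total, count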
- compute_metric(results: List[Tuple[float, int]]) → Dict[str, float][source]
Compute the perplexity metric.
This method would be invoked in BaseMetric.compute after distributed synchronization.
- Parameters
results (list) – A list of per-batch (total, count) pairs, where total is the summed negative log-likelihood and count is the number of scored tokens. This list has already been synced across all ranks.
- Returns
The computed perplexity metric.
- Return type
Dict[str, float]
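The aggregation itself reduces to exponentiating the mean per-token negative log-likelihood. A minimal sketch, assuming the (total, count) pairs accumulated by add() use the natural logarithm (compute_perplexity is a hypothetical name for illustration):

import numpy as np

def compute_perplexity(results):
    # results: list of per-batch (total_nll, token_count) pairs.
    totals = np.array([total for total, _ in results])
    counts = np.array([count for _, count in results])
    # Mean negative log-likelihood per token across all batches, exponentiated.
    return {'perplexity': float(np.exp(totals.sum() / counts.sum()))}

As a sanity check, totals of 10·ln 2 and 20·ln 2 over 10 and 20 tokens average to ln 2 per token, giving a perplexity of 2.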