Perplexity
- class mmeval.metrics.Perplexity(ignore_labels: Optional[Union[int, List[int]]] = None, **kwargs)[source]
Perplexity measures how well a language model predicts a text sample.
It is commonly used to evaluate the quality of a language model. It is defined as 2 to the power of the model's cross-entropy on the sample (equivalently, the exponential of the average negative log-likelihood per token when the natural logarithm is used).
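In symbols, for a sample X of N tokens with model probabilities p(x_i | x_{<i}), the two formulations agree:

$$\operatorname{PPL}(X) = \exp\Bigl(-\frac{1}{N}\sum_{i=1}^{N}\ln p(x_i \mid x_{<i})\Bigr) = 2^{-\frac{1}{N}\sum_{i=1}^{N}\log_2 p(x_i \mid x_{<i})}$$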
- Parameters
ignore_labels (int or list[int], optional) – Integer or list of integers specifying target class indices to ignore. If given, tokens whose target is one of these indices do not contribute to the returned score. Defaults to None.
**kwargs – Keyword parameters passed to BaseMetric.
Examples
>>> from mmeval import Perplexity
>>> import numpy as np
>>>
>>> preds = np.random.rand(2, 4, 2)
>>> targets = np.random.randint(low=0, high=2, size=(2, 4))
>>> metric = Perplexity()
>>> metric(preds, targets)
{'perplexity': ...}
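Padded positions can be excluded from the score through ignore_labels. A minimal sketch, reusing the arrays above and treating label 0 as a stand-in padding index for illustration:

>>> metric = Perplexity(ignore_labels=0)  # tokens whose target is 0 are skipped
>>> metric(preds, targets)
{'perplexity': ...}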
- add(predictions: Sequence, targets: Sequence) → None[source]
Add the intermediate results to self._results.
- Parameters
predictions (Sequence) – Probabilities assigned to each token in a sequence with shape [batch_size, seq_len, vocab_size].
targets (Sequence) – Ground truth values with shape [batch_size, seq_len].
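In rough outline, each add() step reduces a batch to a pair of sufficient statistics: the summed negative log-likelihood of the target tokens and the number of tokens scored. A minimal NumPy sketch of that bookkeeping, assuming predictions already hold normalized probabilities (the helper name perplexity_update is hypothetical, not the library's internal API):

import numpy as np

def perplexity_update(preds, targets, ignore_labels=None):
    # preds: [batch_size, seq_len, vocab_size] probabilities;
    # targets: [batch_size, seq_len] token indices.
    preds = preds.reshape(-1, preds.shape[-1])
    targets = targets.reshape(-1)

    # Mask out positions whose target is an ignored label.
    mask = np.ones_like(targets, dtype=bool)
    if ignore_labels is not None:
        for label in np.atleast_1d(ignore_labels):
            mask &= targets != label

    # Probability the model assigned to each ground-truth token.
    valid_preds = preds[mask]
    valid_targets = targets[mask]
    token_probs = valid_preds[np.arange(valid_targets.shape[0]), valid_targets]

    total = -np.log(token_probs).sum()  # summed negative log-likelihood
    count = int(mask.sum())             # number of scored tokens
    return total, count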
- compute_metric(results: List[Tuple[float, int]]) → Dict[str, float][source]
Compute the perplexity metric.
This method would be invoked in BaseMetric.compute after distributed synchronization.
- Parameters
results (list) – A list of per-batch (total, count) pairs, where total is the summed negative log-likelihood and count is the number of scored tokens. This list has already been synced across all ranks.
- Returns
The computed perplexity metric.
- Return type
Dict[str, float]
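The aggregation itself reduces to exponentiating the mean per-token negative log-likelihood. A minimal sketch, assuming the (total, count) pairs accumulated by add() use the natural logarithm (compute_perplexity is a hypothetical name for illustration):

import numpy as np

def compute_perplexity(results):
    # results: list of per-batch (total_nll, token_count) pairs.
    totals = np.array([total for total, _ in results])
    counts = np.array([count for _, count in results])
    # Mean negative log-likelihood per token across all batches, exponentiated.
    return {'perplexity': float(np.exp(totals.sum() / counts.sum()))}

As a sanity check, totals of 10·ln 2 and 20·ln 2 over 10 and 20 tokens average to ln 2 per token, giving a perplexity of 2.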