Perplexity¶
- class mmeval.metrics.Perplexity(ignore_labels: Optional[Union[int, List[int]]] = None, **kwargs)[source]¶
Perplexity measures how well a language model predicts a text sample.
It is commonly used as a metric for evaluating the quality of a language model. It is defined as the exponential of the model's cross-entropy loss (the average negative log-likelihood per token); equivalently, 2 to the power of the cross-entropy when the loss is measured in bits.
- Parameters
ignore_labels (int or list[int], optional) – Integer or list of integers specifying target classes to ignore. If given, tokens with these class indices do not contribute to the returned score. Defaults to None.
**kwargs – Keyword arguments passed to BaseMetric.
Examples
>>> from mmeval import Perplexity
>>> import numpy as np
>>>
>>> preds = np.random.rand(2, 4, 2)
>>> targets = np.random.randint(low=0, high=2, size=(2, 4))
>>> metric = Perplexity()
>>> metric(preds, targets)
{'perplexity': ...}
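The computation behind the example can be sketched in plain NumPy. This is an illustrative approximation, not mmeval's actual implementation: it assumes the metric normalizes the last axis to probabilities, gathers the probability assigned to each target token, masks out ignored labels, and exponentiates the mean negative log-likelihood.

```python
import numpy as np

def perplexity(preds, targets, ignore_labels=None):
    """Hedged sketch of a perplexity computation.

    preds: (batch_size, seq_len, vocab_size) unnormalized scores.
    targets: (batch_size, seq_len) integer class indices.
    """
    # Assumption: normalize scores to probabilities along the vocab axis.
    preds = preds / preds.sum(axis=-1, keepdims=True)
    # Probability assigned to each ground-truth token.
    probs = np.take_along_axis(preds, targets[..., None], axis=-1).squeeze(-1)
    # Mask out tokens whose target label should be ignored.
    mask = np.ones_like(targets, dtype=bool)
    if ignore_labels is not None:
        for label in np.atleast_1d(ignore_labels):
            mask &= targets != label
    nll = -np.log(probs[mask])  # per-token negative log-likelihood
    return float(np.exp(nll.mean()))

preds = np.random.rand(2, 4, 2)
targets = np.random.randint(low=0, high=2, size=(2, 4))
print(perplexity(preds, targets))
```

A useful sanity check: a model that assigns uniform probability 1/V to every token in a vocabulary of size V has perplexity exactly V.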
- add(predictions: Sequence, targets: Sequence) → None[source]¶
Add the intermediate results to self._results.
- Parameters
predictions (Sequence) – Probabilities assigned to each token in a sequence with shape [batch_size, seq_len, vocab_size].
targets (Sequence) – Ground truth values with shape [batch_size, seq_len].
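Given the `compute_metric` signature below (`List[Tuple[float, int]]`), each call to `add` plausibly appends one `(total, count)` pair per batch: the summed negative log-likelihood and the number of scored tokens. The helper name and stored format here are assumptions for illustration; mmeval's internal representation may differ.

```python
import numpy as np

def batch_result(preds, targets):
    """Hypothetical per-batch intermediate result, as add() might store it.

    preds: (batch_size, seq_len, vocab_size) probabilities.
    targets: (batch_size, seq_len) integer class indices.
    Returns (summed negative log-likelihood, token count).
    """
    # Probability the model assigned to each ground-truth token.
    probs = np.take_along_axis(preds, targets[..., None], axis=-1).squeeze(-1)
    nll = -np.log(probs)
    return float(nll.sum()), int(nll.size)
```

Storing only these two scalars per batch keeps the state small and cheap to synchronize across ranks, while still allowing an exact global mean in `compute_metric`.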
- compute_metric(results: List[Tuple[float, int]]) → Dict[str, float][source]¶
Compute the perplexity metric.
This method is invoked in BaseMetric.compute after distributed synchronization.
- Parameters
results (list) – A list of (total, count) tuples holding the accumulated cross-entropy total and token count from each batch. This list has already been synced across all ranks.
- Returns
The computed perplexity metric.
- Return type
Dict[str, float]
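The aggregation step can be sketched as follows, assuming each tuple is a `(total, count)` pair of summed negative log-likelihood and token count: the global perplexity is the exponential of the pooled mean. This is a sketch of the aggregation logic, not mmeval's verbatim code.

```python
import math

def compute_metric(results):
    """Pool per-batch (total, count) pairs into one perplexity value."""
    total = sum(t for t, _ in results)   # summed NLL across all batches/ranks
    count = sum(c for _, c in results)   # total number of scored tokens
    return {'perplexity': math.exp(total / count)}

# Two batches with mean NLL of 1.0 overall: perplexity = e ** 1.0
print(compute_metric([(2.0, 2), (4.0, 4)]))
```

Summing totals and counts separately before dividing gives the exact token-weighted mean, which is why the intermediate results are stored this way rather than as per-batch perplexities (which could not be averaged correctly).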