BaseMetric
- class mmeval.core.BaseMetric(dataset_meta: Optional[Dict] = None, dist_collect_mode: str = 'unzip', dist_backend: Optional[str] = None, logger: Optional[logging.Logger] = None)[source]
Base class for metrics.
To implement a metric, you should implement a subclass of BaseMetric that overrides the add and compute_metric methods. BaseMetric will automatically handle the distributed synchronization between processes.
During evaluation, each metric updates self._results to store intermediate results after each call of add. When computing the final metric result, self._results will be synchronized between processes.
- Parameters
dataset_meta (dict, optional) – Meta information of the dataset, required by metrics that need dataset information. Defaults to None.
dist_collect_mode (str, optional) – The method of concatenating the collected synchronization results. This depends on how the distributed data is split. Currently only ‘unzip’ and ‘cat’ are supported. For PyTorch’s DistributedSampler, ‘unzip’ should be used (see the sketch after this parameter list). Defaults to ‘unzip’.
dist_backend (str, optional) – The name of the distributed communication backend; all backend names can be listed via mmeval.core.list_all_backends(). If None, the default backend is used. Defaults to None.
logger (Logger, optional) – The logger used to log messages. If None, the default logger of mmeval is used. Defaults to None.
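A minimal sketch of the two dist_collect_mode options, using plain Python lists to stand in for per-rank results (no real distributed run; the sample values are illustrative assumptions):
>>> rank0 = [0, 2, 4]  # rank 0's slice under a round-robin split
>>> rank1 = [1, 3, 5]  # rank 1's slice
>>> rank0 + rank1  # 'cat': plain concatenation of per-rank results
[0, 2, 4, 1, 3, 5]
>>> [x for pair in zip(rank0, rank1) for x in pair]  # 'unzip': interleave
[0, 1, 2, 3, 4, 5]
Interleaving restores the original dataset order for round-robin splits, which is why ‘unzip’ is the right mode for PyTorch’s DistributedSampler.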
An example of implementing an accuracy metric:
>>> import numpy as np
>>> from mmeval.core import BaseMetric
>>>
>>> class Accuracy(BaseMetric):
...     def add(self, predictions, labels):
...         self._results.append((predictions, labels))
...     def compute_metric(self, results):
...         predictions = np.concatenate([res[0] for res in results])
...         labels = np.concatenate([res[1] for res in results])
...         correct = (predictions == labels)
...         accuracy = sum(correct) / len(predictions)
...         return {'accuracy': accuracy}
Stateless call of the metric:
>>> accuracy = Accuracy()
>>> accuracy(predictions=[1, 2, 3, 4], labels=[1, 2, 3, 1])
{'accuracy': 0.75}
Accumulating batches:
>>> for i in range(10):
...     predicts = np.random.randint(0, 4, size=(10,))
...     labels = np.random.randint(0, 4, size=(10,))
...     accuracy.add(predicts, labels)
>>> accuracy.compute()
- abstract add(*args, **kwargs)[source]
Override this method to add the intermediate results to self._results.
Note
For performance reasons, what you add to self._results should be as simple as possible. But be aware that the intermediate results stored in self._results should correspond one-to-one with the samples, since the padded samples need to be removed to get the most accurate result.
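As a hedged illustration of this note, the hypothetical LightweightAccuracy below stores one small boolean per sample rather than raw prediction arrays, keeping self._results cheap to synchronize while preserving the one-to-one sample correspondence (it reuses the BaseMetric import from the example above):
>>> class LightweightAccuracy(BaseMetric):
...     def add(self, predictions, labels):
...         # one boolean per sample: cheap to collect, and padded
...         # samples can still be dropped by index
...         for pred, label in zip(predictions, labels):
...             self._results.append(bool(pred == label))
...     def compute_metric(self, results):
...         return {'accuracy': sum(results) / len(results)}
>>> LightweightAccuracy()(predictions=[1, 2, 3, 4], labels=[1, 2, 3, 1])
{'accuracy': 0.75}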
- compute(size: Optional[int] = None) → Dict[source]
Synchronize intermediate results and then call self.compute_metric.
- Parameters
size (int, optional) – The length of the entire dataset; it is only used during distributed evaluation. When batch size > 1, the dataloader may pad some data samples to make sure all ranks have the same length of dataset slice. compute will drop the padded data based on this size (a sketch follows below). If None, no padded data is dropped. Defaults to None.
- Returns
The computed metric results.
- Return type
dict
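A minimal sketch of how size drops padded samples after an ‘unzip’ collection (plain Python, no real distributed run; the index lists are illustrative assumptions: the true dataset has 5 samples, padded to 6 so each of two ranks gets 3):
>>> rank0 = [0, 2, 4]  # rank 0's sample indices
>>> rank1 = [1, 3, 3]  # rank 1's slice; the last entry is a padded duplicate
>>> collected = [x for pair in zip(rank0, rank1) for x in pair]
>>> collected
[0, 1, 2, 3, 4, 3]
>>> collected[:5]  # size=5 truncates the padding
[0, 1, 2, 3, 4]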
- abstract compute_metric(results: List[Any]) → Dict[source]
Override this method to compute the metric result from the collected intermediate results.
The returned result of the metric computation should be a dictionary.
- property dataset_meta: Optional[Dict]
Meta information of the dataset.
- property name: str
The metric name, defaults to the name of the class.
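A small usage sketch of these two properties, reusing the Accuracy class from the example above (the 'classes' key is an illustrative assumption, not a required schema):
>>> metric = Accuracy(dataset_meta={'classes': ['cat', 'dog']})
>>> metric.dataset_meta
{'classes': ['cat', 'dog']}
>>> metric.name
'Accuracy'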