BaseMetric

class mmeval.core.BaseMetric(dataset_meta: Optional[Dict] = None, dist_collect_mode: str = 'unzip', dist_backend: Optional[str] = None, logger: Optional[logging.Logger] = None)[source]

Base class for metrics.

To implement a metric, subclass BaseMetric and override the add and compute_metric methods. BaseMetric automatically handles the distributed synchronization between processes.

During evaluation, each metric updates self._results to store intermediate results after every call to add. When computing the final metric result, self._results will be synchronized between processes.

Parameters
  • dataset_meta (dict, optional) – Meta information of the dataset, which is required by some metrics. Defaults to None.

  • dist_collect_mode (str, optional) – The method of concatenating the collected synchronization results. This depends on how the distributed data is split. Currently only ‘unzip’ and ‘cat’ are supported. For PyTorch’s DistributedSampler, ‘unzip’ should be used (see the sketch after this parameter list). Defaults to ‘unzip’.

  • dist_backend (str, optional) – The name of the distributed communication backend, you can get all the backend names through mmeval.core.list_all_backends(). If None, use the default backend. Defaults to None.

  • logger (Logger, optional) – The logger used to log messages. If None, use the default logger of mmeval. Defaults to None.
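
The difference between the two collect modes can be illustrated with a plain-Python sketch (this is not mmeval’s actual synchronization code). A sampler such as PyTorch’s DistributedSampler deals samples out to ranks round-robin, so ‘unzip’ interleaves the per-rank results back into the original dataset order, while ‘cat’ simply concatenates them rank by rank:

>>> # Hypothetical per-rank results: with 2 ranks, DistributedSampler gives
>>> # rank 0 the samples at indices 0, 2, 4 and rank 1 those at 1, 3, 5.
>>> rank_results = [[0, 2, 4], [1, 3, 5]]
>>> # 'cat': concatenate the rank slices one after another.
>>> [x for rank in rank_results for x in rank]
[0, 2, 4, 1, 3, 5]
>>> # 'unzip': interleave the rank slices to recover the dataset order.
>>> [x for group in zip(*rank_results) for x in group]
[0, 1, 2, 3, 4, 5]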

Example of implementing an accuracy metric:

>>> import numpy as np
>>> from mmeval.core import BaseMetric
>>>
>>> class Accuracy(BaseMetric):
...     def add(self, predictions, labels):
...         self._results.append((predictions, labels))
...     def compute_metric(self, results):
...         predictions = np.concatenate([res[0] for res in results])
...         labels = np.concatenate([res[1] for res in results])
...         correct = (predictions == labels)
...         accuracy = sum(correct) / len(predictions)
...         return {'accuracy': accuracy}

Stateless call of metric:

>>> accuracy = Accuracy()
>>> accuracy(predictions=[1, 2, 3, 4], labels=[1, 2, 3, 1])
{'accuracy': 0.75}

Accumulating batches:

>>> for i in range(10):
>>>     predicts = np.random.randint(0, 4, size=(10,))
>>>     labels = np.random.randint(0, 4, size=(10,))
>>>     accuracy.add(predicts, labels)
>>> accuracy.compute()  # result varies with the random data
abstract add(*args, **kwargs)[source]

Override this method to add the intermediate results to self._results.

Note

For performance reasons, the objects you add to self._results should be as lightweight as possible. However, the intermediate results stored in self._results should correspond one-to-one with the samples, so that padded samples can be removed to obtain the most accurate result.
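
For instance, an accuracy metric could store a single correctness flag per sample instead of the full prediction and label arrays (a minimal sketch, not part of the mmeval API):

>>> class LightweightAccuracy(BaseMetric):
...     def add(self, predictions, labels):
...         # One entry per sample, so padded samples can be dropped later.
...         self._results.extend(p == l for p, l in zip(predictions, labels))
...     def compute_metric(self, results):
...         return {'accuracy': sum(results) / len(results)}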

compute(size: Optional[int] = None) → Dict[source]

Synchronize intermediate results and then call self.compute_metric.

Parameters

size (int, optional) – The length of the entire dataset; it is only used during distributed evaluation. When batch size > 1, the dataloader may pad some data samples to make sure all ranks have dataset slices of the same length. compute will drop the padded data based on this size. If None, do nothing. Defaults to None.

Returns

The computed metric results.

Return type

dict
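
A hypothetical usage under distributed evaluation, where each rank’s dataloader may pad its slice to a common length (dataset and dataloader are assumed to exist on each rank):

>>> for predicts, labels in dataloader:
>>>     accuracy.add(predicts, labels)
>>> # Drop the samples padded by the sampler using the true dataset length.
>>> accuracy.compute(size=len(dataset))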

abstract compute_metric(results: List[Any]) → Dict[source]

Override this method to compute the metric result from the collected intermediate results.

The returned metric result should be a dictionary.

property dataset_meta: Optional[Dict]

Meta information of the dataset.
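
As an illustration (a hypothetical metric, not part of mmeval), a metric that needs the class names of the dataset can read them from self.dataset_meta:

>>> class PerClassCount(BaseMetric):
...     def add(self, labels):
...         self._results.extend(labels)
...     def compute_metric(self, results):
...         classes = self.dataset_meta['classes']
...         return {name: results.count(i) for i, name in enumerate(classes)}
>>> metric = PerClassCount(dataset_meta={'classes': ['cat', 'dog']})
>>> metric(labels=[0, 1, 1])
{'cat': 1, 'dog': 2}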

property name: str

The metric name, defaults to the name of the class.

reset() → None[source]

Clear the stored results of the metric.
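
For example, when reusing one metric object across epochs, call reset after each compute so results from a previous epoch do not leak into the next (a usage sketch; epoch_loader is assumed to exist):

>>> accuracy = Accuracy()
>>> for epoch in range(2):
>>>     for predicts, labels in epoch_loader:
>>>         accuracy.add(predicts, labels)
>>>     print(accuracy.compute())
>>>     accuracy.reset()  # clear self._results before the next epoch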
