BaseMetric Design
During evaluation, the dataset is usually split across multiple GPUs, and each GPU runs inference on its own subset in data parallel to speed up evaluation.
In most cases, however, the metric for the whole dataset cannot be obtained by simply reducing (for example, averaging) the metric results computed on each subset.
Therefore, the usual practice is for each process to save its inference results, or the intermediate results of the metric calculation, then perform an all-gather operation across all processes, and finally compute the metrics on the entire evaluation dataset.
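To see why per-process metrics usually cannot simply be averaged, consider accuracy on two unevenly sized subsets. The pure-Python sketch below uses plain lists in place of real GPU processes; the data and the helper name accuracy are hypothetical. It compares averaging the per-process accuracies with gathering all per-sample results first and computing the metric once.

```python
from typing import List, Tuple

# Each inner list holds the (prediction, label) pairs seen by one process.
# Plain lists stand in for real GPU processes (hypothetical data).
shards: List[List[Tuple[int, int]]] = [
    [(1, 1), (0, 0), (1, 1), (1, 0)],  # process 0: 3 of 4 correct
    [(2, 2)],                          # process 1: 1 of 1 correct
]


def accuracy(pairs: List[Tuple[int, int]]) -> float:
    """Fraction of pairs whose prediction equals the label."""
    return sum(pred == label for pred, label in pairs) / len(pairs)


# Naive reduction: compute the metric per process, then average the values.
naive = sum(accuracy(s) for s in shards) / len(shards)

# Usual practice: gather every process's per-sample results, then compute once.
gathered = [pair for shard in shards for pair in shard]
correct = accuracy(gathered)

print(f"mean of per-process accuracies: {naive:.3f}")  # 0.875
print(f"accuracy on gathered results:   {correct:.3f}")  # 0.800
```

The averaged value (0.875) differs from the dataset-level accuracy (0.8) because the per-process values ignore how many samples each process handled.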
In MMEval, these operations are handled by BaseMetric, whose interface design is shown below:
The add and compute_metric methods are the interfaces that users need to implement. For more details, please refer to Custom Evaluation Metrics.
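As a rough illustration of this division of labour, the sketch below mirrors the shape described here: users implement add and compute_metric, while the base class owns the _results list and the compute entry point. It is a hypothetical single-process simplification (the class name MiniBaseMetric is invented, and no cross-process synchronization happens), not MMEval's actual source; consult the BaseMetric API reference for the real signatures.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, List, Optional


class MiniBaseMetric(ABC):
    """Hypothetical single-process stand-in for the BaseMetric interface."""

    def __init__(self) -> None:
        # Per-sample inference results or intermediate metric results.
        self._results: List[Any] = []

    def reset(self) -> None:
        """Clear stored results before a new evaluation run."""
        self._results.clear()

    @abstractmethod
    def add(self, *args: Any, **kwargs: Any) -> None:
        """User-implemented: append results for a batch to self._results."""

    @abstractmethod
    def compute_metric(self, results: List[Any]) -> Dict[str, float]:
        """User-implemented: turn the collected results into metrics."""

    def compute(self, size: Optional[int] = None) -> Dict[str, float]:
        # A real distributed run would all-gather _results here; this sketch
        # has only one process, so it uses the local list directly.
        results = self._results if size is None else self._results[:size]
        return self.compute_metric(results)


class Accuracy(MiniBaseMetric):
    """Example user metric: one (prediction, label) entry per sample."""

    def add(self, predictions: List[int], labels: List[int]) -> None:
        self._results.extend(zip(predictions, labels))

    def compute_metric(self, results: List[Any]) -> Dict[str, float]:
        correct = sum(pred == label for pred, label in results)
        return {"accuracy": correct / len(results)}


metric = Accuracy()
metric.add([1, 0, 1], [1, 0, 0])
metric.add([2], [2])
print(metric.compute())  # {'accuracy': 0.75}
```

Keeping synchronization inside compute means a user metric only has to describe what to store per sample and how to turn the stored list into numbers.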
As the [BaseMetric](mmeval.core.BaseMetric) interface shows, the main function of BaseMetric is to provide distributed evaluation. The basic process is as follows:
1. The user calls the add method to save the inference results, or the intermediate results of the metric calculation, in the BaseMetric._results list.
2. The user calls the compute method; BaseMetric synchronizes the data in the _results list across processes and then calls the user-defined compute_metric method to calculate the metrics.
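The two steps above can be simulated in a single Python process. In this sketch each hypothetical rank keeps its own local results list (filled by add), and a plain list copy stands in for the cross-process all-gather; a real run would use a collective such as torch.distributed.all_gather_object instead. All function names here are illustrative only.

```python
from typing import Dict, List, Tuple

Pair = Tuple[int, int]  # (prediction, label) for one sample


def add(local_results: List[Pair], predictions: List[int], labels: List[int]) -> None:
    """Step 1: each process stores one entry per sample it evaluated."""
    local_results.extend(zip(predictions, labels))


def all_gather(per_rank: List[List[Pair]]) -> List[List[Pair]]:
    """Stand-in for an all-gather: returns every rank's list, as each rank would receive."""
    return [list(results) for results in per_rank]


def compute_metric(results: List[Pair]) -> Dict[str, float]:
    correct = sum(pred == label for pred, label in results)
    return {"accuracy": correct / len(results)}


# Two simulated processes, each evaluating its own part of the dataset.
rank_results: List[List[Pair]] = [[], []]
add(rank_results[0], predictions=[1, 0], labels=[1, 1])
add(rank_results[1], predictions=[3, 2, 2], labels=[3, 2, 0])

# Step 2: synchronize the results, flatten them, and compute the metric once.
gathered = all_gather(rank_results)
flat = [pair for results in gathered for pair in results]
print(compute_metric(flat))  # {'accuracy': 0.6}
```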
BaseMetric also accounts for the fact that, in distributed evaluation, some samplers pad the data with repeated samples so that every process receives the same number of samples, for example PyTorch's DistributedSampler. Without correction, these duplicated samples would be counted more than once and distort the computed metrics.
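The padding behaviour can be reproduced without PyTorch. The function below is a simplified re-implementation, based on our reading of the DistributedSampler source, of how indices are assigned when shuffling is off and drop_last is False: the index list is padded by repeating its head until its length divides evenly, and each rank then takes every num_replicas-th index. The name shard_indices is hypothetical.

```python
import math
from typing import List


def shard_indices(dataset_size: int, num_replicas: int, rank: int) -> List[int]:
    """Simplified sketch of DistributedSampler's index split (no shuffle, no drop_last)."""
    num_samples = math.ceil(dataset_size / num_replicas)
    total_size = num_samples * num_replicas
    indices = list(range(dataset_size))
    padding = total_size - dataset_size
    # Repeat indices from the start so the total is divisible by num_replicas.
    indices += (indices * math.ceil(padding / dataset_size))[:padding]
    # Interleaved split: rank r takes indices r, r + num_replicas, ...
    return indices[rank:total_size:num_replicas]


shards = [shard_indices(10, num_replicas=4, rank=r) for r in range(4)]
for rank, shard in enumerate(shards):
    print(rank, shard)
# 0 [0, 4, 8]
# 1 [1, 5, 9]
# 2 [2, 6, 0]
# 3 [3, 7, 1]

seen = [i for shard in shards for i in shard]
print(len(seen), "results for", len(set(seen)), "distinct samples")
# 12 results for 10 distinct samples
```

Samples 0 and 1 are evaluated twice, so a metric computed over all 12 results would weight them double.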
To handle this, BaseMetric.compute accepts a size parameter that gives the actual number of samples in the evaluation dataset. After _results has been synchronized across processes, the padded samples are removed according to dist_collect_mode, so that the metrics are computed correctly.
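A minimal sketch of how size and the collection mode could interact, assuming (from our reading of the BaseMetric API reference) that dist_collect_mode accepts "cat", which concatenates the gathered per-rank lists in rank order, and "unzip", which interleaves them sample by sample. The function collect is hypothetical, and the per-rank results reproduce the interleaved split that DistributedSampler produces for 10 samples on 4 ranks, where samples 0 and 1 are padded duplicates.

```python
from typing import List, Optional


def collect(per_rank: List[List[int]], mode: str, size: Optional[int] = None) -> List[int]:
    """Merge gathered per-rank results, then drop padding by truncating to size."""
    if mode == "cat":
        # All of rank 0's results, then all of rank 1's, and so on.
        merged = [r for results in per_rank for r in results]
    elif mode == "unzip":
        # First result of every rank, then the second of every rank, and so on.
        merged = [r for group in zip(*per_rank) for r in group]
    else:
        raise ValueError(f"unknown mode: {mode}")
    # Keep only the first `size` entries; this removes exactly the padding
    # only if the merge order matches how the data was split.
    return merged if size is None else merged[:size]


# Per-sample results (here just sample indices) from 4 ranks; 0 and 1 are padding.
per_rank = [[0, 4, 8], [1, 5, 9], [2, 6, 0], [3, 7, 1]]
print(collect(per_rank, "unzip", size=10))  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
print(collect(per_rank, "cat", size=10))    # [0, 4, 8, 1, 5, 9, 2, 6, 0, 3]
```

With this interleaved split, "unzip" restores the original sample order, so truncating to size drops exactly the padding, whereas "cat" keeps sample 0 twice and loses sample 7. A concatenating mode would fit a split in which each rank holds a contiguous block of the dataset.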
Note
The intermediate results stored in _results should correspond one-to-one with the samples, because removing the padded samples relies on this correspondence; otherwise the padding cannot be removed and the metric will be inaccurate.
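The one-to-one requirement matters because the padding is removed by position, keeping only the first size entries. The sketch below uses hypothetical data: 10 samples split over 4 ranks in the interleaved DistributedSampler fashion, with samples 0 and 1 padded, and results merged by interleaving the ranks. It compares storing one correctness flag per sample with storing a single (correct, total) count per rank.

```python
from typing import List

per_rank_indices = [[0, 4, 8], [1, 5, 9], [2, 6, 0], [3, 7, 1]]
SIZE = 10  # real dataset size; samples 0 and 1 were padded


def is_correct(sample: int) -> bool:
    # Hypothetical ground truth: every third sample is misclassified.
    return sample % 3 != 0


def interleave(per_rank: List[list]) -> list:
    """Merge per-rank lists sample by sample (first of each rank, then second, ...)."""
    return [r for group in zip(*per_rank) for r in group]


# One entry per sample: truncation to SIZE removes exactly the padding.
per_sample = [[is_correct(i) for i in idx] for idx in per_rank_indices]
flags = interleave(per_sample)[:SIZE]
print(sum(flags) / len(flags))  # 0.6

# One aggregated entry per rank: truncation cannot see individual samples,
# so the padded duplicates stay inside the counts.
per_batch = [[(sum(is_correct(i) for i in idx), len(idx))] for idx in per_rank_indices]
counts = interleave(per_batch)[:SIZE]
print(sum(c for c, _ in counts) / sum(n for _, n in counts))
```

The per-sample version yields the true accuracy of 0.6, while the aggregated version still counts all 12 results and gives 7/12 (about 0.583).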