Search before asking
- I have searched the Supervision issues and found no similar bug report.
Bug
Describe the bug
The current implementation of MeanAverageRecall computes mAR@K by selecting the top-K predictions across all images in the dataset, rather than selecting the top-K predictions per image. According to the COCO evaluation protocol, mAR@K should be calculated by considering the top-K highest-confidence detections for each image.
Average Recall (AR): `AR^(max=K)`, the AR given K detections per image
This issue occurs because, in the concatenation step below, all detection results are merged together without keeping track of which image each detection came from. As a result, the subsequent selection of top-K predictions is performed globally across the entire dataset, rather than per image.
supervision/supervision/metrics/mean_average_recall.py, lines 222 to 225 at deb1c9c:

```python
concatenated_stats = [np.concatenate(items, 0) for items in zip(*stats)]
recall_scores_per_k, recall_per_class, unique_classes = (
    self._compute_average_recall_for_classes(*concatenated_stats)
)
```
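For illustration, here is a minimal, self-contained sketch (not the library's code; the variable names are invented) contrasting the two selection strategies with K = 2. Globally selecting the top-K lets a confident image crowd out every detection from a less confident one:

```python
import numpy as np

# Hypothetical per-image detection confidences; K = 2 for illustration.
confidences_per_image = [
    np.array([0.9, 0.8, 0.7]),  # image 0
    np.array([0.4, 0.3, 0.2]),  # image 1
]
K = 2

# Global selection (current behavior): image 1 loses all of its
# detections, because image 0's scores dominate the dataset-wide top-K.
all_scores = np.concatenate(confidences_per_image)
global_top_k = np.sort(all_scores)[::-1][:K]
print(global_top_k)  # [0.9 0.8] -- both from image 0

# Per-image selection (COCO protocol): each image keeps its own top-K.
per_image_top_k = [np.sort(s)[::-1][:K] for s in confidences_per_image]
print(per_image_top_k)  # [array([0.9, 0.8]), array([0.4, 0.3])]
```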
Proposed Solution
To address this issue, I have modified the `_compute` and `_compute_average_recall_for_classes` functions so that only the top-K detections per image are considered when calculating mAR@K, in accordance with the COCO evaluation protocol.
In both functions, instead of simply concatenating all detections, the statistics are first filtered per image to the top-K highest-confidence detections, and only then concatenated and used to compute the confusion matrix, as sketched below. I will submit a pull request with these changes shortly.
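As a rough sketch of the intended change (the function name, argument layout, and confidence column below are hypothetical and will differ from the actual patch), the top-K cut is applied per image before anything is concatenated:

```python
import numpy as np

def filter_top_k_per_image(stats_per_image, k, confidence_column=1):
    """Keep only the k highest-confidence detections for each image.

    `stats_per_image` is assumed to be a list with one 2-D stats array
    per image, where one column holds the prediction confidence; the
    exact layout inside supervision may differ.
    """
    filtered = []
    for stats in stats_per_image:
        # Sort this image's rows by descending confidence, keep the top k.
        order = np.argsort(-stats[:, confidence_column])
        filtered.append(stats[order[:k]])
    return filtered

# The top-K cut happens per image first, then the results are merged,
# so no image can crowd out another image's detections:
# filtered = filter_top_k_per_image(stats_per_image, k=K)
# concatenated_stats = np.concatenate(filtered, 0)
```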
Environment
- Supervision 0.26.1
- OS: Ubuntu 24.04
- Python: 3.12.3
Minimal Reproducible Example
No response
Additional
No response
Are you willing to submit a PR?
- Yes I'd like to help by submitting a PR!