Search before asking
- I have searched the Supervision issues and found no similar bug report.
Bug
Describe the bug
The current implementation of MeanAverageRecall computes mAR@K by selecting the top-K predictions across all images in the dataset, rather than selecting the top-K predictions per image. According to the COCO evaluation protocol, mAR@K should be calculated by considering the top-K highest-confidence detections for each image.
Average Recall (AR): `AR^(max=K)`, the AR given K detections per image
This issue occurs because, in the concatenation step below, all detection results are merged together without keeping track of which image each detection came from. As a result, the subsequent selection of top-K predictions is performed globally across the entire dataset, rather than per image.
supervision/supervision/metrics/mean_average_recall.py, lines 222 to 225 at deb1c9c:

```python
concatenated_stats = [np.concatenate(items, 0) for items in zip(*stats)]
recall_scores_per_k, recall_per_class, unique_classes = (
    self._compute_average_recall_for_classes(*concatenated_stats)
)
```
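For illustration, here is a minimal, self-contained sketch (not the library's code; the variable names are invented) contrasting the two selection strategies with K = 2. Globally selecting the top-K lets a confident image crowd out every detection from a less confident one:

```python
import numpy as np

# Hypothetical per-image detection confidences; K = 2 for illustration.
confidences_per_image = [
    np.array([0.9, 0.8, 0.7]),  # image 0
    np.array([0.4, 0.3, 0.2]),  # image 1
]
K = 2

# Global selection (current behavior): image 1 loses all of its
# detections, because image 0's scores dominate the dataset-wide top-K.
all_scores = np.concatenate(confidences_per_image)
global_top_k = np.sort(all_scores)[::-1][:K]
print(global_top_k)  # [0.9 0.8] -- both from image 0

# Per-image selection (COCO protocol): each image keeps its own top-K.
per_image_top_k = [np.sort(s)[::-1][:K] for s in confidences_per_image]
print(per_image_top_k)  # [array([0.9, 0.8]), array([0.4, 0.3])]
```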
Proposed Solution
To address this issue, I have modified the `_compute` and `_compute_average_recall_for_classes` functions so that only the top-K detections per image are considered when calculating mAR@K, in accordance with the COCO evaluation protocol.
In both functions, instead of simply concatenating all detections, the statistics are first filtered per image to the top-K highest-confidence detections, and only then concatenated and used to compute the confusion matrix, as sketched below. I will submit a pull request with these changes shortly.
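As a rough sketch of the intended change (the function name, argument layout, and confidence column below are hypothetical and will differ from the actual patch), the top-K cut is applied per image before anything is concatenated:

```python
import numpy as np

def filter_top_k_per_image(stats_per_image, k, confidence_column=1):
    """Keep only the k highest-confidence detections for each image.

    `stats_per_image` is assumed to be a list with one 2-D stats array
    per image, where one column holds the prediction confidence; the
    exact layout inside supervision may differ.
    """
    filtered = []
    for stats in stats_per_image:
        # Sort this image's rows by descending confidence, keep the top k.
        order = np.argsort(-stats[:, confidence_column])
        filtered.append(stats[order[:k]])
    return filtered

# The top-K cut happens per image first, then the results are merged,
# so no image can crowd out another image's detections:
# filtered = filter_top_k_per_image(stats_per_image, k=K)
# concatenated_stats = np.concatenate(filtered, 0)
```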
Environment
- Supervision 0.26.1
- OS: Ubuntu 24.04
- Python: 3.12.3
Minimal Reproducible Example
No response
Additional
No response
Are you willing to submit a PR?
- Yes I'd like to help by submitting a PR!