fix: filter predictions with ignore_index in MulticlassClassificationMetrics #997

Open
jonaserb-k wants to merge 1 commit into main from fix/multiclass-metrics-ignore-index-predictions
@jonaserb-k
Collaborator

Summary

  • torchmetrics' ignore_index only filters targets, not predictions. When a model produces unmappable answers (e.g. "n/a"), the answer extraction maps them to -1, which causes torch.bincount to crash on negative values.
  • Overrides update() in MulticlassClassificationMetrics to filter out predictions matching ignore_index before they reach the underlying torchmetrics computation.
  • Adds two tests: mixed valid/ignored predictions, and all-ignored batch followed by valid batch.
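
The filtering step described above can be sketched roughly as follows. This is not the code from this PR: `drop_ignored_predictions` is a hypothetical helper, and the sketch assumes `preds` and `target` are 1-D integer tensors of class indices with the same shape and that `ignore_index` is -1.

```python
import torch


def drop_ignored_predictions(preds, target, ignore_index=-1):
    """Drop (prediction, target) pairs whose prediction equals ignore_index.

    Hypothetical helper for illustration; assumes 1-D integer class-index
    tensors of equal shape.
    """
    keep = preds != ignore_index  # boolean mask of usable predictions
    return preds[keep], target[keep]


# Mixed batch: the second prediction is the unmappable one (-1)
preds = torch.tensor([0, -1, 2, 1])
target = torch.tensor([0, 1, 2, 2])
kept_preds, kept_target = drop_ignored_predictions(preds, target)

# Counting class frequencies works now that no negative values remain
counts = torch.bincount(kept_preds, minlength=3)

# All-ignored batch: nothing survives the mask
empty_preds, _ = drop_ignored_predictions(
    torch.tensor([-1, -1]), torch.tensor([0, 1])
)
skip_batch = empty_preds.numel() == 0
```

An `update()` override could apply a mask like this and return early when `skip_batch` is true, so that an all-ignored batch never reaches the underlying metric computation.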

Context

Discovered during the CFMPB breast tubule formation benchmark evaluation: alibaba/qwen2-5-vl-7b-instruct-vllm returned "n/a" for some samples, which ExtractDiscreteAnswer mapped to -1, crashing MulticlassAccuracy/MulticlassF1Score.
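
A minimal way to see the underlying failure in isolation, assuming only that `torch` is installed: `torch.bincount` rejects negative inputs, so a single -1 among the predictions is enough to raise. The tensor values here are made up for illustration.

```python
import torch

# One unmappable prediction encoded as -1 among otherwise valid class indices
preds = torch.tensor([0, 2, -1])

try:
    torch.bincount(preds, minlength=3)
    error = None
except RuntimeError as exc:
    # bincount only accepts 1-D non-negative integer inputs
    error = exc
```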

Test plan

  • Existing test still passes (test_multiclass_classification_metrics)
  • New test: predictions with ignore_index are filtered correctly
  • New test: all-ignored batch is skipped without crashing
  • Verified fix works on actual CFMPB benchmark run

…Metrics

torchmetrics' ignore_index only filters targets, not predictions.
When a model produces unmappable answers (e.g. "n/a"), the extraction
maps them to -1, which causes torch.bincount to crash on negative values.

This overrides update() to filter out predictions matching ignore_index
before they reach the underlying torchmetrics computation.
@jonaserb-k jonaserb-k requested a review from nkaenzig March 16, 2026 08:26