perf: math rubric skip overlong answers by mikasenghaas · Pull Request #1046 · PrimeIntellect-ai/verifiers

mikasenghaas · 2026-03-20T15:39:08Z

Description

Two fixes to improve the performance of the math rubric at high concurrency:

Default to extract_boxed_answer in strict mode to avoid symbolic parsing of the full model response (and to avoid the false positives described in #1028, "fix: extract_boxed_answer returns full text when no \boxed{} found")
Explicitly skip verification of answers with >1000 chars

Type of Change

Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Breaking change (fix or feature that would cause existing functionality to not work as expected)
Documentation update
Test improvement

Testing

All existing tests pass when running uv run pytest locally.
New tests have been added to cover the changes

Checklist

My code follows the style guidelines of this project as outlined in AGENTS.md
I have performed a self-review of my own code
I have commented my code, particularly in hard-to-understand areas
I have made corresponding changes to the documentation
My changes generate no new warnings
Any dependent changes have been merged and published

Additional Notes

Note

Medium Risk
Moderate risk because it changes MathRubric scoring behavior (unboxed answers now score 0 and long parsed responses are skipped), which can affect training/eval metrics; changes are localized and guarded by configurable limits.

Overview
MathRubric now enforces boxed-format answers by default and avoids expensive verification on huge outputs. It switches the default parser extractor to extract_boxed_answer(strict=True), so responses without a \boxed{} final answer no longer get passed through to symbolic parsing.

Adds a configurable max_verify_chars (default 50_000) and skips math_verify when the parsed response exceeds this limit (with a warning), improving throughput under high concurrency.

Updates extract_boxed_answer API/docs to support strict mode, and adjusts tests to require boxed completions and to cover the new length limit behavior.

Written by Cursor Bugbot for commit 8d5dbbe. This will update automatically on new commits.
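The length guard described in the overview can be sketched roughly as follows. This is a minimal illustration, not the code merged in this PR: `score_parsed_answer` and the injected `verify` callable are hypothetical names standing in for the rubric's scoring path and the math_verify call, and returning 0.0 for a skipped answer is an assumption.

```python
import logging
from typing import Callable

logger = logging.getLogger(__name__)


def score_parsed_answer(
    parsed: str,
    expected: str,
    verify: Callable[[str, str], bool],  # hypothetical stand-in for the math_verify call
    max_verify_chars: int = 50_000,  # default stated in the overview
) -> float:
    """Score a parsed answer, skipping verification for overlong inputs."""
    # Strict extraction yields "" when no \boxed{} answer was found.
    if not parsed:
        return 0.0
    # Cheap length check first, so the expensive symbolic verifier is
    # never invoked on pathologically long answers.
    if len(parsed) > max_verify_chars:
        logger.warning(
            "Skipping verification: parsed answer has %d chars (limit %d)",
            len(parsed),
            max_verify_chars,
        )
        return 0.0  # assumption: a skipped answer scores zero
    return 1.0 if verify(parsed, expected) else 0.0
```

Because the guard runs before the verifier, a test that wants to exercise the verifier's timeout on a 100k-char input would need to pass a larger `max_verify_chars`, which matches the test adjustment described in the commits.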

…nses

Add strict_extract_boxed_answer that returns empty string on no \boxed{} match (instead of returning the full text). Add max_verify_chars guard to MathRubric to skip math_verify for responses exceeding 50k chars, preventing thread pool starvation from pathologically long expressions.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

verifiers/rubrics/math_rubric.py

Problem: When a completion contains no \boxed{} tag, extract_boxed_answer returns the entire input text. This is passed to math_verify, which matches any number in the text, allowing a model to get correct-answer credit by mentioning the answer anywhere without using \boxed{}. During RL training, this means a model can skip the \boxed{} format entirely and still score 1.0 by embedding the correct number in its reasoning text. The strategy scoreboard from rewardprobe shows the impact: "correct_lazy" (just outputting the answer) scores 1.0, while "perfect" (full reasoning + boxed answer) scores only 0.67.

Fix: Add a `strict` parameter to extract_boxed_answer (default: False). When strict=True, returns "" on no match instead of the full text. MathRubric now uses strict=True via functools.partial.

This is backwards compatible:
- extract_boxed_answer(text) still returns text (default strict=False)
- Only MathRubric's parser uses strict=True
- Other callers (rlm_env.py, etc.) are unaffected
- Tests updated to use \boxed{} format in completions

Found using rewardprobe (https://github.com/chopratejas/rewardprobe).
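The behaviour described in this commit message can be illustrated with a small self-contained sketch. The function below is an illustrative stand-in, not the implementation in verifiers/utils/data_utils.py: taking the last \boxed{} occurrence and matching braces by depth are assumptions, and `strict_extract` is a hypothetical name for the functools.partial binding.

```python
from functools import partial


def extract_boxed_answer(text: str, strict: bool = False) -> str:
    # Illustrative stand-in only. Returns the contents of the last
    # \boxed{...} in `text` (assumption: last occurrence wins).
    marker = "\\boxed{"
    start = text.rfind(marker)
    if start == -1:
        # No boxed answer: strict mode returns "", legacy mode returns
        # the full text (the source of the false positives).
        return "" if strict else text
    begin = start + len(marker)
    depth = 1
    for i in range(begin, len(text)):
        if text[i] == "{":
            depth += 1
        elif text[i] == "}":
            depth -= 1
            if depth == 0:
                return text[begin:i]
    # Unbalanced braces are treated like a missing box (assumption).
    return "" if strict else text


# Bind strict=True the way the commit message describes for MathRubric.
strict_extract = partial(extract_boxed_answer, strict=True)

print(repr(extract_boxed_answer("The answer is 42")))  # full text is passed through
print(repr(strict_extract("The answer is 42")))        # empty string
print(repr(strict_extract(r"so \boxed{\frac{1}{2}}")))  # boxed contents only
```

With the strict binding, an unboxed completion yields an empty string, so nothing reaches the verifier and a stray mention of the correct number in the reasoning text can no longer earn credit.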

Wrap the timeout test completion in \boxed{} so the strict parser can extract it, and raise max_verify_chars to allow the 100k-char string through the length check to actually exercise the timeout logic.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

strict_extract_boxed_answer is now redundant since extract_boxed_answer supports strict=True directly via the cherry-picked fix from chopratejas.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

cursor

Cursor Bugbot has reviewed your changes and found 2 potential issues.


tests/test_math_rubric.py

verifiers/utils/data_utils.py

The invalid-answer tests were still passing raw completions without \boxed{}, so the strict parser returned "" before math_verify ran. Wrapping in \boxed{} ensures the tests exercise actual math verification.

Also update docs/reference.md with the new extract_boxed_answer(strict) signature.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

mikasenghaas changed the title ~~math rubric strict extract boxed answer~~ perf: math rubric skip overlong answers Mar 20, 2026

mikasenghaas force-pushed the fix/math-rubric-strict-extract branch from 7f37a0f to 9aeb520 Compare March 20, 2026 15:44

mikasenghaas requested a review from willccbb March 20, 2026 15:46

mikasenghaas marked this pull request as ready for review March 20, 2026 15:46

cursor bot reviewed Mar 20, 2026

verifiers/rubrics/math_rubric.py (resolved)

verifiers/rubrics/math_rubric.py (resolved)

chopratejas and others added 2 commits March 20, 2026 16:32

mikasenghaas force-pushed the fix/math-rubric-strict-extract branch from 67eb88b to 823541a Compare March 20, 2026 16:33

refactor: remove strict_extract_boxed_answer

520df0f

Now redundant since extract_boxed_answer supports strict=True directly via the cherry-picked fix from chopratejas. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

cursor bot reviewed Mar 20, 2026

tests/test_math_rubric.py (resolved)

verifiers/utils/data_utils.py (resolved)

willccbb merged commit fb64a9a into main Mar 20, 2026
6 checks passed

willccbb merged 5 commits into main from fix/math-rubric-strict-extract


3 participants
