fix: skip multimodal samples that exceed seq_len instead of truncating #2064
Open
Force-pushed from d214b39 to 9c4ab23
Cursor Bugbot has reviewed your changes and found 1 potential issue.
```python
    (idx, sample)
    for idx, rollout in zip(idxs, rollouts)
    if (sample := prepare_sample(rollout, seq_len)) is not None
]
```
Skipping all samples causes empty batch crash
Medium Severity
When every sample in a batch is multimodal and exceeds seq_len, prepare_batch now filters them all out, producing empty micro-batch lists per worker. The training loop then crashes with an IndexError at micro_batches[0]["input_ids"]. Before this change, truncation always produced at least one micro batch, so this is a new crash path. The SinglePacker path has no upstream length validation, making it reachable in practice.
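The crash path Bugbot describes can be sketched as follows. The function names `prepare_sample` and `prepare_batch` come from the diff, but the bodies here are illustrative stand-ins, and the empty-batch guard at the end is a hypothetical mitigation, not code from the PR:

```python
def prepare_sample(rollout, seq_len):
    """Stand-in: return a trimmed sample, or None for overlong multimodal samples."""
    tokens = rollout["input_ids"]
    if len(tokens) > seq_len:
        if rollout.get("pixel_values") is not None:
            return None  # truncating would desync image_pad tokens from pixel_values
        tokens = tokens[:seq_len]  # text-only samples truncate as before
    return {"input_ids": tokens}

def prepare_batch(rollouts, idxs, seq_len):
    # Same filtering comprehension as the diff: drop samples that came back None.
    batch = [
        (idx, sample)
        for idx, rollout in zip(idxs, rollouts)
        if (sample := prepare_sample(rollout, seq_len)) is not None
    ]
    # Hypothetical guard: fail loudly here instead of crashing later with an
    # IndexError at micro_batches[0]["input_ids"] when everything was skipped.
    if not batch:
        raise ValueError("all samples exceeded seq_len; raise seq_len or filter data upstream")
    return batch
```

An explicit error at batch-preparation time makes the misconfiguration (`seq_len` too small for the dataset) visible immediately, rather than surfacing as an unrelated-looking indexing failure deeper in the training loop.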
Force-pushed from 9c4ab23 to 8ab17e4
Truncating a multimodal sample breaks the alignment between image_pad tokens in input_ids and the pixel_values tensor. Instead, skip such samples with a warning. Text-only samples continue to truncate normally.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Force-pushed from 8ab17e4 to b314a39


Summary
`prepare_sample()` now returns `None` for multimodal samples that exceed `seq_len`; `prepare_batch()` skips `None` samples.

Problem: Truncating a multimodal sample drops image_pad tokens from `input_ids` while `pixel_values` and `image_grid_thw` are passed through unchanged. This causes `ValueError: Image features and image tokens do not match` or silent training on corrupt data.

Fix: Skip the sample entirely with a warning log. The right long-term fix is to ensure `seq_len` covers your longest VLM samples.

Related to #2013, which takes a different approach (trimming `pixel_values` to match the surviving tokens). Our approach is simpler and model-agnostic: no hardcoded token IDs or merge sizes.
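The mismatch the Summary describes can be demonstrated with a toy consistency check. The `IMAGE_PAD_ID` value and the `check_alignment` helper below are made up for illustration; real VLM processors use their model's own placeholder token id and perform this validation internally:

```python
IMAGE_PAD_ID = 999  # placeholder id standing in for a model's <|image_pad|> token

def check_alignment(input_ids, num_image_features):
    """Toy version of the check that fails after naive truncation: every image
    feature produced from pixel_values must have a matching pad token."""
    n_pad = sum(tok == IMAGE_PAD_ID for tok in input_ids)
    if n_pad != num_image_features:
        raise ValueError("Image features and image tokens do not match")

input_ids = [1, 2, IMAGE_PAD_ID, IMAGE_PAD_ID, IMAGE_PAD_ID, 3, 4]
check_alignment(input_ids, 3)        # intact sample: 3 pads, 3 features, passes

truncated = input_ids[:4]            # naive truncation drops one image_pad token
# check_alignment(truncated, 3)      # would raise: 2 pads vs 3 features
```

Because `pixel_values` and `image_grid_thw` still describe the full image, any truncation that removes even one pad token leaves the text and vision sides inconsistent, which is why the PR skips the whole sample instead.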
Test plan
- `test_prepare_sample_skips_multimodal_exceeding_seq_len`: multimodal > seq_len returns `None`
- `test_prepare_sample_keeps_multimodal_within_seq_len`: multimodal <= seq_len works normally
- `test_prepare_sample_still_truncates_text_only`: text-only truncation unchanged

🤖 Generated with Claude Code
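The three tests in the plan above could look roughly like this. The test names come from the PR; `prepare_sample` here is a minimal stand-in whose logic mirrors the PR description, not the repository's actual implementation:

```python
def prepare_sample(rollout, seq_len):
    """Stand-in mirroring the described behavior: skip overlong multimodal
    samples, truncate overlong text-only samples."""
    tokens = rollout["input_ids"]
    if len(tokens) > seq_len:
        if rollout.get("pixel_values") is not None:
            return None
        tokens = tokens[:seq_len]
    return {"input_ids": tokens}

def test_prepare_sample_skips_multimodal_exceeding_seq_len():
    sample = prepare_sample({"input_ids": [1] * 8, "pixel_values": "px"}, seq_len=4)
    assert sample is None

def test_prepare_sample_keeps_multimodal_within_seq_len():
    sample = prepare_sample({"input_ids": [1] * 3, "pixel_values": "px"}, seq_len=4)
    assert sample is not None
    assert len(sample["input_ids"]) == 3

def test_prepare_sample_still_truncates_text_only():
    sample = prepare_sample({"input_ids": [1] * 8, "pixel_values": None}, seq_len=4)
    assert len(sample["input_ids"]) == 4
```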
Note
Medium Risk
Changes batching behavior for VLM training by dropping overlong multimodal samples, which can affect training dynamics and effective batch size if `seq_len` is misconfigured. Logic is localized but sits on the training data path, so regressions could surface as reduced utilization or unexpected sample loss.

Overview
Prevents corrupt VLM training data by skipping multimodal samples whose tokenized length exceeds `seq_len` instead of truncating them (which can desync image tokens from `pixel_values`). `prepare_sample` now returns `None` for these cases and `prepare_batch` filters them out while emitting warning logs; text-only samples continue to truncate normally. Documentation and unit tests were updated to reflect and enforce the new skip behavior.

Written by Cursor Bugbot for commit b314a39. This will update automatically on new commits.