-
Notifications
You must be signed in to change notification settings - Fork 2.2k
[BugFix] Fix sample scan range index overflow in files schema detection #65894
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR fixes an index overflow bug in the file schema detection sampling logic that caused crashes (SIGABRT) when the requested sample file count exceeded the total number of available files.
Key Changes:
- Modified the sampling algorithm to iterate based on
total_file_countinstead ofsample_file_count, preventing index overflow - Added test cases to verify correct behavior when
sample_file_count > total_file_count
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| be/src/exec/file_scanner/file_scanner.cpp | Changed loop from iterating sample_file_count times to iterating while index is within bounds of total_file_count, fixing the overflow bug |
| be/test/exec/file_scanner/file_scanner_test.cpp | Added two test cases to verify correct sampling behavior when sample count exceeds total file count |
Signed-off-by: wyb <[email protected]>
[Java-Extensions Incremental Coverage Report]✅ pass : 0 / 0 (0%) |
[FE Incremental Coverage Report]✅ pass : 0 / 0 (0%) |
[BE Incremental Coverage Report]✅ pass : 6 / 6 (100.00%) file detail
|
|
@cursor review |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
✅ Bugbot reviewed your changes and found no bugs!
|
@Mergifyio backport branch-3.5 |
|
@Mergifyio backport branch-4.0 |
|
@Mergifyio backport branch-3.4 |
✅ Backports have been created
|
🛑 Command
|
✅ Backports have been created
|
…on (#65894) Signed-off-by: wyb <[email protected]> (cherry picked from commit 5905aec)
…on (#65894) Signed-off-by: wyb <[email protected]> (cherry picked from commit 5905aec)
…on (backport #65894) (#65912) Signed-off-by: wyb <[email protected]> Co-authored-by: wyb <[email protected]>
|
@mergify backport branch-4.0 |
✅ Backports have been created
|
…on (StarRocks#65894) Signed-off-by: wyb <[email protected]>
…on (StarRocks#65894) Signed-off-by: wyb <[email protected]>
…on (backport #65894) (#65930) Signed-off-by: wyb <[email protected]>
…on (backport #65894) (#65931) Signed-off-by: wyb <[email protected]>
…on (StarRocks#65894) Signed-off-by: wyb <[email protected]> Signed-off-by: yan zhang <[email protected]>
Why I'm doing:
What I'm doing:
sample scan range index is overflow when
sample file countis smaller thantotal file count.Fixes #issue
What type of PR is this:
Does this PR entail a change in behavior?
If yes, please specify the type of change:
Checklist:
Bugfix cherry-pick branch check:
Note
Revamps
sample_filesto a bounded incremental selection to avoid overflow and correctly handlesample >= total, and updates unit tests with cases where total files are fewer than requested samples.FileScanner::sample_filesto iterate with an incremental rounded index, ensuring indices stay within[0, total-1], always include the last file, and sample all files whensample_file_count >= total_file_count.FileScannerTest::select_sample_files: Adjusts expectations and adds coverage for cases wheretotal_file_count < sample_file_count(e.g.,1/10,2/8,10/100).Written by Cursor Bugbot for commit 8210152. This will update automatically on new commits. Configure here.