-
Notifications
You must be signed in to change notification settings - Fork 2.4k
perf: Optimize ipc stream read performance #24671
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #24671 +/- ##
==========================================
+ Coverage 81.83% 81.84% +0.01%
==========================================
Files 1688 1689 +1
Lines 229999 230491 +492
Branches 2974 2974
==========================================
+ Hits 188219 188650 +431
- Misses 41027 41088 +61
Partials 753 753 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
scratch, | ||
); | ||
|
||
let new_pos = reader.stream_position()?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you make the reborrow of reader explicit here? That is, pass &mut (&mut *reader).take(block_length as u64)
instead? Right now this only works because reader.take
automatically gets transformed to (&mut *reader).take
. I was really confused how this compiled since take
takes self
and should consume the &mut R
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have change this to &mut (&mut *reader).take(block_length as u64)
. But I still think it's a little verbose...
.unwrap_or_else(VecDeque::new); | ||
let mut buffers: VecDeque<arrow_format::ipc::BufferRef> = buffers.iter().collect(); | ||
|
||
// check that the sum of the sizes of all buffers is <= than the size of the file |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you explain why this check was removed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can't easily get file_size after removing the intermediate vec. But remove this check will indeed cause wrong result when file_size and buffer_size not match rather then report an error. Maybe we can change all the subsequent read_to_end
to read_excat
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can't easily get file_size after removing the intermediate vec. But remove this check will indeed cause wrong result when file_size and buffer_size not match rather then report an error. Maybe we can change all the subsequent
read_to_end
toread_excat
?
@orlp kindly pin in case you forget this thread.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Liyixin95 I think that should be the way forward then yes. I do want the length to be checked.
reading a 1.9GB ipc stream file.
before:
peek memory: 3980.86 MB
after:
peek memory: 2086.41 MB