-
-
Notifications
You must be signed in to change notification settings - Fork 162
Improve memory efficiency of Position class #1054
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I remember a tweet from @byroot that benchmarks should be committed in the repo, so I thought we'd want to follow his suggestion(?).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think I said that, or at least not exactly.
In the case of JSON in makes sense, as it's a suite of benchmark encompassing most of the library.
My read from your PR here is you are just optimizing a small part of it, in such case just recording the benchmark in the commit message is plenty.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That makes way more sense! I'm glad I mentioned my "source of hallucination" and appreciate you chiming in 🙇♂️
x.report("string.each_char") do | ||
from_offset_each_char(large_string, offset) | ||
end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why benchmark each_char
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably because I'm a newbie in this area and just got curious 🙈
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My assumption on why each_char
turned out to be super-slow compared to each_line
was that the latter handles more in the ruby c
code side of things - is there some truth to it, or is there some other reason why you'd say this would not make any sense (to even try benchmarking)?
string.each_char: 27.3 i/s - 5340.84x slower
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
each_char
will allocate one string per character in the file. That alone ensure it will never be competitive. Eventually you could use each_byte
, but you are more likely to find a performant implementation by looking for a way to delegate searching for line boundaries to builtin methods.
lib/solargraph/position.rb
Outdated
if l.end_with?("\n") | ||
char_length = line_length - 1 | ||
else | ||
char_length = line_length | ||
end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you did actually expect \r\n
here, then:
if l.end_with?("\n") | |
char_length = line_length - 1 | |
else | |
char_length = line_length | |
end | |
if l.end_with?("\n") | |
char_length = line_length - (l.end_with?("\r\n") ? 2 : 1) | |
else | |
char_length = line_length | |
end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Considering the fact that specs pass without checking l.end_with?("\r\n")
, I will assume that it was just a 🤖 spill over.
I'll keep your suggestion in mind, this trick could come in handy in other circumstances 👍
I'm absolutely unfamiliar with this codebase, and unsure what this code does. But I'd suggest looking into various string APIs that allow you counting lines without allocating a string for each line. e..g you can use
But again this is just blind suggestions based on the diff. I have no clue what this code does. |
castwide#1054 (comment) Co-authored-by: Jean Boussier <[email protected]>
Total consumption went down to
And file memory consumption down to the bare minimum 🚀
(had to filter for this specific file only in order to appear in the report :D compared to: My initial approach
Original implementation
Today I learned a lot! |
From what I can tell, @castwide, this improvement mainly benefits the typechecker, while the LSP Server isn’t affected as much. Hopefully, we can carry some of these learnings over there soon! |
Avoid allocating additional strings, instead use sliced substrings
Co-authored-by: Jean Boussier <[email protected]>
castwide#1054 (comment) Co-authored-by: Jean Boussier <[email protected]>
5f698e0
to
b087df1
Compare
Problem
#1055 (extracted into an issue)
This sparked my interest in exploring why this is the case.
Good news here is that we don't seem to have a memory leak; however, we seem to have a huge memory bloat.
Approach
Avoid allocating additional strings, instead use sliced substrings
- Use String#index
each_lineinstead of String#linesBefore
Interesting bits:
After
Interesting bits:
Microbenchmark results:
ruby tmp/position_from_offset_benchmark.rb
While we unfortunately did not see a good impact on overall memory consumption when running
solargraph stdio
, I still thought it was worth committing this as a good start, maybe.