Improve memory efficiency of Position class #1054

lekemula · 2025-08-31T10:00:07Z

Problem

#1055 (extracted into an issue)

This sparked my interest in exploring why this is the case.

Good news here is that we don't seem to have a memory leak; however, we seem to have a huge memory bloat.

Approach

Avoid allocating additional strings, instead use sliced substrings
- Use String#index ~~each_line~~ instead of String#lines

ruby-memory-profiler run --max 10 --pretty -- bundle exec solargraph typecheck --level typed lib/solargraph/api_map.rb

Before

Interesting bits:

Total allocated: 1.37 GB (14775646 objects)
Total retained:  136.90 MB (1527641 objects)

allocated memory by gem
-----------------------------------
 955.06 MB  solargraph/lib
 137.73 MB  other
  82.55 MB  prism-1.4.0
  75.43 MB  parser-3.3.9.0
  42.17 MB  set
  24.82 MB  ripper
  17.08 MB  yard-0.9.37
  11.17 MB  pathname
   6.95 MB  ast-2.4.3
   5.70 MB  rbs-3.9.4

allocated memory by file
-----------------------------------
 308.94 MB  solargraph/lib/solargraph/position.rb
 207.20 MB  solargraph/lib/solargraph/complex_type/unique_type.rb
 124.32 MB  <internal:marshal>
  87.84 MB  solargraph/lib/solargraph/complex_type.rb
  65.70 MB  parser-3.3.9.0/lib/parser/builders/default.rb
  58.76 MB  solargraph/lib/solargraph/source/cursor.rb
  54.47 MB  solargraph/lib/solargraph/parser/comment_ripper.rb
  42.17 MB  ruby/lib/set.rb
  37.06 MB  solargraph/lib/solargraph/pin_cache.rb
  33.58 MB  prism-1.4.0/lib/prism/translation/parser/compiler.rb

allocated memory by location
-----------------------------------
 143.55 MB  solargraph/lib/solargraph/complex_type/unique_type.rb:367
 130.47 MB  solargraph/lib/solargraph/position.rb:85
 124.32 MB  <internal:marshal>:34
 109.17 MB  solargraph/lib/solargraph/position.rb:62
  67.87 MB  solargraph/lib/solargraph/position.rb:87
  49.13 MB  solargraph/lib/solargraph/complex_type.rb:151
  47.80 MB  parser-3.3.9.0/lib/parser/builders/default.rb:1875
  47.45 MB  solargraph/lib/solargraph/parser/comment_ripper.rb:64
  36.93 MB  solargraph/lib/solargraph/pin_cache.rb:198
  27.74 MB  ruby/lib/set.rb:244

allocated memory by class
-----------------------------------
 540.24 MB  String
 320.98 MB  Array
 266.11 MB  Hash
  55.74 MB  Solargraph::ComplexType::UniqueType
  21.22 MB  Solargraph::Pin::Method
  15.31 MB  Prism::Location
  13.78 MB  Solargraph::Pin::Parameter
  11.96 MB  Parser::AST::Node
  10.43 MB  Parser::Source::Range
  10.03 MB  Solargraph::Position

After

Interesting bits:

Total allocated: 1.19 GB (12861604 objects)
Total retained:  136.93 MB (1528024 objects)

allocated memory by gem
-----------------------------------
 773.30 MB  solargraph/lib
 137.73 MB  other
  82.71 MB  prism-1.4.0
  75.62 MB  parser-3.3.9.0
  42.19 MB  set
  24.89 MB  ripper
  17.08 MB  yard-0.9.37
  11.27 MB  pathname
   6.99 MB  ast-2.4.3
   5.73 MB  rbs-3.9.4

allocated memory by file
-----------------------------------
 207.20 MB  solargraph/lib/solargraph/complex_type/unique_type.rb
 126.81 MB  solargraph/lib/solargraph/position.rb
 124.32 MB  <internal:marshal>
  87.84 MB  solargraph/lib/solargraph/complex_type.rb
  65.87 MB  parser-3.3.9.0/lib/parser/builders/default.rb
  58.76 MB  solargraph/lib/solargraph/source/cursor.rb
  54.59 MB  solargraph/lib/solargraph/parser/comment_ripper.rb
  42.19 MB  ruby/lib/set.rb
  37.07 MB  solargraph/lib/solargraph/pin_cache.rb
  33.62 MB  prism-1.4.0/lib/prism/translation/parser/compiler.rb

allocated memory by location
-----------------------------------
 143.55 MB  solargraph/lib/solargraph/complex_type/unique_type.rb:367
 124.32 MB  <internal:marshal>:34
  68.50 MB  solargraph/lib/solargraph/position.rb:91
  56.88 MB  solargraph/lib/solargraph/position.rb:63
  49.14 MB  solargraph/lib/solargraph/complex_type.rb:151
  47.92 MB  parser-3.3.9.0/lib/parser/builders/default.rb:1875
  47.57 MB  solargraph/lib/solargraph/parser/comment_ripper.rb:64
  36.93 MB  solargraph/lib/solargraph/pin_cache.rb:198
  27.76 MB  ruby/lib/set.rb:244
  23.19 MB  ruby/lib/ripper/sexp.rb:128

allocated memory by class
-----------------------------------
 373.45 MB  String
 306.25 MB  Array
 266.43 MB  Hash
  55.74 MB  Solargraph::ComplexType::UniqueType
  21.23 MB  Solargraph::Pin::Method
  15.34 MB  Prism::Location
  13.78 MB  Solargraph::Pin::Parameter
  12.00 MB  Parser::AST::Node
  10.45 MB  Parser::Source::Range
  10.04 MB  Solargraph::Position

Microbenchmark results: ruby tmp/position_from_offset_benchmark.rb

# tmp/position_from_offset_benchmark.rb

require 'bundler/inline'

# Ensure the required gems are installed
gemfile do
  source 'https://rubygems.org'
  gem 'benchmark-memory'
  gem 'benchmark-ips'
end


large_string = (0..1_000).map { |i| i.to_s * 100 }.join

def from_offset_index text, offset
  cursor = 0
  line = 0
  character = offset
  newline_index = -1

  while (newline_index = text.index("\n", newline_index + 1)) && newline_index < offset
    line += 1
    character = offset - newline_index - 1
  end

  character = 0 if character.nil? and (cursor - offset).between?(0, 1)
end

def from_offset_each_char text, offset
  cursor = 0
  line = 0
  character = nil
  char_index = 0
  text.each_char do |char|
    if cursor == offset
      character = char_index
      break
    end

    if char == "\n"
      line += 1
      char_index = 0
    else
      char_index += 1
    end

    cursor += 1
  end
  character = 0 if character.nil? and (cursor - offset).between?(0, 1)
end

def from_offset_each_line text, offset
  cursor = 0
  line = 0
  character = nil
  text.each_line do |l|
    line_length = l.length

    if l.end_with?("\n") || l.end_with?("\r\n")
      char_length = line_length - 1
    else
      char_length = line_length
    end

    if cursor + char_length >= offset
      character = offset - cursor
      break
    end
    cursor += line_length
    line += 1
  end
  character = 0 if character.nil? and (cursor - offset).between?(0, 1)
end


def from_offset_old text, offset
  cursor = 0
  line = 0
  character = nil
  text.lines.each do |l|
    line_length = l.length
    char_length = l.chomp.length
    if cursor + char_length >= offset
      character = offset - cursor
      break
    end
    cursor += line_length
    line += 1
  end
  character = 0 if character.nil? and (cursor - offset).between?(0, 1)
end

offset = 10_000_000

Benchmark.memory do |x|
  x.report("string.lines.each") do
    from_offset_old(large_string, offset)
  end

  x.report("string.index") do
    from_offset_index(large_string, offset)
  end

  x.report("string.each_line") do
    from_offset_each_line(large_string, offset)
  end

  x.report("string.each_char") do
    from_offset_each_char(large_string, offset)
  end

  x.compare!
end


Benchmark.ips do |x|
  x.report("string.lines.each") do
    from_offset_old(large_string, offset)
  end

  x.report("string.index") do
    from_offset_index(large_string, offset)
  end

  x.report("string.each_line") do
    from_offset_each_line(large_string, offset)
  end

  x.report("string.each_char") do
    from_offset_each_char(large_string, offset)
  end

  x.compare!
end

Calculating -------------------------------------
   string.lines.each   289.562k memsize (   289.442k retained)
                         4.000  objects (     1.000  retained)
                         1.000  strings (     1.000  retained)
        string.index    40.000  memsize (     0.000  retained)
                         1.000  objects (     0.000  retained)
                         1.000  strings (     0.000  retained)
    string.each_line   120.000  memsize (     0.000  retained)
                         3.000  objects (     0.000  retained)
                         3.000  strings (     0.000  retained)
    string.each_char    23.152M memsize (     0.000  retained)
                       578.800k objects (     0.000  retained)
                        11.000  strings (     0.000  retained)

Comparison:
        string.index:         40 allocated
    string.each_line:        120 allocated - 3.00x more
   string.lines.each:     289562 allocated - 7239.05x more
    string.each_char:   23152000 allocated - 578800.00x more
ruby 3.3.1 (2024-04-23 revision c56cd86388) [arm64-darwin22]
Warming up --------------------------------------
   string.lines.each    14.497k i/100ms
        string.index    14.833k i/100ms
    string.each_line    13.165k i/100ms
    string.each_char     2.000 i/100ms
Calculating -------------------------------------
   string.lines.each    143.855k (± 1.8%) i/s -    724.850k in   5.040368s
        string.index    151.539k (± 1.6%) i/s -    771.316k in   5.091280s
    string.each_line    141.401k (± 8.1%) i/s -    710.910k in   5.069155s
    string.each_char     27.332 (± 3.7%) i/s -    138.000 in   5.058226s

Comparison:
        string.index:   151538.8 i/s
   string.lines.each:   143855.4 i/s - 1.05x  slower
    string.each_line:   141401.4 i/s - same-ish: difference falls within error
    string.each_char:       27.3 i/s - 5544.34x  slower

While we unfortunately did not see a good impact on overall memory consumption when running solargraph stdio, I still thought it was worth committing this as a good start, maybe.

lekemula · 2025-08-31T10:04:48Z

benchmarks/position_from_offset_benchmark.rb

I remember a tweet from @byroot that benchmarks should be committed in the repo, so I thought we'd want to follow his suggestion(?).

Example:
https://github.com/ruby/json/tree/master/benchmark

I don't think I said that, or at least not exactly.

In the case of JSON in makes sense, as it's a suite of benchmark encompassing most of the library.

My read from your PR here is you are just optimizing a small part of it, in such case just recording the benchmark in the commit message is plenty.

That makes way more sense! I'm glad I mentioned my "source of hallucination" and appreciate you chiming in 🙇‍♂️

byroot · 2025-08-31T10:15:24Z

benchmarks/position_from_offset_benchmark.rb

+  x.report("string.each_char") do
+    from_offset_each_char(large_string, offset)
+  end


Why benchmark each_char?

Probably because I'm a newbie in this area and just got curious 🙈

My assumption on why each_char turned out to be super-slow compared to each_line was that the latter handles more in the ruby c code side of things - is there some truth to it, or is there some other reason why you'd say this would not make any sense (to even try benchmarking)?

string.each_char: 27.3 i/s - 5340.84x slower

each_char will allocate one string per character in the file. That alone ensure it will never be competitive. Eventually you could use each_byte, but you are more likely to find a performant implementation by looking for a way to delegate searching for line boundaries to builtin methods.

lib/solargraph/position.rb

byroot · 2025-08-31T10:20:20Z

lib/solargraph/position.rb

+        if l.end_with?("\n")
+          char_length = line_length - 1
+        else
+          char_length = line_length
+        end


If you did actually expect \r\n here, then:

Suggested change

if l.end_with?("\n")

char_length = line_length - 1

else

char_length = line_length

end

if l.end_with?("\n")

char_length = line_length - (l.end_with?("\r\n") ? 2 : 1)

else

char_length = line_length

end

Considering the fact that specs pass without checking l.end_with?("\r\n"), I will assume that it was just a 🤖 spill over.

I'll keep your suggestion in mind, this trick could come in handy in other circumstances 👍

byroot · 2025-08-31T10:26:15Z

I'm absolutely unfamiliar with this codebase, and unsure what this code does. But I'd suggest looking into various string APIs that allow you counting lines without allocating a string for each line.

e..g you can use String#index(offset:) to find new lines one by one, etc.

index may be a bit slow with UTF-8, but you could potentially use byteindex https://docs.ruby-lang.org/en/master/String.html#method-i-byteindex

But again this is just blind suggestions based on the diff. I have no clue what this code does.

castwide#1054 (comment)

castwide#1054 (comment) Co-authored-by: Jean Boussier <[email protected]>

lekemula · 2025-08-31T21:13:45Z

I'm absolutely unfamiliar with this codebase, and unsure what this code does. But I'd suggest looking into various string APIs that allow you counting lines without allocating a string for each line.

e..g you can use String#index(offset:) to find new lines one by one, etc.

index may be a bit slow with UTF-8, but you could potentially use byteindex docs.ruby-lang.org/en/master/String.html#method-i-byteindex

But again this is just blind suggestions based on the diff. I have no clue what this code does.

@byroot 🚀🚀 5f698e0

Total consumption went down to Total allocated: 1.10 GB (11661287 objects) from:

Total allocated: 1.19 GB (12861604 objects) - compared to my initial String#each_line approach
Total allocated: 1.37 GB (14775646 objects) - compared to original implementation

And file memory consumption down to the bare minimum 🚀

allocated memory by file
-----------------------------------
   1.42 MB  solargraph/lib/solargraph/position.rb

(had to filter for this specific file only in order to appear in the report :D ruby-memory-profiler run --max 10 --pretty --allow_files lib/solargraph/position.rb -- bundle exec solargraph typecheck --level typed lib/solargraph/api_map.rb

compared to:

My initial approach

allocated memory by file
-----------------------------------
 126.81 MB  solargraph/lib/solargraph/position.rb

Original implementation

allocated memory by file
-----------------------------------
 308.94 MB  solargraph/lib/solargraph/position.rb

Today I learned a lot!

lekemula · 2025-08-31T21:33:09Z

From what I can tell, @castwide, this improvement mainly benefits the typechecker, while the LSP Server isn’t affected as much. Hopefully, we can carry some of these learnings over there soon!

Avoid allocating additional strings, instead use sliced substrings

Co-authored-by: Jean Boussier <[email protected]>

castwide#1054 (comment)

castwide#1054 (comment) Co-authored-by: Jean Boussier <[email protected]>

lekemula changed the title ~~Use each_line instead of text.lines~~ Improve memory efficiency of Position class Aug 31, 2025

lekemula commented Aug 31, 2025

View reviewed changes

byroot reviewed Aug 31, 2025

View reviewed changes

lib/solargraph/position.rb Outdated Show resolved Hide resolved

byroot reviewed Aug 31, 2025

View reviewed changes

lekemula added a commit to lekemula/solargraph that referenced this pull request Aug 31, 2025

Remove benchamrks

2e87f96

castwide#1054 (comment)

lekemula added a commit to lekemula/solargraph that referenced this pull request Aug 31, 2025

String#index(offset:) FTW 🚀

5f698e0

castwide#1054 (comment) Co-authored-by: Jean Boussier <[email protected]>

lekemula and others added 6 commits September 3, 2025 22:03

Use each_line instead of text.lines

c70ed5b

Avoid allocating additional strings, instead use sliced substrings

gitignore vendor/cache

66704c8

Remove redundant end_with?

5370f98

Co-authored-by: Jean Boussier <[email protected]>

Remove benchamrks

d89fa39

castwide#1054 (comment)

String#index(offset:) FTW 🚀

c88accb

castwide#1054 (comment) Co-authored-by: Jean Boussier <[email protected]>

fix rubocop

b087df1

lekemula force-pushed the improve-memory-efficiency branch from 5f698e0 to b087df1 Compare September 3, 2025 20:04

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

Improve memory efficiency of Position class #1054

Improve memory efficiency of Position class #1054

Uh oh!

lekemula commented Aug 31, 2025 •

edited

Loading

Uh oh!

lekemula Aug 31, 2025

Uh oh!

byroot Aug 31, 2025

Uh oh!

lekemula Aug 31, 2025 •

edited

Loading

Uh oh!

byroot Aug 31, 2025

Uh oh!

lekemula Aug 31, 2025

Uh oh!

lekemula Aug 31, 2025 •

edited

Loading

Uh oh!

byroot Aug 31, 2025

Uh oh!

Uh oh!

byroot Aug 31, 2025 •

edited

Loading

Uh oh!

lekemula Aug 31, 2025

Uh oh!

byroot commented Aug 31, 2025

Uh oh!

lekemula commented Aug 31, 2025 •

edited

Loading

Uh oh!

lekemula commented Aug 31, 2025 •

edited

Loading

Uh oh!

Uh oh!

Uh oh!

Improve memory efficiency of Position class #1054

Are you sure you want to change the base?

Improve memory efficiency of Position class #1054

Uh oh!

Conversation

lekemula commented Aug 31, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Problem

Approach

Before

After

Uh oh!

lekemula Aug 31, 2025

Choose a reason for hiding this comment

Uh oh!

byroot Aug 31, 2025

Choose a reason for hiding this comment

Uh oh!

lekemula Aug 31, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

byroot Aug 31, 2025

Choose a reason for hiding this comment

Uh oh!

lekemula Aug 31, 2025

Choose a reason for hiding this comment

Uh oh!

lekemula Aug 31, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

byroot Aug 31, 2025

Choose a reason for hiding this comment

Uh oh!

Uh oh!

byroot Aug 31, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

lekemula Aug 31, 2025

Choose a reason for hiding this comment

Uh oh!

byroot commented Aug 31, 2025

Uh oh!

lekemula commented Aug 31, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

My initial approach

Original implementation

Uh oh!

lekemula commented Aug 31, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

lekemula commented Aug 31, 2025 •

edited

Loading

lekemula Aug 31, 2025 •

edited

Loading

lekemula Aug 31, 2025 •

edited

Loading

byroot Aug 31, 2025 •

edited

Loading

lekemula commented Aug 31, 2025 •

edited

Loading

lekemula commented Aug 31, 2025 •

edited

Loading