
[#2716] feat(spark): Introduce option of max segments decompression to control memory usage #2735

Merged
zuston merged 3 commits into apache:master from zuston:decompression-oom on Mar 2, 2026

Conversation

@zuston (Member) commented Feb 28, 2026

What changes were proposed in this pull request?

Introduce option of max segments decompression to control memory usage

Why are the changes needed?

To address issue #2716, this PR introduces an option to set the maximum number of concurrent decompression segments, allowing better control over overall memory usage. Setting this value to 1 restricts decompression to a single segment at a time.

Does this PR introduce any user-facing change?

Yes

How was this patch tested?

Unit test

github-actions bot commented Feb 28, 2026

Test Results

3 185 files +34 · 3 185 suites +34 · 6h 49m 50s ⏱️ +12m 14s
1 246 tests +1 · 1 245 ✅ +2 · 1 💤 ±0 · 0 ❌ ±0
15 789 runs +79 · 15 774 ✅ +81 · 15 💤 ±0 · 0 ❌ ±0

Results for commit 59df817. ± Comparison against base commit 5986591.

♻️ This comment has been updated with latest results.

RSS_READ_OVERLAPPING_DECOMPRESSION_MAX_CONCURRENT_SEGMENTS =
    ConfigOptions.key("rss.client.read.overlappingDecompressionMaxConcurrentSegments")
        .intType()
        .defaultValue(-1)
Member

Should we set the default value to 1? That would make memory usage similar to when overlapping is disabled.

Member Author

I think we could adopt a more aggressive value, which should result in better performance.

@wForget (Member) left a comment

Thanks @zuston, LGTM

// block
if (block != null) {
nowMemoryUsed.addAndGet(-block.getUncompressLength());
segmentPermits.ifPresent(x -> x.release());
@roryqi (Contributor) commented Feb 28, 2026
When block is null, we don't release. Will it cause a deadlock? Should we use try/finally to guarantee the release of the lock?

Member Author

We have ensured the block is never null here.

Contributor

OK.
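For reference, the try/finally shape roryqi suggests could look like the sketch below. The names are hypothetical: `handleBlock` stands in for the real per-block work and is not from the PR's code:

```java
import java.util.Optional;
import java.util.concurrent.Semaphore;

// Hypothetical sketch of the try/finally pattern discussed above: the permit
// is returned on every path, even if handling the block throws.
public class SafeRelease {
  private final Optional<Semaphore> segmentPermits;

  public SafeRelease(Optional<Semaphore> segmentPermits) {
    this.segmentPermits = segmentPermits;
  }

  public void processSegment(Runnable handleBlock) {
    segmentPermits.ifPresent(Semaphore::acquireUninterruptibly);
    try {
      handleBlock.run(); // may throw; the permit is still released below
    } finally {
      segmentPermits.ifPresent(Semaphore::release);
    }
  }
}
```

The trade-off is that releasing in `finally` makes the invariant hold unconditionally, whereas the PR's approach relies on the caller guaranteeing a non-null block.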

@zuston (Member Author) commented Mar 2, 2026

updated. cc @wForget @roryqi .

BTW, to better tune this option, it would be helpful to add some statistics to the decompression worker, such as tracking the number of segments per batch.

zuston requested review from roryqi and wForget on March 2, 2026 02:50
@roryqi (Contributor) commented Mar 2, 2026

> updated. cc @wForget @roryqi .
>
> BTW, to better tune this option, it would be helpful to add some statistics to the decompression worker, such as tracking the number of segments per batch.

You can calculate the default value according to some data which you have known.

@zuston (Member Author) commented Mar 2, 2026

> You can calculate the default value according to some data which you have known.

I have set this to 10.

zuston merged commit b324cc3 into apache:master on Mar 2, 2026
24 of 40 checks passed
zuston deleted the decompression-oom branch on March 2, 2026 06:52