Add a progressiveChunkSize option for Flight rendering #35089
Use the same default as ReactFizzServer. Setting it higher leads to even more performance gains (up to 2x), but this obviously needs to be balanced against blocking painting.
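For context, here's a minimal sketch of how the option might be consumed. The entry point and manifest handling vary by bundler, `progressiveChunkSize` is the name proposed in this PR (and may change), and `clientManifest` stands in for whatever the bundler provides:

```ts
// Hypothetical usage of the option proposed in this PR; entry point and
// manifest handling vary by bundler, and the option name may still change.
import { renderToReadableStream } from 'react-server-dom-webpack/server.edge';
import * as React from 'react';
import App from './App';

declare const clientManifest: any; // provided by the bundler in practice

const stream = renderToReadableStream(React.createElement(App), clientManifest, {
  // 12800 matches ReactFizzServer's default progressive chunk size; 65100
  // gave the larger wins shown below, at the cost of delaying paint.
  progressiveChunkSize: 12800,
});
```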
On Vercel, Next.js 16.0.2-canary.12, before:

[latency screenshot]

After, with `MAX_ROW_SIZE=65100` (1.72x better P95):

[latency screenshot]

Using the somewhat conservative default from this PR (`MAX_ROW_SIZE=12800`) delivers a ~1.3x gain in local testing (1.4x on Bun). Results are even more noticeable on Next.js 15 and on other platforms, and appear to be even larger in real-world testing on serverless platforms (though I haven't done extensive testing here).

Steady-state memory usage also appears somewhat lower after this change, though marginally so (402MB vs 428MB on the same Vercel function).
I've often seen even higher gains in real-world testing (e.g. a 2.2x gain, also on Next.js 16 canary), but I suspect this may just be due to serverless hardware noise / noisy neighbours.
Summary
#33030 introduced a fixed `MAX_ROW_SIZE=3200`, above which Flight tasks are deferred, so as to reduce blocking the painting of large non-lazy elements that may contain client components (I may have some of the exact details / terminology incorrect here).

However, this appears to have had a relatively large impact on the SSR performance of large pages / elements, especially in Next.js. Profiling the Next.js benchmark from t3dotgg, which SSRs a ~2MB page with no async components or Suspense, shows that a large amount of rendering time is spent handling these lazy chunks. A smaller reproduction is mentioned below.
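To make the behaviour concrete, here's a rough sketch of the check, simplified from my reading of ReactFlightServer (names and structure are approximate, not the actual implementation):

```ts
// Rough sketch of the check from #33030, simplified from my reading of
// ReactFlightServer (names approximate; not the actual implementation).
const MAX_ROW_SIZE = 3200;

interface Task { serializedSize: number }

declare function deferTask(task: Task): string;       // emits a reference to a new "lazy" row
declare function serializeInline(task: Task): string; // serializes into the current row

function renderRow(task: Task): string {
  if (task.serializedSize > MAX_ROW_SIZE) {
    // Over the row budget: defer to a lazy chunk so a huge element doesn't
    // block painting of content serialized earlier in the stream.
    return deferTask(task);
  }
  return serializeInline(task);
}
```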
Making this configurable would at least give frameworks (and/or users) the option to choose which trade-offs they'd prefer – and with some refactoring, may allow for larger chunk sizes for SSR, while still delivering smaller chunks for client-side rendering.
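For instance, a framework could (hypothetically) pick a larger row budget for its SSR pass while keeping the smaller one when streaming to the browser:

```ts
// Illustrative only: isSSRPass is a hypothetical framework-level flag, and
// the values are the ones discussed above, not recommendations.
declare const isSSRPass: boolean;

const progressiveChunkSize = isSSRPass
  ? 65100  // large rows: fastest server rendering, fewer lazy chunks
  : 12800; // smaller rows: more progressive client-side streaming
```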
How did you test this change?
More results and a reproduction (a ~120kB HTML page) can be found here: https://github.com/mhart/react-server-defer-task
Local testing with Bun and wrk (2 threads, 10 concurrent requests):
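The invocation was along these lines (URL and duration are illustrative):

```
wrk -t2 -c10 -d30s http://localhost:3000/
```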
Other runtimes:
More details
My initial investigation showed that a large amount of time was being spent in this "throw" line. Depending on the JS engine and its settings, throws can be quite expensive, as call-stack information may need to be gathered at the time of the throw – and many of these stacks were at least 30 frames deep.
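Roughly, the suspend-by-throwing pattern looks like this (a simplified sketch from my reading of the Flight client's `readChunk`, not the exact source):

```ts
// Simplified sketch of how a pending lazy chunk suspends rendering
// (based on my reading of ReactFlightClient; not the actual code).
type Chunk<T> =
  | { status: 'fulfilled'; value: T }
  | { status: 'pending'; then(onFulfill: (value: T) => void): void };

function readChunk<T>(chunk: Chunk<T>): T {
  if (chunk.status === 'fulfilled') {
    return chunk.value;
  }
  // Throwing the chunk (a thenable) suspends the render until the row
  // resolves; each throw may force the engine to capture a deep call stack.
  throw chunk;
}
```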
Following the trail of where the throws were coming from, I found they were "pending" lazy chunks that had been created by `deferTask` – and they were being created because the max row size had been reached.
In the aforementioned page, 9,485 chunks are thrown this way with the current max row size. Removing this check immediately yielded a 2x performance improvement (on Next.js 15).
NB: To be clear, I don't think most of the gains are due to the reduction in throwing (at least on Node.js; Bun may be different) – I think most of the slowdown is simply due to the introduction of many, many lazy chunks that need to be handled in the rendering pipeline.
Other thoughts
I'm not wedded to this approach – I'm not even sure `progressiveChunkSize` has the same meaning as `MAX_ROW_SIZE`, though it intuitively feels like it does. This PR is, if nothing else, a way to highlight that splitting into many lazy chunks has a noticeable impact on SSR performance, and that better batching would be beneficial.
The 2MB page renders in 24ms locally on Node.js using `renderToReadableStream` (the 120kB page renders in 1.2ms, ~15x faster than in Next.js) – so the serialization (and subsequent deserialization and reserialization) of RSC chunks clearly has a noticeable effect, and makes up the bulk of the Next.js time here. Early JSON serialization, then JSON parsing, then JSON serialization again show up predominantly in the profiles.
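By "locally" I mean a minimal harness along these lines (illustrative; the actual measurements are in the repro linked above):

```ts
// Minimal timing harness of the kind used for the numbers above (illustrative;
// the edge entry point exposes renderToReadableStream and works on Node 18+).
import { renderToReadableStream } from 'react-dom/server.edge';
import * as React from 'react';
import Page from './Page';

const start = performance.now();
const stream = await renderToReadableStream(React.createElement(Page));
// Drain the stream so we measure the full render, not just the first byte.
const reader = stream.getReader();
while (!(await reader.read()).done) {}
console.log(`rendered in ${(performance.now() - start).toFixed(1)}ms`);
```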
It seems to me that alternative approaches which avoid the need to deserialize at all just to render HTML on the server might be useful here (e.g. rendering to an object stream / async iterator) – and would also allow for other optimizations, such as knowing where the closing `</head>` or `</body>` tag is, instead of needing to deserialize bytes to find them (as Next.js currently has to do).
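Purely to illustrate the shape I mean (not a concrete API proposal – all names here are made up):

```ts
import type { ReactNode } from 'react';

// Hypothetical row types: structured output an SSR consumer could render
// directly, with explicit markers instead of byte-scanning for </head> etc.
type FlightRow =
  | { kind: 'html'; node: ReactNode }                  // server-rendered content
  | { kind: 'clientRef'; id: string; props: unknown }  // client component boundary
  | { kind: 'marker'; at: 'head-end' | 'body-end' };   // known positions in the document

declare function renderToAsyncIterable(node: ReactNode): AsyncIterable<FlightRow>;
```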
The need to serialize the entire RSC stream – doubling (or more like 2.5x-ing) the payload – also seems like extra work. Having markers in the already-serialized HTML for client component boundaries, and only serializing the extra properties that can't be rendered to HTML, would help avoid this.
However, these are larger changes – doing something about the lazy chunk serialization here feels like an easier quick win.