Skip to content

Conversation

stormshield-fabs
Copy link
Contributor

@stormshield-fabs stormshield-fabs commented Jul 2, 2025

Benchmarks in the cloud between two Linux machines on a 10ms/1Gbps link revealed that quinn performs consistently worse than msquic. Plotting the congestion window for both implementations shows that msquic reaches higher congestion window sizes, in part because it implements spurious loss recovery (as described in rfc8312bis which is now rfc9438).

This PR implements a similar spurious loss recovery mechanism (quinn implements rfc8312)

Lost packets are stored in the PacketSpace and used on ACK reception to determine if all the packets we deemed lost were in fact acknowledged. This list is regularly purged from packets that were sent more than 2 PTO ago (this is the value used in msquic: https://github.com/microsoft/msquic/blob/2623c07df62b4bd171f469fb29c2714b6735b676/src/core/loss_detection.c#L925).

Overall, this appears to effectively improve throughput on the benchmark I described above: we've seen up to a 30% increase in upload bandwidth.

main patch gain
average 314 Mbps 391 Mbps 25%
P50 302 Mbps 402 Mbps 33%

@stormshield-fabs
Copy link
Contributor Author

The msquic congestion window plot that helped find this difference between both implementations
image

Before/after plots for quinn (the orange curve is the average of all samples)
image

Copy link
Collaborator

@Ralith Ralith left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, this is great work! It's much simpler than I feared, and the improved throughput is really compelling. I wonder if this will also account for our slightly lagging throughput numbers in the interop test...

I'd like to see test coverage for the new behavior.

@stormshield-fabs stormshield-fabs force-pushed the spurious-loss-recovery branch 3 times, most recently from ebee833 to fbf7f41 Compare July 3, 2025 13:22
@stormshield-fabs stormshield-fabs requested a review from Ralith July 3, 2025 14:06
@stormshield-fabs stormshield-fabs force-pushed the spurious-loss-recovery branch from fbf7f41 to a567107 Compare July 3, 2025 15:04
@Ralith
Copy link
Collaborator

Ralith commented Jul 3, 2025

Note this is a breaking change due to the tweak to the congestion controller interface. We could adjust the design to be non-breaking, but I think it's fine; the current form is fairly natural and it's been a while.

@stormshield-fabs stormshield-fabs force-pushed the spurious-loss-recovery branch from a567107 to 592929b Compare July 3, 2025 19:24
@stormshield-fabs stormshield-fabs marked this pull request as ready for review July 4, 2025 08:05
@stormshield-fabs stormshield-fabs force-pushed the spurious-loss-recovery branch from 592929b to b116872 Compare July 4, 2025 08:07
@djc djc added the breaking label Jul 4, 2025
@djc
Copy link
Member

djc commented Jul 4, 2025

Note this is a breaking change due to the tweak to the congestion controller interface. We could adjust the design to be non-breaking, but I think it's fine; the current form is fairly natural and it's been a while.

Should we get another release out with non-breaking improvements before we merge this?

@Ralith
Copy link
Collaborator

Ralith commented Jul 5, 2025

No objection from me.

@stormshield-fabs
Copy link
Contributor Author

We are working on implementing RFC 9438 which will likely require more changes to the Controller trait.

Can we wait a bit more before merging this breaking change?

@Ralith
Copy link
Collaborator

Ralith commented Aug 4, 2025

I think we should merge this as soon as we get a non-breaking release out the door, but we can hold the breaking release.

@Ralith Ralith added this pull request to the merge queue Oct 7, 2025
Merged via the queue into quinn-rs:main with commit 531ca90 Oct 7, 2025
20 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants