feat: switch to QUIC multipath #3381
Draft: dignifiedquire wants to merge 151 commits into main from feat-multipath
Documentation for this PR has been generated and is available at: https://n0-computer.github.io/iroh/pr/3381/docs/iroh/ Last updated: 2025-11-13T18:17:23Z
72cb071 to db712c0
4827e62 to 946f71c
/netsim report
This is replaced by the endpoint_two_relay_only_becomes_direct test.
This was a weird relic from the past. The state it maintains is much clearer without the inner. This also comments out the unused PathSelection for now. We do need to bring that back somehow though.
## Description

This improves how we expose paths and path stats for connections, and also updates feat-multipath to use n0-computer/quinn#168.

* The watcher for open paths internally uses a SmallVec to not allocate in the common case of not-too-many paths
* The path info for a path now includes a boolean whether this is the currently selected primary transmission path
* We no longer expose PathIds to users
* We expose stats for paths from `Connection::path_stats`

## Breaking Changes

<!-- Optional, if there are any breaking changes document them, including how to migrate older code. -->

## Notes & open questions

<!-- Any notes, remarks or open questions you have to make about the PR. -->

## Change checklist

<!-- Remove any that are not relevant. -->

- [ ] Self-review.
- [ ] Documentation updates following the [style guide](https://rust-lang.github.io/rfcs/1574-more-api-documentation-conventions.html#appendix-a-full-conventions-text), if relevant.
- [ ] Tests if relevant.
- [ ] All breaking changes documented.
    - [ ] List all breaking changes in the above "Breaking Changes" section.
    - [ ] Open an issue or PR on any number0 repos that are affected by this breaking change. Give guidance on how the updates should be handled or do the actual updates themselves. The major ones are:
        - [ ] [`quic-rpc`](https://github.com/n0-computer/quic-rpc)
        - [ ] [`iroh-gossip`](https://github.com/n0-computer/iroh-gossip)
        - [ ] [`iroh-blobs`](https://github.com/n0-computer/iroh-blobs)
        - [ ] [`dumbpipe`](https://github.com/n0-computer/dumbpipe)
        - [ ] [`sendme`](https://github.com/n0-computer/sendme)

---------

Co-authored-by: Floris Bruynooghe <[email protected]>
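To illustrate the "selected path" flag described above, here is a minimal sketch. The `PathInfo` struct, its field names, and the `selected_path` helper are hypothetical stand-ins invented for this example; they are not the types or functions that iroh exposes.

```rust
use std::time::Duration;

// Hypothetical stand-in for per-path info; all field names are assumptions.
#[derive(Debug, Clone, PartialEq)]
struct PathInfo {
    remote: String,
    // True if this is the currently selected primary transmission path.
    is_selected: bool,
    rtt: Duration,
}

/// Returns the path flagged as the primary transmission path, if any.
fn selected_path(paths: &[PathInfo]) -> Option<&PathInfo> {
    paths.iter().find(|p| p.is_selected)
}

fn main() {
    let paths = vec![
        PathInfo {
            remote: "relay.example".to_string(),
            is_selected: false,
            rtt: Duration::from_millis(80),
        },
        PathInfo {
            remote: "192.0.2.1:4433".to_string(),
            is_selected: true,
            rtt: Duration::from_millis(12),
        },
    ];
    if let Some(p) = selected_path(&paths) {
        println!("selected: {} (rtt {:?})", p.remote, p.rtt);
    }
}
```

Carrying the flag on each entry means a caller can find the primary path without having to know any internal path identifier.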
## Description

This has a few minor cleanups without any functional changes in the endpoint state actor:

* Remove double handle upgrade
* Add helper function `to_transport_addr` on the relay mapped addr map
* Use hash map `entry` API instead of `get` and `expect`
* Use `if let` chains to remove a level of indentation
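As a rough illustration of the `entry` cleanup mentioned above, the sketch below looks up a value and inserts it on first use in a single call, so no separate `get` followed by `expect` is needed. The `RelayMappedAddrs` type and its fields are hypothetical and only loosely modelled on the description; only `HashMap::entry` and `or_insert_with` are real standard-library APIs.

```rust
use std::collections::HashMap;

// Hypothetical map from a relay URL to a locally allocated id.
#[derive(Default)]
struct RelayMappedAddrs {
    next_id: u64,
    by_relay: HashMap<String, u64>,
}

impl RelayMappedAddrs {
    /// Returns the id for `relay`, allocating a fresh one on first use.
    fn get_or_insert(&mut self, relay: &str) -> u64 {
        // Borrow the counter separately so the closure can update it
        // while the map is borrowed by `entry`.
        let next_id = &mut self.next_id;
        *self.by_relay.entry(relay.to_string()).or_insert_with(|| {
            let id = *next_id;
            *next_id += 1;
            id
        })
    }
}

fn main() {
    let mut addrs = RelayMappedAddrs::default();
    let first = addrs.get_or_insert("https://relay-a.example");
    let again = addrs.get_or_insert("https://relay-a.example");
    println!("first = {first}, again = {again}");
}
```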
In the endpoint state actor, this uses the `Connection::on_closed` future added in n0-computer/quinn#153 to remove connections once they are closed instead of relying on manual cleanup.
…async (#3629)

## Description

Currently, `Magicsock::register_connection` is a sync function, but needs to send over an async channel to notify the endpoint state actor about the new connection. It currently employs a hack to achieve that: it spawns a tokio task for sending the message.

This PR cleans this up by making `register_connection` return a future, and awaits this future at the various sites where we go from quinn::Connection to iroh Connection. Luckily, all these call sites already are in async contexts.

* When going from `Connecting` or `Accepting` to `Connection`, we await the registration after the `quinn::Connecting` completes. The future is stored in an option instead of using a state enum as you would usually, because we need unconditional access to the `quinn::Connecting` in the functions on `Connecting`/`Accepting`.
* For the `(Incoming|Outgoing)ZeroRttConnection`, we store a future that first awaits the handshake and then registers the connection. So we need only a single future here.

With `register_connection` being async, we can also clean up some of the not-so-nice things introduced in #3622: Because we now have an async function, we can let the endpoint state actor return a reply. This makes it much more straightforward because we can have the endpoint state actor initialize a watcher for the paths and return it instead of having to do a weird dance with parts of the state being initialized or stored outside of the endpoint state actor to satisfy the sync function constraints. This is much nicer now IMO.

## Notes & open questions

This adds a boxed future into the process of going from a `Connecting` to a `Connection`. If we really wanted, we could use a manually implemented future instead. However, I don't think one boxed future *per connection* is an issue, so I'd prefer to leave it like this (implementing a manual future for `tokio::sync::mpsc::Sender::send` is cumbersome).
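The "let the actor return a reply" idea can be sketched as a request/reply message. To keep the example dependency-free it uses blocking `std::sync::mpsc` channels and a thread instead of tokio and futures, so it is an analogy for the pattern and not the PR's code. `ActorMessage`, `spawn_actor`, and this `register_connection` are hypothetical names.

```rust
use std::collections::HashMap;
use std::sync::mpsc;
use std::thread;

// Hypothetical message: register a connection and ask for a reply.
enum ActorMessage {
    Register {
        conn_id: u64,
        reply: mpsc::Sender<Vec<String>>,
    },
}

/// Spawns a toy actor that owns all per-connection state.
fn spawn_actor() -> mpsc::Sender<ActorMessage> {
    let (tx, rx) = mpsc::channel::<ActorMessage>();
    thread::spawn(move || {
        let mut paths_by_conn: HashMap<u64, Vec<String>> = HashMap::new();
        for msg in rx {
            match msg {
                ActorMessage::Register { conn_id, reply } => {
                    // State is initialised inside the actor and handed back,
                    // instead of being created by the caller.
                    let paths = paths_by_conn
                        .entry(conn_id)
                        .or_insert_with(|| vec![format!("path-for-conn-{conn_id}")])
                        .clone();
                    let _ = reply.send(paths);
                }
            }
        }
    });
    tx
}

/// Sends the registration and waits for the actor's reply.
fn register_connection(
    actor: &mpsc::Sender<ActorMessage>,
    conn_id: u64,
) -> Option<Vec<String>> {
    let (reply, reply_rx) = mpsc::channel();
    actor.send(ActorMessage::Register { conn_id, reply }).ok()?;
    reply_rx.recv().ok()
}

fn main() {
    let actor = spawn_actor();
    println!("{:?}", register_connection(&actor, 7));
}
```

In an async version the blocking `recv` would presumably become an awaited reply channel, which is what makes a reply impractical while the registering function is still sync.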
First step for #3641; the rest can be done once QUIC holepunching has landed. Unfortunately, `send_disco_message` also needs the sender, so it can't be fully removed from the `EndpointState`.
## Description

* Remove `Endpoint::conn_type`
* Update `transfer.rs` example to use `Connection::paths` instead
* Change return type of `Connection::paths` to have more guarantees on the return type (Send, Unpin, 'static)
/netsim report
## Description

Alternative to #3631

Replaces the `Watchable`s for path changes on the `Connection` with a boxed `Watcher`. The watcher is boxed because it would increase the `Connection` struct size significantly otherwise because the mapped-and-joined watcher with a `SmallVec` of `PathInfo` inside is ~600 bytes atm.

The benefit of storing a `Watcher` and not a `Watchable` is that the watcher streams now close once the EndpointStateActor drops the state for the connection, which it does after the connection is closed.

Also adds a test for path watching, including testing that the streams now close when the connection closes.
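The size argument for boxing can be checked with `std::mem::size_of`. The sketch below uses a 600-byte array as a hypothetical stand-in for the large watcher; `BigWatcher`, `InlineConn`, and `BoxedConn` are made-up names, and the real types will have different sizes.

```rust
use std::mem::size_of;

// Hypothetical stand-in for a large watcher (roughly 600 bytes).
#[allow(dead_code)]
struct BigWatcher {
    state: [u8; 600],
}

// Connection that stores the watcher inline.
#[allow(dead_code)]
struct InlineConn {
    id: u64,
    watcher: BigWatcher,
}

// Connection that stores only a pointer to a heap-allocated watcher.
#[allow(dead_code)]
struct BoxedConn {
    id: u64,
    watcher: Box<BigWatcher>,
}

fn main() {
    println!("inline: {} bytes", size_of::<InlineConn>());
    println!("boxed:  {} bytes", size_of::<BoxedConn>());
}
```

The trade-off is one heap allocation per connection in exchange for a struct that stays small when it is moved or cloned.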
## Description

Fixes #3638 (partially)

This is a first, small solution to stop inactive endpoint actors after an idle timeout. I implemented it such that the *actor* decides once to stop, while making sure that we *never* create senders to actors that are shutting down.

Logic in the actor:

* The actor enters an idle timeout (set to 60 seconds) once it has no active connections, an empty inbox, and no inbox senders
* Once the timeout expires, it is rechecked that the idle conditions hold, and if so the actor exits
* Once any of the idle conditions don't hold anymore, the idle timeout is deactivated and restarted once the conditions are met again

The actor checks if the inbox's sender strong count equals 1, which means that no senders exist apart from the one held in the endpoint map. This check is protected with a mutex, to enter a critical section for closing the inbox while the lock is held in case the conditions are met. This is to ensure that there cannot be a race condition where a sender is cloned out right after the check in the actor returns true, but before the inbox is closed.

Logic in the endpoint map:

* When handing out senders, we acquire the shared lock, and check that the channel is not closed while the lock is held. This ensures that the actor never closes while a sender is alive. If the actor is closed, we remove the handle to the dead actor and create a new actor.
* On regular intervals (set to 60 seconds) the magicsock actor removes handles to dead actors.

## Notes & open questions

* I *think* my logic around the critical section and ensuring that we never close the actor while senders exist is sound. However, it needs careful review and tests. I'll do some thinking on how to best test this.
* Instead of employing an interval to remove dead actor handles, we could use a channel where the actor informs an outside-task which endpoint actors terminated, so that the outside-task can then lock the endpoint map and remove just those. Not sure if that's worth it.
* Another solution here might be to spawn the actor tasks into a join set in the magicsock actor. However this would need further refactoring and would likely make spawning actors async. I think I'd prefer to keep that sync because it makes the surrounding code a lot simpler.
* This does not yet implement some of the more advanced reasoning that #3638 proposes. I think we should start with something simple that prevents memory exhaustion and tweak as needed. However, it could also be argued that we should start with a more featureful design right away.
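The critical section described above can be sketched with standard-library primitives. Here an `Arc` strong count stands in for the channel sender's strong count, and a mutex-guarded flag stands in for closing the inbox. `ActorHandle`, `try_sender`, and `try_close_if_idle` are hypothetical names, and the idle timer itself is left out.

```rust
use std::sync::{Arc, Mutex};

// Hypothetical sender; cloning the Arc stands in for cloning a channel sender.
struct Sender;

struct ActorHandle {
    // Guards the "closed" flag. Both the actor and the map take this lock.
    closed: Mutex<bool>,
    // The one sender that the endpoint map always keeps.
    sender: Arc<Sender>,
}

impl ActorHandle {
    fn new() -> Self {
        ActorHandle {
            closed: Mutex::new(false),
            sender: Arc::new(Sender),
        }
    }

    /// Map side: hand out a sender only while the actor is still open.
    fn try_sender(&self) -> Option<Arc<Sender>> {
        let closed = self.closed.lock().unwrap();
        if *closed {
            None
        } else {
            // Cloned while the lock is held, so the actor cannot close
            // between the check and the clone.
            Some(Arc::clone(&self.sender))
        }
    }

    /// Actor side: close only if no sender exists besides the map's own.
    fn try_close_if_idle(&self) -> bool {
        let mut closed = self.closed.lock().unwrap();
        if Arc::strong_count(&self.sender) == 1 {
            *closed = true;
        }
        *closed
    }
}

fn main() {
    let handle = ActorHandle::new();
    let sender = handle.try_sender();
    println!("closed with a live sender: {}", handle.try_close_if_idle());
    drop(sender);
    println!("closed when idle: {}", handle.try_close_if_idle());
}
```

Dropping a sender does not take the lock, but that can only lower the count, so it cannot cause the actor to close while a sender is alive.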
…nnection closes (#3650)

## Description

Addresses #3602 for multipath.

This implements the first bullet point from @flub's suggestion in the above issue: We clear the `selected_path` once the last known connection to an endpoint closes. This means that a new connection attempt after that will instead send to all addresses again, and avoids the case where we send on e.g. the old port from a previous restart of the endpoint we're connecting to.

This turns the `test_0rtt_after_server_restart` test green.

## Notes & open questions

I *think* this will still fail if we have an "open" connection to the server that's in the process of timing out and we open a new connection to the restarted server while that's happening. I'm not sure though.

## Change checklist

- [x] Self-review.
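A minimal sketch of the behaviour described above, assuming a simplified state with string addresses. The `EndpointState` here and its `on_connection_closed` and `send_targets` methods are hypothetical and only mirror the description; only the `selected_path` name comes from the text.

```rust
use std::collections::HashSet;

// Hypothetical, simplified per-endpoint state.
#[derive(Default)]
struct EndpointState {
    known_addrs: Vec<String>,
    selected_path: Option<String>,
    connections: HashSet<u64>,
}

impl EndpointState {
    /// Forgets a connection and clears the selected path once none remain.
    fn on_connection_closed(&mut self, conn_id: u64) {
        self.connections.remove(&conn_id);
        if self.connections.is_empty() {
            self.selected_path = None;
        }
    }

    /// Addresses a new connection attempt would send to.
    fn send_targets(&self) -> Vec<String> {
        match &self.selected_path {
            Some(addr) => vec![addr.clone()],
            // Without a selected path, fall back to every known address.
            None => self.known_addrs.clone(),
        }
    }
}

fn main() {
    let mut state = EndpointState {
        known_addrs: vec!["192.0.2.1:1000".into(), "192.0.2.1:2000".into()],
        selected_path: Some("192.0.2.1:1000".into()),
        connections: HashSet::from([1]),
    };
    state.on_connection_closed(1);
    println!("{:?}", state.send_targets());
}
```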
Co-authored-by: Philipp Krüger <[email protected]>
Work on integrating n0-computer/quinn#28 into the iroh magic