-
Notifications
You must be signed in to change notification settings - Fork 516
Attribution data (feature 36/37) #1044
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
I've started implementing it in eclair, do you have some test vectors so we can check that we are compatible? |
I don't have test vectors yet, but I can produce them. Will add them to this PR when ready. Capping the max hops at a lower number is fine to me, but do you have a scenario in mind where this would really make the difference? Or is it to more generally that everything above 8 is wasteful? |
4b48481
to
24b10d5
Compare
@thomash-acinq added a happy fat error test vector. |
24b10d5
to
76dbf21
Compare
09-features.md
Outdated
| 20/21 | `option_anchor_outputs` | Anchor outputs | IN | `option_static_remotekey` | [BOLT #3](03-transactions.md) | | ||
| 22/23 | `option_anchors_zero_fee_htlc_tx` | Anchor commitment type with zero fee HTLC transactions | IN | `option_static_remotekey` | [BOLT #3][bolt03-htlc-tx], [lightning-dev][ml-sighash-single-harmful]| | ||
| 26/27 | `option_shutdown_anysegwit` | Future segwit versions allowed in `shutdown` | IN | | [BOLT #2][bolt02-shutdown] | | ||
| 28/29 | `option_fat_error` | Can generate/relay fat errors in `update_fail_htlc` | IN | | [BOLT #4][bolt04-fat-errors] | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this big gap in the bits has emerged here because of tentative spec changes that may or may not make it. Not sure why that is necessary. I thought for unofficial extensions, the custom range is supposed to be used?
I can see that with unofficial features deployed in the wild, it is easier to keep the same bit when something becomes official. But not sure if that is worth creating the gap here? An alternative is to deploy unofficial features in the custom range first, and then later recognize both the official and unofficial bit. Slightly more code, but this feature list remains clean.
Added fat error signaling to the PR. |
76dbf21
to
2de919a
Compare
I've spent a lot of time trying to make the test vector pass and I've finally found what was wrong:
implying that we need to concatenate them in that order. But in your code you follow a different order:
I think the order message + hop payloads + hmacs is more intuitive as it matches the order of the fields in the packet. |
Oh great catch! Will produce a new vector. |
2de919a
to
bcf022b
Compare
@thomash-acinq updated vector |
Updated LND implementation with sender-picked fat error structure parameters: lightningnetwork/lnd#7139 |
bcf022b
to
6bf0729
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't see the start time and end time of the htlc_hold_time
defined. I assume these are upon receiving the update_add_htlc
and before sending the update_fulfill_htlc
/update_fail_htlc
, respectively?
Overall, I like the idea of htlc_hold_time
but suspect it'll be gamed by routing nodes.
A motivated sender can always point point what the actual forwarding delays are by bisecting the route. They also always know how long the entire route took as well. You are correct though that it's intended mainly on a best effort basis, the overall assumption is that this can be used to allow a "fast" node to distinguish themselves, and be rewarded for that. |
Attribution data is added to both failed and fulfilled HTLCs lightning/bolts#1044
In lightning#1044, we introduced a 12-blocks delay before considering a channel closed when we see a spending confirmation on-chain. This ensures that if the channel was spliced instead of closed, the channel participants are able to broadcast a new `channel_announcement` to the rest of the network. If this new `channel_announcement` is received by renote nodes before the 12-blocks delay, the channel can keep its history in path finding scoring/reputation, which is important. We then realize that 12 blocks probably wasn't enough to allow this to happen: some implementations default to 8 confirmations before sending `splice_locked`, and allow node operators to increase this value. We thus bump this delay to 72 blocks to allow more time before channels are removed from the local graph.
As @Roasbeef says, in any case the sender knows how long the entire route took. The total penalty to apply is known. Without additional information, the sender might apply that penalty equally across all nodes. The only thing the reported hold time does, is shift the penalty to better match reality. If a node reports zero, it doesn't mean that it doesn't get any penalty. It just means that the sender concludes that if that zero is true, that the outgoing side of that node is fast, and all of the to-be-distributed penalty for that node needs to be directed to its incoming side. There is some gaming potential in there, but I think it is limited and also in a routing node's interest to report realistically. |
Ok, looks like this is ready for final review now. |
ACK |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I haven't verified the test vectors.
Otherwise some small comments then this is ready to go IMO.
04-onion-routing.md
Outdated
The `update_fulfill_htlc` message contains an optional `attribution_data` field, similar to the one used in the failure | ||
case described above. The only difference is that there is no failure message that is covered by the HMACs. With | ||
attribution data, the sender can obtain hold times as reported and committed to by each hop along the payment path. This | ||
field should not be set when the node is part of a blinded route. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For errors we set this data if path_key
is not set in update_add_htlc
, which means we'll set it for introduction nodes and nodes that aren't in a blinded route at all.
To me "part of a blinded route" could include the introduction route, which would make this inconsistent with the error section. We should express the requirements in a standard way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's indeed unclear. Removed this sentence and added it to the BOLT02 requirements. I think that's better.
For the layout of `attribution data`, see [Removing an HTLC: `update_fulfill_htlc`, `update_fail_htlc`, and | ||
`update_fail_malformed_htlc`](02-peer-protocol.md#removing-an-htlc-update_fulfill_htlc-update_fail_htlc-and-update_fail_malformed_htlc). | ||
|
||
The `htlc_hold_times` field specifies the htlc hold time for each hop in units of 100 milliseconds. For example, a value |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we need to define this properly, otherwise each implementation can have its own interpretations, which can also be a privacy leak? For instance, does it start when the node receives an htlc or sends an htlc? and does it end when the node receives the failure msg or sends back the msg?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think with the 100 ms granularity, we are good now? I also wouldn't be sure that everyone would really implement it exactly the way we'd specify it here. The implementation complexity can vary quite a bit depending on which timing points are used.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also I think that once senders start to use timing info during pathfinding, routing nodes will want to optimize the way they report times in their favor, potentially ignoring what we'd write in this document now.
2. types: | ||
1. type: 1 (`attribution_data`) | ||
2. data: | ||
* [`20*u32`:`htlc_hold_times`] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand the goal of this field here.
- If people are dishonest, which they have incentives to, given there's no penalty to lie here, then this value becomes useless.
- If people are honest, this value creates privacy leaks and side channel attacks, people can use it to fingerprint software or hardware, detect node latency, or its internal state. And for the blinded path this can be used to probe?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If people are dishonest, which they have incentives to, given there's no penalty to lie here, then this value becomes useless.
Keep in mind that this value is guarded by 2 more reported values. For example, if we have a path A->B->C->D->E
over which A
pays to E
, let's assume Alice gets the hold times as 4 / 3 / 2 / 1
for B C D & E respectively. Also A
keeps track of the real elapsed time.
If C wants to lie, he can try to report a value other than 3
, but is at the same time guarded by B and D's values, which are 4
and 2
. He can under-report or over-report, and he also doesn't know what the actual values for B and D are. If he crosses the neighboring values then the sender can detect the invalid hold time and normalize/penalize accordingly.
If people are honest, this value creates privacy leaks and side channel attacks, people can use it to fingerprint software or hardware, detect node latency, or its internal state. And for the blinded path this can be used to probe?
That was the whole rationale for the 100ms buckets discussion. This (partially) helps with mitigating fingerprinting (i.e the LND 50ms batch ticker now becomes invisible) and also pushes away from over-optimizing pathfinding around low latency nodes.
For blinded paths (at least on LND) we don't populate the attribution data.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If he crosses the neighboring values then the sender can detect the invalid hold time and normalize/penalize accordingly.
B can also be dishonest here. It's likely A will see 0.4/0.3/0.2/0.1
while the measured duration is 4 seconds, so this info alone cannot be used to penalize nodes. With a more sophisticated algo and historical data A can start recognizing patterns, assuming there are incentives to stay honest.
That was the whole rationale for the 100ms buckets discussion.
That helps, but note that there's no perfect hiding as long as you are giving the info. For blinded paths this enables a side channel attack as you can get the hold time from the intro node, then you can maybe dos the suspected node in the blinded path to slow down its processing time and measure again the blinded path. This is already doable, but with the hold time it gives a much clearer signal.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
B can also be dishonest here. It's likely A will see 0.4/0.3/0.2/0.1 while the measured duration is 4 seconds, so this info alone cannot be used to penalize nodes. With a more sophisticated algo and historical data A can start recognizing patterns, assuming there are incentives to stay honest.
With this info, A knows that there is at least something slow on its connection to B and A can penalize B. The others got away this time, but in a next round - where B is no longer used - they may also receive a penalty.
@carlaKC, I've pushed a fix up addressing your comments. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ACK
92f8466
to
5d4ca40
Compare
Squashed and rebased without changes or conflicts |
Failure attribution is important to properly penalize nodes after a payment failure occurs. The goal of the penalty is to give the next attempt a better chance at succeeding. In the happy failure flow, the sender is able to determine the origin of the failure and penalizes a single node or pair of nodes.
Unfortunately it is possible for nodes on the route to hide themselves. If they return random data as the failure message, the sender won't know where the failure happened.
This PR proposes a new failure message format that lets each node commit to the failure message. If one of the nodes corrupts the failure message, the sender will be able to identify that node.
For more information, see https://lists.linuxfoundation.org/pipermail/lightning-dev/2022-October/003723.html and https://delvingbitcoin.org/t/latency-and-privacy-in-lightning/1723.
Furthermore, the htlc fail and fulfill flows are extended to convey self-reported htlc hold times to the sender.
Fail flow implementations
LND implementation: lightningnetwork/lnd#9888
LDK implementation: lightningdevkit/rust-lightning#2256
Eclair implementation: ACINQ/eclair#3065
CLN implementation: ElementsProject/lightning#8291
Fulfill flow implementations
LDK implementation: lightningdevkit/rust-lightning#3801
Eclair implementation: ACINQ/eclair#3100