-
Notifications
You must be signed in to change notification settings - Fork 513
bolt04: Multi-frame sphinx onion routing proposal #593
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Therefore the number of frames allocated to the current hop is given by `num_frames = (num_frames_and_realm >> 4) + 1`. | ||
For simplification we will use `hop_payload_len` to refer to `num_frames * FRAME_SIZE`. | ||
|
||
In order to have sufficient space to serialized the `raw_payload` into the `hop_payload` while minimizing the number of used frames the number of frames used for a single `hop_payload` MUST be equal to |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: tiny typo (serialized
should be serialize
)
1. type: `per_hop` (for `realm` 0) | ||
1. type: `hop_payload` | ||
2. data: | ||
* [`1`: `num_frames_and_realm`] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are there use-cases where one realm
could use multiple frame sizes? And if that's the case, how would the application interpret that? It seems to me that the application would always need to map num_frames
+ realm
to an object of a specific size.
I saw in another PR a proposal to rename this field packet_type
and I expected each packet type to have a constant, fully-specified length right? If that's the case we could still rename this field to packet_type
(instead of num_frames_and_realm
), keep it a full byte, and the packet value would implicitly let the application know the number of frames (and the raw_payload_len
).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe I'm mislead because I haven't dived enough in the proposals to integrate TLV. Did you make it this way to let TLV handle the definition of the raw_payload
? But even then, it seems that encoding the num_frames
in this field would be redundant with the length encoded in the TLV...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is indeed intentional that we have two fields in a single byte. num_frames
are the MSB 4 bits, whereas the realm
, or packet_type
as it is called elsewhere, consists of the lower 4 bits. The reason why we are handling it this way is that the former realm byte is the only byte that has contained metadata about the payload so far, and we keep it this way for simplicity. It still allows for a total of 16 realms/packet types/payload types and up to 15 additional frames. The latter may appear limiting, however remember that the onion construction is limited to 20 frames in total anyway, so being able to use 16 (1 default + 15 additional ones) for a single payload seems extremely nice to me 😉
The reason we don't defer the determination of the length to the TLV payload itself, is that it'd be a layering violation (the payload would be telling us its length, and therefore how many frames to use in the onion which is the payload's transport layer). It'd also require us to parse the content before being able to forward the packet, which is a timing side-channel on the content of the payload.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the detailed explanation.
The layer violation makes a lot of sense, I get why you made that choice now.
I'm completely fine with the proposal then, I believe the simplicity and nice layering separation make up for the small amount of potentially wasted bytes (in the padding).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll start working on an implementation in scala for eclair ;)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so being able to use 16 (1 default + 15 additional ones) for a single payload seems extremely nice to me
But it doesn't allow a node to use all payloads for something like a single hop route containing the max amount of information. I don't think we need to unnecessarily paint our selves into a corner with limiting work arounds like this. Instead, the realm can stay as it is, and a byte of the hop payload be used for type purposes. This would also allow for "type name-spacing", as I can send you frames, each using a distinct TLV namespace. The current proposal doesn't allow such composition, and seems to assume that 255 types will be enough for the foreseeable future.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead, the realm can stay as it is, and a byte of the hop payload be used for type purposes
The issue with that is that it breaks the current component layering.
What's nice with having the frame count in the first byte is that the sphinx layer knows that it only has to look at this first byte to split the packet correctly and pass it off to other components for actual interpretation of the plaintext payload.
If we want to put the frame count in another byte than the first one, it will be a breaking change (and we can't have backwards-compatibility so it's going to be a lot harder to deploy).
We said during yesterday's meeting that we will eventually need to make some changes around version management for the onion packet (to support per hop versioning). I think at that point we will introduce a version 1 with breaking changes, and we can re-visit the onion structure entirely to address these shortcomings. Does that sound acceptable?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we want to put the frame count in another byte than the first one, it will be a breaking chang
I don't see why this is the case. Today, we have 13 unused bytes that are "padding", I'm proposing using one of those bytes as the number of frames, and using the realm byte as the packet type. Older nodes have their behavior unchanged as that byte is still zero for them. For any older nodes that don't signal this new feature, you wouldn't include any EOB data at all.
The current split also means we only have 16 possible packet types. In this day and age, I think we can all agree that's far too little for any sort of reasonable extensibility. If we use the entire realm/packet type, then we at least gain 255 types, which can then be augmented by having distinct TLV namespaces under each of those packet types.
ephemeralPrivKey := btcec.PrivKeyFromBytes(btcec.S256(), ephemeralKey.Bytes()) | ||
ephemeralPubKey := ephemeralPrivKey.PubKey() | ||
// NewOnionPacket creates a new onion packet which is capable of | ||
// obliviously routing a message through the mix-net path outline by |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: typo: outlined
I'm especially interested in simple multi-part payments, is there a discussion about them anywhere? |
they're rather simple, since we just need to define a single TLV key which is used to specify the total amount to wait for. |
overlay network): more specifically, _sending peers_ forward packets | ||
to_receiving peers_. | ||
- Route length: the maximum route length is limited to 20 hops. | ||
- The longest route supported has 20 hops without counting the _origin node_ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps we should differentiate between the objective and subjective routing lengths? So in other words, the "true" route length (number of nodes that the HTLC will travel through), and the packet-level route length (20 max).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I removed all mentions of that magic 20.
- The terms _hop_ and _node_ are sometimes used interchangeably, but a _hop_ | ||
usually refers to an intermediate node in the route rather than an end node. | ||
_origin node_ --> _hop_ --> ... --> _hop_ --> _final node_ | ||
usually refers to an intermediate node in the route rather than an end node. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that prior example was helpful, it aides in identifying that a route with 3 nodes has 2 hops for example.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Re-added.
@@ -150,52 +155,75 @@ The overall structure of the packet is as follows: | |||
2. data: | |||
* [`1`:`version`] | |||
* [`33`:`public_key`] | |||
* [`20*65`:`hops_data`] | |||
* [`20*FRAME_SIZE`:`hops_data`] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps we should also lift 20
into a MAX_HOPS
constant while we're at it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since FRAME_SIZE
(65 bytes) is fixed and hops_data
length is fixed too (1300 bytes), I suggest FRAME_COUNT
(max doesn't really make sense because it implies we could use less, which would result in a smaller onion which isn't allowed).
1. type: `per_hop` (for `realm` 0) | ||
1. type: `hop_payload` | ||
2. data: | ||
* [`1`: `num_frames_and_realm`] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What happened to the packet type? IMO we still shouldn't intermingle them, as it's feasible that a distinct realm may also warrant an entirely different interpretation of the TLV values.
1. type: `per_hop` (for `realm` 0) | ||
1. type: `hop_payload` | ||
2. data: | ||
* [`1`: `num_frames_and_realm`] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so being able to use 16 (1 default + 15 additional ones) for a single payload seems extremely nice to me
But it doesn't allow a node to use all payloads for something like a single hop route containing the max amount of information. I don't think we need to unnecessarily paint our selves into a corner with limiting work arounds like this. Instead, the realm can stay as it is, and a byte of the hop payload be used for type purposes. This would also allow for "type name-spacing", as I can send you frames, each using a distinct TLV namespace. The current proposal doesn't allow such composition, and seems to assume that 255 types will be enough for the foreseeable future.
receiving peer in the route. | ||
- The `hops_data` field is right-shifted by `hop_payload_len` bytes, discarding the last `hop_payload_len` bytes that exceed its 1300-byte size. | ||
- The payload for the hop is serialized into that hop's `raw_payload`, using the desired format, and the `num_frames_and_realm` is set accordingly. | ||
- The `num_frames_and_realm`, `raw_payload`, `padding` and `HMAC` are copied into the first `hop_payload_len` bytes of the `hops_data`, i.e., the bytes that were just shifted in. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't this be hop_payload_len_i
?
// Before we assemble the packet, we'll shift the current | ||
// mix-header to the right in order to make room for this next | ||
// per-hop data. | ||
rightShift(mixHeader[:], path[i].HopPayload.CountFrames()*frameSize) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could use a comment here along the lines:
CountFrames() returns the total number of frames (factoring in any needed padding) required to fully encode the hop payload.
@@ -0,0 +1,43 @@ | |||
{ | |||
"comment": "A simple error returned by node hops[4], the public/private keys and shared_secrets are identical to the ones used in `onion-test-v0.json`", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yay JSON!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know 😉 But it was the simplest way I could figure out to add a JSON-compatible comment.
The clarifications were tacked on after the fact, but they should really be part of the conventions. I also updated the links to use the reference style, which results in better text flow and makes it easier to read the source. Signed-off-by: Christian Decker <[email protected]>
Introduces `frame`s and all related concepts such as `FRAME_SIZE`. It also fixes a conceptual issue where `hops_data` would be used both for the section of the overall packet as well as the wire format for individual payloads. This separation now lends itself better to the incremental wrapping of the onion. Signed-off-by: Christian Decker <[email protected]>
Signed-off-by: Christian Decker <[email protected]>
They were cluttering the spec document, and not easily integrated into the tests for implementations. This just cleans up the spec and allows automated testing going forward. Signed-off-by: Christian Decker <[email protected]>
Signed-off-by: Christian Decker <[email protected]>
(Sorry, I could not reply to the comment inline for some reason, so adding
I think you meant keeping the realm byte the same (maybe renaming it to packet While I understand your objection to limiting the number of frames we can use
TBH I'm personally not a big fan of this kind of byte-stuffing (splitting a
With the current proposal we still have 15 packet types ( |
Yeah exactly!
This is a completely arbitrary distinction, and it's also how things work in this proposal as is (they need to check the realm to know if there're any additional frames). It's also possible to keep the realm as is, and use 2 of the filler bytes (type and number of frames). It really depends on what demarcates the packet header, and the payload itself. I'm proposing the "packet header" stops at the 2 extra (or just 1) padding byte.
I disagree, it's just about lifting arbitrary limitations like only having 16 packet types or being able to only use 16 frames. It's up the developers what this space will be used for, not us, and we shouldn't make value judgements on certain classes of use cases that don't have an averse effect on network health.
OK, so we're partially on the same page here. The distinction of a "layer" in this case is completely arbitrary. It's all the same packet, the only distinction is what we deem to be the packet header vs the body/payload. 15 packets type are simply not enough.
And by using another byte or two, we can avoid this bit packing, and gain access to 255 packet types and 65K total TLV types. IMO it's short sighted to conclude that 15 types is anywhere near enough given all the developer excitement around this feature (at the SF Lightning Dev meetup last night, multiple devs asked me when the EOB space was finally going to be opened up). |
It is not arbitrary. I want to separate the processing of the onion as much from the processing of the payload as possible. If we use 2 bytes as you suggest the second byte is both in the payload region (legacy
But it is our job to ensure the incentives are set correctly, otherwise we end up with a network that is full of single-hop payments, and we might as well strike the "network" part off the LN name.
Like I mentioned above it is not an arbitrary distinction, and 16 payload types ought to be enough for the foreseeable future. |
Wouldn't that concern be better addressed by making the type in the TLV definition a Notice also that my prior observation about having Another compromise then would be to use 3 bytes for packet-type (given that we now have 4 billion possible TLV-types, we don't need the hack of using packet-type bits to namespace the TLV-types), then we could use 5 bits for the |
For example, @cdecker used a FRAME_SIZE constant in lightning#593 which broke parsing. Signed-off-by: Rusty Russell <[email protected]>
I seem to have forgotten this the first time around, so here it goes.
The arbitrary line is what counts as the "payload" and what constitutes the "packet header". In my model, the payload starts after these 1 or 2 bytes which were previously the padding.
A var in with what max size? In any case, even with more bytes for the TLV types themselves, having packet types be able to modify the namespace is still warranted, as it creates an explicit new context for any TLV namespaces.
If each frame specifies a packet type, then there's no issue here.
Which 3 bytes? Why's this better than just using 1 byte for the packet type and 1 for the number of frames? |
Do you really care whether we use 1, 2 or 4 bytes? All foreseeable use-cases would be 1 byte or 2 bytes anyway, and if developers want to waste extra space encoding TLV custom types they can. I'm now convinced that varint types for TLV are way more sensible because they keep the type information in one location instead of being spread across a realm and a type byte. If you really want namespacing we can define ranges in the TLV type number space, but let's not spread type information into two fields that have very different scopes.
Please, just no. That'd mean we have to split any payload that doesn't align with the 65 byte frame boundary in order to be able to have that special realm byte, and we'd lose the nice property of just being able to read memory instead of re-assembling the data. The varint TLV types are much cleaner in this case.
My bad, I meant bits. So the split of the realm byte (8 bits) would be 3 bits for packet types and 5 bits for |
Given the limited space in the payload, it's a pertinent question as it also determines the total number of types as well as lost space due to signalling overhead. I'm OK with just extending the number of types rather than splitting the context across another field. Though this is out of the scope of this PR as it's a matter of the TLV format in the onion.
Hmm, down to 8 packet types...I guess it all depends on what we think the packet types will be used for in the future. Rusty's modified proposal introduces the concept of using them to re-define the format of the existing hop payload. Beyond that we also have the onion packet type itself to modulate which grants additional freedom. |
07-routing-gossip.md
Outdated
@@ -280,6 +280,13 @@ The following `address descriptor` types are defined: | |||
`[32:32_byte_ed25519_pubkey] || [2:checksum] || [1:version]`, where | |||
`checksum = sha3(".onion checksum" | pubkey || version)[:2]`. | |||
|
|||
The following feature bits are currently assigned by this specification: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: shouldn't feature flags be speced in Bolt 9 instead of scattered accross bolts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch, I misread bolt 09 to be specific for globalfeatures
and localfeatures
. Fixup appended.
Just to make sure I understood properly yesterday's decision in the IRC meeting: if we were to merge this version of the PR, we would do a follow-up PR to spec that That would address the discrepancy between this PR and #604 (in #604 this 1-frame TLV payload is realm value 0x01 too). |
Signed-off-by: Christian Decker <[email protected]>
Yes, that's the plan. The followup would define these realm bytes for TLV payloads:
Due to the switch to a 5bit + 3 bit split the numbers don't align on the nibble in the hex representation anymore, but that's just cosmetic at this point 😉 |
Perfect that makes total sense, I'm on board with that. |
Ack from me, this is good. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
coming along nicely :)
and `r` 65-byte `per_hop` payloads. The construction returns a single 1366-byte | ||
packet along with the first receiving peer's address. | ||
packet. | ||
Constructing a packet routed over `r` hops requires `r` 32-byte ephemeral public keys, `r` 32-byte shared secrets, `r` 32-byte blinding factors, and `r` `hop_payload` payloads (each of size `hop_payload_len_i` bytes). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks to be a preexisting inaccuracy. shouldn't this read "33-byte ephemeral public keys"?
@@ -511,22 +535,20 @@ Next, the processing node uses the shared secret to compute a _mu_-key, which it | |||
in turn uses to compute the HMAC of the `hops_data`. The resulting HMAC is then | |||
compared against the packet's HMAC. | |||
|
|||
Comparison of the computed HMAC and the packet's HMAC MUST be | |||
time-constant to avoid information leaks. | |||
Comparison of the computed HMAC and the packet's HMAC MUST be time-constant to avoid information leaks. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also preexisting: should time-constant be constant-time? suppose either is correct, just never hear it described this way :P
ACK from me too, shall we proceed? Or do we need to wait for next monday's meeting? |
@t-bast probably wait till the meeting. Typically hold off on merges unless it falls under the typo rule |
This is needed due to the split into 5 bits + 3 bits, since the previous test-vector was using the 4 bits + 4 bits split. Signed-off-by: Christian Decker <[email protected]>
Ok, just updated the test-vector to reflect the 5 num_frames bits + 3 packet-type bits ^^ |
Have we considered dropping the 65-byte frames and instead always assuming a single 65 byte frame, then making the remainder variable length? The 65-byte frames can be pretty wasteful for payloads that need a lot of padding. |
@Roasbeef this is a much bigger change that has a potential impact on security (bigger deviation from the Sphinx paper). This is an interesting idea to explore, but could we shelve that for future improvements? |
I understand the security argument behind maintaining a fixed-sized packet during transit. I'm not suggesting making the packet variable sized. Instead, curious as to if there's anything fundamental that requires us to maintain a 65-byte frame size (what stops a 2-byte frame size as an example assuming padding is handled properly at the crafter, and destination?). As is now, the padding can regularly cause a payload to consume an entire hops worth of data, when it may only need a few bytes from the current frame size. |
Got it. I think there's nothing that ties us to a 65-byte frame size as long as the whole packet is fixed-size (and in my trampoline onion proposal I suggest an 80-byte frame size for the trampoline onion). It might be a bit more expensive to create the onion if we use smaller frames, but that would probably not be noticeable (to be tested). |
As discussed during the last specification meeting I will go back, and make the hop size fluid by using a Leaving this one open just in case... See meeting notes for reference. |
Following the discussion on the mailing list, and the two implementations in Go and C, I went ahead and drafted the proposed specification changes.
Aside from some copy-editing and reformatting the changes include:
frame
as a 65 byte size unit that can be used to serialize data in (and the associatedFRAME_SIZE
constant to make it easier to spot).num_frames_and_realm
for the formerrealm
byte which is being reused to specify both type/realm as well as how many additionalframe
s should be consumed by the processing node.raw_payload
which is the serialized information that we want to communicate to the processing node in the first place.hop_payload
to refer to the the payload composed of multipleframe
s, that includes thenum_frames_and_realm
byte, theraw_payload
, any padding, and theHMAC
for the next hop.raw_payload
parsing from the pure data extraction from thehop_payload
, since we will soon have multiple ways to parse it once the TLV proposal is live (and I'd like to retire the v1hop_data
format in favor of using the more flexible TLV in the onion).hop_data
format.Notice that this proposal is perfectly backwards compatible, given that we just rename some of the concepts that are already present and make the two shifts variable.
The reason I'd like to get the ball rolling again, is that there hasn't been a lot of pushback, and there are now quite a few nice features that rely on the ability of transporting more information in the onion, including rendezvous routing, spontaneous payments, simple multi-part payments, and trampoline payments (outsourcing route finding to some other node).
Finally these are the PRs that implement this proposal:
Closes #586