Skip to content

Simple taproot channels #3103

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft
wants to merge 6 commits into
base: master
Choose a base branch
from
Draft

Simple taproot channels #3103

wants to merge 6 commits into from

Conversation

sstone
Copy link
Member

@sstone sstone commented Jun 9, 2025

This PR implements lightning/bolts#995 which introduces a new channel format where funding transactions send to an aggregated musig2 public key instead of a 2-of-2 multisig address:

funding and closing transactions become cheaper (by about 15%)
on-chain footprint becomes more private: funding and closing transactions are impossible to distinguish from other p2tr transactions.

This PR is compatible with lightning/bolts@1ad02ee with the assumption that option_simple_close is mandatory for simple taproot channels (which is not yet reflected in the BOLT proposal).

Support for dual-funding and splicing has also been implemented through extensions to the interactive transaction construction protocol.

EDIT: This PR also includes code to upgrade channels to the new taproot commitment format during a splice.

@sstone sstone force-pushed the simple-taproot-channels-v3 branch 5 times, most recently from 8292732 to 4c933f6 Compare June 19, 2025 08:02
@sstone sstone force-pushed the simple-taproot-channels-v3 branch 5 times, most recently from f45142a to 7e101ce Compare June 30, 2025 16:20
Copy link
Member

@t-bast t-bast left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's nice that this doesn't require touching too many parts of the codebase now that we've done some refactoring in previous PRs, it's easy to focus on the important taproot-related features!

Overall, I think the main issue is that there is a lot of unsafe patterns where you introduce require, casting to Right(_), .get on Options and throwing exceptions. This must be improved: we cannot risk having an exception in most of those cases, because it would mean that the actor would crash, which could potentially be exploited to prevent us from publishing penalty transactions if our peer published a revoked commitment while our channel actor constantly crashes.

One of the culprits for that is the partial signing function and the signature aggregation: we must either make the functions that call them return an Either (bubble up the failure), or if we know that we've previously validated the input and it will always work, we should change the low-level signing function to always return a partial signature without ever throwing.

Similarly, when we depend on remote nonces, we should probably use different function overloads that take a non-optional nonce (to prove that we've verified that it existed before calling the function) instead of doing a .get inside the function. That may require creating different function overloads for taproot commitment formats in some cases, but that seems acceptable?

* @return a deterministic nonce
*/
def verificationNonce(fundingTxId: TxId, fundingPrivKey: PrivateKey, index: Long): LocalNonce = {
val nonces = Musig2.generateNonceWithCounter(index, fundingPrivKey, Seq(fundingPrivKey.publicKey), None, Some(fundingTxId.value))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The publicKeys field should also include the remote funding public key, I believe it's always accessible whenever we call this function, isn't it? Same for the signingNonce function below.

Why didn't you also include the fundingTxId as extraInput in the signingNonce function below? It cannot hurt, any randomness is useful (especially since fundingTxId is regularly changed whenever we splice the channel).

@sstone
Copy link
Member Author

sstone commented Jul 1, 2025

Overall, I think the main issue is that there is a lot of unsafe patterns where you introduce require, casting to Right(_), .get on Options and throwing exceptions. This must be improved: we cannot risk having an exception in most of those cases, because it would mean that the actor would crash, which could potentially be exploited to prevent us from publishing penalty transactions if our peer published a revoked commitment while our channel actor constantly crashes.

One of the culprits for that is the partial signing function and the signature aggregation: we must either make the functions that call them return an Either (bubble up the failure), or if we know that we've previously validated the input and it will always work, we should change the low-level signing function to always return a partial signature without ever throwing.

Yes, I first wanted to get everything to work including splices and changes required for the interactive sessions, but I'll change this to safer code now.

Similarly, when we depend on remote nonces, we should probably use different function overloads that take a non-optional nonce (to prove that we've verified that it existed before calling the function) instead of doing a .get inside the function. That may require creating different function overloads for taproot commitment formats in some cases, but that seems acceptable?

I had something similar in mind.

By far the biggest issue I had was nonce management, especially with splices and re-connections, it got much easier when we switched to sending funding tx id -> nonce maps.

@sstone sstone force-pushed the simple-taproot-channels-v3 branch 2 times, most recently from 2755e8d to f74c1e6 Compare July 10, 2025 14:26
t-bast added 4 commits July 15, 2025 11:21
Before introducing channel codecs v5, we add more test data encoded with
the v4 codecs, especially for the fields we intend to change.
We introduce a new channel codec version where:

- we extract per-commitment parameters (commitment format and params
  such as `dust_limit` which we may want to dynamically update in the
  future)
- we clean-up the splice iterations on the `Commitments` structure
- we remove the funding transaction from the confirmed local funding
  status (once confirmed, we don't need it anymore: it is wasting DB
  space for no good reason, we can get it from the blockchain)
- we add a `maxClosingFeerate_opt` to the closing state, to allow
  limiting the feerate used in RBF attempts for safe transactions
And remove the unused sealed trait, the commitment format contains the
information we need so we don't need multiple child classes. We still
use a `discriminated` at the codec level though to allow changing the
format in the future if needed.
We want to be able to update channel types during splices, which means
that channel type features will not be permanent anymore. We decouple
permanent channel features (that apply to the lifetime of the channel)
from channel type features to allow this.
@sstone sstone force-pushed the simple-taproot-channels-v3 branch from f74c1e6 to 227a8e6 Compare July 15, 2025 17:36
@t-bast
Copy link
Member

t-bast commented Jul 16, 2025

Here is a (non-exhaustive) list of important tests that must be added to ensure that we have good enough coverage compared to other commitment formats:

  • add unit tests of the various stages of channel opening (for single-funding and dual-funding), verifying in particular that nonces are different every time and that we handle the cases where nonces or partial sigs are missing
  • add tests for the interactive-tx process: verify that nonces change every time a new tx_add_input / tx_add_output is added, signatures work, that we reject invalid stuff, etc
  • add tests for reconnection while a taproot interactive-tx is being signed (commit_sig / tx_signatures retransmission)
  • add tests for normal channel operations (adding HTLCs, fulfill / fail, etc)
  • verify that we correctly force-close if a commit_sig is missing partial sigs, or if commit_sig or revoke_and_ack is missing nonces
  • add tests for reconnection during normal operation:
    • disconnect when one side has sent update_add_htlc and commit_sig but the other side has not received commit_sig (should be re-signed with new nonces)
    • disconnect when one side has sent commit_sig, the other side received it and sent revoke_and_ack, which hasn't been received yet (retransmist revoke_and_ack and re-sign commit_sig)
    • resume using the channel for HTLCs (I'm not sure we correctly clear the nonces maps on disconnection yet)
  • in ClosingStateSpec.scala, there are a lot of tests that we run for each commitment format: add one instance for each of the two taproot commitment formats
  • Commitment upgrade during a splice tests:
    • add test for the normal scenario where we upgrade from anchor outputs to taproot: exchange HTLCs after the splice before the taproot tx confirms (the batch of commit_sig messages will contain a mix of taproot and segwit v0 transactions)
    • add force-close tests for various scenarios, such as:
      • our peer force-closes using the pre-taproot commitment (non-revoked, with pending HTLCs in both directions)
      • our peer force-closes using the taproot commitment (non-revoked, with pending HTLCs in both directions)
      • our peer force-closes using an active pre-taproot revoked commitment (with pending HTLCs in both directions)
      • our peer force-closes using an inactive pre-taproot revoked commitment (with pending HTLCs in both directions)
      • our peer force-closes using a revoked taproot commitment (with pending HTLCs in both directions) while we also have a non-taproot active commitment (we haven't yet received splice_locked for the taproot migration)
      • our peer force-closes using an inactive revoked taproot commitment
  • in ChannelIntegrationSpec.scala, add a test suite for the taproot commitment format

One thing that could make sense would be to update many of the existing channel unit tests that currently use the default commitment format (which we plan to remove) to instead use taproot (mostly in NormalStateSpec): this would be less effort than writing new tests, would provide some test coverage, and would make it easier to remove the default commitment format in the future.

@sstone sstone force-pushed the simple-taproot-channels-v3 branch 3 times, most recently from 1f96546 to d1659b9 Compare July 22, 2025 17:42
@sstone sstone force-pushed the simple-taproot-channels-v3 branch 4 times, most recently from 1cee368 to ed25c16 Compare July 24, 2025 18:02
@sstone sstone force-pushed the simple-taproot-channels-v3 branch 2 times, most recently from 3a4d368 to 3dadcc6 Compare July 29, 2025 08:31
We add new commitment format and TLV extensions to include musig2 nonces.

We update the single-funging,dual-funding and splicing workflows:

We extends the interactive tx session to include:
- an optional funding nonce for the shared input (i.e. the funding tx that is being spent)
- a nonce for the commit tx that is being created, and another nonce that will become the channel's "next remote nonce" once the session completes
The funding nonce is random and its lifecycle is bound to the interactive session.

revoke_and_ack is extended to include a list of funding_tx_id -> nonce tuples (one for each active commitment).

channel_restablish is extended to include the same list of funding_tx_id -> nonce tuples, as well as an optional "current commit nonce" if we got disconnected
while a splice was in progress before both nodes exchanged their commit signatures: if that is the case, we need to re-send our peer's current signature
and will use this nonce to compute it.

We also update the simple close protocol to include closing nonces.

And we allow upgrading channels to taproot during splices, with an optional channel_type TLV added to splice_init/splice_ack.
@sstone sstone force-pushed the simple-taproot-channels-v3 branch from 3dadcc6 to be3eb9e Compare July 29, 2025 14:55
@sstone
Copy link
Member Author

sstone commented Jul 29, 2025

Here is a (non-exhaustive) list of important tests that must be added to ensure that we have good enough coverage compared to other commitment formats:

  • add unit tests of the various stages of channel opening (for single-funding and dual-funding), verifying in particular that nonces are different every time and that we handle the cases where nonces or partial sigs are missing

  • add tests for the interactive-tx process: verify that nonces change every time a new tx_add_input / tx_add_output is added, signatures work, that we reject invalid stuff, etc

  • add tests for reconnection while a taproot interactive-tx is being signed (commit_sig / tx_signatures retransmission)

  • add tests for normal channel operations (adding HTLCs, fulfill / fail, etc)

  • verify that we correctly force-close if a commit_sig is missing partial sigs, or if commit_sig or revoke_and_ack is missing nonces

  • add tests for reconnection during normal operation:

    • disconnect when one side has sent update_add_htlc and commit_sig but the other side has not received commit_sig (should be re-signed with new nonces)
    • disconnect when one side has sent commit_sig, the other side received it and sent revoke_and_ack, which hasn't been received yet (retransmist revoke_and_ack and re-sign commit_sig)
    • resume using the channel for HTLCs (I'm not sure we correctly clear the nonces maps on disconnection yet)
  • in ClosingStateSpec.scala, there are a lot of tests that we run for each commitment format: add one instance for each of the two taproot commitment formats

  • Commitment upgrade during a splice tests:

    • add test for the normal scenario where we upgrade from anchor outputs to taproot: exchange HTLCs after the splice before the taproot tx confirms (the batch of commit_sig messages will contain a mix of taproot and segwit v0 transactions)

    • add force-close tests for various scenarios, such as:

      • our peer force-closes using the pre-taproot commitment (non-revoked, with pending HTLCs in both directions)
      • our peer force-closes using the taproot commitment (non-revoked, with pending HTLCs in both directions)
      • our peer force-closes using an active pre-taproot revoked commitment (with pending HTLCs in both directions)
      • our peer force-closes using an inactive pre-taproot revoked commitment (with pending HTLCs in both directions)
      • our peer force-closes using a revoked taproot commitment (with pending HTLCs in both directions) while we also have a non-taproot active commitment (we haven't yet received splice_locked for the taproot migration)
      • our peer force-closes using an inactive revoked taproot commitment
  • in ChannelIntegrationSpec.scala, add a test suite for the taproot commitment format

One thing that could make sense would be to update many of the existing channel unit tests that currently use the default commitment format (which we plan to remove) to instead use taproot (mostly in NormalStateSpec): this would be less effort than writing new tests, would provide some test coverage, and would make it easier to remove the default commitment format in the future.

Almost all of these have been implemented by creating a superclass of the original test that uses taproot channels.
Upgrading NormalStateSpec first to use anchor outputs (and then taproot channels) would be a significant change that is better done is a specific PR.
ChannelIntegrationSpec tests rely on channel announcements and there is no clean way to upgrade them for taproot channels (which for now is restricted to private channels).

@t-bast
Copy link
Member

t-bast commented Aug 8, 2025

Almost all of these have been implemented by creating a superclass of the original test that uses taproot channels.
Upgrading NormalStateSpec first to use anchor outputs (and then taproot channels) would be a significant change that is better done is a specific PR.

Unfortunately, duplicating test suites like you did wasn't a good strategy: it gave a false sense of security but wasn't actually adding coverage where we need it. I've reverted most of those changes in #3136 (with more details about the motivation behind it) and added more focused tests (which let me find many subtle bugs).

* Add more parameters to nonce generation

We provide both funding keys to the nonce generation algorithm, which
better matches the interface provided by libsecp256k1. We also provide
the `funding_txid` to the signing nonce generation to improve the extra
randomness used.

* Remove `ignoreRetransmittedCommitSig`

This isn't necessary since we now use `next_commitment_number` to
indicate whether we want a retransmission of `commit_sig` or not
on reconnection.

* Fix preimage extraction from remote HTLC-success txs

When extracting the preimage from potential HTLC-success transactions,
we remove the unnecessary check on the signature size, which could be
gamed by our peer if they sign with a different sighash than none.

* Define signature functions without `extraUtxos`

`CommitTx` and `ClosingTx` never have extra utxos, we can thus define a
dedicated function overload instead of explicitly providing `Map.empty`
in many places.

* Refactor `channel_reestablish` local nonces creation

The previous implementation didn't handle the dual-funding RBF case.
We add support for this case and clean-up the local nonces TLVs created
during `channel_reestablish`.

* Clean-up nonces and partial sigs TLVs and codecs

We add documentation to taproot TLVs and clean-up the codec files.

We clean-up the constructors for various lightning messages and better
type channel spending signatures (see `commit_sig` for example).

We remove the redundant `NextLocalNonce` TLV from `revoke_and_ack`.

We also rename closing nonces to be more clear, as local/remote only is
confusing.

We add codecs tests, as the PR didn't contain any encoding test for
the new TLVs. Writing those tests allowed us to discover a bug in the
encoding of `txId`s, which must be encoded in reverse endianness.

* Fix `channel_ready` retransmission

We were missing our local nonce when retransmitting `channel_ready` on
reconnection for taproot channels.

* Refactor closing helpers

We refactor the taproot closing helpers and fix a bug where the `closer`
TLV was filled instead of using the `closee` TLV (when signing remote
transactions in `signSimpleClosingTx`).

The diff with the base branch may look complex, but it is actually to
make the diff with `master` much smaller and easier to review, by
grouping logic that applies to both taproot and non-taproot cases.

We also move the `shutdown` creation helpers to a separate file, like
what is done for the `channel_ready` creation helper.

* nit: rename Phoenix simple taproot feature

The `tweaked` suffix doesn't seem to be useful, we're already stating
that this is specific to Phoenix.

* More detailed remote nonce exceptions

We rename the exceptions on remote nonces to more clearly apply to the
commit tx, funding tx or closing tx. We add more fields to help debug.

We remove the unnecessary log line on setting the remote commit nonces
(the messages are logged and contain the nonces TLV, which is enough
for debugging).

We clear nonces on disconnection, to avoid polluting a fresh connection
with invalid data from the previous connection.

We fix a few `channel_reestablish` bugs, where the *next* commit nonce
was used instead of the *current* commit nonce when retransmitting our
`commit_sig`.

* Simplify error handling

The previous code was calling `handleLocalError` in `spendLocalCurrent`
which created an infinite loop, since `handleLocalError` was itself
calling `spendLocalCurrent` to publish the commit tx. This was clearly
untested, otherwise this issue would have been noticed earlier. More
tests will be added in future commits to address this issue.

We can actually greatly simplify error handling during force-close by
making `fullySignedCommitTx` ignore errors, which is safe since we only
use this *after* receiving the remote `commit_sig`, which we validate
(we force-close with the previous commitment if the remote nonce or
partial sig is invalid).

We improve pattern matching to fully match on the commitment format,
which is important to ensure that when adding v3 taproot channels we
automatically get a compiler error to give us the opportunity to set
nonces correctly, otherwise we will silently miss some code paths.

We also fix the `Shutdown` state, where the remote commit nonces were
not provided to the `sendCommit` method, which means we couldn't settle
HTLCs for taproot channels after exchanging `shutdown`. This needs unit
tests as well, which will be added in future commits.

* Rename phoenix taproot commitment format and channel type

We're tweaking the official taproot channel type for Phoenix by using
HTLC transactions that have the same feerate as the commit tx. This
will only be support for Phoenix users, so we rename those cases to
explicitly mention that this is for Phoenix.

We also more explicitly only support upgrades of the commitment format
to this specific commitment format during a splice: this isn't a BOLT
mechanism so we can simply ignore malicious nodes who try to update to
something different. We can change this in the future when this becomes
a standard mechanism.

* Refactor and simplify interactive-tx for taproot

We refactor the `InteractiveTxBuilder` to:

- leverage existing signing helpers (from `Transactions.SpliceTx`)
- revert unnecessary changes (`receiveInput()`)
- add more documentation around `tx_complete` nonce management
- add early nonce validation (in `validateTx`)
- clean-up error handling

* Add interactive-tx taproot tests

The existing code duplicated the interactive-tx test suite for taproot.
This has many drawbacks:

- it runs a lot of tests twice for exactly the same codepaths
- it spins up another `bitcoind` instance, which is expensive
- it doesn't test the taproot-related stuff in enough depth
- it doesn't test taproot with `eclair-signer = true`

We improve this by reverting these test changes and instead:

- changing some of the tests to use taproot
- duplicating a couple of tests that have different behavior
  in taproot and non-taproot cases
- adding a test for the commitment upgrade during a splice

* Add missing taproot channel tests

The base branch simply duplicated some test suites for taproot without
any changes: this gave a false sense of safety, as it actually simply
ran twice a lot of tests that exercised the same code paths that didn't
have anything related to taproot. It also made the test suite slower,
since the duplicated tests were some of our slowest tests.

We revert this test suite duplication and instead add tests specific to
taproot channels. This showed that there were a lot of bugs in subtle
edge cases that weren't discovered previously:

- simple close RBF didn't work at all (nonce management was missing)
- shutting down a taproot channels that had pending HTLCs didn't work
  at all either (nonce management wasn't handled)
- reconnecting a channel during mutual close lead to a force-close
- reconnection with partially signed interactive-tx session wasn't
  properly checking the next commit nonce for the partially signed tx

We also correctly test all scenarios of revoked commitments during and
after a commitment upgrade to taproot. Thankfully, this scenario didn't
require any code changes, since a revoked commitment that doesn't spend
the latest funding transaction must be kept in either our `active` or
`inactive` commitments list, which means we have access to the format
and parameters of this specific commitment and don't need to brute-force
various commitment formats.

* Only generate local nonce if remote nonce is provided

* Verify nonces on reconnection *after* we've pruned funding transactions (#3138)

If we have not received their funding locked, we may have active commitments
that our peer has already pruned and will not send a nonce for, and which we
need to ignore when checking nonces sent in their `channel_reestablish` message.

* More refactoring of `channel_reestablish`

We iterate on the refactoring from #3138 to further simplify the channel
reestablish handler: we more explicitly return a single optional message
from helper functions when possible, and move comments back to the handler
instead of the helper functions.

---------

Co-authored-by: Fabrice Drouin <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants