
Conversation

@dmitry-tarnyagin
Contributor

Hei,

This PR introduces two optimizations aimed at boosting TLS uplink performance:

  • Key caching
    Adds optional caching for expanded session keys and IVs, eliminating costly re-expansion for every packet.

  • Configurable flush policy
    Introduces a FlushPolicy to control when the TLS layer flushes its transport.
    In Relaxed mode, it skips per-record flushes, removing TCP ACK delays and allowing TCP-level buffering to work.

Regards,
Dmitry

Cache expanded session keys that are used to encrypt and decrypt traffic.

This change is potentially unsafe and is enabled only when the "key-cache"
feature flag is set.
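The caching idea can be sketched as follows. This is an illustrative model, not the crate's actual types: `KeySchedule`, `CachedKeys`, and `expand` are hypothetical names, and the expansion step stands in for the real HKDF-Expand-Label computation.

```rust
// Illustrative sketch of the key-cache idea; names are hypothetical and
// expand() stands in for the real HKDF-Expand-Label derivation.
struct CachedKeys {
    key: [u8; 16],
    iv: [u8; 12],
}

struct KeySchedule {
    traffic_secret: [u8; 32],
    cache: Option<CachedKeys>,
    expansions: usize, // counts how many times the expensive expansion ran
}

impl KeySchedule {
    // Stand-in for HKDF-Expand-Label: derives key material from the secret.
    fn expand(&mut self) -> CachedKeys {
        self.expansions += 1;
        let mut key = [0u8; 16];
        let mut iv = [0u8; 12];
        key.copy_from_slice(&self.traffic_secret[..16]);
        iv.copy_from_slice(&self.traffic_secret[..12]);
        CachedKeys { key, iv }
    }

    // Per-record key access: without the cache every record re-expands;
    // with it, only the first access pays the derivation cost.
    fn keys(&mut self) -> &CachedKeys {
        if self.cache.is_none() {
            let derived = self.expand();
            self.cache = Some(derived);
        }
        self.cache.as_ref().unwrap()
    }
}

fn main() {
    let mut ks = KeySchedule {
        traffic_secret: [7u8; 32],
        cache: None,
        expansions: 0,
    };
    for _ in 0..100 {
        let k = ks.keys(); // one access per encrypted record
        assert_eq!(k.key.len(), 16);
        assert_eq!(k.iv.len(), 12);
    }
    assert_eq!(ks.expansions, 1); // derived once, not 100 times
    println!("ok");
}
```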

Introduce a configurable FlushPolicy to control when the TLS layer flushes its
underlying transport.

This change adds a FlushPolicy enum (Relaxed or Strict) and wires it into TlsConnection
and TlsWriter. The policy determines whether the transport’s flush() is called after
writing a TLS record.

With Strict (the default), the transport is always flushed, ensuring that data is acknowledged
or committed before continuing. This mode is compatible with existing behavior.

With Relaxed, the TLS layer closes the record and hands bytes to the transport without
forcing a flush, allowing buffered writes to improve performance and reduce latency.

This gives callers explicit control over how aggressively the transport flushes data.
It's particularly important for transports like embassy-net TCP, where flush() blocks
waiting for ACKs. The default remains Strict to preserve compatibility with embedded-tls 0.17.0,
while the Relaxed mode can be selected together with a larger socket window for improved throughput.
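The Strict vs. Relaxed write path can be modeled with a minimal sketch. The types below are simplified stand-ins, not the actual TlsWriter or transport interfaces from the crate:

```rust
// Simplified model of the flush policy: Strict flushes after every record,
// Relaxed leaves buffering to the transport. Types are illustrative only.
#[derive(Clone, Copy, PartialEq)]
enum FlushPolicy {
    Strict,
    Relaxed,
}

// A toy transport that counts writes and flushes.
struct Transport {
    written: usize,
    flushed: usize,
}

impl Transport {
    fn write(&mut self, buf: &[u8]) {
        self.written += buf.len();
    }
    fn flush(&mut self) {
        self.flushed += 1;
    }
}

struct TlsWriter {
    policy: FlushPolicy,
    transport: Transport,
}

impl TlsWriter {
    fn write_record(&mut self, record: &[u8]) {
        self.transport.write(record);
        // Strict: force the bytes out after each record (existing behavior).
        // Relaxed: let the transport (e.g. the TCP stack) decide when to send.
        if self.policy == FlushPolicy::Strict {
            self.transport.flush();
        }
    }
}

fn main() {
    let mut strict = TlsWriter {
        policy: FlushPolicy::Strict,
        transport: Transport { written: 0, flushed: 0 },
    };
    let mut relaxed = TlsWriter {
        policy: FlushPolicy::Relaxed,
        transport: Transport { written: 0, flushed: 0 },
    };
    for _ in 0..3 {
        strict.write_record(&[0u8; 16]);
        relaxed.write_record(&[0u8; 16]);
    }
    assert_eq!(strict.transport.flushed, 3); // one flush per record
    assert_eq!(relaxed.transport.flushed, 0); // no per-record flushes
    println!("ok");
}
```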
@lulf
Member

lulf commented Oct 24, 2025

Hei,

> This PR introduces two optimizations aimed at boosting TLS uplink performance:
>
> * Key caching
>   Adds optional caching for expanded session keys and IVs, eliminating costly re-expansion for every packet.
>
> * Configurable flush policy
>   Introduces a FlushPolicy to control when the TLS layer flushes its transport.
>   In Relaxed mode, it skips per-record flushes, removing TCP ACK delays and allowing TCP-level buffering to work.

I'm wondering, could we just not do per-record flushes and rely on TCP buffering? I've never tried that, so it's more of a question if you've tried it with embassy-net.

@dmitry-tarnyagin
Contributor Author

dmitry-tarnyagin commented Oct 24, 2025

> I'm wondering, could we just not do per-record flushes and rely on TCP buffering? I've never tried that, so it's more of a question if you've tried it with embassy-net.

It depends. With embassy-net as a transport you should not do flushes, as it's the stack's responsibility to send data (buffer, handle retransmits, etc.). But if you for some reason are tunneling embedded-tls over embedded-tls (why not? :)), you'll face a problem.

At the same time it's ok to flush handshakes and other request-response transactions (if any), where you have to wait for a reply anyway.

@lulf
Member

lulf commented Oct 24, 2025

I don't see any reason to explicitly flush unless there are some cases I'm missing. Thoughts, @rmja, @newAM? I'd rather just change it to not explicitly flush and avoid having the flush strategy.

@rmja
Member

rmja commented Oct 24, 2025

I guess the expectation for flush() is that it propagates all the way through the stack, and that any buffering that has occurred in an intermediate layer is delegated to the next. If some layer below embedded-tls buffers in any way, then it will not receive the flush() if we hijack it. For that reason I think something like "strict flush" is good and should be the default (like in the PR).

That being said, I can understand why you would want something like this, when you are able to control flush behavior on the below layer explicitly.
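The layering concern above can be made concrete with a toy example: if an intermediate transport only drains its buffer on flush(), then a TLS layer that swallows flush() leaves bytes stranded in that buffer. The types below are illustrative, not real embedded-io or embassy-net types:

```rust
// Toy model of a buffering layer below TLS: write() only buffers, and the
// buffered bytes reach the wire only when flush() propagates down to it.
struct Wire {
    sent: Vec<u8>,
}

struct BufferedTransport {
    buf: Vec<u8>,
    wire: Wire,
}

impl BufferedTransport {
    fn write(&mut self, data: &[u8]) {
        self.buf.extend_from_slice(data);
    }
    fn flush(&mut self) {
        // Delegate buffered bytes to the next layer, as a real flush would.
        self.wire.sent.extend_from_slice(&self.buf);
        self.buf.clear();
    }
}

fn main() {
    let mut t = BufferedTransport {
        buf: Vec::new(),
        wire: Wire { sent: Vec::new() },
    };
    t.write(b"tls record");
    // If the TLS layer never forwards flush(), nothing reaches the wire:
    assert!(t.wire.sent.is_empty());
    t.flush();
    assert_eq!(t.wire.sent, b"tls record");
    println!("ok");
}
```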

@newAM
Collaborator

newAM commented Oct 25, 2025

No opinions on the flushing. I'm not familiar enough with the underlying transports to have a solid mental model of the consequences of the flushing change.


The key cache seems logical, but is there a reason to hide it behind a feature? The memory cost of storing an IV and Key is negligible compared to the overall footprint of embedded-tls. I would prefer this to be enabled by default without a feature to disable it. Easier to maintain.

@dmitry-tarnyagin
Contributor Author

dmitry-tarnyagin commented Oct 25, 2025

@newAM

> is there a reason to hide it behind a feature?

I'm not entirely sure. On one hand, it's perfectly fine for me to cache them and even derive them immediately in calculate_traffic_secret(). On the other hand, keeping ready-to-use keys in readable memory sounds a bit risky (yes, I know the key material is already readable). I'm not a security expert, so the feature flag allows others to decide.

@newAM
Collaborator

newAM commented Oct 26, 2025

> > is there a reason to hide it behind a feature?
>
> I'm not entirely sure. On one hand, it's perfectly fine for me to cache them and even derive them immediately in calculate_traffic_secret(). On the other hand, keeping ready-to-use keys in readable memory sounds a bit risky (yes, I know the key material is already readable). I'm not a security expert, so the feature flag allows others to decide.

When it comes to features for the purposes of security my method is to describe a threat model and then evaluate if the feature is sufficient to mitigate that threat. I can't think of a threat model where a motivated attacker obtains arbitrary memory access then is stopped by key derivation, which is why I suggested this is enabled by default.

@dmitry-tarnyagin
Contributor Author

@newAM That's perfectly fine for me, I will update it w/o a feature. Thanks for the review!

@dmitry-tarnyagin force-pushed the feature/key-cache-n-flush-policy branch from 09d165e to ab0c892 on October 28, 2025 14:21
@dmitry-tarnyagin
Contributor Author

@lulf Updated, ready for review.

@lulf lulf merged commit dccd966 into drogue-iot:main Oct 28, 2025
7 checks passed