feat: Optimize `neqo-crypto` #2832

larseggert · 2025-08-07T09:44:41Z

Similar to #2827

larseggert · 2025-08-07T09:44:55Z

neqo-crypto/src/agent.rs

@@ -459,9 +459,9 @@ impl SecretAgent {
            alert: Box::pin(None),
            now: TimeHolder::default(),

-            extension_handlers: Vec::new(),
+            extension_handlers: Vec::with_capacity(4), // Typical number of TLS extensions


Likely too small.

WAT. We add ONE.

We could move to SmallVec<[_; 1]>, I guess.
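
To make the point about the handler list concrete, here is a minimal sketch using only the standard library. The `Handler` alias and the closure are hypothetical stand-ins, not the real neqo types. It shows that an empty `Vec` reserves nothing until the first push, so pre-sizing to 4 mostly matters if several handlers are actually added.

```rust
/// Hypothetical stand-in for whatever boxed handler type the agent stores.
type Handler = Box<dyn Fn(&[u8]) -> bool>;

/// Returns the capacity of an initially empty `Vec` before and after
/// pushing `n` handlers onto it.
fn capacity_before_and_after(n: usize) -> (usize, usize) {
    let mut handlers: Vec<Handler> = Vec::new();
    let before = handlers.capacity(); // `Vec::new()` does not allocate
    for _ in 0..n {
        handlers.push(Box::new(|data: &[u8]| !data.is_empty()));
    }
    (before, handlers.capacity())
}

/// Capacity reserved up front when pre-sizing for `n` handlers.
fn presized_capacity(n: usize) -> usize {
    Vec::<Handler>::with_capacity(n).capacity()
}
```

With a single handler, the only difference between the two approaches is when the one allocation happens. An inline-storage container such as the `SmallVec` mentioned above would avoid the heap allocation entirely, at the cost of an extra dependency in this spot.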

codecov · 2025-08-07T09:50:21Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 94.93%. Comparing base (6942acc) to head (86baaea).
⚠️ Report is 49 commits behind head on main.

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #2832      +/-   ##
==========================================
- Coverage   94.93%   94.93%   -0.01%     
==========================================
  Files         115      115              
  Lines       34425    34425              
  Branches    34425    34425              
==========================================
- Hits        32682    32680       -2     
  Misses       1736     1736              
- Partials        7        9       +2

Components	Coverage Δ
neqo-common	`97.73% <ø> (ø)`
neqo-crypto	`89.91% <100.00%> (ø)`
neqo-http3	`93.72% <ø> (ø)`
neqo-qpack	`95.45% <ø> (ø)`
neqo-transport	`95.94% <ø> (-0.02%)`	⬇️
neqo-udp	`89.85% <ø> (ø)`

github-actions · 2025-08-07T10:08:11Z

Failed Interop Tests

QUIC Interop Runner, client vs. server, differences relative to 853e4be.

neqo-latest as client

neqo-latest as server

All results

Succeeded Interop Tests

QUIC Interop Runner, client vs. server

neqo-latest as client

neqo-latest vs. aioquic: H DC LR C20 M S R Z 3 B U A L1 L2 ⚠️C1 C2 6 V2 BP BA
neqo-latest vs. go-x-net: H DC LR M B U A L2 C2 6
neqo-latest vs. haproxy: H DC LR C20 M S R Z 3 B U A L1 L2 C1 C2 6 V2
neqo-latest vs. kwik: H DC LR C20 M S R Z 3 B U A L2 C2 6 V2
neqo-latest vs. linuxquic: H DC LR C20 M S R Z 3 B U E A L2 C2 6 V2 BP BA CM
neqo-latest vs. lsquic: H DC LR C20 M S R Z 3 B U A L2 C2 6 V2 BP BA CM
neqo-latest vs. msquic: H DC LR C20 M S R Z B U L2 C2 6 V2 BP BA
neqo-latest vs. mvfst: H DC LR M R Z 3 B U L2 ⚠️C1 C2 6 BP
neqo-latest vs. neqo: H DC LR C20 M S R Z 3 B U E L1 L2 C1 C2 6 V2 BP BA CM
neqo-latest vs. neqo-latest: H DC LR C20 M S R Z 3 B U E L1 L2 C1 C2 6 V2 BP BA CM
neqo-latest vs. nginx: H DC LR C20 M S R Z 3 B U A L1 L2 C1 C2 6
neqo-latest vs. ngtcp2: H DC LR C20 M S R Z 3 B U A L1 L2 C1 C2 6 V2 BP BA
neqo-latest vs. picoquic: H DC LR C20 M S R 🚀Z 3 B U 🚀L1 L2 C2 6 V2 BP BA
neqo-latest vs. quic-go: H DC LR C20 M S R Z 3 B U L1 L2 C1 C2 6 BP BA
neqo-latest vs. quiche: H DC LR C20 M S R Z 3 B U A L1 L2 C1 C2 6
neqo-latest vs. quinn: H DC LR C20 M S R Z 3 B U E L1 L2 C1 C2 6 BP 🚀BA
neqo-latest vs. s2n-quic: H DC LR C20 M S R 3 B U A L1 L2 C1 C2 6 BP
neqo-latest vs. tquic: H DC LR C20 M R Z 3 B U 🚀A L1 L2 C1 C2 6
neqo-latest vs. xquic: H DC LR C20 M R ⚠️Z 3 B U L2 C2 6 BP BA

neqo-latest as server

aioquic vs. neqo-latest: H DC LR C20 M S R Z 3 B A L1 L2 C1 C2 6 V2 BP BA
chrome vs. neqo-latest: 3
go-x-net vs. neqo-latest: H DC LR M B U A L2 C2 6 BP BA
kwik vs. neqo-latest: H DC LR C20 M S R Z 3 B U A L1 L2 C1 C2 6 V2
linuxquic vs. neqo-latest: H DC LR C20 M S R Z 3 B U E A L1 L2 C1 C2 6 V2 BP BA CM
lsquic vs. neqo-latest: H DC LR C20 M S R 3 B E A L1 L2 C1 C2 6 V2 BP BA CM
msquic vs. neqo-latest: H DC LR C20 M S R Z B A L1 L2 C1 C2 6 V2 BP BA
mvfst vs. neqo-latest: H DC LR M 3 B L2 C2 6 BP BA
neqo vs. neqo-latest: H DC LR C20 M S R Z 3 B U E L1 L2 C1 C2 6 V2 BP BA CM
ngtcp2 vs. neqo-latest: H DC LR C20 M S R Z 3 B U E A L1 L2 C1 C2 6 V2 BP BA CM
openssl vs. neqo-latest: H DC C20 S R 3 B L2 C2 6 BP BA
picoquic vs. neqo-latest: H DC LR C20 M S R Z 3 B U E A L1 L2 C1 C2 6 V2 BP BA CM
quic-go vs. neqo-latest: H DC LR C20 M S R Z 3 B U A L1 L2 C1 C2 6 BP ⚠️BA
quiche vs. neqo-latest: H DC LR M S R Z 3 B A L1 L2 C1 C2 6 BP BA
quinn vs. neqo-latest: H DC LR C20 M S R Z 3 B U E A ⚠️L1 L2 C1 C2 6 BP BA
s2n-quic vs. neqo-latest: H DC LR M S R 3 B E A L1 L2 C1 C2 6 BP BA
tquic vs. neqo-latest: H DC LR M S R Z 3 B A L1 L2 C1 C2 6 BP BA
xquic vs. neqo-latest: H DC LR C20 S R Z 3 B U A L1 L2 C1 C2 6 BP BA

Unsupported Interop Tests

QUIC Interop Runner, client vs. server

neqo-latest as client

neqo-latest vs. aioquic: E CM
neqo-latest vs. go-x-net: C20 S R Z 3 E L1 C1 V2 CM
neqo-latest vs. haproxy: E CM
neqo-latest vs. kwik: E CM
neqo-latest vs. msquic: 3 E CM
neqo-latest vs. mvfst: C20 S E V2 CM
neqo-latest vs. nginx: E V2 CM
neqo-latest vs. picoquic: CM
neqo-latest vs. quic-go: E V2 CM
neqo-latest vs. quiche: E V2 CM
neqo-latest vs. quinn: V2 CM
neqo-latest vs. s2n-quic: Z V2
neqo-latest vs. tquic: E V2 CM
neqo-latest vs. xquic: S E V2 CM

neqo-latest as server

aioquic vs. neqo-latest: U E
chrome vs. neqo-latest: H DC LR C20 M S R Z B U E A L1 L2 C1 C2 6 V2 BP BA CM
go-x-net vs. neqo-latest: C20 S R Z 3 E L1 C1 V2
kwik vs. neqo-latest: E
lsquic vs. neqo-latest: Z U
msquic vs. neqo-latest: 3 E
mvfst vs. neqo-latest: C20 S R U E V2
openssl vs. neqo-latest: Z U E L1 C1 V2
quic-go vs. neqo-latest: E V2
quiche vs. neqo-latest: C20 U E V2
s2n-quic vs. neqo-latest: C20 Z U V2
tquic vs. neqo-latest: C20 U E V2
xquic vs. neqo-latest: E V2

github-actions · 2025-08-07T10:14:12Z

Benchmark results

Performance differences relative to 6942acc.

1-conn/1-100mb-resp/mtu-1504 (aka. Download)/client: Change within noise threshold.

       time:   [207.82 ms 208.22 ms 208.76 ms]
       thrpt:  [479.02 MiB/s 480.26 MiB/s 481.20 MiB/s]
change:
       time:   [+0.8831% +1.1574% +1.4612%] (p = 0.00 < 0.05)
       thrpt:  [−1.4402% −1.1441% −0.8754%]
Found 1 outliers among 100 measurements (1.00%)

1 (1.00%) high severe

1-conn/10_000-parallel-1b-resp/mtu-1504 (aka. RPS)/client: No change in performance detected.

       time:   [300.85 ms 302.28 ms 303.73 ms]
       thrpt:  [32.924 Kelem/s 33.082 Kelem/s 33.239 Kelem/s]
change:
       time:   [−0.9210% −0.2502% +0.4280%] (p = 0.49 > 0.05)
       thrpt:  [−0.4262% +0.2508% +0.9296%]
Found 2 outliers among 100 measurements (2.00%)

2 (2.00%) high mild

1-conn/1-1b-resp/mtu-1504 (aka. HPS)/client: No change in performance detected.

       time:   [28.212 ms 28.382 ms 28.569 ms]
       thrpt:  [35.003   B/s 35.234   B/s 35.446   B/s]
change:
       time:   [−0.4317% +0.2965% +1.0421%] (p = 0.42 > 0.05)
       thrpt:  [−1.0314% −0.2956% +0.4335%]
Found 10 outliers among 100 measurements (10.00%)

10 (10.00%) high severe

1-conn/1-100mb-req/mtu-1504 (aka. Upload)/client: 💔 Performance has regressed.

       time:   [210.99 ms 211.34 ms 211.72 ms]
       thrpt:  [472.32 MiB/s 473.18 MiB/s 473.95 MiB/s]
change:
       time:   [+3.3262% +3.5402% +3.7802%] (p = 0.00 < 0.05)
       thrpt:  [−3.6425% −3.4191% −3.2192%]
Found 3 outliers among 100 measurements (3.00%)

2 (2.00%) high mild

1 (1.00%) high severe

decode 4096 bytes, mask ff: Change within noise threshold.

       time:   [11.790 µs 11.814 µs 11.847 µs]
       change: [+0.7866% +1.2525% +1.7080%] (p = 0.00 < 0.05)
Found 11 outliers among 100 measurements (11.00%)

3 (3.00%) low severe

4 (4.00%) low mild

4 (4.00%) high severe

decode 1048576 bytes, mask ff: No change in performance detected.

       time:   [3.0246 ms 3.0413 ms 3.0645 ms]
       change: [+0.0551% +0.7454% +1.5034%] (p = 0.05 > 0.05)
Found 12 outliers among 100 measurements (12.00%)

2 (2.00%) low mild

10 (10.00%) high severe

decode 4096 bytes, mask 7f: 💔 Performance has regressed.

       time:   [19.924 µs 19.965 µs 20.014 µs]
       change: [+1.7722% +2.5362% +3.1082%] (p = 0.00 < 0.05)
Found 13 outliers among 100 measurements (13.00%)

2 (2.00%) low mild

1 (1.00%) high mild

10 (10.00%) high severe

decode 1048576 bytes, mask 7f: Change within noise threshold.

       time:   [5.0515 ms 5.0682 ms 5.0889 ms]
       change: [−0.9776% −0.5504% −0.0766%] (p = 0.01 < 0.05)
Found 16 outliers among 100 measurements (16.00%)

16 (16.00%) high severe

decode 4096 bytes, mask 3f: 💔 Performance has regressed.

       time:   [8.2650 µs 8.2975 µs 8.3365 µs]
       change: [+47.173% +48.696% +50.007%] (p = 0.00 < 0.05)
Found 18 outliers among 100 measurements (18.00%)

5 (5.00%) low mild

5 (5.00%) high mild

8 (8.00%) high severe

decode 1048576 bytes, mask 3f: 💚 Performance has improved.

       time:   [1.5855 ms 1.5911 ms 1.5980 ms]
       change: [−9.8134% −9.4979% −9.1085%] (p = 0.00 < 0.05)
Found 8 outliers among 100 measurements (8.00%)

1 (1.00%) high mild

7 (7.00%) high severe

coalesce_acked_from_zero 1+1 entries: No change in performance detected.

       time:   [88.774 ns 89.072 ns 89.374 ns]
       change: [−0.3476% +0.1094% +0.5901%] (p = 0.65 > 0.05)
Found 9 outliers among 100 measurements (9.00%)

7 (7.00%) high mild

2 (2.00%) high severe

coalesce_acked_from_zero 3+1 entries: No change in performance detected.

       time:   [106.35 ns 106.89 ns 107.55 ns]
       change: [−1.5089% +0.4484% +2.9321%] (p = 0.76 > 0.05)
Found 12 outliers among 100 measurements (12.00%)

12 (12.00%) high severe

coalesce_acked_from_zero 10+1 entries: No change in performance detected.

       time:   [105.53 ns 105.84 ns 106.24 ns]
       change: [−0.3284% +0.0960% +0.5412%] (p = 0.68 > 0.05)
Found 10 outliers among 100 measurements (10.00%)

3 (3.00%) low mild

2 (2.00%) high mild

5 (5.00%) high severe

coalesce_acked_from_zero 1000+1 entries: No change in performance detected.

       time:   [89.364 ns 89.473 ns 89.612 ns]
       change: [−0.9734% +0.0520% +0.9473%] (p = 0.93 > 0.05)
Found 10 outliers among 100 measurements (10.00%)

5 (5.00%) high mild

5 (5.00%) high severe

RxStreamOrderer::inbound_frame(): Change within noise threshold.

       time:   [107.74 ms 107.81 ms 107.88 ms]
       change: [−1.0131% −0.9202% −0.8238%] (p = 0.00 < 0.05)
Found 11 outliers among 100 measurements (11.00%)

9 (9.00%) low mild

1 (1.00%) high mild

1 (1.00%) high severe

sent::Packets::take_ranges: 💚 Performance has improved.

       time:   [5.0679 µs 5.1373 µs 5.1998 µs]
       change: [−42.082% −35.482% −24.086%] (p = 0.00 < 0.05)
Found 3 outliers among 100 measurements (3.00%)

3 (3.00%) high severe

transfer/pacing-false/varying-seeds: Change within noise threshold.

       time:   [37.786 ms 37.869 ms 37.952 ms]
       change: [+1.3178% +1.6318% +1.9647%] (p = 0.00 < 0.05)
Found 1 outliers among 100 measurements (1.00%)

1 (1.00%) high mild

transfer/pacing-true/varying-seeds: Change within noise threshold.

       time:   [38.673 ms 38.789 ms 38.908 ms]
       change: [+1.1776% +1.6292% +2.0804%] (p = 0.00 < 0.05)
Found 2 outliers among 100 measurements (2.00%)

2 (2.00%) high mild

transfer/pacing-false/same-seed: Change within noise threshold.

       time:   [37.351 ms 37.415 ms 37.480 ms]
       change: [+1.5200% +1.7767% +2.0321%] (p = 0.00 < 0.05)

transfer/pacing-true/same-seed: Change within noise threshold.

       time:   [39.053 ms 39.149 ms 39.251 ms]
       change: [+1.1312% +1.5003% +1.8620%] (p = 0.00 < 0.05)
Found 1 outliers among 100 measurements (1.00%)

1 (1.00%) high severe
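
For reading the `time` and `thrpt` lines together: throughput is inversely proportional to time, so a relative time change `t` implies a throughput change of `1 / (1 + t) - 1`. The sketch below checks this against the Download figures above. The function names are illustrative, and the 100 MiB payload size is an assumption inferred from the benchmark name and the reported numbers.

```rust
/// Relative throughput change implied by a relative time change.
/// Both values are fractions, e.g. 0.011574 for +1.1574%.
fn throughput_change(time_change: f64) -> f64 {
    1.0 / (1.0 + time_change) - 1.0
}

/// Throughput in MiB/s for `mib` mebibytes transferred in `millis` milliseconds.
fn throughput_mib_per_s(mib: f64, millis: f64) -> f64 {
    mib / (millis / 1000.0)
}
```

For the Download benchmark, `throughput_change(0.011574)` gives roughly -0.01144, in line with the reported −1.1441%, and `throughput_mib_per_s(100.0, 208.22)` gives roughly 480.26.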


martinthomson · 2025-08-07T10:17:21Z

neqo-crypto/src/agent.rs

@@ -459,9 +459,9 @@ impl SecretAgent {
            alert: Box::pin(None),
            now: TimeHolder::default(),

-            extension_handlers: Vec::new(),
+            extension_handlers: Vec::with_capacity(4), // Typical number of TLS extensions


WAT. We add ONE.

We could move to SmallVec<[_; 1]>, I guess.

martinthomson · 2025-08-07T10:17:27Z

neqo-crypto/src/agent.rs


-            ech_config: Vec::new(),
+            ech_config: Vec::with_capacity(1), // Usually 0 or 1 ECH config


This is actually some number of u8s.

Oh, so it's a byte array? I'm going to suggest 64 then.
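
A small sketch of why the unit matters here: for a `Vec<u8>`, the argument to `with_capacity` counts bytes, not configs. The helper name and the 64-byte config length are assumptions for illustration only; real ECH configs vary in size.

```rust
/// Copies an encoded config into a byte buffer pre-sized to `capacity`
/// bytes and reports whether it fit within the initial reservation.
fn fits_in_initial_capacity(capacity: usize, config: &[u8]) -> bool {
    let mut ech_config: Vec<u8> = Vec::with_capacity(capacity);
    let reserved = ech_config.capacity(); // at least `capacity` bytes
    ech_config.extend_from_slice(config);
    config.len() <= reserved
}
```

With a hypothetical 64-byte config, a capacity of 1 forces a reallocation on the first copy, while a capacity of 64 does not.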

martinthomson · 2025-08-07T10:18:40Z

neqo-crypto/src/agent.rs

@@ -1040,7 +1040,7 @@ impl Client {
        let mut client = Self {
            agent,
            server_name,
-            resumption: Box::pin(Vec::new()),
+            resumption: Box::pin(Vec::with_capacity(256)), // Typical TLS resumption token size


This number is very wrong. And trivially measurable. We should be able to get traces with a better number. Any of our resumption tests, with logging on, will spit out a value that we can round up by a little bit.

Tokens are 840 or 856 bytes in tests.

I'd hedge and say 900.
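
The "measure, then round up a little" approach can be sketched as follows. `suggested_capacity` is a hypothetical helper, and the step of 100 is just one way to arrive at the 900 suggested above.

```rust
/// Picks a pre-allocation size: the largest observed length, rounded up
/// to the next multiple of `step`. Returns 0 when nothing was observed.
fn suggested_capacity(observed: &[usize], step: usize) -> usize {
    assert!(step > 0, "step must be non-zero");
    let max = observed.iter().copied().max().unwrap_or(0);
    max.div_ceil(step) * step
}
```

Feeding in the two token sizes seen in tests, `suggested_capacity(&[840, 856], 100)` yields 900.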

martinthomson · 2025-08-07T10:19:23Z

neqo-crypto/src/agentio.rs

        Self {
            input: AgentIoInput {
                input: null(),
                available: 0,
            },
-            output: Vec::new(),
+            output: Vec::with_capacity(1500), // Pre-allocate for typical TLS record output


Well, this is also wrong. It's the analogue of the pre-allocation we'd use for CRYPTO frames. It is not record size.

In the one test (neqo-crypto::agent basic) where this isn't zero, it's 238.

That seems low. Don't we have a handshake message with an ML-KEM share in it?
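
As rough arithmetic behind that question: an ML-KEM-768 encapsulation key is 1184 bytes and a ciphertext is 1088 bytes, so a hybrid X25519 + ML-KEM-768 key share alone is well above 238 bytes. The sketch below only adds up those sizes. Whether the test in question negotiates such a group is not stated in this thread, and the function names are illustrative.

```rust
/// ML-KEM-768 encapsulation key length in bytes.
const MLKEM768_ENCAPSULATION_KEY_LEN: usize = 1184;
/// ML-KEM-768 ciphertext length in bytes.
const MLKEM768_CIPHERTEXT_LEN: usize = 1088;
/// X25519 public key length in bytes.
const X25519_PUBLIC_KEY_LEN: usize = 32;

/// Size of a client key share carrying an ML-KEM-768 encapsulation key
/// plus an X25519 public key.
fn hybrid_client_share_len() -> usize {
    MLKEM768_ENCAPSULATION_KEY_LEN + X25519_PUBLIC_KEY_LEN
}

/// Size of a server key share carrying an ML-KEM-768 ciphertext plus an
/// X25519 public key.
fn hybrid_server_share_len() -> usize {
    MLKEM768_CIPHERTEXT_LEN + X25519_PUBLIC_KEY_LEN
}
```

A handshake message containing either share would therefore not fit in a 238-byte buffer without growing it.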

martinthomson · 2025-08-07T10:20:48Z

neqo-crypto/src/cert.rs

+            #[expect(clippy::cast_sign_loss, reason = "OK because <= 2^24")]
+            let mut ocsp_helper: Vec<Vec<u8>> = Vec::with_capacity(len as usize);


This is solid. I don't like the cast much, but the preallocation is legit. (I should say, it won't do squat for performance though, so the extra complexity doesn't really pay off at all.)
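
One possible way to avoid the `as` cast (and the lint expectation that comes with it) is a checked conversion. This is only a sketch: it assumes `len` is an `i32`, which the diff does not show, and the helper names are hypothetical.

```rust
/// Converts a C-style signed length into a capacity, treating negative
/// values as zero instead of letting them wrap.
fn capacity_from_len(len: i32) -> usize {
    usize::try_from(len).unwrap_or(0)
}

/// Builds the helper vector with room for `len` OCSP responses.
fn new_ocsp_helper(len: i32) -> Vec<Vec<u8>> {
    Vec::with_capacity(capacity_from_len(len))
}
```

This trades the `#[expect]` attribute for a small helper. As noted above, the pre-allocation itself is unlikely to affect performance either way.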

github-actions · 2025-08-07T10:25:15Z

Client/server transfer results

Performance differences relative to 6942acc.

Transfer of 33554432 bytes over loopback, min. 100 runs. All unit-less numbers are in milliseconds.

Client vs. server (params)	Mean ± σ	Min	Max	MiB/s ± σ	Δ `main` (ms)	Δ `main` (%)
google vs. google	455.9 ± 3.2	450.2	464.7	70.2 ± 10.0
google vs. neqo (cubic, paced)	273.6 ± 4.4	265.9	282.1	116.9 ± 7.3	0.5	0.2%
msquic vs. msquic	131.1 ± 29.8	109.4	321.9	244.1 ± 1.1
msquic vs. neqo (cubic, paced)	148.9 ± 16.1	124.3	217.3	215.0 ± 2.0	-4.9	-3.2%
neqo vs. google (cubic, paced)	760.6 ± 4.3	753.5	774.6	42.1 ± 7.4	💔 1.8	0.2%
neqo vs. msquic (cubic, paced)	154.8 ± 4.4	147.7	165.5	206.7 ± 7.3	-0.7	-0.5%
neqo vs. neqo (cubic)	88.9 ± 4.5	81.8	103.2	360.0 ± 7.1	💚 -3.3	-3.5%
neqo vs. neqo (cubic, paced)	94.4 ± 4.8	82.7	111.1	339.2 ± 6.7	1.2	1.3%
neqo vs. neqo (reno)	91.7 ± 4.6	80.7	107.1	348.8 ± 7.0	0.5	0.6%
neqo vs. neqo (reno, paced)	93.4 ± 4.1	85.7	100.8	342.5 ± 7.8	0.4	0.4%
neqo vs. quiche (cubic, paced)	194.0 ± 4.9	188.0	207.2	165.0 ± 6.5	💚 -1.9	-1.0%
neqo vs. s2n (cubic, paced)	223.3 ± 3.9	214.0	235.7	143.3 ± 8.2	💔 4.3	2.0%
quiche vs. neqo (cubic, paced)	148.7 ± 4.5	138.7	158.8	215.1 ± 7.1	💔 1.5	1.0%
quiche vs. quiche	144.7 ± 5.0	136.3	156.2	221.1 ± 6.4
s2n vs. neqo (cubic, paced)	171.6 ± 4.5	163.6	183.0	186.5 ± 7.1	0.7	0.4%
s2n vs. s2n	250.4 ± 27.8	233.3	349.5	127.8 ± 1.2
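
The throughput column follows from the mean time: 33554432 bytes is 32 MiB, and 32 MiB divided by the mean time in seconds gives the MiB/s figure. The sketch below reproduces that and also shows a first-order propagation of the time σ into a throughput σ. The function names are illustrative. The table's own σ figures appear to follow a different convention (they match 32 divided by the time σ in milliseconds), so the propagated value here is not expected to match them.

```rust
/// Bytes per mebibyte.
const MIB: f64 = 1024.0 * 1024.0;

/// Mean throughput in MiB/s for `bytes` transferred in `mean_ms` milliseconds.
fn mean_throughput(bytes: f64, mean_ms: f64) -> f64 {
    (bytes / MIB) / (mean_ms / 1000.0)
}

/// First-order estimate of the throughput standard deviation:
/// the relative spread of the time carried over to the throughput.
fn throughput_sigma(bytes: f64, mean_ms: f64, sigma_ms: f64) -> f64 {
    mean_throughput(bytes, mean_ms) * sigma_ms / mean_ms
}
```

For the "neqo vs. neqo (cubic)" row, `mean_throughput(33_554_432.0, 88.9)` is roughly 360.0, as reported.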


larseggert · 2025-08-27T09:23:12Z

There doesn't seem to be any win here.

feat: Optimize neqo-crypto

f461a29

Similar to mozilla#2827

larseggert added 3 commits August 8, 2025 14:44

Merge branch 'main' into feat-crypto-opt

663c9c7

Fixes

5cfcf29

Fix

57fbe4e

larseggert marked this pull request as ready for review August 8, 2025 12:03

larseggert requested review from KershawChang and mxinden as code owners August 8, 2025 12:03

Merge branch 'main' into feat-crypto-opt

86baaea

larseggert closed this Aug 27, 2025

