feat: Randomize CI packet number #2499

Open · larseggert wants to merge 14 commits into main

Conversation

larseggert (Collaborator) commented Mar 17, 2025

This randomizes the starting packet number the client uses for the Initial packet number space.

We don't randomize this on the server, since that would require even more changes to the tests to account for it.

Fixes #2462

CC @omansfeld

WIP. Two potential issues:
1. Did I mess up the validation in `CryptoDxState::continuation()`?
2. See the `FIXME` in `Version::confirm_version()`.
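
For illustration, a minimal stand-alone sketch of the idea (not the actual diff; `randomize_ci_pn` and the `random::<N>() -> [u8; N]` helper are taken from the snippets quoted later in this thread, everything else is hypothetical):

    // Hypothetical sketch only: the client picks a small random starting
    // packet number for its Initial space instead of always starting at 0.
    fn client_initial_start(randomize_ci_pn: bool) -> u64 {
        if randomize_ci_pn {
            // Small offset, 1..=16, mirroring the first version of this patch.
            u64::from(random_byte() >> 4) + 1
        } else {
            0 // servers (and clients with the knob off) keep starting at 0
        }
    }

    // Stand-in byte source so the sketch compiles without neqo internals;
    // the real code uses neqo's random::<1>()[0].
    fn random_byte() -> u8 {
        use std::time::{SystemTime, UNIX_EPOCH};
        (SystemTime::now()
            .duration_since(UNIX_EPOCH)
            .expect("clock before Unix epoch")
            .subsec_nanos()
            & 0xff) as u8
    }

    fn main() {
        println!("client Initial packet numbers start at {}", client_initial_start(true));
    }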

github-actions bot commented Mar 17, 2025

Failed Interop Tests

QUIC Interop Runner, client vs. server, differences relative to fd50a23.
- neqo-latest as client
- neqo-latest as server
- All results

Succeeded Interop Tests

QUIC Interop Runner, client vs. server
- neqo-latest as client
- neqo-latest as server

Unsupported Interop Tests

QUIC Interop Runner, client vs. server
- neqo-latest as client
- neqo-latest as server

github-actions bot commented Mar 17, 2025

Benchmark results

Performance differences relative to eb3e2a6.

decode 4096 bytes, mask ff: No change in performance detected.
       time:   [11.865 µs 11.911 µs 11.962 µs]
       change: [+0.0023% +0.3730% +0.7267%] (p = 0.05 > 0.05)

Found 14 outliers among 100 measurements (14.00%)
2 (2.00%) low mild
12 (12.00%) high severe

decode 1048576 bytes, mask ff: No change in performance detected.
       time:   [3.0668 ms 3.0760 ms 3.0868 ms]
       change: [-0.2914% +0.1283% +0.5644%] (p = 0.55 > 0.05)

Found 9 outliers among 100 measurements (9.00%)
1 (1.00%) low mild
8 (8.00%) high severe

decode 4096 bytes, mask 7f: No change in performance detected.
       time:   [19.767 µs 19.813 µs 19.865 µs]
       change: [-0.2894% +0.1264% +0.6193%] (p = 0.60 > 0.05)

Found 15 outliers among 100 measurements (15.00%)
2 (2.00%) low severe
1 (1.00%) high mild
12 (12.00%) high severe

decode 1048576 bytes, mask 7f: No change in performance detected.
       time:   [5.1412 ms 5.1520 ms 5.1634 ms]
       change: [-0.3776% -0.0581% +0.2623%] (p = 0.73 > 0.05)

Found 12 outliers among 100 measurements (12.00%)
1 (1.00%) low mild
11 (11.00%) high severe

decode 4096 bytes, mask 3f: No change in performance detected.
       time:   [6.8760 µs 6.9108 µs 6.9562 µs]
       change: [-0.1045% +0.3173% +0.8251%] (p = 0.19 > 0.05)

Found 14 outliers among 100 measurements (14.00%)
1 (1.00%) high mild
13 (13.00%) high severe

decode 1048576 bytes, mask 3f: No change in performance detected.
       time:   [1.7551 ms 1.7607 ms 1.7676 ms]
       change: [-0.4074% +0.0636% +0.6146%] (p = 0.86 > 0.05)

Found 7 outliers among 100 measurements (7.00%)
1 (1.00%) high mild
6 (6.00%) high severe

1 streams of 1 bytes/multistream: No change in performance detected.
       time:   [68.798 µs 69.360 µs 70.371 µs]
       change: [-1.2242% +0.8016% +2.9149%] (p = 0.53 > 0.05)

Found 2 outliers among 100 measurements (2.00%)
1 (1.00%) high mild
1 (1.00%) high severe

1000 streams of 1 bytes/multistream: Change within noise threshold.
       time:   [24.728 ms 24.767 ms 24.806 ms]
       change: [+0.7068% +0.9091% +1.1247%] (p = 0.00 < 0.05)
10000 streams of 1 bytes/multistream: 💔 Performance has regressed.
       time:   [1.6641 s 1.6660 s 1.6679 s]
       change: [+1.0095% +1.1630% +1.3195%] (p = 0.00 < 0.05)

Found 4 outliers among 100 measurements (4.00%)
4 (4.00%) high mild

1 streams of 1000 bytes/multistream: No change in performance detected.
       time:   [70.279 µs 70.851 µs 71.889 µs]
       change: [-3.7524% -1.1530% +1.1388%] (p = 0.43 > 0.05)

Found 2 outliers among 100 measurements (2.00%)
1 (1.00%) high mild
1 (1.00%) high severe

100 streams of 1000 bytes/multistream: 💔 Performance has regressed.
       time:   [3.3342 ms 3.3406 ms 3.3474 ms]
       change: [+3.5920% +3.9093% +4.2248%] (p = 0.00 < 0.05)

Found 21 outliers among 100 measurements (21.00%)
21 (21.00%) high severe

1000 streams of 1000 bytes/multistream: 💔 Performance has regressed.
       time:   [145.70 ms 145.78 ms 145.88 ms]
       change: [+6.2013% +6.2904% +6.3845%] (p = 0.00 < 0.05)

Found 1 outliers among 100 measurements (1.00%)
1 (1.00%) high severe

coalesce_acked_from_zero 1+1 entries: No change in performance detected.
       time:   [92.503 ns 92.815 ns 93.133 ns]
       change: [-0.1145% +2.4447% +8.9192%] (p = 0.30 > 0.05)

Found 13 outliers among 100 measurements (13.00%)
10 (10.00%) high mild
3 (3.00%) high severe

coalesce_acked_from_zero 3+1 entries: No change in performance detected.
       time:   [110.04 ns 110.34 ns 110.68 ns]
       change: [-0.0131% +1.5025% +4.2996%] (p = 0.23 > 0.05)

Found 20 outliers among 100 measurements (20.00%)
4 (4.00%) low severe
1 (1.00%) low mild
3 (3.00%) high mild
12 (12.00%) high severe

coalesce_acked_from_zero 10+1 entries: No change in performance detected.
       time:   [110.19 ns 110.77 ns 111.43 ns]
       change: [-0.0848% +1.3116% +3.4977%] (p = 0.18 > 0.05)

Found 18 outliers among 100 measurements (18.00%)
6 (6.00%) low severe
1 (1.00%) low mild
11 (11.00%) high severe

coalesce_acked_from_zero 1000+1 entries: No change in performance detected.
       time:   [91.468 ns 91.504 ns 91.545 ns]
       change: [-1.2043% -0.1532% +0.8703%] (p = 0.79 > 0.05)

Found 17 outliers among 100 measurements (17.00%)
5 (5.00%) high mild
12 (12.00%) high severe

RxStreamOrderer::inbound_frame(): Change within noise threshold.
       time:   [115.32 ms 115.37 ms 115.41 ms]
       change: [-0.1802% -0.1216% -0.0643%] (p = 0.00 < 0.05)

Found 16 outliers among 100 measurements (16.00%)
1 (1.00%) low severe
5 (5.00%) low mild
10 (10.00%) high mild

SentPackets::take_ranges: No change in performance detected.
       time:   [5.3141 µs 5.4993 µs 5.6994 µs]
       change: [-16.718% -4.6892% +4.2003%] (p = 0.60 > 0.05)

Found 9 outliers among 100 measurements (9.00%)
6 (6.00%) high mild
3 (3.00%) high severe

transfer/pacing-false/varying-seeds: Change within noise threshold.
       time:   [34.386 ms 34.451 ms 34.517 ms]
       change: [-1.0133% -0.7358% -0.4605%] (p = 0.00 < 0.05)

Found 2 outliers among 100 measurements (2.00%)
1 (1.00%) low mild
1 (1.00%) high mild

transfer/pacing-true/varying-seeds: Change within noise threshold.
       time:   [34.410 ms 34.469 ms 34.527 ms]
       change: [-1.0390% -0.7966% -0.5473%] (p = 0.00 < 0.05)
transfer/pacing-false/same-seed: Change within noise threshold.
       time:   [34.795 ms 34.856 ms 34.916 ms]
       change: [+0.3264% +0.5567% +0.7844%] (p = 0.00 < 0.05)
transfer/pacing-true/same-seed: Change within noise threshold.
       time:   [34.547 ms 34.602 ms 34.657 ms]
       change: [-2.0059% -1.7627% -1.5087%] (p = 0.00 < 0.05)

Found 1 outliers among 100 measurements (1.00%)
1 (1.00%) high mild

1-conn/1-100mb-resp/mtu-1504 (aka. Download)/client: Change within noise threshold.
       time:   [2.1900 s 2.1975 s 2.2053 s]
       thrpt:  [45.345 MiB/s 45.505 MiB/s 45.663 MiB/s]
change:
       time:   [-1.6458% -1.0290% -0.4638%] (p = 0.00 < 0.05)
       thrpt:  [+0.4660% +1.0397% +1.6733%]

Found 1 outliers among 100 measurements (1.00%)
1 (1.00%) high mild

1-conn/10_000-parallel-1b-resp/mtu-1504 (aka. RPS)/client: No change in performance detected.
       time:   [387.29 ms 389.31 ms 391.36 ms]
       thrpt:  [25.552 Kelem/s 25.686 Kelem/s 25.821 Kelem/s]
change:
       time:   [-0.9330% -0.1693% +0.6008%] (p = 0.66 > 0.05)
       thrpt:  [-0.5972% +0.1696% +0.9418%]

Found 1 outliers among 100 measurements (1.00%)
1 (1.00%) high mild

1-conn/1-1b-resp/mtu-1504 (aka. HPS)/client: 💔 Performance has regressed.
       time:   [29.233 ms 30.064 ms 30.882 ms]
       thrpt:  [32.382  elem/s 33.263  elem/s 34.208  elem/s]
change:
       time:   [+1.6212% +5.5274% +9.5398%] (p = 0.01 < 0.05)
       thrpt:  [-8.7090% -5.2379% -1.5953%]
1-conn/1-100mb-resp/mtu-1504 (aka. Upload)/client: 💔 Performance has regressed.
       time:   [3.5594 s 3.5820 s 3.6069 s]
       thrpt:  [27.725 MiB/s 27.918 MiB/s 28.095 MiB/s]
change:
       time:   [+10.109% +11.075% +12.085%] (p = 0.00 < 0.05)
       thrpt:  [-10.782% -9.9709% -9.1805%]

Found 2 outliers among 100 measurements (2.00%)
1 (1.00%) high mild
1 (1.00%) high severe

Client/server transfer results

Performance differences relative to eb3e2a6.

Transfer of 33554432 bytes over loopback, 30 runs. All unit-less numbers are in milliseconds.

Client Server CC Pacing Mean ± σ Min Max Δ main (ms) Δ main (%)
neqo neqo reno on 528.5 ± 50.3 471.9 709.7 12.3 0.6%
neqo neqo reno 551.2 ± 100.6 462.8 841.1 -3.0 -0.1%
neqo neqo cubic on 544.1 ± 46.0 477.9 716.3 💔 23.1 1.1%
neqo neqo cubic 537.3 ± 55.8 475.4 719.3 13.3 0.6%
google neqo reno on 915.4 ± 103.6 660.0 1142.2 8.7 0.2%
google neqo reno 918.2 ± 106.6 670.7 1095.6 3.1 0.1%
google neqo cubic on 914.6 ± 93.9 698.4 1094.0 13.0 0.4%
google neqo cubic 905.1 ± 93.5 668.6 977.1 8.8 0.2%
google google 555.6 ± 31.5 531.3 654.8 0.6 0.0%
neqo msquic reno on 238.2 ± 52.2 202.6 404.8 1.0 0.1%
neqo msquic reno 231.1 ± 40.8 203.9 434.5 -1.1 -0.1%
neqo msquic cubic on 233.6 ± 42.0 203.7 384.0 11.1 1.2%
neqo msquic cubic 225.6 ± 13.0 204.7 262.2 4.2 0.5%
msquic msquic 127.1 ± 39.3 97.6 266.8 4.3 0.9%

codecov bot commented Aug 8, 2025

Codecov Report

❌ Patch coverage is 96.96970% with 1 line in your changes missing coverage. Please review.
✅ Project coverage is 94.91%. Comparing base (853e4be) to head (6acb298).
⚠️ Report is 6 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2499      +/-   ##
==========================================
- Coverage   94.92%   94.91%   -0.02%     
==========================================
  Files         115      115              
  Lines       34370    34465      +95     
  Branches    34370    34465      +95     
==========================================
+ Hits        32625    32711      +86     
- Misses       1736     1747      +11     
+ Partials        9        7       -2     
Components Coverage Δ
neqo-common 97.73% <ø> (ø)
neqo-crypto 89.91% <ø> (+0.27%) ⬆️
neqo-http3 93.73% <ø> (+<0.01%) ⬆️
neqo-qpack 95.45% <ø> (+<0.01%) ⬆️
neqo-transport 95.90% <96.96%> (-0.05%) ⬇️
neqo-udp 89.37% <ø> (-0.49%) ⬇️

larseggert requested a review from omansfeld on August 8, 2025 at 13:34
larseggert marked this pull request as ready for review on August 8, 2025 at 13:34
martinthomson (Member) left a comment

You don't test that the client Initial has a non-zero initial packet number.

@@ -979,15 +988,21 @@ impl CryptoStates {
Role::Server => (SERVER_INITIAL_LABEL, CLIENT_INITIAL_LABEL),
};

let min_pn = if randomize_ci_pn {
packet::Number::from((random::<1>()[0] >> 4) + 1)
martinthomson (Member) commented:

Why skip 0? Isn't it a good idea to include that in the range? If you want lower odds, `(r >> 4) + (r & 0xf)` returns a value of 0 pretty rarely. Though I tend to like `(r >> 3) + (r & 0x7)` for keeping it in one byte.

Also, what logic are you using to decide that 16 is the largest value? Obviously, if we bump this up to 64, we'll have to contend with the whole thing where the packet number needs to be longer. But that seems harmless. We use `largest_acknowledged`, so it's probably fine to use any starting number. And we can afford to spend a byte on this.

Given that we are safe in including any value here, could we instead do `let r = random::<2>(); r[0] & r[1]`? That has a 7/16 chance of slipping into the next byte, but no more than that. It also pushes more toward having smaller values overall.
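
Since the candidate formulas only depend on one or two random bytes, they can be compared exhaustively. A std-only sketch (the expressions are copied from the comment above; the counting is just for illustration):

    // Exhaustively enumerate byte values to compare the candidate formulas.
    fn main() {
        // (r >> 4) + 1: the current patch. Range 1..=16, never 0.
        let a = (0u8..=255).map(|r| u64::from(r >> 4) + 1);
        println!("(r >> 4) + 1:          min={} max={}",
                 a.clone().min().unwrap(), a.max().unwrap());

        // (r >> 3) + (r & 0x7): range 0..=38, 0 only when r == 0 (1 in 256).
        let b = (0u8..=255).map(|r| u64::from((r >> 3) + (r & 0x7)));
        println!("(r >> 3) + (r & 0x7):  min={} max={} zeros={}",
                 b.clone().min().unwrap(), b.clone().max().unwrap(),
                 b.filter(|&v| v == 0).count());

        // r[0] & r[1]: enumerate all 65536 byte pairs. The result is 0 when
        // the bytes share no set bits ((3/4)^8, about 10% of pairs) and is
        // >= 64 (the "slipping into the next byte" case mentioned above)
        // for exactly 7/16 of pairs.
        let pairs = (0u32..65_536).map(|p| (p & 0xff) & (p >> 8));
        let zeros = pairs.clone().filter(|&v| v == 0).count();
        let large = pairs.filter(|&v| v >= 64).count();
        println!("r[0] & r[1]: zero for {zeros}/65536 pairs, >= 64 for {large}/65536 pairs");
    }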

larseggert (Collaborator, Author) commented:

I skipped zero so that it would be easy to write a test for it (which I then failed to do).

The logic behind 16 was simply to pick a number that gave us some small randomization without eating up too much of the 0-255 range. I'm very open to doing something different.

larseggert (Collaborator, Author) commented:

There is now a test.

martinthomson (Member) commented:

You should make it more random, I think. It doesn't hurt if this uses more of the range, even beyond 63, though I'd aim for something that stays below 63 most of the time.

larseggert (Collaborator, Author) commented Aug 11, 2025:

I tried your last suggestion, but we then need to fix the `reorder_handshake` test, which currently fails when we get two-byte packet numbers. Going with

    packet::Number::from((r >> 3) + (r & 0x7)) + 1

martinthomson (Member) commented:

I can't easily see how `reorder_handshake` fails. If the failure only happens with two-byte packet numbers, just disable the initial packet number tweak for that test.

larseggert (Collaborator, Author) commented:

    thread 'connection::tests::handshake::reorder_handshake' (12105956) panicked at neqo-transport/src/connection/tests/handshake.rs:566:5:
    assertion `left == right` failed
      left: 1
     right: 3

I.e., it's only receiving one packet by then.

I can make the change; it would then be

    let r = random::<2>();
    (r[0] & r[1]) + 1

larseggert (Collaborator, Author) commented:

@martinthomson so, about that failing test, I wonder if we actually have a bug somewhere. What seems to happen with two-byte packet numbers is that Handshake keys are not available after the

    client.process_input(s_initial_2, now);

line (neqo-transport/src/connection/tests/handshake.rs:564), whereas they are available (and saved packets are hence processed) with single-byte packet numbers. Odd...

omansfeld (Collaborator) left a comment:

Looks good to me with the added test and pending the discussion about randomization.

github-actions bot commented Aug 11, 2025

Client/server transfer results

Performance differences relative to f1df423.

Transfer of 33554432 bytes over loopback, min. 100 runs. All unit-less numbers are in milliseconds.

Client vs. server (params) Mean ± σ Min Max MiB/s ± σ Δ main (ms) Δ main (%)
google vs. google 455.2 ± 4.4 449.3 475.7 70.3 ± 7.3
google vs. neqo (cubic, paced) 271.5 ± 3.9 264.7 279.9 117.9 ± 8.2 -0.9 -0.3%
msquic vs. msquic 127.9 ± 20.4 110.2 219.0 250.2 ± 1.6
msquic vs. neqo (cubic, paced) 148.0 ± 33.2 125.0 357.4 216.2 ± 1.0 -5.0 -3.3%
neqo vs. google (cubic, paced) 760.1 ± 4.8 752.8 772.8 42.1 ± 6.7 💔 1.8 0.2%
neqo vs. msquic (cubic, paced) 155.2 ± 5.0 146.8 172.7 206.2 ± 6.4 -0.7 -0.4%
neqo vs. neqo (cubic) 90.1 ± 4.7 80.6 102.5 355.2 ± 6.8 -0.6 -0.7%
neqo vs. neqo (cubic, paced) 91.0 ± 4.5 81.7 101.9 351.8 ± 7.1 -0.1 -0.1%
neqo vs. neqo (reno) 91.4 ± 4.8 83.5 103.4 350.1 ± 6.7 💔 1.6 1.8%
neqo vs. neqo (reno, paced) 91.9 ± 5.1 81.1 108.4 348.0 ± 6.3 💔 1.8 2.0%
neqo vs. quiche (cubic, paced) 194.8 ± 4.7 187.8 204.2 164.3 ± 6.8 💔 2.8 1.5%
neqo vs. s2n (cubic, paced) 219.6 ± 4.2 211.3 229.5 145.7 ± 7.6 0.8 0.4%
quiche vs. neqo (cubic, paced) 152.3 ± 5.1 142.2 164.5 210.1 ± 6.3 💚 -2.9 -1.9%
quiche vs. quiche 143.8 ± 4.4 136.7 156.2 222.6 ± 7.3
s2n vs. neqo (cubic, paced) 173.3 ± 4.8 163.2 185.5 184.6 ± 6.7 -0.0 -0.0%
s2n vs. s2n 244.4 ± 20.3 230.9 345.3 130.9 ± 1.6

github-actions bot commented Aug 11, 2025

Benchmark results

Performance differences relative to f1df423.

1-conn/1-100mb-resp/mtu-1504 (aka. Download)/client: Change within noise threshold.
       time:   [206.80 ms 207.31 ms 207.95 ms]
       thrpt:  [480.88 MiB/s 482.38 MiB/s 483.57 MiB/s]
change:
       time:   [+0.5946% +0.9179% +1.2714%] (p = 0.00 < 0.05)
       thrpt:  [−1.2554% −0.9096% −0.5911%]

Found 3 outliers among 100 measurements (3.00%)
1 (1.00%) high mild
2 (2.00%) high severe

1-conn/10_000-parallel-1b-resp/mtu-1504 (aka. RPS)/client: Change within noise threshold.
       time:   [303.15 ms 304.53 ms 305.92 ms]
       thrpt:  [32.689 Kelem/s 32.837 Kelem/s 32.987 Kelem/s]
change:
       time:   [−1.7236% −1.0745% −0.4304%] (p = 0.00 < 0.05)
       thrpt:  [+0.4322% +1.0862% +1.7539%]
1-conn/1-1b-resp/mtu-1504 (aka. HPS)/client: No change in performance detected.
       time:   [28.132 ms 28.247 ms 28.376 ms]
       thrpt:  [35.241   B/s 35.402   B/s 35.546   B/s]
change:
       time:   [−0.8420% −0.0444% +0.6253%] (p = 0.92 > 0.05)
       thrpt:  [−0.6214% +0.0444% +0.8492%]

Found 3 outliers among 100 measurements (3.00%)
3 (3.00%) high severe

1-conn/1-100mb-req/mtu-1504 (aka. Upload)/client: 💔 Performance has regressed.
       time:   [208.69 ms 208.97 ms 209.31 ms]
       thrpt:  [477.76 MiB/s 478.54 MiB/s 479.19 MiB/s]
change:
       time:   [+2.1152% +2.4115% +2.6711%] (p = 0.00 < 0.05)
       thrpt:  [−2.6016% −2.3547% −2.0714%]

Found 2 outliers among 100 measurements (2.00%)
1 (1.00%) low severe
1 (1.00%) high severe

decode 4096 bytes, mask ff: No change in performance detected.
       time:   [11.624 µs 11.666 µs 11.712 µs]
       change: [−1.0150% −0.4196% +0.1159%] (p = 0.16 > 0.05)

Found 16 outliers among 100 measurements (16.00%)
3 (3.00%) low severe
2 (2.00%) low mild
11 (11.00%) high severe

decode 1048576 bytes, mask ff: No change in performance detected.
       time:   [3.0068 ms 3.0162 ms 3.0274 ms]
       change: [−0.7423% −0.1456% +0.4175%] (p = 0.64 > 0.05)

Found 9 outliers among 100 measurements (9.00%)
9 (9.00%) high severe

decode 4096 bytes, mask 7f: No change in performance detected.
       time:   [19.378 µs 19.431 µs 19.488 µs]
       change: [−0.4198% +0.0597% +0.5116%] (p = 0.81 > 0.05)

Found 21 outliers among 100 measurements (21.00%)
1 (1.00%) low severe
4 (4.00%) low mild
16 (16.00%) high severe

decode 1048576 bytes, mask 7f: No change in performance detected.
       time:   [5.0857 ms 5.0971 ms 5.1099 ms]
       change: [−0.4551% −0.0635% +0.3313%] (p = 0.76 > 0.05)

Found 17 outliers among 100 measurements (17.00%)
1 (1.00%) low mild
1 (1.00%) high mild
15 (15.00%) high severe

decode 4096 bytes, mask 3f: No change in performance detected.
       time:   [5.5307 µs 5.5586 µs 5.5928 µs]
       change: [−0.5146% +0.0795% +0.7571%] (p = 0.82 > 0.05)

Found 14 outliers among 100 measurements (14.00%)
1 (1.00%) low mild
2 (2.00%) high mild
11 (11.00%) high severe

decode 1048576 bytes, mask 3f: No change in performance detected.
       time:   [1.7579 ms 1.7594 ms 1.7624 ms]
       change: [−0.0063% +0.0848% +0.3157%] (p = 0.31 > 0.05)

Found 5 outliers among 100 measurements (5.00%)
1 (1.00%) low mild
1 (1.00%) high mild
3 (3.00%) high severe

coalesce_acked_from_zero 1+1 entries: No change in performance detected.
       time:   [88.477 ns 88.796 ns 89.104 ns]
       change: [−0.7817% −0.3812% +0.0541%] (p = 0.06 > 0.05)

Found 13 outliers among 100 measurements (13.00%)
12 (12.00%) high mild
1 (1.00%) high severe

coalesce_acked_from_zero 3+1 entries: Change within noise threshold.
       time:   [105.65 ns 105.96 ns 106.29 ns]
       change: [−1.0977% −0.7066% −0.3470%] (p = 0.00 < 0.05)

Found 11 outliers among 100 measurements (11.00%)
11 (11.00%) high severe

coalesce_acked_from_zero 10+1 entries: No change in performance detected.
       time:   [105.20 ns 105.69 ns 106.27 ns]
       change: [−1.4938% −0.5767% +0.3480%] (p = 0.22 > 0.05)

Found 17 outliers among 100 measurements (17.00%)
4 (4.00%) low mild
2 (2.00%) high mild
11 (11.00%) high severe

coalesce_acked_from_zero 1000+1 entries: No change in performance detected.
       time:   [89.176 ns 92.788 ns 100.92 ns]
       change: [−1.5064% +1.8440% +7.2353%] (p = 0.65 > 0.05)

Found 8 outliers among 100 measurements (8.00%)
1 (1.00%) high mild
7 (7.00%) high severe

RxStreamOrderer::inbound_frame(): Change within noise threshold.
       time:   [107.99 ms 108.08 ms 108.18 ms]
       change: [−0.6893% −0.4848% −0.3110%] (p = 0.00 < 0.05)

Found 18 outliers among 100 measurements (18.00%)
8 (8.00%) low mild
4 (4.00%) high mild
6 (6.00%) high severe

sent::Packets::take_ranges: 💚 Performance has improved.
       time:   [5.0716 µs 5.1701 µs 5.2758 µs]
       change: [−42.641% −36.626% −25.438%] (p = 0.00 < 0.05)

Found 4 outliers among 100 measurements (4.00%)
1 (1.00%) high mild
3 (3.00%) high severe

transfer/pacing-false/varying-seeds: Change within noise threshold.
       time:   [36.989 ms 37.074 ms 37.170 ms]
       change: [+0.0643% +0.4117% +0.7689%] (p = 0.02 < 0.05)

Found 3 outliers among 100 measurements (3.00%)
2 (2.00%) high mild
1 (1.00%) high severe

transfer/pacing-true/varying-seeds: Change within noise threshold.
       time:   [37.832 ms 37.939 ms 38.051 ms]
       change: [+0.3177% +0.7468% +1.1725%] (p = 0.00 < 0.05)

Found 4 outliers among 100 measurements (4.00%)
3 (3.00%) high mild
1 (1.00%) high severe

transfer/pacing-false/same-seed: Change within noise threshold.
       time:   [36.727 ms 36.801 ms 36.891 ms]
       change: [+0.4516% +0.7648% +1.1022%] (p = 0.00 < 0.05)

Found 2 outliers among 100 measurements (2.00%)
1 (1.00%) high mild
1 (1.00%) high severe

transfer/pacing-true/same-seed: No change in performance detected.
       time:   [38.263 ms 38.357 ms 38.456 ms]
       change: [−0.5487% −0.1922% +0.1659%] (p = 0.29 > 0.05)

Found 3 outliers among 100 measurements (3.00%)
2 (2.00%) high mild
1 (1.00%) high severe

martinthomson (Member) commented Aug 14, 2025

Nice way to find a boundary condition, @larseggert.

Our tests operate with a 1232-byte MTU. In this particular test, the packet was being filled completely and precisely. The ServerHello is 1178 bytes. In a CRYPTO frame, that's 1182 bytes. Add an ACK frame with a one-byte largest acknowledged and a gap for the dropped packet (7 bytes), the header (27) and an auth tag (16) and you get EXACTLY 1232 bytes.

In this case, we're adding one byte to the size of the ACK frame because the largest acknowledged needs two bytes to encode. So the CRYPTO frame needs to have its last byte cut off to fit.

The reason we don't bother with another Initial packet is unclear to me. (Edit: worked it out...) If we had added the remainder to a second packet, that would have been processed and the keys would be available. We had sent an Initial before, and that one had a CRYPTO frame that fit the packet (because it didn't have a gap in the ACK frame, so it just fit), and that whole frame was marked as needing retransmission. So all of the bytes were marked as needing retransmission.

The challenge here is that we have lost ONE packet, but we need to transmit TWO packets to repair the loss properly. But our PTO code only sends two packets for Application packets, whereas it only sends ONE for Initial and Handshake packets. That is because we want to avoid pointless PING-only packets when repairing the handshake. So we limited handshake PTOs to a single packet. That means that the PTO code moves on to sending a Handshake packet, rather than sending the last byte of that CRYPTO frame.

If we had sent TWO packets, we could maybe have made the PTO code pay attention to the number of packets that were marked as needing retransmission, but in this very narrow case, we need to repair the loss of one packet with two. The alternative is to add some smarts to check for outstanding CRYPTO data when sending a second PTO, rather than capping the number of PTOs to 1.

A hacky workaround here might be to increase the MTU. Though that just defers the problem for later.
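
The size arithmetic in this comment can be checked mechanically; a small sketch using only the figures quoted above:

    // All constants are the figures quoted in the comment above.
    fn main() {
        let mtu = 1232;              // MTU used by the tests
        let server_hello = 1178;     // ServerHello
        let crypto_frame = server_hello + 4; // 1182 bytes as a CRYPTO frame
        let ack_frame = 7;           // ACK, one-byte largest acked plus a gap
        let header = 27;             // long header
        let auth_tag = 16;           // AEAD auth tag
        assert_eq!(crypto_frame + ack_frame + header + auth_tag, mtu);

        // With a two-byte largest acknowledged the ACK frame grows by one
        // byte, so the CRYPTO frame has to give up its last byte to fit.
        let crypto_that_fits = mtu - (ack_frame + 1) - header - auth_tag;
        assert_eq!(crypto_that_fits, crypto_frame - 1);
    }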

martinthomson added a commit to martinthomson/neqo that referenced this pull request Aug 21, 2025
This is mostly the work of @larseggert; I'm just repackaging it. This will target his branch with any changes.

Fixes mozilla#2462.
Closes mozilla#2499.

Development

Successfully merging this pull request may close this issue: Initial packet number need not be zero (#2462).

3 participants