-
Notifications
You must be signed in to change notification settings - Fork 138
feat: Optimize neqo-crypto
#2832
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Similar to mozilla#2827
neqo-crypto/src/agent.rs
Outdated
@@ -459,9 +459,9 @@ impl SecretAgent { | |||
alert: Box::pin(None), | |||
now: TimeHolder::default(), | |||
|
|||
extension_handlers: Vec::new(), | |||
extension_handlers: Vec::with_capacity(4), // Typical number of TLS extensions |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Likely too small.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
WAT. We add ONE.
We could move to SmallVec<[1; _]>
, I guess.
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #2832 +/- ##
==========================================
- Coverage 94.93% 94.93% -0.01%
==========================================
Files 115 115
Lines 34425 34425
Branches 34425 34425
==========================================
- Hits 32682 32680 -2
Misses 1736 1736
- Partials 7 9 +2
|
Failed Interop TestsQUIC Interop Runner, client vs. server, differences relative to 853e4be. neqo-latest as client
neqo-latest as server
All resultsSucceeded Interop TestsQUIC Interop Runner, client vs. server neqo-latest as client
neqo-latest as server
Unsupported Interop TestsQUIC Interop Runner, client vs. server neqo-latest as client
neqo-latest as server
|
Benchmark resultsPerformance differences relative to 6942acc. 1-conn/1-100mb-resp/mtu-1504 (aka. Download)/client: Change within noise threshold.time: [207.82 ms 208.22 ms 208.76 ms] thrpt: [479.02 MiB/s 480.26 MiB/s 481.20 MiB/s] change: time: [+0.8831% +1.1574% +1.4612%] (p = 0.00 < 0.05) thrpt: [−1.4402% −1.1441% −0.8754%] 1-conn/10_000-parallel-1b-resp/mtu-1504 (aka. RPS)/client: No change in performance detected.time: [300.85 ms 302.28 ms 303.73 ms] thrpt: [32.924 Kelem/s 33.082 Kelem/s 33.239 Kelem/s] change: time: [−0.9210% −0.2502% +0.4280%] (p = 0.49 > 0.05) thrpt: [−0.4262% +0.2508% +0.9296%] 1-conn/1-1b-resp/mtu-1504 (aka. HPS)/client: No change in performance detected.time: [28.212 ms 28.382 ms 28.569 ms] thrpt: [35.003 B/s 35.234 B/s 35.446 B/s] change: time: [−0.4317% +0.2965% +1.0421%] (p = 0.42 > 0.05) thrpt: [−1.0314% −0.2956% +0.4335%] 1-conn/1-100mb-req/mtu-1504 (aka. Upload)/client: 💔 Performance has regressed.time: [210.99 ms 211.34 ms 211.72 ms] thrpt: [472.32 MiB/s 473.18 MiB/s 473.95 MiB/s] change: time: [+3.3262% +3.5402% +3.7802%] (p = 0.00 < 0.05) thrpt: [−3.6425% −3.4191% −3.2192%] decode 4096 bytes, mask ff: Change within noise threshold.time: [11.790 µs 11.814 µs 11.847 µs] change: [+0.7866% +1.2525% +1.7080%] (p = 0.00 < 0.05) decode 1048576 bytes, mask ff: No change in performance detected.time: [3.0246 ms 3.0413 ms 3.0645 ms] change: [+0.0551% +0.7454% +1.5034%] (p = 0.05 > 0.05) decode 4096 bytes, mask 7f: 💔 Performance has regressed.time: [19.924 µs 19.965 µs 20.014 µs] change: [+1.7722% +2.5362% +3.1082%] (p = 0.00 < 0.05) decode 1048576 bytes, mask 7f: Change within noise threshold.time: [5.0515 ms 5.0682 ms 5.0889 ms] change: [−0.9776% −0.5504% −0.0766%] (p = 0.01 < 0.05) decode 4096 bytes, mask 3f: 💔 Performance has regressed.time: [8.2650 µs 8.2975 µs 8.3365 µs] change: [+47.173% +48.696% +50.007%] (p = 0.00 < 0.05) decode 1048576 bytes, mask 3f: 💚 Performance has improved.time: [1.5855 ms 1.5911 ms 1.5980 ms] change: [−9.8134% −9.4979% −9.1085%] (p = 0.00 < 0.05) coalesce_acked_from_zero 1+1 entries: No change in performance detected.time: [88.774 ns 89.072 ns 89.374 ns] change: [−0.3476% +0.1094% +0.5901%] (p = 0.65 > 0.05) coalesce_acked_from_zero 3+1 entries: No change in performance detected.time: [106.35 ns 106.89 ns 107.55 ns] change: [−1.5089% +0.4484% +2.9321%] (p = 0.76 > 0.05) coalesce_acked_from_zero 10+1 entries: No change in performance detected.time: [105.53 ns 105.84 ns 106.24 ns] change: [−0.3284% +0.0960% +0.5412%] (p = 0.68 > 0.05) coalesce_acked_from_zero 1000+1 entries: No change in performance detected.time: [89.364 ns 89.473 ns 89.612 ns] change: [−0.9734% +0.0520% +0.9473%] (p = 0.93 > 0.05) RxStreamOrderer::inbound_frame(): Change within noise threshold.time: [107.74 ms 107.81 ms 107.88 ms] change: [−1.0131% −0.9202% −0.8238%] (p = 0.00 < 0.05) sent::Packets::take_ranges: :green_heart: Performance has improved.time: [5.0679 µs 5.1373 µs 5.1998 µs] change: [−42.082% −35.482% −24.086%] (p = 0.00 < 0.05) transfer/pacing-false/varying-seeds: Change within noise threshold.time: [37.786 ms 37.869 ms 37.952 ms] change: [+1.3178% +1.6318% +1.9647%] (p = 0.00 < 0.05) transfer/pacing-true/varying-seeds: Change within noise threshold.time: [38.673 ms 38.789 ms 38.908 ms] change: [+1.1776% +1.6292% +2.0804%] (p = 0.00 < 0.05) transfer/pacing-false/same-seed: Change within noise threshold.time: [37.351 ms 37.415 ms 37.480 ms] change: [+1.5200% +1.7767% +2.0321%] (p = 0.00 < 0.05) transfer/pacing-true/same-seed: Change within noise threshold.time: [39.053 ms 39.149 ms 39.251 ms] change: [+1.1312% +1.5003% +1.8620%] (p = 0.00 < 0.05) Download data for |
neqo-crypto/src/agent.rs
Outdated
@@ -459,9 +459,9 @@ impl SecretAgent { | |||
alert: Box::pin(None), | |||
now: TimeHolder::default(), | |||
|
|||
extension_handlers: Vec::new(), | |||
extension_handlers: Vec::with_capacity(4), // Typical number of TLS extensions |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
WAT. We add ONE.
We could move to SmallVec<[1; _]>
, I guess.
neqo-crypto/src/agent.rs
Outdated
|
||
ech_config: Vec::new(), | ||
ech_config: Vec::with_capacity(1), // Usually 0 or 1 ECH config |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is actually some number of u8
s.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, so it's a byte array? I'm going to suggest 64 then.
neqo-crypto/src/agent.rs
Outdated
@@ -1040,7 +1040,7 @@ impl Client { | |||
let mut client = Self { | |||
agent, | |||
server_name, | |||
resumption: Box::pin(Vec::new()), | |||
resumption: Box::pin(Vec::with_capacity(256)), // Typical TLS resumption token size |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This number is very wrong. And trivially measurable. We should be able to get traces with a better number. Any of our resumption tests, with logging on, will spit out a value that we can round up by a little bit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tokens are 840 or 856 bytes in tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd hedge and say 900.
neqo-crypto/src/agentio.rs
Outdated
Self { | ||
input: AgentIoInput { | ||
input: null(), | ||
available: 0, | ||
}, | ||
output: Vec::new(), | ||
output: Vec::with_capacity(1500), // Pre-allocate for typical TLS record output |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, this is also wrong. It's the analogue of the pre-allocation we'd use for CRYPTO frames. It is not record size.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the one test (neqo-crypto::agent basic
) where this isn't zero, it's 238.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That seems low. Don't we have a handshake message with an ML-KEM share in it?
#[expect(clippy::cast_sign_loss, reason = "OK because <= 2^24")] | ||
let mut ocsp_helper: Vec<Vec<u8>> = Vec::with_capacity(len as usize); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is solid. I don't like the cast much, but the preallocation is legit. (I should say, it won't do squat for performance though, so the extra complexity doesn't really pay off at all.)
Client/server transfer resultsPerformance differences relative to 6942acc. Transfer of 33554432 bytes over loopback, min. 100 runs. All unit-less numbers are in milliseconds.
Download data for |
There doesn't seem to be any win here. |
Similar to #2827