Commit 1c1a8f5

Setup and use new benchmarking harness
1 parent f14458b commit 1c1a8f5

14 files changed (+1288, -913 lines)

Cargo.lock

Lines changed: 5 additions & 117 deletions
Some generated files are not rendered by default.

benches/Cargo.toml

Lines changed: 7 additions & 4 deletions
```diff
@@ -16,19 +16,18 @@ name = "wgpu-benchmark"
 harness = false
 
 [features]
-# Uncomment these features to enable tracy and superluminal profiling.
-# tracy = ["dep:tracy-client", "profiling/profile-with-tracy"]
-# superluminal = ["profiling/profile-with-superluminal"]
+tracy = ["dep:tracy-client"]
 
 [lints.rust]
 unexpected_cfgs = { level = "warn", check-cfg = [
     'cfg(feature, values("tracy"))',
 ] }
 
 [dependencies]
+anyhow.workspace = true
 bincode = { workspace = true, features = ["serde"] }
 bytemuck.workspace = true
-criterion.workspace = true
+# criterion.workspace = true
 naga = { workspace = true, features = [
     "deserialize",
     "serialize",
```
```diff
@@ -43,8 +42,12 @@ naga = { workspace = true, features = [
 ] }
 naga-test = { workspace = true, features = [] }
 nanorand.workspace = true
+pico-args.workspace = true
 pollster.workspace = true
 profiling.workspace = true
 rayon.workspace = true
+serde = { workspace = true, features = ["derive"] }
+serde_json.workspace = true
+termcolor.workspace = true
 tracy-client = { workspace = true, optional = true }
 wgpu.workspace = true
```
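Because `harness = false` is kept and `criterion` is commented out, the bench target supplies its own `main` instead of relying on a generated harness. The new harness's source is not shown on this page, so the following is only a minimal, std-only sketch of what such an entry point can look like. It is not wgpu's implementation. It assumes the invocation shape the README documents (`<path_to_exe> --bench "filter"`), treats the first non-flag argument as a substring filter, and times each selected function for a fixed wall-clock budget. `Bench`, `parse_filter`, `select`, `time_it`, and the two toy workloads are hypothetical names.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// One named benchmark (hypothetical type); `run` performs a single iteration.
struct Bench {
    name: &'static str,
    run: fn(),
}

/// Returns the first argument that is not a flag, used as a name filter.
/// `cargo bench` forwards `--bench` plus anything written after `--`.
/// This sketch skips every argument starting with `-` and does not
/// understand flags that take values (e.g. `--save-baseline NAME`).
fn parse_filter(args: &[String]) -> Option<String> {
    args.iter().find(|a| !a.starts_with('-')).cloned()
}

/// Keeps the benchmarks whose name contains `filter`, or all of them if there is none.
fn select<'a>(benches: &'a [Bench], filter: Option<&str>) -> Vec<&'a Bench> {
    benches
        .iter()
        .filter(|b| filter.map_or(true, |f| b.name.contains(f)))
        .collect()
}

/// Runs `f` repeatedly until `budget` has elapsed and returns the mean time per iteration.
fn time_it(f: fn(), budget: Duration) -> Duration {
    let start = Instant::now();
    let mut iters: u32 = 0;
    while start.elapsed() < budget {
        f();
        iters += 1;
    }
    start.elapsed() / iters.max(1)
}

// Two toy workloads standing in for real wgpu benchmarks.
fn sum_small() {
    let total: u64 = (0..1_000u64).map(black_box).sum();
    black_box(total);
}

fn sort_small() {
    let mut v: Vec<u32> = (0..1_000u32).rev().collect();
    v.sort_unstable();
    black_box(v);
}

fn main() {
    // Skip argv[0], the path of the bench executable itself.
    let args: Vec<String> = std::env::args().skip(1).collect();
    let filter = parse_filter(&args);
    let benches = [
        Bench { name: "sum_small", run: sum_small },
        Bench { name: "sort_small", run: sort_small },
    ];
    for bench in select(&benches, filter.as_deref()) {
        let mean = time_it(bench.run, Duration::from_millis(200));
        println!("{}: {:?} per iteration", bench.name, mean);
    }
}
```

A production harness would typically add warm-up iterations and collect several samples per benchmark. This sketch reports a single mean, so expect run-to-run noise.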

benches/README.md

Lines changed: 47 additions & 49 deletions
````diff
@@ -1,9 +1,6 @@
 Collection of CPU benchmarks for `wgpu`.
 
 These benchmarks are designed as a first line of defence against performance regressions and generally approximate the performance for users.
-They all do very little GPU work and are testing the CPU performance of the API.
-
-Criterion will give you the end-to-end performance of the benchmark, but you can also use a profiler to get more detailed information about where time is being spent.
 
 ## Usage
 
````
````diff
@@ -14,73 +11,38 @@ cargo bench -p wgpu-benchmark
 cargo bench -p wgpu-benchmark -- "filter"
 ```
 
-## Benchmarks
-
-#### `Renderpass`
-
-This benchmark measures the performance of recording and submitting a render pass with a large
-number of draw calls and resources, emulating an intense, more traditional graphics application.
-By default it measures 10k draw calls, with 90k total resources.
-
-Within this benchmark, both single threaded and multi-threaded recording are tested, as well as splitting
-the render pass into multiple passes over multiple command buffers.
-If available, it also tests a bindless approach, binding all textures at once instead of switching
-the bind group for every draw call.
-
-#### `Computepass`
-
-This benchmark measures the performance of recording and submitting a compute pass with a large
-number of dispatches and resources.
-By default it measures 10k dispatch calls, with 60k total resources, emulating an unusually complex and sequential compute workload.
-
-Within this benchmark, both single threaded and multi-threaded recording are tested, as well as splitting
-the compute pass into multiple passes over multiple command buffers.
-If available, it also tests a bindless approach, binding all resources at once instead of switching
-the bind group for every draw call.
-TODO(https://github.com/gfx-rs/wgpu/issues/5766): The bindless version uses only 1k dispatches with 6k resources since it would be too slow for a reasonable benchmarking time otherwise.
-
-
-#### `Resource Creation`
-
-This benchmark measures the performance of creating large resources. By default it makes buffers that are 256MB. It tests this over a range of thread counts.
-
-#### `Shader Compilation`
-
-This benchmark measures the performance of naga parsing, validating, and generating shaders.
+Use `WGPU_BACKEND` and `WGPU_ADAPTER_NAME` to adjust which device the benchmarks use. [More info on env vars](../README.md#environment-variables).
 
 ## Comparing Against a Baseline
 
 To compare the current benchmarks against a baseline, you can use the `--save-baseline` and `--baseline` flags.
 
-For example, to compare v0.20 against trunk, you could run the following:
+For example, to compare v28 against trunk, you could run the following:
 
 ```sh
-git checkout v0.20
-
+git checkout v28
 # Run the baseline benchmarks
-cargo bench -p wgpu-benchmark -- --save-baseline "v0.20"
+cargo bench -p wgpu-benchmark -- --save-baseline "v28"
 
 git checkout trunk
-
 # Run the current benchmarks
-cargo bench -p wgpu-benchmark -- --baseline "v0.20"
+cargo bench -p wgpu-benchmark -- --baseline "v28"
 ```
 
-You can use this for any bits of code you want to compare.
+The current benchmarking framework was added before v28, so comparisons only work after it was added. Before that the same commands will work, but comparison will be done using `criterion`.
 
 ## Integration with Profilers
 
 The benchmarks can be run with a profiler to get more detailed information about where time is being spent.
-Integrations are available for `tracy` and `superluminal`. Due to some implementation details,
-you need to uncomment the features in the `Cargo.toml` to allow features to be used.
+Integrations are available for `tracy` and `superluminal`.
 
 #### Tracy
 
 Tracy is available prebuilt for Windows on [github](https://github.com/wolfpld/tracy/releases/latest/).
 
 ```sh
 # Once this is running, you can connect to it with the Tracy Profiler
-cargo bench -p wgpu-benchmark --features tracy
+cargo bench -p wgpu-benchmark --features tracy,profiling/profile-with-tracy
 ```
 
 #### Superluminal
````
````diff
@@ -89,10 +51,10 @@ Superluminal is a paid product for windows available [here](https://superluminal
 
 ```sh
 # This command will build the benchmarks, and display the path to the executable
-cargo bench -p wgpu-benchmark --features superluminal -- -h
+cargo bench -p wgpu-benchmark --features profiling/profile-with-superluminal -- -h
 
 # Have Superluminal run the following command (replacing with the path to the executable)
-./target/release/deps/root-2c45d61b38a65438.exe --bench "filter"
+<path_to_exe> --bench "filter"
 ```
 
 #### `perf` and others
````
````diff
@@ -105,6 +67,42 @@ For example, the command line tool `perf` can be used to profile the benchmarks.
 cargo bench -p wgpu-benchmark -- -h
 
 # Run the benchmarks with perf
-perf record ./target/release/deps/root-2c45d61b38a65438 --bench "filter"
+perf record <path_to_exe> --bench "filter"
 ```
 
+## Benchmarks
+
+#### `Renderpass Encoding`
+
+This benchmark measures the performance of recording and submitting a render pass with a large
+number of draw calls and resources, emulating an intense, more traditional graphics application.
+By default it measures 10k draw calls, with 90k total resources.
+
+Within this benchmark, both single threaded and multi-threaded recording are tested, as well as splitting
+the render pass into multiple passes over multiple command buffers.
+If available, it also tests a bindless approach, binding all textures at once instead of switching
+the bind group for every draw call.
+
+#### `Computepass Encoding`
+
+This benchmark measures the performance of recording and submitting a compute pass with a large
+number of dispatches and resources.
+By default it measures 10k dispatch calls, with 60k total resources, emulating an unusually complex and sequential compute workload.
+
+Within this benchmark, both single threaded and multi-threaded recording are tested, as well as splitting
+the compute pass into multiple passes over multiple command buffers.
+If available, it also tests a bindless approach, binding all resources at once instead of switching
+the bind group for every draw call.
+TODO(https://github.com/gfx-rs/wgpu/issues/5766): The bindless version uses only 1k dispatches with 6k resources since it would be too slow for a reasonable benchmarking time otherwise.
+
+#### `Device::create_buffer`
+
+This benchmark measures the performance of creating large buffers.
+
+#### `Device::create_bind_group`
+
+This benchmark measures the performance of creating large bind groups of 5 to 50,000 resources.
+
+#### `naga::back`, `naga::compact`, `naga::front`, and `naga::valid`
+
+These benchmark measures the performance of naga parsing, validating, and generating shaders.
````
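Conceptually, the README's `--save-baseline` / `--baseline` workflow stores per-benchmark timings from one run and compares a later run against them. This page does not show how the new harness persists baselines or decides what counts as a change. The added `serde` and `serde_json` dependencies hint at JSON files, but that is an inference. The sketch below illustrates only the comparison step and rests on several assumptions: timings are mean nanoseconds per iteration keyed by benchmark name, and changes within a hypothetical 2% noise threshold are reported as no change. `Verdict`, `percent_change`, and `compare` are illustrative names, and the numbers in `main` are placeholders, not measurements.

```rust
use std::collections::BTreeMap;

/// How one benchmark's current timing relates to its saved baseline (hypothetical type).
#[derive(Debug, PartialEq)]
enum Verdict {
    Improved(f64),
    Regressed(f64),
    NoChange(f64),
    NewBenchmark,
}

/// Relative change from `baseline_ns` to `current_ns` in percent; positive means slower.
/// Assumes `baseline_ns` is non-zero.
fn percent_change(baseline_ns: f64, current_ns: f64) -> f64 {
    (current_ns - baseline_ns) / baseline_ns * 100.0
}

/// Compares current mean times against a baseline. Changes within `noise_pct`
/// percent count as noise; names missing from the baseline are new benchmarks.
fn compare(
    baseline: &BTreeMap<String, f64>,
    current: &BTreeMap<String, f64>,
    noise_pct: f64,
) -> BTreeMap<String, Verdict> {
    current
        .iter()
        .map(|(name, &now)| {
            let verdict = match baseline.get(name) {
                None => Verdict::NewBenchmark,
                Some(&then) => {
                    let pct = percent_change(then, now);
                    if pct > noise_pct {
                        Verdict::Regressed(pct)
                    } else if pct < -noise_pct {
                        Verdict::Improved(pct)
                    } else {
                        Verdict::NoChange(pct)
                    }
                }
            };
            (name.clone(), verdict)
        })
        .collect()
}

fn main() {
    // Placeholder values standing in for a saved "v28" baseline and a trunk run.
    let baseline = BTreeMap::from([
        ("Renderpass Encoding".to_string(), 1_000.0),
        ("Device::create_buffer".to_string(), 500.0),
    ]);
    let current = BTreeMap::from([
        ("Renderpass Encoding".to_string(), 1_200.0),
        ("Device::create_buffer".to_string(), 495.0),
        ("Device::create_bind_group".to_string(), 300.0),
    ]);
    for (name, verdict) in compare(&baseline, &current, 2.0) {
        println!("{name}: {verdict:?}");
    }
}
```

A fixed percentage threshold keeps the sketch short. Statistical harnesses usually compare sample distributions instead, which copes better with noisy machines.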
