These benchmarks are designed as a first line of defence against performance regressions and generally approximate the performance for users.
-They all do very little GPU work and are testing the CPU performance of the API.
-
-Criterion will give you the end-to-end performance of the benchmark, but you can also use a profiler to get more detailed information about where time is being spent.

## Usage

@@ -14,73 +11,38 @@ cargo bench -p wgpu-benchmark
cargo bench -p wgpu-benchmark -- "filter"
```

-## Benchmarks
-
-#### `Renderpass`
-
-This benchmark measures the performance of recording and submitting a render pass with a large
-number of draw calls and resources, emulating an intense, more traditional graphics application.
-By default it measures 10k draw calls, with 90k total resources.
-
-Within this benchmark, both single threaded and multi-threaded recording are tested, as well as splitting
-the render pass into multiple passes over multiple command buffers.
-If available, it also tests a bindless approach, binding all textures at once instead of switching
-the bind group for every draw call.
-
-#### `Computepass`
-
-This benchmark measures the performance of recording and submitting a compute pass with a large
-number of dispatches and resources.
-By default it measures 10k dispatch calls, with 60k total resources, emulating an unusually complex and sequential compute workload.
-
-Within this benchmark, both single threaded and multi-threaded recording are tested, as well as splitting
-the compute pass into multiple passes over multiple command buffers.
-If available, it also tests a bindless approach, binding all resources at once instead of switching
-the bind group for every draw call.
-TODO(https://github.com/gfx-rs/wgpu/issues/5766): The bindless version uses only 1k dispatches with 6k resources since it would be too slow for a reasonable benchmarking time otherwise.
-
-
-#### `Resource Creation`
-
-This benchmark measures the performance of creating large resources. By default it makes buffers that are 256MB. It tests this over a range of thread counts.
-
-#### `Shader Compilation`
-
-This benchmark measures the performance of naga parsing, validating, and generating shaders.
+Use `WGPU_BACKEND` and `WGPU_ADAPTER_NAME` to adjust which device the benchmarks use. [More info on env vars](../README.md#environment-variables).

## Comparing Against a Baseline

To compare the current benchmarks against a baseline, you can use the `--save-baseline` and `--baseline` flags.

-For example, to compare v0.20 against trunk, you could run the following:
+For example, to compare v28 against trunk, you could run the following:
You can use this for any bits of code you want to compare.
+The current benchmarking framework was added before v28, so comparisons only work after it was added. Before that the same commands will work, but comparison will be done using `criterion`.

## Integration with Profilers

The benchmarks can be run with a profiler to get more detailed information about where time is being spent.
-Integrations are available for `tracy` and `superluminal`. Due to some implementation details,
-you need to uncomment the features in the `Cargo.toml` to allow features to be used.
+Integrations are available for `tracy` and `superluminal`.

#### Tracy

Tracy is available prebuilt for Windows on [github](https://github.com/wolfpld/tracy/releases/latest/).

```sh
# Once this is running, you can connect to it with the Tracy Profiler
@@ -105,6 +67,42 @@ For example, the command line tool `perf` can be used to profile the benchmarks.
cargo bench -p wgpu-benchmark -- -h

# Run the benchmarks with perf
-perf record ./target/release/deps/root-2c45d61b38a65438 --bench "filter"
+perf record <path_to_exe> --bench "filter"
```

+## Benchmarks
+
+#### `Renderpass Encoding`
+
+This benchmark measures the performance of recording and submitting a render pass with a large
+number of draw calls and resources, emulating an intense, more traditional graphics application.
+By default it measures 10k draw calls, with 90k total resources.
+
+Within this benchmark, both single threaded and multi-threaded recording are tested, as well as splitting
+the render pass into multiple passes over multiple command buffers.
+If available, it also tests a bindless approach, binding all textures at once instead of switching
+the bind group for every draw call.
+
+#### `Computepass Encoding`
+
+This benchmark measures the performance of recording and submitting a compute pass with a large
+number of dispatches and resources.
+By default it measures 10k dispatch calls, with 60k total resources, emulating an unusually complex and sequential compute workload.
+
+Within this benchmark, both single threaded and multi-threaded recording are tested, as well as splitting
+the compute pass into multiple passes over multiple command buffers.
+If available, it also tests a bindless approach, binding all resources at once instead of switching
+the bind group for every dispatch.
+TODO(https://github.com/gfx-rs/wgpu/issues/5766): The bindless version uses only 1k dispatches with 6k resources since it would be too slow for a reasonable benchmarking time otherwise.
+
+#### `Device::create_buffer`
+
+This benchmark measures the performance of creating large buffers.
+
+#### `Device::create_bind_group`
+
+This benchmark measures the performance of creating large bind groups of 5 to 50,000 resources.
+
+#### `naga::back`, `naga::compact`, `naga::front`, and `naga::valid`
+
+These benchmarks measure the performance of naga parsing, validating, and generating shaders.
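
The commands this README describes (filtering by benchmark name, selecting a device through `WGPU_BACKEND` and `WGPU_ADAPTER_NAME`, and comparing with `--save-baseline` and `--baseline`) could be wrapped in a small helper script. The following is only a sketch under stated assumptions: the function names `run`, `bench_filter`, `bench_on`, and `compare_refs` and the `DRY_RUN` variable are hypothetical and not part of the repository, the git refs are placeholders, and it assumes the filter matches benchmark names such as those in the headings above.

```sh
#!/bin/sh
# Hypothetical helpers, not part of the wgpu repository.

# Print the command when DRY_RUN is set, otherwise execute it.
# The printed form is for inspection only; quoting is not preserved.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# bench_filter FILTER
# Run only the benchmarks selected by FILTER.
bench_filter() {
    run cargo bench -p wgpu-benchmark -- "$1"
}

# bench_on BACKEND ADAPTER FILTER
# Same as bench_filter, but choose the device through the environment.
# Assumption: BACKEND is a value accepted by WGPU_BACKEND (for example "vulkan").
bench_on() {
    run env WGPU_BACKEND="$1" WGPU_ADAPTER_NAME="$2" \
        cargo bench -p wgpu-benchmark -- "$3"
}

# compare_refs BASE_REF NEW_REF
# Save a baseline named after BASE_REF, then compare NEW_REF against it.
# Naming the baseline after the ref is a choice made for this sketch.
compare_refs() {
    run git checkout "$1" &&
        run cargo bench -p wgpu-benchmark -- --save-baseline "$1" &&
        run git checkout "$2" &&
        run cargo bench -p wgpu-benchmark -- --baseline "$1"
}
```

After sourcing the script, setting `DRY_RUN=1` and calling `compare_refs v28 trunk` prints the four commands without running them, which is a cheap way to check the sequence before committing to a full benchmark run.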