
Commit 6b813be

openshift-merge-bot[bot] and heyselbi authored and committed
Merge pull request opendatahub-io#63 from dtrifiro/sync-release-with-main
sync release with main @ v0.5.0.post1-99-g8720c92e
2 parents 45f4fe4 + eeb6f33 commit 6b813be


546 files changed (+39374, -15406 lines changed)


.buildkite/check-wheel-size.py

Lines changed: 1 addition & 1 deletion
@@ -1,7 +1,7 @@
 import os
 import zipfile

-MAX_SIZE_MB = 150
+MAX_SIZE_MB = 200


 def print_top_10_largest_files(zip_file):
Lines changed: 103 additions & 0 deletions
@@ -0,0 +1,103 @@
# vLLM benchmark suite

## Introduction

This directory contains the performance benchmarking CI for vllm.
The goal is to help developers know the impact of their PRs on the performance of vllm.

This benchmark will be *triggered* upon:
- A PR being merged into vllm.
- Every commit for PRs with the `perf-benchmarks` label.

**Benchmarking Coverage**: latency, throughput and fixed-QPS serving on A100 (support for more GPUs is coming later), with different models.

**Benchmarking Duration**: about 1hr.

**For benchmarking developers**: please try your best to constrain the duration of benchmarking to less than 1.5 hr so that it won't take forever to run.


## Configuring the workload

The benchmarking workload contains three parts:
- Latency tests in `latency-tests.json`.
- Throughput tests in `throughput-tests.json`.
- Serving tests in `serving-tests.json`.

See [descriptions.md](tests/descriptions.md) for detailed descriptions.

### Latency test

Here is an example of one test inside `latency-tests.json`:

```json
[
  {
    "test_name": "latency_llama8B_tp1",
    "parameters": {
      "model": "meta-llama/Meta-Llama-3-8B",
      "tensor_parallel_size": 1,
      "load_format": "dummy",
      "num_iters_warmup": 5,
      "num_iters": 15
    }
  }
]
```

In this example:
- The `test_name` attribute is a unique identifier for the test. In `latency-tests.json`, it must start with `latency_`.
- The `parameters` attribute controls the command line arguments used for `benchmark_latency.py`. Note that you should use an underscore `_` instead of a dash `-` when specifying the arguments; `run-benchmarks-suite.sh` converts the underscores back to dashes when feeding the arguments to `benchmark_latency.py`. For example, the corresponding command line arguments for `benchmark_latency.py` will be `--model meta-llama/Meta-Llama-3-8B --tensor-parallel-size 1 --load-format dummy --num-iters-warmup 5 --num-iters 15` (a minimal sketch of this conversion appears at the end of this section).

Note that the performance numbers are highly sensitive to the parameter values. Please make sure the parameters are set correctly.

WARNING: The benchmarking script saves JSON results itself, so please do not set the `--output-json` parameter in the JSON file.
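As a rough illustration of the underscore-to-dash conversion described above, here is a small Python sketch. It is hypothetical (the helper name `params_to_cli_args` is made up, and the real logic lives in `run-benchmarks-suite.sh`), but it shows the intended mapping from a `parameters` block to a flag string:

```python
# Hypothetical sketch only -- the actual conversion is done by
# run-benchmarks-suite.sh; this just illustrates the mapping.
import json
import shlex


def params_to_cli_args(parameters: dict) -> str:
    """Turn a "parameters" dict into a flag string, converting _ to -."""
    args = []
    for key, value in parameters.items():
        flag = "--" + key.replace("_", "-")
        if value == "":
            args.append(flag)  # an empty string means a boolean-style flag
        else:
            args.append(f"{flag} {shlex.quote(str(value))}")
    return " ".join(args)


parameters = json.loads("""
{
    "model": "meta-llama/Meta-Llama-3-8B",
    "tensor_parallel_size": 1,
    "load_format": "dummy",
    "num_iters_warmup": 5,
    "num_iters": 15
}
""")
print(params_to_cli_args(parameters))
# --model meta-llama/Meta-Llama-3-8B --tensor-parallel-size 1 --load-format dummy --num-iters-warmup 5 --num-iters 15
```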
### Throughput test

The tests are specified in `throughput-tests.json`. The syntax is similar to `latency-tests.json`, except that the parameters are fed forward to `benchmark_throughput.py`.

The numbers from this test are also stable -- a slight change in these numbers is likely to reflect a real performance difference.

### Serving test

We test the throughput by using `benchmark_serving.py` with request rate = inf, so that the online serving overhead is covered. The corresponding parameters are in `serving-tests.json`, and here is an example:

```json
[
  {
    "test_name": "serving_llama8B_tp1_sharegpt",
    "qps_list": [1, 4, 16, "inf"],
    "server_parameters": {
      "model": "meta-llama/Meta-Llama-3-8B",
      "tensor_parallel_size": 1,
      "swap_space": 16,
      "disable_log_stats": "",
      "disable_log_requests": "",
      "load_format": "dummy"
    },
    "client_parameters": {
      "model": "meta-llama/Meta-Llama-3-8B",
      "backend": "vllm",
      "dataset_name": "sharegpt",
      "dataset_path": "./ShareGPT_V3_unfiltered_cleaned_split.json",
      "num_prompts": 200
    }
  }
]
```

Inside this example:
- The `test_name` attribute is also a unique identifier for the test. It must start with `serving_`.
- The `server_parameters` attribute includes the command line arguments for the vLLM server.
- The `client_parameters` attribute includes the command line arguments for `benchmark_serving.py`.
- The `qps_list` attribute controls the list of QPS values for the test. It is used to set the `--request-rate` parameter of `benchmark_serving.py` (see the sketch at the end of this section).

The numbers from this test are less stable than the latency and throughput benchmarks (due to the randomized ShareGPT dataset sampling inside `benchmark_serving.py`), but a large change in these numbers (e.g. a 5% change) still indicates a meaningful performance difference.

WARNING: The benchmarking script saves JSON results itself, so please do not set `--save-results` or other results-saving-related parameters in `serving-tests.json`.
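To make the `qps_list` mapping concrete, here is a small hypothetical sketch (not the actual suite code; the helper names are made up) of how one serving test could expand into one `benchmark_serving.py` invocation per QPS value:

```python
# Hypothetical sketch only -- the real orchestration lives in
# run-benchmarks-suite.sh; this just illustrates the qps_list expansion.
def to_flags(params: dict) -> str:
    """Convert a parameters dict into command line flags (underscores -> dashes)."""
    return " ".join(
        f"--{key.replace('_', '-')}" + (f" {value}" if value != "" else "")
        for key, value in params.items()
    )


serving_test = {
    "test_name": "serving_llama8B_tp1_sharegpt",
    "qps_list": [1, 4, 16, "inf"],
    "client_parameters": {
        "model": "meta-llama/Meta-Llama-3-8B",
        "backend": "vllm",
        "dataset_name": "sharegpt",
        "dataset_path": "./ShareGPT_V3_unfiltered_cleaned_split.json",
        "num_prompts": 200,
    },
}

for qps in serving_test["qps_list"]:
    # one benchmark_serving.py run per entry in qps_list
    print(f"python3 benchmark_serving.py {to_flags(serving_test['client_parameters'])} --request-rate {qps}")
```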
## Visualizing the results

The `convert-results-json-to-markdown.py` script puts the benchmarking results into a markdown table, by formatting [descriptions.md](tests/descriptions.md) with the real benchmarking results.
You can find the resulting table inside the `buildkite/performance-benchmark` job page.
If you do not see the table, please wait until the benchmark finishes running.
The JSON version of the table (together with the JSON version of the benchmark results) will also be attached to the markdown file.
The raw benchmarking results (as JSON files) are available in the `Artifacts` tab of the benchmarking job.
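For illustration, a markdown table can be produced from a list of result records along these lines. This is a hypothetical sketch, not the actual `convert-results-json-to-markdown.py`, and the field names are made up:

```python
# Hypothetical sketch only -- field names and structure are illustrative,
# not the real result schema used by convert-results-json-to-markdown.py.
def to_markdown_table(results: list[dict]) -> str:
    """Render a list of flat result dicts as a markdown table."""
    headers = list(results[0].keys())
    lines = [
        "| " + " | ".join(headers) + " |",
        "| " + " | ".join("---" for _ in headers) + " |",
    ]
    for row in results:
        lines.append("| " + " | ".join(str(row[h]) for h in headers) + " |")
    return "\n".join(lines)


results = [
    {"test_name": "latency_llama8B_tp1", "avg_latency_s": 1.23},
    {"test_name": "latency_llama8B_tp4", "avg_latency_s": 0.61},
]
print(to_markdown_table(results))
```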
Lines changed: 62 additions & 0 deletions
@@ -0,0 +1,62 @@
steps:
  - label: "Wait for container to be ready"
    agents:
      queue: A100
    plugins:
    - kubernetes:
        podSpec:
          containers:
          - image: badouralix/curl-jq
            command:
            - sh
            - .buildkite/nightly-benchmarks/scripts/wait-for-image.sh
  - wait
  - label: "A100 Benchmark"
    agents:
      queue: A100
    plugins:
    - kubernetes:
        podSpec:
          priorityClassName: perf-benchmark
          containers:
          - image: public.ecr.aws/q9t5s3a7/vllm-ci-test-repo:$BUILDKITE_COMMIT
            command:
            - bash .buildkite/nightly-benchmarks/run-benchmarks-suite.sh
            resources:
              limits:
                nvidia.com/gpu: 8
            volumeMounts:
            - name: devshm
              mountPath: /dev/shm
            env:
            - name: VLLM_USAGE_SOURCE
              value: ci-test
            - name: HF_TOKEN
              valueFrom:
                secretKeyRef:
                  name: hf-token-secret
                  key: token
          nodeSelector:
            nvidia.com/gpu.product: NVIDIA-A100-SXM4-80GB
          volumes:
          - name: devshm
            emptyDir:
              medium: Memory
  # - label: "H100: NVIDIA SMI"
  #   agents:
  #     queue: H100
  #   plugins:
  #     - docker#v5.11.0:
  #         image: public.ecr.aws/q9t5s3a7/vllm-ci-test-repo:$BUILDKITE_COMMIT
  #         command:
  #         - bash
  #         - .buildkite/nightly-benchmarks/run-benchmarks-suite.sh
  #         mount-buildkite-agent: true
  #         propagate-environment: true
  #         propagate-uid-gid: false
  #         ipc: host
  #         gpus: all
  #         environment:
  #         - VLLM_USAGE_SOURCE
  #         - HF_TOKEN
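In words, this pipeline first waits until the CI image tagged with the current `$BUILDKITE_COMMIT` is available, then runs `run-benchmarks-suite.sh` on an A100 node with 8 GPUs, `/dev/shm` mounted, and the `HF_TOKEN` secret injected. The wait script itself (`wait-for-image.sh`) is not part of this diff; purely as a hypothetical illustration, with a placeholder registry URL and made-up names, such a wait loop might look like:

```python
# Hypothetical illustration only -- the URL and names below are placeholders,
# not the contents of wait-for-image.sh.
import os
import time
import urllib.request


def image_exists(manifest_url: str) -> bool:
    """Return True if the registry answers the manifest request with HTTP 200."""
    request = urllib.request.Request(manifest_url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=10) as response:
            return response.status == 200
    except Exception:
        return False


commit = os.environ.get("BUILDKITE_COMMIT", "latest")
url = f"https://registry.example.com/v2/vllm-ci-test-repo/manifests/{commit}"  # placeholder URL
while not image_exists(url):
    print("Image not ready yet, sleeping for 60 seconds...")
    time.sleep(60)
print("Image is available.")
```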
Lines changed: 27 additions & 0 deletions
@@ -0,0 +1,27 @@
#!/usr/bin/env bash

# NOTE(simon): this script runs inside a buildkite agent with CPU only access.
set -euo pipefail

# Install system packages
apt update
apt install -y curl jq

# Install minijinja for templating
curl -sSfL https://github.com/mitsuhiko/minijinja/releases/latest/download/minijinja-cli-installer.sh | sh
source $HOME/.cargo/env

# If BUILDKITE_PULL_REQUEST != "false", then we check the PR labels using curl and jq
if [ "$BUILDKITE_PULL_REQUEST" != "false" ]; then
  PR_LABELS=$(curl -s "https://api.github.com/repos/vllm-project/vllm/pulls/$BUILDKITE_PULL_REQUEST" | jq -r '.labels[].name')

  if [[ $PR_LABELS == *"perf-benchmarks"* ]]; then
    echo "This PR has the 'perf-benchmarks' label. Proceeding with the nightly benchmarks."
  else
    echo "This PR does not have the 'perf-benchmarks' label. Skipping the nightly benchmarks."
    exit 0
  fi
fi

# Upload the benchmark pipeline
buildkite-agent pipeline upload .buildkite/nightly-benchmarks/benchmark-pipeline.yaml
