Commit dd3fba3

upfixer authored
[Plot] Rename plot axis label and xlimt start from 0 and update docs (#39)
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: key4ng <[email protected]>

1 parent 4a42ed4 · commit dd3fba3

19 files changed: +142 −177 lines changed

.coveragerc

Lines changed: 2 additions & 0 deletions
@@ -3,6 +3,8 @@ omit =
     genai_bench/cli/report.py
     genai_bench/analysis/excel_report.py
     genai_bench/analysis/plot_report.py
+    genai_bench/analysis/flexible_plot_report.py
+    genai_bench/analysis/plot_config.py
     genai_bench/ui/*
     genai_bench/logging.py
     tests/*

docs/user-guide/generate-plot.md

Lines changed: 4 additions & 4 deletions
@@ -3,12 +3,12 @@
 ## Quick Start
 You can check out `genai-bench plot --help` to find how to generate a 2x4 Plot containing:

-1. Output Inference Speed (tokens/s) vs Output Throughput of Server (tokens/s)
-2. TTFT (s) vs Output Throughput of Server (tokens/s)
+1. Per-Request Inference Speed (tokens/s) vs Server Output Throughput (tokens/s)
+2. TTFT (s) vs Server Output Throughput (tokens/s)
 3. Mean E2E Latency (s) per Request vs RPS
 4. Error Rates by HTTP Status vs Concurrency
-5. Output Inference Speed per Request (tokens/s) vs Total Throughput (Input + Output) of Server (tokens/s)
-6. TTFT (s) vs Total Throughput (Input + Output) of Server (tokens/s)
+5. Per-Request Inference Speed (tokens/s) vs Server Total Throughput (Input + Output) (tokens/s)
+6. TTFT (s) vs Server Total Throughput (Input + Output) (tokens/s)
 7. P90 E2E Latency (s) per Request vs RPS
 8. P99 E2E Latency (s) per Request vs RPS
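Two of the commit's changes are visible in the list above: the axis labels are renamed, and (per the commit title) the x-axis now starts at 0. As an illustration only, and not genai-bench's plotting code, here is a small matplotlib sketch of one panel using made-up numbers:

```python
# Illustrative sketch only: not genai-bench's implementation. Data points are
# made up; the labels match the renamed axes in this commit, and the x-axis is
# pinned to start at 0 as the commit title describes.
import matplotlib.pyplot as plt

server_output_throughput = [120.0, 450.0, 900.0, 1400.0]  # tokens/s (hypothetical)
ttft_seconds = [0.21, 0.35, 0.62, 1.10]                    # TTFT in seconds (hypothetical)

fig, ax = plt.subplots(figsize=(8, 6))
ax.plot(server_output_throughput, ttft_seconds, marker="o")
ax.set_title("TTFT vs Server Output Throughput")
ax.set_xlabel("Server Output Throughput (tokens/s)")
ax.set_ylabel("TTFT (s)")
ax.set_xlim(left=0)  # axis starts from 0 instead of auto-scaling to the data
fig.savefig("ttft_vs_server_output_throughput.png")
```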

docs/user-guide/multi-cloud-auth-storage.md

Lines changed: 3 additions & 3 deletions
@@ -176,7 +176,7 @@ genai-bench benchmark \
   --max-requests-per-run 100 \
   --max-time-per-run 10
 ```
-**Note:** for Dedicated model, the `--api-model-name` is just a placeholder, the model depends on the the endpointId you provided
+**Note:** for Dedicated model, the `--api-model-name` is just a placeholder, the model depends on the the endpointId you provided

 **Advanced features:**
 ```bash

@@ -343,7 +343,7 @@ vLLM and SGLang use OpenAI-compatible APIs with optional authentication.
 **Example:**
 ```bash
 genai-bench benchmark \
-  --api-backend vllm \
+  --api-backend sglang \
   --api-base http://localhost:8000 \
   --api-key optional-key \
   --api-model-name meta-llama/Llama-2-7b-hf \

@@ -657,4 +657,4 @@ The main changes are:

 - `--bucket` → `--storage-bucket`
 - `--prefix` → `--storage-prefix`
-- Add `--storage-provider oci` (though OCI is the default for backward compatibility)
+- Add `--storage-provider oci` (though OCI is the default for backward compatibility)

docs/user-guide/multi-cloud-quick-reference.md

Lines changed: 2 additions & 2 deletions
@@ -2,7 +2,7 @@

 This is a quick reference guide for common multi-cloud scenarios with genai-bench. For detailed information, see the [comprehensive guide](multi-cloud-auth-storage.md).

-> **Note**: For OpenAI, vLLM, and SGLang backends, both `--api-key` and `--model-api-key` are supported for backward compatibility.
+> **Note**: For OpenAI, SGLang and vLLM backends, both `--api-key` and `--model-api-key` are supported for backward compatibility.

 ## OpenAI Benchmarking


@@ -277,4 +277,4 @@ export GITHUB_REPO=benchmarks
 ```bash
 # HuggingFace (for downloading tokenizers)
 export HF_TOKEN=hf_...
-```
+```

docs/user-guide/run-benchmark.md

Lines changed: 4 additions & 4 deletions
@@ -21,12 +21,12 @@ export TRANSFORMERS_VERBOSITY=error
 genai-bench benchmark --api-backend openai \
   --api-base "http://localhost:8082" \
   --api-key "your-openai-api-key" \
-  --api-model-name "vllm-model" \
+  --api-model-name "meta-llama/Meta-Llama-3-70B-Instruct" \
   --model-tokenizer "/mnt/data/models/Meta-Llama-3.1-70B-Instruct" \
   --task text-to-text \
   --max-time-per-run 15 \
   --max-requests-per-run 300 \
-  --server-engine "vLLM" \
+  --server-engine "SGLang" \
   --server-gpu-type "H100" \
   --server-version "v0.6.0" \
   --server-gpu-count 4

@@ -119,7 +119,7 @@ genai-bench benchmark --api-backend oci-cohere \
   --api-base "https://inference.generativeai.us-chicago-1.oci.oraclecloud.com" \
   --api-model-name "c4ai-command-r-08-2024" \
   --model-tokenizer "/home/ubuntu/c4ai-command-r-08-2024" \
-  --server-engine "vLLM" \
+  --server-engine "SGLang" \
   --task text-to-text \
   --num-concurrency 1 \
   --server-gpu-type A100-80G \

@@ -344,4 +344,4 @@ If you want to benchmark a specific portion of a vision dataset, you can use the
 - Access to ALL HuggingFace `load_dataset` parameters
 - Reusable and version-controllable
 - Support for complex configurations
-- Future-proof (no CLI updates needed for new HuggingFace features)
+- Future-proof (no CLI updates needed for new HuggingFace features)

docs/user-guide/upload-benchmark-result.md

Lines changed: 3 additions & 3 deletions
@@ -16,12 +16,12 @@ To enable result uploading, use the following options with the `benchmark` command
 genai-bench benchmark \
   --api-base "http://localhost:8082" \
   --api-key "your-openai-api-key" \
-  --api-model-name "vllm-model" \
+  --api-model-name "meta-llama/Meta-Llama-3-70B-Instruct" \
   --model-tokenizer "/mnt/data/models/Meta-Llama-3.1-70B-Instruct" \
   --task text-to-text \
   --max-time-per-run 15 \
   --max-requests-per-run 300 \
-  --server-engine "vLLM" \
+  --server-engine "SGLang" \
   --server-gpu-type "H100" \
   --server-version "v0.6.0" \
   --server-gpu-count 4 \

@@ -44,4 +44,4 @@ GenAI Bench now supports multiple cloud storage providers:
 - **GCP Cloud Storage**: Use `--storage-provider gcp`
 - **GitHub Releases**: Use `--storage-provider github`

-For detailed configuration and authentication options for each provider, please refer to the [Multi-Cloud Authentication & Storage Guide](multi-cloud-auth-storage.md).
+For detailed configuration and authentication options for each provider, please refer to the [Multi-Cloud Authentication & Storage Guide](multi-cloud-auth-storage.md).

examples/experiment_excel.py

Lines changed: 1 addition & 1 deletion
@@ -11,7 +11,7 @@
 LoggingManager("excel")


-folder_name = "/Users/changsu/openai_chat_vllm-model_tokenizer__mnt_data_models_Llama-3-70B-Instruct_20240904_003850"  # noqa: E501
+folder_name = "/Users/changsu/openai_chat_sglang-model_tokenizer__mnt_data_models_Llama-3-70B-Instruct_20240904_003850"  # noqa: E501
 os.makedirs(folder_name, exist_ok=True)
 experiment_metadata, run_data = load_one_experiment(folder_name)
 create_workbook(

examples/experiment_plots.py

Lines changed: 6 additions & 6 deletions
@@ -6,16 +6,16 @@
     load_multiple_experiments,
     load_one_experiment,
 )
-from genai_bench.analysis.plot_report import plot_experiment_data
+from genai_bench.analysis.flexible_plot_report import plot_experiment_data_flexible
 from genai_bench.logging import LoggingManager

 LoggingManager("plot")


 # Example usage with filtering multiple experiments
-folder_name = "/Users/changsu/experiment_plot"
+folder_name = "<Path to the experiment folder>"
 filter_criteria = {
-    "model": "vllm-model",
+    "model": "Llama-4-Scout-17B-16E-Instruct",
 }

 os.makedirs(folder_name, exist_ok=True)

@@ -26,20 +26,20 @@
     print("Empty data after filtering")
 else:
     # Plot the data grouped by 'server_version'
-    plot_experiment_data(
+    plot_experiment_data_flexible(
         run_data_list, group_key="server_version", experiment_folder=folder_name
     )

 # Plot for one experiment
 experiment_folder = os.path.join(
     folder_name,
-    "openai_chat_vllm-model_tokenizer__mnt_data_models_Llama-3-70B-Instruct_20240904_003850",
+    "openai_SGLang_v0.4.7.post1_text-to-text_Llama-4-Scout-17B-16E-Instruct_20250620_042005",
 )
 experiment_metadata, run_data = load_one_experiment(experiment_folder)
 if not experiment_metadata or not run_data:
     print("Didn't find any experiment data")
 else:
-    plot_experiment_data(
+    plot_experiment_data_flexible(
         [
             [experiment_metadata, run_data],
         ],
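For readers who only need the grouped, multi-experiment path from the example above, a minimal stand-alone sketch follows. The import path and call shape of `load_multiple_experiments` are assumptions (only its name appears in this hunk); the plotting call mirrors the example verbatim.

```python
# Minimal sketch of the multi-experiment flow from the example above.
# Assumptions are marked: the experiment_loader module path and the
# (folder, filter_criteria) call shape are not shown in this diff.
from genai_bench.analysis.experiment_loader import (  # assumed module path
    load_multiple_experiments,
)
from genai_bench.analysis.flexible_plot_report import plot_experiment_data_flexible
from genai_bench.logging import LoggingManager

LoggingManager("plot")

folder_name = "<Path to the experiment folder>"  # placeholder, as in the example
filter_criteria = {"model": "Llama-4-Scout-17B-16E-Instruct"}

# Assumed call shape; the example defines filter_criteria for exactly this use.
run_data_list = load_multiple_experiments(folder_name, filter_criteria)

if not run_data_list:
    print("Empty data after filtering")
else:
    # Group results by server_version, as in the example diff.
    plot_experiment_data_flexible(
        run_data_list, group_key="server_version", experiment_folder=folder_name
    )
```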

genai_bench/analysis/excel_report.py

Lines changed: 4 additions & 4 deletions
@@ -195,17 +195,17 @@ def _create_appendix_sheet_common(
            [
                "End-to-End Latency per Request (s)",
                "Request Throughput (RPS)",
-                "Total Throughput (Input + Output) of Server (tokens/s)",
+                "Server Total Throughput (Input + Output) (tokens/s)",
            ]
        )
    else:
        headers.extend(
            [
-                "Output Inference Speed per Request (tokens/s)",
-                "Output Throughput of Server (tokens/s)",
+                "Per-Request Inference Speed (tokens/s)",
+                "Server Output Throughput (tokens/s)",
                "End-to-End Latency per Request (s)",
                "Request Throughput (RPS)",
-                "Total Throughput (Input + Output) of Server (tokens/s)",
+                "Server Total Throughput (Input + Output) (tokens/s)",
            ]
        )
genai_bench/analysis/plot_config.py

Lines changed: 12 additions & 12 deletions
@@ -138,20 +138,20 @@ class PlotConfigManager:
        "layout": {"rows": 2, "cols": 4, "figsize": [32, 12]},
        "plots": [
            {
-                "title": "Output Inference Speed per Request vs "
-                "Output Throughput of Server",
+                "title": "Per-Request Inference Speed vs "
+                "Server Output Throughput",
                "x_field": "mean_output_throughput_tokens_per_s",
                "y_field": "stats.output_inference_speed.mean",
-                "x_label": "Output Throughput of Server (tokens/s)",
-                "y_label": "Output Inference Speed per Request (tokens/s)",
+                "x_label": "Server Output Throughput (tokens/s)",
+                "y_label": "Per-Request Inference Speed (tokens/s)",
                "plot_type": "line",
                "position": [0, 0],
            },
            {
-                "title": "TTFT vs Output Throughput of Server",
+                "title": "TTFT vs Server Output Throughput",
                "x_field": "mean_output_throughput_tokens_per_s",
                "y_field": "stats.ttft.mean",
-                "x_label": "Output Throughput of Server (tokens/s)",
+                "x_label": "Server Output Throughput (tokens/s)",
                "y_label": "TTFT",
                "plot_type": "line",
                "position": [0, 1],

@@ -175,20 +175,20 @@ class PlotConfigManager:
                "position": [0, 3],
            },
            {
-                "title": "Output Inference Speed per Request vs "
-                "Total Throughput (Input + Output) of Server",
+                "title": "Per-Request Inference Speed vs "
+                "Server Total Throughput (Input + Output)",
                "x_field": "mean_total_tokens_throughput_tokens_per_s",
                "y_field": "stats.output_inference_speed.mean",
-                "x_label": "Total Throughput (Input + Output) of Server (tokens/s)",
-                "y_label": "Output Inference Speed per Request (tokens/s)",
+                "x_label": "Server Total Throughput (Input + Output) (tokens/s)",
+                "y_label": "Per-Request Inference Speed (tokens/s)",
                "plot_type": "line",
                "position": [1, 0],
            },
            {
-                "title": "TTFT vs Total Throughput (Input + Output) of Server",
+                "title": "TTFT vs Server Total Throughput (Input + Output)",
                "x_field": "mean_total_tokens_throughput_tokens_per_s",
                "y_field": "stats.ttft.mean",
-                "x_label": "Total Throughput (Input + Output) of Server (tokens/s)",
+                "x_label": "Server Total Throughput (Input + Output) (tokens/s)",
                "y_label": "TTFT",
                "plot_type": "line",
                "position": [1, 1],
