@Jialin Jialin commented Aug 9, 2025

Currently, the bench script is not runnable (`from xgrammar.kernels import apply_token_bitmask_inplace_kernels` fails: the symbol is not found).

Change

  • Update the script to make it runnable
  • Kick off multiple setups in a single run, so we can create a benchmark report in one shot
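For context, each cell in the report below is a mean latency over repeated kernel launches for one (batch, vocab, masked-count) setup. A minimal sketch of such a timing loop follows; the helper name `bench_us` and the pure-Python `time.perf_counter` timer are my illustration only, since the actual script would use GPU-side timing (CUDA events / triton.testing) to avoid measuring host overhead:

```python
import time

def bench_us(fn, warmup=5, iters=50):
    """Return the mean latency of fn() in microseconds.

    Hypothetical helper for illustration: warm up first so one-time
    costs (caches, JIT compilation) do not pollute the measurement,
    then average over many iterations.
    """
    for _ in range(warmup):  # warm-up runs, not timed
        fn()
    start = time.perf_counter()
    for _ in range(iters):
        fn()
    elapsed = time.perf_counter() - start
    return elapsed / iters * 1e6  # seconds -> microseconds

mean_us = bench_us(lambda: sum(range(1000)))
print(f"{mean_us:.2f} us")
```

Running every setup through one such loop in a single invocation is what lets the script emit the whole report table in one shot.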

Usage

(xgrammar) Fri Aug 08 22:18:25 [/data/users/jialino/gitrepos/xgrammar] python3 examples/benchmark/bench_apply_token_bitmask_inplace.py
Running cmake --build & --install in /data/users/jialino/gitrepos/xgrammar/build
ninja: no work to do.
-- Install configuration: "RelWithDebInfo"
-- Up-to-date: /home/jialino/uv_env/xgrammar/lib64/python3.12/site-packages/xgrammar/./xgrammar_bindings.cpython-312-x86_64-linux-gnu.so
W0808 22:18:51.578000 320509 torch/utils/cpp_extension.py:2425] TORCH_CUDA_ARCH_LIST is not set, all archs for visible cards are included for compilation. 
W0808 22:18:51.578000 320509 torch/utils/cpp_extension.py:2425] If this is not desired, please set os.environ['TORCH_CUDA_ARCH_LIST'] to specific architectures.
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 15/15 [01:13<00:00,  4.92s/it]
|   Batch |   Vocab |   Masked cnt |   Torch Compile |         Triton  |
|    size |    size |              |     Baseline us |    us (speedup) |
|--------:|--------:|-------------:|----------------:|----------------:|
|       1 |  128000 |            1 |            6.04 |    5.52 (1.09x) |
|       1 |  128000 |        64000 |            5.96 |    6.16 (0.97x) |
|       1 |  128000 |       127000 |            6.01 |    6.27 (0.96x) |
|       8 |  128000 |            1 |           10.90 |    6.04 (1.81x) |
|       8 |  128000 |        64000 |           10.90 |    7.76 (1.40x) |
|       8 |  128000 |       127000 |           10.91 |    8.02 (1.36x) |
|      64 |  128000 |            1 |           48.72 |   13.36 (3.65x) |
|      64 |  128000 |        64000 |           48.74 |   46.35 (1.05x) |
|      64 |  128000 |       127000 |           48.74 |   33.26 (1.47x) |
|     512 |  128000 |            1 |          350.11 |   67.43 (5.19x) |
|     512 |  128000 |        64000 |          347.57 |  330.76 (1.05x) |
|     512 |  128000 |       127000 |          345.73 |  250.06 (1.38x) |
|    4096 |  128000 |            1 |         2903.81 |  494.67 (5.87x) |
|    4096 |  128000 |        64000 |         2855.70 | 2516.79 (1.13x) |
|    4096 |  128000 |       127000 |         2720.98 | 1936.44 (1.41x) |
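As a semantic reference for what both the torch.compile baseline and the Triton kernel compute: the bitmask packs 32 vocabulary entries per int32, and a zero bit disables the corresponding token by setting its logit to -inf. A pure-Python sketch (the LSB-first bit layout and the function name `apply_token_bitmask_inplace_ref` are my assumptions for illustration, not code taken from xgrammar):

```python
import math

def apply_token_bitmask_inplace_ref(logits, bitmask):
    """Reference semantics of the benchmarked operation (sketch).

    bitmask packs 32 vocab entries per int32 word, LSB-first
    (assumed layout); a 0 bit masks the token by overwriting its
    logit with -inf in place.
    """
    for i in range(len(logits)):
        word = bitmask[i // 32]
        if (word >> (i % 32)) & 1 == 0:
            logits[i] = -math.inf
    return logits

logits = [0.1, 0.2, 0.3, 0.4]
# bit pattern 0b0101: tokens 0 and 2 allowed, tokens 1 and 3 masked
apply_token_bitmask_inplace_ref(logits, [0b0101])
```

The "Masked cnt" column above corresponds to how many vocabulary entries have their bit cleared, which affects how much memory the kernel actually touches.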

@Ubospica Ubospica left a comment

LGTM. Previously we were using

https://github.com/mlc-ai/xgrammar/blob/main/tests/python/test_token_bitmask_operations.py#L167

to test the kernels. With this script fixed, I think we can:

  • Use this script to test efficiency
  • Use test_token_bitmask_operations.py to test correctness
  • Remove test_apply_token_bitmask_inplace_kernel_large in test_token_bitmask_operations.py

@Ubospica Ubospica merged commit 591dff9 into mlc-ai:main Aug 10, 2025
8 checks passed
@Jialin Jialin deleted the bitmask_bench branch August 11, 2025 06:10