@Jialin Jialin commented Aug 9, 2025

Currently, the bench script is not runnable (`from xgrammar.kernels import apply_token_bitmask_inplace_kernels` fails: the symbol is not found).

Change

  • Update the script to make it runnable
  • Kick off multiple setups in a single run, so we can create a benchmark report in one shot
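For context, each cell in the report below is a mean latency over repeated kernel launches for one (batch, vocab, masked-count) setup. A minimal sketch of such a timing loop follows; the helper name `bench_us` and the pure-Python `time.perf_counter` timer are my illustration only, since the actual script would use GPU-side timing (CUDA events / triton.testing) to avoid measuring host overhead:

```python
import time

def bench_us(fn, warmup=5, iters=50):
    """Return the mean latency of fn() in microseconds.

    Hypothetical helper for illustration: warm up first so one-time
    costs (caches, JIT compilation) do not pollute the measurement,
    then average over many iterations.
    """
    for _ in range(warmup):  # warm-up runs, not timed
        fn()
    start = time.perf_counter()
    for _ in range(iters):
        fn()
    elapsed = time.perf_counter() - start
    return elapsed / iters * 1e6  # seconds -> microseconds

mean_us = bench_us(lambda: sum(range(1000)))
print(f"{mean_us:.2f} us")
```

Running every setup through one such loop in a single invocation is what lets the script emit the whole report table in one shot.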

Usage

(xgrammar) Fri Aug 08 22:18:25 [/data/users/jialino/gitrepos/xgrammar] python3 examples/benchmark/bench_apply_token_bitmask_inplace.py
Running cmake --build & --install in /data/users/jialino/gitrepos/xgrammar/build
ninja: no work to do.
-- Install configuration: "RelWithDebInfo"
-- Up-to-date: /home/jialino/uv_env/xgrammar/lib64/python3.12/site-packages/xgrammar/./xgrammar_bindings.cpython-312-x86_64-linux-gnu.so
W0808 22:18:51.578000 320509 torch/utils/cpp_extension.py:2425] TORCH_CUDA_ARCH_LIST is not set, all archs for visible cards are included for compilation. 
W0808 22:18:51.578000 320509 torch/utils/cpp_extension.py:2425] If this is not desired, please set os.environ['TORCH_CUDA_ARCH_LIST'] to specific architectures.
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 15/15 [01:13<00:00,  4.92s/it]
|   Batch |   Vocab |   Masked cnt |   Torch Compile |         Triton  |
|    size |    size |              |     Baseline us |    us (speedup) |
|--------:|--------:|-------------:|----------------:|----------------:|
|       1 |  128000 |            1 |            6.04 |    5.52 (1.09x) |
|       1 |  128000 |        64000 |            5.96 |    6.16 (0.97x) |
|       1 |  128000 |       127000 |            6.01 |    6.27 (0.96x) |
|       8 |  128000 |            1 |           10.90 |    6.04 (1.81x) |
|       8 |  128000 |        64000 |           10.90 |    7.76 (1.40x) |
|       8 |  128000 |       127000 |           10.91 |    8.02 (1.36x) |
|      64 |  128000 |            1 |           48.72 |   13.36 (3.65x) |
|      64 |  128000 |        64000 |           48.74 |   46.35 (1.05x) |
|      64 |  128000 |       127000 |           48.74 |   33.26 (1.47x) |
|     512 |  128000 |            1 |          350.11 |   67.43 (5.19x) |
|     512 |  128000 |        64000 |          347.57 |  330.76 (1.05x) |
|     512 |  128000 |       127000 |          345.73 |  250.06 (1.38x) |
|    4096 |  128000 |            1 |         2903.81 |  494.67 (5.87x) |
|    4096 |  128000 |        64000 |         2855.70 | 2516.79 (1.13x) |
|    4096 |  128000 |       127000 |         2720.98 | 1936.44 (1.41x) |
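As a semantic reference for what both the torch.compile baseline and the Triton kernel compute: the bitmask packs 32 vocabulary entries per int32, and a zero bit disables the corresponding token by setting its logit to -inf. A pure-Python sketch (the LSB-first bit layout and the function name `apply_token_bitmask_inplace_ref` are my assumptions for illustration, not code taken from xgrammar):

```python
import math

def apply_token_bitmask_inplace_ref(logits, bitmask):
    """Reference semantics of the benchmarked operation (sketch).

    bitmask packs 32 vocab entries per int32 word, LSB-first
    (assumed layout); a 0 bit masks the token by overwriting its
    logit with -inf in place.
    """
    for i in range(len(logits)):
        word = bitmask[i // 32]
        if (word >> (i % 32)) & 1 == 0:
            logits[i] = -math.inf
    return logits

logits = [0.1, 0.2, 0.3, 0.4]
# bit pattern 0b0101: tokens 0 and 2 allowed, tokens 1 and 3 masked
apply_token_bitmask_inplace_ref(logits, [0b0101])
```

The "Masked cnt" column above corresponds to how many vocabulary entries have their bit cleared, which affects how much memory the kernel actually touches.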

@Ubospica Ubospica left a comment

LGTM. Previously we were using

https://github.com/mlc-ai/xgrammar/blob/main/tests/python/test_token_bitmask_operations.py#L167

to test the kernels. With this script fixed, I think we can:

  • Use this script to test efficiency
  • Use test_token_bitmask_operations.py to test correctness
  • Remove test_apply_token_bitmask_inplace_kernel_large in test_token_bitmask_operations.py

@Ubospica Ubospica merged commit 591dff9 into mlc-ai:main Aug 10, 2025
8 checks passed
@Jialin Jialin deleted the bitmask_bench branch August 11, 2025 06:10