Bind fill_next_token_bitmask against nb::ndarray #338
Conversation
Sorry, I just got time to review this. I have left some comments. It's generally good to support non-torch tensor types, like mlx, tensorflow, jax.array, mojo tensor, etc.
Currently there are still relatively few users using non-torch tensors, but I do hope more users can leverage XGrammar for structured generation on different platforms.
You're also very welcome to support Mojo. If you're interested, you can implement support for Mojo tensors in contrib/mojo.py.
@Ahajha I have made the changes. Please check it out and you're welcome to make comments!
Here's where I'm not sure. I have a few goals here that I should probably lay out, though admittedly, we (Modular) might be switching away from xgrammar for some of these.
The whole reason we're even considering switching away is the PyTorch dependency. We're in a sense a PT competitor, so having it as a dependency is weird. It's also a packaging nightmare: it's massive, and you have to opt in to the smaller CPU-only build if that's all you use (which is the case for us currently). So our options here are to either switch away, use xgrammar as a C++ library (in which case this PR doesn't really matter; we'd just make the requisite bindings changes in-house), or submit the changes needed to remove PT as a dependency here, and this is the only PT-touching function we actually use (I don't think we even use the kernels, AFAICT).
All that said, by annotating the type with `torch.Tensor`, we keep the PT dependency around. I think there's a way we could mark it as annotation-only, but we'd still need to do a bit of work to remove the dep from the rest of the file. At the end of the day, for this to work for us, the PT dependency would have to be dropped, and probably the only way forward is to push everything torch-related into an extra. Unfortunately there's no (current) way to specify a default extra, so that would break users (they'd have to require something like `xgrammar[torch]`). I think we're of the mindset that it's a good project, just not a good fit for us.
The other goal is just a personal one, which is moving more code to bindings; I'll respond to that comment on the issue shortly. The idea of using `ArrayLike` is that it's what nanobind autogenerates for ndarrays, so it would carry over nicely.
The code context under discussion in nanobind.cc:

```diff
 namespace nb = nanobind;
 using namespace xgrammar;

-namespace {
+namespace xgrammar {
```
This anonymous namespace was intentional. It probably doesn't make too much of a difference, but it marks any functions within it as compilation-unit-local, which could help the compiler optimize the code, and it can also produce warnings if anything inside it is unused. Since these functions are only used in this file, we can mark them as such.
Probably not a huge deal, but that was the motivation.
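For illustration only (this is a generic sketch, not the actual xgrammar source), the difference between the two forms looks like this:

```cpp
// Anonymous namespace: everything inside gets internal linkage, i.e. it is
// private to this translation unit. The compiler can warn if a helper is
// unused and is free to inline it and drop the symbol entirely.
namespace {
int CountSetBits(unsigned x) {
  int n = 0;
  for (; x != 0; x &= x - 1) ++n;
  return n;
}
}  // namespace

// Named namespace: the same helper has external linkage by default, so it is
// visible to other translation units (and could collide with symbols there)
// unless it is additionally marked `static`.
namespace xgrammar {
int CountSetBitsVisible(unsigned x) {
  int n = 0;
  for (; x != 0; x &= x - 1) ++n;
  return n;
}
}  // namespace xgrammar
```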
Yes, I understand the motivation. But since we have enabled LTO, the difference between the two could be minor, so either is fine, I think.
@Ahajha Thanks for the response! I think it's reasonable to gradually separate out the Torch-related code (for example, placing it in contrib/torch.py). But several issues arise when we want to maintain backward compatibility:
I'd like to hear your thoughts on these two issues. At the very least, moving the Torch-related code to contrib/torch.py might make it easier for you to fork the code and remove the torch dependency for your use case. As for this PR, I think we can change the annotation to `ArrayLike`. Can you also mention in the Python docstring that this parameter
@Ahajha I think we can update to remove `torch.Tensor` from the type annotation and merge this PR. I can do it tomorrow. If you have any further comments, feel free to share them!
@Ahajha Thank you for your contribution!
@Ubospica Sorry about the lack of response. Yes, that should be fine, thanks!
No worries. Thank you!
Nanobind has built-in support for the DLPack standard, which allows us to take anything adhering to the DLPack spec here rather than just PyTorch objects. See here: https://nanobind.readthedocs.io/en/latest/ndarray.html
This does add a dependency on numpy, but only for a single type, which does seem a little overkill. If I push on #233 a little bit, though, we should only need these definitions in the stub files, and they will be unneeded for the .py files. So ideally this is a temporary dependency.

My goal is to reduce the dependency on PyTorch a bit by making the code more general. I don't know if (or even think that) we can remove it entirely, but this seems worthwhile to do.
The annotation on the `nb::ndarray` class will actually check at runtime that the parameter has the desired properties (in this case, that it's on the CPU and holds int32_t), and will simply fail to call the function if not; a sketch of the idea follows the list below. This might be a slight breaking change in terms of which exception actually gets raised, but I think this is reasonable.

A few related changes:
- `const&` instead of `*`; didn't see a good reason for it.
- `nanobind.cc`, just to keep things hygienic.
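To make the `nb::ndarray` mechanism described above concrete, here is a minimal, self-contained sketch. The function and module names are made up for illustration, and the real `fill_next_token_bitmask` binding has a different signature; this only shows how the constrained `nb::ndarray` parameter behaves:

```cpp
#include <cstdint>

#include <nanobind/nanobind.h>
#include <nanobind/ndarray.h>

namespace nb = nanobind;

namespace {

// The nb::ndarray template arguments ask nanobind to verify, at call time,
// that the argument is an int32 tensor on the CPU with C-contiguous layout.
// Any DLPack-capable object (torch, numpy, jax, mlx, ...) that satisfies
// these constraints is accepted; anything else fails overload resolution
// and raises a TypeError on the Python side.
void FillBitmaskSketch(
    nb::ndarray<int32_t, nb::c_contig, nb::device::cpu> bitmask) {
  int32_t* data = bitmask.data();  // raw view into the caller-owned buffer
  for (size_t i = 0; i < bitmask.size(); ++i) {
    data[i] = ~int32_t{0};  // e.g. mark every token as allowed
  }
}

}  // namespace

NB_MODULE(bitmask_sketch_ext, m) {
  m.def("fill_bitmask_sketch", &FillBitmaskSketch, nb::arg("bitmask"));
}
```

A caller could then pass, for example, a CPU torch tensor or a numpy array of dtype int32; both go through the DLPack/buffer protocols nanobind understands, and the extension itself never imports torch.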