@Seven-Streams
Collaborator

At present, the FSM class can only build an FSM from a string. This PR adds a basic RegexToFSM function.
The following regex syntax is supported:

  • Strings. For instance, RegexToFSM("abc") builds an FSM that accepts "abc".
  • Character classes. For instance, RegexToFSM("[a-z]") builds an FSM that accepts any single character from 'a' to 'z'.
  • Repetition (*, +, ?, {m}, {m, n}).
  • Union (|).

Moreover, some basic optimization algorithms are implemented to help simplify the state machine:

  • ToDFA() (powerset construction).
  • MinimizeDFA() (Hopcroft's algorithm).
  • SimplifyTransition() (XGrammar node merging, type I).
  • SimplifyEpsilon() (XGrammar node merging, type II).
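
For readers unfamiliar with powerset construction, the technique named for ToDFA() can be sketched roughly as below. This is a standalone illustration and not the code in this PR: the `Nfa` and `Dfa` structs, `EpsilonClosure`, `PowersetConstruction`, `Accepts`, and `BuildExampleNfa` are all hypothetical names, and the example automaton is hand-built for the regex `(a|b)*c` rather than produced by RegexToFSM.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical epsilon-NFA for illustration only; not xgrammar's FSM class.
struct Nfa {
  static constexpr int kEpsilon = -1;  // label used for epsilon edges
  int start = 0;
  std::set<int> accepting;
  // edges[s] lists the (label, target) pairs leaving state s.
  std::vector<std::vector<std::pair<int, int>>> edges;
};

// Hypothetical DFA: next[s][ch] is the unique successor of s on ch.
struct Dfa {
  int start = 0;
  std::set<int> accepting;
  std::vector<std::map<int, int>> next;
};

// Expands `states` with every state reachable through epsilon edges alone.
std::set<int> EpsilonClosure(const Nfa& nfa, std::set<int> states) {
  std::vector<int> stack(states.begin(), states.end());
  while (!stack.empty()) {
    int s = stack.back();
    stack.pop_back();
    for (const auto& [label, target] : nfa.edges[s]) {
      if (label == Nfa::kEpsilon && states.insert(target).second) {
        stack.push_back(target);
      }
    }
  }
  return states;
}

// Powerset construction: each DFA state stands for a set of NFA states.
Dfa PowersetConstruction(const Nfa& nfa) {
  Dfa dfa;
  std::map<std::set<int>, int> ids;       // NFA state set -> DFA state id
  std::vector<std::set<int>> worklist;    // sets whose edges are not built yet
  auto intern = [&](const std::set<int>& subset) {
    auto [it, inserted] = ids.emplace(subset, static_cast<int>(ids.size()));
    if (inserted) {
      dfa.next.emplace_back();
      worklist.push_back(subset);
      for (int s : subset) {
        if (nfa.accepting.count(s) > 0) {
          dfa.accepting.insert(it->second);
          break;
        }
      }
    }
    return it->second;
  };
  dfa.start = intern(EpsilonClosure(nfa, {nfa.start}));
  while (!worklist.empty()) {
    std::set<int> subset = worklist.back();
    worklist.pop_back();
    int from = ids.at(subset);
    // Group the non-epsilon edges of all member states by label.
    std::map<int, std::set<int>> moves;
    for (int s : subset) {
      for (const auto& [label, target] : nfa.edges[s]) {
        if (label != Nfa::kEpsilon) moves[label].insert(target);
      }
    }
    for (const auto& [label, targets] : moves) {
      int to = intern(EpsilonClosure(nfa, targets));
      dfa.next[from][label] = to;
    }
  }
  return dfa;
}

// Runs the DFA over `input`; a missing transition rejects immediately.
bool Accepts(const Dfa& dfa, const std::string& input) {
  int state = dfa.start;
  for (unsigned char ch : input) {
    auto it = dfa.next[state].find(ch);
    if (it == dfa.next[state].end()) return false;
    state = it->second;
  }
  return dfa.accepting.count(state) > 0;
}

// Hand-built NFA for (a|b)*c: 0 -eps-> 1, 1 -a/b-> 1, 1 -eps-> 2, 2 -c-> 3.
Nfa BuildExampleNfa() {
  Nfa nfa;
  nfa.accepting = {3};
  nfa.edges = {{{Nfa::kEpsilon, 1}},
               {{'a', 1}, {'b', 1}, {Nfa::kEpsilon, 2}},
               {{'c', 3}},
               {}};
  return nfa;
}
```

Calling `PowersetConstruction(BuildExampleNfa())` and then `Accepts(dfa, "abac")` exercises the sketch end to end. The worklist only ever creates subsets that are reachable from the start state, which is why the result is usually far smaller than the full powerset. A minimization pass such as Hopcroft's algorithm would normally follow.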

@Ubospica Ubospica merged commit ce7a8d1 into mlc-ai:main May 4, 2025
38 checks passed
Seven-Streams added a commit to Seven-Streams/xgrammar that referenced this pull request May 5, 2025
Co-authored-by: Yixin Dong <[email protected]>
@Seven-Streams Seven-Streams deleted the 4.14/dev/fsm branch May 14, 2025 12:28
Ubospica added a commit to Ubospica/xgrammar that referenced this pull request May 29, 2025
* fix:fix ci.

* [Feature] Rewrite the FSM.h to support some regex grammar. (mlc-ai#302)

* tests:add some tests.

* Revert "tests:add some tests."

This reverts commit 11c2994.

* refac:refac the fsm_builder.

* fix:fix the not function.

* refac:refac the simplify functions.

* refac:refac the fsm.cc.

* fix:enable it to compile.

* fix:pass the basic test.

* fix:fix the fucking bug in concat.

* fix:pass all the basic tests.

* fix:fix the simplify.

* fix:fix the config.

---------

Co-authored-by: Yixin Dong <[email protected]>