
Incompatibility with the vLLM Mistral Tokenizer #141

@gcalmettes

Description


The current version of lm-format-enforcer is not compatible with the Mistral tokenizer.

There is no known workaround, and vLLM disabled guided_json when the Mistral tokenizer is used (see this PR).

This prevents models like Pixtral, which require the MistralTokenizer, from being used with lm-format-enforcer.

Minimal example to reproduce the issue:

import vllm
from lmformatenforcer.integrations.vllm import build_vllm_token_enforcer_tokenizer_data

model_id = "mistral-community/pixtral-12b-240910"
llm = vllm.LLM(model=model_id, tokenizer_mode="mistral")
tokenizer_data = build_vllm_token_enforcer_tokenizer_data(llm)

Logs:

Traceback (most recent call last):
   File "<stdin>", line 1, in <module>
   File "/usr/local/lib/python3.10/dist-packages/lmformatenforcer/integrations/vllm.py", line 40, in build_vllm_token_enforcer_tokenizer_data
     return build_token_enforcer_tokenizer_data(tokenizer)
   File "/usr/local/lib/python3.10/dist-packages/lmformatenforcer/integrations/transformers.py", line 77, in build_token_enforcer_tokenizer_data
     regular_tokens = _build_regular_tokens_list(tokenizer)
   File "/usr/local/lib/python3.10/dist-packages/lmformatenforcer/integrations/transformers.py", line 57, in _build_regular_tokens_list
     token_0 = tokenizer.encode("0")[-1]
 TypeError: Tekkenizer.encode() missing 2 required positional arguments: 'bos' and 'eos'

The encode method of the MistralTokenizer (Tekkenizer) requires two additional positional arguments, bos and eos (both booleans), which lm-format-enforcer does not pass.
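One possible direction, sketched below with hypothetical names (this is not code from lm-format-enforcer or mistral_common): a thin adapter could wrap a Mistral-style tokenizer so that its encode(text, bos, eos) signature matches the encode(text) call lm-format-enforcer makes. A stub stands in for the real Tekkenizer so the sketch is self-contained.

```python
# Hypothetical adapter sketch: bridge a Mistral-style encode(text, bos, eos)
# signature to the HuggingFace-style encode(text) that
# lm-format-enforcer's _build_regular_tokens_list expects.

class EncodeSignatureAdapter:
    """Forward encode(text) to the wrapped tokenizer with bos/eos disabled."""

    def __init__(self, tokenizer):
        self._tokenizer = tokenizer

    def encode(self, text):
        # Pass the two required boolean flags positionally; both False so
        # no special tokens are prepended or appended.
        return self._tokenizer.encode(text, False, False)

    def __getattr__(self, name):
        # Delegate every other attribute to the wrapped tokenizer.
        return getattr(self._tokenizer, name)


# Stub standing in for mistral_common's Tekkenizer, only to illustrate
# the signature mismatch; the real class tokenizes very differently.
class StubTekkenizer:
    def encode(self, text, bos, eos):
        ids = [ord(c) for c in text]
        if bos:
            ids = [1] + ids
        if eos:
            ids = ids + [2]
        return ids


adapted = EncodeSignatureAdapter(StubTekkenizer())
# The call that currently raises TypeError now succeeds:
token_0 = adapted.encode("0")[-1]
print(token_0)
```

Whether an adapter like this is the right fix (versus lm-format-enforcer detecting the Mistral tokenizer and calling encode with the flags directly) is an open design question for the maintainers.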
