The current version of lm-format-enforcer does not seem compatible with the Mistral tokenizer. There is no known workaround, and vLLM disabled guided_json when the Mistral tokenizer is used (see this PR). This prevents models like Pixtral, which require the MistralTokenizer, from being used with lm-format-enforcer.
Minimal example to reproduce the issue:
```python
import vllm
from lmformatenforcer.integrations.vllm import build_vllm_token_enforcer_tokenizer_data

model_id = "mistral-community/pixtral-12b-240910"
# Pixtral requires the Mistral tokenizer
llm = vllm.LLM(model=model_id, tokenizer_mode="mistral")
# Raises TypeError (see logs below)
tokenizer_data = build_vllm_token_enforcer_tokenizer_data(llm)
```
Logs:
```
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.10/dist-packages/lmformatenforcer/integrations/vllm.py", line 40, in build_vllm_token_enforcer_tokenizer_data
    return build_token_enforcer_tokenizer_data(tokenizer)
  File "/usr/local/lib/python3.10/dist-packages/lmformatenforcer/integrations/transformers.py", line 77, in build_token_enforcer_tokenizer_data
    regular_tokens = _build_regular_tokens_list(tokenizer)
  File "/usr/local/lib/python3.10/dist-packages/lmformatenforcer/integrations/transformers.py", line 57, in _build_regular_tokens_list
    token_0 = tokenizer.encode("0")[-1]
TypeError: Tekkenizer.encode() missing 2 required positional arguments: 'bos' and 'eos'
```
The `encode` method of the MistralTokenizer requires two additional `bool` arguments, `bos` and `eos` (see the linked code).
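For what it's worth, one possible direction for a workaround is a thin wrapper that adapts the Mistral `encode` signature to the single-argument form lm-format-enforcer calls. The sketch below is illustrative only: `MistralEncodeAdapter` is a hypothetical name, and it assumes the Tekkenizer accepts `bos`/`eos` as keyword arguments.

```python
class MistralEncodeAdapter:
    """Hypothetical shim: adapts Tekkenizer.encode(s, bos, eos) to the
    single-argument encode(s) that lm-format-enforcer calls."""

    def __init__(self, tokenizer):
        self._tokenizer = tokenizer

    def encode(self, text: str) -> list:
        # Assumption: the Tekkenizer accepts bos/eos as keyword arguments.
        # Disabling both makes encode("0") behave like the HF-style call
        # in _build_regular_tokens_list.
        return self._tokenizer.encode(text, bos=False, eos=False)

    def __getattr__(self, name):
        # Delegate all other attributes to the wrapped tokenizer.
        return getattr(self._tokenizer, name)
```

Even with such a shim, other parts of the transformers integration may still assume an HF-style tokenizer interface, so this would not necessarily be a complete fix.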