
[Model] Refactoring of MiniCPM-V and add MiniCPM-o-2.6 support for vLLM #12069


Merged
118 commits merged on Jan 29, 2025

Commits
f78ad12
refactor for images
HwwwwwwwH Jan 14, 2025
95230b9
support image embedding for minicpmv
HwwwwwwwH Jan 15, 2025
42ffb1b
[Bugfix][SpecDecode] Adjust Eagle model architecture to align with in…
llsj14 Jan 11, 2025
43ff2e9
[Bugfix] fused_experts_impl wrong compute type for float32 (#11921)
shaochangxu Jan 11, 2025
0ec9974
[CI/Build] Move model-specific multi-modal processing tests (#11934)
DarkLight1337 Jan 11, 2025
b4a9094
[Doc] Basic guide for writing unit tests for new models (#11951)
DarkLight1337 Jan 11, 2025
ac29198
[Bugfix] Fix RobertaModel loading (#11940)
NickLucche Jan 11, 2025
286107f
[Model] Add cogagent model support vLLM (#11742)
sixsixcoder Jan 11, 2025
535e120
[V1] Avoid sending text prompt to core engine (#11963)
ywang96 Jan 12, 2025
925562b
[CI/Build] Add markdown linter (#11857)
rafvasq Jan 12, 2025
936b306
[Model] Initialize support for Deepseek-VL2 models (#11578)
Isotr0py Jan 12, 2025
141151f
[Hardware][CPU] Multi-LoRA implementation for the CPU backend (#11100)
Akshat-Tripathi Jan 12, 2025
eac7811
[Hardware][TPU] workaround fix for MoE on TPU (#11764)
avshalomman Jan 12, 2025
e251866
[V1][Core][1/n] Logging and Metrics (#11962)
robertgshaw2-redhat Jan 12, 2025
e46c06b
[Model] Support GGUF models newly added in `transformers` 4.46.0 (#9685)
Isotr0py Jan 13, 2025
d12c0de
[V1] [2/n] Logging and Metrics - `OutputProcessor` Abstraction (#11973)
robertgshaw2-redhat Jan 13, 2025
e459c90
[MISC] fix typo in kv transfer send recv test (#11983)
yyccli Jan 13, 2025
93a78ba
[Bug] Fix usage of `.transpose()` and `.view()` consecutively. (#11979)
liaoyanqing666 Jan 13, 2025
dd2f627
[CI][Spec Decode] fix: broken test for EAGLE model (#11972)
llsj14 Jan 13, 2025
570e067
[Misc] Fix Deepseek V2 fp8 kv-scale remapping (#11947)
Concurrensee Jan 13, 2025
eaccb74
[Misc]Minor Changes about Worker (#11555)
noemotiovon Jan 13, 2025
7adb4a0
[platform] add ray_device_key (#11948)
youkaichao Jan 13, 2025
a014ddd
Fix Max Token ID for Qwen-VL-Chat (#11980)
alex-jw-brooks Jan 13, 2025
cedf6cc
[Kernel] unified_attention for Attention.forward (#11967)
heheda12345 Jan 13, 2025
a1f053f
[Doc][V1] Update model implementation guide for V1 support (#11998)
ywang96 Jan 13, 2025
651ee49
[Doc] Organise installation documentation into categories and tabs (#…
hmellor Jan 13, 2025
adc0b54
[platform] add device_control env var (#12009)
youkaichao Jan 13, 2025
1fa0b25
[Platform] Move get_punica_wrapper() function to Platform (#11516)
shen-shanshan Jan 13, 2025
e55869e
bugfix: Fix signature mismatch in benchmark's `get_tokenizer` functio…
e1ijah1 Jan 13, 2025
7f2aa68
[Doc] Fix build from source and installation link in README.md (#12013)
Yikun Jan 13, 2025
a1f0814
[Bugfix] Fix deepseekv3 gate bias error (#12002)
SunflowerAries Jan 13, 2025
0ca468e
[Docs] Add Sky Computing Lab to project intro (#12019)
WoosukKwon Jan 14, 2025
6bec0d0
[HPU][Bugfix] set_forward_context and CI test execution (#12014)
kzawora-intel Jan 14, 2025
0badf14
[Doc] Update Quantization Hardware Support Documentation (#12025)
tjtanaa Jan 14, 2025
c6a5060
[HPU][misc] add comments for explanation (#12034)
youkaichao Jan 14, 2025
055a2b7
[Bugfix] Fix various bugs in multi-modal processor (#12031)
DarkLight1337 Jan 14, 2025
941a5d5
[Kernel] Revert the API change of Attention.forward (#12038)
heheda12345 Jan 14, 2025
3a05c49
[Platform] Add output for Attention Backend (#11981)
wangxiyuan Jan 14, 2025
87a687b
[Bugfix][Kernel] Give unique name to BlockSparseFlashAttention (#12040)
heheda12345 Jan 14, 2025
3183e6a
Explain where the engine args go when using Docker (#12041)
hmellor Jan 14, 2025
cc9cde5
[Doc]: Update the Json Example of the `Engine Arguments` document (#1…
maang-h Jan 14, 2025
58d45cd
[Misc] Merge bitsandbytes_stacked_params_mapping and packed_modules_…
jeejeelee Jan 14, 2025
bb13b8a
[Kernel] Support MulAndSilu (#11624)
jeejeelee Jan 15, 2025
1bba3f6
[HPU][Bugfix] Don't use /dev/accel/accel0 for HPU autodetection in se…
kzawora-intel Jan 15, 2025
ef22c6c
[Platform] move current_memory_usage() into platform (#11369)
shen-shanshan Jan 15, 2025
94adbff
[V1][BugFix] Fix edge case in VLM scheduling (#12065)
WoosukKwon Jan 15, 2025
654f5d7
[Misc] Add multipstep chunked-prefill support for FlashInfer (#10467)
elfiegg Jan 15, 2025
8146c68
[core] Turn off GPU communication overlap for Ray executor (#12051)
ruisearch42 Jan 15, 2025
59e5cf4
[core] platform agnostic executor via collective_rpc (#11256)
youkaichao Jan 15, 2025
920038b
merge main
HwwwwwwwH Jan 15, 2025
6f6d2eb
video embedding supports
HwwwwwwwH Jan 15, 2025
364bca1
update support for minicpmo on images and videos
HwwwwwwwH Jan 15, 2025
c2d8dbb
audio language
HwwwwwwwH Jan 22, 2025
1ba77eb
audio embedding inputs
HwwwwwwwH Jan 22, 2025
1c6f7d8
format
HwwwwwwwH Jan 22, 2025
26d40a5
merge main x
HwwwwwwwH Jan 23, 2025
24d9a80
merge main
HwwwwwwwH Jan 23, 2025
29774db
Merge branch 'main' of https://github.com/vllm-project/vllm into mini…
jeejeelee Jan 23, 2025
6c409c5
docs/server-chat-utils/tests for minicpmo
HwwwwwwwH Jan 23, 2025
ee2f7da
Merge branch 'minicpmv-refactor' of github.com:HwwwwwwwH/vllm into mi…
HwwwwwwwH Jan 23, 2025
42e7e78
Update docs/source/models/supported_models.md
HwwwwwwwH Jan 23, 2025
6c0a686
Update tests/models/decoder_only/vision_language/test_models.py
HwwwwwwwH Jan 23, 2025
c15228b
format
HwwwwwwwH Jan 23, 2025
c51026d
Merge branch 'minicpmv-refactor' of github.com:HwwwwwwwH/vllm into mi…
HwwwwwwwH Jan 23, 2025
ac26f59
split minicpmo in a separate file
HwwwwwwwH Jan 23, 2025
8b0cbf7
format
HwwwwwwwH Jan 23, 2025
428ae5a
Update vllm/model_executor/models/minicpmo.py
HwwwwwwwH Jan 23, 2025
edfac98
add hints
HwwwwwwwH Jan 24, 2025
4ed8b11
format
HwwwwwwwH Jan 24, 2025
b44085e
clean unnecessary logic of WhisperEncoder
HwwwwwwwH Jan 24, 2025
763c578
format
HwwwwwwwH Jan 24, 2025
cd68484
Update vllm/model_executor/models/minicpmo.py
HwwwwwwwH Jan 24, 2025
1e47208
add torchaudio for test
HwwwwwwwH Jan 24, 2025
781d1c3
add annotations
HwwwwwwwH Jan 24, 2025
f0b0270
format
HwwwwwwwH Jan 24, 2025
ed1dd9e
Merge remote-tracking branch 'upstream/main' into minicpmv-refactor
ywang96 Jan 25, 2025
6d5978a
enable MiniCPMV-MiniCPMO for cache
HwwwwwwwH Jan 26, 2025
3bb67f8
Merge branch 'minicpmv-refactor' of github.com:HwwwwwwwH/vllm into mi…
HwwwwwwwH Jan 26, 2025
25d86ce
add multimodal tests for minicpmv
HwwwwwwwH Jan 26, 2025
bec9a73
format
HwwwwwwwH Jan 26, 2025
2120dd6
custom_hf_runner for minicpmo
HwwwwwwwH Jan 26, 2025
0fd4347
Merge branch 'minicpmv-refactor' of github.com:HwwwwwwwH/vllm into mi…
HwwwwwwwH Jan 26, 2025
6d2f4e4
format
HwwwwwwwH Jan 26, 2025
fac61eb
pass all tests
HwwwwwwwH Jan 27, 2025
6037606
format / pass all tests
HwwwwwwwH Jan 27, 2025
b6f24f7
fix num_slices bug
HwwwwwwwH Jan 27, 2025
e439d3a
Merge branch 'minicpmv-refactor' of github.com:HwwwwwwwH/vllm into mi…
HwwwwwwwH Jan 27, 2025
0f67ac9
add examples
HwwwwwwwH Jan 27, 2025
eab479f
add examples and format tests
HwwwwwwwH Jan 27, 2025
05a0ef8
format
HwwwwwwwH Jan 27, 2025
6650450
Update tests/models/decoder_only/vision_language/vlm_utils/model_util…
HwwwwwwwH Jan 27, 2025
8f5b069
Update vllm/model_executor/models/minicpmv.py
HwwwwwwwH Jan 27, 2025
de0b55f
Update vllm/model_executor/models/minicpmv.py
HwwwwwwwH Jan 27, 2025
ad52859
Update vllm/model_executor/models/minicpmv.py
HwwwwwwwH Jan 27, 2025
c5b912d
Update vllm/model_executor/models/minicpmv.py
HwwwwwwwH Jan 27, 2025
00e9e5a
Update vllm/model_executor/models/minicpmo.py
HwwwwwwwH Jan 27, 2025
49ea11e
alphabet
HwwwwwwwH Jan 27, 2025
595c679
add annotations
HwwwwwwwH Jan 27, 2025
061596f
Merge branch 'minicpmv-refactor' of github.com:HwwwwwwwH/vllm into mi…
HwwwwwwwH Jan 27, 2025
26d4b2b
add torchaudio dependency
HwwwwwwwH Jan 27, 2025
5867171
format
HwwwwwwwH Jan 27, 2025
bed7843
torchaudio
HwwwwwwwH Jan 27, 2025
715bd9f
fix minicpmo_patch_hf_runner
HwwwwwwwH Jan 27, 2025
cf4788f
fix slice bug
HwwwwwwwH Jan 28, 2025
53c679e
Merge branch 'main' into minicpmv-refactor
HwwwwwwwH Jan 28, 2025
3127a6b
format
HwwwwwwwH Jan 28, 2025
290795b
test model register
HwwwwwwwH Jan 28, 2025
d9dedd7
delete minicpmv2.5 in test_common
HwwwwwwwH Jan 28, 2025
f6d5cfa
add dependencies of minicpmo audio tests
HwwwwwwwH Jan 28, 2025
da2ddd3
format
HwwwwwwwH Jan 28, 2025
4222899
add vocos in requirements_test.in
HwwwwwwwH Jan 28, 2025
26ebc7c
Merge branch 'minicpmv-refactor' of github.com:HwwwwwwwH/vllm into mi…
HwwwwwwwH Jan 28, 2025
2e93896
alphabet in example file and server
HwwwwwwwH Jan 28, 2025
0dfa513
Merge branch 'main' into minicpmv-refactor
HwwwwwwwH Jan 28, 2025
dadd030
Merge branch 'main' into minicpmv-refactor
DarkLight1337 Jan 28, 2025
f5a188a
merge main && fix conflict
HwwwwwwwH Jan 29, 2025
8216fd5
delete vocos in setup.py
HwwwwwwwH Jan 29, 2025
4cfd785
update docs
HwwwwwwwH Jan 29, 2025
7 changes: 7 additions & 0 deletions docs/source/models/supported_models.md
@@ -690,6 +690,13 @@ See [this page](#generative-models) for more information on how to use generative models.
-
- ✅︎
- ✅︎
* - `MiniCPMO`
- MiniCPM-o
- T + I<sup>E+</sup> + V<sup>E+</sup> + A<sup>E+</sup>
- `openbmb/MiniCPM-o-2_6`, etc.
- ✅︎
- ✅︎
-
* - `MiniCPMV`
- MiniCPM-V
- T + I<sup>E+</sup>
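For orientation, here is a minimal offline-inference sketch for the new `MiniCPMO` entry, assembled from the example scripts added in this PR. The image path is a placeholder, and the chat-template and stop-token details follow those examples rather than any official recipe:

```python
# Minimal sketch of MiniCPM-o-2.6 image inference with vLLM.
# "example.jpg" is a hypothetical local file.
from PIL import Image
from transformers import AutoTokenizer
from vllm import LLM, SamplingParams

model_name = "openbmb/MiniCPM-o-2_6"
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
llm = LLM(model=model_name, trust_remote_code=True, max_model_len=4096)

# MiniCPM-style image placeholder; the multi-modal processor expands it.
messages = [{
    "role": "user",
    "content": "(<image>./</image>)\nWhat is in this image?"
}]
prompt = tokenizer.apply_chat_template(messages,
                                       tokenize=False,
                                       add_generation_prompt=True)
stop_token_ids = [
    tokenizer.convert_tokens_to_ids(t) for t in ["<|im_end|>", "<|endoftext|>"]
]

image = Image.open("example.jpg")
outputs = llm.generate(
    {"prompt": prompt, "multi_modal_data": {"image": image}},
    SamplingParams(temperature=0.0, max_tokens=64,
                   stop_token_ids=stop_token_ids),
)
print(outputs[0].outputs[0].text)
```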
32 changes: 31 additions & 1 deletion examples/offline_inference/audio_language.py
@@ -67,7 +67,37 @@ def run_qwen2_audio(question: str, audio_count: int):
return llm, prompt, stop_token_ids


model_example_map = {"ultravox": run_ultravox, "qwen2_audio": run_qwen2_audio}
def run_minicpmo(question: str, audio_count: int):
model_name = "openbmb/MiniCPM-o-2_6"
tokenizer = AutoTokenizer.from_pretrained(model_name,
trust_remote_code=True)
llm = LLM(model=model_name,
trust_remote_code=True,
max_model_len=4096,
max_num_seqs=5,
limit_mm_per_prompt={"audio": audio_count})

stop_tokens = ['<|im_end|>', '<|endoftext|>']
stop_token_ids = [tokenizer.convert_tokens_to_ids(i) for i in stop_tokens]

audio_placeholder = "(<audio>./</audio>)" * audio_count
audio_chat_template = "{% for message in messages %}{{'<|im_start|>' + message['role'] + '\n' + message['content'] + '<|im_end|>' + '\n'}}{% endfor %}{% if add_generation_prompt %}{{ '<|im_start|>assistant\n<|spk_bos|><|spk|><|spk_eos|><|tts_bos|>' }}{% endif %}" # noqa: E501
messages = [{
'role': 'user',
'content': f'{audio_placeholder}\n{question}'
}]
prompt = tokenizer.apply_chat_template(messages,
tokenize=False,
add_generation_prompt=True,
chat_template=audio_chat_template)
return llm, prompt, stop_token_ids


model_example_map = {
"ultravox": run_ultravox,
"qwen2_audio": run_qwen2_audio,
"minicpmo": run_minicpmo
}


def main(args):
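As a usage note, `run_minicpmo` above only builds the `LLM`, prompt, and stop tokens. A hedged sketch of the generate call that would consume them: the `.wav` path is a placeholder, and vLLM expects audio as an `(array, sampling_rate)` tuple, which is exactly what `librosa.load` returns:

```python
import librosa
from vllm import SamplingParams

# Assumes run_minicpmo from the diff above; "sample.wav" is hypothetical.
llm, prompt, stop_token_ids = run_minicpmo("What is said in this audio?",
                                           audio_count=1)

audio_and_rate = librosa.load("sample.wav", sr=None)  # (np.ndarray, int)
outputs = llm.generate(
    {"prompt": prompt, "multi_modal_data": {"audio": [audio_and_rate]}},
    SamplingParams(temperature=0.0, max_tokens=64,
                   stop_token_ids=stop_token_ids),
)
print(outputs[0].outputs[0].text)
```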
33 changes: 28 additions & 5 deletions examples/offline_inference/vision_language.py
@@ -265,8 +265,9 @@ def run_mantis(question: str, modality: str):


# MiniCPM-V
def run_minicpmv(question: str, modality: str):
assert modality == "image"
def run_minicpmv_base(question: str, modality: str, model_name):
assert modality in ["image", "video"]
# If you want to use `MiniCPM-o-2_6` with audio inputs, check `audio_language.py` # noqa

# 2.0
# The official repo doesn't work yet, so we need to use a fork for now
@@ -277,7 +278,15 @@ def run_minicpmv(question: str, modality: str):
# model_name = "openbmb/MiniCPM-Llama3-V-2_5"

# 2.6
model_name = "openbmb/MiniCPM-V-2_6"
# model_name = "openbmb/MiniCPM-V-2_6"
# o2.6

# modality supports
# 2.0: image
# 2.5: image
# 2.6: image, video
# o2.6: image, video, audio
# model_name = "openbmb/MiniCPM-o-2_6"
tokenizer = AutoTokenizer.from_pretrained(model_name,
trust_remote_code=True)
llm = LLM(
@@ -294,20 +303,33 @@
# 2.5
# stop_token_ids = [tokenizer.eos_id, tokenizer.eot_id]

# 2.6
# 2.6 / o2.6
stop_tokens = ['<|im_end|>', '<|endoftext|>']
stop_token_ids = [tokenizer.convert_tokens_to_ids(i) for i in stop_tokens]

modality_placeholder = {
"image": "(<image>./</image>)",
"video": "(<video>./</video>)",
}

messages = [{
'role': 'user',
'content': f'(<image>./</image>)\n{question}'
'content': f'{modality_placeholder[modality]}\n{question}'
}]
prompt = tokenizer.apply_chat_template(messages,
tokenize=False,
add_generation_prompt=True)
return llm, prompt, stop_token_ids


def run_minicpmo(question: str, modality: str):
return run_minicpmv_base(question, modality, "openbmb/MiniCPM-o-2_6")


def run_minicpmv(question: str, modality: str):
return run_minicpmv_base(question, modality, "openbmb/MiniCPM-V-2_6")


# LLama 3.2
def run_mllama(question: str, modality: str):
assert modality == "image"
@@ -523,6 +545,7 @@ def run_qwen2_vl(question: str, modality: str):
"llava-next-video": run_llava_next_video,
"llava-onevision": run_llava_onevision,
"mantis": run_mantis,
"minicpmo": run_minicpmo,
"minicpmv": run_minicpmv,
"mllama": run_mllama,
"molmo": run_molmo,
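The refactor routes both variants through `run_minicpmv_base`, so video works the same way for `MiniCPM-V-2_6` and `MiniCPM-o-2_6`. A hedged sketch of a video call using vLLM's bundled demo asset (the asset name and frame count are illustrative):

```python
from vllm import SamplingParams
from vllm.assets.video import VideoAsset

llm, prompt, stop_token_ids = run_minicpmo("Describe this video.",
                                           modality="video")

# Frames as a numpy array, as in the surrounding example script.
video = VideoAsset(name="sample_demo_1.mp4", num_frames=16).np_ndarrays
outputs = llm.generate(
    {"prompt": prompt, "multi_modal_data": {"video": video}},
    SamplingParams(temperature=0.0, max_tokens=64,
                   stop_token_ids=stop_token_ids),
)
print(outputs[0].outputs[0].text)
```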
1 change: 1 addition & 0 deletions requirements-cpu.txt
@@ -4,5 +4,6 @@
# Dependencies for CPUs
torch==2.5.1+cpu; platform_machine != "ppc64le" and platform_machine != "aarch64" and platform_system != "Darwin"
torch==2.5.1; platform_machine == "aarch64" or platform_system == "Darwin"
torchaudio; platform_machine != "ppc64le" # required for the image processor of minicpm-o-2_6, this must be updated alongside torch
torchvision; platform_machine != "ppc64le" # required for the image processor of phi3v, this must be updated alongside torch
datasets # for benchmark scripts
1 change: 1 addition & 0 deletions requirements-cuda.txt
@@ -5,6 +5,7 @@
ray[default] >= 2.9
nvidia-ml-py >= 12.560.30 # for pynvml package
torch == 2.5.1
torchaudio==2.5.1
# These must be updated alongside torch
torchvision == 0.20.1 # Required for phi3v processor. See https://github.com/pytorch/vision?tab=readme-ov-file#installation for corresponding version
xformers == 0.0.28.post3; platform_system == 'Linux' and platform_machine == 'x86_64' # Requires PyTorch 2.5.1
3 changes: 3 additions & 0 deletions requirements-test.in
@@ -12,13 +12,16 @@ decord # required for video tests
einops # required for MPT, qwen-vl and Mamba
httpx
librosa # required for audio tests
vector_quantize_pytorch # required for minicpmo_26 test
vocos # required for minicpmo_26 test
peft
pqdm
ray[adag]==2.40.0
sentence-transformers # required for embedding tests
soundfile # required for audio tests
timm # required for internvl test
torch==2.5.1
torchaudio==2.5.1
transformers_stream_generator # required for qwen-vl test
matplotlib # required for qwen-vl test
mistral_common[opencv] >= 1.5.0 # required for pixtral test
37 changes: 35 additions & 2 deletions requirements-test.txt
@@ -106,9 +106,17 @@ dnspython==2.7.0
docutils==0.16
# via awscli
einops==0.8.0
# via -r requirements-test.in
# via
# -r requirements-test.in
# encodec
# vector-quantize-pytorch
# vocos
einx==0.3.0
# via vector-quantize-pytorch
email-validator==2.2.0
# via pydantic
encodec==0.1.1
# via vocos
evaluate==0.4.3
# via lm-eval
fastparquet==2024.11.0
@@ -125,6 +133,8 @@ filelock==3.16.1
# triton
fonttools==4.54.1
# via matplotlib
frozendict==2.4.6
# via einx
frozenlist==1.5.0
# via
# aiohttp
@@ -159,6 +169,7 @@ huggingface-hub==0.26.2
# timm
# tokenizers
# transformers
# vocos
idna==3.10
# via
# anyio
@@ -261,6 +272,8 @@ numpy==1.26.4
# cupy-cuda12x
# datasets
# decord
# einx
# encodec
# evaluate
# fastparquet
# genai-perf
@@ -283,6 +296,7 @@
# torchvision
# transformers
# tritonclient
# vocos
nvidia-cublas-cu12==12.4.5.8
# via
# nvidia-cudnn-cu12
@@ -455,6 +469,7 @@ pyyaml==6.0.2
# responses
# timm
# transformers
# vocos
ray[adag]==2.40.0
# via -r requirements-test.in
redis==5.2.0
@@ -517,6 +532,7 @@ scipy==1.13.1
# scikit-learn
# sentence-transformers
# statsmodels
# vocos
sentence-transformers==3.2.1
# via -r requirements-test.in
sentencepiece==0.2.0
@@ -540,7 +556,9 @@ sqlitedict==2.1.0
statsmodels==0.14.4
# via genai-perf
sympy==1.13.1
# via torch
# via
# einx
# torch
tabledata==1.3.3
# via pytablewriter
tabulate==0.9.0
@@ -568,12 +586,21 @@ torch==2.5.1
# -r requirements-test.in
# accelerate
# bitsandbytes
# encodec
# lm-eval
# peft
# sentence-transformers
# tensorizer
# timm
# torchaudio
# torchvision
# vector-quantize-pytorch
# vocos
torchaudio==2.5.1
# via
# -r requirements-test.in
# encodec
# vocos
torchvision==0.20.1
# via timm
tqdm==4.66.6
@@ -584,6 +611,7 @@
# lm-eval
# nltk
# peft
# pqdm
# sentence-transformers
# tqdm-multiprocess
# transformers
@@ -615,6 +643,7 @@ typing-extensions==4.12.2
# huggingface-hub
# librosa
# mistral-common
# pqdm
# pydantic
# pydantic-core
# torch
@@ -626,6 +655,10 @@ urllib3==2.2.3
# requests
# responses
# tritonclient
vector-quantize-pytorch==1.21.2
# via -r requirements-test.in
vocos==0.1.0
# via -r requirements-test.in
word2number==1.1
# via lm-eval
xxhash==3.5.0
3 changes: 2 additions & 1 deletion setup.py
@@ -656,7 +656,8 @@ def _read_requirements(filename: str) -> List[str]:
extras_require={
"tensorizer": ["tensorizer>=2.9.0"],
"runai": ["runai-model-streamer", "runai-model-streamer-s3", "boto3"],
"audio": ["librosa", "soundfile"], # Required for audio processing
"audio": ["librosa", "soundfile",
"vocos"], # Required for audio processing
"video": ["decord"] # Required for video processing
},
cmdclass=cmdclass,
14 changes: 14 additions & 0 deletions tests/models/decoder_only/vision_language/test_models.py
@@ -350,6 +350,20 @@
postprocess_inputs=model_utils.wrap_inputs_post_processor,
hf_output_post_proc=model_utils.minicpmv_trunc_hf_output,
),
"minicpmo_26": VLMTestInfo(
models=["openbmb/MiniCPM-o-2_6"],
test_type=(VLMTestType.IMAGE, VLMTestType.MULTI_IMAGE),
prompt_formatter=lambda img_prompt: f"<|begin_of_text|><|start_header_id|>user<|end_header_id|>\n\n{img_prompt}<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n", # noqa: E501
img_idx_to_prompt=lambda idx: "(<image>./</image>)\n",
max_model_len=4096,
max_num_seqs=2,
get_stop_token_ids=lambda tok: tok.convert_tokens_to_ids(['<|im_end|>', '<|endoftext|>']), # noqa: E501
postprocess_inputs=model_utils.ignore_inputs_post_processor(
"image_sizes"
),
hf_output_post_proc=model_utils.minicpmv_trunc_hf_output,
patch_hf_runner=model_utils.minicpmo_patch_hf_runner
),
"minicpmv_26": VLMTestInfo(
models=["openbmb/MiniCPM-V-2_6"],
test_type=(VLMTestType.IMAGE, VLMTestType.MULTI_IMAGE),
tests/models/decoder_only/vision_language/vlm_utils/model_utils.py
@@ -497,6 +497,17 @@ def _generate(self, *args, **kwargs):
return hf_model


def minicpmo_patch_hf_runner(hf_model: HfRunner) -> HfRunner:
orig_generate = hf_model.model.generate

def _generate(self, *args, **kwargs):
return orig_generate(*args, decode_text=False, **kwargs)

hf_model.model.generate = types.MethodType(_generate, hf_model.model)

return hf_model


def _generate_greedy_logprobs_limit(
self,
prompts: List[str],
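Context for the patch above: MiniCPM-o's remote-code `generate` decodes its output to text by default, while the shared comparison harness expects raw token sequences, so the wrapper pins `decode_text=False` (this reading of the remote-code signature is an assumption). For readers unfamiliar with the `types.MethodType` rebinding trick, a self-contained toy showing the same pattern; the `Reference` class is a stand-in, not the real HF model:

```python
import types


class Reference:
    """Stand-in for the HF model object; illustrative only."""

    def generate(self, *args, **kwargs):
        return {"decode_text": kwargs.get("decode_text", True)}


model = Reference()
orig_generate = model.generate  # bound method, captured before patching


def _generate(self, *args, **kwargs):
    # Force raw token output, exactly as minicpmo_patch_hf_runner does.
    return orig_generate(*args, decode_text=False, **kwargs)


# Rebind on the instance so only this object is affected.
model.generate = types.MethodType(_generate, model)
assert model.generate()["decode_text"] is False
```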
2 changes: 2 additions & 0 deletions tests/models/multimodal/processing/test_common.py
@@ -152,6 +152,8 @@ def _test_processing_correctness(
"llava-hf/llava-onevision-qwen2-0.5b-ov-hf",
"TIGER-Lab/Mantis-8B-siglip-llama3",
"mistral-community/pixtral-12b",
"openbmb/MiniCPM-o-2_6",
"openbmb/MiniCPM-V-2_6",
"Qwen/Qwen-VL-Chat",
"Qwen/Qwen2-VL-2B-Instruct",
"Qwen/Qwen2-Audio-7B-Instruct",
4 changes: 3 additions & 1 deletion tests/models/registry.py
@@ -245,7 +245,9 @@ def check_available_online(
"LlavaOnevisionForConditionalGeneration": _HfExamplesInfo("llava-hf/llava-onevision-qwen2-0.5b-ov-hf"), # noqa: E501
"MantisForConditionalGeneration": _HfExamplesInfo("TIGER-Lab/Mantis-8B-siglip-llama3", # noqa: E501
hf_overrides={"architectures": ["MantisForConditionalGeneration"]}), # noqa: E501
"MiniCPMV": _HfExamplesInfo("openbmb/MiniCPM-Llama3-V-2_5",
"MiniCPMO": _HfExamplesInfo("openbmb/MiniCPM-o-2_6",
trust_remote_code=True),
"MiniCPMV": _HfExamplesInfo("openbmb/MiniCPM-V-2_6",
trust_remote_code=True),
"MolmoForCausalLM": _HfExamplesInfo("allenai/Molmo-7B-D-0924",
trust_remote_code=True),
6 changes: 5 additions & 1 deletion vllm/entrypoints/chat_utils.py
@@ -392,7 +392,7 @@ def _placeholder_str(self, modality: ModalityStr,
if model_type == "phi3_v":
# Workaround since this token is not defined in the tokenizer
return f"<|image_{current_count}|>"
if model_type == "minicpmv":
if model_type in ("minicpmo", "minicpmv"):
return "(<image>./</image>)"
if model_type in ("blip-2", "chatglm", "fuyu", "paligemma",
"pixtral"):
@@ -424,10 +424,14 @@ def _placeholder_str(self, modality: ModalityStr,
if model_type == "qwen2_audio":
return (f"Audio {current_count}: "
f"<|audio_bos|><|AUDIO|><|audio_eos|>")
if model_type == "minicpmo":
return "(<audio>./</audio>)"
raise TypeError(f"Unknown model type: {model_type}")
elif modality == "video":
if model_type == "qwen2_vl":
return "<|vision_start|><|video_pad|><|vision_end|>"
if model_type in ("minicpmo", "minicpmv"):
return "(<video>./</video>)"
if model_type.startswith("llava"):
return self._cached_token_str(self._tokenizer,
hf_config.video_token_index)
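These `_placeholder_str` branches mean clients of the OpenAI-compatible server never type `(<image>./</image>)` or `(<audio>./</audio>)` themselves; vLLM injects the model-specific placeholder for each multi-modal content part. A hedged client-side sketch (the server URL, API key, and image URL are placeholders):

```python
from openai import OpenAI

# Assumes a local `vllm serve openbmb/MiniCPM-o-2_6` instance.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

resp = client.chat.completions.create(
    model="openbmb/MiniCPM-o-2_6",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "What is in this image?"},
            # vLLM resolves this part and prepends "(<image>./</image>)"
            # for the minicpmo/minicpmv model types.
            {"type": "image_url",
             "image_url": {"url": "https://example.com/cat.jpg"}},
        ],
    }],
    max_tokens=64,
)
print(resp.choices[0].message.content)
```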