[New Model]: nomic-embed-text-v2-moe #17785
Conversation
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels. Just a reminder: PRs do not trigger a full CI run by default. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.
NomicExpertMLP does not use GatedMLP, while FusedMoE currently only supports GatedMLP, so performance might be poor. I am not familiar with MoE; can you ask the relevant experts for help?
Can you also update the Supported Models page with these models?
cc @maxdebayser as well
The docs build errors may be due to the snowballstemmer version upgrade (https://pypi.org/project/snowballstemmer/#history). snowballstemmer==3.0.0 was published two hours ago, which matches the build error.
Yeah, you can ignore that for now.
I'm fine with merging this version first. We can leave the fused_moe optimization to a follow-up PR.
I spent some time tracing all the forward() calls, and there don't seem to be huge changes compared to Bert. But since Nomic has renamed layers and changed their positions a little bit, I can see that it's awkward to keep maintaining complicated weight mappings. So this is probably more maintainable. Thanks for contributing.
I don't want to add too much code to the classic BERT architecture.
I think that's reasonable. Adding a lot of new code makes the original logic hard to follow.
Do all of these models follow the same naming scheme for the layers?
They are almost the same in architecture, just with different layer names. In the code I use well-known layer names such as qkv_proj, gate_up_proj, down_proj, attn_ln, and mlp_ln instead of Wqkv, fc1, fc2, norm1, and norm2. After remapping the layers, they all produce correct results.
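To illustrate, the remapping amounts to something like the following hypothetical sketch (the mapping table and helper name are made up for this example and are not the exact code in the PR):

```python
# Hypothetical mapping from Nomic-style checkpoint names to the well-known
# internal layer names; the real table in the PR may differ.
NOMIC_TO_INTERNAL = {
    "attn.Wqkv": "attn.qkv_proj",
    "mlp.fc1": "mlp.gate_up_proj",
    "mlp.fc2": "mlp.down_proj",
    "norm1": "attn_ln",
    "norm2": "mlp_ln",
}

def remap_weight_name(name: str) -> str:
    """Rewrite a checkpoint parameter name to the internal naming scheme."""
    for old, new in NOMIC_TO_INTERNAL.items():
        if old in name:
            return name.replace(old, new)
    return name
```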
Ready for final review.
How much more work would it be to support the Alibaba version? It's fairly similar to nomic-embed in that it also applies RoPE and GLU to BERT, so I imagine it wouldn't be that much work to add as well? https://huggingface.co/Alibaba-NLP/new-impl
I will verify as soon as possible.
Thanks for reviewing.
Signed-off-by: Mu Huai <[email protected]>
mark
Signed-off-by: Yuqi Zhang <[email protected]>
Signed-off-by: minpeter <[email protected]>
Summary
Details
NomicBertModel
The new model uses the MoE architecture but is still named NomicBertModel.
I don't want to add too much code to the classic BERT architecture.
Context extension
If, during inference, seqlen > max_trained_positions, the model automatically switches from NomicBertRotaryEmbedding to NomicBertDynamicNTKRotaryEmbedding:
https://huggingface.co/Snowflake/snowflake-arctic-embed-m-long/blob/main/modeling_hf_nomic_bert.py#L639
https://huggingface.co/nomic-ai/nomic-bert-2048/blob/main/modeling_hf_nomic_bert.py#L1413
This automatic switch might lead to hard-to-detect bugs.
We ignore config.rotary_scaling_factor so that, for sequences shorter than max_trained_positions (2048), the results are consistent with SentenceTransformer.
Context extension instead uses vLLM-style rope_theta and rope_scaling.
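For reference, here is a minimal sketch of the dynamic-NTK idea, assuming the standard Hugging Face-style formula; the function and argument names are illustrative, not the code added in this PR:

```python
import torch

def dynamic_ntk_inv_freq(dim: int, base: float, seq_len: int,
                         max_trained_positions: int, scaling_factor: float) -> torch.Tensor:
    # When the sequence exceeds the trained context, grow the RoPE base
    # instead of applying a fixed rotary_scaling_factor.
    if seq_len > max_trained_positions:
        base = base * (
            (scaling_factor * seq_len / max_trained_positions) - (scaling_factor - 1)
        ) ** (dim / (dim - 2))
    return 1.0 / (base ** (torch.arange(0, dim, 2, dtype=torch.float32) / dim))
```

For sequences at or below max_trained_positions the base is untouched, which is why short inputs stay consistent with SentenceTransformer.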
NomicMoELayer
NomicExpertMLP does not use GatedMLP, while FusedMoE currently only supports GatedMLP, so performance might be poor.
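To make the mismatch concrete, here is a hedged sketch of the two MLP shapes; class names and sizes are illustrative, not the actual Nomic or vLLM code:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PlainExpertMLP(nn.Module):
    """Nomic-style expert: a plain two-layer MLP, y = W2 * act(W1 * x)."""
    def __init__(self, hidden: int, inter: int):
        super().__init__()
        self.fc1 = nn.Linear(hidden, inter)
        self.fc2 = nn.Linear(inter, hidden)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.fc2(F.gelu(self.fc1(x)))

class GatedMLP(nn.Module):
    """Gated shape expected by FusedMoE: y = W_down(act(W_gate x) * (W_up x))."""
    def __init__(self, hidden: int, inter: int):
        super().__init__()
        self.gate_proj = nn.Linear(hidden, inter)
        self.up_proj = nn.Linear(hidden, inter)
        self.down_proj = nn.Linear(inter, hidden)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.down_proj(F.silu(self.gate_proj(x)) * self.up_proj(x))
```

Because the plain two-layer experts lack the gate/up split that the fused kernel presumably expects, they cannot reuse FusedMoE directly, hence the performance concern.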
mteb_test_embed_models
Add mteb_test_embed_models, but only import it when needed.
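A hedged sketch of the deferred-import pattern this refers to (the test name is illustrative; the real helper lives in vLLM's test suite):

```python
import pytest

def test_nomic_embed_matches_reference():
    # Import mteb only when this test actually runs, so the dependency is
    # not required just to collect or import the rest of the test suite.
    mteb = pytest.importorskip("mteb")
    assert hasattr(mteb, "MTEB")
```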
Numerical stability
The models from nomic-ai require float32 to achieve relatively good numerical stability (<1e-4).
However, Snowflake/snowflake-arctic-embed-m-long also uses NomicBertModel (even the same code) and yields good results with float16.
nomic-ai/CodeRankEmbed also yields good results with float16.
Weird.
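For context, a rough sketch of how one might measure this kind of agreement against SentenceTransformer; the comparison code is an assumption, not the actual test in the repo, and the exact vLLM embedding API (task="embed", LLM.embed()) may vary by version:

```python
import numpy as np
from sentence_transformers import SentenceTransformer
from vllm import LLM

MODEL = "nomic-ai/nomic-embed-text-v2-moe"
texts = ["vLLM is a fast inference engine.", "Embeddings should match closely."]

# Reference embeddings from sentence-transformers.
st = SentenceTransformer(MODEL, trust_remote_code=True)
ref = st.encode(texts, normalize_embeddings=True)

# vLLM embeddings in float32 (assumes a recent vLLM with LLM.embed()).
llm = LLM(model=MODEL, task="embed", dtype="float32", trust_remote_code=True)
out = np.array([o.outputs.embedding for o in llm.embed(texts)])
out = out / np.linalg.norm(out, axis=1, keepdims=True)

print("max abs diff:", np.abs(ref - out).max())  # expected to stay below ~1e-4 in float32
```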
Tests involved
Partial_Fix #15849
Fix #12054
Fix #17949