
[New Model]: nomic-embed-text-v2-moe #17785


Merged: 11 commits into vllm-project:main on May 11, 2025

Conversation

@noooop (Contributor) commented May 7, 2025

Summary

Details

NomicBertModel

The new model uses the MoE architecture but is still named NomicBertModel.

  • Create a new module bert_with_rope,
  • split NomicBertModel, GteModel, and JinaRobertaModel out of bert.py and roberta.py, and
  • revert all modifications to bert.py and roberta.py.

I don't want to add too much code to the classic BERT architecture.

Context extension

If, during inference, seqlen > max_trained_positions, the model automatically switches from NomicBertRotaryEmbedding to NomicBertDynamicNTKRotaryEmbedding.

https://huggingface.co/Snowflake/snowflake-arctic-embed-m-long/blob/main/modeling_hf_nomic_bert.py#L639
https://huggingface.co/nomic-ai/nomic-bert-2048/blob/main/modeling_hf_nomic_bert.py#L1413

        if seqlen > self.max_position_embeddings:
            base = self.base * (
                (self.rotary_scaling_factor * seqlen / self.max_position_embeddings) - (self.rotary_scaling_factor - 1)
            ) ** (self.dim / (self.dim - 2))
            inv_freq = self._compute_inv_freq(base=base, device=device)
            self.register_buffer("inv_freq", inv_freq, persistent=False)

This automatic switching might lead to hard-to-detect bugs.

We ignore config.rotary_scaling_factor so that, for sequences shorter than max_trained_positions (2048), the results are consistent with SentenceTransformer.

Context extension instead uses vLLM-style rope_theta and rope_scaling.
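
For illustration, a minimal sketch of what passing vLLM-style rope parameters could look like with the offline API. The hf_overrides argument and the exact rope_scaling keys are assumptions about the current vLLM interface, and the values are placeholders rather than recommended settings:

    from vllm import LLM

    # Sketch only: override the HF config with vLLM-style rope parameters
    # instead of relying on the automatic NTK switch in the upstream
    # modeling code. Keys and values here are illustrative placeholders.
    llm = LLM(
        model="nomic-ai/nomic-embed-text-v2-moe",
        task="embed",
        hf_overrides={
            "rope_theta": 10000,
            "rope_scaling": {"rope_type": "dynamic", "factor": 2.0},
        },
    )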

NomicMoELayer

NomicExpertMLP does not use GatedMLP, while FusedMoE currently only supports GatedMLP.

Performance might therefore be poor.
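
To make the mismatch concrete, here is a rough PyTorch sketch (illustrative names and shapes, not the vLLM implementation): FusedMoE assumes each expert has fused gate/up projections followed by a down projection, whereas the Nomic expert MLP is a plain two-layer MLP without gating.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class GatedExpertMLP(nn.Module):
        """The shape FusedMoE expects: fused gate/up projection, then down projection."""
        def __init__(self, hidden: int, intermediate: int):
            super().__init__()
            self.gate_up_proj = nn.Linear(hidden, 2 * intermediate, bias=False)
            self.down_proj = nn.Linear(intermediate, hidden, bias=False)

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            gate, up = self.gate_up_proj(x).chunk(2, dim=-1)
            return self.down_proj(F.silu(gate) * up)

    class PlainExpertMLP(nn.Module):
        """Roughly what NomicExpertMLP does: a plain two-layer MLP, no gating."""
        def __init__(self, hidden: int, intermediate: int):
            super().__init__()
            self.up_proj = nn.Linear(hidden, intermediate, bias=False)
            self.down_proj = nn.Linear(intermediate, hidden, bias=False)

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            return self.down_proj(F.gelu(self.up_proj(x)))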

mteb_test_embed_models

Add mteb_test_embed_models, but only import it when needed.
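
A minimal sketch of the lazy-import pattern (the module path, helper signature, and test names below are hypothetical): the mteb-backed helper is imported inside the test body so that environments without mteb can still collect the rest of the suite.

    import pytest

    MODELS = ["nomic-ai/nomic-embed-text-v2-moe"]

    @pytest.mark.parametrize("model", MODELS)
    def test_embed_models_mteb(vllm_runner, model):
        # Import only when the test actually runs, so mteb stays an
        # optional dependency for the rest of the test suite.
        from .mteb_utils import mteb_test_embed_models  # hypothetical module path

        mteb_test_embed_models(vllm_runner, model)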

numerical stability

The models from Nomic require float32 to achieve relatively good numerical stability (<1e-4).

float16 nomic-ai/nomic-embed-text-v1 0.7375691474332452 -0.0023990889416417582 0.0005159950361333374
float32 nomic-ai/nomic-embed-text-v1 0.7375691474332452 -2.484723391815713e-06 7.641653300452288e-06

float16 nomic-ai/nomic-embed-text-v1.5 0.7449963653818389 -0.007560234163759838 0.0011759593384633366
float32 nomic-ai/nomic-embed-text-v1.5 0.7449963653818389 -4.274332828013705e-06 7.1521680795431445e-06

float16 nomic-ai/nomic-embed-text-v2-moe 0.7154891254186093 -0.001148407179619948 0.00026305757800751525
float32 nomic-ai/nomic-embed-text-v2-moe 0.7154891254186093 -2.480245292479921e-07 6.559447973608627e-06

However, Snowflake/snowflake-arctic-embed-m-long also uses NomicBertModel (even the same code), yet float16 still yields good results.

float16 Snowflake/snowflake-arctic-embed-m-long 0.6811445157066163 3.396798716037708e-05 1.224356222837439e-05
float32 Snowflake/snowflake-arctic-embed-m-long 0.6811445157066163 -2.0093559980338682e-06 1.1042117285383271e-05

nomic-ai/CodeRankEmbed also yields good results with float16.

float16 nomic-ai/CodeRankEmbed 0.6738279162360256 -2.992811111390825e-05 1.1006315084661332e-05
float32 nomic-ai/CodeRankEmbed 0.6738279162360256 -5.298823933408414e-06 9.233419183517363e-06

Weird.
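
For anyone reproducing these numbers, a minimal sketch of pinning the dtype with vLLM's offline API (parameter and method names are assumptions about the current interface, not code from this PR):

    from vllm import LLM

    # The Nomic models need float32 for <1e-4 agreement with SentenceTransformer;
    # snowflake-arctic-embed-m-long and CodeRankEmbed are fine in float16.
    llm = LLM(
        model="nomic-ai/nomic-embed-text-v2-moe",
        task="embed",
        dtype="float32",
    )
    outputs = llm.embed(["vLLM is a high-throughput inference engine."])
    print(len(outputs[0].outputs.embedding))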

Tests involved

pytest tests/models/language/pooling/test_nomic.py
pytest tests/models/language/pooling/test_snowflake_arctic_embed.py
pytest tests/models/language/pooling/test_jina.py

Partially fixes #15849
Fixes #12054
Fixes #17949

github-actions bot commented May 7, 2025

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, a small and essential subset of tests that quickly catches errors. You can run additional CI tests on top of those by going to your fastcheck build in the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either add the ready label to the PR or enable auto-merge.

🚀

@noooop (Contributor, Author) commented May 7, 2025

@DarkLight1337

NomicExpertMLP does not use GatedMLP, while FusedMoE currently only supports GatedMLP.

Performance might therefore be poor.

I am not familiar with MoE. Can you ask the relevant experts for help?

@DarkLight1337 (Member)

cc @mgoin @Isotr0py

@noooop marked this pull request as ready for review May 8, 2025 03:50
@DarkLight1337 (Member)

Can you also update the Supported Models page with these models?

@DarkLight1337 (Member)

cc @maxdebayser as well

mergify bot added the documentation (Improvements or additions to documentation) label May 8, 2025
@noooop (Contributor, Author) commented May 8, 2025

@Isotr0py @mgoin

NomicExpertMLP does not use GatedMLP, while FusedMoE currently only supports GatedMLP.

Performance might therefore be poor.

Please assess how much effort is needed to make FusedMoE compatible with NomicExpertMLP.

If it will take significant effort, can I merge the current version first?

@noooop (Contributor, Author) commented May 8, 2025

@DarkLight1337

The docs build errors may be due to a snowballstemmer version upgrade.

https://pypi.org/project/snowballstemmer/#history

snowballstemmer==3.0.0 was posted two hours ago.

Correct build:
https://app.readthedocs.org/api/v2/build/28103004.txt
Collecting snowballstemmer>=2.2 (from sphinx)
Downloading snowballstemmer-2.2.0-py2.py3-none-any.whl.metadata (6.5 kB)

Errored build:
https://app.readthedocs.org/api/v2/build/28104027.txt
Collecting snowballstemmer>=2.2 (from sphinx)
Downloading snowballstemmer-3.0.0-py3-none-any.whl.metadata (7.7 kB)

@DarkLight1337 (Member)

Yeah, you can ignore that for now

@Isotr0py (Member) commented May 8, 2025

can I merge the current version in first?

I'm fine with having this version merged first. We can leave the fused_moe optimization for a follow-up PR.

@maxdebayser (Contributor) left a comment


I spent some time tracing all the forward() calls, and there don't seem to be huge changes compared to BERT. But since Nomic has renamed layers and changed their positions a little, I can see that it's awkward to keep maintaining complicated weight mappings, so this is probably more maintainable. Thanks for contributing.

@noooop (Contributor, Author) commented May 10, 2025

@DarkLight1337 @maxdebayser

How about:

  • Create a new module bert_with_rope,
  • split NomicBertModel, GteModel, and JinaRobertaModel out of bert.py and roberta.py, and
  • revert all modifications to bert.py and roberta.py.

I don't want to add too much code to the classic BERT architecture.

@maxdebayser (Contributor)

I don't want to add too much code to the classic BERT architecture.

I think that's reasonable. Adding a lot of new code makes the original logic hard to follow.

split NomicBertModel, GteModel, and JinaRobertaModel out of bert.py and roberta.py,

Do all of these models follow the same naming scheme for the layers?

@noooop (Contributor, Author) commented May 10, 2025

Do all of these models follow the same naming scheme for the layers?

They are almost identical in architecture, just with different layer names.

I use well-known layer names like qkv_proj, gate_up_proj, down_proj, attn_ln, and mlp_ln in the code instead of Wqkv, fc1, fc2, norm1, and norm2.

By remapping the layer names, they all produce the correct results.
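
As an illustration (the pairs below are inferred from the names above; the actual table in the PR may differ per model), the remapping boils down to a simple substitution applied to checkpoint weight names while loading:

    # Illustrative only: original checkpoint names -> well-known names
    # used in bert_with_rope.
    NAME_MAPPING = {
        "Wqkv": "qkv_proj",
        "fc1": "gate_up_proj",
        "fc2": "down_proj",
        "norm1": "attn_ln",
        "norm2": "mlp_ln",
    }

    def remap_weight_name(name: str) -> str:
        for old, new in NAME_MAPPING.items():
            if old in name:
                name = name.replace(old, new)
        return name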

@noooop (Contributor, Author) commented May 11, 2025

@DarkLight1337

Ready for final review.

@DarkLight1337 enabled auto-merge (squash) May 11, 2025 04:18
github-actions bot added the ready (ONLY add when PR is ready to merge/full CI is needed) label May 11, 2025
@bhaktatejas922 commented May 11, 2025

How much more work would it be to support the Alibaba version? It's fairly similar to nomic-embed in that it also applies RoPE and GLU to BERT, so I imagine it wouldn't be much work to add.

https://huggingface.co/Alibaba-NLP/new-impl
Alibaba-NLP/new-impl--modeling.NewModel

@noooop (Contributor, Author) commented May 11, 2025

How much more work would it be to support the Alibaba version? It's fairly similar to nomic-embed in that it also applies RoPE and GLU to BERT, so I imagine it wouldn't be much work to add.

https://huggingface.co/Alibaba-NLP/new-impl Alibaba-NLP/new-impl--modeling.NewModel

I will verify as soon as possible.

vllm-bot merged commit e4b8713 into vllm-project:main May 11, 2025
59 of 62 checks passed
@noooop (Contributor, Author) commented May 11, 2025

Thanks for reviewing.

RichardoMrMu pushed a commit to RichardoMrMu/vllm that referenced this pull request May 12, 2025
mawong-amd pushed a commit to ROCm/vllm that referenced this pull request May 14, 2025
@jiqiujia

mark

zzzyq pushed a commit to zzzyq/vllm that referenced this pull request May 24, 2025
minpeter pushed a commit to minpeter/vllm that referenced this pull request Jun 24, 2025
@noooop deleted the nomic branch July 10, 2025 04:46
Labels
documentation (Improvements or additions to documentation); ready (ONLY add when PR is ready to merge/full CI is needed)

Linked issues
[Bug]: Embedding model RoPE scaling factor always None
[New Model]: nomic-ai/nomic-embed-text-v1

7 participants