[New Model]: nomic-embed-text-v2-moe #17785
Conversation
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels. Just a reminder: PRs do not trigger a full CI run by default. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.
NomicExpertMLP does not use GatedMLP, while FusedMoE currently only supports GatedMLP, so performance might be poor. I am not familiar with MoE; can you ask the relevant experts for help?
Can you also update the Supported Models page with these models?
cc @maxdebayser as well
The docs build errors may be due to the snowballstemmer version upgrade (https://pypi.org/project/snowballstemmer/#history). snowballstemmer==3.0.0 was published two hours ago, which matches the build error.
Yeah, you can ignore that for now.
I'm fine with merging this version first. We can leave the fused_moe optimization to a follow-up PR.
I spent some time tracing all the forward() calls, and there don't seem to be huge changes compared to Bert. But since Nomic has renamed layers and changed their positions a little bit, I can see that it's awkward to keep maintaining complicated weight mappings. So this is probably more maintainable. Thanks for contributing.
I don't want to add too much code to the classic BERT architecture.
I think that's reasonable. Adding a lot of new code makes the original logic hard to follow.
Do all of these models follow the same naming scheme for the layers?
They are almost the same in architecture, just with different layer names. In the code I use well-known layer names such as qkv_proj, gate_up_proj, down_proj, attn_ln, and mlp_ln instead of Wqkv, fc1, fc2, norm1, and norm2. After remapping the layers, they all produce correct results.
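To illustrate, the remapping amounts to something like the following hypothetical sketch (the mapping table and helper name are made up for this example and are not the exact code in the PR):

```python
# Hypothetical mapping from Nomic-style checkpoint names to the well-known
# internal layer names; the real table in the PR may differ.
NOMIC_TO_INTERNAL = {
    "attn.Wqkv": "attn.qkv_proj",
    "mlp.fc1": "mlp.gate_up_proj",
    "mlp.fc2": "mlp.down_proj",
    "norm1": "attn_ln",
    "norm2": "mlp_ln",
}

def remap_weight_name(name: str) -> str:
    """Rewrite a checkpoint parameter name to the internal naming scheme."""
    for old, new in NOMIC_TO_INTERNAL.items():
        if old in name:
            return name.replace(old, new)
    return name
```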
Ready for final review.
How much more work would it be to support the Alibaba version? It's fairly similar to nomic-embed in that it also applies RoPE and GLU to BERT, so I imagine it wouldn't be that much work to add as well? https://huggingface.co/Alibaba-NLP/new-impl
I will verify as soon as possible.
Thanks for reviewing.
Signed-off-by: Mu Huai <[email protected]>
mark
Signed-off-by: Yuqi Zhang <[email protected]>
Signed-off-by: minpeter <[email protected]>
Summary
Details
NomicBertModel
The new model uses the MoE architecture but is still named NomicBertModel.
I don't want to add too much code to the classic BERT architecture.
Context extension
If, during inference, seqlen > max_trained_positions, the model automatically switches from NomicBertRotaryEmbedding to NomicBertDynamicNTKRotaryEmbedding:
https://huggingface.co/Snowflake/snowflake-arctic-embed-m-long/blob/main/modeling_hf_nomic_bert.py#L639
https://huggingface.co/nomic-ai/nomic-bert-2048/blob/main/modeling_hf_nomic_bert.py#L1413
This automatic switch might lead to hard-to-detect bugs.
We ignore config.rotary_scaling_factor so that, for sequences shorter than max_trained_positions (2048), the results are consistent with SentenceTransformer.
Context extension instead uses vLLM-style rope_theta and rope_scaling.
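For reference, here is a minimal sketch of the dynamic-NTK idea, assuming the standard Hugging Face-style formula; the function and argument names are illustrative, not the code added in this PR:

```python
import torch

def dynamic_ntk_inv_freq(dim: int, base: float, seq_len: int,
                         max_trained_positions: int, scaling_factor: float) -> torch.Tensor:
    # When the sequence exceeds the trained context, grow the RoPE base
    # instead of applying a fixed rotary_scaling_factor.
    if seq_len > max_trained_positions:
        base = base * (
            (scaling_factor * seq_len / max_trained_positions) - (scaling_factor - 1)
        ) ** (dim / (dim - 2))
    return 1.0 / (base ** (torch.arange(0, dim, 2, dtype=torch.float32) / dim))
```

For sequences at or below max_trained_positions the base is untouched, which is why short inputs stay consistent with SentenceTransformer.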
NomicMoELayer
NomicExpertMLP does not use GatedMLP, while FusedMoE currently only supports GatedMLP, so performance might be poor.
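To make the mismatch concrete, here is a hedged sketch of the two MLP shapes; class names and sizes are illustrative, not the actual Nomic or vLLM code:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PlainExpertMLP(nn.Module):
    """Nomic-style expert: a plain two-layer MLP, y = W2 * act(W1 * x)."""
    def __init__(self, hidden: int, inter: int):
        super().__init__()
        self.fc1 = nn.Linear(hidden, inter)
        self.fc2 = nn.Linear(inter, hidden)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.fc2(F.gelu(self.fc1(x)))

class GatedMLP(nn.Module):
    """Gated shape expected by FusedMoE: y = W_down(act(W_gate x) * (W_up x))."""
    def __init__(self, hidden: int, inter: int):
        super().__init__()
        self.gate_proj = nn.Linear(hidden, inter)
        self.up_proj = nn.Linear(hidden, inter)
        self.down_proj = nn.Linear(inter, hidden)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.down_proj(F.silu(self.gate_proj(x)) * self.up_proj(x))
```

Because the plain two-layer experts lack the gate/up split that the fused kernel presumably expects, they cannot reuse FusedMoE directly, hence the performance concern.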
mteb_test_embed_models
Add mteb_test_embed_models, but only import it when needed.
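A hedged sketch of the deferred-import pattern this refers to (the test name is illustrative; the real helper lives in vLLM's test suite):

```python
import pytest

def test_nomic_embed_matches_reference():
    # Import mteb only when this test actually runs, so the dependency is
    # not required just to collect or import the rest of the test suite.
    mteb = pytest.importorskip("mteb")
    assert hasattr(mteb, "MTEB")
```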
Numerical stability
The models from nomic-ai require float32 to achieve relatively good numerical stability (<1e-4).
However, Snowflake/snowflake-arctic-embed-m-long also uses NomicBertModel (even the same code) and yields good results with float16.
nomic-ai/CodeRankEmbed also yields good results with float16.
Weird.
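For context, a rough sketch of how one might measure this kind of agreement against SentenceTransformer; the comparison code is an assumption, not the actual test in the repo, and the exact vLLM embedding API (task="embed", LLM.embed()) may vary by version:

```python
import numpy as np
from sentence_transformers import SentenceTransformer
from vllm import LLM

MODEL = "nomic-ai/nomic-embed-text-v2-moe"
texts = ["vLLM is a fast inference engine.", "Embeddings should match closely."]

# Reference embeddings from sentence-transformers.
st = SentenceTransformer(MODEL, trust_remote_code=True)
ref = st.encode(texts, normalize_embeddings=True)

# vLLM embeddings in float32 (assumes a recent vLLM with LLM.embed()).
llm = LLM(model=MODEL, task="embed", dtype="float32", trust_remote_code=True)
out = np.array([o.outputs.embedding for o in llm.embed(texts)])
out = out / np.linalg.norm(out, axis=1, keepdims=True)

print("max abs diff:", np.abs(ref - out).max())  # expected to stay below ~1e-4 in float32
```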
Tests involved
Partial_Fix #15849
Fix #12054
Fix #17949