Hello (again)!
After experimenting with the project some more, I've found a few potential points of improvement. I'll aggregate them here if that's okay:
- Set `device=None` as the default in `distill`, rather than `"cpu"`. This should use Sentence Transformers' functionality to pick the "strongest" available device by default (e.g. "cuda" first). P.s.: distilling `mxbai-embed-large-v1` takes 00:05 (mm:ss) on my GPU vs. 01:58 on CPU. (Edit: might be moot given "remove sentence transformers dependency" #35.)
- Allow 1) passing an initialized Sentence Transformer model, or 2) introduce a `trust_remote_code` argument. This is required for loading models with custom code, such as https://huggingface.co/jinaai/jina-embeddings-v2-base-en or https://huggingface.co/nomic-ai/nomic-embed-text-v1.5. (Edit: might be moot given "remove sentence transformers dependency" #35.) A rough sketch of these first two points follows this list.
- Don't be afraid to link to yourself in the README.md, i.e. at the bottom of the model card template you could put a hyperlink over your names to your GitHub or something. I've included this in "[enh] Rely on Jinja for model card, use model id/path in snippet" #37.
- In my experience, the `private` argument in `push_to_hub` is quite well liked; it allows people to look the model (card) over before making the model public to everyone.
- The `token` argument in `push_to_hub` should be optional. If it's not specified, then the Hugging Face tools will automatically load it from your local filesystem if you ever ran `huggingface-cli login`. (See the second sketch after this list for both `push_to_hub` points.)
- "Model Authors" in the model card is a bit of a misnomer: if someone else trained the model, then they're the authors; you're the Model2Vec authors.
- You can add "Model2Vec models on Hugging Face" or something similar to the Additional Resources, linking to https://huggingface.co/models?library=model2vec, if you want.
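
To make the first two points more concrete, here is a minimal sketch of what the `distill` entry point could look like. The signature and internals below are assumptions on my part, not the current Model2Vec API:

```python
# Sketch only: the distill signature below is an assumption, not Model2Vec's actual API.
from typing import Optional, Union

from sentence_transformers import SentenceTransformer


def distill(
    model: Union[str, SentenceTransformer],
    device: Optional[str] = None,      # None defers to Sentence Transformers' device resolution (cuda > mps > cpu)
    trust_remote_code: bool = False,   # needed for models with custom code, e.g. jina-embeddings-v2 or nomic-embed-text
    **kwargs,
):
    if isinstance(model, SentenceTransformer):
        # Option 1: reuse an already-initialized model as-is.
        sentence_model = model
    else:
        # Option 2: load by name, forwarding device and trust_remote_code.
        sentence_model = SentenceTransformer(
            model, device=device, trust_remote_code=trust_remote_code
        )
    ...  # the rest of the distillation would proceed from sentence_model
```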
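
And a similarly hedged sketch for the two `push_to_hub` points, using `huggingface_hub` directly; the function name matches the one mentioned above, but the body is illustrative rather than the project's actual implementation:

```python
from typing import Optional

from huggingface_hub import HfApi


def push_to_hub(folder_path: str, repo_id: str, private: bool = False, token: Optional[str] = None):
    # token=None falls back to the token stored by `huggingface-cli login`.
    api = HfApi(token=token)
    # private=True lets the author review the repo (and model card) before making it public.
    api.create_repo(repo_id=repo_id, private=private, exist_ok=True)
    api.upload_folder(folder_path=folder_path, repo_id=repo_id)
```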
- Tom Aarsen