- deactivate # leaving Travis' virtualenv first since otherwise Jupyter/IPython gets confused with conda inside a virtualenv (See: https://github.com/ipython/ipython/issues/8898)
34
-
- mkdir -p download
35
-
- cd download
36
-
- rm -rf ~/miniconda
37
-
- if [[ "$TRAVIS_PYTHON_VERSION" == "2.7" ]]; then
@@ -48,74 +48,142 @@ However, **_Snorkel is very much a work in progress_**, so we're eager for any a
48
48
* _[Learning to Compose Domain-Specific Transformations for Data Augmentation](https://arxiv.org/abs/1709.01643)_ (NIPS 2017)
49
49
* _[Gaussian Quadrature for Kernel Features](https://arxiv.org/abs/1709.02605)_ (NIPS 2017)
50
50
51
-
## Learning how to use Snorkel
52
-
The [introductory tutorial](https://github.com/HazyResearch/snorkel/tree/master/tutorials/intro) covers the entire Snorkel workflow, showing how to extract spouse relations from news articles.
53
-
The tutorial is available in the following directory:
54
-
```
55
-
tutorials/intro
51
+
## Quick Start
52
+
53
+
This section lists the commands needed to get Snorkel running quickly.
54
+
For more detailed installation instructions, see the [Installation section](#installation) below.
55
+
These instructions assume that you already have [conda](https://conda.io/) installed.
56
+
57
+
First, download and extract a copy of the Snorkel directory from a [GitHub release](https://github.com/HazyResearch/snorkel/releases) (version 0.7.0 or greater).
58
+
Then navigate to the root of the `snorkel` directory in a terminal and run the following:
You can also check out all the great **[materials](https://simtk.org/frs/?group_id=1263)** from the recent Mobilize Center-hosted [Snorkel workshop](http://mobilize.stanford.edu/events/snorkelworkshop2017/)!
58
76
59
-
Then, for more content, check out the other tutorials available [here](https://github.com/HazyResearch/snorkel/tree/master/tutorials).
77
+
Then a Jupyter notebook tab will open in your browser. From here you can run existing Snorkel notebooks or create your own.
78
+
79
+
### Tutorials
80
+
81
+
From within the Jupyter browser, navigate to the [`tutorials`](tutorials) directory and try out one of the existing notebooks!
82
+
83
+
The [introductory tutorial](tutorials/intro) in `tutorials/intro` covers the entire Snorkel workflow, showing how to extract spouse relations from news articles.
84
+
You can also check out all the great [materials](https://simtk.org/frs/?group_id=1263) from the recent Mobilize Center-hosted [Snorkel workshop](http://mobilize.stanford.edu/events/snorkelworkshop2017/)!
60
85
61
86
## Release Notes
87
+
88
+
### Major changes in v0.7:
89
+
* [PyTorch](https://pytorch.org/) classifiers
90
+
* Installation now via [Conda](https://conda.io/) and `pip`
91
+
* Now [spaCy](https://spacy.io/) is the default parser (v1), with support for v2
92
+
* And many more fixes, additions, and new material!
93
+
94
+
### Older versions
95
+
96
+
<details>
97
+
62
98
### Major changes in v0.6:
99
+
63
100
* Support for categorical classification, including "dynamically-scoped" or _blocked_ categoricals (see [tutorial](tutorials/advanced/Categorical_Classes.ipynb))
64
101
* Support for structure learning (see [tutorial](tutorials/advanced/Structure_Learning.ipynb), ICML 2017 paper)
65
102
* Support for labeled data in generative model
66
103
* Refactor of TensorFlow bindings; fixes grid search and model saving / reloading issues (see `snorkel/learning`)
* Refactored parser class and support for [spaCy](https://spacy.io/) as new default parser
105
+
* Refactored parser class and support for [spaCy](https://spacy.io/) as new parser
69
106
* Support for easy use of the [BRAT annotation tool](http://brat.nlplab.org/) (see [tutorial](tutorials/advanced/BRAT_Annotations.ipynb))
70
107
* Initial Spark integration, for scale out of LF application (see [tutorial](tutorials/snark/Snark%20Tutorial.ipynb))
71
108
* Tutorial on using crowdsourced data [here](tutorials/crowdsourcing/Crowdsourced_Sentiment_Analysis.ipynb)
72
109
* Integration with [Apache Tika](http://tika.apache.org/) via the [Tika Python](http://github.com/chrismattmann/tika-python.git) binding.
73
110
* And many more fixes, additions, and new material!
74
111
112
+
</details>
113
+
75
114
## Installation
76
-
Snorkel uses Python 2.7 or Python 3 and requires [a few python packages](python-package-requirement.txt) which can be installed using [`conda`](https://www.continuum.io/downloads) and `pip`.
77
115
78
-
### Setting Up Conda
79
-
Installation is easiest if you download and install [`conda`](https://www.continuum.io/downloads).
80
-
You can create a new conda environment with e.g.:
81
-
```
82
-
conda create -n py2Env python=2.7 anaconda
83
-
```
84
-
And then run the correct environment:
85
-
```
86
-
source activate py2Env
87
-
```
116
+
Starting with version 0.7.0, Snorkel should be installed as a Python package using `pip`.
117
+
However, installing Snorkel via `pip` will not install dependencies, which are required for Snorkel to run.
118
+
To manage its dependencies, Snorkel uses [conda](https://conda.io/), which allows specifying an environment via an `environment.yml` file.
88
119
89
-
### Installing dependencies
90
-
First install [NUMBA](https://numba.pydata.org/), a package for high-performance numeric computing in Python via Conda:
91
-
```bash
92
-
conda install numba
93
-
```
120
+
This documentation covers two common cases (usage and development) for setting up conda environments for Snorkel.
121
+
In both cases, the environment can be activated using `conda activate snorkel` and deactivated using `conda deactivate`
122
+
(for versions of conda prior to 4.4, replace `conda` with `source` in these commands).
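
The version rule above can be sketched in a few lines of Python. This is only an illustration: `parse_version` and `activation_commands` are hypothetical helper names, not part of Snorkel or conda, and the sketch assumes plain dotted version strings such as `4.3.30`.

```python
def parse_version(version: str) -> tuple:
    """Turn a dotted version string such as '4.3.30' into a tuple of ints."""
    return tuple(int(part) for part in version.split(".") if part.isdigit())


def activation_commands(conda_version: str, env_name: str = "snorkel") -> tuple:
    """Return the (activate, deactivate) commands for a given conda version."""
    # Per the note above: conda 4.4 and later use `conda activate`,
    # while earlier versions use `source activate`.
    prefix = "conda" if parse_version(conda_version) >= (4, 4) else "source"
    return (f"{prefix} activate {env_name}", f"{prefix} deactivate")


print(activation_commands("4.5.11"))
print(activation_commands("4.3.30"))
```

Comparing tuples of integers rather than raw strings keeps a version such as `4.10` from being treated as older than `4.4`.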
123
+
Users just looking to try out a Snorkel tutorial notebook should see the quick-start instructions above.
This setup is intended for users who would like to use Snorkel in their own applications by importing the package.
128
+
In such cases, users should define a custom `environment.yml` to manage their project's dependencies.
129
+
We recommend starting with the [`environment.yml`](environment.yml) in this repository.
130
+
The following modifications can help you customize it for your needs:
131
+
132
+
<details>
133
+
134
+
1. Specifying versions for the listed packages, such as changing `python` to `python=3.6.5`.
135
+
Versioned specification of your environment is critical to reproducibility and ensuring dependency updates do not break your pipeline.
136
+
When first setting your package versions, you likely want to start with the latest versions available on the [conda-forge](https://anaconda.org/conda-forge/) channel, unless you have a reason to do otherwise.
137
+
2. Adding other packages to your environment as required by your use case.
138
+
Consider maintaining alphabetical sorting of packages in `environment.yml` to assist with maintainability.
139
+
In addition, we recommend installing packages via pip only if they are not available in the conda-forge channel.
140
+
3. Adding the `snorkel` package installation to your `environment.yml`, under the `- pip` section.
141
+
Of course, we suggest versioning `snorkel`, which you can do via a release number or a commit hash (to access more bleeding-edge functionality).
Finally, consider versioning the `numbskull` and `treedlib` pip dependencies by changing `master` to their latest commit hash on GitHub.
149
+
150
+
</details>
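
As a rough illustration of the first two recommendations above (pinning versions and keeping packages alphabetically sorted), the following sketch checks a list of conda dependency strings. It assumes the simple `name=version` form; `check_dependencies` is a hypothetical helper, not part of Snorkel or conda, and the version numbers other than `python=3.6.5` are made up for the example.

```python
def check_dependencies(specs):
    """Report unpinned packages and whether the list is alphabetically sorted.

    `specs` is a list of conda dependency strings such as "python=3.6.5".
    Assumes the simple `name=version` form (no `>=` or `<` constraints).
    """
    names = [spec.split("=")[0].strip().lower() for spec in specs]
    unpinned = [name for name, spec in zip(names, specs) if "=" not in spec]
    return {"unpinned": unpinned, "sorted": names == sorted(names)}


# Hypothetical dependency list, for illustration only.
example = ["numpy=1.14.3", "python", "scipy=1.1.0"]
print(check_dependencies(example))
# → {'unpinned': ['python'], 'sorted': True}
```

A check like this could run before committing changes to `environment.yml`, so that an unpinned package is noticed before it breaks a pipeline.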
151
+
152
+
### Development Environment
153
+
154
+
This setup is intended for users who have cloned this repository and would like to access the environment for development.
155
+
This approach installs the `snorkel` package in development mode, meaning that changes you make to the source code will automatically be applied to the `snorkel` package in the environment.
156
+
157
+
```sh
158
+
# From the root directory of this repo, run the following command.
159
+
conda env create --file=environment.yml
160
+
161
+
# Activate the conda environment (if using a version of conda below 4.4, use "source" instead of "conda")
_Note: If you are using conda and experience issues with `lxml`, try running `conda install libxml2`._
168
+
### Additional installation notes
106
169
107
-
_Note: Currently the `Viewer` is supported on the following versions:_
108
-
* `jupyter`: 4.1
109
-
* `jupyter notebook`: 4.2
170
+
<details>
110
171
111
-
In some tutorials, etc. we also use [Stanford CoreNLP](http://stanfordnlp.github.io/CoreNLP/) for pre-processing text; you will be prompted to install this when you run `run.sh`.
172
+
Snorkel can be installed directly from its GitHub repository via:
112
173
113
-
## Running
114
-
After installing, just run:
115
174
```
116
-
./run.sh
175
+
# WARNING: read installation section before running this command! This command
176
+
# does not install any dependencies. It installs the latest master version but
_Note: Currently the `Viewer` is supported on the following versions:_
182
+
* `jupyter`: 4.1
183
+
* `jupyter notebook`: 4.2
184
+
185
+
</details>
186
+
119
187
## Q & A
120
188
**Many questions about Snorkel get answered in the issues section--along with general discussions and conversations of interest.
121
189
We tag these all as "Q&A" and save them [here](https://github.com/HazyResearch/snorkel/issues?utf8=%E2%9C%93&q=is%3Aissue+label%3A%22Q%26A%22+)**
@@ -130,6 +198,8 @@ If submitting an issue about a bug, however, **please provide a pointer to a not
130
198
131
199
Snorkel is built specifically with usage in **Jupyter/IPython notebooks** in mind; an incomplete set of best practices for the notebooks:
132
200
201
+
<details>
202
+
133
203
It's usually most convenient to write most code in an external `.py` file and load it as a module that's automatically reloaded; use:
134
204
```python
135
205
%load_ext autoreload
@@ -140,3 +210,5 @@ A more convenient option is to add these lines to your IPython config file, in `