Commit b047dfb

Merge pull request #923 from HazyResearch/0.7.0-alpha
Version 0.7
2 parents 33db6f7 + fdf9db7 commit b047dfb

90 files changed: +1407 / -10500 lines

.gitignore

Lines changed: 1 addition & 0 deletions
@@ -11,6 +11,7 @@ downloads/
 **/checkpoint
 **/checkpoints/
 __pycache__
+*.egg-info/
 *corenlp.log
 
 # Sphinx

.gitmodules

Lines changed: 0 additions & 4 deletions
This file was deleted.

.travis.yml

Lines changed: 41 additions & 74 deletions
@@ -3,101 +3,68 @@
 
 dist: trusty
 sudo: false # to use container-based infra, see: http://docs.travis-ci.com/user/migrating-from-legacy/
-
-language:
-- python
-python:
-- "2.7"
-- "3.6"
-jdk:
-- oraclejdk8
+language: generic
+env:
+matrix:
+- PYTHON_VERSION=2.7
+- PYTHON_VERSION=3.6
 
 cache:
 directories:
 - download
-- $HOME/.cache/pip
-- $HOME/miniconda/envs/test # to avoid repetitively setting up Ana/Miniconda environment
-- parser # to avoid repetitively downloading CoreNLP
-
-addons:
-apt:
-packages:
-# CoreNLP needs Java 8
-- oracle-java8-installer
 
-# Following trick is necessary to get a binary distribution of numpy, scipy, etc. which takes too long to build every time
-# See: http://stackoverflow.com/q/30588634
-# See: https://github.com/Theano/Theano/blob/master/.travis.yml (for caching)
-# See: http://conda.pydata.org/docs/travis.html
 before_install:
-- deactivate # leaving Travis' virtualenv first since otherwise Jupyter/IPython gets confused with conda inside a virtualenv (See: https://github.com/ipython/ipython/issues/8898)
-- mkdir -p download
-- cd download
-- rm -rf ~/miniconda
-- if [[ "$TRAVIS_PYTHON_VERSION" == "2.7" ]]; then
-travis_retry wget -c https://repo.continuum.io/miniconda/Miniconda2-latest-Linux-x86_64.sh -O miniconda.sh;
-else
-travis_retry wget -c https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh -O miniconda.sh;
-fi
-- chmod +x miniconda.sh
-- bash miniconda.sh -b -f -p ~/miniconda
-- cd ..
-- export PATH=~/miniconda/bin:$PATH
-- conda update --yes conda
-
-# Make sure Java 8 is used
-- export PATH="/usr/lib/jvm/java-8-oracle/bin:$PATH"
-- export JAVA_HOME=/usr/lib/jvm/java-8-oracle
-- java -version
-
-# Set environment variables
-- source set_env.sh
+- travis_retry
+wget https://repo.continuum.io/miniconda/Miniconda3-4.5.1-Linux-x86_64.sh
+--output-document=miniconda.sh
+- bash miniconda.sh -b -p $HOME/miniconda
+- source $HOME/miniconda/etc/profile.d/conda.sh
+- conda config --set always_yes yes --set changeps1 no
+- conda info --all
 
 install:
-# Install binary distribution of scientific python modules
-- test -e ~/miniconda/envs/test/bin/activate || ( rm -rf ~/miniconda/envs/test; conda create --yes -n test python=$TRAVIS_PYTHON_VERSION )
-- source activate test
-- conda install --yes numpy scipy matplotlib pip
-
-# Install Numba
-- conda install --yes numba
-
-# Install all remaining dependencies as per our README
-- pip install -r python-package-requirement.txt
-- test -e parser/corenlp.sh || ./install-parser.sh
-
-# Use runipy to run Jupyter/IPython notebooks from command-line
-- pip install runipy
+- sed --in-place 's/- python/- python='"$PYTHON_VERSION"'/' environment.yml
+- conda env create --quiet --file=environment.yml
+- conda activate snorkel
+- pip install .
+- conda install --quiet tensorflow # Installs Tensorflow to test optional components
+- conda list
 
 script:
 
-# Run test modules
+# Run generative model test modules
 - python test/learning/test_gen_learning.py
 - python test/learning/test_supervised.py
 - python test/learning/test_categorical.py
-- runipy test/learning/test_TF_notebook.ipynb
-- runipy test/learning/test_parallel_grid_search.ipynb
+
+# Run PyTorch test modules
+- python test/learning/pytorch/test_lstm.py
+- python test/learning/pytorch/test_model_reloading.py
+- python test/learning/pytorch/test_determinism.py
+
+# Run Tensorflow test modules
+- runipy test/learning/tensorflow/test_TF_notebook.ipynb
+- runipy test/learning/tensorflow/test_parallel_grid_search.ipynb
 
 # Runs intro tutorial notebooks
-- cd tutorials
-- runipy intro/Intro_Tutorial_1.ipynb
-- runipy intro/Intro_Tutorial_2.ipynb
-- runipy intro/Intro_Tutorial_3.ipynb
+- runipy tutorials/intro/Intro_Tutorial_1.ipynb
+- runipy tutorials/intro/Intro_Tutorial_2.ipynb
+- runipy tutorials/intro/Intro_Tutorial_3.ipynb
 
 # Run advanced notebooks
-- runipy advanced/Categorical_Classes.ipynb
-- runipy advanced/Structure_Learning.ipynb
+- runipy tutorials/advanced/Categorical_Classes.ipynb
+- runipy tutorials/advanced/Structure_Learning.ipynb
 
 # Run CDR tutorials
-- runipy cdr/CDR_Tutorial_1.ipynb
-- runipy cdr/CDR_Tutorial_2.ipynb
-- runipy cdr/CDR_Tutorial_3.ipynb
+- runipy tutorials/cdr/CDR_Tutorial_1.ipynb
+- runipy tutorials/cdr/CDR_Tutorial_2.ipynb
+- runipy tutorials/cdr/CDR_Tutorial_3.ipynb
 
 # TODO check outputs, upload results, etc.
 # for more ideas, see: https://github.com/rossant/ipycache/issues/7
 
-after_success:
-- killall java
-
-after_failure:
-- killall java
+# Build Sphinx documentation
+# # Disabled due to the following error:
+# # make: *** docs: No such file or directory. Stop.
+# - conda install --channel=conda-forge sphinx=1.7.4
+# - make --directory=docs html
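
The new `install` step pins the interpreter by editing `environment.yml` in place with `sed`, once for each `PYTHON_VERSION` entry in the build matrix. The sketch below reproduces that substitution in Python so its effect can be inspected. The `environment.yml` excerpt is hypothetical, since the file's contents are not part of this view. Because `sed` runs without the `g` flag, it rewrites the first `- python` on each line; because the pattern is unanchored, it would also rewrite a line such as `- python-dateutil`. The anchored variant shows one way to avoid that.

```python
import re


def pin_python(environment_yml: str, python_version: str) -> str:
    """Mimic `sed 's/- python/- python=VERSION/'`: replace the first
    '- python' on each line (sed without the g flag)."""
    return "\n".join(
        re.sub(r"- python", "- python=" + python_version, line, count=1)
        for line in environment_yml.splitlines()
    )


def pin_python_anchored(environment_yml: str, python_version: str) -> str:
    """Variant that only rewrites a line consisting solely of '- python'."""
    return re.sub(
        r"^([ \t]*- python)$",
        r"\g<1>=" + python_version,
        environment_yml,
        flags=re.MULTILINE,
    )


# Hypothetical environment.yml excerpt (the real file is not shown in this diff).
SAMPLE = """name: snorkel
dependencies:
  - numpy
  - python
  - python-dateutil"""

if __name__ == "__main__":
    print(pin_python(SAMPLE, "3.6"))
    print(pin_python_anchored(SAMPLE, "3.6"))
```

With the naive pattern, the hypothetical `python-dateutil` line becomes `python=3.6-dateutil`. The anchored pattern leaves it alone.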

MANIFEST.in

Lines changed: 6 additions & 0 deletions
@@ -0,0 +1,6 @@
+include README.md
+include LICENSE
+include environment.yml
+include snorkel/vis/tree-chart.html
+include snorkel/vis/tree-chart.js
+recursive-include snorkel/viewer *
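
`MANIFEST.in` lists extra, non-Python files that setuptools should add to a source distribution. Here those files are the README, the license, `environment.yml`, and the front-end assets under `snorkel/vis` and `snorkel/viewer`. The sketch below roughly illustrates how the two directives used in this file select files, by emulating them with `fnmatch` over a hypothetical file listing. It simplifies setuptools' behaviour: it ignores other directives such as `exclude`, `graft` and `prune`, and `fnmatch`'s `*` also matches `/`.

```python
from fnmatch import fnmatch
from posixpath import basename


def select_files(manifest_lines, paths):
    """Approximate `include` / `recursive-include` from MANIFEST.in."""
    selected = set()
    for line in manifest_lines:
        directive, *args = line.split()
        if directive == "include":
            # include PATTERN...: match paths relative to the project root
            selected.update(p for p in paths if any(fnmatch(p, pat) for pat in args))
        elif directive == "recursive-include":
            # recursive-include DIR PATTERN...: match file names anywhere under DIR
            directory, *patterns = args
            prefix = directory.rstrip("/") + "/"
            selected.update(
                p for p in paths
                if p.startswith(prefix)
                and any(fnmatch(basename(p), pat) for pat in patterns)
            )
        else:
            raise ValueError("directive not covered by this sketch: " + directive)
    return selected


MANIFEST = """include README.md
include LICENSE
include environment.yml
include snorkel/vis/tree-chart.html
include snorkel/vis/tree-chart.js
recursive-include snorkel/viewer *""".splitlines()

# Hypothetical project listing; only the MANIFEST.in lines come from the commit.
PATHS = [
    "README.md",
    "LICENSE",
    "environment.yml",
    "snorkel/vis/tree-chart.html",
    "snorkel/vis/tree-chart.js",
    "snorkel/viewer/viewer.html",
    "snorkel/viewer/js/app.js",
    "tutorials/intro/Intro_Tutorial_1.ipynb",
]

if __name__ == "__main__":
    for path in sorted(select_files(MANIFEST, PATHS)):
        print(path)
```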

README.md

Lines changed: 114 additions & 42 deletions
@@ -1,7 +1,7 @@
 <img src="figs/logo_01.png" width="150"/>
 
 
-**_v0.6.3_**
+**_v0.7.0-beta_**
 
 [![Build Status](https://travis-ci.org/HazyResearch/snorkel.svg?branch=master)](https://travis-ci.org/HazyResearch/snorkel)
 [![Documentation](https://readthedocs.org/projects/snorkel/badge/)](http://snorkel.readthedocs.io/en/master/)
@@ -14,8 +14,8 @@
 
 ## Getting Started
 
-* Installation instructions [below](#installation)
-* Get started with the tutorials [below](#learning-how-to-use-snorkel)
+* Get set up quickly [below](#quick-start)
+* Try the tutorials with [these instructions](#tutorials)
 * Documentation [here](http://snorkel.readthedocs.io/en/master/)
 
 ## Motivation
@@ -48,74 +48,142 @@ However, **_Snorkel is very much a work in progress_**, so we're eager for any a
 * _[Learning to Compose Domain-Specific Transformations for Data Augmentation](https://arxiv.org/abs/1709.01643)_ (NIPS 2017)
 * _[Gaussian Quadrature for Kernel Features](https://arxiv.org/abs/1709.02605)_ (NIPS 2017)
 
-## Learning how to use Snorkel
-The [introductory tutorial](https://github.com/HazyResearch/snorkel/tree/master/tutorials/intro) covers the entire Snorkel workflow, showing how to extract spouse relations from news articles.
-The tutorial is available in the following directory:
-```
-tutorials/intro
+## Quick Start
+
+This section has the commands to quickly get started running Snorkel.
+For more detailed installation instructions, see the [Installation section](#installation) below.
+These instructions assume that you already have [conda](https://conda.io/) installed.
+
+First, download and extract a copy of the Snorkel directory from a [GitHub release](https://github.com/HazyResearch/snorkel/releases) (version 0.7.0 or greater).
+Then navigate to the root of the `snorkel` directory in a terminal and run the following:
+
+```sh
+# Install the environment
+conda env create --file=environment.yml
+
+# Activate the environment
+conda activate snorkel
+
+# Install snorkel in the environment
+pip install .
+
+# Activate jupyter widgets
+jupyter nbextension enable --py widgetsnbextension
+
+# Initiate a jupyter notebook server
+jupyter notebook
 ```
-You can also check out all the great **[materials](https://simtk.org/frs/?group_id=1263)** from the recent Mobilize Center-hosted [Snorkel workshop](http://mobilize.stanford.edu/events/snorkelworkshop2017/)!
 
-Then, for more content, check out the other tutorials avaliable [here](https://github.com/HazyResearch/snorkel/tree/master/tutorials).
+Then a Jupyter notebook tab will open in your browser. From here you can run existing Snorkel notebooks or create your own.
+
+### Tutorials
+
+From within the Jupyter browser, navigate to the [`tutorials`](tutorials) directory and try out one of the existing notebooks!
+
+The [introductory tutorial](tutorials/intro) in `tutorials/intro` covers the entire Snorkel workflow, showing how to extract spouse relations from news articles.
+You can also check out all the great [materials](https://simtk.org/frs/?group_id=1263) from the recent Mobilize Center-hosted [Snorkel workshop](http://mobilize.stanford.edu/events/snorkelworkshop2017/)!
 
 ## Release Notes
+
+### Major changes in v0.7:
+* [PyTorch](https://pytorch.org/) classifiers
+* Installation now via [Conda](https://conda.io/) and `pip`
+* Now [spaCy](https://spacy.io/) is the default parser (v1), with support for v2
+* And many more fixes, additions, and new material!
+
+### Older versions
+
+<details>
+
 ### Major changes in v0.6:
+
 * Support for categorical classification, including "dynamically-scoped" or _blocked_ categoricals (see [tutorial](tutorials/advanced/Categorical_Classes.ipynb))
 * Support for structure learning (see [tutorial](tutorials/advanced/Structure_Learning.ipynb), ICML 2017 paper)
 * Support for labeled data in generative model
 * Refactor of TensorFlow bindings; fixes grid search and model saving / reloading issues (see `snorkel/learning`)
 * New, simplified Intro tutorial ([here](tutorials/intro))
-* Refactored parser class and support for [spaCy](https://spacy.io/) as new default parser
+* Refactored parser class and support for [spaCy](https://spacy.io/) as new parser
 * Support for easy use of the [BRAT annotation tool](http://brat.nlplab.org/) (see [tutorial](tutorials/advanced/BRAT_Annotations.ipynb))
 * Initial Spark integration, for scale out of LF application (see [tutorial](tutorials/snark/Snark%20Tutorial.ipynb))
 * Tutorial on using crowdsourced data [here](tutorials/crowdsourcing/Crowdsourced_Sentiment_Analysis.ipynb)
 * Integration with [Apache Tika](http://tika.apache.org/) via the [Tika Python](http://github.com/chrismattmann/tika-python.git) binding.
 * And many more fixes, additions, and new material!
 
+</details>
+
 ## Installation
-Snorkel uses Python 2.7 or Python 3 and requires [a few python packages](python-package-requirement.txt) which can be installed using [`conda`](https://www.continuum.io/downloads) and `pip`.
 
-### Setting Up Conda
-Installation is easiest if you download and install [`conda`](https://www.continuum.io/downloads).
-You can create a new conda environment with e.g.:
-```
-conda create -n py2Env python=2.7 anaconda
-```
-And then run the correct environment:
-```
-source activate py2Env
-```
+Starting with version 0.7.0, Snorkel should be installed as a Python package using `pip`.
+However, installing Snorkel via `pip` will not install dependencies, which are required for Snorkel to run.
+To manage its dependencies, Snorkel uses [conda](https://conda.io/), which allows specifying an environment via an `environment.yml` file.
 
-### Installing dependencies
-First install [NUMBA](https://numba.pydata.org/), a package for high-performance numeric computing in Python via Conda:
-```bash
-conda install numba
-```
+This documentation covers two common cases (usage and development) for setting up conda environments for Snorkel.
+In both cases, the environment can be activated using `conda activate snorkel` and deactivated using `conda deactivate`
+(for versions of conda prior to 4.4, replace `conda` with `source` in these commands).
+Users just looking to try out a Snorkel tutorial notebook should see the quick-start instructions above.
 
-Then install the remaining package requirements:
-```bash
-pip install --requirement python-package-requirement.txt
-```
+### Using Snorkel as a Package
+
+This setup is intended for users who would like to use Snorkel in their own applications by importing the package.
+In such cases, users should define a custom `environment.yml` to manage their project's dependencies.
+We recommend starting with the [`environment.yml`](environment.yml) in this repository.
+The below modifications can help customize it for your needs:
+
+<details>
+
+1. Specifying versions for the listed packages, such as changing `python` to `python=3.6.5`.
+Versioned specification of your environment is critical to reproducibility and ensuring dependency updates do not break your pipeline.
+When first setting your package versions, you likely want to start with the latest versions available on the [conda-forge](https://anaconda.org/conda-forge/) channel, unless you have a reason to do otherwise.
+2. Adding other packages to your environment as required by your use case.
+Consider maintaining alphabetical sorting of packages in `environment.yml` to assist with maintainability.
+In addition, we recommend installing packages via pip, only if they are not available in the conda-forge channel.
+3. Add the `snorkel` package installation to your `environment.yml`, under the `- pip` section.
+Of course, we suggest versioning snorkel, which you can do via a release number or commit hash (to access more bleeding edge functionality)
+```yml
+# Versioned via release tag
+- git+https://github.com/HazyResearch/[email protected]
+# Versioned via commit hash (commit hash below is fake to ensure you change it)
+- git+https://github.com/HazyResearch/snorkel@7eb7076f70078c06bef9752f22acf92fd86e616a
+```
+Finally, consider versioning the `numbskull` and `treedlib` pip dependencies by changing `master` to their latest commit hash on GitHub.
+
+</details>
+
+### Development Environment
+
+This setup is intended for users who have cloned this repository and would like to access the environment for development.
+This approach installs the `snorkel` package in development mode, meaning that changes you make to the source code will automatically be applied to the `snorkel` package in the environment.
+
+```sh
+# From the root direcectory of this repo run the following command.
+conda env create --file=environment.yml
+
+# Activate the conda environment (if using a version of conda below 4.4, use "source" instead of "conda")
+conda activate snorkel
 
-Finally, enable `ipywidgets`:
-```bash
-jupyter nbextension enable --py widgetsnbextension --sys-prefix
+# Install snorkel in development mode
+pip install --editable .
 ```
 
-_Note: If you are using conda and experience issues with `lxml`, try running `conda install libxml2`._
+### Additional installation notes
 
-_Note: Currently the `Viewer` is supported on the following versions:_
-* `jupyter`: 4.1
-* `jupyter notebook`: 4.2
+<details>
 
-In some tutorials, etc. we also use [Stanford CoreNLP](http://stanfordnlp.github.io/CoreNLP/) for pre-processing text; you will be prompted to install this when you run `run.sh`.
+Snorkel can be installed directly from its GitHub repository via:
 
-## Running
-After installing, just run:
 ```
-./run.sh
+# WARNING: read installation section before running this command! This command
+# does not install any dependencies. It installs the latest master version but
+# you can change master to tag or commit
+pip install git+https://github.com/HazyResearch/snorkel@master
 ```
 
+_Note: Currently the `Viewer` is supported on the following versions:_
+* `jupyter`: 4.1
+* `jupyter notebook`: 4.2
+
+</details>
+
 ## Q & A
 **Many questions about Snorkel get answered in the issues section--along with general discussions and conversations of interest.
 We tag these all as "Q&A" and save them [here](https://github.com/HazyResearch/snorkel/issues?utf8=%E2%9C%93&q=is%3Aissue+label%3A%22Q%26A%22+)**
@@ -130,6 +198,8 @@ If submitting an issue about a bug, however, **please provide a pointer to a not
 
 Snorkel is built specifically with usage in **Jupyter/IPython notebooks** in mind; an incomplete set of best practices for the notebooks:
 
+<details>
+
 It's usually most convenient to write most code in an external `.py` file, and load as a module that's automatically reloaded; use:
 ```python
 %load_ext autoreload
@@ -140,3 +210,5 @@ A more convenient option is to add these lines to your IPython config file, in `
 c.InteractiveShellApp.extensions = ['autoreload']
 c.InteractiveShellApp.exec_lines = ['%autoreload 2']
 ```
+
+</details>
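
The README's new "Using Snorkel as a Package" guidance can be checked mechanically. It recommends pinning versions, keeping packages sorted, and pinning the pip `git+` dependencies to a tag or commit rather than `master`. The sketch below is a hypothetical helper, not part of Snorkel. It takes the dictionary you would get by loading an `environment.yml` with a YAML parser; the parsing step is left out so the sketch stays within the standard library. The example environment, including its version numbers and placeholder hash, is made up. Its notion of "pinned" is a heuristic: a conda spec containing `=`, a plain pip spec containing `==`, or a `git+` URL whose ref is not `master`.

```python
import re


def git_ref(requirement):
    """Return the ref after '@' in a 'git+...@ref' pip requirement, else None."""
    url = requirement.split("#", 1)[0]  # drop fragments such as '#egg=name'
    if not url.startswith("git+"):
        return None
    _, sep, ref = url.rpartition("@")
    # No '@' at all, or an '@' belonging to 'git@host/...' rather than a ref
    if not sep or "/" in ref:
        return None
    return ref


def audit_environment(env):
    """List departures from the README's pinning and sorting advice."""
    deps = env.get("dependencies", [])
    conda_specs = [d for d in deps if isinstance(d, str)]
    pip_specs = [s for d in deps if isinstance(d, dict) for s in d.get("pip", [])]

    problems = []
    names = [re.split(r"[=<>]", spec, maxsplit=1)[0] for spec in conda_specs]
    if names != sorted(names):
        problems.append("conda dependencies are not sorted alphabetically")
    problems += ["unpinned conda package: " + s for s in conda_specs if "=" not in s]

    for spec in pip_specs:
        if spec.startswith("git+"):
            ref = git_ref(spec)
            if ref is None or ref == "master":
                problems.append("pip git requirement not pinned to a tag or commit: " + spec)
        elif "==" not in spec:
            problems.append("unpinned pip package: " + spec)
    return problems


# Made-up environment for illustration; versions and the hash are placeholders.
EXAMPLE_ENV = {
    "name": "snorkel",
    "dependencies": [
        "numpy=1.14.3",
        "python",
        "pandas=0.23.0",
        {"pip": [
            "git+https://github.com/HazyResearch/numbskull@master",
            "git+https://github.com/HazyResearch/snorkel@" + "0" * 40,
        ]},
    ],
}

if __name__ == "__main__":
    for problem in audit_environment(EXAMPLE_ENV):
        print(problem)
```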

docs/README.md

Lines changed: 1 addition & 1 deletion
@@ -11,4 +11,4 @@ make html
 
 **Note: Most problems are caused by dependence on libraries that readthedocs can't
 load (ones that rely on C libs) like `numpy` or `scipy`; just add these (and all
-submodules loaded) to the `MOCK_MODULES` array in `conf.py`.**
+submodules loaded) to the `MOCK_MODULES` array in `conf.py`.**
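
The note above refers to a `MOCK_MODULES` list in `conf.py`, whose contents are not shown in this commit. A common way to implement the idea, sketched here as an assumption about how such a list is used, is to register a `unittest.mock.MagicMock` in `sys.modules` for each C-backed module, so that Sphinx autodoc can import code that depends on them. The module list is illustrative, and `unittest.mock` requires Python 3.3 or later. Submodules must be listed explicitly because a `MagicMock` is not a package (it has no `__path__`), so mocking only `scipy` would not make `import scipy.sparse` succeed.

```python
import sys
from unittest.mock import MagicMock

# Illustrative list: C-backed libraries plus every submodule the code imports.
MOCK_MODULES = ["numpy", "scipy", "scipy.sparse"]

for module_name in MOCK_MODULES:
    # Later imports of these names return the mock instead of the real library.
    sys.modules[module_name] = MagicMock()

# Imports that would normally need the compiled libraries now succeed.
import scipy.sparse  # noqa: E402

print(type(scipy.sparse.csr_matrix).__name__)
```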
