
HazyResearch/treedlib


TreeDLib

jupyter notebook

Feature Ideas Doc

Shared Google doc here

Dependencies

Binning

This will eventually be done automatically (in some more or less sophisticated manner). In the meantime, to use Indicator operators that rely on bins, such as LengthBin(NodeSet, bin_divs), you can follow a rough procedure:

First, generate the features table, making sure to include full-path features for the lengths of interest. For example, for sequence and dependency tree path lengths, you would need to include:

Indicator(Between(Mention(0), Mention(1)), 'word')
Indicator(SeqBetween(), 'word')

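(These are currently implemented as get_relation_binning_features.) Then you can use code such as the following to inspect the distribution of path lengths and pick bin boundaries from its percentiles: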

SELECT * FROM genepheno_features WHERE feature LIKE '%SEQ%'

import numpy as np
import matplotlib.pyplot as plt
# res_seq: the rows returned by the SQL query above
seq_lens = [len(rs.feature.split('_')) for rs in res_seq]
n, bins, patches = plt.hist(seq_lens, 50, density=True, facecolor='green', alpha=0.75)
print([np.percentile(seq_lens, p) for p in [25, 50, 75]])
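If you want to derive bin_divs from the path lengths without numpy, a minimal sketch is below. The helpers percentile and choose_bin_divs are hypothetical (not part of treedlib); percentile uses linear interpolation, which is the default method of np.percentile, and choose_bin_divs simply takes the quartiles as cut points.

```python
import math


def percentile(values, p):
    """Linear-interpolation percentile (same default method as np.percentile)."""
    xs = sorted(values)
    if not xs:
        raise ValueError("percentile() of empty input")
    k = (len(xs) - 1) * p / 100.0  # fractional rank of the p-th percentile
    lo, hi = math.floor(k), math.ceil(k)
    if lo == hi:
        return float(xs[lo])
    # Interpolate between the two neighbouring order statistics
    return xs[lo] + (xs[hi] - xs[lo]) * (k - lo)


def choose_bin_divs(lengths, pcts=(25, 50, 75)):
    """Hypothetical helper: percentile cut points, deduplicated and sorted."""
    return sorted(set(percentile(lengths, p) for p in pcts))
```

Deduplicating matters when many relation mentions share the same length: coinciding quartiles would otherwise produce empty bins.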

See treedlib.ipynb for an example implementation.
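To illustrate what a length-binning indicator such as LengthBin(NodeSet, bin_divs) does with the chosen cut points, here is a minimal sketch. The function length_bin and its label format are hypothetical; it assumes sorted bin_divs split lengths into half-open ranges, which may differ from treedlib's actual edge handling.

```python
from bisect import bisect_right


def length_bin(length, bin_divs):
    """Hypothetical: label the bin that `length` falls into.

    Assumes sorted bin_divs split lengths into half-open ranges
    (-inf, d0), [d0, d1), ..., [dk, inf).
    """
    i = bisect_right(bin_divs, length)  # number of cut points <= length
    lo = "-inf" if i == 0 else str(bin_divs[i - 1])
    hi = "inf" if i == len(bin_divs) else str(bin_divs[i])
    return "LEN[%s,%s)" % (lo, hi)
```

For example, with bin_divs = [2, 4, 6] a path of length 5 falls into the [4, 6) bin.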
