Memoise beta, the stick breaking proportions of a DP #575

Merged: 10 commits into blei-lab:master on Mar 23, 2017

Conversation

JohnReid (Contributor)

A DirichletProcess should retain its beta parameters across calls to sample(). I've implemented this in the same way as @dustinvtran did for the theta parameters in issue #564, and I simplified the code somewhat: the body and condition functions for the tf.while_loop are now defined in sample() so that the scoping is easier, and the theta and beta tensors are initialised as empty and grown as needed rather than sampled in __init__. I also ran the code through AutoPep8; I hope that is OK, as I'm not sure what Edward's standards are regarding indentation.
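For intuition, here is a minimal NumPy sketch of the memoisation being described. This is not Edward's actual implementation and all names are hypothetical: the stick-breaking beta proportions and the theta atoms start empty and are grown lazily, so repeated calls to sample() reuse the same random measure.

```python
import numpy as np

class StickBreakingMemo:
  """Hypothetical sketch: a DP via stick breaking, with the beta
  proportions and atoms memoised across calls to sample()."""

  def __init__(self, alpha, base_sampler, rng=None):
    self.alpha = alpha          # concentration parameter
    self.base = base_sampler    # callable returning one atom from the base
    self.rng = rng or np.random.default_rng(0)
    self.betas = []             # memoised stick-breaking proportions
    self.thetas = []            # memoised atoms

  def _ensure(self, k):
    # Grow the memoised betas/thetas up to index k, only as needed.
    while len(self.betas) <= k:
      self.betas.append(self.rng.beta(1.0, self.alpha))
      self.thetas.append(self.base())

  def sample(self):
    # Walk the sticks: stop at stick k with probability beta_k.
    k = 0
    while True:
      self._ensure(k)
      if self.rng.random() < self.betas[k]:
        return self.thetas[k]
      k += 1
```

Because the sticks are retained on the instance, a second call to sample() draws from the same discrete measure instead of re-sampling the betas, which is the behaviour the PR fixes.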

…raws from the base distribution. (didn't include all changes in previous commit)
@dustinvtran (Member)

Thanks for this.

We use 2-space indentation, following TensorFlow's style. I wasn't able to read the diff since the 2-to-4 indentation swap makes it difficult. One comment from your description:

the body and condition functions for the tf.while_loop are defined in sample() so that the scoping is easier

I thought about this a little and think it's preferable to have them outside the function. While scoping is immediate with nested functions, it requires redefining the function for each call to sample, which can be an unnecessary expense.
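A quick illustration of the cost being described here (a generic Python sketch, not Edward code): a helper nested inside a method is a fresh function object on every call, while a module-level helper is created once and shared.

```python
def sample_nested():
  # Nested helper, as when body/cond live inside sample():
  def body(i):  # a new function object is created on every call
    return i + 1
  return body

def body_flat(i):  # defined once at module scope
  return i + 1

def sample_flat():
  return body_flat

# Nested definition yields a fresh function object on each call...
assert sample_nested() is not sample_nested()
# ...while the module-level helper is shared across calls.
assert sample_flat() is sample_flat()
```

The per-call cost is small, but it is paid on every sample() invocation, which motivates keeping the helpers as class methods.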

@JohnReid (Contributor, Author)

Well, if I could work out how to remove the last commit from this pull request, I would change it back to 2-space indentation and move the functions back to class methods. I think it might be easier just to create another pull request. I would be surprised if it ran significantly quicker with the functions as methods rather than defined inline.

@dustinvtran (Member)

dustinvtran commented Mar 22, 2017

In your branch, try:

```shell
git reset HEAD~1
```

then you can force-push to update your remote branch:

```shell
git push -f
```

@dustinvtran (Member)

I also recommend using pep8 to check your files manually. I personally prefer the manual approach as it enforces good practices, so eventually you always write PEP8-able code. :)

@JohnReid (Contributor, Author)

OK @dustinvtran, I've changed the indentation back and changed the functions to be methods again.

@dustinvtran (Member) left a comment

I approve of the major changes. Minor comments below.

```diff
@@ -37,7 +38,7 @@ def __init__(self, alpha, base_cls, validate_args=False, allow_nan_stats=True,
 >>>
 >>> # vector of concentration parameters, matrix of Exponentials
 >>> dp = DirichletProcess(tf.constant([0.1, 0.4]),
-...                       Exponential, lam=tf.ones([5, 3]))
+...                       Exponential, lam=tf.ones([5, 3]))
```
Member

remove blankspaces

Contributor (Author)

Will do

```diff
@@ -10,8 +10,9 @@


 class DirichletProcess(RandomVariable, Distribution):
-  def __init__(self, alpha, base_cls, validate_args=False, allow_nan_stats=True,
-               name="DirichletProcess", value=None, *args, **kwargs):
+  def __init__(
```
Member

any reason you added a linebreak? i'd default to no break otherwise, to be consistent with other class definitions in edward

Contributor (Author)

Flake8 told me the line was too long by 1 char

Member

Ah, got it. It should pass pep8 according to the additional rules we set up, so no need to bother (https://github.com/blei-lab/edward/blob/master/setup.cfg). (The pep8 Python package automatically uses the additional rules from setup.cfg; I'm not sure about flake8.)

```python
# Define atoms of Dirichlet process, storing only the first as default.
self._theta = tf.expand_dims(
    self._base.sample(self.get_batch_shape()), 0)
# Create empty tensor to store future atoms
```
Member

Use a period when ending comments that are sentences, similar to the comment two lines above.

Contributor (Author)

will do


```diff
 super(DirichletProcess, self).__init__(
-    dtype=tf.int32,
+    dtype=tf.float32,
```
Member

Could you comment on this change?

Contributor (Author)

I could have the wrong end of the stick here, but isn't the process associated with the thetas? They don't have int32 type. Perhaps this should be base_cls.dtype.

Member

The Dirichlet process always returns atoms, i.e., it is (almost surely) discrete even when the base distribution is continuous. This made me opt to use tf.int32. That said, the jury's still out on what dtype should be used for discrete distributions.
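The almost-sure discreteness is easy to see in a simulation. This hedged sketch uses the equivalent Chinese-restaurant-process view of DP sampling rather than Edward's stick-breaking code: even with a continuous Normal base distribution, draws repeat.

```python
import numpy as np

rng = np.random.default_rng(0)
alpha = 1.0

# Draw n samples from DP(alpha, Normal(0, 1)) via the Chinese
# restaurant process: each draw is a fresh atom from the base with
# probability alpha / (alpha + n), otherwise a previously seen atom
# chosen uniformly at random (size-biased reuse in expectation).
draws = []
for n in range(50):
  if rng.random() < alpha / (alpha + n):
    draws.append(rng.standard_normal())            # new atom from the base
  else:
    draws.append(draws[rng.integers(len(draws))])  # reuse an existing atom

# Even though the base is continuous, atoms repeat almost surely:
assert len(set(draws)) < len(draws)
```

With alpha = 1 the expected number of distinct atoms in 50 draws is only about log(50), which is why an integer index (and hence the tf.int32 question) is a reasonable way to think about DP output.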

@dustinvtran (Member)

dustinvtran commented Mar 23, 2017

The test for DP sampling is failing: it says that you're concatenating two tensors of differing dtype.

I think this can be solved by passing in the dtype whenever you're first initializing the tensors. For example,

```python
self._theta = tf.zeros(..., dtype=self._base.dtype)
...
self._beta = tf.zeros(..., dtype=self._betadist.dtype)
...
draws = tf.zeros([n] + batch_shape + event_shape, dtype=self._base.dtype)
```
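As a NumPy analogue of the fix (hedged: the dtype here is a stand-in for self._base.dtype, and NumPy's concatenate promotes mixed dtypes rather than raising as tf.concat does), the pattern is to initialise the empty tensor with an explicit dtype so later concatenations stay consistent:

```python
import numpy as np

base_dtype = np.float32                        # stand-in for self._base.dtype
theta = np.zeros((0, 3), dtype=base_dtype)     # empty store of atoms
new_atoms = np.ones((1, 3), dtype=base_dtype)  # freshly sampled atoms
theta = np.concatenate([theta, new_atoms], axis=0)
assert theta.dtype == base_dtype               # dtypes stay aligned
```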

@JohnReid (Contributor, Author)

JohnReid commented Mar 23, 2017

Thanks for all the help. I think I managed to get it to pass the tests. I need to start using pep8 more often, I think. I hope the fix is useful.

@dustinvtran dustinvtran merged commit 6a38828 into blei-lab:master Mar 23, 2017
@JohnReid JohnReid deleted the dpfix branch March 23, 2017 16:10