Memoise beta, the stick breaking proportions of a DP #575

Conversation
…raws from the base distribution.
…raws from the base distribution. (didn't include all changes in previous commit)
… inside sample function for easier scoping.
Thanks for this. We use 2-indent, following TensorFlow's style. I wasn't able to read the diff since the 2-to-4 swap makes it difficult. One comment from your description:
I thought about this a little and think it's preferable to have them outside the function. While scoping is immediate with nested functions, it requires redefining the function on each call to sample, which can be an unnecessary expense.
Well, if I could work out how to remove the last commit from this pull request, it would change back to 2-space indent and move the functions back to class methods. It might be easier just to create another pull request. I'd be surprised if it ran significantly faster with the functions as methods rather than defined inline.
In your branch, try `git reset HEAD~1`, then you can force push to update your repository: `git push -f`
I also recommend using …
OK @dustinvtran, I've changed the indentation back and changed the functions to be methods again.
I approve of the major changes. Minor comments below.
edward/models/dirichlet_process.py
Outdated
@@ -37,7 +38,7 @@ def __init__(self, alpha, base_cls, validate_args=False, allow_nan_stats=True,
  >>>
  >>> # vector of concentration parameters, matrix of Exponentials
  >>> dp = DirichletProcess(tf.constant([0.1, 0.4]),
  ...                       Exponential, lam=tf.ones([5, 3]))
remove blank spaces
Will do
edward/models/dirichlet_process.py
Outdated
@@ -10,8 +10,9 @@
 class DirichletProcess(RandomVariable, Distribution):
-  def __init__(self, alpha, base_cls, validate_args=False, allow_nan_stats=True,
-               name="DirichletProcess", value=None, *args, **kwargs):
+  def __init__(
any reason you added a linebreak? i'd default to no break otherwise, to be consistent with other class definitions in edward
Flake8 told me the line was too long by 1 char
Ah, got it. It should pass pep8 according to the additional rules we set up (https://github.com/blei-lab/edward/blob/master/setup.cfg), so no need to bother. (The pep8 python package automatically uses additional rules from setup.cfg; not sure about flake8.)
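For reference, line-length relaxations of this kind typically live in a `[pep8]` section of `setup.cfg`. The snippet below is a hypothetical illustration of such a section, not the actual contents of Edward's file (which is linked above):

```ini
[pep8]
# Hypothetical example: raise the default 79-character limit by a little,
# which would let the one-character-over line discussed above pass.
max-line-length = 80
```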
edward/models/dirichlet_process.py
Outdated
    # Define atoms of Dirichlet process, storing only the first as default.
    self._theta = tf.expand_dims(
        self._base.sample(self.get_batch_shape()), 0)
    # Create empty tensor to store future atoms
use period when ending comments that are sentences, similar to the comment two lines above
will do
edward/models/dirichlet_process.py
Outdated
 super(DirichletProcess, self).__init__(
-    dtype=tf.int32,
+    dtype=tf.float32,
Could you comment on this change?
I could have the wrong end of the stick here, but isn't the process associated with the thetas? They don't have int32 type. Perhaps this should be `base_cls.dtype`.
The Dirichlet process always returns atoms, i.e., it is (almost surely) discrete even when the base distribution is continuous. This made me opt for `tf.int32`. That said, the jury's still out on what dtype should be used for discrete distributions.

The test for DP sampling is failing. It says that you're concatenating two tensors of differing dtype. I think this can be solved by passing in the dtype whenever you're first initializing the tensors. For example:

self._theta = tf.zeros(..., dtype=self._base.dtype)
...
self._beta = tf.zeros(..., dtype=self._betadist.dtype)
...
draws = tf.zeros([n] + batch_shape + event_shape, dtype=self._base.dtype)
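The dtype-matching fix suggested above can be illustrated outside TensorFlow. This NumPy sketch (variable names are hypothetical stand-ins for the attributes in the diff) shows the pattern: create the empty memo tensor with the base distribution's dtype up front, so later concatenations never mix dtypes:

```python
import numpy as np

# Hypothetical stand-in for self._base.dtype in the discussion above.
base_dtype = np.float32

# Empty memo of atoms, created with the base dtype from the start
# (analogous to tf.zeros(..., dtype=self._base.dtype)).
theta = np.zeros((0, 3), dtype=base_dtype)

# Atoms drawn later; cast to the same dtype before concatenating,
# so the concatenation never sees two differing dtypes.
new_atoms = np.random.default_rng(0).normal(size=(2, 3)).astype(base_dtype)
theta = np.concatenate([theta, new_atoms], axis=0)

assert theta.dtype == base_dtype and theta.shape == (2, 3)
```

The same idea applies to the `self._beta` and `draws` tensors: each is born with the dtype of the distribution it will later be extended from.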
Thanks for all the help. I think I managed to get it to pass the tests. I need to start using …
A `DirichletProcess` should retain its beta parameters across calls to sample. I've implemented this in the same way as @dustinvtran did for the theta parameters in issue #564. I also simplified the code somewhat. The body and condition functions for the `tf.while_loop` are now defined in `sample()` so that the scoping is easier. The theta and beta tensors are appended to as needed and don't need to be sampled in `__init__`; instead they are initialised as empty. I also ran the code through autopep8; I hope that is OK. I'm not sure what Edward's standards are re. indentation.
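The memoisation this PR describes can be sketched in plain NumPy (the class and attribute names below are hypothetical, not Edward's API): the `beta` stick-breaking proportions and `theta` atoms grow lazily and are retained across calls to `sample`, so repeated draws reuse earlier sticks instead of redrawing them:

```python
import numpy as np

class StickBreakingMemo:
    """Hypothetical sketch of memoised stick-breaking for a DP draw.

    Assumes a standard-normal base distribution for illustration.
    """

    def __init__(self, alpha, seed=0):
        self.alpha = alpha
        self.rng = np.random.default_rng(seed)
        self.beta = np.empty(0)   # memoised Beta(1, alpha) proportions
        self.theta = np.empty(0)  # memoised atoms from the base

    def sample(self):
        # Walk down the sticks; draw a new (beta, theta) pair only when we
        # run past the end of the memoised arrays, mirroring how the PR
        # appends to the tensors inside tf.while_loop as needed.
        k = 0
        while True:
            if k == len(self.beta):
                self.beta = np.append(self.beta, self.rng.beta(1.0, self.alpha))
                self.theta = np.append(self.theta, self.rng.normal())
            # Stop at stick k with probability beta[k]; otherwise continue.
            if self.rng.random() <= self.beta[k]:
                return self.theta[k]
            k += 1
```

Because `beta` and `theta` persist on the object, every call to `sample()` returns one of the same shared atoms, which is what makes repeated draws come from a single realisation of the Dirichlet process.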