Hello, in game_ac_network.py, inside def prepare_loss(self, entropy_beta), you have:
# temporary difference (R-V) (input for policy)
self.td = tf.placeholder("float", [None])
value_loss = 0.5 * tf.nn.l2_loss(self.r - self.v)
But td == self.r - self.v, right?
So why not use self.td directly instead of recalculating it from self.v? Also, for pi, why not pass it in as a placeholder too?
Hoping for a reply, thanks.
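To make the question concrete, here is a minimal pure-Python sketch (no TensorFlow) of one difference that might matter. It assumes td is filled with numbers computed outside the graph, so a loss built from it would not depend on the network weights, while a loss built from self.r - self.v would. The scalar "network" v = w * s and all names below (value, loss_from_graph, loss_from_placeholder, numeric_grad) are hypothetical and not taken from the repository. Whether this is the author's actual reason is not confirmed here.

```python
# Hypothetical stand-in for the value head: a single weight w, v = w * s.
def value(w, s):
    return w * s

def l2_loss(x):
    # Same convention as tf.nn.l2_loss for a scalar: x**2 / 2.
    return x * x / 2.0

def loss_from_graph(w, s, r):
    # r - v is built from the network output, so the loss depends on w.
    return 0.5 * l2_loss(r - value(w, s))

def loss_from_placeholder(w, td):
    # td arrives as a plain number (like a fed placeholder); w is unused.
    return 0.5 * l2_loss(td)

def numeric_grad(f, w, eps=1e-6):
    # Central finite difference of f with respect to w.
    return (f(w + eps) - f(w - eps)) / (2 * eps)

w, s, r = 0.5, 2.0, 3.0
td = r - value(w, s)  # computed outside the "graph", as when feeding td

grad_graph = numeric_grad(lambda w_: loss_from_graph(w_, s, r), w)
grad_placeholder = numeric_grad(lambda w_: loss_from_placeholder(w_, td), w)

print("loss values:", loss_from_graph(w, s, r), loss_from_placeholder(w, td))
print("gradient w.r.t. w (graph):", grad_graph)
print("gradient w.r.t. w (placeholder):", grad_placeholder)
```

Under these assumptions both losses have the same value, but only the version built from the network output has a nonzero gradient with respect to w, so only that version could train the value head.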