Hi authors,
I was reading your paper and implementation, and I am confused about the following design choices. I would much appreciate it if you could clarify them. Thank you!
- In the paper you said "In the Curious Replay implementation for DreamerV3, the probability of training on a sequence is based on the priority calculated for the last step of the sequence." Why not use some other aggregation of the scores along the sequence, such as `sum` or `mean`?
- Why do we need `priority_scalar`? I don't understand the comment you put in the code:

      self.priority_scalar = (
          10.0  # Used to scale all priorities. Avoids reverb precision issue.
      )

- What is `self.flush` in `BasePrioritizedReverb` for?
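
To make the first question concrete, here is a small sketch of the alternatives I have in mind. It is not taken from your code base: the per-step priorities are made-up numbers, the helper names (`last_step`, `mean`, `sampling_probs`) are hypothetical, and I am assuming a sequence is sampled with probability proportional to its aggregated priority, ignoring any priority exponent.

```python
from typing import Callable, Dict, List

# Hypothetical per-step priorities for three stored sequences (made-up numbers).
step_priorities: Dict[str, List[float]] = {
    "seq_a": [0.1, 0.1, 4.0],  # only the last step is "interesting"
    "seq_b": [2.0, 2.0, 0.5],  # interesting early, dull at the end
    "seq_c": [1.0, 1.0, 1.0],  # uniformly average
}


def last_step(p: List[float]) -> float:
    """Aggregation described in the paper: use the last step's priority."""
    return p[-1]


def mean(p: List[float]) -> float:
    """Alternative aggregation: average priority over the sequence."""
    return sum(p) / len(p)


def sampling_probs(
    priorities: Dict[str, List[float]],
    aggregate: Callable[[List[float]], float],
) -> Dict[str, float]:
    """Sample each sequence proportionally to its aggregated priority (assumed)."""
    scores = {key: aggregate(steps) for key, steps in priorities.items()}
    total = sum(scores.values())
    return {key: score / total for key, score in scores.items()}


probs_last = sampling_probs(step_priorities, last_step)
probs_sum = sampling_probs(step_priorities, sum)
probs_mean = sampling_probs(step_priorities, mean)

for name, probs in [("last", probs_last), ("sum", probs_sum), ("mean", probs_mean)]:
    print(name, {key: round(value, 3) for key, value in probs.items()})
```

With these toy numbers, last-step aggregation favors `seq_a`, while `sum` and `mean` favor `seq_b` (the two coincide here because all sequences have the same length). I am wondering whether this difference matters in practice, or whether last-step was chosen mainly for simplicity.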