Some questions about design choices #3

Hi authors, 

I was reading your paper and implementation, and I am confused about the following design choices. I would much appreciate it if you could clarify them. Thank you!

1. In the paper you said: "In the Curious Replay implementation for DreamerV3, the probability of training on a sequence is based on the priority calculated for the last step of the sequence." Why not use some other aggregation of the per-step scores along the sequence, such as `sum` or `mean`?

2. Why do we need `priority_scalar`? I don't understand the comment next to it in the code:

```
        self.priority_scalar = (
            10.0  # Used to scale all priorities. Avoids reverb precision issue.
        )
```

3. What is `self.flush` in `BasePrioritizedReverb` for? 
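
To make question 1 concrete, here is a minimal sketch of the alternatives I have in mind. The function `sequence_priority` and its `how` argument are hypothetical names of mine, not taken from your code base, and the per-step values are made up for illustration.

```python
from statistics import mean


def sequence_priority(step_priorities, how="last"):
    """Collapse the per-step priorities of one sequence into a single score.

    Hypothetical helper for illustration only; not part of the repository.
    """
    if not step_priorities:
        raise ValueError("sequence must contain at least one step")
    if how == "last":
        # What the paper describes: use only the final step's priority.
        return step_priorities[-1]
    if how == "sum":
        # Alternative: total priority over the sequence.
        return sum(step_priorities)
    if how == "mean":
        # Alternative: average priority, independent of sequence length.
        return mean(step_priorities)
    raise ValueError(f"unknown aggregation: {how}")


# Made-up per-step priorities for a single sequence.
steps = [0.5, 2.0, 1.5, 4.0]
for how in ("last", "sum", "mean"):
    print(how, sequence_priority(steps, how))
```

With these numbers the three choices give quite different scores for the same sequence, which is why I am curious about the reasoning behind using only the last step.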
