
Some questions about design choices  #3

@realjoenguyen


Hi authors,

I have been reading your paper and implementation, and I am confused about the following design choices. I would much appreciate it if you could clarify them. Thank you!

  1. In the paper you state: "In the Curious Replay implementation for DreamerV3, the probability of training on a sequence is based on the priority calculated for the last step of the sequence." Why not use another aggregation of the scores along the sequence, such as the sum or the mean?

  2. Why is priority_scalar needed? I don't understand the comment in the code:

        self.priority_scalar = (
            10.0  # Used to scale all priorities. Avoids reverb precision issue.
        )
  3. What is self.flush in BasePrioritizedReverb used for?
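To make questions 1 and 2 concrete, here is a minimal sketch of proportional prioritized sampling over sequences. It is not the Curious Replay or Reverb code. It assumes that one sampleable sequence ends at every step (overlapping windows of length `seq_len`) and that sampling probability is each sequence's priority divided by the sum of all priorities. The helper names `sequence_priorities` and `sampling_probs` and the `scale` parameter are hypothetical. The sketch compares "last step", "mean", and "sum" aggregation and checks that multiplying every priority by the same constant, as a scalar like `priority_scalar` would, leaves the probabilities unchanged in exact arithmetic.

```python
import numpy as np


def sequence_priorities(step_priorities, seq_len, how="last"):
    """Aggregate per-step priorities into one priority per sequence.

    Hypothetical helper: assumes one sequence ends at every step, i.e.
    sliding, overlapping windows of length seq_len.
    """
    p = np.asarray(step_priorities, dtype=np.float64)
    # windows[i] holds the priorities of steps i .. i + seq_len - 1
    windows = np.lib.stride_tricks.sliding_window_view(p, seq_len)
    if how == "last":
        return windows[:, -1]  # priority of the final step only
    if how == "mean":
        return windows.mean(axis=1)
    if how == "sum":
        return windows.sum(axis=1)
    raise ValueError(f"unknown aggregation: {how!r}")


def sampling_probs(priorities, scale=1.0):
    """Proportional sampling: P(i) = scale*p_i / sum_j scale*p_j."""
    scaled = scale * np.asarray(priorities, dtype=np.float64)
    return scaled / scaled.sum()


if __name__ == "__main__":
    # One "surprising" step (priority 8) among otherwise ordinary steps.
    steps = [1, 1, 1, 8, 1, 1]
    for how in ("last", "mean", "sum"):
        probs = sampling_probs(sequence_priorities(steps, seq_len=3, how=how))
        print(how, np.round(probs, 3))
```

In this toy setup, "last" puts most of the probability mass on the single window that ends at the surprising step. "mean" and "sum" spread it across every window that contains that step, and they give identical probabilities because all windows have the same length. Any effect of a uniform scale factor would therefore be numerical rather than a change to the intended sampling distribution.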
