The problem of different action time scales #864

@battery7

Description

I want to use HPPO to handle a mixed (hybrid) action space, but I have run into a problem: the discrete and continuous actions operate on different time scales. For example, while one continuous action is executed once, the discrete action needs to be executed many times. How should I handle this? I think it is difficult to get convergence by tuning the reward function alone. Is there another algorithm that supports this, or should I discretize the continuous actions and then apply a discrete action mask?
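Not part of the original question, but to make the time-scale mismatch concrete: below is a minimal sketch (with hypothetical class and parameter names, not the HPPO implementation) of one common way to align the two scales — re-sample the slow continuous action only every `hold_steps` environment steps and hold it fixed in between, while sampling the fast discrete action at every step.

```python
import numpy as np


class HybridTimescalePolicy:
    """Toy hybrid policy for illustration: the continuous action is
    re-sampled only every `hold_steps` environment steps and held fixed
    in between, while the discrete action is sampled at every step."""

    def __init__(self, n_discrete, cont_dim, hold_steps, seed=0):
        self.n_discrete = n_discrete
        self.cont_dim = cont_dim
        self.hold_steps = hold_steps
        self.rng = np.random.default_rng(seed)
        self._held_cont = None  # continuous action currently being held
        self._t = 0             # environment step counter

    def act(self, obs):
        # Re-sample the slow (continuous) action only at the coarse timescale.
        if self._t % self.hold_steps == 0:
            self._held_cont = self.rng.uniform(-1.0, 1.0, size=self.cont_dim)
        # Sample the fast (discrete) action at every step (random here,
        # where a real policy would condition on obs).
        disc = int(self.rng.integers(self.n_discrete))
        self._t += 1
        return disc, self._held_cont.copy()


policy = HybridTimescalePolicy(n_discrete=4, cont_dim=2, hold_steps=5)
conts = [policy.act(obs=None)[1] for _ in range(10)]
```

With `hold_steps=5`, steps 0-4 share one continuous action and step 5 draws a new one, so the credit-assignment question becomes how to attribute reward collected over those five discrete steps to the single held continuous decision.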

Labels: algo (Add new algorithm or improve old one)
