The problem of different action time scales #864

@battery7

Description

I want to use HPPO to handle a mixed (hybrid) action space, but I have run into a problem: the discrete and continuous actions operate on different time scales. For example, while one continuous action is executed once, the discrete action needs to be executed many times. How should I handle this? I think it is difficult to get convergence by tuning the reward function alone. Is there another algorithm that supports this, or should I discretize the continuous actions and then apply a discrete action mask?
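Not part of the original question, but to make the time-scale mismatch concrete: below is a minimal sketch (with hypothetical class and parameter names, not the HPPO implementation) of one common way to align the two scales — re-sample the slow continuous action only every `hold_steps` environment steps and hold it fixed in between, while sampling the fast discrete action at every step.

```python
import numpy as np


class HybridTimescalePolicy:
    """Toy hybrid policy for illustration: the continuous action is
    re-sampled only every `hold_steps` environment steps and held fixed
    in between, while the discrete action is sampled at every step."""

    def __init__(self, n_discrete, cont_dim, hold_steps, seed=0):
        self.n_discrete = n_discrete
        self.cont_dim = cont_dim
        self.hold_steps = hold_steps
        self.rng = np.random.default_rng(seed)
        self._held_cont = None  # continuous action currently being held
        self._t = 0             # environment step counter

    def act(self, obs):
        # Re-sample the slow (continuous) action only at the coarse timescale.
        if self._t % self.hold_steps == 0:
            self._held_cont = self.rng.uniform(-1.0, 1.0, size=self.cont_dim)
        # Sample the fast (discrete) action at every step (random here,
        # where a real policy would condition on obs).
        disc = int(self.rng.integers(self.n_discrete))
        self._t += 1
        return disc, self._held_cont.copy()


policy = HybridTimescalePolicy(n_discrete=4, cont_dim=2, hold_steps=5)
conts = [policy.act(obs=None)[1] for _ in range(10)]
```

With `hold_steps=5`, steps 0-4 share one continuous action and step 5 draws a new one, so the credit-assignment question becomes how to attribute reward collected over those five discrete steps to the single held continuous decision.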

Labels: algo (Add new algorithm or improve old one)
