This PR factors out the ILQL loss function and model additions so they can be reused with other accelerator libraries.
I also refactored the loss to be slightly clearer and fixed some type errors.
First part of #75
W&B run: https://wandb.ai/carperai/trlx/runs/3tam2www
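For reviewers unfamiliar with the `tau` trade-off described in the config docstring touched by this diff, here is a minimal sketch of an asymmetric (expectile-style) squared loss. The function name `expectile_value_loss` and the plain-list inputs are assumptions for illustration only; this is not the loss code from this PR.

```python
from typing import Sequence


def expectile_value_loss(
    target_q: Sequence[float], v: Sequence[float], tau: float
) -> float:
    """Mean asymmetric squared error between target Q-values and value estimates.

    Hypothetical helper for illustration; not taken from the PR.
    """
    total = 0.0
    for q, value in zip(target_q, v):
        diff = q - value
        # diff >= 0: the value estimate is below the target Q, weighted by tau.
        # diff < 0: the value estimate is above the target Q, weighted by (1 - tau).
        weight = tau if diff >= 0 else 1.0 - tau
        total += weight * diff ** 2
    return total / len(target_q)


# With a high tau, underestimating the target costs more than overestimating it.
under = expectile_value_loss([1.0], [0.0], tau=0.9)
over = expectile_value_loss([1.0], [2.0], tau=0.9)
print(under > over)
```

With `tau = 0.5` the two cases are weighted equally and the loss reduces to half the ordinary mean squared error.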
-    :param tau: Control tradeoff in value loss between punishing value network for underestimating the target Q (i.e. Q value corresponding to the action taken) (high tau) and overestimating the target Q (low tau)
-    :type tau: float
-
-    :param gamma: Discount factor for future rewards
-    :type gamma: float
-
-    :param cql_scale: Weight for CQL loss term
-    :type cql_scale: float
-
-    :param awac_scale: Weight for AWAC loss term
-    :type awac_scale: float
-
-    :param steps_for_target_q_sync: Number of steps to wait before syncing target Q network with Q network
-    :type steps_for_target_q_sync: int
-
-    :param two_qs: Use minimum of two Q-value estimates
-    :type two_qs: bool
+    Return constructor for specified method config
     """
-
-    tau: float
-    gamma: float
-    cql_scale: float
-    awac_scale: float
-    alpha: float
-    steps_for_target_q_sync: int
-    betas: List[float]
-    two_qs: bool
+    name = name.lower()
+    if name in _METHODS:
+        return _METHODS[name]
+    else:
+        raise Exception("Error: Trying to access a method that has not been registered")
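The added lines look up a method config constructor in a `_METHODS` registry by lowercased name. A minimal, self-contained sketch of how such a registry might be populated and queried follows. The diff shows only the lookup body, so the function name `get_method`, the `register_method` decorator, and `ExampleConfig` are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict

# Registry mapping lowercased names to config classes.
_METHODS: Dict[str, type] = {}


def register_method(cls: type) -> type:
    """Hypothetical decorator: store the class under its lowercased name."""
    _METHODS[cls.__name__.lower()] = cls
    return cls


def get_method(name: str) -> type:
    """Return constructor for specified method config (name is an assumption)."""
    name = name.lower()
    if name in _METHODS:
        return _METHODS[name]
    else:
        raise Exception("Error: Trying to access a method that has not been registered")


@register_method
@dataclass
class ExampleConfig:
    """Hypothetical config used only to exercise the registry."""
    gamma: float


# The lookup is case-insensitive because the name is lowercased first.
config_cls = get_method("EXAMPLECONFIG")
config = config_cls(gamma=0.99)
```

Looking up a name that was never registered raises the exception from the `else` branch, so a misspelled method name fails at lookup time.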