The `torch_ac` package contains the PyTorch implementation of two Actor-Critic deep reinforcement learning algorithms: A2C (synchronous Advantage Actor-Critic) and PPO (Proximal Policy Optimization).
Note: An example of how to use this package is given in the `rl-starter-files` repository. More details below.
The main features of the package are:

- Recurrent policies
- Reward shaping
- Handling of observation spaces that are tensors or dicts of tensors
- Handling of discrete action spaces
- Observation preprocessing
- Multiprocessing
- CUDA support
To install the package:

```
pip3 install torch-ac
```

Note: If you want to modify the `torch-ac` algorithms, you will instead need to install a cloned version, i.e.:

```
git clone https://github.com/lcswillems/torch-ac.git
cd torch-ac
pip3 install -e .
```
A brief overview of the components of the package:
- `torch_ac.A2CAlgo` and `torch_ac.PPOAlgo`: classes for the A2C and PPO algorithms
- `torch_ac.ACModel` and `torch_ac.RecurrentACModel`: abstract classes for non-recurrent and recurrent actor-critic models
- `torch_ac.DictList`: class for making dictionaries of lists list-indexable and hence batch-friendly
The most important components of the package are detailed below.
`torch_ac.A2CAlgo` and `torch_ac.PPOAlgo` have 2 methods:

- `__init__`, which may take, among other parameters:
    - an `acmodel` actor-critic model, i.e. an instance of a class inheriting from either `torch_ac.ACModel` or `torch_ac.RecurrentACModel`;
    - a `preprocess_obss` function that transforms a list of observations into a list-indexable object `X` (e.g. a PyTorch tensor). The default `preprocess_obss` function converts observations into a PyTorch tensor;
    - a `reshape_reward` function that takes as parameters an observation `obs`, the action `action` taken, the reward `reward` received and the terminal status `done`, and returns a new reward. By default, the reward is not reshaped (a sketch of a custom `reshape_reward` is given after this list);
    - a `recurrence` number that specifies over how many timesteps the gradient is backpropagated. This number is only taken into account if a recurrent model is used, and it must divide the `num_frames_per_agent` parameter and, for PPO, the `batch_size` parameter.
- `update_parameters`, which first collects experiences, then updates the parameters and finally returns logs.
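For instance, a custom `reshape_reward` could scale the raw reward and add a small per-step penalty. The sketch below is purely illustrative: the scaling factor, the penalty and the `reshape_reward` keyword used to pass it to the constructor are assumptions, not values prescribed by the package.

```python
# Illustrative reshape_reward function; the signature (obs, action, reward, done)
# follows the description above, the numbers are arbitrary.
def reshape_reward(obs, action, reward, done):
    shaped = 10 * reward      # scale the raw reward (arbitrary factor)
    if not done:
        shaped -= 0.01        # small per-step penalty (arbitrary value)
    return shaped

# Assumed way of passing it to an algorithm (keyword name is an assumption):
# algo = torch_ac.PPOAlgo(envs, acmodel, reshape_reward=reshape_reward, ...)
```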
`torch_ac.ACModel` has 2 abstract methods:

- `__init__`, which takes as parameters an `observation_space` and an `action_space`.
- `forward`, which takes as parameter N preprocessed observations `obs` and returns a PyTorch distribution `dist` and a tensor of values `value`. The tensor of values must be of size N, not N x 1 (a minimal sketch of such a model is given after this list).
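Here is a minimal sketch of a non-recurrent actor-critic model. It assumes the default observation preprocessing (a float tensor per observation), a Box observation space and a discrete action space; the class name and layer sizes are illustrative, not part of the package.

```python
import numpy as np
import torch.nn as nn
import torch_ac
from torch.distributions.categorical import Categorical

# Illustrative non-recurrent actor-critic model (not part of the package).
class SimpleACModel(nn.Module, torch_ac.ACModel):
    def __init__(self, observation_space, action_space):
        super().__init__()
        obs_size = int(np.prod(observation_space.shape))   # assumes a Box observation space
        self.actor = nn.Sequential(
            nn.Linear(obs_size, 64), nn.Tanh(),
            nn.Linear(64, action_space.n)                   # assumes a Discrete action space
        )
        self.critic = nn.Sequential(
            nn.Linear(obs_size, 64), nn.Tanh(),
            nn.Linear(64, 1)
        )

    def forward(self, obs):
        x = obs.reshape(obs.shape[0], -1)          # flatten the N preprocessed observations
        dist = Categorical(logits=self.actor(x))   # PyTorch distribution over actions
        value = self.critic(x).squeeze(1)          # value tensor of size N, not N x 1
        return dist, value
```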
`torch_ac.RecurrentACModel` has 3 abstract methods:

- `__init__`, which takes the same parameters as `torch_ac.ACModel`.
- `forward`, which takes the same parameters as `torch_ac.ACModel` along with a tensor of N memories `memory` of size N x M, where M is the size of a memory. It returns the same outputs as `torch_ac.ACModel`, plus a tensor of N memories `memory`.
- `memory_size`, which returns the size M of a memory.
Note: The `preprocess_obss` function must return a list-indexable object (e.g. a PyTorch tensor). If your observations are dictionaries, your `preprocess_obss` function may first convert a list of dictionaries into a dictionary of lists and then make it list-indexable using the `torch_ac.DictList` class as follows:

```python
>>> d = DictList({"a": [[1, 2], [3, 4]], "b": [[5], [6]]})
>>> d.a
[[1, 2], [3, 4]]
>>> d[0]
DictList({"a": [1, 2], "b": [5]})
```

Note: If you use an RNN, you will need to set `batch_first` to `True`.
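As an illustration of this note, a full RNN module inside the model would be constructed with batches on the first dimension; the sizes below are arbitrary example values.

```python
import torch.nn as nn

# batch_first=True makes the RNN expect inputs of shape (batch, sequence, features);
# input_size and hidden_size here are arbitrary example values.
rnn = nn.LSTM(input_size=64, hidden_size=64, batch_first=True)
```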
Examples of how to use the package components are given in the `rl-starter-files` repository.
Example of use of `torch_ac.A2CAlgo` and `torch_ac.PPOAlgo`:

```python
...

algo = torch_ac.PPOAlgo(envs, acmodel, args.frames_per_proc, args.discount, args.lr, args.gae_lambda,
                        args.entropy_coef, args.value_loss_coef, args.max_grad_norm, args.recurrence,
                        args.optim_eps, args.clip_eps, args.epochs, args.batch_size, preprocess_obss)

...

exps, logs1 = algo.collect_experiences()
logs2 = algo.update_parameters(exps)
```

More details here.
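In a training script, these two calls are typically repeated in a loop. A minimal sketch follows; the number of updates and the printing are arbitrary choices, not part of the package.

```python
# Illustrative training loop around the algo object created above.
num_updates = 1000                                # arbitrary training budget
for update in range(num_updates):
    exps, logs1 = algo.collect_experiences()      # roll out the current policy in all environments
    logs2 = algo.update_parameters(exps)          # one round of parameter updates on that batch
    if update % 10 == 0:
        print(f"update {update} done")
```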
Example of use of `torch_ac.DictList`:

```python
torch_ac.DictList({
    "image": preprocess_images([obs["image"] for obs in obss], device=device),
    "text": preprocess_texts([obs["mission"] for obs in obss], vocab, device=device)
})
```

More details here.
Example of implementation of `torch_ac.RecurrentACModel`:

```python
class ACModel(nn.Module, torch_ac.RecurrentACModel):
    ...

    def forward(self, obs, memory):
        ...

        return dist, value, memory
```

More details here.
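To make the skeleton above more concrete, here is an illustrative sketch built around an LSTM cell. The layer sizes, the flattening of observations and the way the memory tensor is split into hidden and cell states are assumptions for the example, not a design required by the package.

```python
import numpy as np
import torch
import torch.nn as nn
import torch_ac
from torch.distributions.categorical import Categorical

# Illustrative recurrent actor-critic model. The N x M memory tensor stores the
# LSTM hidden and cell states side by side, so M = 2 * hidden_size.
class SimpleRecurrentACModel(nn.Module, torch_ac.RecurrentACModel):
    def __init__(self, observation_space, action_space, hidden_size=64):
        super().__init__()
        obs_size = int(np.prod(observation_space.shape))       # assumes a Box observation space
        self.embedding = nn.Sequential(nn.Linear(obs_size, hidden_size), nn.Tanh())
        self.memory_rnn = nn.LSTMCell(hidden_size, hidden_size)
        self.actor = nn.Linear(hidden_size, action_space.n)    # assumes a Discrete action space
        self.critic = nn.Linear(hidden_size, 1)
        self.hidden_size = hidden_size

    @property
    def memory_size(self):
        return 2 * self.hidden_size                 # size M of a memory

    def forward(self, obs, memory):
        x = self.embedding(obs.reshape(obs.shape[0], -1))
        hidden = (memory[:, :self.hidden_size], memory[:, self.hidden_size:])
        h, c = self.memory_rnn(x, hidden)
        memory = torch.cat((h, c), dim=1)           # new N x M memory tensor
        dist = Categorical(logits=self.actor(h))
        value = self.critic(h).squeeze(1)           # size N, not N x 1
        return dist, value, memory
```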
Examples of `preprocess_obss` functions are also given in the `rl-starter-files` repository. More details here.
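As an illustration, here is a sketch of a `preprocess_obss` function for dictionary observations holding an `"image"` array, in the spirit of the `DictList` note above; the observation key and the `device` keyword argument are assumptions about the environment and about how the algorithms call the function.

```python
import numpy as np
import torch
import torch_ac

# Illustrative preprocess_obss function: it turns a list of dict observations
# into a list-indexable DictList of batched tensors.
def preprocess_obss(obss, device=None):
    images = np.array([obs["image"] for obs in obss])
    return torch_ac.DictList({
        "image": torch.tensor(images, device=device, dtype=torch.float)
    })
```

It can then be passed to `torch_ac.A2CAlgo` or `torch_ac.PPOAlgo` as the `preprocess_obss` argument, as in the PPO example above.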