Awesome-Constraint-Inference-in-RL

A collection of research papers on constraint inference within the field of Reinforcement Learning (RL), with a primary focus on Inverse Constrained Reinforcement Learning (ICRL).

Our survey paper A Comprehensive Survey on Inverse Constrained Reinforcement Learning: Definitions, Progress and Challenges has been accepted to TMLR 2025.

This repository will be continuously updated. Feel free to follow and star it!

Table of Contents

  • Importance of Inferring Constraints
  • Procedure of ICRL
  • Papers

Importance of Inferring Constraints

To ensure the reliability of a Reinforcement Learning (RL) algorithm within safety-critical applications, it is crucial for the agent to have knowledge of the underlying constraints. However, in many real-world tasks, the constraints are often unknown and difficult to specify mathematically, particularly when these constraints are time-varying, context-dependent, and inherent to the expert’s own experience. For example, the following figure shows a contemporary example of a highway merging task, where the ideal constraints depend on the traffic or road conditions as well as the weather.

[Figure: highway merging example]

An example of the context-sensitive car distance constraint between vehicles during a merge on the highway. In favorable weather, when vehicle speed is relatively low and traffic congestion is high, the distance between cars can be reduced. In adverse weather, when vehicles are moving fast and traffic is sparse, the distance between cars must be increased to ensure safety.
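A toy encoding of such a context-dependent distance constraint is sketched below; every threshold and coefficient is hypothetical and chosen purely for illustration, not taken from the survey or from any real driving system.

```python
# Hypothetical context-dependent minimum-distance constraint for the merging
# example above. All numbers are illustrative placeholders.
def min_safe_distance(speed_kmh: float, traffic_density: float,
                      adverse_weather: bool) -> float:
    """Return a minimum car-following distance in meters."""
    gap = 5.0 + 0.3 * speed_kmh          # larger gaps at higher speeds
    if traffic_density > 0.7 and speed_kmh < 40.0:
        gap *= 0.8                       # dense, slow traffic: the gap can shrink
    if adverse_weather:
        gap *= 1.5                       # rain/snow/fog: require a larger gap
    return gap

print(min_safe_distance(speed_kmh=30.0, traffic_density=0.8, adverse_weather=False))
print(min_safe_distance(speed_kmh=100.0, traffic_density=0.2, adverse_weather=True))
```

Specifying such thresholds by hand is exactly what is difficult in practice, which motivates inferring the constraint from expert demonstrations instead.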

Procedure of ICRL

An effective approach to resolving the above challenges is Inverse Constrained Reinforcement Learning (ICRL), which infers the implicit constraints respected by expert agents from experience collected in the environment together with the observed demonstration dataset. These constraints, learned in a data-driven manner, can generalize across multiple environments, thereby providing a more comprehensive explanation of the expert agents’ behavior and facilitating safe control in downstream applications.

In the typical preference modeling approach, the agent must first recover the rewards optimized and the constraints respected by the expert agents, and then imitate the experts by optimizing the Constrained Reinforcement Learning (CRL) objective under these constraints. This is challenging because many equivalent combinations of reward distributions and constraints can explain the same expert demonstrations. Striving for identifiability, ICRL algorithms simplify the problem by assuming that rewards are observable, so the goal is to recover only the constraints that best explain the expert data. The inference process typically alternates between updating an imitating policy and updating a constraint function. The following figure summarizes the main procedure of ICRL.

[Figure: flowchart of the ICRL procedure]
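For reference, the policy-update step solves a standard Constrained RL problem, written here in generic discounted notation (reward r, cost c, budget ε; the notation is generic rather than copied from the survey):

$$\max_{\pi}\;\mathbb{E}_{\pi}\Big[\sum_{t=0}^{\infty}\gamma^{t}\,r(s_t,a_t)\Big]\quad\text{s.t.}\quad\mathbb{E}_{\pi}\Big[\sum_{t=0}^{\infty}\gamma^{t}\,c(s_t,a_t)\Big]\le\epsilon$$

A minimal sketch of the alternating loop is given below, assuming a linear cost model c(s, a) = θ · φ(s, a). Here solve_crl, rollout, and phi are hypothetical placeholders for a CRL solver, a trajectory sampler, and a feature map, and the θ update is a generic feature-matching heuristic rather than the exact rule of any specific ICRL paper.

```python
import numpy as np

def feature_expectation(trajectories, phi, gamma=0.99):
    """Average discounted feature counts over a set of (state, action) trajectories."""
    total = None
    for traj in trajectories:
        for t, (s, a) in enumerate(traj):
            f = (gamma ** t) * phi(s, a)
            total = f if total is None else total + f
    return total / len(trajectories)

def icrl(env, expert_demos, phi, phi_dim, solve_crl, rollout,
         n_iters=50, lr=0.1, n_rollouts=20):
    """Alternate between a policy-update (forward) step and a constraint-update (backward) step."""
    theta = np.zeros(phi_dim)                        # parameters of the learned cost
    mu_expert = feature_expectation(expert_demos, phi)
    for _ in range(n_iters):
        # Forward step: imitate the expert by solving CRL under the current cost estimate.
        policy = solve_crl(env, theta)
        # Backward step: raise the cost of features the imitating policy visits more
        # often than the expert, and lower it otherwise.
        mu_policy = feature_expectation(rollout(env, policy, n_rollouts), phi)
        theta += lr * (mu_policy - mu_expert)
        theta = np.maximum(theta, 0.0)               # keep per-feature costs non-negative
    return theta
```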

Papers

Constraint Inference in Inverse Optimal Control

Constraint Inference from Human Interventions

Inverse Constrained Reinforcement Learning

ICRL in Deterministic Environments

ICRL in Stochastic Environments

ICRL from Limited Demonstrations

ICRL for Both Rewards and Constraints

ICRL from Multiple Expert Agents

  • Learning shared safety constraints from multi-task demonstrations [NeurIPS 2023]

    • Konwoo Kim, Gokul Swamy, Zuxin Liu, Ding Zhao, Sanjiban Choudhury, Zhiwei Steven Wu
    • Additionally leverage side information in the form of a reasonable set of constraints, enabling policy performance guarantees.
    • Shared Constraint
  • Multi-modal inverse constrained reinforcement learning from a mixture of demonstrations [NeurIPS 2023]

    • Guanren Qiao, Guiliang Liu, Pascal Poupart, Zhiqiang Xu
    • Study expert data that record demonstrations from multiple experts who respect different kinds of constraints, and propose a Multi-Modal Inverse Constrained Reinforcement Learning (MM-ICRL) algorithm that performs unsupervised agent identification and multi-modal policy optimization to learn agent-specific constraints.
    • Multi-modal Constraint
  • Learning safety constraints from demonstrations with unknown rewards [AISTATS 2024]

    • David Lindner, Xin Chen, Sebastian Tschiatschek, Katja Hofmann, Andreas Krause
    • Study a setting where the expert agents optimize different rewards under a shared constraint, and define the safe set as the convex hull of the feature expectations of the expert demonstrations (a toy membership check in this spirit is sketched after this list).
    • Shared Constraint
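Below is a toy illustration of the convex-hull construction mentioned in the last entry: checking whether a candidate policy's feature expectation lies in the convex hull of the expert feature expectations reduces to a small feasibility LP. This is only a membership check in the spirit of that idea, not the paper's algorithm, and the example vectors are made up.

```python
import numpy as np
from scipy.optimize import linprog

def in_convex_hull(point, vertices):
    """True if `point` is a convex combination of the rows of `vertices`."""
    k = vertices.shape[0]
    # Feasibility LP: find lambda >= 0 with lambda @ vertices = point and sum(lambda) = 1.
    A_eq = np.vstack([vertices.T, np.ones((1, k))])
    b_eq = np.concatenate([point, [1.0]])
    res = linprog(c=np.zeros(k), A_eq=A_eq, b_eq=b_eq,
                  bounds=[(0, None)] * k, method="highs")
    return res.success

# Made-up expert feature expectations (one row per expert) and a candidate policy.
expert_mu = np.array([[0.2, 0.8], [0.6, 0.4], [0.5, 0.9]])
candidate_mu = np.array([0.4, 0.7])
print(in_convex_hull(candidate_mu, expert_mu))   # True: inside the expert hull
```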
