π-PACT: Behavior Characterization via π-vectors in Multi-Task Reinforcement Learning

This repository collects materials related to my Master's thesis for the MSc in Computer Science (Artificial Intelligence Curriculum) at the University of Pisa.

🚀 Final Grade: 110/110 cum laude


📜 Thesis Abstract

This thesis explores behavior-aware transfer learning in reinforcement learning, focusing on a novel framework called π-PACT (π-vectors-based Policy Adaptation by Characterization and Transfer). π-PACT leverages policy supervectors (π-vectors) as compact representations of policy behavior to characterize and compare skills across tasks.

Key components:

  • Adaptation of a Universal Background Model (UBM) to the state features observed under a policy (a minimal sketch of this step follows the list).
  • Dynamic monitoring of learning progress via π-vector distances.
  • Selective transfer of knowledge from source policies to a target agent when relevant.
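
To make the first component concrete, the sketch below trains a diagonal-covariance GMM as the UBM on pooled state features and then applies mean-only relevance-MAP adaptation (Reynolds et al., 2000) to the states visited by a single policy, stacking the adapted means into a π-vector. This is a minimal sketch, not the repository's implementation: the function names, the component count, and the relevance factor are illustrative.

```python
# Minimal sketch: UBM training and mean-only MAP adaptation (Reynolds et al., 2000).
# Function names, component count, and relevance factor are illustrative.
import numpy as np
from sklearn.mixture import GaussianMixture

def train_ubm(pooled_features: np.ndarray, n_components: int = 64) -> GaussianMixture:
    """Fit a diagonal-covariance GMM on state features pooled across policies/tasks."""
    ubm = GaussianMixture(n_components=n_components, covariance_type="diag", max_iter=200)
    ubm.fit(pooled_features)
    return ubm

def pi_vector(ubm: GaussianMixture, policy_features: np.ndarray, relevance: float = 16.0) -> np.ndarray:
    """MAP-adapt the UBM means to one policy's visited states and stack them into a pi-vector."""
    resp = ubm.predict_proba(policy_features)        # (T, K) posterior responsibilities
    n_k = resp.sum(axis=0)                           # soft count of frames per component
    f_k = resp.T @ policy_features                   # (K, D) first-order statistics
    e_k = f_k / np.maximum(n_k[:, None], 1e-8)       # posterior mean per component
    alpha = (n_k / (n_k + relevance))[:, None]       # data-dependent adaptation coefficient
    adapted_means = alpha * e_k + (1.0 - alpha) * ubm.means_
    return adapted_means.ravel()                     # pi-vector of length K * D
```

Because every π-vector is adapted from the same UBM, vectors computed for different policies live in a common space and can be compared directly.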

The approach is evaluated on the highway-env RL suite, with experiments analyzing:

  • Representation quality of π-vectors,
  • Transfer learning performance,
  • Challenges such as negative transfer and the choice of matching thresholds.
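
For a sense of how visited-state features could be gathered from this suite, the snippet below rolls a policy out in a highway-env task and collects flattened kinematics observations, ready to be fed to the UBM and π-vector utilities sketched above. The random-action policy and the plain list buffer are placeholders for the thesis's actual agent and feature-extraction tools.

```python
# Illustrative rollout in highway-env, collecting visited-state features for the
# UBM / pi-vector utilities sketched earlier. The random policy and the plain list
# buffer stand in for the thesis's modified SAC agent and feature-extraction tools.
import gymnasium as gym
import highway_env  # noqa: F401  (importing highway_env makes the highway-v0 tasks available)
import numpy as np

def collect_state_features(env_id: str = "highway-v0", episodes: int = 5) -> np.ndarray:
    env = gym.make(env_id)
    features = []
    for _ in range(episodes):
        obs, _ = env.reset()
        terminated = truncated = False
        while not (terminated or truncated):
            features.append(np.asarray(obs, dtype=np.float32).ravel())  # flatten the kinematics grid
            action = env.action_space.sample()  # placeholder for the learned policy
            obs, _, terminated, truncated, _ = env.step(action)
    env.close()
    return np.stack(features)  # (T, D) matrix of visited-state features
```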

📊 System Flowchart

Below is a high-level overview of the π-PACT framework:

[Figure: π-PACT system flowchart]

Highlights:

  • Policy behavior is periodically summarized as π-vectors.
  • KL-based distance matching identifies similar source policies (sketched below, together with the gatekeeper check).
  • A gatekeeper threshold controls when and how to transfer useful knowledge.
  • The Universal Background Model (UBM) enables efficient feature adaptation.
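
To make the matching and gatekeeping steps concrete: when two GMMs are mean-adapted from the same UBM (and therefore share its weights and covariances), a standard approximation from the speaker-verification literature bounds their symmetric KL divergence by a weighted Mahalanobis-style distance between the adapted means. The sketch below assumes that approximation and an arbitrary illustrative threshold; it is not the repository's exact matching rule.

```python
# Distance between two pi-vectors whose GMMs share the UBM's weights and diagonal
# covariances: 0.5 * sum_k w_k * (m_a_k - m_b_k)^T Sigma_k^-1 (m_a_k - m_b_k),
# a standard upper-bound approximation of the symmetric KL divergence.
# The gatekeeper threshold value below is purely illustrative.
import numpy as np

def kl_distance(pi_a: np.ndarray, pi_b: np.ndarray,
                weights: np.ndarray, diag_covs: np.ndarray) -> float:
    K, D = diag_covs.shape                                   # UBM components x feature dimension
    diff = (pi_a - pi_b).reshape(K, D)
    per_component = np.sum(diff * diff / diag_covs, axis=1)  # Mahalanobis term per component
    return 0.5 * float(np.dot(weights, per_component))

def should_transfer(target_pi, source_pis, weights, diag_covs, threshold: float = 5.0):
    """Return the index of the closest source policy if it passes the gatekeeper, else None."""
    distances = [kl_distance(target_pi, s, weights, diag_covs) for s in source_pis]
    best = int(np.argmin(distances))
    return best if distances[best] < threshold else None
```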

⚙️ About the Code

⚠️ Note: The code in this repository represents an earlier prototype version developed during the research phase. The final experimental version used for thesis evaluation may include refinements not reflected here.

Main components include:

  • Feature extraction tools (lane, road, temporal buffers).
  • Modified SAC agent with π-vector integration.
  • Utilities for GMM training, MAP adaptation, and π-vector computation (see the combined training-loop sketch below).
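
The following loop is a hypothetical outline of how these pieces could interact during training, reusing the `pi_vector` and `should_transfer` sketches above. The agent interface (`agent.act`, `agent.update`, `agent.absorb_knowledge_from`) and the summary interval are invented for illustration and do not mirror the repository's modified SAC agent.

```python
# Hypothetical training loop showing where pi-vector monitoring and selective transfer
# could plug in. The agent interface (act / update / absorb_knowledge_from) and the
# summary interval are invented for illustration; they are not the repository's API.
import numpy as np

PI_VECTOR_INTERVAL = 10_000  # environment steps between behaviour summaries (illustrative)

def train_with_pi_pact(env, agent, ubm, source_pis, source_policies,
                       weights, diag_covs, total_steps=200_000):
    feature_buffer = []
    obs, _ = env.reset()
    for step in range(1, total_steps + 1):
        action = agent.act(obs)                               # hypothetical agent API
        obs, reward, terminated, truncated, _ = env.step(action)
        feature_buffer.append(np.asarray(obs, dtype=np.float32).ravel())
        agent.update(obs, reward, terminated or truncated)    # hypothetical SAC update step
        if terminated or truncated:
            obs, _ = env.reset()
        if step % PI_VECTOR_INTERVAL == 0:
            target_pi = pi_vector(ubm, np.stack(feature_buffer))          # earlier sketch
            match = should_transfer(target_pi, source_pis, weights, diag_covs)
            if match is not None:                                         # gatekeeper passed
                agent.absorb_knowledge_from(source_policies[match])       # hypothetical transfer hook
            feature_buffer.clear()
```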

📖 Resources


⚡ Potential Future Work

  • Update codebase to align with final thesis version.
  • Package π-PACT as a modular framework.
  • Provide reproducible experiments and pretrained agents.
  • Explore broader benchmarks beyond highway-env.

🧠 References

  • Kanervisto, A., Kinnunen, T., & Hautamäki, V. (2020). General Characterization of Agents by the States They Visit.
  • Reynolds, D. A., Quatieri, T. F., & Dunn, R. B. (2000). Speaker Verification Using Adapted Gaussian Mixture Models.
