Reference
S. Mallick, F. Airaldi, A. Dabiri, and B. De Schutter, "Multi-agent
reinforcement learning via distributed MPC as a function approximator,"
Automatica, vol. 167, p. 111803, Sept. 2024.
Abstract
This paper presents a novel approach to multi-agent reinforcement learning (RL)
for linear systems with convex polytopic constraints. Existing work on RL has
demonstrated the use of model predictive control (MPC) as a function
approximator for the policy and value functions. The current paper is the first
work to extend this idea to the multi-agent setting. We propose the use of a
distributed MPC scheme as a function approximator, with a structure allowing
for distributed learning and deployment. We then show that Q-learning updates
can be performed distributively without introducing nonstationarity, by
reconstructing a centralized learning update. The effectiveness of the approach
is demonstrated on a numerical example.
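The claim that a centralized Q-learning update can be reconstructed from local computations can be illustrated with a toy sketch. It is not the paper's method. It assumes a Q-function that is a sum of per-agent linear terms in place of the distributed MPC approximator, and it uses a plain average as a stand-in for a consensus step among agents. All names (`local_q`, `centralized_update`, `distributed_update`, `phis_next`) are hypothetical, and `phis_next` is assumed to hold each agent's features at the greedy next action.

```python
# Toy illustration only: separable linear Q-function, NOT the MPC-based
# approximator from the paper. Every name here is hypothetical.

def local_q(theta_i, phi_i):
    """Local Q-term of one agent: inner product of its parameters and features."""
    return sum(t * p for t, p in zip(theta_i, phi_i))

def centralized_update(thetas, phis, phis_next, rewards, alpha, gamma):
    """Standard semi-gradient Q-learning step computed with global information."""
    q = sum(local_q(t, p) for t, p in zip(thetas, phis))
    q_next = sum(local_q(t, p) for t, p in zip(thetas, phis_next))
    delta = sum(rewards) + gamma * q_next - q  # global TD error
    # The gradient of the summed Q w.r.t. agent i's parameters is phi_i.
    return [[t + alpha * delta * p for t, p in zip(th, ph)]
            for th, ph in zip(thetas, phis)]

def distributed_update(thetas, phis, phis_next, rewards, alpha, gamma):
    """Same step, but each agent only evaluates its own local quantities."""
    n = len(thetas)
    # Each agent computes its local share of the TD error.
    local_terms = [r + gamma * local_q(th, pn) - local_q(th, p)
                   for th, p, pn, r in zip(thetas, phis, phis_next, rewards)]
    # Assumption: agents agree on the network-wide average (e.g. by consensus).
    average = sum(local_terms) / n
    delta = n * average  # every agent recovers the global TD error
    # Each agent then updates only its own parameters.
    return [[t + alpha * delta * p for t, p in zip(th, ph)]
            for th, ph in zip(thetas, phis)]
```

Under these assumptions both functions return the same parameters up to floating-point rounding, because every agent applies the same global TD error to its own parameters. In this toy setting, that is why the distributed update does not behave like independent learners reacting to one another.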
BibTeX
@article{MalAir:24-012,
author = {Mallick, Samuel and Airaldi, Filippo and Dabiri, Azita and De Schutter, Bart},
title = {Multi-Agent Reinforcement Learning via Distributed {MPC} as a Function Approximator},
journal = {Automatica},
volume = {167},
pages = {111803},
month = sep,
year = {2024}
}