Reference
L. Buşoniu, A. Lazaric, M. Ghavamzadeh, R. Munos, R. Babuška, and
B. De Schutter, "Least-squares methods for policy iteration," in
Reinforcement Learning: State-Of-The-Art (M. Wiering and
M. van Otterlo, eds.), vol. 12 of
Adaptation, Learning, and Optimization, Heidelberg,
Germany: Springer, ISBN 978-3-642-27644-6, pp. 75-109, 2012.
Abstract
Approximate reinforcement learning deals with the essential problem of applying
reinforcement learning in large and continuous state-action spaces, by using
function approximators to represent the solution. This chapter reviews
least-squares methods for policy iteration, an important class of algorithms
for approximate reinforcement learning. We discuss three techniques for solving
the core policy evaluation component of policy iteration: least-squares
temporal difference, least-squares policy evaluation, and Bellman residual
minimization. We introduce these techniques starting from their
general mathematical principles and detailing them down to fully specified
algorithms. We pay attention to online variants of policy iteration, and
provide a numerical example highlighting the behavior of representative offline
and online methods. For the policy evaluation component as well as for the
overall resulting approximate policy iteration, we provide guarantees on the
performance obtained asymptotically, as the number of samples processed and
iterations executed grows to infinity. We also provide finite-sample results,
which apply when a finite number of samples and iterations are considered.
Finally, we outline several extensions and improvements to the techniques and
methods reviewed.
BibTeX
@incollection{BusLaz:12-009,
author = {Bu{\c{s}}oniu, Lucian and Lazaric, Alessandro and Ghavamzadeh,
Mohammad and Munos, R{\'{e}}mi and Babu{\v{s}}ka, Robert and De
Schutter, Bart},
title = {Least-Squares Methods for Policy Iteration},
booktitle = {Reinforcement Learning: State-Of-The-Art},
series = {Adaptation, Learning, and Optimization},
volume = {12},
editor = {Wiering, Marco and van Otterlo, Martijn},
publisher = {Springer},
address = {Heidelberg, Germany},
pages = {75--109},
year = {2012}
}