Reference
L. Buşoniu, B. De Schutter, R. Babuška, and D. Ernst, "Exploiting
policy knowledge in online least-squares policy iteration: An empirical study,"
Automation, Computers, Applied Mathematics, vol. 19,
no. 4, pp. 521–529, 2010.
Abstract
Reinforcement learning (RL) is a promising paradigm for learning optimal
control. Traditional RL algorithms work only with discrete state and action
variables, so approximate representations of the solution are needed to handle
the continuous variables that arise in control problems. The field of
approximate RL has expanded tremendously over the last decade, and a wide
array of effective algorithms is now
available. However, RL is generally envisioned as working without any prior
knowledge about the system or the solution, whereas such knowledge is often
available and can be exploited to great advantage. Therefore, in this paper we
describe a method that exploits prior knowledge to accelerate online
least-squares policy iteration (LSPI), a state-of-the-art algorithm for
approximate RL. We focus on prior knowledge about the monotonicity of the
control policy with respect to the system states. Such monotonic policies are
appropriate for important classes of systems appearing in control applications,
including, for instance, nearly linear systems and linear systems with monotonic
input nonlinearities. In an empirical evaluation, online LSPI with prior
knowledge is shown to learn much faster and more reliably than the original
online LSPI.
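The abstract does not spell out how the monotonicity prior enters online LSPI, so the sketch below is only one plausible illustration, not the authors' method: after a greedy policy-improvement step over a sorted one-dimensional state grid, the candidate actions are projected onto the nearest nondecreasing sequence with the pool-adjacent-violators algorithm. The state grid, action set, feature vector, and all function names here are hypothetical.

import numpy as np

def isotonic_projection(actions):
    """Least-squares projection of a 1-D action sequence onto the set of
    nondecreasing sequences (pool-adjacent-violators algorithm)."""
    vals, wts = [], []
    for x in np.asarray(actions, dtype=float):
        vals.append(x)
        wts.append(1.0)
        # Merge adjacent blocks while the monotonicity constraint is violated.
        while len(vals) > 1 and vals[-2] > vals[-1]:
            w = wts[-2] + wts[-1]
            v = (wts[-2] * vals[-2] + wts[-1] * vals[-1]) / w
            vals[-2:] = [v]
            wts[-2:] = [w]
    return np.concatenate([np.full(int(w), v) for v, w in zip(vals, wts)])

# --- illustrative greedy-and-project improvement step (toy problem) ---
states = np.linspace(-1.0, 1.0, 11)      # sorted 1-D state grid
action_set = np.array([-1.0, 0.0, 1.0])  # small discrete action set

def q_value(s, a, theta):
    # Toy linear Q-function Q(s, a) = phi(s, a)^T theta, as in LSPI;
    # the feature vector is purely illustrative.
    return np.array([1.0, s, a, s * a]) @ theta

def improved_monotone_policy(theta):
    # Plain greedy policy improvement over the grid ...
    greedy = np.array([action_set[np.argmax([q_value(s, a, theta)
                                             for a in action_set])]
                       for s in states])
    # ... followed by projection onto nondecreasing policies (the prior).
    return isotonic_projection(greedy)

print(improved_monotone_policy(np.array([0.0, 0.5, 0.2, 1.0])))

The projection step is what restricts the policy class: any greedy action sequence that violates monotonicity is replaced by its closest monotone counterpart, which is the kind of restriction the abstract motivates for nearly linear systems and linear systems with monotonic input nonlinearities.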
BibTeX
@article{BusDeS:10-064,
  author  = {Bu{\c{s}}oniu, Lucian and De Schutter, Bart and Babu{\v{s}}ka, Robert and Ernst, Damien},
  title   = {Exploiting Policy Knowledge in Online Least-Squares Policy Iteration: {An} Empirical Study},
  journal = {Automation, Computers, Applied Mathematics},
  volume  = {19},
  number  = {4},
  pages   = {521--529},
  year    = {2010}
}