Reference
L. Buşoniu, B. De Schutter, R. Babuška, and D. Ernst, "Using
prior knowledge to accelerate online least-squares policy iteration,"
Proceedings of the 2010 IEEE International Conference on Automation,
Quality and Testing, Robotics (AQTR 2010), Cluj-Napoca, Romania,
May 2010, 6 pp. Paper A-S2-1/3005.
Abstract
Reinforcement learning (RL) is a promising paradigm for learning optimal
control. Although RL is generally envisioned as working without any prior
knowledge about the system, such knowledge is often available and can be
exploited to great advantage. In this paper, we consider prior knowledge about
the monotonicity of the control policy with respect to the system states, and
we introduce an approach that exploits this type of prior knowledge to
accelerate a state-of-the-art RL algorithm called online least-squares policy
iteration (LSPI). Monotonic policies are appropriate for important classes of
systems appearing in control applications. LSPI is a data-efficient RL
algorithm that we previously extended to online learning, but which until
now offered no way to exploit prior knowledge about the policy. In an
empirical evaluation, online LSPI with prior knowledge learns much faster and
more reliably than the original online LSPI.
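The paper itself details how the monotonicity prior is integrated into online LSPI. Purely as an illustration of the general idea, the Python sketch below (all names are hypothetical and not taken from the paper) projects a greedily improved policy on a one-dimensional state grid onto the set of nondecreasing policies using isotonic regression (the pool-adjacent-violators algorithm), which is one generic way to enforce a monotonicity constraint; it is not necessarily the paper's mechanism.

import numpy as np

def monotone_projection(u, lo=-1.0, hi=1.0):
    # Least-squares projection of the action sequence u onto the set of
    # nondecreasing sequences (pool-adjacent-violators algorithm),
    # followed by clipping to the actuator limits [lo, hi].
    vals, lens = [], []  # block means and block lengths
    for x in u:
        vals.append(float(x))
        lens.append(1)
        # Merge adjacent blocks while they violate monotonicity.
        while len(vals) > 1 and vals[-2] > vals[-1]:
            v2, l2 = vals.pop(), lens.pop()
            v1, l1 = vals.pop(), lens.pop()
            vals.append((v1 * l1 + v2 * l2) / (l1 + l2))
            lens.append(l1 + l2)
    return np.clip(np.repeat(vals, lens), lo, hi)

# Hypothetical usage: actions produced by a policy-improvement step on a
# state grid, with one monotonicity violation that the projection removes.
greedy_actions = np.array([-0.8, -0.2, 0.1, -0.1, 0.9])
print(monotone_projection(greedy_actions))  # [-0.8 -0.2  0.   0.   0.9]

A post-hoc projection like this runs in linear time in the grid size, so it could in principle be applied after every improvement step; whether the prior is better imposed on the policy parameterization itself is a separate design choice.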
BibTeX
@inproceedings{BusDeS:10-024,
  author    = {Bu{\c{s}}oniu, Lucian and De Schutter, Bart and
               Babu{\v{s}}ka, Robert and Ernst, Damien},
  title     = {Using Prior Knowledge to Accelerate Online Least-Squares
               Policy Iteration},
  booktitle = {Proceedings of the 2010 IEEE International Conference on
               Automation, Quality and Testing, Robotics (AQTR 2010)},
  address   = {Cluj-Napoca, Romania},
  month     = may,
  year      = {2010},
  note      = {Paper A-S2-1/3005}
}