Reference
L. Buşoniu, D. Ernst, B. De Schutter, and R. Babuška, "Policy
search with cross-entropy optimization of basis functions,"
Proceedings of the 2009 IEEE Symposium on Adaptive Dynamic
Programming and Reinforcement Learning (ADPRL 2009), Nashville,
Tennessee, pp. 153-160, Mar.-Apr. 2009.
Abstract
This paper introduces a novel algorithm for approximate policy search in
continuous-state, discrete-action Markov decision processes (MDPs). Previous
policy search approaches have typically used ad-hoc parameterizations developed
for specific MDPs. In contrast, the novel algorithm employs a flexible policy
parameterization, suitable for solving general discrete-action MDPs. The
algorithm looks for the best closed-loop policy that can be represented using a
given number of basis functions, where a discrete action is assigned to each
basis function. The locations and shapes of the basis functions are optimized,
together with the action assignments. This allows a large class of policies to
be represented. The optimization is carried out with the cross-entropy method,
and policies are evaluated by their empirical return from a representative set
of initial states. We report simulation experiments in which the algorithm
reliably obtains good policies with only a small number of basis functions,
albeit at sizable computational costs.
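
The idea summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy one-dimensional task, the Gaussian basis functions, the categorical sampling of action assignments, the smoothing constants, and all names (ACTIONS, X0, policy, score, cross_entropy_policy_search) are assumptions made purely for illustration. Each candidate policy picks the action assigned to its most strongly activated basis function, and the cross-entropy loop refits its sampling distributions to the best-scoring candidates.

```python
import numpy as np

ACTIONS = np.array([-1.0, 0.0, 1.0])  # toy discrete action set (assumption)
X0 = np.linspace(-1.0, 1.0, 5)        # representative initial states (assumption)

def policy(x, centers, widths, assign):
    """Return the action assigned to the most strongly activated basis function."""
    activation = np.exp(-((x - centers) / widths) ** 2)
    return ACTIONS[assign[np.argmax(activation)]]

def score(centers, widths, assign, horizon=20, gamma=0.95):
    """Average discounted return over X0 on a toy task: drive x toward 0."""
    total = 0.0
    for x in X0:
        for k in range(horizon):
            x = np.clip(x + 0.1 * policy(x, centers, widths, assign), -1.0, 1.0)
            total += gamma ** k * (-abs(x))
    return total / len(X0)

def cross_entropy_policy_search(n_bf=3, n_samples=40, n_elite=5, n_iter=15, seed=0):
    rng = np.random.default_rng(seed)
    n_act = len(ACTIONS)
    # Sampling distributions: Gaussians over centers and widths,
    # one categorical distribution per basis function over actions.
    mu_c, sd_c = np.zeros(n_bf), np.ones(n_bf)
    mu_w, sd_w = np.full(n_bf, 0.5), np.full(n_bf, 0.25)
    probs = np.full((n_bf, n_act), 1.0 / n_act)
    best_score, best_params = -np.inf, None
    for _ in range(n_iter):
        # Draw candidate policies.
        C = rng.normal(mu_c, sd_c, size=(n_samples, n_bf))
        W = np.abs(rng.normal(mu_w, sd_w, size=(n_samples, n_bf))) + 1e-3
        A = np.stack(
            [rng.choice(n_act, size=n_samples, p=probs[i]) for i in range(n_bf)],
            axis=1,
        )
        # Evaluate each candidate by its empirical return.
        scores = np.array([score(C[j], W[j], A[j]) for j in range(n_samples)])
        elite = np.argsort(scores)[-n_elite:]
        top = elite[-1]
        if scores[top] > best_score:
            best_score = scores[top]
            best_params = (C[top].copy(), W[top].copy(), A[top].copy())
        # Refit the distributions to the elite samples.
        mu_c, sd_c = C[elite].mean(0), C[elite].std(0) + 1e-3
        mu_w, sd_w = W[elite].mean(0), W[elite].std(0) + 1e-3
        freq = np.stack([(A[elite] == a).mean(0) for a in range(n_act)], axis=1)
        probs = 0.9 * freq + 0.1 / n_act  # smoothing keeps every action reachable
    return best_score, best_params
```

Calling cross_entropy_policy_search() returns the best score found together with the corresponding centers, widths, and action indices. Even on this small task the loop simulates thousands of trajectories, which hints at the computational cost noted in the abstract.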
BibTeX
@inproceedings{BusErn:08-031,
  author    = {Bu{\c{s}}oniu, Lucian and Ernst, Damien and De Schutter, Bart and Babu{\v{s}}ka, Robert},
  title     = {Policy Search with Cross-Entropy Optimization of Basis Functions},
  booktitle = {Proceedings of the 2009 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL 2009)},
  address   = {Nashville, Tennessee},
  pages     = {153--160},
  month     = mar # {--} # apr,
  year      = {2009}
}