LSPI: Least-Squares Policy Iteration

Introduction

Least-Squares Policy Iteration (LSPI) is a reinforcement learning algorithm for solving control problems. It uses value-function approximation to cope with large state spaces and batch processing to make efficient use of training data. LSPI has been used successfully to learn good policies in several domains from relatively little training data. This page contains information about LSPI, examples, papers, and a code distribution that can be used for academic and/or research purposes.
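The core idea, repeatedly solving a least-squares system for the Q-function of the current policy from a fixed batch of samples, can be sketched as follows. This is a minimal illustration in Python on a toy chain problem (the chain is one of the domains in the distribution below); the feature choice, domain parameters, and all function names are assumptions made for illustration, not the distribution's MATLAB code:

```python
import numpy as np

# Illustrative sketch of LSPI on a tiny deterministic chain MDP.
# All names and parameters here are assumptions for illustration.

N_STATES, ACTIONS, GAMMA = 4, (0, 1), 0.9   # actions: 0 = left, 1 = right

def step(s, a):
    """Deterministic chain: reward 1 for being at the right end."""
    s2 = max(0, s - 1) if a == 0 else min(N_STATES - 1, s + 1)
    return (1.0 if s2 == N_STATES - 1 else 0.0), s2

def phi(s, a):
    """Indicator (tabular) features over state-action pairs."""
    f = np.zeros(N_STATES * len(ACTIONS))
    f[s * len(ACTIONS) + a] = 1.0
    return f

def lstdq(samples, policy, k):
    """Least-squares fit of the Q-function of `policy` from a fixed batch:
    solve A w = b, with A and b accumulated over all samples."""
    A, b = 1e-6 * np.eye(k), np.zeros(k)   # small ridge term for stability
    for s, a, r, s2 in samples:
        f = phi(s, a)
        A += np.outer(f, f - GAMMA * phi(s2, policy(s2)))
        b += r * f
    return np.linalg.solve(A, b)

def lspi(samples, iters=10):
    """Alternate policy evaluation (LSTDQ) and greedy policy improvement."""
    k = N_STATES * len(ACTIONS)
    w = np.zeros(k)
    for _ in range(iters):
        greedy = lambda s, w=w: max(ACTIONS, key=lambda a: phi(s, a) @ w)
        w = lstdq(samples, greedy, k)
    return w

# One fixed batch of transitions, reused at every iteration.
samples = [(s, a, *step(s, a)) for s in range(N_STATES) for a in ACTIONS]
w = lspi(samples)
policy = [max(ACTIONS, key=lambda a: phi(s, a) @ w) for s in range(N_STATES)]
print(policy)  # greedy policy moves right, toward the reward
```

Note how the same batch of samples is reused at every iteration; this reuse of data across policy evaluations is what the batch processing mentioned above refers to.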


Authors

Michail G. Lagoudakis
Ph.D. Candidate, Department of Computer Science, Duke University (at the time of publication)
Associate Professor, Department of ECE, Technical University of Crete, Greece (currently)

lagoudakis @ ece . tuc . gr

Ronald Parr
Professor, Department of Computer Science, Duke University
parr @ cs . duke . edu

Papers

This is the paper that introduced LSPI:
Model-Free Least-Squares Policy Iteration
Michail G. Lagoudakis and Ronald Parr
Proceedings of NIPS*2001: Neural Information Processing Systems: Natural and Synthetic
Vancouver, BC, December 2001, pp. 1547-1554.

A longer journal version is also available:

Least-Squares Policy Iteration
Michail G. Lagoudakis and Ronald Parr
Journal of Machine Learning Research, 4, 2003, pp. 1107-1149.

Several other papers on LSPI have been published since then. They are available from Michail's and Ron's web pages.


LSPI Code Distribution

This is a MATLAB implementation of LSPI with certain parts written in C. It should run without problems on any Unix or Linux system with MATLAB installed. It has not been tested on a Windows machine.

At the moment, the distribution includes the core LSPI code and the chain, pendulum, and bicycle domains. Check the README file in each directory for instructions.

Distribution and use of this code is subject to the following agreement:
This Program is provided by Duke University and the authors as a service to the research community. It is provided without cost or restrictions, except for the User's acknowledgement that the Program is provided on an "As Is" basis and User understands that Duke University and the authors make no express or implied warranty of any kind. Duke University and the authors specifically disclaim any implied warranty of merchantability or fitness for a particular purpose, and make no representations or warranties that the Program will not infringe the intellectual property rights of others. The User agrees to indemnify and hold harmless Duke University and the authors from and against any and all liability arising out of User's use of the Program.

Email Michail or Ron if you encounter any problems.