Michail G. Lagoudakis



Research Statement

In my research I use computers to solve challenging decision-making and control problems. In particular, I focus on developing general tools that enable a machine to adaptively learn a solution. This approach is far superior to executing a hand-coded solution, because the problems we face today involve uncertainty and thus do not admit static solutions.

I build on the strength of current computer technology to manipulate numbers and complex data structures extremely quickly. In extracting statistics from huge data sets, evaluating complex functions, and solving large systems of equations, machines routinely outperform people. I exploit this computational power in conjunction with solid statistical methods to provide a basis for handling the inherent uncertainty and nondeterminism that characterize most outstanding decision-making problems.

In joint work with Michael Littman at Duke and AT&T Labs, I focused on learning methods for the algorithm selection problem (adaptively and recursively choosing the appropriate algorithm for each incoming problem instance). Eventually, I came to realize that there is a need for more efficient and powerful reinforcement-learning methods, so during my Ph.D. work with Ronald Parr at Duke, I focused on general-purpose reinforcement-learning algorithms with goals in two directions: efficient utilization of data and scaling to large problems. I used techniques from linear algebra, function approximation, linear programming, Monte Carlo estimation, and classification to devise and propose learning algorithms that meet the challenges of a wide range of decision-making problems.

I am currently doing research in this area at Georgia Tech, applying my methods to industrial problems (disassembly planning, routing in re-entrant lines), with a plan to continue on problems from operations research (economic markets and auctions, scheduling), networking (dynamic packet routing, router/server reconfiguration), robotics (coordination, planning), and meta-computation (using learning and reasoning to build "smarter" computers that make better use of their resources, as opposed to simply faster computers). During this process, I expect to raise more fundamental and/or theoretical questions. In parallel, I collaborate with medical doctors from Emory University on supervised learning methods for computer-aided diagnosis of gastrointestinal bleeding.

In the past, I have established a solid basis in neural systems (Hopfield networks, neural maps), robotics (path planning, motion control), human-computer interaction (speech interfaces), and DNA computation (self-assembly). I still maintain a strong interest in all these fields and am actively seeking collaborations along these lines.

My long-term vision is two-fold and bidirectional in terms of interdisciplinary collaboration. On one hand, I want to spread machine learning technology by using it to solve outstanding problems in other disciplines. Computational biology, medical diagnosis, operations research, and robotics are just a few sample areas where uncertainty is inherent and machine learning can make a difference. On the other hand, I want to bring new tools, developed in other disciplines, into machine learning. I believe that mathematics, statistics, control theory, and physics will be some of the main such collaborators in the years to come. Tools such as wavelets, experimental design, adaptive control, and chaotic dynamics have the potential to contribute significantly to the core challenges of machine learning. Under this view, I envision my research career crossing the narrow borders of strict specialization while maintaining scholarly expertise in a very promising field.


Ph.D. Thesis Research 

My Ph.D. dissertation is on "Efficient Approximate Policy Iteration Methods for Sequential Decision Making in Reinforcement Learning". Briefly, I developed two algorithms for the control problem in reinforcement learning: Least-Squares Policy Iteration (LSPI), which uses approximate value-function representations, and Rollout Classification Policy Iteration (RCPI), which uses approximate policy representations. Both algorithms are based on the approximate policy iteration framework and therefore inherit certain stability and soundness properties.
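The rollout idea behind RCPI can be illustrated with a short Python sketch. Everything below is illustrative, not the dissertation's actual setup: the chain-walk dynamics, the action encoding, and the dominance margin are assumptions. On this deterministic toy problem a single rollout per action suffices; a stochastic domain would average many rollouts, and the resulting labels would then train a classifier representing the improved policy.

```python
# Assumed toy chain for illustration: states 0..4, action 1 steps
# right, action 0 steps left, reward 1 on reaching state 4.
def step(s, a):
    s2 = min(s + 1, 4) if a == 1 else max(s - 1, 0)
    return s2, float(s2 == 4), s2 == 4

def rollout_return(s, a, policy, gamma=0.9, horizon=10):
    """Estimate Q(s, a): take action a once, then follow the policy."""
    s, r, done = step(s, a)
    total, discount = r, gamma
    for _ in range(horizon):
        if done:
            break
        s, r, done = step(s, policy(s))
        total += discount * r
        discount *= gamma
    return total

def improvement_labels(states, policy, actions=(0, 1), margin=1e-3):
    """Keep only the states where one action clearly dominates; the
    resulting state -> action pairs would form the training set for a
    classifier that represents the improved policy."""
    labels = {}
    for s in states:
        q = [rollout_return(s, a, policy) for a in actions]
        best = max(actions, key=lambda a: q[a])
        if q[best] - min(q) > margin:
            labels[s] = best
    return labels

# Starting from the "always go left" policy, only state 3 (one step
# from the goal) yields a confident improvement label on this chain.
labels = improvement_labels(range(4), policy=lambda s: 0)
```

Repeating this labeling/training cycle with the newly learned policy is what makes the procedure a policy iteration.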


Here are the slides from my defense examination on April 18, 2003.

Check out the publications page for related published papers.
An online copy of my thesis will appear on this page soon.

Least-Squares Policy Iteration (LSPI) 

As part of my Ph.D. dissertation I worked on the use of "Least-Squares Methods in Reinforcement Learning for Control". Briefly, I used linear architectures and least-squares projections for value-function approximation in order to solve large-scale control problems using relatively little training data. The core algorithm, Least-Squares Policy Iteration (LSPI), is simple, stable, data-efficient, and widely applicable. Using LSPI and its variations I obtained very good results in several collaborative/competitive single- and multi-agent domains.
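A minimal sketch of the core computation may help make this concrete. The structure below (LSTDQ evaluation inside a policy-iteration loop) follows the published algorithm, but the toy chain domain, one-hot features, and ridge regularization term are illustrative assumptions, not the paper's experiments.

```python
import numpy as np

def lstdq(samples, phi, k, policy, gamma=0.9):
    """One LSTDQ step: least-squares fixed-point weights for Q^policy."""
    A = np.zeros((k, k))
    b = np.zeros(k)
    for s, a, r, s2, done in samples:
        f = phi(s, a)
        f2 = np.zeros(k) if done else phi(s2, policy(s2))
        A += np.outer(f, f - gamma * f2)
        b += r * f
    # Small ridge term keeps A invertible when some features are unvisited.
    return np.linalg.solve(A + 1e-6 * np.eye(k), b)

def lspi(samples, phi, k, n_actions, gamma=0.9, iters=20):
    """Approximate policy iteration: repeatedly evaluate the greedy
    policy with LSTDQ on the same fixed batch of samples."""
    w = np.zeros(k)
    for _ in range(iters):
        policy = lambda s, w=w: max(range(n_actions),
                                    key=lambda a: phi(s, a) @ w)
        w_new = lstdq(samples, phi, k, policy, gamma)
        if np.linalg.norm(w_new - w) < 1e-6:
            return w_new
        w = w_new
    return w

# Toy chain: states 0..4, action 1 steps right, reward 1 on reaching 4.
def phi(s, a):
    v = np.zeros(10)       # one-hot (state, action) features
    v[2 * s + a] = 1.0
    return v

samples = []
for s in range(4):
    for a in (0, 1):
        s2 = min(s + 1, 4) if a == 1 else max(s - 1, 0)
        samples.append((s, a, float(s2 == 4), s2, s2 == 4))

w = lspi(samples, phi, k=10, n_actions=2)
```

Note that the same batch of samples is reused at every iteration; this reuse of data across policy evaluations is what makes the method data-efficient.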

Some related papers:

Model-Free Least-Squares Policy Iteration
Michail G. Lagoudakis and Ronald Parr
Proceedings of NIPS*2001: Neural Information Processing Systems: Natural and Synthetic
Vancouver, BC, December 2001, pp. 1547-1554.

Least-Squares Policy Iteration
Michail G. Lagoudakis and Ronald Parr
Journal of Machine Learning Research, 4, 2003, pp. 1107-1149.

Check out the publications page for more related published papers.
For additional information on LSPI and a code distribution, go to the LSPI page.

Algorithm Selection using Reinforcement Learning

This is the work I did with Prof. Michael L. Littman for my second-year research project requirement. I used reinforcement learning methods to optimize traditional computer science algorithms, looking into the problems of sorting, order-statistic selection, and branching in satisfiability.
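For the sorting case, the flavor of the idea can be sketched as follows. This is a deliberate simplification: the paper formulates recursive algorithm selection as a reinforcement-learning problem, whereas the sketch below just searches offline for the cutoff size at which a hybrid quicksort should switch to insertion sort, using an assumed comparison-count cost model.

```python
import random

def insertion_sort(a):
    """Return (sorted list, number of comparisons)."""
    a, comps = list(a), 0
    for i in range(1, len(a)):
        x, j = a[i], i - 1
        while j >= 0:
            comps += 1
            if a[j] > x:
                a[j + 1] = a[j]
                j -= 1
            else:
                break
        a[j + 1] = x
    return a, comps

def hybrid_sort(a, cutoff):
    """Quicksort that delegates to insertion sort below `cutoff`."""
    if len(a) <= cutoff:
        return insertion_sort(a)
    pivot = a[len(a) // 2]
    left = [x for x in a if x < pivot]
    mid = [x for x in a if x == pivot]
    right = [x for x in a if x > pivot]
    comps = 2 * len(a)  # approximate cost charged for partitioning
    ls, lc = hybrid_sort(left, cutoff)
    rs, rc = hybrid_sort(right, cutoff)
    return ls + mid + rs, comps + lc + rc

def learn_cutoff(candidates, trials=30, n=200, seed=0):
    """Pick the cutoff with the lowest average empirical cost."""
    rng = random.Random(seed)
    cost = {}
    for c in candidates:
        total = 0
        for _ in range(trials):
            data = [rng.random() for _ in range(n)]
            total += hybrid_sort(data, c)[1]
        cost[c] = total / trials
    return min(cost, key=cost.get)

cutoff = learn_cutoff([1, 8, 16, 64, 200])
```

Under this cost model, a small but nonzero cutoff wins: pure quicksort (cutoff 1) pays partitioning overhead on tiny subproblems, while pure insertion sort (cutoff 200) pays quadratic cost on the whole input.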

Algorithm Selection using Reinforcement Learning
Michail G. Lagoudakis and Michael L. Littman
Proceedings of the 17th International Conference on Machine Learning
Stanford, CA, June 2000, pp. 511-518

Learning to Select Branching Rules in the DPLL Procedure for Satisfiability
Michail G. Lagoudakis and Michael L. Littman
Electronic Notes in Discrete Mathematics (ENDM), Vol. 9, Elsevier
LICS 2001 Workshop on Theory and Applications of Satisfiability Testing (SAT 2001)
Boston, MA, June 14-15, 2001

You can also check out RLSAT, a DPLL-like solver for SAT and #SAT problems with learning capabilities.

DNA Computation: Tiling and Self Assembly

In my early stages in the Ph.D. program, I worked with Prof. Thomas H. LaBean and Prof. John Reif on the use of synthetic DNA strands for solving hard combinatorial problems. Using the principles of an emerging technology known as DNA tiling, in which DNA tiles self-assemble into certain two-dimensional structures, we presented algorithmic designs of such tiles that could possibly solve the satisfiability (SAT) problem.

2D DNA Self-Assembly for Satisfiability
Michail G. Lagoudakis and Thomas H. LaBean
Proceedings of the 5th DIMACS International Meeting on DNA Based Computers
MIT, Cambridge, MA, June 1999, pp. 139-152

Neural Maps for Mobile Robot Navigation

This is the research I did for my M.Sc. degree at the University of Louisiana, Lafayette. I worked with Prof. Anthony Maida on the use of neural maps for the problem of mobile robot navigation. Our method was successfully implemented on a Nomad 200 mobile robot for sonar-based local navigation.
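The general neural-map idea can be sketched in a few lines of Python. This is only a schematic of the approach, not the thesis's actual model: the ring of heading neurons, the Gaussian excitation/inhibition profiles, and all parameter values below are assumptions for illustration.

```python
import math

def polar_map_heading(goal_bearing, obstacle_bearings, n=36,
                      excite=1.0, inhibit=2.0, width=0.4):
    """Activation over n discrete headings: a Gaussian bump of
    excitation at the goal bearing, inhibition around each sensed
    obstacle bearing; steer toward the most active direction."""
    def ang(i):
        return 2 * math.pi * i / n

    def gauss(d, w):
        d = math.atan2(math.sin(d), math.cos(d))  # wrap to [-pi, pi]
        return math.exp(-(d * d) / (2 * w * w))

    act = []
    for i in range(n):
        a = excite * gauss(ang(i) - goal_bearing, width)
        for ob in obstacle_bearings:
            a -= inhibit * gauss(ang(i) - ob, width)
        act.append(a)
    best = max(range(n), key=lambda i: act[i])
    return ang(best)
```

With no obstacles the map simply points at the goal; an obstacle sensed in the goal direction suppresses that region of the map, so the maximum activation (and hence the steering command) shifts away from it.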


And here is a poster describing the main points of my thesis.

Neural Maps for Mobile Robot Navigation
Michail G. Lagoudakis and Anthony S. Maida
Proceedings of the 1999 IEEE International Joint Conference on Neural Networks
Washington, D.C., July 1999

Mobile Robot Local Navigation with a Polar Neural Map
M.Sc. Thesis
Center for Advanced Computer Studies, University of Louisiana, Lafayette, 1998