RLvSL project page
Reinforcement Learning via Supervised Learning


The field of machine learning develops learning paradigms and algorithms that allow systems to learn some desired functionality on their own. Supervised learning is learning with a teacher: an authoritative source provides a finite set of correct examples, and the learner generalises from these examples to a function that is correct over the entire input space. An example from human learning would be learning to spell by observing correctly spelled words. Supervised learning focuses on two static learning problems: classification, where the learner induces a mapping that assigns each input to one of several classes, and regression, where the learner infers the correct values of a numerical function over its entire domain. In both cases, learning is based on a finite set of correctly labelled classification or regression training examples.

Reinforcement learning, on the other hand, is learning by trial and error: there is no teacher, and the learner interacts directly with its environment to acquire information. The learner makes its own decisions and occasionally receives a numerical score (reinforcement signal) for its overall behaviour. This score does not indicate which actions were correct or incorrect, but it can be used to reinforce good decision making and discourage bad decision making. An example from human learning would be learning how to balance and ride a bicycle (falls incur negative scores). Reinforcement learning focuses on two interactive learning problems within the scope of decision making: prediction, where the learner estimates the quality of a fixed control policy, and control, where the learner infers a good control policy. In both cases, learning is based on training data collected through interaction between the learning agent and its environment.

These two learning paradigms have been researched mostly independently. Recent advances in supervised learning have demonstrated outstanding, near-optimal generalisation performance, whereas reinforcement learning has not reached the same level of applicability to real-world problems. This research proposal investigates the potential of using supervised learning technology to advance reinforcement learning. Preliminary results have shown that it is possible to incorporate supervised learning algorithms within the inner loops of several reinforcement learning algorithms and thereby reduce the reinforcement learning problem to a supervised learning one. This synergy opens the door to a variety of promising combinations. The proposed research will establish the criteria under which this reduction is possible, investigate viable combinations, propose novel algorithms, assess their potential, and apply them to real problems of practical interest to demonstrate their effectiveness.
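To make the idea of a supervised learner "within the inner loop" more concrete, here is a minimal sketch of one such combination, loosely in the style of rollout-based classification policy iteration. It is an illustration under stated assumptions, not the project's algorithm: the 10-state chain problem, the discount `GAMMA`, the rollout `HORIZON`, the 1-nearest-neighbour classifier and all function names are hypothetical choices made for this example. The prediction step estimates action values of the current policy by simulated rollouts; the control step turns those estimates into labelled examples (state, best action) and hands them to a classifier, whose output becomes the next policy.

```python
from typing import Callable, List, Tuple

# Hypothetical toy problem: a deterministic chain of states 0..9, goal at state 9.
N_STATES = 10
ACTIONS = (-1, +1)   # move left or move right
GAMMA = 0.9          # assumed discount factor
HORIZON = 30         # assumed rollout length

Policy = Callable[[int], int]


def step(s: int, a: int) -> Tuple[int, float, bool]:
    """Chain dynamics: clamp to the ends; reward 1.0 on reaching the right end."""
    s2 = min(max(s + a, 0), N_STATES - 1)
    done = s2 == N_STATES - 1
    return s2, (1.0 if done else 0.0), done


def rollout_q(s: int, a: int, policy: Policy) -> float:
    """Prediction step: estimate Q(s, a) by taking a, then following policy.

    The chain is deterministic, so a single rollout gives the exact
    (horizon-truncated) discounted return.
    """
    s, r, done = step(s, a)
    ret, disc = r, GAMMA
    for _ in range(HORIZON):
        if done:
            break
        s, r, done = step(s, policy(s))
        ret += disc * r
        disc *= GAMMA
    return ret


def train_1nn(examples: List[Tuple[int, int]]) -> Policy:
    """Supervised step: a 1-nearest-neighbour classifier from states to actions."""
    def classify(s: int) -> int:
        nearest = min(examples, key=lambda ex: abs(ex[0] - s))
        return nearest[1]
    return classify


def classification_policy_iteration(train_states: List[int], iterations: int = 5) -> Policy:
    """Alternate rollout-based labelling and classifier training."""
    policy: Policy = lambda s: -1  # initial policy: always move left
    for _ in range(iterations):
        examples: List[Tuple[int, int]] = []
        for s in train_states:
            q = {a: rollout_q(s, a, policy) for a in ACTIONS}
            best = max(ACTIONS, key=lambda a: q[a])
            # Keep only states where one action is strictly better than the other.
            if q[best] > min(q.values()):
                examples.append((s, best))
        if examples:
            policy = train_1nn(examples)
    return policy


if __name__ == "__main__":
    learned = classification_policy_iteration(train_states=[0, 2, 4, 6, 8])
    print([learned(s) for s in range(N_STATES)])
```

The classifier is trained only on the even-numbered states, yet the learned policy also chooses "move right" in the odd-numbered states it never saw; this generalisation across states is the capability that the supervised learner contributes to the reinforcement learning loop.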

Research nowadays has become so specialised that an innovation in one field rarely finds its way into another field and becomes useful there. As a result, researchers are often doomed to "reinventing the wheel" whenever a need arises, instead of drawing on solutions already invented by their colleagues in a related field. The proposed research demonstrates how researchers in different fields can benefit one another by building bridges across disciplines. Reinforcement learning finds applications in robotics, automatic control, combinatorial optimisation, networking, signal processing, dialogue management, and numerous other fields. Advances in reinforcement learning can only widen the range of its applications and strengthen the ties between these fields.