
Reinforcement learning algorithms based on the methods of temporal differences


Author(s) : Paweł Cichosz
Publisher : N/A
Publication Date : 1994
ISSN : N/A
Abstract : The reinforcement learning paradigm differs significantly from the traditional supervised learning paradigm. In each input situation, the agent must generate an action. It then receives a reinforcement value from the environment, which provides a measure of the agent's performance. The agent's task is to maximize the reinforcement it receives over the long term. Reinforcement learning agents are adaptive, reactive, and self-improving. To formulate a particular task as a reinforcement learning task, one only has to design an appropriate reinforcement function that specifies the goal of the task. This makes the paradigm widely applicable, especially in domains such as game playing, automatic control, and robotics. The reinforcement value received by the agent at a particular time step may reflect the positive or negative consequences of actions taken several steps earlier. Dealing with such delayed reinforcement requires algorithms for temporal credit assignment. This thesis concentrates on a class of algorithms based on,
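To make temporal credit assignment concrete, the following is a minimal sketch of the standard tabular TD(0) prediction rule, V(s) <- V(s) + alpha * (r + gamma * V(s') - V(s)). It is a generic textbook instance of the temporal-difference idea, not the specific algorithms developed in this thesis. The five-state chain, the step size alpha = 0.1, the discount gamma = 0.9, the episode count and the function name `td0_chain` are all hypothetical choices made for illustration.

```python
"""Minimal tabular TD(0) prediction on a hypothetical deterministic chain.

Assumptions (illustrative only, not taken from the thesis): states 0..n-1,
the agent always moves one state to the right, and the only nonzero
reinforcement (1.0) arrives on the final step into the terminal state.
"""


def td0_chain(n_states=5, alpha=0.1, gamma=0.9, episodes=500):
    # V[s] estimates the discounted sum of future reinforcement from state s.
    # Index n_states is the terminal state; its value is never updated (stays 0).
    V = [0.0] * (n_states + 1)
    for _ in range(episodes):
        s = 0
        while s < n_states:
            s_next = s + 1
            # Delayed reinforcement: only the last transition is rewarded.
            r = 1.0 if s_next == n_states else 0.0
            # TD error: the difference between successive predictions.
            delta = r + gamma * V[s_next] - V[s]
            V[s] += alpha * delta
            s = s_next
    # Return the values of the non-terminal states only.
    return V[:n_states]


if __name__ == "__main__":
    for s, v in enumerate(td0_chain()):
        print(f"V({s}) = {v:.3f}")
```

Although reinforcement is received only on the last step, the learned values approach gamma raised to the number of steps remaining before it, roughly 0.656, 0.729, 0.81, 0.9 and 1.0 for states 0 to 4. The delayed reward has thus been credited back to the earlier states purely through differences between successive predictions.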