Sensitive discount optimality: Unifying discounted and average reward reinforcement learning


Author(s) : Sridhar Mahadevan
Publisher : N/A
Publication Date : 1996
ISSN : N/A
Abstract : Research in reinforcement learning (RL) has thus far concentrated on two optimality criteria: the discounted framework, which has been very well studied, and the average reward framework, in which interest is rapidly increasing. In this paper, we present a framework called sensitive discount optimality, which offers an elegant way of linking these two paradigms. Although sensitive discount optimality has been well studied in dynamic programming, with several provably convergent algorithms, it has not received any attention in RL. This framework is based on studying the properties of the expected cumulative discounted reward as the discount factor tends to 1. Under these conditions, the cumulative discounted reward can be expanded in a Laurent series to yield a sequence of terms, the first of which is the average reward, the second of which involves the average adjusted sum of rewards (or bias), and so on. We use the sensitive discount optimality framework to derive a new model-free average reward technique, which is related to Q-learning type methods proposed by Bertsekas, Schwartz, and Singh, but which, unlike these previous methods, optimizes both the first and second terms in the Laurent series (average reward and bias values).
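
To illustrate the first two Laurent terms mentioned in the abstract, the following is a minimal numerical sketch, not the paper's algorithm. It assumes a fixed policy on a hypothetical two-state Markov reward process (the transition matrix P, the rewards r, and all function names are invented for illustration) and checks the standard truncated expansion V_gamma(s) = g / (1 - gamma) + h(s) + e(gamma), where g is the average reward, h is the bias, and e(gamma) vanishes as gamma tends to 1.

```python
def solve2(a, b):
    """Solve the 2x2 linear system a x = b by Cramer's rule."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    x0 = (b[0] * a[1][1] - a[0][1] * b[1]) / det
    x1 = (a[0][0] * b[1] - b[0] * a[1][0]) / det
    return [x0, x1]


# Hypothetical two-state chain under a fixed policy (illustrative numbers only).
P = [[0.9, 0.1],
     [0.2, 0.8]]
r = [1.0, 0.0]


def discounted_value(gamma):
    """Discounted value V_gamma, the solution of (I - gamma P) V = r."""
    a = [[1 - gamma * P[0][0], -gamma * P[0][1]],
         [-gamma * P[1][0], 1 - gamma * P[1][1]]]
    return solve2(a, r)


def gain_and_bias():
    """Average reward g and bias h for the two-state chain."""
    p01, p10 = P[0][1], P[1][0]
    # Stationary distribution of a two-state chain.
    pi = [p10 / (p01 + p10), p01 / (p01 + p10)]
    gain = pi[0] * r[0] + pi[1] * r[1]
    # Bias solves (I - P + P*) h = r - g, where every row of P* equals pi.
    a = [[1 - P[0][0] + pi[0], -P[0][1] + pi[1]],
         [-P[1][0] + pi[0], 1 - P[1][1] + pi[1]]]
    h = solve2(a, [r[0] - gain, r[1] - gain])
    return gain, h


gain, h = gain_and_bias()
for gamma in (0.9, 0.99, 0.999):
    v = discounted_value(gamma)
    # The residual after removing g / (1 - gamma) should approach the bias.
    residual = [v[s] - gain / (1 - gamma) for s in range(2)]
    print(gamma, residual, h)
```

In this sketch the residual moves toward the bias vector as gamma approaches 1, which is the sense in which the discounted criterion carries both the average reward and the bias. Two policies with the same average reward can still differ in bias, which is why optimizing the second term matters.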