Hierarchically optimal average reward reinforcement learning


Author(s) : Sridhar Mahadevan, Mohammad Ghavamzadeh
Publisher : N/A
Publication Date : 2002
ISSN : N/A
Abstract : Two notions of optimality have been explored in previous work on hierarchical reinforcement learning (HRL): hierarchical optimality, or the optimal policy in the space defined by a task hierarchy, and a weaker local model called recursive optimality. In this paper, we introduce two new average-reward HRL algorithms for finding hierarchically optimal policies. We compare them to our previously reported algorithms for computing recursively optimal policies, using a grid-world taxi problem and a more realistic AGV scheduling problem. The new algorithms are based on a three-part value function decomposition proposed recently by Andre and Russell, which generalizes Dietterich's MAXQ value function decomposition. A key difference between the algorithms proposed in this paper and our previous work is that there is only a single global gain (average reward), instead of a gain for each subtask. Our results show that the new average-reward algorithms perform better than both their recursively optimal counterparts and the corresponding discounted hierarchically optimal algorithms.
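For readers unfamiliar with the two decompositions the abstract names, the following is a minimal sketch of how they are commonly written. The abstract itself gives no formulas, so the notation here (subtask i, state s, child action a, and the component names) is an assumption taken from the usual presentation of these decompositions, not from this paper.

```latex
% Notation assumed for illustration: i = parent subtask, s = state, a = child action.

% MAXQ (Dietterich): two-part decomposition.
% V(a, s)    : expected reward while executing child action a from s
% C(i, s, a) : expected reward for completing subtask i after a finishes
Q(i, s, a) = V(a, s) + C(i, s, a)

% Three-part decomposition (Andre and Russell).
% Q_r : expected reward while executing a
% Q_c : expected reward for completing subtask i after a finishes
% Q_e : expected reward after subtask i itself exits
Q(i, s, a) = Q_r(i, s, a) + Q_c(i, s, a) + Q_e(i, s, a)
```

Read this way, the first two terms of the three-part form play the roles of V and C in MAXQ, and the extra term Q_e carries information from outside the current subtask. That external term is what allows a subtask's choices to account for their effect on the rest of the hierarchy, which is the distinction between hierarchical and recursive optimality drawn above.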