Home

Hierarchical optimization of policy-coupled semi-markov decision processes


Author(s) : Sridhar Mahadevan Gang Wang, 
Publisher : N/A
Publication Date : 1999
ISSN : N/A
Abstract : One general strategy for approximately solving large Markov decision processes is "divide-and-conquer": the original problem is decomposed into sub-problems which interact with each other, but yet can be solved independently by taking into account the nature of the interaction. In this paper we focus on a class of "policy-coupled" semi-Markov decision processes (SMDPs), which arise in many nonstationary real-world multi-agent tasks, such as manufacturing and robotics. The nature of the interaction among sub-problems (agents) is more subtle than that studied previously: the components of a sub-SMDP, namely the available states and actions, transition probabilities and rewards, depend on the policies used in solving the "neighboring " sub-SMDPs. This "strongly-coupled " interaction among subproblems causes the approach of solving each sub-SMDP in parallel to fail. We present a novel approach whereby many variants of each sub-SMDP are solved, explicitly taking into account the different modes of interaction, and a dynamic merging algorithm is used to combine the base level policies. We present detailed experimental results for a 12-machine transfer line, a large real-world manufacturing task. We show that the hierarchical approach is not only much faster than a "flat " algorithm, but also outperforms two well-known heuristics for running transfer lines used in many factories.,