|
Abstract : |
The theory of stochastic dynamic programming requires that the current state of the system is known to the decision maker. However, this condition is violated in a number of cases, as for instance in high-speed data networks. In this article, we first show that by enlarging the state space so as to include the last known state as well as all the decisions made during the travel time of the information (e.g., propagation delay in networks), we may reduce a partially observable Markov control model to a completely observable Markov control model. This result is then applied to a flow control problem with delayed state information pattern. More precisely, we consider a discrete time queueing model with Bernoulli services, where at most one customer may join the queue in every time slot. We consider a general cost formulation that captures standard control problems (e.g., delay versus throughput). At the beginning of slot t+N, where N is a fixed integer and t = 0; 1; 2: ::, the decision maker has to select the probability of having one arrival in the current time slot (it may be either p 1 or p 2, p 1 p 2) only on the basis of the queue-length history in [0; t] and on the actions made in [0; t + N). We show that there exists a policy of a threshold type that minimizes a discounted cost criterion over an infinite horizon. The threshold is seen to depend on the N previous actions., |