Markov Decision Drift Processes; Conditions for Optimality Obtained by Discretization

Abstract
In a recent paper (Hordijk, A., F. A. van der Duyn Schouten. 1983. Discretization and weak convergence in Markov decision drift processes. Math. Oper. Res. 8 112–141) the authors gave sufficient conditions for the weak convergence of a sequence of discrete time Markov decision drift processes to a related continuous time Markov decision drift process. The goal of that analysis was to obtain, for the continuous time model, conditions under which a limit point of a sequence of discounted discrete time optimal policies is optimal. However, the general conditions of Hordijk and van der Duyn Schouten (1983) apply only to a very restrictive class of models. To obtain more widely applicable conditions, in this paper we are concerned not with the weak convergence of the stochastic processes induced by policies, but with the convergence of the total expected discounted costs. Special attention is paid to the case of unbounded cost functions. Sufficient conditions on the model parameters and policies are given which guarantee the convergence of the expected discounted costs of a sequence of discrete time models to those of a related continuous time model. These results are used to identify the optimal policy of a continuous time model by means of the optimal policies of a sequence of approximating discrete time models. An application to an M/M/1 queueing model is given.
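The approximation scheme described above can be illustrated on a toy controlled M/M/1 queue. The sketch below is a hypothetical illustration under assumed parameters, not the paper's construction: a queue with arrival rate `LAM`, a controllable service rate chosen from `MUS`, linear holding cost, a cost proportional to the service rate, and continuous-time discount rate `ALPHA`. Time is discretized with step `dt` (at most one event per step), the one-step discount factor is `exp(-ALPHA*dt)`, and the discounted discrete time optimal policy is computed by value iteration on a truncated state space. Running it for decreasing `dt` lets one observe the convergence of the discrete time optimal policies and discounted costs that the paper establishes conditions for.

```python
import math

# Assumed illustrative parameters (not from the paper).
LAM = 0.8          # arrival rate
MUS = [1.0, 2.0]   # admissible service rates (the control)
ALPHA = 0.1        # continuous-time discount rate
HOLD = 1.0         # holding cost per customer per unit time
SERV = 0.5         # cost per unit of service rate per unit time
N = 60             # truncation level of the state space

def optimal_policy(dt, iters=20000, tol=1e-9):
    """Value iteration for the dt-discretized model.

    Returns (V, policy): the optimal discounted value function and the
    index into MUS of the optimal action in each state 0..N.
    Requires dt small enough that LAM*dt + max(MUS)*dt <= 1.
    """
    beta = math.exp(-ALPHA * dt)      # one-step discount factor
    V = [0.0] * (N + 1)
    pol = [0] * (N + 1)
    for _ in range(iters):
        newV = [0.0] * (N + 1)
        for x in range(N + 1):
            best = None
            for a, mu in enumerate(MUS):
                pa = LAM * dt                       # arrival probability
                pd = mu * dt if x > 0 else 0.0      # departure probability
                up = min(x + 1, N)                  # truncation at N
                cost = (HOLD * x + SERV * mu) * dt  # one-step cost
                q = cost + beta * (pa * V[up]
                                   + pd * V[max(x - 1, 0)]
                                   + (1 - pa - pd) * V[x])
                if best is None or q < best:
                    best, pol[x] = q, a
            newV[x] = best
        if max(abs(a - b) for a, b in zip(newV, V)) < tol:
            V = newV
            break
        V = newV
    return V, pol

# Refining the discretization: the optimal policies of the discrete
# time models should stabilize as dt decreases.
for dt in (0.2, 0.1):
    V, pol = optimal_policy(dt)
    print(f"dt={dt}: V[0]={V[0]:.3f}, policy on states 0..5 = {pol[:6]}")
```

With these assumed costs the computed policy uses the cheap service rate only in the empty state (where the rate is irrelevant) and the fast rate elsewhere, and the same policy is found for each step size, consistent with identifying the continuous time optimum through the discretized models.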