Policy Improvement Algorithm for Singularly Perturbed Discounted Markov Decision Processes
In this paper, we consider a perturbed Markov decision process with the discounted reward
criterion. The transition probabilities and the discount factor are slightly perturbed. We
assume that the underlying process is completely decomposable into a finite number of separate
irreducible processes. We introduce the limit Markov control problem, which is the optimization
problem that should be solved in the case of singular perturbations. To solve the limit Markov
control problem, we propose an aggregation-disaggregation policy improvement algorithm that
converges in a finite number of iterations to an optimal deterministic strategy.
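
For orientation, the following is a minimal sketch of classical policy improvement for an ordinary (unperturbed) discounted Markov decision process. It is not the aggregation-disaggregation algorithm proposed in this paper; it only illustrates the evaluate-then-improve loop and its termination at a deterministic strategy after finitely many iterations. The function names, the data layout (P[s][a] as a list of next-state probabilities, R[s][a] as a one-step reward), and the two-state example are assumptions made for illustration.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def evaluate(policy, P, R, beta):
    """Discounted value of a deterministic policy: solve (I - beta * P_pi) v = r_pi."""
    n = len(policy)
    a = [[(1.0 if i == j else 0.0) - beta * P[i][policy[i]][j] for j in range(n)]
         for i in range(n)]
    b = [R[i][policy[i]] for i in range(n)]
    return solve_linear(a, b)


def policy_iteration(P, R, beta):
    """Classical policy improvement for a finite discounted MDP (illustrative only)."""
    n = len(P)
    policy = [0] * n  # arbitrary initial deterministic policy
    while True:
        v = evaluate(policy, P, R, beta)
        improved = False
        for s in range(n):
            def q(a, s=s):
                # One-step lookahead value of action a in state s.
                return R[s][a] + beta * sum(P[s][a][t] * v[t] for t in range(n))
            best = max(range(len(P[s])), key=q)
            # Switch only on strict improvement, so the loop terminates.
            if q(best) > q(policy[s]) + 1e-12:
                policy[s] = best
                improved = True
        if not improved:
            return policy, v


# Hypothetical two-state example: action 0 stays, action 1 moves to the other state.
P = [[[1.0, 0.0], [0.0, 1.0]],
     [[0.0, 1.0], [1.0, 0.0]]]
R = [[1.0, 0.0],
     [2.0, 0.0]]
policy, values = policy_iteration(P, R, 0.9)
print(policy, values)
```

In this example the loop stops after two evaluations, choosing to move from state 0 to state 1 and to stay in state 1. In the singularly perturbed setting of this paper, a loop of this kind would be applied to the limit Markov control problem rather than directly to the perturbed process.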