Learning Algorithms for Two-Person Zero-Sum Stochastic Games with Incomplete Information

Abstract
This paper investigates conditions under which two learning algorithms playing a zero-sum sequential stochastic game would arrive at optimal pure strategies. Neither player has knowledge of either the pay-off matrix or the choice of strategies available to the other and both players update their own strategies at every stage entirely on the basis of the random outcome at that stage. The proposed learning algorithms are shown to converge to the optimal pure strategies when they exist with probabilities as close to 1 as desired.