REPRODUCING DISTRIBUTIONS FOR MACHINE LEARNING

Abstract
A model is proposed for learning the nature and value of an unknown parameter, or unknown parameters, in a probability distribution that forms part of a body of statistics related to some system or process. The model is Bayesian: it assumes an a priori probability distribution over the possible values of the unknown parameters, performs experiments to gain information about the parameters, and alters the a priori probabilities by Bayes' rule. In the limit, as the number of experiments approaches infinity, the a posteriori distribution in most cases encountered in practice approaches a delta function at the true values of the unknown parameters, so the system learns the values of the parameters exactly. The learning process developed in the paper is shown to be technically feasible if the a priori and a posteriori distributions are of the same form, with the learning accomplished by calculating new parameters for these distributions. A necessary and sufficient condition for fulfillment of this feasibility criterion is shown to be the existence of a sufficient statistic of fixed dimension. If such a sufficient statistic exists, the a posteriori distributions may vary in form initially, but they eventually assume a fixed form. The techniques developed indicate logical methods for choosing a priori probabilities and are applied to pattern recognition, estimation, and other problems.
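The reproducing property described above can be illustrated with a minimal sketch. The Beta-Bernoulli pair below is a hypothetical running example chosen for concreteness (the paper treats the general case): the a priori distribution Beta(a, b) and every a posteriori distribution have the same form, so learning reduces to recomputing two parameters, and the pair (number of successes, number of trials) is a sufficient statistic of fixed dimension.

```python
import random

def bayes_update(a, b, observations):
    """Fold a batch of Bernoulli observations into the Beta parameters
    via Bayes' rule; the posterior is again Beta, with new parameters."""
    k = sum(observations)           # number of successes
    n = len(observations)
    return a + k, b + (n - k)       # same form, updated parameters

random.seed(0)
true_p = 0.3                        # the unknown parameter to be learned
a, b = 1.0, 1.0                     # uniform a priori distribution

for _ in range(100):                # repeated experiments, in batches of 50
    batch = [1 if random.random() < true_p else 0 for _ in range(50)]
    a, b = bayes_update(a, b, batch)

posterior_mean = a / (a + b)
posterior_var = a * b / ((a + b) ** 2 * (a + b + 1))
print(posterior_mean)               # concentrates near true_p
print(posterior_var)                # shrinks toward zero: the a posteriori
                                    # distribution tends toward a delta function
```

As the number of experiments grows, the posterior variance vanishes, reflecting the limiting behavior stated in the abstract.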