Training Products of Experts by Minimizing Contrastive Divergence

1 August 2002

journal article
Published by MIT Press in Neural Computation

Vol. 14 (8), 1771-1800
https://doi.org/10.1162/089976602760128018

Abstract

It is possible to combine multiple latent-variable models of the same data by multiplying their probability distributions together and then renormalizing. This way of combining individual “expert” models makes it hard to generate samples from the combined model but easy to infer the values of the latent variables of each expert, because the combination rule ensures that the latent variables of different experts are conditionally independent when given the data. A product of experts (PoE) is therefore an interesting candidate for a perceptual system in which rapid inference is vital and generation is unnecessary. Training a PoE by maximizing the likelihood of the data is difficult because it is hard even to approximate the derivatives of the renormalization term in the combination rule. Fortunately, a PoE can be trained using a different objective function called “contrastive divergence” whose derivatives with regard to the parameters can be approximated accurately and efficiently. Examples are presented of contrastive divergence learning using several types of expert on several types of data.
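The combination rule in the abstract (multiply the experts' distributions, then renormalize) can be illustrated with a minimal sketch. It assumes a toy setting that is not from the paper: two hypothetical experts, `expert_a` and `expert_b`, each given as a plain probability table over four discrete states, with no latent variables. The function name `product_of_experts` is likewise an illustrative choice.

```python
def product_of_experts(experts):
    """Multiply the experts' probabilities state by state, then renormalize."""
    n_states = len(experts[0])
    unnormalized = [1.0] * n_states
    for expert in experts:
        for i, p in enumerate(expert):
            unnormalized[i] *= p
    # Renormalization term: the sum of the unnormalized products over all states.
    z = sum(unnormalized)
    return [u / z for u in unnormalized]


# Two hypothetical experts over four discrete states (illustrative values only).
expert_a = [0.4, 0.4, 0.1, 0.1]
expert_b = [0.1, 0.4, 0.4, 0.1]

combined = product_of_experts([expert_a, expert_b])
print([round(p, 2) for p in combined])
```

In this toy case the state that both experts favor ends up with about 0.64 of the probability mass, more than either expert assigns on its own, so the product is sharper than its factors. Here the renormalization term is a sum over four states; over high-dimensional data the same sum ranges over every possible data vector, which is why the abstract describes its derivatives as hard to approximate.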

Cited by 2995 articles