Incomplete Data in Generalized Linear Models

Abstract
This article examines incomplete data for the class of generalized linear models, in which incompleteness is due to partially missing covariates on some observations. Under the assumption that the missing data are missing at random, it is shown that the E step of the EM algorithm for any generalized linear model can be expressed as a weighted complete data log-likelihood when the unobserved covariates are assumed to come from a discrete distribution with finite range. Expressing the E step in this manner allows for a straightforward maximization in the M step, thus leading to maximum likelihood estimates (MLE's) for the parameters. Asymptotic variances of the MLE's are also derived, and results are illustrated with two examples.