Bias and efficiency of multiple imputation compared with complete‐case analysis for missing covariate values

13 September 2010

journal article
research article
Published by Wiley in Statistics in Medicine

Vol. 29 (28), 2920-2931
https://doi.org/10.1002/sim.3944

Abstract

When missing data occur in one or more covariates in a regression model, multiple imputation (MI) is widely advocated as an improvement over complete‐case analysis (CC). We use theoretical arguments and simulation studies to compare these methods with MI implemented under a missing at random assumption. When data are missing completely at random, both methods have negligible bias, and MI is more efficient than CC across a wide range of scenarios. For other missing data mechanisms, bias arises in one or both methods. In our simulation setting, CC is biased towards the null when data are missing at random. However, when missingness is independent of the outcome given the covariates, CC has negligible bias and MI is biased away from the null. With more general missing data mechanisms, bias tends to be smaller for MI than for CC. Since MI is not always better than CC for missing covariate problems, the choice of method should take into account what is known about the missing data mechanism in a particular substantive application. Importantly, the choice of method should not be based on comparison of standard errors. We propose new ways to understand empirical differences between MI and CC, which may provide insights into the appropriateness of the assumptions underlying each method, and we propose a new index for assessing the likely gain in precision from MI: the fraction of incomplete cases among the observed values of a covariate (FICO). Copyright © 2010 John Wiley & Sons, Ltd.
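
The abstract defines FICO only in words: the fraction of incomplete cases among the observed values of a covariate. A minimal sketch of that definition follows, assuming a case is "incomplete" when any variable in the analysis model is missing. The function name `fico`, the variable names, and the toy data are illustrative assumptions, not taken from the paper.

```python
from typing import Dict, List, Optional

# One record per individual; None marks a missing value (an assumption of this sketch).
Row = Dict[str, Optional[float]]


def fico(rows: List[Row], covariate: str, model_vars: List[str]) -> float:
    """Fraction of incomplete cases among rows where `covariate` is observed."""
    # Restrict to individuals with an observed value of the covariate.
    observed = [r for r in rows if r[covariate] is not None]
    if not observed:
        raise ValueError(f"no observed values for {covariate!r}")
    # Among those, count cases missing at least one variable of the analysis model.
    incomplete = [r for r in observed if any(r[v] is None for v in model_vars)]
    return len(incomplete) / len(observed)


# Hypothetical toy data: outcome y and two covariates x1, x2.
rows = [
    {"y": 1.0, "x1": 0.5, "x2": 2.0},
    {"y": 0.0, "x1": 1.5, "x2": None},
    {"y": 1.0, "x1": None, "x2": 3.0},
    {"y": 0.0, "x1": 0.2, "x2": None},
    {"y": 1.0, "x1": 0.9, "x2": 1.0},
]
model_vars = ["y", "x1", "x2"]

fico_x1 = fico(rows, "x1", model_vars)  # x1 observed in 4 rows, 2 of them incomplete
fico_x2 = fico(rows, "x2", model_vars)  # x2 observed in 3 rows, 1 of them incomplete
print(f"FICO(x1) = {fico_x1:.3f}, FICO(x2) = {fico_x2:.3f}")
```

In this toy data, complete-case analysis would discard two of the four observed values of `x1` but only one of the three observed values of `x2`. The abstract proposes the index as a guide to the likely gain in precision from MI, and it warns separately that the choice between MI and CC should not rest on a comparison of standard errors.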

Cited by 585 articles