Multiple imputation in health‐care databases: An overview and some applications

Abstract
Multiple imputation for non‐response replaces each missing value by two or more plausible values. The values can be chosen to represent both uncertainty about the reasons for non‐response and uncertainty about which values to impute assuming the reasons for non‐response are known. This paper provides an overview of methods for creating and analysing multiply‐imputed data sets, and illustrates the dramatic improvements possible when using multiple rather than single imputation. A major application of multiple imputation to public‐use files from the 1970 census is discussed, and several exploratory studies related to health care that have used multiple imputation are described.
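As a minimal illustration of what analysing multiply‐imputed data sets involves, the sketch below applies the standard combining rules usually associated with multiple imputation: the same complete‐data analysis is run on each of the m imputed data sets, the point estimates are averaged, and the total variance adds the average within‐imputation variance to an inflated between‐imputation variance. The function name `combine_estimates` and the input numbers are hypothetical, chosen only for illustration. They are not taken from the paper, and the sketch is not a reproduction of the paper's own procedure.

```python
import math
from statistics import mean, variance


def combine_estimates(estimates, variances):
    """Combine m complete-data results into one multiple-imputation result.

    estimates: point estimates, one per imputed data set
    variances: the corresponding complete-data variance estimates
    Returns (pooled estimate, total variance, standard error).
    """
    m = len(estimates)
    if m < 2 or len(variances) != m:
        raise ValueError("need at least two imputations and matching inputs")

    q_bar = mean(estimates)        # pooled point estimate
    w_bar = mean(variances)        # average within-imputation variance
    b = variance(estimates)        # between-imputation variance (divisor m - 1)
    total = w_bar + (1 + 1 / m) * b  # total variance of the pooled estimate
    return q_bar, total, math.sqrt(total)


# Hypothetical results from m = 3 imputed data sets (illustrative numbers only).
q_bar, total, se = combine_estimates([10.0, 12.0, 11.0], [4.0, 4.0, 4.0])
print(q_bar, total, se)
```

The between‐imputation term is what a single imputation cannot supply: with only one imputed data set, the variance attributable to not knowing the missing values is silently treated as zero, so standard errors are too small.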