Logistic Regression When the Outcome Is Measured with Uncertainty

Abstract
In epidemiologic research, logistic regression is often used to estimate the odds of an outcome of interest as a function of predictors. In some datasets, however, the outcome is measured with imperfect sensitivity and specificity. It is well known that the misclassification induced by such an imperfect diagnostic test leads to biased estimates of the odds ratios and their variances. In this paper, the authors show that when the sensitivity and specificity of a diagnostic test are known, it is straightforward to incorporate this information into the fitting of logistic regression models. An EM algorithm that produces unbiased estimates of the odds ratios and their variances is described. The resulting odds ratio estimates tend to be farther from the null but have greater variance than estimates found by ignoring the imperfections of the test. The method can be extended to the situation where the sensitivity and specificity differ across study subjects, i.e., differential misclassification. The method is useful even when the sensitivity and specificity are not known, as a sensitivity analysis showing how much various assumptions about sensitivity and specificity affect the estimates. The method can also be used to estimate sensitivity and specificity under certain assumptions or when a validation subsample is available. Several examples are provided to compare the results of this method with those obtained by standard logistic regression. A SAS macro that implements the method is available on the World Wide Web at http://som1.ab.umd.edu/Epidemiology/software.html

Am J Epidemiol 1997;146:195–203.
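The EM approach summarized above can be illustrated with a minimal Python sketch. It assumes a single known sensitivity (se) and specificity (sp) shared by all subjects, with se + sp > 1, so that the observed outcome Y* satisfies P(Y* = 1 | x) = (1 - sp) + (se + sp - 1) p(x), where p(x) is the logistic model for the true outcome. In the E-step, Bayes' rule gives each subject's probability that the true outcome is 1 given the observed test result. In the M-step, a logistic regression is fitted to those fractional outcomes. The function names (em_logistic_misclassified, observed_score, standard_errors) and the simulated data are hypothetical, and this is not the authors' SAS macro. The standard errors come from a numerically differentiated observed-data score, which is this sketch's own choice and not necessarily the variance method used in the paper.

```python
import numpy as np


def _expit(eta):
    # Numerically stable logistic function.
    return 0.5 * (1.0 + np.tanh(0.5 * eta))


def _weighted_logit_newton(X, w, beta, n_iter=25, tol=1e-10):
    # Newton-Raphson for sum_i [w_i log p_i + (1 - w_i) log(1 - p_i)].
    # w may be fractional; this is the M-step of the EM algorithm.
    for _ in range(n_iter):
        p = _expit(X @ beta)
        score = X.T @ (w - p)
        info = X.T @ (X * (p * (1.0 - p))[:, None])
        step = np.linalg.solve(info, score)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta


def em_logistic_misclassified(X, y_obs, se, sp, max_iter=1000, tol=1e-8):
    # Hypothetical helper: EM for logistic regression with a misclassified
    # binary outcome, given known sensitivity se and specificity sp.
    if se + sp <= 1.0:
        raise ValueError("need se + sp > 1 for the model to be identifiable")
    y_obs = np.asarray(y_obs, dtype=float)
    beta = _weighted_logit_newton(X, y_obs, np.zeros(X.shape[1]))  # naive start
    for _ in range(max_iter):
        p = _expit(X @ beta)
        # E-step: P(true Y = 1 | observed Y*, x) by Bayes' rule.
        w1 = se * p / (se * p + (1.0 - sp) * (1.0 - p))            # Y* = 1
        w0 = (1.0 - se) * p / ((1.0 - se) * p + sp * (1.0 - p))    # Y* = 0
        w = np.where(y_obs == 1.0, w1, w0)
        # M-step: logistic regression on the fractional outcomes w.
        new_beta = _weighted_logit_newton(X, w, beta)
        converged = np.max(np.abs(new_beta - beta)) < tol
        beta = new_beta
        if converged:
            break
    return beta


def observed_score(X, y_obs, beta, se, sp):
    # Score of the observed-data log-likelihood with
    # q(x) = P(Y* = 1 | x) = (1 - sp) + (se + sp - 1) * p(x).
    p = _expit(X @ beta)
    c = se + sp - 1.0
    q = (1.0 - sp) + c * p
    return X.T @ ((y_obs - q) * c * p * (1.0 - p) / (q * (1.0 - q)))


def standard_errors(X, y_obs, beta, se, sp, h=1e-5):
    # Observed information via central differences of the analytic score.
    k = beta.size
    hess = np.empty((k, k))
    for j in range(k):
        e = np.zeros(k)
        e[j] = h
        hess[:, j] = (observed_score(X, y_obs, beta + e, se, sp)
                      - observed_score(X, y_obs, beta - e, se, sp)) / (2 * h)
    hess = 0.5 * (hess + hess.T)  # symmetrize
    return np.sqrt(np.diag(np.linalg.inv(-hess)))


# Simulated example (hypothetical data): true slope 1.0, se = 0.85, sp = 0.90.
rng = np.random.default_rng(0)
n = 20000
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
true_beta = np.array([-0.5, 1.0])
y_true = rng.random(n) < _expit(X @ true_beta)
se, sp = 0.85, 0.90
u = rng.random(n)
# Observed test: positive with prob se if truly 1, with prob 1 - sp if truly 0.
y_obs = np.where(y_true, u < se, u >= sp).astype(float)

naive = _weighted_logit_newton(X, y_obs, np.zeros(2))  # ignores misclassification
beta_em = em_logistic_misclassified(X, y_obs, se, sp)
ses = standard_errors(X, y_obs, beta_em, se, sp)
print("naive:", naive, "EM:", beta_em, "SE:", ses)
```

In this simulation the naive slope is attenuated toward the null, while the EM estimate should lie near the true value with a larger standard error, mirroring the pattern described in the abstract. Setting se = sp = 1 makes the E-step weights equal the observed outcomes, so the procedure reduces to ordinary logistic regression.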