Insights into latent class analysis of diagnostic test performance

Abstract
Latent class analysis is used to assess diagnostic test accuracy when a gold-standard assessment of disease is not available but results of multiple imperfect tests are. We consider the simplest setting, where 3 tests are observed and conditional independence (CI) is assumed. Closed-form expressions for maximum likelihood parameter estimates are derived. They show explicitly how observed 2- and 3-way associations between test results are used to infer disease prevalence and test true- and false-positive rates. Although interesting and reasonable under CI, the estimators have no basis when CI fails. Intuition for the bias induced by conditional dependence follows from the analytic expressions. Further intuition derives from an Expectation Maximization (EM) approach to calculating the estimates. We discuss implications of our results and related work for settings where more than 3 tests are available. We conclude that careful justification of assumptions about the dependence between tests in diseased and nondiseased subjects is necessary to ensure unbiased estimates of prevalence and test operating characteristics and to give these estimates a clinical interpretation. Such justification must be based in part on a clear clinical definition of disease and biological knowledge about mechanisms giving rise to test results.
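To make the EM approach mentioned above concrete, the following is a minimal sketch of EM for a two-class latent class model with 3 binary tests under CI. It is an illustration under stated assumptions, not the derivation or implementation used in this paper: the function names (`em_lca_three_tests`, `expected_counts`), the starting values, and the fixed iteration count are all hypothetical choices made for the example.

```python
import itertools

import numpy as np

# All 8 result patterns (t1, t2, t3) for three binary tests.
PATTERNS = list(itertools.product([0, 1], repeat=3))


def pattern_probs(prev, tpr, fpr):
    """Joint probability of each pattern with each class, assuming CI.

    Returns two length-8 arrays: P(pattern, diseased) and
    P(pattern, nondiseased).
    """
    t = np.array(PATTERNS)
    tpr = np.asarray(tpr, dtype=float)
    fpr = np.asarray(fpr, dtype=float)
    p_dis = prev * np.prod(tpr**t * (1 - tpr) ** (1 - t), axis=1)
    p_non = (1 - prev) * np.prod(fpr**t * (1 - fpr) ** (1 - t), axis=1)
    return p_dis, p_non


def expected_counts(prev, tpr, fpr, n=10000):
    """Hypothetical helper: expected cell counts under the CI model."""
    p_dis, p_non = pattern_probs(prev, tpr, fpr)
    return dict(zip(PATTERNS, n * (p_dis + p_non)))


def em_lca_three_tests(counts, n_iter=2000, prev=0.5,
                       tpr=(0.8, 0.8, 0.8), fpr=(0.2, 0.2, 0.2)):
    """EM for prevalence, true-positive and false-positive rates.

    `counts` maps each pattern (t1, t2, t3) to its observed count.
    Starting values with tpr > fpr are an assumption that fixes which
    latent class is labelled "diseased".
    """
    t = np.array(PATTERNS)
    n = np.array([counts.get(p, 0.0) for p in PATTERNS], dtype=float)
    tpr = np.array(tpr, dtype=float)
    fpr = np.array(fpr, dtype=float)
    for _ in range(n_iter):
        # E-step: posterior probability of disease for each pattern.
        p_dis, p_non = pattern_probs(prev, tpr, fpr)
        w = p_dis / (p_dis + p_non)
        # M-step: weighted proportions, treating w as fractional
        # disease status for the subjects in each cell.
        n_dis = n * w
        n_non = n * (1 - w)
        prev = n_dis.sum() / n.sum()
        tpr = (n_dis[:, None] * t).sum(axis=0) / n_dis.sum()
        fpr = (n_non[:, None] * t).sum(axis=0) / n_non.sum()
    return prev, tpr, fpr


# Usage with synthetic (not real) data generated from chosen parameters.
synthetic = expected_counts(0.3, (0.9, 0.8, 0.85), (0.1, 0.05, 0.2))
prev_hat, tpr_hat, fpr_hat = em_lca_three_tests(synthetic)
```

Because the synthetic counts here are generated from a model that satisfies CI, the fitted values should sit close to the generating parameters. If the counts came instead from a process with conditional dependence between tests, the same updates would still converge, but nothing would tie the limiting values to the true prevalence or test operating characteristics.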