Problems due to Small Samples and Sparse Data in Conditional Logistic Regression Analysis

Open Access

1 March 2000

Journal article (research article)
Published by Oxford University Press (OUP) in American Journal of Epidemiology

Vol. 151 (5), 531-539
https://doi.org/10.1093/oxfordjournals.aje.a010240

Abstract

Conditional logistic regression was developed to avoid “sparse-data” biases that can arise in ordinary logistic regression analysis. Nonetheless, it is a large-sample method that can exhibit considerable bias when certain types of matched sets are infrequent or when the model contains too many parameters. Sparse-data bias can cause misleading inferences about confounding, effect modification, dose response, and induction periods, and can interact with other biases. In this paper, the authors describe these problems in the context of matched case-control analysis and provide examples from a study of electrical wiring and childhood leukemia and a study of diet and glioma. The same problems can arise in any likelihood-based analysis, including ordinary logistic regression. The problems can be detected by careful inspection of data and by examining the sensitivity of estimates to category boundaries, variables in the model, and transformations of those variables. One can also apply various bias corrections or turn to methods less sensitive to sparse data than conditional likelihood, such as Bayesian and empirical-Bayes (hierarchical regression) methods. Am J Epidemiol 2000;151:531–9.
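The simplest instance of the sparse-data problem described above is a 1:1 matched case-control study with a single binary exposure. There, the conditional likelihood depends only on the discordant pairs, and the conditional maximum-likelihood estimate of the odds ratio is n10/n01. Here n10 counts pairs with an exposed case and an unexposed control, and n01 counts the reverse. That estimate is infinite whenever n01 = 0 and is biased away from the null when discordant pairs are few. The Python sketch below contrasts it with a Firth-type penalized estimate (equivalently, the posterior mode under a Jeffreys prior). For this one-parameter case, that estimate reduces to adding 1/2 to each discordant count. The function names are hypothetical, and this is one illustrative bias correction, not the authors' specific method.

```python
import math
from typing import Iterable, Tuple


def discordant_counts(pairs: Iterable[Tuple[bool, bool]]) -> Tuple[int, int]:
    """Count discordant 1:1 matched pairs given (case_exposed, control_exposed).

    Concordant pairs (both exposed or both unexposed) contribute nothing
    to the conditional likelihood, so they are ignored here.
    """
    n10 = n01 = 0
    for case_exposed, control_exposed in pairs:
        if case_exposed and not control_exposed:
            n10 += 1
        elif control_exposed and not case_exposed:
            n01 += 1
    return n10, n01


def conditional_mle_or(n10: int, n01: int) -> float:
    """Conditional MLE of the odds ratio for 1:1 pairs: n10 / n01.

    Returns inf when n01 == 0 and n10 > 0 (no finite maximum exists),
    and nan when there are no discordant pairs (no information).
    """
    if n01 == 0:
        return math.inf if n10 > 0 else math.nan
    return n10 / n01


def penalized_or(n10: int, n01: int) -> float:
    """Firth / Jeffreys-prior penalized estimate for the same likelihood.

    The conditional likelihood is p**n10 * (1 - p)**n01 with p = OR / (1 + OR);
    the Jeffreys penalty multiplies it by (p * (1 - p))**0.5, whose maximum
    gives OR = (n10 + 0.5) / (n01 + 0.5). It is always finite and is pulled
    toward the null relative to the unpenalized estimate.
    """
    return (n10 + 0.5) / (n01 + 0.5)


if __name__ == "__main__":
    # Hypothetical sparse data: 5 discordant pairs, all with an exposed case.
    sparse = [(True, False)] * 5 + [(True, True)] * 20 + [(False, False)] * 40
    n10, n01 = discordant_counts(sparse)
    print(n10, n01, conditional_mle_or(n10, n01), penalized_or(n10, n01))
```

With these hypothetical counts (n10 = 5, n01 = 0) the conditional MLE is infinite, while the penalized estimate is 5.5/0.5 = 11. With less sparse counts such as n10 = 6, n01 = 3 the two agree more closely (2.0 versus about 1.86).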

Cited by 271 articles