Logistic regression of family data from retrospective study designs
- 7 October 2003
- journal article
- Published by Wiley in Genetic Epidemiology
- Vol. 25 (3), 177-189
- https://doi.org/10.1002/gepi.10267
Abstract
We wish to study the effects of genetic and environmental factors on disease risk, using data from families ascertained because they contain multiple cases of the disease. To do so, we must account for the way participants were ascertained, and for within‐family correlations in both disease occurrences and covariates. We model the joint probability distribution of the covariates of ascertained family members, given family disease occurrence and pedigree structure. We describe two such covariate models: the random effects model and the marginal model. Both models assume a logistic form for the distribution of one person's covariates that involves a vector β of regression parameters. The components of β in the two models have different interpretations, and they differ in magnitude when the covariates are correlated within families. We describe ascertainment assumptions needed to estimate consistently the parameters β_RE in the random effects model and the parameters β_M in the marginal model. Under the ascertainment assumptions for the random effects model, we show that conditional logistic regression (CLR) of matched family data gives a consistent estimate for β_RE and a consistent estimate for the covariance matrix of that estimate. Under the ascertainment assumptions for the marginal model, we show that unconditional logistic regression (ULR) gives a consistent estimate for β_M, and we give a consistent estimator for its covariance matrix. The random effects/CLR approach is simple to use and to interpret, but it can use data only from families containing both affected and unaffected members. The marginal/ULR approach uses data from all individuals, but its variance estimates require special computations. A C program to compute these variance estimates is available at http://www.stanford.edu/dept/HRP/epidemiology.
We illustrate these pros and cons by application to data on the effects of parity on ovarian cancer risk in mother/daughter pairs, and use simulations to study the performance of the estimates. Genet Epidemiol 25:177–189, 2003.
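The remark that the CLR approach can use data only from families containing both affected and unaffected members can be illustrated with a minimal sketch. It assumes the simplest setting, two-member families and a single scalar covariate, and uses the standard matched-set conditional likelihood; it is not the authors' C program, and the function names and toy data are hypothetical.

```python
import math

def clr_pair_loglik(beta, pairs):
    """Conditional log-likelihood for two-member families.

    Each family is ((x1, d1), (x2, d2)): covariate x and disease status d (0/1).
    Conditioning on the number of cases in the family, a concordant family
    (both affected or both unaffected) has conditional probability 1, so it
    adds nothing to the log-likelihood.
    """
    total = 0.0
    for (x1, d1), (x2, d2) in pairs:
        if d1 == d2:
            continue  # concordant family: no information about beta
        x_case, x_ctrl = (x1, x2) if d1 == 1 else (x2, x1)
        # log of exp(beta*x_case) / (exp(beta*x_case) + exp(beta*x_ctrl))
        total -= math.log1p(math.exp(-beta * (x_case - x_ctrl)))
    return total

def fit_clr_pairs(pairs, n_iter=25):
    """Maximize the conditional likelihood over scalar beta by Newton's method."""
    # Only discordant families enter; each contributes x_case - x_ctrl.
    diffs = [(x1 - x2) if d1 == 1 else (x2 - x1)
             for (x1, d1), (x2, d2) in pairs if d1 != d2]
    beta = 0.0
    for _ in range(n_iter):
        score, info = 0.0, 0.0
        for d in diffs:
            p = 1.0 / (1.0 + math.exp(-beta * d))
            score += d * (1.0 - p)        # first derivative of log-likelihood
            info += d * d * p * (1.0 - p)  # minus the second derivative
        if info == 0.0:
            break  # no informative families
        beta += score / info
    return beta

# Hypothetical toy data: binary exposure x, disease status d.
toy_pairs = [
    ((1, 1), (0, 0)),  # case exposed, relative unexposed
    ((1, 1), (0, 0)),
    ((0, 1), (1, 0)),  # case unexposed, relative exposed
    ((1, 1), (1, 1)),  # both affected: ignored by CLR
    ((0, 0), (1, 0)),  # both unaffected: ignored by CLR
]
beta_hat = fit_clr_pairs(toy_pairs)
```

With a binary covariate, the fitted value is the log of the ratio of the two kinds of exposure-discordant families (here 2 to 1). The two concordant families leave both the likelihood and the estimate unchanged, which is the data loss the abstract attributes to the random effects/CLR approach.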