Logistic regression of family data from retrospective study designs

7 October 2003

journal article
Published by Wiley in Genetic Epidemiology

Vol. 25 (3), 177-189
https://doi.org/10.1002/gepi.10267

Abstract

We wish to study the effects of genetic and environmental factors on disease risk, using data from families ascertained because they contain multiple cases of the disease. To do so, we must account for the way participants were ascertained, and for within‐family correlations in both disease occurrences and covariates. We model the joint probability distribution of the covariates of ascertained family members, given family disease occurrence and pedigree structure. We describe two such covariate models: the random effects model and the marginal model. Both models assume a logistic form for the distribution of one person's covariates that involves a vector β of regression parameters. The components of β in the two models have different interpretations, and they differ in magnitude when the covariates are correlated within families. We describe ascertainment assumptions needed to estimate consistently the parameters β_RE in the random effects model and the parameters β_M in the marginal model. Under the ascertainment assumptions for the random effects model, we show that conditional logistic regression (CLR) of matched family data gives a consistent estimate for β_RE and a consistent estimate for the covariance matrix of that estimate. Under the ascertainment assumptions for the marginal model, we show that unconditional logistic regression (ULR) gives a consistent estimate for β_M, and we give a consistent estimator for its covariance matrix. The random effects/CLR approach is simple to use and to interpret, but it can use data only from families containing both affected and unaffected members. The marginal/ULR approach uses data from all individuals, but its variance estimates require special computations. A C program to compute these variance estimates is available at http://www.stanford.edu/dept/HRP/epidemiology.
We illustrate these pros and cons by application to data on the effects of parity on ovarian cancer risk in mother/daughter pairs, and use simulations to study the performance of the estimates.
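
The restriction of CLR to families containing both affected and unaffected members can be illustrated with a minimal sketch. It assumes a single scalar covariate and the standard conditional likelihood for logistic regression stratified by family, which conditions on the number of affected members in each family. The function name `family_conditional_loglik` and the toy values are hypothetical; this is not the article's C program or its estimator code.

```python
from itertools import combinations
import math


def family_conditional_loglik(beta, covariates, affected):
    """Conditional log-likelihood contribution of one family (scalar covariate).

    Conditioning on the number of affected members removes any
    family-specific intercept from the logistic model.
    """
    n_cases = sum(affected)
    # Linear predictor summed over the members who are actually affected.
    observed = sum(beta * x for x, d in zip(covariates, affected) if d)
    # Sum over every way of placing n_cases cases among the family members.
    denom = sum(
        math.exp(sum(beta * x for x in subset))
        for subset in combinations(covariates, n_cases)
    )
    return observed - math.log(denom)


# Hypothetical mother/daughter pair: covariate values 2 and 1.
discordant = family_conditional_loglik(0.5, [2.0, 1.0], [1, 0])
both_affected = family_conditional_loglik(0.5, [2.0, 1.0], [1, 1])
none_affected = family_conditional_loglik(0.5, [2.0, 1.0], [0, 0])
```

In this sketch the contribution of a family whose members are all affected, or all unaffected, equals zero (up to rounding) for every value of `beta`, so such families carry no information about β_RE. Only the discordant pair yields a contribution that varies with `beta`.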
