Determining the value of additional surrogate exposure data for improving the estimate of an odds ratio

15 December 1995

journal article
Published by Wiley in Statistics in Medicine

Vol. 14 (23), 2581-2598
https://doi.org/10.1002/sim.4780142307

Abstract

We consider the design of both cohort and case‐control studies in which an initial (‘stage 1’) sample is available with complete data on an error‐free disease indicator (D), a correct (‘gold‐standard’) dichotomous exposure measurement (X) and an error‐prone exposure measurement (Z). We calculate the amount of additional information on the odds ratio relating D to X that one can obtain from a second (‘stage 2’) sample of measurements only on D and Z. If one allows for differential measurement error in Z, there is often little advantage in having more than four times as much data in stage 2 as in stage 1. If a non‐differential measurement error model is reasonable, larger amounts of stage 2 data can be useful. Simulations indicate that stage 1 samples of modest size (50 cases in case‐control studies and 50 failures in cohort studies) yield sufficiently reliable estimates of the needed parameters to help determine an appropriate size for the stage 2 sample. These ideas apply either in settings where the amount of stage 1 data is limited and fixed by external constraints, or where one has gathered stage 1 data in advance to avoid collecting superfluous stage 2 data.
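The stage 1 / stage 2 trade-off described above can be illustrated with a short planning calculation. This is a minimal sketch under stated assumptions, not the procedure from the article: it assumes a case-control design with a binary surrogate Z, estimates P(X=1) separately within cases and within controls (which permits differential error) using the classical double-sampling estimator p̂ = Σ_z π̂_z q̂_z, where π̂_z = P̂(Z=z) comes from all stage 1 + stage 2 subjects and q̂_z = P̂(X=1 | Z=z) comes from stage 1 only, and applies delta-method variance approximations. The function names, the ratio `k` (stage 2 size as a multiple of stage 1 size in each group), and the example counts are all hypothetical.

```python
# Sketch only (assumptions): case-control design, binary Z and X, both Z
# levels observed in stage 1 of each group, 0 < P(X=1) < 1 in each group.

def group_params(counts):
    """Return (n, pi1, q0, q1) from hypothetical stage 1 counts of one group.

    counts[z][x] = number of stage 1 subjects with surrogate Z=z and
    gold-standard X=x. pi1 = P(Z=1) and q_z = P(X=1 | Z=z) are estimated
    within this disease group only, so the Z-X relationship may differ
    between cases and controls (differential error).
    """
    n_z = [counts[z][0] + counts[z][1] for z in (0, 1)]
    n = n_z[0] + n_z[1]
    q = [counts[z][1] / n_z[z] for z in (0, 1)]
    return n, n_z[1] / n, q[0], q[1]


def var_logit_p(n, pi1, q0, q1, m):
    """Approximate Var(logit p_hat) for p_hat = sum_z pi_hat_z * q_hat_z,
    with pi_hat from all n + m subjects and q_hat from the n stage 1
    subjects (delta method, treating n_z as roughly n * pi_z)."""
    pi0 = 1.0 - pi1
    p = pi0 * q0 + pi1 * q1
    within = (pi0 * q0 * (1 - q0) + pi1 * q1 * (1 - q1)) / n  # from q_hat; not reduced by stage 2
    between = pi0 * pi1 * (q1 - q0) ** 2 / (n + m)             # from pi_hat; shrinks as m grows
    return (within + between) / (p * (1 - p)) ** 2


def var_log_or(case_counts, control_counts, k):
    """Var(log OR_hat) when each group adds k times its stage 1 size in stage 2."""
    total = 0.0
    for counts in (case_counts, control_counts):  # cases and controls are independent
        n, pi1, q0, q1 = group_params(counts)
        total += var_logit_p(n, pi1, q0, q1, k * n)
    return total


if __name__ == "__main__":
    cases = ((30, 10), (5, 55))     # hypothetical stage 1 counts, 100 cases
    controls = ((60, 8), (12, 20))  # hypothetical stage 1 counts, 100 controls
    base = var_log_or(cases, controls, 0)
    for k in (0, 1, 2, 4, 8, 16, float("inf")):
        print(f"k={k}: relative variance {var_log_or(cases, controls, k) / base:.3f}")
```

With k = 0 the projection reduces to Woolf's variance 1/a + 1/b + 1/c + 1/d for the D-by-X table from stage 1 alone. As k grows, only the `between` term shrinks, so the variance approaches a floor set by how well stage 1 pins down P(X=1 | Z, D). The printed ratios therefore indicate how quickly extra surrogate-only data stops paying off for a given stage 1 sample under these assumptions.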
