The effects of model selection on confidence intervals for the size of a closed population

Abstract
The literature contains estimates of the rates of some genetic and congenital disorders that are based on log-linear methods for modelling possible interactions among sources. Often the analyst chooses the simplest model consistent with the data to estimate the size of a closed population, and calculates confidence intervals on the assumption that this simple model is correct. However, even when such a model appears to fit the data excellently, the resulting confidence intervals may well be misleading, because their actual coverage probability can fall short of the nominal level. We illustrate this with a simulation for a hypothetical population based on data reported in the literature from three sources. The simulated nominal 95 per cent confidence intervals contained the modelled population size only 30 per cent of the time. Use of the simpler model's interval is justified only if external considerations support its assumptions about the interactions among sources.
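The coverage problem can be illustrated with a much-simplified sketch. This is not the simulation reported here: it skips the model-selection step entirely and always fits the three-source independence model, and the population size, capture probabilities, and function names are hypothetical values chosen for illustration. The interval is a Wald interval based on the standard large-sample variance for the independence model. Under these assumptions, the sketch compares coverage when the sources really are independent with coverage when two of them are positively dependent.

```python
import math
import random


def independence_estimate(n_sources, n_any):
    """Return (N_hat, (lower, upper)) assuming mutually independent sources.

    n_sources: number of cases found by each source
    n_any:     number of distinct cases found by at least one source
    """
    def f(N):
        # Zero when 1 - n_any/N equals the product of per-source miss fractions.
        return (1.0 - n_any / N) - math.prod(1.0 - n / N for n in n_sources)

    lo, hi = float(n_any), 2.0 * n_any
    while f(hi) <= 0.0:            # expand until the root is bracketed
        hi *= 2.0
        if hi > 1e12:              # no overlap between sources: no finite estimate
            return math.nan, (math.nan, math.nan)
    for _ in range(200):           # bisection on the likelihood equation
        mid = 0.5 * (lo + hi)
        if f(mid) > 0.0:
            hi = mid
        else:
            lo = mid
    n_hat = 0.5 * (lo + hi)

    # Large-sample variance under independence, with q_i = 1 - n_i / N_hat.
    q = [1.0 - n / n_hat for n in n_sources]
    var = n_hat / (1.0 / math.prod(q) + len(q) - 1 - sum(1.0 / qi for qi in q))
    half = 1.96 * math.sqrt(var)
    return n_hat, (n_hat - half, n_hat + half)


def coverage(N, p1, p2_given_x1, p3, reps, rng):
    """Fraction of nominal 95% intervals that contain the true N.

    p2_given_x1 = (P(source 2 | not in source 1), P(source 2 | in source 1));
    unequal entries make sources 1 and 2 dependent.
    """
    hits = 0
    for _ in range(reps):
        n1 = n2 = n3 = n_any = 0
        for _person in range(N):
            x1 = rng.random() < p1
            x2 = rng.random() < p2_given_x1[x1]
            x3 = rng.random() < p3
            n1 += x1
            n2 += x2
            n3 += x3
            n_any += x1 or x2 or x3
        _, (lower, upper) = independence_estimate([n1, n2, n3], n_any)
        hits += lower <= N <= upper
    return hits / reps


rng = random.Random(0)
# Hypothetical settings: N = 1000, 300 replicates each.
cov_indep = coverage(1000, 0.3, (0.4, 0.4), 0.5, 300, rng)    # sources independent
cov_dep = coverage(1000, 0.3, (0.25, 0.7), 0.5, 300, rng)     # sources 1 and 2 dependent
print(f"coverage, independent sources: {cov_indep:.2f}")
print(f"coverage, dependent sources:   {cov_dep:.2f}")
```

In this toy setting the independence model is misspecified by construction in the second run, so the comparison shows only the mechanism: an interval computed as if the simple model were correct takes no account of the possibility that it is not.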