Uncertainties in Identifying Responsible Pollutants in Observational Epidemiology Studies

Abstract
Studies of community air pollution must deal with a complex mixture of substances for which the available data on concentrations and their distributions vary greatly in completeness and accuracy. The monitoring database available for some pollutants (such as suspended particulate matter) far exceeds that available for others (such as carbon monoxide or nitrogen dioxide) in terms of spatial and temporal coverage. Little or no routine monitoring data are available on aeroallergens or on particles classified by size and chemistry, for example. In addition, the relationships between outdoor air concentrations and personal exposures vary by chemical species. This article addresses the concern that the availability and quality of observed data may limit the validity of the conclusions that can be derived from retrospective studies. The basic assumptions of multiple regression analysis, the statistical tool most commonly used to study the effects of air quality on health, are reviewed. We show by data simulation and by numerical experiments with mortality and air quality data from Philadelphia that differences in the reliability of exposure estimates can be critical in the implied relationships between correlated variables in multiple (joint) regressions. Further, measurement error obscures the true degree of collinearity that may actually be present. Finally, we consider how nonlinear transformations can affect judgments about the relative importance of the variables considered. While models based on linear pollution relationships may be facile and convenient for characterizing effects, we have no assurance that they are in fact correct. Resolution of these issues will require better population-based air quality monitoring data as well as laboratory studies appropriate to characterizing the nature of the implied biological responses to the mixtures and concentrations that currently comprise community air quality.
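
The measurement-error argument can be illustrated with a small simulation. The sketch below is not the authors' simulation or their Philadelphia analysis; it is a minimal illustration under assumed conditions. Two hypothetical standardized exposures share a true correlation of 0.8, only the first affects the outcome, and the first is observed with much larger classical (additive, independent) error than the second. The function name `simulate` and all parameter values are assumptions chosen for illustration.

```python
import numpy as np


def simulate(n=20000, rho=0.8, sd_err1=1.0, sd_err2=0.1, seed=0):
    """Joint regression of an outcome on two error-prone, correlated exposures.

    All values are illustrative assumptions, not taken from the article.
    Returns (coefficient on exposure 1, coefficient on exposure 2,
    observed correlation between the two measured exposures).
    """
    rng = np.random.default_rng(seed)

    # True exposures: unit variance, correlation rho.
    cov = [[1.0, rho], [rho, 1.0]]
    x = rng.multivariate_normal([0.0, 0.0], cov, size=n)

    # Assumed truth: only exposure 1 affects the outcome (slope 1, slope 0).
    y = 1.0 * x[:, 0] + rng.normal(0.0, 1.0, n)

    # Observed exposures: exposure 1 is measured poorly, exposure 2 well.
    z1 = x[:, 0] + rng.normal(0.0, sd_err1, n)
    z2 = x[:, 1] + rng.normal(0.0, sd_err2, n)

    # Ordinary least squares with an intercept and both measured exposures.
    design = np.column_stack([np.ones(n), z1, z2])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)

    observed_corr = np.corrcoef(z1, z2)[0, 1]
    return coef[1], coef[2], observed_corr


if __name__ == "__main__":
    b1, b2, r = simulate()
    print(f"coefficient on poorly measured causal exposure: {b1:.2f}")
    print(f"coefficient on well measured non-causal exposure: {b2:.2f}")
    print(f"observed correlation between measured exposures: {r:.2f}")
```

Under these assumptions, the large-sample coefficients are roughly 0.27 for the causal but poorly measured exposure and roughly 0.58 for the non-causal but well measured one, so the joint regression assigns most of the apparent effect to the wrong variable. The observed correlation between the measured exposures is also roughly 0.56 rather than the true 0.8, which is the sense in which measurement error can hide collinearity.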