Abstract
Information in claims databases resides in data patterns rather than in data elements. Finding this information requires new terminology, a willingness to pose questions of form rather than specific hypotheses, and a quality control system that elevates the correctness of data relations above the validity of single facts. The language of claims data is a newspeak of CPT (Current Procedural Terminology), HCPCS (Health Care Financing Administration Common Procedure Coding System), ICD (International Classification of Diseases), and, for pharmaceuticals, NDC (National Drug Code). The techniques of pattern discovery are really ways of asking the data for classes of relations, and they vary in their reliance on external information. Sometimes the question is entirely constrained by preceding factors. At other times we may recast the natural history of disease into a claims context and ask the data to give us the shape of disease evolution. We can use highly automated systems to evaluate the relations between prespecified factors, or empirical techniques to search out common relations that we have not specified in advance. Using massive data sets requires that quality control correspond to the nature of the high-level information that we derive from large databases. Copyright © 2001 John Wiley & Sons, Ltd.