Computer Algorithm for Automated Work Group Classification From Free Text: The DREAM Technique
- 1 January 2007
- journal article
- research article
- Published by Wolters Kluwer Health in Journal of Occupational and Environmental Medicine
- Vol. 49 (1), 41-49
- https://doi.org/10.1097/01.jom.0000251826.37828.2e
Abstract
This study developed and tested a computer method to automatically assign subjects to aggregate work groups based on their free text work descriptions. The Double Root Extended Automated Matcher (DREAM) algorithm classifies individuals based on pairs of subjects' free text word roots in common with those of standard classification systems and several explicitly defined linkages between term roots and aggregates. DREAM effectively analyzed free text from 5887 participants in a multisite chronic obstructive pulmonary disease prevention study (Lung Health Study). For a test set of 533 cases, DREAM's classifications compared favorably with those of a four-human panel. The humans rated the accuracy of DREAM as good or better in 80% of the test cases. Automated text interpretation is a promising tool for analyzing large data sets for applications in data mining, research, and surveillance.
Work descriptive information is most useful when it can link an individual to aggregate entities that have occupational health relevance. Determining the appropriate group requires considerable expertise. This article describes a new method for making such assignments using a computer algorithm to reduce dependence on the limited number of occupational health experts. In addition, computer algorithms foster consistency of assignments.
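The abstract does not give implementation details, so the following is only a minimal illustrative sketch of the general idea it describes: matching pairs of word roots from a free text description against roots drawn from a reference classification, with a small table of explicit root-to-group linkages as a fallback. It is not the authors' DREAM code. The catalog entries, the linkage table, the suffix-stripping stemmer, and the scoring rule (a count of shared root pairs) are all assumptions made for illustration.

```python
import re
from itertools import combinations

# Hypothetical reference catalog (work group -> sample job titles).
# A real system would draw these from a standard classification.
CATALOG = {
    "construction": ["construction laborer", "carpenter helper construction"],
    "health care": ["registered nurse hospital", "nurse aide"],
    "transportation": ["truck driver", "bus driver"],
}

# Hypothetical explicit linkages from a single word root to a work group.
LINKS = {"weld": "manufacturing"}

SUFFIXES = ("ing", "ers", "er", "ed", "es", "s")


def root(word):
    """Crude suffix stripping; a stand-in for a real stemmer."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def roots(text):
    """Return the set of word roots found in a piece of free text."""
    return {root(w) for w in re.findall(r"[a-z]+", text.lower())}


def root_pairs(text):
    """Return all unordered pairs of distinct roots in the text."""
    return set(combinations(sorted(roots(text)), 2))


def classify(description):
    """Assign a work group, or None if nothing matches."""
    desc_pairs = root_pairs(description)
    scores = {}
    for group, titles in CATALOG.items():
        for title in titles:
            shared = len(desc_pairs & root_pairs(title))
            if shared:
                scores[group] = scores.get(group, 0) + shared
    if scores:
        # Highest count of shared root pairs wins; ties break alphabetically.
        return max(sorted(scores), key=scores.get)
    # Fall back to the explicit root-to-group linkages.
    for r in sorted(roots(description)):
        if r in LINKS:
            return LINKS[r]
    return None


if __name__ == "__main__":
    for text in ("Truck driver for a freight company", "Welder", "Astronaut"):
        print(text, "->", classify(text))
```

Requiring a pair of shared roots rather than a single shared word is what keeps a common term such as "driver" from matching every title that contains it. In this sketch a one-word description like "Welder" produces no pairs and is resolved only through the linkage table.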