Evaluating the Coverage of Controlled Health Data Terminologies: Report on the Results of the NLM/AHCPR Large Scale Vocabulary Test
Open Access
- 1 November 1997
- journal article
- Published by Oxford University Press (OUP) in Journal of the American Medical Informatics Association
- Vol. 4 (6), 484-500
- https://doi.org/10.1136/jamia.1997.0040484
Abstract
Objective: To determine the extent to which a combination of existing machine-readable health terminologies covers the concepts and terms needed for a comprehensive controlled vocabulary for health information systems by carrying out a distributed national experiment using the Internet and the UMLS Knowledge Sources, lexical programs, and server.
Methods: Using a specially designed Web-based interface to the UMLS Knowledge Source Server, participants searched the more than 30 vocabularies in the 1996 UMLS Metathesaurus and three planned additions to determine if concepts for which they desired controlled terminology were present or absent. For each term submitted, the interface presented a candidate exact match or a set of potential approximate matches from which the participant selected the most closely related concept. The interface captured a profile of the terms submitted by the participant and, for each term searched, information about the concept (if any) selected by the participant. The term information was loaded into a database at NLM for review and analysis and was also available to be downloaded by the participant. A team of subject experts reviewed records to identify matches missed by participants and to correct any obvious errors in relationships. The editors of SNOMED International and the Read Codes were given a random sample of reviewed terms for which exact meaning matches were not found to identify exact matches that were missed or any valid combinations of concepts that were synonymous with input terms. The 1997 UMLS Metathesaurus was used in the semantic type and vocabulary source analysis because it included most of the three planned additions.
Results: Sixty-three participants submitted a total of 41,127 terms, which represented 32,679 normalized strings. More than 80% of the terms submitted were wanted for parts of the patient record related to the patient's condition. Following review, 58% of all submitted terms had exact meaning matches in the controlled vocabularies in the test, 41% had related concepts, and 1% were not found. Of the 28% of the terms which were narrower in meaning than a concept in the controlled vocabularies, 86% shared lexical items with the broader concept, but had additional modification. The percentage of exact meaning matches varied by specialty from 45% to 71%. Twenty-nine different vocabularies contained meanings for some of the 23,837 terms (a maximum of 12,707 discrete concepts) with exact meaning matches. Based on preliminary data and analysis, of the individual vocabularies, SNOMED International and the Read Codes had more than 60% of the terms and more than 50% of the concepts.
Conclusions: The combination of existing controlled vocabularies included in the test represents the meanings of the majority of the terminology needed to record patient conditions, providing substantially more exact matches than any individual vocabulary in the set. From a technical and organizational perspective, the test was successful and should serve as a useful model, both for distributed input to the enhancement of controlled vocabularies and for other kinds of collaborative informatics research.
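The counts reported in the Results can be cross-checked with a few lines of arithmetic. The sketch below is illustrative only and is not part of the study's analysis: it uses the four counts stated in the abstract, and the variable names and the `pct` helper are hypothetical. Because 12,707 is given as a maximum number of discrete concepts, the terms-per-concept ratio it computes is a lower bound.

```python
# Counts taken directly from the abstract; names are illustrative, not the authors'.
TERMS_SUBMITTED = 41_127      # total terms submitted by the 63 participants
NORMALIZED_STRINGS = 32_679   # distinct normalized strings those terms represent
EXACT_MATCH_TERMS = 23_837    # terms with exact meaning matches after review
MAX_EXACT_CONCEPTS = 12_707   # upper bound on discrete concepts behind those matches


def pct(part: int, whole: int) -> float:
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


# Share of submitted terms with an exact meaning match (abstract reports 58%).
exact_share = pct(EXACT_MATCH_TERMS, TERMS_SUBMITTED)

# Average number of submitted terms per normalized string.
terms_per_string = TERMS_SUBMITTED / NORMALIZED_STRINGS

# Lower bound on the average number of matched terms per concept,
# because the concept count is stated as a maximum.
min_terms_per_concept = EXACT_MATCH_TERMS / MAX_EXACT_CONCEPTS

print(f"exact-match share: {exact_share:.1f}%")
print(f"terms per normalized string: {terms_per_string:.2f}")
print(f"terms per concept (at least): {min_terms_per_concept:.2f}")
```

The exact-match share rounds to the 58% stated in the abstract, which confirms that the 23,837 figure is the count of exactly matched terms out of all 41,127 submissions.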