A study of abbreviations in the UMLS.

Research article (journal article), 1 January 2001, pp. 393-397.
Abstract
Abbreviations are widely used in medicine, and understanding them is important for medical language processing and information retrieval systems. The Unified Medical Language System (UMLS) contains a large number of abbreviations. We hypothesized that extracting and studying the UMLS abbreviations could help characterize abbreviations in medicine. In this paper, we describe a method for extracting abbreviations from the UMLS, evaluate the method, and examine the ambiguity of the extracted abbreviations. We also measure the coverage of the UMLS abbreviations in medical reports. Using our method, we extracted 163,666 unique (abbreviation, full form) pairs from the UMLS with a precision of 97.5% and a recall of 96%. The UMLS abbreviations were highly ambiguous: 33.1% of abbreviations with six or fewer characters had multiple meanings, and abbreviations with six or fewer characters had on average 2.28 different full forms. The coverage of the UMLS abbreviations in medical reports exceeded 66%.
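The two ambiguity figures and the precision/recall figures can be made concrete with a short computation. The following is a minimal sketch, not the authors' implementation. It assumes the extracted data is a list of (abbreviation, full form) string pairs, treats two full forms as different only if their strings differ exactly, and uses the hypothetical helper names `ambiguity_stats` and `precision_recall`. The precision/recall helper applies the standard set-based definitions against an assumed hand-labeled gold set, because the abstract does not describe the paper's actual evaluation protocol.

```python
from collections import defaultdict


def ambiguity_stats(pairs, max_len=6):
    """Return (fraction_ambiguous, mean_full_forms) for short abbreviations.

    Hypothetical helper. `pairs` is an iterable of (abbreviation, full_form)
    strings; duplicate pairs are collapsed. Only abbreviations with at most
    `max_len` characters are counted. Full forms are compared as exact
    strings (an assumption: no case or spelling normalization is applied).
    """
    senses = defaultdict(set)  # abbreviation -> set of distinct full forms
    for abbr, full in pairs:
        if len(abbr) <= max_len:
            senses[abbr].add(full)
    if not senses:
        return 0.0, 0.0
    n = len(senses)
    ambiguous = sum(1 for forms in senses.values() if len(forms) > 1)
    total_forms = sum(len(forms) for forms in senses.values())
    return ambiguous / n, total_forms / n


def precision_recall(extracted, gold):
    """Set-based precision and recall of extracted pairs against a gold set.

    Hypothetical helper; assumes `gold` lists every correct pair.
    """
    extracted, gold = set(extracted), set(gold)
    true_pos = len(extracted & gold)
    precision = true_pos / len(extracted) if extracted else 0.0
    recall = true_pos / len(gold) if gold else 0.0
    return precision, recall


# Toy data for illustration only; not taken from the UMLS.
pairs = [
    ("MS", "multiple sclerosis"),
    ("MS", "mitral stenosis"),
    ("MS", "morphine sulfate"),
    ("CT", "computed tomography"),
    ("ECG", "electrocardiogram"),
    ("ECG", "electrocardiogram"),  # duplicate pair, counted once
    ("HTLV-III", "human T-lymphotropic virus type III"),  # 8 chars, excluded
]
frac, mean = ambiguity_stats(pairs)
print(f"{frac:.3f} ambiguous, {mean:.2f} full forms on average")
# prints "0.333 ambiguous, 1.67 full forms on average"
```

On this toy list, MS is the only ambiguous abbreviation among the three that qualify, so one third of them are ambiguous and they average 5/3 full forms. Averaging over distinct abbreviations rather than over occurrences in text matches the abstract's wording ("for all abbreviations"), but the exact counting rules used in the paper are an assumption here.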