A Brief Review of Clinical Vocabularies

by Michael Krauthammer, M.D., Columbia U., New York, USA

-I asked Michael, currently a Ph.D. student in Medical Informatics, to comment on the entries provided for clinical terms.-Chris S.

Here is a short review of the clinical vocabularies discussed. Table 1 gives an overview. The breakdown follows some of the recommendations made by Jim Cimino in his paper on desiderata for controlled vocabularies (attached). One of the most important features, concept permanence, is hard (but not impossible) to determine. It means that codes used today are not discarded or reassigned in a later release of the vocabulary. If they were, older data stored in a database would end up with meaningless or wrong codes.
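
As a rough illustration of how concept permanence could be checked, the sketch below compares two releases of a vocabulary and reports codes that were discarded or whose name changed. The release contents, the codes, and the function name are all hypothetical, and a changed name is only a hint of reassignment (a release may simply reword a term), so flagged codes would still need manual review.

```python
def permanence_violations(old_release, new_release):
    """Compare two releases, each a dict mapping code -> concept name.

    Returns (discarded, reassigned): codes missing from the newer release,
    and codes whose name differs between the two releases.
    """
    discarded = sorted(code for code in old_release if code not in new_release)
    reassigned = sorted(
        code
        for code, name in old_release.items()
        if code in new_release and new_release[code] != name
    )
    return discarded, reassigned


# Hypothetical codes and names, invented for illustration only.
older = {
    "C001": "myocardial infarction",
    "C002": "angina pectoris",
    "C003": "pneumonia",
}
newer = {
    "C001": "myocardial infarction",
    "C002": "heart failure",  # same code, different meaning
    "C004": "asthma",         # new code; C003 has disappeared
}

discarded, reassigned = permanence_violations(older, newer)
print("discarded:", discarded)
print("reassigned:", reassigned)
```

Any record in an older database that was coded with a discarded or reassigned code would need to be recoded or flagged.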

1)     UMLS: The UMLS Metathesaurus is a collection of different source vocabularies. Concepts are grouped according to the meaning and lexical characteristics of terms, which yields a set of terms and synonyms for each concept. Although the content is extensive, there is no real class hierarchy that encompasses all UMLS concepts (apart from the UMLS Semantic Network, which has only a few concepts). The relationships in the UMLS are either submitted by the individual source vocabularies or added by the UMLS (http://www.nlm.nih.gov/research/umls/umlsmain.html).

2)     MESH: Although geared specifically toward information retrieval, MESH can almost be seen as a general-purpose vocabulary with concepts from all areas of the biomedical domain. Disadvantage: no real is-a hierarchy (rather something like a broader-than hierarchy). The MESH web site features a vocabulary browser: http://www.nlm.nih.gov/mesh/meshhome.html.

3)     ICD9-CM: International Classification of Diseases, Ninth Revision, Clinical Modification. Includes medical procedures. Advantage: broadly used (state-mandated etc.). Disadvantages: no context-free concept identifier, lack of granularity (http://www.cdc.gov/nchs/about/otheract/icd9/abticd9.htm).

4)     SNOMED: Potentially, SNOMED could be the ultimate choice for coding clinical data. Main disadvantage: SNOMED RT and SNOMED CT are not freely available. SNOMED CT is the newest release; it incorporates codes from the READ codes version 3 (the UK coding scheme), which are actually mapped to the UMLS. SNOMED International is the last free** release (from 1998, I think) and is mapped to the UMLS. Two observations: first, I compared some random codes from SNOMED RT and SNOMED International and found that they are upward compatible (SNOMED RT has many more codes than SNOMED International). Second, LOINC codes are cross-mapped to SNOMED RT codes (see below). (http://www.snomed.org/main.html) **There is, of course, a copyright line in the UMLS.

5)     GALEN: Not a real vocabulary, rather a coding technology for mapping between different vocabularies. Granted, GALEN contains many medical terms, which drive the technology, but access to these terms is not straightforward, and I don't think they are mapped to the UMLS. The GALEN project lost its funding sometime in the last few years, and I am not sure whether the terminology is still being updated. (http://www.opengalen.org)

6)     LOINC: Clinical and laboratory observations. Advantages: up to date, contains many genetic tests, mapped to the UMLS. LOINC codes are cross-mapped to SNOMED RT and CT, underscoring LOINC's significance. A free browsing and mapping program is available from their website, http://www.regenstrief.org/loinc/. As a disadvantage, I could see the flat representation of the data.
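
To make the "context-free concept identifier" criterion mentioned under ICD9-CM more concrete, here is a minimal sketch. It assumes a simplified dotted code scheme in the style of ICD9-CM, where the parent of a code can be read off the code itself, and contrasts it with opaque identifiers whose parent links are stored separately. The function name and the opaque identifiers are hypothetical, and real ICD9-CM has details this sketch ignores.

```python
def parent_from_code(code):
    """Derive the parent of a dotted hierarchical code from the code itself.

    '250.01' -> '250.0' -> '250' -> None. The position in the hierarchy is
    part of the identifier, so the identifier is not context free.
    """
    if "." not in code:
        return None
    head, tail = code.split(".", 1)
    return head if len(tail) == 1 else f"{head}.{tail[:-1]}"


# Context-free alternative: identifiers carry no positional meaning, and the
# parent links live in a separate table. A concept can then move, or have
# several parents, without its identifier changing.
# These identifiers are hypothetical.
parents = {
    "X1001": {"X0100"},
    "X1002": {"X1001", "X0200"},  # two parents, i.e. multiple hierarchies
}

print(parent_from_code("250.01"))
print(parent_from_code("250.0"))
print(parent_from_code("250"))
print(sorted(parents["X1002"]))
```

With the first scheme, placing a concept elsewhere in the hierarchy means giving it a new code, which also works against concept permanence.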

Table 1

Vocabulary                   Content  Concept-based  Formal definitions  Concept permanence  Multiple hierarchies  Context-free concept identifier
UMLS                         ++++     Y              N*                  ?                   N*                    Y
MESH                         +        Y              N                   ?                   Y                     N
ICD9-CM                      +        Y/N            N                   N                   N                     N
SNOMED International (1998)  ++       Y              N                   ?                   ?                     Y/N
SNOMED RT                    +++      Y              Y                   ?                   Y                     Y
GALEN                        ?        Y              Y                   ?                   Y                     ?
LOINC                        ++       Y              N                   ?                   N                     Y

* dependent on source vocabularies