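UMLS consists of the following components: the Metathesaurus, the Semantic Network, the SPECIALIST lexicon, and a number of supporting software tools.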
The UMLS was designed and is maintained by the US National Library of Medicine, is updated quarterly and may be used for free. The project was initiated in 1986 by Donald Lindberg, M.D., then Director of the Library of Medicine.
UMLS can be used to design information retrieval or patient record systems, to facilitate the communication between different systems, or to develop systems that parse the biomedical literature. For many of these applications, the UMLS will have to be used in a customized form, for instance by excluding certain source vocabularies that are not relevant to the application. The Library of Medicine itself uses it for its PubMed and ClinicalTrials.gov systems.
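The kind of customization mentioned above, dropping source vocabularies that an application does not need, can be sketched as a simple filter. This is only an illustration: the row layout (`concept_id`, `source`, `name`) and the vocabulary labels `VOCAB_A`, `VOCAB_B` and `VOCAB_C` are hypothetical and do not reflect the actual UMLS distribution files.

```python
def exclude_sources(rows, excluded):
    """Return only the rows whose source vocabulary is not excluded.

    `rows` is a list of dicts with a hypothetical "source" key.
    """
    excluded = set(excluded)
    return [row for row in rows if row["source"] not in excluded]


# Hypothetical rows; identifiers and vocabulary labels are made up.
ROWS = [
    {"concept_id": "C_DEMO_1", "source": "VOCAB_A", "name": "anaesthetic"},
    {"concept_id": "C_DEMO_1", "source": "VOCAB_B", "name": "anesthetic"},
    {"concept_id": "C_DEMO_2", "source": "VOCAB_C", "name": "anesthesia"},
]

# Keep everything except the vocabulary this application does not need.
customized = exclude_sources(ROWS, ["VOCAB_C"])
```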
Users of the system have to sign a "UMLS agreement" and file brief annual reports on their use. Academic users can employ the UMLS free of charge for research. Commercial or production use requires copyright licenses for some of the incorporated source vocabularies.
The Metathesaurus is organized by concept, and each concept has specific attributes defining its meaning and is linked to the corresponding concept names in the various source vocabularies. Numerous relationships between the concepts are represented, for instance hierarchical ones such as "isa" for subclasses and "is part of" for subunits, and associative ones such as "is caused by" or "in the literature often occurs close to" (the latter being derived from Medline).
The scope of the Metathesaurus is determined by the scope of the source vocabularies. If different vocabularies use different names for the same concept, or if they use the same name for different concepts, then this will be faithfully represented in the Metathesaurus. All hierarchical information from the source vocabularies is retained in the Metathesaurus. Metathesaurus concepts can also link to resources outside of the database, for instance gene sequence databases.
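The concept-centred organization described above can be pictured with a small data structure. The following is a minimal sketch, not the actual Metathesaurus schema: the class `Concept`, the identifiers `C_DEMO_1` and `C_DEMO_2`, and the vocabulary labels are all hypothetical. It shows how one concept can carry different names from different sources, and how one name can point to more than one concept.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Concept:
    concept_id: str  # hypothetical identifier, not a real UMLS one
    # source vocabulary -> set of names that vocabulary uses for the concept
    names: dict = field(default_factory=lambda: defaultdict(set))
    # (relation label, target concept_id) pairs, e.g. ("isa", "C_DEMO_9")
    relations: list = field(default_factory=list)

    def add_name(self, source, name):
        self.names[source].add(name)


def build_name_index(concepts):
    """Map each lower-cased name to the set of concepts that use it."""
    index = defaultdict(set)
    for concept in concepts:
        for source_names in concept.names.values():
            for name in source_names:
                index[name.lower()].add(concept.concept_id)
    return index


# Two hypothetical concepts that share the name "cold".
common_cold = Concept("C_DEMO_1")
common_cold.add_name("VOCAB_A", "Common cold")
common_cold.add_name("VOCAB_B", "Cold")

low_temperature = Concept("C_DEMO_2")
low_temperature.add_name("VOCAB_A", "Cold")

name_index = build_name_index([common_cold, low_temperature])
```

Looking up `name_index["cold"]` returns both concept identifiers, which mirrors the point that ambiguous names in the sources are kept rather than resolved.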
The Metathesaurus itself is produced by the automated processing of machine-readable versions of the source vocabularies, followed by human editing and review. It is distributed as an SQL relational database and can also be accessed via a Java object-oriented API.
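Because the Metathesaurus is distributed in relational form, typical lookups reduce to ordinary SQL queries. The sketch below uses an in-memory SQLite database with a hypothetical table `concept_names`; the table name, its columns and the sample rows are assumptions made for illustration and are not the schema of the real distribution.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a hypothetical concept_names table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE concept_names (concept_id TEXT, source TEXT, name TEXT)"
    )
    conn.executemany(
        "INSERT INTO concept_names VALUES (?, ?, ?)",
        [
            ("C_DEMO_1", "VOCAB_A", "anaesthetic"),
            ("C_DEMO_1", "VOCAB_B", "anesthetic"),
            ("C_DEMO_2", "VOCAB_A", "anaesthesia"),
        ],
    )
    return conn


def names_for_concept(conn, concept_id):
    """Return (source, name) pairs for one concept, sorted for stable output."""
    cursor = conn.execute(
        "SELECT source, name FROM concept_names "
        "WHERE concept_id = ? ORDER BY source, name",
        (concept_id,),
    )
    return cursor.fetchall()


conn = build_demo_db()
demo_names = names_for_concept(conn, "C_DEMO_1")
```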
The major semantic types are organisms, anatomical structures, biologic function, chemicals, events, physical objects, and concepts or ideas. The links among semantic types provide the structure for the network and show important relationships between the groupings and concepts. The primary link between semantic types is the "isa" link, which establishes a hierarchy of types and makes it possible to locate the most specific semantic type to assign to a given Metathesaurus concept. The network also has five major categories of non-hierarchical (or "associational") relationships. These are "physically related to", "spatially related to", "temporally related to", "functionally related to" and "conceptually related to".
The information about a semantic type includes an identifier, a definition, examples, hierarchical information about the encompassing semantic type(s), and its associational relationships. Associational relationships within the Semantic Network are very weak. They capture at most some-some relationships, i.e. they capture the fact that some instance of the first type may be connected by the salient relationship to some instance of the second type. Phrased differently, they capture the fact that a corresponding relational assertion is meaningful (though it need not be true in all cases).
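The two kinds of links can be illustrated with a toy model. The fragment below is a sketch under stated assumptions: the hierarchy and the single associational link are invented for illustration and are not copied from the real Semantic Network, and the function names are hypothetical. The "isa" chain is used to pick the most specific of several candidate types, and an associational link is treated only as a statement that a relation between two types is meaningful.

```python
# Toy "isa" hierarchy (child -> parent); illustrative, not the real network.
ISA_PARENT = {
    "Organism": "Physical Object",
    "Anatomical Structure": "Physical Object",
    "Physical Object": "Entity",
    "Biologic Function": "Event",
}

# Toy associational links: (subject type, relation, object type).
# A link only says the relation MAY hold between some instances.
ASSOCIATIONS = {
    ("Anatomical Structure", "physically related to", "Organism"),
}


def ancestors(semantic_type):
    """Follow "isa" links upward and return the chain of broader types."""
    chain = []
    current = semantic_type
    while current in ISA_PARENT:
        current = ISA_PARENT[current]
        chain.append(current)
    return chain


def most_specific(candidates):
    """Pick the candidate that sits deepest in the "isa" hierarchy."""
    return max(candidates, key=lambda t: len(ancestors(t)))


def is_meaningful(subject, relation, obj):
    """True if the network declares this relation between the two types."""
    return (subject, relation, obj) in ASSOCIATIONS
```

Here `most_specific(["Entity", "Physical Object", "Organism"])` selects "Organism", and `is_meaningful` returns a yes/no answer about whether an assertion is admissible, not about whether it is true for any particular pair of instances.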
Entries may be one-word or multiple-word terms. Each record contains four parts: the base form (e.g. "run" for "running"); the part of speech (of which SPECIALIST recognizes eleven); a unique identifier; and any available spelling variants. For example, a query for "anesthetic" would return the following:
{base=anaesthetic
spelling_variant=anesthetic
entry=E0008769
cat=noun
variants=reg}
{base=anaesthetic
spelling_variant=anesthetic
entry=E0008770
cat=adj
variants=inv
position=attrib(3)}
(Browne et al., 2000)
The SPECIALIST lexicon is available in two formats. The "unit record" format can be seen above and consists of slots and fillers. A slot is the element name (e.g. "base=" or "spelling_variant=") and the fillers are the values attributable to that slot for that entry. The "relational table" format is not yet normalized and contains a great deal of duplicated data.
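The slot-and-filler structure of the unit record format lends itself to a very small parser. The following is a minimal sketch, assuming that each record is enclosed in braces and holds one `slot=filler` pair per line; the function name `parse_unit_records` is hypothetical and this is not the official SPECIALIST tooling. Fillers are collected in lists because a slot may occur more than once in a record.

```python
import re
from collections import defaultdict

SAMPLE = """{base=anaesthetic
spelling_variant=anesthetic
entry=E0008769
cat=noun
variants=reg}
{base=anaesthetic
spelling_variant=anesthetic
entry=E0008770
cat=adj
variants=inv
position=attrib(3)}"""


def parse_unit_records(text):
    """Parse brace-delimited unit records into dicts of slot -> fillers."""
    records = []
    # Each record is the text between one "{" and the next "}".
    for body in re.findall(r"\{(.*?)\}", text, flags=re.DOTALL):
        record = defaultdict(list)
        for line in body.splitlines():
            line = line.strip()
            if not line:
                continue
            # Split on the first "=" only, so fillers may contain "=".
            slot, _, filler = line.partition("=")
            record[slot].append(filler)
        records.append(dict(record))
    return records


records = parse_unit_records(SAMPLE)
```

With the sample above, `records[0]["cat"]` holds the part of speech of the first entry and `records[1]["position"]` holds the adjective position filler of the second.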
lvg is a program that uses the SPECIALIST lexicon to generate lexical variants of a given term and to support the parsing of natural language text.
MetaMap is an online tool that, when given an arbitrary piece of text, finds and returns the relevant Metathesaurus concepts. MetaMap Transfer (MMTx) provides the same functionality as a Java program.
Knowledge Source Server is an online application that allows one to browse the Metathesaurus.