`Subject Headings (MeSH) Vocabulary
`to Perform Literature Searches
`
`Henry J. Lowe, MD, G. Octo Barnett, MD
`
`The United States National Library of Medicine's (NLM) MEDLINE database is
`the largest and most widely used medical bibliographic database. MEDLINE is
`manually indexed with NLM's Medical Subject Headings (MeSH) vocabulary.
`Using MeSH, a searcher can potentially create powerful and unambiguous
`MEDLINE queries. This article reviews the structure and use of MeSH, directed
`toward the nonexpert, and outlines how MeSH may help resolve a number of
common difficulties encountered when searching MEDLINE. The increasing
importance of the MEDLINE database as an information resource and the trend
toward individuals performing their own bibliographic searches make it crucial
that health care professionals become familiar with MeSH.
`(JAMA. 1994;271:1103-1108)
`
The United States National Library of Medicine's (NLM)1 Medical Subject
Headings (MeSH) vocabulary is used to index a number of important
computer-based biomedical databases such as MEDLINE. The MEDLINE database,
created and maintained by NLM, contains more than 7 million references to
the biomedical literature since 1966. MEDLINE covers more than 3500
journals. Approximately two thirds of these references contain
English-language, author-written abstracts. When properly used, MeSH can be
a powerful tool, improving access to the medical literature. Until recently,
most MEDLINE searches were performed by specially trained individuals who of
necessity became experts on MeSH. However, with the trend toward end users'
performing their own searches,2,3 a review of the structure and use of MeSH,
directed toward the nonexpert, appears timely. This article provides an
introduction to MeSH, including some important principles used in selecting
MeSH terms for searching MEDLINE and other MeSH-indexed biomedical
databases.

From the Laboratory of Computer Science, Massachusetts General Hospital,
Harvard Medical School, Boston. Dr Lowe is now with the Section of Medical
Informatics, Department of Medicine, University of Pittsburgh (Pa) School of
Medicine.
Reprint requests to Section of Medical Informatics, Department of Medicine,
University of Pittsburgh School of Medicine, B50A Lothrop Hall, Pittsburgh,
PA 15261 (Dr Lowe).
`
THE INFORMATION EXPLOSION IN BIOMEDICINE
Physicians as primary clinical decision makers have a significant impact on
the cost and quality of care. However, the current information explosion4 in
biomedicine makes it progressively more difficult for physicians to access
the information necessary to make intelligent and cost-effective clinical
decisions. More than 20 000 biomedical journals and approximately 17 000 new
biomedical books are published annually.5 Covell et al,6 in their landmark
study of information needs in the office setting, found that only 30% of
physicians' information needs were met during the patient visit. They
observed that traditional paper-based references played a minor role in
problem solving, with most questions being answered by other health
professionals. In 1979 Stross and Harlan7 documented a serious problem with
the dissemination of new medical information to practicing physicians. More
recently, Williamson et al8 conducted a survey of more than 700 physicians
and concluded that "primary practitioners require substantial help in
meeting current science information needs."
What is needed is a new model of medical knowledge acquisition that
emphasizes the importance of knowledge-seeking skills and the ability of
widely available computer technology to augment the physician's cognitive
potential. Currently the most striking example of the use of computer
technology to support the physician's need for medical knowledge is the
retrieval of information from a wide variety of on-line biomedical databases
such as MEDLINE.9 Proud et al10 recently concluded that "when [medical]
students were taught the skills of accessing MEDLINE by computer, they could
formulate a question, retrieve current information, critically review
relevant articles, communicate effectively, and use these skills to
contribute to patient care."
`
THE MEDLINE DATABASE
Access to MEDLINE11,12 is widely available through NLM's computer-based
Medical Literature Analysis and Retrieval System (MEDLARS)13 and from a
variety of commercial vendors.14-16 MEDLARS is often used in conjunction
with NLM's Grateful Med17-19 software, which provides an easy-to-use system
for formulating and executing MEDLINE queries using inexpensive
microcomputers. Approximately 5.5 million searches were performed on NLM's
MEDLARS system in 1992. The decreasing cost of optical storage technology
has resulted in the emergence of systems offering MEDLINE on CD-ROM.20,21
Teaching the skills required to conduct computer-assisted literature
searches is now a part of the curriculum in many US medical schools.22-24
Accessibility and awareness are no longer major impediments to the
widespread use of on-line biomedical databases.
The NLM carefully indexes each new MEDLINE citation25 with a number of terms
from its MeSH vocabulary. Almost all MEDLINE retrieval systems support the
use of MeSH when searching for citations. Using MeSH, a searcher can
potentially create powerful and unambiguous MEDLINE queries. MeSH is
therefore an important gateway to the medical literature. The increasing
importance of the MEDLINE database as an information resource makes it
crucial that health care professionals become familiar with MeSH.

Table 1. Major National Library of Medicine Databases Indexed With Medical
Subject Headings (MeSH)*

Database      Contents
AIDSLINE      Citations to the AIDS literature
AIDSTRIALS    Active and closed clinical AIDS trials
AVLINE        Audiovisual materials for health professionals
BIOETHICS     Citations to biomedical ethics literature
CANCERLIT     Citations to the cancer literature
HEALTH        Health care administration and planning
MEDLINE       Citations to the biomedical literature
TOXLINE       Citations to the toxicology literature

*AIDS indicates acquired immunodeficiency syndrome.

Downloaded From: http://jama.jamanetwork.com/ by a University of Michigan User on 04/20/2015

IBM-1009
Page 1 of 6
`
THE MeSH VOCABULARY
The MeSH vocabulary is a controlled thesaurus of almost 17 000 terms
maintained by NLM.26 MeSH is used to index citations in a number of
biomedical databases produced by NLM. Table 1 lists a sample of these
databases that use MeSH vocabulary.
Each MeSH term represents a single concept appearing in the biomedical
literature. As important new concepts or significant modifications of
existing concepts appear in the literature, NLM adds new terms to MeSH. When
a new citation (a citation is the MEDLINE representation of an article and
includes information such as title, authors, source, abstract, MeSH indexing
terms, and the like) is added to MEDLINE, NLM indexers choose and attach the
appropriate MeSH terms (usually 10 to 12)26 representing the contents of the
article. A searcher can then use these MeSH terms to rapidly retrieve that
citation and others indexed with the same terms.
Figure 1 shows a MEDLINE citation with its MeSH terms used to index an
article on screening strategies for colorectal cancer. The abstract is not
included in this example.
To retrieve this citation from MEDLINE a searcher could use the MeSH terms
COLORECTAL NEOPLASMS and MASS SCREENING. The MeSH terms preceded by an
asterisk in this sample citation are those judged by NLM indexers to
represent the main concepts covered by this article, and it is under these
headings that the citation can be located in Index Medicus (the printed
index of MEDLINE citations produced monthly by NLM). The MeSH terms not
flagged with an asterisk are used to identify concepts that are discussed in
the article but are not its primary topics.26
When searching MEDLINE, one can take advantage of this "Major Concept"
designation to limit the retrieval of potentially less relevant citations.
For example, a recent search of the last 5 years of MEDLINE using the terms
COLORECTAL NEOPLASMS and MASS SCREENING produced 144 citations when the
Major Concept designation was used with each term vs 245 citations if the
Major Concept designation was not used to limit the search. This strategy
effectively screened out 101 potentially less relevant articles
(approximately 41% of the total retrieved when the Major Concept designation
was not used).
If all MEDLINE citations indexed with a specific term must be retrieved (as
might be the case when preparing a grant request or writing a review
article), then the searcher would not use the Major Concept designation to
limit retrieval. Similarly, many expert MEDLINE searchers would initially
execute a search without using the Major Concept designator and, if the
number of citations retrieved exceeded some arbitrary limit, would reduce
the number retrieved by repeating the search using the Major Concept
designator with one or more MeSH terms.
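The two-step strategy just described (search broadly, then repeat with the Major Concept limit only if too many citations come back) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a model of any real MEDLINE retrieval system: the `Citation` class, the function names, and the three-record database are hypothetical, and only the first record is loosely modeled on the citation in Fig 1.

```python
from dataclasses import dataclass, field


@dataclass
class Citation:
    title: str
    # MeSH term -> True if the indexer flagged it (asterisk) as a Major Concept.
    mesh: dict = field(default_factory=dict)


def search(citations, terms, major_only=False):
    """Return citations indexed with every term; optionally require Major Concept."""
    hits = []
    for citation in citations:
        ok = all(
            term in citation.mesh and (citation.mesh[term] or not major_only)
            for term in terms
        )
        if ok:
            hits.append(citation)
    return hits


def search_with_limit(citations, terms, max_hits):
    """Search broadly first; repeat with the Major Concept limit if too many hits."""
    hits = search(citations, terms)
    if len(hits) > max_hits:
        hits = search(citations, terms, major_only=True)
    return hits


# Hypothetical mini-database (assumption); record 1 is loosely based on Fig 1.
CITATIONS = [
    Citation("Strategies for screening for colorectal carcinoma.",
             {"COLORECTAL NEOPLASMS": True, "MASS SCREENING": True,
              "COLONOSCOPY": False}),
    Citation("Hypothetical article that mentions screening in passing",
             {"COLORECTAL NEOPLASMS": True, "MASS SCREENING": False}),
    Citation("Hypothetical article on an unrelated topic",
             {"COLONOSCOPY": True}),
]

QUERY = ["COLORECTAL NEOPLASMS", "MASS SCREENING"]
print(len(search(CITATIONS, QUERY)))                   # both terms, any weight
print(len(search(CITATIONS, QUERY, major_only=True)))  # both terms starred
```

The `max_hits` threshold plays the role of the "arbitrary limit" mentioned in the text; a searcher preparing an exhaustive review would simply call `search` without the limit.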
The Scope of MeSH
The MeSH vocabulary reflects the scope of the biomedical literature in that
NLM adds terms as new concepts appear in the literature and removes or
modifies terms as concepts change. MeSH is updated on a yearly basis to
reflect these changes. For example, the 1994 version of MeSH contains 716
new terms representing concepts with no directly corresponding terms in the
1993 MeSH. For the 1994 MeSH, NLM also replaced 263 terms with more
up-to-date terminology and deleted 44 terms.26
MeSH terms are organized into a set of 15 hierarchies called the "MeSH Tree
Structures" (described later in this article). Figure 2 lists the major MeSH
Tree categories. Each of these categories is the root of a complex
hierarchical arrangement of increasingly specific MeSH terms. These
categories provide an overview of the general concept areas covered by MeSH.
Special MeSH Terms
MeSH contains some special types of terms that are never designated as Major
Concept headings but can be used when searching. These special MeSH terms
are "Publication Types," "Check Tags," and "Geographic Terms."
Publication Types. This group of 47 MeSH terms was introduced in 1991 to
replace and extend the group of terms formerly known as "Citation Types."
These terms provide an additional classification dimension for citations in
MEDLINE and other NLM databases. MeSH terms designated as Publication Types
characterize the type of a publication rather than what it is about
(Table 2). The MEDLINE searcher can use Publication Types to limit retrieval
of citations to specific types of publications. For example, the addition of
the term REVIEW to our search on COLORECTAL NEOPLASMS and MASS SCREENING
reduced the number of retrieved citations from 144 to 21. Each of these 21
citations had been designated as a formal review article by NLM indexers.
Check Tags. This group of MeSH terms designates very broad attributes of the
content of journal articles (Table 3). These terms may be useful to MEDLINE
searchers. Examples of their use include: MYOCARDIAL INFARCTION and FEMALE,
PANCREAS TRANSPLANTATION and HUMAN, LAPAROTOMY and COMPARATIVE STUDY.
Geographics. This group includes terms identifying individual geographic
regions, continents, countries, states, and selected cities. These terms can
be used to restrict retrieval to articles dealing with concepts in specific
geographic areas. For example: AIR POLLUTION and LOS ANGELES, INFANT
MORTALITY and SOUTH AFRICA, HEALTH CARE RATIONING and OREGON.
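Each of the combinations above is a Boolean AND: a citation is retrieved only if it carries every requested term. A minimal Python sketch follows, assuming a hypothetical in-memory record format in which subject terms (including Check Tags and geographic terms) and Publication Types are stored as two sets; the records and the function names are illustrative assumptions, not the format of any actual MEDLINE system.

```python
def matches(citation, mesh_terms=(), publication_types=()):
    """True when the citation carries every requested term and publication type."""
    return (set(mesh_terms) <= citation["mesh"]
            and set(publication_types) <= citation["publication_types"])


def limit(citations, mesh_terms=(), publication_types=()):
    """Return the ids of citations satisfying all of the requested limits."""
    return [c["id"] for c in citations
            if matches(c, mesh_terms, publication_types)]


# Hypothetical records (assumption); the term names are taken from the text.
CITATIONS = [
    {"id": 1,
     "mesh": {"MYOCARDIAL INFARCTION", "FEMALE", "HUMAN"},
     "publication_types": {"JOURNAL ARTICLE", "REVIEW"}},
    {"id": 2,
     "mesh": {"MYOCARDIAL INFARCTION", "MALE", "HUMAN"},
     "publication_types": {"JOURNAL ARTICLE"}},
    {"id": 3,
     "mesh": {"AIR POLLUTION", "LOS ANGELES", "HUMAN"},
     "publication_types": {"JOURNAL ARTICLE"}},
]

# Check Tag limit: MYOCARDIAL INFARCTION and FEMALE
print(limit(CITATIONS, ["MYOCARDIAL INFARCTION", "FEMALE"]))
# Publication Type limit: MYOCARDIAL INFARCTION restricted to REVIEW
print(limit(CITATIONS, ["MYOCARDIAL INFARCTION"], ["REVIEW"]))
# Geographic limit: AIR POLLUTION and LOS ANGELES
print(limit(CITATIONS, ["AIR POLLUTION", "LOS ANGELES"]))
```

Keeping Publication Types in a separate set mirrors the point made in the text that they describe what kind of publication a citation is rather than what it is about.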
MeSH Subheadings
MeSH contains a group of 80 terms called MeSH Subheadings (Table 4).
Subheadings are used to qualify the use of MeSH terms and allow the searcher
to limit retrieval to citations that deal with a specific aspect of a
biomedical concept. For example, in the sample citation shown in Fig 1, the
term COLORECTAL NEOPLASMS is qualified with the subheading PREVENTION &
CONTROL. Use of this subheading allows the searcher to limit retrieval only
to those citations dealing with the prevention and control of colorectal
neoplasms. MeSH contains "Scope Notes" that aid the searcher in selecting
appropriate subheadings. For example, the MeSH Scope Note for the subheading
PREVENTION & CONTROL states: "Used with disease headings for increasing
human or animal resistance against disease (eg, immunization), for control
of transmission agents, for prevention and
`
`
`
Title:   Strategies for screening for colorectal carcinoma.
Authors: England WL; Halls JJ; Hunt VB
Source:  Med Decis Making 1989 Jan-Mar;9(1):3-13
MeSH:    Barium Sulfate/DIAGNOSTIC USE
         Colonoscopy
         Colorectal Neoplasms/*PREVENTION & CONTROL
         Comparative Study
         Cost Benefit Analysis
         *Decision Making, Computer-Assisted
         *Decision Trees
         Enema
         Human
         *Mass Screening
         Occult Blood
         Risk Factors
         Support, U.S. Gov't, P.H.S.

Fig 1. Sample MEDLINE citation.

Analytical, Diagnostic and Therapeutic Techniques (including Anesthesia)
Anatomical Terms, Body Regions, Organs & Systems, Cytology and Embryology
Anthropology, Education, Human Activities and Social Sciences
Biological Phenomena, Genetics, Physiology, Occupations and Public Health
Chemicals, Drugs, Biomedical Materials, Hormones and Pollutants
Human & Animal Diseases, Symptoms and General Pathology
Geographicals (Continents, Regions, Countries, States and Some Cities)
Health Care, Demography, Organizations and Population Characteristics
Humanities, Art, History, Literature, Philosophy, Ethics and Religion
Information & Library Sciences, Medical Informatics and Communications
Named Groups (e.g., Age, Disabled, Ethnic, Occupational Groups etc.)
Algae, Fungi, Bacteria, Invertebrates, Plants, Vertebrates and Viruses
Physical Sciences (Specific Disciplines and Methods)
Psychiatry and Psychology
Technology, Materials, Industry, Transportation, Agriculture and Food

Fig 2. Medical Subject Headings (MeSH) Tree categories.
`
control of environmental disease. It includes preventive measures in
individual cases."26
Most MEDLINE retrieval systems support the use of subheadings to limit
retrieval to specific aspects of a subject. For example, subheadings can be
used to focus on citations dealing with the Complications, Diagnosis,
Epidemiology, Etiology, Genetics, Mortality, Pathology, Prevention &
Control, Rehabilitation, or Therapy of Disease States. In general, the
searcher should, if possible, use a MeSH heading/subheading combination
rather than a MeSH heading/MeSH heading combination to search for citations
dealing with a specific aspect of a topic. For example, to search for
citations dealing with the surgical treatment of COLORECTAL NEOPLASMS use
COLORECTAL NEOPLASMS/SURGERY rather than combining the MeSH terms COLORECTAL
NEOPLASMS and SURGERY, OPERATIVE.
Clearly not all MeSH term/subheading combinations are valid. For example,
HEART/TRANSMISSION makes no sense. The valid combinations of MeSH terms and
subheadings are governed by a set of "allowable category" rules built into
the MeSH Tree Structures. A well-designed MEDLINE access system should allow
the searcher to view all of the subheadings that may be combined with any
given MeSH term.20 The appropriate use of MeSH subheadings can help the
searcher create highly specific MEDLINE queries that may improve search
precision.
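The "allowable category" idea can be pictured as a lookup that is consulted before a heading/subheading pair is accepted. The Python sketch below is only an illustration: the `ALLOWED_SUBHEADINGS` table is a hypothetical stand-in, and its contents are assumptions chosen to agree with the examples in the text (COLORECTAL NEOPLASMS/SURGERY is accepted, HEART/TRANSMISSION is rejected), not the actual MeSH rules.

```python
# Hypothetical allowable-subheading table (assumption, not real MeSH data).
ALLOWED_SUBHEADINGS = {
    "COLORECTAL NEOPLASMS": {"PREVENTION & CONTROL", "SURGERY", "DIAGNOSIS"},
    "HEART": {"PHYSIOLOGY", "ANATOMY & HISTOLOGY"},
}


def allowable(term):
    """List the subheadings that may be combined with a MeSH term."""
    return sorted(ALLOWED_SUBHEADINGS.get(term, set()))


def qualify(term, subheading):
    """Build a TERM/SUBHEADING query element, rejecting invalid pairs."""
    if subheading not in ALLOWED_SUBHEADINGS.get(term, set()):
        raise ValueError(f"{term}/{subheading} is not an allowable combination")
    return f"{term}/{subheading}"


print(qualify("COLORECTAL NEOPLASMS", "SURGERY"))
print(allowable("HEART"))

try:
    qualify("HEART", "TRANSMISSION")
except ValueError as error:
    print(error)
```

The `allowable` helper corresponds to the feature the text asks of a well-designed access system: showing the searcher every subheading that may be combined with a given term.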
TEXT-BASED VS MeSH-BASED SEARCHES
Perhaps the most compelling reason for the use of MeSH terms when searching
MEDLINE is the challenge in choosing how one represents search topics. In
addition to using MeSH indexing terms, most MEDLINE retrieval systems also
support searching for citations by using one or more words that occur in the
citation's title or abstract. This can be a useful searching strategy if,
for example, there is no appropriate MeSH term, or if one wishes to modify
the scope of a search by combining MeSH terms with title/abstract words.
However, a fundamental problem with this "free-text" searching method is
that the words used in the title and abstract are part of an uncontrolled
vocabulary. This means that no effort has been made to ensure that the
language used by the author conforms to any standard or convention.
Therefore, a searcher using these free-text representations, rather than
MeSH terms, may not find relevant citations because the author and searcher
differ in how they represent a concept. For example, if a searcher performs
a MEDLINE search using the word "Hyperlipidemia" and an author has used the
narrower term "Hypercholesterolemia," then many relevant citations may be
missed because only those articles with the word "Hyperlipidemia" in their
title or abstract will be retrieved. However, appropriate use of the MeSH
term HYPERLIPIDEMIA (using the MeSH "Explode" feature described later in
this article) would find all citations indexed with HYPERLIPIDEMIA,
HYPERCHOLESTEROLEMIA, HYPERLIPOPROTEINEMIA, and HYPERTRIGLYCERIDEMIA,
irrespective of the words used by individual authors. The lesson here is
that the MeSH indexing performed by NLM is a form of intelligent
preprocessing that should be taken advantage of whenever possible. Failure
to do so is an important reason why MEDLINE searches fail.27
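The hyperlipidemia example can be made concrete with a small Python sketch that runs both kinds of search over the same records. Everything here is a toy under stated assumptions: the four citations and their titles are invented for illustration, and the `DESCENDANTS` mapping simply records the narrower terms named in the text rather than reproducing the real MeSH hierarchy.

```python
# Narrower terms retrieved by exploding HYPERLIPIDEMIA, as named in the text.
DESCENDANTS = {
    "HYPERLIPIDEMIA": {
        "HYPERCHOLESTEROLEMIA",
        "HYPERLIPOPROTEINEMIA",
        "HYPERTRIGLYCERIDEMIA",
    },
}

# Hypothetical citations (assumption): title text plus indexer-assigned terms.
CITATIONS = [
    {"id": 1, "text": "Dietary management of hyperlipidemia",
     "mesh": {"HYPERLIPIDEMIA"}},
    {"id": 2, "text": "Drug treatment of hypercholesterolemia in adults",
     "mesh": {"HYPERCHOLESTEROLEMIA"}},
    {"id": 3, "text": "Familial hypertriglyceridemia: a case series",
     "mesh": {"HYPERTRIGLYCERIDEMIA"}},
    {"id": 4, "text": "Asthma in childhood",
     "mesh": {"ASTHMA"}},
]


def free_text_search(citations, word):
    """Retrieve citations whose title/abstract text contains the word."""
    word = word.lower()
    return [c["id"] for c in citations if word in c["text"].lower()]


def exploded_mesh_search(citations, term):
    """Retrieve citations indexed with the term or any of its narrower terms."""
    wanted = {term} | DESCENDANTS.get(term, set())
    return [c["id"] for c in citations if c["mesh"] & wanted]


print(free_text_search(CITATIONS, "Hyperlipidemia"))      # author wording only
print(exploded_mesh_search(CITATIONS, "HYPERLIPIDEMIA"))  # indexer terms
```

In this toy database the free-text search misses the two citations whose authors used narrower vocabulary, while the exploded MeSH search retrieves them because the match is made on the indexing terms rather than on the authors' words.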
`
Precision and Recall: The Sensitivity and Specificity of Searching
Formal studies of searches using uncontrolled free text, such as occurs in
titles and abstracts, suggest that these searches may have a lower recall
rate than searches performed using indexing terms such as MeSH. "Recall,"
defined as the number of relevant citations retrieved by a search divided by
the number of relevant citations in the database being searched, is
expressed in the following equation:

\[
\text{Recall} = \frac{\text{Number of relevant citations retrieved}}{\text{Number of relevant citations in database}}
\]

Using the terminology of decision analysis, recall can be viewed as the
sensitivity of the search, in that it measures the ability of the search to
retrieve relevant citations from the database.
A study of 975 MEDLINE searches conducted by medical students at Harvard
Medical School suggests that MeSH-based searches may be superior to
free-text searches. In this unpublished study, title-abstract free-text
searches produced significantly lower recall than MeSH-based searches. While
31% of all searches in this series were title-abstract searches, this group
comprised 48% of all searches that found no citations. Similar results have
been found with databases other than MEDLINE. In 1985, Blair and Maron28
evaluated the well-known STAIRS automatic text retrieval system as applied
to a collection of 40 000 free-text documents (approximately 350 000 pages
of text) and found an average recall rate of only 20% (ie, only one in five
relevant documents was found). They concluded that to achieve acceptable
recall, use of a manual indexing scheme (such as MeSH) is preferable to
free-text retrieval. Searches using MeSH can achieve recall rates as high as
90%,29 although average recall values of approximately 50% are more typical.
A number of studies have demonstrated that MEDLINE recall is related to
searcher expertise.30-32 Inversely, not using MeSH or using MeSH
inappropriately can result in search failures.27

Table 2. Medical Subject Headings (MeSH) Publication Types, 1994*

Abstract
Bibliography
Classical Article
Clinical Conference
Clinical Trial
Clinical Trial, Phase I
Clinical Trial, Phase II
Clinical Trial, Phase III
Clinical Trial, Phase IV
Comment
Congress
Consensus Development Conference
Consensus Development Conference, NIH
Corrected and Republished Article
Current Bio-Obit
Dictionary
Directory
Duplicate Publication
Editorial
Festschrift
Guideline
Historical Article
Historical Biography
Interview
Journal Article
Legal Brief
Letter
Meeting Report
Meta-analysis
Monograph
Multicenter Study
News
Overall
Periodical Index
Practice Guideline
Published Erratum
Randomized Controlled Trial
Retracted Publication
Retraction of Publication
Review
Review Literature
Review of Reported Cases
Review, Academic
Review, Multicase
Review, Tutorial
Scientific Integrity Review
Technical Report

*NIH indicates National Institutes of Health.

Table 3. Medical Subject Headings (MeSH) Check Tags, 1994

Animal
Case Report
Comparative Study
Female
Human
In Vitro
Male
Support, Non-U.S. Gov't
Support, U.S. Gov't, Non-P.H.S.
Support, U.S. Gov't, P.H.S.

In addition to recall, the searcher is also concerned with "precision,"
which is defined as the number of relevant citations retrieved divided by
the total number of citations retrieved, expressed in equation form as:

\[
\text{Precision} = \frac{\text{Number of relevant citations retrieved}}{\text{Total number of citations retrieved}}
\]

Using the terminology of decision analysis, precision can be viewed as the
specificity of the search, in that it measures the ability of the search to
discriminate between relevant and nonrelevant citations.
The precision of free-text searches is usually better than their recall. In
the STAIRS study, precision was about 75% vs a recall of approximately 20%.
In the Harvard study cited earlier, precision was not measured directly, but
when asked to assess the relevance of retrieved citations, searchers gave an
average "relevancy score" of 66% for MeSH-based searches vs 55% for
title-abstract searches.
The relationship of recall to precision is dependent on many variables
including the database being searched, the retrieval system, and the
searcher's information needs, but Salton and McGill33 have suggested a
composite recall-precision graph that reflects the average performance of a
retrieval system for a large number of individual queries (Fig 3).

[Figure 3: curve of precision (vertical axis, 0 to 1.0) plotted against
recall (horizontal axis, 0 to 1.0), annotated "Narrow, Specific Query
Formulation" and "Broad, General Query Formulation."]
Fig 3. Average recall-precision graph (adapted from Salton and McGill33).

This graph illustrates that search strategies that are designed to maximize
recall will tend to retrieve irrelevant citations and vice versa.34 One of
the skills required of the proficient MEDLINE searcher is effectively
balancing precision and recall. As in most exhaustively indexed databases,
the use of MeSH indexing terms when searching MEDLINE tends to enhance
recall by making it possible to retrieve many of the relevant citations.34,35
As each important biomedical concept is represented by a single MeSH term,
use of that MeSH term when searching MEDLINE should retrieve most of the
citations in which the concept is a significant topic. This principle is
reflected in the practice of expert MEDLINE searchers who use the MEDLINE
Explode feature, described later in this article, to ensure inclusion of all
MeSH terms that might be used to index a topic.34
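The two ratios defined in this section are easy to compute once a search result and a relevance judgment are both expressed as sets of citation identifiers. The sketch below is a minimal Python illustration; the function names and the identifiers are assumptions, and the numbers are invented so that the result echoes the approximate 20% recall and 75% precision reported for the STAIRS study rather than reproducing that study's data.

```python
def recall(retrieved, relevant):
    """Relevant citations retrieved / relevant citations in the database."""
    if not relevant:
        return 0.0  # convention chosen here for an empty relevant set
    return len(retrieved & relevant) / len(relevant)


def precision(retrieved, relevant):
    """Relevant citations retrieved / total citations retrieved."""
    if not retrieved:
        return 0.0  # convention chosen here for an empty result set
    return len(retrieved & relevant) / len(retrieved)


# Hypothetical example: 15 relevant citations exist in the database; the
# search retrieves 4 citations, 3 of which are relevant.
relevant = set(range(1, 16))
retrieved = {1, 2, 3, 101}

print(f"recall    = {recall(retrieved, relevant):.2f}")
print(f"precision = {precision(retrieved, relevant):.2f}")
```

Both functions share the same numerator, the size of the intersection; they differ only in the denominator, which is why broadening a query (a larger retrieved set) tends to raise recall while putting precision at risk.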
MeSH can also be used to improve precision because MeSH indexers are
instructed to choose the most specific MeSH term that describes a topic.
Therefore, using highly specific MeSH terms to describe a concept in a
MEDLINE search query ensures that only citations indexed with that term are
retrieved. Again, this principle is reflected in the strategy of expert
searchers who use the MeSH Tree Structures (described later in this article)
to find the most specific MeSH term describing a concept.
The successful application of these searching principles assumes
high-quality MeSH indexing. In 1983 Funk et al25 concluded that "MEDLINE,
with its excellent controlled [MeSH] vocabulary, exemplary quality control,
and highly trained indexers, probably represents the state of the art in
manually indexed data bases."25 Since then NLM has significantly enhanced
MeSH to cover many new concepts in areas such as the acquired
immunodeficiency syndrome (AIDS), genetics, immunology, medical
informatics,36 and molecular biology. Over $2 million and 44 full-time
equivalent indexers are used each year by NLM to ensure optimal indexing of
MEDLINE.5

Table 4. Medical Subject Headings (MeSH) Topical Subheadings, 1994

Abnormalities
Administration & Dosage
Adverse Effects
Analogs & Derivatives
Analysis
Anatomy & Histology
Antagonists & Inhibitors
Biosynthesis
Blood
Blood Supply
Cerebrospinal Fluid
Chemical Synthesis
Chemically Induced
Chemistry
Classification
Complications
Congenital
Contraindications
Cytology
Deficiency
Diagnosis
Diagnostic Use
Diet Therapy
Drug Effects
Drug Therapy
Economics
Education
Embryology
Enzymology
Epidemiology
Ethnology
Etiology
Genetics
Growth & Development
History
Immunology
Injuries
Innervation
Instrumentation
Isolation & Purification
Legislation & Jurisprudence
Manpower
Metabolism
Methods
Microbiology
Mortality
Nursing
Organization & Administration
Parasitology
Pathogenicity
Pathology
Pharmacokinetics
Pharmacology
Physiology
Physiopathology
Poisoning
Prevention & Control
Psychology
Radiation Effects
Radiography
Radionuclide Imaging
Radiotherapy
Rehabilitation
Secondary
Secretion
Standards
Statistical & Numerical Data
Supply & Distribution
Surgery
Therapeutic Use
Therapy
Toxicity
Transmission
Transplantation
Trends
Ultrasonography
Ultrastructure
Urine
Utilization
Veterinary
THE MeSH TREE STRUCTURES
The MeSH vocabulary is not simply a list of approximately 17 000 concept
terms. It is organized into a complex hierarchy called the "MeSH Tree
Structures." In this hierarchy the MeSH terms are arranged into a set of
branching, treelike structures of increasing specificity. Figure 4
illustrates this organization using a portion of the MeSH Tree dealing with
Intestinal Neoplasms.
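A branching hierarchy of this kind maps naturally onto a parent-to-children table. The Python sketch below is a toy fragment under stated assumptions: the `CHILDREN` links follow the path to COLORECTAL NEOPLASMS given in Fig 5, with two extra branches added purely for illustration, and they should not be read as a faithful copy of the MeSH Tree. The `explode` function gathers a term together with all of its more specific descendants, which is the operation behind the Explode feature mentioned earlier, and `path_to` walks from a general category down to a specific term.

```python
# Toy fragment of a tree (assumption): links follow the Fig 5 path, plus two
# illustrative extra branches. This is not the real MeSH hierarchy.
CHILDREN = {
    "NEOPLASMS": ["NEOPLASMS BY SITE"],
    "NEOPLASMS BY SITE": ["DIGESTIVE SYSTEM NEOPLASMS"],
    "DIGESTIVE SYSTEM NEOPLASMS": ["GASTROINTESTINAL NEOPLASMS"],
    "GASTROINTESTINAL NEOPLASMS": ["INTESTINAL NEOPLASMS"],
    "INTESTINAL NEOPLASMS": ["COLORECTAL NEOPLASMS", "RECTAL NEOPLASMS"],
    "RECTAL NEOPLASMS": ["ANUS NEOPLASMS"],
}


def explode(term):
    """Return the term together with every descendant beneath it."""
    found = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current not in found:
            found.add(current)
            stack.extend(CHILDREN.get(current, []))
    return found


def path_to(root, target):
    """Return the list of terms from root down to target, or None if absent."""
    if root == target:
        return [root]
    for child in CHILDREN.get(root, []):
        tail = path_to(child, target)
        if tail:
            return [root] + tail
    return None


print(sorted(explode("INTESTINAL NEOPLASMS")))
print(" > ".join(path_to("NEOPLASMS", "COLORECTAL NEOPLASMS")))
```

Searching with the exploded set broadens a query, while following `path_to` to the most specific term narrows it; both operations work from the same table.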
`
[Figure 4: tree diagram rooted at Intestinal Neoplasms; the connecting
lines of the diagram are not reproduced here.]
Terms shown directly beneath Intestinal Neoplasms: Cecal Neoplasms; Colonic
Neoplasms; Duodenal Neoplasms; Ileal Neoplasms; Intestinal Polyps; Jejunal
Neoplasms; Rectal Neoplasms.
Terms shown at deeper levels: Appendiceal Neoplasms; Polyposis Syndrome,
Familial; Colonic Polyps; Colorectal Neoplasms; Colorectal Neoplasms,
Hereditary Nonpolyposis; Sigmoid Neoplasms; Gardner Syndrome; Peutz-Jeghers
Syndrome; Anus Neoplasms; Anal Gland Neoplasms.
Fig 4. Sample Medical Subject Headings (MeSH) Tree Structure.

The MeSH Tree Structures support a number of useful strategies when
searching MEDLINE. By traversing the tree to the most specific term
representing a concept, the searcher can create very specific search
queries. When indexing new MEDLINE citations NLM indexers use the most
specific MeSH term available. For example, in the sample citation shown in
Fig 1 the specific term COLORECTAL NEOPLASMS is used rather than the more
general INTESTINAL NEOPLASMS. Being aware of this indexing practice and
using the MeSH Tree to find the most specific term allows the searcher to
improve the precision of a MEDLINE search, thus reducing the number of
irrelevant citations retrieved.
The MeSH Tree also allows one to broaden the scope of a search and therefore
improve recall. For example, to retrieve MEDLINE citations dealing with
screening strategies for all intestinal cancers (not just COLORECTAL
NEOPLASMS), the searcher can use the MeSH Tree to find the more general term
INTESTINAL NEOPLASMS. However, instead of searching with just this term, one
would Explode INTESTINAL NEOPLASMS to retrieve all citations indexed with
that term as well as citations indexed with any of its more specific
descendants arranged beneath it in the MeSH Tree (all MeSH terms contained
in Fig 4).
Exploding a term is a useful strategy if a MEDLINE search produces too few
citations or if the searcher feels that recall is unacceptable. Indeed many
expert searchers would say that inclusive searches ("Explosions") are the
preferable strategy in general since their use usually increases the
retrieval of relevant citations. This increased recall may be accompanied by
a parallel increase in the retrieval of irrelevant citations, as predicted
by Salton's average recall-precision graph (Fig 3).
The MeSH Tree Structures provide not only a way to vary the specificity of a
search but also a method for finding MeSH terms when only the general
concept area is known. For example, one can easily find the more specific
term COLORECTAL NEOPLASMS by entering the MeSH Tree at NEOPLASMS and
traversing the path as shown in Fig 5.

MeSH Diseases
  Neoplasms
    Neoplasms by Site
      Digestive System Neoplasms
        Gastrointestinal Neoplasms
          Intestinal Neoplasms
            Colorectal Neoplasms

Fig 5. Path through Medical Subject Headings (MeSH) Tree to COLORECTAL
NEOPLASMS.

THE PROBLEM OF LANGUAGE
As the volume and complexity of medical knowledge increase so too does the
language used to describe that knowledge. Each specialist field has its own
subvocabulary that can serve as a barrier to the nonspecialist. The
increasing importance of highly technical fields such as molecular genetics,
biotechnology, and medical informatics will surely deepen the language
divide between specialist and nonspecialist. To effectively retrieve
information from large biomedical databases, the searcher must be able to
express a query in the language appropriate to the target domain.
Translating from the searcher's own vocabulary to the appropriate domain
vocabulary is a fundamental problem in information retrieval.
MeSH is the canonical language of MEDLINE. It is a difficult vocabulary to
master. The official printed reference consists of three volumes containing
more than 2300 pages of text and weighs approximately 5.5 kg (12.1 lb).26
Despite extensive cross-referencing and the use of MeSH "entry terms" that
link commonly used terms to MeSH terms, the inexperienced or infrequent
MEDLINE searcher may still have difficulty finding appropriate MeSH terms.
The fundamental difficulty with any controlled vocabulary such as MeSH is
finding the precise term used to represent a concept. Many associated
concepts may map to a single canonical MeSH term. For example, HEART
FAILURE, CONGESTIVE is the only MeSH term used to represent all types of
cardiac failure. How can a MEDLINE searcher find the MeSH terms representing
the concepts he or she wishes to search for? A number of approaches to this
problem are possible using MeSH.
MeSH contains a large set of Entry Terms that are used to map
non-MeSH-concept descriptors to appropriate MeSH terms. For example, the
MeSH Entry Term WILSON DISEASE invokes the MeSH term HEPATOLENTICULAR
DEGENERATION. Most MEDLINE searching systems support the use of Entry Terms,
and this gives the searcher some freedom in using commonly used biomedical
terms even if they are not MeSH terms. Another strategy useful in finding
MeSH terms is to search MEDLINE for a known article on a subject (perhaps
searching by author or one or more words in the title/abstract fields) and
then view the MeSH terms used to index that citation. Searchers can also use
the MeSH Tree to find MeSH terms by beginning from a very general category
and browsing through the hierarchy until a specific term is identified.
Many MEDLINE access systems provide assistance in choosing MeSH terms but
they vary in their scope and ease of use. What is needed is a method for
converting the searcher's uncontrolled natural language to MeSH. Given the
richness and complexity of language, this is not a trivial problem. To take
advantage of MeSH as a unique gateway to the biomedical literature,
searchers need tools to help them overcome both the complexity of the MeSH
thesaurus and the problems implicit in using a highly controlled vocabulary
system to create information retrieval queries. A number of computer-based
tools have been developed to address this problem.
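Entry Terms behave much like a lookup table that redirects a familiar phrase to its canonical MeSH term. A minimal Python sketch of that mapping follows. The WILSON DISEASE entry and the two MeSH terms come from the text; the CARDIAC FAILURE entry is a hypothetical addition for illustration, and the function name and normalization rule are assumptions rather than the behavior of any particular MEDLINE system.

```python
# Canonical MeSH terms named in the text.
MESH_TERMS = {
    "HEPATOLENTICULAR DEGENERATION",
    "HEART FAILURE, CONGESTIVE",
}

# Entry Term -> MeSH term. The first pair is from the text; the second is a
# hypothetical entry added only to illustrate the mapping.
ENTRY_TERMS = {
    "WILSON DISEASE": "HEPATOLENTICULAR DEGENERATION",
    "CARDIAC FAILURE": "HEART FAILURE, CONGESTIVE",
}


def to_mesh(phrase):
    """Map a searcher's phrase to a MeSH term, or None if nothing matches."""
    key = " ".join(phrase.upper().split())  # normalize case and spacing
    if key in MESH_TERMS:
        return key
    return ENTRY_TERMS.get(key)


print(to_mesh("Wilson disease"))
print(to_mesh("heart failure, congestive"))
print(to_mesh("liver trouble"))
```

A return value of `None` marks the case the text identifies as the hard one: a natural-language phrase with no direct entry, where the searcher needs further help such as browsing the tree or inspecting the indexing of a known article.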
`
`
`
The NLM's Unified Medical Language System (UMLS) project38-40 should be of
considerable value in reducing the difficulties inherent in mapping standard
biomedical terms to MeSH. The UMLS Metathesaurus contains a rich set of
links between MeSH Terms and related concepts in classification systems such
as the American Psychiatric Association's Diagnostic and Statistical Manual
of Mental Disorders (DSM), the College of American Pathologists'
Systematized Nomenclature of Medicine (SNOMED), the American Medical
Association's Current Procedural Terminology (CPT), the
`International Classification ofDi