`
`
WORLD WIDE WEB JOURNAL
`
`XML
`
`Principles, Tools,
`
`and Techniques
`
`
`
`
`
O’REILLY
`
`
`
`IBM-1012
`
`Page 1 of 17
`
`
`
`
`WORLD WIDE WEB JOURNAL
XML: PRINCIPLES, TOOLS, AND TECHNIQUES
Volume 2, Issue 4, Fall 1997
`
Publisher: Dale Dougherty
Guest Editor: Dan Connolly
Series Editor: Rohit Khare
Managing Editor: Donna Woonteiler
News Editor: D.C. Denison
Production Editor: Nancy Crumpton
Technical Illustrator: Robert Romano
Software Tools Specialist: Mike Sierra
Quality Assurance: Ellie Fountain Maden
Cover Design: Hanna Dyer
Text Design: Nancy Priest, Marcia Ciro
Subscription Administrator: Marianne Cooke
Photos: Flint Born

ISBN: 1-56592-349-9
`
`The individual contributions are copyrighted by the authors or their respective employers. The print
`compilation is Copyright © 1997 O’Reilly & Associates, Inc. All rights reserved. Printed in the United
`States of America.
`
`Many of the designations used by manufacturers and sellers to distinguish their products are claimed
as trademarks. Where those designations appear in this book, and O’Reilly & Associates, Inc. was aware
`of a trademark claim, the designations have been printed in caps or initial caps.
`
`While every precaution has been taken in the preparation of this book, the publisher assumes no
`responsibility for errors or omissions, or for damages resulting from the use of the information
`contained herein.
`
This book is printed on acid-free paper with 85% recycled content, 15% post-consumer
waste. O’Reilly & Associates is committed to using paper with the highest recycled content
available consistent with high quality.

ISSN: 1085-2301
`
`
`
`
`
Architecture
Dan Connolly
Domain Leader
connolly@w3.org
Arnaud Le Hors
lehors@w3.org
Håkon Lie
howcome@w3.org
Jim Gettys
jg@w3.org
Chris Lilley
chris@w3.org
Philipp Hoschka
hoschka@w3.org
Masayasu "Mimasa" Ishikawa
mimasa@w3.org
Youichirou Koga
yvkoga@w3.org
Dave Raggett
dsr@w3.org
Yves Lafon
lafon@w3.org
Irène Vatton
vatton@w3.org
Ora Lassila
lassila@w3.org
Henrik Frystyk Nielsen
frystyk@w3.org
Daniel Veillard
veillard@w3.org
`
`W3C Administration
`Jean-Francois Abramatic
`W3C Chairman and Associate
`Director, MIT Laboratory for
`Computer Science
`jfa@w3.org
Tim Berners-Lee
Director of the W3C
timbl@w3.org
Vincent Quint
Deputy Director for Europe
quint@w3.org
Nobuo Saito
W3C Associate Chairman
and Dean, Keio University
nobuo.saito@w3.org
Alan Kotok
W3C Associate Chairman
kotok@w3.org
`Tatsuya Hagino
`Deputy Director for Asia
`hagino@w3.org
`
`
User Interface
Vincent Quint
Domain Leader
quint@w3.org
Bert Bos
bert@w3.org
Ramzi Guetari
guetari@w3.org

José Kahan
kahan@w3.org
Sally Khudairi
khudairi@w3.org
Stephan Montigaud
montigaud@w3.org
Gerald Oskoboiny
gerald@w3.org
Luc Ottavj
ottavj@w3.org
Pierre Fillault
fillault@w3.org
Takeshi "Yamachan" Yamane
yamachan@w3.org
`
`Administrative Support
`Pamela Ahern
`pam@w3.org
Susan Hardy
susan@w3.org
MarieLine Ramfos
ramfos@w3.org
Josiane Roberts
roberts@inria.fr
`
`Nancy Ryan
`ryan@w3.org
`Yukari Mitsuhashi
`yukari@w3.org
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`Technology and Society
`Jim Miller
`Domain Leader
jmiller@w3.org
Eui-Suk Chung
euisuk@w3.org
Daniel Dardailler
danield@w3.org
Philip DesAutels
philipd@w3.org
Josef Dietl
jdietl@w3.org
Joseph Reagle
reagle@w3.org
`Ralph Swick
`swick@w3.org
`
`Cross Areas and
`Technical Support
`Janet Bertot
`bertot@w3.org
`Stephane Boyera
`boyera@w3.org
Daichi Funato
daichi@w3.org
`Tom Greene
`tjg@w3.org
`
`
`
`
`
`
`Consulting, Inc. Poet Soft
`
`Primrose
`Pretty G
`
`
`
`
ABN AMRO Bank
Access Company Limited
Adobe Systems Inc.
Aérospatiale
AGFSI
Agfa Division, Bayer Corp.
Agranat Systems, Inc.
Alcatel Alsthom Recherche
Alfa-Omega Foundation
Alis Technologies, Inc.
America Online, Inc.
American International Group Data
Center, Inc. (AIG)
`American Internet Corporation
`Apple Computer, Inc.
`ArborText, Inc.
`Architecture Projects
`Management Ltd.
`ArrowPoint Communications
`Art Technology Group
`Asymetrix Corporation
`AT&T
`
`Attachmate Corporation
BackWeb Technologies, Inc.
BELGACOM
Bellcore
Bitstream, Inc.
British Telecommunications
Laboratories
Bull S.A.
`Canal +
`Canon, Inc.
`Cap Gemini Innovation
`Center for Democracy
`and Technology
`Center for Mathematics and
`Computer Science (CWI)
`CERN
`CIRAD
`
CNET-The Computer Network
CNR-Instituto Elaborazione
dell’Informazione
CNRS
Commissariat à l’Energie
Atomique (CEA)
`CompuServe, Inc.
`Computer Answer Line
`Corporation for National Research
`Initiatives (CNRI)
`CosmosBay
`Council for the Central
`Laboratory of the Research
`Councils (CCL)
`CyberCash, Inc.
`Cygnus Support
`Daewoo Electronics Company
`Dassault Aviation
`Data Channel
`
Data Research Associates, Inc.
Defense Information Systems
Agency (DISA)
Deutsche Telekom-Online Pro
Dienste GmbH & Co. KG
(T-Online)
Digital Equipment Corporation
Digital Vision Laboratories
Corporation
DigitalStyle Corporation
Direct Marketing Association, Inc.
DoubleClick
`
`Eastman Kodak Company
Ecole Nationale Supérieure
d’Informatique et de
Mathématiques
Appliquées (ENSIMAG)
`EDF
`
`EEIG/ERCIM
`ENEL
`
`Engage Technologies
`ENN Corporation
`Enterprise Integration Technology
`Entrust Technologies, Inc.
`ERICSSON
`
`Ernst & Young LLP
`ETNO TEAM S.p.A.
`Firefly Network, Inc.
`First Virtual Holdings, Inc.
`
`FirstFloor Software, Inc.
`Folio Corporation
`Foundation for Research and
`Technology (FORTH)
`France Telecom
`Fujitsu Limited
`Fulcrum Technologies, Inc.
`GCTECH S.A.
`GEMPLUS
`
`General Magic, Inc.
`Geoworks
`GMD Institute FIT
`Graphic Communications
`Association
`Grenoble Network Initiative
GRIF S.A.
Groupe ESC Grenoble
Harlequin Inc.
HAVAS
`Hewlett Packard
`Laboratories, Bristol
`Hitachi, Ltd.
`@Home Network
`Hong Kong Jockey Club
`Hummingbird Communications Ltd.
`IBERDROLA S.A.
`IBM Corporation
`ILOG, S.A.
`InContext Systems
`Industrial Technology
`Research Institute
`Infopartners
`INRETS
`
`Inso Corporation, Providence
Institut Franco-Russe A.M.
Liapunov
Institute for Information Industry
Intel Corporation
Intermind
Internet Profiles Corporation
Intraspect Software, Inc.
Joint Info. Systems Comm. of the
UK Higher Ed. Funding Council
Justsystem Corporation
K2Net, Inc.
KnowledgeCite
Kumamoto Institute of Computer
Software, Inc.
`Lexmark International, Inc.
`Los Alamos National Laboratory
`Lotus Development Corporation
`Lucent Technologies
`Mainspring Communications, Inc.
`Marimba, Inc.
`Matra Hachette
`MBED Software
`MCI Telecommunications
`Metrowerks Corporation
`Michelin
`
`Microsoft Corp.
`Microsystems Software, Inc.
`MITRE Corporation
`Mitsubishi Electric Corporation
`MTA SZTAKI
Narrowline
`National Center for
`Supercomputing
`Applications (NCSA)
`National Security Agency (NSA)
`National University of Singapore
`NCR
`
`NEC Corporation
`
`Netscape Communications
NHS (National Health Service, UK)
Nippon Telegraph & Telephone
Corp. (NTT)
NOKIA Corporation
Novell, Inc.
NTT Data Communications
Systems Corp.
Nynex Science & Technology, Inc.
O’Reilly & Associates, Inc.
Object Management Group,
Inc. (OMG)
Object Services and
Consulting, Inc.
`
`
`
`
`
`
`
`
`
`
OCLC (Online Computer Library Center, Inc.)
Omron Corporation
Open Market, Inc.
Open Sesame
Open Software Associates, Inc.
Open Software Foundation
Open Text Corporation
Oracle Corp.
ORSTOM
Pacifitech Corporation
Partners HealthCare System, Inc.
Pencom Web Works
Philips Electronics N.V.
Poet Software Corporation
PointCast Incorporated
Pretty Good Privacy, Inc.
Prodigy Services Corporation
The Productivity Works, Inc.
Progressive Networks
Public IP Exchange, Ltd. (PIPEX)
Qualcomm Inc.
Raptor Systems, Inc.
Reed-Elsevier
Reuters Limited
Rice University for Nat’l HPCC Software
Riverland Holding NV/SA
Royal Melbourne Institute of Technology
Security Dynamics Technologies, Inc.
Sema Group
Sharp Corporation
SICS
Siemens-Nixdorf
Silicon Graphics, Inc.
SLIGOS
SoftQuad, Inc.
Software Publishers Association (SPA)
Sony Corporation
Spyglass, Inc.
Strategic Interactive Group
Sun Microsystems Corporation
SURFnet bv
Swedish Institute for Systems Development (SISU)
Syracuse University
Tandem Computers Inc.
Technische Universität Graz
Teknema Corporation
Telecom Italia
Telequip Corporation
Tercel Group
Terisa Systems, Inc.
Thomson-CSF
TIAA-CREF
Toshiba Corporation
TriTeal Corporation
TRUSTe
UKERNA
Unwired Planet
USWeb Corporation
VeriSign, Inc.
Verity, Inc.
Vignette Corporation
VTT Information Technology
webMethods, Inc.
WebTV Networks Inc.
Wolfram Research, Inc.
WWW. Consult Pty Ltd.
WWW-KR
Xerox Corporation
Xionics Document Technologies, Inc.
`
`
`
`
`
`
`
`
CONTENTS

EDITORIAL

Guest Editor Dan Connolly and Series Editor Rohit
Khare team up to herald the appearance of XML
and discuss its evolution.

XML BACKGROUND

Members of the W3C's XML Editorial Review
Board talk about the road to XML: its history,
breakthroughs, the participation of Microsoft and
Netscape, and the work that remains.

WORK IN PROGRESS

In “The Web Is Ruined and I Ruined It,” self-
proclaimed HTML Terrorist David Siegel discusses
how proper separation of structure (HTML), style
(CSS), and semantics (XML) makes content more
compelling and design more effective.

TIMELINE

W3C REPORTS

Recent, noteworthy W3C events

See next page for detailed listing

TECHNICAL PAPERS

See next page for detailed listing

This issue’s cover image was photographed by
Kevin Thomas and manipulated in Adobe
Photoshop 4.0 by Edie Freedman.
`
`
`
`
`
`
`
`
`
W3C REPORTS

Extensible Markup Language (XML)
TIM BRAY, JEAN PAOLI, C. M. SPERBERG-MCQUEEN

Extensible Markup Language (XML)
Part 2: Linking
TIM BRAY, STEVE DEROSE

HTML-Math:
Mathematical Markup Language Working Draft
ROBERT R. MINER, PATRICK D. F. ION

Document Object Model Requirements
LAUREN WOOD, JARED SORENSEN

TECHNICAL PAPERS

A Guide to XML
NORMAN WALSH

XML and CSS
STUART CULSHAW, MICHAEL LEVENTHAL, AND MURRAY MALONEY

The Evolution of Web Documents:
The Ascent of XML
DAN CONNOLLY, ROHIT KHARE, ADAM RIFKIN

Embedded Markup Considered Harmful
THEODOR HOLM NELSON
`
Chemical Markup Language:
A Simple Introduction to Structured Documents
PETER MURRAY-RUST

Codifying Medical Records in XML:
Philosophy and Engineering
THOMAS L. LINCOLN

XML: Can the Desperate Perl Hacker Do It?
MICHAEL LEVENTHAL

XML: From Bytes to Characters
BERT BOS

An Introduction to XML Processing with Lark
TIM BRAY

Building XML Parsers for Microsoft's IE4
JEAN PAOLI, DAVID SCHACH, CHRIS LOVETT, ANDREW LAYMAN, ISTVAN CSERI

JUMBO: An Object-Based XML Browser
PETER MURRAY-RUST

Capturing the State of Distributed Systems with XML
ROHIT KHARE, ADAM RIFKIN

XML, Java, and the Future of the Web
JON BOSAK

WIDL: Application Integration with XML
CHARLES ALLEN
`
`
`
`
`
`
`
XML, JAVA,
AND THE FUTURE OF THE WEB

Jon Bosak

Introduction
`
The extraordinary growth of the World Wide Web has been fueled by the
ability it gives authors to easily and cheaply distribute electronic
documents to an international audience. As Web documents have become
larger and more complex, however, Web content providers have begun to
experience the limitations of a medium that does not provide the
extensibility, structure, and data checking needed for large-scale
commercial publishing. The ability of Java applets to embed powerful
data manipulation capabilities in Web clients makes even clearer the
limitations of current methods for the transmittal of document data.
`
To address the requirements of commercial Web publishing and enable
the further expansion of Web technology into new domains of
distributed document processing, the World Wide Web Consortium has
developed an Extensible Markup Language (XML) for applications that
require functionality beyond the current Hypertext Markup Language
(HTML). This paper describes the XML effort and discusses new kinds of
Java-based Web applications made possible by XML.*
`
`Background: HTML and SGML
Most documents on the Web are stored and transmitted in HTML. HTML is
a simple language well suited for hypertext, multimedia, and the
display of small and reasonably simple documents. HTML is based on
SGML (Standard Generalized Markup Language, ISO 8879), a standard
system for defining and using document formats.
`
SGML allows documents to describe their own grammar, that is, to
specify the tag set used in the document and the structural
relationships that those tags represent. HTML applications are
applications that hardwire a small set of tags in conformance with a
single SGML specification. Freezing a small set of tags allows users
to leave the language specification out of the document and makes it
much easier to build applications, but this ease comes at the cost of
severely limiting HTML in several important respects, chief among
which are extensibility, structure, and validation.
`
- Extensibility. HTML does not allow users to specify their own tags
  or attributes in order to parameterize or otherwise semantically
  qualify their data.

- Structure. HTML does not support the specification of deep
  structures needed to represent database schemas or object-oriented
  hierarchies.

- Validation. HTML does not support the kind of language
  specification that allows consuming applications to check data for
  structural validity on importation.
`
In contrast to HTML stands generic SGML. A generic SGML application is
one that supports SGML language specifications of arbitrary complexity
and makes possible the qualities of extensibility, structure, and
validation missing from HTML. SGML makes it possible to define your
own formats for your own documents, to handle large and complex
documents, and to manage large information repositories. However, full
SGML contains many optional features that are not needed for Web
applications and has proven to have a cost/benefit ratio unattractive
to current vendors of Web browsers.

* This paper, first published on the Web on November 17, 1996 [1], was
revised March 10, 1997 [2] and is here presented in a form edited for
the World Wide Web Journal.
`
`The XML Effort
`
The World Wide Web Consortium (W3C) has created an SGML Working Group
to build a set of specifications to make it easy and straightforward
to use the beneficial features of SGML on the Web. See the W3C
SGML/XML Activity page [3] for the current status of this effort. The
goal of the W3C SGML activity is to enable the delivery of
self-describing data structures of arbitrary depth and complexity to
applications that require such structures.
`
`The first phase of this effort is the specification of
`a simplified subset of SGML specially designed
`for Web applications. This subset, called XML
`(Extensible Markup Language), retains the key
`SGML advantages of extensibility, structure, and
`validation in a language that is designed to be
`vastly easier to learn, use, and implement than
`full SGML.
`
XML differs from HTML in three major respects:

1. Information providers can define new tag and attribute names at
   will.

2. Document structures can be nested to any level of complexity.

3. Any XML document can contain an optional description of its
   grammar for use by applications that need to perform structural
   validation.
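
The three differences can be made concrete with a small sketch. The document below uses invented element names (<patient>, <history>, <allergies>, <substance>) and an illustrative internal grammar; none of this markup comes from the XML drafts or from any real medical vocabulary. The sketch is written in Python for brevity, although the applications this paper has in mind would be Java applets, and it relies only on the standard library's xml.etree.ElementTree, which reads the document but does not validate it against the declared grammar.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: the tag names and the grammar are invented for illustration.
DOCUMENT = """<?xml version="1.0"?>
<!DOCTYPE patient [
  <!ELEMENT patient (name, history)>
  <!ATTLIST patient id CDATA #REQUIRED>
  <!ELEMENT name (#PCDATA)>
  <!ELEMENT history (visit*)>
  <!ELEMENT visit (allergies?)>
  <!ATTLIST visit date CDATA #REQUIRED>
  <!ELEMENT allergies (substance+)>
  <!ELEMENT substance (#PCDATA)>
]>
<patient id="p-001">
  <name>Jane Doe</name>
  <history>
    <visit date="1996-11-17">
      <allergies>
        <substance>penicillin</substance>
      </allergies>
    </visit>
  </history>
</patient>
"""


def max_depth(element):
    """Return the nesting depth of an element tree (a lone element has depth 1)."""
    return 1 + max((max_depth(child) for child in element), default=0)


def tag_names(element):
    """Collect every distinct tag name used in the tree."""
    return {node.tag for node in element.iter()}


root = ET.fromstring(DOCUMENT)
print(sorted(tag_names(root)))   # author-defined tags (difference 1)
print(max_depth(root))           # arbitrary nesting (difference 2)
```

The grammar in the DOCTYPE block corresponds to the third difference: it is optional, and an application that does not need validation can ignore it, as this sketch does.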
`
XML has been designed for maximum expressive power, maximum
teachability, and maximum ease of implementation. The language is not
backward-compatible with existing HTML documents, but documents
conforming to the W3C HTML 3.2 specification can easily be converted
to XML, as can generic SGML documents and documents generated from
databases.
`
The first working draft of XML was announced in November 1996 at the
SGML 96 Conference. A major revision of the draft was announced at the
Sixth World Wide Web Conference in April 1997. XML 1.0 is currently
scheduled for recommendation to the W3C Advisory Council during
October 1997. See the W3C XML page [3] for links to the latest draft.
`
`Web Applications of XML
`
The applications that will drive the acceptance of XML are those that
cannot be accomplished within the limitations of HTML. These
applications can be divided into four broad categories:

1. Applications that require the Web client to mediate between two or
   more heterogeneous databases.

2. Applications that attempt to distribute a significant proportion
   of the processing load from the Web server to the Web client.

3. Applications that require the Web client to present different
   views of the same data to different users.

4. Applications in which intelligent Web agents attempt to tailor
   information discovery to the needs of individual users.
`
The alternative to XML for these applications is proprietary code
embedded as “script elements” in HTML documents and delivered in
conjunction with proprietary browser plug-ins or Java applets. XML
derives from a philosophy that data belongs to its creators and that
content providers are best served by a data format that does not bind
them to particular script languages, authoring tools, and delivery
engines but provides a standardized, vendor-independent, level playing
field upon which different authoring and delivery tools may freely
compete.
`
`Database Interchange:
`The Universal Hub
`
A paradigmatic example of this first category of XML applications is
the information tracking system for a home health care agency.

Home health care is a major component of America’s multibillion-dollar
medical industry that continues to increase in importance as the
health care burden is shifted from hospitals to home care settings.
Information management is critical to this industry in order to meet
the record-keeping requirements of the federal agencies and health
maintenance organizations that pay for patient care.
`
The typical patient entering a home health care agency is represented
to the information system by a large collection of paper-based
historical materials in the form of patient medical histories and
billing data from a variety of doctors, hospitals, pharmacies, and
insurance companies. The biggest task in getting the patient into the
system is the manual entry of this material into the agency’s
database.
`
The coming of the Web has given the medical informatics community the
hope that an electronic means can be found to alleviate this burden.*
Unfortunately, existing Web applications represent fundamentally
insufficient models for an adequate solution. Hospitals have begun to
offer the agencies a solution that goes something like this:

1. Log into the hospital’s Web site.

2. Become an authorized user.

3. Access the patient’s medical records using a Web browser.

4. Print out the records from the browser.

5. Manually key in the data from the printouts.

The knowledgeable reader may smile at this “solution,” but in fact
this is not a joke; this is an actual proposal from a large American
hospital known for its early adoption of advanced medical information
systems.

A slightly more sophisticated version of this “solution” envisions the
operator reading the patient data from the Web browser and keying it
directly into the agency’s online forms-based interface in a separate
window instead of making a printout first. The only difference between
this version and the previous one is that it saves the paper that
would have been needed for the printout. It does nothing to address
the root of the problem. A real solution would look more like this:

1. Log into the hospital’s Web site.

2. Become an authorized user.

3. Access the patient’s medical records in a Web-based interface that
   represents the records for that patient with a folder icon.

4. Drag the folder from the Web application over to the internal
   database application.

5. Drop it into the database.

However, this solution is not possible within the limitations of HTML,
for three reasons.

- The HTML tag set is too limited to represent or differentiate
  between the multitude of database fields in the mixture of
  documents making up the patient’s medical history.

- HTML is incapable of representing the variety of structures in
  those documents.

- HTML lacks any mechanism for checking the data for structural
  validity before the receiving application attempts to import it
  into the target database.

One technically feasible way to implement seamless interchange of
patient care records is simply to require all hospitals and health
care agencies to use a single standard system dictated by the
government (such an approach has actually been suggested). In an
environment where hospitals are going out of business on a daily basis
and many health care agencies are in deep financial difficulty,
however, a scheme that requires replacing existing systems en masse is
hardly practical.

* For more information we refer you to Lincoln Stein’s article,
“Electronic Medical Records: Promises and Threats,” in the Summer 1997
issue of the W3J, entitled Web Security: A Matter of Trust.
`
The other way to enable interchange between heterogeneous systems is
to adopt a single industry-wide interchange format that serves as the
single output format for all exporting systems and the single input
format for all importing systems. This is, in fact, the purpose for
which SGML was initially designed, and XML simply carries on this
tradition.
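
The economics of a hub format can be sketched in a few lines. With n systems, direct pairwise translation needs n(n-1) one-way converters, whereas a shared interchange format needs only one exporter and one importer per system, 2n in all. In the Python sketch below, the internal record layouts, the field names, and the <patient-record> vocabulary are all hypothetical; it shows one system exporting to the hub format and a differently organized system importing from it.

```python
import xml.etree.ElementTree as ET


def converters_needed(n_systems):
    """Return (pairwise, hub) counts of one-way converters for n systems."""
    pairwise = n_systems * (n_systems - 1)  # every system to every other system
    hub = 2 * n_systems                     # one exporter plus one importer each
    return pairwise, hub


def hospital_export(row):
    """Exporter: turn the hospital's internal tuple into hub-format XML text."""
    surname, given, allergy_list = row
    record = ET.Element("patient-record")
    name = ET.SubElement(record, "name")
    ET.SubElement(name, "given").text = given
    ET.SubElement(name, "surname").text = surname
    allergies = ET.SubElement(record, "allergies")
    for substance in allergy_list:
        ET.SubElement(allergies, "substance").text = substance
    return ET.tostring(record, encoding="unicode")


def agency_import(xml_text):
    """Importer: turn hub-format XML into the agency's internal dictionary."""
    record = ET.fromstring(xml_text)
    given = record.findtext("name/given")
    surname = record.findtext("name/surname")
    return {
        "full_name": f"{given} {surname}",
        "allergies": [s.text for s in record.findall("allergies/substance")],
    }


hub_text = hospital_export(("Doe", "Jane", ["penicillin"]))
print(hub_text)
print(agency_import(hub_text))
print(converters_needed(10))
```

Neither side needs to know how the other stores its data; each only has to read or write the shared format.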
`
A number of industries, including the aerospace, automotive,
telecommunications, and computer software industries, have been using
hub languages to perform data interchange for years, and by this time
the process is well understood. Typically, the major players in an
industry form a standards consortium tasked with defining a Document
Type Definition (DTD), which is the way in which the tag set and
grammar of a markup language are defined. This DTD can then be sent
with documents that have been marked up in the industry standard
language using off-the-shelf editing tools, and any standard
application on the receiving end can validate and process them.

The XML solution is system-independent, vendor-independent, and proven
by over a decade of SGML implementation experience. XML merely extends
this proven approach to document interchange over the Web.
Interestingly, the same day on which the first XML 1.0 draft was
released also saw the formal announcement of an SGML initiative within
HL7, the standards organization for health care IS vendors, to develop
a Health Care Markup Language designed to solve exactly the kind of
problem described in this example.

Previous vertical-industry efforts have shown that capturing data in a
rich markup often has benefits beyond the immediate requirements of
data exchange. In a well-designed standardized patient data system,
for example, specific information originally gathered in the course of
a routine physical exam and tagged <allergies>, <drug-reactions>, and
so on would instantly be available to alert the staff of an emergency
room that an unconscious patient from a distant city was allergic to
penicillin. The ability of XML to define tags specific to an area of
application is critical to this scenario, because the otherwise
unqualified word “penicillin” in the thousands of pages of a patient’s
entire medical history could not trigger the recognition that the same
word inside an <allergies> element could trigger.

The health care example is relevant not only because of the scope of
the problem and the enormous sums of money involved but also because
it is paradigmatic of a very wide range of future Web applications:
any in which Web clients (or Java applications running on those
clients) are expected to mediate the lossless exchange of complex data
between systems that use different forms of data representation in a
way that can be standardized across an industry or other interest
group. Some random examples of such applications are:

- Legal publishing

- The government drug approval process

- Collaborative CAD/CAM efforts

- Collaborative calendar management across different systems

- Any corporate network application that works across databases,
  especially where policies must be enforced: purchase orders,
  expense requests, etc.

- Exchange of information between players in any broker-organized
  business: insurance, securities, banking, etc.
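
The allergy alert discussed in this section shows why application-specific tags matter. The following is a minimal Python sketch under stated assumptions: the record vocabulary (<record>, <note>) is hypothetical, and only the <allergies> and <drug-reactions> names echo the text. A plain text search finds every mention of "penicillin", while a search restricted to <allergies> elements finds only the clinically meaningful one.

```python
import xml.etree.ElementTree as ET

# Hypothetical patient history; element names are illustrative only.
HISTORY = """
<record>
  <note>Discussed penicillin with the patient's pharmacist.</note>
  <note>Prescribed a ten-day course after the 1994 infection.</note>
  <allergies>penicillin</allergies>
  <drug-reactions>mild rash from sulfa drugs</drug-reactions>
</record>
"""


def mentions(xml_text, word):
    """Count elements whose text mentions the word, regardless of tag."""
    root = ET.fromstring(xml_text)
    return sum(word in (el.text or "").lower() for el in root.iter())


def allergy_alert(xml_text, word):
    """Return True only if the word appears inside an <allergies> element."""
    root = ET.fromstring(xml_text)
    return any(word in (el.text or "").lower() for el in root.iter("allergies"))


print(mentions(HISTORY, "penicillin"))       # every mention, meaningful or not
print(allergy_alert(HISTORY, "penicillin"))  # only the tagged allergy counts
print(allergy_alert(HISTORY, "sulfa"))       # a reaction, not a recorded allergy
```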
`
`Distributed Processing:
`Giving Java Something to Do
`
`A paradigmatic example of this second category
`of XML applications is the data delivery system
`designed by the semiconductor industry.
`
Each major semiconductor manufacturer maintains several terabytes of
technical data on all of the ICs that it produces. To enable
interchange of this data, an industry consortium (the Pinnacles Group)
was formed several years ago by Intel, National Semiconductor,
Philips, Texas Instruments, and Hitachi to design an industry-specific
SGML markup language. The consortium finished that specification in
1995, and its member companies are now well into the implementation
phase of the process.
`
One might think that the rise in popularity of HTML would cause the
Pinnacles members to reconsider their decision, but in fact the
limitations of HTML have convinced them that their original strategy
was the correct one. Their initial idea was that the richly
parameterized data stream made possible by the industry-specific SGML
markup would enable intelligent applications not merely to display
semiconductor data sheets as readable documents but actually to drive
design processes. It is now recognized that this approach is a perfect
fit with the concept of distributed Java applets, and the vision of
the near future is one in which engineers can access a manufacturer’s
Web site and download not only viewable data on particular integrated
circuits but also a Java applet that allows them to model those
circuits in various combinations.
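
The division of labor described here can be sketched as follows: the client fetches a structured data sheet once and then explores combinations locally, with no further server round trips. The sketch is in Python rather than Java for brevity, and the <datasheet> vocabulary, part names, and figures are invented for illustration; they are not taken from the Pinnacles specification.

```python
import itertools
import xml.etree.ElementTree as ET

# Stand-in for data downloaded once from a manufacturer's site (hypothetical).
DATASHEET = """
<datasheet>
  <ic part="A100"><supply-current unit="mA">12</supply-current></ic>
  <ic part="B200"><supply-current unit="mA">30</supply-current></ic>
  <ic part="C300"><supply-current unit="mA">45</supply-current></ic>
</datasheet>
"""


def load_parts(xml_text):
    """Parse the data sheet into {part name: supply current in mA}."""
    root = ET.fromstring(xml_text)
    return {ic.get("part"): float(ic.findtext("supply-current"))
            for ic in root.findall("ic")}


def combinations_within_budget(parts, size, budget_ma):
    """Yield each combination of `size` parts whose total current fits the budget."""
    for combo in itertools.combinations(sorted(parts), size):
        total = sum(parts[name] for name in combo)
        if total <= budget_ma:
            yield combo, total


parts = load_parts(DATASHEET)            # one parse of the downloaded data
for combo, total in combinations_within_budget(parts, 2, 60):
    print(combo, total)                  # all further work happens on the client
```

Because the markup labels each value, the same data could drive a quite different client-side tool without any change on the server.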
`
The semiconductor application is a good demonstration of the
advantages of XML because:

1. It requires industry-specific markup that cannot be implemented
   within the confines of the fixed HTML tag set.

2. It requires that the data representation be platform- and
   vendor-independent so that data from a variety of sources can be
   used to drive a variety of distributed applications (some of which
   may be provided by third parties, generating a subindustry of
   providers of tools that can work with the standardized data
   stream).

3. Its utility rests ultimately in the fact that a
   computation-intensive process (modeling circuits for hours at a
   time) that would otherwise entail an enormous, extended resource
   hit on the server has been changed into a brief interaction with
   the server followed by an extended interaction with the user’s own
   Web client. This aspect has been summed up in the slogan “XML
   gives Java something to do.”

Note that validation, while sometimes important, does not always play
the crucial role in this category of applications that it does in
applications where data must be checked for structural integrity
before entering a database. To make processing as efficient as
possible, XML has been designed so that validation is optional in
applications where it is not needed.

As with the health-care example, the semiconductor application is
notable not merely for the sheer size of the market it represents but
also because it is paradigmatic of an enormous range of future
Java-based Web applications: virtually any application in which
standardized data is expected to be manipulated in interesting ways on
the client. Perhaps the most obvious examples of such applications are
the following:

- Design applications where the designer would otherwise use server
  cycles to consider various alternatives: electronics, engineering,
  architecture, menu planning, etc.

- Scheduling applications where a customer would otherwise use server
  cycles to entertain various possibilities: airlines, trains, buses,
  and subways; restaurants, movies, plays, and concerts. This is what
  Easy Sabre and Ticketron will look like a few years from now as the
  economies of distributed Java-based processing become evident.

- Commercial applications that allow consumers to explore
  alternatives by supplying different shopping criteria: real estate,
  automobiles, appliances, etc.

- The entire spectrum of educational applications, a small subset of
  which are the ones we call “online help.”

- The entire spectrum of customer-support applications, ranging from
  lawn-mower maintenance through technical support for computers.
`
`
`
A harbinger of applications to come in the last category is the
Solution Exchange Standard, an SGML markup language announced in June
1996 by a consortium of over 60 hardware, software, and communications
companies to facilitate the exchange of technical support information
among vendors, system integrators, and corporate help desks. In the
words of the announcement:

    The standard has been designed to be flexible. It is independent
    of any platform, vendor or application, so it can be used to
    exchange solution information without regard to the system it is
    coming from or going to. [...] Additionally, the standard has been
    designed to have a long lifetime. SGML offers room for growth and
    extensibility, so the standard can easily accommodate rapidly
    changing support environments.

Such applications, which the XML subset is specifically designed to
address, will grow in importance as consumers come to expect
interoperability among their data-manipulating applets and information
providers confront the realities of trying to support
computation-intensive tasks directly on their Web servers.

View Selection: Letting the User Decide

A third variety of XML applications consists of those in which users
may wish to switch between different views of the data without
requiring that the data be downloaded again in a different form from
the Web server.

One early application in this category will be dynamic tables of
contents (TOCs). It is possible now, using Web servers built on
object-oriented databases, to present the user with a table of
contents into a large collection of data that can be expanded with a
mouse click to “open up” a portion of the TOC and reveal more detailed
levels of the document structure. Dynamic TOCs of this kind can be
generated at run time directly from the hierarchical structure of the
document. Unfortunately, the Web latency built into every expansion or
contraction of the TOC makes this process sluggish in many user
environments. A much better solution is to download the entire
structured TOC to the client rather than just individual
server-generated views of the TOC. Then the user can expand, contract,
and move about in the TOC supported by a much faster process running
directly on the client.

A group at Sun actually implemented a form of this solution as part of
a Java-based HTML help browser, but the limitations of HTML required
the team to come up with a couple of clever workarounds. In this
application, a TOC was constructed by hand (the lack of structure in
ordinary HTML makes it impossible to reliably generate a TOC directly
from the document) using nonstandard tags invented for the purpose,
and then the TOC piece was wrapped in a comment within an HTML page to
hide the nonstandard markup from Web browsers. A Java applet
downloaded with the HTML document interpreted the hidden markup and
provided the client-based TOC behavior.

In practice, this application worked very well and testified both to
the ingenuity of its designers and to the validity of the basic
concept. But in an XML environment, neither the manual creation of the
TOC nor its concealment would have been necessary.
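
In an XML environment, a client-side TOC of the kind described here could be derived directly from the document's own structure. The following is a minimal Python sketch, assuming a hypothetical <section title="..."> vocabulary (it is not the markup used by the Sun help browser): the whole structure is parsed once, and expanding or collapsing an entry changes only local state before the view is rendered again.

```python
import xml.etree.ElementTree as ET

# Hypothetical document structure; <section> and its title attribute are assumptions.
DOCUMENT = """
<manual>
  <section title="Installation">
    <section title="Unpacking"/>
    <section title="Wiring">
      <section title="Grounding"/>
    </section>
  </section>
  <section title="Maintenance"/>
</manual>
"""


def render_toc(parent, expanded, depth=0):
    """Return TOC lines, descending only into sections the user has expanded."""
    lines = []
    for section in parent.findall("section"):
        title = section.get("title")
        has_children = section.find("section") is not None
        # "-" marks an open entry, "+" a closed entry with children, " " a leaf.
        marker = "-" if title in expanded else "+" if has_children else " "
        lines.append("  " * depth + marker + " " + title)
        if title in expanded:
            lines.extend(render_toc(section, expanded, depth + 1))
    return lines


root = ET.fromstring(DOCUMENT)   # downloaded and parsed once
expanded = set()                 # purely client-side view state

print("\n".join(render_toc(root, expanded)))
expanded.add("Installation")     # a "mouse click": no server round trip
print("\n".join(render_toc(root, expanded)))
```

The same parsed tree could back other views of the data as well, which is the point of this category of applications.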