`
`
`WORLD WIDE WEB JOURNAL
`
`
`XML
`
`Principles, Tools,
`
`and Techniques
`
`
`
`
`
`O’REILLY
`
`
`
`IBM-1011
`
`Page 1 of 16
`
`
`
`
`WORLD WIDE WEB JOURNAL
`XML: PRINCIPLES, TOOLS, AND TECHNIQUES Volume 2, Issue 4, Fall 1997
`
`Publisher: Dale Dougherty
`
`Guest Editor: Dan Connolly
`
`Series Editor: Rohit Khare
`
`Managing Editor: Donna Woonteiler
`News Editor: D.C. Denison
`
`Production Editor: Nancy Crumpton
`
`Technical Illustrator: Robert Romano
`
`Software Tools Specialist: Mike Sierra
`
`Quality Assurance: Ellie Fountain Maden
`Cover Design: Hanna Dyer
`
`Text Design: Nancy Priest, Marcia Ciro
`
`Subscription Administrator: Marianne Cooke
`Photos: Flint Born
`
`ISBN: 1-56592-349-9
`
`The individual contributions are copyrighted by the authors or their respective employers. The print
`compilation is Copyright © 1997 O’Reilly & Associates, Inc. All rights reserved. Printed in the United
`States of America.
`
`Many of the designations used by manufacturers and sellers to distinguish their products are claimed
`as trademarks. Where those designations appear in this book, and O’Reilly & Associates, Inc. was aware
`of a trademark claim, the designations have been printed in caps or initial caps.
`
`While every precaution has been taken in the preparation of this book, the publisher assumes no
`responsibility for errors or omissions, or for damages resulting from the use of the information
`contained herein.
`
`This book is printed on acid-free paper with 85% recycled content, 15% post-consumer
`waste. O’Reilly & Associates is committed to using paper with the highest recycled content
`available consistent with high quality.
`
`ISSN: 1085-2301
`
`
`
`
`
`Architecture
`Dan Connolly
`Domain Leader
`connolly@w3.org
`Arnaud Le Hors
`lehors@w3.org
`Jim Gettys
`jg@w3.org
`Philipp Hoschka
`hoschka@w3.org
`Masayasu “Mimasa” Ishikawa
`mimasa@w3.org
`Youichirou Koga
`yvkoga@w3.org
`Yves Lafon
`lafon@w3.org
`Ora Lassila
`lassila@w3.org
`Håkon Lie
`howcome@w3.org
`Chris Lilley
`chris@w3.org
`Henrik Frystyk Nielsen
`frystyk@w3.org
`Dave Raggett
`dsr@w3.org
`Irène Vatton
`vatton@w3.org
`Daniel Veillard
`veillard@w3.org
`
`W3C Administration
`Jean-François Abramatic
`W3C Chairman and Associate
`Director, MIT Laboratory for
`Computer Science
`jfa@w3.org
`Tim Berners-Lee
`Director of the W3C
`timbl@w3.org
`Vincent Quint
`Deputy Director for Europe
`quint@w3.org
`Nobuo Saito
`W3C Associate Chairman
`and Dean, Keio University
`nobuo.saito@w3.org
`Alan Kotok
`W3C Associate Chairman
`kotok@w3.org
`Tatsuya Hagino
`Deputy Director for Asia
`hagino@w3.org
`
`
`User Interface
`Vincent Quint
`Domain Leader
`quint@w3.org
`Bert Bos
`bert@w3.org
`Ramzi Guetari
`guetari@w3.org
`
`José Kahan
`kahan@w3.org
`Sally Khudairi
`khudairi@w3.org
`Stephan Montigaud
`montigaud@w3.org
`Gerald Oskoboiny
`gerald@w3.org
`Luc Ottavj
`ottavj@w3.org
`Pierre Fillault
`fillault@w3.org
`Takeshi "Yamachan" Yamane
`yamachan@w3.org
`
`Administrative Support
`Pamela Ahern
`pam@w3.org
`Susan Hardy
`susan@w3.org
`Marie-Line Ramfos
`ramfos@w3.org
`Josiane Roberts
`roberts@inria.fr
`
`Nancy Ryan
`ryan@w3.org
`Yukari Mitsuhashi
`yukari@w3.org
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`Technology and Society
`Jim Miller
`Domain Leader
`jmiller@w3.org
`Eui-Suk Chung
`euisuk@w3.org
`Daniel Dardailler
`danield@w3.org
`Philip DesAutels
`philipd@w3.org
`Josef Dietl
`jdietl@w3.org
`Joseph Reagle
`reagle@w3.org
`Ralph Swick
`swick@w3.org
`
`Cross Areas and
`Technical Support
`Janet Bertot
`bertot@w3.org
`Stephane Boyera
`boyera@w3.org
`Daichi Funato
`daichi@w3.org
`Tom Greene
`tjg@w3.org
`
`
`
`
`
`
`
`
`
`
`ABN-AMRO Bank
`
`Access Company Limited
`Adobe Systems Inc.
`Aérospatiale
`AGFSI
`
`Agfa Division, Bayer Corp.
`Agranat Systems, Inc.
`Alcatel Alsthom Recherche
`Alfa-Omega Foundation
`Alis Technologies, Inc.
`America Online, Inc.
`American International Group Data
`Center, Inc. (AIG)
`American Internet Corporation
`Apple Computer, Inc.
`ArborText, Inc.
`Architecture Projects
`Management Ltd.
`ArrowPoint Communications
`Art Technology Group
`Asymetrix Corporation
`AT&T
`
`Attachmate Corporation
`BackWeb Technologies, Inc.
`BELGACOM
`Bellcore
`Bitstream, Inc.
`British Telecommunications
`Laboratories
`Bull S.A.
`Canal +
`Canon, Inc.
`Cap Gemini Innovation
`Center for Democracy
`and Technology
`Center for Mathematics and
`Computer Science (CWI)
`CERN
`CIRAD
`
`CNET: The Computer Network
`CNR-Istituto Elaborazione
`dell’Informazione
`CNRS
`
`Commissariat à l’Énergie
`Atomique (CEA)
`CompuServe, Inc.
`Computer Answer Line
`Corporation for National Research
`Initiatives (CNRI)
`CosmosBay
`Council for the Central
`Laboratory of the Research
`Councils (CCL)
`CyberCash, Inc.
`Cygnus Support
`Daewoo Electronics Company
`Dassault Aviation
`Data Channel
`
`Data Research Associates, Inc.
`Defense Information Systems
`Agency (DISA)
`Deutsche Telekom Online Pro
`Dienste GmbH & Co. KG
`(T-Online)
`Digital Equipment Corporation
`Digital Vision Laboratories
`Corporation
`DigitalStyle Corporation
`Direct Marketing Association, Inc.
`DoubleClick
`
`Eastman Kodak Company
`École Nationale Supérieure
`d’Informatique et de
`Mathématiques
`Appliquées (ENSIMAG)
`EDF
`
`EEIG/ERCIM
`ENEL
`
`Engage Technologies
`ENN Corporation
`Enterprise Integration Technology
`Entrust Technologies, Inc.
`ERICSSON
`
`Ernst & Young LLP
`ETNO TEAM S.p.A.
`Firefly Network, Inc.
`First Virtual Holdings, Inc.
`
`FirstFloor Software, Inc.
`Folio Corporation
`Foundation for Research and
`Technology (FORTH)
`France Telecom
`Fujitsu Limited
`Fulcrum Technologies, Inc.
`GCTECH S.A.
`GEMPLUS
`
`General Magic, Inc.
`Geoworks
`GMD Institute FIT
`Graphic Communications
`Association
`Grenoble Network Initiative
`GRIF S.A.
`Groupe ESC Grenoble
`Harlequin Inc.
`HAVAS
`Hewlett Packard
`Laboratories, Bristol
`Hitachi, Ltd.
`@Home Network
`Hong Kong Jockey Club
`Hummingbird Communications Ltd.
`IBERDROLA S.A.
`IBM Corporation
`ILOG, S.A.
`InContext Systems
`Industrial Technology
`Research Institute
`Infopartners
`INRETS
`
`Inso Corporation, Providence
`Institut Franco-Russe A.M.
`Liapunov
`Institute for Information Industry
`Intel Corporation
`Intermind
`
`Internet Profiles Corporation
`Intraspect Software, Inc.
`Joint Info. Systems Comm. of the
`UK Higher Ed. Funding Council
`
`Justsystem Corporation
`K2Net, Inc.
`KnowledgeCite
`Kumamoto Institute of Computer
`Software, Inc.
`Lexmark International, Inc.
`Los Alamos National Laboratory
`Lotus Development Corporation
`Lucent Technologies
`Mainspring Communications, Inc.
`Marimba, Inc.
`Matra Hachette
`MBED Software
`MCI Telecommunications
`Metrowerks Corporation
`Michelin
`
`Microsoft Corp.
`Microsystems Software, Inc.
`MITRE Corporation
`Mitsubishi Electric Corporation
`MTA SZTAKI
`Narrowline
`National Center for
`Supercomputing
`Applications (NCSA)
`National Security Agency (NSA)
`National University of Singapore
`NCR
`
`NEC Corporation
`
`Netscape Communications
`NHS (National Health Service, UK)
`Nippon Telegraph & Telephone
`Corp. (NTT)
`NOKIA Corporation
`Novell, Inc.
`NU Data Communications
`Systems Corp.
`Nynex Science & Technology, Inc.
`O’Reilly & Associates, Inc.
`Object Management Group,
`Inc. (OMG)
`Object Services and
`Consulting, Inc.
`
`
`
`
`
`
`
`
`
`
`OCLC (Online Computer Library Center, Inc.)
`Omron Corporation
`Open Market, Inc.
`Open Sesame
`Open Software Associates, Inc.
`Open Software Foundation
`Open Text Corporation
`Oracle Corp.
`ORSTOM
`Pacifitech Corporation
`Partners HealthCare System, Inc.
`Pencom Web Works
`Philips Electronics N.V.
`Poet Software Corporation
`PointCast Incorporated
`Pretty Good Privacy, Inc.
`Prodigy Services Corporation
`The Productivity Works, Inc.
`Progressive Networks
`Public IP Exchange, Ltd. (PIPEX)
`Qualcomm Inc.
`Raptor Systems, Inc.
`Reed-Elsevier
`Reuters Limited
`Rice University for Nat’l HPCC Software
`Riverland Holding NV/SA
`Royal Melbourne Institute of Technology
`Security Dynamics Technologies, Inc.
`Sema Group
`Sharp Corporation
`SICS
`Siemens-Nixdorf
`Silicon Graphics, Inc.
`SLIGOS
`SoftQuad, Inc.
`Software Publishers Association (SPA)
`Sony Corporation
`Spyglass, Inc.
`Strategic Interactive Group
`Sun Microsystems Corporation
`SURFnet bv
`Swedish Institute for Systems Development (SISU)
`Syracuse University
`Tandem Computers Inc.
`Technische Universität Graz
`Teknema Corporation
`Telecom Italia
`Telequip Corporation
`Terisa Systems, Inc.
`Tercel Group
`Thomson-CSF
`TIAA-CREF
`Toshiba Corporation
`TriTeal Corporation
`TRUSTe
`UKERNA
`Unwired Planet
`USWeb Corporation
`VeriSign, Inc.
`Verity, Inc.
`Vignette Corporation
`VTT Information Technology
`webMethods, Inc.
`WebTV Networks Inc.
`Wolfram Research, Inc.
`WWW. Consult Pty Ltd.
`WWW-KR
`Xerox Corporation
`Xionics Document Technologies, Inc.
`
`
`
`
`
`
`
`
`CONTENTS
`
`
`EDITORIAL
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`Guest Editor Dan Connolly and Series Editor Rohit
`Khare team up to herald the appearance of XML
`and discuss its evolution.
`
`
`XML BACKGROUND
`
`Members of the W3C's XML Editorial Review
`
`Board talk about the road to XML: its history,
`breakthroughs, the participation of Microsoft and
`Netscape, and the work that remains.
`
`
`WORK IN PROGRESS
`
`In “The Web Is Ruined and I Ruined It,” self-proclaimed HTML Terrorist David
`Siegel discusses how proper separation of structure (HTML), style (CSS), and
`semantics (XML) makes content more compelling and design more effective.
`
`
`TIMELINE
`
`
`W3C REPORTS
`
`Recent, noteworthy W3C events
`
`See next page for detailed listing
`
`
`TECHNICAL PAPERS
`
`See next page for detailed listing
`
`
`
`This issue’s
`cover image was
`photographed by
`Kevin Thomas and
`manipulated in
`Adobe Photoshop 4.0
`by Edie Freedman.
`
`
`
`
`
`
`
`
`
`
`
`CONTENTS
`
`W3C REPORTS
`
`Extensible Markup Language (XML)
`TIM BRAY, JEAN PAOLI, C.M. SPERBERG-MCQUEEN
`
`
`
`Extensible Markup Language (XML)
`
`Part 2: Linking
`TIM BRAY, STEVE DEROSE
`
`HTML-Math:
`
`Mathematical Markup Language Working Draft
`ROBERT R. MINER, PATRICK D. F. ION
`
`
`
`Document Object Model Requirements
`LAUREN WOOD, JARED SORENSEN
`
`
`
`TECHNICAL PAPERS
`
`A Guide to XML
`NORMAN WALSH
`
`
`
`
`
`
`
`
`XML and CSS
`STUART CULSHAW, MICHAEL LEVENTHAL, AND MURRAY MALONEY
`
`The Evolution of Web Documents:
`
`The Ascent of XML
`DAN CONNOLLY, ROHIT KHARE, ADAM RIFKIN
`
`Embedded Markup Considered Harmful
`THEODOR HOLM NELSON
`
`
`
`
`
`
`
`
`
`
`Chemical Markup Language:
`A Simple Introduction to Structured Documents
`PETER MURRAY-RUST
`
`Codifying Medical Records in XML:
`Philosophy and Engineering
`THOMAS L. LINCOLN
`
`XML: Can the Desperate Perl Hacker Do It?
`MICHAEL LEVENTHAL
`
`XML: From Bytes to Characters
`BERT BOS
`
`An Introduction to XML Processing with Lark
`TIM BRAY
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`Building XML Parsers for Microsoft's IE4
`JEAN PAOLI, DAVID SCHACH, CHRIS LOVETT, ANDREW LAYMAN, ISTVAN CSERI
`
`JUMBO: An Object-Based XML Browser
`PETER MURRAY-RUST
`
`Capturing the State of Distributed Systems with XML
`ROHIT KHARE, ADAM RIFKIN
`
`XML, Java, and the Future of the Web
`JON BOSAK
`
`WIDL: Application Integration with XML
`CHARLES ALLEN
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`XML BACKGROUND
`
`
`The Road to XML
`
`
`ADAPTING SGML TO THE WEB
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`Many computer scientists have talked about simplifying SGML. The W3C’s XML
`Editorial Review Board has been working at it since July ’96. So far, their
`efforts have received almost universal acclaim. Recently D.C. Denison canvassed
`a group of Editorial Review Board (ERB) members and asked them to look back on
`how the XML project got off the ground, and where they think it’s going from here.
`
`XML wasn’t the only acronym in the running when W3C’s Working Group began to
`consider a name for what they hoped to create: specifications for a subset of
`SGML that was optimized for the Web.
`
`“There were several acronyms that we considered,” Tim Bray remembers. “I believe
`there was MGML, for Minimal Generalized Markup Language, and something called
`SIMPL for Simple Internet Markup Protocol, or something like that. Eventually we
`voted, and XML, for Extensible Markup Language, won out. It was short and sweet,
`and people liked it.”
`
`“Marketing XML to the HTML user was one of
`our prime goals,” Jean Paoli adds. “We thought
`that putting the spin on the ‘Extensibility’ part
`of the language would attract the HTML user.”
`
`Choosing a name for the project was trivial, of course, compared to some of the
`other challenges that faced the group when they first started working together
`in July 1996. Many other efforts to simplify SGML had run out of steam long
`before reaching the proposal stage. Somehow, however, this group managed to pull
`it off, publishing a working draft that’s received wide acceptance in both the
`SGML and Web communities. How did they do it?
`
`
`
`
`
`
`
`Let’s Go Back a Few Years
`
`A slimmed-down SGML is not a new concept.
`Many members of the XML team have been
`discussing the idea for years.
`
`“Most computer scientists who have worked with SGML have proposed
`simplifications; specifically, keeping all the structural flexibility but losing
`many syntax options,” ERB member Steve DeRose says. “I’ve heard of about a dozen
`proposals over the years.”
`
`Some of the XML authors, in fact, were already
`using a sort of proto-XML.
`
`“I think it’s important to understand that I and some other people had actually
`been doing XML for years,” Tim Bray says. “A lot of people who are in the
`business were actually using SGML data in the case of open text searching and
`displaying. In the case of electronic book technology, there was a similar kind
`of story: we had long observed the fact that if they sent you some nicely tagged
`text you could do any number, any amount of useful things with it without
`worrying about the minutiae of the standard and without having to have a DTD. So
`what XML in effect is has been around for a long time.”
`
`Dave Hollander was another XML author who
`had already jumped the gun, so to speak.
`
`“I developed a simplified SGML language while working on HP’s LaserROM program
`in the early ’90s,” he recalls. “That evolved into the language used in our
`HP-UX help systems.”
`
`The rise of the Web, and HTML, pressed other
`members of the ERB to approach XML from
`the other direction.
`
`“Everyone who uses HTML for very long discovers that they want ‘just one more
`tag,’” according to Steve DeRose. “If you’re doing catalogs you need a <PRICE>
`tag; for repair manuals you need <PARTNUM>; for ancient manuscripts you need
`<LACUNA> and <SIC>. Having been through this enough times, I want to be able to
`create new information structures any time my data justifies them, and do it
`easily. This is why C++ lets you make your own classes (imagine a development
`environment that didn’t!), and it’s why XML is absolutely necessary. To do
`generalized processing, retrieval, etc., I have to be able to say what things in
`documents are. I can do that with XML, but I can’t do it with any one particular
`fixed tag set.”
`
`Jean Paoli was also well aware of HTML’s shortcomings. “I discovered that a lot
`of Web content providers were using what they called ‘structured comments’ to
`hide information in their HTML,” he says. “I was convinced that they needed a
`simple way to extend HTML, and I always thought that it could be a kind of
`simplified SGML that my SGML customers were all already using.”
`
`Jon Bosak was similarly inspired.
`
`
`
`
`
`
`
`
`“XML arose from the realization that HTML is insufficient for certain kinds of
`Web applications,” he says. “I was one of the people who came to this
`realization early because I was working in a field (online technical
`documentation) where the requirements are well understood and it’s clear that
`HTML can’t meet them. I was putting this complex material in online browsers
`used by millions of people before anyone had heard of HTML or the Web, and I
`knew from experience that HTML wasn’t going to work for that kind of publishing.
`I knew that it wouldn’t work well for any kind of large-scale content
`production. So I could see a time coming when large content providers would have
`to turn from HTML to something more powerful. The question was, what would they
`provide?
`
`“I could see only two possibilities: either the big software companies would
`offer proprietary and probably binary-coded formats or we could get them to
`adopt a single, standard, human-readable format. The only standard solution that
`I knew could do the job was SGML.”
`
`Bosak’s solution: “I started a working group in the W3C to provide
`specifications that could put SGML on the Web. What came out of that activity
`was XML, a subset of SGML designed for Web use.”
`
`Working and Evangelizing
`
`The official W3C group, originally called the SGML ERB, began working together
`in July 1996; the larger mailing list discussions, the SGML Working Group,
`started the following September. Work proceeded quietly through most of 1996 and
`early 1997, via teleconferences, email, and the occasional conference. (In July
`’97, the SGML ERB became the XML WG and the SGML WG became the XML SIG.)
`
`Meanwhile interest was growing, as the XML authors discussed the project with
`their colleagues. Perhaps it was an early indication of XML’s flexibility that
`some authors, like Tim Bray, found that they could tailor their descriptions of
`XML to their audience.
`
`“If I was talking to people to whom search and retrieval is very important, I
`would point out that when you invent your own tags, you can use them to drive
`searches and that’s a lot better,” he recalls.
`
`“When I was talking to people to whom Java and that whole type of thing is
`important, I would point out that HTML is fine but it doesn’t give Java much to
`chew on. And XML does. And if I was talking to people who are in the publishing
`business and are irritated at HTML’s fairly primitive page make-up facilities, I
`would point out that one solution to that is to de-couple the markup syntax and
`the formatting semantics, and XML does that.”
`
`When ERB member C.M. Sperberg-McQueen spoke to colleagues about XML, he promoted
`“the ability to use your own tags, rather than the rather eccentric and
`constricted vocabulary of HTML,” he recalls. “That’s easily the most important
`aspect of XML from the point of view of academic research. The ability to write
`an XML parser that fits in 50 Kb of memory also captured the attention of a lot
`of programmers and tool developers.”
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`Eve Maler found that the XML applications that generated the most excitement
`were “the ones that blur the distinction between information delivery and
`transacting business, such as ordering a new part by clicking on a part number
`in an online service bulletin. And the idea of using XML as an exchange protocol
`for purely transaction-oriented applications is also pretty popular, as we’ve
`seen by the quick promotion of XML-based EDI initiatives.
`
`“Of course, for many people who have had exposure only to HTML, they’re most
`impressed simply by the notion that tags can have meaning,” Maler continues.
`“Many of the business and technical requirements they’ve conceived to date could
`be addressed with this one innovation!”
`
`Soon, a certain software company began to show an interest in XML. Jean Paoli,
`of Microsoft, a member of the original SGML Editorial Review Board, had been
`aggressively evangelizing XML to the company’s Explorer product teams.
`
`“When I talked about XML to the people here at Microsoft,” Paoli remembers, “I
`always stressed its ability to encode data, not documents. Nobody at Microsoft
`understood why you would want to use XML for things that HTML is good for. But
`data? Yes. And describing customers and orders? Yes. Financial information? Yes.
`So I always sold XML to the database people, the people who understood the value
`of structuring data.”
`
`“Adam Bosworth (who designed Microsoft
`Access) and Thomas Reardon helped me a lot
`selling this idea.”
`
`“But, even more important, it was the Channel Definition Format (CDF) that
`helped sell the whole XML story to Microsoft,” Paoli continues. “At that moment
`(February ’97), the push battle was terrible between Netscape and Microsoft, and
`the Internet Explorer team was searching for a good data file format to
`represent Webcasting information. It was evident that XML was a good choice. I
`presented XML to the managers of the Microsoft Internet Push team and we modeled
`their Webcasting data in ten minutes! It took only a few days to decide to use
`XML. The first XML application (CDF) by Microsoft gave Microsoft a big win. This
`was the beginning of a lot of PR around XML. Starting XML with a winning
`application was a great thing for XML!”
`
`In March ’97, Microsoft officially announced that they were going to base their
`new Channel Definition Format on XML. This generated a fair amount of interest
`in XML among programmers and Internet professionals.
`
`Breakthroughs
`
`As late ’96 turned into early ’97, two events brought a new level of attention
`to the XML project. The first was the SGML ’96 conference, held in November 1996.
`
`“The SGML ’96 conference was a watershed,” Steve DeRose remembers, “because it
`was not clear whether the SGML community would see XML as SGML writ large, or as
`some kind of competitor. Since SGML software already supports tag extensibility,
`variant delimiters, etc., and the SGML market has huge amounts of high-value
`data, this community is important. The SGML community saw the benefits of
`simplicity and ease of adoption and jumped on board. The Web community has done
`the same, though for different reasons: extensibility and validation. The beauty
`of XML is that it gets you the best of both worlds; but any technology like that
`overlaps partly with both of the things it draws on; the reception in both
`communities is therefore crucial. As soon as I saw the major SGML vendors and
`the major Web vendors all diving in, I knew we were in good shape.”
`
`The WWW6 Conference, held early in 1997 in Santa Clara, California, was another
`milestone.
`
`“We put on a major PR blitz at that conference, and I think it went over pretty
`well,” Tim Bray recalls. “I think XML was one of the hot stories of that
`conference. By May 1 of ’97 it was pretty obvious we were onto something that
`was going to be significant. And it’s grown since then.”
`
`“Microsoft announced CDF based on XML a few weeks before the WWW6 conference, on
`purpose, in order to boost the interest in XML,” Paoli remembers. “I took a
`bunch of Microsoft people who were involved in XML to the conference, and we
`made as much noise as possible in all the XML sessions.”
`
`“The SGML people got it as soon as they saw XML,” Jon Bosak recalls, “because
`they all come from industries that had to solve this problem a long time ago.
`The HTML people only got it this year; that’s when they started hitting the wall
`in large numbers, in terms of having to deal with significant levels of content.
`At the WWW5 Conference in Paris a year earlier, not many people knew what I was
`talking about. But when we presented the XML draft at the WWW6 Conference in
`April ’97, about half the faces in the audience lit up. Those were content
`providers and Web site administrators who’d finally hit that wall. They knew
`that they had a problem, they just didn’t know what to do about it. As soon as
`they saw XML, they knew.”
`
`Microsoft versus Netscape
`
`Soon Netscape joined Microsoft in agreeing to support the new standard. Tim Bray
`began working with Netscape as a consultant. Articles on XML began showing up in
`a variety of print magazines and online publications. Predictably, many media
`stories played up the Microsoft-versus-Netscape angle.
`
`Many ERB members tend to downplay the importance of the competition between
`Microsoft and Netscape, but they all agree it will have an impact.
`
`“Looking at this purely from the industry point of view,” Jon Bosak says, “the
`competition can only do us good by accelerating the acceptance of a truly open,
`human-readable data format.”
`
`
`
`
`
`
`
`“The participation of both Microsoft and Netscape has been very beneficial,”
`C.M. Sperberg-McQueen adds. “They bring a particular technical perspective to
`the discussions: the view of the world from a large programming shop with
`enormous numbers of current users is rather different from the view of the world
`from an academic institution or from a smaller commercial organization. In that
`sense, the Microsoft and Netscape viewpoints have been more similar than
`different, in my view.”
`
`Steve DeRose believes that competitive issues will not intrude on the creation
`of the XML specification.
`
`“The competition between Microsoft and Netscape would be almost a non-issue if
`not for a few over-excited articles,” he says. “All the representatives on the
`XML Working Group are deeply committed to doing the right thing, and to a
`consensus process. Neither Netscape nor Microsoft has tried to dominate the
`process or to foist any self-serving proposals on the [XML working] group. Also,
`I think both companies realize they have better places to compete than over
`syntax. Let them and everyone else compete on user interface quality,
`reliability, performance, and functionality, not on who can dream up new tag
`names or punctuation marks faster.”
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`Details, Details
`
`Although XML has met with an enthusiastic
`reception, the ERB members are well aware of
`the work that remains. First and foremost, they
`have to finish the specification.
`
`“It would be nice if we could finish XML 1.0 and get it snapshotted,” Tim Bray
`says. “We should get it blessed by W3C as a recommendation, and maybe even get
`it blessed by another standards organization as well, just so that we have a
`line in the sand and can say, ‘Okay, this phase is done.’ I think we need to do
`that simply because there are so many implementations happening so fast that
`just to be fair to the people who believed in what we’ve done we have to stop
`changing it. We have to stop and say, ‘Okay, here’s what it is. Maybe it’s not
`perfect yet, it could be improved still further, but here’s 1.0 and that’s what
`1.0 is.’ I think clearly by the end of the year we must have 1.0 finished,
`blessed, and canonized. There will still be lots of other things to work on. The
`1.0 version won’t have a solution to the style sheet problem, it won’t have a
`solution for lots of other things, but the base language has to be frozen.”
`
`Jon Bosak, for one, is hopeful that the big issues are behind them.
`
`“I may be whistling in the dark,” he says. “But aside from the political issues
`we’re going to have to deal with as a result of competition, I don’t think that
`XML really faces any major problems once we get the specification for 1.0
`finished. It’s been designed to be easy to implement, and outside of all the
`last-minute internationalization details, it hasn’t really changed much for a
`while. The basics have been in place since last November ’96, and most of the
`finer points were settled by April ’97.”
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`Fortunately, XML will be easier to develop
`than HTML, according to Tim Bray.
`
`“HTML is painfully difficult to evolve,” he says, “because it is a mixture of
`formatting semantics and hypertext semantics and GUI semantics with forms and so
`on. And trying to evolve all of those capabilities at once without breaking them
`is very difficult. Now XML, the basic language, has a syntax and there’s going
`to be a style sheet facility and there’s going to be various behavior
`facilities. That doesn’t mean that evolving any of this stuff is easy, it just
`means that you can partition the problems and solve them without having to solve
`them all at once, which is the problem that HTML faces. So a lot of the advanced
`capabilities that users of the Web are asking for, I think, are going to be
`easier to solve in an XML context.”
`
`Still there are details on top of details.
`
`“In addition to the greater complexity of XML itself,” Bosak says, “we’re
`dealing with all kinds of issues that were never confronted directly in HTML:
`how to handle whitespace, for example, or whether to make stuff like tag names
`case-sensitive or not, or whether the Japanese character for an ideographic
`space is really a space or not. Lots of nitty but mind-bogglingly complex
`problems that finally can’t be sidestepped any more. And there was a big policy
`question, which was what to do about error handling, but we’re past that now.”
`
`“The real action,” Bosak continues, “shifts now to the other two pieces of the
`puzzle, the linking piece and the style sheet piece. We call them XLL, for
`extensible linking language, and XSL, for extensible style sheet language. XML
`itself is just about syntax. With XLL and XSL we get into semantics, and that’s
`where the real competition is going to be: how you actually do stuff.”
`
`“The hardest thing, in general,” Steve DeRose says, “is to look far enough ahead
`to make sure that the language will scale up smoothly and accommodate later
`extensions without getting kludgy. The broadstroke picture is very clear, but if
`you don’t pin all the details down well enough, systems won’t interoperate and
`you lose a central benefit of standardization.”
`
`Yet still ahead, after the big technical problems are largely solved, there’s
`another challenge: inspiring people to exploit the new possibilities that come
`with XML.
`
`“Now that it is reasonable to expect next generation tools to have better
`control over encoding information,” Dave Hollander says, “we need to get ready
`to use these features. My next key initiative is how to get authors,
`collaborators, and consumers of information to make the best use of the new
`capabilities.”
`
`
`
`
`
`
`
`“Now, we have to encourage the market to create specific horizontal and vertical
`DTDs, to build common vocabularies,” Paoli says. “We need to let content
`providers generate useful XML data while we, the software and tool builders,
`build tools which access and use this data.”
`
`“It’s nice to see descriptive markup move into the mainstream and be adopted so
`quickly. I hope that it will let us really move data into forms that will
`outlast rev 5.3.9.1b of somebody’s word processor, and help make bit-rot a
`non-issue for the future of literature.”
`
`There’s plenty to do, to be sure. Yet, at this point it appears likely that the
`early work of the XML ERB has created enough momentum to carry the project to
`completion.
`
`“What’s important, from here on in, is to keep all these activities moving
`toward the goal we started with in July 1996,” Jon Bosak says. “It’s more like a
`snowball gathering speed down a slope now. It doesn’t need pushing, it just
`needs to be kept pointed in the right direction.”
`
`
`
`
`
`
`