IEEE COMPUTER SOCIETY
`
`IBM-1008
`Page 1 of 10
VOLUME 2, NUMBER 4 • JULY/AUGUST 1998
`
On the Wire
Reliable Multicast: When Many Must Absolutely Positively Receive It
Christopher Metz
Reliable IP Multicast, which permits reliable multipoint distribution of data over IP networks, is finding support in the vendor community.

INTERNET SEARCH TECHNOLOGIES

Guest Editor's Introduction
Searching the Internet
Robert E. Filman and Sangam Pant
Networks and devices are going to get faster and cheaper. What will continue to be AI-hard is the problem of making sense of the mass of data and misinformation that fills the Web.
`
Toward a Qualitative Search Engine
Yanhong Li
Traditional search engines do not consider document quality in ranking search results. The Hyperlink Vector Voting method adds a qualitative dimension to its rankings by factoring in the number and descriptions of hyperlinks to the document.

Case-Based Reasoning Support for Online Catalog Sales
Ivo Vollrath, Wolfgang Wilke, and Ralph Bergmann
Case-based reasoning uses similarity measures and domain-specific knowledge for information retrieval and problem solving. CBR techniques can be applied to e-commerce applications for intelligent sales support.

Web Metadata: A Matter of Semantics
Ora Lassila
The sheer volume of information can make searching the Web frustrating. The Resource Description Framework, with its focus on machine-understandable semantics, has the potential for saving time and yielding more accurate search results.

Context and Page Analysis for Improved Web Search
Steve Lawrence and C. Lee Giles
NECI Research Institute has developed a metasearch engine that improves the efficiency of Web searches by downloading and analyzing each document and then displaying results that show the query terms in context.
`
Virtual Database Technology: Transforming the Internet into a Database
Anand Rajaraman and Peter Norvig
VDB technology makes external data sources act as an extension of an enterprise's relational database system.

Using Relevance Feedback in Content-Based Image Metasearch
Ana B. Benitez, Mandis Beigi, and Shih-Fu Chang
MetaSeek is an image metasearch engine developed to explore the query of large, distributed, online visual information systems. The current implementation integrates user feedback into a performance-ranking mechanism.

Research Feature
Increasing Application Accessibility Through Java
Antonio Puliafito, Orazio Tomarchio, Lorenzo Vita, and Kishor S. Trivedi
Java can be used to create a network computing platform that lets users share applications not specifically devised for the Web. The authors used one such platform to port an existing tool and develop a new application.
`
`
`
`
WEB METADATA: A Matter of Semantics

ORA LASSILA
Nokia Research Center

The sheer volume of information can make searching the Web frustrating. The Resource Description Framework, with its focus on machine-understandable semantics, has the potential for saving time and yielding more accurate search results.
`
The surge in popularity of the World Wide Web, and in the quantity of information it contains, is staggering. Although the Web is built on relatively simple principles, its growth has not been without substantial growing pains. The Web was built for human consumption, and although everything on the Web is machine-readable, it is not machine-understandable. This makes it very hard to automate anything on the Web and, because of the sheer volume of information, impossible to manage manually.

Coping with the volume of information on the Web is a real problem, as witnessed by anyone who has used one of the popular search services. Since Web documents are not designed to be understood by machines, the only real form of searching available to us is full-text search. Entering keywords into a search engine and receiving thousands of hits is not necessarily useful: the documents we seek may or may not be among those thousands. Mere words used as search keywords are subject to cross-disciplinary semantic drift. Keywords thus perform poorly in situations where a search index covers multiple subject areas, as is the case with the Web.1

Wouldn't it be useful if other means of searching were available to us, in addition to full-text search (matching strings)? For example, we might know who wrote the document, when it was published, and what specifically it discusses (although any particular word describing that subject might not be contained in the document desired). Machines (in this case, search engines) cannot understand a natural-language document and thus cannot always extract specific information from the document, such as author, publication date, or topic.

In a recent white paper, Tim Berners-Lee, director of the World Wide Web Consortium, wrote: "Currently there is not only a large industry in applications to put information from legacy information systems onto the Web, there is also an industry in applications which surf the Web and, programmed with some idea of how the Web pages were automatically generated, retrieve the information and reconvert it into hard, well-defined machine-processable data."2
`
IEEE INTERNET COMPUTING • JULY/AUGUST 1998 • http://computer.org/internet/ • 1089-7801/98/$10.00 ©1998 IEEE
`
`
It's clear that stronger, more precise means of describing documents are needed. Information such as author, publication dates, and so forth is often called metadata. Metadata is commonly defined as data about data. For example, a library catalog is metadata because it describes publications, or library data. Similarly, a file system maintains access control information about files; this information can also be seen as metadata. To maintain a library catalog you may also need an application that treats the catalog itself as data. Hence, one application's metadata is another application's data.

This article discusses Web metadata, which we define as "machine-understandable descriptions of Web resources." Web metadata has a number of uses, such as cataloging, software agents, and describing intellectual property rights. (For a description of Web metadata in applications, see the sidebar "Applications of Web Metadata.")

APPLICATIONS OF WEB METADATA (sidebar)

Cataloging
Metadata can describe the contents of an individual Web resource, such as a page or an image, or the content of a collection (Web site, directory, and so forth). Metadata can also describe the relationships between members of a collection (for example, book, chapter, or table of contents). Descriptions of typically complex collections, especially those of Web sites, are sometimes referred to as site maps.

Software Agents and Resource Discovery
Search engines could take advantage of metadata, such as that used in cataloging, to perform more accurate searches. With the need for manual "weeding" of search results eliminated, we could better automate the search process. This also suggests that intelligent software agents could use metadata to exchange and share knowledge (agent to agent), to communicate (agent to service or agent to user), and to "understand" their environment (that is, to do resource discovery on their own).

Electronic Commerce
Metadata can encode information needed for electronic commerce. For example, with metadata we can locate a seller or buyer. We can find a product by searching the yellow pages, and we can agree on terms of sale (metadata can represent prices, terms of payment, and other contractual information).

Content Rating
The World Wide Web is a free medium, and balancing between free speech and protection of minors is difficult. Metadata can encode content rating labels that disclose the nature of a particular page's contents. This information, in turn, can be used in filtering content when "surfing" the Web. For example, parents can block their children's access to material deemed inappropriate.

Intellectual Property Rights
As with content rating, metadata could describe information about intellectual property rights of a document: the contractual terms related to the document's use and distribution.

Digital Signatures
Metadata can encode digital signatures, which, in turn, can help users decide which information and documents to trust.

Privacy
Metadata can describe users' preferences regarding privacy, that is, what information users are willing to disclose about themselves when visiting a Web site. Metadata can also describe a Web site's information-gathering policy regarding visiting users. This capability may allay users' suspicions about privacy on the Web and reduce the perceived need for anonymity.

RDF: AN INTRODUCTION
The World Wide Web Consortium (W3C) recently published the Resource Description Framework,2-5 a new standard for Web metadata. RDF, a foundation for processing metadata, provides interoperability between applications that exchange machine-understandable information on the Web. The design of RDF has been influenced by several sources, all of which have agreed on the basic principles of metadata representation and transport. Key influences have come, for example, from the Web development community itself, in the form of HTML metadata and the Platform for Internet Content Selection.6 Other influences are the library community, the structured document community (in the form of SGML and, more importantly, XML, the Extensible Markup Language), and also the knowledge representation community. Framework design contributions have also come from object-oriented programming and modeling languages, and databases.

The framework's purpose is describing Web resources to facilitate automated processing of Web
`
information. The resources RDF describes are generally anything that can be named with a Uniform Resource Identifier (URI), the class of Web identifier that includes the common URL. Designed as domain-neutral, RDF makes no assumptions about any particular application domain, nor does it define a priori the semantics of any domain. Despite this, the mechanism is suitable for describing information about any domain.

RDF is a data model of metadata instances. The RDF Model and Syntax Specification3 describes the model and one possible syntax for encoding and transporting RDF instances. To give RDF an object-oriented nature, the RDF Schema Specification5 defines an extensible type system using the basic RDF model as building blocks.

Model and Syntax
RDF data consists of nodes and attached attribute/value pairs. Nodes can be any Web resources (in fact, anything to which you can give a URI), including other metadata instances. Attributes are named properties of nodes, and their values are either atomic (text strings) or other nodes (Web resources or metadata instances). The essence of RDF is this model of nodes, attributes (or properties), and their values.

In addition to the node-centric view (an object-oriented view of the RDF model reminiscent of frame-based representation systems), the RDF model can be seen as directed, labeled graphs (DLGs). The nodes are the vertices of a graph, and the properties name the edges. Therefore, if X has a property Y with the value Z, we can think of X and Z linked by an edge labeled Y, pointing from X to Z.

To store instances of this model in files, or to communicate these instances from one agent to another, we need a graph serialization syntax. XML is the language the designers chose for use in the RDF specification.7 RDF and XML are complementary. RDF leverages XML; however, XML needs RDF for defining what instances of metadata mean, and for allowing agents to agree on a common meaning. XML is only one syntactic representation for the RDF model; other syntaxes are possible.

In the following example we will use terminology from Dublin Core, a metadata schema for building digital library catalogs. Figure 1 is an example of a simple RDF instance. This metadata fragment describes a Web resource with a given URL and states that "John Smith" is the creator (that is, author in Dublin Core library metadata terms) of this particular resource. In the model, the Web page is a node and it has one property, namely DC:Creator, whose value is the string "John Smith." RDF relies on the XML namespace mechanism8 to uniquely qualify element names, hence two XML processing instructions precede the example. The element name prefix "RDF:" is used by all RDF core names, and in this example an XML processing instruction associates the prefix "DC:" with a Dublin Core schema URI. RDF designers anticipate that RDF metadata will typically consist of instances and attributes from many different sources. The probability of name conflicts is high, but the namespace mechanism solves this problem.

<?xml:namespace ns="http://www.w3.org/TR/WD-rdf-syntax" prefix="RDF"?>
<?xml:namespace ns="http://purl.org/metadata/dublin_core" prefix="DC"?>

<RDF:RDF>
  <RDF:Description about="http://www.some.org/smith">
    <DC:Creator>John Smith</DC:Creator>
  </RDF:Description>
</RDF:RDF>

Figure 1. This RDF instance describes a Web resource with a given URL and states that "John Smith" is the creator of this particular resource. The Web page is a node with one property, DC:Creator, whose value is the string "John Smith."

Figure 2 is a graphical representation of the RDF instance shown in Figure 1.

[http://www.some.org/smith] --DC:Creator--> "John Smith"

Figure 2. A graph generated from the example in Figure 1.

The RDF designers also discussed alternate syntaxes based on S-expressions.6 S-expressions are an efficient, compact way of encoding structured data. The RDF instance in Figure 1 could have been expressed as follows:

(rdf:description about "http://www.some.org/smith"
    dc:creator "John Smith")
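
The node, property, and value model described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not part of the RDF specification: the Triple type and the edges_from and to_sexpr helpers are hypothetical names, and the single edge mirrors the instance in Figure 1.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One edge of the directed, labeled graph: node --prop--> value."""
    node: str   # X: the resource being described (a URI)
    prop: str   # Y: the property that names the edge
    value: str  # Z: an atomic string or another node's URI


# The instance from Figure 1 as a single edge.
graph = [Triple("http://www.some.org/smith", "DC:Creator", "John Smith")]


def edges_from(triples, node):
    """Return (property, value) pairs for every edge leaving `node`."""
    return [(t.prop, t.value) for t in triples if t.node == node]


def to_sexpr(triples, node):
    """Render one node's properties in the S-expression style shown above."""
    parts = " ".join(f'{p.lower()} "{v}"' for p, v in edges_from(triples, node))
    return f'(rdf:description about "{node}" {parts})'
```

Calling to_sexpr(graph, "http://www.some.org/smith") produces the same information as the S-expression above on a single line, which is one way to see that the graph, not the syntax, carries the content.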
`
The designers chose XML in the RDF specification on the basis of its perceived prevalence in Web software, rather than its technical merit.

In RDF, property values can be complex objects. In Figure 3, the "creator" property from Figure 1 now has a value with more structure. Here, the value of the DC:Creator property is an instance with two properties: Name and EMail. Using the RDF instance in Figure 3, we could produce the graph shown in Figure 4.
`
<?xml:namespace ns="http://www.w3.org/TR/WD-rdf-syntax" prefix="RDF"?>
<?xml:namespace ns="http://purl.org/metadata/dublin_core" prefix="DC"?>
<?xml:namespace ns="http://some.org/schemata/people" prefix="P"?>

<RDF:RDF>
  <RDF:Description about="http://www.some.org/smith">
    <DC:Creator>
      <RDF:Description>
        <P:Name>John Smith</P:Name>
        <P:EMail>mailto:smith@some.org</P:EMail>
      </RDF:Description>
    </DC:Creator>
  </RDF:Description>
</RDF:RDF>

Figure 3. An RDF instance where the value of the creator property from Figure 1 has more structure.
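
A structured property value such as the one in Figure 3 can still be reduced to flat node, property, value edges. The sketch below is a hedged illustration: the flatten function is hypothetical, and the "_:nodeN" labels for resources without a URI of their own are an assumption of this example rather than anything the specification prescribes.

```python
import itertools


def flatten(node, properties, triples=None, counter=None):
    """Turn a nested description into flat (node, property, value) triples.

    A dict value stands for a resource with no URI of its own; it receives
    a generated placeholder identifier (an assumption of this sketch).
    """
    triples = [] if triples is None else triples
    counter = itertools.count(1) if counter is None else counter
    for prop, value in properties.items():
        if isinstance(value, dict):
            anon = f"_:node{next(counter)}"  # hypothetical naming scheme
            triples.append((node, prop, anon))
            flatten(anon, value, triples, counter)
        else:
            triples.append((node, prop, value))
    return triples


# The description from Figure 3, written as nested dictionaries.
figure3 = {
    "DC:Creator": {
        "P:Name": "John Smith",
        "P:EMail": "mailto:smith@some.org",
    }
}
triples = flatten("http://www.some.org/smith", figure3)
```

The result holds three edges: one from the page to the unnamed creator node, and two from that node to its name and e-mail address.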
`
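[http://www.some.org/smith] --DC:Creator--> [unnamed node]
[unnamed node] --P:Name--> "John Smith"
[unnamed node] --P:EMail--> "mailto:smith@some.org"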
`
Figure 4. Graph generated from example in Figure 3.

Metadata on Metadata
As is often the case, metadata authors and processors need to make statements about other statements expressed in RDF (we refer to these as higher-order statements). This possibility requires careful consideration. For example, if we make the natural-language statement "The Web contains one billion documents," RDF would regard this as true. On the other hand, the statement "John estimates that the Web contains one billion documents" makes a statement about the relationship between John and his view of the Web, but it does not express any facts about the Web per se. Both kinds of statements are possible in RDF.

When we create a statement in RDF that consists of a node X, property Y, and value Z, we think of a triple [Y, X, Z] having been asserted (placed in RDF's internal database). Statements that exist in this database in the form of triples are considered true. This has nothing to do with epistemological, absolute truth; it merely says that the RDF system, when queried, will know that these statements have been asserted. To make statements about [Y, X, Z], we must build a model of this statement. In RDF we do this by asserting three new statements:

    [RDF:PropObj, P, X]
    [RDF:PropName, P, Y]
    [RDF:Value, P, Z]

This modeling process is often called reification. We now have a new node, P, representing the statement. We can make statements involving P, as in

    [Believes, mailto:smith@some.org, P]
    ("smith believes P")

or

    [CreatedOn, P, "1998-05-01"]
    (the assertion P was created on 1 May 1998)

Whether the original triple is still in the database determines whether the RDF system considers it true, but we could have asserted the above three reifying triples without ever asserting [Y, X, Z].

The RDF syntax has a shorthand for expressing statements about other statements. If we wanted to augment the first example by saying that "Jane Smith" is the author (= DC:Creator) of the statement about John's home page, it could be written as shown in Figure 5.

The ability to make statements about other statements is important. We originally included it in RDF to make it possible to digitally sign RDF statements. Because a typical use of RDF is to manipulate metadata from many different sources, however, it makes sense to have a mechanism for expressing beliefs and other modalities.
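
The reification step described in this section can be made concrete with a small sketch. It assumes a plain Python set standing in for "RDF's internal database" and keeps the article's [property, node, value] ordering; the reify and is_asserted functions are hypothetical helpers, not an actual RDF API.

```python
def reify(statement_id, prop, node, value):
    """Return the three triples that model the statement [prop, node, value].

    Triples follow the article's [property, node, value] ordering.
    """
    return [
        ("RDF:PropObj", statement_id, node),
        ("RDF:PropName", statement_id, prop),
        ("RDF:Value", statement_id, value),
    ]


database = set()  # stands in for "RDF's internal database"

# The statement being described; note that it is never asserted itself.
original = ("DC:Creator", "http://www.some.org/smith", "John Smith")

# Assert only the model of the statement, then a statement about it.
database.update(reify("P", *original))
database.add(("Believes", "mailto:smith@some.org", "P"))


def is_asserted(triple):
    """A statement counts as true for the system only if it was asserted."""
    return triple in database
```

Here is_asserted(original) is False even though the database contains a full description of the statement and a belief about it, which is the distinction the section draws between modeling a statement and asserting it.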
`
<?xml:namespace ns="http://www.w3.org/TR/WD-rdf-syntax" prefix="RDF"?>
<?xml:namespace ns="http://purl.org/metadata/dublin_core" prefix="DC"?>

<RDF:RDF>
  <RDF:Description about="http://www.some.org/smith" bagID="foo">
    <DC:Creator>John Smith</DC:Creator>
  </RDF:Description>
  <RDF:Description aboutEach="#foo">
    <DC:Creator>Jane Smith</DC:Creator>
  </RDF:Description>
</RDF:RDF>

Figure 5. An example of a higher-order statement.
`
Schemata
RDF does not contain any predefined vocabularies for authoring metadata. However, standard vocabularies, or schemata as they are called in RDF, will emerge. They will do so either by specialized communities cooperating in the design, or by natural selection. (Some schemata are selected simply because they are used more frequently than others in the same domain.) The existence of standard, or de facto standard, schemata is a core requirement for large-scale interoperability.

Anticipated schemata include a PICS-like content-rating architecture, a digital library vocabulary (currently the "Dublin Core"), and a schema for expressing digital signatures. Anyone can design a new schema; the only requirement is that a designating URI be included in the metadata instances. The use of URIs to name vocabularies is an important RDF design feature: some metadata standardization efforts have stumbled on the issue of establishing a central attribute registry. RDF permits, but does not require, a central registry.
W3C Metadata and RDF Information
http://www.w3.org/RDF/
http://www.w3.org/TR/NOTE-rdfarch
Introductory Articles
http://www.w3.org/TR/NOTE-rdf-simple-intro
http://www.dlib.org/dlib/may98/miller/05miller.html
Resources for Programmers
http://www.mozilla.org/rdf/doc/
http://www.alphaWorks.ibm.com/formula/rdfxml

The RDF schema mechanism defines the root of the RDF type hierarchy. It does so in the form of basic classes such as Resource, Class, and so on. The classes include the necessary meta-object types for defining new classes: for example, PropertyType. The RDF schema specification's class definition facilities let metadata authors place restrictions on property values and define classes in terms of existing classes (subclassing). Property value restrictions are in the form of cardinality and type constraints; that is, restrictions on the number of values a property can have, and the classes of objects that can be values of a particular property. The W3C's schema work is ongoing. For specific instances, see the sidebar "Sample Schema Projects."
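
To make the idea of cardinality and type constraints with subclassing more tangible, here is a minimal sketch. Every name in it (PropertyConstraint, SchemaClass, violations, and the Document and Report classes) is invented for illustration and does not come from the RDF Schema specification; Python types stand in for schema classes.

```python
from dataclasses import dataclass, field


@dataclass
class PropertyConstraint:
    value_class: type  # type constraint: class of allowed values
    max_values: int    # cardinality constraint: upper bound on value count


@dataclass
class SchemaClass:
    name: str
    parent: "SchemaClass | None" = None
    constraints: dict = field(default_factory=dict)

    def all_constraints(self):
        """Collect constraints, letting a subclass refine its parent's."""
        inherited = self.parent.all_constraints() if self.parent else {}
        return {**inherited, **self.constraints}


def violations(schema_class, instance):
    """List problems in `instance`, a dict mapping property -> list of values."""
    problems = []
    for prop, rule in schema_class.all_constraints().items():
        values = instance.get(prop, [])
        if len(values) > rule.max_values:
            problems.append(f"{prop}: too many values")
        if any(not isinstance(v, rule.value_class) for v in values):
            problems.append(f"{prop}: wrong value type")
    return problems


document = SchemaClass("Document", constraints={"Title": PropertyConstraint(str, 1)})
report = SchemaClass("Report", parent=document,
                     constraints={"Year": PropertyConstraint(int, 1)})
```

Because Report is defined in terms of Document, checking a Report instance applies the inherited Title constraint as well as the new Year constraint, so the subclass only has to state what it adds.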
`
WHY RDF AND NOT JUST XML?
What are the RDF framework's major benefits? After all, XML offers structured data that could be used to encode and transport attribute/value pairs. As I stated earlier, RDF and XML are complementary. RDF is a model of metadata, and it only superficially addresses many encoding issues that transportation and file storage require, such as internationalization and character sets. For these issues, RDF relies on XML. But RDF also has several advantages over XML.

One design goal for RDF was to enable metadata authors to specify semantics for data based on XML in a standardized, interoperable manner. RDF also offers features like collection containers and higher-order statements. RDF's main advantage, however, is that it requires metadata authors to designate at least one underlying schema, and that the schemata are sharable and extensible. RDF is based on an object-oriented mindset, and schemata correspond to classes in an object-oriented programming system. Organized in a hierarchy, schemata offer extensibility through subclass refinement. To create a schema slightly different from an existing one, therefore, requires only that you provide incremental modifications to the base schema. XML document type definitions (DTDs) do not offer this capability. Through schemata sharability, RDF supports the reusability of definitions resulting from the metadata work by individuals and specialized communities.
Due to RDF's incremental extensibility, agents processing metadata will be able to trace the origins of schemata with which they are unfamiliar to known schemata. They will be able to perform meaningful actions on metadata they weren't originally designed to process. For example, suppose you were to design an extension to the Dublin Core schema to leverage work done by the library community and also to allow organization-specific document metadata. To do so, you could simply use standard tools designed for plain Dublin Core. Because of the self-describing nature of RDF schemata, a well-designed tool would be able to do meaningful processing for the extended properties as well.
RDF's sharability and extensibility will also lead to a mix-and-match use of metadata and metadata schema descriptions. Metadata authors will be able to use multiple inheritance to provide multiple views to their data, leveraging work done by others. Moreover, it's possible to create RDF instance data based on multiple schemata from multiple sources, that is, interleaving different types of metadata; XML DTDs do not support this feature. This will lead to exciting possibilities when agents process metadata. For example, a processing agent may know how to process several types of RDF instances individually, but it will later also be able to reason about the combination. Essentially, the combination is more powerful than the sum of its parts.

From an implementation standpoint, RDF offers a clean, simple object model independent of the transport syntax of metadata. An API for processing RDF is likely to appear. It is also important to remember that although the RDF specification defines an encoding syntax for RDF based on XML, RDF itself is not dependent on XML: it could also use other syntaxes (for example, S-expressions). It is conceivable that various "translators" will emerge, allowing data in various formats (corresponding internally to the RDF data model) to be filtered and used by RDF processors.
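
The earlier claim that an agent can trace an unfamiliar schema back to a known one can be illustrated with a short sketch. It assumes that each extended property records which property it refines; the REFINES table, the "ACME:" property names, and the nearest_known function are all hypothetical and serve only to show the lookup.

```python
# Hypothetical refinement links: each extended property points at the
# property it specializes. Names under "ACME:" are invented for illustration.
REFINES = {
    "ACME:LeadAuthor": "ACME:ReportAuthor",
    "ACME:ReportAuthor": "DC:Creator",
}

# The properties this imaginary tool was originally built to process.
KNOWN = {"DC:Creator", "DC:Title"}


def nearest_known(prop, refines=REFINES, known=KNOWN):
    """Follow refinement links until a property the tool understands is found."""
    seen = set()
    while prop not in known:
        if prop in seen or prop not in refines:
            return None  # a cycle, or no path to a known schema
        seen.add(prop)
        prop = refines[prop]
    return prop
```

A tool written for plain Dublin Core could then treat a value of ACME:LeadAuthor as a DC:Creator value, which is the kind of meaningful processing of extended properties the text describes.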
`
ORIGINS OF WEB METADATA AND RDF
From the standpoint of the Web, the history of standardized metadata mechanisms begins with the HTML <META> and <LINK> tags. These let a Web page author record metainformation about a page and also indicate that page's relationship to other relevant pages, such as a table of contents.

The <META> tag can specify the author of a Web page as follows:

<META name="Author" content="John Smith">
`
SAMPLE SCHEMA PROJECTS (sidebar)

Dublin Core. The library community is designing a schema for building digital library catalogs (http://purl.org/metadata/dublin_core/).

P3P. This is the W3C's project to allow privacy preferences and policies to be expressed (http://www.w3.org/P3P/).

IMS. This Instructional Management Systems project is building a metadata schema for managing online learning resources (http://www.imsproject.org/metadata).

Although both <META> and <LINK> tags are useful, they have certain shortcomings. What does the name "Author" really mean? It could be the name of the person who created the page, or the person who wrote the page contents, or even the Webmaster who maintains the page. In other words, the meaning of "Author" on one Web page might be different from its meaning on another page. In short, the namespace of attribute names is uncontrolled (at least prior to the new HTML 4.0 specification). The structure of attribute values is also not specified (for example, is it "John Smith" or "Smith, John"?). Furthermore, it is very difficult to use <META> for higher-order statements such as those described in the section "Metadata on Metadata."
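
The shortcomings of <META> become visible as soon as a program tries to use it. The following sketch relies on Python's standard html.parser module; the MetaCollector class, the meta_pairs function, and the two sample pages are assumptions made for this illustration.

```python
from html.parser import HTMLParser


class MetaCollector(HTMLParser):
    """Collect (name, content) pairs from <META> tags."""

    def __init__(self):
        super().__init__()
        self.pairs = []

    def handle_starttag(self, tag, attrs):
        if tag == "meta":             # the parser lowercases tag names
            attributes = dict(attrs)  # attribute names are lowercased too
            if "name" in attributes:
                self.pairs.append((attributes["name"], attributes.get("content")))


def meta_pairs(html):
    """Return every name/content pair found in the given HTML text."""
    collector = MetaCollector()
    collector.feed(html)
    return collector.pairs


# Two hypothetical pages that describe the same person differently.
page_a = '<META name="Author" content="John Smith">'
page_b = '<META name="Author" content="Smith, John">'
```

All the program recovers is a pair of unqualified strings per tag. Nothing in the result says which vocabulary "Author" belongs to, or that "John Smith" and "Smith, John" denote the same person.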
`
Content Rating
Content rating is a hot topic in the standardization community. Attempts to balance free speech and protection of minors resulted in PICS, the W3C's content-rating architecture.9 PICS is a simple metadata mechanism well suited to content rating; however, because attribute values can be chosen only from controlled vocabularies (actually, they are all numeric), it has limited use as a general metadata architecture. On the positive side, PICS introduced the notion of machine-interpretable schemata for metadata. It also defined various ways in which metadata can be associated with Web resources. Metadata can be

• embedded in an HTML <META> tag in the document head;
• transported in HTTP headers (this is also possible with the <META> tag by using the attribute "http-equiv" instead of "name"); and
• stored in and retrieved from a third-party metadata bureau.

Figure 6 shows an example of a PICS label.
`
(PICS-1.1 "http://www.gcf.org/v2.5"
  by "John Doe"
  labels on "1994.11.05T08:15-0500"
    until "1995.12.31T23:59-0000"
    for "http://w3.org/PICS/Overview.html"
    ratings (suds 0.5 density 0 color/hue 1)
    for "http://w3.org/PICS/Underview.html"
    by "Jane Doe"
    ratings (subject 2 density 1 color/hue 1))

Figure 6. A PICS label (that is, an instance of PICS metadata) provides information about the content of a Web page.
`
The desire to develop PICS into a general metadata mechanism led the W3C to work on "PICS-NG," RDF's predecessor.6 With the advent of RDF, W3C plans to transition PICS to use RDF. RDF will also allow content-rating information to be mixed with privacy information. The W3C's project on privacy technologies, P3P, builds directly on top of RDF.
`
Support for the RDF Standard
Numerous recent projects build metadata mechanisms and standards for narrow domains. Examples include the Internet Mail Consortium's vCard, an electronic business card formalism,10 and Microsoft's Channel Definition Format for describing pushed Web content. When RDF is widely deployed, many of the special metadata standards can be cast as RDF applications.

The library community has invested considerable effort in the development of electronic cataloging standards (for example, MARC, which stands for "Machine-Readable Cataloging"11). Unfortunately, some of these standards are not useful in the Web's context. These efforts are important, however, because they led to the Dublin Core metadata element set. W3C considers the support of the Dublin Core paramount in the current metadata standardization efforts. The digital library community has very strongly advocated RDF throughout the development of this standard.
Metadata is a form of structured data transmitted on the Web. The structured document, or SGML, community has influenced the metadata standardization through the introduction of XML.7 XML is often billed as a type of universal syntax to solve the lack of interoperability between various Web-based software systems. This language is only a way of "serializing a tree" or, more generally, a way of encoding structured data for transport on the Internet. It has no inherent semantics, nor does it offer a way for agents to exchange descriptions of semantics. Provided that mechanisms to define semantics are built atop XML, it is a natural choice for metadata syntax. This is because the ability to parse XML syntax is (or will be) prevalent in numerous Web-related software products. Providing the semantic machinery is exactly what the W3C's RDF project has done. Without RDF, everybody would have to reinvent a mechanism for communicating semantics between interoperating software systems.

Microsoft's XML-Data12 is another framework that simplifies the definition of data written in XML. Early versions of XML-Data were studied by the RDF design team. Current focus in XML-Data seems to be how to map legacy data into XML.
RDF has also been influenced by knowledge representation research. The KR community has spent a great deal of effort on the crucial problem of how to represent knowledge in a way machines can understand. RDF design was influenced, for one, by associational representations such as semantic networks developed by Ross Quillian13 and William Woods.14 Equal influence came from frame-based representation systems: for example, those by Marvin Minsky,15 and by Richard Fikes and Tom Kehler.16 Another direct influence came from Meta Content Framework (MCF), a metadata framework reminiscent of semantic networks.17

RDF should not be confused with more powerful knowledge representation formalisms. In knowledge interchange, KIF18 is a de facto standard in the research community. Description logics, such as CLASSIC, are another area attracting much attention recently. RDF lacks certain mechanisms, such as negation and quantification. Designers deliberately excluded these features of first-order predicate logic for fear that such complex features would discourage RDF's acceptance and deployment within the Web community.
`
`FUTURE OF WEB METADATA
`Standardized metadata is a solution to the lack of
machine-understandable semantics, one of the
`Wo