IEEE COMPUTER SOCIETY
`IBM-1008
`Page 1 of 10
VOLUME 2 • NUMBER 4 • JULY/AUGUST 1998
`
On the Wire
9 Reliable Multicast: When Many Must Absolutely Positively Receive It
Christopher Metz
Reliable IP Multicast, which permits reliable multipoint distribution of data over IP networks, is finding support in the vendor community.
`
INTERNET SEARCH TECHNOLOGIES

Guest Editor's Introduction
21 Searching the Internet
Robert E. Filman and Sangam Pant
Networks and devices are going to get faster and cheaper. What will continue to be AI-hard is the problem of making sense of the mass of data and misinformation that fills the Web.
`
24 Toward a Qualitative Search Engine
Yanhong Li
Traditional search engines do not consider document quality in ranking search results. The Hyperlink Vector Voting method adds a qualitative dimension to its rankings by factoring in the number and descriptions of hyperlinks to the document.

30 Web Metadata: A Matter of Semantics
Ora Lassila
The sheer volume of information can make searching the Web frustrating. The Resource Description Framework, with its focus on machine-understandable semantics, has the potential for saving time and yielding more accurate search results.

38 Context and Page Analysis for Improved Web Search
Steve Lawrence and C. Lee Giles
NECI Research Institute has developed a metasearch engine that improves the efficiency of Web searches by downloading and analyzing each document and then displaying results that show the query terms in context.

47 Case-Based Reasoning Support for Online Catalog Sales
Ivo Vollrath, Wolfgang Wilke, and Ralph Bergmann
Case-based reasoning uses similarity measures and domain-specific knowledge for information retrieval and problem solving. CBR techniques can be applied to e-commerce applications for intelligent sales support.

55 Virtual Database Technology: Transforming the Internet into a Database
Anand Rajaraman and Peter Norvig
VDB technology makes external data sources act as an extension of an enterprise's relational database system.

59 Using Relevance Feedback in Content-Based Image Metasearch
Ana B. Benitez, Mandis Beigi, and Shih-Fu Chang
MetaSeek is an image metasearch engine developed to explore the query of large, distributed, online visual information systems. The current implementation integrates user feedback into a performance-ranking mechanism.
`
Research Feature
70 Increasing Application Accessibility Through Java
Antonio Puliafito, Orazio Tomarchio, Lorenzo Vita, and Kishor S. Trivedi
Java can be used to create a network computing platform that lets users share applications not specifically devised for the Web. The authors used one such platform to port an existing tool and develop a new application.
`
WEB METADATA: A Matter of Semantics

ORA LASSILA
Nokia Research Center

The sheer volume of information can make searching the Web frustrating. The Resource Description Framework, with its focus on machine-understandable semantics, has the potential for saving time and yielding more accurate search results.
`
The surge in popularity of the World Wide Web, and in the quantity of information it contains, is staggering. Although the Web is built on relatively simple principles, its growth has not been without substantial growing pains. The Web was built for human consumption, and although everything on the Web is machine-readable, it is not machine-understandable. This makes it very hard to automate anything on the Web and, because of the sheer volume of information, impossible to manage manually.

Coping with the volume of information on the Web is a real problem, as witnessed by anyone who has used one of the popular search services. Since Web documents are not designed to be understood by machines, the only real form of searching available to us is full-text search. Entering keywords into a search engine and receiving thousands of hits is not necessarily useful: the documents we seek may or may not be among those thousands. Mere words used as search keywords are subject to cross-disciplinary semantic drift. Keywords thus perform poorly in situations where a search index covers multiple subject areas, as is the case with the Web.¹

Wouldn't it be useful if other means of searching were available to us, in addition to full-text search (matching strings)? For example, we might know who wrote the document, when it was published, and what specifically it discusses (although any particular word describing that subject might not be contained in the document desired). Machines (in this case, search engines) cannot understand a natural-language document and thus cannot always extract specific information from the document, such as author, publication date, or topic.

In a recent white paper, Tim Berners-Lee, director of the World Wide Web Consortium, wrote: "Currently there is not only a large industry in applications to put information from legacy information systems onto the Web, there is also an industry in applications which surf the Web and, programmed with some idea of how the Web pages were automatically generated, retrieve the information and reconvert it into hard, well-defined machine-processable data."²
`
30
JULY • AUGUST 1998
http://computer.org/internet/
1089-7801/98/$10.00 © 1998 IEEE
IEEE INTERNET COMPUTING
`
It's clear that stronger, more precise means of describing documents are needed. Information such as author, publication dates, and so forth is often called metadata. Metadata is commonly defined as data about data. For example, a library catalog is metadata because it describes publications, or library data. Similarly, a file system maintains access control information about files; this information can also be seen as metadata. To maintain a library catalog you may also need an application that treats the catalog itself as data. Hence, one application's metadata is another application's data.

This article discusses Web metadata, which we define as "machine-understandable descriptions of Web resources." Web metadata has a number of uses, such as cataloging, software agents, and describing intellectual property rights. (For a description of Web metadata in applications, see the sidebar "Applications of Web Metadata.")

APPLICATIONS OF WEB METADATA

Cataloging
Metadata can describe the contents of an individual Web resource, such as a page, an image, or the content of a collection (Web site, directory, and so forth). Metadata can also describe the relationships between members of a collection (for example, book, chapter, or table of contents). Descriptions of typically complex collections, especially those of Web sites, are sometimes referred to as site maps.

Software Agents and Resource Discovery
Search engines could take advantage of metadata, such as that used in cataloging, to perform more accurate searches. With the need for manual "weeding" of search results eliminated, we could better automate the search process. This also suggests that intelligent software agents could use metadata to exchange and share knowledge (agent to agent), to communicate (agent to service or agent to user), and to "understand" their environment (that is, to do resource discovery on their own).

Electronic Commerce
Metadata can encode information needed for electronic commerce. For example, with metadata we can locate a seller or buyer. We can find a product by searching the yellow pages, and we can agree on terms of sale (metadata can represent prices, terms of payment, and other contractual information).

Content Rating
The World Wide Web is a free medium, and balancing between free speech and protection of minors is difficult. Metadata can encode content rating labels that disclose the nature of a particular page's contents. This information, in turn, can be used in filtering content when "surfing" the Web. For example, parents can block their children's access to material deemed inappropriate.

Intellectual Property Rights
As with content rating, metadata could describe information about intellectual property rights of a document: the contractual terms related to the document's use and distribution.

Digital Signatures
Metadata can encode digital signatures, which, in turn, can help users decide which information and documents to trust.

Privacy
Metadata can describe users' preferences regarding privacy, that is, what information users are willing to disclose about themselves when visiting a Web site. Metadata can also describe a Web site's information-gathering policy regarding visiting users. This capability may allay users' suspicions about privacy on the Web and the perceived need for anonymity.

RDF: AN INTRODUCTION
The World Wide Web Consortium (W3C) recently published the Resource Description Framework,²⁻⁵ a new standard for Web metadata. RDF, a foundation for processing metadata, provides interoperability between applications that exchange machine-understandable information on the Web. The design of RDF has been influenced by several sources, all of which have agreed on the basic principles of metadata representation and transport. Key influences have come, for example, from the Web development community itself, in the form of HTML metadata and the Platform for Internet Content Selection.⁶ Other influences are the library community, the structured document community (in the form of SGML and, more importantly, XML, the Extensible Markup Language), and also the knowledge representation community. Framework design contributions have also come from object-oriented programming and modeling languages, and databases.

The framework's purpose is describing Web resources to facilitate automated processing of Web
`
information. The resources RDF describes are generally anything that can be named with a Uniform Resource Identifier (URI), the class of Web identifier that includes the common URL. Designed as domain-neutral, RDF makes no assumptions about any particular application domain, nor defines a priori the semantics of any domain. Despite this, the mechanism is suitable for describing information about any domain.

RDF is a data model of metadata instances. The RDF Model and Syntax Specification³ describes the model and one possible syntax for encoding and transporting RDF instances. To give RDF an object-oriented nature, the RDF Schema Specification⁵ defines an extensible type system using the basic RDF model as building blocks.

Model and Syntax
RDF data consists of nodes and attached attribute/value pairs. Nodes can be any Web resources (in fact, anything to which you can give a URI), including other metadata instances. Attributes are named properties of nodes, and their values are either atomic (text strings) or other nodes (Web resources or metadata instances). The essence of RDF is this model of nodes, attributes (or properties), and their values.

In addition to the node-centric view (an object-oriented view of the RDF model reminiscent of frame-based representation systems), the RDF model can be seen as directed, labeled graphs (DLGs). The nodes are the vertices of a graph, and the properties name the edges. Therefore, if X has a property Y with the value Z, we can think of X and Z linked by an edge labeled Y, pointing from X to Z.

To store instances of this model in files, or to communicate these instances from one agent to another, we need a graph serialization syntax. XML is the language the designers chose for use in the RDF specification.⁷ RDF and XML are complementary. RDF leverages XML; however, XML needs RDF for defining what instances of metadata mean, and for allowing agents to agree on a common meaning. XML is only one syntactic representation for the RDF model; other syntaxes are possible.

In the following example we will use terminology from Dublin Core, a metadata schema for building digital library catalogs. Figure 1 is an example of a simple RDF instance. This metadata fragment describes a Web resource with a given URL and states that "John Smith" is the creator (that is, author in Dublin Core library metadata terms) of this particular resource. In the model, the Web page is a node and it has one property, namely DC:Creator, whose value is the string "John Smith." RDF relies on the XML namespace mechanism⁸ to uniquely qualify element names, hence two XML processing instructions precede the example. The element name prefix "RDF:" is used by all RDF core names, and in this example an XML processing instruction associates the prefix "DC:" with a Dublin Core schema URI. RDF designers anticipate that RDF metadata will typically consist of instances and attributes from many different sources. The probability of name conflicts is high, but the namespace mechanism solves this problem.

    <?xml:namespace ns="http://www.w3.org/TR/WD-rdf-syntax" prefix="RDF"?>
    <?xml:namespace ns="http://purl.org/metadata/dublin_core" prefix="DC"?>

    <RDF:RDF>
      <RDF:Description about="http://www.some.org/smith">
        <DC:Creator>John Smith</DC:Creator>
      </RDF:Description>
    </RDF:RDF>

Figure 1. This RDF instance describes a Web resource with a given URL and states that "John Smith" is the creator of this particular resource. The Web page is a node with one property, DC:Creator, whose value is the string "John Smith."

Figure 2 is a graphical representation of the RDF instance shown in Figure 1.

    [http://www.some.org/smith] --DC:Creator--> "John Smith"

Figure 2. A graph generated from the example in Figure 1.

The RDF designers also discussed alternate syntaxes based on S-expressions.⁶ S-expressions are an efficient, compact way of encoding structured data. The RDF instance in Figure 1 could have been expressed as follows:

    (rdf:description about "http://www.some.org/smith"
      dc:creator "John Smith")
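
The node, property, and value model behind both serializations can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not an RDF implementation: the `Graph` class, the `expand` helper, and the choice of joining a namespace URI and a local name with "#" are all hypothetical; only the two namespace URIs and the Figure 1 statement come from the text.

```python
from collections import defaultdict

# Prefix table. The two URIs are the ones used in Figure 1; the table
# itself is a hypothetical stand-in for the XML namespace mechanism.
NAMESPACES = {
    "RDF": "http://www.w3.org/TR/WD-rdf-syntax",
    "DC": "http://purl.org/metadata/dublin_core",
}


def expand(qname):
    """Expand a prefixed name such as 'DC:Creator' into a globally unique name.

    Joining with '#' is an assumption made for this sketch.
    """
    prefix, local = qname.split(":", 1)
    return NAMESPACES[prefix] + "#" + local


class Graph:
    """A directed labeled graph stored as (node, property, value) triples."""

    def __init__(self):
        self.triples = set()

    def add(self, node, prop, value):
        # One triple is one labeled edge pointing from node to value.
        self.triples.add((node, prop, value))

    def edges_from(self, node):
        """Return {property: [values]} for every edge leaving `node`."""
        out = defaultdict(list)
        for n, p, v in self.triples:
            if n == node:
                out[p].append(v)
        return dict(out)


# The statement from Figure 1: X = the Web page, Y = DC:Creator, Z = "John Smith".
g = Graph()
g.add("http://www.some.org/smith", expand("DC:Creator"), "John Smith")
print(g.edges_from("http://www.some.org/smith"))
```

Because property names are expanded to full URIs before they are stored, two schemata that both define a property called Creator would not collide in this store, which is the role the text assigns to the namespace mechanism.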
`
The designers chose XML in the RDF specification on the basis of its perceived prevalence in
`
Web software, rather than its technical merit.

In RDF, property values can be complex objects. In Figure 3, the "creator" property from Figure 1 now has a value with more structure. Here, the value of the DC:Creator property is an instance with two properties: Name and EMail. Using the RDF instance in Figure 3, we could produce the graph shown in Figure 4.

    <?xml:namespace ns="http://www.w3.org/TR/WD-rdf-syntax" prefix="RDF"?>
    <?xml:namespace ns="http://purl.org/metadata/dublin_core" prefix="DC"?>
    <?xml:namespace ns="http://some.org/schemata/people" prefix="P"?>

    <RDF:RDF>
      <RDF:Description about="http://www.some.org/smith">
        <DC:Creator>
          <RDF:Description>
            <P:Name>John Smith</P:Name>
            <P:EMail>mailto:smith@some.org</P:EMail>
          </RDF:Description>
        </DC:Creator>
      </RDF:Description>
    </RDF:RDF>

Figure 3. An RDF instance where the value of the creator property from Figure 1 has more structure.

    [http://www.some.org/smith] --DC:Creator--> [ ] --P:Name--> "John Smith"
                                                [ ] --P:EMail--> "mailto:smith@some.org"

Figure 4. Graph generated from example in Figure 3.

Metadata on Metadata
As is often the case, metadata authors and processors need to make statements about other statements expressed in RDF (we refer to these as higher-order statements). This possibility requires careful consideration. For example, if we make the natural-language statement "The Web contains one billion documents," RDF would regard this as true. On the other hand, the statement "John estimates that the Web contains one billion documents" makes a statement about the relationship between John and his view of the Web, but it does not express any facts about the Web per se. Both kinds of statements are possible in RDF.

When we create a statement in RDF that consists of a node X, property Y, and value Z, we think of a triple [Y, X, Z] having been asserted (placed in RDF's internal database). Statements that exist in this database in the form of triples are considered true. This has nothing to do with epistemological, absolute truth; it merely says that the RDF system, when queried, will know that these statements have been asserted. To make statements about [Y, X, Z], we must build a model of this statement. In RDF we do this by asserting three new statements:

    [RDF:PropObj, P, X]
    [RDF:PropName, P, Y]
    [RDF:Value, P, Z]

This modeling process is often called reification. We now have a new node, P, representing the statement. We can make statements involving P, as in

    [Believes, mailto:smith@some.org, P]
    ("smith believes P")

or

    [CreatedOn, P, "1998-05-01"]
    (the assertion P was created on 1 May 1998)

Whether the original triple is still in the database determines whether the RDF system considers it true, but we could have asserted the above three reifying triples without ever asserting [Y, X, Z].

The RDF syntax has a shorthand for expressing statements about other statements. If we wanted to augment the first example by saying that "Jane Smith" is the author (= DC:Creator) of the statement about John's home page, it could be written as shown in Figure 5.

The ability to make statements about other statements is important. We originally included it in RDF to make it possible to digitally sign RDF statements. Because a typical use of RDF is to manipulate metadata from many different sources, however, it makes sense to have a mechanism for expressing beliefs and other modalities.
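
A minimal Python sketch may help make reification concrete. It assumes the "internal database" is simply a set of [property, node, value] triples, in the order used in the text; the function names, the property `ex:DocumentCount`, and the URI `http://example.org/the-web` are hypothetical and chosen only for illustration.

```python
def assert_triple(store, prop, node, value):
    """Place the triple [prop, node, value] in the store (the 'database')."""
    store.add((prop, node, value))


def reify(store, statement_id, prop, node, value):
    """Model the statement [prop, node, value] as a new node `statement_id`.

    Only the three describing triples are asserted; the statement itself is not.
    """
    assert_triple(store, "RDF:PropObj", statement_id, node)
    assert_triple(store, "RDF:PropName", statement_id, prop)
    assert_triple(store, "RDF:Value", statement_id, value)
    return statement_id


def is_asserted(store, prop, node, value):
    """A statement counts as 'true' for the system only if it is in the store."""
    return (prop, node, value) in store


store = set()

# Hypothetical encoding of "The Web contains one billion documents".
claim = ("ex:DocumentCount", "http://example.org/the-web", "one billion")

# Reify the claim as node P, then state that smith believes P.
p = reify(store, "P", *claim)
assert_triple(store, "Believes", "mailto:smith@some.org", p)

print(is_asserted(store, *claim))
print(is_asserted(store, "Believes", "mailto:smith@some.org", "P"))
```

In this sketch the store ends up holding four triples, none of which is the claim about the Web itself, so a query for that claim fails while the query about the belief succeeds. That mirrors the distinction drawn above between stating a fact and stating that someone holds a view.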
`
    <?xml:namespace ns="http://www.w3.org/TR/WD-rdf-syntax" prefix="RDF"?>
    <?xml:namespace ns="http://purl.org/metadata/dublin_core" prefix="DC"?>

    <RDF:RDF>
      <RDF:Description about="http://www.some.org/smith" bagID="foo">
        <DC:Creator>John Smith</DC:Creator>
      </RDF:Description>
      <RDF:Description aboutEach="#foo">
        <DC:Creator>Jane Smith</DC:Creator>
      </RDF:Description>
    </RDF:RDF>

Figure 5. An example of a higher-order statement.

W3C Metadata and RDF Information
http://www.w3.org/RDF/
http://www.w3.org/TR/NOTE-rdfarch
Introductory Articles
http://www.w3.org/TR/NOTE-rdf-simple-intro
http://www.dlib.org/dlib/may98/miller/05miller.html
Resources for Programmers
http://www.mozilla.org/rdf/doc/
http://www.alphaWorks.ibm.com/formula/rdfxml

Schemata
RDF does not contain any predefined vocabularies for authoring metadata. However, standard vocabularies, or schemata as they are called in RDF, will emerge. They will do so either by specialized communities cooperating in the design, or by natural selection. (Some schemata are selected simply because they are used more frequently than others in the same domain.) The existence of standard, or de facto standard, schemata is a core requirement for large-scale interoperability.

Anticipated schemata include a PICS-like content-rating architecture, a digital library vocabulary (currently the "Dublin Core"), and a schema for expressing digital signatures. Anyone can design a new schema; the only requirement is that a designating URI be included in the metadata instances. The use of URIs to name vocabularies is an important RDF design feature: some metadata standardization efforts have stumbled on the issue of establishing a central attribute registry. RDF permits, but does not require, a central registry.

The RDF schema mechanism defines the root of the RDF type hierarchy. It does so in the form of basic classes such as Resource, Class, and so on. The classes include the necessary meta-object types for defining new classes: for example, PropertyType. The RDF schema specification's class definition facilities let metadata authors place restrictions on property values and define classes in terms of existing classes (subclassing). Property value restrictions are in the form of cardinality and type constraints; that is, restrictions on the number of values a property can have, and the classes of objects that can be values of a particular property. The W3C's schema work is ongoing. For specific instances, see the sidebar "Sample Schema Projects."
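
The two kinds of property value restrictions described in the Schemata section, cardinality and type constraints, can be illustrated with a toy Python model. It is a sketch under assumptions, not the RDF Schema Specification: the `Schema` class and its method names are hypothetical, as are the classes Person, Employee, and Document; only the root class Resource and the idea of subclassing come from the text.

```python
class Schema:
    """Toy schema: a class hierarchy plus per-property value restrictions."""

    def __init__(self):
        self.superclass = {"Resource": None}  # class -> its superclass
        self.restrictions = {}                # property -> (value class, max count)

    def define_class(self, name, superclass="Resource"):
        """Define a class in terms of an existing class (subclassing)."""
        self.superclass[name] = superclass

    def is_subclass(self, cls, ancestor):
        """True if `cls` equals `ancestor` or is derived from it."""
        while cls is not None:
            if cls == ancestor:
                return True
            cls = self.superclass.get(cls)
        return False

    def restrict(self, prop, value_class, max_count=None):
        """Record a type constraint and an optional cardinality constraint."""
        self.restrictions[prop] = (value_class, max_count)

    def violations(self, prop, typed_values):
        """Check a list of (value, class) pairs against the restriction on `prop`."""
        value_class, max_count = self.restrictions[prop]
        problems = []
        if max_count is not None and len(typed_values) > max_count:
            problems.append(f"{prop}: at most {max_count} value(s) allowed")
        for value, cls in typed_values:
            if not self.is_subclass(cls, value_class):
                problems.append(f"{prop}: {value!r} is a {cls}, not a {value_class}")
        return problems


schema = Schema()
schema.define_class("Person")
schema.define_class("Employee", superclass="Person")
schema.define_class("Document")
schema.restrict("Creator", value_class="Person", max_count=1)

# An Employee is acceptable wherever a Person is required.
print(schema.violations("Creator", [("smith", "Employee")]))
# Two values break the cardinality limit, and a Document breaks the type constraint.
print(schema.violations("Creator", [("smith", "Employee"), ("report", "Document")]))
```

The first check reports no problems because Employee is derived from Person; the second reports one cardinality problem and one type problem.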
`
WHY RDF AND NOT JUST XML?
What are the RDF framework's major benefits? After all, XML offers structured data that could be used to encode and transport attribute/value pairs. As I stated earlier, RDF and XML are complementary. RDF is a model of metadata, and it only superficially addresses many encoding issues that transportation and file storage require, such as internationalization and character sets. For these issues, RDF relies on XML. But RDF also has several advantages over XML.

One design goal for RDF was to enable metadata authors to specify semantics for data based on XML in a standardized, interoperable manner. RDF also offers features like collection containers and higher-order statements. RDF's main advantage, however, is that it requires metadata authors to designate at least one underlying schema, and that the schemata are sharable and extensible. RDF is based on an object-oriented mindset, and schemata correspond to classes in an object-oriented programming system. Organized in a hierarchy, schemata offer extensibility through subclass refinement. To create a schema slightly different from an existing one, therefore, requires only that you provide incremental modifications to the base schema. XML document type definitions (DTDs) do not offer this capability. Through schemata sharability, RDF supports the reusability of definitions resulting from the metadata work by individuals and specialized communities.

Due to RDF's incremental extensibility, agents processing metadata will be able to trace the origins of schemata with which they are unfamiliar to known schemata. They will be able to perform meaningful actions on metadata they weren't originally designed
`
to process. For example, suppose you were to design an extension to the Dublin Core schema to leverage work done by the library community and also to allow organization-specific document metadata. To do so, you could simply use standard tools designed for plain Dublin Core. Because of the self-describing nature of RDF schemata, a well-designed tool would be able to do meaningful processing for the extended properties as well.
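
One way to picture how a tool built for plain Dublin Core might still handle an extension is to follow each unfamiliar property back to the nearest property the tool already knows. The Python sketch below assumes that a schema extension records, for each new property, which existing property it refines; that `REFINES` table, the `ACME:` properties, and the `nearest_known` function are hypothetical and stand in for whatever the schema specification actually provides.

```python
# Hypothetical extension of Dublin Core: each new property names the
# property it refines. An organization might publish this with its schema.
REFINES = {
    "ACME:ReportAuthor": "DC:Creator",
    "ACME:LeadAuthor": "ACME:ReportAuthor",
}

# Properties a tool written for plain Dublin Core already understands.
KNOWN = {"DC:Creator", "DC:Title"}


def nearest_known(prop, refines=REFINES, known=KNOWN):
    """Follow refinement links from `prop` until a known property is found.

    Returns None if the chain ends (or loops) before reaching a known property.
    """
    seen = set()
    while prop is not None and prop not in seen:
        if prop in known:
            return prop
        seen.add(prop)
        prop = refines.get(prop)
    return None


# The tool has never seen ACME:LeadAuthor, but can treat it as a DC:Creator.
print(nearest_known("ACME:LeadAuthor"))
# A property with no link to anything known cannot be interpreted this way.
print(nearest_known("OTHER:Mystery"))
```

The `seen` set guards against a badly formed schema whose refinement links form a cycle, in which case the function gives up rather than looping forever.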
RDF's sharability and extensibility will also lead to a mix-and-match use of metadata and metadata schema descriptions. Metadata authors will be able to use multiple inheritance to provide multiple views to their data, leveraging work done by others. Moreover, it's possible to create RDF instance data based on multiple schemata from multiple sources (that is, interleaving different types of metadata); XML DTDs do not support this feature. This will lead to exciting possibilities when agents process metadata. For example, a processing agent may know how to process several types of RDF instances individually, but it will later also be able to reason about the combination. Essentially, the combination is more powerful than the sum of its parts.

From an implementation standpoint, RDF offers a clean, simple object model independent of the transport syntax of metadata. An API for processing RDF is likely to appear. It is also important to remember that although the RDF specification defines an encoding syntax for RDF based on XML, RDF itself is not dependent on XML: it could also use other syntaxes (for example, S-expressions). It is conceivable that various "translators" will emerge, allowing data in various formats (corresponding internally to the RDF data model) to be filtered and used by RDF processors.
`
SAMPLE SCHEMA PROJECTS

Dublin Core. The library community is designing a schema for building digital library catalogs (http://purl.org/metadata/dublin_core/).

P3P. This is the W3C's project to allow privacy preferences and policies to be expressed (http://www.w3.org/P3P/).

IMS. This Instructional Management Systems project is building a metadata schema for managing online learning resources (http://www.imsproject.org/metadata).

ORIGINS OF WEB METADATA AND RDF
From the standpoint of the Web, the history of standardized metadata mechanisms begins with the HTML <META> and <LINK> tags. These let a Web page author record metainformation about a page and also indicate that page's relationship to other relevant pages, such as a table of contents.

The <META> tag can specify the author of a Web page as follows:

    <META name="Author" content="John Smith">

Although both <META> and <LINK> tags are useful, they have certain shortcomings. What does the name "Author" really mean? It could be the name of the person who created the page, or the person who wrote the page contents, or even the Webmaster who maintains the page. In other words, the meaning of "Author" on one Web page might be different from its meaning on another page. In short, the namespace of attribute names is uncontrolled (at least prior to the new HTML 4.0 specification). The structure of attribute values is also not specified (for example, is it "John Smith" or "Smith, John"?). Furthermore, it is very difficult to use <META> for higher-order statements such as those described in the section "Metadata on Metadata."
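
The shortcomings of <META> are easy to see from the point of view of a program that reads the tag. The following Python sketch uses the standard library's `html.parser` module to collect name/content pairs; the `MetaCollector` class, the `collect_meta` helper, and the two sample pages are hypothetical and exist only to illustrate the point.

```python
from html.parser import HTMLParser


class MetaCollector(HTMLParser):
    """Collect name/content pairs from <META> tags."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, so <META> arrives as "meta".
        if tag != "meta":
            return
        attributes = dict(attrs)
        if "name" in attributes and "content" in attributes:
            self.meta.setdefault(attributes["name"], []).append(attributes["content"])


def collect_meta(html):
    """Return {name: [content, ...]} for every named <META> tag in `html`."""
    collector = MetaCollector()
    collector.feed(html)
    return collector.meta


# Two hypothetical pages that both claim an "Author".
page_a = '<html><head><META name="Author" content="John Smith"></head></html>'
page_b = '<html><head><META name="Author" content="Smith, John"></head></html>'

print(collect_meta(page_a))
print(collect_meta(page_b))
```

The program recovers a string for "Author" from each page, but nothing in the markup says whether the two strings name the same person, or whether "Author" means the page's creator, the writer of its contents, or its Webmaster. Resolving that requires an agreed schema, which is what the <META> mechanism lacks.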
`
Content Rating
Content rating is a hot topic in the standardization community. Attempts to balance free speech and protection of minors resulted in PICS, the W3C's content-rating architecture.⁹ PICS is a simple metadata mechanism well suited to content rating; however, because attribute values can be chosen only from controlled vocabularies (actually, they are all numeric), it has limited use as a general metadata architecture. On the positive side, PICS introduced the notion of machine-interpretable schemata for metadata. It also defined various ways in which metadata can be associated with Web resources. Metadata can be

• embedded in an HTML <META> tag in the document head;
• transported in HTTP headers (this is also possible with the <META> tag by using the attribute "http-equiv" instead of "name"); and
• stored in and retrieved from a third-party metadata bureau.

Figure 6 shows an example of a PICS label.
`
    (PICS-1.1 "http://www.gcf.org/v2.5"
      by "John Doe"
      labels on "1994.11.05T08:15-0500"
        until "1995.12.31T23:59-0000"
        for "http://w3.org/PICS/Overview.html"
        ratings (suds 0.5 density 0 color/hue 1)
        for "http://w3.org/PICS/Underview.html"
        by "Jane Doe"
        ratings (subject 2 density 1 color/hue 1))

Figure 6. A PICS label (that is, an instance of PICS metadata) provides information about the content of a Web page.
`
The desire to develop PICS into a general metadata mechanism led the W3C to work on "PICS-NG," RDF's predecessor.⁶ With the advent of RDF, W3C plans to transition PICS to use RDF. RDF will also allow content-rating information to be mixed with privacy information. The W3C's project on privacy technologies, P3P, builds directly on top of RDF.

Support for the RDF Standard
Numerous recent projects build metadata mechanisms and standards for narrow domains. Examples include the Internet Mail Consortium's vCard, an electronic business card formalism,¹⁰ and Microsoft's Channel Definition Format for describing pushed Web content. When RDF is widely deployed, many of the special metadata standards can be cast as RDF applications.

The library community has invested considerable effort in the development of electronic cataloging standards (for example, MARC, which stands for "Machine-Readable Catalog"¹¹). Unfortunately, some of these standards are not useful in the Web's context. These efforts are important, however, because they led to the Dublin Core metadata element. W3C considers the support of the Dublin Core paramount in the current metadata standardization efforts. The digital library community has very strongly advocated RDF throughout the development of this standard.

Metadata is a form of structured data transmitted on the Web. The structured document, or SGML, community has influenced the metadata standardization through the introduction of XML.⁷ XML is often billed as a type of universal syntax to solve the lack of interoperability between various Web-based software systems. This language is only a way of "serializing a tree" or, more generally, a way of encoding structured data for transport on the Internet. It has no inherent semantics, nor does it offer a way for agents to exchange descriptions of semantics. Provided that mechanisms to define semantics are built atop XML, it is a natural choice for metadata syntax. This is because the ability to parse XML syntax is (or will be) prevalent in numerous Web-related software products. Providing the semantic machinery is exactly what the W3C's RDF project has done. Without RDF, everybody would have to reinvent a mechanism for communicating semantics between interoperating software systems.

Microsoft's XML-Data¹² is another framework that simplifies the definition of data written in XML. Early versions of XML-Data were studied by the RDF design team. Current focus in XML-Data seems to be how to map legacy data into XML.

RDF has also been influenced by knowledge representation research. The KR community has spent a great deal of effort on the crucial problem of how to represent knowledge in a way machines can understand. RDF design was influenced, for one, by associational representations such as semantic networks developed by Ross Quillian¹³ and William Woods.¹⁴ Equal influence came from frame-based representation systems: for example, those by Marvin Minsky,¹⁵ and by Richard Fikes and Tom Kehler.¹⁶ Another direct influence came from Meta Content Framework (MCF), a metadata framework reminiscent of semantic networks.¹⁷

RDF should not be confused with more powerful knowledge representation formalisms. In knowledge interchange, KIF¹⁸ is a de facto standard in the research community. Description logics, such as CLASSIC, are another area attracting much attention recently. RDF lacks certain mechanisms, such as negation and quantification. Designers deliberately excluded these features (first-order predicate logic) for fear that such complex features would discourage RDF's acceptance and deployment within the Web community.
`
FUTURE OF WEB METADATA
Standardized metadata is a solution to the lack of machine-understandable semantics, one of the
Wo
