`(12) Patent Application Publication (10) Pub. No.: US 2004/0024898A1
(43) Pub. Date: Feb. 5, 2004
`Wan
`
`(54) DELIVERING MULTIMEDIA DESCRIPTIONS
`(76) Inventor: Ernest Yiu Cheong Wan, Carlingford,
`NSW (AU)
`Correspondence Address:
FITZPATRICK, CELLA, HARPER & SCINTO
`30 ROCKEFELLER PLAZA
`NEW YORK, NY 10112 (US)
(21) Appl. No.: 10/296,162
(22) PCT Filed: Jul. 5, 2001
(86) PCT No.: PCT/AU01/00799
(30) Foreign Application Priority Data
`
`Jul. 10, 2000 (AU)............................................ PO 8677
`Publication Classification
`
`(51) Int. Cl. .................................................. G06F 15/16
`(52) U.S. Cl. ............................................ 709/231; 709/246
(57) ABSTRACT
Disclosed is a method of processing a document (20) described in a mark-up language (eg. XML). Initially, a structure (21a) and a text content (21b) of the document are separated, and then the structure (22) is transmitted, for example by streaming, before the text content (23). Parsing of the received structure (22) is commenced before the text content (23) is received. Also disclosed is a method of forming a streamed presentation (37, 38) from at least one media object having content (31, 32) and description (33) components. A presentation description (35) is generated (36) from at least one component description of the media object and is then processed (34) to schedule delivery of component descriptions and content of the presentation to generate elementary data streams associated with the component descriptions (38) and content (37). Another method of forming a streamed presentation of at least one media object having content and description components is also disclosed. A presentation template (53) is provided that defines a structure of a presentation description (56). The template is then applied (54) to at least one description component (52) of the associated media object to form the presentation description from each description component. The presentation description is then stream encoded with each associated media object (51) to form the streamed presentation (57, 58), whereby the media object is reproducible using the presentation description.
`
[Front-page figure (Fig. 4C): a multimedia stream including audio and video streams (41, 42), description streams (43, 44, 45) carrying <scene> and <shot> fragments, a scene description stream (46) and an object descriptor stream (47), together forming the elementary streams.]
`
`Amazon v. Audio Pod
`US Patent 10,805,111
`Amazon EX-1067
`
`
`
[Sheet 1 of 11. Fig. 1A (Prior Art): an XML source document (10) is encoded by an interpreter (14), according to code spaces (12), into a WBXML-encoded document (16). The figure shows a DOCTYPE declaration with <!ELEMENT ...> and <!ATTLIST ...> definitions, and the code pages of the encoded document.]
`
`
[Sheet 2 of 11. Fig. 1B (Prior Art): a table describing each token in the encoded stream, including a string table length, string table entries, inline characters ('E', 'n', 't', 'e', 'r', ' ', 'n', 'a', 'm', 'e'), tag and attribute codes (NAME, TYPE, URL="http://") and markers indicating whether an inline string or a string table reference follows.]
`
`
`
[Sheet 3 of 11. Fig. 2: a first method of streaming an XML document (20) as separate structure (22) and text (23) streams.]
`
`
`
[Sheet 4 of 11. Fig. 3: an arrangement (30) for description-centric streaming of descriptions together with content.]
`
`
`
[Sheet 5 of 11. Fig. 4A (Prior Art): an MPEG-4 stream in which an AV object references an OCI description. Fig. 4B: a stream in which MPEG-7 description fragments are referenced by URI from the MPEG-4 OCI stream.]
`
`
`
[Sheet 6 of 11. Fig. 4C: a preferred division of a description stream.]
`
`
`
[Sheet 7 of 11. Fig. 5: a media-centric streaming arrangement (50) in which a composer (54) combines AV content (51) and descriptions (52).]
`
`
`
[Sheet 8 of 11. Fig. 6(a): a presentation template comprising XSLT templates matching /movie/title, /movie/right, /movie/scene and /movie/scene/shot. Fig. 6(b): a movie description (<movie ...="aMovie.mpg"> with <title>, <right> and nested <scene> and <shot> elements carrying begin and dur attributes, eg. begin="0:2:0.0" dur="300s") which, with the template, is input to the composer.]
`
`
`
[Sheet 9 of 11. Fig. 6(b) (continued).]
`
`
`
[Sheet 10 of 11. Fig. 7: a schematic block diagram of a general-purpose computer system, including a keyboard, a video display, a video interface, a storage device and a connection to a computer network (reference numerals 700-720).]
`
`
`
[Sheet 11 of 11. Fig. 8: a schematic representation of the elementary streams of an MPEG-4 presentation, including a scene description stream and an OCI stream.]
`
`
`
`US 2004/0024898A1
`
`Feb. 5, 2004
`
`DELIVERING MULTIMEDIA DESCRIPTIONS
`
`TECHNICAL FIELD OF THE INVENTION
0001. The present invention relates generally to the distribution of multimedia and, in particular, to the delivery of multimedia descriptions in different types of applications. The present invention has particular application to, but is not limited to, the evolving MPEG-7 standard.
`
`BACKGROUND ART
0002. Multimedia may be defined as the provision of, or access to, media, such as text, audio and images, in which an application can handle or manipulate a range of media types. Invariably where access to a video is desired, the application must handle both audio and images. Often such media is accompanied by text that describes the content and may include references to other content. As such, multimedia may be conveniently referred to as being formed of content and descriptions. The description is typically formed by metadata which is, practically speaking, data which is used to describe other data.

0003. The World Wide Web (WWW or, the "Web") uses a client/server paradigm. Traditional access to multimedia over the Web involves an individual client accessing a database available via a server. The client downloads the multimedia (content and description) to the local processing system where the multimedia may be utilised, typically by compiling and replaying the content with the aid of the description. The description is "static" in that usually the entire description must be available at the client in order for the content, or parts thereof, to be reproduced. Such traditional access is problematic in the delay between client request and actual reproduction, and in the sporadic load on both the server and any communications network linking the server and local processing system as media components are delivered. Real-time delivery and reproduction of multimedia in this fashion is typically unobtainable.
0004. The evolving MPEG-7 standard has identified a number of potential applications for MPEG-7 descriptions. The various MPEG-7 "pull", or retrieval, applications involve client access to databases and audio-visual archives. The "push" applications are related to content selection and filtering and are used in broadcasting, and the emerging concept of "webcasting", in which media, traditionally broadcast over the airwaves by radio frequency propagation, is broadcast over the structured links of the Web. Webcasting, in its most fundamental form, requires a static description and streamed content. However, webcasting usually necessitates the downloading of the entire description before any content may be received. Desirably, webcasting requires streamed descriptions received with, or in association with, the content. Both types of applications benefit strongly from the use of metadata.

0005. The Web is likely to be the primary medium for most people to search and retrieve audio-visual (AV) content. Typically, when locating information, the client issues a query and a search engine searches its database and/or other remote databases for relevant content. MPEG-7 descriptions, which are constructed using XML documents, enable more efficient and effective searching because of the well-known semantics of the standardised descriptors and description schemes used in MPEG-7. Nevertheless, MPEG-7 descriptions are expected to form only a (small) portion of all content descriptions available on the Web. It is desirable for MPEG-7 descriptions to be searchable and retrievable (or downloadable) in the same manner as other XML documents on the Web, since users of the Web do not expect or want AV content to be downloaded with descriptions. In some cases, the descriptions rather than the AV content are what may be required. In other cases, users will want to examine the description before deciding whether to download or stream the content.

0006. MPEG-7 descriptors and description schemes are only a sub-set of the set of (well-known) vocabulary used on the Web. Using the terminology of XML, the MPEG-7 descriptors and description schemes are elements and types defined in the MPEG-7 namespace. Further, Web users would expect that MPEG-7 elements and types could be used in conjunction with those of other namespaces. Excluding other widely used vocabularies and restricting all MPEG-7 descriptions to consist only of the standardised MPEG-7 descriptors and description schemes and their derivatives would make the MPEG-7 standard excessively rigid and unusable. A widely accepted approach is for a description to include vocabularies from multiple namespaces and to permit applications to process elements (from any namespace, including MPEG-7) that the application understands, and ignore those elements that are not understood.
0007. To make downloading, and any consequential storing, of a multimedia (eg. MPEG-7) description more efficient, the descriptions can be compressed. A number of encoding formats have been proposed for XML, and include WBXML, derived from the Wireless Application Protocol (WAP). In WBXML, frequently used XML tags, attributes and values are assigned a fixed set of codes from a global code space. Application-specific tag names, attribute names and some attribute values that are repeated throughout document instances are assigned codes from some local code spaces. WBXML preserves the structure of XML documents. The content, as well as attribute values that are not defined in the Document Type Definition (DTD), can be stored in-line or in a string table. An example of encoding using WBXML is shown in FIGS. 1A and 1B. FIG. 1A depicts how an XML source document 10 is processed by an interpreter 14 according to various code spaces 12 defining encoding rules for WBXML. The interpreter 14 produces an encoded document 16 suitable for communication according to the WBXML standard. FIG. 1B provides a description of each token in the data stream formed by the document 16.

0008. While WBXML encodes XML tags and attributes into tokens, no compression is performed on any textual content of the XML description. Such may be achieved using a traditional text compression algorithm, preferably taking advantage of the schema and data-types of XML to enable better compression of attribute values that are of primitive data-types.
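The tag-to-token substitution described above can be sketched as follows. This is a minimal illustration only: the tag values in TAG_CODES are hypothetical stand-ins for a local code space, while STR_I (0x03), END (0x01) and the 0x40 "element has content" flag follow the WBXML global token assignments.

```python
# Toy sketch of WBXML-style tag tokenization (illustrative only; real code
# spaces and document headers are defined by the WBXML specification).
TAG_CODES = {"card": 0x05, "input": 0x06}   # hypothetical local code space
STR_I = 0x03                                 # global token: inline string follows
END = 0x01                                   # global token: end of element

def encode(tag, text):
    """Encode a single <tag>text</tag> element as a WBXML-like byte stream."""
    out = bytearray()
    out.append(TAG_CODES[tag] | 0x40)        # 0x40 flag: element has content
    out.append(STR_I)
    out.extend(text.encode("utf-8"))         # text is carried verbatim inline
    out.append(0x00)                         # inline strings are NUL-terminated
    out.append(END)
    return bytes(out)

encoded = encode("card", "Enter name:")
# The tag name collapses to one token; the textual content is not compressed,
# which is exactly the limitation noted in paragraph 0008.
```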
`SUMMARY OF THE INVENTION
0009. It is an object of the present invention to substantially overcome, or at least ameliorate, one or more disadvantages of existing arrangements, to support the streaming of multimedia descriptions.

0010. General aspects of the present invention provide for streaming descriptions, and for streaming descriptions
`
with AV (audio-visual) content. When streaming descriptions with AV content, the streaming can be "description-centric" or "media-centric". The streaming can also be unicast with an upstream channel, or broadcast.

0011. According to a first aspect of the invention, there is provided a method of forming a streamed presentation from at least one media object having content and description components, said method comprising the steps of:

0012. generating a presentation description from at least one component description of said at least one media object; and

0013. processing said presentation description to schedule delivery of component descriptions and content of said presentation to generate elementary data streams associated with said component descriptions and content.

0014. According to another aspect of the present invention there is disclosed a method of forming a presentation description for streaming content with description, said method comprising the steps of:

0015. providing a presentation template that defines a structure of a presentation description; and

0016. applying said template to at least one description component of at least one associated media object to form said presentation description from each said description component, said presentation description defining a sequential relationship between description components desired for streamed reproduction and content components associated with said desired descriptions.

0017. According to another aspect of the present invention there is disclosed a streamed presentation comprising a plurality of content objects interspersed amongst a plurality of description objects, said description objects comprising references to multimedia content reproducible from said content objects.

0018. According to another aspect of the present invention there is disclosed a method of delivering an XML document, said method comprising the steps of:

0019. dividing the document to separate XML structure from XML text; and

0020. delivering said document in a plurality of data streams, at least one said stream comprising said XML structure and at least one other of said streams comprising said XML text.

0021. In accordance with another aspect of the present invention, there is disclosed a method of processing a document described in a mark-up language, said method comprising the steps of:

0022. separating a structure and a text content of said document;

0023. sending the structure before the text content; and

0024. commencing to parse the received structure before the text content is received.

0025. Other aspects of the present invention are also disclosed.
`
`BRIEF DESCRIPTION OF THE DRAWINGS
0026. At least one embodiment of the present invention will now be described with reference to the drawings, in which:

0027. FIGS. 1A and 1B show an example of a prior art encoding of an XML document;

0028. FIG. 2 illustrates a first method of streaming an XML document;

0029. FIG. 3 illustrates a second method of "description-centric" streaming in which the streaming is driven by a presentation description;

0030. FIG. 4A illustrates a prior art stream;

0031. FIG. 4B shows a stream according to one implementation of the present disclosure;

0032. FIG. 4C shows a preferred division of a description stream;

0033. FIG. 5 illustrates a third method of "media-centric" streaming;

0034. FIG. 6 is an example of a composer application;

0035. FIG. 7 is a schematic block diagram of a general-purpose computer upon which the implementation of the present disclosure can be practiced; and

0036. FIG. 8 schematically represents an MPEG-4 stream.
`
DETAILED DESCRIPTION INCLUDING BEST MODE
0037. The implementations to be described are each founded upon the relevant multimedia descriptions being XML documents. XML documents are mostly stored and transmitted in their raw textual format. In some applications, XML documents are compressed using some traditional text compression algorithms for storage or transmission, and decompressed back into XML before they are parsed and processed. Although compression may greatly reduce the size of an XML document, and thus reduce the time for reading or transmitting the document, an application still has to receive the entire XML document before the document can be parsed and processed. A traditional XML parser expects an XML document to be well-formed (ie. the document has matching and non-overlapping start-tag and end-tag pairs), and is unable to complete the parsing of the XML document until the whole XML document is received. Incremental parsing of a streamed XML document is unable to be performed using a traditional XML parser.

0038. Streaming an XML document permits parsing and processing to commence as soon as a sufficient portion of the XML document is received. Such capability will be most useful in the case of a low bandwidth communication link and/or a device with very limited resources.
0039. One way of achieving incremental parsing of an XML document is to send the tree hierarchy of an XML document (such as the Document Object Model (DOM) representation of the document) in a breadth-first or depth-first manner. To make such a process more efficient, the XML (tree) structure of the document can be separated from the text components of the document and encoded and sent
`-14-
`
`
`
`US 2004/0024898A1
`
`Feb. 5, 2004
`
before the text. The XML structure is critical in providing the context for interpreting the text. Separating the two components allows the decoder (parser) to parse the structure of the document more quickly, and to ignore elements that are not required or are unable to be interpreted. Such a decoder (parser) may optionally choose not to buffer any irrelevant text that arrives at a later stage. Whether the decoder converts the encoded document back into XML or not depends on the application.

0040. The XML structure is vital in the interpretation of the text. In addition, as different encoding schemes are usually used for the structure and the text and, in general, there is far less structural information than textual content, two (or more) separate streams may be used for delivering the structure and the text.

0041. FIG. 2 shows one method of streaming an XML document 20. Firstly, the document 20 is converted to a DOM representation 21, which is then streamed in a depth-first fashion. The structure of the document 20, depicted by the tree 21a of the DOM representation 21, and the text content 21b, are encoded as two separate streams 22 and 23 respectively. The structure stream 22 is headed by code tables 24. Each encoded node 25, representing a node of the DOM representation 21, has a size field that indicates its size including the total size of corresponding descendant nodes. Where appropriate, encoded leaf nodes and attribute nodes contain pointers 26 to their corresponding encoded content 27 in the text stream 23. Each encoded string in the text stream is headed by a size field that indicates the size of the string.
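The structure/text split of FIG. 2 can be sketched as follows. This is a minimal sketch under stated assumptions: the document tree is walked depth-first, with each structure record holding a pointer into the text stream; the code tables, node size fields and binary encoding of the actual arrangement are omitted.

```python
# Minimal sketch of separating an XML document's structure from its text.
# Structure records are (depth, tag, text-pointer) tuples emitted depth-first;
# text content is collected into a second, separately deliverable stream.
import xml.etree.ElementTree as ET

def split_streams(xml_source):
    root = ET.fromstring(xml_source)
    structure, text_stream = [], []

    def walk(node, depth):
        structure.append((depth, node.tag, len(text_stream)))  # pointer into text stream
        text_stream.append((node.text or "").strip())
        for child in node:
            walk(child, depth + 1)

    walk(root, 0)
    return structure, text_stream

structure, texts = split_streams("<a><b>hello</b><c>world</c></a>")
# structure: [(0, 'a', 0), (1, 'b', 1), (1, 'c', 2)]
# texts:     ['', 'hello', 'world']
```

A receiver can parse the structure records as they arrive and begin interpreting the tree before any of the text stream has been delivered, which is the behaviour paragraph 0039 describes.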
0042. Not all multimedia (eg. MPEG-7) descriptions need be streamed with content or serve as a presentation. For instance, television and film archives store vast amounts of multimedia material in several different formats, including analogue tapes. It would not be possible to stream the description of a movie, in which the movie is recorded on analogue tapes, with the actual movie content. Similarly, treating the multimedia description of a patient's medical records as a multimedia presentation makes little sense. As an analogy, while Synchronised Multimedia Integration Language (SMIL) presentations are themselves XML documents, not all XML documents are SMIL presentations. Indeed, only a very small number of XML documents are SMIL presentations. SMIL can be used for creating a presentation script that enables a local processor to compile an output presentation from a number of local files or resources. SMIL specifies the timing and synchronisation model but does not have any built-in support for the streaming of content or description.

0043. FIG. 3 shows an arrangement 30 for streaming descriptions together with content. A number of multimedia resources are shown, including audio files 31 and video files 32. Associated with the resources 31 and 32 are descriptions 33, each typically formed of a number of descriptors and descriptor relationships. Significantly, there need not be a one-to-one relationship between the descriptions 33 and the content files 31 and 32. For example, a single description may relate to a number of files 31 and/or 32, or any one file 31 or 32 may have associated therewith more than one description.

0044. As seen in FIG. 3, a presentation description 35 is provided to describe the temporal behaviour of a multimedia presentation desired to be reproduced through a method of description-centric streaming. The presentation description 35 can be created manually or interactively through the use of editing tools and a standardized presentation description scheme 36. The scheme 36 utilises elements and attributes to define the hyperlinks between the multimedia objects and the layout of the desired multimedia presentation. The presentation description 35 can be used to drive the streaming process. Preferably, the presentation description is an XML document that uses a SMIL-based description scheme.

0045. An encoder 34, with knowledge of the presentation description scheme 36, interprets the presentation description 35 to construct an internal time graph of the desired multimedia presentation. The time graph forms a model of the presentation schedule and synchronization relationships between the various resources. Using the time graph, the encoder 34 schedules the delivery of the required components and then generates elementary data streams 37 and 38 that may be transmitted. Preferably, the encoder 34 splits the descriptions 33 of the content into multiple data streams 38. The encoder 34 preferably operates by constructing a URI table that maps the URI-references contained in the AV content 31, 32 and the descriptions 33 to a local address (eg. offset) in the corresponding elementary (bit) streams 37 and 38. The streams 37 and 38, having been transmitted, are received into a decoder (not illustrated) that uses the URI table when attempting to decode any URI-reference.
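The URI table described above can be sketched as a simple mapping. The class name, the example URI and the (stream id, offset) pair layout are assumptions for illustration only, not the encoding actually used by the encoder 34.

```python
# Illustrative sketch of a URI table: URI-references found in the content and
# descriptions are mapped to a local address (stream id, byte offset) so that
# a decoder can resolve them within the received elementary streams.
class URITable:
    def __init__(self):
        self._table = {}

    def register(self, uri, stream_id, offset):
        # Recorded by the encoder while laying out the elementary streams.
        self._table[uri] = (stream_id, offset)

    def resolve(self, uri):
        # Returns None for references not carried in any local stream.
        return self._table.get(uri)

table = URITable()
table.register("urn:movie:scene1", stream_id=38, offset=1024)  # hypothetical entry
```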
0046. The presentation description scheme 36, in some implementations, may be based on SMIL. Current developments in MPEG-4 enable a SMIL-based presentation description to be processed into MPEG-4 streams.

0047. An MPEG-4 presentation is made up of scenes. An MPEG-4 scene follows a hierarchical structure called a scene graph. Each node of the scene graph is a compound or primitive media object. Compound media objects group primitive media objects together. Primitive media objects correspond to leaves in the scene graph and are AV media objects. The scene graph is not necessarily static. Node attributes (eg. positioning parameters) can be changed, and nodes can be added, replaced or removed. Hence, a scene description stream may be used for transmitting scene graphs, and updates to scene graphs.
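The scene-graph model just described can be sketched as a small tree type. This mirrors the description only loosely and is a hedged illustration: the real MPEG-4 scene description (BIFS) is a binary format with its own node vocabulary, and the node and attribute names below are invented for the example.

```python
# Sketch of a scene graph: compound nodes group children, primitive (leaf)
# nodes are AV media objects, and node attributes can be updated in place,
# as a scene description stream's update messages would do.
class SceneNode:
    def __init__(self, name, attrs=None, children=None):
        self.name = name
        self.attrs = attrs or {}          # eg. positioning parameters
        self.children = children or []    # non-empty => compound node

    def is_primitive(self):
        return not self.children

    def update(self, **attrs):            # scene-graph update: change attributes
        self.attrs.update(attrs)

scene = SceneNode("scene", children=[
    SceneNode("video", {"x": 0, "y": 0}),
    SceneNode("caption", {"x": 0, "y": 200}),
])
scene.children[0].update(x=50)            # reposition the video object
```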
0048. An AV media object may rely on streaming data that is conveyed in one or more elementary streams (ES). All streams associated with one media object are identified by an object descriptor (OD). However, streams that represent different content must be referenced through distinct object descriptors. Additional auxiliary information can be attached to an object descriptor in textual form as an OCI (object content information) descriptor. It is also possible to attach an OCI stream to the object descriptor. The OCI stream conveys a set of OCI events that are qualified by their start time and duration. The elementary streams of an MPEG-4 presentation are schematically illustrated in FIG. 8.

0049. In MPEG-4, information about an AV object is stored and transmitted using the Object Content Information (OCI) descriptor or stream. The AV object contains a reference to the relevant OCI descriptor or stream. As seen in FIG. 4A, such an arrangement requires a specific temporal relationship between the description and the content, and a one-to-one relationship between AV objects and OCI.

0050. However, typically, multimedia (eg. MPEG-7) descriptions are not written for specific MPEG-4 AV objects
`
`
or scene graphs and, indeed, are written without any specific knowledge of the MPEG-4 AV objects and scene graphs that make up the presentation. The descriptions usually provide a high-level view of the information of the AV content. Hence, the temporal scope of the descriptions might not align with those of the MPEG-4 AV objects and scene graphs. For instance, a video/audio segment described by an MPEG-7 description may not correspond to any MPEG-4 video/audio stream or scene description stream. The segment may describe the last portion of one video stream and the beginning part of the following one.

0051. The present disclosure presents a more flexible and consistent approach in which the multimedia description, or each fragment thereof, is treated as another class of AV object. That is, like other AV objects, each description will have its own temporal scope and object descriptor (OD). The scene graph is extended to support the new (eg. MPEG-7) description node. With such a configuration, it is possible to send a multimedia (eg. MPEG-7) description fragment, that has sub-fragments of different temporal scopes, as a single data stream or as separate streams, regardless of the temporal scopes of the other AV media objects. Such a task is performed by the encoder 34, and an example of such a structure, applied to the MPEG-4 example of FIG. 4A, is shown in FIG. 4B. In FIG. 4B, the OCI stream is also used to contain references to relevant description fragments and other AV object specific information as required.

0052. Treating MPEG-7 descriptions in the same way as other AV objects also means that both can be mapped to a media object element of the presentation description scheme 36 and subjected to the same timing and synchronisation model. Specifically, in the case of a SMIL-based presentation description scheme 36, a new media object element, such as an <mpeg7> tag, may be defined. Alternately, MPEG-7 descriptions can be treated as a specific type of text (eg. represented in italics). Note that a set of common media object elements <video>, <audio>, <animation>, <text>, etc. are pre-defined in SMIL. The description stream can potentially be further separated into a structure stream and a text stream.
0053. In FIG. 4C, a multimedia stream 40 is shown which includes an audio stream 41 and a video stream 42. Also included is a high-level scene description stream 46 comprising (compound or primitive) nodes of media objects and having leaf nodes (which are primitive media objects) that point to object descriptors ODn that make up an object descriptor stream 47. A number of low-level description streams 43, 44 and 45 are also shown, each having components configured to be pointed to, or linked to, the object descriptor stream 47, as do the audio and video streams 41 and 42. With such an object-oriented streaming treating both content and description as media objects, the temporally irregular relationship between description and content may be accommodated through a temporal object description structured into the streams.

0054. The above approach to streaming descriptions with content is appropriate where the description has some temporal relationship with the content. An example of this is a description of a particular scene in a movie that provides for multiple camera angles to be viewed, thus permitting viewer access to multiple video streams of which only one video stream may, practically speaking, be viewed in the real-time running of the movie. This is to be contrasted with arbitrary descriptions which have no definable temporal relationship with the streamed content. An example of such may be a newspaper critic's text review of the movie. Such a review may make text reference, as opposed to a temporal and spatial reference, to scenes and characters. Converting an arbitrary description into a presentation is a non-trivial (and often impossible) task. Most descriptions of AV content are not written with presentation in mind. They simply describe the content and its relationship with other objects at various levels of granularity and from different perspectives. Generating a presentation from a description that does not use the presentation description scheme 36 involves arbitrary decisions, best made by a user operating a specific application, as opposed to the systematic generation of the presentation description 35.
0055. FIG. 5 shows another arrangement 50 for streaming descriptions with content that the present inventor has termed "media-centric". AV content 51 and descriptions 52 of the content 51 are provided to a composer 54, which is also input with a presentation template 53 and has knowledge of a presentation description scheme 55. Although the content 51 is shown as a video and its audio track as the initial AV media object, the initial AV object can actually be a multimedia presentation.

0056. In media-centric streaming, an AV media object provides the AV content 51 and the timeline of the final presentation. This is in contrast to description-centric streaming, where the presentation description provides the timeline of the presentation. Information relevant to the AV content is pulled in from a set of descriptions 52 of the content by the composer 54 and delivered with the content in a final presentation. The final presentation output from the composer 54 is in the form of elementary streams 57 and 58, as with the previous configuration of FIG. 3, or as a presentation description 56 of all the associated content.

0057. The presentation template 53 is used to specify the type of descriptive elements that are required and those that should be omitted for the final presentation. The template 53 may also contain instructions as to how the required descriptions should be incorporated into the presentation. An existing language such as XSL Transformations (XSLT) may be used for specifying the templates. The composer 54, which may be implemented as a software application, parses the set of required descriptions that describe the content, and extracts the required elements (and any associated sub-elements) to incorporate the elements into the timeline of the presentation. Required elements are preferably those elements that contain descriptive information about the AV content that is useful for the presentation. In addition, elements (from the same set of descriptions) that are referred to (by IDREFs or URI-references) by the selected elements are also included and streamed before their corresponding referring elements (their "referrers"). It is possible that a selected element is in turn referenced (either directly or indirectly) by an element that it references. It is also possible that a selected element has a forward reference to another selected element. An appropriate heuristic may be used to determine the order by which such elements are streamed. The presentation template 53 can also be configured to avoid such situations.
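One possible heuristic for the streaming order discussed above can be sketched as a topological sort that emits referenced elements before their referrers, falling back to document order when a reference cycle is detected. The element identifiers and graph layout are assumptions for illustration; the composer 54 could use any equivalent strategy.

```python
# Sketch of a "referenced elements before referrers" ordering heuristic.
from graphlib import CycleError, TopologicalSorter

def stream_order(elements, references):
    """elements: ids in document order; references: {referrer: {referenced, ...}}."""
    ts = TopologicalSorter({e: references.get(e, set()) for e in elements})
    try:
        return list(ts.static_order())   # referenced ids are emitted first
    except CycleError:
        return list(elements)            # cycle detected: keep document order

# A hypothetical review element referring to a scene element by IDREF:
order = stream_order(["review", "scene1"], {"review": {"scene1"}})
# "scene1" is referenced by "review", so it is streamed first.
```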
`0058. The composer 54 may generate the