`International Bureau
`
`PCT
INTERNATIONAL APPLICATION PUBLISHED UNDER THE PATENT COOPERATION TREATY (PCT)
`WO 97/41504
`
(51) International Patent Classification 6: G06F 3/00, 3/14, 13/00, 13/42
`
(11) International Publication Number: WO 97/41504 A1

(43) International Publication Date: 6 November 1997 (06.11.97)
`
(21) International Application Number: PCT/US97/06982

(22) International Filing Date: 24 April 1997 (24.04.97)
`
(30) Priority Data: 08/638,350    26 April 1996 (26.04.96)    US
`
`(71) Applicant (for all designated States except US): ELOQUENT,
`INC. [US/US]; Suite 200, 1710 S. Amphlett Boulevard, San
`Mateo, CA 94402-2703 (US).
`
`(81) Designated States: AL, AM, AT, AU, AZ, BA, BB, BG, BR,
`BY, CA, CH, CN, CU, CZ, DE, DK, EE, ES, FI, GB, GE,
`HU, IL, IS, JP, KE, KG, KP, KR, KZ, LC, LK, LR, LS,
`LT, LU, LV, MD, MG, MK, MN, MW, MX, NO, NZ, PL,
`PT, RO, RU, SD, SE, SG, SI, SK, TJ, TM, TR, TT, UA,
`UG, US, UZ, VN, ARIPO patent (GH, KE, LS, MW, SD,
`SZ, UG), Eurasian patent (AM, AZ, BY, KG, KZ, MD, RU,
`TJ, TM), European patent (AT, BE, CH, DE, DK, ES, FI,
`FR, GB, GR, IE, IT, LU, MC, NL, PT, SE), OAPI patent
(BF, BJ, CF, CG, CI, CM, GA, GN, ML, MR, NE, SN, TD,
`TG).
`
`(72) Inventors; and
`(75) Inventors/Applicants (for US only): REID, Clifford, A.
`[US/US]; Apartment 2-710, 2 Townsend Street, San
`Francisco, CA 94107 (US). GLAZER, David [US/US]; 263
`Glenwood Avenue, Woodside, CA 94062 (US).
`
`Published
`With international search report.
`Before the expiration of the time limit for amending the
`claims and to be republished in the event of the receipt of
`amendments.
`
`(74) Agents: BARR, Robert et al.; Weil, Gotshal & Manges L.L.P.,
`Suite 280, 2882 Sand Hill Road, Menlo Park, CA 94025
`(US).
`
`(54) Title: A METHOD AND SYSTEM FOR SYNCHRONIZING AND NAVIGATING MULTIPLE STREAMS OF ISOCHRONOUS
`AND NON-ISOCHRONOUS DATA
`
[Front-page drawing: FIG. 4 of the application, showing two sets of conceptual-event segments (S1, S2, S3, S4, ... and S'1, S'2, S'3, S'4, S'5, ...) marked, with their associated time codes, against parallel VIDEO and AUDIO time lines.]
`(57) Abstract
`
A method and system for synchronizing multiple streams of isochronous and non-isochronous data (100) and navigating through the synchronized streams by reference to a common time base (210) and by means of a structured framework of conceptual events provides computer users with an effective means to interact with multimedia programs of speakers giving presentations (400). The multimedia programs, consisting of synchronized video, audio, graphics, text, hypertext, and other data types, can be stored on a server (130), and users can navigate and play them from a client CPU (110) over a non-isochronous network connection (150).
`
`
`Amazon v. Audio Pod
`US Patent 9,319,720
`Amazon EX-1076
`
`
`
`Codes used to identify States party to the PCT on the front pages of pamphlets publishing international applications under the PCT.
`
`FOR THE PURPOSES OF INFORMATION ONLY
`
AL  Albania
AM  Armenia
AT  Austria
AU  Australia
AZ  Azerbaijan
BA  Bosnia and Herzegovina
BB  Barbados
BE  Belgium
BF  Burkina Faso
BG  Bulgaria
BJ  Benin
BR  Brazil
BY  Belarus
CA  Canada
CF  Central African Republic
CG  Congo
CH  Switzerland
CI  Côte d'Ivoire
CM  Cameroon
CN  China
CU  Cuba
CZ  Czech Republic
DE  Germany
DK  Denmark
EE  Estonia
ES  Spain
FI  Finland
FR  France
GA  Gabon
GB  United Kingdom
GE  Georgia
GH  Ghana
GN  Guinea
GR  Greece
HU  Hungary
IE  Ireland
IL  Israel
IS  Iceland
IT  Italy
JP  Japan
KE  Kenya
KG  Kyrgyzstan
KP  Democratic People's Republic of Korea
KR  Republic of Korea
KZ  Kazakstan
LC  Saint Lucia
LI  Liechtenstein
LK  Sri Lanka
LR  Liberia
LS  Lesotho
LT  Lithuania
LU  Luxembourg
LV  Latvia
MC  Monaco
MD  Republic of Moldova
MG  Madagascar
MK  The former Yugoslav Republic of Macedonia
ML  Mali
MN  Mongolia
MR  Mauritania
MW  Malawi
MX  Mexico
NE  Niger
NL  Netherlands
NO  Norway
NZ  New Zealand
PL  Poland
PT  Portugal
RO  Romania
RU  Russian Federation
SD  Sudan
SE  Sweden
SG  Singapore
SI  Slovenia
SK  Slovakia
SN  Senegal
SZ  Swaziland
TD  Chad
TG  Togo
TJ  Tajikistan
TM  Turkmenistan
TR  Turkey
TT  Trinidad and Tobago
UA  Ukraine
UG  Uganda
US  United States of America
UZ  Uzbekistan
VN  Viet Nam
YU  Yugoslavia
ZW  Zimbabwe
`
`
`
`
`
A METHOD AND SYSTEM FOR SYNCHRONIZING AND NAVIGATING MULTIPLE STREAMS OF ISOCHRONOUS AND NON-ISOCHRONOUS DATA
`
`5
`
`BACKGROUND OF THE INVENTION
`
`1. Field of the Invention
`
`The present invention generally relates to the production and delivery of
`
`1 0
`
`video recordings of speakers giving presentations, and, more particularly, to the
`
`production and delivery of digital multimedia programs of speakers giving
`
`presentations. These digital multimedia programs consist of multiple synchronized
`
`streams of isochronous and non-isochronous data, including video, audio, graphics,
`
`text, hypertext, and other data types.
`
`15
`
`2. Description of the Prior Art
`
`The recording of speakers giving presentations, at events such as
`
`professional conferences, business or government organizations' internal training
`
`20
`
`seminars, or classes conducted by educational institutions, is a common practice.
`
`Such recordings provide access to the content of the presentation to individuals
`
`who were not able to attend the live event.
`
`The most common form of such recordings is analog video taping. A video
`
`25
`
`camera is used to record the event onto a video tape, which is subsequently
`
`duplicated to an analog medium suitable for distribution, most commonly a VHS
`
`1
`
`-1-
`
`
`
tape, which can be viewed using a commercially-available VCR and television set. Such video tapes generally contain a video recording of the speaker and a synchronized audio recording of the speaker's words. They may also contain a video recording of any visual aids which the speaker used, such as text or graphics projected in a manner visible to the audience. Such video tapes may also be edited prior to duplication to include a textual transcript of the audio component recording, typically presented on the bottom of the video display as subtitles. Such subtitles are of particular use to the hearing impaired, and, if translated into other languages, are of particular use to viewers who prefer to read along in a language other than the language used by the speaker.

Certain characteristics of such analog recordings of speakers giving presentations are unattractive to producers and to viewers. Analog tape players offer limited navigation facilities, generally limited to fast forward and rewind capabilities. In addition, analog tapes have the capacity to store only a few hours of video and audio, resulting in the need to duplicate and distribute a large number of tapes, leading to the accumulation of a large number of such tapes by viewers.

Advancements in computer technology have allowed analog recordings of speakers giving presentations to be converted to digital format, stored on a digital storage medium, such as a CD-ROM, and presented using a computer CPU and display, rather than a VCR and a television set. Such digital recordings generally include both isochronous and non-isochronous data. Isochronous data is data that is time ordered and must be presented at a particular rate. The isochronous data contained in such a digital recording generally includes video and audio. Non-isochronous data may or may not be time ordered, and need not be presented at a particular rate. Non-isochronous data contained in such a digital recording may include graphics, text, and hypertext.
`
`
`
The use of computers to play digital video recordings of speakers giving presentations provides navigational capabilities not available with analog video tapes. Computer-based manipulation of the digital data offers random access to any point in the speech, and, if there is a text transcript, allows users to search for words in the transcript to locate a particular segment of the speech.

Certain characteristics of state-of-the-art digital storage and presentation of recordings of speakers giving presentations are unattractive to producers and to viewers. There is no easy way to navigate directly to a particular section of a presentation that discusses a topic of particular interest to the user. In addition, there is no easy way to associate a table of contents with a presentation, and navigate directly to the section of the presentation associated with each entry in the table of contents. Finally, like analog tapes, CD-ROMs can store only a few hours of digital video and audio, resulting in the need to duplicate and distribute a large number of CD-ROMs, leading to the accumulation of a large number of such CD-ROMs by viewers.

SUMMARY OF THE INVENTION

It is therefore an object of the present invention to provide a mechanism for synchronizing multiple streams of isochronous and non-isochronous digital data in a manner that supports navigating by means of a structured framework of conceptual events.

It is another object of the invention to provide a mechanism for navigating through any stream using the navigational approach most appropriate to the structure and content of that stream.
`
`
`
It is another object of the invention to automatically position each of the streams at the position corresponding to the selected position in the navigated stream, and simultaneously display some or all of the streams at that position.

It is another object of the invention to provide for the delivery of programs made up of multiple streams of synchronized isochronous and non-isochronous digital data across non-isochronous network connections.

In order to accomplish these and other objects of the invention, a method and system for manipulating multiple streams of isochronous and non-isochronous digital data is provided, including synchronizing multiple streams of isochronous and non-isochronous data by reference to a common time base, supporting navigation through each stream in the manner most appropriate to that stream, defining a framework of conceptual events and allowing a user to navigate through the streams using this structured framework, identifying the position in each stream corresponding to the position selected in the navigated stream, and simultaneously displaying to the user some or all of the streams at the position corresponding to the position selected in the navigated stream. Further, a method and system of efficiently supporting sequential and random access into streams of isochronous and non-isochronous data across non-isochronous networks is provided, including reading the isochronous and non-isochronous data from the storage medium into memory of the server CPU, transmitting the data from the memory of the server CPU to the memory of the client CPU, and caching the different types of data in the memory of the client CPU in a manner that ensures continuous display of the isochronous data on the client CPU display device.
`
`
`
BRIEF DESCRIPTION OF THE DRAWINGS

The foregoing and other objectives, aspects, and advantages of the present invention will be better understood from the following detailed description of embodiments thereof with reference to the following drawings.

FIG. 1 is a schematic diagram of the organization of a data processing system incorporating an embodiment of the present invention.

FIGS. 2 and 3 are schematic diagrams of the organization of the data in an embodiment of the present invention.

FIG. 4 is a diagram showing how two different sets of "conceptual events" may be associated with the same presentation in an embodiment of the present invention.

FIGS. 5, 6 and 9 are exemplary screens produced in accordance with an embodiment of the present invention.

FIGS. 7, 8, 10, and 11 are flow charts indicating the operation of an embodiment of the present invention.

DETAILED DESCRIPTION OF AN EMBODIMENT OF THE INVENTION

Referring now to the drawings, and more particularly to FIG. 1, there is shown, in schematic representation, a data processing system 100 incorporating the invention. Conventional elements of the system include a client central processing unit 110 which includes high-speed memory, a local storage device 112 such as a hard disk or CD-ROM, input devices such as keyboard 114 and pointing device
`
`
`
116 such as a mouse, and a visual data presentation device 118, such as a computer display screen, capable of presenting visual data perceptible to the senses of a user, and an audio data presentation device 120, such as speakers or headphones, capable of presenting audio data to the senses of a user. Other conventional elements of the system include a server central processing unit 130 which includes high-speed memory, a local storage device 132 such as a hard disk or CD-ROM, input devices such as keyboard 134 and pointing device 136, and a visual data presentation device 138, and an audio data presentation device 140. The client CPU is connected to the server CPU by means of a network connection 150.

The invention includes three basic aspects: (1) synchronizing multiple streams of isochronous and non-isochronous data, (2) navigating through the synchronized streams of data by means of a structured framework of conceptual events, or by means of the navigational method most appropriate to each stream, and (3) delivering the multiple synchronized streams of isochronous and non-isochronous data over a non-isochronous network connecting the client CPU and the server CPU.

An exemplary form of the organization of the data embodied in the invention is shown in FIG. 2 and FIG. 3. Beginning with FIG. 2, the video/audio stream 200 is of a type known in the art capable of being played on a standard computer equipped with the appropriate video and audio subsystems, such as shown in FIG. 1. An example of such a video/audio stream is Microsoft Corporation's AVI™ format, which stands for "audio/video interleaved." AVI™ and other such video/audio formats consist of a series of digital images, each referred to as a "frame" of the video, and a series of samples that make up the digital audio. The frames are spaced equally in time, so that displaying consecutive frames on a display device at a sufficiently high and constant rate produces the sensation of continuous motion to the human perceptual system. The rate of displaying frames typically must exceed ten to fifteen frames per second to achieve the effect of
`
`
`
continuous motion. The audio samples are synchronized with the video frames, so that the associated audio can be played in synchronization with the displayed video images. Both the digital images and digital audio samples may be compressed to reduce the amount of data that must be stored or transmitted.

A time base 210 associates a time code with each video frame. The time base is used to associate other data with each frame of video. The audio data, which for the purposes of this invention consists primarily of spoken words, is transcribed into a textual format, called the Transcript 220. The transcript is synchronized to the audio data stream by assigning a time code to each word, producing the Time-Coded Transcript 225. The time codes (shown in angle brackets) preceding each word in the Time-Coded Transcript correspond to the time at which the speaker begins pronouncing that word. For example, the time code 230 of 22.51 s is associated with the word 235 "the." The Time-Coded Transcript may be created manually or by means of an automatic procedure. Manual time-coding requires a person to associate a time code with each word in the transcript. Automatic time coding, for example, uses a speech recognition system of a type well-known in the art to automatically assign a time code to each word as it is recognized and recorded. The current state of the art of speech recognition systems renders automatic time coding of the transcript less economical than manual time coding.
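The Time-Coded Transcript described above can be sketched as a sorted list of (time code, word) pairs, over which a binary search locates the word beginning at or immediately after a given time. The sample words and helper name are illustrative; the 22.51 s / "the" pair follows the example above.

```python
import bisect

# Illustrative Time-Coded Transcript: (time code in seconds, word).
transcript = [
    (0.00, "Good"), (0.31, "morning"), (0.75, "and"),
    (22.51, "the"), (22.80, "first"), (23.10, "manned"), (23.55, "flight"),
]

def word_at_or_after(transcript, time_code):
    """Return the first (time, word) pair whose time code is at or
    immediately after time_code, or None past the end."""
    times = [t for t, _ in transcript]
    i = bisect.bisect_left(times, time_code)
    return transcript[i] if i < len(transcript) else None
```

For example, looking up the time code 20.50 s lands on the word "the" at 22.51 s, the first word spoken at or after that time.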
`
Referring now to FIG. 3, the set 310 of Slides S1 311, S2 312, ... that the speaker used as part of the presentation may be stored in an electronic format of any of the types well-known in the art. Each slide may consist of graphics, text, and other data that can be rendered on a computer display. A Slide Index 315 assigns a time code to each Slide. For example, Slide S1 311 would have a time code 316 of 0 s, S2 312 having a time code 317 of 20.40 s, and so on. The time code corresponds to the time during the presentation at which the speaker caused the specified Slide to be presented. In one embodiment, all of the Slides are
`
`
`
contained in the same disk file, and the Slide Index contains pointers to the locations of each Slide in the disk file. Alternatively, each Slide may be stored in a separate disk file, and the Slide Index contains pointers to the files containing the Slides.
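The Slide Index can likewise be sketched as a sorted list of (time code, slide) pairs. Since a slide remains on screen until the next one is presented, the slide showing at a given time is the last slide whose start time is at or before that time. The data values and function name are illustrative (S1 at 0 s, S2 at 20.40 s, per the example above).

```python
import bisect

# Illustrative Slide Index: (time code in seconds, slide identifier).
slide_index = [(0.0, "S1"), (20.40, "S2"), (95.00, "S3")]

def slide_at(slide_index, time_code):
    """Return the slide showing at time_code: the last slide whose
    start time is <= time_code."""
    times = [t for t, _ in slide_index]
    i = bisect.bisect_right(times, time_code) - 1
    return slide_index[max(i, 0)][1]
```

Whether the index stores file offsets into one disk file or names of separate files, the lookup is the same; only what the second element points at changes.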
`
`5
`
`An Outline 320 of the presentation is stored as a separate text data object.
`
`The Outline is a hierarchy of topics 321, 322, .. that describe the organization of
`
`the presentation, analogous to the manner in which a table of contents describes the
`
`organization of a book. The outline may consist of an arbitrary number of entries,
`
`l O
`
`and an arbitrary number of levels in the hierarchy. An Outline Index 325 assigns a
`
`time code to each entry in the Outline. The time code corresponds to the time
`
`during the presentation at which the speaker begins discussing the topic represented
`
`by the entry in the Outline. For example, topic 321, "Introduction" has entry
`
`name "O l" and time code 326 of O s, topic 322 "The First Manned Flight" has
`
`15
`
`entry name "02" and time code 327 of 20.50 s, "The Wright Brothers" 323 has
`
`entry name "021" (and hence is a subtopic of topic 322) with time code 328 of
`
`120.05 s, and so on. The Outline and the Outline Index may be created by means
`
`of a manual or an automatic procedure. Manual creation is accomplished by a
`
`person viewing the presentation, authoring the Outline, and assigning a time code
`
`20
`
`to each element in the outline. Automatic creation may be accomplished by
`
`automatically constructing the outline consisting of the titles of each of the Slides,
`
`and associating with each entry on the Outline the time code of the corresponding
`
`Slide. Note that manual and automatic creation may produce different Outlines.
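The automatic procedure described above can be sketched directly: build an Outline whose entries are the slide titles, each inheriting the time code of its slide from the Slide Index. The flat "01", "02", ... entry names follow the numbering convention of the example above; the data and function name are illustrative, and a manually authored Outline could of course add sublevels such as "021".

```python
# Illustrative Slide Index whose second element is the slide's title.
slide_index = [(0.0, "Introduction"), (20.40, "The First Manned Flight")]

def outline_from_slides(slide_index):
    """Automatically construct an Outline Index from slide titles:
    returns [(entry_name, time_code, title), ...]."""
    return [(f"{n:02d}", t, title)
            for n, (t, title) in enumerate(slide_index, start=1)]
```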
`
`25
`
`The set 330 of Hypertext Objects 331, 332, ... relating to the subject of the
`
`presentation may be stored in an electronic formats of various types well-known in
`
`the art. Each Hypertext Object may consist of graphics, text, and other data that
`
`can be rendered on a computer display, or pointers to other software applications,
`
`as spreadsheets, word processors, and electronic mail systems, as well as more
`
`8
`
`-8-
`
`
`
specialized applications such as proficiency testing applications or computer-based training applications.

A Hypertext Index table 335 is used to assign two time codes and a display location to each Hypertext Object. The first time code 336 corresponds to the earliest time during the presentation at which the Hypertext Object relates to the content of the presentation. The second time code 337 corresponds to the latest time during the presentation at which the Hypertext Object relates to the content of the presentation. The Object Name 338, as the name suggests, denotes the Hypertext Object's name. The display location 339 denotes how the connection to the Hypertext Object, referred to as the Hypertext Link, is to be displayed on the computer screen. Hypertext Links may be displayed as highlighted words in the Transcript or the Slides, as buttons or menu items on the end-user interface, or in other visual presentations that may be selected by the user.
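The Hypertext Index described above is an interval lookup: each object carries an earliest and latest time code, and the links to display at any moment are those whose time span covers the current position. The object names, display locations, and helper name below are illustrative (the "Robert Jones" link echoes the exemplary screen discussed later).

```python
# Illustrative Hypertext Index rows:
# (first_time, second_time, object_name, display_location).
hypertext_index = [
    (10.0, 45.0, "RobertJonesBio", "transcript-highlight"),
    (20.0, 30.0, "FlightQuiz", "menu-item"),
]

def active_links(index, time_code):
    """Return the names of Hypertext Objects whose [first, second]
    time span covers time_code; spans may overlap."""
    return [name for first, second, name, _ in index
            if first <= time_code <= second]
```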
`
`15
`
`It may be appreciated by one of ordinary skill in the art that other data
`
`types may be synchronized to the common time base in a manner similar to the
`
`approaches used to synchronize the video/audio stream with the Transcript, the
`
`Slides, and the Hypertext Objects. Examples of such other data types include
`
`20
`
`animations, series of computer screen images, and other specialty video streams.
`
`An Outline represents an example of what is termed here a set of
`
`"conceptual events." A conceptual event is an association one makes with a
`
`segment of a data stream, having a beginning and end (though the beginning and
`
`25
`
`end may be the points), that represents something of interest. These data segments
`
`delineating a set of conceptual events may overlap each other, and furthermore,
`
`need not cover the entire data stream. An Outline represents a set of conceptual
`
`events that does cover the entire data stream and, if arranged hierarchically, such as
`
`with sections and subsections, has sections covering subsections. In the Outline
`
`30
`
`320 of FIG. 3, one has the sections 01 :"Introduction" 321, 02:"The First Manned
`
`9
`
`-9-
`
`
`
Flight" 322, and so on, covering the entire presentation. The subsections 021:"The Wright Brothers" 323, 022:"Failed Attempts" 324, and so on, represent another coverage of the same segment as 02:"The First Manned Flight" 322. In accordance with the principles of the present invention, multiple Outlines, created manually or automatically, may be associated with the same presentation, thereby allowing different users with different purposes in viewing the presentation to use the Outline most suitable for their purposes. These Outlines have been described from the perspective of having been created beforehand, but there is no reason, under the principles of the present invention, for this to be so. It should be readily understood by one of ordinary skill in the art that a similar approach would allow a user to create a set of "bookmarks" that denote particular segments, or user-chosen "conceptual events", within presentations. The bookmarks allow the user, for example, to return quickly to interesting parts of the presentation, or to pick up at the previous stopping point.
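A set of user-chosen conceptual events such as bookmarks can be sketched as named segments of the common time base: each is a [start, end] pair (start may equal end for a point event), segments may overlap, and they need not cover the stream. The bookmark names and helper below are illustrative; the skating examples anticipate the presentation discussed next.

```python
# Illustrative bookmarks: name -> (start, end) on the common time base.
bookmarks = {
    "skater 1, triple axel": (30.0, 42.5),
    "resume here": (42.5, 42.5),      # beginning and end are the same point
    "second movement": (35.0, 90.0),  # overlaps the first segment
}

def jump_to(bookmarks, name):
    """Return the time code at which every synchronized stream should
    be positioned to play this bookmarked conceptual event."""
    return bookmarks[name][0]
```

Because every segment is expressed on the common time base, jumping to a bookmark positions all of the streams at once, exactly as selecting an Outline entry does.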
`
`15
`
`20
`
`25
`
`With reference to FIG. 4, the implementation of sets of conceptual events
`
`may be understood. There are time lines representing the various data streams, as
`for example, video 350, audio 352, slides 354 and transcript 356. There are two
`sets of conceptual events or data segments of these time lines shown, S1 360, S2
`362, S3 364, S4 366, ... and S' 1 370, S\ 372, S\ 374, S' 4 376 , S\ 378, ... , the first
`set indexed into the video 350 stream and second set indexed into the audio 352
`stream. Thus, the first set S1 360, S2 362, S3 364, etc., would respectively invoke
`time codes 380 and 381, 382 and 383, 384 and 385, etc., not only for the video
`350 data stream, but for the audio 352 , slides 354 and transcript 356 streams.
`Similarly, the second set S' 1 370, S\ 372, S' 3 384, etc., would invoke respectively
`time codes 390 (a point), 391 and 392, 393 and 394 (394 shown collinear with
`384, whether by choice or accident), etc., respectively, not only on the audio 352
`
`data stream, but on the video 350, slides 354 and transcript 356 streams. Consider
`
`the following example of a presentation of ice skating performed to music, with
`
`30
`
`voice-over commentaries and slides showing the relative standings of the ice
`
`-10-
`
`
`
skaters. A first Outline might list each skater and be broken down further into the individual moves of each skater's program. A second Outline might track the musical portion of the audio stream, following the music piece to piece, even movement to movement. Thus, one user might be interested in how a skater performed a particular move, while another user might wish to study how a particular passage of music inspired a skater to make a particular move. Note that there is no requirement that two sets of conceptual events track each other in any way; they represent two different ways of studying the same presentation. Furthermore, the examples showed sets of conceptual events indexed into isochronous data streams; it may be appreciated by someone of ordinary skill in the art that sets of conceptual events may be indexed into non-isochronous data streams as well. As was stated earlier, an Outline for a presentation may be indexed to the slide stream.

Referring now to the exemplary screen shown in FIG. 5, the exemplary screen 400 shows five windows 410, 420, 430, 440, 450 contained within the display. The Video Window 410 is used to display the video stream. The Slide Window 420 is used to display the slides used in the presentation. The Transcript Window 430 is used to display the transcribed audio of the speech. The Outline Window 440 is used to display the Outline of the presentation. The Control Panel 450 is used to control the display in each of the other four windows. The Transcript Window 430 includes a Transcript Slider Bar 432 that allows the user to scroll through the transcript, and Next 433 and Previous 434 Phrase Buttons that allow the user to step through the transcript a phrase at a time, where a phrase consists of a single line of the transcript. It also includes a Hypertext Link 436, illustrated here in the form of the highlighted words, "Robert Jones", in the transcript. The Outline Window 440 includes an Outline Slider Bar 442 that allows the user to scroll through the outline, and Next 443 and Previous 444 Entry Buttons that allow the user to jump directly to the next or previous topic. The Control Panel 450 includes a Video Slider Bar 452 used to select a position in the video
`
`
`
stream, and a Play Button 454 used to play the program. It also includes a Slider Bar 456 used to position the program at a Slide, and Previous 457 and Next 458 Slide Buttons used to display the next and previous Slides in the Slide Window 420. It also includes a Search Box 460 used to search for text strings (e.g., words) in the Transcript.

FIG. 5 shows the beginning of a presentation, corresponding to a time code of zero. The speaker's first slide is displayed in the Slide Window 420, the speaker's first words are displayed in the Transcript Window 430, and the beginning of the outline is displayed in the Outline Window 440. The user can press the play button 454 to begin playing the presentation, which will cause the video and audio data to begin streaming, the transcript and outline to scroll in synchronization with the video and audio, and the slides to advance at the appropriate times.

Alternatively, the user can jump directly to a point of interest. FIG. 6 shows the result of the user selecting the second entry in the Outline from Outline Window 440', entitled "The First Manned Flight" (recall entry 322 of Outline 320 in FIG. 3). From the Outline Index 325 in FIG. 3, the system determines that the time code 327 of "The First Manned Flight" is 20.50 s. The system looks in the Slide Index 315 (also in FIG. 3) and determines that the second slide S2 begins at time code 317 of 20.40 s, and thus the second slide should be displayed in the Slide Window 420'. The system looks at the Time-Coded Transcript 225 (shown in FIG. 2), locates the word "the" 235 that begins on or immediately after the time code of 20.50 s, and displays that word and the appropriate number of subsequent words to fill up the Transcript Window 430'. The effect of this operation is that the user is able to jump directly to a point in the presentation, and the system positions each of the synchronized data streams to that point, including the video in Video Window 410'. The user may then begin playing the presentation at this
`
`
`
point, or, upon scanning the newly displayed slide and transcript, jump directly to another point in the presentation.
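The jump operation described above can be sketched end to end: selecting an Outline entry yields one time code, and every other stream is positioned by looking that time code up in its own index (last slide starting at or before it; first transcript word at or after it). The data values and function name are illustrative, mirroring the 20.50 s example.

```python
import bisect

# Illustrative indices on the common time base.
slide_index = [(0.0, "S1"), (20.40, "S2")]
transcript = [(0.0, "Good"), (22.51, "the"), (22.80, "first")]

def position_streams(time_code):
    """For a selected time code, return the slide to display and the
    transcript word at which the Transcript Window should start."""
    s = bisect.bisect_right([t for t, _ in slide_index], time_code) - 1
    w = bisect.bisect_left([t for t, _ in transcript], time_code)
    slide = slide_index[max(s, 0)][1]
    word = transcript[w][1] if w < len(transcript) else None
    return slide, word
```

Selecting "The First Manned Flight" at 20.50 s thus yields slide S2 (started at 20.40 s) and the word "the" (spoken at 22.51 s), matching the behavior described for FIG. 6.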
`
Referring now to FIG. 7, the flowchart starting at 600 indicates the operation of an embodiment of the present invention. When the user slides the video slider bar 452 in FIG. 5, the Event Handler 601 in FIG. 7 receives a Move Video Slider Event 610. The Move Video Slider Event 610 causes the invention to calculate the video frame of the new position of the slider 452. The position of the video slider 452 is translated into the position in the video data stream in a proportional fashion. For example, if the video slider 452 is positioned half-way along its associated slider bar, and the video stream consists of 10,000 frames of video, then the 5,000th frame of video is displayed in the Video Window 410. The invention displays the new video frame 611, and computes the time code of the new video frame 612. Using this new time code, the system looks up the Slide associated with the displayed video frame, and displays 613 the new Slide in the Slide Window 420. Again using this new time code, the system looks up the Phrase associated with the displayed video frame, and displays the new Phrase 614 in the Transcript Window 430. Again using this new time code, the system looks up the Outline Entry associated with the displayed video frame, and displays the new Outline Entry 615 in the Outline Window 440. Finally, using this new time code, the system looks up the Hypertext Links associated with the displayed video frame, and displays them 616 in the appropriate place in the Transcript Window 430.
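The two computations at the head of this event path can be sketched as follows: the slider position maps proportionally to a frame number, and because frames are equally spaced in time, the frame number maps to a time code on the common time base. The function names and the 15 frames-per-second default are illustrative; the patent only requires a constant rate above the ten-to-fifteen frames per second needed for continuous motion.

```python
def frame_for_slider(slider_pos, slider_max, total_frames):
    """Translate a slider position (0..slider_max) into a frame index
    in proportional fashion."""
    return int(slider_pos / slider_max * total_frames)

def time_code_for_frame(frame, frames_per_second=15.0):
    """Compute the time code of a frame from a constant frame rate."""
    return frame / frames_per_second
```

With the time code in hand, the slide, phrase, outline entry, and hypertext lookups of steps 613 through 616 are the index searches sketched earlier.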
`
`25
`
`Referring back to FIG. 5, when the user moves the Slide Slider Bar 456 or
`
`presses the Previous 457 and Next 458 Slide Buttons, the Event Handler 601 in
`
`FIG. 7 receives a New Slide Event 620. The New Slide Event causes the system
`
`to display the selected new Slide 621 in the Slide Window 420, and to look up the
`
`time code of the new Slide in the Slide Index 622. Using the time code of the new
`
`30
`
`Slide as the new time code, the system computes the video frame associated with
`
`13
`
`-13-
`
`
`
`WO 97/41504
`
`PCT/US97 /06982
`
`the new time code and displays the indicated video frame 623 in the Video
`
`Window. Again using the new time code, the system looks up the Phrase
`
`associated with the displayed Slide, and displays the new Phrase 624 in the
`Transcript Window 430. Again using the new time code, the invention looks up the
`
`5
`
`Outline Entry associated with the displayed Slide, and displays the new Outline
`
`Entry 625 in the Outline Window 440. Finally, using the new time code, the
`
`system looks up the Hypertext Links associated with the displayed Slide, and
`
`displays them 626 in the appropriate place in the Transcript Window 430.
`
`Referring again back to FIG. 5, when the user moves the Transcript Slider
`
`Bar 432 or presses the Next 433 or Previous 434 Phrase Buttons, the Event
`
`Handler 601 in FIG. 7 receives a New Phrase Event 630. The New Phrase Event
`
`causes the system to display the selected new Phrase 631 in the Transcript Window
`
`430, and to look up the time code of the new Phrase in the Transcript Index 632.
`
`15
`
`Using the time code of the new Phrase as the new time