(12) United States Patent
Fischer et al.

(10) Patent No.: US 7,412,643 B1
(45) Date of Patent: Aug. 12, 2008

US007412643B1

(54) METHOD AND APPARATUS FOR LINKING REPRESENTATION AND REALIZATION DATA

(75) Inventors: Uwe Fischer, Schoenaich (DE); Stefan Hoffmann, Weil im Schoenbuch (DE); Werner Kriechbaum, Ammerbuch-Breitenholz (DE); Gerhard Stenzel, Herrenberg (DE)

(73) Assignee: International Business Machines Corporation, Armonk, NY (US)

(*) Notice: Subject to any disclaimer, the term of this patent is extended or adjusted under 35 U.S.C. 154(b) by 0 days.

(21) Appl. No.: 09/447,871

(22) Filed: Nov. 23, 1999

(51) Int. Cl.
    G06F 17/00 (2006.01)
    G06F 3/00 (2006.01)
    G10L 15/00 (2006.01)

(52) U.S. Cl. 715/200; 715/201; 715/234; 715/730; 704/246; 704/251

(58) Field of Classification Search: 715/500.1, 715/501.1, 513, 730, 804; 345/723, 730; 704/246, 251, 260
    See application file for complete search history.

(56) References Cited

U.S. PATENT DOCUMENTS

5,649,060 A * 7/1997 Ellozy et al. 704/278
5,737,725 A * 4/1998 Case 704/260
5,857,099 A * 1/1999 Mitchell et al. 704/235
5,929,849 A * 7/1999 Kikinis 725/113
5,963,215 A * 10/1999 Rosenzweig 345/649
6,076,059 A * 6/2000 Glickman et al. 704/260
6,098,082 A * 8/2000 Gibbon et al. 715/501.1
6,172,675 B1 * 1/2001 Ahmad et al. 715/500.1
6,243,676 B1 * 6/2001 Witteman 704/243
6,249,765 B1 * 6/2001 Adler et al. 704/500
6,260,011 B1 * 7/2001 Heckerman et al. 704/235
6,263,507 B1 * 7/2001 Ahmad et al. 725/134
6,271,892 B1 * 8/2001 Gibbon et al. 348/700
6,282,511 B1 * 8/2001 Mayer 704/270
6,336,093 B2 1/2002 Fasciano 704/278

(Continued)

FOREIGN PATENT DOCUMENTS

JP 2000-138664 5/2000

(Continued)

OTHER PUBLICATIONS

Gibbon et al., "Generating Hypermedia Documents from Transcriptions of Television Programs Using Parallel Text Alignment", AT&T Labs Research, Feb. 1998, pp. 26-33.*

(Continued)

Primary Examiner: William Bashore
Assistant Examiner: Maikhanh Nguyen
(74) Attorney, Agent, or Firm: Scully, Scott, Murphy & Presser, P.C.; Lisa M. Yamonaco

(57) ABSTRACT

A method and apparatus for creating links between a representation (e.g. text data) and a realization (e.g. corresponding audio data) is provided. According to the invention the realization is structured by combining a time-stamped version of the representation generated from the realization with structural information from the representation. Thereby so-called hyper links between representation and realization are created. These hyper links are used for performing search operations in realization data equivalent to those which are possible in representation data, enabling improved access to the realization (e.g. via audio databases).

14 Claims, 7 Drawing Sheets

[Representative drawing: the flow chart of FIG. 3. Step 301: analyze the structure of representation 101 and separate structure 105 and content 104. Step 302: analyze the realization and create a time-stamped representation 107. Step 303: create an aligned representation 109 by aligning content 104 and time-stamped representation 107. Step 304: create hyper links 111 by combining aligned representation 109 and structure 105.]

Amazon v. Audio Pod
US Patent 9,319,720
Amazon EX-1048
U.S. PATENT DOCUMENTS

6,357,042 B2 * 3/2002 Srinivasan et al. 725/32
6,404,978 B1 6/2002 Abe 386/55
6,434,520 B1 * 8/2002 Kanevsky et al. 704/243
6,462,754 B1 * 10/2002 Chakraborty et al. 345/723
6,473,778 B1 10/2002 Gibbon 715/501.1
6,603,921 B1 * 8/2003 Kanevsky et al. 386/96
6,636,238 B1 * 10/2003 Amir et al. 345/730
6,728,753 B1 * 4/2004 Parasnis et al. 709/203
6,791,571 B1 * 9/2004 Lamb 345/619
2001/0023436 A1 9/2001 Srinivasan et al. 709/219
2002/0059604 A1 * 5/2002 Papagan et al. 725/51

FOREIGN PATENT DOCUMENTS

JP 2001-111543 4/2001
JP 2001-313633 11/2001

OTHER PUBLICATIONS

Amir et al., "CueVideo: Automated video/audio indexing and browsing", IBM Almaden Research Center, ACM, 1996, p. 326.*
Favela et al., "Image-retrieval agent: integrating image content and text", CICESE Research Center, IEEE, Oct. 1999, pp. 36-39.*
S. Srinivasan et al., "What is in that video anyway?", In Search of Better Browsing, 6th IEEE Int. Conference on Multimedia Computing and Systems, Jun. 7-11, 1999, Florence, Italy, pp. 1-6.*
D. Ponceleon et al., "Key to Effective Video Retrieval: Effective Cataloging and Browsing," Proceedings of the 6th ACM International Conference on Multimedia, 1998, pp. 99-107.*

* cited by examiner
U.S. Patent    Aug. 12, 2008    Sheet 1 of 7    US 7,412,643 B1

[FIG. 1: schematic block diagram of the aligner. The drawing text in this scan is rotated and largely unrecoverable; legible labels include "PLAIN REPRESENTATION" and "TIME-STAMPED REPRESENTATION".]
[Sheet 2 of 7]

FIG. 2

<!DOCTYPE book SYSTEM "book.dtd" [
<!ENTITY % ISOlat1 SYSTEM "isolat1.ent">
%ISOlat1;
]>
<book>
<chapter>
<heading>Key Note Speech</heading>
<section>
<p><s>
It's a great honor for me to share this stage with the Lord Mayor and Chief
Executive of Hanover; Mr. Jung; and in a few minutes, Chancellor Kohl.</s><s>I've
been looking forward to this evening for a long time, because I've known for many
years how important CeBIT is to the global information technology industry.</s></p>

[FIG. 4 (plain representation 104, containing the original words 400) and FIG. 6 (structural information 105 with locators 501, 502, ...): two word columns reading "It's / a / great / honor / ... / Chancellor / Kohl. / I've / been / looking / ... / technology / industry."; in FIG. 6 each word is additionally paired with its locator.]
[Sheet 3 of 7]

FIG. 3

301: ANALYZE THE STRUCTURE OF REPRESENTATION 101 AND SEPARATE STRUCTURE 105 AND CONTENT 104
302: ANALYZE THE REALIZATION AND CREATE A TIME-STAMPED REPRESENTATION 107
303: CREATE AN ALIGNED REPRESENTATION 109 BY ALIGNING CONTENT 104 AND TIME-STAMPED REPRESENTATION 107
304: CREATE HYPER LINKS 111 BY COMBINING ALIGNED REPRESENTATION 109 AND STRUCTURE 105
[Sheet 4 of 7]

FIG. 5 (tree structure of the representation with locators; the labels 501 and 502 mark the two sentence subtrees):

<book> 1
  <chapter> 11
    <heading> 111
      Key 1111
      note 1112
      speech 1113
    </heading>
    <section> 112
      <p> 1121
        <s> 11211
          It's 112111
          a 112112
          great 112113
        </s>
        <s> 11212
          I've 112121
          been 112122
          looking 112123
        </s>
      </p>
    </section>
  </chapter>
</book>
[Sheet 5 of 7]

[FIG. 7 (time-stamped representation 107): a column of recognized words 701 "It's / a / great / honor / ... / Chancellor / Kohl. / I've / been / booking / ... / technology / industry." with start times 702 and end times 703 for each word; misrecognized words 704 such as "booking" (for "looking") are present.]

[FIG. 8 (time-stamped aligned representation 109): the corrected words 400 "It's / a / great / honor / ... / Chancellor / Kohl. / I've / been / looking / ... / technology / industry." with the time locators 702, 703 attached.]

FIG. 9 (hyper-link document 900 in HyTime format; the scan is degraded, so details of this listing are approximate):

<!DOCTYPE linkweb SYSTEM "linkweb.dtd" [
<!ENTITY sgmllink SYSTEM "lou.sgm" CDATA SGML>
]>
<linkweb>
<audio linkends="sgml54 audio54">
<urlloc id="audio54">file=d:\lou\lou-beta.mpg start=588 end=24703 unit=ms</urlloc>
<treeloc id="sgml54" locsrc=sgmllink>1 1 2 1 1</treeloc>
<audio linkends="sgml55 audio55">
<urlloc id="audio55">file=d:\lou\lou-beta.mpg start=24703 end=38839 unit=ms</urlloc>
<treeloc id="sgml55" locsrc=sgmllink>1 1 2 1 2</treeloc>
[Sheet 6 of 7: drawing content not recoverable from this scan.]
[Sheet 7 of 7: mapping-table figures. The drawing text in this scan is rotated and largely unrecoverable; legible fragments include LOCATOR/TEXT columns, audio-file entries of the form "file=lou-beta.mpg start=... end=... unit=ms", and rows pairing SGML locators with audio locators (e.g. SGML54/AUDIO54, SGML55/AUDIO55).]
US 7,412,643 B1

METHOD AND APPARATUS FOR LINKING REPRESENTATION AND REALIZATION DATA

FIELD OF THE INVENTION

The present invention is directed to the field of multimedia data handling. It is more particularly directed to linking multimedia representation and realization data.

BACKGROUND OF THE INVENTION

In recent years a new way of presenting information has been established. In this new multimedia approach, information is presented by combining several media, e.g. written text, audio and video. However, when using e.g. the audio data, finding and addressing specific structures (pages, chapters, etc. corresponding to the equivalent textual representation of the audio data) is either time consuming, complex, or impossible. A solution to overcome these problems is to link text and audio. The concept of linking text and audio is already used by some information providers. However, it is not widely used. One of the reasons for this is that building the hyper links between the audio data and the corresponding textual representation is a resource-consuming process. This means either a huge investment on the producer's side, or a limited number of links, which limits the value for the user. As a result of this limiting state of the art, user queries directed to databases containing multimedia material have to be quite general in most cases. For example, a user asks "In which document do the words 'Italian' and 'inflation' occur?" A response to this query results in the return of the complete audio document in which the requested data is enclosed.

SUMMARY OF THE INVENTION

Accordingly, it is an aspect of the present invention to provide an enhanced method and apparatus to link text and audio data. It recognizes that most acoustic multimedia data have a common property which distinguishes them from visual data. These data can be expressed in two equivalent forms: as a textual or symbolic representation, e.g. score, script or book, and as realizations, e.g. an audio stream. As used in an example of the present invention, an audio stream is either an audio recording or the audio track of a video recording or similar data.

Information typically is presented as textual representation. The representation contains both the description of the content of the realization and the description of the structure of the realization. This information is used in the present invention to provide a method and apparatus for linking the representation and the realization.

Starting from a textual or symbolic representation (e.g. a structured electronic text document) and one or multiple realizations (e.g. digital audio files like audio recordings which represent the corresponding recorded spoken words), so-called hyper links between the representation (e.g. the text) and the related realization (e.g. the audio part) are created. An embodiment is provided such that the realization is structured by combining a time-stamped (or otherwise marked) version of the representation generated from the realization with structural information from the representation. Errors within the time-stamped representation are eliminated beforehand by aligning the time-stamped version of the representation generated from the realization with the content of the original representation.

The hyper links are created by an apparatus according to the present invention. In one embodiment they are stored in a hyper document. These hyper links are used for performing search operations in audio data equivalent to those which are possible in representation data. This enables improved access to the realization (e.g. via audio databases). Furthermore, it is not only possible to search for elements of the input data (e.g. words) within the resulting hyper links or hyper document. It is also possible to navigate within the resulting data (e.g. the hyper document) and define the scope of the playback. In this context the word navigation means things like "go to next paragraph", "show the complete section that includes this paragraph", etc. In an embodiment, the scope of the playback is defined by clicking a display of a sentence, a paragraph, a chapter, etc. in a hyper-link document.

Thereby the segments of the realization (e.g. the audio stream) become immediately accessible. In accordance with the present invention, these capabilities are not created through a manual process. All or part of this information is extracted and put together automatically.

The time-alignment process of the present invention connects the realization domain with the representation domain and therefore allows certain operations, which are generally difficult to perform in the realization domain, to be shifted into the representation domain where the corresponding operation is relatively easy to perform. For example, in recorded speech, standard text-mining technologies can be used to locate sequences of interest. The structure information can be used to segment the audio signal into meaningful units like sentence, paragraph or chapter.

An aspect of the present invention enables the automatic creation of link and navigation information between text and related audio or video. This gives producers of multimedia applications a huge process improvement. On one hand, an advantage is that the software creates hyper links to the audio on a word-by-word or sentence-by-sentence basis, depending upon which is the more appropriate granularity for the application. Other embodiments use another basis that is appropriate for the problem to be solved. Therefore a major disadvantage of previous techniques, namely the limited number of links, is eliminated. On the other hand, the technique of the present invention dramatically reduces the amount of manual work necessary to synchronize a text transcript with its spoken audio representation, while the result creates a higher value for the user. It also eliminates another disadvantage of the previous techniques, namely the high costs of building such linked multimedia documents.

Another aspect of the present invention is to generate a high level of detail, such that applications can be enhanced with new functions, or even new applications may be developed. Single or multiple words within a text can be aligned with the audio. Thus single or multiple words within a speech can be played, or one word in a sentence in a language-learning application, or any sentence in a lesson, document, speech, etc. can be played.

BRIEF DESCRIPTION OF THE DRAWINGS

These and other aspects, features, and advantages of the present invention will become apparent upon further consideration of the following detailed description of the invention when read in conjunction with the drawing figures, in which:

FIG. 1 shows an example of a schematic block diagram of the aligner in accordance with the present invention;

FIG. 2 shows an example of a textual representation of a book in SGML;
FIG. 3 shows an example of a flow chart diagram describing a method of combining representation and realization in accordance with the present invention;

FIG. 4 shows an example of a plain representation as created by a structural analyzer;

FIG. 5 shows an example of a tree structure of a representation with locators;

FIG. 6 shows an example of structural information as created by the structural analyzer;

FIG. 7 shows an example of a time-stamped representation as created by the temporal analyzer;

FIG. 8 shows an example of a time-stamped aligned representation as created by the time aligner;

FIG. 9 shows an example of a hyper-link document with hyper links as created by the link generator;

FIG. 10 shows an example of an aligner for other file formats in accordance with the present invention;

FIG. 11 shows an example of an aligner with enhancer in accordance with the present invention;

FIG. 12 shows an example of a first mapping table as used in an audio database in accordance with the present invention;

FIG. 13 shows an example of a second mapping table as used in an audio database in accordance with the present invention;

FIG. 14 shows an example of a third mapping table as used in an audio database in accordance with the present invention;

FIG. 15 shows an example of a fourth mapping table as used in an audio database in accordance with the present invention.

DETAILED DESCRIPTION OF THE INVENTION

FIG. 1 shows an example embodiment of an aligner 100 according to the present invention. The aligner 100 comprises a structural analyzer 103 with input means. The structural analyzer 103 is connected via two output means to a time aligner 108 and a link generator 110. The aligner 100 further comprises a temporal analyzer 106 with input means. The temporal analyzer 106 is connected via output means to the time aligner 108. The time aligner 108, with two input means for receiving data from the structural analyzer 103 as well as from the temporal analyzer 106, is connected via output means to the link generator 110. The link generator 110, with two input means for receiving data from the structural analyzer 103 as well as from the time aligner 108, has an output means for sending data.

As shown in FIG. 1, the structuring process starts from a representation 101 and a realization 102. Usually the representation 101 and the realization 102 are each stored in a separate file, but each of the data sets may actually be distributed among several files or be merged in one complex hyper-media file. In an alternate embodiment, both the representation 101 and the realization 102 may be fed into the system as a data stream.

The representation 101 is a descriptive mark-up document, e.g. the textual representation of a book, or the score of a symphony. An example of a textual representation of a book marked up in Standard Generalized Markup Language (SGML) as defined in ISO 8879 is shown in FIG. 2. The SGML document comprises parts defining the structural elements of the book (characterized by the tag signs < ... >) and the plain content of the book. Instead of SGML, other markup languages, e.g. Extensible Markup Language (XML) or LaTeX, may be similarly used.

An example of a realization 102 is an audio stream in an arbitrary standard format, e.g. WAVE or MPEG. It may be for example a RIFF-WAVE file with the following characteristics: 22050/11025 Hz, 16 bit mono. In the example the realization 102 can be a narrated book in the form of a digital audio book.

An example of a procedure for combining representation 101 and realization 102 according to the present invention is illustrated in FIG. 3. In a first processing step 301, the representation 101 is fed into the structural analyzer 103. The structural analyzer 103 analyzes the representation 101 and separates the original plain representation 104 and structural information 105. The plain representation 104 includes the plain content of the representation 101, that is, the representation 101 stripped of all the mark-up. As an example, the plain representation 104 (comprising the original words 400) of the representation 101 is shown in FIG. 4.

An example of structural information 105 appropriate for audio books is a text with locators. Therefore in the above embodiment the structural analyzer 103 builds a tree structure of the SGML-tagged text 101 of the audio book and creates locators which determine the coordinates of the elements (e.g. words) within the structure of the representation 101. Those skilled in the art will not fail to appreciate that the imposed structure is not restricted to a hierarchical tree like a table of contents; other structures, e.g. lattice or index, may be used as well.

The process of document analysis and creation of structural information 105 as carried out in step 301 is now described. In FIG. 5 a tree structure with corresponding locators 501, 502, ..., as built during this process, is illustrated for the SGML-formatted example depicted in FIG. 2.

Once the representation 101 is obtained and the SGML file is fed into the structural analyzer 103, the structural analyzer 103 searches for start elements (with the SGML tag structure < ... >) and stop elements (with the SGML tag structure </ ... >) of the representation 101. If the event is a start element, a new locator is created. In the present embodiment, for the event <book> the locator "1", for the event <chapter> the locator "11", etc. is created. If the event is a data element, like <heading> or <s> (sentence), the content (words) together with the current locators are used to build the structural information 105, and the plain text is used to build the plain representation 104. In case the event is an end element, the structural analyzer 103 leaves the current locator and the procedure continues to examine the further events. If no further event exists, the procedure ends.
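The event-driven procedure of step 301 can be sketched in code. The following is a minimal illustration only, not the patented implementation: it uses Python's html.parser as a stand-in for a real SGML parser, and the class and attribute names are invented for this example. It reproduces the locator scheme of FIG. 5, in which each element or word is addressed by the concatenated child ordinals of its path from the root.

```python
from html.parser import HTMLParser  # stand-in for a real SGML parser


class StructuralAnalyzer(HTMLParser):
    """Split a marked-up representation into plain content (104) and
    structural information (105): each word is paired with a locator
    giving its coordinates in the document tree, as in FIG. 5."""

    def __init__(self):
        super().__init__()
        self.locator = []       # path of child ordinals, e.g. [1, 1, 2, 1, 1]
        self.child_count = [0]  # children seen so far at each open level
        self.plain = []         # plain representation 104 (word list)
        self.structure = []     # structural information 105: (word, locator)

    def _next_child(self):
        self.child_count[-1] += 1
        return self.child_count[-1]

    def handle_starttag(self, tag, attrs):
        # Start element: descend into a new locator level.
        self.locator.append(self._next_child())
        self.child_count.append(0)

    def handle_endtag(self, tag):
        # End element: leave the current locator.
        self.child_count.pop()
        self.locator.pop()

    def handle_data(self, data):
        # Data element: its words plus the current locator build 104 and 105.
        for word in data.split():
            loc = self.locator + [self._next_child()]
            self.plain.append(word)
            self.structure.append((word, "".join(map(str, loc))))
```

Feeding it the markup of FIG. 2 yields the plain representation 104 as a word list and the structural information 105 as (word, locator) pairs, e.g. ("It's", "112111"), matching the tree of FIG. 5.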
An example embodiment of structural information 105 output by the structural analyzer 103 is shown in FIG. 6. The structural information 105 contains the elements of the representation 101 (corresponding to the plain representation 104), e.g. the words, in the first column, and the corresponding locators 501, 502, ... in the second column.

In step 302 of FIG. 3, which may be carried out before, after or at the same time as step 301, the realization 102, e.g. the audio stream, is fed into the temporal analyzer 106. The temporal analyzer 106 generates a time-stamped (or otherwise marked) representation 107 from the realization 102. It is advantageous to generate a time-stamped representation 107 of the complete realization 102. However, some embodiments create marked or time-stamped representations 107 of only parts of the realization 102.

The time-stamped representation 107 includes the transcript and time-stamps of all elementary representational units, such as words or word clusters. In the above example a speech recognition engine is used as temporal analyzer 106 to generate a raw time-tagged transcript 107 of the audio file 102. Many commercially available speech recognition engines might be used, for example IBM's ViaVoice. However, in addition to the recognition of words, the temporal/marker analyzer 106 should be able to allocate time stamps and/or marks for each word. An example of such a time-stamped representation 107 is the transcript shown in FIG. 7. The start times 702 and the end times 703 in milliseconds are assigned to each word 701 of the resulting representation. The start and end times 702, 703 are time locators that specify an interval in the audio stream data using the coordinate system appropriate for the audio format, e.g. milliseconds for WAVE files. The time-stamped representation 107 as shown in FIG. 7 may include words 704 which have not been recognized correctly, e.g. "Hohl" instead of "Kohl" or "booking" instead of "looking".
In FIG. 3, step 303, the plain representation 104 derived from step 301 and the time-stamped representation 107 derived from step 302 are fed to the time aligner 108. The time aligner 108 aligns the plain representation 104 and the time-stamped representation 107. Thereby, for the aligned elements, the time locator (start time 702, end time 703) from the time-stamped representation 107 is attached to the content elements (e.g. words) from the plain representation 104, leading to the time-stamped aligned representation 109. The time aligner 108 creates an optimal alignment of the words 701 from the time-stamped representation 107 and the words contained in the plain representation 104. This can be done by a variety of dynamic programming techniques. Such an alignment automatically corrects isolated errors 704 made by the temporal analyzer 106 by aligning the misrecognized words 704 with their correct counterparts, e.g. "Hohl" with "Kohl", "booking" with "looking". Missing parts of representation 101 and/or missing realization 102 result in segments of the plain representation 104 and/or the time-stamped representation 107 remaining unaligned. An example of an aligned representation 109 combining the correct words 400 and the time locators 702, 703 is shown in FIG. 8.
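The word-level alignment of step 303 can be sketched with a standard sequence matcher. This is an illustrative sketch, not the patented implementation: it uses Python's difflib (a longest-matching-block algorithm rather than a full edit-distance alignment), and the function and variable names are invented for this example.

```python
from difflib import SequenceMatcher


def time_align(plain_words, recognized):
    """Align the plain representation 104 (correct words) with a raw
    time-stamped transcript 107, given as [(word, start_ms, end_ms), ...],
    and attach the time locators to the correct words."""
    rec_words = [w for w, _, _ in recognized]
    matcher = SequenceMatcher(a=plain_words, b=rec_words, autojunk=False)
    aligned = []
    for tag, a1, a2, b1, b2 in matcher.get_opcodes():
        if tag == "equal" or (tag == "replace" and a2 - a1 == b2 - b1):
            # For matches, and for substitutions that pair one-to-one
            # (misrecognitions like "Hohl" for "Kohl"), keep the correct
            # word but take the recognizer's time interval.
            for off in range(a2 - a1):
                _, start, end = recognized[b1 + off]
                aligned.append((plain_words[a1 + off], start, end))
        elif tag in ("delete", "replace"):
            # Text words missing from the transcript (or garbled beyond
            # one-to-one pairing) stay unaligned.
            for i in range(a1, a2):
                aligned.append((plain_words[i], None, None))
        # tag == "insert": recognizer output without a textual
        # counterpart is dropped.
    return aligned
```

Misrecognized words that pair one-to-one with text words inherit the recognizer's time interval, which is the correction behavior described above; text words with no spoken counterpart keep an empty interval, i.e. they remain unaligned.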
In step 304 of FIG. 3, the structural information 105 and the time-stamped aligned representation 109, e.g. in the form of data streams, are fed into a link generator 110. The link generator 110 then combines the locators 501, 502, ... of each element from the structural information 105 with the respective time locators 702, 703 from the time-stamped aligned representation 109, thereby creating connections between equivalent elements of representation 101 and realization 102, so-called time-alignment hyper links 111. In an embodiment these hyper links 111 are stored in a hyper-link document. In an alternative embodiment these hyper links are transferred to a database.

It is advantageous that the hyper-link document be a HyTime document conforming to the ISO/IEC 10744:1992 standard, or a type of document using another convention to express hyper links, e.g. DAISY, XLink, SMIL, etc.

Whereas in the above example the locators of each word are combined, it is also possible to combine the locators of sentences or paragraphs or pages with the corresponding time locators. An example of a hyper-link document 900 in HyTime format is shown in FIG. 9. Therein, for each sentence, the locators 501, 502, ... for the representation 101 and the time locators 702, 703, ... for the realization 102 are combined in hyper links. An alternate embodiment creates hyper links 111 wherein the locators for each word or for each other element (paragraph, page, etc.) are combined.
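Step 304 reduces to zipping the two streams and merging their locators, as sketched below. This is only an illustration with invented names: the real link generator emits HyTime markup as in FIG. 9, whereas here a link is a plain dictionary, and grouping sentences by a fixed locator-prefix length is a simplification that holds only for the example tree of FIG. 5.

```python
def generate_links(structural_info, aligned):
    """Combine tree locators with time locators into hyper links 111.

    structural_info: [(word, locator)] from the structural analyzer,
                     e.g. ("It's", "112111")
    aligned:         [(word, start_ms, end_ms)] from the time aligner
    Both lists enumerate the same words in document order."""
    links = []
    for (word, locator), (word2, start, end) in zip(structural_info, aligned):
        assert word == word2, "both streams must enumerate the same content"
        if start is not None:  # skip words that stayed unaligned
            links.append({"treeloc": locator, "start": start, "end": end})
    return links


def sentence_links(word_links, sentence_digits=5):
    """Merge word links whose tree locators share a sentence prefix
    (here: the first five digits, e.g. "11211") into one link per
    sentence, spanning from the first word's start to the last word's end."""
    merged = {}
    for link in word_links:
        key = link["treeloc"][:sentence_digits]
        if key not in merged:
            merged[key] = {"treeloc": key,
                           "start": link["start"], "end": link["end"]}
        else:
            merged[key]["end"] = link["end"]
    return list(merged.values())
```

The same merging step, applied with a shorter prefix, yields paragraph- or chapter-level links, which corresponds to the coarser granularities mentioned above.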
It will be understood and appreciated by those skilled in the art that the inventive concepts described by the present application may be embodied in a variety of system contexts. Some of the typical application domains are described in the following.

Sometimes either the representation or the realization (or both) is not available in a native or operating data format directly processable by the aligner 100. In this case the available data has to be converted from its native format into the data format which can be used by the aligner 100 directly.

Thus, in some cases, the native format of the original representation is not the same as the native format of the realization. The representation is given in a native data format (A). The realization is given in a native data format (B). These data formats are different. In an embodiment, the representation (A) is converted into an operating data format (A') and the realization (B) is converted into an operating data format (B').

FIG. 10 illustrates an example of an aligner 1000 for other file formats in accordance with the present invention. Using the aligner 1000 it becomes possible to create hyper links or hyper-link documents defined in the native format of the representation and/or realization. For the representation, a large variety of such native representational formats exist. These range from proprietary text formats, e.g. Microsoft Word or Lotus WordPro, to text structuring languages, e.g. Troff or TeX.

This aligner 1000 includes the aligner 100 shown in FIG. 1. Additionally, a first converter 1001 and/or a second converter 1002 and a link transformer 1003 are elements of the aligner 1000. These elements are connected to each other as shown in FIG. 10.

In an embodiment the following procedure is applied. First the native representation 1004 is converted by the first converter 1001 into a representation 101 in an operating or standard format, e.g. SGML. Additionally the first converter 1001 produces the information necessary to re-convert the resulting hyper links 111 into the native format. Such information can be e.g. a representation mapping table 1006 (a markup mapping table).

The native realization 1005 is converted by a second converter 1002 into a realization 102 in the operating or standard format, e.g. WAVE. In addition a realization mapping table 1007 (a time mapping table) is created by the second converter 1002.

In the described example it is assumed that both the representation and the realization have to be converted before being processed by the aligner 100. A situation is however possible in which only the representation 101 or only the realization 102 has to be converted. Accordingly the procedure has to be adapted to the particular situation.

Both converters 1001, 1002 are programmed according to the source and destination formats. The detailed implementation of the converters 1001, 1002 and the way of creating the mapping tables 1006, 1007 are accomplished in ways known to those skilled in the art. Next, both the representation and the realization, each in the operating/standard format, are fed into the aligner 100. The aligner 100 creates the hyper links 111 as described above. Then the hyper links 111 or the corresponding hyper-link document 900 and the mapping tables 1006, 1007 are used by the link transformer 1003 to create native hyper links 1008 in the format of the original representation 1004. For this purpose the link transformer 1003 uses the mapping tables 1006 and/or 1007 to replace the locators in the hyper links 111 with locators using the appropriate coordinate systems for the native representation 1004 and/or native realization 1005, as specified by the mapping tables 1006, 1007. For example, if the native representation 1004 was written in HTML format, it would first be converted into SGML format by the first converter 1001. The hyper links 111 created by the aligner 100 would then be retransformed into HTML by the link transformer 1003 using the mapping table 1006.
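The link transformer 1003's use of the representation mapping table 1006 amounts to a locator rewrite, sketched below. This is a deliberately simplified illustration: the mapping table is modelled as a plain dict from operating-format tree locators to hypothetical native-format anchors (e.g. HTML fragment identifiers), and all names are invented for this example.

```python
def transform_links(links, markup_map):
    """Rewrite operating-format tree locators into native-format locators.

    links:      [{"treeloc": ..., "start": ..., "end": ...}] hyper links 111
    markup_map: representation mapping table 1006, modelled as a dict from
                operating locator (e.g. "11211") to a native locator
                (e.g. an HTML anchor); locators absent from the table are
                passed through unchanged."""
    return [{**link,
             "treeloc": markup_map.get(link["treeloc"], link["treeloc"])}
            for link in links]
```

A second table of the same shape, mapping operating time coordinates to native ones, would play the role of the realization mapping table 1007.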
`
`-12-
`
`

`

`US 7,412,643 B1
`
`5
`
`10
`
`15
`
`7
`Sometimes either the representation 101 and/or the real
`ization 102 is enhanced by using information from the other.
`Examples include automatic Subtitling, time-stamping the
`dialogues in a script, etc.
`FIG. 11 illustrates an example of an aligner 1100 with an
`enhancer corresponding to the present invention. The
`enhancer 1101 is employed to create enhanced versions of
`representation 101 and/or realization 102. The enhancer 1101
`uses the hyper links 111 or the hyper-link document 900 from
`the aligner 100 and the original representation 101 and/or the
`original realization 102 to create an enhanced representation
`and/or an enhanced realization 1102 or both. Thereby the
`enhancer 1101 includes the hyper links 111 into the original
`representation 101 or realization 102. A typical example for
`an enhanced representation 1102 is the addition of audio clips
`to an HTML file. Other examples are the addition of a syn
`chronized representation to MIDI or RIFF files. It is noted
`that the aligner 1100 with enhancer can of course be com
`bined with the principle of the aligner 1000 for other file
`formats as described above.
`Telecast applications (TV, digital audio broadcasting etc.)
`use an interleaved system stream that carries the representa
`tion 101, the realization 102 and a synchronization informa
`tion. The synchronized information is created in the form of a
`system stream by an aligner with a multiplexer in accordance
`with the present invention. Again FIG. 11 may be used to
`illustrate the system. This aligner with multiplexer may be
`implemented to use the aligner 100 as described above. A
`multiplexer (corresponding to the enhancer 1101 in FIG. 11)
`is employed to generates an interleaved system stream (cor
`responding to the enhanced representation 1102 in FIG. 11).
`In this way, the multiplexer combines the original repre
