(19) United States
(12) Patent Application Publication (10) Pub. No.: US 2003/0107592 A1
(43) Pub. Date: Jun. 12, 2003
Li et al.

(54) SYSTEM AND METHOD FOR RETRIEVING INFORMATION RELATED TO PERSONS IN VIDEO PROGRAMS

(75) Inventors: Dongge Li, Ossining, NY (US); Nevenka Dimitrova, Yorktown Heights, NY (US); Lalitha Agnihotri, Fishkill, NY (US)

Correspondence Address:
PHILIPS ELECTRONICS NORTH AMERICAN CORP
580 WHITE PLAINS RD
TARRYTOWN, NY 10591 (US)

(73) Assignee: KONINKLIJKE PHILIPS ELECTRONICS N.V.

(21) Appl. No.: 10/014,234

(22) Filed: Dec. 11, 2001

Publication Classification

(51) Int. Cl.7 ....................................................... G09G 5/00
(52) U.S. Cl. .............................................................. 345/745
(57) ABSTRACT

An information tracking device receives content data, such as a video or television signal, from one or more information sources and analyzes the content data according to query criteria to extract relevant stories. The query criteria utilize a variety of information, such as but not limited to a user request, a user profile, and a knowledge base of known relationships. Using the query criteria, the information tracking device calculates a probability of a person or event occurring in the content data and spots and extracts stories accordingly. The results are indexed, ordered, and then displayed on a display device.
[FIG. 1, front-page representative drawing and drawing sheet 1 of 6: schematic overview of information retrieval system 10; a remote user site with a set-top processor and display device is connected over network 200 to information sources 50]
[FIG. 2, drawing sheet 2 of 6: alternate embodiment of the information retrieval system]
[FIG. 3, drawing sheet 3 of 6: information retrieval flow; labeled blocks include "Video in" and "Story management: temporal, causal (link, organize, score, rating)"]
[FIG. 4, drawing sheet 4 of 6: person spotting and recognition flow; labeled input "Video in"]
[FIG. 5, drawing sheet 5 of 6: story extraction flow; labeled blocks include "Video/Text/Audio in", "Visual segmentation: cuts, face shots (constant faces)", "Audio segmentation & classification", "Transcript topic by time slicing", "Information fusion", "Internal story segmentation & annotation", "Person finding", and "Inferencing & name resolution"]
[FIG. 6, drawing sheet 6 of 6: story indexing flow; labeled blocks include "Story", "Index by name, topic, keywords", "Causality relationship extraction", "Temporal relationship extraction", and "Profile & relevance feedback"]
SYSTEM AND METHOD FOR RETRIEVING INFORMATION RELATED TO PERSONS IN VIDEO PROGRAMS

FIELD OF THE INVENTION

[0001] The present invention relates to a person tracker and method of retrieving information related to a targeted person from multiple information sources.
BACKGROUND OF INVENTION

[0002] With some 500+ channels of available television content and endless streams of content accessible via the Internet, it might seem that one would always have access to desirable content. However, to the contrary, viewers are often unable to find the type of content they are seeking. This can lead to a frustrating experience.

[0003] When a user watches television, there often occur times when the user would be interested in learning further information about persons in the program the user is watching. Present systems, however, fail to provide a mechanism for retrieving information related to a targeted subject, such as an actor or actress, or an athlete. For example, EP 1031964 is directed to an automated search device: a user with access to 200 television stations speaks his desire for watching, for example, Robert Redford movies or game shows. Voice recognition systems cause a search of available content and present the user with selections based on the request. Thus, the system is an advanced channel-selecting system and does not go outside the presented channels to obtain additional information for the user. Further, U.S. Pat. No. 5,596,705 presents the user with a multi-level presentation of, for example, a movie. The viewer can watch the movie or, with the system, formulate queries to obtain additional information regarding the movie. However, it appears that the search is of a closed system of movie-related content. In contrast, the disclosed invention goes outside of the available television programs and outside of a single source of content. Several examples are given. A user is watching a live cricket match and can retrieve detailed statistics on the player at bat. A user watching a movie wants to know more about the actor on the screen, and additional information is located from various Web sources, not a parallel signal transmitted with the movie. A user sees an actress on the screen who looks familiar but can't remember her name. The system identifies all the programs the user has watched that the actress has been in. Thus, the proposal represents a broader, open-ended search system for accessing a much larger universe of content than either of the two cited references.

[0004] On the Internet, a user looking for content can type a search request into a search engine. However, these search engines are often hit-or-miss and can be very inefficient to use. Furthermore, current search engines are unable to continuously access relevant content to update results over time. There are also specialized web sites and news groups (e.g., sports sites, movie sites, etc.) for users to access. However, these sites require users to log in and inquire about a particular topic each time the user desires information.

[0005] Moreover, there is no system available that integrates information retrieving capability across various media types, such as television and the Internet, and can extract people or stories about such persons from multiple channels and sites. In one system, disclosed in EP 915621, URLs are embedded in a closed-caption portion of a transmission so that the URLs can be extracted to retrieve the corresponding web pages in synchronization with the television signal. However, such systems fail to allow for user interaction.

[0006] Thus, there is a need for a system and method for permitting a user to create a targeted request for information, which request is processed by a computing device having access to multiple information sources to retrieve information related to the subject of the request.
SUMMARY OF THE INVENTION

[0007] The present invention overcomes the shortcomings of the prior art. Generally, a person tracker comprises a content analyzer comprising a memory for storing content data received from an information source and a processor for executing a set of machine-readable instructions for analyzing the content data according to query criteria. The person tracker further comprises an input device communicatively connected to the content analyzer for permitting a user to interact with the content analyzer and a display device communicatively connected to the content analyzer for displaying a result of the analysis of the content data performed by the content analyzer. According to the set of machine-readable instructions, the processor of the content analyzer analyzes the content data to extract and index one or more stories related to the query criteria.
[0008] More specifically, in an exemplary embodiment, the processor of the content analyzer uses the query criteria to spot a subject in the content data and retrieve information about the spotted person for the user. The content analyzer also further comprises a knowledge base which includes a plurality of known relationships, including a map of known faces and voices to names and other related information. The celebrity-finder system is implemented based on the fusion of cues from audio, video, and available videotext or closed-caption information. From the audio data, the system can recognize speakers based on the voice. From the visual cues, the system can track the face trajectories and recognize faces for each of the face trajectories. Whenever available, the system can extract names from videotext and closed-caption data. A decision-level fusion strategy can then be used to integrate the different cues to reach a result. When the user sends a request related to the identity of the person shown on the screen, the person tracker can recognize that person according to the embedded knowledge, which may be stored in the tracker or loaded from a server. Appropriate responses can then be created according to the identification results. If additional or background information is desired, a request may also be sent to the server, which then searches through a candidate list or various external sources, such as the Internet (e.g., a celebrity web site), for a potential answer or clues that will enable the content analyzer to determine an answer.
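By way of illustration only, the sketch below shows one way such a decision-level fusion strategy could combine per-modality identity scores into a single confidence per candidate name. The modality names, weights, and weighted-sum rule are assumptions made for the example; the disclosure does not prescribe a particular fusion formula.

```python
# Illustrative decision-level fusion: each modality (face, voice/audio,
# videotext/closed-caption names) independently scores candidate
# identities, and the scores are combined with fixed weights that are
# renormalized over whichever cues are actually available.
from typing import Dict

MODALITY_WEIGHTS = {"audio": 0.35, "face": 0.45, "text": 0.20}  # assumed values

def fuse_identity_scores(
        scores_by_modality: Dict[str, Dict[str, float]]) -> Dict[str, float]:
    """Combine per-modality name->score maps into one confidence per name."""
    fused: Dict[str, float] = {}
    total = sum(MODALITY_WEIGHTS[m] for m in scores_by_modality)
    for modality, scores in scores_by_modality.items():
        weight = MODALITY_WEIGHTS[modality] / total
        for name, score in scores.items():
            fused[name] = fused.get(name, 0.0) + weight * score
    return fused

# Example: face recognition favors one actor; the voice model is less sure.
fused = fuse_identity_scores({
    "face":  {"Actor A": 0.9, "Actor B": 0.2},
    "audio": {"Actor A": 0.6, "Actor B": 0.5},
})
best_name = max(fused, key=fused.get)  # "Actor A"
```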
[0009] In general, the processor, according to the machine-readable instructions, performs several steps to make the most relevant matches to a user's request or interests, including but not limited to person spotting, story extraction, inferencing and name resolution, indexing, results presentation, and user profile management. More specifically, according to an exemplary embodiment, a person-spotting function of the machine-readable instructions extracts faces, speech, and text from the content data, makes a first match of known faces to the extracted faces, makes a second match of known voices to the extracted voices, scans the extracted text to make a third match to known names, and calculates a probability of a particular person being present in the content data based on the first, second, and third matches. In addition, a story extraction function preferably segments audio, video, and transcript information of the content data, performs information fusion, internal story segmentation/annotation, and inferencing and name resolution to extract relevant stories.
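The disclosure does not fix a formula for combining the three matches into a presence probability; the sketch below assumes a noisy-OR combination of independent cues, purely as an illustration.

```python
# Assumed noisy-OR combination of the face, voice, and name matches; the
# patent states that a probability is computed from the three matches but
# does not specify how.

def person_presence_probability(face_match: float,
                                voice_match: float,
                                name_match: float) -> float:
    """Each argument is a match confidence in [0, 1] for the same person.

    Noisy-OR: the person is missed only if every cue misses, so
    P(present) = 1 - (1 - face)(1 - voice)(1 - name).
    """
    p_miss = (1.0 - face_match) * (1.0 - voice_match) * (1.0 - name_match)
    return 1.0 - p_miss

# A strong face match with weak supporting cues still scores highly.
p = person_presence_probability(face_match=0.85, voice_match=0.4, name_match=0.1)
```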
[0010] The above and other features and advantages of the present invention will become readily apparent from the following detailed description thereof, which is to be read in connection with the accompanying drawings.
BRIEF DESCRIPTION OF THE DRAWINGS

[0011] In the drawing figures, which are merely illustrative, and wherein like reference numerals depict like elements throughout the several views:

[0012] FIG. 1 is a schematic diagram of an overview of an exemplary embodiment of an information retrieval system in accordance with the present invention;

[0013] FIG. 2 is a schematic diagram of an alternate embodiment of an information retrieval system in accordance with the present invention;

[0014] FIG. 3 is a flow diagram of a method of information retrieval in accordance with the present invention;

[0015] FIG. 4 is a flow diagram of a method of person spotting and recognition in accordance with the present invention;

[0016] FIG. 5 is a flow diagram of a method of story extraction; and

[0017] FIG. 6 is a flow diagram of a method of indexing the extracted stories.
DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS

[0018] The present invention is directed to an interactive system and method for retrieving information from multiple media sources according to a request of a user of the system.

[0019] In particular, an information retrieval and tracking system is communicatively connected to multiple information sources. Preferably, the information retrieval and tracking system receives media content from the information sources as a constant stream of data. In response to a request from a user (or triggered by a user's profile), the system analyzes the content data and retrieves that data most closely related to the request. The retrieved data is either displayed or stored for later display on a display device.
[0020] System Architecture
[0021] With reference to FIG. 1, there is shown a schematic overview of a first embodiment of an information retrieval system 10 in accordance with the present invention. A centralized content analysis system 20 is interconnected to a plurality of information sources 50. By way of non-limiting example, information sources 50 may include cable or satellite television and the Internet. The content analysis system 20 is also communicatively connected to a plurality of remote user sites 100, described further below.
[0022] In the first embodiment, shown in FIG. 1, centralized content analysis system 20 comprises a content analyzer 25 and one or more data storage devices 30. The content analyzer 25 and the storage devices 30 are preferably interconnected via a local or wide area network. The content analyzer 25 comprises a processor 27 and a memory 29, which are capable of receiving and analyzing information received from the information sources 50. The processor 27 may be a microprocessor and associated operating memory (RAM and ROM), and may include a second processor for pre-processing the video, audio, and text components of the data input. The processor 27, which may be, for example, an Intel Pentium chip or other more powerful multiprocessor, is preferably powerful enough to perform content analysis on a frame-by-frame basis, as described below. The functionality of content analyzer 25 is described in further detail below in connection with FIGS. 3-5.
[0023] The storage devices 30 may be a disk array or may comprise a hierarchical storage system with tera-, peta-, or exabytes of capacity, including optical storage devices, each preferably having hundreds or thousands of gigabytes of storage capability for storing media content. One skilled in the art will recognize that any number of different storage devices 30 may be used to support the data storage needs of the centralized content analysis system 20 of an information retrieval system 10 that accesses several information sources 50 and can support multiple users at any given time.
[0024] As described above, the centralized content analysis system 20 is preferably communicatively connected to a plurality of remote user sites 100 (e.g., a user's home or office) via a network 200. Network 200 is any global communications network, including but not limited to the Internet, a wireless/satellite network, a cable network, and the like. Preferably, network 200 is capable of transmitting data to the remote user sites 100 at relatively high data transfer rates to support media-rich content retrieval, such as live or recorded television.
[0025] As shown in FIG. 1, each remote site 100 includes a set-top box 110 or other information receiving device. A set-top box is preferable because most set-top boxes, such as TiVo®, WebTV®, or UltimateTV®, are capable of receiving several different types of content. For instance, the UltimateTV® set-top box from Microsoft® can receive content data from both digital cable services and the Internet. Alternatively, a satellite television receiver could be connected to a computing device, such as a home personal computer 140, which can receive and process web content, via a home local area network. In either case, all of the information receiving devices are preferably connected to a display device 115, such as a television or CRT/LCD display.
[0026] Users at the remote user sites 100 generally access and communicate with the set-top box 110 or other information receiving device using various input devices 120, such as a keyboard, a multi-function remote control, a voice-activated device or microphone, or a personal digital assistant. Using such input devices 120, users can input specific requests to the person tracker, which uses the requests to search for information related to a particular person, as described further below.
[0027] In an alternate embodiment, shown in FIG. 2, a content analyzer 25 is located at each remote site 100 and is communicatively connected to the information sources 50. In this alternate embodiment, the content analyzer 25 may be integrated with a high-capacity storage device, or a centralized storage device (not shown) can be utilized. In either instance, the need for a centralized analysis system 20 is eliminated in this embodiment. The content analyzer 25 may also be integrated into any other type of computing device 140 that is capable of receiving and analyzing information from the information sources 50, such as, by way of non-limiting example, a personal computer, a handheld computing device, a gaming console having increased processing and communications capabilities, a cable set-top box, and the like. A secondary processor, such as the TriMedia™ Tricodec card, may be used in said computing device 140 to pre-process video signals. However, in FIG. 2, to avoid confusion, the content analyzer 25, the storage device 130, and the set-top box 110 are each depicted separately.
[0028] Functioning of Content Analyzer
[0029] As will become evident from the following discussion, the functionality of the information retrieval system 10 has equal applicability to both television/video-based content and web-based content. The content analyzer 25 is preferably programmed with a firmware and software package to deliver the functionalities described herein. Upon connecting the content analyzer 25 to the appropriate devices, i.e., a television, home computer, cable network, etc., the user would preferably input a personal profile using input device 120 that will be stored in a memory 29 of the content analyzer 25. The personal profile may include information such as, for example, the user's personal interests (e.g., sports, news, history, gossip, etc.), persons of interest (e.g., celebrities, politicians, etc.), or places of interest (e.g., foreign cities, famous sites, etc.), to name a few. Also, as described below, the content analyzer 25 preferably stores a knowledge base from which to draw known data relationships, such as "G. W. Bush is the President of the United States."
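A minimal sketch of how the personal profile and the knowledge base of known relationships could be represented follows; the field names and the (subject, relation, object) triple layout are assumptions for illustration, not structures defined in this disclosure.

```python
# Hypothetical shapes for the personal profile and knowledge base; all
# field names and the triple layout are assumed for the example.
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class UserProfile:
    interests: List[str] = field(default_factory=list)            # e.g. "sports"
    persons_of_interest: List[str] = field(default_factory=list)
    places_of_interest: List[str] = field(default_factory=list)

@dataclass
class KnowledgeBase:
    relationships: List[Tuple[str, str, str]] = field(default_factory=list)

    def facts_about(self, subject: str) -> List[Tuple[str, str, str]]:
        return [t for t in self.relationships if t[0] == subject]

profile = UserProfile(interests=["news"], persons_of_interest=["G. W. Bush"])
kb = KnowledgeBase(relationships=[
    ("G. W. Bush", "is", "President of the United States"),
])
facts = kb.facts_about("G. W. Bush")
```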
[0030] With reference to FIG. 3, the functionality of the content analyzer will be described in connection with the analysis of a video signal. In step 302, the content analyzer 25 performs a video content analysis using audiovisual and transcript processing to perform person spotting and recognition using, for example, a list of celebrity or politician names, voices, or images in the user profile and/or knowledge base and external data source, as described below in connection with FIG. 4. In a real-time application, the incoming content stream (e.g., live cable television) is buffered either in the storage device 30 at the central site 20 or in the local storage device 130 at the remote site 100 during the content analysis phase. In other, non-real-time applications, upon receipt of a request or other prescheduled event (described below), the content analyzer 25 accesses the storage device 30 or 130, as applicable, and performs the content analysis.
[0031] The content analyzer 25 of person tracking system 10 receives a viewer's request for information related to a certain celebrity shown in a program and uses the request to return a response, which can help the viewer better search or manage TV programs of interest. Here are four examples:
[0032] 1. User is watching a cricket match. A new player comes to bat. The user asks the system 10 for detailed statistics on this player based on this match and previous matches this year.

[0033] 2. User sees an interesting actor on the screen and wants to know more about him. The system 10 locates some profile information about the actor from the Internet or retrieves news about the actor from recently issued stories.

[0034] 3. User sees an actress on the screen who looks familiar, but the user cannot remember the actress's name. System 10 responds with all the programs that this actress has been in, along with her name.

[0035] 4. A user who is very interested in the latest news involving a celebrity sets her personal video recorder to record all the news about the celebrity. The system 10 scans the news channels, and celebrity and talk shows, for example, for the celebrity and records all matching programs.
[0036] Because most cable and satellite television signals carry hundreds of channels, it is preferable to target only those channels that are most likely to produce relevant stories. For this purpose, the content analyzer 25 may be programmed with a knowledge base 450 or field database to aid the processor 27 in determining a "field type" for the user's request. For example, the name Dan Marino in the field database might be mapped to the field "sports". Similarly, the term "terrorism" might be mapped to the field "news". In either instance, upon determination of a field type, the content analyzer would then only scan those channels relevant to the field (e.g., news channels for the field "news"). While these categorizations are not required for operation of the content analysis process, using the user's request to determine a field type is more efficient and would lead to quicker story extraction. In addition, it should be noted that the mapping of particular terms to fields is a matter of design choice and could be implemented in any number of ways.
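One of the many possible implementations is sketched below. The term-to-field entries follow the two examples given above; the field-to-channel lists are invented for the example.

```python
# Term-to-field and field-to-channel lookups; only "Dan Marino" -> "sports"
# and "terrorism" -> "news" come from the text, the rest is illustrative.

FIELD_DATABASE = {
    "dan marino": "sports",
    "terrorism": "news",
}

CHANNELS_BY_FIELD = {
    "sports": ["ESPN"],            # assumed channel lists
    "news":   ["CNN", "BBC News"],
}

def channels_for_request(request_terms):
    """Return the channels worth scanning for the given request terms,
    falling back to every known channel if no term maps to a field."""
    fields = {FIELD_DATABASE[t.lower()]
              for t in request_terms if t.lower() in FIELD_DATABASE}
    if not fields:
        return [c for chans in CHANNELS_BY_FIELD.values() for c in chans]
    return [c for f in fields for c in CHANNELS_BY_FIELD[f]]

assert channels_for_request(["Dan Marino"]) == ["ESPN"]
```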
[0037] Next, in step 304, the video signal is further analyzed to extract stories from the incoming video. Again, the preferred process is described below in connection with FIG. 5. It should be noted that the person spotting and recognition can also be executed in parallel with story extraction as an alternative implementation.
[0038] An exemplary method of performing content analysis on a video signal, such as a television NTSC signal, which is the basis for both the person-spotting and story-extraction functionality, will now be described. Once the video signal is buffered, the processor 27 of the content analyzer 25 preferably uses a Bayesian or fusion software engine, as described below, to analyze the video signal. For example, each frame of the video signal may be analyzed so as to allow for the segmentation of the video data.
[0039] With reference to FIG. 4, a preferred process of performing person spotting and recognition will be described. At level 410, face detection, speech detection, and transcript extraction are performed substantially as described above. Next, at level 420, the content analyzer 25 performs face-model and voice-model extraction by matching the extracted faces and speech to known face and voice models stored in the knowledge base. The extracted transcript is also scanned to match known names stored in the knowledge base. At level 430, using the model extraction and name matches, a person is spotted or recognized by the content analyzer. This information is then used in conjunction with the story extraction functionality, as shown in FIG. 5.
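As an illustration of the level-420 matching, the sketch below compares an extracted feature vector against stored models by cosine similarity. Representing face and voice models as embedding vectors, and the 0.8 threshold, are assumptions; the disclosure does not prescribe a matching metric.

```python
# Assumed embedding-based matching of extracted face/voice features
# against known models stored in the knowledge base.
import math
from typing import Dict, List, Optional, Tuple

def cosine(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def best_model_match(extracted: List[float],
                     known_models: Dict[str, List[float]],
                     threshold: float = 0.8) -> Optional[Tuple[str, float]]:
    """Return (name, similarity) of the closest stored model, or None."""
    scored = [(name, cosine(extracted, model))
              for name, model in known_models.items()]
    if not scored:
        return None
    name, sim = max(scored, key=lambda pair: pair[1])
    return (name, sim) if sim >= threshold else None

match = best_model_match([0.12, 0.88, 0.31], {"Actor A": [0.1, 0.9, 0.3]})
```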
[0040] By way of example only, a user may be interested in political events in the mid-east but will be away on vacation on a remote island in South East Asia, and thus unable to receive news updates. Using input device 120, the user can enter keywords associated with the request. For example, the user might enter Israel, Palestine, Iraq, Iran, Ariel Sharon, Saddam Hussein, etc. These key terms are stored in a user profile on a memory 29 of the content analyzer 25. As discussed above, a database of frequently used terms or persons is stored in the knowledge base of the content analyzer 25. The content analyzer 25 looks up and matches the inputted key terms with terms stored in the database. For example, the name Ariel Sharon is matched to Israeli Prime Minister, Israel is matched to the mid-east, and so on. In this scenario, these terms might be linked to a news field type. In another example, the names of sports figures might return a sports field result.
[0041] Using the field result, the content analyzer 25 accesses the most likely areas of the information sources to find related content. For example, the information retrieval system might access news channels or news-related web sites to find information related to the request terms.
[0042] With reference now to FIG. 5, an exemplary method of story extraction will be described and shown. First, in steps 502, 504, and 506, the video/audio source is preferably analyzed to segment the content into visual, audio, and textual components, as described below. Next, in steps 508 and 510, the content analyzer 25 performs information fusion and internal segmentation and annotation. Lastly, in step 512, using the person recognition result, the segmented story is inferenced and the names are resolved with the spotted subject.
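The stubbed skeleton below mirrors this flow, showing only the order of operations; every function name is a placeholder rather than an API defined in this disclosure.

```python
# Stubbed skeleton of the FIG. 5 story-extraction flow; all stage bodies
# are trivial placeholders that show sequencing only.

def segment_visual(video):                  # step 502: cuts, face shots
    return []

def segment_and_classify_audio(audio):      # step 504: speech/music/etc.
    return []

def slice_transcript_topics(transcript):    # step 506: topics by time slicing
    return []

def fuse_information(visual, audio, topics):        # step 508
    return [{"cues": (visual, audio, topics)}]

def internal_segment_and_annotate(candidates):      # step 510
    return candidates

def resolve_names(story, person_result):    # step 512: inferencing/name resolution
    return [person_result] if person_result else []

def extract_stories(video, audio, transcript, person_result):
    visual = segment_visual(video)
    audio_segments = segment_and_classify_audio(audio)
    topics = slice_transcript_topics(transcript)
    stories = internal_segment_and_annotate(
        fuse_information(visual, audio_segments, topics))
    for story in stories:
        story["names"] = resolve_names(story, person_result)
    return stories
```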
[0043] Such methods of video segmentation include but are not limited to cut detection, face detection, text detection, motion estimation/segmentation/detection, camera motion, and the like. Furthermore, an audio component of the video signal may be analyzed. For example, audio segmentation includes but is not limited to speech-to-text conversion, audio effects and event detection, speaker identification, program identification, music classification, and dialogue detection based on speaker identification. Generally speaking, audio segmentation involves using low-level audio features such as bandwidth, energy, and pitch of the audio data input. The audio data input may then be further separated into various components, such as music and speech. Yet further, a video signal may be accompanied by transcript data (for closed-captioning systems), which can also be analyzed by the processor 27. As will be described further below, in operation, upon receipt of a retrieval request from a user, the processor 27 calculates a probability of the occurrence of a story in the video signal based upon the plain language of the request and can extract the requested story.
[0044] Prior to performing segmentation, the processor 27 receives the video signal as it is buffered in a memory 29 of the content analyzer 25, and the content analyzer accesses the video signal. The processor 27 de-multiplexes the video signal to separate the signal into its video and audio components, and in some instances a text component. Alternatively, the processor 27 attempts to detect whether the audio stream contains speech. An exemplary method of detecting speech in the audio stream is described below. If speech is detected, then the processor 27 converts the speech to text to create a time-stamped transcript of the video signal. The processor 27 then adds the text transcript as an additional stream to be analyzed.
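One plausible shape for the time-stamped transcript stream produced by this step is sketched below; the record layout is an assumption, since the disclosure does not specify a format.

```python
# Assumed record layout for the time-stamped transcript stream.
from dataclasses import dataclass
from typing import List

@dataclass
class TranscriptEntry:
    start_sec: float   # position of the utterance in the buffered signal
    end_sec: float
    text: str          # speech-to-text output for this span

def append_utterance(stream: List[TranscriptEntry],
                     start: float, end: float, text: str) -> None:
    """Add one recognized utterance to the transcript stream."""
    stream.append(TranscriptEntry(start, end, text))

transcript: List[TranscriptEntry] = []
append_utterance(transcript, 12.0, 14.5, "good evening from the studio")
```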
[0045] Whether speech is detected or not, the processor 27 then attempts to determine segment boundaries, i.e., the beginning or end of a classifiable event. In a preferred embodiment, the processor 27 performs significant scene change detection first by extracting a new keyframe when it detects a significant difference between sequential I-frames of a group of pictures. As noted above, the frame grabbing and keyframe extracting can also be performed at predetermined intervals. The processor 27 preferably employs a DCT-based implementation for frame differencing using a cumulative macroblock difference measure. Unicolor keyframes or frames that appear similar to previously extracted keyframes get filtered out using a one-byte frame signature. The processor 27 bases this probability on the relative amount above the threshold, using the differences between the sequential I-frames.
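The sketch below illustrates this filtering idea in simplified form: a cumulative macroblock-difference test selects keyframe candidates, and a one-byte signature drops unicolor or near-duplicate keyframes. The specific signature (mean luminance) and both thresholds are assumptions, not the patented measure.

```python
# Simplified keyframe selection and one-byte-signature filtering.

DIFF_THRESHOLD = 0.3   # assumed fraction of macroblocks that must change

def frame_signature(gray_pixels) -> int:
    """Collapse a frame to one byte (mean luminance) for cheap duplicate tests."""
    return int(sum(gray_pixels) / len(gray_pixels)) & 0xFF

def is_new_keyframe(macroblock_diffs, gray_pixels, seen_signatures) -> bool:
    """macroblock_diffs holds per-macroblock differences vs. the previous
    I-frame; a frame becomes a keyframe only if enough blocks changed and
    its signature has not been seen before."""
    changed = sum(1 for d in macroblock_diffs if d > 0.5) / len(macroblock_diffs)
    if changed < DIFF_THRESHOLD:
        return False
    sig = frame_signature(gray_pixels)
    if sig in seen_signatures:     # unicolor / near-duplicate: filter out
        return False
    seen_signatures.add(sig)
    return True

seen = set()
keep = is_new_keyframe([0.9] * 100, [128] * 64, seen)   # True: large change
```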
[0046] A method of frame filtering is described in U.S. Pat. No. 6,125,229 to Dimitrova et al., the entire disclosure of which is incorporated herein by reference, and is briefly described below. Generally speaking, the processor receives content and formats the video signals into frames representing pixel data (frame grabbing). It should be noted that the process of grabbing and analyzing frames is preferably performed at predefined intervals for each recording device. For instance, when the processor begins analyzing the video signal, keyframes can be grabbed every 30 seconds.
[0047] Once these frames are grabbed, every selected keyframe is analyzed. Video segmentation is known in the art and is generally explained in the publications N. Dimitrova, T. McGee, L. Agnihotri, S. Dagtas, and R. Jasinschi, "On Selective Video Content Analysis and Filtering," presented at the SPIE Conference on Image and Video Databases, San Jose, 2000; and A. Hauptmann and M. Smith, "Text, Speech, and Vision for Video Segmentation: The Informedia Project," AAAI Fall 1995 Symposium on Computational Models for Integrating Language and Vision, 1995, the entire disclosures of which are incorporated herein by reference. Any segment of the video portion of the recorded data including visual (e.g., a face) and/or text information relating to a person captured by the recording devices will indicate that the data relates to that particular individual and, thus, may be indexed according to such segments. As known in the art, video segmentation includes, but is not limited to:
[0048] Significant scene change detection: wherein consecutive video frames are compared to identify abrupt scene changes (hard cuts) or soft transitions (dissolve, fade-in, and fade-out). An explanation of significant scene change detection is provided in the publication by N. Dimitrova, T. McGee, and H. Elenbaas, entitled "Video Keyframe Extraction and Filtering: A Keyframe Is Not a Keyframe to Everyone," Proc. ACM Conf. on Knowledge and Information Management, pp. 113-120, 1997, the entire disclosure of which is incorporated herein by reference.
[0049] Face detection: wherein regions of each of the video frames are identified which contain skin tone and which correspond to oval-like shapes. In the preferred embodiment, once a face image is identified, the image is compared to a database of known facial images stored in the memory to determine whether the facial image shown in the video frame corresponds to the user's viewing preference. An explanation of face detection is provided in the publication by Gang Wei and Ishwar K. Sethi, entitled "Face Detection for Image Annotation," Pattern Recognition Letters, Vol. 20, No. 11, November 1999, the entire disclosure of which is incorporated herein by reference.
[0050] Motion estimation/segmentation/detection: wherein moving objects are determined in video sequences and the trajectory of the moving object is analyzed. In order to determine the movement of objects in video sequences, known operations such as optical flow estimation, motion compensation, and motion segmentation are preferably employed. An explanation of motion estimation/segmentation/detection is provided in the publication by Patrick Bouthemy and Francois Edouard, entitled "Motion Segmentation and Qualitative Dynamic Scene Analysis from an Image Sequence," International Journal of Computer Vision, Vol. 10, No. 2, pp. 157-182, April 1993, the entire disclosure of which is incorporated herein by reference.
[0051] The audio component of the video signal may also be analyzed and monitored for the occurrence of words/sounds that are relevant to the user's request. Audio segmentation includes the following types of analysis of video programs: speech-to-text conversion, audio effects and event detection, speaker identification, program identification, music classification, and dialog detection based on speaker identification.
[0052] Audio segmentation and classification includes division of the audio signal into speech and non-speech portions. The first step in audio segmentation involves segment classification using low-level audio features such as bandwidth, energy, and pitch. Channel separation is employed to sepa-
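As an illustration of segment classification from such low-level features, the sketch below uses short-time energy together with zero-crossing rate (standing in for the pitch and bandwidth cues named above); every threshold is invented for the example.

```python
# Rough speech / non-speech / silence split on assumed thresholds.

def short_time_energy(samples):
    return sum(s * s for s in samples) / len(samples)

def zero_crossing_rate(samples):
    crossings = sum(1 for a, b in zip(samples, samples[1:]) if a * b < 0)
    return crossings / (len(samples) - 1)

def classify_segment(samples):
    """Classify one audio segment from two low-level features."""
    if short_time_energy(samples) < 1e-4:
        return "silence"
    return "speech" if zero_crossing_rate(samples) < 0.15 else "non-speech"

label = classify_segment([0.01, -0.02, 0.03, -0.01, 0.02, -0.03])
```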
