(12) INTERNATIONAL APPLICATION PUBLISHED UNDER THE PATENT COOPERATION TREATY (PCT)

(19) World Intellectual Property Organization, International Bureau

(43) International Publication Date: 10 October 2002 (10.10.2002)

(10) International Publication Number: WO 02/080524 A2

(51) International Patent Classification 7: H04N 5/00

(21) International Application Number: PCT/IB02/00896

(22) International Filing Date: 19 March 2002 (19.03.2002)

(25) Filing Language: English

(26) Publication Language: English

(30) Priority Data: 09/822,447, 30 March 2001 (30.03.2001), US

(71) Applicant: KONINKLIJKE PHILIPS ELECTRONICS N.V. [NL/NL]; Groenewoudseweg 1, NL-5621 BA Eindhoven (NL).

(72) Inventors: DIMITROVA, Nevenka; Prof. Holstlaan 6, NL-5656 AA Eindhoven (NL). AGNIHOTRI, Lalitha; Prof. Holstlaan 6, NL-5656 AA Eindhoven (NL). MCGEE, Thomas, F.; Prof. Holstlaan 6, NL-5656 AA Eindhoven (NL). JASINSCHI, Radu, S.; Prof. Holstlaan 6, NL-5656 AA Eindhoven (NL).

(74) Agent: GROENENDAAL, Antonius, W., M.; Internationaal Octrooibureau B.V., Prof. Holstlaan 6, NL-5656 AA Eindhoven (NL).

(81) Designated States (national): CN, JP, KR.

(84) Designated States (regional): European patent (AT, BE, CH, CY, DE, DK, ES, FI, FR, GB, GR, IE, IT, LU, MC, NL, PT, SE, TR).

Published: without international search report and to be republished upon receipt of that report.

For two-letter codes and other abbreviations, refer to the "Guidance Notes on Codes and Abbreviations" appearing at the beginning of each regular issue of the PCT Gazette.
(54) Title: STREAMING VIDEO BOOKMARKS
[Front-page figure: block diagram (reference numerals 202-260) in which raw NTSC video is divided into frames, passed through a macroblock creator and a DCT transformer to produce DCT macroblocks, then through a significant scene processor and a keyframe filterer; frames and macroblocks are exchanged with a frame memory (234) on a media processor coupled to a host processor.]
(57) Abstract: A method, apparatus and systems for bookmarking an area of interest of stored video content are provided. As a viewer watching a video finds an area of interest, they can bookmark the particular segment of the video and then return to that segment with relative simplicity. This can be accomplished by pressing a button, clicking with a mouse or otherwise sending a signal to a device for marking a particular location of the video that is of interest. Frame identifiers can also be used to select a desired video from an index and to then retrieve the video from a medium containing multiple videos.
Streaming video bookmarks
BACKGROUND OF THE INVENTION

The invention relates generally to accessing stored video content, and more particularly to a method and apparatus for bookmarking video content that identifies meaningful segments of a video signal for convenient retrieval at a later time.

Users often obtain videos stored in VHS format, DVD, disks, files or otherwise, for immediate viewing or for viewing at a later time. Frequently, the videos can be of great length and might have varied content. For example, a viewer might record several hours of content, including various television programs or personal activities, on a single video cassette, hard drive or other storage medium. It is often difficult for viewers to return to particularly significant portions of a video. It is often inconvenient to record frame counter numbers or recording time information, particularly while viewing a video.

Users frequently resort to frustrating hit-or-miss methods for returning to segments of particular interest. For example, a viewer might record or obtain a video that includes performances of a large number of comedians or figure skaters, but only be interested in the performances of a relatively small number of these individuals. Also, a viewer might be recording the broadcast while watching the Super Bowl or World Series, and wish to return to five or six memorable plays of the game.

Current methods for locating particular segments of interest are inconvenient to use, and accordingly it is desirable to provide an improved apparatus and method for bookmarking a meaningful segment of a video.
SUMMARY OF THE INVENTION

Generally speaking, in accordance with the invention, a method, apparatus and systems for bookmarking an area of interest of stored video content are provided. As a viewer watching a video finds an area of interest, they can bookmark the particular segment of the video and then return to that segment with relative simplicity. This can be accomplished by pressing a button, clicking with a mouse or otherwise sending a signal to a device for marking a particular location of the video that is of interest. The boundaries of the entire segment can then be automatically identified using various superhistograms, frame signatures, cut detection methods, closed caption information, audio information, and so on, by analyzing the visual, audio and transcript portions of the video signal. The visual information can be analyzed for changes in color, edge and shape to detect changes of individuals by face changes, key frames, video text and the like. Various audio features such as silence, noise, speech, music, and combinations thereof can be analyzed to determine the beginning and ending of a segment. Closed captioning information can also be analyzed for words, categories and the like. By processing this information to determine the boundaries of a meaningful segment of the video, the bookmark will not merely correspond to a specific point of the video, but to an entire automatically created segment of the content.
Thus, not only can bookmark methods, systems and devices in accordance with the invention enable a user to conveniently return to a segment of a video of interest; the user can be brought to the beginning of the segment and can optionally view only the particular segment of interest, or scroll through or view only segments of interest in sequence.

For example, if a bookmark signal is sent while a particular speaker is speaking in a video of a situation comedy, identifying the current speaker when the bookmark signal is delivered can identify segment boundaries by determining when that speaker begins and stops speaking. This information can be useful for certain types of content, such as identifying a segment of a movie, but not for others. Histogram information, such as a change-of-color-palette signal, can also help identify segment changes. Closed captions and natural language processing techniques can provide further information for delineating one topic from the next, and will also help in identifying boundaries based on topics, dialogues and so forth. By selecting or combining evidence from the above segment identification techniques, the boundaries of the segment can be determined and established. The above can also be combined with analysis of the structure of the program as a whole to further identify the segments.
In one embodiment of the invention, the bookmark signal identifies a frame, and the segment is based on time, such as 30 seconds or 1 minute, or on video length, such as a selected number of frames, for example, before and after the selected frame. Alternatively, the segment can be set to a predefined length, such as 30 seconds or 1 minute from the segment beginning. Thus, if a bookmark signal is sent towards the end of a long segment, only the first part of the segment, and possibly just the portion with the bookmark signal, will be stored. Each segment can include EPG data, a frame or transcript information, or combinations thereof. Indices of segments can be reviewed from remote locations, such as via the internet or world wide web, and videos can be selected by searching through such an index.
In one embodiment of the invention, new scenes are detected on a running basis as a video is being watched. When a bookmark signal is activated, the system then looks for the end of the scene and records/indexes the bookmarked scene or stores the scene separately.
In one embodiment of the invention, when a user watching a video activates the bookmark feature, the unique characteristics of the individual frame are recorded. Then, if a user has a large volume of video content in a storage medium and wants to return to a bookmarked scene or segment, but cannot remember the identity of the movie, television program or sporting event, the characteristics of the frame, as a unique or relatively unique identifier, are searched and the scene (or entire work) can be retrieved. Thus, a viewer could scroll through a series of video bookmarks until the desired scene is located and go directly to the scene or to the beginning of the work. Users can even keep personal lists of favorite bookmarked segments of not only video, but music, audio and other stored content, and can access content from various internet or web accessible content sources by transmitting the frame identifier or segment identifier to the content source.
Bookmarks in accordance with the invention can be backed up to a remote device, such as a PDA or other computerized storage device. Such a device can categorize the bookmarks, such as by analyzing EPG data, frame information or transcript information, such as by doing a keyword search, or other video features. In fact, the systems and methods in accordance with the invention can also be used to bookmark and categorize various types of electronic content, such as segments from audio books, music, radio programs, text documents, multimedia presentations, photographs or other images, and so on. It can also be advantageous to store bookmarks at different levels, so that certain privacy and/or parental guidance issues can be addressed. In certain embodiments of the invention, the bookmarks can be accessed through web pages, mobile communication devices, PDAs, watches and other electronic devices.

Thus, an individual can store EPG data, textual data or some other information, as well as the bookmarks, to give a richer perspective of the video. This textual information could be part or all of the transcript, the EPG data related to a synopsis or actor, a keyframe and so on. This information could be further used to characterize the segment and bookmark.
Accordingly, it is an object of the invention to provide an improved method, system and device for bookmarking and retrieving video and other content which overcomes drawbacks of existing methods, systems and devices.
BRIEF DESCRIPTION OF THE DRAWINGS

For a fuller description of the invention, reference is had to the following description, taken in connection with the accompanying drawings, in which:

FIG. 1 illustrates a video analysis process for segmenting video content in accordance with embodiments of the invention;

FIGS. 2A and 2B are block diagrams of devices used in creating a visual index of segments in accordance with embodiments of the invention;

FIG. 3 is a schematic diagram showing the selection of frame information from a video image in accordance with embodiments of the invention;

FIG. 4 is a chart showing three levels of a segmentation analysis in accordance with embodiments of the invention; and

FIG. 5 shows the process flow for the incoming video.
DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS

Often a viewer would like to bookmark a segment of the video they are watching for future retrieval. Bookmarking video can make it much easier to return to particular segments of interest. As a user watches a live video, or a video stored on a tape, disk, DVD, VHS tape or otherwise, they can press a button or otherwise cause a signal to be sent to a device electronically coupled to the video to enter a marking point. This marking point (or the signature of the frame) can be recorded in free areas of the tape (such as control areas) or medium on which the video is recorded, or the time or frame count for the particular point of the tape can be recorded on a separate storage medium.
FIG. 5 shows the process flow. The incoming video can be divided (formatted) into frames in step 501. Next, for each of the frames, a signature is developed and stored in step 502. If the user has selected the frame for bookmarking, then the frame is identified, and the signature, with its frame position and video information, is stored as a bookmark in step 503. The boundaries around the bookmark are then identified, and their information can be stored as well in step 504. The segment identification, such as the segment boundaries or the video itself, can be stored, depending on the user, in step 505.
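By way of illustration, the following is a minimal sketch of this five-step flow, assuming an in-memory store of per-frame signatures; all names (compute_signature, find_segment_boundaries, BookmarkingPipeline) are hypothetical stand-ins, not from the patent.

```python
# A minimal sketch of the FIG. 5 flow (steps 501-505), assuming frames arrive
# as raw byte strings. compute_signature and find_segment_boundaries are
# hypothetical stand-ins for the signature and boundary methods described later.
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

def compute_signature(frame: bytes) -> int:
    """Stand-in signature; the patent derives signatures from DCT coefficients."""
    return hash(frame) & 0xFF

def find_segment_boundaries(signatures: List[int], index: int) -> Tuple[int, int]:
    """Stand-in boundary search: expand while neighboring signatures match."""
    start = end = index
    while start > 0 and signatures[start - 1] == signatures[index]:
        start -= 1
    while end + 1 < len(signatures) and signatures[end + 1] == signatures[index]:
        end += 1
    return start, end

@dataclass
class Bookmark:
    frame_index: int                            # frame selected by the user (step 503)
    signature: int                              # signature stored with the bookmark
    segment: Optional[Tuple[int, int]] = None   # boundaries found in step 504

@dataclass
class BookmarkingPipeline:
    signatures: List[int] = field(default_factory=list)
    bookmarks: List[Bookmark] = field(default_factory=list)

    def ingest(self, frames: List[bytes]) -> None:
        # Steps 501-502: divide the incoming video into frames and store a
        # signature for each one.
        self.signatures.extend(compute_signature(f) for f in frames)

    def mark(self, frame_index: int) -> Bookmark:
        # Steps 503-505: record the selected frame's signature and position,
        # identify the surrounding boundaries, and store the result.
        bm = Bookmark(frame_index, self.signatures[frame_index])
        bm.segment = find_segment_boundaries(self.signatures, frame_index)
        self.bookmarks.append(bm)
        return bm
```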
In one embodiment of the invention, a user might store the bookmarks on a PDA, server or other storage device. This can act as a lookup table. A user can also verify whether they have viewed or obtained a specific video by comparing a bookmark or frame information to frame information of the video, stored, for example, on an external server. A viewer might download a video and then, after viewing, delete the video, keeping only the bookmark(s), and then retrieve the video from an external source when additional viewing is desired. Thus, storage resources can be maximized and the efficiency of centralized content storage sources can be utilized.

In one embodiment of the invention, when a viewer clicks on a video, the frame being displayed at that time is extracted for analysis. A signature, histogram, closed captioning or some other low-level feature, or a combination of features, could represent this frame. Examples will be provided below.
Although systems in accordance with the invention can be set up to return to the exact point where the bookmark signal is activated, in enhanced systems or applications a meaningful segment of the video can be bookmarked, and users can have the option of returning either to the exact point or to the beginning of a meaningful segment, rather than to the middle of a segment or to the end of a segment, as a user might not decide to bookmark a segment until after it has been viewed and found to be of interest.

Identifying the segment to which a bookmark corresponds can be accomplished in various manners. For example, in a preferred embodiment of the invention, the entire video or large portions thereof can be analyzed in accordance with the invention and broken down into segments. Then, when a bookmark signal is activated, the segment which is occurring when the signal is activated (or the prior segment, or both) can be bookmarked. In another embodiment of the invention, the analysis to determine the boundaries of a segment is not conducted until after the bookmark signal is activated. This information (video signature, start and end time of the tape, frame count and so forth) can be stored in the same location identified above.
In still another embodiment of the invention, a method of identifying items of content, such as videos, audio, images, text and combinations thereof, and the like, can be performed by creating a bookmark comprising a selected segment of the content item having sufficient identifying information to identify the content item, and retaining the segment identifying the item on a storage medium, such as a storage medium at a service provider. Users could then download the bookmarks at a remote location at their election. Users could then use the bookmarks to identify the original item of content from which the bookmark was created. These downloads of bookmarks can be created in accordance with personal profiles.
DCT Frame Signatures

When the viewer selects a frame, one type of frame signature can be derived from the composition of the DCT (Discrete Cosine Transform) coefficients. A frame signature representation is derived for each grouping of similarly valued DCT blocks in a frame, i.e., a frame signature is derived from region signatures within the frame. Each region signature is derived from block signatures, as explained in the section below. Qualitatively, the frame signatures contain information about the prominent regions in the video frames representing identifiable objects. The signature of this frame can then be used to retrieve this portion of the video.
Referring to FIG. 3, extracting block, region and frame signatures can be performed as follows. Based on the DC and highest values of the AC coefficients, a signature is derived for each block 301 in a video frame 302. Then, blocks 301 with similar signatures are compared, and the size and location of groups of blocks 301 are determined in order to derive region signatures.

The block signature 310 can be eight bits long, of which three bits 320 are devoted to the DC signature and five bits 330 are devoted to the AC values. The DC part 320 of the signature 310 is derived by determining where the DC value falls within a specified range of values (e.g., -2400 to 2400). The range can be divided into a preselected number of intervals. In this case, eight intervals are used (eight values are represented by three bits). Depending on the type of application, the size of the whole signature can be changed to accommodate a larger number of intervals and therefore a finer-granularity representation. Each interval is assigned a predefined mapping from the range of DC values to the DC part 320 of the signature. Five bits 330 are used to represent the content of the AC values. Each AC value is compared to a threshold, e.g., 200, and if the value is greater than the threshold, the corresponding bit in the AC signature is set to one. An example is shown in FIG. 3, where only value 370 is greater than the threshold of 200.
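As an illustration of this 8-bit layout, the following sketch assumes eight equal DC intervals over -2400 to 2400 and an AC threshold of 200, per the text; the function name and bit ordering are choices of this example, not the patent's.

```python
def block_signature(dc, ac_values, dc_range=(-2400, 2400), ac_threshold=200):
    """8-bit block signature: 3 bits quantize DC into eight equal intervals,
    5 bits flag which of five AC values exceed the threshold."""
    lo, hi = dc_range
    dc_clamped = max(lo, min(hi, dc))                          # keep DC inside the range
    interval = min(7, int((dc_clamped - lo) * 8 / (hi - lo)))  # 3-bit DC part
    ac_bits = 0
    for i, ac in enumerate(ac_values[:5]):                     # 5-bit AC part
        if ac > ac_threshold:
            ac_bits |= 1 << (4 - i)                            # set bit for this AC value
    return (interval << 5) | ac_bits

# Mirroring FIG. 3's example: only one AC value (370) exceeds the threshold.
print(f"{block_signature(1250, [30, 180, 370, 12, 95]):08b}")  # -> 11000100
```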
After deriving block signatures for each frame, regions of similarly valued block signatures are determined. Regions consist of two or more blocks that share similar block signatures. In this process, a region growing method can be used for isolating regions in the image. Traditionally, region growing methods use pixel color and neighborhood concepts to detect regions. In one embodiment of the invention, the block signature is used as the basis for growing regions. Each region can then be assigned a region signature, e.g.: regionSignature(blockSignature, regionSize, Rx, Ry), where Rx and Ry are the coordinates of the center of the region. Each region corresponds roughly to an object in the image.
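The following sketch grows regions by flood-filling 4-connected blocks with equal signatures; this is a simplification (the text allows merely "similar" signatures), and the grid layout and names are assumptions of this example.

```python
def grow_regions(sig_grid):
    """Return a list of regions (signature, size, Rx, Ry) grown from a 2-D grid
    of block signatures; a region is two or more 4-connected equal blocks."""
    h, w = len(sig_grid), len(sig_grid[0])
    seen = [[False] * w for _ in range(h)]
    regions = []
    for y in range(h):
        for x in range(w):
            if seen[y][x]:
                continue
            sig, stack, cells = sig_grid[y][x], [(x, y)], []
            seen[y][x] = True
            while stack:                        # flood fill over equal signatures
                cx, cy = stack.pop()
                cells.append((cx, cy))
                for nx, ny in ((cx+1, cy), (cx-1, cy), (cx, cy+1), (cx, cy-1)):
                    if 0 <= nx < w and 0 <= ny < h and not seen[ny][nx] \
                            and sig_grid[ny][nx] == sig:
                        seen[ny][nx] = True
                        stack.append((nx, ny))
            if len(cells) >= 2:                 # regions are two or more blocks
                rx = sum(c[0] for c in cells) / len(cells)   # region center Rx
                ry = sum(c[1] for c in cells) / len(cells)   # region center Ry
                regions.append((sig, len(cells), rx, ry))    # regionSignature-like
    return regions

grid = [[5, 5, 7],
        [5, 2, 7]]
print(grow_regions(grid))  # -> one size-3 region of signature 5, one size-2 of 7
```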
A selected frame can be represented by the most prominent groupings (regions) of DCT blocks. An n-word-long signature is derived for a frame, where n determines the number of important regions (defined by the application) and a word consists of a predetermined number of bytes. Each frame can be represented by a number of prominent regions. In one embodiment of the invention, the number of regions in the image is limited and only the largest regions are kept. Because one frame is represented by a number of regions, the similarity between frames can be regulated by choosing the number of regions that are similar, based on their block signature, size and location. The regions can be sorted by region size and then the top n region signatures can be selected as a representative of the frame: frame(regionSignature1, ..., regionSignaturen). It should be noted that this representation of keyframes is based on the visual appearance of the images, and does not attempt to describe any semantics of the images.
Frame Searching

To find the position in the video, a frame comparison procedure compares a bookmarked frame $F''$ with all frames $F'$ in a list of frames. Their respective region signatures are compared according to their size:

$$\mathrm{frame\_difference} = \sum_{i=1}^{n} \left| \mathrm{region\_size}_i' - \mathrm{region\_size}_i'' \right|$$

The frame difference can be calculated for the regions in the frame signature with the same centroids. In this case, the position of the objects as well as the signature value is taken into account. On the other hand, there are cases when the position is irrelevant and we need to compare just the region sizes and disregard the position of the regions.

If the frame difference is zero, then we can use the position information from the matching frame to retrieve that section of the video.
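A minimal sketch of this search, assuming each frame has been reduced to its top-n region sizes sorted largest-first; the function names are illustrative, not from the patent.

```python
def frame_difference(regions_a, regions_b):
    """Sum of |region_size_i' - region_size_i''| over the top-n regions."""
    n = min(len(regions_a), len(regions_b))
    return sum(abs(regions_a[i] - regions_b[i]) for i in range(n))

def find_bookmarked_frame(bookmark_regions, frame_list):
    """Return the index of the first frame whose difference is zero."""
    for idx, regions in enumerate(frame_list):
        if frame_difference(bookmark_regions, regions) == 0:
            return idx  # position information used to retrieve the video section
    return None

# Example: the third stored frame matches the bookmarked signature exactly.
stored = [[120, 80, 33], [110, 90, 30], [64, 48, 20]]
print(find_bookmarked_frame([64, 48, 20], stored))  # -> 2
```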
Other Frame Signature Types

Signatures can also be created by using combinations of features from the frames, such as the maximum absolute difference (MAD) between the preceding and/or following frame, the intensity of the frame, the bitrate used for the frame, whether the frame is interlaced or progressive, whether the frame is from a 16:9 or 4:3 format, and so forth. This type of information could be used in any combination to identify the frame, and a retrieval process similar to that described above could be developed.
Color Histograms

Instead of using the signatures described above, one could calculate a color histogram for the frame and use this for retrieval. The color histogram could consist of any number of bins.
Closed Captioning

Closed captioning data could also be used to bookmark the segment, by extracting the key words that represent the section.

Combinations

Any combination of the above could also be used to bookmark the frame or section.
Defining the segments

The segments could be manually bookmarked by the viewer, by having the viewer click on the start and end points of the video. Alternatively, the bookmarking could happen automatically, using a technique such as a superhistogram. Automatic techniques for determining the boundaries of a segment are discussed below. For example, a scene will often maintain a certain color palette; a change in scene usually entails a break in this color palette. While the video is playing, automatic video analysis can be performed to extract the histograms. When the viewer clicks on the video, the color histogram for that frame is compared to the previously captured frames to identify the start of the scene; then the same comparisons can be done to find the end of the scene. Using this information, it is now possible to store only the segment of interest for the viewer. This information can also be used for more meaningful retrieval of the full video. For instance, instead of going directly to the position of when the viewer clicked, one could actually go to the start of the scene that contains that frame.
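A hedged sketch of this boundary search, assuming per-frame normalized color histograms and an illustrative L1 distance threshold (both assumptions of this example):

```python
def histogram_distance(h1, h2):
    """L1 distance between two equally sized, normalized color histograms."""
    return sum(abs(a - b) for a, b in zip(h1, h2))

def scene_bounds(histograms, clicked, threshold=0.3):
    """Return (start, end) of the scene containing the clicked frame: scan
    backward, then forward, until the color palette breaks."""
    ref = histograms[clicked]
    start = clicked
    while start > 0 and histogram_distance(histograms[start - 1], ref) < threshold:
        start -= 1          # previous frame still shares the color palette
    end = clicked
    while end + 1 < len(histograms) and \
            histogram_distance(histograms[end + 1], ref) < threshold:
        end += 1            # next frame still shares the color palette
    return start, end

# Three frames of one palette, four of another, two of the first again:
hists = [[1.0, 0.0]] * 3 + [[0.0, 1.0]] * 4 + [[1.0, 0.0]] * 2
print(scene_bounds(hists, clicked=5))  # -> (3, 6), the middle scene
```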
Example

The viewer is watching a video of the Wizard of Oz movie. The current view contains frames where Dorothy, the Tin Man, the Cowardly Lion and the Scarecrow go into the Emerald City from the poppy field. The viewer clicks on the video, e.g., when the Horse of a Different Color passes. In one embodiment of the invention, the frame/scene analysis has been continuous. The system then extracts the selected frame and generates both the DCT frame signature and the color histogram, for example. The analysis program searches through the previously stored frames until it finds one that does not belong to the same color palette. This denotes the start of the scene. The program continues analyzing the video until it locates the end of the scene by virtue of another significant change in color palette. If the user had already decided to record the whole video, the start and end points are marked. In another embodiment of the invention, only the segment is stored. Meanwhile, the program has been analyzing and storing the DCT frame information for the individual frames. Sometime later, if the viewer views the bookmarked frame and decides to retrieve the portion of the video, the DCT frame information is compared with the stored information until a match is found. Then the marked points around this frame are used to retrieve that portion of the video.
Segmenting the video can be performed using analysis techniques such as those discussed in US Pat. Nos. 6,137,544 and 6,125,229, the contents of which are incorporated herein by reference.

Segmenting a video signal can also be accomplished with the use of a layered probabilistic system, which can be referred to as a Bayesian Engine, or BE. Such a system is described in J. Pearl, "Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference," Morgan Kaufmann Publishers, Inc., San Mateo, California (1988). Such a system can be understood with reference to FIG. 4.
FIG. 4 shows a three-layer probabilistic framework: low level 410, mid-level 420 and high level 430. The low-level layer 410 describes signal processing parameters for a video signal 401. These can include visual features, such as color, edge and shape; audio parameters, such as average energy, bandwidth, pitch, mel-frequency cepstral coefficients, linear prediction coding coefficients and zero-crossings; and the transcript, which can be pulled from the ASCII characters of the closed captions. If closed caption information is not available, voice recognition methods can be used to convert the audio to transcript characters.

The arrows indicate the combinations of low-level 410 features that create mid-level 420 features. Mid-level 420 features are associated with whole frames or collections of frames, while low-level 410 features are associated with pixels or short time intervals. Keyframes (the first frame of a shot), faces and video text are mid-level visual features. Silence, noise, speech, music and combinations thereof are mid-level 420 audio features. Keywords and the closed caption/transcript categories are also part of the mid-level 420.

High-level features can describe semantic video content obtained through the integration of mid-level 420 features across the different modalities.
This approach is highly suitable because probabilistic frameworks are designed to deal with uncertain information, and they are appropriate for representing the integration of information. The BE's probabilistic integration employs either intra- or inter-modalities. Intra-modality integration refers to integration of features within a single domain. For example, integration of color, edge and shape information for videotext represents intra-modality integration, because it all takes place in the visual domain. Integration of mid-level audio categories with the visual categories face and videotext offers an example of inter-modality integration.
Bayesian networks are directed acyclic graphs (DAGs) in which the nodes correspond to (stochastic) variables. The arcs describe a direct causal relationship between the linked variables. The strength of these links is given by conditional probability distributions (cpds). More formally, let the set $\Omega = \{x_1, \ldots, x_N\}$ of $N$ variables define a DAG. For each variable there exists a sub-set of variables of $\Omega$, $\Pi_{x_i}$, the parents set of $x_i$, i.e., the predecessors of $x_i$ in the DAG, such that $P(x_i \mid \Pi_{x_i}) = P(x_i \mid x_1, \ldots, x_{i-1})$, where $P(\cdot \mid \cdot)$ is a cpd, strictly positive. Now, given the joint probability density function (pdf) $P(x_1, \ldots, x_N)$, the chain rule gives:

$$P(x_1, \ldots, x_N) = P(x_N \mid x_{N-1}, \ldots, x_1) \times \cdots \times P(x_2 \mid x_1)\,P(x_1).$$

According to this equation, the parent set $\Pi_{x_i}$ has the property that $x_i$ and $\{x_1, \ldots, x_N\} \setminus \Pi_{x_i}$ are conditionally independent given $\Pi_{x_i}$.
In FIG. 4, the flow diagram of the BE has the structure of a DAG made up of three layers. In each layer, each element corresponds to a node in the DAG. The directed arcs join one node in a given layer with one or more nodes of the preceding layer. Two sets of arcs join the elements of the three layers. For a given layer and for a given element, we compute a joint pdf (probability density function) as previously described. More precisely, for an element (node) $x_i^{(l)}$ associated with the $l$-th layer, the joint pdf is:

$$P\bigl(x_i^{(l)} \mid \Pi_i^{(l)}\bigr) \times \Bigl\{ P\bigl(x_1^{(l-1)} \mid \Pi_1^{(l-1)}\bigr) \cdots P\bigl(x_{N^{(l-1)}}^{(l-1)} \mid \Pi_{N^{(l-1)}}^{(l-1)}\bigr) \Bigr\} \qquad (1)$$
where for each element $x_j^{(l)}$ there exists a parent set $\Pi_j^{(l)}$, the union of the parent sets for a given level $l$ being $\Pi^{(l)} = \bigcup_{i=1}^{N^{(l)}} \Pi_i^{(l)}$. There can exist an overlap between the different parent sets for each level.
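As a concrete instance of the chain-rule factorization above (an illustrative example, not from the source), consider a three-node chain $x_1 \to x_2 \to x_3$ with parent sets $\Pi_{x_2} = \{x_1\}$ and $\Pi_{x_3} = \{x_2\}$:

$$P(x_1, x_2, x_3) = P(x_3 \mid x_2)\,P(x_2 \mid x_1)\,P(x_1),$$

so that, given its parent $x_2$, the variable $x_3$ is conditionally independent of $x_1$.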
Topic segmentation (and classification) performed by the BE is shown in the third (high-level) layer of FIG. 4. The complex nature of multimedia content requires integration across multiple domains. It is preferred to use the comprehensive set of data from the audio, visual and transcript domains.

In the BE structure of FIG. 4, for each of the three layers, each node and arrow is associated with a cpd. In the low-level layer the cpds are assigned by the AE as described above. For the mid-level layer, twenty closed captions categories (for example) are generated: weather, international, crime, sports, movie, fashion, tech stock, music, automobile, war, economy, energy, stock, violence, financial, national (affairs), biotech, disaster, art and politics. It is advantageous to use a knowledge tree for each category, made up of an association table of keywords and categories. After statistical processing, the system performs categorization using category vote histograms. If a word in the closed captions file matches a knowledge base keyword, then the corresponding category gets a vote. The probability, for each category, is given by the ratio between the total number of votes per keyword and the total number of votes for a closed captions paragraph.
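A toy sketch of the category vote histogram follows; the small keyword table is an invented stand-in for the patent's knowledge tree.

```python
from collections import Counter

KNOWLEDGE_BASE = {            # keyword -> category (toy association table)
    "rain": "weather", "storm": "weather",
    "stocks": "financial", "earnings": "financial",
    "touchdown": "sports", "inning": "sports",
}

def category_probabilities(cc_paragraph):
    """Vote once per keyword match; probability = category votes / total votes."""
    votes = Counter()
    for word in cc_paragraph.lower().split():
        category = KNOWLEDGE_BASE.get(word)
        if category:
            votes[category] += 1
    total = sum(votes.values())
    return {cat: n / total for cat, n in votes.items()} if total else {}

print(category_probabilities("Heavy rain and a storm delayed the inning"))
# -> {'weather': 0.666..., 'sports': 0.333...} under this toy table
```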
Systems in accordance with the invention can perform segmentation by segmenting the TV program into commercial vs. non-commercial parts, then classifying the non-commercial parts into segments based on two high-level categories: financial news and talk shows, for example (performed by the BE).

Initial segmentation can be done using closed caption data to divide the video into program and commercial segments. Next, the closed captions of the program segments are analyzed for single, double and triple arrows. Double arrows indicate a speaker change. The system marks text between successive double arrows with a start and end time in order to use it as an atomic closed captions unit. Systems in accordance with the invention can use these units as the segmenting building blocks. In order to determine a segment's high-level indexing (whether it is financial news or a talk show, for example), Scout computes two joint probabilities. These are defined as:
p-FIN-TOPIC = p-VTEXT * p-KWORDS * p-FACE * p-AUDIO-FIN * p-CC-FIN * p-FACETEXT-FIN    (2),

p-TALK-TOPIC = p-VTEXT * p-KWORDS * p-FACE * p-AUDIO-TALK * p-CC-TALK * p-FACETEXT-TALK    (3).
The audio probabilities p-AUDIO-FIN for financial news and p-AUDIO-TALK for talk shows are created by the combination of different individual audio category probabilities. The closed captions probabilities p-CC-FIN for financial news and p-CC-TALK for talk shows are chosen as the largest probability out of the list of twenty probabilities. The face and videotext probabilities p-FACETEXT-FIN and p-FACETEXT-TALK are obtained by comparing the face and videotext probabilities p-FACE and p-TEXT, which determine, for each individual closed caption unit, the probability of face and text occurrence. One heuristic builds on the fact that talk shows are dominated by faces while financial news has both faces and text. The high-level segmenting is done on each closed captions unit by computing in a n
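A minimal sketch of the joint topic probabilities of equations (2) and (3) above; all numeric values are invented for illustration, and in the patent they come from the mid-level audio, closed-caption, face and videotext analyses.

```python
def topic_probability(p_vtext, p_kwords, p_face, p_audio, p_cc, p_facetext):
    """Product of the six per-modality probabilities for one topic."""
    return p_vtext * p_kwords * p_face * p_audio * p_cc * p_facetext

p_fin = topic_probability(0.7, 0.8, 0.6, 0.9, 0.75, 0.8)   # financial news, eq. (2)
p_talk = topic_probability(0.4, 0.3, 0.9, 0.5, 0.40, 0.6)  # talk show, eq. (3)
print("financial" if p_fin > p_talk else "talk show")      # -> financial
```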
