(12) United States Patent
Bergen et al.

(10) Patent No.: US 6,956,573 B1
(45) Date of Patent: Oct. 18, 2005
(54) METHOD AND APPARATUS FOR EFFICIENTLY REPRESENTING, STORING AND ACCESSING VIDEO INFORMATION

(75) Inventors: James Russell Bergen, Hopewell, NJ (US); Curtis R. Carlson, Princeton, NJ (US); Rakesh Kumar, Monmouth Jct., NJ (US); Harpreet Singh Sawhney, Cranbury, NJ (US)

(73) Assignee: Sarnoff Corporation, Princeton, NJ (US)

OTHER PUBLICATIONS

Shibata et al., "Content-Based Structuring of Video Information", 0-8186-7436-9/96, 1996 IEEE.*
Jaillon et al., "Image Mosaicing Applied to Three-Dimensional Surfaces", 1051-4651/94, 1994 IEEE.
Smoliar et al., "Content-Based Video Indexing and Retrieval", 1070-986X/94, 1994 IEEE.
Hong Jiang Zhang, Atreyi Kankanhalli, Stephen W. Smoliar, "Automatic Partitioning of Full-Motion Video", Multimedia Systems, pp. 10-28, 1993.
Y. Gong, C. H. Chuan, Z. Yongwei, M. Sakauchi, "A Generic Video Parsing System With A Scene Description Language (SDL)", Real-Time Imaging, Vol. 2, pp. 45-59, 1996.
H. D. Wactlar, T. Kanade, M. A. Smith, S. M. Stevens, "Intelligent Access to Digital Video: Informedia Project", IEEE Computer, Vol. 29, No. 5, pp. 46-52, 1996.
M. Christel, S. Stevens, T. Kanade, M. Mauldin, R. Reddy and H. Wactlar, "Techniques for the Creation and Exploration of Digital Video Libraries", Multimedia Tools and Applications, Vol. 2, pp. 1-33, 1996.

* cited by examiner

Primary Examiner: Almis R. Jankus
(74) Attorney, Agent, or Firm: William J. Burke
(57) ABSTRACT

A method and concomitant apparatus for comprehensively representing video information in a manner facilitating indexing of the video information. Specifically, a method according to the invention comprises the steps of dividing a continuous video stream into a plurality of video scenes, and at least one of the steps of: dividing, using intra-scene motion analysis, at least one of the plurality of scenes into one or more layers; representing, as a mosaic, at least one of the plurality of scenes; computing, for at least one layer or scene, one or more content-related appearance attributes; and storing, in a database, the content-related appearance attributes or said mosaic representations.
(*) Notice: Subject to any disclaimer, the term of this patent is extended or adjusted under 35 U.S.C. 154(b) by 1414 days.

(21) Appl. No.: 08/970,889
(22) Filed: Nov. 14, 1997

Related U.S. Application Data
(60) Provisional application No. 60/031,003, filed on Nov. 15, 1996.

(51) Int. Cl. G06T 15/00
(52) U.S. Cl. 345/473
(58) Field of Search: 345/327, 473; 382/284, 220, 305, 236; 715/716

(56) References Cited

U.S. PATENT DOCUMENTS

4,941,125 A    7/1990  Boyne ............... 364/900
5,485,611 A    1/1996  Astle ............... 395/600
5,550,965 A    8/1996  Gabbe et al. ........ 395/154
5,635,982 A *  6/1997  Zhang et al. ........ 348/231
5,649,032 A    7/1997  Burt et al. ......... 382/284
5,657,402 A    8/1997  Bender et al. ....... 382/284
5,706,417 A    1/1998  Adelson ............. 395/129
5,751,286 A *  5/1998  Barber et al. ....... 345/348
5,821,945 A * 10/1998  Yeo et al. .......... 345/440
5,915,044 A    6/1999  Gardos et al. ....... 382/236

23 Claims, 10 Drawing Sheets
[Front-page drawing: mosaic-based representation of a video sequence as scenes S1-Sn (710) and frames F1-Fm, with background mosaic; see FIG. 7.]
[Sheet 1 of 10, FIG. 1: high level block diagram of the video information processing system 100, including a video source, video segmentor, analysis engine, video information database, ancillary information source, image vault, access engine, network 160 and clients 170-1 through 170-n.]
[Sheet 2 of 10, FIG. 2: flow diagram of a segmentation routine. Recoverable steps: calculate frame(N) description (210); calculate frame(N+1) description (220); calculate FFD of frame(N) and frame(N+1) (230); threshold test; set scene cut flag (245).]
[Sheet 3 of 10, FIG. 3: flow diagram of an authoring routine. Recoverable steps: divide scenes into foreground and background (310); compute intra-scene attributes; store intra-scene attribute data; compute inter-scene attributes; store inter-scene attribute data; inter-scene representations (345).]
[Sheet 4 of 10, FIG. 4: block diagram of the Video-Map client 470, including display 472, network interface 473, controller 474 running video book and video map programs, keypad 175, GPS receiver 476-1, camera 476-2, storage unit 477 and storage unit interface 478, coupled to networks 160.]
[Sheet 5 of 10, FIG. 5: a user holding the Video-Map device of FIG. 4, with an exemplary screen display of an annotated image of the New York City skyline.]
[Sheet 6 of 10, FIG. 6: implementation and use steps of the Video-Map embodiment. Recoverable blocks: authoring (605), including annotated reference video/image database creation, representation of scenes as a collection of views, annotation of reference imagery with database/ancillary information, and appearance-based indexing information; distribution (630); and access (620), including indexing into the database using ancillary information such as GPS, compass, etc. or using video information, and presentation/creation of an annotated image.]
[Sheet 7 of 10, FIG. 7: graphical representation of the relative memory requirements of two scene storage methods, depicting a scene sequence S1-Sn (710) and a frame sequence F1-F6 (720).]
[Sheet 8 of 10, FIG. 8: flow diagram of a query execution routine. Recoverable steps: query type (805); query specification (810); compute features for the specified query (820); transmit appropriate feature vector(s) to the database search engine (830); search through the stored multi-dimensional tree structures to retrieve potential matching data (840); linearly search through retrieved data to find either the top k matches or all matches within a given threshold (850); format the representative frames/mosaics of all the matching video shots for presentation along with (optionally) a visual or numeric indicator of the quality of match (860); transmit the result in a visual storyboard form for display at the browser/query/user end (870).]
[Sheet 9 of 10, FIG. 9: flow diagram 900 of the attribute generation method. Recoverable steps: input frame (910); decimate frame to produce image pyramid; select feature and associated filter; apply N feature filters to each subband of pyramid; rectify filter outputs; generate feature map for each rectified filter output; integrate feature maps for each subband to produce attribute pyramid; store attribute pyramid; additional feature?]
[Sheet 10 of 10, FIG. 10: high-level function diagram 1000 of the attribute generation method, showing feature filters, rectification, local integration, feature energy maps and attribute pyramids.]
METHOD AND APPARATUS FOR EFFICIENTLY REPRESENTING, STORING AND ACCESSING VIDEO INFORMATION

The invention claims benefit of U.S. Provisional Application No. 60/031,003, filed Nov. 15, 1996.

The invention relates to video processing techniques and, more particularly, the invention relates to a method and apparatus for efficiently storing and accessing video information.

BACKGROUND OF THE DISCLOSURE

The capturing of analog video signals in the consumer, industrial and government/military environments is well known. For example, a moderately priced personal computer including a video capture board is typically capable of converting an analog video input signal into a digital video signal, and storing the digital video signal in a mass storage device (e.g., a hard disk drive). However, the usefulness of the stored digital video signal is limited due to the sequential nature of present video access techniques. These techniques treat the stored video information as merely a digital representation of a sequential analog information stream. That is, stored video is accessed in a linear manner using familiar VCR-like commands, such as PLAY, STOP, FAST-FORWARD and REWIND. Moreover, a lack of annotation and manipulation tools, due to, e.g., the enormous amount of data inherent in a video signal, precludes the use of rapid access and manipulation techniques common in database management applications.

Therefore, a need exists in the art for a method and apparatus for analyzing and annotating raw video information to produce a video information database having properties that facilitate a plurality of non-linear access techniques.

SUMMARY OF THE INVENTION

The invention is a method and apparatus for comprehensively representing video information in a manner facilitating indexing of the video information. Specifically, a method according to the invention comprises the steps of dividing a continuous video stream into a plurality of video scenes, and at least one of the steps of: dividing, using intra-scene motion analysis, at least one of the plurality of scenes into one or more layers; representing, as a mosaic, at least one of the plurality of scenes; computing, for at least one layer or scene, one or more content-related appearance attributes; and storing, in a database, the content-related appearance attributes or said mosaic representations.
BRIEF DESCRIPTION OF THE DRAWINGS

The teachings of the present invention can be readily understood by considering the following detailed description in conjunction with the accompanying drawings, in which:

FIG. 1 depicts a high level block diagram of a video information processing system according to the invention;

FIG. 2 is a flow diagram of a segmentation routine suitable for use in the video information processing system of FIG. 1;

FIG. 3 is a flow diagram of an authoring routine suitable for use in the video information processing system of FIG. 1;

FIG. 4 depicts a "Video-Map" embodiment of the invention suitable for use as a stand-alone system, or as a client within the video information processing system of FIG. 1;

FIG. 5 shows a user holding the Video-Map embodiment of FIG. 4, and an exemplary screen display of an annotated image of the skyline of New York City;

FIG. 6 depicts exemplary implementation and use steps of the Video-Map embodiment of FIG. 4;

FIG. 7 is a graphical representation of the relative memory requirements of two scene storage methods;

FIG. 8 is a flow diagram of a query execution routine according to the invention; and

FIGS. 9 and 10 are, respectively, a flow diagram 900 and a high-level function diagram 1000 of an attribute generation method according to the invention.

DETAILED DESCRIPTION

The invention claims benefit of U.S. Provisional Application No. 60/031,003, filed Nov. 15, 1996, and incorporated herein by reference in its entirety.
The invention will be described within the context of a video information processing system. It will be recognized by those skilled in the art that various other embodiments of the invention may be realized using the teachings of the following description. As examples of such embodiments, a video-on-demand embodiment and a "Video-Map" embodiment will also be described.

The invention is directed toward providing an information database suitable for providing scene-based video information to a user. The representation may include motion or may be motionless, depending on the application. Briefly, the process of constructing the scene-based video representation may be conceptualized as a plurality of analysis steps operative upon the appropriate portions of an evolving scene representation. That is, each of the various video processing techniques that will be described below is operative on some, but not all, of the information associated with a particular scene. To illustrate this point, consider the following video processing steps (all of which will be described in more detail below): segmenting, mosaic construction, motion analysis, appearance analysis and ancillary data capture.

Segmenting comprises the process of dividing a continuous video stream into a plurality of segments, or scenes, where each scene comprises a plurality of frames, one of which is designated a "key frame."
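By way of illustration, this segmentation step can be sketched in a few lines of Python. The sketch below assumes grayscale frames supplied as NumPy arrays, and uses a simple intensity-histogram difference with a fixed threshold as the frame description and scene-cut test; both are illustrative stand-ins for the frame descriptions and threshold test of the FIG. 2 routine, not the patent's exact measures.

```python
import numpy as np

def segment_scenes(frames, threshold=0.4):
    """Divide a frame sequence into scenes at large frame-to-frame
    description differences; the first frame of each scene serves as
    its key frame (an illustrative choice)."""
    scenes, current, prev_hist = [], [], None
    for i, frame in enumerate(frames):
        hist, _ = np.histogram(frame, bins=64, range=(0, 256))
        hist = hist / max(hist.sum(), 1)        # simple frame description
        if prev_hist is not None and np.abs(hist - prev_hist).sum() > threshold:
            scenes.append({"key_frame": current[0], "frames": current})
            current = []                        # scene cut detected
        current.append(i)
        prev_hist = hist
    if current:
        scenes.append({"key_frame": current[0], "frames": current})
    return scenes
```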
Mosaic construction comprises the process of computing, for a given scene or video segment, a variety of "mosaic" representations and associated frame coordinate transforms, such as background mosaics, synopsis mosaics, depth layers, parallax maps, frame-mosaic coordinate transforms, and frame-reference image coordinate transforms. For example, in one mosaic representation a single mosaic is constructed to represent the background scenery in a scene, while individual frames in the scene include only foreground information that is related to the mosaic by an affine or a projective transformation. Thus, the 2D mosaic representation efficiently utilizes memory by storing the background information of a scene only once.
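A minimal sketch of background mosaic construction follows, assuming the frame-to-mosaic coordinate transforms have already been recovered by the motion analysis described below; the (matrix, offset) representation, the output averaging, and the mosaic size are assumptions made for brevity (a per-pixel median across frames, for example, would suppress foreground objects more robustly than a mean).

```python
import numpy as np
from scipy.ndimage import affine_transform

def background_mosaic(frames, transforms, mosaic_shape):
    """Accumulate affine-warped frames into a single background mosaic.
    `transforms` holds one (matrix, offset) pair per frame, mapping
    mosaic coordinates back into frame coordinates (assumed precomputed)."""
    acc = np.zeros(mosaic_shape)
    weight = np.zeros(mosaic_shape)
    for frame, (matrix, offset) in zip(frames, transforms):
        warped = affine_transform(frame.astype(float), matrix, offset=offset,
                                  output_shape=mosaic_shape, order=1,
                                  cval=np.nan)
        mask = ~np.isnan(warped)                # pixels this frame covers
        acc[mask] += warped[mask]
        weight[mask] += 1.0
    return acc / np.maximum(weight, 1.0)        # average of contributions
```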
Motion analysis comprises the process of computing, for a given scene or video segment, a description of the scene or video segment in terms of: (1) layers of motion and structure corresponding to objects, surfaces and structures at different depths and orientations; (2) independently moving objects; (3) foreground and background layer representations; and (4) parametric and parallax/depth representations for layers, object trajectories and camera motion. This analysis in particular leads to the creation of the associated mosaic representations for the foreground, background and other layers in the scene/segment.
Appearance analysis is the process of computing, for a frame or a layer (e.g., background, depth) of a scene or video segment, content-related attribute information such as color or texture descriptors represented as a collection of feature vectors.
Ancillary data capture comprises the process of capturing, through ancillary data streams (time, sensor data, telemetry) or manual entry, ancillary data related to some or all of the scenes or video segments.
Part of the invention is the selective use of the above-mentioned video processing steps to provide a comprehensive method of representing video information in a manner facilitating indexing of the video information. That is, the video information may be represented using some or all of the above-mentioned video processing steps, and each video processing step may be implemented in a more or less complex manner. Thus, the invention provides a comprehensive, yet flexible method of representing video for indexing that may be adapted to many different applications.

For example, a network newscast application may be adequately represented as a 2D mosaic formed using a motion analysis processing step that only separates a background layer (i.e., the news set) from a foreground object (i.e., the anchorperson). A more complex example is the representation of a baseball game as multiple layers, such as a cloud layer, a field layer and a player layer. Factors including the complexity of a scene, the type of camera motion for the scene, and the critical (or non-critical) nature of the scene content may be used as guides in determining the appropriate representation level of the scene.
FIG. 1 is a high level block diagram of a video information processing system 100 according to the invention. The video information processing system 100 comprises three functional subsystems: an authoring sub-system, an access sub-system and a distribution sub-system. The three functional subsystems non-exclusively utilize various functional blocks within the video information processing system 100. Each of the three sub-systems will be described in more detail below, and with respect to the various drawings. Briefly, the authoring sub-system 120, 140 is used to generate and store a representation of pertinent aspects of raw video information and, specifically, to logically segment, analyze and efficiently represent raw video information to produce a video information database having properties that facilitate a plurality of access techniques. The access sub-system 130, 125, 150 is used to access the video information database according to access techniques such as textual or visual indexing and attribute query techniques, dynamic browsing techniques and other iterative and relational information retrieval techniques. The distribution sub-system 130, 160, 170 is used to process accessed video information to produce video information streams having properties that facilitate controllably accurate or appropriate information stream retrieval and compositing by a client. Client-side compositing comprises the steps necessary to retrieve specific information in a form sufficient to achieve a client-side purpose.
Video information processing system 100 receives a video signal S1 from a video signal source (not shown). The video signal S1 is coupled to an authoring sub-system 120 and an image vault 150. The authoring subsystem 120 processes the video signal S1 to produce a video information database 125 having properties that facilitate a plurality of access techniques. For example, the video representative information resulting from the previously-mentioned comprehensive representation steps (i.e., segmenting, mosaic construction, motion analysis, appearance analysis and ancillary data capture) is stored in video information database 125. Video information database 125, in response to a control C1 requesting, e.g., video frames or scenes substantially matching some or all of the stored video representative information, generates an output signal S4 that flexibly provides video information representation information satisfying the request.
The video information database 125 is optionally coupled to an ancillary information source 140. The ancillary information source is used to provide non-video information associated with the video information stored in the database 125. Such information may include, e.g., positional information identifying, e.g., camera positions used to produce particular video segments or scenes. Such information may also comprise annotations, both visual and audible, that, e.g., identify portions of one or more frames or scenes, or provide some commentary relevant to one or more frames or scenes.
The image vault 150, illustratively a disk array or server specifically designed to store and distribute video information, stores the video information carried by video signal S1. The image vault 150, in response to a control signal C2 requesting, e.g., a specific video program, generates a video output signal S5.
An access engine 130, illustratively a video-on-demand server, generates control signals C1 and C2 for controlling, respectively, the annotated video database 125 and the image vault 150. The access engine 130 also receives the video output signal S5 from the image vault 150, and the output signal S4 from the video information database 125. The access engine 130, in response to a control signal C3, illustratively a video browser request or a video server request, produces a signal S6.
The access engine 130 is coupled to one or more clients (170-1 through 170-n) via a distribution network 160, illustratively a cable television network or a telecommunications network. Each client is associated with a control signal path (C3-1 through C3-n) and a signal path (S6-1 through S6-n). Each client 170 includes a display 172 and a controller 174. The controller 174 is responsive to user input via an input device 175, illustratively a remote control unit or a keyboard. In operation, a client 170 provides, e.g., textual and/or visual browsing and query requests to the access engine 130. The access engine responsively utilizes information stored in the annotated video database 125 and the image vault 150 to produce the signal S6 responsive to the client request.
The authoring and access subsystems will first be described in a general manner with respect to the video information processing system 100 of FIG. 1. The distribution subsystem will then be described within the context of several embodiments of the invention. In describing the several embodiments of the invention, several differences in the implementation of the authoring and access subsystems with respect to the embodiments will be noted.
The inventors have recognized that the problems of video sequence segmentation and video sequence searching may be addressed by the use of a short, yet highly representative description of the contents of the images. This description is in the form of a low-dimensional vector of real-valued quantities defined by the inventors as a multi-dimensional feature vector (MDFV). The MDFV "descriptor" comprises a vector descriptor of a predetermined dimensionality that is representative of one or more attributes associated with an image. An MDFV is generated by subjecting an image to a predetermined set of digital filters, where each filter is tuned to a specific range of spatial frequencies and orientations. The filters, when taken together, cover a wide range of spatial frequencies and orientations. The respective output signals from the filters are converted into an energy representation by, e.g., integrating the squared modulus of the filtered image over the image region. The MDFV comprises these energy measures.
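As a concrete, simplified sketch, an MDFV can be computed with a small bank of derivative-of-Gaussian filters; the particular filters, scales and orientations below are assumptions for illustration, since the text only requires filters tuned to distinct spatial frequencies and orientations.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def mdfv(image, scales=(1.0, 2.0, 4.0)):
    """Multi-dimensional feature vector: one energy measure per filter,
    obtained by integrating the squared filter response over the image."""
    img = image.astype(float)
    features = []
    for sigma in scales:
        # First derivatives along each axis: oriented filters tuned to
        # a spatial-frequency band determined by sigma.
        for order in ((0, 1), (1, 0)):
            response = gaussian_filter(img, sigma=sigma, order=order)
            features.append(np.sum(response ** 2))   # energy measure
    return np.array(features)
```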
FIGS. 9 and 10 are, respectively, a flow diagram 900 and a high-level function diagram 1000 of an attribute generation method according to the invention. The method of FIG. 9 will be described with reference to FIG. 10. Specifically, the method 900 and function diagram 1000 are directed toward the processing of an input image I0 to produce attribute information (i.e., MDFVs) in the form of an attribute pyramid.

For the purposes of appearance-based indexing, two kinds of multi-dimensional features are computed: (1) features that capture distributions without capturing any spatial constraints; and (2) features that compute local appearance and are grouped together to capture the global spatial arrangement.
The first type of features that are computed do not preserve spatial constraints. As noted previously, the input video signal S1 is optionally divided into layers and moving objects. In particular, a layer may be the complete background scene or a portion of the background scene (with respect to objects deemed to be part of a foreground portion of the scene). For each of the layers (including potentially the complete background scene) a multi-dimensional statistical distribution is computed to capture the global appearance of the layer. Specific examples of these distributions are: (1) histograms of multi-dimensional color features chosen from a suitable space, such as Lab, YUV or RGB; and (2) histograms of multi-dimensional texture-like features, where each feature is the output of Gaussian and derivative and/or Gabor filters, and where each filter is defined for a specific orientation and scale. These filters, which are arranged individually or as filter banks, may be efficiently computed using pyramid techniques. Multi-dimensional histograms and, in particular, many one-dimensional histograms, are defined using the output of the filters (or filter banks) at each location in a scene layer. In particular, a collection of single-dimensional histograms, such as disclosed in the above-referenced U.S. application Ser. No. 08/511,258, may be used.
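A sketch of the first feature type follows: a joint color histogram computed over one layer. RGB with eight bins per channel is an illustrative assumption; the text equally allows Lab or YUV features and histograms of filter outputs.

```python
import numpy as np

def layer_color_histogram(rgb, layer_mask, bins=8):
    """Global-appearance descriptor for a layer: a joint RGB histogram
    computed over only those pixels belonging to the layer."""
    pixels = rgb[layer_mask].reshape(-1, 3).astype(float)
    hist, _ = np.histogramdd(pixels, bins=(bins,) * 3,
                             range=((0, 256),) * 3)
    return hist.ravel() / max(len(pixels), 1)   # normalize to a distribution
```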
The second type of features that are computed preserve the spatial arrangement of the features within a layer or an object. The following steps are followed to create this representation. First, the locations of distinctive features are computed. Second, multi-dimensional feature vectors are computed for each location.

The locations of distinctive features are those locations in the layer or object where the appearance has some saliency. The inventors define saliency as a local maximum response of a given feature with respect to spatial scale. For instance, if a corner-like feature is selected to define saliency, then a filter corresponding to a corner detector is computed at a collection of closely spaced spatial scales for the filter. The scale may also be defined using the levels of a feature pyramid. The response of the filter is computed at each spatial location and across multiple scales. Locations where the filter response is a maximum both with respect to scale and with respect to neighboring spatial locations are chosen as salient features.
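A sketch of this saliency test appears below. The determinant-of-Hessian response stands in for the corner detector (the text does not fix a particular one), and the scale set is an illustrative assumption; a point is kept only where the response peaks against its 3x3x3 neighborhood in both space and scale.

```python
import numpy as np
from scipy.ndimage import gaussian_filter, maximum_filter

def salient_points(image, scales=(1.0, 1.5, 2.25, 3.4)):
    """Return (row, col, scale) triples where a corner-like response is a
    local maximum with respect to both position and scale."""
    img = image.astype(float)
    responses = []
    for sigma in scales:
        # Determinant of the Hessian as an illustrative corner response,
        # scale-normalized by sigma**4.
        dxx = gaussian_filter(img, sigma, order=(0, 2))
        dyy = gaussian_filter(img, sigma, order=(2, 0))
        dxy = gaussian_filter(img, sigma, order=(1, 1))
        responses.append(sigma ** 4 * (dxx * dyy - dxy ** 2))
    stack = np.stack(responses)                  # axes: (scale, row, col)
    peaks = (maximum_filter(stack, size=3) == stack) & (stack > 0)
    s, r, c = np.nonzero(peaks)
    return [(int(r[i]), int(c[i]), scales[s[i]]) for i in range(len(s))]
```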
Multi-dimensional feature vectors are next computed at each salient location. That is, filter responses for filters at multiple scales and orientations are computed. These may be defined using Gaussian and derivative filters or Gabor filters. A collection of these filters that systematically sample the space of orientations and scales (within reasonable limits, for instance scale changes between 1/8 and 8, but in principle arbitrary) is computed. This collection at each of the salient points becomes the multi-dimensional feature representation for that point. For each layer and object, a collection of these features, along with their spatial locations, is stored in a database using a kd-tree (R-tree) like multi-dimensional data structure.
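The sketch below shows such a store, using SciPy's cKDTree as a stand-in for the kd-tree (R-tree) like structure named above; the class name and the default k are assumptions for illustration.

```python
import numpy as np
from scipy.spatial import cKDTree

class LayerFeatureIndex:
    """Per-layer store of (multi-dimensional feature vector, location)
    pairs, indexed for nearest-neighbor retrieval."""
    def __init__(self, feature_vectors, locations):
        self.tree = cKDTree(np.asarray(feature_vectors, dtype=float))
        self.locations = list(locations)

    def query(self, feature_vector, k=5):
        """Return the k stored features nearest the query, as
        (location, distance) pairs."""
        dists, idxs = self.tree.query(np.asarray(feature_vector, dtype=float),
                                      k=k)
        return [(self.locations[i], float(d))
                for i, d in zip(np.atleast_1d(idxs), np.atleast_1d(dists))]
```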
The attribute generation method 900 of FIG. 9 is entered at step 905, when an input frame is made available. At step 910 the input frame is retrieved, and at step 915 the input frame is subjected to a known pyramid processing step (e.g., decimation) to produce an image pyramid. In FIG. 10, the input frame is depicted as an input image I0, and the pyramid processing step produces an image pyramid comprising three image pyramid subbands, I1, I2 and I3. I1 is produced by, e.g., subsampling I0; I2 is produced by, e.g., subsampling I1; and I3 is produced by, e.g., subsampling I2. Since each subband of the image pyramid will be processed in the same manner, only the processing of subband I1 will be described in detail. Moreover, an image pyramid comprising any number of subbands may be used. A suitable pyramid generation method is described in commonly assigned and copending U.S. application Ser. No. 08/511,258, entitled METHOD AND APPARATUS FOR GENERATING IMAGE TEXTURES, filed Aug. 4, 1995, and incorporated herein by reference in its entirety.
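A sketch of this decimation step follows, assuming a Gaussian low-pass filter before each 2x subsampling (a standard anti-aliasing choice; the text itself only specifies subsampling).

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def image_pyramid(frame, levels=3):
    """Return [I0, I1, ..., I_levels]; each subband is a blurred,
    2x-subsampled copy of the level above it."""
    pyramid = [frame.astype(float)]
    for _ in range(levels):
        blurred = gaussian_filter(pyramid[-1], sigma=1.0)  # anti-alias
        pyramid.append(blurred[::2, ::2])                  # subsample
    return pyramid
```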
After generating an image pyramid (step 915), the attribute generation method 900 of FIG. 9 proceeds to step 920, where an attribute feature and an associated filtering scheme are selected, and to step 925, where N feature filters are used to filter each of the subbands of the image pyramid. In FIG. 10 the image subband I1 is coupled to a digital filter F1 comprising three subfilters f1-f3. Each of the three subfilters is tuned to a specific, narrow range of spatial frequencies and orientations. The type of filtering used, the number of filters used, and the range of each filter are adjusted to emphasize the type of attribute information produced. For example, the inventors have determined that color attributes are appropriately emphasized by using Gaussian filters, while texture attributes are appropriately emphasized by using oriented filters (i.e., filters looking for contrast information in differing pixel orientations). It must be noted that more or fewer than three sub-filters may be used, and that the filters may be of different types.
After filtering each of the image pyramid subbands (step 925), the attribute generation method 900 of FIG. 9 proceeds to step 930, where the filter output signals are rectified to remove any negative components. In FIG. 10, the output signal from each of the three subfilters f1-f3 of digital filter F1 is coupled to a respective subrectifier within a rectifier R1. The rectifier R1 removes negative terms by, e.g., squaring the respective output signals.

After rectifying each of the filter output signals (step 930), the attribute generation method 900 of FIG. 9 proceeds to step 935, where a feature map is generated for the attributes represented by each rectified filter output signal. In FIG. 10, feature map FM1 comprises three feature maps associated with, e.g., three spatial frequencies and orientations of subband image I1. The three feature maps are then integrated to produce a single attribute representation FM1' of subband image I1.
After generating the feature maps (step 935), the attribute generation method 900 of FIG. 9 proceeds to step 940, where the respective feature maps of each subband are integrated together in one or more integration operations to produce an attribute pyramid. In FIG. 10, the previously described processing of subband image I1 is performed for subband images I2 and I3 in substantially the same manner.

After producing the attribute pyramid related to a particular attribute (step 940), the routine 900 of FIG. 9 proceeds to step 945, where the attribute pyramid is stored, and to step 950, where a query is made as to whether any additional features of the image pyramid are to be examined. If the query at step 950 is affirmatively answered, then the routine 900 proceeds to step 920, where the next feature and its associated filter are selected. Steps 925-950 are then repeated. If the query at step 950 is negatively answered, then the routine 900 proceeds to step 955, where a query is made as to whether the next frame should be processed. If the query at step 955 is affirmatively answered, then the routine 900 proceeds to step 910, where the next frame is input. Steps 915-955 are then repeated. If the query at step 955 is negatively answered, then the routine 900 exits at step 960.
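Tying steps 915 through 945 together, the sketch below builds one attribute pyramid per frame, reusing the image_pyramid sketch given earlier. The derivative-of-Gaussian subfilters, the squaring rectifier, and the box-window local integration are illustrative assumptions standing in for the subfilters f1-f3, rectifier R1, and feature-map integration of FIG. 10.

```python
import numpy as np
from scipy.ndimage import gaussian_filter, uniform_filter

def attribute_pyramid(frame, levels=3, scales=(1.0, 2.0), window=9):
    """Attribute pyramid for one frame: per subband, the locally
    integrated sum of rectified feature-filter outputs."""
    out_levels = []
    for subband in image_pyramid(frame, levels):    # step 915: decimate
        feature_maps = []
        for sigma in scales:                        # steps 920/925: N filters
            for order in ((0, 1), (1, 0)):
                response = gaussian_filter(subband, sigma, order=order)
                rectified = response ** 2           # step 930: rectify
                # step 935: feature map via local integration
                feature_maps.append(uniform_filter(rectified, size=window))
        # step 940: integrate the subband's feature maps
        out_levels.append(np.stack(feature_maps).sum(axis=0))
    return out_levels                               # step 945: ready to store
```

In this form the per-frame attribute data is a handful of small arrays, consistent with the memory observation that follows.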
It is important to note that the attribute information generated using the above-described attribute generation method 900, 1000 occupies much less memory space than the video frame itself. Moreover, a plurality of such attributes stored in non-pyramid or pyramid form comprises an index to the underlying video information that may be efficiently accessed and searched, as will be described below.
The first functional subsystem of the video information processing system 100 of FIG. 1, the authoring sub-system 120, will now be described in detail. As previously noted, the authoring sub-system 120 is used to generate and store a representation of pertinent aspects of raw video information, such as information present in video signal S1. In the information processing system 100 of FIG. 1, the authoring subsystem 120 is implemented using three functional blocks: a video segmentor 122, an analysis engine 124 and a video information database 125. Specifically, the video segmentor 122 segments the video signal S1 into a plurality of logical segments, such as scenes, to produce a segmented video signal S2, including scene cut indicia. The analysis engine 124 analyzes one or more of a plurality of video information frames included within each segment (i.e., scene) in the segmented video signal S2 to produce an information stream S3. The information stream S3 couples, to an information database 125, information components generated by the analysis engine 124 that are used in the construction of the video information database 125. The video information database 125 may also include various annotations to the stored video information and ancillary information.

The segmentation, or "scene c
