`
`
`
`
`
`
`
`Exhibit B
`
`United States Patent No. 6,545,706
`
`
`
`
`
`
`
`
`Case 2:20-cv-02640-NGG-SIL Document 1-5 Filed 06/14/20 Page 2 of 24 PageID #: 28
US006545706B1
`
(12) United States Patent
Edwards et al.

(10) Patent No.: US 6,545,706 B1
(45) Date of Patent: Apr. 8, 2003
`
`(54) SYSTEM, METHOD AND ARTICLE OF
`MANUFACTURE FOR TRACKING A HEAD
`OF A CAMERA-GENERATED IMAGE OF A
`PERSON
`
(75) Inventors: Jeffrey L. Edwards, Palo Alto, CA (US); Katerina H. Nguyen, Palo Alto, CA (US)

(73) Assignee: Electric Planet, Inc., Palo Alto, CA (US)
`
(*) Notice: Subject to any disclaimer, the term of this patent is extended or adjusted under 35 U.S.C. 154(b) by 0 days.
`
(21) Appl. No.: 09/364,859

(22) Filed: Jul. 30, 1999
`
(51) Int. Cl.7 .................................. H04N 7/18
(52) U.S. Cl. .......................... 348/169; 348/170; 348/20
(58) Field of Search ................. 348/169-171, 348/19-21, 70-80
`
(56) References Cited

U.S. PATENT DOCUMENTS
`
4,843,568 A    6/1989  Krueger et al.
5,148,477 A    9/1992  Neely et al.
5,384,912 A    1/1995  Ogrinc et al.
5,454,043 A    9/1995  Freeman
5,469,536 A   11/1995  Blank
5,534,917 A    7/1996  MacDougall
5,548,659 A    8/1996  Okamoto
5,570,113 A   10/1996  Zetts
5,581,276 A   12/1996  Cipolla et al.
5,623,587 A    4/1997  Bulman
5,631,697 A    5/1997  Nishimura et al.
5,767,867 A    6/1998  Hu
5,781,198 A    7/1998  Korn
5,790,124 A    8/1998  Fischer et al.
5,802,220 A *  9/1998  Black et al. ............. 382/100
6,154,559 A * 11/2000  Beardsley ................ 340/576
6,301,370 B1 * 10/2001 Steffens et al. ........... 342/90
`
OTHER PUBLICATIONS

Crow, F. C., "Summed-Area Tables for Texture Mapping," Computer Graphics, vol. 18(3), 207-212, Jul. 1984.
Aggarwal, J. K., Cai, Q., "Human Motion Analysis: A Review," IEEE Nonrigid and Articulated Motion Workshop Proceedings, 90-102 (1997).
Huang, Chu-Lin, Wu, Ming-Shan, "A Model-based Complex Background Gesture Recognition System," IEEE International Conference on Systems, Man and Cybernetics, vol. 1, pp. 93-98, Oct. 1996.
Cortes, C., Vapnik, V., "Support-Vector Networks," Machine Learning, vol. 20, pp. 273-297 (1995).
Swain, M. J., Ballard, D. H., "Indexing Via Color Histograms," Third International Conference on Computer Vision, pp. 390-393, Dec. 1990.
Review: Game Boy Camera, Jul. 15, 1998, http://www.gameweek.com/reviews/july15/gbc.html.
Barbie PhotoDesigner w/Digital Camera, Box, http://www.actioned.com/ktkt0126.asp.
GMD Digital Media Lab: The Virtual Studio; http://viswiz.gmd.de/DML/vst/vst.html.

* cited by examiner
`
`Primary Examiner-Andy Rao
`(74) Attorney, Agent, or Firm-Van Pelt & Yi LLP
`
(57) ABSTRACT
`
A system, method and article of manufacture are provided for tracking a head portion of a person image in video images. Upon receiving video images, a first head tracking operation is executed for generating a first confidence value. Such first confidence value is representative of a confidence that a head portion of a person image in the video images is correctly located. Also executed is a second head tracking operation for generating a second confidence value representative of a confidence that the head portion of the person image in the video images is correctly located. The first confidence value and the second confidence value are then outputted. Subsequently, the depiction of the head portion of the person image in the video images is based on the first confidence value and the second confidence value.
`
`24 Claims, 15 Drawing Sheets
`
[Representative drawing: video images are input to a background subtraction head tracker 200 and a free form head tracker 202, both of which report to a mediator 204.]
`
`
`
[FIG. 1 (Sheet 1 of 15): schematic of an exemplary hardware configuration: CPU 110 on system bus 112, RAM 114, ROM 116, I/O adapter 118 with disk storage units 120, user interface adapter 122 serving keyboard 124, mouse 126, speaker 128, microphone 132, and camera 133, communication adapter 134 linking to network 135, and display adapter 136 driving display device 138.]
`
`
`
[FIG. 2 (Sheet 2 of 15): video images are input to a background subtraction head tracker 200 and a free form head tracker 202, both of which report to a mediator 204.]
`
`
`
[FIG. 3 (Sheet 3 of 15), first head tracking operation: image and time are input; get foreground 300 using the background model 302; scene parser 304; find head for each person 306; output head confidence.]
`
`
`
[FIG. 4 (Sheet 4 of 15), scene parsing operation 304: receive subtracted image 400; filter to create mass distribution 402; threshold elimination 410; if history is available, update person(s) data based upon frame differencing and stored history; otherwise pick best mass as person 416; store 418. FIG. 4A: mass distribution 404 with curve 406 and peaks 408.]
`
`
`
[FIG. 5 (Sheet 5 of 15), operation 306: from foreground pixels, generate y histogram 500; search for head/torso separation based on the histogram 504; search for head top 506; search for left/right sides 508; if location and size are not similar to history, change the bounding box to be consistent with history; determine confidence of the head bounding box 514; update history if confidence is above threshold 516; end.]
`
`
`
[FIG. 5A (Sheet 6 of 15): y-axis histogram 501 with the head/torso point of separation 502.]
`
`
`
[FIG. 6 (Sheet 7 of 15), second head tracking operation: image and time are input to a head capture routine 600, comprising skin detection and head motion detection 606 feeding a head verifier, and to a head tracker routine 602, comprising a color follower 612 and a motion follower 610 feeding a head verifier 616.]
`
`
`
[FIG. 7 (Sheet 8 of 15), skin detection: extract flesh map 702 to obtain a raw flesh map; median filter; form regions; fill holes 710; extract regions; combine regions 714; generate hypothesis 716; evaluate hypothesis 718. FIGS. 7A-7D: a person image, the raw flesh map, the flesh map after the fill holes operation 710, and the flesh map after the combine regions operation 714.]
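As a rough illustration of the extract flesh map step (702), skin pixels are often isolated by bounding normalized red/green chromaticity; the bounds below are illustrative assumptions, not values disclosed by the patent.

```python
import numpy as np

def extract_flesh_map(frame: np.ndarray) -> np.ndarray:
    """Mark pixels whose normalized r-g chromaticity falls inside a
    loose skin-tone box, yielding a raw flesh map (cf. FIG. 7B).

    frame: H x W x 3 uint8 RGB image."""
    rgb = frame.astype(np.float64) + 1e-6
    total = rgb.sum(axis=2)
    r = rgb[..., 0] / total
    g = rgb[..., 1] / total
    # Illustrative chromaticity bounds for skin tones (an assumption).
    return (0.35 < r) & (r < 0.55) & (0.25 < g) & (g < 0.40)
```

A real implementation would follow this with the median filtering, hole filling, and region combining steps of FIG. 7.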
`
`
`
[FIG. 8 (Sheet 9 of 15), generate hypothesis operation 716: generate a score for each region 800; compute scores for each possible combination of regions 802; pick the combination with the best score 804.]
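The generate hypothesis step (716) exhaustively scores combinations of candidate flesh regions; a minimal sketch, with a caller-supplied scoring function standing in for whatever measure the patent contemplates:

```python
from itertools import combinations

def best_hypothesis(regions, score):
    """Score every non-empty combination of candidate flesh regions
    (operation 802) and return the combination with the best score
    (operation 804). `score` maps a tuple of regions to a number."""
    candidates = [c for n in range(1, len(regions) + 1)
                  for c in combinations(regions, n)]
    return max(candidates, key=score)
```

Exhaustive search is fine here because the number of flesh regions in a frame is small; with many regions a greedy or beam strategy would be needed.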
`
`
`
[FIG. 9 (Sheet 10 of 15), motion detection operation 606: from image and time, generate motion map 900; convert into summed area table 902; generate X, Y histograms 904; determine number of intersecting objects 906; determine head for each object 908; output head bounding box confidence.]
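Operation 902's summed-area table (after Crow, 1984, cited above) lets any rectangular sum over the motion map be read in constant time; a compact sketch:

```python
import numpy as np

def summed_area_table(motion_map: np.ndarray) -> np.ndarray:
    """Cumulative 2-D sum: sat[y, x] = sum of motion_map[:y+1, :x+1]."""
    return motion_map.cumsum(axis=0).cumsum(axis=1)

def box_sum(sat: np.ndarray, y0: int, x0: int, y1: int, x1: int) -> int:
    """Sum of the inclusive rectangle (y0, x0)-(y1, x1) using four
    table lookups instead of re-scanning the pixels."""
    total = sat[y1, x1]
    if y0 > 0:
        total -= sat[y0 - 1, x1]
    if x0 > 0:
        total -= sat[y1, x0 - 1]
    if y0 > 0 and x0 > 0:
        total += sat[y0 - 1, x0 - 1]
    return int(total)
```

This is what makes scoring many candidate head rectangles over the motion map cheap: each rectangle costs four lookups regardless of its size.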
`
`
`
[FIG. 10 (Sheet 11 of 15), color follower: at capture time, the verified head rectangle and the current image are used to extract an image sub-window 1000 and form a color look-up table 1006; at tracking time, the current image and the previous head rectangle set up a rectangular search grid 1008; perform search 1016; smooth the resulting map 1018; find best head estimate 1020.]
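The color follower's search (operations 1008-1020) amounts to scoring candidate head positions by how well local color statistics match a model built at capture time. A simplified sketch using one grayscale histogram per patch; the patent's per-pixel RGB model is richer, and the names and bin count here are illustrative assumptions:

```python
import numpy as np

def color_histogram(pixels: np.ndarray, bins: int = 8) -> np.ndarray:
    """Normalized intensity histogram for a patch, standing in for
    the color look-up table of operation 1006."""
    hist, _ = np.histogram(pixels, bins=bins, range=(0, 256))
    return hist / max(hist.sum(), 1)

def best_grid_point(image, model, grid, win):
    """Score each search-grid point by histogram intersection
    (cf. Swain and Ballard, cited above) and return the best one."""
    def score(pt):
        y, x = pt
        patch = image[y:y + win, x:x + win]
        return np.minimum(color_histogram(patch), model).sum()
    return max(grid, key=score)
```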
`
`
`
[FIGS. 10A-10C (Sheet 12 of 15): the image sub-window of operation 1000, the RGB histogram of operation 1006, and a previous verified head rectangle with the search grid generated from it.]
`
`
[FIG. 11 (Sheet 13 of 15), perform search operation 1016: get search grid point 1110; generate a 3-D histogram for each point of the search grid 1112; compare the 3-D histogram to the color model 1114; generate a score based on the comparison 1116; repeat until all grid points are processed.]
`
`
`
[FIG. 12 (Sheet 14 of 15), feedback process: generate R, G map 1200; find "best fit" oval 1202; fill in oval 1206.]
`
`
`
[FIG. 13 (Sheet 15 of 15), motion follower operation 610: from image and time, get history 1301; determine search region 1300; smooth motion histograms 1302; look for head shape 1308; output head bounding box and confidence 1310.]
`
`
`
SYSTEM, METHOD AND ARTICLE OF MANUFACTURE FOR TRACKING A HEAD OF A CAMERA-GENERATED IMAGE OF A PERSON

CROSS-REFERENCE TO RELATED APPLICATIONS

This application is related to a U.S. patent application filed Jul. 30, 1999 with the title "SYSTEM, METHOD AND ARTICLE OF MANUFACTURE FOR DETECTING COLLISIONS BETWEEN VIDEO IMAGES GENERATED BY A CAMERA AND AN OBJECT DEPICTED ON A DISPLAY" and Katerina H. Nguyen listed as inventor; a U.S. patent application filed Oct. 15, 1997 under U.S. Ser. No. 08/951,083 with the title "A SYSTEM AND METHOD FOR PROVIDING A JOINT FOR AN ANIMATABLE CHARACTER FOR DISPLAY VIA A COMPUTER SYSTEM"; and a U.S. patent application filed Jul. 30, 1999 with the title "WEB BASED VIDEO ENHANCEMENT APPARATUS, METHOD, AND ARTICLE OF MANUFACTURE" and Subutai Ahmad and Jonathan Cohen listed as inventors, which are all incorporated herein by reference in their entirety.

BACKGROUND OF THE INVENTION

1. The Field of the Invention

The present invention relates to displaying video images generated by a camera on a display, and more particularly to tracking a head portion of a person image in camera-generated video images.

2. The Relevant Art

It is common for personal computers to be equipped with a camera for receiving video images as input. Conventionally, such camera is directed toward a user of the personal computer so as to allow the user to view himself or herself on a display of the personal computer during use. To this end, the user is permitted to view real-time images that can be used for various purposes.

One purpose for use of a personal computer-mounted camera is to display an interaction between camera-generated video images and objects generated by the personal computer and depicted on the associated display. In order to afford this interaction, a current position of the user image must be identified. This includes identifying a current position of the body parts of the user image, including the head. Identification of an exact current location of the user image and his or her body parts is critical for affording accurate and realistic interaction with objects in the virtual computer-generated environment. In particular, it is important to track a head portion of the user image since this specific body part is often the focus of the most attention.

Many difficulties arise, however, during the process of identifying the current position of the head portion of the user image. It is often very difficult to discern the head portion when relying on a single technique. For example, when identifying the location of a head portion using shape, color, motion, etc., portions of the background image and the remaining body parts of the user image may be confused with the head. For example, a flesh coloring of a hand may be mistaken for features of the head.

SUMMARY OF THE INVENTION

A system, method and article of manufacture are provided for tracking a head portion of a person image in video images. Upon receiving video images, a first head tracking operation is executed for generating a first confidence value. Such first confidence value is representative of a confidence that a head portion of a person image in the video images is correctly located. Also executed is a second head tracking operation for generating a second confidence value representative of a confidence that the head portion of the person image in the video images is correctly located. The first confidence value and the second confidence value are then outputted. Subsequently, the depiction of the head portion of the person image in the video images is based on the first confidence value and the second confidence value.

In one embodiment of the present invention, the first head tracking operation begins with subtracting a background image from the video images in order to extract the person image. Further, a mass-distribution histogram may be generated that represents the extracted person image. A point of separation is then identified between a torso portion of the person image and the head portion of the person image.

Next, the first head tracking operation continues by identifying a top of the head portion of the person image. This may be accomplished by performing a search upwardly from the point of separation between the torso portion and the head portion of the person image. Subsequently, sides of the head portion of the person image are also identified. As an option, the first head tracking operation may track the head portion of the person image in the video images using previous video images including the head portion of the person image.

In one embodiment, the second head tracking operation may begin by identifying an initial location of the head portion of the person image in the video images. Thereafter, a current location of the head portion of the person image may be tracked starting at the initial location. As an option, the initial location of the head portion of the person image may be identified upon each instance that the second confidence value falls below a predetermined amount. By this feature, the tracking is "restarted" when the confidence is low that the head is being tracked correctly. This ensures improved accuracy during tracking.

As an option, the initial location of the head portion of the person image may be identified based on the detection of a skin color in the video images. This may be accomplished by extracting a flesh map; filtering the flesh map; identifying distinct regions of flesh color on the flesh map; ranking the regions of flesh color on the flesh map; and selecting at least one of the regions of flesh color as the initial location of the head portion of the person image based on the ranking. During such procedure, holes in the regions of flesh color on the flesh map may be filled. Further, the regions of flesh color on the flesh map may be combined upon meeting a predetermined criteria.

In a similar manner, the current location of the head portion of the person image may be tracked based on the detection of a skin color in the video images. Such technique includes extracting a sub-window of the head portion of the person image in the video images; forming a color model based on the sub-window; searching the video images for a color similar to the color model; and estimating the current location of the head portion of the person image based on the search.

In one embodiment, the module that identifies the initial location of the head portion of the person image and the module that identifies the current location of the head portion of the person image may work together. In particular, while tracking the current location of the head portion of the person image, a flesh map may be obtained. Thereafter, the
`
flesh map may be used during subsequent identification of an initial location of the head portion of the person image when the associated confidence level drops below the predetermined amount.

Similar to using the skin color, the initial location of the head portion of the person image may also be identified based on the detection of motion in the video images. Such identification is achieved by creating a motion distribution map from the video images; generating a histogram based on the motion distribution map; identifying areas of motion using the histogram; and selecting at least one of the areas of motion as being the initial location of the head portion of the person image.
Similarly, the current location of the head portion of the person image may be tracked based on the detection of motion in the video images. This may be accomplished by determining a search window based on a previous location of the head portion of the person image; creating a motion distribution map within the search window; generating a histogram based on the motion distribution map; identifying areas of motion using the histogram; and selecting at least one of the areas of motion as being the current location of the head portion of the person image.

These and other aspects and advantages of the present invention will become more apparent when the Description below is read in conjunction with the accompanying Drawings.
`
BRIEF DESCRIPTION OF THE DRAWINGS

The present invention will be readily understood by the following detailed description in conjunction with the accompanying drawings, with like reference numerals designating like elements.

FIG. 1 is a schematic diagram illustrating an exemplary hardware implementation in accordance with one embodiment of the present invention;

FIG. 2 illustrates a flowchart of a process for tracking a head portion of a person image in camera-generated video images in accordance with one embodiment of the present invention;

FIG. 3 shows a flow chart for a first head tracking operation that tracks a head portion of a person image in camera-generated video images using background subtraction in accordance with one embodiment of the present invention;

FIG. 4 illustrates a flow chart for a process of the present invention which carries out the scene parsing operation 304 of FIG. 3;

FIG. 5 illustrates a flow chart for a process of the present invention which carries out operation 306 of FIG. 3;

FIG. 5A is an illustration of a y-axis histogram generated in operation 500 shown in FIG. 5;

FIG. 6 shows a flow chart for a second head tracking operation that tracks a head portion of a person image in camera-generated video images using capture and tracker routines in accordance with one embodiment of the present invention;

FIG. 7 shows a flow chart for a process of the present invention associated with the skin detection operation 604 of FIG. 6;

FIG. 7A illustrates a person image of the video images, as inputted into the extract flesh map operation 702 of FIG. 7;

FIG. 7B illustrates a raw flesh map, as outputted from the extract flesh map operation 702 of FIG. 7;

FIG. 7C illustrates a flesh map, as outputted from the fill holes operation 710 of FIG. 7;

FIG. 7D illustrates a flesh map, as outputted from the combine regions operation 714 of FIG. 7;

FIG. 8 illustrates a flow chart for a process of the present invention associated with the generate hypothesis operation 716 of FIG. 7;

FIG. 9 shows a flow chart for a process of the present invention associated with the motion detection operation 606 of FIG. 6;

FIG. 10 shows a flow chart for a process of the present invention associated with the color follower operation 604 of FIG. 6;

FIG. 10A illustrates a sub-window of the present invention associated with operation 1000 of FIG. 10;

FIG. 10B shows an RGB histogram of the present invention outputted for each pixel within the image sub-window of FIG. 10A as a result of operation 1006 of FIG. 10;

FIG. 10C is an illustration of a previous verified head rectangle and a search grid generated therefrom in operation 1008 of FIG. 10;

FIG. 11 shows a flow chart for a process of the present invention associated with the perform search operation 1016 of FIG. 10;

FIG. 11A shows the search grid and the areas involved with the process of FIG. 11;

FIG. 12 illustrates a flow chart for a process of the present invention associated with a feedback process between the color follower operation 612 and the skin detection operation 604 of FIG. 6; and

FIG. 13 shows a flow chart for a process of the present invention associated with the motion follower operation 610 of FIG. 6.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS

The present invention affords a technique for tracking a head portion of a person image in camera-generated video images. This is accomplished using at least two head tracking operations that each track the head portion of the person image in camera-generated video images. In addition, each head tracking operation further generates a confidence value that is indicative of a certainty that the head portion of the person image is being tracked correctly. This information may be used by an associated application for depicting an interaction between the head and a virtual computer-generated environment.

FIG. 1 shows an exemplary hardware configuration in accordance with one embodiment of the present invention where a central processing unit 110, such as a microprocessor, and a number of other units are interconnected via a system bus 112. The hardware configuration shown in FIG. 1 includes Random Access Memory (RAM) 114, Read Only Memory (ROM) 116, an I/O adapter 118 for connecting peripheral devices such as disk storage units 120 to the bus 112, a user interface adapter 122 for connecting a keyboard 124, a mouse 126, a speaker 128, a microphone 132, a camera 133 and/or other user interface devices to the bus 112, a communication adapter 134 for connecting the hardware configuration to a communication network 135 (e.g., a data processing network) and a display adapter 136 for connecting the bus 112 to a display device 138.

The hardware configuration typically has resident thereon an operating system such as the Microsoft Windows NT or
`
`
`
Windows/98/2000 Operating System (OS), the IBM OS/2 operating system, the MAC OS, or UNIX operating system. Those skilled in the art will appreciate that the present invention may also be implemented on platforms and operating systems other than those mentioned. For example, a game system such as a SONY PLAYSTATION or the like may be employed. Yet another example includes an application specific integrated circuit (ASIC) or any other type of hardware logic that is capable of executing the processes of the present invention. Further, in one embodiment, the various processes employed by the present invention may be implemented using C++ programming language or the like.
FIG. 2 illustrates a flowchart of a process for tracking a head portion of a person image in camera-generated video images in accordance with one embodiment of the present invention. As shown, upon receiving video images generated by a camera, a first head tracking operation 200 is executed for generating a first confidence value. It should be noted that the video images may be generated by the camera at any time and not necessarily immediately before being received by the head tracking operation. Further, the video images may be partly computer enhanced or completely computer generated per the desires of the user.
The first confidence value generated by the first head tracking operation is representative of a confidence that a head portion of a person image in the camera-generated video images is located. Also executed is a second head tracking operation 202 for generating a second confidence value representative of a confidence that the head portion of the person image in the camera-generated video images is located.
The first confidence value and the second confidence value may then be made available for use by various applications in operation 204. Such applications may decide whether the head portion of the person image has moved based on the confidence values. Logic such as an AND operation, an OR operation, or any other more sophisticated logic may be employed to decide whether the results of the first head tracking operation and/or the second head tracking operation are indicative of true head movement.

For example, if at least one of the head tracking operations indicates a high confidence of head movement, it may be decided to assume that the head has moved. On the other hand, if both head tracking operations indicate a medium confidence of movement, it may be assumed with similar certainty that the head has moved. If it is decided to assume that the head has moved, an interaction may be shown between the video images generated by the camera and the virtual computer-generated environment.
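The mediation logic just described can be sketched as follows; the confidence scale, thresholds, and function name are illustrative assumptions rather than anything the patent specifies.

```python
def mediate(conf_a: float, conf_b: float,
            high: float = 0.8, medium: float = 0.5) -> bool:
    """Decide whether the head has truly moved, given the confidence
    values reported by the two head trackers (each in [0, 1]).

    Mirrors the example logic above: accept if either tracker is
    highly confident (an OR), or if both are at least moderately
    confident (an AND)."""
    if conf_a >= high or conf_b >= high:
        return True
    return conf_a >= medium and conf_b >= medium
```

A real mediator (operation 204) could just as well weight the two trackers unequally or smooth decisions over several frames.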
FIG. 3 shows a flow chart for a process associated with the first head tracking operation 200. In use, the first head tracking operation 200 tracks a head portion of a person image in camera-generated video images using background subtraction. As shown, in operation 300, the first head tracking operation begins by obtaining a foreground by subtracting a background image from the video images generated by the camera. This may be accomplished by first storing the background image, or model 302, without the presence of the person image. Then, a difference may be found between a current image and the background image. More information on the background model and background subtraction may be found in a patent application entitled "METHOD AND APPARATUS FOR MODEL-BASED COMPOSITING" filed Oct. 15, 1997 under application Ser. No. 08/951,089 which is incorporated herein by reference in its entirety.
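Operation 300 amounts to per-pixel differencing against the stored background model; the sketch below, using NumPy and an assumed fixed threshold, is one plausible reading, not the compositing method incorporated by reference.

```python
import numpy as np

def foreground_mask(frame: np.ndarray, background: np.ndarray,
                    threshold: int = 30) -> np.ndarray:
    """Return a boolean mask marking pixels that differ from the
    stored background model (operation 300 / model 302).

    frame and background are H x W x 3 uint8 images; a pixel is
    foreground when any channel differs by more than `threshold`."""
    diff = np.abs(frame.astype(np.int16) - background.astype(np.int16))
    return (diff > threshold).any(axis=2)
```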
`
Next, in operation 304, a "scene parsing" process is carried out which identifies a location and a number of person images in the video images. This is accomplished by utilizing a person image, or foreground mask(s), that is generated by the background subtraction carried out in operation 300 of FIG. 3. Additional information will be set forth regarding the "scene parsing" process with reference to FIG. 4. Finally, the head portion is found for each person image in operation 306, which will be set forth in greater detail with reference to FIG. 5.
FIG. 4 illustrates a flow chart for a process of the present invention which carries out the scene parsing operation 304 of FIG. 3. As shown, in operation 400, the subtracted image, or foreground mask(s), is first received as a result of the background subtraction operation 300 of FIG. 3. Next, in operation 402, the foreground mask(s) is filtered using a conventional median filter to create a mass distribution map.

FIG. 4A is an illustration of a mass distribution 404 used in the scene parsing process of FIG. 4. As shown, the mass distribution 404 indicates a number of pixels, or a pixel density, along the horizontal axis of the display that do not represent the background image. In the mass distribution 404 of FIG. 4A, a curve 406 of the mass distribution 404 has a plurality of peaks 408 which represent high concentrations of pixels along the horizontal axis that do not correspond to the background image and, possibly, a person image or other objects.
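The mass distribution of FIG. 4A can be read as a column-wise sum of the foreground mask; the helpers below (names assumed for illustration) compute that profile and pick out its peaks.

```python
import numpy as np

def mass_distribution(mask: np.ndarray) -> np.ndarray:
    """Count foreground pixels in each image column (curve 406)."""
    return mask.sum(axis=0)

def find_peaks(dist: np.ndarray, threshold: int) -> list[int]:
    """Columns that exceed `threshold` and are local maxima, i.e.
    candidate person locations (peaks 408, after the threshold
    elimination of operation 410)."""
    return [i for i in range(len(dist))
            if dist[i] > threshold
            and dist[i] >= dist[max(i - 1, 0)]
            and dist[i] >= dist[min(i + 1, len(dist) - 1)]]
```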
With continuing reference to FIG. 4, in operation 410, portions of the mass distribution 404 are eliminated if they do not surpass a predetermined threshold. This ensures that small peaks 408 of the curve 406 of the mass distribution 404 having a low probability of being a person image are eliminated. Next, it is then determined whether a previous mass distribution 404, or history, is available in memory. Note decision 412.

If a history is available, the location and number of person images in the video images are identified based on a frame difference between the peaks 408 of a previous mass distribution and the peaks 408 of the current mass distribution 404, as indicated in operation 414.

On the other hand, if the history is not available in decision 412, the peaks 408 of the current mass distribution 404 are considered person images in operation 416. In any case, the location and number of person images that are assumed based on the peaks 408 of the mass distribution 404 are stored in operation 418. Further information may be found regarding scene parsing and locating person images in the video images in a U.S. patent application filed Jul. 30, 1999 with the title "SYSTEM, METHOD AND ARTICLE OF MANUFACTURE FOR DETECTING COLLISIONS BETWEEN VIDEO IMAGES GENERATED BY A CAMERA AND AN OBJECT DEPICTED ON A DISPLAY" which is incorporated herein by reference in its entirety.
Once the person image(s) have been located in the video images generated by the camera, it is then required that the head portion of each person image be located.
FIG. 5 illustrates a flow chart for a process of the present invention which carries out operation 306 of FIG. 3. Such process starts in operation 500 by generating a mass-distribution histogram that represents the extracted person image. FIG. 5A is an illustration of the histogram 501 generated in operation 500 shown in FIG. 5. For reasons that will soon become apparent, it is important that the histogram be formed along a y-axis.

With continuing reference to FIG. 5, a point of separation 502 (see FIG. 5A) is then identified in operation 504
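A y-axis histogram of foreground pixels tends to show a narrow band at the neck between the wider head and torso masses; the sketch below finds such a point of separation (502). The scanning heuristic is an assumption for illustration, not the patent's exact search.

```python
import numpy as np

def point_of_separation(mask: np.ndarray) -> int:
    """Given a boolean foreground mask (H x W), build the y-axis
    histogram (row-wise foreground counts, histogram 501) and return
    the row of the head/torso separation: the narrowest row between
    the topmost foreground row and the widest (torso) row."""
    rows = mask.sum(axis=1)
    occupied = np.nonzero(rows)[0]
    top = occupied[0]                   # top of the person
    torso = int(rows.argmax())          # widest row, assumed torso
    between = rows[top:torso + 1]
    return int(top + between.argmin())  # narrowest row = neck
```

From this row, the search for the head top (operation 506) would proceed upward, and the left/right sides (operation 508) would be found within the band above the separation.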
`
`
`
in the head tracker routine 602 in operation 614 in order to verify that the current location of the head portion has been identified after reviewing the detected parameters, e.g., motion, skin color, etc. Again, such verifier routine is commonly known to those of ordinary skill. Further details regarding such operation may be found with reference to "Indexing Via Color Histograms", by M. J. Swain and D. H. Ballard, in Proceedings of 1990 International Conf. on Computer Vision, pp. 390-393, which is incorporated herein by refer