(12) United States Patent
Cohen et al.
`
`US006681031B2
`
(10) Patent No.: US 6,681,031 B2
(45) Date of Patent: *Jan. 20, 2004
`
`(54) GESTURE-CONTROLLED INTERFACES
`FOR SELF-SERVICE MACHINES AND
`OTHER APPLICATIONS
`
`
(75) Inventors: Charles J. Cohen, Ann Arbor, MI
`(US); Glenn Beach, Ypsilanti, MI (US);
`Brook Cavell, Ypsilanti, MI (US);
`Gene Foulk, Ann Arbor, MI (US);
`Charles J. Jacobus, Ann Arbor, MI
`(US); Jay Obermark, Ann Arbor, MI
`(US); George Paul, Ypsilanti, MI (US)
`
`(73) Assignee: Cybernet Systems Corporation, Ann
`Arbor, MI (US)
`
(*) Notice: This patent issued on a continued prosecution application filed under 37 CFR 1.53(d), and is subject to the twenty year patent term provisions of 35 U.S.C. 154(a)(2).

Subject to any disclaimer, the term of this patent is extended or adjusted under 35 U.S.C. 154(b) by 0 days.
`
(21) Appl. No.: 09/371,460
(22) Filed: Aug. 10, 1999
(65) Prior Publication Data
US 2003/0138130 A1, Jul. 24, 2003
`
(60) Provisional application No. 60/096,126, filed on Aug. 10, 1998.
(51) Int. Cl.7 .................................................. G06K 9/00
(52) U.S. Cl. ................. 382/103; 382/209; 701/45; 345/473; 345/474
(58) Field of Search ................. 382/103, 107, 168, 153, 154, 117, 118, 170, 181, 190, 209, 219, 276; 701/45; 348/169, 170, 171, 172

(56) References Cited

U.S. PATENT DOCUMENTS

5,047,952 A     9/1991  Kramer et al. ........... 364/513.5
5,423,554 A     6/1995  Davis ................... 273/437
5,454,043 A  *  9/1995  Freeman ................. 382/168
5,481,454 A     1/1996  Inoue et al. ............ 364/419
`
`
`OTHER PUBLICATIONS
`
`C. Cohen, G. Beach, G. Paul, J. Obermark, G. Foulk, "Issues
`of Controlling Public Kiosks and other Self Service
`Machines using Gesture Recognition," Oct. 1998.
`
`
Primary Examiner-Jayanti K. Patel
Assistant Examiner-Abolfazl Tabatabai
(74) Attorney, Agent, or Firm-Gifford, Krass, Groh, Sprinkle, Anderson & Citkowski, PC
`
(57) ABSTRACT
`
A gesture recognition interface for use in controlling self-service machines and other devices is disclosed. A gesture is defined as motions and kinematic poses generated by humans, animals, or machines. Specific body features are tracked, and static and motion gestures are interpreted. Motion gestures are defined as a family of parametrically delimited oscillatory motions, modeled as a linear-in-parameters dynamic system with added geometric constraints to allow for real-time recognition using a small amount of memory and processing time. A linear least squares method is preferably used to determine the parameters which represent each gesture. Feature position measure is used in conjunction with a bank of predictor bins seeded with the gesture parameters, and the system determines which bin best fits the observed motion. Recognizing static pose gestures is preferably performed by localizing the body/object from the rest of the image, describing that object, and identifying that description. The disclosure details methods for gesture recognition, as well as the overall architecture for using gesture recognition to control devices, including self-service machines.
`
17 Claims, 19 Drawing Sheets
`
[Representative drawing: Gesture Recognition System Flow Chart (Gesture Generation, Vision System, Gesture Recognition, Translator, Multimedia Interface, Device Control, Virtual World Interaction).]
`
`
U.S. PATENT DOCUMENTS
`
5,544,050 A     8/1996  Abe et al. .............. 364/419
5,563,988 A    10/1996  Maes et al. ............. 395/121
5,570,301 A    10/1996  Barrus .................. 364/559
5,581,276 A    12/1996  Cipolla et al. .......... 345/156
5,594,469 A     1/1997  Freeman ................. 345/158
5,612,719 A     3/1997  Beernink et al. ......... 345/173
5,652,849 A     7/1997  Conway et al. ........... 395/327
5,659,764 A     8/1997  Sakiyama et al. ......... 395/753
5,668,573 A     9/1997  Favot et al. ............ 345/156
5,670,987 A     9/1997  Doi et al. .............. 345/156
5,699,441 A    12/1997  Sagawa et al. ........... 382/100
5,710,833 A     1/1998  Moghaddam et al. ........ 382/228
5,714,698 A     2/1998  Tokioka et al. .......... 73/865.4
5,732,227 A     3/1998  Kuzunuki et al. ......... 395/333
5,757,360 A     5/1998  Nitta et al. ............ 345/156
5,759,044 A     6/1998  Redmond ................. 434/307 R
5,767,842 A     6/1998  Korth ................... 345/168
5,798,758 A     8/1998  Harada et al. ........... 345/339
5,801,704 A     9/1998  Oohara et al. ........... 345/358
5,813,406 A     9/1998  Kramer et al. ........... 128/782
5,828,779 A    10/1998  Maggioni ................ 382/165
5,864,808 A     1/1999  Ando et al. ............. 704/251
5,864,848 A     1/1999  Horvitz et al. .......... 707/6
5,875,257 A  *  2/1999  Marrin et al. ........... 382/107
5,880,411 A     3/1999  Gillespie et al. ........ 178/18.01
5,887,069 A     3/1999  Sakou et al. ............ 382/100
5,889,236 A     3/1999  Gillespie et al. ........ 178/18.01
5,889,523 A     3/1999  Wilcox et al. ........... 345/357
5,898,434 A     4/1999  Small et al. ............ 345/348
5,901,246 A     5/1999  Hoffberg et al. ......... 382/209
5,903,229 A     5/1999  Kishi ................... 341/20
5,907,328 A     5/1999  Brush II et al. ......... 345/358
5,907,852 A     5/1999  Yamada .................. 707/541
5,917,490 A     6/1999  Kuzunuki et al. ......... 345/351
5,990,865 A  * 11/1999  Gard .................... 345/156
6,035,053 A  *  3/2000  Yoshioka et al. ......... 382/104
6,137,908 A  * 10/2000  Rhee .................... 382/187
6,272,231 B1 *  8/2001  Maurer et al. ........... 382/103
6,301,370 B1 * 10/2001  Steffens et al. ......... 382/103
6,335,977 B1 *  1/2002  Kage .................... 382/107
`
OTHER PUBLICATIONS
`
L. Conway, C. Cohen, "Video Mirroring and Iconic Gestures: Enhancing Basic Videophones to Provide Visual Coaching and Visual Control," (no date available).
C. Cohen, L. Conway, D. Koditschek, G. Roston, "Dynamic System Representation of Basic and Non-Linear in Parameters Oscillatory Motion Gestures," Oct. 1997.
C. Cohen, L. Conway, D. Koditschek, "Dynamic System Representation, Generation, and Recognition of Basic Oscillatory Motion Gestures," Oct. 1996.
C. Cohen, G. Beach, B. Cavell, G. Foulk, J. Obermark, G. Paul, "The Control of Self Service Machines Using Gesture Recognition," Aug. 1999.
United States Air Force Instruction, "Aircraft Cockpit and Formation Flight Signals," May 1994.
U.S. Army Field Manual No. 21-60, Washington, D.C., Sep. 30, 1987.
Arnold, V.I., "Ordinary Differential Equations," MIT Press, 1978.
Cohen, C., "Dynamical System Representation, Generation and Recognition of Basic Oscillatory Motion Gestures and Applications for the Control of Actuated Mechanisms," Ph.D. Dissertation, Univ. of Michigan, 1996.
Frank, D., "HUD Expands Kiosk Program," Federal Computer Week, Mar. 8, 1999.
Hager, G., Chang, W., Morse, A., "Robot Feedback Control Based on Stereo Vision: Towards Calibration-Free Hand-Eye Coordination," IEEE Int. Conf. Robotics and Automation, San Diego, CA, May 1994.
Hauptmann, A., "Speech and Gestures for Graphic Image Manipulation," Computer Human Interaction 1989 Proc., pp. 241-245, May 1989.
Hirsch, M., Smale, S., "Differential Equations, Dynamical Systems and Linear Algebra," Academic Press, Orlando, FL, 1974.
Kanade, T., "Computer Recognition of Human Faces," Birkhauser Verlag, Basel and Stuttgart, 1977.
Karon, P., "Beating an Electronic Pathway to Government with Online Kiosks," Los Angeles Times, Aug. 25, 1996.
Link-Belt Construction Equipment Co., "Operating Safety: Cranes & Excavators," 1987.
Turk, M., Pentland, A., "Eigenfaces for Recognition," Journal of Cognitive Neuroscience, 3, 1, 71-86, 1991.
Narendra, K., Balakrishnan, J., "Improving Transient Response of Adaptive Control Systems Using Multiple Models and Switching," IEEE Trans. on Automatic Control, 39:1861-1866, Sep. 1994.
Rizzi, A., Whitcomb, L., Koditschek, D., "Distributed Real-Time Control of a Spatial Robot Juggler," IEEE Computer, 25(5), May 1992.
Wolf, C., Morrel-Samuels, P., "The use of hand-drawn gestures for text editing," Int. Journ. of Man-Machine Studies, vol. 27, pp. 91-102, 1987.
Wolf, C., Rhyne, J., "A Taxonomic Approach to Understanding Direct Manipulation," Jour. of the Human Factors Society 31st Annual Meeting, pp. 576-580.
Yuille, A., "Deformable Templates for Face Recognition," Journ. of Cognitive Neuroscience, 3, 1, 59-70, 1991.
`
`* cited by examiner
`
[Drawing Sheets 1-19: figure captions]

Figure 1: Gesture Recognition System.
Figure 2: Gesture Recognition System Flow Chart (Gesture Generation, Vision System, Gesture Recognition, Translator, Multimedia Interface, Device Control, Virtual World Interaction).
Figure 3: Signal Flow Diagram of the Gesture Recognition System (Gesture Creation G; Sensor Module S: x, y position, velocity, image data; Identification Module I: identified gesture; Transformation Module T: transformed command; Controlled System R: system response).
Figure 4: Example gestures, shown in two dimensions (large slow lines; clockwise and counter-clockwise large slow circles).
Figure 5: Three Example Gestures (slow: large slow circle; medium: large fast circle; fast: small fast circle).
Figure 6: An Example 24 Gesture Lexicon (clockwise and counter-clockwise circles, diagonal lines, x-lines, and y-lines, each large or small and slow or fast).
Figure 7: Slow Down Gesture.
Figure 8: Prepare to Move Gesture.
Figure 9: Attention Gesture.
Figure 10: Stop Gesture.
Figure 11: Right or Left Turn Gestures.
Figure 12: "Okay" Gesture.
Figure 13: Freeze Gesture.
Figure 14: Plots of a Human Created One Dimensional X-Line Oscillating Motion (the gesture performed in two-dimensional space; its time history; its two-dimensional phase space trajectory).
Figure 15: Possible Lines Associated with x(t,p)=p0+p1t and Their Equivalent Representation in the p Parameter Space.
Figure 16: Parameter Fitting: We Require a Rule for q to Bring the Error to Zero.
Figure 17: Plots of Different (xi,yi) Data Points that Result in a Different Best Fitting q Line.
Figure 18: The Recursive Linear Least Squares Method for Updating q with Each Additional (xi,yi) Data Point.
Figure 19: An Exaggerated Representation of the Residual Error Measurement (actual next state vs. states computed from the slow, medium, and fast prediction bins).
Figure 20: An Algorithm for Determining the Specific Gesture Model (plot lexicon gestures in the phase plane; "guess" appropriate models to match the plots; for each model, determine parameters for each gesture in the lexicon; test models using the total residual error calculation; select the model with the smallest total residual error).
Figure 21: The Worst Case Residual Ratios for Each Gesture Model; the Lower the Ratio, the Better the Model (model types: Linear with Offset Component, Van der Pol, Van der Pol with Drift Component, Higher Order Terms, Velocity Damping).
Figure 22: Two Perpendicular Oscillatory Line Motions Combined into a Circular Gesture.
Figure 23: Bounding Box Around Hand.
Figure 24: Descriptions from Bounding Box.
Figure 25: The Example Gestures (slow: large slow circle; medium: large fast circle; fast: small fast circle).
Figure 26: Schematic of the Hand Tracking System Hardware (color camera input).
Figure 27: Flowchart of the CTS (Capture New Image; Find Difference Image; Compute Moving Center; Compute Static Center; Display Target Center).
Figure 28: Graphical User Interface of the CTS.
Figure 29: Target Center from Difference Image.
Figure 30: Color Matching Technique.
Figure 31: Identification Module (dynamic and static gestures; which gesture?; screen display).
Figure 32: Simplified Diagram of the Dynamic Gesture Prediction Module (geometric information from the sensor module; minimum residual and bin number; threshold or null; specific overall gesture number).
`
`
`
`GESTURE-CONTROLLED INTERFACES
`FOR SELF-SERVICE MACHINES AND
`OTHER APPLICATIONS
`
`REFERENCE TO RELATED APPLICATIONS
`
This application claims priority of U.S. provisional patent application Ser. No. 60/096,126, filed Aug. 10, 1998, the entire contents of which are incorporated herein by reference.
`
`STATEMENT
`
This invention was made with Government support under contracts NAS9-98068 (awarded by NASA), DASW01-98-M-0791 (awarded by the U.S. Army), and F29601-98-C-0096 (awarded by the U.S. Air Force). The Government has certain rights in this invention.
`
`FIELD OF THE INVENTION
`
`This invention relates to person-machine interfaces and,
`in particular, to gesture-controlled interfaces for self-service
`machines and other applications.
`
BACKGROUND OF THE INVENTION

Gesture recognition has many advantages over other input means, such as the keyboard, mouse, speech recognition, and touch screen. The keyboard is a very open-ended input device and assumes that the user has at least a basic typing proficiency. The keyboard and mouse both contain moving parts, so extended use will lead to decreased performance as the device wears down. The keyboard, mouse, and touch screen all need direct physical contact between the user and the input device, which could cause the system performance to degrade as these contacts are exposed to the environment. Furthermore, there is the potential for abuse and damage from vandalism to any tactile interface which is exposed to the public.

Tactile interfaces can also lead to hygiene problems, in that the system may become unsanitary or unattractive to users, or performance may suffer. These effects would greatly diminish the usefulness of systems designed to target a wide range of users, such as advertising kiosks open to the general public. This cleanliness issue is very important for the touch screen, where the input device and the display are the same device. Therefore, when the input device is soiled, the effectiveness of both the input and the display decreases. Speech recognition is very limited in a noisy environment, such as sports arenas, convention halls, or even city streets. Speech recognition is also of limited use in situations where silence is crucial, such as certain military missions or library card catalog rooms.

Gesture recognition systems do not suffer from the problems listed above. There are no moving parts, so device wear is not an issue. Cameras, used to detect features for gesture recognition, can easily be built to withstand the elements and stress, and can also be made very small and used in a wider variety of locations. In a gesture system, there is no direct contact between the user and the device, so there is no hygiene problem. The gesture system requires no sound to be made or detected, so background noise level is not a factor. A gesture recognition system can control a number of devices through the implementation of a set of intuitive gestures. The gestures recognized by the system would be designed to be those that seem natural to users, thereby decreasing the learning time required. The system can also provide users with symbol pictures of useful gestures similar to those normally used in American Sign Language books. Simple tests can then be used to determine what gestures are truly intuitive for any given application.

For certain types of devices, gesture inputs are the more practical and intuitive choice. For example, when controlling a mobile robot, basic commands such as "come here", "go there", "increase speed", and "decrease speed" would be most efficiently expressed in the form of gestures. Certain environments also gain a practical benefit from using gestures. For example, certain military operations have situations where keyboards would be awkward to carry, or where silence is essential to mission success. In such situations, gestures might be the most effective and safe form of input.

A system using gesture recognition would be ideal as an input device for self-service machines (SSMs) such as public information kiosks and ticket dispensers. SSMs are rugged and secure cases approximately the size of a phone booth that contain a number of computer peripheral technologies to collect and dispense information and services. A typical SSM system includes a processor, input device(s) (including those listed above), and a video display. Many SSMs also contain a magnetic card reader, image/document scanner, and printer/form dispenser. The SSM system may or may not be connected to a host system or even the Internet.

The purpose of SSMs is to provide information without the traditional constraints of traveling to the source of information and being frustrated by limited manned office hours, or to dispense objects. One SSM can host several different applications providing access to a number of information/service providers. Eventually, SSMs could be the solution for providing access to the information contained on the World Wide Web to the majority of a population which currently has no means of accessing the Internet.

SSMs are based on PC technology and have a great deal of flexibility in gathering and providing information. In the next two years SSMs can be expected to follow the technology and price trends of PCs: as processors become faster and storage becomes cheaper, the capabilities of SSMs will also increase.

Currently SSMs are being used by corporations, governments, and colleges. Corporations use them for many purposes, such as displaying advertising (e.g., previews for a new movie), selling products (e.g., movie tickets and refreshments), and providing in-store directories. SSMs are deployed performing a variety of functions for federal, state, and municipal governments. These include providing motor vehicle registration, gift registries, employment information, near-real-time traffic data, information about available services, and tourism/special event information. Colleges use SSMs to display information about courses and campus life, including maps of the campus.
`
SUMMARY OF THE INVENTION

The subject invention resides in gesture recognition methods and apparatus. In the preferred embodiment, a gesture recognition system according to the invention is engineered for device control, and not as a human communication language. That is, the apparatus preferably recognizes commands for the expressed purpose of controlling a device such as a self-service machine, regardless of whether the gestures originated from a live or inanimate source. The system preferably recognizes not only static symbols, but dynamic gestures as well, since motion gestures are typically able to convey more information.

In terms of apparatus, a system according to the invention is preferably modular, and includes a gesture generator,
sensing system, modules for identification and transformation into a command, and a device response unit. At a high level, the flow of the system is as follows. Within the field of view of one or more standard video cameras, a gesture is made by a person or device. During the gesture-making process, a video image is captured, producing image data along with timing information. As the image data is produced, a feature-tracking algorithm is implemented which outputs position and time information. This position information is processed by static and dynamic gesture recognition algorithms. When the gesture is recognized, a command message corresponding to that gesture type is sent to the device to be controlled, which then performs the appropriate response.
The system only searches for static gestures when the motion is very slow (i.e., the norm of the x-, y-, and z-velocities is below a threshold amount). When this occurs, the system continually identifies a static gesture or outputs that no gesture was found. Static gestures are represented as geometric templates for commonly used commands such as Halt, Left/Right Turn, "OK," and Freeze. Language gestures, such as those of American Sign Language, can also be recognized. A file of recognized gestures, which lists named gestures along with their vector descriptions, is loaded at the initialization of the system. Static gesture recognition is then performed by identifying each new description. A simple nearest-neighbor metric is preferably used to choose an identification. In recognizing static human hand gestures, the image of the hand is preferably localized from the rest of the image to permit identification and classification. The edges of the image are preferably found with a Sobel operator. A box which tightly encloses the hand is also located to assist in the identification.
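By way of illustration only, the nearest-neighbor identification step described above might be realized as in the following sketch. The template vectors, the distance cutoff, and the velocity threshold are hypothetical values chosen for demonstration; they are not taken from the patent.

    import numpy as np

    # Hypothetical contents of the "file of recognized gestures": named static
    # gestures with vector descriptions (e.g., geometric features derived from
    # the bounding box around the hand). Values are illustrative only.
    GESTURE_TEMPLATES = {
        "halt":   np.array([0.90, 0.45, 0.12, 0.88]),
        "okay":   np.array([0.35, 0.80, 0.55, 0.20]),
        "freeze": np.array([0.10, 0.15, 0.95, 0.40]),
    }

    VELOCITY_THRESHOLD = 5.0  # assumed units (pixels/frame); tuned per system
    MATCH_CUTOFF = 0.5        # assumed maximum distance for a valid match

    def motion_is_slow(vx, vy, vz=0.0):
        """Gate for static recognition: the norm of the x-, y-, and
        z-velocities must fall below a threshold amount."""
        return np.linalg.norm([vx, vy, vz]) < VELOCITY_THRESHOLD

    def identify_static_gesture(description):
        """Simple nearest-neighbor metric over the loaded descriptions;
        returns the best-matching gesture name, or None if no template
        is close enough."""
        best_name = min(GESTURE_TEMPLATES,
                        key=lambda n: np.linalg.norm(description - GESTURE_TEMPLATES[n]))
        best_dist = np.linalg.norm(description - GESTURE_TEMPLATES[best_name])
        return best_name if best_dist < MATCH_CUTOFF else None

    if motion_is_slow(0.8, 0.4):
        print(identify_static_gesture(np.array([0.88, 0.47, 0.10, 0.90])))  # -> halt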
Dynamic (circular and skew) gestures are preferably treated as one-dimensional oscillatory motions. Recognition of higher-dimensional motions is achieved by independently recognizing multiple, simultaneously created one-dimensional motions. A circle, for example, is created by combining repeating motions in two dimensions that have the same magnitude and frequency of oscillation, but wherein the individual motions are ninety degrees out of phase. A diagonal line is another example. Distinct circular gestures are defined in terms of their frequency rate; that is, slow, medium, and fast.
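This decomposition can be checked numerically. In the short sketch below, the amplitude, frequency, and sample count are arbitrary assumptions; it simply verifies that two equal-magnitude, equal-frequency oscillations trace a circle when ninety degrees out of phase and a diagonal line when in phase.

    import numpy as np

    A, f = 1.0, 0.5                  # assumed amplitude and frequency (Hz)
    t = np.linspace(0.0, 2.0, 200)   # two seconds of samples
    w = 2.0 * np.pi * f

    x = A * np.cos(w * t)
    y_quadrature = A * np.sin(w * t)   # 90 degrees out of phase with x
    y_in_phase = A * np.cos(w * t)     # in phase with x

    # The out-of-phase pair lies on the circle x^2 + y^2 = A^2 ...
    assert np.allclose(x**2 + y_quadrature**2, A**2)
    # ... while the in-phase pair collapses onto the diagonal line y = x.
    assert np.allclose(y_in_phase, x)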
Additional dynamic gestures are derived by varying phase relationships. During the analysis of a particular gesture, the x and y minimum and maximum image-plane positions are computed. The z position is computed if the system is set up for three dimensions. If the x and y motions are out of phase, as in a circle, then when x or y is at a minimum or maximum, the velocity along the other axis is large. The direction (clockwiseness in two dimensions) of the motion is determined by looking at the sign of this velocity component. Similarly, if the x and y motions are in phase, then at these extremum points both velocities are small. Using clockwise and counter-clockwise circles, diagonal lines, one-dimensional lines, and small and large circles and lines, a twenty-four gesture lexicon was developed and is described herein. A similar method is used when the gesture is performed in three dimensions.
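The extremum test described in the preceding paragraph can be sketched as follows. The sampled trajectory, the "near zero" and "large" cutoffs, and the sign convention are illustrative assumptions rather than the patent's stated values.

    import numpy as np

    def clockwiseness(xs, ys, dt):
        """At samples where x is near an extremum (x-velocity near zero),
        examine the y-velocity: for out-of-phase motion it is large there,
        and its sign determines clockwise vs. counter-clockwise travel."""
        vx, vy = np.gradient(xs, dt), np.gradient(ys, dt)
        votes = []
        for i in range(len(xs)):
            if abs(vx[i]) < 0.1 * np.max(np.abs(vx)):        # x at an extremum
                if abs(vy[i]) > 0.5 * np.max(np.abs(vy)):    # out of phase: vy large
                    # Assumed convention: at the positive-x extremum, positive
                    # y-velocity means counter-clockwise rotation.
                    votes.append("ccw" if vy[i] * xs[i] > 0 else "cw")
        return max(set(votes), key=votes.count) if votes else "in phase (line)"

    t = np.linspace(0.0, 2.0 * np.pi, 400)
    print(clockwiseness(np.cos(t), np.sin(t), t[1] - t[0]))   # -> ccw
    print(clockwiseness(np.cos(t), -np.sin(t), t[1] - t[0]))  # -> cw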
An important aspect of the invention is the use of parameterization and predictor bins to determine a gesture's future position and velocity based upon its current state. The bin predictions are compared to the next position and velocity of each gesture, and the difference between the bin's prediction and the next gesture state is defined as the residual error. According to the invention, a bin predicting the future state of a gesture it represents will exhibit a smaller residual error than a bin predicting the future state of a gesture that it does not represent. For simple dynamic gesture applications, a linear-with-offset-component model is preferably used to discriminate between gestures. For more complex gestures, a variation of a velocity damping model is used.
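The bank-of-bins comparison can be expressed compactly in code. The sketch below is an illustration only: the bin parameters are made-up values, and the one-axis "linear with offset component" model form is an assumption consistent with the summary above, not a reproduction of the patent's equations.

    import numpy as np

    # Each predictor bin is seeded with parameters theta for an assumed
    # one-axis model: acceleration = theta[0]*position + theta[1].
    BINS = {
        "slow":   np.array([-1.0, 0.0]),
        "medium": np.array([-4.0, 0.0]),
        "fast":   np.array([-9.0, 0.0]),
    }

    def predict_next(theta, pos, vel, dt):
        """The bin's prediction of the next position and velocity (one
        Euler step of its model) from the current state."""
        acc = theta[0] * pos + theta[1]
        return pos + vel * dt, vel + acc * dt

    def best_bin(trajectory, dt):
        """Accumulate each bin's residual error, i.e. the difference between
        its prediction and the observed next state; the bin representing the
        gesture actually being performed exhibits the smallest residual."""
        residuals = dict.fromkeys(BINS, 0.0)
        for (p0, v0), (p1, v1) in zip(trajectory, trajectory[1:]):
            for name, theta in BINS.items():
                pp, vp = predict_next(theta, p0, v0, dt)
                residuals[name] += (p1 - pp) ** 2 + (v1 - vp) ** 2
        return min(residuals, key=residuals.get)

    # A synthetic "medium" oscillation x'' = -4x, sampled as (pos, vel) pairs.
    dt = 0.01
    ts = np.arange(0.0, 3.0, dt)
    traj = list(zip(np.cos(2 * ts), -2 * np.sin(2 * ts)))
    print(best_bin(traj, dt))  # -> medium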
`
BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a drawing of a gesture recognition system according to the invention;
FIG. 2 is a gesture recognition system flow chart;
FIG. 3 is a signal flow diagram of a gesture recognition system according to the invention;
FIG. 4 is a drawing which shows example gestures in two dimensions;
FIG. 5 shows three example gestures;
FIG. 6 is an example of a 24-gesture lexicon according to the invention;
FIG. 7 depicts a Slow-Down gesture;
FIG. 8 depicts a Move gesture;
FIG. 9 depicts an Attention gesture;
FIG. 10 depicts a Stop gesture;
FIG. 11 shows Right/Left Turn gestures;
FIG. 12 shows an "Okay" gesture;
FIG. 13 shows a Freeze gesture;
FIG. 14 provides three plots of a human-created one-dimensional X-line oscillating motion;
FIG. 15 shows possible lines associated with x(t,p)=p0+p1t and their equivalent representation in the p-parameter space;
FIG. 16 illustrates parameter fitting, wherein a rule is used for q to bring the error to zero;
FIG. 17 plots different (xi,yi) data points that result in a different best-fitting q line;
FIG. 18 depicts a recursive linear least squares method for updating q with each additional (xi,yi) data point;
FIG. 19 is an exaggerated representation of a residual error measurement;
FIG. 20 illustrates an algorithm for determining a specific gesture model according to the invention;
FIG. 21 is a plot which shows worst-case residual ratios for each gesture model, wherein the lower the ratio, the better the model;
FIG. 22 illustrates how two perpendicular oscillatory line motions may be combined into a circular gesture;
FIG. 23 shows how a bounding box may be placed around a hand associated with a gesture;
FIG. 24 provides descriptions from the bounding box of FIG. 23;
FIG. 25 shows example gestures;
FIG. 26 is a schematic of hand-tracking system hardware according to the invention;
FIG. 27 is a flowchart of a color tracking system (CTS) according to the invention;
FIG. 28 depicts a preferred graphical user interface of the CTS;
FIG. 29 illustrates the application of target-center-from-difference-image techniques;
FIG. 30 illustrates a color matching technique;
FIG. 31 is a representation of an identification module; and
FIG. 32 is a simplified diagram of a dynamic gesture prediction module according to the invention.
`
DETAILED DESCRIPTION OF THE INVENTION
FIG. 1 presents a system overview of a gesture-controlled self-service machine system according to the invention. FIG. 2 shows a flow chart representation of how a vision system views the created gesture, with the image data sent to the gesture recognition module, translated into a response, and then used to control an SSM, including the display of data, a virtual environment, and devices. The gesture recognition system takes the feature positions of the moving body parts (two- or three-dimensional space coordinates, plus a time stamp) as input, as quickly as the vision system can output the data, and outputs what gesture (if any) was recognized, again at the same rate as the vision system outputs data.

The specific components of the gesture recognition system are detailed in FIG. 3. These include five modules (a minimal pipeline sketch in code follows the list):

G: Gesture Generation
S: Sensing (vision)
I: Identification Module
T: Transformation
R: Response
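Purely as an architectural illustration, the five modules can be composed as below. The module internals are stubs and the gesture-to-command mapping is hypothetical; only the G, S, I, T, R decomposition comes from the patent.

    from dataclasses import dataclass
    from typing import Optional

    @dataclass
    class Feature:
        x: float   # image-plane position of the tracked body feature
        y: float
        t: float   # time stamp supplied with each sample

    class SensorModule:                     # S: sensing (vision), stubbed
        def sense(self) -> Feature:
            return Feature(0.0, 0.0, 0.0)   # a real system returns tracked positions

    class IdentificationModule:             # I: static/dynamic recognition, stubbed
        def identify(self, f: Feature) -> Optional[str]:
            return "attention"              # placeholder identified gesture

    class TransformationModule:             # T: identified gesture -> command
        COMMANDS = {"attention": "WAKE_DISPLAY"}   # hypothetical mapping
        def transform(self, gesture: str) -> str:
            return self.COMMANDS.get(gesture, "NO_OP")

    class ControlledSystem:                 # R: device response
        def respond(self, command: str) -> None:
            print(f"device executes: {command}")

    def run_once():
        s, i = SensorModule(), IdentificationModule()
        t, r = TransformationModule(), ControlledSystem()
        feature = s.sense()                  # position + time from the vision system
        gesture = i.identify(feature)        # identified gesture, if any
        if gesture is not None:
            r.respond(t.transform(gesture))  # transformed command drives the device

    run_once()  # -> device executes: WAKE_DISPLAY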
At a high level, the flow of the system is as follows. Within the field of view of one or more standard video cameras, a gesture is made by a person or device. During the gesture-making process, a video capture card is capturing images, producing image data along with timing information. As the image data is produced, the images are run through a feature tracking algorithm which outputs position and time information. This position information is processed by static and dynamic gesture recognition algorithms. When the gesture is recognized, a command message corresponding to that gesture type is sent to the device to be controlled, which then performs the appropriate response.

Dynamic gestures are one-dimensional oscillations, performed simultaneously in two or three dimensions. A circle is such a motion, created by combining repeating motions in two dimensions that have the same magnitude and frequency of oscillation, but with the individual motions ninety degrees out of phase. A "diagonal" line is another such motion. We have defined three distinct circular gestures in terms of their frequency rates: slow, medium, and fast. An example set of such gestures is shown in FIG. 4. These gestures can also be performed in three dimensions, and such more complex motions can be identified by this system.

The dynamic gestures are represented by a second-order equation, one for each axis:
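The equation itself is not legible in this copy. One plausible linear-in-parameters form, consistent with the "linear with offset component" model named in the summary (an editorial assumption for illustration, with the theta values being the gesture parameters determined by the least squares fit), is:

    % Assumed form: a second-order, linear-in-parameters model per axis.
    % The exact equation printed in the patent is not reproduced here.
    \ddot{x}(t) = \theta_1 x(t) + \theta_2, \qquad \ddot{y}(t) = \theta_3 y(t) + \theta_4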