`Holzrichter
`
`[54]
`
METHODS AND APPARATUS FOR NON-ACOUSTIC SPEECH CHARACTERIZATION AND RECOGNITION
`
`[75]
`
`Inventor: John F. Holzrichter, Berkeley, Calif.
`
`[73]
`
`Assignee: The Regents of the University of
`California, Oakland, Calif.
`
[21] Appl. No.: 08/597,596

[22] Filed: Feb. 6, 1996
`
[51] Int. Cl.6 ........................................................ G10L 3/02
[52] U.S. Cl. .......................... 704/208; 704/205; 704/206; 704/207
`[58] Field of Search ..................................... 395/2.1, 2.16,
`395/2.27, 2.37, 2.67, 2.17, 2.15, 2.14; 704/205-208,
`201, 218, 228, 258
`
`[56]
`
`References Cited
`
`U.S. PATENT DOCUMENTS
`
2,193,102   3/1940  Koch ............................... 250/6
2,539,594   1/1951  Rines et al. ...................... 250/17
2,823,365   2/1958  Rines .............................. 340/6
3,555,188   1/1971  Meacham ......................... 381/115
3,699,856  10/1972  Chabot et al. ..................... 95/1.1
3,925,774  12/1975  Amlung .......................... 340/258
4,027,303   5/1977  Neuwirth et al. ................. 340/258
4,092,493   5/1978  Rabiner et al. .................... 179/1
4,260,229   4/1981  Bloomstein ....................... 352/50
4,461,025   7/1984  Franklin .......................... 381/56
4,621,348  11/1986  Tender ........................... 367/116
4,769,845   9/1988  Nakamura ......................... 381/43
4,783,803  11/1988  Baker et al. ...................... 381/42
4,803,729   2/1989  Baker ............................. 381/43
4,882,746  11/1989  Shimada .......................... 455/462
4,903,305   2/1990  Gillick et al. .................... 381/41
4,914,703   4/1990  Gillick ........................... 381/43
5,008,941   4/1991  Sejnoha ........................... 381/43
5,027,406   6/1991  Roberts et al. .................... 381/43
5,030,956   7/1991  Murphy ............................ 342/22
5,127,055   6/1992  Larkey ............................ 381/43
5,202,952   4/1993  Gillick et al. ..................... 395/2
5,227,797   7/1993  Murphy ............................ 342/22
5,280,563   1/1994  Ganong ............................. 395/2
5,337,394   8/1994  Sejnoha .......................... 395/2.5
5,345,471   9/1994  McEwan ............................. 375/1
5,361,070  11/1994  McEwan ............................ 342/21
5,386,492   1/1995  Wilson et al. ................... 395/2.61
5,388,183   2/1995  Lynch ........................... 395/2.51
5,390,278   2/1995  Gupta et al. .................... 395/2.52
5,428,707   6/1995  Gould et al. ..................... 395/2.4
5,573,012  11/1996  McEwan ........................... 128/782

US006006175A
[11] Patent Number: 6,006,175
[45] Date of Patent: Dec. 21, 1999
`
`OTHER PUBLICATIONS
`
Rabiner, L. R. "Applications of Voice Processing to Telecommunications", Proc. of the IEEE, 82(2), 199-228 (Feb. 1994).
Skolnik, M. I. (ed.) "Radar Handbook 2nd ed.", McGraw-Hill, page v (1990).
Waynant, R. W. and Ediger, M. N. (eds.) "Electro-Optics Handbook", McGraw-Hill, p. 24.22 (1994).
Flanagan, J. L. "Speech Analysis, Synthesis and Perception", Academic Press NY, pp. 8, 16-20, 154-156 (1965).
Coker, C. H. "A Model of Articulatory Dynamics and Control", Proc. IEEE, 64(4), 452-459 (1976).
Javkin, H. et al. "Multi-Parameter Speech Training System", Speech and Language Technology for Disabled Persons, Proceedings of a European Speech Communication Association (ESCA) Workshop, Stockholm, Sweden, 137-140 (May 31, 1993).
`
`(List continued on next page.)
`
`Primary Examiner-Tariq R. Hafiz
`Attorney, Agent, or Firm-John P. Wooldridge
`
`[57]
`
`ABSTRACT
`
`By simultaneously recording EM wave reflections and
`acoustic speech information, the positions and velocities of
`the speech organs as speech is articulated can be defined for
`each acoustic speech unit. Well defined time frames and
`feature vectors describing the speech, to the degree required,
`can be formed. Such feature vectors can uniquely charac(cid:173)
`terize the speech unit being articulated each time frame. The
`onset of speech, rejection of external noise, vocalized pitch
`periods, articulator conditions, accurate timing, the identi(cid:173)
`fication of the speaker, acoustic speech unit recognition, and
`organ mechanical parameters can be determined.
`
`48 Claims, 31 Drawing Sheets
`
`Page 1 of 63
`
`GOOGLE EXHIBIT 1022
`
`
`
`
OTHER PUBLICATIONS
`
Papcun, G. et al. "Inferring articulation and recognizing gestures from acoustics with a neural network trained on x-ray microbeam data", J. Acoust. Soc. Am. 92(2), 688-700 (Aug. 1992).
Olive, J. P. et al. "Acoustics of American English Speech", Springer-Verlag, pp. 79-80 (1993).
Hirose, H. and Gay, T. "The Activity of the Intrinsic Laryngeal Muscles in Voicing Control", Phonetica 25, 140-164 (1972).
Tuller, B. et al. "An evaluation of an alternating magnetic field device for monitoring tongue movements", J. Acoust. Soc. Am. 88(2), 674-679 (Aug. 1990).
Gersho, A. "Advances in Speech and Audio Compression", Proc. of the IEEE 82(6), 900-918 (1994).
Schroeter, J. and Sondhi, M. M. "Techniques for Estimating Vocal-Tract Shapes from the Speech Signal", IEEE Trans. on Speech and Audio Processing 2(1), Part II, 133-150 (Jan. 1994).
Atal, B. S. and Hanauer, S. L. "Speech Analysis and Synthesis by Linear Prediction of the Speech Wave", J. Acoust. Soc. Am. 50(2), Part II, 637-655 (1971).
Furui, S. "Cepstral Analysis Technique for Automatic Speaker Verification", IEEE Trans. on Acoustics, Speech, and Signal Processing, ASSP-29(2), 254-272 (1981).
Rabiner, L. and Juang, B.-H. "Fundamentals of Speech Recognition", Prentice Hall, pp. 436-438, 494 (1993).
`
`
`
`
U.S. Patent    Dec. 21, 1999    6,006,175

[Drawing sheets 1-14 of 31; the figures are images, and only the following captions and labels are recoverable from the scan:]

Sheet 1 of 31: FIG. 1 (block diagram; mirrored labels include MICROPHONES, ANALYSIS, MEMORY, FEATURE VECTORS).
Sheet 2 of 31: FIG. 2 (vocal-tract anatomy; mirrored labels include TONGUE, NOSE/NASAL, CAVITY, VELUM, PHARYNX, TONGUE TIP, VOCAL FOLDS, SENSOR).
Sheet 3 of 31: FIG. 3 — block diagram: EM/vocal folds sensor, EM/tongue sensor, EM/lips sensor, other sensors (e.g. TV, airflow), and acoustic microphone feed vocal system models and EM-acoustic processing; feature vectors are formed for vocal folds, tongue, lips, and acoustics, matched against a stored feature vector library by pattern matching, then passed to word assembly, sentence assembly, and memory, display, transmission, etc.
Sheet 4 of 31: FIG. 4 — vocal-tract sketch; labels: velum, velum closed, pharynx, lips, vocal folds.
Sheet 5 of 31: FIG. 5 — system diagram: non-acoustic speech sensors and conventional microphone; simple keypad commands for fast control; conventional keyboard and mouse; speech synthesizer and code book; acoustic speech recognizer and code book; foreign language identifier and translator; speaker identification unit; acoustic recognition code book; word, syntax, spelling, grammar and sentence generator; telephone or coded telecommunication link (wire, optic, wireless) to other systems; non-acoustic speech recognition processor; control unit and general processor; video camera; auxiliary equipment interface unit; headphones for rehearsal feedback; loudspeaker; video terminal.
Sheet 6 of 31: FIG. 6 — signal vs. time (s); FIG. 7 — signal vs. time for three repetitions of the word "SALINE".
Sheet 7 of 31: FIG. 8A — transmitter and dipole antenna geometry (mirrored labels include TRANSMITTER, DIPOLE ANTENNA, DIRECTION of propagation).
Sheets 8-10 of 31: further figure panels (captions not recoverable from the scan).
Sheet 11 of 31: FIGS. 10A-10C — reflected-amplitude vs. time (nsec) traces; labels: reference time, front of face, lips, teeth closed, tongue tip-palate contact, velum open to nose, normal pharynx, velum with no reflection.
Sheet 12 of 31: FIGS. 11A-11D — timing diagrams: round-trip time from transmitter to lips and back to receiver approx. 1.0 nsec; range gate on time 0.1 nsec; total time for 30 pulse bins is about 3 nsec; total time for 1000 pulses per bin and 30 bins is about 15 milliseconds; labels: front of face, teeth, tongue, velum, lips.
Sheet 13 of 31: FIG. 12 — EM sensor block diagram: EM sensor control unit (start, trigger), pulse generator, delay, switch, integrator, amplifier, A/D, memory, feature vectors, and processor; a microphone channel with A/D, memory, and feature vectors; a combiner; an "is it speech recognition?" decision feeding a speech recognition algorithm, or output to other applications.
Sheet 14 of 31: FIG. 13 — variant of the FIG. 12 sensor chain with a control unit issuing control signals, time memories, a microphone channel, a combiner, and the same speech-recognition decision path.
`
`
`
U.S. Patent    Dec. 21, 1999    Sheet 15 of 31    6,006,175

FIG. 14 — flowchart for combining acoustic and EM-sensor recognition results:

Acoustic Feature Vector Input → Does the conventional acoustic speech recognizer have a high probability of identification or low? (>97% = high; <97% = low)

Combined or selected EM-sensor feature Vector Input → Is the quality of the fit between the processed radar data and the decision filter high or is it low? (>97% = high; <97% = low)

Both High: Identification error is less than 0.1% confidence. Proceed with recognition. Both the acoustic recognizer and the EM based recognizer validate the identification, with total probability the combination of the separate probabilities.

EM data High & Acoustic Low: Check to see if ambiguity can be resolved, then proceed with recognition. Check library for quality of expected fit to EM based information, and for type of acoustic uncertainty. If the acoustic set is resolvable by EM data, let the EM data break the acoustic based "ambiguity". If not, find the combined probability and, if >97%, print; otherwise send a message to the operator that the word is uncertain.

EM sensor Low & Acoustic High: Check to see if ambiguity can be resolved, then proceed with recognition. Check library for the expectation that the acoustic identification be resolvable by EM-sensor data. If LOW, choose acoustic. If HIGH, check whether the combined probability is >97%. If <97%, print an uncertainty note to the operator but continue with post processing to use grammar and context to increase probability.

Both Low: Notify operator and proceed. Send a note to the operator with the poor word identification, or try another post processor to find the word from context.

Output to Control Unit for Action
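The branch logic shown in FIG. 14 can be sketched as a small routine. This is an illustrative sketch only, not the patent's implementation: the function name, the return strings, and the exact handling of the 97% threshold are assumptions beyond what the figure states.

```python
# Sketch of the FIG. 14 fusion logic: combine an acoustic recognizer's
# identification probability with an EM-sensor recognizer's fit quality.
# The 97% threshold follows the figure; everything else is hypothetical.

THRESHOLD = 0.97

def fuse(p_acoustic: float, p_em: float) -> str:
    """Return an action string for one candidate word identification."""
    acoustic_high = p_acoustic > THRESHOLD
    em_high = p_em > THRESHOLD
    if acoustic_high and em_high:
        return "accept"             # both validate: proceed with recognition
    if em_high and not acoustic_high:
        return "resolve-with-em"    # let EM data break the acoustic ambiguity
    if acoustic_high and not em_high:
        return "resolve-with-acoustic"
    return "flag-uncertain"         # both low: notify operator, use context
```

A fuller version would, per the figure, also consult the library for expected fit quality and fall back to the combined probability before flagging the word as uncertain.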
`
`
`
U.S. Patent    Dec. 21, 1999    6,006,175

[Drawing sheets 16-17 of 31; recoverable captions and labels:]

Sheet 16 of 31: FIGS. 15A-15D — acoustic signal vs. time (0 to 0.9 sec) for "the sound ahh ..." (FIGS. 15A, 15B) and spectra vs. frequency, 10^2 to 10^3 Hz (FIG. 15D).
Sheet 17 of 31: FIG. 16 — trace showing the jaw dropping about 0.2 s before "ohh" is voiced (time axis -0.5 to 0.5 sec); FIG. 17 — vocal fold contact vs. time (sec).
`
`
`
U.S. Patent    Dec. 21, 1999    Sheet 18 of 31    6,006,175

FIG. 18 — flowchart for detecting organ contact from EM feature vectors:

Test feature vector for speech frame n → Check all feature vector coefficients K where contact may occur → Does the K feature vector coefficient exceed test threshold m for contact to have occurred?
- NO: finished; go to next frame n+1.
- YES: Subtract the tongue feature coefficient K in frame n from the K coefficient in frame n-1. Does feature vector coefficient K for the tongue increase by more than δ?
- YES: Contact has occurred on frames n-1 through n at location K; key the feature vector notation for the contact test when table lookup occurs.
- NO: Increase K by 1 to check for contact at other locations.
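The FIG. 18 contact test can be sketched in a few lines: a coefficient that exceeds an absolute threshold and jumps by more than δ between consecutive frames is flagged as a contact event. The function name, the frame format (a flat list of coefficients), and the threshold values are assumptions for illustration.

```python
# Sketch of the FIG. 18 tongue-contact test. A feature-vector coefficient k
# indicates contact when it exceeds a threshold AND has increased by more
# than `delta` since the previous frame. Thresholds here are hypothetical.

def detect_contacts(prev_frame, frame, threshold=0.8, delta=0.3):
    """Return the coefficient indices where organ contact is inferred."""
    contacts = []
    for k, (prev_c, c) in enumerate(zip(prev_frame, frame)):
        if c > threshold and (c - prev_c) > delta:
            contacts.append(k)
    return contacts
```

For example, `detect_contacts([0.1, 0.4, 0.2], [0.15, 0.9, 0.25])` flags only coefficient index 1, where the value both exceeds the threshold and jumped sharply between frames.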
`
`
`
U.S. Patent    Dec. 21, 1999    Sheet 19 of 31    6,006,175

FIG. 19 — flowchart of the speech-frame acceptance and end-of-speech control algorithm. Recoverable box text from the scan:

- Check present frame of the acoustic speech sensor for acoustic level.
- Is there acoustic sound above threshold ε? (example unvoiced speech period: "s" in "sam", 0.2 sec; example voiced speech period: "am" in "sam", 0.5 sec)
- Is there motion in the Vocal Fold Filter pass band in frame ti? If YES: both acoustics and EM are present; set T = 0; store feature vector for vocalized speech frame ti and tell control to start or continue.
- If an acceptable acoustic signal is present without vocal fold motion, it must be unvoiced: set T = T + (ti - ti-1), store the feature vector for frame ti, and tell controller to start or continue.
- Is unvoiced timer T > 0.5 sec?
- Have other organs moved to form the present frame? If NO, treat as noise: delete frame information, mark as rejected, set T = T + (ti - ti-1), and send a noise message to the user.
- If organs have moved but acoustics are absent: send message to user that the acoustic signal is low; use data from the NASR system to form the feature vector and set a flag for no acoustic data; set T = T + (ti - ti-1).
- Is the change in acoustic energy, ΔE, from frame to frame below the set value?
- Are there 0.3 sec of vocal fold motion in the last 0.5 sec? If NO: delete frames in the last 0.5 sec with acoustics < ε, set T = 0, and send an end-of-speech signal to the user.
- Each branch returns to Control.
`
`
`
U.S. Patent    Dec. 21, 1999    6,006,175

[Drawing sheets 20-31 of 31; recoverable captions and labels:]

Sheet 20 of 31: FIGS. 20A-20B — signal traces vs. time (s).
Sheet 21 of 31: FIG. 21A — range-bin reflections with no speech: bin 6 in the no-speech condition is used for reference time for other reflections; teeth closed, tongue relaxed, velum open, pharynx open, vocal folds relaxed. FIG. 21B — bin numbers are adjusted on subsequent scans so that the strongest signal, e.g. in bin 6 (face), remains still; lips open but don't move forward; tongue tip lifts to palate behind teeth and makes contact; teeth open and reduce reflection; velum closes to direct air through mouth; pharynx closes a little; vocal folds unvoiced for "t".
Sheet 22 of 31: FIG. 21C — teeth slightly open; tongue tip makes contact; face is constant; lips slightly flatter, reflecting; velum closed, no reflection; pharynx slightly closed; vocal folds not moving, "t" is unvoiced. FIG. 22A — tongue tip drops behind teeth, tongue body also changes; teeth open more; lips move forward; velum opens for nasal "o"; vocal folds are voiced while speaking "o".
Sheet 23 of 31: FIG. 22B — tongue contact released to say "o"; teeth open more; face is constant; lips slightly forward; velum opens a little for nasal "o"; pharynx opens slightly for "o"; vocal folds closing and opening for phoneme "o" but not for "t".
Sheets 24-26 of 31: FIG. 23 and FIG. 24 series — response and distance traces vs. time.
Sheets 27-29 of 31: FIG. 25 series — sensor traces (mirrored labels include EM SENSOR and ACOUSTIC).
Sheet 30 of 31: FIGS. 26A-26D — traces vs. time (0 to 5 sec), amplitude about ±500.
Sheet 31 of 31: FIG. 27 — response (V) vs. time (sec), 0 to 5 sec, for the words "SIXTEEN" and "SIXTY".
`
`
`
METHODS AND APPARATUS FOR NON-ACOUSTIC SPEECH CHARACTERIZATION AND RECOGNITION
`
The United States Government has rights in this invention pursuant to Contract No. W-7405-ENG-48 between the United States Department of Energy and the University of California for the operation of Lawrence Livermore National Laboratory.
`
BACKGROUND OF THE INVENTION

The invention relates generally to speech recognition and more particularly to the use of nonacoustic information in combination with acoustic information for speech recognition and related speech technologies.

Speech Recognition

The development history of speech recognition (SR) technology has spanned four decades of intensive research. In the '50s, SR research was focused on isolated digits, monosyllabic words, speaker dependence, and phonetic-based attributes. Feature descriptions included a set of attributes like formants, pitch, voiced/unvoiced, energy, nasality, and frication, associated with each distinct phoneme. The numerical attributes of a set of such phonetic descriptions is called a feature vector. In the '60s, researchers addressed the problem that time intervals spanned by units like phonemes, syllables, or words are not maintained at fixed proportions of utterance duration, from one speaker to another or from one speaking rate to another. No adequate solution was found for aligning the sounds in time in such a way that statistical analysis could be used. Variability in phonetic articulation due to changes in speaker vocal organ positioning was found to be a key problem in speech recognition. Variability was in part due to sounds running together (often causing incomplete articulation), or half-way organ positioning between two sounds (often called coarticulation). Variability due to speaker differences was also very difficult to deal with. By the early '70s, the phonetic based approach was virtually abandoned because of the limited ability to solve the above problems. A much more efficient way to extract and store acoustic feature vectors, and relate acoustic patterns to underlying phonemic units and words, was needed.
In the 1970s, workers in the field showed that short "frames" (e.g., 10 ms intervals) of the time waveform could be well approximated by an all poles (but no zeros) analytic representation, using numerical "linear predictive coding" (LPC) coefficients found by solving covariance equations. Specific procedures are described in B. S. Atal and S. L. Hanauer, "Speech analysis and synthesis by linear prediction of the speech wave," J. Acoust. Soc. Am. 50(2), 637 (1971) and L. Rabiner, U.S. Pat. No. 4,092,493. Better coefficients for achieving accurate speech recognition were shown to be the Cepstral coefficients, e.g., S. Furui, "Cepstral analysis technique for automatic speaker verification," IEEE Trans. on Acoust. Speech and Signal Processing, ASSP-29(2), 254 (1981). They are Fourier coefficients of the expansion of the logarithm of the absolute value of the corresponding short time interval power spectrum. Cepstral coefficients effectively separate excitation effects of the vocal cords from resonant transfer functions of the vocal tract. They also capture the characteristic that human hearing responds to the logarithm of changes in the acoustic power, and not to linear changes. Cepstral coefficients are related directly to LPC coefficients. They provide a mathematically accurate method of approximation requiring only a small number of values. For example, 12 to 24 numbers are used as the component values of the feature vector for the measured speech time interval or "frame" of speech.
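As a concrete illustration of the coefficient chain described above, the sketch below computes LPC coefficients for one frame by the Levinson-Durbin recursion and converts them to cepstral coefficients with the standard LPC-to-cepstrum recursion. It is an illustrative sketch only, not the implementation of any cited system; the frame length, model order, and the synthetic chirp signal standing in for speech are assumptions.

```python
import numpy as np

def lpc(frame, order):
    """LPC polynomial a (with a[0] = 1) via Levinson-Durbin on the frame's autocorrelation."""
    r = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k = -acc / err                        # reflection coefficient
        a[1:i] = a[1:i] + k * a[i - 1:0:-1]   # update earlier coefficients
        a[i] = k
        err *= 1.0 - k * k                    # remaining prediction error
    return a

def lpc_to_cepstrum(a, n_ceps):
    """Cepstral coefficients derived directly from LPC via the standard recursion."""
    alpha = -a[1:]                            # prediction coefficients
    c = np.zeros(n_ceps)
    for n in range(1, n_ceps + 1):
        acc = alpha[n - 1] if n <= len(alpha) else 0.0
        for k in range(1, n):
            if n - k <= len(alpha):
                acc += (k / n) * c[k - 1] * alpha[n - k - 1]
        c[n - 1] = acc
    return c

# One hypothetical 10 ms frame at 16 kHz (160 samples); a chirp stands in
# for real speech so the example is self-contained.
frame = np.sin(0.002 * np.arange(160) ** 2)
feature_vector = lpc_to_cepstrum(lpc(frame, order=8), n_ceps=12)
```

The 12-element `feature_vector` plays the role of the 12 to 24 numbers per frame mentioned above.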
The extraction of acoustic feature vectors based on the LPC approach has been successful, but it has serious limitations. Its success relies on being able to simply find the best match of the unknown waveform feature vector to one stored in a library (also called a codebook) for a known sound or word. This process circumvented the need for a specific detailed description of phonetic attributes. The LPC-described waveform could represent a speech phoneme, where a phoneme is an elementary word-sound unit. There are 40 to 50 phonemes in American English, depending upon whose definition is used. However, the LPC information does not allow unambiguous determination of physiological conditions for vocal tract model constraints. For example, it does not allow accurate, unambiguous vocal fold on/off period measurements or pitch. Alternatively, the LPC representation could represent longer time intervals such as the entire period over which a word was articulated.
Vector "quantization" (VQ) techniques assisted in handling large variations in articulation of the same sound from a potentially large speaker population. This helped provide speaker independent recognition capability, but the speaker normalization problem was not completely solved, and remains an issue today. Automatic methods were developed to time align the same sound units when spoken at a different rate by the same or different speaker. One successful technique was the Dynamic Time Warping algorithm, which did a nonlinear time scaling of the feature coefficients. This provided a partial solution to the problem identified in the '60s as the nonuniform rate of speech.
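The Dynamic Time Warping idea mentioned above can be sketched as a dynamic-programming alignment of two feature-vector sequences of different lengths. The function name and the plain Euclidean frame distance are assumptions for illustration; practical recognizers of the period also imposed slope and band constraints on the warp path.

```python
# Minimal DTW sketch: the cumulative cost matrix allows frame repetition
# in either sequence, so the same sound spoken at different rates aligns
# with low cost. No path constraints (an assumption for brevity).
import numpy as np

def dtw_distance(seq_a, seq_b):
    """Minimal cumulative frame-to-frame distance between two sequences."""
    n, m = len(seq_a), len(seq_b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = np.linalg.norm(np.asarray(seq_a[i - 1]) - np.asarray(seq_b[j - 1]))
            cost[i, j] = d + min(cost[i - 1, j],      # stretch seq_b
                                 cost[i, j - 1],      # stretch seq_a
                                 cost[i - 1, j - 1])  # advance both
    return cost[n, m]
```

Two identical utterances align with zero cost even when one is spoken more slowly (i.e., has repeated frames), which is exactly the nonuniform-rate problem the algorithm addressed.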
For medium size vocabularies (e.g., about 500 words), it is acceptable to use the feature vectors for the several speech units in a single word as basic matching units. During the late 1970s, many commercial products became available on the market, permitting limited vocabulary recognition. However, word matching also required the knowledge of the beginning and the end of the word. Thus sophisticated end-point (and onset) detection algorithms were developed. In addition, purposeful insertion of pauses by the user between words simplified the problem for many applications. This approach is known as discrete speech. However, for a larger vocabulary (e.g., >1000 words), the matching library becomes large and unwieldy. In addition, discrete speech is unnatural for human communications, but continuous speech makes end-point detection difficult. Overcoming the difficulties of continuous speech with a large size vocabulary was a primary focus of speech recognition (SR) research in the '80s. To accomplish this, designers of SR systems found that the use of shorter sound units such as phonemes or PLUs (phone-like units) was preferable, because of the smaller number of units needed to describe human speech.
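The end-point detection referred to above can be sketched as a short-time-energy threshold with a silence "hangover". The frame length, threshold, and hangover count below are illustrative assumptions, not the parameters of any system of that period.

```python
# Sketch of energy-based end-point detection for discrete speech: a word
# begins when short-time frame energy rises above a threshold and ends
# after it stays below the threshold for `hangover` consecutive frames.
import numpy as np

def find_endpoints(samples, frame_len=160, threshold=0.01, hangover=3):
    """Return (start_frame, end_frame) of the first detected word, or None."""
    n_frames = len(samples) // frame_len
    energies = [float(np.mean(np.square(samples[i * frame_len:(i + 1) * frame_len])))
                for i in range(n_frames)]
    start = end = None
    quiet = 0
    for i, e in enumerate(energies):
        if start is None:
            if e > threshold:
                start = i             # onset: energy first exceeds threshold
        elif e > threshold:
            quiet = 0
            end = i                   # extend the word while energy persists
        else:
            quiet += 1
            if quiet >= hangover:     # enough trailing silence: word is over
                break
    if start is None:
        return None
    return (start, end if end is not None else start)
```

With 160-sample frames this corresponds to 10 ms frames at 16 kHz; the hangover keeps brief intra-word dips (e.g., stop closures) from splitting one word into two, which is one reason continuous speech makes end-pointing hard.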
In the '80s, a statistical pattern matching technique known as the Hidden Markov Model (HMM) was applied successfully in solving the problems associated with continuous speech and large vocabulary size. HMMs were constructed to first recognize the 50 phonemes, and to then recognize the words and word phrases based upon the pattern of phonemes. For each phoneme, a probability model is built during a learning phase, indicating the likelihood that a particular acoustic feature vector represents each particular phoneme. The acoustic system measures the qualities of each speaker during each time frame (e.g., 10 ms), software corrects for speaker rates, and forms Cepstral coefficients. In specific systems, other values such as total acoustic energy, differential Cepstral coefficients, pitch, and zero crossings
are measured and added as components with the Cepstral coefficients, to make a longer feature vector. By example, assume 10 Cepstral coefficients are extracted from a continuous sp