Avendano
(10) Patent No.: US 8,194,880 B2
(45) Date of Patent: Jun. 5, 2012
`
`US008194880B2
`
(54) SYSTEM AND METHOD FOR UTILIZING
`OMNI-DIRECTIONAL MICROPHONES FOR
`SPEECH ENHANCEMENT
`
(75) Inventor: Carlos Avendano, Mountain View, CA
(US)
`
`(73) Assignee: Audience, Inc., Mountain View, CA
`(US)
`
(*) Notice: Subject to any disclaimer, the term of this
patent is extended or adjusted under 35
U.S.C. 154(b) by 1384 days.
`
`(21) Appl. No.: 11/699,732
`
(22) Filed: Jan. 29, 2007

(65) Prior Publication Data

US 2008/0019548 A1 Jan. 24, 2008
`
`Related U.S. Application Data
`
`(63) Continuation-in-part of application No. 11/343,524,
`filed on Jan. 30, 2006.
`
`(60) Provisional application No. 60/850,928, filed on Oct.
`10, 2006.
`
(51) Int. Cl.
H04R 3/00 (2006.01)
(52) U.S. Cl. ....... 381/92; 381/94.1; 381/94.2; 381/94.3;
381/94.7; 381/122; 704/226; 704/227; 704/233;
704/275
`
`(58) Field of Classification Search .................. 381/313,
`381/312, 91, 92, 122, 95, 110, 94.1, 94.2,
`381/94.3, 94.7; 704/226, 227, 233, 275
`See application file for complete search history.
`
`
[Representative drawing: audio source 102, primary microphone 106, secondary microphone 108.]
`
`
`Primary Examiner — Vivian Chin
`Assistant Examiner — Paul Kim
`
`(74) Attorney, Agent, or Firm — Carr & Ferrell LLP
`
`(57)
`
`ABSTRACT
`
Systems and methods for utilizing inter-microphone level
differences (ILD) to attenuate noise and enhance speech are
provided. In exemplary embodiments, primary and secondary
acoustic signals are received by omni-directional microphones,
and converted into primary and secondary electric
signals. A differential microphone array module processes
the electric signals to determine a cardioid primary signal and
a cardioid secondary signal. The cardioid signals are filtered
through a frequency analysis module which takes the signals
and mimics a cochlea implementation (i.e., cochlear domain).
Energy levels of the signals are then computed, and the results
are processed by an ILD module using a non-linear combination
to obtain the ILD. In exemplary embodiments, the
non-linear combination comprises dividing the energy level
associated with the primary microphone by the energy level
associated with the secondary microphone. The ILD is utilized
by a noise reduction system to enhance the speech of the
primary acoustic signal.
`
`28 Claims, 9 Drawing Sheets
`
`
`
`
`
`
[Representative drawing label: noise 110.]
`
`Amazon v. Jawbone
`U.S. Patent 8,321,213
`
`Amazon Ex. 1005
`
`
`
`
`
`
`
[Drawing sheet 1 of 9: FIG. 1a and FIG. 1b, showing audio source 102, audio device 104, primary microphone 106, secondary microphone 108, and noise 110.]
`
`
`
`
[Drawing sheet 2 of 9: FIG. 2, block diagram of audio device 104 with primary microphone 106, secondary microphone 108, a processor, audio processing engine 204, and output device 206.]
`
`
`
`
`
`
[Drawing sheet 3 of 9: FIG. 3, block diagram of the audio processing engine 204, including a DMA module, frequency analysis module, energy module, ILD module, noise reduction system 310, and frequency synthesis module, with output to the output device.]

[Drawing sheet 4 of 9: FIG. 4a, showing DMA module 302, frequency analysis module, energy module, ILD module, and noise reduction system 310.]
`
`
`
[Drawing sheet 5 of 9: FIG. 4b, exemplary implementation of the DMA module with delay lines and filters; the block labels are not recoverable from the scan.]
`
`
`
[Drawing sheet 6 of 9: FIG. 5, block diagram of an alternative embodiment showing DMA module 302, energy module, ILD module, and noise reduction system 310.]
`
`
`
`
`
`
[Drawing sheet 7 of 9: FIG. 6, polar plot of a front-to-back cardioid directivity pattern and ILD diagram.]
`
`
`
[Drawing sheet 8 of 9: FIG. 7, flowchart 700: Receive Audio Signals (702); Perform Differential Array Analysis (704); Perform Frequency Analysis (706); Compute Energy (708); Compute Inter-Microphone Level Difference (710); Perform Noise Reduction Processing (712); Output Audio Signal; End.]
`
`
`
[Drawing sheet 9 of 9: FIG. 8, flowchart of the noise reduction process: Estimate Noise (802); Estimate Filter (804); further steps 806, 808, and 810, whose labels are not recoverable from the scan.]
`
`
`
`SYSTEM AND METHOD FOR UTILIZING
`OMNI-DIRECTIONAL MICROPHONES FOR
`SPEECH ENHANCEMENT
`
`CROSS-REFERENCE TO RELATED
`APPLICATION
`
The present application claims the priority benefit of U.S.
Provisional Patent Application No. 60/850,928, filed Oct. 10,
2006, and entitled “Array Processing Technique for Producing
Long-Range ILD Cues with Omni-Directional Microphone
Pair;” the present application is also a continuation-in-part
of U.S. patent application Ser. No. 11/343,524, filed Jan.
30, 2006 and entitled “System and Method for Utilizing
Inter-Microphone Level Differences for Speech Enhancement,”
which claims the priority benefit of U.S. Provisional Patent
Application No. 60/756,826, filed Jan. 5, 2006, and entitled
“Inter-Microphone Level Difference Suppressor,” all of which
are herein incorporated by reference.
`
`BACKGROUND OF THE INVENTION
`
`1. Field of Invention
`
The present invention relates generally to audio processing
and more particularly to speech enhancement using
inter-microphone level differences.
`2. Description of Related Art
Currently, there are many methods for reducing background
noise and enhancing speech in an adverse environment.
One such method is to use two or more microphones on
an audio device. These microphones are in prescribed positions
and allow the audio device to determine a level difference
between the microphone signals. For example, due to a
space difference between the microphones, the difference in
times of arrival of the signals from a speech source to the
microphones may be utilized to localize the speech source.
Once localized, the signals can be spatially filtered to suppress
the noise originating from the different directions.
In order to take advantage of the level difference between
two omni-directional microphones, a speech source needs to
be closer to one of the microphones. That is, in order to obtain
a significant level difference, a distance from the source to a
first microphone needs to be shorter than a distance from the
source to a second microphone. As such, a speech source must
remain in relative closeness to the microphones, especially if
the microphones are in close proximity as may be required by
mobile telephony applications.
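The distance constraint described above can be illustrated with a minimal sketch. It assumes an ideal point source in a free field, where pressure amplitude falls off as 1/r; that model, the function name, and the example distances are illustrative assumptions and do not come from the patent text.

```python
import math


def level_difference_db(d_primary_m: float, d_secondary_m: float) -> float:
    """Level difference (dB) between two omni-directional microphones.

    Assumes an ideal point source in a free field, so pressure amplitude
    is proportional to 1/distance (an illustrative assumption).
    """
    if d_primary_m <= 0 or d_secondary_m <= 0:
        raise ValueError("distances must be positive")
    return 20.0 * math.log10(d_secondary_m / d_primary_m)


# Hypothetical geometry: microphones 10 cm apart along the line to the source.
near = level_difference_db(0.05, 0.15)  # source 5 cm from the primary microphone
far = level_difference_db(0.50, 0.60)   # source 50 cm from the primary microphone
```

Under these assumptions the nearby source yields about 9.5 dB of level difference, while the same microphone pair sees only about 1.6 dB once the source is 50 cm away, which is why the cue weakens as the talker moves away from closely spaced omni-directional microphones.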
A solution to the distance constraint may be obtained by
using directional microphones. Using directional microphones
allows a user to extend an effective level difference
between the two microphones over a larger range with a
narrow inter-level difference (ILD) beam. This may be desirable
for applications such as push-to-talk (PTT) or videophones
where a speech source is not in as close a proximity to
the microphones as in, for example, a telephone application.
Disadvantageously, directional microphones have numerous
physical drawbacks. Typically, directional microphones
are large in size and do not fit well in small telephones or
cellular phones. Additionally, directional microphones are
difficult to mount as they require ports in order for sounds to
arrive from a plurality of directions. Slight variations in
manufacturing may result in a mismatch, resulting in more
expensive manufacturing and production costs.
Therefore, it is desirable to utilize the characteristics of
directional microphones in a speech enhancement system,
without the disadvantages of using directional microphones
themselves.
`
`SUMMARY OF THE INVENTION
`
Embodiments of the present invention overcome or substantially
alleviate prior problems associated with noise suppression
and speech enhancement. In general, systems and
methods for utilizing inter-microphone level differences
(ILD) to attenuate noise and enhance speech are provided. In
exemplary embodiments, the ILD is based on energy level
differences of a pair of omni-directional microphones.
Exemplary embodiments of the present invention use a
non-linear process to combine components of the acoustic
signals from the pair of omni-directional microphones in
order to obtain the ILD. In exemplary embodiments, a primary
acoustic signal is received by a primary microphone,
and a secondary acoustic signal is received by a secondary
microphone (e.g., omni-directional microphones). The primary
and secondary acoustic signals are converted into primary
and secondary electric signals for processing.
`A differential microphone array (DMA) module processes
`the primary and secondary electric signals to determine a
`cardioid primary signal and a cardioid secondary signal. In
`exemplary embodiments, the primary and secondary electric
`signals are delayed by a delay node. The cardioid primary
`signal is then determined by taking a difference between the
`primary electric signal and the delayed secondary electric
`signal, while the cardioid secondary signal is determined by
`taking a difference between the secondary electric signal and
`the delayed primary electric signal. In various embodiments
`the delayed primary electric signal and the delayed secondary
electric signal are adjusted by a gain. The gain may be a ratio
`between a magnitude of the primary acoustic signal and a
`magnitude of the secondary acoustic signal.
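The delay-and-subtract step described in this paragraph can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: it uses a whole-sample delay and a single scalar gain, and the names `delay`, `cardioid_pair`, `n_delay`, and `gain` are hypothetical.

```python
from typing import List, Sequence, Tuple


def delay(signal: Sequence[float], n_samples: int) -> List[float]:
    """Delay a signal by a whole number of samples, zero-padding the start."""
    if n_samples < 0:
        raise ValueError("n_samples must be non-negative")
    kept = len(signal) - min(n_samples, len(signal))
    return [0.0] * (len(signal) - kept) + list(signal[:kept])


def cardioid_pair(
    primary: Sequence[float],
    secondary: Sequence[float],
    n_delay: int,
    gain: float = 1.0,
) -> Tuple[List[float], List[float]]:
    """Return (cardioid_primary, cardioid_secondary).

    cardioid_primary   = primary   - gain * delayed secondary
    cardioid_secondary = secondary - gain * delayed primary
    n_delay and gain are illustrative parameters (assumptions).
    """
    if len(primary) != len(secondary):
        raise ValueError("signals must have equal length")
    delayed_primary = delay(primary, n_delay)
    delayed_secondary = delay(secondary, n_delay)
    c_primary = [p - gain * ds for p, ds in zip(primary, delayed_secondary)]
    c_secondary = [s - gain * dp for s, dp in zip(secondary, delayed_primary)]
    return c_primary, c_secondary


# Hypothetical front source: it reaches the secondary microphone 2 samples late.
source = [0.0, 1.0, 0.5, -0.25, 0.0, 0.0, 0.0, 0.0]
c_primary, c_secondary = cardioid_pair(source, delay(source, 2), n_delay=2)
```

In this toy case the cardioid secondary signal cancels the front source entirely while the cardioid primary signal retains it, which is the kind of front-to-back contrast the later energy comparison relies on.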
The cardioid signals are filtered through a frequency analysis
module which takes the signals and mimics the frequency
analysis of the cochlea (i.e., cochlear domain) simulated in
this embodiment by a filter bank. Alternatively, other filters
such as short-time Fourier transform (STFT), sub-band filter
banks, modulated complex lapped transforms, cochlear models,
wavelets, etc. can be used for the frequency analysis and
synthesis. Energy levels associated with the cardioid primary
signal and the cardioid secondary signal are then computed
(e.g., as power estimates) and the results are processed by an
ILD module using a non-linear combination to obtain the
ILD. In exemplary embodiments, the non-linear combination
comprises dividing the power estimate associated with the
cardioid primary signal by the power estimate associated with
the cardioid secondary signal. The ILD may then be used as a
spatial discrimination cue in a noise reduction system to
suppress unwanted sound sources and enhance the speech.
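The energy and ILD computation described above can be sketched per frequency band. The sketch assumes the frequency analysis has already split each cardioid signal into sub-band frames; the mean-square power estimate, the `floor` guard, and all names are illustrative assumptions.

```python
from typing import List, Sequence


def band_power(frame: Sequence[float]) -> float:
    """Mean-square power estimate of one sub-band frame."""
    if len(frame) == 0:
        raise ValueError("frame must not be empty")
    return sum(x * x for x in frame) / len(frame)


def ild_per_band(
    primary_bands: Sequence[Sequence[float]],
    secondary_bands: Sequence[Sequence[float]],
    floor: float = 1e-12,
) -> List[float]:
    """ILD per band: power(cardioid primary) / power(cardioid secondary).

    floor is a hypothetical guard against division by zero.
    """
    if len(primary_bands) != len(secondary_bands):
        raise ValueError("band counts must match")
    return [
        band_power(p) / max(band_power(s), floor)
        for p, s in zip(primary_bands, secondary_bands)
    ]


# Hypothetical frames: band 0 is stronger in the primary signal, band 1 is equal.
primary_bands = [[2.0, -2.0, 2.0, -2.0], [1.0, -1.0, 1.0, -1.0]]
secondary_bands = [[1.0, -1.0, 1.0, -1.0], [1.0, -1.0, 1.0, -1.0]]
ild = ild_per_band(primary_bands, secondary_bands)
```

Here the first band gives a ratio of 4.0 and the second a ratio of 1.0, so a downstream noise reduction stage could treat the first band as more likely to be dominated by the desired source.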
`
`BRIEF DESCRIPTION OF THE DRAWINGS
`
FIG. 1a and FIG. 1b are diagrams of two environments in
which embodiments of the present invention may be practiced.
`
FIG. 2 is a block diagram of an exemplary audio device
implementing embodiments of the present invention.
FIG. 3 is a block diagram of an exemplary audio processing
engine.
FIG. 4a illustrates an exemplary implementation of the
DMA module, frequency analysis module, energy module,
and the ILD module.
FIG. 4b is an exemplary implementation of the DMA module.
FIG. 5 is a block diagram of an alternative embodiment of
the present invention.
FIG. 6 is a polar plot of a front-to-back cardioid directivity
pattern and ILD diagram produced according to embodiments
of the present invention.
FIG. 7 is a flowchart of an exemplary method for utilizing
ILD of omni-directional microphones for speech enhancement.
`
`FIG. 8 is a flowchart of an exemplary noise reduction
`process.
`
`DESCRIPTION OF EXEMPLARY
`EMBODIMENTS
`
The present invention provides exemplary systems and
methods for utilizing inter-microphone level differences
(ILD) of at least two microphones to identify frequency
regions dominated by speech in order to enhance speech and
attenuate background noise and far-field distracters. Embodiments
of the present invention may be practiced on any audio
device that is configured to receive sound such as, but not
limited to, cellular phones, phone handsets, headsets, and
conferencing systems. Advantageously, exemplary embodiments
are configured to provide improved noise suppression
on small devices and in applications where the main audio
source is far from the device. While some embodiments of the
present invention will be described in reference to operation
on a cellular phone, the present invention may be practiced on
any audio device.
Referring to FIG. 1a and FIG. 1b, environments in which
embodiments of the present invention may be practiced are
shown. A user provides an audio (speech) source 102 to an
audio device 104. The exemplary audio device 104 comprises
two microphones: a primary microphone 106 relative to the
audio source 102 and a secondary microphone 108 located a
distance, d, away from the primary microphone 106. In exemplary
embodiments, the microphones 106 and 108 are
omni-directional microphones.
While the microphones 106 and 108 receive sound (i.e.,
acoustic signals) from the audio source 102, the microphones
106 and 108 also pick up noise 110. Although the noise 110 is
shown coming from a single location in FIG. 1a and FIG. 1b,
the noise 110 may comprise any sounds from one or more
locations different than the audio source 102, and may
include reverberations and echoes.
Embodiments of the present invention exploit level differences
(e.g., energy differences) between the acoustic signals
received by the two microphones 106 and 108 independent of
how the level differences are obtained. In FIG. 1a, because the
primary microphone 106 is much closer to the audio source
102 than the secondary microphone 108, the intensity level is
higher for the primary microphone 106 resulting in a larger
energy level during a speech/voice segment, for example. In
FIG. 1b, because directional response of the primary mi