`Case 6:21-cv-00984-ADA Document 55-7 Filed 05/25/22 Page 1 of 39
`
`
`
`
`
`
`
`
`EXHIBIT 7
`
`
`
`US008503691B2
`
`
`(12)United States Patent
`Burnett
`
(10) Patent No.: US 8,503,691 B2
(45) Date of Patent: *Aug. 6, 2013
`
`(54) VIRTUAL MICROPHONE ARRAYS USING
`DUAL OMNIDIRECTIONAL MICROPHONE
`ARRAY (DOMA)
`
`(75) Inventor: Gregory C. Burnett, Dodge Center, MN
`(US)
`
(73) Assignee: AliphCom, San Francisco, CA (US)
`
(*) Notice: Subject to any disclaimer, the term of this
patent is extended or adjusted under 35
U.S.C. 154(b) by 1050 days.
This patent is subject to a terminal disclaimer.
`
(21) Appl. No.: 12/139,333

(22) Filed: Jun. 13, 2008
`
(65) Prior Publication Data
US 2009/0003623 A1 Jan. 1, 2009

Related U.S. Application Data
(60) Provisional application No. 60/934,551, filed on Jun.
13, 2007, provisional application No. 60/953,444,
filed on Aug. 1, 2007, provisional application No.
60/954,712, filed on Aug. 8, 2007, provisional
application No. 61/045,377, filed on Apr. 16, 2008.
`
(51) Int. Cl.
H04R 3/00 (2006.01)

(52) U.S. Cl.
USPC ...... 381/92; 381/94.7; 704/233; 704/E21.004
(58) Field of Classification Search
USPC .................... 381/92, 94.7; 704/233, E21.004
See application file for complete search history.
`
(56) References Cited
`
`U.S. PATENT DOCUMENTS
`5,473,701 A * 12/1995 Cezanne et al.................... 381/92
`7,386,135 B2* 6/2008 Fan ................................... 381/92
`* cited by examiner
`
Primary Examiner - Howard Weiss
(74) Attorney, Agent, or Firm - Kokka & Backus, PC
`
(57) ABSTRACT
A dual omnidirectional microphone array noise suppression
is described. Compared to conventional arrays and
algorithms, which seek to reduce noise by nulling out noise
sources, the array of an embodiment is used to form two
distinct virtual directional microphones which are configured
to have very similar noise responses and very dissimilar
speech responses. The only null formed is one used to remove
the speech of the user from V2. The two virtual microphones
may be paired with an adaptive filter algorithm and VAD
algorithm to significantly reduce the noise without distorting
the speech, significantly improving the SNR of the desired
speech over conventional noise suppression systems.
`
`46 Claims, 17 Drawing Sheets
`
`
`
U.S. Patent    Aug. 6, 2013    Sheet 1 of 17    US 8,503,691 B2

[FIG. 1: block diagram of the two-microphone system 100, showing a signal source s(n), a noise source n(n) (101), MIC 1 (102), MIC 2 (103), transfer functions H1(z) and H2(z), microphone signals m1(n) and m2(n), the DOMA 110, a VAD 104 supplying voicing information, and a noise removal block producing cleaned speech]
`
`
`
[Sheet 2 of 17. FIG. 2: array geometry with the two microphones each a distance d_0 from the array midpoint. FIG. 3: first order gradient microphone with output V]
`
`
`
[Sheet 3 of 17. FIG. 4]
`
`
`
[Sheet 4 of 17. FIG. 5: block diagram with virtual microphone outputs through VN]
`
`
`
[Sheet 5 of 17. FIG. 6]
`
`
`
[Sheet 6 of 17. FIG. 7: flow diagram 700 with steps 702, 704, 706, 708, and 710. FIG. 8: flow diagram 800 with steps 802 and 804: "Form physical microphone array including first physical microphone and second physical microphone." and "Form virtual microphone array including first virtual microphone and second virtual microphone using signals from physical microphone array."]
`
`
`
[Sheet 7 of 17. Polar plot titled "Linear response of V2 to a speech source at 0.10 meters". FIG. 10: polar plot titled "Linear response of V2 to a noise source at 1 meters"]
`
`
`
[Sheet 8 of 17. FIG. 12: polar plot titled "Linear response of V1 to a noise source at 1 meters"]
`
`
`
[Sheet 9 of 17. FIG. 13: polar plot titled "Linear response of V1 to a speech source at 0.1 meters"]
`
`
`
[Sheet 10 of 17. FIG. 14: plot with horizontal axis "Frequency (Hz)"]
`
`
`
[Sheet 11 of 17. FIG. 15: plot titled "V1 (top, dashed) and V2 speech response vs. B assuming d_s = 0.1 m"; horizontal axis B from 0.4 to 1.1. FIG. 16: plot titled "V1/V2 for speech versus B assuming d_s = 0.1 m"; horizontal axis B from 0.4 to 1.1]
`
`
`
[Sheet 12 of 17. FIG. 17. Plot titled "B versus theta assuming d_s = 0.1 m"; vertical axis ticks from 1 to 1.25]
`
`
`
[Sheet 13 of 17. FIG. 19: two-panel plot titled "N(s) for B = 1 and D = -7.2e-006 seconds"; horizontal axes "Frequency (Hz)" from 1000 to 8000]
`
`
`
[Sheet 14 of 17. FIG. 20: plot with horizontal axis "Frequency (Hz)"]
`
`
`
[Sheet 15 of 17. FIG. 21: two-panel plot titled "Cancellation with d1 = 1, theta1 = 0, d2 = 1, and theta2 = 30"; horizontal axis "Frequency (Hz)"]
`
`
`
[Sheet 16 of 17. FIG. 22: two-panel plot titled "Cancellation with d1 = 1, theta1 = 0, d2 = 1, and theta2 = 45"; horizontal axis "Frequency (Hz)"]
`
`
`
[Sheet 17 of 17. FIG. 23: plot titled "Original V1 (top) and cleaned V1 (bottom) with simplified VAD (dashed) in noise"; horizontal axis "Time (samples at 8 kHz/sec)", scaled by 10^5]
`
`
`
`
`VIRTUAL MICROPHONE ARRAYS USING
`DUAL OMNIDIRECTIONAL MICROPHONE
`ARRAY (DOMA)
`
RELATED APPLICATIONS

This application claims the benefit of U.S. Patent
Application Nos. 60/934,551, filed Jun. 13, 2007, 60/953,444, filed
Aug. 1, 2007, 60/954,712, filed Aug. 8, 2007, and
61/045,377, filed Apr. 16, 2008.
`
TECHNICAL FIELD

The disclosure herein relates generally to noise suppression.
In particular, this disclosure relates to noise suppression
systems, devices, and methods for use in acoustic applications.
`
`BACKGROUND
`
Conventional adaptive noise suppression algorithms have
been around for some time. These conventional algorithms
have used two or more microphones to sample both an
(unwanted) acoustic noise field and the (desired) speech of a user.
The noise relationship between the microphones is then
determined using an adaptive filter (such as Least-Mean-Squares
as described in Haykin & Widrow,
ISBN#0471215708, Wiley, 2002, but any adaptive or stationary
system identification algorithm may be used) and that
relationship used to filter the noise from the desired signal.
Most conventional noise suppression systems currently in
use for speech communication systems are based on a single-
microphone spectral subtraction technique first developed in the
1970's and described, for example, by S. F. Boll in "Suppression
of Acoustic Noise in Speech using Spectral Subtraction,"
IEEE Trans. on ASSP, pp. 113-120, 1979. These techniques
have been refined over the years, but the basic principles of
operation have remained the same. See, for example, U.S. Pat.
No. 5,687,243 of McLaughlin, et al., and U.S. Pat. No.
4,811,404 of Vilmur, et al. There have also been several attempts at
multi-microphone noise suppression systems, such as those
outlined in U.S. Pat. No. 5,406,622 of Silverberg et al. and
U.S. Pat. No. 5,463,694 of Bradley et al. Multi-microphone
systems have not been very successful for a variety of reasons,
the most compelling being poor noise cancellation
performance and/or significant speech distortion. Primarily,
conventional multi-microphone systems attempt to increase the
SNR of the user's speech by "steering" the nulls of the system
to the strongest noise sources. This approach is limited in the
number of noise sources removed by the number of available
nulls.
The Jawbone earpiece (referred to as the "Jawbone"),
introduced in December 2006 by AliphCom of San Francisco,
Calif., was the first known commercial product to use a pair of
physical directional microphones (instead of omnidirectional
microphones) to reduce environmental acoustic noise. The
technology supporting the Jawbone is currently described
under one or more of U.S. Pat. No. 7,246,058 by Burnett
and/or U.S. patent application Ser. Nos. 10/400,282,
10/667,207, and/or 10/769,302. Generally, multi-microphone
techniques make use of an acoustic-based Voice Activity Detector
(VAD) to determine the background noise characteristics,
where "voice" is generally understood to include human
voiced speech, unvoiced speech, or a combination of voiced
and unvoiced speech. The Jawbone improved on this by using
a microphone-based sensor to construct a VAD signal using
directly detected speech vibrations in the user's cheek. This
allowed the Jawbone to aggressively remove noise when the
user was not producing speech. However, the Jawbone uses a
directional microphone array.
`
`
`INCORPORATION BY REFERENCE
`
Each patent, patent application, and/or publication
mentioned in this specification is herein incorporated by reference
in its entirety to the same extent as if each individual patent,
patent application, and/or publication was specifically and
individually indicated to be incorporated by reference.
`
`BRIEF DESCRIPTION OF THE DRAWINGS
`
FIG. 1 is a two-microphone adaptive noise suppression
system, under an embodiment.
FIG. 2 is an array and speech source (S) configuration,
under an embodiment. The microphones are separated by a
distance approximately equal to 2d_0, and the speech source is
located a distance d_s away from the midpoint of the array at an
angle θ. The system is axially symmetric so only d_s and θ need
be specified.
FIG. 3 is a block diagram for a first order gradient
microphone using two omnidirectional elements O1 and O2, under
an embodiment.
FIG. 4 is a block diagram for a DOMA including two
physical microphones configured to form two virtual
microphones V1 and V2, under an embodiment.
FIG. 5 is a block diagram for a DOMA including two
physical microphones configured to form N virtual
microphones V1 through VN, where N is any number greater than
one, under an embodiment.
FIG. 6 is an example of a headset or head-worn device that
includes the DOMA, as described herein, under an
embodiment.
FIG. 7 is a flow diagram for denoising acoustic signals
using the DOMA, under an embodiment.
FIG. 8 is a flow diagram for forming the DOMA, under an
embodiment.
FIG. 9 is a plot of linear response of virtual microphone V2
to a 1 kHz speech source at a distance of 0.1 m, under an
embodiment. The null is at 0 degrees, where the speech is
normally located.
FIG. 10 is a plot of linear response of virtual microphone
V2 to a 1 kHz noise source at a distance of 1.0 m, under an
embodiment. There is no null and all noise sources are
detected.
FIG. 11 is a plot of linear response of virtual microphone
V1 to a 1 kHz speech source at a distance of 0.1 m, under an
embodiment. There is no null and the response for speech is
greater than that shown in FIG. 9.
FIG. 12 is a plot of linear response of virtual microphone
V1 to a 1 kHz noise source at a distance of 1.0 m, under an
embodiment. There is no null and the response is very similar
to V2 shown in FIG. 10.
FIG. 13 is a plot of linear response of virtual microphone
V1 to a speech source at a distance of 0.1 m for frequencies of
100, 500, 1000, 2000, 3000, and 4000 Hz, under an
embodiment.
FIG. 14 is a plot showing comparison of frequency
responses for speech for the array of an embodiment and for
a conventional cardioid microphone.
FIG. 15 is a plot showing speech response for V1 (top,
dashed) and V2 (bottom, solid) versus B with d_s assumed to be
0.1 m, under an embodiment. The spatial null in V2 is
relatively broad.
`
`
`
FIG. 16 is a plot showing a ratio of V1/V2 speech responses
shown in FIG. 10 versus B, under an embodiment. The ratio
is above 10 dB for all 0.8<B<1.1. This means that the physical
β of the system need not be exactly modeled for good
performance.
FIG. 17 is a plot of B versus actual d_s assuming that d_s=10
cm and theta=0, under an embodiment.
FIG. 18 is a plot of B versus theta with d_s=10 cm and
assuming d_s=10 cm, under an embodiment.
FIG. 19 is a plot of amplitude (top) and phase (bottom)
response of N(s) with B=1 and D=-7.2 μsec, under an
embodiment. The resulting phase difference clearly affects
high frequencies more than low.
FIG. 20 is a plot of amplitude (top) and phase (bottom)
response of N(s) with B=1.2 and D=-7.2 μsec, under an
embodiment. Non-unity B affects the entire frequency range.
FIG. 21 is a plot of amplitude (top) and phase (bottom)
response of the effect on the speech cancellation in V2 due to
a mistake in the location of the speech source with q1=0
degrees and q2=30 degrees, under an embodiment. The
cancellation remains below -10 dB for frequencies below 6 kHz.
FIG. 22 is a plot of amplitude (top) and phase (bottom)
response of the effect on the speech cancellation in V2 due to
a mistake in the location of the speech source with q1=0
degrees and q2=45 degrees, under an embodiment. The
cancellation is below -10 dB only for frequencies below about
2.8 kHz and a reduction in performance is expected.
FIG. 23 shows experimental results for a 2d_0=19 mm array
using a linear β of 0.83 on a Bruel and Kjaer Head and Torso
Simulator (HATS) in very loud (~85 dBA) music/speech
noise environment, under an embodiment. The noise has been
reduced by about 25 dB and the speech hardly affected, with
no noticeable distortion.
`
`SUMMARY OF THE INVENTION
`
The present invention provides for dual omnidirectional
microphone array devices, systems and methods.
In accordance with one embodiment, a microphone array is
formed with a first virtual microphone that includes a first
combination of a first microphone signal and a second
microphone signal, wherein the first microphone signal is generated
by a first physical microphone and the second microphone
signal is generated by a second physical microphone; and a
second virtual microphone that includes a second
combination of the first microphone signal and the second microphone
signal, wherein the second combination is different from the
first combination. The first virtual microphone and the second
virtual microphone are distinct virtual directional
microphones with substantially similar responses to noise and
substantially dissimilar responses to speech.
In accordance with another embodiment, a microphone
array is formed with a first virtual microphone formed from a
first combination of a first microphone signal and a second
microphone signal, wherein the first microphone signal is
generated by a first omnidirectional microphone and the
second microphone signal is generated by a second
omnidirectional microphone; and a second virtual microphone formed
from a second combination of the first microphone signal and
the second microphone signal, wherein the second
combination is different from the first combination. The first virtual
microphone has a first linear response to speech that is devoid
of a null, and the second virtual microphone has a second
linear response to speech that has a single null oriented in a
direction toward a source of the speech, wherein the speech is
human speech.
`
In accordance with another embodiment, a device includes
a first microphone outputting a first microphone signal and a
second microphone outputting a second microphone signal;
and a processing component coupled to the first microphone
signal and the second microphone signal, the processing
component generating a virtual microphone array comprising a
first virtual microphone and a second virtual microphone,
wherein the first virtual microphone comprises a first
combination of the first microphone signal and the second
microphone signal, and wherein the second virtual microphone
comprises a second combination of the first microphone
signal and the second microphone signal. The second
combination is different from the first combination. The first virtual
microphone and the second virtual microphone have
substantially similar responses to noise and substantially dissimilar
responses to speech.
In accordance with another embodiment, a device includes
a first microphone outputting a first microphone signal and a
second microphone outputting a second microphone signal,
wherein the first microphone and the second microphone are
omnidirectional microphones; and a virtual microphone array
comprising a first virtual microphone and a second virtual
microphone, wherein the first virtual microphone comprises a
first combination of the first microphone signal and the
second microphone signal, and the second virtual microphone
comprises a second combination of the first microphone
signal and the second microphone signal. The second
combination is different from the first combination, and the first virtual
microphone and the second virtual microphone are distinct
virtual directional microphones.
In accordance with another embodiment, a device includes
a first physical microphone generating a first microphone
signal; a second physical microphone generating a second
microphone signal; and a processing component coupled to
the first microphone signal and the second microphone signal,
the processing component generating a virtual microphone
array comprising a first virtual microphone and a second
virtual microphone. The first virtual microphone comprises
the second microphone signal subtracted from a delayed
version of the first microphone signal, and the second virtual
microphone comprises a delayed version of the first
microphone signal subtracted from the second microphone signal.
In accordance with another embodiment, a sensor includes
a physical microphone array including a first physical
microphone and a second physical microphone, the first physical
microphone outputting a first microphone signal and the
second physical microphone outputting a second microphone
signal; and a virtual microphone array comprising a first
virtual microphone and a second virtual microphone, the first
virtual microphone comprising a first combination of the first
microphone signal and the second microphone signal, the
second virtual microphone comprising a second combination
of the first microphone signal and the second microphone
signal, the second combination is different from the first
combination, and the virtual microphone array includes a
single null oriented in a direction toward a source of speech of
a human speaker.
`
`DETAILED DESCRIPTION
`
A dual omnidirectional microphone array (DOMA) that
provides improved noise suppression is described herein.
Compared to conventional arrays and algorithms, which seek
to reduce noise by nulling out noise sources, the array of an
embodiment is used to form two distinct virtual directional
microphones which are configured to have very similar noise
responses and very dissimilar speech responses. The only null
formed by the DOMA is one used to remove the speech of the
user from V2. The two virtual microphones of an embodiment
can be paired with an adaptive filter algorithm and/or VAD
algorithm to significantly reduce the noise without distorting
the speech, significantly improving the SNR of the desired
speech over conventional noise suppression systems. The
embodiments described herein are stable in operation,
flexible with respect to virtual microphone pattern choice, and
have proven to be robust with respect to speech source-to-array
distance and orientation as well as temperature and
calibration techniques.
In the following description, numerous specific details are
introduced to provide a thorough understanding of, and
enabling description for, embodiments of the DOMA. One
skilled in the relevant art, however, will recognize that these
embodiments can be practiced without one or more of the
specific details, or with other components, systems, etc. In
other instances, well-known structures or operations are not
shown, or are not described in detail, to avoid obscuring
aspects of the disclosed embodiments.
Unless otherwise specified, the following terms have the
corresponding meanings in addition to any meaning or
understanding they may convey to one skilled in the art.
The term "bleedthrough" means the undesired presence of
noise during speech.
The term "denoising" means removing unwanted noise
from Mic1, and also refers to the amount of reduction of noise
energy in a signal in decibels (dB).
The term "devoicing" means removing/distorting the
desired speech from Mic1.
The term "directional microphone (DM)" means a physical
directional microphone that is vented on both sides of the
sensing diaphragm.
The term "Mic1 (M1)" means a general designation for an
adaptive noise suppression system microphone that usually
contains more speech than noise.
The term "Mic2 (M2)" means a general designation for an
adaptive noise suppression system microphone that usually
contains more noise than speech.
The term "noise" means unwanted environmental acoustic
noise.
The term "null" means a zero or minima in the spatial
response of a physical or virtual directional microphone.
The term "O1" means a first physical omnidirectional
microphone used to form a microphone array.
The term "O2" means a second physical omnidirectional
microphone used to form a microphone array.
The term "speech" means desired speech of the user.
The term "Skin Surface Microphone (SSM)" is a
microphone used in an earpiece (e.g., the Jawbone earpiece
available from Aliph of San Francisco, Calif.) to detect speech
vibrations on the user's skin.
The term "V1" means the virtual directional "speech"
microphone, which has no nulls.
The term "V2" means the virtual directional "noise"
microphone, which has a null for the user's speech.
The term "Voice Activity Detection (VAD) signal" means
a signal indicating when user speech is detected.
The term "virtual microphones (VM)" or "virtual
directional microphones" means a microphone constructed using
two or more omnidirectional microphones and associated
signal processing.
FIG. 1 is a two-microphone adaptive noise suppression
system 100, under an embodiment. The two-microphone
system 100 including the combination of physical microphones
MIC 1 and MIC 2 along with the processing or circuitry
components to which the microphones couple (described in
detail below, but not shown in this figure) is referred to herein
as the dual omnidirectional microphone array (DOMA) 110,
but the embodiment is not so limited. Referring to FIG. 1, in
analyzing the single noise source 101 and the direct path to
the microphones, the total acoustic information coming into
MIC 1 (102, which can be a physical or virtual microphone)
is denoted by m1(n). The total acoustic information coming
into MIC 2 (103, which can also be a physical or virtual
microphone) is similarly labeled m2(n). In the z (digital
frequency) domain, these are represented as M1(z) and M2(z).
Then,
`
$$M_1(z) = S(z) + N_2(z)$$

$$M_2(z) = N(z) + S_2(z)$$

with

$$N_2(z) = N(z)H_1(z)$$

$$S_2(z) = S(z)H_2(z),$$

so that

$$M_1(z) = S(z) + N(z)H_1(z)$$

$$M_2(z) = N(z) + S(z)H_2(z) \tag{Eq. 1}$$
This is the general case for all two microphone systems.
Equation 1 has four unknowns and only two known
relationships and therefore cannot be solved explicitly.
However, there is another way to solve for some of the
unknowns in Equation 1. The analysis starts with an
examination of the case where the speech is not being generated,
that is, where a signal from the VAD subsystem 104 (optional)
equals zero. In this case, s(n)=S(z)=0, and Equation 1 reduces
to

$$M_{1N}(z) = N(z)H_1(z)$$

$$M_{2N}(z) = N(z),$$

where the N subscript on the M variables indicates that only
noise is being received. This leads to

$$M_{1N}(z) = M_{2N}(z)H_1(z)$$

$$H_1(z) = \frac{M_{1N}(z)}{M_{2N}(z)} \tag{Eq. 2}$$

The function H1(z) can be calculated using any of the
available system identification algorithms and the microphone
outputs when the system is certain that only noise is being
received. The calculation can be done adaptively, so that the
system can react to changes in the noise.
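As an illustration of the Equation 2 calculation, the following minimal sketch estimates H1(z) per frequency bin from frames captured while the VAD reads zero. It is not the method of any embodiment: the function name `estimate_h1`, the frame layout, and the least-squares averaging across frames (used in place of a single-frame ratio so that one weak frame does not dominate) are all assumptions made for the example.

```python
import cmath
import random


def estimate_h1(m1_noise, m2_noise):
    """Least-squares estimate of H1 per frequency bin (hypothetical helper).

    m1_noise, m2_noise: lists of frames recorded during noise-only periods;
    each frame is a list of complex spectral values, one per bin.
    For a single frame this reduces to the ratio M1N / M2N of Eq. 2.
    """
    n_bins = len(m1_noise[0])
    h1 = []
    for k in range(n_bins):
        num = sum(f1[k] * f2[k].conjugate() for f1, f2 in zip(m1_noise, m2_noise))
        den = sum(abs(f2[k]) ** 2 for f2 in m2_noise)
        h1.append(num / den if den > 0 else 0j)
    return h1


# Synthetic check: build noise-only frames that obey M1N = M2N * H1 exactly.
rng = random.Random(0)
true_h1 = [0.8 * cmath.exp(-1j * 0.3 * k) for k in range(4)]  # assumed values
m2_frames = [[complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(4)]
             for _ in range(50)]
m1_frames = [[n * h for n, h in zip(frame, true_h1)] for frame in m2_frames]
h1_hat = estimate_h1(m1_frames, m2_frames)
```

Re-running the estimate over a sliding window of recent noise-only frames is one simple way to make the calculation adaptive.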
A solution is now available for H1(z), one of the unknowns
in Equation 1. The final unknown, H2(z), can be determined
by using the instances where speech is being produced and the
VAD equals one. When this is occurring, but the recent
(perhaps less than 1 second) history of the microphones indicates
low levels of noise, it can be assumed that n(s)=N(z)≈0. Then
Equation 1 reduces to

$$M_{1S}(z) = S(z)$$

$$M_{2S}(z) = S(z)H_2(z),$$

which in turn leads to

$$M_{2S}(z) = M_{1S}(z)H_2(z)$$

$$H_2(z) = \frac{M_{2S}(z)}{M_{1S}(z)},$$

which is the inverse of the H1(z) calculation. However, it is
noted that different inputs are being used (now only the
speech is occurring whereas before only the noise was
occurring). While calculating H2(z), the values calculated for H1(z)
are held constant (and vice versa) and it is assumed that the
noise level is not high enough to cause errors in the H2(z)
calculation.
After calculating H1(z) and H2(z), they are used to remove
the noise from the signal. If Equation 1 is rewritten as

$$S(z) = M_1(z) - N(z)H_1(z)$$

$$N(z) = M_2(z) - S(z)H_2(z)$$

$$S(z) = M_1(z) - \left[M_2(z) - S(z)H_2(z)\right]H_1(z)$$

$$S(z)\left[1 - H_2(z)H_1(z)\right] = M_1(z) - M_2(z)H_1(z),$$

then N(z) may be substituted as shown to solve for S(z) as

$$S(z) = \frac{M_1(z) - M_2(z)H_1(z)}{1 - H_1(z)H_2(z)} \tag{Eq. 3}$$
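The algebra leading to Equation 3 can be checked numerically at a single frequency bin. The sketch below is only a sanity check under assumed values: the function name `recover_speech` and the complex numbers chosen for S, N, H1, and H2 are illustrative and do not come from the specification.

```python
def recover_speech(m1, m2, h1, h2):
    """Apply Eq. 3 at one frequency bin; all arguments are complex numbers."""
    return (m1 - m2 * h1) / (1 - h1 * h2)


# Assumed single-bin values for speech, noise, and the two transfer functions.
s, n = 1.0 + 0.5j, -2.0 + 3.0j
h1, h2 = 0.7 - 0.2j, 0.1 + 0.05j

# Microphone spectra according to Eq. 1.
m1 = s + n * h1
m2 = n + s * h2

s_hat = recover_speech(m1, m2, h1, h2)

# Dropping the denominator (treating H2 as zero) leaves an error of S*H1*H2.
s_no_denominator = m1 - m2 * h1
approx_error = abs(s_no_denominator - s)
```

With exact transfer functions the noise term cancels regardless of how large `n` is, and the residual from ignoring the denominator scales with the product H1(z)H2(z), which is why a small H2(z) matters.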
`
If the transfer functions H1(z) and H2(z) can be described
with sufficient accuracy, then the noise can be completely
removed and the original signal recovered. This remains true
without respect to the amplitude or spectral characteristics of
the noise. If there is very little or no leakage from the speech
source into M2, then H2(z)≈0 and Equation 3 reduces to

$$S(z) \approx M_1(z) - M_2(z)H_1(z). \tag{Eq. 4}$$

Equation 4 is much simpler to implement and is very
stable, assuming H1(z) is stable. However, if significant
speech energy is in M2(z), devoicing can occur. In order to
construct a well-performing system and use Equation 4,
consideration is given to the following conditions:
R1. Availability of a perfect (or at least very good) VAD in
noisy conditions
R2. Sufficiently accurate H1(z)
R3. Very small (ideally zero) H2(z).
R4. During speech production, H1(z) cannot change
substantially.
R5. During noise, H2(z) cannot change substantially.
Condition R1 is easy to satisfy if the SNR of the desired
speech to the unwanted noise is high enough. "Enough"
means different things depending on the method of VAD
generation. If a VAD vibration sensor is used, as in Burnett
U.S. Pat. No. 7,256,048, accurate VAD in very low SNRs
(-10 dB or less) is possible. Acoustic-only methods using
information from O1 and O2 can also return accurate VADs,
but are limited to SNRs of ~3 dB or greater for adequate
performance.
Condition R5 is normally simple to satisfy because for
most applications the microphones will not change position
with respect to the user's mouth very often or rapidly. In those
applications where it may happen (such as hands-free
conferencing systems) it can be satisfied by configuring Mic2 so that
H2(z)≈0.
Satisfying conditions R2, R3, and R4 is more difficult but
is possible given the right combination of V1 and V2.
Methods are examined below that have proven to be effective in
satisfying the above, resulting in excellent noise suppression
performance and minimal speech removal and distortion in an
embodiment.
The DOMA, in various embodiments, can be used with the
Pathfinder system as the adaptive filter system or noise
removal. The Pathfinder system, available from AliphCom,
San Francisco, Calif., is described in detail in other patents
and patent applications referenced herein. Alternatively, any
adaptive filter or noise removal algorithm can be used with the
DOMA in one or more various alternative embodiments or
configurations.
When the DOMA is used with the Pathfinder system, the
Pathfinder system generally provides adaptive noise
cancellation by combining the two microphone signals (e.g., Mic1,
Mic2) by filtering and summing in the time domain. The
adaptive filter generally uses the signal received from a first
microphone of the DOMA to remove noise from the speech
received from at least one other microphone of the DOMA,
which relies on a slowly varying linear transfer function
between the two microphones for sources of noise. Following
processing of the two channels of the DOMA, an output
signal is generated in which the noise content is attenuated
with respect to the speech content, as described in detail
below.
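The filter-and-sum idea described above can be sketched with a generic normalized LMS (NLMS) canceller. This is a stand-in chosen for illustration, not the Pathfinder system, whose internals are not given here; the function name `nlms_noise_canceller`, the tap count, the step size, and the synthetic signals are all assumptions. The filter adapts only while the VAD reads zero and always outputs the speech-dominant channel minus the filtered noise-dominant channel, in the spirit of Equation 4.

```python
import random


def nlms_noise_canceller(primary, reference, vad, taps=4, mu=0.5, eps=1e-8):
    """Time-domain adaptive noise canceller (generic NLMS, illustrative only).

    primary:   speech-dominant channel samples
    reference: noise-dominant channel samples
    vad:       per-sample flags, truthy when speech is present
    Returns the cleaned samples and the final filter weights.
    """
    weights = [0.0] * taps
    history = [0.0] * taps  # most recent reference samples, newest first
    cleaned = []
    for x_primary, x_ref, speech_active in zip(primary, reference, vad):
        history = [x_ref] + history[:-1]
        noise_estimate = sum(w * h for w, h in zip(weights, history))
        error = x_primary - noise_estimate  # cleaned output sample
        if not speech_active:
            # Adapt only on noise, so the filter tracks the noise path.
            power = sum(h * h for h in history) + eps
            weights = [w + mu * error * h / power
                       for w, h in zip(weights, history)]
        cleaned.append(error)
    return cleaned, weights


# Synthetic noise-only demo: the primary channel sees the reference noise
# through an assumed two-tap path (0.6, 0.3).
rng = random.Random(1)
noise = [rng.gauss(0.0, 1.0) for _ in range(3000)]
primary = [0.6 * noise[i] + (0.3 * noise[i - 1] if i > 0 else 0.0)
           for i in range(3000)]
vad = [0] * 3000
cleaned, weights = nlms_noise_canceller(primary, noise, vad)
residual = sum(e * e for e in cleaned[-500:]) / 500
```

In this noise-only demo the weights settle on the two-tap path and the residual power falls to essentially zero. Freezing adaptation while speech is present is what keeps the filter from learning to cancel the speech itself.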
FIG. 2 is a generalized two-microphone array (DOMA)
including an array 201/202 and speech source S
configuration, under an embodiment. FIG. 3 is a system 300 for
generating or producing a first order gradient microphone V
using two omnidirectional elements O1 and O2, under an
embodiment. The array of an embodiment includes two
physical microphones 201 and 202 (e.g., omnidirectional
microphones) placed a distance 2d_0 apart and a speech source
200 is located a distance d_s away at an angle of θ. This array
is axially symmetric (at least in free space), so no other angle
is needed. The output from each microphone 201 and 202 can
be delayed (z1 and z2), multiplied by a gain (A1 and A2), and
then summed with the other as demonstrated in FIG. 3. The
output of the array is or forms at least one virtual microphone,
as described in detail below. This operation can be over any
frequency range desired. By varying the magnitude and sign
of the delays and gains, a wide variety of virtual microphones
(VMs), also referred to herein as virtual directional
microphones, can be realized. There are other methods known to
those skilled in the art for constructing VMs but this is a
common one and will be used in the enablement below.
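The delay, gain, and sum operation can be sketched as follows. This is a simplified illustration, assuming whole-sample delays and an idealized plane wave that reaches one microphone exactly one sample before the other; a practical array with a spacing of a few centimeters would need fractional-sample delays. The helper names `delay` and `virtual_mic` are hypothetical.

```python
import math


def delay(signal, samples):
    """Delay a sequence by a whole number of samples, zero-padding the start."""
    n = len(signal)
    samples = min(samples, n)
    return [0.0] * samples + list(signal[: n - samples])


def virtual_mic(o1, o2, delay1, gain1, delay2, gain2):
    """Form one virtual microphone: delay each input, apply a gain, and sum."""
    d1 = delay(o1, delay1)
    d2 = delay(o2, delay2)
    return [gain1 * a + gain2 * b for a, b in zip(d1, d2)]


source = [math.sin(0.3 * i) for i in range(64)]

# Wave travelling from O1 toward O2: O2 hears it one sample later.
o1_front, o2_front = source, delay(source, 1)
v_front = virtual_mic(o1_front, o2_front, 1, 1.0, 0, -1.0)

# Wave travelling from O2 toward O1: O1 hears it one sample later.
o1_back, o2_back = delay(source, 1), source
v_back = virtual_mic(o1_back, o2_back, 1, 1.0, 0, -1.0)

peak_front = max(abs(x) for x in v_front)
peak_back = max(abs(x) for x in v_back)
```

With these delays and gains the combination cancels the wave arriving from the O1 side (a null) while passing the wave from the opposite side. Changing the sign or size of the delays and gains moves or removes the null, which is how different virtual microphones are obtained from the same two physical elements.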
As an example, FIG. 4 is a block diagram for a DOMA 400
including two physical microphones configured to form two
virtual microphones V1 and V2, under an embodiment. The
DOMA includes two first order gradient microphones V1 and
V2 formed using the outputs of two microphones or elements
O1 and O2 (201 and 202), under an embodiment. The DOMA
of an embodiment includes two physical microphones 201
and 202 that are omnidirectional microphones, as described
above with reference to FIGS. 2 and 3. The output from each
microphone is coupled to a processing component 402, or
circuitry, and the processing component outputs signals
representing or corresponding to the virtual microphones V1 and
V2.
In this example system 400, the output of physical
microphone 201 is coupled to processing component 402 that
includes a first processing path that includes application of a
first delay z11 and a first gain A11 and a second processing path
that includes application of a second delay z12 and a second
gain A12. The output of p



