`Case 6:21-cv-00984-ADA Document 55-2 Filed 05/25/22 Page 1 of 22
`
`
`
`
`
`
`
`
`EXHIBIT 2
`
`
`
`
US008019091B2

(12) United States Patent                         (10) Patent No.:      US 8,019,091 B2
     Burnett et al.                               (45) Date of Patent:  *Sep. 13, 2011

(54) VOICE ACTIVITY DETECTOR (VAD)-BASED MULTIPLE-MICROPHONE ACOUSTIC NOISE SUPPRESSION

(75) Inventors: Gregory C. Burnett, Dodge Center, MN (US); Eric F. Breitfeller, Dublin, CA (US)

(73) Assignee: Aliphcom, Inc., San Francisco, CA (US)

( * ) Notice: Subject to any disclaimer, the term of this patent is extended or adjusted under 35 U.S.C. 154(b) by 713 days.

     This patent is subject to a terminal disclaimer.

(21) Appl. No.: 10/667,207

(22) Filed: Sep. 18, 2003
`
4,901,354 A *   2/1990  Gollmar et al.
5,212,764 A *   5/1993  Ariyoshi
5,400,409 A     3/1995  Linhard
5,406,622 A *   4/1995  Silverberg et al. ....... 381/94.7
5,414,776 A     5/1995  Sims, Jr.
5,463,694 A *  10/1995  Bradley et al. .......... 381/92
        (Continued)

FOREIGN PATENT DOCUMENTS
EP   0 637 187 A  *  2/1995
        (Continued)

OTHER PUBLICATIONS
Zhao Li et al.: "Robust Speech Coding Using Microphone Arrays", Signals, Systems and Computers, 1997, Conference Record of the 31st Asilomar Conference, Nov. 2-5, 1997, IEEE Comput. Soc., USA.
        (Continued)
`
`(65)
`
`Prior Publication Data
`US 2004/0133421 Al
`‘Jul. 8, 2004
`Related US. Appl
`D
`icati
`t
`t
`S.
`ee
`Ppmcatron ee
`(63) Continuation-in-part of application No. 09/905,361,
`filed on Jul. 12, 2001, now abandoned.
`(60) Provisional application No. 60/219,297, filed on Jul.
`19, 2000.
`Int. Cl
`(51)
`(2006.01)
`OBB 2900
`381/71.8: 704/215
`,
`(52) US. Cl
`- 381/70
`Fi ld f Cloeficatiue5verehwv
`58
`(58)
`Fie 381/9ati.7 a18 9..“90.Dda117047200.
`an 704/231 933 46 314.21 5
`See applicationfile for complete search histo
`P
`"y
`PP
(56)                References Cited

           U.S. PATENT DOCUMENTS

3,789,166 A *   1/1974  Sebesta
4,006,318 A *   2/1977  Sebesta et al.
4,591,668 A *   5/1986  Iwata

Primary Examiner — Davetta Goins
Assistant Examiner — Lun-See Lao
(74) Attorney, Agent, or Firm — Gregory & Sawrie LLP
`
(57)                   ABSTRACT

Acoustic noise suppression is provided in multiple-microphone systems using Voice Activity Detectors (VAD). A host system receives acoustic signals via multiple microphones. The system also receives information on the vibration of human tissue associated with human voicing activity via the VAD. In response, the system generates a transfer function representative of the received acoustic signals upon determining that voicing information is absent from the received acoustic signals during at least one specified period of time. The system removes noise from the received acoustic signals using the transfer function, thereby producing a denoised acoustic data stream.

20 Claims, 10 Drawing Sheets

[Front-page figure: Noise Removal block 200 receives Voicing Information from the VAD 204 and inputs from Signal s(n) 100 and Noise n(n) 101, producing Cleaned Speech.]
`
`
`US 8,019,091 B2
` Page 2
`
U.S. PATENT DOCUMENTS
5,473,701 A *  12/1995  Cezanne et al. .......... 381/92
5,473,702 A *  12/1995  Yoshida et al. .......... 381/94.7
5,515,865 A *   5/1996  Scanlon et al.
5,517,435 A *   5/1996  Sugiyama ................ 708/322
5,539,859 A     7/1996  Robbe et al.
5,590,241 A *  12/1996  Park et al. ............. 704/227
5,633,935 A *   5/1997  Kanamori et al. ......... 381/26
[entry illegible in source]
5,729,694 A *   3/1998  Holzrichter et al. ...... 705/17
5,754,665 A *   5/1998  Hosoi ................... 381/94.1
5,835,608 A *  11/1998  Warnaka et al.
[entry illegible in source]
5,966,090 A    10/1999  McEwan
5,986,600 A    11/1999  McEwan
6,006,175 A *  12/1999  Holzrichter ............. 704/208
6,009,396 A    12/1999  Nagata
[entry illegible in source]
6,266,422 B1    7/2001  Ikeda
6,430,295 B1    8/2002  Handel et al. ........... 379/388.06
6,707,910 B1 *  3/2004  Valve et al.
2002/0039425 A1 *   4/2002  Burnett et al.
2003/0228023 A1 *  12/2003  Burnett et al. ...... 381/92
FOREIGN PATENT DOCUMENTS
EP   0 795 851 A2  *   9/1997
EP   0 984 660 A2  *   3/2000
JP   2000 312 395  *  11/2000
WO   [entry illegible in source]
`
OTHER PUBLICATIONS

L. C. Ng et al.: "Denoising of Human Speech Using Combined Acoustic and EM Sensor Signal Processing", 2000 IEEE Intl. Conf. on Acoustics, Speech and Signal Processing, Proceedings (Cat. No. 00CH37100), Istanbul, Turkey, Jun. 5-9, 2000, XP002186255, ISBN 0-7803-6293-4.
S. Affes et al.: "A Signal Subspace Tracking Algorithm for Microphone Array Processing of Speech", IEEE Transactions on Speech and Audio Processing, N.Y., USA, vol. 5, No. 5, Sep. 1, 1997, XP000774303, ISSN 1063-6676.
Gregory C. Burnett: "The Physiological Basis of Glottal Electromagnetic Micropower Sensors (GEMS) and Their Use in Defining an Excitation Function for the Human Vocal Tract", Dissertation, University of California at Davis, Jan. 1999, USA.
L. C. Ng et al.: "Speaker Verification Using Combined Acoustic and EM Sensor Signal Processing", ICASSP-2001, Salt Lake City, USA.
A. Hussain: "Intelligibility Assessment of a Multi-Band Speech Enhancement Scheme", Proceedings IEEE Intl. Conf. on Acoustics, Speech & Signal Processing (ICASSP-2000), Istanbul, Turkey, Jun. 2000.
`
`* cited by examiner
`
`
`
U.S. Patent          Sep. 13, 2011          Sheet 1 of 10          US 8,019,091 B2

[Drawing Sheet 1 of 10: FIG. 1 — block diagram of the denoising system 1000 (microphones and voicing sensors feeding a processor with denoising subsystem); FIG. 2 — noise removal algorithm components (Signal s(n) 100, Noise n(n) 101, noise removal element, Cleaned Speech output).]
`
[Drawing Sheet 2 of 10: FIG. 3 — front-end components of a noise removal algorithm generalized to n distinct noise sources, with labeled signal and noise paths to each microphone.]
`
[Drawing Sheet 3 of 10: FIG. 4 — front-end components of a noise removal algorithm in the general case of n distinct noise sources and signal reflections.]
`
`
[Drawing Sheet 4 of 10: FIG. 5 — flow diagram of the denoising method: Receive acoustic signals (502); Receive voice activity (VAD) information (504); Determine absence of voicing and generate first transfer function (506); Determine presence of voicing and generate second transfer function (508); Produce denoised acoustic data stream (510).]
`
`
`
[Drawing Sheet 5 of 10: FIG. 6 — "Noise Removal Results for American English Female Saying 406-5562": dirty audio waveform (604) and cleaned audio waveform (602), amplitude axis in units of 10^4.]
`
[Drawing Sheet 6 of 10: FIG. 7A — VAD system with a VAD device and VAD algorithm feeding a noise suppression system (704); FIG. 7B — VAD system using hardware (764) of the coupled noise suppression system, with VAD algorithm, signal processing system, and noise suppression system (704).]
`
[Drawing Sheet 7 of 10: FIG. 8 — flow diagram 800 for determining voiced and unvoiced speech using an accelerometer-based VAD.]
`
[Drawing Sheet 8 of 10: FIG. 9 — plots of accelerometer output, noisy audio, and denoised audio vs. time (samples at 8 kHz).]
`
[Drawing Sheet 9 of 10: FIG. 10 — plots of SSM output, noisy audio, and denoised audio vs. time (samples at 8 kHz).]
`
`
[Drawing Sheet 10 of 10: FIG. 11 — plots of GEMS output, noisy audio, and denoised audio vs. time (samples at 8 kHz).]
`
`
VOICE ACTIVITY DETECTOR (VAD)-BASED MULTIPLE-MICROPHONE ACOUSTIC NOISE SUPPRESSION
`
`RELATED APPLICATIONS
`
This patent application is a continuation-in-part of U.S. patent application Ser. No. 09/905,361, filed Jul. 12, 2001, now abandoned, which claims priority from U.S. patent application Ser. No. 60/219,297, filed Jul. 19, 2000. This patent application also claims priority from U.S. patent application Ser. No. 10/383,162, filed Mar. 5, 2003.
`
`FIELD OF THE INVENTION
`
`The disclosed embodiments relate to systems and methods
`for detecting and processing a desired signal in the presence
`of acoustic noise.
`
`BACKGROUND
`
Many noise suppression algorithms and techniques have been developed over the years. Most of the noise suppression systems in use today for speech communication systems are based on a single-microphone spectral subtraction technique first developed in the 1970's and described, for example, by S. F. Boll in "Suppression of Acoustic Noise in Speech using Spectral Subtraction," IEEE Trans. on ASSP, pp. 113-120, 1979. These techniques have been refined over the years, but the basic principles of operation have remained the same. See, for example, U.S. Pat. No. 5,687,243 of McLaughlin, et al., and U.S. Pat. No. 4,811,404 of Vilmur, et al. Generally, these techniques make use of a microphone-based Voice Activity Detector (VAD) to determine the background noise characteristics, where "voice" is generally understood to include human voiced speech, unvoiced speech, or a combination of voiced and unvoiced speech.

The VAD has also been used in digital cellular systems. As an example of such a use, see U.S. Pat. No. 6,453,291 of Ashley, where a VAD configuration appropriate to the front-end of a digital cellular system is described. Further, some Code Division Multiple Access (CDMA) systems utilize a VAD to minimize the effective radio spectrum used, thereby allowing for more system capacity. Also, Global System for Mobile Communication (GSM) systems can include a VAD to reduce co-channel interference and to reduce battery consumption on the client or subscriber device.

These typical microphone-based VAD systems are significantly limited in capability as a result of the addition of environmental acoustic noise to the desired speech signal received by the single microphone, wherein the analysis is performed using typical signal processing techniques. In particular, limitations in performance of these microphone-based VAD systems are noted when processing signals having a low signal-to-noise ratio (SNR), and in settings where the background noise varies quickly. Thus, similar limitations are found in noise suppression systems using these microphone-based VADs.
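For context, the single-microphone spectral subtraction baseline cited above (Boll, 1979) can be sketched as follows. This is an illustrative reconstruction of the general technique, not the patent's algorithm; the function name, frame length, overlap, and spectral floor are assumptions.

```python
import numpy as np

def spectral_subtraction(noisy, noise_est_frames, frame_len=256, floor=0.01):
    """Classic single-microphone spectral subtraction (after Boll, 1979).

    noisy: 1-D array of audio samples.
    noise_est_frames: frames judged (e.g., by a VAD) to contain only noise,
        used to estimate the average noise magnitude spectrum.
    """
    window = np.hanning(frame_len)
    # Average noise magnitude spectrum from the noise-only frames.
    noise_mag = np.mean(
        [np.abs(np.fft.rfft(f * window)) for f in noise_est_frames], axis=0)

    out = np.zeros(len(noisy))
    hop = frame_len // 2
    for start in range(0, len(noisy) - frame_len + 1, hop):
        frame = noisy[start:start + frame_len] * window
        spec = np.fft.rfft(frame)
        mag = np.abs(spec)
        # Subtract the noise estimate; clamp to a spectral floor to
        # limit "musical noise" artifacts.
        clean_mag = np.maximum(mag - noise_mag, floor * mag)
        clean = np.fft.irfft(clean_mag * np.exp(1j * np.angle(spec)), frame_len)
        out[start:start + frame_len] += clean  # 50% overlap-add
    return out
```

The weakness called out in the passage above is visible in the structure: the noise estimate is frozen from VAD-selected frames, so it goes stale when the background changes quickly or the SNR is low.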
`
`BRIEF DESCRIPTION OF THE FIGURES
`
FIG. 1 is a block diagram of a denoising system, under an embodiment.
`
FIG. 2 is a block diagram including components of a noise removal algorithm, under the denoising system of an embodiment assuming a single noise source and direct paths to the microphones.

FIG. 3 is a block diagram including front-end components of a noise removal algorithm of an embodiment generalized to n distinct noise sources (these noise sources may be reflections or echoes of one another).

FIG. 4 is a block diagram including front-end components of a noise removal algorithm of an embodiment in a general case where there are n distinct noise sources and signal reflections.
`
`FIG. 5 is a flow diagram of a denoising method, under an
`embodiment.
`
FIG. 6 shows results of a noise suppression algorithm of an embodiment for an American English female speaker in the presence of airport terminal noise that includes many other human speakers and public announcements.

FIG. 7A is a block diagram of a Voice Activity Detector (VAD) system including hardware for use in receiving and processing signals relating to VAD, under an embodiment.

FIG. 7B is a block diagram of a VAD system using hardware of a coupled noise suppression system for use in receiving VAD information, under an alternative embodiment.

FIG. 8 is a flow diagram of a method for determining voiced and unvoiced speech using an accelerometer-based VAD, under an embodiment.

FIG. 9 shows plots including a noisy audio signal (live recording) along with a corresponding accelerometer-based VAD signal, the corresponding accelerometer output signal, and the denoised audio signal following processing by the noise suppression system using the VAD signal, under an embodiment.

FIG. 10 shows plots including a noisy audio signal (live recording) along with a corresponding SSM-based VAD signal, the corresponding SSM output signal, and the denoised audio signal following processing by the noise suppression system using the VAD signal, under an embodiment.

FIG. 11 shows plots including a noisy audio signal (live recording) along with a corresponding GEMS-based VAD signal, the corresponding GEMS output signal, and the denoised audio signal following processing by the noise suppression system using the VAD signal, under an embodiment.
`
`DETAILED DESCRIPTION
`
The following description provides specific details for a thorough understanding of, and enabling description for, embodiments of the noise suppression system. However, one skilled in the art will understand that the invention may be practiced without these details. In other instances, well-known structures and functions have not been shown or described in detail to avoid unnecessarily obscuring the description of the embodiments of the noise suppression system. In the following description, "signal" represents any acoustic signal (such as human speech) that is desired, and "noise" is any acoustic signal (which may include human speech) that is not desired. An example would be a person talking on a cellular telephone with a radio in the background. The person's speech is desired and the acoustic energy from the radio is not desired. In addition, "user" describes a person who is using the device and whose speech is desired to be captured by the system.

Also, "acoustic" is generally defined as acoustic waves propagating in air. Propagation of acoustic waves in media other than air will be noted as such. References to "speech" or "voice" generally refer to human speech including voiced speech, unvoiced speech, and/or a combination of voiced and unvoiced speech. Unvoiced speech or voiced speech is distinguished where necessary. The term "noise suppression"
`
`
`generally describes any method by which noise is reduced or
`eliminated in an electronic signal.
Moreover, the term "VAD" is generally defined as a vector or array signal, data, or information that in some manner represents the occurrence of speech in the digital or analog domain. A common representation of VAD information is a one-bit digital signal sampled at the same rate as the corresponding acoustic signals, with a zero value representing that no speech has occurred during the corresponding time sample, and a unity value indicating that speech has occurred during the corresponding time sample. While the embodiments described herein are generally described in the digital domain, the descriptions are also valid for the analog domain.

FIG. 1 is a block diagram of a denoising system 1000 of an embodiment that uses knowledge of when speech is occurring derived from physiological information on voicing activity. The system 1000 includes microphones 10 and sensors 20 that provide signals to at least one processor 30. The processor includes a denoising subsystem or algorithm 40.
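The one-bit VAD representation described above can be illustrated with a short sketch that derives a per-sample 0/1 stream from the short-time energy of a noise-immune voicing sensor (an energy measurement of this kind is suggested later in the specification for the RF sensor). The function name, frame length, and threshold are illustrative assumptions, not values from the patent.

```python
import numpy as np

def one_bit_vad(sensor, frame_len=80, threshold=1e-4):
    """Produce a one-bit VAD signal sampled at the same rate as the audio.

    sensor: samples from a voicing sensor (e.g., an accelerometer or RF
        vibration detector) that is insensitive to acoustic noise.
    Returns an array of 0/1 values, one per input sample: 1 where the
    short-time energy of the sensor signal indicates voicing.
    """
    vad = np.zeros(len(sensor), dtype=int)
    for start in range(0, len(sensor), frame_len):
        frame = sensor[start:start + frame_len]
        # Simple short-time energy measurement per frame.
        if np.mean(frame ** 2) > threshold:
            vad[start:start + len(frame)] = 1
    return vad
```

Because the decision is made per frame but emitted per sample, the result lines up one-to-one with the acoustic samples, matching the one-bit representation described in the text.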
FIG. 2 is a block diagram including components of a noise removal algorithm 200 of an embodiment. A single noise source and a direct path to the microphones are assumed. An operational description of the noise removal algorithm 200 of an embodiment is provided using a single signal source 100 and a single noise source 101, but is not so limited. This algorithm 200 uses two microphones: a "signal" microphone 1 ("MIC 1") and a "noise" microphone 2 ("MIC 2"), but is not so limited. The signal microphone MIC 1 is assumed to capture mostly signal with some noise, while MIC 2 captures mostly noise with some signal. The data from the signal source 100 to MIC 1 is denoted by s(n), where s(n) is a discrete sample of the analog signal from the source 100. The data from the signal source 100 to MIC 2 is denoted by s2(n). The data from the noise source 101 to MIC 2 is denoted by n(n). The data from the noise source 101 to MIC 1 is denoted by n2(n). Similarly, the data from MIC 1 to noise removal element 205 is denoted by m1(n), and the data from MIC 2 to noise removal element 205 is denoted by m2(n).

The noise removal element 205 also receives a signal from a voice activity detection (VAD) element 204. The VAD 204 uses physiological information to determine when a speaker is speaking. In various embodiments, the VAD can include at least one of an accelerometer, a skin surface microphone in physical contact with skin of a user, a human tissue vibration detector, a radio frequency (RF) vibration and/or motion detector/device, an electroglottograph, an ultrasound device, an acoustic microphone that is being used to detect acoustic frequency signals that correspond to the user's speech directly from the skin of the user (anywhere on the body), an airflow detector, and a laser vibration detector.

The transfer functions from the signal source 100 to MIC 1 and from the noise source 101 to MIC 2 are assumed to be unity. The transfer function from the signal source 100 to MIC 2 is denoted by H2(z), and the transfer function from the noise source 101 to MIC 1 is denoted by H1(z). The assumption of unity transfer functions does not inhibit the generality of this algorithm, as the actual relations between the signal, noise, and microphones are simply ratios and the ratios are redefined in this manner for simplicity.

In conventional two-microphone noise removal systems, the information from MIC 2 is used to attempt to remove noise from MIC 1. However, a (generally unspoken) assumption is that the VAD element 204 is never perfect, and thus the denoising must be performed cautiously, so as not to remove too much of the signal along with the noise. However, if the VAD 204 is assumed to be perfect such that it is equal to zero when there is no speech being produced by the user, and equal to one when speech is produced, a substantial improvement in the noise removal can be made.

In analyzing the single noise source 101 and the direct path to the microphones, with reference to FIG. 2, the total acoustic information coming into MIC 1 is denoted by m1(n). The total acoustic information coming into MIC 2 is similarly labeled m2(n). In the z (digital frequency) domain, these are represented as M1(z) and M2(z). Then

    M1(z) = S(z) + N2(z)
    M2(z) = N(z) + S2(z)

with

    N2(z) = N(z)H1(z)
    S2(z) = S(z)H2(z),

so that

    M1(z) = S(z) + N(z)H1(z)
    M2(z) = N(z) + S(z)H2(z).        Eq. 1

This is the general case for all two-microphone systems. In a practical system there is always going to be some leakage of noise into MIC 1, and some leakage of signal into MIC 2. Equation 1 has four unknowns and only two known relationships and therefore cannot be solved explicitly.

However, there is another way to solve for some of the unknowns in Equation 1. The analysis starts with an examination of the case where the signal is not being generated, that is, where a signal from the VAD element 204 equals zero and speech is not being produced. In this case, s(n) = S(z) = 0, and Equation 1 reduces to

    M1n(z) = N(z)H1(z)
    M2n(z) = N(z),

where the n subscript on the M variables indicates that only noise is being received. This leads to

    M1n(z) = M2n(z)H1(z)

    H1(z) = M1n(z)/M2n(z).        Eq. 2

The function H1(z) can be calculated using any of the available system identification algorithms and the microphone outputs when the system is certain that only noise is being received. The calculation can be done adaptively, so that the system can react to changes in the noise.

A solution is now available for one of the unknowns in Equation 1. Another unknown, H2(z), can be determined by using the instances where the VAD equals one and speech is being produced. When this is occurring, but the recent (perhaps less than 1 second) history of the microphones indicates low levels of noise, it can be assumed that n(n) = N(z) ≈ 0. Then Equation 1 reduces to

    M1s(z) = S(z)
    M2s(z) = S(z)H2(z),

which in turn leads to

    M2s(z) = M1s(z)H2(z)
`
`
`
    H2(z) = M2s(z)/M1s(z),

which is the inverse of the H1(z) calculation. However, it is noted that different inputs are being used (now only the signal is occurring whereas before only the noise was occurring). While calculating H2(z), the values calculated for H1(z) are held constant and vice versa. Thus, it is assumed that while one of H1(z) and H2(z) is being calculated, the one not being calculated does not change substantially.

After calculating H1(z) and H2(z), they are used to remove the noise from the signal. If Equation 1 is rewritten as

    S(z) = M1(z) − N(z)H1(z)
    N(z) = M2(z) − S(z)H2(z)
    S(z) = M1(z) − [M2(z) − S(z)H2(z)]H1(z)
    S(z)[1 − H2(z)H1(z)] = M1(z) − M2(z)H1(z),

then N(z) may be substituted as shown to solve for S(z) as

    S(z) = [M1(z) − M2(z)H1(z)] / [1 − H2(z)H1(z)].        Eq. 3

If the transfer functions H1(z) and H2(z) can be described with sufficient accuracy, then the noise can be completely removed and the original signal recovered. This remains true without respect to the amplitude or spectral characteristics of the noise. The only assumptions made include use of a perfect VAD, sufficiently accurate H1(z) and H2(z), and that when one of H1(z) and H2(z) is being calculated the other does not change substantially. In practice these assumptions have proven reasonable.

The noise removal algorithm described herein is easily generalized to include any number of noise sources. FIG. 3 is a block diagram including front-end components 300 of a noise removal algorithm of an embodiment, generalized to n distinct noise sources. These distinct noise sources may be reflections or echoes of one another, but are not so limited. There are several noise sources shown, each with a transfer function, or path, to each microphone. The previously named path H2 has been relabeled as H0, so that labeling noise source 2's path to MIC 1 is more convenient. The outputs of each microphone, when transformed to the z domain, are:

    M1(z) = S(z) + N1(z)H1(z) + N2(z)H2(z) + ... + Nn(z)Hn(z)
    M2(z) = S(z)H0(z) + N1(z)G1(z) + N2(z)G2(z) + ... + Nn(z)Gn(z).        Eq. 4

When there is no signal (VAD = 0), then (suppressing z for clarity)

    M1n = N1H1 + N2H2 + ... + NnHn
    M2n = N1G1 + N2G2 + ... + NnGn.        Eq. 5

A new transfer function can now be defined as

    H̃1 = M1n/M2n = (N1H1 + N2H2 + ... + NnHn) / (N1G1 + N2G2 + ... + NnGn),        Eq. 6

where H̃1 is analogous to H1(z) above. Thus H̃1 depends only on the noise sources and their respective transfer functions and can be calculated any time there is no signal being transmitted. Once again, the "n" subscripts on the microphone inputs denote only that noise is being detected, while an "s" subscript denotes that only signal is being received by the microphones.

Examining Equation 4 while assuming an absence of noise produces

    M1s = S
    M2s = SH0.

Thus, H0 can be solved for as before, using any available transfer function calculating algorithm. Mathematically, then,

    H̃2 = M2s/M1s = H0.

Rewriting Equation 4, using H̃2 as just defined, provides

    H̃1 = (M1 − S) / (M2 − SH̃2).        Eq. 7

Solving for S yields

    S = (M1 − M2H̃1) / (1 − H̃2H̃1),        Eq. 8

which is the same as Equation 3, with H̃1 taking the place of H1, and H̃2 taking the place of H2. Thus the noise removal algorithm still is mathematically valid for any number of noise sources, including multiple echoes of noise sources. Again, if H̃1 and H̃2 can be estimated to a high enough accuracy, and the above assumption of only one path from the signal to the microphones holds, the noise may be removed completely.

The most general case involves multiple noise sources and multiple signal sources. FIG. 4 is a block diagram including front-end components 400 of a noise removal algorithm of an embodiment in the most general case where there are n distinct noise sources and signal reflections. Here, signal reflections enter both microphones MIC 1 and MIC 2. This is the most general case, as reflections of the noise sources into the microphones MIC 1 and MIC 2 can be modeled accurately as simple additional noise sources. For clarity, the direct path from the signal to MIC 2 is changed from H0(z) to H00(z), and the reflected paths to MIC 1 and MIC 2 are denoted by H01(z) and H02(z), respectively.

The input into the microphones now becomes

    M1(z) = S(z) + S(z)H01(z) + N1(z)H1(z) + N2(z)H2(z) + ... + Nn(z)Hn(z)
    M2(z) = S(z)H00(z) + S(z)H02(z) + N1(z)G1(z) + N2(z)G2(z) + ... + Nn(z)Gn(z).        Eq. 9

When the VAD = 0, the inputs become (suppressing z again)

    M1n = N1H1 + N2H2 + ... + NnHn
    M2n = N1G1 + N2G2 + ... + NnGn,

which is the same as Equation 5. Thus, the calculation of H̃1 in Equation 6 is unchanged, as expected. In examining the situation where there is no noise, Equation 9 reduces to
`
    M1s = S + SH01
    M2s = SH00 + SH02.

This leads to the definition of H̃2 as

    H̃2 = M2s/M1s = (H00 + H02) / (1 + H01).        Eq. 10

Rewriting Equation 9 again using the definition for H̃2 (as in Equation 7) provides

    H̃1 = [M1 − S(1 + H01)] / [M2 − S(H00 + H02)].        Eq. 11

Some algebraic manipulation yields

    S(1 + H01 − H̃1(H00 + H02)) = M1 − M2H̃1

    S(1 + H01)[1 − H̃1(H00 + H02)/(1 + H01)] = M1 − M2H̃1

    S(1 + H01)(1 − H̃1H̃2) = M1 − M2H̃1,

and finally

    S(1 + H01) = (M1 − M2H̃1) / (1 − H̃1H̃2).        Eq. 12

Equation 12 is the same as Equation 8, with the replacement of H1 by H̃1, and the addition of the (1 + H01) factor on the left side. This extra factor (1 + H01) means that S cannot be solved for directly in this situation, but a solution can be generated for the signal plus the addition of all of its echoes. This is not such a bad situation, as there are many conventional methods for dealing with echo suppression, and even if the echoes are not suppressed, it is unlikely that they will affect the comprehensibility of the speech to any meaningful extent. The more complex calculation of H̃2 is needed to account for the signal echoes in MIC 2, which act as noise sources.

FIG. 5 is a flow diagram 500 of a denoising algorithm, under an embodiment. In operation, the acoustic signals are received, at block 502. Further, physiological information associated with human voicing activity is received, at block 504. A first transfer function representative of the acoustic signal is calculated upon determining that voicing information is absent from the acoustic signal for at least one specified period of time, at block 506. A second transfer function representative of the acoustic signal is calculated upon determining that voicing information is present in the acoustic signal for at least one specified period of time, at block 508. Noise is removed from the acoustic signal using at least one combination of the first transfer function and the second transfer function, producing denoised acoustic data streams, at block 510.

An algorithm for noise removal, or denoising algorithm, is described herein, from the simplest case of a single noise source with a direct path to multiple noise sources with reflections and echoes. The algorithm has been shown herein to be viable under any environmental conditions. The type and amount of noise are inconsequential if a good estimate has been made of H̃1 and H̃2, and if one does not change substantially while the other is calculated. If the user environment is such that echoes are present, they can be compensated for if coming from a noise source. If signal echoes are also present, they will affect the cleaned signal, but the effect should be negligible in most environments.

In operation, the algorithm of an embodiment has shown excellent results in dealing with a variety of noise types, amplitudes, and orientations. However, there are always approximations and adjustments that have to be made when moving from mathematical concepts to engineering applications. One assumption is made in Equation 3, where H2(z) is assumed small and therefore H2(z)H1(z) ≈ 0, so that Equation 3 reduces to

    S(z) = M1(z) − M2(z)H1(z).

This means that only H1(z) has to be calculated, speeding up the process and reducing the number of computations required considerably. With the proper selection of microphones, this approximation is easily realized.

Another approximation involves the filter used in an embodiment. The actual H1(z) will undoubtedly have both poles and zeros, but for stability and simplicity an all-zero Finite Impulse Response (FIR) filter is used. With enough taps the approximation to the actual H1(z) can be very good.

To further increase the performance of the noise suppression system, the spectrum of interest (generally about 125 to 3700 Hz) is divided into subbands. The wider the range of frequencies over which a transfer function must be calculated, the more difficult it is to calculate it accurately. Therefore the acoustic data was divided into 16 subbands, and the denoising algorithm was then applied to each subband in turn. Finally, the 16 denoised data streams were recombined to yield the denoised acoustic data. This works very well, but any combinations of subbands (i.e., 4, 6, 8, 32, equally spaced, perceptually spaced, etc.) can be used and all have been found to work better than a single subband.

The amplitude of the noise was constrained in an embodiment so that the microphones used did not saturate (that is, operate outside a linear response region). It is important that the microphones operate linearly to ensure the best performance. Even with this restriction, very low signal-to-noise ratio (SNR) signals can be denoised (down to −10 dB or less).

The calculation of H1(z) is accomplished every 10 milliseconds using the Least-Mean Squares (LMS) method, a common adaptive transfer function. An explanation may be found in "Adaptive Signal Processing" (1985), by Widrow and Stearns, published by Prentice-Hall, ISBN 0-13-004029-0. The LMS was used for demonstration purposes, but many other system identification techniques can be used to identify H1(z) and H2(z) in FIG. 2.

The VAD for an embodiment is derived from a radio frequency sensor and the two microphones, yielding very high accuracy (>99%) for both voiced and unvoiced speech. The VAD of an embodiment uses a radio frequency (RF) vibration detector interferometer to detect tissue motion associated with human speech production, but is not so limited. The signal from the RF device is completely acoustic-noise free, and is able to function in any acoustic noise environment. A simple energy measurement of the RF signal can be used to determine if voiced speech is occurring. Unvoiced speech can be determined using conventional acoustic-based methods, by proximity to voiced sections determined using the RF sensor or similar voicing sensors, or through a combination of the above. Since there is much less energy in unvoiced speech, its detection accuracy is not as critical to good noise suppression performance as is voiced speech.
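The engineering simplifications above — an all-zero FIR model for H1(z), adaptation only while the VAD reports no speech, and the reduced form S(z) = M1(z) − M2(z)H1(z) — can be sketched together with a normalized LMS update. The patent describes recalculating H1(z) every 10 ms with LMS; this sketch adapts per sample instead, and the function name, tap count, and step size are illustrative assumptions.

```python
import numpy as np

def lms_denoise(m1, m2, vad, taps=32, mu=0.05):
    """Two-microphone denoising with an all-zero FIR model of H1(z).

    h1 is adapted by normalized LMS only while vad == 0, i.e. when
    MIC 1 is assumed to contain filtered noise alone (m1 ~ h1 * m2).
    The output applies the simplified Equation 3: s = m1 - h1 * m2.
    """
    h1 = np.zeros(taps)
    out = np.zeros(len(m1))
    for n in range(taps, len(m1)):
        x = m2[n - taps + 1:n + 1][::-1]  # most recent MIC 2 samples
        y = h1 @ x                         # predicted noise at MIC 1
        e = m1[n] - y
        out[n] = e                         # denoised sample
        if vad[n] == 0:
            # Adapt only on noise-only data (perfect-VAD assumption).
            h1 += mu * e * x / (x @ x + 1e-9)
    return out, h1
```

During voiced or unvoiced speech (vad == 1) the taps are frozen, which mirrors the requirement above that training must not occur on speech; a subband version would run one such filter per band.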
`
`60
`
`
`
With voiced and unvoiced speech detected reliably, the algorithm of an embodiment can be implemented. Once again, it is useful to repeat that the noise removal algorithm does not depend on how the VAD is obtained, only that it is accurate, especially for voiced speech. If speech is not detected and training occurs on the speech, the subsequent denoised acoust



