PTO/SB/05 (08-03)
Approved for use through 07/31/2006. OMB 0651-0032
U.S. Patent and Trademark Office; U.S. DEPARTMENT OF COMMERCE

UTILITY PATENT APPLICATION TRANSMITTAL
(Only for new nonprovisional applications under 37 CFR 1.53(b))

Title: VAD-based Multiple-Microphone Acoustic Noise Suppression

ADDRESS TO:
Mail Stop Patent Application
Commissioner for Patents
P.O. Box 1450
Alexandria, VA 22313-1450

APPLICATION ELEMENTS
See MPEP chapter 600 concerning utility patent application contents.

1. Fee Transmittal Form (e.g., PTO/SB/17)
   (Submit an original and a duplicate for fee processing)
2. Applicant claims small entity status. See 37 CFR 1.27.
3. Specification [Total Pages: 34]
   (preferred arrangement set forth below)
   - Descriptive title of the invention
   - Cross Reference to Related Applications
   - Statement Regarding Fed sponsored R & D
   - Reference to sequence listing, a table, or a computer program listing appendix
   - Background of the Invention
   - Brief Summary of the Invention
   - Brief Description of the Drawings (if filed)
   - Detailed Description
   - Claim(s)
   - Abstract of the Disclosure
4. Drawing(s) (35 U.S.C. 113) [Total Sheets: 12]
5. Oath or Declaration [Total Sheets: ___]
   a. Newly executed (original or copy)
   b. Copy from a prior application (37 CFR 1.63(d))
      (for continuation/divisional with Box 18 completed)
      i. DELETION OF INVENTOR(S)
         Signed statement attached deleting inventor(s) named in the prior
         application, see 37 CFR 1.63(d)(2) and 1.33(b).
6. Application Data Sheet. See 37 CFR 1.76
7. CD-ROM or CD-R in duplicate, large table or Computer Program (Appendix)
8. Nucleotide and/or Amino Acid Sequence Submission (if applicable, all necessary)
   a. Computer Readable Form (CRF)
   b. Specification Sequence Listing on:
      i. CD-ROM or CD-R (2 copies); or
      ii. Paper
   c. Statements verifying identity of above copies

ACCOMPANYING APPLICATION PARTS

9. Assignment Papers (cover sheet & document(s))
10. 37 CFR 3.73(b) Statement (when there is an assignee); Power of Attorney
11. English Translation Document (if applicable)
12. Information Disclosure Statement (IDS)/PTO-1449; Copies of IDS Citations
13. Preliminary Amendment
14. Return Receipt Postcard (MPEP 503) (Should be specifically itemized)
15. Certified Copy of Priority Document(s) (if foreign priority is claimed)
16. Nonpublication Request under 35 U.S.C. 122(b)(2)(B)(i). Applicant must attach
    form PTO/SB/35 or its equivalent.
17. Other:

18. If a CONTINUING APPLICATION, check appropriate box, and supply the requisite
    information below and in the first sentence of the specification following the
    title, or in an Application Data Sheet under 37 CFR 1.76:
    Continuation / Divisional / Continuation-in-part (CIP)
    of prior application No.: 09/905,361
    Prior application information: Examiner: Tony Jacobson; Art Unit: 2644
    For CONTINUATION OR DIVISIONAL APPS only: The entire disclosure of the prior
    application, from which an oath or declaration is supplied under Box 5b, is
    considered a part of the disclosure of the accompanying continuation or
    divisional application and is hereby incorporated by reference. The
    incorporation can only be relied upon when a portion has been inadvertently
    omitted from the submitted application parts.

19. CORRESPONDENCE ADDRESS
    Customer Number: ___ OR Correspondence address below
    Name: Shemwell Gregory & Courtney LLP
    Address: 4880 Stevens Creek Boulevard, Suite 201
    Zip Code: 95129

Name (Print/Type): Richard L. Gregory, Jr.
Registration No. (Attorney/Agent): 42,607
Signature: (signed)
Date: September 18, 2003

This collection of information is required by 37 CFR 1.53(b). The information is
required to obtain or retain a benefit by the public which is to file (and by the
USPTO to process) an application. Confidentiality is governed by 35 U.S.C. 122 and
37 CFR 1.14. This collection is estimated to take 12 minutes to complete, including
gathering, preparing, and submitting the completed application form to the USPTO.
Time will vary depending upon the individual case. Any comments on the amount of
time you require to complete this form and/or suggestions for reducing this burden,
should be sent to the Chief Information Officer, U.S. Patent and Trademark Office,
U.S. Department of Commerce, P.O. Box 1450, Alexandria, VA 22313-1450. DO NOT SEND
FEES OR COMPLETED FORMS TO THIS ADDRESS. SEND TO: Mail Stop Patent Application,
Commissioner for Patents, P.O. Box 1450, Alexandria, VA 22313-1450.
If you need assistance in completing the form, call 1-800-PTO-9199 and select option 2.

`
`Page 1 of 639
`
GOOGLE EXHIBIT 1002
`
`
`
`
`EXPRESS MAIL CERTIFICATE OF MAILING
`
“Express Mail” mailing label number: EV 326 938 875 US
Date of Deposit: September 18, 2003
I hereby certify that I am causing the paper(s) and/or fee(s) indicated below to be
deposited with the United States Postal Service “Express Mail Post Office to
Addressee” service on the date indicated above and that the paper(s) and/or
fee(s) have been addressed to Mail Stop Patent Application, Commissioner for
Patents, PO Box 1450, Alexandria, VA 22313-1450.

Richard L. Gregory, Jr.
(Typed or printed name of person mailing paper(s) or fee(s))

(Signature of person mailing paper or fee)

9-16-2003
(Date signed)
`
Filing/Issue Date: Herewith
Serial/Patent No.:
Title: VAD-BASED MULTIPLE-MICROPHONE ACOUSTIC NOISE SUPPRESSION
Date Mailed: September 18, 2003
Atty. Docket No.: ALPH.P010X
The following has been received in the U.S. Patent & Trademark Office on the date stamped hereon:

[ ] Amendment/Response (__ pgs.)
[ ] Preliminary Amendment (__ pgs.)
[x] Information Disclosure Statement & PTO/SB/08A
[x] Application - Utility (34 pgs.)
[ ] Application - Rule 1.53(b) Contin. (__ pgs.)
[ ] Application - Rule 1.53(b) Divis. (__ pgs.)
[ ] Application - Rule 1.53(b) CIP (__ pgs.)
[ ] Application - Rule 1.53(d) CPA (__ pgs.)
[ ] Application - PCT (__ pgs.)
[ ] Application - Provisional (__ pgs.)
[x] Utility Patent Application Transmittal
[x] Drawings (12 sheets)
[ ] Fee Transmittal (in duplicate)
[ ] Declaration (__ pgs.)
[ ] Assignment & Cover Sheet (__ pgs.)
[ ] Power of Attorney (__ pgs.)
[ ] Nonpublication Request (35 USC 122(b))
[ ] Petition for Extension of Time (__ month(s))
[ ] Issue Fee Transmittal
[ ] Submission of Formal Drawings
[ ] Notice of Appeal
[ ] Appeal Brief (__ pgs. in triplicate)
[ ] Reply Brief
[ ] Response to Notice of Missing Parts
[x] Itemized Postcard
[x] Express Mail Certificate of Mailing
[x] Express Mail No. EV 326 938 875 US
[ ] Check No. ____ Amt ____
[ ] Other: Copies of twenty-five (25) cited references.
`
`
`
`
Attorney Docket No. ALPH.P010X

UNITED STATES PATENT APPLICATION
`
`for
`
Voice Activity Detector (VAD)-Based Multiple-Microphone Acoustic Noise Suppression
`
`Inventors:
`
`Gregory C. Burnett
`
`Eric F. Breitfeller
`
`Prepared by
`
`Shemwell Gregory & Courtney LLP
`4880 Stevens Creek Blvd., Suite 201
`San Jose, CA 95129
`408-236-6647
`
Attorney Docket No. ALPH.P010X

EXPRESS MAIL CERTIFICATE OF MAILING

“Express Mail” mailing label number: EV 326 938 875 US
Date of Deposit: September 18, 2003
`I hereby certify that this paper is being deposited with the United States Postal
`Service “Express Mail Post Office to Addressee” service under 37 CFR §1.10 on the date
`indicated above and is addressed to Mail Stop Patent Application, Commissioner for
`Patents, PO Box 1450, Alexandria, VA 22313-1450.
`
`
`
Voice Activity Detector (VAD)-Based Multiple-Microphone Acoustic Noise Suppression
`
`RELATED APPLICATIONS
This patent application is a continuation-in-part of United States Patent
Application Number 09/905,361, filed July 12, 2001, which claims priority from United
States Patent Application Number 60/219,297, filed July 19, 2000. This patent
application also claims priority from United States Patent Application Number
10/383,162, filed March 5, 2003.

FIELD OF THE INVENTION
The disclosed embodiments relate to systems and methods for detecting and
processing a desired signal in the presence of acoustic noise.

BACKGROUND
`
Many noise suppression algorithms and techniques have been developed over the
years. Most of the noise suppression systems in use today for speech communication
systems are based on a single-microphone spectral subtraction technique first developed in
the 1970s and described, for example, by S. F. Boll in “Suppression of Acoustic Noise in
Speech using Spectral Subtraction,” IEEE Trans. on ASSP, pp. 113-120, 1979. These
techniques have been refined over the years, but the basic principles of operation have
remained the same. See, for example, United States Patent Number 5,687,243 of
McLaughlin, et al., and United States Patent Number 4,811,404 of Vilmur, et al.
Generally, these techniques make use of a microphone-based Voice Activity Detector
(VAD) to determine the background noise characteristics, where “voice” is generally
understood to include human voiced speech, unvoiced speech, or a combination of voiced
and unvoiced speech.

The VAD has also been used in digital cellular systems. As an example of such a
use, see United States Patent Number 6,453,291 of Ashley, where a VAD configuration
appropriate to the front-end of a digital cellular system is described. Further, some Code
Division Multiple Access (CDMA) systems utilize a VAD to minimize the effective radio
spectrum used, thereby allowing for more system capacity. Also, Global System for
Mobile Communication (GSM) systems can include a VAD to reduce co-channel
interference and to reduce battery consumption on the client or subscriber device.

These typical microphone-based VAD systems are significantly limited in
capability as a result of the addition of environmental acoustic noise to the desired speech
signal received by the single microphone, wherein the analysis is performed using typical
signal processing techniques. In particular, limitations in performance of these
microphone-based VAD systems are noted when processing signals having a low
signal-to-noise ratio (SNR), and in settings where the background noise varies quickly. Thus,
similar limitations are found in noise suppression systems using these microphone-based
VADs.
`
`
`BRIEF DESCRIPTION OF THE FIGURES
Figure 1 is a block diagram of a denoising system, under an embodiment.
Figure 2 is a block diagram including components of a noise removal algorithm,
under the denoising system of an embodiment assuming a single noise source and direct
paths to the microphones.
Figure 3 is a block diagram including front-end components of a noise removal
algorithm of an embodiment generalized to n distinct noise sources (these noise sources
may be reflections or echoes of one another).
Figure 4 is a block diagram including front-end components of a noise removal
algorithm of an embodiment in a general case where there are n distinct noise sources and
signal reflections.
Figure 5 is a flow diagram of a denoising method, under an embodiment.
Figure 6 shows results of a noise suppression algorithm of an embodiment for an
American English female speaker in the presence of airport terminal noise that includes
many other human speakers and public announcements.
Figure 7A is a block diagram of a Voice Activity Detector (VAD) system
including hardware for use in receiving and processing signals relating to VAD, under an
embodiment.
Figure 7B is a block diagram of a VAD system using hardware of a coupled noise
suppression system for use in receiving VAD information, under an alternative
embodiment.
Figure 8 is a flow diagram of a method for determining voiced and unvoiced
speech using an accelerometer-based VAD, under an embodiment.
Figure 9 shows plots including a noisy audio signal (live recording) along with a
corresponding accelerometer-based VAD signal, the corresponding accelerometer output
signal, and the denoised audio signal following processing by the noise suppression
system using the VAD signal, under an embodiment.
Figure 10 shows plots including a noisy audio signal (live recording) along with a
corresponding SSM-based VAD signal, the corresponding SSM output signal, and the
denoised audio signal following processing by the noise suppression system using the
VAD signal, under an embodiment.
Figure 11 shows plots including a noisy audio signal (live recording) along with a
corresponding GEMS-based VAD signal, the corresponding GEMS output signal, and the
denoised audio signal following processing by the noise suppression system using the
VAD signal, under an embodiment.
`
`
`DETAILED DESCRIPTION
`
The following description provides specific details for a thorough understanding
of, and enabling description for, embodiments of the noise suppression system.
However, one skilled in the art will understand that the invention may be practiced
without these details. In other instances, well-known structures and functions have not
been shown or described in detail to avoid unnecessarily obscuring the description of the
embodiments of the noise suppression system. In the following description, “signal”
represents any acoustic signal (such as human speech) that is desired, and “noise” is any
acoustic signal (which may include human speech) that is not desired. An example
would be a person talking on a cellular telephone with a radio in the background. The
person’s speech is desired and the acoustic energy from the radio is not desired. In
addition, “user” describes a person who is using the device and whose speech is desired
to be captured by the system.

Also, “acoustic” is generally defined as acoustic waves propagating in air.
Propagation of acoustic waves in media other than air will be noted as such. References
to “speech” or “voice” generally refer to human speech including voiced speech,
unvoiced speech, and/or a combination of voiced and unvoiced speech. Unvoiced speech
or voiced speech is distinguished where necessary. The term “noise suppression”
generally describes any method by which noise is reduced or eliminated in an electronic
signal.

Moreover, the term “VAD” is generally defined as a vector or array signal, data,
or information that in some manner represents the occurrence of speech in the digital or
analog domain. A common representation of VAD information is a one-bit digital signal
sampled at the same rate as the corresponding acoustic signals, with a zero value
representing that no speech has occurred during the corresponding time sample, and a
unity value indicating that speech has occurred during the corresponding time sample.
While the embodiments described herein are generally described in the digital domain,
the descriptions are also valid for the analog domain.

Figure 1 is a block diagram of a denoising system 1000 of an embodiment that
uses knowledge of when speech is occurring derived from physiological information on
voicing activity. The system 1000 includes microphones 10 and sensors 20 that provide
signals to at least one processor 30. The processor includes a denoising subsystem or
algorithm 40.
`
Figure 2 is a block diagram including components of a noise removal algorithm
200 of an embodiment. A single noise source and a direct path to the microphones are
assumed. An operational description of the noise removal algorithm 200 of an
embodiment is provided using a single signal source 100 and a single noise source 101,
but is not so limited. This algorithm 200 uses two microphones: a “signal” microphone 1
(“MIC 1”) and a “noise” microphone 2 (“MIC 2”), but is not so limited. The signal
microphone MIC 1 is assumed to capture mostly signal with some noise, while MIC 2
captures mostly noise with some signal. The data from the signal source 100 to MIC 1 is
denoted by $s(n)$, where $s(n)$ is a discrete sample of the analog signal from the source 100.
The data from the signal source 100 to MIC 2 is denoted by $s_2(n)$. The data from the
noise source 101 to MIC 2 is denoted by $n(n)$. The data from the noise source 101 to
MIC 1 is denoted by $n_2(n)$. Similarly, the data from MIC 1 to noise removal element 205
is denoted by $m_1(n)$, and the data from MIC 2 to noise removal element 205 is denoted by
$m_2(n)$.

The noise removal element 205 also receives a signal from a voice activity
detection (VAD) element 204. The VAD 204 uses physiological information to
determine when a speaker is speaking. In various embodiments, the VAD can include at
least one of an accelerometer, a skin surface microphone in physical contact with skin of
a user, a human tissue vibration detector, a radio frequency (RF) vibration and/or motion
detector/device, an electroglottograph, an ultrasound device, an acoustic microphone that
is being used to detect acoustic frequency signals that correspond to the user’s speech
directly from the skin of the user (anywhere on the body), an airflow detector, and a laser
vibration detector.

The transfer functions from the signal source 100 to MIC 1 and from the noise
source 101 to MIC 2 are assumed to be unity. The transfer function from the signal
source 100 to MIC 2 is denoted by $H_2(z)$, and the transfer function from the noise source
101 to MIC 1 is denoted by $H_1(z)$. The assumption of unity transfer functions does not
inhibit the generality of this algorithm, as the actual relations between the signal, noise,
and microphones are simply ratios and the ratios are redefined in this manner for
simplicity.
`
`
In conventional two-microphone noise removal systems, the information from
MIC 2 is used to attempt to remove noise from MIC 1. However, a (generally
unspoken) assumption is that the VAD element 204 is never perfect, and thus the
denoising must be performed cautiously, so as not to remove too much of the signal along
with the noise. However, if the VAD 204 is assumed to be perfect such that it is equal to
zero when there is no speech being produced by the user, and equal to one when speech is
produced, a substantial improvement in the noise removal can be made.

In analyzing the single noise source 101 and the direct path to the microphones,
with reference to Figure 2, the total acoustic information coming into MIC 1 is denoted
by $m_1(n)$. The total acoustic information coming into MIC 2 is similarly labeled $m_2(n)$.
In the z (digital frequency) domain, these are represented as $M_1(z)$ and $M_2(z)$. Then,

$$
\begin{aligned}
M_1(z) &= S(z) + N_2(z) \\
M_2(z) &= N(z) + S_2(z)
\end{aligned}
$$

with

$$
\begin{aligned}
N_2(z) &= N(z)H_1(z) \\
S_2(z) &= S(z)H_2(z),
\end{aligned}
$$

so that

$$
\begin{aligned}
M_1(z) &= S(z) + N(z)H_1(z) \\
M_2(z) &= N(z) + S(z)H_2(z).
\end{aligned}
\qquad \text{Eq. 1}
$$

This is the general case for all two microphone systems. In a practical system
there is always going to be some leakage of noise into MIC 1, and some leakage of signal
into MIC 2. Equation 1 has four unknowns and only two known relationships and
therefore cannot be solved explicitly.
`
However, there is another way to solve for some of the unknowns in Equation 1.
The analysis starts with an examination of the case where the signal is not being
generated, that is, where a signal from the VAD element 204 equals zero and speech is
not being produced. In this case, $s(n) = S(z) = 0$, and Equation 1 reduces to

$$
\begin{aligned}
M_{1n}(z) &= N(z)H_1(z) \\
M_{2n}(z) &= N(z),
\end{aligned}
$$

where the n subscript on the M variables indicates that only noise is being received. This
leads to

$$
\begin{aligned}
M_{1n}(z) &= M_{2n}(z)H_1(z) \\
H_1(z) &= \frac{M_{1n}(z)}{M_{2n}(z)}.
\end{aligned}
\qquad \text{Eq. 2}
$$

The function $H_1(z)$ can be calculated using any of the available system
identification algorithms and the microphone outputs when the system is certain that only
noise is being received. The calculation can be done adaptively, so that the system can
react to changes in the noise.
`
A solution is now available for one of the unknowns in Equation 1. Another
unknown, $H_2(z)$, can be determined by using the instances where the VAD equals one and
speech is being produced. When this is occurring, but the recent (perhaps less than 1
second) history of the microphones indicates low levels of noise, it can be assumed that
$n(n) = N(z) \approx 0$. Then Equation 1 reduces to

$$
\begin{aligned}
M_{1s}(z) &= S(z) \\
M_{2s}(z) &= S(z)H_2(z),
\end{aligned}
$$

which in turn leads to

$$
\begin{aligned}
M_{2s}(z) &= M_{1s}(z)H_2(z) \\
H_2(z) &= \frac{M_{2s}(z)}{M_{1s}(z)},
\end{aligned}
$$

which is the inverse of the $H_1(z)$ calculation. However, it is noted that different inputs are
being used (now only the signal is occurring whereas before only the noise was
occurring). While calculating $H_2(z)$, the values calculated for $H_1(z)$ are held constant and
vice versa. Thus, it is assumed that while one of $H_1(z)$ and $H_2(z)$ is being calculated, the
one not being calculated does not change substantially.

After calculating $H_1(z)$ and $H_2(z)$, they are used to remove the noise from the
signal. If Equation 1 is rewritten as

$$
\begin{aligned}
S(z) &= M_1(z) - N(z)H_1(z) \\
N(z) &= M_2(z) - S(z)H_2(z) \\
S(z) &= M_1(z) - \left[M_2(z) - S(z)H_2(z)\right]H_1(z) \\
S(z)\left[1 - H_2(z)H_1(z)\right] &= M_1(z) - M_2(z)H_1(z),
\end{aligned}
$$

then $N(z)$ may be substituted as shown to solve for $S(z)$ as

$$
S(z) = \frac{M_1(z) - M_2(z)H_1(z)}{1 - H_2(z)H_1(z)}.
\qquad \text{Eq. 3}
$$

If the transfer functions $H_1(z)$ and $H_2(z)$ can be described with sufficient accuracy,
then the noise can be completely removed and the original signal recovered. This
remains true without respect to the amplitude or spectral characteristics of the noise. The
only assumptions made include use of a perfect VAD, sufficiently accurate $H_1(z)$ and
$H_2(z)$, and that when one of $H_1(z)$ and $H_2(z)$ is being calculated the other does not
change substantially. In practice these assumptions have proven reasonable.
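
As an illustration of Equations 1 through 3, the following minimal sketch works at a single frequency bin, where each transfer function is just a complex number. The least-squares ratio estimator, the frame counts, and the values `H1_TRUE` and `H2_TRUE` are assumptions made for this example only; the description does not specify a particular estimator.

```python
import cmath
import random

def estimate_ratio(num_frames, den_frames):
    """Least-squares estimate of H in num = H * den over several frames."""
    top = sum(n * d.conjugate() for n, d in zip(num_frames, den_frames))
    bottom = sum(abs(d) ** 2 for d in den_frames)
    return top / bottom

def denoise_bin(m1, m2, h1, h2):
    """Equation 3 for one frequency bin: S = (M1 - M2*H1) / (1 - H2*H1)."""
    return (m1 - m2 * h1) / (1 - h2 * h1)

random.seed(0)

def rand_c():
    """Random complex spectrum value (illustrative data only)."""
    return complex(random.uniform(-1, 1), random.uniform(-1, 1))

H1_TRUE = 0.4 * cmath.exp(0.3j)    # noise source -> MIC 1 (hypothetical value)
H2_TRUE = 0.2 * cmath.exp(-0.5j)   # signal source -> MIC 2 (hypothetical value)

# VAD = 0: only noise is present, so M1n = N*H1 and M2n = N (Equation 2).
noise = [rand_c() for _ in range(20)]
h1 = estimate_ratio([n * H1_TRUE for n in noise], noise)

# VAD = 1 with negligible noise: M1s = S and M2s = S*H2.
speech = [rand_c() for _ in range(20)]
h2 = estimate_ratio([s * H2_TRUE for s in speech], speech)

# Mixed frame: signal and noise both reach both microphones (Equation 1).
s, n = rand_c(), 5 * rand_c()
m1 = s + n * H1_TRUE
m2 = n + s * H2_TRUE
s_hat = denoise_bin(m1, m2, h1, h2)
print(abs(s_hat - s))              # residual error, limited by floating point
```

In this idealized setting the noise is five times larger than the signal, yet the signal is recovered to within floating-point error, because the result depends only on the accuracy of the two transfer function estimates.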
`
The noise removal algorithm described herein is easily generalized to include any
number of noise sources. Figure 3 is a block diagram including front-end components
300 of a noise removal algorithm of an embodiment, generalized to n distinct noise
sources. These distinct noise sources may be reflections or echoes of one another, but are
not so limited. There are several noise sources shown, each with a transfer function, or
path, to each microphone. The previously named path $H_2$ has been relabeled as $H_0$, so
that labeling noise source 2’s path to MIC 1 is more convenient. The outputs of each
microphone, when transformed to the z domain, are:

$$
\begin{aligned}
M_1(z) &= S(z) + N_1(z)H_1(z) + N_2(z)H_2(z) + \ldots N_n(z)H_n(z) \\
M_2(z) &= S(z)H_0(z) + N_1(z)G_1(z) + N_2(z)G_2(z) + \ldots N_n(z)G_n(z).
\end{aligned}
\qquad \text{Eq. 4}
$$

When there is no signal (VAD = 0), then (suppressing z for clarity)

$$
\begin{aligned}
M_{1n} &= N_1H_1 + N_2H_2 + \ldots N_nH_n \\
M_{2n} &= N_1G_1 + N_2G_2 + \ldots N_nG_n.
\end{aligned}
\qquad \text{Eq. 5}
$$

A new transfer function can now be defined as

$$
\tilde{H}_1 = \frac{M_{1n}}{M_{2n}}
= \frac{N_1H_1 + N_2H_2 + \ldots N_nH_n}{N_1G_1 + N_2G_2 + \ldots N_nG_n},
\qquad \text{Eq. 6}
$$

where $\tilde{H}_1$ is analogous to $H_1(z)$ above. Thus $\tilde{H}_1$ depends only on the noise sources and
their respective transfer functions and can be calculated any time there is no signal being
transmitted. Once again, the “n” subscripts on the microphone inputs denote only that
noise is being detected, while an “s” subscript denotes that only signal is being received
by the microphones.

Examining Equation 4 while assuming an absence of noise produces

$$
\begin{aligned}
M_{1s} &= S \\
M_{2s} &= SH_0.
\end{aligned}
$$

Thus, $H_0$ can be solved for as before, using any available transfer function calculating
algorithm. Mathematically, then,

$$
H_0 = \frac{M_{2s}}{M_{1s}}.
$$

Rewriting Equation 4, using $\tilde{H}_1$ defined in Equation 6, provides,

$$
\tilde{H}_1 = \frac{M_1 - S}{M_2 - SH_0}.
\qquad \text{Eq. 7}
$$

Solving for S yields,

$$
S = \frac{M_1 - M_2\tilde{H}_1}{1 - H_0\tilde{H}_1},
\qquad \text{Eq. 8}
$$

which is the same as Equation 3, with $H_0$ taking the place of $H_2$, and $\tilde{H}_1$ taking the place
of $H_1$. Thus the noise removal algorithm still is mathematically valid for any number of
noise sources, including multiple echoes of noise sources. Again, if $H_0$ and $\tilde{H}_1$ can be
estimated to a high enough accuracy, and the above assumption of only one path from the
signal to the microphones holds, the noise may be removed completely.
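
The multiple-source result can be checked numerically at a single frequency bin. The sketch below uses three noise sources with hypothetical path values and assumes the noise field is the same during the noise-only measurement and the mixed measurement, so that the ratio in Equation 6 still applies when speech is present.

```python
# Hypothetical single-bin values for three noise sources (illustration only).
H = [0.5 + 0.1j, 0.3 - 0.2j, 0.2 + 0.4j]   # noise source i -> MIC 1
G = [1.0 + 0.0j, 0.8 + 0.3j, 0.9 - 0.2j]   # noise source i -> MIC 2
N = [1.0 + 2.0j, -0.5 + 1.0j, 2.0 - 1.0j]  # noise spectra at this bin
H0 = 0.1 + 0.05j                            # signal source -> MIC 2
S = 0.7 - 0.3j                              # signal spectrum at this bin

# Equation 5: microphone inputs when VAD = 0 (noise only).
m1n = sum(n * h for n, h in zip(N, H))
m2n = sum(n * g for n, g in zip(N, G))

# Equation 6: the combined noise transfer function.
h1_tilde = m1n / m2n

# Equation 4: microphone inputs with speech present and the same noise field.
m1 = S + m1n
m2 = S * H0 + m2n

# Equation 8: recover the signal.
s_hat = (m1 - m2 * h1_tilde) / (1 - H0 * h1_tilde)
print(abs(s_hat - S))
```

The individual paths `H` and `G` never need to be known separately; only their combined effect, captured by `h1_tilde`, enters the solution.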
`
The most general case involves multiple noise sources and multiple signal
sources. Figure 4 is a block diagram including front-end components 400 of a noise
removal algorithm of an embodiment in the most general case where there are n distinct
noise sources and signal reflections. Here, signal reflections enter both microphones MIC
1 and MIC 2. This is the most general case, as reflections of the noise source into the
microphones MIC 1 and MIC 2 can be modeled accurately as simple additional noise
sources. For clarity, the direct path from the signal to MIC 2 is changed from $H_0(z)$ to
$H_{00}(z)$, and the reflected paths to MIC 1 and MIC 2 are denoted by $H_{01}(z)$ and $H_{02}(z)$,
respectively.

The input into the microphones now becomes

$$
\begin{aligned}
M_1(z) &= S(z) + S(z)H_{01}(z) + N_1(z)H_1(z) + N_2(z)H_2(z) + \ldots N_n(z)H_n(z) \\
M_2(z) &= S(z)H_{00}(z) + S(z)H_{02}(z) + N_1(z)G_1(z) + N_2(z)G_2(z) + \ldots N_n(z)G_n(z).
\end{aligned}
\qquad \text{Eq. 9}
$$

When the VAD = 0, the inputs become (suppressing z again)

$$
\begin{aligned}
M_{1n} &= N_1H_1 + N_2H_2 + \ldots N_nH_n \\
M_{2n} &= N_1G_1 + N_2G_2 + \ldots N_nG_n,
\end{aligned}
$$

which is the same as Equation 5. Thus, the calculation of $\tilde{H}_1$ in Equation 6 is unchanged,
as expected. In examining the situation where there is no noise, Equation 9 reduces to

$$
\begin{aligned}
M_{1s} &= S + SH_{01} \\
M_{2s} &= SH_{00} + SH_{02}.
\end{aligned}
$$

This leads to the definition of $\tilde{H}_2$ as

$$
\tilde{H}_2 = \frac{M_{2s}}{M_{1s}} = \frac{H_{00} + H_{02}}{1 + H_{01}}.
\qquad \text{Eq. 10}
$$

Rewriting Equation 9 again using the definition for $\tilde{H}_1$ (as in Equation 7)
provides

$$
\tilde{H}_1 = \frac{M_1 - S(1 + H_{01})}{M_2 - S(H_{00} + H_{02})}.
\qquad \text{Eq. 11}
$$

Some algebraic manipulation yields

$$
\begin{aligned}
S\left(1 + H_{01} - \tilde{H}_1(H_{00} + H_{02})\right) &= M_1 - M_2\tilde{H}_1 \\
S(1 + H_{01})\left[1 - \tilde{H}_1\frac{H_{00} + H_{02}}{1 + H_{01}}\right] &= M_1 - M_2\tilde{H}_1 \\
S(1 + H_{01})\left[1 - \tilde{H}_1\tilde{H}_2\right] &= M_1 - M_2\tilde{H}_1,
\end{aligned}
$$

and finally

$$
S(1 + H_{01}) = \frac{M_1 - M_2\tilde{H}_1}{1 - \tilde{H}_1\tilde{H}_2}.
\qquad \text{Eq. 12}
$$

Equation 12 is the same as Equation 8, with the replacement of $H_0$ by $\tilde{H}_2$, and the
addition of the $(1 + H_{01})$ factor on the left side. This extra factor $(1 + H_{01})$ means that S
cannot be solved for directly in this situation, but a solution can be generated for the
signal plus the addition of all of its echoes. This is not such a bad situation, as there are
many conventional methods for dealing with echo suppression, and even if the echoes are
not suppressed, it is unlikely that they will affect the comprehensibility of the speech to
any meaningful extent. The more complex calculation of $\tilde{H}_2$ is needed to account for the
signal echoes in MIC 2, which act as noise sources.
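
A short numerical check of Equations 9 through 12 at a single frequency bin is sketched below. All values are hypothetical, and the noise-only spectra `M1N` and `M2N` stand in for the sums in Equation 5. The check confirms that the right side of Equation 12 returns the signal plus its echo into MIC 1, not the bare signal.

```python
# Hypothetical single-bin values (illustration only).
S = 0.7 - 0.3j            # signal spectrum
H00 = 0.10 + 0.05j        # direct path, signal -> MIC 2
H01 = 0.20 - 0.10j        # reflected path, signal -> MIC 1
H02 = 0.05 + 0.02j        # reflected path, signal -> MIC 2
M1N = 1.15 + 2.10j        # MIC 1 input with noise only (Equation 5)
M2N = 1.90 + 1.35j        # MIC 2 input with noise only (Equation 5)

h1_tilde = M1N / M2N                  # Equation 6
h2_tilde = (H00 + H02) / (1 + H01)    # Equation 10

# Equation 9 with signal, signal reflections, and noise all present.
m1 = S + S * H01 + M1N
m2 = S * H00 + S * H02 + M2N

# Right side of Equation 12.
recovered = (m1 - m2 * h1_tilde) / (1 - h1_tilde * h2_tilde)

print(abs(recovered - S * (1 + H01)))   # matches the signal plus its echo
print(abs(recovered - S))               # differs from the bare signal by |S*H01|
```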
`
Figure 5 is a flow diagram 500 of a denoising algorithm, under an embodiment.
In operation, the acoustic signals are received, at block 502. Further, physiological
information associated with human voicing activity is received, at block 504. A first
transfer function representative of the acoustic signal is calculated upon determining that
voicing information is absent from the acoustic signal for at least one specified period of
time, at block 506. A second transfer function representative of the acoustic signal is
calculated upon determining that voicing information is present in the acoustic signal for
at least one specified period of time, at block 508. Noise is removed from the acoustic
signal using at least one combination of the first transfer function and the second transfer
function, producing denoised acoustic data streams, at block 510.
An algorithm for noise removal, or denoising algorithm, is described herein, from
the simplest case of a single noise source with a direct path to multiple noise sources with
reflections and echoes. The algorithm has been shown herein to be viable under any
environmental conditions. The type and amount of noise are inconsequential if a good
estimate has been made of $\tilde{H}_1$ and $\tilde{H}_2$, and if one does not change substantially while the
other is calculated. If the user environment is such that echoes are present, they can be
compensated for if coming from a noise source. If signal echoes are also present, they
will affect the cleaned signal, but the effect should be negligible in most environments.

In operation, the algorithm of an embodiment has shown excellent results in
dealing with a variety of noise types, amplitudes, and orientations. However, there are
always approximations and adjustments that have to be made when moving from
mathematical concepts to engineering applications. One assumption is made in Equation
3, where $H_2(z)$ is assumed small and therefore $H_2(z)H_1(z) \approx 0$, so that Equation 3 reduces
to

$$
S(z) \approx M_1(z) - M_2(z)H_1(z).
$$

This means that only $H_1(z)$ has to be calculated, speeding up the process and reducing the
number of computations required considerably. With the proper selection of
microphones, this approximation is easily realized.
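
The size of the error introduced by this approximation follows directly from Equation 1. The short derivation below is a worked check; the numeric value in the final comment is a hypothetical illustration, not a figure from the description.

```latex
\begin{aligned}
M_1(z) - M_2(z)H_1(z)
  &= \bigl[S(z) + N(z)H_1(z)\bigr] - \bigl[N(z) + S(z)H_2(z)\bigr]H_1(z) \\
  &= S(z)\bigl[1 - H_2(z)H_1(z)\bigr].
\end{aligned}
% The noise term cancels exactly; only the signal is scaled by 1 - H_2 H_1.
% Hypothetical example: |H_2(z)H_1(z)| = 0.05 changes |S(z)| by at most 5%.
```

The approximation therefore leaves the noise cancellation intact and only reshapes the recovered signal slightly, which is why a small $H_2(z)$ (little speech reaching MIC 2) makes it acceptable.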
Another approximation involves the filter used in an embodiment. The actual
$H_1(z)$ will undoubtedly have both poles and zeros, but for stability and simplicity an
all-zero Finite Impulse Response (FIR) filter is used. With enough taps the approximation to
the actual $H_1(z)$ can be very good.

To further increase the performance of the noise suppression system, the spectrum
of interest (generally about 125 to 3700 Hz) is divided into subbands. The wider the
range of frequencies over which a transfer function must be calculated, the more difficult
it is to calculate it accurately. Therefore the acoustic data was divided into 16 subbands,
and the denoising algorithm was then applied to each subband in turn. Finally, the 16
denoised data streams were recombined to yield the denoised acoustic data. This works
very well, but any combinations of subbands (i.e., 4, 6, 8, 32, equally spaced,
perceptually spaced, etc.) can be used and all have been found to work better than a single
subband.
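
For the equally spaced case, the subband edges follow from simple arithmetic. The sketch below assumes the nominal 125 to 3700 Hz range quoted above and 16 bands of equal width; the function name is hypothetical, and the description does not say how the band-splitting filters themselves were built.

```python
def equal_subbands(f_low, f_high, count):
    """Return (low_edge, high_edge) pairs, in Hz, for equally wide subbands."""
    width = (f_high - f_low) / count
    return [(f_low + i * width, f_low + (i + 1) * width) for i in range(count)]

bands = equal_subbands(125.0, 3700.0, 16)
for low, high in bands:
    print(f"{low:9.4f} Hz to {high:9.4f} Hz")
```

Each band here is 223.4375 Hz wide. In a full system, a separate transfer function would be estimated within each band and the 16 denoised streams summed to give the output.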
`
The amplitude of the noise was constrained in an embodiment so that the
microphones used did not saturate (that is, operate outside a linear response region). It is
important that the microphones operate linearly to ensure the best performance. Even
with this restriction, very low signal-to-noise ratio (SNR) signals can be denoised (down
to -10 dB or less).

The calculation of $H_1(z)$ is accomplished every 10 milliseconds using the
Least-Mean Squares (LMS) method, a common adaptive transfer function. An explanation may
be found in “Adaptive Signal Processing” (1985), by Widrow and Stearns, published by
Prentice-Hall, ISBN 0-13-004029-0. The LMS was used for demonstration purposes, but
many other system identification techniques can be used to identify $H_1(z)$ and $H_2(z)$ in
Figure 2.
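
A minimal time-domain sketch of VAD-gated LMS adaptation is given below. It is not the implementation described above: it updates the FIR weights on every noise-only sample rather than in 10 millisecond blocks, it takes $H_2(z)$ as zero (no speech reaches MIC 2), and the tap count, step size `mu`, test signals, and the path `H1_TRUE` are all hypothetical choices for the example.

```python
import math
import random

def lms_noise_canceller(mic1, mic2, vad, taps=8, mu=0.02):
    """Subtract an FIR estimate of the MIC 2 noise from MIC 1.

    The FIR weights approximate H1(z) and are adapted only while vad == 0.
    """
    weights = [0.0] * taps
    history = [0.0] * taps            # recent MIC 2 samples, newest first
    cleaned = []
    for x1, x2, speaking in zip(mic1, mic2, vad):
        history = [x2] + history[:-1]
        noise_estimate = sum(w * h for w, h in zip(weights, history))
        error = x1 - noise_estimate   # this is also the denoised output sample
        if speaking == 0:             # train on noise-only samples
            weights = [w + 2 * mu * error * h for w, h in zip(weights, history)]
        cleaned.append(error)
    return cleaned, weights

random.seed(0)
H1_TRUE = [0.5, 0.3, -0.1]            # hypothetical noise path to MIC 1
TOTAL, SPEECH_START = 5000, 4000
noise = [random.uniform(-1.0, 1.0) for _ in range(TOTAL)]
speech = [0.0] * SPEECH_START + [
    0.5 * math.sin(2 * math.pi * 200 * k / 8000) for k in range(TOTAL - SPEECH_START)
]
vad = [0] * SPEECH_START + [1] * (TOTAL - SPEECH_START)

mic2 = noise                          # H2 taken as zero: no speech leaks into MIC 2
mic1 = [
    speech[k] + sum(h * noise[k - j] for j, h in enumerate(H1_TRUE) if k - j >= 0)
    for k in range(TOTAL)
]

cleaned, weights = lms_noise_canceller(mic1, mic2, vad)
worst = max(abs(c - s) for c, s in zip(cleaned[SPEECH_START:], speech[SPEECH_START:]))
print([round(w, 3) for w in weights[:3]], worst)
```

Freezing the weights while the VAD reads one is what keeps the filter from training on the speech itself, which is the failure mode the description warns about when speech goes undetected.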
`
The VAD for an embodiment is derived from a radio frequency sensor and the
two microphones, yielding very high accuracy (>99%) for both voiced and unvoiced
speech. The VAD of an embodiment uses a radio frequency (RF) vibration detector
interferometer to detect tissue motion associated with human speech production, but is
not so limited. The signal from the RF device is completely acoustic-noise free, and is
able to function in any acoustic noise environment. A simple energy measurement of the
RF signal can be used to determine if voiced speech is occurring. Unvoiced speech can
be determined using conventional acoustic-based methods, by proximity to voiced
sections determined using the RF sensor or similar voicing sensors, or through a
combination of the above. Since there is much less energy in unvoiced speech, its
detection accuracy is not as critical to good noise suppression performance as is voiced
speech.
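
The “simple energy measurement” mentioned above can be sketched as a frame-wise threshold that emits one VAD bit per sample. The frame length, threshold, and synthetic sensor data below are hypothetical; the description does not give the values used with the RF sensor.

```python
def energy_vad(sensor, frame_len, threshold):
    """Return one VAD bit per sample: 1 where the frame's mean energy exceeds threshold."""
    vad = []
    for start in range(0, len(sensor), frame_len):
        frame = sensor[start:start + frame_len]
        energy = sum(x * x for x in frame) / len(frame)
        vad.extend([1 if energy > threshold else 0] * len(frame))
    return vad

# Synthetic sensor output: silence, strong tissue vibration, then a weak residual.
sensor = [0.0] * 80 + [0.5, -0.5] * 40 + [0.01] * 80
vad = energy_vad(sensor, frame_len=80, threshold=0.01)
print(vad[0], vad[80], vad[160])
```

Because the sensor responds to tissue motion rather than sound in the air, a fixed threshold of this kind is not disturbed by loud acoustic noise, which is the property the description relies on.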
`
With voiced and unvoiced speech detected reliably, the algorithm of an
embodiment can be implemented. Once again, it is useful to repeat that the noise
removal algorithm does not depend on how the VAD is obtained, only that it is accurate,
especially for voiced speech. If speech is not detected and training occurs on the speech,
the subsequent denoised acoustic data can be distorted.

Data was collected in four channels, one for MIC 1, one for MIC 2, and two for
the radio frequency sensor that detected the tissue motions associated with voiced speech.
The data were sampled simultaneously at 40 kHz, then digitally filtered and decimated
down to 8 kHz. The high sampling rate was used to reduce any aliasing that might result
from the analog to digital process. A four-channel National Instruments A/D board was
used along with Labview to capture and store the data. The data was then read into a C
program and denoised 10 milliseconds at a time.
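
The sample-rate arithmetic in this paragraph can be sketched as follows: 40 kHz to 8 kHz is decimation by a factor of 5, and 10 milliseconds at 8 kHz is 80 samples per frame. The low-pass filter below (a 101-tap Hamming-windowed sinc with a 3700 Hz cutoff) and the 1 kHz test tone are assumptions for illustration; the description does not specify the digital filter that was used.

```python
import math

def lowpass_fir(num_taps, cutoff):
    """Hamming-windowed sinc low-pass; cutoff is a fraction of the sample rate."""
    mid = (num_taps - 1) / 2
    taps = []
    for i in range(num_taps):
        x = i - mid
        ideal = 2 * cutoff if x == 0 else math.sin(2 * math.pi * cutoff * x) / (math.pi * x)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * i / (num_taps - 1))
        taps.append(ideal * window)
    gain = sum(taps)
    return [t / gain for t in taps]    # normalize to unity gain at DC

def decimate(samples, factor, taps):
    """Low-pass filter, then keep every factor-th output sample."""
    out = []
    for k in range(0, len(samples), factor):
        acc = 0.0
        for j, t in enumerate(taps):
            if k - j >= 0:
                acc += t * samples[k - j]
        out.append(acc)
    return out

FS_IN, FS_OUT = 40000, 8000
factor = FS_IN // FS_OUT                    # decimation factor
taps = lowpass_fir(101, 3700 / FS_IN)       # hypothetical anti-alias filter
one_second = [math.sin(2 * math.pi * 1000 * n / FS_IN) for n in range(FS_IN)]
decimated = decimate(one_second, factor, taps)

frame_len = FS_OUT * 10 // 1000             # samples in a 10 ms frame
frames = [decimated[i:i + frame_len] for i in range(0, len(decimated), frame_len)]
print(factor, len(decimated), frame_len, len(frames))
```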
Figure 6 shows a denoised audio 602 signal output upon application of the noise
`suppression algorithm of an em