US 20020193130A1

(12) Patent Application Publication    (10) Pub. No.: US 2002/0193130 A1
     Yang et al.                        (43) Pub. Date: Dec. 19, 2002

(54) NOISE SUPPRESSION FOR A WIRELESS COMMUNICATION DEVICE

(75) Inventors: Feng Yang, Plano, TX (US); Yen-Son Paul Huang, Saratoga, CA (US)

    Correspondence Address:
    Truong Dinh
    Dinh & Associates
    2506 Ash Street
    Palo Alto, CA 94306 (US)

(73) Assignee: ForteMedia, Inc., Campbell, CA

(21) Appl. No.: 10/076,201

(22) Filed: Feb. 12, 2002

Related U.S. Application Data

(60) Provisional application No. 60/268,403, filed on Feb. 12, 2001.

Publication Classification

(51) Int. Cl.7 .......................... H04Q 7/00
(52) U.S. Cl. ............ 455/501; 370/331; 455/67.3; 455/90
`
(57) ABSTRACT

Techniques to suppress noise from a signal comprised of speech plus noise. In accordance with aspects of the invention, two or more signal detectors (e.g., microphones) are used to detect respective signals having speech and noise components, with the magnitude of each component being dependent on various factors such as the distance between the speech source and the microphone. Signal processing is then used to process the detected signals to generate the desired output signal having predominantly speech with a large portion of the noise removed. The techniques described herein may be advantageously used for both near-field and far-field applications, and may be implemented in various mobile communication devices such as cellular phones.
`
[Representative drawing: wireless communication device 100 with a display and microphones 110a through 110d]
`
[Sheet 1 of 7 — FIGS. 1A through 1C: wireless communication devices with displays and microphones 110]
`
`
`
[Sheet 2 of 7 — FIG. 2: speech processing system with main beam former, blocking beam former, beam forming controller, voice activity detector, noise spectrum estimator, gain calculator, and signal output]
`
`
`
`
`
`
`
`
`
[Sheet 3 of 7 — FIGS. 3A and 3B: beam forming units with adaptive filters fed by MIC A and MIC B, producing the signals s(t) and x(t)]
`
`
`
[Sheet 4 of 7 — FIG. 4: an embodiment of the noise suppression unit operating on the speech plus noise signal s(t) and the mostly noise signal x(t), with adaptive filter and voice activity detector]
`
[Sheet 5 of 7 — FIG. 5: another embodiment of the noise suppression unit, with adaptive filter, noise spectrum estimator, and voice activity detector]
`
`
`
`
[Sheet 6 of 7 — FIG. 6: a further embodiment of the noise suppression unit, with adaptive filter, noise spectrum estimator, gain calculator, and voice activity detector, producing output signal y(t) from the speech plus noise signal s(t) and the noise signal x(t)]
`
`
`
`
`
`
`
[Sheet 7 of 7 — FIG. 7A: signal processor 720 receives a speech plus noise signal s(t) from microphone 710a and a mostly noise signal x(t) and provides the output signal y(t); FIG. 7B: microphones 710 oriented relative to the direction of speech]
`
`
`
NOISE SUPPRESSION FOR A WIRELESS COMMUNICATION DEVICE
`
BACKGROUND

[0001] The present invention relates generally to communication apparatus. More particularly, it relates to techniques for suppressing noise in a speech signal, which may be used in a wireless or mobile communication device such as a cellular phone.

[0002] In many applications, a speech signal is received in the presence of noise, processed, and transmitted to a far-end party. One example of such a noisy environment is a wireless application. For many conventional cellular phones, a microphone is placed near a speaking user's mouth and used to pick up the speech signal. The microphone typically also picks up background noise, which degrades the quality of the speech signal transmitted to the far-end party.

[0003] Newer-generation wireless communication devices are designed with additional capabilities. Besides supporting voice communication, a user may be able to view text or browse a World Wide Web page via a display on the wireless device. New videophone service requires the user to place the phone farther away, which in turn requires "far-field" speech pick-up. Moreover, "hands-free" communication is safer and more convenient, especially in an automobile. In any case, the microphone in the wireless device may be used in a "far-field" mode whereby it may be placed relatively far away from the speaking user (instead of being pressed against the user's ear and mouth). For far-field communication, less signal and more noise are received by the microphone, and a lower signal-to-noise ratio (SNR) is achieved, which typically leads to poor signal quality.
`
[0004] One common technique for suppressing noise is the spectral subtraction technique. In a typical implementation of this technique, speech plus noise is received via a single microphone and transformed into a number of frequency bins via a fast Fourier transform (FFT). Under the assumption that the background noise is long-time stationary (in comparison with the speech), a model of the background noise is estimated during time periods of non-speech activity, whereby the measured spectral energy of the received signal is attributed to noise. The background noise estimate for each frequency bin is used to estimate an SNR of the speech in that bin. Each frequency bin is then attenuated according to its noise energy content, with a respective gain factor computed based on that bin's SNR.

[0005] The spectral subtraction technique is generally effective at suppressing stationary noise components. However, due to the time-variant nature of the noisy environment (e.g., street, airport, restaurant, and so on), the models estimated in the conventional manner using a single microphone are likely to differ from actuality. This may result in an output speech signal having low audible quality, insufficient noise reduction, and/or injected artifacts.
`
[0006] Another technique for suppressing noise is a microphone array. For this technique, multiple microphones are typically arranged in a linear or some other type of array. An adaptive or non-adaptive method is then used to process the signals received from the microphones to suppress noise and improve speech SNR. However, the microphone array has not been applied to mobile communication devices, since it generally requires a certain size that cannot fit into the small form factor of current mobile devices.

[0007] Conventional wireless communication devices such as cellular phones typically utilize a single microphone to pick up the speech signal. The single-microphone design limits the type of signal processing that may be performed on the received signal, and may further limit the amount of improvement (i.e., the amount of noise suppression) that may be achievable. The single-microphone design is also ineffective at suppressing noise in far-field applications where the microphone is placed at a distance (e.g., a few feet) away from the speech source.

[0008] As can be seen, techniques that can be used to suppress noise in a speech signal in a wireless environment are highly desirable.
`
SUMMARY

[0009] The invention provides techniques to suppress noise from a signal comprised of speech plus noise. In accordance with aspects of the invention, two or more signal detectors (e.g., microphones) are used to detect respective signals. Each detected signal comprises a desired speech component and an undesired noise component, with the magnitude of each component being dependent on various factors such as the distance between the speech source and the microphone, the directivity of the microphone, the noise sources, and so on. Signal processing is then used to process the detected signals to generate the desired output signal having predominantly speech, with a large portion of the noise removed. The techniques described herein may be advantageously used for both near-field and far-field applications, and may be implemented in various wireless and mobile devices such as cellular phones.

[0010] An embodiment of the invention provides a mobile communication device that includes a number of signal detectors (e.g., two microphones), optional first and second beam forming units, and a noise suppression unit. The beam forming units and noise suppression unit may be implemented within a digital signal processor (DSP). Each signal detector provides a respective detected signal having a desired component plus an undesired component. The first beam forming unit receives and processes the detected signals to provide a first signal s(t) having the desired component plus a portion of the undesired component. The second beam forming unit receives and processes the detected signals to provide a second signal x(t) having a large portion of the undesired component. The noise suppression unit then receives and digitally processes the first and second signals to provide an output signal y(t) having substantially the desired component and a large portion of the undesired component removed. The noise suppression unit may be designed to digitally process the first and second signals in the frequency domain, although signal processing in the time domain is also possible. The noise suppression unit may be designed to perform the noise cancellation using a spectrum modification technique, which provides improved performance over other noise cancellation techniques.
`
[0011] In one specific design, the noise suppression unit includes a noise spectrum estimator, a gain calculation unit, a speech or voice activity detector, and a multiplier. The noise spectrum estimator derives an estimate of the spectrum of the noise based on a transformed representation of the second signal. The gain calculation unit provides a set of gain coefficients for the multiplier based on a transformed representation of the first signal and the noise spectrum estimate. The multiplier receives and scales the magnitude of the transformed first signal with the set of gain coefficients to provide a scaled transformed signal, which is then inverse transformed to provide the output signal. The activity detector provides a control signal indicative of active and non-active time periods, with the active time periods indicating that the first signal includes predominantly the desired component. The first beam forming unit may be allowed to adapt during the active time periods, and the second beam forming unit may be allowed to adapt during the non-active time periods.
`
[0012] Another aspect of the invention provides a wireless communication device, e.g., a mobile phone, having at least two microphones and a signal processor. Each microphone detects and provides a respective detected signal comprised of a desired component and an undesired component. For each detected signal, the specific amount of each (desired and undesired) component included in the detected signal may be dependent on various factors, such as the distance to the speaking source and the directivity of the microphone. The signal processor receives and digitally processes the detected signals to provide an output signal having substantially the desired component and a large portion of the undesired component removed. The signal processing may be performed in a manner that is dependent in part on the characteristics of the detected signals.

[0013] Various other aspects, embodiments, and features of the invention are also provided, as described in further detail below.

[0014] The foregoing, together with other aspects of this invention, will become more apparent when referring to the following specification, claims, and accompanying drawings.
`
BRIEF DESCRIPTION OF THE DRAWINGS

[0015] FIGS. 1A through 1C are diagrams of three wireless communication devices capable of implementing various aspects of the invention;

[0016] FIG. 2 is a block diagram of a speech processing system suitable for removing background noise from a speech plus noise signal, which may be used for both near-field and far-field applications;

[0017] FIGS. 3A and 3B are block diagrams of an embodiment of a main beam forming unit and a blocking beam forming unit, respectively;

[0018] FIGS. 4, 5, and 6 are block diagrams of three different embodiments of the noise suppression unit; and

[0019] FIGS. 7A and 7B are diagrams of another speech processing system suitable for removing background noise from a speech plus noise signal.
`
DESCRIPTION OF THE SPECIFIC EMBODIMENTS

[0020] FIG. 1A is a diagram of an embodiment of a wireless communication device 100a capable of implementing various aspects of the invention. In this embodiment, device 100a is a cellular phone having a pair of microphones 110a and 110b. Microphone 110a is located in the lower left corner of the device, and microphone 110b is located in the lower right corner of the device. The microphones may also be located in other parts of the device, and this is within the scope of the invention. The placement of the microphones may be constrained by various factors such as the small size of the cellular phone, manufacturability, and so on.

[0021] FIG. 1B is a diagram of an embodiment of a wireless communication device 100b having three microphones 110. In this embodiment, microphone 110a is located in the lower center of the device near a speaking user's mouth and may be used to pick up desired speech plus undesired background noise. Microphone 110b is located in the middle left side of the device, and microphone 110c is located in the middle right side of the device. Additional microphones may also be used, and the microphones may also be placed in other parts of the device, and this is within the scope of the invention. The microphones do not need to be placed in an array. For improved performance, the microphones may be located as far away from each other as practically possible.
`
[0022] FIG. 1C is a diagram of an embodiment of a wireless communication device 100c having a number of microphones 110. In this embodiment, device 100c includes a larger sized display, which may be used for displaying text, graphics, videos, and so on. Device 100c may be a handset for the new 3rd generation (3GPP) wireless communication systems under development and deployment. Device 100c may also be a personal digital assistant (PDA) with voice recognition or phone functions. Device 100c may also be a video phone with or without web-browser capability. In general, device 100c may be any device capable of supporting voice communication, possibly along with other functions (e.g., text, video, and so on). In the specific embodiment shown in FIG. 1C, microphones 110a through 110d are located in a line above the display area. The microphones may also be placed in other locations of the device.

[0023] Each of devices 100a, 100b, and 100c advantageously employs two or more microphones to allow the device to be used for both "near-field" and "far-field" applications. For near-field application, one microphone (e.g., microphone 110a in FIG. 1B) or multiple microphones (e.g., microphones 110a and 110b in FIG. 1A) may be used to pick up the speech signal from a close-by source. For far-field application, the microphones are designed to pick up the speech signal from a source located farther away. Noise suppression is used to remove noise and improve signal quality.

[0024] Devices 100a and 100b are similar to conventional cellular phones and may be used with the devices placed close to the speaking user. With the noise suppression techniques described herein, devices 100a and 100b may also be used in a hands-free mode whereby they are located farther away from the speaking user. Device 100c is a handset that may be designed to be placed away from the user (e.g., one to two feet away) during use, which allows the user to better view the display while talking.
`
[0025] FIG. 2 is a block diagram of a speech processing system 200 capable of removing background noise from a speech plus noise signal and utilizing a number of signal detectors. In an embodiment, microphones are used as the signal detectors. System 200 may be used for both near-field and far-field applications, and may be implemented in each of devices 100a through 100c in FIGS. 1A through 1C, respectively.

[0026] System 200 includes two or more microphones 210a through 210n, a beam forming unit 212, and a noise suppression unit 230a. Beam forming unit 212 may be optional for some devices (e.g., for devices that use directional microphones), as described below. Beam forming unit 212 and noise suppression unit 230a may be implemented within one or more digital signal processors (DSPs) or some other integrated circuit.

[0027] Each microphone provides a respective analog signal that is typically conditioned (e.g., filtered and amplified) and then digitized prior to being subjected to the signal processing by beam forming unit 212 and noise suppression unit 230a. For simplicity, this conditioning and digitization circuitry is not shown in FIG. 2.
`
[0028] The microphones may be located either close to, or at a relatively far distance away from, the speaking user during use. Each microphone 210 detects a respective signal having a speech component plus a noise component, with the magnitude of the received components being dependent on various factors, such as (1) the distance between the microphone and the speech source, (2) the directivity of the microphone (e.g., whether the microphone is directional or omni-directional), and so on. The detected signals from microphones 210a through 210n are provided to each of two beam forming units 214a and 214b within unit 212.

[0029] Main beam forming unit 214a, which is also referred to as the "main beam former", processes the signals from microphones 210a through 210n to provide a signal s(t) comprised of speech plus noise. Main beam forming unit 214a may further be able to suppress a portion of the received noise component. Main beam forming unit 214a may be designed to implement any type of beam former that attempts to reject as much interference and noise as possible. A specific design for main beam forming unit 214a is shown in FIG. 3A below. Main beam forming unit 214a may also be an optional unit that may be omitted for some devices (e.g., if the signal s(t) can be obtained from one microphone). Main beam forming unit 214a provides the signal s(t) to noise suppression unit 230a.

[0030] Blocking beam forming unit 214b, which is also referred to as a "blocking beam former", processes the signals from microphones 210a through 210n to provide a signal x(t) comprised of mostly the noise component. Blocking beam forming unit 214b is used to provide an accurate estimate of the noise, and to block as much of the desired speech signal as possible. This then allows for effective cancellation of the noise in the signal s(t). Blocking beam forming unit 214b may also be designed to implement any one of a number of beam formers, one of which is shown in FIG. 3B below. Blocking beam forming unit 214b provides the signal x(t) to noise suppression unit 230a. By employing blocking beam forming unit 214b to generate the mostly noise signal x(t), system 200 may utilize various types of microphones (e.g., omni-directional microphones, dipole microphones, and so on), which may pick up any combination of signal and noise.
[0031] A beam forming controller 218 directs the operation of main and blocking beam forming units 214a and 214b. Controller 218 typically receives a control signal from a voice activity detector (VAD) 240. Voice activity detector 240 detects the presence of speech at the microphones and provides the Act control signal indicating periods of speech activity. The detection of speech activity can be performed in various manners known in the art, one of which is described by D. K. Freeman et al. in a paper entitled "The Voice Activity Detector for the Pan-European Digital Cellular Mobile Telephone Service," 1989 IEEE International Conference on Acoustics, Speech and Signal Processing, Glasgow, Scotland, Mar. 23-26, 1989, pages 369-372, which is incorporated herein by reference.

[0032] Beam forming controller 218 provides the necessary controls that direct main and blocking beam forming units 214a and 214b to adapt at the appropriate times. In particular, controller 218 provides an Adapt_M control signal to main beam forming unit 214a to enable it to adapt during periods of speech activity, and an Adapt_B control signal to blocking beam forming unit 214b to enable it to adapt during periods of non-speech activity. In one simple implementation, the Adapt_B control signal is generated by inverting the Adapt_M control signal.
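The control flow just described can be illustrated with a minimal sketch. The energy-threshold voice activity decision below is only a simplified stand-in for the Freeman et al. detector cited above, and the function and parameter names are illustrative assumptions rather than part of the disclosure.

```python
import numpy as np

def frame_vad(frame, noise_floor_db, margin_db=6.0):
    """Crude energy-based voice activity decision (Act signal): flag speech
    when the frame energy exceeds a tracked noise floor by a margin."""
    energy_db = 10.0 * np.log10(np.mean(frame ** 2) + 1e-12)
    return energy_db > noise_floor_db + margin_db

def adapt_controls(act):
    """Derive Adapt_M and Adapt_B from Act: the main beam former adapts
    during speech activity, and Adapt_B is simply the inverted Adapt_M."""
    adapt_m = bool(act)
    return adapt_m, not adapt_m
```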
`
[0033] FIG. 3A is a block diagram of an embodiment of main beam forming unit 214a. The signal from microphone 210a is provided to a delay element 312, and the signals from microphones 210b through 210n are respectively provided to adaptive filters 314b through 314n. Delay element 312 delays the signal from microphone 210a such that the delayed signal is approximately time-aligned with the outputs from adaptive filters 314b through 314n. The amount of delay to be provided by delay element 312 is thus dependent on the design of adaptive filters 314. One particular delay length may be half of the tap number of the adaptive filters, if a finite impulse response (FIR) adaptive filter is used for each adaptive filter.

[0034] Each adaptive filter 314 filters the received signal such that the error signal e(t) used to update the adaptive filter is minimized during the adaptation period. Adaptive filters 314 may be designed to implement any one of a number of adaptation algorithms known in the art. Some such algorithms include a least mean square (LMS) algorithm, a normalized least mean square (NLMS) algorithm, a recursive least squares (RLS) algorithm, and a direct matrix inversion (DMI) algorithm. Each of the LMS, NLMS, RLS, and DMI algorithms (directly or indirectly) attempts to minimize the mean square error (MSE) of the error signal e(t) used to update the adaptive filter. In an embodiment, the adaptation algorithm implemented by adaptive filters 314b through 314n is the NLMS algorithm.

[0035] The NLMS algorithm is described in detail by B. Widrow and S. D. Stearns in a book entitled "Adaptive Signal Processing," Prentice-Hall Inc., Englewood Cliffs, N.J., 1986. The LMS, NLMS, RLS, DMI, and other adaptation algorithms are also described in detail by Simon Haykin in a book entitled "Adaptive Filter Theory," 3rd edition, Prentice Hall, 1996. The pertinent sections of these books are incorporated herein by reference.
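As a concrete illustration of the adaptation described above, the following is a minimal NumPy sketch of a single NLMS tap update; the step size mu, the regularizer eps, and the variable names are illustrative assumptions rather than values taken from the disclosure.

```python
import numpy as np

def nlms_update(w, x_buf, d, mu=0.5, eps=1e-8):
    """One NLMS iteration: filter the input history x_buf (most recent sample
    first) with taps w, form the error e(t) against the reference sample d,
    and update the taps with a power-normalized step."""
    y = np.dot(w, x_buf)                                  # adaptive filter output
    e = d - y                                             # error signal e(t)
    w = w + (mu / (eps + np.dot(x_buf, x_buf))) * e * x_buf
    return w, y, e
```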
`
[0036] As shown in FIG. 3A, the filtered signal from each adaptive filter 314 is subtracted from the delayed signal from delay element 312 by a respective summer 316 to provide the error signal e(t) for that adaptive filter. This error signal is then provided back to the adaptive filter and used to update the response of that adaptive filter. As also shown in FIG. 3A, adaptive filters 314b through 314n are updated when the Adapt_M control signal is enabled, and are maintained when the Adapt_M control signal is disabled.

[0037] To generate the signal s(t), a summer 318 receives and combines the delayed signal from microphone 210a with the filtered signals from adaptive filters 314b through 314n. The resultant output may further be divided by a factor of N_mic (where N_mic denotes the number of microphones) to provide the signal s(t).
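A per-sample sketch of the FIG. 3A structure is given below, assuming FIR adaptive filters, a reference delay of half the tap count, and an inline NLMS update; the buffer handling, step size, and function name are illustrative assumptions.

```python
import numpy as np

def main_beam_former_step(mics, filters, buffers, delay_line, adapt_m,
                          taps=32, mu=0.5, eps=1e-8):
    """One sample of the main beam former: delay mic 210a, adaptively filter
    the other microphones toward that reference, sum, and divide by N_mic.
    mics holds the current samples from microphones 210a..210n."""
    delay_line[:] = np.roll(delay_line, 1)
    delay_line[0] = mics[0]
    d = delay_line[taps // 2]                   # delayed reference from mic 210a

    total = d
    for i, (w, buf) in enumerate(zip(filters, buffers)):
        buf[:] = np.roll(buf, 1)
        buf[0] = mics[i + 1]
        y = np.dot(w, buf)                      # adaptive filter output
        if adapt_m:                             # Adapt_M: adapt during speech activity
            e = d - y                           # per-filter error e(t) from summer 316
            w += (mu / (eps + np.dot(buf, buf))) * e * buf
        total += y
    return total / len(mics)                    # s(t), divided by N_mic
```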
`
[0038] FIG. 3A shows a specific design for main beam forming unit 214a. Other designs may also be used and are within the scope of the invention. For example, main beam forming unit 214a may be implemented with a "Griffiths-Jim" beam former that is described by L. J. Griffiths and C. W. Jim in a paper entitled "An Alternative Approach to Robust Adaptive Beam Forming," IEEE Trans. Antennas and Propagation, January 1982, vol. AP-30, no. 1, pp. 27-34, which is incorporated herein by reference.
`
[0039] FIG. 3B is a block diagram of an embodiment of blocking beam forming unit 214b. The signal from microphone 210a is provided to a delay element 322, and the signals from microphones 210b through 210n are respectively provided to adaptive filters 324b through 324n. Delay element 322 provides an amount of delay approximately matching the delay of adaptive filters 324. One particular delay length may be half of the tap number of the adaptive filter, if a FIR filter is used for each adaptive filter.

[0040] Each adaptive filter 324 filters the received signal such that an error signal e(t) is minimized during the adaptation period. Adaptive filters 324 may also be implemented using various designs, such as NLMS adaptive filters. To generate the signal x(t), a summer 328 receives and subtracts the filtered signals from adaptive filters 324b through 324n from the delayed signal from delay element 322. The signal x(t) represents the common error signal for all adaptive filters 324b through 324n within the blocking beam former, and is used to adjust the response of these adaptive filters.
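A corresponding per-sample sketch of the FIG. 3B blocking beam former follows; as above, the NLMS step size, buffer handling, and names are illustrative assumptions.

```python
import numpy as np

def blocking_beam_former_step(mics, filters, buffers, delay_line, adapt_b,
                              taps=32, mu=0.5, eps=1e-8):
    """One sample of the blocking beam former: x(t) is the delayed mic 210a
    signal minus the summed outputs of adaptive filters 324b..324n, and this
    common error adapts every filter during non-speech periods (Adapt_B)."""
    delay_line[:] = np.roll(delay_line, 1)
    delay_line[0] = mics[0]
    d = delay_line[taps // 2]                   # delayed reference from mic 210a

    outputs = []
    for i, (w, buf) in enumerate(zip(filters, buffers)):
        buf[:] = np.roll(buf, 1)
        buf[0] = mics[i + 1]
        outputs.append(np.dot(w, buf))

    x_t = d - sum(outputs)                      # mostly-noise signal x(t)
    if adapt_b:                                 # adapt on non-speech activity only
        for w, buf in zip(filters, buffers):
            w += (mu / (eps + np.dot(buf, buf))) * x_t * buf
    return x_t
```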
`
[0041] Referring back to FIG. 2, noise suppressor 230a performs noise suppression in the frequency domain. Frequency domain processing may provide improved noise suppression and may be preferred over time domain processing because of its superior performance. The mostly noise signal x(t) does not need to be highly correlated with the noise component in the speech plus noise signal s(t); it only needs to be correlated in the power spectrum, which is a much more relaxed criterion.

[0042] Within noise suppressor 230a, the speech plus noise signal s(t) from main beam forming unit 214a is transformed by a transformer 232a to provide a transformed speech plus noise signal S(ω). In an embodiment, the signal s(t) is transformed one block at a time, with each block including L data samples for the signal s(t), to provide a corresponding transformed block. Each transformed block of the signal S(ω) includes L elements, S_n(ω_0) through S_n(ω_{L-1}), corresponding to L frequency bins, where n denotes the time instant associated with the transformed block. Similarly, the mostly noise signal x(t) from blocking beam forming unit 214b is transformed by a transformer 232b to provide a transformed mostly noise signal X(ω). Each transformed block of the signal X(ω) also includes L elements, X_n(ω_0) through X_n(ω_{L-1}). In the specific embodiment shown in FIG. 2, transformers 232a and 232b are each implemented as a fast Fourier transform (FFT) that transforms a time-domain representation into a frequency-domain representation. Other types of transform may also be used, and this is within the scope of the invention. The size of the digitized data block for the signals s(t) and x(t) to be transformed can be selected based on a number of considerations (e.g., computational complexity). In an embodiment, blocks of 128 samples at the typical audio sampling rate are transformed, although other block sizes may also be used. In an embodiment, the samples in each block are multiplied by a Hanning window function, and there is a 64-sample overlap between each pair of consecutive blocks.
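The block-transform framing just described can be sketched as follows; the 128-sample block, Hanning window, and 64-sample overlap come from the text, while the function name and array handling are illustrative.

```python
import numpy as np

def analyze_blocks(signal, block=128, hop=64):
    """Split a time-domain signal into Hanning-windowed 128-sample blocks with
    a 64-sample overlap and FFT each block, yielding the per-block spectra
    S_n(w) (or X_n(w)) used by the noise suppressor."""
    win = np.hanning(block)
    spectra = []
    for start in range(0, len(signal) - block + 1, hop):
        frame = signal[start:start + block] * win
        spectra.append(np.fft.fft(frame))       # L = 128 frequency bins per block
    return np.array(spectra)
```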
`
[0043] The magnitude component of the transformed signal S(ω) is provided to a multiplier 236 and a noise spectrum estimator 242. Multiplier 236 scales the magnitude component of S(ω) with a set of gain coefficients G(ω) provided by a gain calculation unit 244. The scaled magnitude component is then recombined with the phase component of S(ω) and provided to an inverse FFT (IFFT) 238, which transforms the recombined signal back to the time domain. The resultant output signal y(t) includes predominantly speech and has a large portion of the background noise removed.
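The synthesis path (multiplier 236 and IFFT 238) can be sketched as below, assuming the analysis framing above and an overlap-add reconstruction; the reconstruction details are illustrative assumptions.

```python
import numpy as np

def synthesize(spectra_S, gains, block=128, hop=64):
    """Scale the magnitude of each transformed block S_n(w) by the gain
    coefficients G(w), keep the original phase, inverse-transform, and
    overlap-add the blocks to rebuild the output signal y(t)."""
    out = np.zeros((len(spectra_S) - 1) * hop + block)
    for n, (S, G) in enumerate(zip(spectra_S, gains)):
        scaled = G * np.abs(S) * np.exp(1j * np.angle(S))   # scaled magnitude, same phase
        out[n * hop:n * hop + block] += np.real(np.fft.ifft(scaled))
    return out
```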
`
[0044] It is sometimes advantageous, though it may not be necessary, to filter the magnitude components of S(ω) and X(ω) so that a better estimate of the short-term spectrum magnitude of the respective signal can be obtained. One particular filter implementation is a first-order infinite impulse response (IIR) low-pass filter with different attack and release times.
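One possible form of such a smoother is sketched below; the specific attack and release coefficients are illustrative assumptions, not values given in the text.

```python
import numpy as np

def smooth_magnitude(mag, prev, attack=0.6, release=0.95):
    """First-order IIR low-pass smoothing of a magnitude spectrum with separate
    attack and release behavior: track rises quickly, decay slowly on falls."""
    alpha = np.where(mag > prev, attack, release)
    return alpha * prev + (1.0 - alpha) * mag
```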
`
[0045] Noise spectrum estimator 242 receives the magnitude of the transformed signal S(ω), the magnitude of the transformed signal X(ω), and the Act control signal from voice activity detector 240 indicative of periods of non-speech activity. Noise spectrum estimator 242 then derives the magnitude spectrum estimate for the noise N(ω), as follows:

    |N(ω)| = W(ω)·|X(ω)|,    Eq (1)

[0046] where W(ω) is referred to as the channel equalization coefficient. In an embodiment, this coefficient may be derived based on an exponential average of the ratio of the magnitude of S(ω) to the magnitude of X(ω), as follows:

    W_{n+1}(ω) = α·W_n(ω) + (1 − α)·|S_n(ω)| / |X_n(ω)|,    Eq (2)

[0047] where α is the time constant for the exponential averaging and 0 < α < 1. In a specific implementation, α = 1 when voice activity detector 240 indicates a speech activity period and α = 0.98 when voice activity detector 240 indicates a non-speech activity period.
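A direct transcription of Eqs (1) and (2) is sketched below; the eps guard and function names are illustrative assumptions.

```python
import numpy as np

def update_channel_eq(W, S_mag, X_mag, speech_active, eps=1e-12):
    """Eq (2): exponential average of |S_n(w)|/|X_n(w)| with alpha = 1 during
    speech activity (estimate held) and alpha = 0.98 during non-speech activity."""
    alpha = 1.0 if speech_active else 0.98
    return alpha * W + (1.0 - alpha) * (S_mag / (X_mag + eps))

def noise_estimate(W, X_mag):
    """Eq (1): magnitude noise spectrum estimate |N(w)| = W(w) * |X(w)|."""
    return W * X_mag
```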
`
[0048] Noise spectrum estimator 242 provides the magnitude spectrum estimate for the noise N(ω) to gain calculation unit 244, which then uses this estimate to generate the gain coefficients G(ω) for multiplier 236.
`
[0049] With the magnitude spectrum of the noise |N(ω)| and the magnitude spectrum of the signal |S(ω)| available, a number of spectrum modification techniques may be used to determine the gain coefficients G(ω). Such spectrum modification techniques include a spectrum subtraction technique, Wiener filtering, and so on.

[0050] In an embodiment, the spectrum subtraction technique is used for noise suppression, and the gain coefficients G(ω) may be determined by first computing the SNR of the speech plus noise signal S(ω) and the mostly noise signal N(ω), as follows:

    SNR(ω) = |S(ω)| / |N(ω)|.    Eq (3)

[0051] The gain coefficient G(ω) for each frequency bin ω may then be expressed as:

    G(ω) = max( (SNR(ω) − 1) / SNR(ω), G_min ),    Eq (4)

[0052] where G_min is a lower bound on G(ω).
`
[0053] Gain calculator 244 thus generates a gain coefficient G(ω_j) for each frequency bin j of the transformed signal S(ω). The gain coefficients for all frequency bins are provided to multiplier 236 and used to scale the magnitude of the signal S(ω).
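The per-bin gain computation of Eqs (3) and (4) is sketched below; the G_min value and the guard against division by zero are illustrative assumptions.

```python
import numpy as np

def spectral_subtraction_gains(S_mag, N_mag, g_min=0.1, eps=1e-12):
    """Eq (3): SNR(w) = |S(w)| / |N(w)|.
    Eq (4): G(w) = max((SNR(w) - 1) / SNR(w), G_min), applied per frequency bin."""
    snr = S_mag / (N_mag + eps)
    return np.maximum((snr - 1.0) / np.maximum(snr, eps), g_min)
```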
[0054] In an aspect, the spectrum subtraction is performed based on a noise N(ω) that is a time-varying noise spectrum derived from the mostly noise signal x(t), which may be provided by the blocking beam former. This is different from the spectrum subtraction used in conventional single-microphone designs, whereby N(ω) typically comprises mostly stationary or constant values. This type of noise suppression is also described in U.S. Pat. No. 5,943,429, entitled "Spectral Subtraction Noise Suppression Method," issued Aug. 24, 1999, which is incorporated herein by reference. The use of a time-varying noise spectrum (which more accurately reflects the real noise in the environment) allows the inventive noise suppression techniques to cancel non-stationary noise as well as stationary noise (non-stationary noise cancellation typically cannot be achieved by conventional noise suppression techniques that use a static noise spectrum).

[0055] The spectrum subtraction technique for a single microphone is also described by S. F. Boll in a paper entitled "Suppression of Acoustic Noise in Speech Using Spectral Subtraction," IEEE Trans. Acoustic Speech Signal Proc., April 1979, vol. ASSP-27, pp. 113-121, which is incorporated herein by reference.
`
[0056] The spectrum modification technique is one technique for removing noise from the speech plus noise signal s(t). The spectrum modification technique provides good performance and can remove both stationary and non-stationary noise (using the time-varying noise spectrum estimate described above). However, other noise suppression techniques may also be used to remove noise, some of which are described below, and this is within the scope of the invention.

[0057] The noise suppression technique shown in FIGS. 2, 3A, and 3B provides good results even for wireless devices having small form factors. In general, it is desirable to