(12) Patent Application Publication
(10) Pub. No.: US 2002/0193130 A1
Yang et al.
(43) Pub. Date: Dec. 19, 2002

(54) NOISE SUPPRESSION FOR A WIRELESS COMMUNICATION DEVICE
(75) Inventors: Feng Yang, Plano, TX (US); Yen-Son Paul Huang, Saratoga, CA (US)
Correspondence Address:
Truong Dinh
Dinh & Associates
2506 Ash Street
Palo Alto, CA 94306 (US)
(73) Assignee: ForteMedia, Inc., Campbell, CA
(21) Appl. No.: 10/076,201
(22) Filed: Feb. 12, 2002
`
`Related U.S. Application Data
(60) Provisional application No. 60/268,403, filed on Feb. 12, 2001.
`
`
`
`Publication Classification
`
(51) Int. Cl.⁷ ........ H04Q 7/00
(52) U.S. Cl. ........ 455/501; 370/331; 455/67.3; 455/90
`
`
`
(57) ABSTRACT

Techniques to suppress noise from a signal comprised of speech plus noise.
In accordance with aspects of the invention, two or more signal detectors
(e.g., microphones) are used to detect respective signals having speech and
noise components, with the magnitude of each component being dependent on
various factors such as the distance between the speech source and the
microphone. Signal processing is then used to process the detected signals
to generate the desired output signal having predominantly speech with a
large portion of the noise removed. The techniques described herein may be
advantageously used for both near-field and far-field applications, and may
be implemented in various mobile communication devices such as cellular
phones.
`
[Front-page drawing: device outlines labeled DISPLAY, with reference numeral 110]
`
`Exhibit 1006
`Page 01 of 17
`
`
`
Patent Application Publication    Dec. 19, 2002    Sheet 1 of 7    US 2002/0193130 A1

[Drawing sheet 1 of 7: device outlines labeled DISPLAY, with microphone reference numerals 110a, 110b, and 110c; FIG. 1C shows device 100c]
`
`
`
[Drawing sheet 2 of 7: figure content not recoverable from extraction; surviving label: MIC N]
`
`
`
[Drawing sheet 3 of 7: surviving labels: MIC A, MIC B, MIC N]
`
`
`
[Drawing sheet 4 of 7: block diagram with inputs s(t) (speech + noise) and x(t) (noise), and an output]
`
`
`
[Drawing sheet 5 of 7: block diagram with inputs s(t) (speech + noise) and x(t) (noise), an activity detector, and output signal y(t)]
`
`
`
`
`
`
`
[Drawing sheet 6 of 7: figure content not recoverable from extraction]
`
`
`
[Drawing sheet 7 of 7: FIG. 7A shows microphone 710a (speech + noise) providing s(t), a second input labeled "mostly noise", and signal processor 720; FIG. 7B shows microphones 710a and 710b, signals s(t) and x(t), and the direction of speech]
`
`
`
NOISE SUPPRESSION FOR A WIRELESS COMMUNICATION DEVICE
`
`BACKGROUND
[0001] The present invention relates generally to communication apparatus.
More particularly, it relates to techniques for suppressing noise in a
speech signal, which may be used in a wireless or mobile communication
device such as a cellular phone.
[0002] In many applications, a speech signal is received in the presence of
noise, processed, and transmitted to a far-end party. One example of such a
noisy environment is a wireless application. For many conventional cellular
phones, a microphone is placed near a speaking user's mouth and used to pick
up the speech signal. The microphone typically also picks up background
noise, which degrades the quality of the speech signal transmitted to the
far-end party.
[0003] Newer-generation wireless communication devices are designed with
additional capabilities. Besides supporting voice communication, a user may
be able to view text or browse World Wide Web pages via a display on the
wireless device. New videophone service requires the user to hold the phone
at a distance, which therefore requires "far-field" speech pick-up.
Moreover, "hands-free" communication is safer and provides more convenience,
especially in an automobile. In any case, the microphone in the wireless
device may be used in a "far-field" mode whereby it may be placed relatively
far away from the speaking user (instead of being pressed against the user's
ear and mouth). For far-field communication, less signal and more noise are
received by the microphone, and a lower signal-to-noise ratio (SNR) is
achieved, which typically leads to poor signal quality.
[0004] One common technique for suppressing noise is the spectral
subtraction technique. In a typical implementation of this technique, speech
plus noise is received via a single microphone and transformed into a number
of frequency bins via a fast Fourier transform (FFT). Under the assumption
that the background noise is long-time stationary (in comparison with the
speech), a model of the background noise is estimated during time periods of
non-speech activity, whereby the measured spectral energy of the received
signal is attributed to noise. The background noise estimate for each
frequency bin is utilized to estimate an SNR of the speech in the bin. Then,
each frequency bin is attenuated according to its noise energy content with
a respective gain factor computed based on that bin's SNR.
[0005] The spectral subtraction technique is generally effective at
suppressing stationary noise components. However, due to the time-variant
nature of the noisy environment (e.g., street, airport, restaurant, and so
on), the models estimated in the conventional manner using a single
microphone are likely to differ from actuality. This may result in an output
speech signal having a combination of low audible quality, insufficient
reduction of the noise, and/or injected artifacts.
[0006] Another technique for suppressing noise is with a microphone array.
For this technique, multiple microphones are arranged, typically in a linear
or some other type of array. An adaptive or non-adaptive method is then used
to process the signals received from the microphones to suppress noise and
improve speech SNR. However, the microphone array has not been applied to
mobile communication devices since it generally requires a certain size that
cannot fit into the small form factor of current mobile devices.
[0007] Conventional wireless communication devices such as cellular phones
typically utilize a single microphone to pick up the speech signal. The
single microphone design limits the type of signal processing that may be
performed on the received signal, and may further limit the amount of
improvement (i.e., the amount of noise suppression) that may be achievable.
The single microphone design is also ineffective at suppressing noise in
far-field applications where the microphone is placed at a distance (e.g., a
few feet) away from the speech source.
[0008] As can be seen, techniques that can be used to suppress noise in a
speech signal in a wireless environment are highly desirable.
`
`SUMMARY
[0009] The invention provides techniques to suppress noise from a signal
comprised of speech plus noise. In accordance with aspects of the invention,
two or more signal detectors (e.g., microphones) are used to detect
respective signals. Each detected signal comprises a desired speech
component and an undesired noise component, with the magnitude of each
component being dependent on various factors such as the distance between
the speech source and the microphone, the directivity of the microphone, the
noise sources, and so on. Signal processing is then used to process the
detected signals to generate the desired output signal having predominantly
speech, with a large portion of the noise removed. The techniques described
herein may be advantageously used for both near-field and far-field
applications, and may be implemented in various wireless and mobile devices
such as cellular phones.
[0010] An embodiment of the invention provides a mobile communication device
that includes a number of signal detectors (e.g., two microphones), optional
first and second beam forming units, and a noise suppression unit. The beam
forming units and noise suppression unit may be implemented within a digital
signal processor (DSP). Each signal detector provides a respective detected
signal having a desired component plus an undesired component. The first
beam forming unit receives and processes the detected signals to provide a
first signal s(t) having the desired component plus a portion of the
undesired component. The second beam forming unit receives and processes the
detected signals to provide a second signal x(t) having a large portion of
the undesired component. The noise suppression unit then receives and
digitally processes the first and second signals to provide an output signal
y(t) having substantially the desired component, with a large portion of the
undesired component removed. The noise suppression unit may be designed to
digitally process the first and second signals in the frequency domain,
although signal processing in the time domain is also possible. The noise
suppression unit may be designed to perform the noise cancellation using a
spectrum modification technique, which provides improved performance over
other noise cancellation techniques.
[0011] In one specific design, the noise suppression unit includes a noise
spectrum estimator, a gain calculation unit, a speech or voice activity
detector, and a multiplier. The noise spectrum estimator derives an estimate
of the spectrum of the noise based on a transformed representation of the
second signal. The gain calculation unit provides a set of gain coefficients
for the multiplier based on a transformed representation of the first signal
and the noise spectrum estimate. The multiplier receives and scales the
magnitude of the transformed first signal with the set of gain coefficients
to provide a scaled transformed signal, which is then inverse transformed to
provide the output signal. The activity detector provides a control signal
indicative of active and non-active time periods, with the active time
periods indicating that the first signal includes predominantly the desired
component. The first beam forming unit may be allowed to adapt during the
active time periods, and the second beam forming unit may be allowed to
adapt during the non-active time periods.
[0012] Another aspect of the invention provides a wireless communication
device, e.g., a mobile phone, having at least two microphones and a signal
processor. Each microphone detects and provides a respective detected signal
comprised of a desired component and an undesired component. For each
detected signal, the specific amount of each (desired and undesired)
component included in the detected signal may be dependent on various
factors, such as the distance to the speaking source and the directivity of
the microphone. The signal processor receives and digitally processes the
detected signals to provide an output signal having substantially the
desired component, with a large portion of the undesired component removed.
The signal processing may be performed in a manner that is dependent in part
on the characteristics of the detected signals.
[0013] Various other aspects, embodiments, and features of the invention are
also provided, as described in further detail below.
[0014] The foregoing, together with other aspects of this invention, will
become more apparent when referring to the following specification, claims,
and accompanying drawings.
`
`BRIEF DESCRIPTION OF THE DRAWINGS
[0015] FIGS. 1A through 1C are diagrams of three wireless communication
devices capable of implementing various aspects of the invention;
[0016] FIG. 2 is a block diagram of a speech processing system suitable for
removing background noise from a speech plus noise signal, and may be used
for both near-field and far-field applications;
[0017] FIGS. 3A and 3B are block diagrams of an embodiment of a main beam
forming unit and a blocking beam forming unit, respectively;
[0018] FIGS. 4, 5, and 6 are block diagrams of three different embodiments
of the noise suppression unit; and
[0019] FIGS. 7A and 7B are diagrams of another speech processing system
suitable for removing background noise from a speech plus noise signal.
`
`DESCRIPTION OF THE SPECIFIC
`EMBODIMENTS
[0020] FIG. 1A is a diagram of an embodiment of a wireless communication
device 100a capable of implementing various aspects of the invention. In
this embodiment, device 100a is a cellular phone having a pair of
microphones 110a and 110b. Microphone 110a is located in the lower left
corner of the device, and microphone 110b is located in the lower right
corner of the device. The microphones may also be located in other parts of
the device, and this is within the scope of the invention. The placement of
the microphones may be constrained by various factors such as the small size
of the cellular phone, manufacturability, and so on.
[0021] FIG. 1B is a diagram of an embodiment of a wireless communication
device 100b having three microphones 110. In this embodiment, microphone
110a is located in the lower center of the device near a speaking user's
mouth and may be used to pick up desired speech plus undesired background
noise. Microphone 110b is located in the middle left side of the device, and
microphone 110c is located in the middle right side of the device.
Additional microphones may also be used, and the microphones may also be
placed in other parts of the device, and this is within the scope of the
invention. The microphones do not need to be placed in an array. For
improved performance, the microphones may be located as far away from each
other as practically possible.
[0022] FIG. 1C is a diagram of an embodiment of a wireless communication
device 100c having a number of microphones 110. In this embodiment, device
100c includes a larger-sized display, which may be used for displaying text,
graphics, videos, and so on. Device 100c may be a handset for the new 3rd
generation (3GPP) wireless communication systems under development and
deployment. Device 100c may also be a personal digital assistant (PDA) with
voice recognition or phone function. Device 100c may also be a video phone
with or without web-browser capability. In general, device 100c may be any
device capable of supporting voice communication, possibly along with other
functions (e.g., text, video, and so on). In the specific embodiment shown
in FIG. 1C, microphones 110a through 110d are located in a line above the
display area. The microphones may also be placed in other locations of the
device.
[0023] Each of devices 100a, 100b, and 100c advantageously employs two or
more microphones to allow the device to be used for both "near-field" and
"far-field" applications. For near-field application, one microphone (e.g.,
microphone 110a in FIG. 1B) or multiple microphones (e.g., microphones 110a
and 110b in FIG. 1A) may be used to pick up the speech signal from a
close-by source. For far-field application, the microphones are designed to
pick up the speech signal from a source located further away. Noise
suppression is used to remove noise and improve signal quality.
[0024] Devices 100a and 100b are similar to conventional cellular phones and
may be used with the devices placed close to the speaking user. With the
noise suppression techniques described herein, devices 100a and 100b may
also be used in a hands-free mode whereby they are located further away from
the speaking user. Device 100c is a handset that may be designed to be
placed away from the user (e.g., one to two feet away) during use, which
allows the user to better view the display while talking.
[0025] FIG. 2 is a block diagram of a speech processing system 200 capable
of removing background noise from a speech plus noise signal and utilizing a
number of signal detectors. In an embodiment, microphones are used as the
signal detectors. System 200 may be used for both near-field and far-field
applications, and may be implemented in each of devices 100a through 100c in
FIGS. 1A through 1C, respectively.
[0026] System 200 includes two or more microphones 210a through 210n, a beam
forming unit 212, and a noise suppression unit 230a. Beam forming unit 212
may be optional for some devices (e.g., for devices that use directional
microphones), as described below. Beam forming unit 212 and noise
suppression unit 230a may be implemented within one or more digital signal
processors (DSPs) or some other integrated circuit.
[0027] Each microphone provides a respective analog signal that is typically
conditioned (e.g., filtered and amplified) and then digitized prior to being
subjected to the signal processing by beam forming unit 212 and noise
suppression unit 230a. For simplicity, this conditioning and digitization
circuitry is not shown in FIG. 2.
[0028] The microphones may be located either close to, or at a relatively
far distance away from, the speaking user during use. Each microphone 210
detects a respective signal having a speech component plus a noise
component, with the magnitude of the received components being dependent on
various factors, such as (1) the distance between the microphone and the
speech source, (2) the directivity of the microphone (e.g., whether the
microphone is directional or omni-directional), and so on. The detected
signals from microphones 210a through 210n are provided to each of two beam
forming units 214a and 214b within unit 212.
[0029] Main beam forming unit 214a, which is also referred to as the "main
beam former," processes the signals from microphones 210a through 210n to
provide a signal s(t) comprised of speech plus noise. Main beam forming unit
214a may further be able to suppress a portion of the received noise
component. Main beam forming unit 214a may be designed to implement any type
of beam former that attempts to reject as much interference and noise as
possible. A specific design for main beam forming unit 214a is shown in FIG.
3A below. Main beam forming unit 214a may also be an optional unit that may
be omitted for some devices (e.g., if the signal s(t) can be obtained from
one microphone). Main beam forming unit 214a provides the signal s(t) to
noise suppression unit 230a.
[0030] Blocking beam forming unit 214b, which is also referred to as a
"blocking beam former," processes the signals from microphones 210a through
210n to provide a signal x(t) comprised of mostly the noise component.
Blocking beam forming unit 214b is used to provide an accurate estimate of
the noise, and to block as much of the desired speech signal as possible.
This then allows for effective cancellation of the noise in the signal s(t).
Blocking beam forming unit 214b may also be designed to implement any one of
a number of beam formers, one of which is shown in FIG. 3B below. Blocking
beam forming unit 214b provides the signal x(t) to noise suppression unit
230a. By employing blocking beam forming unit 214b to generate the mostly
noise signal x(t), system 200 may utilize various types of microphones
(e.g., omni-directional microphones, dipole microphones, and so on), which
may pick up any combination of signal and noise.
[0031] A beam forming controller 218 directs the operation of main and
blocking beam forming units 214a and 214b. Controller 218 typically receives
a control signal from a voice activity detector (VAD) 240. Voice activity
detector 240 detects the presence of speech at the microphones and provides
the Act control signal indicating periods of speech activity. The detection
of speech activity can be performed in various manners known in the art, one
of which is described by D. K. Freeman et al. in a paper entitled "The Voice
Activity Detector for the Pan-European Digital Cellular Mobile Telephone
Service," 1989 IEEE International Conference Acoustics, Speech and Signal
Processing, Glasgow, Scotland, Mar. 23-26, 1989, pages 369-372, which is
incorporated herein by reference.
[0032] Beam forming controller 218 provides the necessary controls that
direct main and blocking beam forming units 214a and 214b to adapt at the
appropriate times. In particular, controller 218 provides an Adapt_M control
signal to main beam forming unit 214a to enable it to adapt during periods
of speech activity and an Adapt_B control signal to blocking beam forming
unit 214b to enable it to adapt during periods of non-speech activity. In
one simple implementation, the Adapt_B control signal is generated by
inverting the Adapt_M control signal.
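The simple implementation just described can be sketched in a few lines. This is an illustration only: the function name `beamformer_controls` and the use of a boolean for the Act signal are assumptions made for the example, not details taken from the disclosure.

```python
from typing import Tuple


def beamformer_controls(act: bool) -> Tuple[bool, bool]:
    """Derive (Adapt_M, Adapt_B) from the voice activity signal Act.

    Adapt_M enables the main beam former during speech activity;
    Adapt_B is its inverse and enables the blocking beam former
    during non-speech activity.
    """
    adapt_m = bool(act)
    adapt_b = not adapt_m  # "generated by inverting the Adapt_M control signal"
    return adapt_m, adapt_b
```

With this arrangement exactly one of the two beam formers is allowed to adapt at any given time.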
[0033] FIG. 3A is a block diagram of an embodiment of main beam forming unit
214a. The signal from microphone 210a is provided to a delay element 312 and
the signals from microphones 210b through 210n are respectively provided to
adaptive filters 314b through 314n. Delay element 312 provides delay for the
signal from microphone 210a such that the delayed signal is approximately
time-aligned with the outputs from adaptive filters 314b through 314n. The
amount of delay to be provided by delay element 312 is thus dependent on the
design of adaptive filters 314. One particular delay length is half the
number of taps of the adaptive filters, if a finite impulse response (FIR)
filter is used for each adaptive filter.
[0034] Each adaptive filter 314 filters the received signal such that the
error signal e(t) used to update the adaptive filter is minimized during the
adaptation period. Adaptive filters 314 may be designed to implement any one
of a number of adaptation algorithms known in the art. Some such algorithms
include a least mean square (LMS) algorithm, a normalized least mean square
(NLMS) algorithm, a recursive least square (RLS) algorithm, and a direct
matrix inversion (DMI) algorithm. Each of the LMS, NLMS, RLS, and DMI
algorithms (directly or indirectly) attempts to minimize the mean square
error (MSE) of the error signal e(t) used to update the adaptive filter. In
an embodiment, the adaptation algorithm implemented by adaptive filters 314b
through 314n is the NLMS algorithm.
[0035] The NLMS algorithm is described in detail by B. Widrow and S. D.
Stearns in a book entitled "Adaptive Signal Processing," Prentice-Hall Inc.,
Englewood Cliffs, N.J., 1986. The LMS, NLMS, RLS, DMI, and other adaptation
algorithms are also described in detail by Simon Haykin in a book entitled
"Adaptive Filter Theory," 3rd edition, Prentice Hall, 1996. The pertinent
sections of these books are incorporated herein by reference.
[0036] As shown in FIG. 3A, the filtered signal from each adaptive filter
314 is subtracted from the delayed signal from delay element 312 by a
respective summer 316 to provide the error signal e(t) for that adaptive
filter. This error signal is then provided back to the adaptive filter and
used to update the response of that adaptive filter. As also shown in FIG.
3A, adaptive filters 314b through 314n are updated when the Adapt_M control
signal is enabled, and are maintained when the Adapt_M control signal is
disabled.
[0037] To generate the signal s(t), a summer 318 receives and combines the
delayed signal from microphone 210a with the filtered signals from adaptive
filters 314b through 314n. The resultant output may further be divided by a
factor of N (where N denotes the number of microphones) to provide the
signal s(t).
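The structure of FIG. 3A can be illustrated with a minimal sketch, assuming FIR adaptive filters updated with NLMS and a reference delay of half the tap count. The function name `main_beamformer`, the tap count, the step size `mu`, and the regularization constant `eps` are assumptions chosen for the example; the disclosure does not specify them.

```python
from typing import List, Optional, Sequence


def main_beamformer(mics: Sequence[Sequence[float]], taps: int = 8,
                    mu: float = 0.5, eps: float = 1e-8,
                    adapt_m: Optional[Sequence[bool]] = None) -> List[float]:
    """Sketch of the main beam former: mics[0] is the delayed reference
    channel, mics[1:] each feed one NLMS adaptive FIR filter."""
    n_mics = len(mics)
    delay = taps // 2                      # half the tap number
    weights = [[0.0] * taps for _ in range(n_mics - 1)]
    history = [[0.0] * taps for _ in range(n_mics - 1)]  # newest sample first
    s_out = []
    for t in range(len(mics[0])):
        d = mics[0][t - delay] if t >= delay else 0.0    # delay element
        total = d
        for k in range(n_mics - 1):
            h = history[k]
            h.insert(0, mics[k + 1][t])
            h.pop()
            y = sum(w * v for w, v in zip(weights[k], h))  # filtered signal
            e = d - y                                      # per-filter error e(t)
            if adapt_m is None or adapt_m[t]:              # update only when enabled
                norm = eps + sum(v * v for v in h)
                for i in range(taps):
                    weights[k][i] += mu * e * h[i] / norm
            total += y
        s_out.append(total / n_mics)       # combine and divide by N
    return s_out
```

When every microphone carries the same signal, each filter converges toward a pure delay and s(t) approaches the delayed reference, which is the time alignment the delay element is meant to provide.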
[0038] FIG. 3A shows a specific design for main beam forming unit 214a.
Other designs may also be used and are within the scope of the invention.
For example, main beam forming unit 214a may be implemented with a
"Griffiths-Jim" beam former that is described by L. J. Griffiths and C. W.
Jim in a paper entitled "An Alternative Approach to Robust Adaptive Beam
Forming," IEEE Trans. Antenna Propagation, January 1982, vol. AP-30, no. 1,
pp. 27-34, which is incorporated herein by reference.
[0039] FIG. 3B is a block diagram of an embodiment of blocking beam forming
unit 214b. The signal from microphone 210a is provided to a delay element
322 and the signals from microphones 210b through 210n are respectively
provided to adaptive filters 324b through 324n. Delay element 322 provides
an amount of delay approximately matching the delay of adaptive filters 324.
One particular delay length is half the number of taps of the adaptive
filter, if an FIR filter is used for each adaptive filter.
[0040] Each adaptive filter 324 filters the received signal such that an
error signal e(t) is minimized during the adaptation period. Adaptive
filters 324 also may be implemented using various designs, such as with NLMS
adaptive filters. To generate the signal x(t), a summer 328 receives and
subtracts the filtered signals from adaptive filters 324b through 324n from
the delayed signal from delay element 322. The signal x(t) represents the
common error signal for all adaptive filters 324b through 324n within the
blocking beam former, and is used to adjust the response of these adaptive
filters.
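A corresponding sketch of the blocking beam former of FIG. 3B follows, again assuming NLMS-updated FIR filters. The difference from the main beam former is that x(t) itself serves as the common error for all filters. The function name `blocking_beamformer`, the parameter values, and the choice to normalize the step by the total input power of all filters are assumptions made for this example.

```python
from typing import List, Optional, Sequence


def blocking_beamformer(mics: Sequence[Sequence[float]], taps: int = 8,
                        mu: float = 0.5, eps: float = 1e-8,
                        adapt_b: Optional[Sequence[bool]] = None) -> List[float]:
    """Sketch of the blocking beam former: x(t) is the delayed reference
    minus all filtered signals, and is the common error for every filter."""
    n_aux = len(mics) - 1
    delay = taps // 2
    weights = [[0.0] * taps for _ in range(n_aux)]
    history = [[0.0] * taps for _ in range(n_aux)]   # newest sample first
    x_out = []
    for t in range(len(mics[0])):
        d = mics[0][t - delay] if t >= delay else 0.0
        x = d
        for k in range(n_aux):
            history[k].insert(0, mics[k + 1][t])
            history[k].pop()
            x -= sum(w * v for w, v in zip(weights[k], history[k]))
        if adapt_b is None or adapt_b[t]:            # adapt only when enabled
            # Joint normalization over all filter inputs (an assumption here)
            norm = eps + sum(v * v for h in history for v in h)
            for k in range(n_aux):
                for i in range(taps):
                    weights[k][i] += mu * x * history[k][i] / norm
        x_out.append(x)
    return x_out
```

A component common to all microphones is driven toward zero in x(t) once the filters have adapted, leaving whatever the filters cannot predict from the other channels.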
[0041] Referring back to FIG. 2, noise suppressor 230a performs noise
suppression in the frequency domain. Frequency domain processing may provide
improved noise suppression and may be preferred over time domain processing
because of superior performance. The mostly noise signal x(t) does not need
to be highly correlated with the noise component in the speech plus noise
signal s(t); it needs only to be correlated in the power spectrum, which is
a much more relaxed criterion.
[0042] Within noise suppressor 230a, the speech plus noise signal s(t) from
main beam forming unit 214a is transformed by a transformer 232a to provide
a transformed speech plus noise signal S(ω). In an embodiment, the signal
s(t) is transformed one block at a time, with each block including L data
samples for the signal s(t), to provide a corresponding transformed block.
Each transformed block of the signal S(ω) includes L elements, S_n(ω_0)
through S_n(ω_{L-1}), corresponding to L frequency bins, where n denotes the
time instant associated with the transformed block. Similarly, the mostly
noise signal x(t) from blocking beam forming unit 214b is transformed by a
transformer 232b to provide a transformed mostly noise signal X(ω). Each
transformed block of the signal X(ω) also includes L elements, X_n(ω_0)
through X_n(ω_{L-1}). In the specific embodiment shown in FIG. 2,
transformers 232a and 232b are each implemented as a fast Fourier transform
(FFT) that transforms a time-domain representation into a frequency-domain
representation. Other types of transform may also be used, and this is
within the scope of the invention. The size of the digitized data block for
the signals s(t) and x(t) to be transformed can be selected based on a
number of considerations (e.g., computational complexity). In an embodiment,
blocks of 128 samples at the typical audio sampling rate are transformed,
although other block sizes may also be used. In an embodiment, the samples
in each block are multiplied by a Hanning window function, and there is a
64-sample overlap between each pair of consecutive blocks.
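The block transform described above (128-sample blocks, Hanning window, 64-sample overlap) can be sketched as follows. The helper names `fft`, `hann`, and `transform_blocks` are hypothetical, the radix-2 FFT is a textbook routine included only to keep the example self-contained, and the periodic form of the window is an assumption, since the text does not say which form is used.

```python
import cmath
import math
from typing import List, Sequence


def fft(x: Sequence[complex]) -> List[complex]:
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        tw = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + tw
        out[k + n // 2] = even[k] - tw
    return out


def hann(length: int) -> List[float]:
    """Periodic Hanning window (assumed form)."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / length) for i in range(length)]


def transform_blocks(signal: Sequence[float], block: int = 128,
                     hop: int = 64) -> List[List[complex]]:
    """Window each block and transform it; hop=64 gives a 64-sample overlap."""
    w = hann(block)
    return [fft([signal[start + i] * w[i] for i in range(block)])
            for start in range(0, len(signal) - block + 1, hop)]
```

Each returned block holds L = 128 complex elements, one per frequency bin; the same routine would be applied to both s(t) and x(t).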
[0043] The magnitude component of the transformed signal S(ω) is provided to
a multiplier 236 and a noise spectrum estimator 242. Multiplier 236 scales
the magnitude component of S(ω) with a set of gain coefficients G(ω)
provided by a gain calculation unit 244. The scaled magnitude component is
then recombined with the phase component of S(ω) and provided to an inverse
FFT (IFFT) 238, which transforms the recombined signal back to the time
domain. The resultant output signal y(t) includes predominantly speech and
has a large portion of the background noise removed.
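The scaling and recombination step can be illustrated per frequency bin as below. The function name `scale_magnitude` is hypothetical, and the inverse transform is left out of this sketch.

```python
import cmath
from typing import List, Sequence


def scale_magnitude(s_block: Sequence[complex],
                    gains: Sequence[float]) -> List[complex]:
    """Scale the magnitude of each bin by its gain and keep the original phase."""
    out = []
    for s, g in zip(s_block, gains):
        mag, phase = cmath.polar(s)             # split into magnitude and phase
        out.append(cmath.rect(g * mag, phase))  # recombine scaled magnitude with phase
    return out
```

Because the gains are real, the result equals `g * s` for each bin; the explicit split simply mirrors the magnitude and phase paths described in the text.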
[0044] It is sometimes advantageous, though it may not be necessary, to
filter the magnitude components of S(ω) and X(ω) so that a better estimate
of the short-term spectrum magnitude of the respective signal can be
obtained. One particular filter implementation is a first-order infinite
impulse response (IIR) low-pass filter with different attack and release
times.
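One way to read "different attack and release times" is a one-pole smoother whose coefficient depends on whether the magnitude in a bin is rising or falling. The sketch below takes that reading; the function name `smooth_magnitude` and the coefficient values 0.5 and 0.9 are assumptions, not values from the disclosure.

```python
from typing import List, Sequence


def smooth_magnitude(prev: Sequence[float], current: Sequence[float],
                     attack: float = 0.5, release: float = 0.9) -> List[float]:
    """First-order IIR low-pass per frequency bin.

    A smaller coefficient (attack) is used when the magnitude rises, so the
    estimate follows onsets quickly; a larger one (release) is used when it
    falls, so the estimate decays slowly.
    """
    out = []
    for p, c in zip(prev, current):
        a = attack if c > p else release
        out.append(a * p + (1.0 - a) * c)
    return out
```

The function would be called once per transformed block, with `prev` holding the smoothed magnitudes from the previous block.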
[0045] Noise spectrum estimator 242 receives the magnitude of the
transformed signal S(ω), the magnitude of the transformed signal X(ω), and
the Act control signal from voice activity detector 240 indicative of
periods of non-speech activity. Noise spectrum estimator 242 then derives
the magnitude spectrum estimates for the noise N(ω), as follows:

$$|N(\omega)| = W(\omega)\,|X(\omega)| \qquad \text{Eq. (1)}$$

[0046] where W(ω) is referred to as the channel equalization coefficient. In
an embodiment, this coefficient may be derived based on an exponential
average of the ratio of the magnitude of S(ω) to the magnitude of X(ω), as
follows:

$$W_{n+1}(\omega) = \alpha\,W_n(\omega) + (1-\alpha)\,\frac{|S(\omega)|}{|X(\omega)|} \qquad \text{Eq. (2)}$$
`
[0047] where α is the time constant for the exponential averaging and
0 < α ≤ 1. In a specific implementation, α = 1 when voice activity detector
240 indicates a speech activity period and α = 0.98 when voice activity
detector 240 indicates a non-speech activity period.
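The update of the channel equalization coefficient and the resulting noise estimate can be sketched per frequency bin. This assumes the noise magnitude estimate is the product of W(ω) and |X(ω)|, consistent with W being an equalization coefficient between the two channels; the function names and the small constant `eps` guarding the division are assumptions added for the example.

```python
from typing import List, Sequence


def update_equalizer(w: Sequence[float], s_mag: Sequence[float],
                     x_mag: Sequence[float], speech_active: bool,
                     alpha_noise: float = 0.98, eps: float = 1e-12) -> List[float]:
    """Exponential average of |S|/|X| per bin.

    alpha = 1 during speech freezes W; alpha = 0.98 during non-speech
    lets W track the ratio slowly.
    """
    alpha = 1.0 if speech_active else alpha_noise
    return [alpha * wi + (1.0 - alpha) * s / (x + eps)
            for wi, s, x in zip(w, s_mag, x_mag)]


def noise_estimate(w: Sequence[float], x_mag: Sequence[float]) -> List[float]:
    """Assumed form of the noise magnitude estimate: |N| = W * |X| per bin."""
    return [wi * x for wi, x in zip(w, x_mag)]
```

Freezing W during speech prevents the speech energy in S(ω) from being absorbed into the noise estimate.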
[0048] Noise spectrum estimator 242 provides the magnitude spectrum
estimates for the noise N(ω) to gain calculator 244, which then uses these
estimates to generate the gain coefficients G(ω) for multiplier 236.
[0049] With the magnitude spectrum of the noise N(ω) and the magnitude
spectrum of the signal S(ω) available, a number of spectrum modification
techniques may be used to determine the gain coefficients G(ω). Such
spectrum modification techniques include a spectrum subtraction technique,
Wiener filtering, and so on.
[0050] In an embodiment, the spectrum subtraction technique is used for
noise suppression, and the gain coefficients G(ω) may be determined by first
computing the SNR of the speech plus noise signal S(ω) and the mostly noise
signal N(ω), as follows:

$$\mathrm{SNR}(\omega) = \frac{|S(\omega)|}{|N(\omega)|} \qquad \text{Eq. (3)}$$

[0051] The gain coefficient G(ω) for each frequency bin ω may then be
expressed as:

$$G(\omega) = \max\!\left(\frac{\mathrm{SNR}(\omega) - 1}{\mathrm{SNR}(\omega)},\; G_{\min}\right) \qquad \text{Eq. (4)}$$

[0052] where G_min is a lower bound on G(ω).
[0053] Gain calculator 244 thus generates a gain coefficient G(ω_j) for each
frequency bin j of the transformed signal S(ω). The gain coefficients for
all frequency bins are provided to multiplier 236 and used to scale the
magnitude of the signal S(ω).
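The gain computation can be sketched per frequency bin as below, taking the gain as (SNR - 1)/SNR bounded below by a floor. The function name `gain_coefficients`, the floor value 0.1, and the handling of empty or noise-free bins are assumptions; the disclosure does not give a value for the lower bound.

```python
from typing import List, Sequence


def gain_coefficients(s_mag: Sequence[float], n_mag: Sequence[float],
                      g_min: float = 0.1) -> List[float]:
    """Spectrum subtraction gain per bin: max((SNR - 1) / SNR, g_min)."""
    gains = []
    for s, n in zip(s_mag, n_mag):
        if s <= 0.0:
            gains.append(g_min)   # empty bin: fall back to the floor (assumption)
        elif n <= 0.0:
            gains.append(1.0)     # no noise estimate: pass the bin unchanged (assumption)
        else:
            snr = s / n           # ratio of |S| to |N|
            gains.append(max((snr - 1.0) / snr, g_min))
    return gains
```

For example, a bin with |S| = 4 and |N| = 1 receives a gain of 0.75, while a bin where the noise estimate exceeds the signal is held at the floor rather than driven negative.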
[0054] In an aspect, the spectrum subtraction is performed based on a noise
N(ω) that is a time-varying noise spectrum derived from the mostly noise
signal x(t), which may be provided by the blocking beam former. This is
different from the spectrum subtraction used in a conventional single
microphone design, whereby N(ω) typically comprises mostly stationary or
constant values. This type of noise suppression is also described in U.S.
Pat. No. 5,943,429, entitled "Spectral Subtraction Noise Suppression
Method," issued Aug. 24, 1999, which is incorporated herein by reference.
The use of a time-varying noise spectrum (which more accurately reflects the
real noise in the environment) allows the inventive noise suppression
techniques to cancel non-stationary noise as well as stationary noise
(non-stationary noise cancellation typically cannot be achieved by
conventional noise suppression techniques that use a static noise spectrum).
[0055] The spectrum subtraction technique for a single microphone is also
described by S. F. Boll in a paper entitled "Suppression of Acoustic Noise
in Speech Using Spectral Subtraction," IEEE Trans. Acoustic Speech Signal
Proc., April 1979, vol. ASSP-27, pp. 113-121, which is incorporated herein
by reference.
[0056] The spectrum modification technique is one technique for removing
noise from the speech plus noise signal s(t). The spectrum modification
technique provides good performance and can remove both stationary and
non-stationary noise (using the time-varying noise spectrum estimate
described above). However, other noi