`
(12) United States Patent
Lambert et al.

(10) Patent No.: US 8,000,482 B2
(45) Date of Patent: Aug. 16, 2011
`
(54) MICROPHONE ARRAY PROCESSING SYSTEM FOR NOISY MULTIPATH ENVIRONMENTS

(75) Inventors: Russell H. Lambert, Fountain Valley, CA (US); Shi-Ping Hsu, Pasadena, CA (US); Karina L. Edmonds, Pasadena, CA (US)

(73) Assignee: Northrop Grumman Systems Corporation, Los Angeles, CA (US)
`
(*) Notice: Subject to any disclaimer, the term of this patent is extended or adjusted under 35 U.S.C. 154(b) by 1776 days.
`
(21) Appl. No.: 11/197,817

(22) Filed: Aug. 5, 2005

(65) Prior Publication Data
US 2005/0281415 A1    Dec. 22, 2005

Related U.S. Application Data
(63) Continuation of application No. 09/388,010, filed on Sep. 1, 1999, now abandoned.

(51) Int. Cl.
H04B 15/00    (2006.01)
(52) U.S. Cl.: 381/94.7; 381/94.1; 381/94.3
(58) Field of Classification Search: 381/94.7, 381/94.1, 94.3
See application file for complete search history.
`
(56) References Cited

U.S. PATENT DOCUMENTS
6,317,501 B1 *  11/2001  Matsuo            381/92
6,332,028 B1 *  12/2001  Marash            381/92
6,453,285 B1 *   9/2002  Anderson et al.   704/210
6,654,468 B1 *  11/2003  Thompson          381/92
* cited by examiner
`Primary Examiner — Devona E. Faulk
`Assistant Examiner — George Monikang
`(74) Attorney, Agent, or Firm — Tarolli, Sundheim, Covell
`& Tummino LLP
`
(57) ABSTRACT

Apparatus and a corresponding method for processing speech signals in a noisy reverberant environment, such as an automobile. An array of microphones (10) receives speech signals from a relatively fixed source (12) and noise signals from multiple sources (32) reverberated over multiple paths. One of the microphones is designated a reference microphone, and the processing system includes adaptive finite impulse response (FIR) filters (24) enabled by speech detection circuitry (21) and coupled to the other microphones to align their output signals with the reference microphone output signal. The filtered signals are then combined in a summation circuit (18). Signal components derived from the speech signal combine coherently in the summation circuit, while noise signal components combine incoherently, resulting in a composite output signal with an improved signal-to-noise ratio. The composite output signal is further processed in a speech conditioning circuit (20) to reduce the effects of reverberation.

21 Claims, 5 Drawing Sheets
`
[Front-page drawing: block diagram with a reference microphone, data microphones #1 and #2, speech detection circuitry, summation circuit (18), and optional speech conditioning]
`Exhibit 1013
`Page 01 of 11
`
`
`
[Sheet 1 of 5: FIG. 1, block diagram (data microphones, speech detection circuitry, optional speech conditioning)]
`
`
`
[Sheet 2 of 5: FIG. 2, block diagram (speech source 12; microphones 10.R and 10.1 through 10.N; bandpass filters 22.R and 22.1 through 22.N, with the reference used to update the filters; speech detection circuitry 21; summation circuit 18; speech conditioning 20; output labeled "clean speech")]
`
`
`
[Sheet 3 of 5: FIG. 3A (speech source 12; reference microphone 10.R; data microphones 10.1 through 10.N) and FIG. 3B (adaptive filters 24.1 through 24.N; summing circuits 28.1 through 28.N; summation circuit 18; output labeled "sphere steered output")]
`
`
`
[Sheet 4 of 5: FIG. 5, block diagram (noise sources 32; speech source 12; reference microphone 10.R; data microphones; speech detection circuitry 21; summation circuit 18)]
`
`
`
[Sheet 5 of 5: graphs of the output signal before processing and after processing; axis labels not recoverable]
`
`
`
`
MICROPHONE ARRAY PROCESSING SYSTEM FOR NOISY MULTIPATH ENVIRONMENTS
`
`CROSS-REFERENCE TO RELATED
`APPLICATION
`
`This application is a continuation of application Ser. No.
`09/388,010, now abandoned, which was filed Sep. 1, 1999
`and entitled Microphone Array Processing System for Noisy
`Multipath Environments, which is incorporated herein by
`reference.
`
`BACKGROUND OF THE INVENTION
`
`
This invention relates generally to techniques for reliable conversion of speech data from acoustic signals to electrical signals in an acoustically noisy and reverberant environment.
There is a growing demand for “hands-free” cellular telephone communication from automobiles, using automatic speech recognition (ASR) for dialing and other functions. However, background noise from both inside and outside an automobile renders in-vehicle communication both difficult and stressful. Reverberation within the automobile combines with high noise levels to greatly degrade the speech signal received by a microphone in the automobile. The microphone receives not only the original speech signal but also distorted and delayed duplicates of the speech signal, generated by multiple echoes from walls, windows and objects in the automobile interior. These duplicate signals in general arrive at the microphone over different paths. Hence the term “multipath” is often applied to the environment. The quality of the speech signal is extremely degraded in such an environment, and the accuracy of any associated ASR systems is also degraded, perhaps to the point where they no longer operate. For example, recognition accuracy of ASR systems as high as 96% in a quiet environment could drop to well below 50% in a moving automobile.
Another related technology affected by noise and reverberation is speech compression, which digitally encodes speech signals to achieve reductions in communication bandwidth and for other reasons. In the presence of noise, speech compression becomes increasingly difficult and unreliable.
In the prior art, sensor arrays have been used or suggested for processing narrowband signals, usually with a fixed uniformly spaced microphone array, with each microphone having a single weighting coefficient. There are also wideband array signal processing systems for speech applications. They use a beam-steering technique to position “nulls” in the direction of noise or jamming sources. This only works, of course, if the noise is emanating from one or a small number of point sources. In a reverberant or multipath environment, the noise appears to emanate from many different directions, so noise nulling by conventional beam steering is not a practical solution.
There are also a number of prior art systems that effect active noise cancellation in the acoustic field. Basically, this technique cancels acoustic noise signals by generating an opposite signal, sometimes referred to as “anti-noise,” through one or more transducers near the noise source, to cancel the unwanted noise signal. This technique often creates noise at some other location in the vicinity of the speaker, and is not a practical solution for canceling multiple unknown noise sources, especially in the presence of multipath effects.
Accordingly, there is still a significant need for reduction of the effects of noise in a reverberant environment, such as the interior of a moving automobile. As discussed in the following summary, the present invention addresses this need.
`
`SUMMARY OF THE INVENTION
`
The present invention resides in a system and related method for noise reduction in a reverberant environment, such as an automobile. Briefly, and in general terms, the system of the invention comprises a plurality of microphones positioned to detect speech from a single speech source and noise from multiple sources, and to generate corresponding microphone output signals, one of the microphones being designated a reference microphone and the others being designated data microphones. The system further comprises a plurality of bandpass filters, one for each microphone, for eliminating from the microphone output signals a known spectral band containing noise; a plurality of adaptive filters, one for each of the data microphones, for aligning each data microphone output signal with the output signal from the reference microphone; and a signal summation circuit, for combining the filtered output signals from the microphones. Signal components resulting from the speech source combine coherently and signal components resulting from multiple noise sources combine incoherently, to produce an increased signal-to-noise ratio. The system may also comprise speech conditioning circuitry coupled to the signal summation circuit, to reduce reverberation effects in the output signal.
More specifically, each of the adaptive filters includes means for filtering data microphone output signals by convolution with a vector of weight values; means for comparing the filtered data microphone output signals from one of the data microphones with reference microphone output signals and deriving therefrom an error signal; and means for adjusting the weight values convolved with the data microphone output signals to minimize the error signal. In the preferred embodiment of the invention, each of the adaptive filters further includes fast Fourier transform means, to transform successive blocks of data microphone output signals to a frequency domain representation to facilitate real-time adaptive filtering.
The invention may also be defined in terms of a method for improving detection of speech signals in noisy environments. Briefly, the method comprises the steps of positioning a plurality of microphones to detect speech from a single speech source and noise from multiple sources, one of the microphones being designated a reference microphone and the others being designated data microphones; generating microphone output signals in the microphones; filtering the microphone output signals in a plurality of bandpass filters, one for each microphone, to eliminate from the microphone output signals a known spectral band containing noise; adaptively filtering the microphone output signals in a plurality of adaptive filters, one for each of the data microphones, and thereby aligning each data microphone output signal with the output signal from the reference microphone; and combining the adaptively filtered output signals from the microphones in a signal summation circuit. The incoming speech from one or multiple microphones is monitored to determine when speech is present. The adaptive filters are only allowed to adapt while speech is present. Signal components resulting from the speech source combine coherently in the signal summation circuit and signal components resulting from noise combine incoherently, to produce an increased signal-to-noise ratio. The method may further comprise the step of conditioning the combined signals in speech conditioning circuitry coupled to the signal summation circuit, to reduce reverberation effects in the output signal.
`
More specifically, the step of adaptively filtering includes filtering data microphone output signals by convolution with a vector of weight values; comparing the filtered data microphone output signals from one of the data microphones with reference microphone output signals and deriving therefrom an error signal; adjusting the weight values convolved with the data microphone output signals to minimize the error signal; and repeating the filtering, comparing and adjusting steps to converge on a set of weight values that results in minimization of noise effects.
In the preferred embodiment of the invention, the step of adaptively filtering further includes obtaining a block of data microphone signals; transforming the block of data to a frequency domain using a fast Fourier transform; filtering the block of data in the frequency domain using a current best estimate of weighting values; comparing the filtered block of data with corresponding data derived from the reference microphone; updating the filter weight values to minimize any difference detected in the comparing step; transforming the filter weight values back to the time domain using an inverse fast Fourier transform; zeroing out portions of the filter weight values that give rise to unwanted circular convolution; and converting the filter values back to the frequency domain.
It will be appreciated from the foregoing summary that the present invention represents a significant advance in speech communication techniques, and more specifically in techniques for enhancing the quality of speech signals produced in a noisy environment. The invention improves signal-to-noise performance and reduces the reverberation effects, providing speech signals that are more intelligible to users. The invention also improves the accuracy of automatic speech recognition systems. Other aspects and advantages of the invention will become apparent from the following more detailed description, taken in conjunction with the accompanying drawings.
`
BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a block diagram depicting an important aspect of the invention, wherein signal amplitude is increased by coherent addition of filtered signals from multiple microphones;
FIG. 2 is another block diagram showing a microphone array in accordance with the invention, and including bandpass filters, speech detection circuitry, adaptive filters, a signal summation circuit, and speech conditioning circuitry;
FIGS. 3A and 3B together depict another block diagram of the invention, including more detail of adaptive filters coupled to receive microphone outputs;
FIG. 4 is a block diagram showing detail of a single adaptive filter used in the invention;
FIG. 5 is another block diagram of the invention, showing how noise signal components are effectively reduced in accordance with the invention;
FIG. 6 is a graph showing a composite output signal from a single microphone detecting a single speaker in a noisy automobile environment; and
FIG. 7 is a graph showing a composite output signal obtained from an array of seven microphones in accordance with the invention, while processing speech from a single speaker in conditions similar to those encountered in the generation of the graph of FIG. 6.

DESCRIPTION OF THE PREFERRED EMBODIMENTS

As shown in the drawings, the present invention is concerned with a technique for significantly reducing the effects of noise in the detection or recognition of speech in a noisy and reverberant environment, such as the interior of a moving automobile. The quality of speech transmission from mobile telephones in automobiles has long been known to be poor much of the time. Noise from within and outside the vehicle results in a relatively low signal-to-noise ratio, and reverberation of sounds within the vehicle further degrades the speech signals. Available technologies for automatic speech recognition (ASR) and speech compression are at best degraded, and may not operate at all in the environment of the automobile.
In accordance with the present invention, use of an array of microphones and its associated processing system results in a significant improvement in signal-to-noise ratio, which enhances the quality of the transmitted voice signals, and facilitates the successful implementation of such technologies as ASR and speech compression.
The present invention operates on the assumption that noise emanates from many directions. In a moving automobile, noise sources inside and outside the vehicle clearly do emanate from different directions. Moreover, after multiple reflections inside the vehicle, even noise from a point source reaches a microphone from multiple directions. A source of speech, however, is assumed to be a point source that does not move, at least not rapidly. Since the noise comes from many directions it is largely independent, or uncorrelated, at each microphone. The system of the invention sums signals from N microphones and, in so doing, achieves a power gain of $N^2$ for the signal of interest, because the amplitudes of the individual signals from the microphones sum coherently, and power is proportional to the square of the amplitude. Because the noise components obtained from the microphones are incoherent, summing them together results in an incoherent power gain proportional to N. Therefore, there is a signal-to-noise ratio improvement by a factor of $N^2/N$, or N.
FIG. 1 shows an array of three microphones, indicated at 10.1, 10.2 and 10.3, respectively. Microphone 10.1 is designated the reference microphone and the other two microphones are designated data microphones. Each microphone receives an acoustic signal S from a speech source 12. For purposes of explanation, in this illustration noise is considered to be absent. The acoustic transfer functions for the three microphones are $h_{ref}$, $h_1$ and $h_2$, respectively. Thus, the electrical output signals from the microphones are $S*h_{ref}$, $S*h_1$ and $S*h_2$, respectively. The signals from the data microphones 10.2 and 10.3 are processed as shown in blocks 14 and 16, respectively, to allow them to be combined with each other and with the reference microphone signal. In block 14, the acoustic path transfer function $h_1$ is inverted and the reference acoustic path transfer function $h_{ref}$ is applied, to yield the signal $S*h_{ref}$. Similarly, in block 16, the function $h_2$ is inverted and the function $h_{ref}$ is applied, to yield the signal $S*h_{ref}$. The three microphone signals are then applied to a summation circuit 18, which yields an output of $3 \cdot S*h_{ref}$. This signal is then processed by speech conditioning circuitry 20, which effectively inverts the transfer function $h_{ref}$, and yields the resulting signal amplitude $3 \cdot S$. An array of N microphones would yield an effective signal amplitude gain of N (a power gain of $N^2$).
The incoming speech to one or multiple microphones 10 is monitored in speech detection circuitry 21 to determine when speech is present. The functions performed in blocks 14 and 16 are performed only when speech is detected by the circuitry 21.
The signal gain obtained from the array of microphones is not dependent in any way on the geometry of the array. One requirement for positioning the microphones is that they be close enough to the speech source to provide a strong signal. A second requirement is that the microphones be spatially separated. This spatial separation is needed so that independent noises are sampled. Similarly, noise reduction in accordance with the invention is not dependent on the geometry of the microphone array.
The purpose of the speech conditioning circuitry 20 is to modify the spectrum of the cumulative signal obtained from the summation circuit 18 to resemble the spectrum of “clean” speech obtained in ideal conditions. The amplified signal obtained from the summation circuit 18 is still a reverberated one. Some improvement is obtained by equalizing the magnitude spectrum of the output signal to match a typical representative clean speech spectrum. A simple implementation of the speech conditioning circuitry 20, therefore, includes an equalizer that selectively amplifies spectral bands of the output signal to render the spectrum consistent with the clean speech spectrum. A more advanced form of speech conditioning circuitry is a blind equalization process specially tailored for speech. (See, for example, Lambert, R. H. and Nikias, C. L., “Blind Deconvolution of Multipath Mixtures,” Chapter from Unsupervised Adaptive Filtering, Vol. 1, edited by Simon Haykin, John Wiley & Sons, 1999.) This speech conditioning process is particularly important when an ASR system is “trained” using clean speech samples. Optimum results are obtained by training the ASR system using the output of the present invention under typical noisy environmental conditions.
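The simple equalizer described above can be sketched as a per-band gain computation. This is a minimal illustration only, assuming the magnitude spectra of the summed output and of representative clean speech are already available as per-band averages; the function names `equalizer_gains` and `apply_gains`, the three-band layout, the numbers, and the `floor` guard are hypothetical and do not come from the disclosure.

```python
def equalizer_gains(observed, target, floor=1e-12):
    """Per-band gains that map an observed magnitude spectrum onto a target.

    observed, target: per-band magnitudes (hypothetical band layout).
    floor: guard against division by zero in empty bands (an assumption).
    """
    if len(observed) != len(target):
        raise ValueError("spectra must have the same number of bands")
    return [t / max(o, floor) for o, t in zip(observed, target)]


def apply_gains(observed, gains):
    """Scale each band of the observed spectrum by its gain."""
    return [o * g for o, g in zip(observed, gains)]


# Illustrative numbers only: an output that is too strong in the lowest band
# and too weak in the second band, relative to a flat target spectrum.
observed = [2.0, 0.5, 1.0]
target = [1.0, 1.0, 1.0]
gains = equalizer_gains(observed, target)
equalized = apply_gains(observed, gains)
```

Bands where the observed magnitude exceeds the target receive a gain below one, and weak bands are amplified, which is the selective amplification the paragraph describes.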
FIG. 2 depicts the invention in principle, showing the speech source 12, a reference microphone 10.R, and N data microphones indicated at 10.1 through 10.N. The output from the reference microphone 10.R is coupled to a bandpass filter 22.R and the outputs from the data microphones 10.1 through 10.N are coupled to similar bandpass filters 22.1 through 22.N, respectively. A great deal of environmental noise lies in the low frequency region of approximately 0-300 Hz. Therefore, it is advantageous to remove energy in this region to provide an improvement in signal-to-noise ratio.
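The effect of removing the 0-300 Hz region can be illustrated with a first-order high-pass filter. This is only a stand-in for the bandpass filters 22, whose design the text does not specify: the 8 kHz sample rate, the first-order recursion, and the names `highpass`, `tone` and `peak` are all assumptions made for the sketch.

```python
import math


def highpass(samples, cutoff_hz=300.0, sample_rate_hz=8000.0):
    """First-order high-pass: y[n] = a * (y[n-1] + x[n] - x[n-1])."""
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    dt = 1.0 / sample_rate_hz
    a = rc / (rc + dt)
    out = []
    prev_x = 0.0
    prev_y = 0.0
    for x in samples:
        y = a * (prev_y + x - prev_x)
        out.append(y)
        prev_x, prev_y = x, y
    return out


def tone(freq_hz, n=4000, sample_rate_hz=8000.0):
    """Unit-amplitude sine wave used as a test signal."""
    return [math.sin(2.0 * math.pi * freq_hz * t / sample_rate_hz) for t in range(n)]


def peak(samples):
    return max(abs(s) for s in samples)


# Compare steady-state peaks (the first 2000 samples are skipped as start-up).
low = peak(highpass(tone(50.0))[2000:])    # tone inside the noisy low band
mid = peak(highpass(tone(1000.0))[2000:])  # tone inside the speech band
```

A 50 Hz tone comes out strongly attenuated while a 1 kHz tone passes with most of its amplitude; a practical design would use a sharper filter and would also limit the upper band edge.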
The outputs of the bandpass filters 22.1 through 22.N are connected to adaptive filters 24.1 through 24.N, respectively, indicated in the figure as $W_1$ through $W_N$, respectively. These filters are functionally equivalent to the filters 14 and 16 in FIG. 1. The outputs of the filters 24, indicated as values $X_1$ through $X_N$, are input to the summation circuit 18, the output of which is processed by speech conditioning circuitry 20, as discussed with reference to FIG. 1. As indicated by the arrow 26, output signals from the reference bandpass filter 22.R are used to update the filters $W_1$ through $W_N$ periodically, as will be discussed with reference to FIGS. 3 and 4. Speech detection circuitry 21 enables the filters 24 only when speech is detected.
FIGS. 3A and 3B show the configuration of FIG. 2 in more detail, but without the bandpass filters 22 of FIG. 2. FIG. 3A shows the same basic configuration of microphones 10.R and 10.1 through 10.N, each receiving acoustic signals from the speech source 12. FIG. 3B shows the filters $W_1$ 24.1 through $W_N$ 24.N in relation to incoming signals $y_1$ through $y_N$ from the data microphones 10.1 through 10.N. Each of the W filters 24.1 through 24.N has an associated summing circuit 28.1 through 28.N connected to its output. In each summing circuit, the output of the W filter 24 is subtracted from a signal from the reference microphone 10.R transmitted over line 30 to each of the summing circuits. The result is an error signal that is fed back to the corresponding W filter 24, which is continually adapted to minimize the error signal.
FIG. 4 shows this filter adaptation process in general terms, wherein the i-th filter $W_i$ is shown as processing the output signal from the i-th data microphone. Adaptive filtering follows conventional techniques for implementing finite impulse response (FIR) filters and can be performed in either the time domain or the frequency domain. In the usual time domain implementation of an adaptive filter, $W_i$ is a weight vector, representing weighting factors applied to successive outputs of a tapped delay line that forms a transversal filter. In a conventional LMS adaptive filter, the weights of the filter determine its impulse response, and are adaptively updated in the LMS algorithm. Frequency domain implementations have also been proposed, and in general require less computation than the time domain approach. In a frequency domain approach, it is convenient to group the data into blocks and to modify the filter weights only after processing each block.
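The conventional time-domain LMS filter mentioned above can be sketched as follows. This is a minimal illustration under stated assumptions, not the preferred embodiment: the data microphone signal is white noise, the reference signal is simply that signal delayed by one sample and halved, and the name `lms_align`, the tap count and the step size `mu` are hypothetical choices.

```python
import random


def lms_align(data, ref, taps=4, mu=0.01):
    """Adapt a transversal (tapped delay line) filter so that the filtered
    data signal tracks the reference signal, sample by sample."""
    w = [0.0] * taps           # weight vector, one weight per delay-line tap
    delay_line = [0.0] * taps  # most recent data samples, newest first
    errors = []
    for y, r in zip(data, ref):
        delay_line = [y] + delay_line[:-1]
        x = sum(wi * di for wi, di in zip(w, delay_line))        # filter output
        e = r - x                                                # error vs. reference
        w = [wi + mu * e * di for wi, di in zip(w, delay_line)]  # LMS update
        errors.append(e)
    return w, errors


# Synthetic stand-in signals (assumptions): the reference equals the data
# signal delayed by one sample and scaled by 0.5.
rng = random.Random(1)
data = [rng.gauss(0.0, 1.0) for _ in range(5000)]
ref = [0.0] + [0.5 * d for d in data[:-1]]
weights, errors = lms_align(data, ref)
```

After adaptation the weight vector approaches the impulse response that maps the data signal onto the reference, here a single tap of 0.5 at a delay of one sample, and the error shrinks accordingly.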
In the preferred embodiment of the invention, the adaptive filter process is a block frequency domain LMS (least mean squares) adaptive update procedure similar to that described in a paper by E. A. Ferrara, entitled “Fast Implementation of LMS Adaptive Filters,” IEEE Trans. on Acoustics, Speech and Signal Processing, Vol. ASSP-28, No. 4, 1980, pp. 474-475. The error signal computed in summing circuit 28.i is given by (Reference mic.) $- y_i * W_i$. In digital processing of successive blocks of data, one adaptive step of $W_i$ may be represented by the expression:

$$W_i(k+1) = W_i(k) + \mu\,\bigl(\mathrm{REF}(k) - X_i(k)\bigr)\,\mathrm{conj}\bigl(Y_i(k)\bigr)$$

where k is the data block number, $\mu$ is a small adaptive step, and $\mathrm{REF}(k)$, $X_i(k)$ and $Y_i(k)$ are the frequency domain reference, filter output and data blocks defined in the steps listed below.
The process described by Ferrara has been modified to provide greater efficiency in a real-time system. The modification entails converting the filters to the time domain, zeroing the portions of the filters that give rise to circular convolution, and then returning the filters to the frequency domain. More specifically, for each data block k, the following steps are performed:
Obtain a block of data from the reference microphone and convert the data to the frequency domain: $\mathrm{REF}(k) = \mathrm{fft}(\mathrm{ref}(k))$. New data read in is less than one-half of the FFT (fast Fourier transform) size, following a conventional process known as the overlap and save method.
For each sensor i = 1 to N, perform the following steps:
  Obtain a block of data $y_i(k)$ from microphone i and transform it to the frequency domain: $Y_i(k) = \mathrm{fft}(y_i(k))$.
  Filter the frequency domain block with the current best estimate of $W_i$ to obtain $X_i(k) = W_i(k) * Y_i(k)$.
  Update the filter using $W_i(k+1) = W_i(k) + \mu\,(\mathrm{REF}(k) - X_i(k))\,\mathrm{conj}(Y_i(k))$.
  Convert the frequency domain filter back to the time domain: $w_i(k+1) = \mathrm{ifft}(W_i(k+1))$.
  Zero out portions of $w_i(k+1)$.
  Convert back to the frequency domain: $W_i(k+1) = \mathrm{fft}(w_i(k+1))$.
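The per-block steps above can be sketched for a single data microphone as follows. This is a hedged illustration rather than the patented procedure itself: it uses the textbook overlap-save variant in which the error is windowed to the non-wrapped half of the block before the update (the steps above form the error directly as REF(k) minus X_i(k)), a naive DFT replaces the FFT to keep the example dependency-free, and the names `dft` and `block_fdlms`, the synthetic signals, the tap count and the step size are all assumptions.

```python
import cmath
import random


def dft(x, inverse=False):
    """Naive O(n^2) discrete Fourier transform; an FFT would be used in practice."""
    n = len(x)
    sign = 1.0 if inverse else -1.0
    scale = 1.0 / n if inverse else 1.0
    return [
        scale * sum(x[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def block_fdlms(data, ref, taps=8, mu=0.01):
    """Block frequency-domain LMS (overlap-save) for one data microphone."""
    W = [0j] * (2 * taps)    # frequency-domain weights W_i(k)
    history = [0.0] * taps   # previous block, kept for overlap-save
    for start in range(0, len(data) - taps + 1, taps):
        new = data[start:start + taps]
        Y = dft(history + new)                  # Y_i(k) = fft(y_i(k))
        X = [Wf * Yf for Wf, Yf in zip(W, Y)]   # X_i(k) = W_i(k) * Y_i(k)
        x_time = dft(X, inverse=True)[taps:]    # keep only non-wrapped outputs
        err = [r - x.real for r, x in zip(ref[start:start + taps], x_time)]
        E = dft([0.0] * taps + err)             # windowed error (assumption)
        W = [Wf + mu * Ef * Yf.conjugate() for Wf, Ef, Yf in zip(W, E, Y)]
        w = dft(W, inverse=True)                # back to the time domain
        w = w[:taps] + [0j] * taps              # zero the wrap-around half
        W = dft(w)                              # return to the frequency domain
        history = new
    return [c.real for c in dft(W, inverse=True)[:taps]]


# Synthetic check (assumption): the reference is the data signal passed
# through a short known impulse response h.
rng = random.Random(0)
h = [0.0, 0.6, 0.0, -0.3, 0.0, 0.0, 0.0, 0.0]
data = [rng.gauss(0.0, 1.0) for _ in range(8 * 200)]
ref = [sum(h[m] * data[n - m] for m in range(len(h)) if n - m >= 0)
       for n in range(len(data))]
estimate = block_fdlms(data, ref)
```

Zeroing the second half of the time-domain weights after each update keeps the filter to `taps` coefficients, so that multiplication in the frequency domain corresponds to linear rather than circular convolution for the retained output samples.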
FIG. 5 shows the system of the invention processing speech from the source 12 and noise from multiple sources referred to generally by reference numeral 32. In the summation circuit 18, the speech signal contributions from the data microphones are added coherently, as previously discussed, to produce a speech signal proportional to $N \cdot S*h_{ref}$, and this signal can be conveniently convolved with the inverse transfer function $h_{ref}^{-1}$ to produce a larger speech signal $N \cdot S$. The speech signals, being coherent, combine in amplitude, and since the power of a sinusoidal signal is proportional to the square of its amplitude, the speech signal power from N sensors will be $N^2$ times the power from a single sensor. In contrast, the noise components sensed by each microphone come from many different directions, and combine incoherently in the summation circuit 18. The noise components may be represented by the summation $n_1 + n_2 + \dots + n_N$. Because these contributions are incoherent, their powers combine as N but their root mean square (RMS) amplitudes combine as $\sqrt{N}$. The cumulative noise power from the N sensors is, therefore, increased by a factor N, and the signal-to-noise ratio (the ratio of signal power to noise power) is increased by a factor $N^2/N$, or N. As in the previously described embodiments of the invention, speech detection circuitry 21 enables the filters 24 only when speech is detected by the circuitry.
Theoretically, if the number of sensors is doubled the signal-to-noise ratio should also double, i.e. show an improvement of 3 dB (decibels). In practice, the noise is not perfectly independent at each microphone, so the signal-to-noise ratio improvement obtained from using N microphones will be somewhat less than N.
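The factor-of-N improvement and the 3 dB figure can be checked numerically. The sketch below assumes an idealized case that the text itself flags as optimistic: the speech copies are perfectly aligned and the noise at each microphone is independent, unit-variance and Gaussian. The function names and the test tone are hypothetical.

```python
import math
import random


def theoretical_gain_db(n_mics):
    """SNR improvement of N, expressed in decibels."""
    return 10.0 * math.log10(n_mics)


def power(samples):
    return sum(s * s for s in samples) / len(samples)


def simulated_gain_db(n_mics, n_samples=20000, seed=0):
    """Sum aligned speech and independent noise over n_mics channels and
    compare the resulting SNR with that of a single channel."""
    rng = random.Random(seed)
    speech = [math.sin(0.05 * t) for t in range(n_samples)]
    noise = [[rng.gauss(0.0, 1.0) for _ in range(n_samples)] for _ in range(n_mics)]
    summed_speech = [n_mics * s for s in speech]  # coherent: amplitudes add
    summed_noise = [sum(ch[t] for ch in noise) for t in range(n_samples)]  # incoherent
    snr_single = power(speech) / power(noise[0])
    snr_array = power(summed_speech) / power(summed_noise)
    return 10.0 * math.log10(snr_array / snr_single)
```

For example, `theoretical_gain_db(2)` evaluates to about 3 dB, and `simulated_gain_db(7)` should land close to `theoretical_gain_db(7)`; correlated noise between microphones would reduce the simulated figure, as the paragraph notes.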
The effect of the adaptive filters in the system of the invention is to “focus” the system on a spherical field surrounding the source of the speech signals. Other sources outside this sphere tend to be eliminated from consideration, and noise from multiple sources is reduced in effect because it is combined incoherently in the system. In an automobile environment, the system re-adapts in a few seconds when there is a physical change in the environment, such as when passengers enter or leave the vehicle, or luggage items are moved, or when a window is opened or closed.
FIGS. 6 and 7 show the improvement obtained by use of the invention. A composite output signal derived from a single microphone is shown in FIG. 6 and is clearly noisier than a similar signal derived from seven microphones in accordance with the invention.
It will be appreciated from the foregoing that the present invention represents a significant advance in the field of microphone signal processing in noisy environments. The system of the invention adaptively filters the outputs of multiple microphones to align their signals with a common reference and allow signal components from a single source to combine coherently, while signal components from multiple noise sources combine incoherently and have a reduced effect. The effect of reverberation is also reduced by speech conditioning circuitry and the resultant signals more reliably represent the original speech signals. Accordingly, the system provides more acceptable transmission of voice signals from noisy environments, and more reliable operation of automatic speech recognition systems. It will also be appreciated that, although a specific embodiment of the invention has been described for purposes of illustration, various modifications may be made without departing from the spirit and scope of the invention. Accordingly, the invention should not be limited except as by the appended claims.
What is claimed is:
1. A microphone array processing system for performance enhancement in noisy environments, the system comprising:
a plurality of N microphones positioned to detect speech from a speech source and noise from at least one noise source and to generate corresponding microphone output signals, where N is a positive integer denoting a number of the plurality of microphones, one of the N microphones being designated a reference microphone and the other N-1 microphones being designated data microphones, the reference microphone and the data microphones receive acoustic signals both from the speech source and from the at least one noise source;
a plurality of adaptive filters, one for each of the data microphones, for aligning each data microphone output signal relative to the reference microphone output signal; and
a signal summation circuit that sums the adaptively filtered microphone output signals with the reference microphone output signal such that signal components resulting from the speech source combine coherently to provide a speech signal having a power gain of approximately $N^2$ and such that the signal components resulting from noise combine incoherently to provide a noise signal having power gain of approximately N to produce a corresponding increased signal-to-noise ratio.
2. The system of claim 1, further comprising a plurality of bandpass filters configured to remove a known spectral band containing noise from each of the microphone output signals, the plurality of adaptive filters aligning each of the bandpass filtered output signals from the data microphones relative to the reference microphone output signal.
3. The system of claim 2, wherein the plurality of adaptive filters are updated based on the output signal from the bandpass filter that filters the reference microphone output signal.
4. The system of claim 3, wherein each of the plurality of adaptive filters is configured to update a filter weight value according to a block frequency domain least mean square adaptive update procedure.
5. The system of claim 1, wherein each of the plurality of adaptive filters further comprises a summation circuit that subtracts the output of a respective adaptive filter from the reference microphone output signal to provide a corresponding error signal, each of the plurality of adaptive filters adapting to minimize the corresponding error signal.
6. The system of claim 5, wherein each of the plurality of adaptive filters further comprises a weight vector, representing weighting factors, that is updated based on the corresponding error signal and applied to successive outputs of a tapped delay line of the respective adaptive filter.
7. The system of claim 1, further comprising speech detection circuitry that enables the plurality of adaptive filters in response to detecting speech from the speech source.
8. The system of claim 1, further comprising speech conditioning circuitry that processes the speech output signal to provide a resulting speech signal having an amplitude gain o