`
`(12) United States Patent
`Buck et al.
`
(10) Patent No.: US 8,194,872 B2
(45) Date of Patent: Jun. 5, 2012
`
`(54) MULTI-CHANNEL ADAPTIVE SPEECH
`SIGNAL PROCESSING SYSTEM WITH NOISE
`REDUCTION
`
`(75) Inventors: Markus Buck, Biberbach (DE); Tim
`Haulick, Blaubeuren (DE); Phillip A.
`Hetherington, Port Moody (CA); Pierre
`Zakarauskas, Vancouver (CA)
`(73) Assignee: Nuance Communications, Inc.,
`Burlington, MA (US)
`
(*) Notice: Subject to any disclaimer, the term of this
patent is extended or adjusted under 35
U.S.C. 154(b) by 1229 days.
(21) Appl. No.: 11/234,837
(22) Filed: Sep. 23, 2005
`
(65) Prior Publication Data
US 2006/0222184 A1    Oct. 5, 2006

(30) Foreign Application Priority Data
Sep. 23, 2004 (EP) ..................................... 04022677
`
(51) Int. Cl.
A61F 11/06 (2006.01)
H04B 15/00 (2006.01)
H04R 1/40 (2006.01)
(52) U.S. Cl. ....................... 381/71.11; 381/94.1; 381/97
(58) Field of Classification Search ............... 381/71.11,
381/92, 94.1, 94.8, 94.5; 704/226
`See application file for complete search history.
`
(56) References Cited

U.S. PATENT DOCUMENTS
6,449,586 B1       9/2002  Hoshuyama .................. 702/190
2003/0108214 A1*   6/2003  Brennan et al. ............ 381/94.7
2004/0161121 A1*   8/2004  Chol et al. ................. 381/92

FOREIGN PATENT DOCUMENTS
DE  43 30 243 A1      3/1993
DE  199 34 724 A1     4/2001
JP  2000-047699       2/2000
JP  2000-181498       6/2000
JP  2003-271191       9/2003
WO  WO 01/10169 A1    2/2001
`
`OTHER PUBLICATIONS
Herbordt, Wolfgang et al., "Adaptive Beamforming for Audio Signal
Acquisition", Adaptive Signal Processing: Applications to Real-World
Problems, J. Benesty et al. (Eds.), copyright 2003, Chapter 6,
pp. 155-194.
`
`(Continued)
`
`Primary Examiner — Devona Faulk
`Assistant Examiner — George Monikang
`(74) Attorney, Agent, or Firm — Sunstein Kann Murphy &
`Timbers LLP
`
(57) ABSTRACT
An adaptive signal processing system eliminates noise from
input signals while retaining desired signal content, such as
speech. The resulting low noise output signal delivers
improved clarity and intelligibility. The low noise output
signal also improves the performance of subsequent signal
processing systems, including speech recognition systems.
An adaptive beamformer in the signal processing system
consistently updates beamforming signal weights in response
to changing microphone signal conditions. The adaptive
weights emphasize the contribution of high energy microphone
signals to the beamformed output signal. In addition,
adaptive noise cancellation logic removes residual noise from
the beamformed output signal based on a noise estimate
derived from the microphone input signals.
`
`24 Claims, 6 Drawing Sheets
`
[Representative drawing: system 200. Legible labels: time delay
compensation logic, adaptive self-calibration logic 202, adaptation
control logic 112, adaptive beamformer with output 114, and
adaptive noise cancellation logic 118.]
`
OTHER PUBLICATIONS
Herbordt, W. et al., "Analysis of Blocking Matrices for Generalized
Sidelobe Cancellers for Non-Stationary Broadband Signals", Student
Forum of Int. Conference on Acoustics, Speech and Signal
Processing, May 2002, retrieved from the Internet at:
<URL:http://www.int.de/LMS publications/web/Int2002 007.pdf>, 4 pages.
Herbordt, Wolfgang et al., "Frequency-Domain Integration of Acoustic
Echo Cancellation and a Generalized Sidelobe Canceller with
Improved Robustness", European Transactions on Telecommunications,
vol. 13, No. 2, Jun. 2002, retrieved from the Internet at:
<URL:http://www.Int.de/LMS publications/web/Int2002 006>.
pdf, pp. 1-10.
Hoshuyama, Osamu et al., "A Robust Adaptive Beamformer for
Microphone Arrays with a Blocking Matrix Using Constrained
Adaptive Filters", IEEE Transactions on Signal Processing, vol. 47,
No. 10, 1999, pp. 2677-2684.
"Microphone Arrays: Signal Processing Techniques and Applications",
M. Brandstein et al. (Eds.), copyright Springer-Verlag 2001,
pp. 3-106 and 229-349.
Gannot, Sharon et al., "Signal Enhancement Using Beamforming and
Nonstationarity With Applications to Speech", IEEE Transactions on
Signal Processing, vol. 49, No. 8, 2001, pp. 1614-1626.
Griffiths, Lloyd J. et al., "An Alternative Approach to Linearly
Constrained Adaptive Beamforming", IEEE Transactions on Antennas
and Propagation, vol. AP-30, No. 1, 1982, pp. 27-34.
McCowan, Iain A. et al., "Adaptive Parameter Compensation for
Robust Hands-Free Speech Recognition Using a Dual Beamforming
Microphone Array", Proceeding of 2001 International Symposium
on Intelligent Multimedia, Video and Speech Processing, 2001, pp.
547-550.
Oh, Stephen et al., "Hands-Free Voice Communication in an
Automobile With a Microphone Array", IEEE Digital Signal Processing,
vol. 5, 1992, pp. I-281 to I-284.
Van Veen, Barry D. et al., "Beamforming: A Versatile Approach to
Spatial Filtering", IEEE ASSP Magazine, 1988, pp. 4-24.
* cited by examiner
`
`
`
[Sheet 1 of 6, Figure 1: block diagram of the multi-channel
adaptive signal processing system. Legible labels: time delay
compensation logic 104, adaptive beamformer 108, adaptive noise
cancellation logic, and adaptation control logic.]
`
`
`
`
[Sheet 2 of 6, Figure 2: block diagram of the multi-channel
adaptive signal processing system 200. Legible labels: time delay
compensation logic 104, adaptive self-calibration logic, outputs
106, adaptive beamformer 108, adaptive noise cancellation logic,
and adaptation control logic.]
`
`
`
`
`
`
[Sheet 3 of 6, Figure 3: flowchart of acts 300.]
Act 302: Receive multiple microphone input signals.
Act 304: Digitize microphone input signals.
Act 306: Transform microphone signals into the frequency domain.
Act 308: Compensate for time delays.
Act 310: Adaptively self-calibrate microphone signals.
Act 312: Determine adaptive weighting coefficients for adaptive
beamformer.
Act 314: Determine beamformed signal.
Act 316: Determine noise reference signals.
Act 318: Determine noise estimate.
Act 320: Subtract noise estimate from the beamformed output signal.
Final act: Transform low noise output signal into the time domain.
End
`
`
`
[Sheet 4 of 6, Figure 4: flowchart of acts 400.]
Act 402: Measure speech signal energy.
Act 404: Speech signal energy > Threshold?
If yes:
Act 406: Adapt beamformer weights.
Act 408: Normalize beamformer weights.
Act 410: Adapt adaptive noise reference logic in response to
beamformer adaptation.
If no:
Act 412: Noise Present?
Act 414: Adapt adaptive noise cancellation logic.
`
`
`
[Sheet 5 of 6, Figure 5: block diagram. Legible labels: microphone
array 502 with sub-array 508 (microphones 512 and 514), analog to
digital converter 504, frequency transform logic 506, inputs 102,
time delay compensation logic 104, outputs 106, adaptive beamformer
108, noise reference logic 110, adaptive noise cancellation logic
118 with output 120, summing logic 122, output 124, adaptive
self-calibration logic 202 with outputs 204, and adaptation control
logic 112.]
`
`
`
[Sheet 6 of 6, Figure 6: block diagram. Legible labels: input
sources 606 including a microphone array 608 and a receiver system,
multi-channel adaptive signal processing systems 100 and 200, audio
reproduction system 616, voice recognition system, and transmission
system.]
`
`
`
MULTI-CHANNEL ADAPTIVE SPEECH
SIGNAL PROCESSING SYSTEM WITH NOISE
REDUCTION
`
`PRIORITY CLAIM
`
This application claims the benefit of priority from European
Patent Application No. 04022677.1, filed Sep. 23, 2004,
which is incorporated herein by reference.

BACKGROUND OF THE INVENTION
`
1. Technical Field
This invention relates to signal processing systems. In
particular, this invention relates to multi-channel speech signal
processing using adaptive beamforming.
2. Related Art
Speech signal processing systems often operate in noisy
background environments. For example, a hands-free voice
command or communication system in an automobile may
operate in a background environment which includes significant
levels of wind or road noise, passenger noise, or noise
from other sources. Noisy background environments result in
poor signal-to-noise ratio (SNR), masking, distortion,
corruption of signals, and other detrimental effects on signals. As
a result, noisy background environments reduce the
intelligibility and clarity of speech signals and reduce speech
recognition accuracy.
Past attempts to improve signal quality in noisy background
environments relied on multi-channel systems, such
as systems including microphone arrays. Multi-channel systems
primarily employ a General Sidelobe Canceller (GSC)
which processes the speech signal along two signal paths. The
first signal path suppresses the unwanted noise. The second
signal path employs a non-adaptive (i.e., fixed) beamformer
that synchronizes the signal of each microphone in the array.
The synchronization is based on the limiting assumption that
the microphone signals differ only by their time delays.
Reliance on a fixed beamformer renders such systems susceptible
to potentially wide variations in energy levels at each
microphone in the array and the differences in SNR among the
microphone signals.
In many practical applications, the SNR of each microphone
signal of an array differs from the SNR of every other
microphone signal obtained from the array. Under such
conditions, the fixed beamformer may actually reduce
performance of the noise reduction signal processing system. In
particular, microphone signals with low SNR may contribute
excessive noise to the beamformed output signal. Thus, past
GSC implementations did not provide a consistently reliable
mechanism for reducing noise, and do not provide speech
command or communication systems with a consistently
noise free signal.
Therefore, a need exists for an improved noise reduction
signal processing system.
`
`SUMMARY
`
This invention provides improved speech signal clarity and
intelligibility. The improved speech signal enhances
communication and improves downstream processing system
performance across a wide range of applications, including speech
detection and recognition. The improved speech signal
results from substantially reducing noise, while retaining
desired signal components.
A signal processing system generates the improved speech
signal on a noise reduced signal output. The signal processing
system includes multiple microphone signal inputs on which
the processing system receives microphone signals. Time
delay compensation logic time aligns the microphone signals
and provides the time aligned signals to noise reference logic
and to an adaptive beamformer.
The noise reference logic generates noise reference signals
based on the time aligned microphone signals. The noise
reference signals are provided to adaptive noise cancellation
logic. The adaptive noise cancellation logic produces a noise
estimate from the noise reference signals.
The adaptive beamformer applies adaptive real-valued
weights to the time aligned microphone signals. The adaptive
beamformer repeatedly recalculates and updates the weights.
The updates may occur in response to temporal changes in
noise power, speech amplitude, or other signal variations.
Based upon the adapting weights, the adaptive beamformer
combines the time aligned microphone signals into a
beamformed output signal. Summing logic subtracts the noise
estimate from the beamformed output signal. A low noise
output signal results.
The signal processing system may include adaptive
self-calibration logic connected to the time delay compensation
logic. The adaptive self-calibration logic matches phase,
amplitude, or other signal characteristics among the time
aligned microphone signals. Alternatively or additionally, the
signal processing system may include adaptation control
logic connected to any combination of the adaptive
self-calibration logic, adaptive beamformer, noise cancellation
logic, and adaptive noise cancellation logic. The adaptation
control logic initiates adaptation based on SNR, speech signal
detection, speech signal energy level, acoustic signal
direction, or other signal characteristics.
Other systems, methods, features and advantages of the
invention will be, or will become, apparent to one with skill in
the art upon examination of the following figures and detailed
description. It is intended that all such additional systems,
methods, features and advantages be included within this
description, be within the scope of the invention, and be
protected by the following claims.
`
`BRIEF DESCRIPTION OF THE DRAWINGS
`
The invention can be better understood with reference to
the following drawings and description. The components in
the figures are not necessarily to scale, emphasis instead
being placed upon illustrating the principles of the invention.
Moreover, in the figures, like referenced numerals designate
corresponding parts throughout the different views.
FIG. 1 shows a multi-channel adaptive signal processing
system.
FIG. 2 shows a multi-channel adaptive signal processing
system including adaptive self-calibration logic.
FIG. 3 shows acts which the signal processing system may
take to reduce input signal noise.
FIG. 4 shows acts which the signal processing system may
take to adapt to changing input signal conditions.
FIG. 5 shows a multi-channel adaptive signal processing
system connected to a microphone array.
FIG. 6 shows a multi-channel adaptive speech processing
system operating in conjunction with pre-processing logic
and post-processing logic.
`
`DETAILED DESCRIPTION OF THE PREFERRED
`EMBODIMENTS
`
FIG. 1 shows a multi-channel adaptive speech processing
system 100. The processing system 100 reduces noise
originally present in one or more input signals. A low noise output
signal results.
`
The processing system 100 includes microphone signal
inputs 102. The microphone signal inputs 102 communicate
microphone signals $X_1$ to $X_M$ to time delay compensation
logic 104. The microphone signals may be provided to the
processing system 100 in the frequency domain and in sub-bands,
denoted as $X_1(n,k)$ to $X_M(n,k)$, where the index $M$
denotes the number of microphones, $n$ is a frequency bin
index, and $k$ is a time index. However, the processing system
100 may instead process the microphone signals in the time
domain, a combination of the time domain and frequency
domain, or in the frequency domain.
The time delay compensation logic 104 generates time
aligned microphone signals $X_{T,1}$ to $X_{T,M}$ on time delay
compensated microphone signal outputs 106. The time delay
compensated microphone signal outputs 106 connect to an
adaptive beamformer 108, noise reference logic 110, and
adaptation control logic 112. The adaptation control logic 112
connects to any combination of the adaptive beamformer 108,
the noise reference logic 110, and the adaptive noise
cancellation logic 118.
The adaptive beamformer 108 combines the time aligned
microphone signals $X_{T,1}$ to $X_{T,M}$ into a beamformed signal $Y$,
provided on a beamformed signal output 114. The noise
reference logic 110 provides noise reference signals $X_{B,m}$
on noise reference signal outputs 116 to the adaptive
noise cancellation logic 118. The adaptive noise cancellation
logic 118 produces a noise estimate on the adaptive noise
cancellation output 120.
The beamformed signal output 114 and adaptive noise
cancellation output 120 connect to summing logic 122. The
summing logic subtracts the noise estimate from the
beamformed signal to generate the low noise output signal $Y_{GSC}$.
The summing logic 122 provides $Y_{GSC}$ on the noise reduced
signal output 124.
The time delay compensation logic 104 compensates for
time delays between the microphone signals. A time delay in
the microphone signals may arise when the microphones have
different acoustic distances from the source of the speech
signal. The microphones may have different acoustic
distances from the source of the speech signal when the
microphones point in different directions, are placed in different
locations, or vary in another physical or electrical
characteristic. The time delay compensation logic 104 compensates for
the time delay by synchronizing the microphone signals. The
time delay compensation logic 104 generates time aligned
microphone signals $X_{T,1}$ to $X_{T,M}$ on the time delay
compensated signal outputs 106.
The adaptive beamformer 108 applies weights $A_m(n)$ to the
time aligned microphone signals. The weights may be
real-valued weights. One step in determining the weights is to
model the time aligned microphone signals $X_{T,1}$ to $X_{T,M}$ as
including a signal component $S_m(n,k)$ and a noise component
$N_m(n,k)$:

\[ X_{T,m}(n,k) = S_m(n,k) + N_m(n,k) \]

The signal component may be modelled with positive scaling
factors $\alpha_m$ as shown below:

\[ S_m(n,k) = \alpha_m(n)\, S(n,k). \]

The noise components may be assumed orthogonal to one
another and may have powers $\varepsilon$ which differ as a function of $\beta$,
a positive real-valued number:

[Equation not legible in source.]

Based on the above signal and noise component models,
the adaptive beamformer 108 may calculate the weights as:

[Equation not legible in source.]

The adaptive beamformer 108 may normalize the weights
as shown below. Normalization provides a unity response for
the desired signal components.

[Equation not legible in source.]

The adaptive weights $A_m(n)$ emphasize the contribution of
the high energy microphone signals from each frequency
band to the beamformed output signal. In practical
applications, $\alpha_m(n)$ and $\beta_m(n)$ are time dependent. The adaptive
beamformer 108 may repeatedly recalculate $A_m(n)$ in
response to temporal changes in signal characteristics, such
as the SNR, direction, or energy as noted above. The adaptive
beamformer 108 may track the temporal changes by
estimating the noise power $\varepsilon\{|N_m(n,k)|^2\}$, by determining ratios of
speech amplitude between different microphone signals, or in
other manners.
The adaptive beamformer 108 applies the weights $A_m(n)$ to
each time aligned microphone signal $m$ in each sub-band
$n$. The beamformed signal $Y$ provides intermediate results
in each sub-band which will lead to the low noise output
signal $Y_{GSC}$:

\[ Y(n,k) = \sum_{m=1}^{M} A_m(n)\, X_{T,m}(n,k). \]

The noise reference logic 110 generates noise reference
signals $X_{B,m}$ based on the time aligned microphone
signals. The noise reference logic 110 may be implemented
with a blocking matrix, and may be adaptive. The blocking
matrix may be a Walsh-Hadamard, Griffiths-Jim, or other
type of blocking matrix. In other implementations, the noise
reference logic 110 may determine the noise reference signals
by subtracting adjacent time aligned microphone signals.
The noise reference logic 110 projects the time delay
compensated microphone signals $X_{T,1}$ to $X_{T,M}$ onto the noise
plane. The noise reference logic 110 thereby determines the
noise reference signals $X_{B,m}$. In other words, the
noise reference logic 110 maps complex valued microphone
signals to the noise reference signals, which are elements of
the noise plane in noise space.
The noise reference signals $X_{B,m}$ substantially
eliminate what would ordinarily be the desired signal
components in the microphone signals. For example, the noise
reference signals $X_{B,m}$ may substantially eliminate
speech signal components. The noise reference signals $X_{B,m}$
thereby provide a representation of the noise in the
microphone input signals.
The noise reference signal outputs 116 connect to the
adaptive noise cancellation logic 118. The adaptive noise
cancellation logic 118 determines a noise estimate based on the
noise reference signals $X_{B,m}$ and adaptive complex-valued
filters $H_{GSC,m}(n,k)$. The complex-valued filters may
adapt to minimize the power in each sub-band of the low noise
output signal: $\varepsilon\{|Y_{GSC}(n,k)|^2\}$. Because the noise reference
signals substantially eliminate the desired signal
components, the residual noise in the beamformed output signal $Y$
is reduced and SNR is further increased in the low noise
output signal $Y_{GSC}$.
To adapt the complex valued filters $H_{GSC,m}(n,k)$, the
adaptive noise cancellation logic 118 may apply an adaptation
algorithm such as the Normalized Least-Mean Square
(NLMS) algorithm:

\[ Y_{GSC}(n,k) = Y(n,k) - \sum_{m} X_{B,m}(n,k)\, H_{GSC,m}(n,k) \]

\[ H_{GSC,m}(n,k+1) = H_{GSC,m}(n,k) + \frac{\beta_{GSC}(n,k)}{\sum_{l} |X_{B,l}(n,k)|^{2}}\, Y_{GSC}(n,k)\, X_{B,m}^{*}(n,k). \]

In the equation above, the asterisk denotes the complex
conjugate of the noise reference signals. Thus, the adaptive
noise cancellation logic uses the noise reference signals $X_{B,m}$
and the complex valued filters $H_{GSC,m}(n,k)$ to
generate the noise estimate. The noise estimate, subtracted from
the beamformed output signal $Y$, yields the low noise output
signal $Y_{GSC}$.
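The NLMS update described above can be sketched for a single sub-band and a single frame as follows. This is a minimal illustration under assumptions, not the patented implementation: the function name `anc_nlms_step`, the default step size, and the small regularizer `eps` are hypothetical, and the normalization by the summed reference power is the textbook NLMS form.

```python
def anc_nlms_step(y_beam, x_ref, h, step=0.5, eps=1e-12):
    """One NLMS update for one sub-band and one frame.

    y_beam: complex beamformed sample
    x_ref:  list of complex noise reference samples
    h:      list of complex filter coefficients, same length as x_ref
    Returns (y_out, h_new). step and eps are assumed values.
    """
    # Noise estimate: filtered sum of the noise references.
    noise_estimate = sum(x * w for x, w in zip(x_ref, h))
    # Low noise output: beamformed sample minus the noise estimate.
    y_out = y_beam - noise_estimate
    # Normalize the step by the instantaneous reference power.
    power = sum(abs(x) ** 2 for x in x_ref) + eps
    h_new = [w + (step / power) * y_out * x.conjugate()
             for x, w in zip(x_ref, h)]
    return y_out, h_new


# Demo: the beamformed sample holds only noise, equal to g times the reference,
# so the filter should converge toward g and the output toward zero.
g = 0.8 - 0.3j
h = [0j]
for x in [1 + 1j, -2 + 0.5j, 0.3 - 1.2j, 1.5 + 0j] * 10:
    y_out, h = anc_nlms_step(g * x, [x], h)
```

In this toy case the coefficient error halves on every frame because the step size is 0.5; a real system would run one such filter set per sub-band and gate the update with the adaptation control logic.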
The summing logic 122 subtracts the noise estimate from
the beamformed signal $Y$ to produce the low noise output
signal $Y_{GSC}$ on the noise reduced signal output 124:

\[ Y_{GSC}(n,k) = Y(n,k) - \sum_{m} X_{B,m}(n,k)\, H_{GSC,m}(n,k). \]
`
In the equation above, the summation represents the noise
estimate determined by the adaptive noise cancellation logic
118. Removing noise from the beamformed signal $Y$ yields
an increase in SNR of the output signal $Y_{GSC}$. The low noise
output signal $Y_{GSC}$ enhances speech acquisition and
subsequent speech processing, including speech recognition.
The adaptation control logic 112 may control adaptation of
any combination of the adaptive beamformer 108, the noise
reference logic 110, the adaptive noise cancellation logic 118,
or the self-calibration logic 202. The adaptation control logic
112 controls adaptation step size. The step size may be based
on the SNR of the microphone input signals (e.g., the
instantaneous SNR), the detection of a speech signal in the
microphone input signals, the speech signal energy level, the
acoustic signal direction, or other signal characteristics.
The step size may be larger (and adaptation faster) when
the SNR is high and/or when the desired signal comes from an
expected direction (e.g., the direction of the driver in an
automobile). The step size may be larger when the energy of
a desired signal component (e.g., speech) exceeds
background noise by a threshold. The threshold may be 5-12 dB
above the background noise, 7-8 dB above the background
noise, or may be set at another value. Signal energy 7-8 dB (or
more) above the background noise is a strong indicator that
the desired signal component (e.g., speech) is present.
Adaptation of the weights in the adaptive beamformer 108
may give rise to an adaptation of the noise reference logic 110
and/or adaptive noise cancellation logic 118. Thus, the
adaptation control logic 112 may adapt the noise reference logic
110 and/or the adaptive noise cancellation logic 118 in
response to beamformer adaptation. The adaptive
beamformer 108 may adapt when the energy of desired signal
content (e.g., speech) exceeds the background noise by a
threshold. Furthermore, the adaptation control logic 112 may
adapt the noise cancellation logic 118 when noise is present
and desired signal content (e.g., speech) is substantially
absent or under a threshold.
FIG. 2 shows a multi-channel adaptive speech processing
system 200 including adaptive self-calibration logic 202. The
adaptive self-calibration logic 202 minimizes mismatches in
the time aligned microphone signals $X_{T,1}$ to $X_{T,M}$ provided by
the time delay compensation logic 104. In particular, the
adaptive self-calibration logic 202 minimizes mismatches in
phase, amplitude, or other signal characteristics of the time
aligned microphone signals $X_{T,1}$ to $X_{T,M}$. Thus, in addition to
time delay compensation, the processing system 200 employs
the self-calibration logic 202 to match microphone signal
frequency characteristics prior to combining the microphone
signals in the adaptive beamformer 108.
The adaptive self-calibration logic 202 may use
self-calibration filters $H_{C,m}(n,k)$. The self-calibration filters may
determine the calibrated time aligned microphone signals
$X_{C,1}$ to $X_{C,M}$ according to:

\[ X_{C,m}(n,k) = X_{T,m}(n,k)\, H_{C,m}(n,k) \]

To facilitate filter adaptation, the adaptive self-calibration
logic 202 may determine error signals $E_{C,m}(n,k)$:

\[ E_{C,m}(n,k) = \frac{1}{M} \sum_{l=1}^{M} X_{C,l}(n,k) - X_{C,m}(n,k) \]

The adaptive self-calibration logic 202 may employ the
error signals $E_{C,m}(n,k)$ in conjunction with an adaptation
technique, such as the NLMS technique, which minimizes the
power of the error signals $\varepsilon\{|E_{C,m}(n,k)|^2\}$ as shown below:

\[ \tilde{H}_{C,m}(n,k+1) = H_{C,m}(n,k) + \frac{\beta_{C}(n,k)}{|X_{T,m}(n,k)|^{2}}\, E_{C,m}(n,k)\, X_{T,m}^{*}(n,k). \]

The adaptive self-calibration logic 202 may rescale the
filters to obtain a unity mean response:

\[ H_{C,m}(n,k) = \tilde{H}_{C,m}(n,k) - \frac{1}{M} \sum_{l=1}^{M} \tilde{H}_{C,l}(n,k) + 1, \quad \text{so that} \quad \frac{1}{M} \sum_{m=1}^{M} H_{C,m}(n,k) = 1. \]

Multiple microphones in an array, even microphones of the
same type from the same manufacturer, may differ in
sensitivity, frequency response, or other characteristics. The
self-calibration logic 202 compensates for differences in
microphone characteristics. The self-calibration logic 202 provides
a long term matching of phase and amplitude characteristics
among the microphones in the array. Thus, the
self-calibration logic 202 may compensate for a microphone which is
consistently more sensitive than another microphone and/or
may compensate for a microphone with a different phase
response than another microphone in the array. The adaptive
self-calibration logic 202 generates self-calibrated time
aligned microphone signals $X_{C,1}$ to $X_{C,M}$ on the
self-calibrated time delay compensated signal outputs 204. The
adaptive beamformer 108 and the noise reference logic 110
process the time aligned microphone signals.
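The self-calibration loop described above can be sketched for one sub-band as follows. This is a minimal illustration under assumptions: each channel has a single complex filter coefficient, the error of a channel is the channel mean minus that channel's calibrated value, the update is a textbook NLMS step, and the filters are re-centered so their mean is one. The function name `self_calibrate_step`, the step size, and `eps` are hypothetical.

```python
def self_calibrate_step(x_t, h_c, step=0.5, eps=1e-12):
    """One self-calibration update for one sub-band and one frame.

    x_t: list of complex time aligned microphone samples
    h_c: list of complex calibration coefficients, one per microphone
    Returns (x_c, h_new). step and eps are assumed values.
    """
    count = len(x_t)
    # Calibrated signals: each channel scaled by its own coefficient.
    x_c = [x * h for x, h in zip(x_t, h_c)]
    # Error of each channel against the mean of all calibrated channels.
    mean_c = sum(x_c) / count
    errors = [mean_c - xc for xc in x_c]
    # NLMS update, normalized by each channel's own input power.
    h_tilde = [h + step * e * x.conjugate() / (abs(x) ** 2 + eps)
               for h, e, x in zip(h_c, errors, x_t)]
    # Rescale so the coefficients keep a mean of exactly one.
    mean_h = sum(h_tilde) / count
    h_new = [h - mean_h + 1 for h in h_tilde]
    return x_c, h_new


# Demo: microphone 2 is half as sensitive as microphone 1.
gains = [1.0, 0.5]
h = [1 + 0j, 1 + 0j]
for s in [1 + 0.5j, -0.7 + 1.1j, 0.4 - 0.9j, 1.3 + 0.2j] * 15:
    x_c, h = self_calibrate_step([g * s for g in gains], h)
```

After the loop the two calibrated channels agree and the coefficients settle near 2/3 and 4/3, so the less sensitive microphone is boosted while the mean response stays at one.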
FIG. 3 shows acts 300 which the multi-channel adaptive
speech signal processing systems may take to generate a low
noise output signal. The signal processing systems receive
multiple microphone input signals (e.g., signals from
multiple microphones in a microphone array) (Act 302). An
analog to digital converter digitizes the microphone input signals
(Act 304) and frequency transform logic (e.g., an FFT)
transforms the digitized input signals into the frequency domain
(Act 306). The FFT may be a 128-point FFT performed each
second, but the FFT length and calculation interval may vary
depending on the application in which the signal processing
systems 100 and 200 are employed.
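The frequency transform of Act 306 can be sketched with a small pure-Python FFT. This is an illustration only: the text gives the 128-point length, while the hop of 64 samples, the absence of a window function, and the function names `fft` and `stft` are assumptions made for the example.

```python
import cmath
import math


def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def stft(samples, n_fft=128, hop=64):
    """Return frames[k][n]: frequency bin n of time frame k."""
    # hop is an assumed value; the text only states the FFT length.
    return [fft(samples[start:start + n_fft])
            for start in range(0, len(samples) - n_fft + 1, hop)]


# Demo: a tone with exactly 8 cycles per 128 samples lands in bin 8.
tone = [math.cos(2 * math.pi * 8 * t / 128) for t in range(256)]
frames = stft(tone)
peak_bin = max(range(65), key=lambda n: abs(frames[0][n]))
```

Each entry `frames[k][n]` plays the role of one sub-band sample indexed by frequency bin `n` and time index `k`, which is the form the later processing stages consume.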
The time delay compensation logic 104 compensates for
the time delay between microphone signals (Act 308).
Additional signal matching (e.g., in phase or amplitude) occurs in
the adaptive self-calibration logic 202 (Act 310). The time
delay compensation and self-calibration prepare the
microphone input signals for processing by the adaptive
beamformer 108 and noise reference logic 110.
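One common way to realize the time alignment of Act 308 in the frequency domain is a per-bin phase rotation, sketched below. This is an assumption for illustration: the text does not say how the delays are estimated or applied, so the delay is taken as known, and the helper names `dft` and `compensate_delay` are hypothetical.

```python
import cmath


def dft(x):
    """Naive DFT, adequate for a short demonstration."""
    n_fft = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * n * t / n_fft)
                for t in range(n_fft))
            for n in range(n_fft)]


def compensate_delay(spectrum, delay_samples):
    """Advance one frame by delay_samples via a per-bin phase rotation."""
    n_fft = len(spectrum)
    return [x * cmath.exp(2j * cmath.pi * n * delay_samples / n_fft)
            for n, x in enumerate(spectrum)]


# Demo: microphone 2 hears the same frame 2 samples later (circularly).
mic1 = [0.0, 1.0, 0.5, -0.3, 0.8, -1.0, 0.2, 0.1]
mic2 = [mic1[(t - 2) % 8] for t in range(8)]
aligned2 = compensate_delay(dft(mic2), delay_samples=2)
max_error = max(abs(a - b) for a, b in zip(aligned2, dft(mic1)))
```

The demo uses a circular shift so the match is exact; with real frames the shift is linear, and the rotation is only an approximation whose quality depends on the frame length and windowing.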
An adaptive beamformer 108 adaptively determines
weights for combining the microphone signals (Act 312). The
weights may adapt in response to temporal changes in the
noise power, speech amplitude, or other changes in signal
characteristics. The adaptive beamformer 108 combines the
microphone signals into the beamformed output signal (Act
314).
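The weighting and combining steps (Acts 312 and 314) can be sketched as follows. This is an illustration under assumptions, not the patent's formula: it supposes each channel carries the speech scaled by a positive factor `alpha[m]` and noise scaled in amplitude by a positive factor `beta[m]`, picks weights proportional to `alpha[m] / beta[m]**2` (the usual SNR-maximizing choice for uncorrelated noise), and normalizes them so the speech passes with unity gain. The function names are hypothetical.

```python
def beamformer_weights(alpha, beta):
    """Real-valued weights for one sub-band.

    alpha: assumed positive speech scaling factor per microphone
    beta:  assumed positive noise amplitude factor per microphone
    """
    # Favor channels with strong speech and weak noise.
    raw = [a / (b * b) for a, b in zip(alpha, beta)]
    # Normalize so that sum(weights[m] * alpha[m]) equals one.
    norm = sum(a * r for a, r in zip(alpha, raw))
    return [r / norm for r in raw]


def beamform(weights, x_t):
    """Weighted sum of the time aligned samples of one sub-band."""
    return sum(w * x for w, x in zip(weights, x_t))


# Demo: microphone 1 has stronger speech and weaker noise than microphone 2.
alpha = [1.0, 0.5]
beta = [1.0, 2.0]
weights = beamformer_weights(alpha, beta)
s = 0.3 - 0.7j
y = beamform(weights, [a * s for a in alpha])
```

In the demo the first microphone receives most of the weight, and the noise-free speech sample `s` is recovered unchanged, which is what the unity response for the desired signal means in practice.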
The noise reference logic 110 generates noise reference
signals from the time delay compensated and self-calibrated
microphone input signals (Act 316). Noise cancellation logic
118 generates a noise estimate based on the noise reference
signals (Act 318). The noise estimate provides an
approximation to the residual noise in the beamformed output signal.
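The simplest noise reference construction mentioned in the description, subtracting adjacent time aligned microphone signals, can be sketched as follows. The sketch assumes the speech component is identical in every channel after alignment and calibration, so it cancels in each difference. The function name `noise_references` is hypothetical.

```python
def noise_references(x_t):
    """Differences of adjacent time aligned samples for one sub-band.

    With M input channels this yields M - 1 reference signals.
    """
    return [x_t[m] - x_t[m + 1] for m in range(len(x_t) - 1)]


# Demo: the same speech sample s in three channels, each with its own noise.
s = 1.0 + 2.0j
noise = [0.1 - 0.2j, -0.3 + 0.05j, 0.2 + 0.4j]
x_t = [s + n for n in noise]
refs = noise_references(x_t)
```

Each reference contains only noise differences, which is why the adaptive noise cancellation logic can use the references without cancelling the desired speech.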
The summing logic 122 subtracts the noise estimate from
the beamformed signal (Act 320). A low noise output signal
results. Frequency to time transformation logic (e.g., an
inverse FFT) may convert the low noise output signal to the
time domain.
FIG. 4 shows acts 400 which the signal processing systems
may take to adapt their processing to changing signal
conditions. The adaptation control logic 112 measures the signal
energy of a desired signal component (e.g., speech) in the
microphone signals (Act 402). The adaptation control logic
112 compares the speech signal energy to a threshold energy
level (Act 404). If the speech signal energy exceeds the
threshold energy level, the adaptation control logic 112
adapts the beamformer weights and controls the adaptation
step size based on noise power, speech amplitude, or other
signal characteristics (Act 406). The adaptation control logic
112 may also normalize the adapted beamformer weights
(Act 408). Adaptation of the beamformer 108 may trigger
adaptation of the noise reference logic (Act 410).
If the adaptation control logic 112 does not detect speech
signal energy in excess of the threshold noise energy level
(Act 404), the adaptation control logic 112 may determine
whether the signal contains noise (Act 412). When noise is
present, the adaptation control logic 112 adapts the adaptive
noise cancellation logic 118 (Act 414).
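The branching of FIG. 4 can be sketched as a small decision function. This is an illustration under assumptions: frame energy stands in for speech energy, the 8 dB default sits inside the 7-8 dB range mentioned in the description, and the "noise present" test is modeled as energy above a hypothetical `silence_db` floor because the text does not say how noise presence is detected. All names are hypothetical.

```python
import math


def energy_db(frame):
    """Mean power of a frame in dB; a tiny offset avoids log10(0)."""
    power = sum(abs(x) ** 2 for x in frame) / len(frame)
    return 10 * math.log10(power + 1e-20)


def adaptation_decision(frame, noise_floor_db,
                        threshold_db=8.0, silence_db=-80.0):
    """Pick which block may adapt for this frame."""
    level = energy_db(frame)
    if level > noise_floor_db + threshold_db:
        # Acts 406 to 410: speech dominates, adapt the beamformer path.
        return "adapt_beamformer"
    if level > silence_db:
        # Acts 412 and 414: noise without dominant speech.
        return "adapt_noise_canceller"
    # Neither speech nor noise detected: leave all filters unchanged.
    return "hold"


# Demo with a noise floor of -40 dB.
loud = adaptation_decision([0.1] * 4, noise_floor_db=-40.0)
quiet = adaptation_decision([0.01] * 4, noise_floor_db=-40.0)
silent = adaptation_decision([0.0] * 4, noise_floor_db=-40.0)
```

The three demo frames give the beamformer branch, the noise canceller branch, and no adaptation, respectively. A fuller controller would also scale the step size with the measured SNR rather than returning a label.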
FIG. 5 shows the multi-channel adaptive signal processing
system 200 operating in conjunction with a microphone array
502, analog to digital converter 504, and frequency transform
logic 506. The microphone array 502 may include multiple
sub-arrays, such as the sub-array 508 and the sub-array 510.
Each sub-array may include one or more microphones. In
FIG. 5, sub-array 508 includes microphones 512 and 514,
while the sub-array 510 includes microphones 516 and 518.
The microphone array 502 outputs microphone signals to
the analog to digital converter 504. The analog to digital
converter digitizes the microphone signals and the samples
are provided to the frequency transform logic 506. The
frequency transform logic 506 generates a frequency
representation of the microphone input signals for subsequent noise
reduction processing.
The microphone array 502 may provide a multi-channel
signal transducer for the processing systems 100 and 200. The
microphone array 502 may be part of an audio processing
system in a car, such as a hands free communication system,
voice command system, or other system. The sub-arrays 508
and 510 and/or individual microphones 512-518 may be
placed in different locations throughout the car and/or may be
oriented in different directions to provide spatially diverse
reception of audio signals.
The microphones 512-518 may be placed on or around a
rear view mirror, headliner, upper console, or in another
location in the vehicle. When two microphones are employed, the
first microphone may point toward the driver and/or passenger,
while the second microphone may point toward the passenger
and/or driver. In other implementations, four microphones
may be placed on or in the rear view mirror.
FIG. 6 shows the multi-channel adaptive signal processing
systems 100 and/or 200 operating in conjunction with
pre-processing logic 602 and post-processing logic 604. The
pre-processing logic 602 connects to input sources 606. The
signal processing systems 100 and 200 may accept input from the
input sources 606 directly, or after initial processing by the
pre-processing logic 602. The pre-processing logic 602
receives signal data from the input sources 606 and performs
any desired signal processing operation (e.g., signal
conditioning, filtering, gain control, or other processing) on the
signal data prior to processing by the adaptive signal
processing systems 100 and 200.
The input sources 606 may include digital or analog signal
sources such as a microphone array 608 or other acoustic
sensor. The microphone array 608 may include multiple
microphones or multiple microphone sub-arrays. The
microphone array 608 or any of the microphones in the microphone
array 608 may be part of an audio communication system
(e.g., an automobile hands-free communication system),
speech recognition system (e.g., an automobile voice
command system), or any other system. In a vehicle, the
microphones may be placed and oriented to provide spatial
diversity in the reception of audio energy. The microphones,
pre-processing logic