(12) Patent Application Publication    (10) Pub. No.: US 2009/0067642 A1
Buck et al.                            (43) Pub. Date: Mar. 12, 2009
`
US 20090067642A1
`
(54) NOISE REDUCTION THROUGH SPATIAL
SELECTIVITY AND FILTERING
`
(76) Inventors: Markus Buck, Biberach (DE);
Tobias Wolff, Ulm (DE)
`
`Correspondence Address:
HARMAN - BRINKS HOFER CHICAGO
`Brinks Hofer Gilson & Lione
`P.O. Box 10395
`Chicago, IL 60610 (US)
`
(21) Appl. No.: 12/189,545

(22) Filed: Aug. 11, 2008

(30) Foreign Application Priority Data
Aug. 13, 2007 (EP) .................................. 07015908.2
`
`Publication Classification
`
(51) Int. Cl.
H04B 15/00 (2006.01)
(52) U.S. Cl. ....................................................... 381/94.1
(57) ABSTRACT
A signal processor uses input devices to detect speech or aural
signals. Through a programmable set of weights and/or time
delays (or phasing) the output of the input devices may be
processed to yield a combined signal. The noise contributions
of some or each of the outputs of the input devices may be
estimated by a circuit element or a controller that processes
the outputs of the respective input devices to yield power
densities. A short-term measure or estimate of the noise
contribution of the respective outputs of the input devices may be
obtained by processing the power densities of some or each of
the outputs of the respective input devices. Based on the
short-term measure or estimate, the noise contribution of the
combined signal may be estimated to enhance the combined
signal when processed further. An enhancement device or
post-filter may reduce noise more effectively and yield robust
speech based on the estimated noise contribution of the
combined signal.
`
[Representative drawing: noise reduction system 200 with a blocking
matrix, noise reducer 208, post-filter 210, MAP optimizer 218, and
synthesis filter bank.]
`
`Page 1 of 16
`
`GOOGLE EXHIBIT 1011
`
`
`
[Drawing sheets 1 through 9 of 9 (Patent Application Publication,
Mar. 12, 2009, US 2009/0067642 A1) are not reproduced in this text; the
figures are listed under BRIEF DESCRIPTION OF THE DRAWINGS.]
`
`
`
`
NOISE REDUCTION THROUGH SPATIAL
SELECTIVITY AND FILTERING
`
`BACKGROUND OF THE INVENTION
[0001] 1. Priority Claim
[0002] This application claims the benefit of priority from
European Patent Application No. 07015908.2, filed Aug. 13,
2007, entitled "Noise Reduction By Combined Beamforming
and Post-Filtering," which is incorporated by reference.
[0003] 2. Technical Field
[0004] The inventions relate to noise reduction, and in
particular to enhancing acoustic signals that may comprise
speech signals.
[0005] 3. Related Art
[0006] Speech communication may suffer from the effects
of background noise. Background noise may affect the quality
and intelligibility of a conversation and, in some instances,
prevent communication.
[0007] Interference is common in vehicles. It may affect
hands-free systems that are susceptible to the temporally
variable characteristics that may define some noises. Some
systems attempt to suppress these noises through spectral
differences, which may distort speech. These systems may
dampen the spectral components affected by noise, which may
include speech, without removing the noise.
[0008] Due to the limited amount of time available to adapt
to noise, some systems are not successful in blocking its
time-variant nature. Unfortunately, non-stationary
disturbances are common in many applications.
`
`SUMMARY
[0009] A signal processor uses input devices to detect
speech or aural signals. Through a programmable set of
weights and/or time delays (or phasing) the output of the input
devices may be processed to yield a combined signal. The
noise contributions of some or each of the outputs of the input
devices may be estimated by a circuit element or a controller
that processes the outputs of the respective input devices to
yield power densities. A short-term measure or estimate of the
noise contribution of the respective outputs of the input
devices may be obtained by processing the power densities of
some or each of the outputs of the respective input devices.
Based on the short-term measure or estimate, the noise
contribution of the combined signal may be estimated to enhance
the combined signal when processed further. An enhancement
device or post-filter may reduce noise more effectively
and yield robust speech based on the estimated noise
contribution of the combined signal.
`
`BRIEF DESCRIPTION OF THE DRAWINGS
[0010] The system may be better understood with reference
to the following drawings and description. The components in
the figures are not necessarily to scale, emphasis instead
being placed upon illustrating the principles of the invention.
Moreover, in the figures, like referenced numerals designate
corresponding parts throughout the different views.
[0011] FIG. 1 is a noise reduction system.
[0012] FIG. 2 is an alternative noise reduction system.
[0013] FIG. 3 is a process that automatically removes noise
(or undesired signals) from an input.
[0014] FIG. 4 is an alternative process that automatically
removes noise (or undesired signals) from an input.
[0015] FIG. 5 is another alternative process that automatically
removes noise (or undesired signals) from an input.
[0016] FIG. 6 is another alternative process that automatically
removes noise (or undesired signals) from an input.
[0017] FIG. 7 is another alternative process that automatically
removes noise (or undesired signals) from an input.
[0018] FIG. 8 is a noise reduction system or method interfaced
to a vehicle.
[0019] FIG. 9 is a noise reduction system or method interfaced
to a communication system, a speech recognition system
and/or an audio system.
`
`DETAILED DESCRIPTION OF THE PREFERRED
`EMBODIMENTS
`
[0020] A signal processor uses sensors, transducers, and/or
microphones (e.g., input devices) to detect speech or aural
signals. The input devices convert sound waves (e.g., speech
signals) into analog signals or digital data. The input devices
may be distributed about a space such as a perimeter or
positioned in an arrangement like an array (e.g., a linear or
planar array). Through a programmable set of weights (e.g.,
fixed weightings) and/or time delays (or phasing) the output
of the input devices may be processed to yield a combined
signal. The noise contributions of some or each of the outputs
of the input devices may be estimated by a circuit element
(e.g., a blocking matrix) and/or a controller (e.g., a processor)
that processes the outputs of the respective input devices to
yield (spectral) power densities. A short-term measure or
estimate (e.g., an average short-time power density) of the
noise contribution of the respective outputs of the input
devices may be obtained by processing the (spectral) power
densities of some or each of the outputs of the respective input
devices. Based on the short-term measure or estimate, the
noise contribution (or spectral power densities of the noise
contribution) of the combined signal may be estimated to
enhance the combined signal when processed further (e.g., by a
post-filter). The enhancement device or post-filter may reduce
noise more effectively and yield robust speech to improve
speech quality and/or speech recognition.
[0021] In some systems the input devices may comprise
two or more (M) transducers, sensors, and/or microphones
that are sensitive to sound from one or more directions (e.g.,
directional microphones). Each of the input devices may
detect sound, e.g., a verbal utterance, and generate analog
and/or digital communication signals $y_m$ ($m = 1, \ldots, M$). The
communication signals may be enhanced by a noise reduction
process or processor. A signal processor may process data
about the location of the input devices and/or the directions of
the communication signals to improve the rejection of unwanted
signals (e.g., through a fixed beamformer). The communication
signals may be processed by a blocking matrix to represent
noise that is present in the communication signals.
[0022] In some systems, signals are processed (e.g., by a signal
processor) in a sub-band domain rather than a discrete
time domain. In other systems, signals are processed in a time
domain and/or frequency domains. When processing at a
sub-band resolution, the communication signals ($y_m$) may be
divided into bands by an analysis filter bank to render sub-band
signals $Y_m(e^{j\Omega_\mu},k)$. At time k, the frequency sub-band
may be represented by $\Omega_\mu$ and the imaginary unit may be
represented by j. An enhanced beamformed signal (P) may be
filtered by an optional synthesis filter bank to obtain an
enhanced audio signal, e.g., a noise reduced speech signal.
`
`
[0023] A beamformed signal in the sub-band domain may
represent a Discrete Fourier transform coefficient $A(e^{j\Omega_\mu},k)$ at
time k for the frequency sub-band $\Omega_\mu$. The output of the
(signal processor or) beamforming technique may be filtered,
which may enhance the output and reduce noise. In some
systems, the beamformed signals $A(e^{j\Omega_\mu},k)$ may be
pre-processed to reduce noise. The incidence or severity of noise may
be reduced by identifying or estimating the (power densities of the)
noise contributions of each of the communication signals
($y_m$). In some systems, the noise contributions may be
rendered through a blocking matrix. The noise contributions of
each of the communication signals may be substantially
suppressed (e.g., subtracted) before the signals are combined to
obtain signal $A(e^{j\Omega_\mu},k)$. A General Sidelobe Canceller (GSC)
that may include a delay-and-sum beamformer, for example,
may suppress noise before a post-filtering process removes
residual noise.
[0024] In some systems, an adaptive weighted sum beamformer
may combine time aligned signals $y_m$ of M input
devices. An adaptive weighted sum may include time dependent
weights that are recalculated more than once (e.g.,
repeatedly) to maintain directional sensitivity to a desired
signal. The time dependent weights may further minimize
directional sensitivity to noise sources.
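The weighted sum described above can be sketched in a few lines. This is a minimal illustration, not the application's implementation: the function names, the uniform weights (the delay-and-sum special case), and the sample values are all assumptions chosen for the example. An adaptive variant would recompute `weights` from frame to frame.

```python
def weighted_sum_beamformer(channels, weights):
    """Combine M time-aligned channel values with one weight per channel."""
    if len(channels) != len(weights):
        raise ValueError("need exactly one weight per channel")
    return sum(w * y for w, y in zip(weights, channels))


def uniform_weights(num_channels):
    """Delay-and-sum special case (assumed here): every weight is 1/M."""
    return [1.0 / num_channels] * num_channels


# Hypothetical frame: the same desired value on all four channels plus
# channel-specific noise that happens to cancel in the average.
desired = 2.0
noise = [0.4, -0.4, 0.2, -0.2]
channels = [desired + n for n in noise]
output = weighted_sum_beamformer(channels, uniform_weights(len(channels)))
```

Because the desired component is identical on every time-aligned channel, it passes through the average unchanged, while uncorrelated noise is attenuated.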
[0025] A post-filtering process may be based on an estimated
(spectral) power density ($\hat{A}_n$) of the noise contribution
($A_n$) of a beamformed signal (A). The estimated (spectral)
power density ($\hat{A}_n$) may be based on an average short-time
power density (V) of the noise contributions of each of the
communication signals ($y_m$) as described by Equation 1.

$$V(e^{j\Omega_\mu},k) = \frac{1}{M}\sum_{m=1}^{M} U_m^{*}(e^{j\Omega_\mu},k)\,U_m(e^{j\Omega_\mu},k)$$
Equation 1

In Equation 1, M represents the number of input devices or
microphones and the asterisk represents the complex conjugate.
In each sub-band, $U_m(e^{j\Omega_\mu},k)$ represents the (spectral)
power density of a noise contribution present in the
communication signal $y_m(l)$ (after sub-band filtering of the
communication signal).
[0026] In some systems, the post-filter may comprise a
Wiener or Wiener-like filter. The filter coefficients may be
adapted to the estimated power density of the noise contribution
of the combined or beamformed signal. To obtain the
filter coefficients, a signal processor may multiply the
short-time power density (V) of the noise contributions of each of
the communication signals ($y_m$) with a real factor $\beta(e^{j\Omega_\mu},k)$ at
time k for the frequency sub-band $\Omega_\mu$. The real factor
$\beta(e^{j\Omega_\mu},k)$ may be adapted to the expectation values E
described in Equation 2.

[Equation 2 is not legible in this copy.]

In Equation 2, $\hat{A}_n(e^{j\Omega_\mu},k)$, $A_n(e^{j\Omega_\mu},k)$ and $A_s(e^{j\Omega_\mu},k)$ represent
the estimated power density $|A_n(e^{j\Omega_\mu},k)|^2$ of the noise
contribution ($A_n$) of the combined or beamformed signal (A), the
noise contribution of the beamformed signal (A), and the
portion of the wanted signal of the output of the signal
processor or beamformer, respectively ($A = A_n + A_s$). If the
processed signal detected by the M input devices or arrays (e.g.,
microphones or microphone array) is speech, the adaptation
of the real coefficient $\beta(e^{j\Omega_\mu},k)$ may occur during pauses in
speech, e.g., during periods in which $A_s(e^{j\Omega_\mu},k) = 0$ or is
nearly 0. In some systems, adaptations occur exclusively
when speech is not detected or when pauses in speech are
detected (e.g., through a speech or pause detector).
[0027] When a Wiener technique or filters are used, the
hardware and/or software selectively pass certain elements of
the combined or beamformed signal (A). The filter passes an
enhanced output (P) (e.g., a combined or beamformed signal)
according to Equation 3.

$$P(e^{j\Omega_\mu},k) = H(e^{j\Omega_\mu},k)\,A(e^{j\Omega_\mu},k)$$
Equation 3

where

$$H(e^{j\Omega_\mu},k) = 1 - \hat{\gamma}_a^{-1}(e^{j\Omega_\mu},k)$$
Equation 4

In Equations 3 and 4, $\hat{\gamma}_a(e^{j\Omega_\mu},k)$ represents an estimate for
$|A(e^{j\Omega_\mu},k)|^2 / |A_n(e^{j\Omega_\mu},k)|^2$. In these expressions $A_n(e^{j\Omega_\mu},k)$
comprises the noise contribution of the combined or beamformed
signal $A(e^{j\Omega_\mu},k)$ at time k for the frequency sub-band
$\Omega_\mu$. $A(e^{j\Omega_\mu},k)$ may be obtained from the output of the
signal processor or beamformer, and the estimate of $A_n(e^{j\Omega_\mu},k)$
(e.g., $\hat{A}_n(e^{j\Omega_\mu},k)$) may be obtained as described above or
below. The Wiener filter devices or techniques may be very
efficient and reliable post-filters and may have stable
convergence characteristics. Through their comparisons, the Wiener
filters or techniques may reduce processor loads and processor
times.
[0028] In some systems, $\hat{\gamma}_a(e^{j\Omega_\mu},k)$, e.g., the estimate for
$|A(e^{j\Omega_\mu},k)|^2 / |A_n(e^{j\Omega_\mu},k)|^2$, may be based on a point estimate
that may be based on a method of maximum a posteriori (e.g.,
MAP or a posterior mode). The MAP estimate may yield
Wiener filter characteristics or coefficients that efficiently
reduce (residual) noise from the combined or beamformed
signal. A first estimate for the filter characteristics may be
given by Equations 5 and 6.

$$H(e^{j\Omega_\mu},k) = 1 - \tilde{\gamma}_a^{-1}(e^{j\Omega_\mu},k)$$
Equation 5

$$\tilde{\gamma}_a(e^{j\Omega_\mu},k) = \frac{|A(e^{j\Omega_\mu},k)|^2}{\beta(e^{j\Omega_\mu},k)\,V(e^{j\Omega_\mu},k)}$$
Equation 6

In Equations 5 and 6, $\tilde{\gamma}_a(e^{j\Omega_\mu},k)$ may be optimized through a
MAP estimate.
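The first, unoptimized filter estimate can be sketched for a single sub-band and frame as follows. It assumes the average short-time power density is $V = \frac{1}{M}\sum_m U_m^{*} U_m$, the noise power estimate is $\beta V$, the post-filter scalar is $|A|^2 / (\beta V)$, and the gain is one minus the inverse of that scalar. The function names, the `gain_floor` clipping, and the handling of zero powers are assumptions added for the example, not part of the application.

```python
def short_time_noise_power(noise_refs):
    """V = (1/M) * sum over channels of conj(U_m) * U_m, one sub-band, one frame."""
    return sum((u.conjugate() * u).real for u in noise_refs) / len(noise_refs)


def first_filter_estimate(a, noise_refs, beta, gain_floor=0.0):
    """Return (gain, scalar) with scalar = |A|^2 / (beta * V), gain = 1 - 1/scalar.

    gain_floor is a hypothetical safeguard: the raw gain turns negative when
    the scalar drops below 1, so it is clipped here.
    """
    noise_power = beta * short_time_noise_power(noise_refs)
    signal_power = abs(a) ** 2
    if noise_power <= 0.0:
        return 1.0, float("inf")   # no estimated noise: pass the signal
    if signal_power <= 0.0:
        return gain_floor, 0.0     # nothing to pass
    scalar = signal_power / noise_power
    return max(1.0 - 1.0 / scalar, gain_floor), scalar


# Hypothetical values: beamformer output A and four blocking-matrix outputs.
gain, scalar = first_filter_estimate(2 + 0j, [1 + 0j, 1j, -1 + 0j, -1j], beta=1.0)
```

With these sample values V is 1, the scalar is 4, and the gain is 0.75, so the enhanced output would be 0.75 times the beamformer output.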
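[0029] An exemplary method of a MAP estimate in a logarithmic
representation may be described by Equation 7.

$$\tilde{\Gamma}_a(e^{j\Omega_\mu},k) = 10\log\tilde{\gamma}_a(e^{j\Omega_\mu},k) = 10\log\frac{|A(e^{j\Omega_\mu},k)|^2}{|A_n(e^{j\Omega_\mu},k)|^2} + 10\log\frac{|A_n(e^{j\Omega_\mu},k)|^2}{\hat{A}_n(e^{j\Omega_\mu},k)} = \Gamma_a(e^{j\Omega_\mu},k) + \Delta(e^{j\Omega_\mu},k)$$
Equation 7

The ratio $\Gamma_a(e^{j\Omega_\mu},k) = 10\log\{|A(e^{j\Omega_\mu},k)|^2 / |A_n(e^{j\Omega_\mu},k)|^2\}$ is to
be estimated and the estimation error
$\Delta(e^{j\Omega_\mu},k) = 10\log\{|A_n(e^{j\Omega_\mu},k)|^2 / \hat{A}_n(e^{j\Omega_\mu},k)\}$ is a measure for
the estimated power density of the noise contribution of the combined
or beamformed signal $A(e^{j\Omega_\mu},k)$. During speech pauses (e.g.,
$\Gamma_a(e^{j\Omega_\mu},k) = 0$), an estimation error $\Delta(e^{j\Omega_\mu},k)$ may generate
artifacts that may be perceived as musical tones. An estimate
$\hat{\Gamma}_a(e^{j\Omega_\mu},k)$ obtained through a MAP method may minimize
the musical noise.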
[0030] FIG. 1 is a block diagram of a noise reduction system
100 that receives the communication signals described by
Equation 8.

$$y_m(l), \quad m = 1, \ldots, M$$
Equation 8

In Equation 8, l represents a discrete time index, and the signals
are obtained by M input devices (e.g., microphones such as
directional microphones that may be part of a microphone array).
In FIG. 1, the GSC processor 102 interfaces multiple signal
processing paths. A first path (or cancellation path) comprises
an adaptive path that may include a blocking matrix and an
adaptive noise canceller. The second path (or compensation
path) may include fixed delay compensation or a fixed
beamformer. The compensation or beamformer may enhance
signals through time delay compensations. The blocking matrix
may be configured or programmed to generate noise reference
signals that may dampen or substantially remove (residual)
noise from the output signal of the compensation path
or fixed beamformer.
[0031] Through the GSC processor 102, the Discrete Fourier
Transform (DFT) coefficient, e.g., the sub-band signal,
$A(e^{j\Omega_\mu},k)$ may be obtained at time k for the frequency
sub-band $\Omega_\mu$. For each (or nearly each) channel m, the noise
portions $U_m(e^{j\Omega_\mu},k)$ of the communication signals $y_m(l)$ may
be obtained as sub-band signals by the blocking matrix that
may be part of the cancellation path of the GSC processor
102. In FIG. 1, the scalar estimator 104 ($\hat{\gamma}_a(e^{j\Omega_\mu},k)$) may be
based on the output of the (cancellation path or) the blocking
matrix ($U_m(e^{j\Omega_\mu},k)$) and the (compensated output of the fixed
beamformer or) output of the GSC $A(e^{j\Omega_\mu},k)$. The hardware
and/or software of the post filter 106 selectively passes certain
elements of the output of the GSC $A(e^{j\Omega_\mu},k)$ and eliminates
or minimizes others to obtain a noise reduced audio or
speech signal (a desired or wanted signal) p(l).
[0032] FIG. 2 illustrates an alternative noise reduction system
200 that includes a GSC controller 220, a MAP optimizer
218, and a post-filter 210. An interface receives communication
signals $y_m(l)$ that are processed by an analysis filter bank
202. The hardware or software of the analysis filter bank 202
rejects signals while passing others that lie within the
sub-band signal $Y_m(e^{j\Omega_\mu},k)$ bands. The analysis filter bank
202 may use a Hanning window, a Hamming window, or
a Gaussian window, for example. A GSC controller 220
comprising a beamformer 204, a blocking matrix 206, and a noise
reducer 208 receives the sub-band signals $Y_m(e^{j\Omega_\mu},k)$. The
noise reducer 208 subtracts (or dampens) noise estimated by
the blocking matrix 206 from the sub-band signals $Y_m(e^{j\Omega_\mu},k)$
to obtain the noise reduced Discrete Fourier Transform (DFT)
coefficient $A(e^{j\Omega_\mu},k)$.
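One common way to realize an analysis filter bank is a windowed DFT of each frame. The sketch below assumes that realization, a periodic Hann (Hanning) window, and sub-band centers at $\Omega_\mu = 2\pi\mu/N$. The function names and frame length are illustrative and are not taken from the application.

```python
import cmath
import math


def hann_window(n):
    """Periodic Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * i / n) for i in range(n)]


def analysis_frame(frame):
    """Return sub-band coefficients for mu = 0 .. N/2 of one windowed frame."""
    n = len(frame)
    windowed = [w * x for w, x in zip(hann_window(n), frame)]
    return [
        sum(windowed[l] * cmath.exp(-2j * math.pi * mu * l / n) for l in range(n))
        for mu in range(n // 2 + 1)
    ]


# Hypothetical input: a cosine that falls exactly on sub-band mu = 2.
N = 16
frame = [math.cos(2.0 * math.pi * 2 * l / N) for l in range(N)]
magnitudes = [abs(c) for c in analysis_frame(frame)]
```

The window spreads the tone into the two neighbouring sub-bands at half the peak magnitude, which is the usual trade-off between leakage and resolution. A direct DFT is used for clarity; a practical system would use an FFT.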
[0033] In FIG. 2 the blocking matrix 206 may comprise an
adaptive filter. The noise signals output by the blocking matrix
206 may entirely (or in alternative systems partially or not
completely) block a desired or useful signal within the input
signals, which may result in or pass band limited spectra of the
undesired signals. A Walsh-Hadamard kind of blocking
matrix or a Griffiths-Jim blocking matrix may be used in
some systems. The Walsh-Hadamard blocking matrix may
be established for arrays comprising $M = 2^n$ input devices
(or microphones).
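A Griffiths-Jim blocking matrix is commonly written as an (M-1) by M matrix whose rows take the difference of adjacent time-aligned channels. The sketch below uses that textbook form as an assumption; the application does not spell out the matrix entries, and the function names and sample values are illustrative.

```python
def griffiths_jim_blocking_matrix(num_channels):
    """Rows of the form [.., 1, -1, ..]: each output is y_i - y_(i+1)."""
    rows = []
    for i in range(num_channels - 1):
        row = [0.0] * num_channels
        row[i], row[i + 1] = 1.0, -1.0
        rows.append(row)
    return rows


def apply_blocking_matrix(matrix, channels):
    """Multiply the blocking matrix with one vector of channel values."""
    return [sum(b * y for b, y in zip(row, channels)) for row in matrix]


B = griffiths_jim_blocking_matrix(4)

# A desired signal that is identical on all time-aligned channels is blocked.
blocked = apply_blocking_matrix(B, [1.5, 1.5, 1.5, 1.5])

# Channel-specific noise survives and can serve as a noise reference.
noise_refs = apply_blocking_matrix(B, [1.5 + 0.5, 1.5, 1.5 - 0.5, 1.5])
```

Every row sums to zero, which is why any component common to all channels is removed from the noise reference signals.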
[0034] In FIG. 2, a post-filter 210 (e.g., a Wiener filter or a
spectral subtractor) may further reduce residual noise. When
a Wiener-like filter is used, an exemplary filter characteristic
may be described by Equation 9.

$$H(e^{j\Omega_\mu}) = 1 - \left(\frac{S_{a_s a_s}(\Omega_\mu) + S_{a_n a_n}(\Omega_\mu)}{S_{a_n a_n}(\Omega_\mu)}\right)^{-1}$$
Equation 9

In Equation 9, $S_{a_s a_s}(\Omega_\mu)$ and $S_{a_n a_n}(\Omega_\mu)$ represent the auto power
density spectrum of the wanted (or desired) signal and the
noise disturbances or perturbation contained in the output
$A(e^{j\Omega_\mu},k)$ of the GSC controller 220, respectively. In some
systems, it may be assumed that the wanted or desired signal
and the noise disturbances or perturbation are uncorrelated.
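As a worked step, the Wiener-like characteristic can be rewritten in the more familiar ratio form. Writing $S_s$ and $S_n$ as shorthand (introduced here only for brevity) for the auto power density spectra of the wanted signal and of the noise at one sub-band:

$$H = 1 - \left(\frac{S_s + S_n}{S_n}\right)^{-1} = 1 - \frac{S_n}{S_s + S_n} = \frac{S_s}{S_s + S_n}$$

For example, if the wanted signal carries three times the noise power ($S_s = 3 S_n$), the bracketed ratio is 4 and $H = 1 - 1/4 = 0.75$. If the sub-band contains only noise ($S_s = 0$), the ratio is 1 and $H = 0$.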
[0035] An a posteriori signal-to-noise ratio (SNR) shown in
the brackets of Equation 9 may be estimated by a temporal
averaging to target stationary disturbances or perturbations.
In FIG. 2, the system 200 may suppress time-dependent
variations or perturbations. A time-dependent estimate for a
post-filtering scalar may be given by Equation 10.

$$\gamma_a(e^{j\Omega_\mu},k) = \frac{|A(e^{j\Omega_\mu},k)|^2}{|A_n(e^{j\Omega_\mu},k)|^2}$$
Equation 10

In Equation 10, $A_n$ represents the noise portion of (A).
[0036] An estimate $\hat{\gamma}_a(e^{j\Omega_\mu},k)$ for $\gamma_a(e^{j\Omega_\mu},k)$ of the direction
and incidence of sound may be achieved by estimating $A_n$.
(A) may be obtained from the output of the GSC controller
220. In FIG. 2, $A_n$ may be obtained from the output of the
blocking matrix 206.
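[0037] In this example, the average short-time power density
of the output signals of the blocking matrix 206, $V(e^{j\Omega_\mu},k)$,
may be obtained by device (or controller) 212 of FIG. 2 as
described by Equation 11

$$V(e^{j\Omega_\mu},k) = \frac{1}{M}\sum_{m=1}^{M} U_m^{*}(e^{j\Omega_\mu},k)\,U_m(e^{j\Omega_\mu},k)$$
Equation 11

where the asterisk represents the complex conjugate. An
estimate $\hat{A}_n(e^{j\Omega_\mu},k)$ for $|A_n(e^{j\Omega_\mu},k)|^2$ may be obtained through the
real factor $\beta(e^{j\Omega_\mu},k)$, e.g., $\hat{A}_n(e^{j\Omega_\mu},k) = \beta(e^{j\Omega_\mu},k)\,V(e^{j\Omega_\mu},k)$. The
real factor $\beta(e^{j\Omega_\mu},k)$ may be adapted to satisfy the relation for
the expectation values E

[Equation 12 is not legible in this copy.]

where $A_s(e^{j\Omega_\mu},k)$ is the portion of the wanted signal of the
output of the GSC $A(e^{j\Omega_\mu},k)$. Thus, an estimate may be
described by Equation 13.

$$\tilde{\gamma}_a(e^{j\Omega_\mu},k) = \frac{|A(e^{j\Omega_\mu},k)|^2}{\beta(e^{j\Omega_\mu},k)\,V(e^{j\Omega_\mu},k)}$$
Equation 13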
`
[0038] By factor $\beta(e^{j\Omega_\mu},k)$, a power adaptation of the power
density of the outputs of the GSC controller 220 and the
blocking matrix 206 may be estimated or measured through
the power adapter 214. The post-filter scalar $\tilde{\gamma}_a(e^{j\Omega_\mu},k)$
estimate may be determined by an estimator 216. The post-filter
scalar may be optimized by a MAP optimizer 218.
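The application does not state how the real factor is adapted, only that adaptation may occur during speech pauses, when the beamformer output consists of noise alone. The sketch below is one possible scheme under that assumption: it keeps recursive averages of the beamformer output power and of V during pauses and sets the factor to their ratio, so that on average the scaled V matches the noise power at the beamformer output. The class name, the smoothing constant, and the external pause flag are all hypothetical.

```python
class PowerAdapter:
    """Hypothetical per-sub-band adaptation of the real factor beta."""

    def __init__(self, smoothing=0.9, beta_init=1.0):
        self.smoothing = smoothing
        self.beta = beta_init
        self._avg_a_power = None   # running mean of |A|^2 during pauses
        self._avg_v = None         # running mean of V during pauses

    def update(self, a_power, v, speech_pause):
        """Adapt beta in pauses only; always return the estimate beta * V."""
        if speech_pause:
            if self._avg_a_power is None:
                self._avg_a_power, self._avg_v = a_power, v
            else:
                s = self.smoothing
                self._avg_a_power = s * self._avg_a_power + (1.0 - s) * a_power
                self._avg_v = s * self._avg_v + (1.0 - s) * v
            if self._avg_v > 0.0:
                self.beta = self._avg_a_power / self._avg_v
        return self.beta * v


adapter = PowerAdapter()
adapter.update(a_power=2.0, v=0.5, speech_pause=True)    # pause: beta adapts
noise_estimate = adapter.update(a_power=100.0, v=1.0, speech_pause=False)
```

During speech the factor is frozen, so the noise estimate follows V alone and is not inflated by the wanted signal.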
[0039] In FIG. 2, the post-filter 210 may be adapted through
a MAP or a posterior mode estimation of the noise power
spectral density. An exemplary method of a MAP estimate in
a logarithmic domain or a logarithmic estimate of a post-filter
scalar may be described by Equation 7.

$$\tilde{\Gamma}_a(e^{j\Omega_\mu},k) = 10\log\tilde{\gamma}_a(e^{j\Omega_\mu},k) = 10\log\frac{|A(e^{j\Omega_\mu},k)|^2}{|A_n(e^{j\Omega_\mu},k)|^2} + 10\log\frac{|A_n(e^{j\Omega_\mu},k)|^2}{\hat{A}_n(e^{j\Omega_\mu},k)} = \Gamma_a(e^{j\Omega_\mu},k) + \Delta(e^{j\Omega_\mu},k)$$
Equation 7

where $\Delta(e^{j\Omega_\mu},k)$ represents the estimation error. In some
systems, the estimation error may generate artifacts that may be
perceived as musical noise.
[0040] Some systems minimize the estimation error
$\Delta(e^{j\Omega_\mu},k)$. In this explanation $\Gamma_a(e^{j\Omega_\mu},k)$ and $\Delta(e^{j\Omega_\mu},k)$ are assumed
to represent stochastic variables. For a given observable, e.g.,
$\tilde{\Gamma}_a(e^{j\Omega_\mu},k)$, the probability that the quantity that is to be
estimated, e.g., $\Gamma_a(e^{j\Omega_\mu},k)$, assumes a value may be given by the
conditional density $p(\Gamma_a \mid \tilde{\Gamma}_a)$ (in the following the argument
$(e^{j\Omega_\mu},k)$ is omitted for simplicity). According to MAP
principles, the system may choose the value for $\Gamma_a$ that maximizes
$p(\Gamma_a \mid \tilde{\Gamma}_a)$:

$$\hat{\Gamma}_a = \arg\max_{\Gamma_a}\, p(\Gamma_a \mid \tilde{\Gamma}_a)$$
Equation 14

By Bayes' rule the conditional density p may be expressed as
Equation 15

$$p(\Gamma_a \mid \tilde{\Gamma}_a) = \frac{p(\tilde{\Gamma}_a \mid \Gamma_a)\,p(\Gamma_a)}{p(\tilde{\Gamma}_a)}$$
Equation 15

where $p(\Gamma_a)$ is known as the a priori density. Maximization
requires

$$\frac{\partial}{\partial\Gamma_a}\left[p(\tilde{\Gamma}_a \mid \Gamma_a)\,p(\Gamma_a)\right] = 0$$
Equation 16

[0041] Based on empirical studies the conditional density
can be modeled by a Gaussian distribution with variance $\psi_\Delta$:

$$p(\tilde{\Gamma}_a \mid \Gamma_a) = \frac{1}{\sqrt{2\pi\psi_\Delta}}\exp\left(-\frac{(\tilde{\Gamma}_a - \Gamma_a)^2}{2\psi_\Delta}\right)$$
Equation 17
`
[0042] Assuming that the real and imaginary parts of both
the wanted signal and the disturbance or perturbation may be
described as average-free (zero-mean) Gaussians with identical
variances, $p(\Gamma_a)$ can be approximated by

$$p(\Gamma_a) = \frac{1}{\sqrt{2\pi\psi_\Gamma(\xi)}}\exp\left(-\frac{(\Gamma_a - \mu_\Gamma(\xi))^2}{2\psi_\Gamma(\xi)}\right)$$
Equation 18

with the a priori SNR $\xi = \psi_s/\psi_n$, and $\psi_\Gamma(\xi) = \kappa\xi/(1+\xi)$ and
$\mu_\Gamma(\xi) = 10\log(\xi+1)$, where $\kappa$ is the upper limit of the variance
$\psi_\Gamma(\xi)$. Use has shown that satisfying results may be achieved
with, e.g., $\kappa = 50$. Solving the maximization requirement
above results in

$$\hat{\Gamma}_a = \frac{\kappa\,\xi\,\tilde{\Gamma}_a + (\xi+1)\,\psi_\Delta\,10\log(\xi+1)}{\kappa\,\xi + (\xi+1)\,\psi_\Delta}$$
Equation 19

from which the scalar estimate $\hat{\gamma}_a = 10^{\hat{\Gamma}_a/10}$ readily results.
[0043] In Equation 19 the instantaneous a posteriori SNR is
expressed as a function of the perturbed measurement value
$\tilde{\Gamma}_a$, the a priori SNR $\xi$ as well as the variance $\psi_\Delta$ (note that
$\hat{\Gamma}_a = \tilde{\Gamma}_a$ for $\psi_\Delta = 0$). In the limit of $\psi_\Delta \to \infty$ the filter
weights of the Wiener characteristics may be obtained. If the a priori
SNR $\xi$ is negligible, e.g., during speech pauses, the filter is closed
in order to avoid musical noise artifacts.
[0044] Consequently, the above-mentioned Wiener characteristics
for the post-filter 210 may be obtained for each time
k and frequency interpolation point $\Omega_\mu$ as follows:

$$H(e^{j\Omega_\mu},k) = 1 - \hat{\gamma}_a^{-1}(e^{j\Omega_\mu},k)$$
Equation 20

[0045] The output of the GSC controller 220, e.g., the DFT
coefficient $A(e^{j\Omega_\mu},k)$, is filtered by the post-filter 210 that may
be adapted by the process described above. The filtering may
yield the noise reduced DFT coefficient
$P(e^{j\Omega_\mu},k) = H(e^{j\Omega_\mu},k)\,A(e^{j\Omega_\mu},k)$. In some systems, an optional
synthesis filter bank 220 may obtain a full-band noise reduced audio
signal p(l).
[0046] In the above described system, the parameters $\xi$, $\psi_\Delta$
and $\kappa$ may be determined. For the upper limit $\kappa$ of the variance
$\psi_\Gamma(\xi)$ a value of about 50 may be used. The a priori SNR $\xi$ may
be derived by a decision directed approach. According to one
approach $\xi$ can be estimated as

$$\xi(k) = \alpha_\xi\,\frac{|P(k-1)|^2}{\psi_n} + (1-\alpha_\xi)\,F\!\left[\frac{|A(k)|^2}{\psi_n} - 1\right] \quad\text{with}\quad F[x] = \begin{cases} x, & \text{if } x > 0 \\ 0, & \text{else} \end{cases}$$
Equation 21

and $|P(k-1)|^2$ denoting the squared magnitude of the DFT coefficient
at the output of the post-filter 210 at time k-1. The real factor
$\alpha_\xi$ may be a smoothing factor of almost 1, e.g., 0.98.
[0047] In some systems, the estimate for the variance of the
perturbation $\psi_n$ is not determined by means of temporal
smoothing in speech pauses. Rather, spatial information on
the direction of perturbation is used by recursively
determining $\psi_n$ as described in Equation 22.

[Equation 22 is not legible in this copy.]

with the smoothing factor $\alpha_n$ that might be chosen from
between about 0.6 and about 0.8. $\psi_\Delta$ may be recursively
determined during speech pauses (e.g., $\psi_s \approx 0$) according to
Equation 23.

$$\psi_\Delta(k) = \alpha_\Delta(k)\,\psi_\Delta(k-1) + (1-\alpha_\Delta(k))\,(\tilde{\Gamma}_a(k))^2 \quad\text{with}\quad \alpha_\Delta(k) = \begin{cases} \alpha_\Delta, & \text{if } \psi_s = 0 \\ 0, & \text{else} \end{cases}$$
Equation 23

with the smoothing factor $\alpha_\Delta$ that might be chosen from
between 0.6 and 0.8.
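The MAP step and the decision directed a priori SNR can be sketched for one sub-band as follows. The sketch assumes the closed-form MAP solution for Gaussian likelihood and prior, i.e. a variance-weighted mix of the measured log ratio and the prior mean $10\log(\xi+1)$, with prior variance $\kappa\xi/(1+\xi)$. The function names, the fallback when both variances vanish, and the clipping of negative gains are assumptions added for the example.

```python
import math


def decision_directed_snr(prev_output_power, a_power, noise_var, alpha=0.98):
    """xi(k) = alpha*|P(k-1)|^2/psi_n + (1-alpha)*F[|A(k)|^2/psi_n - 1]."""
    instantaneous = a_power / noise_var - 1.0
    return alpha * prev_output_power / noise_var + (1.0 - alpha) * max(instantaneous, 0.0)


def map_log_snr(measured_db, xi, psi_delta, kappa=50.0):
    """MAP estimate of the log a posteriori SNR from the perturbed measurement."""
    prior_mean_db = 10.0 * math.log10(xi + 1.0)
    numerator = kappa * xi * measured_db + (xi + 1.0) * psi_delta * prior_mean_db
    denominator = kappa * xi + (xi + 1.0) * psi_delta
    if denominator == 0.0:
        return 0.0   # assumption: no prior SNR and no error variance, close the filter
    return numerator / denominator


def post_filter_gain(estimate_db):
    """H = 1 - 1/gamma with gamma = 10^(estimate/10); negative gains clipped (assumption)."""
    gamma_hat = 10.0 ** (estimate_db / 10.0)
    return max(1.0 - 1.0 / gamma_hat, 0.0)


# Limiting cases described in the text, with hypothetical numbers.
trusted_measurement = map_log_snr(6.0, xi=3.0, psi_delta=0.0)
pause_gain = post_filter_gain(map_log_snr(6.0, xi=0.0, psi_delta=5.0))
wiener_gain = post_filter_gain(map_log_snr(6.0, xi=3.0, psi_delta=1e12))
```

With zero error variance the measurement is returned unchanged, with a negligible a priori SNR the gain is zero (the filter closes), and with a very large error variance the gain approaches the Wiener weight $\xi/(\xi+1)$, here 0.75.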
[0048] Some processes may automatically remove noise
(or undesired signals) to improve speech and/or audio quality.
In the automated process of FIG. 3, aural or speech signals are
received at 302. The sound waves (e.g., speech signals) may
be converted into analog signals or digital data. Through a
programmable set of fixed weights and/or time delays the
received inputs are processed to yield a combined signal at
304. The noise contributions of each of the detected signals
are estimated through a dynamic process at 306. A signal
processing technique or dynamic blocking technique may
process the detected inputs to yield (spectral) power densities.
A short-term measure or estimate (e.g., an average short-time
power density) of the noise contribution of the detected
inputs may be obtained by processing the (spectral) power
densities of some or each of the detected inputs. Based on the
short-term measure or estimate, the noise contribution (or
spectral power densities of the noise contribution) of the
combined signal may be estimated at 308 to enhance the
combined signal when further processed. The filter coefficients
(e.g., scalar coefficients) may be adapted from the
estimate of the noise contribution of the combined signal at
310. At 312 an optional synthesis filter may reconstruct the
signal to yield robust speech.
[0049] In another process shown in FIG. 4, an input array
(e.g., a microphone array comprising at least two microphones)
may detect multiple communication signals at 402. A
signal processing method may selectively combine (e.g.,
beamform) the multiple communication signals to a fixed
beamforming pattern at 404. An adaptive filtering process
may process the communication signals to obtain the power
densities of noise contributions of each of the communication
signals at 406. The signal processing method may process the
power densities of the noise contributions of each of the
communication signals to render an average short-time power
density. The signal processing method may estimate the
power density of a noise contribution of the combined signal
(or beamformed signal) based on the average short-time
power density at 408. A post-filtering process at 410 may filter
the combined signal (or beamformed signal) based on the
estimated power density of the noise contribution of the
beamformed signal to improve the rejection of unwanted or
undesired signals.
[0050] The signal processing method may further comprise
a signal processing technique or a filtering array method that
separates the communication signals into several components,
each one comprising or containing a frequency sub-band
of the original communication signals as shown at 502
of FIG. 5. The method or filter may isolate the different
frequency components of the communication signals. In FIG.
6, the post-filtered communication signals are processed to
synthesize speech at 602. In some processes, speech is
synthesized at 702 by methods that may not separate communication
signals into several components as shown in FIG. 7.
[0051] The methods and descriptions of FIGS. 1-7 may be
encoded in a signal bearing storage medium, a computer
readable medium or a computer readable storage medium
such as a memory that may comprise unitary or separate
logic, programmed within a device such as one or more
integrated circuits, or processed by a controller or a computer. If
the methods are performed by software, the software or logic
may reside in a memory resident to or interfaced to (or a
system that interfaces or is integrated within) one or more
processors or controllers, a wireless communication interface,
a wireless system, a communication controller, an
entertainment and/or comfort controller of a structure that
transports people or things such as a vehicle (e.g., FIG. 8) or
non-volatile or volatile memory remote from or resident to
a device. The memory may retain an ordered listing of executable
instructions for implementing logical functions. A logical
function may be implemented through digital circuitry,
through source code, through analog circuitry, or through an
analog source such as through analog electrical or audio
signals. The software may be embodied in any computer-readable
medium or signal-bearing medium, for use by, or in
connection with an instruction executable system or apparatus
resident to a vehicle (e.g., FIG. 8) or a hands-free or
wireless communication system (e.g., FIG. 9). Alternatively,
the software may be embodied in media players (including
portable media players) and/or recorders. Such a system may
include a computer-based system, a processor-containing
system that includes an input and output interface that may
communicate with an automotive or wireless communication
bus through any hardwired or wireless automotive communication
protocol, combinations, or other hardwired or wireless
communication protocols to a local or remote destination,
server, or cluster.
[0052] A computer-readable medium, machine-readable
medium, propagated-signal medium, and/or signal-bearing
medium may comprise any medium that contains, stores,
communicates, propagates, or transports software for use by
or in connection with an instruction executable system,
apparatus, or device. The machine-readable medium may selectively
be, but is not limited to, an electronic, magnetic, optical,
electromagnetic, infrared, or semiconductor system, apparatus,
device, or propagation medium. A non-exhaustive list of
examples of a machine-readable medium would include: an
electrical or tangible connection having one or more links, a
portable magnetic or optical disk, a volatile memory such as
a Random Access Memory "RAM" (electronic), a Read-Only
Memory "ROM," an Erasable Programmable Read-Only
Memory (EPROM or Flash memory), or an optical fiber. A
machine-readable medium may also include a tangible
medium upon which software is printed, as the software may
be electronically stored as an image or in another format (e.g.,
through an optical scan), then compiled by a controller, and/or
interpreted or otherwise processed. The processed medium
may then be stored in a local or remote computer and/or a
machine memory.
[0053] While various embodiments of the invention have
been described, it will be apparent to those of ordinary skill in
the art that many more embodiments and implementations are
possible within the scope of the invention. Accordingly, the
invention is not to be restricted except in light of the attached
claims and their equivalents.
`
`What is claimed is:
`1. Method for audio signal processing, comprising
`detecting an audio signal from a microphone array to
`obtain communication signals;
processing the communication signals by a beamformer to
obtain a beamformed signal;
`processing the communication signals through a blocking
`matrix to obtain