`
`Bitzer and Simmer
`
If we now take a closer look at the WNG we can see why this design is not
suitable in real-world applications. Whereas the DSB suppresses uncorrelated
noise equally at all frequencies, the SDB boosts uncorrelated noise at lower
frequencies.
In order to give a deeper insight into how supergain works, we will compute
the coefficients for an array of only two microphones. The distance is
again 5 cm and endfire steering is used.
`
[Figure 2.4: two panels plotted over frequency Ω f_s/(2π) in Hz from 0 to
4000, each with one curve for Sensor 1 and one for Sensor 2.]

Fig. 2.4. Coefficients of a two-channel SDB. Left: magnitude; right: phase
(l = 5 cm, N = 2, endfire steering direction)
`
In Fig. 2.4 the squared magnitude and the phase of the two coefficient
vectors are shown. First of all, the coefficients are complex conjugates of
each other. Secondly, the filters force the phase between the noise
components at each sensor to be π. Therefore the correlated part of the
noise will be compensated. However, the desired signal is also correlated,
and therefore it is reduced as well. To fulfill the constraint of an
undisturbed desired signal, the coefficients have to boost the input signal
to compensate for this behavior, which can be seen on the left side of
Fig. 2.4. Therefore, uncorrelated noise will be amplified.
At higher frequencies the correlation between the noise components vanishes
and the beamformer degrades to the DSB. The magnitude of the coefficients
reaches 1/2.
In order to overcome the problem of self-noise amplification in
superdirective designs, Gilbert and Morgan have proposed a method for
solving (2.24) under a WNG constraint [15]. The method adds a small scalar
µ to the main diagonal of the normalized PSD or coherence matrix:

$$ \mathbf{w}_c = \frac{(\boldsymbol{\Gamma}_{VV} + \mu \mathbf{I})^{-1} \mathbf{d}}{\mathbf{d}^H (\boldsymbol{\Gamma}_{VV} + \mu \mathbf{I})^{-1} \mathbf{d}}. \tag{2.33} $$
`
We prefer a mathematically equivalent form, which preserves the
interpretation as a coherence matrix with elements smaller than one. Instead of adding
`
`Amazon Ex. 1010
`IPR Petition - US RE47,049
`
`Amazon Ex. 1010, Page 44 of 129
`
`
`
2 Superdirective Microphone Arrays
`
`29
`
the scalar to the main diagonal we divide each non-diagonal element by 1 + µ.
Therefore, µ can be interpreted as the ratio of the sensor noise power σ² to
the ambient noise power Φ_VV. For the diffuse noise field the non-diagonal
elements are given by

$$ \Gamma_{V_n V_m} = \frac{\operatorname{sinc}\left(\Omega f_s l_{nm} / c\right)}{1 + \sigma^2 / \Phi_{VV}}. \tag{2.34} $$
`
The factor µ can vary from zero to infinity, which results in the
unconstrained SDB or the DSB, respectively. The WNG changes as a monotonic
function between the two limits [15]. Typical values for µ are in the range
between -10 dB and -30 dB. Unfortunately, there is no simple relation
between µ and the resulting WNG. By using a frequency-variant µ, the WNG can
be restricted at all frequencies, but not through direct computation.
There are two different iterative design schemes. The first one was
published by Doerbecker [9]. It is a straightforward implementation of a
trial-and-error strategy. Another iterative design method uses the scaled
projection algorithm developed by Cox et al. for adaptive arrays [6].
Instead of the estimated PSD-matrix the theoretically defined coherence or
PSD-matrix is inserted in the scaled projection algorithm. This solution was
presented in [17]. Both algorithms result in similar coefficients and can be
implemented easily.
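
The effect of the loading factor µ on the WNG can be explored numerically. The following is a minimal sketch, not the authors' implementation: it assumes a uniform linear array, a speed of sound of 343 m/s, a far-field endfire steering vector with the sign convention exp(-jωτ), and the diagonal-loading form (2.33). All function names are hypothetical.

```python
import numpy as np

def diffuse_coherence(freq_hz, positions_m, c=343.0):
    """Coherence matrix of a spherically isotropic (diffuse) noise field."""
    dist = np.abs(positions_m[:, None] - positions_m[None, :])
    # np.sinc(x) = sin(pi x) / (pi x), so 2 f l / c yields sin(w l / c) / (w l / c).
    return np.sinc(2.0 * freq_hz * dist / c)

def endfire_steering(freq_hz, positions_m, c=343.0):
    """Far-field propagation vector d for a source on the array axis (assumed sign)."""
    return np.exp(-2j * np.pi * freq_hz * positions_m / c)

def constrained_sdb(gamma, d, mu):
    """Weights of the form (2.33): the coherence matrix is loaded with mu * I."""
    num = np.linalg.solve(gamma + mu * np.eye(len(d)), d)
    return num / (d.conj() @ num)

def white_noise_gain_db(w, d):
    """WNG = |w^H d|^2 / (w^H w), expressed in dB."""
    return 10.0 * np.log10(np.abs(w.conj() @ d) ** 2 / np.real(w.conj() @ w))

if __name__ == "__main__":
    positions = np.arange(5) * 0.05          # N = 5, l = 5 cm
    gamma = diffuse_coherence(500.0, positions)
    d = endfire_steering(500.0, positions)
    for mu_db in (-40, -30, -20, -10):
        w = constrained_sdb(gamma, d, 10.0 ** (mu_db / 10.0))
        print(mu_db, "dB ->", round(white_noise_gain_db(w, d), 2), "dB WNG")
```

Sweeping µ in such a loop, or bisecting on µ per frequency until a target WNG is met, is one simple way to realize the trial-and-error strategy mentioned above; for very large µ the weights approach d/N, that is, the DSB.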
`
[Figure 2.5: two panels plotted over frequency Ω f_s/(2π) in Hz from 0 to
4000, with curves for µ = -20 dB, µ = -30 dB, µ = -40 dB and a variable µ.
Left panel: DI in dB; right panel: WNG in dB.]

Fig. 2.5. Left: Directivity index (DI) for different constrained designs.
Right: White noise gain (WNG) for different constrained designs. (l = 5 cm,
N = 5, endfire steering direction)
`
Figure 2.5 depicts the effects for three fixed values of µ and one variable
µ as constraining parameters. For the variable µ, the WNG constraint was set
to -6 dB. The constrained design facilitates a good compromise between DI
and WNG. A careful design can optimize such arrays for a wide range of
applications.
`
`2.3.3 Design for Cylindrical Isotropic Noise
`
In some applications a spherical isotropic noise field is not the best choice
or the best approximation of a given noise-field. Another well-defined
noise-field can be used if we reduce the three dimensions to two dimensions.
We get a noise-field which is defined by an infinite number of noise sources
on a circle with an infinite radius. This kind of noise can arise if a lot of
people speak in large rooms where the ceiling and the floor are damped well,
or in the free-field (cocktail-party noise; see footnote 4). The coherence
between two sensors is given by [7]

$$ \Gamma_{V_n V_m} = J_0\left(\Omega f_s l_{nm} / c\right), \tag{2.35} $$

where J0(·) is the zeroth-order Bessel function of the first kind. This leads
to the solution of [8] as an improved design for speech enhancement for a
hearing-aid application. In order to constrain the coefficients, a technique
similar to that in (2.34) has to be carried out.
In comparison to the design for a diffuse noise-field the differences are not
large, but at lower frequencies a better suppression of noise sources behind
the look direction can be observed. Elko [11] has shown that the directivity
factor is lower and its limit is 2N - 1, in contrast to N², in the
unconstrained case (µ = 0). A design example will be given in the next
section.
`
`2.3.4 Design for an Optimal Front-to-Back Ratio
`
A last data-independent design tries to optimize the front-to-back ratio. In
many applications the look direction of the desired signal cannot be
predetermined, but in most cases the desired signal is in front of the array
and all disturbances are at the rear, e.g. when recording an orchestra or in
videoconferences.
Our suggestion for a different design strategy is not to use an isotropic
noise field but to restrict the assumed infinite number of noise sources to
the back half of a circle or a sphere.
The resulting noise-field between two sensors separated by the distance l
can be described by an integration over an infinite number of uncorrelated
noise sources. The resulting function in the two-dimensional case is:

$$ \Gamma(e^{j\Omega}, \theta_0) = \frac{1}{\pi} \int_{\theta_0 + \pi/2}^{\theta_0 + 3\pi/2} \exp\left(j \Omega f_s c^{-1} l \cos(\theta)\right) d\theta. \tag{2.36} $$
`
4 The origin of this cylindrical isotropic noise-field is the sonar
application in shallow water.
`
`
Using numerical integration methods, inserting the resulting complex values
in the coherence matrix, and solving (2.26) results in a new design which
suppresses noise sources from the rear very well.
`
[Figure 2.6: two beampattern panels with frequency in Hz (0 to 4000) on one
axis and the angle θ/π (0 to 2) on the other; the gain scale runs from
-30 dB to 10 dB.]

Fig. 2.6. Left: beampattern of a constrained superdirective beamformer.
Right: beampattern of a constrained beamformer, designed with (2.36).
(l = 5 cm, N = 5, µ = 0.01, endfire steering direction)
`
Figure 2.6 shows beampatterns of two constrained beamformers (µ = 0.01). The
left side is computed with optimized coefficients for a diffuse noise-field
and the right side uses coefficients designed with the help of (2.36).
At lower frequencies the constraining parameter is dominant and therefore
neither design performs well. From 300 Hz to 2800 Hz the new design
suppresses all signals coming from the rear at the cost of a wider main lobe;
this is sometimes an advantage, for example if the source is not exactly in
endfire position.
At higher frequencies, especially if spatial aliasing occurs, the new design
boosts signals coming from directions near the look direction, which can
cause some unnatural coloring of the signal and the remaining noise.
Therefore special care has to be taken when choosing the parameters of the
new design scheme.
In order to show the advantages of the new schemes, Fig. 2.7 depicts the
DI and the FBR measure for the three different designs. At lower frequencies
the small advantage of the cylindrical optimal design over the spherical
design for the FBR can be seen, but the differences are very small over the
whole frequency range. On the other hand, the behavior of the new design is
completely different. Its DI is much lower, but its FBR is very high,
especially in the mid-frequency range.
[Figure 2.7: two panels plotted over frequency Ω f_s/(2π) in Hz from 0 to
4000, with curves labeled sph., cyl. and back. Left panel: DI in dB; right
panel: FBR in dB.]

Fig. 2.7. Left: Directivity index (DI) for three optimal designs. Right:
Front-to-back ratio (FBR) for three optimal designs. (l = 5 cm, N = 5,
µ = 0.01, endfire steering direction)

Interestingly, we can transform between the optimal design for cylindrical
isotropic noise and the new design by introducing a new variable δ which
adjusts the limits of the integral, i.e.,

$$ \Gamma(e^{j\Omega}, \theta_0, \delta) = \frac{1}{2(\pi - \delta)} \int_{\theta_0 + \delta}^{\theta_0 - \delta + 2\pi} \exp\left(j \Omega f_s c^{-1} l \cos(\theta)\right) d\theta, \qquad 0 \le \delta \le \pi. \tag{2.37} $$

Setting δ = 0 corresponds to the isotropic noise case, and δ = π/2 results in
(2.36).
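
The integral (2.37) has no convenient closed form for general δ, so it is usually evaluated numerically. The sketch below is one way to do that with a midpoint rule; the function name, the number of integration cells and the speed of sound (343 m/s) are assumptions made for illustration. The value δ = π is excluded because the normalization factor would divide by zero.

```python
import numpy as np

def sector_coherence(freq_hz, dist_m, theta0=0.0, delta=0.0, c=343.0, n=4096):
    """Numerically evaluate (2.37) for one sensor pair.

    delta = 0 integrates over the full circle (cylindrical isotropic noise);
    delta = pi/2 keeps only the half circle behind the look direction, (2.36).
    """
    if not 0.0 <= delta < np.pi:
        raise ValueError("delta must satisfy 0 <= delta < pi")
    lo = theta0 + delta
    hi = theta0 - delta + 2.0 * np.pi
    step = (hi - lo) / n
    theta = lo + (np.arange(n) + 0.5) * step      # midpoints of n equal cells
    integrand = np.exp(2j * np.pi * freq_hz * dist_m * np.cos(theta) / c)
    return integrand.sum() * step / (2.0 * (np.pi - delta))

if __name__ == "__main__":
    for delta in (0.0, np.pi / 4, np.pi / 2):
        print(delta, sector_coherence(1000.0, 0.05, delta=delta))
```

For δ = 0 the result is real and reproduces the Bessel-function coherence of the cylindrical isotropic field. For δ > 0 the coherence becomes complex, which is why the resulting values are inserted into the coherence matrix as complex numbers before the design equation is solved.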
`
`2.3.5 Design for Measured Noise Fields
`
So far only data-independent designs have been considered. If a priori
knowledge is available, however, it should be used to improve the
performance. For example, this information could be a prescribed direction
(θ = angle) of an incoming noise source. Assuming the noise source is in the
far field of the microphone array, the complex coherence function between
two sensors is given by

$$ \operatorname{Re}\{\Gamma_{X_n X_m}(\omega)\} = \cos\left(\frac{\Omega f_s \cos(\theta)\, l_{nm}}{c}\right), \tag{2.38} $$

$$ \operatorname{Im}\{\Gamma_{X_n X_m}(\omega)\} = -\sin\left(\frac{\Omega f_s \cos(\theta)\, l_{nm}}{c}\right). \tag{2.39} $$

Inserting the complete coherence matrix in (2.26) forms a null in that
direction over the whole frequency range. In order to restrict the WNG a
constrained design is necessary.
Furthermore, if we assume stationarity we can measure the actual noise-field
and solve the design equation, which results in the MVDR solution.
Adaptive algorithms like the constrained projection by Cox [6], or the
original algorithm by Frost [13], will converge to exactly the same solution
under the assumption of stationary noise and an infinitely small step-size.
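
The null-forming effect of a single directional noise source can be checked with a few lines of code. This is a sketch under stated assumptions: a uniform linear array, signed sensor distances l_nm = x_n - x_m, a speed of sound of 343 m/s, and diagonal loading by µ as the constrained design. The coherence matrix of one plane wave has rank one, so some loading is needed before it can be inverted. All names are hypothetical.

```python
import numpy as np

def plane_wave_vector(freq_hz, positions_m, theta, c=343.0):
    """Relative phases of a far-field source at angle theta (0 = endfire)."""
    return np.exp(-2j * np.pi * freq_hz * positions_m * np.cos(theta) / c)

def directional_coherence(freq_hz, positions_m, theta, c=343.0):
    """Coherence matrix of one far-field noise source, cf. (2.38) and (2.39)."""
    a = plane_wave_vector(freq_hz, positions_m, theta, c)
    # Element (n, m) equals exp(-j w cos(theta) (x_n - x_m) / c).
    return np.outer(a, a.conj())

def null_steering_weights(gamma, d, mu):
    """Distortionless weights with diagonal loading mu to keep the WNG bounded."""
    num = np.linalg.solve(gamma + mu * np.eye(len(d)), d)
    return num / (d.conj() @ num)

if __name__ == "__main__":
    positions = np.arange(5) * 0.05
    d = plane_wave_vector(2000.0, positions, 0.0)            # look direction
    a = plane_wave_vector(2000.0, positions, np.pi / 2)      # noise direction
    gamma = directional_coherence(2000.0, positions, np.pi / 2)
    w = null_steering_weights(gamma, d, 1e-3)
    print("look direction gain :", abs(w.conj() @ d))
    print("noise direction gain:", abs(w.conj() @ a))
```

Smaller values of µ deepen the null at the cost of a lower WNG, which mirrors the trade-off described for the constrained superdirective design.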
`
`2.4 Extensions and Details
`
After describing the main form of the MVDR beamformer and typical
data-independent designs, we will compare them to their analogue
counterparts, the gradient microphones. Furthermore, an alternative
implementation structure will be given which can reduce the computational
complexity and open superdirective designs for future extensions.
`
`2.4.1 Alternative Form
`
Assuming a time-aligned input signal, the optimal weights are defined
differently since the look-direction vector d is replaced by the
column-vector

$$ \mathbf{1} = [\underbrace{1, 1, \ldots, 1}_{N}]^T $$

containing only ones, and the PSD-matrix or the coherence matrix contains
the statistical information after time alignment (see Fig. 2.8). This gives

$$ \mathbf{w} = \frac{\boldsymbol{\Gamma}_{VV}^{-1} \mathbf{1}}{\mathbf{1}^T \boldsymbol{\Gamma}_{VV}^{-1} \mathbf{1}}. $$

[Figure 2.8: block diagram. The sensor signals x_0(k) = s(k - τ_0) + v_0(k)
through x_{N-1}(k) = s(k - τ_{N-1}) + v_{N-1}(k) pass through a time delay
estimation and/or compensation stage. Its outputs, s(k) plus the
delay-compensated noise of each channel, form the new coherence or PSD
measurement point.]

Fig. 2.8. Signal model after time delay compensation
`
This solution of the constrained minimization problem can be decomposed
into two orthogonal parts, following the ideas of Griffiths and Jim [16]. One
part represents the constraints only and the other part represents the
unconstrained coefficients to minimize the output power of the noise:

$$ \mathbf{w} = \mathbf{w}_c - \mathbf{B}^H \mathbf{H}. \tag{2.40} $$
`
`
Fig. 2.9. Schematic description of the decomposition of the optimal weight
vector into two orthogonal parts
`
The decomposed structure is depicted in Fig. 2.9. The multi-channel
time-aligned input signal X is multiplied by w_c to fulfill the constraints.
Furthermore, the input signal is projected onto the noise-only subspace (see
footnote 5) by a blocking matrix B. The resulting vector X_B is multiplied by
the optimal vector H and then subtracted from the output of the upper part
of the structure to get the noise-reduced output signal Z. Several authors
have shown the equivalence between this structure and the standard
beamformer [16], [3], [12], if

$$ \mathbf{w}_c = \frac{1}{N} \mathbf{1}, $$

which represents a delay-and-sum beamformer. Additionally, B has to fulfill
the following properties:

• The size of the matrix is (N - 1) x N.
• The sum of all values in one row is zero.
• The matrix has to be of rank N - 1.

An example for N = 4 is given by

$$ \mathbf{B} = \begin{bmatrix} 1 & 1 & -1 & -1 \\ 1 & -1 & -1 & 1 \\ 1 & -1 & 1 & -1 \end{bmatrix}. \tag{2.41} $$

Another well-known example is the original Griffiths-Jim matrix which
subtracts two adjacent channels only:

$$ \mathbf{B} = \begin{bmatrix} 1 & -1 & 0 & \cdots & 0 \\ 0 & 1 & -1 & \ddots & \vdots \\ \vdots & \ddots & \ddots & \ddots & 0 \\ 0 & \cdots & 0 & 1 & -1 \end{bmatrix}. $$
`
The last step to achieve a solution equivalent to (2.25) is the computation
of the optimal filter H. A closer look at Fig. 2.9 shows that Y_1, X_B and Z
describe exactly the problem of a multiple-input noise canceler, described by
Widrow and Stearns [24]. Therefore, this structure is called the generalized
sidelobe canceler (GSC) if an adaptive implementation is used. The
non-adaptive multi-channel Wiener solution of this problem can be found in
[21]:

$$ \mathbf{H} = \boldsymbol{\Phi}_{X_B X_B}^{-1} \boldsymbol{\phi}_{X_B Y_1}, \tag{2.42} $$

where Φ_{X_B X_B} denotes the PSD-matrix of all signals after the matrix B,
and φ_{X_B Y_1} is the cross-PSD vector between the fixed beamformer output
and the output signals X_B. Additionally, the coefficient vector can be
computed as a function of the input PSD-matrix:

$$ \mathbf{H} = \left(\mathbf{B} \boldsymbol{\Phi}_{XX} \mathbf{B}^H\right)^{-1} \mathbf{B} \boldsymbol{\Phi}_{XX} \mathbf{w}_c. \tag{2.43} $$

If we now assume a homogeneous noise field, the PSD-matrix can be replaced
by the coherence matrix of the delay-compensated noise field to compute the
optimal coefficients:

$$ \mathbf{H} = \left(\mathbf{B} \boldsymbol{\Gamma}_{VV} \mathbf{B}^H\right)^{-1} \mathbf{B} \boldsymbol{\Gamma}_{VV} \mathbf{w}_c. \tag{2.44} $$

Therefore, all designs presented in Section 2.3 can be implemented by using
the GSC structure. However, why should we do that? First of all, the new
structure needs one filter less than the direct implementation. Using the
first blocking matrix (2.41) further reduces the number of filters [1].
Secondly, a DSB output is available which can be used for future extensions.
Thirdly, the new structure allows us to combine superdirective beamformers
with adaptive post-filters for further noise reduction [2], and the new
structure gives a deeper insight into MVDR beamforming. For example, we can
see that optimal beamforming is an averaging process combined with noise
compensation.

5 Which means that the desired signal is spatially filtered out (blocked).
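
The claimed equivalence between the GSC structure and the direct form can be verified numerically. The sketch below assumes the usual Wiener solution of the noise canceler branch, H = (B Γ B^H)^{-1} B Γ w_c, and the equivalent direct-form weights w = w_c - B^H H; both expressions and all function names are assumptions of this example rather than quotations from the chapter.

```python
import numpy as np

def gsc_weights(gamma, blocking):
    """Return (w_c, H, w_eq) for time-aligned inputs with coherence matrix gamma."""
    n = gamma.shape[0]
    w_c = np.ones(n) / n                         # fixed delay-and-sum branch
    b_gamma = blocking @ gamma
    # Wiener solution of the noise canceler behind the blocking matrix.
    h = np.linalg.solve(b_gamma @ blocking.conj().T, b_gamma @ w_c)
    w_eq = w_c - blocking.conj().T @ h           # equivalent direct-form weights
    return w_c, h, w_eq

def mvdr_weights(gamma):
    """Direct form for time-aligned inputs: the steering vector is all ones."""
    ones = np.ones(gamma.shape[0])
    num = np.linalg.solve(gamma, ones)
    return num / (ones @ num)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    a = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
    gamma = a @ a.conj().T + np.eye(4)           # arbitrary Hermitian test matrix
    walsh = np.array([[1, 1, -1, -1], [1, -1, -1, 1], [1, -1, 1, -1]], dtype=float)
    _, h, w_eq = gsc_weights(gamma, walsh)
    print("filters in lower branch:", h.size)
    print("max deviation from direct form:", np.abs(w_eq - mvdr_weights(gamma)).max())
```

The vector H has N - 1 entries, which is the "one filter less" mentioned above, and any blocking matrix with the three listed properties leads to the same equivalent weights.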
`
2.4.2 Comparison with Gradient Microphones

Other devices with superdirectional characteristics are optimized gradient
microphones [11]. In Fig. 2.10 a typical structure of a first order gradient
microphone and its technical equivalent (composed of two omni-directional
microphones) is shown.
The acoustic delay between the two open parts of the microphone can be
realized by placing the diaphragm not exactly in the middle, or by using a
material with a slower speed of sound.
The output of such systems is given by

$$ E(\omega, \theta) = P_0 \left(1 - \exp\left(-j\omega\left[T + c^{-1} l \cos(\theta)\right]\right)\right), \tag{2.45} $$

where T is the acoustic delay and P_0 denotes the amplitude of the source
signal. If we now assume a small spacing with respect to the wavelength, an
approximate solution can be derived:

$$ E(\omega, \theta) \approx P_0\, \omega \left(T + c^{-1} l \cos(\theta)\right). \tag{2.46} $$
`
`
[Figure 2.10: a first order gradient microphone with an acoustic delay, and
its equivalent built from two omni-directional microphones, a delay and a
subtraction.]

Fig. 2.10. Schematic description of a first order gradient microphone
`
A proper choice of T leads to the different superdirective designs, called
cardioid, supercardioid and hypercardioid. For example, the beampattern for a
hypercardioid first order gradient microphone shows its zeros at ≈ ±109°.
This type of microphone is designed to optimize the directivity factor and
therefore, it represents the analogue equivalent of a two-sensor
superdirective array. For a deeper insight and a complete review of higher
order gradient microphones see [11].
At lower frequencies the two systems react more or less equally. The
advantages of the analogue system are the smaller size of the device, and
that no analogue-to-digital conversion is necessary. The advantages of the
digital array technique are its flexibility, the easy scaling for many
microphones, and the possible extensions with post-filters or other adaptive
techniques.
At higher frequencies, if the assumption of small spacing is not valid
anymore, the differences become visible. Through careful manufacturing these
frequencies are much higher than the covered bandwidth. However, at some
high frequencies the analogue microphone cancels the desired signal
completely. On the other hand, the array system reacts like a DSB at these
frequencies, and no cancellation occurs.
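
Both observations, the hypercardioid null near ±109° and the complete cancellation of the desired signal at certain high frequencies, follow directly from (2.45) and (2.46). The sketch below illustrates them under assumptions that are not part of the text: unit source amplitude, a speed of sound of 343 m/s, a spacing of 5 cm, and the standard first-order hypercardioid delay T = l/(3c). The function names are hypothetical.

```python
import numpy as np

def gradient_mic_response(freq_hz, theta, delay_s, spacing_m, c=343.0):
    """Exact first-order response (2.45) for unit source amplitude."""
    omega = 2.0 * np.pi * freq_hz
    return 1.0 - np.exp(-1j * omega * (delay_s + spacing_m * np.cos(theta) / c))

def null_angle_deg(delay_s, spacing_m, c=343.0):
    """Null of the small-spacing form (2.46): T + l cos(theta) / c = 0."""
    return np.degrees(np.arccos(-delay_s * c / spacing_m))

def first_cancellation_freq(delay_s, spacing_m, c=343.0):
    """Lowest frequency at which the endfire signal (theta = 0) is cancelled."""
    return 1.0 / (delay_s + spacing_m / c)

if __name__ == "__main__":
    c, spacing = 343.0, 0.05
    hyper_delay = spacing / (3.0 * c)            # assumed hypercardioid setting
    print("null angle in degrees:", null_angle_deg(hyper_delay, spacing))
    f0 = first_cancellation_freq(hyper_delay, spacing)
    print("first cancellation frequency in Hz:", f0)
    print("|E| at that frequency:", abs(gradient_mic_response(f0, 0.0, hyper_delay, spacing)))
```

The cancellation occurs where ω(T + l/c) reaches a multiple of 2π, so a smaller device pushes it to higher frequencies, in line with the remark about careful manufacturing.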
`
`2.5 Conclusion
`
Designing a so-called superdirective array or an optimal array for
theoretically well-defined noise fields can be reduced to solving a single
equation. Even nearfield assumptions and measured noise fields can be easily
included. We have shown that the spatial characteristic, described by the
coherence function, plays a key role in designing arrays. Most of the
evaluation tools like the beampattern or the directivity index are directly
connected to the coherence function. Beamformer designs with optimized
directivity or higher front-to-back ratio also use the coherence.
One of the new aspects included in this chapter was a new noise model
to improve the front-to-back ratio. Furthermore, we emphasized the close
relationship between superdirective arrays and adaptive beamformers and
their well-known implementation as a generalized sidelobe canceler.
`
`References
`
1. J. Bitzer, K. U. Simmer, and K. D. Kammeyer, "An alternative implementation
of the superdirective beamformer", in Proc. IEEE Workshop Applicat. Signal
Processing to Audio Acoust., pp. 7-10, New Paltz, NY, USA, Oct 1999.
2. J. Bitzer, K. U. Simmer, and K. D. Kammeyer, "Multi-microphone noise
reduction by post-filter and superdirective beamformer", in Proc. Int.
Workshop Acoust. Echo and Noise Control, pp. 100-103, Pocono Manor, USA,
Sep 1999.
3. K. M. Buckley, "Broad-band beamforming and the generalized sidelobe
canceller", IEEE Trans. Acoust. Speech Signal Processing, vol. 34,
pp. 1322-1323, Oct 1986.
4. G. C. Carter, Coherence and Time Delay Estimation, IEEE Press, 1993.
5. H. Cox, R. M. Zeskind, and T. Kooij, "Practical supergain", IEEE Trans.
Acoust. Speech Signal Processing, vol. 34, pp. 393-398, Jun 1986.
6. H. Cox, R. M. Zeskind, and M. M. Owen, "Robust adaptive beamforming",
IEEE Trans. Acoust. Speech Signal Processing, vol. 35, pp. 1365-1375,
Oct 1987.
7. B. F. Cron and C. H. Sherman, "Spatial-correlation functions for various
noise models", J. Acoust. Soc. Amer., vol. 34, pp. 1732-1736, Nov 1962.
8. M. Doerbecker, "Speech enhancement using small microphone arrays with
optimized directivity", in Proc. Int. Workshop Acoust. Echo and Noise
Control, pp. 100-103, London, UK, Sep 1997.
9. M. Doerbecker, Mehrkanalige Signalverarbeitung zur Verbesserung akustisch
gestörter Sprachsignale am Beispiel elektronischer Hörhilfen. PhD thesis,
Dept. of Telecommunications, University of TH Aachen, Verlag der Augustinus
Buchhandlung, Aachen, Germany, Aug 1998.
10. C. L. Dolph, "A current distribution for broadside arrays which optimizes
the relationship between beamwidth and sidelobe level", Proc. IRE,
pp. 335-348, Jun 1946.
11. G. W. Elko, "Superdirectional microphone arrays", in Acoustic Signal
Processing for Telecommunication, S. L. Gay and J. Benesty, eds, ch. 10,
pp. 181-235, Kluwer Academic Press, 2000.
12. M. H. Er and A. Cantoni, "Transformation of linearly constrained broadband
processors to unconstrained partitioned form", IEE Proc. Pt. H, vol. 133,
pp. 209-212, June 1986.
13. O. L. Frost, "An algorithm for linearly constrained adaptive array
processing", Proc. IEEE, vol. 60, pp. 926-935, Aug 1972.
14. J. G. Ryan and R. A. Goubran, "Optimal nearfield response for microphone
arrays", in Proc. IEEE Workshop Applicat. Signal Processing to Audio Acoust.,
New Paltz, NY, USA, Oct 1997.
15. E. N. Gilbert and S. P. Morgan, "Optimum design of directive antenna
arrays subject to random variations", Bell Syst. Tech. J., pp. 637-663,
May 1955.
16. L. J. Griffiths and C. W. Jim, "An alternative approach to linearly
constrained adaptive beamforming", IEEE Trans. Antennas Propagat., vol. 30,
pp. 27-34, 1982.
17. J. M. Kates and M. R. Weiss, "A comparison of hearing-aid array-processing
techniques", J. Acoust. Soc. Amer., vol. 99, pp. 3138-3148, May 1996.
18. R. A. Kennedy, T. Abhayapala, D. B. Ward, and R. C. Williamson, "Nearfield
broadband frequency invariant beamforming", in Proc. IEEE Int. Conf. Acoust.
Speech Signal Processing (ICASSP-96), pp. 905-908, April 1996.
19. R. N. Marshall and W. R. Harry, "A new microphone providing uniform
directivity over an extended frequency range", J. Acoust. Soc. Amer.,
vol. 12, pp. 481-497, 1941.
20. R. A. Monzingo and T. W. Miller, Introduction to Adaptive Arrays, John
Wiley and Sons, New York, 1980.
21. S. Nordholm, I. Claesson, and P. Eriksson, "The broad-band Wiener solution
for Griffiths-Jim beamformers", IEEE Trans. Signal Processing, vol. 40,
pp. 474-478, Feb 1992.
22. W. Taeger, "Near field superdirectivity (NFSD)", in Proc. IEEE Int. Conf.
Acoust. Speech Signal Processing (ICASSP-98), Seattle, WA, USA, 1998.
23. D. B. Ward and G. W. Elko, "Mixed nearfield/farfield beamforming: A new
technique for speech acquisition in a reverberant environment", in Proc. IEEE
Workshop Applicat. Signal Processing to Audio Acoust., New Paltz, NY, USA,
Oct 1997.
24. B. Widrow and S. D. Stearns, Adaptive Signal Processing, Englewood Cliffs,
1985.
`
`
`
`
`3 Post-Filtering Techniques
`
K. Uwe Simmer1, Joerg Bitzer2, and Claude Marro3

1 Aureca GmbH, Bremen, Germany
2 Houpert Digital Audio, Bremen, Germany
3 France Telecom R&D, Lannion, France

Abstract. In the context of microphone arrays, the term post-filtering denotes the
`post-processing of the array output by a single-channel noise suppression filter. A
`theoretical analysis shows that Wiener post-filtering of the output of an optimum
`distortionless beamformer provides a minimum mean squared error solution. We
`examine published methods for post-filter estimation and develop a new algorithm.
`A simulation system is presented to compare the performance of the discussed
`algorithms.
`
3.1 Introduction
`
What can be gained by additional post-filtering if the Minimum Variance
Distortionless Response (MVDR) beamformer already provides the optimum
solution for a given sound field?
Assuming that signal and noise are mutually uncorrelated, the MVDR
beamformer minimizes the noise power (or variance) subject to the constraint
of a distortionless look direction response. The solution can be shown to be
optimum in the Maximum Likelihood (ML) sense and produces the best possible
Signal to Noise Ratio (SNR) for a narrowband input [1]. However, it does not
maximize the SNR for a broadband input such as speech. Furthermore, the MVDR
beamformer does not provide a broadband Minimum Mean Squared Error (MMSE)
solution. The best possible linear filter in the MMSE sense is the
multi-channel Wiener filter. As shown below, the broadband multi-channel
MMSE solution can be factorized into an MVDR beamformer followed by a
single-channel Wiener post-filter. The multi-channel Wiener filter generally
produces a higher output SNR than the MVDR filter. Therefore, additional
post-filtering can significantly improve the SNR, which motivates this
chapter.
The squared error minimized by the single-channel Wiener filter is the
sum of residual noise and signal distortion components at the output of the
filter. As a result, linear distortion of the desired signal cannot be
avoided entirely if Wiener filtering is used. Additional Wiener filtering is
advantageous in practice, however, because signal distortions can be masked
by residual noise and a compromise between signal distortion and noise
suppression can be found. Using MVDR beamforming alone often does not
provide sufficient noise reduction due to its limited ability to reduce
diffuse noise and reverberation.
`
`
`Simmer et al.
`
The first concept of an electronic multi-microphone device to suppress
diffuse reverberation was proposed by Danilenko in 1968 [2]. His research
was motivated by Bekesy's [3] observation that human listeners are able to
suppress reverberation if sounds are presented binaurally. In Danilenko's
reverberation suppressor a main microphone signal is multiplied by a
broadband gain factor that is equal to the ratio of short-time
cross-correlation and energy measurements. Two auxiliary microphones were
used to measure correlation and energy. Danilenko already noted that such a
system would also suppress incoherent acoustic noise. However, the proposed
analog, electronic tube version of this system was not realized at that
time. Another proposal in [2] was to evaluate squared sum and differences of
two microphone signals, an idea that later was developed independently by
Gierl and others in the context of digital multi-channel spectral
subtraction algorithms [4], [5], [6], [7], [8].
According to Danilenko, his correlation-based concept was first realized
during Blauert's stay at Bell Labs. In [9], Allen et al. presented a digital,
two-microphone algorithm for dereverberation based on the short-term Fourier
Transform and the overlap-add method. In 1984, Kaneda and Tohyama extended
the application of the correlation-based post-filters to noise reduction
[10]. The first multi-microphone solution was published by Zelinski [11],
[12]. Simmer and Wasiljeff showed that Zelinski's approach does not provide
an optimum solution in the Wiener sense if the noise is spatially
uncorrelated, and developed a slightly modified version [13]. A deeper
analysis of the Zelinski and the Simmer post-filter can be found in [14],
[15].
In the last decade, several new combinations and extensions of the
post-filter approach were published. Le-Bouquin and Faucon used the coherence
function as a post-filter [16], [17] and extended their system by a coherence
subtraction method to overcome the problem of insufficient noise reduction at
low frequencies [18], [19]. The problem of time delay estimation and further
improvement of the estimation of the transfer function was independently
addressed by Kuczynski et al. [20], [21] and Drews et al. [22], [23]. Fischer
and Simmer gave a first solution by associating a post-filter and a
generalized sidelobe canceler (GSC) to improve the noise reduction in case
the noise field is dominated by coherent sources [24], [25]. Another system
for the same task was introduced by Hussain et al. [26] and was based on
switching between algorithms. The same strategy of switching between
different algorithms, where the decision is based on the coherence between
the sensors, can be found in [27], [28]. Furthermore, Mahmoudi and Drygajlo
used the wavelet-transform in combination with different post-filters to
improve the performance [29], [30]. Bitzer et al. [31], [32] proposed a
solution with a super-directive array and McCowan et al. used a near-field
super-directive approach [33].
Reading these papers we find that a theoretical basis for post-filtering
seems to be missing. Therefore, an analysis based on optimum MMSE
multi-channel filtering is presented in the following section.
`
`
`3.2 Multi-channel Wiener Filtering in Subbands
`
We use matrix notation for a compact derivation. Signal vector x and weight
vector w denote the multi-channel signal at the output of the N microphones
and the multi-channel beamformer coefficients, respectively. We assume that
the input signal vector x(k) is decomposed into M complex subband signals
x(k, i) by means of an analysis filter-bank, where k is the discrete time
index and i is the subband index. The optimum weight vector w_opt(k, i) for
transforming the input signal vector x(k, i) = s(k, i) + v(k, i) corrupted by
additive noise v(k, i) into the best possible MMSE approximation of the
desired scalar signal s(k, i) is referred to as the multi-channel Wiener
filter [34]. We assume that the relation between the desired scalar signal
s(k, i) and the signal vector s(k, i) is linear and that the N elements of
the column vectors s(k, i) and v(k, i) are random processes. In the
following, T denotes transposition, * denotes complex conjugation, H denotes
Hermitian transposition, and E[·] denotes the statistical expectation
operator.
`
3.2.1 Derivation of the Optimum Solution

The error in subband i for an arbitrary weight vector w(k, i) is defined as
the difference of the filter output

$$ y(k, i) = \mathbf{w}^H(k, i)\, \mathbf{x}(k, i) = \mathbf{w}^H(k, i) \left[\mathbf{s}(k, i) + \mathbf{v}(k, i)\right] \tag{3.1} $$

and the scalar desired signal s(k, i), that is

$$ e(k, i) = s(k, i) - \mathbf{w}^H(k, i)\, \mathbf{x}(k, i). \tag{3.2} $$

Using the definitions for the power of a complex signal

$$ \phi_{xx}(k, i) = E\left[x(k, i)\, x^*(k, i)\right], \tag{3.3} $$

the cross-correlation vector

$$ \boldsymbol{\phi}_{xy}(k, i) = E\left[\mathbf{x}(k, i)\, y^*(k, i)\right], \tag{3.4} $$

and the correlation matrix

$$ \boldsymbol{\Phi}_{xx}(k, i) = E\left[\mathbf{x}(k, i)\, \mathbf{x}^H(k, i)\right], \tag{3.5} $$

the squared error at time k may be written as

$$ \phi_{ee}(i) = E\left[\{s(i) - \mathbf{w}^H(i)\mathbf{x}(i)\}\{s^*(i) - \mathbf{x}^H(i)\mathbf{w}(i)\}\right] = \phi_{ss}(i) - \mathbf{w}^H(i)\boldsymbol{\phi}_{xs}(i) - \boldsymbol{\phi}_{xs}^H(i)\mathbf{w}(i) + \mathbf{w}^H(i)\boldsymbol{\Phi}_{xx}(i)\mathbf{w}(i), \tag{3.6} $$

where the time index k has been omitted without loss of generality. The
optimum solution minimizes the sum of all error powers φ_ee(i):

$$ \sum_{i=0}^{M-1} \left[\phi_{ss}(i) - \mathbf{w}^H(i)\boldsymbol{\phi}_{xs}(i) - \boldsymbol{\phi}_{xs}^H(i)\mathbf{w}(i) + \mathbf{w}^H(i)\boldsymbol{\Phi}_{xx}(i)\mathbf{w}(i)\right]. \tag{3.7} $$
`
`
Since the error power is necessarily real-valued and nonnegative for all
subbands, the sum can be minimized for the weight vector w(i) by minimizing
the error power φ_ee(i) for each subband. Therefore, the frequency index i
may also be omitted without loss of generality.
The power φ_ee is a quadratic function of w and therefore has a single,
global minimum. The optimum weight vector minimizing the squared error
is obtained by setting the gradient of φ_ee with respect to w equal to the
null vector [35]:

$$ \nabla_{\mathbf{w}}(\phi_{ee}) = 2 \frac{\partial \phi_{ee}}{\partial \mathbf{w}^*} = -2 \boldsymbol{\phi}_{xs} + 2 \boldsymbol{\Phi}_{xx} \mathbf{w} = \mathbf{0}. \tag{3.8} $$

The resulting expression is the subband version of the multi-channel
Wiener-Hopf equation in its most general form

$$ \boldsymbol{\Phi}_{xx} \mathbf{w}_{opt} = \boldsymbol{\phi}_{xs}, \tag{3.9} $$

where Φ_xx is the correlation matrix of the noisy input vector and φ_xs is
the cross-correlation vector between the noisy input vector and the desired
scalar signal s. Assuming Φ_xx to be nonsingular, we may solve (3.9) for the
optimum weight vector:

$$ \mathbf{w}_{opt} = \boldsymbol{\Phi}_{xx}^{-1} \boldsymbol{\phi}_{xs}. \tag{3.10} $$
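
The derivation can be checked numerically for a single subband. The sketch below builds second-order statistics for a hypothetical toy model x = a s + v, with a unit-power desired signal s, an arbitrary transfer vector a and correlated noise v, then solves the Wiener-Hopf equation and evaluates the error power (3.6). The model, its parameters and the function names are assumptions made only for this illustration.

```python
import numpy as np

def wiener_weights(phi_xx, phi_xs):
    """Solve the Wiener-Hopf equation Phi_xx w = phi_xs for w, cf. (3.10)."""
    return np.linalg.solve(phi_xx, phi_xs)

def error_power(w, phi_ss, phi_xs, phi_xx):
    """Error power (3.6) for one subband and an arbitrary weight vector w."""
    cross = w.conj() @ phi_xs                    # w^H phi_xs
    quad = w.conj() @ phi_xx @ w                 # w^H Phi_xx w
    return float(np.real(phi_ss - cross - np.conj(cross) + quad))

def toy_statistics(n=4, seed=0):
    """Hypothetical model x = a * s + v with unit-power s and correlated noise v."""
    rng = np.random.default_rng(seed)
    a = rng.standard_normal(n) + 1j * rng.standard_normal(n)
    m = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    phi_vv = m @ m.conj().T + np.eye(n)          # Hermitian, positive definite
    phi_ss = 1.0
    phi_xs = a * phi_ss                          # E[x s*]
    phi_xx = phi_ss * np.outer(a, a.conj()) + phi_vv   # E[x x^H]
    return phi_ss, phi_xs, phi_xx

if __name__ == "__main__":
    phi_ss, phi_xs, phi_xx = toy_statistics()
    w_opt = wiener_weights(phi_xx, phi_xs)
    print("minimum error power:", error_power(w_opt, phi_ss, phi_xs, phi_xx))
    print("gradient norm      :", np.linalg.norm(phi_xx @ w_opt - phi_xs))
```

Because the error power is a quadratic form with a positive definite matrix, any perturbation of w_opt increases it, which is the numerical counterpart of the single global minimum stated above.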
`
`3.2.2 Factorization of the Wiener Solution
`
In our application, the received signal is assumed to consist of a single desired
scalar signal that is transformed by t