`
`
`Direction of Arrival Estimation Using the
`Parameterized Spatial Correlation Matrix
`
`Jacek Dmochowski, Jacob Benesty, Senior Member, IEEE, and Sofiène Affes, Senior Member, IEEE
`
`Abstract—The estimation of the direction-of-arrival (DOA) of
`one or more acoustic sources is an area that has generated much
`interest in recent years, with applications like automatic video
`camera steering and multiparty stereophonic teleconferencing
`entering the market. DOA estimation algorithms are hindered by
`the effects of background noise and reverberation. Methods based
`on the time-differences-of-arrival (TDOA) are commonly used
`to determine the azimuth angle of arrival of an acoustic source.
`TDOA-based methods compute each relative delay using only two
`microphones, even though additional microphones are usually
`available. This paper deals with DOA estimation based on spatial
`spectral estimation, and establishes the parameterized spatial cor-
`relation matrix as the framework for this class of DOA estimators.
`This matrix jointly takes into account all pairs of microphones,
`and is at the heart of several broadband spatial spectral estima-
`tors, including steered-response power (SRP) algorithms. This
`paper reviews and evaluates these broadband spatial spectral esti-
`mators, comparing their performance to TDOA-based locators. In
`addition, an eigenanalysis of the parameterized spatial correlation
`matrix is performed and reveals that such analysis allows one to
`estimate the channel attenuation from factors such as uncalibrated
`microphones. This estimate generalizes the broadband minimum
`variance spatial spectral estimator to more general signal models.
`A DOA estimator based on the multichannel cross correlation
`coefficient (MCCC) is also proposed. The performance of all
`proposed algorithms is included in the evaluation. It is shown that
`adding extra microphones helps combat the effects of background
`noise and reverberation. Furthermore, the link between accurate
`spatial spectral estimation and corresponding DOA estimation
`is investigated. The application of the minimum variance and
`MCCC methods to the spatial spectral estimation problem leads
`to better resolution than that of the commonly used fixed-weighted
`SRP spectrum. However, this increased spatial spectral resolution
`does not always translate to more accurate DOA estimation.
`
`Index Terms—Circular arrays, delay-and-sum beamforming
`(DSB), direction-of-arrival (DOA) estimation, linear spatial predic-
`tion, microphone arrays, multichannel cross correlation coefficient
`(MCCC), spatial correlation matrix, time delay estimation.
`
`I. INTRODUCTION
`
PROPAGATING signals contain much information about
the sources that emit them. Indeed, the location of a signal
`source is of much interest in many applications, and there exists
`a large and increasing need to locate and track sound sources.
`
`Manuscript received September 6, 2006; revised November 8, 2006. The as-
`sociate editor coordinating the review of this manuscript and approving it for
`publication was Dr. Hiroshi Sawada.
`The authors are with the Institut National de la Recherche Scientifique-
`Énergie, Matériaux, et Télécommunications (INRS-EMT), Université du
`Québec, Montréal, QC H5A 1K6, Canada (e-mail: dmochow@emt.inrs.ca).
`Digital Object Identifier 10.1109/TASL.2006.889795
`
`For example, a signal-enhancing beamformer [1], [2] must con-
`tinuously monitor the position of the desired signal source in
`order to provide the desired directivity and interference sup-
`pression. This paper is concerned with estimating the direc-
`tion-of-arrival (DOA) of acoustic sources in the presence of sig-
`nificant levels of both noise and reverberation.
`The two major classes of broadband DOA estimation
`techniques are those based on the time-differences-of-arrival
`(TDOA) and spatial spectral estimators. The latter terminology
`arises from the fact that spatial frequency corresponds to the
`wavenumber vector, whose direction is that of the propagating
`signal. Therefore, by looking for peaks in the spatial spectrum,
`one is determining the DOAs of the dominant signal sources.
`The TDOA approach is based on the relationship between
`DOA and relative delays across the array. The problem of es-
`timating these relative delays is termed “time delay estimation”
`[3]. The generalized cross-correlation (GCC) approach of [4],
`[5] is the most popular time delay estimation technique. Alter-
`native methods of estimating the TDOA include phase regres-
`sion [6] and linear prediction preprocessing [7]. The resulting
`relative delays are then mapped to the DOA by an appropriate
`inverse function that takes into account array geometry.
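The two-microphone TDOA step described above can be sketched as follows. This is a minimal illustration, not the implementation of [4] or [5]: the function names, the test signal, the sampling rate, and the pair-wise far-field mapping `delay = fs * spacing * cos(theta) / c` are all assumptions made for the example.

```python
import numpy as np

def gcc_phat_delay(x0, x1, max_lag):
    """Estimate the delay (in samples) of x1 relative to x0 using GCC-PHAT."""
    n = len(x0) + len(x1)                 # zero-pad to avoid circular wrap-around
    X0 = np.fft.rfft(x0, n=n)
    X1 = np.fft.rfft(x1, n=n)
    cross = X1 * np.conj(X0)              # cross-spectrum of the two channels
    cross /= np.abs(cross) + 1e-12        # PHAT weighting: keep only the phase
    cc = np.fft.irfft(cross, n=n)
    # Reorder so that the lags run from -max_lag to +max_lag.
    cc = np.concatenate((cc[-max_lag:], cc[:max_lag + 1]))
    return int(np.argmax(cc)) - max_lag

def delay_to_azimuth(delay, fs, spacing, c=343.0):
    """Assumed far-field mapping for one microphone pair (angle from the pair axis)."""
    return np.degrees(np.arccos(np.clip(delay * c / (fs * spacing), -1.0, 1.0)))

# Hypothetical test signal: white source, second channel delayed by 3 samples.
rng = np.random.default_rng(0)
s = rng.standard_normal(4096)
true_delay = 3
x0 = s + 0.1 * rng.standard_normal(s.size)
x1 = np.concatenate((np.zeros(true_delay), s[:-true_delay]))
x1 = x1 + 0.1 * rng.standard_normal(s.size)

est_delay = gcc_phat_delay(x0, x1, max_lag=16)
print(est_delay)
```

Each call uses exactly two channels, which is the limitation of the pair-wise approach discussed in the next paragraph.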
`Even though multiple-microphone arrays are commonplace
`in time delay estimation algorithms, there has not emerged a
`clearly preferred way of combining the various measurements
`from multiple microphones. Notice that in the TDOA approach,
`the time delays are estimated using only two microphones at a
`time, even though one usually has several more sensor outputs at
`one’s disposal. The averaging of measurements from indepen-
`dent pairs of microphones is not an optimal way of combining
`the measurements, as each computed time delay is derived from
`only two microphones, and thus often contains significant levels
`of corrupting noise and interference. It is thus well known that
`current TDOA-based DOA estimation algorithms are plagued
`by the effects of both noise and especially reverberation.
`To that end, Griebel and Brandstein [8] map all “realizable”
`combinations of microphone-pair delays to the corresponding
`source locations, and maximize simultaneously the sum (across
`various microphone pairs) of cross-correlations across all pos-
`sible locations. This approach is notable, as it jointly maximizes
`the results of the cross-correlations between the various micro-
`phone pairs.
`The spatial spectral estimation problem is well defined in the
`narrowband signal community. There are three major methods:
`the steered conventional beamformer approach (also termed
`the “Bartlett” estimate), the minimum variance estimator (also
`termed the “Capon” or maximum-likelihood estimator), and
`the linear spatial predictive spectral estimator. Reference [9]
`Amazon Ex. 1006
`IPR Petition – US RE47,049
`
`1558-7916/$25.00 © 2007 IEEE
`
`
`IEEE TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, VOL. 15, NO. 4, MAY 2007
`
`provides an excellent overview of these approaches. These
`three approaches are unified in their use of the narrowband
`spatial correlation matrix, as outlined in the next section.
`The situation is more scattered in the broadband signal
`case. Various spectral estimators have been proposed, but there
`does not exist any common framework for organizing these
`approaches. The steered conventional beamformer approach
`applies to broadband signals. The delay-and-sum beamformer
`(DSB) is steered to all possible DOAs to determine the DOA
`which emits the most energy. An alternative formulation of
`this approach is termed the “steered-response power” (SRP)
`method, which exploits the fact that the DSB output power may
`be written as a sum of cross-correlations. The computational
`requirements of the SRP method are a hindrance to practical
`implementation [8]. A detailed treatment of steered-beam-
`former approaches to source localization is given in [10], and
`the statistical optimality of the approach is shown in [11]–[13].
`Krolik and Swingler develop a broadband minimum variance
`estimator based on the steered conventional beamformer [14],
`which may be viewed as an adaptive weighted SRP algorithm.
`There have also been approaches that generalize narrowband
`localization algorithms (i.e., MUSIC [15]) to broadband sig-
`nals through subband processing and subsequent combining
`(see [16], for example). A broadband linear spatial predictive
`approach to time delay estimation is outlined in [17] and [18].
`This approach, which is limited to linear array geometries,
`makes use of all the channels in a joint fashion via the time
`delay parameterized spatial correlation matrix.
`This paper attempts to unify broadband spatial spectral esti-
`mators into a single framework and compares their performance
`from a DOA estimation standpoint to TDOA-based algorithms.
`This unified framework is the azimuth parameterized spatial
`correlation matrix, which is at the heart of all broadband spa-
`tial spectral estimators.
`In addition, several new ideas are presented. First, due to
`the parametrization, well-known narrowband array processing
`notions [19] are applied to the DOA estimation problem, gen-
`eralizing these ideas to the broadband case. A DOA estimator
`based on the eigenanalysis of the parameterized spatial corre-
`lation matrix ensues. More importantly, it is shown that this
`eigenanalysis allows one to estimate the channel attenuation
`from factors such as uncalibrated microphones. The existing
`minimum variance approach to broadband spatial spectral esti-
`mation is reformulated in the context of a more general signal
`model which accounts for such attenuation factors. Further-
`more, the ideas of [17] and [18] are extended to more general
`array geometries (i.e., circular) via the azimuth parameterized
`spatial correlation matrix, resulting in a minimum entropy DOA
`estimator.
`Circular arrays (see [20]–[22], for example) offer some ad-
`vantages over their linear counterparts. A circular array provides
spatial discrimination over the entire 360° azimuth range, which
`is particularly important for applications that require front-to-
`back signal enhancement, such as teleconferencing. Further-
`more, a circular array geometry allows for more compact de-
`signs. While the contents of this paper apply generally to planar
`array geometries, the circular geometry is used throughout the
`simulation portion.
`
`Fig. 1. Circular array geometry.
`
`Section II presents the signal propagation model in planar
`(i.e., circular) arrays and serves as the foundation for the re-
`mainder of the paper. Section III reviews the role of the tradi-
`tional, nonparameterized spatial correlation matrix in narrow-
`band DOA estimation, and shows how the parameterized ver-
`sion of the spatial correlation matrix allows for generalization
`to broadband signals. Section IV describes the existing and pro-
`posed broadband spatial spectral estimators in terms of the pa-
`rameterized spatial correlation matrix. Section V outlines the
`simulation model employed throughout this paper and evaluates
`the performance of all spatial spectral estimators and TDOA-
`based methods in both reverberation- and noise-limited envi-
`ronments. Concluding statements are given in Section VI.
`The spatial spectral estimation approach to DOA estimation
`has limitations in certain reverberant environments. If an inter-
`fering signal or reflection arrives at the array with a higher en-
`ergy than the direct-path signal, the DOA estimate will be false,
`even though the spatial spectral estimate is accurate. Such situ-
`ations arise when the source is oriented towards a reflective bar-
`rier and away from the array. This problem is beyond the scope
`of this paper and is not addressed herein. Rather, the focus of
`this paper is on the evaluation of spatial spectral estimators in
`noisy and reverberant environments and on their application to
`DOA estimation.
`
`II. SIGNAL MODEL
`
Assume a planar array of $N$ elements in a 2-D geometry, shown in
Fig. 1 (i.e., circular geometry), whose outputs are denoted by
$x_n(k)$, $n = 0, 1, \ldots, N-1$, where $k$ is the time index.
Denoting the azimuth angle of arrival by $\theta$, propagation of the
signal from a far-field source to microphone $n$ is modeled as:

$$x_n(k) = \alpha_n\, s[k - t - f_n(\theta)] + v_n(k) \tag{1}$$

where $\alpha_n$, $n = 0, 1, \ldots, N-1$, are the attenuation factors
due to channel effects, $t$ is the propagation time, in samples, from
the unknown source $s(k)$ to microphone 0, $v_n(k)$ is an additive
noise signal at the $n$th microphone, and $f_n(\theta)$,
$n = 0, 1, \ldots, N-1$, is the relative delay between microphones 0
and $n$. In matrix form, the array signal model becomes:

$$\begin{bmatrix} x_0(k) \\ x_1(k) \\ \vdots \\ x_{N-1}(k) \end{bmatrix} = \begin{bmatrix} \alpha_0 & 0 & \cdots & 0 \\ 0 & \alpha_1 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \alpha_{N-1} \end{bmatrix} \begin{bmatrix} s(k - t) \\ s[k - t - f_1(\theta)] \\ \vdots \\ s[k - t - f_{N-1}(\theta)] \end{bmatrix} + \begin{bmatrix} v_0(k) \\ v_1(k) \\ \vdots \\ v_{N-1}(k) \end{bmatrix} \tag{2}$$

The function $f_n$ relates the angle of arrival to the relative delays
between microphone elements 0 and $n$, and is derived for the case of
an equispaced circular array in the following manner. When operating
in the far-field, the time delay between microphone $n$ and the center
of the array is given by [23]

$$\tau_n = \frac{r}{c} \cos(\theta - \psi_n), \quad n = 0, 1, \ldots, N-1 \tag{3}$$

where the azimuth angle (relative to the selected angle reference) of
the $n$th microphone is denoted by $\psi_n = 2\pi n / N$,
$n = 0, 1, \ldots, N-1$, $r$ denotes the array radius, and $c$ is the
speed of signal propagation. It easily follows that

$$f_n(\theta) = \tau_0 - \tau_n = \frac{r}{c} \left[ \cos(\theta - \psi_0) - \cos(\theta - \psi_n) \right] \tag{4}$$

It is also worth mentioning that the additive noise $v_n(k)$ may be
temporally correlated with the desired signal $s(k)$. In that case, a
reverberant environment is modeled. The anechoic environment is
modeled by making the additive noise temporally uncorrelated with the
source signal. In either case, the additive noise may be spatially
correlated across the sensors.
It should also be stated that the signal model presented above makes
use of the far-field assumption, in that the incoming wave is assumed
to be planar, such that all sensors perceive the same DOA. An error is
incurred if the signal source is actually located in the near-field;
in that case, the relative delays are also a function of the range. In
the most general case (i.e., a source in the near-field of a 3-D
geometry), the function $f_n$ takes three parameters: the azimuth,
range, and elevation. This paper focuses on a specific subset of this
general model: a source located in the far-field with only a slight
elevation, such that a single parameter suffices. This is commonly the
case in a teleconferencing environment. Nevertheless, the concepts of
this paper, although presented in far-field planar context, easily
generalize to the near-field spherical case by including the range and
elevation in the forthcoming parametrization.

III. PARAMETERIZED SPATIAL CORRELATION MATRIX
In narrowband signal applications, a common space-time statistic is
that of the spatial correlation matrix [19], which is given by

$$\mathbf{R} = E\left[ \mathbf{x}(k)\, \mathbf{x}^H(k) \right] \tag{5}$$

where

$$\mathbf{x}(k) = \left[ x_0(k) \;\; x_1(k) \;\; \cdots \;\; x_{N-1}(k) \right]^T \tag{6}$$

the superscript $H$ denotes conjugate transpose, as complex signals
are commonly used in narrowband applications, and $T$ denotes the
transpose of a matrix or vector. To steer these array outputs to a
particular DOA, one applies a complex weight to each sensor output,
whose phase performs the steering, and then sums the sensor outputs to
form the output beam. Now, if the input signal is no longer
narrowband, each frequency requires its own complex weight to
appropriately phase-shift the signal at that frequency. In the context
of broadband spatial spectral estimation, the spatial correlation
matrix may be computed at each temporal frequency, and the resulting
spatial spectrum is now a function of the temporal frequency. For
broadband applications, these narrowband estimates may be assimilated
into a time-domain statistic, a procedure termed “focusing,” which is
described in [24]. The resulting structure is termed a “focused
covariance matrix.”
In this paper, broadband spatial spectral estimation is addressed in
another manner. Instead of implementing the steering delays in the
complex weighting at each sensor, the delays are actually implemented
as a time-delay in the spatial correlation matrix, which is now
parameterized. Thus, each microphone output is appropriately delayed
before computing this parameterized spatial correlation matrix:

$$\mathbf{x}_a(k, \varphi) = \left[ x_0(k) \;\; x_1[k + f_1(\varphi)] \;\; \cdots \;\; x_{N-1}[k + f_{N-1}(\varphi)] \right]^T \tag{7}$$

and real signals are assumed from this point on. The delays are a
function of the assumed azimuth DOA $\varphi$, which becomes the
parameter. The parameterized spatial correlation matrix is formally
written as

$$\mathbf{R}_a(\varphi) = E\left[ \mathbf{x}_a(k, \varphi)\, \mathbf{x}_a^T(k, \varphi) \right] = \begin{bmatrix} \sigma_{x_0}^2 & r_{01}(\varphi) & \cdots & r_{0,N-1}(\varphi) \\ r_{10}(\varphi) & \sigma_{x_1}^2 & \cdots & r_{1,N-1}(\varphi) \\ \vdots & \vdots & \ddots & \vdots \\ r_{N-1,0}(\varphi) & r_{N-1,1}(\varphi) & \cdots & \sigma_{x_{N-1}}^2 \end{bmatrix} \tag{8}$$

where

$$r_{ij}(\varphi) = E\left\{ x_i[k + f_i(\varphi)]\, x_j[k + f_j(\varphi)] \right\} \tag{9}$$

The matrix $\mathbf{R}_a(\varphi)$ is not simply the array observation
matrix, as is commonly used in narrowband beamforming models. Instead,
it is a parameterized correlation matrix that represents the signal
powers across the array emanating from azimuth $\varphi$. Each
off-diagonal entry in the matrix $\mathbf{R}_a(\varphi)$ is a single
cross-correlation term and a function of the azimuth angle $\varphi$.
Notice that the various microphone pairs are combined in a joint
fashion, in that altering the steering angle $\varphi$ affects all
off-diagonal entries of $\mathbf{R}_a(\varphi)$. This property allows
for the more prudent combining of microphone measurements as compared
to the ad hoc method of averaging independent pairs of
cross-correlation results.
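As a rough illustration of how the parameterized spatial correlation matrix might be estimated from data, the sketch below computes the relative delays of an equispaced circular array and forms a sample estimate of the matrix for a given steering angle. Several things here are assumptions rather than details from the paper: the function names, the rounding of delays to whole samples (a fractional-delay filter would be needed otherwise), the 48-kHz sampling rate, the 10-cm radius, and the white test source.

```python
import numpy as np

def relative_delays(phi, n_mics, radius, fs, c=343.0):
    """Relative delays f_n(phi) in samples for an equispaced circular array (f_0 = 0)."""
    psi = 2.0 * np.pi * np.arange(n_mics) / n_mics   # microphone azimuths
    tau = radius / c * np.cos(phi - psi)              # delay to the array centre (s)
    return fs * (tau[0] - tau)

def parameterized_corr(x, phi, radius, fs, c=343.0):
    """Sample estimate of R_a(phi) from x with shape (n_mics, n_samples)."""
    n_mics, n_samples = x.shape
    # Assumption: delays are rounded to whole samples.
    shifts = np.rint(relative_delays(phi, n_mics, radius, fs, c)).astype(int)
    m = int(np.max(np.abs(shifts)))
    k = np.arange(m, n_samples - m)                   # indices valid for every shift
    xa = np.stack([x[n, k + shifts[n]] for n in range(n_mics)])  # x_n[k + f_n(phi)]
    return xa @ xa.T / k.size

def simulate(theta, n_mics, radius, fs, n_samples, rng, noise_std=0.5):
    """White far-field source at azimuth theta plus spatially white noise, unit gains."""
    shifts = np.rint(relative_delays(theta, n_mics, radius, fs)).astype(int)
    pad = int(np.max(np.abs(shifts)))
    s = rng.standard_normal(n_samples + 2 * pad)
    k = np.arange(pad, pad + n_samples)
    x = np.stack([s[k - shifts[n]] for n in range(n_mics)])      # s[k - f_n(theta)]
    return x + noise_std * rng.standard_normal(x.shape)

rng = np.random.default_rng(0)
fs, radius, n_mics = 48000.0, 0.10, 6
theta = np.deg2rad(60.0)
x = simulate(theta, n_mics, radius, fs, 20000, rng)

R_match = parameterized_corr(x, theta, radius, fs)               # steered at the source
R_miss = parameterized_corr(x, np.deg2rad(200.0), radius, fs)    # steered elsewhere
off = ~np.eye(n_mics, dtype=bool)
print(R_match[off].mean(), R_miss[off].mean())
```

When the steering angle matches the source azimuth, every off-diagonal entry is close to the source power; for a mismatched angle all of them collapse together, which is the joint behavior that the estimators in the following sections exploit.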
This paper relates broadband spatial spectral estimators in terms of
the parameterized spatial correlation matrix

$$S(\varphi) = g\left[ \mathbf{R}_a(\varphi) \right] \tag{10}$$

where $g[\cdot]$ is some estimation function, $\varphi$ is the steered
azimuth angle, and $S(\varphi)$ is the estimate of the broadband
spatial spectrum at azimuth angle $\varphi$.
The DOA estimate follows directly from the spatial spectrum, in that
peaks in the spectrum correspond to assumed source locations. For the
case of a single source, which is the case throughout this paper, the
estimate of the source’s DOA is given by

$$\hat{\theta} = \arg\max_{\varphi} S(\varphi) \tag{11}$$

where $\hat{\theta}$ is the DOA estimate.
Note that this broadband extension is not without caveats: care must
be taken when spacing the microphones to ensure that spatial aliasing
[2] does not result.
It is also important to point out that the GCC method is quite
compatible with DOA estimation based on the parameterized spatial
correlation matrix: the cross-correlation estimates that comprise the
matrix may be computed in the frequency-domain using a GCC variant
such as the phase transform (PHAT) [4]. This paper focuses on how to
extract the DOA estimate from the parameterized spatial correlation
matrix; the ideas presented are general in that they do not hinge on
any particular method for computing the actual cross correlations.

IV. BROADBAND SPATIAL SPECTRAL ESTIMATORS
The following subsections detail the existing and proposed broadband
spatial spectral estimation methods, relating each to the
parameterized spatial correlation matrix.

A. Steered Conventional Beamforming and the SRP Algorithm
The aim of a DSB is to time-align the received signals in the array
aperture, such that the desired signal is coherently summed, while
signals from other directions are incoherently summed and thus
attenuated. Using the model of Section II, the output of a DSB steered
to an angle of arrival of $\varphi$ is given as

$$y(k, \varphi) = \sum_{n=0}^{N-1} h_n(\varphi)\, x_n[k + f_n(\varphi)] \tag{12}$$

The delays $f_n(\varphi)$ steer the beamformer to the desired DOA,
while the beamformer weights $h_n(\varphi)$ help shape the beam
accordingly. The weights here have been made dependent on the desired
angle of arrival $\varphi$, for a reason that will become apparent in
future subsections. In (12), the received signals are delayed [or
advanced, depending on the sign of $f_n(\varphi)$], by an amount that
takes into account the array geometry, via the function $f_n$.
The estimate of the spatial spectral power at azimuth angle $\varphi$
is given by the power of the beamformer output when steered to azimuth
$\varphi$. Therefore, to form the entire spectrum, one needs to steer
the beam and compute the output power across the entire azimuth space.
The steered-beamformer spectral estimate is given by

$$S_{\mathrm{DSB}}(\varphi) = E\left[ y^2(k, \varphi) \right] \tag{13}$$

Substitution of (12) into (13) leads to

$$S_{\mathrm{DSB}}(\varphi) = \sum_{i=0}^{N-1} \sum_{j=0}^{N-1} h_i(\varphi)\, h_j(\varphi)\, E\left\{ x_i[k + f_i(\varphi)]\, x_j[k + f_j(\varphi)] \right\} \tag{14}$$

Expression (14) may be written more neatly in matrix notation as

$$S_{\mathrm{DSB}}(\varphi) = \mathbf{h}^T(\varphi)\, \mathbf{R}_a(\varphi)\, \mathbf{h}(\varphi) \tag{15}$$

where

$$\mathbf{h}(\varphi) = \left[ h_0(\varphi) \;\; h_1(\varphi) \;\; \cdots \;\; h_{N-1}(\varphi) \right]^T \tag{16}$$

The DOA estimate is thus given by

$$\hat{\theta}_{\mathrm{DSB}} = \arg\max_{\varphi}\; \mathbf{h}^T(\varphi)\, \mathbf{R}_a(\varphi)\, \mathbf{h}(\varphi) \tag{17}$$

The maximization of a steered beamformer output power is equivalent to
maximizing a quadratic of the beamformer weight vector with respect to
the angle of arrival. Altering the angle affects the parameter in the
quadratic form, namely, the parameterized spatial correlation matrix.
The well-known SRP algorithm [10] follows directly from a special case
of (17), where $\mathbf{h}(\varphi) = \mathbf{1}_N$ for all $\varphi$,
and $\mathbf{1}_N$ is a vector of $N$ ones:

$$\hat{\theta}_{\mathrm{SRP}} = \arg\max_{\varphi}\; \mathbf{1}_N^T\, \mathbf{R}_a(\varphi)\, \mathbf{1}_N \tag{18}$$

For this special case of fixed unit weights, this means that the
maximization of the power of a steered DSB is equivalent to the
maximization of the sum of the entries of $\mathbf{R}_a(\varphi)$.
The SRP algorithm has garnered significant attention recently: see
[10], [25], and [26]. In all of these implementations, the weighting
of $\mathbf{h} = \mathbf{1}_N$ is used, which is fixed with respect to
both the data and the steering angle. Given the well-known classical
results on the advantages of adaptive beamforming over fixed
beamforming, it is therefore surprising that adaptive weighting
schemes have not been investigated more in the context of DOA
estimation based on the parameterized spatial correlation matrix (a
fixed weighting scheme is proposed in [27]). Notice that from (15),
this is an effectively “narrowband” weight selection, in that the
pre-aligning of the microphones requires only the selection of a
single weight per channel. Note, however, that this weight selection
must be performed for all angles $\varphi$. To that end, the following
section presents one such adaptive weighting scheme, proposed by
Krolik and Swingler [14].
`
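The SRP search described in this subsection can be sketched as a scan over candidate angles in which the unit-weight DSB output power is computed for each angle. The code below is an illustrative sketch only: the helper names, the integer rounding of the steering delays, the 1° search grid, and the simulated white source at 60° are assumptions and are not taken from [10].

```python
import numpy as np

def relative_delays(phi, n_mics, radius, fs, c=343.0):
    """Relative delays f_n(phi) in samples for an equispaced circular array."""
    psi = 2.0 * np.pi * np.arange(n_mics) / n_mics
    tau = radius / c * np.cos(phi - psi)
    return fs * (tau[0] - tau)

def steered_signals(x, phi, radius, fs, c=343.0):
    """Time-align the channels for steering angle phi (integer-sample shifts)."""
    n_mics, n_samples = x.shape
    shifts = np.rint(relative_delays(phi, n_mics, radius, fs, c)).astype(int)
    m = int(np.max(np.abs(shifts)))
    k = np.arange(m, n_samples - m)
    return np.stack([x[n, k + shifts[n]] for n in range(n_mics)])

def srp_spectrum(x, phis, radius, fs):
    """Unit-weight steered-response power for every candidate angle."""
    spec = np.empty(len(phis))
    for i, phi in enumerate(phis):
        y = steered_signals(x, phi, radius, fs).sum(axis=0)   # DSB output, h = 1
        spec[i] = np.mean(y ** 2)        # equals the sum of the entries of R_a(phi)
    return spec

def simulate(theta, n_mics, radius, fs, n_samples, rng, noise_std=0.5):
    """White far-field source at azimuth theta plus spatially white noise."""
    shifts = np.rint(relative_delays(theta, n_mics, radius, fs)).astype(int)
    pad = int(np.max(np.abs(shifts)))
    s = rng.standard_normal(n_samples + 2 * pad)
    k = np.arange(pad, pad + n_samples)
    x = np.stack([s[k - shifts[n]] for n in range(n_mics)])
    return x + noise_std * rng.standard_normal(x.shape)

rng = np.random.default_rng(1)
fs, radius, n_mics = 48000.0, 0.10, 8
x = simulate(np.deg2rad(60.0), n_mics, radius, fs, 8000, rng)

phis = np.deg2rad(np.arange(0.0, 360.0, 1.0))      # candidate azimuths
spectrum = srp_spectrum(x, phis, radius, fs)
doa_deg = float(np.degrees(phis[np.argmax(spectrum)]))
print(doa_deg)
```

The full scan over the azimuth grid is what makes the method computationally demanding; a coarser grid trades resolution for speed.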
`
`B. Minimum Variance
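The minimum variance approach to spatial spectral estimation involves
selecting weights that pass a signal [i.e., a broadband plane wave
$s(k)$] propagating from azimuth $\varphi$ with unity gain, while
minimizing the total output power, given by
$\mathbf{h}^T(\varphi)\, \mathbf{R}_a(\varphi)\, \mathbf{h}(\varphi)$.
The application of the minimum variance method to broadband spatial
spectral estimation is given in [14].
The unity gain constraint proposed by [14] is

$$\mathbf{h}^T(\varphi)\, \mathbf{1}_N = 1 \tag{19}$$

and the $\mathbf{1}_N$ vector follows from the fact that the signal is
already time-aligned across the array before minimum variance
processing. It is as if the signal is coming from the broadside of a
linear array.
Using the method of Lagrange multipliers in conjunction with the cost
function
$\mathbf{h}^T(\varphi)\, \mathbf{R}_a(\varphi)\, \mathbf{h}(\varphi)$,
the minimum variance weights become

$$\mathbf{h}_{\mathrm{MV}}(\varphi) = \frac{\mathbf{R}_a^{-1}(\varphi)\, \mathbf{1}_N}{\mathbf{1}_N^T\, \mathbf{R}_a^{-1}(\varphi)\, \mathbf{1}_N} \tag{20}$$

The resulting minimum variance spatial spectral estimate is found by
substituting the weights of (20) into the cost function:

$$S_{\mathrm{MV}}(\varphi) = \left[ \mathbf{1}_N^T\, \mathbf{R}_a^{-1}(\varphi)\, \mathbf{1}_N \right]^{-1} \tag{21}$$

The broadband minimum variance DOA estimator is thus given by

$$\hat{\theta}_{\mathrm{MV}} = \arg\max_{\varphi} \left[ \mathbf{1}_N^T\, \mathbf{R}_a^{-1}(\varphi)\, \mathbf{1}_N \right]^{-1} \tag{22}$$

The next section presents a new idea: the eigenanalysis of the
parameterized spatial correlation matrix.

C. Eigenanalysis of the Parameterized Spatial Correlation
Matrix
Using the signal model of Section II, notice that when the steered
azimuth $\varphi$ matches the actual azimuth $\theta$, the
parameterized spatial correlation matrix may be decomposed into signal
and noise components in the following manner:

$$\mathbf{R}_a(\theta) = \sigma_s^2\, \boldsymbol{\alpha} \boldsymbol{\alpha}^T + \mathbf{R}_v(\theta) \tag{23}$$

where

$$\boldsymbol{\alpha} = \left[ \alpha_0 \;\; \alpha_1 \;\; \cdots \;\; \alpha_{N-1} \right]^T \tag{24}$$

$\sigma_s^2 = E[s^2(k)]$ is the signal power, and

$$\mathbf{R}_v(\theta) = E\left[ \mathbf{v}_a(k, \theta)\, \mathbf{v}_a^T(k, \theta) \right], \quad \mathbf{v}_a(k, \theta) = \left[ v_0(k) \;\; v_1[k + f_1(\theta)] \;\; \cdots \;\; v_{N-1}[k + f_{N-1}(\theta)] \right]^T \tag{25}$$

Note that it has been implicitly assumed that the desired signal is
wide-sense stationary, zero-mean, and temporally uncorrelated with the
additive noise. Consider only the signal component of
$\mathbf{R}_a(\theta)$. It may be easily shown that this rank-one
matrix has a single nonzero eigenvalue,
$\sigma_s^2\, \boldsymbol{\alpha}^T \boldsymbol{\alpha}$, with
corresponding eigenvector $\boldsymbol{\alpha}$. The vector of
attenuation constants $\boldsymbol{\alpha}$ is generally unknown;
however, from the above discussion, it is apparent that the vector may
be estimated from the eigenanalysis of $\mathbf{R}_a(\varphi)$.
To that end, consider another adaptive weight selection method, which
follows from the ideas of narrowband beamforming [19]. This weight
selection attempts to nontrivially maximize the output energy of the
steered-beamformer for a given azimuth $\varphi$:

$$\max_{\mathbf{h}(\varphi)}\; \mathbf{h}^T(\varphi)\, \mathbf{R}_a(\varphi)\, \mathbf{h}(\varphi) \tag{26}$$

subject to

$$\mathbf{h}^T(\varphi)\, \mathbf{h}(\varphi) = 1 \tag{27}$$

It is well known that the solution to the above constrained
optimization is the vector that maximizes the Rayleigh quotient [2]
$\mathbf{h}^T(\varphi) \mathbf{R}_a(\varphi) \mathbf{h}(\varphi) / \mathbf{h}^T(\varphi) \mathbf{h}(\varphi)$,
which is in turn given by the eigenvector corresponding to the maximum
eigenvalue of $\mathbf{R}_a(\varphi)$. The resulting spatial spectral
estimate is given by

$$S_{\mathrm{EIG}}(\varphi) = \mathbf{b}_{\max}^T(\varphi)\, \mathbf{R}_a(\varphi)\, \mathbf{b}_{\max}(\varphi) = \lambda_{\max}(\varphi) \tag{28}$$

where $\lambda_{\max}(\varphi)$ is the maximum eigenvalue of
$\mathbf{R}_a(\varphi)$, and $\mathbf{b}_{\max}(\varphi)$ is the
corresponding eigenvector. The DOA estimation involves searching for
the angle that produces the largest maximum eigenvalue of
$\mathbf{R}_a(\varphi)$:

$$\hat{\theta}_{\mathrm{EIG}} = \arg\max_{\varphi}\; \lambda_{\max}(\varphi) \tag{29}$$

In addition to producing another spatial spectrum estimate, the above
eigenanalysis allows one to estimate $\boldsymbol{\alpha}$:

$$\hat{\boldsymbol{\alpha}} = \mathbf{b}_{\max}(\hat{\theta}_{\mathrm{EIG}}) \tag{30}$$

Now that an estimate of the attenuation vector $\boldsymbol{\alpha}$
is available, the minimum variance method of [14] may be improved to
reflect the presence of channel attenuation factors, which were
omitted in the developments of Section IV-B.

D. Improved Minimum Variance
The broadband minimum variance spatial spectral estimation proposed by
[14] assumes that the attenuation vector $\boldsymbol{\alpha}$ is
equal to $\mathbf{1}_N$, or a scaled version of $\mathbf{1}_N$. In
practice, it is not uncommon for this assumption to be violated by
factors such as uncalibrated microphones. To that end, the unity gain
constraint proposed by [14] is modified to reflect the more general
signal model of Section II.
Taking into account the channel attenuation vector
$\boldsymbol{\alpha}$, the proposed unity gain constraint is

$$\sum_{n=0}^{N-1} h_n(\varphi)\, \alpha_n\, s(k - t) = s(k - t) \tag{31}$$

which may be simplified and written in vector notation as

$$\mathbf{h}^T(\varphi)\, \boldsymbol{\alpha} = 1 \tag{32}$$

Therefore, the optimal minimum variance weights become

$$\mathbf{h}_{\mathrm{IMV}}(\varphi) = \frac{\mathbf{R}_a^{-1}(\varphi)\, \boldsymbol{\alpha}}{\boldsymbol{\alpha}^T\, \mathbf{R}_a^{-1}(\varphi)\, \boldsymbol{\alpha}} \tag{33}$$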
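The eigenanalysis of Section IV-C and the minimum variance weights of Sections IV-B and IV-D can be checked numerically on a small synthetic matrix. The sketch below assumes a correlation matrix at the true azimuth made of a rank-one signal term plus spatially white noise; the gain values, the noise level, and the function names are hypothetical, and the eigenvector is only an estimate of the attenuation vector up to scale.

```python
import numpy as np

def mv_spectrum(R, a):
    """Minimum variance power 1 / (a^T R^{-1} a) and weights R^{-1} a / (a^T R^{-1} a)."""
    Ri_a = np.linalg.solve(R, a)
    power = 1.0 / (a @ Ri_a)
    return power, power * Ri_a

def principal_eigenpair(R):
    """Largest eigenvalue of the symmetric matrix R and its unit-norm eigenvector."""
    vals, vecs = np.linalg.eigh(R)        # eigenvalues are returned in ascending order
    b = vecs[:, -1]
    if b.sum() < 0:                       # resolve the sign ambiguity (gains are positive)
        b = -b
    return vals[-1], b

# Hypothetical uncalibrated gains; R is the model matrix at the true azimuth.
alpha = np.array([1.0, 0.4, 0.8, 0.6])
sigma_s2, sigma_v2 = 1.0, 0.1
R = sigma_s2 * np.outer(alpha, alpha) + sigma_v2 * np.eye(alpha.size)

lam_max, b_max = principal_eigenpair(R)
alpha_hat = b_max                         # attenuation estimate, known only up to scale

# Fixed constraint h^T 1 = 1 versus the constraint h^T alpha_hat = 1.
p_fixed, h_fixed = mv_spectrum(R, np.ones(alpha.size))
p_prop, h_prop = mv_spectrum(R, alpha_hat)
print(lam_max, p_fixed, p_prop)
```

With white noise the principal eigenvector is exactly the normalized gain vector, so the constraint built from it matches the true channel; with spatially correlated noise the eigenvector is only an approximation.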
`
`
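The resulting proposed minimum variance spatial spectral estimate is
found by substituting the weights of (33) into the cost function

$$S_{\mathrm{IMV}}(\varphi) = \left[ \boldsymbol{\alpha}^T\, \mathbf{R}_a^{-1}(\varphi)\, \boldsymbol{\alpha} \right]^{-1} \tag{34}$$

The proposed broadband minimum variance DOA estimator is thus given by

$$\hat{\theta}_{\mathrm{IMV}} = \arg\max_{\varphi} \left[ \boldsymbol{\alpha}^T\, \mathbf{R}_a^{-1}(\varphi)\, \boldsymbol{\alpha} \right]^{-1} \tag{35}$$

E. Linear Spatial Prediction and the Multichannel
Cross-Correlation Coefficient

Spatial spectral estimation using linear prediction is well defined
for the case of narrowband signals, as the narrowband assumption
allows one to write one of the microphone outputs as a
complex-weighted linear combination of the other microphone outputs
[2]. To extend this idea to the broadband case, the same method as
that of the previous sections is used, in that the time delay is
applied prior to computing the predictive coefficients. This concept
was first presented in [17] and [18] in the context of time delay
estimation; the approach was limited to linear array geometries, and
yielded only a single relative delay. This section generalizes the
idea to planar array geometries, transforming the problem from time
delay estimation to DOA estimation.
The idea is to predict, using real predictive coefficients, the output
of $x_0(k)$ using a linear combination of $x_n[k + f_n(\varphi)]$,
$n = 1, 2, \ldots, N-1$. Using a spatial autoregressive (AR) model,
the linear predictive framework is given by

$$x_0(k) + \sum_{n=1}^{N-1} a_n(\varphi)\, x_n[k + f_n(\varphi)] = e(k, \varphi) \tag{36}$$

where $e(k, \varphi)$ may be interpreted as either the spatially white
noise that drives the AR model, or the prediction error. For each
$\varphi$ in the azimuth space, one finds the weight vector

$$\mathbf{a}(\varphi) = \left[ a_0(\varphi) \;\; a_1(\varphi) \;\; \cdots \;\; a_{N-1}(\varphi) \right]^T \tag{37}$$

which minimizes the criterion

$$J(\varphi) = E\left[ e^2(k, \varphi) \right] = \mathbf{a}^T(\varphi)\, \mathbf{R}_a(\varphi)\, \mathbf{a}(\varphi) \tag{38}$$

subject to the constraint

$$\mathbf{u}^T \mathbf{a}(\varphi) = 1 \tag{39}$$

where

$$\mathbf{u} = \left[ 1 \;\; 0 \;\; \cdots \;\; 0 \right]^T \tag{40}$$

Using the method of Lagrange multipliers, the optimal predictive
weights are given by

$$\mathbf{a}(\varphi) = \frac{\mathbf{R}_a^{-1}(\varphi)\, \mathbf{u}}{\mathbf{u}^T\, \mathbf{R}_a^{-1}(\varphi)\, \mathbf{u}} \tag{41}$$

and the resulting minimum mean-squared error (mmse) is

$$J_{\min}(\varphi) = \frac{1}{\mathbf{u}^T\, \mathbf{R}_a^{-1}(\varphi)\, \mathbf{u}} \tag{42}$$

Note that both the optimal predictive coefficients and the mmse are a
function of the steered angle $\varphi$.
The classical approach to spectral estimation using linear prediction
is to map the optimal predictive coefficients to an AR transfer
function. However, it is well known that this method is very sensitive
to the presence of additive noise in the observations [2]. This is
because the AR model breaks down when additive noise is present. To
that end, a more robust implementation of linear spatial prediction is
proposed in [17] and [18]. The idea is to not estimate an AR spectrum,
but rather to find the parameter (i.e., the angle $\varphi$) that
minimizes the prediction error.
In [17] and [18], the idea of linear spatial prediction was used to
derive the (time delay parameterized) multichannel cross correlation
coefficient (MCCC) in the context of linear array time delay
estimation. These ideas are now extended to planar array geometries,
and the azimuth angle-parameterized MCCC is presented as another
broadband spatial spectral estimator.
The matrix $\mathbf{R}_a(\varphi)$ may be factorized as [17], [18]:

$$\mathbf{R}_a(\varphi) = \boldsymbol{\Sigma}\, \tilde{\mathbf{R}}_a(\varphi)\, \boldsymbol{\Sigma} \tag{43}$$

where

$$\boldsymbol{\Sigma} = \begin{bmatrix} \sigma_{x_0} & 0 & \cdots & 0 \\ 0 & \sigma_{x_1} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \sigma_{x_{N-1}} \end{bmatrix} \tag{44}$$

is a diagonal matrix,

$$\tilde{\mathbf{R}}_a(\varphi) = \begin{bmatrix} 1 & \rho_{01}(\varphi) & \cdots & \rho_{0,N-1}(\varphi) \\ \rho_{10}(\varphi) & 1 & \cdots & \rho_{1,N-1}(\varphi) \\ \vdots & \vdots & \ddots & \vdots \\ \rho_{N-1,0}(\varphi) & \rho_{N-1,1}(\varphi) & \cdots & 1 \end{bmatrix} \tag{45}$$

is a symmetric matrix, and

$$\rho_{ij}(\varphi) = \frac{E\left\{ x_i[k + f_i(\varphi)]\, x_j[k + f_j(\varphi)] \right\}}{\sigma_{x_i} \sigma_{x_j}}$$

is the cross-correlation coefficient between $x_i[k + f_i(\varphi)]$
and $x_j[k + f_j(\varphi)]$.
The azimuth-angle dependent mmse may be written using (43) as

$$J_{\min}(\varphi) = \frac{\sigma_{x_0}^2}{\mathbf{u}^T\, \tilde{\mathbf{R}}_a^{-1}(\varphi)\, \mathbf{u}} \tag{46}$$

and

$$\mathbf{u}^T\, \tilde{\mathbf{R}}_a^{-1}(\varphi)\, \mathbf{u} = \frac{\det\left[ \tilde{\mathbf{R}}_{a,1:N-1}(\varphi) \right]}{\det\left[ \tilde{\mathbf{R}}_a(\varphi) \right]} \tag{47}$$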
`
`
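where $\tilde{\mathbf{R}}_{a,1:N-1}(\varphi)$ is the submatrix formed
by removing the first row and column from
$\tilde{\mathbf{R}}_a(\varphi)$, and “det” stands for “determinant.”
It is shown in [17] and [18] that

$$\det\left[ \tilde{\mathbf{R}}_{a,1:N-1}(\varphi) \right] \le 1 \tag{48}$$

and thus the following relationship is established:

$$J_{\min}(\varphi) \ge \sigma_{x_0}^2 \det\left[ \tilde{\mathbf{R}}_a(\varphi) \right] \tag{49}$$

From this relationship, it is easily observed that minimizing the
spatial prediction error corresponds to minimizing the quantity
$\det[\tilde{\mathbf{R}}_a(\varphi)]$. Notice that
$\det[\tilde{\mathbf{R}}_a(\varphi)] = 0$ when every entry of
$\tilde{\mathbf{R}}_a(\varphi)$ is equal to unity (i.e., perfectly
correlated microphone signals). Conversely, in the case of mutually
uncorrelated microphone outputs,
$\det[\tilde{\mathbf{R}}_a(\varphi)] = 1$. Putting all of this
together, the azimuth angle parameterized MCCC is defined as

$$\rho_a^2(\varphi) = 1 - \det\left[ \tilde{\mathbf{R}}_a(\varphi) \right] \tag{50}$$

The MCCC broadband spatial spectral estimate is given by

$$S_{\mathrm{MCCC}}(\varphi) = \frac{1}{\det\left[ \tilde{\mathbf{R}}_a(\varphi) \right]} \tag{51}$$

from which the DOA estimation easily follows as

$$\hat{\theta}_{\mathrm{MCCC}} = \arg\max_{\varphi} \frac{1}{\det\left[ \tilde{\mathbf{R}}_a(\varphi) \right]} = \arg\min_{\varphi} \det\left[ \tilde{\mathbf{R}}_a(\varphi) \right] \tag{52}$$

It is interesting to note that even though the linear spatial
predictive approach is used here to arrive at the azimuth
parameterized MCCC estimator, maximizing the MCCC actually corresponds
more closely to the minimization of the joint entropy of the received
signals [28], assuming that the signals are jointly Gaussian
distributed. This follows from the fact that for jointly Gaussian
distributed $\mathbf{x}_a(k, \varphi)$, the joint entropy of
$\mathbf{x}_a(k, \varphi)$ is, up to an additive constant,
proportional to $\ln \det[\mathbf{R}_a(\varphi)]$ [28].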
`
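A minimal sketch of the MCCC-based search of Section IV-E follows: for each candidate angle the time-aligned channels are correlated, the matrix is normalized to unit diagonal, and the angle with the smallest determinant is selected. The helper names, the integer-sample steering, the 1° grid, and the simulated white source at 60° are assumptions made for illustration and do not reproduce the simulation setup of the paper.

```python
import numpy as np

def relative_delays(phi, n_mics, radius, fs, c=343.0):
    """Relative delays f_n(phi) in samples for an equispaced circular array."""
    psi = 2.0 * np.pi * np.arange(n_mics) / n_mics
    tau = radius / c * np.cos(phi - psi)
    return fs * (tau[0] - tau)

def parameterized_corr(x, phi, radius, fs, c=343.0):
    """Sample estimate of R_a(phi) using integer-sample steering delays."""
    n_mics, n_samples = x.shape
    shifts = np.rint(relative_delays(phi, n_mics, radius, fs, c)).astype(int)
    m = int(np.max(np.abs(shifts)))
    k = np.arange(m, n_samples - m)
    xa = np.stack([x[n, k + shifts[n]] for n in range(n_mics)])
    return xa @ xa.T / k.size

def normalized_det(R):
    """Determinant of the unit-diagonal matrix Sigma^{-1} R Sigma^{-1}."""
    d = 1.0 / np.sqrt(np.diag(R))
    return float(np.linalg.det(R * np.outer(d, d)))

def simulate(theta, n_mics, radius, fs, n_samples, rng, noise_std=0.5):
    """White far-field source at azimuth theta plus spatially white noise."""
    shifts = np.rint(relative_delays(theta, n_mics, radius, fs)).astype(int)
    pad = int(np.max(np.abs(shifts)))
    s = rng.standard_normal(n_samples + 2 * pad)
    k = np.arange(pad, pad + n_samples)
    x = np.stack([s[k - shifts[n]] for n in range(n_mics)])
    return x + noise_std * rng.standard_normal(x.shape)

rng = np.random.default_rng(2)
fs, radius, n_mics = 48000.0, 0.10, 8
x = simulate(np.deg2rad(60.0), n_mics, radius, fs, 8000, rng)

phis = np.deg2rad(np.arange(0.0, 360.0, 1.0))
dets = np.array([normalized_det(parameterized_corr(x, p, radius, fs)) for p in phis])
mccc = 1.0 - dets                       # azimuth-parameterized MCCC
spectrum = 1.0 / dets                   # MCCC spatial spectrum
doa_deg = float(np.degrees(phis[np.argmax(spectrum)]))
print(doa_deg, mccc.max())
```

Because the matrix is normalized by the channel standard deviations before the determinant is taken, rescaling any single channel leaves the result unchanged.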
`V. SIMULATION EVALUATION
`
`A. Simulation Environment
`The various broadband spatial spectral estimators are eval-
`uated in a computer simulation. An equispaced circular array
`of three to ten omnidirectional microphones is employed as the
`spatial aperture. The radius of the array is chosen as the distance
`that fulfills the spatial aliasing equality for circular arrays. In
`other words, the array radius is made as large as possible without
`suffering from spatial aliasing [23]
`
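$$r = \frac{c}{4 f_{\max} \sin(\pi / N)} \tag{53}$$

where $r$ is the array radius, $c$ is the speed of signal propagation,
$N$ is the number of microphones, and $f_{\max}$ denotes the highest
frequency of interest, which is chosen to be 4 kHz in the simulations.
For a ten-element circular array, the array radius becomes 6.9 cm.
The signal sources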
`are omnidirectional point sources. This means that the direct-
`
`path component is stronger than any individual reflected com-
`ponent—as mentioned in the Introduction, it is beyond the scope
`of this paper to handle cases where due to source directivity and
`orientation, a reflected component contains more energy than
`the direct-path component.
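As a quick check of the quoted radius, the aliasing limit can be evaluated directly: adjacent microphones on a circle of radius r are separated by the chord 2 r sin(π/N), and requiring this chord to equal half the shortest wavelength gives the radius below. The speed of sound of 343 m/s is an assumed value that is not stated in the text.

```python
import math

def max_radius(n_mics, f_max, c=343.0):
    """Radius at which adjacent microphones are half a wavelength apart at f_max."""
    return c / (4.0 * f_max * math.sin(math.pi / n_mics))

radius_cm = 100.0 * max_radius(10, 4000.0)
print(round(radius_cm, 1))   # 6.9
```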
`A reverberant acoustic environment is simulated using the
`image model method [29]. The simulated room is rectangular
`with plane reflective boundaries (walls, ceiling, and floor). Each
`boundary is characterized by a frequency-independent uniform
`reflection coefficient which does not vary with the angle of in-
`cidence of the source signal.
`The room dimensions in centimeters are (304.8, 457.2, 381).
`The circular array is located in the center of the room: the center
`of the array sits at (152.4, 228.6, 101.6). Two distinct scenarios
`are simulated, as described below.
`The speaker is immobile and situated at (254, 406.4, 101.6)
`and (254, 406.4, 152.4) in the first and second simulation
`scenarios, respectively. The immobility of the source means
`that the evaluation does not consider frames during which the
`source exhibits movement. The correct azimuth angle of arrival
is 60°. The distance from the center of the array to the source
`is 204.7 cm.
`The SNR at the microphone elements is 0 dB. Here, SNR
`refers to spatially white sensor noise in the first scenario and
`spherically isotropic (diffuse) noise in the second scenario. The
`generation of spherically isotropic noise is performed by trans-
`forming a vector of uncorrelated Gaussian random variables into
`a vector of correlated (i.e., according to a given covariance ma-
`trix) Gaussian random variables by premultiplying the original
`(uncorrelated) vector with the Cholesky factorization [30] of
`the covariance matrix of a diffuse noise field [2]. The covari-
`ance matrix of the diffuse noise field is computed by averaging
`over the entire frequency range (300–4000 Hz). For the compu-
`tation of the SNR, the signal component includes reverberation.
`In terms of reverberation, three levels are simulated for each sce-
`nario: anechoic, moderately reverberant, and highly reverberant.
`The reverberation times are measured using the reverse-time in-
`tegrated impulse response method of [31]. The frequency-inde-
`pendent reflection coefficients of the walls and ceiling are ad-
`justed to achieve the desired level of reverberation: a 60-dB re-
`verberation decay time of 300 ms for the moderately reverberant
`case, and 600 ms for the highly reverberant case.
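The noise-generation step described above (coloring uncorrelated Gaussian samples with a Cholesky factor of the diffuse-field covariance) might look as follows. This is a sketch under stated assumptions: the sinc coherence model for a spherically isotropic field, the 75-point frequency grid, the six-microphone array, and all function names are choices made for the example rather than details given in the paper.

```python
import numpy as np

def diffuse_covariance(positions, freqs, c=343.0):
    """Frequency-averaged coherence matrix of a spherically isotropic noise field."""
    diff = positions[:, None, :] - positions[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)              # pairwise distances in metres
    # np.sinc(x) = sin(pi x) / (pi x), so this is sin(2 pi f d / c) / (2 pi f d / c).
    coh = np.sinc(2.0 * freqs[:, None, None] * dist[None, :, :] / c)
    return coh.mean(axis=0)

def correlated_noise(cov, n_samples, rng):
    """Premultiply white Gaussian vectors by the Cholesky factor of cov."""
    L = np.linalg.cholesky(cov)                       # lower triangular, cov = L @ L.T
    white = rng.standard_normal((cov.shape[0], n_samples))
    return L @ white

n_mics, radius = 6, 0.069
psi = 2.0 * np.pi * np.arange(n_mics) / n_mics
positions = radius * np.column_stack((np.cos(psi), np.sin(psi)))
freqs = np.linspace(300.0, 4000.0, 75)                # band used for the averaging

cov = diffuse_covariance(positions, freqs)
rng = np.random.default_rng(3)
noise = correlated_noise(cov, 200000, rng)
sample_cov = noise @ noise.T / noise.shape[1]
print(np.max(np.abs(sample_cov - cov)))
```

The generated noise is temporally white; only its spatial correlation across the microphones follows the averaged diffuse-field model.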
In the first simulation scenario, the microphones are all perfectly
calibrated with unity gains. In the second simulation scenario, the
presence of uncalibrated microphones is simulated by setting each
$\alpha_n$, $n = 0, 1, \ldots, N-1$, to a uniformly distributed random
number over the range (0.2, 1).
`The source signal is convolved with the synthetic impulse re-
`sponses. Appropriately scaled temporally white Gaussian noise
`is then added at the microphones to achiev