MAY 2007   VOLUME 15   NUMBER 4   ITASD8   (ISSN 1558-7916)

PAPERS

Speech Analysis
A Soft Voice Activity Detection Using GARCH Filter and Variance Gamma Distribution ... R. Tahmasbi and S. Rezaei ... 1129
Single and Multiple F0 Contour Estimation Through Parametric Spectrogram Modeling of Speech in Noisy Environments ... J. Le Roux, H. Kameoka, N. Ono, A. de Cheveigné, and S. Sagayama ... 1135

Speech Coding
Memory-Based Vector Quantization of LSF Parameters by a Power Series Approximation ... T. Eriksson and F. Nordén ... 1146
Rate Allocation for Noncollaborative Multiuser Speech Communication Systems Based on Bargaining Theory ... B. J. Borgström, M. van der Schaar, and A. Alwan ... 1156
Wideband Speech Coding Advances in VMR-WB Standard ... M. Jelínek and R. Salami ... 1167

Speech Enhancement
A Spectral Conversion Approach to Single-Channel Speech Enhancement ... A. Mouchtaris, J. Van der Spiegel, P. Mueller, and P. Tsakalides ... 1180
Noisy Speech Enhancement Using Harmonic-Noise Model and Codebook-Based Post-Processing ... E. Zavarehei, S. Vaseghi, and Q. Yan ... 1194

Speech Adaptation/Normalization
Environmental Independent ASR Model Adaptation/Compensation by Bayesian Parametric Representation ... X. Wang and D. O’Shaughnessy ... 1204

Speech Synthesis and Generation
Simulation of Losses Due to Turbulence in the Time-Varying Vocal System ... P. Birkholz, D. Jackèl, and B. J. Kröger ... 1218
Variable-Length Unit Selection in TTS Using Structural Syntactic Cost ... C.-H. Wu, C.-C. Hsia, J.-F. Chen, and J.-F. Wang ... 1227

Speech Data Mining and Document Retrieval
Audio Signal Feature Extraction and Classification Using Local Discriminant Bases ... K. Umapathy, S. Krishnan, and R. K. Rao ... 1236

Content-Based Audio Processing
Melody Transcription From Music Audio: Approaches and Evaluation ... G. E. Poliner, D. P. W. Ellis, A. F. Ehmann, E. Gómez, S. Streich, and B. Ong ... 1247
Melody Extraction and Musical Onset Detection via Probabilistic Models of Framewise STFT Peak Data ... H. Thornburg, R. J. Leistikow, and J. Berger ... 1257

Audio Coding
Low Bit-Rate Object Coding of Musical Audio Using Bayesian Harmonic Models ... E. Vincent and M. D. Plumbley ... 1273

(Contents Continued on Back Cover)
`
`IPR PETITION
`US RE48,371
`Sonos Ex. 1011

(Contents Continued from Front Cover)

Audio Analysis and Synthesis
Joint Detection and Tracking of Time-Varying Harmonic Components: A Flexible Bayesian Approach ... C. Dubois and M. Davy ... 1283

Audio for Multimedia
Robust Data Hiding in Audio Using Allpass Filters ... H. M. A. Malik, R. Ansari, and A. A. Khokhar ... 1296

Echo Cancellation
System Identification in the Short-Time Fourier Transform Domain With Crossband Filtering ... Y. Avargel and I. Cohen ... 1305
An Improvement of the Two-Path Algorithm Transfer Logic for Acoustic Echo Cancellation ... F. Lindström, C. Schüldt, and I. Claesson ... 1320

Loudspeaker and Microphone Array Signal Processing
Direction of Arrival Estimation Using the Parameterized Spatial Correlation Matrix ... J. Dmochowski, J. Benesty, and S. Affes ... 1327
Multichannel Bin-Wise Robust Frequency-Domain Adaptive Filtering and Its Application to Adaptive Beamforming ... W. Herbordt, H. Buchner, S. Nakamura, and W. Kellermann ... 1340

Large Vocabulary Continuous Recognition/Search
Efficient WFST-Based One-Pass Decoding With On-The-Fly Hypothesis Rescoring in Extremely Large Vocabulary Continuous Speech Recognition ... T. Hori, C. Hori, Y. Minami, and A. Nakamura ... 1352

Robust Speech Recognition
A Study of Variable-Parameter Gaussian Mixture Hidden Markov Modeling for Noisy Speech Recognition ... X. Cui and Y. Gong ... 1366

General Topics in Speech Recognition
Template-Based Continuous Speech Recognition ... M. De Wachter, M. Matton, K. Demuynck, P. Wambacq, R. Cools, and D. Van Compernolle ... 1377
Exploiting Temporal Correlation of Speech for Error Robust and Bandwidth Flexible Distributed Speech Recognition ... Z.-H. Tan, P. Dalsgaard, and B. Lindberg ... 1391
A Framework for Secure Speech Recognition ... P. Smaragdis and M. Shashanka ... 1404

Acoustic Modeling for Automatic Speech Recognition
Automatic Model Complexity Control Using Marginalized Discriminative Growth Functions ... X. Liu and M. Gales ... 1414
Trajectory Clustering for Solving the Trajectory Folding Problem in Automatic Speech Recognition ... Y. Han, J. de Veth, and L. Boves ... 1425

Speaker Characterization and Recognition
Joint Factor Analysis Versus Eigenchannels in Speaker Recognition ... P. Kenny, G. Boulianne, P. Ouellet, and P. Dumouchel ... 1435
Speaker and Session Variability in GMM-Based Speaker Verification ... P. Kenny, G. Boulianne, P. Ouellet, and P. Dumouchel ... 1448
Automatic Speaker Clustering Using a Voice Characteristic Reference Space and Maximum Purity Estimation ... W.-H. Tsai, S.-S. Cheng, and H.-M. Wang ... 1461

Source Separation and Signal Enhancement
Separation of Singing Voice From Music Accompaniment for Monaural Recordings ... Y. Li and D. L. Wang ... 1475

Signal Processing for Music
Parameterized Finite Difference Schemes for Plates: Stability, the Reduction of Directional Dispersion and Frequency Warping ... S. Bilbao, L. Savioja, and J. O. Smith ... 1488

CORRESPONDENCE
On the Ramsey Class of Interleavers for Robust Speech Recognition in Burst-Like Packet Loss ... A. M. Gómez, A. M. Peinado, V. Sánchez, and A. J. Rubio ... 1496

EDICS—Editors’ Information Classification Scheme ... 1500
Information for Authors ... 1502

ANNOUNCEMENTS
Call for Papers—Special Issue on New Approaches to Statistical Speech Processing ... 1504
Call for Papers—2007 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics ... 1505
Call for Papers—IEEE TRANSACTIONS ON MULTIMEDIA Special Issue on Multimedia Applications in Mobile/Wireless Context ... 1506

`IEEE SIGNAL PROCESSING SOCIETY
The Signal Processing Society is an organization, within the framework of the IEEE, of members with principal professional interest in the technology of transmission, recording, reproduction,
processing, and measurement of speech and other signals by digital electronic, electrical, acoustic, mechanical, and optical means, the components and systems to accomplish
`these and related aims, and the environmental, psychological, and physiological factors concerned therewith. All members of the IEEE are eligible for membership in the Society and
`will receive this TRANSACTIONS upon payment of the annual Society membership fee of $27.00 plus an annual subscription fee of $34.00. For information on joining, write to the IEEE
`at the address below. Member copies of Transactions/Journals are for personal use only.
`
`Publications Board Chair
`K. J. R. LIU, VP-Publications
`Univ. Maryland
`College Park, MD 20742
`
`Signal Processing Letters
`A. B. GERSHMAN, Editor-in-Chief
`Darmstadt Univ. Technol.
`D-64283 Darmstadt, Germany
`
`ALEX ACERO
`Microsoft Res.
`Redmond, WA 98052-6399
`
`ABEER ALWAN
`Dept. of Elect. Eng.
`UCLA
Los Angeles, CA 90095
`
`BILL BYRNE
`Eng. Dept.
Cambridge, CB2 1PZ, U.K.
`
`ISRAEL COHEN
`Technion–Israel Inst. of Technol.
`Technion City, Haifa 32000, Israel
`
`YARIV EPHRAIM
`George Mason Univ.
`Dept. of ECE
`Fairfax, VA 22030-444
`
`DILEK HAKKANI-TÜR
`Intl. Comput. Sci. Inst. (ICSI)
Berkeley, CA 94704
`
`MARY HARPER
`Purdue Univ.
`Sch. of Elect. & Comput. Eng.
West Lafayette, IN 47907-1285
`
`Trans. on Signal Processing
`A.-J. VAN DER VEEN, Editor-in-Chief
`Delft Univ. Technol.
`2628 CD Delft, The Netherlands
`
`SOCIETY PUBLICATIONS
`Trans. on Image Processing
`C. A. BOUMAN, Editor-in-Chief
`Purdue Univ.
`W. Lafayette, IN 47906
`
`Trans. on Information Forensics
`and Security
`P. MOULIN, Editor-in-Chief
`Univ. Illinois
`Urbana, IL 61801
`
`SP Magazine
`S.-F. CHANG, Editor-in-Chief
`Columbia Univ.
`New York, NY 10027
`
`TRANSACTIONS ASSOCIATE EDITORS
MARK HASEGAWA-JOHNSON
Univ. of Illinois
Elect. & Comput. Eng.
Beckman Inst.
Urbana, IL 61801

SYLVAIN MARCHAND
Univ. of Bordeaux
351, cours de la Liberation
F-33405, Talence Cedex, France
`
`TIMOTHY J. HAZEN
`MIT Comput. Sci. and
`Artificial Intelligence Lab.
`Cambridge, MA 02139
`
`HONG–GOO KANG
`Yonsei Univ. Seoul
`South Korea, 120–749
`
`SIMON KING
`Ctr. for Speech Technol. Res.
`Univ. of Edinburgh
`Edinburgh, EH8 9LW, U.K.
`
`SEN KUO
`Dept. of Elect. Eng.
`Northern Illinois Univ.
`Dekalb, IL 60115
`
`SHOJI MAKINO
`NTT Communication Res. Labs.
`Kyoto, 619-0237, Japan
`
`RAINER MARTIN
`Ruhr-Univ. Bochum
`Inst. of Communication Acoustics
`Bochum, Germany 44780
`
`HELEN MENG
`Chinese Univ. of Hong Kong
`Shatin, New Territory
`Hong Kong, SAR, China
`
`MAURIZIO OMOLOGO
`ITC-IRST
`38050, Povo-Trento, Italy
`
RUDOLF RABENSTEIN
`Telecommunications Inst. 1
`Univ. of Erlangen-Nuremberg
`Cauerstrasse 7
`D-91058 Erlangen, Germany
`
`SUSANTO RAHARDJA
`Div. of Info. Eng.
`21 Heng Mui Keng Terrace
`Singapore, 119613
`ersusanto@ntu.edu.sg
`
`Trans. on Audio, Speech,
`and Language Processing
`M. OSTENDORF, Editor-in-Chief
`Univ. Washington
`Seattle, WA 98195-2500
`
`Trans. on SP, IP, ASL, and IFS
`SPS Publication Office
`IEEE Signal Processing Society
`Piscataway, NJ 08854
`
GAEL RICHARD
37-39 rue Dareau, Bureau/Office DA 412
75014 Paris, France

GERHARD RIGOLL
Munich Univ. of Technol.
D-80290 Munich, Germany

HIROSHI SAWADA
NTT Communication Sci. Labs.
NTT Corp.
Kyoto, 619-0237, Japan

MALCOLM SLANEY
Yahoo! Res.
Santa Clara, CA 95054

ARUN C. SURENDRAN
Comm. Collaboration and Signal
Processing at Microsoft Res.
One Microsoft Way
Redmond, WA 98052

GEORGE TZANETAKIS
Dept. of Comput. Sci.
Univ. of Victoria
Victoria, BC V8W 3P6, Canada

VESA VALIMAKI
TKK-Helsinki Univ. of Technol.
Dept. of Elect. Comm. Eng.
Lab. Acoustics Audio Signal Processing
FI-02015 TKK, Espoo, Finland
`
IEEE Officers
LEAH H. JAMIESON, President and CEO
LEWIS M. TERMAN, President-Elect
CELIA L. DESMOND, Secretary
DAVID G. GREEN, Treasurer
MICHAEL R. LIGHTNER, Past President
MOSHE KAM, Vice President, Educational Activities
JOHN B. BAILLIEUL, Vice President, Publication Services and Products
PEDRO A. RAY, Vice President, Regional Activities
GEORGE W. ARNOLD, President, IEEE Standards Association
PETER W. STAECKER, Vice President, Technical Activities
JOHN W. MEREDITH, President, IEEE-USA
RICHARD V. COX, Director, Division IX—Signals and Applications

`IEEE Executive Staff
`JEFFRY W. RAYNES, CAE, Executive Director & Chief Operating Officer
`MATTHEW LOEB, Corporate Strategy & Communications
`DONALD CURTIS, Human Resources
`RICHARD D. SCHWARTZ, Business Administration
`ANTHONY DURNIAK, Publications Activities
CHRIS BRANTLEY, IEEE-USA
JUDITH GORMAN, Standards Activities
MARY WARD-CALLAN, Technical Activities
CECELIA JANKOWSKI, Regional Activities
SALLY A. WASELIK, Information Technology
`BARBARA COBURN STOLER, Educational Activities
`
`IEEE Periodicals
`Transactions/Journals Department
`Staff Director: FRAN ZAPPULLA
`Editorial Director: DAWN MELLEY Production Director: ROBERT SMREK
`Managing Editor: MARTIN J. MORAHAN Associate Editor: ANDREW SWARTZ
`
`IEEE TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING (ISSN 1558-7916) is published eight times a year in January, February, March, May, July, August, September,
`and November by the Institute of Electrical and Electronics Engineers, Inc. Responsibility for the contents rests upon the authors and not upon the IEEE, the Society/Council, or its
`members. IEEE Corporate Office: 3 Park Avenue, 17th Floor, New York, NY 10016-5997. IEEE Operations Center: 445 Hoes Lane, P.O. Box 1331, Piscataway, NJ 08855-1331. NJ
`Telephone: +1 732 981 0060. Price/Publication Information: Individual copies: IEEE Members $20.00 (first copy only), nonmembers $90.00 per copy. (Note: Postage and handling
`charge not included.) Member and nonmember subscription prices available upon request. Available in microfiche and microfilm. Copyright and Reprint Permissions: Abstracting is
`permitted with credit to the source. Libraries are permitted to photocopy for private use of patrons, provided the per-copy fee indicated in the code at the bottom of the first page is paid
`through the Copyright Clearance Center, 222 Rosewood Drive, Danvers, MA 01923. For all other copying, reprint, or republication permission, write to Copyrights and Permissions
`Department, IEEE Publications Administration, 445 Hoes Lane, P.O. Box 1331, Piscataway, NJ 08855-1331. Copyright © 2007 by the Institute of Electrical and Electronics Engineers,
`Inc. All rights reserved. Periodicals Postage Paid at New York, NY and at additional mailing offices. Postmaster: Send address changes to IEEE TRANSACTIONS ON AUDIO, SPEECH, AND
`LANGUAGE PROCESSING, IEEE, 445 Hoes Lane, P.O. Box 1331, Piscataway, NJ 08855-1331. GST Registration No. 125634188. Printed in U.S.A.
`
`Digital Object Identifier 10.1109/TASL.2007.897012
`
`

`

`IEEE TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, VOL. 15, NO. 4, MAY 2007
`
`Direction of Arrival Estimation Using the
`Parameterized Spatial Correlation Matrix
`
`Jacek Dmochowski, Jacob Benesty, Senior Member, IEEE, and Sofiène Affes, Senior Member, IEEE
`
`Abstract—The estimation of the direction-of-arrival (DOA) of
`one or more acoustic sources is an area that has generated much
`interest in recent years, with applications like automatic video
`camera steering and multiparty stereophonic teleconferencing
`entering the market. DOA estimation algorithms are hindered by
`the effects of background noise and reverberation. Methods based
`on the time-differences-of-arrival (TDOA) are commonly used
`to determine the azimuth angle of arrival of an acoustic source.
`TDOA-based methods compute each relative delay using only two
`microphones, even though additional microphones are usually
`available. This paper deals with DOA estimation based on spatial
`spectral estimation, and establishes the parameterized spatial cor-
`relation matrix as the framework for this class of DOA estimators.
`This matrix jointly takes into account all pairs of microphones,
`and is at the heart of several broadband spatial spectral estima-
`tors, including steered-response power (SRP) algorithms. This
`paper reviews and evaluates these broadband spatial spectral esti-
`mators, comparing their performance to TDOA-based locators. In
`addition, an eigenanalysis of the parameterized spatial correlation
`matrix is performed and reveals that such analysis allows one to
`estimate the channel attenuation from factors such as uncalibrated
`microphones. This estimate generalizes the broadband minimum
`variance spatial spectral estimator to more general signal models.
`A DOA estimator based on the multichannel cross correlation
`coefficient (MCCC) is also proposed. The performance of all
`proposed algorithms is included in the evaluation. It is shown that
`adding extra microphones helps combat the effects of background
`noise and reverberation. Furthermore, the link between accurate
`spatial spectral estimation and corresponding DOA estimation
`is investigated. The application of the minimum variance and
`MCCC methods to the spatial spectral estimation problem leads
`to better resolution than that of the commonly used fixed-weighted
`SRP spectrum. However, this increased spatial spectral resolution
`does not always translate to more accurate DOA estimation.
`
`Index Terms—Circular arrays, delay-and-sum beamforming
`(DSB), direction-of-arrival (DOA) estimation, linear spatial predic-
`tion, microphone arrays, multichannel cross correlation coefficient
`(MCCC), spatial correlation matrix, time delay estimation.
`
Manuscript received September 6, 2006; revised November 8, 2006. The
associate editor coordinating the review of this manuscript and approving it for
publication was Dr. Hiroshi Sawada.
The authors are with the Institut National de la Recherche Scientifique-
Énergie, Matériaux, et Télécommunications (INRS-EMT), Université du
Québec, Montréal, QC H5A 1K6, Canada (e-mail: dmochow@emt.inrs.ca).
Digital Object Identifier 10.1109/TASL.2006.889795

I. INTRODUCTION

Propagating signals contain much information about
the sources that emit them. Indeed, the location of a signal
source is of much interest in many applications, and there exists
a large and increasing need to locate and track sound sources.
For example, a signal-enhancing beamformer [1], [2] must
continuously monitor the position of the desired signal source in
order to provide the desired directivity and interference
suppression. This paper is concerned with estimating the
direction-of-arrival (DOA) of acoustic sources in the presence of
significant levels of both noise and reverberation.
`The two major classes of broadband DOA estimation
`techniques are those based on the time-differences-of-arrival
`(TDOA) and spatial spectral estimators. The latter terminology
`arises from the fact that spatial frequency corresponds to the
`wavenumber vector, whose direction is that of the propagating
`signal. Therefore, by looking for peaks in the spatial spectrum,
`one is determining the DOAs of the dominant signal sources.
`The TDOA approach is based on the relationship between
`DOA and relative delays across the array. The problem of es-
`timating these relative delays is termed “time delay estimation”
`[3]. The generalized cross-correlation (GCC) approach of [4],
`[5] is the most popular time delay estimation technique. Alter-
`native methods of estimating the TDOA include phase regres-
`sion [6] and linear prediction preprocessing [7]. The resulting
`relative delays are then mapped to the DOA by an appropriate
`inverse function that takes into account array geometry.
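As a rough illustration of the TDOA route just described, the following Python sketch estimates the relative delay between two microphone signals from the peak of their plain (unweighted) cross-correlation, the simplest member of the GCC family, and then maps that delay to an angle. The function names, the two-microphone far-field mapping, the sign convention, and the default speed of sound are illustrative assumptions, not taken from the paper.

```python
import math
import random


def estimate_delay(x0, x1, max_lag):
    """Return the lag (in samples) that maximizes the cross-correlation.

    Plain, unweighted cross-correlation is used here; a positive result
    means that x1 is a delayed copy of x0.
    """
    n = len(x0)
    best_lag, best_score = 0, -math.inf
    for lag in range(-max_lag, max_lag + 1):
        # Sum x0[k] * x1[k + lag] over the indices where both samples exist.
        score = sum(
            x0[k] * x1[k + lag]
            for k in range(max(0, -lag), min(n, n - lag))
        )
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def delay_to_angle(lag, fs, spacing, c=343.0):
    """Map a two-microphone delay to a far-field angle (radians).

    Hypothetical convention: the angle is measured from the axis joining
    the two microphones, so that cos(angle) = c * lag / (fs * spacing).
    """
    # Clamp so that an integer-rounded lag cannot push acos out of range.
    cos_angle = max(-1.0, min(1.0, c * lag / (fs * spacing)))
    return math.acos(cos_angle)


if __name__ == "__main__":
    random.seed(0)
    s = [random.gauss(0.0, 1.0) for _ in range(2000)]
    true_lag = 3
    x0 = s
    x1 = [0.0] * true_lag + s[:-true_lag]  # x0 delayed by three samples
    lag = estimate_delay(x0, x1, max_lag=10)
    print(lag, math.degrees(delay_to_angle(lag, fs=16000, spacing=0.1)))
```

Only one microphone pair enters each delay estimate here, which is the limitation the following paragraphs discuss.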
`Even though multiple-microphone arrays are commonplace
`in time delay estimation algorithms, there has not emerged a
`clearly preferred way of combining the various measurements
`from multiple microphones. Notice that in the TDOA approach,
`the time delays are estimated using only two microphones at a
`time, even though one usually has several more sensor outputs at
`one’s disposal. The averaging of measurements from indepen-
`dent pairs of microphones is not an optimal way of combining
`the measurements, as each computed time delay is derived from
`only two microphones, and thus often contains significant levels
`of corrupting noise and interference. It is thus well known that
`current TDOA-based DOA estimation algorithms are plagued
`by the effects of both noise and especially reverberation.
`To that end, Griebel and Brandstein [8] map all “realizable”
`combinations of microphone-pair delays to the corresponding
`source locations, and maximize simultaneously the sum (across
`various microphone pairs) of cross-correlations across all pos-
`sible locations. This approach is notable, as it jointly maximizes
`the results of the cross-correlations between the various micro-
`phone pairs.
`The spatial spectral estimation problem is well defined in the
`narrowband signal community. There are three major methods:
`the steered conventional beamformer approach (also termed
`the “Bartlett” estimate), the minimum variance estimator (also
`termed the “Capon” or maximum-likelihood estimator), and
`the linear spatial predictive spectral estimator. Reference [9]
`provides an excellent overview of these approaches. These
`three approaches are unified in their use of the narrowband
`spatial correlation matrix, as outlined in the next section.
`The situation is more scattered in the broadband signal
`case. Various spectral estimators have been proposed, but there
`does not exist any common framework for organizing these
`approaches. The steered conventional beamformer approach
`applies to broadband signals. The delay-and-sum beamformer
`(DSB) is steered to all possible DOAs to determine the DOA
`which emits the most energy. An alternative formulation of
`this approach is termed the “steered-response power” (SRP)
`method, which exploits the fact that the DSB output power may
`be written as a sum of cross-correlations. The computational
`requirements of the SRP method are a hindrance to practical
`implementation [8]. A detailed treatment of steered-beam-
`former approaches to source localization is given in [10], and
`the statistical optimality of the approach is shown in [11]–[13].
`Krolik and Swingler develop a broadband minimum variance
`estimator based on the steered conventional beamformer [14],
`which may be viewed as an adaptive weighted SRP algorithm.
`There have also been approaches that generalize narrowband
`localization algorithms (i.e., MUSIC [15]) to broadband sig-
`nals through subband processing and subsequent combining
`(see [16], for example). A broadband linear spatial predictive
`approach to time delay estimation is outlined in [17] and [18].
`This approach, which is limited to linear array geometries,
`makes use of all the channels in a joint fashion via the time
`delay parameterized spatial correlation matrix.
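The statement that the DSB output power may be written as a sum of cross-correlations can be checked numerically. The sketch below is a minimal illustration under simplifying assumptions that are not from the paper: steering delays are integers, signals are indexed circularly so that the identity holds exactly, and the candidate "directions" are represented by a hypothetical per-microphone delay step.

```python
import random


def circ_xcorr(a, b, lag):
    """Circular cross-correlation: sum over k of a[k] * b[(k + lag) mod L]."""
    length = len(a)
    return sum(a[k] * b[(k + lag) % length] for k in range(length))


def dsb_power(signals, delays):
    """Power of the delay-and-sum output; channel n is advanced by delays[n]."""
    length = len(signals[0])
    out = [
        sum(y[(k + d) % length] for y, d in zip(signals, delays))
        for k in range(length)
    ]
    return sum(v * v for v in out)


def srp(signals, delays):
    """The same power, written as a sum of cross-correlations over all pairs."""
    n_ch = len(signals)
    return sum(
        circ_xcorr(signals[n], signals[m], delays[m] - delays[n])
        for n in range(n_ch)
        for m in range(n_ch)
    )


if __name__ == "__main__":
    random.seed(0)
    length, n_ch, true_step = 256, 3, 2
    s = [random.gauss(0.0, 1.0) for _ in range(length)]
    # Channel n receives the source delayed by n * true_step samples.
    y = [[s[(k - n * true_step) % length] for k in range(length)]
         for n in range(n_ch)]
    for step in range(-3, 4):
        d = [n * step for n in range(n_ch)]
        print(step, round(dsb_power(y, d), 3), round(srp(y, d), 3))
```

Scanning the candidate steps and keeping the one with the largest power is the steered-response search; the double loop over channel pairs also hints at why the method is computationally demanding.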
`This paper attempts to unify broadband spatial spectral esti-
`mators into a single framework and compares their performance
`from a DOA estimation standpoint to TDOA-based algorithms.
`This unified framework is the azimuth parameterized spatial
`correlation matrix, which is at the heart of all broadband spa-
`tial spectral estimators.
`In addition, several new ideas are presented. First, due to
`the parametrization, well-known narrowband array processing
`notions [19] are applied to the DOA estimation problem, gen-
`eralizing these ideas to the broadband case. A DOA estimator
`based on the eigenanalysis of the parameterized spatial corre-
`lation matrix ensues. More importantly, it is shown that this
`eigenanalysis allows one to estimate the channel attenuation
`from factors such as uncalibrated microphones. The existing
`minimum variance approach to broadband spatial spectral esti-
`mation is reformulated in the context of a more general signal
`model which accounts for such attenuation factors. Further-
`more, the ideas of [17] and [18] are extended to more general
`array geometries (i.e., circular) via the azimuth parameterized
`spatial correlation matrix, resulting in a minimum entropy DOA
`estimator.
`Circular arrays (see [20]–[22], for example) offer some ad-
`vantages over their linear counterparts. A circular array provides
spatial discrimination over the entire 360° azimuth range, which
`is particularly important for applications that require front-to-
`back signal enhancement, such as teleconferencing. Further-
`more, a circular array geometry allows for more compact de-
`signs. While the contents of this paper apply generally to planar
`array geometries, the circular geometry is used throughout the
`simulation portion.
`
`Fig. 1. Circular array geometry.
`
`Section II presents the signal propagation model in planar
`(i.e., circular) arrays and serves as the foundation for the re-
`mainder of the paper. Section III reviews the role of the tradi-
`tional, nonparameterized spatial correlation matrix in narrow-
`band DOA estimation, and shows how the parameterized ver-
`sion of the spatial correlation matrix allows for generalization
`to broadband signals. Section IV describes the existing and pro-
`posed broadband spatial spectral estimators in terms of the pa-
`rameterized spatial correlation matrix. Section V outlines the
`simulation model employed throughout this paper and evaluates
`the performance of all spatial spectral estimators and TDOA-
`based methods in both reverberation- and noise-limited envi-
`ronments. Concluding statements are given in Section VI.
`The spatial spectral estimation approach to DOA estimation
`has limitations in certain reverberant environments. If an inter-
`fering signal or reflection arrives at the array with a higher en-
`ergy than the direct-path signal, the DOA estimate will be false,
`even though the spatial spectral estimate is accurate. Such situ-
`ations arise when the source is oriented towards a reflective bar-
`rier and away from the array. This problem is beyond the scope
`of this paper and is not addressed herein. Rather, the focus of
`this paper is on the evaluation of spatial spectral estimators in
`noisy and reverberant environments and on their application to
`DOA estimation.
`
II. SIGNAL MODEL

Assume a planar array of $N$ elements in a 2-D geometry, shown in
Fig. 1 (i.e., circular geometry), whose outputs are denoted by
$y_n(k)$, $n = 0, 1, \ldots, N-1$, where $k$ is the time index.
Denoting the azimuth angle of arrival by $\theta$, propagation of the
signal from a far-field source to microphone $n$ is modeled as:

$$y_n(k) = \alpha_n s[k - t - f_n(\theta)] + v_n(k) \tag{1}$$

where $\alpha_n$, $n = 0, 1, \ldots, N-1$, are the attenuation factors
due to channel effects, $t$ is the propagation time, in samples, from
the unknown source $s(k)$ to microphone 0, $v_n(k)$ is an additive noise
signal at the $n$th microphone, and $f_n(\theta)$,
$n = 0, 1, \ldots, N-1$, is the relative delay between microphones 0
and $n$. In matrix form, the array signal model becomes:

$$\begin{bmatrix} y_0(k) \\ y_1(k) \\ \vdots \\ y_{N-1}(k) \end{bmatrix}
= \begin{bmatrix} \alpha_0 & 0 & \cdots & 0 \\ 0 & \alpha_1 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \alpha_{N-1} \end{bmatrix}
\begin{bmatrix} s[k - t - f_0(\theta)] \\ s[k - t - f_1(\theta)] \\ \vdots \\ s[k - t - f_{N-1}(\theta)] \end{bmatrix}
+ \begin{bmatrix} v_0(k) \\ v_1(k) \\ \vdots \\ v_{N-1}(k) \end{bmatrix} \tag{2}$$

The function $f_n$ relates the angle of arrival to the relative delays
between microphone elements 0 and $n$, and is derived for the case
of an equispaced circular array in the following manner. When
operating in the far-field, the time delay between microphone $n$
and the center of the array is given by [23]

$$\tau_n = \frac{R}{c} \cos(\theta - \psi_n), \quad n = 0, 1, \ldots, N-1 \tag{3}$$

where the azimuth angle (relative to the selected angle reference)
of the $n$th microphone is denoted by $\psi_n$,
$n = 0, 1, \ldots, N-1$, $R$ denotes the array radius, and $c$ is the
speed of signal propagation. It easily follows that

$$f_n(\theta) = \tau_0 - \tau_n = \frac{R}{c} \left[ \cos(\theta - \psi_0) - \cos(\theta - \psi_n) \right], \quad n = 0, 1, \ldots, N-1 \tag{4}$$

It is also worth mentioning that the additive noise $v_n(k)$ may
be temporally correlated with the desired signal $s(k)$. In that
case, a reverberant environment is modeled. The anechoic
environment is modeled by making the additive noise temporally
uncorrelated with the source signal. In either case, the additive
noise may be spatially correlated across the sensors.
It should also be stated that the signal model presented above
makes use of the far-field assumption, in that the incoming wave
is assumed to be planar, such that all sensors perceive the same
DOA. An error is incurred if the signal source is actually
located in the near-field; in that case, the relative delays are also
a function of the range. In the most general case (i.e., a source
in the near-field of a 3-D geometry), the function $f_n$ takes three
parameters: the azimuth, range, and elevation. This paper
focuses on a specific subset of this general model: a source located
in the far-field with only a slight elevation, such that a single
parameter suffices. This is commonly the case in a
teleconferencing environment. Nevertheless, the concepts of this paper,
although presented in a far-field planar context, easily generalize
to the near-field spherical case by including the range and
elevation in the forthcoming parametrization.

III. PARAMETERIZED SPATIAL CORRELATION MATRIX

In narrowband signal applications, a common space-time
statistic is that of the spatial correlation matrix [19], which is
given by

$$\mathbf{R}_a = E\left[ \mathbf{y}_a(k) \mathbf{y}_a^H(k) \right] \tag{5}$$

where

$$\mathbf{y}_a(k) = \begin{bmatrix} y_0(k) & y_1(k) & \cdots & y_{N-1}(k) \end{bmatrix}^T \tag{6}$$

the superscript $H$ denotes conjugate transpose, as complex
signals are commonly used in narrowband applications, and $T$
denotes the transpose of a matrix or vector. To steer these array
outputs to a particular DOA, one applies a complex weight to
each sensor output, whose phase performs the steering, and then
sums the sensor outputs to form the output beam. Now, if the
input signal is no longer narrowband, each frequency requires
its own complex weight to appropriately phase-shift the signal
at that frequency. In the context of broadband spatial spectral
estimation, the spatial correlation matrix may be computed at
each temporal frequency, and the resulting spatial spectrum is
now a function of the temporal frequency. For broadband
applications, these narrowband estimates may be assimilated into a
time-domain statistic, a procedure termed “focusing,” which is
described in [24]. The resulting structure is termed a “focused
covariance matrix.”
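The narrowband steering procedure described above can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the authors' implementation: the microphones are taken to be equispaced on a circle with azimuths 2πn/N (plus an optional offset), delays are expressed in seconds, the expectation is replaced by a sample average over snapshots, and all function names are hypothetical.

```python
import cmath
import math


def relative_delays(theta, n_mics, radius, c=343.0, psi0=0.0):
    """Far-field delay (seconds) of each microphone relative to microphone 0.

    Assumes an equispaced circular array of the given radius (metres).
    """
    psi = [psi0 + 2.0 * math.pi * n / n_mics for n in range(n_mics)]
    return [
        radius / c * (math.cos(theta - psi[0]) - math.cos(theta - p))
        for p in psi
    ]


def steering_vector(theta, omega, n_mics, radius):
    """Complex weights whose phases model the delays at angular frequency omega."""
    return [cmath.exp(-1j * omega * d)
            for d in relative_delays(theta, n_mics, radius)]


def spatial_correlation(snapshots):
    """Sample estimate of E[y y^H] from a list of length-N complex snapshots."""
    n = len(snapshots[0])
    return [
        [sum(y[i] * y[j].conjugate() for y in snapshots) / len(snapshots)
         for j in range(n)]
        for i in range(n)
    ]


def steered_power(R, w):
    """Output power of the beam (1/N) w^H y, i.e. w^H R w / N^2."""
    n = len(w)
    return sum(
        (w[i].conjugate() * R[i][j] * w[j]).real
        for i in range(n) for j in range(n)
    ) / n ** 2


if __name__ == "__main__":
    omega = 2.0 * math.pi * 1000.0          # a 1 kHz narrowband source
    n_mics, radius = 8, 0.05
    a_src = steering_vector(math.radians(60.0), omega, n_mics, radius)
    # Noise-free snapshots: the source amplitude varies, the geometry does not.
    snaps = [[amp * v for v in a_src] for amp in (1.0, 0.5, -2.0)]
    R = spatial_correlation(snaps)
    spectrum = [
        steered_power(R, steering_vector(math.radians(deg), omega, n_mics, radius))
        for deg in range(360)
    ]
    print(max(range(360), key=spectrum.__getitem__))
```

Because the weights depend on `omega`, a broadband signal would need a separate weight set, and a separate correlation matrix, at every frequency, which is the difficulty the text raises.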
In this paper, broadband spatial spectral estimation is
addressed in another manner. Instead of implementing the
steering delays in the complex weighting at each sensor, the
delays are actually implemented as a time-delay in the spatial
correlation matrix, which is now parameterized. Thus, each
microphone output is appropriately delayed before computing
this parameterized spatial correlation matrix:

$$\mathbf{y}_a(k, \theta) = \begin{bmatrix} y_0(k) & y_1[k + f_1(\theta)] & \cdots & y_{N-1}[k + f_{N-1}(\theta)] \end{bmatrix}^T \tag{7}$$

and real signals are assumed from this point on. The delays are
a function of the assumed azimuth DOA, which becomes the
parameter. The parameterized spatial correlation matrix is
formally written as shown by (8) and (9).
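A minimal Python sketch of this idea follows: each channel is time-advanced by its relative delay for a candidate azimuth, and the correlation matrix of the aligned outputs is then estimated. Several choices here are assumptions made for illustration only: the array is equispaced and circular, the radius is given in samples (radius times sampling rate divided by the speed of sound), delays are rounded to integers, indices wrap around, the expectation is replaced by a time average, and the sum of all matrix entries serves as a simple score.

```python
import math
import random


def relative_delays(theta, n_mics, radius_samples, psi0=0.0):
    """Delay, in samples, of microphone n relative to microphone 0.

    Assumes an equispaced circular array; radius_samples = R * fs / c.
    """
    return [
        radius_samples * (math.cos(theta - psi0)
                          - math.cos(theta - psi0 - 2.0 * math.pi * n / n_mics))
        for n in range(n_mics)
    ]


def parameterized_correlation(signals, theta, radius_samples):
    """Sample correlation matrix (N x N) of the time-aligned microphone outputs."""
    n_mics, length = len(signals), len(signals[0])
    # Integer-rounded steering delays (a simplification; no fractional delays).
    shifts = [round(d) for d in relative_delays(theta, n_mics, radius_samples)]
    R = [[0.0] * n_mics for _ in range(n_mics)]
    for k in range(length):
        # Advance channel n by its delay; indices wrap around for simplicity.
        v = [y[(k + sh) % length] for y, sh in zip(signals, shifts)]
        for i in range(n_mics):
            for j in range(n_mics):
                R[i][j] += v[i] * v[j] / length
    return R


def simulate(theta, n_mics, radius_samples, length=400, noise=0.1, seed=0):
    """Hypothetical test data: one far-field white source plus sensor noise."""
    rng = random.Random(seed)
    s = [rng.gauss(0.0, 1.0) for _ in range(length)]
    shifts = [round(d) for d in relative_delays(theta, n_mics, radius_samples)]
    return [
        [s[(k - sh) % length] + noise * rng.gauss(0.0, 1.0)
         for k in range(length)]
        for sh in shifts
    ]


if __name__ == "__main__":
    y = simulate(math.radians(100.0), n_mics=4, radius_samples=6.0)
    # Score each candidate azimuth by the sum of all matrix entries.
    scores = {
        deg: sum(map(sum, parameterized_correlation(y, math.radians(deg), 6.0)))
        for deg in range(0, 360, 5)
    }
    print(max(scores, key=scores.get))
```

When the candidate azimuth matches the source direction, every aligned channel carries the same source waveform, so the off-diagonal entries approach the diagonal ones; at a mismatched azimuth they fall toward zero.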
The matrix is not simply the array observation matrix, as is
commonly used in narrowband beamforming models. Instead, it
