`
`
`
`HANDBOOK ON ARRAY
`PROCESSING AND
`SENSOR NETWORKS
`
`Simon Haykin
`K. J. Ray Liu
`
`A JOHN WILEY & SONS, INC., PUBLICATION
`
`
`
`Copyright © 2009 by John Wiley & Sons, Inc. All rights reserved
`
`Published by John Wiley & Sons, Inc., Hoboken, New Jersey
`Published simultaneously in Canada
`
`No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or
`by any means, electronic, mechanical, photocopying, recording, scanning, or otherwise, except as
`permitted under Section 107 or 108 of the 1976 United States Copyright Act, without either the prior
`written permission of the Publisher, or authorization through payment of the appropriate per-copy fee to
`the Copyright Clearance Center, Inc., 222 Rosewood Drive, Danvers, MA 01923, (978) 750-8400, fax
`(978) 750-4470, or on the web at www.copyright.com. Requests to the Publisher for permission should be
`addressed to the Permissions Department, John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030,
`(201) 748-6011, fax (201) 748-6008, or online at http://www.wiley.com/go/permissions.
`
`Limit of Liability/Disclaimer of Warranty: While the publisher and author have used their best efforts in
`preparing this book, they make no representations or warranties with respect to the accuracy or
`completeness of the contents of this book and specifically disclaim any implied warranties of
`merchantability or fitness for a particular purpose. No warranty may be created or extended by sales
`representatives or written sales materials. The advice and strategies contained herein may not be suitable
`for your situation. You should consult with a professional where appropriate. Neither the publisher nor
`author shall be liable for any loss of profit or any other commercial damages, including but not limited to
`special, incidental, consequential, or other damages.
`
For general information on our other products and services or for technical support, please contact our
`Customer Care Department within the United States at (800) 762-2974, outside the United States at (317)
`572-3993 or fax (317) 572-4002.
`
`Wiley also publishes its books in a variety of electronic formats. Some content that appears in print may
`not be available in electronic formats. For more information about Wiley products, visit our web site at
`www.wiley.com.
`
`Library of Congress Cataloging-in-Publication Data:
`
`Haykin, Simon
`Handbook on array processing and sensor networks / Simon Haykin, K. J. Ray Liu.
`p. cm.
`Includes bibliographical references and index.
`ISBN 978-0-470-37176-3 (cloth)
`1. Sensor networks. 2. Antenna arrays. 3. Array processors. I. Liu, K. J. Ray, 1961- II. Title.
`TK7872.D48H39 2009
`621.382’4–dc22
`
`2008055880
`
`Printed in the United States of America
`
`10 9 8 7 6 5 4 3 2 1
`
`
`
CONTENTS

Preface
Simon Haykin and K. J. Ray Liu

Contributors

Introduction
Simon Haykin

PART I: FUNDAMENTAL ISSUES IN ARRAY SIGNAL PROCESSING

1 Wavefields
Alfred Hanssen
1.1 Introduction
1.2 Harmonizable Stochastic Processes
1.3 Stochastic Wavefields
1.4 Wave Dispersion
1.5 Conclusions
Acknowledgments
References

2 Spatial Spectrum Estimation
Petar M. Djurić
2.1 Introduction
2.2 Fundamentals
2.3 Temporal Spectrum Estimation
2.4 Spatial Spectrum Estimation
2.5 Final Remarks
References

3 MIMO Radio Propagation
Tricia J. Willink
3.1 Introduction
3.2 Space–Time Propagation Environment
3.3 Propagation Models
3.4 Measured Channel Characteristics
3.5 Stationarity
`
`
`
3.6 Summary
References

4 Robustness Issues in Sensor Array Processing
Alex B. Gershman
4.1 Introduction
4.2 Direction-of-Arrival Estimation
4.3 Adaptive Beamforming
4.4 Conclusions
Acknowledgments
References

5 Wireless Communication and Sensing in Multipath Environments Using Multiantenna Transceivers
Akbar M. Sayeed and Thiagarajan Sivanadyan
5.1 Introduction and Overview
5.2 Multipath Wireless Channel Modeling in Time, Frequency, and Space
5.3 Point-to-Point MIMO Wireless Communication Systems
5.4 Active Wireless Sensing with Wideband MIMO Transceivers
5.5 Concluding Remarks
References

PART II: NOVEL TECHNIQUES FOR AND APPLICATIONS OF ARRAY SIGNAL PROCESSING

6 Implicit Training and Array Processing for Digital Communication Systems
Aldo G. Orozco-Lugo, Mauricio Lara and Desmond C. McLernon
6.1 Introduction
6.2 Classification of Implicit Training Methods
6.3 IT-Based Estimation for a Single User
6.4 IT-Based Estimation for Multiple Users Exploiting Array Processing: Continuous Transmission
6.5 IT-Based Estimation for Multiple Users Exploiting Array Processing: Packet Transmission
6.6 Open Research Problems
Acknowledgments
References

7 Unitary Design of Radar Waveform Diversity Sets
Michael D. Zoltowski, Tariq R. Qureshi, Robert Calderbank and Bill Moran
7.1 Introduction
`
`
`
7.2 2 × 2 Space–Time Diversity Waveform Design
7.3 4 × 4 Space–Time Diversity Waveform Design
7.4 Waveform Families Based on Kronecker Products
7.5 Introduction to Data-Dependent Waveform Design
7.6 3 × 3 and 6 × 6 Waveform Scheduling
7.7 Summary
References

8 Acoustic Array Processing for Speech Enhancement
Markus Buck, Eberhard Hänsler, Mohamed Krini, Gerhard Schmidt and Tobias Wolff
8.1 Introduction
8.2 Signal Processing in Subband Domain
8.3 Multichannel Echo Cancellation
8.4 Speaker Localization
8.5 Beamforming
8.6 Sensor Calibration
8.7 Postprocessing
8.8 Conclusions
References

9 Acoustic Beamforming for Hearing Aid Applications
Simon Doclo, Sharon Gannot, Marc Moonen and Ann Spriet
9.1 Introduction
9.2 Overview of noise reduction techniques
9.3 Monaural beamforming
9.4 Binaural beamforming
9.5 Conclusion
References

10 Underdetermined Blind Source Separation Using Acoustic Arrays
Shoji Makino, Shoko Araki, Stefan Winter and Hiroshi Sawada
10.1 Introduction
10.2 Underdetermined Blind Source Separation of Speeches in Reverberant Environments
10.3 Sparseness of Speech Sources
10.4 Binary Mask Approach to Underdetermined BSS
10.5 MAP-Based Two-Stage Approach to Underdetermined BSS
10.6 Experimental Comparison with Binary Mask Approach and MAP-Based Two-Stage Approach
10.7 Concluding Remarks
References
`
`
`
11 Array Processing in Astronomy
Douglas C.-J. Bock
11.1 Introduction
11.2 Correlation Arrays
11.3 Aperture Plane Phased Arrays
11.4 Future Directions
11.5 Conclusion
References

12 Digital 3D/4D Ultrasound Imaging Array
Stergios Stergiopoulos
12.1 Background
12.2 Next-Generation 3D/4D Ultrasound Imaging Technology
12.3 Computing Architecture and Implementation Issues
12.4 Experimental Planar Array Ultrasound Imaging System
12.5 Conclusion
References

PART III: FUNDAMENTAL ISSUES IN DISTRIBUTED SENSOR NETWORKS

13 Self-Localization of Sensor Networks
Joshua N. Ash and Randolph L. Moses
13.1 Introduction
13.2 Measurement Types and Performance Bounds
13.3 Localization Algorithms
13.4 Relative and Transformation Error Decomposition
13.5 Conclusions
References

14 Multitarget Tracking and Classification in Collaborative Sensor Networks via Sequential Monte Carlo Methods
Tom Vercauteren and Xiaodong Wang
14.1 Introduction
14.2 System Description and Problem Formulation
14.3 Sequential Monte Carlo Methods
14.4 Joint Single-Target Tracking and Classification
14.5 Multiple-Target Tracking and Classification
14.6 Sensor Selection
14.7 Simulation Results
14.8 Conclusion
`
`
`
Appendix: Derivations of (14.38) and (14.40)
References

15 Energy-Efficient Decentralized Estimation
Jin-Jun Xiao, Shuguang Cui and Zhi-Quan Luo
15.1 Introduction
15.2 System Model
15.3 Digital Approaches
15.4 Analog Approaches
15.5 Analog versus Digital
15.6 Extension to Vector Model
15.7 Concluding Remarks
Acknowledgments
References

16 Sensor Data Fusion with Application to Multitarget Tracking
R. Tharmarasa, K. Punithakumar, T. Kirubarajan and Y. Bar-Shalom
16.1 Introduction
16.2 Tracking Filters
16.3 Data Association
16.4 Out-of-Sequence Measurements
16.5 Results with Real Data
16.6 Summary
References

17 Distributed Algorithms in Sensor Networks
Usman A. Khan, Soummya Kar and José M. F. Moura
17.1 Introduction
17.2 Preliminaries
17.3 Distributed Detection
17.4 Consensus Algorithms
17.5 Zero-Dimension (Average) Consensus
17.6 Consensus in Higher Dimensions
17.7 Leader–Follower (Type) Algorithms
17.8 Localization in Sensor Networks
17.9 Linear System of Equations: Distributed Algorithm
17.10 Conclusions
References

18 Cooperative Sensor Communications
Ahmed K. Sadek, Weifeng Su and K. J. Ray Liu
18.1 Introduction
`
`
`
18.2 Cooperative Relay Protocols
18.3 SER Analysis and Optimal Power Allocation
18.4 Energy Efficiency in Cooperative Sensor Networks
18.5 Experimental Results
18.6 Conclusions
References

19 Distributed Source Coding
Zixiang Xiong, Angelos D. Liveris and Yang Yang
19.1 Introduction
19.2 Theoretical Background
19.3 Code Designs
19.4 Applications
19.5 Conclusions
References

20 Network Coding for Sensor Networks
Christina Fragouli
20.1 Introduction
20.2 How Can We Implement Network Coding in a Practical Sensor Network?
20.3 Data Collection and Coupon Collector Problem
20.4 Distributed Storage and Sensor Network Data Persistence
20.5 Decentralized Operation and Untuned Radios
20.6 Broadcasting and Multipath Diversity
20.7 Network, Channel, and Source Coding
20.8 Identity-Aware Sensor Networks
20.9 Discussion
Acknowledgments
References

21 Information-Theoretic Studies of Wireless Sensor Networks
Liang-Liang Xie and P. R. Kumar
21.1 Introduction
21.2 Information-Theoretic Studies
21.3 Relay Schemes
21.4 Wireless Network Coding
21.5 Concluding Remarks
Acknowledgments
References
`
`
`
PART IV: NOVEL TECHNIQUES FOR AND APPLICATIONS OF DISTRIBUTED SENSOR NETWORKS

22 Distributed Adaptive Learning Mechanisms
Ali H. Sayed and Federico S. Cattivelli
22.1 Introduction
22.2 Motivation
22.3 Incremental Adaptive Solutions
22.4 Diffusion Adaptive Solutions
22.5 Concluding Remarks
Acknowledgments
References

23 Routing for Statistical Inference in Sensor Networks
A. Anandkumar, A. Ephremides, A. Swami and L. Tong
23.1 Introduction
23.2 Spatial Data Correlation
23.3 Statistical Inference of Markov Random Fields
23.4 Optimal Routing for Inference with Local Processing
23.5 Conclusion and Future Work
23.6 Bibliographic Notes
References

24 Spectral Estimation in Cognitive Radios
Behrouz Farhang-Boroujeny
24.1 Filter Bank Formulation of Spectral Estimators
24.2 Polyphase Realization of Uniform Filter Banks
24.3 Periodogram Spectral Estimator
24.4 Multitaper Spectral Estimator
24.5 Filter Bank Spectral Estimator
24.6 Distributed Spectrum Sensing
24.7 Discussion
Appendix A: Effective Degree of Freedom
Appendix B: Explanation to the Results of Table 24.1
References

25 Nonparametric Techniques for Pedestrian Tracking in Wireless Local Area Networks
Azadeh Kushki and Konstantinos N. Plataniotis
25.1 Introduction
`
`
`
25.2 WLAN Positioning Architectures
25.3 Signal Models
25.4 Zero-Memory Positioning
25.5 Dynamic Positioning Systems
25.6 Cognition and Feedback
25.7 Tracking Example
25.8 Conclusions
References

26 Reconfigurable Self-Activating Ion-Channel-Based Biosensors
Vikram Krishnamurthy and Bruce Cornell
26.1 Introduction
26.2 Biosensors Built of Ion Channels
26.3 Joint Input Excitation Design and Concentration Classification for Biosensor
26.4 Decentralized Deployment of Dense Network of Biosensors
26.5 Discussion and Extensions
References

27 Biochemical Transport Modeling, Estimation, and Detection in Realistic Environments
Mathias Ortner and Arye Nehorai
27.1 Introduction
27.2 Physical and Statistical Models
27.3 Transport Modeling Using Monte Carlo Approximation
27.4 Localizing the Source(s)
27.5 Sequential Detection
27.6 Conclusion
References

28 Security and Privacy for Sensor Networks
Wade Trappe, Peng Ning and Adrian Perrig
28.1 Introduction
28.2 Security and Privacy Challenges
28.3 Ensuring Integrity of Measurement Process
28.4 Availability Attacks against the Wireless Link
28.5 Ensuring Privacy of Routing Contexts
28.6 Conclusion
References

Index
`
`
`
`Preface
`
`More than a decade ago, a book edited by Simon Haykin on array processing was a huge
`success with significant impact. Ever since, the field of array processing has grown
`to the extent that one can see its applications everywhere. Indeed, traditional array
`techniques form the foundation of the more general sensor processing and networking
`that continue to advance the state-of-the-art research and find ubiquitous applications.
Sensor networks and array processing form the two pillars of this handbook.
Sensor networks and array processing have, in their own individual ways, been active areas
of research for several decades, with applications in wireless communications, radar, radio astronomy, and
biomedical engineering, to name just a few important ones. This new Handbook on
`Array Processing and Sensor Networks addresses these topics in an organized manner
`under a single umbrella.
`The major goal of this Handbook is to collect tutorial discussions on recent advance-
`ments and state-of-the-art results by providing a comprehensive overview of array
`processing and sensor networks. It covers fundamental principles as well as applica-
`tions. This handbook features some of the most prominent researchers from all over
`the world, addressing the important topics that we consider to be essential for mak-
`ing the handbook highly valuable to the readers; this point is well borne out by the
`list of contents.
`This Handbook consists of an introductory chapter, followed by 28 chapters that are
`written by leading authorities in sensor networks and array signal processing. Putting
`all this material together under a single umbrella, we have a Handbook that is one of
`a kind.
This Handbook should appeal to researchers as well as graduate students and new-
comers to the field of sensor networks and array processing, who will not only learn about the
many facets of these two subjects but can also exploit the possibilities for cross-fertilization
between them. Moreover, this Handbook may also appeal to professors teaching
graduate courses on sensor networks and/or array signal processing.
`
`Simon Haykin
`McMaster University
`
`K. J. Ray Liu
`University of Maryland, College Park
`
`
`
`
`CHAPTER 8
`Acoustic Array Processing for Speech
`Enhancement
`
Markus Buck1, Eberhard Hänsler2, Mohamed Krini1, Gerhard Schmidt1, and
Tobias Wolff1
1Harman/Becker Automotive Systems, Ulm, Germany
2Technische Universität Darmstadt, Darmstadt, Germany
`
`8.1 INTRODUCTION
`
`Today, hands-free functionality has become a standard for acoustic front ends of
`telephone and speech dialog systems. However, these systems are often applied in
adverse acoustic environments where ambient noise as well as acoustic couplings of
loudspeaker signals are superposed on the desired speech signal. Furthermore, the level of the
`desired speech signal is reduced due to the relatively large distance between speaker
`and microphones. Therefore, the quality of the microphone signals is poor. Methods
`for controlling noise and echo without degrading speech quality are still subject of
`intensive research.
The application of array processing has opened new opportunities in speech and audio
signal processing. There are at least two favorable features: reduction of processing
power and/or considerable improvement of system performance.
Speech enhancement procedures demand large amounts of computing power even
if they work with a single input channel. This attribute stems from the properties of
`the electroacoustic environment. Splitting the signal into subbands and, for example,
`adapting cancellation filters for the subband signals reduces the necessary computa-
`tion power notably [1] and—as an additional benefit—leads to a perceptibly higher
`performance of the system.
Relying on multiple input channels permits the design of systems with improved
quality as compared to single-channel systems. Moreover, it also makes it possible to solve
problems such as source localization and tracking or source separation, which cannot be
addressed if just a single signal is available.
We organize this chapter around the example of a multichannel speech enhancement
system (see Fig. 8.1). We discuss solutions to the various tasks, such as signal analysis
and synthesis, echo cancellation, speaker localization, delay alignment and microphone
calibration, beamforming, residual echo and noise suppression, dereverberation, and
signal reconstruction. We also show the interrelations of the corresponding subsystems.
`
`
`
`
`232
`
`ACOUSTIC ARRAY PROCESSING FOR SPEECH ENHANCEMENT
`
`Analysis
`filter bank
`
`8.2
`
`y1 (n)
`
`yM (n)
`
`E1 (ejΩ
`
`m, n)
`
`EM (ejΩ
`
`m, n)
`
`Delay
`alignment
`and
`microphone
`calibration
`
`Beam-
`forming
`
`Post-
`processing
`
`Synthesis
`filter bank
`
`8.6
`
`8.5
`
`8.7
`
`8.2
`
`ˆ
`s (n)
`
`8.4
`
`Speaker
`localization
`
`XR (e jΩ
`
`m, n)
`
`U (e jΩ
`
`m, n)
`(includes residual echo
`and noise suppression,
`dereverberation,
`spatial postprocessing,
`and signal reconstruction)
`
`D1 (ejΩ
`ˆ
`
`m, n)
`
`Echo
`cancellation
`
`m, n)
`
`X1 (ejΩ
`Analysis
`filter bank
`x1 (n)
`
`xR (n)
`
`8.3
`
`8.2
`
`Figure 8.1 Example of multichannel speech enhancement system. Numbers in the frames refer
`to the related sections of this chapter.
`
A look at Figure 8.1 gives an idea of the complexity of a modern speech enhancement
system. Only brief discussions are possible within the scope of this handbook; thus,
for further details, we refer the reader to the references cited in this chapter.
`
`8.1.1 Acoustic Environments and Speech Signal Properties
`
`Problems such as acoustic echo cancellation, speech dereverberation, and noise reduc-
`tion arise whenever loudspeakers and microphones are placed in the same enclosure
`and the signal sources (speakers) are not close to the microphones. In this case micro-
`phones pick up not only the speech signal but also the reverberated signals from the
`loudspeakers together with environmental noise. Consequently, remote speakers have
`to listen to their echo, which is delayed by the round trip time of the transmission
`system.
`For low sound pressure and no overload of the converters, loudspeaker–enclosure–
`microphone (LEM) systems may be modeled with sufficient accuracy as linear systems.
`Their impulse responses can be described by a sequence of delayed delta impulses.
`The delays are associated with the geometrical lengths of related propagation paths. The
`amplitudes of the impulses depend on the reflection coefficients of the boundaries and
`on the inverse of the path lengths. As a first-order approximation one can assume that
`the impulse response decays exponentially.
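As a rough illustration of this first-order model, the sketch below generates a synthetic LEM impulse response as a sparse sequence of delayed impulses with an exponentially decaying envelope. The reverberation time, the number of reflections, and the random placement of the paths are illustrative assumptions, not measured values.

```python
import numpy as np

def synthetic_lem_response(fs=11000, t60=0.3, n_reflections=400, seed=0):
    """Toy LEM impulse response following the first-order model above:
    delayed impulses whose amplitudes decay exponentially with the delay.
    All parameter values (fs, T60, number of reflections) are illustrative
    assumptions, not measured data."""
    rng = np.random.default_rng(seed)
    length = int(t60 * fs)                  # model the response up to T60
    decay = np.log(1000.0) / length         # 60 dB drop over `length` samples
    h = np.zeros(length)
    delays = rng.integers(1, length, n_reflections)   # path delays in samples
    amps = rng.choice([-1.0, 1.0], n_reflections) * np.exp(-decay * delays)
    np.add.at(h, delays, amps)              # superpose the reflections
    h[0] = 1.0                              # direct path
    return h

h = synthetic_lem_response()
print(f"{len(h)} taps; envelope decays by 60 dB over the modeled interval")
```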
`Figure 8.2 shows the impulse responses of LEM systems measured in a passenger
`car (left) and in an office (right). They are considerably “longer” than impulse responses
`of ordinary electrical systems.
`
`
`
Figure 8.2 Impulse responses measured in a car (left) and in an office (right) (sampling frequency fs = 11 kHz).
`
`In addition, the impulse responses are highly sensitive to any changes such as the
`movement of a person within the enclosure (enclosure dislocation). Both properties
together explain why high-order adaptive filters are required for echo canceling and
`noise suppression.
`The complexity of speech and audio signal processing is further increased by the
`fact that most of the time desired signals and noise are available as their sum only and
`that both occupy the same frequency range. Only sophisticated estimation procedures
`lead the way out of this difficulty.
`Speech signals are characterized by a large bandwidth: The frequency range relevant
`for signal processing spreads over a broad spectrum reaching from 70 Hz to 4 kHz for
`telephone applications and to even 10 kHz for broadband systems. Furthermore, the
`signal-to-noise ratio (SNR) of speech signals varies strongly over time. Whereas this
`nonstationary characteristic often poses difficulties for signal processing, it also offers
`the benefit of using speech pauses to analyze the noise.
`
`8.2 SIGNAL PROCESSING IN SUBBAND DOMAIN
`
Processing speech and audio signals in a subband domain offers a number of advantages.
Therefore, analysis–synthesis schemes are essential parts of processing systems
for such signals (see Fig. 8.1). A multitude of structures has been investigated and
proposed for individual applications [2, 3]. For speech coding, for example, the preferred
choice is analysis–synthesis schemes that conserve the amount of data when transforming the
time-domain signal into the subband or short-term frequency domain.
Perfect reconstruction, meaning that the cascade of the analysis and synthesis stages
with no processing in between amounts to a pure delay, is very important here. Medium or even
large aliasing components in the individual subbands are tolerable as long as the synthesis
stage compensates for all of them. For speech processing applications,
however, large aliasing components are not tolerable as they limit the performance of
subband filters, whereas perfect reconstruction is not necessarily required.
`
`
`
`234
`
`ACOUSTIC ARRAY PROCESSING FOR SPEECH ENHANCEMENT
`
`Analysis
`filter bank
`
`Delay
`alignment and
`microphone
`calibration
`
`Beam-
`forming
`
`Synthesis
`filter bank
`
`echo
`cancellation
`
`Analysis
`filter bank
`
`Post-
`processing
`
`Speaker
`localization
`
`xr (n)
`
`ym (n)
`
`Block
`extraction
`
`Block
`extraction
`
`Rsub
`
`Down-
`sampling
`
`Rsub
`
`gana, k
`
`Windowing
`
`gana, k
`
`DFT
`
`DFT
`
`Xr (e jΩ
`m, n)
`
`Ym(e jΩ
`m, n)
`
`Signal
`processing
`
`ˆ
`s (n)
`
`Block
`concen-
`tration
`
`Windowing
`
`Rsub
`
`Up-sampling
`
`gsyn, k
`
`IDFT
`
`S (e jΩ
`ˆ
`m, n)
`
`Figure 8.3 Basic building blocks of a DFT-based analysis–synthesis system for loudspeaker
`signals xr (n) and microphone signals ym(n).
`
`The most popular uniform analysis–synthesis scheme applied for speech enhance-
`ment periodically performs DFTs and inverse DFTs of overlapping signal segments.
`The basic structure of such a system is depicted in Figure 8.3. First, a block of the last
`Nana samples of the input signals ym(n) and xr (n), respectively, is extracted at time n,
`with m ∈ {1, . . . , M} and r ∈ {1, . . . , R}. Subsequently, appropriate downsampling is
`applied to obtain the desired rate in the subband domain. Usually, the downsampling
`factor Rsub is chosen such that successive blocks overlap by 50 or 75%. Each segment
`is multiplied by a window function gana,k. Applying a DFT to the windowed signal
`vector results in the NDFT = Nana-point short-term spectrum of the respective block:1
Y_m\bigl(e^{j\Omega_\mu}, n\bigr) = \sum_{k=0}^{N_{\mathrm{DFT}}-1} y_m(nR_{\mathrm{sub}} - k)\, g_{\mathrm{ana},k}\, e^{-j\Omega_\mu k} .   (8.1)
`
`1The analysis of the loudspeaker signals xr (n) is performed in a similar manner.
`
`
`
`
For this type of analysis scheme the frequency supporting points Ω_μ are distributed
equidistantly over the normalized frequency range:

\Omega_\mu = \frac{2\pi}{N_{\mathrm{DFT}}}\, \mu, \qquad \mu \in \{0, \ldots, N_{\mathrm{DFT}} - 1\} .   (8.2)
`
`In order to match the frequency resolution of the filter bank with that of the human
`auditory system, the frequency resolution can be decreased toward higher frequencies.
`This can be achieved by replacing the delay elements within the block extraction
`unit with appropriately designed all-pass filters [4]. However, this feature comes at
`increased computational cost.
After processing the microphone subband signals Y_m(e^{jΩ_μ}, n) and the reference
subband signals X_r(e^{jΩ_μ}, n) with specific signal processing methods, as described in the
next sections, enhanced subband signals Ŝ(e^{jΩ_μ}, n) are obtained. For synthesizing the
output signal, first an inverse DFT is performed. The resulting vector is weighted with
the synthesis window function g_{syn,k} and overlapping vectors are added (overlap-add
method [2]) to compute the output signal.
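A minimal sketch of this synthesis stage follows: each subband spectrum is transformed back by an inverse DFT, weighted with a synthesis window, and the overlapping frames are added; the result is divided by the overlap-added product of analysis and synthesis windows. The Hann synthesis window and the explicit numerical normalization are assumptions of this illustration.

```python
import numpy as np

def subband_synthesis(spectra, n_dft=256, r_sub=64):
    """Synthesis stage sketch: inverse DFT per frame, synthesis windowing,
    and overlap-add. Window choice and normalization are assumptions."""
    g_ana = np.hanning(n_dft)            # must match the analysis window
    g_syn = np.hanning(n_dft)            # synthesis window g_syn,k
    num_frames = spectra.shape[0]
    out_len = (num_frames - 1) * r_sub + n_dft
    out = np.zeros(out_len)
    norm = np.zeros(out_len)             # overlap-added window product
    for n in range(num_frames):
        frame = np.fft.ifft(spectra[n]).real * g_syn
        out[n * r_sub : n * r_sub + n_dft] += frame
        norm[n * r_sub : n * r_sub + n_dft] += g_ana * g_syn
    return out / np.maximum(norm, 1e-8)

# round trip without any processing in between: interior samples of the
# input are recovered up to numerical precision (edges are not covered)
fs = 11025
y = np.random.randn(fs)
g_ana = np.hanning(256)
frames = np.array([np.fft.fft(y[i:i + 256] * g_ana)
                   for i in range(0, len(y) - 256 + 1, 64)])
y_hat = subband_synthesis(frames)
n = len(y_hat)
err = np.max(np.abs(y_hat[256:n - 256] - y[256:n - 256]))
print(f"maximum reconstruction error (interior): {err:.2e}")
```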
`Whenever time-variant processing—such as noise suppression—is applied, the
`usage of synthesis windowing has the positive effect of reducing distortions at frame
`boundaries [5], especially when the spectral attenuation has changed significantly
`between consecutive blocks.
If synthesis windowing is to be applied and, at the same time, the frequency selectivity
of the analysis is to be enhanced, the DFT and its inverse can be extended to
a more general structure such as a polyphase filter bank (see Fig. 8.4).2 To achieve
this, preceding weighted blocks need to be added before the DFT is performed.
`
Figure 8.4 Polyphase-based analysis system. Details about synthesis schemes can be found, e.g., in [1]. (The figure shows a cascade of delays of NDFT samples, parallel block extraction, windowing with the components gana,1,k, gana,2,k, . . . of the prototype filter, downsampling by Rsub, and a common DFT producing Ym(e^{jΩ_μ}, n).)
`
`2Note that the basic analysis–synthesis scheme described before is a special case of a polyphase filter bank
`where just one polyphase component is utilized.
`
`
`
`236
`
`ACOUSTIC ARRAY PROCESSING FOR SPEECH ENHANCEMENT
`
Furthermore, the window function g_{ana,k} has to be extended to a so-called prototype
low-pass filter that covers the current as well as previous frames. Using more than one
frame for the current spectral analysis leads to a better frequency resolution and lower
in-band aliasing. The inherent disadvantage, however, is that the impulse
response of the prototype filter is much longer than the DFT order, N_ana = ΔN · N_DFT
(with ΔN being the number of blocks to be added), which results in a reduced time
resolution. If the input signal changes its spectral characteristics within the time
corresponding to the memory size of the analysis stage of the filter bank, the short-time
spectrum is smoothed. For very long prototype filters, that is, when the ratio of the filter
length to the sampling frequency N_ana/f_s exceeds about 150 ms, this spectral
smoothing results in so-called postechoes. They appear at sudden changes of the coefficients
of the residual echo and noise suppression filter. Due to a long prototype filter
the synthesis stage has a long memory, too. Filling this memory with large amplitudes
during a speech sequence and feeding in very small amplitudes afterwards leads to
artifacts while the stored large samples are still being output. However, if the length of the
prototype low-pass filter is not chosen too large, polyphase filter banks are a good
candidate for analysis–synthesis systems in speech enhancement applications.
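The sketch below illustrates such a polyphase (weighted overlap-add) analysis: a prototype low-pass filter spanning ΔN blocks weights the input, the ΔN weighted blocks of length NDFT are added, and a single DFT is taken. The windowed-sinc prototype used here is an illustrative assumption, not the filter design discussed in this chapter.

```python
import numpy as np

def wola_analysis(y, n_dft=256, r_sub=64, n_blocks=4):
    """Polyphase / weighted-overlap-add analysis sketch (cf. Fig. 8.4):
    a prototype low-pass filter of length n_blocks * n_dft weights the input,
    the n_blocks segments are folded (added), and one DFT of size n_dft is
    taken. The prototype below (windowed sinc) is an illustrative assumption."""
    l_proto = n_blocks * n_dft
    k = np.arange(l_proto) - (l_proto - 1) / 2
    proto = np.sinc(k / n_dft) * np.hanning(l_proto)   # prototype low-pass filter
    num_frames = (len(y) - l_proto) // r_sub + 1
    spectra = np.empty((num_frames, n_dft), dtype=complex)
    for n in range(num_frames):
        seg = y[n * r_sub : n * r_sub + l_proto] * proto
        folded = seg.reshape(n_blocks, n_dft).sum(axis=0)   # add the weighted blocks
        spectra[n] = np.fft.fft(folded)
    return spectra

Y = wola_analysis(np.random.randn(11025))
print(Y.shape)
```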
`The main advantage of processing in the subband domain (compared to standard
`time-domain processing) is the reduction of the computational complexity. Depending
`on the size of the involved filters (e.g., for echo cancellation), the DFT-order NDFT,
`and the subsampling factor Rsub reductions of about 50–90% can be achieved. How-
`ever, the required memory usually increases by a factor of 2–4. A further advantage
`is the orthogonality of the individual subband signals. This allows, for example, for
`faster filter convergence whenever the time-domain signal is not white (which is def-
`initely true for speech signals). The main drawback of analysis–synthesis systems is
`the delay that is introduced in the signal path. Especially for hands-free applications
`this delay has a negative effect on the overall communication quality and thus limita-
`tions have been specified (e.g., for hands-free processing in mobile phones a maximum
`additional delay for signal processing of 39 ms is allowed [6]).
`For the rest of this chapter we will use a filter bank with NDFT = 256 subbands
`within all examples. For the window function a Hann [7] window of length Nana =
`NDFT is utilized. Neighboring frames overlap by 75%, resulting from a subsampling fac-
`tor Rsub = 64. The time-domain input signals are sampled at fs = 11,025 Hz, leading
`to a filter bank delay of about 23 ms.
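For reference, the stated parameters reproduce the 75% overlap and the roughly 23 ms delay as follows, assuming the algorithmic delay is dominated by one analysis frame:

```python
# Filter bank parameters used in the remainder of the chapter; the delay
# estimate assumes the latency is dominated by one analysis frame length.
N_DFT = 256          # number of subbands / DFT length
N_ANA = N_DFT        # Hann analysis window length
R_SUB = 64           # subsampling (frame shift) factor
FS = 11025           # sampling rate in Hz

overlap = 1 - R_SUB / N_ANA        # 0.75 -> neighboring frames overlap by 75%
delay_ms = 1000 * N_ANA / FS       # about 23.2 ms
print(f"overlap = {overlap:.0%}, filter-bank delay approx {delay_ms:.1f} ms")
```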
`
`8.3 MULTICHANNEL ECHO CANCELLATION
`
To avoid the problems of annoying echoes and howling of a communication loop, a set
of subband adaptive filters can be placed parallel to the LEM system (see Fig. 8.5). If
one succeeds in matching the impulse responses of the filters exactly with the subband
impulse responses of the LEM system, the signals X_r(e^{jΩ_μ}, n) and E_m(e^{jΩ_μ}, n) are
perfectly decoupled without any disturbing effects for the users of the electroacoustic
system.
Since, in real applications, a perfect match (over all times and all situations) cannot
be achieved, the remaining subband error signals E_m(e^{jΩ_μ}, n) still contain echo
components. For further reduction of these components an additional filter for residual
echo suppression is applied in the transmitting path, as described in Section 8.7.
`
`
`
Figure 8.5 Basic building blocks of a multichannel subband acoustic echo cancellation system. (The figure shows the LEM system with loudspeaker signals x1(n), . . . , xR(n) and microphone signals y1(n), . . . , yM(n), analysis filter banks, the echo cancellation filters Ĥr,m(e^{jΩ_μ}, n) producing echo estimates D̂m(e^{jΩ_μ}, n) from the reference subband signals Xr(e^{jΩ_μ}, n), the error signals Em(e^{jΩ_μ}, n), and the subsequent delay alignment and microphone calibration, speaker localization, beamforming, postprocessing, and synthesis stages.)
`
`The complexity of echo cancellation schemes grows with the number of microphones
`M and the number of reference channels R. For multimicrophone–multireference sys-
`tems this leads to a huge computational complexity. A reduction can be achieved if
`echo cancellation is efficiently combined with beamforming. Section 8.5.4 will show
`details about such combinations.
`
`8.3.1 Adaptation Algorithms
`
`The majority of implementations of acoustic echo canceling systems use the normalized
`least-mean-square (NLMS) algorithm to update the adaptive filters. This gradient-type
`algorithm minimizes the mean-square error [8]. The update equation is given by
\hat{H}_{r,m}\bigl(e^{j\Omega_\mu}, n+1\bigr) = \hat{H}_{r,m}\bigl(e^{j\Omega_\mu}, n\bigr) + \mu_{r,m}\bigl(e^{j\Omega_\mu}, n\bigr)\, \frac{X_r\bigl(e^{j\Omega_\mu}, n\bigr)\, E_m^{*}\bigl(e^{j\Omega_\mu}, n\bigr)}{\sum_{r=1}^{R} X_r^{H}\bigl(e^{j\Omega_\mu}, n\bigr)\, X_r\bigl(e^{j\Omega_\mu}, n\bigr)} .   (8.3)
`
`
`
`238
`
`ACOUSTIC ARRAY PROCESSING FOR SPEECH ENHANCEMENT
`
`The term in the denominator of the update part in Eq. (8.3) represents a normalization
`according to the energy of all input vectors:
X_r\bigl(e^{j\Omega_\mu}, n\bigr) = \bigl[\, X_r\bigl(e^{j\Omega_\mu}, n\bigr), \ldots, X_r\bigl(e^{j\Omega_\mu}, n - N_{\mathrm{ec}} + 1\bigr) \,\bigr]^{T} ,   (8.4)
`
with N_ec denoting the length of the adaptive filters. This normalization is necessary due to the
high nonstationarity of speech signals. The step size of the update is controlled by
the factor μ_{r,m}(e^{jΩ_μ}, n). The algorithm is stable (in the mean-square sense) for 0 <
μ_{r,m}(e^{jΩ_μ}, n) < 2. Reducing the step size is necessary to prevent divergence of the
filter coefficients in case of strong local speech signals and/or local background noise
that are part of the microphone outputs.
To perform the filter update the subband error signals E_m(e^{jΩ_μ}, n) are computed by
subtracting the estimated echo components D̂_m(e^{jΩ_μ}, n) from the microphone signals
Y_m(e^{jΩ_μ}, n):

E_m\bigl(e^{j\Omega_\mu}, n\bigr) = Y_m\bigl(e^{j\Omega_\mu}, n\bigr) - \underbrace{\sum_{r=1}^{R} \underbrace{\hat{H}_{r,m}^{H}\bigl(e^{j\Omega_\mu}, n\bigr)\, X_r\bigl(e^{j\Omega_\mu}, n\bigr)}_{\hat{D}_{r,m}(e^{j\Omega_\mu}, n)}}_{\hat{D}_m(e^{j\Omega_\mu}, n)} .   (8.5)
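The following sketch implements one iteration of the update equations (8.3)–(8.5) for a single subband, together with a small simulated example. The fixed step size, the regularization constant, the filter length, and the white reference signals are assumptions for illustration; in practice the step size μ_{r,m}(e^{jΩ_μ}, n) is controlled adaptively as described above.

```python
import numpy as np

def nlms_update(H, X, Y, mu=0.5, eps=1e-6):
    """One NLMS iteration for a single subband (cf. Eqs. (8.3)-(8.5)).
    H : (R, M, N_ec) complex filters H_hat_{r,m}
    X : (R, N_ec)    reference vectors X_r, most recent sample first (Eq. (8.4))
    Y : (M,)         microphone subband samples Y_m
    mu, eps : fixed step size and regularization constant (assumptions here;
              the chapter uses an adaptive step-size control instead).
    Returns the updated filters and the error signals E_m of Eq. (8.5)."""
    # estimated echoes:  D_hat_m = sum_r H_{r,m}^H X_r          (Eq. (8.5))
    D_hat = np.einsum('rmk,rk->m', H.conj(), X)
    E = Y - D_hat
    # normalization by the energy of all reference vectors       (Eq. (8.3))
    energy = np.sum(np.abs(X) ** 2) + eps
    # H_{r,m} <- H_{r,m} + mu * X_r * conj(E_m) / energy          (Eq. (8.3))
    H = H + mu * np.einsum('rk,m->rmk', X, E.conj()) / energy
    return H, E

# small simulated example: one loudspeaker (R = 1), two microphones,
# a fixed random "true" subband echo path, and white reference samples
R, M, N_ec = 1, 2, 16
rng = np.random.default_rng(0)
H_true = (rng.standard_normal((R, M, N_ec))
          + 1j * rng.standard_normal((R, M, N_ec))) / np.sqrt(N_ec)
H = np.zeros_like(H_true)
X = np.zeros((R, N_ec), dtype=complex)
for n in range(2000):
    x_new = rng.standard_normal(R) + 1j * rng.standard_normal(R)
    X = np.concatenate([x_new[:, None], X[:, :-1]], axis=1)  # delay line, Eq. (8.4)
    Y = np.einsum('rmk,rk->m', H_true.conj(), X)             # noise-free echoes
    H, E = nlms_update(H, X, Y)
print(f"filter mismatch after adaptation: {np.linalg.norm(H - H_true):.2e}")
```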
`
`The NLMS algorithm has no memory, that is, it uses only error signals that are available
`at the time of the update. The speed of convergence can be improved by extending the
NLMS algorithm to the so-called affine projection algorithm [9]. This means considering
previous blocks in the update equation and comes at on