IVAN TASHEV

Sound Capture and Processing

PRACTICAL APPROACHES
Sound Capture and Processing
Practical Approaches

Ivan J. Tashev
Microsoft Research, USA

WILEY
A John Wiley and Sons, Ltd., Publication

To my family: the time to write
this book was taken from them

This edition first published 2009
© 2009 John Wiley & Sons Ltd.

Registered office
John Wiley & Sons Ltd, The Atrium, Southern Gate, Chichester, West Sussex, PO19 8SQ, United Kingdom

For details of our global editorial offices, for customer services and for information about how to apply for permission to reuse the copyright material in this book please see our website at www.wiley.com.

The right of the author to be identified as the author of this work has been asserted in accordance with the Copyright, Designs and Patents Act 1988.

All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, except as permitted by the UK Copyright, Designs and Patents Act 1988, without the prior permission of the publisher.

Wiley also publishes its books in a variety of electronic formats. Some content that appears in print may not be available in electronic books.

Designations used by companies to distinguish their products are often claimed as trademarks. All brand names and product names used in this book are trade names, service marks, trademarks or registered trademarks of their respective owners. The publisher is not associated with any product or vendor mentioned in this book. This publication is designed to provide accurate and authoritative information in regard to the subject matter covered. It is sold on the understanding that the publisher is not engaged in rendering professional services. If professional advice or other expert assistance is required, the services of a competent professional should be sought.

MATLAB® is a trademark of The MathWorks, Inc., and is used with permission. The MathWorks does not warrant the accuracy of the text or exercises in this book. This book's use or discussion of MATLAB® software or related products does not constitute endorsement or sponsorship by The MathWorks of a particular pedagogical approach or particular use of MATLAB® software.

Library of Congress Cataloging-in-Publication Data

Tashev, Ivan J. (Ivan Jelev)
Sound capture and processing : practical approaches / Ivan J. Tashev.
p. cm.
Includes index.
ISBN 978-0-470-31983-3 (cloth)
1. Speech processing systems. 2. Sound-Recording and reproducing-Digital techniques. 3. Signal processing-Digital techniques. I. Title.
TK7882.S65T37 2009
621.382'8-dc22

2009011987

A catalogue record for this book is available from the British Library.

ISBN 978-0-470-31983-3 (H/B)

Typeset in 11/13pt Times by Thomson Digital, Noida, India.
Printed and bound in Great Britain by CPI Antony Rowe, Chippenham, Wiltshire.
Contents

About the Author
Foreword
Preface
Acknowledgements

1 Introduction
1.1 The Need for, and Consumers of, Sound Capture and Audio Processing Algorithms
1.2 Typical Sound Capture System
1.3 The Goal of this Book and its Target Audience
1.4 Prerequisites
1.5 Book Structure
1.6 Exercises

2 Basics
2.1 Noise: Definition, Modeling, Properties
2.1.1 Statistical Properties
2.1.2 Spectral Properties
2.1.3 Temporal Properties
2.1.4 Spatial Characteristics
2.2 Signal: Definition, Modeling, Properties
2.2.1 Statistical Properties
2.2.2 Spectral Properties
2.2.3 Temporal Properties
2.2.4 Spatial Characteristics
2.3 Classification: Suppression, Cancellation, Enhancement
2.3.1 Noise Suppression
2.3.2 Noise Cancellation
2.3.3 Active Noise Cancellation
2.3.4 De-reverberation
2.3.5 Speech Enhancement
2.3.6 Acoustic Echo Reduction
2.4 Sampling and Quantization
2.4.1 Sampling Process and Sampling Theorem
2.4.2 Quantization
2.4.3 Signal Reconstruction
2.4.4 Errors During Real Discretization
2.4.4.1 Discretization with a Non-ideal Sampling Function
2.4.4.2 Sampling with Averaging
2.4.4.3 Sampling Signals with Finite Duration
2.5 Audio Processing in the Frequency Domain
2.5.1 Processing in the Frequency Domain
2.5.2 Properties of the Frequency Domain Representation
2.5.3 Discrete Fourier Transformation
2.5.4 Short-time Transformation, and Weighting
2.5.5 Overlap-add Process
2.5.6 Spectrogram: Time-Frequency Representation of the Signal
2.5.7 Other Methods for Transformation to the Frequency Domain
2.5.7.1 Lapped Transformations
2.5.7.2 Cepstral Analysis
2.6 Bandwidth Limiting
2.7 Signal-to-Noise-Ratio: Definition and Measurement
2.8 Subjective Quality Measurement
2.9 Other Methods for Quality and Enhancement Measurement
2.10 Summary
Bibliography

3 Sound and Sound Capturing Devices
3.1 Sound and Sound Propagation
3.1.1 Sound as a Longitudinal Mechanical Wave
3.1.2 Frequency of the Sound Wave
3.1.3 Speed of Sound
3.1.4 Wavelength
3.1.5 Sound Wave Parameters
3.1.5.1 Intensity
3.1.5.2 Sound Pressure Level
3.1.5.3 Power
3.1.5.4 Sound Attenuation
3.1.6 Huygens' Principle, Diffraction, and Reflection
3.1.7 Doppler Effect
3.1.8 Weighting Curves and Measuring Sound Pressure Levels
3.2 Microphones
3.2.1 Definition
3.2.2 Microphone Classification by Conversion Type
3.3 Omnidirectional and Pressure Gradient Microphones
3.3.1 Pressure Microphone
3.3.2 Pressure-gradient Microphone
3.4 Parameter Definitions
3.4.1 Microphone Sensitivity
3.4.2 Microphone Noise and Output SNR
3.4.3 Directivity Pattern
3.4.4 Frequency Response
3.4.5 Directivity Index
3.4.6 Ambient Noise Suppression
3.4.7 Additional Electrical Parameters
3.4.8 Manufacturing Tolerances
3.5 First-order Directional Microphones
3.6 Noise-canceling Microphones and the Proximity Effect
3.7 Measurement of Microphone Parameters
3.7.1 Sensitivity
3.7.2 Directivity Pattern
3.7.3 Self Noise
3.8 Microphone Models
3.9 Summary
Bibliography

4 Single-channel Noise Reduction
4.1 Noise Suppression as a Signal Estimation Problem
4.2 Suppression Rules
4.2.1 Noise Suppression as Gain-based Processing
4.2.2 Definition of A-Priori and A-Posteriori SNRs
4.2.3 Wiener Suppression Rule
4.2.4 Artifacts and Distortions
4.2.5 Spectral Subtraction Rule
4.2.6 Maximum-likelihood Suppression Rule
4.2.7 Ephraim and Malah Short-term MMSE Suppression Rule
4.2.8 Ephraim and Malah Short-term Log-MMSE Suppression Rule
4.2.9 More Efficient Solutions
4.2.10 Exploring Other Probability Distributions of the Speech Signal
4.2.11 Probability-based Suppression Rules
4.2.12 Comparison of the Suppression Rules
4.3 Uncertain Presence of the Speech Signal
4.3.1 Voice Activity Detectors
4.3.1.1 ROC Curves
4.3.1.2 Simple VAD with Dual-time-constant Integrator
4.3.1.3 Statistical-model-based VAD with Likelihood Ratio Test
4.3.1.4 VAD with Floating Threshold and Hangover Scheme with State Machine
4.3.2 Modified Suppression Rule
4.3.3 Presence Probability Estimators
4.4 Estimation of the Signal and Noise Parameters
4.4.1 Noise Models: Updating and Statistical Parameters
4.4.2 A-Priori SNR Estimation
4.5 Architecture of a Noise Suppressor
4.6 Optimizing the Entire System
4.7 Specialized Noise-reduction Systems
4.7.1 Adaptive Noise Cancellation
4.7.2 Psychoacoustic Noise Suppression
4.7.2.1 Human Hearing Organ
4.7.2.2 Loudness
4.7.2.3 Masking Effects
4.7.2.4 Perceptually Balanced Noise Suppressors
4.7.3 Suppression of Predictable Components
4.7.4 Noise Suppression Based on Speech Modeling
4.8 Practical Tips and Tricks for Noise Suppression
4.8.1 Model Initialization and Tracking
4.8.2 Averaging in the Frequency Domain
4.8.3 Limiting
4.8.4 Minimal Gain
4.8.5 Overflow and Underflow
4.8.6 Dealing with High Signal-to-Noise Ratios
4.8.7 Fast Real-time Implementation
4.9 Summary
Bibliography

5 Sound Capture with Microphone Arrays
5.1 Definitions and Types of Microphone Array
5.1.1 Transducer Arrays and their Applications
5.1.2 Specifics of Array Processing for Audio Applications
5.1.3 Types of Microphone Arrays
5.1.3.1 Linear Microphone Arrays
5.1.3.2 Circular Microphone Arrays
5.1.3.3 Planar Microphone Arrays
5.1.3.4 Volumetric (3D) Microphone Arrays
5.1.3.5 Specialized Microphone Arrays
5.2 The Sound Capture Model and Beamforming
5.2.1 Coordinate System
5.2.2 Sound Propagation and Capture
5.2.2.1 Near-field Model
5.2.2.2 Far-field Model
5.2.3 Spatial Aliasing and Ambiguity
5.2.4 Spatial Correlation of the Microphone Signals
5.2.5 Delay-and-Sum Beamformer
5.2.6 Generalized Filter-and-Sum Beamformer
5.3 Terminology and Parameter Definitions
5.3.1 Terminology
5.3.2 Directivity Pattern and Directivity Index
5.3.3 Beam Width
5.3.4 Array Gain
5.3.5 Uncorrelated Noise Gain
5.3.6 Ambient Noise Gain
5.3.7 Total Noise Gain
5.3.8 IDOA Space Definition
5.3.9 Beamformer Design Goal and Constraints
5.4 Time-invariant Beamformers
5.4.1 MVDR Beamformer
5.4.2 More Realistic Design - Adding the Microphone Self Noise
5.4.3 Other Criteria for Optimality
5.4.4 Beam Pattern Synthesis
5.4.4.1 Beam Pattern Synthesis with the Cosine Function
5.4.4.2 Beam Pattern Synthesis with Dolph-Chebyshev Polynomials
5.4.4.3 Practical Use of Beam Pattern Synthesis
5.4.5 Beam Width Optimization
5.4.6 Beamformer with Direct Optimization
5.5 Channel Mismatch and Handling
5.5.1 Reasons for Channel Mismatch
5.5.2 How Manufacturing Tolerances Affect the Beamformer
5.5.3 Calibration and Self-calibration Algorithms
5.5.3.1 Classification of Calibration Algorithms
5.5.3.2 Gain Self-calibration Algorithms
5.5.3.3 Phase Self-calibration Algorithm
5.5.3.4 Self-calibration Algorithms - Practical Use
5.5.4 Designs Robust to Manufacturing Tolerances
5.5.4.1 Tolerances as Uncorrelated Noise
5.5.4.2 Cost Functions and Optimization Goals
5.5.4.3 MVDR Beamformer Robust to Manufacturing Tolerances
5.5.4.4 Beamformer with Direct Optimization Robust to Manufacturing Tolerances
5.5.4.5 Balanced Design for Handling the Manufacturing Tolerances
5.6 Adaptive Beamformers
5.6.1 MVDR and MPDR Adaptive Beamformers
5.6.2 LMS Adaptive Beamformers
5.6.2.1 Widrow Beamformer
5.6.2.2 Frost Beamformer
5.6.3 Generalized Side-lobe Canceller
5.6.3.1 Griffiths-Jim Beamformer
5.6.3.2 Robust Generalized Side-lobe Canceller
5.6.4 Adaptive Algorithms for Microphone Arrays - Summary
5.7 Microphone-array Post-processors
5.7.1 Multimicrophone MMSE Estimator
5.7.2 Post-processor Based on Power Densities Estimation
5.7.3 Post-processor Based on Noise-field Coherence
5.7.4 Spatial Suppression and Filtering in the IDOA Space
5.7.4.1 Spatial Noise Suppression
5.7.4.2 Spatial Filtering
5.7.4.3 Spatial Filter in Side-lobe Canceller Scheme
5.7.4.4 Combination with LMS Adaptive Filter
5.8 Specific Algorithms for Small Microphone Arrays
5.8.1 Linear Beamforming Using the Directivity of the Microphones
5.8.2 Spatial Suppressor Using Microphone Directivity
5.8.2.1 Time-invariant Linear Beamformers
5.8.2.2 Feature Extraction and Statistical Models
5.8.2.3 Probability Estimation and Features Fusion
5.8.2.4 Estimation of Optimal Time-invariant Parameters
5.9 Summary
Bibliography

6 Sound Source Localization and Tracking with Microphone Arrays
6.1 Sound Source Localization
6.1.1 Goal of Sound Source Localization
6.1.2 Major Scenarios
6.1.3 Performance Limitations
6.1.4 How Humans and Animals Localize Sounds
6.1.5 Anatomy of a Sound Source Localizer
6.1.6 Evaluation of Sound Source Localizers
6.2 Sound Source Localization from a Single Frame
6.2.1 Methods Based on Time Delay Estimation
6.2.1.1 Time Delay Estimation for One Pair of Microphones
6.2.1.2 Combining the Pairs
6.2.2 Methods Based on Steered-response Power
6.2.2.1 Conventional Steered-response Power Algorithms
6.2.2.2 Weighted Steered-response Power Algorithm
6.2.2.3 Maximum-likelihood Algorithm
6.2.2.4 MUSIC Algorithm
6.2.2.5 Combining the Bins
6.2.2.6 Comparison of the Steered-response Power Algorithms
6.2.2.7 Particle Filters
6.3 Post-processing Algorithms
6.3.1 Purpose
6.3.2 Simple Clustering
6.3.2.1 Grouping the Measurements
6.3.2.2 Determining the Number of Cluster Candidates
6.3.2.3 Averaging the Measurements in Each Cluster Candidate
6.3.2.4 Reduction of the Potential Sound Sources
6.3.3 Localization and Tracking of Multiple Sound Sources
6.3.3.1 k-Means Clustering
6.3.3.2 Fuzzy C-means Clustering
6.3.3.3 Tracking the Dynamics
6.4 Practical Approaches and Tips
6.4.1 Increasing the Resolution of Time-delay Estimates
6.4.2 Practical Alternatives for Finding the Peaks
6.4.3 Peak Selection and Weighting
6.4.4 Assigning Confidence Levels and Precision
6.5 Summary
Bibliography

7 Acoustic Echo-reduction Systems
7.1 General Principles and Terminology
7.1.1 Problem Description
7.1.2 Acoustic Echo Cancellation
7.1.3 Acoustic Echo Suppression
7.1.4 Evaluation Parameters
7.2 LMS Solution for Acoustic Echo Cancellation
7.3 NLMS and RLS Algorithms
7.4 Double-talk Detectors
7.4.1 Principle and Evaluation
7.4.2 Geigel Algorithm
7.4.3 Cross-correlation Algorithms
7.4.4 Coherence Algorithms
7.5 Non-linear Acoustic Echo Cancellation
7.5.1 Non-linear Distortions
7.5.2 Non-linear AEC with Adaptive Volterra Filters
7.5.3 Non-linear AEC Using Orthogonalized Power Filters
7.5.4 Non-linear AEC in the Frequency Domain
7.6 Acoustic Echo Suppression
7.6.1 Estimation of the Residual Energy
7.6.2 Suppressing the Echo Residual
7.7 Multichannel Acoustic Echo Reduction
7.7.1 The Non-uniqueness Problem
7.7.2 Tracking the Changes
7.7.3 Decorrelation of the Channels
7.7.4 Multichannel Acoustic Echo Suppression
7.7.5 Reducing the Degrees of Freedom
7.8 Practical Aspects of the Acoustic Echo-reduction Systems
7.8.1 Shadow Filters
7.8.2 Center Clipper
7.8.3 Feedback Prevention
7.8.4 Tracking the Clock Drifts
7.8.5 Putting Them All Together
7.9 Summary
Bibliography

8 De-reverberation
8.1 Reverberation and Modeling
8.1.1 Reverberation Effect
8.1.2 How Reverberation Affects Humans
8.1.3 Reverberation and Speech Recognition
Foreword

Just a couple of decades ago we would think of "sound capture and processing" as the problems of designing microphones for converting sounds from the real world into electrical signals, as well as amplifying, editing, recording, and transmitting such signals, mostly using analog hardware technologies. That's because our intended applications were mostly analog telephony, broadcasting, and voice and music recording. We have come a long way: small digital audio players have replaced bulky portable cassette tape players, and people make voice calls mostly via digital mobile phones and voice communication software in their computers. Thanks to the evolution of digital signal processing technologies, we now focus mostly on processing sounds not as analog electrical signals, but rather as digital files or data streams in a computer or digital device. We can do a lot more with digital sound processing, such as transcribe speech into text, identify persons speaking, recognize music from humming, remove noises much more efficiently, add special effects, and so much more. Thus, today we think of sound capture as the problem of digitally processing the signals captured by microphones so as to improve their quality for best performance in digital communications, broadcasting, recording, recognition, classification, and other applications.

This book by Ivan Tashev provides a comprehensive yet concise overview of the fundamental problems and core signal processing algorithms for digital sound capture, including ambient noise reduction, acoustic echo cancellation, and reduction of reverberation. After introducing the necessary basic aspects of digital audio signal processing, the book presents basic physical properties of sound and propagation of sound waves, as well as a review of microphone technologies, providing the reader with a strong understanding of key aspects of digitized sounds. The book discusses the fundamental problems of noise reduction, which are usually solved via techniques based on statistical models of the signals of interest (typically voice) and of interfering signals. An important discussion of properties of the human auditory system is also presented; auditory models can play a very important role in algorithms for enhancing audio signals in communication and recording/playback applications, where the final destination is the human ear.

Microphone arrays have become increasingly important in the past decade or so. Thanks to the rapid evolution and reduction in cost of analog and digital electronics in recent years, it is inexpensive to capture sound through several channels, using an array of microphones. That opens new opportunities for improving sound capture, such as detecting the direction of incoming sounds and applying spatial filtering techniques. The book includes two excellent
chapters whose coverage goes from the basics of microphone array configurations and delay-and-sum beamforming, to modern sophisticated algorithms for high-performance multichannel signal enhancement.

Acoustic echoes and reverberation are the two most important kinds of signal degradations in many sound capture scenarios. If you're a professional singer, you probably don't mind holding a microphone or wearing a headset with a microphone close to your mouth, but most of us prefer microphones to be invisible, far away from our mouths. That means the microphone will capture not only our own voices, but also reverberation components because of sound reflections from nearby walls, as well as echoes of signals that are being played back from loudspeakers. Removing such undesirable artifacts presents significant technical challenges, which are well addressed in the final two chapters, which present modern algorithms for tackling them.

A key quality of this book is that it presents not only fundamental theoretical analyses, models, and algorithms, but it also considers many practical aspects that are very important for the design of real-world engineering solutions to sound capture problems. Thus, this book should be of great appeal to both students and engineers.

I have had the pleasure of working with Ivan on research and development of sound capture systems and algorithms. His enthusiasm, deep engineering and mathematical knowledge, and pragmatic approaches were all contagious. His work has had significant practical impact, for example the introduction of multichannel sound capture and processing modules in the Microsoft Windows operating system. I have learned a considerable amount about sound capturing and processing from my interactions with Ivan, and I am sure you will, as well, by reading this book. Enjoy!

Henrique Malvar
Managing Director
Microsoft Research
Redmond Laboratory
Preface

Capturing and processing sounds is critical in mobile and handheld devices, communication systems, and computers using automatic speech recognition. Devices and technologies for proper conversion of sounds to electric signals and removing unwanted parts, such as noise and reverberation, have been used since the first telephones. They evolved, becoming more and more complex. In many cases the existing algorithms exceed the abilities of typical processors in these devices and computers to provide real-time processing of the captured signal.

This book will discuss the basic principles for building an audio processing stack, sound capturing devices, single-channel speech-enhancement algorithms, and microphone arrays for sound capture and sound source localization. Further, algorithms will be described for acoustic echo cancellation and de-reverberation - building blocks of a sound capture and processing stack for telecommunication and speech recognition. Wherever possible the various algorithms are discussed in the order of their development and publication. In all cases the aim is to try to give the larger picture - where the technology came from, what worked and what had to be adapted for the needs of audio processing. This gives a better perspective for further development of new audio signal processing algorithms.

Even the best equations and signal processing algorithms are not worth anything before being implemented and verified by processing of real data. That is why, in this book, stress is placed on experimenting with recorded sounds and implementation of the algorithms. In practice, frequently a simpler model with fewer parameters to estimate works better than a more precise but more complex model with a larger number of parameters. With the latter one has either to sacrifice estimation precision or to increase the estimation time. This balance of simplicity, precision, and reaction time is critical for real-time systems, where on top of everything we have to watch out for parameters such as latency, consumed memory, and CPU time.

Most of the algorithms and approaches described in this book are based on statistical models. In mathematics, a single example cannot prove but can disprove a theorem. In statistical signal processing, a single example is ... just a sample. What matters is careful evaluation of the algorithms with a good corpus of speech or audio signals, distributed in their signal-to-noise ratios, type of noise, and other parameters - as close as possible to the real problem we are trying to solve.

The solution of practically any signal processing problem can be improved by tuning the parameters of the algorithm, provided we have a proper criterion for optimality. There are always adaptation time constants and thresholds that cannot be estimated, and whose values have to be adjusted experimentally. The mathematical models and solutions we use are usually
optimal in one or another way. If they properly reflect the nature of the process they model, then we have a good solution and the results are satisfactory. In all cases it is important to remember that we do not want a "minimum mean-square error solution," or a "maximum-likelihood solution," or even a "log minimum mean-square error solution." We do not want to improve the signal-to-noise ratio. What we want is for listeners to perceive the sound quality of the processed signal as better - improved - compared to the input signal. From this perspective, the final judge of how good an algorithm is remains the human ear, so use it to verify the solution. Hearing is an important sense for humans and animals. In many places in this book, examples are provided of how humans and animals hear and localize sounds - this better explains some signal processing approaches and brings biology-inspired designs to sound capture and processing systems.

In many cases the signal processing chain consists of several algorithms for sound capture and speech enhancement. Practice shows that a sequence of separately optimized algorithms usually provides suboptimal results. Tuning and optimization of the designed sound capturing system end-to-end is a must if we want to achieve the best results.

For further information please visit http://www.wiley.com/go/tashev_sound

Ivan Tashev
Redmond, WA
USA
Acknowledgements

I want to thank the Book Program at MathWorks and especially Dee Savageau, Naomi Fernandes, and Meg Vulliez for their help and responsiveness. The MATLAB® scripts that are part of this book were tested with MATLAB® R2007a, provided as part of this program.

I am grateful to my colleagues from Microsoft Research: Alex Acero, Amitav Das, Li Deng, Dinei Florencio, Cormac Herley, Zicheng Liu, Mike Seltzer, and Cha Zhang. They read the chapters of this book and provided valuable feedback.

And last, but not least, I want to say what a great pleasure it was to work with the nice and helpful people from John Wiley & Sons, Ltd. During the long process from proposal, through writing, copyediting, and finalizing the book with all the details, they were always professional, understanding, and ready to suggest the right solution. I was lucky enough to work with Tiina Ruonamaa, Sarah Hinton, Sarah Tilley, and Catlin Flint - thank you all for everything you did during the process of writing this book!
4
Single-channel Noise Reduction

This chapter deals with noise reduction of a single channel. We assume that we have a mixture of a useful signal, usually human speech, and an unwanted signal - which we call noise. The goal of this type of processing is to provide an estimate of the useful signal - an enhanced signal with better properties and characteristics.

The problem with a noisy speech signal is that a human listener can understand a lower percentage of the spoken words. In addition, this understanding requires more mental effort on the part of the listener. This means that the listener can quickly lose attention - an unwanted outcome during meetings over a noisy telephone line, for example. If the noisy signal is sent to a speech recognition engine, the noise reduces the recognition rate as it masks speech features important for the recognizer.

With noise-reduction algorithms, as with most other signal processing algorithms, there are multiple trade-offs. One is between better reduction of the unwanted noise signal and introduction of undesired effects - additional signals and distortions in the wanted speech signal. From this perspective, while improvement in the signal-to-noise ratio (SNR) remains the main evaluation criterion of the efficiency of these algorithms, subjective listening tests or objective sound quality evaluations are also important. The perfect noise-reduction algorithm will make the main speaker's voice more understandable so that it seems to stand out, while preserving relevant background noise (train station, party sounds, and so on). Such an algorithm should not introduce noticeable distortions in either foreground (wanted speech) or background (unwanted noise) signals.

Most single-channel algorithms are based on building statistical models of the speech and noise signals. In this chapter we will look at the commonly used approaches for suppression of noise, the algorithms to distinguish between noise and voice (called "voice activity detectors"), and some adaptive noise-canceling algorithms. Exercises with implementation of some of these algorithms will be provided for better understanding of the processes inside the noise suppressors.
probabilistic rules. The maximum-likelihood suppression rule is definitely the worst in this sense.

From the LSD perspective, the front runners are MMSE and log-MMSE (which is optimal in the log-MMSE sense). Good results are shown by the entire group of efficient alternatives. Note that the Wiener and probabilistic rules are worse from this perspective, which means that they do not deal well with low levels of noise and speech.

The best average SNR improvement definitely belongs to the Wiener and probabilistic rules, followed by the efficient alternatives and spectral subtraction. The maximum-likelihood rule, as expected, has the lowest improvement in SNR. It is outperformed by the approximate Wiener suppression rule.

The highest MOS score and the best sound are achieved by log-MMSE and MAP SAE, followed closely by the group of efficient alternatives. The maximum-likelihood suppression rule sounds worse owing to a substantial amount of noise.

Figure 4.9 shows the relationship between the average improvement of SNR and the average MOS score - the last two columns in Table 4.2. It is clear that, to a certain degree, the noise suppression helps, and the signals with more suppressed noise achieve better perceptual sound quality. Enforcing the noise suppression further actually decreases the sound quality, regardless of the better SNR. This is good evidence that, when evaluating noise-suppressing algorithms, improvement in the SNR should not be used as the only criterion, and even not as the main evaluation criterion. Ultimately the goal of this type of speech enhancement is to make the output signal sound better for the human listener. From this perspective, the MOS is a much better criterion. When targeting speech recognition, the best criterion is, of course, the highest recognition rate.
certain errors. Thus it will be important how robust each one of these suppression rules is to those errors.

EXERCISE

Look at the MATLAB script SuppressionRule.m, which returns the suppression rule values for the given vectors of a-priori and a-posteriori SNRs:

Gain = SuppressionRule(gamma, xi, SuppressionType)

The argument SuppressionType is a number from 0 to 9 and determines which suppression rule is to be used. The script contains implementation of most of the suppression rules discussed so far. Finish the implementation of the rest of the suppression rules.

Write a MATLAB script that computes the suppression rules as a function of the a-priori and a-posteriori SNRs in the range of ±30 dB. Limit the gain values in the range from -40 dB to +20 dB and plot the rules in three dimensions using the mesh function.
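
As a starting point, here is a minimal sketch of such a script. It does not call SuppressionRule.m; instead the Wiener gain xi/(1 + xi) is coded inline so that the example runs on its own, and the grid limits, variable names, and the choice of rule are merely illustrative:

% Sketch only: plots one suppression rule over a grid of a-priori and
% a-posteriori SNRs. The Wiener rule xi/(1+xi) is coded inline here as a
% stand-in for a call to SuppressionRule(gamma, xi, SuppressionType).
snr_db = -30:1:30;                      % both SNR axes, in dB
[gamma_db, xi_db] = meshgrid(snr_db, snr_db);
xi = 10.^(xi_db/10);                    % a-priori SNR, linear scale
Gain = xi ./ (1 + xi);                  % Wiener suppression rule (depends on xi only)
Gain_db = 20*log10(Gain);
Gain_db = min(max(Gain_db, -40), 20);   % limit the gain to the range [-40, +20] dB
mesh(gamma_db, xi_db, Gain_db);
xlabel('A-posteriori SNR \gamma (dB)');
ylabel('A-priori SNR \xi (dB)');
zlabel('Gain (dB)');

Substituting a call to SuppressionRule for the inline gain, once per value of SuppressionType, produces the full family of surfaces the exercise asks for; the Wiener surface is flat along the a-posteriori axis, which makes it a convenient sanity check.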
4.3 Uncertain Presence of the Speech Signal

All the suppression rules discussed above were derived under the assumption of the presence of both noise and speech signals. The speech signal, however, is not always present in the short-term spectral representations. Even continuous speech has pauses with durations of 100-200 ms - which, compared with the typical frame sizes of 10-40 ms, means that there will be a substantial number of audio frames without a speech signal at all. Trying to estimate the speech signal in these frames leads to distortions and musical noise.

Classification of audio frames into "noise only" and "contains some speech" is in general a detection and estimation problem [10]. Stable and reliable work of the voice activity detector (VAD) is critical for achieving good noise-suppression results. Frame classification is used further to build statistical models of the noise and speech signals, so it leads to modification of the suppression rule as well.

4.3.1 Voice Activity Detectors

Voice activity detectors are algorithms for detecting the presence of speech in a mixed signal consisting of speech plus noise. They can vary from a simple binary decision (yes/no) for the entire audio frame to precise estimators of the speech presence probability for each frequency bin. Most modern noise-suppression systems contain at least one VAD, in many cases two or more. The commonly used algorithms base their decision on the assumption of a quasi-stationary noise;
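
A minimal sketch of such a detector follows. It is not one of the book's VADs described in the following subsections - the frame length, adaptation constant, and threshold are arbitrary illustrative values - but it shows the basic idea: track a slowly varying noise floor, which is exactly the quasi-stationary noise assumption mentioned above, and declare speech whenever the frame energy rises well above that floor.

% Illustrative sketch of an energy-based VAD (not the book's algorithm).
% x is the noisy speech signal, frameLen is the frame size in samples.
function vad = simpleEnergyVAD(x, frameLen)
numFrames = floor(length(x)/frameLen);
vad = false(numFrames, 1);
noiseEst = mean(x(1:frameLen).^2);   % assume the first frame contains noise only
alpha = 0.95;                        % slow noise-floor adaptation (quasi-stationarity)
threshold = 3;                       % "speech" when energy exceeds 3x the noise floor
for k = 1:numFrames
    frame = x((k-1)*frameLen+1 : k*frameLen);
    energy = mean(frame.^2);
    vad(k) = energy > threshold*noiseEst;
    if ~vad(k)                       % update the noise model only during pauses
        noiseEst = alpha*noiseEst + (1-alpha)*energy;
    end
end
end

Updating the noise estimate only in frames classified as pauses is where the quasi-stationarity assumption enters: a noise floor that changes abruptly and then stays high will be mistaken for speech until the model is re-initialized, which is one reason practical detectors add floating thresholds and hangover schemes, as discussed later in this section.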
[Figure 4.9: Average MOS score as a function of the average improvement in SNR for the suppression rules.]
