`
`TAMPERE UNIVERSITY OF TECHNOLOGY
`
`Julkaisu 794 * Publication 794
`
Pasi Pertilä
`
`Acoustic Source Localization in a Room Environment
`
`and at Moderate Distances
`
`
`
`Tampere 2009
`
`Tampereen teknillinen yliopisto. Julkaisu 794
`Tampere University of Technology. Publication 794
`
`Pasi Pertilä
`
`Acoustic Source Localization in a Room Environment
`and at Moderate Distances
`Thesis for the degree of Doctor of Technology to be presented with due permission for
`public examination and criticism in Tietotalo Building, Auditorium TB222, at Tampere
`University of Technology, on the 30th of January 2009, at 12 noon.
`
`Tampereen teknillinen yliopisto - Tampere University of Technology
`Tampere 2009
`
`
`
`ISBN 978-952-15-2106-5 (printed)
`ISBN 978-952-15-2137-9 (PDF)
`ISSN 1459-2045
`
`
`
`Abstract
`
The pressure changes of an acoustic wavefront are sensed with a microphone
that acts as a transducer, converting sound pressure into voltage. The voltage
is then converted into digital form with an analog-to-digital (AD) converter to
provide a discrete-time, quantized digital signal. This thesis discusses methods to
estimate the location of a sound source from the signals of multiple microphones.
Acoustic source localization (ASL) can be used to locate talkers, which is
useful for speech communication systems such as teleconferencing and hearing
aids. Active localization methods send and receive energy, whereas passive methods
only receive energy. The discussed ASL methods are passive, which makes
them attractive for surveillance applications, such as localization of vehicles and
monitoring of areas. This thesis focuses on ASL in a room environment and
at moderate distances, which are typical of outdoor applications. The frequency
range of many commonly occurring sounds, such as speech, vehicles, and
jet aircraft, is large. Time delay estimation (TDE) methods are well suited to
estimating properties of such wideband signals. Since TDE methods have been
extensively studied, the theory is attractive to apply to localization.
Time difference of arrival (TDOA) -based methods estimate the source location
from measured TDOA values between microphones. These methods are
computationally attractive but deteriorate rapidly when the TDOA estimates are
no longer directly related to the source position. In a room environment such
conditions arise when reverberation or noise starts to dominate TDOA
estimation.
The combination of microphone pairwise TDE measurements is studied as a
more robust localization solution. The TDE measurements are combined into a
spatial likelihood function (SLF) of source position. A sequential Bayesian method
known as particle filtering (PF) is used to estimate the source position. The
PF-based localization accuracy increases as the variance of the SLF decreases. Results
from simulations and real data show that multiplication (an intersection operation)
results in an SLF with smaller variance than the typically applied summation
(a union operation).
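To make the difference between the two combination rules concrete, the following short sketch compares them on a one-dimensional grid of candidate source positions. It is only an illustrative sketch, not code or data from this thesis: the microphone-pairwise likelihood curves are synthetic Gaussian placeholders standing in for TDE-based likelihoods, and the spread is measured as the standard deviation of the normalized SLF.

import numpy as np

x = np.linspace(0.0, 5.0, 501)                  # candidate source positions [m]
centers, widths = [2.0, 2.3, 1.8], [0.6, 0.5, 0.7]
pairwise = [np.exp(-0.5 * ((x - c) / w) ** 2) for c, w in zip(centers, widths)]

slf_sum = np.sum(pairwise, axis=0)              # union-type combination (summation)
slf_prod = np.prod(pairwise, axis=0)            # intersection-type combination (multiplication)

def spread(slf):
    # Standard deviation of position under the normalized SLF (uniform grid).
    p = slf / np.sum(slf)
    mean = np.sum(p * x)
    return np.sqrt(np.sum(p * (x - mean) ** 2))

print("spread of summed SLF:     %.3f m" % spread(slf_sum))
print("spread of multiplied SLF: %.3f m" % spread(slf_prod))   # smaller, i.e. more peaked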
`
`
The above localization methods assume that the source is located in the near-field
of the microphone array, i.e., the curvature of the source-emitted wavefront is
observable. In the far-field, the source wavefront is assumed planar and localization
is considered by using spatially separated direction observations. The
direction of arrival (DOA) of a source-emitted wavefront impinging on a microphone
array is traditionally estimated by steering the array to the direction that
maximizes the steered response power. Such estimates can be degraded by
noise and reverberation. Therefore, talker localization is considered using DOA
discrimination.
The sound propagation delay from the source to the microphone array becomes
significant at moderate distances. As a result, the directional observations
of a moving sound source point behind the true source position. Omitting the
propagation delay therefore results in a biased location estimate of a moving or
discontinuously emitting source. To solve this problem, it is proposed that the
propagation delay be modeled in the estimation process. Motivated by the robustness
of localization based on the combination of TDE measurements, source localization
by directly combining the TDE-based array steered responses is considered. This
extends the near-field talker localization methods to far-field source localization. The
presented propagation delay modeling is then proposed for the steered response
localization. The improvement in localization accuracy obtained by including the
propagation delay is studied using a simulated moving sound source in the atmosphere.
The presented indoor localization methods have been evaluated in the Classification
of Events, Activities and Relationships (CLEAR) 2006 and CLEAR’07
technology evaluations. In these evaluations, the performance of the proposed ASL
methods was assessed by a third party on several hours of annotated data.
The data was gathered from meetings held in multiple smart rooms. According
to the results obtained from the CLEAR’07 development dataset (166 min) and
presented in this thesis, 92 % of speech activity in a meeting situation was located
to within an accuracy of 17 cm.
`
`
`
`
`Preface
`
`This thesis was compiled during my work at Tampere University of Technology
`(TUT) in the Department of Signal Processing. My research on direction
`of arrival (DOA) -based sound source localization is summarized in the latter
`part of this thesis. This topic was introduced to me by my supervisor, Professor
Ari Visa. During the years 2007–2008 I worked on the time delay estimation
(TDE) -based source localization problem. This topic is discussed in the first part
`of the thesis. The financial support of Tampere Graduate School in Information
`Science and Engineering (TISE) is acknowledged. I wish also to thank the Nokia
`Foundation and the Industrial Research Fund at TUT (Tuula and Yrjö Neuvo
`fund).
`I wish to acknowledge Tuomo Pirinen’s activity in organizing the spatial audio
`research in the Department of Signal Processing before me. I wish to express my
`gratitude towards my colleagues in the Audio Research Group (ARG) for creating
`an inspiring environment for working. I thank Teemu Korhonen for his insightful
`approaches and mathematical visions – this is also evident in the number of pa-
`pers we have co-authored. Thanks to Mikko Parviainen for contributing to the
`presented research and for being an active co-author in many of the included pub-
`lications. Thanks to Anssi Klapuri for his advice, and thanks to Matti Ryynänen
for helping with LaTeX formatting. Thanks to Jouni Paulus, Tuomas Virtanen,
`Marko Helén, Toni Mäkinen, Antti Löytynoja, Atte Virtanen, Sakari Tervo, Ju-
`uso Penttilä, Mikko Roininen, Elina Helander, Hanna Silén, Teemu Karjalainen,
`Konsta Koppinen, Toni Heittola, and Annamaria Mesaros.
`I thank my parents Heikki and Liisa, and my brother Esa for supporting me
`throughout my studies. Last, but not least, I would like to thank Minna for her
`kind support.
`
`Pasi Pertilä
`Tampere, January 2009
`
`
`
`
`Contents
`
List of Figures

List of Tables

List of Algorithms

List of Terms, Symbols, and Mathematical Notations

1 Introduction
   1.1 List of Included Publications
       1.1.1 List of Supplemental Publications
   1.2 Problem Description
       1.2.1 Sound Source
       1.2.2 Sound Propagation
       1.2.3 Measurement
       1.2.4 Localization Algorithm
   1.3 Overview of Thesis
   1.4 Author’s Contributions
   1.5 Related Work

2 Time Delay Estimation
   2.1 Signal Model
   2.2 The Impulse Response Model
   2.3 Practical Measurement Environment
   2.4 Simulated Room Environment
   2.5 Time Difference of Arrival
   2.6 TDOA Estimation Methods
       2.6.1 Generalized Cross Correlation
       2.6.2 Average Magnitude Difference Function
       2.6.3 TDE Function
       2.6.4 Adaptive TDOA Methods
       2.6.5 Source Model-Based TDOA Methods
       2.6.6 TDOA Interpolation
   2.7 TDOA Estimation Bounds
       2.7.1 CRLB of TDOA Estimation
       2.7.2 Reverberant Systems
       2.7.3 SNR Threshold in Simulations
   2.8 Summary

3 Time Delay Estimation -Based Localization Methods
   3.1 TDOA-Based Closed-Form Localization
       3.1.1 Unconstrained LS Method
       3.1.2 Extended Unconstrained LS Method
       3.1.3 Pre-Multiplying Method
       3.1.4 Constrained LS Method
       3.1.5 Approximate LS Method
       3.1.6 Two Step Closed-Form Weighted LS Method
       3.1.7 Weighted Constrained Least Squares Method
       3.1.8 LS Solution for Source Position, Range, and Propagation Speed
       3.1.9 TDOA Maximum Likelihood Approach
   3.2 Dilution of Precision
   3.3 CRLB of TDOA Localization
   3.4 TDOA-Based Sequential Localization Methods
       3.4.1 State Estimation
   3.5 TDE Function -Based Localization
       3.5.1 Correlation Combination with Summation
       3.5.2 Correlation Combination with Multiplication
       3.5.3 Correlation Combination with Hamacher T-norm
       3.5.4 Spatial Likelihood Function Variance
       3.5.5 TDE Likelihood Function Smoothing and Interpolation
   3.6 TDE Likelihood-Based Localization by Iteration
   3.7 TDE Likelihood-Based Localization with Sequential Bayesian Methods
       3.7.1 Particle Filtering
   3.8 Simulations
       3.8.1 Scoring Metrics
       3.8.2 Localization Methods
       3.8.3 Simulation Results and Discussion
       3.8.4 TDE Likelihood Combination and PF
   3.9 Results with Speech Data
       3.9.1 CLEAR’07 Dataset Description
       3.9.2 Results with CLEAR’07 Dataset
   3.10 Summary

4 Direction of Arrival -Based Localization
   4.1 DOA-Based Localization Problem
       4.1.1 Bearings-Only Source Localization
   4.2 DOA-Based Closed-Form Localization
   4.3 Robust DOA-Based Localization
       4.3.1 Simulations
       4.3.2 Results with Speech Data
   4.4 DOA Vector-Based Localization Using Propagation Delay
       4.4.1 Simulation Results
   4.5 Localization Using TDE-Based Array Steered Responses
   4.6 Sound Propagation Delay in Directional Steered Response Localization
       4.6.1 Implementation Issues
       4.6.2 Simulations
       4.6.3 Results
   4.7 Summary

5 Conclusions, Discussion, and Future Work

6 Errata

Bibliography

P1 Publication 1
P2 Publication 2
P3 Publication 3
P4 Publication 4
P5 Publication 5

Appendix
A Algorithm Descriptions
B Simulation Setup
C Simulation Results
D Concepts Related to Random Processes
`
`
`
`List of Figures
`
1.1 Sound source localization process
2.1 Image source concept
2.2 Recording room floor plan
2.3 Microphone locations inside recording room
2.4 Impulse response of recording room
2.5 Waveform and amplitude spectrum of a speech frame
2.6 Spectrograms of speech signal and babble
2.7 Illustration of simulation setup
2.8 TDOA mapping into spatial coordinates
2.9 Example TDE function
2.10 The threshold effect of TDOA estimation
2.11 Simulated effect of reverberation on cross correlation
3.1 Example of recording room dilution of precision (DOP)
3.2 Example of microphone pairwise SLF
3.3 Example SLF produced by SRP-PHAT
3.4 Example SLF produced by Multi-PHAT
3.5 Marginal SLF from real-data recordings
3.6 Weighted distance error (WDE) values of SLFs built with different combination methods
3.7 RMS error of simulations for SRP-PHAT+PF and Multi-PHAT+PF methods
4.1 DOA-based source localization problem
4.2 Simulation results with robust DOA-based localization
4.3 Space-time diagram
4.4 Source localization problem with propagation delay
4.5 Example of TDE-based DOA likelihood from microphone pair
4.6 TDE-based array steered response
4.7 Example of spatial likelihood function using steered array responses
4.8 RMS localization error of propagation delay -based steered array response localization
4.9 Example of estimated source trajectory with and without propagation delay modeling
`
`
`
`List of Tables
`
2.1 Recording room microphone locations
2.2 Reverberation time values in simulations
3.1 Simulation localization results for ML-TDOA
3.2 Simulation localization results for Multi-PHAT using particle filtering
3.3 Real-data results with CLEAR’07 database
4.1 Robust DOA-based simulation setup
B.1 Microphone coordinates
C.1 Accuracy of ML-TDOA localization in simulations
C.2 Accuracy of Multi-PHAT + PF localization in simulations
`
`
`
`List of Algorithms
`
1 SIR algorithm for particle filtering [Aru02].
2 The systematic resampling algorithm [Aru02].
3 ADC method for Speaker localization [P2].
4 DOA vector-based localization with propagation delay [P3].
5 TDE-based directional likelihood for far-field source localization.
6 TDE-based directional likelihood for far-field source localization with propagation delay according to [P5].
`
`
`
List of Terms, Symbols, and Mathematical Notations
`
`Terms and Acronyms
`
Term or acronym   Explanation
AED               Adaptive Eigenvalue Decomposition
AMDF              Absolute Magnitude Difference Function
AMSF              Absolute Magnitude Sum Function
ASL               Acoustic Source Localization
BOL               Bearings Only Localization
CDF               Cumulative Distribution Function
CLEAR             CLassification of Events, Activities and Relationships evaluation and workshop
CRLB              Cramér-Rao Lower Bound
CSD               Cross Spectral Density
DFT               Discrete Fourier Transform
DOA               Direction Of Arrival
DOP               Dilution Of Precision
FIM               Fisher Information Matrix
FIR               Finite Impulse Response
GCC               Generalized Cross Correlation
GPS               Global Positioning System
IID               Independent and Identically Distributed
LASER             Light Amplification by Stimulated Emission of Radiation
LS                Least Squares
MAMDF             Modified Absolute Magnitude Difference Function
ML                Maximum Likelihood
MVDR              Minimum Variance Distortionless Response
PDF               Probability Density Function
PF                Particle Filter
PHAT              PHAse Transform
PSD               Power Spectral Density
RADAR             RAdio Detecting And Ranging
RMS               Root Mean Square
SAD               Speech Activity Detection
SIR               Sampling Importance Resampling algorithm
SLF               Spatial Likelihood Function (of source position)
SNR               Signal to Noise Ratio
SONAR             SOund NAvigation and Ranging
SRP-PHAT          Steered Response Power using PHAT
SSL               Sound Source Localization
TDE               Time Delay Estimation
TDOA              Time Difference Of Arrival
VAD               Voice Activity Detection
WLS               Weighted Least Squares
`
`
`
`
`Mathematical Notations
`
`List of symbols
`
Symbol   Explanation
a        Scalar variable
a        A column vector of scalars, a = [a1, a2, ..., aN]^T
1        A column vector of values 1
I        Identity matrix, I = diag(1)
W        A matrix of scalars with elements wmn, m = 1, ..., M, n = 1, ..., N
j        Scalar constant value of √−1
R^N      An N-dimensional space of real numbers
x(t)     Signal x value at time t
ω        Angular frequency [rad/s]
f        Frequency [Hz]
fs       Sampling frequency
L        Length of processing frame [samples]
Tw       Duration of processing frame of length L [s]
X(k)     DFT of frame x(t) (at discrete frequency index k)
µx       Mean value of variable x
σ²x      Variance of variable x
Ω        A set of elements
λ        Wavelength
`
`
`
`
`List of operators
`
Notation          Explanation
U(a, b)           Uniform distribution between a and b
N(µ, σ²)          Normal distribution with mean µ and variance σ²
a*                Complex conjugate of a
|a|               Absolute value of a
⌊·⌉               Rounding to nearest integer
â                 Estimate of a
‖a‖               Euclidean norm of vector a
D(a, b)           Euclidean distance between a and b, ‖a − b‖
W^T               Matrix transpose
W^−1              Matrix inverse
diag(w)           A square matrix with non-diagonal values of 0 and diagonal values specified in vector w
trace(W)          Sum of diagonal values of matrix W
E[a]              Expected value of a
∗                 Convolution operator
⊗, ⊕              Binary operators
p(a; θ)           Probability of a parameterized by θ
P(a|b)            The likelihood of a conditioned on b
Proj_b a          Projection of vector a onto vector b
|Ω|               Cardinality of set Ω
f(x) = O(g(x))    Function g(x) is the asymptotic upper bound for the computational time of function f(x)
`
`
`
`
`Chapter 1
`
`Introduction
`
Localization has been an important task in the history of mankind. In the
early days of modern navigation one could determine one's position at sea
by measuring the angles of celestial objects above the horizon at a known time.
The angles were determined via measurements, e.g., using a sextant. A celestial
object's angle above the horizon at a certain time determines a line of
position (LOP) on a map. The crossing of the LOPs gives the location. Modern
navigation and localization utilize mainly electromagnetic signals. The applications of
localization include radio detecting and ranging (RADAR) systems, global positioning
system (GPS) navigation, and light amplification by stimulated emission
of radiation (LASER) -based localization technology. Other means of localization
include the use of sound waves in, e.g., underwater applications such as
sound navigation and ranging (SONAR).
Localization methods can be divided into active and passive methods.
Active methods send and receive energy, whereas passive methods only receive
energy. Active methods have the advantage of controlling the signal they emit,
which helps the reception process. Drawbacks of an active method include that
the emitter position is revealed, more complex transducers are required, and the
energy consumption is higher than that of passive systems. Passive methods are
more suitable for surveillance purposes since no energy is intentionally emitted.
This thesis focuses on passive acoustic source localization methods.
`In the era of electrical localization methods, why does one require acous-
`tic localization? Typically the location of a source can be solved with several
`techniques, often even more accurately than with the use of sound. There are,
`however, situations where the use of sound for localization is natural. Consider
`the following video conference setup. A rotating camera is placed on the cen-
`ter of the meeting room table and the participants sit around the table. The
`remote end would like to see the video image of the active talker and hear his
`speech. How could the camera be steered to the direction of the active talker?
`All participants could have buttons which they press before speaking to turn
the pre-calibrated camera, a cameraman could manually turn the camera, or a
microphone array could determine the speaker direction and steer the camera
automatically. All these approaches would work to varying degrees, but obviously
the sound-based automatic camera steering is the most practical solution. Such
systems have been widely developed and have been used for automatic camera
management during lectures [Liu01]. However, more reverberation- and noise-tolerant
solutions are called for. Microphones are becoming ubiquitous through the
use of smart phones and laptops. They are relatively cheap and robust. Hence,
acoustic localization methods hold great potential for utilization.
Special rooms that are equipped with different sensors such as microphones,
orientation sensors, and video cameras are referred to as smart rooms. Smart room
data together with annotations are important resources for developing and evaluating
automatic methods to sense human actions. For example, systems for
locating people based on audio and video could be investigated separately or
jointly if a smart room is equipped with microphones and video cameras. Public
databases of such recordings are available [Gar07b]. Some localization methods
presented in this thesis have also been evaluated in the “CLEAR technology
evaluation”, which uses a large database consisting of annotated smart room
recordings [cle07, Mos07]. These recording rooms are located at the Society in
Information Technologies at Athens Information Technology, Athens, Greece (AIT),
the IBM T.J. Watson Research Center, Yorktown Heights, USA (IBM), the Centro
per la ricerca scientifica e tecnologica at the Instituto Trentino di Cultura
(now Fondazione Bruno Kessler), Trento, Italy (ITC-irst), the Interactive Systems
Labs of the Universität Karlsruhe, Germany (UKA), and the Universitat Politècnica
de Catalunya, Barcelona, Spain (UPC).
`
`1.1 List of Included Publications
`
`This thesis is a compound thesis and is based on the following publications:
`
`P1 Pasi Pertilä, Teemu Korhonen, and Ari Visa, Measurement
`Combination for Acoustic Source Localization in a Room Environment.
`EURASIP Journal on Audio, Speech, and Music Processing, vol. 2008, Ar-
`ticle ID 278185, 14 pages, 2008.
`
`P2 Pasi Pertilä and Mikko Parviainen, Robust Speaker Localization in
`Meeting Room Domain.
`In Proceedings of the IEEE International Con-
`ference on Acoustics, Speech, and Signal Processing, (ICASSP’07), vol. 4,
`pages 497 – 500, 2007.
`
`P3 Pasi Pertilä, Mikko Parviainen, Teemu Korhonen, and Ari Visa,
A Spatiotemporal Approach to Passive Sound Source Localization. In Proceedings
of International Symposium on Communications and Information
`Technologies 2004 (ISCIT’04), pages 1150–1154, 2004.
`
`P4 Pasi Pertilä, Mikko Parviainen, Teemu Korhonen, and Ari Visa,
`Moving Sound Source Localization in Large Areas. In 2005 International
`Symposium on Intelligent Signal Processing and Communication Systems
`(ISPACS 2005), pages 745–748, 2005.
`
`P5 Pasi Pertilä, Array Steered Response Time-Alignment for Propagation
`Delay Compensation for Acoustic Localization. In 42nd Asilomar Confer-
`ence on Signals, Systems, and Computers. In press, 2008.
`
These publications are cited as [P1], [P2], etc.
`
`1.1.1 List of Supplemental Publications
`
`S1 Teemu Korhonen and Pasi Pertilä, TUT Acoustic Source Tracking Sys-
`tem 2007. In R. Stiefelhagen, R. Bowers, and J. Fiscus, editors, Multimodal
`Technologies for Perception of Humans, International Evaluation Work-
`shops CLEAR 2007 and RT 2007. Revised Selected Papers, volume 4625 of
`Series: Lecture Notes in Computer Science, pages 104-112. Springer, 2008.
`
`S2 Pasi Pertilä, Teemu Korhonen, Tuomo Pirinen, and Mikko Parvi-
`ainen, TUT Acoustic Source Tracking System 2006. In R. Stiefelhagen and
`J. Garofolo, editors, Multimodal Technologies for Perception of Humans –
`First international Evaluation Workshop on Classification of Events, Ac-
`tivities and Relationships, CLEAR 2006, Southampton, UK, Lecture Notes
`in Computer Science 4122, pages 127–136. Springer, Southampton, UK,
`2007.
`
`S3 Mikko Parviainen, Pasi Pertilä, Teemu Korhonen, and Ari Visa,
`A Spatiotemporal Approach for Passive Source Localization — Real-World
`Experiments. In Proceedings of International Workshop on Nonlinear Sig-
`nal and Image Processing (NSIP 2005), Sapporo, Japan, pages 468–473,
`2005.
`
`
`
`
Figure 1.1: The process of sound localization can be divided into four stages: sound
emission, propagation of the sound wave, reception of sound, and the actual localization
algorithm.
`
`1.2 Problem Description
`
`The process of acoustic source localization (ASL) is illustrated in Fig. 1.1. The
`ASL problem is divided into four stages: sound emission, propagation, measure-
`ment, and localization. The first three stages represent the physical phenomena
`and the measurement taking place before the localization algorithm solves the
`source position. These stages are briefly discussed in the following subsections.
`This thesis focuses on the last stage and discusses signal processing methods to
`locate the sound source.
When discussing solutions to a problem, it is useful to classify the type of problem.
According to [Tar], the prediction of results from measurements requires 1)
a model of the system under investigation and 2) a physical theory linking the
parameters of the model to the parameters being measured. A forward problem
is to predict the measurement parameters from the model parameters, which is
often straightforward. An inverse problem uses measurement parameters to infer
model parameters. An example of a forward problem would state: output the
received signal at a given microphone location by using the known source position
and source signal. Assuming a free-field scenario, this would be achieved by
simply delaying the source signal by the sound propagation delay between the
source and microphone positions and attenuating the signal in proportion to the
propagation path length. The inverse problem would state: solve the source location by
using the measured microphone signals at known locations. The example inverse
problem is much more difficult to answer than the forward problem.
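As a concrete illustration of this free-field forward problem, the short sketch below delays and attenuates a source signal for a single microphone. It is only a schematic example under the stated free-field assumption, not an implementation from this thesis; the function name, sampling rate, and positions are made-up illustration values.

import numpy as np

def free_field_forward(source_signal, fs, source_pos, mic_pos, c=343.0):
    # Delay the source signal by the propagation time from source to microphone
    # and attenuate it according to spherical spreading (~ 1/distance).
    distance = np.linalg.norm(np.asarray(source_pos) - np.asarray(mic_pos))  # [m]
    delay_samples = int(round(fs * distance / c))      # propagation delay in samples
    attenuation = 1.0 / max(distance, 1e-6)
    received = np.zeros(len(source_signal) + delay_samples)
    received[delay_samples:] = attenuation * np.asarray(source_signal)
    return received

fs = 16000                                             # sampling frequency [Hz]
t = np.arange(fs) / fs                                 # one second of sample times
s = np.sin(2 * np.pi * 440.0 * t)                      # example source signal
x1 = free_field_forward(s, fs, source_pos=[2.0, 1.0, 1.5], mic_pos=[0.0, 0.0, 1.5])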
`Hadamard’s definition of a well-posed problem is
`
`1. A solution exists
`
`2. The solution is unique
`
`3. The solution depends continuously on the data
`
A problem that violates one or more of these rules is termed ill-posed. In the course of
this thesis it will become evident that sound source localization is an ill-posed
inverse problem in most realistic scenarios.
`
`1.2.1 Sound Source
`
A sound source localization system is often designed for a specific application, which
leads to assumptions about the source. For example, in the case of locating a
talker for video conferencing, some assumptions about the movement of humans
can be applied. In addition, the speech signal has special characteristics originating
from the speech production system that differentiate it from other signals.
Signal characteristics such as bandwidth and center frequency can also guide the
selection of a suitable localization scheme. A coarse characterization of the source
signal as either narrowband or wideband is typically made. Many commonly
occurring audio signals are wideband; e.g., human speech and jet aircraft
represent typical wideband signals, while, e.g., some bird calls could be considered
narrowband, consisting of a few individual frequencies. The source can also
be directive, possibly as a function of frequency, such as a human talker [Dun39].
However, the presented methods do not exploit directionality. It is also noted
that detecting a source or enumerating sources is a separate problem from localization,
although they are somewhat related. These problems are not discussed
here.
`
`1.2.2 Sound Propagation
`
`Sound is mechanical vibration of particles, and is propagated as a longitudinal
`wave. It therefore requires a medium (here, air) to exist. Accurately modeling
`sound propagation from an unknown source position to the sensor is not trivial,
`and the physical properties of sound propagation are therefore briefly reviewed.
`A rough division between near-field and far-field sources can be made based
on the geometry of the problem setting. Far-field methods assume that the source-emitted
wavefront is a plane wave at the receiving microphone array. In the near-
`field situation, the received wavefront is curved. In a way, the far-field assumption
`is an approximation of the near-field situation.
This work discusses sound source localization for indoor and outdoor applications.
In both scenarios the received waveform is disturbed by background
noise and multipath propagation effects. Indoors, the multipath effects are
caused by sound reflections from room surfaces and from objects larger than the
wavelength. Sound bends around objects that are smaller than the wavelength; this
phenomenon is called diffraction. For example, a 2000 Hz signal, having a
wavelength of 17 cm, would reflect from an office wall but not from a coffee mug.
Reflections can be specular (mirror-like) or diffuse, where sound is reflected into
directions not predicted by the specular reflection. Diffuse reflections cause scattering
of the wave, i.e., a difference between the ideal wave behavior and the actual
wave behavior.
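As a back-of-the-envelope check of this wavelength argument, the short sketch below compares the wavelength at a few frequencies with two object sizes. The reflect-or-diffract rule of thumb and the object dimensions are illustrative assumptions rather than values from this thesis.

c = 343.0                                      # approximate speed of sound in air [m/s]
for freq in (100.0, 2000.0, 8000.0):           # example frequencies [Hz]
    wavelength = c / freq                      # e.g. 343 / 2000 = 0.17 m
    for name, size in (("office wall", 3.0), ("coffee mug", 0.08)):
        behavior = "reflects from" if wavelength < size else "diffracts around"
        print("%6.0f Hz (wavelength %.2f m) %s %s (%.2f m)"
              % (freq, wavelength, behavior, name, size))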
`Enclosures can be characterized by their acoustical properties. A typical
`measure is the amount of reverberation expressed as the time sound pressure
`takes to attenuate 60 dB after switching off a continuous source [Ros90]. The
`reverberation time is noted as T60 (s). Reverberation is related to the surface
`absorption coefficient αi which determines how much sound is reflected from the
`surface and how much is ab