TAMPEREEN TEKNILLINEN YLIOPISTO
TAMPERE UNIVERSITY OF TECHNOLOGY

Julkaisu 794 * Publication 794

Pasi Pertilä

Acoustic Source Localization in a Room Environment
and at Moderate Distances

Tampere 2009
Tampereen teknillinen yliopisto. Julkaisu 794
Tampere University of Technology. Publication 794

Pasi Pertilä

Acoustic Source Localization in a Room Environment
and at Moderate Distances

Thesis for the degree of Doctor of Technology to be presented with due permission for public examination and criticism in Tietotalo Building, Auditorium TB222, at Tampere University of Technology, on the 30th of January 2009, at 12 noon.

Tampereen teknillinen yliopisto - Tampere University of Technology
Tampere 2009
ISBN 978-952-15-2106-5 (printed)
ISBN 978-952-15-2137-9 (PDF)
ISSN 1459-2045
Abstract

The pressure changes of an acoustic wavefront are sensed with a microphone that acts as a transducer, converting sound pressure into voltage. The voltage is then converted into digital form with an analog-to-digital (AD) converter to provide a discrete-time quantized digital signal. This thesis discusses methods to estimate the location of a sound source from the signals of multiple microphones.

Acoustic source localization (ASL) can be used to locate talkers, which is useful for speech communication systems such as teleconferencing and hearing aids. Active localization methods receive and send energy, whereas passive methods only receive energy. The discussed ASL methods are passive, which makes them attractive for surveillance applications such as localization of vehicles and monitoring of areas. This thesis focuses on ASL in a room environment and at the moderate distances often present in outdoor applications. The frequency range of many commonly occurring sounds, such as speech, vehicles, and jet aircraft, is large. Time delay estimation (TDE) methods are suitable for estimating properties of such wideband signals. Since TDE methods have been extensively studied, the theory is attractive to apply to localization.

Time difference of arrival (TDOA) -based methods estimate the source location from measured TDOA values between microphones. These methods are computationally attractive but deteriorate rapidly when the TDOA estimates are no longer directly related to the source position. In a room environment, such conditions can arise when reverberation or noise starts to dominate TDOA estimation.

The combination of microphone pairwise TDE measurements is studied as a more robust localization solution. TDE measurements are combined into a spatial likelihood function (SLF) of source position. A sequential Bayesian method known as particle filtering (PF) is used to estimate the source position. The PF-based localization accuracy increases when the variance of the SLF decreases. Results from simulations and real data show that multiplication (an intersection operation) results in an SLF with smaller variance than the typically applied summation (a union operation).
The above localization methods assume that the source is located in the near-field of the microphone array, i.e., the curvature of the source-emitted wavefront is observable. In the far-field, the source wavefront is assumed planar, and localization is considered by using spatially separated direction observations. The direction of arrival (DOA) of a source-emitted wavefront impinging on a microphone array is traditionally estimated by steering the array to the direction that maximizes the steered response power. Such estimates can be deteriorated by noise and reverberation. Therefore, talker localization is considered using DOA discrimination.

The sound propagation delay from the source to the microphone array becomes significant at moderate distances. As a result, the directional observations of a moving sound source point behind the true source position. Omitting the propagation delay results in a biased location estimate of a moving or discontinuously emitting source. To solve this problem, it is proposed that the propagation delay be modeled in the estimation process. Motivated by the robustness of localization using the combination of TDE measurements, source localization by directly combining the TDE-based array steered responses is considered. This extends the near-field talker localization methods to far-field source localization. The presented propagation delay modeling is then proposed for the steered response localization. The improvement in localization accuracy from including the propagation delay is studied using a simulated moving sound source in the atmosphere.

The presented indoor localization methods have been evaluated in the Classification of Events, Activities and Relationships (CLEAR) 2006 and CLEAR'07 technology evaluations. In these evaluations, the performance of the proposed ASL methods was assessed by a third party using several hours of annotated data gathered from meetings held in multiple smart rooms. According to the results from the CLEAR'07 development dataset (166 min) presented in this thesis, 92 % of speech activity in a meeting situation was located within 17 cm accuracy.
Preface

This thesis was compiled during my work at Tampere University of Technology (TUT) in the Department of Signal Processing. My research on direction of arrival (DOA) -based sound source localization is summarized in the latter part of this thesis. This topic was introduced to me by my supervisor, Professor Ari Visa. During the years 2007 – 2008 I worked on the time delay estimation (TDE) -based source localization problem. This topic is discussed in the first part of the thesis. The financial support of the Tampere Graduate School in Information Science and Engineering (TISE) is acknowledged. I also wish to thank the Nokia Foundation and the Industrial Research Fund at TUT (Tuula and Yrjö Neuvo fund).

I wish to acknowledge Tuomo Pirinen's activity in organizing the spatial audio research in the Department of Signal Processing before me. I wish to express my gratitude to my colleagues in the Audio Research Group (ARG) for creating an inspiring working environment. I thank Teemu Korhonen for his insightful approaches and mathematical visions – this is also evident in the number of papers we have co-authored. Thanks to Mikko Parviainen for contributing to the presented research and for being an active co-author in many of the included publications. Thanks to Anssi Klapuri for his advice, and thanks to Matti Ryynänen for helping with LaTeX formatting. Thanks to Jouni Paulus, Tuomas Virtanen, Marko Helén, Toni Mäkinen, Antti Löytynoja, Atte Virtanen, Sakari Tervo, Juuso Penttilä, Mikko Roininen, Elina Helander, Hanna Silén, Teemu Karjalainen, Konsta Koppinen, Toni Heittola, and Annamaria Mesaros.

I thank my parents Heikki and Liisa, and my brother Esa for supporting me throughout my studies. Last, but not least, I would like to thank Minna for her kind support.

Pasi Pertilä
Tampere, January 2009
Contents

List of Figures

List of Tables

List of Algorithms

List of Terms, Symbols, and Mathematical Notations

1 Introduction
  1.1 List of Included Publications
      1.1.1 List of Supplemental Publications
  1.2 Problem Description
      1.2.1 Sound Source
      1.2.2 Sound Propagation
      1.2.3 Measurement
      1.2.4 Localization Algorithm
  1.3 Overview of Thesis
  1.4 Author's Contributions
  1.5 Related Work

2 Time Delay Estimation
  2.1 Signal Model
  2.2 The Impulse Response Model
  2.3 Practical Measurement Environment
  2.4 Simulated Room Environment
  2.5 Time Difference of Arrival
  2.6 TDOA Estimation Methods
      2.6.1 Generalized Cross Correlation
      2.6.2 Average Magnitude Difference Function
      2.6.3 TDE Function
      2.6.4 Adaptive TDOA Methods
      2.6.5 Source Model-Based TDOA Methods
      2.6.6 TDOA Interpolation
  2.7 TDOA Estimation Bounds
      2.7.1 CRLB of TDOA Estimation
      2.7.2 Reverberant Systems
      2.7.3 SNR Threshold in Simulations
  2.8 Summary

3 Time Delay Estimation -Based Localization Methods
  3.1 TDOA-Based Closed-Form Localization
      3.1.1 Unconstrained LS Method
      3.1.2 Extended Unconstrained LS Method
      3.1.3 Pre-Multiplying Method
      3.1.4 Constrained LS Method
      3.1.5 Approximate LS Method
      3.1.6 Two Step Closed-Form Weighted LS Method
      3.1.7 Weighted Constrained Least Squares Method
      3.1.8 LS Solution for Source Position, Range, and Propagation Speed
      3.1.9 TDOA Maximum Likelihood Approach
  3.2 Dilution of Precision
  3.3 CRLB of TDOA Localization
  3.4 TDOA-Based Sequential Localization Methods
      3.4.1 State Estimation
  3.5 TDE Function -Based Localization
      3.5.1 Correlation Combination with Summation
      3.5.2 Correlation Combination with Multiplication
      3.5.3 Correlation Combination with Hamacher T-norm
      3.5.4 Spatial Likelihood Function Variance
      3.5.5 TDE Likelihood Function Smoothing and Interpolation
  3.6 TDE Likelihood-Based Localization by Iteration
  3.7 TDE Likelihood-Based Localization with Sequential Bayesian Methods
      3.7.1 Particle Filtering
  3.8 Simulations
      3.8.1 Scoring Metrics
      3.8.2 Localization Methods
      3.8.3 Simulation Results and Discussion
      3.8.4 TDE Likelihood Combination and PF
  3.9 Results with Speech Data
      3.9.1 CLEAR'07 Dataset Description
      3.9.2 Results with CLEAR'07 Dataset
  3.10 Summary
4 Direction of Arrival -Based Localization
  4.1 DOA-Based Localization Problem
      4.1.1 Bearings-Only Source Localization
  4.2 DOA-Based Closed-Form Localization
  4.3 Robust DOA-Based Localization
      4.3.1 Simulations
      4.3.2 Results with Speech Data
  4.4 DOA Vector-Based Localization Using Propagation Delay
      4.4.1 Simulation Results
  4.5 Localization Using TDE-Based Array Steered Responses
  4.6 Sound Propagation Delay in Directional Steered Response Localization
      4.6.1 Implementation Issues
      4.6.2 Simulations
      4.6.3 Results
  4.7 Summary

5 Conclusions, Discussion, and Future Work

6 Errata

Bibliography

P1 Publication 1

P2 Publication 2

P3 Publication 3

P4 Publication 4

P5 Publication 5

Appendix

A Algorithm Descriptions

B Simulation Setup

C Simulation Results

D Concepts Related to Random Processes
List of Figures

1.1 Sound source localization process
2.1 Image source concept
2.2 Recording room floor plan
2.3 Microphone locations inside recording room
2.4 Impulse response of recording room
2.5 Waveform and amplitude spectrum of a speech frame
2.6 Spectrograms of speech signal and babble
2.7 Illustration of simulation setup
2.8 TDOA mapping into spatial coordinates
2.9 Example TDE function
2.10 The threshold effect of TDOA estimation
2.11 Simulated effect of reverberation on cross correlation
3.1 Example of recording room dilution of precision (DOP)
3.2 Example of microphone pairwise SLF
3.3 Example SLF produced by SRP-PHAT
3.4 Example SLF produced by Multi-PHAT
3.5 Marginal SLF from real-data recordings
3.6 Weighted distance error (WDE) values of SLFs built with different combination methods
3.7 RMS error of simulations for SRP-PHAT+PF and Multi-PHAT+PF methods
4.1 DOA-based source localization problem
4.2 Simulation results with robust DOA-based localization
4.3 Space-time diagram
4.4 Source localization problem with propagation delay
4.5 Example of TDE-based DOA likelihood from microphone pair
4.6 TDE-based array steered response
4.7 Example of spatial likelihood function using steered array responses
4.8 RMS localization error of propagation delay -based steered array response localization
4.9 Example of estimated source trajectory with and without propagation delay modeling
List of Tables

2.1 Recording room microphone locations
2.2 Reverberation time values in simulations
3.1 Simulation localization results for ML-TDOA
3.2 Simulation localization results for Multi-PHAT using particle filtering
3.3 Real-data results with CLEAR'07 database
4.1 Robust DOA-based simulation setup
B.1 Microphone coordinates
C.1 Accuracy of ML-TDOA localization in simulations
C.2 Accuracy of Multi-PHAT + PF localization in simulations
List of Algorithms

1 SIR algorithm for particle filtering [Aru02]
2 The systematic resampling algorithm [Aru02]
3 ADC method for speaker localization [P2]
4 DOA vector-based localization with propagation delay [P3]
5 TDE-based directional likelihood for far-field source localization
6 TDE-based directional likelihood for far-field source localization with propagation delay according to [P5]
List of Terms, Symbols, and Mathematical Notations

Terms and Acronyms

Term or acronym   Explanation
AED               Adaptive Eigenvalue Decomposition
AMDF              Absolute Magnitude Difference Function
AMSF              Absolute Magnitude Sum Function
ASL               Acoustic Source Localization
BOL               Bearings Only Localization
CDF               Cumulative Distribution Function
CLEAR             CLassification of Events, Activities and Relationships evaluation and workshop
CRLB              Cramér-Rao Lower Bound
CSD               Cross Spectral Density
DFT               Discrete Fourier Transform
DOA               Direction Of Arrival
DOP               Dilution Of Precision
FIM               Fisher Information Matrix
FIR               Finite Impulse Response
GCC               Generalized Cross Correlation
GPS               Global Positioning System
IID               Independent and Identically Distributed
LASER             Light Amplification by Stimulated Emission of Radiation
LS                Least Squares
MAMDF             Modified Absolute Magnitude Difference Function
ML                Maximum Likelihood
MVDR              Minimum Variance Distortionless Response
PDF               Probability Density Function
PF                Particle Filter
PHAT              PHAse Transform
PSD               Power Spectral Density
RADAR             RAdio Detecting And Ranging
RMS               Root Mean Square
SAD               Speech Activity Detection
SIR               Sampling Importance Resampling algorithm
SLF               Spatial Likelihood Function (of source position)
SNR               Signal to Noise Ratio
SONAR             SOund NAvigation and Ranging
SRP-PHAT          Steered Response Power using PHAT
SSL               Sound Source Localization
TDE               Time Delay Estimation
TDOA              Time Difference Of Arrival
VAD               Voice Activity Detection
WLS               Weighted Least Squares
Mathematical Notations

List of symbols

Symbol    Explanation
a         Scalar variable
a         A column vector of scalars, a = [a_1, a_2, ..., a_N]^T
1         A column vector of ones
I         Identity matrix, I = diag(1)
W         A matrix of scalars with entries w_{mn}, m = 1, ..., M, n = 1, ..., N
j         Scalar constant equal to √−1
R^N       An N-dimensional space of real numbers
x(t)      Value of signal x at time t
ω         Angular frequency [rad/s]
f         Frequency [Hz]
f_s       Sampling frequency
L         Length of processing frame [samples]
T_w       Duration of a processing frame of length L [s]
X(k)      DFT of frame x(t) (at discrete frequency index k)
µ_x       Mean value of variable x
σ²_x      Variance of variable x
Ω         A set of elements
λ         Wavelength
List of operators

Notation          Explanation
U(a, b)           Uniform distribution between a and b
N(µ, σ²)          Normal distribution with mean µ and variance σ²
a*                Complex conjugate of a
|a|               Absolute value of a
⌊·⌉               Rounding to the nearest integer
â                 Estimate of a
‖a‖               Euclidean norm of vector a
D(a, b)           Euclidean distance between a and b, ‖a − b‖
W^T               Matrix transpose
W^{-1}            Matrix inverse
diag(w)           A square matrix with non-diagonal values of 0 and diagonal values specified in vector w
trace(W)          Sum of the diagonal values of matrix W
E[a]              Expected value of a
∗                 Convolution operator
⊗, ⊕              Binary operators
p(a; θ)           Probability of a parameterized by θ
P(a|b)            The likelihood of a conditioned on b
Proj_b a          Projection of vector a onto vector b
|Ω|               Cardinality of set Ω
f(x) = O(g(x))    Function g(x) is the asymptotic upper bound for the computational time of function f(x)
Chapter 1

Introduction

Localization has been an important task in the history of mankind. In the beginning of modern navigation, one could determine one's position at sea by measuring the angles of celestial objects above the horizon at a known time. The angles were determined via measurements, e.g., using a sextant. The celestial object's angle above the horizon at a certain time determines a line of position (LOP) on a map. The crossing of LOPs is the location. Modern navigation and localization utilize mainly electromagnetic signals. The applications of localization include radio detecting and ranging (RADAR) systems, global positioning system (GPS) navigation, and light amplification by stimulated emission of radiation (LASER) -based localization technology. Other means of localization include the utilization of sound waves in, e.g., underwater applications such as sound navigation and ranging (SONAR).

Localization methods can be divided into active and passive methods. Active methods send and receive energy, whereas passive methods only receive energy. Active methods have the advantage of controlling the signal they emit, which helps the reception process. Drawbacks of an active method are that the emitter position is revealed, more complex transducers are required, and the energy consumption is higher compared to passive systems. Passive methods are more suitable for surveillance purposes since no energy is intentionally emitted. This thesis focuses on passive acoustic source localization methods.

In the era of electrical localization methods, why does one require acoustic localization? Typically the location of a source can be solved with several techniques, often even more accurately than with the use of sound. There are, however, situations where the use of sound for localization is natural. Consider the following video conference setup. A rotating camera is placed at the center of the meeting room table and the participants sit around the table. The remote end would like to see the video image of the active talker and hear his speech. How could the camera be steered to the direction of the active talker? All participants could have buttons which they press before speaking to turn
the pre-calibrated camera, a cameraman could manually turn the camera, or a microphone array could determine the speaker direction and steer the camera automatically. All these approaches would work to varying degrees, but the sound-based automatic camera steering is obviously the most practical solution. Such systems have been widely developed and have been used for automatic camera management during lectures [Liu01]. However, more reverberation- and noise-tolerant solutions are called for. Microphones are becoming ubiquitous through the use of smart phones and laptops. They are relatively cheap and robust. Hence, acoustic localization methods hold great potential for utilization.

Special rooms that are equipped with different sensors, such as microphones, orientation sensors, and video cameras, are referred to as smart rooms. Smart room data together with annotations are important resources for developing and evaluating automatic methods to sense human actions. For example, systems for locating people based on audio and video could be investigated separately or jointly if a smart room is equipped with microphones and video cameras. Public databases of such recordings are available [Gar07b]. Some localization methods presented in this thesis have also been evaluated in the "CLEAR technology evaluation", which uses a large database consisting of annotated smart room recordings [cle07, Mos07]. These recording rooms are located at the Society in Information Technologies at Athens Information Technology, Athens, Greece (AIT), the IBM T.J. Watson Research Center, Yorktown Heights, USA (IBM), the Centro per la Ricerca Scientifica e Tecnologica at the Istituto Trentino di Cultura¹, Trento, Italy (ITC-irst), the Interactive Systems Labs of the Universität Karlsruhe, Germany (UKA), and the Universitat Politècnica de Catalunya, Barcelona, Spain (UPC).

¹ Fondazione Bruno Kessler
1.1 List of Included Publications

This thesis is a compound thesis and is based on the following publications:

P1 Pasi Pertilä, Teemu Korhonen, and Ari Visa, Measurement Combination for Acoustic Source Localization in a Room Environment. EURASIP Journal on Audio, Speech, and Music Processing, vol. 2008, Article ID 278185, 14 pages, 2008.

P2 Pasi Pertilä and Mikko Parviainen, Robust Speaker Localization in Meeting Room Domain. In Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP'07), vol. 4, pages 497–500, 2007.
P3 Pasi Pertilä, Mikko Parviainen, Teemu Korhonen, and Ari Visa, A Spatiotemporal Approach to Passive Sound Source Localization. In Proceedings of International Symposium on Communications and Information Technologies 2004 (ISCIT'04), pages 1150–1154, 2004.
P4 Pasi Pertilä, Mikko Parviainen, Teemu Korhonen, and Ari Visa, Moving Sound Source Localization in Large Areas. In 2005 International Symposium on Intelligent Signal Processing and Communication Systems (ISPACS 2005), pages 745–748, 2005.

P5 Pasi Pertilä, Array Steered Response Time-Alignment for Propagation Delay Compensation for Acoustic Localization. In 42nd Asilomar Conference on Signals, Systems, and Computers. In press, 2008.

These publications are cited as [P1], [P2], etc.
1.1.1 List of Supplemental Publications

S1 Teemu Korhonen and Pasi Pertilä, TUT Acoustic Source Tracking System 2007. In R. Stiefelhagen, R. Bowers, and J. Fiscus, editors, Multimodal Technologies for Perception of Humans, International Evaluation Workshops CLEAR 2007 and RT 2007, Revised Selected Papers, volume 4625 of Series: Lecture Notes in Computer Science, pages 104–112. Springer, 2008.

S2 Pasi Pertilä, Teemu Korhonen, Tuomo Pirinen, and Mikko Parviainen, TUT Acoustic Source Tracking System 2006. In R. Stiefelhagen and J. Garofolo, editors, Multimodal Technologies for Perception of Humans – First International Evaluation Workshop on Classification of Events, Activities and Relationships, CLEAR 2006, Southampton, UK, Lecture Notes in Computer Science 4122, pages 127–136. Springer, Southampton, UK, 2007.

S3 Mikko Parviainen, Pasi Pertilä, Teemu Korhonen, and Ari Visa, A Spatiotemporal Approach for Passive Source Localization — Real-World Experiments. In Proceedings of International Workshop on Nonlinear Signal and Image Processing (NSIP 2005), Sapporo, Japan, pages 468–473, 2005.
[Figure 1.1: block diagram with the stages Sound emission, Propagation, Measurement, Localization method]

Figure 1.1: The process of sound localization can be divided into four stages: sound emission, propagation of the sound wave, reception of sound, and the actual localization algorithm.
1.2 Problem Description

The process of acoustic source localization (ASL) is illustrated in Fig. 1.1. The ASL problem is divided into four stages: sound emission, propagation, measurement, and localization. The first three stages represent the physical phenomena and the measurement taking place before the localization algorithm solves the source position. These stages are briefly discussed in the following subsections. This thesis focuses on the last stage and discusses signal processing methods to locate the sound source.

When discussing solutions to a problem, it is useful to classify the type of problem. According to [Tar], the prediction of results from measurements requires 1) a model of the system under investigation and 2) a physical theory linking the parameters of the model to the parameters being measured. A forward problem is to predict the measurement parameters from the model parameters, which is often straightforward. An inverse problem uses measurement parameters to infer model parameters. An example of a forward problem would state: output the received signal at a given microphone location by using the known source position and source signal. Assuming a free-field scenario, this would be achieved simply by delaying the source signal by the sound propagation delay between the source and microphone positions and attenuating the signal relative to the propagation path length. The inverse problem would state: solve the source location by using the measured microphone signals at known locations. The example inverse problem is much more difficult to answer than the forward problem.
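As a rough illustration of the free-field forward problem described above, the sketch below delays a known source signal by the propagation time and attenuates it with distance. It is only a sketch under stated assumptions: the sampling rate, speed of sound, spherical 1/r attenuation, and the name freefield_forward are illustrative choices, not values or notation taken from this thesis.

```python
import numpy as np

def freefield_forward(source_signal, src_pos, mic_pos, fs=16000, c=343.0):
    """Free-field forward problem: delay the source signal by the propagation
    time from src_pos to mic_pos and attenuate it by 1/distance."""
    distance = np.linalg.norm(np.asarray(src_pos, float) - np.asarray(mic_pos, float))
    delay_samples = int(round(distance / c * fs))   # propagation delay in samples
    gain = 1.0 / max(distance, 1e-6)                # spherical spreading loss
    received = np.zeros(len(source_signal) + delay_samples)
    received[delay_samples:] = gain * np.asarray(source_signal, float)
    return received

# Example: a 0.1 s, 1 kHz tone emitted 3 m from the microphone.
fs = 16000
t = np.arange(0, 0.1, 1.0 / fs)
mic_signal = freefield_forward(np.sin(2 * np.pi * 1000 * t),
                               src_pos=[3.0, 0.0, 0.0],
                               mic_pos=[0.0, 0.0, 0.0],
                               fs=fs)
```

The corresponding inverse problem, recovering src_pos from several such microphone signals, has no comparably simple solution, which is the point made above.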
Hadamard's definition of a well-posed problem is

1. A solution exists

2. The solution is unique

3. The solution depends continuously on the data

A problem that violates one or more of these rules is termed ill-posed.
During this thesis it will become evident that sound source localization is an ill-posed inverse problem in most realistic scenarios.

1.2.1 Sound Source

A sound source localization system is often designed for a specific application, which leads to assumptions about the source. For example, in the case of locating a talker for video conferencing, some assumptions about the movement of humans can be applied. In addition, the speech signal has special characteristics originating from the speech production system that differentiate it from other signals. Signal characteristics such as bandwidth and center frequency can also guide the selection of a suitable localization scheme. A coarse characterization of the source signal as either narrowband or wideband is typically made. Many commonly occurring audio signals, e.g., human speech and jet aircraft noise, are wideband, while some bird calls could be considered narrowband, consisting of a few individual frequencies. The source can also be directive, possibly as a function of frequency, such as a human talker [Dun39]. However, the presented methods do not exploit directionality. It is also noted that detecting a source or enumerating sources is a separate problem from localization, although they are somewhat related. These problems are not discussed here.
1.2.2 Sound Propagation

Sound is mechanical vibration of particles and propagates as a longitudinal wave. It therefore requires a medium (here, air) to exist. Accurately modeling sound propagation from an unknown source position to the sensor is not trivial, and the physical properties of sound propagation are therefore briefly reviewed.

A rough division between near-field and far-field sources can be made based on the geometry of the problem setting. Far-field methods assume that the source-emitted wavefront is a plane wave at the receiving microphone array. In the near-field situation, the received wavefront is curved. In a way, the far-field assumption is an approximation of the near-field situation.

This work discusses sound source localization for indoor and outdoor applications. In both scenarios the received waveform is disturbed by background noise and multipath propagation effects. Indoors, the multipath effects are caused by sound reflections from room surfaces and objects larger than the wavelength. Sound bends around objects that are smaller than the wavelength; this phenomenon is called diffraction. For example, a 2000 Hz signal with a wavelength of 17 cm would reflect from an office wall but not from a coffee mug. Reflections can be specular (mirror-like) or diffuse, where sound is reflected into directions not predicted by the specular reflection. Diffuse reflections cause scattering of the wave, i.e., a difference between the ideal and the actual wave behavior.
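As a side calculation (not part of the original text), the 17 cm figure quoted above follows from the relation between propagation speed and frequency, assuming a speed of sound of roughly 343 m/s:

\[
\lambda = \frac{c}{f} \approx \frac{343\ \mathrm{m/s}}{2000\ \mathrm{Hz}} \approx 0.17\ \mathrm{m}.
\]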
Enclosures can be characterized by their acoustical properties. A typical measure is the amount of reverberation, expressed as the time the sound pressure takes to attenuate by 60 dB after switching off a continuous source [Ros90]. The reverberation time is denoted T60 (s). Reverberation is related to the surface absorption coefficient αi, which determines how much sound is reflected from the surface and how much is absorbed.
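To make the T60 definition above concrete, the following sketch estimates the reverberation time from a measured impulse response using Schroeder backward integration and a line fit to the decay curve. This is a common estimation approach, not necessarily the procedure used in this thesis; the fit range (-5 dB to -25 dB, extrapolated to a 60 dB decay) and the name estimate_t60 are assumptions made for the example.

```python
import numpy as np

def estimate_t60(impulse_response, fs, fit_range_db=(-5.0, -25.0)):
    """Estimate T60 from an impulse response via Schroeder backward
    integration and a linear fit to the energy decay curve (EDC)."""
    h = np.asarray(impulse_response, dtype=float)
    # Schroeder integration: energy remaining after each time instant.
    edc = np.cumsum((h ** 2)[::-1])[::-1]
    edc_db = 10.0 * np.log10(edc / edc[0] + 1e-12)
    t = np.arange(len(h)) / float(fs)
    # Fit a line to the part of the decay curve inside the chosen range.
    mask = (edc_db <= fit_range_db[0]) & (edc_db >= fit_range_db[1])
    slope, _ = np.polyfit(t[mask], edc_db[mask], 1)   # decay rate in dB/s
    return -60.0 / slope                              # time to decay by 60 dB
```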
