DECLARATION OF GORDON MACPHERSON
`
`I, Gordon MacPherson, am over twenty-one (21) years of age. I have never been
`convicted of a felony, and I am fully competent to make this declaration. I declare the following
`to be true to the best of my knowledge, information and belief:
`
`1. I am Director Board Governance & IP Operations of The Institute of Electrical and
`Electronics Engineers, Incorporated (“IEEE”).
`
`2. IEEE is a neutral third party in this dispute.
`
`3. I am not being compensated for this declaration and IEEE is only being reimbursed
`for the cost of the article I am certifying.
`
`4. Among my responsibilities as Director Board Governance & IP Operations, I act as a
`custodian of certain records for IEEE.
`
`5. I make this declaration based on my personal knowledge and information contained
`in the business records of IEEE.
`
`6. As part of its ordinary course of business, IEEE publishes and makes available
`technical articles and standards. These publications are made available for public
`download through the IEEE digital library, IEEE Xplore.
`
`7. It is the regular practice of IEEE to publish articles and other writings including
`article abstracts and make them available to the public through IEEE Xplore. IEEE
`maintains copies of publications in the ordinary course of its regularly conducted
`activities.
`
`8. The article below has been attached as Exhibit A to this declaration:
`
`A. Miki Sato et al.; “A single-chip speech dialogue module and its evaluation
`on a personal robot, PaPeRo-mini”, 2009 IEEE International Conference
`on Acoustics, Speech and Signal Processing, April 19 – 24, 2009.
`
`9. I obtained a copy of Exhibit A through IEEE Xplore, where it is maintained in the
`ordinary course of IEEE’s business. Exhibit A is a true and correct copy of the
`Exhibit, as it existed on or about December 29, 2021.
`
`10. The article and abstract from IEEE Xplore show the date of publication. IEEE
`Xplore populates this information using the metadata associated with the publication.
`
`445 Hoes Lane Piscataway, NJ 08854
`
`DocuSign Envelope ID: 7FCDEB04-9D8A-4D7A-9401-7811BCC4CFCA
`
`Page 1 of 17
`
`SONOS EXHIBIT 1049
`
`

`

`
`11. Miki Sato et al.; “A single-chip speech dialogue module and its evaluation on a
`personal robot, PaPeRo-mini” was published in the 2009 IEEE International
`Conference on Acoustics, Speech and Signal Processing. The 2009 IEEE
`International Conference on Acoustics, Speech and Signal Processing was held from
`April 19 – 24, 2009. Copies of the conference proceedings were made available no
`later than the last day of the conference. The article is currently available for public
`download from the IEEE digital library, IEEE Xplore.
`
`12. I hereby declare that all statements made herein of my own knowledge are true and
`that all statements made on information and belief are believed to be true, and further
`that these statements were made with the knowledge that willful false statements and
`the like are punishable by fine or imprisonment, or both, under 18 U.S.C. § 1001.
`
`I declare under penalty of perjury that the foregoing statements are true and correct.
`
`
`
`
`Executed on: 1/6/2022
`
`

`

`
`
`
`
`
`
`
`
`
`
`
`
`
`
`EXHIBIT A
`
`Access provided by: Everything Demo User
`
`A single-chip speech dialogue module and its evaluation
`on a personal robot, PaPeRo-mini
`Publisher: IEEE
`
`
`Miki Sato; Toru Iwasawa; Akihiko Sugiyama; Toshihiro Nishizawa; Yosuke Takano
`
`
`131 Full Text Views
`3 Paper Citations
`
`Abstract:
`This paper presents a single-chip speech dialogue module and its evaluation on
`a personal robot. This module is implemented on an application processor that
`was developed primarily for mobile phones to provide a compact size, low
`power consumption, and low cost. It performs speech recognition with
`preprocessing functions such as direction-of-arrival (DOA) estimation, noise
`cancellation, beamforming with an array of microphones, and echo cancellation.
`The module is also equipped with text-to-speech (TTS) conversion. Evaluation results
`obtained on a new personal robot, PaPeRo-mini, which is a scaled-down version
`of PaPeRo, demonstrate an 85% correct rate in DOA estimation, and as much
`as 54% and 30% higher speech recognition rates in noisy environments and
`during robot utterances, respectively. These results are shown to be
`comparable to those obtained by PaPeRo.
`
`More Like This
`Coherent signals direction-of-arrival estimation using a spherical microphone array: Frequency smoothing approach (2009 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics; published 2009)
`Co-prime Circular Microphone Arrays and Their Application to Direction of Arrival Estimation of Speech Sources (ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP); published 2019)
`
`Published in: 2009 IEEE International Conference on Acoustics, Speech and Signal Processing
`
`Date of Conference: 19-24 April 2009
`
`Date Added to IEEE Xplore: 26 May 2009
`
`INSPEC Accession Number: 10701554
`
`DOI: 10.1109/ICASSP.2009.4960429
`
`Publisher: IEEE
`
`Conference Location: Taipei, Taiwan
`
`SECTION 1.
`INTRODUCTION
`
`Speech dialogue systems have been receiving particular
`attention as a user interface for a wide variety of interactive
`applications, such as robots and car navigation systems. These
`applications are generally controlled by voice commands from a
`distance. A given command is processed by a speech recognition
`system to generate a corresponding operation. It is also necessary
`to transform text information into an audible form by using a
`text-to-speech (TTS) conversion system. However, it is still
`challenging to perform off-microphone speech recognition,
`where the microphone is placed at a distance from the talker [1].
`In noisy environments, the target signal is seriously interfered
`with by other signals and by ambient noise. Therefore, noise
`robustness is essential to speech recognition systems in the real
`environment.
`
`To reduce the undesirable influence of ambient noise and
`interference, signal-processing functions have been used to
`preprocess the noisy speech. Among these functions are
`estimation of the direction of arrival (DOA) [2], [3], noise
`cancellation [4], beamforming with a microphone array [5],
`and echo cancellation [6]. DOA estimation identifies the
`direction of the voice command so that the microphone
`directivity can be steered towards the speech source. An adaptive
`noise canceller (ANC) and a microphone array (MA) reduce
`undesirable influences that cannot be sufficiently offset by the
`directional microphone. An acoustic echo canceller (AEC)
`suppresses the echo, that is, the part of the robot's own speech that
`leaks into the microphone signal and contaminates the voice command.
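`The DOA step can be illustrated for a single microphone pair. What follows is a minimal far-field sketch, not the near-field estimator of [3]. It finds the time difference of arrival by maximizing a normalized cross-correlation, rejects frames that have insufficient power or correlation, and converts the lag to an angle with θ = arcsin(cτ/d). Several details are assumptions made for illustration: the speed of sound (343 m/s), the rejection thresholds, and the helper name `pair_doa`. The 11025 Hz sampling rate comes from Section 2.3.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s; assumed room-temperature value
FS = 11025              # Hz; sampling rate stated in Section 2.3


def pair_doa(x1, x2, mic_distance, min_power=1e-3, min_corr=0.3):
    """Estimate a far-field arrival angle (degrees) for one microphone pair.

    Hypothetical helper. Returns None when the frame is rejected for
    insufficient power or insufficient correlation. A positive angle means
    the sound reached microphone 1 before microphone 2.
    """
    n = len(x1)
    p1 = sum(v * v for v in x1) / n
    p2 = sum(v * v for v in x2) / n
    if min(p1, p2) < min_power:
        return None  # rejected: insufficient power

    # Largest physically possible lag (in samples) for this spacing.
    max_lag = int(mic_distance / SPEED_OF_SOUND * FS) + 1
    best_lag, best_r = 0, -math.inf
    for lag in range(-max_lag, max_lag + 1):
        # Correlate x1[i] with x2[i + lag]; a peak at lag > 0 means x2 lags x1.
        acc = sum(x1[i] * x2[i + lag] for i in range(max(0, -lag), min(n, n - lag)))
        r = acc / (n * math.sqrt(p1 * p2))
        if r > best_r:
            best_lag, best_r = lag, r
    if best_r < min_corr:
        return None  # rejected: insufficient correlation

    # Convert the lag to an angle; clamp because integer lags can overshoot |sin| = 1.
    s = best_lag / FS * SPEED_OF_SOUND / mic_distance
    return math.degrees(math.asin(max(-1.0, min(1.0, s))))
```

`Take the 12 to 14 cm distances labelled in Fig. 7. At 11025 Hz, the largest possible lag for those spacings is only about 4 to 4.5 samples, so integer-lag resolution is coarse. A practical estimator would interpolate around the correlation peak.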
`
`In robot applications, these functions are generally implemented
`by software on a platform based on a personal computer (PC) [7].
`It is sometimes necessary to share computational power with
`other applications on the same platform. Considering that a
`larger number of complex applications are required on a robot, it
`is desirable to have a speech dialogue module on a separate
`platform so that the PC-based platform can be fully devoted to
`more complex and computationally intense applications on the
`robot. On such a separate module, the performance of the speech
`dialogue functions becomes more stable and can be guaranteed. In
`addition, a compact speech dialogue module, being portable, helps
`promote human-robot interaction on more robots that otherwise
`would not have such an interface.
`
`This paper presents a compact speech dialogue module and its
`evaluation on a personal robot. This module offers dialogue
`functions similar to those of the personal robot PaPeRo [8] on an
`application processor that was developed primarily for mobile
`phones to provide a compact size, low power consumption, and
`low cost. Section 2 describes the functions of the speech dialogue
`module and the hardware used to implement them. Section 3
`presents evaluation results of a near-field DOA estimator, a
`noise-robust ANC with variable step sizes, an adaptive beamformer
`for the MA, and a noise-robust AEC in the real environment.
`
`SECTION 2.
`SPEECH DIALOGUE MODULE
`
`2.1. Speech Dialogue Functions
`A block diagram of the speech dialogue module is illustrated in
`Fig. 1. This module consists of speech recognition (word
`recognition), DOA estimation, noise reduction, and TTS
`conversion as speech dialogue functions. Noise reduction has
`three subfunctions, namely, an adaptive noise canceller (ANC), a
`microphone-array (MA) beamformer, and an acoustic echo
`canceller (AEC). These subfunctions operate separately, and the
`desired output is selected manually. These functions work as RT
`(Robot Technology) components for RT middleware [9] and can be
`controlled by other network-connected applications.
`
`Fig. 1. Block diagram of Speech Dialogue Module.
`
`
`
`2.2. Hardware
`
`Figure 2 shows a picture of the speech dialogue module, whose
`specifications are listed in Table 1. An application processor,
`MP211 [10], primarily designed for mobile phones, is employed to
`provide sufficient processing power. It consists of one DSP and
`three ARM9 cores and runs a Linux operating system. For audio
`input, this module is equipped with synchronous microphone inputs
`on an extended board that are extensible to 16 channels, as well as
`2 types of 2-channel synchronous microphone inputs on the main
`board. In addition, the module has peripheral interfaces such as
`2-channel loudspeaker outputs, 2-channel camera inputs, an LCD
`output, and USB and LAN interfaces. A CompactFlash (CF) memory
`card can be used on the extended board.
`
`Fig. 2. Speech Dialogue Module.
`
`
`
`TABLE 1. Specifications of the Speech Dialogue Module
`
`Item | Specification
`CPU | ARM9 (192 MHz) × 3; DSP (SPXK6, 192 MHz) × 1
`Memory | 128 MB (+128 MB w/ extended board); Flash 64 MB
`Audio Interface | microphone inputs 2ch × 2 (+16ch w/ audio board); speaker output 2ch
`Image Interface | camera input × 2; video output, LCD output
`Other Interface | USB, LAN, IrDA, GPIO; CF Card (w/ extended board)
`Size | 55 mm × 100 mm × 32 mm (w/ audio and extended boards)
`
`2.3. Implementation of the functions
`
`The functions of the module were distributed between one ARM9
`core and the DSP core, both running at 192 MHz. The task
`distribution between the ARM9 and the DSP is depicted in Fig. 3.
`Speech recognition, TTS conversion, and the RT component
`translator operate on the ARM9. DOA estimation and noise
`reduction are decomposed into core-function blocks and control
`blocks. The noise-reduction core consists of three sub-cores,
`namely, the ANC core, the MA core, and the AEC core. The control
`blocks operate on the ARM core and the core-function blocks on
`the DSP core. The input signals are converted into digital form at
`a rate of 11025 Hz and stored in multi-ring buffers in the internal
`memory of the ARM. The computational load for each
`noise-reduction function is compared in Fig. 4.
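`Section 2.3 says the digitized inputs are stored in multi-ring buffers. Fig. 3 lists inputs #1 to #20, which is consistent with the 2ch × 2 + 16ch microphone inputs in Table 1. The paper does not describe the buffer layout, so the class below is a hypothetical sketch of one common design. It keeps one circular array per channel, all sharing a single write index, so that consumers such as the DOA and noise-reduction blocks can each read the most recent samples. The class and method names are illustrative only.

```python
class MultiRingBuffer:
    """Fixed-capacity circular buffer for `channels` synchronous audio streams
    (hypothetical layout; the paper does not specify one)."""

    def __init__(self, channels, capacity):
        self.channels = channels
        self.capacity = capacity
        self.data = [[0.0] * capacity for _ in range(channels)]
        self.write_pos = 0  # slot that the next sample will overwrite
        self.count = 0      # number of valid samples, at most `capacity`

    def write(self, frame):
        """Append one multichannel sample (a sequence of length `channels`)."""
        if len(frame) != self.channels:
            raise ValueError("frame width does not match channel count")
        for ch, value in enumerate(frame):
            self.data[ch][self.write_pos] = value
        self.write_pos = (self.write_pos + 1) % self.capacity
        self.count = min(self.count + 1, self.capacity)

    def latest(self, channel, n):
        """Return the most recent n samples of one channel, oldest first."""
        if n > self.count:
            raise ValueError("not enough samples buffered")
        start = (self.write_pos - n) % self.capacity
        buf = self.data[channel]
        return [buf[(start + k) % self.capacity] for k in range(n)]
```

`For example, `MultiRingBuffer(20, 11025)` would hold one second of audio for all 20 inputs at the stated sampling rate. Once the buffer is full, older samples are overwritten.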
`
`Fig. 3. Task distribution between ARM and DSP cores.
`
`
`
`
`
`Fig. 4. Computational load for each function.
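`The PDF copy of Fig. 4 in this exhibit labels the per-function loads as AEC 142, MA 82, ANC 78, and DOA 46, with a total of 348 MIPS. As a quick arithmetic check, the four values sum to the stated total. Each function's share works out as follows:

```latex
\begin{aligned}
46 + 78 + 82 + 142 &= 348\ \text{MIPS} \\
\text{AEC: } 142 / 348 &\approx 40.8\% \\
\text{MA: } 82 / 348 &\approx 23.6\% \\
\text{ANC: } 78 / 348 &\approx 22.4\% \\
\text{DOA: } 46 / 348 &\approx 13.2\%
\end{aligned}
```

`By this count, the AEC alone accounts for roughly two fifths of the load. That is more than the DOA estimator and the ANC combined (124 MIPS).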
`
`SECTION 3.
`EVALUATION
`
`3.1. Platform: PaPeRo-mini
`
`The speech dialogue module was installed in PaPeRo-mini, which is
`a scaled-down version of PaPeRo, a partner robot based on a
`Windows PC. Figure 5 depicts PaPeRo and PaPeRo-mini, whose
`specifications are compared in Table 2. PaPeRo-mini is an
`autonomous mobile robot with a size of 250 × 170 × 179 mm
`(HWD) and a weight of 2.5 kg. Eight equispaced omnidirectional
`microphones are mounted around the neck, and 2-channel
`loudspeakers are mounted at the bottom. It also has CCD
`cameras, ultrasonic sensors, infrared sensors, touch sensors, a
`pyroelectric sensor, and an LCD.
`
`Fig. 5. PaPeRo-mini (Left) and PaPeRo (Right).
`
`TABLE 2. Specifications of PaPeRo-mini and PaPeRo
`
`Item | PaPeRo-mini | PaPeRo
`CPU | MP211 | Pentium-M 1.6 GHz
`OS | Linux | Windows XP
`Audio Input | Omnidirectional Mic x8 | Omnidirectional Mic x7, Directional Mic x1
`Audio Output | Stereo Loudspeakers, Line Output x2 | Stereo Loudspeakers, Line Output x2
`Image Input | Stereo CCD Camera | Stereo CCD Camera
`Image Output | Composite Video, LCD | Composite Video, RGB
`Other I/F | IrDA, USB | Remote Control, USB
`Battery | Li-ion 74Wh, Operating Time 8h | Li-ion 60Wh, Operating Time 2h
`Size (HWD) | 250 × 170 × 179 mm | 385 × 248 × 245 mm
`Weight | 2.5 kg | 5.0 kg
`
`3.2. DOA (direction of arrival) Estimation [3]
`
`Figure 6 (a) depicts the evaluation environment in a room with a
`background noise level of 40 dBA. One sentence, spoken by 5
`different male and female speakers, was presented 10 times at
`75 dB from a loudspeaker at a height of 1.0 m. PaPeRo-mini was
`placed 1.5 m away and rotated in 30-degree steps to produce
`12 different DOAs. The microphone arrangements of PaPeRo-mini
`and PaPeRo are illustrated in Fig. 7.
`
`Fig. 6. Evaluation Environment. (a) DOA, (b) ANC/MA/AEC.
`
`
`
`
`
`Fig. 7. Microphone Arrangement (Top View, Distance in cm). (a)
`PaPeRo-mini, (b) PaPeRo
`
`Figure 8 depicts the evaluation result. A DOA estimate is counted as
`a detection unless it was rejected because of insufficient power,
`insufficient correlation, or inconsistent DOAs among different
`microphone pairs. An answer is counted as correct if it lies within
`±15 degrees of the true DOA. For comparison, the evaluation result
`of PaPeRo is also depicted in Figure 8. The parameters for height
`adjustment [3] were set to d = 1.5 m and h = 1.0 m, typical of robot
`use at home. The correct-answer rate at 60 degrees is slightly lower
`than at the other directions. However, the average correct-answer
`rate reaches 83%, which is comparable to that of PaPeRo.
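`The scoring rule above can be written as a short function. Under that rule, a detection is any estimate that was not rejected, and a correct answer lies within ±15 degrees of the true DOA. Two details in the sketch are assumptions not stated in the paper. First, angular error wraps around 360 degrees, so 350° and 5° are 15° apart. Second, both rates are computed over all presentations rather than over detections only. The function names are hypothetical.

```python
def angular_error(estimate, truth):
    """Smallest absolute difference between two bearings, in degrees."""
    d = (estimate - truth) % 360.0
    return min(d, 360.0 - d)


def doa_rates(trials, tolerance=15.0):
    """Return (detection rate, correct-answer rate) in percent.

    trials: list of (estimate, true_doa) pairs, where estimate is None when
    the estimator rejected the frame. Both rates are relative to len(trials).
    """
    if not trials:
        raise ValueError("no trials")
    detected = [(est, true) for est, true in trials if est is not None]
    correct = sum(1 for est, true in detected if angular_error(est, true) <= tolerance)
    n = len(trials)
    return 100.0 * len(detected) / n, 100.0 * correct / n
```

`For instance, `doa_rates([(350.0, 5.0), (None, 30.0), (100.0, 60.0), (62.0, 60.0)])` returns `(75.0, 50.0)`. Three of the four presentations are detected, and two of those fall within the tolerance.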
`
`Fig. 8. DOA Estimation Result.
`
`
`
`3.3. ANC (adaptive noise canceller) [4]
`
`Speech recognition was performed on speech noise-cancelled by the
`ANC. The evaluation environment, prepared in the same room as that
`for DOA estimation, is depicted in Figure 6 (b). 450 utterances by
`9 different male, female, and child speakers were presented at a
`distance of 1.0 m, a height of 1.0 m, and a level of 70 dB. A
`loudspeaker presenting a commercial TV program was placed 1.0 m
`away as the noise source, in a direction of 90, 135, or 180 degrees,
`at a level of 60-65 dB. A dictionary of 50 recognition and
`noise-rejection words [11] was used for speech recognition. For
`signal input, a front-side and a rear-side microphone among the
`eight around the neck of PaPeRo-mini were used as the primary and
`reference microphones, respectively.
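`A two-microphone ANC of this kind can be sketched with a normalized LMS (NLMS) filter. The reference (rear) microphone signal is filtered to estimate the noise in the primary (front) microphone, and the difference is the output. This is a generic fixed-step sketch. The canceller evaluated here uses variable step sizes [4], which the code does not reproduce. The tap count, step size, and function name are illustrative assumptions.

```python
def nlms_anc(primary, reference, taps=32, mu=0.5, eps=1e-6):
    """Fixed-step NLMS adaptive noise canceller (illustrative sketch).

    primary:   target speech plus noise (front microphone)
    reference: mostly noise (rear microphone)
    Returns the error signal, which is the noise-cancelled output.
    """
    w = [0.0] * taps  # adaptive FIR weights
    x = [0.0] * taps  # reference history, newest sample first
    out = []
    for d, r in zip(primary, reference):
        x = [r] + x[:-1]
        y = sum(wk * xk for wk, xk in zip(w, x))  # estimate of the noise in `primary`
        e = d - y                                  # canceller output
        # Normalizing by the reference power keeps the step size scale-independent.
        step = mu * e / (eps + sum(xk * xk for xk in x))
        w = [wk + step * xk for wk, xk in zip(w, x)]
        out.append(e)
    return out
```

`In practice, the reference microphone also picks up some target speech, which a plain NLMS canceller partly removes along with the noise. Controlling the step size is a common way to limit that effect.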
`
`Figure 9 shows the speech recognition rate. For comparison, the
`average speech-recognition rate of PaPeRo [4] is also depicted in
`Figure 9. It was evaluated on 1500 utterances by 30 different male,
`female, and child speakers at distances of 0.5 m and 1.5 m and a
`level of 70 dB. The noise level was set at 55-60 and 65-70 dB. The
`recognition rate with the ANC is improved by more than 40%, and the
`maximum improvement reaches 54% with noise arriving from behind the
`robot. When there is no noise, the recognition rate of PaPeRo-mini
`is more than 15% lower than that of PaPeRo. This difference comes
`from the microphones: PaPeRo uses a directional microphone, while
`PaPeRo-mini uses an omnidirectional one. However, owing to the ANC,
`the recognition rates in noisy environments are almost comparable.
`
`Fig. 9. Speech Recognition Result (ANC).
`
`
`
`3.4. MA (microphone array) [5]
`
`In the case of the MA, the evaluation conditions were the same as
`those for the ANC except for the noise directions. The noise source
`was placed in a direction of 30, 60, or 90 degrees. For the MA,
`four microphones arranged linearly with 0.02 m spacing were
`mounted on the front side of PaPeRo-mini. Figure 10 depicts the
`speech recognition rate in comparison with the average rate of
`PaPeRo under the same conditions as those for PaPeRo-mini. Owing
`to the MA, the recognition rate is improved by more than 20%, and
`the maximum improvement reaches 40% with noise arriving from the
`front of the robot. The recognition rate of PaPeRo-mini with the MA
`is comparable to that of PaPeRo.
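`The array geometry above (four microphones in a line, 0.02 m apart) fixes the steering delays. The sketch below is a fixed delay-and-sum beamformer, a simpler technique than the adaptive beamformer [5] evaluated here. It makes four assumptions: a far-field source, c = 343 m/s, angles measured from broadside, and linear interpolation for fractional delays. The function names are hypothetical. The sketch also shows why fractional delays are needed: even at endfire, the outermost microphone lags the first by only about 1.93 samples at 11025 Hz.

```python
import math


def steering_delays(num_mics=4, spacing=0.02, angle_deg=0.0, fs=11025, c=343.0):
    """Arrival delay of each microphone relative to microphone 0, in samples,
    for a far-field source at angle_deg from broadside. Delays are positive
    for angle_deg > 0, i.e. the wavefront reaches microphone 0 first."""
    s = math.sin(math.radians(angle_deg))
    return [m * spacing * s / c * fs for m in range(num_mics)]


def frac_delay(x, delay):
    """Delay x by a non-negative, possibly fractional, number of samples
    using linear interpolation; samples before the start are taken as zero."""
    out = []
    for n in range(len(x)):
        t = n - delay
        i = math.floor(t)
        frac = t - i
        a = x[i] if 0 <= i < len(x) else 0.0
        b = x[i + 1] if 0 <= i + 1 < len(x) else 0.0
        out.append((1.0 - frac) * a + frac * b)
    return out


def delay_and_sum(channels, delays):
    """Time-align the channels on the latest-arriving one, then average them."""
    latest = max(delays)
    aligned = [frac_delay(ch, latest - d) for ch, d in zip(channels, delays)]
    return [sum(samples) / len(aligned) for samples in zip(*aligned)]
```

`With only 0.06 m of total aperture, a fixed delay-and-sum beam is very broad at low frequencies. That is one reason such small arrays are usually paired with adaptive rather than fixed beamforming.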
`
`Fig. 10. Speech Recognition Result (MA).
`
`
`
`3.5. AEC (acoustic echo canceller) [6]
`
`Speech recognition was performed on speech echo-cancelled by the
`AEC. The evaluation conditions were the same as those for the ANC
`and the MA except that there was no noise source. A music source
`was presented as an echo at 60-65 dB from a loudspeaker mounted on
`the bottom of PaPeRo-mini. The same microphone as the primary
`microphone for the ANC was used to capture the echo and the target
`speech. Figure 11 depicts the speech recognition rate in comparison
`with the PaPeRo data in the same environment. The echo level for
`PaPeRo was 55-60 and 65-70 dB. Owing to the AEC, the recognition
`rate is improved by 30% in the presence of sound from the robot's
`loudspeaker. The speech recognition rate of PaPeRo-mini was
`equivalent to that of PaPeRo.
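`An acoustic echo canceller differs from a noise canceller mainly in its reference. The signal driving the robot's own loudspeaker is known exactly, so the adaptive filter models the loudspeaker-to-microphone echo path. The sketch below pairs a fixed-step NLMS canceller with echo return loss enhancement (ERLE), a standard AEC figure of merit that the paper does not report. The tap count, step size, and function names are assumptions. The AEC of [6] is described as noise-robust, which this sketch does not attempt.

```python
import math


def aec_nlms(mic, loudspeaker, taps=64, mu=0.5, eps=1e-6):
    """Cancel the loudspeaker echo in `mic` with a fixed-step NLMS filter.
    Returns the residual (echo-cancelled) signal."""
    w = [0.0] * taps  # estimated echo-path impulse response
    x = [0.0] * taps  # loudspeaker history, newest sample first
    residual = []
    for d, r in zip(mic, loudspeaker):
        x = [r] + x[:-1]
        echo_est = sum(wk * xk for wk, xk in zip(w, x))
        e = d - echo_est
        step = mu * e / (eps + sum(xk * xk for xk in x))
        w = [wk + step * xk for wk, xk in zip(w, x)]
        residual.append(e)
    return residual


def erle_db(mic, residual):
    """Echo return loss enhancement in dB, meaningful for echo-only segments."""
    p_in = sum(v * v for v in mic)
    p_out = max(sum(v * v for v in residual), 1e-20)  # avoid log of zero
    return 10.0 * math.log10(p_in / p_out)
```

`ERLE is only meaningful while the talker is silent. During double talk, the residual also contains the target speech, which the canceller should preserve.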
`
`Fig. 11. Speech Recognition Result (AEC).
`
`
`
`SECTION 4.
`CONCLUSION
`
`A single-chip speech dialogue module and its evaluation on a
`personal robot have been presented. This module has been
`implemented on a single-chip application processor to provide a
`compact size, low power consumption, and portability. It has
`been equipped with direction-of-arrival (DOA) estimation,
`adaptive noise cancellation (ANC), microphone-array (MA)
`beamforming, and acoustic echo cancellation (AEC) for speech
`recognition in noisy environments. Evaluation results obtained on
`PaPeRo-mini in a real environment have demonstrated an 85%
`correct rate in DOA estimation, and as much as 54% and 30%
`higher speech recognition rates in noisy environments and
`during robot utterances, respectively.
`
`ACKNOWLEDGMENT
`
`This development was supported in part by a common platform
`development project for next-generation robots of NEDO (New
`Energy and Industrial Technology Development Organization).
`
`A not-for-profit organization, IEEE is the world's largest technical professional organization dedicated to advancing technology for the benefit of humanity.
`
`© Copyright 2021 IEEE - All rights reserved.
`978-1-4244-2354-5/09/$25.00 ©2009 IEEE
`
`3697
`
`ICASSP 2009
`
`Authorized licensed use limited to: Everything Demo User. Downloaded on December 29,2021 at 15:00:21 UTC from IEEE Xplore. Restrictions apply.
`
`A SINGLE-CHIP SPEECH DIALOGUE MODULE
`AND ITS EVALUATION ON A PERSONAL ROBOT, PAPERO-MINI
`
`Miki Sato, Toru Iwasawa, Akihiko Sugiyama, Toshihiro Nishizawa, Yosuke Takano
`
`NEC Common Platform Software Research Laboratories
`Kawasaki 211-8666, JAPAN
`
`ABSTRACT
`
`
`This paper presents a single-chip speech dialogue module and its
`evaluation on a personal robot. This module is implemented on
`an application processor that was developed primarily for mobile
`phones to provide a compact size, low power consumption, and low
`cost. It performs speech recognition with preprocessing functions
`such as direction-of-arrival (DOA) estimation, noise cancellation,
`beamforming with an array of microphones, and echo cancellation.
`The module is also equipped with text-to-speech (TTS) conversion.
`Evaluation results obtained on a new personal robot, PaPeRo-mini,
`which is a scaled-down version of PaPeRo, demonstrate an 85% correct
`rate in DOA estimation, and as much as 54% and 30% higher speech
`recognition rates in noisy environments and during robot utterances,
`respectively. These results are shown to be comparable to those
`obtained by PaPeRo.
`
`Index Terms— speech recognition, DOA estimation, noise cancellation,
`microphone array, echo cancellation, speech dialogue module
`
`1. INTRODUCTION
`
`Speech dialogue systems have been receiving particular attention
`as a user interface for a wide variety of interactive applications, such
`as robots and car navigation systems. These applications are generally
`controlled by voice commands from a distance. A given command is
`processed by a speech recognition system to generate a corresponding
`operation. It is also necessary to transform text information into an
`audible form by using a text-to-speech (TTS) conversion system.
`However, it is still challenging to perform off-microphone speech
`recognition, where the microphone is placed at a distance from the
`talker [1]. In noisy environments, the target signal is seriously
`interfered with by other signals and by ambient noise. Therefore,
`noise robustness is essential to speech recognition systems in the
`real environment.
`To reduce the undesirable influence of ambient noise and interference,
`signal-processing functions have been used to preprocess the noisy
`speech. Among these functions are estimation of the direction of
`arrival (DOA) [2, 3], noise cancellation [4], beamforming with a
`microphone array [5], and echo cancellation [6]. DOA estimation
`identifies the direction of the voice command so that the microphone
`directivity can be steered towards the speech source. An adaptive
`noise canceller (ANC) and a microphone array (MA) reduce undesirable
`influences that cannot be sufficiently offset by the directional
`microphone. An acoustic echo canceller (AEC) suppresses the echo,
`that is, the part of the robot's own speech that leaks into the
`microphone signal and contaminates the voice command.
`In robot applications, these functions are generally implemented
`by software on a platform based on a personal computer (PC) [7].
`It is sometimes necessary to share computational power with other
`applications on the same platform. Considering that a larger number
`of complex applications are required on a robot, it is desirable
`to have a speech dialogue module on a separate platform so that the
`PC-based platform can be fully devoted to more complex and
`computationally intense applications on the robot. On such a separate
`module, the performance of the speech dialogue functions becomes
`more stable and can be guaranteed. In addition, a compact speech
`dialogue module, being portable, helps promote human-robot
`interaction on more robots that otherwise would not have such an
`interface.
`
`(Fig. 1 diagram blocks: Microphones, Multichannel Audio Input, DOA Estimation, Noise Reduction (ANC, MA, AEC), Speech Recognition, TTS Conversion, Speech Synthesis, Loudspeaker, RT Component Translator, Network I/F, Other Applications.)
`
`Fig. 1. Block diagram of Speech Dialogue Module.
`
`This paper presents a compact speech dialogue module and its
`evaluation on a personal robot. This module offers dialogue functions
`similar to those of the personal robot PaPeRo [8] on an application
`processor that was developed primarily for mobile phones to provide a
`compact size, low power consumption, and low cost. Section 2 describes
`the functions of the speech dialogue module and the hardware used to
`implement them. Section 3 presents evaluation results of a near-field
`DOA estimator, a noise-robust ANC with variable step sizes, an adaptive
`beamformer for the MA, and a noise-robust AEC in the real environment.
`
`2. SPEECH DIALOGUE MODULE
`
`2.1. Speech Dialogue Functions
`
`A block diagram of the speech dialogue module is illustrated in Fig. 1.
`This module consists of speech recognition (word recognition), DOA
`estimation, noise reduction, and TTS conversion as speech dialogue
`functions. Noise reduction has three subfunctions, namely, an adaptive
`noise canceller (ANC), a microphone-array (MA) beamformer, and an
`acoustic echo canceller (AEC). These subfunctions operate separately,
`and the desired output is selected manually. These functions
`
`3698
`
`Fig. 2. Speech Dialogue Module.
`
`(Fig. 3 diagram: ARM1 runs the RT Component Translator, Speech Recognition, TTS Conversion, DOA Est. Control, and Noise Reduction Control; the DSP runs the DOA Est., ANC, MA, and AEC cores; Inputs #1 to #20 are held in a Multi-ring Buffer in Internal Memory; a Shared Memory holds Inputs 1 to 4 and the DOA result for DOA estimation, and Inputs 1 to 4 and the Noise-Cancelled Signal for noise reduction.)
`
`Fig. 3. Task distribution between ARM and DSP cores.
`
`(Fig. 4 pie chart: AEC 142, MA 82, ANC 78, DOA 46; TOTAL 348 MIPS.)
`
`Fig. 4. Computational load for each function.
`
`Table 1. Specifications of the Speech Dialogue Module
`
`Item | Specification
`CPU | ARM9 (192 MHz) × 3; DSP (SPXK6, 192 MHz) × 1
`Memory | 128 MB (+128 MB w/ extended board); Flash 64 MB
`Audio Interface | microphone inputs 2ch × 2 (+16ch w/ audio board); speaker output 2ch
`Image Interface | camera input × 2; video output, LCD output
`Other Interface | USB, LAN, IrDA, GPIO; CF Card (w/ extended board)
`Size | 55 mm × 100 mm × 32 mm (w/ audio and extended boards)
`
`work as RT (Robot Technology) components for RT middleware [9],
`and can be controlled by other network-connected applications.
`
`2.2. Hardware
`
`Figure 2 shows a picture of the speech dialogue module, whose
`specifications are listed in Table 1. An application processor, MP211
`[10], primarily designed for mobile phones, is employed to provide
`sufficient processing power. It consists of one DSP and three ARM9
`cores and runs a Linux operating system. For audio input, this module
`is equipped with synchronous microphone inputs on an extended board
`that are extensible to 16 channels, as well as 2 types of 2-channel
`synchronous microphone inputs on the main board. In addition, the
`module has peripheral interfaces such as 2-channel loudspeaker
`outputs, 2-channel camera inputs, an LCD output, and USB and LAN
`interfaces. A CompactFlash (CF) memory card can be used on the
`extended board.
`
`2.3. Implementation of the functions
`
`The functions of the module were distributed between one ARM9 core and
`the DSP core, both running at 192 MHz. The task distribution between
`the ARM9 and the DSP is depicted in Fig. 3. Speech recognition, TTS
`conversion, and the RT component translator operate on the ARM9. DOA
`estimation and noise reduction are decomposed into core-function
`blocks and control blocks. The noise-reduction core consists of three
`sub-cores, namely, the ANC core, the MA core, and the AEC core. The
`control blocks operate on the ARM core and the core-function blocks on
`the DSP core. The input signals are converted into digital form at a
`rate of 11025 Hz and stored in multi-ring buffers in the internal
`memory of the ARM. The computational load for each noise-reduction
`function is compared in Fig. 4.
`
`Fig. 5. PaPeRo-mini (Left) and PaPeRo (Right).
`
`3. EVALUATION
`
`3.1. Platform: PaPeRo-mini
`
`The speech dialogue module was installed in PaPeRo-mini, which is a
`scaled-down version of PaPeRo, a partner robot based on a Windows PC.
`Figure 5 depicts PaPeRo and PaPeRo-mini, whose specifications are
`compared in Table 2. PaPeRo-mini is an autonomous mobile robot with a
`size of 250 × 170 × 179 mm (HWD) and a weight of 2.5 kg. Eight
`equispaced omnidirectional microphones are mounted around the neck,
`and 2-channel loudspeakers are mounted at the bottom. It also has CCD
`cameras, ultrasonic sensors, infrared sensors, touch sensors, a
`pyroelectric sensor, and an LCD.
`
`3.2. DOA (direction of arrival) Estimation [3]
`
`Figure 6 (a) depicts the evaluation environment in a room with a
`background noise level of 40 dBA. One sentence, spoken by 5 different
`male and female speakers, was presented 10 times at 75 dB from a
`loudspeaker at a height of 1.0 m. PaPeRo-mini was placed 1.5 m away
`and rotated in 30-degree steps to produce 12 different DOAs. The
`microphone arrangements of PaPeRo-mini and PaPeRo are illustrated in
`Fig. 7.
`
`
`3699
`
`(Fig. 7 diagram: top-view microphone layouts with the front marked; labelled distances of 12, 13, 14, and 21 cm.)
`
`Fig. 7. Microphone Arrangement (Top View, Distance in cm). (a)
`PaPeRo-mini, (b) PaPeRo
`
`(Fig. 8 plot: detection and correct-answer rates [%] versus DOA [deg], from 0 to 330 degrees in 30-degree steps plus the average, with PaPeRo shown for comparison.)
`
`Fig. 8. DOA Estimation Result.
`
`ison, the average speech-recognition rate of PaPeRo [4] is depicted in
`Figure 9. It was evaluated on 1500 utterances by 30 different male,
`female, and child speakers at distances of 0.5 m and 1.5 m and a level
`of 70 dB. The noise level was set to 55-60 dB and 65-70 dB. With the
`ANC, the recognition rate improves by more than 40%, and the maximum
`improvement reaches 54% when noise arrives from behind the robot.
`Without noise, the recognition rate of PaPeRo-mini is more than 15%
`lower than that of PaPeRo. The difference comes from the microphones:
`PaPeRo uses a directional microphone, whereas PaPeRo-mini uses
`omnidirectional microphones. However, thanks to the ANC, the
`recognition rates of the two robots in noisy environments are almost
`comparable.
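The ANC algorithm is not spelled out in this section (see [4]). Assume, for illustration, that ANC denotes an adaptive noise canceller. Its textbook form subtracts an adaptively filtered noise reference from the primary microphone signal. The Python sketch below uses NLMS adaptation. It is a generic example, not the algorithm of [4]. The tap count, step size, the primary/reference roles, and the synthetic test signals are all hypothetical.

```python
import math
import random
from typing import List, Tuple


def nlms_noise_canceller(primary: List[float], reference: List[float],
                         taps: int = 16, mu: float = 0.5,
                         eps: float = 1e-8) -> Tuple[List[float], List[float]]:
    """Two-input adaptive noise canceller using an NLMS-updated FIR filter.

    primary   -- speech plus noise
    reference -- noise-dominated input correlated with the noise in primary
    Returns (output, weights); output = primary minus the noise estimate.
    """
    if len(primary) != len(reference):
        raise ValueError("inputs must have the same length")
    w = [0.0] * taps
    x = [0.0] * taps  # recent reference samples, newest first
    out: List[float] = []
    for d, r in zip(primary, reference):
        x = [r] + x[:-1]                          # shift in the new sample
        y = sum(wi * xi for wi, xi in zip(w, x))  # filtered noise estimate
        e = d - y                                 # enhanced output sample
        norm = eps + sum(xi * xi for xi in x)     # input power normalisation
        w = [wi + mu * e * xi / norm for wi, xi in zip(w, x)]
        out.append(e)
    return out, w


def make_demo_signals(n: int = 4000, seed: int = 0):
    """Hypothetical test case: a weak tone plus noise that reaches the
    primary microphone 2 samples late and scaled by 0.8."""
    rng = random.Random(seed)
    noise = [rng.gauss(0.0, 1.0) for _ in range(n)]
    speech = [0.1 * math.sin(2 * math.pi * 0.01 * k) for k in range(n)]
    primary = [speech[k] + 0.8 * (noise[k - 2] if k >= 2 else 0.0)
               for k in range(n)]
    return speech, primary, noise
```

On the synthetic signals, the filter tap for a 2-sample delay converges close to 0.8. The residual noise in the second half of the output is far below the noise in the primary input. The normalised step size keeps adaptation stable for any 0 < mu < 2, regardless of reference signal level.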
`
`3.4. MA (microphone array) [5]
`
`For the MA, the evaluation conditions were the same as those for the
`ANC except for the noise directions: the noise source was placed at
`30, 60, or 90 degrees. Four microphones arranged linearly with 0.02 m
`spacing were mounted on the front of PaPeRo-mini for the MA. Figure 10
`compares the resulting speech-recognition rate with the average rate of
`PaPeRo under the same conditions. With the MA, the recognition rate
`improves by more than 20%, and the maximum improvement reaches 40% when
`noise arrives from the front of the robot. The recognition rate of
`PaPeRo-mini with the MA is comparable to that of PaPeRo.
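The array-processing method of [5] is not stated here. For intuition only, the Python sketch below evaluates the simplest fixed beamformer, delay-and-sum, for the stated geometry. That geometry is four microphones in a line with 0.02 m spacing, steered straight ahead. The choice of delay-and-sum, the 343 m/s speed of sound, and the function name are assumptions. The sketch does not reproduce the method of [5].

```python
import cmath
import math

SPEED_OF_SOUND_M_S = 343.0  # assumed value
NUM_MICS = 4                # from the text
SPACING_M = 0.02            # from the text


def das_response(freq_hz: float, noise_deg: float, steer_deg: float = 0.0,
                 num_mics: int = NUM_MICS, spacing_m: float = SPACING_M,
                 c: float = SPEED_OF_SOUND_M_S) -> float:
    """Magnitude response of a delay-and-sum beamformer (1.0 = unity gain).

    Angles are measured from broadside (0 deg = straight ahead of the
    linear array). The beamformer is steered to steer_deg and evaluated
    for a plane wave arriving from noise_deg.
    """
    k = 2.0 * math.pi * freq_hz / c  # wavenumber
    total = 0j
    for m in range(num_mics):
        pos = m * spacing_m
        # Arrival phase at this microphone minus the steering compensation.
        phase = k * pos * (math.sin(math.radians(noise_deg))
                           - math.sin(math.radians(steer_deg)))
        total += cmath.exp(1j * phase)
    return abs(total) / num_mics
```

Under these assumptions, noise from 90 degrees keeps about 0.92 of its amplitude at 1 kHz but falls below 0.1 at 4 kHz. With only a 0.06 m aperture, a fixed beamformer of this kind suppresses low-frequency noise very little.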
`
`3.5. AEC (acoustic echo canceller) [6]
`
`Speech recognition was performed on speech echo-cancelled by the AEC.
`The evaluation conditions were the same as those for the ANC
`
`Table 2. Specifications of PaPeRo-mini and PaPeRo
`
`                PaPeRo                        PaPeRo-mini
`CPU             Pentium-M 1.6 GHz             MP211
`OS              Windows XP                    Linux
`Audio Input     Omnidirectional Mic x7,       Omnidirectional Mic x8
`                Directional Mic x1
`Audio Output    Stereo Loudspeakers,          Stereo Loudspeakers,
`                Line Output x2                Line Output x2
`Image Input     Stereo CCD Camera             Stereo CCD Camera
`Image Output    Composite Video               LCD
`Other I/F       IrDA, USB
`Battery         Li-ion 74Wh, Operating Time 8h
`Size
`Weight
