(12) INTERNATIONAL APPLICATION PUBLISHED UNDER THE PATENT COOPERATION TREATY (PCT)

(19) World Intellectual Property Organization, International Bureau

(10) International Publication Number: WO 2017/105998 A1
(43) International Publication Date: 22 June 2017 (22.06.2017)

(51) International Patent Classification: G10L 21/0208 (2013.01), G10L 21/0216 (2013.01)
(21) International Application Number: PCT/US2016/065563
(22) International Filing Date: 8 December 2016 (08.12.2016)
(25) Filing Language: English
(26) Publication Language: English
(30) Priority Data: 14/973,274, 17 December 2015 (17.12.2015), US
(71) Applicant: AMAZON TECHNOLOGIES, INC. [US/US]; PO Box 81226, Seattle, Washington 98108-1226 (US).
(72) Inventors: AYRAPETIAN, Robert; 410 Terry Avenue North, Seattle, Washington 98109-5210 (US). HILMES, Philip Ryan; 410 Terry Avenue North, Seattle, Washington 98109-5210 (US).
(74) Agent: BARZILAY, Dan; 2 Seaport Lane, Suite 300, Boston, Massachusetts 02210-2028 (US).
(81) Designated States (unless otherwise indicated, for every kind of national protection available): AE, AG, AL, AM, AO, AT, AU, AZ, BA, BB, BG, BH, BN, BR, BW, BY, BZ, CA, CH, CL, CN, CO, CR, CU, CZ, DE, DJ, DK, DM, DO, DZ, EC, EE, EG, ES, FI, GB, GD, GE, GH, GM, GT, HN, HR, HU, ID, IL, IN, IR, IS, JP, KE, KG, KH, KN, KP, KR, KW, KZ, LA, LC, LK, LR, LS, LU, LY, MA, MD, ME, MG, MK, MN, MW, MX, MY, MZ, NA, NG, NI, NO, NZ, OM, PA, PE, PG, PH, PL, PT, QA, RO, RS, RU, RW, SA, SC, SD, SE, SG, SK, SL, SM, ST, SV, SY, TH, TJ, TM, TN, TR, TT, TZ, UA, UG, US, UZ, VC, VN, ZA, ZM, ZW.
(84) Designated States (unless otherwise indicated, for every kind of regional protection available): ARIPO (BW, GH, GM, KE, LR, LS, MW, MZ, NA, RW, SD, SL, ST, SZ, TZ, UG, ZM, ZW), Eurasian (AM, AZ, BY, KG, KZ, RU, TJ, TM), European (AL, AT, BE, BG, CH, CY, CZ, DE, DK, EE, ES, FI, FR, GB, GR, HR, HU, IE, IS, IT, LT, LU, LV, MC, MK, MT, NL, NO, PL, PT, RO, RS, SE, SI, SK, SM, TR), OAPI (BF, BJ, CF, CG, CI, CM, GA, GN, GQ, GW, KM, ML, MR, NE, SN, TD, TG).

Published: with international search report (Art. 21(3))

(54) Title: ADAPTIVE BEAMFORMING TO CREATE REFERENCE CHANNELS
[FIG. 1 (cover-page drawing): block diagram of system 100 showing device 102 with a microphone array, wireless loudspeakers fed over RF links 113, an adaptive beamformer 104 comprising a fixed beamformer (FBF) 105 and a multiple input canceler (MC) 106 producing a target signal and a reference signal, acoustic echo cancellation (AEC) 108, and audio output 126, together with flowchart steps 130-142 (receive audio input, perform audio beamforming, output audio data).]
(57) Abstract: An echo cancellation system that performs audio beamforming to separate audio input into multiple directions and determines a target signal and a reference signal from the multiple directions. For example, the system may detect a strong signal associated with a speaker and select the strong signal as a reference signal, selecting another direction as a target signal. The system may determine a speech position and may select the speech position as a target signal and an opposite direction as a reference signal. The system may create pairwise combinations of opposite directions, with an individual direction being selected as a target signal and a reference signal. The system may select a fixed beamformer output for the target signal and an adaptive beamformer output for the reference signal, or vice versa. The system may remove the reference signal (e.g., audio output by the loudspeaker) to isolate speech included in the target signal.
ADAPTIVE BEAMFORMING TO CREATE REFERENCE CHANNELS

CROSS-REFERENCE TO RELATED APPLICATION DATA

This application claims priority to U.S. Patent Application No. 14/973,274, filed on December 17, 2015, which is incorporated herein by reference in its entirety.

BACKGROUND
In audio systems, automatic echo cancellation (AEC) refers to techniques that are used to recognize when a system has recaptured sound via a microphone, after some delay, that the system previously output via a speaker. Systems that provide AEC subtract a delayed version of the original audio signal from the captured audio, producing a version of the captured audio that ideally eliminates the “echo” of the original audio signal, leaving only new audio information. For example, if someone were singing karaoke into a microphone while prerecorded music is output by a loudspeaker, AEC can be used to remove any of the recorded music from the audio captured by the microphone, allowing the singer’s voice to be amplified and output without also reproducing a delayed “echo” of the original music. As another example, a media player that accepts voice commands via a microphone can use AEC to remove reproduced sounds corresponding to output media that are captured by the microphone, making it easier to process input voice commands.
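The subtraction described above is typically implemented with an adaptive filter rather than a fixed delay. The following is a minimal sketch of one common approach, a normalized least-mean-squares (NLMS) echo canceller, in Python; the function name, filter length, and step size are illustrative assumptions, not anything specified by this document.

```python
import numpy as np

def nlms_echo_cancel(mic, ref, filter_len=256, mu=0.5, eps=1e-8):
    """Remove an estimate of the loudspeaker echo from the microphone
    capture with a normalized least-mean-squares (NLMS) adaptive filter.

    mic -- samples captured by the microphone (speech + echo + noise)
    ref -- the signal that was sent to the loudspeaker (same length)
    Returns the residual, i.e. the capture with the estimated echo removed.
    """
    w = np.zeros(filter_len)               # adaptive estimate of the echo path
    out = np.copy(mic)
    for n in range(filter_len - 1, len(mic)):
        x = ref[n - filter_len + 1:n + 1][::-1]   # most recent reference samples
        echo_est = w @ x                          # predicted echo at sample n
        e = mic[n] - echo_est                     # residual: speech + noise
        w += (mu / (x @ x + eps)) * e * x         # NLMS weight update
        out[n] = e
    return out
```

The sketch assumes the reference is a faithful copy of what the loudspeaker actually played; the Detailed Description below explains why that assumption breaks down for wireless loudspeakers.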
BRIEF DESCRIPTION OF DRAWINGS

For a more complete understanding of the present disclosure, reference is now made to the following description taken in conjunction with the accompanying drawings.

FIG. 1 illustrates an echo cancellation system that performs adaptive beamforming according to embodiments of the present disclosure.

FIG. 2 is an illustration of beamforming according to embodiments of the present disclosure.

FIGS. 3A-3B illustrate examples of beamforming configurations according to embodiments of the present disclosure.

FIG. 4 illustrates an example of different techniques of adaptive beamforming according to embodiments of the present disclosure.

FIGS. 5A-5B illustrate examples of a first signal mapping using a first technique according to embodiments of the present disclosure.

FIGS. 6A-6C illustrate examples of signal mappings using the first technique according to embodiments of the present disclosure.

FIGS. 7A-7C illustrate examples of signal mappings using a second technique according to embodiments of the present disclosure.

FIGS. 8A-8B illustrate examples of signal mappings using a third technique according to embodiments of the present disclosure.

FIG. 9 is a flowchart conceptually illustrating an example method for determining a signal mapping according to embodiments of the present disclosure.

FIGS. 10A-10B illustrate an example of a signal mapping using a fourth technique according to embodiments of the present disclosure.

FIG. 11 is a flowchart conceptually illustrating an example method for determining a signal mapping according to embodiments of the present disclosure.

FIG. 12 is a block diagram conceptually illustrating example components of a system for echo cancellation according to embodiments of the present disclosure.
DETAILED DESCRIPTION

Typically, a conventional Acoustic Echo Cancellation (AEC) system may remove audio output by a loudspeaker from audio captured by the system’s microphone(s) by subtracting a delayed version of the originally transmitted audio. However, in stereo and multi-channel audio systems that include wireless or network-connected loudspeakers and/or microphones, a major cause of problems is when there are differences between the signal sent to a loudspeaker and the signal played at the loudspeaker. As the signal sent to the loudspeaker is not the same as the signal played at the loudspeaker, the signal sent to the loudspeaker is not a true reference signal for the AEC system. For example, when the AEC system attempts to remove the audio output by the loudspeaker from audio captured by the system’s microphone(s) by subtracting a delayed version of the originally transmitted audio, the audio captured by the microphone is subtly different than the audio that had been sent to the loudspeaker.

There may be a difference between the signal sent to the loudspeaker and the signal played at the loudspeaker for one or more reasons. A first cause is a difference in clock synchronization (e.g., clock offset) between loudspeakers and microphones. For example, in a wireless “surround sound” 5.1 system comprising six wireless loudspeakers that each receive an audio signal from a surround-sound receiver, the receiver and each loudspeaker has its own crystal oscillator which provides the respective component with an independent “clock” signal. Among other things, the clock signals are used for converting analog audio signals into digital audio signals (“A/D conversion”) and converting digital audio signals into analog audio signals (“D/A conversion”). Such conversions are commonplace in audio systems, such as when a surround-sound receiver performs A/D conversion prior to transmitting audio to a wireless loudspeaker, and when the loudspeaker performs D/A conversion on the received signal to recreate an analog signal. The loudspeaker produces audible sound by driving a “voice coil” with an amplified version of the analog signal.
A second cause is that the signal sent to the loudspeaker may be modified based on compression/decompression during wireless communication, resulting in a different signal being received by the loudspeaker than was sent to the loudspeaker. A third cause is non-linear post-processing performed on the received signal by the loudspeaker prior to playing the received signal. A fourth cause is buffering performed by the loudspeaker, which could create unknown latency, additional samples, fewer samples or the like that subtly change the signal played by the loudspeaker.
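Because of these effects, the lag between the signal sent to the loudspeaker and its echo in the capture is neither fixed nor known. As a rough illustration of why that matters, the sketch below estimates the lag by cross-correlation; clock offset or buffering makes any such one-shot estimate go stale, which is the problem motivating the beamforming approach described next. Names and the maximum-lag bound are illustrative assumptions.

```python
import numpy as np

def estimate_echo_delay(ref, mic, max_lag=4800):
    """Estimate the lag (in samples) between the signal sent to the
    loudspeaker and its echo in the microphone capture by locating the
    peak of the cross-correlation. With clock offset or buffering, this
    lag drifts over time, so a single estimate does not stay valid."""
    corr = np.correlate(mic, ref, mode="full")
    lags = np.arange(-(len(ref) - 1), len(mic))
    window = (lags >= 0) & (lags <= max_lag)   # echo arrives after playback
    return lags[window][np.argmax(np.abs(corr[window]))]
```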
To perform Acoustic Echo Cancellation (AEC) without knowing the signal played by the loudspeaker, devices, systems and methods may perform audio beamforming on a signal received by the microphones and may determine a reference signal and a target signal based on the audio beamforming. For example, the system may receive audio input and separate the audio input into multiple directions. The system may detect a strong signal associated with a speaker and may set the strong signal as a reference signal, selecting another direction as a target signal. In some examples, the system may determine a speech position (e.g., near end talk position) and may set the direction associated with the speech position as a target signal and an opposite direction as a reference signal. If the system cannot detect a strong signal or determine a speech position, the system may create pairwise combinations of opposite directions, with an individual direction being used as a target signal and a reference signal. The system may remove the reference signal (e.g., audio output by the loudspeaker) to isolate speech included in the target signal.
FIG. 1 illustrates a high-level conceptual block diagram of echo-cancellation aspects of an AEC system 100. As illustrated, an audio input 110 provides stereo audio “reference” signals x1(n) 112a and x2(n) 112b. The reference signal x1(n) 112a is transmitted via a radio frequency (RF) link 113 to a wireless loudspeaker 114a, and the reference signal x2(n) 112b is transmitted via an RF link 113 to a wireless loudspeaker 114b. Each speaker outputs the received audio, and portions of the output sounds are captured by a pair of microphones 118a and 118b as “echo” signals y1(n) 120a and y2(n) 120b, which contain some of the reproduced sounds from the reference signals x1(n) 112a and x2(n) 112b, in addition to any additional sounds (e.g., speech) picked up by the microphones 118.

To isolate the additional sounds from the reproduced sounds, the device 102 may include an adaptive beamformer 104 that may perform audio beamforming on the echo signals 120 to determine a target signal 122 and a reference signal 124. For example, the adaptive beamformer 104 may include a fixed beamformer (FBF) 105, a multiple input canceler (MC) 106 and/or a blocking matrix (BM) 107. The FBF 105 may be configured to form a beam in a specific direction so that a target signal is passed and all other signals are attenuated, enabling the adaptive beamformer 104 to select a particular direction. In contrast, the BM 107 may be configured to form a null in a specific direction so that the target signal is attenuated and all other signals are passed. The adaptive beamformer 104 may generate fixed beamforms (e.g., outputs of the FBF 105) or may generate adaptive beamforms using a Linearly Constrained Minimum Variance (LCMV) beamformer, a Minimum Variance Distortionless Response (MVDR) beamformer or other beamforming techniques. For example, the adaptive beamformer 104 may receive audio input, determine six beamforming directions and output six fixed beamform outputs and six adaptive beamform outputs. In some examples, the adaptive beamformer 104 may generate six fixed beamform outputs, six LCMV beamform outputs and six MVDR beamform outputs, although the disclosure is not limited thereto. Using the adaptive beamformer 104 and techniques discussed below, the device 102 may determine the target signal 122 and the reference signal 124 to pass to an acoustic echo cancellation (AEC) 108. The AEC 108 may remove the reference signal (e.g., reproduced sounds) from the target signal (e.g., reproduced sounds and additional sounds) to remove the reproduced sounds and isolate the additional sounds (e.g., speech) as audio output 126.
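A minimal sketch of the complementary FBF/BM idea described above, using crude integer-sample alignment toward a look direction: averaging the aligned channels passes the look direction, while pairwise differences cancel it. This is a generic generalized-sidelobe-canceller-style construction under assumed geometry, not the patent’s specific implementation.

```python
import numpy as np

def align(frame, delays):
    """Crudely steer toward a look direction by advancing each channel
    by its integer-sample delay (wrap-around from np.roll is ignored
    for brevity). frame: (num_mics, num_samples) array."""
    return np.stack([np.roll(ch, -d) for ch, d in zip(frame, delays)])

def fbf_output(aligned):
    # Fixed beamformer: average the aligned channels, so the look
    # direction adds coherently while other directions partially cancel.
    return aligned.mean(axis=0)

def bm_outputs(aligned):
    # Blocking matrix: adjacent-channel differences cancel the aligned
    # look-direction signal, passing everything else (the "null").
    return aligned[1:] - aligned[:-1]
```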
To illustrate, in some examples the device 102 may use outputs of the FBF 105 as the target signal 122. For example, the outputs of the FBF 105 may be shown in equation (1):

Target = s + z + noise        (1)

where s is speech (e.g., the additional sounds), z is an echo from the signal sent to the loudspeaker (e.g., the reproduced sounds) and noise is additional noise that is not associated with the speech or the echo. In order to attenuate the echo (z), the device 102 may use outputs of the BM 107 as the reference signal 124, which may be shown in equation (2):

Reference = z + noise        (2)

By removing the reference signal 124 from the target signal 122, the device 102 may remove the echo and generate the audio output 126 including only the speech and some noise. The device 102 may use the audio output 126 to perform speech recognition processing on the speech to determine a command and may execute the command. For example, the device 102 may determine that the speech corresponds to a command to play music and the device 102 may play music in response to receiving the speech.
In some examples, the device 102 may associate specific directions with the reproduced sounds and/or speech based on features of the signal sent to the loudspeaker. Examples of features include power spectrum density, peak levels, pause intervals or the like that may be used to identify the signal sent to the loudspeaker and/or propagation delay between different signals. For example, the adaptive beamformer 104 may compare the signal sent to the loudspeaker with a signal associated with a first direction to determine if the signal associated with the first direction includes reproduced sounds from the loudspeaker. When the signal associated with the first direction matches the signal sent to the loudspeaker, the device 102 may associate the first direction with a wireless speaker. When the signal associated with the first direction does not match the signal sent to the loudspeaker, the device 102 may associate the first direction with speech, a speech position, a person or the like.
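One plausible way to realize this matching, sketched below, is magnitude-squared coherence between the sent signal and each beamformed direction. The patent names power spectrum density, peak levels and pause intervals as features, so the choice of coherence, the band limits and the threshold here are assumptions for illustration only.

```python
import numpy as np
from scipy.signal import coherence

def direction_matches_loudspeaker(sent, beam, fs=16000, threshold=0.7):
    """Compare the signal sent to the loudspeaker against one beamformed
    direction using magnitude-squared coherence (one plausible feature;
    the text also mentions power spectral density, peak levels and pause
    intervals). Returns True if the direction appears to contain the
    reproduced loudspeaker audio."""
    f, cxy = coherence(sent, beam, fs=fs, nperseg=1024)
    band = (f >= 1000) & (f <= 3000)   # band also used for localization later in the text
    return cxy[band].mean() > threshold
```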
As illustrated in FIG. 1, the device 102 may receive (130) an audio input and may perform (132) audio beamforming. For example, the device 102 may receive the audio input from the microphones 118 and may perform audio beamforming to separate the audio input into separate directions. The device 102 may determine (134) a speech position (e.g., near end talk position) associated with speech and/or a person speaking. For example, the device 102 may identify the speech, a person and/or a position associated with the speech/person using audio data (e.g., audio beamforming when speech is recognized), video data (e.g., facial recognition) and/or other inputs known to one of skill in the art. The device 102 may determine (136) a target signal and may determine (138) a reference signal based on the speech position and the audio beamforming. For example, the device 102 may associate the speech position with the target signal and may select an opposite direction as the reference signal.
The device 102 may determine the target signal and the reference signal using multiple techniques, which are discussed in greater detail below. For example, the device 102 may use a first technique when the device 102 detects a clearly defined speaker signal, a second technique when the device 102 doesn’t detect a clearly defined speaker signal but does identify a speech position and/or a third technique when the device 102 doesn’t detect a clearly defined speaker signal or a speech position. Using the first technique, the device 102 may associate the clearly defined speaker signal with the reference signal and may select any or all of the other directions as the target signal. For example, the device 102 may generate a single target signal using all of the remaining directions for a single loudspeaker or may generate multiple target signals using portions of remaining directions for multiple loudspeakers. Using the second technique, the device 102 may associate the speech position with the target signal and may select an opposite direction as the reference signal. Using the third technique, the device 102 may select multiple combinations of opposing directions to generate multiple target signals and multiple reference signals.
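The three techniques can be summarized as a selection rule mapping what the device detects to (target, reference) pairs. The sketch below is an illustrative restatement of that logic; the function and variable names are hypothetical, and the opposite-direction computation assumes an even number of evenly spaced sections (as in the six- and eight-section configurations of FIGS. 3A-3B).

```python
def choose_signals(directions, speaker_dir=None, speech_dir=None):
    """Map detected conditions to (target, reference) pairs.
    directions: list of direction indices, e.g. [1, 2, 3, 4, 5, 6]."""
    n = len(directions)
    opposite = lambda d: directions[(directions.index(d) + n // 2) % n]
    if speaker_dir is not None:
        # First technique: the clearly defined loudspeaker direction is
        # the reference; all remaining directions form the target.
        targets = [d for d in directions if d != speaker_dir]
        return [(targets, speaker_dir)]
    if speech_dir is not None:
        # Second technique: the speech position is the target and the
        # opposite direction is the reference.
        return [(speech_dir, opposite(speech_dir))]
    # Third technique: pairwise combinations of opposite directions,
    # each direction serving as a target with its opposite as reference.
    return [(d, opposite(d)) for d in directions]
```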
The device 102 may remove (140) an echo from the target signal by removing the reference signal to isolate speech or additional sounds and may output (142) audio data including the speech or additional sounds. For example, the device 102 may remove music (e.g., reproduced sounds) played over the loudspeakers 114 to isolate a voice command input to the microphones 118.
The device 102 may include a microphone array having multiple microphones 118 that are laterally spaced from each other so that they can be used by audio beamforming components to produce directional audio signals. The microphones 118 may, in some instances, be dispersed around a perimeter of the device 102 in order to apply beampatterns to audio signals based on sound captured by the microphone(s) 118. For example, the microphones 118 may be positioned at spaced intervals along a perimeter of the device 102, although the present disclosure is not limited thereto. In some examples, the microphone(s) 118 may be spaced on a substantially vertical surface of the device 102 and/or a top surface of the device 102. Each of the microphones 118 is omnidirectional, and beamforming technology is used to produce directional audio signals based on signals from the microphones 118. In other embodiments, the microphones may have directional audio reception, which may remove the need for subsequent beamforming.

In various embodiments, the microphone array may include greater or less than the number of microphones 118 shown. Speaker(s) (not illustrated) may be located at the bottom of the device 102, and may be configured to emit sound omnidirectionally, in a 360 degree pattern around the device 102. For example, the speaker(s) may comprise a round speaker element directed downwardly in the lower part of the device 102.
Using the plurality of microphones 118, the device 102 may employ beamforming techniques to isolate desired sounds for purposes of converting those sounds into audio signals for speech processing by the system. Beamforming is the process of applying a set of beamformer coefficients to audio signal data to create beampatterns, or effective directions of gain or attenuation. In some implementations, these volumes may be considered to result from constructive and destructive interference between signals from individual microphones in a microphone array.

The device 102 may include an adaptive beamformer 104 that may include one or more audio beamformers or beamforming components that are configured to generate an audio signal that is focused in a direction from which user speech has been detected. More specifically, the beamforming components may be responsive to spatially separated microphone elements of the microphone array to produce directional audio signals that emphasize sounds originating from different directions relative to the device 102, and to select and output one of the audio signals that is most likely to contain user speech.
Audio beamforming, also referred to as audio array processing, uses a microphone array having multiple microphones that are spaced from each other at known distances. Sound originating from a source is received by each of the microphones. However, because each microphone is potentially at a different distance from the sound source, a propagating sound wave arrives at each of the microphones at slightly different times. This difference in arrival time results in phase differences between audio signals produced by the microphones. The phase differences can be exploited to enhance sounds originating from chosen directions relative to the microphone array.
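Those arrival-time differences follow directly from the array geometry. The sketch below computes per-microphone plane-wave delays for a chosen azimuth; the 2-D geometry, sample rate and speed of sound are illustrative assumptions. For six microphones on, say, a 3 cm-radius circle, mic_positions would hold the six (x, y) coordinates, and the resulting integer delays could drive the alignment step in the earlier fixed-beamformer sketch.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s at room temperature

def steering_delays(mic_positions, azimuth_deg, fs=16000):
    """Compute per-microphone arrival-time differences (in samples) for
    a plane wave arriving from a given azimuth. mic_positions is an
    (num_mics, 2) array of (x, y) coordinates in meters."""
    theta = np.deg2rad(azimuth_deg)
    direction = np.array([np.cos(theta), np.sin(theta)])  # unit vector toward source
    # Projecting each mic position onto the arrival direction gives its
    # extra path length; dividing by c converts that to a time delay.
    delays_sec = mic_positions @ direction / SPEED_OF_SOUND
    delays_sec -= delays_sec.min()            # make all delays non-negative
    return np.round(delays_sec * fs).astype(int)
```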
Beamforming uses signal processing techniques to combine signals from the different microphones so that sound signals originating from a particular direction are emphasized while sound signals from other directions are deemphasized. More specifically, signals from the different microphones are combined in such a way that signals from a particular direction experience constructive interference, while signals from other directions experience destructive interference. The parameters used in beamforming may be varied to dynamically select different directions, even when using a fixed-configuration microphone array.

A given beampattern may be used to selectively gather signals from a particular spatial location where a signal source is present. The selected beampattern may be configured to provide gain or attenuation for the signal source. For example, the beampattern may be focused on a particular user’s head allowing for the recovery of the user’s speech while attenuating noise from an operating air conditioner that is across the room and in a different direction than the user relative to a device that captures the audio signals.

Such spatial selectivity by using beamforming allows for the rejection or attenuation of undesired signals outside of the beampattern. The increased selectivity of the beampattern improves the signal-to-noise ratio for the audio signal. By improving the signal-to-noise ratio, the accuracy of speaker recognition performed on the audio signal is improved.

The processed data from the beamformer module may then undergo additional filtering or be used directly by other modules. For example, a filter may be applied to processed data which is acquiring speech from a user to remove residual audio noise from a machine running in the environment.
FIG. 2 is an illustration of beamforming according to embodiments of the present disclosure. FIG. 2 illustrates a schematic of a beampattern 202 formed by applying beamforming coefficients to signal data acquired from a microphone array of the device 102. As mentioned above, the beampattern 202 results from the application of a set of beamformer coefficients to the signal data. The beampattern generates directions of effective gain or attenuation. In this illustration, the dashed line indicates isometric lines of gain provided by the beamforming coefficients. For example, the gain at the dashed line here may be +12 decibels (dB) relative to an isotropic microphone.
The beampattern 202 may exhibit a plurality of lobes, or regions of gain, with gain predominating in a particular direction designated the beampattern direction 204. A main lobe 206 is shown here extending along the beampattern direction 204. A main lobe beam-width 208 is shown, indicating a maximum width of the main lobe 206. In this example, the beampattern 202 also includes side lobes 210, 212, 214, and 216. Opposite the main lobe 206 along the beampattern direction 204 is the back lobe 218. Disposed around the beampattern 202 are null regions 220. These null regions are areas of attenuation to signals. In the example, the person 10 resides within the main lobe 206, benefits from the gain provided by the beampattern 202 and exhibits an improved signal-to-noise ratio (SNR) compared to a signal acquired without beamforming. In contrast, if the person 10 were to speak from a null region, the resulting audio signal may be significantly reduced. As shown in this illustration, the use of the beampattern provides for gain in signal acquisition compared to non-beamforming. Beamforming also allows for spatial selectivity, effectively allowing the system to “turn a deaf ear” on a signal which is not of interest. Beamforming may result in directional audio signal(s) that may then be processed by other components of the device 102 and/or system 100.
While beamforming alone may increase the signal-to-noise ratio (SNR) of an audio signal, combining known acoustic characteristics of an environment (e.g., a room impulse response (RIR)) and heuristic knowledge of previous beampattern lobe selection may provide an even better indication of a speaking user’s likely location within the environment. In some instances, a device includes multiple microphones that capture audio signals that include user speech. As is known and as used herein, “capturing” an audio signal includes a microphone transducing audio waves of captured sound to an electrical signal and a codec digitizing the signal. The device may also include functionality for applying different beampatterns to the captured audio signals, with each beampattern having multiple lobes. By identifying lobes most likely to contain user speech using the combination discussed above, the techniques enable devotion of additional processing resources to the portion of an audio signal most likely to contain user speech to provide better echo canceling and thus a cleaner SNR in the resulting processed audio signal.
To determine a value of an acoustic characteristic of an environment (e.g., an RIR of the environment), the device 102 may emit sounds at known frequencies (e.g., chirps, text-to-speech audio, music or spoken word content playback, etc.) to measure a reverberant signature of the environment to generate an RIR of the environment. Measured over time in an ongoing fashion, the device may be able to generate a consistent picture of the RIR and the reverberant qualities of the environment, thus better enabling the device to determine or approximate where it is located in relation to walls or corners of the environment (assuming the device is stationary). Further, if the device is moved, the device may be able to determine this change by noticing a change in the RIR pattern. In conjunction with this information, by tracking which lobe of a beampattern the device most often selects as having the strongest spoken signal path over time, the device may begin to notice patterns in which lobes are selected. If a certain set of lobes (or microphones) is selected, the device can heuristically determine the user’s typical speaking location in the environment. The device may devote more CPU resources to digital signal processing (DSP) techniques for that lobe or set of lobes. For example, the device may run acoustic echo cancelation (AEC) at full strength across the three most commonly targeted lobes, instead of picking a single lobe to run AEC at full strength. The techniques may thus improve subsequent automatic speech recognition (ASR) and/or speaker recognition results as long as the device is not rotated or moved. And, if the device is moved, the techniques may help the device to determine this change by comparing current RIR results to historical ones to recognize differences that are significant enough to cause the device to begin processing the signal coming from all lobes approximately equally, rather than focusing only on the most commonly targeted lobes.
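A common way to measure an RIR from a known excitation, consistent with the chirp example above, is frequency-domain deconvolution of the recorded response by the emitted signal. The sketch below shows that step; the regularization constant, output length and signal names are illustrative assumptions rather than the patent’s stated method.

```python
import numpy as np

def estimate_rir(emitted, recorded, rir_len=8000, eps=1e-6):
    """Estimate a room impulse response by deconvolving the recorded
    microphone signal with the emitted excitation (e.g., a chirp).
    Uses regularized (Wiener-style) division in the frequency domain."""
    n = len(emitted) + len(recorded)           # size for linear (not circular) deconvolution
    E = np.fft.rfft(emitted, n)
    R = np.fft.rfft(recorded, n)
    H = R * np.conj(E) / (np.abs(E) ** 2 + eps)   # regularized spectral division
    return np.fft.irfft(H, n)[:rir_len]
```

Comparing RIRs estimated this way over time is one way a device could notice the change described above when it is moved.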
By focusing processing resources on a portion of an audio signal most likely to include user speech, the SNR of that portion may be increased as compared to the SNR if processing resources were spread out equally across the entire audio signal. This higher SNR for the most pertinent portion of the audio signal may increase the efficacy of the device 102 when performing speaker recognition on the resulting audio signal.
Using the beamforming and directional based techniques above, the system may determine a direction of detected audio relative to the audio capture components. Such direction information may be used to link speech / a recognized speaker identity to video data as described below.
FIGS. 3A-3B illustrate examples of beamforming configurations according to embodiments of the present disclosure. As illustrated in FIG. 3A, the device 102 may perform beamforming to determine a plurality of portions or sections of audio received from a microphone array. FIG. 3A illustrates a beamforming configuration 310 including six portions or sections (e.g., Sections 1-6). For example, the device 102 may include six different microphones, may divide an area around the device 102 into six sections or the like. However, the present disclosure is not limited thereto and the number of microphones in the microphone array and/or the number of portions/sections in the beamforming may vary. As illustrated in FIG. 3B, the device 102 may generate a beamforming configuration 312 including eight portions/sections (e.g., Sections 1-8) without departing from the disclosure. For example, the device 102 may include eight different microphones, may divide the area around the device 102 into eight portions/sections or the like. Thus, the following examples may perform beamforming and separate an audio signal into eight different portions/sections, but these examples are intended as illustrative examples and the disclosure is not limited thereto.
The number of portions/sections generated using beamforming does not depend on the number of microphones in the microphone array. For example, the device 102 may include twelve microphones in the microphone array but may determine three portions, six portions or twelve portions of the audio data without departing from the disclosure. As discussed above, the adaptive beamformer 104 may generate fixed beamforms (e.g., outputs of the FBF 105) or may generate adaptive beamforms using a Linearly Constrained Minimum Variance (LCMV) beamformer, a Minimum Variance Distortionless Response (MVDR) beamformer or other beamforming techniques. For example, the adaptive beamformer 104 may receive the audio input, may determine six beamforming directions and output six fixed beamform outputs and six adaptive beamform outputs corresponding to the six beamforming directions. In some examples, the adaptive beamformer 104 may generate six fixed beamform outputs, six LCMV beamform outputs and six MVDR beamform outputs, although the disclosure is not limited thereto.
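For the adaptive outputs, the text names MVDR among other beamformers. As a reference point, MVDR chooses weights w minimizing the output power wᴴRw subject to the distortionless constraint wᴴd = 1, giving w = R⁻¹d / (dᴴR⁻¹d). A minimal sketch follows, with diagonal loading as an assumed stabilizer; it is a textbook formulation, not the patent’s specific implementation.

```python
import numpy as np

def mvdr_weights(R, d, diag_load=1e-3):
    """Minimum Variance Distortionless Response (MVDR) weights
    w = R^{-1} d / (d^H R^{-1} d).

    R -- (num_mics, num_mics) spatial covariance of the microphone signals
    d -- steering vector for the look direction (e.g., from the
         steering_delays sketch above, as complex phase factors)
    Diagonal loading keeps the inversion well conditioned."""
    m = R.shape[0]
    Rl = R + diag_load * (np.trace(R).real / m) * np.eye(m)
    Rinv_d = np.linalg.solve(Rl, d)
    return Rinv_d / (d.conj() @ Rinv_d)
```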
The device 102 may determine a number of wireless loudspeakers and/or directions associated with the wireless loudspeakers using the fixed beamform outputs. For example, the device 102 may localize energy in the frequency domain and clearly identify much higher energy in two directions associated with two wireless loudspeakers (e.g., a first direction associated with a first speaker and a second direction associated with a second speaker). In some examples, the device 102 may determine an existence and/or location associated with the wireless loudspeakers using a frequency range (e.g., 1 kHz to 3 kHz), although the disclosure is not limited thereto.
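A minimal sketch of the band-energy localization just described: compute each direction’s energy in the 1 kHz to 3 kHz range and flag directions whose energy stands well above the rest as candidate loudspeaker directions. The threshold ratio and array shapes are illustrative assumptions.

```python
import numpy as np

def loudspeaker_directions(beams, fs=16000, band=(1000, 3000), ratio=2.0):
    """Flag beamformed directions with unusually high energy in the
    given frequency band. beams: (num_directions, num_samples) array of
    fixed beamformer outputs; returns the indices of candidate
    loudspeaker directions."""
    spectra = np.abs(np.fft.rfft(beams, axis=1)) ** 2
    freqs = np.fft.rfftfreq(beams.shape[1], d=1.0 / fs)
    mask = (freqs >= band[0]) & (freqs <= band[1])
    energy = spectra[:, mask].sum(axis=1)          # band energy per direction
    return np.flatnonzero(energy > ratio * np.median(energy))
```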
