`
`(12) United States Patent
`Petit et al.
`
(10) Patent No.: US 8,321,213 B2
(45) Date of Patent: *Nov. 27, 2012
`
`(54) ACOUSTIC VOICE ACTIVITY DETECTION
`(AVAD) FOR ELECTRONIC SYSTEMS
`
(75) Inventors: Nicolas Petit, San Francisco, CA (US); Gregory Burnett, Dodge Center, MN (US); Zhinian Jing, San Francisco, CA (US)

(73) Assignee: AliphCom, Inc., San Francisco, CA (US)

(*) Notice: Subject to any disclaimer, the term of this patent is extended or adjusted under 35 U.S.C. 154(b) by 540 days. This patent is subject to a terminal disclaimer.
`(21) Appl. No.: 12/606,146
`
(22) Filed: Oct. 26, 2009
`
`(65)
`
`Prior Publication Data
`US 2010/O128894 A1
`May 27, 2010
`
`Related U.S. Application Data
(63) Continuation-in-part of application No. 12/139,333, filed on Jun. 13, 2008, and a continuation-in-part of application No. 11/805,987, filed on May 25, 2007, now abandoned.
(60) Provisional application No. 61/108,426, filed on Oct. 24, 2008.
`
(51) Int. Cl.
    G10L 11/06 (2006.01)
(52) U.S. Cl. ........ 704/208; 704/214
(58) Field of Classification Search ........ 704/208, 704/210, 214, 215; 381/99, 100, 46
`See application file for complete search history.
`
`(56)
`
`References Cited
`
`U.S. PATENT DOCUMENTS
5,459,814 A * 10/1995 Gupta et al. ........ 704/233
7,171,357 B2 * 1/2007 Boland ........ 704/231
7,246,058 B2 * 7/2007 Burnett ........ 704/226
7,464,029 B2 * 12/2008 Visser et al. ........ 704/210
8,019,091 B2 * 9/2011 Burnett et al. ........ 381/71.8
2009/0089053 A1 * 4/2009 Wang et al. ........ 704/233
`* cited by examiner
`Primary Examiner — Abul Azad
`(74) Attorney, Agent, or Firm — Kokka & Backus, PC
`(57)
`ABSTRACT
Acoustic Voice Activity Detection (AVAD) methods and systems are described. The AVAD methods and systems, including corresponding algorithms or programs, use microphones to generate virtual directional microphones which have very similar noise responses and very dissimilar speech responses. The ratio of the energies of the virtual microphones is then calculated over a given window size and the ratio can then be used with a variety of methods to generate a VAD signal. The virtual microphones can be constructed using either an adaptive or a fixed filter.
`
`42 Claims, 35 Drawing Sheets
`
`
`
[Representative drawing: FIG. 5, flow diagram of acoustic voice activity detection 500 —
502: Forming first virtual microphone by combining first signal of first physical microphone and second signal of second physical microphone.
504: Forming filter that describes relationship for speech between first physical microphone and second physical microphone.
506: Forming second virtual microphone by applying filter to first signal to generate first intermediate signal, and summing first intermediate signal and second signal.
508: Generating energy ratio of energies of first virtual microphone and second virtual microphone.
510: Detecting acoustic voice activity of speaker when energy ratio is greater than threshold value.]
`
`
`
[Drawing sheets 1-35 follow; the figure images are omitted here, and only the legible captions and labels are recovered.]

Sheet 1: FIG. 2.
Sheet 2: FIG. 3.
Sheet 3: FIG. 5, flow diagram of acoustic voice activity detection 500, steps 502-510 (as reproduced on the front page).
Sheet 4: FIG. 6 (x-axis: time (sec)).
Sheet 5: FIG. 7, "V1 (top) and V2 (bottom) for fixed beta, speech only" (x-axis: time (sec)).
Sheet 6: FIG. 8, "V1 (top) and V2 (bottom) for fixed beta, speech in noise" (x-axis: time (sec)).
Sheet 7: FIG. 9 (x-axis: time (sec)).
Sheet 8: FIG. 10, "V1 (top) and V2 (bottom) for adaptive beta, speech only" (x-axis: time (sec)).
Sheet 9: FIG. 11, "V1 (top) and V2 (bottom) for adaptive beta, speech in noise" (x-axis: time (sec)).
Sheet 10: FIG. 12 (microphones 1220 and voicing sensors feeding processor 1230, with detection subsystem 1250 and denoising subsystem 1240) and FIG. 13 (microphones feeding processor 1230, with detection subsystem 1250 and denoising subsystem 1240).
Sheet 11: FIG. 14 (denoising subsystem: noise n(n) removal yielding cleaned speech).
Sheet 12: FIG. 15, flow diagram 1250 of the voiced/unvoiced detection algorithm, with a legend of constants (V = 0 if noise, 1 if UV, 2 if V; VTC = voiced threshold for correlation; VTS = voiced threshold for standard deviation; forgetting factor; number of moving-average taps; UV standard-deviation thresholds; num_begin) and variables (bh1 = LMS calculation of the Mic 1-to-Mic 2 transfer function; keep_old = 1 if last window V/UV; sd_ma vector and its moving average), and a PSAD branch filtering m1 and m2 into two bands, 1500-2500 and 2500-3500 Hz.
Sheet 13: FIG. 16A ("Gems and Mean Correlation") and FIG. 16B ("Gems and Standard Deviation").
Sheet 14: FIG. 17 (voicing 1700; acoustic 1706).
Sheet 15: FIG. 18 (linear array midline).
Sheet 16: FIG. 19, "d1 versus delta M for delta d = 1, 2, 3, 4 cm" 1900 (x-axis: d1 (cm)).
Sheet 17: FIG. 20, "Acoustic data (solid) and gain parameter (dashed)" 2000 (x-axis: time (samples)).
Sheet 18: FIG. 21, Mic 1 and V for "pop pan" 2100: voicing signal, audio signal 2104, gems signal 2106, unvoiced level, not voiced (x-axis: time (samples)).
Sheet 19: FIG. 22, two-microphone adaptive noise suppression system 2200: signal s(n) and noise n(n) sources, voicing information, noise removal, cleaned speech.
Sheets 20-22: FIGS. 23-26 (per the brief description: DOMA array configurations and virtual-microphone block diagrams; labels illegible).
Sheet 23: FIG. 27 (headset or head-worn device, element 2702).
Sheet 24: FIG. 28, flow diagram 2800 for denoising acoustic signals: receive acoustic signals at a first physical microphone and a second physical microphone (2802); output first microphone signal from first physical microphone and second microphone signal from second physical microphone (2804); form first virtual microphone using the first combination of first microphone signal and second microphone signal (2806); form second virtual microphone using second combination of first microphone signal and second microphone signal (2808); generate denoised output signals having less acoustic noise than received acoustic signals (2810). FIG. 29, flow diagram for forming the DOMA: form physical microphone array including first physical microphone and second physical microphone (2902); form virtual microphone array including first virtual microphone and second virtual microphone using signals from physical microphone array (2904).
Sheet 25: FIGS. 30 and 31, linear response of V2 to a speech source at 0.10 meters and to a noise source at 1 meter (polar plots).
Sheet 26: FIGS. 32 and 33, linear response of V1 to a speech source at 0.10 meters and to a noise source at 1 meter (polar plots).
Sheet 27: FIG. 34, linear response of V1 to a speech source at 0.1 meters, frequencies up to 4000 Hz (polar plot).
Sheet 28: FIG. 35, "Frequency response at 0 degrees": cardioid speech response versus V1 speech response (x-axis: Frequency (Hz)).
Sheet 29: FIG. 36, "V1 (top, dashed) and V2 speech response vs. B assuming ds = 0.1m", and FIG. 37, "V1/V2 for speech versus B assuming ds = 0.1m".
Sheet 30: FIG. 38, "B factor vs. actual ds assuming ds = 0.1m and theta = 0" (x-axis: actual ds (meters), 0.05-0.5), and FIG. 39, "B versus theta assuming ds = 0.1m" (x-axis: theta (degrees), -80 to 80).
Sheet 31: FIG. 40 (amplitude and phase response; x-axis: Frequency (Hz), 0-8000).
Sheet 32: FIG. 41 (amplitude and phase response; x-axis: Frequency (Hz), 0-8000).
Sheet 33: FIG. 42, "Cancellation with d1 = 1, theta1 = 0, d2 = 1, and theta2 = 30" (x-axis: Frequency (Hz), 0-8000).
Sheet 34: FIG. 43, "Cancellation with d1 = 1, theta1 = 0, d2 = 1, and theta2 = 45" (x-axis: Frequency (Hz), 0-8000).
Sheet 35: FIG. 44, "Original V1 (top) and cleaned V1 (bottom) with simplified VAD (dashed) in noise" (x-axis: Time (samples at 8 kHz/sec)).
`
`
`
`ACOUSTIC VOICE ACTIVITY DETECTION
`(AVAD) FOR ELECTRONIC SYSTEMS
`
`RELATED APPLICATIONS
`
This application claims the benefit of U.S. Patent Application No. 61/108,426, filed Oct. 24, 2008.

This application is a continuation-in-part of U.S. patent application Ser. No. 11/805,987, filed May 25, 2007.

This application is a continuation-in-part of U.S. patent application Ser. No. 12/139,333, filed Jun. 13, 2008.
`
`TECHNICAL FIELD
`
The disclosure herein relates generally to noise suppression. In particular, this disclosure relates to noise suppression systems, devices, and methods for use in acoustic applications.
`
`10
`
`15
`
`BACKGROUND
`
`2
`FIG. 6 shows experimental results of the algorithm using a
`fixed beta when only noise is present, under an embodiment.
`FIG. 7 shows experimental results of the algorithm using a
`fixed beta when only speech is present, under an embodiment.
`FIG. 8 shows experimental results of the algorithm using a
`fixed beta when speech and noise is present, under an embodi
`ment.
`FIG. 9 shows experimental results of the algorithm using
`an adaptive beta when only noise is present, under an embodi
`ment.
`FIG. 10 shows experimental results of the algorithm using
`an adaptive beta when only speech is present, under an
`embodiment.
`FIG. 11 shows experimental results of the algorithm using
`an adaptive beta when speech and noise is present, under an
`embodiment.
`FIG. 12 is a block diagram of a NAVSAD system, under an
`embodiment
`FIG. 13 is a block diagram of a PSAD system, under an
`embodiment.
`FIG. 14 is a block diagram of a denoising Subsystem,
`referred to herein as the Pathfinder system, under an embodi
`ment.
`FIG. 15 is a flow diagram of a detection algorithm for use
`in detecting Voiced and unvoiced speech, under an embodi
`ment.
`FIGS. 16A, 16B, and 17 show data plots for an example in
`which a subject twice speaks the phrase "pop pan’, under an
`embodiment.
`FIG.16A plots the received GEMS signal for this utterance
`along with the mean correlation between the GEMS signal
`and the Mic 1 signal and the threshold T1 used for voiced
`speech detection, under an embodiment.
`FIG.16B plots the received GEMS signal for this utterance
`along with the standard deviation of the GEMS signal and the
`threshold T2 used for voiced speech detection, under an
`embodiment.
`FIG. 17 plots voiced speech detected from the acoustic or
`audio signal, along with the GEMS signal and the acoustic
`noise; no unvoiced speech is detected in this example because
`of the heavy background babble noise, under an embodiment.
`FIG. 18 is a microphone array for use under an embodi
`ment of the PSAD system.
`FIG. 19 is a plot of AM versus d for several Ad values,
`under an embodiment.
`FIG.20 shows a plot of the gain parameter as the sum of the
`absolute values of H (Z) and the acoustic data or audio from
`microphone 1, under an embodiment.
`FIG. 21 is an alternative plot of acoustic data presented in
`FIG. 20, under an embodiment.
`FIG. 22 is a two-microphone adaptive noise Suppression
`system, under an embodiment.
`FIG. 23 is a generalized two-microphone array (DOMA)
`including an array and speech Source S configuration, under
`an embodiment.
`FIG.24 is a system for generating or producing a first order
`gradient microphone V using two omnidirectional elements
`O and O, under an embodiment.
`FIG. 25 is a block diagram for a DOMA including two
`physical microphones configured to form two virtual micro
`phones V and V, under an embodiment.
`FIG. 26 is a block diagram for a DOMA including two
`physical microphones configured to form N virtual micro
`phones V through V, where N is any number greater than
`one, under an embodiment.
`
`25
`
`30
`
`The ability to correctly identify voiced and unvoiced
`speech is critical to many speech applications including
`speech recognition, speaker verification, noise Suppression,
`and many others. In a typical acoustic application, speech
`from a human speaker is captured and transmitted to a
`receiver in a different location. In the speaker's environment
`there may exist one or more noise sources that pollute the
`speech signal, the signal of interest, with unwanted acoustic
`noise. This makes it difficult or impossible for the receiver,
`whether human or machine, to understand the user's speech.
`Typical methods for classifying voiced and unvoiced
`speech have relied mainly on the acoustic content of single
`microphone data, which is plagued by problems with noise
`and the corresponding uncertainties in signal content. This is
`especially problematic with the proliferation of portable com
`munication devices like mobile telephones. There are meth
`ods known in the art for Suppressing the noise present in the
`speech signals, but these normally require a robust method of
`40
`determining when speech is being produced. Non-acoustic
`methods have been employed Successfully in commercial
`products such as the Jawbone headset produced by Aliphcom,
`Inc., San Francisco, Calif. (Aliph), but an acoustic-only solu
`tion is desired in some cases (e.g., for reduced cost, as a
`Supplement to the non-acoustic sensor, etc.).
`
`35
`
`45
`
`INCORPORATION BY REFERENCE
`
`Each patent, patent application, and/or publication men
`50
`tioned in this specification is herein incorporated by reference
`in its entirety to the same extent as if each individual patent,
`patent application, and/or publication was specifically and
`individually indicated to be incorporated by reference.
`
`55
`
`BRIEF DESCRIPTION OF THE DRAWINGS
`
`FIG. 1 is a configuration of a two-microphone array with
`speech source S, under an embodiment.
`FIG. 2 is a block diagram of V2 construction using a fixed
`B(Z), under an embodiment.
`FIG. 3 is a block diagram of V construction using an
`adaptive f3(Z), under an embodiment.
`FIG. 4 is a block diagram of V construction, under an
`embodiment.
`FIG. 5 is a flow diagram of acoustic voice activity detec
`tion, under an embodiment.
`
`60
`
`65
`
FIG. 27 is an example of a headset or head-worn device that includes the DOMA, as described herein, under an embodiment.
FIG. 28 is a flow diagram for denoising acoustic signals using the DOMA, under an embodiment.
FIG. 29 is a flow diagram for forming the DOMA, under an embodiment.
FIG. 30 is a plot of linear response of virtual microphone V2 with B = 0.8 to a 1 kHz speech source at a distance of 0.1 m, under an embodiment.
FIG. 31 is a plot of linear response of virtual microphone V2 with B = 0.8 to a 1 kHz noise source at a distance of 1.0 m, under an embodiment.
FIG. 32 is a plot of linear response of virtual microphone V1 with B = 0.8 to a 1 kHz speech source at a distance of 0.1 m, under an embodiment.
FIG. 33 is a plot of linear response of virtual microphone V1 with B = 0.8 to a 1 kHz noise source at a distance of 1.0 m, under an embodiment.
FIG. 34 is a plot of linear response of virtual microphone V1 with B = 0.8 to a speech source at a distance of 0.1 m for frequencies of 100, 500, 1000, 2000, 3000, and 4000 Hz, under an embodiment.
FIG. 35 is a plot showing comparison of frequency responses for speech for the array of an embodiment and for a conventional cardioid microphone, under an embodiment.
FIG. 36 is a plot showing speech response for V1 (top, dashed) and V2 (bottom, solid) versus B with ds assumed to be 0.1 m, under an embodiment.
FIG. 37 is a plot showing a ratio of V1/V2 speech responses shown in FIG. 36 versus B, under an embodiment.
FIG. 38 is a plot of B versus actual ds assuming that ds = 10 cm and theta = 0, under an embodiment.
FIG. 39 is a plot of B versus theta with ds = 10 cm and d assumed to be 10 cm, under an embodiment.
FIG. 40 is a plot of amplitude (top) and phase (bottom) response of N(s) with B = 1 and D = -7.2 μsec, under an embodiment.
FIG. 41 is a plot of amplitude (top) and phase (bottom) response of N(s) with B = 1.2 and D = -7.2 μsec, under an embodiment.
FIG. 42 is a plot of amplitude (top) and phase (bottom) response of the effect on the speech cancellation in V2 due to a mistake in the location of the speech source with q1 = 0 degrees and q2 = 30 degrees, under an embodiment.
FIG. 43 is a plot of amplitude (top) and phase (bottom) response of the effect on the speech cancellation in V2 due to a mistake in the location of the speech source with q1 = 0 degrees and q2 = 45 degrees, under an embodiment.
FIG. 44 shows experimental results for a 2d0 = 19 mm array using a linear B of 0.83 and B1 = B2 = 1 on a Bruel and Kjaer Head and Torso Simulator (HATS) in a very loud (~85 dBA) music/speech noise environment.
`
`10
`
`15
`
`25
`
`30
`
`35
`
`40
`
`45
`
`50
`
`US 8,321,213 B2
`
`4
DETAILED DESCRIPTION

Acoustic Voice Activity Detection (AVAD) methods and systems are described herein. The AVAD methods and systems, which include algorithms or programs, use microphones to generate virtual directional microphones which have very similar noise responses and very dissimilar speech responses. The ratio of the energies of the virtual microphones is then calculated over a given window size and the ratio can then be used with a variety of methods to generate a VAD signal. The virtual microphones can be constructed using either a fixed or an adaptive filter. The adaptive filter generally results in a more accurate and noise-robust VAD signal but requires training. In addition, restrictions can be placed on the filter to ensure that it is training only on speech and not on environmental noise.

In the following description, numerous specific details are introduced to provide a thorough understanding of, and enabling description for, embodiments. One skilled in the relevant art, however, will recognize that these embodiments can be practiced without one or more of the specific details, or with other components, systems, etc. In other instances, well-known structures or operations are not shown, or are not described in detail, to avoid obscuring aspects of the disclosed embodiments.
FIG. 1 is a configuration of a two-microphone array of the AVAD with speech source S, under an embodiment. The AVAD of an embodiment uses two physical microphones (O1 and O2) to form two virtual microphones (V1 and V2). The virtual microphones of an embodiment are directional microphones, but the embodiment is not so limited. The physical microphones of an embodiment include omnidirectional microphones, but the embodiments described herein are not limited to omnidirectional microphones. The virtual microphone (VM) V2 is configured in such a way that it has minimal response to the speech of the user, while V1 is configured so that it does respond to the user's speech but has a very similar noise magnitude response to V2, as described in detail herein. The PSAD VAD methods can then be used to determine when speech is taking place. A further refinement is the use of an adaptive filter to further minimize the speech response of V2, thereby increasing the speech energy ratio used in PSAD and resulting in better overall performance of the AVAD.
The PSAD algorithm as described herein calculates the ratio of the energies of two directional microphones M1 and M2:

    R = \frac{\sum_i M_1(z_i)^2}{\sum_i M_2(z_i)^2}

where the "z" indicates the discrete frequency domain and "i" ranges from the beginning of the window of interest to the end, but the same relationship holds in the time domain. The summation can occur over a window of any length; 200 samples at a sampling rate of 8 kHz has been used to good effect. Microphone M1 is assumed to have a greater speech response than microphone M2. The ratio R depends on the relative strength of the acoustic signal of interest as detected by the microphones.
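By way of illustration, a minimal Python sketch of this windowed ratio (NumPy assumed; the 200-sample window at 8 kHz is the value reported above, while the eps guard against an all-zero window is an added safeguard, not part of the patent):

```python
import numpy as np

def psad_ratio(m1: np.ndarray, m2: np.ndarray, win: int = 200,
               eps: float = 1e-12) -> np.ndarray:
    """Windowed energy ratio R of two microphone signals.

    m1 is assumed to have the greater speech response; win = 200
    samples at an 8 kHz sampling rate is the window the text
    reports as working well.
    """
    n = min(len(m1), len(m2)) // win
    r = np.empty(n)
    for i in range(n):
        s = slice(i * win, (i + 1) * win)
        # Ratio of the energies over the window of interest.
        r[i] = np.sum(m1[s] ** 2) / (np.sum(m2[s] ** 2) + eps)
    return r
```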
For matched omnidirectional microphones (i.e. they have the same response to acoustic signals for all spatial orientations and frequencies), the size of R can be calculated for speech and noise by approximating the propagation of speech and noise waves as spherically symmetric sources. For these the energy of the propagating wave decreases as 1/r^2:
`
    R \approx \frac{d_2}{d_1} = \frac{d_1 + d}{d_1}

The distance d1 is the distance from the acoustic source to M1, d2 is the distance from the acoustic source to M2, and d = d2 - d1 (see FIG. 1). It is assumed that O1 is closer to the speech source (the user's mouth) so that d is always positive. If the microphones and the user's mouth are all on a line, then d = 2d0, the distance between the microphones.
For matched omnidirectional microphones, the magnitude of R depends only on the relative distance between the microphones and the acoustic source. For noise sources, the distances are typically a meter or more, and for speech sources, the distances are on the order of 10 cm, but the distances are not so limited. Therefore for a 2-cm array typical values of R are:

    R_S \approx \frac{d_2}{d_1} = \frac{12\ \mathrm{cm}}{10\ \mathrm{cm}} = 1.2
    R_N \approx \frac{d_2}{d_1} = \frac{102\ \mathrm{cm}}{100\ \mathrm{cm}} = 1.02

where the "S" subscript denotes the ratio for speech sources and "N" the ratio for noise sources. There is not a significant amount of separation between noise and speech sources in this case, and therefore it would be difficult to implement a robust solution using simple omnidirectional microphones.
A better implementation is to use directional microphones where the second microphone has minimal speech response. As described herein, such microphones can be constructed using omnidirectional microphones O1 and O2:

    V_1(z) = O_1(z) \cdot z^{-\gamma} - \beta(z)\alpha(z)O_2(z)
    V_2(z) = \alpha(z)O_2(z) - \beta(z)O_1(z) \cdot z^{-\gamma}

where α(z) is a calibration filter used to compensate O2's response so that it is the same as O1, β(z) is a filter that describes the relationship between O1 and calibrated O2 for speech, and γ is a fixed delay that depends on the size of the array. There is no loss of generality in defining α(z) as above, as either microphone may be compensated to match the other. For this configuration V1 and V2 have very similar noise response magnitudes and very dissimilar speech response magnitudes if

    \gamma = \frac{d}{c}

where again d = 2d0 and c is the speed of sound in air, which is temperature dependent and approximated by

    c = 331.3\sqrt{1 + \frac{T}{273.15}}\ \mathrm{m/sec}

where T is the temperature of the air in Celsius.

The filter β(z) can be calculated using wave theory to be

    \beta(z) = \frac{d_1}{d_1 + d}

where again d1 is the distance from the user's mouth to O1. FIG. 2 is a block diagram of V2 construction using a fixed β(z), under an embodiment. This fixed (or static) β works sufficiently well if the calibration filter α(z) is accurate and d1 and d2 are accurate for the user. This fixed-β algorithm, however, neglects important effects such as reflection, diffraction, poor array orientation (i.e. the microphones and the mouth of the user are not all on a line), and the possibility of different d1 and d2 values for different users.
`
`30
`
`35
`
The adaptive process varies β̃(z) to minimize the output of V2 when only speech is being received by O1 and O2. A small amount of noise may be tolerated with little ill effect, but it is preferred that only speech is being received when the coefficients of β̃(z) are calculated. Any adaptive process may be used; a normalized least-mean squares (NLMS) algorithm was used in the examples below.
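The text names NLMS but does not reproduce the update, so the sketch below is one conventional NLMS realization under that reading. It adapts FIR taps for β̃ so that β̃ applied to the delayed O1 predicts the calibrated O2, which is equivalent to minimizing the V2 output; per the text, it should be run only while speech alone is being received. The tap count and step size are assumptions:

```python
import numpy as np

def adapt_beta_nlms(o1_delayed: np.ndarray, o2_cal: np.ndarray,
                    n_taps: int = 16, mu: float = 0.1,
                    eps: float = 1e-8) -> np.ndarray:
    """NLMS estimate of beta~ minimizing V2 = O2_cal - beta~ * O1_delayed."""
    w = np.zeros(n_taps)
    for n in range(n_taps, len(o1_delayed)):
        x = o1_delayed[n - n_taps:n][::-1]   # newest sample first
        v2 = o2_cal[n] - w @ x               # V2 sample: the error to minimize
        w += (mu / (x @ x + eps)) * v2 * x   # normalized LMS step
    return w
```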
The V1 can be constructed using the current value for β̃(z), or the fixed filter β(z) can be used for simplicity. FIG. 4 is a block diagram of V1 construction, under an embodiment.

Now the ratio R is

    R = \frac{\lVert V_1(z) \rVert}{\lVert V_2(z) \rVert}

where the double bar indicates norm and again any size window may be used. If β(z) has been accurately calculated, the ratio for speech should be relatively high (e.g., greater than approximately 2) and the ratio for noise should be relatively low (e.g., less than approximately 1.1). The ratio calculated will depend on both the relative energies of the speech and noise as well as the orientation of the noise and the reverberance of the environment. In practice, either the adapted filter β̃(z) or the static filter β(z) may be used for V1(z) with little effect on R, but it is important to use the adapted filter β̃(z) in V2(z) for best performance. Many techniques known to those skilled in the art (e.g., smoothing, etc.) can be used to make R more amenable to use in generating a VAD, and the embodiments herein are not so limited.

The ratio R can be calculated for the entire frequency band of interest, or can be calculated in frequency subbands. One effective subband discovered was 250 Hz to 1250 Hz, another was 200 Hz to 3000 Hz, but many others are possible and useful.
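One plausible realization of the subband variant, assuming SciPy is available, is to band-pass both virtual microphone signals before forming the windowed norm ratio; the Butterworth filter and its order are illustrative choices, and the 250-1250 Hz defaults are the first subband mentioned above:

```python
import numpy as np
from scipy.signal import butter, lfilter

def subband_ratio(v1: np.ndarray, v2: np.ndarray, fs: int = 8000,
                  lo: float = 250.0, hi: float = 1250.0,
                  win: int = 200, eps: float = 1e-12) -> np.ndarray:
    """Windowed norm ratio R = ||V1|| / ||V2|| within one subband."""
    b, a = butter(4, [lo / (fs / 2), hi / (fs / 2)], btype="band")
    f1, f2 = lfilter(b, a, v1), lfilter(b, a, v2)
    n = min(len(f1), len(f2)) // win
    return np.array([np.linalg.norm(f1[i * win:(i + 1) * win]) /
                     (np.linalg.norm(f2[i * win:(i + 1) * win]) + eps)
                     for i in range(n)])
```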
Once generated, the vector of the ratio R versus time (or the matrix of R versus time if multiple subbands are used) can be used with any detection system (such as one that uses fixed and/or adaptive thresholds) to determine when speech is occurring. While many detection systems and methods are known to exist by those skilled in the art and may be used, the method described herein for generating an R so that the speech is easily discernable is novel. It is important to note that the R does not depend on the type of noise or its orientation or frequency content; R simply depends on the V1 and V2 spatial response similarity for noise and spatial response dissimilarity for speech. In this way it is very robust and can operate smoothly in a variety of noisy acoustic environments.

FIG. 5 is a flow diagram of acoustic voice activity detection 500, under an embodiment. The detection comprises forming a first virtual microphone by combining a first signal of a first physical microphone and a second signal of a second physical microphone 502. The detection comprises forming a filter that describes a relationship for speech between the first physical microphone and the second physical microphone 504.
`
The detection comprises forming a second virtual microphone by applying the filter to the first signal to generate a first intermediate signal, and summing the first intermediate signal and the second signal 506. The detection comprises generating an energy ratio of energies of the first virtual microphone and the second virtual microphone 508. The detection comprises detecting acoustic voice activity of a speaker when the energy ratio is greater than a threshold value 510.
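Tying steps 502-510 together, a compact, self-contained sketch of flow 500 under the same simplifications as the earlier sketches (scalar α, integer-sample γ, FIR β̃ taps, and a threshold of 2 taken from the approximate speech ratio quoted earlier; the subtraction forming V2 realizes the "summing" of 506 with the filtered intermediate signal negated, per the V2 equation above):

```python
import numpy as np

def avad_500(o1: np.ndarray, o2: np.ndarray, beta_taps: np.ndarray,
             gamma: int, alpha: float = 1.0, win: int = 200,
             threshold: float = 2.0, eps: float = 1e-12) -> np.ndarray:
    """Steps 502-510 of FIG. 5; returns one boolean VAD flag per window."""
    o1d = np.concatenate([np.zeros(gamma), o1])[:len(o1)]    # O1 * z^-gamma
    o2c = alpha * o2                                         # calibrated O2
    # 502: first virtual microphone from the two physical signals.
    v1 = o1d - np.convolve(o2c, beta_taps)[:len(o1d)]
    # 504/506: apply the speech-relationship filter to the delayed O1,
    # then combine with the calibrated O2 to form the second virtual mic.
    v2 = o2c - np.convolve(o1d, beta_taps)[:len(o2c)]
    # 508: windowed ratio of the virtual microphone energies (as norms).
    n = min(len(v1), len(v2)) // win
    r = np.array([np.linalg.norm(v1[i * win:(i + 1) * win]) /
                  (np.linalg.norm(v2[i * win:(i + 1) * win]) + eps)
                  for i in range(n)])
    # 510: voice activity wherever the ratio exceeds the threshold.
    return r > threshold
```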
The accuracy of the adaptation to the β(z) of the system is a factor in determining the effectiveness of the AVAD. A more accurate adaptation to the actual β(z) of the system leads to lower energy of the speech response in V2, and a higher ratio R. The noise (far-field) magnitude response is largely unchanged by the adaptation process, so the ratio R will be near unity for accurately adapted beta. For purposes of accuracy, the system can be trained on speech alone, or the noise should be low enough in energy so as not to affect, or to have a minimal effect on, the training.
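As a rough self-test in the same spirit, the near-unity noise ratio can be used to verify the adaptation: on a noise-only stretch the windowed ratio should sit near 1, and on speech it should rise well above it. The 1.1 and 2 figures are the approximate values quoted earlier; the use of medians is an assumption:

```python
import numpy as np

def adaptation_ok(r_noise: np.ndarray, r_speech: np.ndarray) -> bool:
    """Heuristic check that beta~ adapted well: R near unity on noise-only
    windows and well above unity on speech-only windows."""
    return float(np.median(r_noise)) < 1.1 and float(np.median(r_speech)) > 2.0
```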
To make the training as accurate as possible, the coeffi-