(19) United States
(12) Patent Application Publication
Burnett et al.
(10) Pub. No.: US 2003/0179888 A1
(43) Pub. Date: Sep. 25, 2003

(54) VOICE ACTIVITY DETECTION (VAD) DEVICES AND METHODS FOR USE WITH NOISE SUPPRESSION SYSTEMS

(76) Inventors: Gregory C. Burnett, Livermore, CA (US); Nicolas J. Petit, San Francisco, CA (US); Alexander M. Asseily, San Francisco, CA (US); Andrew E. Einaudi, San Francisco, CA (US)

Correspondence Address:
Shemwell Gregory & Courtney LLP
Suite 201
4880 Stevens Creek Boulevard
San Jose, CA 95129 (US)

(21) Appl. No.: 10/383,162
(22) Filed: Mar. 5, 2003

Related U.S. Application Data
(60) Provisional application No. 60/362,162, filed on Mar. 5, 2002. Provisional application No. 60/362,170, filed on Mar. 5, 2002. Provisional application No. 60/361,981, filed on Mar. 5, 2002. Provisional application No. 60/362,161, filed on Mar. 5, 2002. Provisional application No. 60/362,103, filed on Mar. 5, 2002. Provisional application No. 60/368,343, filed on Mar. 27, 2002.

Publication Classification
(51) Int. Cl.7: A61F 11/06; G10K 11/16; H03B 29/00
(52) U.S. Cl.: 381/71.8; 381/71.1

(57) ABSTRACT
Voice Activity Detection (VAD) devices, systems and methods are described for use with signal processing systems to denoise acoustic signals. Components of a signal processing system and/or VAD system receive acoustic signals and voice activity signals. Control signals are automatically generated from data of the voice activity signals. Components of the signal processing system and/or VAD system use the control signals to automatically select a denoising method appropriate to data of frequency subbands of the acoustic signals. The selected denoising method is applied to the acoustic signals to generate denoised acoustic signals.
Page 1 of 40

Amazon v. Jawbone
U.S. Patent 10,779,080
Amazon Ex. 1019

Patent Application Publication Sep. 25, 2003 Sheet 1 of 22 US 2003/0179888 A1
[Drawing not recoverable from the text extraction.]
[Sheets 2 and 3 of 22: drawings not recoverable from the text extraction.]
[Sheet 4 of 22: Figure 2 (Prior Art), block diagram; only the label "NOISE" is legible.]
[Sheet 5 of 22: FIG. 3, flow diagram of the accelerometer-based VAD; step labels are largely illegible, with fragments reading approximately "receive accelerometer data", "filter and digitize", "calculate energy in each window", and "compare energy to threshold values".]
[Sheet 6 of 22: Figure 4, signal plots; x-axis: Time (samples at 8 kHz).]
[Sheet 7 of 22: Figure 5, signal plots; x-axis: Time (samples at 8 kHz).]
[Sheet 8 of 22: Figure 6, signal plots; x-axis: Time (samples at 8 kHz).]
[Sheet 9 of 22: Figure 7, signal plots; x-axis: Time (samples at 8 kHz).]
[Sheet 10 of 22: FIG. 8, flow diagram of the video-based VAD. Legible steps: "Locate subject's face and vocal articulators"; a partially illegible step beginning "Movement of"; decision "Is movement faster than threshold and oscillatory?"; "Pass information to Pathfinder noise suppression system".]
[Sheet 11 of 22: Figure 9, signal plots; x-axis: Time (samples at 8 kHz).]
[Sheet 12 of 22: Figure 10 (cardioid microphone and spatial response curve) and Figure 11 (microphone array); legible labels include "Mic 2 response", "Gain < 1", and "Gain > 1".]
[Sheet 13 of 22: FIG. 12, flow diagram of the PVAD method using H1(z) gain values; step labels are largely illegible.]
[Sheet 14 of 22: Figure 13, signal plots; x-axis: Time (samples at 8 kHz).]
[Sheet 15 of 22: Figure 14, flow diagram of the stereo VAD method. Legible steps, in order: receive signals; window M1 and M2; calculate FFT1 and FFT2; calculate magnitude of FFT1 and FFT2; calculate exponentially averaged FFT for 1 and 2; calculate the FFT ratio from the previous step and its mean; compare mean to threshold and set VAD state; then either update parameters and keep track of the highest mean in contiguous voicing, or reset parameters and, if this is the first non-voiced section after a voiced one, check whether the previous voicing was a false positive; calculate high and low energy levels, calculate a new voicing threshold, and add hangover if appropriate; END.]
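The Figure 14 flow diagram gives the core stereo-VAD steps: window both microphone signals, take FFT magnitudes, exponentially average them, compare the mean spectral ratio with a threshold, and add hangover. The Python sketch below strings those steps together under stated assumptions; it is an illustration, not the patented implementation. Assumptions not taken from the text: MIC 1 is the microphone nearer the mouth, so speech raises the MIC 1/MIC 2 ratio; the threshold is fixed at 2.0 (the diagram adapts it); the window is a periodic Hann window; the averaging weight is 0.5; the hangover is two frames; and a plain DFT stands in for the FFT to keep the example dependency-free. The false-positive check is omitted, and all names are hypothetical.

```python
import cmath
import math
import random


def dft_magnitude(frame):
    """Magnitude spectrum for bins 0..N/2 using a plain O(N^2) DFT."""
    n = len(frame)
    return [abs(sum(frame[k] * cmath.exp(-2j * math.pi * b * k / n) for k in range(n)))
            for b in range(n // 2 + 1)]


class StereoVadSketch:
    """Hypothetical two-microphone VAD following the Figure 14 step order."""

    def __init__(self, threshold=2.0, alpha=0.5, hangover_frames=2):
        self.threshold = threshold        # fixed here; the diagram recalculates it
        self.alpha = alpha                # weight given to the newest spectrum
        self.hangover_frames = hangover_frames
        self.avg1 = None                  # exponentially averaged |FFT1|
        self.avg2 = None                  # exponentially averaged |FFT2|
        self.hang = 0

    def process(self, frame1, frame2):
        n = len(frame1)
        window = [0.5 - 0.5 * math.cos(2 * math.pi * k / n) for k in range(n)]
        mag1 = dft_magnitude([x * w for x, w in zip(frame1, window)])
        mag2 = dft_magnitude([x * w for x, w in zip(frame2, window)])
        if self.avg1 is None:
            self.avg1, self.avg2 = mag1, mag2
        else:
            a = self.alpha
            self.avg1 = [a * m + (1 - a) * p for m, p in zip(mag1, self.avg1)]
            self.avg2 = [a * m + (1 - a) * p for m, p in zip(mag2, self.avg2)]
        ratios = [p1 / (p2 + 1e-12) for p1, p2 in zip(self.avg1, self.avg2)]
        mean_ratio = sum(ratios) / len(ratios)
        if mean_ratio > self.threshold:   # voiced: refresh the hangover counter
            self.hang = self.hangover_frames
            return 1
        if self.hang > 0:                 # recently voiced: hold the VAD on
            self.hang -= 1
            return 1
        return 0


# Demo: noise reaches both mics equally; a synthetic "speech" burst in frames
# 5-9 is 10x stronger at MIC 1 than at MIC 2 (both are assumptions).
rng = random.Random(7)
FRAME = 32
vad = StereoVadSketch()
decisions = []
for f in range(30):
    noise = [rng.gauss(0.0, 1.0) for _ in range(FRAME)]
    speech = [rng.gauss(0.0, 10.0) for _ in range(FRAME)] if 5 <= f < 10 else [0.0] * FRAME
    mic1 = [nz + s for nz, s in zip(noise, speech)]
    mic2 = [nz + 0.1 * s for nz, s in zip(noise, speech)]
    decisions.append(vad.process(mic1, mic2))
print(decisions)
```

Averaging the magnitudes before taking the ratio smooths frame-to-frame fluctuations, while the hangover keeps the VAD on briefly after the ratio drops.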
[Sheet 16 of 22: Figure 15, signal plots; x-axis: Time (samples at 8 kHz).]
[Sheet 17 of 22: FIG. 16, flow diagram of the AVAD method; step labels are largely illegible, with fragments referring to energy values, averages, voicing thresholds, and voiced/unvoiced decisions for each window.]
[Sheet 18 of 22: Figure 17, signal plots; x-axis: time (samples at 16 kHz).]
[Sheet 19 of 22: Figure 18, block diagram; legible labels: "SIGNAL s(n)", "NOISE n(n)", "Conventional VAD", "Reference", and "Cleaned speech".]
[Sheet 20 of 22: FIG. 19, flow diagram; the only legible step reads approximately "Feed VAD information to noise suppression system".]
[Sheet 21 of 22: Figure 20, flow diagram of the airflow-based VAD. Legible steps: "... flow data and digitize" (first words illegible); "Gather next 20 msec of data"; "Filter out unwanted flow spectra"; "Pass information to Pathfinder noise suppression system".]
[Sheet 22 of 22: Figure 21, signal plots; x-axis: Time (samples at 8 kHz).]

US 2003/0179888 A1 Sep. 25, 2003

VOICE ACTIVITY DETECTION (VAD) DEVICES AND METHODS FOR USE WITH NOISE SUPPRESSION SYSTEMS

RELATED APPLICATIONS
[0001] This application claims priority from the following U.S. patent applications: application Ser. No. 60/362,162, entitled PATHFINDER-BASED VOICE ACTIVITY DETECTION (PVAD) USED WITH PATHFINDER NOISE SUPPRESSION, filed Mar. 5, 2002; application Ser. No. 60/362,170, entitled ACCELEROMETER-BASED VOICE ACTIVITY DETECTION (PVAD) WITH PATHFINDER NOISE SUPPRESSION, filed Mar. 5, 2002; application Ser. No. 60/361,981, entitled ARRAY-BASED VOICE ACTIVITY DETECTION (AVAD) AND PATHFINDER NOISE SUPPRESSION, filed Mar. 5, 2002; application Ser. No. 60/362,161, entitled PATHFINDER NOISE SUPPRESSION USING AN EXTERNAL VOICE ACTIVITY DETECTION (VAD) DEVICE, filed Mar. 5, 2002; application Ser. No. 60/362,103, entitled ACCELEROMETER-BASED VOICE ACTIVITY DETECTION, filed Mar. 5, 2002; and application Ser. No. 60/368,343, entitled TWO-MICROPHONE FREQUENCY-BASED VOICE ACTIVITY DETECTION, filed Mar. 27, 2002, all of which are currently pending.
[0002] Further, this application relates to the following U.S. patent applications: application Ser. No. 09/905,361, entitled METHOD AND APPARATUS FOR REMOVING NOISE FROM ELECTRONIC SIGNALS, filed Jul. 12, 2001; application Ser. No. 10/159,770, entitled DETECTING VOICED AND UNVOICED SPEECH USING BOTH ACOUSTIC AND NONACOUSTIC SENSORS, filed May 30, 2002; and application Ser. No. 10/301,237, entitled METHOD AND APPARATUS FOR REMOVING NOISE FROM ELECTRONIC SIGNALS, filed Nov. 21, 2002.

TECHNICAL FIELD
[0003] The disclosed embodiments relate to systems and methods for detecting and processing a desired signal in the presence of acoustic noise.

BACKGROUND
[0004] Many noise suppression algorithms and techniques have been developed over the years. Most of the noise suppression systems in use today for speech communication systems are based on a single-microphone spectral subtraction technique first developed in the 1970s and described, for example, by S. F. Boll in "Suppression of Acoustic Noise in Speech Using Spectral Subtraction," IEEE Trans. on ASSP, pp. 113-120, 1979. These techniques have been refined over the years, but the basic principles of operation have remained the same. See, for example, U.S. Pat. No. 5,687,243 of McLaughlin, et al., and U.S. Pat. No. 4,811,404 of Vilmur, et al. Generally, these techniques make use of a single-microphone Voice Activity Detector (VAD) to determine the background noise characteristics, where "voice" is generally understood to include human voiced speech, unvoiced speech, or a combination of voiced and unvoiced speech.
[0005] The VAD has also been used in digital cellular systems. As an example of such a use, see U.S. Pat. No. 6,453,291 of Ashley, where a VAD configuration appropriate to the front-end of a digital cellular system is described. Further, some Code Division Multiple Access (CDMA) systems utilize a VAD to minimize the effective radio spectrum used, thereby allowing for more system capacity. Also, Global System for Mobile Communication (GSM) systems can include a VAD to reduce co-channel interference and to reduce battery consumption on the client or subscriber device.
[0006] These typical single-microphone VAD systems are significantly limited in capability as a result of analyzing the acoustic information received by the single microphone with typical signal processing techniques. In particular, limitations in performance of these single-microphone VAD systems are noted when processing signals having a low signal-to-noise ratio (SNR), and in settings where the background noise varies quickly. Thus, similar limitations are found in noise suppression systems using these single-microphone VADs.

BRIEF DESCRIPTION OF THE FIGURES
[0007] FIG. 1 is a block diagram of a signal processing system including the Pathfinder noise suppression system and a VAD system, under an embodiment.
[0008] FIG. 1A is a block diagram of a VAD system including hardware for use in receiving and processing signals relating to VAD, under an embodiment.
[0009] FIG. 1B is a block diagram of a VAD system using hardware of the associated noise suppression system for use in receiving VAD information, under an alternative embodiment.
[0010] FIG. 2 is a block diagram of a signal processing system that incorporates a classical adaptive noise cancellation system, as known in the art.
[0011] FIG. 3 is a flow diagram of a method for determining voiced and unvoiced speech using an accelerometer-based VAD, under an embodiment.
[0012] FIG. 4 shows plots including a noisy audio signal (live recording) along with a corresponding accelerometer-based VAD signal, the corresponding accelerometer output signal, and the denoised audio signal following processing by the Pathfinder system using the VAD signal, under an embodiment.
[0013] FIG. 5 shows plots including a noisy audio signal (live recording) along with a corresponding SSM-based VAD signal, the corresponding SSM output signal, and the denoised audio signal following processing by the Pathfinder system using the VAD signal, under an embodiment.
[0014] FIG. 6 shows plots including a noisy audio signal (live recording) along with a corresponding GEMS-based VAD signal, the corresponding GEMS output signal, and the denoised audio signal following processing by the Pathfinder system using the VAD signal, under an embodiment.
[0015] FIG. 7 shows plots including recorded spoken acoustic data with digitally added noise along with a corresponding EGG-based VAD signal, and the corresponding highpass filtered EGG output signal, under an embodiment.
[0016] FIG. 8 is a flow diagram 80 of a method for determining voiced speech using a video-based VAD, under an embodiment.
[0017] FIG. 9 shows plots including a noisy audio signal (live recording) along with a corresponding single (gradient) microphone-based VAD signal, the corresponding gradient microphone output signal, and the denoised audio signal following processing by the Pathfinder system using the VAD signal, under an embodiment.
[0018] FIG. 10 shows a single cardioid unidirectional microphone of the microphone array, along with the associated spatial response curve, under an embodiment.
[0019] FIG. 11 shows a microphone array of a PVAD system, under an embodiment.
[0020] FIG. 12 is a flow diagram of a method for determining voiced and unvoiced speech using H1(z) gain values, under an alternative embodiment of the PVAD.
[0021] FIG. 13 shows plots including a noisy audio signal (live recording) along with a corresponding microphone-based PVAD signal, the corresponding PVAD gain versus time signal, and the denoised audio signal following processing by the Pathfinder system using the PVAD signal, under an embodiment.
[0022] FIG. 14 is a flow diagram of a method for determining voiced and unvoiced speech using a stereo VAD, under an embodiment.
[0023] FIG. 15 shows plots including a noisy audio signal (live recording) along with a corresponding SVAD signal, and the denoised audio signal following processing by the Pathfinder system using the SVAD signal, under an embodiment.
[0024] FIG. 16 is a flow diagram of a method for determining voiced and unvoiced speech using an AVAD, under an embodiment.
[0025] FIG. 17 shows plots including audio signals from each microphone of an AVAD system along with the corresponding combined energy signal, under an embodiment.
[0026] FIG. 18 is a block diagram of a signal processing system including the Pathfinder noise suppression system and a single-microphone (conventional) VAD system, under an embodiment.
[0027] FIG. 19 is a flow diagram of a method for generating voicing information using a single-microphone VAD, under an embodiment.
[0028] FIG. 20 is a flow diagram of a method for determining voiced and unvoiced speech using an airflow-based VAD, under an embodiment.
[0029] FIG. 21 shows plots including a noisy audio signal along with a corresponding manually activated/calculated VAD signal, and the denoised audio signal following processing by the Pathfinder system using the manual VAD signal, under an embodiment.
[0030] In the drawings, the same reference numbers identify identical or substantially similar elements or acts. To easily identify the discussion of any particular element or act, the most significant digit or digits in a reference number refer to the Figure number in which that element is first introduced (e.g., element 104 is first introduced and discussed with respect to FIG. 1).
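Read literally, the numbering convention in [0030] maps a reference numeral to its figure by dropping the last two digits. The short Python sketch below makes that arithmetic explicit. The two-digit element field is an assumption consistent with the example given (element 104 in FIG. 1), letter suffixes such as the "A" in 102A are ignored, and the function name is hypothetical.

```python
import re


def figure_for_reference(numeral: str) -> int:
    """Return the figure number encoded in a reference numeral.

    Assumes the last two digits identify the element within its figure and
    the leading digit(s) give the figure number; a letter suffix is ignored.
    """
    match = re.fullmatch(r"(\d{3,})([A-Z]?)", numeral)
    if match is None:
        raise ValueError(f"not a reference numeral: {numeral!r}")
    return int(match.group(1)) // 100


for ref in ("104", "102A", "1402"):
    print(ref, "-> FIG.", figure_for_reference(ref))
```

Under this reading, a numeral with a letter suffix, such as 102A from FIG. 1A, resolves to the base figure number 1, and a four-digit numeral such as 1402 would belong to FIG. 14.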
DETAILED DESCRIPTION
[0031] Numerous Voice Activity Detection (VAD) devices and methods are described below for use with adaptive noise suppression systems. Further, results are presented below from experiments using the VAD devices and methods described herein as a component of a noise suppression system, in particular the Pathfinder Noise Suppression System available from Aliph, San Francisco, Calif. (http://www.aliph.com), but the embodiments are not so limited. In the description below, references to the Pathfinder noise suppression system should be understood to include any noise suppression system that estimates the noise waveform, subtracts it from a signal, and uses (or is capable of using) VAD information for reliable operation. Pathfinder is simply a convenient reference implementation for a system that operates on signals comprising desired speech signals along with noise.
[0032] When using the VAD devices and methods described herein with a noise suppression system, the VAD signal is processed independently of the noise suppression system, so that the receipt and processing of VAD information is independent from the processing associated with the noise suppression, but the embodiments are not so limited. This independence is attained physically (i.e., different hardware for use in receiving and processing signals relating to the VAD and the noise suppression), through processing (i.e., using the same hardware to receive signals into the noise suppression system while using independent techniques (software, algorithms, routines) to process the received signals), or through a combination of different hardware and different software.
[0033] In the following description, "acoustic" is generally defined as acoustic waves propagating in air. Propagation of acoustic waves in media other than air will be noted as such. References to "speech" or "voice" generally refer to human speech including voiced speech, unvoiced speech, and/or a combination of voiced and unvoiced speech. Unvoiced speech or voiced speech is distinguished where necessary. The term "noise suppression" generally describes any method by which noise is reduced or eliminated in an electronic signal.
[0034] Moreover, the term "VAD" is generally defined as a vector or array signal, data, or information that in some manner represents the occurrence of speech in the digital or analog domain. A common representation of VAD information is a one-bit digital signal sampled at the same rate as the corresponding acoustic signals, with a zero value representing that no speech has occurred during the corresponding time sample, and a unity value indicating that speech has occurred during the corresponding time sample. While the embodiments described herein are generally described in the digital domain, the descriptions are also valid for the analog domain.
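As a sketch of the one-bit representation described in [0034], the Python function below expands frame-level voicing decisions into a 0/1 signal with one value per acoustic sample, so that it lines up sample-for-sample with the microphone data. Non-overlapping frames and the frame lengths used in the comments and example are assumptions for illustration, and the function name is hypothetical.

```python
def frames_to_sample_vad(frame_flags, frame_len, num_samples):
    """Expand per-frame voicing flags into a one-bit, per-sample VAD signal.

    frame_flags: one boolean per non-overlapping frame (True = speech).
    frame_len: samples per frame, e.g. 160 samples = 20 ms at 8 kHz.
    num_samples: length of the acoustic signal the VAD must line up with.
    """
    vad = [0] * num_samples               # 0: no speech in this time sample
    for index, voiced in enumerate(frame_flags):
        if voiced:
            start = index * frame_len
            for n in range(start, min(start + frame_len, num_samples)):
                vad[n] = 1                # 1: speech in this time sample
    return vad


print(frames_to_sample_vad([False, True, False], frame_len=4, num_samples=10))
# -> [0, 0, 0, 0, 1, 1, 1, 1, 0, 0]
```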
[0035] The VAD devices/methods described herein generally include vibration and movement sensors, acoustic sensors, and manual VAD devices, but are not so limited. In one embodiment, an accelerometer is placed on the skin for use in detecting skin surface vibrations that correlate with human speech. These recorded vibrations are then used to calculate a VAD signal for use with or by an adaptive noise suppression algorithm in suppressing environmental acoustic noise from a simultaneously (within a few milliseconds) recorded acoustic signal that includes both speech and noise.
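[0035] states that the recorded skin vibrations are used to calculate a VAD signal but does not spell out the calculation here. A minimal sketch of one common approach, assumed here rather than taken from the text, is to compute the mean-square energy of each window of sensor data and compare it with a threshold. The fixed threshold, window length, and synthetic test signal are illustrative assumptions, and the function name is hypothetical.

```python
import math


def energy_vad(samples, frame_len, threshold):
    """Mark each non-overlapping window as voiced (1) when its energy is high.

    samples: vibration-sensor samples (e.g. from a skin-mounted accelerometer).
    frame_len: window length in samples.
    threshold: hypothetical fixed mean-square energy threshold.
    A trailing partial window is ignored.
    """
    decisions = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        window = samples[start:start + frame_len]
        energy = sum(x * x for x in window) / frame_len   # mean-square energy
        decisions.append(1 if energy > threshold else 0)
    return decisions


# Synthetic 8 kHz test: 40 ms of silence, 40 ms of a 200 Hz "voicing"
# vibration, then 20 ms of silence, analysed in 20 ms (160-sample) windows.
fs = 8000
tone = [0.5 * math.sin(2 * math.pi * 200 * n / fs) for n in range(320)]
signal = [0.0] * 320 + tone + [0.0] * 160
print(energy_vad(signal, frame_len=160, threshold=0.01))
# -> [0, 0, 1, 1, 0]
```

Because the sensor responds mainly to skin vibration rather than airborne noise, even a simple energy threshold can separate speech from silence in this idealized example; a real system would need thresholds that track the sensor's noise floor.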
[0036] Another embodiment of the VAD devices/methods described herein includes an acoustic microphone modified with a membrane so that the microphone no longer efficiently detects acoustic vibrations in air. The membrane, though, allows the microphone to detect acoustic vibrations in objects with which it is in physical contact (allowing a good mechanical impedance match), such as human skin. That is, the acoustic microphone is modified in some way such that it no longer detects acoustic vibrations in air (where it no longer has a good physical impedance match), but only in objects with which the microphone is in contact. This configures the microphone, like the accelerometer, to detect vibrations of human skin associated with the speech production of that human while not efficiently detecting acoustic environmental noise in the air. The detected vibrations are processed to form a VAD signal for use in a noise suppression system, as detailed below.
[0037] Yet another embodiment of the VAD described herein uses an electromagnetic vibration sensor, such as a radiofrequency (RF) vibrometer or a laser vibrometer, which detects skin vibrations. Further, the RF vibrometer detects the movement of tissue within the body, such as the inner surface of the cheek or the tracheal wall. Both the exterior skin and internal tissue vibrations associated with speech production can be used to form a VAD signal for use in a noise suppression system as detailed below.
[0038] Further embodiments of the VAD devices/methods described herein include an electroglottograph (EGG) to directly detect vocal fold movement. The EGG is an alternating current (AC)-based method of measuring vocal fold contact area. When the EGG indicates sufficient vocal fold contact, it is assumed that voiced speech is occurring, and a corresponding VAD signal representative of voiced speech is generated for use in a noise suppression system as detailed below. Similarly, an additional VAD embodiment uses a video system to detect movement of a person's vocal articulators, an indication that speech is being produced.
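To make the video-based decision concrete: the video-VAD flow diagram in this application's drawings asks "Is movement faster than threshold and oscillatory?". The Python sketch below applies that test to a tracked articulator trajectory. How the trajectory is obtained (face and lip tracking), the pixel units, the speed threshold, and the minimum number of direction reversals are all assumptions, and the function name is hypothetical.

```python
def articulator_vad(positions, fps, speed_threshold, min_reversals=2):
    """Return 1 if articulator motion is both fast and oscillatory, else 0.

    positions: per-video-frame lip opening in pixels (hypothetical tracker output).
    fps: video frame rate in frames per second.
    speed_threshold: minimum mean absolute speed in pixels per second.
    min_reversals: minimum number of direction changes to count as oscillatory.
    """
    velocities = [(b - a) * fps for a, b in zip(positions, positions[1:])]
    if not velocities:
        return 0
    mean_speed = sum(abs(v) for v in velocities) / len(velocities)
    reversals = sum(1 for v0, v1 in zip(velocities, velocities[1:]) if v0 * v1 < 0)
    return 1 if mean_speed > speed_threshold and reversals >= min_reversals else 0


print(articulator_vad([0, 5, 0, 5, 0, 5], fps=30, speed_threshold=50))  # talking-like: 1
print(articulator_vad([0, 1, 2, 3, 4, 5], fps=30, speed_threshold=50))  # slow drift: 0
print(articulator_vad([0, 10, 20, 30], fps=30, speed_threshold=50))     # fast, one-way: 0
```

Requiring both conditions rejects slow head drift (too slow) and a single jaw opening or head turn (fast but not oscillatory).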
[0039] Another set of VAD devices/methods described below uses signals received at one or more acoustic microphones along with corresponding signal processing techniques to produce VAD signals accurately and reliably under most environmental noise conditions. These embodiments include simple arrays and co-located (or nearly so) combinations of omnidirectional and unidirectional acoustic microphones. The simplest configuration in this set of VAD embodiments includes the use of a single microphone, located very close to the mouth of the user in order to record signals at a relatively high SNR. This microphone can be a gradient or "close-talk" microphone, for example. Other configurations include the use of combinations of unidirectional and omnidirectional microphones in various orientations and configurations. The signals received at these microphones, along with the associated signal processing, are used to calculate a VAD signal for use with a noise suppression system, as described below. Also described below is a VAD system that is activated manually, as in a walkie-talkie, or by an observer of the system.
[0040] As referenced above, the VAD devices and methods described herein are for use with noise suppression systems like, for example, the Pathfinder Noise Suppression System (referred to herein as the "Pathfinder system") available from Aliph of San Francisco, Calif. While the descriptions of the VAD devices herein are provided in the context of the Pathfinder Noise Suppression System, those skilled in the art will recognize that the VAD devices and methods can be used with a variety of noise suppression systems and methods known in the art.
[0041] The Pathfinder system is a digital signal processing (DSP)-based acoustic noise suppression and echo-cancellation system. The Pathfinder system, which can couple to the front-end of speech processing systems, uses VAD information and received acoustic information to reduce or eliminate noise in desired acoustic signals by estimating the noise waveform and subtracting it from a signal including both speech and noise. The Pathfinder system is described further below and in the Related Applications.
[0042] FIG. 1 is a block diagram of a signal processing system 100 including the Pathfinder noise suppression system 101 and a VAD system 102, under an embodiment. The signal processing system 100 includes two microphones MIC 1 110 and MIC 2 112 that receive signals or information from at least one speech signal source 120 and at least one noise source 122. The path s(n) from the speech signal source 120 to MIC 1 and the path n(n) from the noise source 122 to MIC 2 are considered to be unity. Further, H1(z) represents the path from the noise source 122 to MIC 1, and H2(z) represents the path from the speech signal source 120 to MIC 2. In contrast to the signal processing system 100 including the Pathfinder system 101, FIG. 2 is a block diagram of a signal processing system 200 that incorporates a classical adaptive noise cancellation system 202 as known in the art.
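The two-path model of [0042] can be written out to show why both transfer functions matter. The derivation below is a worked sketch from the stated paths only (unity speech path to MIC 1, unity noise path to MIC 2, H1(z) from the noise source to MIC 1, H2(z) from the speech source to MIC 2). It assumes linear, time-invariant paths and writes S(z), N(z), M1(z), and M2(z) for the z-transforms of the speech, the noise, and the two microphone signals; it is not quoted from this section.

```latex
\begin{aligned}
M_1(z) &= S(z) + H_1(z)\,N(z),\\
M_2(z) &= N(z) + H_2(z)\,S(z),\\
M_1(z) - H_1(z)\,M_2(z) &= S(z)\,\bigl[1 - H_1(z)\,H_2(z)\bigr],\\
S(z) &= \frac{M_1(z) - H_1(z)\,M_2(z)}{1 - H_1(z)\,H_2(z)}.
\end{aligned}
```

With no speech present (S = 0), M1 = H1 M2, which is why H1 can be estimated while the VAD reports no voicing. With speech dominating (N close to 0), M2 is approximately H2 M1, so H2 can be estimated during voicing. If speech barely reaches MIC 2 (H2 close to 0), the result reduces to S = M1 - H1 M2, the form of a classical adaptive noise canceller, which assumes no speech in the reference microphone.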
[0043] Components of the signal processing system 100, for example the noise suppression system 101, couple to the microphones MIC 1 and MIC 2 via wireless couplings, wired couplings, and/or a combination of wireless and wired couplings. Likewise, the VAD system 102 couples to components of the signal processing system 100, like the noise suppression system 101, via wireless couplings, wired couplings, and/or a combination of wireless and wired couplings. As an example, the VAD devices and microphones described below as components of the VAD system 102 can comply with the Bluetooth wireless specification for wireless communication with other components of the signal processing system, but are not so limited.
[0044] Referring to FIG. 1, the VAD signal 104 from the VAD system 102, derived in a manner described herein, controls noise removal from the received signals without respect to noise type, amplitude, and/or orientation. When the VAD signal 104 indicates an absence of voicing, the Pathfinder system 101 uses MIC 1 and MIC 2 signals to calculate the coefficients for a model of transfer function H1(z) over pre-specified subbands of the received signals. When the VAD signal 104 indicates the presence of voicing, the Pathfinder system 101 stops updating H1(z) and starts calculating the coefficients for transfer function H2(z) over pre-specified subbands of the received signals. Updates of H1 coefficients can continue in a subband during speech production if the SNR in the subband is low (note that H1(z) and H2(z) are sometimes referred to herein as H1 and H2, respectively, for convenience). The Pathfinder system 101 of an embodiment uses the Least Mean Squares (LMS) technique to calculate H1 and H2, as described further by B. Widrow and S. Stearns in "Adaptive Signal Processing", Prentice-Hall Publishing, ISBN 0-13-004029-0, but is not so limited. The transfer function can be calculated in the time domain, the frequency domain, or a combination of both. The Pathfinder system subsequently removes noise from the received acoustic signals of interest using combinations of the transfer functions H1(z) and H2(z), thereby generating at least one denoised acoustic stream.
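A minimal time-domain sketch of the VAD gating described in [0044], in Python: an FIR estimate of H1 is adapted with normalized LMS (NLMS, a common variant of the LMS technique cited above) only while the VAD reports no speech, and is frozen during speech. Assumptions: a single full band instead of pre-specified subbands; no H2 update (H2 is treated as negligible, so the output is MIC 1 minus the filtered MIC 2); and an arbitrary tap count and step size. All names are hypothetical, and this is not the Pathfinder implementation.

```python
import math
import random


def vad_gated_nlms(m1, m2, vad, taps=8, mu=0.5, eps=1e-8):
    """Adapt an FIR estimate of H1 only when vad == 0; return (output, weights).

    m1, m2: equal-length sample lists for MIC 1 and MIC 2.
    vad: per-sample flags, 1 = speech present (adaptation frozen).
    The output is m1 minus the H1-filtered m2, i.e. S ~ M1 - H1*M2 with H2 ignored.
    """
    w = [0.0] * taps              # FIR model of H1 (noise path to MIC 1)
    buf = [0.0] * taps            # recent MIC 2 samples, newest first
    out = []
    for x1, x2, voiced in zip(m1, m2, vad):
        buf = [x2] + buf[:-1]
        err = x1 - sum(wi * bi for wi, bi in zip(w, buf))
        out.append(err)
        if not voiced:            # update H1 only in the absence of speech
            norm = eps + sum(b * b for b in buf)
            w = [wi + mu * err * bi / norm for wi, bi in zip(w, buf)]
    return out, w


# Demo: H1 is a one-sample delay with gain 0.6; speech (a sinusoid) is added
# to MIC 1 only during the last 1000 samples, where the VAD is set to 1.
rng = random.Random(0)
noise = [rng.gauss(0.0, 1.0) for _ in range(3000)]
speech = [0.0] * 2000 + [math.sin(0.3 * n) for n in range(1000)]
m2 = noise
m1 = [0.6 * (noise[n - 1] if n > 0 else 0.0) + speech[n] for n in range(3000)]
vad = [0] * 2000 + [1] * 1000
cleaned, weights = vad_gated_nlms(m1, m2, vad)
print([round(wi, 3) for wi in weights])   # close to 0.6 at index 1, near 0 elsewhere
```

Freezing the update during speech matters: if the filter kept adapting while speech was present, the speech would act as an error signal and perturb the H1 estimate, which is the failure mode [0045] warns about.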
[0045] The Pathfinder system can be implemented in a variety of ways, but common to all of the embodiments is reliance on an accurate and reliable VAD device and/or method. The VAD device/method should be accurate because the Pathfinder system updates its filter coefficients when there is no speech or when the SNR during speech is low. If sufficient speech energy is present during coefficient update, subsequent speech with similar spectral characteristics can be suppressed, an undesirable occurrence. The VAD device/method should be robust to support high accuracy under a variety of environmental conditions. Obviously, there are likely to be some conditions under which no VAD device/method will operate satisfactorily, but under normal circumstances the VAD device/method should work to provide maximum noise suppression with few adverse effects on the speech signal of interest.
[0046] When using VAD devices/methods with a noise suppression system, the VAD signal is processed independently of the noise suppression system, so that the receipt and processing of VAD information is independent from the processing associated with the noise suppression, but the embodiments are not so limited. This independence is attained physically (i.e., different hardware for use in receiving and processing signals relating to the VAD and the noise suppression), through processing (i.e., using the same hardware to receive signals into the noise suppression system while using independent techniques (software, algorithms, routines) to process the received signals), or through a combination of different hardware and different software, as described below.
[0047] FIG. 1A is a block diagram of a VAD system 102A including hardware for use in receiving and processing signals relating to VAD, under an embodiment. The VAD system 102A includes a VAD device 130 coupled to provide data to a corresponding VAD algorithm 140. Note that noise suppression systems of alternative embodiments can integrate some or all functions of the VAD algorithm with the noise suppression processing in any manner obvious to those skilled in the art.
[0048] FIG. 1B is a block diagram of a VAD system 102B using hardware of the associated noise suppression system 101 for use in receiving VAD information 164, under an embodiment. The VAD system 102B includes a VAD algorithm 150 that receives data 164 from MIC 1 and MIC 2, or other components, of the corresponding signal processing system 100. Alternative embodiments of the noise suppression system can integrate some or all functions of the VAD algorithm with the noise suppression processing in any manner obvious to those skilled in the art.
[0049] Vibration/Movement-Based VAD Devices/Methods
[0050] The vibration/movement-based VAD devices include the physical hardware devices for use in receiving and processing signals relating to the VAD and the noise suppression. As a speaker or user produces speech, the resulting vibrations propagate through the tissue of the speaker and, therefore, can be detected on and beneath the skin using various methods. These vibrations are an excellent source of VAD information, as they are strongly associated with both voiced and unvoiced speech (although the unvoiced speech vibrations are much weaker and more difficult to detect) and generally are only slightly affected by environmental acoustic noise (some devices/methods, for example the electromagnetic vibrometers described below, are not affected by environmental acoustic noise). These tissue vibrations or movements are det
