(12) Patent Application Publication
Burnett et al.
(10) Pub. No.: US 2003/0179888A1
(43) Pub. Date: Sep. 25, 2003
`
`(54) VOICE ACTIVITY DETECTION (VAD)
`DEVICES AND METHODS FOR USE WITH
`NOISE SUPPRESSION SYSTEMS
`(76) Inventors: Gregory C. Burnett, Livermore, CA
`(US); Nicolas J. Petit, San Francisco,
`CA (US); Alexander M. Asseily, San
`Francisco, CA (US); Andrew E.
`Einaudi, San Francisco, CA (US)
`Correspondence Address:
`Shemwell Gregory & Courtney LLP
`Suite 201
`4880 Stevens Creek Boulevard
`San Jose, CA 95129 (US)
(21) Appl. No.: 10/383,162
(22) Filed: Mar. 5, 2003

Related U.S. Application Data

(60) Provisional application No. 60/362,162, filed on Mar. 5, 2002.
Provisional application No. 60/362,170, filed on Mar. 5, 2002.
Provisional application No. 60/361,981, filed on Mar. 5, 2002.
Provisional application No. 60/362,161, filed on Mar. 5, 2002.
Provisional application No. 60/362,103, filed on Mar. 5, 2002.
Provisional application No. 60/368,343, filed on Mar. 27, 2002.

Publication Classification

(51) Int. Cl. ........ A61F 11/06; G10K 11/16; H03B 29/00
(52) U.S. Cl. ........ 381/71.8; 381/71.1
(57) ABSTRACT
Voice Activity Detection (VAD) devices, systems and methods are
described for use with signal processing systems to denoise acoustic
signals. Components of a signal processing system and/or VAD system
receive acoustic signals and voice activity signals. Control signals
are automatically generated from data of the voice activity signals.
Components of the signal processing system and/or VAD system use the
control signals to automatically select a denoising method
appropriate to data of frequency subbands of the acoustic signals.
The selected denoising method is applied to the acoustic signals to
generate denoised acoustic signals.
`
`
`
`
`Page 1 of 40
`
`Amazon v. Jawbone
`U.S. Patent 10,779,080
`Amazon Ex. 1019
`
`
`
`Patent Application Publication
`
`Sep. 25, 2003 Sheet 1 of 22
`
`US 2003/0179888A1
`
`
`
`
`
`
Figure 2 (Prior Art)
`
Figure 3 (flow diagram)
`
Figure 4 (plots; horizontal axis: Time (samples at 8 kHz))
`
Figure 5 (plots; horizontal axis: Time (samples at 8 kHz))
`
Figure 6 (plots; horizontal axis: Time (samples at 8 kHz))
`
Figure 7 (plots; horizontal axis: Time (samples at 8 kHz))
`
`
`
`
`
`
`
`
`
`
`
Figure 8 (flow diagram; legible block labels):
Locate subject's face and vocal articulators
Is movement faster than threshold and oscillatory?
Pass information to Pathfinder noise suppression system
`
Figure 9 (plots; horizontal axis: Time (samples at 8 kHz))
`
`
`
Figure 10 (microphone and spatial response curve)
Figure 11 (microphone array; Mic 1 and Mic 2 responses with Gain < 1 and Gain > 1 regions)
`
Figure 12 (flow diagram)
`
Figure 13 (plots; horizontal axis: Time (samples at 8 kHz))
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
Figure 14 (flow diagram; block labels in order of appearance):
Receive signals
Window M1 and M2
Calculate FFT1 and FFT2
Calculate magnitude of FFT1 and FFT2
Calculate exponentially averaged FFT for 1 and 2
Calculate the FFT ratio from previous step and its mean
Compare mean to threshold, set VAD state
Update parameters, keep track of highest mean in contiguous voicing
Reset parameters; if first non-voiced section after voiced, check to
see if previous voicing was false positive
Calculate high and low energy levels, calculate new voicing
threshold, add hangover if appropriate
END
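The central blocks of Figure 14 (window M1 and M2, calculate FFT magnitudes, average them exponentially, take the ratio and its mean, compare the mean to a threshold) can be illustrated with the minimal sketch below. It is an assumption-laden illustration, not the disclosed implementation. The function names, the frame length, the averaging constant `alpha`, the fixed `threshold`, the direction of the ratio (microphone 1 over microphone 2), and the naive DFT used in place of an FFT are all hypothetical choices. The adaptive threshold, false-positive check, and hangover blocks of the figure are omitted.

```python
import cmath
import math
import random


def magnitude_spectrum(frame):
    """Magnitudes of a naive DFT (stand-in for an FFT), bins 0..N/2."""
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n)
                for t, x in enumerate(frame)))
        for k in range(n // 2 + 1)
    ]


def ratio_vad(m1, m2, frame_len=32, alpha=0.5, threshold=2.0, eps=1e-12):
    """Return one VAD state (0 or 1) per non-overlapping frame.

    Hypothetical simplification: fixed threshold, no hangover.
    """
    # Periodic Hann window applied to both microphone frames.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * i / frame_len)
              for i in range(frame_len)]
    avg1 = avg2 = None
    states = []
    usable = min(len(m1), len(m2))
    for start in range(0, usable - frame_len + 1, frame_len):
        seg1 = m1[start:start + frame_len]
        seg2 = m2[start:start + frame_len]
        mag1 = magnitude_spectrum([w * x for w, x in zip(window, seg1)])
        mag2 = magnitude_spectrum([w * x for w, x in zip(window, seg2)])
        if avg1 is None:
            avg1, avg2 = mag1, mag2
        else:
            # Exponential averaging of the magnitude spectra.
            avg1 = [alpha * a + (1 - alpha) * m for a, m in zip(avg1, mag1)]
            avg2 = [alpha * a + (1 - alpha) * m for a, m in zip(avg2, mag2)]
        # Per-bin ratio of the averaged spectra, then its mean over bins.
        ratios = [a / (b + eps) for a, b in zip(avg1, avg2)]
        mean_ratio = sum(ratios) / len(ratios)
        states.append(1 if mean_ratio > threshold else 0)
    return states


# Synthetic demo: two frames where both microphones see the same signal,
# then two frames where microphone 1 is five times stronger than microphone 2.
rng = random.Random(0)
base = [rng.uniform(-1.0, 1.0) for _ in range(32)]
mic1 = base * 2 + [10 * x for x in base] * 2
mic2 = base * 2 + [2 * x for x in base] * 2
states = ratio_vad(mic1, mic2)
```

In the demo the first two frames give a mean ratio near 1 and the last two give a mean ratio well above the assumed threshold of 2, so only the last two frames are marked as voiced.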
`
Figure 15 (plots; horizontal axis: Time (samples at 8 kHz))
`
`
`
`
`
`
`
`
`
Figure 16 (flow diagram)
`
`
`
Figure 17 (plots; horizontal axis: time (samples at 16 kHz))
`
`
`
`
`
Figure 18 (block diagram; labels: SIGNAL s(n), NOISE n(n), Conventional VAD, Cleaned speech)
`
`
`
Figure 19 (flow diagram)
`
Figure 20 (flow diagram; legible block labels):
... the flow data and digitize
Gather next 20 msec of data
Filter out unwanted flow spectra
Pass information to Pathfinder noise suppression system
`
Figure 21 (plots; horizontal axis: Time (samples at 8 kHz))
`
`VOICE ACTIVITY DETECTION (VAD) DEVICES
`AND METHODS FOR USE WITH NOISE
`SUPPRESSION SYSTEMS
`
`RELATED APPLICATIONS
[0001] This application claims priority from the following U.S.
patent applications: application Ser. No. 60/362,162, entitled
PATHFINDER-BASED VOICE ACTIVITY DETECTION (PVAD) USED WITH PATHFINDER
NOISE SUPPRESSION, filed Mar. 5, 2002; application Ser. No.
60/362,170, entitled ACCELEROMETER-BASED VOICE ACTIVITY DETECTION
(PVAD) WITH PATHFINDER NOISE SUPPRESSION, filed Mar. 5, 2002;
application Ser. No. 60/361,981, entitled ARRAY-BASED VOICE ACTIVITY
DETECTION (AVAD) AND PATHFINDER NOISE SUPPRESSION, filed Mar. 5,
2002; application Ser. No. 60/362,161, entitled PATHFINDER NOISE
SUPPRESSION USING AN EXTERNAL VOICE ACTIVITY DETECTION (VAD) DEVICE,
filed Mar. 5, 2002; application Ser. No. 60/362,103, entitled
ACCELEROMETER-BASED VOICE ACTIVITY DETECTION, filed Mar. 5, 2002; and
application Ser. No. 60/368,343, entitled TWO-MICROPHONE
FREQUENCY-BASED VOICE ACTIVITY DETECTION, filed Mar. 27, 2002, all of
which are currently pending.
[0002] Further, this application relates to the following U.S. patent
applications: application Ser. No. 09/905,361, entitled METHOD AND
APPARATUS FOR REMOVING NOISE FROM ELECTRONIC SIGNALS, filed Jul. 12,
2001; application Ser. No. 10/159,770, entitled DETECTING VOICED AND
UNVOICED SPEECH USING BOTH ACOUSTIC AND NONACOUSTIC SENSORS, filed
May 30, 2002; and application Ser. No. 10/301,237, entitled METHOD
AND APPARATUS FOR REMOVING NOISE FROM ELECTRONIC SIGNALS, filed Nov.
21, 2002.
`
`TECHNICAL FIELD
[0003] The disclosed embodiments relate to systems and methods for
detecting and processing a desired signal in the presence of acoustic
noise.
`
`BACKGROUND
[0004] Many noise suppression algorithms and techniques have been
developed over the years. Most of the noise suppression systems in
use today for speech communication systems are based on a
single-microphone spectral subtraction technique first developed in
the 1970s and described, for example, by S. F. Boll in “Suppression
of Acoustic Noise in Speech using Spectral Subtraction,” IEEE Trans.
on ASSP, pp. 113-120, 1979. These techniques have been refined over
the years, but the basic principles of operation have remained the
same. See, for example, U.S. Pat. No. 5,687,243 of McLaughlin, et
al., and U.S. Pat. No. 4,811,404 of Vilmur, et al. Generally, these
techniques make use of a single-microphone Voice Activity Detector
(VAD) to determine the background noise characteristics, where
“voice” is generally understood to include human voiced speech,
unvoiced speech, or a combination of voiced and unvoiced speech.
[0005] The VAD has also been used in digital cellular systems. As an
example of such a use, see U.S. Pat. No. 6,453,291 of Ashley, where a
VAD configuration appropriate to the front-end of a digital cellular
system is described. Further, some Code Division Multiple Access
(CDMA) systems utilize a VAD to minimize the effective radio spectrum
used, thereby allowing for more system capacity. Also, Global System
for Mobile Communication (GSM) systems can include a VAD to reduce
co-channel interference and to reduce battery consumption on the
client or subscriber device.
[0006] These typical single-microphone VAD systems are significantly
limited in capability as a result of the analysis of acoustic
information received by the single microphone, wherein the analysis
is performed using typical signal processing techniques. In
particular, limitations in performance of these single-microphone VAD
systems are noted when processing signals having a low
signal-to-noise ratio (SNR), and in settings where the background
noise varies quickly. Thus, similar limitations are found in noise
suppression systems using these single-microphone VADs.
`
`BRIEF DESCRIPTION OF THE FIGURES
[0007] FIG. 1 is a block diagram of a signal processing system
including the Pathfinder noise suppression system and a VAD system,
under an embodiment.
[0008] FIG. 1A is a block diagram of a VAD system including hardware
for use in receiving and processing signals relating to VAD, under an
embodiment.
[0009] FIG. 1B is a block diagram of a VAD system using hardware of
the associated noise suppression system for use in receiving VAD
information, under an alternative embodiment.
[0010] FIG. 2 is a block diagram of a signal processing system that
incorporates a classical adaptive noise cancellation system, as known
in the art.
[0011] FIG. 3 is a flow diagram of a method for determining voiced
and unvoiced speech using an accelerometer-based VAD, under an
embodiment.
[0012] FIG. 4 shows plots including a noisy audio signal (live
recording) along with a corresponding accelerometer-based VAD signal,
the corresponding accelerometer output signal, and the denoised audio
signal following processing by the Pathfinder system using the VAD
signal, under an embodiment.
[0013] FIG. 5 shows plots including a noisy audio signal (live
recording) along with a corresponding SSM-based VAD signal, the
corresponding SSM output signal, and the denoised audio signal
following processing by the Pathfinder system using the VAD signal,
under an embodiment.
[0014] FIG. 6 shows plots including a noisy audio signal (live
recording) along with a corresponding GEMS-based VAD signal, the
corresponding GEMS output signal, and the denoised audio signal
following processing by the Pathfinder system using the VAD signal,
under an embodiment.
[0015] FIG. 7 shows plots including recorded spoken acoustic data
with digitally added noise along with a corresponding EGG-based VAD
signal, and the corresponding highpass filtered EGG output signal,
under an embodiment.
[0016] FIG. 8 is a flow diagram 80 of a method for determining voiced
speech using a video-based VAD, under an embodiment.
`
`
[0017] FIG. 9 shows plots including a noisy audio signal (live
recording) along with a corresponding single (gradient)
microphone-based VAD signal, the corresponding gradient microphone
output signal, and the denoised audio signal following processing by
the Pathfinder system using the VAD signal, under an embodiment.
[0018] FIG. 10 shows a single cardioid unidirectional microphone of
the microphone array, along with the associated spatial response
curve, under an embodiment.
[0019] FIG. 11 shows a microphone array of a PVAD system, under an
embodiment.
[0020] FIG. 12 is a flow diagram of a method for determining voiced
and unvoiced speech using H1(z) gain values, under an alternative
embodiment of the PVAD.
[0021] FIG. 13 shows plots including a noisy audio signal (live
recording) along with a corresponding microphone-based PVAD signal,
the corresponding PVAD gain versus time signal, and the denoised
audio signal following processing by the Pathfinder system using the
PVAD signal, under an embodiment.
[0022] FIG. 14 is a flow diagram of a method for determining voiced
and unvoiced speech using a stereo VAD, under an embodiment.
[0023] FIG. 15 shows plots including a noisy audio signal (live
recording) along with a corresponding SVAD signal, and the denoised
audio signal following processing by the Pathfinder system using the
SVAD signal, under an embodiment.
[0024] FIG. 16 is a flow diagram of a method for determining voiced
and unvoiced speech using an AVAD, under an embodiment.
[0025] FIG. 17 shows plots including audio signals from each
microphone of an AVAD system along with the corresponding combined
energy signal, under an embodiment.
[0026] FIG. 18 is a block diagram of a signal processing system
including the Pathfinder noise suppression system and a
single-microphone (conventional) VAD system, under an embodiment.
[0027] FIG. 19 is a flow diagram of a method for generating voicing
information using a single-microphone VAD, under an embodiment.
[0028] FIG. 20 is a flow diagram of a method for determining voiced
and unvoiced speech using an airflow-based VAD, under an embodiment.
[0029] FIG. 21 shows plots including a noisy audio signal along with
a corresponding manually activated/calculated VAD signal, and the
denoised audio signal following processing by the Pathfinder system
using the manual VAD signal, under an embodiment.
[0030] In the drawings, the same reference numbers identify identical
or substantially similar elements or acts. To easily identify the
discussion of any particular element or act, the most significant
digit or digits in a reference number refer to the Figure number in
which that element is first introduced (e.g., element 104 is first
introduced and discussed with respect to FIG. 1).
`
`DETAILED DESCRIPTION
[0031] Numerous Voice Activity Detection (VAD) devices and methods
are described below for use with adaptive noise suppression systems.
Further, results are presented below from experiments using the VAD
devices and methods described herein as a component of a noise
suppression system, in particular the Pathfinder Noise Suppression
System available from Aliph, San Francisco, Calif.
(http://www.aliph.com), but the embodiments are not so limited. In
the description below, when the Pathfinder noise suppression system
is referred to, it should be kept in mind that noise suppression
systems that estimate the noise waveform and subtract it from a
signal and that use or are capable of using VAD information for
reliable operation are included in that reference. Pathfinder is
simply a convenient reference implementation for a system that
operates on signals comprising desired speech signals along with
noise.
[0032] When using the VAD devices and methods described herein with a
noise suppression system, the VAD signal is processed independently
of the noise suppression system, so that the receipt and processing
of VAD information is independent from the processing associated with
the noise suppression, but the embodiments are not so limited. This
independence is attained physically (i.e., different hardware for use
in receiving and processing signals relating to the VAD and the noise
suppression), through processing (i.e., using the same hardware to
receive signals into the noise suppression system while using
independent techniques (software, algorithms, routines) to process
the received signals), and through a combination of different
hardware and different software.
[0033] In the following description, “acoustic” is generally defined
as acoustic waves propagating in air. Propagation of acoustic waves
in media other than air will be noted as such. References to “speech”
or “voice” generally refer to human speech including voiced speech,
unvoiced speech, and/or a combination of voiced and unvoiced speech.
Unvoiced speech or voiced speech is distinguished where necessary.
The term “noise suppression” generally describes any method by which
noise is reduced or eliminated in an electronic signal.
[0034] Moreover, the term “VAD” is generally defined as a vector or
array signal, data, or information that in some manner represents the
occurrence of speech in the digital or analog domain. A common
representation of VAD information is a one-bit digital signal sampled
at the same rate as the corresponding acoustic signals, with a zero
value representing that no speech has occurred during the
corresponding time sample, and a unity value indicating that speech
has occurred during the corresponding time sample. While the
embodiments described herein are generally described in the digital
domain, the descriptions are also valid for the analog domain.
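As a minimal sketch of the one-bit representation described above, the following builds a VAD signal with one value per acoustic sample, where 0 marks no speech and 1 marks speech. The helper name `vad_mask_from_segments`, the 8 kHz rate, and the segment boundaries are hypothetical and chosen only for illustration.

```python
def vad_mask_from_segments(num_samples, speech_segments):
    """Build a one-bit VAD signal: 1 where speech occurs, 0 elsewhere.

    speech_segments is a list of half-open (start, end) sample ranges.
    """
    vad = [0] * num_samples
    for start, end in speech_segments:
        for i in range(max(start, 0), min(end, num_samples)):
            vad[i] = 1
    return vad


sample_rate_hz = 8000            # assumed sampling rate
acoustic = [0.0] * 16            # stand-in for 16 acoustic samples
vad = vad_mask_from_segments(len(acoustic), [(4, 10)])
# The VAD signal has exactly one value per acoustic sample.
```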
[0035] The VAD devices/methods described herein generally include
vibration and movement sensors, acoustic sensors, and manual VAD
devices, but are not so limited. In one embodiment, an accelerometer
is placed on the skin for use in detecting skin surface vibrations
that correlate with human speech. These recorded vibrations are then
used to calculate a VAD signal for use with or by an adaptive noise
suppression algorithm in suppressing environmental acoustic noise
from a simultaneously (within a few milliseconds) recorded acoustic
signal that includes both speech and noise.
`
`
[0036] Another embodiment of the VAD devices/methods described herein
includes an acoustic microphone modified with a membrane so that the
microphone no longer efficiently detects acoustic vibrations in air.
The membrane, though, allows the microphone to detect acoustic
vibrations in objects with which it is in physical contact (allowing
a good mechanical impedance match), such as human skin. That is, the
acoustic microphone is modified in some way such that it no longer
detects acoustic vibrations in air (where it no longer has a good
physical impedance match), but only in objects with which the
microphone is in contact. This configures the microphone, like the
accelerometer, to detect vibrations of human skin associated with the
speech production of that human while not efficiently detecting
acoustic environmental noise in the air. The detected vibrations are
processed to form a VAD signal for use in a noise suppression system,
as detailed below.
[0037] Yet another embodiment of the VAD described herein uses an
electromagnetic vibration sensor, such as a radiofrequency (RF)
vibrometer or laser vibrometer, which detects skin vibrations.
Further, the RF vibrometer detects the movement of tissue within the
body, such as the inner surface of the cheek or the tracheal wall.
Both the exterior skin and internal tissue vibrations associated with
speech production can be used to form a VAD signal for use in a noise
suppression system as detailed below.
[0038] Further embodiments of the VAD devices/methods described
herein include an electroglottograph (EGG) to directly detect vocal
fold movement. The EGG is an alternating current (AC)-based method of
measuring vocal fold contact area. When the EGG indicates sufficient
vocal fold contact, the assumption that follows is that voiced speech
is occurring, and a corresponding VAD signal representative of voiced
speech is generated for use in a noise suppression system as detailed
below. Similarly, an additional VAD embodiment uses a video system to
detect movement of a person's vocal articulators, an indication that
speech is being produced.
[0039] Another set of VAD devices/methods described below use signals
received at one or more acoustic microphones along with corresponding
signal processing techniques to produce VAD signals accurately and
reliably under most environmental noise conditions. These embodiments
include simple arrays and co-located (or nearly so) combinations of
omnidirectional and unidirectional acoustic microphones. The simplest
configuration in this set of VAD embodiments includes the use of a
single microphone, located very close to the mouth of the user in
order to record signals at a relatively high SNR. This microphone can
be a gradient or “close-talk” microphone, for example. Other
configurations include the use of combinations of unidirectional and
omnidirectional microphones in various orientations and
configurations. The signals received at these microphones, along with
the associated signal processing, are used to calculate a VAD signal
for use with a noise suppression system, as described below. Also
described below is a VAD system that is activated manually, as in a
walkie-talkie, or by an observer to the system.
[0040] As referenced above, the VAD devices and methods described
herein are for use with noise suppression systems like, for example,
the Pathfinder Noise Suppression System (referred to herein as the
“Pathfinder system”) available from Aliph of San Francisco, Calif.
While the descriptions of the VAD devices herein are provided in the
context of the Pathfinder Noise Suppression System, those skilled in
the art will recognize that the VAD devices and methods can be used
with a variety of noise suppression systems and methods known in the
art.
[0041] The Pathfinder system is a digital signal processing
(DSP)-based acoustic noise suppression and echo-cancellation system.
The Pathfinder system, which can couple to the front-end of speech
processing systems, uses VAD information and received acoustic
information to reduce or eliminate noise in desired acoustic signals
by estimating the noise waveform and subtracting it from a signal
including both speech and noise. The Pathfinder system is described
further below and in the Related Applications.
[0042] FIG. 1 is a block diagram of a signal processing system 100
including the Pathfinder noise suppression system 101 and a VAD
system 102, under an embodiment. The signal processing system 100
includes two microphones MIC 1 110 and MIC 2 112 that receive signals
or information from at least one speech signal source 120 and at
least one noise source 122. The path s(n) from the speech signal
source 120 to MIC 1 and the path n(n) from the noise source 122 to
MIC 2 are considered to be unity. Further, H1(z) represents the path
from the noise source 122 to MIC 1, and H2(z) represents the path
from the speech signal source 120 to MIC 2. In contrast to the signal
processing system 100 including the Pathfinder system 101, FIG. 2 is
a block diagram of a signal processing system 200 that incorporates a
classical adaptive noise cancellation system 202 as known in the art.
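The path definitions above imply a simple two-microphone signal model, sketched here under the assumption that the acoustic paths are linear and time-invariant. The symbols M_1(z), M_2(z), S(z), and N(z), for the z-transforms of the MIC 1 signal, the MIC 2 signal, the speech source, and the noise source, are notation introduced for this illustration only.

```latex
\begin{aligned}
M_1(z) &= S(z) + N(z)\,H_1(z) \\
M_2(z) &= N(z) + S(z)\,H_2(z)
\end{aligned}
```

The unity paths appear as the bare S(z) and N(z) terms. When no speech is present, S(z) = 0 and the model reduces to M_1(z) = M_2(z) H_1(z), so that under these assumptions H_1(z) can be estimated from the two microphone signals alone.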
[0043] Components of the signal processing system 100, for example
the noise suppression system 101, couple to the microphones MIC 1 and
MIC 2 via wireless couplings, wired couplings, and/or a combination
of wireless and wired couplings. Likewise, the VAD system 102 couples
to components of the signal processing system 100, like the noise
suppression system 101, via wireless couplings, wired couplings,
and/or a combination of wireless and wired couplings. As an example,
the VAD devices and microphones described below as components of the
VAD system 102 can comply with the Bluetooth wireless specification
for wireless communication with other components of the signal
processing system, but are not so limited.
[0044] Referring to FIG. 1, the VAD signal 104 from the VAD system
102, derived in a manner described herein, controls noise removal
from the received signals without respect to noise type, amplitude,
and/or orientation. When the VAD signal 104 indicates an absence of
voicing, the Pathfinder system 101 uses MIC 1 and MIC 2 signals to
calculate the coefficients for a model of transfer function H1(z)
over pre-specified subbands of the received signals. When the VAD
signal 104 indicates the presence of voicing, the Pathfinder system
101 stops updating H1(z) and starts calculating the coefficients for
transfer function H2(z) over pre-specified subbands of the received
signals. Updates of H1 coefficients can continue in a subband during
speech production if the SNR in the subband is low (note that H1(z)
and H2(z) are sometimes referred to herein as H1 and H2,
respectively, for convenience). The Pathfinder system 101 of an
embodiment uses the Least Mean Squares (LMS) technique to calculate
H1 and H2, as described further by B. Widrow and S. Stearns in
“Adaptive Signal Processing”, Prentice-Hall Publishing, ISBN
0-13-004029-0, but is not so limited. The transfer function can be
calculated in the time domain, frequency domain, or a combination of
both the time/frequency domains. The Pathfinder system subsequently
removes noise from the received acoustic signals of interest using
combinations of the transfer functions H1(z) and H2(z), thereby
generating at least one denoised acoustic stream.
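The VAD-controlled coefficient update described above can be illustrated with a minimal time-domain sketch. It assumes a single full-band FIR model of the noise path to MIC 1, a plain LMS update with a fixed step size, and a synthetic test signal. The function name `vad_gated_lms`, the tap count, the step size `mu`, and the test filter are hypothetical. The subband processing and the separate adaptation during voicing are omitted, so this is not the Pathfinder implementation.

```python
import math
import random


def vad_gated_lms(mic1, mic2, vad, num_taps=4, mu=0.05):
    """Adapt an FIR estimate of the noise path only while VAD is 0."""
    h1 = [0.0] * num_taps
    history = [0.0] * num_taps      # recent MIC 2 samples, newest first
    for m1, m2, voiced in zip(mic1, mic2, vad):
        history = [m2] + history[:-1]
        if voiced:
            continue                # speech present: freeze the coefficients
        estimate = sum(w * x for w, x in zip(h1, history))
        error = m1 - estimate       # residual between MIC 1 and the model
        h1 = [w + mu * error * x for w, x in zip(h1, history)]
    return h1


# Synthetic demo: noise reaches MIC 1 through an assumed path [0.5, -0.3].
rng = random.Random(0)
noise = [rng.uniform(-1.0, 1.0) for _ in range(4000)]
vad = [0] * 3000 + [1] * 1000       # last 1000 samples contain "speech"
mic2 = noise                        # noise path to MIC 2 taken as unity
mic1 = []
for n, voiced in enumerate(vad):
    previous = noise[n - 1] if n > 0 else 0.0
    speech = math.sin(0.3 * n) if voiced else 0.0
    mic1.append(0.5 * noise[n] - 0.3 * previous + speech)

h1_est = vad_gated_lms(mic1, mic2, vad)
```

Because the coefficients are frozen whenever the VAD value is 1, the speech added to MIC 1 in the last 1000 samples does not disturb the estimate. This is the behavior that makes an accurate VAD signal important to this style of adaptation.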
[0045] The Pathfinder system can be implemented in a variety of ways,
but common to all of the embodiments is reliance on an accurate and
reliable VAD device and/or method. The VAD device/method should be
accurate because the Pathfinder system updates its filter
coefficients when there is no speech or when the SNR during speech is
low. If sufficient speech energy is present during coefficient
update, subsequent speech with similar spectral characteristics can
be suppressed, an undesirable occurrence. The VAD device/method
should be robust to support high accuracy under a variety of
environmental conditions. Obviously, there are likely to be some
conditions under which no VAD device/method will operate
satisfactorily, but under normal circumstances the VAD device/method
should work to provide maximum noise suppression with few adverse
effects on the speech signal of interest.
[0046] When using VAD devices/methods with a noise suppression
system, the VAD signal is processed independently of the noise
suppression system, so that the receipt and processing of VAD
information is independent from the processing associated with the
noise suppression, but the embodiments are not so limited. This
independence is attained physically (i.e., different hardware for use
in receiving and processing signals relating to the VAD and the noise
suppression), through processing (i.e., using the same hardware to
receive signals into the noise suppression system while using
independent techniques (software, algorithms, routines) to process
the received signals), and through a combination of different
hardware and different software, as described below.
[0047] FIG. 1A is a block diagram of a VAD system 102A including
hardware for use in receiving and processing signals relating to VAD,
under an embodiment. The VAD system 102A includes a VAD device 130
coupled to provide data to a corresponding VAD algorithm 140. Note
that noise suppression systems of alternative embodiments can
integrate some or all functions of the VAD algorithm with the noise
suppression processing in any manner obvious to those skilled in the
art.
[0048] FIG. 1B is a block diagram of a VAD system 102B using hardware
of the associated noise suppression system 101 for use in receiving
VAD information 164, under an embodiment. The VAD system 102B
includes a VAD algorithm 150 that receives data 164 from MIC 1 and
MIC 2, or other components, of the corresponding signal processing
system 100. Alternative embodiments of the noise suppression system
can integrate some or all functions of the VAD algorithm with the
noise suppression processing in any manner obvious to those skilled
in the art.
[0049] Vibration/Movement-Based VAD Devices/Methods
[0050] The vibration/movement-based VAD devices include the physical
hardware devices for use in receiving and processing signals relating
to the VAD and the noise suppression. As a speaker or user produces
speech, the resulting vibrations propagate through the tissue of the
speaker and, therefore, can be detected on and beneath the skin using
various methods. These vibrations are an excellent source of VAD
information, as they are strongly associated with both voiced and
unvoiced speech (although the unvoiced speech vibrations are much
weaker and more difficult to detect) and generally are only slightly
affected by environmental acoustic noise (some devices/methods, for
example the electromagnetic vibrometers described below, are not
affected by environmental acoustic noise). These
`tissue vibrations or movements are det