US008019091B2

(12) United States Patent
Burnett et al.

(10) Patent No.: US 8,019,091 B2
(45) Date of Patent: *Sep. 13, 2011

(54) VOICE ACTIVITY DETECTOR (VAD)-BASED MULTIPLE-MICROPHONE ACOUSTIC NOISE SUPPRESSION

(75) Inventors: Gregory C. Burnett, Dodge Center, MN (US); Eric F. Breitfeller, Dublin, CA (US)

(73) Assignee: Aliphcom, Inc., San Francisco, CA (US)

(*) Notice: Subject to any disclaimer, the term of this patent is extended or adjusted under 35 U.S.C. 154(b) by 713 days. This patent is subject to a terminal disclaimer.

(21) Appl. No.: 10/667,207

(22) Filed: Sep. 18, 2003

(65) Prior Publication Data
US 2004/0133421 A1, Jul. 8, 2004

Related U.S. Application Data

(63) Continuation-in-part of application No. 09/905,361, filed on Jul. 12, 2001, now abandoned.

(60) Provisional application No. 60/219,297, filed on Jul. 19, 2000.

(51) Int. Cl.
H03B 29/00 (2006.01)
(52) U.S. Cl. ................ 381/71.8; 704/215
(58) Field of Classification Search ................ 381/70, 381/94.1-94.7, 71.8, 91-92, 122, 71.1; 704/200, 704/231, 233, 246, 214-215
See application file for complete search history.

(56) References Cited

U.S. PATENT DOCUMENTS

3,789,166 A *   1/1974  Sebesta
4,006,318 A *   2/1977  Sebesta et al.
4,591,668 A *   5/1986  Iwata
4,901,354 A *   2/1990  Gollmar et al.
5,097,515 A *   3/1992  Baba
5,212,764 A     5/1993  Ariyoshi
5,400,409 A     3/1995  Linhard
5,406,622 A *   4/1995  Silverberg et al. ........ 381/94.7
5,414,776 A     5/1995  Sims, Jr.
5,463,694 A *  10/1995  Bradley et al. ........ 381/92
(Continued)

FOREIGN PATENT DOCUMENTS

EP  0 637 187 A *   2/1995
(Continued)

OTHER PUBLICATIONS

Zhao Li et al.: "Robust Speech Coding Using Microphone Arrays", Signals, Systems and Computers, 1997, Conf. Record of 31st Asilomar Conf., Nov. 2-5, 1997, IEEE Comput. Soc., USA.
(Continued)

Primary Examiner - Davetta Goins
Assistant Examiner - Lun-See Lao
(74) Attorney, Agent, or Firm - Gregory & Sawrie LLP

(57) ABSTRACT

Acoustic noise suppression is provided in multiple-microphone systems using Voice Activity Detectors (VAD). A host system receives acoustic signals via multiple microphones. The system also receives information on the vibration of human tissue associated with human voicing activity via the VAD. In response, the system generates a transfer function representative of the received acoustic signals upon determining that voicing information is absent from the received acoustic signals during at least one specified period of time. The system removes noise from the received acoustic signals using the transfer function, thereby producing a denoised acoustic data stream.

20 Claims, 10 Drawing Sheets

[Cover drawing (from FIG. 2): signal source 100 (s(n)) and noise source 101 (n(n)) reach Mic 1 (m1(n)) and Mic 2 (m2(n)) through transfer functions H1(z) and H2(z); VAD 204 supplies voicing information to noise removal element 205, which outputs cleaned speech.]

Page 1 of 21

GOOGLE EXHIBIT 1001
`
`

`

U.S. PATENT DOCUMENTS (continued)

5,473,701 A *  12/1995  Cezanne et al. ........ 381/92
5,473,702 A *  12/1995  Yoshida et al. ........ 381/94.7
5,515,865 A *   5/1996  Scanlon et al.
5,517,435 A *   5/1996  Sugiyama ........ 708/322
5,539,859 A     7/1996  Robbe et al.
5,590,241 A *  12/1996  Park et al. ........ 704/227
5,633,935 A *   5/1997  Kanamori et al. ........ 381/26
5,649,055 A     7/1997  Gupta et al.
5,684,460 A *  11/1997  Scanlon et al.
5,729,694 A *   3/1998  Holzrichter et al. ........ 705/17
5,754,665 A *   5/1998  Hosoi ........ 381/94.1
5,835,608 A    11/1998  Warnaka et al.
5,853,005 A *  12/1998  Scanlon
5,917,921 A     6/1999  Sasaki et al.
5,966,090 A    10/1999  McEwan
5,986,600 A    11/1999  McEwan
6,006,175 A *  12/1999  Holzrichter ........ 704/208
6,009,396 A    12/1999  Nagata
6,069,963 A *   5/2000  Martin et al.
6,191,724 B1    2/2001  McEwan
6,266,422 B1    7/2001  Ikeda
6,430,295 B1    8/2002  Handel et al.
6,707,910 B1 *  3/2004  Valve et al. ........ 379/388.06
2002/0039425 A1 *   4/2002  Burnett et al.
2003/0228023 A1 *  12/2003  Burnett et al. ........ 381/92

FOREIGN PATENT DOCUMENTS (continued)

EP  0 795 851 A2 *   9/1997
EP  0 984 660 A2 *   3/2000
JP  2000 312 395  *  11/2000
JP  2001 189 987  *   7/2001
WO  WO 02 07151   *   1/2002

OTHER PUBLICATIONS (continued)

L. C. Ng et al.: "Denoising of Human Speech Using Combined Acoustic and EM Sensor Signal Processing", 2000 IEEE Intl. Conf. on Acoustics, Speech and Signal Processing, Proceedings (Cat. No. 00CH37100), Istanbul, Turkey, Jun. 5-9, 2000, XP002186255, ISBN 0-7803-6293-4.
S. Affes et al.: "A Signal Subspace Tracking Algorithm for Microphone Array Processing of Speech", IEEE Transactions on Speech and Audio Processing, N.Y., USA, vol. 5, no. 5, Sep. 1, 1997, XP000774303, ISSN 1063-6676.
Gregory C. Burnett: "The Physiological Basis of Glottal Electromagnetic Micropower Sensors (GEMS) and Their Use in Defining an Excitation Function for the Human Vocal Tract", Dissertation, University of California at Davis, Jan. 1999, USA.
L. C. Ng et al.: "Speaker Verification Using Combined Acoustic and EM Sensor Signal Processing", ICASSP-2001, Salt Lake City, USA.
A. Hussain: "Intelligibility Assessment of a Multi-Band Speech Enhancement Scheme", Proceedings IEEE Intl. Conf. on Acoustics, Speech & Signal Processing (ICASSP-2000), Istanbul, Turkey, Jun. 2000.

* cited by examiner
`
`

`

[Drawing Sheet 1 of 10: FIG. 1, a block diagram of denoising system 1000, in which microphones 10 and voicing sensors 20 provide signals to processor 30, which contains denoising subsystem 40. FIG. 2, a block diagram of noise removal algorithm 200: signal source 100 (s(n)) and noise source 101 (n(n)) reach Mic 1 (m1(n)) and Mic 2 (m2(n)) through transfer functions H1(z) and H2(z); VAD 204 supplies voicing information to noise removal element 205, which outputs cleaned speech.]
`
`

`

[Drawing Sheet 2 of 10: FIG. 3, front-end components 300 of the noise removal algorithm generalized to n distinct noise sources N1(z) ... Nn(z), with paths Hi(z) to the signal microphone and Gi(z) to the noise microphone, and signal S(z) with its own path to the noise microphone.]
`
`

`

[Drawing Sheet 3 of 10: FIG. 4, front-end components 400 of the noise removal algorithm in the general case of n distinct noise sources and signal reflections, with signal S(z) and reflected paths H(z) entering both MIC 1 and MIC 2.]
`
`

`

[Drawing Sheet 4 of 10: FIG. 5, flow diagram 500 of the denoising method: Start; receive acoustic signals (502); receive voice activity (VAD) information (504); determine absence of voicing and generate first transfer function (506); determine presence of voicing and generate second transfer function (508); produce denoised acoustic data stream (510); End.]
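The flow shown in FIG. 5 can be sketched as a short frequency-domain loop. The frame-based structure, the smoothing constants, and the use of per-bin spectral division to estimate the transfer functions are illustrative assumptions for this sketch, not details taken from the patent.

```python
import numpy as np

def denoise_stream(m1_frames, m2_frames, vad_flags):
    """Per-frame sketch of FIG. 5: update H1 during noise-only frames,
    update H2 during voiced frames, then recover the signal with
    S = (M1 - M2*H1) / (1 - H2*H1) (Equation 3 of the description)."""
    n_fft = m1_frames.shape[1]
    n_bins = n_fft // 2 + 1
    H1 = np.zeros(n_bins, dtype=complex)   # noise path into MIC 1
    H2 = np.zeros(n_bins, dtype=complex)   # signal leakage into MIC 2
    out = []
    for m1, m2, voiced in zip(m1_frames, m2_frames, vad_flags):
        M1, M2 = np.fft.rfft(m1), np.fft.rfft(m2)
        if not voiced:                      # blocks 502-506: H1 = M1n/M2n
            H1 = 0.9 * H1 + 0.1 * M1 / (M2 + 1e-12)
        else:                               # block 508: H2 = M2s/M1s
            H2 = 0.9 * H2 + 0.1 * M2 / (M1 + 1e-12)
        S = (M1 - M2 * H1) / (1.0 - H2 * H1 + 1e-12)   # block 510
        out.append(np.fft.irfft(S, n=n_fft))
    return np.concatenate(out)
```

Here H1 is refreshed only while the VAD reports silence and H2 only while speech dominates, mirroring blocks 506 and 508 of the flow.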
`
`
`

`

[Drawing Sheet 5 of 10: FIG. 6, noise removal results for an American English female speaker saying "406-5562": dirty audio 604 (top) and cleaned audio 602 (bottom), amplitude versus time.]
`
`

`

[Drawing Sheet 6 of 10: FIG. 7A, a VAD system (702A) with its own VAD device and VAD algorithm supplying VAD signal 704 to noise suppression system 701. FIG. 7B, an alternative VAD system (702B) whose VAD algorithm uses hardware of the coupled signal processing system 700 and noise suppression system 701 to receive VAD information 764.]
`
`

`

[Drawing Sheet 7 of 10: FIG. 8, flow diagram 800 of the accelerometer-based VAD method: receive accelerometer data (802); filter and digitize accelerometer data (804); segment and step digitized data (806); remove spectral information corrupted by noise (808); calculate energy in each window (810); compare energy to threshold values (812); energy above threshold indicates voiced speech (814); energy below threshold indicates unvoiced speech (816).]
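The thresholding portion of the FIG. 8 flow can be sketched as follows. The window length, step size, and threshold value are illustrative placeholders (the figure gives no numeric values), and the filtering and noise-removal steps (804, 808) are assumed to have been applied upstream.

```python
import numpy as np

def accelerometer_vad(accel, win=160, step=80, threshold=0.01):
    """Sketch of FIG. 8 blocks 806-816: segment and step the digitized
    accelerometer data, compute the energy in each window, and compare
    it to a threshold; above-threshold windows are marked voiced."""
    flags = []
    for start in range(0, len(accel) - win + 1, step):
        window = accel[start:start + win]
        energy = float(np.sum(window ** 2)) / win   # mean-square energy
        flags.append(energy > threshold)            # True -> voiced speech
    return flags
```

At 8 kHz sampling, a 160-sample window with an 80-sample step corresponds to 20 ms windows advanced every 10 ms, a common framing choice for speech, though again not one the figure specifies.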
`
`
`

`

[Drawing Sheet 8 of 10: FIG. 9, plots against time (samples at 8 kHz) of a noisy audio signal, the corresponding accelerometer-based VAD signal 912 with the accelerometer output, and the denoised audio signal 922.]
`
`

`

[Drawing Sheet 9 of 10: FIG. 10, plots against time (samples at 8 kHz) of a noisy audio signal, the corresponding SSM-based VAD signal 1012 with the SSM output, and the denoised audio signal 1022.]
`
`

`

[Drawing Sheet 10 of 10: FIG. 11, plots against time (samples at 8 kHz) of a noisy audio signal, the corresponding GEMS-based VAD signal 1112 with the GEMS output, and the denoised audio signal.]
`
`

`

VOICE ACTIVITY DETECTOR (VAD)-BASED MULTIPLE-MICROPHONE ACOUSTIC NOISE SUPPRESSION

RELATED APPLICATIONS

This patent application is a continuation-in-part of U.S. patent application Ser. No. 09/905,361, filed Jul. 12, 2001, now abandoned, which claims priority from U.S. patent application Ser. No. 60/219,297, filed Jul. 19, 2000. This patent application also claims priority from U.S. patent application Ser. No. 10/383,162, filed Mar. 5, 2003.

FIELD OF THE INVENTION

The disclosed embodiments relate to systems and methods for detecting and processing a desired signal in the presence of acoustic noise.

BACKGROUND

Many noise suppression algorithms and techniques have been developed over the years. Most of the noise suppression systems in use today for speech communication systems are based on a single-microphone spectral subtraction technique first developed in the 1970s and described, for example, by S. F. Boll in "Suppression of Acoustic Noise in Speech using Spectral Subtraction," IEEE Trans. on ASSP, pp. 113-120, 1979. These techniques have been refined over the years, but the basic principles of operation have remained the same. See, for example, U.S. Pat. No. 5,687,243 of McLaughlin, et al., and U.S. Pat. No. 4,811,404 of Vilmur, et al. Generally, these techniques make use of a microphone-based Voice Activity Detector (VAD) to determine the background noise characteristics, where "voice" is generally understood to include human voiced speech, unvoiced speech, or a combination of voiced and unvoiced speech.

The VAD has also been used in digital cellular systems. As an example of such a use, see U.S. Pat. No. 6,453,291 of Ashley, where a VAD configuration appropriate to the front-end of a digital cellular system is described. Further, some Code Division Multiple Access (CDMA) systems utilize a VAD to minimize the effective radio spectrum used, thereby allowing for more system capacity. Also, Global System for Mobile Communication (GSM) systems can include a VAD to reduce co-channel interference and to reduce battery consumption on the client or subscriber device.

These typical microphone-based VAD systems are significantly limited in capability as a result of the addition of environmental acoustic noise to the desired speech signal received by the single microphone, wherein the analysis is performed using typical signal processing techniques. In particular, limitations in performance of these microphone-based VAD systems are noted when processing signals having a low signal-to-noise ratio (SNR), and in settings where the background noise varies quickly. Thus, similar limitations are found in noise suppression systems using these microphone-based VADs.

BRIEF DESCRIPTION OF THE FIGURES

FIG. 1 is a block diagram of a denoising system, under an embodiment.
FIG. 2 is a block diagram including components of a noise removal algorithm, under the denoising system of an embodiment assuming a single noise source and direct paths to the microphones.
FIG. 3 is a block diagram including front-end components of a noise removal algorithm of an embodiment generalized to n distinct noise sources (these noise sources may be reflections or echoes of one another).
FIG. 4 is a block diagram including front-end components of a noise removal algorithm of an embodiment in a general case where there are n distinct noise sources and signal reflections.
FIG. 5 is a flow diagram of a denoising method, under an embodiment.
FIG. 6 shows results of a noise suppression algorithm of an embodiment for an American English female speaker in the presence of airport terminal noise that includes many other human speakers and public announcements.
FIG. 7A is a block diagram of a Voice Activity Detector (VAD) system including hardware for use in receiving and processing signals relating to VAD, under an embodiment.
FIG. 7B is a block diagram of a VAD system using hardware of a coupled noise suppression system for use in receiving VAD information, under an alternative embodiment.
FIG. 8 is a flow diagram of a method for determining voiced and unvoiced speech using an accelerometer-based VAD, under an embodiment.
FIG. 9 shows plots including a noisy audio signal (live recording) along with a corresponding accelerometer-based VAD signal, the corresponding accelerometer output signal, and the denoised audio signal following processing by the noise suppression system using the VAD signal, under an embodiment.
FIG. 10 shows plots including a noisy audio signal (live recording) along with a corresponding SSM-based VAD signal, the corresponding SSM output signal, and the denoised audio signal following processing by the noise suppression system using the VAD signal, under an embodiment.
FIG. 11 shows plots including a noisy audio signal (live recording) along with a corresponding GEMS-based VAD signal, the corresponding GEMS output signal, and the denoised audio signal following processing by the noise suppression system using the VAD signal, under an embodiment.

DETAILED DESCRIPTION

The following description provides specific details for a thorough understanding of, and enabling description for, embodiments of the noise suppression system. However, one skilled in the art will understand that the invention may be practiced without these details. In other instances, well-known structures and functions have not been shown or described in detail to avoid unnecessarily obscuring the description of the embodiments of the noise suppression system. In the following description, "signal" represents any acoustic signal (such as human speech) that is desired, and "noise" is any acoustic signal (which may include human speech) that is not desired. An example would be a person talking on a cellular telephone with a radio in the background. The person's speech is desired and the acoustic energy from the radio is not desired. In addition, "user" describes a person who is using the device and whose speech is desired to be captured by the system.

Also, "acoustic" is generally defined as acoustic waves propagating in air. Propagation of acoustic waves in media other than air will be noted as such. References to "speech" or "voice" generally refer to human speech including voiced speech, unvoiced speech, and/or a combination of voiced and unvoiced speech. Unvoiced speech or voiced speech is distinguished where necessary. The term "noise suppression"
`
`

`

generally describes any method by which noise is reduced or eliminated in an electronic signal.

Moreover, the term "VAD" is generally defined as a vector or array signal, data, or information that in some manner represents the occurrence of speech in the digital or analog domain. A common representation of VAD information is a one-bit digital signal sampled at the same rate as the corresponding acoustic signals, with a zero value representing that no speech has occurred during the corresponding time sample, and a unity value indicating that speech has occurred during the corresponding time sample. While the embodiments described herein are generally described in the digital domain, the descriptions are also valid for the analog domain.

FIG. 1 is a block diagram of a denoising system 1000 of an embodiment that uses knowledge of when speech is occurring derived from physiological information on voicing activity. The system 1000 includes microphones 10 and sensors 20 that provide signals to at least one processor 30. The processor includes a denoising subsystem or algorithm 40.

FIG. 2 is a block diagram including components of a noise removal algorithm 200 of an embodiment. A single noise source and a direct path to the microphones are assumed. An operational description of the noise removal algorithm 200 of an embodiment is provided using a single signal source 100 and a single noise source 101, but is not so limited. This algorithm 200 uses two microphones: a "signal" microphone 1 ("MIC 1") and a "noise" microphone 2 ("MIC 2"), but is not so limited. The signal microphone MIC 1 is assumed to capture mostly signal with some noise, while MIC 2 captures mostly noise with some signal. The data from the signal source 100 to MIC 1 is denoted by s(n), where s(n) is a discrete sample of the analog signal from the source 100. The data from the signal source 100 to MIC 2 is denoted by s2(n). The data from the noise source 101 to MIC 2 is denoted by n(n). The data from the noise source 101 to MIC 1 is denoted by n2(n). Similarly, the data from MIC 1 to noise removal element 205 is denoted by m1(n), and the data from MIC 2 to noise removal element 205 is denoted by m2(n).

The noise removal element 205 also receives a signal from a voice activity detection (VAD) element 204. The VAD 204 uses physiological information to determine when a speaker is speaking. In various embodiments, the VAD can include at least one of an accelerometer, a skin surface microphone in physical contact with skin of a user, a human tissue vibration detector, a radio frequency (RF) vibration and/or motion detector/device, an electroglottograph, an ultrasound device, an acoustic microphone that is being used to detect acoustic frequency signals that correspond to the user's speech directly from the skin of the user (anywhere on the body), an airflow detector, and a laser vibration detector.

The transfer functions from the signal source 100 to MIC 1 and from the noise source 101 to MIC 2 are assumed to be unity. The transfer function from the signal source 100 to MIC 2 is denoted by H2(z), and the transfer function from the noise source 101 to MIC 1 is denoted by H1(z). The assumption of unity transfer functions does not inhibit the generality of this algorithm, as the actual relations between the signal, noise, and microphones are simply ratios and the ratios are redefined in this manner for simplicity.

In conventional two-microphone noise removal systems, the information from MIC 2 is used to attempt to remove noise from MIC 1. However, a (generally unspoken) assumption is that the VAD element 204 is never perfect, and thus the denoising must be performed cautiously, so as not to remove too much of the signal along with the noise. However, if the VAD 204 is assumed to be perfect such that it is equal to zero when there is no speech being produced by the user, and equal to one when speech is produced, a substantial improvement in the noise removal can be made.

In analyzing the single noise source 101 and the direct path to the microphones, with reference to FIG. 2, the total acoustic information coming into MIC 1 is denoted by m1(n). The total acoustic information coming into MIC 2 is similarly labeled m2(n). In the z (digital frequency) domain, these are represented as M1(z) and M2(z). Then

M1(z) = S(z) + N2(z)
M2(z) = N(z) + S2(z)

with

N2(z) = N(z)H1(z)
S2(z) = S(z)H2(z),

so that

M1(z) = S(z) + N(z)H1(z)
M2(z) = N(z) + S(z)H2(z).    Eq. 1

This is the general case for all two microphone systems. In a practical system there is always going to be some leakage of noise into MIC 1, and some leakage of signal into MIC 2. Equation 1 has four unknowns and only two known relationships and therefore cannot be solved explicitly.

However, there is another way to solve for some of the unknowns in Equation 1. The analysis starts with an examination of the case where the signal is not being generated, that is, where a signal from the VAD element 204 equals zero and speech is not being produced. In this case, s(n) = S(z) = 0, and Equation 1 reduces to

M1n(z) = N(z)H1(z)
M2n(z) = N(z),

where the n subscript on the M variables indicates that only noise is being received. This leads to

M1n(z) = M2n(z)H1(z)

H1(z) = M1n(z)/M2n(z).    Eq. 2

The function H1(z) can be calculated using any of the available system identification algorithms and the microphone outputs when the system is certain that only noise is being received. The calculation can be done adaptively, so that the system can react to changes in the noise.

A solution is now available for one of the unknowns in Equation 1. Another unknown, H2(z), can be determined by using the instances where the VAD equals one and speech is being produced. When this is occurring, but the recent (perhaps less than 1 second) history of the microphones indicates low levels of noise, it can be assumed that n(n) = N(z) = 0. Then Equation 1 reduces to

M1s(z) = S(z)
M2s(z) = S(z)H2(z),

which in turn leads to
`
`

`

M2s(z) = M1s(z)H2(z)

H2(z) = M2s(z)/M1s(z),

which is the inverse of the H1(z) calculation. However, it is noted that different inputs are being used (now only the signal is occurring whereas before only the noise was occurring). While calculating H2(z), the values calculated for H1(z) are held constant and vice versa. Thus, it is assumed that while one of H1(z) and H2(z) is being calculated, the one not being calculated does not change substantially.

After calculating H1(z) and H2(z), they are used to remove the noise from the signal. If Equation 1 is rewritten as

S(z) = M1(z) - N(z)H1(z)
N(z) = M2(z) - S(z)H2(z)

S(z)[1 - H2(z)H1(z)] = M1(z) - M2(z)H1(z),

then N(z) may be substituted as shown to solve for S(z) as

S(z) = [M1(z) - M2(z)H1(z)] / [1 - H2(z)H1(z)].    Eq. 3

If the transfer functions H1(z) and H2(z) can be described with sufficient accuracy, then the noise can be completely removed and the original signal recovered. This remains true without respect to the amplitude or spectral characteristics of the noise. The only assumptions made include use of a perfect VAD, sufficiently accurate H1(z) and H2(z), and that when one of H1(z) and H2(z) is being calculated the other does not change substantially. In practice these assumptions have proven reasonable.

The noise removal algorithm described herein is easily generalized to include any number of noise sources. FIG. 3 is a block diagram including front-end components 300 of a noise removal algorithm of an embodiment, generalized to n distinct noise sources. These distinct noise sources may be reflections or echoes of one another, but are not so limited. There are several noise sources shown, each with a transfer function, or path, to each microphone. The previously named path H2 has been relabeled as H0, so that labeling noise source 2's path to MIC 1 is more convenient. The outputs of each microphone, when transformed to the z domain, are:

M1(z) = S(z) + N1(z)H1(z) + N2(z)H2(z) + ... + Nn(z)Hn(z)
M2(z) = S(z)H0(z) + N1(z)G1(z) + N2(z)G2(z) + ... + Nn(z)Gn(z).    Eq. 4

When there is no signal (VAD = 0), then (suppressing z for clarity)

M1n = N1H1 + N2H2 + ... + NnHn
M2n = N1G1 + N2G2 + ... + NnGn.    Eq. 5

A new transfer function can now be defined as

H̃1 = M1n/M2n = (N1H1 + N2H2 + ... + NnHn) / (N1G1 + N2G2 + ... + NnGn),    Eq. 6

where H̃1 is analogous to H1(z) above. Thus H̃1 depends only on the noise sources and their respective transfer functions and can be calculated any time there is no signal being transmitted. Once again, the "n" subscripts on the microphone inputs denote only that noise is being detected, while an "s" subscript denotes that only signal is being received by the microphones.

Examining Equation 4 while assuming an absence of noise produces

M1s = S
M2s = SH0.

Thus, H0 can be solved for as before, using any available transfer function calculating algorithm. Mathematically, then,

H0 = M2s/M1s.

Rewriting Equation 4, using H̃1 defined in Equation 6, provides

H̃1 = (M1 - S)/(M2 - SH0).    Eq. 7

Solving for S yields

S = (M1 - M2H̃1)/(1 - H0H̃1),    Eq. 8

which is the same as Equation 3, with H0 taking the place of H2, and H̃1 taking the place of H1. Thus the noise removal algorithm is still mathematically valid for any number of noise sources, including multiple echoes of noise sources. Again, if H0 and H̃1 can be estimated to a high enough accuracy, and the above assumption of only one path from the signal to the microphones holds, the noise may be removed completely.

The most general case involves multiple noise sources and multiple signal sources. FIG. 4 is a block diagram including front-end components 400 of a noise removal algorithm of an embodiment in the most general case where there are n distinct noise sources and signal reflections. Here, signal reflections enter both microphones MIC 1 and MIC 2. This is the most general case, as reflections of the noise sources into the microphones MIC 1 and MIC 2 can be modeled accurately as simple additional noise sources. For clarity, the direct path from the signal to MIC 2 is changed from H0(z) to H00(z), and the reflected paths to MIC 1 and MIC 2 are denoted by H01(z) and H02(z), respectively.

The input into the microphones now becomes

M1(z) = S(z) + S(z)H01(z) + N1(z)H1(z) + N2(z)H2(z) + ... + Nn(z)Hn(z)
M2(z) = S(z)H00(z) + S(z)H02(z) + N1(z)G1(z) + N2(z)G2(z) + ... + Nn(z)Gn(z).    Eq. 9

When the VAD = 0, the inputs become (suppressing z again)

M1n = N1H1 + N2H2 + ... + NnHn
M2n = N1G1 + N2G2 + ... + NnGn,

which is the same as Equation 5. Thus, the calculation of H̃1 in Equation 6 is unchanged, as expected. In examining the situation where there is no noise, Equation 9 reduces to
`
`

`

M1s = S + SH01
M2s = SH00 + SH02.

This leads to the definition of H̃2 as

H̃2 = M2s/M1s = (H00 + H02)/(1 + H01).    Eq. 10

Rewriting Equation 9 again using the definition for H̃1 (as in Equation 7) provides

H̃1 = [M1 - S(1 + H01)] / [M2 - S(H00 + H02)].    Eq. 11

Some algebraic manipulation yields

S(1 + H01)[1 - H̃1(H00 + H02)/(1 + H01)] = M1 - M2H̃1

S(1 + H01)[1 - H̃1H̃2] = M1 - M2H̃1,

and finally

S(1 + H01) = (M1 - M2H̃1)/(1 - H̃1H̃2).    Eq. 12

Equation 12 is the same as Equation 8, with the replacement of H0 by H̃2, and the addition of the (1 + H01) factor on the left side. This extra factor (1 + H01) means that S cannot be solved for directly in this situation, but a solution can be generated for the signal plus the addition of all of its echoes. This is not such a bad situation, as there are many conventional methods for dealing with echo suppression, and even if the echoes are not suppressed, it is unlikely that they will affect the comprehensibility of the speech to any meaningful extent. The more complex calculation of H̃2 is needed to account for the signal echoes in MIC 2, which act as noise sources.

FIG. 5 is a flow diagram 500 of a denoising algorithm, under an embodiment. In operation, the acoustic signals are received, at block 502. Further, physiological information associated with human voicing activity is received, at block 504. A first transfer function representative of the acoustic signal is calculated upon determining that voicing information is absent from the acoustic signal for at least one specified period of time, at block 506. A second transfer function representative of the acoustic signal is calculated upon determining that voicing information is present in the acoustic signal for at least one specified period of time, at block 508. Noise is removed from the acoustic signal using at least one combination of the first transfer function and the second transfer function, producing denoised acoustic data streams, at block 510.

An algorithm for noise removal, or denoising algorithm, is described herein, from the simplest case of a single noise source with a direct path to multiple noise sources with reflections and echoes. The algorithm has been shown herein to be viable under any environmental conditions. The type and amount of noise are inconsequential if a good estimate has been made of H̃1 and H̃2, and if one does not change substantially while the other is calculated. If the user environment is such that echoes are present, they can be compensated for if coming from a noise source. If signal echoes are also present, they will affect the cleaned signal, but the effect should be negligible in most environments.

In operation, the algorithm of an embodiment has shown excellent results in dealing with a variety of noise types, amplitudes, and orientations. However, there are always approximations and adjustments that have to be made when moving from mathematical concepts to engineering applications. One assumption is made in Equation 3, where H2(z) is assumed small and therefore H2(z)H1(z) ≈ 0, so that Equation 3 reduces to

S(z) ≈ M1(z) - M2(z)H1(z).

This means that only H1(z) has to be calculated, speeding up the process and reducing the number of computations required considerably. With the proper selection of microphones, this approximation is easily realized.

Another approximation involves the filter used in an embodiment. The actual H1(z) will undoubtedly have both poles and zeros, but for stability and simplicity an all-zero Finite Impulse Response (FIR) filter is used. With enough taps the approximation to the actual H1(z) can be very good.

To further increase the performance of the noise suppression system, the spectrum of interest (generally about 125 to 3700 Hz) is divided into subbands. The wider the range of frequencies over which a transfer function must be calculated, the more difficult it is to calculate it accurately. Therefore the acoustic data was divided into 16 subbands, and the denoising algorithm was then applied to each subband in turn. Finally, the 16 denoised data streams were recombined to yield the denoised acoustic data. This works very well, but any combination of subbands (i.e., 4, 6, 8, 32, equally spaced, perceptually spaced, etc.) can be used and all have been found to work better than a single subband.

The amplitude of the noise was constrained in an embodiment so that the microphones used did not saturate (that is, operate outside a linear response region). It is important that the microphones operate linearly to ensure the best performance. Even with this restriction, very low signal-to-noise ratio (SNR) signals can be denoised (down to -10 dB or less).

The calculation of H1(z) is accomplished every 10 milliseconds using the Least-Mean Squares (LMS) method, a common adaptive transfer function. An explanation may be found in "Adaptive Signal Processing" (
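Combining the simplifications described above (the noise-only estimate of H1(z) from Equation 2, the all-zero FIR model, the LMS update, and S(z) ≈ M1(z) - M2(z)H1(z)), a minimal time-domain sketch might look like the following. The tap count, step size, and normalized-LMS form are illustrative assumptions; the text specifies only that H1(z) is recalculated every 10 milliseconds by LMS.

```python
import numpy as np

def lms_denoise(m1, m2, vad, taps=32, mu=0.1):
    """Sketch: adapt an all-zero FIR model of H1(z) by (normalized) LMS
    while the VAD reports no speech, freeze it during speech, and output
    s[n] = m1[n] - (h1 * m2)[n], i.e. S(z) ~ M1(z) - M2(z)H1(z)."""
    h1 = np.zeros(taps)                        # FIR approximation of H1(z)
    out = np.zeros(len(m1))
    for n in range(taps - 1, len(m1)):
        x = m2[n - taps + 1:n + 1][::-1]       # newest MIC 2 sample first
        e = m1[n] - np.dot(h1, x)              # denoised output sample
        if not vad[n]:                         # noise only: update H1 (Eq. 2)
            h1 += mu * e * x / (np.dot(x, x) + 1e-9)
        out[n] = e
    return out
```

During voiced samples (vad[n] true) the filter is frozen, which mirrors the hold-H1-constant behavior the text describes for the periods when H2(z) would otherwise be estimated.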
