US007464029B2

(12) United States Patent
Visser et al.

(10) Patent No.: US 7,464,029 B2
(45) Date of Patent: Dec. 9, 2008
(54)  ROBUST SEPARATION OF SPEECH SIGNALS IN A NOISY ENVIRONMENT

(75)  Inventors: Erik Visser, San Diego, CA (US); Jeremy Toman, San Marcos, CA (US); Kwokleung Chan, San Diego, CA (US)

(73)  Assignee: QUALCOMM Incorporated, San Diego, CA (US)

( * ) Notice: Subject to any disclaimer, the term of this patent is extended or adjusted under 35 U.S.C. 154(b) by 246 days.

(21)  Appl. No.: 11/187,504

(22)  Filed: Jul. 22, 2005

(65)  Prior Publication Data
      US 2007/0021958 A1    Jan. 25, 2007

(51)  Int. Cl.
      G10L 19/14   (2006.01)
      G10L 11/06   (2006.01)
      G10L 21/02   (2006.01)
      G10L 15/20   (2006.01)

(52)  U.S. Cl. ....................... 704/210; 704/215; 704/228; 704/233

(58)  Field of Classification Search ....................... None
      See application file for complete search history.

(56)  References Cited

      U.S. PATENT DOCUMENTS
      4,649,505 A     3/1987   Zinser, Jr. et al.
      4,912,767 A     3/1990   Chang
      5,208,786 A     5/1993   Weinstein et al.
      5,251,263 A    10/1993   Andrea et al.
      (Continued)

      FOREIGN PATENT DOCUMENTS
      EP    1 006 652 A2       6/2000
      WO    WO 01/27874        4/2001
      WO    WO 2006/012578     2/2006
      WO    WO 2006/028587     3/2006

      OTHER PUBLICATIONS
      Amari, et al. 1996. A new learning algorithm for blind signal separation. In D. Touretzky, M. Mozer, and M. Hasselmo (Eds.), Advances in Neural Information Processing Systems 8 (pp. 757-763). Cambridge: MIT Press.
      (Continued)

Primary Examiner-David R. Hudspeth
Assistant Examiner-Brian L Albertalli
(74)  Attorney, Agent, or Firm-Espartaco Diaz Hidalgo; Timothy F. Loomis; Thomas R. Rouse
(57)    ABSTRACT

A method for improving the quality of a speech signal extracted from a noisy acoustic environment is provided. In one approach, a signal separation process is associated with a voice activity detector. The voice activity detector is a two-channel detector, which enables a particularly robust and accurate detection of voice activity. When speech is detected, the voice activity detector generates a control signal. The control signal is used to activate, adjust, or control signal separation processes or post-processing operations to improve the quality of the resulting speech signal. In another approach, a signal separation process is provided as a learning stage and an output stage. The learning stage aggressively adjusts to current acoustic conditions, and passes coefficients to the output stage. The output stage adapts more slowly, and generates a speech-content signal and a noise-dominant signal. When the learning stage becomes unstable, only the learning stage is reset, allowing the output stage to continue outputting a high quality speech signal.

44 Claims, 13 Drawing Sheets
[Front-page figure: block diagram with transducer inputs (102, 104, 106), a VOICE ACTIVITY DETECTOR 114, and post-processing 107. Legend: Transducer Signal, Speech Signal.]

Sony v. Jawbone
U.S. Patent No. 8,321,213
Sony Ex. 1006
U.S. PATENT DOCUMENTS

5,327,178 A        7/1994   McManigal
5,375,174 A       12/1994   Denenberg
5,383,164 A        1/1995   Sejnowski et al.
5,706,402 A        1/1998   Bell
5,715,321 A        2/1998   Andrea et al.
5,732,143 A        3/1998   Andrea et al.
5,770,841 A        6/1998   Moed et al.
5,999,567 A       12/1999   Torkkola
5,999,956 A       12/1999   Deville
6,002,776 A       12/1999   Bhadkamkar et al.
6,108,415 A        8/2000   Andrea
6,130,949 A *     10/2000   Aoki et al. ................. 381/94.3
6,167,417 A       12/2000   Parra et al.
6,381,570 B2       4/2002   Li et al.
6,424,960 B1       7/2002   Lee et al.
6,526,178 B1       2/2003   Fukuhara
6,549,630 B1 *     4/2003   Bobisuthi .................. 381/94.7
6,606,506 B1       8/2003   Jones
7,099,821 B2       8/2006   Visser et al.
2001/0037195 A1   11/2001   Acero et al.
2002/0110256 A1    8/2002   Watson et al.
2002/0136328 A1    9/2002   Shimizu
2002/0193130 A1   12/2002   Yang et al.
2003/0055735 A1    3/2003   Cameron et al.
2003/0179888 A1 *  9/2003   Burnett et al. ............. 381/71.8
2004/0039464 A1    2/2004   Virolainen et al.
2004/0120540 A1    6/2004   Mullenborn et al.
2004/0136543 A1    7/2004   White et al.
OTHER PUBLICATIONS

Amari, et al. 1997. Stability analysis of learning algorithms for blind source separation. Neural Networks, 10(8):1345-1351.
Bell, et al. 1995. An information-maximization approach to blind separation and blind deconvolution. Neural Computation, 7:1129-1159.
Cardoso, J-F. 1992. Fourth-order cumulant structure forcing. Application to blind array processing. Proc. IEEE SP Workshop on SSAP-92, 136-139.
Comon, P. 1994. Independent component analysis, A new concept? Signal Processing, 36:287-314.
Griffiths, et al. 1982. An alternative approach to linearly constrained adaptive beamforming. IEEE Transactions on Antennas and Propagation, AP-30(1):27-34.
Herault, et al. (1986). Space or time adaptive signal processing by neural network models. Neural Networks for Computing, In J. S. Denker (Ed.), Proc. of the AIP Conference (pp. 206-211). New York: American Institute of Physics.
Hoshuyama, et al. 1999. A robust adaptive beamformer for microphone arrays with a blocking matrix using constrained adaptive filters. IEEE Transactions on Signal Processing, 47(10):2677-2684.
Hyvarinen, et al. 1997. A fast fixed-point algorithm for independent component analysis. Neural Computation, 9:1483-1492.
Hyvarinen, A. 1999. Fast and robust fixed-point algorithms for independent component analysis. IEEE Trans. on Neural Networks, 10(3):626-634.
Jutten, et al. 1991. Blind separation of sources, Part I: An adaptive algorithm based on neuromimetic architecture. Signal Processing, 24:1-10.
Lambert, R. H. 1996. Multichannel blind deconvolution; FIR matrix algebra and separation of multipath mixtures. Doctoral Dissertation, University of Southern California.
Lee, et al. 1997. A contextual blind separation of delayed and convolved sources. Proceedings of the 1997 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP'97), 2:1199-1202.
Lee, et al. 1998. Combining time-delayed decorrelation and ICA: Towards solving the cocktail party problem. Proceedings of the 1998 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP'98), 2:1249-1252.
Murata, Ikeda. 1998. An On-line Algorithm for Blind Source Separation on Speech Signals. Proc. of 1998 International Symposium on Nonlinear Theory and its Application (NOLTA98), pp. 923-926, Le Regent, Crans-Montana, Switzerland.
Molgedey, et al. 1994. Separation of a mixture of independent signals using time delayed correlations. Physical Review Letters, The American Physical Society, 72(23):3634-3637.
Parra, et al. 2000. Convolutive blind separation of non-stationary sources. IEEE Transactions on Speech and Audio Processing, 8(3):320-327.
Platt, et al. 1992. Networks for the separation of sources that are superimposed and delayed. In J. Moody, S. Hanson, R. Lippmann (Eds.), Advances in Neural Information Processing 4 (pp. 730-737). San Francisco: Morgan-Kaufmann.
Tong, et al. 1991. A necessary and sufficient condition for the blind identification of memoryless systems. Circuits and Systems, IEEE International Symposium, 1:1-4.
Torkkola, K. 1996. Blind separation of convolved sources based on information maximization. Neural Networks for Signal Processing: VI Proceedings of the 1996 IEEE Signal Processing Society Workshop, pp. 423-432.
Torkkola, K. 1997. Blind deconvolution, information maximization and recursive filters. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP'97), 4:3301-3304.
Van Compernolle, et al. 1992. Signal separation in a symmetric adaptive noise canceler by output decorrelation. Acoustics, Speech, and Signal Processing, ICASSP-92, 1992 IEEE International Conference, 4:221-224.
Visser, et al. Blind source separation in mobile environments using a priori knowledge. Acoustics, Speech, and Signal Processing, 2004, Proceedings (ICASSP'04), IEEE International Conference on, vol. 3, May 17-21, 2004, pp. iii-893-896.
Visser, et al. Speech enhancement using blind source separation and two-channel energy based speaker detection. Acoustics, Speech, and Signal Processing, 2003, Proceedings (ICASSP'03), 2003 IEEE International Conference on, vol. 1, Apr. 6-10, 2003, pp. I-884 - I-887.
Yellin, et al. 1996. Multichannel signal separation: Methods and analysis. IEEE Transactions on Signal Processing, 44(1):106-118.
First Examination Report dated Oct. 23, 2006 from Indian Application No. 1571/CHENP/2005.
International Search Report from PCT/US03/39593 dated Apr. 29, 2004.
International Search Report from the EPO, Reference No. P400550, dated Oct. 15, 2007, in regards to European Publication No. EP1570464.
International Preliminary Report on Patentability dated Feb. 1, 2007, with copy of Written Opinion of ISA dated Apr. 19, 2006, for PCT/US2005/026195 filed on Jul. 22, 2005.
International Preliminary Report on Patentability dated Feb. 1, 2007, with copy of Written Opinion of ISA dated Mar. 10, 2006, for PCT/US2005/026196 filed on Jul. 22, 2005.
Office Action dated Oct. 31, 2006 from U.S. Appl. No. 10/537,985, filed Jun. 9, 2005.
Final Office Action dated Apr. 13, 2007 from U.S. Appl. No. 10/537,985, filed Jun. 9, 2005.
Notice of Allowance with Examiner's Amendment dated Jul. 30, 2007 from U.S. Appl. No. 10/537,985, filed Jun. 9, 2005.
Notice of Allowance dated Dec. 12, 2007 from U.S. Appl. No. 10/537,985, filed Jun. 9, 2005.
Office Action dated Mar. 23, 2007 from U.S. Appl. No. 11/463,376, filed Aug. 9, 2006.
Notice of Allowance dated Dec. 12, 2007 from U.S. Appl. No. 11/463,376, filed Aug. 9, 2006.
Office Action dated Dec. 27, 2005 from U.S. Appl. No. 10/897,219, filed Jul. 22, 2004.
Notice of Allowance dated Apr. 10, 2006 from U.S. Appl. No. 10/897,219, filed Jul. 22, 2004.
International Preliminary Report on Patentability dated Jan. 31, 2008, with copy of Written Opinion of ISA dated Aug. 31, 2007, for PCT/US2006/028627 filed on Jul. 21, 2006.
* cited by examiner
U.S. Patent    Dec. 9, 2008    Sheet 1 of 13    US 7,464,029 B2

[FIG. 1: block diagram of system 100. Input transducers (102, 104, 106) feed a SIGNAL SEPARATION PROCESS 108 and a VOICE ACTIVITY DETECTOR; POST PROCESSING 107 and TRANSMISSION 125 follow (121, 123). Legend: Transducer Signal, Speech Signal.]
U.S. Patent    Dec. 9, 2008    Sheet 2 of 13    US 7,464,029 B2

[FIG. 2: headset system with blocks LEARNING PROCESS, SIGNAL SEPARATION PROCESS, VOICE ACTIVITY DETECTOR, VOLUME ADJUSTMENT, NOISE ESTIMATION, NOISE REDUCTION, and AGC producing a Speech Signal, followed by TRANSMISSION 193 (reference numerals 177-196). Legend: Transducer Signal, Speech Signal.]
U.S. Patent    Dec. 9, 2008    Sheet 3 of 13    US 7,464,029 B2

[FIG. 3: flowchart 200. 206: position a first microphone closer to the speech source than a second microphone. 207: receive a signal from each of the microphones. 208: monitor a threshold difference and compare energy levels. 209: the first mic signal has a higher energy level than the second mic signal; 212: likely speech. 210: the second mic signal has a higher energy level than the first mic signal; 213: likely noise.]
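The decision logic in FIG. 3 can be sketched as a frame-by-frame energy comparison between the two microphones. This is an illustrative sketch, not the patent's implementation: the function name, the frame length, and the decibel threshold are assumptions chosen for the example.

```python
import numpy as np

def two_channel_vad(mic1, mic2, frame=160, threshold_db=3.0):
    """Two-channel energy-difference voice activity sketch (cf. FIG. 3).

    mic1 is assumed closer to the speech source than mic2, so speech lifts
    mic1's frame energy well above mic2's, while distant noise reaches both
    microphones with similar energy. Returns one boolean per frame:
    True -> likely speech, False -> likely noise.
    """
    decisions = []
    n = min(len(mic1), len(mic2))
    for start in range(0, n - frame + 1, frame):
        e1 = np.sum(mic1[start:start + frame] ** 2) + 1e-12  # close-mic energy
        e2 = np.sum(mic2[start:start + frame] ** 2) + 1e-12  # far-mic energy
        diff_db = 10.0 * np.log10(e1 / e2)                   # energy difference
        decisions.append(diff_db > threshold_db)             # threshold test
    return decisions
```

A separation process or post-processor could use the per-frame booleans as the control signal the abstract describes, e.g. freezing adaptation during frames flagged as noise.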
U.S. Patent    Dec. 9, 2008    Sheet 4 of 13    US 7,464,029 B2

[FIG. 4: flowchart 250. 251: position a first microphone closer to the speech source than a second microphone. 252: a signal separation process generates a noise signal and a speech signal. 253: monitor a threshold difference and compare energy levels. 254: the speech signal has a higher energy level than the noise signal; 257: likely speech. 255: the noise signal has a higher energy level than the speech signal; 258: likely noise.]
U.S. Patent    Dec. 9, 2008    Sheet 5 of 13    US 7,464,029 B2

[FIG. 5: headset 329/330 with a SPEECH SEPARATION PROCESS 332 (Blind Signal Separation, Independent Component Analysis) and TRANSMISSION 335 (BlueTooth, Wired, IEEE 802.11).]
U.S. Patent    Dec. 9, 2008    Sheet 6 of 13    US 7,464,029 B2

[FIG. 6: handset 352 with a SPEECH SEPARATION PROCESS 355, SIDE TONE PROCESSING to a SPEAKER, and TRANSMISSION (reference numerals 354-362).]
U.S. Patent    Dec. 9, 2008    Sheet 7 of 13    US 7,464,029 B2

[FIG. 7: a SPEECH SEPARATION PROCESS produces a Speech Signal and a Noisy Signal; NOISE ESTIMATION, a VOICE ACTIVITY DETECTOR issuing a Control Signal, NOISE REDUCTION, and TRANSMISSION follow (reference numerals 402-420).]
U.S. Patent    Dec. 9, 2008    Sheet 8 of 13    US 7,464,029 B2

[FIG. 8: microphone placement diagrams 451 and 452 (Mic 1).]
U.S. Patent    Dec. 9, 2008    Sheet 9 of 13    US 7,464,029 B2

[FIG. 9: flowchart. 502: position transducers. 504: receive signals having noise and information. 506: process signals into channels (set gain 517; adapt filter coefficients 521; rearrange coefficients 519; apply filters 523). 508: identify channel with both noise and information. 510: process the identified channel to generate an information signal (515: measure noise signal and combination signal).]
U.S. Patent    Dec. 9, 2008    Sheet 10 of 13    US 7,464,029 B2

[FIGS. 10 and 11: signal-flow diagrams 600 with inputs x_j(t).]
U.S. Patent    Dec. 9, 2008    Sheet 11 of 13    US 7,464,029 B2

[FIG. 12: separation process 750 with a LEARNING STAGE 752 (learning-stage FILTER COEFFICIENTS 754), an OUTPUT STAGE 756 (output-stage FILTER COEFFICIENTS 758), a RESET MONITOR 765, and DEFAULT COEFFICIENTS 767.]
U.S. Patent    Dec. 9, 2008    Sheet 12 of 13    US 7,464,029 B2

[FIG. 13: system 800; microphones 801/802 feed a SIGNAL SEPARATION PROCESS; a SCALING MONITOR 814, POST PROCESSING, and TRANSMISSION follow (reference numerals 803-825).]
U.S. Patent    Dec. 9, 2008    Sheet 13 of 13    US 7,464,029 B2

[FIG. 14: flowchart 900. 902: position a first microphone in a different wind direction than a second microphone. 904: monitor microphone signals for a low-frequency wind signature. 906: deactivate or de-emphasize the microphone hit by wind. 908: operate as a single-channel communication process. 911: monitor microphone signals for ending of the low-frequency wind signature. 913: reactivate the microphone and activate two-channel separation and post-processing.]
ROBUST SEPARATION OF SPEECH SIGNALS IN A NOISY ENVIRONMENT

RELATED APPLICATIONS

This application is related to U.S. patent application Ser. No. 10/897,219, filed Jul. 22, 2004 (now U.S. Pat. No. 7,099,821, issued Aug. 29, 2006) and entitled "Separation of Target Acoustic Signals in a Multi-Transducer Arrangement", which is related to a co-pending Patent Cooperation Treaty application number PCT/US03/39593, entitled "System and Method for Speech Processing Using Improved Independent Component Analysis", filed Dec. 11, 2003, which claims priority to U.S. patent application Ser. No. 60/502,253, both of which are incorporated herein by reference.

FIELD OF THE INVENTION

The present invention relates to processes and methods for separating a speech signal from a noisy acoustic environment. More particularly, one example of the present invention provides a blind signal source process for separating a speech signal from a noisy environment.

BACKGROUND

An acoustic environment is often noisy, making it difficult to reliably detect and react to a desired informational signal. For example, a person may desire to communicate with another person using a voice communication channel. The channel may be provided, for example, by a mobile wireless handset, a walkie-talkie, a two-way radio, or other communication device. To improve usability, the person may use a headset or earpiece connected to the communication device. The headset or earpiece often has one or more ear speakers and a microphone. Typically, the microphone extends on a boom toward the person's mouth, to increase the likelihood that the microphone will pick up the sound of the person speaking. When the person speaks, the microphone receives the person's voice signal, and converts it to an electronic signal. The microphone also receives sound signals from various noise sources, and therefore also includes a noise component in the electronic signal. Since the headset may position the microphone several inches from the person's mouth, and the environment may have many uncontrollable noise sources, the resulting electronic signal may have a substantial noise component. Such substantial noise causes an unsatisfactory communication experience, and may cause the communication device to operate in an inefficient manner, thereby increasing battery drain.

In one particular example, a speech signal is generated in a noisy environment, and speech processing methods are used to separate the speech signal from the environmental noise. Such speech signal processing is important in many areas of everyday communication, since noise is almost always present in real-world conditions. Noise is defined as the combination of all signals interfering with or degrading the speech signal of interest. The real world abounds with multiple noise sources, including single point noise sources, which often transgress into multiple sounds resulting in reverberation. Unless separated and isolated from background noise, it is difficult to make reliable and efficient use of the desired speech signal. Background noise may include numerous noise signals generated by the general environment, signals generated by background conversations of other people, as well as reflections and reverberation generated from each of the signals. In communication where users often talk in noisy environments, it is desirable to separate the user's speech signals from background noise. Speech communication mediums, such as cell phones, speakerphones, headsets, cordless telephones, teleconferences, CB radios, walkie-talkies, computer telephony applications, computer and automobile voice command applications and other hands-free applications, intercoms, microphone systems and so forth, can take advantage of speech signal processing to separate the desired speech signals from background noise.
Many methods have been created to separate desired sound signals from background noise signals, including simple filtering processes. Prior art noise filters identify signals with predetermined characteristics as white noise signals, and subtract such signals from the input signals. These methods, while simple and fast enough for real time processing of sound signals, are not easily adaptable to different sound environments, and can result in substantial degradation of the speech signal sought to be resolved. The predetermined assumptions of noise characteristics can be over-inclusive or under-inclusive. As a result, portions of a person's speech may be considered "noise" by these methods and therefore removed from the output speech signals, while portions of background noise such as music or conversation may be considered non-noise by these methods and therefore included in the output speech signals.
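The fixed-characteristic subtraction criticized above can be illustrated with a minimal spectral-subtraction sketch. Everything here is an illustrative assumption rather than a method from the patent: the function name, the frame and hop sizes, and the precomputed noise_estimate, which stands in for the "predetermined noise characteristics" and is exactly what an adaptive system would instead keep updating.

```python
import numpy as np

def spectral_subtract(noisy, noise_estimate, frame=256, floor=0.01):
    """Subtract a fixed noise magnitude spectrum from each frame.

    noise_estimate: precomputed magnitude spectrum of length frame//2 + 1.
    Because it never adapts, speech energy that resembles the template is
    removed and noise that does not is kept, the failure mode the text notes.
    """
    out = np.zeros_like(noisy, dtype=float)
    window = np.hanning(frame)
    for start in range(0, len(noisy) - frame, frame // 2):
        seg = noisy[start:start + frame] * window
        spec = np.fft.rfft(seg)
        mag = np.abs(spec) - noise_estimate            # fixed subtraction
        mag = np.maximum(mag, floor * np.abs(spec))    # spectral floor
        clean = np.fft.irfft(mag * np.exp(1j * np.angle(spec)), frame)
        out[start:start + frame] += clean              # overlap-add
    return out
```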
In signal processing applications, typically one or more input signals are acquired using a transducer sensor, such as a microphone. The signals provided by the sensors are mixtures of many sources. Generally, the signal sources as well as their mixture characteristics are unknown. Without knowledge of the signal sources other than the general statistical assumption of source independence, this signal processing problem is known in the art as the "blind source separation (BSS) problem". The blind separation problem is encountered in many familiar forms. For instance, it is well known that a human can focus attention on a single source of sound even in an environment that contains many such sources, a phenomenon commonly referred to as the "cocktail-party effect." Each of the source signals is delayed and attenuated in some time varying manner during transmission from source to microphone, where it is then mixed with other independently delayed and attenuated source signals, including multipath versions of itself (reverberation), which are delayed versions arriving from different directions. A person receiving all these acoustic signals may be able to listen to a particular set of sound sources while filtering out or ignoring other interfering sources, including multi-path signals.
Considerable effort has been devoted in the prior art to solve the cocktail-party effect, both in physical devices and in computational simulations of such devices. Various noise mitigation techniques are currently employed, ranging from simple elimination of a signal prior to analysis to schemes for adaptive estimation of the noise spectrum that depend on a correct discrimination between speech and non-speech signals. A description of these techniques is generally characterized in U.S. Pat. No. 6,002,776 (herein incorporated by reference). In particular, U.S. Pat. No. 6,002,776 describes a scheme to separate source signals where two or more microphones are mounted in an environment that contains an equal or lesser number of distinct sound sources. Using direction-of-arrival information, a first module attempts to extract the original source signals while any residual crosstalk between the channels is removed by a second module. Such an arrangement may be effective in separating spatially localized point sources with clearly defined direction-of-arrival but fails to separate out a speech signal in a real-world spatially distributed noise environment for which no particular direction-of-arrival can be determined.
Methods, such as Independent Component Analysis ("ICA"), provide relatively accurate and flexible means for the separation of speech signals from noise sources. ICA is a technique for separating mixed source signals (components) which are presumably independent from each other. In its simplified form, independent component analysis operates an "un-mixing" matrix of weights on the mixed signals, for example multiplying the matrix with the mixed signals, to produce separated signals. The weights are assigned initial values, and then adjusted to maximize joint entropy of the signals in order to minimize information redundancy. This weight-adjusting and entropy-increasing process is repeated until the information redundancy of the signals is reduced to a minimum. Because this technique does not require information on the source of each signal, it is known as a "blind source separation" method. Blind separation problems refer to the idea of separating mixed signals that come from multiple independent sources.
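The weight-adjustment loop just described can be sketched with the natural-gradient form of the Infomax rule, following the Bell & Sejnowski (1995) and Amari et al. (1996) papers this patent cites. The sketch is illustrative, not the patented algorithm: the function name, learning rate, iteration count, and the tanh score function (a common choice for super-Gaussian, speech-like sources) are all assumptions.

```python
import numpy as np

def ica_unmix(X, lr=0.05, iters=1000):
    """Natural-gradient Infomax ICA sketch.

    X: array of shape (n_sources, n_samples) holding the mixed signals.
    Returns an un-mixing matrix W such that W @ X approximates the
    independent sources (up to scaling and permutation).
    """
    n, T = X.shape
    W = np.eye(n)                                  # initial weight values
    for _ in range(iters):
        Y = W @ X                                  # current source estimates
        # joint-entropy ascent in natural-gradient form:
        # dW = (I - E[tanh(y) y^T]) W
        grad = (np.eye(n) - np.tanh(Y) @ Y.T / T) @ W
        W += lr * grad                             # adjust the weights
    return W
```

At the fixed point the nonlinear decorrelation term E[tanh(y_i) y_j] vanishes for i != j, which is one concrete reading of "information redundancy reduced to a minimum".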
Many popular ICA algorithms have been developed to optimize their performance, including a number which have evolved by significant modifications of those which only existed a decade ago. For example, the work described in A. J. Bell and T. J. Sejnowski, Neural Computation 7:1129-1159 (1995), and Bell, A. J., U.S. Pat. No. 5,706,402, is usually not used in its patented form. Instead, in order to optimize its performance, this algorithm has gone through several recharacterizations by a number of different entities. One such change includes the use of the "natural gradient", described in Amari, Cichocki, Yang (1996). Other popular ICA algorithms include methods that compute higher-order statistics such as cumulants (Cardoso, 1992; Comon, 1994; Hyvaerinen and Oja, 1997).
However, many known ICA algorithms are not able to effectively separate signals that have been recorded in a real environment which inherently include acoustic echoes, such as those due to room architecture related reflections. It is emphasized that the methods mentioned so far are restricted to the separation of signals resulting from a linear stationary mixture of source signals. The phenomenon resulting from the summing of direct path signals and their echoic counterparts is termed reverberation and poses a major issue in artificial speech enhancement and recognition systems. ICA algorithms may require long filters which can separate those time-delayed and echoed signals, thus precluding effective real time use.
Known ICA signal separation systems typically use a network of filters, acting as a neural network, to resolve individual signals from any number of mixed signals input into the filter network. That is, the ICA network is used to separate a set of sound signals into a more ordered set of signals, where each signal represents a particular sound source. For example, if an ICA network receives a sound signal comprising piano music and a person speaking, a two-port ICA network will separate the sound into two signals: one signal having mostly piano music, and another signal having mostly speech.
Another prior technique is to separate sound based on auditory scene analysis. In this analysis, vigorous use is made of assumptions regarding the nature of the sources present. It is assumed that a sound can be decomposed into small elements such as tones and bursts, which in turn can be grouped according to attributes such as harmonicity and continuity in time. Auditory scene analysis can be performed using information from a single microphone or from several microphones. The field of auditory scene analysis has gained more attention due to the availability of computational machine learning approaches leading to computational auditory scene analysis, or CASA. Although interesting scientifically, since it involves the understanding of human auditory processing, the model assumptions and the computational techniques are still in their infancy and cannot yet solve a realistic cocktail party scenario.
Other techniques for separating sounds operate by exploiting the spatial separation of their sources. Devices based on this principle vary in complexity. The simplest such devices are microphones that have highly selective, but fixed, patterns of sensitivity. A directional microphone, for example, is designed to have maximum sensitivity to sounds emanating from a particular direction, and can therefore be used to enhance one audio source relative to others. Similarly, a close-talking microphone mounted near a speaker's mouth may reject some distant sources. Microphone-array processing techniques are then used to separate sources by exploiting perceived spatial separation. These techniques are not practical because sufficient suppression of a competing sound source cannot be achieved due to their assumption that at least one microphone contains only the desired signal, which is not practical in an acoustic environment.
A widely known technique for linear microphone-array processing is often referred to as "beamforming". In this method the time difference between signals, due to the spatial separation of the microphones, is used to enhance the signal. More particularly, it is likely that one of the microphones will "look" more directly at the speech source, whereas the other microphone may generate a signal that is relatively attenuated. Although some attenuation can be achieved, the beamformer cannot provide relative attenuation of frequency components whose wavelengths are larger than the array. These techniques are methods for spatial filtering that steer a beam towards a sound source and therefore place a null at the other directions. Beamforming techniques make no assumption on the sound source, but assume that the geometry between source and sensors, or the sound signal itself, is known for the purpose of dereverberating the signal or localizing the sound source.
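The time-alignment idea behind the beamformer described above can be sketched as a delay-and-sum combiner. The function name, the use of integer sample delays, and the two-microphone test setup are illustrative assumptions; practical beamformers use fractional delays and many more sensors.

```python
import numpy as np

def delay_and_sum(mics, delays):
    """Delay-and-sum beamformer sketch.

    mics: list of equal-length microphone signals.
    delays: per-microphone steering delays in whole samples.
    Each signal is advanced by its steering delay so the desired source
    adds coherently across microphones, while off-axis sounds add
    incoherently and are attenuated by the averaging.
    """
    acc = np.zeros(len(mics[0]))
    for x, d in zip(mics, delays):
        acc += np.roll(x, -d)        # time-align the desired source
    return acc / len(mics)           # coherent average
```

With M microphones and uncorrelated sensor noise, the averaging reduces the noise power by roughly a factor of M while leaving the aligned source intact, which is the enhancement the paragraph describes.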
A known technique in robust adaptive beamforming referred to as "Generalized Sidelobe Canceling" (GSC) is discussed in Hoshuyama, O., Sugiyama, A., Hirano, A., A Robust Adaptive Beamformer for Microphone Arrays with a Blocking Matrix using Constrained Adaptive Filters, IEEE Transactions on Signal Processing, vol. 47, no. 10, pp. 2677-2684, October 1999. GSC aims at filtering out a single desired source signal z_i from a set of measurements x, as more fully explained in The GSC principle, Griffiths, L. J., Jim, C. W., An alternative approach to linearly constrained adaptive beamforming, IEEE Transactions on Antennas and Propagation, vol. 30, no. 1, pp. 27-34, January 1982. Generally, GSC predefines that a signal-independent beamformer c filters the sensor signals so that the direct path from the desired source remains undistorted whereas, ideally, other directions should be suppressed. Most often, the position of the desired source must be pre-determined by additional localization methods. In the lower, side path, an adaptive blocking matrix B aims at suppressing all components originating from the desired signal z_i so that only noise components appear at the output of B. From these, an adaptive interference canceller a derives an estimate for the remaining noise component in the output of c, by minimizing an estimate of the total output power E(z_i*z_i). Thus the fixed beamformer c and the interference canceller a jointly perform interference suppression. Since GSC requires the desired speaker to be confined to a limited tracking region, its applicability is limited to spatially rigid scenarios.
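The GSC structure just described (fixed beamformer c, blocking path B, adaptive canceller a minimizing output power) can be sketched for two microphones. This is a minimal illustration under stated assumptions, not the cited papers' constrained-filter design: the desired source is assumed to arrive with equal delay at both microphones (so the mic difference blocks it), and the NLMS step size, tap count, and regularization constant are invented for the example.

```python
import numpy as np

def gsc(x1, x2, mu=0.05, taps=8):
    """Two-microphone Generalized Sidelobe Canceller sketch.

    Fixed beamformer c: average of the mics (desired source broadside).
    Blocking path B: mic difference, which cancels the desired signal and
    passes only noise. An NLMS filter `a` then estimates the residual noise
    in the beamformer output from the blocking signal and subtracts it,
    driving the output power toward a minimum.
    """
    d = 0.5 * (x1 + x2)                      # fixed beamformer output c
    b = x1 - x2                              # blocking-path noise reference
    a = np.zeros(taps)                       # adaptive interference canceller
    out = np.zeros_like(d)
    for n in range(taps, len(d)):
        u = b[n - taps:n][::-1]              # recent noise-reference samples
        e = d[n] - a @ u                     # beamformer minus noise estimate
        a += mu * e * u / (u @ u + 1e-8)     # NLMS step (output-power descent)
        out[n] = e
    return out
```

Because the desired signal is absent from the blocking path, minimizing the output power removes only the noise that the two branches share, which is the joint suppression role the paragraph assigns to c and a.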
`
Another known technique is a class of active-cancellation algorithms, which is related to sound separation. However, this technique requires a "reference signal," i.e., a signal derived from only one of the sources. Active noise-cancellation and echo cancellation techniques make extensive use of this technique, and the noise reduction is relative to the contribution of noise to a mixture: a known signal that contains only the noise is filtered and subtracted from the mixture. This method assumes that one of the measured signals consists of one and only one source, an assumption which is not realistic in many real life settings.
Techniques for active cancellation that do not require a reference signal are called "blind" and are of primary interest in this application. They are now classified, based on the degree of realism of the underlying assumptions regarding the acoustic processes by which the unwanted signals reach the microphones. One class of blind active-cancellation techniques may be called "gain-based", also known as "instantaneous mixing": it is presumed that the waveform produced by each source is received by the microphones simultaneously, but with varying relative gains. (Directional microphones are most often used to produce the required differences in gain.) Thus, a gain-based system attempts to cancel copies of an undesired source in different microphone signals by applying relative gains to the microphone signals and subtracting, but not applying time delays or other filtering. Numerous gain-based methods for blind active cancellation have been proposed; see Herault and Jutten (1986), Tong et al. (1991), and Molgedey and Schuster (1994). The gain-based or instantaneous mixing assumption is violated when microphones are separated in space, as in most acoustic applications. A simple extension of this method is to include a time delay factor but without any other filtering, which will work under anechoic conditions. However, this simple model of acoustic propagation from the sources to the microphones is of limited use when echoes and reverberation are present. The most realistic active-cancellation techniques currently known are "convolutive": the effect of acoustic propagation from each source to each microphone is modeled as a convolutive filter. These techniques are more realistic than gain-based and delay-based techniques because they explicitly accommodate the effects of inter-microphone separation, echoes and reverberation. They are also more general since, in principle, gains and
