US007464029B2

(12) United States Patent                         (10) Patent No.:     US 7,464,029 B2
     Visser et al.                                (45) Date of Patent: Dec. 9, 2008

(54) ROBUST SEPARATION OF SPEECH SIGNALS              WO    WO 2006/012578    2/2006
     IN A NOISY ENVIRONMENT                           WO    WO 2006/028587    3/2006
`
(75) Inventors: Erik Visser, San Diego, CA (US);
                Jeremy Toman, San Marcos, CA (US);
                Kwokleung Chan, San Diego, CA (US)

(73) Assignee:  QUALCOMM Incorporated, San Diego, CA (US)

(*)  Notice:    Subject to any disclaimer, the term of this
                patent is extended or adjusted under 35
                U.S.C. 154(b) by 246 days.

(21) Appl. No.: 11/187,504

(22) Filed:     Jul. 22, 2005
`
(65)                Prior Publication Data

     US 2007/0021958 A1          Jan. 25, 2007

(51) Int. Cl.
     G10L 19/14    (2006.01)
     G10L 11/06    (2006.01)
     G10L 21/02    (2006.01)
     G10L 15/20    (2006.01)
(52) U.S. Cl. ................. 704/210; 704/215; 704/228; 704/233
(58) Field of Classification Search .................... None
     See application file for complete search history.
(56)                References Cited

            U.S. PATENT DOCUMENTS

     4,649,505 A       3/1987  Zinser, Jr. et al.
     4,912,767 A       3/1990  Chang
     5,208,786 A       5/1993  Weinstein et al.
     5,251,263 A      10/1993  Andrea et al.

                    (Continued)

          FOREIGN PATENT DOCUMENTS

     EP    1 006 652 A2     6/2000
     WO    WO 01/27874      4/2001
`
                OTHER PUBLICATIONS

Amari, et al. 1996. A new learning algorithm for blind signal separation. In D. Touretzky, M. Mozer, and M. Hasselmo (Eds.), Advances in Neural Information Processing Systems 8 (pp. 757-763). Cambridge: MIT Press.

                    (Continued)

Primary Examiner—David R. Hudspeth
Assistant Examiner—Brian L Albertalli
(74) Attorney, Agent, or Firm—Espartaco Diaz Hidalgo;
Timothy F. Loomis; Thomas R. Rouse
`
(57)                    ABSTRACT

A method for improving the quality of a speech signal extracted from a noisy acoustic environment is provided. In one approach, a signal separation process is associated with a voice activity detector. The voice activity detector is a two-channel detector, which enables a particularly robust and accurate detection of voice activity. When speech is detected, the voice activity detector generates a control signal. The control signal is used to activate, adjust, or control signal separation processes or post-processing operations to improve the quality of the resulting speech signal. In another approach, a signal separation process is provided as a learning stage and an output stage. The learning stage aggressively adjusts to current acoustic conditions, and passes coefficients to the output stage. The output stage adapts more slowly, and generates a speech-content signal and a noise-dominant signal. When the learning stage becomes unstable, only the learning stage is reset, allowing the output stage to continue outputting a high quality speech signal.

44 Claims, 13 Drawing Sheets
`
[Representative drawing — FIG. 1: transducer and speech signals (105) feed a signal separation process; a voice activity detector controls post processing (107) ahead of transmission (125).]

Amazon v. Jawbone
U.S. Patent 8,321,213
Amazon Ex. 1006

`

            U.S. PATENT DOCUMENTS

     5,327,178 A        7/1994  McManigal
     5,375,174 A       12/1994  Denenberg
     5,383,164 A        1/1995  Sejnowski et al.
     5,706,402 A        1/1998  Bell
     5,715,321 A        2/1998  Andrea et al.
     5,732,143 A        3/1998  Andrea et al.
     5,770,841 A        6/1998  Moed et al.
     5,999,567 A       12/1999  Torkkola
     5,999,956 A       12/1999  Deville
     6,002,776 A       12/1999  Bhadkamkar et al.
     6,108,415 A        8/2000  Andrea
     6,130,949 A *     10/2000  Aoki et al. ............... 381/94.3
     6,167,417 A       12/2000  Parra et al.
     6,381,570 B2       4/2002  Li et al.
     6,424,960 B1       7/2002  Lee et al.
     6,526,178 B1       2/2003  Fukuhara
     6,549,630 B1 *     4/2003  Bobisuthi ................. 381/94.7
     6,606,506 B1       8/2003  Jones
     7,099,821 B2       8/2006  Visser et al.
  2001/0037195 A1      11/2001  Acero et al.
  2002/0110256 A1       8/2002  Watson et al.
  2002/0136328 A1       9/2002  Shimizu
  2002/0193130 A1      12/2002  Yang et al.
  2003/0055735 A1       3/2003  Cameron et al.
  2003/0179888 A1 *     9/2003  Burnett et al. ............ 381/71.8
  2004/0039464 A1       2/2004  Virolainen et al.
  2004/0120540 A1       6/2004  Mullenborn et al.
  2004/0136543 A1       7/2004  White et al.
                OTHER PUBLICATIONS

Amari, et al. 1997. Stability analysis of learning algorithms for blind source separation. Neural Networks, 10(8):1345-1351.
Bell, et al. 1995. An information-maximization approach to blind separation and blind deconvolution. Neural Computation, 7:1129-1159.
Cardoso, J-F. 1992. Fourth-order cumulant structure forcing. Application to blind array processing. Proc. IEEE SP Workshop on SSAP-92, 136-139.
Comon, P. 1994. Independent component analysis, A new concept? Signal Processing, 36:287-314.
Griffiths, et al. 1982. An alternative approach to linearly constrained adaptive beamforming. IEEE Transactions on Antennas and Propagation, AP-30(1):27-34.
Herault, et al. 1986. Space or time adaptive signal processing by neural network models. Neural Networks for Computing, In J. S. Denker (Ed.), Proc. of the AIP Conference (pp. 206-211). New York: American Institute of Physics.
Hoshuyama, et al. 1999. A robust adaptive beamformer for microphone arrays with a blocking matrix using constrained adaptive filters. IEEE Transactions on Signal Processing, 47(10):2677-2684.
Hyvarinen, et al. 1997. A fast fixed-point algorithm for independent component analysis. Neural Computation, 9:1483-1492.
Hyvarinen, A. 1999. Fast and robust fixed-point algorithms for independent component analysis. IEEE Trans. on Neural Networks, 10(3):626-634.
Jutten, et al. 1991. Blind separation of sources, Part I: An adaptive algorithm based on neuromimetic architecture. Signal Processing, 24:1-10.
Lambert, R. H. 1996. Multichannel blind deconvolution: FIR matrix algebra and separation of multipath mixtures. Doctoral Dissertation, University of Southern California.
Lee, et al. 1997. A contextual blind separation of delayed and convolved sources. Proceedings of the 1997 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '97), 2:1199-1202.
Lee, et al. 1998. Combining time-delayed decorrelation and ICA: Towards solving the cocktail party problem. Proceedings of the 1998 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '98), 2:1249-1252.
Murata and Ikeda. 1998. An on-line algorithm for blind source separation on speech signals. Proc. of 1998 International Symposium on Nonlinear Theory and its Application (NOLTA '98), pp. 923-926, Le Regent, Crans-Montana, Switzerland.
Molgedey, et al. 1994. Separation of a mixture of independent signals using time delayed correlations. Physical Review Letters, The American Physical Society, 72(23):3634-3637.
Parra, et al. 2000. Convolutive blind separation of non-stationary sources. IEEE Transactions on Speech and Audio Processing, 8(3):320-327.
Platt, et al. 1992. Networks for the separation of sources that are superimposed and delayed. In J. Moody, S. Hanson, R. Lippmann (Eds.), Advances in Neural Information Processing 4 (pp. 730-737). San Francisco: Morgan-Kaufmann.
Tong, et al. 1991. A necessary and sufficient condition for the blind identification of memoryless systems. Circuits and Systems, IEEE International Symposium, 1:1-4.
Torkkola, K. 1996. Blind separation of convolved sources based on information maximization. Neural Networks for Signal Processing: VI. Proceedings of the 1996 IEEE Signal Processing Society Workshop, pp. 423-432.
Torkkola, K. 1997. Blind deconvolution, information maximization and recursive filters. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '97), 4:3301-3304.
Van Compernolle, et al. 1992. Signal separation in a symmetric adaptive noise canceler by output decorrelation. Acoustics, Speech, and Signal Processing, 1992. ICASSP-92, 1992 IEEE International Conference, 4:221-224.
Visser, et al. 2004. Blind source separation in mobile environments using a priori knowledge. Acoustics, Speech, and Signal Processing, 2004. Proceedings (ICASSP '04). IEEE International Conference on, vol. 3, May 17-21, 2004, pp. III-893-896.
Visser, et al. 2003. Speech enhancement using blind source separation and two-channel energy based speaker detection. Acoustics, Speech, and Signal Processing, 2003. Proceedings (ICASSP '03). 2003 IEEE International Conference on, vol. 1, Apr. 6-10, 2003, pp. I-884-I-887.
Yellin, et al. 1996. Multichannel signal separation: Methods and analysis. IEEE Transactions on Signal Processing, 44(1):106-118.
First Examination Report dated Oct. 23, 2006 from Indian Application No. 1571/CHENP/2005.
International Search Report from PCT/US03/39593 dated Apr. 29, 2004.
International Search Report from the EPO, Reference No. P400550, dated Oct. 15, 2007, in regards to European Publication No. EP1570464.
International Preliminary Report on Patentability dated Feb. 1, 2007, with copy of Written Opinion of ISA dated Apr. 19, 2006, for PCT/US2005/026195 filed on Jul. 22, 2005.
International Preliminary Report on Patentability dated Feb. 1, 2007, with copy of Written Opinion of ISA dated Mar. 10, 2006, for PCT/US2005/026196 filed on Jul. 22, 2005.
Office Action dated Oct. 31, 2006 from U.S. Appl. No. 10/537,985, filed Jun. 9, 2005.
Final Office Action dated Apr. 13, 2007 from U.S. Appl. No. 10/537,985, filed Jun. 9, 2005.
Notice of Allowance with Examiner's Amendment dated Jul. 30, 2007 from U.S. Appl. No. 10/537,985, filed Jun. 9, 2005.
Notice of Allowance dated Dec. 12, 2007 from U.S. Appl. No. 10/537,985, filed Jun. 9, 2005.
Office Action dated Mar. 23, 2007 from U.S. Appl. No. 11/463,376, filed Aug. 9, 2006.
Notice of Allowance dated Dec. 12, 2007 from U.S. Appl. No. 11/463,376, filed Aug. 9, 2006.
Office Action dated Dec. 27, 2005 from U.S. Appl. No. 10/897,219, filed Jul. 22, 2004.
Notice of Allowance dated Apr. 10, 2006 from U.S. Appl. No. 10/897,219, filed Jul. 22, 2004.
International Preliminary Report on Patentability dated Jan. 31, 2008, with copy of Written Opinion of ISA dated Aug. 31, 2007, for PCT/US2006/028627 filed on Jul. 21, 2006.

* cited by examiner
U.S. Patent        Dec. 9, 2008        Sheet 1 of 13        US 7,464,029 B2

[FIG. 1 — block diagram: transducer and speech signals (105, 106) feed a signal separation process (108); a voice activity detector monitors the channels, and the separated outputs (110, 112) pass through post processing (107) on the way to transmission (123, 125).]
[Sheet 2 of 13 — FIG. 2: a signal separation process coupled to a voice activity detector; the detector's output (dashed control lines) drives noise reduction and post processing (items 178, 180, 181, 191) ahead of transmission.]
[Sheet 3 of 13 — FIG. 3: flowchart (steps 207-213). Position a first microphone closer to the speech source than a second microphone; receive a signal from each of the microphones; monitor a threshold difference and compare energy levels. When the first microphone signal has the higher energy level, speech is likely; when the second microphone signal has the higher energy level, noise is likely.]
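The two-channel comparison of FIG. 3 can be sketched in a few lines. This is a minimal illustration only; the mean-square frame-energy measure, the 3 dB threshold, and the function names are assumptions for the sketch, not values specified by the patent:

```python
import math

def frame_energy(frame):
    """Mean squared amplitude of one frame of samples."""
    return sum(x * x for x in frame) / len(frame)

def vad_decision(mic1_frame, mic2_frame, threshold_db=3.0):
    """Two-channel energy comparison in the style of FIG. 3: mic1 is
    assumed closer to the speech source, so frames where its energy
    exceeds mic2's by a threshold are likely speech."""
    e1 = frame_energy(mic1_frame)
    e2 = frame_energy(mic2_frame)
    # Small floor avoids log-of-zero on silent frames.
    diff_db = 10.0 * math.log10((e1 + 1e-12) / (e2 + 1e-12))
    if diff_db > threshold_db:
        return "speech"    # first (near) mic notably louder
    if diff_db < -threshold_db:
        return "noise"     # second (far) mic louder
    return "uncertain"     # within the threshold band
```

In a full system the "speech"/"noise" decision would gate the control signal that activates or adjusts the separation and post-processing stages.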
[Sheet 4 of 13 — FIG. 4: flowchart 250 (steps 251-258). Position a first microphone closer to the speech source than a second microphone; a signal separation process generates a noise signal and a speech signal; monitor a threshold difference and compare energy levels. When the speech signal has the higher energy level, speech is likely; when the noise signal has the higher energy level, noise is likely.]
[Sheet 5 of 13 — FIG. 5: a speech separation process (e.g., blind signal separation or independent component analysis) feeds a transmission block (e.g., Bluetooth, wired, or IEEE 802.11); items 327-332.]
[Sheet 6 of 13 — FIG. 6: a speech separation process (352) feeds processing (360) and transmission (356); a speaker (354) provides a side tone; items 351-362.]
[Sheet 7 of 13 — FIG. 7: a speech separation process (405) generates a speech signal and a noisy signal; a voice activity detector (410) issues a control signal (dashed) to noise estimation (413) and noise reduction (415) ahead of transmission (418, 420).]
[Sheet 8 of 13 — FIG. 8: a device (451) carrying two microphones, Mic 1 and Mic 2.]
[Sheet 9 of 13 — FIG. 9: flowchart. Position transducers (502); receive signals having noise and information; process the signals into channels (506), which may include setting gain (517), rearranging coefficients (519), adapting filter coefficients (521), applying filters (523), and detecting the transducer arrangement (518); identify the channel with both noise and information (508), measuring the noise signal and the combination signal (545); process the identified channel to generate an information signal (510).]
[Sheet 10 of 13 — FIGS. 10 and 11: scaling-factor (sc_fact) curves.]
[Sheet 11 of 13 — FIG. 12: two-stage arrangement 750. A reset monitor (765) watches a learning stage (752), whose filter coefficients (760, 762) are passed to an output stage (770) with its own filter coefficients (773); default coefficient sets (787) can reload the learning stage.]
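The learning/output split of FIG. 12 (a fast-adapting learning stage hands coefficients to a slowly adapting output stage, and only the learning stage is reset on instability) can be sketched as follows. The class name, the coefficient bound used as a reset monitor, and the smoothing constant are all illustrative assumptions:

```python
class TwoStageSeparator:
    """Sketch of the FIG. 12 arrangement: the learning stage adapts
    aggressively; the output stage tracks it slowly, so a reset of the
    learning stage does not interrupt the output-stage coefficients."""

    def __init__(self, default_coeffs, max_coeff=10.0, smoothing=0.1):
        self.default_coeffs = list(default_coeffs)
        self.learning_coeffs = list(default_coeffs)
        self.output_coeffs = list(default_coeffs)
        self.max_coeff = max_coeff    # reset-monitor bound (assumed test)
        self.smoothing = smoothing    # output stage adapts slowly

    def update(self, gradient_step):
        # Learning stage adapts aggressively toward current conditions.
        self.learning_coeffs = [c + g for c, g in
                                zip(self.learning_coeffs, gradient_step)]
        # Reset monitor: divergent coefficients reset the learning stage
        # only; the output stage is left untouched.
        if any(abs(c) > self.max_coeff for c in self.learning_coeffs):
            self.learning_coeffs = list(self.default_coeffs)
            return
        # Output stage slowly tracks the learning-stage coefficients.
        self.output_coeffs = [
            (1 - self.smoothing) * o + self.smoothing * c
            for o, c in zip(self.output_coeffs, self.learning_coeffs)]
```

Because the output stage only ever receives smoothed copies of stable coefficients, a reset of the learning stage leaves the output-stage speech estimate intact, matching the behavior described in the abstract.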
[Sheet 12 of 13 — FIG. 13: arrangement 800. The input (801) is scaled (807) before a signal separation process (808) and post processing (810); a scaling monitor (812) feeds back (dashed) to the scale block, and the result (823) goes to transmission (825).]

[Sheet 13 of 13 — FIG. 14: flowchart 900. Position a first microphone facing a different wind direction than a second microphone (902); monitor the microphone signals for a low-frequency wind signature (904); deactivate or de-emphasize the microphone hit by wind (906) and operate as a single-channel communication process; monitor the microphone signals for the ending of the low-frequency wind signature (911); then reactivate the microphone and activate two-channel separation and post processing (913).]
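The wind-signature check of FIG. 14 can be sketched as below. The one-pole low-pass filter and the 0.5 energy-fraction threshold are illustrative assumptions, not the detector the patent specifies:

```python
def low_band_fraction(samples, alpha=0.05):
    """Fraction of signal energy below a rough low-frequency cutoff,
    estimated with a one-pole low-pass filter (alpha sets the cutoff)."""
    lp, lp_energy, total_energy = 0.0, 0.0, 0.0
    for x in samples:
        lp += alpha * (x - lp)        # one-pole low-pass
        lp_energy += lp * lp
        total_energy += x * x
    return lp_energy / total_energy if total_energy else 0.0

def wind_detected(samples, threshold=0.5):
    """Flag a channel whose energy is dominated by low frequencies,
    the characteristic signature of wind buffeting a microphone."""
    return low_band_fraction(samples) > threshold
```

A channel flagged this way would be de-emphasized and the process would fall back to single-channel operation until the signature ends.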
ROBUST SEPARATION OF SPEECH SIGNALS IN A NOISY ENVIRONMENT

                RELATED APPLICATIONS

This application is related to U.S. patent application Ser. No. 10/897,219, filed Jul. 22, 2004 (now U.S. Pat. No. 7,099,821, issued Aug. 29, 2006) and entitled "Separation of Target Acoustic Signals in a Multi-Transducer Arrangement", which is related to a co-pending Patent Cooperation Treaty application number PCT/US03/39593, entitled "System and Method for Speech Processing Using Improved Independent Component Analysis", filed Dec. 11, 2003, which claims priority to U.S. patent application Ser. No. 60/502,253, both of which are incorporated herein by reference.
`
                FIELD OF THE INVENTION

The present invention relates to processes and methods for separating a speech signal from a noisy acoustic environment. More particularly, one example of the present invention provides a blind signal source process for separating a speech signal from a noisy environment.

                    BACKGROUND

An acoustic environment is often noisy, making it difficult to reliably detect and react to a desired informational signal. For example, a person may desire to communicate with another person using a voice communication channel. The channel may be provided, for example, by a mobile wireless handset, a walkie-talkie, a two-way radio, or other communication device. To improve usability, the person may use a headset or earpiece connected to the communication device. The headset or earpiece often has one or more ear speakers and a microphone. Typically, the microphone extends on a boom toward the person's mouth, to increase the likelihood that the microphone will pick up the sound of the person speaking. When the person speaks, the microphone receives the person's voice signal, and converts it to an electronic signal. The microphone also receives sound signals from various noise sources, and therefore also includes a noise component in the electronic signal. Since the headset may position the microphone several inches from the person's mouth, and the environment may have many uncontrollable noise sources, the resulting electronic signal may have a substantial noise component. Such substantial noise causes an unsatisfactory communication experience, and may cause the communication device to operate in an inefficient manner, thereby increasing battery drain.

In one particular example, a speech signal is generated in a noisy environment, and speech processing methods are used to separate the speech signal from the environmental noise. Such speech signal processing is important in many areas of everyday communication, since noise is almost always present in real-world conditions. Noise is defined as the combination of all signals interfering with or degrading the speech signal of interest. The real world abounds with multiple noise sources, including single point noise sources, which often transgress into multiple sounds resulting in reverberation. Unless separated and isolated from background noise, it is difficult to make reliable and efficient use of the desired speech signal. Background noise may include numerous noise signals generated by the general environment, signals generated by background conversations of other people, as well as reflections and reverberation generated from each of the signals. In communication where users often talk in noisy environments, it is desirable to separate the user's speech signals from background noise. Speech communication mediums, such as cell phones, speakerphones, headsets, cordless telephones, teleconferences, CB radios, walkie-talkies, computer telephony applications, computer and automobile voice command applications and other hands-free applications, intercoms, microphone systems and so forth, can take advantage of speech signal processing to separate the desired speech signals from background noise.

Many methods have been created to separate desired sound signals from background noise signals, including simple filtering processes. Prior art noise filters identify signals with predetermined characteristics as white noise signals, and subtract such signals from the input signals. These methods, while simple and fast enough for real time processing of sound signals, are not easily adaptable to different sound environments, and can result in substantial degradation of the speech signal sought to be resolved. The predetermined assumptions of noise characteristics can be over-inclusive or under-inclusive. As a result, portions of a person's speech may be considered "noise" by these methods and therefore removed from the output speech signals, while portions of background noise such as music or conversation may be considered non-noise by these methods and therefore included in the output speech signals.

In signal processing applications, typically one or more input signals are acquired using a transducer sensor, such as a microphone. The signals provided by the sensors are mixtures of many sources. Generally, the signal sources as well as their mixture characteristics are unknown. Without knowledge of the signal sources other than the general statistical assumption of source independence, this signal processing problem is known in the art as the "blind source separation (BSS) problem". The blind separation problem is encountered in many familiar forms. For instance, it is well known that a human can focus attention on a single source of sound even in an environment that contains many such sources, a phenomenon commonly referred to as the "cocktail-party effect." Each of the source signals is delayed and attenuated in some time-varying manner during transmission from source to microphone, where it is then mixed with other independently delayed and attenuated source signals, including multipath versions of itself (reverberation), which are delayed versions arriving from different directions. A person receiving all these acoustic signals may be able to listen to a particular set of sound sources while filtering out or ignoring other interfering sources, including multi-path signals.

Considerable effort has been devoted in the prior art to solving the cocktail-party effect, both in physical devices and in computational simulations of such devices. Various noise mitigation techniques are currently employed, ranging from simple elimination of a signal prior to analysis to schemes for adaptive estimation of the noise spectrum that depend on a correct discrimination between speech and non-speech signals. A description of these techniques is generally characterized in U.S. Pat. No. 6,002,776 (herein incorporated by reference). In particular, U.S. Pat. No. 6,002,776 describes a scheme to separate source signals where two or more microphones are mounted in an environment that contains an equal or lesser number of distinct sound sources. Using direction-of-arrival information, a first module attempts to extract the original source signals while any residual crosstalk between the channels is removed by a second module. Such an arrangement may be effective in separating spatially localized point sources with clearly defined direction-of-arrival but fails to separate out a speech signal in a real-world spatially distributed noise environment for which no particular direction-of-arrival can be determined.
`
Methods such as Independent Component Analysis ("ICA") provide relatively accurate and flexible means for the separation of speech signals from noise sources. ICA is a technique for separating mixed source signals (components) which are presumably independent from each other. In its simplified form, independent component analysis operates an "un-mixing" matrix of weights on the mixed signals, for example multiplying the matrix with the mixed signals, to produce separated signals. The weights are assigned initial values, and then adjusted to maximize joint entropy of the signals in order to minimize information redundancy. This weight-adjusting and entropy-increasing process is repeated until the information redundancy of the signals is reduced to a minimum. Because this technique does not require information on the source of each signal, it is known as a "blind source separation" method. Blind separation problems refer to the idea of separating mixed signals that come from multiple independent sources.

Many popular ICA algorithms have been developed to optimize their performance, including a number which have evolved by significant modifications of those which only existed a decade ago. For example, the work described in A. J. Bell and T. J. Sejnowski, Neural Computation 7:1129-1159 (1995), and Bell, A. J., U.S. Pat. No. 5,706,402, is usually not used in its patented form. Instead, in order to optimize its performance, this algorithm has gone through several recharacterizations by a number of different entities. One such change includes the use of the "natural gradient", described in Amari, Cichocki and Yang (1996). Other popular ICA algorithms include methods that compute higher-order statistics such as cumulants (Cardoso, 1992; Comon, 1994; Hyvaerinen and Oja, 1997).
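The un-mixing matrix and natural-gradient weight update described above can be sketched for a two-channel instantaneous mixture. The tanh nonlinearity, step size, and iteration count are illustrative assumptions for the sketch:

```python
import math
import random

def ica_unmix(mixtures, mu=0.01, iters=200):
    """Natural-gradient Infomax-style update, W <- W + mu*(I - g(y) y^T)*W,
    with g = tanh, on a 2-channel instantaneous mixture. Returns the
    learned 2x2 un-mixing matrix W applied as y = W x."""
    W = [[1.0, 0.0], [0.0, 1.0]]          # initial un-mixing weights
    n = len(mixtures[0])
    for _ in range(iters):
        # Accumulate the natural-gradient statistic over the block.
        E = [[0.0, 0.0], [0.0, 0.0]]
        for t in range(n):
            x = (mixtures[0][t], mixtures[1][t])
            y = [W[i][0] * x[0] + W[i][1] * x[1] for i in range(2)]
            g = [math.tanh(v) for v in y]
            for i in range(2):
                for j in range(2):
                    E[i][j] += ((1.0 if i == j else 0.0) - g[i] * y[j]) / n
        # W <- W + mu * (E W): the natural gradient rescales the update
        # by W itself, avoiding an explicit matrix inversion.
        W = [[W[i][j] + mu * (E[i][0] * W[0][j] + E[i][1] * W[1][j])
              for j in range(2)] for i in range(2)]
    return W
```

The update drives the separated outputs toward statistical independence; real systems use convolutive extensions of this rule rather than the instantaneous form shown here.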
However, many known ICA algorithms are not able to effectively separate signals that have been recorded in a real environment, which inherently includes acoustic echoes such as those due to room architecture related reflections. It is emphasized that the methods mentioned so far are restricted to the separation of signals resulting from a linear stationary mixture of source signals. The phenomenon resulting from the summing of direct path signals and their echoic counterparts is termed reverberation and poses a major issue in artificial speech enhancement and recognition systems. ICA algorithms may require long filters which can separate those time-delayed and echoed signals, thus precluding effective real time use.
`
Known ICA signal separation systems typically use a network of filters, acting as a neural network, to resolve individual signals from any number of mixed signals input into the filter network. That is, the ICA network is used to separate a set of sound signals into a more ordered set of signals, where each signal represents a particular sound source. For example, if an ICA network receives a sound signal comprising piano music and a person speaking, a two-port ICA network will separate the sound into two signals: one signal having mostly piano music, and another signal having mostly speech.
Another prior technique is to separate sound based on auditory scene analysis. In this analysis, vigorous use is made of assumptions regarding the nature of the sources present. It is assumed that a sound can be decomposed into small elements such as tones and bursts, which in turn can be grouped according to attributes such as harmonicity and continuity in time. Auditory scene analysis can be performed using information from a single microphone or from several microphones. The field of auditory scene analysis has gained more attention due to the availability of computational machine learning approaches, leading to computational auditory scene analysis, or CASA. Although interesting scientifically, since it involves the understanding of human auditory processing, the model assumptions and the computational techniques are still in their infancy with respect to solving a realistic cocktail party scenario.
Other techniques for separating sounds operate by exploiting the spatial separation of their sources. Devices based on this principle vary in complexity. The simplest such devices are microphones that have highly selective, but fixed, patterns of sensitivity. A directional microphone, for example, is designed to have maximum sensitivity to sounds emanating from a particular direction, and can therefore be used to enhance one audio source relative to others. Similarly, a close-talking microphone mounted near a speaker's mouth may reject some distant sources. Microphone-array processing techniques are then used to separate sources by exploiting perceived spatial separation. These techniques are not practical because sufficient suppression of a competing sound source cannot be achieved, owing to their assumption that at least one microphone contains only the desired signal, which is not realistic in an acoustic environment.
A widely known technique for linear microphone-array processing is often referred to as "beamforming". In this method the time difference between signals, due to the spatial separation of the microphones, is used to enhance the signal. More particularly, it is likely that one of the microphones will "look" more directly at the speech source, whereas the other microphone may generate a signal that is relatively attenuated. Although some attenuation can be achieved, the beamformer cannot provide relative attenuation of frequency components whose wavelengths are larger than the array. These techniques are methods for spatial filtering that steer a beam towards a sound source and therefore put a null in the other directions. Beamforming techniques make no assumption about the sound source, but assume that the geometry between source and sensors, or the sound signal itself, is known for the purpose of dereverberating the signal or localizing the sound source.
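The basic beamforming idea (align the channels in time, then average so the desired direction adds coherently) can be shown with a minimal delay-and-sum sketch; integer sample delays are a simplifying assumption (practical arrays use fractional-delay filters):

```python
def delay_and_sum(channels, delays):
    """Delay-and-sum beamforming sketch: each microphone signal is
    shifted by an integer sample delay that compensates its extra
    propagation time from the look direction, then the channels are
    averaged so the desired source adds coherently while off-axis
    sound adds incoherently."""
    # Usable length after applying each channel's delay.
    n = min(len(ch) - d for ch, d in zip(channels, delays))
    return [sum(ch[d + t] for ch, d in zip(channels, delays)) / len(channels)
            for t in range(n)]
```

Note the limitation described above: for wavelengths much larger than the array, the aligned and misaligned channels look nearly identical, so little relative attenuation is possible.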
A known technique in robust adaptive beamforming referred to as "Generalized Sidelobe Canceling" (GSC) is discussed in Hoshuyama, O., Sugiyama, A., Hirano, A., A Robust Adaptive Beamformer for Microphone Arrays with a Blocking Matrix using Constrained Adaptive Filters, IEEE Transactions on Signal Processing, vol. 47, no. 10, pp. 2677-2684, October 1999. GSC aims at filtering out a single desired source signal z_i from a set of measurements x, as more fully explained in the GSC principle, Griffiths, L. J., Jim, C. W., An alternative approach to linear constrained adaptive beamforming, IEEE Transactions on Antennas and Propagation, vol. 30, no. 1, pp. 27-34, January 1982. Generally, GSC predefines that a signal-independent beamformer c filters the sensor signals so that the direct path from the desired source remains undistorted whereas, ideally, other directions should be suppressed. Most often, the position of the desired source must be pre-determined by additional localization methods. In the lower, side path, an adaptive blocking matrix B aims at suppressing all components originating from the desired signal z_i so that only noise components appear at the output of B. From these, an adaptive interference canceller a derives an estimate for the remaining noise component in the output of c, by minimizing an estimate of the total output power E(z_i*z_i). Thus the fixed beamformer c and the interference canceller a jointly perform interference suppression. Since GSC requires the desired speaker to be confined to a limited tracking region, its applicability is limited to spatially rigid scenarios.
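The GSC structure described above can be sketched for two microphones with the desired source assumed at broadside (zero inter-channel delay). The LMS step size and the simple sum/difference choices standing in for the fixed beamformer c and blocking matrix B are illustrative assumptions:

```python
def gsc_two_mic(x1, x2, mu=0.01):
    """Generalized Sidelobe Canceller sketch for two microphones: the
    fixed beamformer is the channel average (desired source passes),
    the blocking-matrix output is the channel difference (desired
    source cancels), and a single LMS-adapted weight subtracts the
    noise estimate from the beamformer output by minimizing output
    power, mirroring the E(z_i*z_i) criterion in the text."""
    a = 0.0                      # adaptive interference-canceller weight
    out = []
    for s1, s2 in zip(x1, x2):
        c = 0.5 * (s1 + s2)      # fixed beamformer c
        b = s1 - s2              # blocking matrix B output (noise only)
        z = c - a * b            # interference-cancelled output
        a += mu * z * b          # LMS update minimizing E[z^2]
        out.append(z)
    return out
```

The same structural limitation applies here as in the text: if the speaker moves outside the assumed look direction, some desired signal leaks through the blocking path and is partially cancelled.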
`
Another known technique is a class of active-cancellation algorithms, which is related to sound separation. However, this technique requires a "reference signal," i.e., a signal derived from only one of the sources. Active noise-cancellation and echo-cancellation techniques make extensive use of this approach: the contribution of noise to a mixture is reduced by filtering a known signal that contains only the noise and subtracting it from the mixture. This method assumes that one of the measured signals consists of one and only one source, an assumption which is not realistic in many real-life settings.
Techniques for active cancellation that do not require a reference signal are called "blind" and are of primary interest in this application. They are now classified based on the degree of realism of the underlying assumptions regarding the acoustic processes by which the unwanted signals reach the microphones. One class of blind active-cancellation techniques may be called "gain-based," also known as "instantaneous mixing": it is presumed that the waveform produced by each source is received by the microphones simultaneously, but with varying relative gains. (Directional microphones are most often used to produce the required differences in gain.) Thus, a gain-based system attempts to cancel copies of an undesired source in different microphone signals by applying relative gains to the microphone signals and subtracting, but without applying time delays or other filtering. Numerous gain-based methods for blind active cancellation have been proposed; see Herault and Jutten (1986), Tong et al. (1991), and Molgedey and Schuster (1994). The gain-based or instantaneous-mixing assumption is violated when microphones are separated in space, as in most acoustic applications. A simple extension of this method is to include a time-delay factor without any other filtering, which will work under anechoic conditions. However, this simple model of acoustic propagation from the sources to the microphones is of limited use when echoes and reverberation are present. The most realistic active-cancellation techniques currently known are "convolutive": the effect of acoustic propagation from each source to each microphone is modeled as a convolutive filter. These techniques are more realistic than gain-based and delay-based techniques because they explicitly accommodate the effects of inter-microphone separation, echoes, and reverberation. They are also more general since, in principle, gains and delays are special cases of convolutive filtering.
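The gain-based cancellation idea (scale one microphone signal and subtract it, with no delays or filtering) can be sketched as follows. The mixing gains are taken as known here purely for illustration; the blind methods cited in the surrounding text estimate the required ratio from the data:

```python
import numpy as np

# Instantaneous ("gain-based") mixing of two sources at two microphones.
rng = np.random.default_rng(1)
s1 = np.sin(2 * np.pi * 0.02 * np.arange(2000))   # desired source
s2 = rng.standard_normal(2000)                    # undesired interferer
A = np.array([[1.0, 0.6],                         # gains a11, a12
              [0.3, 1.0]])                        # gains a21, a22
x1 = A[0, 0] * s1 + A[0, 1] * s2
x2 = A[1, 0] * s1 + A[1, 1] * s2

# Cancel s2 from x1 by applying a relative gain and subtracting --
# no time delays, no filtering (the gain ratio is assumed known here).
g = A[0, 1] / A[1, 1]
y = x1 - g * x2     # equals (a11 - a12*a21/a22) * s1; s2 is removed
```

With microphones separated in space, each source arrives with different delays as well as gains, which is exactly why this pure-gain model breaks down in most acoustic applications.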
Convolutive blind cancellation techniques have been described by many researchers, including Jutten et al. (1992), Van Compernolle and Van Gerven (1992), Platt and Faggin (1992), Bell and Sejnowski (1995), Torkkola (1996), Lee (1998), and Parra et al. (2000). In the mathematical model predominantly used for multiple-channel observations through an array of microphones, the multiple-source mixing can be formulated as follows:

x_i(t) = Σ_{l=0}^{L} Σ_{j=1}^{m} a_{ij}(l) s_j(t − l) + n_i(t)

where x_i(t) denotes the observed data, s_j(t) is the hidden source signal, n_i(t) is the additive sensory noise signal, and a_{ij}(l) is the mixing filter. The parameter m is the number of sources, L is the convolution order, which depends on the environment acoustics, and t indicates the time index. The first summation is due to the filtering of the sources in the environment, and the second summation is due to the mixing of the different sources. Most of the work on ICA has been centered on algorithms for instantaneous mixing scenarios, in which the first summation is removed and the task is simplified to
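As a minimal sketch, the convolutive mixing model above can be simulated directly; the helper name `convolutive_mix` and the array shapes are assumptions for illustration:

```python
import numpy as np

def convolutive_mix(S, A, noise_std=0.0, rng=None):
    """Generate x_i(t) = sum_l sum_j a_ij(l) * s_j(t - l) + n_i(t).

    S : (m, T) array of source signals s_j(t)
    A : (n_mics, m, L+1) array of mixing-filter taps a_ij(l)
    """
    rng = rng or np.random.default_rng(0)
    n_mics, m, _ = A.shape
    T = S.shape[1]
    X = np.zeros((n_mics, T))
    for i in range(n_mics):
        for j in range(m):
            # the convolution realizes the inner summation over l
            X[i] += np.convolve(S[j], A[i, j])[:T]
        X[i] += noise_std * rng.standard_normal(T)   # additive sensor noise
    return X
```

When the filters have a single tap (L = 0), this reduces to the instantaneous mixing case X = A S on which most early ICA work focused.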

This document is available on Docket Alarm but you must sign up to view it.


Or .

Accessing this document will incur an additional charge of $.

After purchase, you can access this document again without charge.

Accept $ Charge
throbber

Still Working On It

This document is taking longer than usual to download. This can happen if we need to contact the court directly to obtain the document and their servers are running slowly.

Give it another minute or two to complete, and then try the refresh button.

throbber

A few More Minutes ... Still Working

It can take up to 5 minutes for us to download a document if the court servers are running slowly.

Thank you for your continued patience.

This document could not be displayed.

We could not find this document within its docket. Please go back to the docket page and check the link. If that does not work, go back to the docket and refresh it to pull the newest information.

Your account does not support viewing this document.

You need a Paid Account to view this document. Click here to change your account type.

Your account does not support viewing this document.

Set your membership status to view this document.

With a Docket Alarm membership, you'll get a whole lot more, including:

  • Up-to-date information for this case.
  • Email alerts whenever there is an update.
  • Full text search for other cases.
  • Get email alerts whenever a new case matches your search.

Become a Member

One Moment Please

The filing “” is large (MB) and is being downloaded.

Please refresh this page in a few minutes to see if the filing has been downloaded. The filing will also be emailed to you when the download completes.

Your document is on its way!

If you do not receive the document in five minutes, contact support at support@docketalarm.com.

Sealed Document

We are unable to display this document, it may be under a court ordered seal.

If you have proper credentials to access the file, you may proceed directly to the court's system using your government issued username and password.


Access Government Site

We are redirecting you
to a mobile optimized page.





Document Unreadable or Corrupt

Refresh this Document
Go to the Docket

We are unable to display this document.

Refresh this Document
Go to the Docket