(12) INTERNATIONAL APPLICATION PUBLISHED UNDER THE PATENT COOPERATION TREATY (PCT)
`
`(19) World Intellectual Property Organization
`International Bureau
`
`(43) International Publication Date
`20 November 2003 (20.11.2003)
`
`PCT
`
`(10) International Publication Number
`WO 03/096031 A2
`
`(51) International Patent Classification 7:
`
G01R
`
`(21) International Application Number:
`
`PCT/US03/06893
`
`(74) Agent: GREGORY, Richard, L., Jr.; Shemwell Gregory
`& Courtney LLP, 4880 Stevens Creek Blvd., Suite 201, San
`Jose, CA 95129 (US).
`
`(22) International Filing Date:
`
`5 March 2003 (05.03.2003)
`
(25) Filing Language: English

(26) Publication Language: English
`
(30) Priority Data:
60/361,981    5 March 2002 (05.03.2002)    US
60/362,103    5 March 2002 (05.03.2002)    US
60/362,161    5 March 2002 (05.03.2002)    US
60/362,162    5 March 2002 (05.03.2002)    US
60/362,170    5 March 2002 (05.03.2002)    US
`
`(71) Applicant: ALIPHCOM [US/US]; 410 Jesse Street, Unit
`601, San Francisco, CA 94103 (US).
`
(72) Inventors: BURNETT, Gregory, C.; 675 South H Street,
Livermore, CA 94550 (US). PETIT, Nicolas, J.; 3300
Scott Street #207, San Francisco, CA 94123 (US).
ASSEILY, Alexander, M.; 2538 Post Street, San Francisco,
CA 94115 (US). EINAUDI, Andrew, E.; 1570 Grove
Street, San Francisco, CA 94117 (US).
`
(81) Designated States (national): AE, AG, AL, AM, AT, AU,
AZ, BA, BB, BG, BR, BY, BZ, CA, CH, CN, CO, CR, CU,
CZ, DE, DK, DM, DZ, EC, EE, ES, FI, GB, GD, GE, GH,
GM, HR, HU, ID, IL, IN, IS, JP, KE, KG, KP, KR, KZ, LC,
LK, LR, LS, LT, LU, LV, MA, MD, MG, MK, MN, MW,
MX, MZ, NO, NZ, OM, PH, PL, PT, RO, RU, SD, SE, SG,
SK, SL, TJ, TM, TN, TR, TT, TZ, UA, UG, UZ, VN, YU,
ZA, ZM, ZW.
`
(84) Designated States (regional): ARIPO patent (GH, GM,
KE, LS, MW, MZ, SD, SL, SZ, TZ, UG, ZM, ZW),
Eurasian patent (AM, AZ, BY, KG, KZ, MD, RU, TJ, TM),
European patent (AT, BE, BG, CH, CY, CZ, DE, DK, EE,
ES, FI, FR, GB, GR, HU, IE, IT, LU, MC, NL, PT, RO,
SE, SI, SK, TR), OAPI patent (BF, BJ, CF, CG, CI, CM,
GA, GN, GQ, GW, ML, MR, NE, SN, TD, TG).
`
`Published:
`without international search report and to be republished
`upon receipt of that report
`
`[Continued on next page]
`
(54) Title: VOICE ACTIVITY DETECTION (VAD) DEVICES AND METHODS FOR USE WITH NOISE SUPPRESSION SYSTEMS
`
[Figure: block diagram of signal processing system 100, showing VAD 102, SIGNAL s(n), MIC 1 110, NOISE n(n) 122, MIC 2 112, and the cleaned speech output.]

(57) Abstract: Voice Activity Detection (VAD) devices, systems and methods are described for use with signal processing systems to denoise acoustic signals. Components of a signal processing system and/or VAD system receive acoustic signals and voice activity signals. Control signals are automatically generated from data of the voice activity signals. Components of the signal processing system and/or VAD system use the control signals to automatically select a denoising method appropriate to data of frequency subbands of the acoustic signals. The selected denoising method is applied to the acoustic signals to generate denoised acoustic signals.
`
`Page 1 of 72
`
`GOOGLE EXHIBIT 1015
`
`

`

For two-letter codes and other abbreviations, refer to the "Guidance Notes on Codes and Abbreviations" appearing at the beginning of each regular issue of the PCT Gazette.
`
`

`

`
`Attorney Docket No. ALPH.P015WO
`
`Transmittal of Patent Application for Filing
`
Certification Under 37 C.F.R. §1.10 (if applicable)
`
`EV 235 876 028 US
`"Express Mail" Label Number
`
`March 5, 2003
`Date of Deposit
`
I hereby certify that this application, and any other documents referred to as enclosed herein are being deposited in an envelope with the United States Postal Service "Express Mail Post Office to Addressee" service under 37 CFR §1.10 on the date indicated above and addressed to the Assistant Commissioner for Patents, Washington, D.C. 20231.
`
`Richard L. Gregory, Jr.
`
`(Print Name of Person Mailing Application)
`
Voice Activity Detection (VAD) Devices and Methods For Use With Noise Suppression Systems
`
`INVENTORS:
`
`GREGORY C. BURNETT
`NICOLAS J. PETIT
`ALEXANDER M. ASSEILY
`ANDREW E. EINAUDI
`
RELATED APPLICATIONS

This application claims priority from the following United States Patent Applications: Application Number 60/362,162, entitled PATHFINDER-BASED VOICE ACTIVITY DETECTION (PVAD) USED WITH PATHFINDER NOISE SUPPRESSION, filed March 5, 2002; Application Number 60/362,170, entitled ACCELEROMETER-BASED VOICE ACTIVITY DETECTION (PVAD) WITH PATHFINDER NOISE SUPPRESSION, filed March 5, 2002; Application Number 60/361,981, entitled ARRAY-BASED VOICE ACTIVITY DETECTION (AVAD) AND PATHFINDER NOISE SUPPRESSION, filed March 5, 2002; Application Number 60/362,161, entitled PATHFINDER NOISE SUPPRESSION USING AN EXTERNAL VOICE ACTIVITY DETECTION (VAD) DEVICE, filed March 5, 2002; Application Number 60/362,103, entitled ACCELEROMETER-BASED VOICE ACTIVITY DETECTION, filed March 5, 2002; and Application Number 60/368,343, entitled TWO-MICROPHONE FREQUENCY-BASED VOICE ACTIVITY DETECTION, filed March 27, 2002, all of which are currently pending.
`
Further, this application relates to the following United States Patent Applications: Application Number 09/905,361, entitled METHOD AND APPARATUS FOR REMOVING NOISE FROM ELECTRONIC SIGNALS, filed July 12, 2001; Application Number 10/159,770, entitled DETECTING VOICED AND UNVOICED SPEECH USING BOTH ACOUSTIC AND NONACOUSTIC SENSORS, filed May 30, 2002; and Application Number 10/301,237, entitled METHOD AND APPARATUS FOR REMOVING NOISE FROM ELECTRONIC SIGNALS, filed November 21, 2002.
`
TECHNICAL FIELD

The disclosed embodiments relate to systems and methods for detecting and processing a desired signal in the presence of acoustic noise.
`
BACKGROUND

Many noise suppression algorithms and techniques have been developed over the years. Most of the noise suppression systems in use today for speech communication systems are based on a single-microphone spectral subtraction technique first developed in the 1970s and described, for example, by S. F. Boll in "Suppression of Acoustic Noise in Speech using Spectral Subtraction," IEEE Trans. on ASSP, pp. 113-120, 1979. These techniques have been refined over the years, but the basic principles of operation have remained the same. See, for example, United States Patent Number 5,687,243 of McLaughlin, et al., and United States Patent Number 4,811,404 of Vilmur, et al. Generally, these techniques make use of a single-microphone Voice Activity Detector (VAD) to determine the background noise characteristics, where "voice" is generally understood to include human voiced speech, unvoiced speech, or a combination of voiced and unvoiced speech.
`
The VAD has also been used in digital cellular systems. As an example of such a use, see United States Patent Number 6,453,291 of Ashley, where a VAD configuration appropriate to the front-end of a digital cellular system is described. Further, some Code Division Multiple Access (CDMA) systems utilize a VAD to minimize the effective radio spectrum used, thereby allowing for more system capacity. Also, Global System for Mobile Communication (GSM) systems can include a VAD to reduce co-channel interference and to reduce battery consumption on the client or subscriber device.
`
These typical single-microphone VAD systems are significantly limited in capability as a result of the analysis of acoustic information received by the single microphone, wherein the analysis is performed using typical signal processing techniques. In particular, limitations in performance of these single-microphone VAD systems are noted when processing signals having a low signal-to-noise ratio (SNR), and in settings where the background noise varies quickly. Thus, similar limitations are found in noise suppression systems using these single-microphone VADs.
`
`
BRIEF DESCRIPTION OF THE FIGURES

Figure 1 is a block diagram of a signal processing system including the Pathfinder noise suppression system and a VAD system, under an embodiment.

Figure 1A is a block diagram of a VAD system including hardware for use in receiving and processing signals relating to VAD, under an embodiment.

Figure 1B is a block diagram of a VAD system using hardware of the associated noise suppression system for use in receiving VAD information, under an alternative embodiment.

Figure 2 is a block diagram of a signal processing system that incorporates a classical adaptive noise cancellation system, as known in the art.

Figure 3 is a flow diagram of a method for determining voiced and unvoiced speech using an accelerometer-based VAD, under an embodiment.

Figure 4 shows plots including a noisy audio signal (live recording) along with a corresponding accelerometer-based VAD signal, the corresponding accelerometer output signal, and the denoised audio signal following processing by the Pathfinder system using the VAD signal, under an embodiment.

Figure 5 shows plots including a noisy audio signal (live recording) along with a corresponding SSM-based VAD signal, the corresponding SSM output signal, and the denoised audio signal following processing by the Pathfinder system using the VAD signal, under an embodiment.

Figure 6 shows plots including a noisy audio signal (live recording) along with a corresponding GEMS-based VAD signal, the corresponding GEMS output signal, and the denoised audio signal following processing by the Pathfinder system using the VAD signal, under an embodiment.

Figure 7 shows plots including recorded spoken acoustic data with digitally added noise along with a corresponding EGG-based VAD signal, and the corresponding highpass filtered EGG output signal, under an embodiment.

Figure 8 is a flow diagram 80 of a method for determining voiced speech using a video-based VAD, under an embodiment.

Figure 9 shows plots including a noisy audio signal (live recording) along with a corresponding single (gradient) microphone-based VAD signal, the corresponding gradient microphone output signal, and the denoised audio signal following processing by the Pathfinder system using the VAD signal, under an embodiment.
`
`
Figure 10 shows a single cardioid unidirectional microphone of the microphone array, along with the associated spatial response curve, under an embodiment.

Figure 11 shows a microphone array of a PVAD system, under an embodiment.

Figure 12 is a flow diagram of a method for determining voiced and unvoiced speech using H1(z) gain values, under an alternative embodiment of the PVAD.

Figure 13 shows plots including a noisy audio signal (live recording) along with a corresponding microphone-based PVAD signal, the corresponding PVAD gain versus time signal, and the denoised audio signal following processing by the Pathfinder system using the PVAD signal, under an embodiment.

Figure 14 is a flow diagram of a method for determining voiced and unvoiced speech using a stereo VAD, under an embodiment.

Figure 15 shows plots including a noisy audio signal (live recording) along with a corresponding SVAD signal, and the denoised audio signal following processing by the Pathfinder system using the SVAD signal, under an embodiment.

Figure 16 is a flow diagram of a method for determining voiced and unvoiced speech using an AVAD, under an embodiment.

Figure 17 shows plots including audio signals from each microphone of an AVAD system along with the corresponding combined energy signal, under an embodiment.

Figure 18 is a block diagram of a signal processing system including the Pathfinder noise suppression system and a single-microphone (conventional) VAD system, under an embodiment.

Figure 19 is a flow diagram of a method for generating voicing information using a single-microphone VAD, under an embodiment.

Figure 20 is a flow diagram of a method for determining voiced and unvoiced speech using an airflow-based VAD, under an embodiment.

Figure 21 shows plots including a noisy audio signal along with a corresponding manually activated/calculated VAD signal, and the denoised audio signal following processing by the Pathfinder system using the manual VAD signal, under an embodiment.
`
In the drawings, the same reference numbers identify identical or substantially similar elements or acts. To easily identify the discussion of any particular element or act, the most significant digit or digits in a reference number refer to the Figure number in which that element is first introduced (e.g., element 104 is first introduced and discussed with respect to Figure 1).
`
`
DETAILED DESCRIPTION

Numerous Voice Activity Detection (VAD) devices and methods are described below for use with adaptive noise suppression systems. Further, results are presented below from experiments using the VAD devices and methods described herein as a component of a noise suppression system, in particular the Pathfinder Noise Suppression System available from Aliph, San Francisco, California (http://www.aliph.com), but the embodiments are not so limited. In the description below, when the Pathfinder noise suppression system is referred to, it should be kept in mind that noise suppression systems that estimate the noise waveform and subtract it from a signal and that use or are capable of using VAD information for reliable operation are included in that reference. Pathfinder is simply a convenient reference implementation for a system that operates on signals comprising desired speech signals along with noise.
`
When using the VAD devices and methods described herein with a noise suppression system, the VAD signal is processed independently of the noise suppression system, so that the receipt and processing of VAD information is independent from the processing associated with the noise suppression, but the embodiments are not so limited. This independence is attained physically (i.e., different hardware for use in receiving and processing signals relating to the VAD and the noise suppression), through processing (i.e., using the same hardware to receive signals into the noise suppression system while using independent techniques (software, algorithms, routines) to process the received signals), and through a combination of different hardware and different software.
`
In the following description, "acoustic" is generally defined as acoustic waves propagating in air. Propagation of acoustic waves in media other than air will be noted as such. References to "speech" or "voice" generally refer to human speech including voiced speech, unvoiced speech, and/or a combination of voiced and unvoiced speech. Unvoiced speech or voiced speech is distinguished where necessary. The term "noise suppression" generally describes any method by which noise is reduced or eliminated in an electronic signal.
`
Moreover, the term "VAD" is generally defined as a vector or array signal, data, or information that in some manner represents the occurrence of speech in the digital or analog domain. A common representation of VAD information is a one-bit digital signal sampled at the same rate as the corresponding acoustic signals, with a zero value representing that no speech has occurred during the corresponding time sample, and a unity value indicating that speech has occurred during the corresponding time sample. While the embodiments described herein are generally described in the digital domain, the descriptions are also valid for the analog domain.
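
As an illustration of the one-bit representation described above, the following minimal Python sketch builds a VAD track with one value per audio sample. The function name `vad_from_intervals`, its parameters, and the idea of starting from speech intervals given in seconds are assumptions made for this example only; they do not appear in the application.

```python
def vad_from_intervals(num_samples, sample_rate_hz, speech_intervals_s):
    """Build a one-bit VAD track aligned sample-for-sample with the audio.

    Hypothetical helper: 0 means no speech at that sample, 1 means speech.
    speech_intervals_s is a list of (start, end) times in seconds.
    """
    vad = [0] * num_samples
    for start_s, end_s in speech_intervals_s:
        first = max(0, int(round(start_s * sample_rate_hz)))
        last = min(num_samples, int(round(end_s * sample_rate_hz)))
        for i in range(first, last):
            vad[i] = 1
    return vad


# Ten samples at 10 Hz with speech between 0.3 s and 0.6 s.
vad = vad_from_intervals(num_samples=10, sample_rate_hz=10,
                         speech_intervals_s=[(0.3, 0.6)])
print(vad)
```

Because the track has the same length and rate as the audio, a noise suppression stage can read `vad[n]` alongside sample `n` without any resampling.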

The VAD devices/methods described herein generally include vibration and movement sensors, acoustic sensors, and manual VAD devices, but are not so limited. In one embodiment, an accelerometer is placed on the skin for use in detecting skin surface vibrations that correlate with human speech. These recorded vibrations are then used to calculate a VAD signal for use with or by an adaptive noise suppression algorithm in suppressing environmental acoustic noise from a simultaneously (within a few milliseconds) recorded acoustic signal that includes both speech and noise.
`
Another embodiment of the VAD devices/methods described herein includes an acoustic microphone modified with a membrane so that the microphone no longer efficiently detects acoustic vibrations in air. The membrane, though, allows the microphone to detect acoustic vibrations in objects with which it is in physical contact (allowing a good mechanical impedance match), such as human skin. That is, the acoustic microphone is modified in some way such that it no longer detects acoustic vibrations in air (where it no longer has a good physical impedance match), but only in objects with which the microphone is in contact. This configures the microphone, like the accelerometer, to detect vibrations of human skin associated with the speech production of that human while not efficiently detecting acoustic environmental noise in the air. The detected vibrations are processed to form a VAD signal for use in a noise suppression system, as detailed below.
`
Yet another embodiment of the VAD described herein uses an electromagnetic vibration sensor, such as a radiofrequency (RF) vibrometer or laser vibrometer, which detects skin vibrations. Further, the RF vibrometer detects the movement of tissue within the body, such as the inner surface of the cheek or the tracheal wall. Both the exterior skin and internal tissue vibrations associated with speech production can be used to form a VAD signal for use in a noise suppression system as detailed below.
`
Further embodiments of the VAD devices/methods described herein include an electroglottograph (EGG) to directly detect vocal fold movement. The EGG is an alternating current (AC) based method of measuring vocal fold contact area. When the EGG indicates sufficient vocal fold contact, the assumption that follows is that voiced speech is occurring, and a corresponding VAD signal representative of voiced speech is generated for use in a noise suppression system as detailed below. Similarly, an additional VAD embodiment uses a video system to detect movement of a person's vocal articulators, an indication that speech is being produced.
`
Another set of VAD devices/methods described below use signals received at one or more acoustic microphones along with corresponding signal processing techniques to produce VAD signals accurately and reliably under most environmental noise conditions. These embodiments include simple arrays and co-located (or nearly so) combinations of omnidirectional and unidirectional acoustic microphones. The simplest configuration in this set of VAD embodiments includes the use of a single microphone, located very close to the mouth of the user in order to record signals at a relatively high SNR. This microphone can be a gradient or "close-talk" microphone, for example. Other configurations include the use of combinations of unidirectional and omnidirectional microphones in various orientations and configurations. The signals received at these microphones, along with the associated signal processing, are used to calculate a VAD signal for use with a noise suppression system, as described below. Also described below is a VAD system that is activated manually, as in a walkie-talkie, or by an observer to the system.
`
As referenced above, the VAD devices and methods described herein are for use with noise suppression systems like, for example, the Pathfinder Noise Suppression System (referred to herein as the "Pathfinder system") available from Aliph of San Francisco, California. While the descriptions of the VAD devices herein are provided in the context of the Pathfinder Noise Suppression System, those skilled in the art will recognize that the VAD devices and methods can be used with a variety of noise suppression systems and methods known in the art.
`
The Pathfinder system is a digital signal processing (DSP) based acoustic noise suppression and echo-cancellation system. The Pathfinder system, which can couple to the front-end of speech processing systems, uses VAD information and received acoustic information to reduce or eliminate noise in desired acoustic signals by estimating the noise waveform and subtracting it from a signal including both speech and noise. The Pathfinder system is described further below and in the Related Applications.
`
Figure 1 is a block diagram of a signal processing system 100 including the Pathfinder noise suppression system 101 and a VAD system 102, under an embodiment. The signal processing system 100 includes two microphones MIC 1 110 and MIC 2 112 that receive signals or information from at least one speech signal source 120 and at least one noise source 122. The path s(n) from the speech signal source 120 to MIC 1 and the path n(n) from the noise source 122 to MIC 2 are considered to be unity. Further, H1(z) represents the path from the noise source 122 to MIC 1, and H2(z) represents the path from the speech signal source 120 to MIC 2. In contrast to the signal processing system 100 including the Pathfinder system 101, Figure 2 is a block diagram of a signal processing system 200 that incorporates a classical adaptive noise cancellation system 202 as known in the art.
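
The path description above can be restated as a pair of z-domain relations. This is only a sketch of what the text implies, assuming the two sources combine linearly at each microphone. The symbols M_1(z) and M_2(z) for the microphone signals, and S(z) and N(z) for the transforms of s(n) and n(n), are notation introduced here and not taken from the application.

```latex
\begin{aligned}
M_1(z) &= S(z) + N(z)\,H_1(z) \\
M_2(z) &= N(z) + S(z)\,H_2(z)
\end{aligned}
```

The unity terms reflect the direct paths (speech to MIC 1, noise to MIC 2), while H_1(z) and H_2(z) carry the cross paths.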
`
Components of the signal processing system 100, for example the noise suppression system 101, couple to the microphones MIC 1 and MIC 2 via wireless couplings, wired couplings, and/or a combination of wireless and wired couplings. Likewise, the VAD system 102 couples to components of the signal processing system 100, like the noise suppression system 101, via wireless couplings, wired couplings, and/or a combination of wireless and wired couplings. As an example, the VAD devices and microphones described below as components of the VAD system 102 can comply with the Bluetooth wireless specification for wireless communication with other components of the signal processing system, but are not so limited.
`
Referring to Figure 1, the VAD signal 104 from the VAD system 102, derived in a manner described herein, controls noise removal from the received signals without respect to noise type, amplitude, and/or orientation. When the VAD signal 104 indicates an absence of voicing, the Pathfinder system 101 uses MIC 1 and MIC 2 signals to calculate the coefficients for a model of transfer function H1(z) over pre-specified subbands of the received signals. When the VAD signal 104 indicates the presence of voicing, the Pathfinder system 101 stops updating H1(z) and starts calculating the coefficients for transfer function H2(z) over pre-specified subbands of the received signals. Updates of H1 coefficients can continue in a subband during speech production if the SNR in the subband is low (note that H1(z) and H2(z) are sometimes referred to herein as H1 and H2, respectively, for convenience). The Pathfinder system 101 of an embodiment uses the Least Mean Squares (LMS) technique to calculate H1 and H2, as described further by B. Widrow and S. Stearns in "Adaptive Signal Processing", Prentice-Hall Publishing, ISBN 0-13-004029-0, but is not so limited. The transfer function can be calculated in the time domain, frequency domain, or a combination of both the time/frequency domains. The Pathfinder system subsequently removes noise from the received acoustic signals of interest using combinations of the transfer functions H1(z) and H2(z), thereby generating at least one denoised acoustic stream.
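
The VAD-gated update described above can be illustrated with a deliberately simplified Python sketch. It is not the Pathfinder algorithm: it assumes H2(z) is negligible (MIC 2 carries essentially only noise), works on a single full band in the time domain, and uses a plain LMS update. The function name `vad_gated_lms`, the tap count, and the step size are all assumptions chosen for this example.

```python
import random


def vad_gated_lms(mic1, mic2, vad, num_taps=4, step_size=0.05):
    """VAD-gated LMS noise canceller (illustrative; not the Pathfinder algorithm).

    Assumes H2(z) is negligible, i.e. MIC 2 carries (almost) only noise.
    The FIR weights model H1(z) and are updated only while vad[n] == 0.
    """
    weights = [0.0] * num_taps   # running estimate of H1(z)
    history = [0.0] * num_taps   # latest MIC 2 samples, newest first
    cleaned = []
    for x1, x2, speech in zip(mic1, mic2, vad):
        history = [x2] + history[:-1]
        noise_estimate = sum(w * h for w, h in zip(weights, history))
        error = x1 - noise_estimate          # MIC 1 minus estimated noise
        cleaned.append(error)
        if not speech:                       # adapt H1 only when no speech
            weights = [w + step_size * error * h
                       for w, h in zip(weights, history)]
    return cleaned, weights


# Toy data: noise reaches MIC 1 scaled by 0.5, and nobody is speaking.
rng = random.Random(0)
noise = [rng.uniform(-1.0, 1.0) for _ in range(2000)]
mic1 = [0.5 * n for n in noise]
cleaned, weights = vad_gated_lms(mic1, noise, vad=[0] * len(noise))
print([round(w, 3) for w in weights])
```

Freezing the weights whenever the VAD reports speech is the point of the sketch: if the filter kept adapting while speech was present, it would start to model and cancel the speech itself.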
`
The Pathfinder system can be implemented in a variety of ways, but common to all of the embodiments is reliance on an accurate and reliable VAD device and/or method. The VAD device/method should be accurate because the Pathfinder system updates its filter coefficients when there is no speech or when the SNR during speech is low. If sufficient speech energy is present during coefficient update, subsequent speech with similar spectral characteristics can be suppressed, an undesirable occurrence. The VAD device/method should be robust to support high accuracy under a variety of environmental conditions. Obviously, there are likely to be some conditions under which no VAD device/method will operate satisfactorily, but under normal circumstances the VAD device/method should work to provide maximum noise suppression with few adverse effects on the speech signal of interest.
`
When using VAD devices/methods with a noise suppression system, the VAD signal is processed independently of the noise suppression system, so that the receipt and processing of VAD information is independent from the processing associated with the noise suppression, but the embodiments are not so limited. This independence is attained physically (i.e., different hardware for use in receiving and processing signals relating to the VAD and the noise suppression), through processing (i.e., using the same hardware to receive signals into the noise suppression system while using independent techniques (software, algorithms, routines) to process the received signals), and through a combination of different hardware and different software, as described below.

Figure 1A is a block diagram of a VAD system 102A including hardware for use in receiving and processing signals relating to VAD, under an embodiment. The VAD system 102A includes a VAD device 130 coupled to provide data to a corresponding VAD algorithm 140. Note that noise suppression systems of alternative embodiments can integrate some or all functions of the VAD algorithm with the noise suppression processing in any manner obvious to those skilled in the art.

Figure 1B is a block diagram of a VAD system 102B using hardware of the associated noise suppression system 101 for use in receiving VAD information 164, under an embodiment. The VAD system 102B includes a VAD algorithm 150 that receives data 164 from MIC 1 and MIC 2, or other components, of the corresponding signal processing system 100. Alternative embodiments of the noise suppression system can integrate some or all functions of the VAD algorithm with the noise suppression processing in any manner obvious to those skilled in the art.
`
Vibration/Movement-based VAD Devices/Methods

The vibration/movement-based VAD devices include the physical hardware devices for use in receiving and processing signals relating to the VAD and the noise suppression. As a speaker or user produces speech, the resulting vibrations propagate through the tissue of the speaker and, therefore, can be detected on and beneath the skin using various methods. These vibrations are an excellent source of VAD information, as they are strongly associated with both voiced and unvoiced speech (although the unvoiced speech vibrations are much weaker and more difficult to detect) and generally are only slightly affected by environmental acoustic noise (some devices/methods, for example the electromagnetic vibrometers described below, are not affected by environmental acoustic noise). These tissue vibrations or movements are detected using a number of VAD devices including, for example, accelerometer-based devices, skin surface microphone (SSM) devices, electromagnetic (EM) vibrometer devices including both radio frequency (RF) vibrometers and laser vibrometers, direct glottal motion measurement devices, and video detection devices.
`
Accelerometer-based VAD Devices/Methods

Accelerometers can detect skin vibrations associated with speech. As such, and with reference to Figure 1 and Figure 1A, a VAD system 102A of an embodiment includes an accelerometer-based device 130 providing data of the skin vibrations to an associated algorithm 140. The algorithm of an embodiment uses energy calculation techniques along with a threshold comparison, as described below, but is not so limited. Note that more complex energy-based methods are available to those skilled in the art.

Figure 3 is a flow diagram 300 of a method for determining voiced and unvoiced speech using an accelerometer-based VAD, under an embodiment. Generally, the energy is calculated by defining a standard window size over which the calculation is to take place and summing the square of the amplitude over time as

$$\text{Energy} = \sum_i x_i^2,$$

where i is the digital sample subscript and ranges from the beginning of the window to the end of the window.
`
Referring to Figure 3, operation begins upon receiving accelerometer data, at block 302. The processing associated with the VAD includes filtering the data from the accelerometer to preclude aliasing, and digitizing the filtered data for processing, at block 304. The digitized data is segmented into windows 20 milliseconds (msec) in length, and the data is stepped 8 msec at a time, at block 306. The processing further includes filtering the windowed data, at block 308, to remove spectral information that is corrupted by noise or is otherwise unwanted. The energy in each window is calculated by summing the squares of the amplitudes as described above, at block 310. The calculated energy values can be normalized by dividing the energy values by the window length; however, this involves an extra calculation and is not needed as long as the window length is not varied.
`
The calculated, or normalized, energy values are compared to a threshold, at block 312. The speech corresponding to the accelerometer data is designated as voiced speech when the energy of the accelerometer data is at or above a threshold value, at block 314. Likewise, the speech corresponding to the accelerometer data is designated as unvoiced speech when the energy of the accelerometer data is below the threshold value, at block 316. Noise suppression systems of alternative embodiments can use multiple threshold values to indicate the relative strength or confidence of the voicing signal, but are not so limited. Multiple subbands may also be processed for increased accuracy.
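
The windowing, energy, and threshold steps of Figure 3 (blocks 306, 310, and 312 to 316) can be sketched in Python as follows. The 20 msec window, 8 msec step, and at-or-above comparison come from the description; the function names `frame_energies` and `energy_vad`, the toy signal, and the threshold value of 10 are assumptions for illustration. The anti-aliasing and band-limiting filters of blocks 304 and 308 are omitted.

```python
def frame_energies(samples, sample_rate_hz, window_ms=20, step_ms=8):
    """Sum of squared amplitudes over sliding windows (blocks 306 and 310)."""
    window = int(sample_rate_hz * window_ms / 1000)
    step = int(sample_rate_hz * step_ms / 1000)
    energies = []
    for start in range(0, len(samples) - window + 1, step):
        energies.append(sum(x * x for x in samples[start:start + window]))
    return energies


def energy_vad(energies, threshold):
    """1 = voiced (energy at or above threshold), 0 = unvoiced (below)."""
    return [1 if e >= threshold else 0 for e in energies]


# Toy accelerometer trace at 1 kHz: 40 ms of silence, then 40 ms of vibration.
samples = [0.0] * 40 + [1.0] * 40
energies = frame_energies(samples, sample_rate_hz=1000)
decisions = energy_vad(energies, threshold=10)
print(energies)
print(decisions)
```

The energies are left unnormalized here, in line with the remark that dividing by the window length is unnecessary while the window length stays fixed. The threshold would need rescaling if the window length changed.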
`
Figure 4 shows plots including a noisy audio signal (live recording) 402 along with a corresponding accelerometer-based VAD signal 404, the corresponding accelerometer output signal 412, and the denoised audio signal 422 following processing by the Pathfinder system using the VAD signal 404, under an embodiment. In this example, the accelerometer data has been bandpass filtered between 500 and 2500 Hz to remove unwanted acoustic noise that can couple to the accelerometer below 500 Hz. The audio signal 402 was recorded using an Aliph microphone set and standard accelerometer in a babble noise environment inside a chamber measuring six (6) feet on a side and having a ceiling height of eight (8) feet. The Pathfinder system is implemented in real-time, with a delay of approx
