(12) Patent Application Publication (10) Pub. No.: US 2006/0120537 A1
Burnett et al. (43) Pub. Date: Jun. 8, 2006

(54) NOISE SUPPRESSING MULTI-MICROPHONE HEADSET
`
`(76) Inventors: Gregory C. Burnett, Dodge Center,
`MN (US); Jaques Gagne, Los Gatos,
`CA (US); Dore Mark, San Francisco,
`CA (US); Alexander M. Asseily,
`London (GB); Nicolas Petit,
`Burlingame, CA (US)
`Correspondence Address:
`COURTNEY STANFORD & GREGORY LLP
`P.O. BOX 9686
`SAN JOSE, CA 95157 (US)
(21) Appl. No.: 11/199,856

(22) Filed: Aug. 8, 2005
`Related U.S. Application Data
(60) Provisional application No. 60/599,468, filed on Aug. 6, 2004. Provisional application No. 60/599,618, filed on Aug. 6, 2004.
`
`Publication Classification
`
(51) Int. Cl.
     A61F 11/06  (2006.01)
     G10K 11/16  (2006.01)
     H03B 29/00  (2006.01)

(52) U.S. Cl. .......................... 381/71.6; 381/72
`
`(57)
`
`ABSTRACT
`
`A new type of headset that employs adaptive noise Suppres
`Sion, multiple microphones, a voice activity detection
`(VAD) device, and unique mechanisms to position it cor
`rectly on either ear for use with phones, computers, and
`wired or wireless connections of any kind is described. In
`various embodiments, the headset employs combinations of
`new technologies and mechanisms to provide the user a
`unique communications experience.
`
`
`
`
`
`
`
`230
`
`240
`
`WAD
`Algorithm
`
`Noise
`Suppression
`
`
`
`101
`
`Page 1 of 31
`
`Amazon v. Jawbone
`U.S. Patent 10,779,080
`Amazon Ex. 1013
`
`
`
Patent Application Publication    Jun. 8, 2006    US 2006/0120537 A1

[Drawing sheets 1-21 of 21. Recoverable figure content:]
[Sheet 1: FIG. 1, overview of the Pathfinder noise suppression system.]
[Sheet 2: FIG. 1-A, side slice view of the acoustic vibration sensor (SSM) 100.]
[Sheet 3: FIG. 1-B, perspective view of the assembled Jawbone earpiece; FIG. 2, relationship of the VAD device, VAD algorithm, and noise suppression algorithm (labels 230, 240, 101).]
[Sheets 4-5: drawings; labels not recoverable.]
[Sheet 6: FIG. 3, flow chart of the SSM sensor VAD (steps 300-316): receive SSM sensor data; filter and digitize SSM sensor data; segment and step digitized data; remove spectral information corrupted by noise; calculate energy in each window; compare energy to threshold values; energy above threshold indicates voiced speech; energy below threshold indicates unvoiced speech.]
[Sheet 7: FIG. 3-A, schematic of the SSM coupler 110 with contact device 112; dimensions Ø6.4 and Ø2.8 (mm); Section A-A.]
[Sheet 8: drawing; labels not recoverable.]
[Sheet 9: FIG. 4, noise suppression example; noisy acoustic signal 402 and SSM VAD signal 404; x-axis: Time (samples at 8 kHz).]
[Sheet 10: FIG. 4-A, exploded SSM view; Section A-A.]
[Sheet 11: drawing; labels not recoverable.]
[Sheet 12: FIG. 5, microphone configuration; labels "Towards Speech" and "Away from Speech".]
[Sheet 13: FIG. 5-A, areas of SSM sensitivity: Ear, Inside Ear Canal (520), In Front of Ear (512); labels 504, 506, 508.]
[Sheet 14: drawing; labels not recoverable.]
[Sheet 15: FIG. 6-A, generic headset with SSM placed at many different locations.]
[Sheet 16: FIG. 7, simulated microphone responses; labels "Mic 1 response" and "Mic 2 body" (704).]
[Sheets 17-18: drawings; labels not recoverable.]
[Sheet 19: FIG. 8, Magnitude (dB) versus Frequency (Hz); ticks 500-3500 Hz.]
[Sheet 20: FIG. 8-B, headset on ear; labels "Body" and "Earbud Barrel".]
[Sheet 21: FIG. 10-B, headset on ear, side view.]
`
`
`
`
NOISE SUPPRESSING MULTI-MICROPHONE HEADSET
`
`RELATED APPLICATIONS
[0001] This application claims the benefit of U.S. Provisional Patent Application Ser. No. 60/599,468, titled "Jawbone Headset" and filed Aug. 6, 2004, which is hereby incorporated by reference herein in its entirety. This application further claims the benefit of U.S. Provisional Patent Application Ser. No. 60/599,618, titled "Wind and Noise Compensation in a Headset" and filed Aug. 6, 2004, which is hereby incorporated by reference herein in its entirety. This application is related to the following U.S. patent applications assigned to Aliph, of Brisbane, Calif. These include:
[0002] 1. A unique noise suppression algorithm (reference Method and Apparatus for Removing Noise from Electronic Signals, filed Nov. 21, 2002, and Voice Activity Detector (VAD)-Based Multiple Microphone Acoustic Noise Suppression, filed Sep. 18, 2003)
[0003] 2. A unique microphone arrangement and configuration (reference Microphone and Voice Activity Detection (VAD) Configurations for Use with Communications Systems, filed Mar. 27, 2003)
[0004] 3. A unique voice activity detection (VAD) sensor, algorithm, and technique (reference Acoustic Vibration Sensor, filed Jan. 30, 2004, and Voice Activity Detection (VAD) Devices and Systems, filed Nov. 20, 2003)
[0005] 4. An incoming audio enhancement system named Dynamic Audio Enhancement (DAE) that filters and amplifies the incoming audio in order to make it simpler for the user to better hear the person on the other end of the conversation (i.e., the "far end")
[0006] 5. A unique headset configuration that uses several new techniques to ensure proper positioning of the loudspeaker, microphones, and VAD sensor as well as a comfortable and stable position.
All of the U.S. patents referenced herein are incorporated by reference herein in their entirety.
`
`FIELD
[0007] The disclosed embodiments relate to systems and methods for detecting and processing a desired signal in the presence of acoustic noise.
`
`BACKGROUND
[0008] Many noise suppression algorithms and techniques have been developed over the years. Most of the noise suppression systems in use today for speech communication systems are based on a single-microphone spectral subtraction technique first developed in the 1970s and described, for example, by S. F. Boll in "Suppression of Acoustic Noise in Speech using Spectral Subtraction," IEEE Trans. on ASSP, pp. 113-120, 1979. These techniques have been refined over the years, but the basic principles of operation have remained the same. See, for example, U.S. Pat. No. 5,687,243 of McLaughlin, et al., and U.S. Pat. No. 4,811,404 of Vilmur, et al. Generally, these techniques make use of a microphone-based Voice Activity Detector (VAD) to determine the background noise characteristics, where "voice" is generally understood to include human voiced speech, unvoiced speech, or a combination of voiced and unvoiced speech.
[0009] The VAD has also been used in digital cellular systems. As an example of such a use, see U.S. Pat. No. 6,453,291 of Ashley, where a VAD configuration appropriate to the front end of a digital cellular system is described. Further, some Code Division Multiple Access (CDMA) systems utilize a VAD to minimize the effective radio spectrum used, thereby allowing for more system capacity. Also, Global System for Mobile Communication (GSM) systems can include a VAD to reduce co-channel interference and to reduce battery consumption on the client or subscriber device.
[0010] These typical microphone-based VAD systems are significantly limited in capability as a result of the addition of environmental acoustic noise to the desired speech signal received by the single microphone, wherein the analysis is performed using typical signal processing techniques. In particular, limitations in performance of these microphone-based VAD systems are noted when processing signals having a low signal-to-noise ratio (SNR), and in settings where the background noise varies quickly. Thus, similar limitations are found in noise suppression systems using these microphone-based VADs.
`
BRIEF DESCRIPTION OF THE FIGURES
[0011] FIG. 1: Overview of the Pathfinder noise suppression system.
[0012] FIG. 2: Overview of the VAD device relationship with the VAD algorithm and the noise suppression algorithm.
[0013] FIG. 3: Flow chart of SSM sensor VAD embodiment.
[0014] FIG. 4: Example of noise suppression performance using the SSM VAD.
[0015] FIG. 5: A specific microphone configuration embodiment as used with the Jawbone headset.
[0016] FIG. 6: Simulated magnitude response of a cardioid microphone at a single frequency.
[0017] FIG. 7: Simulated magnitude responses for Mic1 and Mic2 of the Jawbone-type microphone configuration at a single frequency.
[0018] FIG. 1-A: Side slice view of an SSM (acoustic vibration sensor).
[0019] FIG. 2A-A: Exploded view of an SSM.
[0020] FIG. 2B-A: Perspective view of an SSM.
[0021] FIG. 3-A: Schematic diagram of an SSM coupler.
[0022] FIG. 4-A: Exploded view of an SSM under an alternative embodiment.
[0023] FIG. 5-A: Representative areas of SSM sensitivity on the human head.
[0024] FIG. 6-A: Generic headset with SSM placed at many different locations.
`
`
[0025] FIG. 7-A: Diagram of a manufacturing method that may be used to construct an SSM.
[0026] FIG. 8: Diagram of the magnitude response of the FIR highpass filter used in the DAE algorithm to increase intelligibility in high-noise acoustic environments.
[0027] FIG. 1-B: Perspective view of an assembled Jawbone earpiece.
[0028] FIG. 2-B: Perspective view of the other side of the Jawbone earpiece.
[0029] FIG. 3-B: Perspective view of the assembled Jawbone earpiece.
[0030] FIG. 4-B: Perspective exploded and assembled views of the Jawbone earpiece.
[0031] FIG. 5-B: Perspective exploded view of the torsional spring-loading mechanism of the Jawbone earpiece.
[0032] FIG. 6-B: Perspective view of the control module.
[0033] FIG. 7-B: Perspective view of the microphone and sensor booty of the Jawbone earpiece.
[0034] FIG. 8-B: Top view orthographic drawing of the headset on the ear illustrating the angle between the earloop and body of the Jawbone earpiece.
[0035] FIG. 9-B: Top view orthographic drawing of the headset on the ear illustrating forces on the earpiece and head of the user.
[0036] FIG. 10-B: Side view orthographic drawing of the headset on the ear illustrating force applied by the earpiece to the pinna.
`
`DETAILED DESCRIPTION
`The Pathfinder Noise Suppression System
[0037] FIG. 1 is a block diagram of the Pathfinder noise suppression system 100 including the Pathfinder noise suppression algorithm 101 and a VAD system 102, under an embodiment. It also includes two microphones MIC 1 110 and MIC 2 112 that receive signals or information from at least one speech source 120 and at least one noise source 122. The path s(n) from the speech source 120 to MIC 1 and the path n(n) from the noise source 122 to MIC 2 are considered to be unity. Further, H1(z) represents the path from the noise source 122 to MIC 1, and H2(z) represents the path from the signal source 120 to MIC 2.
[0038] A VAD signal 104, derived in some manner, is used to control the method of noise removal, and is related to the noise suppression technique discussed below as shown in FIG. 2. A preview of the VAD technique discussed below using an acoustic transducer (called the Skin Surface Microphone, or SSM) is shown in FIG. 3. Referring back to FIG. 1, the acoustic information coming into MIC 1 is denoted by m1(n). The information coming into MIC 2 is similarly labeled m2(n). In the z (digital frequency) domain, we can represent them as M1(z) and M2(z). Thus

    M1(z) = S(z) + N(z)H1(z)
    M2(z) = N(z) + S(z)H2(z)    (1)
[0039] This is the general case for all realistic two-microphone systems. There is always some leakage of noise into MIC 1, and some leakage of signal into MIC 2. Equation 1 has four unknowns and only two relationships and, therefore, cannot be solved explicitly. However, perhaps there is some way to solve for some of the unknowns in Equation 1 by other means. Examine the case where the signal is not being generated, that is, where the VAD indicates voicing is not occurring. In this case, s(n) = S(z) = 0, and Equation 1 reduces to

    M1n(z) = N(z)H1(z)
    M2n(z) = N(z)    (2)

where the n subscript on the M variables indicates that only noise is being received. This leads to

    H1(z) = M1n(z) / M2n(z)
`
[0040] Now, H1(z) can be calculated using any of the available system identification algorithms and the microphone outputs when only noise is being received. The calculation should be done adaptively in order to allow the system to track any changes in the noise.
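The adaptive, noise-only calculation of H1(z) described above can be sketched with a normalized LMS (NLMS) filter. This is a minimal illustration, not the Pathfinder implementation; the filter length, step size, and the synthetic 3-tap noise path below are all assumptions:

```python
import numpy as np

def estimate_h1_nlms(mic2, mic1, taps=8, mu=0.5, eps=1e-8):
    """Adapt an FIR estimate of H1 so that MIC 2 filtered by h1 predicts
    MIC 1, using a noise-only segment (i.e., frames where the VAD is 0)."""
    h1 = np.zeros(taps)
    x = np.zeros(taps)                       # delay line of MIC 2 samples
    for m2, m1 in zip(mic2, mic1):
        x = np.roll(x, 1)
        x[0] = m2
        err = m1 - h1 @ x                    # prediction error
        h1 += mu * err * x / (eps + x @ x)   # normalized LMS update
    return h1

# Noise-only segment: the noise reaches MIC 2 directly (unity path) and
# leaks into MIC 1 through an assumed 3-tap path H1.
rng = np.random.default_rng(0)
noise = rng.standard_normal(4000)
true_h1 = np.array([0.6, -0.3, 0.1])
mic2 = noise
mic1 = np.convolve(noise, true_h1)[:len(noise)]
h1_est = estimate_h1_nlms(mic2, mic1, taps=3)   # converges toward true_h1
```

Because the adaptation runs sample by sample, it can simply be left running whenever the VAD reports no voicing, which is how the system tracks changes in the noise.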
[0041] After solving for one of the unknowns in Equation 1, H2(z) can be solved for by using the VAD to determine when voicing is occurring with little noise. When the VAD indicates voicing, but the recent (on the order of 1 second or so) history of the microphones indicates low levels of noise, assume that n(n) = N(z) = 0. Then Equation 1 reduces to

    M1s(z) = S(z)
    M2s(z) = S(z)H2(z)

which in turn leads to

    H2(z) = M2s(z) / M1s(z)

This calculation for H2(z) appears to be just the inverse of the H1(z) calculation, but remember that different inputs are being used. Note that H2(z) should be relatively constant, as there is always just a single source (the user) and the relative position between the user and the microphones should be relatively constant. Use of a small adaptive gain for the H2(z) calculation works well and makes the calculation more robust in the presence of noise.
[0042] Following the calculation of H1(z) and H2(z) above, they are used to remove the noise from the signal. Rewriting Equation 1 as

    S(z) = M1(z) - N(z)H1(z)
    N(z) = M2(z) - S(z)H2(z)

allows solving for S(z)

    S(z) = [M1(z) - M2(z)H1(z)] / [1 - H2(z)H1(z)]    (3)
`
`
Generally, H2(z) is quite small, and H1(z) is less than unity, so for most situations at most frequencies

    H2(z)H1(z) << 1

and the signal can be estimated using

    S(z) ~ M1(z) - M2(z)H1(z)    (4)

Therefore the assumption is made that H2(z) is not needed, and H1(z) is the only transfer function to be calculated. While H2(z) can be calculated if desired, good microphone placement and orientation can obviate the need for the H2(z) calculation.
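In the time domain, Equation 4 amounts to subtracting a filtered copy of MIC 2 from MIC 1. A minimal sketch follows; the fixed two-tap h1, the tone standing in for speech, and the assumption H2(z) = 0 are all illustrative:

```python
import numpy as np

def denoise(mic1, mic2, h1):
    """Equation 4: S(z) ~ M1(z) - M2(z)H1(z), applied in the time domain."""
    noise_in_mic1 = np.convolve(mic2, h1)[:len(mic1)]
    return mic1 - noise_in_mic1

rng = np.random.default_rng(1)
n = 8000
speech = np.sin(2 * np.pi * 200 * np.arange(n) / 8000)  # stand-in for s(n)
noise = rng.standard_normal(n)
h1 = np.array([0.5, -0.2])                   # assumed noise path into MIC 1
mic1 = speech + np.convolve(noise, h1)[:n]   # M1 = S + N*H1
mic2 = noise                                 # M2 = N (H2 taken as 0)
clean = denoise(mic1, mic2, h1)
```

With a perfect h1 and H2(z) = 0 the recovery is exact; in practice h1 comes from the adaptive estimate, and the residual noise depends on how well that estimate tracks the true noise path.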
[0043] Significant noise suppression can best be achieved through the use of multiple subbands in the processing of acoustic signals. This is because most adaptive filters used to calculate transfer functions are of the FIR type, which use only zeros and not poles to calculate a system that contains both zeros and poles as

    H1(z) = B(z) / A(z)

Such a model can be sufficiently accurate given enough taps, but this can greatly increase computational cost and convergence time. What generally occurs in an energy-based adaptive filter system such as the least-mean squares (LMS) system is that the system matches the magnitude and phase well at a small range of frequencies that contain more energy than other frequencies. This allows the LMS to fulfill its requirement to minimize the energy of the error to the best of its ability, but this fit may cause the noise in areas outside of the matching frequencies to rise, reducing the effectiveness of the noise suppression.
[0044] The use of subbands alleviates this problem. The signals from both the primary and secondary microphones are filtered into multiple subbands, and the resulting data from each subband (which can be frequency shifted and decimated if desired, but it is not necessary) is sent to its own adaptive filter. This forces the adaptive filter to try to fit the data in its own subband, rather than just where the energy is highest in the signal. The noise-suppressed results from each subband can be added together to form the final denoised signal at the end. Keeping everything time-aligned and compensating for filter shifts is essential, and the result is a much better model of the system than the single-subband model, at the cost of increased memory and processing requirements.
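The subband scheme described above can be sketched as follows; the windowed-sinc band split, the two band edges, and the NLMS parameters are all assumptions chosen for illustration (a real system would also gate adaptation with the VAD):

```python
import numpy as np

def bandpass(x, lo, hi, fs, taps=101):
    """Windowed-sinc FIR bandpass; mode='same' keeps both mics time-aligned."""
    n = np.arange(taps) - (taps - 1) / 2
    def lowpass(fc):
        return (2 * fc / fs) * np.sinc(2 * fc / fs * n) * np.hamming(taps)
    h = lowpass(hi) - (lowpass(lo) if lo > 0 else np.zeros(taps))
    return np.convolve(x, h, mode="same")

def subband_denoise(mic1, mic2, edges, fs, taps=8, mu=0.5):
    """Filter both microphones into subbands, adapt an independent NLMS
    filter in each band (forcing a good fit in every band, not just where
    the energy is highest), and sum the per-band denoised outputs."""
    out = np.zeros(len(mic1))
    for lo, hi in edges:
        b1, b2 = bandpass(mic1, lo, hi, fs), bandpass(mic2, lo, hi, fs)
        h = np.zeros(taps)
        x = np.zeros(taps)
        for i in range(len(b1)):
            x = np.roll(x, 1)
            x[0] = b2[i]
            e = b1[i] - h @ x                 # band error = denoised sample
            out[i] += e
            h += mu * e * x / (1e-8 + x @ x)  # in practice, adapt only when VAD = 0
    return out

# Noise-only demonstration: MIC 1 hears the MIC 2 noise through a short FIR path.
rng = np.random.default_rng(3)
noise = rng.standard_normal(4000)
mic2 = noise
mic1 = np.convolve(noise, [0.6, -0.3], mode="full")[:4000]
cleaned = subband_denoise(mic1, mic2, edges=[(0, 2000), (2000, 4000)], fs=8000)
residual = np.mean(cleaned[1000:] ** 2) / np.mean(mic1[1000:] ** 2)
```

With these particular edges the two band filters sum to an impulse, so the band split itself introduces no reconstruction error in this sketch; after the adaptive filters converge, the residual noise power is a small fraction of the input noise power.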
[0045] An example of the noise suppression performance using this system with an SSM VAD device is shown in FIG. 4. The top plot shows the original noisy acoustic signal 402 and the SSM-derived VAD signal 404, the middle plot displays the SSM signal as taken on the cheek 412, and the bottom plot shows the cleaned signal after noise suppression 422 using the Pathfinder algorithm outlined above.
[0046] More information may be found in the applications referenced above in the Introduction, part 1.
[0047] Microphone Configuration
[0048] In an embodiment of the Pathfinder noise suppression system, unidirectional or omnidirectional microphones may be employed. A variety of microphone configurations that enable Pathfinder are shown in the references in the Introduction, part 2. We will examine only a single embodiment as implemented in the Jawbone headset, but many implementations are possible as described in the references cited in the Introduction, so we are not limited to this embodiment.
[0049] The use of directional microphones has been very successful and is used to ensure that the transfer functions H1(z) and H2(z) remain significantly different. If they are too similar, the desired speech of the user can be significantly distorted. Even when they are dissimilar, some speech signal is received by the noise microphone. If it is assumed that H2(z) = 0, then, as in Equation 4 above, even assuming a perfect VAD there will be some distortion. This can be seen by referring to Equation 3 and solving for the result when H2(z) is not included:

    S_est(z) = S(z)[1 - H2(z)H1(z)]

This shows that the signal will be distorted by the factor 1 - H2(z)H1(z). Therefore, the type and amount of distortion will change depending on the noise environment. With very little noise, H1(z) is nearly zero and there is very little distortion. With noise present, the amount of distortion may change with the type, location, and intensity of the noise source(s). Good microphone configuration design minimizes these distortions.
[0050] An embodiment of an appropriate microphone configuration is one in which two directional microphones are used as shown in configuration 500 in FIG. 5. The relative angle φ between vectors normal to the faces of the microphones is in a range between 60 and 135 degrees. The distances d1 and d2 are each in the range of zero (0) to 15 centimeters, with best performance coming with distances between 0 and 2 cm. This configuration orients the speech microphone, termed MIC 1 above, toward the user's mouth, and the noise microphone, termed MIC 2 above, away from the user's mouth. Assuming that the two microphones are identical in terms of spatial and frequency response, changing the value of the angle φ will change the overlap of the responses of the microphones. This is demonstrated in FIG. 6 and FIG. 7 for cardioid microphones. In FIG. 6, a simulated spatial response at a single frequency is shown for a cardioid microphone. The body of the microphone is denoted by 602, the response by 610, the null of the response by 612, and the maximum of the response by 614. In FIG. 7, the responses of two cardioid microphones are shown with φ = 90 degrees. The responses overlap, and where the response of Mic1 is greater than that of Mic2 the gain

    G = M1(z) / M2(z)

is greater than 1 (730), and where the response of Mic1 is less than that of Mic2, G is less than 1 (720). Clearly, as the angle φ between the microphones is varied, the amount of overlap and thus the areas where G is greater or less than one vary as well. This variation affects the noise suppression performance both in terms of the amount of noise suppression and the amount of speech distortion, and a good compromise between the two must be found by adjusting φ until satisfactory performance is realized.
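The overlap behavior of FIGS. 6 and 7 can be reproduced with ideal cardioid patterns. This is a purely geometric sketch; real capsules, mounting, and venting will differ:

```python
import numpy as np

def cardioid(theta, axis):
    """Ideal cardioid magnitude response: unity on-axis, null at 180 degrees."""
    return 0.5 * (1.0 + np.cos(theta - axis))

phi = np.deg2rad(90)                   # angle between the two microphone axes
theta = np.linspace(0.0, 2.0 * np.pi, 721)
mic1_resp = cardioid(theta, 0.0)       # speech mic aimed at the mouth (0 rad)
mic2_resp = cardioid(theta, phi)       # noise mic rotated away by phi
G = mic1_resp / np.maximum(mic2_resp, 1e-12)   # overlap gain M1/M2 vs. angle
```

Sweeping phi between 60 and 135 degrees and inspecting where G crosses 1 shows how the regions corresponding to 730 (speech mic dominant) and 720 (noise mic dominant) grow and shrink with the mounting angle.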
`
`Page 25 of 31
`
`
`
`US 2006/0120537 A1
`
`Jun. 8, 2006
`
[0051] In addition, the overlap of microphone responses can be induced or further changed by the addition of front and rear vents to the microphone mount. These vents change the response of the microphone by altering the delay between the front and rear faces of the diaphragm. Thus, vents can be used to alter the response overlap and thereby change the denoising performance of the system.
Design Tips:
[0052] A good microphone configuration can be difficult to construct. The foundation of the process is to use two microphones that have similar noise fields and different speech fields. Simply put, to the microphones the noise should appear to be about the same and the speech should be different. This similarity for noise and difference for speech allows the algorithm to remove noise efficiently and remove speech poorly, which is desired. Proximity effects can be used to further increase the noise/speech difference (NSD) when the microphones are located close to the mouth, but orientation is the primary difference vehicle when the microphones are more than about five to ten centimeters from the mouth. The NSD is defined as the difference in the speech energy detected by the microphones minus the difference in the noise energy, in dB. NSDs of 4-6 dB result in both good noise suppression and low speech distortion. NSDs of 0-4 dB result in excellent noise suppression but high speech distortion, and NSDs of 6+ dB result in good to poor noise suppression and very low speech distortion. Naturally, since the response of a directional microphone is directly related to frequency, the NSD will also be frequency dependent, and different frequencies of the same noise or speech may be denoised or devoiced by different amounts depending on the NSD for that frequency.
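The NSD definition above can be computed directly from speech-only and noise-only recordings of the two microphones. The recordings below are hypothetical stand-ins constructed so that the answer is known:

```python
import numpy as np

def energy_db(x):
    """Mean signal energy in dB."""
    return 10 * np.log10(np.mean(np.asarray(x, dtype=float) ** 2))

def nsd_db(speech_mic1, speech_mic2, noise_mic1, noise_mic2):
    """Noise/speech difference: the speech-energy difference between the
    microphones minus the noise-energy difference, in dB."""
    speech_diff = energy_db(speech_mic1) - energy_db(speech_mic2)
    noise_diff = energy_db(noise_mic1) - energy_db(noise_mic2)
    return speech_diff - noise_diff

# Hypothetical capture: speech is 5 dB hotter in Mic1, noise equal in both.
rng = np.random.default_rng(2)
s2 = rng.standard_normal(8000)
s1 = s2 * 10 ** (5 / 20)     # +5 dB speech level in Mic1
n1 = rng.standard_normal(8000)
n2 = n1.copy()               # identical noise energy in both mics
nsd = nsd_db(s1, s2, n1, n2)   # 5.0 dB: inside the 4-6 dB sweet spot
```

A configuration measured this way at several frequencies (band-filtering the recordings first) exposes the frequency dependence of the NSD noted in the text.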
[0053] Another very important stipulation is that there should be little or no noise in Mic1 that is not detected in some way by Mic2. In fact, generally, the closer the levels (energies) of the noise in Mic1 and Mic2, the better the noise suppression. However, if the speech levels are about the same in both microphones, then speech distortion due to de-voicing will also be high, and the overall increase in SNR may be low. Therefore it is crucial that the noise levels be as similar as possible while the speech levels are as different as possible. It is normally not possible to simultaneously minimize noise differences while maximizing speech differences, so a compromise must be made. Experimentation with a configuration can often yield one that works reasonably well for noise suppression with acceptable speech distortion.
In summary, the design process rules can be stated as follows:
[0054] 1. The noise energy should be about the same in both microphones.
[0055] 2. The speech energy has to be different in the microphones.
[0056] 3. Take advantage of proximity effect to maximize NSD.
[0057] 4. Keep the distance between the microphones as small as practical.
[0058] 5. Use venting effects on the directionality of the microphones to get the NSD to around 4-6 dB.
[0059] In the configuration above, the amount of response overlap, and therefore the angle φ between the axes of the microphones, will depend on the responses of the microphones as well as the mounting and venting of the microphones. However, a useable configuration is readily found through experimentation.
[0060] The microphone configuration implementation described above is a specific implementation of one of many possible implementations, but the scope of this application is not so limited. There are many ways to specifically implement the ideas and techniques presented above, and the specified implementation is simply one of many that are possible. For example, the references cited in the Introduction contain many different variations on the configuration of the microphones.
[0061] VAD Device
[0062] The VAD device for the Jawbone headset is based upon the references given in the Introduction, part 3. It is an acoustic vibration sensor, also referred to as a speech sensing device or as a Skin Surface Microphone (SSM), and is described below. The acoustic vibration sensor is similar to a microphone in that it captures speech information from the head area of a human talker in noisy environments. However, it is different from a conventional microphone in that it is designed to be more sensitive to speech frequencies detected on the skin of the user than to environmental acoustic noise. This technique is normally only successful for a limited range of frequencies (normally ~100 Hz to 1000 Hz, depending on the noise level), but this is normally sufficient for excellent VAD performance.
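The windowed-energy decision of the FIG. 3 flow chart can be sketched as follows. The window length, step, and threshold are assumed values, and the real algorithm also removes noise-corrupted spectral information before computing the energy:

```python
import numpy as np

def ssm_energy_vad(ssm, win=160, step=80, thresh=0.01):
    """Segment the digitized SSM signal into overlapping windows, compute
    the energy of each window, and flag windows whose energy exceeds the
    threshold as voiced speech (below threshold: unvoiced/silence)."""
    flags = []
    for start in range(0, len(ssm) - win + 1, step):
        w = ssm[start:start + win]
        flags.append(bool(np.mean(w ** 2) > thresh))
    return np.array(flags)

# Silence followed by a strong "voiced" burst (synthetic stand-in for SSM data).
sig = np.concatenate([np.zeros(800),
                      0.5 * np.sin(2 * np.pi * 150 * np.arange(800) / 8000)])
vad = ssm_energy_vad(sig)   # False over the silence, True over the burst
```

The resulting per-window flags are exactly what the Pathfinder system consumes: adaptation of H1(z) runs where the flag is False, and the H2(z)/speech path logic runs where it is True.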
[0063] Previous solutions to this problem have either been vulnerable to noise, physically too large for certain applications, or cost prohibitive. In contrast, the acoustic vibration sensor described herein accurately detects and captures speech vibrations in the presence of substantial airborne acoustic noise, yet within a smaller and cheaper physical package. The noise-immune speech information provided by the acoustic vibration sensor can subsequently be used in downstream speech processing applications (speech enhancement and noise suppression, speech encoding, speech recognition, talker verification, etc.) to improve the performance of those applications.
[0064] The following description provides specific details for a thorough understanding of, and enabling description for, embodiments of a transducer. However, one skilled in the art will understand that the invention may be practiced without these details. In other instances, well-known structures and functions have not been shown or described in detail to avoid unnecessarily obscuring the description of the embodiments of the invention.
[0065] FIG. 1-A is a cross section view of an acoustic vibration sensor 100, also referred to herein as the sensor 100, under an embodiment. FIG. 2A-A is an exploded view of an acoustic vibration sensor 100, under the embodiment of FIG. 1-A. FIG. 2B-A is a perspective view of an acoustic vibration sensor 100, under the embodiment of FIG. 1-A. The sensor 100 includes an enclosure 102 having a first port 104 on a first side and at least one second port 106 on a second side of the enclosure 102. A diaphragm 108, also referred to as a sensing diaphragm 108, is positioned between the first and second ports. A coupler 110, also referred to as the shroud 110 or cap 110, forms an acoustic
`
seal around the enclosure 102 so that the first port 104 and the side of the diaphragm facing the first port 104 are isolated from the airborne acoustic environment of the human talker. The coupler 110 of an embodiment is contiguous, but is not so limited. The second port 106 couples a second side of the diaphragm to the external environment.
[0066] The sensor also includes electret material 120 and the associated components and electronics coupled to receive acoustic signals from the talker via the coupler 110 and the diaphragm 108 and convert the acoustic signals to electrical signals. Electrical contacts 130 provide the electrical signals as an output. Alternative embodiments can use any type/combination of materials and/or electronics to convert the acoustic signals to electrical signals and output the electrical signals.
[0067] The coupler 110 of an embodiment is formed using materials having acoustic impedances similar to the impedance of human skin (the characteristic acoustic impedance of skin is approximately 1.5×10^6 Pa·s/m). The coupler 110, therefore, is formed using a material that includes at least one of silicone gel, dielectric gel, thermoplastic elastomers (TPE), and rubber compounds, but is not so limited. As an example, the coupler 110 of an embodiment is formed using Kraiburg TPE products. As another example, the coupler 110 of an embodiment is formed using Sylgard® silicone products.
[0068] The coupler 110 of an embodiment includes a contact device 112 that includes, for example, a nipple or protrusion that protrudes from either or both sides of the coupler 110. In operation, a contact device 112 that protrudes from both sides of the coupler 110 includes one side of the contact device 112 that is in contact with the skin surface of the talker and another side of the contact device 112 that is in contact with the diaphragm, but the embodiment is not so limited. The coupler 110 and the contact device 112 can be formed from the same or different materials.
[0069] The coupler 110 transfers acoustic energy efficiently from the skin/flesh of a talker to the diaphragm, and seals the diaphragm from ambient airborne acoustic signals. Consequently, the coupler 110 with the contact device 112 efficiently transfers acoustic signals directly from the talker's body (speech vibrations) to the diaphragm while isolating the diaphragm from acoustic signals in the airborne environment of the talker (the characteristic acoustic impedance of air is approximately 415 Pa·s/m). The diaphragm is isolated from acoustic signals in the airborne environment of the talker by the coupler 110 because the coupler 110 prevents the signals from reaching the diaphragm, thereby reflecting and/or dissipating much of the energy of the acoustic signals in the airborne environment. Consequently, the sensor 100 responds primarily to acoustic energy transferred from the skin of the talker, not the air. When placed against the head of the talker, the sensor 100 picks up speech-induced acoustic signals on the surface of the skin while airborne acoustic noise signals are largely rejected, thereby increasing the signal-to-noise ratio and providing a very reliable source of speech information.
[0070] Performance of the sensor 100 is enhanced through the use of the seal provided between the diaphragm and the airborne environment of the talker. The seal is provided by the coupler 110. A modified gradient microphone is used in an embodiment because it has pressure ports on both ends. Thus, when the first port 104 is sealed by the coupler 110, the second port 106 provides a vent for air movement through the sensor 100. The second port is not required for operation, but does increase the sensitivity of the device to tissue-borne acoustic signals. The second port also allows more environmental acoustic noise to be detected by the device, but the diaphragm's sensitivity to environmental acoustic noise is significantly decreased by the loading of the coupler 110, so the increase in sensitivity to the user's speech is greater than the increase in sensitivity to environmental noise.
`0071
`FIG. 3-A is a schematic diagram of a coupler 110
`of an acoustic vibration sensor, under the embodiment of
`FIG. 1-A. The dimensions shown are in millimeters and are
`only intended to serve as an example for one embodiment.
`Alternative embodiments of the coupler can have different
`configurations and/or dimensions. The dimensions of the
`coupler 110 show that the acoustic vibration sensor 100 is
`small (5-7mm in diameter and 3-5 mm thick on average) in
`that the sensor 100 of an embodiment is approximately the
`same size as typical microphone capsules found in mobile
`communication devices. This small form factor allows for
`use of the sensor 110 in highly mobile miniaturized appli
`cations, where some example applications include at least
`one of cellular telephones, satellite telephones, portable
`telephones, wireline telephones, Internet telephones, wire
`les