United States Patent [19]
Freeman et al.

[11] Patent Number: 5,276,765
[45] Date of Patent: Jan. 4, 1994

[54] VOICE ACTIVITY DETECTION
[75] Inventors: Daniel K. Freeman; Ivan Boyd, both of Ipswich, England
[73] Assignee: British Telecommunications public limited Company, London, England
[21] Appl. No.: 952,147
[22] PCT Filed: Mar. 10, 1989
[86] PCT No.: PCT/GB89/00247
     § 371 Date: Aug. 15, 1990
     § 102(e) Date: Aug. 15, 1990
[87] PCT Pub. No.: WO89/08910
     PCT Pub. Date: Sep. 21, 1989

Related U.S. Application Data
[63] Continuation of Ser. No. 555,445, Aug. 15, 1990, abandoned.

[30] Foreign Application Priority Data
Mar. 11, 1988 [GB] United Kingdom 8805795
Aug. 6, 1988 [GB] United Kingdom 8813346
Aug. 24, 1988 [GB] United Kingdom 8820105

[51] Int. Cl.5: G10L 5/00
[52] U.S. Cl.: 395/2
[58] Field of Search: 395/2; 381/71, 94, 46-50

[56] References Cited
U.S. PATENT DOCUMENTS
4,227,046 10/1980 Nakajima et al. 381/47
4,283,601 8/1981 Nakajima et al. 381/47
4,338,738 11/1982 Kahn 381/94
4,672,669 6/1987 DesBlache et al. 381/46
4,696,039 9/1987 Doddington 381/46
4,731,846 3/1988 Secrest et al. 381/49

OTHER PUBLICATIONS
Rabiner et al., "Application of an LPC Distance Measure to the Voiced-Unvoiced-Silence Detection Problem", IEEE Trans. on ASSP, vol. ASSP-25, No. 4, Aug. 1977, pp. 338-343.
McAulay, "Optimum Speech Classification and Its Application to Adaptive Noise Cancellation", 1977 IEEE ICASSP, Hartford, CN, May 9-11, 1977, pp. 425-428.
Un, "Improving LPC Analysis of Noisy Speech by Autocorrelation Subtraction Method", ICASSP '81, Atlanta, GA, Mar. 30, 31, Apr. 1981, pp. 1082-1085.

Primary Examiner-David D. Knepper
Attorney, Agent, or Firm-Nixon & Vanderhye
[57] ABSTRACT
Voice activity detector (VAD) for use in an LPC coder in a mobile radio system uses autocorrelation coefficients $R_0, R_1, \ldots$ of the input signal, weighted and combined, to provide a measure M which depends on the power within that part of the spectrum containing no noise, which is thresholded against a variable threshold to provide a speech/no speech logic output. The measure is $M = R_0 H_0 + 2\sum_{i=1}^{N} R_i H_i$ (formula (I)), where $H_i$ are the autocorrelation coefficients of the impulse response of an Nth order FIR inverse noise filter derived from LPC analysis of previous non-speech signal frames. Threshold adaption and coefficient update are controlled by a second VAD responsive to rate of spectral change between frames.
`
`23 Claims, 3 Drawing Sheets
`
[Front-page drawing: excerpt of FIG. 3 showing threshold adapter 29, unit 30, and control signal generator 20 (speech signal input, LPC analysis 21, units 22 and 23, pitch analysis 27).]
`
`Page 1 of 10
`
`GOOGLE EXHIBIT 1022
`
`

`

U.S. Patent    Jan. 4, 1994    Sheet 1 of 3    5,276,765

[FIG. 1: block diagram of the first embodiment. Speech input 1 (signal s) feeds ADC 2, LPC analysis 3 (LPC coefficients L_i for speech coding) and autocorrelator (ACF) 4, which outputs the autocorrelation coefficients R_i. Noise input 11 (N) feeds ADC 12, LPC analysis 13 (LPC coefficients of noise) and ACF 14. Multiplier 5 (X) and adder 6 combine the two coefficient sets, giving the speech/non-speech logic output 8.]
`
`

`

[Sheet 2 of 3, FIG. 2: second embodiment. Input 1 feeds ADC 2, LPC analysis 3 (LPC coefficients L_i) and ACF 4 (R_i). ACF 14 feeds, via switch 16, buffer 15 holding the "noise" values. Weighting and adding means 5, 6 (X, +) and thresholder 7 give the speech/non-speech output 8.]
`
`

`

[Sheet 3 of 3, FIG. 3: third, preferred embodiment. Input 1 feeds ADC 2, inverse filter analysis 3 and autocorrelator 4. Autocorrelator 14 (A_i) feeds buffer 15. Weighting and adding means 5, 6 (X, +) lead to the speech/non-speech output 8 via unit 30. Threshold adapter 29 is driven by control signal generator 20, which contains LPC analysis 21, units 22 and 23, and pitch analysis 27.]
`
`

`

VOICE ACTIVITY DETECTION

This is a continuation of application Ser. No. 07/555,445, filed Aug. 15, 1990, now abandoned.

BACKGROUND OF THE INVENTION

A voice activity detector is a device which is supplied with a signal with the object of detecting periods of speech, or periods containing only noise. Although the present invention is not limited thereto, one application of particular interest for such detectors is in mobile radio telephone systems where the knowledge as to the presence or otherwise of speech can be used and exploited by a speech coder to improve the efficient utilisation of radio spectrum, and where also the noise level (from a vehicle-mounted unit) is likely to be high.

The essence of voice activity detection is to locate a measure which differs appreciably between speech and non-speech periods. In apparatus which includes a speech coder, a number of parameters are readily available from one or other stage of the coder, and it is therefore desirable to economise on processing needed by utilising some such parameter. In many environments, the main noise sources occur in known defined areas of the frequency spectrum. For example, in a moving car much of the noise (e.g., engine noise) is concentrated in the low frequency regions of the spectrum. Where such knowledge of the spectral position of noise is available, it is desirable to base the decision as to whether speech is present or absent upon measurements taken from that portion of the spectrum which contains relatively little noise. It would, of course, be possible in practice to pre-filter the signal before analysing to detect speech activity, but where the voice activity detector follows the output of a speech coder, prefiltering would distort the voice signal to be coded.

SUMMARY OF THE INVENTION

According to the invention there is provided a voice activity detection apparatus comprising means for receiving an input signal, means for periodically adaptively generating an estimate of the noise signal component of the input signal, means for periodically forming a measure M of the spectral similarity between a portion of the input signal and the noise signal component, means for comparing a parameter derived from the measure M with a threshold value T, and means for producing an output to indicate the presence or absence of speech in dependence upon whether or not that value is exceeded.

Preferably, the measure is the Itakura-Saito Distortion Measure.

BRIEF DESCRIPTION OF THE DRAWINGS

Other aspects of the present invention are as defined in the claims.

Some embodiments of the invention will now be described, by way of example, with reference to the accompanying drawings, in which:
FIG. 1 is a block diagram of a first embodiment of the invention;
FIG. 2 shows a second embodiment of the invention;
FIG. 3 shows a third, preferred embodiment of the invention.

DETAILED DESCRIPTION OF THE DRAWINGS

The general principle underlying a first Voice Activity Detector according to a first embodiment of the invention is as follows.

A frame of n signal samples $(s_0, s_1, \ldots, s_{n-1})$, when passed through a notional fourth order FIR filter with impulse response $(1, h_0, h_1, h_2, h_3)$, gives a filtered signal (ignoring samples from previous frames)

$$
\begin{aligned}
s' = {} & (s_0), \\
& (s_1 + h_0 s_0), \\
& (s_2 + h_0 s_1 + h_1 s_0), \\
& (s_3 + h_0 s_2 + h_1 s_1 + h_2 s_0), \\
& (s_4 + h_0 s_3 + h_1 s_2 + h_2 s_1 + h_3 s_0), \\
& (s_5 + h_0 s_4 + h_1 s_3 + h_2 s_2 + h_3 s_1), \\
& (s_6 + h_0 s_5 + h_1 s_4 + h_2 s_3 + h_3 s_2), \\
& (s_7 \ldots)
\end{aligned}
$$

The zero order autocorrelation coefficient is the sum of each term squared, which may be normalized, i.e. divided by the total number of terms (for constant frame lengths it is easier to omit the division); that of the filtered signal is thus

$$R'_0 = \sum_{i=0}^{n-1} (s'_i)^2$$

and this is therefore a measure of the power of the notional filtered signal s'; in other words, of that part of the signal s which falls within the passband of the notional filter.

Expanding, neglecting the first 4 terms,

$$
\begin{aligned}
R'_0 = {} & (s_4 + h_0 s_3 + h_1 s_2 + h_2 s_1 + h_3 s_0)^2 \\
& + (s_5 + h_0 s_4 + h_1 s_3 + h_2 s_2 + h_3 s_1)^2 \\
& + \ldots \\
= {} & s_4^2 + h_0 s_4 s_3 + h_1 s_4 s_2 + h_2 s_4 s_1 + h_3 s_4 s_0 \\
& + h_0 s_4 s_3 + h_0^2 s_3^2 + h_0 h_1 s_3 s_2 + h_0 h_2 s_3 s_1 + h_0 h_3 s_3 s_0 \\
& + h_1 s_4 s_2 + h_0 h_1 s_3 s_2 + h_1^2 s_2^2 + h_1 h_2 s_2 s_1 + h_1 h_3 s_2 s_0 \\
& + h_2 s_4 s_1 + h_0 h_2 s_3 s_1 + h_1 h_2 s_2 s_1 + h_2^2 s_1^2 + h_2 h_3 s_1 s_0 \\
& + h_3 s_4 s_0 + h_0 h_3 s_3 s_0 + h_1 h_3 s_2 s_0 + h_2 h_3 s_1 s_0 + h_3^2 s_0^2 \\
& + \ldots \\
= {} & R_0 (1 + h_0^2 + h_1^2 + h_2^2 + h_3^2) \\
& + R_1 (2h_0 + 2h_0 h_1 + 2h_1 h_2 + 2h_2 h_3) \\
& + R_2 (2h_1 + 2h_1 h_3 + 2h_0 h_2) \\
& + R_3 (2h_2 + 2h_0 h_3) \\
& + R_4 (2h_3)
\end{aligned}
$$

So $R'_0$ can be obtained from a combination of the autocorrelation coefficients $R_i$, weighted by the bracketed constants which determine the frequency band to which the value of $R'_0$ is responsive. In fact, the bracketed terms are the autocorrelation coefficients of the impulse response of the notional filter, so that the expression above may be simplified to

$$R'_0 = R_0 H_0 + 2 \sum_{i=1}^{N} R_i H_i, \qquad (1)$$

where N is the filter order and $H_i$ are the (un-normalised) autocorrelation coefficients of the impulse response of the filter.

In other words, the effect on the signal autocorrelation coefficients of filtering a signal may be simulated by producing a weighted sum of the autocorrelation coefficients of the (unfiltered) signal, using the impulse response that the required filter would have had.

Thus, a relatively simple algorithm, involving a small number of multiplication operations, may simulate the effect of a digital filter requiring typically a hundred times this number of multiplication operations.
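As an illustration only, the equivalence stated by equation 1 can be checked numerically. The following Python sketch is not taken from the patent: the function names, the test frame and the filter taps are all hypothetical choices. It compares the power obtained by actually filtering a 160-sample frame with the value obtained from the weighted sum of autocorrelation coefficients.

```python
import math

def autocorr(x, max_lag):
    """Un-normalised autocorrelation coefficients for lags 0..max_lag."""
    n = len(x)
    return [sum(x[k] * x[k - lag] for k in range(lag, n))
            for lag in range(max_lag + 1)]

def power_by_filtering(s, h):
    """Energy of s after FIR filtering with impulse response h.
    Samples before the start of the frame are taken as zero."""
    energy = 0.0
    for k in range(len(s)):
        y = sum(h[j] * s[k - j] for j in range(min(k, len(h) - 1) + 1))
        energy += y * y
    return energy

def power_by_equation_1(s, h):
    """R'_0 = R_0*H_0 + 2*sum_{i=1..N} R_i*H_i, with N = len(h) - 1."""
    order = len(h) - 1
    R = autocorr(s, order)  # autocorrelation R_i of the unfiltered signal
    H = autocorr(h, order)  # autocorrelation H_i of the impulse response
    return R[0] * H[0] + 2.0 * sum(R[i] * H[i] for i in range(1, order + 1))

# Hypothetical test data: one frame and an arbitrary fourth order filter
# written as (1, h0, h1, h2, h3).
frame = [math.sin(0.3 * k) + 0.5 * math.sin(1.7 * k) for k in range(160)]
h = [1.0, -0.9, 0.3, 0.1, -0.05]
direct = power_by_filtering(frame, h)
weighted = power_by_equation_1(frame, h)
print(direct, weighted)
```

The two values differ only by frame-edge terms, which is the approximation the text refers to. Once the coefficients R_i are available from the coder, the weighted sum needs N + 1 multiplications per frame, whereas filtering the frame directly needs about n(N + 1).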
`
This filtering operation may alternatively be viewed as a form of spectrum comparison, with the signal spectrum being matched against a reference spectrum (the inverse of the response of the notional filter). Since the notional filter in this application is selected so as to approximate the inverse of the noise spectrum, this operation may be viewed as a spectral comparison between speech and noise spectra, and the zeroth autocorrelation coefficient thus generated (i.e. the energy of the inverse filtered signal) as a measure of dissimilarity between the spectra. The Itakura-Saito distortion measure is used in LPC to assess the match between the predictor filter and the input spectrum, and in one form is expressed as

$$M = R_0 A_0 + 2 \sum_{i=1}^{N} R_i A_i,$$

where $A_0$ etc are the autocorrelation coefficients of the LPC parameter set. It will be seen that this is closely similar to the relationship derived above, and when it is remembered that the LPC coefficients are the taps of an FIR filter having the inverse spectral response of the input signal so that the LPC coefficient set is the impulse response of the inverse LPC filter, it will be apparent that the Itakura-Saito Distortion Measure is in fact merely a form of equation 1, wherein the filter response H is the inverse of the spectral shape of an all-pole model of the input signal.

In fact, it is also possible to transpose the spectra, using the LPC coefficients of the test spectrum and the autocorrelation coefficients of the reference spectrum, to obtain a different measure of spectral similarity.

The I-S Distortion measure is further discussed in "Speech Coding based upon Vector Quantisation" by A Buzo, A H Gray, R M Gray and J D Markel, IEEE Trans on ASSP, Vol ASSP-28, No 5, October 1980.

Since the frames of signal have only a finite length, and a number of terms (N, where N is the filter order) are neglected, the above result is an approximation only; it gives, however, a surprisingly good indicator of the presence or absence of speech and thus may be used as a measure M in speech detection. In an environment where the noise spectrum is well known and stationary, it is quite possible to simply employ fixed $h_0$, $h_1$ etc coefficients to model the inverse noise filter.

However, apparatus which can adapt to different noise environments is much more widely useful.

Referring to FIG. 1, in a first embodiment, a signal from a microphone (not shown) is received at an input 1 and converted to digital samples s at a suitable sampling rate by an analogue to digital converter 2. An LPC analysis unit 3 (in a known type of LPC coder) then derives, for successive frames of n (e.g. 160) samples, a set of N (e.g. 8 or 12) LPC filter coefficients $L_i$ which are transmitted to represent the input speech. The speech signal s also enters a correlator unit 4 (normally part of the LPC coder 3 since the autocorrelation vector $R_i$ of the speech is also usually produced as a step in the LPC analysis although it will be appreciated that a separate correlator could be provided). The correlator 4 produces the autocorrelation vector $R_i$, including the zero order correlation coefficient $R_0$ and at least 2 further autocorrelation coefficients $R_1$, $R_2$, $R_3$. These are then supplied to a multiplier unit 5.

A second input 11 is connected to a second microphone located distant from the speaker so as to receive only background noise. The input from this microphone is converted to a digital input sample train by AD converter 12 and LPC analysed by a second LPC analyser 13. The "noise" LPC coefficients produced from analyser 13 are passed to correlator unit 14, and the autocorrelation vector thus produced is multiplied term by term with the autocorrelation coefficients $R_i$ of the input signal from the speech microphone in multiplier 5 and the weighted coefficients thus produced are combined in adder 6 according to Equation 1, so as to apply a filter having the inverse shape of the noise spectrum from the noise-only microphone (which in practice is the same as the shape of the noise spectrum in the signal-plus-noise microphone) and thus filter out most of the noise. The resulting measure M is thresholded by thresholder 7 to produce a logic output 8 indicating the presence or absence of speech; if M is high, speech is deemed to be present.

This embodiment does, however, require two microphones and two LPC analysers, which adds to the expense and complexity of the equipment necessary.

Alternatively, another embodiment uses a corresponding measure formed using the autocorrelations from the noise microphone 11 and the LPC coefficients from the main microphone 1, so that an extra autocorrelator rather than an LPC analyser is necessary.

These embodiments are therefore able to operate within different environments having noise at different frequencies, or within a changing noise spectrum in a given environment.

Referring to FIG. 2, in the preferred embodiment of the invention, there is provided a buffer 15 which stores a set of LPC coefficients (or the autocorrelation vector of the set) derived from the microphone input 1 in a period identified as being a "non speech" (i.e. noise only) period. These coefficients are then used to derive a measure using equation 1, which also of course corresponds to the Itakura-Saito Distortion Measure, except that a single stored frame of LPC coefficients corresponding to an approximation of the inverse noise spectrum is used, rather than the present frame of LPC coefficients.

The LPC coefficient vector $L_i$ output by analyser 3 is also routed to a correlator 14, which produces the autocorrelation vector of the LPC coefficient vector. The buffer memory 15 is controlled by the speech/non-speech output of thresholder 7, in such a way that during "speech" frames the buffer retains the "noise" autocorrelation coefficients, but during "noise" frames a new set of LPC coefficients may be used to update the buffer, for example by a multiple switch 16, via which outputs of the correlator 14, carrying each autocorrelation coefficient, are connected to the buffer 15. It will be appreciated that correlator 14 could be positioned after buffer 15. Further, the speech/no-speech decision for coefficient update need not be from output 8, but could be (and preferably is) otherwise derived.

Since frequent periods without speech occur, the LPC coefficients stored in the buffer are updated from time to time, so that the apparatus is thus capable of tracking changes in the noise spectrum. It will be appreciated that such updating of the buffer may be necessary only occasionally, or may occur only once at the start of operation of the detector, if (as is often the case) the noise spectrum is relatively stationary over time, but in a mobile radio environment frequent updating is preferred.
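The FIG. 2 arrangement can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the patented implementation: the class and function names are hypothetical, the LPC analysis is done with the standard Levinson-Durbin recursion (the text does not say which method analyser 3 uses), the first frame is assumed to contain noise only, and the threshold is simply fixed at a hypothetical multiple of the measure seen on that first frame.

```python
import math
import random

def autocorr(x, max_lag):
    """Un-normalised autocorrelation coefficients for lags 0..max_lag."""
    n = len(x)
    return [sum(x[k] * x[k - lag] for k in range(lag, n))
            for lag in range(max_lag + 1)]

def levinson(R, order):
    """Levinson-Durbin recursion: inverse (prediction error) filter taps
    [1, a_1, ..., a_order] from autocorrelation coefficients R."""
    a = [1.0] + [0.0] * order
    err = R[0]
    for i in range(1, order + 1):
        if err <= 0.0:          # silent or degenerate frame: stop early
            break
        acc = R[i] + sum(a[j] * R[i - j] for j in range(1, i))
        k = -acc / err
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= 1.0 - k * k
    return a

def measure(R, A):
    """M = R_0*A_0 + 2*sum_i R_i*A_i (equation 1 with stored noise A_i)."""
    return R[0] * A[0] + 2.0 * sum(r * a for r, a in zip(R[1:], A[1:]))

class BufferedVad:
    """Hypothetical sketch of FIG. 2: buffer 15 holds the noise-period A_i."""

    def __init__(self, order=8, threshold_factor=3.0):
        self.order = order
        self.threshold_factor = threshold_factor  # assumed, not from the text
        self.noise_A = None                       # contents of buffer 15
        self.threshold = None

    def process(self, frame):
        R = autocorr(frame, self.order)                    # correlator 4
        A = autocorr(levinson(R, self.order), self.order)  # analyser 3 + correlator 14
        if self.noise_A is None:
            # Assumption: the first frame is noise only and seeds the buffer.
            self.noise_A = A
            self.threshold = self.threshold_factor * measure(R, A)
            return False
        speech = measure(R, self.noise_A) > self.threshold  # thresholder 7
        if not speech:
            self.noise_A = A    # switch 16: update the buffer on noise frames
        return speech

def make_frame(rng, state, n=160, tone=0.0):
    """Test signal: low-frequency noise plus an optional high-frequency tone."""
    frame = []
    y = state
    for k in range(n):
        y = 0.95 * y + rng.gauss(0.0, 1.0)
        frame.append(y + tone * math.sin(2.5 * k))
    return frame, y
```

Feeding `BufferedVad.process` a sequence of frames from `make_frame`, with `tone` set to zero for noise frames and to a non-zero amplitude for "speech" frames, shows the buffer being refreshed on noise frames and held during speech. As the text notes, taking the update decision from the detector's own output is the simplest option but not the preferred one.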
`
In a modification of this embodiment, the system initially employs equation 1 with coefficient terms corresponding to a simple fixed high pass filter, and then subsequently starts to adapt by switching over to using "noise period" LPC coefficients. If, for some reason, speech detection fails, the system may return to using the simple high pass filter.

It is possible to normalise the above measure by dividing through by $R_0$, so that the expression to be thresholded has the form

$$M = A_0 + 2 \sum_{i=1}^{N} \frac{R_i A_i}{R_0}$$

This measure is independent of the total signal energy in a frame and is thus compensated for gross signal level changes, but gives rather less marked contrast between "noise" and "speech" levels and is hence preferably not employed in high-noise environments.

Instead of employing LPC analysis to derive the inverse filter coefficients of the noise signal (from either the noise microphone or noise only periods, as in the various embodiments described above), it is possible to model the inverse noise spectrum using an adaptive filter of known type; as the noise spectrum changes only slowly (as discussed below) a relatively slow coefficient adaption rate common for such filters is acceptable. In one embodiment, which corresponds to FIG. 1, LPC analysis unit 13 is simply replaced by an adaptive filter (for example a transversal FIR or lattice filter), connected so as to whiten the noise input by modelling the inverse filter, and its coefficients are supplied as before to autocorrelator 14.

In a second embodiment, corresponding to that of FIG. 2, LPC analysis means 3 is replaced by such an adaptive filter, and buffer means 15 is omitted, but switch 16 operates to prevent the adaptive filter from adapting its coefficients during speech periods.

A second Voice Activity Detector for use with another embodiment of the invention will now be described.

From the foregoing, it will be apparent that the LPC coefficient vector is simply the impulse response of an FIR filter which has a response approximating the inverse spectral shape of the input signal. When the Itakura-Saito Distortion Measure between adjacent frames is formed, this is in fact equal to the power of the signal, as filtered by the LPC filter of the previous frame. So if spectra of adjacent frames differ little, a correspondingly small amount of the spectral power of a frame will escape filtering and the measure will be low. Correspondingly, a large interframe spectral difference produces a high Itakura-Saito Distortion Measure, so that the measure reflects the spectral similarity of adjacent frames. In a speech coder, it is desirable to minimise the data rate, so frame length is made as long as possible; in other words, if the frame length is long enough, then a speech signal should show a significant spectral change from frame to frame (if it does not, the coding is redundant). Noise, on the other hand, has a slowly varying spectral shape from frame to frame, and so in a period where speech is absent from the signal then the Itakura-Saito Distortion Measure will correspondingly be low, since applying the inverse LPC filter from the previous frame "filters out" most of the noise power.

Typically, the Itakura-Saito Distortion Measure between adjacent frames of a noisy signal containing intermittent speech is higher during periods of speech than periods of noise; the degree of variation (as illustrated by the standard deviation) is also higher, and less intermittently variable.

It is noted that the standard deviation of the standard deviation of M is also a reliable measure; the effect of taking each standard deviation is essentially to smooth the measure.

In this second form of Voice Activity Detector, the measured parameter used to decide whether speech is present is preferably the standard deviation of the Itakura-Saito Distortion Measure, but other measures of variance and other spectral distortion measures (based for example on FFT analysis) could be employed.

It is found advantageous to employ an adaptive threshold in voice activity detection. Such thresholds must not be adjusted during speech periods or the speech signal will be thresholded out. It is accordingly necessary to control the threshold adapter using a speech/non-speech control signal, and it is preferable that this control signal should be independent of the output of the threshold adapter. The threshold T is adaptively adjusted so as to keep the threshold level just above the level of the measure M when noise only is present. Since the measure will in general vary randomly when noise is present, the threshold is varied by determining an average level over a number of blocks, and setting the threshold at a level proportional to this average. In a noisy environment this is not usually sufficient, however, and so an assessment of the degree of variation of the parameter over several blocks is also taken into account.

The threshold value T is therefore preferably calculated according to

$$T = M' + K \cdot d$$

where M' is the average value of the measure over a number of consecutive frames, d is the standard deviation of the measure over those frames, and K is a constant (which may typically be 2).

In practice, it is preferred not to resume adaptation immediately after speech is indicated to be absent, but to wait to ensure the fall is stable (to avoid rapid repeated switching between the adapting and non-adapting states).

Referring to FIG. 3, in a preferred embodiment of the invention incorporating the above aspects, an input 1 receives a signal which is sampled and digitised by analogue to digital converter (ADC) 2, and supplied to the input of an inverse filter analyser 3, which in practice is part of a speech coder with which the voice activity detector is to work, and which generates coefficients $L_i$ (typically 8) of a filter corresponding to the inverse of the input signal spectrum. The digitised signal is also supplied to an autocorrelator 4 (which is part of analyser 3), which generates the autocorrelation vector $R_i$ of the input signal (or at least as many low order terms as there are LPC coefficients). Operation of these parts of the apparatus is as described in FIGS. 1 and 2. Preferably, the autocorrelation coefficients $R_i$ are then averaged over several successive speech frames (typically 5-20 ms long) to improve their reliability. This may be achieved by storing each set of autocorrelation coefficients output by autocorrelator 4 in a buffer 4a, and employing an averager 4b to produce a weighted sum of the current autocorrelation coefficients $R_i$ and those from previous frames stored in and supplied from
`
buffer 4a. The averaged autocorrelation coefficients $Ra_i$ thus derived are supplied to weighting and adding means 5, 6 which receives also the autocorrelation vector $A_i$ of stored noise-period inverse filter coefficients $L_i$ from an autocorrelator 14 via buffer 15, and forms from $Ra_i$ and $A_i$ a measure M preferably defined as:

$$M = A_0 + 2 \sum_{i=1}^{N} \frac{Ra_i A_i}{R_0}$$

This measure is then thresholded by thresholder 7 against a threshold level, and the logical result provides an indication of the presence or absence of speech at output 8.

In order that the inverse filter coefficients $L_i$ correspond to a fair estimate of the inverse of the noise spectrum, it is desirable to update these coefficients during periods of noise (and, of course, not to update during periods of speech). It is, however, preferable that the speech/non-speech decision on which the updating is based does not depend upon the result of the updating, or else a single wrongly identified frame of signal may result in the voice activity detector subsequently going "out of lock" and wrongly identifying following frames. Preferably, therefore, there is provided a control signal generating circuit 20, effectively a separate voice activity detector, which forms an independent control signal indicating the presence or absence of speech to control inverse filter analyser 3 (or buffer 15) so that the inverse filter autocorrelation coefficients $A_i$ used to form the measure M are only updated during "noise" periods. The control signal generator circuit 20 includes LPC analyser 21 (which again may be part of a speech coder and, specifically, may be performed by analyser 3), which produces a set of LPC coefficients $M_i$ corresponding to the input signal, and an autocorrelator 21a (which may be performed by autocorrelator 3a) which derives the autocorrelation coefficients $B_i$ of $M_i$. If analyser 21 is performed by analyser 3, then $M_i = L_i$ and $B_i = A_i$. These autocorrelation coefficients are then supplied to weighting and adding means 22, 23 (equivalent to 5, 6) which receive also the autocorrelation vector $R_i$ of the input signal from autocorrelator 4. A measure of the spectral similarity between the input speech frame and the preceding speech frame is thus calculated; this may be the Itakura-Saito distortion measure between $R_i$ of the present frame and $B_i$ of the preceding frame, as disclosed above, or it may instead be derived by calculating the Itakura-Saito distortion measure for $R_i$ and $B_i$ of the present frame, and subtracting (in subtractor 25) the corresponding measure for the previous frame stored in buffer 24, to generate a spectral difference signal (in either case, the measure is preferably energy-normalised by dividing by $R_0$). The buffer 24 is then, of course, updated. This spectral difference signal, when thresholded by a thresholder 26 is, as discussed above, an indicator of the presence or absence of speech. We have found, however, that although this measure is excellent for distinguishing noise from unvoiced speech (a task which prior art systems are generally incapable of) it is in general rather less able to distinguish noise from voiced speech. Accordingly, there is preferably further provided within circuit 20 a voiced speech detection circuit comprising a pitch analyser 27 (which in practice may operate as part of a speech coder, and in particular may measure the long term predictor lag value produced in a multipulse LPC coder). The pitch analyser 27 produces a logic signal which is "true" when voiced speech is detected, and this signal, together with the threshold measure derived from thresholder 26 (which will generally be "true" when unvoiced speech is present) are supplied to the inputs of a NOR gate 28 to generate a signal which is "false" when speech is present and "true" when noise is present. This signal is supplied to buffer 15 (or to inverse filter analyser 3) so that inverse filter coefficients $L_i$ are only updated during noise periods.

Threshold adapter 29 is also connected to receive the non-speech signal control output of control signal generator circuit 20. The output of the threshold adapter 29 is supplied to thresholder 7. The threshold adapter operates to increment or decrement the threshold in steps which are a proportion of the instant threshold value, until the threshold approximates the noise power level (which may conveniently be derived from, for example, weighting and adding circuits 22, 23). When the input signal is very low, it may be desirable that the threshold is automatically set to a fixed, low, level since at the low signal levels the effect of signal quantisation produced by ADC 2 can produce unreliable results.

There may be further provided "hangover" generating means 30, which operates to measure the duration of indications of speech after thresholder 7 and, when the presence of speech has been indicated for a period in excess of a predetermined time constant, the output is held high for a short "hangover" period. In this way, clipping of the middle of low-level speech bursts is avoided, and appropriate selection of the time constant prevents triggering of the hangover generator 30 by short spikes of noise which are falsely indicated as speech. It will of course be appreciated that all the above functions may be executed by a single suitably programmed digital processing means such as a Digital Signal Processing (DSP) chip, as part of an LPC codec thus implemented (this is the preferred implementation), or as a suitably programmed microcomputer or microcontroller chip with an associated memory device.

Conveniently, as described above, the voice detection apparatus may be implemented as part of an LPC codec. Alternatively, where autocorrelation coefficients of the signal or related measures (partial correlation, or "parcor", coefficients) are transmitted to a distant station the voice detection may take place distantly from the codec.

I claim:
1. Voice activity detection apparatus comprising:
(i) means for receiving an electrical input signal in which the presence or absence of signals representing speech is to be detected;
(ii) means responsive to said means for receiving for periodically adaptively generating an electrical signal representing an estimated noise signal component of the input signal by producing the autocorrelation coefficients $A_i$ of the impulse response of a FIR filter having a response approximating the inverse of the short term spectrum of the noise signal component;
(iii) means responsive to said means for receiving for periodically forming from the input signal and the estimated noise representing signal an electrical signal representing a measure M of the spectral similarity between a portion of the input signal and the said estimated noise signal component, said measure forming means comprises means for producing electrical signals representing the autocorrelation coefficients $R_i$ of the input signal, and means connected to receive $R_i$ and $A_i$ signals, and to calculate the measure M therefrom; and
(iv) electrical means responsive to said means for forming for comparing the electrical signals representing said measure with a threshold value representing signal to produce an electrical output indicating the presence or absence of speech in the electrical input signal.
2. Apparatus according to claim 1, further comprising an input arranged to receive a second electrical input signal, similarly subject to noise, from which speech is absent; in which the generating means comprise LPC [...]

[...] $M = R_0 A_0 + 2 \sum R_i A_i$.
6. Apparatus according to claim 1 or 4, in which [...]

[...] between a portion of the input signal and earlier portions of the input signal.
14. Apparatus according to claim 13 in which the similarity measure generating means of said second voice activity detection means comprises means for providing, from LPC filter data and autocorrelation data relating to a present portion of the input signal, a present distortion measure; means for providing an equivalent past frame distortion measure corresponding to a preceding portion of the input signal, and means for generating a signal indicating the degree of similarity therebetween as an indicator of speech presence or absence.
