Speech Communication 12 (1993) 253-259
North-Holland

Speech enhancement using sub-band intermittent adaption

E. Toner and D.R. Campbell
Department of Electrical Engineering, University of Paisley, High Street, Paisley, Renfrewshire PA1 2BE, Scotland, UK

Received 16 February 1993

Abstract. A sub-band multisensor structure using intermittent adaption is proposed for speech enhancement. The convergence of the proposed method is compared with conventional LMS and frequency domain LMS and a dramatic increase in convergence rate is shown using both simulated and real data. Preliminary investigation of sub-band filter order is also reported.

Zusammenfassung / Résumé (German and French abstracts, translated). In this article we propose a multi-band structure with several sensors that uses an intermittent adaptation mechanism. The convergence of this method is compared with that of the least-squares method applied in the time and frequency domains. Our new method gives a very substantial improvement in convergence for simulated and real data. Preliminary results on the influence of the filter order in each band are also presented.

Keywords. Speech enhancement; adaptive processing; multi-sensor; sub-band processing

0167-6393/93/$06.00 © 1993 Elsevier Science Publishers B.V. All rights reserved

Exhibit 1020

1. Introduction

The enhancement of speech degraded by background noise, by which we mean an increase in signal-to-noise ratio (SNR), may be required to improve intelligibility for either human or machine recognition. Compared with humans, modern speech recognition equipment degrades markedly in the presence of background noise. In a recent experiment (Dabis and Wrench, 1991), the phoneme recognition rate fell from 92% to 71% correct under noise levels typical of a normal office environment (SNR = 12 dB), in which human listeners would function without problems.

Some researchers have looked to the human hearing system as a source of engineering models to approach the enhancement problem, from Ghitza (1988) modelling the cochlea to recent work by Cheng and O'Shaughnessy (1991) utilising a model of the lateral inhibition effect. A recurring feature in this body of work is the accepted model of the cochlea as a spectrum analyser.

Single channel enhancement strategies generally suffer when the noise spectrum overlaps that of the speech. Humans can function well in such circumstances, as shown by the "cocktail party" effect, which can be partly attributed to multi-sensor usage since performance degrades with sensory path damage. The existence of the "binaural unmasking" effect (Evans, 1982) supports the use of multiple sensors for noise reduction as well as spatial localisation, appearing functionally equivalent to Widrow's classic noise cancelling (Widrow and Stearns, 1985). Further complicating the enhancement problem is the non-stationarity of many everyday noise sources and the effects of room acoustics. Humans may invoke short term adaption strategies, perhaps related to the effect reported by Summerfield et al. (1984), to compensate for these.

The work reported here proposes a sub-band multi-sensor structure for speech enhancement which incorporates an intermittent adaptive process.

2. Proposed scheme

Two or more relatively closely spaced microphones may be used (Van Compernolle et al., 1989; Dabis et al., 1990; Campbell et al., 1992) to identify a differential acoustic path transfer function during a noise only period in intermittent speech. The locations of the two microphones and the noise source within a room will produce two acoustic transfer functions $H_1$ and $H_2$ (see Figure 1), a function of which can be identified using an adaptive algorithm. This function may then be used during the speech period (assuming short term constancy) to process the noisy speech signal.

[Fig. 1. Signal process configuration, after Dabis et al. (1990). Labels: noise N, primary, reference R, error E.]

The extension of this work applies the method within a set of sub-bands provided by a filterbank, as in Figure 2. The speech/noise only detector is assumed available (e.g. Tucker, 1992) and, although not a trivial problem, will not be considered further here. In the following work noise only sections were manually identified.

[Fig. 2. Proposed 2-mic. processing scheme. Block labels: speech/noise only detector, bandpass filterbank, reconstruct, processed signal.]

The filter bank could be obtained by various orthogonal transforms or by a parallel filter bank approach. While the latter has considerable advantage in practical implementation, a readily available Fast Hartley Transform (Bracewell, 1984) was used here.
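The transform-based filterbank described above can be illustrated with a discrete Hartley transform computed from an FFT. This is a minimal sketch, not the authors' implementation: the function names `dht`, `idht` and `hartley_filterbank`, the single-block processing, and the equal-width band edges are all assumptions made for illustration.

```python
import numpy as np

def dht(x):
    """Discrete Hartley transform of a real sequence, computed via the FFT."""
    spectrum = np.fft.fft(x)
    return spectrum.real - spectrum.imag

def idht(h):
    """Inverse DHT: the transform is its own inverse up to a factor 1/N."""
    return dht(h) / len(h)

def hartley_filterbank(x, num_bands):
    """Split one real block into num_bands equal-width band signals (hypothetical helper)."""
    n = len(x)
    h = dht(x)
    # Fold bin k and bin n-k onto the same frequency so each band output stays real.
    k = np.arange(n)
    folded = np.minimum(k, n - k)
    edges = np.linspace(0, n // 2 + 1, num_bands + 1)
    bands = []
    for b in range(num_bands):
        mask = (folded >= edges[b]) & (folded < edges[b + 1])
        bands.append(idht(np.where(mask, h, 0.0)))
    return np.array(bands)

rng = np.random.default_rng(0)
block = rng.standard_normal(256)
bands = hartley_filterbank(block, num_bands=8)
reconstructed = bands.sum(axis=0)  # summing the bands recovers the input block
```

Because the band masks partition the transform bins, the band signals sum back to the input and occupy disjoint frequency ranges, which is what allows each band to be processed independently before reconstruction.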
For these initial trials constant bandwidth band-pass filters were used.

The sub-band processing (SBP) could be accomplished in a number of ways. For example,
(i) Examine the noise power and, if it is below some threshold, set the processing transfer function to unity.
(ii) If the noise power is significant and the noise is significantly correlated between the two channels, then perform adaptive noise cancelling.
(iii) If the noise power is significant but not highly correlated between the two channels, then use the adaptive cancelling approach of Zelinski (1990). This latter option has been included since we have found the noise to exhibit different levels of correlation between the two channels in different frequency bands.

We are presently examining the last two options and implementing the processing using the Least Mean Square (LMS) algorithm (Widrow and Stearns, 1985) to perform the adaption. This processing is based on the model of Fig. 1, where it is assumed for simplicity that the speaker is close enough to the microphones that room acoustic effects on the speech are insignificant and that the noise signal at the microphones may be represented as a point source modified by two different acoustic path transfer functions $H_1$ and $H_2$.

Referring to Fig. 1, $N$, $S$, $P$, $R$ represent the z-transforms of the noise signal, speech signal, primary signal and reference signal, respectively. Thus at the primary

$$P = S + H_1 N, \tag{2.1}$$
and at the reference

$$R = S + H_2 N; \tag{2.2}$$

therefore the error $E = P - H_3 R$ is

$$E = (1 - H_3) S + (H_1 - H_3 H_2) N. \tag{2.3}$$

The noise cancelling problem is to find $H_3$ such that the variance $J_e$ of the error is minimised,

$$J_e = \frac{1}{2\pi j} \oint_{|z|=1} E E^{*} z^{-1} \, dz, \tag{2.4}$$

and during a noise only period $S = 0$; defining the noise spectral density $\Phi_{nn}$, then

$$J_e = \frac{1}{2\pi j} \oint_{|z|=1} (H_1 - H_3 H_2) \, \Phi_{nn} \, (H_1 - H_3 H_2)^{*} z^{-1} \, dz, \tag{2.5}$$

which is minimised in the least squares sense when

$$H_3 = H_1 H_2^{-1}, \tag{2.6}$$

which is a transfer function that minimises the noise appearing in $E$. Now using $H_3$ as a fixed processing filter when speech and noise are present ideally yields

$$E = (1 - H_3) S, \tag{2.7}$$

which is a noise reduced, filtered version of the speech signal.

3. Implementation of proposed scheme

Bandpass filtering was performed using a Fast Hartley Transform, producing a set of signals in $M$ contiguous frequency bands and allowing independent sub-band processing to be applied.

The results presented here use Widrow's LMS algorithm in all sub-bands. The convergence constant $\mu$ of the LMS algorithm was calculated for each individual band dependent on the variance of the signal within each band (Narayan and Peterson, 1981). Some researchers estimate sub-band filter order as either (i) the length of the conventional LMS filter divided by the number of bands (Somayazulu et al., 1989), (ii) the length of the conventional LMS filter divided by the down-sampling factor $L$ (Hatty, 1990), or (iii) the length of the echo signal within the corresponding band (Gilloire, 1987). We use the estimate given by (i); however, early indications show that it is more likely that different sub-bands may require different order filters.

Once the adaption process is stopped, the sub-band adaptive filters (one per band) are fixed and the filtering process takes place during the speech plus noise period.

Preliminary results using the other possible sub-band processing are encouraging but require further investigation.

[Fig. 3. Multiband LMS (MBLMS). Block labels: primary, reference, total error, mse.]

4. Results

4.1. Introduction

The performance of the proposed method, defined as multi-band LMS (MBLMS), was compared with the established methods of frequency domain LMS (FDLMS) (Widrow and Stearns, 1985; Lee and Un, 1986) and conventional time domain LMS (CLMS) (Widrow and Stearns, 1985) by examining the convergence of their mean square errors (mse); see Figures 3-5, respectively. The effect on mse of varying sub-band filter length within each frequency band will also be shown. Results presented are firstly those obtained using simulated room data, followed by those obtained using data recorded in a real environment. The noise source for the initial test was chosen to be
white noise, as a simple method of injecting some noise power into each band.

[Fig. 4. Frequency domain LMS (FDLMS). Block labels: primary, N-point discrete Hartley transform, algorithm, error.]

4.2. Simulated room results

Test data was synthesized as shown in Figure 6. The impulse responses between the noise source and the two microphones were calculated by a program which simulates room acoustics using room dimensions, reflection coefficients and source/receiver locations as parameters. Realistic responses would be of length > 1024 at a 10 kHz sampling rate, but for testing purposes a length of 256 was selected. Two synthetic microphone signals were then generated by convolving a white noise sequence with each of the simulated impulse responses to yield the primary and reference signals. These were then used as the inputs to the three adaptive noise cancelling processes to be compared.

[Fig. 5. Conventional LMS (CLMS). Block labels: primary, reference.]

[Fig. 6. Production of synthetic test data. Block labels: noise source, filter order 256, primary, reference.]

The established CLMS and FDLMS methods use a single error signal in the weight update vector. The MBLMS method minimises the error signal in each frequency band. For comparison purposes the mse was evaluated from a single total error output from each configuration. The summed error signal for the multiband approach was used to calculate the mse, since the individual band-limited error signals are effectively orthogonal. This was verified by evaluating cross-product terms.

An adaptive filter length of 256 was set for CLMS. In an attempt to equalise computational requirements, the multiband method used a filter length of 256/M in each band (M is the number of sub-bands). The mse convergence of all three methods is shown in Figure 7.
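The synthetic test and the conventional time domain LMS canceller described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the impulse responses are random, exponentially decaying sequences rather than the output of a room acoustics simulator, the helper names `make_test_data` and `lms_cancel` are hypothetical, and the step size rule (a constant divided by filter order times reference variance) is a common heuristic rather than the paper's exact choice.

```python
import numpy as np

def make_test_data(num_samples, ir_length=256, seed=0):
    """White noise passed through two assumed impulse responses (not a room model)."""
    rng = np.random.default_rng(seed)
    decay = np.exp(-np.arange(ir_length) / 30.0)
    h1 = rng.standard_normal(ir_length) * decay
    # Assumed reference path: unit direct path plus a weak tail (L1 norm 0.5),
    # chosen so that H1/H2 is stable and can be approximated by an FIR filter.
    tail = rng.standard_normal(ir_length - 1) * decay[1:]
    h2 = np.concatenate(([1.0], 0.5 * tail / np.abs(tail).sum()))
    noise = rng.standard_normal(num_samples)
    primary = np.convolve(noise, h1)[:num_samples]
    reference = np.convolve(noise, h2)[:num_samples]
    return primary, reference

def lms_cancel(primary, reference, order, mu_scale=0.2):
    """Time domain LMS noise canceller; returns the error signal and final weights."""
    mu = mu_scale / (order * np.var(reference))  # heuristic step size (assumption)
    weights = np.zeros(order)
    taps = np.zeros(order)
    error = np.empty(len(primary))
    for n in range(len(primary)):
        taps = np.roll(taps, 1)       # shift the delay line by one sample
        taps[0] = reference[n]
        error[n] = primary[n] - weights @ taps
        weights += mu * error[n] * taps
    return error, weights

primary, reference = make_test_data(10000)
error, _ = lms_cancel(primary, reference, order=256)
mse_start = np.mean(error[:500] ** 2)
mse_end = np.mean(error[-500:] ** 2)

# Per-band filter length used for the multiband method: 256 / M.
sub_band_order = {M: 256 // M for M in (2, 4, 8, 16)}
```

Comparing `mse_start` with `mse_end` gives a crude view of convergence during a noise only period; a multiband version would run one such filter of length 256/M on each band signal.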
When the number of frequency bands $M = 1$, the multiband method is obviously identical to CLMS. As $M$ is increased (and adaptive filter length correspondingly decreased) the improvement in convergence speed is dramatic. FDLMS can have faster convergence than CLMS if the reference input is coloured noise. This allows for the pre-whitening effect of FDLMS (Narayan and Peterson, 1981) to increase convergence speed. However, for our test data the reference input is already a white noise signal, hence FDLMS and CLMS have similar convergence performance.

[Fig. 7. MBLMS versus FDLMS versus CLMS for simulated room. Axes: mse against number of points (200 to 1600). Curves: FDLMS, CLMS, and MBLMS with M = 2, 4, 8, 16, where M is the number of sub-bands.]

4.3. Real room results

The experimental set-up for recording real data was as shown in Figure 8.

[Fig. 8. Set-up for recording real data. Labels: white noise generator, loudspeaker, mic 1, mic 2, pre-amp., anti-alias. filter, computer, desks; room dimensions (6 × 5 × 4) m.]

Two microphones were placed centrally within a reverberant room, spaced approximately 40 cm apart and 1 m distant from a loudspeaker driven by a white noise generator. The room dimensions were 6 × 5 × 4 m, and the room contained the computer system as well as desks, cabinets, etc. The signals from the microphones were passed through pre-amplifiers and anti-aliasing filters with a cut-off frequency of 4 kHz and digitised at a sampling rate of 10 kHz. A filter length of 256 was assumed as an estimate of the room impulse response.

[Fig. 9. MBLMS versus FDLMS versus CLMS for real room. Axes: mse against number of points. Curves: FDLMS, CLMS and MBLMS with M = 8; M = number of sub-bands for MBLMS, D = with delay.]

The mse convergence performance of all three methods is shown in Figure 9. Also shown on the graph is the adverse effect on the performance of FDLMS of a delay between the signals received by the microphones.
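A delay between two microphone signals of this kind is commonly measured by locating the peak of their cross-correlation. The sketch below is illustrative only: the signals are synthetic, the helper name `estimate_delay` and the `max_lag` search window are assumptions, and the 12-sample shift is simply built into the test signal.

```python
import numpy as np

def estimate_delay(reference, delayed, max_lag):
    """Return the lag in samples at which `delayed` best matches `reference`."""
    # np.correlate computes c[k] = sum_n delayed[n + k] * reference[n]
    # for lags k from -(len(reference) - 1) to len(delayed) - 1.
    corr = np.correlate(delayed, reference, mode="full")
    lags = np.arange(-(len(reference) - 1), len(delayed))
    keep = np.abs(lags) <= max_lag
    return int(lags[keep][np.argmax(corr[keep])])

rng = np.random.default_rng(1)
sample_rate_hz = 10_000
built_in_delay = 12                    # samples, inserted into the synthetic signal
x = rng.standard_normal(4000)
y = np.concatenate((np.zeros(built_in_delay), x[:-built_in_delay]))
y += 0.1 * rng.standard_normal(len(y))  # small uncorrelated sensor noise

lag = estimate_delay(x, y, max_lag=50)
delay_ms = 1000.0 * lag / sample_rate_hz
```

Once the lag is known, the later signal can be advanced by that many samples before adaptive processing, which is one way to compensate for the delay.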
This delay has only a slight effect on CLMS and was evaluated using cross-correlation to be 12 samples at a sampling rate of 10 kHz. Compensating this delay moves the performance of FDLMS closer to that of CLMS, in agreement with results presented by Reed and Feintuch (1981). All multiband results are with the delay present.

4.4. Effects of varying filter order within sub-bands

Varying the filter order used within each sub-band while keeping a fixed number of frequency bands reveals a trade-off between using low order filters and an increase in mse. The real recordings of Section 4.3 also yielded results for 8 and 16 frequency bands, while the adaptive filter order in each of the bands was varied from 0 to 128. To compare performance for successive filter orders, the value of the mse at the 500th data point was plotted against filter order, as shown in Fig. 10. The actual mse value obtained by using 256/M as an estimate of filter length within each frequency band is indicated for both cases by the dashed line. The minimum mse value attained is indicated by the dotted line. For $M = 8$ the mse has reduced from 0.9 for order 32 to 0.75 for order 65. For $M = 16$ the mse has reduced from 0.73 for order 16 to 0.57 for
order 70. That the minima do not occur at the points 256/M may be partly due to the fact that real room data was used and the adaptive filter of length 256 was assumed. However, the minimum for $M = 16$ occurs at a higher filter order than the minimum for $M = 8$, which suggests that the assumption of 256/M is too simplistic. In support of this, using noisy speech data (SNR = 8 dB) from a simulated room with high reflection coefficients (0.8) and stopping adaption after 0.1 s, a 4-band length 64 MBLMS failed to converge; however, a partly converged 4-band length 256 MBLMS yielded an SNR of 22 dB. A CLMS of length 1024 had hardly started to converge, yielding an SNR of 9 dB, and a CLMS of length 256 had started converging, yielding an SNR of 14 dB.

[Fig. 10. Varying filter order within each sub-band. Axes: mse at the 500th data point against multiband filter order (0 to 140). Curves: M = 8 and M = 16, with markers for the mse values obtained using order = 256/M and for the actual minimum mse values.]

5. Conclusion and future work

A sub-band multi-sensor structure for speech enhancement using intermittent adaptive processing has been proposed.

The multiband structure allows various different parallel sub-band processing (SBP) to be applied within each respective frequency band and the possibility of including cross-processes like human lateral inhibition effects. For the reported work, this SBP was the LMS algorithm and was compared with CLMS and FDLMS adaptive noise cancelling in terms of mse convergence performance. Results for both real and simulated data show improved performance for the multiband method. The structure, being a repetition of essentially identical parallel elements, has obvious attractions for possible VLSI implementation.

Further work is currently underway investigating different SBP within each sub-band.

Future work will investigate various areas including methods of detecting noise only periods and their interaction with the processing methods; metrics for selecting between processing methods; trade-offs involved in selecting number of bands, adaptive filter order, adaption speed, filterbank implementation, processing methods; cross-band effects and real-time implementation.

References

R.N. Bracewell (1984), "The fast Hartley transform", Proc. IEEE, Vol. 72, No. 8, pp. 1010-1018.
D.R. Campbell, T.J. Moir and H.S. Dabis (1992), "Multivariable polynomial matrix formulation of adaptive noise cancelling", Signal Processing, Vol. 26, No. 2, pp. 177-183.
Y.M. Cheng and D. O'Shaughnessy (1991), "Speech enhancement based conceptually on auditory evidence", IEEE Trans. Signal Process., Vol. 39, No. 9, pp. 1943-1954.
H.S. Dabis and A. Wrench (1991), "An evaluation of adaptive noise cancelling for speech recognition", Eurospeech '91, pp. 1301-1304.
H.S. Dabis, T.J. Moir and D.R. Campbell (1990), "Speech enhancement by recursive estimation of differential transfer functions", Proc. ICSP '90, pp. 345-348.
E.F. Evans (1982), "Basic physics and psychophysics of sound", in The Senses, ed. by H.B. Barlow and J.D. Mollon (Cambridge Univ. Press, Cambridge).
O. Ghitza (1988), "Auditory neural feedback as a basis for speech processing", Proc. Internat. Conf. IEEE Acoust. Speech Signal Process., pp. 91-94.
A. Gilloire (1987), "Experiments with sub-band acoustic echo cancellers for teleconferencing", Internat. Conf. Acoust. Speech Signal Process. '87, pp. 2141-2144.
B. Hatty (1990), "Recursive least squares algorithms using multirate systems for cancellation of acoustic echoes", Internat. Conf. Acoust. Speech Signal Process. '90, pp. 1145-1148.
J.C. Lee and C.K. Un (1986), "Performance of transform-domain LMS adaptive digital filters", IEEE Trans. Acoust. Speech Signal Process., Vol. 34, No. 3, pp. 499-510.
S.S. Narayan and A.M. Peterson (1981), "Frequency domain least mean square algorithm", Proc. IEEE, Vol. 69, No. 1, pp. 124-126.
F.A. Reed and P.L. Feintuch (1981), "A comparison of LMS adaptive canceller implemented in the frequency domain and the time domain", IEEE Trans. Circuits and Systems, Vol. CAS-28, No. 6, pp. 610-615.
V.S. Somayazulu, S.K. Mitra and J.J. Shynk (1989), "Adaptive line enhancement using multirate techniques", Internat. Conf. Acoust. Speech Signal Process. '89, pp. 928-931.
A.Q. Summerfield, M.P. Haggard, J. Foster and S. Gray (1984), "Receiving vowels from uniform spectra: Phonetic exploration of an auditory after effect", Perception and Psychophysics, Vol. 35, pp. 203-213.
R. Tucker (1992), "Voice activity detection using a periodicity measure", IEE Proc.-I, Vol. 139, No. 4, August 1992.
D. Van Compernolle, W. Ma and M.M. Van Diest (1989), "Speech recognition in noisy environment with the aid of microphone arrays", European Conf. on Speech Technology, Paris, Vol. 2, pp. 657-660.
B. Widrow and S.D. Stearns (1985), Adaptive Signal Processing (Prentice Hall, Englewood Cliffs, NJ).
Z.R. Zelinski (1990), "Noise reduction based on microphone array with LMS adaptive post-filtering", Electronics Letters, Vol. 26, No. 24, pp. 2036-2037.