PROCEEDINGS OF THE IEEE, VOL. 67, NO. 12, DECEMBER 1979

Enhancement and Bandwidth Compression of Noisy Speech

Abstract: Over the past several years there has been considerable attention focused on the problem of enhancement and bandwidth compression of speech degraded by additive background noise. This interest is motivated by several factors, including a broad set of important applications, the apparent lack of robustness in current speech-compression systems, and the development of several potentially promising and practical solutions. One objective of this paper is to provide an overview of the variety of techniques that have been proposed for enhancement and bandwidth compression of speech degraded by additive background noise. A second objective is to suggest a unifying framework in terms of which the relationships between these systems are more visible and which hopefully provides a structure that will suggest fruitful directions for further research.

I. INTRODUCTION
There is a wide variety of contexts in which it is desired to enhance speech. The objective of enhancement may be to improve the overall quality, to increase intelligibility, to reduce listener fatigue, etc. Depending on the specific application, the enhancement system may be directed at only one of these objectives or at several. For example, a speech communication system may introduce a low-amplitude long-time delay echo or a narrow-band additive disturbance. While these degradations may not by themselves reduce intelligibility for the purposes for which the channel is used, they are generally objectionable, and an improvement in quality, perhaps even at the expense of some intelligibility, may be desirable. Another example is the communication between a pilot and an air traffic control tower. In this environment, the speech is typically degraded by background noise. Of central importance is the intelligibility of the speech, and it would generally be acceptable to sacrifice quality if the intelligibility could be improved. Even with normal undegraded speech, it is sometimes useful or desirable to provide enhancement. As a simple example, high-pass filtering of normal speech is often used to introduce a "crispness" which is generally perceived as an improvement in quality.
The speech-enhancement problem covers a broad spectrum of constraints, applications, and issues. Environments in which an additive background signal has been introduced are common. The background may be noise-like, such as in aircraft, street noise, etc., or may be speech-like, such as an environment with competing speakers. Other examples in which the need for speech enhancement arises include correcting for reverberation, correcting for the distortion of the speech of underwater divers breathing a helium-oxygen mixture, and correcting the distortion of speech due to pathological difficulties of the speaker or introduced due to an attempt to speak too rapidly. Even for these examples, the problem and techniques vary, depending on the availability of other signals or information. For example, for enhancement of speech in an aircraft, a separate microphone can be used to monitor the background noise so that the characteristics of the noise can be used to adjust or adapt the enhancement system. At the air-traffic control tower, however, the only signal available for enhancement is the degraded speech.

Manuscript received June 22, 1979; revised August 28, 1979. This work was supported in part by the Defense Advanced Research Projects Agency monitored by the Office of Naval Research under Contract N00014-7542-0951-NR049-328 at M.I.T. Research Laboratory of Electronics and in part by the Department of the Air Force under Contract F19628-78C-0002 at M.I.T. Lincoln Laboratory.
The authors are with M.I.T. Research Laboratory of Electronics and M.I.T. Lincoln Laboratory, Cambridge, MA 02139.
Another very important application for speech enhancement is in conjunction with speech bandwidth compression systems. Because of the increasing role of digital communication channels, coupled with the need for encrypting of speech and increased emphasis on integrated voice-data networks, speech-bandwidth-compression systems are destined to play an increasingly important role in speech-communication systems. The conceptual basis for narrow-band speech-compression systems stems from a model for the speech signal based on what is known about the physics and physiology of speech production. Because of this reliance on a model for the signal, it is not unreasonable to expect that as the signal deviates from the model due to distortion such as additive noise, the performance of the speech compression system with regard to factors such as quality, intelligibility, etc., will degrade. It is generally agreed that the performance of current speech-compression systems degrades rapidly in the presence of additive noise and other distortions, and there is currently considerable interest and attention being directed at the development of more robust speech compression systems. There are two basic approaches which are typically considered, either of which may be preferable in a given situation. One approach is to base the bandwidth compression on the assumption of undistorted speech and develop a preprocessor to enhance the degraded speech in preparation for further processing by the bandwidth compression system. It is important to recognize that in enhancing speech in preparation for bandwidth compression, the effectiveness of the preprocessor is judged on the basis of the output of the bandwidth-compression system in comparison with the output if no preprocessor is used. Thus, for example, it is possible that the output of the preprocessor would be judged by a listener to be inferior (by some measure) to the input but that the output of the bandwidth-compression system with the preprocessor is preferred to the output without it. In this case, the preprocessor would clearly be considered to be effective in enhancing the speech in preparation for bandwidth compression. Another approach to bandwidth compression of degraded speech is to incorporate into the model for the signal information about the degradation. A number of systems based on such an approach have recently been proposed and will be discussed in detail in this paper.

0018-9219/79/1200-1586$00.75 © 1979 IEEE

Exhibit 1021, Page 01 of 19

LIM AND OPPENHEIM: ENHANCEMENT AND BANDWIDTH COMPRESSION
As is evident from the above discussion, the general problem of enhancing speech is broad, and the constraints, information, and objectives are heavily dependent on the specific context and applications. In this paper, we consider only a small subset of possible topics, specifically the enhancement and bandwidth compression of speech degraded by additive noise. Furthermore, we assume that the only signal available is the degraded speech and that the noise does not depend on the original speech. Many practical problems, some of which have already been discussed, fall into this framework, and some problems that do not can be transformed so that they do. For example, multiplicative noise or convolutional noise degradation can be converted to an additive noise degradation by a homomorphic transformation [1], [2]. As another example, signal-dependent quantization noise in pulse-code modulation (PCM) signal coding can be converted to a signal-independent additive noise by a pseudo-noise technique [3]-[5].
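As a rough illustration of the first of these conversions (it is not taken from the paper), the sketch below assumes a strictly positive signal and a strictly positive multiplicative disturbance, so that a logarithm maps their product to a sum. The names `s`, `d`, and `log_domain` are illustrative assumptions, and the homomorphic systems of [1], [2] are more general than this special case.

```python
import numpy as np

rng = np.random.default_rng(0)
n = np.arange(200)

# Assumed test data: a positive "signal" and a positive multiplicative disturbance.
s = 1.5 + np.sin(2 * np.pi * 0.01 * n)
d = np.exp(0.1 * rng.standard_normal(n.size))
y = s * d  # multiplicative degradation


def log_domain(x):
    """Logarithmic map; valid here only because every sample is positive."""
    return np.log(x)


# In the log domain the degradation is additive: log(y) = log(s) + log(d).
residual = log_domain(y) - (log_domain(s) + log_domain(d))
max_residual = float(np.max(np.abs(residual)))
```

Once the degradation is additive in the transformed domain, techniques designed for additive noise can in principle be applied there and the result mapped back with the inverse transformation (here, an exponential).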
Even within the limited framework outlined above, there is a diversity of approaches and systems. One objective of this paper is to provide an overview of the variety of techniques that have been proposed for enhancement of speech degraded by additive background noise, both for direct listening and as a preprocessor for subsequent bandwidth compression. Many of these systems were developed independently of each other and on the surface often appear to be unrelated. Thus another objective of the paper is to provide a unifying framework in terms of which the relationship between these systems is more visible, and which hopefully will provide a structure that will suggest further fruitful directions for research.
In Section II, we present an overview of the general topic. In this overview we classify the various enhancement systems based on the information assumed about the speech and the noise. Some systems based on time-invariant Wiener filtering, for example, rely only on an assumed noise power spectrum and on long-time average characteristics of speech, such as the fact that the average speech spectrum decays with frequency at approximately 6 dB/octave. Other systems rely on aspects of speech perception or speech production in general or on a detailed model of speech.
Sections III-V present a more detailed discussion of several of these categories of speech-enhancement systems. In particular, Section III is concerned with the general principle of speech enhancement based on estimation of the short-time spectral amplitude of the speech. This basic principle encompasses a variety of techniques and systems, including the specific methods of spectral subtraction, parametric Wiener filtering, etc. In Section IV, speech enhancement techniques which rely principally on the concept of the short-time periodicity of voiced speech are reviewed, including comb filtering and related systems. Section V discusses a variety of systems that rely on more specific modeling of the speech waveform. As we will discuss in detail, in some cases, parameters of the model are obtained from an analysis of the degraded speech and used to synthesize the enhanced speech. In other cases, the results of an analysis based on a model for speech are used to control an enhancement filter, perhaps with the procedure being iterative so that the output of an enhancement filter is then subjected to further analysis, etc. Many of these systems also incorporate a number of the techniques introduced in Section III, including Wiener filtering and spectral subtraction.
In Sections III-V, the focus is entirely on systems for enhancement, with the evaluation of the systems being based on listening without further processing. In Section VI, we consider the related but separate problem of bandwidth compression of speech degraded by additive noise.
In Section VII, we discuss in some detail the evaluation of the performance of the various systems presented in the earlier sections. In general, the performance evaluation of a speech-enhancement system is extremely difficult, in large measure because the appropriate criteria for evaluation are heavily dependent on the specific application of the system. The relative importance of such factors as quality, intelligibility, listener fatigue, etc., may vary considerably with the application. In Section VII, we summarize the performance evaluations that have been reported for the various systems presented in this paper. Since the evaluation of different systems has generally been based on different procedures, environments, etc., no attempt is made in the section to compare individual systems. In general, however, we will see that while many of the enhancement systems reduce the apparent background noise and thus perhaps increase quality, many of them, to varying degrees, reduce intelligibility. In the context of bandwidth compression, however, various systems provide an increase in intelligibility over that obtained without the incorporation of speech enhancement.
II. OVERVIEW OF SYSTEMS FOR ENHANCEMENT AND BANDWIDTH COMPRESSION OF NOISY SPEECH
As indicated in the previous section, our focus in this paper is on degradation due to the presence of additive noise. Even within this limited context, a wide variety of approaches have been proposed and explored. Conceptually, any approach should attempt to capitalize on available information about the signal, i.e., the speech, and the background noise. Speech is a special subclass of audio signals, and there are reasonable models in terms of which the speech waveform can be described and categorized. The more specifically we attempt to model the speech signal, the more potential there is for separating it from the background noise. On the other hand, the more we assume about the speech, the more sensitive the enhancement system will be to inaccuracies or deviations from these assumptions. Thus incorporating assumptions and information about the speech signal represents tradeoffs which are reflected in the various systems. In a similar manner, systems can attempt to incorporate detailed information about the background noise. For example, the type of processing suggested if the background noise is a competing speaker is different from that suggested if it is wide-band random noise. Thus enhancement systems also tend to differ in terms of the assumptions made regarding the background noise. As with assumptions related to the signal, the more an enhancement system attempts to capitalize on assumed characteristics of the noise, the more susceptible it is likely to be to deviations from these assumptions.
Another important consideration in speech enhancement stems from the fact that the criteria for enhancement ultimately relate to an evaluation by a human listener. In different contexts the criteria for evaluation may differ depending on whether quality, intelligibility, or some other attribute is the most important. Thus speech enhancement must inevitably take into account aspects of human perception. As we will indicate shortly, some systems are heavily motivated by perceptual considerations, while others rely more on mathematical criteria. In such cases, of course, the mathematical criteria must in some way be consistent with human perception, and, while an optimum mathematical criterion is not known, some mathematical error criteria are understood to be a better match than others to aspects of human perception.

[Figure omitted] Fig. 1. A speech production model. (Block labels: pitch period, digital filter coefficients, random noise, amplitude.)

[Figure omitted] Fig. 2. An example of resonant frequencies of an acoustic cavity. (a) Vocal-tract transfer function. (b) Magnitude spectrum of a speech sound with the resonant frequencies shown in (a).
In the following discussion we briefly describe some aspects of speech production and speech perception that in varying degrees play a role in speech-enhancement systems. Following that we present a brief overview of a representative collection of speech-enhancement systems, with the intent of categorizing these systems in terms of the various aspects of speech production and perception on which they attempt to capitalize.
Speech is generated by exciting an acoustic cavity, the vocal tract, by pulses of air released through the vocal cords for voiced sounds, or by turbulence for unvoiced sounds. Thus a simple but useful model for speech production consists of a linear system, representing the vocal tract, driven by an excitation function which is a periodic pulse train for voiced sounds and wide-band noise for unvoiced sounds, as illustrated in Fig. 1. Furthermore, since the linear system represents an acoustic cavity, its response is of a resonant nature, so that its transfer function is characterized by a set of resonant frequencies, referred to as formants, as illustrated in Fig. 2(a). Thus, if the excitation and vocal-tract parameters are fixed, then as indicated in Fig. 2(b), the speech spectrum has an envelope representing the vocal-tract transfer function of Fig. 2(a) and a fine structure representing the excitation.
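The production model just described can be sketched in a few lines. The example below is only an illustration under stated assumptions: the vocal tract is stood in for by a cascade of three two-pole resonators, and the formant frequencies, bandwidths, sampling rate, and pitch are arbitrary values chosen for the demonstration, not values from the paper.

```python
import numpy as np


def resonator_coeffs(formants_hz, bandwidths_hz, fs):
    """Denominator of an all-pole filter with one pole pair per formant."""
    a = np.array([1.0])
    for f, bw in zip(formants_hz, bandwidths_hz):
        r = np.exp(-np.pi * bw / fs)          # pole radius from bandwidth
        theta = 2 * np.pi * f / fs            # pole angle from centre frequency
        a = np.convolve(a, [1.0, -2 * r * np.cos(theta), r * r])
    return a


def all_pole_filter(x, a):
    """y[n] = x[n] - sum_{k>=1} a[k] * y[n-k], assuming a[0] == 1."""
    y = np.zeros(len(x))
    for n in range(len(x)):
        acc = x[n]
        for k in range(1, min(len(a), n + 1)):
            acc -= a[k] * y[n - k]
        y[n] = acc
    return y


def synthesize(voiced, n_samples, fs=8000, pitch_hz=100.0, amplitude=1.0, seed=0):
    """Pulse-train (voiced) or white-noise (unvoiced) excitation through the filter."""
    if voiced:
        excitation = np.zeros(n_samples)
        excitation[:: int(round(fs / pitch_hz))] = 1.0
    else:
        excitation = np.random.default_rng(seed).standard_normal(n_samples)
    # Hypothetical formant values, chosen only for illustration.
    a = resonator_coeffs([500.0, 1500.0, 2500.0], [80.0, 100.0, 120.0], fs)
    return amplitude * all_pole_filter(excitation, a)


vowel = synthesize(voiced=True, n_samples=2048)
spectrum = np.abs(np.fft.rfft(vowel * np.hanning(len(vowel))))
```

Plotting `spectrum` for the voiced case should show harmonics of the pitch (the fine structure) under an envelope with peaks near the assumed formants; switching to `voiced=False` keeps the envelope but removes the harmonic structure.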
Many of the techniques for speech enhancement, particularly those in Sections III and V, are conceptually based on the representation of the speech signal as a stochastic process. This characterization of speech is clearly more appropriate in the case of unvoiced sounds, for which the vocal tract is driven by wide-band noise. The vocal tract of course changes shape as different sounds are generated, and this is reflected in a time-varying transfer function for the linear system in Fig. 1. However, because of the mechanical and physiological constraints on the motion of the vocal tract and articulators such as the tongue and lips, it is reasonable to represent the linear system in Fig. 1 as a slowly varying linear system so that on a short-time basis it is approximated as stationary. Thus some specific attributes of the speech signal which can be capitalized on in an enhancement system are that it is the response of a slowly varying linear system, that on a short-time basis its spectral envelope is characterized by a set of resonances, and that for voiced sounds, on a short-time basis it has a harmonic structure. This simplified model for speech production has generally been very successful in a variety of engineering contexts including speech enhancement, synthesis, and bandwidth compression. A more detailed discussion of models for speech production can be found in [6]-[8].
The perceptual aspects of speech are considerably more complicated and less well understood. However, there are a number of commonly accepted aspects of speech perception which play an important role in speech-enhancement systems. For example, consonants are known to be important in the intelligibility of speech even though they represent a relatively small fraction of the signal energy. Furthermore, it is generally understood that the short-time spectrum is of central importance in the perception of speech and that, specifically, the formants in the short-time spectrum are more important than other details of the spectral envelope. It appears also that the first formant, typically in the range of 250 to 800 Hz, is less important perceptually than the second formant [9], [10]. Thus it is possible to apply a certain degree of high-pass filtering [11], [12] to speech which may perhaps affect the first formant without introducing serious degradation in intelligibility. Similarly, low-pass filtering with a cutoff frequency above 4 kHz, while perhaps affecting crispness and quality, will in general not seriously affect intelligibility. A good representation of the magnitude of the short-time spectrum is also generally considered to be important, whereas the phase is relatively unimportant. Another perceptual aspect of the auditory system that plays a role in speech enhancement is the ability to mask one signal with another. Thus, for example, narrow-band noise and many forms of artificial noise or degradation such as might be produced by a vocoder are more unpleasant to listen to than broad-band noise, and a speech-enhancement system might include the introduction of broad-band noise to mask the narrow-band or artificial noise.
All speech-enhancement systems rely to varying degrees on the aspects of speech production and perception outlined above. One of the simplest approaches to enhancement is the use of low-pass or bandpass filtering to attenuate the noise outside the band of perceptual importance for speech. More generally, when the power spectrum of the noise is known, one can consider the use of Wiener filtering, based on the long-time power spectrum of speech. While in some cases, such as the presence of narrow-band background noise, this is reasonably successful, Wiener filtering based on the long-time power spectrum of the speech and noise is limited because speech is not stationary. Even if speech were truly stationary, mean-square error, which is the error criterion on which Wiener filtering is based, is not strongly correlated with perception and thus is not a particularly effective error criterion to apply to speech processing systems. This is evidenced, for example, in the use of masking for enhancement. By adding broad-band noise to mask other degradation, we are, in effect, increasing the mean-square error. Another example suggesting that mean-square error is not well matched to the perceptually important attributes in speech is the fact that distortion of the speech waveform by processing with an all-pass filter results in essentially no audible difference if the impulse response of the all-pass filter is reasonably short, but can result in a substantial mean-square error between the original and filtered speech. In other words, mean-square error is sensitive to the phase of the spectrum whereas perception tends not to be.
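The all-pass observation can be checked numerically. The sketch below is an illustration only: it uses white noise as a stand-in test signal (not real speech), an arbitrary first-order all-pass with coefficient `c = 0.5`, and applies the filter circularly through the DFT so that the magnitude spectrum is preserved exactly.

```python
import numpy as np


def allpass_first_order(x, c=0.5):
    """Apply H(e^{jw}) = (-c + e^{-jw}) / (1 - c e^{-jw}) circularly via the DFT.

    |H| = 1 at every frequency, so only the phase of x is altered.
    """
    w = 2 * np.pi * np.arange(len(x) // 2 + 1) / len(x)
    z_inv = np.exp(-1j * w)
    H = (-c + z_inv) / (1 - c * z_inv)
    return np.fft.irfft(np.fft.rfft(x) * H, n=len(x))


rng = np.random.default_rng(0)
x = rng.standard_normal(4096)   # stand-in test signal, not real speech
y = allpass_first_order(x)

# Largest change in any DFT magnitude, and the MSE relative to the signal power.
magnitude_change = np.max(np.abs(np.abs(np.fft.rfft(y)) - np.abs(np.fft.rfft(x))))
relative_mse = np.mean((x - y) ** 2) / np.mean(x ** 2)
```

Here `magnitude_change` is at the level of rounding error while `relative_mse` exceeds the signal power itself, which is the mismatch between mean-square error and a magnitude-oriented view of perception that the paragraph describes.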
Masking and bandpass filtering represent two simple ways in which perceptual aspects of the auditory system can be exploited in speech enhancement. Another system whose motivation depends heavily on aspects of speech perception was proposed by Thomas and Niederjohn [12] as a preprocessor prior to the introduction of noise in those applications where noise-free speech is available for processing. In essence, their system applies high-pass filtering to reduce or remove the first formant, followed by infinite clipping. The motivation for the system lies in the observation that at a given signal-to-noise ratio, infinite clipping will increase, relative to the vowels, the amplitude of the perceptually important low-amplitude events such as consonants, thus making them less susceptible to masking by noise. In addition, for vowels the filtering will increase the amplitude of higher formants relative to the first formant, thus making the perceptually more important higher formants less susceptible to degradation. In the speech enhancement problem considered in this paper, noise-free speech is not available for processing as required in the above system. Thomas and Ravindran [13], however, applied high-pass filtering followed by infinite clipping to noisy speech as an experiment. While quality may be degraded by the process of filtering and clipping, they claim a noticeable improvement in intelligibility when it is applied to enhance speech degraded by wide-band random noise. One possible explanation may be that the high-pass filtering operation reduces the masking of perceptually important higher formants by the relatively unimportant low-frequency components.
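The two processing steps named above can be sketched as follows. This is a minimal illustration, not the system of [12]: the first-difference filter is a stand-in for whatever high-pass filter was actually used, and the two sinusoidal segments are artificial placeholders for a strong vowel and a weak consonant.

```python
import numpy as np


def highpass(x, alpha=0.95):
    """First-difference high-pass, y[n] = x[n] - alpha * x[n-1] (a stand-in filter)."""
    x = np.asarray(x, dtype=float)
    y = x.copy()
    y[1:] -= alpha * x[:-1]
    return y


def infinite_clip(x):
    """Keep only the sign of each sample, discarding all amplitude information."""
    return np.where(x >= 0.0, 1.0, -1.0)


fs = 8000
n = np.arange(800)
vowel_like = 1.0 * np.sin(2 * np.pi * 300 * n / fs)         # strong, low-frequency
consonant_like = 0.05 * np.sin(2 * np.pi * 3000 * n / fs)   # weak, high-frequency
speech_like = np.concatenate([vowel_like, consonant_like])

processed = infinite_clip(highpass(speech_like))
rms_before = [np.sqrt(np.mean(seg ** 2)) for seg in (vowel_like, consonant_like)]
rms_after = [np.sqrt(np.mean(seg ** 2)) for seg in (processed[:800], processed[800:])]
```

Before processing the weak segment is far below the strong one; after clipping both have the same level, which is the relative boost of low-amplitude events that the paragraph attributes to infinite clipping.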
Another system which relies heavily on human perception of speech was proposed by Drucker [14]. Based on some perceptual tests, Drucker concluded that one primary cause for the intelligibility loss in speech degraded by wide-band random noise is the confusion among the fricative and plosive sounds, which is partly due to the loss of short pauses immediately before the plosive sounds. By high-pass filtering one of the fricative sounds, the /s/ sound, and inserting short pauses before the plosive sounds (assuming that their locations can be accurately determined), Drucker claims a significant improvement in intelligibility.
In discussing perceptual attributes we indicated that the short-time spectral magnitude is generally considered to be important whereas the phase is relatively unimportant. This forms the basis for a class of speech enhancement systems which attempt in various ways to estimate the short-time spectral magnitude of the speech without particular regard to the phase and to use this to recover or reconstruct the speech. This class of systems includes spectral subtraction techniques, originally due to Weiss et al. [15], [16], which have recently received a great deal of attention [17]-[22], and optimum filtering techniques such as Wiener filtering and power spectrum filtering. These systems will be discussed in considerable detail in Section III. As we will see, many of these systems which appear on the surface to be different are in fact identical or very closely related.
In addition to directly or indirectly utilizing perceptual attributes, most enhancement systems rely to varying degrees on aspects of speech production. For example, in Section IV, we describe in detail a variety of systems that attempt, in some way, to capitalize on the short-time periodicity of speech during voiced sounds. As a consequence of this periodicity, during voiced intervals the speech spectrum has a harmonic structure, which suggests the possibility of applying comb filtering or, as proposed by Parsons [23], attempting to extract in other ways the components of the speech spectrum only at the harmonic frequencies. In essence, knowledge of the harmonic structure of voiced sounds allows us in principle to remove the noise in the spectral bands between the harmonics. As discussed in Section IV, speech enhancement by comb filtering can also be viewed in terms of averaging successive periods of the noisy speech to partially cancel the noise.
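The period-averaging view of comb filtering can be sketched as below. The example assumes the pitch period is known exactly and stays constant, which real voiced speech satisfies only approximately; the function name `comb_average`, the equal weights, and the test signal are illustrative assumptions rather than a design from the paper.

```python
import numpy as np


def comb_average(y, period, half_span=2):
    """Average each sample with the samples k periods away, k = -half_span..half_span.

    Near the ends of the signal only the neighbours that exist are averaged.
    """
    y = np.asarray(y, dtype=float)
    total = np.zeros_like(y)
    count = np.zeros_like(y)
    for k in range(-half_span, half_span + 1):
        shift = k * period
        if shift >= 0:
            total[: len(y) - shift] += y[shift:]
            count[: len(y) - shift] += 1.0
        else:
            total[-shift:] += y[:shift]
            count[-shift:] += 1.0
    return total / count


rng = np.random.default_rng(0)
period = 80
n = np.arange(20 * period)
clean = np.sin(2 * np.pi * n / period) + 0.5 * np.sin(2 * np.pi * 3 * n / period)
noisy = clean + 0.5 * rng.standard_normal(n.size)

filtered = comb_average(noisy, period)
mse_before = np.mean((noisy - clean) ** 2)
mse_after = np.mean((filtered - clean) ** 2)
```

Because the periodic component adds coherently while uncorrelated noise does not, averaging five periods leaves `clean` unchanged and reduces the noise power by roughly a factor of five away from the signal edges. In the frequency domain the same operation passes the harmonics of `1/period` and attenuates the bands between them.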
Another system which attempts to take advantage of the quasi-periodic nature of the speech was proposed by Sambur [24]. As developed in more detail in Section IV, his system is based on the principles of adaptive noise cancelling. Unlike the classical procedure, Sambur's method is designed to cancel out the clean speech signal, taking advantage of the quasi-periodic nature of the speech to form an estimate of the speech at each time instant from the value of the signal one period earlier.
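The idea of estimating the speech from the signal one period earlier can be illustrated with a small adaptive predictor. The sketch below is not Sambur's algorithm as published: it assumes a known, constant pitch period, uses a normalized LMS update with an arbitrary step size `mu` and five taps, and is driven by an artificial periodic test signal.

```python
import numpy as np


def nlms_period_predictor(y, period, n_taps=5, mu=0.05, eps=1e-8):
    """Estimate y[n] from n_taps samples centred one pitch period earlier.

    n_taps is assumed odd. The filter output is taken as the speech estimate;
    the prediction error drives a normalized LMS weight update.
    """
    y = np.asarray(y, dtype=float)
    half = n_taps // 2
    w = np.zeros(n_taps)
    estimate = np.zeros_like(y)
    for n in range(period + half, len(y)):
        ref = y[n - period - half : n - period + half + 1]   # delayed reference
        estimate[n] = w @ ref
        err = y[n] - estimate[n]
        w += mu * err * ref / (eps + ref @ ref)
    return estimate, w


rng = np.random.default_rng(0)
period = 80
n = np.arange(8000)
clean = np.sin(2 * np.pi * n / period) + 0.5 * np.sin(2 * np.pi * 3 * n / period)
noisy = clean + 0.5 * rng.standard_normal(n.size)

estimate, weights = nlms_period_predictor(noisy, period)
tail = slice(len(n) // 2, None)            # evaluate after the weights settle
mse_noisy = np.mean((noisy[tail] - clean[tail]) ** 2)
mse_estimate = np.mean((estimate[tail] - clean[tail]) ** 2)
```

Since the noise in the current sample is uncorrelated with the noise one period earlier, the predictor can only learn the periodic (speech-like) part, so its output tracks `clean` more closely than `noisy` does once the weights have adapted.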
In the model of speech production, we represented the speech signal as generated by exciting a quasistationary linear system with a pulse train for voiced speech and noise for unvoiced speech. Based on this model, an approach to speech enhancement is to attempt to estimate parameters of the model rather than the speech itself and to then use these to synthesize the speech, i.e., to enhance speech through the use of an analysis-synthesis system. A particularly novel application of this concept was used by Miller [25] to remove the orchestral accompaniment from early recordings of Enrico Caruso. In this system homomorphic deconvolution was used to estimate the impulse response of the model in Fig. 1. A similar approach to noise reduction was proposed by Suzuki [26], [27], whereby the short-time correlation function of the degraded speech is used as an estimate of the impulse response of the linear system. This system is referred to as splicing of autocorrelation function (SPAC). A modification of SPAC is referred to as splicing of cross-correlation function (SPOC). A number of systems also attempt to model the vocal-tract impulse response in more detail. As we discussed previously, the vocal-tract transfer function is characterized by a set of resonances or formants that are perceptually important. This suggests the possibility of representing the vocal-tract impulse response in terms of a pole-zero model, with the analysis procedure directed at estimating the associated parameters. The poles in particular would provide a reasonable representation of the formants.
All-pole modeling of speech has had notable success in analysis-synthesis systems for clean speech. A number of recent efforts have been directed toward estimating the parameters in an all-pole model from noisy observations of the speech, such as the systems by Magill and Un [28], Lim and Oppenheim [29], Lim [18], and Done and Rushforth [30]. Extensions to pole-zero modeling have also been proposed by Musicus and Lim [31] and Musicus [32]. These various approaches are described and compared in detail in Section V.
The above discussion was intended as a brief overview of the general approaches to speech enhancement. In the next three sections we explore in more detail many of the systems mentioned above. In particular, in Section III, we focus on speech-enhancement techniques based on short-time spectral amplitude estimation. In Section IV our focus is on speech enhancement based on periodicity of voiced speech and in Section V on speech-enhancement techniques using an analysis-synthesis procedure.
III. SPEECH ENHANCEMENT TECHNIQUES BASED ON SHORT-TIME SPECTRAL AMPLITUDE ESTIMATION
In general, in enhancement of a signal degraded by additive noise, it is significantly easier to estimate the spectral amplitude associated with the original signal than it is to estimate both amplitude and phase. As we discussed in Section II, it is principally the short-time spectral amplitude rather than phase that is important for speech intelligibility and quality. As we discuss in this section, there are a variety of speech-enhancement techniques that capitalize on this aspect of speech perception by focusing on enhancing only the short-time spectral amplitude. The techniques to be discussed can be broadly classified into two groups. In the first, presented in Section III-A, the short-time spectral amplitude is estimated in the frequency domain, using the spectrum of the degraded speech. Each short-time segment of the enhanced speech waveform in the time domain is then obtained by inverse transforming this spectral amplitude estimate combined with the phase of the degraded speech. In the second class, discussed in Section III-B, the degraded speech is first used to obtain a filter which is then applied to the degraded speech. Since these procedures lead to zero-phase filters, it is again only the spectral amplitude that is enhanced, with the phase of the filtered speech being identical to that of the degraded speech.
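The first class of systems can be sketched for a single frame as follows. This is a minimal illustration under stated assumptions: the amplitude estimate is obtained by subtracting a known noise power from the squared magnitude and flooring the result at zero (one simple choice, not necessarily the rule used by any particular system surveyed here), the noise is white with known variance, and the frame is a bin-centred sinusoid rather than speech. A practical system would use overlapping windowed frames.

```python
import numpy as np


def enhance_frame(y_frame, noise_power):
    """Estimate the spectral amplitude, keep the degraded phase, inverse transform."""
    Y = np.fft.rfft(y_frame)
    # Amplitude estimate by power subtraction; negative results are floored at zero.
    amplitude = np.sqrt(np.maximum(np.abs(Y) ** 2 - noise_power, 0.0))
    S_hat = amplitude * np.exp(1j * np.angle(Y))   # phase taken from degraded frame
    return np.fft.irfft(S_hat, n=len(y_frame))


rng = np.random.default_rng(0)
N = 256
n = np.arange(N)
sigma = 0.5
clean = np.sin(2 * np.pi * 16 * n / N)       # stand-in for one speech segment
noisy = clean + sigma * rng.standard_normal(N)

# For white noise of variance sigma^2, E|D(k)|^2 = N * sigma^2 for an N-point DFT.
enhanced = enhance_frame(noisy, N * sigma ** 2)
mse_before = np.mean((noisy - clean) ** 2)
mse_after = np.mean((enhanced - clean) ** 2)
```

Only the magnitude of each DFT coefficient is modified; every coefficient that survives keeps the phase of the degraded frame, as described above.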
In both classes of systems discussed below, no conceptual distinction is made between voiced and unvoiced speech, and in particular, in contrast to the techniques to be discussed in Section IV, the periodicity of voiced speech is not exploited. Both classes of systems in this section are most easily interpreted in terms of a stochastic characterization of the speech signal. While this characterization is more justifiable for unvoiced speech, it has been shown empirically to also lead to successful procedures for voiced speech.
A. Speech Enhancement Based on Direct Estimation of Short-Time Spectral Amplitude
When a stationary random signal $s(n)$ has been degraded by uncorrelated additive noise $d(n)$ with a known power density spectrum, the power density spectrum or spectral amplitude of the signal is easily estimated through a process of spectral subtraction. Specifically, if
$$y(n) = s(n) + d(n) \tag{1}$$
and $P_y(\omega)$, $P_s(\omega)$, and $P_d(\omega)$ represent the power density spectra of $y(n)$, $s(n)$, and $d(n)$, respectively, then
$$P_y(\omega) = P_s(\omega) + P_d(\omega). \tag{2}$$
Consequently, a reasonable estimate for $P_s(\omega)$ is obtained by subtracting the known spectrum $P_d(\omega)$ from an estimate of $P_y$
