`
`275
`
`A Soft Decision-Directed LMS Algorithm for Blind Equalization
`
`Steven J. Nowlan and Geoffrey E. Hinton
`
Abstract-We propose a new adaptation algorithm for equalizers operating on very distorted channels. The algorithm is based on the idea of adjusting the equalizer tap gains to maximize the likelihood that the equalizer outputs would be generated by a mixture of two gaussians with known means. The familiar decision-directed least mean square (LMS) algorithm is shown to be an approximation to maximizing the likelihood that the equalizer outputs come from such an i.i.d. source. The algorithm is developed in the context of a binary PAM channel, and simulations demonstrate that the new algorithm converges in channels for which the decision-directed LMS algorithm does not converge.
`
`I. INTRODUCTION
THE system which we investigate in this paper is illustrated in Fig. 1. A sequence of data (a_t)¹ is sent through a (linear) channel with unknown impulse response S. The data may take on one of a small number of discrete values (in the sequel we assume that a_t may take on one of two values ±a). The output of the unknown channel (x_t) is then passed through a linear transversal filter W whose impulse response is approximately S⁻¹. The objective of W is to cancel out most of the distorting effects of the channel S so that the sequence (y_t) output by W is a near copy of the original input sequence (a_t). Finally, a zero-order (memoryless) nonlinear decision process (ZNL) examines each output y_t and replaces it with the closest value from the set of discrete input values, producing the sequence (â_t). We wish to consider the general case in which S may be noncausal, so W must introduce some delay into the system. As a result, we desire in general that y_t ≈ a_{t-N} or â_t = a_{t-N}, where N represents the (known) delay of W.
Systems of the type just described occur frequently in digital communications, where data must be converted into analog form for transmission and then converted back into digital form at the receiver (see [11], [14] for a general introduction to this problem). W is generally a filter with adjustable tap weights, and the problem we explore is how to adjust these tap weights so W becomes a good approximation of S⁻¹. If the sequence (a_t) is known, the classical approach is to use an LMS or stochastic gradient descent procedure [12] to
`
`
`Fig. 1. Block diagram of system of interest.
`
minimize E[(y_t - a_{t-N})²].²
In practical situations the sequence (a_t) is not known. Instead, a two-step procedure is used to adjust the equalizer: 1) an initial settling phase in which the transmitter sends a known initialization sequence and the receiver performs LMS adjustment of the equalizer, and 2) a phase in which the receiver uses the output of the ZNL (â_t) as an estimate for (a_t) in performing LMS adjustment of the equalizer.
`The second step of the procedure is usually referred to
`as operating the equalizer in decision-directed mode since
`updates to the taps of the equalizer are controlled by the
`decisions made by the ZNL.
In the classical decision-directed LMS (DDLMS) algorithm [14], [13], the ZNL is a simple threshold device. For the binary channel we are considering, the output of this ZNL can be represented as a sgn(y_t), and the DDLMS algorithm can be regarded as minimizing E[(y_t - a sgn(y_t))²]. In this paper we derive a different form of ZNL for operating the equalizer in decision-directed mode. The modified ZNL is derived by proposing that (y_t) be modeled as the output of an independent, identically distributed (i.i.d.) random process with a gaussian mixture density. The ZNL then corresponds to the maximum likelihood estimate of a_t given y_t and the assumed model. The next section of the paper derives the basic algorithm, and Section III presents some simulation results on a simulated binary PAM channel which demonstrate that the modified equalization algorithm can converge reliably with initial channel error rates considerably greater than those allowed by the classical DDLMS procedure.
`
`Paper approved by the Editor for Channel Equalization of the IEEE Com-
`munications Society. Manuscript received April 24, 1990; revised May 23,
`1991 and August 12, 1991. This work was supported by a grant from the
`Ontario Information Technology Research Center.
`S. J. Nowlan is with the Computational Neuroscience Laboratory, The Salk
`Institute, San Diego, CA 92186.
`G.E. Hinton is with the Department of Computer Science, University of
`Toronto, Toronto, Ont. M5S 1A4, Canada.
`IEEE Log Number 9207349.
¹The notation (a_t) is used to denote a sequence of real values, while a_t will refer to one element of that sequence.
`
II. DEVELOPMENT OF THE ALGORITHM
`We begin by introducing some mathematical notation, and
`then outline the basic method for adjusting the tap weights
`of the equalizer via a stochastic gradient descent procedure.
²Technically, this expectation is evaluated with respect to ensemble averages of the signal. However, usual practice is to assume stationarity of systems and processes and to substitute time averages for ensemble averages. In the sequel, except as noted, one may assume that all expectations are evaluated as time rather than ensemble averages.
`
`0090-6778/93$03.00 0 1993 IEEE
`
`
`
`
`276
`
`IEEE TRANSACTIONS ON COMMUNICATIONS, VOL. 41, NO. 2, FEBRUARY 1993
`
We then develop our algorithm by introducing a particular functional to be optimized and relating this functional to the functional used in the classical DDLMS procedure.
In the system depicted in Fig. 1, we assume that W represents an adjustable filter with 2N + 1 taps and a delay of N, so

y_t = Σ_{k=-N}^{N} w_k^t x_{t-k}    (1)

where the superscript on the term w_k^t indicates that the tap weights change over time. Define the vector w_t = (w_{-N}^t, ..., w_N^t) and the vector x_t = (x_{t+N}, ..., x_{t-N}). Then we may express (1) as

y_t = w_t^T x_t    (2)

where the superscript T denotes the vector transpose.
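As a concrete illustration, the filter output (2) is simply an inner product between the tap vector and a window of channel outputs. The following minimal sketch (function and variable names are ours, not from the paper) also shows the center-tap initialization used later in Section III, which makes W start as a pure delay of N samples:

```python
import numpy as np

def equalizer_output(w, x_window):
    """y_t = w^T x_t, where x_t = (x_{t+N}, ..., x_{t-N}) is a window
    of 2N+1 channel outputs around time t (eq. (2))."""
    return float(np.dot(w, x_window))

N = 2
w = np.zeros(2 * N + 1)
w[N] = 1.0                                        # center-tap start: W is a pure delay of N
x_window = np.array([0.3, -1.0, 1.0, 0.2, -0.1])  # (x_{t+2}, ..., x_{t-2})
y = equalizer_output(w, x_window)                 # picks out the center sample x_t
```

With this initialization the equalizer passes the channel output through unchanged, which is the usual starting point before any tap adaptation.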
As mentioned in the introduction, the objective of adjusting the tap weights in the equalizer is to make y_t ≈ a_{t-N}. The approach we shall use is to treat (y_t) as a stochastic process and to minimize a functional

min_{w ∈ R^{2N+1}} J(w) = E[Q(y_t)].    (3)

Following the classical approach (assuming stationary ergodic processes), we replace ensemble averages in (3) with time averages, and we get the following stochastic gradient update procedure:

w_{t+1} = w_t - ε e_t x_t    (4)

where

e_t = ∂Q(y_t)/∂y_t

and ε is a (fixed) stepsize. General convergence results for this class of stochastic approximation algorithms may be found in [3]. To complete the algorithmic specification, we simply need to define Q(y_t).
Since we do not have direct access to (a_t) when updating our tap weights, Q(y_t) must contain some assumptions about a_{t-N}. In the classical DDLMS algorithm for a binary PAM channel with input values ±a, Q(y_t) = (y_t - a sgn(y_t))² and e_t = y_t - a sgn(y_t). Informally, the assumption in the classical algorithm is that if y_t ≥ 0 then â_{t-N} = a; otherwise â_{t-N} = -a.
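For concreteness, one classical DDLMS update under these assumptions can be sketched as follows (a minimal illustration with our own function names; the step size and tap count are arbitrary):

```python
import numpy as np

def ddlms_step(w, x_window, a, eps):
    """One decision-directed LMS update for a binary +/- a channel:
    hard decision a*sgn(y_t), error e_t = y_t - a*sgn(y_t), then a
    gradient step w <- w - eps * e_t * x_t as in (4)."""
    y = float(np.dot(w, x_window))
    decision = a if y >= 0 else -a      # hard-threshold ZNL
    e = y - decision
    return w - eps * e * x_window, y, decision

w0 = np.array([0.0, 1.0, 0.0])          # center-tap initialization
x = np.array([0.1, 0.8, -0.2])
w1, y, d = ddlms_step(w0, x, a=1.0, eps=0.1)
# y = 0.8 is decided as +1, the error is -0.2, and the taps move to
# push future outputs toward the decided level.
```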
We propose instead that we define

Q(y_t) = -log f_Y(y_t)    (5)

where

f_Y(y_t) = p_1 (1/√(2πσ²)) exp(-(y_t - a)²/(2σ²)) + p_2 (1/√(2πσ²)) exp(-(y_t + a)²/(2σ²)).    (6)

f_Y(y_t) is the density function³ of a gaussian mixture with two components centered at ±a; p_1 and p_2 are the proportions of the two gaussians in the mixture, and σ is their standard deviation. In the sequel, we will assume that p_1 = p_2 = 0.5,
³Upper case letters denote random variables (RV) and corresponding lower case letters denote values of the RV.
`
`1-
`
`0.8.-
`
`0.6--
`
`0.4--
`
`0.2.-
`--
`
`-0
`
`-0.2.-
`
`- 0 . 4 -
`
`-0.6.-
`-o.e--
`
`-1.6
`
`-0.4
`
`-0
`
`-1
`I
`-1.2 - 0 . 8
`-
`0 . 4
`0 . 8
`1 . 6
`2
`Fig. 2. Soft decision error function for different values of u. (a = 0
`corresponds to a hard decision.)
`
`0 . 0
`
`1.2
`
which corresponds approximately to the assumption that the values ±a are equally likely in the data sequence (a_t). We assume for now that σ is also fixed.
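Numerically, the cost (5)-(6) penalizes equalizer outputs that fall far from both assumed signal levels. A small sketch (the values of a and σ here are illustrative only):

```python
import math

def neg_log_likelihood(y, a=1.0, sigma=0.5, p1=0.5, p2=0.5):
    """Q(y_t) = -log f_Y(y_t) for a two-component gaussian mixture with
    means +/- a, shared variance sigma^2, and proportions p1, p2 (eqs. (5)-(6))."""
    norm = 1.0 / math.sqrt(2.0 * math.pi * sigma * sigma)
    f = (p1 * norm * math.exp(-(y - a) ** 2 / (2.0 * sigma * sigma))
         + p2 * norm * math.exp(-(y + a) ** 2 / (2.0 * sigma * sigma)))
    return -math.log(f)

# Outputs near +/- a are cheap; an output stuck near 0 (a maximally
# ambiguous decision) is expensive, which is what drives the taps
# toward an approximation of S^-1.
```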
Minimizing Q(y_t) as defined in (5) corresponds to maximizing the (log) likelihood that the sequence (y_t) is generated by an i.i.d. source with density f_Y(·). One can justify this model for (y_t) by assuming that (a_t) is generated by an i.i.d. source with density f_A(a_t) = 0.5 δ_{a_t-a} + 0.5 δ_{a_t+a} and that the combined effect of S followed by W on (a_t) is simply to add 0 mean gaussian noise (and a delay of N) to (a_t). The added noise is composed of channel noise and an intersymbol interference term. Since the intersymbol interference term contains the contributions from a large number of random symbols, the characterization of the overall noise contribution as gaussian is not unreasonable.
To evaluate e_t based on (5) we must deal with the sum of exponentials in (6). One simple approximation is to ignore the smaller of the two exponential terms. Under this assumption we find

e_t ≈ (1/σ²)(y_t + a)   if y_t < 0,
e_t ≈ (1/σ²)(y_t - a)   otherwise.    (7)

We can simply include the 1/σ² term into our step size ε, and we find that this approximate expression for e_t is identical to the expression for e_t in the classical DDLMS algorithm.
If we do not make any approximations, the expression for e_t based on (5) is

e_t = (1/σ²)[y_t - a tanh(a y_t / σ²)].    (8)

Comparing (8) to the classical DDLMS error, we see that we have replaced the sign function with a more complicated nonlinearity. We refer to this particular nonlinearity as a "soft" decision, as compared to the "hard" decision represented by the sign function in the DDLMS algorithm. The choice of these terms is apparent from Fig. 2, where we have plotted (8) as a function of y for several values of σ. As σ → 0, (8) becomes identical to the error in the DDLMS algorithm. This error has a sharp discontinuity at y = 0, and as σ increases this discontinuity is smoothed away.
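The contrast between the two nonlinearities is easy to verify numerically. In the sketch below the common 1/σ² factor in (8) is folded into the step size, as the text does for (7), so the two error curves are directly comparable (the values of a and σ are illustrative):

```python
import numpy as np

def soft_error(y, a=1.0, sigma=0.5):
    """Soft-decision error of (8), with the 1/sigma^2 factor absorbed
    into the step size: e = y - a * tanh(a * y / sigma^2)."""
    return y - a * np.tanh(a * y / sigma ** 2)

def hard_error(y, a=1.0):
    """Classical DDLMS error: e = y - a * sgn(y)."""
    return y - a * np.sign(y)

# Near y = 0 the soft error stays small, so ambiguous samples barely
# move the taps, while the hard error jumps discontinuously to about -/+ a.
e_soft = soft_error(0.1)          # small correction
e_hard = hard_error(0.1)          # large correction of -0.9
# As sigma -> 0 the tanh sharpens into a sign function and (8)
# collapses onto the DDLMS error.
```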
`
`
`
`
The weakness of the hard decision used in the classical DDLMS algorithm is apparent when we realize that when y_t is very near 0, a decision based on the sign of y_t is the most likely to be incorrect, due to the effects of random noise. Yet it is in these cases that the tap weights are changed the most. The "soft" decision nonlinearity has a much smaller magnitude when y_t is near 0, so it makes only small weight changes in these highly ambiguous cases.
Finally, we turn to the parameters of the gaussian mixture density f_Y(·). There are three parameters for each component in the mixture: a mixing proportion p, a mean μ, and a variance σ². In the development so far we have assumed that the means have been fixed at ±a and the mixing proportions have been fixed at p_1 = p_2 = 0.5. These values come directly from our assumption that (a_t) is generated from a binary i.i.d. source with density

f_A(A) = 0.5 δ_{A+a} + 0.5 δ_{A-a}    (9)

where a is a constant and δ_x is the usual Kronecker delta function

δ_x = 1 if x = 0, and 0 otherwise.

Knowledge of the source characteristics and transmission scheme will usually suggest reasonable choices for the values of p and μ for each component.
We have also assumed that the variance of the two components was equal and fixed at some unspecified value σ². To understand why both components would have the same variance, it is helpful to consider more carefully the nature of the "noise" which results when W is not an ideal inverse filter for S. In Fig. 1, x_t and y_t may be expressed as the following convolution sums:

x_t = Σ_{k=-∞}^{∞} s_k a_{t-k}    (10)

y_t = Σ_{k=-∞}^{∞} w_k x_{t-k}    (11)

where we have let w_k = 0 for |k| > N and dropped the superscript on w_k for notational convenience. Let W* = (..., w*_{-1}, w*_0, w*_1, ...) denote the impulse response of an ideal inverse filter for S, so

Σ_k w*_k s_{j-k} = δ_{j-N}.    (12)

If we replace W with W* in Fig. 1 we would have

y*_t = Σ_k w*_k x_{t-k} = a_{t-N}.    (13)

We can regard W as an approximation to the ideal equalizer W* and rewrite (11) as

y_t = a_{t-N} + n'_t,   n'_t = Σ_k (w_k - w*_k) x_{t-k}    (14)

where (n'_t) is a convolutional noise sequence and we have used (13). The variance of each component in our mixture density
`
is largely determined by the variance of the convolutional noise sequence (n'_t). From (14) it is apparent that n'_t contains contributions from a large number of transmitted symbols, so it is reasonable to assume that the properties of (n'_t) are not dramatically different for different symbol values (i.e., samples representing the symbol a and samples representing the symbol -a will have similar distributions for n'_t). This suggests that the variance of each component in the mixture density f_Y(·) is the same.⁴
Equation (14) also suggests that the characteristics of (n'_t) will change as W changes, so the assumption that σ² should be fixed does not seem reasonable. Simulation studies have shown that better convergence may be obtained by adjusting σ in parallel with adjusting the equalizer tap weights [10], [7], [9]. The algorithm used to update σ is an incremental version of the standard maximum likelihood estimator for mixture components with tied variance [5], [8]. The actual update rule is

σ²(t+1) = κ σ²(t) + (1 - κ)[λ(y_t + a)² + (1 - λ)(y_t - a)²]    (15)

where

λ = exp(-(y_t + a)²/(2σ²)) / [exp(-(y_t + a)²/(2σ²)) + exp(-(y_t - a)²/(2σ²))]    (16)

is the posterior probability that y_t was generated by the mixture component centered at -a, and κ is a decay rate slightly less than 1 for discounting past data. The decay rate for the exponential averaging of past data is based on the degree of stationarity of the data. A more detailed development of this expression may be found in [9].
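An incremental form of (15)-(16) is straightforward to implement. In this sketch λ is the standard posterior responsibility of the component at -a under equal mixing proportions; the starting value of σ² and the decay rate κ are illustrative choices, not values from the paper:

```python
import math

def update_sigma2(sigma2, y, a=1.0, kappa=0.99):
    """One step of the tied-variance update (15)-(16): lam is the posterior
    probability that y came from the mixture component centered at -a."""
    q_minus = math.exp(-(y + a) ** 2 / (2.0 * sigma2))
    q_plus = math.exp(-(y - a) ** 2 / (2.0 * sigma2))
    lam = q_minus / (q_minus + q_plus)
    sample_sq = lam * (y + a) ** 2 + (1.0 - lam) * (y - a) ** 2
    return kappa * sigma2 + (1.0 - kappa) * sample_sq

sigma2 = 0.25
for y in [0.9, -1.1, 1.05, -0.95]:   # outputs already close to +/- 1
    sigma2 = update_sigma2(sigma2, y)
# sigma2 decays toward the small squared deviation of y from its
# nearest signal level, tracking the shrinking convolutional noise.
```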
Finally, we note that the model for the data given by (9) may be generalized to allow a larger class of discrete signal distributions while still allowing the distribution of the sequence (y_t) to be modeled as a mixture of gaussians. If we let A denote a random process for which (a_t) is a realization, the pdf of this i.i.d. source may be written in the general form

f_A(A) = Σ_i p_i δ_{A-μ_i}.    (17)

This source will be 0 mean and uncorrelated⁵ as long as

Σ_i p_i μ_i = 0.

An i.i.d. data source of the form (17) will lead to an i.i.d. model for (y_t) with a gaussian mixture density in which the mixing proportions p_i and means μ_i are as specified in (17) and the variance of all components is the same.
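The full-gradient error for this generalized source has the same structure as (8): the tanh term is replaced by the conditional mean of the transmitted symbol given y_t, a responsibility-weighted average of the component means. A sketch under those assumptions (function names are ours):

```python
import math

def soft_error_general(y, mus, ps, sigma2):
    """Gradient of -log of an M-component, tied-variance gaussian mixture:
    e = (y - E[symbol | y]) / sigma2, where E[symbol | y] weights each
    component mean by its posterior responsibility."""
    qs = [p * math.exp(-(y - mu) ** 2 / (2.0 * sigma2)) for p, mu in zip(ps, mus)]
    z = sum(qs)
    cond_mean = sum((q / z) * mu for q, mu in zip(qs, mus))
    return (y - cond_mean) / sigma2

# For two equal components at +/- a this reduces exactly to (8):
e2 = soft_error_general(0.3, [1.0, -1.0], [0.5, 0.5], 0.25)
e8 = (0.3 - math.tanh(0.3 / 0.25)) / 0.25
```

A multilevel source such as μ = (-3, -1, 1, 3) with equal proportions satisfies the zero-mean condition above and plugs into the same error expression unchanged.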
`
III. SIMULATION RESULTS
The discussion in the previous section comparing the "hard" and "soft" decision nonlinearities suggested that the difference between these two nonlinearities will be most apparent when a channel is poorly equalized and there are numerous incorrect decisions (i.e., cases where a_{t-N} and y_t have opposite signs). Simulations reported elsewhere [10], [7] have shown that the
`
⁴This argument is developed in greater detail in [9] and also in [6] in the context of nonlinear deconvolution.
⁵E[A_t A_{t'}] = σ_A² if t = t' and is 0 otherwise.
`
`
`
`278
`
`
use of the soft decision nonlinearity can lead to more rapid initial convergence than the classical DDLMS algorithm in channels with moderate to severe noise and distortion. The simulations reported in this section focus on comparing the convergence limits of the DDLMS and soft decision-directed algorithms.
Simulations were carried out using a synthetic binary PAM channel. The output of an ideal nonredundant binary signal source with signal levels ±1 was sent through a raised cosine pulse generator and a noncausal finite impulse response (FIR) distortion filter. White zero mean noise was then added to the output of the FIR filter to produce the input to the equalizer. Fifty different channels with different bit error rates were created by combining different distortion filters with different amounts of white noise. The distortion filters had a gain of 1 and phase distortion ranging from ±10° to ±60°. The signal-to-noise ratio varied from 30 dB to 0 dB.
The DDLMS and the soft-decision algorithms were used to equalize each of the 50 channels. In each simulation, the equalizer was initialized with a center tap weight of 1.0, with all other tap weights equal to 0.0. An initial bit error rate for each channel was computed by comparing the sign of y_t to the sign of a_{t-N} for 500 channel outputs.⁶ 1000 weight updates were then performed (using either the DDLMS or soft-decision algorithm) and the tap weights frozen. A final bit error rate was computed based on an additional 500 channel outputs. A figure of merit γ was then computed for each simulation:

γ = 1.0 - (final bit error rate)/(initial bit error rate).

A value of 1.0 for γ indicates that the adjustment algorithm was able to reduce the bit error rate to 0 within 1000 updates,⁷ γ = 0.5 indicates that the bit error rate was cut in half by the adjustment algorithm, and values of γ near 0 indicate that the adjustment algorithm failed to improve the error rate at all. A range of step sizes ε (from 0.1 downward) was explored for each algorithm on each channel, and results are reported for the value of ε which gave the best value for γ.⁸
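The evaluation protocol above is easy to reproduce in miniature. The sketch below runs a toy version of the soft-decision update and the figure of merit γ; the 5-tap channel, noise level, and step size are our own illustrative choices, not the 50 channels of the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

def run_soft_equalizer(num_symbols=2000, eps=0.01, sigma2=0.25, taps=11):
    """Binary +/- 1 symbols through a noisy FIR channel, equalized with the
    soft-decision error (1/sigma^2 folded into eps); returns the bit error rate."""
    a_seq = rng.choice([-1.0, 1.0], size=num_symbols)
    s = np.array([0.1, 0.25, 1.0, 0.25, 0.1])         # illustrative channel response
    x = np.convolve(a_seq, s, mode="same")
    x += 0.05 * rng.standard_normal(num_symbols)      # white channel noise
    N = taps // 2
    w = np.zeros(taps)
    w[N] = 1.0                                        # center-tap initialization
    errors = 0
    for t in range(N, num_symbols - N):
        window = x[t - N : t + N + 1][::-1]           # (x_{t+N}, ..., x_{t-N})
        y = float(np.dot(w, window))
        w -= eps * (y - np.tanh(y / sigma2)) * window # soft update, a = 1
        errors += int(np.sign(y) != a_seq[t])
    return errors / (num_symbols - 2 * N)

def figure_of_merit(initial_ber, final_ber):
    """gamma = 1 - final/initial (assumes initial_ber > 0)."""
    return 1.0 - final_ber / initial_ber

ber = run_soft_equalizer()
```

Freezing the taps and measuring error rates before and after adaptation, as the protocol specifies, then gives the two numbers that `figure_of_merit` compares.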
The simulation results are summarized in Fig. 3, where γ has been plotted on the vertical axis and the initial bit error rate as a percentage on the horizontal axis. The shaded region bounded by the lines marked with + indicates the performance range of the DDLMS algorithm over the 50 channels. The shaded region bounded by the lines marked with * indicates the performance range of the soft-decision algorithm. For channels with initial bit error rates up to about 5% the two algorithms are indistinguishable; however, there are significant differences for larger initial bit error rates. The absolute convergence limit of each algorithm is indicated approximately by the bit error rate at which γ = 0. For the DDLMS algorithm this limit is between 16% and 18%,⁹ while the soft decision limit is between
⁶In all cases, the initial bit error rate is > 0, indicating that the "eye" of the equalizer output is initially partially closed.
⁷This does not mean the channel was perfectly equalized, since it is possible to make no incorrect decisions and still have a high MSE.
⁸This optimal value of ε was not necessarily the same for both algorithms operating on the same channel.
⁹This figure is in agreement with other reported convergence limits for the DDLMS algorithm [4].
`
`0.9
`
`12
`
`16
`
`20
`
`4
`8
`28
`0
`24
`Fig. 3. Graph of y versus initial bit error rate (%) for the DDLMS (+) and
`soft-decision (*) algorithms.
`
`32
`
`36
`
`+
`40
`
32% and 35%. From a practical standpoint, the region of interest is primarily for values of γ > 0.5. Even in this region, the soft decision algorithm maintains a significant advantage (10% to 12% for DDLMS with γ = 0.5, compared to 20% to 23% for the soft-decision algorithm).
`
`IV. CONCLUSION
We have proposed a new algorithm for blind equalization of very distorted channels. The algorithm is based on the idea of adjusting the equalizer tap gains to maximize the likelihood that the equalizer outputs come from an i.i.d. source that is a mixture of two gaussians with known means. The approach is most similar to earlier work on blind deconvolution by Godfrey and Rocca [6] and blind equalization by Bellini [1], [2]. Our approach differs from these other approaches in that the algorithm is developed based on an assumed distribution for the equalizer output rather than assumptions about the distribution of signals at the channel input. This leads to an algorithm with an adaptive nonlinearity for estimating the channel input given the equalizer output. This "soft" decision algorithm was compared to the classical decision-directed LMS algorithm on a binary PAM channel, and simulations suggest that, when operating in the blind mode, the new algorithm can converge in channels with twice the initial bit error rate for which the DDLMS algorithm is convergent.
`
`ACKNOWLEDGMENT
`We thank Y. Le Cun and J. Bridle for helpful discussions.
`
`REFERENCES
`
[1] S. Bellini, "Bussgang techniques for blind equalization," in IEEE GLOBECOM Conf. Rec., Dec. 1986, pp. 1634-1640.
[2] S. Bellini and F. Rocca, "Near optimal blind deconvolution," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Processing, vol. 4, 1988, pp. 2236-2239.
[3] A. Benveniste, M. Goursat, and G. Ruget, "Robust identification of a nonminimum phase system: Blind adjustment of a linear equalizer in data communications," IEEE Trans. Automat. Contr., vol. AC-25, pp. 385-399, 1980.
[4] J. A. C. Bingham, The Theory and Practice of Modem Design. New York: Wiley, 1988.
[5] A. P. Dempster, N. M. Laird, and D. B. Rubin, "Maximum likelihood from incomplete data via the EM algorithm," Proc. Royal Statist. Soc., vol. B-39, pp. 1-38, 1977.
[6] R. Godfrey and F. Rocca, "Zero memory non-linear deconvolution," Geophys. Prospecting, vol. 29, pp. 189-228, 1981.
[7] G. E. Hinton and S. J. Nowlan, "The bootstrap Widrow-Hoff rule as a cluster-formation algorithm," Neural Comput., vol. 2, pp. 355-362, 1990.
[8] G. J. McLachlan and K. E. Basford, Mixture Models: Inference and Applications to Clustering. New York: Marcel Dekker, 1988.
[9] S. J. Nowlan, "Soft competitive adaptation: Neural network learning algorithms based on fitting statistical mixtures," Ph.D. dissertation, School of Comput. Sci., Carnegie Mellon Univ., Pittsburgh, PA, Apr. 1991.
[10] S. J. Nowlan and G. E. Hinton, "Maximum likelihood decision directed adaptive equalization," University of Toronto, Tech. Rep. CRG-TR-89-8, 1989.
[11] S. U. H. Qureshi, "Adaptive equalization," Proc. IEEE, vol. 73, pp. 1349-1387, Sept. 1985.
[12] B. Widrow and M. E. Hoff, Jr., "Adaptive switching circuits," in IRE WESCON Convention Rec., 1960, pp. 96-104.
[13] B. Widrow, J. M. McCool, M. G. Larimore, and C. R. Johnson, Jr., "Stationary and nonstationary learning characteristics of the LMS adaptive filter," Proc. IEEE, vol. 64, pp. 1151-1162, Aug. 1976.
[14] B. Widrow and S. D. Stearns, Adaptive Signal Processing. Englewood Cliffs, NJ: Prentice-Hall, 1985.
`
`