`
`IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 52, NO. 1, JANUARY 2006
`
`Stable Recovery of Sparse Overcomplete
`Representations in the Presence of Noise
`
`David L. Donoho, Member, IEEE, Michael Elad, and Vladimir N. Temlyakov
`
`Abstract—Overcomplete representations are attracting interest
`in signal processing theory, particularly due to their potential to
`generate sparse representations of signals. However, in general, the
`problem of finding sparse representations must be unstable in the
`presence of noise. This paper establishes the possibility of stable
`recovery under a combination of sufficient sparsity and favorable
`structure of the overcomplete system. Considering an ideal under-
`lying signal that has a sufficiently sparse representation, it is as-
`sumed that only a noisy version of it can be observed. Assuming
`further that the overcomplete system is incoherent, it is shown that
`the optimally sparse approximation to the noisy data differs from
`the optimally sparse decomposition of the ideal noiseless signal by
`at most a constant multiple of the noise level. As this optimal-spar-
`sity method requires heavy (combinatorial) computational effort,
`approximation algorithms are considered. It is shown that similar
stability is also available using the basis pursuit and the matching pursuit
`algorithms. Furthermore, it is shown that these methods result in
`sparse approximation of the noisy data that contains only terms
`also appearing in the unique sparsest representation of the ideal
`noiseless sparse signal.
`Index Terms—Basis pursuit, greedy approximation, incoherent
`dictionary, Kruskal rank, matching pursuit, overcomplete repre-
`sentation, sparse representation, stability, stepwise regression, su-
`perresolution.
`
`I. INTRODUCTION
`
`A. Overcomplete Representation
`
RESEARCHERS spanning a diverse range of viewpoints have recently advocated the use of overcomplete signal
representations [27], [30], [1], [5], [4], [33], [37], [35]. Generally speaking, they suppose we have a signal vector $y \in \mathbb{R}^n$ and a collection of vectors $\{\phi_i : 1 \le i \le m\}$, with $m > n$ such vectors, so that the collection forms "more than a basis"; since [27], such collections are usually called dictionaries, and their elements are called atoms. We want a representation for our signal $y$ as a linear combination of atoms in this dictionary.
`Such representations differ from the more traditional basis
`representation because they offer a wider range of generating
`
`Manuscript received February 29, 2004; revised September 12, 2005. This
`work was supported in part by the National Science Foundation under Grants
`DMS 00-77261, 01-40698 (FRG), 02-00187, ANI-008584 (ITR) and by
`DARPA ACMP, and ONR-MURI
`D. L. Donoho is with the Department of Statistics, Stanford University, Stan-
`ford, CA 94305-9025 USA (e-mail: donoho@stanford.edu).
`M. Elad is with the Department of Computer Science, Technion–Israel Insti-
`tute of Technology, Technion City, Haifa 32000, Israel (e-mail: elad@cs.tech-
`nion.ac.il).
`V. N. Temlyakov is with the Department of Mathematics, University of South
`Carolina, Columbia, SC 29208 USA (e-mail: temlyak@math.sc.edu).
`Communicated by G. Battail, Associate Editor At Large.
`Digital Object Identifier 10.1109/TIT.2005.860430
`
`elements; potentially, this wider range allows more flexibility in
`signal representation, and more effectiveness at tasks like signal
`extraction and data compression. Proposals for overcomplete
`representations have included multiscale Gabor functions [27],
`[30], systems defined by algebraic codes [33], amalgams of
`wavelets and sinusoids [3], [4], [14], libraries of windowed
`cosines with a range of different widths and locations [5],
`[42], multiscale windowed ridgelets [32], systems generated at
`random [11], and amalgams of wavelets and linelike generating
`elements [22].
`A number of interesting arguments, both heuristic and
`theoretical, have been advanced to support the benefits of over-
`completeness; in theoretical neuroscience it has been argued
`that overcomplete representations are probably necessary for
`use in biological settings in the mammalian visual system [28];
`in approximation theory, there are persuasive examples where
`approximation from overcomplete systems outperforms any
`known basis [2]; in signal processing, it has been reported that
`decomposition into separate transforms gives improved com-
`pression [1], [8] and improved equalization [6]; and in image
`processing, it has been shown that one can separate images into
`disjoint signal types using such decompositions [31], [32], [22].
`At the same time, there is an apparent obstacle to overcom-
`plete representations, based on elementary linear algebra. We
can think of the atoms in our dictionary as columns in a matrix $\Phi$, so that $\Phi$ is $n$ by $m$, with $m > n$. A representation of $y$ can then be thought of as a vector $\alpha \in \mathbb{R}^m$ satisfying $y = \Phi\alpha$. However, linear algebra tells us that because $m > n$, the problem of representation is underdetermined. Hence, as is widely taught
`in elementary courses, there is no unique solution to the rep-
`resentation problem, and far more disturbingly, if the data are
`even slightly inaccurate, some familiar algorithms will be stag-
`geringly unstable. That this can be a real issue was shown by
`Wohlberg [43] who considered a dictionary of sinusoids with
`frequencies spaced finer than the usual discrete Fourier frequen-
`cies, and documented the extreme ill-posedness that can result.
`In this paper, we consider the impact of sparsity constraints
`on this situation, and study algorithms which can in certain
`circumstances generate sparse representations in an overcom-
plete dictionary. We derive rigorous bounds showing that, when the dictionary $\Phi$ has a property of mutual incoherence (defined
`below), and when it offers a sufficiently sparse representation
`for the ideal noiseless signal, the algorithms are locally stable,
`i.e., under addition of small amounts of noise, the algorithms re-
`cover the ideal sparse representation with an error that grows at
`most proportionally to the noise level. Some of the algorithms
`are even globally stable, i.e., they recover the ideal noiseless re-
`construction with an error at worst proportional to noise level
`
`
`even under the addition of arbitrary amounts of noise. Under
`sufficient sparsity the constants of proportionality are very rea-
`sonable.
`In short, we show that, although the problem of recovering the
`underlying overcomplete representation is admittedly very ill-
`posed in general, when the underlying representation is sparse,
`and the dictionary is incoherent, the ill-posedness can disappear.
`
`B. Sparse Representation
To fix ideas, consider the problem of finding the sparsest representation possible in an overcomplete dictionary $\Phi$. As a measure of sparsity of a vector $\alpha$, we take the so-called $\ell^0$ norm $\|\alpha\|_0$, which is simply the number of nonzero elements in $\alpha$. The sparsest representation is then the solution to the optimization problem

$(P_0)$:   $\min_{\alpha}\ \|\alpha\|_0$ subject to $y = \Phi\alpha$.   (1.1)

As stated, this seems to be a general combinatorial optimization problem, requiring that one enumerate all possible $k$-element collections of columns of $\Phi$, for $k = 1, 2, \ldots, m$, looking for the smallest collection permitting representation of the signal. Such an algorithm would cost on the order of $2^m$ flops to carry out in general, and on the order of $\binom{m}{N}$ flops even when a sparse $N$-element representation existed. We therefore turn to approximations/relaxations of $(P_0)$.
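To see why the direct attack is impractical, here is a minimal brute-force sketch of $(P_0)$ in Python (our own illustration; the function name `sparsest_representation` and the tolerance are assumptions, not part of the paper). The nested enumeration over supports is exactly the exponential cost described above.

```python
import itertools
import numpy as np

def sparsest_representation(Phi, y, tol=1e-10):
    """Brute-force (P_0): try supports of growing size k until y = Phi @ alpha is solvable."""
    n, m = Phi.shape
    for k in range(1, m + 1):                       # smallest support size first
        for support in itertools.combinations(range(m), k):
            sub = Phi[:, support]                   # n-by-k submatrix of selected atoms
            coef, *_ = np.linalg.lstsq(sub, y, rcond=None)
            if np.linalg.norm(y - sub @ coef) <= tol:
                alpha = np.zeros(m)
                alpha[list(support)] = coef
                return alpha                        # exact k-term representation found
    return None
```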
Orthogonal Greedy Algorithm. One heuristic approach builds up $k$-element approximate representations a step at a time, adding to an existing $(k-1)$-element approximation a new term chosen in a greedy fashion to minimize the resulting $\ell^2$ error (over all possible choices of the single additional term). When stopped after $k$ stages, one gets a sparse $k$-element approximate representation. In more detail, the procedure starts from an initial residual $r^{(0)} = y$ and a current decomposition $\hat{x}^{(0)} = 0$; then for $k = 1, 2, 3, \ldots$ it augments the decomposition and updates the residual stepwise, always maintaining $y = \hat{x}^{(k)} + r^{(k)}$. We suppose that the dictionary has normalized atoms, so that each $\|\phi_i\|_2 = 1$. At the $k$th stage, the algorithm selects an atom to be added to the decomposition based on correlation with the current residual,

$i_k = \arg\max_i\ |\langle r^{(k-1)}, \phi_i\rangle|;$

it builds a decomposition consisting of the atoms selected through that stage

$\hat{x}^{(k)} = \sum_{\ell=1}^{k} a_\ell\, \phi_{i_\ell},$   (1.2)

where the coefficients $(a_\ell)$ are fitted by least squares to minimize $\|y - \hat{x}^{(k)}\|_2$; and it subtracts this model from $y$, obtaining a new residual

$r^{(k)} = y - \hat{x}^{(k)},$

which can be input to the next stage of the algorithm. At that point, the basic iteration is repeated. The algorithm can be stopped when the residual norm is below some predetermined threshold, or based on the number of atoms used.
`In the setting of statistical modeling, greedy stepwise least
`squares is called forward stepwise regression, and has been
`widely practiced since the 1960s [7], [20]. When used in the
`signal processing setting, this goes by the name of matching
`pursuit (MP) [27]; actually we have described a variant called
`orthogonal matching pursuit (OMP) [29]. Following [9], we
`call this the orthogonal greedy algorithm (OGA).
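A minimal OGA/OMP sketch in Python (ours, purely illustrative and not the authors' code): it selects atoms by correlation with the residual, refits all selected coefficients by least squares at each stage, and stops once the residual norm drops below a threshold `delta` or `max_atoms` terms have been used.

```python
import numpy as np

def omp(Phi, y, delta, max_atoms=None):
    """Orthogonal greedy algorithm (OMP). Assumes the columns of Phi have unit l2 norm."""
    n, m = Phi.shape
    max_atoms = m if max_atoms is None else max_atoms
    support, alpha = [], np.zeros(m)
    residual = y.copy()
    while np.linalg.norm(residual) > delta and len(support) < max_atoms:
        i_k = int(np.argmax(np.abs(Phi.T @ residual)))   # atom most correlated with residual
        if i_k in support:                               # numerically stalled; stop
            break
        support.append(i_k)
        coef, *_ = np.linalg.lstsq(Phi[:, support], y, rcond=None)  # least-squares refit
        alpha = np.zeros(m)
        alpha[support] = coef
        residual = y - Phi[:, support] @ coef            # new residual for the next stage
    return alpha, support
```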
Convex Relaxation. A more formal approach convexifies $(P_0)$ by replacing the $\ell^0$-norm with an $\ell^1$-norm:

$(P_1)$:   $\min_{\alpha}\ \|\alpha\|_1$ subject to $y = \Phi\alpha$.   (1.3)

This can be cast as a linear programming (LP) problem, for which solutions are available even in large-scale problems, owing to modern interior-point linear programming methods. This approach to overcomplete signal representation was called basis pursuit (BP) in [4], which observed that it sometimes gave highly sparse solutions to problems known to have such sparse solutions, and showed that it could, in specific cases, outperform the greedy pursuit approach in generating sparse solutions.
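To make the LP reduction concrete, here is a small sketch (ours, assuming SciPy is acceptable as the LP solver): writing $\alpha = u - v$ with $u, v \ge 0$ turns (1.3) into a standard-form LP with objective $\sum_i u_i + v_i$.

```python
import numpy as np
from scipy.optimize import linprog

def basis_pursuit(Phi, y):
    """Solve min ||alpha||_1 s.t. Phi @ alpha = y via the split alpha = u - v, u, v >= 0."""
    n, m = Phi.shape
    c = np.ones(2 * m)                        # objective: sum(u) + sum(v) = ||alpha||_1
    A_eq = np.hstack([Phi, -Phi])             # equality constraint: Phi @ (u - v) = y
    res = linprog(c, A_eq=A_eq, b_eq=y, bounds=[(0, None)] * (2 * m), method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    u, v = res.x[:m], res.x[m:]
    return u - v
```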
Formal Justification. The key point about both OGA and BP is that they are much more practical than the direct solution of $(P_0)$. Perhaps surprisingly, these approaches can, under certain conditions, correctly solve $(P_0)$. Thus, practical methods can solve problems that otherwise on the surface seem computationally intractable. Previous work [9], [17], [39], [38], [15], [13], [14], [11], [19] established that both the OGA and BP approaches can be successful for signals having sparse representations; under appropriate conditions on $\Phi$ and $\alpha$, these algorithms produce the globally optimal solution of $(P_0)$. The concept of mutual coherence of the dictionary $\Phi$ plays a major role in these results. It is defined, assuming that the columns of $\Phi$ are normalized to unit $\ell^2$-norm, in terms of the Gram matrix $G = \Phi^T\Phi$. With $G(k, j)$ denoting entries of this matrix, the mutual coherence is

$M = M(\Phi) = \max_{k \ne j}\ |G(k, j)|.$   (1.4)
`
A dictionary is incoherent if $M$ is small. There are overcomplete systems with $m \gg n$ and mutual coherence $M = 1/\sqrt{n}$ [33].
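For a given dictionary with unit-norm columns, the mutual coherence (1.4) can be computed directly from the Gram matrix. The snippet below (ours, purely illustrative) does this for a random Gaussian dictionary, whose coherence is typically modest but larger than the $1/\sqrt{n}$ achieved by carefully structured systems.

```python
import numpy as np

def mutual_coherence(Phi):
    """Mutual coherence M(Phi) = max_{k != j} |<phi_k, phi_j>| for unit-norm columns."""
    Phi = Phi / np.linalg.norm(Phi, axis=0)      # normalize columns defensively
    G = np.abs(Phi.T @ Phi)                      # magnitudes of the Gram matrix entries
    np.fill_diagonal(G, 0.0)                     # ignore the unit diagonal
    return G.max()

rng = np.random.default_rng(0)
Phi = rng.standard_normal((64, 128))
print(mutual_coherence(Phi))
```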
The results in [13], [14], [11], [19] showed that, if there exists a representation $y = \Phi\alpha$ with sparsity $N = \|\alpha\|_0$, and $N$ does not exceed a threshold defined by $M$ alone (we consider $N < (1 + M^{-1})/2$), then a) this is the unique sparsest representation, and b) these algorithms will find it. If, for example, $M = 1/\sqrt{n}$, this result promises, for large $n$, an ideal form of atomic decomposition even of fairly complex objects. In such cases, provided the object $y$ is made from fewer than $(\sqrt{n} + 1)/2$ atoms in the dictionary, this sparse decomposition can be uniquely recovered.
`
`C. Presence of Noise
In most practical situations it is not sensible to assume that the available data obey precise equality $y = \Phi\alpha$ with a sparse representation $\alpha$. A more plausible scenario assumes sparse approximate representation: that there is an ideal noiseless signal $x_0$ with a sparse representation, $x_0 = \Phi\alpha_0$ with $\|\alpha_0\|_0$ small, but that we can observe only a noisy version $y = x_0 + z$, where $\|z\|_2 \le \epsilon$.

Noise-Aware Variant of $(P_0)$. We can adapt to this noisy setting by modifying $(P_0)$ to include a noise allowance

$(P_{0,\delta})$:   $\min_{\alpha}\ \|\alpha\|_0$ subject to $\|y - \Phi\alpha\|_2 \le \delta$.   (1.5)
`
Note that the solution of $(P_{0,\delta})$, call it $\hat\alpha_{0,\delta}$, obeys $\|y - \Phi\hat\alpha_{0,\delta}\|_2 \le \delta$. Also, if we apply this with $\delta \ge \epsilon$, the problem has a sparse solution; in fact, the solution obeys $\|\hat\alpha_{0,\delta}\|_0 \le \|\alpha_0\|_0$, or more formally, since $\alpha_0$ is a feasible solution of $(P_{0,\delta})$,

$\mathrm{val}(P_{0,\delta}) \le \|\alpha_0\|_0.$   (1.6)

In our notation, $\mathrm{val}(\cdot)$ of an optimization problem stands for its value at the solution. Note that $(P_{0,\delta})$ rarely has a unique solution, since once the sparsest solution is found, many feasible variants of it sharing the same support can be built.
Noise-Aware Variant of OGA. Just as with $(P_0)$, $(P_{0,\delta})$ demands exorbitant computational effort in general, and so again we may resort to heuristics and relaxations. On the one hand, OGA can be employed for approximating the solution of (1.5); the stepwise procedure can simply be stopped when the representation error gets below $\delta$.

Noise-Aware Variant of $(P_1)$. On the other hand, we can pursue a strategy of convexification, replacing the $\ell^0$-norm in (1.5) by an $\ell^1$-norm

$(P_{1,\delta})$:   $\min_{\alpha}\ \|\alpha\|_1$ subject to $\|y - \Phi\alpha\|_2 \le \delta$.   (1.7)
`
This can be cast as a convex quadratic program which can be solved by many standard approaches, including iteratively reweighted least squares (IRLS) [23], interior-point algorithms [4], and active-set methods. It is also closely related to basis pursuit denoising (BPDN) [4], and to the LASSO technique employed in statistical regression to avoid overfitting when combining predictors [34]. Both those proposals amount to solving a corresponding convex optimization in Lagrangian form

$\min_{\alpha}\ \lambda\|\alpha\|_1 + \tfrac{1}{2}\|y - \Phi\alpha\|_2^2,$   (1.8)

and the solutions of (1.8) and $(P_{1,\delta})$ are the same for an appropriate choice of $\lambda = \lambda(y, \delta)$. Solving (1.8) with fixed $\lambda$ leads to different results; see [40] for an analysis of this option.

We note that instead of the $\ell^1$-norm, one can use an $\ell^p$-norm with $0 < p < 1$ in order to better imitate the $\ell^0$-norm while losing convexity. This is the spirit behind the FOCUSS method [18].
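As one concrete (purely illustrative) way to solve the Lagrangian form (1.8), the sketch below implements the iterative soft-thresholding algorithm (ISTA), a simple proximal-gradient method; it is our own example, the step size comes from the spectral norm of $\Phi$, and the choice of `lam` corresponding to a given $\delta$ is left unspecified, as in the text.

```python
import numpy as np

def soft_threshold(x, t):
    """Entrywise soft-thresholding, the proximal operator of t * ||.||_1."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def ista(Phi, y, lam, n_iter=500):
    """Minimize 0.5 * ||y - Phi @ alpha||_2^2 + lam * ||alpha||_1 by proximal gradient steps."""
    L = np.linalg.norm(Phi, 2) ** 2            # Lipschitz constant of the quadratic term's gradient
    alpha = np.zeros(Phi.shape[1])
    for _ in range(n_iter):
        grad = Phi.T @ (Phi @ alpha - y)       # gradient of the data-fit term
        alpha = soft_threshold(alpha - grad / L, lam / L)
    return alpha
```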
`
D. Stability Properties

In this paper, we develop several results exhibiting stable recovery of sparse representations in the presence of noise. We now briefly sketch their statements.

First, we show that when sufficient sparsity is present, where "sufficient" is defined relative to the degree of mutual incoherence, solving $(P_{0,\delta})$ enables stable recovery. We suppose that we have a possibly overcomplete system with mutual coherence $M$. Suppose that we have a noisy signal $y$ and that the ideal noiseless signal $x_0 = \Phi\alpha_0$ has a sparse representation $\alpha_0$ with at most $N < (1 + M^{-1})/2$ nonzeros. We have that the noise is bounded, $\|y - x_0\|_2 \le \epsilon$. Then if $\delta \ge \epsilon$, it follows that the solution $\hat\alpha_{0,\delta}$ of $(P_{0,\delta})$ obeys

$\|\hat\alpha_{0,\delta} - \alpha_0\|_2^2 \le \frac{(\epsilon + \delta)^2}{1 - M(2N - 1)},$   (1.9)

with stability coefficient $(1 - M(2N - 1))^{-1/2}$. This proportionality constant (which we also call the stability coefficient) can be quite moderate given sufficient sparsity. In words, provided the underlying object is sparsely represented and the noise level is known, recovery by explicitly imposing sparsity yields an approximation to the ideal sparse decomposition of the noiseless signal in which the error is at worst proportional to the input noise level.

Next, we develop a parallel result for $\ell^1$ minimization. Making parallel assumptions, tightened so that the ideal noiseless signal $x_0$ has a sparse representation $\alpha_0$ with $N < (1 + M^{-1})/4$ nonzeros, we show that the solution $\hat\alpha_{1,\delta}$ of $(P_{1,\delta})$ obeys

$\|\hat\alpha_{1,\delta} - \alpha_0\|_2^2 \le \frac{(\epsilon + \delta)^2}{1 - M(4N - 1)},$   (1.10)

where the stability coefficient is now $(1 - M(4N - 1))^{-1/2}$. In words, $\ell^1$-based reconstruction in incoherent overcomplete systems has an error which is at worst proportional to the input noise level. The sparsity requirement is twice as stringent for the $\ell^1$-based result as for the $\ell^0$-based result.

By comparison, OGA obeys a local stability result. Again suppose a possibly overcomplete system with mutual coherence $M$, and an ideal noiseless signal $x_0$ having a representation with at most $N$ atoms. Suppose that the smallest among the $N$ nonzeros in the representation of $x_0$ has amplitude $|\alpha_{\min}|$. Assume that we know the noise level $\|y - x_0\|_2 \le \epsilon$ and run the OGA just until the representation error $\le \epsilon$. Call the result of this greedy algorithm $\hat\alpha_{\mathrm{OGA}}$. Then if the noise is sufficiently weak,

$\epsilon \le \frac{|\alpha_{\min}|\,(1 - M(2N - 1))}{2},$   (1.11)

the recovered representation $\hat\alpha_{\mathrm{OGA}}$ obeys

$\|\hat\alpha_{\mathrm{OGA}} - \alpha_0\|_2^2 \le \frac{\epsilon^2}{1 - M(N - 1)}.$   (1.12)

This is a local stability result because for large values of $\epsilon$ the condition (1.11) will necessarily fail.

Note the parallel nature of the bounds and the conclusions. Three quite different algorithms all obey stability results in which having $N$ a fraction of $1/M$ is the key assumption.
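A small helper (ours, purely a numerical convenience under the formulas quoted above) evaluates the sparsity thresholds and stability coefficients appearing in (1.9), (1.10), and (1.12) for given $M$ and $N$.

```python
def stability_summary(M, N):
    """Sparsity thresholds and stability coefficients implied by (1.9), (1.10), (1.12)."""
    summary = {
        "P0_threshold": (1 + 1 / M) / 2,        # need N below this for (1.9)
        "P1_threshold": (1 + 1 / M) / 4,        # need N below this for (1.10)
    }
    if M * (2 * N - 1) < 1:
        summary["C0"] = (1 - M * (2 * N - 1)) ** -0.5   # multiplies (eps + delta) in (1.9)
    if M * (4 * N - 1) < 1:
        summary["C1"] = (1 - M * (4 * N - 1)) ** -0.5   # multiplies (eps + delta) in (1.10)
    if M * (N - 1) < 1:
        summary["C_OGA"] = (1 - M * (N - 1)) ** -0.5    # multiplies eps in (1.12)
    return summary

# Example: a dictionary with coherence M = 0.1 and a 3-term representation.
print(stability_summary(0.1, 3))
```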
`
`E. Support Properties
A different fundamental question about efforts to obtain sparse representation is: do we actually recover the correct sparsity pattern? Our stability results do not address this question, since it is possible for a nonsparse representation to be close to a sparse representation in an $\ell^2$ sense. The question is fundamental and broadly significant. Throughout science and technology, it is habitual to fit sparse models to noisy data, and then simply assume that terms appearing in such fitted models are dependable features.

In this paper, we are able to shed some light on this situation. Our results show that, under appropriate conditions, the empirical representation $\hat\alpha$ is not only at least as sparse as the ideal sparse representation, but it only contains atoms also appearing in the ideal sparse representation. Since that ideal sparse representation is, by our other results, unique and well-defined, these insights endow the empirical support of $\hat\alpha$ with a, perhaps surprising, significance.

Our first result concerns solution of $(P_{1,\delta})$ with an exaggerated noise level $\delta = C\epsilon$, $C = C(M, N) > 1$; that is, we are solving the minimization problem using an exaggeration of the noise level. It shows, under appropriate sparsity of $\alpha_0$ and for $C$ large enough, that the solution $\hat\alpha_{1,\delta}$ has its support contained in the support of $\alpha_0$. Here $C$ grows with $N$, so ordinarily it requires considerable overstatement of the noise level to achieve this level of conservatism. However, it does provide the very interesting epistemological benefit that the atoms appearing in the representation have more than incidental meaning.

Our second result is obtained in the course of analyzing OGA; it shows that, under condition (1.11) and sufficient sparsity, the ideal noiseless representation is unique, and the support of $\hat\alpha_{\mathrm{OGA}}$ is contained in the support of $\alpha_0$.
`
F. Contents

The next sections supply the analysis behind the stability bounds just quoted, and a discussion of support properties. The final section extends this work in various directions.

• Numerical Results. We study the actual stability and support recovery behavior of $(P_{1,\delta})$ and OGA on synthetic examples, finding typical behavior far more favorable than our theoretical bounds.

• Superresolution. We situate our work with respect to the problem of superresolution, in which astronomers, seismologists, spectroscopists, and others attempt to "deblur" sparse spike trains.

• Geometry. We develop a geometric viewpoint explaining why stability can sometimes be expected for the $\ell^1$ penalization scheme, under conditions of sufficient sparsity.

We have recently learned that in parallel to our efforts, there are two similar contributions, handling stability and support recovery for the basis pursuit. J. A. Tropp has been working independently on some of the same problems [40], and so has J. J. Fuchs [16]. After some recent discussions with these authors and a careful study of these works, we find that their methods and results have a rather different flavor, ensuring that the three separate works are of interest in studying sparse approximation under noise.

II. STABILITY USING $(P_{0,\delta})$

Suppose again the existence of an ideal noiseless signal $x_0 = \Phi\alpha_0$ and noisy observations $y = x_0 + z$ with $\|z\|_2 \le \epsilon$. Consider applying $(P_{0,\delta})$ with $\delta \ge \epsilon$ to obtain a sparse approximation to $y$. The following establishes the stability estimate mentioned in the introduction.

Theorem 2.1: Let the dictionary $\Phi$ have mutual coherence $M$. Suppose the noiseless signal $x_0 = \Phi\alpha_0$, where $\alpha_0$ satisfies

$N = \|\alpha_0\|_0 < \frac{1}{2}\left(1 + \frac{1}{M}\right).$   (2.1)

Then
a) $\alpha_0$ is the unique sparsest such representation of $x_0$; and
b) the reconstruction $\hat\alpha_{0,\delta}$ from applying $(P_{0,\delta})$ to the noisy data $y$ approximates $\alpha_0$:

$\|\hat\alpha_{0,\delta} - \alpha_0\|_2^2 \le \frac{(\delta + \epsilon)^2}{1 - M(2N - 1)}, \qquad \delta \ge \epsilon.$   (2.2)

Claim a) actually follows from known results; e.g., see [13], [14] for the two-ortho case, and [11], [19] for general dictionaries. Claim b) requires the following.

Lemma 2.2: Suppose $M(N - 1) < 1$. Every submatrix $\Phi_N$ formed by concatenating $N$ columns from $\Phi$ has its $N$th singular value bounded below by $\sqrt{1 - M(N - 1)}$.

Proof: This is equivalent to the claim that

$\|\Phi_N \beta\|_2^2 \ge (1 - M(N - 1))\,\|\beta\|_2^2,$   (2.3)

where $\Phi_N$ is the concatenation of $N$ columns from $\Phi$ and $\beta$ is any vector in $\mathbb{R}^N$. Assume without loss of generality these are the first $N$ columns. Let $G_N = \Phi_N^T\Phi_N$ be the corresponding Gram matrix, and write

$\|\Phi_N \beta\|_2^2 = \beta^T G_N \beta = \|\beta\|_2^2 + \sum_{k \ne j} \beta_k \beta_j\, G_N(k, j).$   (2.4)

Now

$\Big|\sum_{k \ne j} \beta_k \beta_j\, G_N(k, j)\Big| \le M \sum_{k \ne j} |\beta_k|\,|\beta_j| = M\big(\|\beta\|_1^2 - \|\beta\|_2^2\big) \le M(N - 1)\,\|\beta\|_2^2,$

where we used $\|\beta\|_1^2 \le N\|\beta\|_2^2$. Using this inequality in the identity (2.4) gives (2.3).

To prove claim b), apply this to $v = \hat\alpha_{0,\delta} - \alpha_0$. Since both $\hat\alpha_{0,\delta}$ and $\alpha_0$ are feasible for $(P_{0,\delta})$ when $\delta \ge \epsilon$, we have

$\|\Phi v\|_2 \le \|\Phi\hat\alpha_{0,\delta} - y\|_2 + \|y - \Phi\alpha_0\|_2 \le \delta + \epsilon.$

Also, note that, since $\hat\alpha_{0,\delta}$ is a linear combination of at most $N$ columns of $\Phi$ (by (1.6)), $v$ can be written as a linear combination of at most $2N$ columns from $\Phi$. Applying Lemma 2.2, or, more properly, the inequality (2.3), with $2N$ columns in place of $N$, gives the result.
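As a quick numerical sanity check of Lemma 2.2 (our own illustration, not part of the paper), the script below verifies that, for random $N$-column submatrices of a unit-norm dictionary, the smallest singular value stays above $\sqrt{1 - M(N - 1)}$ whenever $M(N - 1) < 1$.

```python
import numpy as np

rng = np.random.default_rng(1)
n, m, N = 256, 512, 3
Phi = rng.standard_normal((n, m))
Phi /= np.linalg.norm(Phi, axis=0)                 # unit-norm atoms

G = np.abs(Phi.T @ Phi)
np.fill_diagonal(G, 0.0)
M = G.max()                                        # mutual coherence (1.4)
bound = np.sqrt(max(1.0 - M * (N - 1), 0.0))       # lower bound from Lemma 2.2

worst = min(
    np.linalg.svd(Phi[:, rng.choice(m, N, replace=False)], compute_uv=False)[-1]
    for _ in range(2000)
)
print(f"M = {M:.3f}, bound = {bound:.3f}, worst observed sigma_N = {worst:.3f}")
```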
`
III. STABILITY USING $(P_{1,\delta})$

As in the Introduction, we are given a signal $y = \Phi\alpha_0 + z$, where $z$ is an additive noise, known to satisfy $\|z\|_2 \le \epsilon$. We apply $(P_{1,\delta})$ to this signal (necessarily with $\delta \ge \epsilon$); i.e., we solve (1.7) and obtain a solution $\hat\alpha_{1,\delta}$. We study its deviation from the ideal representation $\alpha_0$.
`
`
`
`
`A. Stability Result
Theorem 3.1: Let the overcomplete system $\Phi$ have mutual coherence $M$. If some representation $\alpha_0$ of the noiseless signal $x_0 = \Phi\alpha_0$ satisfies

$N = \|\alpha_0\|_0 < \frac{1}{4}\left(1 + \frac{1}{M}\right),$   (3.1)
`
`We next simplify our analysis by eliminating the noise vector
`using
`
`,
`
`(3.8)
`Expanding the feasible set of (3.7) using this observation gives
`
then the deviation of the $(P_{1,\delta})$ representation from $\alpha_0$, assuming $\delta \ge \epsilon$, can be bounded by
`
`subject to
`
`#
`
`(3.9)
`
$\|\hat\alpha_{1,\delta} - \alpha_0\|_2^2 \le \frac{(\delta + \epsilon)^2}{1 - M(4N - 1)}.$   (3.2)
`
`Proof: The stability bound can be posed as the solution to
`an optimization problem of the form
`
`subject to
`
`subject to
`
`(3.3)
`
`.
`where we introduced
`is not posed in terms of the abso-
`The constraint
`, complicating the analysis; we now
`lute values in the vector
`. Again, the Gram
`relax this constraint using incoherence of
`, and the mutual coherence is the maximal
`matrix is
`. For a vector
`off-diagonal amplitude:
`, let
`be vector containing absolute values from ; simi-
`be the the
`-by- matrix of all
`larly for matrices. Also, let
`ones. The constraint
`
`Put in words, we consider all representation vectors
`of
`bounded support, and all possible realizations of bounded
`noise, and we ask for the largest error between the ideal sparse
`decomposition and its reconstruction from noisy data. Defining
`, and similarly
`, we can rewrite the
`above problem as
`
`can be relaxed
`
`subject to
`
`subject to
`
`Using this,
`
`(3.9) is bounded above by the value
`
`(3.4)
`
`(3.10)
`
`(3.4) in a sequence of re-
`We develop an upper bound on
`laxations, each one expanding the feasible set and increasing
`is the minimizer of
`the maximal value. To begin, note that if
`under these constraints, then relaxing the constraints
`satisfying
`expands the feasible set.
`to all
`since otherwise
`is not
`However, this is true only if
`a feasible solution. Thus, we consider
`
`subject to
`
`#
`
`(3.11)
`This problem is invariant under permutations of the entries in
`which preserve membership in
`and
`. It is also invariant
`under relabeling of coordinates. So assume that all nonzeros in
`are concentrated in the initial slots of the vector, i.e., that
`.
`
`(3.5)
`
`Putting
`, and
`
`the remaining
`
`where
`
`entries in
`gives the first
`entries of
`, we obviously have
`
`We now expand this set by exploiting the relation
`
`The
`
`norm on
`times the
`
`dominates the
`norm. Thus,
`
`norm and is dominated by
`
`(3.12)
`
`with complement
`is the support of the nonzeros in
`where
`. Therefore,
`, and we used
`we get a further increase in value by replacing the feasible set
`in (3.5) with
`
`#
`
`(3.6)
`
`We define
`
`Writing this out yields a new optimization problem with still
`larger value
`
`subject to
`
`(3.13)
`
`(3.14)
`
`#
`
`(3.7)
`
`Returning to the problem given in (3.11), and using our
`notations, we obtain a further
`reduction,
`from an opti-
`
`
`
`
`mization problem on
`
`to an optimization problem on
`
IV. SUPPORT RECOVERY WITH $(P_{1,\delta})$
`
`subject to
`
`(3.15)
`
`We further define
`as
`
`, where
`
`and rewrite (3.15)
`
`subject to
`
So far, we have concentrated on the recovery of $\alpha_0$. We now consider whether we can correctly recover the support of $\alpha_0$. Our approach applies $(P_{1,\delta})$ with a specially chosen $\delta$.
`
`Theorem 4.1: Suppose that
`and
`. Let
`
`. Set
`
`where
`
`and suppose
`
`(3.16)
`
`Solve
`
`with exaggerated noise level
`.
`
`(4.1)
`
`. Then
`
`Define
`region (3.16). Setting
`defining that region takes the form
`
`. Then
`
`over the
`, the first constraint
`
`Since
`
`, the sparsity requirement (3.1) leads to
`
`Hence,
`
`(3.17)
`
`(3.18)
`
`(3.19)
`
`.
`as stated by (3.2) with the choice
`The requirement (3.18) puts a restriction on
`free parameters of the problem. Using
`sparsity requirement in (3.1), since
`
`, being
`and
`leads to the
`.
`
`B. Interpretation and Comments
`Theorem 3.1 prompts several remarks.
`
• Setting $\delta = \epsilon = 0$ puts us in the noiseless case $y = x_0$. In that setting, Theorem 3.1 tells us that if $N < (1 + M^{-1})/4$, there will be zero error in finding the unique sparsest representation; i.e., solving the $\ell^1$ optimization problem $(P_1)$ solves the $\ell^0$ problem $(P_0)$. As the $\ell^1$ problem is convex and the $\ell^0$ problem combinatorial in general, this is by itself significant. The same general phenomenon described has been observed before in [13], [14], [11], [19]. The sharpest results, in [11], [19], established that this phenomenon occurs for any sparsity smaller than $(1 + M^{-1})/2$, which means that the new result is slack by a factor of $2$ in the $\epsilon = \delta = 0$ case. Perhaps a tighter inequality could be achieved with more care.
`
• If the signal is not noisy (i.e., $\epsilon = 0$) but nevertheless $(P_{1,\delta})$ is employed with $\delta > 0$, an approximate solution is assured, with a bound on the deviation of the approximate representation from the ideal noiseless representation. So we tolerate errors in the decomposition in "needlessly" going from $(P_1)$ to $(P_{1,\delta})$, but the errors are controlled.

• In practice, if the signal is noisy (i.e., $\epsilon > 0$) and we set $\delta = 0$ as if there were no noise, some degree of stability is still obtained! However, our results do not cover this case, and further work is required to establish this kind of stability.
`
As an example, exaggerating the noise level by a factor that grows with $N$ leads to partial support recovery. This pronounced inflation of the noise level might cause (in an extreme case) a zero solution. Still, from the following proof it seems that this dependence on $N$ is intrinsic to the problem.
`
Proof: For later use, we let $\sigma_N$ be the smallest $N$th singular value of any submatrix of $\Phi$ containing $N$ columns from $\Phi$. By our assumptions and Lemma 2.2,

$\sigma_N \ge \sqrt{1 - M(N - 1)} > 0,$   (4.2)
`
`and
`
`We need notation for the convex functional
`.
`for the quadratic functional
`be the support of the ideal noiseless representation
`Now let
`, and consider the support-constrained optimization problem
`where feasible vectors must be supported in . Let
`be a solution of this problem. We claim that, in fact,
`is ac-
`,
`tually the solution of the support-unconstrained problem
`. Observe, first of all, that unless
`, there is
`i.e.,
`nothing to prove. Now consider perturbations
`i.e., rep-
`of
`, for
`small. We will
`resentations of the form
`objec-
`show that a perturbation which does not increase the
`tive, definitely violates the constraint. Formally
`
`for all sufficiently small
`
`(4.3)
`
`implies
`
`for all sufficiently small
`
`(4.4)
`
`It follows that
`. By convexity,
`is locally optimal for
`this local condition implies global optimality; and global opti-
`as claimed.
`mality implies that the solution has support
`We now note that, for
`
`where
`and
`sufficiently small positive
`
`(4.5)
`
`, while for
`
`(4.6)
`
`
`
`
`Using
`
`and the definition (4.2) of
`
`and so
`
`In short, for (4.12) and hence (4.10) to follow, we should re-
`quire
`
`(4.13)
`
`for
`
`otherwise, and we
`, and
`where
`, valid for
`. Let
`used the identity
`denote the terms of order
`in (4.5) and (4.7);
`and
`we plan to show that
`
`(4.7)
`
`this will show that (4.3) implies (4.4).
`.
`solving
`We first work out the consequence of
`, there is a nonempty subspace of vectors sup-
`Since
`; (4.5) and (4.7) show that
`and
`are both dif-
`ported in
`. The fact that
`solves the constrained op-
`ferentiable at
`timization problem
`, implies (by classical constrained
`optimization/Lagrange multiplier ideas) that for some
`
`This implies that for vectors
`, or
`
`supported in
`
`we must have
`
`(4.8)
`
`for inner product in
`We introduce the notation
`for
`for inner product restricted to coordinates in
`, and
`. We introduce the
`inner product restricted to coordinates in
`for the
`norm restricted to
`, etc. To illus-
`notation
`trate the notation, we have that, if
`is a vector supported in
`. Then (4.8) says that, for all
`
`at
`
`which can be rearranged to
`
`(4.14)
`
`(4.15)
`
`This holds because our definition of makes this an equality.
`It only remains to demonstrate (4.11). Write
`where
`is the component of not in the span of
`, while
`is the component of
`which has norm
`, with norm
`. Hence,
`of the
`
`,
`,
`in the span
`
`We now show that
`is the unique global optimum of
`—i.e., the original problem, without the support con-
`straint. We write
`
`(4.9)
`
`and
`
`then from (4.9)
`
`while
`
`where
`are entries of the Gram matrix
`) and
`is the submatrix of
`(known to be in the range
`with columns from only. The definition of
`yields
`
`(4.16)
`
`Hence,
`
`implies
`
`and also
`
`and since
`
`, (4.16) gives
`
`giving (4.11).
`
`Hence, (4.7) follows from
`
`Later we will show that
`
`Thus, (4.10) follows from
`
`Recalling (4.9), and choosing
`
`, shows that
`
`V. LOCAL STABILITY OF THE OGA
`
`(4.10)
`
`(4.11)
`
`(4.12)
`
Observe that both $(P_{0,\delta})$ and $(P_{1,\delta})$ refer to global optimization problems, while the OGA described in the Introduction is based on greedy stagewise approximation. Paralleling this distinction, the stability result we now develop for OGA is a local one, valid only for sufficiently small $\epsilon$, depending on the representation coefficients.

For ease of exposition, we shall hereafter assume that the order of the columns in the overcomplete system matrix $\Phi$ has been chosen so that in the ideal noiseless signal $x_0 = \Phi\alpha_0$, the first $N$ entries in $\alpha_0$ are the nonzero entries, and that these are ordered

$|\alpha_0(1)| \ge |\alpha_0(2)| \ge \cdots \ge |\alpha_0(N)| > 0.$   (5.1)
`
`
`
`
Theorem 5.1: Suppose the ideal noiseless signal $x_0 = \Phi\alpha_0$ has a representation $\alpha_0$ satisfying¹

$N = \|\alpha_0\|_0 \le \frac{1}{2}\left(1 + \frac{1}{M}\right) - \frac{1}{M}\cdot\frac{\epsilon}{|\alpha_0(N)|}.$   (5.2)

Denote by $\hat\alpha_{\mathrm{OGA}}$ the result of greedy stepwise least-squares fitting applied on the noisy signal $y$, which stops as soon as the representation error $\le \epsilon$. Then
a) $\hat\alpha_{\mathrm{OGA}}$ has the correct sparsity pattern:

$\operatorname{supp}(\hat\alpha_{\mathrm{OGA}}) \subseteq \operatorname{supp}(\alpha_0);$   (5.3)

b) $\hat\alpha_{\mathrm{OGA}}$ approximates the ideal noiseless representation:

$\|\hat\alpha_{\mathrm{OGA}} - \alpha_0\|_2^2 \le \frac{\epsilon^2}{1 - M(N - 1)}.$   (5.4)

¹This inequality is equivalent to the one posed in (1.11).
`
Note that the argument assumes the noise level $\epsilon$ is known, to enable the stopping rule in the algorithm. This parallels the assumption $\delta = \epsilon$, which we relaxed in Theorems 2.1 and 3.1 by using $\delta \ge \epsilon$.

The general idea, that the support properties of $\hat\alpha_{\mathrm{OGA}}$ and $\alpha_0$ are the same, seems worthy of its own study. In the Technical Report [12] on which this paper is based, we call this the trapping property, and develop it further.

We break our analysis in two stages, considering claims (5.3) and (5.4) in turn.

A. Getting the "Correct" Support

Lemma 5.2: Suppose that we have a signal $y = x_s + z$, where $\|z\|_2 \le \epsilon$ and $x_s$ admits sparse synthesis $x_s = \Phi\alpha_s$ using at most $N$ atoms, where $N$ satisfies

$N \le \frac{1}{2}\left(1 + \frac{1}{M}\right) - \frac{1}{M}\cdot\frac{\epsilon}{|\alpha_{\max}|},$   (5.5)

and where $|\alpha_{\max}|$ denotes the largest amplitude among the nonzeros in $\alpha_s$. Then the first step of the greedy algorithm selects an atom index from among the $N$ nonzeros in $\alpha_s$.

Proof: The greedy algorithm operates by projecting $y$ onto each atom in turn, selecting an atom index where the projection magnitude is largest. The lemma will follow from

$\max_{i \in S} |\langle y, \phi_i\rangle| > \max_{j \notin S} |\langle y, \phi_j\rangle|,$   (5.6)

where $S$ denotes the support of $\alpha_s$. We now develop a lower bound on the left-hand side and an upper bound on the right-hand side which guarantees this. Assuming that the largest amplitude entry in $\alpha_s$ occurs in slot $i_1$, the left-hand side of (5.6) is bounded below by

$|\langle y, \phi_{i_1}\rangle| \ge |\alpha_{\max}|\,(1 - M(N - 1)) - \epsilon.$   (5.7)

We used $\|\phi_i\|_2 = 1$, the bound $|\langle \phi_i, \phi_j\rangle| \le M$ for $i \ne j$, and the ordering of the amplitudes. The right-hand side of (5.6) can be bounded above by the same approach, leading to, for $j \notin S$,

$|\langle y, \phi_j\rangle| \le M N\,|\alpha_{\max}| + \epsilon.$   (5.8)

Imposing (5.5) and using the two bounds (5.7) and (5.8), we see that

$|\alpha_{\max}|\,(1 - M(N - 1)) - \epsilon > M N\,|\alpha_{\max}| + \epsilon.$   (5.9)

Relation (5.6) follows; the greedy algorithm therefore chooses at Stage 1 one of the nonzeros in the ideal representation of the noiseless signal.

To continue to later stages, we need the following.

Lemma 5.3: Let $y = x_s + z$ with $\|z\|_2 \le \epsilon$, and suppose $x_s = \Phi\alpha_s$ has a unique sparsest representation $\alpha_s$ made of at most $N < (1 + M^{-1})/2$ atoms. Let $S$ be a set of indices in $\{1, \ldots, m\}$ contained in the support of $\alpha_s$, with $\#S$ being the number of columns in the corresponding submatrix of $\Phi$. Let $a$ be a vector of coefficients with nonzeros located at the indices in $S$. Define a new signal by subtracting the atoms with nonzero coefficients in $a$,

$y^{(1)} = y - \Phi a,$

and similarly define

$x_s^{(1)} = x_s - \Phi a.$

Then
• $x_s^{(1)}$ has a unique sparsest representation, $\alpha_s - a$, made of at most $N$ atoms; these are all atoms originally appearing in the representation of $x_s$;
• the new signal $y^{(1)}$ can be viewed as a superposition of $x_s^{(1)}$ and noise $z$, with noise level $\epsilon$.

Proof: Define the vector $\alpha^{(1)} = \alpha_s - a$. Then clearly $x_s^{(1)} = \Phi\alpha^{(1)}$. Also, $\alpha^{(1)}$ is supported in the support of $\alpha_s$, so $\|\alpha^{(1)}\|_0 \le N$. Since $N < (1 + M^{-1})/2$, we conclude that $\alpha^{(1)}$ is the unique sparsest representation of $x_s^{(1)}$. Moreover,

$y^{(1)} - x_s^{(1)} = y - x_s = z,$

with $\|z\|_2 \le \epsilon$. Hence we have established the two claims.
`
`The impact of the preceding two lemmas is that selection of
`a term, followed by the formation of the residual signal, leads
`us to a situation like before, where the ideal noiseless signal has
`no more atoms than before, and the noise level is the same.
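As a small synthetic illustration of these claims (our own experiment, not the paper's numerical section), the script below builds an incoherent random dictionary, forms a noisy sparse signal, runs the greedy algorithm with the stopping rule of Theorem 5.1, and checks that the recovered support is contained in the true one.

```python
import numpy as np

rng = np.random.default_rng(2)
n, m, N, eps = 256, 512, 4, 0.05

Phi = rng.standard_normal((n, m))
Phi /= np.linalg.norm(Phi, axis=0)                  # unit-norm atoms

true_support = rng.choice(m, N, replace=False)
alpha0 = np.zeros(m)
alpha0[true_support] = rng.uniform(1.0, 2.0, N) * rng.choice([-1, 1], N)

z = rng.standard_normal(n)
z *= eps / np.linalg.norm(z)                        # noise of norm exactly eps
y = Phi @ alpha0 + z

# OGA/OMP with the stopping rule of Theorem 5.1: stop once the residual norm <= eps.
support, residual, coef = [], y.copy(), np.zeros(0)
while np.linalg.norm(residual) > eps:
    support.append(int(np.argmax(np.abs(Phi.T @ residual))))
    coef, *_ = np.linalg.lstsq(Phi[:, support], y, rcond=None)
    residual = y - Phi[:, support] @ coef

alpha_hat = np.zeros(m)
alpha_hat[support] = coef
print("support contained in the true support:", set(support) <= set(true_support.tolist()))
print("l2 error:", np.linalg.norm(alpha_hat - alpha0))
```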
`
`
`
`
We wish to repeatedly apply these lemmas. Starting with $y$, we will get a selected index set $S_1$ and a residual $y^{(1)}$; we then hope to apply the observations again, getting $S_2$ and $y^{(2)}$, etc. If we are allowed to continue invoking these lemmas for $N$ steps, we produce in this way a series of residuals $y^{(k)}$ and index sets $S_k$. Naturally, the sets $S_k$ are nested. Note, however, that a series of conditions must be satisfied for the repeated use