IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 52, NO. 1, JANUARY 2006

Stable Recovery of Sparse Overcomplete Representations in the Presence of Noise

David L. Donoho, Member, IEEE, Michael Elad, and Vladimir N. Temlyakov
Abstract—Overcomplete representations are attracting interest in signal processing theory, particularly due to their potential to generate sparse representations of signals. However, in general, the problem of finding sparse representations must be unstable in the presence of noise. This paper establishes the possibility of stable recovery under a combination of sufficient sparsity and favorable structure of the overcomplete system. Considering an ideal underlying signal that has a sufficiently sparse representation, it is assumed that only a noisy version of it can be observed. Assuming further that the overcomplete system is incoherent, it is shown that the optimally sparse approximation to the noisy data differs from the optimally sparse decomposition of the ideal noiseless signal by at most a constant multiple of the noise level. As this optimal-sparsity method requires heavy (combinatorial) computational effort, approximation algorithms are considered. It is shown that similar stability is also available using the basis pursuit and the matching pursuit algorithms. Furthermore, it is shown that these methods result in a sparse approximation of the noisy data that contains only terms also appearing in the unique sparsest representation of the ideal noiseless sparse signal.

Index Terms—Basis pursuit, greedy approximation, incoherent dictionary, Kruskal rank, matching pursuit, overcomplete representation, sparse representation, stability, stepwise regression, superresolution.
I. INTRODUCTION

A. Overcomplete Representation

RESEARCHERS spanning a diverse range of viewpoints have recently advocated the use of overcomplete signal representations [27], [30], [1], [5], [4], [33], [37], [35]. Generally speaking, they suppose we have a signal vector y in R^n, and a collection of vectors {phi_i : i = 1, ..., m}, with m > n such vectors, so that the collection forms "more than a basis"; since [27] such collections are usually called dictionaries, and their elements are called atoms. We want a representation for our signal y as a linear combination of atoms in this dictionary.

Manuscript received February 29, 2004; revised September 12, 2005. This work was supported in part by the National Science Foundation under Grants DMS 00-77261, 01-40698 (FRG), 02-00187, ANI-008584 (ITR), and by DARPA ACMP and ONR-MURI.
D. L. Donoho is with the Department of Statistics, Stanford University, Stanford, CA 94305-9025 USA (e-mail: donoho@stanford.edu).
M. Elad is with the Department of Computer Science, Technion–Israel Institute of Technology, Technion City, Haifa 32000, Israel (e-mail: elad@cs.technion.ac.il).
V. N. Temlyakov is with the Department of Mathematics, University of South Carolina, Columbia, SC 29208 USA (e-mail: temlyak@math.sc.edu).
Communicated by G. Battail, Associate Editor At Large.
Digital Object Identifier 10.1109/TIT.2005.860430
`
`elements; potentially, this wider range allows more flexibility in
`signal representation, and more effectiveness at tasks like signal
`extraction and data compression. Proposals for overcomplete
`representations have included multiscale Gabor functions [27],
`[30], systems defined by algebraic codes [33], amalgams of
`wavelets and sinusoids [3], [4], [14], libraries of windowed
`cosines with a range of different widths and locations [5],
`[42], multiscale windowed ridgelets [32], systems generated at
`random [11], and amalgams of wavelets and linelike generating
`elements [22].
`A number of interesting arguments, both heuristic and
`theoretical, have been advanced to support the benefits of over-
`completeness; in theoretical neuroscience it has been argued
`that overcomplete representations are probably necessary for
`use in biological settings in the mammalian visual system [28];
`in approximation theory, there are persuasive examples where
`approximation from overcomplete systems outperforms any
`known basis [2]; in signal processing, it has been reported that
`decomposition into separate transforms gives improved com-
`pression [1], [8] and improved equalization [6]; and in image
`processing, it has been shown that one can separate images into
`disjoint signal types using such decompositions [31], [32], [22].
At the same time, there is an apparent obstacle to overcomplete representations, based on elementary linear algebra. We can think of the atoms in our dictionary as columns in a matrix Phi, so that Phi is n by m and m > n. A representation of y can be thought of as a vector alpha satisfying y = Phi alpha. However, linear algebra tells us that because m > n, the problem of representation is underdetermined. Hence, as is widely taught in elementary courses, there is no unique solution to the representation problem, and far more disturbingly, if the data are even slightly inaccurate, some familiar algorithms will be staggeringly unstable. That this can be a real issue was shown by Wohlberg [43], who considered a dictionary of sinusoids with frequencies spaced finer than the usual discrete Fourier frequencies, and documented the extreme ill-posedness that can result.
In this paper, we consider the impact of sparsity constraints on this situation, and study algorithms which can in certain circumstances generate sparse representations in an overcomplete dictionary. We derive rigorous bounds showing that, when the dictionary Phi has a property of mutual incoherence (defined below), and when it offers a sufficiently sparse representation for the ideal noiseless signal, the algorithms are locally stable, i.e., under addition of small amounts of noise, the algorithms recover the ideal sparse representation with an error that grows at most proportionally to the noise level. Some of the algorithms are even globally stable, i.e., they recover the ideal noiseless reconstruction with an error at worst proportional to the noise level
even under the addition of arbitrary amounts of noise. Under sufficient sparsity, the constants of proportionality are very reasonable.

In short, we show that, although the problem of recovering the underlying overcomplete representation is admittedly very ill-posed in general, when the underlying representation is sparse and the dictionary is incoherent, the ill-posedness can disappear.
B. Sparse Representation

To fix ideas, consider the problem of finding the sparsest representation possible in an overcomplete dictionary Phi. As a measure of sparsity of a vector alpha, we take the so-called l0 norm ||alpha||_0, which is simply the number of nonzero elements in alpha. The sparsest representation is then the solution to the optimization problem

    (P0):   min_alpha ||alpha||_0   subject to   y = Phi alpha.                          (1.1)

As stated, this seems to be a general combinatorial optimization problem, requiring that one enumerate all possible k-element collections of columns of Phi, for k = 1, 2, ..., m, looking for the smallest collection permitting representation of the signal. Such an algorithm would cost a number of flops growing exponentially in m to carry out in general, and at least on the order of the number of N-element subsets of the m columns even when a sparse N-element representation existed. We therefore turn to approximations/relaxations of (P0).
Orthogonal Greedy Algorithm: One heuristic approach builds up k-element approximate representations a step at a time, adding to an existing (k-1)-element approximation a new term chosen in a greedy fashion to minimize the resulting error (over all possible choices of the single additional term). When stopped after k stages, one gets a sparse k-term approximate representation. In more detail, the procedure starts from an initial residual r^0 = y and a current decomposition a^0 = 0; then for k = 1, 2, 3, ... it augments the decomposition and updates the residual stepwise, always maintaining r^k = y - a^k. We suppose that the dictionary has normalized atoms, so that each ||phi_i||_2 = 1. At the kth stage, the algorithm selects an atom to be added to the decomposition based on correlation with the current residual,

    i_k = arg max_i |<r^{k-1}, phi_i>|;

it builds a decomposition consisting of the atoms selected through that stage,

    a^k = sum_{t=1}^{k} beta_t phi_{i_t},                                                (1.2)

where the coefficients (beta_t) are fitted by least squares to minimize ||y - a^k||_2; and it subtracts this model from y, obtaining a new residual

    r^k = y - a^k,

which can be input to the next stage of the algorithm. At that point, the basic iteration is repeated. The algorithm can be stopped when the residual norm is below some predetermined threshold, or based on the number of atoms used.

In the setting of statistical modeling, greedy stepwise least squares is called forward stepwise regression, and has been widely practiced since the 1960s [7], [20]. When used in the signal processing setting, this goes by the name of matching pursuit (MP) [27]; actually, we have described a variant called orthogonal matching pursuit (OMP) [29]. Following [9], we call this the orthogonal greedy algorithm (OGA).
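For concreteness, the stepwise procedure just described can be written out in a few lines. The following NumPy sketch is our own illustration (the function and variable names are ours): it assumes a dictionary Phi with unit-norm columns and stops either when the residual norm falls below a threshold delta, anticipating the noise-aware stopping rule of Section I-C, or when a prescribed number of atoms has been used.

import numpy as np

def oga(Phi, y, delta, max_atoms=None):
    """Orthogonal greedy algorithm (OMP): greedily select atoms of Phi
    (assumed to have unit-norm columns) until the residual norm falls
    below delta or max_atoms atoms have been chosen."""
    n, m = Phi.shape
    if max_atoms is None:
        max_atoms = n
    support = []                      # indices of selected atoms
    alpha = np.zeros(m)               # current coefficient estimate
    residual = y.copy()
    while np.linalg.norm(residual) > delta and len(support) < max_atoms:
        # select the atom most correlated with the current residual
        correlations = np.abs(Phi.T @ residual)
        correlations[support] = 0.0   # do not reselect chosen atoms
        support.append(int(np.argmax(correlations)))
        # least-squares fit of y on the selected atoms
        coef, *_ = np.linalg.lstsq(Phi[:, support], y, rcond=None)
        alpha = np.zeros(m)
        alpha[support] = coef
        residual = y - Phi @ alpha
    return alpha, support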
Convex Relaxation: A more formal approach convexifies (P0) by replacing the l0-norm with an l1-norm:

    (P1):   min_alpha ||alpha||_1   subject to   y = Phi alpha.                          (1.3)

This can be cast as a linear programming (LP) problem, for which solutions are available even in large-scale problems, owing to modern interior-point linear programming methods. This approach to overcomplete signal representation was called basis pursuit (BP) in [4], which observed that it sometimes gave highly sparse solutions to problems known to have such sparse solutions, and showed that it could, in specific cases, outperform the greedy pursuit approach in generating sparse solutions.
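The linear-programming reformulation mentioned above is obtained by splitting alpha into its positive and negative parts. A minimal sketch, using SciPy's linprog as the LP solver (our own illustration, not the solver used in [4]):

import numpy as np
from scipy.optimize import linprog

def basis_pursuit(Phi, y):
    """Solve min ||alpha||_1 subject to Phi @ alpha = y via the standard
    linear-programming reformulation alpha = u - v with u, v >= 0."""
    n, m = Phi.shape
    c = np.ones(2 * m)                       # objective: sum(u) + sum(v)
    A_eq = np.hstack([Phi, -Phi])            # Phi @ (u - v) = y
    bounds = [(0, None)] * (2 * m)
    res = linprog(c, A_eq=A_eq, b_eq=y, bounds=bounds, method="highs")
    uv = res.x
    return uv[:m] - uv[m:]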
Formal Justification: The key point about both OGA and BP is that they are much more practical than the direct solution of (P0). Perhaps surprisingly, these approaches can, under certain conditions, correctly solve (P0). Thus, practical methods can solve problems that otherwise on the surface seem computationally intractable. Previous work [9], [17], [39], [38], [15], [13], [14], [11], [19] established that both the OGA and BP approaches can be successful for signals having sparse representations; under appropriate conditions on Phi and y, these algorithms produce the globally optimal solution of (P0). The concept of mutual coherence of the dictionary Phi plays a major role in these results. It is defined, assuming that the columns of Phi are normalized to unit l2-norm, in terms of the Gram matrix G = Phi^T Phi. With G(i, j) denoting entries of this matrix, the mutual coherence is

    M = M(Phi) = max_{1 <= i, j <= m, i != j} |G(i, j)|.                                 (1.4)

A dictionary is incoherent if M is small. There are overcomplete systems with m considerably larger than n and with M as small as 1/sqrt(n) [33]. The results in [13], [14], [11], [19] showed that, if there exists a representation y = Phi alpha0 with sparsity N = ||alpha0||_0, and N does not exceed a threshold defined by M alone (we consider the threshold (1 + 1/M)/2), then a) this is the unique sparsest representation, and b) these algorithms would find it. If, for example, M = 1/sqrt(n), this result promises, for large n, an ideal form of atomic decomposition even of fairly complex objects. In such cases, provided the object y is made from fewer than (1 + sqrt(n))/2 atoms in the dictionary, this sparse decomposition can be uniquely recovered.
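Mutual coherence is inexpensive to evaluate directly from the Gram matrix. The following short sketch (our own illustration; the random dictionary is chosen only as an example) computes M and the sparsity threshold quoted above.

import numpy as np

def mutual_coherence(Phi):
    """Mutual coherence M: the largest off-diagonal amplitude of the
    Gram matrix G = Phi^T Phi after normalizing the columns of Phi."""
    Phi = Phi / np.linalg.norm(Phi, axis=0)
    G = np.abs(Phi.T @ Phi)
    np.fill_diagonal(G, 0.0)
    return G.max()

rng = np.random.default_rng(0)
Phi = rng.standard_normal((64, 128))          # example dictionary (ours)
M = mutual_coherence(Phi)
uniqueness_threshold = 0.5 * (1 + 1 / M)      # sparsity bound quoted above
print(M, uniqueness_threshold)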
C. Presence of Noise

In most practical situations it is not sensible to assume that the available data obey precise equality y = Phi alpha with a sparse representation alpha. A more plausible scenario assumes sparse approximate representation: that there is an ideal noiseless signal y0 = Phi alpha0 with a sparse representation alpha0, ||alpha0||_0 small, but that we can observe only a noisy version y = y0 + z, where ||z||_2 <= epsilon.

Noise-Aware Variant of (P0): We can adapt (P0) to this noisy setting by modifying it to include a noise allowance delta:

    (P0,delta):   min_alpha ||alpha||_0   subject to   ||y - Phi alpha||_2 <= delta.     (1.5)

Note that the solution alpha_{0,delta} obeys ||y - Phi alpha_{0,delta}||_2 <= delta. Also, if we apply this with delta >= epsilon, the problem has a sparse solution; in fact, the solution is at least as sparse as alpha0, or more formally, since alpha0 is a feasible solution of (P0,delta),

    val(P0,delta) <= ||alpha0||_0.                                                       (1.6)

In our notation, val(P) of an optimization problem (P) stands for its value at the solution. Note that (P0,delta) rarely has a unique solution, since once the sparsest solution is found, many feasible variants of it sharing the same support can be built.

Noise-Aware Variant of OGA: Just as with (P0), (P0,delta) demands exorbitant computational efforts in general, and so again we may resort to heuristics and relaxations. On the one hand, OGA can be employed for approximating the solution of (1.5); the stepwise procedure can simply be stopped when the representation error gets below delta.

Noise-Aware Variant of (P1): On the other hand, we can pursue a strategy of convexification, replacing the l0-norm in (1.5) by an l1-norm:

    (P1,delta):   min_alpha ||alpha||_1   subject to   ||y - Phi alpha||_2 <= delta.     (1.7)

This can be cast as a convex quadratic program which can be solved by many standard approaches, including iteratively reweighted least squares (IRLS) [23], interior-point algorithms [4], and active-set methods. It is also closely related to basis pursuit denoising (BPDN) [4], and to the LASSO technique employed in statistical regression to avoid overfitting when combining predictors [34]. Both those proposals amount to solving a corresponding convex optimization in Lagrangian form,

    min_alpha  lambda ||alpha||_1 + (1/2) ||y - Phi alpha||_2^2,                         (1.8)

and for appropriate lambda = lambda(y, delta) the solutions of (1.8) and (1.7) are the same. Solving (1.8) with a fixed lambda leads to different results; see [40] for an analysis of this option. We note that instead of the l1-norm, one can use an lp-norm with 0 < p < 1 in order to better imitate l0 while losing convexity. This is the spirit behind the FOCUSS method [18].
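The Lagrangian form (1.8) also lends itself to simple first-order schemes. As one illustration (ours, and not among the algorithms analyzed in this paper), iterative soft thresholding alternates a gradient step on the quadratic term with the entrywise shrinkage induced by the l1 penalty:

import numpy as np

def soft_threshold(x, t):
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def bpdn_ista(Phi, y, lam, n_iter=500):
    """Iterative soft thresholding for the Lagrangian form
    min_alpha  lam * ||alpha||_1 + 0.5 * ||y - Phi @ alpha||_2**2."""
    m = Phi.shape[1]
    alpha = np.zeros(m)
    step = 1.0 / np.linalg.norm(Phi, 2) ** 2      # 1 / ||Phi||^2 ensures convergence
    for _ in range(n_iter):
        grad = Phi.T @ (Phi @ alpha - y)          # gradient of the quadratic term
        alpha = soft_threshold(alpha - step * grad, step * lam)
    return alpha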
D. Stability Properties

In this paper, we develop several results exhibiting stable recovery of sparse representations in the presence of noise. We now briefly sketch their statements.

First, we show that when sufficient sparsity is present, where "sufficient" is defined relative to the degree of mutual incoherence, solving (P0,delta) enables stable recovery. We suppose that we have a possibly overcomplete system with mutual coherence M. Suppose that we have a noisy signal y = y0 + z and that the ideal noiseless signal y0 = Phi alpha0 has a sparse representation alpha0 with at most N nonzeros. We have that the noise is bounded, ||z||_2 <= epsilon. Then if N < (1 + 1/M)/2, it follows that the solution alpha_{0,delta} of (P0,delta), with delta >= epsilon, obeys

    ||alpha_{0,delta} - alpha0||_2^2 <= (epsilon + delta)^2 / (1 - M(2N - 1)),           (1.9)

with the stability coefficient (1 - M(2N - 1))^{-1}. The proportionality constant (which we also call the stability coefficient) can be quite moderate given sufficient sparsity. In words, provided the underlying object is sparsely represented and the noise level is known, recovery by explicitly imposing sparsity yields an approximation to the ideal sparse decomposition of the noiseless signal in which the error is at worst proportional to the input noise level.

Next, we develop a parallel result for l1 minimization. Making parallel assumptions, tightened so that the ideal noiseless signal y0 = Phi alpha0 has a sparse representation alpha0 with N < (1 + 1/M)/4, we show that the solution alpha_{1,delta} of (P1,delta), with delta >= epsilon, obeys

    ||alpha_{1,delta} - alpha0||_2^2 <= (epsilon + delta)^2 / (1 - M(4N - 1)),           (1.10)

where the stability coefficient is now (1 - M(4N - 1))^{-1}. In words, l1-based reconstruction in incoherent overcomplete systems has an error which is at worst proportional to the input noise level. The sparsity requirement is twice as stringent for the l1-based result as for the l0-based result.

By comparison, OGA obeys a local stability result. Again suppose a possibly overcomplete system with mutual coherence M, and an ideal noiseless signal y0 = Phi alpha0 having a representation with at most N atoms. Suppose that the smallest among the N nonzeros in the representation of y0 has amplitude |alpha_min|. Assume that we know the noise level epsilon and run the OGA just until the representation error falls below epsilon. Call the result of this greedy algorithm alpha_OGA. Set beta = epsilon / |alpha_min|. Then if the noise is sufficiently weak,

    N < (1 + 1/M)/2 - beta/M,                                                            (1.11)

the recovered representation alpha_OGA obeys

    ||alpha_OGA - alpha0||_2^2 <= epsilon^2 / (1 - M(N - 1)).                            (1.12)

This is a local stability result because for large values of epsilon the condition (1.11) will necessarily fail.

Note the parallel nature of the bounds and the conclusions. Three quite different algorithms all obey stability results in which having N a fraction of 1/M is the key assumption.
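To give a feeling for the size of these constants, the following small computation evaluates the stability coefficients appearing in (1.9) and (1.10), as stated above, for a coherence of size 1/sqrt(n); the numerical values of n and N are our own choices.

import numpy as np

def stability_coefficients(M, N):
    """Constants multiplying (eps + delta)^2 in the l0 and l1 bounds above,
    meaningful only when the corresponding denominators are positive."""
    c0 = 1.0 / (1.0 - M * (2 * N - 1))   # (P0,delta) bound, needs N < (1 + 1/M)/2
    c1 = 1.0 / (1.0 - M * (4 * N - 1))   # (P1,delta) bound, needs N < (1 + 1/M)/4
    return c0, c1

n = 1024
M = 1.0 / np.sqrt(n)                     # coherence achievable by known constructions
for N in (2, 4, 8):
    print(N, stability_coefficients(M, N))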
E. Support Properties

A different fundamental question about efforts to obtain sparse representations is: do we actually recover the correct sparsity pattern? Our stability results do not address this question, since it is possible for a nonsparse representation to be close to a sparse representation in an l2 sense. The question is fundamental and broadly significant. Throughout science and technology, it is habitual to fit sparse models to noisy data, and then simply assume that terms appearing in such fitted models are dependable features.

In this paper, we are able to shed some light on this situation. Our results show that, under appropriate conditions, the empirical representation alpha_hat is not only at least as sparse as the ideal sparse representation, but it only contains atoms also appearing in the ideal sparse representation. Since that ideal sparse representation is, by our other results, unique and well defined, these insights endow the empirical support of alpha_hat with a, perhaps surprising, significance.

Our first result concerns solution of (P1,delta) with delta = C epsilon. Here C >= 1, and so we are solving the minimization problem using an exaggeration of the noise level. It shows, with an appropriate choice of C = C(N, M) and a sparsity condition of the same kind as before, that the solution alpha_{1,delta} has its support contained in the support of alpha0. Here C grows roughly like sqrt(N) for high values of N, so ordinarily it requires considerable overstatement of the noise level to achieve this level of conservatism. However, it does provide the very interesting epistemological benefit that the atoms appearing in the representation have more than incidental meaning.

Our second result is obtained in the course of analyzing OGA; it shows that, under condition (1.11) and N < (1 + 1/M)/2, the ideal noiseless representation is unique, and the support of alpha_OGA is contained in the support of alpha0.
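Numerically, the support question amounts to comparing index sets of nonzero entries, with a small tolerance for the estimated representation. A minimal sketch (ours):

import numpy as np

def support(alpha, tol=1e-8):
    """Indices of entries whose amplitude exceeds tol."""
    return set(np.flatnonzero(np.abs(alpha) > tol))

def support_recovered(alpha_hat, alpha0, tol=1e-8):
    """True if every atom used by the estimate also appears in the
    ideal sparse representation (the containment discussed above)."""
    return support(alpha_hat, tol) <= support(alpha0)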
F. Contents

The next sections supply the analysis behind the stability bounds just quoted, and a discussion of support properties. The final section extends this work in various directions.

• Numerical Results. We study the actual stability and support recovery behavior of the l1-based methods and OGA on synthetic examples, finding typical behavior far more favorable than our theoretical bounds.

• Superresolution. We situate our work with respect to the problem of superresolution, in which astronomers, seismologists, spectroscopists, and others attempt to "deblur" sparse spike trains.

• Geometry. We develop a geometric viewpoint explaining why stability can sometimes be expected for the l1 penalization scheme, under conditions of sufficient sparsity.

We have recently learned that, in parallel to our efforts, there are two similar contributions handling stability and support recovery for the basis pursuit. J. A. Tropp has been working independently on some of the same problems [40], and so has J. J. Fuchs [16]. After some recent discussions with these authors and a careful study of these works, we find that their methods and results have a rather different flavor, ensuring that the three separate works are of interest in studying sparse approximation under noise.

II. STABILITY USING (P0,delta)

Suppose again the existence of an ideal noiseless signal y0 = Phi alpha0 and noisy observations y = y0 + z with ||z||_2 <= epsilon. Consider applying (P0,delta) with delta >= epsilon to obtain a sparse approximation to y. The following establishes the stability estimate mentioned in the introduction.

Theorem 2.1: Let the dictionary Phi have mutual coherence M = M(Phi). Suppose the noiseless signal y0 = Phi alpha0, where alpha0 satisfies

    N = ||alpha0||_0 < (1 + 1/M)/2.                                                      (2.1)

Then
a) alpha0 is the unique sparsest such representation of y0; and
b) the reconstruction alpha_{0,delta} from applying (P0,delta), delta >= epsilon, to the noisy data y approximates alpha0:

    ||alpha_{0,delta} - alpha0||_2^2 <= (epsilon + delta)^2 / (1 - M(2N - 1)).           (2.2)

Claim a) actually follows from known results; e.g., see [13], [14] for the two-ortho case, and [11], [19] for general dictionaries. Claim b) requires the following.

Lemma 2.2: Let M(N - 1) < 1, and let Phi_N denote any n-by-N submatrix formed by concatenating N columns from Phi. Every such Phi_N has its Nth singular value bounded below by (1 - M(N - 1))^{1/2}.

Proof: This is equivalent to the claim that

    ||Phi_N v||_2^2 >= (1 - M(N - 1)) ||v||_2^2,                                         (2.3)

where Phi_N is the concatenation of N columns from Phi and v is any vector in R^N. Assume without loss of generality that these are the first N columns. Let G_N = Phi_N^T Phi_N be the corresponding Gram matrix, and write

    ||Phi_N v||_2^2 = v^T G_N v = ||v||_2^2 + sum_{i != j} G_N(i, j) v_i v_j.            (2.4)

Now

    |sum_{i != j} G_N(i, j) v_i v_j| <= M sum_{i != j} |v_i| |v_j| = M (||v||_1^2 - ||v||_2^2) <= M(N - 1) ||v||_2^2.

Using this inequality in the identity (2.4) gives (2.3).

Turning to claim b): apply this to the difference alpha_{0,delta} - alpha0, with 2N columns in place of N, and we have

    ||Phi (alpha_{0,delta} - alpha0)||_2^2 >= (1 - M(2N - 1)) ||alpha_{0,delta} - alpha0||_2^2.

Also, note that, since Phi alpha_{0,delta} is a linear combination of at most N columns of Phi (by (1.6)), the difference alpha_{0,delta} - alpha0 can be written as a linear combination of at most 2N columns from Phi, so Lemma 2.2, or, more properly, the inequality (2.3), indeed applies. Finally, both alpha0 and alpha_{0,delta} are feasible, so ||y - Phi alpha0||_2 <= epsilon and ||y - Phi alpha_{0,delta}||_2 <= delta, whence ||Phi (alpha_{0,delta} - alpha0)||_2 <= epsilon + delta. Combining the two displays gives the result (2.2).
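Lemma 2.2 is easy to probe numerically: draw N-column submatrices of a dictionary, compute their smallest singular value, and compare with the bound (1 - M(N - 1))^{1/2}. The sketch below (our own check; the random dictionary and the parameter values are ours) does exactly this.

import numpy as np

rng = np.random.default_rng(0)
n, m, N = 256, 512, 4

Phi = rng.standard_normal((n, m))
Phi /= np.linalg.norm(Phi, axis=0)                 # unit-norm atoms
G = np.abs(Phi.T @ Phi)
np.fill_diagonal(G, 0.0)
M = G.max()                                        # mutual coherence

bound = np.sqrt(max(1.0 - M * (N - 1), 0.0))       # Lemma 2.2 lower bound
smallest = min(
    np.linalg.svd(Phi[:, rng.choice(m, N, replace=False)], compute_uv=False)[-1]
    for _ in range(1000)
)
print(f"coherence {M:.3f}, bound {bound:.3f}, smallest observed sigma_N {smallest:.3f}")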
III. STABILITY USING (P1,delta)

As in the Introduction, we are given a signal y = Phi alpha0 + z, where z is an additive noise, known to satisfy ||z||_2 <= epsilon. We apply (P1,delta) to this signal (necessarily with delta >= epsilon); i.e., we solve (1.7) and obtain a solution alpha_{1,delta}. We study its deviation from the ideal representation alpha0.
A. Stability Result

Theorem 3.1: Let the overcomplete system Phi have mutual coherence M = M(Phi). If some representation of the noiseless signal, y0 = Phi alpha0, satisfies

    N = ||alpha0||_0 < (1 + 1/M)/4,                                                      (3.1)

then the deviation of the (P1,delta) representation from alpha0, assuming delta >= epsilon, can be bounded by

    ||alpha_{1,delta} - alpha0||_2^2 <= (epsilon + delta)^2 / (1 - M(4N - 1)).           (3.2)

Proof: The stability bound can be posed as the solution to an optimization problem of the form

    sup { ||alpha_{1,delta}(Phi alpha0 + z) - alpha0||_2^2 :  ||alpha0||_0 <= N,  ||z||_2 <= epsilon }.   (3.3)

Put in words, we consider all representation vectors alpha0 of bounded support, and all possible realizations of bounded noise, and we ask for the largest error between the ideal sparse decomposition and its reconstruction from noisy data. Defining the error vector d = alpha_{1,delta} - alpha0, we can rewrite the above problem as

    max ||d||_2^2   subject to   alpha0 + d solves (P1,delta) for y = Phi alpha0 + z,  ||z||_2 <= epsilon,  ||alpha0||_0 <= N.   (3.4)

We develop an upper bound on (3.4) in a sequence of relaxations, each one expanding the feasible set and increasing the maximal value. To begin, note that if alpha0 + d is the minimizer of the l1 norm under the constraint ||y - Phi alpha||_2 <= delta, then it satisfies ||alpha0 + d||_1 <= ||alpha0||_1 and ||y - Phi(alpha0 + d)||_2 <= delta, so relaxing "alpha0 + d is the minimizer" to these two properties expands the feasible set; however, this is true only if delta >= epsilon, since otherwise alpha0 is not a feasible solution. Thus, we consider

    max ||d||_2^2   subject to   ||alpha0 + d||_1 <= ||alpha0||_1,  ||y - Phi(alpha0 + d)||_2 <= delta,  ||z||_2 <= epsilon,  ||alpha0||_0 <= N.   (3.5)

We now expand this set by exploiting the relation

    ||alpha0 + d||_1 = ||alpha0 + d_S||_1 + ||d_{S^c}||_1 >= ||alpha0||_1 - ||d_S||_1 + ||d_{S^c}||_1,   (3.6)

where S is the support of the nonzeros in alpha0, with complement S^c, d_S denotes the restriction of d to S, and we used the triangle inequality. Therefore, the constraint ||alpha0 + d||_1 <= ||alpha0||_1 implies ||d_{S^c}||_1 <= ||d_S||_1, and we get a further increase in value by replacing the feasible set in (3.5) accordingly. Writing this out yields a new optimization problem with still larger value:

    max ||d||_2^2   subject to   ||d_{S^c}||_1 <= ||d_S||_1,  ||y - Phi(alpha0 + d)||_2 <= delta,  ||z||_2 <= epsilon,  |S| <= N.   (3.7)

We next simplify our analysis by eliminating the noise vector z, using

    ||y - Phi(alpha0 + d)||_2 = ||z - Phi d||_2 >= ||Phi d||_2 - ||z||_2 >= ||Phi d||_2 - epsilon.   (3.8)

Expanding the feasible set of (3.7) using this observation gives

    max ||d||_2^2   subject to   ||d_{S^c}||_1 <= ||d_S||_1,  ||Phi d||_2 <= eta,  |S| <= N,   (3.9)

where we introduced eta = epsilon + delta.

The constraint ||Phi d||_2 <= eta is not posed in terms of the absolute values in the vector d, complicating the analysis; we now relax this constraint using the incoherence of Phi. Again, the Gram matrix is G = Phi^T Phi, and the mutual coherence is the maximal off-diagonal amplitude, M = max_{i != j} |G(i, j)|. For a vector d, let |d| be the vector containing the absolute values of the entries of d; similarly for matrices. Also, let 1 denote the m-by-m matrix of all ones. The constraint can be relaxed, since

    ||Phi d||_2^2 = d^T G d >= ||d||_2^2 - |d|^T (|G| - I) |d| >= ||d||_2^2 - M |d|^T (1 - I) |d| = (1 + M) ||d||_2^2 - M ||d||_1^2.   (3.10)

Using this, (3.9) is bounded above by the value of

    max ||d||_2^2   subject to   (1 + M) ||d||_2^2 - M ||d||_1^2 <= eta^2,  ||d_{S^c}||_1 <= ||d_S||_1,  |S| <= N.   (3.11)

This problem is invariant under permutations of the entries in d which preserve membership in S and S^c; it is also invariant under relabeling of coordinates. So assume that all nonzeros in alpha0 are concentrated in the initial slots of the vector, i.e., that S = {1, ..., N}. Putting x for the first N entries of |d| and u for the remaining m - N entries, we have ||d||_2^2 = ||x||_2^2 + ||u||_2^2 and ||d||_1 = ||x||_1 + ||u||_1, with ||u||_1 <= ||x||_1. The l1 norm on R^N dominates the l2 norm and is dominated by sqrt(N) times the l2 norm; thus,

    ||x||_2 <= ||x||_1 <= sqrt(N) ||x||_2.                                               (3.12)

Hence

    ||d||_1 = ||x||_1 + ||u||_1 <= 2 ||x||_1 <= 2 sqrt(N) ||x||_2 <= 2 sqrt(N) ||d||_2.   (3.13)

Returning to the problem given in (3.11), this reduces the optimization on R^m to a statement about the single scalar ||d||_2^2: every feasible d obeys

    (1 + M) ||d||_2^2 - 4 M N ||d||_2^2 <= (1 + M) ||d||_2^2 - M ||d||_1^2 <= eta^2,

that is,

    (1 - M(4N - 1)) ||d||_2^2 <= eta^2.

The sparsity requirement (3.1) guarantees that 1 - M(4N - 1) > 0, being exactly the condition on the free parameters N and M under which the left-hand factor is positive. Hence,

    ||d||_2^2 <= eta^2 / (1 - M(4N - 1)) = (epsilon + delta)^2 / (1 - M(4N - 1)),

as stated by (3.2).
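The support-splitting step above, namely that the constraint ||alpha0 + d||_1 <= ||alpha0||_1 forces ||d_{S^c}||_1 <= ||d_S||_1, is also easy to examine numerically. The sketch below (ours) does so on a synthetic example; it uses the Lagrangian iteration bpdn_ista from Section I-C as a convenient stand-in for a constrained (P1,delta) solver, so the inequality is expected in practice rather than guaranteed by the argument above. The dictionary, sparsity, and noise level are our own choices.

import numpy as np

rng = np.random.default_rng(1)
n, m, N, eps = 64, 128, 4, 0.05

Phi = rng.standard_normal((n, m))
Phi /= np.linalg.norm(Phi, axis=0)
alpha0 = np.zeros(m)
S = rng.choice(m, N, replace=False)
alpha0[S] = rng.choice([-1.0, 1.0], N) * rng.uniform(1.0, 2.0, N)
z = rng.standard_normal(n)
z *= eps / np.linalg.norm(z)
y = Phi @ alpha0 + z

alpha_hat = bpdn_ista(Phi, y, lam=0.05, n_iter=3000)   # sketch from Section I-C
d = alpha_hat - alpha0
off = np.setdiff1d(np.arange(m), S)
print(np.abs(d[S]).sum(), np.abs(d[off]).sum())        # on-support vs off-support l1 mass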
B. Interpretation and Comments

Theorem 3.1 prompts several remarks.

• Setting delta = epsilon = 0 puts us in the noiseless case y = Phi alpha0. In that setting, Theorem 3.1 tells us that if N < (1 + 1/M)/4, there will be zero error in finding the unique sparsest representation, i.e., solving the l1 optimization problem (P1) solves the l0 problem (P0). As the l1 problem is convex and the l0 problem combinatorial in general, this is by itself significant. The same general phenomenon described has been observed before in [13], [14], [11], [19]. The sharpest results, in [11], [19], established that this phenomenon occurs for any sparsity smaller than (1 + 1/M)/2, which means that the new result is slack by a factor of about two in the noiseless case. Perhaps a tighter inequality could be achieved with more care.

• If the signal is not noisy (i.e., epsilon = 0) but nevertheless (P1,delta) is employed with delta > 0, an approximate solution is assured, with a bound on the deviation of the approximate representation from the ideal noiseless representation. So we tolerate errors in the decomposition in "needlessly" going from (P1) to (P1,delta), but the errors are controlled.

• In practice, if the signal is noisy (i.e., epsilon > 0) and we set delta = 0 as if there were no noise, some degree of stability is still obtained! However, our results do not cover this case, and further work is required to establish this kind of stability.
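The first remark is simple to illustrate. The sketch below (ours) builds the incoherent two-ortho dictionary [I, H/sqrt(n)] from an identity and a normalized Hadamard matrix, for which M = 1/sqrt(n), draws a representation sparse enough for the uniqueness threshold quoted in Section I-B, and verifies that the noiseless l1 program recovers it; it assumes the basis_pursuit sketch of Section I-B is in scope.

import numpy as np
from scipy.linalg import hadamard

n = 64
Phi = np.hstack([np.eye(n), hadamard(n) / np.sqrt(n)])   # coherence 1/sqrt(n) = 1/8

rng = np.random.default_rng(2)
alpha0 = np.zeros(2 * n)
S = rng.choice(2 * n, 4, replace=False)                  # 4 atoms < (1 + 8)/2 = 4.5
alpha0[S] = rng.choice([-1.0, 1.0], 4) * rng.uniform(1.0, 2.0, 4)

alpha_hat = basis_pursuit(Phi, Phi @ alpha0)             # sketch from Section I-B
print(np.max(np.abs(alpha_hat - alpha0)))                # essentially zero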
IV. SUPPORT RECOVERY WITH (P1,delta)

So far, we have concentrated on the recovery of alpha0 in norm. We now consider whether we can correctly recover the support of alpha0. Our approach applies (P1,delta) with a specially chosen, exaggerated noise allowance delta.

Theorem 4.1: Suppose that y = Phi alpha0 + z with ||z||_2 <= epsilon, that alpha0 has N = ||alpha0||_0 nonzeros, and that N M <= 1/2. Set

    delta = C epsilon,                                                                   (4.1)

where the exaggeration factor C = C(N, M) >= 1 depends only on N and M and grows in proportion to sqrt(N), and solve (P1,delta) with the exaggerated noise level delta. Then the support of alpha_{1,delta} is contained in the support of alpha0.

As an example, exaggerating the noise level by a factor on the order of sqrt(N) leads to partial support recovery. This pronounced inflation of the noise level might cause (in an extreme case) a zero solution. Still, from the following proof it seems that the sqrt(N) dependence is intrinsic to the problem.

Proof: For later use, we let lambda_N denote the smallest (Nth) singular value of any submatrix Phi_S of Phi containing N columns from Phi. By our assumptions and Lemma 2.2,

    lambda_N >= (1 - M(N - 1))^{1/2} > 2^{-1/2}.                                         (4.2)

We need notation for the convex functional J(alpha) = ||alpha||_1 and for the quadratic functional Q(alpha) = ||y - Phi alpha||_2^2. Now let S be the support of the ideal noiseless representation alpha0, and consider the support-constrained optimization problem, in which feasible vectors must be supported in S. Let alpha* be a solution of this problem. We claim that, in fact, alpha* is actually the solution of the support-unconstrained problem (P1,delta), i.e., alpha_{1,delta} = alpha*. Observe, first of all, that unless the constraint Q(alpha*) <= delta^2 is active, we have alpha* = 0, zero is then feasible for the unconstrained problem as well, and there is nothing to prove. Now consider perturbations of alpha*, i.e., representations of the form alpha* + h w, for h > 0 small and w an arbitrary direction. We will show that a perturbation which does not increase the l1 objective definitely violates the constraint. Formally,

    J(alpha* + h w) <= J(alpha*)   for all sufficiently small h > 0                      (4.3)

implies

    Q(alpha* + h w) > delta^2   for all sufficiently small h > 0.                        (4.4)

It follows that alpha* is locally optimal for (P1,delta). By convexity, this local condition implies global optimality; and global optimality implies that the solution has support contained in S, as claimed.

We now note that, for h sufficiently small and positive,

    J(alpha* + h w) = J(alpha*) + h ( sum_{i in S} sgn(alpha*(i)) w(i) + sum_{i not in S} |w(i)| ),   (4.5)

where we used the identity |a + b| = |a| + sgn(a) b, valid for |b| <= |a|, while

    Q(alpha* + h w) = Q(alpha*) - 2h <y - Phi alpha*, Phi w> + h^2 ||Phi w||_2^2.        (4.6)

Let L_J(w) and L_Q(w) denote the terms of order h in (4.5) and (4.6); we plan to show that L_J(w) <= 0 forces L_Q(w) > 0, i.e., that the misfit strictly increases to first order; this will show that (4.3) implies (4.4).

We first work out the consequence of alpha* solving the support-constrained problem. There is a nonempty subspace of vectors supported in S, and (4.5) and (4.6) show that J and Q are both differentiable at alpha* along such directions. The fact that alpha* solves the constrained optimization problem therefore implies (by classical constrained optimization/Lagrange multiplier ideas) that, for some mu >= 0 and for all vectors w supported in S,

    <sgn(alpha*), w>_S = 2 mu <y - Phi alpha*, Phi w>.                                   (4.7)

Here we introduce the notation <.,.> for the inner product on R^m, <.,.>_S for the inner product restricted to coordinates in S, and <.,.>_{S^c} for the inner product restricted to coordinates in S^c; similarly ||.||_{2,S} denotes the l2 norm restricted to S, and so on. To illustrate the notation: if w is a vector supported in S, then <v, w> = <v, w>_S.

We now show that alpha* is the global optimum of (P1,delta), i.e., of the original problem without the support constraint. Write a general direction as w = w_S + w_{S^c}. For the part supported in S, (4.7) expresses the order-h change of J through the correlation of the residual y - Phi alpha* with Phi w_S; for the part supported in S^c, the order-h change of J is ||w||_{1,S^c} >= 0, while its contribution to L_Q(w) involves the correlations of the residual with off-support atoms, controlled through the entries of the Gram matrix G (of amplitude at most M off the diagonal) and the submatrix Phi_S of Phi with columns from S only. Combining the two contributions, the implication "L_J(w) <= 0 forces L_Q(w) > 0" reduces to the requirement that the residual norm ||y - Phi alpha*||_2 = delta be large enough compared with epsilon; this is precisely a condition of the form delta >= C(N, M) epsilon, and our definition (4.1) of delta makes it hold, indeed with equality. The factor grows like sqrt(N) because the l1 and l2 norms on the N-dimensional coordinate subspace S differ by a factor of up to sqrt(N).

It only remains to control the correlation of the residual with the off-support atoms in terms of epsilon. Write z = z1 + z2, where z1 is the component of z orthogonal to the span of the columns of Phi_S, with norm at most epsilon, while z2 is the component of z in that span. Writing z2 = Phi_S c, the definition (4.2) of lambda_N yields

    ||c||_2 <= ||z2||_2 / lambda_N <= epsilon / lambda_N,                                (4.8)

and since ||y - Phi alpha*||_2 <= delta, the bound (4.8) gives the required control, completing the argument.

V. LOCAL STABILITY OF THE OGA
Observe that both (P0,delta) and (P1,delta) refer to global optimization problems, while the OGA described in the Introduction is based on greedy stagewise approximation. Paralleling this distinction, the stability result we now develop for OGA is a local one, valid only for sufficiently small epsilon, with the allowed noise level depending on the representation coefficients.

For ease of exposition, we shall hereafter assume that the order of the columns in the overcomplete system matrix Phi has been chosen so that, in the ideal noiseless signal y0 = Phi alpha0, the first N entries in alpha0 are the nonzero entries, and that these are ordered:

    |alpha0(1)| >= |alpha0(2)| >= ... >= |alpha0(N)| > 0.                                (5.1)
Theorem 5.1: Suppose the ideal noiseless signal y0 = Phi alpha0 has a representation satisfying(1)

    N < (1 + 1/M)/2 - (1/M) epsilon/|alpha0(N)|.                                         (5.2)

Denote by alpha_OGA the result of greedy stepwise least-squares fitting applied on the noisy signal y, which stops as soon as the representation error ||y - Phi alpha_OGA||_2 <= epsilon. Then
a) alpha_OGA has the correct sparsity pattern:

    supp(alpha_OGA) is contained in supp(alpha0);                                        (5.3)

b) alpha_OGA approximates the ideal noiseless representation:

    ||alpha_OGA - alpha0||_2^2 <= epsilon^2 / (1 - M(N - 1)).                            (5.4)

(1) This inequality is equivalent to the one posed in (1.11).

Note that the argument assumes the noise level epsilon is known, to enable the stopping rule in the algorithm. This parallels the assumption delta = epsilon, which we relaxed in Theorems 2.1 and 3.1 by using delta >= epsilon.

The general idea, that the support properties of alpha_OGA and alpha0 are the same, seems worthy of its own study. In the Technical Report [12] on which this paper is based, we call this the trapping property, and develop it further.

We break our analysis in two stages, considering claims (5.3) and (5.4) in turn.

A. Getting the "Correct" Support

Lemma 5.2: Suppose that we have a signal y_s = Phi gamma + z_s, where ||z_s||_2 <= epsilon and Phi gamma admits sparse synthesis using at most k atoms, where the largest amplitude in gamma satisfies

    max_i |gamma_i| (1 - M(2k - 1)) > 2 epsilon,                                         (5.5)

and where M(2k - 1) < 1. Then the first step of the greedy algorithm selects an atom index from among the k nonzeros in gamma.

Proof: The greedy algorithm operates by projecting onto each atom in turn, selecting an atom index where the projection magnitude is largest. The lemma will follow from

    max_{i in supp(gamma)} |<y_s, phi_i>|  >  max_{i not in supp(gamma)} |<y_s, phi_i>|.   (5.6)

We now develop a lower bound on the left-hand side and an upper bound on the right-hand side which guarantees this. Assuming that the largest amplitude entry in gamma occurs in slot 1, the left-hand side of (5.6) is bounded below by

    |<y_s, phi_1>| >= |gamma_1| - M(k - 1)|gamma_1| - epsilon.                           (5.7)

We used ||phi_1||_2 = 1, |<phi_j, phi_1>| <= M for j != 1, and the ordering of the amplitudes in gamma. The right-hand side of (5.6) can be bounded above by the same approach, leading to, for i outside the support,

    |<y_s, phi_i>| <= M k |gamma_1| + epsilon.                                           (5.8)

Imposing (5.5) and using the two bounds (5.7) and (5.8), we see that

    |gamma_1|(1 - M(k - 1)) - epsilon > M k |gamma_1| + epsilon.                         (5.9)

Relation (5.6) follows; the greedy algorithm therefore chooses at Stage 1 one of the nonzeros in the ideal representation of the noiseless signal.

To continue to later stages, we need the following.

Lemma 5.3: Let y_s = Phi gamma + z_s with ||z_s||_2 <= epsilon, where gamma has at most N < (1 + 1/M)/2 nonzeros, located at the indices in S = supp(gamma). Let S_k be a set of k indices in S, k being at most the number of columns in Phi_S. Let beta be a vector of coefficients with nonzeros located at the indices in S_k. Define a new signal by subtracting the atoms with nonzero coefficients in beta,

    y_s' = y_s - Phi beta,

and similarly define the new noiseless signal

    y0' = Phi gamma - Phi beta.

Then
• y0' has a unique sparsest representation gamma' = gamma - beta, made of at most N atoms; these are all atoms originally appearing in the representation of y0;
• the new signal y_s' can be viewed as a superposition of Phi gamma' and noise z_s, with noise level ||z_s||_2 <= epsilon.

Proof: Define the vector gamma' = gamma - beta. Then clearly y_s' = Phi gamma' + z_s. Also, supp(gamma') is contained in S, so ||gamma'||_0 <= N. Since N < (1 + 1/M)/2, we conclude that gamma' is the unique sparsest representation of y0' = Phi gamma'. Moreover, y_s' - Phi gamma' = z_s, whose norm is at most epsilon. Hence we have established the two claims.

The impact of the preceding two lemmas is that selection of a term, followed by the formation of the residual signal, leads us to a situation like before, where the ideal noiseless signal has no more atoms than before, and the noise level is the same.
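The two-stage analysis can be exercised end to end on a small synthetic example. The sketch below (ours) reuses the oga sketch of Section I-B with the noise-aware stopping rule, on the same incoherent two-ortho dictionary as before, and checks the two claims of Theorem 5.1: containment of the recovered support in the ideal one, and an error of the order of the noise level. The dictionary, sparsity, and noise level are our own choices, selected so that condition (5.2) holds.

import numpy as np
from scipy.linalg import hadamard

n = 64
Phi = np.hstack([np.eye(n), hadamard(n) / np.sqrt(n)])       # coherence 1/sqrt(n)

rng = np.random.default_rng(3)
alpha0 = np.zeros(2 * n)
S = rng.choice(2 * n, 3, replace=False)
alpha0[S] = rng.choice([-1.0, 1.0], 3) * rng.uniform(1.0, 2.0, 3)   # |alpha_min| >= 1

eps = 0.05
z = rng.standard_normal(n)
z *= eps / np.linalg.norm(z)
y = Phi @ alpha0 + z

alpha_hat, support_hat = oga(Phi, y, delta=eps)              # sketch from Section I-B
print(set(support_hat) <= set(S.tolist()))                   # support containment
print(np.linalg.norm(alpha_hat - alpha0))                    # error of order eps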
We wish to repeatedly apply these lemmas. Starting with y and its ideal representation, we will get a residual y^1 and a selected index set S^1; we then hope to apply the observations again, getting y^2 and S^2, etc. If we are allowed to continue invoking these lemmas for N steps, we produce in this way the series y^k and S^k, together with the corresponding coefficient vectors. Naturally, the sets S^k are nested. Note, however, that a series of conditions must be satisfied for the repeated use