`
IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. IT-16, NO. 6, NOVEMBER 1970
`
`Convolutional Codes I: Algebraic Structure
`
G. DAVID FORNEY, JR., MEMBER, IEEE
`
Abstract—A convolutional encoder is defined as any constant linear sequential circuit. The associated code is the set of all output sequences resulting from any set of input sequences beginning at any time. Encoders are called equivalent if they generate the same code. The invariant factor theorem is used to determine when a convolutional encoder has a feedback-free inverse, and the minimum delay of any inverse. All encoders are shown to be equivalent to minimal encoders, which are feedback-free encoders with feedback-free delay-free inverses, and which can be realized in the conventional manner with as few memory elements as any equivalent encoder. Minimal encoders are shown to be immune to catastrophic error propagation and, in fact, to lead in a certain sense to the shortest decoded error sequences possible per error event. In two appendices, we introduce dual codes and syndromes, and show that a minimal encoder for a dual code has exactly the complexity of the original encoder; we show that systematic encoders with feedback form a canonical class, and compare this class to the minimal class.
`
`I. INTRODUCTION
`
BLOCK CODES were the earliest type of codes to be investigated, and remain the subject of the overwhelming bulk of the coding literature. On
`the other hand, convolutional codes have proved to be
`equal or superior to block codes in performance in nearly
`every type of practical application, and are generally
`simpler than comparable block codes in implementation.
This anomaly is due largely to the difficulty of analyzing convolutional codes, as compared to block codes. It is the intent of this series to stimulate increased theoretical interest in convolutional codes by review and clarification of known results and introduction of new ones. We hope, first, to advance the understanding of convolutional codes and tools useful in their analysis; second, to motivate further work by showing that in every case in which block codes and convolutional codes can be directly compared theoretically, the convolutional are as good or better.
`Two converging lines of development have generated
`interest in an algebraic approach to convolutional codes.
On the one hand, the success of algebraic methods in generating classes of good block codes suggests that constructive methods of generating good convolutional codes might be developed through use of algebraic structures. Correspondingly, one might expect that powerful decoding techniques based on such structures might be discovered. (Fortunately, good codes and decoding methods not relying on such constructions are already known.) On the other hand, the usefulness of regarding
`convolutional encoders as linear sequential circuits has
begun to become evident, as in the observation of Omura [1] and others that the Viterbi [2] maximum-likelihood decoding algorithm is the dynamic programming solution to a certain control problem, and in the observation of Massey and his colleagues [3]-[8] that certain questions concerning error propagation are related to questions concerning the invertibility of linear systems. As the theory of finite-dimensional linear systems is seen increasingly as essentially algebraic, we have another motive for examining convolutional encoders in an algebraic context.
Our result is a series of structure theorems that dissect the structure of convolutional codes rather completely, mainly through use of the invariant factor theorem. We arrive at a class of canonical encoders capable of generating any convolutional code, and endowed with all the desirable properties one might wish, except that in general they are not systematic. (The alternate canonical class of systematic encoders with feedback is discussed in Appendix II.) The results do not seem to suggest any constructive methods of generating good codes, and say little new in particular about the important class of rate-1/n codes, except for putting known results in a more general context. It appears that the results obtained here for convolutional codes correspond to block-code results ([9], ch. 3).
`
`II. PROBLEM FORMULATION
`
We are purposefully going to take a rather long time getting to our main results. Most of this time will be spent in definitions and statements of fundamental results in convolutional coding theory, linear system theory, and algebra. It is a truism that when dealing with fundamentals, once the problem is stated correctly, the results are easy. We feel it is important that the right formulation of the problem (like justice) not only be done, but be seen to be done, in the eyes of readers who may have backgrounds in any of the three areas noted.
After exhibiting a simple convolutional encoder for motivation, we move to a general definition of convolutional encoders, which we see amount to general finite-state time-invariant linear circuits. We discuss the decoding problem, which leads to a definition of convolutional encoder equivalence and to certain desirable code properties. Along the way certain algebraic artifacts will intrude into the discussion; in the final introductory section we collect the algebra we need, which is centered on the invariant factor theorem.
`
`Convolutional Encoder
`
Manuscript received December 18, 1969. Part of this paper was presented at the Internatl. Information Theory Symposium, Ellenville, N. Y., January 27-31, 1969.
The author is with Codex Corporation, Newton, Mass. 02158.
`
Fig. 1(a) shows a simple binary systematic rate-1/2 convolutional encoder of constraint length 2. The input to this encoder is a binary sequence x = (..., x_{-1}, x_0, x_1, ...).
`
`
`
`
`
`{21
`
`
`
`
`
`(b)
`
`V2
`
(The delay operator D corresponds to z^{-1} in sampled-data theory, but is purely an indeterminate or place-holder, whereas z is a complex variable.) Now the input/output relationships are expressed concisely as

y1(D) = g1(D)x(D)

y2(D) = g2(D)x(D),

where the generator polynomials g1(D) and g2(D) are

g1(D) = 1

g2(D) = 1 + D + D^2,

and ordinary sequence multiplication with coefficient operations modulo 2 and collection of like powers of D is implied.
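As an illustration (our own, not from the paper), the convolution view can be checked numerically: the sketch below multiplies an input polynomial by g1(D) = 1 and g2(D) = 1 + D + D^2 over GF(2), and confirms that the result agrees with the shift-register recurrence of Fig. 1.

```python
# Sketch: the rate-1/2 encoder of Fig. 1 as polynomial multiplication over GF(2).
# Sequences are lists of bits; index i holds the coefficient of D^i (delay 0 assumed).

def poly_mul_gf2(a, b):
    """Multiply two GF(2) polynomials given as coefficient lists."""
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] ^= ai & bj
    return out

g1 = [1]           # g1(D) = 1
g2 = [1, 1, 1]     # g2(D) = 1 + D + D^2

x = [1, 0, 1, 1]   # example input x(D) = 1 + D^2 + D^3

y1 = poly_mul_gf2(x, g1)
y2 = poly_mul_gf2(x, g2)

# Same y2 via the shift-register recurrence y_{2i} = x_i + x_{i-1} + x_{i-2} (mod 2)
xe = x + [0, 0]    # let the register flush
y2_reg = [(xe[i] ^ (xe[i-1] if i >= 1 else 0) ^ (xe[i-2] if i >= 2 else 0))
          for i in range(len(xe))]

assert y2 == y2_reg
print(y1, y2)      # [1, 0, 1, 1] [1, 1, 0, 0, 0, 1]
```

The systematic output y1 simply reproduces x, while y2 is the convolution x * g2.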
Similarly, we can define a general (n, k) conventional convolutional encoder by a matrix of generator polynomials g_ij(D), 1 ≤ i ≤ k, 1 ≤ j ≤ n, with coefficients in some finite field F. There are k input sequences x_i(D) and n output sequences y_j(D), each a sequence of symbols from F, with input/output relations given by

y_j(D) = Σ_{i=1}^{k} x_i(D) g_ij(D),
`
Fig. 1. (a) A rate-1/2 systematic convolutional encoder. (b) Alternate representation.

again with all operations in F. If we define the constraint length for input i as

ν_i = max_{1≤j≤n} deg g_ij(D),
`
The outputs are two binary sequences y1 and y2 (hence the rate is 1/2). The first output sequence y1 is simply equal to the input x (hence the code is systematic). The elements y_{2i} of the second sequence y2 are given by

y_{2i} = x_i ⊕ x_{i-1} ⊕ x_{i-2},

where ⊕ denotes modulo 2 addition. Therefore, the encoder must save 2 past information bits, so we say the constraint length is 2.
Sometimes it is convenient to draw the encoder as in Fig. 1(b), with the output a function of the memory contents only. (This corresponds to the distinction in automata theory between Moore and Mealy machines.) Some authors would therefore say this code had constraint length 3, since the outputs are a function of a span
`of 3 input bits. Others measure constraint length in terms
`of output bits and would assign this code a constraint
`length of 6. Our definition of constraint length is chosen
`to coincide with the number of memory elements in a
`minimal realization.
The term "convolutional" comes from the observation that the output sequences can be regarded as a convolution of the input sequence with certain generator sequences. With the input and output sequences, we associate sequences in the delay operator D (D-transforms):

x(D) = ··· + x_{-1}D^{-1} + x_0 + x_1D + x_2D^2 + ···

y1(D) = ··· + y_{1,-1}D^{-1} + y_{10} + y_{11}D + y_{12}D^2 + ···

y2(D) = ··· + y_{2,-1}D^{-1} + y_{20} + y_{21}D + y_{22}D^2 + ···.
`
then the general conventional encoder can be realized by k shift registers, the ith of length ν_i, with the outputs formed as linear combinations in F on the appropriate shift register contents. We call this the obvious realization, and note that the number of memory elements required is equal to the overall constraint length, defined as the sum of constraint lengths

ν = Σ_{i=1}^{k} ν_i.
`
For notational convenience we shall generally suppress the parenthetical D in our subsequent references to sequences; thus x_i means x_i(D), y_i = y_i(D), and so forth, where the fact that a letter represents a sequence (transform) should be clear from the context.
`
Convolutional Encoder: General Definition
`
The encoders of the previous section are linear sequential circuits. We now consider all finite-state time-invariant linear sequential circuits as candidates for convolutional encoders.

Definition 1: An (n, k) convolutional encoder over a finite field F is a k-input n-output constant linear causal finite-state sequential circuit.

Let us dissect this definition.
`
k-input: There are k discrete-time input sequences x_i, each with elements from F. We write the inputs as the row vector x. Sequences must start at some finite time and may or may not end. (Then we can represent a sequence such as 1 + D + D^2 + ··· by a ratio of polynomials such as 1/(1 + D), without encountering the ambiguity 1/(1 + D) = D^{-1} + D^{-2} + ···.) If a sequence x_i "starts" at time d (if the first nonzero element is x_{id}), we say it has delay d, del x_i = d, and if it ends at time d', we say it has degree d', deg x_i = d', in analogy to the degree of a polynomial. Similarly, we define the delay and degree of a vector of sequences as the minimum delay or maximum degree of the component sequences: del x = min del x_i, deg x = max deg x_i. A finite sequence has both a beginning and an end. (Note that most authors consider only sequences starting at time 0 or later. It turns out that this assumption clutters up the analysis without compensating benefits.)
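For concreteness (our own illustration, not notation from the paper), a Laurent sequence can be held as a map from time index to nonzero coefficient, with del and deg exactly as defined above:

```python
# Sketch: delay and degree of Laurent sequences stored as {time index: nonzero coeff}.

def delay(seq):
    """del x: the first time index with a nonzero element."""
    return min(seq) if seq else None

def degree(seq):
    """deg x: the last time index with a nonzero element (finite sequences)."""
    return max(seq) if seq else None

# x = D^{-2} + 1 + D^3: starts at time -2, ends at time 3.
x = {-2: 1, 0: 1, 3: 1}
print(delay(x), degree(x))   # -2 3

# For a vector of sequences: minimum delay, maximum degree of the components.
v = [{-2: 1, 0: 1}, {1: 1, 4: 1}]
print(min(delay(s) for s in v), max(degree(s) for s in v))   # -2 4
```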
n-output: There are n output sequences y_j, each with elements from F, which we write as the row vector y. The encoder is characterized by the map G, which maps any vector of input sequences x into some output vector y, which we can write in the functional form

y = G(x).
`
Constant (Time-Invariant): If all input sequences are shifted in time, all output sequences are correspondingly shifted. In delay-operator notation,

G(D^n x) = D^n G(x),    n any integer.
`
`(Note that most probabilistic analyses of convolutional
`codes have been forced to assume nonconstancy to obtain
`ensembles of encoders with enough randomness to prove
`theorems.)
Linear: The output resulting from the superposition of two inputs is the superposition of the two outputs that would result from the inputs separately, and the output "scales" with the input. That is,

G(x1 + x2) = G(x1) + G(x2)

G(ax1) = aG(x1),    a ∈ F.

It is easy to see that constancy and linearity together give the broader linearity condition

G(ax1) = aG(x1),

where a is any sequence of elements of F in the delay operator D. Furthermore, they imply a transfer-function representation for the encoder. For let e_i, 1 ≤ i ≤ k, be the unit inputs in which the ith input at time 0 is 1, and all other inputs are 0. Let the generators g_i be defined as the corresponding outputs (impulse responses):

g_i = G(e_i),    1 ≤ i ≤ k.
`
Then since any input x can be written

x = Σ_{i=1}^{k} x_i e_i,

we have by linearity

y = G(x) = Σ_{i=1}^{k} x_i g_i.
`
Thus we can define a k × n transfer-function matrix G, whose rows are the generators g_i, such that the input/output relationship is

y = xG.

Therefore from this point we use the matrix notation y = xG in preference to the functional notation y = G(x). Finally, this definition implies that a zero input gives a zero output, so that the encoder may not have any transient response (nonzero starting state).
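The relation y = xG can be exercised directly on polynomial matrices; the sketch below (our own, over GF(2)) forms a codeword for a small rate-1/2 generator matrix.

```python
# Sketch: y = xG with polynomial entries over GF(2) (coefficient lists, index = power of D).

def pmul(a, b):
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] ^= ai & bj
    return out

def padd(a, b):
    n = max(len(a), len(b))
    return [(a[i] if i < len(a) else 0) ^ (b[i] if i < len(b) else 0) for i in range(n)]

def encode(x, G):
    """y_j = sum_i x_i g_ij over GF(2); x is a list of k polynomials, G is k x n."""
    k, n = len(G), len(G[0])
    y = []
    for j in range(n):
        acc = [0]
        for i in range(k):
            acc = padd(acc, pmul(x[i], G[i][j]))
        y.append(acc)
    return y

G = [[[1], [1, 1, 1]]]   # k = 1, n = 2: G = [1, 1 + D + D^2]
x = [[1, 1]]             # x(D) = 1 + D
print(encode(x, G))      # [[1, 1], [1, 0, 0, 1]], i.e. y = (1 + D, 1 + D^3)
```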
Causal: If the nonzero inputs start at time t, then the nonzero outputs start at time t' ≥ t. Since the unit inputs start at time 0, this implies that all generators start at time 0 or later. As sequences, all generators must therefore have all negative coefficients equal to 0, or del g_i ≥ 0. Conversely, the condition that all generators satisfy del g_i ≥ 0 implies

del y = del Σ_{i=1}^{k} x_i g_i

≥ min_{1≤i≤k} [del x_i + del g_i]

≥ del x,

causality.
Finite-state: The encoder shall have only a finite number of memory elements, each capable of assuming a finite number of values. The physical state of an encoder at any time is the contents of its memory elements; thus there are only a finite number of physical states. A more abstract definition of the state of an encoder at any time is the following: the state s of an encoder at time t is the sequence of outputs at time t and later if there are no nonzero inputs at time t or later. Clearly the number of states so defined for any fixed t is less than or equal to the number of physical states, since causality implies that each physical state gives some definite sequence, perhaps not unique. Thus an encoder with a finite number of physical states must have a finite number of abstract states as well.
`
By studying the abstract states, we develop further restrictions on the generators g_i. Let us examine the set of possible states at time 1, which we call the state space Σ. (By constancy the state spaces at all times are isomorphic.) Let P be the projection operator that truncates sequences to end at time 0, and Q the complementary projection operator 1 - P that truncates sequences to start at time 1:

xP = ··· + x_{-1}D^{-1} + x_0

xQ = x_1D + x_2D^2 + ···.

Then any input x is associated with a state at time 1 given by

s = xPGQ;

conversely, any state in Σ can be so expressed using any input giving that state. Now the state space Σ is seen to satisfy the conditions to be a vector space over the field F, for P, G, and Q are all linear over F; that is, if
`
`
`
`
s_1 = x_1PGQ

and

s_2 = x_2PGQ,

then the combinations

s_1 + s_2 = (x_1 + x_2)PGQ

αs_1 = (αx_1)PGQ,    α ∈ F,

are also states. As a vector space, Σ therefore has a finite dimension dim Σ, or else the number of states q^{dim Σ}, where q is the number of elements in the finite field, would be infinite.
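To make the abstract-state count concrete (our own sketch, using the generator of Fig. 1), one can enumerate the states s = xPGQ for g(D) = 1 + D + D^2 over GF(2): the output tail after time 0 depends only on x_0 and x_{-1}, so exactly 4 = 2^2 abstract states appear.

```python
from itertools import product

# Sketch: abstract states s = xPGQ for g(D) = 1 + D + D^2 over GF(2).
# Inputs supported on times -H..0; y_t = sum_d g_d x_{t-d} (mod 2);
# the state is the output restricted to times t >= 1.

g = {0: 1, 1: 1, 2: 1}          # coefficients g_d of 1 + D + D^2
H = 4                            # past horizon (more than enough for this g)

def state(past):                 # past = (x_{-H}, ..., x_0)
    x = {t: b for t, b in zip(range(-H, 1), past)}
    tail = []
    for t in range(1, 3):        # y_t = 0 for t > deg g, since inputs stop at time 0
        tail.append(sum(g[d] * x.get(t - d, 0) for d in g) % 2)
    return tuple(tail)

states = {state(p) for p in product([0, 1], repeat=H + 1)}
print(len(states))               # 4
```

The count matches dim Σ = deg ψ = 2 for this feedback-free generator, i.e. q^{dim Σ} = 2^2 states.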
`
Consider now for simplicity a single-input single-output linear sequential circuit with transfer function g, so y = xg. Let s_d be the state obtained from a unit input at time -d:

s_d = D^{-d}gQ,    d ≥ 0.

If the state space has finite dimension dim Σ, then at most dim Σ of these states can be linearly independent, and in particular there is some linear dependence relation (with coefficients ψ_d in F) between the first dim Σ + 1 of these states:
`
0 = Σ_{d=0}^{dim Σ} ψ_d s_d = ψ(D^{-1})gQ,

where

ψ(D^{-1}) = Σ_{d=0}^{dim Σ} ψ_d D^{-d}

is some nonzero polynomial over F of degree dim Σ or less
in the indeterminate D^{-1}. In order for ψ(D^{-1})g to be 0 at time 1 and later, g itself must be equal to a ratio of polynomials h(D^{-1})/ψ(D^{-1}), with the degree of the numerator h(D^{-1}) not greater than that of the denominator ψ(D^{-1}) in order that g be causal, del g ≥ 0. Any sequence g that can be so expressed will be called a realizable function, realizable meaning both causal and finite. Clearly a realizable function can also be expressed (by multiplying numerator and denominator by D^{deg ψ}) as a ratio of polynomials in D rather than D^{-1}, in which case the condition deg h ≤ deg ψ is transformed to the condition that D not be a factor of the denominator after reduction to lowest terms.
`
We shall always assume that the expression h(D^{-1})/ψ(D^{-1}) of a realizable function g has been reduced to lowest terms and has a monic denominator (ψ_{deg ψ} = 1). A canonical realization of such a realizable function is shown in Fig. 2. The realization involves a feedback shift register with deg ψ memory elements storing elements of F, which essentially performs long division; if h/ψ is in lowest terms then the dimension of the state space dim Σ is also equal to deg ψ.

A convolutional encoder is then any linear sequential circuit whose transfer-function matrix G is realizable, that is, has components that are realizable functions.
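The long division performed by the feedback register can be sketched directly (our own illustration, over GF(2), with polynomials written in D): expanding g = h(D)/ψ(D) term by term reproduces, e.g., 1/(1 + D) = 1 + D + D^2 + ···.

```python
# Sketch: expand g = h(D)/psi(D) over GF(2) by long division,
# as the feedback shift register of Fig. 2 does implicitly.
# Coefficient lists, index = power of D; psi[0] must be 1 (D is not a factor of psi).

def expand(h, psi, n_terms):
    h = list(h) + [0] * n_terms          # working remainder
    out = []
    for t in range(n_terms):
        c = h[t]                          # next output coefficient (valid since psi[0] == 1)
        out.append(c)
        if c:                             # subtract c * D^t * psi(D) (XOR in GF(2))
            for d, pd in enumerate(psi):
                h[t + d] ^= pd
    return out

print(expand([1], [1, 1], 8))            # 1/(1+D) -> [1, 1, 1, 1, 1, 1, 1, 1]
print(expand([1, 1], [1, 1, 1], 8))      # (1+D)/(1+D+D^2) -> [1, 0, 1, 1, 0, 1, 1, 0]
```

The periodic outputs illustrate why a realizable generator with feedback is, as a sequence, infinitely long.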
`
`723
`
It is clear that a brute-force finite-state realization of such an encoder always exists, since one can simply realize the kn components and sum their outputs; in Appendix II we discuss the result of realization theory that shows how to construct a canonical realization, namely, one with a number of memory elements equal to the dimension of the abstract state space. We see that the only respect in which this definition is more general than that of a conventional convolutional encoder is that realizable generators in general involve feedback and, as sequences, have infinite length. It might seem that getting an infinite-length generator sequence from a finite-state encoder is a good bargain, but we shall see in the sequel that in fact feedback buys nothing.
`
`Communications Context
`
So far we have defined a general convolutional encoder merely as a general realizable linear sequential circuit, and developed basic concepts in linear system theory. It is only when we put the encoder in a communications context that questions unlike those posed in linear system theory arise. The most important new concept involves a definition of encoder equivalence very different from the linear-system-theoretic one.

Consider then the use of a convolutional encoder in a communications system, shown in Fig. 3. From the k input sequences x, called information sequences, the encoder G generates a set of n output sequences y, called a codeword, which is transmitted over some noisy channel. The received data, whatever their form, are denoted by r; a decoder operates on r in some way to produce k decoded sequences x̂, preferably not too different from x.
We see immediately that all is lost if the encoder map G is not one-to-one, for then even if there were no noise in the channel the decoder would make mistakes. In fact, by linearity, if y = x_1G = x_2G for two different inputs, then for any output y' there are at least two inputs x' and x' + x_1 - x_2 such that

y' = x'G = (x' + x_1 - x_2)G;

and by constancy the difference between the inputs may be made to extend over all time by concatenating them if they are not infinite already, so that the probability of decoding error would be at least 1/2. We therefore require that the encoder map be one-to-one in any useful convolutional encoder. There is therefore some inverse map G^{-1}, obviously linear and constant, which takes outputs y back to the unique corresponding input x,

xGG^{-1} = x    for all x;

GG^{-1} = I_k,

where I_k is the identity transformation. (Here we have used an n × k transfer-function matrix representation for G^{-1}, as we may, since G^{-1} is constant and linear; in matrix terminology G^{-1} is a right inverse.) Of course this inverse map may not be realizable, but we shall show below that when there is any inverse there is always a
`
`
`724
`
`IEEE TRANSACTIONS ON INFORMATION THEORY, NOVEMBER 1970
`
`
`
Fig. 2. Canonical realization of the transfer function h(D^{-1})/ψ(D^{-1}).
`
`
`Fig. 3. Communications system using convolutional coding.
`
realizable pseudo-inverse Ĝ^{-1} such that GĜ^{-1} = I_kD^d for some d ≥ 0. Ĝ^{-1} is also called an inverse with delay d, and G is sometimes called information-lossless of order d.
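As an illustration of a zero-delay inverse (our own example, not one from the paper): for the rate-1/2 encoder G = [1 + D + D^2, 1 + D] over GF(2), the Euclidean identity (1)(1 + D + D^2) + (D)(1 + D) = 1 yields the feedback-free zero-delay right inverse G^{-1} = [1, D]^T, which the sketch below verifies on a sample input.

```python
# Sketch: verifying a zero-delay feedback-free inverse for G = [1+D+D^2, 1+D] over GF(2).
# Since (1)(1+D+D^2) + (D)(1+D) = 1, the right inverse is the column [1, D]^T:
# y1*1 + y2*D = x*(1+D+D^2) + x*D*(1+D) = x.

def pmul(a, b):
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] ^= ai & bj
    return out

def padd(a, b):
    n = max(len(a), len(b))
    return [(a[i] if i < len(a) else 0) ^ (b[i] if i < len(b) else 0) for i in range(n)]

def strip(p):
    while p and p[-1] == 0:
        p = p[:-1]
    return p

g1, g2 = [1, 1, 1], [1, 1]        # G = [1+D+D^2, 1+D]
inv1, inv2 = [1], [0, 1]          # G^{-1} = [1, D]^T

x = [1, 0, 1, 1, 0, 1]            # arbitrary test input
y1, y2 = pmul(x, g1), pmul(x, g2)
recovered = strip(padd(pmul(y1, inv1), pmul(y2, inv2)))
assert recovered == strip(x)      # x is recovered with zero delay
print(recovered)                  # [1, 0, 1, 1, 0, 1]
```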
`
Returning to Fig. 3, we now split the decoder conceptually into two parts, a codeword estimator that produces from the received data r some codeword estimate ŷ, followed by a realizable pseudo-inverse Ĝ^{-1} that assigns to the codeword estimate the appropriate information sequences x̂ (see Fig. 4). In practice a decoder is usually not realized in these two pieces, but it is clear that since all the information about the information sequences x comes through y, the decoder can do no better estimating x directly than by estimating y and making the one-to-one correspondence to x. (In fact, any decoder can be made into a codeword estimator by appending an encoder G.) When G is one-to-one, as long as the codeword estimator makes no errors, there will be no error in the decoded sequences. However, even in the best-regulated communications systems, decoding errors will sometimes occur. We define the error sequences as the difference between the estimated codeword ŷ and the codeword y actually sent:

e = ŷ - y.
`
Correspondingly the information errors e_x are defined as the difference between the decoded sequences x̂ and the information sequences x, with allowance for the pseudo-inverse delay d:

e_x = x̂D^{-d} - x

= ŷĜ^{-1}D^{-d} - yĜ^{-1}D^{-d}

= eĜ^{-1}D^{-d}.
`
`Fig. 4. Same system with decoder in two parts.
`
As is implicit in our terminology, we consider the error sequences e as the more basic of the two. Since the error sequences are the difference between two codewords, e is itself a codeword, in fact the codeword generated by e_x:

e = e_xG.
`
`We make a decomposition of e into short codewords,
`called error events, as follows. Start at some time when
`the codeword estimator has been decoding correctly.
`An error event starts at the first subsequent time at which e
`is nonzero. The error event stops at the first subsequent
`time when the error sequences in the event form a finite
`codeword, after which the decoder will be decoding cor-
`rectly again. Thus we express e as a sum of nonoverlapping
`finite codewords, the error events.
`Implicit in the above analysis is the assumption that
`infinite error events do not occur. Such a possibility is
`not excluded in principle, but can generally be disregarded,
on the basis of the following plausibility argument. If two codewords differ in an infinite number of places, then as time goes on, we can expect the evidence in the received data r to build up in favor of the correct word, with probability approaching 1. A well-designed codeword estimator will use this information efficiently enough that very long error events have very small probabilities
`and infinite error events have probability 0. More precisely,
`the average error-event length should be small, at least for
`channel noise, which is in some sense small, so that most
of the time the decoder is decoding correctly. We say that any codeword estimator or decoder not satisfying this assumption is subject to ordinary error propagation, and exclude it from the class of useful decoders. (The one-to-one correspondence between these two definitions of error sequences is exhibited explicitly through the inverse G^{-1}.) Note that any decoder that is capable of starting to decode correctly sooner or later, regardless of what time it is
`
`
`
turned on, can be made immune to ordinary error propagation simply by restarting whenever it detects it is in trouble, or even periodically.

We owe to Massey and Sain [4] the observation that if there is any infinite information sequence x_0 such that the corresponding codeword y_0 = x_0G is finite, then even in the absence of ordinary error propagation decoding catastrophes can occur. For there will generally be a nonzero probability for the finite error event e = y_0 to occur, which will lead to infinite information errors e_x = x_0. Thus x̂ will differ from the original information sequences x in an infinite number of places, even if no further decoding errors are made by the codeword estimator. (In fact, the only chance to stop this propagation is to make another error.) Massey and Sain call this catastrophic error propagation. Since G^{-1} must supply the infinite output x_0 in response to the finite input y_0, it must have internal feedback for the above situation to occur. We therefore require that any useful encoder must not only have a realizable pseudo-inverse Ĝ^{-1}, but one that is feedback-free. (We see later that if G has no such inverse, then there is indeed an infinite input leading to a finite output.)
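A standard catastrophic example can be checked numerically (our own sketch): for G = [1 + D, 1 + D^2] over GF(2), gcd(g1, g2) = 1 + D, and the infinite input x_0 = 1/(1 + D) = 1 + D + D^2 + ··· produces the finite codeword y_0 = (1, 1 + D).

```python
# Sketch: a catastrophic encoder G = [1+D, 1+D^2] over GF(2).
# The all-ones (infinite) input x0 = 1/(1+D) yields finite outputs:
# x0*(1+D) = 1 and x0*(1+D^2) = 1 + D.

N = 64                              # truncation horizon for the demonstration

def conv_gf2(x, g, n):
    """First n output coefficients of x(D)*g(D) mod 2."""
    return [sum(g[d] * (x[t - d] if 0 <= t - d < len(x) else 0)
                for d in range(len(g))) % 2
            for t in range(n)]

x0 = [1] * N                        # truncation of 1/(1+D) = 1 + D + D^2 + ...
y1 = conv_gf2(x0, [1, 1], N)        # x0 * (1+D)
y2 = conv_gf2(x0, [1, 0, 1], N)     # x0 * (1+D^2)

# Apart from end effects of the truncation, both outputs die out immediately:
assert y1[:N - 2] == [1] + [0] * (N - 3)
assert y2[:N - 2] == [1, 1] + [0] * (N - 4)
print(y1[:8], y2[:8])               # [1, 0, 0, ...] [1, 1, 0, ...]
```

A finite channel error event equal to this codeword would thus cause infinitely many information errors, exactly the catastrophe described above.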
We come at last to our most important observations (see also [3]). The codeword estimator dominates both complexity and performance of the system: complexity, because both G and G^{-1} represent simple one-to-one (and in fact linear) maps, while the codeword estimator map from received data r to codeword ŷ is many-to-one; performance, because in the absence of catastrophic error propagation the probability of decoding error is equal to the probability of an error event multiplied by the average decoding errors per error event, with the former generally dominant and the latter only changed by small factors for reasonable pseudo-inverses Ĝ^{-1} (in fact, in many practical applications the probability of error event is the significant quantity, rather than the probability of decoding error). But the performance and complexity of the codeword estimator depend only on the set of codewords y, not on the input/output relationship specified by G (assuming that all x and hence y are equally likely, or at least that the codeword estimator does not use codeword probabilities, as is true, for example, in maximum-likelihood estimation). Therefore it is natural to assert that two encoders generating the same set of codewords are essentially equivalent in a communications context. So we give the following definitions.
`
Definition 2: The code generated by a convolutional encoder G is the set of all codewords y = xG, where the k inputs x are any sequences.

Definition 3: Two encoders are equivalent if they generate the same code.

These definitions free us to seek out the encoder in any equivalence class of encoders that has the most desirable properties. We have already seen the desirability of having a feedback-free realizable pseudo-inverse Ĝ^{-1}. Our main result is that any code can be generated by an encoder G with such a Ĝ^{-1}, and in fact with the following properties.
`
1) G has a realizable feedback-free zero-delay inverse G^{-1}.
2) G is itself conventional (feedback-free).
3) The obvious realization of G requires as few memory elements as any equivalent encoder.
4) Short codewords are associated with short information sequences, in a sense to be made precise later.
`
`In the context of linear system theory, the study of
`convolutional encoders under equivalence can be viewed
`as the study of those properties of linear systems that
`belong to the output space alone, or of the invariants
`over the class of all systems with the same output spaces.
`
Algebra
`
We assume that the reader has an algebraic background roughly at the level of the introductory chapters of Peterson [9]. He will therefore be familiar with the notion of a field as a collection of objects that can be added, subtracted, multiplied, and divided with the usual associative, distributive, and commutative rules; he will also know what is meant by a vector space over a field. Further, he will understand that a (commutative) ring is a collection of objects with all the properties of fields except division. He should also recall that the set of all polynomials in D with coefficients in a field F, written conventionally as F[D], is a ring.
The polynomial ring F[D] is actually an example of the very best kind of ring, a principal ideal domain. The set of integers is another such example. Without giving the technical definition of such a ring, we can describe some of its more convenient properties. In a principal ideal domain, certain elements r, including 1, have inverses r^{-1} such that rr^{-1} = 1; such elements are called units. The unit integers are ±1, and the unit polynomials are those of degree 0, namely, the nonzero elements of F. Those elements that are not units can be uniquely factored into products of primes, up to units; a prime is an element that has no factor but itself, up to units. (The ambiguity induced by the units is eliminated by some convention: the prime integers are taken to be the positive primes, while the prime polynomials are taken to be monic irreducible polynomials, where monic means having highest order coefficient equal to 1.) It follows that we can cancel: if ab = ac, then b = c; this is almost as good as division. Further, we have the notion of the greatest common divisor of a set of elements as the greatest product of primes that divides all elements of the set, again made unique by the conventional designation of primes.
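The gcd in F[D] can be computed by the Euclidean algorithm, just as for integers; the sketch below (our own, over GF(2), where every nonzero polynomial is already monic) illustrates the convention above.

```python
# Sketch: Euclidean algorithm in GF(2)[D]; polynomials as coefficient lists (index = power).

def strip(p):
    while p and p[-1] == 0:
        p = p[:-1]
    return p

def polymod(a, b):
    """Remainder of a divided by b over GF(2)."""
    a, b = strip(a), strip(b)
    while len(a) >= len(b) > 0:
        shift = len(a) - len(b)
        a = strip([c ^ (b[i - shift] if 0 <= i - shift < len(b) else 0)
                   for i, c in enumerate(a)])
    return a

def gcd_gf2(a, b):
    while strip(b):
        a, b = b, polymod(a, b)
    return strip(a)   # over GF(2), nonzero polynomials are automatically monic

# gcd(1 + D^2, 1 + D^3): both share the prime factor 1 + D.
print(gcd_gf2([1, 0, 1], [1, 0, 0, 1]))   # [1, 1], i.e. 1 + D
```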
Other principal ideal domains have already occurred in our discussions. In general, if R is any principal ideal domain and S is any multiplicative subset, namely, a set of elements containing 1 but not 0 such that if a ∈ S, b ∈ S, then ab ∈ S, then the ring of fractions S^{-1}R consisting of the elements r/s, where r ∈ R and s ∈ S, is a principal ideal domain. (All elements of R that are in S are thereby given inverses and become units.) Letting R be the polynomials F[D], we have the following examples.
`
`
`
`
if there is any decomposition G = AΓB such that A and B are invertible R-matrices and Γ is a diagonal matrix with γ_i | γ_{i+1} or γ_{i+1} = 0, then the γ_i are the invariant factors of G with respect to R.
Sketch of Proof [10]: G is said (in this context only) to be equivalent to G' if G = AG'B, where A and B are square k × k and n × n R-matrices with unit determinants; the assertion of the theorem is that G is equivalent to a diagonal matrix Γ that is unique under the specified conditions on the γ_i. Since any such A can be represented as the product of elementary row operations, and B of elementary column operations (interchange of rows (columns), multiplication of any row (column) by a unit in R, addition of any R-multiple of any row (column) to another), it can be shown that the Δ_i are preserved under equivalence.
In particular, therefore, Δ_1 divides all elements of all equivalent matrices. We will now show that there exists an equivalent matrix in which some element divides all other elements, hence is equal to Δ_1 up to units. Let G not be already of this form, and let α and β be two nonzero elements in the same row or column such that α does not divide β. (If there is no element in the same row or column as α not divisible by α, there is some such β in some other column, and this column can be added to the column containing α to give an equivalent matrix for which the prescription above can be satisfied.) By row or column permutations α may be placed in the upper-left corner and β in the second entry of the first row or column; we assume column for definiteness. Now there exist x and y such that αx + βy = δ, where δ is the greatest common divisor of α and β, and has fewer prime factors than α since α ∤ β, δ ≠ α. The row transformation below then preserves equivalence while replacing α by δ:
`
[   x      y  ]
[ -β/δ   α/δ  ]

(acting on the first two rows; its determinant (αx + βy)/δ = 1 is a unit, and it carries the pair (α, β) into (δ, 0)).
`
If δ does not now divide all elements of the equivalent matrix, the construction can be repeated and δ replaced by some δ' with fewer prime factors. This descending chain can therefore terminate only with a δ that does divide all elements of the equivalent matrix. Since δ = Δ_1 = γ_1, up to units, multiplication of the top row by a unit puts γ_1 in the upper-left corner, and the whole first row and column can be cleared to zero by transformations of the above type (with x = 1, y = 0), giving the equivalent matrix
`
[ γ_1   0  ]
[  0   G_1 ],

where γ_1 divides every element of G_1. Similarly, G_1 is equivalent to a matrix G_2 of the same form, so
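The descending-chain construction sketched in this proof is, in essence, the Smith normal form algorithm; a minimal sketch over the integers (the other principal ideal domain mentioned above, with all names our own) diagonalizes a small matrix by repeated division-with-remainder and then enforces the divisibility chain γ_i | γ_{i+1} pairwise.

```python
# Sketch: invariant factors of an integer matrix via gcd reduction (Smith form over Z).
# Illustration only; not an optimized or fully general implementation.
from math import gcd

def smith_diagonal(M):
    M = [row[:] for row in M]
    r, c = len(M), len(M[0])
    for t in range(min(r, c)):
        while True:
            pos = [(i, j) for i in range(t, r) for j in range(t, c) if M[i][j]]
            if not pos:
                break
            i, j = min(pos, key=lambda p: abs(M[p[0]][p[1]]))
            M[t], M[i] = M[i], M[t]                 # move a smallest pivot to (t, t)
            for row in M:
                row[t], row[j] = row[j], row[t]
            done = True                             # clear row/column t by remainders
            for i in range(t + 1, r):
                q = M[i][t] // M[t][t]
                M[i] = [a - q * b for a, b in zip(M[i], M[t])]
                if M[i][t]:
                    done = False
            for j in range(t + 1, c):
                q = M[t][j] // M[t][t]
                for row in M:
                    row[j] -= q * row[t]
                if M[t][j]:
                    done = False
            if done:
                break
    d = [abs(M[t][t]) for t in range(min(r, c))]
    for i in range(len(d)):                          # enforce gamma_i | gamma_{i+1}
        for j in range(i + 1, len(d)):
            g = gcd(d[i], d[j])
            if g:
                d[i], d[j] = g, d[i] * d[j] // g
    return d

print(smith_diagonal([[2, 4], [4, 8]]))   # [2, 0]
print(smith_diagonal([[2, 0], [0, 3]]))   # [1, 6]
```

The second example shows why the pairwise gcd step is needed: diag(2, 3) is equivalent to diag(1, 6), which satisfies the divisibility condition.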
`
`726
`
1) Let S consist of all nonzero polynomials. Then S^{-1}R consists of all ratios of polynomials with the denominator nonzero, which are called the rational functions, written conventionally F(D). Obviously, in F(D) all nonzero elements are invertible, so F(D) is actually a field, called the field of quotients of F[D].
2) Let S consist of all nonnegative powers of D, including D^0 = 1. Then S^{-1}R consists of elements D^{-d}f(D), where f(D) is a polynomial; in other words, S^{-1}R is the set of finite sequences. Clearly all the irreducible polynomials except D remain as primes in this ring.
3) Let S consist of all polynomials with nonzero constant term, that is, not divisible by D. Then S^{-1}R consists of ratios of polynomials in D with a nonzero constant term in the denominator. We saw earlier that these are precisely the generators realizable by causal finite-state systems; these are