Research and Development Report

MPEG VIDEO CODING:
A basic tutorial introduction

S.R. Ely, Ph.D., C.Eng., M.I.E.E.

Research & Development Department
Policy & Planning Directorate
THE BRITISH BROADCASTING CORPORATION
`IPR2018-00534
`Sony EX1027 Page 1
`
`
`
`BBC RD 1996/3
`
`*MPEG VIDEO CODING: A basic tutorial introduction
`
`S.R. Ely, Ph.D., C.Eng., M.I.E.E.
`
`Summary
`
`MPEG has been outstandingly successful in defining standards for video compression
`coding, serving a wide range of applications, bit-rates, qualities and services on a
`worldwide basis. The standards are based upon a flexible toolkit of techniques for bit-rate
`reduction. MPEG video coding uses a combination of motion-compensated interframe
`prediction (for reducing temporal redundancy) with Discrete Cosine Transform (DCT) and
`variable length coding tools (for reducing spatial redundancy). The specification only
`defines the bitstream syntax and decoding process: the coding process is not specified and
`the performance of a coder will vary depending upon, for example, the quality of the
`motion-vector measurement, and the processes used for prediction-mode selection.
`
* This Report is based on a European Broadcasting Union (EBU) Review item, from the ‘EBU Review, Winter 1995 (Ed. 266)’, and is published with permission. It was originally written by BBC R&D Manager, Dr. S.R. Ely and entitled: ‘MPEG video – A simple introduction’.
`
`Issued under the Authority of
`
`Research & Development Department
`Policy & Planning Directorate
`BRITISH BROADCASTING CORPORATION
`
`General Manager
`Research & Development Department
`
`(R021)
`
`1996
`
`
`
`
© British Broadcasting Corporation
`
`No part of this publication may be reproduced, stored in a
`retrieval system, or transmitted in any form or by any
`means, electronic, mechanical, photocopying, recording,
`or otherwise, without prior permission.
`
`
`
`
`MPEG VIDEO CODING: A basic tutorial introduction
`
`S.R. Ely, Ph.D., C.Eng., M.I.E.E.
`
1. INTRODUCTION
2. VIDEO CODING PRINCIPLES
3. MPEG VIDEO COMPRESSION TOOLKIT
   3.1 Discrete cosine transform
   3.2 Coefficient quantisation
   3.3 Zig-zag coefficient scanning, run-length coding, and variable length coding
   3.4 Buffering and feedback
   3.5 Reduction of temporal redundancy: interframe prediction
   3.6 Motion-compensated interframe prediction
   3.7 Prediction modes
   3.8 Picture types
      3.8.1 Intra pictures
      3.8.2 Predictive pictures
      3.8.3 Bi-directionally-predictive pictures
      3.8.4 Group of pictures
4. MPEG PROFILES AND LEVELS
5. CONCLUSIONS
6. ACKNOWLEDGEMENTS
7. REFERENCES
`
`
`
`
`MPEG VIDEO CODING: A basic tutorial introduction
`
`S.R. Ely, Ph.D. C.Eng., M.I.E.E.
`
1. INTRODUCTION

MPEG (the Moving Pictures Expert Group) started in 1988 as a Working Group of the International Standards Organisation (ISO)1 with the aim of defining standards for digital compression of video and audio signals. It took as its basis the ITU-T standard for video-conferencing and video-telephony*, together with that of JPEG (the Joint Photographic Experts Group), which was initially developed for compressing still images such as electronic photography.

The first goal of MPEG was to define a video coding algorithm for digital storage media; in particular, for the CD-ROM. The resulting standard was published in 1993.** It comprises three parts, covering:
• systems aspects (including multiplexing and synchronisation)2,
• video coding,
• audio coding.3

It has been applied in the Interactive CD (CDi) system to provide full-motion video playback from CD, and is widely used in PC applications, for which a range of hardware and software coders and decoders are available. This standard is known as MPEG-1 and is restricted to non-interlaced video formats; it is primarily intended to support video coding at bit-rates up to about 1.5 Mbit/s.

In 1990, MPEG began work on a second standard, to be capable of coding interlaced pictures directly; originally to support high-quality applications at bit-rates in the range of about 5 to 10 Mbit/s. MPEG-2,4 as it is now known, also supports high-definition formats at bit-rates in the range of about 15 to 30 Mbit/s. As for MPEG-1, the MPEG-2 standard (published in 1994***) comprises three parts: systems, video and audio.

It is important to note that the MPEG standards specify only the syntax and semantics of the bit-streams and the decoding process; they do not specify the encoding process. Much of the latter is left to the discretion of the coder designers, and this gives scope for improvement as coding techniques are refined and new techniques developed.

* This is now known as ITU-T Recommendation H.261.
** As ISO/IEC 11172.
*** As ISO/IEC 13818.

2. VIDEO CODING PRINCIPLES

A studio-quality 625-line component picture, when digitised according to ITU Recommendation 601/656 (i.e. 4:2:2 sampling), requires 216 Mbit/s to convey the luminance and two chrominance components (see Fig. 1). For bandwidth-restricted media (such as terrestrial or satellite channels), some way of reducing the very high bit-rate needed to represent the digitised picture must be found.

Fig. 1 - 4:2:2 sampling. (R, G and B are matrixed to Y (5.75 MHz bandwidth) and CB, CR (2.75 MHz each), then digitised at 8 bits per sample: Y at 13.5 MHz gives 8 × 13.5 = 108 Mbit/s; CB and CR at 6.75 MHz give 8 × 6.75 = 54 Mbit/s each; 216 Mbit/s in total.)

A video bit-rate reduction (compression) system5 operates by removing redundant and less important information from the signal prior to transmission, and then reconstructing an approximation of the image from the remaining (compressed) information at the decoder. In video signals, three distinct kinds of redundancy can be identified:

• Spatial and temporal redundancy: pixel values are not independent, but are correlated with their neighbours, both within the same frame and across frames. So, to some extent, the value of a pixel is predictable, given the values of neighbouring pixels.

• Entropy redundancy: for any non-random digitised signal, some code values occur more frequently than others. This can be exploited by coding the more frequently-occurring values with shorter codes than the rarer values. The same principle has long been exploited in the Morse code, where the commonest letters in English (‘E’ and ‘T’) are represented by one dot and one dash respectively, whereas the rarest (‘X’, ‘Y’ and ‘Z’) are represented by four dots and dashes.

• Psychovisual redundancy: this form of redundancy results from the way the eye and brain work. In audio, the limited frequency response of the ear is well appreciated. In video, both the limit to the fine detail which the eye can resolve (limits of spatial resolution) and the limits in the ability to track fast-moving images (limits of temporal resolution) must be considered. The latter means, for example, that a shot-change masks fine detail on either side of the change.
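The Morse-code analogy is exactly what a Huffman code formalises. A minimal sketch (illustrative only: MPEG uses fixed, pre-computed VLC tables rather than codes built on the fly, and the symbol frequencies here are invented):

```python
import heapq
from collections import Counter

def huffman_code_lengths(text):
    """Build Huffman code lengths from symbol frequencies:
    common symbols get short codes, rare symbols long ones."""
    freq = Counter(text)
    # Each heap entry: (weight, tie-breaker, {symbol: code length so far})
    heap = [(w, i, {s: 0}) for i, (s, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        w1, _, d1 = heapq.heappop(heap)      # merge the two rarest subtrees;
        w2, _, d2 = heapq.heappop(heap)      # their symbols gain one code bit
        merged = {s: l + 1 for s, l in {**d1, **d2}.items()}
        heapq.heappush(heap, (w1 + w2, tie, merged))
        tie += 1
    return heap[0][2]

lengths = huffman_code_lengths("EEEEEEEETTTTAAOX")
print(lengths)
```

On this toy alphabet the frequent ‘E’ ends up with a 1-bit code while the rare ‘X’ needs 4 bits, mirroring the dot-and-dash economy described above.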
`
`3. MPEG VIDEO COMPRESSION TOOLKIT
`
Sample-rate reduction is a very effective method of reducing bit-rate, but of course introduces irreversible loss of resolution. For very low bit-rate applications (e.g. in MPEG-1), alternate fields are discarded and the horizontal sampling rate reduced to around 360 pixels per line (giving about 3.3 MHz resolution). The sample rate for the chrominance is half that of the luminance, both horizontally and vertically. In this way, the bit-rate can be reduced to less than one fifth of that of a conventional-definition (4:2:2-sampled) signal.
`
For ‘broadcast quality’, at bit-rates in the range 3 to 10 Mbit/s, horizontal sample-rate reduction is not advisable for the luminance or chrominance signals, nor is temporal sub-sampling. However, for distribution and broadcast applications, sufficient chrominance resolution can be provided with the vertical chrominance sampling frequency halved. Thus, for most MPEG-2 coding applications, 4:2:0 sampling is likely to be used rather than 4:2:2, although the latter, and 4:4:4 sampling, are also supported. It may be of interest to note that a conventional delay-line PAL decoder effectively yields the same vertical sub-sampling of the chrominance signals as 4:2:0 sampling.
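The sampling arithmetic above is easy to check: each rate is simply the sample frequency times 8 bits, summed over one luminance and two chrominance components. A sketch:

```python
def bitrate_mbit(y_mhz, c_mhz, bits=8, chroma_vert_factor=1.0):
    """Uncompressed bit-rate in Mbit/s for one luminance and two
    chrominance components at the given sampling frequencies."""
    return bits * (y_mhz + 2 * c_mhz * chroma_vert_factor)

rate_422 = bitrate_mbit(13.5, 6.75)                          # 4:2:2 sampling
rate_420 = bitrate_mbit(13.5, 6.75, chroma_vert_factor=0.5)  # 4:2:0: chroma also halved vertically
print(rate_422, rate_420)   # 216.0 162.0
```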
`
`Apart from sample-rate reduction, the MPEG toolkit
`includes two different kinds of tools to exploit redun-
`dancy in images:
`• Discrete Cosine Transform (DCT)6, 7 is similar
`to the Discrete Fourier Transform (DFT). The
`purpose of using this orthogonal transform is to
`assist the processing to remove spatial redun-
`dancy by concentrating the signal energy into
`relatively few coefficients.
`• Motion-Compensated interframe prediction is
`used to remove temporal redundancy. This is
`based on techniques similar to the well known
`differential pulse-code modulation (DPCM)
`principle.
`
`3.1 Discrete cosine transform
`
Consider the luminance signal of a 4:2:0-sampled digitised 625-line picture comprising about 704 pixels horizontally and about 576 lines vertically (see Fig. 2). In MPEG coding, spatial redundancy is removed by processing the digitised signals in two-dimensional blocks of 8 pixels by 8 lines (taken from either one field or two, depending on the mode of operation).

Fig. 2 - Block-based DCT. (The 704-pixel by 576-line picture is divided into 8 × 8 blocks: 88 blocks across, 72 blocks down.)

As Fig. 3 illustrates, the DCT is a reversible process which maps between the normal 2-D presentation of the image and one which represents the same information in what may be thought of as the ‘frequency’ domain. Each coefficient in the 8 × 8 DCT-domain block indicates the contribution of a different DCT ‘basis’ function. The top-left coefficient in Fig. 3 is called the DC coefficient, and may be thought of as representing the average brightness of the block. Moving down the block in Fig. 3, the coefficients represent increasing vertical frequencies; and moving along the block, from left to right, represents increasing horizontal frequencies.

Fig. 3 - DCT transform pairs. (An 8 × 8 pixel block maps, via the DCT, to an 8 × 8 block of coefficients: the d.c. coefficient at top-left, horizontal frequencies increasing in cycles per picture width from left to right, vertical frequencies increasing in cycles per picture height from top to bottom; the IDCT maps back to the pixel block.)

The DCT does not directly reduce the number of bits required to represent the block. In fact, for an 8 × 8 image block of 8-bit pixels, the DCT produces an 8 × 8 block of at least 11-bit DCT coefficients to allow for reversibility! The reduction in the number of bits follows from the fact that, for typical blocks of natural images, the distribution of coefficients is non-uniform: the transform tends to concentrate the energy into the low-frequency coefficients, and many of the other coefficients are near zero. The bit-rate reduction is achieved by not transmitting the near-zero coefficients, and by quantising and coding the remaining coefficients as described below. The non-uniform coefficient distribution is a result of the spatial redundancy present in the original image block.

Many different forms of transformation have been investigated for bit-rate reduction. The best transforms are those which tend to concentrate the energy of a picture block into a few coefficients. The DCT is one of the best transforms in this respect and has the advantage that the DCT and its inverse are easy to implement in digital processing. The choice of 8 × 8 block size is a trade-off between the need to use a large picture area for the transform, so that the energy compaction described above is most efficient, and the fact that the content and movement of the picture varies spatially, which would tend to argue for a smaller block size. A large block size would also emphasise variations from block to block in the decoded picture, as well as the effects of ‘windowing’ by the block structure.
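The energy-compaction behaviour described above is easy to demonstrate. A minimal, unoptimised sketch of the orthonormal 2-D DCT-II and its inverse (real coders use fast algorithms; this is illustrative, not the MPEG arithmetic as such):

```python
import math

N = 8

def alpha(u):
    # Orthonormal scaling factors for the 1-D DCT basis.
    return math.sqrt(1.0 / N) if u == 0 else math.sqrt(2.0 / N)

def dct2(block):
    """Forward 2-D DCT-II of an 8 x 8 block."""
    return [[alpha(u) * alpha(v) * sum(
                block[x][y]
                * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                * math.cos((2 * y + 1) * v * math.pi / (2 * N))
                for x in range(N) for y in range(N))
             for v in range(N)] for u in range(N)]

def idct2(coef):
    """Inverse 2-D DCT, recovering the pixel block."""
    return [[sum(alpha(u) * alpha(v) * coef[u][v]
                 * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                 * math.cos((2 * y + 1) * v * math.pi / (2 * N))
                 for u in range(N) for v in range(N))
             for y in range(N)] for x in range(N)]

# A smooth (spatially redundant) block: a gentle horizontal ramp.
block = [[100 + 2 * y for y in range(N)] for x in range(N)]
coef = dct2(block)

# Energy concentrates in the low-frequency corner...
total = sum(c * c for row in coef for c in row)
low = sum(coef[u][v] ** 2 for u in range(2) for v in range(2))
print(low / total)        # close to 1.0 for this smooth block

# ...and the transform is reversible (up to rounding).
rec = idct2(coef)
err = max(abs(rec[x][y] - block[x][y]) for x in range(N) for y in range(N))
print(err)                # tiny numerical error
```

Note that the DC coefficient comes out as 8 × the block mean with this scaling, matching its description as the average brightness.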
`
`3.2 Coefficient quantisation
`
`After a block has been transformed, the transform co-
`efficients are quantised. Different quantisation is
`applied to each coefficient depending on the spatial
`frequency within the block that it represents. The ob-
`jective is to minimise the number of bits which must
`be transmitted to the decoder so that it can perform the
`inverse transform and reconstruct the image: reduced
`quantisation accuracy reduces the number of bits
`which need to be transmitted to represent a given DCT
`coefficient, but
`increases the possible quantisation
`error for that coefficient. Note that the quantisation
`noise introduced by the coder is not reversible in the
`decoder, so the coding and decoding process is ‘lossy’.
`
`More quantisation error can be tolerated in the high-
`frequency coefficients because high-frequency noise is
`less visible than low-frequency quantisation noise.
`Also, quantisation noise is less visible in the chromi-
`nance components than in the luminance component.
`MPEG uses weighting matrices to define the relative
`accuracy of the quantisation of the different coeffi-
`cients. Different weighting matrices can be used for
`different frames depending on the prediction mode
`used.
`
`The weighted coefficients are then passed through a
`fixed quantisation law which is usually a linear law.
`However, for some prediction modes there is an in-
`creased threshold level (i.e. a dead-zone) around zero.
`The effect of this threshold is to maximise the number
`of coefficients which are quantised to zero. In practice,
`it is found that small deviations around zero are usu-
`ally caused by noise in the signal; so that suppressing
`
`these values actually gives an apparent improvement to
`the subjective picture quality.
`
`Quantisation noise is more visible in some blocks than
`in others – for example, in blocks which contain a
`high-contrast edge between two plain areas. In such
`blocks, the quantisation parameters can be modified to
`limit the maximum quantisation error, particularly in
`the high-frequency coefficients.
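A toy quantiser illustrates how the weighting matrix, the quantiser coarseness and the dead-zone interact (the weight values below are illustrative, not an MPEG matrix, and the arithmetic is simplified relative to the standard):

```python
def quantise(coef, weight, qscale):
    """Quantise one DCT coefficient. Truncation toward zero gives a
    dead-zone: small, noise-like values are forced to zero."""
    step = weight * qscale
    return int(coef / step)

def dequantise(level, weight, qscale):
    """Approximate reconstruction in the decoder (the loss is not reversible)."""
    return level * weight * qscale

# Higher frequencies get larger weights, hence coarser quantisation.
weights = [16, 19, 22, 26, 27, 29, 34, 37]   # one illustrative matrix row
coefs = [312, -45, 18, 6, -3, 2, 1, 0]       # DC first, then rising frequency
levels = [quantise(c, w, qscale=2) for c, w in zip(coefs, weights)]
print(levels)    # the small high-frequency values quantise to zero
```

Making `qscale` larger (a coarser quantiser) zeroes still more coefficients; this is the control knob used by the buffer feedback described in Section 3.4.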
`
`3.3 Zig-zag coefficient scanning, run-length
`coding, and variable length coding
After quantisation, the 8 × 8 blocks of DCT coefficients are scanned in a zig-zag pattern (see Fig. 4 (overleaf)) to turn the 2-D array into a serial string of quantised coefficients. Two scan patterns are defined:
`one is usually preferable for picture material which has
`strong vertical frequency components, due to, perhaps,
`the interlace picture structure. In this scan pattern there
`is a bias to scan vertical coefficients first. In the other,
`which is preferable for pictures without a strong verti-
`cal structure, there is no bias and the scan proceeds
`diagonally from top left to bottom right as illustrated in
`Fig. 4. The coder signals its choice of scan pattern to
`the decoder.
`
`The strings of coefficients produced by the zig-zag
`scanning are coded by counting the number of zero co-
`efficients preceding a non-zero coefficient; that is, they
`are run-length coded. The run-length value and the
`value of the non-zero coefficient which the run of zero
`coefficients precedes are then combined and coded using
`a variable length code (VLC). This VLC coding exploits
`the fact that short runs of zeros are more likely than
`long ones and small coefficients are more likely than large
`ones. The VLC allocates codes which have different
`lengths depending upon the expected frequency of
`occurrence of each zero-run-length/non-zero coeffi-
`cient value combination. Common combinations use
`short code words, less common combinations long
`code words. All other combinations are coded by the
`combination of an escape code and two fixed length
`codes, one 6-bit word to indicate the run length and
`
`
`
`
one 12-bit word to indicate the coefficient value.

Fig. 4 - Scanning of DCT blocks and run-length coding with variable length codes (entropy coding). (Run/amplitude coding: each run of zeros and the amplitude of the following non-zero DCT coefficient are given one Variable Length Code (VLC) (Huffman code).)
`
`One VLC code table is used in most circumstances.
`However, a second VLC code table is used for some
`special pictures. The DC coefficient is treated differ-
`ently in some modes. But, all the VLCs are designed
`such that no complete codeword is the prefix of any
`other codeword – they are similar to the well-known
`Huffman code. Thus, the decoder can identify where
`one variable length codeword ends and another starts
`when operating within the correct codebook. No VLC,
`or combination of codes,
`is allowed to produce a
`sequence of 23 contiguous zeros – this combination is
`used for synchronisation purposes.
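The diagonal scan and the run-length stage can be sketched as follows (the (run, level) pairs stand in for the VLC look-up, which this sketch omits):

```python
N = 8

def zigzag_order(n=N):
    """Diagonal zig-zag scan order for an n x n block (as in Fig. 4)."""
    order = []
    for s in range(2 * n - 1):                      # each anti-diagonal
        diag = [(i, s - i) for i in range(n) if 0 <= s - i < n]
        order.extend(diag if s % 2 else diag[::-1]) # alternate direction
    return order

def run_level_pairs(block):
    """Run-length code the zig-zag scanned coefficients as (run, level) pairs."""
    pairs, run = [], 0
    for (u, v) in zigzag_order():
        level = block[u][v]
        if level == 0:
            run += 1            # count zeros preceding the next non-zero value
        else:
            pairs.append((run, level))
            run = 0
    return pairs                # trailing zeros are dropped (end-of-block)

# Example: only the DC and two AC coefficients survived quantisation.
block = [[0] * N for _ in range(N)]
block[0][0], block[0][1], block[2][0] = 52, -3, 1
print(run_level_pairs(block))   # [(0, 52), (0, -3), (1, 1)]
```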
`
DC coefficients within blocks in intra macroblocks (see below) are differentially encoded before variable length coding.

3.4 Buffering and feedback

The DCT coefficient quantisation, run-length, and VLC coding processes produce a varying bit-rate which depends upon the complexity of the picture information and the amount and type of motion in the picture. To produce the constant bit-rate needed for transmission over a fixed bit-rate system, a buffer is needed to smooth out the variations in bit-rate. To prevent overflow or underflow of this buffer, its occupancy is monitored and feedback applied to the coding processes to control the input to the buffer. The DCT quantisation process is often used to provide direct control of the buffer's input: as the buffer becomes full, the quantiser is made coarser to reduce the number of bits used to code each DCT coefficient; and as the buffer empties, the DCT quantisation is made finer. Other means of controlling the buffer occupancy may be used as well as, or instead of, control of the DCT coefficient quantisation.

Fig. 5 shows a block diagram of a basic DCT codec with, in this example, the buffer occupancy controlled by feedback to the DCT coefficient quantisation.

It is important to note that the final bit-rate at the output of an MPEG video encoder can be freely varied. If the output bit-rate is reduced, the buffer will empty more slowly and the coder will automatically compensate by, for example, making the DCT coefficient quantisation coarser. Clearly, reducing the output bit-rate
`
Fig. 5 - Basic DCT coder. (The Y, CB or CR input is converted from line-scan to block-scan order, then passes through the DCT, quantiser (Q), zig-zag scan, run-length coding and VLC into the output buffer; the buffer occupancy feeds back to control the quantisation.)
`
`
`
reduces the quality of the decoded pictures. Hence, to squeeze more TV channels into an r.f. channel (e.g. by transmitting VHS-quality instead of standard broadcast quality), the output bit-rate of the MPEG video encoder can readily be reduced. Conversely, an HDTV channel would demand a much higher bit-rate and, hence, greater r.f. channel space. There is, therefore, no need to lock input sampling rates to channel bit-rates or vice-versa.
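The feedback loop described above can be caricatured in a few lines (a toy model: real rate control is far more sophisticated, and the thresholds and constants here are arbitrary assumptions):

```python
def regulate(bits_per_block, target_rate, buf_size, qscale=8):
    """Toy rate-control loop: quantise coarser as the buffer fills,
    finer as it empties."""
    buf = buf_size // 2
    history = []
    for complexity in bits_per_block:
        produced = complexity // qscale      # coarser qscale -> fewer bits
        buf += produced - target_rate        # buffer drains at the channel rate
        if buf > 0.75 * buf_size and qscale < 31:
            qscale += 1                      # buffer filling: coarser quantiser
        elif buf < 0.25 * buf_size and qscale > 1:
            qscale -= 1                      # buffer emptying: finer quantiser
        history.append((produced, buf, qscale))
    return history

# A stream of equally 'difficult' blocks: the quantiser adapts until the
# bits produced per block settle around the channel rate.
hist = regulate([6400] * 200, target_rate=400, buf_size=10000)
print(hist[-1])
```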
`
`3.5 Reduction of temporal redundancy:
`interframe prediction
`In order to exploit the fact that pictures often change
`little from one frame to the next, MPEG includes
`temporal prediction modes; that is, there is an effort to
`predict one frame for coding from a previous
`‘reference’ frame.
`
`Fig. 6 illustrates a basic Differential Pulse Code Modu-
`lation (DPCM) coder, in which only the differences
`between the input and a prediction based on previous,
`locally-decoded output are quantised and transmitted.
`Note that the prediction cannot be based on previous
`source pictures because the prediction has to be repeat-
`able in the decoder where the source pictures are not
`available. Consequently, the coder contains a local de-
`coder which reconstructs pictures exactly as they
`would be in the destination decoder. The locally-
`decoded output then forms the input to the predictor. In
interframe prediction, samples from a ‘reference’ frame are used in the prediction of samples in other frames.
`
`In MPEG coding, interframe prediction (which re-
`duces temporal redundancy) is combined with the
`DCT and the variable length coding tools described
`above (which reduce spatial redundancy) – see Fig. 7.
`The coder subtracts the prediction from the input to
`form a ‘prediction-error’ picture. The prediction error
`is transformed with the DCT, the coefficients quan-
`tised, and these quantised values coded using a VLC.
`
The simplest interframe prediction is to predict a block of samples from the same spatially-positioned (co-sited) block in the reference frame. In this case, the
`
Fig. 6 - Basic DPCM coder. (Only the difference between the input and a prediction, formed by a predictor fed with the locally-decoded output, is quantised and sent to the channel.)

Fig. 7 - DCT with interframe prediction coder. (The prediction error, input minus prediction, is DCT-transformed and quantised; an inverse DCT and a frame delay reconstruct the locally-decoded picture used as the prediction.)
`
`‘predictor’ would just comprise a delay of exactly one
`frame as shown in Fig. 7. This makes a good prediction
`for stationary regions of the image but is poor in
`moving areas.
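The DPCM loop of Fig. 6 can be sketched directly (scalar samples and a fixed quantiser step stand in for blocks and the DCT; `step` is an assumed parameter):

```python
def dpcm_encode_decode(samples, step=4):
    """Toy DPCM loop: quantise the difference between the input and a
    prediction based on the locally-decoded previous sample, exactly as
    the destination decoder will reconstruct it."""
    prediction = 0
    levels, decoded = [], []
    for s in samples:
        level = round((s - prediction) / step)   # quantised prediction error
        levels.append(level)
        recon = prediction + level * step        # local decoder
        decoded.append(recon)
        prediction = recon                       # predictor uses decoded output
    return levels, decoded

levels, decoded = dpcm_encode_decode([100, 102, 101, 105, 104])
print(levels)    # mostly small numbers: cheap to transmit
print(decoded)   # reconstruction tracks the input to within the step size
```

Basing the prediction on `recon` rather than on the source sample is the key point from the text: the decoder can only repeat a prediction built from what it has itself decoded.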
`
`3.6 Motion-compensated interframe
`prediction
`A more sophisticated prediction method, known as
`motion-compensated interframe prediction, is to offset
`any translational motion which has occurred between
`the block being coded and the reference frame; and to
`use a shifted block from the reference frame as the pre-
`diction (see Fig. 8 (overleaf)).
`
One method of determining the motion that has occurred between the block being coded and the reference frame is a ‘block-matching’ search, in which a large number of trial offsets are tested in the coder (see Fig. 9 (overleaf)). The ‘best’ offset is selected on
`the basis of a measurement of the minimum error be-
`tween the block being coded and the prediction. Since
`MPEG defines only the decoding process, not the en-
`coding, the choice of motion measurement algorithm is
`left to the designer of the coder; it is an area where
`considerable differences in performance occur between
`different algorithms and different implementations. A
`major requirement
`is to have a search area large
`enough to cover any motion that is present from frame
`to frame. However, increasing the size of the search
`area greatly increases the processing needed to find the
`best match – various techniques, such as ‘hierarchical
`block matching’, are used to try to overcome this
`dilemma.
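A full-search block match over a small window, as described above, might look like this (tiny frames and a 4 × 4 block keep the sketch short; real coders search much larger areas, with fast strategies such as hierarchical matching, and usually use 16 × 16 macroblocks):

```python
def sad(cur, ref, bx, by, dx, dy, n=4):
    """Sum of absolute differences between the current block and a
    candidate block in the reference frame shifted by (dx, dy)."""
    return sum(abs(cur[bx + i][by + j] - ref[bx + dx + i][by + dy + j])
               for i in range(n) for j in range(n))

def best_offset(cur, ref, bx, by, search=2, n=4):
    """Full-search block matching over a +/-search window: the offset
    with the minimum matching error becomes the motion vector."""
    candidates = [(dx, dy) for dx in range(-search, search + 1)
                           for dy in range(-search, search + 1)]
    return min(candidates, key=lambda d: sad(cur, ref, bx, by, d[0], d[1], n))

# Reference frame with a bright 4x4 patch; the current frame holds the
# same patch shifted down one row and right two columns.
W = 12
ref = [[0] * W for _ in range(W)]
cur = [[0] * W for _ in range(W)]
for i in range(4):
    for j in range(4):
        ref[4 + i][4 + j] = 200
        cur[5 + i][6 + j] = 200

print(best_offset(cur, ref, 5, 6))   # vector pointing back to the patch
```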
`
Bi-directional prediction (see Fig. 10 (overleaf)) consists of forming a prediction from both the previous frame and the following frame, by a linear combination of these, shifted according to suitable motion estimates.
`
`Bi-directional prediction is particularly useful where
`motion uncovers areas of detail; although, to enable
`backward prediction from a future frame, the coder re-
`orders the pictures so they are transmitted in a different
`
Fig. 8 - Motion-compensated interframe prediction. (A 16 × 16 macroblock in frame n (current) is predicted from a macroblock in frame n-1 (previous), displaced by a vector offset from its position on the macroblock grid.)

Fig. 9 - Principle of block-matching motion estimation. (The search block (macroblock) from the current frame is the reference; it is moved around a search area in the other frame to find the best match.)
`
`order from the displayed order. This process, and the
`reordering to the correct display order in the decoder,
`introduces considerable end-to-end processing delay
`which may be a problem in some applications. To
`overcome this, MPEG defines a profile (see below)
`which does not use bi-directional prediction.
`
Whereas the basic coding unit for spatial redundancy reduction in MPEG is the 8 × 8 block, motion compensation in MPEG is usually based on a 16-pixel by 16-line macroblock. The size of the macroblock is a trade-off between the need to minimise the bit-rate needed to transmit the motion representation (known as ‘motion vectors’) to the decoder, which argues for a large macroblock size, and the need to vary the prediction process locally with the picture content and movement, which argues for a small macroblock size.
`
`To minimise the bit-rate needed to transmit the motion
`vectors, they are differentially encoded with reference
`to previous motion vectors. The motion vector value
`‘prediction error’ is then variable-length coded using
`another VLC table.
`
Fig. 11 shows a conceptual motion-compensated interframe DCT coder in which, for simplicity, the motion-compensated prediction is illustrated by the use of a variable delay. In practical implementations, of course, the motion-compensated prediction is implemented in other ways.
`
Fig. 10 - Motion-compensated bi-directional prediction. (A macroblock in frame n (current) is predicted from vector-offset macroblocks in both frame n-1 (previous) and frame n+1 (next).)

Fig. 11 - Motion-compensated interframe prediction DCT coder. (As Fig. 7, but with the frame delay replaced by a motion-compensation unit, shown for simplicity as a variable delay controlled by displacement vectors.)

3.7 Prediction modes

In an MPEG-2 coder, the motion-compensated predictor supports many methods for generating a prediction.
For example, a macroblock may be ‘forward predicted’ from a past picture (as in P-pictures), ‘backward predicted’ from a future picture, or ‘interpolated’ by averaging a forward and backward prediction (as in B-pictures). Another option is to make a zero-value prediction, such that the source image block, rather than the prediction-error block, is DCT-coded. Such macroblocks are known as ‘intra’ or I-coded. Intra macroblocks can carry motion vector information, although no prediction information is needed. The motion vector information for an I-macroblock is not used in normal circumstances; its function is to provide a means of concealing decoding errors when data errors in the bit-stream make it impossible to decode the data for that macroblock.
`
`Field or frame prediction coding may be used. Fields
`of a frame may be predicted separately from their own
`motion vector (field prediction coding), or together
`using a common motion vector (frame prediction cod-
`ing). Generally, for image sequences where the motion
`is slow, frame prediction coding is more efficient.
`However, when motion speed increases, field predic-
`tion coding becomes more efficient.
`
`In addition to the two basic modes of field and frame
`prediction, two further modes have been defined:
`
`16 × 8 motion compensation uses at least two motion
`vectors for each macroblock: one vector is used for the
`upper 16 × 8 region and one for the lower half. (In the
`case of B-pictures (see below) a total of four motion
`vectors are used for each macroblock in this mode,
`since both the upper and lower regions may each have
`motion vectors referring to past and future pictures.)
`This mode is permitted only in field-structured pic-
`tures and, in such cases, is intended to allow the spatial
`area that is covered by each motion vector to be ap-
`proximately equal to that of a 16 × 16 macroblock in a
`frame-structured picture.
`
`Dual prime mode may be used in both field- and
`frame-structured coding but is only permitted in P-
`pictures
`(see below) when there have been no
`B-pictures between the P-picture and its reference
`frame. In this case, a motion vector and a differential
`offset motion vector are transmitted. For field pictures,
`two motion vectors are derived from this data and are
`used to form two predictions from two reference fields.
`These two predictions are combined to form the final
`prediction. For frame pictures, this process is repeated
`for each of the two fields; each field is predicted sepa-
`rately, giving rise to a total of four field predictions
`which are combined to form the final two predictions.
`Dual prime mode is used as an alternative to bi-direc-
`tional prediction, where low delay is required; it avoids
`the frame re-ordering needed for bi-directional predic-
`tion but achieves similar coding efficiency.
`
`For each macroblock to be coded, the coder chooses
`between these prediction modes, trying to minimise the
`distortions on the decoded picture within the con-
`straints of the available channel bit-rate. The choice of
`prediction mode is transmitted to the decoder, together
`with the prediction error, so that it can regenerate the
`correct prediction.
`
Fig. 12 (overleaf) illustrates how a bi-directionally coded macroblock (a ‘B’ macroblock) is decoded. The switches illustrate the various prediction modes available for such a macroblock. Note that the coder has the option not to code some macroblocks; no DCT coefficient information is transmitted for those macroblocks, and the macroblock address skips to the next coded macroblock. The decoder output for the uncoded macroblocks simply comprises the predictor output.
`
`3.8 Picture types
`
In MPEG-2, three ‘picture types’ are defined (see Fig. 13 (overleaf)). The picture type defines which prediction modes may be used to code each macroblock.
`
`3.8.1 Intra pictures
`
`Intra pictures (I-pictures) are coded without reference
`to other pictures. Moderate compression is achieved by
`reducing spatial redundancy but not temporal redun-
`dancy. They are important as they provide access
`points in the bit-stream where decoding can begin
`without reference to previous pictures.
`
`3.8.2 Predictive pictures
`
`Predictive pictures (P-pictures) are coded using mo-
`tion-compensated prediction from a past I- or P-picture
`and may be used as a reference for further prediction.
`By reducing spatial and temporal redundancy, P-pictures
`offer increased compression compared to I-pictures.
`
`
`
`
Fig. 12 - Decoding a ‘B’ macroblock. (The VLC decoder recovers motion vectors and quantised coefficients from the input buffer; after inverse quantisation and inverse DCT, the difference picture is added to a motion-compensated prediction switched between forward (previous I- or P-picture store), backward (future I- or P-picture store), interpolated, or no prediction; macroblocks that were not coded take the prediction straight to the display buffer.)
`
`3.8.3 Bi-directionally-predictive pictures
`Bi-directional-predictive pictures (B-pictures) use
`both past and future I- or P-pictures for motion com-
`pensation, and offer the highest degree of compression.
`As noted above, to enable backward prediction from a
`future frame, the coder re-orders the pictures from
`natural display order to ‘transmission’ (or ‘bitstream’)
`order so that the B-picture is transmitted after the past
`and future pictures which it references. (See Fig. 14).
`This introduces a delay which depends upon the
`number of consecutive B-pictures.
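The display-to-transmission re-ordering can be sketched as follows (simplified: it assumes each B-picture's future reference is the next I- or P-picture in the same sequence, and the picture labels are invented):

```python
def transmission_order(display):
    """Re-order a GOP from display order to bitstream order: each
    B-picture is sent after both reference pictures it depends on."""
    out, pending_b = [], []
    for pic in display:
        if pic[0] == 'B':
            pending_b.append(pic)   # hold until the next reference arrives
        else:                       # I or P: transmit it, then the held B's
            out.append(pic)
            out.extend(pending_b)
            pending_b = []
    return out + pending_b

gop = ['I1', 'B2', 'B3', 'P4', 'B5', 'B6', 'P7']
print(transmission_order(gop))   # ['I1', 'P4', 'B2', 'B3', 'P7', 'B5', 'B6']
```

The decoder performs the mirror-image re-ordering back to display order; the two consecutive B-pictures here are what introduce the delay mentioned above.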
`
`3.8.4 Group of pictures
`The different picture types typically occur in a repeat-
`ing sequence termed a Group of Pictures or GOP. A
`typical GOP is illustrated in display order in Fig. 14(a),
`and in transmis