BBC RD 1996/3

Research and
Development
Report

MPEG VIDEO CODING:
A basic tutorial introduction

S.R. Ely, Ph.D., C.Eng., M.I.E.E.

Research & Development Department
Policy & Planning Directorate
THE BRITISH BROADCASTING CORPORATION

IPR2018-00534
Sony EX1027 Page 1
BBC RD 1996/3

MPEG VIDEO CODING: A basic tutorial introduction*

S.R. Ely, Ph.D., C.Eng., M.I.E.E.

Summary

MPEG has been outstandingly successful in defining standards for video compression coding, serving a wide range of applications, bit-rates, qualities and services on a worldwide basis. The standards are based upon a flexible toolkit of techniques for bit-rate reduction. MPEG video coding uses a combination of motion-compensated interframe prediction (for reducing temporal redundancy) with Discrete Cosine Transform (DCT) and variable length coding tools (for reducing spatial redundancy). The specification defines only the bitstream syntax and decoding process: the coding process is not specified, and the performance of a coder will vary depending upon, for example, the quality of the motion-vector measurement and the processes used for prediction-mode selection.

* This Report is based on a European Broadcasting Union (EBU) Review item, from the 'EBU Review, Winter 1995 (Ed. 266)', and is published with permission. It was originally written by BBC R&D Manager, Dr. S.R. Ely, and entitled: 'MPEG video – A simple introduction'.

Issued under the Authority of

Research & Development Department
Policy & Planning Directorate
BRITISH BROADCASTING CORPORATION

General Manager
Research & Development Department

(R021)

1996
© British Broadcasting Corporation

No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, or otherwise, without prior permission.
MPEG VIDEO CODING: A basic tutorial introduction

S.R. Ely, Ph.D., C.Eng., M.I.E.E.

1. INTRODUCTION
2. VIDEO CODING PRINCIPLES
3. MPEG VIDEO COMPRESSION TOOLKIT
   3.1 Discrete cosine transform
   3.2 Coefficient quantisation
   3.3 Zig-zag coefficient scanning, run-length coding, and variable length coding
   3.4 Buffering and feedback
   3.5 Reduction of temporal redundancy: interframe prediction
   3.6 Motion-compensated interframe prediction
   3.7 Prediction modes
   3.8 Picture types
       3.8.1 Intra pictures
       3.8.2 Predictive pictures
       3.8.3 Bi-directionally-predictive pictures
       3.8.4 Group of pictures
4. MPEG PROFILES AND LEVELS
5. CONCLUSIONS
6. ACKNOWLEDGEMENTS
7. REFERENCES
MPEG VIDEO CODING: A basic tutorial introduction

S.R. Ely, Ph.D., C.Eng., M.I.E.E.

1. INTRODUCTION

MPEG (Moving Pictures Expert Group) started in 1988 as a Working Group of the International Standards Organisation (ISO)1 with the aim of defining standards for digital compression of video and audio signals. It took as its basis the ITU-T standard for video-conferencing and video-telephony*, together with that of JPEG (Joint Photographic Experts Group), which was initially developed for compressing still images such as electronic photography.

The first goal of MPEG was to define a video coding algorithm for digital storage media; in particular, for the CD-ROM. The resulting standard was published in 1993.** It comprises three parts, covering:

• systems aspects (including multiplexing and synchronisation)2,
• video coding,
• audio coding.3

It has been applied in the Interactive CD (CD-i) system to provide full-motion video playback from CD, and is widely used in PC applications, for which a range of hardware and software coders and decoders are available. This standard is known as MPEG-1 and is restricted to non-interlaced video formats; it is primarily intended to support video coding at bit-rates up to about 1.5 Mbit/s.

In 1990, MPEG began work on a second standard, to be capable of coding interlaced pictures directly; originally to support high-quality applications at bit-rates in the range of about 5 to 10 Mbit/s. MPEG-2,4 as it is now known, also supports high-definition formats at bit-rates in the range of about 15 to 30 Mbit/s. As for MPEG-1, the MPEG-2 standard (published in 1994***) comprises three parts: systems, video and audio.

It is important to note that the MPEG standards specify only the syntax and semantics of the bit-streams and the decoding process; they do not specify the encoding process. Much of the latter is left to the discretion of the coder designers, and this gives scope for improvement as coding techniques are refined and new techniques developed.

* This is now known as Working Group H261.
** As ISO/IEC 11172.
*** As ISO/IEC 13818.

2. VIDEO CODING PRINCIPLES

A studio-quality 625-line component picture, when digitised according to ITU Recommendation 601/656 (i.e. 4:2:2 sampling), requires 216 Mbit/s to convey the luminance and two chrominance sample components (see Fig. 1). For bandwidth-restricted media (such as terrestrial or satellite channels), some way of reducing the very high bit-rate needed to represent the digitised picture must be found.

A video bit-rate reduction system5 (for producing compression) operates by removing redundant and less important information from the signal prior to transmission, and then reconstructing an approximation of the image from the remaining (compressed) information at the decoder. In video signals, three distinct kinds of redundancy can be identified:

• Spatial and temporal redundancy: pixel values are not independent, but are correlated with their neighbours, both within the same frame and across frames. So, to some extent, the value of a pixel is predictable, given the values of neighbouring pixels.

• Entropy redundancy: for any non-random digitised signal, some code values occur more frequently than others. This can be exploited by coding the more frequently-occurring values with shorter codes than the rarer values. The same principle has long been exploited in the Morse code, where the commonest letters in English ('E' and 'T') are represented by one dot and one dash respectively, whereas the rarest ('X', 'Y' and 'Z') are represented by four dots and dashes.

• Psycho-visual redundancy: this form of redundancy results from the way the eye and brain work. In audio, the limited frequency response of the ear is well appreciated. In video, both the limit to the fine detail which the eye can resolve (limits of spatial resolution) and the limits in the ability to track fast-moving images (limits of temporal resolution) must be considered. The latter means, for example, that a shot-change masks fine detail on either side of the change.

Fig. 1 - 4:2:2 sampling. (R, G and B pass through an RGB-to-YUV matrix giving Y at 5.75 MHz and CB, CR at 2.75 MHz; 8-bit ADCs sample Y at 13.5 MHz and CB, CR at 6.75 MHz, so that Y = 8 × 13.5 = 108 Mbit/s and CB = CR = 8 × 6.75 = 54 Mbit/s, a total of 216 Mbit/s.)
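The predictability of neighbouring pixel values can be seen in a minimal sketch (illustrative only, not from the report): predicting each pixel from its left-hand neighbour turns a smooth luminance ramp into a string of small residuals, which cost fewer bits to code than the raw values.

```python
# Illustrative sketch: neighbouring pixels are correlated, so a simple
# "predict from the left neighbour" scheme leaves mostly small residuals.

def left_predict_residuals(row):
    """Residual for each pixel after predicting it from its left neighbour."""
    residuals = [row[0]]  # first pixel has no neighbour: send it as-is
    for i in range(1, len(row)):
        residuals.append(row[i] - row[i - 1])
    return residuals

# A smooth luminance ramp, typical of natural images.
row = [100, 101, 103, 104, 104, 106, 107, 109]
print(left_predict_residuals(row))  # → [100, 1, 2, 1, 0, 2, 1, 2]
```

After the first sample, the residuals span a far smaller range of values than the raw pixels, which is exactly the property the entropy-coding tools described later exploit.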
3. MPEG VIDEO COMPRESSION TOOLKIT

Sample-rate reduction is a very effective method of reducing bit-rate but, of course, introduces an irreversible loss of resolution. For very low bit-rate applications (e.g. in MPEG-1), alternate fields are discarded and the horizontal sampling-rate is reduced to around 360 pixels per line (giving about 3.3 MHz resolution). The sample rate for the chrominance is half that of the luminance, both horizontally and vertically. In this way, the bit-rate can be reduced to less than one fifth of that of a conventional-definition (4:2:2-sampled) signal.

For 'broadcast quality', at bit-rates in the range 3 to 10 Mbit/s, horizontal sample-rate reduction is not advisable for the luminance or chrominance signals, nor is temporal sub-sampling. However, for distribution and broadcast applications, sufficient chrominance resolution can be provided with the vertical chrominance sampling frequency halved. Thus, for most MPEG-2 coding applications, 4:2:0 sampling is likely to be used rather than 4:2:2, although the latter, and 4:4:4 sampling, are also supported. It may be of interest to note that a conventional delay-line PAL decoder effectively yields the same vertical sub-sampling of the chrominance signals as 4:2:0 sampling.

Apart from sample-rate reduction, the MPEG toolkit includes two different kinds of tools to exploit redundancy in images:

• The Discrete Cosine Transform (DCT)6, 7 is similar to the Discrete Fourier Transform (DFT). The purpose of using this orthogonal transform is to assist the processing to remove spatial redundancy by concentrating the signal energy into relatively few coefficients.

• Motion-compensated interframe prediction is used to remove temporal redundancy. It is based on techniques similar to the well-known differential pulse-code modulation (DPCM) principle.
3.1 Discrete cosine transform

Consider the luminance signal of a 4:2:0-sampled digitised 625-line picture comprising about 704 pixels horizontally and about 576 lines vertically (see Fig. 2). In MPEG coding, spatial redundancy is removed by processing the digitised signals in two-dimensional blocks of 8 pixels by 8 lines (taken from either one field or two, depending on the mode of operation).

As Fig. 3 illustrates, the DCT is a reversible process which maps between the normal two-dimensional presentation of the image and one which represents the same information in what may be thought of as the 'frequency' domain. Each coefficient in the 8 × 8 DCT-domain block indicates the contribution of a different DCT 'basis' function. The coefficient at the top-left (top-left in Fig. 3) is called the DC coefficient, and may be thought of as representing the average brightness of the block. Moving down the block in Fig. 3, the coefficients represent increasing vertical frequencies; and moving along the block, from left to right, represents increasing horizontal frequencies.

The DCT does not directly reduce the number of bits required to represent the block. In fact, for an 8 × 8 image block of 8-bit pixels, the DCT produces an 8 × 8 block of at least 11-bit DCT coefficients to allow for reversibility! The reduction in the number of bits follows from the fact that, for typical blocks of natural images, the distribution of coefficients is non-uniform: the transform tends to concentrate the energy into the low-frequency coefficients, and many of the other coefficients are near zero. The bit-rate reduction is achieved by not transmitting the near-zero coefficients, and by quantising and coding the remaining coefficients as described below. The non-uniform coefficient distribution is a result of the spatial redundancy present in the original image block.

Many different forms of transformation have been investigated for bit-rate reduction. The best transforms are those which tend to concentrate the energy of a picture block into a few coefficients. The DCT is one of the best transforms in this respect and has the advantage that the DCT and its inverse are easy to implement in digital processing. The choice of 8 × 8 block size is a trade-off between the need to use a large picture area for the transform, so that the energy compaction described above is most efficient, and the fact that the content and movement of the picture varies spatially, which would tend to argue for a smaller block size. A large block size would also emphasise variations from block to block in the decoded picture, as well as the effects of 'windowing' by the block structure.

Fig. 2 - Block-based DCT. (The 704-pixel × 576-line picture is divided into 88 × 72 blocks of 8 × 8 samples.)

Fig. 3 - DCT transform pairs. (An 8 × 8 pixel block maps, via the DCT and inverse DCT, to an 8 × 8 block of coefficients; the d.c. coefficient is at top-left, with horizontal frequencies increasing in cycles per picture width from left to right, and vertical frequencies increasing in cycles per picture height from top to bottom.)
3.2 Coefficient quantisation

After a block has been transformed, the transform coefficients are quantised. Different quantisation is applied to each coefficient depending on the spatial frequency within the block that it represents. The objective is to minimise the number of bits which must be transmitted to the decoder so that it can perform the inverse transform and reconstruct the image: reduced quantisation accuracy reduces the number of bits which need to be transmitted to represent a given DCT coefficient, but increases the possible quantisation error for that coefficient. Note that the quantisation noise introduced by the coder is not reversible in the decoder, so the coding and decoding process is 'lossy'.

More quantisation error can be tolerated in the high-frequency coefficients, because high-frequency quantisation noise is less visible than low-frequency quantisation noise. Also, quantisation noise is less visible in the chrominance components than in the luminance component. MPEG uses weighting matrices to define the relative accuracy of the quantisation of the different coefficients. Different weighting matrices can be used for different frames, depending on the prediction mode used.

The weighted coefficients are then passed through a fixed quantisation law, which is usually linear. However, for some prediction modes there is an increased threshold level (i.e. a dead zone) around zero. The effect of this threshold is to maximise the number of coefficients which are quantised to zero. In practice, it is found that small deviations around zero are usually caused by noise in the signal, so suppressing these values actually gives an apparent improvement in the subjective picture quality.

Quantisation noise is more visible in some blocks than in others – for example, in blocks which contain a high-contrast edge between two plain areas. In such blocks, the quantisation parameters can be modified to limit the maximum quantisation error, particularly in the high-frequency coefficients.
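The weighting-plus-dead-zone idea can be sketched as follows; the weights, step size and coefficient values here are invented for illustration and are not the weighting matrices defined in the MPEG standard:

```python
# Illustrative quantiser: each coefficient is divided by its frequency-
# dependent weight and a common step size; values inside the dead zone
# are forced to zero. Weights and step are hypothetical.

def quantise(coeffs, weights, step, dead_zone=0.0):
    out = []
    for c, w in zip(coeffs, weights):
        scaled = c / (w * step)
        # widen the zero bin by the dead-zone margin
        out.append(0 if abs(scaled) < 0.5 + dead_zone else round(scaled))
    return out

coeffs  = [224.0, -145.8, 3.1, -1.2, 0.4, 0.2, -0.1, 0.05]
weights = [1, 1, 2, 2, 4, 4, 8, 8]   # coarser at higher frequencies
print(quantise(coeffs, weights, step=2.0))  # → [112, -73, 1, 0, 0, 0, 0, 0]
```

Coarser weighting of the high frequencies drives the small high-frequency coefficients to zero, setting up the long zero runs that the scanning and run-length tools of the next section exploit.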
3.3 Zig-zag coefficient scanning, run-length coding, and variable length coding

After quantisation, the 8 × 8 blocks of DCT coefficients are scanned in a zig-zag pattern (see Fig. 4) to turn the 2D array into a serial string of quantised coefficients. Two scan patterns are defined. One is usually preferable for picture material which has strong vertical frequency components, due, perhaps, to the interlaced picture structure; in this scan pattern there is a bias to scan vertical coefficients first. In the other, which is preferable for pictures without a strong vertical structure, there is no bias and the scan proceeds diagonally from top left to bottom right, as illustrated in Fig. 4. The coder signals its choice of scan pattern to the decoder.

The strings of coefficients produced by the zig-zag scanning are coded by counting the number of zero coefficients preceding a non-zero coefficient; that is, they are run-length coded. The run-length value and the value of the non-zero coefficient which the run of zeros precedes are then combined and coded using a variable length code (VLC). This VLC coding exploits the fact that short runs of zeros are more likely than long ones, and small coefficients are more likely than large ones. The VLC allocates codes of different lengths depending upon the expected frequency of occurrence of each zero-run-length/non-zero-coefficient combination: common combinations use short code words, less common combinations long code words. All other combinations are coded by the combination of an escape code and two fixed length codes: one 6-bit word to indicate the run length and one 12-bit word to indicate the coefficient value.

Fig. 4 - Scanning of DCT blocks and run-length coding with variable length codes (entropy coding). (Zig-zag scanning starts at the d.c. coefficient and works through increasing horizontal and vertical frequencies; each run of zeros and the amplitude of the following DCT coefficient are given one variable length code (VLC) (Huffman code).)
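The zig-zag scan and the run/amplitude pairing can be sketched as follows (illustrative only; the actual VLC tables and the alternate scan are defined in the standard):

```python
# Sketch of the classic (non-alternate) zig-zag scan and the run-length
# pairing of zero runs with the next non-zero coefficient.

def zigzag_order(n=8):
    """Scan positions diagonally from top-left to bottom-right."""
    order = []
    for d in range(2 * n - 1):
        cells = [(y, d - y) for y in range(n) if 0 <= d - y < n]
        # alternate the direction of travel along each diagonal
        order.extend(cells if d % 2 else reversed(cells))
    return order

def run_length_pairs(block):
    """(zero-run, value) pairs for the non-zero coefficients."""
    pairs, run = [], 0
    for y, x in zigzag_order():
        v = block[y][x]
        if v == 0:
            run += 1
        else:
            pairs.append((run, v))
            run = 0
    return pairs

block = [[0] * 8 for _ in range(8)]
block[0][0], block[0][1], block[1][0], block[2][1] = 112, -73, 5, 2
print(run_length_pairs(block))  # → [(0, 112), (0, -73), (0, 5), (5, 2)]
```

Because the scan visits low frequencies first, the few significant coefficients cluster at the start and the trailing zeros collapse into runs, leaving only a handful of (run, value) pairs for the VLC to code.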
One VLC code table is used in most circumstances. However, a second VLC code table is used for some special pictures, and the DC coefficient is treated differently in some modes. All the VLCs are designed such that no complete codeword is the prefix of any other codeword – they are similar to the well-known Huffman code. Thus, the decoder can identify where one variable length codeword ends and another starts when operating within the correct codebook. No VLC, or combination of codes, is allowed to produce a sequence of 23 contiguous zeros; this combination is reserved for synchronisation purposes.

DC coefficients within blocks in intra macroblocks (see below) are differentially encoded before variable length coding.

3.4 Buffering and feedback

The DCT coefficient quantisation, run-length, and VLC coding processes produce a varying bit-rate which depends upon the complexity of the picture information and the amount and type of motion in the picture. To produce the constant bit-rate needed for transmission over a fixed bit-rate system, a buffer is needed to smooth out the variations in bit-rate. To prevent overflow or underflow of this buffer, its occupancy is monitored and feedback applied to the coding processes to control the input to the buffer. The DCT quantisation process is often used to provide direct control of the buffer's input: as the buffer becomes full, the quantiser is made coarser to reduce the number of bits used to code each DCT coefficient; and as the buffer empties, the DCT quantisation is made finer. Other means of controlling the buffer occupancy may be used as well as, or instead of, control of the DCT coefficient quantisation.

Fig. 5 shows a block diagram of a basic DCT codec with, in this example, the buffer occupancy controlled by feedback to the DCT coefficient quantisation.

It is important to note that the final bit-rate at the output of an MPEG video encoder can be freely varied. If the output bit-rate is reduced, the buffer will empty more slowly and the coder will automatically compensate by, for example, making the DCT coefficient quantisation coarser. Clearly, reducing the output bit-rate reduces the quality of the decoded pictures. Hence, to squeeze more TV channels into an r.f. channel (e.g. by transmitting VHS quality instead of standard broadcast quality), the output bit-rate of the MPEG video encoder can readily be reduced to meet this requirement. Conversely, an HDTV channel would demand a much higher bit-rate and, hence, greater r.f. channel space. There is, therefore, no need to lock input sampling rates to channel bit-rates or vice-versa.

Fig. 5 - Basic DCT coder. (Line-scan to block-scan conversion of Y, CB or CR feeds the DCT; the coefficients pass through the quantiser (Q), zig-zag scan, run-length coder and VLC into the output buffer, whose occupancy controls the quantisation.)
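The feedback loop described above can be caricatured with a toy simulation; the thresholds, step limits and frame sizes below are invented for illustration and bear no relation to any real coder's rate control:

```python
# Toy rate-control loop: the quantiser step rises as the buffer fills
# and falls as it drains, while the channel removes bits at a fixed rate.
# All parameters are hypothetical.

def simulate(frame_bits, channel_rate, capacity=100_000):
    occupancy, step, history = capacity // 2, 16, []
    for bits in frame_bits:
        occupancy += bits // step      # coarser step -> fewer coded bits
        occupancy -= channel_rate      # channel drains at a constant rate
        occupancy = max(0, min(capacity, occupancy))
        fullness = occupancy / capacity
        if fullness > 0.6:
            step = min(62, step + 2)   # buffer filling: quantise coarser
        elif fullness < 0.4:
            step = max(2, step - 2)    # buffer draining: quantise finer
        history.append((occupancy, step))
    return history

# Complex frames (many raw bits) followed by simple ones.
frames = [900_000] * 5 + [200_000] * 5
for occupancy, step in simulate(frames, channel_rate=40_000):
    print(occupancy, step)
```

During the complex frames the occupancy climbs and the step is driven up; once the simple frames arrive the buffer drains and the step relaxes again, which is the self-regulating behaviour the text describes.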
3.5 Reduction of temporal redundancy: interframe prediction

In order to exploit the fact that pictures often change little from one frame to the next, MPEG includes temporal prediction modes; that is, an attempt is made to predict one frame, for coding, from a previous 'reference' frame.

Fig. 6 illustrates a basic Differential Pulse Code Modulation (DPCM) coder, in which only the differences between the input and a prediction based on the previous, locally-decoded output are quantised and transmitted. Note that the prediction cannot be based on previous source pictures, because the prediction has to be repeatable in the decoder, where the source pictures are not available. Consequently, the coder contains a local decoder which reconstructs pictures exactly as they would be in the destination decoder. The locally-decoded output then forms the input to the predictor. In interframe prediction, samples from a 'reference' frame are used in the prediction of samples in other frames.

In MPEG coding, interframe prediction (which reduces temporal redundancy) is combined with the DCT and variable length coding tools described above (which reduce spatial redundancy) – see Fig. 7. The coder subtracts the prediction from the input to form a 'prediction error' picture. The prediction error is transformed with the DCT, the coefficients are quantised, and these quantised values are coded using a VLC.

The simplest interframe prediction is to predict a block of samples from the same spatially-positioned (co-sited) block in the reference frame. In this case, the 'predictor' would simply comprise a delay of exactly one frame, as shown in Fig. 7. This makes a good prediction for stationary regions of the image but is poor in moving areas.

Fig. 6 - Basic DPCM coder. (The prediction is subtracted from the input; the difference is quantised (Q) and sent to the channel as the quantised prediction error, while the predictor operates on the locally-decoded output.)

Fig. 7 - DCT with interframe prediction coder. (As Fig. 6, but the prediction error is transformed by the DCT before quantisation; the locally-decoded prediction error from the inverse DCT is added to the prediction, and the predictor is a one-frame delay.)
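The DPCM loop of Fig. 6 can be sketched in a few lines; the step size and signal here are invented for illustration:

```python
# Minimal DPCM sketch: the coder quantises only the prediction error and
# predicts from its own locally-decoded output, so the decoder (which
# repeats the same loop) stays exactly in step with the coder.

def dpcm_encode(samples, step=4):
    prediction, out = 0, []
    for s in samples:
        err = round((s - prediction) / step) * step  # quantised error only
        out.append(err)
        prediction += err   # local decode keeps coder and decoder in step
    return out

def dpcm_decode(errors):
    value, out = 0, []
    for err in errors:
        value += err
        out.append(value)
    return out

signal = [10, 12, 15, 15, 14, 60, 61, 59]
sent = dpcm_encode(signal)
print(sent)                 # mostly small values, cheap to entropy-code
print(dpcm_decode(sent))    # tracks the input to within the quantiser step
```

Note that the encoder predicts from its *reconstructed* values, not the source samples; this is the point made in the text about the prediction having to be repeatable in the decoder, and it stops quantisation error from accumulating.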
3.6 Motion-compensated interframe prediction

A more sophisticated prediction method, known as motion-compensated interframe prediction, is to offset any translational motion which has occurred between the block being coded and the reference frame, and to use a shifted block from the reference frame as the prediction (see Fig. 8).

One method of determining the motion that has occurred between the block being coded and the reference frame is a 'block-matching' search, in which a large number of trial offsets are tested in the coder (see Fig. 9). The 'best' offset is selected on the basis of a measurement of the minimum error between the block being coded and the prediction. Since MPEG defines only the decoding process, not the encoding, the choice of motion-measurement algorithm is left to the designer of the coder; it is an area where considerable differences in performance occur between different algorithms and different implementations. A major requirement is to have a search area large enough to cover any motion that is present from frame to frame. However, increasing the size of the search area greatly increases the processing needed to find the best match – various techniques, such as 'hierarchical block matching', are used to try to overcome this dilemma.

Bi-directional prediction (see Fig. 10) consists of forming a prediction from both the previous frame and the following frame, by a linear combination of these, shifted according to suitable motion estimates.

Bi-directional prediction is particularly useful where motion uncovers areas of detail; although, to enable backward prediction from a future frame, the coder re-orders the pictures so they are transmitted in a different
order from the displayed order. This process, and the reordering to the correct display order in the decoder, introduces considerable end-to-end processing delay, which may be a problem in some applications. To overcome this, MPEG defines a profile (see below) which does not use bi-directional prediction.

Fig. 8 - Motion-compensated interframe prediction. (A 16 × 16 macroblock in frame n (current) is predicted from a macroblock in frame n-1 (previous), displaced from the co-sited position on the macroblock grid by a vector offset.)

Fig. 9 - Principle of block-matching motion estimation. (The search block (macroblock) from the current frame is the reference; it is moved around a search area in the other frame to find the best match.)

Whereas the basic coding unit for spatial redundancy reduction in MPEG is based on an 8 × 8 block, motion compensation in MPEG is usually based on a 16-pixel by 16-line macroblock. The size of the macroblock is a trade-off between the need to minimise the bit-rate needed to transmit the motion representation (known as 'motion vectors') to the decoder, which argues for a large macroblock size, and the need to vary the prediction process locally within the picture content and movement, which argues for a small macroblock size.

To minimise the bit-rate needed to transmit the motion vectors, they are differentially encoded with reference to previous motion vectors. The motion-vector 'prediction error' is then variable-length coded using another VLC table.

Fig. 11 shows a conceptual motion-compensated interframe DCT coder in which, for simplicity, the implementation of the process of motion-compensated prediction is illustrated by the use of a variable delay. In practical implementations, of course, the motion-compensated prediction is implemented in other ways.
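The full-search block matching of Fig. 9 can be sketched as follows; the frame size, search range and 4 × 4 block size are scaled down for illustration (MPEG motion compensation normally uses 16 × 16 macroblocks):

```python
# Illustrative full-search block matching: try every offset in a small
# window and keep the one with the lowest sum of absolute differences
# (SAD). Real coders use smarter searches, e.g. hierarchical matching.

def sad(cur, ref, oy, ox, y0, x0, n=4):
    return sum(abs(cur[y0 + y][x0 + x] - ref[y0 + oy + y][x0 + ox + x])
               for y in range(n) for x in range(n))

def best_offset(cur, ref, y0, x0, search=2, n=4):
    candidates = []
    for oy in range(-search, search + 1):
        for ox in range(-search, search + 1):
            if (0 <= y0 + oy and y0 + oy + n <= len(ref)
                    and 0 <= x0 + ox and x0 + ox + n <= len(ref[0])):
                candidates.append((sad(cur, ref, oy, ox, y0, x0, n), (oy, ox)))
    return min(candidates)[1]   # offset with the minimum error

# Reference frame with a bright 4x4 patch; in the current frame the
# patch has moved one pixel right and one pixel down.
ref = [[0] * 12 for _ in range(12)]
cur = [[0] * 12 for _ in range(12)]
for y in range(4, 8):
    for x in range(4, 8):
        ref[y][x] = 200
        cur[y + 1][x + 1] = 200

print(best_offset(cur, ref, y0=5, x0=5))  # → (-1, -1)
```

The exhaustive loop makes the cost trade-off in the text concrete: the work grows with the square of the search range, which is why larger search areas are so expensive.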
Fig. 10 - Motion-compensated bi-directional prediction. (The macroblock in frame n (current) is predicted from vector-offset macroblocks in both frame n-1 (previous) and frame n+1 (next), combined bi-directionally.)

Fig. 11 - Motion-compensated interframe prediction DCT coder. (As Fig. 7, but the fixed frame delay is replaced by a motion-compensation unit: a variable delay whose delay control is derived from the displacement vectors.)

3.7 Prediction modes

In an MPEG-2 coder, the motion-compensated predictor supports many methods for generating a prediction. For example, a macroblock may be 'forward predicted' from a past picture, 'backward predicted' from a future picture (P coded), or 'interpolated' by averaging a forward and backward prediction (B coded).
Another option is to make a zero-value prediction, such that the source image block, rather than the prediction error block, is DCT-coded. Such macroblocks are known as 'intra' or I coded. Intra macroblocks can carry motion-vector information, although no prediction information is needed. The motion-vector information for an I-macroblock is not used in normal circumstances; its function is to provide a means of concealing decoding errors when data errors in the bit-stream make it impossible to decode the data for that macroblock.

Field or frame prediction coding may be used. Fields of a frame may be predicted separately from their own motion vector (field prediction coding), or together using a common motion vector (frame prediction coding). Generally, for image sequences where the motion is slow, frame prediction coding is more efficient. However, as motion speed increases, field prediction coding becomes more efficient.

In addition to the two basic modes of field and frame prediction, two further modes have been defined.

16 × 8 motion compensation uses at least two motion vectors for each macroblock: one vector for the upper 16 × 8 region and one for the lower half. (In the case of B-pictures (see below), a total of four motion vectors is used for each macroblock in this mode, since both the upper and lower regions may each have motion vectors referring to past and future pictures.) This mode is permitted only in field-structured pictures and, in such cases, is intended to allow the spatial area covered by each motion vector to be approximately equal to that of a 16 × 16 macroblock in a frame-structured picture.

Dual prime mode may be used in both field- and frame-structured coding, but is only permitted in P-pictures (see below) when there have been no B-pictures between the P-picture and its reference frame. In this case, a motion vector and a differential offset motion vector are transmitted. For field pictures, two motion vectors are derived from this data and are used to form two predictions from two reference fields; these two predictions are combined to form the final prediction. For frame pictures, this process is repeated for each of the two fields: each field is predicted separately, giving rise to a total of four field predictions, which are combined to form the final two predictions. Dual prime mode is used as an alternative to bi-directional prediction where low delay is required; it avoids the frame re-ordering needed for bi-directional prediction but achieves similar coding efficiency.

For each macroblock to be coded, the coder chooses between these prediction modes, trying to minimise the distortions on the decoded picture within the constraints of the available channel bit-rate. The choice of prediction mode is transmitted to the decoder, together with the prediction error, so that it can regenerate the correct prediction.

Fig. 12 illustrates how a bi-directionally coded macroblock (a 'B' macroblock) is decoded. The switches illustrate the various prediction modes available for such a macroblock. Note that the coder has the option not to code some macroblocks: no DCT coefficient information is transmitted for those blocks, and the macroblock address counter skips to the next coded macroblock. The decoder output for the uncoded macroblocks simply comprises the predictor output.
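The coder's choice between prediction modes can be caricatured as a simple minimum-error decision; the candidate list and error measure here are illustrative, not the selection logic of any particular coder:

```python
# Illustrative mode decision: compute the error energy of each candidate
# prediction (forward, backward, interpolated, or none/intra) and pick
# the cheapest, loosely mirroring the choices described above.

def error_energy(block, prediction):
    return sum((b - p) ** 2 for b, p in zip(block, prediction))

def choose_mode(block, forward, backward):
    interpolated = [(f + b) / 2 for f, b in zip(forward, backward)]
    zero = [0] * len(block)   # 'intra': code the source block itself
    candidates = {
        'forward': forward,
        'backward': backward,
        'interpolated': interpolated,
        'intra': zero,
    }
    return min(candidates, key=lambda m: error_energy(block, candidates[m]))

block    = [50, 52, 54, 56]
forward  = [48, 50, 52, 54]   # past-frame prediction, slightly dark
backward = [52, 54, 56, 58]   # future-frame prediction, slightly bright
print(choose_mode(block, forward, backward))  # → interpolated
```

Here neither reference matches exactly, but their average does, so the interpolated mode wins; a real coder makes an analogous decision for every macroblock, under the additional constraint of the bits needed to signal each mode.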
`
`3.8 Picture types
`
`three ‘picture types’ are defined (see
`In MPEG-2,
`Fig. 13 (overleaf). The picture type defines which pre-
`diction modes may be used to code each macroblock.
`
3.8.1 Intra pictures

Intra pictures (I-pictures) are coded without reference to other pictures. Moderate compression is achieved by reducing spatial redundancy, but not temporal redundancy. They are important because they provide access points in the bit-stream where decoding can begin without reference to previous pictures.
`
3.8.2 Predictive pictures

Predictive pictures (P-pictures) are coded using motion-compensated prediction from a past I- or P-picture and may themselves be used as a reference for further prediction. By reducing both spatial and temporal redundancy, P-pictures offer increased compression compared with I-pictures.
`
[Fig. 12 is a block diagram of the ‘B’ macroblock decoder: the decoder input passes through an input buffer to a VLC decoder, whose coefficients are processed by the inverse quantiser and inverse DCT; the motion vectors drive motion-compensated forward, backward and interpolated predictors fed from the previous and future I- or P-picture stores; the selected prediction (or no prediction) is added to the decoded difference picture, or passed straight through where the difference picture is not coded, and the result goes to the display buffer as the decoder output.]

Fig. 12 - Decoding a ‘B’ macroblock.
3.8.3 Bi-directionally-predictive pictures

Bi-directionally-predictive pictures (B-pictures) use both past and future I- or P-pictures for motion compensation, and offer the highest degree of compression. As noted above, to enable backward prediction from a future frame, the coder re-orders the pictures from natural display order to ‘transmission’ (or ‘bitstream’) order, so that a B-picture is transmitted after the past and future pictures which it references (see Fig. 14). This introduces a delay which depends upon the number of consecutive B-pictures.
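The re-ordering described above can be sketched as follows. This is an illustrative implementation (the function name and picture representation are assumptions, not part of the standard): each I- or P-picture is transmitted ahead of the B-pictures that precede it in display order.

```python
# Sketch of display-to-transmission re-ordering: B-pictures depend on a
# *future* reference, so the coder transmits each I- or P-picture before
# the B-pictures that precede it in display order.

def display_to_transmission(display):
    """display: list of (picture_type, display_index) tuples in display order."""
    out, pending_b = [], []
    for pic in display:
        if pic[0] == "B":
            pending_b.append(pic)   # hold until the next reference arrives
        else:                       # I or P: a reference picture
            out.append(pic)
            out.extend(pending_b)   # the held B-pictures can now follow
            pending_b = []
    return out + pending_b

display = [("I", 0), ("B", 1), ("B", 2), ("P", 3), ("B", 4), ("B", 5), ("P", 6)]
print(display_to_transmission(display))
# [('I', 0), ('P', 3), ('B', 1), ('B', 2), ('P', 6), ('B', 4), ('B', 5)]
```

Note how the two B-pictures are only emitted once the P-picture they reference has been emitted: this buffering of consecutive B-pictures is exactly the source of the delay mentioned above.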
`
3.8.4 Group of pictures

The different picture types typically occur in a repeating sequence, termed a Group of Pictures or GOP. A typical GOP is illustrated in display order in Fig. 14(a), and in transmission …
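A typical GOP structure in display order can be generated as below. The parameters n (GOP length) and m (spacing between reference pictures) are conventional descriptive parameters used here for illustration; the standard does not mandate any particular GOP structure:

```python
# Illustrative generator for a repeating Group of Pictures in display order,
# assuming a GOP of n pictures with a reference picture (I or P) every
# m pictures, e.g. n=12, m=3.

def gop_display_order(n=12, m=3):
    types = []
    for i in range(n):
        if i == 0:
            types.append("I")       # each GOP starts with an access point
        elif i % m == 0:
            types.append("P")       # periodic forward-predicted references
        else:
            types.append("B")       # bi-directionally predicted pictures
    return "".join(types)

print(gop_display_order())  # IBBPBBPBBPBB
```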
