8.1 Introduction
This is a first definition of B pictures to be used in TML. It is mainly intended as a starting point for testing relevant coding tools. Because the definition is at an early stage, a separate description and definition of syntax elements is included in this section. In a later version of TML, B pictures are expected to be fully incorporated into the rest of the definition. The use of B pictures is indicated in PTYPE.
Temporal scalability is achieved using bi-directionally predicted pictures, or B pictures. B pictures are predicted from the previous reconstructed picture, the subsequent reconstructed picture, or both, to achieve improved coding efficiency compared with P pictures. B pictures are disposable, since they are not used as reference pictures for the prediction of any other pictures. This property allows B pictures to be discarded without impairing the ability to decode the sequence and without adversely affecting the quality of any subsequent pictures, thus providing temporal scalability. Figure 18 illustrates the predictive structure with two B pictures inserted between I/P pictures.
`
`FIGURE 18
`
`Illustration of B picture concept.
`
B pictures are located in the bitstream in data-dependence order rather than in temporal order. Pictures that depend on other pictures shall be located in the bitstream after the pictures on which they depend. For example, as illustrated in Figure 18, B2 and B3 depend on I1 and P4, and B5 and B6 depend on P4 and P7. Therefore, the bitstream order of the encoded pictures would be I1, P4, B2, B3, P7, B5, B6, and so on, whereas the display order of the decoded pictures should be I1, B2, B3, P4, B5, B6, P7, and so on. The difference between the bitstream order of encoded pictures and the display order of decoded pictures increases latency and requires additional memory to buffer the P pictures.
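The reordering rule above can be sketched in C. This is an illustrative example, not part of the TML definition: the helper name `display_to_bitstream` and the one-character picture-type encoding are assumptions, and the sketch assumes every B picture is followed in display order by an I or P picture.

```c
#include <stddef.h>

/* Hypothetical helper: reorder pictures from display order into bitstream
 * (data-dependence) order. types[i] is 'I', 'P' or 'B' in display order;
 * out receives display indices in bitstream order. Assumes every B picture
 * has a subsequent I/P picture in display order. Returns the count written. */
size_t display_to_bitstream(const char *types, size_t n, size_t *out)
{
    size_t count = 0;
    size_t pending_start = 0;      /* first B picture not yet emitted */

    for (size_t i = 0; i < n; i++) {
        if (types[i] == 'B')
            continue;              /* hold B pictures until their backward reference */
        out[count++] = i;          /* emit the I/P picture first ... */
        for (size_t j = pending_start; j < i; j++)
            out[count++] = j;      /* ... then the B pictures that depend on it */
        pending_start = i + 1;
    }
    return count;
}
```

For the display order of Figure 18, `display_to_bitstream("IBBPBBP", 7, order)` fills `order` with the display indices 0, 3, 1, 2, 6, 4, 5, i.e. I1, P4, B2, B3, P7, B5, B6.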
There is no limit to the number of B pictures that may be inserted between each I/P picture pair. The maximum number of such pictures may be signaled by external means (for example, Recommendation H.245). The picture height, width, and pixel aspect ratio of a B picture shall always be equal to those of its temporally subsequent reference picture.
The B pictures described in this section support multiple reference frame prediction. The maximum number of previous reference frames that may be used for prediction in B pictures must be less than or equal to the number of reference frames used in the immediately following P frame, and it may be signaled by external means (for example, Recommendation H.245). The use of this mode is indicated by PTYPE.
`
8.2 Prediction modes
Five prediction modes are supported by B pictures: direct, forward, backward, bi-directional and intra. Both the direct mode and the bi-directional mode are bi-directional predictions. The only difference is that the bi-directional mode uses separate motion vectors for forward and backward prediction, whereas the forward and backward motion vectors of the direct mode are derived from the motion vectors used in the corresponding macroblock of the subsequent reference frame. In the direct mode, the same number of motion vectors is used as in the reference
`
`File:++VCEG-N83dl.doc
`
`Page: 41
`
`Vedanti Systems Limited - Ex. 2006
`Page 41
`
`
`
macroblock for prediction. To calculate the prediction blocks for the direct and bi-directional prediction modes, the forward and backward motion vectors are used to obtain the appropriate blocks from the reference frames, and these two blocks are then averaged by dividing their sum by two.
Forward prediction means prediction from a previous reference picture, and backward prediction means prediction from a temporally subsequent reference picture.
Intra prediction means that the macroblock is encoded using intra coding.
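The averaging step for the direct and bi-directional modes could look like the following sketch. The function name is hypothetical and samples are assumed to be 8-bit; because the text only says the sum is divided by two, plain truncating integer division is used here, and the actual rounding rule may differ.

```c
#include <stdint.h>

/* Hypothetical helper: average the forward and backward prediction blocks
 * sample by sample, (fw + bw) / 2, using truncating integer division
 * (the rounding convention is an assumption). */
void average_prediction(const uint8_t *fw, const uint8_t *bw,
                        uint8_t *pred, int num_samples)
{
    for (int k = 0; k < num_samples; k++)
        pred[k] = (uint8_t)((fw[k] + bw[k]) / 2);  /* sum fits in int, no overflow */
}
```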
`
8.3 Finding optimum prediction mode
SA(T)D is initialized with a bias value that favors prediction modes needing few bits to be transmitted. The bias value is the bit usage of the given coding mode multiplied by QP0(QP).
For flat regions with zero motion, B pictures tend not to make effective use of the zero motion and instead lose performance by selecting the 16x16 intra mode. Therefore, to prevent assigning the 16x16 intra mode to a region with little detail and zero motion, 16 × QP0(QP) is subtracted from the SA(T)D of the direct mode, biasing the decision toward the direct mode.
The SA(T)D of the 4x4 intra mode is biased in the same manner as in section 5.1.
The SA(T)D for each mode is calculated as follows.
• Forward prediction mode:
  SA(T)D0 = QP0(QP) × (2 × code_number_of_Ref_frame + Bits_to_code_MVDFW)
• Backward prediction mode:
  SA(T)D0 = QP0(QP) × Bits_to_code_MVDBW
• Bi-directional prediction mode:
  SA(T)D0 = QP0(QP) × (2 × code_number_of_Ref_frame + Bits_to_code_forward_Blk_size + Bits_to_code_backward_Blk_size + Bits_to_code_MVDFW + Bits_to_code_MVDBW)
• Direct prediction mode:
  SA(T)D = SA(T)D - 16 × QP0(QP)
• 4x4 intra mode:
  SA(T)D0 = QP0(QP) × Order_of_prediction_mode
  SA(T)D = SA(T)D + 24 × QP0(QP)
Finally, the mode with the minimum SA(T)D, SA(T)Dmin, is selected as the optimum prediction mode.
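The mode decision above can be summarized in a small C sketch. QP0(QP) is passed in as a precomputed value because its table is not reproduced here, the enum and struct names are hypothetical, and the 4x4 intra cost is treated as one aggregated value with the +24 × QP0(QP) penalty applied in the same call, which is a simplification.

```c
/* Hypothetical mode identifiers for this sketch. */
enum b_mode { MODE_FORWARD, MODE_BACKWARD, MODE_BIDIR, MODE_DIRECT, MODE_INTRA4x4, MODE_COUNT };

/* Bit counts entering the bias terms (field names are hypothetical). */
struct mode_bits {
    int ref_frame_code;    /* code_number_of_Ref_frame */
    int mvd_fw_bits;       /* Bits_to_code_MVDFW */
    int mvd_bw_bits;       /* Bits_to_code_MVDBW */
    int blk_size_fw_bits;  /* Bits_to_code_forward_Blk_size */
    int blk_size_bw_bits;  /* Bits_to_code_backward_Blk_size */
    int intra_order;       /* Order_of_prediction_mode (aggregated) */
};

/* Adds the mode-dependent bias to a raw SA(T)D value; qp0 is QP0(QP). */
int biased_satd(enum b_mode mode, int raw_satd, int qp0, const struct mode_bits *b)
{
    switch (mode) {
    case MODE_FORWARD:
        return raw_satd + qp0 * (2 * b->ref_frame_code + b->mvd_fw_bits);
    case MODE_BACKWARD:
        return raw_satd + qp0 * b->mvd_bw_bits;
    case MODE_BIDIR:
        return raw_satd + qp0 * (2 * b->ref_frame_code + b->blk_size_fw_bits
                                 + b->blk_size_bw_bits + b->mvd_fw_bits + b->mvd_bw_bits);
    case MODE_DIRECT:
        return raw_satd - 16 * qp0;                 /* favour direct mode */
    case MODE_INTRA4x4:
        return raw_satd + qp0 * b->intra_order + 24 * qp0;
    default:
        return raw_satd;
    }
}

/* Returns the mode with the smallest biased SA(T)D (first one wins on ties). */
enum b_mode pick_mode(const int raw[MODE_COUNT], int qp0, const struct mode_bits *b)
{
    enum b_mode best = MODE_FORWARD;
    int best_cost = biased_satd(MODE_FORWARD, raw[MODE_FORWARD], qp0, b);
    for (int m = 1; m < MODE_COUNT; m++) {
        int cost = biased_satd((enum b_mode)m, raw[m], qp0, b);
        if (cost < best_cost) {
            best_cost = cost;
            best = (enum b_mode)m;
        }
    }
    return best;
}
```

With equal raw SA(T)D values, the -16 × QP0(QP) term makes the direct mode win, which is the intended bias for flat, motionless regions.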
`
`
8.4 Syntax
Some additional syntax elements are needed for B pictures. The structure of the B-picture-related fields is shown in Figure 19. In Ptype, two picture types shall be added to signal B pictures with and without multiple reference frame prediction. In MB_type, different macroblock types shall be defined to indicate the different prediction types for B pictures. The fields Blk_size, MVDFW and MVDBW shall be inserted to enable bi-directional prediction.
`
`
`
`
FIGURE 19

Syntax diagram for B pictures (fields shown: Omit, Loop, Blk_size, MVDFW, MVDBW).
`
8.4.1 Picture type (Ptype) and RUN
`See sections 0 and 3.3 for definitions.
`
8.4.2 Macroblock type (MB_type)
The MB_type indicates the prediction mode and block size used to encode each macroblock. As mentioned earlier, five different prediction modes are supported by B pictures. For the forward, backward and bi-directional prediction modes, a macroblock is predicted from the previous picture, the subsequent picture, or both, with block size NxM. Table 11 shows the macroblock types and the included data elements for B pictures.
In the "Direct" prediction type, no motion vector data is transmitted.
"Forward_NxM" indicates that the macroblock is predicted from a previous picture with block size NxM, and "Backward_NxM" indicates that the macroblock is predicted from a subsequent picture with block size NxM. Motion vector data is provided for each NxM block; therefore, depending on N and M, up to 16 sets of motion vector data have to be transmitted for a macroblock.
For the "Bi-directional" prediction type, the parameter Blk_size indicates the block size used for forward and backward motion prediction (the Blk_size field is described in detail below). Both forward and backward motion vector data sets are transmitted. Depending on the block size indicated in Blk_size, up to 16 fields of motion vector data are transmitted for each of forward and backward prediction for a macroblock.
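As a check on the "up to 16 sets" figure, the number of NxM blocks, and hence of motion vector data sets, in a 16x16 macroblock is (16/N) × (16/M). The helper below is a hypothetical illustration, not a TML function; for the bi-directional type the forward and backward counts follow independently from the two Blk_size values.

```c
/* Hypothetical helper: number of NxM blocks (one motion vector data set
 * each) in a 16x16 macroblock, for N and M in {16, 8, 4}. */
int mv_sets_per_macroblock(int n, int m)
{
    return (16 / n) * (16 / m);
}
```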
`
`
`
`
The "Intra_4x4" and "Intra_16x16" prediction types indicate that the macroblock is intra coded, with different intra prediction modes defined in the same manner as in section 3.4. No motion vector data is transmitted for intra modes.
`
8.4.3 Intra prediction mode (Intra_pred_mode)
If present, Intra_pred_mode indicates which intra prediction mode is used for a macroblock. Intra_pred_mode is present when the Intra_4x4 prediction type is indicated in MB_type. The code_number is the same as that described in the Intra_pred_mode entry of Table 1.
`
8.4.4 Reference frame (Ref_frame)
At present, Ref_frame indicates the position, in the reference frame buffer, of the reference frame used for forward motion-compensated prediction of the current macroblock. Ref_frame is present only when Ptype signals the use of multiple reference frames and only when the present MB_type indicates the Forward_NxM or Bi-directional prediction type. Decoded I/P pictures are stored in the reference frame buffer in a first-in-first-out manner, and the most recently decoded I/P frame is always stored at position 0 in the reference frame buffer. The code_number for Ref_frame is described in Table 12.
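The first-in-first-out behaviour of the reference frame buffer can be sketched as follows. The buffer depth `MAX_REF_FRAMES` (5 here) and the use of integer frame ids are assumptions for illustration; since the most recent frame sits at position 0, Ref_frame code_number k selects the frame k + 1 frames back, consistent with Table 12.

```c
#include <string.h>

#define MAX_REF_FRAMES 5   /* hypothetical buffer depth for this sketch */

/* Reference frame buffer holding decoded I/P frames, newest at index 0. */
struct ref_buffer {
    int frame_id[MAX_REF_FRAMES];
    int count;
};

/* Insert a newly decoded I/P frame; the oldest frame drops out when full. */
void ref_buffer_push(struct ref_buffer *buf, int frame_id)
{
    int keep = buf->count < MAX_REF_FRAMES ? buf->count : MAX_REF_FRAMES - 1;
    memmove(&buf->frame_id[1], &buf->frame_id[0], (size_t)keep * sizeof(int));
    buf->frame_id[0] = frame_id;
    buf->count = keep + 1;
}

/* Frame selected by a Ref_frame code_number, or -1 if not available. */
int ref_buffer_get(const struct ref_buffer *buf, int code_number)
{
    return (code_number >= 0 && code_number < buf->count) ? buf->frame_id[code_number] : -1;
}
```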
`
TABLE 11

MB_Type and related data elements for B pictures

Code_number  Prediction Type  Intra_pred_mode  Ref_frame(1)  Blk_size  MVDFW  MVDBW
0            Direct
1            Forward_16x16                     X                       X
2            Backward_16x16                                                   X
3            Bi-directional                    X             X         X      X
4            Forward_16x8                      X                       X
5            Backward_16x8                                                    X
6            Forward_8x16                      X                       X
7            Backward_8x16                                                    X
8            Forward_8x8                       X                       X
9            Backward_8x8                                                     X
10           Forward_8x4                       X                       X
11           Backward_8x4                                                     X
12           Forward_4x8                       X                       X
13           Backward_4x8                                                     X
14           Forward_4x4                       X                       X
15           Backward_4x4                                                     X
16           Intra_4x4        X
17           Intra_16x16(2)
...          ...

(1) Ref_frame is a valid field only when the usage of multiple reference frames is indicated in Ptype; e.g., when Ptype = 4 the Ref_frame field is present.
(2) Intra_16x16 indicates the 16x16-based intra mode and should represent 24 different prediction modes as defined in section 3.4.9 of Q15-J-28. For code_number greater than 16 in Table 11, see the code numbers from 9 upwards in the inter MB_type field of Table 1 in Q15-J-28 for reference.
`TABLE 12
`
`
`
`
Code_number for Ref_frame

Code_number  Reference frame
0            The most recent previous frame (1 frame back)
1            2 frames back
2            3 frames back
...          ...
`
8.4.5 Block size (Blk_size)
If present, Blk_size indicates which block size is used for forward and backward motion prediction in a macroblock, as described in Table 13. Blk_size is present only when the Bi-directional prediction type is indicated in MB_type. There are two sets of Blk_size data, one for the forward motion vector data and another for the backward motion vector data.
`
TABLE 13

Code_number for Blk_size

Code_number  Block size
0            1 16x16 block
1            4 8x8 blocks
2            2 16x8 blocks
3            2 8x16 blocks
4            8 8x4 blocks
5            8 4x8 blocks
6            16 4x4 blocks
`
8.4.6 Motion vector data (MVDFW, MVDBW)
MVDFW is the motion vector data for the forward vector, if present. MVDBW is the motion vector data for the backward vector, if present. As indicated by MB_type or by Blk_size (bi-directional prediction type only), vector data for 1 to 16 blocks is transmitted. The order of the transmitted motion vector data is the same as that indicated in Figure 2. For the code_number of motion vector data, refer to Table 1.
`
8.5 Decoder process for motion vectors

8.5.1 Differential motion vectors
Motion vectors for forward, backward, or bi-directionally predicted macroblocks are differentially encoded. A prediction has to be added to the motion vector differences to obtain the motion vectors for the macroblock. The predictions are formed in a way similar to that described in section 3.6.2. The only difference is that forward motion vectors are predicted only from forward motion vectors in surrounding macroblocks, and backward motion vectors are predicted only from backward motion vectors in surrounding macroblocks.
If a neighboring macroblock does not have a motion vector of the same type, or does not use the same reference frame under multiple reference frame prediction, the candidate predictor for that macroblock is set to zero for that motion vector type.
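The B-picture-specific part of this rule can be sketched in C. Section 3.6.2 is not reproduced here, so the sketch ASSUMES a component-wise median of three neighbouring candidates (A, B, C) as the predictor; only the zeroing of candidates that lack a vector of the same direction or, for forward vectors, use a different reference frame is taken from the text above. All names are hypothetical.

```c
struct mv { int x, y; };

/* Hypothetical description of one neighbouring macroblock. */
struct neighbour {
    int has_fw, has_bw;   /* carries a forward / backward vector */
    int ref_frame;        /* reference frame of its forward vector */
    struct mv fw, bw;
};

static int median3(int a, int b, int c)
{
    int lo = a < b ? a : b, hi = a < b ? b : a;
    return c < lo ? lo : (c > hi ? hi : c);
}

/* Candidate from one neighbour; zero if the direction or reference differs. */
static struct mv candidate(const struct neighbour *nb, int backward, int ref_frame)
{
    struct mv zero = {0, 0};
    if (backward)
        return nb->has_bw ? nb->bw : zero;
    if (!nb->has_fw || nb->ref_frame != ref_frame)
        return zero;
    return nb->fw;
}

/* Predictor for a forward (backward = 0) or backward (backward = 1) vector;
 * the median over A, B, C is an assumption about section 3.6.2. */
struct mv predict_mv(const struct neighbour nb[3], int backward, int ref_frame)
{
    struct mv a = candidate(&nb[0], backward, ref_frame);
    struct mv b = candidate(&nb[1], backward, ref_frame);
    struct mv c = candidate(&nb[2], backward, ref_frame);
    struct mv p = { median3(a.x, b.x, c.x), median3(a.y, b.y, c.y) };
    return p;
}
```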
`
8.5.2 Motion vectors in direct mode
In direct mode, the same block structure as for the corresponding macroblock in the temporally subsequent picture is assumed. For each of the subblocks, the forward and backward motion vectors are computed as scaled versions of the corresponding vector components of the macroblock in the temporally subsequent picture, as described below.
`
`
`
`
When multiple reference frame prediction is used, the forward reference frame for the direct mode is the same as the one used for the corresponding macroblock in the temporally subsequent reference picture. The forward and backward motion vectors for direct mode macroblocks are calculated as follows:

$$ MV_F = \frac{TR_B \times MV}{TR_D} $$
$$ MV_B = \frac{(TR_B - TR_D) \times MV}{TR_D} $$

where MV_F is the forward motion vector, MV_B is the backward motion vector, and MV represents the motion vector in the corresponding macroblock of the subsequent reference picture. If the subsequent reference is an intra-coded frame or the reference macroblock is an intra-coded block, the motion vectors are set to zero. TR_D is the temporal distance between the temporally previous and next reference frames, and TR_B is the temporal distance between the current frame and the previous reference frame.
Note that when multiple reference frame prediction is used, the reference frame for the motion vector prediction is treated as though it were the most recent previous decoded frame. Thus, instead of the temporal reference of the exact reference frame, the temporal reference of the most recent previous reference frame is used to compute the temporal distances TR_D and TR_B.
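For example, with two B pictures between reference pictures (TR_D = 3), the first B picture has TR_B = 1, so a co-located vector component MV = 9 gives MV_F = 3 and MV_B = -6. The C sketch below applies the two formulas per vector component; the function name is hypothetical, and since no rounding rule is stated, C's truncating integer division is an assumption.

```c
/* Hypothetical helper: direct-mode scaling of one vector component.
 * mv   : co-located component in the subsequent reference picture
 *        (pass 0 when that macroblock or frame is intra coded)
 * tr_b : distance from the previous reference frame to the current frame
 * tr_d : distance from the previous to the next reference frame */
void direct_mode_mv(int mv, int tr_b, int tr_d, int *mv_f, int *mv_b)
{
    *mv_f = (tr_b * mv) / tr_d;            /* MV_F = TR_B * MV / TR_D */
    *mv_b = ((tr_b - tr_d) * mv) / tr_d;   /* MV_B = (TR_B - TR_D) * MV / TR_D */
}
```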
`
`
`
`
9 SP-pictures

9.1 Introduction
SP-pictures make use of motion-compensated predictive coding to exploit temporal redundancy in the sequence, similar to P-pictures, while still allowing identical reconstruction of the frame even when different reference frames are used. This picture type provides functionality for bitstream switching, splicing, random access, VCR functionalities such as fast-forward, and error resilience/recovery. SP-pictures make use of the existing coding blocks and the syntax elements of P-type frames; the difference is that the predicted block is forward transform coded and quantized during the reconstruction of blocks in SP-pictures.
`
9.2 Syntax changes
The only syntax change related to SP-frames is illustrated in FIGURE 20 as a modification of the syntax diagram in FIGURE 4. An additional quantization parameter is sent after the quantization parameter used for the prediction error. This additional quantization parameter is used for (de)quantization of the predicted block. Otherwise, SP-frames have syntax elements identical to those of P-frames, and all syntax elements are interpreted as for P-frames, e.g., 4x4, 4x8 or 8x8 blocks for motion compensation, reference frame number, etc.
`
FIGURE 20

Syntax diagram for SP-frames (fields shown: Ptype, RUN, Omit, Loop).
`
9.3 SP-frame decoding
SP-frames consist of intra and inter-type macroblocks. Intra macroblocks are encoded and decoded as in I- and P-type pictures; therefore, the following section addresses the decoding of inter-type macroblocks in SP-frames.
As in P-type frame decoding, the motion-compensated predicted block is first formed using the received motion vector information, the reference frame number and the already decoded frames. Then a forward transform is applied to the predicted block. The resulting coefficients and the received level values are used to calculate the reconstructed image coefficients as follows:

$$ L_{rec} = \frac{K_{pred} \times A(QP_2) + L_{err} \times F(QP_1, QP_2) + f \times 2^{20}}{2^{20}} $$

where

$$ F(QP_1, QP_2) = \frac{A(QP_2) \times 2^{20} + A(QP_1)/2}{A(QP_1)} $$

and A(QP) is defined in Section 4.3.3. Here QP_1 is signaled by the PQP value and QP_2 is signaled by the additional QP parameter SPQP. Notice that when QP_1 = QP_2, the calculation of L_rec reduces simply to the sum of the received level and the level found by quantizing the predicted block coefficient. The
`
`
`
`
coefficients L_rec are then dequantized using QP = QP_2, and the inverse transform is performed on these dequantized levels, as defined in Sections 4.3 and 4.1, respectively. Finally, the result of the inverse transform is shifted by 20 bits (with rounding).
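The reconstruction formulas can be written as a short C sketch. A(QP) is defined in Section 4.3.3 and is not reproduced here, so A(QP_1) and A(QP_2) are passed in; f is passed pre-multiplied by 2^20 as `f_scaled`; the function names are hypothetical, and the formulas are applied literally, with no separate sign handling.

```c
#include <stdint.h>

#define SP_SHIFT 20   /* the formulas scale by 2^20 */

/* F(QP1, QP2) = (A(QP2) * 2^20 + A(QP1) / 2) / A(QP1) */
int64_t sp_f_factor(int64_t a_qp1, int64_t a_qp2)
{
    return (a_qp2 * ((int64_t)1 << SP_SHIFT) + a_qp1 / 2) / a_qp1;
}

/* L_rec = (K_pred * A(QP2) + L_err * F(QP1, QP2) + f * 2^20) / 2^20 */
int64_t sp_reconstruct_level(int64_t k_pred, int64_t l_err,
                             int64_t a_qp1, int64_t a_qp2, int64_t f_scaled)
{
    int64_t scale = (int64_t)1 << SP_SHIFT;
    return (k_pred * a_qp2 + l_err * sp_f_factor(a_qp1, a_qp2) + f_scaled) / scale;
}
```

With A(QP_1) = A(QP_2), F equals 2^20, so for example `sp_reconstruct_level(2097, 3, 1000, 1000, 524288)` gives 5: the predicted coefficient quantizes to 2 and the received level 3 is added, as the text states for QP_1 = QP_2. The value 1000 for A is arbitrary and chosen only for the example.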
When the deblocking filter is applied to macroblocks in SP-frames, all macroblocks are treated as intra macroblocks, as described in Section 4.5.
Since an additional 2x2 transform is performed on the DC coefficients of the chroma blocks in a macroblock after the 4x4 transform of each block (Section 4.2), the chroma component is decoded in a manner similar to that described above, with the following difference: for chroma macroblocks in SP-frames, an additional 2x2 transform is applied after the 4x4 transform of the predicted chroma blocks. The steps described above are then repeated for the DC coefficients as well as for the AC coefficients. Note also that for chroma blocks, the values of QP_1 and QP_2 are both changed according to the relation between luma and chroma QP values specified in Section 4.3.4.
`
`
`
`
10 Hypothetical Reference Decoder

10.1 Purpose
The hypothetical reference decoder (HRD) is a mathematical model for the decoder, its input buffer, and the channel. The HRD is characterized by the channel's peak rate R (in bits per second), the buffer size B (in bits), and the initial decoder buffer fullness F (in bits). These parameters represent the levels of resources (transmission capacity, buffer capacity, and delay) used to decode a bit stream.
A closely related object is the leaky bucket (LB), which is a mathematical constraint on a bit stream. A leaky bucket is characterized by the bucket's leak rate R_1 (in bits per second), the bucket size B_1 (in bits), and the initial bucket fullness B_1 - F_1 (in bits). A given bit stream may be constrained by any number of leaky buckets (R_1, B_1, F_1), ..., (R_N, B_N, F_N), N ≥ 1. The LB parameters for a bit stream, which are encoded in the bit stream header, precisely describe the minimum levels of the resources R, B, and F that are sufficient to guarantee that the bit stream can be decoded.
`
10.2 Operation of the HRD
The HRD input buffer has a capacity of B bits. Initially, the buffer is empty. At time t_start it begins to receive bits, such that it has received S(t) bits by time t. S(t) can be regarded as the integral of the instantaneous bit rate up to time t. The instant at which S(t) reaches the initial decoder buffer fullness F is identified as the decoding time t_0 of the first picture in the bit stream. Decoding times t_1, t_2, t_3, ... for subsequent pictures (in bit stream order) are identified relative to t_0, per Section 10.3. At each decoding time t_i, the HRD instantaneously removes and decodes all d_i bits associated with picture i, thereby reducing the decoder buffer fullness from b_i bits to b_i - d_i bits. Between times t_i and t_{i+1}, the decoder buffer fullness increases from b_i - d_i bits to b_i - d_i + [S(t_{i+1}) - S(t_i)] bits. That is, for i ≥ 0,

$$ b_0 = F $$
$$ b_{i+1} = b_i - d_i + [S(t_{i+1}) - S(t_i)] $$

The channel connected to the HRD buffer has peak rate R. This means that unless the channel is idle (whereupon the instantaneous rate is zero), the channel delivers bits into the HRD buffer at the instantaneous rate of R bits per second.
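The fullness recursion can be traced with a short C sketch. The function name is hypothetical; it takes the picture sizes d_i and the cumulative arrivals S(t_i) directly (so S(t_0) should equal F), and the explicit underflow check (b_i < d_i) and overflow check (b_i > B) are an illustrative reading of the model rather than text from this section.

```c
/* Hypothetical helper: trace HRD buffer fullness b_i just before each
 * decoding time. d[i] are picture sizes in bits (bit stream order) and
 * s[i] = S(t_i) are cumulative bits received by t_i.
 * Returns 0 if b_i never exceeds buffer_size and never falls below d_i,
 * otherwise -1. */
int hrd_check(const long long *d, const long long *s, int n,
              long long initial_fullness, long long buffer_size)
{
    long long b = initial_fullness;              /* b_0 = F */
    for (int i = 0; i < n; i++) {
        if (b > buffer_size || b < d[i])
            return -1;                           /* overflow or underflow */
        if (i + 1 < n)
            b = b - d[i] + (s[i + 1] - s[i]);    /* b_{i+1} */
    }
    return 0;
}
```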
`
10.3 Decoding Time of a Picture
The decoding time t_i of picture i is equal to its presentation time τ_i if there are no B pictures in the sequence. If there are B pictures in the sequence, then t_i = τ_i - m_i, where m_i = 0 if picture i is a B picture; otherwise m_i equals τ_i - τ_i', where τ_i' is the presentation time of the I or P picture that immediately precedes picture i (in presentation order). If there is no preceding I or P picture (i.e., if i = 0), then m_i = m_0 = τ_1 - τ_0. The presentation time of a picture is determinable from its temporal reference and the frame rate.
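The decoding-time rule can be sketched in C as follows. Arrays are in bit stream order, times are in arbitrary units, the function name is hypothetical, and finding the preceding I or P picture by a linear scan is an implementation choice.

```c
/* Hypothetical helper: decoding times t[i] from presentation times tau[i].
 * types[i] is 'I', 'P' or 'B' in bit stream order. When B pictures are
 * present, n >= 2 is required because m_0 uses tau[1]. */
void decoding_times(const char *types, const double *tau, int n, double *t)
{
    int has_b = 0;
    for (int i = 0; i < n; i++)
        if (types[i] == 'B')
            has_b = 1;

    for (int i = 0; i < n; i++) {
        double m = 0.0;                          /* m_i = 0 for B pictures */
        if (has_b && types[i] != 'B') {
            /* I/P picture immediately preceding picture i in presentation order */
            int prev = -1;
            for (int j = 0; j < n; j++)
                if (types[j] != 'B' && tau[j] < tau[i] && (prev < 0 || tau[j] > tau[prev]))
                    prev = j;
            m = (prev >= 0) ? tau[i] - tau[prev] : tau[1] - tau[0];
        }
        t[i] = tau[i] - m;
    }
}
```

For the structure of Figure 18 with presentation times I1 = 0, B2 = 1, B3 = 2, P4 = 3, B5 = 4, B6 = 5, P7 = 6, the bit stream order IPBBPBB gives decoding times -3, 0, 1, 2, 3, 4, 5: each I or P picture is decoded three frame periods before it is presented, and each B picture at its presentation time.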
`
10.4 Schedule of a Bit Stream
The sequence (t_0, d_0), (t_1, d_1), (t_2, d_2), ... is called the schedule of a bit stream. The schedule of a bit stream is intrinsic to the bit stream and completely characterizes the instantaneous coding rate of the bit stream over its lifetime. A bit stream may be pre-encoded, stored to a file, and later transmitted over channels with different peak bit rates to decoders with different buffer sizes. The schedule of the bit stream is invariant over such transmissions.
`
10.5 Containment in a Leaky Bucket
A leaky bucket with leak rate R_1, bucket size B_1, and initial bucket fullness B_1 - F_1 is said to contain a bit stream with schedule (t_0, d_0), (t_1, d_1), (t_2, d_2), ... if the bucket does not overflow under the following conditions. At time t_0, d_0 bits are inserted into the leaky bucket on top of the B_1 - F_1 bits already in the bucket, and the bucket begins to drain at rate R_1 bits per second. If the bucket empties, it remains empty until the next insertion. At time t_i, i ≥ 1, d_i bits are inserted into the bucket, and the bucket continues to drain at rate R_1 bits per second. In other words, for i ≥ 0, the state of the bucket just prior to time t_i is
`
`
`
`
$$ b_0 = B_1 - F_1 $$
$$ b_{i+1} = \max\{0,\; b_i + d_i - R_1 (t_{i+1} - t_i)\} $$
The leaky bucket does not overflow if b_i + d_i ≤ B_1 for all i ≥ 0.
Equivalently, the leaky bucket contains the bit stream if the graph of the schedule of the bit stream lies between two parallel lines with slope R_1, separated vertically by B_1 bits, possibly sheared horizontally, such that the upper line begins at F_1 at time t_0, as illustrated in the figure below. Note from the figure that the same bit stream is containable in more than one leaky bucket; indeed, a bit stream is containable in an infinite number of leaky buckets.
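The containment test is a direct transcription of the recursion above. This C sketch uses a hypothetical function name and double-precision times in seconds.

```c
/* Hypothetical helper: does the leaky bucket (rate r, size bucket_size,
 * initial fullness bucket_size - f) contain the schedule (t[i], d[i])?
 * Returns 1 if b_i + d_i <= bucket_size for every i, else 0. */
int leaky_bucket_contains(const double *t, const double *d, int n,
                          double r, double bucket_size, double f)
{
    double b = bucket_size - f;                 /* b_0 = B_1 - F_1 */
    for (int i = 0; i < n; i++) {
        if (b + d[i] > bucket_size)
            return 0;                           /* bucket overflows */
        if (i + 1 < n) {
            double next = b + d[i] - r * (t[i + 1] - t[i]);
            b = next > 0.0 ? next : 0.0;        /* cannot drain below empty */
        }
    }
    return 1;
}
```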
`
FIGURE 21

Illustration of the leaky bucket concept (two plots of bits versus time, showing the bit stream schedule bounded by lines of slope R_1 and R_2, with bucket sizes B_1, B_2 and initial fullness F_1, F_2).
If a bit stream is contained in a leaky bucket with parameters (R_1, B_1, F_1), then when it is communicated over a channel with peak rate R_1 to a hypothetical reference decoder with parameters R = R_1, B = B_1, and F = F_1, the HRD buffer does not overflow or underflow.
`
10.6 Bit Stream Syntax
The header of each bit stream shall specify the parameters of a set of N ≥ 1 leaky buckets, (R_1, B_1, F_1), ..., (R_N, B_N, F_N), each of which contains the bit stream. In the current Test Model, these parameters are specified in the first 1 + 3N 32-bit integers of the Interim File Format, in network (big-endian) byte order:
N, R_1, B_1, F_1, ..., R_N, B_N, F_N.
The R_n shall be in strictly increasing order, and both B_n and F_n shall be in strictly decreasing order. These parameters shall not exceed the capability limits for particular profiles and levels, which are yet to be defined.
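Reading these parameters could look like the sketch below. The function names are hypothetical, and the caller-supplied bound `max_n` on the number of buckets is an assumption added for memory safety; the ordering checks mirror the rules above.

```c
#include <stddef.h>
#include <stdint.h>

/* Reads one 32-bit integer in network (big-endian) byte order. */
static uint32_t read_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Hypothetical parser for N, R_1, B_1, F_1, ..., R_N, B_N, F_N.
 * Returns N, or -1 on a short buffer, N == 0, N > max_n, or a violation
 * of the ordering rules (R increasing, B and F decreasing). */
int parse_leaky_buckets(const uint8_t *buf, size_t len, int max_n,
                        uint32_t *r, uint32_t *b, uint32_t *f)
{
    if (len < 4)
        return -1;
    uint32_t n = read_be32(buf);
    if (n == 0 || n > (uint32_t)max_n || len < 4 + 12 * (size_t)n)
        return -1;
    for (uint32_t k = 0; k < n; k++) {
        const uint8_t *p = buf + 4 + 12 * (size_t)k;
        r[k] = read_be32(p);
        b[k] = read_be32(p + 4);
        f[k] = read_be32(p + 8);
        if (k > 0 && !(r[k] > r[k - 1] && b[k] < b[k - 1] && f[k] < f[k - 1]))
            return -1;
    }
    return (int)n;
}
```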
`
10.7 Minimum Buffer Size and Minimum Peak Rate
If a bit stream is contained in a set of leaky buckets with parameters (R_1, B_1, F_1), ..., (R_N, B_N, F_N), then when it is communicated over a channel with peak rate R, it is decodable (i.e., the HRD buffer does not overflow or underflow) provided B ≥ B_min(R) and F ≥ F_min(R), where for R_n ≤ R ≤ R_{n+1},

$$ B_{min}(R) = \alpha B_n + (1 - \alpha) B_{n+1} $$
$$ F_{min}(R) = \alpha F_n + (1 - \alpha) F_{n+1} $$
$$ \alpha = \frac{R_{n+1} - R}{R_{n+1} - R_n} $$

For R ≤ R_1,

$$ B_{min}(R) = B_1 + (R_1 - R) T $$
$$ F_{min}(R) = F_1 $$
`
`
`
`
where T = t_{L-1} - t_0 is the duration of the bit stream (i.e., the difference between the decoding times of the first and last pictures in the bit stream). And for R ≥ R_N,

$$ B_{min}(R) = B_N $$
$$ F_{min}(R) = F_N $$

Thus, the leaky bucket parameters can be linearly interpolated and extrapolated.
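The interpolation and extrapolation for B_min(R) and F_min(R) can be sketched in C as follows. Arrays are zero-based (r[0] holds R_1), the function name is hypothetical, and T is passed in seconds.

```c
/* Hypothetical helper: minimum buffer size and initial fullness for peak
 * rate `rate`, from N leaky buckets with r strictly increasing and b, f
 * strictly decreasing. big_t is the bit stream duration T. */
void min_buffer_for_rate(const double *r, const double *b, const double *f, int n,
                         double big_t, double rate, double *b_min, double *f_min)
{
    if (rate <= r[0]) {                          /* extrapolate below R_1 */
        *b_min = b[0] + (r[0] - rate) * big_t;
        *f_min = f[0];
        return;
    }
    if (rate >= r[n - 1]) {                      /* at or above R_N */
        *b_min = b[n - 1];
        *f_min = f[n - 1];
        return;
    }
    int k = 0;
    while (rate > r[k + 1])                      /* find R_k <= rate <= R_{k+1} */
        k++;
    double alpha = (r[k + 1] - rate) / (r[k + 1] - r[k]);
    *b_min = alpha * b[k] + (1.0 - alpha) * b[k + 1];
    *f_min = alpha * f[k] + (1.0 - alpha) * f[k + 1];
}
```

For example, with buckets (1000, 2000, 500) and (2000, 1000, 200) and T = 10 s, a peak rate of 1500 bits/s gives B_min = 1500 bits and F_min = 350 bits, while 500 bits/s gives B_min = 7000 bits and F_min = 500 bits.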
Alternatively, when the bit stream is communicated to a decoder with buffer size B, it is decodable provided R ≥ R_min(B) and F ≥ F_min(B), where for B_n ≥ B ≥ B_{n+1},

$$ R_{min}(B) = \alpha R_n + (1 - \alpha) R_{n+1} $$
$$ F_{min}(B) = \alpha F_n + (1 - \alpha) F_{n+1} $$
$$ \alpha = \frac{B - B_{n+1}}{B_n - B_{n+1}} $$

For B ≥ B_1,

$$ R_{min}(B) = R_1 - (B - B_1) / T $$
$$ F_{min}(B) = F_1 $$

For B < B_N, the stream may not be decodable.
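The dual computation, from buffer size to minimum peak rate, can be sketched the same way. Again the function name is hypothetical and arrays are zero-based; a return value of -1 marks the B < B_N case, where the stream may not be decodable.

```c
/* Hypothetical helper: minimum peak rate and initial fullness for buffer
 * size buf_size, from N leaky buckets (b strictly decreasing, b[0] = B_1).
 * Returns 0 on success, -1 when buf_size < B_N. */
int min_rate_for_buffer(const double *r, const double *b, const double *f, int n,
                        double big_t, double buf_size, double *r_min, double *f_min)
{
    if (buf_size >= b[0]) {                      /* extrapolate at or above B_1 */
        *r_min = r[0] - (buf_size - b[0]) / big_t;
        *f_min = f[0];
        return 0;
    }
    if (buf_size < b[n - 1])
        return -1;                               /* may not be decodable */
    int k = 0;
    while (buf_size < b[k + 1])                  /* find B_k >= buf_size >= B_{k+1} */
        k++;
    double alpha = (buf_size - b[k + 1]) / (b[k] - b[k + 1]);
    *r_min = alpha * r[k] + (1.0 - alpha) * r[k + 1];
    *f_min = alpha * f[k] + (1.0 - alpha) * f[k + 1];
    return 0;
}
```

With the same two buckets and T = 10 s, a 1500-bit buffer needs R_min = 1500 bits/s and F_min = 350 bits, and a 3000-bit buffer needs only R_min = 900 bits/s with F_min = 500 bits.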
In summary, the bit stream is guaranteed to be decodable, in the sense that the HRD buffer does not overflow or underflow, provided that the point (R, B) lies on or above the lower convex hull of the set of points (0, B_1 + R_1 T), (R_1, B_1), ..., (R_N, B_N), as illustrated in the figure below. The minimum start-up delay necessary to maintain this guarantee is F_min(R) / R.

FIGURE 22

Illustration of the leaky bucket concept (B in bits versus R in bits/sec: the lower convex hull through (R_1, B_1), ..., (R_{N-1}, B_{N-1}), (R_N, B_N), extended for R < R_1 by the line B = B_1 + (R_1 - R)T).
`
A compliant decoder with buffer size B and initial decoder buffer fullness F that is served by a channel with peak rate R shall perform the tests B ≥ B_min(R) and F ≥ F_min(R), as defined above, for any compliant bit stream with LB parameters (R_1, B_1, F_1), ..., (R_N, B_N, F_N), and shall decode the bit stream provided that B ≥ B_min(R) and F ≥ F_min(R).
`
10.8 Encoder Considerations (informative)
The encoder can create a bit stream that is contained by some given N leaky buckets, or it can simply compute N sets of leaky bucket parameters after the bit stream is generated, or a combination of these. In the former case, the encoder enforces the N leaky bucket constraints during rate control. Conventional rate control algorithms enforce only a single leaky bucket constraint. A rate control algorithm that simultaneously enforces N leaky bucket constraints can be obtained by running a conventional rate control algorithm for each of the N leaky bucket constraints and using, as the current quantization parameter (QP), the maximum of the QPs recommended by the N rate control algorithms.
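Combining the N controllers then reduces to taking a maximum, as in the sketch below. The function name is hypothetical, and how each conventional rate controller arrives at its recommended QP is outside the scope of this example.

```c
/* Hypothetical helper: QP used when N rate controllers (one per leaky
 * bucket constraint) run in parallel; the maximum recommendation wins. */
int combined_qp(const int *recommended_qp, int n)
{
    int qp = recommended_qp[0];
    for (int k = 1; k < n; k++)
        if (recommended_qp[k] > qp)
            qp = recommended_qp[k];
    return qp;
}
```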
`
`
`
`
Additional sets of leaky bucket parameters can always be computed after the fact (whether the stream was rate controlled or not) from the bit stream schedule for any given R_n, using the iteration specified in Section 10.5.
`
`
`
`
11 Supplemental Enhancement Information

11.1 Syntax
Supplemental enhancement information (SEI) is encapsulated into chunks of data separate from, for example, coded slices. It is up to the network adaptation layer to specify the means to transport SEI chunks. Each SEI chunk may contain one or more SEI messages. Each SEI message shall consist of an SEI header and an SEI payload. The SEI header starts at a byte-aligned position, either at the first byte of an SEI chunk or at the first byte after the previous SEI message. The SEI header consists of two codewords, each of which consists of one or more bytes. The first codeword indicates the SEI payload type. Values from 0x00 to 0xFE shall be reserved for particular payload types, whereas the value 0xFF is an escape code that extends the value range by another byte as follows:
payload_type = 0;
for (;;) {
    payload_type += *byte_ptr_to_sei;
    if (*byte_ptr_to_sei < 0xFF)   /* a byte below 0xFF ends the codeword */
        break;
    byte_ptr_to_sei++;             /* 0xFF is an escape: add the next byte */
}
`
The second codeword of the SEI header indicates the SEI payload size in bytes. The SEI payload size shall be coded in the same way as the SEI payload type.
The SEI payload may have an SEI payload header. For example, a payload header may indicate to which picture the particular data belongs. The payload header shall be defined separately for each payload type.
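A complete SEI header (payload type followed by payload size) can be read with the same escape rule. In this sketch the function names are hypothetical, and, unlike the fragment above, the read position is advanced past the final byte of each codeword so that the size codeword and then the payload can follow.

```c
#include <stddef.h>
#include <stdint.h>

/* Reads one SEI header codeword: bytes are summed until a byte below 0xFF
 * is found. Advances *pos past the codeword; returns -1 if truncated. */
static long read_sei_codeword(const uint8_t *buf, size_t len, size_t *pos)
{
    long value = 0;
    while (*pos < len) {
        uint8_t byte = buf[(*pos)++];
        value += byte;
        if (byte < 0xFF)
            return value;
    }
    return -1;
}

/* Hypothetical helper: parse payload type and payload size (in bytes)
 * starting at *pos. Returns 0 on success, -1 if the buffer is truncated. */
int parse_sei_header(const uint8_t *buf, size_t len, size_t *pos,
                     long *payload_type, long *payload_size)
{
    *payload_type = read_sei_codeword(buf, len, pos);
    if (*payload_type < 0)
        return -1;
    *payload_size = read_sei_codeword(buf, len, pos);
    if (*payload_size < 0)
        return -1;
    return 0;
}
```

For the bytes 0xFF 0x05 0x03 ..., the payload type is 255 + 5 = 260, the payload size is 3 bytes, and the payload starts at offset 3.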
`
`
`
`
`Appendix I
`
`Non-normative Encoder Recommendation
`
I.1 Motion Estimation and Mode Decision

I.1.1 Low-complexity mode

I.1.1.1 Finding optimum prediction mode
For both intra prediction and motion-compensated prediction, a loop similar to the one indicated in FIGURE 23 is run. The different elements are described below.
`
`FIGURE 23
`
`Loop for prediction mode decision
`
I.1.1.1.1 SA(T)D0
The SA(T)D to be minimised is initially given a 'bias' value in order to favour prediction modes that need few bits to be signalled. This bias is basically a parameter representing bit usage times QP0(QP).
• Intra mode decision:
  SA(T)D0 = QP0(QP) × Order_of_prediction_mode (see above)
• Motion vector search:
  SA(T)D0 = QP0(QP) × (Bits_to_code_vector + 2 × code_number_of_ref_frame)
In addition, there are two special cases:
• For motion prediction of a 16x16 block with 0 vector components, 16 × QP0(QP) is subtracted from SA(T)D to favour the skip mode.
• For the whole intra macroblock, 24 × QP0(QP) is added to the SA(T)D before comparison with the best SA(T)D for inter prediction. This is an empirical value to prevent using too many intra blocks.
`
I.1.1.1.2 Block difference
For the whole block, the difference between the original and the prediction is produced:
Diff(i,j) = Original(i,j) - Prediction(i,j)
`
I.1.1.1.3 Hadamard transform
For integer pixel search (see below), the decision is based on the SAD of Diff(i,j). Hence no Hadamard transform is applied, and SAD is used instead of SATD:
`
`
`
`
$$ SAD = \sum_{i,j} |Diff(i,j)| $$
However, since Diff(i,j) will be transformed before transmission, a better optimisation is obtained if a transform is applied before producing the SAD. Therefore a two-dimensional transform is performed in the decision loop for selecting intra modes and for fractional pixel search (see below). To simplify implementation, the Hadamard transform is chosen in this mode decision loop. The relation between pixels and basis vectors (BV) in a 4-point Hadamard transform is illustrated below (not normalized):
          Pixels ->
BV1     1    1    1    1
BV2     1    1   -1   -1
BV3     1   -1   -1    1
BV4     1   -1    1   -1
This transformation is performed horizontally and vertically and results in DiffT(i,j). Finally, the SATD for the block and the present prediction mode is produced:
$$ SATD = \left( \sum_{i,j} |DiffT(i,j)| \right) / 2 $$
Choose the prediction mode that results in the minimum SA(T)D, SA(T)Dmin.
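The SAD and SATD computations for a 4x4 difference block can be sketched in C. The basis vectors below are one standard unnormalized 4-point Hadamard ordering (any row ordering gives the same SATD); the function names and the row-major 16-element layout, diff[4*i + j], are assumptions for illustration.

```c
#include <stdlib.h>

/* Unnormalized 4-point Hadamard basis vectors (one per row). */
static const int H4[4][4] = {
    {1,  1,  1,  1},
    {1,  1, -1, -1},
    {1, -1, -1,  1},
    {1, -1,  1, -1}
};

/* SAD of a 4x4 difference block, diff[4*i + j] = Diff(i,j). */
int sad4x4(const int *diff)
{
    int sum = 0;
    for (int k = 0; k < 16; k++)
        sum += abs(diff[k]);
    return sum;
}

/* SATD: Hadamard transform horizontally and vertically, then halve the
 * sum of absolute transformed values. */
int satd4x4(const int *diff)
{
    int tmp[4][4], t[4][4], sum = 0;
    for (int i = 0; i < 4; i++)              /* horizontal pass over each row */
        for (int k = 0; k < 4; k++) {
            tmp[i][k] = 0;
            for (int j = 0; j < 4; j++)
                tmp[i][k] += H4[k][j] * diff[4 * i + j];
        }
    for (int j = 0; j < 4; j++)              /* vertical pass over each column */
        for (int k = 0; k < 4; k++) {
            t[k][j] = 0;
            for (int i = 0; i < 4; i++)
                t[k][j] += H4[k][i] * tmp[i][j];
        }
    for (int i = 0; i < 4; i++)
        for (int j = 0; j < 4; j++)
            sum += abs(t[i][j]);
    return sum / 2;
}
```

A flat difference block of ones has SAD 16 but SATD 8, because all its energy lands in a single DC coefficient; an isolated difference of 1 has SAD 1 but SATD 8, because it spreads over all coefficients. This is why SATD tracks transform-domain coding cost better than SAD.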
`
I.1.1.2 Encoding on macroblock level

I.1.1.2.1 Intra coding
When starting to code a macroblock, intra mode is checked first. For each 4x4 block, the full coding loop indicated in FIGURE 23 is performed. At the end of this loop, the complete macroblock is intra coded and SATD_intra is calculated.
`
I.1.1.2.2 Table for intra prediction modes to be used at the encoder side
TABLE 14 gives the table of intra prediction modes according to the probability of each mode to be used on the decoder side. On the encoder side, a sort of inverse table is needed. Prediction modes for A and B are known as in TABLE 1. The encoder has found a mode that it wants to signal with an ordering number in the bitstream (whereas the decoder receives the order in the bitstream and wants to convert it to a mode). TABLE 14 is therefore the relevant table for the encoder. Example: the prediction mode for A and B is 2. The string in TABLE 14 is 2 1 0 3 4 5. This indicates that prediction mode 0 has order 2 (third most probable), prediction mode 1 is second most probable, prediction mode 2 has order 0 (most probable), and so on. As in TABLE 1, '-' indicates that this instance cannot occur because A or B or both are outside the picture.
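Turning an encoder-side TABLE 14 entry into the decoder-side mapping is a simple inversion. The sketch below assumes each entry is given as a six-character string such as "210345", where position k holds the order of prediction mode k; the function name is hypothetical.

```c
/* Hypothetical helper: invert one encoder-side entry (order per mode) into
 * a decoder-side mapping (mode per order). '-' marks modes that cannot
 * occur; the corresponding orders stay -1. */
void invert_order_table(const char *encoder_entry, int mode_for_order[6])
{
    for (int k = 0; k < 6; k++)
        mode_for_order[k] = -1;
    for (int mode = 0; mode < 6 && encoder_entry[mode] != '\0'; mode++) {
        char c = encoder_entry[mode];
        if (c >= '0' && c <= '5')
            mode_for_order[c - '0'] = mode;
    }
}
```

For the example in the text, "210345" maps order 0 to mode 2, order 1 to mode 1 and order 2 to mode 0, with orders 3 to 5 mapping to modes 3 to 5.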
`
TABLE 14

Prediction ordering to be used in the bitstream as a function of prediction mode (see text).
B\A
3
outside
012---
outside
`0-----
`0---12
`113025
`032145
`0---12
`132045
`0---12
`143025
`0---12
`245103
`1---02
`1---20
`245130
`
`012---
`C352H
`024315
`032415
`145203
`14520:l
`245301
`
`012---
`045213
`015324
`013245
`115032
`145302
`135420
`
`1
`2
`
`021---
`025314
`014325
`012345
`135021
`145203
`245310
`
`102---
`104325
`102435
`102345
`214035
`125403
`015432
`
`2
`120---
`240135
`130245
`210345
`320151
`250314
`120334
`
I.1.1.2.3 Inter mode selection
`
`
`
`
Next, motion vector search is performed for motion-compensated prediction. A search is made over all 7 possible block structures for prediction and over the 5 past decoded pictures. This results in 35 combinations of block sizes and reference frames.
`
I.1.1.2.4 Integer pixel search
`The search positions are organised in a 'spiral' structure around the predicted vector (see vector
`prediction). The number