ITU Telecommunication Standardization Sector
Study Group 15, Working Party 15/1
Expert's Group on Very Low Bitrate Video Telephony

LBC-97-029
February 1997

Title:   Proposal for Advanced Video Coding
Source:  Nokia Research Center¹
Purpose: Proposal

[Exhibit stamp: Annex D 7, in re Google v. VSL, nullity action of 8 April 2015, Federal Patent Court Munich, Quinn Emanuel LLP]
Introduction
This document presents a low bit rate video coding scheme developed at Nokia Research Center. The simulation results reported in this document indicate that the scheme consistently achieves higher coding efficiency than the H.263 video coder. This contribution is a continuation of the proposal presented in document LBC-96-91 at the July 1996 London ITU-T meeting.
The Appendix of this document contains a detailed description of the proposed video coder, including a brief description of the novel elements of the coder as well as a description of the bitstream syntax.

Video coder

The video coder presented in this contribution contains three major elements that distinguish it from the VM codec and that prove to be the source of its improved coding efficiency. These elements are:
1. Rough segmentation of the video frame into arbitrarily shaped regions composed of 4-connected 8x8 pixel blocks. The segmentation allows compact encoding of motion vector fields and can be described with relatively few bits.
2. A motion compensation scheme utilizing the above-mentioned segmentation and a quadratic motion field model which enables very accurate prediction. Motion fields are compactly encoded using 2-D separable orthogonal polynomials.
3. A powerful VQ and Multi-Shape DCT based scheme utilizing spatial properties of the prediction frame for efficient coding of the residual error.

Coding Results
The performance of the proposed coder has been compared to the H.263 video coder. The Telenor implementation of H.263 (TMN 1.6c) with Advanced Prediction Mode and Unrestricted Motion Vectors was used in all the comparisons. MPEG-4 test sequences of Class A and Class B in QCIF resolution were used in the simulations.
Both schemes operated at a fixed frame rate and with a fixed value of the quantiser parameter. In all simulations the Nokia coder used the same first INTRA frame as H.263.

In the first experiment the objective quality of reconstructed sequences at approximately equal bit rates was compared. Average² Peak-Signal-to-Noise Ratio (PSNR) was used as the measure of quality. Simulation results are collected in Table 1.

¹ Contact person: M. Karczewicz, email: marta.karczewicz@research.nokia.fi
² Obtained by averaging the PSNRs of individual frames.

Description of the Nokia Coder

Vedanti Systems Limited - Ex. 2008
Page 1


Sequence     | Picture size | Intra QP | NOKIA QP | NOKIA bit rate [kbps] | NOKIA PSNR (Y/C) [dB] | H.263 QP | H.263 bit rate [kbps] | H.263 PSNR (Y/C) [dB] | PSNR improvement (Y/C) [dB]
Mthr&Dtr     | qcif         | 8        | 15       | 10.14                 | 34.5/40.5             | 12       | 10.26                 | 33.5/39.4             | 1.0/1.1
Hall Objects | qcif         | 8        | 13       | 10.22                 | 33.7/39.1             | 15       | 10.21                 | 32.4/38.5             | 1.3/0.6
Container    | qcif         | 8        | 13       | 9.88                  | 33.3/38.8             | 15       | 10.00                 | 31.0/37.7             | 2.3/1.1
Akiyo        | qcif         | 4        | 7        | 9.77                  | 38.6/42.3             | 9        | 10.28                 | 35.9/40.9             | 2.7/1.4
Mthr&Dtr     | qcif         | 4        | 7        | 20.68                 | 37.2/42.5             | 8        | 21.10                 | 35.7/41.2             | 1.5/1.3
Silent Voice | qcif         | 8        | 10       | 24.19                 | 33.5/37.8             | 10       | 24.30                 | 32.6/37.2             | 0.9/0.6
Container    | qcif         | 4        | 7        | 24.07                 | 36.4/41.0             | 9        | 23.85                 | 33.9/39.7             | 2.5/1.3
News         | qcif         | 8        | 12       | 24.91                 | 33.8/37.8             | 13       | 24.86                 | 31.7/36.6             | 2.1/1.2
Foreman      | qcif         | 8        | 10       | 47.03                 | 34.3/38.8             | 10       | 47.43                 | 32.3/38.0             | 2.0/0.8
Coastguard   | qcif         | 8        | 13       | 49.12                 | 30.4/41.3             | 13       | 49.51                 | 29.5/40.6             | 0.9/0.7

Table 1: Comparison of reconstruction PSNRs at equal bit rates.

Results in Table 1 show that the Nokia coder achieves 0.9 to 2.7 dB higher reconstruction PSNR than H.263.
The purpose of the second experiment was to find out the bit rate reductions achieved by the Nokia coder when compared to H.263 at approximately equal reconstruction PSNRs. As before, all the simulations were obtained using fixed values of the quantiser parameters. The results of the simulations are collected in Table 2.
As can be seen in Table 2, the Nokia coder enables bit rate reductions between 33 and 53%. Experiments not shown in this document confirm similar improvements over H.263 when PB-frames are used in both schemes. It was also noted that the improvement over H.263 tends to be higher when the quality of the first frame improves.

Sequence  | Intra QP | NOKIA QP | NOKIA bit rate [kbps] | NOKIA PSNR (Y/C) [dB] | H.263 QP | H.263 bit rate [kbps] | H.263 PSNR (Y/C) [dB] | Bit rate reduction [%] | PSNR diff. (Y/C) [dB]
Akiyo     | 4        | 7        | 11.11                 | 39.0/42.3             | 5        | 23.55                 | 39.0/42.2             | 53                     | 0.0/0.1
Container | 8        | 9        | 14.14                 | 34.4/39.3             | 7        | 28.48                 | 34.5/39.4             | 50                     | -0.1/-0.1
Mthr&Dtr  | 8        | 11       | 12.19                 | 35.1/40.8             | 8        | 18.23                 | 35.2/40.4             | 33                     | -0.1/0.4
Mthr&Dtr  | 4        | 7        | 26.66                 | 38.1/42.6             | 5        | 44.67                 | 38.1/42.7             | 40                     | 0.0/-0.1
Foreman   | 8        | 8        | 55.70                 | 35.2/39.6             | 6        | 87.64                 | 35.1/40.0             | 36                     | 0.1/-0.4

Table 2: Bit rate reductions achieved by the Nokia coder.

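The reduction column of Table 2 follows directly from the two bit rate columns. As a quick arithmetic check, a minimal sketch (plain Python, rates copied from the table):

```python
# Bit rate reduction [%] = (1 - NOKIA rate / H.263 rate) * 100,
# rounded to the nearest integer. Rates are in kbps, taken from Table 2.
rows = [
    ("Akiyo",     11.11, 23.55),
    ("Container", 14.14, 28.48),
    ("Mthr&Dtr",  12.19, 18.23),
    ("Mthr&Dtr",  26.66, 44.67),
    ("Foreman",   55.70, 87.64),
]
reductions = [round((1 - nokia / h263) * 100) for _, nokia, h263 in rows]
print(reductions)  # -> [53, 50, 33, 40, 36]
```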
Conclusions

The video coder presented in this proposal is shown to achieve higher coding efficiency than the H.263 video coder. The objective improvements achieved by the Nokia coder range from 0.9 to 2.7 dB, which translates to bit rate savings between 33 and 53%. In light of the above, we believe that the codec has good potential to serve as a basis for the development of the ITU-T long-term videotelephony standard H.263L.

`

`
NOKIA RESEARCH CENTER

APPENDIX

DESCRIPTION OF NOKIA VERY LOW BIT RATE VIDEO CODER
version 3.0

`

`
1. INTRODUCTION

2. CODER DESCRIPTION

2.1 Video Source Coding Algorithm
2.1.1 Definitions
2.1.2 Coding modes

2.2 Frame Segmentation
2.2.1 Splitting
2.2.2 Merging

2.3 Motion Field Coding
2.3.1 Motion model
2.3.2 Motion coefficient scaling and quantization
2.3.2.1 Motion coefficient scaling
2.3.2.2 Motion coefficient quantization
2.3.3 Motion coefficient coding
2.3.4 Image interpolation

2.4 Multi-Transform Prediction Error Coding
2.4.1 Overview
2.4.2 8x8 Classification
2.4.3 4x4 Classification
2.4.3.1 Variance Classification
2.4.3.2 Rate Classification
2.4.3.3 Directionality Classifier
2.4.4 Coding Methods
2.4.4.1 8x8 Coding Methods
2.4.4.2 4x4 Coding Methods
2.4.5 Method Selection
2.4.6 Decoding process

2.5 Chrominance coding

3. PB-FRAME MODE

3.1 Introduction

3.2 Region Layer

3.3 PB-frames and INTRA regions

3.4 Calculation of the motion vector field for the B-picture

3.5 Prediction of a B-region in a PB-frame

3.6 Reconstruction of a B-region in a PB-frame

4. SYNTAX AND SEMANTICS

4.1 Picture Layer
4.1.1 Picture Start Code (PSC) (22 bits)
4.1.2 Picture Type Information (PTYPE) (3 bits)
4.1.3 Split Information (SPLIT) (Variable Length)
4.1.4 Merge Information (MERGE) (Variable Length)
4.1.5 Quantizer Information (PQUANT) (5 bits)
4.1.6 Quantization Information for B-pictures (PBQUANT) (2 bits)
4.1.7 Coded/NotCoded Information for Chrominance (CNCC) (Variable Length) and Coded/NotCoded Information for Chrominance of B Frames (CNCCB) (Variable Length)

4.2 Region Layer
4.2.1 Region Type Information (RTYPE) (Variable Length)
4.2.2 Region Type Information for B Frame Region (RTYPEB) (Variable Length)
4.2.3 Quantization Parameter Offset for Inter and Intra Regions (RDQUANT) (Variable Length)
4.2.4 Motion Information (MINFO), Motion Information for B-region (MINFOB) (Variable Length)
4.2.5 Coded/NotCoded Information for B Frame Region Luminance (CNCYB) (Variable Length) and Coded/NotCoded Information for P Region Luminance (CNCY) (Variable Length)

4.3 Block Layer, Block Layer B and Block Layer C
4.3.1 Block layer
4.3.2 8x8 Coding Method Type (MTYPE8) and 4x4 Coding Method Type (MTYPE4) (Variable Length)
4.3.3 INTRADC (Fixed Length) and VLC for Coding Methods (MVLC) (Variable Length)
4.3.4 Block layer B
4.3.5 Block layer C

5. REFERENCES

`
`

`
1. Introduction

This document describes a scheme for compression of video at very low bit rates developed at Nokia Research Center. The primary objective of the scheme is to achieve higher coding efficiency than the H.263 video coder and the MPEG-4 Verification Model. The video coder described in this document includes a number of novel elements which contribute to its improved performance. These elements are:

• segmentation of coded frames into regions obtained by quadtree splitting followed by merging,
• very accurate motion compensated prediction utilizing 2-D orthogonal polynomials for encoding of motion vector fields,
• a multiple choice coding scheme applied to the motion compensated prediction error.

In order to facilitate fair comparison with H.263, the coder described in this document has been designed taking into account algorithmic delay constraints similar to those imposed on the H.263 codec. Also, the bitstream syntax was kept, whenever possible, similar to H.263. The major syntactic difference is the adoption of a Region Layer in place of the H.263 Macroblock Layer. Frame segmentation into regions is defined by segmentation information contained in the Picture Layer of the bitstream. The syntax of the coder allows straightforward extension to support arbitrarily-shaped frame segmentation.

Unless otherwise specified in this document, the video source is assumed to be in the QCIF format (176 x 144 pixels for luminance, 88 x 72 pixels for chrominance). The coder also supports other source formats.

2. Coder Description

2.1 Video Source Coding Algorithm

The schematic diagram of the encoder is shown in Figure 2-1. The main novel elements of the coder are the motion field encoding scheme and the prediction error coding scheme, shown in the figure as Inter Coding, which utilizes the motion compensated prediction frame (as shown in Figure 2-1) to improve coding efficiency.

`
`

`
[Figure 2-1: Schematic diagram of the encoder, with coder control, motion estimation and Intra/Inter coding blocks. Legend: E: Prediction Error Frame; Î: Prediction Frame; Ẽ: Reconstructed Prediction Error; I: Original Current Frame; Ĩ: Reconstructed Current Frame. The coded data and the motion information are multiplexed and sent to the decoder.]

`
2.1.1 Definitions
Reference frame: The most recently reconstructed frame. Denoted I_{n-1}.
Current frame: The frame that is being coded at the current moment. Denoted I_n.
Prediction frame: The motion compensated prediction of the current frame. Denoted Î_n.
I type frame: A frame which is coded independently of the reference frame.
P type frame: A frame coded by motion compensated prediction.
PB type frame: A PB-frame consists of two frames being coded as one unit. The name PB comes from the names of the picture types in Recommendation H.263.
Region: A semi-arbitrarily shaped segment of the current frame. The term 'semi-arbitrarily shaped' refers to the convention that the building blocks of the regions are n x n pel blocks. In the current implementation, n = 8.
UNCHANGED type region: A region which is found to have very little temporal change. Such regions are coded by copying the corresponding area from the reference frame.
INTRA type region: A region which is decoded independently of the reference frame.
INTER type region: A region coded by motion compensated prediction using the reference frame. Information for such regions includes motion information and coded prediction error.
NO MOTION INTER type region: A region for which the difference between the current frame and the reference frame is coded. Information for such regions includes only coded prediction error.
INTER-B type region: (This type of region can occur only in B type frames of a PB unit.) A region which is coded by bidirectional motion compensated prediction using both the reference frame and the reconstructed P-frame.

Motion vector: A pair of numbers [Δx(x,y), Δy(x,y)] where Δx(x,y) and Δy(x,y) are the values of the horizontal and vertical displacements of the pel at location (x, y), respectively.
Motion vector field: The set of motion vectors of all pels in an INTER type region.
Motion model: A parametric formula describing the values of motion vectors. The motion model used in the Nokia coder is a second order polynomial model (see Section 2.3.1 for details).
Motion coefficients: Coefficients needed to reconstruct the motion of a region, as defined by the underlying motion model. The Nokia coder uses a motion model with 12 coefficients.

2.1.2 Coding modes

The coding mode defines the region type in the current frame. Each frame type has its own set of coding modes. The coding mode decision is sent to the decoder.

If the current frame is P type, its regions can be of type UNCHANGED, INTER or INTRA. In UNCHANGED mode, no further information is sent. In INTER mode, the motion coefficients and the prediction error for the region are sent. In INTRA mode, the contents of the region are coded as such.

For PB-type frames, the region types are determined independently for the P- and B-frames. For the P-component, the possible region types are UNCHANGED, INTER-B or INTRA. In UNCHANGED mode, no further information is sent. In INTER-B mode, the motion coefficients and the prediction error for the region are sent. In INTRA mode, both the motion coefficients and the contents of the region are sent. For the B-component, there are 8 possible region types. These types are determined by independently indicating the usage of the following three coding elements:

- the differential correction to the motion parameters inherited from the region in the P-component (similar to delta motion vectors in PB-frames of H.263),
- the prediction error information,
- backward prediction.

The coding of INTRA-type frames is adopted from Recommendation H.263 [3].

2.2 Frame Segmentation

Throughout this section the current frame is assumed to consist of the luminance component only. The division of the current frame into segments is described in two steps: splitting (SPLIT bits) and merging (MERGE bits). The SPLIT and MERGE bitstream can describe any segmentation consisting of a combination of 8 x 8 pel blocks.

The decoder starts with a fixed partitioning of the current frame into 32 x 32 pel segments (initial segmentation), shown by the solid lines in Figure 2-2³. The received SPLIT bits indicate which of those segments should be further divided into smaller segments. The received MERGE bits indicate how the resulting split segments should be recombined to form the final frame segmentation.

The encoder starts the segmentation from the 32 x 32 pel regions. The motion of these regions is estimated, and each region not satisfying a normalized prediction error criterion is split into smaller blocks. Splitting and motion estimation for regions proceed recursively until the measure of prediction error for the region falls below a given threshold, or until the minimum size of the region (8x8 pixels) is reached. The next step in the encoder is the merging, during which some of the neighboring blocks are combined using a rate-distortion criterion. This two-stage process can result in a segmentation consisting of regions which are combinations of 4-connected 8 x 8 pel blocks (Figure 2-5).
[Figure 2-2: Initial segmentation of the coded frame (176 x 144). Solid lines show the initial partitioning into 32 x 32 pel segments; dashed lines show the 16 x 16 subdivisions available to the first split level.]

³ QCIF resolution frames cannot be fully divided into 32 x 32 blocks, which is why the initial partitioning of the QCIF frame includes 16 x 32, 32 x 16 and 16 x 16 blocks on the frame boundaries.

`
`

`
2.2.1 Splitting

Split information is sent from the encoder to the decoder as two sequences of bits (SPLIT bits). Bits of the first sequence indicate the splitting of all the regions in the initial segmentation which are greater than 16x16 pel (i.e. 32x32, 32x16 and 16x32) into four or two 16x16 regions, as shown by the dashed lines in Figure 2-2. We refer to this operation as the first split level. An example of the regions resulting from the first split is shown in Figure 2-3.

The bits of the second sequence indicate which of the 16x16 regions present in the frame segmentation after performing the first split level should be split into four 8x8 regions. We refer to this operation as the second split level. An example of the allowed partitioning for the second split is shown in Figure 2-3. Figure 2-4 shows an example frame segmentation after two levels of split.

Each of the resulting two SPLIT bit sequences is coded using an entropy coded run-length technique described in Section 4.1.3.
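The two split levels can be illustrated with a short sketch. This is not the bitstream decoder (the entropy coded run-length technique of Section 4.1.3 is omitted); it only expands already-decoded split flags into block sizes, and all names are illustrative:

```python
def apply_splits(initial_blocks, first_level_bits, second_level_bits):
    """initial_blocks: list of (width, height) in pels, each 16 or 32."""
    first = iter(first_level_bits)
    after_first = []
    for w, h in initial_blocks:
        # First split level: one bit per block larger than 16x16.
        # A 32x32 block splits into four 16x16, a 32x16 or 16x32 into two.
        if max(w, h) > 16 and next(first):
            after_first += [(16, 16)] * ((w // 16) * (h // 16))
        else:
            after_first.append((w, h))
    second = iter(second_level_bits)
    final = []
    for w, h in after_first:
        # Second split level: one bit per 16x16 block, splitting it
        # into four 8x8 blocks when the bit is set.
        if (w, h) == (16, 16) and next(second):
            final += [(8, 8)] * 4
        else:
            final.append((w, h))
    return final

# A 32x32 block that splits fully, next to a 32x16 boundary block that does not.
blocks = apply_splits([(32, 32), (32, 16)], [1, 0], [0, 1, 1, 0])
```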
`
2.2.2 Merging
The next step in building the segmentation is the merging of regions produced by the above splitting procedure. In this step, the encoder merges those neighboring regions which satisfy a prediction error criterion for the hypothetical region that would result from merging the two neighboring regions. We refer to this merging algorithm as Motion Assisted Merging. The algorithm is described in [2].
The encoder informs the decoder about the merge/not-merge decisions using MERGE bits, which are inserted in the bitstream right after the SPLIT bits. MERGE bits refer to an adjacency graph, which is initialized at the beginning of the transmission and updated as merging proceeds, according to the following rules:
1. Initialize the adjacency graph:
   1.1 Assign a unique label to each of the split segments: scan the segmentation image from left to right, top to bottom with a step of one 8x8 pel block; every time a new segment is encountered, assign it a new label, incremented by one with respect to the previous label.
   1.2 Associate with each segment an array of the labels of its neighboring segments, using the 4-connectivity rule⁴. These arrays are initially sorted according to increasing index of the segment labels.
2. Construct the MERGE bit sequence by parsing the adjacency graph from the segment with the lowest index to the segment with the highest index, and parsing each array of neighbors from start to end. Generate bits for merge/not-merge decisions only for those neighbors having a higher segment index than the index of the segment being processed. 0 indicates that the two segments remain intact, 1 indicates that the two segments are merged.
3. Every time two segments are merged, update the adjacency graph before proceeding with encoding/decoding subsequent MERGE bits:
   3.1 The whole area of the merged segment should be labeled with the lower label. The higher label should be removed from the adjacency graph completely.
   3.2 Adjacency relations must be updated. Specifically, if the segment labeled i is merged with the segment labeled j, where i < j, do the following for each of the indices k in the array of neighbors of segment j, parsing the array from start to end: if k is a common neighbor of i and j, proceed to the next k; else concatenate the index k to the end of the neighbor array of segment i.

Figure 2-5 shows a possible segment layout after merging.

⁴ In this context, the 4-connectivity rule means that two segments are neighbors if they have at least a line segment as their common border.
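The update rules above can be condensed into a decoder-side sketch. The entropy decoding of the MERGE bits themselves is omitted (the bits are assumed already available), and the function and variable names are ours:

```python
def decode_merge(neighbors, merge_bits):
    """neighbors: label -> ordered list of 4-connected neighbor labels.
    Returns a map from each removed (higher) label to its surviving label."""
    bits = iter(merge_bits)
    merged_into = {}
    for i in sorted(neighbors):
        if i in merged_into:
            continue                  # label removed from the graph (rule 3.1)
        k = 0
        while k < len(neighbors[i]):  # the array may grow while we scan it
            j = neighbors[i][k]
            k += 1
            if j <= i or j in merged_into:
                continue              # bits exist only for higher-index neighbors
            if next(bits):            # 1 = merge, 0 = the segments stay intact
                merged_into[j] = i
                # Rule 3.2: append j's neighbors that i does not already have.
                for n in neighbors[j]:
                    if n not in (i, *neighbors[i]):
                        neighbors[i].append(n)
    return merged_into

# Three segments in a row: 1 and 2 merge (bit 1); the grown segment and 3 do not.
result = decode_merge({1: [2], 2: [1, 3], 3: [2]}, [1, 0])
```

Because inherited neighbors are appended to the end of the surviving segment's array and the scan re-checks the array length, the bits for the enlarged segment's new neighbors are consumed in the same pass, matching rule 2.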
`
`
`

`
[Figure 2-3: Example region boundaries after the first level split (176 x 144 frame). Dashed lines indicate candidates for the second level split.]

[Figure 2-4: Example region boundaries after the second level split.]

[Figure 2-5: Example regions after merging.]

`
2.3 Motion Field Coding
In this document only the decoder specific features of motion vector field coding are described. A detailed description of motion estimation and encoding (Motion Assisted Merging and Coefficient Removal) can be found in [1,2].
Motion compensated prediction of a region resulting from the segmentation process described above is performed according to the following equation:
`
`
`

`
Î_n(x,y) = I_{n-1}[x + Δx(x,y), y + Δy(x,y)]    ( 2-1 )

where Î_n is the prediction frame, I_{n-1} is the previous reconstructed frame (the reference frame), and the pair of numbers [Δx(x,y), Δy(x,y)] is the motion vector of the pel at location (x, y).

The chrominance motion vectors are calculated by evaluating the luminance motion vector field at the half-pel position corresponding to the location of the chrominance pel and dividing the resulting motion vector by two.
`
2.3.1 Motion model
The motion field of each INTER region is represented by a set of 12 motion coefficients. These coefficients are produced in the encoder by the motion estimation and motion field encoding blocks in Figure 2-1. The relation between the motion coefficients and the actual values of the motion vectors at a given point of a region is defined by the following parametric model:

Δx(x,y) = c1·f1(x,y) + c2·f2(x,y) + c3·f3(x,y) + c4·f4(x,y) + c5·f5(x,y) + c6·f6(x,y)
Δy(x,y) = c7·f1(x,y) + c8·f2(x,y) + c9·f3(x,y) + c10·f4(x,y) + c11·f5(x,y) + c12·f6(x,y)    ( 2-2 )

where the functions f_j(·) (j = 1, 2, ..., 6) are called motion field basis functions and (x, y) are integer pixel coordinates in a system with the origin in the upper-left corner of the frame.
The motion model is based on 6 basis functions, and the same model is used for horizontal and vertical displacements. The basis functions are obtained by orthonormalizing the basis function set {1, x, y, xy, x², y²} on the bounding box of the region (an example is shown in Figure 2-6).

Hence the form of the motion field basis functions is uniquely determined by the size of the bounding box of the region and can be constructed in the decoder after the SPLIT and MERGE bits are received. The two-dimensional (2-D) basis functions f_j(·) are built as a tensor product of two sequences of one-dimensional (1-D) discrete orthonormal polynomials:

g_r(x) = Σ_{j=0..r} α_{r,j}·x^j,  r = 0, 1, 2, orthonormal on the interval [x_min, x_max]
h_r(y) = Σ_{j=0..r} β_{r,j}·y^j,  r = 0, 1, 2, orthonormal on the interval [y_min, y_max]    ( 2-3 )

The functions f_j(·) are built as follows:

f1(x,y) = g0(x)·h0(y)    f2(x,y) = g0(x)·h1(y)    f3(x,y) = g1(x)·h0(y)
f4(x,y) = g1(x)·h1(y)    f5(x,y) = g0(x)·h2(y)    f6(x,y) = g2(x)·h0(y)    ( 2-4 )

The coefficients of the polynomials g_r(x), with L = x_max - x_min, are given by:

α_{0,0} = √(1/(L+1))

α_{1,0} = √(3L/((L+1)(L+2))) + 2·x_min·√(3/(L(L+1)(L+2)))
α_{1,1} = -2·√(3/(L(L+1)(L+2)))

α_{2,0} = √(5(L-1)L/((L+1)(L+2)(L+3))) + 6·x_min·√(5L/((L-1)(L+1)(L+2)(L+3))) + 6·x_min²·√(5/((L-1)L(L+1)(L+2)(L+3)))
α_{2,1} = -6·√(5L/((L-1)(L+1)(L+2)(L+3))) - 12·x_min·√(5/((L-1)L(L+1)(L+2)(L+3)))
α_{2,2} = 6·√(5/((L-1)L(L+1)(L+2)(L+3)))    ( 2-5 )
`
`
`

`
The respective coefficients β_{r,j} are calculated like the coefficients α_{r,j} using the above formulas, replacing x_min by y_min and setting L = y_max - y_min.
The choice of orthogonal polynomials for the basis functions was motivated by the observation that coefficients corresponding to such polynomials are less sensitive to quantization than coefficients corresponding to the basis functions {1, x, y, xy, x², y²}. Two options were considered initially:
• polynomials orthonormalized with respect to the shape of the region (using some orthogonalization method), or
• separable polynomials orthonormalized with respect to the bounding box of the region, calculated according to formulas ( 2-4 ).
It was found that the latter motion model provides equally good performance⁵ as the former one. The separable model was chosen since its basis functions are given by analytic formulas and can be computed with relatively low computational complexity.
`
[Figure 2-6: Bounding box of the region R_k, spanning from (x_min, y_min) to (x_max, y_max).]
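The coefficient formulas ( 2-5 ) can be sanity-checked numerically: with the transcription used here (the overall signs of g1 and g2 are conventions that do not affect the model), the polynomials evaluated on the integer grid of the bounding box interval should form an orthonormal set under the plain dot product. A small sketch:

```python
import math

def gram_coeffs(x_min, L):
    """Coefficients of g0, g1, g2 per ( 2-5 ), with L = x_max - x_min."""
    a00 = math.sqrt(1.0 / (L + 1))
    c1 = math.sqrt(3.0 / (L * (L + 1) * (L + 2)))
    a10, a11 = (L + 2 * x_min) * c1, -2 * c1
    c2 = math.sqrt(5.0 / ((L - 1) * L * (L + 1) * (L + 2) * (L + 3)))
    a20 = (6 * x_min**2 + 6 * L * x_min + L * (L - 1)) * c2
    a21 = -(6 * L + 12 * x_min) * c2
    a22 = 6 * c2
    return [(a00,), (a10, a11), (a20, a21, a22)]

def evaluate(coeffs, x):
    return sum(a * x**j for j, a in enumerate(coeffs))

x_min, L = 8, 15                       # e.g. a bounding box 16 pels wide
g = gram_coeffs(x_min, L)
grid = range(x_min, x_min + L + 1)
for r in range(3):
    for s in range(3):
        dot = sum(evaluate(g[r], x) * evaluate(g[s], x) for x in grid)
        assert abs(dot - (1.0 if r == s else 0.0)) < 1e-9
```

The same routine, called with y_min and L = y_max - y_min, yields the β coefficients of the h_r(y) polynomials.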
`
2.3.2 Motion coefficient scaling and quantization

2.3.2.1 Motion coefficient scaling

Scaling of the motion coefficients makes it possible to vary the bit allocation between segments having different sizes but the same size of bounding box. The scaling operation itself affects neither the bit allocation nor the precision of motion estimation: if there were no quantization of motion coefficients, scaling would not affect the codec performance at all. It is the quantization of the scaled motion coefficients which dedicates more bits (assuming that the numbers of motion coefficients are equal) to the larger segments among those having the same size of bounding box.
The motion coefficients of a segment are scaled according to the ratio between the size of the bounding box and the size of the segment. Let us denote the number of pixels in the segment by P and the number of pixels in the bounding box by P_box. The scaling factor scale equals:

scale = P_box / P    ( 2-6 )

Each of the motion coefficients of the segment is divided by the scaling factor scale. In order to keep ( 2-2 ) valid, the value of each of the basis functions ( 2-4 ) at each pixel location in the segment must be multiplied by the same scaling factor scale.
`
2.3.2.2 Motion coefficient quantization

A uniform scalar quantizer is applied to each of the non-zero motion coefficients. The quantizer step size is predefined as STEP = 3. The quantization of a coefficient c_j corresponding to the orthonormal basis functions is done according to the formula:

LEVEL = c_j // STEP,    ( 2-7 )

where // denotes division followed by a rounding operation.
Each of the non-zero motion coefficients c_j is reconstructed from the transmitted LEVEL. The following formula describes the inverse quantization process of a given coefficient, resulting in a reconstructed coefficient value c̃_j:

c̃_j = sign(LEVEL) * |LEVEL| * STEP    ( 2-8 )

The maximum encodable value of |LEVEL| is 1130. Whenever |LEVEL| exceeds the maximum encodable value it is clipped.
The encoder transmits scaled motion coefficients, so before proceeding to build the prediction frame in the decoder it is necessary to scale the basis functions as described in the previous section.

⁵ Measured in bits needed to achieve a given prediction error.
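A direct transcription of ( 2-7 ) and ( 2-8 ) is given below. The text says only "division followed by rounding", so round-half-away-from-zero is an assumption made here:

```python
STEP = 3
MAX_LEVEL = 1130   # maximum encodable |LEVEL| per the text

def quantize(c):
    # ( 2-7 ): LEVEL = c // STEP, with // meaning divide then round.
    level = int(abs(c) / STEP + 0.5) * (1 if c >= 0 else -1)
    # |LEVEL| is clipped whenever it exceeds the maximum encodable value.
    return max(-MAX_LEVEL, min(MAX_LEVEL, level))

def dequantize(level):
    # ( 2-8 ): sign(LEVEL) * |LEVEL| * STEP, which is simply LEVEL * STEP.
    return level * STEP

assert dequantize(quantize(7.4)) == 6    # 7.4 / 3 = 2.47 -> LEVEL 2
assert dequantize(quantize(-8.0)) == -9  # -8 / 3 = -2.67 -> LEVEL -3
```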
`
2.3.3 Motion coefficient coding

The motion field of an INTER type region is described by 12 motion coefficients, 6 of which relate to the horizontal and 6 to the vertical components of the motion vectors ( 2-2 ). Since the amount and type of motion in video sequences tends to vary, the method for coding motion coefficients was designed to enable adapting the complexity of the motion model to the motion in the sequence.
The motion information for a region consists of two elements: selection information and quantized coefficients. The selection information contains two 6-tuples of bits: MCP_x and MCP_y. Each bit of the 6-tuple MCP_x corresponds to one coefficient of the horizontal displacement function Δx(x,y), whereas the bits of MCP_y correspond to the coefficients of the vertical displacement function Δy(x,y) of the region. The role of these bits is to indicate whether the corresponding coefficient has a nonzero value. Thus the selection information identifies which motion coefficients are transmitted to the decoder.
Note that this structure of the information allows varying the complexity of the motion model between regions, frames and sequences. This enables the encoder to use the Coefficient Removal Algorithm to determine the importance of a given coefficient for the result of the prediction and to determine which coefficients need to be transmitted for a particular region. The algorithm is described in [2]. However, other methods for estimating and selecting the motion coefficients can be used in the encoder.
The 6-tuples of the selection information as well as the values of the quantized non-zero coefficients are Huffman coded as described in Section 4.2.4.
The coder uses a PB-frame mode adapted to operate on arbitrarily shaped regions and to utilize a polynomial motion model. If the PB-frame mode is used then, as in H.263, additional differential motion coefficients can be sent for a region to improve the prediction of the region in the B-frame. The implementation of the PB-frame mode is described in detail in Section 3.
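How the selection information gates the transmitted coefficients can be sketched as follows (the Huffman decoding of Section 4.2.4 is omitted; the bits and coefficient values are assumed already decoded, and the names are ours):

```python
def expand_coefficients(mcp_x, mcp_y, values):
    """mcp_x, mcp_y: 6-tuples of 0/1 selection bits; values: the
    transmitted (dequantized) coefficients, in order. Returns the full
    12-coefficient vector c1..c12 of ( 2-2 ), zeros where no bit is set."""
    it = iter(values)
    return [next(it) if bit else 0.0 for bit in (*mcp_x, *mcp_y)]

# Pure translation: only c1 (horizontal) and c7 (vertical) are nonzero,
# so just two coefficient values appear in the bitstream.
coeffs = expand_coefficients((1, 0, 0, 0, 0, 0),
                             (1, 0, 0, 0, 0, 0), [2.5, -1.0])
```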
`
2.3.4 Image interpolation

Since motion vectors can have non-integer values, motion compensated prediction requires evaluating the luminance and chrominance values at non-integer locations (x, y) in the reference frame I_{n-1}. The interpolation of luminance and chrominance values is done by cubic convolution interpolation using the pixel values in a 4x4 neighborhood [4].
For a frame of size M x N, the pixels have coordinates x_j = 0, 1, ..., M-1 and y_k = 0, 1, ..., N-1. Let (x_j, y_k) be such that x_j ≤ x < x_{j+1} and y_k ≤ y < y_{k+1}. The cubic convolution interpolation at the point (x, y) is defined as:

I_{n-1}(x,y) = Σ_{l=-1..2} Σ_{m=-1..2} c_{j+l,k+m} · u(x - x_{j+l}) · u(y - y_{k+m})    ( 2-9 )

where the 1-D interpolation function is defined as:

u(s) = (3/2)|s|³ - (5/2)|s|² + 1,           0 ≤ |s| < 1
u(s) = -(1/2)|s|³ + (5/2)|s|² - 4|s| + 2,   1 ≤ |s| < 2
u(s) = 0,                                   2 ≤ |s|    ( 2-10 )

For pixels within the frame boundaries, the c_{jk}'s are given by c_{jk} = I_{n-1}(x_j, y_k). Luminance values c_{jk} residing outside the frame which are needed for interpolation are obtained by duplicating the luminance values of the pixels on the boundary of the frame.
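A sketch of ( 2-9 ) and ( 2-10 ) for a single luminance sample follows. The scanned copy of ( 2-10 ) preserved only the intervals, so the polynomial pieces used here are the standard cubic convolution kernel of [4]; that reconstruction is an assumption:

```python
def kernel(s):
    # ( 2-10 ): 1-D cubic convolution interpolation function.
    s = abs(s)
    if s < 1:
        return 1.5 * s**3 - 2.5 * s**2 + 1.0
    if s < 2:
        return -0.5 * s**3 + 2.5 * s**2 - 4.0 * s + 2.0
    return 0.0

def interpolate(frame, x, y):
    """Evaluate I_{n-1} at a possibly non-integer (x, y), per ( 2-9 ).
    frame is a 2-D list indexed [row][column]."""
    height, width = len(frame), len(frame[0])
    j, k = int(x), int(y)
    total = 0.0
    for l in range(-1, 3):
        for m in range(-1, 3):
            # Samples outside the frame duplicate the boundary pixels.
            px = min(max(j + l, 0), width - 1)
            py = min(max(k + m, 0), height - 1)
            total += frame[py][px] * kernel(x - (j + l)) * kernel(y - (k + m))
    return total

frame = [[10, 20, 30, 40] for _ in range(4)]
assert interpolate(frame, 1.0, 1.0) == 20.0   # integer positions are exact
assert interpolate(frame, 1.5, 1.0) == 25.0   # half-pel on a linear ramp
```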
`
2.4 Multi-Transform Prediction Error Coding

2.4.1 Overview
The Multi-Transform Prediction Error Coding technique used in this coder is based on the observation that in typical video sequences the prediction error (the residual error after motion compensation) is concentrated near the contours of moving objects. Knowledge of the localization of the prediction error can be used to improve coding efficiency by using transforms with better localization properties.
The exact location of the contours of moving objects is not known in general. However, the locations can be approximately determined by finding edges and other discontinuities in a video frame. For this purpose, the coder utilizes the prediction frame, which is known both to the encoder and the decoder (after receiving the motion coefficients). The improved coding efficiency of the system is due to the fact that properties (location, directionality, etc.) of the prediction error signal at a given location can be inferred from the properties of the prediction frame Î_n at the same location. Thus the decoder can anticipate which coding technique(s) the encoder will choose for coding the prediction error pattern in a given block.
In the proposed coder both the encoder and the decoder include a classifier which analyses the spatial properties (location of discontinuities and their directionality) of the prediction frame Î_n. This information is used to switch between a multitude of coding methods. The decision on the best methods is made by an optimization procedure based on rate-distortion performance. The selection information is transmitted to the decoder as a variable length codeword. Among the possible methods there are Multi-Shape DCTs, extrapolation and Entropy-Constrained Vector Quantization (ECVQ).

2.4.2 8x8 Classification

The criterion for the classification of an 8x8 block, using the prediction frame Î_n, is the location of the areas with high variance of pixel values. Each 8x8 block is
