INTERNATIONAL ORGANISATION FOR STANDARDISATION
ORGANISATION INTERNATIONALE DE NORMALISATION
ISO/IEC JTC1/SC29/WG11
CODING OF MOVING PICTURES AND AUDIO

ISO/IEC JTC1/SC29/WG11 MPEG2011/N12355
November 2011, Geneva, Switzerland

Source: Video Subgroup
Status: Draft
Title: Internet Video Coding Test Model (ITM) Version 1.0
Editor: Siwei Ma, Yunfei Wang, Jianwen Chen

Table of Contents

1 Introduction
  1.1 Objective
  1.2 Technical Summary
  1.3 Prediction Technique
    1.3.1 Picture Partition
    1.3.2 Transform and Quantization
2 Terms and Definitions
  2.1 Reserved
  2.2 Bit string
  2.3 Bitstream
  2.4 Bitstream buffer
  2.5 Bitstream order
  2.6 Variable length coding
  2.7 Transform coefficient
  2.8 Encoding presentation
  2.9 Encoding process
  2.10 Encoder
  2.11 Coded picture
  2.12 Flag
  2.13 Compensation
  2.14 Residual
  2.15 Reference index
  2.16 Reference picture
  2.17 Layer
  2.18 Profile
  2.19 Non-reference picture
  2.20 Component
  2.21 Inverse transform
  2.22 Dequantization
  2.23 Block
  2.24 Block scan
  2.25 Luma
  2.26 Quantization parameter
  2.27 Quantized coefficient
  2.28 Raster scan
  2.29 Macroblock
  2.30 Macroblock address
  2.31 Macroblock line
  2.32 Macroblock position
  2.33 Backward prediction
  2.34 Partitioning
  2.35 Level
  2.36 AC coefficient
  2.37 Decode processing
  2.38 Decoding process
  2.39 Decoder
  2.40 Decoding order
  2.41 Decoded picture
  2.42 Decoded picture buffer
  2.43 Parse
  2.44 Forbidden
  2.45 X-profile decoder
  2.46 Start code
  2.47 Forward prediction
  2.48 Forward inter decoded picture
  2.49 Chroma
  2.50 Sequence
  2.51 Output reorder delay
  2.52 Output processing
  2.53 Output order
  2.54 Bidirectional prediction
  2.55 Bidirectional inter decoded picture
  2.56 Random access
  2.57 Random access point
  2.58 Stuffing bits
  2.59 Slice
  2.60 Slice header
  2.61 Skipped macroblock
  2.62 Picture reordering
  2.63 Display order
  2.64 Sample
  2.65 Width height ratio
  2.66 Sample value
  2.67 Run
  2.68 Prediction
  2.69 Prediction process
  2.70 Prediction value
  2.71 Syntax element
  2.72 Source
  2.73 Motion vector
  2.74 DC coefficient
  2.75 Frame
  2.76 Inter coding
  2.77 Inter prediction
  2.78 Intra coding
  2.79 Intra decoded picture
  2.80 Intra prediction
  2.81 Byte
  2.82 Byte alignment
3 Abbreviations
4 Conventions
  4.1 Arithmetic operators
  4.2 Logical operators
  4.3 Relational operators
  4.4 Bitwise operators
  4.5 Assignment
  4.6 Mathematical functions
  4.7 Description of bitstream syntax parsing process and decoding process
    4.7.1 Method of describing bitstream syntax
    4.7.2 Functions
    4.7.3 Descriptor
    4.7.4 Reserved, forbidden and marker bit
5 Bitstream syntax and semantics
  5.1 Structure of coded video data
    5.1.1 Video sequence
    5.1.2 Sequence header
    5.1.3 Picture
    5.1.4 Color format
    5.1.5 Picture types
    5.1.6 Order between pictures
    5.1.7 Reference picture
    5.1.8 Slice
    5.1.9 Macroblock
    5.1.10 8×8 block
    5.1.11 4×4 block
  5.2 Bitstream syntax
    5.2.1 Start codes
    5.2.2 Video sequence
    5.2.3 Extension and user data
    5.2.4 Picture
    5.2.5 Slice
    5.2.6 Macroblock
    5.2.7 Block
  5.3 Video bitstream semantics
    5.3.1 Video sequence
    5.3.2 Sequence header
    5.3.3 Extension data and user data
    5.3.4 Picture
    5.3.5 Slice
    5.3.6 Macroblock
    5.3.7 Block
6 Video decoding process
  6.1 High-level syntax structure
  6.2 Variable length decoding
    6.2.1 Initialization of the qcoder Decoder
    6.2.2 Entropy decoding processing
    6.2.3 Binary decoding method
  6.3 Inverse scanning
    6.3.1 Inverse scanning process for 4×4 block coefficients
    6.3.2 Inverse scanning process for 8×8 block coefficients
  6.4 Inverse quantization process
  6.5 Inverse transform process
    6.5.1 Inverse transform for 4×4 block
    6.5.2 Inverse transform for 8×8 block
  6.6 Intra prediction
    6.6.1 Intra prediction modes of DC coefficients
    6.6.2 Getting intra DC coefficients' prediction values
    6.6.3 Reconstruction
  6.7 Inter prediction
    6.7.1 Inter prediction modes
    6.7.2 Frame prediction modes selection
    6.7.3 Motion vectors
    6.7.4 Luma motion vectors prediction
    6.7.5 Forming predictors
    6.7.6 Skipped mode macroblocks
    6.7.7 Combining predictions
    6.7.8 Adding prediction and coefficient data
7 Description of the Internet Video Coding Encoder
  7.1 General Coding Structure
  7.2 Picture Partitioning
    7.2.1 Macroblock
    7.2.2 Slice
  7.3 Intra Prediction
  7.4 Inter Prediction
    7.4.1 Motion vector prediction
    7.4.2 Skip Mode
  7.5 Transform
    7.5.1 1-D 4-point forward transform
    7.5.2 1-D 8-point forward transform
  7.6 Quantization
  7.7 Entropy Coding
    7.7.1 Binarization and Context model Selection (CS)
    7.7.2 Initialization
  7.8 Encoder configurations
    7.8.1 Constraint set 1 configuration
    7.8.2 Constraint set 2 configuration
Annex A VLC coding table
Annex B Profiles and levels
  B.1 Profile
  B.2 Level
  B.3 Level constraints independent of profiles

1 Introduction

1.1 Objective

Internet Video Coding (IVC) is an effort to produce a video coding standard whose baseline profile complies with the IVC CfP (N12204). This work originated from the proposal made by a group of Chinese universities (M22477).

This Core Experiment (CE) document describes investigations of the coding modules in IVC and analyzes the coding performance of different configurations, with the aim of further improving the coding performance of the IVC tools included in the test model (ITM 1.0). Everybody is encouraged to propose further core experiments. Changes to the test model must comply with the IVC CfP (N12204).

Sections 5 and 6 provide the bitstream syntax and semantics and the decoder description.

Section 7 provides the encoder description.

1.2 Technical Summary

The ITM includes a set of tools for efficient video coding: intra prediction, inter prediction, transform, quantization and entropy coding. Inter prediction uses block-based motion vectors to eliminate redundancy between pictures; intra prediction uses spatial prediction modes to eliminate redundancy within a picture. The visual redundancy within a picture is eliminated by transforming and quantizing the prediction residual. Finally, motion vectors, prediction modes, quantization parameters and transform coefficients are compressed using entropy coding.

1.3 Prediction Technique

Intra prediction does not refer to other pictures, so pictures coded by intra prediction can serve as random access points of the encoded sequence.

Inter prediction refers to previously decoded pictures, and the decoding order can differ from the source picture capture order at the encoder side or the display order at the decoder side. Motion vectors for inter prediction have a precision of up to 1/4 pixel and are coded by predictive coding.
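
As a minimal sketch of what quarter-pixel precision means in practice, the following Python snippet splits one motion vector component into a whole-sample offset and a fractional phase. It assumes the component is stored as a signed integer in units of 1/4 sample; that storage convention and the name `split_quarter_pel` are illustrative assumptions, not taken from this document.

```python
def split_quarter_pel(mv):
    """Split a motion vector component into (whole samples, quarter phase).

    Assumption: mv is a signed integer counted in 1/4-sample units.
    """
    whole = mv >> 2    # arithmetic shift: floor(mv / 4), also for negatives
    phase = mv & 3     # remainder in 0..3, i.e. 0, 1/4, 1/2 or 3/4 sample
    return whole, phase


print(split_quarter_pel(9))    # (2, 1): 2 samples plus 1/4
print(split_quarter_pel(-3))   # (-1, 1): -1 sample plus 1/4
```

A non-zero phase would mean that the prediction has to be interpolated between integer sample positions.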

1.3.1 Picture Partition

The basic unit of video decoding in this part is the macroblock. A macroblock consists of a 16×16 luminance block and the corresponding chroma blocks. A macroblock can be further divided into 8×8 blocks and 4×4 blocks to perform prediction.
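
The partition sizes above can be illustrated with a short Python sketch that lists the top-left luma offsets of the sub-blocks inside one macroblock. The helper name `block_origins` and the row-by-row listing order are assumptions made for illustration; the block order actually used in the bitstream is defined elsewhere in this document and may differ.

```python
MB_SIZE = 16  # luma macroblock width and height, from the text above


def block_origins(block_size, mb_size=MB_SIZE):
    """Top-left (x, y) luma offsets of block_size x block_size blocks in a macroblock.

    Assumption: blocks are listed row by row, left to right.
    """
    steps = range(0, mb_size, block_size)
    return [(x, y) for y in steps for x in steps]


print(len(block_origins(8)))   # 4 blocks of 8x8
print(len(block_origins(4)))   # 16 blocks of 4x4
print(block_origins(8))        # [(0, 0), (8, 0), (0, 8), (8, 8)]
```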

1.3.2 Transform and Quantization

The unit of transform is an 8×8 or 4×4 block. Transform coefficients are quantized by scalar quantization.

2 Terms and Definitions

The terms and definitions below are applicable to the content in this part.

2.1 Reserved

Defines special syntax element values that may be used to extend this part in the future.

Note: These values should not appear in a bitstream that conforms to the syntax defined in this part.

2.2 Bit string

An ordered string with a limited number of bits. The leftmost bit is the most significant bit (MSB) and the rightmost bit is the least significant bit (LSB).
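
The MSB-first convention can be sketched in Python as follows. The function name `bit_string_to_uint` and the use of a text string of '0' and '1' characters are assumptions made for illustration only.

```python
def bit_string_to_uint(bits):
    """Interpret a string of '0'/'1' characters as an unsigned integer, MSB first."""
    value = 0
    for bit in bits:            # the leftmost character is processed first
        if bit not in "01":
            raise ValueError("bit string may only contain '0' and '1'")
        value = (value << 1) | int(bit)
    return value


print(bit_string_to_uint("1011"))   # 11, since 8 + 0 + 2 + 1
```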

2.3 Bitstream

The binary bit stream generated by encoding the frame.

2.4 Bitstream buffer

The buffer which stores the bitstream.

2.5 Bitstream order

The order in which the encoded frames are located in the bitstream, which is the same as the frame order in the decoding process.

2.6 Variable length coding

A reversible entropy coding process that assigns short codewords to high-frequency symbols and long codewords to low-frequency symbols.
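
To make the definition concrete, here is a small Python sketch of a variable length code. The code table is hypothetical and is not the table of Annex A; it only assumes that 'a' is the most frequent symbol and therefore gets the shortest codeword.

```python
# Hypothetical prefix-free code table: 'a' is assumed to be the most frequent symbol.
CODE_TABLE = {"a": "0", "b": "10", "c": "110", "d": "111"}
DECODE_TABLE = {code: symbol for symbol, code in CODE_TABLE.items()}


def vlc_encode(symbols):
    """Concatenate the codeword of each symbol."""
    return "".join(CODE_TABLE[s] for s in symbols)


def vlc_decode(bits):
    """Recover the symbols; works because no codeword is a prefix of another."""
    symbols, current = [], ""
    for bit in bits:
        current += bit
        if current in DECODE_TABLE:   # first match is a complete codeword
            symbols.append(DECODE_TABLE[current])
            current = ""
    if current:
        raise ValueError("trailing bits do not form a codeword")
    return "".join(symbols)


encoded = vlc_encode("aabac")
print(encoded)               # 00100110
print(vlc_decode(encoded))   # aabac
```

Five symbols take 8 bits here, against 10 bits for a fixed 2-bit code, and decoding returns the original symbols, which is what "reversible" refers to.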

2.7 Transform coefficient

A scalar in the transform domain.

2.8 Encoding presentation

The representation after the encoding process.

2.9 Encoding process

The process that generates a bitstream conforming to the description in this part.

Note: This part does not specify the encoding process.

2.10 Encoder

The realization of the encoding process.

2.11 Coded picture

The representation of one picture after the encoding process.

2.12 Flag

A binary variable.

2.13 Compensation

Obtaining the sum of the decoded residual and the corresponding prediction values.

2.14 Residual

The difference between the reconstructed samples and the corresponding prediction values.
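
The relationship between the definitions of compensation and residual can be sketched in Python as below. The function name `compensate`, the list-of-rows block layout and the clipping of results to the 8-bit sample range are assumptions made for illustration; the normative reconstruction is specified in the decoding process.

```python
def compensate(prediction, residual, bit_depth=8):
    """Add a decoded residual block to a prediction block, sample by sample.

    Assumption: results are clipped to [0, 2**bit_depth - 1].
    """
    max_value = (1 << bit_depth) - 1
    return [
        [min(max(p + r, 0), max_value) for p, r in zip(pred_row, res_row)]
        for pred_row, res_row in zip(prediction, residual)
    ]


prediction = [[100, 250], [3, 128]]
residual = [[-4, 10], [-5, 0]]
print(compensate(prediction, residual))   # [[96, 255], [0, 128]]
```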

2.15 Reference index

The number of the reference frame or the corresponding field in the frame buffer in the decoding process.

2.16 Reference picture

Picture for inter prediction of subsequent pictures in the decoding process.

2.17 Layer

A hierarchical structure in the bitstream, in which a higher layer contains the lower layers. The coding layers, from high to low, are: sequence, picture, slice, macroblock and block.

2.18 Profile

A subset of syntax, semantics and algorithms defined in this part.

2.19 Non-reference picture

Picture not used for inter prediction of subsequent pictures in the decoding process.

2.20 Component

One of the three picture sample value matrices (one luma matrix and two chroma matrices) or a single sample value of one of them.

2.21 Inverse transform

The process in which a transform coefficient matrix is transformed into a spatial sample value matrix.

2.22 Dequantization

The process in which transform coefficients are obtained by scaling the quantized coefficients.

2.23 Block

An M×N sample value matrix or transform coefficient matrix (M columns and N rows).

2.24 Block scan

Specified serial ordering of quantized coefficients.

2.25 Luma

Sample value matrix or single sample value representing the luma signal.

Note: the symbol representing luma is Y.

2.26 Quantization parameter

The parameter used to dequantize the quantized coefficients in the decoding process.

2.27 Quantized coefficient

Transform coefficients before dequantization.

2.28 Raster scan

Maps a two-dimensional rectangular raster into a one-dimensional raster. The one-dimensional raster starts with the first row of the two-dimensional raster, followed by the second row, the third row, and so on. Each raster row is scanned from left to right.
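
As an illustration of this definition, a raster scan of a small two-dimensional array can be written in Python as follows. The function name `raster_scan` and the list-of-rows representation are assumptions for the sketch.

```python
def raster_scan(grid):
    """Flatten a 2-D list of rows into a 1-D list, row by row, left to right."""
    return [value for row in grid for value in row]


grid = [
    ["a", "b", "c"],
    ["d", "e", "f"],
]
print(raster_scan(grid))   # ['a', 'b', 'c', 'd', 'e', 'f']
```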

2.29 Macroblock

Includes a 16×16 luma sample value block and its corresponding chroma sample value blocks.

2.30 Macroblock address

The number of a macroblock, assigned in raster scan order starting from 0 at the upper-left macroblock.

2.31 Macroblock line

Consecutive macroblocks at the same vertical position, running from the left coded picture boundary to the right. The height of one macroblock line is 16 samples.

2.32 Macroblock position

The two-dimensional coordinates of a macroblock in a picture, denoted by (x, y). The coordinates of the top-left macroblock are (0, 0); x is incremented by 1 for each macroblock column from left to right; y is incremented by 1 for each macroblock row from top to bottom.
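
Because macroblock addresses follow raster scan order and positions count columns and rows from (0, 0), the two can be converted into each other. The Python sketch below shows one way to do this; the function names and the parameter `width_in_mbs` (the number of macroblocks per macroblock line) are hypothetical names introduced for illustration.

```python
def mb_position(address, width_in_mbs):
    """Macroblock address -> (x, y) macroblock position."""
    y, x = divmod(address, width_in_mbs)
    return x, y


def mb_address(x, y, width_in_mbs):
    """(x, y) macroblock position -> macroblock address."""
    return y * width_in_mbs + x


# Example: a picture 352 luma samples wide has 352 // 16 = 22 macroblocks per line.
width_in_mbs = 352 // 16
print(mb_position(23, width_in_mbs))     # (1, 1): second macroblock of the second line
print(mb_address(1, 1, width_in_mbs))    # 23
```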

2.33 Backward prediction

Predicting the current picture by using pictures that are later in display order as reference pictures.

2.34 Partitioning

The process of dividing a set into subsets such that each element in the set belongs to exactly one of the subsets.

2.35 Level

A defined set of constraints on the values of the syntax elements and syntax element parameters that apply at a given level.

2.36 AC coefficient

Any transform coefficient whose frequency index is non-zero in at least one dimension.
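
The condition in this definition can be written as a one-line test. In the Python sketch below, `u` and `v` are hypothetical names for the horizontal and vertical frequency indexes of a coefficient within its block.

```python
def is_ac_coefficient(u, v):
    """True if the coefficient at frequency indexes (u, v) is an AC coefficient."""
    return u != 0 or v != 0


# In a 4x4 block, every coefficient except the one at (0, 0) is an AC coefficient.
ac_count = sum(is_ac_coefficient(u, v) for v in range(4) for u in range(4))
print(ac_count)   # 15
```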

2.37 Decode processing

Comprises the parsing process and the decoding process.

2.38 Decoding process

The process that derives decoded pictures from syntax elements.

2.39 Decoder

One embodiment of the decoding process.

2.40 Decoding order

The order in which frames are decoded, which depends on the inter prediction relationships between them.

2.41 Decoded picture

The picture reconstructed from the bitstream by the decoder.

2.42 Decoded picture buffer

The buffer used for storing decoded pictures for prediction as well as for output reordering and output timing.

2.43 Parse

The procedure of getting the syntax element from the bitstream.

2.44 Forbidden

Defines special syntax element values that should not appear in a bitstream conforming to the syntax defined in this part. These values are forbidden in order to avoid pseudo start codes in the bitstream.

2.45 X-profile decoder

A decoder that is able to decode bitstreams that satisfy the specifications of a certain profile.

2.46 Start code

A 32-bit codeword that is unique in the whole bitstream. Start codes serve several purposes, one of which is to identify the starting point of a syntax structure in the bitstream.

2.47 Forward prediction

The process of predicting the current picture from reference pictures that are earlier in display order.

2.48 Forward inter decoded picture

Decoded pictures using only forward prediction in inter prediction.

2.49 Chroma

Sample value matrix or single sample value of one of the two colour difference signals.

Note: The symbols representing chroma are Cr and Cb.

2.50 Sequence

The highest-level syntax structure of the coded bitstream, comprising one or more consecutive coded pictures.

2.51 Output reorder delay

The delay between the beginning of decoding one frame in the bitstream and the

output of the decoded picture, which is caused by the difference between the display

order and the decoding order.

2.52 Output processing

The process of deriving the output frame or field from the decoded picture.

2.53 Output order

The order of outputting decoded pictures, which is the same as the display order.

2.54 Bidirectional prediction

The process of predicting the current picture from both past and future reference pictures in display order.

2.55 Bidirectional inter decoded picture

Decoded pictures using bidirectional prediction in inter prediction.

2.56 Random access

The ability to start decoding the bitstream and recover decoded pictures from a point other than the starting point of the bitstream.

2.57 Random access point

A point in the bitstream at which random access is possible.

2.58 Stuffing bits

A bit string that is inserted into the bitstream during the encoding process and discarded during the decoding process.

2.59 Slice

Several consecutive macroblock rows in the raster scan order.

2.60 Slice header

The part of an encoded slice that carries the encoding presentation of the data common to the macroblocks in the slice.

2.61 Skipped macroblock

A macroblock for which no data is coded other than the “skipped” indicator.

2.62 Picture reordering

The process of reordering the decoded pictures if the decoding order is different

from the output order.

2.63 Display order

The order of displaying decoded pictures.

2.64 Sample

The basic elements that compose the picture.

2.65 Width height ratio

The ratio of the horizontal distance between columns to the vertical distance

between rows of the luma samples in one frame.

Shown as , where is the horizontal width and is the vertical height.

2.66 Sample value

The amplitude value of a sample.

2.67 Run

A number of consecutive data elements of the same value in the decoding process. Depending on context, it denotes either the number of zero coefficients preceding a non-zero coefficient in the block scan, or the number of skipped macroblocks.

2.68 Prediction

The implementation of the prediction process.

2.69 Prediction process

The process of estimating the decoded sample value or data element using a

predictor.

2.70 Prediction value

The value, which is the combination of the previously decoded sample values or

data elements, used in the decoding process of the next sample value/data element.

2.71 Syntax element

The result of parsing a data unit in the bitstream.

2.72 Source

The term describing the raw video clips or some of their attributes before the

encoding process.

2.73 Motion vector

A two-dimensional vector used for inter prediction which refers the current

picture to the reference picture, the value of which provides the coordinate offsets

between the current picture and the reference picture.

2.74 DC coefficient

A transform coefficient whose frequency indexes are zero in both dimensions.

2.75 Frame

The representation of video signals in the space domain, composed of one luma sample matrix (Y) and two chroma sample matrices (Cb and Cr).

2.76 Inter coding

Coding one macroblock or picture using inter prediction.

2.77 Inter prediction

The process of deriving the prediction value for the current picture (or field)

using previously decoded pictures (or fields).

2.78 Intra coding

Coding one macroblock or picture using intra prediction.

2.79 Intra decoded picture

The decoded picture using only intra prediction. If the I frame uses field coding,

the first field can only use intra prediction.

2.80 Intra prediction

The process of deriving the prediction value for the current sample using

previously decoded sample values in the same decoded picture (or field).

2.81 Byte

8-bit bit string.

2.82 Byte alignment

A bit is byte aligned if its position, counted from the first bit in the bitstream, is an integer multiple of eight.

3 Abbreviations

BBV: Bitstream Buffer Verifier

CBR: Constant Bit Rate

LSB: Least Significant Bit

MB: Macroblock

MSB: Most Significant Bit

VBR: Variable Bit Rate

VLC: Variable Length Coding

4 Conventions

The mathematical operators and their precedence rules used to describe this

Specification are similar to those used in the C programming language. However,

operators of integer divisions with truncation and of rounding are specifically defined.

If not specifically explained, numbering and counting begin from zero.

4.1 Arithmetic operators

+ Addition

– Subtraction (as a binary operator) or negation (as a unary prefix operator)

× Multiplication

$a^b$ Exponentiation: a raised to the power of b. The same notation may also denote a superscript.

/ Integer division with truncation of the result toward zero. For example, 7/4 and –7/–4 are truncated to 1, and –7/4 and 7/–4 are truncated to –1.

÷ Division in mathematical equations where no truncation or rounding is intended

$\frac{a}{b}$ Division in mathematical equations where no truncation or rounding is intended

$\sum_{i=a}^{b} f(i)$ The summation of f(i) with i taking integer values from a up to and including b

a % b Remainder from the division of a by b. Both a and b are positive integers
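The truncating division and the remainder operator described above behave like the built-in integer operators of C99 and later. The following minimal sketch illustrates this; the helper names div_trunc and rem_pos are hypothetical and are not part of this Specification.

```c
/* Integer division as described in 4.1: the quotient is truncated toward
   zero. C99 and later define the built-in '/' on integers the same way. */
int div_trunc(int a, int b)
{
    return a / b;
}

/* Remainder as described in 4.1. The text defines it only for positive
   a and b, so this sketch does not consider negative operands. */
int rem_pos(int a, int b)
{
    return a % b;
}
```

For instance, div_trunc(-7, 4) gives -1 rather than -2, which is the truncation toward zero stated in the definition of "/".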

4.2 Logical operators

a && b Logical AND operation between a and b

a || b Logical OR operation between a and b

! Logical NOT operation

4.3 Relational operators

> Greater than

≥ Greater than or equal to

< Less than

≤ Less than or equal to

== Equal to

!= Not equal to

4.4 Bitwise operators

& AND operation

| OR operation

~ Negation operation

a >> b Shift a, in 2's complement binary integer representation, to the right by b bit positions. This operator is defined only for positive integer values of b

a << b Shift a, in 2's complement binary integer representation, to the left by b bit positions. This operator is defined only for positive integer values of b

4.5 Assignment

= Assignment operator

++ Increment, i.e. x++ is equivalent to x = x + 1. When this operator is used for an array index, the variable value is obtained before the auto increment operation

-- Decrement, i.e. x-- is equivalent to x = x - 1. When this operator is used for an array index, the variable value is obtained before the auto decrement operation

+= Addition assignment operator, for example x += 3 corresponds to x = x + 3, and x += (-3) is equivalent to x = x + (-3)

-= Subtraction assignment operator, for example x -= 3 corresponds to x = x - 3, and x -= (-3) is equivalent to x = x - (-3)

4.6 Mathematical functions

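$$\mathrm{Abs}(x) = \begin{cases} x, & x \ge 0 \\ -x, & x < 0 \end{cases} \qquad (1)$$

Ceil(x) takes the smallest integer not smaller than x (2)

Clip1(x) = Clip3(0, 255, x) (3)

$$\mathrm{Clip3}(a, b, c) = \begin{cases} a, & c < a \\ b, & c > b \\ c, & \text{else} \end{cases} \qquad (4)$$

Floor(x) takes the biggest integer not bigger than x (5)

Log2(x) logarithm of x with base 2

Log10(x) logarithm of x with base 10 (6)

Median(x, y, z) = x + y + z – Min(x, Min(y, z)) – Max(x, Max(y, z)) (7)

$$\mathrm{Min}(x, y) = \begin{cases} x, & x \le y \\ y, & x > y \end{cases} \qquad (8)$$

$$\mathrm{Max}(x, y) = \begin{cases} x, & x \ge y \\ y, & x < y \end{cases} \qquad (9)$$

Round(x) = Sign(x) × Floor(Abs(x) + 0.5)

$$\mathrm{Sign}(x) = \begin{cases} 1, & x \ge 0 \\ -1, & x < 0 \end{cases} \qquad (10)$$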
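The functions of this section can be sketched in C as shown below. This is only an illustration under stated assumptions: the integer argument types are a choice made for the sketch, Sign(x) is taken to return 1 for x ≥ 0 and –1 otherwise, and Round takes a real-valued argument. Because Abs(x) + 0.5 is never negative, truncating the sum toward zero gives the same result as Floor.

```c
/* Sketch of the mathematical functions of 4.6 (argument types assumed). */
int Abs(int x)  { return x >= 0 ? x : -x; }
int Sign(int x) { return x >= 0 ? 1 : -1; }
int Min(int x, int y) { return x <= y ? x : y; }
int Max(int x, int y) { return x >= y ? x : y; }

/* Clip3(a, b, c): clip c to the range [a, b]. */
int Clip3(int a, int b, int c)
{
    return c < a ? a : (c > b ? b : c);
}

/* Clip1(x): clip x to the 8-bit sample range. */
int Clip1(int x) { return Clip3(0, 255, x); }

/* Median of three values: the sum minus the smallest and the largest. */
int Median(int x, int y, int z)
{
    return x + y + z - Min(x, Min(y, z)) - Max(x, Max(y, z));
}

/* Round(x) = Sign(x) * Floor(Abs(x) + 0.5). The magnitude is non-negative,
   so the conversion to int (truncation toward zero) acts as Floor. */
int Round(double x)
{
    double magnitude = x >= 0 ? x : -x;
    int sign = x >= 0 ? 1 : -1;
    return sign * (int)(magnitude + 0.5);
}
```

With these definitions, Round rounds halves away from zero, so Round(2.5) is 3 and Round(-2.5) is -3.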

4.7 Description of bitstream syntax, parsing process and decoding process

4.7.1 Method of describing bitstream syntax

The bitstream description language used for this specification is similar to C language.

Syntax elements of the language are represented in bold type. Each syntax element is described by its name, syntax and semantics. The name is a combination of English words in lower case letters separated by underscore characters. The value of a syntax element, in a syntax table and in text, is represented in normal type.

In some cases, variable values derived from syntax elements need to be used in syntax tables. These variables, in syntax tables and in the text, are named with a combination of lower case and upper case characters without underscores. Variables whose first character is in upper case are used for the decoding of the current syntax structure and related syntax structures. They can also be used for syntax structures decoded afterwards. Variables whose first character is in lower case are used only inside the section in which they are defined.

Mnemonics of syntax element values, mnemonics of variable values and their relationships are explained in the text. In some cases, they are used interchangeably. A mnemonic is represented by a combination of words separated by one or more underscores, where each word starts with an upper case character and may contain more upper case characters.

When the bit length of a bit string is an integer multiple of 4, it can be written in hexadecimal representation. The prefix of the hexadecimal representation is '0x'. For example, '0x1a' represents the bit string '0001 1010'.

In a conditional statement, 0 represents FALSE and any non-zero value represents TRUE.

Syntax tables describe the superset of all the bitstream syntaxes conforming to this

Specification. The additional constraints on syntaxes are explained in the corresponding section.

An example of the pseudo bitstream description syntax is shown below. When a syntax element appears, this means that a data element is read from the bitstream.

descriptor

/* a statement is a descriptor of a syntax element, or explains the presence of a syntax element, its type and value. The below shows two examples */

syntax_element ue(v)

conditioning statement

/* a combination of statements closed by brace symbols is a compound statement. In terms of functionality, a compound statement is still a statement */

{

statement

statement

}

/* “while” statement first evaluates the condition. If the condition is TRUE, then the statement is executed and looped back to evaluate again the condition. The loop continues until the condition is not TRUE.*/

while ( condition )

statement

/* “do … while” statement first executes the statement and then evaluates the condition. If the condition is TRUE, then looped back to execute the statement. The loop continues until the condition is not TRUE.*/

do

statement

while ( condition )

/* “if … else”statement first evaluates the condition, if the condition is TRUE, then executes the primary statement, else executes the alternative statement. If the alternative statement does not need to be executed, then the else part and its related alternative statement can be omitted.*/

if ( condition )

primary statement

else

alternative statement

/* “for”statement first executes the initial statement and then evaluates the condition. If the condition is true, then the primary statement and the subsequent statement are executed in sequence and then control is looped back to evaluate the condition. The loop continues until the condition is not TRUE.*/

for ( initial statement; condition; subsequent statement )

primary statement

The parsing process and the decoding process are described using text and a C-like pseudo language.

4.7.2 Functions

Functions used for syntax description are explained in this section. It is assumed that the decoder has a bitstream position indicator. This bitstream position indicator locates the position of the bit that is going to be read next. A function consists of its name and a sequence of parameters inside parentheses. A function may have no parameters.

byte_aligned( )

The function byte_aligned () returns TRUE if the current position is on a byte boundary.

Otherwise, it returns FALSE.

next_bits( n )

The function returns the next n bits from the bitstream, MSB first. The current bitstream position indicator is not changed. If fewer than n bits remain to be read, the function returns 0.

byte_aligned_next_bits( n )

If the current position of the bitstream is not byte aligned, the function returns n bits beginning from the next byte aligned position, MSB first. If the current position of the bitstream is byte aligned, the function returns n bits from the current position, MSB first. In both cases the current bitstream position indicator is not changed. If fewer than n bits remain to be read, the function returns 0.

next_start_code( )

The next_start_code() function locates the next start code. It is defined in the table below.

next_start_code() { descriptor

stuffing_bit '1'

while ( ! byte_aligned() )

stuffing_bit '0'

while ( next_bits(24) != '0000 0000 0000 0000 0000 0001' )

stuffing_byte '0000 0000'

}

The stuffing_bytes shall appear after a picture header and before a slice header start code.

is_end_of_slice( )

This function tests whether the current position is at the end of the slice. The function's definition is shown in the table below.

is_end_of_slice() {                                                         descriptor
    if ( byte_aligned() ) {
        if ( next_bits(32) == 0x80000001 )
            return TRUE;    // end of slice
    }
    else {
        if ( (byte_aligned_next_bits(24) == 0x000001) && is_stuffing_pattern() )
            return TRUE;    // end of slice
    }
    return FALSE;
}

is_stuffing_pattern( )

This function tests whether the remaining bits of the current byte, or the next byte in case the current position is byte aligned, are stuffing bits. The function's definition is shown in the table below.

is_stuffing_pattern() {                                                     descriptor
    // n: 0~7, the offset of the bitstream position indicator within the current byte.
    // When n is 0, the bitstream position indicator indicates the MSB of the current byte.
    if ( next_bits(8-n) == ( 1 << (7-n) ) )
        return TRUE;
    else
        return FALSE;
}

read_bits( n )

This function returns n bits of the bitstream from the current position, MSB first. The bitstream position indicator advances by n bits. If n is equal to 0, the function returns 0 and the bitstream position indicator does not move.

Functions can also be used for describing the parsing process and the decoding process.
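As an illustration of the functions described in this section, a bit reader over a byte buffer could be sketched in C as follows. The BitReader structure and the helper peek_at are hypothetical names introduced only for this sketch. The limit of 32 bits per call is an assumption, as is the choice that read_bits leaves the position indicator unchanged when fewer than n bits remain, a case for which the text does not state the behaviour.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reader state: a byte buffer and the bitstream position
   indicator, counted in bits from the first bit of the buffer. */
typedef struct {
    const uint8_t *data;
    size_t size_bits;   /* total number of bits available */
    size_t pos;         /* position of the bit that is read next */
} BitReader;

int byte_aligned(const BitReader *br)
{
    return (br->pos % 8) == 0;
}

/* Return n bits (n <= 32) starting at bit offset 'from', MSB first.
   Returns 0 when fewer than n bits remain. Does not move the indicator. */
uint32_t peek_at(const BitReader *br, size_t from, unsigned n)
{
    uint32_t value = 0;
    if (n > 32 || from > br->size_bits || br->size_bits - from < n)
        return 0;
    for (unsigned i = 0; i < n; i++) {
        size_t p = from + i;
        unsigned bit = (br->data[p / 8] >> (7 - (p % 8))) & 1u;
        value = (value << 1) | bit;
    }
    return value;
}

uint32_t next_bits(const BitReader *br, unsigned n)
{
    return peek_at(br, br->pos, n);
}

uint32_t byte_aligned_next_bits(const BitReader *br, unsigned n)
{
    /* Next byte aligned position, or pos itself if already aligned. */
    size_t from = (br->pos + 7) / 8 * 8;
    return peek_at(br, from, n);
}

uint32_t read_bits(BitReader *br, unsigned n)
{
    uint32_t value = peek_at(br, br->pos, n);
    if (n <= 32 && br->size_bits - br->pos >= n)
        br->pos += n;   /* advance only when the read succeeded (assumption) */
    return value;
}
```

A reader is set up by pointing data at the coded bytes, setting size_bits to eight times the byte count and pos to 0. Calling next_bits(&br, 24) on a buffer that begins with a start code then returns 0x000001 without moving the position indicator.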

4.7.3 Descriptor

The descriptors below represent different parsing processes of syntax elements.

b( 8 )

A byte. Its parsing process is defined as the returned value of the read_bits(8) function.

f( n )

A fixed pattern of n sequential bits. Its parsing process is defined as the returned value of the read_bits(n) function.

i( n )

Integer of n bits. If n is v in the syntax table, the number of bits n is determined by the values of other syntax elements. Its parsing process is defined as the returned value of the read_bits(n) function. The returned value shall be interpreted as a 2's complement number with MSB first.

r( n )

A sequence of n bits equal to 0. Its parsing process is defined as the returned value of the read_bits(n) function.

u( n )

Unsigned integer of n bits. If n is v in the syntax table, the number of bits n is determined by the values of other syntax elements. Its parsing process is defined as the returned value of the read_bits(n) function. The returned value shall be interpreted as a binary number with MSB first.

q( v )

Syntax element coded with variable length coding. Arithmetic coding is used. The parsing process is defined in section 8.2.
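The difference between u( n ) and i( n ) lies only in how the n bits returned by read_bits(n) are interpreted. A possible sketch of the 2's complement interpretation used for i( n ) is given below; the function name decode_i and the use of a 64-bit return type, chosen so that n = 32 is handled without overflow, are assumptions made for this example.

```c
#include <stdint.h>

/* Interpret the low n bits of 'raw' (1 <= n <= 32) as a 2's complement
   number, as required for the i(n) descriptor. For u(n) the value
   returned by read_bits(n) is used as is. */
int64_t decode_i(uint32_t raw, unsigned n)
{
    if (n == 0 || n > 32)
        return 0;                        /* outside the range of this sketch */
    uint64_t sign = (uint64_t)1 << (n - 1);           /* weight of the MSB */
    uint64_t value = (uint64_t)raw & ((sign << 1) - 1);   /* keep n bits */
    /* Flipping the MSB and subtracting its weight performs sign extension. */
    return (int64_t)(value ^ sign) - (int64_t)sign;
}
```

For example, the 4-bit pattern '1111' read with i(4) is -1, whereas the same pattern read with u(4) is 15.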

4.7.4 Reserved, forbidden and marker bit

In this specification, values of some syntax elements are represented as 'reserved' or 'forbidden' in the bitstream definition.

'Reserved' denotes values of some syntax elements which will be used when this specification is extended in the future.

'Forbidden' denotes values of some syntax elements which should not appear in a bitstream conforming to this Specification.

'Marker_bit' indicates that the value of the bit shall be '1'.

'Reserved_bits' indicates that the values of some syntax elements are reserved and will be used when this specification is extended in the future. The decode processing shall ignore these bits.

5 Bitstream syntax and semantics

5.1 Structure of coded video data

This section explains the structure of coded bitstream, relationships between layers and

processing order.

5.1.1 Video sequence

The highest syntactic structure of the coded video bitstream is the video sequence. A video sequence commences with a sequence header which is followed by one or more coded pictures. A picture header is present in front of each picture. The order of the coded pictures in the coded bitstream is the bitstream order. The bitstream order is the same as the decoding order. The decoding order is not necessarily the same as the display order. The video sequence is terminated by a sequence_end_code.

This Specification deals with coding of progressive sequences.

A frame consists of three sample matrices of integers: a luminance sample matrix (Y), and two

chrominance sample matrices (Cb and Cr).

An element of each color sample matrix has integer value. The relationship between these Y, Cb

and Cr components and the primary (analogue) Red, Green and Blue Signals, the chromaticity of these

primaries and the transfer characteristics of the source frame may be specified in the bitstream. This

information does not affect the decoding process.

The output of the decoding process is a series of frames. Reconstructed frames are separated

in time by a frame period.

5.1.2 Sequence header

A video sequence header commences with a sequence header start code and is followed by a series of coded pictures. A sequence header is allowed to be present repeatedly in the bitstream. Such a sequence header is called a repeat sequence header. The main purpose of the repeat sequence header is to provide random access functionality. The first coded picture after a sequence header should be an I frame. The first P frame after a sequence header refers only to pictures that appear after the sequence header. If a bitstream is edited so that all of the data preceding any of the repeat sequence headers is removed (or alternatively random access is made to that sequence header), then the resulting bitstream shall be a legal bitstream that complies with this specification.

5.1.3 Picture

A picture is a frame. Its coded data starts with a picture start code and ends with a sequence

start code, a sequence end code or another picture start code. The decode process of a picture

includes parsing processing and decoding processing.

5.1.4 Color format

In 4:2:0 format, the Cb and Cr matrices shall be one half the size of the Y-matrix in both

horizontal and vertical dimensions. The luminance and chrominance samples are positioned as

shown in Figure 1.

Luminance sample Chrominance sample

Figure 1 Position of luminance and chrominance samples in 4:2:0 format

5.1.5 Picture types

This specification defines 2 types of decoded pictures:

1) a non-bidirectional predictive-decoded (P) picture;

2) a bidirectional predictive-decoded (B) picture.

5.1.6 Order between pictures

If there are no B frames in a video sequence, the decoding order and the display order are the same. If a video sequence contains more than one B frame, the decoding order is not the same as the display order, so the decoded pictures need to be reordered before they are output to the display. The reordering is performed according to the following rules:

1) If there are no decoded frames, and the current frame is not coded with only intra blocks,

no frame is output. If there are no decoded frames, and the current frame is coded with

only intra blocks, the frame is reconstructed and marked as P-frame;

2) If the current frame to decode is a B-frame, the output frame is the frame reconstructed

from that B frame;

3) If the current frame to decode is a P-frame and a previously decoded P-frame exists, the

output frame is the frame reconstructed from the previously decoded P-frame. If

previously decoded P-frame does not exist, no frame is output;

4) After all the steps are finished, if there are still frames not output in the buffer, output

those frames.

The following example explains the reordering: there are two coded B-frames between successive coded P-frames. A P-frame with only intra coded blocks is marked as “I”. Frame '1I' is used to form a prediction for frame '4P'. Frames '4P' and '1I' are both used to form predictions for frames '2B' and '3B'. Therefore the order of coded frames in the coded sequence shall be '1I', '4P', '2B', '3B'. However, the decoder shall display them in the order '1I', '2B', '3B', '4P'.

Encoder input order:

 1  2  3  4  5  6  7  8  9 10 11 12 13
 I  B  B  P  B  B  P  B  B  I  B  B  P

Decoding order:

 1  4  2  3  7  5  6 10  8  9 13 11 12
 I  P  B  B  P  B  B  I  B  B  P  B  B

Decoder output (display order):

 1  2  3  4  5  6  7  8  9 10 11 12 13
 I  B  B  P  B  B  P  B  B  I  B  B  P
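The reordering rules of this section can be sketched in C as follows. The Frame structure and the function reorder_for_display are hypothetical names used only for this illustration. The sketch treats a frame coded with only intra blocks ('I') like a P-frame, as rule 1 prescribes, and it assumes that the sequence starts with such a frame.

```c
#include <stddef.h>

/* Hypothetical frame record: display number and type ('I', 'P' or 'B'). */
typedef struct {
    int number;
    char type;
} Frame;

/* Takes frames in decoding order and writes their display numbers to out[]
   in output order. out[] must have room for n entries. Returns the number
   of frames output. */
size_t reorder_for_display(const Frame *decoded, size_t n, int *out)
{
    size_t count = 0;
    int have_pending = 0;   /* is a decoded P-frame waiting for output? */
    int pending = 0;        /* display number of that P-frame */

    for (size_t i = 0; i < n; i++) {
        if (decoded[i].type == 'B') {
            /* Rule 2: a B-frame is output as soon as it is decoded. */
            out[count++] = decoded[i].number;
        } else {
            /* Rules 1 and 3: output the previously decoded P-frame, if any,
               and hold the current one back. */
            if (have_pending)
                out[count++] = pending;
            pending = decoded[i].number;
            have_pending = 1;
        }
    }
    /* Rule 4: output what is still held in the buffer. */
    if (have_pending)
        out[count++] = pending;
    return count;
}
```

Applied to the decoding order of the example above (1I, 4P, 2B, 3B, 7P, 5B, 6B, 10I, 8B, 9B, 13P, 11B, 12B), the function produces the frames in the display order 1 to 13.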

5.1.7 Reference picture

At most two reference pictures can be used for P or B frame coding. A P frame can use one forward frame as reference; a B frame can refer to one forward reference frame and one backward reference frame.

In a situation where a pixel indicated by a motion vector is outside of the reference picture

boundary, the nearest integer sample inside a picture from the indicated outside position shall be

used for boundary padding. For luminance sample matrix, pixels in a reference block shall not

surpass 16 pixels both horizontally and vertically from the reference picture boundary. For

chrominance sample matrix, if color format is 4:2:0, pixels in a reference block shall not surpass 8

pixels both horizontally and vertically from the reference picture boundary.
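Boundary padding with the nearest integer sample inside the picture amounts to clamping the sample coordinates to the picture area. A minimal sketch in C, assuming an 8-bit sample plane stored row by row, is shown below; the names ref_sample and clamp_int and the stride parameter are hypothetical. The sketch does not check the 16-sample and 8-sample limits stated above.

```c
#include <stdint.h>

/* Clamp v to the range [lo, hi]. */
int clamp_int(int v, int lo, int hi)
{
    return v < lo ? lo : (v > hi ? hi : v);
}

/* Fetch the reference sample at (x, y). A position outside the picture is
   mapped to the nearest sample inside the picture (boundary padding).
   'stride' is the distance in samples between the starts of two rows. */
uint8_t ref_sample(const uint8_t *plane, int width, int height, int stride,
                   int x, int y)
{
    int cx = clamp_int(x, 0, width - 1);
    int cy = clamp_int(y, 0, height - 1);
    return plane[cy * stride + cx];
}
```

A position above and to the left of the picture therefore reads the top-left sample, and a position to the right of the picture reads the last sample of the corresponding row.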

5.1.8 Slice

A slice is a series of one or more macroblocks in raster scan order. Macroblocks of a slice

shall not overlap and also slices shall not overlap. The position of slices may change from picture

to picture. The decoding process of a macroblock inside a slice should not use data in the other

slices of the same picture.

5.1.9 Macroblock

A picture is partitioned into macroblocks. The top-left corner of a macroblock shall not lie outside the boundary of the picture. In the interlaced case, when the two coded fields of a frame appear in sequence in the bitstream, each macroblock shall consist of pixels from a single field.

A macroblock is partitioned for motion compensation as shown in Figure 3. The number

inside a rectangle indicates the order of motion vectors and reference indices after partitioning in

the bitstream.

Figure 3 Macroblock partition

5.1.10 8x8 block

For 4:2:0 format, a macroblock contains 4 blocks of 8x8 luminance (Y) block and 2

chrominance blocks of 8x8 size (one Cb and one Cr). The numbers shown in Figure 4 indicate the

order of 8x8 blocks in a macroblock.

0 1
2 3      4      5

 Y       Cb     Cr

Figure 4 partitioning of a macroblock into 8x8 blocks (4:2:0 format)

5.1.11 4x4 block

For 4:2:0 format, a macroblock contains 16 blocks of 4x4 luminance (Y) block and four 4x4

blocks of Cb, and four 4x4 blocks of Cr. The numbers shown in Figure 5 indicate the order of 4x4

blocks in a macroblock.

0 1 4 5

2 3 6 7

8 9 12 13

10 11 14 15

0 1

2 3

0 1

2 3

Y Cb Cr

Figure 5 partitioning of a macroblock into 4x4 blocks (4:2:0 format)

[Figure 3 panels: a 16x16 luma block and its corresponding chroma block (partition 0); four 8x8 luma blocks and their corresponding chroma blocks (partitions 0, 1, 2, 3).]

5.2 Bitstream syntax

5.2.1 Start codes

Start codes are specific bit strings that do not otherwise occur in the video stream. Each start code consists of a start code prefix followed by a start code value. The start code prefix is the bit string '0000 0000 0000 0000 0000 0001'. All start codes shall be byte aligned.

The start code value is an 8-bit integer. Table 1 shows the start code values used in this Specification.

Table 1 Start code value

Start code type             Start code value (hexadecimal)
videoSequenceStartCode      B0
videoSequenceEndCode        B1
userDataStartCode           B2
pictureStartCode            B3
sliceStartCode              00~7F
reserved                    B4-B6
extensionStartCode          B7
reserved                    B8
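Locating start codes in a byte buffer and classifying the start code value could be sketched in C as below. The sketch follows the values listed in Table 1 only. The function names, the enumeration and the return convention (-1 when no start code is found) are assumptions made for this example.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical classification of start code values, following Table 1. */
typedef enum {
    SC_SLICE, SC_VIDEO_SEQUENCE_START, SC_VIDEO_SEQUENCE_END, SC_USER_DATA,
    SC_PICTURE, SC_EXTENSION, SC_RESERVED, SC_NOT_LISTED
} StartCodeType;

/* Return the byte offset of the next start code prefix 0x000001 at or after
   'from' that is followed by a start code value, or -1 if there is none.
   Start codes are byte aligned, so a byte-wise search is sufficient. */
long find_start_code(const uint8_t *buf, size_t len, size_t from)
{
    for (size_t i = from; i + 3 < len; i++) {
        if (buf[i] == 0x00 && buf[i + 1] == 0x00 && buf[i + 2] == 0x01)
            return (long)i;
    }
    return -1;
}

/* Classify the 8-bit start code value that follows the prefix. */
StartCodeType classify_start_code(uint8_t value)
{
    if (value <= 0x7F)
        return SC_SLICE;
    switch (value) {
    case 0xB0: return SC_VIDEO_SEQUENCE_START;
    case 0xB1: return SC_VIDEO_SEQUENCE_END;
    case 0xB2: return SC_USER_DATA;
    case 0xB3: return SC_PICTURE;
    case 0xB7: return SC_EXTENSION;
    case 0xB4: case 0xB5: case 0xB6: case 0xB8:
        return SC_RESERVED;
    default:
        return SC_NOT_LISTED;   /* value does not appear in Table 1 */
    }
}
```

If find_start_code returns an offset p, the start code value is the byte at offset p + 3, and the search for the following start code can resume at offset p + 4.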

5.2.2 Video sequence

videoSequence() { descriptor

do {

nextStartCode()

videoSequenceStartCode f(32)

profileID u(8)

levelID u(8)

if(profileID==0x20) {

numberBidirectionallyPredictedPictures u(3)

baselineSequenceHeader()

}

extensionAndUserData(0)

do {

pictureHeader()

pictureData()

} while ( nextBits(32) == pictureStartCode)

} while ( nextBits(32) != videoSequenceEndCode)

videoSequenceEndCode f(32)

}

5.2.2.1 Baseline sequence header

baselineSequenceHeader() { descriptor

horizontal_size u(14)

vertical_size u(14)

frame_rate_code u(4)

bit_rate_lower u(18)

marker_bit f(1)

bit_rate_upper u(12)

chroma_format u(2)

sample_precision u(2)

aspect_ratio u(4)

marker_bit f(2)

pictureApplicationDataEnable f(1)

reserved_bits r(5)

nextStartCode()

}

5.2.3 Extension and user data

extensionAndUserData( i ) {                                                 descriptor
    while ( ( nextBits(32) == extensionStartCode ) || ( nextBits(32) == userDataStartCode ) ) {
        if ( nextBits(32) == extensionStartCode )
            extensionData( i )
        if ( nextBits(32) == userDataStartCode )
            userData()
    }
}

5.2.3.1 Extension data

extensionData( i ) { descriptor

while (nextBits(32) == extensionStartCode ) {

extensionStartCode f(32)

while ( nextBits(24) != '0000 0000 0000 0000 0000 0001' )

extensionDataByte u(8)

}

}

5.2.3.2 User data

userData() {                                                                descriptor
    userDataStartCode                                                       f(32)
    while ( nextBits(24) != '0000 0000 0000 0000 0000 0001' )
        user_data                                                           b(8)
}

5.2.4 Picture

5.2.4.1 Picture header

pictureHeader() { descriptor

pictureStartCode u(32)

if (pictureApplicationDataEnable) {

pictureApplicationData u(18)

marker_bit f(1)

pictureApplicationData u(18)

marker_bit f(1)

pictureApplicationData u(2)

}

fixed_picture_qp u(1)

picture_qp u(6)

vbs_enable u(1)

nextStartCode()

}

5.2.4.2 Picture data

pictureData() { descriptor

do {

slice()

} while ( nextBits(32) == sliceStartCode )

nextStartCode()

}

5.2.5 Slice

The MPEG-2 style slice is used in the ITM.

5.2.6 Macroblock

macroblock() {                                                                      descriptor
    mb_skip_flag                                                                    q(v)
    mb_qp_delta                                                                     q(v)
    blockSize    // (16 or 8)                                                       q(v)
    if (blockSize == 16) {
        mbSpatialTemporalDirection    // (0: intra, 1: fwd, 2: bwd, 3: bi)          q(v)
        if (mbSpatialTemporalDirection != intra) {
            mvNum = getMotionVectorNumber(mbSpatialTemporalDirection)    // (0, 1, 1, 2)
            for (i = 0; i < mvNum; i++) {
                mvDiffX(i)                                                          q(v)
                mvDiffY(i)                                                          q(v)
            }
            for (i = 0; i < 4; i++) {
                block(8)
            }
        } else {    // intra macroblock
            for (i = 0; i < 4; i++) {
                if (vbs_enable) {
                    if (subBlockSize(i) == 8) {    // (8 or 4)                      q(v)
                        lumaIntraMode(i)                                            q(v)
                        block(8)
                    } else {    // subBlockSize(i) == 4
                        for (j = 0; j < 4; j++) {
                            lumaIntraMode(i, j)                                     q(v)
                            block(4)
                        }
                    }
                } else {
                    subBlockSize(i) = 8
                    lumaIntraMode(i)                                                q(v)
                    block(8)
                }    // vbs_enable
            }    // for (i)
            chromaIntraMode
        }    // mbSpatialTemporalDirection != intra or intra
    } else {    // blockSize == 8
        for (i = 0; i < 4; i++) {
            if (subBlockSize(i) == 8) {                                             q(v)
                subMBSpatialTemporalPredictionDirection(i)                          q(v)
                if (subMBSpatialTemporalDirection(i) != intra) {
                    mvNum = getMotionVectorNumber(subMBSpatialTemporalDirection(i))    // (0, 1, 1, 2)
                    for (k = 0; k < mvNum; k++) {
                        mvDiffX(i, k)                                               q(v)
                        mvDiffY(i, k)                                               q(v)
                    }
                } else {
                    lumaIntraMode(i)                                                q(v)
                }
                block(8)
            } else {    // subBlockSize(i) == 4
                for (j = 0; j < 4; j++) {
                    blockSpatialTemporalPredictionDirection(i, j)                   q(v)
                    if (blockSpatialTemporalDirection(i, j) != intra) {
                        mvNum = getMotionVectorNumber(blockSpatialTemporalDirection(i, j))
                        for (k = 0; k < mvNum; k++) {
                            mvDiffX(i, j, k)                                        q(v)
                            mvDiffY(i, j, k)                                        q(v)
                        }
                    } else {
                        lumaIntraMode(i, j)                                         q(v)
                    }
                    block(4)
                }    // for (j)
            }    // subBlockSize(i) == 8 or 4
        }    // for (i)
    }    // blockSize == 16 or 8
    block(8)    // Cr coeffs in 8x8
    block(8)    // Cb coeffs in 8x8
}

5.2.7 Block

block(size) {                                                               descriptor
    for (cof = 0; cof < size*size; ) {
        if (cof != (size*size - 1))
            eob_flag                                                        q(v)
        if (eob_flag == '0' || (cof == (size*size - 1))) {
            do {
                trans_coefficient                                           q(v)
                cof++
            } while (trans_coefficient == '0')
        }
        else
            break;
    }
}

5.3 Video bitstream semantics

5.3.1 Video sequence

video_sequence_start_code

The video_sequence_start_code is the bit string '0x000001B0' in hexadecimal. It indicates the start of a video sequence.

video_sequence_end_code

The video_sequence_end_code is the bit string '0x000001B1' in hexadecimal. It indicates the end of a video sequence.

profile_id

This is an eight-bit unsigned integer used to specify the profile of the bitstream.

level_id

This is an eight-bit unsigned integer used to specify the level of the bitstream.

numberBidirectionallyPredictedPictures

Indicates the fixed number of bi-directionally predicted pictures between each forward

predicted picture.

5.3.2 Sequence header

horizontal_size

The horizontal_size is a 14-bit unsigned integer that specifies the width, in samples, of the intended display region of the luminance component of pictures.

The width of the encoded luminance component of pictures in macroblocks, MbWidth, is calculated as:

MbWidth = (horizontal_size + 15) / 16

The value of horizontal_size should not be zero. The unit of horizontal_size is image samples per line. The displayable part is left-aligned in the decoded pictures.

vertical_size

The vertical_size is a 14-bit unsigned integer that specifies the height, in lines, of the intended display region of the luminance component of pictures. The displayable part is top-aligned in the decoded pictures.

The height of the encoded luminance component of frame pictures in macroblocks, MbHeight, is calculated as:

MbHeight = (vertical_size + 15) / 16

The value of vertical_size should not be zero. The unit of vertical_size is lines of image samples.

Note: the relationship between horizontal_size, vertical_size and the image borders is illustrated in Figure 6. In Figure 6, the solid line represents the border of the displayable part. Its width and height are specified by horizontal_size and vertical_size respectively. The dotted line represents the border of the pictures. Its width and height are specified by MbWidth and MbHeight respectively. For example, if horizontal_size is 1920 and vertical_size is 1080, then MbWidth × 16 equals 1920 and MbHeight × 16 equals 1088.

Figure 6 Illustration of the image border
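The derivation of MbWidth and MbHeight is a rounding up to the next multiple of 16 samples, expressed with the truncating integer division of section 4.1. A minimal sketch in C, with the hypothetical helper name mb_count, is:

```c
/* Number of 16-sample macroblocks needed to cover size_in_samples samples:
   (size + 15) / 16 with truncating integer division. */
int mb_count(int size_in_samples)
{
    return (size_in_samples + 15) / 16;
}
```

With horizontal_size equal to 1920 and vertical_size equal to 1080, mb_count gives 120 and 68 macroblocks, so the coded picture is 1920 samples wide and 68 × 16 = 1088 lines high, of which the top 1080 lines are displayable.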

frame_rate_code

This is a 4-bit unsigned integer indicating the frame rate as defined in the Table 2.

Table 2 Frame rate code

frame_rate_code     Frame rate
0000                forbidden
0001                24000 ÷ 1001 (23.976...)
0010                24
0011                25
0100                30000 ÷ 1001 (29.97...)
0101                30
0110                50
0111                60000 ÷ 1001 (59.94...)
1000                60
1001 ~ 1111         reserved

In the case that progressive_sequence is '1', the time interval between two consecutive frames is the reciprocal of the frame rate.

In the case that progressive_sequence is '0', the time interval between two fields is half of the reciprocal of the frame rate.
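The mapping of Table 2 can be held as a small lookup table of exact fractions, which avoids rounding the rates 24000 ÷ 1001, 30000 ÷ 1001 and 60000 ÷ 1001. The following C sketch is one possible form; the Rational type and the function name frame_rate_from_code are hypothetical.

```c
/* Hypothetical exact representation of a frame rate as num / den. */
typedef struct {
    int num;
    int den;
} Rational;

/* Look up the frame rate for a 4-bit frame_rate_code according to Table 2.
   Returns 1 and fills *rate for the codes 0001 to 1000; returns 0 for the
   forbidden code 0000 and for the reserved codes 1001 to 1111. */
int frame_rate_from_code(unsigned code, Rational *rate)
{
    static const Rational table[9] = {
        {0, 1},                       /* 0000: forbidden, never returned */
        {24000, 1001}, {24, 1}, {25, 1}, {30000, 1001},
        {30, 1}, {50, 1}, {60000, 1001}, {60, 1}
    };
    if (code == 0 || code > 8)
        return 0;
    *rate = table[code];
    return 1;
}
```

The time interval between two consecutive frames of a progressive sequence is then den ÷ num seconds, for example 1001 ÷ 30000 seconds for frame_rate_code '0100'.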

bit_rate_lower

The lower 18 bits of BitRate.

bit_rate_upper

The upper 12 bits of BitRate.

BitRate is measured in units of 400 bits/second, rounded upwards. The value zero is forbidden.

BitRate = (bit_rate_upper << 18) + bit_rate_lower
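The combination of the two bit rate fields and the conversion to bits per second can be sketched in C as follows. The function names are hypothetical, and the masks only make explicit the field widths of 12 and 18 bits.

```c
#include <stdint.h>

/* BitRate in units of 400 bits/second, assembled from the 12-bit
   bit_rate_upper and the 18-bit bit_rate_lower fields. */
uint32_t bit_rate_units(uint32_t bit_rate_upper, uint32_t bit_rate_lower)
{
    return ((bit_rate_upper & 0xFFFu) << 18) + (bit_rate_lower & 0x3FFFFu);
}

/* The same rate expressed in bits per second. A 64-bit result is used
   because the 30-bit BitRate multiplied by 400 can exceed 32 bits. */
uint64_t bit_rate_bps(uint32_t bit_rate_upper, uint32_t bit_rate_lower)
{
    return (uint64_t)bit_rate_units(bit_rate_upper, bit_rate_lower) * 400u;
}
```

For example, bit_rate_upper equal to 0 and bit_rate_lower equal to 2500 give a BitRate of 2500 units, i.e. 1 000 000 bits/second.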

chroma_format

This is a 2-bit integer indicating the chrominance format as defined in Table 3.

Table 3 Chrominance format

chroma_format       Meaning
00                  4:0:0
01                  4:2:0
10                  4:2:2
11                  reserved

sample_precision

This is a 2-bit unsigned integer indicating the precision of luminance and chrominance samples as defined in Table 4.

Table 4 Sample precision

sample_precision    Meaning
00                  forbidden
01                  Precision of luminance and chrominance are 8 bits
10                  reserved
11                  reserved

aspect_ratio

This is a 4-bit unsigned integer indicating the Sample Aspect Ratio (SAR) or the Display

Aspect Ratio (DAR) as defined in Table 5.

Table 5 Aspect ratio

aspect_ratio        Sample Aspect Ratio (SAR)       Display Aspect Ratio (DAR)
0000                forbidden                       forbidden
0001                1.0                             –
0010                –                               4 ÷ 3
0011                –                               16 ÷ 9
0100                –                               2.21 ÷ 1
0101 ~ 1111         –                               reserved

If the sequence_display_extension() is not present in the bitstream, then the entire reconstructed frame is intended to be mapped to the entire active region of the display. The sample aspect ratio may be calculated as follows:

SAR = DAR × vertical_size ÷ horizontal_size

NOTE - In this case, horizontal_size and vertical_size are constrained by the SAR of the

source and the DAR selected.

If the sequence_display_extension() is present then the sample aspect ratio may be calculated

as:

SAR = DAR display_vertical_size display_horizontal_size
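As a worked illustration, the C sketch below evaluates the sample aspect ratio as a reduced fraction. It assumes the relation reads SAR = DAR × vertical size ÷ horizontal size (the operators are taken to be a multiplication followed by a division); the type ratio_t and the function names are hypothetical.

```c
#include <assert.h>

typedef struct {
    long num;
    long den;
} ratio_t;

/* Greatest common divisor, used to reduce the fraction. */
static long gcd_long(long a, long b)
{
    while (b != 0) {
        long t = a % b;
        a = b;
        b = t;
    }
    return a;
}

/* SAR = DAR * height / width, with DAR given as dar_num/dar_den.
 * Pass vertical_size/horizontal_size, or the display_* sizes when
 * sequence_display_extension() is present. */
ratio_t sample_aspect_ratio(long dar_num, long dar_den, long width, long height)
{
    ratio_t r;
    long g;
    assert(dar_num > 0 && dar_den > 0 && width > 0 && height > 0);
    r.num = dar_num * height;
    r.den = dar_den * width;
    g = gcd_long(r.num, r.den);
    r.num /= g;
    r.den /= g;
    return r;
}
```

Under this assumption, a 1920×1080 picture with DAR 16 ÷ 9 has square samples (SAR 1 ÷ 1), while a 720×576 picture with DAR 4 ÷ 3 has SAR 16 ÷ 15.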

pictureApplicationDataEnable

This is a one-bit flag. '1' indicates that pictureApplicationData appears in the picture header. '0' indicates that pictureApplicationData does not appear in the picture header.

5.3.3 Extension data and user data

5.3.3.1 Extension data

extension_start_code

The extension_start_code is the bit string '0x000001B5' in hexadecimal. It identifies the beginning of video extension data.

extension_data_byte

The extension_data_byte is an 8-bit unsigned integer which is used for identifying the video

extension data.

5.3.3.2 User data

user_data_start_code

The user_data_start_code is the bit string '0x000001B2' in hexadecimal. It identifies the beginning of user data. The user data continues until receipt of another start code.

user_data

This is an 8-bit integer. User data is defined by users for their specific applications. In the

series of consecutive user_data bytes there shall not be a string of 23 or more consecutive zero

bits.

5.3.4 Picture

5.3.4.1 Picture header

picture_start_code

The picture_start_code is the bit string '0x000001B3' in hexadecimal. It is the start code of a frame and identifies the beginning of a frame.

pictureApplicationData

This data may be used by an application.

fixed_picture_qp

This is a one-bit flag. '1' indicates that the quantization parameter does not change within the picture. '0' indicates that the quantization parameter may change.

picture_qp

This is a 6-bit unsigned integer. It specifies the quantization parameter of the picture, with a range from 0 to 63 inclusive.

vbs_enable

This is a one-bit flag. '1' indicates that the current decoded picture can use 4x4 transforms. '0' indicates that 4x4 luminance blocks are not allowed. If this flag is not present in the picture header, it is set to '0'.

5.3.5 Slice

start_code_prefix

The start_code_prefix is the 24-bit bit string '0x000001' in hexadecimal.

5.3.6 Macroblock

mb_skip_flag

mb_skip_flag equal to 1 specifies that the current macroblock is skipped; mb_skip_flag equal to 0 specifies that the current macroblock is not skipped.

mb_qp_delta

This gives the increment of the current quantization parameter relative to the predicted quantization parameter, with a range of -32 to 31. The QP of the current macroblock, QPMB, is equal to picture_qp + mb_qp_delta. If mb_qp_delta is not present in the picture header, it is set to 0.

blockSize

blockSize equal to 16 specifies that the current macroblock is coded as one 16x16 block; blockSize equal to 8 specifies that the current macroblock is divided into four 8x8 blocks.

mbSpatialTemporalDirection

A value of 0 specifies that the current block is intra coded, 1 that it is forward predicted, 2 that it is backward predicted, and 3 that it is bi-predicted.

subBlockSize

subBlockSize equal to 8 specifies that the current 8x8 block is coded as one block; subBlockSize equal to 4 specifies that the current block is divided into four 4x4 blocks.

subMBSpatialTemporalPredictionDirection

A value of 0 specifies that the current block is intra coded, 1 that it is forward predicted, 2 that it is backward predicted, and 3 that it is bi-predicted.

blockSpatialTemporalPredictionDirection

A value of 0 specifies that the current block is intra coded, 1 that it is forward predicted, 2 that it is backward predicted, and 3 that it is bi-predicted.

lumaIntraMode

This element determines the intra prediction mode of a luma block. A value of 0 specifies that the prediction mode for the current block is horizontal prediction, 1 specifies vertical prediction, and 2 specifies direct prediction. If it is not present, it is set to 2.

chromaIntraMode

This element determines the intra prediction mode of a chroma block. A value of 0 specifies that the prediction mode for the current block is horizontal prediction, 1 specifies vertical prediction, and 2 specifies direct prediction. If it is not present, it is set to 2.

mvDiffX

mvDiffY

They define the values of the motion vector differences, in units of one-half luma sample, with a range of -2048 to 2047 (the range is -1024 to 1023.75 in luma sample units). The decoder decodes all forward motion vectors first, and then decodes all backward motion vectors. See subclause 8.2 for the parsing process.

5.3.7 Block

eob_flag

When set to '1', this flag indicates that the trans_coefficient values of the current block have not been completely decoded and that at least one non-zero trans_coefficient follows.

trans_coefficient

Transform coefficient; its value may be zero or non-zero.

6 Video decoding process

This chapter defines the video decoding process, which is shown in Figure 7.

[Figure 7 block diagram. Blocks: Variable Length Decoding, Inverse Scan, Inverse Quantisation, Inverse DCT, Motion Compensation, Frame-store Memory. Signals: Coded Data, QFS[n], QF[v][u], F[v][u], f[y][x], d[y][x] (Decoded samples).]

Figure 7 video decoding process

6.1 High-level syntax structure

The reconstructed frames shall be output from the decoding process at regular intervals of the

frame period.

6.2 Variable length decoding

Option-1 video uses a binary arithmetic code based on the QM-coder. This method uses a finite state machine to track the probability changes of one or more syntax elements that share the same probability distribution, and encodes or decodes the syntax elements with context-based binary arithmetic coding.

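The decoder of the QM-coder is defined as

typedef struct qcoder {
    unsigned long interval;   /* probability interval register */
    unsigned long code;       /* code register */
    int code_bits;            /* shifts remaining before the next input byte is read */
} qcoder;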

There are two registers in the QM-coder: the probability interval register and the code register. The QM-coder uses a 16-bit unsigned integer to estimate the probability. The initial value of the interval is 0x10000, and the renormalization boundary value is 0x8000. The interval and code registers are defined in Table 6.

Table 6 interval and code register

Interval 00000000 00000000 vvvvvvvv vvvvvvvv

Code xxxxxxxx xxxxxxxx bbbbbbbb 00000000

In the interval register, the “v” bits represent the current size of the interval. In the code register, the “x” bits are the current sub-interval bits and the “b” bits hold the next input byte from the bit stream.

The finite state machine defines the rules for probability estimation and update. Its structure is:

typedef struct prob_state {

int lps_interval;

int next_state_lps;

int next_state_mps;

int do_switch_mps;

} prob_state_t;

A context (prob_context_t) is made up of two things: the current state of the finite state machine and the prediction of the next symbol (mps). The structure is defined below.

typedef struct prob_context {

int mps;

int state;

prob_state_t* prob_fsm;

} prob_context_t;

The entropy decoding process uses two main methods, listed in Table 7.

Table 7 Main methods in entropy decoding

Method                                          Function
initializeArithmeticCoder()                     Initialization of the qcoder decoder engine
qcoder_decode_symbol(prob_context_t context)    Output of a binary value based on the input context

initializeArithmeticCoder() initializes the context values of the syntax elements in the qcoder decoder, and qcoder_decode_symbol(prob_context_t context) outputs a bit, “0” or “1”, based on the context.

6.2.1 Initialization of the qcoder Decoder

In initializeArithmeticCoder(), every syntax element is given its initial value. There are many different, independent syntax elements, and therefore many different, independent contexts. Each context predicts the next state according to its own state machine and updates that state machine. The flow chart of the initialization process is:

qcoder_init()
    interval = 0x10000
    code = 0
    code += (input_byte() << 8)
    code <<= 8
    code += (input_byte() << 8)
    code <<= 8
    code += (input_byte() << 8)
    code_bits = 8
    context initialization
    return

Figure 8 The initialization of the decoder

The syntax elements and the corresponding initial context values are given in Table 8.

Table 8 Syntax elements and the corresponding initial context values

Context       Syntax elements              mps    state    prob_fsm
cx_eq_prob    N/A                          0      0        eq_prob_fsm
OTHERS        All other syntax elements    0      0        standard_prob_fsm

eq_prob_fsm and standard_prob_fsm are probability prediction state machines obtained by a learning process. They are the same as in JPEG Annex D and can be found in Annex A.

6.2.2 Entropy decoding processing

qcoder_decode_symbol(prob_context_t context) runs with a given context as its input, and

produces a binary value. The flow chart is as below.

qcoder_decode_symbol(prob_context_t c)
    interval -= c.lps_interval
    if (code < interval)
        if (interval < 0x8000)
            b = Cond_MPS_EX(c)
            Renormalize()
        else
            b = c.mps
    else
        b = Cond_LPS_EX(c)
        Renormalize()
    return b

Figure 9 Flow chart for entropy decoding

The MPS conditional exchange process Cond_MPS_EX(prob_context_t c) is shown in Figure 10.

[Flow chart of Cond_MPS_EX(prob_context_t c). Operations shown: test interval < c.lps_interval; one branch sets b = 1 - c.mps and the other sets b = c.mps; the branches call LPS_estimate(c) and MPS_estimate(c); return b.]

Figure 10 Flow chart of MPS conditional exchanging processing

The LPS conditional exchange process Cond_LPS_EX(prob_context_t c) is shown in Figure 11.

[Flow chart of Cond_LPS_EX(prob_context_t c). Operations shown: test interval < c.lps_interval; one branch sets b = c.mps and the other sets b = 1 - c.mps; both branches perform code -= interval and interval = c.lps_interval; the branches call MPS_estimate(c) and LPS_estimate(c); return b.]

Figure 11 Flow chart of LPS conditional exchanging processing

The flow chart of the renormalization process is shown in Figure 12.

Renormalize()
    do
        interval <<= 1
        code <<= 1
        code_bits--
        if (code_bits == 0)
            code += input_byte() << 8
            code_bits = 8
    while (interval < 0x8000)
    return

Figure 12 Flow chart of Renormalization

LPS_estimate is the process that computes the value of interval under the LPS condition; it is defined in Figure 13.

LPS_estimate(prob_context_t c)
    if (c.do_switch_mps)
        c.mps = 1 - c.mps
    c.state = c.prob_fsm[c.state].next_state_lps
    interval = c.prob_fsm[c.state].lps_interval
    return

Figure 13 Flow chart of LPS_estimate

MPS_estimate is the process that computes the value of interval under the MPS condition; it is defined in Figure 14.

MPS_estimate(prob_context_t c)
    c.state = c.prob_fsm[c.state].next_state_mps
    interval = c.prob_fsm[c.state].lps_interval
    return

Figure 14 Flow chart of MPS_estimate

6.2.3 Binary decoding method

6.2.3.1 Decoding the flag

This process decodes a flag from the bit stream based on a given context. Its flow chart is shown in Figure 15.

Aricod_decode_flag(prob_context_t c)
    b = qcoder_decode_symbol(c)
    return b

Figure 15 Flow chart of decoding a flag

6.2.3.2 Decoding the fixed length unsigned value

This process produces an unsigned fixed-length integer from the bit stream based on a given context. Its flow chart is shown in Figure 16.

[Flow chart of Aricod_decode_fixed_bits(prob_context_t c[], int nc, int nb). Operations shown: n = 0, i = 0, value = 0; b = qcoder_decode_symbol(nextCX(n++, c[], nc)); test b == 0; i = nb - 1; b = qcoder_decode_symbol(nextCX(n++, c[], nc)); test b == 0; value |= 1<<i; i--; test i >= 0; return value+1.]

Figure 16 Flow chart of decoding the fixed length unsigned value

6.2.3.3 Decoding unsigned unary code

This process decodes an unsigned unary code from the bit stream based on a given context and stores it in an unsigned integer. Its flow chart is shown in Figure 17.

[Flow chart of Aricod_decode_unary(prob_context_t c[], int nc). Operations shown: n = 0, value = 0; b = qcoder_decode_symbol(nextCX(n++, c[], nc)); test b == 0; value++; return value.]

Figure 17 Flow chart of decoding unsigned unary code

6.2.3.4 Decoding signed unary code

This process decodes a signed unary code from the bit stream based on a given context and stores it in a signed integer. Its flow chart is shown in Figure 18.

[Flow chart of Aricod_decode_signed_unary(prob_context_t c[], int nc). Operations shown: n = 0, value = 0; b = qcoder_decode_symbol(nextCX(n++, c[], nc)); test b == 0; value++; pos = value & 1; value += 1; value >>= 1; value = value * (pos ? 1 : -1); return value.]

Figure 18 Flow chart of decoding signed unary code

6.2.3.5 Decoding unsigned truncated unary code

This process decodes an unsigned truncated unary code from the bit stream based on a given context and stores it in an unsigned integer. Its flow chart is shown in Figure 19.

[Flow chart of Aricod_decode_truncated_unary(prob_context_t c[], int nc, int maxValue). Operations shown: n = 0, value = 0; b = qcoder_decode_symbol(nextCX(n++, c[], nc)); test value < maxValue && b != 0; on Yes, value++; on No, return value.]

Figure 19 Flow chart of decoding unsigned truncated unary code

6.2.3.6 Decoding unsigned Exp-Golomb code

This process decodes an unsigned Exp-Golomb code from the bit stream based on a given context and stores it in an unsigned integer. Its flow chart is shown in Figure 20.

[Flow chart of Aricod_decode_expGolomb(prob_context_t c[], int nc, int k). Operations shown: n = 0, value = 0; b = qcoder_decode_symbol(nextCX(n++, c[], nc)); test b == 0; value |= 1 << k++; b = qcoder_decode_symbol(nextCX(n++, c[], nc)); test b == 0; test k-- > 0; b = qcoder_decode_symbol(cx_eq_prob); test b == 0; value += (1<<k); value++; return value.]

Figure 20 Flow chart of decoding unsigned Exp-Golomb code

6.2.3.7 Decoding signed Exp-Golomb code

This process decodes a signed Exp-Golomb code from the bit stream based on a given context and stores it in a signed integer. Its flow chart is shown in Figure 21.

[Flow chart of Aricod_decode_signed_expGolomb(prob_context_t c[], int nc, int k). Operations shown: n = 0, neg = 0, value = 0; b = qcoder_decode_symbol(nextCX(n++, c[], nc)); test b == 0; value |= 1 << k++; b = qcoder_decode_symbol(nextCX(n++, c[], nc)); test b == 0; test k-- > 0; b = qcoder_decode_symbol(cx_eq_prob); test b == 0; value += (1<<k); neg = qcoder_decode_symbol(nextCX(n++, c[], nc)); value++; value = value * (neg ? -1 : 1); return value.]

Figure 21 Flow chart of decoding signed Exp-Golomb code

6.2.3.8 Decoding of syntax elements

6.2.3.8.1 Decoding macroblockSkipFlag

The syntax element macroblockSkipFlag is coded in the bit stream as a flag, and the decoding process is: aricod_decode_flag(cx_skip_flag).

6.2.3.8.2 Decoding blocksize

The syntax element blocksize is coded in the bit stream as a flag, and the decoding process is: aricod_decode_flag(cx_block_size).

6.2.3.8.3 Decoding subBlockSize

The syntax element subBlockSize is coded in the bit stream as a flag, and the decoding process is: aricod_decode_flag(cx_subblock_size).

6.2.3.8.4 Decoding mbSpatialTemporalDirection

The syntax element mbSpatialTemporalDirection is coded in the bit stream with a fixed length code, and the decoding process is: aricod_decode_fixed_bits(cx_mb_dir,2).

6.2.3.8.5 Decoding subMBSpatialTemporalDirection

The syntax element subMBSpatialTemporalDirection is coded in the bit stream with a fixed length code, and the decoding process is: aricod_decode_fixed_bits(cx_submb_dir,2).

6.2.3.8.6 Decoding blockSpatialTemporalDirection

The syntax element blockSpatialTemporalDirection is coded in the bit stream with a fixed length code, and the decoding process is: aricod_decode_fixed_bits(cx_block_dir,2).

6.2.3.8.7 Decoding eobFlag

The syntax element eobFlag is coded in the bit stream as a flag.

The decoding process for an 8x8 luminance block is: aricod_decode_flag(cx_luma_8x8[idx]).

The decoding process for an 8x8 chroma block is: aricod_decode_flag(cx_chroma_8x8[idx]).

The decoding process for a 4x4 luminance block is: aricod_decode_flag(cx_luma_4x4[idx]).

6.2.3.8.8 Decoding chromaIntraMode

The syntax element chromaIntraMode is coded in the bit stream with a truncated unary code, and the decoding process is: aricod_decode_truncated_unary(cx_chroma_mode, 2, 2).

6.2.3.8.9 Decoding lumaIntraMode

The syntax element lumaIntraMode is coded in the bit stream with a truncated unary code, and the decoding process is: aricod_decode_truncated_unary(cx_luma_mode, 2, 2).

6.2.3.8.10 Decoding mvDiffx, mvDiffy

The syntax elements mvDiffx and mvDiffy are coded in the bit stream with a signed Exp-Golomb code, and the decoding processes are: aricod_decode_signed_expGolomb(cx_mvd_x, 5, 0) and aricod_decode_signed_expGolomb(cx_mvd_y, 5, 0).

6.2.3.8.11 Decoding mb_qp_delta

The syntax element mb_qp_delta is coded in the bit stream with a signed unary code, and the decoding process is: aricod_decode_signed_unary(cx_delta_qp, 4).
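The last stage of the signed unary decoding, which turns the unsigned unary count into a signed value such as mb_qp_delta, can be sketched in C as below. The sketch covers only the mapping operations listed for Aricod_decode_signed_unary (pos = value & 1, value += 1, value >>= 1, sign selection); the arithmetic decoding of the unary count itself is not reproduced, and the function name is hypothetical.

```c
#include <assert.h>

/* Map the unsigned unary count to a signed value:
 * 0 -> 0, 1 -> +1, 2 -> -1, 3 -> +2, 4 -> -2, ... */
int signed_from_unary(int value)
{
    int pos;
    assert(value >= 0);
    pos = value & 1;          /* odd counts map to positive values */
    value += 1;
    value >>= 1;
    return value * (pos ? 1 : -1);
}
```

Small magnitudes of either sign therefore receive the shortest unary codes.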

6.2.3.8.12 Decoding trans_coefficient

The syntax element trans_coefficient is coded in the bit stream with a signed Exp-Golomb code.

The decoding process for an 8x8 luminance block is: aricod_decode_signed_expGolomb(cx_luma_8x8[idx<32?idx:32], 6, 0).

The decoding process for an 8x8 chroma block is: aricod_decode_signed_expGolomb(cx_chroma_8x8[idx<32?idx:32], 6, 0).

The decoding process for a 4x4 luminance block is: aricod_decode_signed_expGolomb(cx_luma_4x4[idx<8?idx:8], 6, 0).

6.2.3.8.13 Decoding macroblockSkipFlag

The syntax element macroblockSkipFlag is coded in the bit stream as a flag, and the decoding process is: aricod_decode_flag(cx_macroblockSkip_flag).

6.3 Inverse scanning

6.3.1 Inverse scanning process for 4×4 block coefficients

The input of this process is an array Q of size 16, with elements qn, 0≤n≤15.

The output of this process is a two-dimensional array C of size 4×4, with elements cij, 0≤i≤3, 0≤j≤3.

The conversion between the arrays Q and C is cij = qn, where Table 9 gives the mapping from the index n of Q to the indices i and j of C.

Table 9 Inverse scanning order of 4×4 block

n 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15

i 0 1 0 0 1 2 3 2 1 0 1 2 3 3 2 3

j 0 0 1 2 1 0 0 1 2 3 3 2 1 2 3 3
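The 4×4 inverse scan can be sketched in C directly from Table 9. The array and function names are illustrative; the sketch assumes the first index of c is i and the second is j.

```c
#include <assert.h>

/* Table 9: for scan position n, SCAN4_I[n] and SCAN4_J[n] give the
 * indices i and j of the coefficient c[i][j]. */
static const int SCAN4_I[16] = {0, 1, 0, 0, 1, 2, 3, 2, 1, 0, 1, 2, 3, 3, 2, 3};
static const int SCAN4_J[16] = {0, 0, 1, 2, 1, 0, 0, 1, 2, 3, 3, 2, 1, 2, 3, 3};

/* Place the 16 scanned values q[n] into the 4x4 array c. */
void inverse_scan_4x4(const int q[16], int c[4][4])
{
    int n;
    assert(q != 0 && c != 0);
    for (n = 0; n < 16; n++)
        c[SCAN4_I[n]][SCAN4_J[n]] = q[n];
}
```

The 8×8 case would follow the same pattern with 64-entry index tables taken from Table 10.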

6.3.2 Inverse scanning process for 8×8 block coefficients

The input of this process is an array Q of size 64, with elements qn, 0≤n≤63.

The output of this process is a two-dimensional array C of size 8×8, with elements cij, 0≤i≤7, 0≤j≤7.

The conversion between the arrays Q and C is cij = qn, where Table 10 gives the mapping from the index n of Q to the indices i and j of C.

Table 10 Inverse scanning order of 8×8 block

n 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15

i 0 1 0 0 1 2 3 2 1 0 0 1 2 3 4 5

j 0 0 1 2 1 0 0 1 2 3 4 3 2 1 0 0

n 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31

i 4 3 2 1 0 0 1 2 3 4 5 6 7 6 5 4

j 1 2 3 4 5 6 5 4 3 2 1 0 0 1 2 3

n 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47

i 3 2 1 0 1 2 3 4 5 6 7 7 6 5 4 3

j 4 5 6 7 7 6 5 4 3 2 1 2 3 4 5 6

n 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63

i 2 3 4 5 6 7 7 6 5 4 5 6 7 7 6 7

j 7 7 6 5 4 3 4 5 6 7 7 6 5 6 7 7

6.4 Inverse quantization

6.4.1 Quantization parameter

Input of this process is QPMB.

Output of this process is QP.

If the current block is a luma block, QP is equal to QPMB.

If the current block is a chroma block, the relationship between QP and QPMB is given in Table 11.

Table 11 The relationship between QP and QPMB in chroma block

QPMB <43 43 44 45 46 47 48 49 50 51 52

QP QPMB 42 43 43 44 44 45 45 46 46 47

QPMB 53 54 55 56 57 58 59 60 61 62 63

QP 47 48 48 48 49 49 49 50 50 50 51
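A minimal C sketch of the chroma mapping in Table 11. The function name chroma_qp is hypothetical; the table values are copied from Table 11 for QPMB = 43 to 63, and values below 43 pass through unchanged.

```c
#include <assert.h>

/* QP used for a chroma block, derived from QPMB as in Table 11. */
int chroma_qp(int qp_mb)
{
    /* Table 11 entries for QPMB = 43 .. 63. */
    static const int table[21] = {
        42, 43, 43, 44, 44, 45, 45, 46, 46, 47,
        47, 48, 48, 48, 49, 49, 49, 50, 50, 50, 51
    };
    assert(qp_mb >= 0 && qp_mb <= 63);
    return qp_mb < 43 ? qp_mb : table[qp_mb - 43];
}
```

For luma blocks no lookup is needed, since QP is equal to QPMB.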

6.4.2 Inverse quantization process

Inputs of this process are

— the variables of BitDepth and QP

— a two-dimensional array C of size N×N, with elements cij, 0≤i≤N-1, 0≤j≤N-1.

The output of this process is a two-dimensional array D of size N×N, with elements dij, 0≤i≤N-1, 0≤j≤N-1. N can be 4 or 8, corresponding to a 4×4 or 8×8 block respectively.

The inverse quantization process is:

dij = Sign( ( Abs(cij) × DequantTable(QP) + 2^(ShiftTable(QP)-1) ) >> ShiftTable(QP), cij )

Data in the bitstream shall ensure that any element cij and dij is in the range of integer values from -2^(BitDepth+7) to 2^(BitDepth+7)-1, inclusive.

Table 12 shows the relationship between QP, DequantTable and ShiftTable

Table 12 The relationship between QP, DequantTable and ShiftTable

QP 0 1 2 3 4 5 6 7

DequantTable(QP) 32768 36061 38968 42495 46341 50535 55437 60424

ShiftTable(QP) 14 14 14 14 14 14 14 14

QP 8 9 10 11 12 13 14 15

DequantTable(QP) 32932 35734 38968 42495 46177 50535 55109 59933

ShiftTable(QP) 13 13 13 13 13 13 13 13

QP 16 17 18 19 20 21 22 23

DequantTable(QP) 65535 35734 38968 42577 46341 50617 55027 60097

ShiftTable(QP) 13 12 12 12 12 12 12 12

QP 24 25 26 27 28 29 30 31

DequantTable(QP) 32809 35734 38968 42454 46382 50576 55109 60056

ShiftTable(QP) 11 11 11 11 11 11 11 11

QP 32 33 34 35 36 37 38 39

DequantTable(QP) 65535 35734 38968 42495 46320 50515 55109 60076

ShiftTable(QP) 11 10 10 10 10 10 10 10

QP 40 41 42 43 44 45 46 47

DequantTable(QP) 65535 35744 38968 42495 46341 50535 55099 60087

ShiftTable(QP) 10 9 9 9 9 9 9 9

QP 48 49 50 51 52 53 54 55

DequantTable(QP) 65535 35734 38973 42500 46341 50535 55109 60097

ShiftTable(QP) 9 8 8 8 8 8 8 8

QP 56 57 58 59 60 61 62 63

DequantTable(QP) 32771 35734 38965 42497 46341 50535 55109 60099

ShiftTable(QP) 7 7 7 7 7 7 7 7
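The inverse quantization of a single coefficient can be sketched in C as follows, with both tables copied from Table 12. The sketch assumes the rounding term is 2 raised to the power ShiftTable(QP) - 1 and that Sign(x, y) returns x with the sign of y; the function and array names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* DequantTable(QP) and ShiftTable(QP) for QP = 0 .. 63 (Table 12). */
static const int32_t DEQUANT_TABLE[64] = {
    32768, 36061, 38968, 42495, 46341, 50535, 55437, 60424,
    32932, 35734, 38968, 42495, 46177, 50535, 55109, 59933,
    65535, 35734, 38968, 42577, 46341, 50617, 55027, 60097,
    32809, 35734, 38968, 42454, 46382, 50576, 55109, 60056,
    65535, 35734, 38968, 42495, 46320, 50515, 55109, 60076,
    65535, 35744, 38968, 42495, 46341, 50535, 55099, 60087,
    65535, 35734, 38973, 42500, 46341, 50535, 55109, 60097,
    32771, 35734, 38965, 42497, 46341, 50535, 55109, 60099
};

static const int SHIFT_TABLE[64] = {
    14, 14, 14, 14, 14, 14, 14, 14,
    13, 13, 13, 13, 13, 13, 13, 13,
    13, 12, 12, 12, 12, 12, 12, 12,
    11, 11, 11, 11, 11, 11, 11, 11,
    11, 10, 10, 10, 10, 10, 10, 10,
    10,  9,  9,  9,  9,  9,  9,  9,
     9,  8,  8,  8,  8,  8,  8,  8,
     7,  7,  7,  7,  7,  7,  7,  7
};

/* d = Sign((Abs(c) * DequantTable(QP) + 2^(Shift-1)) >> Shift, c) */
int32_t dequantize(int32_t c, int qp)
{
    int shift;
    int64_t mag;
    assert(qp >= 0 && qp <= 63);
    shift = SHIFT_TABLE[qp];
    mag = c < 0 ? -(int64_t)c : (int64_t)c;        /* Abs(c) */
    mag = (mag * DEQUANT_TABLE[qp] + ((int64_t)1 << (shift - 1))) >> shift;
    return (int32_t)(c < 0 ? -mag : mag);          /* restore the sign of c */
}
```

A 64-bit intermediate is used so that the product cannot overflow for any coefficient in the permitted range. For instance, dequantize(1, 0) evaluates (32768 + 8192) >> 14, which is 2.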

6.5 Inverse transform process

6.5.1 Inverse transform for 4×4 block

Inputs of this process are

— the variables of BitDepth

— a two-dimensional array D of size 4×4, with elements dij, 0≤i≤3, 0≤j≤3.

The output of this process is a two-dimensional array R of size 4×4, with elements rij, 0≤i≤3, 0≤j≤3.

The inverse transform process is equivalent to the following.

First, horizontal transform for the array D is done:

Step 1, with i = 0, 1, 2, 3

ei0 = di0 + di2

ei2 = di0 - di2

t = (di1 + di3)*69>>7

ei1 = t + (di1*98>>7)

ei3 = t - (di3*236>>7)

Data in the bitstream shall ensure that any element dij, t and eij must be in the range of integer values from -2^(BitDepth+7) to 2^(BitDepth+7)-1, inclusive.

Step 2, with i = 0, 1, 2, 3

fi0 = ei0 + ei1

fi3 = ei0 - ei1

fi1 = ei2 + ei3

fi2 = ei2 - ei3

Data in the bitstream shall ensure that any element fij must be in the range of integer values from -2^(BitDepth+7) to 2^(BitDepth+7)-1, inclusive.

And then, vertical transform for the resulting matrix is done:

Step 1, with j = 0, 1, 2, 3

g0j = f0j + f2j

g2j = f0j - f2j

t = (f1j + f3j)*69>>7

g1j = t + (f1j*98>>7)

g3j = t - (f3j*236>>7)

Data in the bitstream shall ensure that any element gij and t must be in the range of integer values from -2^(BitDepth+7) to 2^(BitDepth+7)-1, inclusive.

Step 2, with j = 0, 1, 2, 3

h0j = g0j + g1j

h3j = g0j - g1j

h1j = g2j + g3j

h2j = g2j - g3j

Data in the bitstream shall ensure that any element hij must be in the range of integer values from -2^(BitDepth+7) to 2^(BitDepth+7)-1, inclusive.

Finally, after the horizontal and vertical transforms, the final reconstructed value is derived as

rij = Sign( ( Abs(hij) + 4 ) >> 3, hij ), with i = 0, 1, …, 3 and j = 0, 1, …, 3
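The 4×4 inverse transform steps can be sketched in C as below. Steps 1 and 2 have the same form in both directions, so one helper (inv4_pass, a hypothetical name) is applied first to each row and then to each column. The sketch assumes that >> on negative integers is an arithmetic shift, which is the behaviour of common compilers but is implementation-defined in C, and it does not check the range constraints.

```c
#include <assert.h>
#include <stdlib.h>

/* Steps 1 and 2 of the 4-point inverse transform on one vector. */
static void inv4_pass(const int in[4], int out[4])
{
    int e0 = in[0] + in[2];
    int e2 = in[0] - in[2];
    int t  = ((in[1] + in[3]) * 69) >> 7;
    int e1 = t + ((in[1] * 98) >> 7);
    int e3 = t - ((in[3] * 236) >> 7);
    out[0] = e0 + e1;
    out[3] = e0 - e1;
    out[1] = e2 + e3;
    out[2] = e2 - e3;
}

/* d: dequantized coefficients d[i][j]; r: output values r[i][j]. */
void inverse_transform_4x4(int d[4][4], int r[4][4])
{
    int f[4][4];
    int col_in[4], col_out[4];
    int i, j;
    assert(d != 0 && r != 0);

    /* Horizontal transform: one pass over each row i. */
    for (i = 0; i < 4; i++)
        inv4_pass(d[i], f[i]);

    /* Vertical transform: one pass over each column j, then rounding. */
    for (j = 0; j < 4; j++) {
        for (i = 0; i < 4; i++)
            col_in[i] = f[i][j];
        inv4_pass(col_in, col_out);
        for (i = 0; i < 4; i++) {
            int h = col_out[i];
            int mag = (abs(h) + 4) >> 3;     /* (Abs(h) + 4) >> 3 */
            r[i][j] = h < 0 ? -mag : mag;    /* Sign(..., h) */
        }
    }
}
```

With only the DC coefficient set, for example d[0][0] = 64, every intermediate value is 64 and every output sample is (64 + 4) >> 3 = 8, so a DC-only block reconstructs to a flat block.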

6.5.2 Inverse transform for 8×8 block

Inputs of this process are

— the variables of BitDepth

— a two-dimensional array D of size 8×8, with elements dij, 0≤i≤7, 0≤j≤7.

The output of this process is a two-dimensional array R of size 8×8, with elements rij, 0≤i≤7, 0≤j≤7.

The inverse transform process is equivalent to the following.

First, horizontal transform for the array D is done:

Step 1, with i = 0, 1, … , 7

ei0 = (di0 + di4)*181>>7

ei1 = (di0 - di4)*181>>7

ei2 = (di2*196>>8) - (di6*473>>8)

ei3 = (di2*473>>8) + (di6*196>>8)

ti4 = di1 - di7

ti7 = di1 + di7

ti5 = di3*181>>7

ti6 = di5*181>>7

ei4 = ti4 + ti6

ei5 = ti7 - ti5

ei6 = ti4 - ti6

ei7 = ti7 + ti5

Data in the bitstream shall ensure that any element dij, tij and eij must be in the range of integer values from -2^(BitDepth+7) to 2^(BitDepth+7)-1, inclusive.

Step 2, with i = 0, 1, … , 7

fi0 = ei0 + ei3

fi3 = ei0 - ei3

fi1 = ei1 + ei2

fi2 = ei1 - ei2

fi4 = (ei4*301>>8) - (ei7*201>>8)

fi7 = (ei4*201>>8) + (ei7*301>>8)

fi5 = (ei5*710>>9) - (ei6*141>>9)

fi6 = (ei5*141>>9) + (ei6*710>>9)

Data in the bitstream shall ensure that any element fij must be in the range of integer values from -2^(BitDepth+7) to 2^(BitDepth+7)-1, inclusive.

Step 3, with i = 0, 1, … , 7

gi0 = fi0 + fi7

gi7 = fi0 - fi7

gi1 = fi1 + fi6

gi6 = fi1 - fi6

gi2 = fi2 + fi5

gi5 = fi2 - fi5

gi3 = fi3 + fi4

gi4 = fi3 - fi4

Data in the bitstream shall ensure that any element gij must be in the range of integer values from -2^(BitDepth+7) to 2^(BitDepth+7)-1, inclusive.

And then, vertical transform for the resulting matrix is done:

Step 1, with j = 0, 1, … , 7

h0j = (g0j + g4j)*181>>7

h1j = (g0j - g4j)*181>>7

h2j = (g2j*196>>8) - (g6j*473>>8)

h3j = (g2j*473>>8) + (g6j*196>>8)

t4j = g1j - g7j

t7j = g1j + g7j

t5j = g3j*181>>7

t6j = g5j*181>>7

h4j = t4j + t6j

h5j = t7j - t5j

h6j = t4j - t6j

h7j = t7j + t5j

Data in the bitstream shall ensure that any element hij must be in the range of integer values from -2^(BitDepth+7) to 2^(BitDepth+7)-1, inclusive.

Step 2, with j = 0, 1, … , 7

m0j = h0j + h3j

m3j = h0j - h3j

m1j = h1j + h2j

m2j = h1j - h2j

m4j = (h4j*301>>8) - (h7j*201>>8)

m7j = (h4j*201>>8) + (h7j*301>>8)

m5j = (h5j*710>>9) - (h6j*141>>9)

m6j = (h5j*141>>9) + (h6j*710>>9)

Data in the bitstream shall ensure that any element mij must be in the range of integer values from -2^(BitDepth+7) to 2^(BitDepth+7)-1, inclusive.

Step 3, with j = 0, 1, … , 7

n0j = m0j + m7j

n7j = m0j - m7j

n1j = m1j + m6j

n6j = m1j - m6j

n2j = m2j + m5j

n5j = m2j - m5j

n3j = m3j + m4j

n4j = m3j - m4j

Data in the bitstream shall ensure that any element nij must be in the range of integer values from -2^(BitDepth+7) to 2^(BitDepth+7)-1, inclusive.

Finally, after the horizontal and vertical transforms, the final reconstructed value is derived as

rij = Sign( ( Abs(nij) + 16 ) >> 5, nij ), with i = 0, 1, …, 7 and j = 0, 1, …, 7

6.6 Intra prediction

In IVC, to decode the DC coefficient of the current intra coded block, a prediction value of the DC coefficient is first obtained from its neighbouring blocks; a DC coefficient differential value is then recovered from the coded data and added to the predictor to recover the final decoded coefficient.

The DC prediction is performed for intra coded blocks.

6.6.1 Intra prediction modes of DC coefficients

As shown in Table 13, three prediction modes are available for coding the DC coefficient of the current block. The prediction mode of the current block in an intra macroblock is obtained by decoding the syntax elements (intra_mode( intra_mode_8[i], intra_mode_4[i][j])).

Table 13 Intra prediction modes of DC coefficients

Value prediction modes

0 horizontal prediction

1 vertical prediction

2 direct prediction

Horizontal prediction: The DC coefficient of current block can be predicted from its left-hand

block.

Vertical prediction: The DC coefficient of current block can be predicted from its upper block.

Direct prediction: The DC coefficient of current block can be predicted from a predetermined

value: 0.

6.6.2 Getting intra DC coefficients’ prediction values

If the DC coefficient of the current block is coded with prediction (horizontal prediction or vertical prediction), the prediction value is calculated as follows.

There are four 8x8 blocks in one macroblock, and the DC coefficient of each block is denoted B8_DC_Level[j][i] (0≤i,j≤1), as shown in Figure 22.

B8_DC_Level[0][0] B8_DC_Level[0][1]

B8_DC_Level[1][0] B8_DC_Level[1][1]

Figure 22 DC coefficient of 8x8 blocks

The size of the reference block is determined as follows.

If the current block is an 8x8 block, its neighboring blocks are regarded as 8x8 blocks regardless of whether they use 8x8 or 4x4 spatial prediction.

If the current block is a 4x4 block, its neighboring blocks are regarded as follows:

--If the current block and its neighboring block belong to the same 8x8 block, the neighboring block is regarded as a 4x4 block and its DC value equals the DC value of the 4x4 transform.

--Otherwise, the neighboring blocks are regarded as 8x8 blocks.

If the block size of the current block is equal to that of its reference block, the neighboring block's DC value is used as the prediction value. Otherwise, the prediction value equals one half of the neighboring block's DC value: B8_DC_Level / 2.

If a reference block is entirely intra coded, it is available for DC prediction; otherwise it is treated as unavailable and the corresponding mode cannot be used. If the reference block size is equal to its transform block size, the DC coefficient is used as the prediction value. Otherwise, the reference block size is 8x8 and the transform block size is 4x4, and the DC of the 8x8 block is derived as:

B8_DC_Level = (B4_DC_Level[0][0] + B4_DC_Level[0][1] + B4_DC_Level[1][0] + B4_DC_Level[1][1] + 1) / 2

where B4_DC_Level[i][j] is the DC of a 4x4 block within the 8x8 block.
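The two DC derivation rules above can be sketched in C as follows. The function names are hypothetical, and the sketch assumes that "/ 2" is an integer division truncating toward zero, which the text does not state for negative values.

```c
#include <assert.h>

/* DC of an 8x8 reference block derived from its four 4x4 blocks. */
int b8_dc_from_b4(int b4_dc[2][2])
{
    assert(b4_dc != 0);
    return (b4_dc[0][0] + b4_dc[0][1] + b4_dc[1][0] + b4_dc[1][1] + 1) / 2;
}

/* Prediction value: the neighbour's DC when the current block and the
 * reference block have the same size, otherwise half of that DC. */
int dc_prediction(int neighbour_dc, int cur_size, int ref_size)
{
    assert(cur_size == 4 || cur_size == 8);
    assert(ref_size == 4 || ref_size == 8);
    return cur_size == ref_size ? neighbour_dc : neighbour_dc / 2;
}
```

For example, four 4x4 DC values 1, 2, 3 and 4 give an 8x8 DC of (10 + 1) / 2 = 5.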

6.6.3 Reconstruction

The reconstructed block can be obtained as follows. The transform data f[y][x] shall be added to the prediction data 128 and saturated to form the final decoded samples d[y][x] as follows:

for (y=0; y<size; y++) {
    for (x=0; x<size; x++) {
        d[y][x] = f[y][x]+128;
        if (d[y][x] < 0) d[y][x] = 0;
        if (d[y][x] > 255) d[y][x] = 255;
    }
}

6.7 Inter prediction

Inter prediction creates a prediction model from one or more previously decoded video frames. The current frame is then obtained by adding the decoded residual to the prediction. The process of inter prediction is shown in Figure 23.

For the intra coding techniques used in inter frames, refer to 6.6.2.2.

A block has no coefficients in two circumstances: in skip mode, and when all coefficients of the current block are equal to zero. In both cases the residual f[y][x] is zero and the decoded picture is the predicted picture p[y][x].

[Figure 23 block diagram. Blocks: Vector Decoding, Vector Predictors, Scaling for Colour Components, Additional Dual-Prime Arithmetic, Framestore Addressing, Framestores, Prediction Field/Frame Selection, Half-pel Prediction Filtering, Combine Predictions, Saturation. Signals: From Bitstream, vector'[r][s][t], vector[r][s][t], Half-Pel Info., p[y][x], f[y][x], d[y][x] (Decoded Pels).]

Figure 23 A simplified motion compensation process

6.7.1 Inter prediction modes

For each coding block (16x16, 8x8, or 4x4), the prediction mode is derived from SpatialTemporalDirection as defined in Table 14.

Table 14 Prediction mode

SpatialTemporalDirection    MvNum    PredMode
0                           0        intra
1                           1        FWD
2                           1        BWD
3                           2        BI
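Table 14 can be expressed as a small lookup, sketched here in C. The enum and function names are illustrative only.

```c
#include <assert.h>

/* PredMode values of Table 14, in SpatialTemporalDirection order. */
typedef enum {
    PRED_INTRA = 0,
    PRED_FWD   = 1,
    PRED_BWD   = 2,
    PRED_BI    = 3
} pred_mode_t;

/* Prediction mode for a given SpatialTemporalDirection value. */
pred_mode_t pred_mode_from_direction(int dir)
{
    assert(dir >= 0 && dir <= 3);
    return (pred_mode_t)dir;
}

/* MvNum: number of motion vectors carried for that direction. */
int mv_num_from_direction(int dir)
{
    static const int mv_num[4] = {0, 1, 1, 2};
    assert(dir >= 0 && dir <= 3);
    return mv_num[dir];
}
```

A decoder would use MvNum to decide how many motion vector differences to parse for the block.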

6.7.2 Frame prediction modes selection

This section determines which frame is chosen as the prediction reference.

A P frame uses one forward frame as reference.

A B frame uses the neighbouring forward and backward P frames as references.

The relation between block size and DCT transform is as follows:

For a 16x16 block, an 8x8 transform is performed;

For an 8x8 block, an 8x8 transform is performed;

For a 4x4 block, a 4x4 transform is performed.

6.7.3 Motion vectors

When motion vectors are coded, only the differentials between the motion vectors and their predicted values are coded. To decode them, the decoder should maintain four motion vector predictors (each with one horizontal and one vertical component), labelled PMV[r][s][t]. For each prediction, the corresponding motion vector vector'[r][s][t] is derived first. This vector is then scaled according to the format of the video signal to give the final motion vector vector[r][s][t]. Table 15 gives the meaning of the indices in PMV[r][s][t], vector'[r][s][t] and vector[r][s][t].

Table 15 Meanings of the indices in PMV[r][s][t], vector'[r][s][t] and vector[r][s][t]

Index   0                                                    1
r       the first motion vector in the current macroblock    the second motion vector in the current macroblock
s       forward motion vector                                backward motion vector
t       horizontal component                                 vertical component

Note: r can also be 2 or 3, indicating the third and fourth motion vectors of the current macroblock.

6.7.4 Luma motion vectors prediction

If the current macroblock mode is skip, refer to 6.7.6 for the motion vector prediction.

Else if the left-hand block of the current block is available and its size is 16x16, the predicted value of the luma motion vector is equal to the motion vector of the left-hand 16x16 block.

Else if the left-hand block is available and its size is 8x8, the predicted value of the luma motion vector is equal to the motion vector of the left-hand 8x8 block.

Else if the left-hand block is available and its size is 4x4, the predicted value of the luma motion vector is equal to the motion vector of the left-hand 4x4 block.

Else if the left-hand block is not available or uses intra prediction mode, the predicted value is 0.

6.7.4.1 Decoding luma motion vectors

The motion vector of the current block is equal to the sum of the predicted motion vector and the differentials decoded from mv_diff_x and mv_diff_y. If the current macroblock or subblock mode is skip, the motion vector is the predicted one.
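The prediction and decoding rules of 6.7.4 and 6.7.4.1 can be sketched in C as follows. The `LeftBlock` structure and the function names are hypothetical, and the sketch ignores prediction direction, which 7.4.1 adds as a further condition on the encoder side.

```c
#include <assert.h>
#include <stdbool.h>

typedef struct { int x, y; } MotionVector;

/* Hypothetical description of the left-hand neighbour block. */
typedef struct {
    bool         available;  /* false outside the picture or slice */
    bool         is_intra;
    MotionVector mv;
} LeftBlock;

/* Predictor: the left block's vector, or zero if it is unavailable or intra. */
static MotionVector predict_luma_mv(const LeftBlock *left)
{
    MotionVector zero = { 0, 0 };
    assert(left);
    if (!left->available || left->is_intra)
        return zero;
    return left->mv;
}

/* Vector = predictor + decoded differential; a skipped block keeps the predictor. */
static MotionVector decode_luma_mv(const LeftBlock *left, bool is_skip,
                                   int mv_diff_x, int mv_diff_y)
{
    MotionVector mv = predict_luma_mv(left);
    if (!is_skip) {
        mv.x += mv_diff_x;
        mv.y += mv_diff_y;
    }
    return mv;
}
```

The left block's size (16x16, 8x8 or 4x4) does not change the rule, so the sketch takes only its vector.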


6.7.4.2 Resetting motion vector predictors

All motion vector predictors shall be reset to zero in the following cases:

At the start of each slice.

Whenever an intra macroblock is decoded, the motion vector is 0.

6.7.4.3 Motion vectors for chrominance components

Motion vectors for the chrominance components are obtained by scaling the luminance motion vectors.

If the current block is an intra block, the chrominance components are intra predicted; refer to 6.6.3.

If the current block is not an intra block:

If the current block size is 16x16 or 8x8, both the horizontal and vertical components of the motion vector are scaled by dividing by two. That is:

vector[r][s][0] = vector'[r][s][0] / 2;

vector[r][s][1] = vector'[r][s][1] / 2;

If the current block size is 4x4, the first 4x4 block of the 8x8 block is chosen as reference, and both the horizontal and vertical components of its motion vector are scaled by dividing by two.
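A minimal C sketch of the chroma scaling follows. It assumes that the "/" in the text truncates toward zero, as C integer division does; the text does not state the rounding rule. The names `Mv` and `scale_chroma_mv` are illustrative.

```c
#include <assert.h>

typedef struct { int x, y; } Mv;   /* components in half-sample units */

/* Scale a luma vector to chroma resolution by dividing both components by two.
   For a 4x4 block the caller passes the vector of the reference 4x4 block.
   Truncation toward zero is an assumption about the "/" operator of the text. */
static Mv scale_chroma_mv(Mv luma, int block_size)
{
    assert(block_size == 16 || block_size == 8 || block_size == 4);
    Mv chroma = { luma.x / 2, luma.y / 2 };
    return chroma;
}
```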

6.7.5 Forming predictors

Predictions are formed by reading prediction samples from the reference fields or frames. A

given sample is predicted by reading the corresponding sample in the reference field or frame

offset by the motion vector.

A positive value of the horizontal component of a motion vector indicates that the prediction

is made from samples (in the reference field/frame) that lie to the right of the samples being

predicted. A positive value of the vertical component of a motion vector indicates that the

prediction is made from samples (in the reference field/frame) that lie below the samples being

predicted.

All motion vectors are specified to an accuracy of one half sample. Thus if a component of

the motion vector is odd, the samples will be read from mid-way between the actual samples in the

reference field/frame. These half-samples are calculated by simple linear interpolation from the

actual samples.

For each prediction block the integer sample motion vectors int_vec[t] and the half sample

flags half_flag[t] shall be formed as follows;

for (t=0; t<2; t++) {
    int_vec[t] = vector[r][s][t] DIV 2;
    if ((vector[r][s][t] - (2 * int_vec[t])) != 0)
        half_flag[t] = 1;
    else
        half_flag[t] = 0;
}

Then the final predicted value is calculated as follows:

if ( (! half_flag[0]) && (! half_flag[1]) )
    pel_pred[y][x] = pel_ref[y + int_vec[1]][x + int_vec[0]];

if ( (! half_flag[0]) && half_flag[1] )
    pel_pred[y][x] = ( pel_ref[y + int_vec[1]][x + int_vec[0]] +
                       pel_ref[y + int_vec[1]+1][x + int_vec[0]] ) // 2;

if ( half_flag[0] && (! half_flag[1]) )
    pel_pred[y][x] = ( pel_ref[y + int_vec[1]][x + int_vec[0]] +
                       pel_ref[y + int_vec[1]][x + int_vec[0]+1] ) // 2;

if ( half_flag[0] && half_flag[1] )
    pel_pred[y][x] = ( pel_ref[y + int_vec[1]][x + int_vec[0]] +
                       pel_ref[y + int_vec[1]][x + int_vec[0]+1] +
                       pel_ref[y + int_vec[1]+1][x + int_vec[0]] +
                       pel_ref[y + int_vec[1]+1][x + int_vec[0]+1] ) // 4;

where pel_pred[y][x] is the prediction sample being formed and pel_ref[y][x] are samples in

the reference field or frame.
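The pseudocode above can be turned into a C sketch under two assumptions that this section does not spell out: DIV is taken as integer division rounding toward minus infinity, and // as division with rounding to the nearest integer, which is how the MPEG-2 video specification defines these operators. The function names and the flat reference-plane layout are illustrative.

```c
#include <assert.h>
#include <stdint.h>

/* Floor division by two: the assumed meaning of "DIV 2". */
static int div2_floor(int v)
{
    return (v >= 0) ? v / 2 : -((-v + 1) / 2);
}

/* Predict the sample at (x, y) from the reference plane `ref` (row stride
   `stride`). mv_x and mv_y are in half-sample units. The caller must ensure
   that every referenced sample lies inside the plane. */
static uint8_t predict_sample(const uint8_t *ref, int stride,
                              int x, int y, int mv_x, int mv_y)
{
    assert(ref && stride > 0);
    int ix = div2_floor(mv_x), iy = div2_floor(mv_y);
    int hx = (mv_x - 2 * ix) != 0;            /* half_flag[0] */
    int hy = (mv_y - 2 * iy) != 0;            /* half_flag[1] */
    const uint8_t *p = ref + (y + iy) * stride + (x + ix);

    if (!hx && !hy) return p[0];
    if (!hx &&  hy) return (uint8_t)((p[0] + p[stride] + 1) / 2);
    if ( hx && !hy) return (uint8_t)((p[0] + p[1] + 1) / 2);
    return (uint8_t)((p[0] + p[1] + p[stride] + p[stride + 1] + 2) / 4);
}
```

Since all samples are non-negative, adding half of the divisor before an integer division gives the assumed round-to-nearest behaviour.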

6.7.6 Skipped mode macroblocks

A skipped macroblock is a macroblock for which no residual data is encoded. Except at the

start of a slice, if the number (macroblock_address - previous_macroblock_address - 1) is larger

than zero then this number indicates the number of macroblocks that have been skipped. The

decoder shall form a prediction for skipped macroblocks which shall then be used as the final

decoded sample values. A skipped macroblock should be derived as follows.

The coding block-size should be 16x16. If the left block exists and is not intra coded, the block

mode should be equal to the mode of the left block. Otherwise, if the picture type is P, the block mode

should be forward; if the picture type is B, it should be bi-directional. The MVD is equal to 0. The residue block is an all-zero block.
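The derivation of a skipped macroblock can be sketched in C as below. The enum, structure and function names are hypothetical and only mirror the rules stated in the paragraph above.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum { MODE_INTRA, MODE_FWD, MODE_BWD, MODE_BI } BlockMode;
typedef enum { PICTURE_P, PICTURE_B } PictureType;

/* Parameters implied by skip mode; nothing of this is read from the bitstream. */
typedef struct {
    int       block_size;    /* always 16 (16x16) */
    BlockMode mode;
    int       mvd_x, mvd_y;  /* always 0 */
    bool      has_residual;  /* always false: all-zero residue block */
} SkipParams;

static SkipParams derive_skip(bool left_exists, BlockMode left_mode,
                              PictureType picture_type)
{
    SkipParams s = { 16, MODE_FWD, 0, 0, false };
    assert(picture_type == PICTURE_P || picture_type == PICTURE_B);
    if (left_exists && left_mode != MODE_INTRA)
        s.mode = left_mode;                    /* inherit the left block's mode */
    else
        s.mode = (picture_type == PICTURE_B) ? MODE_BI : MODE_FWD;
    return s;
}
```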

6.7.7 Combining predictions

The final stage is to combine the various predictions together in order to form the final

prediction blocks. For B frames, if bi-direction prediction is executed, the final prediction value

should be an average of forward and backward prediction. If forward prediction is denoted as

pel_pred_forward[y][x] and backward prediction is pel_pred_backward[y][x], then the final

prediction can be calculated as:


pel_pred[y][x] = (pel_pred_forward[y][x] + pel_pred_backward[y][x])//2;

6.7.8 Adding prediction and coefficient data

Once the prediction blocks have been formed, they are added to the corresponding residuals to obtain the reconstructed picture. The transform data f[y][x] shall be added to the prediction data p[y][x] and saturated to form the final decoded samples d[y][x] as follows:

for (y=0; y<size; y++) {
    for (x=0; x<size; x++) {
        d[y][x] = f[y][x]+p[y][x];
        if (d[y][x] < 0) d[y][x] = 0;
        if (d[y][x] > 255) d[y][x] = 255;
    }
}
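A compilable C form of this reconstruction step might look as follows. It assumes 8-bit samples stored in row-major order and 16-bit residuals; the function names are illustrative.

```c
#include <assert.h>
#include <stdint.h>

/* Saturate to the 8-bit sample range [0, 255]. */
static uint8_t clip255(int v)
{
    return (uint8_t)(v < 0 ? 0 : (v > 255 ? 255 : v));
}

/* d = saturate(f + p) for a size x size block stored in row-major order. */
static void reconstruct_block(int size, const int16_t *f,
                              const uint8_t *p, uint8_t *d)
{
    assert(size == 4 || size == 8 || size == 16);
    for (int i = 0; i < size * size; i++)
        d[i] = clip255(f[i] + p[i]);
}
```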


7 Description of the Internet Video Coding Encoder

7.1 General Coding Structure

The coding structure of the IVC is similar to MPEG-1, and the codec is royalty free, while

providing better coding performance compared with MPEG-2. The key technologies used in the

current Test Model are listed as follows:

Integer DCT transforms: transform sizes of 4x4 and 8x8 are supported. 16-bit

implementation is supported.

Quad-tree based variable block-size coding: the macro-block (MB) size is 16x16. The

MB is tiled to coding blocks in a quad-tree style. Inter coding supports 16x16, 8x8 and

4x4; intra coding supports 8x8 and 4x4.

QMCoder for entropy coding: the classic QMCoder is used for entropy coding. This is

the same as JPEG, Annex D.

Motion accuracy of 1/2 pel with 2-tap interpolation filter: a simple 2-tap interpolation

filter is used for sub-pel MC.

IBBP structure: I/B/P frames are supported, and the number of B frames is defined in the

sequence header.

Figure 24 shows the coding process of this proposal. It is similar to MPEG-1, but uses JPEG arithmetic coding instead of VLC coding. Each coding tool is discussed in detail in this section.

[Figure 24 flow diagram. Blocks: Block Segment, Intra? decision (Yes / No), Intra DC Prediction, Inter Prediction, Transform, Quantization, Entropy Coding.]

Figure 24. Coding Process

7.2 Picture Partitioning

7.2.1 Macroblock

The basic unit of video decoding in this part is the macroblock. A macroblock consists of a 16x16 luminance block and the corresponding chroma blocks. A macroblock can be further divided into 8x8 blocks and 4x4 blocks to perform the prediction.

7.2.2 Slice

Slice is a series of one or more macroblocks in the order of raster scan. Macroblocks of a

slice shall not overlap and also slices shall not overlap. The position of slices may change from

picture to picture. The decoding process of a macroblock inside a slice should not use data in the

other slices of the same picture.

7.3 Intra Prediction

One intra coded macroblock is divided into four 8x8 intra blocks. Each 8x8 intra block can be

coded as either one 8x8 block or four separate 4x4 blocks. The structure is shown in Figure 25. For

chroma, only the 8x8 block-size is used.

[Figure 25 diagram. Block labels: 8x8 (four blocks) and 4x4 (four blocks).]

Figure 25. Quad-tree segmentation for intra coding

If a macroblock is intra coded, all the blocks within it are intra coded. Otherwise, a block mode is signaled for each block, and if the block mode is intra, this block is intra coded. Encoders can choose to encode a picture in which all the macroblocks are intra coded.

Spatial prediction is not employed. The value 128 is used as the prediction value for each pixel in an intra coded block. Intra coded blocks are transformed directly, and the DC coefficient is predicted from the DC coefficient of a neighboring block, which is referred to as the reference block. As shown in Table 16, three prediction modes can be used for DC.

Table 16 Prediction modes for intra DC

Prediction mode   Prediction value
Left              DC of the left block
Up                DC of the up block
None              0

The size of the reference block is determined as follows.

1. If the current block is an 8x8 block, its neighboring blocks should be regarded as 8x8 blocks, regardless of whether they use 8x8 or 4x4 spatial prediction.

2. If the current block is a 4x4 block, its neighboring blocks should be regarded as follows:

a) If the current block and its neighboring block belong to the same 8x8 block, the neighboring block should be regarded as a 4x4 block, and its DC value is equal to the DC value of the 4x4 transform.

b) Otherwise, the neighboring blocks should be regarded as 8x8 blocks.

This means that, in most cases, the reference block size is 8x8. If a reference block is entirely intra coded, it is available for DC prediction; otherwise, it is treated as unavailable and the corresponding mode cannot be used. If the reference block size is equal to its transform block size, the DC coefficient is used as the prediction value. Otherwise, the reference block size is 8x8 and the transform block size is 4x4, and the DC of the 8x8 block is derived as:

B8_DC = (B4_DC[0][0] + B4_DC[0][1] + B4_DC[1][0] + B4_DC[1][1] + 1) / 2

where B4_DC[i][j] is the DC of a 4x4 block within the 8x8 block.
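The DC prediction value and the derivation of B8_DC can be sketched in C as follows. With an orthonormal transform, the DC of an 8x8 block is half the sum of the DCs of its four 4x4 blocks, which is what the formula expresses; the "+1" provides rounding. The names are illustrative, and the behaviour of "/" for negative sums (truncation, as in C) is an assumption.

```c
#include <assert.h>

typedef enum { DC_PRED_LEFT, DC_PRED_UP, DC_PRED_NONE } DcPredMode;

/* DC of an 8x8 reference block from the DCs of its four 4x4 blocks,
   given in the order [0][0], [0][1], [1][0], [1][1]. */
static int derive_b8_dc(const int b4_dc[4])
{
    assert(b4_dc);
    return (b4_dc[0] + b4_dc[1] + b4_dc[2] + b4_dc[3] + 1) / 2;
}

/* Prediction value for the DC coefficient according to Table 16. */
static int dc_prediction(DcPredMode mode, int left_dc, int up_dc)
{
    switch (mode) {
    case DC_PRED_LEFT: return left_dc;
    case DC_PRED_UP:   return up_dc;
    default:           return 0;      /* mode "None" */
    }
}
```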

7.4 Inter Prediction

If the macroblock is not intra coded, it can be segmented into intra coded blocks and inter coded blocks in a quad-tree structure. An example is shown in Figure 26.

For inter coded blocks, the coding block-size is the temporal prediction block-size. For temporal prediction, block sizes of 16x16, 8x8 and 4x4 are supported. A macroblock can be temporally predicted as a whole, i.e., as a 16x16 block (inter16x16), or split into four 8x8 blocks. Each 8x8 block can be coded as a whole (intra8x8 or inter8x8), or further split into four 4x4 blocks. Each 4x4 block can be either intra coded or inter coded separately. The structure is shown in Figure 27. The temporal prediction block-size of chroma is half of the luma block-size.


[Figure 26 diagram: a macroblock containing one 8x8 intra block, two 8x8 inter blocks, and one 8x8 block split into four 4x4 blocks (two intra, two inter).]

Figure 26. An example of the quad-tree segmentation

[Figure 27 diagram. Block labels: 16x16, 8x8 (four blocks) and 4x4 (four blocks).]

Figure 27. Quad-tree segmentation for inter prediction

7.4.1 Motion vector prediction

While coding an MV, a predicted MV (MVP) is first generated, and the differential (MVD) is coded.

If the left neighboring block of the current block is available, and an MV with the same direction

(forward or backward) is used for the left block, this MV is used as the MVP of current MV.

Otherwise, 0 is used as the MVP.


7.4.2 Skip Mode

One bit is signaled for each macroblock, indicating if it is skipped. A skipped macroblock should be

derived as follows.

The coding block-size should be 16x16. If the left block exists and is not intra coded, the block mode

should be equal to the mode of the left block. Otherwise, if the picture type is P, the block mode should

be forward; if the picture type is B, it should be bi-directional. The MVD is equal to 0. The residue block is an all-zero block.

7.5 Transform

IVC supports 4x4 and 8x8 transforms. Discrete Cosine Transform (DCT) is used for the separable

two-dimensional transform. There are no scale factors for coefficients since the transform is

orthonormal. Low complexity butterfly structure for 4-point and 8-point transforms is used. Moreover,

the design of the transform is fully recursive.

7.5.1 1-D 4-point forward transform

The butterfly structure of 4x4 1-D DCT is given as below, with “x” as input and “X” as output.

[Butterfly diagram: inputs x0 to x3; multipliers A and B; final right shifts >>1; outputs in the order X0, X2, X1, X3.]

The irrational numbers of the parameters in the butterfly structure are approximated with rational

numbers as follows.

\[ A = \sqrt{2}\,\cos(3\pi/8) \approx 69/128 \]

\[ B = \sqrt{2}\,\sin(3\pi/8) \approx 167/128 \]
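As a rough illustration, a 4-point forward transform using the rational values 69/128 and 167/128 for A and B can be sketched in C. The even/odd decomposition below is the standard one for an orthonormal 4-point DCT and is consistent with the >>1 output shifts of the diagram, but the exact operation order and rounding of the test model are not recoverable from this section, so the rounding offsets and all names here are assumptions.

```c
#include <assert.h>

/* A ~ 69/128 and B ~ 167/128, both with a 7-bit fractional part. */
enum { DCT4_A = 69, DCT4_B = 167, DCT4_FRAC_BITS = 7 };

/* Rounding right shift that is well defined for negative values. */
static int shr_round(int v, int s)
{
    int t = v + (1 << (s - 1));
    return (t >= 0) ? (t >> s) : -((-t + (1 << s) - 1) >> s);
}

/* 1-D 4-point forward transform: x is the input, X the output. */
static void dct4_forward(const int x[4], int X[4])
{
    assert(x && X);
    int s0 = x[0] + x[3], s1 = x[1] + x[2];   /* even part */
    int d0 = x[0] - x[3], d1 = x[1] - x[2];   /* odd part  */

    X[0] = shr_round(s0 + s1, 1);
    X[2] = shr_round(s0 - s1, 1);
    /* The 7 fractional bits and the final >>1 are merged into one shift. */
    X[1] = shr_round(DCT4_B * d0 + DCT4_A * d1, DCT4_FRAC_BITS + 1);
    X[3] = shr_round(DCT4_A * d0 - DCT4_B * d1, DCT4_FRAC_BITS + 1);
}
```

A constant input such as {4, 4, 4, 4} produces only a DC coefficient, as expected for a DCT.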

7.5.2 1-D 8-point forward transform

The butterfly structure of 8x8 1-D DCT is given as below, with “x” as input and “X” as output.


[Butterfly diagram: inputs x0 to x7; multipliers A, B, C, D, E, F and G; final right shifts >>2; outputs in the order X0, X4, X2, X6, X1, X5, X3, X7.]

The irrational numbers of the parameters in the butterfly structure are approximated with rational

numbers as follows.

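\[ A = 2\sin(2\pi/16) \approx 196/256, \qquad B = 2\cos(2\pi/16) \approx 473/256 \]

\[ C = \sqrt{2}\,\cos(3\pi/16) \approx 301/256, \qquad D = \sqrt{2}\,\sin(3\pi/16) \approx 201/256 \]

\[ E = \sqrt{2}\,\cos(\pi/16) \approx 710/512, \qquad F = \sqrt{2}\,\sin(\pi/16) \approx 141/512 \]

\[ G = \sqrt{2} \approx 181/128 \]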

7.6 Quantization

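The QP range is from 0 to 63, and Table 17 lists the parameters used on the encoder side. The quantization process is defined as follows with 16-bit precision.

\[ C_q = \left( C \times Q\_TAB[QP] + \mathit{offset} \right) \gg 15 \]

\[ \mathit{offset} = \begin{cases} \left( (1 \ll 15) \times 10 \right) / 31, & \text{intra} \\ \left( (1 \ll 15) \times 10 \right) / 62, & \text{inter} \end{cases} \]

where C is the coefficient after the transform and Cq is the coefficient after quantization.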

Table 17. The value of Q_TAB.

QP 0 1 2 3 4 5 6 7

Q_TAB 32768 29775 27554 25268 23170 21247 19369 17770


QP 8 9 10 11 12 13 14 15

Q_TAB 16302 15024 13777 12634 11626 10624 9742 8958

QP 16 17 18 19 20 21 22 23

Q_TAB 8192 7512 6889 6305 5793 5303 4878 4467

QP 24 25 26 27 28 29 30 31

Q_TAB 4091 3756 3444 3161 2894 2654 2435 2235

QP 32 33 34 35 36 37 38 39

Q_TAB 2048 1878 1722 1579 1449 1329 1218 1117

QP 40 41 42 43 44 45 46 47

Q_TAB 1024 939 861 790 724 664 609 558

QP 48 49 50 51 52 53 54 55

Q_TAB 512 470 430 395 362 332 304 279

QP 56 57 58 59 60 61 62 63

Q_TAB 256 235 215 197 181 166 152 140
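The following C sketch shows how Table 17 could be used. It reads the quantization rule as Cq = (C x Q_TAB[QP] + offset) >> 15, with offset = ((1 << 15) x 10) / 31 for intra blocks and ((1 << 15) x 10) / 62 for inter blocks, i.e. rounding offsets of roughly 1/3 and 1/6. Applying the rule to the magnitude and restoring the sign afterwards is an assumption, as is the function name `quantize`.

```c
#include <assert.h>
#include <stdint.h>

/* Q_TAB of Table 17, indexed by QP (0..63). */
static const int32_t Q_TAB[64] = {
    32768, 29775, 27554, 25268, 23170, 21247, 19369, 17770,
    16302, 15024, 13777, 12634, 11626, 10624,  9742,  8958,
     8192,  7512,  6889,  6305,  5793,  5303,  4878,  4467,
     4091,  3756,  3444,  3161,  2894,  2654,  2435,  2235,
     2048,  1878,  1722,  1579,  1449,  1329,  1218,  1117,
     1024,   939,   861,   790,   724,   664,   609,   558,
      512,   470,   430,   395,   362,   332,   304,   279,
      256,   235,   215,   197,   181,   166,   152,   140
};

/* Quantize one transform coefficient c at the given QP. */
static int32_t quantize(int32_t c, int qp, int is_intra)
{
    assert(qp >= 0 && qp <= 63);
    const int64_t offset = ((int64_t)1 << 15) * 10 / (is_intra ? 31 : 62);
    const int64_t mag = (c < 0) ? -(int64_t)c : (int64_t)c;
    const int32_t q = (int32_t)((mag * Q_TAB[qp] + offset) >> 15);
    return (c < 0) ? -q : q;   /* sign handling is an assumption */
}
```

Q_TAB halves every 8 QP steps (32768 at QP 0, 16302 at QP 8, 8192 at QP 16), so the effective step size doubles every 8 QP values.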

7.6.1 Quantization parameter for Luma

If the current block is a luma block, its quantization parameter (QPL) is equal to the QP of the current macroblock (QPMB).

7.6.2 Quantization parameter for Chroma

If the current block is a chroma block, the relationship between its quantization parameter (QPC) and QPMB is given in Table 18.

Table 18 The relationship between QPC and QPMB

QPMB <43 43 44 45 46 47 48 49 50 51 52

QPC QPMB 42 43 43 44 44 45 45 46 46 47

QPMB 53 54 55 56 57 58 59 60 61 62 63

QPC 47 48 48 48 49 49 49 50 50 50 51
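Table 18 can be implemented as a small lookup, sketched here in C. The function name `chroma_qp` is illustrative.

```c
#include <assert.h>

/* QPC for a given QPMB according to Table 18. */
static int chroma_qp(int qp_mb)
{
    /* Entries for QPMB = 43..63; below 43, QPC equals QPMB. */
    static const int qpc_tab[21] = {
        42, 43, 43, 44, 44, 45, 45, 46, 46, 47,        /* QPMB 43..52 */
        47, 48, 48, 48, 49, 49, 49, 50, 50, 50, 51     /* QPMB 53..63 */
    };
    assert(qp_mb >= 0 && qp_mb <= 63);
    return (qp_mb < 43) ? qp_mb : qpc_tab[qp_mb - 43];
}
```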


7.7 Entropy Coding

IVC employs a QM Coder, which is the same as that of Annex D of the JPEG standard (ISO/IEC 10918-1). Coefficients are coded in zigzag order. An eobFlag is coded at the beginning of a block, and after each coefficient, to indicate whether more coefficients follow.

7.7.1 Binarization and Context model Selection (CS)

Signed Unary code, Truncated Unary code, Fixed Length code, Signed Exp-Golomb code and

flag are used for the binarization. The binarization of all the syntax elements is given in Table 19.

Table 19 Binarization of syntax elements.

Syntax elements                           Binarization                        CS
macroblockSkipFlag                        flag                                1 context model
mbQPDelta                                 Signed Unary code                   Sec. 7.7.1.1
blockSize                                 flag                                1 context model
mbSpatialTemporalDirection                Fixed Length code (2-bin)           1 context model for each bin
mvDiffX                                   Signed Zero-order Exp-Golomb code   Sec. 7.7.1.2
mvDiffY                                   Signed Zero-order Exp-Golomb code   Sec. 7.7.1.2
chromaIntraMode                           Truncated Unary code (1 or 2-bin)   1 context model for each bin
subBlockSize                              flag                                1 context model
lumaIntraMode                             Truncated Unary code (1 or 2-bin)   1 context model for each bin
subMBSpatialTemporalPredictionDirection   Fixed Length code (2-bin)           1 context model for each bin
blockSpatialTemporalPredictionDirection   Fixed Length code (2-bin)           1 context model for each bin
eobFlag                                   flag                                Sec. 7.7.1.3
transCoefficient                          Signed Zero-order Exp-Golomb code   Sec. 7.7.1.4


7.7.1.1 CS for mbQPDelta

Four context models are used. Each of the first three bins has its own context model, while the remaining bins share the fourth context model.

7.7.1.2 CS for mvDiffX, mvDiffY

Five context models are used for mvDiffX. Each of the first four bins has its own context model, while the remaining bins share the fifth context model. Another five context models are used for mvDiffY, with the same CS as mvDiffX.

7.7.1.3 CS for eobFlag

There are 16, 64 and 64 context models used in 4x4 Luma transform block, 8x8 Luma transform

block and 8x8 Chroma transform block, respectively. The model selection is dependent on the position

in one block.

7.7.1.4 CS for transCoefficient

In an NxN transform block, each of the first M coefficients in forward scan order has its own set of 6 context models. The remaining coefficients share another set of 6 context models. The values of M and N are given in Table 20.

Within one coefficient, each of the first five bins uses its own context model. The remaining bins share one context model.

Table 20 Context models for different transform block.

Transform block              N   M    Model numbers
4x4 luma transform block     4   8    8x6 + 6 = 54
8x8 luma transform block     8   32   32x6 + 6 = 198
8x8 chroma transform block   8   32   32x6 + 6 = 198
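The context selection for transCoefficient can be sketched in C as an index computation. The linear layout of the index (coefficient group times 6 plus bin position) is an assumption made for illustration; the text only fixes how many models exist and which coefficients and bins share them.

```c
#include <assert.h>

typedef enum { TB_LUMA_4X4, TB_LUMA_8X8, TB_CHROMA_8X8 } TransformBlockType;

/* M of Table 20: number of leading coefficients with their own model set. */
static int first_m(TransformBlockType type)
{
    return (type == TB_LUMA_4X4) ? 8 : 32;
}

/* Total number of context models: M sets of 6 plus one shared set of 6. */
static int coeff_context_count(TransformBlockType type)
{
    return first_m(type) * 6 + 6;
}

/* Context index for bin `bin_idx` of the coefficient at scan position `scan_pos`. */
static int coeff_context_index(TransformBlockType type, int scan_pos, int bin_idx)
{
    assert(scan_pos >= 0 && bin_idx >= 0);
    int m = first_m(type);
    int group = (scan_pos < m) ? scan_pos : m;   /* positions >= M share a set */
    int bin = (bin_idx < 5) ? bin_idx : 5;       /* bins after the fifth share a model */
    return group * 6 + bin;
}
```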

7.7.2 Initialization

All the context models are initialized with equal probability.


7.8 Encoder configurations

7.8.1 Constraint set 1 configuration

To satisfy constraint set 1, the structural delay of processing units is restricted to be no larger than an 8-picture "group of pictures" (GOP), and random access intervals are restricted to 1.1 seconds or less. The encoder is configured as follows:

IBBP coding structure

Random access intervals restricted to 1.1 seconds or less

Fixed QP assignment: QP for I, QP+2 for P, QP+5 for B

1 forward reference picture & 1 backward reference picture

RD Optimization enabled

Fast motion estimation (UMHexagon Search)

RDOQ enabled

7.8.2 Constraint set 2 configuration

For satisfying constraint set 2, no picture reordering is allowed between decoder processing and

output, with bit rate fluctuation characteristics and no multi-pass encoding. The encoder is configured

as follows:

IPPP coding structure

Fixed QP assignment: QP for I, QP+2 for P

1 forward reference picture

RD Optimization enabled

Fast motion estimation (UMHexagon Search)

RDOQ enabled


Annex A VLC coding table

The arithmetic coding probability distribution table in IVC is the same as that in JPEG Annex D (ISO/IEC 10918-1). The equal probability estimation distribution table is shown in Table [A.1] and the standard probability estimation state machine in Table [A.2].

Table [A.1]: eq_prob_fsm probability estimation distribution table

ID Lps_interval next_state_lps next_state_mps do_switch_mps

0 0x5555 0 0 0

Table [A.2]: standard_prob_fsm probability estimation distribution table

ID Lps_interval next_state_lps next_state_mps do_switch_mps

0 0x5a1d 1 1 1

1 0x2586 14 2 0

2 0x1114 16 3 0

3 0x080b 18 4 0

4 0x03d8 20 5 0

5 0x01da 23 6 0

6 0x00e5 25 7 0

7 0x006f 28 8 0

8 0x0036 30 9 0

9 0x001a 33 10 0

10 0x000d 35 11 0

11 0x0006 9 12 0

12 0x0003 10 13 0

13 0x0001 12 13 0

14 0x5a7f 15 15 1

15 0x3f25 36 16 0

16 0x2cf2 38 17 0

17 0x207c 39 18 0

18 0x17b9 40 19 0

19 0x1182 42 20 0

20 0x0cef 43 21 0

21 0x09a1 45 22 0

22 0x072f 46 23 0


23 0x055c 48 24 0

24 0x0406 49 25 0

25 0x0303 51 26 0

26 0x0240 52 27 0

27 0x01b1 54 28 0

28 0x0144 56 29 0

29 0x00f5 57 30 0

30 0x00b7 59 31 0

31 0x008a 60 32 0

32 0x0068 62 33 0

33 0x004e 63 34 0

34 0x003b 32 35 0

35 0x002c 33 9 0

36 0x5ae1 37 37 1

37 0x484c 64 38 0

38 0x3a0d 65 39 0

39 0x2ef1 67 40 0

40 0x261f 68 41 0

41 0x1f33 69 42 0

42 0x19a8 70 43 0

43 0x1518 72 44 0

44 0x1177 73 45 0

45 0x0e74 74 46 0

46 0x0bfb 75 47 0

47 0x09f8 77 48 0

48 0x0861 78 49 0

49 0x0706 79 50 0

50 0x05cd 48 51 0

51 0x04de 50 52 0

52 0x040f 50 53 0

53 0x0363 51 54 0

54 0x02d4 52 55 0


55 0x025c 53 56 0

56 0x01f8 54 57 0

57 0x01a4 55 58 0

58 0x0160 56 59 0

59 0x0125 57 60 0

60 0x00f6 58 61 0

61 0x00cb 59 62 0

62 0x00ab 61 63 0

63 0x008f 61 32 0

64 0x5b12 65 65 1

65 0x4d04 80 66 0

66 0x412c 81 67 0

67 0x37d8 82 68 0

68 0x2fe8 83 69 0

69 0x293c 84 70 0

70 0x2379 86 71 0

71 0x1edf 87 72 0

72 0x1aa9 87 73 0

73 0x174e 72 74 0

74 0x1424 72 75 0

75 0x119c 74 76 0

76 0x0f6b 74 77 0

77 0x0d51 75 78 0

78 0x0bb6 77 79 0

79 0x0a40 77 48 0

80 0x5832 80 81 1

81 0x4d1c 86 82 0

82 0x438e 89 83 0

83 0x3bdd 90 84 0

84 0x34ee 91 85 0

85 0x2eae 92 86 0

86 0x299a 93 87 0


87 0x2516 86 71 0

88 0x5570 88 89 1

89 0x4ca9 95 90 0

90 0x44d9 96 91 0

91 0x3e22 97 92 0

92 0x3824 99 93 0

93 0x32b4 99 94 0

94 0x2e17 93 86 0

95 0x56a8 95 96 1

96 0x4f46 101 97 0

97 0x47e5 102 98 0

98 0x41cf 103 99 0

99 0x3c3d 104 100 0

100 0x375e 99 93 0

101 0x5231 105 102 0

102 0x4c0f 106 103 0

103 0x4639 107 104 0

104 0x415e 103 99 0

105 0x5627 105 106 1

106 0x50e7 108 107 0

107 0x4b85 109 103 0

108 0x5597 110 109 0

109 0x504f 111 107 0

110 0x5a10 110 111 1

111 0x5522 112 109 0

112 0x59eb 112 111 1
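The way a context walks through Table [A.2] can be sketched in C. The sketch assumes the usual QM-coder convention of JPEG Annex D: after a less probable symbol (LPS) the state moves to next_state_lps and the MPS sense is flipped when do_switch_mps is 1, while after a more probable symbol (MPS) the state moves to next_state_mps only when the coder renormalizes. Only rows 0 to 2 of the table are reproduced, and the type and function names are illustrative.

```c
#include <assert.h>
#include <stdint.h>

/* One row of Table [A.2]. */
typedef struct {
    uint16_t lps_interval;
    uint8_t  next_state_lps;
    uint8_t  next_state_mps;
    uint8_t  do_switch_mps;
} ProbState;

/* Per-context adaptive state. */
typedef struct {
    uint8_t state;  /* row index (ID) in the table */
    uint8_t mps;    /* current more probable symbol, 0 or 1 */
} Context;

/* Excerpt only: rows 0 to 2 of Table [A.2]. A real coder needs all 113 rows. */
static const ProbState standard_prob_fsm_excerpt[3] = {
    { 0x5a1d,  1, 1, 1 },
    { 0x2586, 14, 2, 0 },
    { 0x1114, 16, 3, 0 },
};

/* Update after coding an LPS. */
static void update_after_lps(Context *ctx, const ProbState *fsm)
{
    assert(ctx && fsm);
    const ProbState *s = &fsm[ctx->state];
    if (s->do_switch_mps)
        ctx->mps ^= 1;
    ctx->state = s->next_state_lps;
}

/* Update after coding an MPS that triggered renormalization. */
static void update_after_mps_renorm(Context *ctx, const ProbState *fsm)
{
    assert(ctx && fsm);
    ctx->state = fsm[ctx->state].next_state_mps;
}
```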


Annex B Profiles and levels

Profiles and levels provide a means of defining subsets of the syntax and semantics of this

Specification and thereby the decoder capabilities required to decode a particular bitstream. A

profile is a defined sub-set of syntax, semantics and algorithm that is defined by this Specification.

A decoder conforming to a profile should fully support the sub-set defined by that profile. A level is a defined set of constraints imposed on syntax elements and syntactic element parameters. Conformance tests will be carried out against defined profiles at defined levels. For a given profile, different levels imply different requirements on decoding capability and memory capacity.

In this clause the constrained parts of the defined profiles and levels are described. All

syntactic elements and parameter values which are not explicitly constrained may take any of the

possible values that are allowed by this Specification. In general, a decoder shall be deemed to

be conformant to a given profile at a given level if it is able to properly decode all allowed values

of all syntactic elements as specified by that profile at that level. One exception to this rule

exists in the case of a Simple profile Main level decoder, which must also be able to decode Main

profile, Low level bitstreams. A bitstream shall be deemed to be conformant if it does not exceed

the allowed range of allowed values and does not include disallowed syntactic elements.

profile_id and level_id specify the profile and level in the bitstream.

B.1 Profile

Profile defined in this part is shown in table [B.1].

Table [B.1] profile

profile_id profile

0x00 forbidden

0x20 ???baseline???

others reserved

For a given profile, different levels support different sub-sets of the syntax.

A bitstream in the option-1 baseline profile should meet the following requirements:

1) profile_id should be 0x20.

2) chroma_format should be '01' or '10'.

3) The level constraints provided in 9.1.3 apply.

IVC baseline supports levels 4.0 and 4.2.

B.2 Level

Level defined in this part is shown in table [B.2]


Table [B.2] Level

level_id   level
0x00       forbidden
0x10       2.0
0x20       4.0
0x22       4.2
0x40       6.0
0x42       6.2
others     reserved

B.3 Level constraints independent of profiles

For all the profiles, the maximum bits constraints for one coded macroblock is shown in table

[B.3].

Table [B.3] maximum bits constraints for one coded macroblock

picture format   maximum bits
4:2:0            128 + 256×8×1.5 = 3200
4:2:2            128 + 256×8×2 = 4224

Table [B.4], [B.5] and [B.6] give other constraints.

Table [B.4] parameter constraints in level

Parameter                                                                        Level 2.0
maximum samples in a row                                                         352
maximum rows in a frame                                                          288
maximum frames per second                                                        30
luma sample rate                                                                 2,534,400
maximum bit rates (bit/s)                                                        1,000,000
BBV buffer (bits)                                                                122,880
maximum number of macroblock per frame                                           396
maximum number of macroblock per second                                          11,880
maximum vertical motion vector confines in frame coding (luma sample numbers)    [-128, +127.75]
maximum vertical motion vector confines in field coding (luma sample numbers)    -
maximum horizontal motion vector confines (luma sample numbers)                  [-2048, +2047.75]
picture format                                                                   4:2:0

Table [B.5] parameter constraints in level

Parameter                                                                        Level 4.0           Level 4.2
maximum samples in a row                                                         720                 720
maximum rows in a frame                                                          576                 576
maximum frames per second                                                        30                  30
luma sample rate                                                                 10,368,000          10,368,000
maximum bit rates (bit/s)                                                        10,000,000          15,000,000
BBV buffer (bits)                                                                1,228,800           1,851,392
maximum number of macroblock per frame                                           1,620               1,620
maximum number of macroblock per second                                          40,500              40,500
maximum vertical motion vector confines in frame coding (luma sample numbers)    [-256, +255.75]     [-256, +255.75]
maximum vertical motion vector confines in field coding (luma sample numbers)    [-128, +127.75]     [-128, +127.75]
maximum horizontal motion vector confines (luma sample numbers)                  [-2048, +2047.75]   [-2048, +2047.75]
picture format                                                                   4:2:0               4:2:0 or 4:2:2

Table [B.6] parameter constraints in level

Parameter                                                                        Level 6.0           Level 6.2
maximum samples in a row                                                         1,920               1,920
maximum rows in a frame                                                          1,152               1,152
maximum frames per second                                                        60                  60
luma sample rate                                                                 62,668,800          62,668,800
maximum bit rates (bit/s)                                                        20,000,000          30,000,000
BBV buffer (bits)                                                                2,457,600           3,686,400
maximum number of macroblock per frame                                           8,160               8,160
maximum number of macroblock per second                                          244,800             244,800
maximum vertical motion vector confines in frame coding (luma sample numbers)    [-512, +511.75]     [-512, +511.75]
maximum vertical motion vector confines in field coding (luma sample numbers)    [-256, +255.75]     [-256, +255.75]
maximum horizontal motion vector confines (luma sample numbers)                  [-2048, +2047.75]   [-2048, +2047.75]
picture format                                                                   4:2:0               4:2:0 or 4:2:2

Note: Syntactic elements relevant to Tables [B.4], [B.5] and [B.6] are horizontal_size, vertical_size, frame_rate_code and chroma_format.