8/3/2019 Spatial-Domain Transcoder with Reused Motion Vectors in H.264/AVC
1/12
Spatial-Domain Transcoder with Reused Motion Vectors in H.264/AVC
Class Report, (ICE815) Special Topics on Image Engineering, Fall 2006
Wonsang You
KAIST ICC
Munjiro, Yuseong-gu, Daejeon, 305732, Korea
1. Introduction
Transcoding architectures for compressed videos can be classified into the spatial-domain transcoding architecture (SDTA), the frequency-domain transcoding architecture (FDTA), and the hybrid-domain transcoding architecture (HDTA). A spatial-domain transcoding architecture has the advantage of more flexible transcoding: since most parts of the encoder module and the decoder module are separated from each other, this transcoder can change the quantization step-size, the picture resolution, and so on. However, this type of transcoder also suffers from high computational complexity.
To reduce this computational complexity, information produced by the decoder module, such as motion vectors, can be reused by the encoder module. Since motion estimation is the most time-consuming procedure, reusing its results is very effective at reducing the transcoding time.
This report explains the software implementation of an H.264/AVC spatial-domain transcoder with motion vector reuse. I introduce the spatial-domain transcoding architecture (SDTA) of the H.264/AVC transcoder and the methodology of its software implementation. It is shown that this transcoder can change the quantization parameter faster than the basic form of spatial-domain transcoding.
2. The Structure of the Transcoder
The spatial-domain transcoding architecture with motion vector reuse is the form of transcoding in which the decoder module shares its motion information with the encoder module. Since the encoder module receives all motion information from the decoder module, it does not need to perform the motion estimation procedure.
Fig. 1(b) shows a spatial-domain transcoding architecture that reuses the motion information. Unlike the basic form of spatial-domain transcoding, which performs motion estimation as shown in Fig. 1(a), it does not contain a motion estimation procedure. Instead, it reuses the motion information among the data decoded by its decoder module to perform the motion compensation and the variable length coding in the encoder module.
The motion compensation includes the prediction of motion vectors as well as the reconstruction of each picture from the motion information and reference pictures. Motion vectors are predicted to reduce the bit-rate of the encoded bitstream; that is, only the difference between the original motion vector and the predicted motion vector is encoded by the entropy coding.
Note that the motion information is encoded by the variable length coder. The three components of the motion information (motion vector, reference index, and macroblock type) are encoded by different types of variable length code. For example, the macroblock types and the reference indices are encoded with unsigned exponential Golomb coding, while the motion vectors are encoded with signed exponential Golomb coding.
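As an illustration, the two exponential Golomb mappings can be sketched in C as follows. This is a minimal version that produces the codeword as a string of '0'/'1' characters for clarity; the JM software writes real bits, and the function names here are only illustrative.

```c
#include <assert.h>
#include <string.h>

/* Unsigned Exp-Golomb ue(v): emit floor(log2(v+1)) leading zeros,
   then the binary representation of v+1, as a '0'/'1' string. */
static void ue_encode(unsigned v, char *out)
{
    unsigned code = v + 1;
    int len = 0;                              /* floor(log2(code)) */
    while ((1u << (len + 1)) <= code) len++;
    int pos = 0;
    for (int i = 0; i < len; i++) out[pos++] = '0';
    for (int i = len; i >= 0; i--)
        out[pos++] = ((code >> i) & 1) ? '1' : '0';
    out[pos] = '\0';
}

/* Signed Exp-Golomb se(v): map v > 0 to 2v-1 and v <= 0 to -2v
   (positive values first), then reuse ue(v). */
static void se_encode(int v, char *out)
{
    unsigned u = (v > 0) ? 2u * (unsigned)v - 1 : 2u * (unsigned)(-v);
    ue_encode(u, out);
}
```

For example, ue(0) produces "1", ue(3) produces "00100", and a motion vector difference of -1 maps to the codeword "011".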
As shown in Fig. 1(b), the spatial-domain transcoding architecture with motion vector reuse can include optional functions for adjusting the spatial or temporal resolution. These consist of two modules: the spatial/temporal resolution reduction (STR) module and the MV composition and refinement (MVCR) module. STR allows the input bitstream to be transcoded at a reduced resolution; MVCR adjusts the motion vectors according to the transcoded resolution. In this report, the transcoder is designed without STR and MVCR, since our purpose is to change the quantization parameter rather than the resolution.
Fig. 1. The spatial-domain transcoding architectures: (a) with the basic form and (b) with motion vector reused
Like the basic form of spatial-domain transcoder, the spatial-domain transcoder with motion vector reuse requires the input bitstream to be fully decoded by the decoder module inside the transcoder; the raw data generated by the decoder module is then used again for motion compensation, along with the motion information, in the cascaded encoder module. While the basic transcoder delivers only the decoded raw data to the encoder module, the new transcoder with motion vector reuse delivers not only the decoded raw data but also the motion information to the encoder module. Nevertheless, it reduces the computation time effectively, since motion estimation, the most time-consuming procedure, is no longer necessary for motion compensation.
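The cascade described above can be summarized in a short C sketch. The stub functions below only stand in for the decoding, buffering, and encoding stages described in this report; they are not the actual JM function names.

```c
#include <assert.h>

/* Counters standing in for the real decoding and encoding work. */
static int frames_decoded, frames_encoded;

static void decode_frame_stub(void)       { frames_decoded++; } /* VLD, MC, deblocking */
static void store_into_buffers_stub(void) { /* raw data and motion info */ }
static void encode_frame_stub(int new_qp) { (void)new_qp; frames_encoded++; }

/* One pass of the cascaded transcoder: every frame is fully decoded,
   its motion information is buffered, and it is re-encoded with the
   new quantization parameter without any motion estimation. */
static void transcode(int num_frames, int new_qp)
{
    for (int n = 0; n < num_frames; n++) {
        decode_frame_stub();
        store_into_buffers_stub();
        encode_frame_stub(new_qp);
    }
}
```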
3. Background
A. The Structure of the Transcoder
In this section, a software implementation of H.264/AVC transcoder, which is made using the reference software JM
10.2, is introduced in detail. The brief procedure of this software is shown in Fig. 2.
Fig. 2. The brief procedure of H.264/AVC transcoder
The configuration file and the encoded input file are opened and read by two functions: Configure() and OpenBitstreamFile(). After the input data is read, buffers for the motion information and for the raw data decoded by the decoder module are allocated as four variables, YUVb, dec_ref, dec_mv, and B8mode, by the function allocate_buffer_for_encoder(). YUVb is the buffer for the decoded raw data. dec_ref and dec_mv are the buffers for reference indices and motion vectors. B8mode is the buffer for the 8x8 sub-block modes in a macroblock. Next, the encoder module is initialized by the function initialize_encoder_main().
Then, the main transcoding procedure is performed for every frame by the function decode_one_frame(), which contains the procedure of decoding all slices. This function ends with the function exit_picture(), which contains the deblocking function DeblockPicture(). The deblocking function includes the buffer-writing function writeIntoFile(), which writes the decoded raw data and the motion information into temporary buffers. In particular, the motion information is stored in a buffer by the function store_motion_into_buffer(). This buffer-writing function also calls the frame-encoding function enc_encode_single_frame(), inside which the function encode_sequence() resides. This relationship between the functions is shown in Fig. 2.
B. Variable Length Coding
In this section, it is shown how variable length coding is performed in the JM reference software. Since this mechanism determines which variables carry the motion information, it is necessary to understand the structure of the variable length coding and the motion compensation.
[Fig. 2 depicts the transcoding flow: open the configuration file (Configure()) and the encoded input file (OpenBitstreamFile()), allocate the buffers (allocate_buffer_for_encoder()), initialize the transcoder (initialize_encoder_main()), then, for each frame, decode it (decode_one_frame()), write the data into buffers (writeIntoFile()), and encode it (enc_encode_single_frame()) until all frames are processed. The call hierarchy is main(), decode_one_frame(), exit_picture(), DeblockPicture(), writeIntoFile(), enc_encode_single_frame(), and encode_sequence().]
Variable length coding exists in both the decoder module and the encoder module. It should be noted that the variable length coding procedure is performed in units of macroblocks. In the decoder module, variable length decoding (VLD) is performed by the function read_one_macroblock(), which reads the macroblock information, such as macroblock type, intra prediction mode, and motion vectors, from the encoded bitstream. In the encoder module, variable length encoding (VLE) is performed by the function writeMBLayer(), which writes the syntax elements of a macroblock into the bitstream. These syntax elements include the macroblock type, intra prediction mode, coded block pattern, quantization parameter, motion information, and residual block data.
The summarized algorithm of the VLD function read_one_macroblock() is shown in Fig. 3. In this function, the macroblock type is stored in the variable currMB->mb_type after it is read from the bitstream. Then, the motion vector mode and the prediction direction are extracted from this macroblock type by the function interpret_mb_mode_I() or interpret_mb_mode_P(). The motion vector mode represents how an inter-coded macroblock is partitioned into sub-blocks such as 8x4, 4x8, and so on; it is assigned to the variable currMB->b8mode. Likewise, the prediction direction is assigned to the variable currMB->b8pdir.
Meanwhile, the motion information is read by the function readMotionInfoFromNAL(), which includes the prediction module for motion vectors. Fig. 3 shows the abbreviated algorithm of this function, which reads the motion vectors and the reference indices. While the reference indices are assigned to the variable dec_picture->ref_idx, the motion vectors are assigned to the variable dec_picture->mv. Actually, the motion vector information parsed from the bitstream is intrinsically the motion vector difference between the original motion vector and the predicted motion vector. Since the decoder module also has the prediction function SetMotionVectorPredictor(), it can generate the predicted motion vector by itself; this means that the bit-rate can be reduced by encoding not the motion vectors themselves but the motion vector differences. The original motion vector is reconstructed from the predicted motion vector and the motion vector difference. This calculation procedure is shown in Fig. 3: by adding the motion vector difference curr_mvd and the predicted motion vector pmv[k], the reconstructed motion vector vec is generated and stored in the variable dec_picture->mv.
// read MB mode: read_one_macroblock()
dP->readSyntaxElement(&currSE,img,inp,dP);
currMB->mb_type = currSE.value1;
if (img->type==P_SLICE)
interpret_mb_mode_P(img);
else if (img->type==I_SLICE)
interpret_mb_mode_I(img);
// read the reference indices: readMotionInfoFromNAL()
readSyntaxElement_FLC(&currSE, dP->bitstream);
refframe = 1 - currSE.value1;
dec_picture->ref_idx[LIST_0][img->block_y + j][img->block_x + i] = refframe;
// read the motion vectors: readMotionInfoFromNAL()
SetMotionVectorPredictor ();
dP->readSyntaxElement(&currSE,img,inp,dP);
curr_mvd = currSE.value1;
vec = curr_mvd + pmv[k];
dec_picture->mv[LIST_0][j4+jj][i4+ii][k] = vec;
Fig. 3. The brief algorithm of reading the motion information in the decoder module
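The reconstruction step of Fig. 3 can be illustrated with a small C sketch. H.264/AVC normally predicts each motion vector component as the median of the corresponding components of the left, upper, and upper-right neighboring vectors; the helper below is a simplification that ignores the special cases for unavailable neighbors and for 16x8/8x16 partitions.

```c
#include <assert.h>

/* Median of three values, as used by SetMotionVectorPredictor(). */
static int median3(int a, int b, int c)
{
    int mx = a > b ? a : b;  mx = mx > c ? mx : c;
    int mn = a < b ? a : b;  mn = mn < c ? mn : c;
    return a + b + c - mx - mn;
}

/* Reconstruct one motion vector component from the transmitted
   difference curr_mvd and the three neighboring components. */
static int reconstruct_mv(int curr_mvd, int mv_left, int mv_up, int mv_upright)
{
    int pmv = median3(mv_left, mv_up, mv_upright);
    return curr_mvd + pmv;   /* vec = curr_mvd + pmv[k] in Fig. 3 */
}
```

For instance, with neighboring components 1, 5, and 3 the predictor is 3, so a transmitted difference of 2 reconstructs the component 5.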
Recall that inter-coded macroblocks can have various types of sub-blocks. Nevertheless, the motion vectors are assigned to all 4x4 blocks in a macroblock in order to simplify the decoding procedure; this concept is shown in Fig. 5. We may call this process the uniformization of motion vectors.
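The uniformization can be sketched as follows. The helper below (an illustrative helper, not a JM function) replicates one motion vector over every 4x4 block covered by a partition; for QCIF the per-picture 4x4 grid is 44x36 blocks.

```c
#include <assert.h>

#define BLK_X 44   /* horizontal 4x4 blocks in a QCIF picture */
#define BLK_Y 36   /* vertical 4x4 blocks in a QCIF picture   */

/* Per-4x4-block motion vector array, (x, y) components. */
static short mv[BLK_Y][BLK_X][2];

/* Copy one motion vector (mvx, mvy) into every 4x4 block of a
   partition.  (bx, by) is the top-left 4x4-block position of the
   partition; (w4, h4) is its size in 4x4-block units (e.g. a 16x8
   partition has w4 = 4, h4 = 2). */
static void uniformize_mv(int bx, int by, int w4, int h4,
                          short mvx, short mvy)
{
    for (int j = by; j < by + h4; j++)
        for (int i = bx; i < bx + w4; i++) {
            mv[j][i][0] = mvx;
            mv[j][i][1] = mvy;
        }
}
```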
In a similar way to the variable length decoding, the variable length encoding (VLE) is performed by the function writeMBLayer(), which is included in the function write_one_macroblock(). This function includes two modules: the motion information encoding module and the macroblock type encoding module.
The motion information encoding is performed by the function writeMotionInfo2NAL(), which writes the motion information into the bitstream. It consists of two functions: the reference encoding function writeReferenceFrame() and the motion vector encoding function writeMotionVector8x8(). The function writeReferenceFrame() performs the variable length encoding of the reference indices, which are stored in the variable enc_picture->ref_idx. If the reference frame is the first preceding frame, the reference frame number is not encoded. If it is the second preceding frame, it is encoded as just one bit. If it is earlier than the second preceding frame, the reference frame number itself is encoded with the unsigned exponential Golomb encoding method.
The function writeMotionVector8x8() performs the variable length encoding of the motion vectors, which are stored in the variable img->all_mv. This variable is accumulated from enc_picture->mv for every macroblock. Motion vectors are encoded not as the original motion vectors but as the differences between the original and predicted motion vectors; Fig. 4 shows this relationship. In Fig. 4, curr_mvd represents the motion vector difference, while all_mv and pred_mv indicate the original motion vector and the predicted motion vector. This motion vector difference is encoded with the signed exponential Golomb encoding method.
Meanwhile, the macroblock type encoding is performed by the function writeMBLayer(). If the current macroblock is not included in an I-slice, the macroblock type is encoded with a run-length coding scheme. That is, the encoder counts the number of skipped macroblocks between the previous non-skipped macroblock and the current non-skipped macroblock; this number is stored in the variable img->cod_counter. Then, this number img->cod_counter and the macroblock type currMB->mb_type are encoded with the unsigned exponential Golomb encoding method.
// write the macroblock type: writeMBLayer()
currSE->value1 = img->cod_counter;
dataPart->writeSyntaxElement( currSE, dataPart);
currSE->value1 = MBType2Value (currMB->mb_type);
currSE->value1--;
dataPart->writeSyntaxElement(currSE, dataPart);
// write the reference indices: writeReferenceFrame()
ref = enc_picture->ref_idx[LIST_0][j][i];
currSE->value1 = ref;
dataPart->writeSyntaxElement (currSE, dataPart);
// write the motion vectors: writeMotionInfo2NAL()
curr_mvd = all_mv[j][i][list_idx][refindex][mv_mode][k]
- pred_mv[j][i][list_idx][refindex][mv_mode][k];
currSE->value1 = curr_mvd;
dataPart->writeSyntaxElement (currSE, dataPart);
Fig. 4. The brief algorithm of writing the motion information in the encoder module
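The run-length part of the scheme can be sketched as follows; the MB_SKIP marker and the plain array of macroblock types are illustrative simplifications of the JM data structures.

```c
#include <assert.h>

#define MB_SKIP 0   /* hypothetical marker for a skipped macroblock */

/* Starting at index `start`, count the run of skipped macroblocks that
   precedes the next coded macroblock (the value kept in cod_counter).
   Returns the index of that coded macroblock, or n if none remains. */
static int count_skip_run(const int *mb_type, int n, int start, int *run)
{
    int i = start;
    *run = 0;
    while (i < n && mb_type[i] == MB_SKIP) { (*run)++; i++; }
    return i;
}
```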
Fig. 5. The uniformization of motion vectors for the coding process
C. Motion Compensation
In the encoder module, the motion information generated by the variable length decoding function in the decoder module is provided to the motion compensation function as well as to the variable length encoder. Motion compensation is performed by the function LumaResidualCoding(), which obtains the inter-predicted frame and calculates the displaced frame difference (DFD). Fig. 8 shows the overall flow diagram of this function.
First, the function SetModesAndRefframe() sets the mode parameters and the reference frames as shown in Fig. 6. Here, the motion vector mode currMB->b8mode is extracted from the macroblock mode by the function SetModesAndRefframeForBlocks() before the residual coding is executed. Then, the residual data of the 8x8 sub-macroblocks are encoded by the function LumaResidualCoding8x8(), which calls the function LumaPrediction4x4() for 4x4 blocks. This function requires the motion vectors img->all_mv as well as the mode parameters and the reference frames. The final predicted frame is stored in the variable img->mpr[y][x]. Last, the displaced frame difference is transformed and quantized by the function dct_luma().
From the discussion so far, the motion information variables used in the decoder module are the macroblock type currMB->mb_type, the reference indices dec_picture->ref_idx, the motion vector mode currMB->b8mode, and the motion vectors dec_picture->mv. In the encoder module, these correspond to the variables e_currMB->mb_type, enc_picture->ref_idx, e_currMB->b8mode, and enc_picture->mv. Accordingly, we need mediating buffers that deliver these data from the decoder module to the encoder module. These buffers are shown in Fig. 7.
*fw_ref = enc_picture->ref_idx[LIST_0][img->block_y+j][img->block_x+i];
*bw_ref = 0;
*fw_mode = currMB->b8mode[b8];
*bw_mode = 0;
Fig. 6. The setting of the mode parameter and the reference frame
Variable Name Decoder Module Buffer Encoder Module
Macroblock type currMB->mb_type MBmode e_currMB->mb_type
Motion vector mode currMB->b8mode B8mode e_currMB->b8mode
Predicting direction currMB->b8pdir B8pdir e_currMB->b8pdir
Reference index dec_picture->ref_idx dec_ref enc_picture->ref_idx
Motion vector dec_picture->mv dec_mv enc_picture->mv
Fig. 7. The buffers for delivering motion information from the decoder module to the encoder module
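For a QCIF input, the mediating buffers of Fig. 7 can be declared as in the following sketch. The names follow the table; the element types and exact dimensions are assumptions based on the sizes discussed in this report (11x9 macroblocks and 44x36 4x4 blocks per picture).

```c
#include <assert.h>

#define MB_NUM 99   /* (176/16) x (144/16) = 11 x 9 macroblocks */
#define BLK_X  44   /* 176/4 horizontal 4x4 blocks */
#define BLK_Y  36   /* 144/4 vertical 4x4 blocks   */

static int   MBmode[MB_NUM];          /* macroblock type per MB        */
static int   B8mode[MB_NUM][4];       /* motion vector mode per 8x8    */
static int   B8pdir[MB_NUM][4];       /* predicting direction per 8x8  */
static int   dec_ref[BLK_Y][BLK_X];   /* reference index per 4x4 block */
static short dec_mv[BLK_Y][BLK_X][2]; /* (x, y) motion vector per 4x4  */
```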
[Fig. 8 flow diagram: SetModesAndRefframe() sets the modes and reference frames; then, for each 8x8 block, LumaResidualCoding8x8() performs the 4x4 coding, in which LumaPrediction4x4() produces the 4x4 luma prediction, the displaced frame differences (DFD) are computed, and dct_luma() performs the DCT, quantization, inverse quantization, IDCT, and reconstruction, until all 4x4 and 8x8 blocks are processed.]
Fig. 8. The flow diagram of the function LumaResidualCoding()
4. Software Implementation
A. Getting the Motion Information from the Decoder Module
The motion information can be classified into five components: macroblock type, motion vector mode, prediction direction, reference indices, and motion vectors. Among them, the first three (macroblock type, motion vector mode, and prediction direction) can be obtained when each macroblock is decoded. The remaining components, the reference indices and motion vectors, are stored in the buffer when each picture is decoded.
As a result, the former components, which we call the macroblock information, are extracted by the function store_mbinfo_into_buffer(), which is inserted into the slice decoding function decode_one_slice() as shown in Fig. 9. In this figure, after one macroblock is decoded by the function decode_one_macroblock(), the macroblock information, namely the macroblock type currMB->mb_type, the motion vector mode currMB->b8mode, and the prediction direction currMB->b8pdir, is stored in buffers. It should be noted that a macroblock has four motion vector modes and four prediction directions; that is, each 8x8 sub-block inside a macroblock has its own motion vector mode and prediction direction. The motion vector mode and prediction direction are decided in units of 8x8 sub-blocks.
On the other hand, the latter components, which we call the motion vector information, are extracted by the function store_motion_into_buffer(), which is inserted into the buffer-writing function writeIntoFile() as shown in Fig. 9. As noted before, the function writeIntoFile() not only stores the decoded raw data and the motion information into buffers but also performs the encoding procedure with different parameters through the function enc_encode_single_frame(). After decoding a picture, the function store_motion_into_buffer() stores the motion vector information, namely the motion vectors dec_picture->mv and the reference indices dec_picture->ref_idx, into buffers as shown in Fig. 9. In the case of a QCIF picture, the horizontal and vertical sizes in 4x4-block units are 44 and 36; thus, the number of motion vectors in a picture is 1584 (44x36).
void decode_one_slice(struct img_par *img, struct inp_par *inp)
{
  while (end_of_slice == FALSE)   // loop over macroblocks
  {
    start_macroblock(img, inp, img->current_mb_nr);
    read_flag = read_one_macroblock(img, inp);
    decode_one_macroblock(img, inp);
    store_mbinfo_into_buffer(img);
    exit_slice();
  }
}

void writeIntoFile(StorablePicture *p, struct img_par *img)
{
  store_motion_into_buffer();
  enc_encode_single_frame();
  frameNumberToWrite++;
}

void store_mbinfo_into_buffer(struct img_par *img)
{
  struct macroblock *currMB = &img->mb_data[img->current_mb_nr];
  for (int k = 0; k < 4; k++) {
    B8mode[img->current_mb_nr][k] = currMB->b8mode[k];
    B8pdir[img->current_mb_nr][k] = currMB->b8pdir[k];
  }
  MBmode[img->current_mb_nr] = currMB->mb_type;
}

void store_motion_into_buffer()
{
  // copy the per-4x4-block motion information of the decoded picture
  // (44x36 blocks for QCIF) into the mediating buffers
  for (int by = 0; by < 36; by++)
    for (int bx = 0; bx < 44; bx++) {
      dec_mv[by][bx][0] = dec_picture->mv[LIST_0][by][bx][0];
      dec_mv[by][bx][1] = dec_picture->mv[LIST_0][by][bx][1];
      dec_ref[by][bx]   = dec_picture->ref_idx[LIST_0][by][bx];
    }
}
Fig. 9. The algorithm of getting the motion information from the decoder module
B. Putting the Motion Information into the Encoder Module
After getting the motion information from the decoder module, the motion information is stored in temporary buffers.
The encoder module can gain these data from buffers in order to perform the variable length encoding and the motion
compensation without motion estimation. These data can be uploaded to the encoder module whenever each
macroblock is encoded through the function encode_one_macroblock() like Fig. 10. In this macroblock encoding
function, the inter-coding part should be removed to stop the motion estimation procedure. Notice that the cost of an
inter mode should be the minimum value if the current macroblock is a macroblock in P-slice or B-slice. The H.264/
AVC encoder compares all possible modes to select the mode with the minimum cost as the best mode. However, since
a macroblock in a P-slice or B-slice already has its own motion information from the decoder module, forcing this macroblock to be inter-coded has little effect on the bit-rate efficiency. Although a macroblock in a P-slice or B-slice is forced to be inter-coded, this yields lower computational complexity than performing the complex mode decision procedure. In this case, the best mode is simply chosen as the macroblock mode obtained from the decoder module.
void encode_one_macroblock()
{
  if (!intra)
  {
    // (removed part of inter-coding)
    init_motion_info(e_img);
    best_mode = currMB->mb_type;
    min_cost = 0;
  }
}

void init_motion_info(e_ImageParameters *e)
{
  struct e_macroblock *e_currMB = &e->mb_data[e->current_mb_nr];
  int bx, by, b8, x, y;
  int nr = e->current_mb_nr;
  int block_x = (nr % 11) * 4;
  int block_y = (nr / 11) * 4;

  e_currMB->mb_type = MBmode[nr];
  for (b8 = 0; b8 < 4; b8++) {
    e_currMB->b8mode[b8] = B8mode[nr][b8];
    e_currMB->b8pdir[b8] = B8pdir[nr][b8];
  }
  // copy the buffered motion vectors and reference indices of the
  // sixteen 4x4 blocks of this macroblock into the encoder structures
  for (by = 0; by < 4; by++)
    for (bx = 0; bx < 4; bx++) {
      y = block_y + by;
      x = block_x + bx;
      enc_picture->mv[LIST_0][y][x][0] = dec_mv[y][x][0];
      enc_picture->mv[LIST_0][y][x][1] = dec_mv[y][x][1];
      enc_picture->ref_idx[LIST_0][y][x] = dec_ref[y][x];
    }
}

Fig. 10. The algorithm of putting the motion information into the encoder module
Analyzing the macroblock encoding function encode_one_macroblock() helps to understand why the inter-coding part can be removed correctly by setting the cost to the minimum value. Fig. 11 shows the brief flow diagram of this function. According to this flow diagram, if the inter-coding part were simply removed, all macroblocks would be intra-coded, since the intra mode would be selected as the best mode. If, instead, we set the cost of the inter mode to the minimum value, the final mode of the corresponding macroblock is decided as the inter mode. The function SetModesAndRefframeForBlocks() sets the final mode and the reference information for a macroblock after the best mode is decided.
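The mode decision described above amounts to a minimum-cost search, which is why forcing the inter mode's cost to zero guarantees that it wins. A generic sketch (the function name and the plain cost array are illustrative, not JM code):

```c
#include <assert.h>

/* Return the index of the cheapest mode; when one mode's cost is
   forced to zero, it always wins the comparison (ties go to the
   first occurrence). */
static int choose_best_mode(const int *cost, int num_modes)
{
    int best = 0;
    for (int m = 1; m < num_modes; m++)
        if (cost[m] < cost[best]) best = m;
    return best;
}
```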
5. Experiments
To test the transcoder explained so far, I used part of the Foreman video in QCIF format, as shown in Fig. 12(a). When we decode this video encoded directly with the quantization parameters QPISlice and QPPSlice equal to 36, without transcoding, we get a video like Fig. 12(b). On the other hand, if we transcode this encoded video with the same quantization parameter, we get the output video shown in Fig. 12(c). This transcoding result is the same as the video encoded directly without transcoding. Fig. 12(d) shows the result of transcoding with the same quantization parameter when the motion information is reused in the encoder module. This picture is also the same as the transcoding result of Fig. 12(c).
[Fig. 11 flow diagram of encode_one_macroblock(): after initialization (init_enc_mb_params()), if the slice is not intra-only, inter prediction is performed; intra prediction is then performed with low or high complexity depending on the mode setting (RandomIntra()); the best mode is chosen, the modes and reference frames are set (SetModesAndRefframeForBlocks()), the coefficients and reconstruction are set for 8x8 blocks (SetCoeffAndReconstruction8x8()), residual coding is performed, and the macroblock parameters are stored (set_stored_macroblock_parameters()).]

Fig. 11. The brief flow chart of the function encode_one_macroblock()
Table 1. The characteristics of directly encoded videos and the transcoded videos

                 QP=12      QP=36      QP=36 (with ME)  QP=36 (without ME)
Bitstream size   21 KB      1.80 KB    1.77 KB          2.18 KB
SNR Y            48.91      31.60      31.71            31.03
SNR U            49.68      38.71      38.60            38.65
SNR V            50.66      38.89      39.14            38.96
Encoding time    3.468 sec  2.578 sec  2.547 sec        1.516 sec
ME time          0.980 sec  1.202 sec  0.967 sec        0.000 sec
Fig. 12. The comparison of the directly encoded video and the transcoded videos: (a) the original Foreman video, (b) a video encoded directly with quantization parameter 36, (c) a video transcoded with quantization parameter 36, and (d) a video transcoded with quantization parameter 36 with the motion information reused.
Some characteristics of the encoded bitstream in each case are shown in Table 1. The size of the bitstream transcoded with motion vector reuse is similar to that of the bitstream transcoded in the basic way, although there is a small difference. This difference seems to arise from forcing the macroblocks in P-slices or B-slices to be inter-coded. Actually, even though a macroblock is included in a P-slice or B-slice, it can be intra-coded; in this case, a transcoding error is generated.
The SNR performances are nearly the same; thus, the three results with the same quantization parameter have the same distortion performance and quality, both visually and statistically. On the other hand, the encoding time of the bitstream transcoded with motion vector reuse differs greatly from the others: it is about 40% shorter than that of the bitstream transcoded in the basic way (1.516 sec versus 2.547 sec). This is due to the removal of the very time-consuming motion estimation procedure. While the motion estimation time with motion vector reuse is zero, the other cases take nearly one second for motion estimation. This shows that the computational complexity of the transcoder with motion vector reuse is much lower than that of the basic transcoder.
6. Conclusion
Reducing the computational complexity is an important issue in designing a transcoder so that it can serve a wider variety of applications. As a kind of spatial-domain transcoding, the transcoder with motion vector reuse provides a powerful way to reduce the total transcoding time. For this purpose, the motion information is delivered from the decoder module to the encoder module through temporary buffers. This motion information can be classified into five components: macroblock type, motion vector mode, prediction direction, reference indices, and motion vectors. These data are updated whenever each macroblock is decoded and encoded anew. The experiments show that the spatial-domain transcoder with motion vector reuse has the same performance as the simple transcoder. Nevertheless, it has lower computational complexity than the simple transcoder. Future work is the development of a frequency-domain transcoder, which may be faster and more effective than spatial-domain transcoders.