8/3/2019 Spatial-Domain Transcoder with Reused Motion Vectors in H.264/AVC
1/12
Spatial-Domain Transcoder with Reused Motion Vectors in H.264/AVC
Class Report, (ICE815) Special Topics on Image Engineering, Fall 2006
Wonsang You
KAIST ICC
Munjiro, Yuseong-gu, Daejeon, 305732, Korea
1. Introduction
Transcoding architectures for compressed videos can be classified into the spatial-domain transcoding architecture (SDTA), the frequency-domain transcoding architecture (FDTA), and the hybrid-domain transcoding architecture (HDTA). A spatial-domain transcoding architecture has the advantage of more flexible transcoding: since most parts of the encoder module and the decoder module are separated from each other, this transcoder can change the quantization step-size, the picture resolution, and so on. However, this type of transcoder also suffers from high computational complexity.
To reduce this computational complexity, information produced by the decoder module, such as motion vectors, can be reused by the encoder module. Since motion estimation is the most time-consuming procedure, reusing its results is very effective at reducing the transcoding time.
This report explains the software implementation of an H.264/AVC spatial-domain transcoder with motion vector reuse. I introduce the spatial-domain transcoding architecture (SDTA) of the H.264/AVC transcoder and the methodology of its software implementation. It is shown that this transcoder can change the quantization parameter faster than the basic form of spatial-domain transcoding.
2. The Structure of the Transcoder
The spatial-domain transcoding architecture with motion vector reuse is the form of transcoding in which the decoder module shares its motion information with the encoder module. Since the encoder module receives all motion information from the decoder module, it does not need to perform the motion estimation procedure.
Fig. 1(b) shows a spatial-domain transcoding architecture that reuses the motion information. Unlike the basic form of spatial-domain transcoding, which performs motion estimation as shown in Fig. 1(a), it does not contain a motion estimation procedure. Instead, it reuses the motion information among the data decoded by its decoder module to perform the motion compensation and the variable length coding in the encoder module.
The motion compensation includes the prediction of motion vectors as well as the reconstruction of each picture from the motion information and reference pictures. Motion vectors are predicted to reduce the bit-rate of the encoded bitstream; that is, only the difference between the original motion vector and the predicted motion vector is encoded by the entropy coding.
Note that the motion information is encoded by the variable length coder. The three components of the motion information (motion vector, reference index, and macroblock type) are encoded by different types of variable length code. For example, the macroblock types and the reference indices are encoded with unsigned exponential Golomb coding, while the motion vectors are encoded with signed exponential Golomb coding.
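As an illustration, the two exponential Golomb mappings can be sketched in C as follows. This is a minimal version that produces the codeword as a string of '0'/'1' characters for clarity; the JM software writes real bits, and the function names here are only illustrative.

```c
#include <assert.h>
#include <string.h>

/* Unsigned Exp-Golomb ue(v): emit floor(log2(v+1)) leading zeros,
   then the binary representation of v+1, as a '0'/'1' string. */
static void ue_encode(unsigned v, char *out)
{
    unsigned code = v + 1;
    int len = 0;                              /* floor(log2(code)) */
    while ((1u << (len + 1)) <= code) len++;
    int pos = 0;
    for (int i = 0; i < len; i++) out[pos++] = '0';
    for (int i = len; i >= 0; i--)
        out[pos++] = ((code >> i) & 1) ? '1' : '0';
    out[pos] = '\0';
}

/* Signed Exp-Golomb se(v): map v > 0 to 2v-1 and v <= 0 to -2v
   (positive values first), then reuse ue(v). */
static void se_encode(int v, char *out)
{
    unsigned u = (v > 0) ? 2u * (unsigned)v - 1 : 2u * (unsigned)(-v);
    ue_encode(u, out);
}
```

For example, ue(0) produces "1", ue(3) produces "00100", and a motion vector difference of -1 maps to the codeword "011".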
As shown in Fig. 1(b), the spatial-domain transcoding architecture with motion vector reuse can include optional functions for adjusting the spatial or temporal resolution. These consist of two modules: the spatial/temporal resolution reduction (STR) module and the MV composition and refinement (MVCR) module. STR allows the input bitstream to be transcoded at a reduced resolution; MVCR adjusts the motion vectors according to the transcoded resolution. In this report, the transcoder is designed without STR and MVCR, since our purpose is to change the quantization parameter rather than the resolution.
Fig. 1. The spatial-domain transcoding architectures: (a) with the basic form and (b) with motion vector reused
Like the basic form of spatial-domain transcoder, the spatial-domain transcoder with motion vector reuse requires the input bitstream to be fully decoded by the decoder module inside the transcoder; the raw data generated by the decoder module is then used again for motion compensation, along with the motion information, in the cascaded encoder module. While the basic transcoder delivers only the decoded raw data to the encoder module, the new transcoder with motion vector reuse delivers not only the decoded raw data but also the motion information to the encoder module. Nevertheless, it reduces the computation time effectively, since motion estimation, the most time-consuming procedure, is no longer necessary for motion compensation.
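The cascade described above can be summarized in a short C sketch. The stub functions below only stand in for the decoding, buffering, and encoding stages described in this report; they are not the actual JM function names.

```c
#include <assert.h>

/* Counters standing in for the real decoding and encoding work. */
static int frames_decoded, frames_encoded;

static void decode_frame_stub(void)       { frames_decoded++; } /* VLD, MC, deblocking */
static void store_into_buffers_stub(void) { /* raw data and motion info */ }
static void encode_frame_stub(int new_qp) { (void)new_qp; frames_encoded++; }

/* One pass of the cascaded transcoder: every frame is fully decoded,
   its motion information is buffered, and it is re-encoded with the
   new quantization parameter without any motion estimation. */
static void transcode(int num_frames, int new_qp)
{
    for (int n = 0; n < num_frames; n++) {
        decode_frame_stub();
        store_into_buffers_stub();
        encode_frame_stub(new_qp);
    }
}
```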
3. Background
A. The Structure of the Transcoder
In this section, a software implementation of H.264/AVC transcoder, which is made using the reference software JM
10.2, is introduced in detail. The brief procedure of this software is shown in Fig. 2.
Fig. 2. The brief procedure of H.264/AVC transcoder
The configuration file and the encoded input file are opened and read by two functions: Configure() and OpenBitstreamFile(). After the input data is read, buffers for the motion information and for the raw data decoded by the decoder module are allocated as four variables, YUVb, dec_ref, dec_mv, and B8mode, by the function allocate_buffer_for_encoder(). YUVb is the buffer for the decoded raw data. dec_ref and dec_mv are the buffers for reference indices and motion vectors. B8mode is the buffer for the 8x8 sub-block modes in a macroblock. Next, the encoder module is initialized by the function initialize_encoder_main().
Then, the main transcoding procedure is performed for every frame by the function decode_one_frame(), which contains the procedure of decoding all slices. This function ends with the function exit_picture(), which contains the deblocking function DeblockPicture(). The deblocking function includes the buffer-writing function writeIntoFile(), which writes the decoded raw data and the motion information into temporary buffers. In particular, the motion information is stored in a buffer by the function store_motion_into_buffer(). This buffer-writing function also calls the frame-encoding function enc_encode_single_frame(), inside which the function encode_sequence() resides. This relationship between the functions is shown in Fig. 2.
B. Variable Length Coding
In this section, it is shown how variable length coding is performed in the JM reference software. Since this mechanism determines which variables carry the motion information, it is necessary to understand the structure of the variable length coding and the motion compensation.
[Fig. 2 depicts the transcoding flow: open the configuration file (Configure()) and the encoded input file (OpenBitstreamFile()), allocate the buffers (allocate_buffer_for_encoder()), initialize the transcoder (initialize_encoder_main()), then, for each frame, decode it (decode_one_frame()), write the data into buffers (writeIntoFile()), and encode it (enc_encode_single_frame()) until all frames are processed. The call hierarchy is main(), decode_one_frame(), exit_picture(), DeblockPicture(), writeIntoFile(), enc_encode_single_frame(), and encode_sequence().]
Variable length coding exists in both the decoder module and the encoder module. It should be noted that the variable length coding procedure is performed in units of macroblocks. In the decoder module, variable length decoding (VLD) is performed by the function read_one_macroblock(), which reads the macroblock information, such as macroblock type, intra prediction mode, and motion vectors, from the encoded bitstream. In the encoder module, variable length encoding (VLE) is performed by the function writeMBLayer(), which writes the syntax elements of a macroblock into the bitstream. These syntax elements include the macroblock type, intra prediction mode, coded block pattern, quantization parameter, motion information, and residual block data.
The summarized algorithm of the VLD function read_one_macroblock() is shown in Fig. 3. In this function, the macroblock type is stored in the variable currMB->mb_type after it is read from the bitstream. Then, the motion vector mode and the prediction direction are extracted from this macroblock type by the function interpret_mb_mode_I() or interpret_mb_mode_P(). The motion vector mode represents how an inter-coded macroblock is partitioned into sub-blocks such as 8x4, 4x8, and so on; it is assigned to the variable currMB->b8mode. Likewise, the prediction direction is assigned to the variable currMB->b8pdir.
Meanwhile, the motion information is read by the function readMotionInfoFromNAL(), which includes the prediction module for motion vectors. Fig. 3 shows the abbreviated algorithm of this function, which reads the motion vectors and the reference indices. While the reference indices are assigned to the variable dec_picture->ref_idx, the motion vectors are assigned to the variable dec_picture->mv. Actually, the motion vector information parsed from the bitstream is intrinsically the motion vector difference between the original motion vector and the predicted motion vector. Since the decoder module also has the prediction function SetMotionVectorPredictor(), it can generate the predicted motion vector by itself; this means that the bit-rate can be reduced by encoding not the motion vectors themselves but the motion vector differences. The original motion vector is reconstructed from the predicted motion vector and the motion vector difference. This calculation procedure is shown in Fig. 3: by adding the motion vector difference curr_mvd and the predicted motion vector pmv[k], the reconstructed motion vector vec is generated and stored in the variable dec_picture->mv.
// read MB mode: read_one_macroblock()
dP->readSyntaxElement(&currSE,img,inp,dP);
currMB->mb_type = currSE.value1;
if (img->type==P_SLICE)
interpret_mb_mode_P(img);
else if (img->type==I_SLICE)
interpret_mb_mode_I(img);
// read the reference indices: readMotionInfoFromNAL()
readSyntaxElement_FLC(&currSE, dP->bitstream);
refframe = 1 - currSE.value1;
dec_picture->ref_idx[LIST_0][img->block_y + j][img->block_x + i] = refframe;
// read the motion vectors: readMotionInfoFromNAL()
SetMotionVectorPredictor ();
dP->readSyntaxElement(&currSE,img,inp,dP);
curr_mvd = currSE.value1;
vec = curr_mvd + pmv[k];
dec_picture->mv[LIST_0][j4+jj][i4+ii][k] = vec;
Fig. 3. The brief algorithm of reading the motion information in the decoder module
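The reconstruction step of Fig. 3 can be illustrated with a small C sketch. H.264/AVC normally predicts each motion vector component as the median of the corresponding components of the left, upper, and upper-right neighboring vectors; the helper below is a simplification that ignores the special cases for unavailable neighbors and for 16x8/8x16 partitions.

```c
#include <assert.h>

/* Median of three values, as used by SetMotionVectorPredictor(). */
static int median3(int a, int b, int c)
{
    int mx = a > b ? a : b;  mx = mx > c ? mx : c;
    int mn = a < b ? a : b;  mn = mn < c ? mn : c;
    return a + b + c - mx - mn;
}

/* Reconstruct one motion vector component from the transmitted
   difference curr_mvd and the three neighboring components. */
static int reconstruct_mv(int curr_mvd, int mv_left, int mv_up, int mv_upright)
{
    int pmv = median3(mv_left, mv_up, mv_upright);
    return curr_mvd + pmv;   /* vec = curr_mvd + pmv[k] in Fig. 3 */
}
```

For instance, with neighboring components 1, 5, and 3 the predictor is 3, so a transmitted difference of 2 reconstructs the component 5.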
Recall that inter-coded macroblocks can have various types of sub-blocks. Nevertheless, the motion vectors are assigned to all 4x4 blocks in a macroblock in order to simplify the decoding procedure; this concept is shown in Fig. 5. We may call this process the uniformization of motion vectors.
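The uniformization can be sketched as follows. The helper below (an illustrative helper, not a JM function) replicates one motion vector over every 4x4 block covered by a partition; for QCIF the per-picture 4x4 grid is 44x36 blocks.

```c
#include <assert.h>

#define BLK_X 44   /* horizontal 4x4 blocks in a QCIF picture */
#define BLK_Y 36   /* vertical 4x4 blocks in a QCIF picture   */

/* Per-4x4-block motion vector array, (x, y) components. */
static short mv[BLK_Y][BLK_X][2];

/* Copy one motion vector (mvx, mvy) into every 4x4 block of a
   partition.  (bx, by) is the top-left 4x4-block position of the
   partition; (w4, h4) is its size in 4x4-block units (e.g. a 16x8
   partition has w4 = 4, h4 = 2). */
static void uniformize_mv(int bx, int by, int w4, int h4,
                          short mvx, short mvy)
{
    for (int j = by; j < by + h4; j++)
        for (int i = bx; i < bx + w4; i++) {
            mv[j][i][0] = mvx;
            mv[j][i][1] = mvy;
        }
}
```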
In a similar way to the variable length decoding, the variable length encoding (VLE) is performed by the function writeMBLayer(), which is included in the function write_one_macroblock(). This function includes two modules: the motion information encoding module and the macroblock type encoding module.
The motion information encoding is performed by the function writeMotionInfo2NAL(), which writes the motion information into the bitstream. It consists of two functions: the reference encoding function writeReferenceFrame() and the motion vector encoding function writeMotionVector8x8(). The function writeReferenceFrame() performs the variable length encoding of the reference indices, which are stored in the variable enc_picture->ref_idx. If the reference frame is the first preceding frame, the reference frame number is not encoded. If it is the second preceding frame, it is encoded as just one bit. If it is earlier than the second preceding frame, the reference frame number itself is encoded with the unsigned exponential Golomb encoding method.
The function writeMotionVector8x8() performs the variable length encoding of the motion vectors, which are stored in the variable img->all_mv. This variable is accumulated from enc_picture->mv for every macroblock. Motion vectors are encoded not as the original motion vectors but as the differences between the original and predicted motion vectors; Fig. 4 shows this relationship. In Fig. 4, curr_mvd represents the motion vector difference, while all_mv and pred_mv indicate the original motion vector and the predicted motion vector. This motion vector difference is encoded with the signed exponential Golomb encoding method.
Meanwhile, the macroblock type encoding is performed by the function writeMBLayer(). If the current macroblock is not included in an I-slice, the macroblock type is encoded with a run-length coding scheme. That is, the encoder counts the number of skipped macroblocks between the previous non-skipped macroblock and the current non-skipped macroblock; this number is stored in the variable img->cod_counter. Then, this number img->cod_counter and the macroblock type currMB->mb_type are encoded with the unsigned exponential Golomb encoding method.
// write the macroblock type: writeMBLayer()
currSE->value1 = img->cod_counter;
dataPart->writeSyntaxElement( currSE, dataPart);
currSE->value1 = MBType2Value (currMB->mb_type);
currSE->value1--;
dataPart->writeSyntaxElement(currSE, dataPart);
// write the reference indices: writeReferenceFrame()
ref = enc_picture->ref_idx[LIST_0][j][i];
currSE->value1 = ref;
dataPart->writeSyntaxElement (currSE, dataPart);
// write the motion vectors: writeMotionInfo2NAL()
curr_mvd = all_mv[j][i][list_idx][refindex][mv_mode][k]
- pred_mv[j][i][list_idx][refindex][mv_mode][k];
currSE->value1 = curr_mvd;
dataPart->writeSyntaxElement (currSE, dataPart);
Fig. 4. The brief algorithm of writing the motion information in the encoder module
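The run-length part of the scheme can be sketched as follows; the MB_SKIP marker and the plain array of macroblock types are illustrative simplifications of the JM data structures.

```c
#include <assert.h>

#define MB_SKIP 0   /* hypothetical marker for a skipped macroblock */

/* Starting at index `start`, count the run of skipped macroblocks that
   precedes the next coded macroblock (the value kept in cod_counter).
   Returns the index of that coded macroblock, or n if none remains. */
static int count_skip_run(const int *mb_type, int n, int start, int *run)
{
    int i = start;
    *run = 0;
    while (i < n && mb_type[i] == MB_SKIP) { (*run)++; i++; }
    return i;
}
```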
Fig. 5. The uniformization of motion vectors for the coding process
C. Motion Compensation
In the encoder module, the motion information generated by the variable length decoding function in the decoder module is provided to the motion compensation function as well as to the variable length encoder. Motion compensation is performed by the function LumaResidualCoding(), which obtains the inter-predicted frame and calculates the displaced frame difference (DFD). Fig. 8 shows the overall flow diagram of this function.
First, the function SetModesAndRefframe() sets the mode parameters and the reference frames as shown in Fig. 6. Here, the motion vector mode currMB->b8mode is extracted from the macroblock mode by the function SetModesAndRefframeForBlocks() before the residual coding is executed. Then, the residual data of the 8x8 sub-macroblocks are encoded by the function LumaResidualCoding8x8(), which calls the function LumaPrediction4x4() for 4x4 blocks. This function requires the motion vectors img->all_mv as well as the mode parameters and the reference frames. The final predicted frame is stored in the variable img->mpr[y][x]. Last, the displaced frame difference is transformed and quantized by the function dct_luma().
From the discussion so far, the motion information variables used in the decoder module are the macroblock type currMB->mb_type, the reference indices dec_picture->ref_idx, the motion vector mode currMB->b8mode, and the motion vectors dec_picture->mv. In the encoder module, these correspond to the variables e_currMB->mb_type, enc_picture->ref_idx, e_currMB->b8mode, and enc_picture->mv. Accordingly, we need mediating buffers that deliver these data from the decoder module to the encoder module. These buffers are shown in Fig. 7.
*fw_ref = enc_picture->ref_idx[LIST_0][img->block_y+j][img->block_x+i];
*bw_ref = 0;
*fw_mode = currMB->b8mode[b8];
*bw_mode = 0;
Fig. 6. The setting of the mode parameter and the reference frame
Variable Name Decoder Module Buffer Encoder Module
Macroblock type currMB->mb_type MBmode e_currMB->mb_type
Motion vector mode currMB->b8mode B8mode e_currMB->b8mode
Predicting direction currMB->b8pdir B8pdir e_currMB->b8pdir
Reference index dec_picture->ref_idx dec_ref enc_picture->ref_idx
Motion vector dec_picture->mv dec_mv enc_picture->mv
Fig. 7. The buffers for delivering motion information from the decoder module to the encoder module
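For a QCIF input, the mediating buffers of Fig. 7 can be declared as in the following sketch. The names follow the table; the element types and exact dimensions are assumptions based on the sizes discussed in this report (11x9 macroblocks and 44x36 4x4 blocks per picture).

```c
#include <assert.h>

#define MB_NUM 99   /* (176/16) x (144/16) = 11 x 9 macroblocks */
#define BLK_X  44   /* 176/4 horizontal 4x4 blocks */
#define BLK_Y  36   /* 144/4 vertical 4x4 blocks   */

static int   MBmode[MB_NUM];          /* macroblock type per MB        */
static int   B8mode[MB_NUM][4];       /* motion vector mode per 8x8    */
static int   B8pdir[MB_NUM][4];       /* predicting direction per 8x8  */
static int   dec_ref[BLK_Y][BLK_X];   /* reference index per 4x4 block */
static short dec_mv[BLK_Y][BLK_X][2]; /* (x, y) motion vector per 4x4  */
```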
[Fig. 8 flow diagram: SetModesAndRefframe() sets the modes and reference frames; then, for each 8x8 block, LumaResidualCoding8x8() performs the 4x4 coding, in which LumaPrediction4x4() produces the 4x4 luma prediction, the displaced frame differences (DFD) are computed, and dct_luma() performs the DCT, quantization, inverse quantization, IDCT, and reconstruction, until all 4x4 and 8x8 blocks are processed.]
Fig. 8. The flow diagram of the function LumaResidualCoding()
4. Software Implementation
A. Getting the Motion Information from the Decoder Module
The motion information can be classified into five components: macroblock type, motion vector mode, prediction direction, reference indices, and motion vectors. Among them, the first three (macroblock type, motion vector mode, and prediction direction) can be obtained when each macroblock is decoded. The remaining components, the reference indices and motion vectors, are stored in the buffer when each picture is decoded.
As a result, the former components, which we call the macroblock information, are extracted by the function store_mbinfo_into_buffer(), which is inserted into the slice decoding function decode_one_slice() as shown in Fig. 9. In this figure, after one macroblock is decoded by the function decode_one_macroblock(), the macroblock information, namely the macroblock type currMB->mb_type, the motion vector mode currMB->b8mode, and the prediction direction currMB->b8pdir, is stored in buffers. It should be noted that a macroblock has four motion vector modes and four prediction directions; that is, each 8x8 sub-block inside a macroblock has its own motion vector mode and prediction direction. The motion vector mode and prediction direction are decided in units of 8x8 sub-blocks.
On the other hand, the latter components, which we call the motion vector information, are extracted by the function store_motion_into_buffer(), which is inserted into the buffer-writing function writeIntoFile() as shown in Fig. 9. As noted before, the function writeIntoFile() not only stores the decoded raw data and the motion information into buffers but also performs the encoding procedure with different parameters through the function enc_encode_single_frame(). After decoding a picture, the function store_motion_into_buffer() stores the motion vector information, namely the motion vectors dec_picture->mv and the reference indices dec_picture->ref_idx, into buffers as shown in Fig. 9. In the case of a QCIF picture, the horizontal and vertical sizes in 4x4-block units are 44 and 36; thus, the number of motion vectors in a picture is 1584 (44x36).
void decode_one_slice(struct img_par *img, struct inp_par *inp)
{
  while (end_of_slice == FALSE)   // loop over macroblocks
  {
    start_macroblock(img, inp, img->current_mb_nr);
    read_flag = read_one_macroblock(img, inp);
    decode_one_macroblock(img, inp);
    store_mbinfo_into_buffer(img);
    exit_slice();
  }
}

void writeIntoFile(StorablePicture *p, struct img_par *img)
{
  store_motion_into_buffer();
  enc_encode_single_frame();
  frameNumberToWrite++;
}

void store_mbinfo_into_buffer(struct img_par *img)
{
  struct macroblock *currMB = &img->mb_data[img->current_mb_nr];
  for (int k = 0; k < 4; k++) {
    B8mode[img->current_mb_nr][k] = currMB->b8mode[k];
    B8pdir[img->current_mb_nr][k] = currMB->b8pdir[k];
  }
  MBmode[img->current_mb_nr] = currMB->mb_type;
}

void store_motion_into_buffer()
{
  // copy the per-4x4-block motion information of the decoded picture
  // (44x36 blocks for QCIF) into the mediating buffers
  for (int by = 0; by < 36; by++)
    for (int bx = 0; bx < 44; bx++) {
      dec_mv[by][bx][0] = dec_picture->mv[LIST_0][by][bx][0];
      dec_mv[by][bx][1] = dec_picture->mv[LIST_0][by][bx][1];
      dec_ref[by][bx]   = dec_picture->ref_idx[LIST_0][by][bx];
    }
}
Fig. 9. The algorithm of getting the motion information from the decoder module
B. Putting the Motion Information into the Encoder Module
After getting the motion information from the decoder module, the motion information is stored in temporary buffers.
The encoder module can gain these data from buffers in order to perform the variable length encoding and the motion
compensation without motion estimation. These data can be uploaded to the encoder module whenever each
macroblock is encoded through the function encode_one_macroblock() like Fig. 10. In this macroblock encoding
function, the inter-coding part should be removed to stop the motion estimation procedure. Notice that the cost of an
inter mode should be the minimum value if the current macroblock is a macroblock in P-slice or B-slice. The H.264/
AVC encoder compares all possible modes to select the mode with the minimum cost as the best mode. However, since
a macroblock in a P-slice or B-slice already has its own motion information from the decoder module, forcing this macroblock to be inter-coded has little effect on the bit-rate efficiency. Although a macroblock in a P-slice or B-slice is forced to be inter-coded, this yields lower computational complexity than performing the complex mode decision procedure. In this case, the best mode is simply chosen as the macroblock mode obtained from the decoder module.
void encode_one_macroblock()
{
  if (!intra)
  {
    // (removed part of inter-coding)
    init_motion_info(e_img);
    best_mode = currMB->mb_type;
    min_cost = 0;
  }
}

void init_motion_info(e_ImageParameters *e)
{
  struct e_macroblock *e_currMB = &e->mb_data[e->current_mb_nr];
  int bx, by, b8, x, y;
  int nr = e->current_mb_nr;
  int block_x = (nr % 11) * 4;
  int block_y = (nr / 11) * 4;

  e_currMB->mb_type = MBmode[nr];
  for (b8 = 0; b8 < 4; b8++) {
    e_currMB->b8mode[b8] = B8mode[nr][b8];
    e_currMB->b8pdir[b8] = B8pdir[nr][b8];
  }
  // copy the buffered motion vectors and reference indices of the
  // sixteen 4x4 blocks of this macroblock into the encoder structures
  for (by = 0; by < 4; by++)
    for (bx = 0; bx < 4; bx++) {
      y = block_y + by;
      x = block_x + bx;
      enc_picture->mv[LIST_0][y][x][0] = dec_mv[y][x][0];
      enc_picture->mv[LIST_0][y][x][1] = dec_mv[y][x][1];
      enc_picture->ref_idx[LIST_0][y][x] = dec_ref[y][x];
    }
}

Fig. 10. The algorithm of putting the motion information into the encoder module
Analyzing the macroblock encoding function encode_one_macroblock() helps to understand why the inter-coding part can be removed correctly by setting the cost to the minimum value. Fig. 11 shows the brief flow diagram of this function. According to this flow diagram, if the inter-coding part were simply removed, all macroblocks would be intra-coded, since the intra mode would be selected as the best mode. If, instead, we set the cost of the inter mode to the minimum value, the final mode of the corresponding macroblock is decided as the inter mode. The function SetModesAndRefframeForBlocks() sets the final mode and the reference information for a macroblock after the best mode is decided.
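The mode decision described above amounts to a minimum-cost search, which is why forcing the inter mode's cost to zero guarantees that it wins. A generic sketch (the function name and the plain cost array are illustrative, not JM code):

```c
#include <assert.h>

/* Return the index of the cheapest mode; when one mode's cost is
   forced to zero, it always wins the comparison (ties go to the
   first occurrence). */
static int choose_best_mode(const int *cost, int num_modes)
{
    int best = 0;
    for (int m = 1; m < num_modes; m++)
        if (cost[m] < cost[best]) best = m;
    return best;
}
```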
5. Experiments
To test the transcoder explained so far, I used part of the Foreman video in QCIF format, as shown in Fig. 12(a). When we decode this video encoded directly with the quantization parameters QPISlice and QPPSlice equal to 36, without transcoding, we get a video like Fig. 12(b). On the other hand, if we transcode this encoded video with the same quantization parameter, we get the output video shown in Fig. 12(c). This transcoding result is the same as the video encoded directly without transcoding. Fig. 12(d) shows the result of transcoding with the same quantization parameter when the motion information is reused in the encoder module. This picture is also the same as the transcoding result of Fig. 12(c).
[Fig. 11 flow diagram of encode_one_macroblock(): after initialization (init_enc_mb_params()), if the slice is not intra-only, inter prediction is performed; intra prediction is then performed with low or high complexity depending on the mode setting (RandomIntra()); the best mode is chosen, the modes and reference frames are set (SetModesAndRefframeForBlocks()), the coefficients and reconstruction are set for 8x8 blocks (SetCoeffAndReconstruction8x8()), residual coding is performed, and the macroblock parameters are stored (set_stored_macroblock_parameters()).]

Fig. 11. The brief flow chart of the function encode_one_macroblock()
Table 1. The characteristics of directly encoded videos and the transcoded videos

                 QP=12      QP=36      QP=36 (with ME)  QP=36 (without ME)
Bitstream size   21 KB      1.80 KB    1.77 KB          2.18 KB
SNR Y            48.91      31.60      31.71            31.03
SNR U            49.68      38.71      38.60            38.65
SNR V            50.66      38.89      39.14            38.96
Encoding time    3.468 sec  2.578 sec  2.547 sec        1.516 sec
ME time          0.980 sec  1.202 sec  0.967 sec        0.000 sec
Fig. 12. The comparison of the directly encoded video and the transcoded videos: (a) the original Foreman video, (b) a video encoded directly with quantization parameter 36, (c) a video transcoded with quantization parameter 36, and (d) a video transcoded with quantization parameter 36 with the motion information reused.
Some characteristics of the encoded bitstream in each case are shown in Table 1. The size of the bitstream transcoded with motion vector reuse is similar to that of the bitstream transcoded in the basic way, although there is a small difference. This difference seems to arise from forcing the macroblocks in P-slices or B-slices to be inter-coded. Actually, even though a macroblock is included in a P-slice or B-slice, it can be intra-coded; in this case, a transcoding error is generated.
The SNR performances are nearly the same; thus, the three results with the same quantization parameter have the same distortion performance and quality, both visually and statistically. On the other hand, the encoding time of the bitstream transcoded with motion vector reuse differs greatly from the others: it is about 40% shorter than that of the bitstream transcoded in the basic way (1.516 sec versus 2.547 sec). This is due to the removal of the very time-consuming motion estimation procedure. While the motion estimation time with motion vector reuse is zero, the other cases take nearly one second for motion estimation. This shows that the computational complexity of the transcoder with motion vector reuse is much lower than that of the basic transcoder.
6. Conclusion
Reducing the computational complexity is an important issue in designing a transcoder so that it can serve a wider variety of applications. As a kind of spatial-domain transcoding, the transcoder with motion vector reuse provides a powerful way to reduce the total transcoding time. For this purpose, the motion information is delivered from the decoder module to the encoder module through temporary buffers. This motion information can be classified into five components: macroblock type, motion vector mode, prediction direction, reference indices, and motion vectors. These data are updated whenever each macroblock is decoded and encoded anew. The experiments show that the spatial-domain transcoder with motion vector reuse has the same performance as the simple transcoder. Nevertheless, it has lower computational complexity than the simple transcoder. Future work is the development of a frequency-domain transcoder, which may be faster and more effective than spatial-domain transcoders.