Towards Efficient Wavefront Parallel Encoding of HEVC: Parallelism Analysis and Improvement

Keji Chen, Yizhou Duan, Jun Sun, Zongming Guo

2014 IEEE 16th International Workshop on Multimedia Signal Processing (MMSP)



Outline

• Introduction
• Parallelism Evaluation of HEVC Encoding
• Proposed Method
• Experimental Results
• Conclusion


Introduction

The large increase in computational complexity introduced by HEVC's enhanced coding tools makes HEVC difficult to deploy in practical applications.

By exploiting the parallelism among encoding tasks, the encoding speed can be significantly improved.


Introduction

Compared with slices, Wavefront Parallel Processing (WPP) achieves similar parallelism with a smaller loss of coding efficiency.

In [11], Chi et al. proposed an Overlapped WaveFront (OWF) method based on WPP.

• [11] C. C. Chi, M. Alvarez-Mesa, B. Juurlink, G. Clare, F. Henry, S. Pateux, and T. Schierl, "Parallel Scalability and Efficiency of HEVC Parallelization Approaches," IEEE Trans. Circuits Syst. Video Technol., vol. 22, pp. 1827-1838, Dec. 2012.


Parallelism Evaluation of HEVC Encoding (1/3)

• $T_{i,j,k}$: Self Encoding Complexity (SEC) of CTB $C_{i,j,k}$. SEC can be evaluated by the encoding time. It is determined by the frame content and the RDO design, and does not change with the parallel method.
• $ET_F(C_{i,j,k})$: Required Encoding Complexity (REC) to encode $C_{i,j,k}$ using parallel method $F$. REC can be regarded as the earliest ending time of $C_{i,j,k}$ and is affected by data dependence.


Parallelism Evaluation of HEVC Encoding (2/3)

$$ET_F(C_{i,j,k}) = T_{i,j,k} + \max_{C \in DEP_F(C_{i,j,k})} ET_F(C) \qquad (1)$$

[Equation (2)]

• i, j, k: indices of the frame, line, and CTU.
• $DEP_{F,inter}(C_{i,j,k})$: CTBs that $C_{i,j,k}$ depends on when using parallel encoding method $F$.
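Read together, the SEC and REC definitions describe a longest-path recursion: a CTB cannot finish earlier than its own SEC after the latest-finishing CTB it depends on. The Python sketch below assumes exactly that recursion (with the maximum over an empty dependence set taken as 0) and a hypothetical parallelism measure, total SEC divided by the largest REC; `rec_times` and `parallelism` are illustrative names, not taken from the paper.

```python
def rec_times(sec, deps):
    """Required Encoding Complexity (REC) of every CTB.

    sec:  dict mapping a CTB id to its Self Encoding Complexity T (e.g. encoding time)
    deps: dict mapping a CTB id to the CTB ids it depends on (must be acyclic)
    Assumed recursion: ET(C) = T(C) + max ET(D) over D in deps[C], 0 if deps[C] is empty.
    """
    memo = {}

    def et(c):
        if c not in memo:
            memo[c] = sec[c] + max((et(d) for d in deps.get(c, ())), default=0)
        return memo[c]

    return {c: et(c) for c in sec}


def parallelism(sec, deps):
    """Hypothetical measure: total SEC divided by the latest REC (the critical path)."""
    return sum(sec.values()) / max(rec_times(sec, deps).values())


sec = {"a": 2, "b": 3, "c": 1}
print(parallelism(sec, {"b": ["a"], "c": ["b"]}))  # 1.0: chain a -> b -> c
print(parallelism(sec, {"c": ["a", "b"]}))         # 1.5: a and b run side by side
```

Removing the dependence of b on a shortens the critical path from 6 to 4 while the total SEC stays 6, which is the intuition behind comparing methods by their dependence sets.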


Parallelism Evaluation of HEVC Encoding (3/3)

From (1) and (2), the parallelism of different parallel methods can be compared:

This criterion follows directly from (1) and (2) and can be summarized simply: the less data dependence there is in HEVC encoding, the higher the parallelism that can be obtained.

[Equations (3) and (4)]
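The criterion can be checked numerically: if, for every CTB, the dependence set of method F1 is a subset of that of method F2, then by induction on the REC recursion every REC under F1 is at most the corresponding REC under F2. A minimal Python sketch on random acyclic dependence graphs; the graph sizes, probabilities, and function names are hypothetical choices for illustration.

```python
import random


def rec_times(sec, deps):
    """REC for CTBs numbered 0..n-1 whose dependences all point to smaller indices.

    Index order is then a topological order, so a single forward pass suffices.
    """
    et = []
    for c, t in enumerate(sec):
        et.append(t + max((et[d] for d in deps[c]), default=0.0))
    return et


def criterion_holds(n=30, p=0.2, keep=0.5, seed=0):
    """Check that removing dependences never increases any REC."""
    rng = random.Random(seed)
    sec = [rng.uniform(0.5, 2.0) for _ in range(n)]
    # Method F2: a random acyclic dependence set for every CTB
    deps_f2 = [[d for d in range(c) if rng.random() < p] for c in range(n)]
    # Method F1: for every CTB, a random subset of F2's dependences
    deps_f1 = [[d for d in ds if rng.random() < keep] for ds in deps_f2]
    et_f1 = rec_times(sec, deps_f1)
    et_f2 = rec_times(sec, deps_f2)
    return all(a <= b for a, b in zip(et_f1, et_f2))


print(all(criterion_holds(seed=s) for s in range(100)))  # True
```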


Data Dependence Analysis of WPP and OWF Methods (1/4)

For intra:

[Equation (5)]


Data Dependence Analysis of WPP and OWF Methods (2/4)

The SEC differs significantly from CTB to CTB, and its variance in inter frames is much greater than in intra frames. For a given encoding algorithm this imbalance is fixed, so the unbalanced SEC becomes the bottleneck of intra-frame parallelism.
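A minimal Python sketch of how SEC imbalance caps intra-frame parallelism. It assumes the usual WPP rule (a CTU waits for its left neighbour and for its top-right neighbour in the line above) rather than the exact form of (5), and the same hypothetical parallelism measure (total SEC divided by the largest REC); all function names are illustrative.

```python
def wpp_deps(j, k, width):
    """Assumed WPP intra-frame dependence of CTU k in line j: left and top-right.

    The top-right neighbour is clamped to the last column, so the final CTU of
    a line waits for the final CTU of the line above.
    """
    deps = []
    if k > 0:
        deps.append((j, k - 1))
    if j > 0:
        deps.append((j - 1, min(k + 1, width - 1)))
    return deps


def wpp_rec(sec):
    """REC of every CTU in one frame; sec[j][k] is the SEC of CTU k in line j."""
    height, width = len(sec), len(sec[0])
    et = [[0.0] * width for _ in range(height)]
    for j in range(height):          # raster order is a valid topological order here
        for k in range(width):
            ready = max((et[dj][dk] for dj, dk in wpp_deps(j, k, width)), default=0.0)
            et[j][k] = sec[j][k] + ready
    return et


def parallelism(sec):
    """Hypothetical measure: total SEC divided by the latest REC in the frame."""
    return sum(map(sum, sec)) / max(map(max, wpp_rec(sec)))


W, H = 8, 4
uniform = [[1.0] * W for _ in range(H)]      # total SEC = 32
unbalanced = [[0.5] * W for _ in range(H)]   # same total SEC ...
unbalanced[1][3] = 16.5                      # ... concentrated in one CTU
print(parallelism(uniform), parallelism(unbalanced))
```

With uniform SEC $T$ and $W \ge 2$, the recursion gives $ET(C_{j,k}) = (k + 2j + 1)\,T$, so a $W \times H$ frame reaches a ratio of $WH / (W + 2H - 2)$; for a 1080p frame with 64×64 CTBs ($W = 30$, $H = 17$) that is $510/62 \approx 8.2$. Concentrating the same total SEC in one CTU stretches the critical path and lowers the ratio, however many wavefront threads are available.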


Data Dependence Analysis of WPP and OWF Methods (3/4)


Data Dependence Analysis of WPP and OWF Methods (4/4)

For inter:

• i, j, k: indices of the frame, line, and CTU.
• W: the width of a frame measured in CTBs.
• L_OWF: a positive integer parameter denoting the safe range.

• In [11], L_OWF is set to roughly the ceiling of one quarter of the frame height measured in CTBs.

[Equations (6) and (7)]
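The safe range used in [11] can be computed directly from the frame size. A minimal Python sketch, assuming 64×64 CTBs by default and a hypothetical readiness test in which CTU line j of the current frame may start once lines 0 through j + L_OWF of its reference frame are finished (clamped to the last line); this reading of the safe range and all function names are assumptions for illustration.

```python
def ceil_div(a, b):
    # Integer ceiling division, avoiding floating point
    return -(-a // b)


def ctb_grid(width_px, height_px, ctb=64):
    """Frame size measured in CTBs (W, H); partial CTBs at the edges count as whole."""
    return ceil_div(width_px, ctb), ceil_div(height_px, ctb)


def l_owf(height_ctb):
    """Safe range as described for [11]: the ceiling of 1/4 of the frame height in CTBs."""
    return ceil_div(height_ctb, 4)


def owf_line_ready(ref_lines_done, j, height_ctb, safe_range):
    """Hypothetical readiness test for CTU line j of the current frame.

    ref_lines_done: number of leading CTU lines already finished in the reference frame.
    Lines 0..min(j + safe_range, H - 1) of the reference frame must be complete.
    """
    last_needed = min(j + safe_range, height_ctb - 1)
    return ref_lines_done >= last_needed + 1


w, h = ctb_grid(1920, 1080)
print((w, h), l_owf(h))  # (30, 17) 5
```

For 2160p the same helpers give a 60 × 34 CTB grid and L_OWF = 9.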


Proposed Method (1/5)

To fully exploit inter-frame parallelism, we designed a new Inter-frame Wavefront (IFW) coding order.


Proposed Method (2/5)

For intra:

For inter:

[Equations (8) and (9)]


Proposed Method (3/5)

A Frame Thread (FT) is assigned to each frame to exploit inter-frame parallelism.

Wavefront Threads (WTs) are assigned to each frame to exploit intra-frame parallelism.
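The intra-frame part of this thread organization can be illustrated with one frame's pool of wavefront threads. A minimal Python sketch, assuming the usual WPP rule (CTU k of line j waits for CTU k + 1 of line j − 1, clamped at the last column) and that threads claim CTU lines strictly in order; `encode_frame_wpp` and `encode_ctu` are hypothetical names, and the inter-frame dependences handled by frame threads are not modelled.

```python
import threading


def encode_frame_wpp(width, height, num_wt, encode_ctu):
    """Encode one frame with `num_wt` wavefront threads (hypothetical scheduler).

    Threads claim CTU lines in order. Before encoding CTU k of line j, a thread
    waits until line j - 1 has finished CTU k + 1 (clamped to the last column).
    """
    progress = [0] * height      # number of finished CTUs in each line
    next_line = [0]              # index of the next unclaimed line
    cond = threading.Condition()

    def worker():
        while True:
            with cond:
                j = next_line[0]
                if j >= height:
                    return
                next_line[0] += 1
            for k in range(width):
                if j > 0:
                    with cond:
                        # Top-right neighbour finished <=> line above has >= k + 2 CTUs done
                        # (the whole line above when k is the last column)
                        cond.wait_for(lambda: progress[j - 1] >= min(k + 2, width))
                encode_ctu(j, k)
                with cond:
                    progress[j] += 1
                    cond.notify_all()

    threads = [threading.Thread(target=worker) for _ in range(num_wt)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()


def print_ctu(j, k):
    print(f"line {j}, CTU {k}")


encode_frame_wpp(width=4, height=3, num_wt=2, encode_ctu=print_ctu)
```

A frame thread could call such a routine for its own frame. Because lines are claimed in order, the line a thread waits on is always owned by an active thread or already finished, so the scheme cannot deadlock even with fewer threads than lines. In CPython the global interpreter lock limits real speedup for pure-Python work; the sketch only shows the synchronization pattern.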


Proposed Method (4/5)

If L_IFW is no greater than L_OWF, then for any i, j, k it can be deduced that:

[Equations (12) and (13)]


Proposed Method (5/5)

The results also confirm that unbalanced SEC is a bottleneck for intra-frame parallelism.

The parallelism of IFW increases significantly as the number of B-frames increases, because the reduced inter-frame dependence contributes much more to the overall parallelism.


Experimental Results

Experiments follow the common test conditions and software reference configurations [12].

The hardware platform is a shared memory system with two AMD Opteron 6272 processors.


Experimental Results(2/)


Experimental Results

Frame Thread = 9, Wavefront Thread = 8


x265


Conclusion

A parallelism evaluation criterion and an IFW method are proposed to improve the encoding speed of HEVC.

The IFW method achieves significant speedup on various sequences, making it a promising technology for large-scale HEVC video applications.