Efficient block-based transparent encryption for H.264/SVC bitstreams

REGULAR PAPER


Robert Huijie Deng • Xuhua Ding •

Yongdong Wu • Zhuo Wei

© Springer-Verlag Berlin Heidelberg 2013

Abstract Taking advantage of the inter-layer prediction

technique used in H.264/scalable video coding (H.264/

SVC), in this paper we propose an efficient block-based

encryption scheme (BBES) for encrypting H.264/SVC

enhancement layers (ELs). BBES operates in three modes,

namely, Intra-MB mode, Group-MB mode and 4Group-MB mode. All three modes are effective in securing ELs, preserve the "adaptation-transparent" property of H.264/SVC, and are compliant with the H.264/SVC bitstream format specifications. Moreover, Intra-MB and Group-MB modes also possess a property we term "transcoding transparency". Experimental results indicate

that BBES has low computational complexity and small

compression overhead. Thus, BBES is suitable for trans-

parent encryption of H.264/SVC bitstreams in which ELs

are encrypted but base layers are left in cleartext.

Keywords Data confidentiality · H.264/SVC · Transparent encryption · Scalability

1 Introduction

The scalable extension of H.264, referred to as scalable

video coding (SVC) [1], is composed of a base layer (BL),

which is compatible with H.264 advanced video coding

(AVC), and one or more enhancement layers (ELs), which

provide video scalability in all three dimensions (i.e., time,

quality and resolution). With the rapid advancement of

networking and multimedia processing technologies,

applications of H.264/SVC are becoming more and more

popular. However, SVC bitstreams can easily be inter-

cepted when they are delivered over open networks. The

content of an SVC bitstream, such as the content of a video

conference, might need to be protected for commercial, political or security reasons. Such protection can be

achieved by full bitstream encryption, i.e., encrypting all

layers of the SVC bitstream. On the other hand, a pay TV

broadcaster does not always intend to prevent unauthorized

viewers from receiving and watching a program, but rather

intends to promote a contract with non-paying viewers.

This can be facilitated by providing a BL version of the broadcast program for everyone, while only authorized users get access to the full bitstream (i.e., the BL and the ELs). In this article, we focus on this latter scenario, which

can be accomplished using transparent encryption in which

the BL is left in cleartext while all the ELs are encrypted.

It is highly desirable for an H.264/SVC encryption

scheme to satisfy the following properties. First, an

encrypted SVC bitstream should preserve the adaptation-

transparent property of the original plaintext bitstream.

H.264/SVC enables adaptation of a high quality (resolu-

tion, frame rate) SVC bitstream into a low quality (reso-

lution, frame rate) one by simply removing parts of the

network abstraction layer (NAL) units at media-aware

network elements (MANEs), such as proxies, so as to meet

R. H. Deng · X. Ding · Z. Wei (✉)

School of Information Systems, Singapore Management

University, Singapore 178902, Singapore



Y. Wu

Institute for Infocomm Research, 1 Fusionopolis Way,

Singapore 138632, Singapore



Multimedia Systems

DOI 10.1007/s00530-013-0326-0

the requirements of various user devices and network

bandwidths (as shown in Fig. 1). In an open and public

video streaming environment, those MANEs are not

always considered as trustworthy and are not allowed to

have access to SVC bitstreams. An encryption scheme for H.264/SVC is adaptation-transparent if encryption does not affect the scalability of the plaintext SVC bitstream. That is, to perform quality or resolution

adaptation, a MANE simply discards certain encrypted

ELs, without having to first decrypt them. On the one hand,

adaptation-transparency simplifies the design and opera-

tions of MANEs since they do not have to explicitly dif-

ferentiate whether the streams are in ciphertext or in

plaintext; such a property is significant especially in large

networks shared by many content providers and users. On

the other hand, MANEs do not need to perform decryption

and re-encryption, which lowers their operational com-

plexity and maintains end-to-end (from content providers

to end users) security. Second, transcodability of the encrypted bitstreams should be preserved; that is, an encryption scheme for H.264/SVC should be transcoding-transparent. Transcoding transparency means that a MANE can perform content-preserving manipulations, such as requantization or resolution downsampling, directly on encrypted ELs without decryption, which preserves security while keeping operational complexity low. Since transparent encryption encrypts only the ELs, transcoding transparency concerns the encrypted ELs but not the BL. Transcoding of ELs here refers to requantization of the coefficients using a larger QP (quantization parameter) in order to lower the EL bit rate. Finally, the encryption scheme should

preserve format-compliance to avoid decoder freezing or crashing, have low computational complexity, and incur small compression overhead.

A naïve way to protect an SVC bitstream is to treat the bitstream as monolithic, unstructured data and encrypt it as a whole. However, this naïve approach destroys the scalability of the SVC bitstream. Many schemes for encrypting different SVC formats have been proposed in the literature. Examples include secure scalable streaming (SSS) [2] and SSS with error correction codes [3] for general SVC bitstreams, encryption of JPEG XR [4, 5], encryption of MPEG-4 fine-grain scalability (FGS) [6, 7], and encryption of DWTSB-based SVC [8]. Note that each of the above encryption schemes is designed for a specific SVC format; hence a scheme proposed for one SVC format does not necessarily perform well on other SVC formats.

For instance, if the scheme for DWTSB-based SVC in [8], which is a selective and scalable encryption scheme for ELs based on scrambling of scan patterns, is used to protect H.264/SVC codestreams, our experiments (see footnote 1) on the Foreman sequence indicate that: (1) it causes about 16.6 % compression overhead because scrambling destroys the statistical distribution of quantized DCT coefficients; (2) it increases computation overhead, since it must scramble every 4 × 4 block; (3) it is not transcoding-transparent because the encrypted SVC codestream cannot be decrypted after transcoding, e.g., re-quantization. In this article, we focus on H.264/SVC encryption [9], for which related work is reviewed in Sect. 2.

Block permutation has been proposed to encrypt mul-

timedia streams such as JPEG2000 bitstreams [10]. How-

ever, block permutation cannot be used to encrypt BLs in

Fig. 1 SVC content

distribution example

1 The encoded H.264/SVC bitstream (QCIF) consists of a base layer and an encrypted quality enhancement layer. The quantization parameters (QP) of the base layer and the enhancement layer are 40 and 30, respectively.


H.264/AVC and H.264/SVC bitstreams because intra-coding

operations in these coding standards must follow a fixed

macroblock order. A comparison of different scrambling

techniques in the DCT domain was given in [11], which

claims that block-based permutation adversely affects

compression overhead. To reduce the overhead, in this

paper we present an efficient block-based encryption

scheme (BBES) which randomly permutes macroblocks and

subblocks in an SVC EL. The BBES encrypts ELs of H.264/

SVC bitstreams in three modes: Intra-MB, Group-MB and

4Group-MB, by exploiting the inter-layer prediction tech-

niques [1]. All the three modes of BBES provide robust

security, preserve adaptation-transparency of the original

cleartext bitstreams, and are fully format-compliant with

the H.264/SVC format specifications. Additionally, Intra-

MB and Group-MB modes are also transcoding-transparent.

We implement the BBES-transparent encryption of SVC

bitstreams using JSVM 9.19 [12]. Our experiments show

that BBES incurs low computation complexity at both the

sender and the receiver sides, and has small compression

overhead.

The rest of this paper is organized as follows. We briefly

review the existing H.264/SVC encryption schemes in

Sect. 2. Then we present an overview of H.264/SVC in

Sect. 3. The BBES is presented in Sect. 4. Experimental

results and our evaluations are given in Sect. 5. Section 6

concludes the paper.

2 Related work

Encryption schemes for H.264/SVC can be classified into

two major approaches. One is to perform encryption and

compression simultaneously, while the other is to perform

compression first, and then encryption.

2.1 Encryption-with-compression

Several encryption schemes were proposed in which

compression (decompression) and encryption (decryp-

tion) are simultaneously executed. The scheme in [13]

encrypts signs of texture, intra-mode, and motion vectors

during SVC encoding. Similarly, the schemes in [14–17]

encrypt signs of residual coefficients, or signs of motion

vectors. Additionally, the scheme in [14] also encrypts

DC coefficients. Naturally, such joint encryption-with-compression schemes are adaptation-transparent, preserve format-compliance, and have no effect on compression overhead. However, this approach is not transcoding-transparent. Transcoding transparency requires that, after transcoding operations such as re-quantization or resolution downsampling are performed on encrypted SVC ELs without decryption, the transcoded ELs can still be decrypted at the receiver side. Encryption-with-compression relies on sign encryption, i.e., the signs of non-zero coefficients or motion vectors are encrypted with a stream cipher or block cipher, whereas transcoding operations change both the signs (positive or negative) and the values of coefficients (e.g., from non-zero to zero). Hence, the encrypted ELs can no longer be decrypted with the original secret key after transcoding. More importantly, some of its embodiments (e.g., sign encryption) are not secure because they do not provide sufficient visual scrambling for human eyes [18].

2.2 Encryption-after-compression

Based on the H.264/SVC NAL structure, several

encryption techniques on compressed bitstreams were

presented in the literature. All of them propose to

encrypt important NAL units (NALUs) (e.g., video

coding layer NAL) using a standard stream or block

cipher. The scheme in [19] encrypts SVC layers using

block keys which are in turn encrypted using a content

key, and the encrypted block keys are inserted into the

encrypted SVC bitstream. The encryption schemes in

[20–23] also encrypt bitstreams based on H.264 format.

Encryption performed after compression may satisfy the

property of adaptation-transparency or format-compli-

ance. However, such an approach normally results in

increased computational cost. The approach may also

introduce relatively higher compression overhead since

initialization vectors or semantic markers are inserted into the

encrypted bitstreams in order to facilitate decryption and

preserve the syntax requirements for NALUs. Obviously,

it is difficult for this approach to achieve transcoding

transparency if encryption is performed on compressed

bitstreams.

3 Preliminaries

This section introduces the H.264/SVC standard, its inter-

layer prediction technique for ELs, and its context-adaptive

variable-length coding (CAVLC) technique.

3.1 H.264/SVC overview

An SVC bitstream consists of a basic quality video pro-

vided by a BL as well as one or more ELs. Due to the

flexible arrangement of NALUs, SVC provides three kinds of scalability, which enable a MANE to reduce the bit-rate

by directly discarding NALUs so as to meet the require-

ments of network bandwidth and/or end user devices’

capabilities.


3.1.1 Temporal scalability

Temporal scalable coding can be efficiently provided by

using hierarchical coding structures with P- or B-frames.

Frames in the temporal BLs are coded with the highest

fidelity since they are used as references for motion-com-

pensated prediction of frames in all other temporal layers.

When a MANE discards some frames (e.g., B-frames), it

produces an SVC bitstream of lower bit-rate.

3.1.2 Spatial scalability

In SVC spatial scalability, the spatial BL represents a video

of the lowest resolution while the spatial ELs increase the

resolutions of the video. Since inter-layer prediction is

used, a lower spatial layer must be present if a higher

spatial layer exists but not the other way around. Therefore,

when the spatial layers are discarded starting from the

highest layer, the rest of the spatial layers are still deco-

dable. This discarding process can be repeated until only

one layer (the BL) remains. In other words, the resolution

of a video can be decreased directly and gradually.

3.1.3 Quality scalability

In quality scalability, the quality BL is coded at the lowest

visual quality, and the quality ELs increase the visual quality

of the decoded sequence. Therefore, when the quality layers

are discarded starting from the highest layer, the rest of the

quality layers are still decodable. This discarding process can

be repeated until only one quality layer remains.
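The layer discarding described above for the three kinds of scalability can be sketched as a simple filter over NAL units. This is a minimal illustration only: the `NalUnit` record and the `adapt` function are hypothetical names, the identifiers stand for the dependency, quality and temporal ids carried in the SVC NAL header, and a real extractor must respect further dependencies between layers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NalUnit:
    """Hypothetical, simplified view of an SVC NAL unit."""
    did: int        # dependency (spatial) layer id
    qid: int        # quality layer id
    tid: int        # temporal layer id
    payload: bytes  # opaque to the MANE; plaintext or ciphertext


def adapt(units, max_did, max_qid, max_tid):
    """Keep only the NAL units at or below the requested operating point."""
    return [u for u in units
            if u.did <= max_did and u.qid <= max_qid and u.tid <= max_tid]


stream = [
    NalUnit(0, 0, 0, b"BL"),
    NalUnit(0, 0, 1, b"BL-t1"),
    NalUnit(0, 1, 0, b"EL-q1"),
    NalUnit(1, 0, 0, b"EL-d1"),
]
base_only = adapt(stream, max_did=0, max_qid=0, max_tid=0)
```

Note that `adapt` never inspects `payload`, which is why the same discarding step works whether the ELs are encrypted or not.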

3.2 Inter-layer prediction

H.264/SVC employs the inter-layer prediction technique, which aims to reuse as much lower layer information as possible in order to improve the rate-distortion efficiency of ELs. Only residuals are coded for ELs, rather than prediction information such as intra-prediction modes, inter partitions, motion vectors and references. Three types of inter-layer

predictions are defined in the H.264/SVC standard.

• Inter-layer motion prediction The associated motion

parameters, such as reference frame indexes, block

partition choices, and motion vectors, are completely

derived, or up-sampled (for spatial scalability) from the

co-located blocks in the reference layer.

• Inter-layer intra-prediction The macroblock prediction

signal is completely inferred from co-located intra-

coded blocks in the reference layer without transmitting

any additional side information. Specifically, the prediction signal requires up-sampling by using a 4-tap finite impulse response (FIR) filter for spatial scalability.

• Inter-layer residual prediction The residual signal of

inter-coded macroblocks in the reference layer is up-

sampled (for spatial scalability) or directly transmitted

to the corresponding macroblock of the EL.

It is evident that the prediction signals for macroblocks in the spatial and quality ELs depend solely on the corresponding macroblocks in the reference layers and are independent of their neighboring macroblocks. As soon as the reference layer frame finishes prediction coding, the residual signals of every EL macroblock can be independently calculated and compressed by entropy coding.

3.3 CAVLC entropy coding

CAVLC uses seven fixed variable-length coding tables to encode the residual, zig-zag ordered 4 × 4 blocks of transform coefficients [24]. CAVLC is designed to take advantage of several characteristics of the quantized 4 × 4 blocks. CAVLC encoding of a block of transform coefficients proceeds as follows:

1. Encode total number of non-zero coefficients

(TotalCoeff) and number of trailing ±1 values

(T1).

2. Encode sign of each T1, starting from the highest-

frequency T1.

3. Encode levels (sign and magnitude) of each remaining non-zero coefficient in the block.

4. Encode total number of zeros (TotalZeros) before the

last coefficient.

5. Encode total number of zeros preceding each non-zero

coefficient.
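The quantities named in the steps above can be computed directly from one zig-zag ordered block. The sketch below is illustrative only: `cavlc_counts` is a hypothetical helper name, no variable-length codes are produced, and the cap of three trailing ±1 values follows the H.264 standard rather than the text above.

```python
def cavlc_counts(zigzag):
    """Return (TotalCoeff, T1, TotalZeros) for one zig-zag ordered 4x4 block."""
    nonzero_positions = [i for i, c in enumerate(zigzag) if c != 0]
    total_coeff = len(nonzero_positions)
    if total_coeff == 0:
        return 0, 0, 0
    # Trailing ones: consecutive +/-1 values counted from the
    # highest-frequency non-zero coefficient, at most three of them.
    t1 = 0
    for i in reversed(nonzero_positions):
        if abs(zigzag[i]) == 1 and t1 < 3:
            t1 += 1
        else:
            break
    # Zeros located before the last non-zero coefficient.
    total_zeros = nonzero_positions[-1] + 1 - total_coeff
    return total_coeff, t1, total_zeros


block = [0, 3, 0, 1, -1, -1, 0, 1] + [0] * 8
print(cavlc_counts(block))  # (5, 3, 3)
```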

The performance of BBES is closely related to step 1, as explained below. CAVLC encodes both the total number of non-zero coefficients (TotalCoeff) and the number of trailing ±1 values (T1). There are four choices of look-up table (Num_VLC0, Num_VLC1, Num_VLC2 and Num_FLC) for encoding TotalCoeff and T1 (coeff_token), and the choice depends on a parameter Num, which is calculated from the numbers of non-zero coefficients in the upper and left-hand subblocks, denoted N_U and N_L, respectively, as shown in Eq. (1). Equation (2) below describes how the four tables are selected.

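\[
\mathrm{Num} =
\begin{cases}
N_U, & N_U \neq 0 \text{ and } N_L = 0 \\
N_L, & N_U = 0 \text{ and } N_L \neq 0 \\
(N_U + N_L)/2, & N_U \neq 0 \text{ and } N_L \neq 0 \\
0, & N_U = 0 \text{ and } N_L = 0
\end{cases}
\tag{1}
\]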

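\[
\mathrm{Table} =
\begin{cases}
\text{Num\_VLC0}, & 0 \le \mathrm{Num} < 2 \\
\text{Num\_VLC1}, & 2 \le \mathrm{Num} < 4 \\
\text{Num\_VLC2}, & 4 \le \mathrm{Num} < 8 \\
\text{Num\_FLC}, & 8 \le \mathrm{Num}
\end{cases}
\tag{2}
\]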
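As a small worked illustration of Eqs. (1) and (2), the table selection can be written as two functions. The function names are hypothetical, and the division in Eq. (1) is taken literally as a real-valued average, which is an assumption about the intended rounding.

```python
def num_from_neighbours(n_u, n_l):
    """Eq. (1): parameter Num from the non-zero counts of the upper and left subblocks."""
    if n_u != 0 and n_l != 0:
        return (n_u + n_l) / 2
    # At most one of the two counts is non-zero here; if both are zero, Num is 0.
    return n_u if n_u != 0 else n_l


def select_table(num):
    """Eq. (2): look-up table used to encode coeff_token."""
    if num < 2:
        return "Num_VLC0"
    if num < 4:
        return "Num_VLC1"
    if num < 8:
        return "Num_VLC2"
    return "Num_FLC"


print(select_table(num_from_neighbours(4, 6)))  # Num_VLC2
```

Because the table depends on the neighbouring subblocks, moving a subblock next to neighbours with very different non-zero counts can select a less suitable table, which is the source of the compression overhead discussed for block permutation.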


4 Proposed encryption scheme

BBES seamlessly integrates encryption operations into the

SVC EL macroblock coding process using secret block

permutations. This section first introduces a method for generating secure pseudo-random block permutations and then describes the encryption algorithms. As a result of these permutations, the rendered video stream appears as random noise to any entity without knowledge of the secret permutations.

4.1 Pseudo-random block permutation

Let $E_k(\cdot)$ be the encryption algorithm with key $k$, which is shared between the sender and the receiver. Denote by $B = \{B_0, \ldots, B_{n-1}\}$ the $n$ macroblocks (or subblocks) used in a coding process. To permute $B$ using $k$, the sender determines a permutation function $p_n : [0, n-1] \to [0, n-1]$ as follows.

Step 1. Compute $C = \{c_0, c_1, \ldots, c_{n-1}\}$, where $c_i = E_k(i, R)$ for $0 \le i \le n-1$, and $R$ is a string related to scalability information [priority id (PRID), dependency id (DID), quality id (QID) and temporal id (TID), block number, and frame number], which can be directly obtained from the SVC NAL header.

Step 2. Sort the $n$ ciphertexts in ascending order, such that $c_{i_0} < c_{i_1} < \cdots < c_{i_{n-1}}$, where $0 \le i_j < n$.

Step 3. Define $p_n(j) = i_j$ for $0 \le j \le n-1$. In other words, $c_j$ is replaced with $c_{i_j}$.

The function $p_n$ is a secure pseudo-random permutation if a secure encryption algorithm (e.g., AES) is used.
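The sort-based construction above can be sketched as follows. To keep the example self-contained with the Python standard library, HMAC-SHA256 is used as a stand-in for the keyed function $E_k$; this substitution, the function names, and the 4-byte index encoding are assumptions of the sketch, not part of the scheme as described.

```python
import hashlib
import hmac


def keyed_permutation(key, n, r):
    """Return [i_0, ..., i_{n-1}], the indices 0..n-1 sorted by their keyed tags.

    The tag of index i plays the role of c_i = E_k(i, R); HMAC-SHA256 replaces
    the block cipher here purely for illustration.
    """
    def tag(i):
        message = i.to_bytes(4, "big") + r
        return hmac.new(key, message, hashlib.sha256).digest()

    # Ties are broken by the index itself so the result is always a permutation.
    return sorted(range(n), key=lambda i: (tag(i), i))


def invert(perm):
    """Return the inverse permutation, so that inverse[perm[j]] == j."""
    inverse = [0] * len(perm)
    for j, i in enumerate(perm):
        inverse[i] = j
    return inverse


p_16 = keyed_permutation(b"0123456789abcdef", 16, b"DID=1,QID=0,TID=0,frame=7")
p_16_inv = invert(p_16)
```

Since the receiver holds the same key and can rebuild the same string R from the NAL header, it can recompute the permutation and its inverse without any side information.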

4.2 Encryption

An H.264/SVC frame is encoded based on 16 × 16 macroblocks in a predefined order. Let n denote the number of macroblocks in an SVC frame. Each macroblock is composed of sixteen 4 × 4 subblocks. The BBES encryption operations

are seamlessly integrated with the sender’s SVC frame

encoding process for the spatial/quality ELs and are based on

block permutations at two levels: the macroblock level and

the subblock level. BBES supports three different encryption

modes, which allows the sender to balance security and

performance. The basic mode of BBES is called Intra-MB

mode whereby subblocks are shuffled within their macro-

blocks. The extended mode of BBES, with stronger security

but higher cost, is called Group-MB mode whereby sub-

blocks from different macroblocks are permuted. Different

from Intra-MB and Group-MB modes, a third mode called

4Group-MB mode shuffles subblocks from different mac-

roblocks and takes the entropy coding characteristics into

consideration in order to reduce compression overhead.

4.2.1 Encryption in Intra-MB and Group-MB modes

To encrypt and encode a frame $F$ with $n$ macroblocks, i.e., $F = \langle B_0, \ldots, B_{n-1} \rangle$, a content sender chooses a secret key $k$ and a string $R$ from the NAL header and slice header, and carries out the following steps.

Step 1. Compute a pseudo-random macroblock permutation $p_n$, as described in Sect. 4.1.

Step 2. If in Intra-MB mode, compute a subblock permutation $p_{16}$; otherwise choose a parameter $m$, where $m \le n$, compute a subblock permutation $p_{16m}$, and segment every $m$ consecutive macroblocks into one group.

Step 3. Execute the standard SVC frame coding algorithm, except that

– If a macroblock $B_i$ is needed by SVC coding, replace it with macroblock $B_{p_n(i)}$.

– If a subblock $b_j$ is needed by SVC coding, either (when in Intra-MB mode) replace it with $b_{p_{16}(j)}$, where $b_{p_{16}(j)}$ is in the same macroblock as $b_j$; or (when in Group-MB mode) replace it with $b_{p_{16m}(j)}$.

Conceptually, the sender first permutes $F$ into $F' = \langle B_{p_n(0)}, \ldots, B_{p_n(n-1)} \rangle$. For each macroblock $B_i$ with 16 subblocks, i.e., $B_i = \langle b_0, \ldots, b_{15} \rangle$, the sender converts $B_i$ into $B'_i = \langle b_{p_{16}(0)}, \ldots, b_{p_{16}(15)} \rangle$ in the case of Intra-MB mode. Alternatively, in Group-MB mode the sender assembles $m$ consecutive macroblocks into one group and randomly permutes all $16 \times m$ subblocks within the group. After the permuted selection at both the macroblock and subblock levels, the sender runs the standard SVC algorithm on the resulting frame. Note that, as shown in the steps above, the sender does not perform the permutation as a separate pass prior to encoding; rather, the permutation is realized in the block selection of the SVC algorithm. Figure 2a depicts

the flow chart of the frame encryption procedure using

BBES in Intra-MB mode and Fig. 3a shows the flow chart

of the frame encryption procedure using BBES in Group-

MB mode.
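The two permutation levels of Intra-MB and Group-MB modes can be sketched on a toy frame whose subblocks are labelled by their original position. All names below are hypothetical, the permutations are drawn from a seeded shuffle as a non-secure stand-in for the keyed permutation of Sect. 4.1, the number of macroblocks is assumed to be a multiple of m, and the permutation is applied to a list rather than inside the encoder's block selection.

```python
import random


def random_permutation(n, seed):
    """Stand-in for the keyed permutation of Sect. 4.1 (NOT cryptographically secure)."""
    perm = list(range(n))
    random.Random(seed).shuffle(perm)
    return perm


def permute(items, perm):
    """Position j of the result receives items[perm[j]]."""
    return [items[perm[j]] for j in range(len(perm))]


def encrypt_intra_mb(frame, p_n, p_16):
    """Shuffle macroblocks with p_n, then subblocks inside each macroblock with p_16."""
    return [permute(mb, p_16) for mb in permute(frame, p_n)]


def encrypt_group_mb(frame, p_n, p_16m, m):
    """Shuffle macroblocks with p_n, then all 16*m subblocks of each group of m macroblocks."""
    shuffled = permute(frame, p_n)
    out = []
    for g in range(0, len(shuffled), m):
        group = [sb for mb in shuffled[g:g + m] for sb in mb]
        mixed = permute(group, p_16m)
        out.extend(mixed[k:k + 16] for k in range(0, len(mixed), 16))
    return out


# Toy frame: 8 macroblocks, each with 16 subblocks labelled (macroblock, subblock).
frame = [[(i, j) for j in range(16)] for i in range(8)]
p_n = random_permutation(8, seed=1)
p_16 = random_permutation(16, seed=2)
p_64 = random_permutation(64, seed=3)
intra = encrypt_intra_mb(frame, p_n, p_16)
grouped = encrypt_group_mb(frame, p_n, p_64, m=4)
```

In the Intra-MB output every macroblock still holds subblocks from a single source macroblock, whereas in the Group-MB output a macroblock may hold subblocks from any of the m macroblocks in its group.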

4.2.2 Encryption in 4Group-MB mode

4Group-MB mode is similar to Group-MB mode in that it reconstructs the subblocks of a macroblock from different macroblocks. Essentially, 4Group-MB organizes all the 4 × 4 subblocks of an EL frame into four groups based on TotalCoeff, then permutes the subblocks within each group. Encryption in this mode proceeds as follows:

Step 1. Compute a macroblock permutation pn, as

described in Sect. 4.1.



Step 2. Based on the TotalCoeff of the subblocks, organize them into four groups using Eq. (3):

\[
p =
\begin{cases}
1, & \mathrm{TotalCoeff} \in [0, 2) \\
2, & \mathrm{TotalCoeff} \in [2, 4) \\
3, & \mathrm{TotalCoeff} \in [4, 8) \\
4, & \mathrm{TotalCoeff} \in [8, 16]
\end{cases}
\tag{3}
\]

Step 3. Let $m_p$ be the number of subblocks in Group $p$, so that $n = (\sum_{p=1}^{4} m_p)/16$. Compute a subblock permutation $p_{m_p}$ for each Group $p$, treating the $m_p$ subblocks of that group as one permutation group.

Step 4. Execute the standard CAVLC entropy coding with the block replacement described in Step 3 of Sect. 4.2.1.

The encryption flow chart of 4Group-MB mode is

illustrated in Fig. 4a. Similar to Intra-MB and Group-MB

modes, it also depends on macroblock and subblock permutations using a secret key. However, its subblock permutation takes the entropy coding characteristics into account, since subblocks are only exchanged within the same TotalCoeff group. As a result, 4Group-MB mode has a negligible compression overhead, as we will demonstrate later. Meanwhile, the grouping operation does not require extra computational resources, because the standard coder also calculates TotalCoeff for entropy coding.
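The grouping of Eq. (3) and the in-group permutation can be sketched as follows. Only the subblock level is shown (the macroblock permutation is omitted), the function names are hypothetical, and a seeded shuffle again stands in for the keyed permutation.

```python
import random


def group_index(total_coeff):
    """Group number p of Eq. (3) for a subblock with the given TotalCoeff."""
    if total_coeff < 2:
        return 1
    if total_coeff < 4:
        return 2
    if total_coeff < 8:
        return 3
    return 4


def total_coeff(subblock):
    """Number of non-zero coefficients in a 16-coefficient subblock."""
    return sum(1 for c in subblock if c != 0)


def permute_within_groups(subblocks, seed):
    """Exchange subblocks only among positions that belong to the same group."""
    positions = {1: [], 2: [], 3: [], 4: []}
    for pos, sb in enumerate(subblocks):
        positions[group_index(total_coeff(sb))].append(pos)
    out = list(subblocks)
    rng = random.Random(seed)  # non-secure stand-in for the keyed permutation
    for pos_list in positions.values():
        perm = list(range(len(pos_list)))
        rng.shuffle(perm)
        for q, src in enumerate(perm):
            out[pos_list[q]] = subblocks[pos_list[src]]
    return out


# 17 toy subblocks with 0, 1, ..., 16 non-zero coefficients.
blocks = [tuple([1] * k + [0] * (16 - k)) for k in range(17)]
mixed = permute_within_groups(blocks, seed=7)
```

Every position keeps a subblock of the same group after the permutation, so a receiver that reads TotalCoeff during entropy decoding can rebuild the same grouping before inverting the permutation.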

4.3 Decryption

For every received EL frame $F''$ in a protected bitstream,

the receiver reconstructs the string R from the frame in the

same way as the sender, and carries out the following steps

to decrypt the frame.

Fig. 2 Encryption and decryption processes in Intra-MB mode: (a) encryption, (b) decryption

4.3.1 Decryption in Intra-MB and Group-MB modes

Step 1. Compute a macroblock permutation $p_n$ using key $k$ and the string $R$, as described in Sect. 4.1. Compute the inverse permutation $p_n^{-1}$, such that $p_n^{-1}(p_n(i)) = i$.

Step 2. If in Intra-MB mode, compute a subblock permutation $p_{16}^{-1}$, which is the inverse of $p_{16}$; otherwise compute a subblock permutation $p_{16m}^{-1}$, which is the inverse of $p_{16m}$.

Step 3. Execute the standard SVC frame decoding algorithm, except that

– If a macroblock $B_i$ is needed by SVC decoding, replace it with macroblock $B_{p_n^{-1}(i)}$.

– If a subblock $b_j$ is needed by SVC decoding, replace it with $b_{p_{16}^{-1}(j)}$ (in Intra-MB mode) or $b_{p_{16m}^{-1}(j)}$ (in Group-MB mode).

The decryption steps are the inverse of the encryption

steps. When decoding a frame, the receiver locates the right

block by inverting the permutation. Figure 2b depicts the

decryption operations in Intra-MB mode while Fig. 3b

illustrates the decryption process in Group-MB mode.

Similar to encryption, the decryption steps are embedded in

the SVC frame decoding procedure.
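The inversion used in the decryption steps can be illustrated with a fixed toy permutation. The helper names are hypothetical and the permutation is chosen by hand; in the scheme it would be recomputed from the key and the string R.

```python
def permute(items, perm):
    """Position j of the result receives items[perm[j]]."""
    return [items[perm[j]] for j in range(len(perm))]


def invert(perm):
    """Return the inverse permutation, so that inverse[perm[j]] == j."""
    inverse = [0] * len(perm)
    for j, i in enumerate(perm):
        inverse[i] = j
    return inverse


perm = [2, 0, 3, 1]
scrambled = permute(["A", "B", "C", "D"], perm)
restored = permute(scrambled, invert(perm))
print(scrambled, restored)  # ['C', 'A', 'D', 'B'] ['A', 'B', 'C', 'D']
```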

4.3.2 Decryption in 4Group-MB mode

Step 1. Compute a macroblock permutation $p_n$ using key $k$ and the string $R$, as described in Sect. 4.1. Compute the inverse permutation $p_n^{-1}$, such that $p_n^{-1}(p_n(i)) = i$.

Step 2. Let $i$ be the MB address, $j \in [0, 15]$ the subblock address in the MB, $p \in [1, 4]$ the index of the group, and $q \in [1, m_p]$ the index inside Group $p$.

Fig. 3 Encryption and decryption processes in Group-MB mode: (a) encryption, (b) decryption

Step 3. Compute a subblock permutation $p_{m_p}^{-1}$, which is the inverse of $p_{m_p}$.

Step 4. Execute the standard subblock decoding process, except that

– If the subblock being decoded belongs to $B_i$, replace that macroblock with macroblock $B_{p_n^{-1}(i)}$.

– If the subblock being decoded belongs to Group $p$ and its index is $q$, replace it with $b_{p_{m_p}^{-1}(q)}$.

Step 2 above requires the TotalCoeff of each subblock in order to determine its group. Calculating TotalCoeff from scratch would be time-consuming; however, it can be easily obtained from look-up tables during entropy decoding, which does not introduce additional computation cost.

Figure 4b illustrates the decryption operations in 4Group-MB mode, which are almost the same as the normal decoding process.

5 Evaluation

We have implemented all the three modes of BBES using

JSVM 9.19 [12]. In our experiments, the group of pictures

(GOP) size and the intra-period are set to 8 and 16, respectively, and CAVLC is selected for entropy coding. To test the performance of the proposed scheme, we choose ten standard H.264/SVC benchmark video sequences (see footnote 2): Bus (150 frames), Foreman (300 frames), Football (260 frames), Soccer (300 frames), Bridge-far (2,100 frames), Bridge-close (2,000 frames), Highway (2,000 frames), Silent (300 frames), Mobile (300 frames) and Hall (300 frames).

Fig. 4 Encryption and decryption processes in 4Group-MB mode: (a) encryption, (b) decryption

2 Available at http://media.xiph.org/video/derf/.


5.1 Security

There are two levels of block permutations in BBES. The

macroblock-level permutation first shuffles structures or

profiles of images, then the subblock-level permutation

further shuffles the textures and details of images. The

security of the scrambling process can be analyzed as

follows.

The pseudo-random block permutations in BBES are

generated using a block cipher such as AES. It is well

known that block ciphers can be regarded as secure

pseudo-random permutations and it is computationally

infeasible to distinguish the output of a block cipher from

that of a truly random permutation [25]. This implies that

without the knowledge of the secret key for the block

cipher, an attacker has to resort to brute force attack to

reverse the pseudo-random permutation used in BBES. Let $n$ be the number of macroblocks in a frame and let $k$ be the key size of the block cipher. Then the number of pseudo-random block permutations at the macroblock level is given by $\min\{n!, 2^k\}$. Assuming a CIF frame comprising $n = 396$ macroblocks and AES with key size $k = 128$ bits, the number of pseudo-random block permutations at the macroblock level is $\min\{396!, 2^{128}\} = 2^{128}$. Moreover, the subblock permutation further enhances security. In Intra-MB mode, the number of subblock permutations is $16! \approx 2^{44}$; in Group-MB mode, the number of subblock permutations is $(16m)! = 64! \approx 2^{296}$ when $m = 4$; and in 4Group-MB mode, the number of subblock permutations is $(m_1! + m_2! + m_3! + m_4!)$, where $m_p \ge 64$ is the number of subblocks in Group $p = 1, 2, 3, 4$. In addition to the

above cryptographic-based attack, an attacker might try to

search for the best estimate directly in the transformed

domain by exploiting contextual or structural information

of the quantized coefficients such as edge continuity.

However, such attacks are very unlikely to succeed due to the diversity of high-level object structures and the uncorrelated nature of the quantized coefficients.
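The permutation counts quoted above can be checked numerically by working with base-2 logarithms of factorials. The helper name below is hypothetical; `math.lgamma(n + 1)` is used as the natural logarithm of n!.

```python
import math


def log2_factorial(n):
    """Base-2 logarithm of n!, computed via the log-gamma function."""
    return math.lgamma(n + 1) / math.log(2)


print(round(log2_factorial(16)))  # 44
print(round(log2_factorial(64)))  # 296

# Macroblock level for a CIF frame: min{396!, 2^128}, expressed in bits.
macroblock_bits = min(log2_factorial(396), 128)
```

Since 396! is far larger than 2^128, the minimum is determined by the key size, which matches the figure given for the macroblock level.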

Finally, we analyze semantic attacks. BBES encrypts ELs by macroblock and subblock permutations while leaving the BL in plaintext in order to achieve transparent encryption. Since the BL supplies the basic SVC content, it might seem that ELs can be easily attacked based on semantic features, i.e., by using the BL's features to reorder the shuffled 4 × 4 blocks of the ELs. However, recovering the order of macroblocks and blocks is very hard without the secret key. On the one hand, an EL image contains limited residual information, as shown in Fig. 5a, so that its semantic features are obscure. Figure 6 illustrates the histogram of Fig. 5a, which indicates that most pixel values are concentrated around 120–130. Hence, ELs contain limited features; e.g., our scale-invariant feature transform (SIFT) experiments show that the BL contains 171 feature points while its EL image (Fig. 5a) has only 17 feature points. Similar experimental results are obtained for the other nine SVC sequences. On the other hand, since BBES reconstructs a new EL image by shuffling 4 × 4 blocks, the image could exhibit several new features (e.g., edges or gradients) due to the new relationships between neighboring 4 × 4 blocks. For instance, our SIFT experiment on the encrypted EL image (Fig. 5c) indicates that it has 27 feature points, more than the 17 of Fig. 5a. Hence, directly applying feature extraction methods to this image might lead to mistakes. In addition, in order to remove the effect of shuffled neighboring blocks, an attacker might extract features from individual 4 × 4 blocks without considering neighboring blocks. However, experiments, e.g., with SOBEL edge detection or gradient orientation histograms (GOH), indicate that 4 × 4 blocks are such small regions that they contain limited useful information. Therefore, it is not easy for semantic attacks to recover the order of macroblocks and blocks without the secret key.

We conducted experiments to demonstrate the visual

effects of BBES. Firstly, without the reference BL (i.e., by

setting the BL as a blank frame in the experiments), Fig. 5a

shows the image of plaintext ELs for the Foreman

sequence, while Fig. 5b and c, which appear like white

noise, illustrate the scrambled EL images when encrypted

in Intra-MB mode and 4Group-MB mode, respectively. To

compare BBES with sign encryption schemes [13–17] in

which only signs of coefficients are encrypted (since ELs do not have prediction modes or motion vectors), we applied sign encryption to spatial ELs; the result for the

Fig. 5 Original EL image and encrypted EL images: (a) plaintext EL, (b) Intra-MB mode, (c) 4Group-MB mode, (d) sign encryption

Foreman sequence is shown in Fig. 5d which clearly

exposes visual content.

Secondly, we demonstrate the visual effects of BBES

when used in transparent encryption. Figure 7a illustrates

the original image when both BL and ELs are in plaintext,

Fig. 7b depicts the image when BL is in plaintext and ELs

are encrypted in Intra-MB mode and Fig. 7c shows the

image when BL is in plaintext and ELs are encrypted in

4Group-MB mode. Finally, Fig. 7d shows the image with

BL in plaintext while ELs are encrypted using sign

encryption. Subjectively, Fig. 7b and c are of lower image

quality than Fig. 7d. As objective evaluations, Fig. 8a

illustrates the scores of local edge gradients (LEG) [26],

structural similarity index measure (SSIM) [27], and edge

similarity scores (ESS) [28] (see footnote 3); Fig. 8b illustrates the scores

of YAO09A [29], local feature-based visual security metric

(LFBVS) [30] and natural image contour evaluation

(NICE) [31] (see footnote 4). These measures indicate that videos reconstructed from BBES-encrypted streams are noisier than

those reconstructed from sign encryption-encrypted

streams for unauthorized users who do not have the

decryption key.

5.2 Property

5.2.1 Adaptation transparency

When an encrypted SVC bitstream is delivered over a

content distribution network, MANEs may distribute it to

various clients. If an encryption scheme does not achieve

adaptation transparency, MANEs have to first decrypt an

encrypted SVC bitstream, then scale and re-encrypt it in

order to perform adaptation. Note that decryption and

encryption operations significantly increase the computa-

tional cost of MANEs. Furthermore, it may severely

compromise system security since encryption keys must be

made available to MANEs, which may not be trusted by

content sources. BBES is adaptation-transparent since its

operations are seamlessly integrated into the standard SVC

compression coding process. As every encrypted layer

satisfies the H.264/SVC format structure, BBES does not

affect NAL headers which contain scalability information

(e.g., PRID, DID, QID and TID). Therefore, encrypted ELs

preserve the adaptation-transparency property of H.264/

SVC.
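The adaptation step described above can be illustrated with a minimal sketch. It assumes the NAL header fields have already been parsed into a hypothetical `NalUnit` record (the class, the `adapt` function and the sample payloads are illustrative names, not part of BBES or of any SVC library), and it simplifies real sub-stream extraction to a threshold on DID, QID and TID. The point is only that the filter reads header fields and never touches the (possibly encrypted) payload, so no key is needed at the MANE.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NalUnit:
    """Hypothetical, already-parsed view of one SVC NAL unit."""
    did: int        # dependency (spatial) layer identifier from the NAL header
    qid: int        # quality layer identifier from the NAL header
    tid: int        # temporal layer identifier from the NAL header
    payload: bytes  # slice data; may be BBES-encrypted, never inspected here


def adapt(stream, max_did, max_qid, max_tid):
    """Keep only the NAL units at or below the requested operating point."""
    return [
        nal for nal in stream
        if nal.did <= max_did and nal.qid <= max_qid and nal.tid <= max_tid
    ]


# Illustrative stream: a cleartext base layer plus two encrypted ELs.
stream = [
    NalUnit(0, 0, 0, b"base layer, cleartext"),
    NalUnit(1, 0, 0, b"\x8f\x02 encrypted spatial EL"),
    NalUnit(1, 1, 0, b"\x41\xd7 encrypted quality EL"),
]

# Drop the quality EL without decrypting anything.
adapted = adapt(stream, max_did=1, max_qid=0, max_tid=0)
print([(n.did, n.qid, n.tid) for n in adapted])
```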

5.2.2 Transcoding transparency

As elaborated in Sect. 1, an encryption scheme for H.264/

SVC should preserve the transcoding-transparency prop-

erty of SVC so that encrypted bitstreams can be transcoded

without decryption and re-encryption. In the literature, bitstream encryption schemes [19–23] encrypt compressed streams in a straightforward way and hence do not preserve transcoding transparency, while selective encryption schemes [13–17] encrypt only the signs of non-zero coefficients. If such an encrypted SVC bitstream is transcoded without decryption, the original non-zero coefficients of subblocks are changed by re-quantization. Hence, a receiver will not be able to decrypt the SVC bitstream correctly.

Fig. 6 Histogram of an enhancement layer image

Fig. 7 Scrambled images in various transparent encryption schemes (panels a–d)

³ If the compared images are the same, the scores are 1.

⁴ If the compared images are the same, the scores are 0.

Both Intra-MB and Group-MB modes have the transcoding-transparency property. They use 4 × 4 subblocks as the permutation unit inside one macroblock or a group of macroblocks, as shown in Sect. 4.2.1, and the permutation is independent of the content (i.e., quantized coefficients) of the 4 × 4 subblocks. Transcoding also operates on 4 × 4 subblock units: it changes the internal content of subblocks but has no effect on the order of subblocks or macroblocks. Transcoding utilizes partial decoding and partial recoding operations, e.g., quantization/inverse quantization and entropy coding/decoding (CAVLC) [24]. Partial decoding first performs entropy decoding on the SVC EL bitstream to recover the quantized coefficients, then applies inverse quantization to restore the DCT coefficients. Partial recoding re-quantizes those DCT coefficients with a different QP and entropy-encodes the new quantized coefficients in order to degrade the quality of the EL. Neither partial decoding nor partial recoding damages the shuffled structure of macroblocks and subblocks, so the permutation order of EL subblocks/macroblocks is preserved. Therefore, the transcoded SVC bitstream can still be decrypted correctly by end users. However, 4Group-MB mode does not have transcoding transparency: it depends on internal content (i.e., the TotalCoeff of each subblock), which can be changed by transcoding, so the grouping of subblocks is destroyed.
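The argument above, that a content-independent block permutation commutes with per-block re-quantization while TotalCoeff does not survive it, can be checked with a small sketch. Everything here is an illustrative assumption: the four-coefficient "subblocks", the fixed permutation `perm`, and the uniform scalar quantizer in `requantize`, which stands in for the actual H.264 quantizer.

```python
def requantize(block, old_step, new_step):
    """Dequantize with old_step, then requantize with new_step (truncating toward zero)."""
    return [int(c * old_step / new_step) for c in block]


def transcode(blocks, old_step, new_step):
    """Requantize every subblock; the order of the subblocks is left untouched."""
    return [requantize(b, old_step, new_step) for b in blocks]


def permute(blocks, perm):
    """Encrypt: position i of the output holds the block at position perm[i]."""
    return [blocks[p] for p in perm]


def unpermute(blocks, perm):
    """Decrypt: move each block back to the position it came from."""
    out = [None] * len(perm)
    for dst, src in enumerate(perm):
        out[src] = blocks[dst]
    return out


def total_coeff(block):
    """Number of non-zero coefficients in a subblock."""
    return sum(1 for c in block if c != 0)


perm = [2, 0, 3, 1]  # hypothetical secret permutation of four subblocks
blocks = [[9, -3, 1, 0], [4, 2, 0, 0], [-7, 1, 1, 0], [1, 0, 0, 0]]

# Encrypt, transcode the ciphertext, then decrypt.
recovered = unpermute(transcode(permute(blocks, perm), 4, 10), perm)

# Same result as transcoding the plaintext directly.
print(recovered == transcode(blocks, 4, 10))

# TotalCoeff before and after transcoding: the values a 4Group-MB style
# grouping would rely on are no longer the same.
before = [total_coeff(b) for b in blocks]
after = [total_coeff(b) for b in transcode(blocks, 4, 10)]
print(before, after)
```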

5.2.3 Format compliance

Except for the coding/decoding order of subblocks and macroblocks, BBES encryption and decryption operations are similar to the normal H.264/SVC encoding and decoding operations, respectively, for example, DCT/IDCT, quantization/inverse quantization and entropy coding/decoding. Hence, a BBES-encrypted SVC bitstream complies with the syntax requirements of H.264/SVC. Although a ciphertext frame is incomprehensible without being decrypted, a standard H.264/SVC decoder is able to decode an encrypted bitstream without crashing. This format-compliance property differentiates BBES from the bitstream encryption schemes in [19–23].

5.2.4 Computational cost

The computational cost incurred by BBES is small because BBES uses AES to generate pseudo-random block permutations and its operations are integrated into the standard H.264/SVC coding operations. Note that AES encryption is very fast; e.g., it can encrypt 109 MiB/s on a PC with an Intel Core 2 1.83 GHz processor (http://www.cryptopp.com/benchmarks.html).
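As a rough illustration of how a keyed pseudo-random block permutation can be derived cheaply, the sketch below drives a Fisher-Yates shuffle from a keyed stream. To stay within the Python standard library it uses HMAC-SHA256 in counter mode as a stand-in for the AES-based generator described above; the function names, the `nonce` parameter and the choice of 16 subblocks per macroblock are assumptions made for the example, not details taken from BBES.

```python
import hashlib
import hmac


def keystream_words(key, nonce):
    """Yield 32-bit integers from HMAC-SHA256 in counter mode (stand-in for AES)."""
    counter = 0
    while True:
        msg = nonce + counter.to_bytes(8, "big")
        block = hmac.new(key, msg, hashlib.sha256).digest()
        for i in range(0, len(block), 4):
            yield int.from_bytes(block[i:i + 4], "big")
        counter += 1


def keyed_permutation(n, key, nonce):
    """Fisher-Yates shuffle of range(n) driven by the keyed stream."""
    words = keystream_words(key, nonce)
    perm = list(range(n))
    for i in range(n - 1, 0, -1):
        bound = i + 1
        # Reject words that would make the modulo reduction non-uniform.
        limit = (1 << 32) - ((1 << 32) % bound)
        w = next(words)
        while w >= limit:
            w = next(words)
        j = w % bound
        perm[i], perm[j] = perm[j], perm[i]
    return perm


def permute(items, perm):
    """Position i of the output holds items[perm[i]]."""
    return [items[p] for p in perm]


def unpermute(items, perm):
    """Inverse of permute for the same perm."""
    out = [None] * len(perm)
    for dst, src in enumerate(perm):
        out[src] = items[dst]
    return out


# Shuffle the 16 subblock indices of one macroblock (illustrative key and nonce).
perm = keyed_permutation(16, key=b"\x00" * 16, nonce=b"frame-0/mb-0")
subblocks = list("abcdefghijklmnop")
print(unpermute(permute(subblocks, perm), perm) == subblocks)
```

Only a handful of keystream blocks are consumed per macroblock, which is consistent with the claim that the permutation generation adds little cost on top of normal coding.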

5.3 Compression overhead

Nonetheless, BBES introduces some compression overhead. With the QP of the BL set to 34, Figs. 9, 10 and 11 show the compression overhead in Intra-MB, Group-MB and 4Group-MB modes, respectively, for quality and spatial scalability. These figures indicate that the compression overhead in Group-MB mode is the highest of the three because this mode has the greatest effect on the statistical properties of a frame. The compression overhead in 4Group-MB mode is the lowest, at no more than 1%. Note that some increase in compression overhead is almost unavoidable, because compression benefits from information redundancy while encryption reduces redundancy through randomization. Letting ΔQP denote the QP difference between the reference layer and an EL, we analyze the compression overhead of BBES as follows.

5.3.1 Compression overhead in Intra-MB mode

BBES permutes subblocks in a random manner, but the value and position of each subblock’s coefficients do not change. Based on the descriptions in Sects. 3.3 and 4, the only coding difference comes from coeff_token, while other coding parts related to the value and position of a subblock’s coefficients remain intact. The coeff_token encodes TotalCoeff and T1 by selecting a look-up table which depends on NU and NL. BBES alters the top and left-hand side subblocks of the current subblock, and consequently changes NU and NL. This leads to a different value of the parameter Num, and hence coeff_token may choose a different look-up table, resulting in possible compression overhead.

Fig. 8 Scores of objective quality assessment: (a) scores of ESS, LEG, SSIM and LSS; (b) scores of YAO09A, LFBVS and NICE

In the following, we analyze the relationships among QP, ΔQP and compression overhead. Based on Eq. (2), when the number of non-zero coefficients in a subblock is >8, the table Num_FLC, which uses fixed-length coding, is selected. For spatial scalability, as ΔQP decreases (in the x-axis direction of Fig. 9), the number of non-zero coefficients becomes <8 in more and more subblocks (although many subblocks still contain non-zero coefficients). As such, tables Num_VLC0, Num_VLC2 and Num_VLC3 are selected. As a result, subblock permutation introduces more compression overhead for spatial scalability, as shown in Fig. 9. For quality scalability, when ΔQP is large (e.g., QP ∈ [6, 20)), the compression overhead follows the same trend as that of spatial scalability (i.e., it increases as ΔQP decreases). However, when QP > 20 (i.e., ΔQP becomes small), the compression overhead decreases because more and more subblocks have no non-zero coefficients, so that subblock permutation has little effect on compression efficiency. A similar characteristic can be observed in Group-MB mode, as shown in Fig. 10.

Since the overhead in Group-MB mode is similar to that of Intra-MB mode, its analysis is omitted for simplicity.
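The dependence of the coeff_token table on the neighbouring subblocks can be made concrete with a short sketch. Since Eq. (2) is not reproduced in this section, the code uses the context rule of H.264 CAVLC as an assumption: the context value is the rounded average of the upper and left neighbours' TotalCoeff, with table classes for the ranges 0–1, 2–3, 4–7 and 8 or more. The class indices 0–3, the 4 × 4 grid of TotalCoeff values and the treatment of out-of-grid neighbours as unavailable are simplifications for illustration and are not meant to match the paper's Num_VLC/Num_FLC naming one-to-one.

```python
def table_class(n_up, n_left):
    """Return a table class 0..3 from the neighbours' TotalCoeff (None = unavailable)."""
    if n_up is not None and n_left is not None:
        nc = (n_up + n_left + 1) >> 1   # rounded average of both neighbours
    elif n_up is not None:
        nc = n_up
    elif n_left is not None:
        nc = n_left
    else:
        nc = 0
    if nc < 2:
        return 0   # variable-length table for 0 <= nc < 2
    if nc < 4:
        return 1   # variable-length table for 2 <= nc < 4
    if nc < 8:
        return 2   # variable-length table for 4 <= nc < 8
    return 3       # fixed-length table for nc >= 8


def contexts(grid):
    """Table class chosen for every subblock in a grid of TotalCoeff values."""
    out = []
    for r, row in enumerate(grid):
        out_row = []
        for c, _ in enumerate(row):
            up = grid[r - 1][c] if r > 0 else None
            left = grid[r][c - 1] if c > 0 else None
            out_row.append(table_class(up, left))
        out.append(out_row)
    return out


# TotalCoeff of 16 subblocks before permutation: smooth, spatially correlated.
original = [[9, 8, 2, 1],
            [8, 7, 1, 0],
            [2, 1, 0, 0],
            [1, 0, 0, 0]]

# The same 16 values after an illustrative permutation.
shuffled = [[0, 9, 0, 7],
            [1, 0, 8, 1],
            [8, 0, 2, 0],
            [1, 2, 0, 1]]

changed = sum(
    a != b
    for row_a, row_b in zip(contexts(original), contexts(shuffled))
    for a, b in zip(row_a, row_b)
)
print("positions whose table class changed:", changed)
```

After permutation a subblock's neighbours no longer predict its own TotalCoeff well, so the selected table is more often a poor match; this is the source of the overhead discussed above.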

5.3.2 Compression overhead in 4Group-MB mode

The compression overhead in 4Group-MB mode has the same cause as that discussed in Sect. 5.3.1, but it is smaller. This is because 4Group-MB mode organizes subblocks based on their TotalCoeff values in order to lessen the effect on NU and NL when selecting among tables Num_VLC0, Num_VLC2, Num_VLC3 and Num_FLC.
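One plausible reading of this grouping idea is sketched below; it is not the 4Group-MB algorithm of Sect. 4, whose details are not repeated here. The sketch assumes four groups keyed by TotalCoeff ranges (0–1, 2–3, 4–7, 8 or more) and shuffles subblocks only within their own group, using Python's `random.Random` with a string seed as a stand-in for the AES-derived permutation. All names are hypothetical.

```python
import random


def coeff_class(total_coeff):
    """Assumed grouping: four classes by TotalCoeff range."""
    if total_coeff < 2:
        return 0
    if total_coeff < 4:
        return 1
    if total_coeff < 8:
        return 2
    return 3


def group_positions(total_coeffs):
    """Map each class to the list of positions whose subblock falls in it."""
    groups = {}
    for pos, tc in enumerate(total_coeffs):
        groups.setdefault(coeff_class(tc), []).append(pos)
    return groups


def shuffle_within_groups(blocks, total_coeffs, seed, inverse=False):
    """Permute subblocks among positions of the same class (or undo it)."""
    out = list(blocks)
    for cls, positions in sorted(group_positions(total_coeffs).items()):
        rng = random.Random(f"{seed}-{cls}")  # stand-in for an AES-derived stream
        perm = list(range(len(positions)))
        rng.shuffle(perm)
        for dst, src in enumerate(perm):
            if inverse:
                out[positions[src]] = blocks[positions[dst]]
            else:
                out[positions[dst]] = blocks[positions[src]]
    return out


# Each subblock is (label, TotalCoeff); labels only make the movement visible.
blocks = [("s0", 9), ("s1", 0), ("s2", 8), ("s3", 1),
          ("s4", 5), ("s5", 12), ("s6", 4), ("s7", 0)]
tcs = [tc for _, tc in blocks]

enc = shuffle_within_groups(blocks, tcs, seed="demo-key")
enc_tcs = [tc for _, tc in enc]

# The receiver rebuilds the groups from the received TotalCoeff values.
dec = shuffle_within_groups(enc, enc_tcs, seed="demo-key", inverse=True)
print(dec == blocks)
```

Because every position keeps a subblock of the same class, the neighbour statistics that drive table selection change less than under an unrestricted permutation. The same sketch also shows the fragility noted in Sect. 5.2.2: the receiver can only rebuild the groups if the TotalCoeff classes it sees are the ones the sender used, which transcoding does not guarantee.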

Based on the above discussion and experimental results,

one can flexibly choose Intra-MB, Group-MB or 4Group-

MB modes to encrypt SVC streams with different features

to meet various requirements.

5.4 Comparison

Table 1 summarizes the performance of typical SVC encryption schemes, where “Sign encryption” refers to sign encryption schemes [13, 15–17], “Sign and DC encryption” represents sign encryption combined with DC encryption [14], “Format-preserved” refers to the scheme in [21], which preserves the format-compliance property, and “Adaptation-preserved” denotes bitstream encryption schemes [19, 20, 22, 23], which preserve the adaptation-transparency property.

Fig. 9 Compression overhead in Intra-MB mode (x-axis: quantization parameter of enhancement layer; y-axis: compression overhead (%))

Fig. 10 Compression overhead in Group-MB mode (y-axis: compression overhead (%))

Fig. 11 Compression overhead in 4Group-MB mode (y-axis: compression overhead (%))

Compared with encryption-with-compression schemes, BBES incurs compression overhead, whereas encryption-with-compression schemes incur none except for “Sign and DC” encryption, as shown in the last row of Table 1. In exchange, BBES offers additional properties. Firstly, BBES is more secure than encryption-with-compression, because SVC codestreams protected by encryption-with-compression can easily be attacked so as to expose sensitive information; e.g., if adversaries simply set the signs of all DCT coefficients to positive (or all to negative), the semantic content of the SVC codestream becomes visible [18]. Secondly, Intra-MB and Group-MB modes preserve transcoding transparency, as shown in row 3 of Table 1.
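The sign attack mentioned above can be illustrated in a few lines. The sketch assumes a toy sign-encryption scheme in which each coefficient's sign is flipped according to a key-dependent bit (here taken from `random.Random`, a stand-in for a real cipher); the function names and sample coefficients are illustrative. Whatever the key, the attacker who forces all signs positive recovers every coefficient magnitude exactly, which is why much of the image structure remains visible.

```python
import random


def sign_encrypt(coeffs, seed):
    """Flip the sign of each coefficient according to a keyed bit stream."""
    rng = random.Random(seed)  # stand-in for a cipher-derived bit stream
    return [-c if rng.getrandbits(1) else c for c in coeffs]


def all_positive_attack(cipher_coeffs):
    """Keyless attack: set every sign to positive."""
    return [abs(c) for c in cipher_coeffs]


coeffs = [12, -7, 3, 0, -1, 5, 0, -2]   # illustrative quantized coefficients
cipher = sign_encrypt(coeffs, seed=2013)
guess = all_positive_attack(cipher)

# The attacker's guess matches the true magnitudes for any key.
print(guess == [abs(c) for c in coeffs])
```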

Compared with encryption-after-compression schemes, 4Group-MB mode has the lowest compression overhead. In addition, although Intra-MB mode, the simplest scheme, has slightly higher compression overhead and Group-MB mode introduces the highest, they offer additional properties in exchange, e.g., adaptation transparency, format compliance and transcoding transparency. Specifically, row 3 of Table 1 indicates that Intra-MB and Group-MB modes preserve transcoding transparency. Meanwhile, according to rows 4 and 5, BBES simultaneously possesses the format-compliance and adaptation-transparency properties.

6 Conclusion

Based on the macroblock prediction characteristics of SVC

EL coding technique, BBES permutes both macroblocks

and subblocks in Intra-MB, Group-MB and 4Group-MB

modes using secure pseudo-random block permutations.

Our analysis and experimental results show that all three modes of BBES preserve the adaptation-transparency property of H.264/SVC. Furthermore, Intra-MB and Group-MB modes are transcoding-transparent. These properties allow a MANE to directly adapt or transcode encrypted SVC bitstreams for ELs. In addition, BBES is format-compliant and incurs small computational and compression costs. These properties make BBES highly suitable for perceptual/transparent encryption of H.264/SVC bitstreams in applications such as pay TV broadcasting.

Acknowledgments This work was supported in part by A*STAR

SERC Grant No. 102 101 0027 in Singapore.

References

1. Schwarz, H., Marpe, D., Wiegand, T.: Overview of the scalable

video coding extension of the h.264/avc standard. IEEE Trans

Circuits Syst Video Technol 17(9), 1103–1120 (2007)

2. Apostolopoulos, J.G., Wee, S.J.: Secure scalable streaming

enabling transcoding without decryption. In: Proceedings of the

IEEE International Conference on Image Processing,

pp. 437–440 (2001)

3. Gergely, V., Feher, G.: Enhancing progressive encryption for

scalable video streams. In: Conference on Information and

Communications Technologies, pp. 51–58 (2009)

4. Sohn, H., Neve, W.D., Ro, Y.M.: Region-of-interest scrambling

for scalable surveillance video using jpeg xr. ACM Multimedia,

Barcelona, pp. 861–864 (2009)

5. Sohn, H., Neve, W.D., Ro, Y.M.: Privacy protection in video sur-

veillance systems: Analysis of subband-adaptive scrambling in jpeg

xr. IEEE Trans Circuits Syst Video Technol 21(2), 170–177 (2011)

6. Yuan, C., Zhu, B.B., Wang, Y., Li, S., Zhong, Y.: Efficient and

fully scalable encryption for mpeg-4 fgs. IEEE Int Symp Circuits

Syst 2, 620–623 (2003)

7. Yuan, C., Zhu, B.B., Wang, Y., Li, S., Zhong, Y.: Scalable

protection for mpeg-4 fine granularity scalability. IEEE Trans

Multimed 7, 222–233 (2005)

8. Shahid, Z., Chaumont, M., Puech, W.: Selective and scalable

encryption of enhancement layers for dyadic scalable H.264/

AVC by scrambling of scan patterns. In: Proceedings of the

International Conference on Image Processing, pp. 1273–1276

(2009)

9. Stutz, T., Uhl, A.: A survey of h.264 avc/svc encryption. IEEE Trans Circuits Syst Video Technol 22(3), 325–339 (2012)

10. Norcen, R., Uhl, A.: Encryption of wavelet-coded imagery using

random permutations. In: Proceedings of the IEEE International

Conference on Image Processing, pp. 3431–3434 (2004)

11. Zeng, W., Lei, S.: Efficient frequency domain selective scram-

bling of digital video. IEEE Trans Multimed 5(1), 118–129

(2003)

12. Reichel, J., Schwarz, H., Wien, M.: Joint Scalable Video Model

JSVM-19, doc. Joint Video Team (JVT) of ISO/IEC MPEG &

ITU-T VCEG (2011)

13. Won, Y.G., Bae, T.M., Ro, Y.M.: Scalable protection and access

control in full scalable video coding. In: Proceedings of the

International Workshop on Digital Watermarking, pp. 407–421

(2006)

14. Algin, G.B., Tunali, E.T.: Scalable video encryption of h.264 svc

codec. J Visual Commun Image Represent 22(4), 353–364 (2011)

15. Park, S.W., Shin, S.U.: Efficient selective encryption scheme for

the h.264/scalable video coding(svc). In: Proceedings of the 4th

International Conference on Networked Computing and

Advanced Information Management, pp. 371–376 (2008)

16. Li, C.H., Zhou, X.X., Zhong, Y.Z.: Nal level encryption for

scalable video coding. In: Proceedings of the Pacific-Rim Con-

ference on Multimedia, pp. 496–505 (2008)

17. Li, C.H., Yuan, C., Zhong, Y.Z.: Layered encryption for scalable

video coding. In: Proceedings of the 2nd International Congress

on Image and Signal Processing, pp. 1–4 (2009)

Table 1 Comparison between BBES and other schemes

                           Encryption-with-compression      Encryption-after-compression   BBES
                           Sign         Sign and DC         Format-      Adaptation-       Intra-MB   Group-MB   4Group-MB
                           encryption   encryption [14]     preserved    preserved
Transcoding-transparency   No           No                  No           No                Yes        Yes        No
Adaptation-transparency    Yes          Yes                 No           Yes               Yes        Yes        Yes
Format-compliance          Yes          Yes                 Yes          No                Yes        Yes        Yes
Average overhead (%)       No           15                  2.35         1.46              2.095      4.12       0.545

18. Wu, C.P., Kuo, C.C.J.: Fast encryption methods for audiovisual

data confidentiality. In: Proceedings of SPIE in Multimedia

Systems and Applications III, pp. 284–295 (2000)

19. Lian, S.G.: Secure service convergence based on scalable media

coding. Telecommun Syst 45(1), 21–35 (2010)

20. Magli, E., Grangetto, M., Olmo, G.: Transparent encryption

techniques for h.264/avc and h.264/svc compressed video. Signal

Process 91(5), 1103–1114 (2011)

21. Hellwagner, H., Stutz, T., Kuschnig, R., Uhl, A.: Efficient in-

network adaptation of encrypted h.264/svc content. Signal Pro-

cess Image Commun 24(9), 740–758 (2009)

22. Arachchi, H.K., Perramon, X., Dogan, S., Kondoz, A.M.:

Adaptation-aware encryption of scalable h.264/avc video for

content security. Signal Process Image Commun 24(6), 468–483

(2009)

23. Thomas, N., Bull, D., Redmill, D.: A novel h.264 svc encryption

scheme for secure bit-rate transcoding. In: Proceedings of the

Picture Coding Symposium, pp. 1–4 (2009)

24. ITU-T Recommendation H.264 & ISO/IEC 14496 AVC.

Advanced video coding for generic audio-visual services. ITU-T

and ISO/IEC JTC 1 Recommendation H.264 and ISO/IEC 14

496-10 (MPEG-4) AVC (2003)

25. Katz, J., Lindell, Y.: Introduction to Modern Cryptography.

Chapman & Hall/CRC, London (2008)

26. Hofbauer, H., Uhl, A.: An effective and efficient visual quality

index based on local edge gradients. In: Proceedings of the 3rd

European Workshop on Visual Information Processing (EUVIP),

pp. 162–167 (2011)

27. Wang, Z., Bovik, A.C., Sheikh, H.R., Simoncelli, E.: Image

quality assessment: From error visibility to structural similarity.

IEEE Trans Image Process 13(4), 600–612 (2004)

28. Mao, Y., Wu, M.: Security evaluation for communication-friendly

encryption of multimedia. In: Proceedings of the IEEE Interna-

tional Conference on Image Processing, pp. 1522–4880 (2004)

29. Yao, Y., Xu, Z., Li, W.: Visual security evaluation for video

encryption. In: Proceedings of the 3rd International Conference on

Communications and Networking in China, pp. 1317–1322

(2008)

30. Tong, L., Dai, F., Zhang, Y., Li, J.: Visual security evaluation for

video encryption. ACM Multimedia, Barcelona, pp. 835–838

(2010)

31. Hemami, S.S., Rouse, D.: Natural image utility assessment using

image contours. In: Proceedings of the IEEE International Con-

ference on Image Processing, pp. 2217–2220 (2009)
