REGULAR PAPER
Efficient block-based transparent encryption for H.264/SVC bitstreams
Robert Huijie Deng • Xuhua Ding •
Yongdong Wu • Zhuo Wei
© Springer-Verlag Berlin Heidelberg 2013
Abstract Taking advantage of the inter-layer prediction
technique used in H.264/scalable video coding (H.264/
SVC), in this paper we propose an efficient block-based
encryption scheme (BBES) for encrypting H.264/SVC
enhancement layers (ELs). BBES operates in three modes,
namely, Intra-MB mode, Group-MB mode and 4Group-MB
mode. All three modes are effective in securing
ELs, preserve the “adaptation-transparent” property of
H.264/SVC, and are format-compliant with the H.264/SVC
bitstream format specifications. Moreover, Intra-MB and
Group-MB modes also possess a property we term
“transcoding transparency”. Experimental results indicate
that BBES has low computational complexity and small
compression overhead. Thus, BBES is suitable for trans-
parent encryption of H.264/SVC bitstreams in which ELs
are encrypted but base layers are left in cleartext.
Keywords Data confidentiality · H.264/SVC · Transparent encryption · Scalability
1 Introduction
The scalable extension of H.264, referred to as scalable
video coding (SVC) [1], is composed of a base layer (BL),
which is compatible with H.264 advanced video coding
(AVC), and one or more enhancement layers (ELs), which
provide video scalability in all three dimensions (i.e., time,
quality and resolution). With the rapid advancement of
networking and multimedia processing technologies,
applications of H.264/SVC are becoming more and more
popular. However, SVC bitstreams can easily be inter-
cepted when they are delivered over open networks. The
content of an SVC bitstream, such as the content of a video
conference, might need to be protected due to commercial,
political or security purposes. Such protection can be
achieved by full bitstream encryption, i.e., encrypting all
layers of the SVC bitstream. On the other hand, a pay TV
broadcaster does not always intend to prevent unauthorized
viewers from receiving and watching a program, but rather
intends to promote a contract with non-paying viewers.
This can be facilitated by providing a BL version of the
broadcasted program for everyone; but only authorized
users get access to the full bitstream, (i.e., the BL and the
ELs). In this article, we focus on this latter scenario that
can be accomplished using transparent encryption in which
the BL is left in cleartext while all the ELs are encrypted.
It is highly desirable for an H.264/SVC encryption
scheme to satisfy the following properties. First, an
encrypted SVC bitstream should preserve the adaptation-
transparent property of the original plaintext bitstream.
H.264/SVC enables adaptation of a high quality (resolu-
tion, frame rate) SVC bitstream into a low quality (reso-
lution, frame rate) one by simply removing parts of the
network abstraction layer (NAL) units at media-aware
network elements (MANEs), such as proxies, so as to meet
R. H. Deng · X. Ding · Z. Wei (✉)
School of Information Systems, Singapore Management
University, Singapore 178902, Singapore
e-mail: [email protected]
R. H. Deng
e-mail: [email protected]
X. Ding
e-mail: [email protected]
Y. Wu
Institute for Infocomm Research, 1 Fusionopolis Way,
Singapore 138632, Singapore
e-mail: [email protected]
Multimedia Systems
DOI 10.1007/s00530-013-0326-0
the requirements of various user devices and network
bandwidths (as shown in Fig. 1). In an open and public
video streaming environment, those MANEs are not
always considered as trustworthy and are not allowed to
have access to SVC bitstreams. An encryption scheme for
H.264/SVC is adaptation-transparent if encryption does not
affect the scalability of the plaintext
SVC bitstream. That is, to perform quality or resolution
adaptation, a MANE simply discards certain encrypted
ELs, without having to first decrypt them. On the one hand,
adaptation-transparency simplifies the design and opera-
tions of MANEs since they do not have to explicitly dif-
ferentiate whether the streams are in ciphertext or in
plaintext; such a property is significant especially in large
networks shared by many content providers and users. On
the other hand, MANEs do not need to perform decryption
and re-encryption, which lowers their operational com-
plexity and maintains end-to-end (from content providers
to end users) security. Second, transcodability of the
encrypted bitstreams should be preserved, i.e., the encryption
scheme should be transcoding-transparent. Transcoding
transparency means that a MANE can perform
content-preserving manipulations, such as requantization
or resolution downsampling, directly on encrypted ELs
without decryption, which preserves security and keeps
operational complexity low. Since only ELs
are encrypted in transparent encryption, we consider
transcoding transparency for encrypted ELs but not for BLs.
Transcoding of ELs refers to requantization of the coefficients
using a larger QP (quantization parameter) in order to
lower the EL bit rate. Finally, the encryption scheme should
preserve format-compliance to avoid decoder freezing or
crashing, have low computation complexity, and incur
small compression overhead.
A naïve way to protect an SVC bitstream is to treat the
bitstream as monolithic, unstructured data and encrypt it
as a whole. This naïve approach destroys the
scalability of the SVC bitstream. Many schemes for
encrypting different SVC formats have been proposed in
the literature, for example, secure scalable streaming
(SSS) [2] and SSS with error correction codes [3] for
general SVC bitstreams, encryption of JPEG XR [4, 5],
encryption of MPEG-4 fine-grain scalability (FGS) [6, 7],
and encryption of DWTSB-based SVC [8]. Note that the above
encryption schemes are designed for specific SVC
formats; hence a scheme proposed for one SVC
format does not perform well when applied to other SVC formats.
For instance, if the scheme for DWTSB-based SVC in [8],
which is a selective and scalable encryption for ELs by
scrambling of scan patterns, is used to protect
H.264/SVC codestreams, our experiments¹ on the Foreman
sequence indicate that: (1) it causes about 16.6 % compression
overhead because scrambling destroys the statistical
distribution of quantized DCT coefficients; (2) it
increases computation overhead, since it must scramble
every 4 × 4 block; (3) it is not transcoding-transparent
because the encrypted SVC codestream cannot be decrypted
after transcoding, e.g., re-quantization. In this article,
we focus on H.264/SVC encryption [9]; the related
work is reviewed in Sect. 2.
Block permutation has been proposed to encrypt mul-
timedia streams such as JPEG2000 bitstreams [10]. How-
ever, block permutation cannot be used to encrypt BLs in
Fig. 1 SVC content
distribution example
1 The encoded H.264/SVC bitstream (QCIF) consists of a base layer and an
encrypted quality enhancement layer. The quantization parameters (QP)
of the base layer and the enhancement layer are 40 and 30,
respectively.
R. H. Deng et al.
H.264/AVC and H.264/SVC bitstreams because intra-coding
operations in these coding standards must follow a fixed
macroblock order. A comparison of different scrambling
techniques in the DCT domain was given in [11], which
claims that block-based permutation adversely affects
compression overhead. To reduce the overhead, in this
paper we present an efficient block-based encryption
scheme (BBES) which randomly permutes macroblocks and
subblocks in an SVC EL. The BBES encrypts ELs of H.264/
SVC bitstreams in three modes: Intra-MB, Group-MB and
4Group-MB, by exploiting the inter-layer prediction tech-
niques [1]. All the three modes of BBES provide robust
security, preserve adaptation-transparency of the original
cleartext bitstreams, and are fully format-compliant with
the H.264/SVC format specifications. Additionally, Intra-
MB and Group-MB modes are also transcoding-transparent.
We implement BBES transparent encryption of SVC
bitstreams using JSVM 9.19 [12]. Our experiments show
that BBES incurs low computation complexity at both the
sender and the receiver sides, and has small compression
overhead.
The rest of this paper is organized as follows. We briefly
review the existing H.264/SVC encryption schemes in
Sect. 2. Then we present an overview of H.264/SVC in
Sect. 3. The BBES is presented in Sect. 4. Experimental
results and our evaluations are given in Sect. 5. Section 6
concludes the paper.
2 Related work
Encryption schemes for H.264/SVC can be classified into
two major approaches. One is to perform encryption and
compression simultaneously, while the other is to perform
compression first, and then encryption.
2.1 Encryption-with-compression
Several encryption schemes were proposed in which
compression (decompression) and encryption (decryp-
tion) are simultaneously executed. The scheme in [13]
encrypts signs of texture, intra-mode, and motion vectors
during SVC encoding. Similarly, the schemes in [14–17]
encrypt signs of residual coefficients, or signs of motion
vectors. Additionally, the scheme in [14] also encrypts
DC coefficients. Naturally, this joint encryption with
compression scheme is adaptation-transparent, preserves
format-compliance, and has no effect on compression
overhead. However, this approach is not transcoding-transparent.
By definition, transcoding transparency requires that
after transcoding, e.g., re-quantization or resolution
downsampling, is performed on encrypted SVC ELs
without decryption, the transcoded ELs can
still be decrypted at the receiver side. Encryption-with-compression
relies on sign encryption, i.e., the signs
of non-zero coefficients or motion vectors are encrypted
with a stream cipher or block cipher, whereas transcoding
operations change sign values (positive or negative)
and coefficient values (non-zero or zero).
Hence, encrypted ELs cannot be decrypted with the
original secret key after transcoding. More importantly,
some of its embodiments (e.g., sign encryption) are
not secure because they do not provide sufficient visual
scrambling for human eyes [18].
2.2 Encryption-after-compression
Based on the H.264/SVC NAL structure, several
encryption techniques on compressed bitstreams were
presented in the literature. All of them propose to
encrypt important NAL units (NALUs) (e.g., video
coding layer NAL) using a standard stream or block
cipher. The scheme in [19] encrypts SVC layers using
block keys which are in turn encrypted using a content
key, and the encrypted block keys are inserted into the
encrypted SVC bitstream. The encryption schemes in
[20–23] also encrypt bitstreams based on H.264 format.
Encryption performed after compression may satisfy the
property of adaptation-transparency or format-compli-
ance. However, such an approach normally results in
increased computational cost. The approach may also
introduce relatively higher compression overhead since
initial vectors or semantic markers are inserted into the
encrypted bitstreams in order to facilitate decryption and
preserve the syntax requirements for NALUs. Obviously,
it is difficult for this approach to achieve transcoding
transparency if encryption is performed on compressed
bitstreams.
3 Preliminaries
This section introduces the H.264/SVC standard, its inter-
layer prediction technique for ELs, and its context-adaptive
variable-length coding (CAVLC) technique.
3.1 H.264/SVC overview
An SVC bitstream consists of a basic quality video pro-
vided by a BL as well as one or more ELs. Due to the
flexible arrangement of NALUs, SVC provides three kinds
of scalability which enable a MANE to reduce the bit-rate
by directly discarding NALUs so as to meet the require-
ments of network bandwidth and/or end user devices’
capabilities.
3.1.1 Temporal scalability
Temporal scalable coding can be efficiently provided by
using hierarchical coding structures with P- or B-frames.
Frames in the temporal BLs are coded with the highest
fidelity since they are used as references for motion-com-
pensated prediction of frames in all other temporal layers.
When a MANE discards some frames (e.g., B-frames), it
produces an SVC bitstream of lower bit-rate.
3.1.2 Spatial scalability
In SVC spatial scalability, the spatial BL represents a video
of the lowest resolution while the spatial ELs increase the
resolutions of the video. Since inter-layer prediction is
used, a lower spatial layer must be present if a higher
spatial layer exists but not the other way around. Therefore,
when the spatial layers are discarded starting from the
highest layer, the rest of the spatial layers are still deco-
dable. This discarding process can be repeated until only
one layer (the BL) remains. In other words, the resolution
of a video can be decreased directly and gradually.
3.1.3 Quality scalability
In quality scalability, the quality BL is coded at the lowest
visual quality, and the quality ELs increase the visual quality
of the decoded sequence. Therefore, when the quality layers
are discarded starting from the highest layer, the rest of the
quality layers are still decodable. This discarding process can
be repeated until only one quality layer remains.
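The layer-discarding behaviour described in Sect. 3.1 can be illustrated with a minimal sketch. The `NalUnit` class and the `adapt` function are hypothetical names introduced only for this illustration, and the simple "keep everything at or below the target ids" rule is a simplification of real SVC extraction. The point is that adaptation reads only the scalability ids and never the payload, which is why it also works on encrypted ELs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NalUnit:
    """Hypothetical stand-in for an SVC NAL unit; only the scalability ids are modelled."""
    did: int        # dependency id (spatial layer)
    qid: int        # quality id
    tid: int        # temporal id
    payload: bytes  # may be ciphertext; never inspected during adaptation


def adapt(units, max_did, max_qid, max_tid):
    """Keep only the NAL units at or below the target operating point."""
    return [u for u in units
            if u.did <= max_did and u.qid <= max_qid and u.tid <= max_tid]


stream = [
    NalUnit(0, 0, 0, b"BL"),
    NalUnit(0, 0, 1, b"BL-t1"),
    NalUnit(0, 1, 0, b"EL-q1"),
    NalUnit(1, 0, 0, b"EL-d1"),
]
base_only = adapt(stream, max_did=0, max_qid=0, max_tid=0)
```

Here `base_only` contains just the first unit, while `adapt(stream, 1, 1, 1)` returns the whole stream.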
3.2 Inter-layer prediction
H.264/SVC employs inter-layer prediction,
which aims to maximize the usage of lower layer information
for improving the rate-distortion efficiency of ELs.
Only residuals are coded for ELs, rather than prediction
information such as intra-prediction modes, inter partitions,
motion vectors and references. Three types of inter-layer
prediction are defined in the H.264/SVC standard.
• Inter-layer motion prediction The associated motion
parameters, such as reference frame indexes, block
partition choices, and motion vectors, are completely
derived, or up-sampled (for spatial scalability) from the
co-located blocks in the reference layer.
• Inter-layer intra-prediction The macroblock prediction
signal is completely inferred from co-located intra-
coded blocks in the reference layer without transmitting
any additional side information. Specifically, the prediction
signal requires up-sampling by using a 4-tap finite
impulse response (FIR) filter for spatial scalability.
• Inter-layer residual prediction The residual signal of
inter-coded macroblocks in the reference layer is up-
sampled (for spatial scalability) or directly transmitted
to the corresponding macroblock of the EL.
It is evident that the prediction signals for macroblocks
in the spatial and quality ELs depend solely on the
corresponding macroblocks in the reference layers and are
independent of their neighboring macroblocks. As soon as the
reference layer frame finishes prediction coding, the
residual signals of every EL macroblock can be independently
calculated and compressed by entropy coding.
3.3 CAVLC entropy coding
CAVLC uses seven fixed variable-length coding tables to
encode the residual, zig-zag ordered 4 × 4 blocks
of transform coefficients [24]. CAVLC is designed to take
advantage of several characteristics of the quantized 4 × 4
blocks. CAVLC encoding of a block of transform coefficients
proceeds as follows:
1. Encode total number of non-zero coefficients
(TotalCoeff) and number of trailing ±1 values
(T1).
2. Encode sign of each T1, starting from the highest-
frequency T1.
3. Encode levels (sign and magnitude) of each remaining
non-zero coefficients in the block.
4. Encode total number of zeros (TotalZeros) before the
last coefficient.
5. Encode total number of zeros preceding each non-zero
coefficient.
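To make the quantities in steps 1 and 4 concrete, the following minimal sketch computes TotalCoeff, T1 and TotalZeros for one zig-zag ordered block. The function name `cavlc_stats` is an assumption made for this illustration; the cap of three trailing ones is the limit used by H.264 CAVLC. The sketch only derives the statistics and does not produce any codewords.

```python
def cavlc_stats(zigzag):
    """Return (TotalCoeff, T1, TotalZeros) for one zig-zag ordered 4x4 block."""
    nonzero_positions = [i for i, c in enumerate(zigzag) if c != 0]
    total_coeff = len(nonzero_positions)

    # Trailing ones: consecutive +/-1 values counted from the highest frequency;
    # H.264 CAVLC caps this count at 3.
    t1 = 0
    for i in reversed(nonzero_positions):
        if abs(zigzag[i]) == 1 and t1 < 3:
            t1 += 1
        else:
            break

    # Zeros located before the last (highest-frequency) non-zero coefficient.
    total_zeros = (nonzero_positions[-1] + 1 - total_coeff) if nonzero_positions else 0
    return total_coeff, t1, total_zeros


block = [0, 3, 0, 1, -1, -1, 0, 1] + [0] * 8
print(cavlc_stats(block))  # (5, 3, 3)
```

In this example the fourth ±1 value (at position 3) is not counted as a trailing one because of the cap, so it would be coded as an ordinary level in step 3.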
The performance of BBES is closely related to
step (1), as explained below. CAVLC encodes both the
total number of non-zero coefficients (TotalCoeff) and
the number of trailing ±1 values (T1). There are four
choices of look-up tables (Num_VLC0, Num_VLC1,
Num_VLC2 and Num_FLC) for encoding TotalCoeff
and T1 (coeff_token), and the choice depends on a
parameter Num, which is calculated from the numbers of non-zero
coefficients in the upper and left-hand subblocks,
denoted NU and NL, as shown in Eq. (1). Equation (2)
below describes how the four tables are selected.
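$$
\mathrm{Num} =
\begin{cases}
\mathrm{NU}, & \mathrm{NU} \neq 0 \text{ and } \mathrm{NL} = 0\\
\mathrm{NL}, & \mathrm{NU} = 0 \text{ and } \mathrm{NL} \neq 0\\
(\mathrm{NU} + \mathrm{NL})/2, & \mathrm{NU} \neq 0 \text{ and } \mathrm{NL} \neq 0\\
0, & \mathrm{NU} = 0 \text{ and } \mathrm{NL} = 0
\end{cases}
\tag{1}
$$

$$
\mathrm{Table} =
\begin{cases}
\mathrm{Num\_VLC0}, & 0 \le \mathrm{Num} < 2\\
\mathrm{Num\_VLC1}, & 2 \le \mathrm{Num} < 4\\
\mathrm{Num\_VLC2}, & 4 \le \mathrm{Num} < 8\\
\mathrm{Num\_FLC}, & 8 \le \mathrm{Num}
\end{cases}
\tag{2}
$$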
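Equations (1) and (2) can be read as the small sketch below. The function names are assumptions made for this illustration, and the code follows the two equations as printed here rather than the normative text of the standard (which also deals with unavailable neighbours and rounding).

```python
def num_from_neighbours(nu, nl):
    """Eq. (1): combine the non-zero counts of the upper (nu) and left (nl) subblocks."""
    if nu != 0 and nl != 0:
        return (nu + nl) / 2
    # Covers the three remaining cases: only nu non-zero, only nl non-zero, both zero.
    return nu if nu != 0 else nl


def coeff_token_table(num):
    """Eq. (2): pick the look-up table used to code coeff_token."""
    if num < 2:
        return "Num_VLC0"
    if num < 4:
        return "Num_VLC1"
    if num < 8:
        return "Num_VLC2"
    return "Num_FLC"


print(coeff_token_table(num_from_neighbours(5, 7)))  # Num_VLC2
```

The table choice for a subblock therefore depends on its neighbours, which is why moving subblocks around can change the coded size of a frame.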
4 Proposed encryption scheme
BBES seamlessly integrates encryption operations into the
SVC EL macroblock coding process using secret block
permutations. This section first introduces a method for
generating secure pseudo-random block permutations and
then describes the encryption algorithms. As a result, the
rendered video stream appears as random noise to any
entity without knowledge of the secret permutations.
4.1 Pseudo-random block permutation
Let $E_k(\cdot)$ be the encryption algorithm with key $k$, which is
shared between the sender and the receiver. Denote by $B = \{B_0, \ldots, B_{n-1}\}$ the $n$ macroblocks (or subblocks) used in a
coding process. To permute $B$ using $k$, the sender determines
a permutation function $\pi_n : [0, n-1] \to [0, n-1]$ as
follows.
Step 1. Compute $C = \{c_0, c_1, \ldots, c_{n-1}\}$, where $c_i = E_k(i, R)$
for $0 \le i \le n-1$, and $R$ is a string
related to scalability information [priority id
(PRID), dependency id (DID), quality id (QID)
and temporal id (TID), block number, and frame
number], which can be directly obtained from
the SVC NAL header.
Step 2. Sort the $n$ ciphertexts in ascending order,
such that $c_{i_0} < c_{i_1} < \cdots < c_{i_{n-1}}$, where $0 \le i_j < n$.
Step 3. Define $\pi_n(i_j) = i$. In other words, $c_i$ is replaced
with $c_{i_j}$.
$\pi_n$ is a secure pseudo-random permutation if a secure encryption
algorithm (e.g., AES) is used.
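The construction above can be sketched in a few lines. Two assumptions are made loudly here: HMAC-SHA256 from the Python standard library is swapped in for the encryption algorithm E_k (the paper suggests a block cipher such as AES), and Step 3 is read as mapping rank j in the sorted order to index i_j, which is one consistent interpretation. The byte encoding of R, the key value and the function names are likewise hypothetical.

```python
import hashlib
import hmac


def prf(key: bytes, i: int, r: bytes) -> bytes:
    """Stand-in for E_k(i, R): HMAC-SHA256 is used here instead of a block cipher."""
    return hmac.new(key, i.to_bytes(4, "big") + r, hashlib.sha256).digest()


def block_permutation(key: bytes, n: int, r: bytes) -> list:
    """Return [i_0, ..., i_{n-1}], the indices sorted by their pseudo-random values c_i."""
    c = [prf(key, i, r) for i in range(n)]        # Step 1
    return sorted(range(n), key=lambda i: c[i])   # Step 2


def invert(perm: list) -> list:
    """Inverse permutation: inv[perm[j]] == j."""
    inv = [0] * len(perm)
    for j, i_j in enumerate(perm):
        inv[i_j] = j
    return inv


key = b"\x00" * 16                  # hypothetical shared key
r = b"DID=1|QID=0|TID=0|frame=7"    # hypothetical encoding of the string R
pi = block_permutation(key, 16, r)
```

Because R changes with the layer and frame, the same key yields a different permutation for each frame, and the receiver can recompute it without any side information in the bitstream.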
4.2 Encryption
An H.264/SVC frame is encoded based on 16 × 16 macroblocks
in a predefined order. Let n denote the number of
macroblocks in an SVC frame. Each macroblock is composed
of sixteen 4 × 4 subblocks. The BBES encryption operations
are seamlessly integrated with the sender’s SVC frame
encoding process for the spatial/quality ELs and are based on
block permutations at two levels: the macroblock level and
the subblock level. BBES supports three different encryption
modes, which allows the sender to balance security and
performance. The basic mode of BBES is called Intra-MB
mode whereby subblocks are shuffled within their macro-
blocks. The extended mode of BBES, with stronger security
but higher cost, is called Group-MB mode whereby sub-
blocks from different macroblocks are permuted. Different
from Intra-MB and Group-MB modes, a third mode called
4Group-MB mode shuffles subblocks from different mac-
roblocks and takes the entropy coding characteristics into
consideration in order to reduce compression overhead.
4.2.1 Encryption in Intra-MB and Group-MB modes
To encrypt and encode a frame $F$ with $n$ macroblocks, i.e.,
$F = \langle B_0, \ldots, B_{n-1} \rangle$, a content sender chooses a secret key $k$
and a string $R$ from the NAL header and slice header, and
carries out the following steps.
Step 1. Compute a pseudo-random macroblock permutation
$\pi_n$, as described in Sect. 4.1.
Step 2. If in Intra-MB mode, compute a subblock
permutation $\pi_{16}$; otherwise choose a parameter
$m$ where $m \le n$, compute a subblock permutation
$\pi_{16m}$, and segment every $m$ consecutive
macroblocks as one group.
Step 3. Execute the standard SVC frame coding algorithm,
except that
– If a macroblock $B_i$ is needed by SVC coding,
replace it with macroblock $B_{\pi_n(i)}$.
– If a subblock $b_j$ is needed by SVC coding,
either (when in Intra-MB mode) replace it
with $b_{\pi_{16}(j)}$, where $b_{\pi_{16}(j)}$ is in the same
macroblock as $b_j$; or (when in Group-MB
mode) replace it with $b_{\pi_{16m}(j)}$.
The sender first permutes $F$ into $F' = \langle B_{\pi_n(0)}, \ldots, B_{\pi_n(n-1)} \rangle$.
For each macroblock $B_i$ with 16 subblocks, i.e.,
$B_i = \langle b_0, \ldots, b_{15} \rangle$, the sender converts $B_i$ into
$B'_i = \langle b_{\pi_{16}(0)}, \ldots, b_{\pi_{16}(15)} \rangle$ in the case of Intra-MB mode.
Alternatively, in Group-MB mode the sender assembles every $m$
consecutive macroblocks into one group and randomly
permutes all $16 \times m$ subblocks within the group. After
the permuted selection at both the macroblock and subblock
levels, the sender runs the standard SVC algorithm
on the resulting frame. Note that, as shown in the algorithm
above, the sender does not perform the permutation as a separate
step prior to encoding; the permutation is realized in the
block selection of the SVC algorithm. Figure 2a depicts
the flow chart of the frame encryption procedure using
BBES in Intra-MB mode and Fig. 3a shows the flow chart
of the frame encryption procedure using BBES in Group-MB
mode.
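The two-level permutation of Intra-MB and Group-MB modes can be sketched on a toy frame as follows. This is a sketch under stated assumptions, not the JSVM integration: the frame is a plain list of macroblocks, each a list of 16 subblock labels; HMAC-SHA256 stands in for E_k; a fresh subblock permutation is derived per group by appending the group offset to R; and Intra-MB mode is treated as Group-MB mode with m = 1. The function names are hypothetical.

```python
import hashlib
import hmac


def perm(key: bytes, n: int, r: bytes) -> list:
    """Keyed permutation of range(n); HMAC-SHA256 stands in for E_k."""
    def tag(i):
        return hmac.new(key, i.to_bytes(4, "big") + r, hashlib.sha256).digest()
    return sorted(range(n), key=tag)


def encrypt_frame(frame, key, r, m=None):
    """frame: list of macroblocks, each a list of 16 subblocks.

    m=None selects Intra-MB mode; otherwise Group-MB mode with groups of m macroblocks.
    """
    n = len(frame)
    pi_n = perm(key, n, r + b"|mb")
    shuffled = [frame[pi_n[i]] for i in range(n)]            # macroblock level
    size = 1 if m is None else m                             # macroblocks per group
    out = []
    for g in range(0, n, size):
        group = shuffled[g:g + size]
        flat = [sb for mb in group for sb in mb]             # 16 * len(group) subblocks
        pi_sb = perm(key, len(flat), r + b"|sb|" + str(g).encode())
        flat = [flat[pi_sb[j]] for j in range(len(flat))]    # subblock level
        out += [flat[k:k + 16] for k in range(0, len(flat), 16)]
    return out


key, r = b"k" * 16, b"frame=0"                               # hypothetical key and R
frame = [[(i, j) for j in range(16)] for i in range(8)]      # label = (MB index, subblock index)
intra = encrypt_frame(frame, key, r)
group = encrypt_frame(frame, key, r, m=4)
```

In `intra`, every output macroblock still holds the 16 subblocks of a single source macroblock in a new order, whereas in `group` the subblocks of four macroblocks are mixed together.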
4.2.2 Encryption in 4Group-MB mode
4Group-MB mode is similar to Group-MB mode in that it
reconstructs the subblocks of an MB from different MBs. Essentially,
4Group-MB organizes all the 4 × 4 subblocks of an EL frame into
four groups based on TotalCoeff, then
permutes the subblocks within each group. Encryption in
this mode proceeds as follows:
Step 1. Compute a macroblock permutation $\pi_n$, as
described in Sect. 4.1.
Step 2. Based on the TotalCoeff of subblocks,
organize them into four groups using Eq. (3):
$$
p =
\begin{cases}
1, & \mathrm{TotalCoeff} \in [0, 2)\\
2, & \mathrm{TotalCoeff} \in [2, 4)\\
3, & \mathrm{TotalCoeff} \in [4, 8)\\
4, & \mathrm{TotalCoeff} \in [8, 16]
\end{cases}
\tag{3}
$$
Step 3. Let $m_p$ be the number of subblocks in Group
$p$, and $n = (\sum_{p=1}^{4} m_p)/16$. Compute a subblock
permutation $\pi_{m_p}$ for each Group $p$ and then
segment consecutive $m_p$ subblocks as one group.
Step 4. Execute the standard CAVLC entropy coding as
described in Step 3 of Sect. 4.2.1.
The encryption flow chart of 4Group-MB mode is
illustrated in Fig. 4a. Similar to Intra-MB and Group-MB
modes, it also depends on macroblock and subblock permutations
using a secret key. However, its subblock permutation
is aligned with the TotalCoeff ranges used for CAVLC table
selection in Eq. (2): subblocks are exchanged only within the
same range. As a result,
4Group-MB mode has a negligible compression overhead,
as we will demonstrate later. Meanwhile, the
grouping operation does not require extra computational
resources because a standard coder also calculates
TotalCoeff for entropy coding.
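The grouping of Eq. (3) and the per-group permutation can be sketched as follows. As assumptions for this illustration, subblocks are lists of 16 quantized coefficients, a seeded `random.Random` stands in for the keyed permutation of Sect. 4.1, and the macroblock-level permutation is omitted; the function names are hypothetical.

```python
import random


def group_index(total_coeff: int) -> int:
    """Eq. (3): map TotalCoeff of a 4x4 subblock to a group p in {1, 2, 3, 4}."""
    if total_coeff < 2:
        return 1
    if total_coeff < 4:
        return 2
    if total_coeff < 8:
        return 3
    return 4


def count_nonzero(subblock) -> int:
    """TotalCoeff: the number of non-zero coefficients in the subblock."""
    return sum(1 for c in subblock if c != 0)


def permute_within_groups(subblocks, seed):
    """Shuffle subblocks only among positions that belong to the same group."""
    positions = {1: [], 2: [], 3: [], 4: []}
    for pos, sb in enumerate(subblocks):
        positions[group_index(count_nonzero(sb))].append(pos)

    rng = random.Random(seed)   # stand-in for the keyed permutation of Sect. 4.1
    out = list(subblocks)
    for pos_list in positions.values():
        shuffled = pos_list[:]
        rng.shuffle(shuffled)
        for dst, src in zip(pos_list, shuffled):
            out[dst] = subblocks[src]
    return out


counts = (0, 1, 2, 3, 5, 9, 16, 0, 4, 8, 2, 1)
subblocks = [[i + 1] * k + [0] * (16 - k) for i, k in enumerate(counts)]
scrambled = permute_within_groups(subblocks, seed=1)
```

After the shuffle, every position still holds a subblock from the same TotalCoeff range as before, so the neighbour statistics that drive the table choice change only within a range.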
4.3 Decryption
For every received EL frame $F''$ in a protected bitstream,
the receiver reconstructs the string R from the frame in the
same way as the sender, and carries out the following steps
to decrypt the frame.
Fig. 2 Encryption and decryption processes in Intra-MB mode: (a) encryption, (b) decryption
4.3.1 Decryption in Intra-MB and Group-MB modes
Step 1. Compute a macroblock permutation $\pi_n$ using key
$k$ and the string $R$, as described in Sect. 4.1.
Compute the inverse permutation $\pi_n^{-1}$, such that
$\pi_n^{-1}(\pi_n(i)) = i$.
Step 2. If in Intra-MB mode, compute a subblock permutation
$\pi_{16}^{-1}$, which is the inverse of $\pi_{16}$; otherwise
compute a subblock permutation $\pi_{16m}^{-1}$, which is the
inverse of $\pi_{16m}$.
Step 3. Execute the standard SVC frame decoding algorithm,
except that
– If a macroblock $B_i$ is needed by SVC
decoding, replace it with macroblock $B_{\pi_n^{-1}(i)}$.
– If a subblock $b_j$ is needed by SVC decoding,
replace it with $b_{\pi_{16}^{-1}(j)}$ (in Intra-MB mode) or
$b_{\pi_{16m}^{-1}(j)}$ (in Group-MB mode).
The decryption steps are the inverse of the encryption
steps. When decoding a frame, the receiver locates the right
block by inverting the permutation. Figure 2b depicts the
decryption operations in Intra-MB mode while Fig. 3b
illustrates the decryption process in Group-MB mode.
Similar to encryption, the decryption steps are embedded in
the SVC frame decoding procedure.
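A minimal round trip shows how the inverse permutation restores the block order. In this sketch a seeded `random.Random` shuffle stands in for the keyed permutation of Sect. 4.1, and the helper names `invert` and `apply` are hypothetical.

```python
import random


def invert(perm):
    """Return inv with inv[perm[i]] == i."""
    inv = [0] * len(perm)
    for i, p in enumerate(perm):
        inv[p] = i
    return inv


def apply(perm, blocks):
    """Position i of the output receives blocks[perm[i]]."""
    return [blocks[perm[i]] for i in range(len(blocks))]


rng = random.Random(2013)           # stand-in for the keyed permutation of Sect. 4.1
pi_16 = list(range(16))
rng.shuffle(pi_16)

macroblock = [f"b{j}" for j in range(16)]
scrambled = apply(pi_16, macroblock)          # sender side
restored = apply(invert(pi_16), scrambled)    # receiver side
```

`restored` equals `macroblock`, since applying a permutation followed by its inverse is the identity.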
4.3.2 Decryption in 4Group-MB mode
Step 1. Compute a macroblock permutation $\pi_n$ using key
$k$ and the string $R$, as described in Sect. 4.1.
Compute the inverse permutation $\pi_n^{-1}$, such that
$\pi_n^{-1}(\pi_n(i)) = i$.
Step 2. Let $i$ be the MB address, $j \in [0, 15]$ the subblock
address in the MB, $p \in [1, 4]$ the index of groups, and
$q \in [1, m_p]$ the index inside Group $p$.
Fig. 3 Encryption and decryption processes in Group-MB mode: (a) encryption, (b) decryption
Step 3. Compute a subblock permutation $\pi_{m_p}^{-1}$, which is the
inverse of $\pi_{m_p}$.
Step 4. Execute the standard subblock decoding process,
except that
– if the subblock being decoded belongs to $B_i$, replace $B_i$
with macroblock $B_{\pi_n^{-1}(i)}$.
– if the subblock being decoded belongs to Group $p$ and
its index is $q$, replace it with $b_{\pi_{m_p}^{-1}(q)}$.
Step 2 above requires the TotalCoeff of each subblock.
Computing it separately would be time-consuming; however, TotalCoeff
is obtained from look-up tables during entropy decoding, so it
does not introduce additional computation cost.
Figure 4b illustrates the decryption operations in
4Group-MB mode. They are almost the same as the normal
decoding process.
5 Evaluation
We have implemented all three modes of BBES using
JSVM 9.19 [12]. In our experiments, the group of pictures
(GOP) size and the intra-period are set to 8 and 16,
respectively, and CAVLC is selected for entropy coding. To test
the performance of the proposed scheme, we choose ten standard
H.264/SVC benchmark video sequences²: Bus (150
frames), Foreman (300 frames), Football (260 frames),
Soccer (300 frames), Bridge-far (2,100 frames), Bridge-close
(2,000 frames), Highway (2,000 frames), Silent (300
frames), Mobile (300 frames) and Hall (300 frames).
Fig. 4 Encryption and decryption processes in 4Group-MB mode: (a) encryption, (b) decryption
2 Available at http://media.xiph.org/video/derf/.
5.1 Security
There are two levels of block permutations in BBES. The
macroblock-level permutation first shuffles structures or
profiles of images, then the subblock-level permutation
further shuffles the textures and details of images. The
security of the scrambling process can be analyzed as
follows.
The pseudo-random block permutations in BBES are
generated using a block cipher such as AES. It is well
known that block ciphers can be regarded as secure
pseudo-random permutations and it is computationally
infeasible to distinguish the output of a block cipher from
that of a truly random permutation [25]. This implies that
without the knowledge of the secret key for the block
cipher, an attacker has to resort to brute force attack to
reverse the pseudo-random permutation used in BBES. Let
$n$ be the number of macroblocks in a frame and let $k$ be the
key size of the block cipher. Then the number of pseudo-random
block permutations at the macroblock level is
given by $\min\{n!, 2^k\}$. Assuming a CIF frame comprising
$n = 396$ macroblocks and AES with key size $k = 128$ bits,
the number of pseudo-random block permutations at the
macroblock level is $\min\{396!, 2^{128}\} = 2^{128}$. Moreover,
the subblock permutation further enhances security. In
Intra-MB mode, the number of subblock permutations is
$16! \approx 2^{44}$; in Group-MB mode, the number of subblock
permutations is $(16m)! = 64! \approx 2^{296}$ when $m = 4$; and in
4Group-MB mode, the number of subblock permutations is
$(m_1! + m_2! + m_3! + m_4!)$, where $m_p \ge 64$ is the number
of subblocks in Group $p = 1, 2, 3, 4$. In addition to the
above cryptographic-based attack, an attacker might try to
search for the best estimate directly in the transformed
domain by exploiting contextual or structural information
of the quantized coefficients such as edge continuity.
However, such attacks are unlikely to succeed due to
the diversity of high-level object structures and the
uncorrelated nature of the quantized coefficients.
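The permutation counts quoted above can be checked with a few lines of arithmetic. The helper name `log2_factorial` is an assumption made for this illustration; it uses the log-gamma function so that the very large factorials never have to be formed explicitly.

```python
import math


def log2_factorial(n: int) -> float:
    """log2(n!) computed via the log-gamma function to avoid huge integers."""
    return math.lgamma(n + 1) / math.log(2)


print(round(log2_factorial(16)))    # 44
print(round(log2_factorial(64)))    # 296
print(log2_factorial(396) > 128)    # True
```

The last line confirms that for a CIF frame the key size, not the number of macroblock orderings, bounds the search space at the macroblock level.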
Finally, we analyze semantic attacks.
BBES encrypts ELs by macroblock and subblock permutations
while leaving the BL in plaintext in order to achieve
transparent encryption. Since the BL supplies the basic SVC
content, it may seem that ELs can be easily attacked based on
semantic features, i.e., by using the BL's features to
reorder the shuffled 4 × 4 blocks of ELs. However,
recovering the order of macroblocks and blocks is very
hard without the secret key. On the one hand, an EL image
contains limited residual information, as shown in Fig. 5a,
so its semantic features are obscure. Figure 6 illustrates
the histogram of Fig. 5a, which indicates that most
pixel values are concentrated around 120–130. Hence, ELs contain
few features; e.g., our scale-invariant feature transform
(SIFT) experiments show that the BL contains 171 feature
points while its EL image (Fig. 5a) has only 17 feature
points. Similar experimental results are obtained
for the other nine SVC sequences. On the other hand, since
BBES constructs a new EL image by shuffling 4 × 4
blocks, the image may exhibit new features (e.g.,
edges or gradients) due to the new adjacency between
neighboring 4 × 4 blocks. For instance, our SIFT experiment
on the EL image in Fig. 5c indicates that it has 27 feature
points, more than the 17 of Fig. 5a.
Hence, directly applying feature extraction methods to this
image might be misleading. In addition, in order to
remove the effect of shuffled neighboring blocks, an attacker
might extract features from individual 4 × 4 blocks
without considering their neighbors. However, experiments
with, e.g., the Sobel edge detector or the gradient orientation
histogram (GOH) indicate that 4 × 4 blocks are such small regions
that they contain little useful information. Therefore,
it is not easy for semantic attacks to recover the order of
macroblocks and blocks without the secret key.
We conducted experiments to demonstrate the visual
effects of BBES. Firstly, without the reference BL (i.e., by
setting the BL as a blank frame in the experiments), Fig. 5a
shows the image of plaintext ELs for the Foreman
sequence, while Fig. 5b and c, which appear like white
noise, illustrate the scrambled EL images when encrypted
in Intra-MB mode and 4Group-MB mode, respectively. To
compare BBES with sign encryption schemes [13–17], in
which only the signs of coefficients are encrypted since the EL
does not have prediction modes or motion vectors, we
performed sign encryption on spatial ELs; the result for the
Fig. 5 Original EL image and encrypted EL images: (a) plaintext EL, (b) Intra-MB mode, (c) 4Group-MB mode, (d) sign encryption
Foreman sequence is shown in Fig. 5d which clearly
exposes visual content.
Secondly, we demonstrate the visual effects of BBES
when used in transparent encryption. Figure 7a illustrates
the original image when both BL and ELs are in plaintext,
Fig. 7b depicts the image when BL is in plaintext and ELs
are encrypted in Intra-MB mode and Fig. 7c shows the
image when BL is in plaintext and ELs are encrypted in
4Group-MB mode. Finally, Fig. 7d shows the image with
BL in plaintext while ELs are encrypted using sign
encryption. Subjectively, Fig. 7b and c are of lower image
quality than Fig. 7d. As objective evaluations, Fig. 8a
illustrates the scores of local edge gradients (LEG) [26],
structural similarity index measure (SSIM) [27], and edge
similarity scores (ESS) [28]³; Fig. 8b illustrates the scores
of YAO09A [29], local feature-based visual security metric
(LFBVS) [30] and natural image contour evaluation
(NICE) [31].⁴ These measures indicate that videos reconstructed
from BBES-encrypted streams are noisier than
those reconstructed from sign encryption-encrypted
streams for unauthorized users who do not have the
decryption key.
5.2 Property
5.2.1 Adaptation transparency
When an encrypted SVC bitstream is delivered over a
content distribution network, MANEs may distribute it to
various clients. If an encryption scheme does not achieve
adaptation transparency, MANEs have to first decrypt an
encrypted SVC bitstream, then scale and re-encrypt it in
order to perform adaptation. Note that decryption and
encryption operations significantly increase the computa-
tional cost of MANEs. Furthermore, it may severely
compromise system security since encryption keys must be
made available to MANEs, which may not be trusted by
content sources. BBES is adaptation-transparent since its
operations are seamlessly integrated into the standard SVC
compression coding process. As every encrypted layer
satisfies the H.264/SVC format structure, BBES does not
affect NAL headers which contain scalability information
(e.g., PRID, DID, QID and TID). Therefore, encrypted ELs
preserve the adaptation-transparency property of H.264/
SVC.
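The adaptation step described above can be illustrated with a minimal sketch. It assumes a simplified, hypothetical model of a NAL unit (the `NalUnit` class and the `adapt` function are illustrative names, not part of the paper or of any H.264/SVC library), and it reduces layer extraction to a plain comparison of DID, QID and TID against target values. The point it shows is only that a MANE can drop layers by reading header fields while treating the encrypted payload as opaque bytes.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class NalUnit:
    """Hypothetical, simplified view of an SVC NAL unit."""
    dependency_id: int  # DID (spatial layer)
    quality_id: int     # QID (quality layer)
    temporal_id: int    # TID (temporal layer)
    payload: bytes      # opaque to the MANE; may be encrypted


def adapt(stream: List[NalUnit], max_did: int, max_qid: int,
          max_tid: int) -> List[NalUnit]:
    """Keep only the units at or below the target operating point.

    Only header fields are inspected; the payload is never parsed,
    decrypted or modified.
    """
    return [
        nal for nal in stream
        if nal.dependency_id <= max_did
        and nal.quality_id <= max_qid
        and nal.temporal_id <= max_tid
    ]
```

Because the filter never touches `payload`, no key material is needed at the MANE in this model; a real extractor applies more detailed layer-dependency rules than this sketch does.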
5.2.2 Transcoding transparency
As elaborated in Sect. 1, an encryption scheme for H.264/
SVC should preserve the transcoding-transparency prop-
erty of SVC so that encrypted bitstreams can be transcoded
without decryption and re-encryption. In the literature,
bitstream encryption schemes [19–23] encrypt compressed
streams in a straightforward way and hence do not preserve
transcoding transparency; while selective encryption
schemes [13–17] encrypt signs of non-zero coefficients
only. If the encrypted SVC bitstream is transcoded without
decryption, the original non-zero coefficients of subblocks
are changed due to re-quantization. Hence, a receiver will
not be able to decrypt the SVC bitstream correctly.
Fig. 6 Histogram of an enhancement layer image
Fig. 7 Scrambled images in various transparent encryption schemes (panels a–d)
Footnote 3: If the compared images are the same, the scores are 1.
Footnote 4: If the compared images are the same, the scores are 0.
Both Intra-MB and Group-MB modes have the
transcoding-transparency property. They use 4 × 4 subblocks
as the permutation unit inside one macroblock or a group of
macroblocks, as shown in Sect. 4.2.1, and this permutation is
independent of the content (i.e., quantized coefficients) of
the 4 × 4 subblocks. Transcoding also operates on 4 × 4
subblock units: it changes the internal content of subblocks
but has no effect on the order of subblocks or macroblocks.
Transcoding utilizes partial decoding and partial recoding
operations, e.g., quantization/inverse quantization and
entropy coding/decoding (CAVLC) [24]. Partial decoding
first performs entropy decoding on the SVC EL bitstream
to recover the quantized coefficients, then applies inverse
quantization to those coefficients to restore the
DCT coefficients. Partial recoding re-quantizes those DCT
coefficients with a different QP and entropy-encodes
the new quantized coefficients in order to
degrade the quality of the EL. Neither partial decoding nor
partial recoding damages the shuffled structure of
macroblocks and subblocks, so the permutation order of EL
subblocks/macroblocks is preserved. Therefore, the
transcoded SVC bitstream can still be decrypted correctly by
end users. However, 4Group-MB mode does not have
transcoding transparency: it depends on internal content
(i.e., the TotalCoeff of each subblock), which can be changed
by transcoding such that the grouping of subblocks is
destroyed.
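The commutation argument of Sect. 5.2.2 can be checked on toy data. The sketch below is illustrative only: `random.Random(key)` stands in for the AES-driven pseudo-random permutation, `requantize` is a crude division by a step size rather than real H.264 quantization, and all function names are hypothetical. It shows that permuting whole subblocks and then re-quantizing each subblock gives the same result as re-quantizing first, and that the number of non-zero coefficients per subblock can change under re-quantization.

```python
import random


def keyed_permutation(n, key):
    """Position order derived from a key (stand-in for an AES-based PRNG)."""
    rng = random.Random(key)
    order = list(range(n))
    rng.shuffle(order)
    return order


def permute(blocks, order):
    """Encrypt: output position dst receives input block order[dst]."""
    return [blocks[src] for src in order]


def unpermute(blocks, order):
    """Decrypt: put each block back at its original position."""
    out = [None] * len(blocks)
    for dst, src in enumerate(order):
        out[src] = blocks[dst]
    return out


def requantize(block, step=2):
    """Toy re-quantization: divide and truncate toward zero."""
    return [int(c / step) for c in block]


def transcode(blocks, step=2):
    """Re-quantize every subblock independently; order is untouched."""
    return [requantize(b, step) for b in blocks]


def total_coeff(block):
    """Number of non-zero coefficients in a subblock."""
    return sum(1 for c in block if c != 0)
```

Since `transcode` acts inside each subblock and `permute` acts only on subblock positions, `unpermute(transcode(permute(x)))` equals `transcode(x)`. By contrast, `total_coeff` is not invariant under `requantize`, which is the reason given above for 4Group-MB mode losing transcoding transparency.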
5.2.3 Format compliance
Except for the coding/decoding order of subblocks and
macroblocks, BBES encryption and decryption operations are
similar to the normal H.264/SVC encoding and decoding
operations, respectively, for example, DCT/IDCT,
quantization/inverse quantization, and entropy coding/decoding.
Hence, a BBES-encrypted SVC bitstream complies with
the syntax requirements of H.264/SVC. Although a
ciphertext frame is incomprehensible without being
decrypted, a standard H.264/SVC decoder is able to decode
an encrypted bitstream without crashing. This
format-compliance property differentiates BBES from the
bitstream encryption schemes in [19–23].
5.2.4 Computational cost
The computational cost incurred by BBES is small because
BBES uses AES to generate pseudo-random block permutations
and BBES operations are integrated into the
standard H.264/SVC coding operations. Note that the
encryption speed of AES is very high, e.g., it can encrypt
at 109 MiB/s on a PC with an Intel Core 2 1.83 GHz processor
(http://www.cryptopp.com/benchmarks.html).
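To make the idea of a cipher-driven block permutation concrete, the following sketch derives a permutation from a secret key with a Fisher-Yates shuffle. It is an assumption-laden illustration, not the construction used in BBES: HMAC-SHA256 in counter mode from the Python standard library stands in for AES, and the `nonce` parameter and all function names are hypothetical.

```python
import hashlib
import hmac


def keystream_uint32(key: bytes, nonce: bytes):
    """Yield 32-bit integers from HMAC-SHA256 in counter mode.

    Stand-in for an AES-based keystream; chosen only because it is
    available in the standard library.
    """
    counter = 0
    while True:
        msg = nonce + counter.to_bytes(8, "big")
        block = hmac.new(key, msg, hashlib.sha256).digest()
        for off in range(0, len(block), 4):
            yield int.from_bytes(block[off:off + 4], "big")
        counter += 1


def uniform_below(stream, n: int) -> int:
    """Draw an unbiased integer in [0, n) by rejection sampling."""
    limit = (1 << 32) - ((1 << 32) % n)
    while True:
        value = next(stream)
        if value < limit:
            return value % n


def keyed_shuffle_order(n: int, key: bytes, nonce: bytes):
    """Fisher-Yates shuffle of range(n) driven by the keyed stream."""
    stream = keystream_uint32(key, nonce)
    order = list(range(n))
    for i in range(n - 1, 0, -1):
        j = uniform_below(stream, i + 1)
        order[i], order[j] = order[j], order[i]
    return order
```

The same key and nonce always reproduce the same order, so a receiver holding the key can invert the permutation. Shuffling 16 subblocks consumes only a handful of keystream words, which is in line with the small cost claimed above.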
5.3 Compression overhead
Nonetheless, BBES introduces some compression overhead.
Let the QP of the BL be 34; Figs. 9, 10 and 11 show the
compression overhead in Intra-MB, Group-MB and 4Group-MB
modes for quality and spatial scalabilities. These figures
indicate that the compression overhead in Group-MB mode is
the highest of the three modes because this mode has a
greater effect on the statistical properties of a frame. The
compression overhead in 4Group-MB mode is the lowest, at no
more than 1%. Note that some increase in compression
overhead is almost unavoidable, because compression benefits
from information redundancy while encryption reduces
redundancy through randomization. Letting ΔQP denote the QP
difference between the reference layer and an EL, we analyze
the compression overhead of BBES as follows.
5.3.1 Compression overhead in Intra-MB mode
Fig. 8 Scores of objective quality assessment: (a) scores of ESS, LEG, SSIM and LSS; (b) scores of YAO09A, LFBVS and NICE
BBES permutes subblocks in a random manner, but the
value and position of each subblock’s coefficients do not
change. Based on the descriptions in Sects. 3.3 and 4, the
only coding difference comes from coeff_token while other
coding parts related to the value and position of subblock’s
coefficients remain intact. The coeff_token encodes
TotalCoeff and T1 by selecting a look-up table which
depends on NU and NL. BBES alters the top and left-hand
side subblocks of the current subblock, and consequently
changes NU and NL. This leads to a different value of the
parameter Num and hence coeff_token may choose a different
look-up table, resulting in possible compression
overhead.
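The table-selection effect described in Sect. 5.3.1 can be sketched as follows. The sketch assumes the neighbour-averaging rule and the value ranges (below 2, below 4, below 8, otherwise fixed-length) commonly described for H.264 CAVLC; the paper's Eq. (2) remains the authoritative definition. The table labels `VLC_A`, `VLC_B`, `VLC_C` and `FLC` are neutral placeholders rather than the paper's table names, and neighbours outside the 4 × 4 grid are simply treated as unavailable.

```python
def predicted_num(n_upper, n_left):
    """Combine the upper and left neighbours' TotalCoeff values.

    None marks an unavailable neighbour (a simplification of the
    availability rules in the standard).
    """
    if n_upper is not None and n_left is not None:
        return (n_upper + n_left + 1) >> 1  # rounded-up average
    if n_upper is not None:
        return n_upper
    if n_left is not None:
        return n_left
    return 0


def select_table(num):
    """Map the predicted value to a coeff_token look-up table label."""
    if num < 2:
        return "VLC_A"
    if num < 4:
        return "VLC_B"
    if num < 8:
        return "VLC_C"
    return "FLC"


def tables_for_grid(total_coeffs, width=4):
    """Table chosen for each subblock of a row-major width x width grid."""
    tables = []
    for idx in range(len(total_coeffs)):
        row, col = divmod(idx, width)
        upper = total_coeffs[idx - width] if row > 0 else None
        left = total_coeffs[idx - 1] if col > 0 else None
        tables.append(select_table(predicted_num(upper, left)))
    return tables
```

Moving a single subblock changes the neighbours seen by the subblocks next to its old and new positions, so `tables_for_grid` can return different labels for the same set of subblocks in a different order. This mismatch between the selected table and the actual TotalCoeff is the source of the overhead discussed above.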
In the following, we analyze the relationships among
QP, ΔQP and compression overhead. Based on Eq. (2),
when the number of non-zero coefficients in a subblock is
> 8, the table Num_FLC, which uses fixed-length coding,
is selected. For spatial scalability, as ΔQP decreases
(in the x-axis direction of Fig. 9), the number of non-zero
coefficients in more and more subblocks becomes < 8 (but
there are still more subblocks which contain non-zero
coefficients). As such, tables Num_VLC0, Num_VLC2, and
Num_VLC3 will be selected. As a result, the subblock
permutation introduces more compression overhead for
spatial scalability, as shown in Fig. 9. For quality
scalability, when ΔQP is large (e.g., QP ∈ [6, 20)), the
compression overhead has the same trend
as that of spatial scalability (i.e., compression overhead
increases when ΔQP decreases). However, for QP > 20
(i.e., when ΔQP becomes small), the compression overhead
decreases because more and more subblocks have no non-zero
coefficients, so that subblock permutation has little effect
on compression efficiency. A similar characteristic can be
observed in Group-MB mode as shown in Fig. 10.
Since the overhead in Group-MB mode is similar to that
of Intra-MB mode, its analysis is omitted for simplicity.
5.3.2 Compression overhead in 4Group-MB mode
The compression overhead in 4Group-MB mode has
the same cause as that discussed in Sect. 5.3.1, but it
is smaller. This is because 4Group-MB
mode organizes subblocks based on their TotalCoeff values in
order to lower the effect on NU and NL when selecting tables
Num_VLC0, Num_VLC2, Num_VLC3 and Num_FLC.
Based on the above discussion and experimental results,
one can flexibly choose among the Intra-MB, Group-MB and
4Group-MB modes to encrypt SVC streams with different
features and meet various requirements.
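The intuition behind grouping by TotalCoeff can be illustrated with a hypothetical sketch. It is not the 4Group-MB procedure itself, which is defined in Sect. 4; the group boundaries (below 2, below 4, below 8, otherwise) are an assumption borrowed from the usual CAVLC value ranges, `random.Random(key)` stands in for the AES-based generator, and the function names are illustrative. Subblocks are shuffled only among positions whose TotalCoeff falls in the same group, so each position keeps a TotalCoeff from the same range as before.

```python
import random
from collections import defaultdict


def coeff_group(total_coeff):
    """Assumed mapping from TotalCoeff to one of four groups."""
    if total_coeff < 2:
        return 0
    if total_coeff < 4:
        return 1
    if total_coeff < 8:
        return 2
    return 3


def permute_within_groups(total_coeffs, key):
    """Shuffle values only among positions of the same group."""
    rng = random.Random(key)  # stand-in for a keyed cipher stream
    positions = defaultdict(list)
    for idx, tc in enumerate(total_coeffs):
        positions[coeff_group(tc)].append(idx)
    out = list(total_coeffs)
    for group in sorted(positions):
        targets = positions[group]
        sources = targets[:]
        rng.shuffle(sources)
        for dst, src in zip(targets, sources):
            out[dst] = total_coeffs[src]
    return out
```

Under these assumptions the group seen at every position is unchanged by the permutation, which limits how far the neighbour-based prediction can drift. The same dependence on TotalCoeff is what makes such a grouping fragile under re-quantization.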
5.4 Comparison
Table 1 summarizes the performance of typical SVC
encryption schemes, where ‘‘Sign encryption’’ refers to
sign encryption schemes [13, 15–17], ‘‘Sign&DC encryption’’
represents sign encryption combined with DC
encryption [14], ‘‘Format-preserved’’ refers to the scheme
in [21], which preserves the format-compliance property, and
‘‘Adaptation-preserved’’ denotes bitstream encryption
schemes [19, 20, 22, 23], which preserve the
adaptation-transparency property.
Fig. 9 Compression overhead in Intra-MB mode (x-axis: quantization parameter of enhancement layer; y-axis: compression overhead, %)
Fig. 10 Compression overhead in Group-MB mode (x-axis: quantization parameter of enhancement layer; y-axis: compression overhead, %)
Fig. 11 Compression overhead in 4Group-MB mode (x-axis: quantization parameter of enhancement layer; y-axis: compression overhead, %)
Compared with encryption-with-compression schemes, BBES
incurs compression overhead, whereas those schemes incur
none except for ‘‘Sign&DC’’, as shown in the last row of
Table 1; in exchange, BBES offers additional properties.
Firstly, BBES is more secure than encryption-with-compression
because SVC codestreams protected by
encryption-with-compression can easily be attacked so as to
expose sensitive information, e.g., if adversaries simply set
all signs of the DCT coefficients to positive or to negative,
the semantic content of the SVC codestreams will appear [18].
Secondly, Intra-MB and Group-MB modes preserve transcoding
transparency, as shown in row 3 of Table 1.
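The sign-setting attack mentioned in Sect. 5.4 relies on the fact that sign encryption leaves coefficient magnitudes untouched. The toy sketch below illustrates only that fact; `random.Random(key)` is a stand-in for a cipher keystream, and the function names are hypothetical. How much visual content the magnitudes reveal is the claim made in [18] and is not demonstrated here.

```python
import random


def sign_encrypt(coeffs, key):
    """Flip the sign of each coefficient according to a keyed bit stream."""
    rng = random.Random(key)  # stand-in for a cipher keystream
    return [-c if rng.getrandbits(1) else c for c in coeffs]


def force_signs_positive(coeffs):
    """Keyless attack step: set every sign to positive."""
    return [abs(c) for c in coeffs]
```

Whatever the key, `force_signs_positive(sign_encrypt(c, key))` equals `force_signs_positive(c)`, so the adversary obtains exactly the magnitude information of the plaintext coefficients.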
Compared with encryption-after-compression, 4Group-MB
mode has the least compression overhead. In addition,
although Intra-MB, which is the simplest scheme, has
slightly higher compression overhead and Group-MB
introduces the highest compression overhead, they offer
additional properties in exchange, e.g., adaptation
transparency, format compliance and transcoding transparency.
Specifically, row 3 of Table 1 indicates that Intra-MB and
Group-MB modes preserve transcoding transparency.
Meanwhile, according to rows 4 and 5, BBES simultaneously
possesses the format-compliance and adaptation-transparency
properties.
6 Conclusion
Based on the macroblock prediction characteristics of the SVC
EL coding technique, BBES permutes both macroblocks
and subblocks in Intra-MB, Group-MB and 4Group-MB
modes using secure pseudo-random block permutations.
Our analysis and experimental results show that all
three modes of BBES preserve the adaptation-transparency
property of H.264/SVC. Furthermore, Intra-MB and
Group-MB modes are transcoding-transparent. These properties
allow a MANE to directly adapt or transcode encrypted
SVC bitstreams for ELs. In addition, BBES is
format-compliant and incurs small computational and
compression costs. These properties make BBES highly
suitable for perceptual/transparent encryption of H.264/SVC
bitstreams in applications such as pay TV broadcasting.
Acknowledgments This work was supported in part by A*STAR
SERC Grant No. 102 101 0027 in Singapore.
References
1. Schwarz, H., Marpe, D., Wiegand, T.: Overview of the scalable video coding extension of the H.264/AVC standard. IEEE Trans Circuits Syst Video Technol 17(9), 1103–1120 (2007)
2. Apostolopoulos, J.G., Wee, S.J.: Secure scalable streaming enabling transcoding without decryption. In: Proceedings of the IEEE International Conference on Image Processing, pp. 437–440 (2001)
3. Gergely, V., Feher, G.: Enhancing progressive encryption for scalable video streams. In: Conference on Information and Communications Technologies, pp. 51–58 (2009)
4. Sohn, H., Neve, W.D., Ro, Y.M.: Region-of-interest scrambling for scalable surveillance video using JPEG XR. ACM Multimedia, Barcelona, pp. 861–864 (2009)
5. Sohn, H., Neve, W.D., Ro, Y.M.: Privacy protection in video surveillance systems: Analysis of subband-adaptive scrambling in JPEG XR. IEEE Trans Circuits Syst Video Technol 21(2), 170–177 (2011)
6. Yuan, C., Zhu, B.B., Wang, Y., Li, S., Zhong, Y.: Efficient and fully scalable encryption for MPEG-4 FGS. IEEE Int Symp Circuits Syst 2, 620–623 (2003)
7. Yuan, C., Zhu, B.B., Wang, Y., Li, S., Zhong, Y.: Scalable protection for MPEG-4 fine granularity scalability. IEEE Trans Multimed 7, 222–233 (2005)
8. Shahid, Z., Chaumont, M., Puech, W.: Selective and scalable encryption of enhancement layers for dyadic scalable H.264/AVC by scrambling of scan patterns. In: Proceedings of the International Conference on Image Processing, pp. 1273–1276 (2009)
9. Stutz, T., Uhl, A.: A survey of H.264 AVC/SVC encryption. IEEE Trans Circuits Syst Video Technol 22(3), 325–339 (2012)
10. Norcen, R., Uhl, A.: Encryption of wavelet-coded imagery using random permutations. In: Proceedings of the IEEE International Conference on Image Processing, pp. 3431–3434 (2004)
11. Zeng, W., Lei, S.: Efficient frequency domain selective scrambling of digital video. IEEE Trans Multimed 5(1), 118–129 (2003)
12. Reichel, J., Schwarz, H., Wien, M.: Joint Scalable Video Model JSVM-19, doc. Joint Video Team (JVT) of ISO/IEC MPEG & ITU-T VCEG (2011)
13. Won, Y.G., Bae, T.M., Ro, Y.M.: Scalable protection and access control in full scalable video coding. In: Proceedings of the International Workshop on Digital Watermarking, pp. 407–421 (2006)
14. Algin, G.B., Tunali, E.T.: Scalable video encryption of H.264 SVC codec. J Visual Commun Image Represent 22(4), 353–364 (2011)
15. Park, S.W., Shin, S.U.: Efficient selective encryption scheme for the H.264/scalable video coding (SVC). In: Proceedings of the 4th International Conference on Networked Computing and Advanced Information Management, pp. 371–376 (2008)
16. Li, C.H., Zhou, X.X., Zhong, Y.Z.: NAL level encryption for scalable video coding. In: Proceedings of the Pacific-Rim Conference on Multimedia, pp. 496–505 (2008)
17. Li, C.H., Yuan, C., Zhong, Y.Z.: Layered encryption for scalable video coding. In: Proceedings of the 2nd International Congress on Image and Signal Processing, pp. 1–4 (2009)
Table 1 Comparison between BBES and other schemes
                         | Encryption-with-compression                   | Encryption-after-compression            | BBES
Property                 | Sign encryption | Sign and DC encryption [14] | Format-preserved | Adaptation-preserved | Intra-MB | Group-MB | 4Group-MB
Transcoding-transparency | No              | No                          | No               | No                   | Yes      | Yes      | No
Adaptation-transparency  | Yes             | Yes                         | No               | Yes                  | Yes      | Yes      | Yes
Format-compliance        | Yes             | Yes                         | Yes              | No                   | Yes      | Yes      | Yes
Average overhead (%)     | No              | 15                          | 2.35             | 1.46                 | 2.095    | 4.12     | 0.545
18. Wu, C.P., Kuo, C.C.J.: Fast encryption methods for audiovisual data confidentiality. In: Proceedings of SPIE in Multimedia Systems and Applications III, pp. 284–295 (2000)
19. Lian, S.G.: Secure service convergence based on scalable media coding. Telecommun Syst 45(1), 21–35 (2010)
20. Magli, E., Grangetto, M., Olmo, G.: Transparent encryption techniques for H.264/AVC and H.264/SVC compressed video. Signal Process 91(5), 1103–1114 (2011)
21. Hellwagner, H., Stutz, T., Kuschnig, R., Uhl, A.: Efficient in-network adaptation of encrypted H.264/SVC content. Signal Process Image Commun 24(9), 740–758 (2009)
22. Arachchi, H.K., Perramon, X., Dogan, S., Kondoz, A.M.: Adaptation-aware encryption of scalable H.264/AVC video for content security. Signal Process Image Commun 24(6), 468–483 (2009)
23. Thomas, N., Bull, D., Redmill, D.: A novel H.264 SVC encryption scheme for secure bit-rate transcoding. In: Proceedings of the Picture Coding Symposium, pp. 1–4 (2009)
24. ITU-T Recommendation H.264 & ISO/IEC 14496 AVC. Advanced video coding for generic audio-visual services. ITU-T and ISO/IEC JTC 1 Recommendation H.264 and ISO/IEC 14496-10 (MPEG-4) AVC (2003)
25. Katz, J., Lindell, Y.: Introduction to Modern Cryptography. Chapman & Hall/CRC, London (2008)
26. Hofbauer, H., Uhl, A.: An effective and efficient visual quality index based on local edge gradients. In: Proceedings of the 3rd European Workshop on Visual Information Processing (EUVIP), pp. 162–167 (2011)
27. Wang, Z., Bovik, A.C., Sheikh, H.R., Simoncelli, E.: Image quality assessment: From error visibility to structural similarity. IEEE Trans Image Process 13(4), 600–612 (2004)
28. Mao, Y., Wu, M.: Security evaluation for communication-friendly encryption of multimedia. In: Proceedings of the IEEE International Conference on Image Processing, pp. 1522–4880 (2004)
29. Yao, Y., Xu, Z., Li, W.: Visual security evaluation for video encryption. In: Proceedings of the 3rd International Conference on Communications and Networking in China, pp. 1317–1322 (2008)
30. Tong, L., Dai, F., Zhang, Y., Li, J.: Visual security evaluation for video encryption. ACM Multimedia, Barcelona, pp. 835–838 (2010)
31. Hemami, S.S., Rouse, D.: Natural image utility assessment using image contours. In: Proceedings of the IEEE International Conference on Image Processing, pp. 2217–2220 (2009)