An Efficient Security System for CABAC Bin-Strings of H.264/SVC

IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, VOL. 23, NO. 3, MARCH 2013 425


Mamoona Naveed Asghar and Mohammad Ghanbari, Fellow, IEEE

Abstract—The distribution of copyrighted scalable video content to differing digital devices requires protection during rendering and transmission. In this paper, we propose a complete security system for H.264/scalable video coding (SVC) video codec and present a solution for the bit-rate and format compliance problems by careful selection of entropy coder syntax elements (bin-strings) for selective encryption (SE), and the problem of managing multiple layer encryption keys for scalable video distribution. A standard key management protocol, multimedia Internet keying protocol, is implemented for the hierarchical key generation mechanism, in which a subscriber has only one encryption key to unlock all scalable layers that have been subscribed to. The evaluation demonstrates the resulting video quality degradation arising from SE for many CIF and 4CIF test video sequences, without there being any impact upon the bit-rate or format compliancy, and with small computational delay. The security and statistical analysis performed further verify the effectiveness of the proposed security system for H.264/SVC. The proposed system is highly suitable for video distribution to users who have subscribed to a varying degree of video quality on devices with medium to high computational resources.

Index Terms—AES-CFB algorithm, context adaptive binary arithmetic coding (CABAC), H.264/scalable video coding (SVC), key management, multimedia Internet keying protocol (MIKEY), security system.

I. Introduction

WITH ADVANCES in digital media, increases in processing power and network bandwidth, numerous multimedia applications and codecs have evolved in recent years. The joint video team of the ITU-T VCEG and the ISO/IEC MPEG have standardized scalable video coding (SVC) that is an extension of the state-of-the-art H.264/AVC standard [1], [2]. Scalable video coding (H.264/SVC) [3] permits the transmission and decoding of partial bit-streams to provide video services at various temporal, spatial and/or quality resolutions, as well as preserving a reconstruction quality that is high enough relative to the rate of the partial bit-streams.

Copyrighted digital content is always vulnerable to plagiarism attacks by its ease of copying and modification. Therefore, the concerns that exist about their protection and authentication are significant. Cryptography is a conventional

Manuscript received January 24, 2012; revised March 14, 2012; accepted March 25, 2012. Date of publication June 14, 2012; date of current version March 7, 2013. This paper was recommended by Associate Editor M. Barni.

The authors are with the School of Computer Science and Electronic Engineering, University of Essex, Colchester CO4 3SQ, U.K. (e-mail: [email protected]; [email protected]).

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/TCSVT.2012.2204941

technique that has been used for many decades to provide multimedia content protection. To improve the runtime performance and to avoid computational complexities, selective and partial encryption for video content security has been suggested. The selective encryption is carried out on the most significant information at a choice of different stages of the codec, such as on the original pixels, the transform coefficients, the quantization indexes, the bit-planes, the entropy coder, or the final bit-stream [4]. Encryption alters the video statistics, resulting in the issues of bit-rate overhead and format compliancy at the decoder. However, applying the encryption at the entropy coding stage minimizes these problems.

The cryptographic algorithms used for encryption are never a secret; their steps are visible to the public. The object that should be hidden from the public and unauthorized access is the key used for encryption by cryptographic algorithms. Kerckhoffs’ principle [5] declares that the rival can know the chosen cipher algorithm but not the key, and thus the security of the key is imperative for data security. Therefore, the proposed security system for scalable video coding should not only involve the protection of data by encryption, but also incorporate the protection of secret values (keys) with a solution that provides management of multiple keys with minimal overhead.

Key generation and distribution is a crucial issue to tackle in order to further enhance the security of any cipher algorithm. This field of study has not been given enough attention in past studies on multimedia security. So, to fulfill the current need for scalable content distribution, we have devised an efficient security system including digital rights management (DRM) state, giving sufficient encryption [6] with a key management mechanism to further enhance the protection of the encryption key. The idea behind sufficient encryption is that the scalable contents should have enough security with selective encryption and must reach the user in a scrambled form, rather than be watchable with a single encryption key for all layers. Receiving the upper layers in a scrambled form enhances a user’s interest in subscribing to the upper high-quality video layers.

The proposed security system includes: 1) careful selection of context adaptive binary arithmetic coding (CABAC) syntax elements (bin-strings) for selective encryption (SE); 2) application of SE on bin-strings of H.264/SVC on a per-layer basis by using advanced encryption standard (AES) with a cipher feedback mode (AES-CFB) in a compression-friendly and format compliant manner; and 3) a key management (generation and distribution) scheme to improve the content

1051-8215/$31.00 © 2012 IEEE


security of H.264/SVC layers by using the IETF standard key management protocol MIKEY.

The remainder of this paper is organized as follows. Section II overviews the prior research in the area of H.264 protection and key management. Section III describes elements of the H.264/SVC codec, specifically CABAC entropy coding used in this paper. It also reviews the key management protocol (MIKEY) and cipher algorithm (AES-CFB). Section IV describes the proposed security system features and implementation. Section V elaborates the experimental results on video appearance after applications of SE with a detailed performance analysis. Finally, Section VI contains conclusions, with some proposals for future work.

II. Prior Research on H.264 Security

Many multimedia selective content encryption methods have been proposed over the last five years [7], [8], for the security of the latest H.264/AVC standard video codec. The study in [9] describes the SE on I-frames by extracting them from the H.264 bit-stream and using the AES algorithm for ciphering and deciphering. The scheme reduces the computational cost, but it is nonformat-compliant. Additionally, the idea is not suitable for selective encryption, because I-frame encryption is not as significant as encryption of other encoding components, e.g., data part A from the data partition mode of H.264/AVC [10]. The idea of scrambling DCT coefficients [11] was applied by Wang et al. [12], but it degraded the compression efficiency. Selective scrambling of bits, DCT coefficients, and motion vectors were proposed by Zeng et al. [13], which also degraded the compression efficiency. Spinsante et al. [14] proposed H.264/AVC partial encryption of quantization parameters (QP), deblocking filter coefficients, and intraprediction mode, one by one, and altogether for the final outcome. The selected parameters increase the bit-rate and the encryption algorithm that is described for the results is inefficient. Fan et al. [15], [16] presented a novel video encryption scheme for H.264/AVC. Three different block and stream cipher algorithms, AES, fast leak extraction (FLEX), and XOR, are used to encrypt the H.264/AVC stream. The work describes the unequal secure encryption approach in which they classified the important and unimportant video content by using data partitioning. The important data content is encrypted by AES, the least important by FLEX, and the XOR technique is used to show the alternative simple encryption. This paper is a significant contribution toward H.264/AVC selective encryption, but the computational cost can be further minimized by not encrypting the identified unimportant video data.

Limited research has been carried out on the security of SVC [17]. Apostolopoulos [18] investigated a secure scalable streaming framework that provides end-to-end security and in-network secure transcoding for content using SVC [19], [20]. The NAL level encryption for H.264/AVC and SVC is also proposed in the study of [21]. The NAL units are individually encrypted after the compression so that they have no side effect on compression efficiency and format compliancy of the bit-stream. The scheme is applied by setting the NAL unit type of encrypted NALs outside the defined range, and the decoder is forced to reject those NALs unless encryption is enabled. The scheme is only applied to the SVC enhancement layers, with a small effect on bit-rate. In [22] and [23], a low-quality free preview application was already developed by performing transparent encryption and conditional access on the H.264/SVC layers. The scalable enhancement layers are encrypted by using AES, while leaving the base layer in a plain format. However, the authors pointed out that the enhancement layers are nonformat-compliant.

On the other hand, there is a conviction that if the base layer is protected then no one can get the data from the enhancement layers and the whole SVC bit-stream is secured. The research shows [24] that if objects are encrypted in this way, the real content can be easily guessed without decryption. Consequently, Algin et al. [25] proposed the idea of SE on SVC with three security levels. The idea concerns the encryption of signs of coefficients, signs of motion vectors, and alteration of DC values. The sign encryption has no effect on bit-rate and compression efficiency (as the signs are equally distributed), but DC value alterations affect the compression efficiency.

There has been some recent work on key generation and distribution for standard and scalable video coding. Li et al. [26], [27] devised an NAL level selective encryption technique for H.264/SVC. The scheme encrypts the instantaneous decoding refresh (IDR) pictures, sequence parameter set (SPS), picture parameter set (PPS) on individual NAL units [26] and intraprediction modes (IPM) with signs of textures for base layer [27] by using the stream cipher leak extraction (LEX) algorithm. The LEX uses three keys for each of the three NAL units. The study pointed out required future work, such as a key management scheme, which is a foremost issue in the security of any cipher algorithm. Wang et al. [28] demonstrated the idea of hierarchical key generation for the cipher algorithm to encrypt the partial H.264/AVC video content, including the intraprediction mode, motion vector differences (MVDs), and quantization coefficients. Every frame has a unique key in the whole video as in the group of pictures (GOP) key generation design by Yuan et al. [29]. Three subkeys are also derived, each for the encryption of intraprediction mode, motion vector differences, and quantization coefficients. Therefore, if an attacker can hack the frame key, he can decipher the frame but cannot obtain the frame contents. This appears to be a derivative of MIKEY with multiple key overhead but with reduced efficacy. Perhaps, it was done to reduce the computational cost. Nevertheless, we feel that using the derivative instead of MIKEY has weakened the security, specifically in wireless transmission. The study in [30] investigated the scalable layer protection with individual layer keys. The keys are generated for individual scalable NAL units, meaning N different keys are derived and distributed to decode the individual layer. The scheme is complicated and has a high computational cost for identifying the NAL units related to the scalable features. The same selective parameters for encryption are extended for the protection of the region of interest (ROI) with a stream cipher [31]. Park et al. [32], [33] designed a hierarchical key management scheme for the selective encryption of SVC. The study in [32] proposed the scheme of partial encryption of base and enhancement layers. The IPM, motion vector


differences, and residual (texture sign bits) are encrypted in the base layer. For the security of spatial and signal-to-noise ratio (SNR) scalability layers, the texture sign bits in every layer are encrypted, but for temporal scalability layers, the MVD sign bits and texture sign bits are encrypted. The authors in [33] devised a key management scheme by creating multiple keys, i.e., all layer keys are generated with the help of a MD5 hash [34]. The NAL unit key is generated by a Hash message authentication code (HMAC) [35] and features created by the absolute DC and some threshold values. The key management scheme provides robustness against the known brute-force attack due to the different NAL unit keys.

All the reviewed studies have their own devised key management mechanisms, but they do not refer to any standard key management protocol. The author in [36] pointed out that the earlier data communication protocols and standards had very few security features. Generally, the security was handled at a system level that uses the communication protocols. But, these days, the communication protocols alone cannot handle the proliferating security demands of digital devices (smart phones, tablets, netbooks, and laptops). Hence, there is a need for a key management mechanism to enhance the functionality of the communication security protocols.

III. Overview of H.264/SVC

A. H.264 Scalable Video Coding

H.264/SVC technology permits devices to send and receive multilayered bit-streams; it allows the transmission and decoding of partial bit-streams to provide video services with different frame rates, spatial resolutions (picture size), and quality (SNR). Scalable video has a base and a number of enhancement layers containing various improvements in frame rates, resolution, and quality per layer. Considering encryption will alter data characteristics, it should be applied where it has a minimal side effect. This can be achieved by applying the encryption as part of the entropy coding where all the natural redundancies have already been exploited for maximum compression efficiency. The problem with joint security and compression is to make sure that the security will not affect the compression efficiency [37]. Due to this, the encryption at entropy coding needs to be handled with great care, since tampering with the statistical dependence of the symbols will harm the compression efficiency.

The entropy coding used in H.264/AVC and its extension H.264/SVC is context adaptive, and is applied in the two forms of Huffman and arithmetic coding [38]. Our purpose of choosing CABAC over its Huffman counterpart, context adaptive variable length coding (CAVLC), is based on the greater range of parameters for encryption that CABAC provides over CAVLC with more compression efficiency. The H.264 main profile and the various high profiles, which deal with 4CIF resolution pictures and above, support CABAC. Thus, it appears that the multiscale video distribution of the future will support CABAC. Currently, the outlook is for full VGA resolution on standard streaming mobile applications (e.g., Apple’s FaceTime), full 720p high definition (HD) on mobile devices, and full 1080i HD for desktop streaming,

Fig. 1. CABAC encoder top view.

which will require a reduction in bit-rate that can be supported by CABAC.

B. Context-Adaptive Binary Arithmetic Coding

CABAC [39] is one of the entropy coding modes used by H.264/SVC to achieve high compression and can be easily computed on devices with medium-to-high computational resources. To make entropy coding computationally efficient, both CAVLC and CABAC use the single infinite extent codeword, called Exp-Golomb code [40], to generate the required code for most of the data elements [41].

CABAC encoding (Fig. 1) is based on three steps: 1) binarization; 2) context modeling (CM); and 3) binary arithmetic coding (BAC). The binarization step is the elementary stage for the CABAC encoder. Here, the input nonbinary syntax elements, such as the quantized transform coefficients, macroblock (MB)-type specifier, or motion vector components, are converted into unique binary codewords, known as bin-strings, for a given syntax element. The bit position in each bin-string, known as a bin, is then passed to one of the two coding mode decisions: regular coding mode and bypass coding mode. The bins in regular coding mode are passed to the next step, CM and probability distribution, and then encoded by the regular BAC engine. The bins from the bypass coding mode skip the CM step and directly enter the bypass BAC engine for the encoding process. These bins are related to the sign information of MVD and the signs of transform coefficient levels, or to lower significant bins which are assumed to be uniformly distributed.

CABAC uses five binarization schemes according to syntax elements similar to Huffman trees for binary sequences, which are as follows.

1) Unary code: Each unsigned integer value symbol x ≥ 0 is mapped onto x “1” bits followed by a “0” terminating bit.

2) Truncated unary (TU) code: Defined for x with 0 ≤ x ≤ CV (cutoff value); x is coded with a unary code if x < CV. If x = CV, the terminating 0 bit is neglected and the TU codeword comprises x “1” bits only.

3) kth-order Exp-Golomb (EGk) code: It is a derivative of Golomb codes [30]. Each unsigned integer value symbol x is mapped onto two sequential bit strings: a prefix and a suffix. The prefix part of the EGk codeword consists of a unary code with $l_s$ bits of 1 and one termination bit 0. The length $l_s$ of the prefix string of 1 bits is $l_s = \lfloor \log_2(x/2^k + 1) \rfloor$, and the EGk suffix part is computed as the binary representation of $x + 2^k(1 - 2^{l_s})$, which uses $k + l_s$ significant bits. In the kth order of EGk, the length of the codeword shared by all symbols with the same $l_s$ is therefore $k + (2 \times l_s) + 1$ bits.

4) Fixed-length (FL) code: This FL binarization is commonly applied to syntax elements with fairly uniform distribution, where each bit in the FL binary format represents a specific coding decision, e.g., coded block pattern (CBP) symbol related to the luma residual data part. In FL, a symbol x within a finite size of cutoff value CV is represented by a codeword of length $l_{FL} = \lceil \log_2 CV \rceil$.

5) Concatenation of the first and third scheme (UEGk): There are three situations where concatenations of the four basic types are used.

a) UFL: coded_block_pattern is encoded using a 4-bit FL prefix for luma and a TU suffix with cutoff value CV = 2 for chroma.

b) UEG3: motion vector differences are encoded with a concatenation of a unary prefix and a third-order Exp-Golomb code suffix. For a value MVD, the prefix is a TU coding with the cutoff value CV = 9 of the value min(|MVD|, 9), or, if MVD = 0, just the bit 0. If |MVD| ≥ 9, a suffix is output with the value |MVD| − 9 using the EG3 code. A sign bit is then output if |MVD| > 0: 0 if MVD is positive and 1 otherwise.

c) UEG0: absolute values of transform coefficient levels are coded using a TU prefix with cutoff value CV = 14 and the EG0 suffix. The syntax elements (coeff_abs_value_minus1 = abs_level − 1) are coded by using this scheme, while the zero-valued coefficient levels are encoded using a significance map.
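The binarization schemes listed above can be sketched in a few lines of Python. This is a minimal illustration, not the reference encoder: the function names are hypothetical, bin-strings are represented as text strings of "0" and "1", and the EGk routine follows the usual iterative construction, which should agree with the closed-form prefix and suffix lengths given in item 3.

```python
def unary(x: int) -> str:
    """Unary code: x one-bits followed by a terminating zero."""
    return "1" * x + "0"


def truncated_unary(x: int, cutoff: int) -> str:
    """TU code: unary for x < cutoff; at x == cutoff the final zero is dropped."""
    if not 0 <= x <= cutoff:
        raise ValueError("x must satisfy 0 <= x <= cutoff")
    return "1" * x + ("0" if x < cutoff else "")


def exp_golomb(x: int, k: int) -> str:
    """kth-order Exp-Golomb code of an unsigned integer x."""
    bits = []
    while x >= (1 << k):          # unary part: one '1' per doubling step
        bits.append("1")
        x -= 1 << k
        k += 1
    bits.append("0")              # terminating bit of the unary part
    if k > 0:                     # remaining value written on k bits
        bits.append(format(x, "b").zfill(k))
    return "".join(bits)


def ueg0(abs_level_minus1: int) -> tuple[str, str]:
    """UEG0 for coefficient levels: (TU prefix with cutoff 14, EG0 suffix)."""
    prefix = truncated_unary(min(abs_level_minus1, 14), 14)
    suffix = exp_golomb(abs_level_minus1 - 14, 0) if abs_level_minus1 >= 14 else ""
    return prefix, suffix


def ueg3(mvd: int) -> tuple[str, str, str]:
    """UEG3 for motion vector differences: (TU prefix, EG3 suffix, sign bit)."""
    magnitude = abs(mvd)
    prefix = truncated_unary(min(magnitude, 9), 9)
    suffix = exp_golomb(magnitude - 9, 3) if magnitude >= 9 else ""
    sign = "" if mvd == 0 else ("1" if mvd < 0 else "0")
    return prefix, suffix, sign
```

For instance, `ueg0(3)` yields a four-bin prefix and an empty suffix, whereas `ueg0(14)` yields fourteen prefix bins and a one-bin suffix, so only the larger coefficient levels carry suffix bins at all.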

C. Multimedia Internet Keying Protocol

MIKEY [42] is designed to tackle the key exchange problems, especially in real-time networks. The key management protocol is devised to enable end-to-end security, i.e., only the participants involved in the communication have authorized access to the generated key(s) and hence to the content. MIKEY uses a total of eight keys. The keys are generated on either the sender side or both sides (sender and receiver) and are described as the:

1) traffic generation key (TGK);
2) traffic encryption key (TEK);
3) encryption keys (one for each sender and receiver);
4) authentication keys (one for each sender and receiver);
5) salting keys (one for each sender and receiver).

MIKEY supports five methods for transporting and establishing a TGK or to set up a common secret, for all communication scenarios, by using a preshared key, public-key encryption, Diffie–Hellman (DH) key exchange, HMAC-authenticated DH, and reverse RSA. MIKEY has the capability of establishing keys and parameters for more than one security protocol (or for multiple instances of the same security protocol) at the same time. The TEK can be used directly by the security protocol or it can be used to derive further master keys from the TEK. It is, however, up to the security protocol to define how the TEK is used.

D. Advanced Encryption Standard

The AES [43] is based on a modified substitution–permutation network. AES can use keys of lengths 128 bits, 192 bits, and 256 bits. For both ciphering and deciphering, the AES algorithm uses a round function that is comprised of four different byte-oriented transformations.

AES is basically a symmetric key block cipher using a 128-bit block size, but it can be used as a stream cipher in cipher feedback (CFB), output feedback (OFB), and counter modes. In selective encryption, a small number of bytes are encrypted, so implementing the AES as a stream cipher is recommended [44]. Among the above-mentioned three modes, the CFB mode is used to build a self-synchronization stream cipher that provides confidentiality at the transmitter and receiver. This property makes this mode a valid choice for the real-time video applications. However, the scheme could adopt another mode, such as OFB, and in this case more protection against errors in transmission would become available, at a cost in the lack of self-synchronization. Thus, the choice of mode is not critical to the proposed scheme.
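The two properties of CFB mode that matter here, an output of exactly the input length and self-synchronization after a corrupted block, can be illustrated with a short Python sketch. It is an illustration under stated assumptions only: the Python standard library has no AES, so `block_fn` below is a hypothetical stand-in (a truncated HMAC-SHA256) and is not AES; only the feedback structure is the point.

```python
import hashlib
import hmac

BLOCK = 16  # bytes, matching the 128-bit AES block size


def block_fn(key: bytes, block: bytes) -> bytes:
    """Stand-in for the AES block encryption (NOT AES): a keyed 16-byte PRF."""
    return hmac.new(key, block, hashlib.sha256).digest()[:BLOCK]


def cfb_encrypt(key: bytes, iv: bytes, plaintext: bytes) -> bytes:
    """Full-block CFB: C_i = P_i XOR E_K(C_{i-1}), with C_0 = IV."""
    out = bytearray()
    feedback = iv
    for i in range(0, len(plaintext), BLOCK):
        chunk = plaintext[i:i + BLOCK]
        keystream = block_fn(key, feedback)
        cipher_chunk = bytes(p ^ k for p, k in zip(chunk, keystream))
        out += cipher_chunk
        feedback = cipher_chunk   # the ciphertext feeds the next block
    return bytes(out)


def cfb_decrypt(key: bytes, iv: bytes, ciphertext: bytes) -> bytes:
    """Decryption uses the same block function in the same direction."""
    out = bytearray()
    feedback = iv
    for i in range(0, len(ciphertext), BLOCK):
        chunk = ciphertext[i:i + BLOCK]
        keystream = block_fn(key, feedback)
        out += bytes(c ^ k for c, k in zip(chunk, keystream))
        feedback = chunk          # received ciphertext, even if corrupted
    return bytes(out)
```

Because the feedback register is refilled from received ciphertext, a corrupted ciphertext block damages only its own plaintext block and the following one; later blocks decrypt correctly again, which is the self-synchronization referred to above.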

The AES is chosen for encryption because of its strength against all exhaustive key search attacks. It is estimated that the time required for breaking a 128-bit key by applying all possible keys at 50 billion keys/s takes 5 × 10^21 years [45].

IV. Proposed Security System

When the same copyrighted multimedia content is distributed to multiple users with different scalable features, there is a need for transmitting scalable coded layers separately, hence demanding individual layer security. We choose selective encryption on bin-strings of the CABAC entropy coder, which are the input to the probability or context model and are finally coded with a binary arithmetic coder. The research aim is to devise an efficient security system that provides sufficient encryption and a key management mechanism for SVC layers.

A. Bins Selection for Selective Encryption

The CABAC coder has multiple parameters (bin-strings) that can be encrypted; for example, transform coefficients (TC), MVD, delta quantization parameters (dQP), and the arithmetical signs of TC and MVD. To make the SE more effective, we need to choose sensibly the parameters for the encryption. There are two constraints in parameters selection.

1) Compression friendliness specifies that the SE must not disturb the compression efficiency of the encoder; otherwise, the SE would increase the encrypted data size to be transferred for a given bandwidth. It can be controlled by keeping the size of the encrypted bin-string (codewords) the same as the size of the input bin-string, and also by keeping the context model unchanged for the given syntax element.

2) Format compliance means the SE must not change the overall video statistics that would otherwise make the SVC decoder complain about decoding the selectively encrypted bit-stream.

To fulfill the above two constraints we can make some recommendations for SE. Some of these recommendations are made on the basis of experimental results (to be described in Section V), while others pertain to the nature of syntax elements. The SE should not be applied on the following.

1) The intracoded syntax elements having a relationship with neighboring MB syntax elements, like intra DC and AC: Because it increases the bit-rate and drift in the values of syntax elements, and also the bit-stream will not be decodable at some stage.

2) The intercoded syntax elements like MVDs: Because this prediction residual used to predict the future MBs alters the video statistics by changing magnitudes and increases the bit-rate, while the bit-stream can be decodable.

3) Delta QP syntax element: Because it causes bit-rate fluctuation, either increasing or decreasing the overall bit-rate according to new encrypted dQP values.

4) MB header information (encoded first in CABAC encoding): Because it is used for the prediction of future MBs, this is related to format compliance.

5) Coded-block-flag (CBF): Because it makes the bit-stream nonformat-compliant. Every 4 × 4 block within an MB is encoded if CBP and MB modes are set for it. The encoded 4 × 4 block has a CBF syntax element showing the existence of nonzero coefficients (NZs) in the current block.

6) Unary and TU bin-strings: Because they have different codeword lengths and cause the change of bit-rate.

7) FL bins: Because they have the mandatory header information.

The above-discussed syntax elements have disadvantages with either lower compression efficiency and/or nonformat compliancy. Therefore, we have eliminated them from our proposed SE scheme.

We have proposed that the SE should be applied on the bin-strings that are equally distributed, since this does not change the compression efficiency of the codec and these bins are encoded by the bypass BAC engine with uniform probability [39].

We found three bin-strings to fulfill our purpose of SE, these being: 1) UEG3 suffix; 2) UEG0 suffix; and 3) signs of the transform coefficient levels.

The UEG3 suffix consists of the MVD sign bits if two conditions hold, i.e., |MVD| = 9 and 0 < |MVD| < 9. The sign bits of TC levels and the suffix of UEG0 can be encrypted only when abs_level > 14. The experimental results (Section V) show that the selected bins are fully compression friendly, format compliant, and do not alter the context models.

B. Bins Selection for SVC Layers

An SVC coded video has a base and a number of enhancement layers [46], depending on how the three scalabilities of temporal, spatial, and quality at various resolutions are used. The selected bins must be compliant with all three scalabilities in SVC coded video, so the bins selection is done also by keeping in mind the specific scalability type.

Fig. 2. Bins selection per spatio-SNR-temporal scalability.

In SVC, every temporal layer requires the change of MVD, dQP, and coefficients, while spatial layers require the change of coefficients only and SNR layers require the change of dQP and coefficients. Such SVC layer behavior shows that the UEG0 suffix and sign of coefficient levels are the most suitable parameters for the SVC layer encryption, because the coefficients are changing with every scalability option. The UEG3 suffix encryption is more suitable for temporal scalability, but is meaningful for all three scalabilities, as spatial and SNR are usually combined with temporal scalability. In Fig. 2, X represents the temporal scalability, Y the spatial, and Z the SNR scalability. Z0 and Z1 denote the SNR quality levels.

C. SE on Bins by AES-CFB

The SE is implemented on individual SVC coded layers by using AES-CFB, which is a stream cipher; hence, it does not alter the number of output bits. The CFB mode uses an initialization vector (fixed in our implementation) and an encryption key (variable) for each data block. The single encryption key is used to encrypt all data blocks of each layer, i.e., one encryption key will encrypt all three chosen bin-strings on individual layers. So the client will receive only one encryption key to decrypt and watch the subscribed data. Let the three chosen bin-strings be $P_1B_1$, $P_2B_2$, and $P_3B_3$, and their encryption be represented as $C_1B_1$, $C_2B_2$, and $C_3B_3$. Then, the general cipher process can be presented as follows:

$\sum (C_1B_1, C_2B_2, C_3B_3) = \{\sum (P_1B_1, P_2B_2, P_3B_3)\} \ \mathrm{XOR}\ \{\mathrm{Encrypt}(C_1B_1 - 1,\ C_2B_2 - 1,\ C_1B_1 - 1)\}$. (1)

The encryption process (Fig. 3) is done in a unique way, which makes the bit-rate consistent. The encryption of the sign bits is not tricky, because of constant bin sizes, although handling of UEG0 suffix bins is different, as the suffixes have variable length codewords. If the sizes of suffix codewords are changed after encryption, the bit-rate will definitely increase. So, to make the codewords compression friendly, we first count the suffix bins present in each UEG0 bin-string and then perform the encryption only on the number of existing suffix bins rather than on the whole suffix allocated size. This scheme makes the encrypted codewords the same size as the original ones.
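The counting step described above can be sketched as follows: keystream bits are consumed only for bins that actually exist, so each encrypted field keeps its original length and the prefix is never touched. This is a simplified illustration with hypothetical names. The keystream generator is a SHA-256 counter construction standing in for the AES-CFB output, bin-strings are modeled as (prefix, suffix, sign) text fields, and the question of which suffix bins may be flipped without breaking decodability is left to the codec and is not modeled here.

```python
import hashlib
from typing import Iterator


def keystream_bits(key: bytes) -> Iterator[int]:
    """Hypothetical stand-in keystream (SHA-256 in counter mode), NOT AES-CFB."""
    counter = 0
    while True:
        block = hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        for byte in block:
            for shift in range(7, -1, -1):
                yield (byte >> shift) & 1
        counter += 1


def xor_bins(bins: str, stream: Iterator[int]) -> str:
    """XOR each existing bin with one keystream bit; an empty field uses none."""
    return "".join(str(int(b) ^ next(stream)) for b in bins)


def encrypt_bins(bin_strings: list[tuple[str, str, str]],
                 key: bytes) -> list[tuple[str, str, str]]:
    """Encrypt only the suffix and sign fields; prefixes pass through unchanged."""
    stream = keystream_bits(key)
    encrypted = []
    for prefix, suffix, sign in bin_strings:
        encrypted.append((prefix, xor_bins(suffix, stream), xor_bins(sign, stream)))
    return encrypted
```

Since field lengths are preserved, applying `encrypt_bins` a second time with the same key consumes the keystream in the same order and restores the original bins.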


Fig. 3. Block diagram of SE over CABAC bins.

The decryption process is reversed at the CABAC decoding side. The client is supplied with the same encryption key that was used for enciphering. The encrypted values of the bin-strings are converted into the original bin values and then passed to the inverse binarization, quantization, and DCT process to get the finally deciphered and fully decoded bit-stream. The general decipher process can be represented as follows:

$\sum (P_1B_1, P_2B_2, P_3B_3) = \{\mathrm{Encrypt}(C_1B_1 - 1,\ C_2B_2 - 1,\ C_1B_1 - 1)\} \ \mathrm{XOR}\ \{\sum (C_1B_1, C_2B_2, C_3B_3)\}$. (2)

D. Key Management Scheme

Besides SE, another objective of this paper is to devise a key management scheme that provides users with access to the layers that they have subscribed to while stopping access to other layers. Providing scalable security requires that SE is applied on all layers of data from $Bl_0$ (base layer) to $El_n$ (top enhancement layer). If client $C_i$ has subscribed to receive the data of layer $El_i$, he must have access to the entire set of lower layer encryption keys (i.e., $ek_i$ to $ek_0$) to be able to decode the subscribed layer data. The management of all sets of layer $El_i$ keys for client $C_i$ is a potential security hazard, especially when the scalable data have a large number of layers. Many problems arise with the generation of a large number of keys, specifically: 1) high computational cost of generating multiple keys at one time to get access to the $Bl_0$ to $El_i$ data; 2) memory consumption; and 3) the time it takes to save $ek_0$ to $ek_i$ keys that are sizeable as per the security requirements. Consequently, the goal is to derive a mechanism in which each client needs to hold a single encryption key to retrieve the subscribed layer data. A single key significantly reduces the security hazards related to key management, storage, and transmission. In this paper, MIKEY is used to attain this goal.

MIKEY generates the two major keys (TGK and TEK), which will further generate the lower keys in a hierarchical fashion. Table I shows the characteristics of all MIKEY keys (key length, life time, and constants) with their generation and distribution summaries.

TABLE I

Characteristics of MIKEY Keys

| Keys | Key Length (bits) | Generation/Distribution Methods and Parameters | MIKEY Constants | Key Life Time |
| TGK (Master Key) | 128 | DH | DH prime and base values | 1 month |
| TEK | 128 | HMAC-SHA1 (TGK) | 0x2AD01C64 | Daily for 12 h |
| Master Encryption Key (eK) | 128 | HMAC-SHA1 (TEK) | 0x15798CEF | For session |
| Authentication Key (aK) | 160 | HMAC-SHA1 (TEK) | 0x1B5C7973 | Unique for every user |
| Salt Keys (sK) | 112 | HMAC-SHA1 (TEK) | 0x39A2C14B | Daily for 12 h |

TGK is generated by the DH algorithm and it generates TEK, while TEK further generates the master encryption key, authentication key, and salt key. The purpose of salt key generation is to enhance the security by altering some bytes of TEK on a daily basis and thus to stop look-up table based attacks. A few bytes of the salt key are replaced in the TEK, and after 12 h use of TEK the salted TEK will be used for the next 12 h. The general equations for the overall key generation scheme are as follows:

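$$\mathrm{TGK} \rightarrow g^{sr} \bmod p \quad \text{(Diffie-Hellman)} \tag{3}$$

where $p$ is a prime number, $g$ is the generator, and $s$ and $r$ are the sender and receiver RAND values.

$$\mathrm{TEK} \rightarrow \mathrm{HMAC}(\mathrm{TGK},\ \text{MIKEY Constant} \,\|\, \mathrm{RAND},\ \text{TEK length}) \tag{4}$$

$$\text{Master } \mathrm{eK} \rightarrow \mathrm{HMAC}(\mathrm{TEK},\ \text{MIKEY eK Constant} \,\|\, \mathrm{RAND},\ \text{eK length}) \tag{5}$$

$$\mathrm{aK} \rightarrow \mathrm{HMAC}(\mathrm{TEK},\ \text{MIKEY aK Constant} \,\|\, \mathrm{RAND},\ \text{aK length}) \tag{6}$$

$$\mathrm{sK} \rightarrow \mathrm{HMAC}(\mathrm{TEK},\ \text{MIKEY sK Constant} \,\|\, \mathrm{RAND},\ \text{sK length}). \tag{7}$$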
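As a hedged illustration of (3), the following sketch shows why both parties arrive at the same TGK: each side raises the other's public value to its own secret exponent. The prime `P`, generator `G`, and helper names are assumptions chosen for readability only; a real deployment would use a standardized Diffie-Hellman group, which is not specified here.

```python
import secrets

# Toy parameters for illustration only (assumption): not a standardized DH group.
P = 2**127 - 1   # a Mersenne prime
G = 3            # hypothetical generator choice

def dh_public(private):
    """Public value g^x mod p sent to the other party."""
    return pow(G, private, P)

def dh_shared(peer_public, private):
    """Shared secret (g^y)^x mod p, which equals g^(xy) mod p."""
    return pow(peer_public, private, P)

# Sender secret s and receiver secret r, drawn from [2, P - 2].
s = secrets.randbelow(P - 3) + 2
r = secrets.randbelow(P - 3) + 2

tgk_sender = dh_shared(dh_public(r), s)
tgk_receiver = dh_shared(dh_public(s), r)

# Both sides hold g^(sr) mod p, matching (3).
print(tgk_sender == tgk_receiver == pow(G, s * r, P))
```

Only the public values cross the network in this sketch; the exponents `s` and `r` stay local to each side.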


Fig. 4. Key generation mechanism.

The master encryption key further generates the 128-bit lower layer keys. The lower layer keys, each derived with a self-defined constant for its layer, are then used to encrypt the content of the SVC lower layers. The keys are generated in a recursive hierarchical fashion, i.e., the encryption key ekn of the top enhancement SVC layer Eln generates the key ekn−1 of its immediate lower layer Eln−1, ekn−1 generates ekn−2, and so on. This key generation is carried out at the client side. All the recursively derived keys are stored in working memory. The generalized concept of encryption key generation for lower SVC layers is represented as follows:

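$$\mathrm{ek}_{n} \rightarrow \mathrm{HMAC}(\mathrm{TEK},\ \mathrm{ek}_{n}\ \text{Constant} \,\|\, \mathrm{RAND},\ \mathrm{ek}_{n}\ \text{length}) \tag{8}$$

$$\mathrm{ek}_{n-1} \rightarrow \mathrm{HMAC}(\mathrm{ek}_{n},\ \mathrm{ek}_{n-1}\ \text{Constant} \,\|\, \mathrm{RAND},\ \mathrm{ek}_{n-1}\ \text{length}) \tag{9}$$

$$\mathrm{ek}_{n-2} \rightarrow \mathrm{HMAC}(\mathrm{ek}_{n-1},\ \mathrm{ek}_{n-2}\ \text{Constant} \,\|\, \mathrm{RAND},\ \mathrm{ek}_{n-2}\ \text{length}). \tag{10}$$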

RAND is generated according to the PRF (a keyed pseudorandom function) in [42]. The overall key generation scheme is shown in Fig. 4.
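The recursive derivation in (9) and (10) can be sketched as follows. This is a simplified illustration, not the MIKEY PRF of [42]: it applies a single HMAC-SHA1 call and truncates the output to 128 bits. The per-layer constants, the RAND value, and the top-layer key are placeholders, since the text only states that the constants are self-defined.

```python
import hashlib
import hmac

KEY_LEN = 16  # 128-bit layer keys, as stated in the text

def layer_constant(layer):
    # Hypothetical per-layer constant; the paper only says these are self-defined.
    return (0x10000000 + layer).to_bytes(4, "big")

def derive_lower_key(parent_key, layer, rand):
    """ek_layer = HMAC-SHA1(parent key, constant || RAND), truncated to KEY_LEN."""
    message = layer_constant(layer) + rand
    return hmac.new(parent_key, message, hashlib.sha1).digest()[:KEY_LEN]

def derive_layer_keys(subscribed_layer, subscribed_key, rand):
    """Return {layer: key} for the subscribed layer and every layer below it."""
    keys = {subscribed_layer: subscribed_key}
    for layer in range(subscribed_layer - 1, -1, -1):
        keys[layer] = derive_lower_key(keys[layer + 1], layer, rand)
    return keys

rand = bytes(range(16))                                   # placeholder RAND
ek3 = bytes.fromhex("00112233445566778899aabbccddeeff")   # placeholder top key

keys_l3 = derive_layer_keys(3, ek3, rand)         # subscriber of layer 3
keys_l1 = derive_layer_keys(1, keys_l3[1], rand)  # subscriber of layer 1

print(sorted(keys_l3))   # layers reachable from ek3
print(sorted(keys_l1))   # layers reachable from ek1
```

Because HMAC is one-way, a client holding only ek1 can walk down to ek0 but cannot compute ek2 or ek3, which is the property the single-key subscription model relies on.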

After key generation and distribution, the proposed solution provides client authentication and SE of the layers by using AES in CFB mode, which operates as a stream cipher. The idea behind the SE of scalable layers can be understood from Fig. 5.

Fig. 5. Keys per SVC layers.

Three scalable layers are shown in ascending order in Fig. 5: the lowest is the base layer and the upper two are enhancement layers. The term frame in Fig. 5 generalizes I, P, and B frames with their respective contents. Fig. 5 shows that SE is applied with key ek0 on the base layer video frames 1 and 5 (horizontal line pattern). Three video frames, 1, 3, and 5, are on the first enhancement layer El1; frames 1 and 5 are already encrypted by ek0 and only frame 3 (vertical lines) belongs to El1, so SE is applied on frame 3 only, with key ek1. This process of encryption is continued on all the layers above. The frames that are already encrypted on lower layers are not re-encrypted on upper layers; only the frame(s) belonging to the respective layer are encrypted with the corresponding layer encryption key. The equations for the SE on bit-streams within layers are

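$$\mathrm{ek}_{2}(\mathrm{SE}) \rightarrow \mathrm{El}_{2}\,\text{Frames} - \mathrm{El}_{1}\,\text{Frames} \tag{11}$$

$$\mathrm{ek}_{1} \rightarrow \mathrm{El}_{1}\,\text{Frames} - \mathrm{Bl}_{0}\,\text{Frames} \tag{12}$$

$$\mathrm{ek}_{0} \rightarrow \mathrm{Bl}_{0}\,\text{Frames}. \tag{13}$$

The process of SE on frames can be generalized as

$$\mathrm{ek}_{n}(\mathrm{SE}) \rightarrow \mathrm{El}_{n}\,\text{Frames} - \mathrm{El}_{n-1}\,\text{Frames}. \tag{14}$$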
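Equations (11) to (14) amount to a set difference per layer. The sketch below assigns each frame to the key of the lowest layer in which it appears, using the frame numbers described for Fig. 5. The contents of the second enhancement layer are not given in the text, so the set used for it here is an assumption, as is the function name.

```python
def frames_per_key(layer_frames):
    """Map each layer index n to the frames encrypted with ek_n.

    layer_frames[n] is the set of frames present at layer n (layer 0 is the
    base layer). A frame already covered by a lower layer is not re-encrypted.
    """
    assignment = {}
    covered = set()
    for layer, frames in enumerate(layer_frames):
        assignment[layer] = sorted(set(frames) - covered)
        covered |= set(frames)
    return assignment

layers = [
    {1, 5},            # Bl0, from the description of Fig. 5
    {1, 3, 5},         # El1, from the description of Fig. 5
    {1, 2, 3, 4, 5},   # El2, assumed for illustration
]
print(frames_per_key(layers))   # {0: [1, 5], 1: [3], 2: [2, 4]}
```

For nested layers, the running `covered` set equals the frames of the layer immediately below, so the loop reproduces the difference in (14).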

V. Evaluation

The performance of the proposed SE with the key management scheme was tested with the SVC reference software, the Joint Scalable Video Model (JSVM) encoder, version 9.19.10. For the evaluation, several different test video sequences were chosen with different combinations of features, such as colors with high or low contrast, motion, texture, objects, and so on. The experiments were performed on CIF (352 × 288 pixels/frame) and 4CIF (704 × 576 pixels/frame) resolution video sequences. Both CIF and 4CIF resolution test sequences were encoded into four layers (one base layer with three enhancement layers), representing three temporal, two spatial, and two SNR scalable levels for CIF, and four temporal, two spatial, and two SNR scalable levels for 4CIF resolution pictures. All test videos were encoded in a main or high profile (for base layer encoding) and baseline profile for enhancement layers. The intraframes and interframes were selectively encrypted in the order of their occurrence in the bit-stream, with GOP size 8 and intraperiod 16. The SE results are reported for different QP values and different encoding frame rates, together with the computational overhead in terms of encryption or decryption and key generation timings.

Fig. 6. PSNR variance of (a) Mobile (CIF) sequence and (b) ICE (4CIF) sequence at different QP values.

TABLE II

Comparison of Average PSNR (dB) of 90 Frames (I+P+B) at QP 24

| Resolution | Sequence | Plain PSNR Y | SE PSNR Y | Plain PSNR U | SE PSNR U | Plain PSNR V | SE PSNR V |
|---|---|---|---|---|---|---|---|
| CIF | City | 38.6 | 10.3 | 45.4 | 30.0 | 46.8 | 31.7 |
| CIF | Container | 40.8 | 7.4 | 46.4 | 25.0 | 46.7 | 25.0 |
| CIF | Crew | 40.3 | 12.0 | 44.9 | 12.0 | 44.7 | 22.7 |
| CIF | Football | 38.5 | 10.6 | 43.1 | 20.6 | 44.0 | 19.4 |
| CIF | Foreman | 39.6 | 8.1 | 44.8 | 24.5 | 47.2 | 26.2 |
| CIF | Harbour | 37.6 | 7.4 | 44.7 | 21.7 | 45.8 | 34.7 |
| CIF | Ice | 41.8 | 10.6 | 48.1 | 29.7 | 48.7 | 25.3 |
| CIF | Mobile | 37.6 | 7.3 | 41.3 | 18.8 | 41.1 | 15.0 |
| CIF | News | 42.2 | 11.3 | 45.1 | 19.9 | 46.3 | 24.3 |
| CIF | Soccer | 39.4 | 7.9 | 45.4 | 22.8 | 46.9 | 21.6 |
| 4CIF | City | 37.1 | 10.0 | 44.9 | 26.8 | 46.8 | 29.3 |
| 4CIF | Harbour | 37.2 | 7.1 | 44.7 | 23.1 | 46.2 | 32.6 |
| 4CIF | Ice | 41.7 | 11.1 | 49.2 | 31.1 | 49.9 | 27.5 |
| 4CIF | Soccer | 39.2 | 7.2 | 45.4 | 21.4 | 47.2 | 22.4 |

To demonstrate the efficiency of our proposed scheme, we encoded 90 frames at 30 f/s for CIF and 4CIF resolution videos. Table II compares the average PSNR of 90 frames (I+P+B) with and without SE. It shows the suitability of our SE scheme for both intraframes and interframes of CIF and higher resolution pictures. After SE, the average luma PSNR is low for all CIF and 4CIF resolution pictures (between 7.1 and 12.0 dB in Table II).

We performed the experiments at QP values of 8, 16, 24, 32, 40, and 48 to show the independence of the proposed SE from the QP value. The graphs in Fig. 6 show the PSNR variation in the Y, U, and V components with and without SE on the Mobile (CIF) and ICE (4CIF) videos at different QP values for intraframes and interframes. Both graphs indicate that our SE scheme is independent of QP: the average PSNR of the encrypted video remains low at all QP values.
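The PSNR figures reported in this section are not accompanied by a formula here. The sketch below uses the standard definition for 8-bit samples, PSNR = 10 log10(255² / MSE), applied to one color plane; the function name and the sample values are illustrative assumptions, and the reference software may average over planes and frames differently.

```python
import math

def psnr(reference, distorted, peak=255):
    """PSNR in dB between two equally sized sequences of sample values."""
    if len(reference) != len(distorted) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((a - b) ** 2 for a, b in zip(reference, distorted)) / len(reference)
    if mse == 0:
        return math.inf  # identical inputs
    return 10 * math.log10(peak ** 2 / mse)

plain = [100, 120, 140, 160]       # illustrative luma samples
slightly_off = [101, 119, 141, 159]
scrambled = [255, 0, 255, 0]

print(round(psnr(plain, slightly_off), 2))  # high PSNR: near-identical samples
print(round(psnr(plain, scrambled), 2))     # low PSNR: heavily distorted samples
```

A large drop in PSNR between the plain and the encrypted decode, as in Table II, corresponds to a large mean squared error between the two pictures.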

A. Security Analysis

The robustness of the proposed key management and encryption system against various security attacks was evaluated by the following tests.

1) Replacement attacks: To evaluate the strength of the proposed encryption system, we performed experiments on different sequences by replacing the encrypted bits of data with constant bits and determining the PSNR values under such a guessing attack. Someone who tries to guess and insert data with the intention of improving the video quality and making it watchable would tend to insert specific constant bits or random strings of 0s and 1s. The proposed system is robust against such replacement or guessing attacks. As an experiment, in the News (CIF) video, we replaced the encrypted data with 0s on the MVD signs and 1s on the signs of run levels, and added a constant integer value of five to the suffixes to reconstruct the video. The result is a distorted image, as shown in Fig. 7(c).

2) Video perception test with different keys: In another experiment, we tested the effect of changing the keys and checked the sensitivity of video perception to variation in the keys. Assume that a malicious user is able to guess the key to a very near exact value, i.e., there is a difference of only one or two bits compared with the exact key value, and uses this guessed key for decryption. It is observed that the video still cannot be decrypted; the decoded video changes noticeably with even a single bit change in the key. This test shows that a hacker's attempt to guess the video parameters will fail until the hacker is able to guess the exact key. Even if the hacker has guessed the exact key (nearly impossible for a 128-bit key), the success will be short-lived, because every time the same sequence is played it is encrypted with a different encryption key and has a different perception. We carried out this experiment (Fig. 8) by encoding the same sequence twice; as a new encryption key is generated and used every time, a different video is produced. We also tried to decrypt with a key in which only one bit was changed. The results confirmed the findings above.

Fig. 7. Impact of replacement attack on the News (CIF) sequence encoded with 90 frames (I+P+B) and QP 24. (a) Frame #41 [Y = 42.29, U = 45.15, V = 46.33] dB. (b) Proposed SE [Y = 11.35, U = 19.92, V = 24.32] dB. (c) Replacement attack [Y = 3.94, U = 17.58, V = 19.01] dB.

3) Exhaustive key search attack: The exhaustive key search is a strategy to find the correct key by trying every possible key in turn until the correct key is identified. However, it is not practical to find a 128-bit key by exhaustive key search. To quantify this security, we can model the number of attacks on data and keys with the Poisson probability distribution, given by

$$P(\mu; n) = \frac{e^{-\mu}\,\mu^{n}}{n!}$$

where $e$ is a constant approximately equal to 2.71828, $\mu$ is the mean number of attacks expected in the interval, and $n$ is the actual number of attacks occurring in the fixed interval of time or region. $P$ defines the probability of a given number of events (attacks) occurring in a fixed interval of time. CISCO security statistics [47] show that an attack on a host machine occurs every 5 min, translating to about 300 attacks per day. We assume that 20% of these attacks are on video; if there is one attack every hour, a continuous-time Markov chain [48] can be associated with the attack queue. Our system is robust enough to meet the security needs because the lifetime of a traffic encryption key (TEK) is fixed at 12 h, after which the TEK is changed. The number of attacks that can occur within these 12 h is unlikely to break the key before the TEK is replaced by a new one, so even a successful attack on an earlier key is useless after subsequent key changes.
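As a worked illustration of the Poisson model above, the sketch below evaluates P(μ; n) for the scenario stated in the text: one attack per hour over a 12 h TEK lifetime, which gives μ = 12. The function name and the sampled values of n are assumptions made for the example.

```python
import math

def poisson_pmf(mu, n):
    """Probability of exactly n events when mu events are expected."""
    return math.exp(-mu) * mu ** n / math.factorial(n)

ATTACKS_PER_HOUR = 1.0     # rate assumed in the text
TEK_LIFETIME_HOURS = 12    # TEK lifetime from Table I
mu = ATTACKS_PER_HOUR * TEK_LIFETIME_HOURS

for n in (0, 6, 12, 18, 24):
    print(f"P({mu:.0f}; {n}) = {poisson_pmf(mu, n):.6f}")
```

The distribution bounds how many attempts an attacker plausibly gets before the TEK is replaced; with a 128-bit key space, any such number of attempts covers a vanishing fraction of the possible keys.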

B. Statistical Analysis

An image data distribution can be examined by two statistical parameters, the mean µ and the standard deviation σ. The pixels within an image are highly correlated with each other in the horizontal, vertical, and diagonal directions. As a result, when the image is encrypted the entropy (data randomness) falls and correlation becomes high because the video frames (texture and edges) are converted into flat regions and produce artifacts in the image. During SE, pixel values were truncated to a maximum and minimum of 255 and 0, respectively. This causes the spread of dark or very bright colors across the video image, which is why correlation and data randomness increase in the encrypted video. Correlation of adjacent pixels is dependent on the local μp and σp. A statistical analysis was performed on the Mobile (CIF) sequence to show the impact of SE on video statistics. The mean (Table III) and standard deviation (Table IV) were determined for the local neighborhood of each pixel, before averaging across all pixels and all frames of the tested sequence.

TABLE III

Mean (μ) of SE for MOBILE (CIF) Sequence With 90 Frames (I+P+B) at Different QP Values

| QP Values | μ of Plain Y | μ of SE Y | μ of Plain U | μ of SE U | μ of Plain V | μ of SE V |
|---|---|---|---|---|---|---|
| 8 | 135.23 | 46.02 | 113.25 | 111.52 | 131.61 | 126.82 |
| 16 | 135.31 | 54.31 | 113.29 | 119.12 | 131.74 | 96.29 |
| 24 | 135.42 | 53.25 | 113.45 | 119.07 | 131.81 | 145.56 |
| 32 | 135.53 | 42.29 | 113.51 | 121.24 | 131.93 | 122.98 |
| 40 | 135.47 | 41.18 | 113.38 | 111.20 | 131.98 | 97.71 |
| 48 | 135.28 | 29.09 | 113.42 | 103.92 | 132.07 | 113.83 |

TABLE IV

Standard Deviation (σ) of SE for MOBILE (CIF) Sequence With 90 Frames (I+P+B) at Different QP Values

| QP Values | σ of Plain Y | σ of SE Y | σ of Plain U | σ of SE U | σ of Plain V | σ of SE V |
|---|---|---|---|---|---|---|
| 8 | 63.58 | 44.05 | 21.83 | 26.54 | 26.50 | 38.21 |
| 16 | 63.50 | 46.81 | 21.76 | 29.23 | 26.40 | 39.52 |
| 24 | 63.26 | 44.54 | 21.52 | 24.56 | 26.13 | 34.70 |
| 32 | 62.82 | 40.29 | 21.14 | 28.08 | 25.66 | 45.99 |
| 40 | 61.95 | 38.82 | 20.51 | 28.08 | 25.10 | 44.03 |
| 48 | 59.01 | 33.62 | 20.37 | 28.44 | 24.74 | 52.51 |

Table IV shows that the standard deviation of the luma values after SE is smaller than in the original video, while that of the chroma values is larger; this produces the dark or bright colored pictures. The statistical analysis shows that the luma and chroma values of the whole video are drastically changed by the proposed SE and that there is no way to extrapolate or derive the encrypted parts from the unencrypted parts.
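The local statistics described above (a mean and standard deviation per pixel neighborhood, then averaged over the frame) could be computed along the following lines. This is only a sketch: the neighborhood size is not stated in the text, so the 3 × 3 window, the border handling, and the function name are assumptions.

```python
import statistics

def local_stats(frame, radius=1):
    """Average local mean and local standard deviation over one color plane.

    frame  : 2-D list of sample values
    radius : half-width of the square neighborhood (assumed; 1 gives 3 x 3)
    Windows are clipped at the frame border.
    """
    height, width = len(frame), len(frame[0])
    means, stds = [], []
    for y in range(height):
        for x in range(width):
            window = [
                frame[j][i]
                for j in range(max(0, y - radius), min(height, y + radius + 1))
                for i in range(max(0, x - radius), min(width, x + radius + 1))
            ]
            means.append(statistics.fmean(window))
            stds.append(statistics.pstdev(window))
    return statistics.fmean(means), statistics.fmean(stds)

flat = [[128] * 4 for _ in range(4)]                                   # flat region
textured = [[255 * ((x + y) % 2) for x in range(4)] for y in range(4)]  # checkerboard

print(local_stats(flat))
print(local_stats(textured))
```

A flat region yields a local standard deviation of zero, while a textured region yields a large one; averaging these per-frame results over all frames would give values of the kind listed in Tables III and IV.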

C. Computational Overhead Analysis

The computational overhead is calculated on the basis of the additional processing time required for the encoding and decoding of test sequences with SE, on the whole SVC bit-stream and on a per layer basis. The experiments were performed on a machine with an Intel Core 2 Duo (3.33 GHz) processor and 4 GB RAM. Tables V and VI show the encoding and decoding timings of the ICE (CIF) and ICE (4CIF) videos, respectively, for different numbers of encoded frames with and without SE. Note that the additional encoding and decoding delay (Tables V and VI, columns 4 and 7) includes the key generation time as well, which is also measured separately and shown in Fig. 10. The processing delays are negligible as they fall in the range of milliseconds, verifying the efficiency of the proposed scheme on intraframes and interframes for four-layer SVC, on both the encoder and decoder side. Fig. 9 shows the additional computational delay on encoding and decoding in milliseconds for different numbers of frames (x-axis).

Fig. 8. Impact of keys on video perception of the News (CIF) sequence encoded with 90 frames (I+P+B) and QP 24. (a) Frame #41 [Y = 42.29, U = 45.15, V = 46.33] dB. (b) ek change by 1 bit [Y = 11.34, U = 19.87, V = 24.78] dB. (c) ek change by 2 bits [Y = 11.37, U = 19.79, V = 24.70] dB.

Fig. 9. Additional encoding and decoding delay caused by SE on (a) ICE (CIF) video, and (b) ICE (4CIF) video.

Fig. 10. Key generation time.

Different numbers of frames were tested to study the suitability of the scheme for real-time transmission, for example, by encoding 10, then 30, and up to 90 frames. ITU-T G.114 [49] recommends a maximum one-way latency of 150 ms for real-time streaming over the Internet. Supposing that the sender and receiver both have a maximum buffer size of 30 frames, the results (Tables V and VI) show that the encoding and decoding overheads are negligible for both CIF and 4CIF resolutions when a sequence is encoded 30 frames at a time. Thus, the proposed security system is suitable for the streaming of preencoded video, such as in web TV and IPTV applications, as well as for interactive real-time communication scenarios, such as video conferencing applications, to which SVC is now being applied.
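The 30-frame figures can be put in proportion with a short calculation. The sketch below takes the 30-frame rows of Tables V and VI, treating the smaller of the two timings in each row as the time without SE (an assumption about how the columns should be read), and reports the added delay per frame and as a percentage of the base time. The comparison against the 150 ms G.114 figure follows the framing used in the text; variable and function names are illustrative.

```python
G114_LIMIT_MS = 150.0
FRAMES = 30

# (time without SE, added delay) in ms, 30-frame rows of Tables V and VI
rows = {
    ("CIF", "encode"): (19108.91, 43.5),
    ("CIF", "decode"): (937.09, 33.5),
    ("4CIF", "encode"): (68432.0, 120.0),
    ("4CIF", "decode"): (3440.18, 74.32),
}

def overhead(base_ms, delay_ms, frames=FRAMES):
    """Return (delay per frame in ms, delay as a percentage of the base time)."""
    return delay_ms / frames, 100.0 * delay_ms / base_ms

for (resolution, stage), (base_ms, delay_ms) in rows.items():
    per_frame, percent = overhead(base_ms, delay_ms)
    print(f"{resolution:>4} {stage}: {delay_ms} ms total, "
          f"{per_frame:.2f} ms/frame, {percent:.2f}% of base time, "
          f"below G.114 limit: {delay_ms < G114_LIMIT_MS}")
```

Under these assumptions, each listed delay for a 30-frame batch stays below 150 ms, and the encoding overhead is a fraction of a percent of the base encoding time.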

The computational overhead is not only calculated from the encryption and decryption timings on an SVC bit-stream; the key generation timings are also measured separately to show the exact computational overhead of deriving multiple keys. Fig. 10 depicts the time (in microseconds) required for generating the desired keys.

For each subscriber, three keys, TEK, aK, and a master key eK, have to be derived. TEK and aK are generated once at the client registration stage and must be unique for each client. In addition, depending upon the layers subscribed to by the client, the master encryption key is generated by the system for the subscribed layer and sent to the client. The client then derives the encryption keys for all the lower layers. It is a hierarchical system in which each layer's encryption key is derived from the key of the layer immediately above it. The timings given in Fig. 10 are for the keys of the scalable layers El10, El8, El6, El4, and El2, but for generating hierarchical encryption keys these must be derived from layer El10 down to El0. The experiments show that the timings of generating TEK, aK, master eK, and sK are the same whether they are generated for layer El0 or layer El10. The difference appears in the encryption key generation timings of layers Ln to L0. If the hierarchical encryption keys are derived for just two scalable layers (base and enhancement layers), it takes 49 ms, and if they are generated for ten layers (one base and nine enhancement layers), it takes 109 ms. Frequent key generation does not add much overhead to the encryption or decryption computational cost of the proposed system because the key generation time is negligible.

The computational cost is also calculated for each scalable layer. We measured the per layer encoding and decoding time with SE on the News (CIF) sequence. The maximum processing delay for 90 frames at 30 f/s for the encoding of all four layers was 0.1124 ms, and the decoding processing delay was 0.1213 ms. Fig. 11 shows the impact of encryption on the individual layers; the quality of the sequence and the YUV PSNR values degrade gradually as each additional layer is encrypted. Someone who has the key for the base layer but not for the enhancement layers can only view the base layer contents, while the enhancement layers remain encrypted.

Fig. 12 shows the layer-wise decryption of the News (CIF) sequence. Considering four SVC layers, if someone has just the base layer key, the video on the base layer is totally clear (but with lower resolution, SNR, and frame rate). However, the client cannot view the data of the three enhancement layers, and the video will appear as shown in Fig. 12(a). The keys are generated in a hierarchical top-down fashion: someone who is subscribed to layer 1 data and has only the layer 1 key can generate the lower layer 0 key and is able to view the video shown in Fig. 12(b). Fig. 12(d) shows the video with the layer 3 key enabled, which has the maximum quality.

D. Comparative Analysis

For comparative analysis of our proposed key management and SE scheme with the existing work, we chose eight encryption and key management methods, specifically for CABAC entropy coding of the H.264/SVC scalable video codec. The chosen techniques are compared on the basis of the parameters denoted by the comparison symbols Cn in Table VII.

All the compared techniques are applied in the same domain of selective encryption on CABAC of H.264/SVC. The encryption proposed by Stütz et al. [21] was applied on NAL units of an SVC bit-stream, and it was reported that there was a small bit-rate overhead due to the change in the number of bytes after NAL unit encryption. Recent research regarding SVC is presented in [25], which contains detailed work on SVC layers. However, the DC value alteration in [25] damages the video statistics before compression and thus causes a bit-rate overhead. The IPM encryption [26], [27], [32], [33] changes the video statistics; hence, the degradation in compression efficiency increases the bit-rate. The studies in [26], [27], [30], and [33] provide complete security systems for SVC layers and propose complex key management schemes, without any reference to standard key management protocols. More than one key was generated per layer in these works; hence, they do not solve the problem of the overhead of managing multiple keys for each layer. The selective encryption presented in [30] was implemented in a similar way on ROI by Kim et al. [31], but without a key management scheme.

TABLE V

Computational Overhead Measurement (Milliseconds) for the ICE (CIF) Sequence at a Different Number of Encoded Frames (I+P+B) and QP 24

| No. of Frames | Encoding Time With SE | Encoding Time Without SE | Encoding Delay | Decoding Time With SE | Decoding Time Without SE | Decoding Delay |
|---|---|---|---|---|---|---|
| 10 | 6248.72 | 6227.62 | 21.1 | 454.35 | 439.05 | 15.3 |
| 30 | 19152.41 | 19108.91 | 43.5 | 970.59 | 937.09 | 33.5 |
| 50 | 31933.08 | 31865.18 | 67.9 | 1469.85 | 1423.15 | 46.7 |
| 70 | 44753.66 | 44664.46 | 89.2 | 2008.4 | 1941.2 | 67.2 |
| 90 | 57724.86 | 57609.76 | 115.1 | 2490.14 | 2406.24 | 83.9 |

TABLE VI

Computational Overhead Measurement (Milliseconds) for the ICE (4CIF) Sequence at a Different Number of Encoded Frames (I+P+B) and QP 24

| No. of Frames | Encoding Time Without SE | Encoding Time With SE | Encoding Delay | Decoding Time With SE | Decoding Time Without SE | Decoding Delay |
|---|---|---|---|---|---|---|
| 10 | 21955 | 22012 | 55 | 1321.31 | 1285.59 | 35.72 |
| 30 | 68432 | 68552 | 120 | 3514.5 | 3440.18 | 74.32 |
| 50 | 113956 | 114193 | 237 | 5615.69 | 5510.39 | 105.30 |
| 70 | 158945 | 159316 | 371 | 7645.58 | 7481.89 | 163.69 |
| 90 | 204964 | 205461 | 497 | 9719.98 | 9489.50 | 230.48 |

Fig. 11. Impact of SE applied on layer basis on the News (CIF) sequence with 90 frames (I+P+B) and QP 24. (a) SE applied on Layer 0 [Y = 14.5, U = 23, V = 28.3] dB. (b) SE applied on Layers 0, 1 [Y = 12.9, U = 21.7, V = 26.8] dB. (c) SE applied on Layers 0, 1, 2 [Y = 11.6, U = 19.9, V = 24.3] dB. (d) SE applied on all layers [Y = 11.3, U = 19.8, V = 24.2] dB.

Fig. 12. Impact of having a layer-wise key for decryption on the News (CIF) sequence with 90 frames (I+P+B) and QP 24. (a) Decryption by Layer 0 (eK0) key [Y = 13.86, U = 23.51, V = 25.99] dB. (b) Decryption by Layer 1 (eK1) key [Y = 16.61, U = 25.46, V = 28.96] dB. (c) Decryption by Layer 2 (eK2) key [Y = 41.31, U = 43.66, V = 45.93] dB. (d) Decryption by Layer 3 (eK3) key [Y = 42.29, U = 45.15, V = 46.33] dB.

TABLE VII

Comparative Analysis of Proposed Security System

| Proposed Schemes | C1 | C2 | C3 | C4 | C5 | C6 | C7 | C8 |
|---|---|---|---|---|---|---|---|---|
| Stütz et al. [21] | NAL unit | Yes | Yes | CABAC/CAVLC | Yes | AES-ECB | No | Not specified |
| Algin et al. [25] | DC alteration, signs of texture and MVD | No | Yes | Not specified | Yes | XOR | No | Not specified |
| Li et al. [26], [27] | IDR frames, PPS, SPS, IPM, signs of texture | No | Yes | CABAC/CAVLC | Yes | LEX stream cipher | Yes | Not specified |
| Won et al. [30] | Signs of texture, MVD and FGS | Yes | Yes | CABAC | No | XOR stream cipher | Yes | Not specified |
| Kim et al. [31] | ROI with signs of texture, MVD and FGS | Yes | Yes | CABAC | No | XOR stream cipher | No | Not specified |
| Park et al. [32], [33] | IPM, signs of residual and MVD | No | Yes | CABAC/CAVLC | Yes | Stream cipher | Yes | Not specified |
| Our scheme | UEG3 suffix, UEG0 suffix, and signs of TC levels | Yes | Yes | CABAC | No | AES-CFB | Yes | MIKEY |

C1: selected parameters for encryption.
C2: compression friendliness.
C3: format compliance.
C4: entropy coding.
C5: bit-rate overhead.
C6: encryption algorithms.
C7: incorporated key management scheme for SVC layers.
C8: key management protocol.

To summarize, we have proposed a complete security system for scalable video content protection. It incorporates the standard security algorithm AES-CFB for SE on carefully selected SVC bin-strings, and the key management protocol MIKEY is used for client authentication at the registration phase and for key generation and distribution on a per layer basis.

VI. Conclusion

In this paper, an efficient and complete security system was proposed for the H.264 scalable video codec, operating on CABAC bin-strings. The security system incorporated selective protection of the scalable layers, utilizing DRM techniques [50] for client authentication at the registration stage and an efficient key management mechanism through MIKEY. AES-CFB was used for SE on sensibly chosen bin-strings by taking into account the security of the video, compression efficiency, bit-rate fluctuation, format compliance, and the scalability features (temporal, SNR, and spatial) of H.264/SVC. The results showed that our scheme was fully implementable with all scalable features (temporal, SNR, and spatial) of SVC and with intracoded and intercoded (I, P, and B) frames. The performance of the proposed system was justified by several important factors, such as a security analysis on video perception and keys, a video statistical analysis after the application of precompression encryption, the calculation of the computational overhead caused by SE with the key generation process, and a comparative analysis with existing work. The results demonstrated that the proposed security system had no drawbacks in terms of security, compression efficiency, bit-rate, or format compliance on the decoder side, except the inevitable minimal computational overhead due to SE over SVC layers.

The significance of the proposed system was to resolve the multiple key overhead issues: the subscriber of each layer receives only one encryption key, but this key transparently opens the doors of all the layers below. The proposed system is suitable for video distribution to users who have subscribed to different video qualities depending on bandwidth, storage, and device rendering capabilities. The same system can be extended to ROI for bit-rate reduction in video surveillance [31], [51] without any modification. The error resilience [52] issues for the proposed system can be investigated in the transmission scenarios of scalable layers as future work.

References

[1] T. Wiegand, G. Sullivan, J. Sullivan, G. Bjøntegaard, and A. Luthra,“Overview of the H.264/AVC video coding standard,” IEEE Trans.Circuits Syst. Video Technol., vol. 13, no. 7, pp. 560–576, Jul. 2003.

[2] J. Ostermann, J. Bormans, P. List, D. Marpe, M. Narroschke, F. Pereira,T. Stockhammer, and T. Wedi, “Video coding with H.264/AVC: Tools,performance, and complexity,” IEEE Circuits Sys. Mag., vol. 4, no. 1,pp. 7–28, Apr. 2004.

[3] H. Schwarz, D. Marpe, and T. Wiegand, “Overview of the scalable videocoding extension of the H.264/AVC standard,” IEEE Trans. Circuits Syst.Video Technol., vol. 17, no. 9, pp. 1103–1120, Sep. 2007.

[4] Y. Mao and M. Wu, “A joint signal processing and cryptographicapproach to multimedia encryption,” IEEE Trans. Image Process., vol.15, no. 7, pp. 2061–2075, Jul. 2006.

[5] F. Cayre, C. Fontaine, and T. Furon, “Watermarking security: Theory andpractice,” IEEE Trans. Signal Process., vol. 53, no. 10, pp. 3976–3987,Oct. 2005.

[6] H. D. Engel, R. Kutil, and A. Uhl, “A symbolic transform attack onlightweight encryption based on wavelet filter parameterization,” in Proc.ACM Multimedia Security Workshop, Sep. 2006, pp. 202–207.

[7] B. Furht, E. Muharemagic, and D. Socek, Eds., Multimedia Encryptionand Watermarking. New York: Springer-Verlag, 2005.

[8] A. Uhl and A. Pommer, “Image and video encryption: From DigitalRights Management to secured personal communication,” in AdvancesInformation Security Series, vol. 15. New York: Springer-Verlag, 2005.

[9] M. Abomhara, O. Zakaria, O. O. Khalifa, A. A. Zaidan, and B. B.Zaidan, “Enhancing selective encryption for H.264/AVC using advancedencryption standard,” Int. J. Comput. Electr. Eng., vol. 2, no. 2, pp. 223–229, 2010.

[10] B. Barmada, M. M. Ghandi, E. V. Jones, and M. Ghanbari, “Prioritizedtransmission of data partitioned H.264 video with hierarchical QAM,”IEEE Signal Process. Lett., vol. 12, no. 8, pp. 577–580, Aug. 2005.

[11] P. Melih and D. Vadi, “A MPEG-2-transparent scrambling technology,”IEEE Trans. Consum. Electron., vol. 48, no. 2, pp. 345–355, May 2002.

[12] C. Wang, H. B. Yu, and M. Zheng, “A DCT-based MPEG-2 transparentscrambling algorithm,” IEEE Trans. Consum. Electron., vol. 49, no. 4,pp. 1208–1213, Nov. 2003.

[13] W. Zeng and S. Lei, “Efficient frequency domain selective scramblingof digital video,” IEEE Trans. Multimedia, vol. 5, no. 1, pp. 118–129,Mar. 2003.


[14] S. Spinsante, F. Chiaraluce, and E. Gambi, “Masking video informationby partial encryption of H.264/AVC coding parameters,” in Proc. 13thEur. Signal Process. Conf., 2005, pp. 1338–1441.

[15] Y. Fan, J. Wang, T. Ikenaga, Y. Tsunoo, and S. Goto, “An unequal secureencryption scheme for H.264/AVC video compression standard,” IEICETrans. Fundam. Electron. Commun. Comput. Sci., vol. 91, no. 1, pp.12–21, 2008.

[16] Y. Fan, J. Wang, T. Ikenaga, Y. Tsunoo, and S. Goto, “A new videoencryption scheme for H.264/AVC,” in Proc. Adv. Multimedia Inform.Process. (LNCS, vol. 4810). 2007, pp. 246–255.

[17] J. R. Ohm, “Advances in scalable video coding,” Proc. IEEE, vol. 93,no. 1, pp. 42–56, Jan. 2005.

[18] J. G. Apostolopoulos, “Architectural principles for secure streamingand secure adaptation in the developing scalable video coding (SVC)standard,” invited paper presented at the Network-Aware MultimediaProcessing and Communications IEEE ICIP, 2006.

[19] S. J. Wee and J. G. Apostolopoulos, “Secure scalable video streamingfor wireless networks,” in Proc. IEEE Int. Conf. Acoust. Speech SignalProcess., May 2001, pp. 2049–2052.

[20] S. J. Wee and J. G. Apostolopoulos, “Secure scalable streaming en-abling transcoding without decryption,” in Proc. IEEE Int. Conf. ImageProcess., Oct. 2001, pp. 437–440.

[21] T. Stütz and A. Uhl, “Format-compliant encryption of H.264/AVCand SVC,” in Proc. 10th IEEE Int. Symp. Multimedia, Jan. 2009, pp.446–451.

[22] E. Magli, M. Grangetto, and G. Olmo, “Conditional access techniquesfor H.264/AVC and H.264/SVC compressed video,” IEEE Trans.Circuits Syst. Video Technol., to be published.

[23] E. Magli, M. Grangetto, and G. Olmo, “Transparent encryptiontechniques for H.264/AVC and H.264/SVC compressed video,” J.Signal Process., vol. 91, no. 5, pp. 1103–1114, May 2011.

[24] C. Yuan, B. B. Zhu, Y. Wang, S. Li, and Y. Zhong, “Efficient andfully scalable encryption for MPEG-4 FGS,” in Proc. IEEE Int. Symp.Circuits Syst., May 2003, pp. 620–623.

[25] G. B. Algin and E. T. Tunali, “Scalable video encryption of H.264SVC codec,” J. Visual Commun. Image Representation, vol. 22, no. 4,pp. 353–364, May 2011.

[26] C. Li, X. Zhou, and Y. Zong, “NAL level encryption for scalable videocoding,” in Proc. PCM, no. 5353. 2008, pp. 496–505.

[27] C. Li, X. Zhou, and Y. Zong, “Layered encryption for scalable videocoding,” in Proc. IEEE Conf. Image Signal Process., Oct. 2009, pp. 1–4.

[28] X. Wang, N. Zheng, and L. Tian, “Hash key-based video encryptionscheme for H.264/AVC,” Signal Process. Image Commun., vol. 25, no.6, pp. 427–437, Jul. 2010.

[29] C. Yuan, Y. Zhong, and Y. He, “Selective video stream encryptionalgorithm based on chaos,” Chin. J. Comput., vol. 27, no. 2, pp.257–263, 2004.

[30] Y. G. Won, T. M. Bae, and Y. M. Ro, “Scalable protection and accesscontrol in full scalable video coding,” in Proc. 5th Int. WorkshopDigital Watermarking, LNCS 4283. 2006, pp. 407–421.

[31] Y. Kim, S. H. Jin, T. M. Bae, and Y. M. Ro, “A selective videoencryption for the region of interest in scalable video coding,” in Proc.IEEE Region 10 Conf., Oct. 2007, pp. 1–4.

[32] S. W. Park and S. U. Shin, “An efficient encryption and key managementscheme for layered access control of H.264/scalable video coding,”IEICE Trans. Inform. Syst., vol. 92, no. 5, pp. 851–858, 2009.

[33] S. W. Park and S. U. Shin, “Efficient selective encryption scheme forthe H.264/scalable video coding (SVC),” in Proc. Int. Conf. NetworkedComput. Advanced Inf. Manage., 2008, pp. 371–376.

[34] R. Rivest, The MD5 Message-Digest Algorithm, IETF RFC 1321, Apr.1992.

[35] H. Krawczyk, M. Bellare, and R. Canetti, HMAC: Keyed-Hashing forMessage Authentication, IETF RFC 2104, 1997.

[36] G. B. White, E. A. Fisch, and U. W. Pooch, Computer System andNetwork Security. Boca Raton, FL: CRC Press, 1995.

[37] E. Magli, M. Grangetto, and G. Olmo, “Joint source, channel coding, andsecrecy,” EURASIP J. Inform. Security, vol. 2007, no. 79048, p. 7, 2007.

[38] G. Sullivan, P. Topiwala, and A. Luthra, “The H.264/AVC advancedvideo coding standard: Overview and introduction to the fidelity rangeextensions,” in Proc. 27th SPIE Conf. Applicat. Digital Image Process.,2004, pp. 454–474.

[39] D. Marpe, H. Schwarz, and T. Wiegand, “Context-adaptive binaryarithmetic coding in the H.264/AVC video compression standard,”IEEE Trans. Circuits Syst. Video Technol., vol. 13, no. 7, pp. 620–636,Jul. 2003.

[40] J. Teuhola, “A compression method for clustered bit-vectors,” Inf.Process. Lett., vol. 7, no. 6, pp. 308–311, 1978.

[41] M. Ghanbari, Standard Codecs: Image Compression to Advanced VideoCoding, 3rd ed. London, U.K.: IET Press, 2011.

[42] J. Arkko, E. Carrara, F. Lindholm, M. Naslund, and K. Norrman,MIKEY: Multimedia Internet KEYing, IETF RFC 3830, 2004.

[43] Advanced Encryption Standard (AES). (2001, Nov. 26). FederalInformation Processing Standards Publication 197 [Online]. Available:http://csrc.nist.gov/publications/fips/fips197/fips-197.pdf

[44] M. Kuchar, “Dispelling the myths of cryptography,” Database Netw. J.,vol. 30, no. 2, p. 3, 2000.

[45] B. Esslinger, The CrypTool Script: Cryptography, Mathematics andMore, 10th ed. (distributed with CrypTool version 1.4.30), freeeLearning program CrypTool, 2010, pp. 79–150.

[46] B. B. Zhu, M. D. Swanson, and S. Li, “Encryption and authenticationfor scalable multimedia: Current state of the art and challenges,” inProc. SPIE Internet Multimedia Manage. Syst., vol. 5601. Oct. 2004,pp. 157–170.

[47] D. Tesch and G. Abelar, Security Threat Mitigation and Response: Un-derstanding Cisco Security MARS. Indianapolos, IN: Cisco Press, 2006.

[48] D. Malone and W. G. Sullivan, “Guesswork and entropy,” IEEE Trans.Inf. Theory, vol. 50, no. 3, pp. 525–526, Mar. 2004.

[49] One-Way Transmission Time, ITU-T Rec. G.114, ITU-T, Feb. 1996.

[50] E. T. Lin, A. M. Eskicioglu, R. L. Lagendijk, and E. J. Delp, “Advances in digital video content protection,” Proc. IEEE, vol. 93, no. 1, pp. 171–183, Jan. 2005.

[51] J. M. Rodrigues, W. Puech, and A. Bors, “Selective encryption of human skin in JPEG images,” in Proc. IEEE Int. Conf. Image Process., Oct. 2006, pp. 1981–1984.

[52] A. Massoudi, F. Lefebvre, C. D. Vleeschouwer, B. Macq, and J.-J. Quisquater, “Overview on selective encryption of image and video, challenges and perspectives,” EURASIP J. Inform. Security, vol. 2008, no. 179290, p. 18, 2008.

Mamoona Naveed Asghar received the Bachelor's degree in computer science from the Islamia University of Bahawalpur, Bahawalpur, Pakistan, and the Master's degree in computer science, with a major in computer network security, from International Islamic University, Islamabad, Pakistan. She is currently pursuing the Ph.D. degree with the School of Computer Science and Electronic Engineering, University of Essex, Colchester, U.K.

Prior to her Ph.D. studies, she served as an Assistant Professor with the Department of Computer Science and Information Technology, Islamia University of Bahawalpur. Her current research interests include security aspects of multimedia (audio and video), compression, encryption, steganography, secure transmission, and key management schemes for standard and scalable video.

Mohammad Ghanbari (M’78–SM’97–F’01) is currently an Emeritus Professor with the School of Computer Science and Electronic Engineering, University of Essex, Colchester, U.K. After working for ten years in industry, he became a Lecturer with the University of Essex in 1988, and was promoted to the Professorial Chair in video networking in 1996. He was awarded the title of Emeritus Professor in 2011. He is best known for his pioneering work on layered video coding, currently known as SNR scalability in the standard video codecs. He has registered 11 international patents and has published more than 600 technical papers on various aspects of video networking, many of which have had a fundamental influence on the field. These include video and image compression, layered and scalable video coding, video transcoding, motion estimation, and video quality metrics. He is the author or co-author of seven books.

Prof. Ghanbari has been an Organizing Member of several international conferences and workshops. He was the General Chair of the 1997 International Workshop on Packet Video and a Guest Editor for numerous special issues on video networking. He was a recipient of the Rayleigh Prize for the Best Book of the Year from IET, in 2000, for his book Video Coding: An Introduction to Standard Codecs (IET Press, 1999). He served as an Associate Editor for the IEEE Transactions on Multimedia from 1998 to 2004, and has represented the University of Essex as one of the six U.K. academic partners in the Virtual Centre of Excellence in Digital Broadcasting and Multimedia. He is a fellow of IET and a Chartered Engineer.
