[IEEE 2010 IEEE International Symposium on Signal Processing and Information Technology (ISSPIT) - Luxor, Egypt (2010.12.15-2010.12.18)] The 10th IEEE International Symposium on Signal

A Robust Audio Watermarking Technique based on the Perceptual Evaluation of Audio Quality Algorithm in the Multiresolution Domain

Masmoudi Salma, Charfeddine Maha and Ben Amar Chokri

REsearch Group on Intelligent Machines (REGIM), University of Sfax, National Engineering School of Sfax (ENIS)

Sfax, TUNISIA [email protected],

[email protected], [email protected]

Abstract - Audio watermarking is a method that embeds inaudible information into digital audio data. This paper describes an audio watermarking scheme using the Least Significant Bit (LSB) in the multiresolution domain, exploiting the perceptual evaluation of audio quality (PEAQ) algorithm and MP3 compression characteristics. To guarantee the inaudibility property, we adapt the PEAQ algorithm to select the audio band used to hide the watermark in the LSB of the multiresolution coefficients. Additionally, we use some MP3 compression characteristics to increase the robustness of the scheme, and we duplicate the watermark bits across all audio signal blocks, thereby ensuring a high embedding ratio without affecting the inaudibility property. The described method provides good robustness against compression and StirMark attacks. In addition, it allows blind retrieval of the embedded watermark, that is, retrieval that does not need the original audio.

Keywords - audio watermarking, least significant bit, discrete wavelet transform, PEAQ, MP3 compression.

I. INTRODUCTION

With the rapid development of the Internet and multimedia technologies in the last decade, the copyright protection of digital media is becoming increasingly important and challenging. Digital watermarking [1] has been proposed as a solution to a number of media security problems. The purpose of audio watermarking is to supply additional information about the audio by hiding watermark data in it. This watermark data may be used for various applications such as authentication, copyright protection and proof of ownership. An audio watermarking technique should exhibit some desirable properties [1]. Imperceptibility and robustness are the two fundamental properties of audio watermarking schemes. The watermark should be inaudible, and the watermarking process should not introduce any perceptible artifacts into the original audio signal. In other words, watermark data should be embedded imperceptibly into digital audio media. The watermark should also survive various intentional and unintentional attacks. These attacks may include additive noise, low-pass filtering, MP3 compression, cropping, jittering and any other attack that removes the watermark or confuses the watermark reading system [2]. A trade-off must be maintained between these two conflicting properties.

Audio watermarking techniques can be classified in different ways, in particular by embedding domain. Among time-domain approaches, the simplest method embeds the watermark in the least significant bits (LSBs) of the audio samples. Probably the best-known time-domain technique, proposed in [3], is based on the human auditory system (HAS). However, time-domain techniques are not resistant enough to MP3 compression and other signal processing attacks. For example, removing the LSB of every sample or applying a simple low-pass filter may eliminate the watermark. Transform-domain watermarking schemes [1], namely those based on the fast Fourier transform (FFT), the discrete cosine transform (DCT) [4][5] and the discrete wavelet transform (DWT) [4][6], typically provide higher audio fidelity and are much more robust to audio manipulations.

978-1-4244-9991-5/11/$26.00 ©2011 IEEE

In previous work [4], we proposed two new digital audio watermarking schemes based on DWT, DCT and LSB. Comparing the DWT-LSB technique with the DCT-LSB technique, we found that DWT-LSB performs better than DCT-LSB. Therefore, we decided to improve the DWT-LSB technique by incorporating the ODG measure of the PEAQ algorithm [7] and the NC measure [8] obtained after MP3 compression attacks, resulting in a new method called DWT-LSB-PEAQ.

The rest of this paper is organized as follows: The important features of the proposed algorithm are given in section II. The experimental results of this scheme are presented and discussed in section III. Finally, section IV concludes the paper.

II. THE PROPOSED AUDIO DWT-LSB-PEAQ WATERMARKING ALGORITHM

This proposed scheme consists of two parts: watermark embedding, and watermark detection. Details are described in the following subparts.


A. DWT-LSB-PEAQ watermark embedding:

The main steps of the developed embedding procedure are depicted in Fig. 1 and Fig. 2.

[Figure: flowchart from the original audio to the watermarked audio, with intermediate stages labeled "watermarked audio, α" and "watermarked audio, a, b" and a test on NC.]

Fig. 1. Watermark embedding process.

1) Insertion and watermarked signal construction: The details of the step "insertion and watermarked signal construction" are described in Fig. 2.

[Figure: flowchart. Recoverable labels: original audio, division into 512-sample blocks, NB blocks, watermark, finding the sample of insertion, watermarked signal construction, watermarked audio signal.]

Fig. 2. The "Insertion and watermarked signal construction" process.

First, the original audio signal is divided into NB non-overlapping blocks of 512 samples. This block size is motivated by the fact that a signal of 512 samples has stable features. Then a 3-level DWT is performed. Simultaneously, the watermark is decomposed into sets of length 8, and each set is encoded with a Hamming (12,8) error-correcting code. After that, we locate the band of middle frequencies of each block and search for the sample whose value is closest to the average value of that band. We then perform the insertion by substituting the LSB of that sample with the current watermark bit. Finally, we construct the watermarked audio signal by performing a 3-level inverse DWT. Every bit of the watermark is embedded N times. The number N depends on the length of the audio file, the number of blocks NB and the size of the watermark.
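The two preparatory steps of this procedure (cutting the audio into 512-sample blocks and cutting the watermark into sets of 8 bits) can be sketched as follows. This is only an illustrative sketch: the function names are hypothetical, dropping a trailing incomplete block is an assumption, and the MSB-first ASCII bit order for a text watermark is also an assumption, since none of these details are specified here.

```python
def split_into_blocks(samples, block_size=512):
    """Cut a sample list into non-overlapping blocks of `block_size` samples.

    Assumption: a trailing block shorter than `block_size` is dropped.
    """
    nb = len(samples) // block_size
    return [samples[i * block_size:(i + 1) * block_size] for i in range(nb)]


def text_to_bit_groups(text):
    """Turn a text watermark into sets of 8 bits, one set per character.

    Assumption: ASCII encoding, most significant bit first.
    """
    return [[(byte >> (7 - k)) & 1 for k in range(8)]
            for byte in text.encode("ascii")]
```

Each 8-bit set produced this way is what the Hamming (12,8) encoder would then receive.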

a) Hamming error correcting code: The Hamming error correcting code (12, 8) is used to increase robustness. An error correcting code plays an important role for the watermark, especially when the watermark is corrupted, i.e., significantly damaged. The Hamming code can compensate for the corruption of a watermark and can help it survive serious attacks.
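To make the role of the (12, 8) code concrete, the following minimal sketch encodes 8 data bits into 12 bits and corrects a single flipped bit on decoding. The bit layout (parity bits at positions 1, 2, 4 and 8) is the textbook positional Hamming layout and is an assumption; the exact layout used by the scheme is not given, and the function names are hypothetical.

```python
PARITY_POSITIONS = (1, 2, 4, 8)
DATA_POSITIONS = tuple(p for p in range(1, 13) if p not in PARITY_POSITIONS)


def hamming_12_8_encode(data_bits):
    """Encode 8 data bits into a 12-bit Hamming codeword."""
    if len(data_bits) != 8:
        raise ValueError("expected exactly 8 data bits")
    code = [0] * 13  # index 0 unused so list indices match positions 1..12
    for pos, bit in zip(DATA_POSITIONS, data_bits):
        code[pos] = bit
    for p in PARITY_POSITIONS:
        # Parity bit p covers every position whose index has bit p set.
        code[p] = sum(code[i] for i in range(1, 13) if i & p and i != p) % 2
    return code[1:]


def hamming_12_8_decode(codeword):
    """Correct at most one flipped bit and return the 8 data bits."""
    if len(codeword) != 12:
        raise ValueError("expected exactly 12 code bits")
    code = [0] + list(codeword)
    syndrome = 0
    for p in PARITY_POSITIONS:
        if sum(code[i] for i in range(1, 13) if i & p) % 2:
            syndrome += p
    if 1 <= syndrome <= 12:
        code[syndrome] ^= 1  # the syndrome is the position of the flipped bit
    return [code[pos] for pos in DATA_POSITIONS]
```

A code of this kind corrects one error per 12-bit word; two or more errors in the same word are beyond what it can repair.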

b) DWT: The discrete wavelet transform (DWT) is a mathematical tool capable of giving a time-frequency representation of any given signal [9]. Starting from the original audio signal S, the DWT produces two sets of coefficients, as shown in Fig. 3 [10]. The approximation coefficients A (low frequencies) are produced by passing the signal S through a low-pass filter y. The detail coefficients D (high frequencies) are produced by passing the signal S through a high-pass filter g.

[Figure: the signal S[n] is split into approximation coefficients and detail coefficients.]

Fig. 3. One-level DWT decomposition.

Depending on the application and the length of the signal, the low-frequency part may be further decomposed into two parts of high and low frequencies. Fig. 4 shows a 3-level DWT decomposition of signal S. The original signal S can be reconstructed using the inverse DWT process.

Fig.4. Three-level DWT decomposition

Due to its excellent spatio-frequency localization properties, the DWT is very suitable for identifying areas in an audio signal where a watermark can be embedded effectively [11].
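The 3-level decomposition and its inverse can be illustrated with a short sketch. The Haar wavelet is used here purely as an assumption, because the wavelet family is not specified, and the function names are hypothetical. For a 512-sample block, the sketch yields 64 approximation coefficients (cA3) and 64, 128 and 256 detail coefficients (D3, D2 and D1).

```python
import math

_S = 1 / math.sqrt(2)


def haar_dwt(signal):
    """One-level Haar DWT of an even-length list: (approximation, detail)."""
    approx = [(signal[i] + signal[i + 1]) * _S for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) * _S for i in range(0, len(signal), 2)]
    return approx, detail


def haar_idwt(approx, detail):
    """Inverse of haar_dwt: interleave the reconstructed sample pairs."""
    out = []
    for a, d in zip(approx, detail):
        out.extend(((a + d) * _S, (a - d) * _S))
    return out


def wavedec3(block):
    """Three-level decomposition, returned as [cA3, D3, D2, D1]."""
    a1, d1 = haar_dwt(block)
    a2, d2 = haar_dwt(a1)
    a3, d3 = haar_dwt(a2)
    return [a3, d3, d2, d1]


def waverec3(coeffs):
    """Three-level reconstruction from [cA3, D3, D2, D1]."""
    a3, d3, d2, d1 = coeffs
    return haar_idwt(haar_idwt(haar_idwt(a3, d3), d2), d1)
```

Only the approximation branch is decomposed again at each level, which is why the detail bands halve in size from D1 to D3.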


c) Locating the [a b] interval using α and localization of the embedding sample: In this step, we take each block (in our case, each column of the matrix M obtained after the division into blocks) and choose the band of detail coefficients in which to insert the watermark bits. This band is set by a factor α, which is derived from the ODG value of the PEAQ algorithm and from the NC value obtained after MP3 compression attacks. α is initially set to 64 and fixes the boundaries a and b of the band of detail coefficients [a b]. The choice of inserting the watermark bits in band D2 or D3 is motivated by two facts: the high frequencies (D1) are modified or even deleted by certain attacks or manipulations of the watermarked signal (mainly compression), and the low frequencies (cA3) contain the most audible areas of the audio signal (the ear is very sensitive to low frequencies). This leaves the two bands D2 and D3 for the watermark bits. First, we limit the band [a b] (D2 or D3) with a = α and b = 2α; this band should give the best transparency of the watermarking algorithm. Then, with the new values of a, b and α, and after applying compression, we calculate the NC value (the correlation between the inserted watermark and the detected one). We then fix the new band with a = α and b = 2α - scale, where "scale" is the step by which we shrink the insertion band to bring NC closer to 1; here we set it to 10.
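The localization of the embedding sample inside the band can be sketched as below. The sketch assumes that the coefficients of one block are stored as a single flat list ordered [cA3 | D3 | D2 | D1], so that indices 64 to 127 fall in D3 and indices 128 to 255 in D2; it also treats the band as the half-open index range from a up to, but not including, b. Both conventions and the function name are assumptions made for illustration.

```python
def locate_insertion_index(coeffs, alpha=64, scale=0):
    """Return the index, within the band, of the coefficient closest to the band mean.

    coeffs: flat list of one block's DWT coefficients (assumed layout
            [cA3 | D3 | D2 | D1]).
    alpha:  factor fixing the band, with a = alpha and b = 2 * alpha - scale.
    """
    a, b = alpha, 2 * alpha - scale
    band = coeffs[a:b]
    if not band:
        raise ValueError("empty insertion band")
    mean = sum(band) / len(band)
    # On ties, the first (lowest-index) coefficient is kept.
    offset = min(range(len(band)), key=lambda k: abs(band[k] - mean))
    return a + offset
```

The returned index is the kind of position that the retrieval stage needs as part of its key.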

d) Insertion strategies: For each block, we substitute the LSB of the sample located in the previous step with the current watermark bit. A sample can contain one or two bytes. If a sample has two bytes, the substitution is done in the LSB of the rightmost byte. Our hiding process is applied to all blocks by inserting each watermark bit N times (watermark duplication). N is deduced from the audio signal length, the watermark size and the number of blocks. After repeating the substitutive insertion process and performing the IDWT, we obtain the watermarked audio signal.
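A minimal sketch of the substitution and of the duplication count follows. It works on integer values, for which the LSB of a two-byte sample is the LSB of its rightmost (low-order) byte; how real-valued coefficients are mapped to integers is not detailed here, so that step is left out. The formula for N (one bit per block, whole copies only) is an assumption consistent with the description, and the function names are hypothetical.

```python
def set_lsb(value, bit):
    """Replace the least significant bit of an integer with `bit` (0 or 1)."""
    return (value & ~1) | (bit & 1)


def get_lsb(value):
    """Read back the least significant bit of an integer."""
    return value & 1


def duplication_count(num_blocks, coded_watermark_bits):
    """Number of whole watermark copies that fit, assuming one bit per block."""
    if coded_watermark_bits <= 0:
        raise ValueError("the coded watermark must contain at least one bit")
    return num_blocks // coded_watermark_bits
```

For instance, a 32x32 binary image gives 1024 bits, or 1536 bits after Hamming (12,8) coding, so under this assumption an audio file must supply at least 1536 blocks for a single copy.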

2) Measuring the ODG value of the PEAQ algorithm: Each time we perform the insertion and watermarked signal construction, we calculate the ODG value. This value ranges from -4 to 0. For a transparent watermarking system, the ODG must be greater than or equal to -1. After measuring the ODG, we test: IF (ODG >= -1) or (ODG is maximal), THEN the watermarking system is transparent and we do not need to change the band carrying the watermark bits; ELSE (ODG < -1 and ODG < ODGmax and the insertion band does not exceed [64, 256]) we increase α (the factor that determines the insertion band) by 64, change the band of detail coefficients [a b] (a and b depend on α), and repeat the insertion and watermarked signal construction. Once we have obtained the best ODG value and the band that provides it, we move on to improving the NC value after MP3 compression attacks.
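One possible reading of this test as a loop is sketched below. The two callables, `embed` and `measure_odg`, are hypothetical placeholders for the insertion step and for a PEAQ implementation; they are not part of any specific library. The stopping rule (stop at the first ODG of at least -1, otherwise keep the best band seen while b = 2α stays within 256) is an interpretation of the description, not a confirmed detail.

```python
def select_band_by_odg(embed, measure_odg, alpha_step=64, max_b=256,
                       odg_target=-1.0):
    """Return (alpha, odg) for the band giving the best transparency.

    embed(alpha)        -> watermarked signal for the band [alpha, 2 * alpha]
    measure_odg(signal) -> ODG of that signal, between -4 and 0
    Both callables are hypothetical and supplied by the caller.
    """
    best_alpha, best_odg = None, None
    alpha = alpha_step
    while 2 * alpha <= max_b:
        odg = measure_odg(embed(alpha))
        if best_odg is None or odg > best_odg:
            best_alpha, best_odg = alpha, odg
        if odg >= odg_target:
            break  # transparent enough: keep this band
        alpha += alpha_step
    return best_alpha, best_odg
```

With a step of 64 and an upper limit of 256, this reading tries at most the bands [64, 128] and [128, 256].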


3) Measuring the NC value after MP3 compression attacks: After making the watermark as transparent as possible, we now seek the most reliable robustness against MP3 compression. We apply compression to the watermarked signal obtained in the inaudibility test step, then calculate the NC value. This value may be less than or equal to 1. For a fully robust watermarking system, NC must equal 1. We then check: IF (NC = 1) or (NC is maximal), THEN the watermarking system is robust and we do not need to change the insertion band; ELSE ((NC ≠ 1) and NC > NCmax) we reduce the upper limit b of the band [a b] by a scale equal to 10 (this value was obtained after several experiments), and this step is repeated until the best NC value is obtained.
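This robustness test can likewise be read as a loop that lowers b in steps of 10 for as long as NC keeps improving. In the sketch below, `nc_after_mp3` is a hypothetical callable standing for the whole chain "embed in [a b], compress to MP3, extract, compute NC"; the stopping rule is an interpretation of the description rather than a confirmed detail.

```python
def shrink_band_for_nc(nc_after_mp3, a, b, scale=10, nc_target=1.0):
    """Return (b, nc) for the upper band limit giving the best NC.

    nc_after_mp3(a, b) -> NC measured after MP3 compression for the band
    [a, b]; this callable is hypothetical and supplied by the caller.
    """
    best_b, best_nc = b, nc_after_mp3(a, b)
    while best_nc < nc_target and best_b - scale > a:
        candidate_b = best_b - scale
        nc = nc_after_mp3(a, candidate_b)
        if nc <= best_nc:
            break  # no further improvement: keep the best band found so far
        best_b, best_nc = candidate_b, nc
    return best_b, best_nc
```

The guard `best_b - scale > a` simply prevents the band from collapsing to an empty interval.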

B. DWT-LSB-PEAQ watermark retrieval:

The extraction process is the inverse of the embedding process and is shown in Fig. 5.

[Figure: flowchart. Recoverable labels: elimination of duplication, watermark treatment, detected watermark.]

Fig. 5. Watermark retrieval process.

The extraction process does not require the original audio signal. It needs the duplication number N and the middle-frequency positions found during the insertion process. These parameters constitute the secret key of our technique. We use the middle-frequency positions of the watermarked samples to extract the hidden bit from each watermarked block. The length of the detected watermark is necessarily a multiple of 12 because of the Hamming coding. After eliminating the duplication, we extract all the watermark bits, then decode them and, where needed, correct them using Hamming decoding. Finally, we obtain a corrected watermark whose length is a multiple of 8.
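The elimination of duplication is not detailed further; one common way to do it is a bitwise majority vote over the N extracted copies, sketched below. The majority rule, the assumption that the copies are laid out one after another, and the function name are all illustrative assumptions.

```python
def eliminate_duplication(extracted_bits, watermark_len, copies):
    """Collapse `copies` consecutive repetitions of a watermark by majority vote.

    extracted_bits: bits read from the blocks, copy 0 first, then copy 1, ...
    On a tie (possible when `copies` is even) the bit is decided as 0.
    """
    if len(extracted_bits) < watermark_len * copies:
        raise ValueError("not enough extracted bits for the given layout")
    result = []
    for i in range(watermark_len):
        ones = sum(extracted_bits[c * watermark_len + i] for c in range(copies))
        result.append(1 if 2 * ones > copies else 0)
    return result
```

The voted bit sequence, whose length is a multiple of 12, would then be passed 12 bits at a time to the Hamming decoder.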

III. EXPERIMENTAL RESULTS

In order to test the imperceptibility and robustness of the proposed audio watermarking method, we performed several experiments. The watermark chosen in this paper is either the text "author _ n" or a binary image of size 32x32, shown in Fig. 6. We recall that each watermark bit is duplicated.

[Figure: binary watermark image, with a preprocessing step.]

Fig. 6. Image watermark.

For these experiments, we use various ".wav" audio files with a 44.1 kHz sampling rate and 16 bits per sample; they are presented in Table I. To estimate the audio quality after watermark embedding, we use the signal-to-noise ratio (SNR) as an objective measure.

TABLE I. AUDIO SIGNALS AND THEIR DESCRIPTIONS.

Signal  Name       Description
1       "beet"     Symphony orchestra
2       "Partita"  Violin
3       "hello"    Piano
4       "coran"    Male spoken voice
5       "Tunisia"  Rhythmic music
6       "Jonass"   Male song voice
7       "svega"    Female song voice

The normalized cross-correlation [8] NC is calculated to measure similarity between the extracted and the inserted watermarks if they exist:

$$
NC = \frac{\sum_{i,j=1}^{n} \mathrm{bin}(i,j)\,\mathrm{bin}'(i,j)}{\sqrt{\sum_{i,j=1}^{n} \mathrm{bin}'(i,j)^{2} \, \sum_{i,j=1}^{n} \mathrm{bin}(i,j)^{2}}} \tag{1}
$$

The closer NC is to 1, the more similar the detected binary watermark "bin'" is to the inserted binary watermark "bin".
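As an illustration of this measure, the sketch below computes a normalized cross-correlation between two bit sequences given as flat lists. It assumes the usual form of the measure (sum of products divided by the square root of the product of the two energies); returning 0.0 when one sequence is all zeros is a convention chosen for this sketch, and the function name is hypothetical.

```python
import math


def normalized_correlation(inserted, detected):
    """Normalized cross-correlation between two equally long bit sequences."""
    if len(inserted) != len(detected):
        raise ValueError("both watermarks must have the same length")
    numerator = sum(x * y for x, y in zip(inserted, detected))
    energy = sum(y * y for y in detected) * sum(x * x for x in inserted)
    if energy == 0:
        return 0.0  # convention for an all-zero watermark, not from the paper
    return numerator / math.sqrt(energy)
```

A 32x32 binary image would simply be flattened row by row into a list of 1024 bits before calling the function.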

Our technique guarantees error-free detection (NC = 1) in the case of an ideal exchange, that is, an exchange without signal processing manipulations or attacks.

To take such operations into account and to evaluate the robustness of the watermarking scheme, we performed several tests in which the watermarked audio is subjected to commonly encountered degradations.


A. Robustness results

The degradations used to evaluate our scheme include MP3 compression at three bit rates (128 kbps, 96 kbps and 64 kbps) and various StirMark attacks [12].

Fig. 7 and Fig. 8 summarize the watermark detection results for these degradations. In this paper, we present the robustness results of the DWT-LSB-PEAQ technique for the songs "svega.wav" and "tunisia.wav".

We remark that the DWT-LSB-PEAQ method presents good robustness results against MP3 compression.

[Figure: bar chart of NC (axis from 0.75 to 1.05) at 128, 96 and 64 kbps, for the text watermark and the image watermark; series: Tunisia.wav and Svega.wav.]

Fig. 7. NC values of the DWT-LSB-PEAQ method after MP3 compression attack.

For the StirMark attacks, we observe that our technique is robust to several signal manipulations. For example, for the Compressor, addbrumm_100, fft_real_reverse, flippsample, extrastereo and Lsbzero attacks, the watermark is extracted perfectly. However, the method does not resist the echo attack. It should be noted that adding echo considerably alters the watermarked audio signal, and a watermarked signal that differs greatly from the original is of little interest to the watermarker. Overall, a careful look at the robustness results against StirMark attacks shows that the proposed technique performs well.

In addition, if we compare the robustness results for the attacks considered in this paper with the results for the same attacks reported in "Two audio watermarking schemes using the least significant bit in the transform domain" [4], the DWT-LSB-PEAQ algorithm performs better than the DWT-LSB algorithm [4], especially under MP3 compression at different rates and under extrastereo with different values. The robustness of our proposed algorithm is also better than that of the DCT-LSB algorithm proposed in [4], mainly under MP3 compression at different rates.


[Figure: bar charts of NC for Tunisia.wav and Svega.wav, with text and image watermarks. Recoverable attack labels: add noise, addbrumm, Lsbzero, Smooth, Smooth2, dynnoise, normalize, Cutsamples, copysample.]

Fig. 8. NC values of the DWT-LSB-PEAQ method after StirMark attacks.

B. Inaudibility results (imperceptibility):

Imperceptibility is related to the perceptual quality of the watermark data embedded within the original audio signal. It ensures that the quality of the signal is not perceivably distorted and that the watermark is imperceptible to a listener. To measure imperceptibility, we use the signal-to-noise ratio (SNR) as an objective measure.

The signal-to-noise ratio (SNR) is a statistical difference metric used to measure the similarity between the undistorted original audio signal and the distorted watermarked audio signal.

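The SNR is computed according to Equation (2), given here in its usual form, where "Y" corresponds to the original signal and "y" corresponds to the watermarked signal.

$$
SNR = 10 \log_{10} \frac{\sum_{n} Y(n)^{2}}{\sum_{n} \left( Y(n) - y(n) \right)^{2}} \tag{2}
$$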
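For illustration, the sketch below computes an SNR in dB between an original and a watermarked signal, assuming the usual definition (original signal energy divided by the energy of the difference, on a 10 log10 scale). The function name is hypothetical, and returning infinity for identical signals is a convention chosen for this sketch.

```python
import math


def snr_db(original, watermarked):
    """SNR in dB between an original signal and its watermarked version."""
    if len(original) != len(watermarked):
        raise ValueError("both signals must have the same length")
    signal_energy = sum(x * x for x in original)
    noise_energy = sum((x - y) ** 2 for x, y in zip(original, watermarked))
    if noise_energy == 0:
        return math.inf  # identical signals: no embedding distortion
    return 10 * math.log10(signal_energy / noise_energy)
```

Because LSB substitution changes a value by at most one quantization step, the distortion energy stays small relative to the signal energy, which is what a high SNR reflects.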

Fig. 9 shows the SNR results of the DWT-LSB-PEAQ watermarking method.

[Figure: bar chart of SNR (axis from 40 to 55) for the text watermark and the image watermark; series: svega.wav and tunisia.wav.]

Fig. 9. SNR values of the DWT-LSB-PEAQ technique.

It is instructive to point out that the proposed algorithm satisfies the desired features of optimal audio watermarking set by the International Federation of the Phonographic Industry (IFPI) [14][15].

IFPI states that the watermark should not degrade the perception of the audio and that the algorithm should offer an SNR of more than 20 dB. Referring to the figure above, the proposed algorithm fulfills the performance required by IFPI.

Fig. 10. SNR values for different techniques.


To compare with other techniques, we consider the inaudibility results of the DWT-SVD scheme [13] and of some traditional techniques [15] (each SNR value is calculated as the mean of the SNR values of different audio signals, each marked with different watermarks). The comparison results are shown in Fig. 10. Our proposed technique presents the highest SNR values (the means of the SNR values presented in Fig. 10), except for the LSB method in the time domain, which reaches an SNR of 67.91 but has poor immunity to manipulation and is weak against channel noise [15].

IV. CONCLUSIONS

In this paper, we present a blind audio watermarking scheme using the ODG measure of the PEAQ algorithm and the NC value after MP3 compression attacks. Our proposed scheme embeds the watermark into the digital audio signal in the wavelet domain using the DWT. To improve inaudibility, we adapt the PEAQ algorithm to select the audio band used to hide the watermark in the LSB of the multiresolution coefficients. Additionally, we use some MP3 compression characteristics to increase the robustness of the scheme, and we duplicate the watermark bits across all audio signal blocks, thereby ensuring a high embedding ratio without affecting inaudibility. Robustness is also improved by using the Hamming code, which compensates for the corruption of the watermark. Moreover, the method makes the watermark imperceptible by hiding the watermark bits in the LSB of the middle frequencies. Watermark detection is done without reference to the original audio signal, which makes it easy to identify the copyright owner of the audio.

Experiments have shown that the inaudibility and robustness goals are achieved by our proposed technique.

To strengthen the robustness results against diverse attacks, we are currently studying the properties of the different attacks.

ACKNOWLEDGMENT

The authors would like to thank Professor Mohamed Adel ALIMI from the REGIM (REsearch Group on Intelligent Machines) laboratory for his advice and for the fruitful discussions with him. The authors would also like to acknowledge the financial support of this work by grants from the DGRST (Direction Générale de la Recherche Scientifique et de Technologie), Tunisia, under the ARUB program 01/UR/11/02.


REFERENCES

[1] Cox, I., Miller, M., and Bloom, J., "Digital Watermarking", USA: Academic Press, 2002.

[2] Cvejic, N., Seppanen, T., "Digital audio watermarking techniques and technologies", Information Science Reference, USA, August 2007.

[3] Bassia, P., Pitas, I., Nikolaidis, N., "Robust audio watermarking in the time domain", IEEE Transactions on Multimedia 3(2), 232-241, 2001.

[4] S. Masmoudi, M. Charfeddine, and C. Ben Amar, "Two audio watermarking schemes using the least significant bit in the transform domains", IEEE MCC'10, Zurich, 2010.

[5] M. Charfeddine, M. Elarbi, M. Koubaa, C. Ben Amar, "DCT based blind audio watermarking scheme", IEEE SIGMAP'10, Athens, Greece, 26-28 July 2010.

[6] M. Charfeddine, M. Elarbi, C. Ben Amar, "A blind audio watermarking scheme based on neural network and psychoacoustic model with error correcting code in wavelet domain", ISCCSP'2008, Malta, 12-14 March 2008, pp. 1138-1143.

[7] Union Internationale des Télécommunications (UIT), Recommandation BS.1387: "Méthode de mesure objective de la qualité du son perçu", 2001.

[8] B. Cleo, "Tatouage informé de signaux audio numériques", thesis, 2005.

[9] Strang, G., Nguyen, T., "Wavelets and Filter Banks", Wellesley, MA: Wellesley-Cambridge Press, 1996.

[10] Mallat, S., "A theory for multiresolution signal decomposition: the wavelet representation", IEEE Transactions on Pattern Analysis and Machine Intelligence 11(7), 674-693, 1989.

[11] Hsieh, M., Tseng, D., Huang, Y., "Hiding Digital Watermarks Using Multiresolution Wavelet Transform", IEEE Transactions on Industrial Electronics 48(5), 875-882, 2001.

[12] Lang, Dittmann, Spring and Vielhauer, "Audio Watermark Attacks: From Single to Profile Attacks", 2005.

[13] Al-Haj and Mohammad, "Digital Audio Watermarking Based on the Discrete Wavelets Transform and Singular Value Decomposition", European Journal of Scientific Research, Vol. 39, 2010, pp. 6-21.

[14] IFPI (International Federation of the Phonographic Industry), 2009. http://www.ifpi.org.

[15] Sehirli, M., Gurgen, F. and Ikizoglu, "Performance evaluation of digital audio watermarking techniques designed in time, frequency and cepstrum domains", Proceedings of the International Conference on Advances in Information Systems, 2004, pp. 430-440.
