

International Journal of Computer Engineering and Technology (IJCET), ISSN 0976-6367 (Print), ISSN 0976-6375 (Online), Volume 4, Issue 3, May-June (2013), © IAEME

HYBRID ZERO-WATERMARKING AND MARKOV MODEL OF WORD MECHANISM AND ORDER TWO ALGORITHM FOR CONTENT AUTHENTICATION OF ENGLISH TEXT DOCUMENTS

Kulkarni U. Vasantrao¹, Fahd N. Al-Wesabi², Adnan Z. Alsakaf³

¹ Professor, Department of Comp. Sci. and Engg., SGGS Institute of Engg. and Tech., Maharashtra, INDIA.
² PhD Candidate, Faculty of Engineering, SRTM University, Nanded, INDIA; Assistant Teacher, Department of IT, Faculty of Computing and IT, UST, Sana’a, Yemen.
³ Professor, Department of IS, Faculty of Computing and IT, UST, Sana’a, Yemen.

ABSTRACT

Content authentication and tamper detection of digital text documents have become a major concern in the era of communication and information exchange via the Internet. Very few techniques are available for content authentication of text documents using digital watermarking.

This paper develops an English text zero-watermarking approach based on the word mechanism and order two of the Markov model for content authentication and tamper detection of text documents. In the proposed approach, the Markov model is used as a soft computing tool for text analysis and is hybridized with digital watermarking techniques. The aim is to address the accuracy and complexity issues of the previous watermarking technique presented in [27].

The proposed approach is implemented in the PHP programming language. Its effectiveness and feasibility are demonstrated with experiments on six datasets of varying lengths. The experimental results show that the proposed approach is more sensitive to all kinds of tampering attacks and achieves good tampering detection accuracy. This accuracy is compared with other recent approaches under random insertion, deletion, and reorder attacks at multiple random locations in the experimental datasets. The comparative results show that the proposed approach is better than the WO1 approach in terms of watermark complexity, capacity, and tampering detection accuracy under insertion and deletion attacks. The proposed approach is therefore recommended in these cases. However, it is not applicable under reorder tampering attacks, especially on large text documents.

INTERNATIONAL JOURNAL OF COMPUTER ENGINEERING & TECHNOLOGY (IJCET)
ISSN 0976-6367 (Print)
ISSN 0976-6375 (Online)
Volume 4, Issue 3, May-June (2013), pp. 260-290
© IAEME: www.iaeme.com/ijcet.asp
Journal Impact Factor (2013): 6.1302 (Calculated by GISI), www.jifactor.com


Keywords: Digital watermarking, Markov Model, order two, word mechanism, probabilistic patterns, information hiding, content authentication, tamper detection, copyright protection.

I. INTRODUCTION

With the increasing use of the Internet, e-commerce, and other efficient communication technologies, copyright protection and authentication of digital content have gained great importance. Most of this digital content is in text form, such as email, websites, chats, e-commerce, eBooks, news, and short messaging services (SMS) [1].

These text documents may be tampered with by malicious attackers, and the modified data can lead to fatally wrong decisions and transaction disputes [2].

Content authentication and tamper detection of digital images, audio, and video have been of great interest to researchers. More recently, copyright protection, content authentication, and tamper detection of text documents have also attracted attention. However, during the last decade, research on text watermarking schemes has focused mainly on copyright protection. It has paid less attention to content authentication, integrity verification, and tamper detection [4].

Various techniques have been proposed for copyright protection, authentication, and tamper detection of digital text documents. Digital watermarking (DWM) techniques are considered the most powerful solutions to most of these problems. In digital watermarking, information such as an image, plain text, audio, video, or a combination of these is embedded as a watermark in digital content. Applications include copyright protection, owner identification, content authentication, tamper detection, and access control [2].

Traditional text watermarking techniques, such as format-based, content-based, and image-based techniques, must transform or modify the contents of a text document to embed watermark information within it. A newer technique, called zero-watermarking, has been proposed for text documents. Zero-watermarking does not change the contents of the original text document. Instead, it uses the contents of the text itself to generate the watermark information [13].

In this paper, the authors present a new zero-watermarking technique for digital text documents. This technique exploits the probabilistic nature of natural languages, specifically a word-level, second-order Markov model.

The paper is organized as follows. Section 2 provides an overview of previous work on text watermarking. Section 3 describes the proposed generation and detection algorithms in detail. Section 4 presents the experimental results for tampering attacks such as insertion, deletion, and reordering, and evaluates the proposed approach on multiple text datasets. The last section concludes the paper and gives directions for future work.

II. PREVIOUS WORK

Text watermarking techniques have been proposed and classified in the literature according to several features and embedding modes. We briefly examine some traditional classifications of digital watermarking from the literature. Depending on the viewpoint, these include text-image-based, content-based, format-based, feature-based, synonym-substitution-based, syntactic-structure-based, acronym-based, and noun-verb-based text watermarking algorithms, among others [1][3][4].


A. Format-based Techniques

Text watermarking techniques based on format are layout dependent. In [5], three embedding methods were proposed for text documents: line-shift coding, word-shift coding, and feature coding.

In line-shift coding, each even line is shifted up or down depending on the value of the corresponding watermark bit. Typically, the line is shifted up if the bit is one and down otherwise. The odd lines serve as control lines and are used during decoding.

Similarly, in word-shift coding, words are shifted so that the inter-word spaces encode the watermark bits.

Finally, in feature coding, certain text features, such as the pixels of characters or the length of the end lines of characters, are altered in a specific way to encode the zeros and ones of the watermark.

Watermark detection is performed by comparing the original and the watermarked documents.

B. Content-based Techniques

Text watermarking techniques based on content are structure-based and natural-language dependent [4]. In [6][14], a syntactic approach was proposed that embeds watermark bits using the syntactic structure of the cover text. It applies syntactic transformations to the syntactic tree while preserving the natural properties of the text during embedding. In [18], synonym substitution was proposed: the watermark is embedded by replacing certain words with their synonyms without changing the sense and context of the text.

C. Binary Image-based Techniques

Text watermarking techniques for binary image documents build on traditional image watermarking techniques in the spatial and transform domains, such as the Discrete Cosine Transform (DCT), Discrete Wavelet Transform (DWT), and Least Significant Bit (LSB) [5]. Several formal text watermarking methods embed the watermark in a text image by shifting words and sentences right or left, or shifting lines up or down. This is the same mechanism described above for format-based watermarking [5][7].

D. Zero-based Techniques

Zero-based text watermarking techniques depend on content features. Several such approaches designed for text documents have been proposed in the literature, and they are reviewed in this paper [1][19][20][21].

The first algorithm, proposed in [19], detects tampering in plain text documents based on word length, using digital watermarking and certifying authority techniques.

The second algorithm, proposed in [20], aims to improve text authenticity. It uses the contents of the text to generate a watermark, which is later extracted to prove the authenticity of the text document.

The third algorithm, proposed in [1], provides copyright protection of text contents based on the occurrence frequency of non-vowel ASCII characters and words.

The last algorithm, proposed in [21], protects all open textual digital contents from counterfeiting. It logically inserts a watermark image into the text and later extracts it to prove ownership.

In [22], a Chinese text zero-watermark approach was proposed based on a space model. It uses two-dimensional model coordinates at the word level and sentence weights at the sentence level.


E. Combined-based Techniques

Text is dissimilar to an image: language has a distinct syntactic nature that makes image-based techniques more difficult to apply. Text should therefore be treated as text rather than as an image, and the watermarking process should be performed differently. In [23], a combined method was proposed for copyright protection that draws on the best of both image-based and language-based text watermarking techniques.

The text watermarking approaches mentioned above are not suitable for all types of text documents when document size, document type, and random tampering attacks are considered. In addition, the mechanisms they use to embed and extract the watermark may be easily discovered by attackers. These approaches are also not designed specifically for authentication and tamper detection of text documents. They rely on modifying the original text document to embed external information, which is later used for purposes such as content authentication, integrity verification, tamper detection, or copyright protection.

This paper proposes a novel intelligent approach for content authentication and tamper detection of English text documents. Watermark embedding and extraction are performed logically, based on text analysis and content features extracted with a Markov model. As a result, the original text document is not altered to embed the watermark.

III. THE PROPOSED APPROACH

This paper presents an improved intelligent approach to English text zero-watermarking, based on the word level and second order of the Markov model, for content authentication and tampering detection of text documents.

The improved approach uses the word mechanism and order two of the Markov model. Its goal is to improve the performance, complexity, and tampering detection accuracy of a similar order-one approach, developed by F. Al-Wesabi et al. and presented in [27]. The improved approach should carry out watermark generation, embedding, extraction, and detection with higher accuracy and security. It hybridizes text zero-watermarking techniques with soft computing tools for natural language processing to protect digital text documents.

A Markov model is used for text analysis. It extracts the interrelationships between the contents of the text as probabilistic patterns, based on the word level and second order of the model, and these patterns form the watermark information. The watermark can later be extracted with the extraction algorithm. The detection algorithm then matches it against the watermark generated from the attacked document to identify any tampering and prove the authenticity of the text document.

Before explaining the watermark generation and detection processes, the next subsection gives a preliminary mathematical description of the second-order, word-level Markov model for text analysis.

A. Markov Models for Text Analysis

In this subsection, we explain how to model text using a Markov chain, a stochastic (random) model describing how a process moves from one state to another. For example, suppose that we want to analyse the following sentence:

“The quick brown fox jumps over the brown fox who is slow jumps over the brown fox who is dead.”


When we use a word-level Markov model of order two, each sequence of two words is a state. As the sample text above is processed, the system makes the following transitions:

"the quick" -> "quick brown" -> "brown fox" -> "fox jumps" -> "jumps over" -> "over the" -> "the brown" -> "brown fox" -> "fox who" -> "who is" -> "is slow" -> "slow jumps" -> … etc.
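The state sequence above can be reproduced with a short Python sketch. It assumes a simple tokenizer: lowercasing plus a regular-expression word split that drops punctuation. The paper only specifies lowercasing, so both the tokenizer and the function name `order_two_transitions` are illustrative.

```python
import re

def order_two_transitions(text):
    """Return (state, next_word) pairs for a word-level, order-two Markov chain.

    A state is a pair of consecutive words (w[i], w[i+1]); it is followed by w[i+2].
    """
    words = re.findall(r"[a-z0-9']+", text.lower())  # assumed tokenizer
    return [((words[i], words[i + 1]), words[i + 2]) for i in range(len(words) - 2)]

sample = ("The quick brown fox jumps over the brown fox who is slow "
          "jumps over the brown fox who is dead.")
states = [" ".join(state) for state, _ in order_two_transitions(sample)]
print(" -> ".join(states[:6]))
# the quick -> quick brown -> brown fox -> fox jumps -> jumps over -> over the
```

Only the 18 positions that have a following word produce a transition. The final pair "is dead" therefore never appears as a present state.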

Next, we present a simple method to build the Markov matrix of states and transitions, M[i, j], which is the most basic part of text analysis with the Markov model. In this approach, the size of the Markov matrix is not fixed: the number of states and transition probabilities varies with the contents of the given text. The numbers of possible states and transitions are given by equations (1) and (2):

$$P_s = n - 2 \qquad (1)$$

$$P_t = (n - 2)^2 \qquad (2)$$

Where, P_s: the number of states; P_t: the number of entries in the matrix of transitions; n: the number of words in the given text document.

For the sample text above (n = 20 words), there are therefore (20 − 2) = 18 word pairs (states).

In the matrix of transition probabilities, each state has (n − 2) possible transitions. If the Markov chain is currently at the first state (the first two words) of the given text document, the words that could come next are $[W_{i+2}, W_{i+3}, W_{i+4}, \dots, W_n]$. The matrix of transition probabilities therefore has (n − 2)^2 entries. For example, in the sample text, if the Markov chain is at the state "the quick", the possible transitions that could come next are [brown, fox, jumps, over, the, …, dead].

The matrix of transition probabilities for the sample text therefore has $(20 - 2)^2 = 18^2 = 324$ entries.

In general, a text of n words yields (n − 2) states, and the matrix of transition probabilities needs (n − 2)^2 entries.

Analysing the 20-word sentence above with a word-level Markov model of order two and representing the result as a Markov chain gives Figure 2. The figure shows the 11 distinct present states (word pairs) of the Markov chain, without repetitions, and all 18 (n − 2) transitions.
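The counts quoted above can be checked with the following sketch: 20 words, 18 transitions, 11 distinct states, and the (n − 2)^2 = 324 upper bound on matrix entries. The regular-expression tokenizer is an assumption, since the paper does not say how words are delimited.

```python
import re

sample = ("The quick brown fox jumps over the brown fox who is slow "
          "jumps over the brown fox who is dead.")
words = re.findall(r"[a-z0-9']+", sample.lower())  # assumed tokenizer
n = len(words)

# Each position i = 0 .. n-3 starts a two-word state that is followed by a word.
transitions = [(words[i], words[i + 1], words[i + 2]) for i in range(n - 2)]
unique_states = {(a, b) for a, b, _ in transitions}

print(n, len(transitions), len(unique_states), (n - 2) ** 2)  # 20 18 11 324
```

Only 11 of the 18 state positions are distinct, which is why the matrix is built without repetitions.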


Fig. 2. Sample text states and transitions based on order 2 of a Markov model

Now consider the state "brown fox": its next-state transitions are "jumps", "who", and "who". The transition to "who" therefore occurs twice.
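The transition counts for "brown fox" can be read off the sample sentence with `collections.Counter`. This is a minimal sketch; the tokenizer is again an assumption.

```python
import re
from collections import Counter

sample = ("The quick brown fox jumps over the brown fox who is slow "
          "jumps over the brown fox who is dead.")
words = re.findall(r"[a-z0-9']+", sample.lower())  # assumed tokenizer

state = ("brown", "fox")
# Collect the word that follows every occurrence of the two-word state.
successors = Counter(
    words[i + 2] for i in range(len(words) - 2) if (words[i], words[i + 1]) == state
)
print(successors)  # Counter({'who': 2, 'jumps': 1})
```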

For the analysis of large texts, we calculate the frequencies of occurrence of the next states and from them obtain the transition probabilities. The following is a simple procedure to obtain a Markov model of order two for a given text.

1. Build an (n − 2) × (n − 2) matrix M, initialized to all zeros, to store the transitions. The entry M[i, j] records the number of times that state i is followed by word j within the given text.
2. For i = 1 to n − 2, where n is the number of words in the text document, let x be the state formed by the i-th and (i+1)-th words and y be the (i+2)-th word. Increment M[x, y]. The matrix M now contains the counts of all transitions.
3. To turn these counts into probabilities, compute the row sum $\mathrm{counter}[i] = \sum_j M[i, j]$ for each row i, and define $P[i, j] = M[i, j] / \mathrm{counter}[i]$ for all pairs i, j.

P[i, j] is then the probability of making a transition from state i to word j, and the matrix P describes the Markov model of order two for the given text.
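The procedure above can be sketched in Python as follows. Instead of a dense (n − 2) × (n − 2) array, this sketch stores only the observed cells in nested dictionaries; that is a space-saving substitution, not the paper's layout. The function name `markov_order_two` and the tokenizer are assumptions.

```python
import re
from collections import defaultdict

def markov_order_two(text):
    """Return (counts, probs) for a word-level, order-two Markov model.

    counts[state][w] is how often the two-word state is followed by word w;
    probs[state][w] divides that count by the state's row total (counter[i]).
    """
    words = re.findall(r"[a-z0-9']+", text.lower())  # assumed tokenizer
    counts = defaultdict(lambda: defaultdict(int))
    for i in range(len(words) - 2):  # the n - 2 positions that have a next word
        counts[(words[i], words[i + 1])][words[i + 2]] += 1
    probs = {}
    for state, row in counts.items():
        total = sum(row.values())  # counter[i]
        probs[state] = {w: c / total for w, c in row.items()}
    return counts, probs

sample = ("The quick brown fox jumps over the brown fox who is slow "
          "jumps over the brown fox who is dead.")
counts, probs = markov_order_two(sample)
print(dict(counts[("brown", "fox")]), probs[("brown", "fox")]["who"])
```

For the sample sentence, the "brown fox" row holds the counts 1 ("jumps") and 2 ("who"), which give probabilities 1/3 and 2/3. Every row of `probs` sums to 1.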

B. Watermark Generation and Embedding Algorithm

The watermark generation and embedding algorithm takes as input the original text document (TO) provided by the author. As a pre-processing step, capital letters are converted to small letters. The output of the algorithm is a watermark pattern, which is then stored in the watermark database along with the main properties of the original text document: document identity, author name, and current date and time.

This stage involves three algorithms, as shown in Figure 3: pre-processing and building the Markov matrix, text analysis, and watermark generation and embedding.


Fig. 3. Watermark generation and embedding processes

1) Pre-processing and Building the Markov Matrix

This algorithm takes the original text document as input and outputs the preprocessed text document and the Markov matrix. Building the matrix of states and transitions is the most basic part of text analysis and watermark generation with the Markov model.

The Markov matrix represents the possible states and transitions in the given text and is constructed without repetitions. Each unique sequence of two words in the text becomes a state (word set) in the matrix, and the words that follow it become its transitions. While building the matrix, the algorithm initializes all transition values to zero. These cells are later used to count the number of times that state i is followed by word j within the given text document.

[Fig. 3 flowchart labels: Original Text Document (TDO); Text Pre-processing; Building Markov matrix; Text Analysis using Markov model, Word Level Order TWO; Present State (PS); Next State (NS); Compute # of occurrences of NS transitions for every PS; Terminate? (Yes/No); Probabilistic patterns; Digest WM patterns using MD5 algorithm; Watermark DB (WMO patterns, DocID, Date, Time)]


The preProcessing algorithm executes as follows:

PROCEDURE preProcessing(TO)
- Input: original text document (TO)
- Output: preprocessed text document (TP) and the array of states of the given text without repeats (arrayList)
- BEGIN
  o // convert letters from capital to small
  o TP = LowerCharacter(TO)
  o words = the sequence of words in TP
  o // list every unique sequence of two words within the given text as a state
  o Loop index = 0 to words.Length - 3
      • state = words[index] + " " + words[index + 1]
      • IF state is not in arrayList THEN append state to arrayList
      • index ++
- END
Where, TO: the original text document; TP: the preprocessed text document; arrayList: the array of states of the given text after pre-processing; index: the current word in the given text.
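The pre-processing step above can be sketched in Python as follows. A set backs the "already in arrayList?" test, so each membership check takes constant time. The function name `pre_processing` and the word tokenizer are assumptions, since the paper does not say how words are delimited.

```python
import re

def pre_processing(original_text):
    """Sketch of preProcessing(TO).

    Returns (tp, array_list): tp is the lowercased text and array_list holds every
    distinct two-word state once, in order of first appearance.
    """
    tp = original_text.lower()                 # convert capital letters to small
    words = re.findall(r"[a-z0-9']+", tp)      # assumed tokenizer (not in the paper)
    array_list, seen = [], set()
    for index in range(len(words) - 2):        # only states followed by a word
        state = words[index] + " " + words[index + 1]
        if state not in seen:                  # keep each state exactly once
            seen.add(state)
            array_list.append(state)
    return tp, array_list

tp, array_list = pre_processing("The quick brown fox jumps over the brown fox "
                                "who is slow jumps over the brown fox who is dead.")
print(len(array_list), array_list[:3])  # 11 ['the quick', 'quick brown', 'brown fox']
```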

The algorithm for building the Markov matrix executes as follows:

PROCEDURE Build_Markov_Matrix(TP)
- Input: preprocessed text (TP)
- Output: Markov matrix with all cells initialized to zero
- BEGIN
  o // perform the pre-processing step
  o Call preProcessing(TP)
  o // build the states and transitions matrix of the Markov model and initialize all cells to zero
  o Loop ps = 0 to arrayList.Length - 1
      • Loop ns = 0 to arrayList.Length - 1
          ▪ MarkovMatrix[ps][ns] = 0
          ▪ ns ++
      • ps ++
- END
Where, TP: the preprocessed text document; MarkovMatrix: the states and transitions matrix with zero in every cell; ps: the present state; ns: the next state.
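A Python sketch of the zero-initialized matrix follows. It assumes rows are the distinct two-word states and columns are the distinct words that can follow a state. In an order-two chain, the next state (w[i+1], w[i+2]) is fixed once the next word w[i+2] is known, so indexing columns by word is an equivalent choice, made here for compactness. The name `build_markov_matrix` and the tokenizer are illustrative.

```python
import re

def build_markov_matrix(text):
    """Sketch of Build_Markov_Matrix: a zero-filled matrix of states x next words."""
    words = re.findall(r"[a-z0-9']+", text.lower())  # assumed tokenizer
    states, next_words = [], []
    for i in range(len(words) - 2):
        state, following = (words[i], words[i + 1]), words[i + 2]
        if state not in states:          # rows: distinct present states
            states.append(state)
        if following not in next_words:  # columns: distinct next words
            next_words.append(following)
    # Build one independent list per row ([[0] * k] * m would alias the rows).
    matrix = [[0] * len(next_words) for _ in states]
    return states, next_words, matrix

states, next_words, matrix = build_markov_matrix(
    "The quick brown fox jumps over the brown fox who is slow "
    "jumps over the brown fox who is dead.")
print(len(states), len(next_words))  # 11 9
```

For the sample sentence this gives an 11 × 9 matrix (99 cells). That is well below the (n − 2)^2 = 324 upper bound, because repeated states and words share a row or column.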

2) Algorithm of Text Analysis

This algorithm takes the preprocessed text document as input and outputs the watermark patterns. Once the Markov matrix has been constructed, text analysis is performed with the word-level, order-two Markov model by finding the interrelationships between the words of the given text document. In other words, the algorithm computes the number of occurrences of the next-state transitions for every present state. The matrix of transition probabilities records the number of occurrences of each transition from one state to another and is constructed by equation (3):

$$\mathrm{MarkovMatrix}[ps][ns] = \mathrm{TotalNumberOfTransitions}[i][j], \qquad i, j = 1, 2, \dots, n - 2 \qquad (3)$$

Where, n: the number of words in the given text, so that there are n − 2 states; i: PS, the present state; j: NS, the next state; P[i, j]: the probability of making a transition from present state i to next state j.

Figure 4 illustrates the text analysis of the sample sentence based on the word mechanism and order two, represented as a Markov chain.

Fig. 4. Text analysis processes based on order 2 of a Markov model

Let TP be the preprocessed text, and let MarkovMatrix[ps][ns] be the Markov matrix that stores the number of times that state ps is followed by ns in the given text. The text analysis algorithm is presented formally and executes as follows:

PROCEDURE text_analysis(TP)
- Input: preprocessed text (TP)
- Output: Markov matrix holding the number of occurrences (transition frequency) of every transition
- BEGIN
  o // build the states and transitions matrix of the Markov model
  o Call Build_Markov_Matrix(TP)
  o // compute the total frequency of transitions for every state (equation (3))
  o Loop counter = 0 to TP.Length - 3   // TP.Length: number of words in TP
      • ps = the state formed by words counter and counter + 1 of TP
      • ns = the next state, reached by word counter + 2 of TP
      • MarkovMatrix[ps][ns] = MarkovMatrix[ps][ns] + 1
      • counter ++
- END
Where, TP: the preprocessed text document; MarkovMatrix: the states and transitions matrix holding the transition frequency values for every state.
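The counting pass can be sketched in Python as below. One scan over the words increments the cell of each observed (present state, next word) pair, which is what equation (3) tabulates. Columns are indexed by the next word, which is equivalent to the next state in an order-two chain. The function name `text_analysis` mirrors the pseudocode, but the tokenizer and data layout are assumptions.

```python
import re

def text_analysis(text):
    """Sketch of text_analysis: matrix[ps][ns] counts state ps followed by word ns."""
    words = re.findall(r"[a-z0-9']+", text.lower())  # assumed tokenizer
    row_of, col_of, observed = {}, {}, []
    for i in range(len(words) - 2):
        # setdefault assigns the next free index the first time a label is seen.
        ps = row_of.setdefault((words[i], words[i + 1]), len(row_of))
        ns = col_of.setdefault(words[i + 2], len(col_of))
        observed.append((ps, ns))
    matrix = [[0] * len(col_of) for _ in row_of]  # all cells start at zero
    for ps, ns in observed:
        matrix[ps][ns] += 1                       # one more observed transition
    return list(row_of), list(col_of), matrix

states, next_words, matrix = text_analysis(
    "The quick brown fox jumps over the brown fox who is slow "
    "jumps over the brown fox who is dead.")
ps, ns = states.index(("brown", "fox")), next_words.index("who")
print(matrix[ps][ns], sum(map(sum, matrix)))  # 2 18
```

The cell counts add up to the 18 transitions of the sample, spread over 13 nonzero cells.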

3) Algorithm of Watermark Generation and Embedding

After the text analysis has extracted the probability features, the watermark is obtained by identifying all the nonzero values in the Markov matrix. These nonzero values are concatenated in sequence to form a watermark pattern, denoted WMPO, as given by equation (4) and shown in Figure 5.

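$$WMP_O \mathrel{\&}= \mathrm{MarkovMatrix}[ps][ns], \qquad \text{for all } (ps, ns) \text{ with } \mathrm{MarkovMatrix}[ps][ns] \neq 0 \qquad (4)$$

where &= appends (concatenates) each value to WMP_O.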

Fig. 5. The original watermark patterns for a given sample text
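Equation (4) amounts to a row-by-row scan that appends every nonzero cell to a string. The sketch below assumes a plain list-of-lists matrix and decimal formatting of each value. Because only the values (not their positions) are concatenated, the resulting pattern depends on the row and column order chosen when the matrix was built.

```python
def concatenate_patterns(matrix):
    """Sketch of equation (4): append each nonzero cell, row by row, to WMP_O."""
    wmp_o = ""
    for row in matrix:          # present states (ps)
        for value in row:       # next states (ns)
            if value != 0:      # only transitions that actually occur
                wmp_o += str(value)
    return wmp_o

print(concatenate_patterns([[1, 0, 2], [0, 0, 1]]))  # 121
```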

Embedding is performed logically during text analysis by keeping track of all nonzero transitions and their values in the Markov matrix. Each nonzero cell contains the number of times that state i is followed by word j within the given text document. The detection algorithm later matches these tracks against the tracks produced from the attacked text document.

The watermark is then stored in the watermark database along with some properties of the original text document, such as document identity, author name, and current date and time. After the watermark is generated as sequential patterns, an MD5 message digest is computed to obtain a secure and compact form of the watermark, as given by equation (5) and shown in Figure 6.

$$DWM = \mathrm{MD5}(WMP_O) \qquad (5)$$

Fig. 6. The original watermark after MD5 digesting
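Equation (5) maps directly onto Python's standard `hashlib` module. In this sketch the pattern string is encoded as UTF-8 before hashing (an assumption; the paper does not state an encoding), and the digest is returned as 32 hexadecimal characters. The function name `digest_watermark` is illustrative.

```python
import hashlib

def digest_watermark(wmp_o: str) -> str:
    """Sketch of equation (5): the MD5 digest of WMP_O as a 32-character hex string."""
    return hashlib.md5(wmp_o.encode("utf-8")).hexdigest()

original = digest_watermark("1112122221111")
tampered = digest_watermark("1112122221112")  # one count changed
print(len(original), original != tampered)    # 32 True
```

MD5 yields a compact, fixed-size value for storage and exact comparison, but it is no longer considered collision resistant. `hashlib.sha256` could be substituted where that matters.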


The proposed watermark generation and embedding algorithm, using the word-level, second-order Markov model, is presented formally and executes as follows:

PROCEDURE watermark_gen_embed(M[i,j])
- Input: Markov matrix [i, j]
- Output: original watermark patterns (WMPO) and original watermark (WMO)
- BEGIN
  o // compute the total frequencies of transitions for every state of the original document
  o Call text_analysis(TP)
  o // concatenate the watermark patterns of every state shown in the Markov matrix
  o Loop ps = 0 to MarkovMatrix[rows].Length - 1
      • Loop ns = 0 to MarkovMatrix[columns].Length - 1
          ▪ IF MarkovMatrix[ps][ns] != 0 THEN   // states that have nonzero transitions
              - WMPO &= MarkovMatrix[ps][ns]
          ▪ ns ++
      • ps ++
  o Store WMPO in the DWM database
  o // digest the original watermark using the MD5 algorithm
  o WMO = MD5(WMPO)
  o Output WMPO, WMO
- END
Where, WMO: the original watermark; WMPO: the original watermark patterns; MD5: the hash algorithm.
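An end-to-end Python sketch of watermark_gen_embed is given below. It chains lowercasing, order-two counting, concatenation of the nonzero counts in first-seen order, and MD5 digesting. The result is stored in a plain dictionary that stands in for the watermark database. The record fields, the function name, and the tokenizer are hypothetical.

```python
import hashlib
import re
from datetime import datetime, timezone

def watermark_gen_embed(text, doc_id, author, database):
    """Sketch: generate WMP_O and WM_O for a document and store them with its metadata."""
    words = re.findall(r"[a-z0-9']+", text.lower())  # pre-processing (assumed tokenizer)
    counts = {}                                       # state -> {next word: count}
    for i in range(len(words) - 2):                   # order-two text analysis
        row = counts.setdefault((words[i], words[i + 1]), {})
        row[words[i + 2]] = row.get(words[i + 2], 0) + 1
    # Only observed (nonzero) transitions exist in `counts`; dicts keep first-seen order.
    wmp_o = "".join(str(c) for row in counts.values() for c in row.values())
    wm_o = hashlib.md5(wmp_o.encode("utf-8")).hexdigest()
    database[doc_id] = {
        "author": author,
        "created": datetime.now(timezone.utc).isoformat(),
        "WMP_O": wmp_o,
        "WM_O": wm_o,
    }
    return wmp_o, wm_o

db = {}
wmp_o, wm_o = watermark_gen_embed(
    "The quick brown fox jumps over the brown fox who is slow "
    "jumps over the brown fox who is dead.", "doc-001", "author-name", db)
print(wmp_o)  # 1112122221111
```

The original text is never modified. Only the derived pattern and its digest are kept, which is the zero-watermarking idea described above.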

C. Algorithms of Watermark Extraction and Detection

The watermark detection algorithm is based on zero-watermarking. Before detection can run on the attacked text document TA, the algorithm must therefore first generate the attacked watermark patterns (WMPA). Once WMPA is available, the matching rate between the patterns and the watermark distortion are calculated to detect tampering and establish content authenticity.

This stage includes two main processes: watermark extraction and watermark detection. The watermark is extracted from the received attacked text document, and the detection algorithm matches it with the original watermark.

The proposed watermark extraction algorithm takes the attacked text document, and performs the same watermark generation algorithm to obtain the watermark pattern for the attacked text document as shown in figure 7.


Fig. 7. Watermark Extraction and Detection processes

1) Watermark Extraction Algorithm

This algorithm takes the attacked text document (TA) and either the original watermark patterns or the original text document as inputs. The procedure is similar to watermark generation, and the output is the attacked watermark patterns (WMPA).

[Fig. 7 flowchart labels: Attacked Text Document (TDA); Text Pre-processing; Building Markov matrix; Text Analysis using Markov model, Word Level Order TWO; Present State (PS); Next State (NS); Compute # of occurrences of NS transitions for every PS; Terminate? (Yes/No); Probabilistic patterns; EWMA; Watermark DB (WMO patterns, DocID, Date, Time); WM Pattern Matching (Yes: Text Document is Authentic; No: Text Document Tampered)]


The watermark extraction algorithm executes as follows:

PROCEDURE watermark_extraction(TA)
  Input: attacked text document (TA)
  Output: attacked watermark patterns (WMPA)
BEGIN
  // perform pre-processing of the attacked text document
  TAp = preProcessing(TA)
  // compute the total frequencies of transitions for every state of the attacked document
  MarkovMatrix′ = text_analysis(TAp)
  // generate the attacked watermark patterns from the attacked text document
  Loop ps = 0 to MarkovMatrix′[rows].Length - 2
    Loop ns = 0 to MarkovMatrix′[columns].Length
      IF MarkovMatrix′[ps][ns] != 0
        append MarkovMatrix′[ps][ns] to WMPA
  Output WMPA
END
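The extraction procedure above can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' PHP code: pre-processing is assumed to be lower-casing plus whitespace splitting, an order-two present state is assumed to be a pair of consecutive words whose next state is the following word, and the function names (`preprocess`, `build_markov_matrix`, `extract_pattern`) are hypothetical. Instead of concatenating the non-zero counts into one string, the sketch keeps them in a nested dictionary keyed by present and next state, so that original and attacked patterns can later be compared entry by entry.

```python
from collections import defaultdict

def preprocess(text):
    """Assumed pre-processing: lower-case the text and split it on whitespace."""
    return text.lower().split()

def build_markov_matrix(words):
    """Count order-two word transitions: present state (w[k], w[k+1]) -> next state w[k+2]."""
    matrix = defaultdict(lambda: defaultdict(int))
    for k in range(len(words) - 2):
        present_state = (words[k], words[k + 1])
        matrix[present_state][words[k + 2]] += 1
    return matrix

def extract_pattern(text):
    """Return only the non-zero transition counts as {present_state: {next_state: count}}."""
    matrix = build_markov_matrix(preprocess(text))
    return {ps: dict(transitions) for ps, transitions in matrix.items()}

wmpa = extract_pattern("The cat sat on the mat the cat sat down")
print(wmpa[("the", "cat")])  # -> {'sat': 2}
```

A nested dictionary stores only non-zero entries, which is what the `IF MarkovMatrix′[ps][ns] != 0` test selects in the procedure.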

Where:
o WMPA: attacked watermark patterns,
o TA: attacked text document,
o TAp: pre-processed attacked text document,
o MarkovMatrix′[ps][ns]: Markov matrix of the attacked text document.

2) Watermark Detection Algorithm

After the attacked watermark pattern is extracted, watermark detection is performed in three steps:

• Primary matching is performed on the whole watermark pattern of the original document (WMPO) and of the attacked document (WMPA). If the two patterns are identical, the text document is declared authentic, with no tampering. If primary matching fails, the text document is declared not authentic (tampering has occurred), and we proceed to the next step.

• Secondary matching is performed by comparing the components associated with each state of the overall pattern: the extracted watermark pattern of each state is compared with the equivalent transitions of the original watermark pattern. This process is described by equations (6) and (7).

$$PMR_T(i,j) = \frac{WMP_O(i,j) - \left|WMP_O(i,j) - WMP_A(i,j)\right|}{WMP_O(i,j)}, \quad \forall\, i, j, \quad (0 < PMR_T \le 1) \qquad (6)$$

Where:
o PMR_T(i,j): the pattern matching rate at the transition level.
o i, j: the indexes of states and transitions respectively, with i = 0 .. number of non-zero states in the given text and j = 0 .. number of non-zero transitions in the given text.
o WMP_O(i,j): the value of the original watermark at the transition level.
o WMP_A(i,j): the value of the attacked watermark at the transition level.
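A per-transition score in the spirit of equation (6) can be sketched as below. The reading of the equation is an assumption: each non-zero original count WMPO(i,j) is compared with the attacked count WMPA(i,j) as (WMPO - |WMPO - WMPA|) / WMPO, which equals 1 for an unchanged transition. The function name is hypothetical.

```python
def pmr_transition(wmp_o, wmp_a):
    """Transition-level matching rate, assuming eq. (6) reads (WMPO - |WMPO - WMPA|) / WMPO."""
    if wmp_o == 0:
        # The detection procedure only scores transitions present in the original.
        raise ValueError("PMR_T is only defined for non-zero original transitions")
    return (wmp_o - abs(wmp_o - wmp_a)) / wmp_o

print(pmr_transition(4, 4), pmr_transition(4, 3), pmr_transition(4, 5))  # -> 1.0 0.75 0.75
```

With this reading, a transition removed entirely by the attack (WMPA = 0) scores 0, and a count that more than doubles yields a negative score, so the stated range 0 < PMR_T <= 1 holds only for moderate changes.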


$$PMR_S(i) = \frac{\sum_{j=1}^{n} PMR_T(i,j)}{\text{Total State PatternCount}(i)}, \quad \forall\, i, \quad (0 < PMR_S \le 1) \qquad (7)$$

Where:
o n: the number of non-zero transitions of every state represented in the Markov matrix.
o i: the state index; Total State PatternCount(i) is the count of non-zero patterns of state i in the Markov matrix.
o PMR_S(i): the pattern matching rate at the state level.

After obtaining the pattern matching rate of every state, we find the weight of every state among all states of the Markov matrix using equation (8):

$$Sw_i = PMR_S(i) \times \frac{\text{transitions frequency}(i)}{\text{total no. of transitions}} \qquad (8)$$

Where:
o PMR_S(i): the total pattern matching rate of state i.
o N: the number of states of the given text document.
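Equations (7) and (8) can be sketched as two small Python functions. The sketch assumes the PMR_T values of one state are already available (for example from equation (6)), that the state's pattern count is the denominator of equation (7), and that "transitions frequency" means the number of transitions belonging to state i; the function names are hypothetical.

```python
def pmr_state(transition_rates, state_pattern_count):
    """Equation (7): sum of PMR_T(i, j) over state i, divided by its pattern count."""
    if state_pattern_count <= 0:
        raise ValueError("a state needs at least one pattern")
    return sum(transition_rates) / state_pattern_count

def state_weight(pmr_s, transitions_frequency, total_transitions):
    """Equation (8): weight PMR_S(i) by state i's share of all transitions."""
    return pmr_s * transitions_frequency / total_transitions

# A state whose two original transitions scored 1.0 and 0.75, holding 2 of 8 transitions.
pmr_s = pmr_state([1.0, 0.75], 2)
print(pmr_s, state_weight(pmr_s, 2, 8))  # -> 0.875 0.21875
```

Because each weight is scaled by the state's share of transitions, the weights of a perfectly matched document add up to 1.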

Finally, the PMR, which represents the pattern matching rate between the original and attacked text documents, is calculated by equation (9):

$$PMR = \sum_{i=1}^{N} Sw_i \qquad (9)$$

Where:
o N: the total number of states in the Markov matrix.

The watermark distortion rate (WDR) measures the amount of tampering made by attacks on the contents of the attacked text document; it is obtained by equation (10):

$$WDR = 1 - PMR \qquad (10)$$
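Equations (9) and (10) reduce to a sum and a subtraction. The short sketch below (function names hypothetical) also confirms that the WDR column of Table II is exactly 1 - PMR, e.g. 0.9409 gives 0.0591.

```python
def document_pmr(state_weights):
    """Equation (9): PMR is the sum of the state weights Sw_i over all N states."""
    return sum(state_weights)

def distortion_rate(pmr):
    """Equation (10): WDR = 1 - PMR."""
    return 1.0 - pmr

pmr = document_pmr([0.25, 0.5, 0.125])
print(pmr, distortion_rate(pmr))  # -> 0.875 0.125
```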

This process is illustrated in figure 8.

Fig. 8: Watermark extraction process based on order 2 of a Markov model


The watermark detection algorithm executes as follows:

PROCEDURE watermark_detection(TP, TP′)
  Input: preprocessed original and attacked texts (TP, TP′)
  Output: PMR, WDR
BEGIN
  // get the watermark of the original document
  Call watermark_gen_embed(MarkovMatrix[ps][ns])
  // extract the watermark from the attacked document
  Call watermark_extraction(MarkovMatrix′[ps][ns])
  // perform matching between the original and attacked watermark patterns
  IF WMPA = WMPO
    Print "Document is authentic and no tampering occurred"
    PMR = 1
  ELSE
    Print "Document is not authentic and tampering occurred"
    Sw = 0
    Loop i = 0 to MarkovMatrix′[rows].Length - 2
      transPMRTotal = 0
      patternCount = 0
      // compute the pattern matching rate on the transition level
      Loop j = 0 to MarkovMatrix′[columns].Length
        IF WMPO[i][j] != 0
          patternCount += 1
          PMRT(i,j) = (WMPO[i][j] - |WMPO[i][j] - WMPA[i][j]|) / WMPO[i][j]   // equation (6)
          transPMRTotal += PMRT(i,j)
        ELSE IF WMPA[i][j] != 0
          patternCount += WMPA[i][j]
      // compute the pattern matching rate on the state level
      PMRS(i) = transPMRTotal / patternCount   // equation (7)
      stateWeight = PMRS(i) × transitionsFrequency(i) / totalNumberOfTransitions   // equation (8)
      Sw += stateWeight
    // compute the pattern matching rate on the document level
    PMR = Sw   // equation (9): sum of the state weights
  // compute the watermark distortion rate on the document level
  WDR = 1 - PMR
END

Where:
o Sw: the accumulated weight of correctly matched states.
o WDR: the watermark distortion rate (0 ≤ WDR < 1).
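The detection steps can be combined into one runnable Python sketch. It is an illustration under assumptions, not the authors' implementation: patterns are built from lower-cased, whitespace-split words with order-two states; equation (6) is read as (WMPO - |WMPO - WMPA|) / WMPO; the transitions frequency of a state is taken from the original pattern; and states that occur only in the attacked text are not scored. The names `order2_pattern` and `detect_tampering` are hypothetical.

```python
from collections import Counter

def order2_pattern(text):
    """Hypothetical extraction: {(w1, w2): Counter({next_word: count})} over lower-cased words."""
    words = text.lower().split()
    pattern = {}
    for k in range(len(words) - 2):
        pattern.setdefault((words[k], words[k + 1]), Counter())[words[k + 2]] += 1
    return pattern

def detect_tampering(original, attacked):
    """Return (PMR, WDR) for two order-two patterns, following the detection steps."""
    # Primary matching: identical patterns mean the document is authentic.
    if original == attacked:
        return 1.0, 0.0
    total_transitions = sum(sum(ns.values()) for ns in original.values())
    pmr = 0.0
    # Secondary matching: score every state of the original pattern.
    for state, orig_ns in original.items():
        att_ns = attacked.get(state, Counter())
        trans_pmr_total = 0.0
        pattern_count = 0
        for nxt, wmp_o in orig_ns.items():      # non-zero original transitions
            wmp_a = att_ns.get(nxt, 0)
            pattern_count += 1
            trans_pmr_total += (wmp_o - abs(wmp_o - wmp_a)) / wmp_o  # eq. (6)
        for nxt, wmp_a in att_ns.items():       # transitions added by the attack
            if nxt not in orig_ns:
                pattern_count += wmp_a
        pmr_s = trans_pmr_total / pattern_count                      # eq. (7)
        pmr += pmr_s * sum(orig_ns.values()) / total_transitions      # eqs. (8)-(9)
    return pmr, 1.0 - pmr                                            # eq. (10)

original = order2_pattern("the cat sat on the mat")
print(detect_tampering(original, order2_pattern("the cat sat on a mat")))  # -> (0.5, 0.5)
```

In the example, replacing one word breaks two of the four order-two transitions, so half of the transition weight is lost.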


IV. EXPERIMENTAL SETUP, RESULTS AND DISCUSSION

A. Experimental Setup

In order to test the proposed approach and compare it with the other approach, we conducted a series of simulation experiments. The experimental environment was as follows: CPU: Intel Core™ i5 M480 / 2.67 GHz; RAM: 8.0 GB; Windows 7; programming language: PHP; IDE: NetBeans 7.0. Regarding the datasets, six samples were taken from the datasets designed in [24]. These samples were categorized into three classes according to their size, namely Small Size Text (SST), Medium Size Text (MST), and Large Size Text (LST).

Next, we define the types of attacks and their percentages: insertion, deletion and reorder attacks were performed randomly at multiple locations of these datasets.

The dataset volumes and attack percentages used are shown in Table I; they are similar to those used in [25], for comparison purposes. It should be mentioned that we also performed the reorder attack on the datasets, which was not included in that paper.

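TABLE I
ORIGINAL AND ATTACKED TEXT SAMPLES WITH INSERTION, DELETION AND REORDER PERCENTAGES

| Sample Text ID | Original Text Word Count | Insertion | Deletion | Reorder |
| [SST4] | 179 | 5%, 10%, 20%, 50% | 5%, 10%, 20%, 50% | 5%, 10%, 20%, 50% |
| [SST2] | 421 | 5%, 10%, 20%, 50% | 5%, 10%, 20%, 50% | 5%, 10%, 20%, 50% |
| [MST5] | 469 | 5%, 10%, 20%, 50% | 5%, 10%, 20%, 50% | 5%, 10%, 20%, 50% |
| [MST2] | 559 | 5%, 10%, 20%, 50% | 5%, 10%, 20%, 50% | 5%, 10%, 20%, 50% |
| [LST4] | 2018 | 5%, 10%, 20%, 50% | 5%, 10%, 20%, 50% | 5%, 10%, 20%, 50% |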

To measure the performance of our approach and compare it with others, the tampering accuracy, which is a measure of watermark robustness, is used. The PMR value gives the tampering accuracy for the given text document. The watermark distortion rate (WDR) is also measured and compared with other approaches. Both PMR and WDR range between 0 and 1. A larger PMR value (and correspondingly a smaller WDR value) means more robustness, while a smaller PMR value and a larger WDR value mean less robustness.

The desirable value of PMR is close to 1, and that of WDR is close to 0. We categorize tamper detection states into three classes based on PMR threshold values: High when PMR is greater than 0.70, Mid when PMR is between 0.40 and 0.70, and Low when PMR is less than 0.40.
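The three PMR classes can be expressed as a small helper. The text does not say which class the boundary values 0.40 and 0.70 belong to; the sketch assumes both belong to Mid, and the function name is hypothetical. The examples use PMR values taken from Table II.

```python
def detection_class(pmr):
    """Map a PMR value in [0, 1] to the High / Mid / Low tamper-detection classes."""
    if pmr > 0.70:
        return "High"
    if pmr >= 0.40:  # assumption: 0.40 and 0.70 both count as Mid
        return "Mid"
    return "Low"

# SST4 reorder 5%, MST5 deletion 50%, SST4 reorder 20% (Table II)
print([detection_class(v) for v in (0.7354, 0.548, 0.363)])  # -> ['High', 'Mid', 'Low']
```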

To evaluate the accuracy of the proposed approach, a series of experiments was conducted with the well-known attacks, namely random insertion, deletion and reorder of words and sentences, on each sample of the datasets. These attacks were applied at multiple locations in the datasets. The experiments were conducted first with individual attacks, then with all attacks at the same time, and the results of the proposed approach were compared with those of a recent similar approach.


B. Experiments with the proposed approach

In this section, we evaluate the performance of the proposed approach. The character set of words covers all English letters, spaces, numbers, and special symbols. The experiments were conducted with each kind of attack individually, at attack rates of 5%, 10%, 20% and 50%. The performance results of this approach under all the mentioned attacks are presented in tabular form in Table II and graphically in figures 9, 10, 11 and 12 for the 5%, 10%, 20% and 50% scenarios, respectively. These results are discussed below.

TABLE II
EXTRACTED WATERMARK MATCHING AND DISTORTION RATES UNDER VARIOUS INDIVIDUAL ATTACKS
(PMR and WDR are given as fractions between 0 and 1.)

| Sample Text Category | Original Word Count | Attack Volume | Insertion PMR | Insertion WDR | Deletion PMR | Deletion WDR | Reorder PMR | Reorder WDR |
| [SST4] | 179 | 5% | 0.9409 | 0.0591 | 0.8936 | 0.1064 | 0.7354 | 0.2646 |
| | | 10% | 0.8929 | 0.1071 | 0.8506 | 0.1494 | 0.7825 | 0.2175 |
| | | 20% | 0.693 | 0.307 | 0.8652 | 0.1348 | 0.363 | 0.637 |
| | | 50% | 0.6386 | 0.3614 | 0.7576 | 0.2424 | 0.3008 | 0.6992 |
| [SST2] | 421 | 5% | 0.9246 | 0.0754 | 0.9448 | 0.0552 | 0.8835 | 0.1165 |
| | | 10% | 0.9052 | 0.0948 | 0.7423 | 0.2577 | 0.7412 | 0.2588 |
| | | 20% | 0.8182 | 0.1818 | 0.8083 | 0.1917 | 0.7535 | 0.2465 |
| | | 50% | 0.6622 | 0.3378 | 0.9144 | 0.0856 | 0.7624 | 0.2376 |
| [MST5] | 469 | 5% | 0.9473 | 0.0527 | 0.9854 | 0.0146 | 0.8589 | 0.1411 |
| | | 10% | 0.9068 | 0.0932 | 0.9553 | 0.0447 | 0.7484 | 0.2516 |
| | | 20% | 0.8233 | 0.1767 | 0.9475 | 0.0525 | 0.5715 | 0.4285 |
| | | 50% | 0.6428 | 0.3572 | 0.548 | 0.452 | 0.2619 | 0.7381 |
| [MST2] | 559 | 5% | 0.9463 | 0.0537 | 0.9565 | 0.0435 | 0.8916 | 0.1084 |
| | | 10% | 0.9006 | 0.0994 | 0.823 | 0.177 | 0.7544 | 0.2456 |
| | | 20% | 0.8282 | 0.1718 | 0.8269 | 0.1731 | 0.5697 | 0.4303 |
| | | 50% | 0.6576 | 0.3424 | 0.3258 | 0.6742 | 0.0493 | 0.9507 |
| [LST4] | 2018 | 5% | 0.0102 | 0.9898 | 0.9852 | 0.0148 | 0.8697 | 0.1303 |
| | | 10% | 0.0095 | 0.9905 | 0.979 | 0.021 | 0.0577 | 0.9423 |
| | | 20% | 0.0106 | 0.9894 | 0.0065 | 0.9935 | 0.0676 | 0.9324 |
| | | 50% | 0.0066 | 0.9934 | 0.008 | 0.992 | 0.0502 | 0.9498 |

• Results of various attacks under the 5% scenario

The results show the PMR accuracy of the proposed algorithm, as applied to different datasets, under a 5% rate of insertion, deletion and reorder attacks. The PMR is above 70% for all kinds of attacks except the insertion attack on the large text document (LST4), as shown in figure 9. It can also be observed that, apart from that case, the PMR is worst under the reorder attack and best under the deletion attack, where it remains close to or above 90% in all cases.
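The 5% observations can be checked mechanically against the 5% rows of Table II. The sketch below copies those rows and lists every (dataset, attack) pair whose PMR is not above 0.70; the names are illustrative only.

```python
# 5% rows of Table II: dataset -> (insertion PMR, deletion PMR, reorder PMR)
PMR_5 = {
    "SST4": (0.9409, 0.8936, 0.7354),
    "SST2": (0.9246, 0.9448, 0.8835),
    "MST5": (0.9473, 0.9854, 0.8589),
    "MST2": (0.9463, 0.9565, 0.8916),
    "LST4": (0.0102, 0.9852, 0.8697),
}
ATTACKS = ("Insertion", "Deletion", "Reorder")

def below(rows, threshold=0.70):
    """List (dataset, attack) pairs whose PMR is not above the threshold."""
    return [(dataset, attack)
            for dataset, pmrs in rows.items()
            for attack, pmr in zip(ATTACKS, pmrs)
            if pmr <= threshold]

print(below(PMR_5))                              # -> [('LST4', 'Insertion')]
print(min(p[1] for p in PMR_5.values()))         # lowest deletion PMR -> 0.8936
```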


Fig. 9 PMR accuracy under 5% scenarios of various attacks

• Results of various attacks under the 10% scenario

As applied to different datasets under a 10% rate of insertion, deletion and reorder attacks (figure 10), the PMR is best under deletion attacks, where it remains above 70% for all datasets. Under the insertion attack, the PMR stays close to 90% for all datasets except LST4, which indicates that the proposed approach is not applicable under 10% insertion attacks on large text documents. Finally, under the reorder attack, the PMR stays between about 0.74 and 0.78 for the small and medium documents but drops sharply (to 0.0577) for the large document.

Fig. 10 PMR accuracy under 10% scenarios of various attacks

[Figs. 9 and 10: bar charts of PMR (%, 0-100) for [SST4], [SST2], [MST5], [MST2] and [LST4] under insertion, deletion and reorder attacks at 5% and 10%, respectively.]


• Results of various attacks under the 20% scenario

Figure 11 shows the experimental results on different datasets under a 20% rate of insertion, deletion and reorder attacks. The PMR accuracy is generally good for small and medium text documents, but it is poor for the large text document under all kinds of attacks, as shown with the LST4 dataset. This indicates that the proposed approach is not applicable to large documents under 20% rates of the various kinds of attacks.

Fig. 11 PMR accuracy under 20% scenarios of various attacks

• Results of various attacks under the 50% scenario

As applied to different datasets under a 50% rate of insertion, deletion and reorder attacks (figure 12), the PMR accuracy is higher for small text documents, decreases for medium-size text documents, and is very poor for the large text document, where values are close to zero in all scenarios. Figure 12 also shows that the PMR remains above 60% for the small and medium datasets under the insertion attack.

Fig. 12 PMR accuracy under 50% scenarios of various attacks

[Figs. 11 and 12: bar charts of PMR (%, 0-100) for [SST4], [SST2], [MST5], [MST2] and [LST4] under insertion, deletion and reorder attacks at 20% and 50%, respectively.]


C. Comparative Results

To compare the performance of the proposed approach, named here WO2, with the recently published text watermarking approach presented in [27] by F. Al-Wesabi et al., named here WO1, WO1 was run with the same environment and parameters as the proposed approach. Both WO2 and WO1 depend on the word mechanism of the Markov model. The core difference between them is the order: WO1 is based on order one of the Markov model, while the proposed approach (WO2) is based on order two.

In these experiments, random multiple insertion, deletion and reorder attacks were performed individually on each sample of the datasets with the attack rates shown in Table I. The ratios of successfully detected watermark of the proposed algorithm compared with reference 27 (WO1) are shown in Table III and graphically represented in figures 13, 14, 15 and 16.

TABLE III
COMPARATIVE PERFORMANCE ACCURACY OF THE PROPOSED ALGORITHM WITH WO1 UNDER INDIVIDUAL ATTACKS
(Successfully detected watermark, %. WO1 = Reference 27; WO2 = the proposed approach.)

| Sample Text Category | Original Word Count | Attack Volume | WO1 Insertion | WO1 Deletion | WO1 Reorder | WO2 Insertion | WO2 Deletion | WO2 Reorder |
| [SST4] | 179 | 5% | 94.59 | 86.02 | 81.85 | 94.09 | 89.36 | 73.54 |
| | | 10% | 89.53 | 86.39 | 81.65 | 89.29 | 85.06 | 78.25 |
| | | 20% | 67.47 | 88.02 | 45.91 | 69.3 | 86.52 | 36.3 |
| | | 50% | 64.21 | 71.56 | 42.68 | 63.86 | 75.76 | 30.08 |
| [SST2] | 421 | 5% | 91.91 | 95.19 | 72.05 | 92.46 | 94.48 | 88.35 |
| | | 10% | 90.34 | 69.49 | 78.69 | 90.52 | 74.23 | 74.12 |
| | | 20% | 81.42 | 73.95 | 81.46 | 81.82 | 80.83 | 75.35 |
| | | 50% | 65.54 | 75.13 | 82.59 | 66.22 | 91.44 | 76.24 |
| [MST5] | 469 | 5% | 94.64 | 97.33 | 88.21 | 94.73 | 98.54 | 85.89 |
| | | 10% | 90.6 | 93.19 | 78.85 | 90.68 | 95.53 | 74.84 |
| | | 20% | 80.93 | 89.18 | 63.2 | 82.33 | 94.75 | 57.15 |
| | | 50% | 63.03 | 40.96 | 0.7 | 64.28 | 54.8 | 26.19 |
| [MST2] | 559 | 5% | 94.8 | 93.69 | 90.02 | 94.63 | 95.65 | 89.16 |
| | | 10% | 89.99 | 78.71 | 80.13 | 90.06 | 82.3 | 75.44 |
| | | 20% | 82.43 | 73.97 | 65.6 | 82.82 | 82.69 | 56.97 |
| | | 50% | 66.08 | 27.86 | 7.89 | 65.76 | 32.58 | 4.93 |
| [LST4] | 2018 | 5% | 94.76 | 96.62 | 88.61 | 1.02 | 98.52 | 86.97 |
| | | 10% | 89.67 | 93.02 | 60.8 | 0.95 | 97.9 | 5.77 |
| | | 20% | 4.09 | 1.2 | 9.09 | 1.06 | 0.65 | 6.76 |
| | | 50% | 1.07 | 1.54 | 7.37 | 0.66 | 0.8 | 5.02 |


• Comparative Results for Individual Datasets

To evaluate the accuracy of the proposed approach, we compare its experimental results with those of the WO1 approach under various scenarios of insertion, deletion and reorder attacks. For this comparison, we choose one dataset from each class: SST4 as the small dataset, MST5 as the medium dataset, and LST4 as the large dataset.

• Comparative Results under Various Scenarios for Individual Datasets

As shown in figure 13, under a 5% volume of the various attacks, the pattern matching rate (PMR) of WO2 is better than that of WO1 under deletion attacks for all three datasets. However, under insertion and reorder attacks, the PMR of WO1 is better than that of WO2 for all datasets, except for the MST5 dataset under the insertion attack. This means that the proposed approach adds value for all sizes of text documents under deletion attacks.

Fig. 13 Comparison results between (WO1) and (WO2) under 5% of various attacks

[Fig. 13 panels: SST4, MST5 and LST4 datasets; PMR (%) of Reference 27 (WO1) and this approach (WO2) under insertion, deletion and reorder attacks at 5%.]


As shown in figure 14, WO2 performs better than WO1 on the medium-size text document under insertion and deletion attacks. On the other hand, WO1 is better than WO2 under reorder attacks for all datasets, and under the insertion attack on the large text document (LST4). This means that the proposed approach is robust against deletion attacks for all sizes of text documents, is recommended for small and medium text documents under this range of insertion attacks, and is not applicable under insertion and reorder attacks on large text documents.

Fig. 14 Comparison results between (WO1) and (WO2) under 10% of various attacks

[Fig. 14 panels: SST4, MST5 and LST4 datasets; PMR (%) of Reference 27 (WO1) and this approach (WO2) under insertion, deletion and reorder attacks at 10%.]


As shown in figure 15, at the 20% scenario of the various attacks, the robustness of the proposed approach (WO2) improves relative to WO1 for small and medium text documents as the volume of insertion and deletion attacks increases. On the other hand, the comparative results also show that WO2 is less robust than WO1 under this rate of reorder attack for all datasets. This means that WO2 is applicable under attack volumes of 20% or less for small and medium text documents, but it is not recommended for large text documents.

Fig. 15 Comparison results between (WO1) and (WO2) under 20% of various attacks

[Fig. 15 panels: SST4, MST5 and LST4 datasets; PMR (%) of Reference 27 (WO1) and this approach (WO2) under insertion, deletion and reorder attacks at 20%.]


As shown in figure 16, at the high rate of 50% the proposed approach performs better than WO1 on the small and medium datasets under deletion attacks (75.76 vs. 71.56 on SST4 and 54.80 vs. 40.96 on MST5), and it is close to WO1 under insertion attacks. Robustness decreases for large text documents, where both approaches fall below 8% on LST4. In other words, the proposed approach adds value in terms of robustness on small and medium text documents, especially under deletion attacks.

Fig. 16 Comparison results between (WO1) and (WO2) under 50% of various attacks

[Fig. 16 panels: SST4, MST5 and LST4 datasets; PMR (%) of Reference 27 (WO1) and this approach (WO2) under insertion, deletion and reorder attacks at 50%.]


• Comparative Results under Various Scenarios for All Datasets

Figure 17 shows the performance of the two approaches under 5% of the various kinds of attacks on the different datasets. As shown, the proposed approach WO2 performs better under deletion attacks on four of the five datasets and is within about 0.6 percentage points of WO1 under insertion attacks, except on the large dataset (LST4), where its insertion accuracy drops to 1.02%. WO1 performs better than WO2 under reorder attacks on all datasets except SST2. In general, this indicates that the proposed approach is recommended under low volumes of tampering attacks for small and medium text documents.

Fig. 17 Comparison results between (WO1) and (WO2) under 5% of various attacks for all datasets
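The 5% comparison can be checked against Table III with a short Python sketch. It copies only the 5% rows of Table III, storing (WO1, WO2) pairs for each attack, and reports, per attack, the datasets on which WO2 scored higher; the dictionary layout and function name are illustrative.

```python
# 5% rows of Table III: dataset -> attack -> (WO1, WO2) successfully detected watermark (%)
TABLE_III_5 = {
    "SST4": {"Insertion": (94.59, 94.09), "Deletion": (86.02, 89.36), "Reorder": (81.85, 73.54)},
    "SST2": {"Insertion": (91.91, 92.46), "Deletion": (95.19, 94.48), "Reorder": (72.05, 88.35)},
    "MST5": {"Insertion": (94.64, 94.73), "Deletion": (97.33, 98.54), "Reorder": (88.21, 85.89)},
    "MST2": {"Insertion": (94.80, 94.63), "Deletion": (93.69, 95.65), "Reorder": (90.02, 89.16)},
    "LST4": {"Insertion": (94.76, 1.02), "Deletion": (96.62, 98.52), "Reorder": (88.61, 86.97)},
}

def wo2_wins(table):
    """Return {attack: [datasets where WO2 scored higher than WO1]}."""
    result = {}
    for dataset, attacks in table.items():
        for attack, (wo1, wo2) in attacks.items():
            result.setdefault(attack, [])
            if wo2 > wo1:
                result[attack].append(dataset)
    return result

print(wo2_wins(TABLE_III_5))
# -> {'Insertion': ['SST2', 'MST5'], 'Deletion': ['SST4', 'MST5', 'MST2', 'LST4'], 'Reorder': ['SST2']}
```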

Figure 18 illustrates the comparative results under a 10% rate of the various attacks. As shown, for all datasets the WO1 and WO2 approaches are close under insertion and deletion attacks, except on the large dataset (LST4), where WO1 is better under insertion attacks. Under the reorder attack, the comparison results show that WO1 is better than WO2 on all datasets.

Fig. 18 Comparison results between (WO1) and (WO2) under 10% of various attacks for all datasets

[Figs. 17 and 18: PMR (%) of WO1 and WO2 under insertion, deletion and reorder attacks at 5% and 10%, with one bar per dataset ([SST4], [SST2], [MST5], [MST2], [LST4]).]


Under 20% of the different attacks on all datasets, the performance of the two approaches is similar under insertion and deletion attacks, as shown in figure 19. However, WO1 performs better than WO2 under reorder attacks.

Fig. 19 Comparison results between (WO1) and (WO2) under 20% of various attacks for all datasets

Figure 20 illustrates the comparative results under a high rate (50%) of the various attacks. As shown, the proposed approach WO2 performs best and adds value under deletion attacks on all datasets except LST4, and it is close to WO1 under insertion attacks; however, WO2 is not effective under reorder attacks.

Fig. 20 Comparison results between (WO1) and (WO2) under 50% of various attacks for all datasets

[Figs. 19 and 20: PMR (%) of WO1 and WO2 under insertion, deletion and reorder attacks at 20% and 50%, with one bar per dataset ([SST4], [SST2], [MST5], [MST2], [LST4]).]


• Comparative Results of PMR Standard Deviation for Individual Datasets

To evaluate the performance of the proposed approach (WO2), we compute the PMR standard deviation of all scenarios of each attack applied to each dataset for both WO1 and WO2, and compare the two approaches by their difference (PMR of WO2 - PMR of WO1), as shown in Table IV.

TABLE IV
STANDARD DEVIATION OF ALL SCENARIOS FOR ALL DATASETS UNDER VARIOUS ATTACKS
(PMR standard deviation. WO1 = Reference 27; WO2 = this approach.)

| Dataset | WO1 Insertion | WO1 Deletion | WO1 Reorder | WO2 Insertion | WO2 Deletion | WO2 Reorder |
| SST4 | 78.95 | 83.00 | 99.44 | 79.14 | 84.18 | 54.54 |
| SST2 | 82.30 | 78.44 | 99.52 | 82.76 | 85.25 | 78.52 |
| MST5 | 82.30 | 80.17 | 87.53 | 83.01 | 85.91 | 61.02 |
| MST2 | 83.33 | 68.56 | 98.19 | 83.32 | 73.31 | 56.63 |
| LST4 | 47.40 | 48.10 | 99.23 | 0.92 | 49.47 | 26.13 |

Figure 21 shows the average standard deviation of all scenarios for the small dataset (SST4), the medium dataset (MST5), and the large dataset (LST4). For the SST4 dataset, the proposed approach WO2 performs best under insertion and deletion attacks. On the other hand, WO1 performs best under the reorder tampering attack, where the difference from WO2 is -44.90 (54.54 - 99.44). This means that WO1 is recommended for detecting reorder attacks, while the proposed approach WO2 improves performance under insertion and deletion attacks.

For the MST5 dataset, the performance of WO2 improves under insertion and deletion attacks, especially under deletion attacks, where the difference from WO1 is approximately 5.74. We also observed that the PMR of the proposed approach under reorder attacks improves on the medium text document (MST5) compared with the small text document (SST4), but WO1 is still the best under reorder tampering attacks.


Finally, for the large dataset LST4, the comparative results show that the PMR standard deviation of WO2 is still the best under deletion attacks, and it decreases under insertion and reorder attacks.

Fig. 21 PMR standard deviation of all scenarios for SST4, MST5 and LST4 datasets under various attacks

[Fig. 21 panels: SD of SST4, MST5 and LST4 datasets under insertion, deletion and reorder attacks; Reference 27 (WO1) vs. this approach (WO2).]


• Comparative Results of PMR Standard Deviation for All Datasets

As shown in figure 22, the average standard deviation of all scenarios for all datasets shows that the proposed approach WO2 has a positive difference from WO1 (WO2 PMR - WO1 PMR) under the deletion attack, equal to 3.97, and negative differences under insertion (-9.03) and reorder (-41.42) attacks. Thus, WO2 adds value and is recommended under deletion attacks, but it is not recommended for insertion and reorder attacks.
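The differences quoted for the individual datasets and for all datasets follow directly from Table IV. The sketch copies the table and computes WO2 - WO1 per dataset and averaged over the five datasets; the names are illustrative. From the two-decimal table values it gives -44.90 for SST4 under reorder, 5.74 for MST5 under deletion, and averages of about -9.03 (insertion), 3.97 (deletion) and -41.41 (reorder); the text reports -41.42 for the last one.

```python
# Table IV: dataset -> attack -> (WO1, WO2) PMR standard deviation
TABLE_IV = {
    "SST4": {"Insertion": (78.95, 79.14), "Deletion": (83.00, 84.18), "Reorder": (99.44, 54.54)},
    "SST2": {"Insertion": (82.30, 82.76), "Deletion": (78.44, 85.25), "Reorder": (99.52, 78.52)},
    "MST5": {"Insertion": (82.30, 83.01), "Deletion": (80.17, 85.91), "Reorder": (87.53, 61.02)},
    "MST2": {"Insertion": (83.33, 83.32), "Deletion": (68.56, 73.31), "Reorder": (98.19, 56.63)},
    "LST4": {"Insertion": (47.40, 0.92), "Deletion": (48.10, 49.47), "Reorder": (99.23, 26.13)},
}

def differences(table):
    """Per-dataset difference WO2 - WO1 for every attack."""
    return {dataset: {attack: wo2 - wo1 for attack, (wo1, wo2) in attacks.items()}
            for dataset, attacks in table.items()}

def average_difference(table, attack):
    """Mean of (WO2 - WO1) over all datasets for one attack."""
    diffs = [attacks[attack][1] - attacks[attack][0] for attacks in table.values()]
    return sum(diffs) / len(diffs)

for attack in ("Insertion", "Deletion", "Reorder"):
    print(attack, round(average_difference(TABLE_IV, attack), 2))
```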

Fig. 22 PMR standard deviation of all scenarios for all datasets under various attacks

V. CONCLUSION

Based on the word mechanism of the order-two Markov model, the authors have designed a text zero-watermark approach built on text analysis. The algorithm uses text features as probabilistic patterns of states and transitions to generate and detect the watermark. The proposed approach is implemented in the PHP programming language. The experimental results show that the proposed approach is sensitive to all kinds of random tampering attacks and has good tampering detection accuracy. The approach was compared with the recent watermark approach WO1 presented in reference (27) under random insertion, deletion and reorder attacks at multiple locations of five text datasets of varying size. The comparative results show that the watermark complexity increases with the proposed approach and that it is not effective under reorder attacks. However, the tampering detection accuracy of the proposed approach improves on average under deletion attacks (an average PMR gain of 3.97 over all datasets), and it is close to the accuracy of WO1 under insertion attacks on small and medium text documents. This means that the proposed approach adds value and is recommended in these cases, but it is not robust against reorder attacks, especially for large text documents.


REFERENCES

[1] Z. Jalil, A. Hamza, S. Shahid, M. Arif, A. Mirza, A Zero Text Watermarking Algorithm based on Non-Vowel ASCII Characters. International Conference on Educational and Information Technology (ICET 2010), IEEE.

[2] Suhail M. A., Digital Watermarking for Protection of Intellectual Property. A Book Published by University of Bradford, UK, 2008.

[3] L. Robert, C. Science, C. Government Arts, A Study on Digital Watermarking Techniques. International Journal of Recent Trends in Engineering, Vol. 1, No. 2, pp. 223-225, 2009.

[4] X. Zhou, S. Wang, S. Xiong, Security Theory and Attack Analysis for Text Watermarking. International Conference on E-Business and Information System Security, IEEE, pp. 1-6, 2009.

[5] T. Brassil, S. Low, and N. F. Maxemchuk, Copyright Protection for the Electronic Distribution of Text Documents. Proceedings of the IEEE, vol. 87, no. 7, July 1999, pp. 1181-1196.

[6] M. Atallah, V. Raskin, M. C. Crogan, C. F. Hempelmann, F. Kerschbaum, D. Mohamed, and S. Naik, Natural language watermarking: Design, analysis, and implementation. Proceedings of the Fourth Information Hiding Workshop, vol. LNCS 2137, 25-27, 2001.

[7] N. F. Maxemchuk and S. Low, Marking Text Documents. Proceedings of the IEEE International Conference on Image Processing, Washington, DC, Oct 26-29, 1997, pp. 13-16.

[8] D. Huang, H. Yan, Interword distance changes represented by sine waves for watermarking text images. IEEE Trans. Circuits and Systems for Video Technology, Vol. 11, No. 12, pp. 1237-1245, 2001.

[9] N. Maxemchuk, S. Low, Performance Comparison of Two Text Marking Methods. IEEE Journal of Selected Areas in Communications (JSAC), vol. 16 no. 4, pp. 561-572, 1998.

[10] S. Low, N. Maxemchuk, Capacity of Text Marking Channel. IEEE Signal Processing Letters, Vol. 7, No. 12, pp. 345-347, 2000.

[11] M. Kim, Text Watermarking by Syntactic Analysis. 12th WSEAS International Conference on Computers, Heraklion, Greece, 2008.

[12] H. Meral, B. Sankur, A. S. Özsoy, T. Güngör, E. Sevinç, Natural language watermarking via morphosyntactic alterations. Computer Speech and Language, Vol. 23, pp. 107-125, 2009.

[13] Z. Jalil, A. Mirza, A Review of Digital Watermarking Techniques for Text Documents. International Conference on Information and Multimedia Technology, pp. 230-234, IEEE, 2009.

[14] M. Atallah, C. McDonough, S. Nirenburg, V. Raskin, Natural Language Processing for Information Assurance and Security: An Overview and Implementations. Proceedings of the 9th ACM/SIGSAC New Security Paradigms Workshop, pp. 51-65, 2000.

[15] H. Meral, E. Sevinc, E. Unkar, B. Sankur, A. Ozsoy, T. Gungor, Syntactic tools for text watermarking. In Proc. of the SPIE International Conference on Security, Steganography, and Watermarking of Multimedia Contents, pp. 65050X-65050X-12, 2007.

[16] O. Vybornova, B. Macq, Natural Language Watermarking and Robust Hashing Based on Presuppositional Analysis. IEEE International Conference on Information Reuse and Integration, IEEE, 2007.

[17] M. Atallah, V. Raskin, C. Hempelmann, et al., Natural language watermarking and tamperproofing. Proc. of the 5th International Information Hiding Workshop, Noordwijkerhout, Netherlands, pp. 196-212, 2002.

[18] U. Topkara, M. Topkara, M. J. Atallah, The Hiding Virtues of Ambiguity: Quantifiably Resilient Watermarking of Natural Language Text through Synonym Substitutions. In Proceedings of ACM Multimedia and Security Conference, Geneva, 2006.

[19] Z. Jalil, A. Mirza, H. Jabeen, Word Length Based Zero-Watermarking Algorithm for Tamper Detection in Text Documents. 2nd International Conference on Computer Engineering and Technology, pp. 378-382, IEEE, 2010.

[20] Z. Jalil, A. Mirza, M. Sabir, Content based Zero-Watermarking Algorithm for Authentication of Text Documents. International Journal of Computer Science and Information Security (IJCSIS), Vol. 7, No. 2, 2010.

[21] Z. Jalil, A. Mirza, T. Iqbal, A Zero-Watermarking Algorithm for Text Documents based on Structural Components. pp. 1-5, IEEE, 2010.

[22] M. Yingjie, G. Liming, W. Xianlong, G. Tao, Chinese Text Zero-Watermark Based on Space Model. In Proceedings of the 3rd International Workshop on Intelligent Systems and Applications, pp. 1-5, IEEE, 2011.

[23] S. Ranganathan, A. Johnsha, K. Kathirvel, M. Kumar, Combined Text Watermarking. International Journal of Computer Science and Information Technologies, Vol. 1 (5), pp. 414-416, 2010.

[24] Fahd N. Al-Wesabi, Adnan Alsakaf, Kulkarni U. Vasantrao, A Zero Text Watermarking Algorithm based on the Probabilistic Weights for Content Authentication of Text Documents. International Journal of Computer Applications (IJCA), U.S.A., pp. 388-393, 2012.

[25] Fahd N. Al-Wesabi, Adnan Z. Alsakaf, Kulkarni U. Vasantrao, A Zero Text Watermarking Algorithm Based on the Probabilistic Patterns for Content Authentication of Text Documents. International Journal of Computer Engineering & Technology (IJCET), Vol. 4, No. 1, pp. 284-300, 2013, ISSN 0976-6367 (Print), ISSN 0976-6375 (Online).

[26] Fahd N. Al-Wesabi, Adnan Alsakaf, Kulkarni U. Vasantrao, English Text Zero-Watermark Based on Markov Model of Letter Level Order Two. Inderscience, International Journal of Applied Cryptography (IJACT), submitted.

[27] Fahd N. Al-Wesabi, Adnan Alsakaf, Kulkarni U. Vasantrao, Content Authentication of English Text Documents Using Word Mechanism Order One of Markov Model and Zero-Watermarking Techniques. Elsevier, International Journal of Applied Soft Computing, submitted.