Multimedia Watermarking using Intelligent Techniques

Khurram Jawad

Doctor of Philosophy

2015

Pakistan Institute of Engineering & Applied Sciences


Multimedia Watermarking using Intelligent

Techniques

Khurram Jawad

Submitted in partial fulfillment of the requirements

for the degree of Ph.D

2015

Department of Computer and Information Sciences,

Pakistan Institute of Engineering and Applied Sciences,

Nilore, Islamabad



In the Name of Allah, the Most Beneficent, the Most Merciful



This thesis is prepared under the supervision of

Dr. Asifullah Khan

Associate Professor

Department of Computer and Information Sciences,

Pakistan Institute of Engineering and Applied Sciences,

Islamabad, Pakistan

Financial support was provided by the Higher Education Commission of Pakistan through the Indigenous 5000 PhD Fellowship Program, Batch-VI, Grant No. 074-0773-Ps4-403.

Declaration of Originality

I hereby declare that the work contained in this thesis and the intellectual content of this thesis are the product of my own work. This thesis has not been previously published in any form, nor does it contain any verbatim copy of published resources that could be treated as an infringement of international copyright law. I also declare that I understand the terms 'copyright' and 'plagiarism', and that in case of any copyright violation or plagiarism found in this work, I will be held fully responsible for the consequences of any such violation.

Signature: _____________________

Name: ___ Khurram Jawad ____

Date: _____________________

Place: _____________________



Certificate

This is to certify that the work contained in this thesis entitled "Multimedia Watermarking Using Intelligent Techniques" was carried out by Khurram Jawad and, in my opinion, is fully adequate, in scope and quality, for the degree of Ph.D.

Supervisor:……………………………

(Dr. Asifullah Khan)

Head, Department Name ……………………

(Dr. Javaid Khurshid)



Copyright Statement

The entire contents of this thesis titled 'Multimedia Watermarking Using Intelligent Techniques' and authored by Mr. Khurram Jawad are the intellectual property of Pakistan Institute of Engineering & Applied Sciences (PIEAS). No portion of the thesis should be reproduced without obtaining explicit permission from PIEAS.


Dedicated to my beloved Parents

LIST OF PUBLICATIONS

1. K. Jawad and A. Khan, "Genetic algorithm and difference expansion based reversible watermarking for relational databases," Journal of Systems and Software, vol. 86, pp. 2742-2753, 2013.

2. S. A. Malik, A. Khan, M. Hussain, K. Jawad, R. Chamlawi, and A. Jalil, "Authentication of images for 3D cameras: Reversibly embedding information using intelligent approaches," Journal of Systems and Software, vol. 85, pp. 2665-2673, 2012.

3. K. Jawad and A. Khan, "Robust and blind watermarking of relational databases using reversible contrast mapping," to be submitted to IEEE Transactions on Information Forensics and Security.

4. I. Hafeez, K. Jawad, M. Chaumont, and A. Khan, "Watermarking of DNA sequences: Hybrid synonymous substitution of nucleotides and dual layer error correction," submitted to Protein and Peptide Letters.

TABLE OF CONTENTS

Multimedia Watermarking using Intelligent Techniques
Declaration of Originality
Acknowledgement
List of Publications
Table of Contents
List of Figures
List of Tables
Abstract
Abbreviations/Key Words

Chapter 1 Introduction
1.1 Motivation and Objectives
1.2 Research Perspective
1.3 Thesis Structure
1.4 Contribution

Chapter 2 Literature Review
2.1 Genetic Algorithm and Difference Expansion based Watermarking for Databases
2.2 Reversible and Blind Watermarking Technique for Relational Databases
2.3 Synonymous Substitution based Watermarking for DNA Sequences
2.4 Chapter Summary

Chapter 3 GA and DEW based Watermarking for Databases
3.1 Reversible Difference Expansion Watermarking (DEW) Method
3.2 Genetic Algorithm based Difference Expansion Watermarking (GADEW) Method
3.2.1 Message Authentication Code (MAC)
3.2.2 Chromosome Structure of the Genetic Algorithm (GA)
3.2.3 Calculating Fitness
3.2.4 Example of Obtaining TC, CrC, AwD, and TwD
3.2.5 Watermark Embedding
3.2.6 Watermark Extraction
3.3 Results and Analysis
3.3.1 Capacity Analysis
3.3.2 Security Analysis
3.3.3 Different Attacks
3.4 Chapter Summary

Chapter 4 Reversible and Blind Watermarking for Databases
4.1 Proposed Reversible and Blind Watermarking Technique for Relational Database
4.1.1 Automatic Bit Checking
4.1.2 RCM Transform
4.1.3 Distortion Tolerance (DT) Check
4.1.4 Watermark Embedding
4.1.5 Watermark Extraction
4.1.6 Analyzing Three Steps of RBW-RD
4.1.7 Reduction in Watermarking Distortion
4.2 Improvements of the Proposed RBW-RD Technique Over RCM Technique
4.2.1 Increased Watermarking Capacity
4.2.2 Less Distortion with Same Capacity
4.2.3 Reducing FPs and Distortion Because of Addition and Bit Flipping Attack
4.3 Robustness Analysis of the Proposed RBW-RD Method
4.4 Comparison of Proposed RBW-RD Technique with DEW Technique
4.4.1 Experimental Analysis of the Proposed RBW-RD Technique against DEW Technique
4.5 Chapter Summary

Chapter 5 Watermarking of DNA Sequences
5.1 Sequences Used for Testing
5.2 The Proposed SSW-DNA Method
5.2.1 Data Embedding Section
5.2.2 Correction of Errors
5.2.3 Employing RS Codes for Restoring Mutation Losses
5.2.4 Enhanced Synonymous Substitution Technique
5.2.5 Data Extraction Section
5.3 Results and Analysis
5.3.1 Capacity of Storing Bits
5.3.2 RS Codes for Error Correction
5.4 Comparison with Existing Methods
5.5 Chapter Summary

Chapter 6 Conclusions and Future Directions
6.1 Thesis Summary
6.2 Future Research Directions
6.2.1 Intelligent Watermarking
6.2.2 Reversible Watermarking
6.2.3 Watermarking Different Objects

References

LIST OF FIGURES

Figure 1.1 Watermarking System
Figure 3.1 Chromosome Structure of GA
Figure 3.2 Calculating Attribute Wise Distortion
Figure 3.3 An Example of Calculating TC, AwD, TwD, and CrC
Figure 3.4 Process of Watermark Embedding
Figure 3.5 Process of Watermark Detection
Figure 3.6 Capacity Comparison of GADEW and DEW Method Using R-Dataset
Figure 3.7 Capacity Comparison of GADEW and DEW Method Using FCT-Dataset
Figure 3.8 Std Comparison of OrD, DEW, and GADEW Method Using R-Database
Figure 3.9 GADEW Method Bit Flipping, Deletion, and Addition Attack Comparison
Figure 3.10 Tuple-Wise-Multifaceted Attacks Comparison Between DEW and GADEW Method
Figure 3.11 Attribute-Wise-Multifaceted Attacks Comparison Between DEW and GADEW Method
Figure 4.1 RCM Domain for 8-Bit Attribute
Figure 4.2 Block Diagram of Watermark Embedding Phase
Figure 4.3 Block Diagram of Watermark Extraction Phase
Figure 4.4 Embedding and Extraction Algorithms of the Proposed RBW-RD Technique
Figure 4.5 Capacity Comparisons After Deletion Attack
Figure 4.6 Measuring Capacity Against Varying DT
Figure 4.7 Watermark Detection After Bit Flipping Attack (With BitCheck)
Figure 4.8 Watermark Detection After Addition Attack (With and Without BitCheck)
Figure 4.9 Comparisons of FPs Between Bit Flipping and Addition Attack
Figure 4.10 FP & TP Detection After Bit Flipping Attack (Without BitCheck)
Figure 4.11 RBW-RD Capacity Comparison of Simple Relation and After 10% Subtraction, Addition, & Bit Flipping Attacked
Figure 4.12 RBW-RD Capacity Comparison After 10% to 80% Bit Flipping Attack
Figure 4.13 RBW-RD Capacity Comparison of Simple Relation and After 80% Subtraction, Addition, & Bit Flipping (50% Attributes Altered)
Figure 4.14 [...] Addition, & Bit Flipping Attacked
Figure 4.15 DEW Method Comparison After Subtraction, Bit Flipping, Addition Attack, & No Attack on Relation (Average of 10, 20…90% Attack)
Figure 4.16 RBW-RD and DEW Method Comparison After Bit Flipping Attack (After Taking Average of 0.01, 0.02, 0.04, 0.08, and 0.17)
Figure 4.17 RBW-RD Comparison After Subtraction, RBF, Addition Attack, & No Attack on Relation (Average of 10, 20…90% Attack)
Figure 4.18 RBW-RD and DEW Method Comparison After Addition Attack (After Taking Average of 0.01, 0.02, 0.04, 0.08, and 0.17)
Figure 4.19 Capacity of DEW Method After Changing Values of DT
Figure 4.20 Decrease in Distortion by Using RBW-RD (Fixed DT) Over DEW (Changing DT)
Figure 5.1 SSW-DNA Method
Figure 5.2 RS Code Implementation
Figure 5.3 Structure of Text Encoded Using RS Coder
Figure 5.4 Synonymous Substitution
Figure 5.5 Data Insertion in 4-Fold Degenerative Codons
Figure 5.6 Data Insertion in 2-Fold Degenerative Codons
Figure 5.7 Data Embedding Module
Figure 5.8 Reconstruction of DNA Using Binary Strings
Figure 5.9 Data Extraction Module
Figure 5.10 RS Coder Performance for Point and Burst Mutation Scenario
Figure 5.11 RS Coder Performance in Burst Mutation Scenario
Figure 5.12 RS Coder Performance in Point Mutation Scenario
Figure 5.13 bpn Comparison
Figure 5.14 Average Uncorrected Mutations at Different Block Sizes
Figure 5.15 Average Uncorrected Mutation Trend at Different N/K

LIST OF TABLES

Table 3.1: Results of Reducing Tuple and Attribute-Wise Distortion
Table 3.2: Results of Modification in Std and Mean
Table 4.1: Probability of Watermarking for All Three Steps
Table 4.2: Mean and Std (Distortion) by Varying DT for Three Watermarked Relations
Table 4.3: Measuring Improvement in Distortion by Using Different DT
Table 4.4: Effect of Bit Checking on Distortion, Because of FPs Caused by Bit Flipping and Addition Attack (DT Check = 0-250)
Table 5.1: Dataset
Table 5.2: Data Encoding Table
Table 5.3: Synonymous Substitution
Table 5.4: VOSS Representation of DNA Sequence
Table 5.5: Bit Storage Capacity
Table 5.6: Comparison with Existing Techniques for Data Hiding Capacity

ABSTRACT

Research and commercial interest in digital watermarking has increased in the current era. The main reason behind this development is the extensive use of the internet, with high speed and bandwidth availability. As a result, sharing of multimedia content has increased significantly, which raises the need for copyright protection and authentication. Digital watermarking provides vast commercial applications for securing the content of different multimedia objects. However, watermarking can cause permanent loss in the content of the object. Sensitive multimedia objects, such as medical, military, and scientific data, do not tolerate such permanent loss. Therefore, the distortion needs to be minimized or completely eliminated so that the true purpose of the multimedia content can be retained. Numerous watermarking techniques are available for different sensitive applications. Our objective is to propose reversible and secure techniques for watermarking different multimedia objects. Robust watermarking approaches are targeted in this research. The main focus is to achieve maximum watermarking capacity and to minimize embedding distortion, while preserving the functional capability of the underlying multimedia object.

Two different types of multimedia objects are targeted in this dissertation: relational databases and the DNA medium. A genetic algorithm is selected as the intelligent technique for improving different properties of digital watermarking. Two relational database watermarking approaches are designed and developed; both approaches are reversible and robust, and both respect the distortion tolerance of the attributes. The first approach uses a genetic algorithm to improve the capacity and to reduce the distortion and false positive rate of the difference expansion based watermarking technique. The second approach improves the watermarking capacity and reduces the distortion and false positive rate of reversible contrast mapping, which is applied to relational databases for the first time.

A robust data hiding technique for the DNA medium is proposed, which increases watermark capacity by improving the current synonymous substitution technique and resists mutation losses. Moreover, synonymous substitution does not cause any disturbance in the amino acid sequence; thus, the DNA functionality is retained. In order to tackle different DNA mutations, binary strings and Reed-Solomon codes are applied. The watermark is encoded before embedding using Reed-Solomon codes, and the structural information is retained using binary strings.

ABBREVIATIONS/KEY WORDS

Abbreviation        Description
RS                  Reed-Solomon
FP                  False positive
TP                  True positive
RCM                 Reversible Contrast Mapping
DT                  Distortion Tolerance
Std                 Standard Deviation
Λ                   Watermark embedding/extracting parameter
R                   Original Relation
GA                  Genetic Algorithm
LSB                 Least Significant Bit
DE                  Differential Evolution
RR                  Restored Relation
H                   MAC Hashing Function
SK                  Secret Key
PK                  Primary Key
ti.PK               Primary key of the Tuple i
1/λ                 Fraction of tuples selected
R-dataset           Random Dataset
FCT-dataset         Forest Cover Type Dataset
TV                  Target Value
CV                  Changed Value
TC                  Total Cost
Int_x, Int_y        Integer value of attribute x and y
Int_x', Int_y'      Watermarked integer value of attribute x' and y'
Frac_x, Frac_y      Fraction value of attribute x and y
Frac_x', Frac_y'    Watermarked fraction value of attribute x' and y'
BCH                 Bose-Chaudhuri-Hocquenghem Algorithm
DEW                 Difference Expansion Based Watermarking
M_GADEW             Attribute wise mean of watermarked database using GADEW
M_DEW               Attribute wise mean of RW using DEW
M_OrD               Attribute wise mean of OrD
S_GADEW             Attribute wise Std of RW using GADEW
S_DEW               Attribute wise Std of RW using DEW
S_OrD               Attribute wise Std of R
UCa                 Number of potential codons for watermarking
UnCa                Number of restricted codons for watermarking
bs                  Bit stored
A, C                Adenine and Cytosine
T, G                Thymine and Guanine
bpn                 Bit Per Nucleotide
DNA                 Deoxyribonucleic Acid
n                   Final watermark data (parity and data bits combined) embedded in DNA
k                   Data bits to be embedded in DNA
n-k                 Number of parity bits
GMO                 Genetically Modified Organisms
CrC                 Capacity related Cost
TwD                 Tuple-wise Distortion
AwD                 Attribute-wise Distortion
OrD                 Original database
T                   Word length of attribute
L                   Upper limit of RCM Domain
Dec                 Binary to decimal
RW                  Watermarked Relation
a                   Mark-able attributes
b                   Automatically generated watermark bit
DA, DB, & DC        Distortion (mean & Std) measure for RW using different values of DT
DO                  Distortion measure for R
DE, DF              Distortion measure for RR with and without bit checking
δjk                 Upper (δj1) and lower (δj2) limit of DT for attribute j

ACKNOWLEDGEMENT

All praises to Almighty ALLAH, creator of the Universe, Who made us the super creature and blessed us with knowledge. I am grateful to Almighty ALLAH, the most Benevolent and Merciful, Who blessed me throughout my life and gave me the ability to undertake such a challenging task and to proceed towards its completion.

I extend my sincerest thanks to my supervisor, Dr. Asifullah Khan, for his generous guidance and moral support during my PhD. I appreciate his endless patience, positive attitude, ability to provide assistance, and especially his willingness to put his students before his work. I thank him greatly for his meticulous proofreading of all of my published work. His valuable suggestions and persuasive criticism have led me to complete my goal successfully.

A very special note of thanks goes to my Sisters, Nephew, Niece, Uncle Arshad, Uncle Irfan Saeed, Arshad Bahi, Ehtasham, Nana, Nano, late Dada, and late Dado, whose heartfelt prayers, appreciation, and support have always been a valuable asset and a great source of inspiration for me. They always encouraged me whenever I was demoralized during my academic career. They really deserve special thanks for enduring all my problems with great patience and love.

I am also indebted to Dr. Abdul Jalil, Dr. Muttawarra Hussain, Dr. Javed Khurshid, and all other teachers of the department for their cooperation and encouragement to attain my goal. Thanks are due to them, as this work would not have been possible without their encouragement and moral support. I gratefully acknowledge the Higher Education Commission of Pakistan for the financial support provided through the Indigenous PhD scholarship program.

Last, but certainly not least, I would like to thank my friends and colleagues (Muhammad Sami, Muhammad Shukaib, Dr. Zia Ur Rehman, Dr. Muhammad Tahir, Adnan Idrees, Dr. Aksam Iftekhar, Dr. Maqsood Hayat, Mehdi Hassan, Dr. Sana Ambreen, Naeem Ur Rehman, Iqbal Murtza, Ibbad Hafeez, Faheem Khan, and Raheel Zia). They helped me in times of trouble, praised me on my achievements, and cheered me when I was down.

Khurram Jawad

Chapter 1 INTRODUCTION

Conventionally, corporate stamps and copyright logos were used to verify ownership and authenticity. Such procedures are safe provided that documents can only be transported and copied physically. However, the requirements of the digital era have changed. With the increasing use of fast internet connections, multimedia objects are handled, altered, and transferred unlawfully over the network. Therefore, a visible logo alone is no longer adequate. Digital watermarking consists of embedding an ownership mark in multimedia content so that the proprietor can establish his right over the multimedia object [1].

Digital watermarking is divided into three categories: robust [2-7], fragile [8-13], and semi-fragile [14-18]. A robust watermarking system can survive both deliberate and unintended (legitimate) attacks. Semi-fragile watermarking schemes can survive certain (unintended) alterations, but the watermark is destroyed if it undergoes any deliberate attack. A fragile watermarking technique, in contrast, can detect any slight modification to the watermarked multimedia object: the watermark becomes undetectable once the watermarked object is altered by any method [1]. The important features of a digital watermarking system include embedding capacity, fidelity, blind or semi-blind detection, distortion tolerance, secret and cipher keys, robustness and security, reversibility, false positive rate, and computational time.

Improving embedding capacity means that more watermark bits can be embedded, which helps to spread the watermark all over the multimedia object. Thus, the watermark can be successfully detected even from a portion (subset) of the watermarked object [19]. High embedding capacity helps the watermark to survive different attacks even when the attacks are severe. Embedding the watermark should neither compromise the visual quality of the multimedia object nor affect the true usefulness of the multimedia content [20]. In the case of sensitive databases, the watermarking process can affect the usefulness of the content. The quality of medical databases, for example, is important so that they can clearly support physicians in decision making [21]. Moreover, attackers can detect a perceptual change and modify the watermarked part to remove the watermark.

A blind watermarking technique can detect a watermark in an object without requiring any side information, whereas a semi-blind watermarking technique requires some side information to detect the watermark [22]. Non-blind detection, however, requires the original version of the watermarked object to detect the watermark. Distortion tolerance, also called the usability constraint, defines the range up to which distortion can be introduced into the multimedia object by watermarking. This helps to keep the distortion within the acceptable range for a particular application [23].
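As an illustration of a distortion tolerance check, the following minimal Python sketch accepts a watermarked value only when its change stays inside an allowed band. The function name `within_tolerance` and the parameters `lower` and `upper` are hypothetical names chosen for this example; they are not taken from the thesis.

```python
def within_tolerance(original, watermarked, lower, upper):
    """Return True if the watermarking change stays inside the allowed band.

    lower and upper are non-negative bounds on how far the value may
    decrease or increase (hypothetical parameters for illustration).
    """
    change = watermarked - original
    return -lower <= change <= upper


# A change of +3 lies inside a tolerance of 5; a change of +10 does not.
accepted = within_tolerance(100, 103, lower=5, upper=5)
rejected = within_tolerance(100, 110, lower=5, upper=5)
```

An embedder built this way would simply skip, or undo, any modification for which the check fails.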

A general structure of a watermarking system is presented in Figure 1.1. Cipher keys can be used for encrypting the watermark that is to be embedded into the multimedia object. Cipher keys are also used for semi-blind watermarking, where the side information can be encrypted and transferred to the detection side along with the watermarked object [24]. Secret keys can be used for controlling watermark embedding and detection for the targeted multimedia object. Both secret and cipher keys help the proprietor to distribute the watermark randomly throughout the multimedia object, so that an attacker cannot predict the location of the watermark [25]. Robustness refers to the ability of the watermark to endure normal operations on the multimedia content, whereas security refers to the ability of the watermark to withstand hostile attacks. Thus, the watermark can be extracted from the watermarked multimedia object even after an attacker modifies its content [1].
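One common way to let a secret key control where the watermark goes is to select tuples with a keyed hash of their primary key. The sketch below assumes this approach for illustration only; the function `is_selected`, the use of HMAC-SHA256, and the parameter `lam` (so that roughly one tuple in `lam` is chosen) are assumptions of this example and not the embedding procedure defined in the thesis.

```python
import hashlib
import hmac


def is_selected(secret_key: bytes, primary_key: str, lam: int) -> bool:
    """Decide from a keyed hash whether a tuple is marked.

    Without secret_key the selected positions look random, so an
    attacker cannot predict them. About 1/lam of the tuples are chosen.
    """
    digest = hmac.new(secret_key, primary_key.encode("utf-8"), hashlib.sha256).digest()
    return int.from_bytes(digest, "big") % lam == 0


# Hypothetical key and primary keys, for illustration only.
key = b"owner-secret"
marked = [pk for pk in (str(i) for i in range(1000)) if is_selected(key, pk, lam=10)]
```

Because the decision depends only on the key and the primary key, the detector can recompute the same selection without any side information.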

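Figure 1.1 Watermarking System. [Block diagram: the Watermark Embedder takes the Original Object and the Watermark and, through Embedding, produces the Watermarked Object; the Watermark Extractor takes the Watermarked Object and, through Extraction, yields the Watermark and the Recovered Object.]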

Reversibility enables a watermarking technique to restore the original multimedia object exactly during the watermark detection process. This property is useful for sensitive multimedia objects that cannot tolerate any distortion in their original content [26]. An important application of reversibility is that a trial (watermarked) version of the multimedia object can be sent for testing purposes. Once the buyer is satisfied, he can obtain the full multimedia object simply by acquiring the secret key. Thus, the multimedia object can be restored to its original version by using the secret key [27].
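Reversibility can be illustrated with difference expansion, a well-known integer transform for reversible embedding. The sketch below hides one bit in a pair of integers and then recovers both the bit and the original pair. It is a simplified illustration: the function names are hypothetical, and the overflow and distortion checks that a practical scheme needs are omitted.

```python
def de_embed(x: int, y: int, bit: int) -> tuple[int, int]:
    """Embed one bit into the integer pair (x, y) by difference expansion."""
    avg = (x + y) // 2          # integer average, preserved by the transform
    diff = x - y
    expanded = 2 * diff + bit   # expanded difference carries the bit in its LSB
    return avg + (expanded + 1) // 2, avg - expanded // 2


def de_extract(xw: int, yw: int) -> tuple[int, int, int]:
    """Recover (x, y, bit) from a watermarked pair."""
    avg = (xw + yw) // 2
    expanded = xw - yw
    bit = expanded & 1
    diff = expanded // 2        # floor division undoes the expansion
    return avg + (diff + 1) // 2, avg - diff // 2, bit


xw, yw = de_embed(206, 201, 1)   # watermarked pair
restored = de_extract(xw, yw)    # original pair and embedded bit
```

The integer average of the pair is unchanged by embedding, which is what allows the extractor to rebuild the original values without any side information.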

A watermark may be falsely detected in an unwatermarked region of the object. On the one hand, this can affect the process of watermark detection; on the other hand, it can degrade the quality of the recovered object in the case of a reversible watermarking technique. Therefore, a good watermarking technique should have a minimal false positive rate [22]. The computational time of a watermarking technique describes how much time the watermarking system takes for embedding and detection [28].

1.1 Motivation and Objectives

In this era, the internet and its associated devices are improving rapidly. As a result,

electronic business, electronic commerce, electronic marketing, and online buying and selling have

evolved tremendously. High speed internet and related technology have also made illegal file sharing and

dissemination easier [23, 29]. This requires considerable attention to the copyright protection of

multimedia content. Digital watermarking offers many commercial applications for securing the content

of different multimedia objects. However, watermarking can cause permanent distortion in the content of the

object, and sensitive multimedia content in areas such as medicine, the military, and science does

not tolerate such permanent loss.

Our objective is to construct more appropriate and practical watermarking systems for different

multimedia objects, namely relational databases and the DNA medium. The work should provide a better understanding

of certain watermarking properties, such as watermarking capacity, reversibility, false positive rate,

blindness, imperceptibility, distortion, and usability. Furthermore, our focus is to minimize or

completely eliminate the watermarking distortion, so that the true purpose of the multimedia content

can be retained.


1.2 Research Perspective

Advances in the internet and associated devices have raised issues of unlawful manipulation, replication,

malicious tampering, and copying of multimedia content. Therefore, the need for research in multimedia

integrity and security is increasing day by day. To safeguard the security of digital media,

different techniques have been proposed, and watermarking is regarded as a powerful tool for securing

digital content.

This work focuses on improving the embedding capacity and reducing or eliminating distortion,

along with achieving reversibility and robustness, in the domains of relational databases and the DNA medium.

Reversible Contrast Mapping (RCM-DB) and Genetic Algorithm based Difference Expansion

Watermarking (GADEW) are employed for database watermarking and show improvements in

several watermarking properties, i.e. increased embedding capacity, reduced distortion,

better robustness, and a lower false positive rate. An improved synonymous substitution technique is

proposed for DNA watermarking, which provides high embedding capacity and better robustness.

1.3 Thesis Structure

Chapter 2 provides a brief survey of the existing watermarking approaches. It presents the ideas and

methods described in the relevant literature. For clarity, the literature study is divided

into three subsections, one for each of the three proposed techniques.

Chapter 3 presents a reversible watermarking approach for relational databases. It uses a genetic

algorithm (GA) to improve the capacity and reduce the distortion of the difference expansion watermarking (DEW) technique.

Additionally, the proposed genetic algorithm based reversible watermarking (GADEW) approach is

robust against different attacks including sorting, addition, bit flipping, deletion, additive attacks,

attribute-wise-multifaceted, and tuple-wise-multifaceted attacks. Random selection of the attributes also

makes it hard for the attacker to guess the watermark. The problem of false positive detection is resolved,

and even an addition attack does not result in false positive detection.

Chapter 4 demonstrates a reversible and blind watermarking method for relational databases (RBW-RD)

that utilizes reversible contrast mapping (RCM) to achieve reversibility. It offers a notable

improvement in terms of payload while respecting the distortion tolerance (DT) of the relation, and it causes

less distortion and fewer false positives (FPs). In addition to these properties, it

retains the existing advantages of the RCM method: no encryption or compression is necessary, and

the computational complexity is low. It makes use of both the fraction and integer parts for

watermarking; however, even if the fraction part is changed, the detection process is not disturbed. Finally,

it is compared with the difference expansion (DEW) based watermarking technique.

Chapter 5 presents the SSW-DNA watermarking method, which attains high capacity and achieves

robustness against mutations. It exploits the whole coding region; as a result, high data storage is attained.

Existing DNA watermarking systems use only 4-fold synonymous codons, which limits

watermarking capacity, as 2-fold and 3-fold synonymous codons make up a substantial portion of DNA.

The proposed method facilitates the use of 4-fold, 3-fold, and 2-fold synonymous codons, enabling

high data storage capacities. Structural information is retained using binary strings, which makes it a semi-blind

approach, and the watermark is encoded before embedding using Reed-Solomon codes. Biologically, the

synonymous substitution method maintains the amino acid sequence; thus DNA functionality is

retained.

Chapter 6 summarizes the work accomplished in this dissertation. Furthermore, it provides certain

future directions for the work presented in this thesis.

1.4 Contribution

This work focuses on copyright protection of digital multimedia objects, while targeting capacity,

robustness, reversibility, and imperceptibility. The research covered in this dissertation contributes in

the following areas:

A watermarking approach for relational databases, GADEW, is proposed that is reversible,

semi-blind, and robust. It provides better results in terms of capacity, distortion, and false positive

rate as compared to the existing approach. Reversible database watermarking is essential for

research and business in different applications.

A reversible, blind, and robust watermarking method for relational databases, RBW-RD, is

presented. It achieves high capacity and causes less distortion and fewer false positives. Comparison

with RCM and DEW based watermarking methods shows that the proposed RBW-RD

approach is superior.

A new semi-blind data hiding technique for the DNA medium, SSW-DNA, is proposed. It provides

high embedding capacity along with robustness against DNA mutations. The

proposed technique uses an improved synonymous substitution technique, which does not affect

the functionality of DNA.


Chapter 2 LITERATURE REVIEW

A brief survey of the existing watermarking approaches in the fields of relational database and DNA

watermarking is provided in this chapter. The ideas and methods described by different

researchers are elaborated. For simplicity, the literature study is divided into three separate sections,

one for each of the three techniques.

2.1 Genetic Algorithm and Difference Expansion based

Watermarking for Databases

The easy availability of the internet is a major reason behind the growth in research and business.

These days, distributing data over the internet is essential for research and business, which also

encompasses the buying and selling of databases. Exchange of information in important areas such as

science, medicine, and the stock market is essential. Therefore, it is necessary to restrict unlawful copying

and circulation of relational databases [18]. In this regard, tamper-resilient shipping and proof of

ownership of relational databases are the most challenging concerns [19].

Sharing of different multimedia objects (for example image, text, and audio) can be secured

using watermarking techniques [30]. Likewise, watermarking provides an effective solution for securing

relational databases. However, a relational database provides very little bandwidth for embedding a

watermark into its contents. Therefore, embedding a larger payload in a database can result in losing the true

meaning of its content. Initially, Agrawal and Kiernan [31] used the least significant bits (LSB) of database

attributes for embedding watermark bits. As a result, they permanently altered the relational database

for watermarking. Moreover, attackers can easily manipulate such a simple LSB based watermarking

method.


In another work, Sion et al. [19] used collections of tuples (partitions) and embedded watermark bits in their

statistics. The statistics were manipulated subject to a distortion tolerance, which is responsible for keeping

the values of the database content within a predefined limit. Shehab et al. [32] used optimization approaches

to improve the method of Sion et al. For the purpose of optimization, the hiding function of

the technique is targeted, using a genetic algorithm (GA) to embed the watermark in the partition statistics of the

database.

Additionally, Mailing et al. [33] used GA to hide watermark in database statistics by targeting

frequency domain. Their primary intention was to increase the detection efficiency of watermark;

therefore, the correlation between watermark and database was targeted. However, all of the above-

mentioned watermarking techniques permanently distorted the content of dataset, which cannot be

restored to original version at detection side. This triggered the concept of using reversible

watermarking in the domain of relational database.

Gupta and Pieprzyk [34] for the first time used a reversible difference expansion based

watermarking (DEW) approach for watermarking relational databases. Reversible watermarking

approach can recover embedded watermark as well as restore the database to its original form without

causing any permanent distortion in it. DEW approach of Gupta and Pieprzyk also has implicit

distortion tolerance check capability for both watermark embedding and extraction. A scenario of

additive watermarking attack is also addressed by Gupta et al. [35] using the DEW approach.

The prediction error expansion (PEE) method of Farfoura et al. utilizes a single attribute to embed the

watermark [36]. As it uses the fraction part of the numeric attribute, an adversary can attack the fraction

part without disturbing the distortion tolerance of the attribute. The PEE method has no distortion tolerance

check capability for an attribute; therefore, the distortion limit of the attribute may be compromised.

Prevailing watermarking techniques [35, 37-39] are less capable of utilizing different attributes for

fitness. Current approaches only check the distortion tolerance of selected attributes, and if the

condition is not fulfilled then the tuple is left unwatermarked. Therefore, there is need for checking

more potential attributes in the same tuple for embedding watermark. Evolutionary methods are

incorporated for achieving optimal solution in the area of pattern recognition [7, 40-43]. Thus, a global

approach is required that utilizes intelligent approaches for refining results of the watermarking

method, by achieving reasonable balance among the elementary properties of a watermarking

approach [24, 44-46].


2.2 Reversible and Blind Watermarking Technique for Relational

Databases

The relational database is a key multimedia item, and securing its content is a difficult task [22, 32, 47-50].

In particular, safeguarding relational databases such as those concerned with consumer behavior, weather,

medicine, the stock market, scientific data, defense, and business is a demanding task. Sharing a

database with certain organizations is often necessary for better utilization of its content. For

example, intelligent data mining methods help in recognizing interesting patterns in databases, which

supports the process of decision making. Consequently, sharing of information between proprietors and

data mining corporations is increasing. For this purpose, watermarking is beneficial as a

countermeasure against illegal copying and redistribution of databases over the World Wide Web [50].

Kamran et al. [49] presented a watermarking system that sets the distortion tolerance (DT)

according to the dataset semantics. DT was determined so as to balance robustness and

distortion. Furthermore, Kamran and Farooq [50] recommended a DT model for outsourced

classification datasets, whereby their informed watermarking does not disturb the classification

capability of the datasets. However, their method is not reversible.

Some database watermarking procedures also observe the DT limit for watermarking [34,

35, 37-39]. However, the majority of the prevailing watermarking techniques introduce permanent distortion in

the database, and real life applications such as military and medical ones cannot bear permanent changes.

Therefore, various reversible database watermarking methods are reported in the literature [22, 34, 36].

Reversible watermarking methods have the advantage of recovering the original multimedia object as it

was before watermarking, along with the extraction of the watermark information.

The DEW method causes high watermarking distortion, which is recovered during the detection process.

Furthermore, a high DT can result in many false positives (FPs) during detection. On the other hand,

watermarking capacity may decrease when the DT is kept low [35]. Coltuc and Chassery [51] proposed the

reversible contrast mapping (RCM) technique to watermark image objects. The RCM method has fast

embedding and detection, because no compression is involved. In another method, Chen and

Wang [52] devised a new RCM technique with steganalysis for detection and estimation of the

hidden message length. They utilized the information obtained by calculating the difference between the

cover and watermarked images. Hence, probability information of the pixels belonging to the RCM domain

[53] is utilized for watermark detection and estimation.

Maiti and Maity [54] enhanced payload, imperceptibility, and distortion using the RCM technique.

The structural similarity index is utilized to obtain the covariance and variance between the watermarked

and cover work. Their focus was to preserve structural information while achieving high watermarking

capacity. The RCM method is applied to watermarking of DNA content by Mousa et al. [55]. Without

using any compression method, their approach achieves high watermarking capacity. However, it can

be observed that RCM based watermarking methods do not provide high capacity [51, 54, 55]. This is

mainly because both watermark bits and extra (supporting) bits are combined as payload bits.

Chen and Wang [52] first used the RCM method to determine which pixels do not belong to the RCM

domain. Afterwards, they used only those pixels that do not belong

to the RCM domain as watermark information. A watermarking technique inspired by the RCM method therefore needs to be proposed for

relational database objects, one that can embed the watermark in every selected attribute without skipping any

attribute pair, whether or not it belongs to the RCM domain of watermarking. As a result, more

payload can be embedded, while the need for encryption or compression and extra storage can be eliminated.

2.3 Synonymous Substitution based Watermarking for DNA

Sequences

Research interest in DNA watermarking has increased in the previous decade [56]. Initially, information

was secretly hidden in DNA using the idea of microdots, which were used in World War II for

concealing information [57]. In another work, the DNA sequence of Deinococcus radiodurans was

used for storing a song script [58]. A few biological organisms are capable of surviving severe

conditions such as high temperature and radiation. Therefore, their DNA can be useful for hiding data, and

the watermark information can be successfully saved and extracted. Modegi et al. proposed an

information hiding technique that utilizes the codon usage bias feature to embed data in genomic

sequences [59]. A modified Huffman coding approach was used by Ailenberg et al. [60] to embed a

watermark in the DNA sequence. A DNA information hiding method inspired by multiple sequence

alignment was proposed by Yachie et al. [61].

Arithmetic coding was used for degenerate codons by Shimanovsky et al. to hide information in a

DNA sequence [62]. The coding region of bacterial DNA was utilized by Arita et al. for storing

information by applying the synonymous substitution technique to nucleotides [63]. The synonymous

substitution method has the advantage of preserving the amino acid sequence; hence the DNA

functionality is retained. The DNA-Crypt model was presented by Heider et al. by combining error

correction and encryption methods [64]. Difference expansion and lossless compression were used by

Chang et al. to embed a watermark in a DNA sequence [65]. Likewise, Shiu et al. presented three different

DNA data-hiding approaches: the insertion, substitution, and complementary approaches

[66]. Mousa et al. presented a reversible contrast mapping method for DNA watermarking [55].

Twenty million replicas of a book containing 5.27 megabyte of data were stored by Church et al. in

DNA, showing the importance of DNA data storage [67].

Aisling O'Driscoll [68] presented a brief overview of certain problems that may become

obstacles to the large scale adoption of DNA based storage techniques. The author explained how

problems such as decoding, write-once, and sequential read of DNA based storage can be addressed

using simple schemes. Likewise, the write-once problem was addressed by the rewritable DNA storage proposed by

Bonnet et al. [69]. The previous inability to rewrite data in DNA was resolved by their

method of rewriting digital data in living cells using directional recombination. A 739 kilobyte file

was embedded and extracted using artificial DNA by Goldman et al. [53], which helped to address the

need for storing huge volumes of data.

Some of the techniques discussed above have certain limitations in the DNA watermarking domain.

Some of these techniques cannot be applied to living organisms, and artificial DNA is used

instead [56-58]. In contrast, those techniques that provide data hiding capability in living organisms do

not handle the losses that result from mutation [62, 63]. Although most

mutations are corrected by cellular mechanisms such as DNA repair, a few uncorrected mutations

can result in substantial loss of the watermark.

Similarly, the structural arrangement of DNA may be affected by missense and nonsense mutations. As

a result, substantial loss of the watermark is observed [58, 63, 67]. Both missense and nonsense

mutations are types of point mutation, which exchanges a single nucleotide for another. A missense

mutation converts one amino acid into another amino acid, while a nonsense mutation changes the nucleotide in

such a way that a normal codon turns into a premature stop codon. A frame shift is a type of mutation

caused by insertion or deletion of nucleotides. A silent mutation is also known as a synonymous substitution,

whereby alteration of a nucleotide results in a different codon. Silent mutations do not disturb

the amino acid sequence, and even after mutation the codon translates to the same amino acid [70]. The

latest efforts utilize artificial DNA for storing information, which requires special conditions for survival,

and they are not meant to store data in living organisms. Consequently, the information

cannot be recovered once the DNA is affected. Similarly, mutation scenarios are not given significant

importance. However, mutation is the main cause of evolution and can result in the loss of watermarked

data in DNA.
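
The mutation types described above can be illustrated with a short Python sketch. It is only a minimal illustration: the codon table is a small subset of the standard genetic code, and the function name classify_point_mutation is a hypothetical helper introduced here, not part of any method surveyed in this chapter.

```python
# Small subset of the standard genetic code (DNA codons), for illustration only.
CODON_TABLE = {
    "GAA": "Glu", "GAG": "Glu",
    "GAT": "Asp", "GAC": "Asp",
    "TAT": "Tyr", "TAC": "Tyr",
    "TAA": "Stop", "TAG": "Stop",
    "GCT": "Ala", "GCC": "Ala", "GCA": "Ala", "GCG": "Ala",
}


def classify_point_mutation(before: str, after: str) -> str:
    """Classify a single-nucleotide change between two codons (hypothetical helper)."""
    if len(before) != 3 or len(after) != 3:
        raise ValueError("codons must have exactly three nucleotides")
    if sum(a != b for a, b in zip(before, after)) != 1:
        raise ValueError("a point mutation changes exactly one nucleotide")
    aa_before = CODON_TABLE[before]
    aa_after = CODON_TABLE[after]
    if aa_before == aa_after:
        return "silent"      # synonymous substitution: same amino acid
    if aa_after == "Stop":
        return "nonsense"    # premature stop codon
    if aa_before == "Stop":
        return "stop-loss"   # not discussed in the text; listed for completeness
    return "missense"        # a different amino acid


print(classify_point_mutation("GAA", "GAG"))  # Glu to Glu
print(classify_point_mutation("GAA", "GAT"))  # Glu to Asp
print(classify_point_mutation("TAC", "TAA"))  # Tyr to stop
```

A watermark embedded through synonymous substitution corresponds to the first case, which is why the amino acid sequence is preserved.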

In order to increase robustness against mutations, additional information for error correction

can be used; as a result, the payload is reduced. Biologically, the synonymous substitution method maintains

the amino acid sequence; consequently, the DNA functionality is retained. Likewise, it is necessary to

attain a balance between the different properties of the watermarking system. Therefore, a DNA

watermarking approach needs to have high embedding capacity and be more resistant to mutation

attacks.

2.4 Chapter Summary

This chapter reports relevant literature in the area of multimedia watermarking. It was designed to

cover two multimedia objects, relational databases and the DNA medium. The next chapter

elaborates the genetic algorithm and difference expansion based reversible watermarking for relational

databases.


Chapter 3 GA AND DEW BASED WATERMARKING FOR DATABASES

Relational databases consist of relations (tables), which contain sets of values organized horizontally

as tuples and vertically as attributes. Addition, alteration, and deletion operations can be performed on

tuples. Attributes constitute important properties of a relation; alteration operations can be

performed on them, but deletion or addition of attributes can result in losing the true meaning of the

relational database. Selecting a tuple and then selecting an attribute gives a cell value that can

be used to embed the watermark. These cell values are referred to as target values (TVs). The TVs are

converted to changed values (CVs) after embedding the watermark.

The difference expansion based watermarking technique (DEW) is a reversible technique that can

embed a watermark using at least two TVs. In DEW the focus is on how much distortion can be

introduced into the TVs. The limit of distortion is different for each attribute and is known as the distortion

tolerance. The distortion tolerance represents the upper and lower limits within which the values of an attribute may

change. Therefore, no CV should exceed the distortion tolerance of its attribute. One can therefore reduce

distortion by using a small value for the distortion tolerance; however, this may limit the amount of

watermark that can be embedded into the cover work. In order to keep the watermarked attribute hidden from the

attacker, we select those attributes that are hidden in the neighborhood and cause minimum

distortion in the cover work. The neighborhood represents the two contiguous cells located above and the two

cells located below the TV within the same attribute.

Another problem with the DEW method is false positive detection. Due to false positives, exact

extraction of the watermark and restoration of the original data are not possible. However, we have

observed that the original watermark and original data can be recovered exactly using a semi-blind

detection technique. Therefore, the problem of false positives is resolved using side information.

Similarly, changing the order of attributes does not affect the usability of a relation in a database. However,

reshuffling attributes at the detection side may affect the process of watermark detection [71]. This

problem can also be resolved by passing the order of attributes to the detection module.

The existing DEW approach is mostly unable to increase the watermark capacity of the relation without

increasing the distortion tolerance of the attributes. An attacker can thus use distortion to predict marked

attributes, which may affect successful detection of the watermark. Moreover, false positives and

changes in the order of attributes cannot be handled at the detection side. In order to solve the above problems

for database watermarking, we propose a Genetic Algorithm and Difference Expansion based

watermarking (GADEW) technique. GA and DEW are combined to select suitable TVs for reversible

watermarking, which helps to improve capacity and reduce distortion. The mean and standard deviation

of the watermarked relation are used to evaluate distortion in the attributes. False positives are

eliminated and the problem of shuffled attributes at the detection side is resolved.

The proposed GADEW technique is able to increase the watermarking capacity of the relational database at

a fixed distortion tolerance. The distortion tolerance enforces limits on each attribute so that a value does

not lose its meaning during watermark insertion. Distortion introduced by watermark insertion is

minimized by introducing tuple-wise and attribute-wise distortion measures. GADEW is a

reversible watermarking technique that recovers both the watermark and the cover work exactly as it was

before watermark insertion. Additionally, it is robust against different attacks including addition,

deletion, sorting, bit flipping, tuple-wise-multifaceted, attribute-wise-multifaceted, and additive

attacks. Random selection of attributes also makes it difficult for the attacker to predict the watermark.

The problem of false positive detection is resolved, and even an addition attack does not result in false

positive detection.

3.1 Reversible Difference Expansion Watermarking (DEW) Method

The DEW method performs invertible mathematical operations on integer values [26]. It embeds a

watermark bit in the target values (TVs) such that both the original values and the watermark bit can be recovered. Two

cell values (TVx and TVy) are selected using two attributes of a selected tuple of a relation. Three steps

are used for embedding the watermark in the selected values. Equation (3.1) shows the mathematical

representation of obtaining the average a and the difference d. Equation (3.2) represents the insertion of the

watermark bit b, converting d to d'. Equation (3.3) explains how both changed values (CVx and

CVy) are obtained.


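$$a = \left\lfloor \frac{TV_x + TV_y}{2} \right\rfloor, \qquad d = TV_x - TV_y \tag{3.1}$$

$$d' = 2d + b \tag{3.2}$$

$$CV_x = a + \left\lfloor \frac{d' + 1}{2} \right\rfloor, \qquad CV_y = a - \left\lfloor \frac{d'}{2} \right\rfloor \tag{3.3}$$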

Three steps are used for extracting the watermark bit from the changed values and restoring the original

values. Equation (3.4) shows how the average a and the difference d' are obtained. Equation (3.5) shows the

process of extracting the watermark bit b from d'. Equation (3.6) represents the recovery of the original

values (TVx and TVy).

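$$a = \left\lfloor \frac{CV_x + CV_y}{2} \right\rfloor, \qquad d' = CV_x - CV_y \tag{3.4}$$

$$b = d' - 2\left\lfloor \frac{d'}{2} \right\rfloor \tag{3.5}$$

$$TV_x = a + \left\lfloor \frac{d + 1}{2} \right\rfloor, \qquad TV_y = a - \left\lfloor \frac{d}{2} \right\rfloor, \qquad \text{where } d = \left\lfloor \frac{d'}{2} \right\rfloor \tag{3.6}$$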
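
The embedding and extraction steps above can be sketched in Python as follows. This is a minimal sketch rather than the thesis implementation: it assumes integer target values and floor division for the averaged and halved terms, and the function names dew_embed and dew_extract are hypothetical.

```python
def dew_embed(tv_x: int, tv_y: int, bit: int) -> tuple[int, int]:
    """Embed one watermark bit into a pair of integer target values."""
    a = (tv_x + tv_y) // 2            # integer average of the pair
    d = tv_x - tv_y                   # difference of the pair
    d_prime = 2 * d + bit             # expanded difference carrying the bit
    cv_x = a + (d_prime + 1) // 2     # first changed value
    cv_y = a - d_prime // 2           # second changed value
    return cv_x, cv_y


def dew_extract(cv_x: int, cv_y: int) -> tuple[int, int, int]:
    """Recover the watermark bit and the original pair from the changed values."""
    a = (cv_x + cv_y) // 2            # the average is unchanged by embedding
    d_prime = cv_x - cv_y             # expanded difference
    bit = d_prime - 2 * (d_prime // 2)  # least significant bit of d'
    d = d_prime // 2                  # original difference
    tv_x = a + (d + 1) // 2
    tv_y = a - d // 2
    return bit, tv_x, tv_y


cv_x, cv_y = dew_embed(105, 100, 1)
print(cv_x, cv_y)                     # changed values for the pair (105, 100)
print(dew_extract(cv_x, cv_y))        # recovered bit and original pair
```

For the pair (105, 100) and bit 1, the average is 102 and the expanded difference is 11, so the changed values are 108 and 97; extraction returns the bit 1 and the original pair.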

3.2 Genetic Algorithm based Difference Expansion Watermarking

(GADEW) Method

The DEW method embeds the watermark into the TVs only if the CVs are within the distortion tolerance limits of

their attributes. The distortion tolerance represents the upper and lower change that can be tolerated by an

attribute. Each attribute has a separate distortion tolerance limit, depending upon the properties of the

information it carries.


The procedure of Gupta and Pieprzyk [34] has a shortcoming, as it only checks two TVs in a

selected tuple. Their approach leaves the tuple unmarked if the CVs exceed the distortion tolerance, which

reduces the embedding capacity of the object. Exploring the capability of different

combinations of TVs in the selected tuple can therefore be useful. We thus explore the capabilities of

different attributes for improving watermarking capacity. Along with improving watermark capacity,

we have also considered reducing the distortion of the host object. For this purpose we have employed a GA

to embed the watermark using the DEW method. Instead of relying only on the capability of the initially selected

attributes, it explores combinations of attributes in the same targeted tuple and attempts to select the

optimal attribute pair.

Attribute-wise and tuple-wise distortions are combined to decrease the watermarking distortion.

Attribute-wise distortion is measured against the neighborhood cell values of a TV. The neighborhood represents the

contiguous cells located above and below the TV within the same attribute. Because the values of each attribute

are nearly independent of the other attributes, the TV is matched at a definite

position in the neighborhood. Conversely, the tuple-wise property targets only the selected tuple, and the

distortion within the same tuple is taken into consideration. This feature helps to minimize distortion and

provides resistance against watermark predictability. The proposed GADEW method maximizes

watermarking capacity while decreasing tuple-wise and attribute-wise distortion. The GA fitness

function uses the above mentioned features for optimization. Tuples and attributes are selected using a

message authentication code (MAC) [72].

3.2.1 Message Authentication Code (MAC)

It is a hashing technique that gives a MAC value by using the primary key (PK) and a secret key (SK). The PK

is used to uniquely identify a tuple and attribute in a relation, whereas the SK is provided by the owner of

the database. The hashing technique provides integrity along with authenticity of the PK. Equation (3.7)

represents how a MAC value is obtained; here || denotes the concatenation operator.

MAC value = H(Sk || H(Pk || Sk)) (3.7)

Every different value of PK gives a different value of H. This property randomizes the

watermark bit and varies the indexes used for the selection of tuples and attribute pairs. The owner of

the watermark uses the SK to provide data integrity as well as authenticity of the PK.
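
Equation (3.7) can be sketched in Python as shown below. The sketch rests on several assumptions that are not stated in the text: SHA-256 stands in for the hash function H, keys are handled as UTF-8 strings, and the rule in pick_attribute_pair that maps a MAC value to two attribute positions is a hypothetical illustration of how the MAC could drive selection.

```python
import hashlib


def mac_value(pk: str, sk: str) -> int:
    """MAC = H(SK || H(PK || SK)), with SHA-256 assumed for H."""
    inner = hashlib.sha256((pk + sk).encode("utf-8")).digest()
    outer = hashlib.sha256(sk.encode("utf-8") + inner).digest()
    return int.from_bytes(outer, "big")


def pick_attribute_pair(pk: str, sk: str, v: int) -> tuple[int, int]:
    """Hypothetical rule: derive two distinct attribute positions in [2, v]."""
    if v < 3:
        raise ValueError("at least two non-key attributes are required")
    m = mac_value(pk, sk)
    x = 2 + m % (v - 1)                  # one of the v - 1 non-key positions
    y = 2 + (m // (v - 1)) % (v - 2)     # one of the remaining v - 2 positions
    if y >= x:
        y += 1                           # skip over x so that the pair is distinct
    return x, y


secret = "owner-secret"                  # illustrative secret key
for primary_key in ("1001", "1002", "1003"):
    print(primary_key, pick_attribute_pair(primary_key, secret, v=6))
```

Because the MAC depends on both keys, the same PK always yields the same selection for the owner, while an attacker without the SK cannot reproduce it.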


3.2.2 Chromosome Structure of the Genetic Algorithm (GA)

GA is a global optimization method whose idea is taken from nature's evolution procedure [48].

A chromosome represents a solution to the given problem. The chromosome structure for the proposed

GADEW technique is provided in Figure 3.1. For each selected tuple only two TVs are used for

embedding. The tuples are selected using the MAC [72] procedure. Thus, the size of the chromosome is two

times the total number of tuples selected for watermarking.

The total number of chromosomes defines the size of the population for the GA. Successive populations are

known as generations of the GA. Genetic operators are applied to the existing population and, as a

result, a new population is created. Common operators of GA are crossover, elitism, and mutation. The fitness

function determines the fitness of every chromosome of the population. Normally the optimization

process of the GA is completed when the maximum number of generations is reached or a given fitness condition is

fulfilled. The final solution to the problem is then represented by the best chromosome obtained.

Figure 3.1 Chromosome Structure of GA

The crossover operator combines portions of two or more chromosomes to produce new

chromosomes. The elitism operator transfers the best chromosomes of the current generation to the next

generation without performing crossover or mutation on them. The mutation operator produces a

new chromosome by randomly varying the values of one or more cells. Mutation is responsible for

discovering new individuals in the search space.

Different fitness parameters representing separate objectives can be combined into a single

fitness function; such fitness functions are known as multi-objective fitness functions. A multi-objective

fitness function carries out concurrent optimization of several objectives; for example, cost and

performance are two separate objectives. These objectives can be contradictory and may not be

optimized fully at the same time; therefore a compromise solution is obtained.

Figure 3.1 layout: TVx1 TVy1 | TVx2 TVy2 | TVx3 TVy3 | ... | TVxn TVyn, where each pair holds the two target values (TVs) of one selected tuple (the first pair belongs to the 1st selected tuple, the second pair to the 2nd selected tuple, and so on).

TVx = 1st target value, x = [2, v]

TVy = 2nd target value, y = [2, v]

n = Total number of selected tuples

v = Total number of capable attributes, where position 1 is reserved for PK
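
The chromosome layout and the genetic operators described above can be sketched as follows. This is an illustrative sketch only: the function names, the single-point crossover at tuple boundaries, the per-tuple mutation rate, and the requirement that the two attribute positions of a pair differ are assumptions introduced here, not details confirmed by the text.

```python
import random


def random_chromosome(n: int, v: int, rng: random.Random) -> list[int]:
    """Build a chromosome of length 2n: one (x, y) attribute pair per selected tuple."""
    genes: list[int] = []
    for _ in range(n):
        x, y = rng.sample(range(2, v + 1), 2)   # positions in [2, v]; 1 is the PK
        genes.extend([x, y])
    return genes


def crossover(p1: list[int], p2: list[int], rng: random.Random):
    """Single-point crossover at a tuple boundary so that pairs stay intact."""
    point = 2 * rng.randrange(1, len(p1) // 2)
    return p1[:point] + p2[point:], p2[:point] + p1[point:]


def mutate(chrom: list[int], v: int, rate: float, rng: random.Random) -> list[int]:
    """Replace the attribute pair of some tuples with a fresh random pair."""
    child = chrom[:]
    for k in range(0, len(child), 2):
        if rng.random() < rate:
            child[k], child[k + 1] = rng.sample(range(2, v + 1), 2)
    return child


rng = random.Random(42)                  # fixed seed for a repeatable run
parent_a = random_chromosome(n=5, v=6, rng=rng)
parent_b = random_chromosome(n=5, v=6, rng=rng)
child_a, child_b = crossover(parent_a, parent_b, rng)
print(parent_a, parent_b)
print(child_a, mutate(child_b, v=6, rate=0.2, rng=rng))
```

Elitism is not shown; it would simply copy the lowest-cost chromosomes into the next generation unchanged.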


3.2.3 Calculating Fitness

The GA fitness function determines the suitability of chromosomes. In a multi-objective GA,

different objectives/parameters can be combined in a single fitness function, represented here as the total cost

(TC). A number of constraints can be fulfilled while using a multi-objective GA, which can optimize

multiple objectives concurrently [6, 73]. Three parameters, Capacity related Cost (CrC), Attribute-wise

Distortion (AwD), and Tuple-wise Distortion (TwD), are combined in our fitness function. The fitness for

every parameter is obtained and the values are summed together as TC. This process is carried out for every

chromosome of the population. Finally, the chromosome with the least value of TC is forwarded to the

next module for watermark embedding.

3.2.3.1 Cost related to Capacity

Equation (3.8) determines the cost related to capacity (CrC). Before a watermark bit is inserted in the TV of a selected attribute, it is mandatory to check that the resulting changed value (CV) does not exceed the distortion tolerance of that attribute; if the tolerance is violated, CrC is incremented. In equation (3.8), CrC denotes the total number of selected tuples that could not be watermarked, and n denotes the total number of selected tuples of the relation.

Equations (3.1), (3.2), and (3.3) convert the TVs into CVs by inserting the watermark bit. CrC is incremented only if the CVs do not satisfy the distortion tolerance of their attributes. For example, if only 65 out of 100 selected pairs of TVs stay within the distortion tolerance, the other 35 pairs violate the tolerance of their attributes, and the cost for the current chromosome is 35.

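$$CrC = \sum_{k=1}^{n} WMUnSuccessful(WatermarkBit_k) \qquad (3.8)$$

$$\text{where } WMUnSuccessful(WatermarkBit_k) = \begin{cases} 0, & \text{if } WatermarkBit_k \text{ is inserted} \\ 1, & \text{if } WatermarkBit_k \text{ is not inserted} \end{cases}$$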
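Equation (3.8) amounts to counting the selected tuples whose changed values fall outside the distortion tolerance. A minimal sketch follows; the function name, the data layout, and the sample values are hypothetical, and the bounds 100 and 850 are borrowed from the R-dataset experiments only for illustration.

```python
def capacity_related_cost(cv_pairs, tolerances):
    """Count selected tuples whose changed values violate a tolerance.

    cv_pairs:   list of (cv_x, cv_y), one pair per selected tuple
    tolerances: list of ((lo_x, hi_x), (lo_y, hi_y)), one entry per tuple
    """
    crc = 0
    for (cv_x, cv_y), ((lo_x, hi_x), (lo_y, hi_y)) in zip(cv_pairs, tolerances):
        inserted = lo_x <= cv_x <= hi_x and lo_y <= cv_y <= hi_y
        crc += 0 if inserted else 1  # WMUnSuccessful is 1 only on failure
    return crc

# Hypothetical data: three selected tuples, bounds 100..850 for every attribute.
pairs = [(120, 300), (90, 400), (500, 851)]
bounds = [((100, 850), (100, 850))] * 3
print(capacity_related_cost(pairs, bounds))  # 2
```

The second pair fails on its lower bound and the third on its upper bound, so two of the three tuples add to the cost.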

3.2.3.2 Distortion related to Tuple

The second factor of TC is the distortion of the CV that results from watermark insertion: the transformation of TVs into CVs by the DEW technique constitutes distortion. Different TVs can be selected for embedding a watermark bit in a single tuple. However, selecting TVs at random can cause distortion [34]; therefore, those TVs are targeted that cause less distortion.

Equation (3.9) measures the tuple-wise distortion (TwD). The absolute difference between each resulting CV and its TV is obtained, and these differences are summed as the distortion related to the tuple. TwD for a chromosome is summed over n-CrC tuples, that is, over the successfully watermarked tuples only.

$$TwD = \sum_{k=1}^{n-CrC} \left( \left| TV^{x}_{k} - CV^{x}_{k} \right| + \left| TV^{y}_{k} - CV^{y}_{k} \right| \right) \qquad (3.9)$$

3.2.3.3 Distortion related to Attribute

The neighborhood of the selected TV is used to obtain the distortion related to the attribute, referred to as attribute-wise distortion (AwD). An attribute in a relation may not depend on any other attribute of the relation, so its distortion is judged against values of the same attribute. For this purpose, the four values immediately adjacent to the TV are taken into account: the two values above and the two values below the TV form its neighborhood. The following equation is used to obtain AwD.

$$AwD = \sum_{k=1}^{n-CrC} \left[ \left( \left| \frac{1}{4} \sum_{\substack{j=k-2 \\ j \neq k}}^{k+2} X^{x}_{j} - CV^{x}_{k} \right| - \left| \frac{1}{4} \sum_{\substack{j=k-2 \\ j \neq k}}^{k+2} X^{x}_{j} - TV^{x}_{k} \right| \right) + \left( \left| \frac{1}{4} \sum_{\substack{j=k-2 \\ j \neq k}}^{k+2} X^{y}_{j} - CV^{y}_{k} \right| - \left| \frac{1}{4} \sum_{\substack{j=k-2 \\ j \neq k}}^{k+2} X^{y}_{j} - TV^{y}_{k} \right| \right) \right] \qquad (3.10)$$

In equation (3.10), n-CrC denotes the total number of marked tuples, and X denotes the two values below and the two values above the TV; the TV itself is excluded from the sum. Every attribute is independent of the other attributes; hence the values of each attribute follow their own pattern. A sudden modification in this sequence may raise the distortion and arouse the suspicion of an attacker. The aim is therefore to reduce distortion so that an attacker cannot identify the watermarked attribute.

The two cell values below and the two cell values above one selected TV are presented in Figure 3.2. The mean of these four cell values is obtained. The CV is obtained by embedding the watermark in the TV using the DEW method. The absolute difference between the neighborhood mean and the CV, and the absolute difference between the neighborhood mean and the TV, are computed; the difference of these two absolute values is the AwD for one CV. This procedure is carried out for both TVs of every selected tuple, and the resulting values are finally summed as AwD.
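The two distortion measures can be sketched for a single tuple as follows. The function names are hypothetical, and the sketch assumes that the TV has two neighbors on each side; how the first and last tuples of a relation are handled is not specified here. The sample column is the first attribute of the example dataset in Figure 3.3 (a), with the TV 64 changed to the CV 71.

```python
def twd_pair(tv_x, cv_x, tv_y, cv_y):
    """Tuple-wise distortion contributed by one watermarked tuple."""
    return abs(tv_x - cv_x) + abs(tv_y - cv_y)

def awd_value(column, k, cv):
    """Attribute-wise distortion of one changed value.

    column: values of one attribute in tuple order
    k:      index of the target value (TV) inside `column`
    cv:     changed value that would replace column[k]
    """
    neighbours = column[k - 2:k] + column[k + 1:k + 3]  # two above, two below
    av = sum(neighbours) / 4                            # neighborhood mean
    return abs(av - cv) - abs(av - column[k])

first_attribute = [69, 69, 64, 69, 69]
print(awd_value(first_attribute, 2, 71))  # -3.0
print(twd_pair(49, 46, 55, 58))           # 6
```

A negative AwD means that the changed value lies closer to the neighborhood mean than the original target value did.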



These fitness parameters are obtained for every chromosome of the GA population. Finally, the chromosome with the smallest TC is selected as the solution to the problem. Minimizing the attribute-wise and tuple-wise distortion of the embedded watermark provides protection against an attacker. A DEW-based watermarking method can introduce large distortion in the underlying data, and an attacker can use the distorted values to guess the watermarked attributes. Including these fitness parameters in our fitness function, TC, reduces the attacker's chances of doing so.

Figure 3.2 Calculating Attribute Wise Distortion

3.2.3.4 Overall Cost

The procedure for computing the overall (total) cost, TC, is given in equation (3.11), which combines equations (3.8), (3.9), and (3.10) for CrC, TwD, and AwD.

$$TC = (CrC \times W_c) + (TwD \times W_t) + (AwD \times W_a) \qquad (3.11)$$

W represents a weight vector comprising Wa, Wt, and Wc. Here, Wc is the weight allocated to CrC, whereas Wa and Wt are the weights allocated to AwD and TwD, respectively. The sum of these three weights equals one. The weights can be adjusted as required: to emphasize a certain parameter of the fitness function, the weight of that parameter is increased.

[Figure 3.2 content: the attribute values of the two neighbors above (X1, X2) and the two neighbors below (X3, X4) the target value (TV) are averaged to give the average value (AV). Difference expansion based watermarking (DEW) turns the TV into the changed value (CV). The absolute difference of AV and CV and the absolute difference of AV and TV are computed, and their difference is the attribute-wise distortion (AwD) for one CV only.]

CrC, TwD, and AwD are obtained and combined as TC for every chromosome. Better chromosomes have a smaller TC, whereas worse chromosomes have a higher TC. This procedure is performed for all chromosomes of the GA population to find the best chromosome: the chromosome with the smallest TC is ranked as the most suitable.

All fitness parameters can be combined in a single fitness function or used separately, according to requirements. GA can also be run in two separate stages, where the first stage increases capacity and the second stage decreases distortion, so that both higher capacity and lower distortion are achieved. Alternatively, the weights (W) of the three fitness parameters can be set to give more importance to some parameters and less to others, and the results change accordingly.

3.2.4 Example of Obtaining TC, CrC, AwD, and TwD

We explain the optimization for a single selected tuple using an example. Figure 3.3 illustrates the whole process by working through equations (3.8), (3.9), and (3.10) of the proposed GADEW method. Initially, the watermark is embedded in the third tuple, using its values in all five attributes (five TVs), with the DEW method. Figure 3.3 (a) shows the example dataset and its selected TVs. The DEW method requires two target values, TVx and TVy, to embed one watermark bit. In this example the position of TVx is kept constant and its value remains 49, whereas TVy is chosen among four candidates in order to illustrate how a near-optimal TVy is selected. The distortion tolerance of each attribute, given as a lower and an upper limit, is stated in the topmost row of the example dataset.

The changed values (CVs) are then obtained by applying the DEW method to the four candidate TVs one by one as TVy, while TVx remains constant. Figure 3.3 (b) shows the four CVx values obtained by transforming the constant TVx, and the four CVy values obtained by transforming the four different TVy values. For ease of understanding, only four TVy candidates from four attributes are considered. The first three CVy values satisfy the distortion tolerance of their attributes, whereas the fourth, CVy4, does not. Therefore, CrC is incremented for the fourth attribute only, as shown in Figure 3.3 (c).
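The transformation of a pair of TVs into CVs can be sketched as below. Equations (3.1) to (3.3) are not reproduced in this section, so the sketch assumes the standard integer difference expansion transform (integer average, difference, expanded difference carrying the bit); the function name is hypothetical. With a watermark bit of 0 (also an assumption), this transform yields the pairs listed in Figure 3.3 (b).

```python
def dew_embed(tv_x, tv_y, bit):
    """Embed one bit into a pair of target values by difference expansion."""
    avg = (tv_x + tv_y) // 2          # integer average of the pair
    diff = tv_x - tv_y                # difference of the pair
    expanded = 2 * diff + bit         # expanded difference carries the bit
    cv_x = avg + (expanded + 1) // 2
    cv_y = avg - expanded // 2
    return cv_x, cv_y

# TVx stays 49; the four TVy candidates come from Figure 3.3 (a).
for tv_y in (64, 55, 53, 60):
    print(dew_embed(49, tv_y, 0))
# (41, 71)
# (46, 58)
# (47, 55)
# (43, 65)
```

The last pair shows why the fourth candidate is rejected: the changed value 65 lies above the tolerance limit 64 of the fourth attribute.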



Equation (3.9) gives TwD, which favors the CV with the smallest change produced by the DEW method. Figure 3.3 (d) shows the TwD of each CV in the present example; the third candidate, CVy3, shows the smallest distortion, which is 2. Equation (3.10) gives AwD, which favors the CV that deviates least with respect to its neighborhood. Figure 3.3 (e) shows the AwD of each CV in the present example; the first candidate, CVy1, shows the smallest value, which is -3.

Figure 3.3 (a) Example dataset followed by its TVs

Attributes              Constant    First       Second      Third       Fourth
                        Attribute   Attribute   Attribute   Attribute   Attribute
Distortion Tolerance    42-53       59-72       54-62       50-61       58-64
1                       48          69          59          54          63
2                       46          69          57          53          60
3                       49          64          55          53          60
4                       42          69          56          54          62
5                       48          69          60          55          63

                        TVx1   TVx2   TVx3   TVx4   TVy1   TVy2   TVy3   TVy4
TVs of selected tuple   49     49     49     49     64     55     53     60

Figure 3.3 (b) Resultant CVs by applying DEW method on TV

                        CVx1   CVx2   CVx3   CVx4   CVy1   CVy2   CVy3   CVy4
DEW method              41     46     47     43     71     58     55     65

Figure 3.3 (c) Capacity related cost

CrC equation 3.8        0      0      0      1      0      0      0      1

Figure 3.3 (d) Tuple wise distortion

TwD equation 3.9        7      3      2      5      7      3      2      5

Figure 3.3 (e) Attribute wise distortion

AwD equation 3.10       2      -3     -2     0      -3     -2     0      1

Figure 3.3 (f) Total Cost

TC equation 3.11        2.75   0      0      1.25   1.5    0.25   0.5    2

Figure 3.3 An Example Of Calculating TC, AwD, TwD, And CrC

The total cost (TC) uses all of the above values to finalize a TVy that meets the requirements of all three fitness parameters. As a result, only the TVy for which TC has the smallest value is selected. Figure 3.3 (f) displays TC for each CV involved in the example; among the TVy candidates, the second one, CVy2, is chosen as the near-optimal option. In this specific example the weights of equation (3.11) are set to Wc = 0.5, Wt = 0.25, and Wa = 0.25.
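The selection step can be retraced with a short sketch that applies equation (3.11) to the four TVy candidates, using the component values as listed in Figure 3.3 (c) to (e) and the weights stated above. The names are hypothetical. Recomputing from the listed components gives 1.0 for CVy1, whereas panel (f) lists 1.5; the other three values agree with panel (f), and the selected candidate is the same either way.

```python
W_C, W_T, W_A = 0.5, 0.25, 0.25  # weights of CrC, TwD, and AwD

def total_cost(crc, twd, awd):
    """Weighted sum of the three fitness parameters (equation 3.11)."""
    return crc * W_C + twd * W_T + awd * W_A

# (CrC, TwD, AwD) of each TVy candidate, read from Figure 3.3 (c)-(e).
candidates = {
    "CVy1": (0, 7, -3),
    "CVy2": (0, 3, -2),
    "CVy3": (0, 2, 0),
    "CVy4": (1, 5, 1),
}
costs = {name: total_cost(*parts) for name, parts in candidates.items()}
best = min(costs, key=costs.get)
print(best)  # CVy2
```

CVy2 wins with a cost of 0.25 even though CVy3 has the smaller TwD and CVy1 the smaller AwD, which is the trade-off the weighted sum is meant to resolve.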

Consequently, the watermark can be embedded using the second candidate, CVy2, with the DEW method. CVy2 is not the best (smallest) in every fitness parameter; however, it suits all of the fitness parameters better than the other CVs. The DEW method alone, without GA, may simply select the fourth candidate, CVy4, and find that it does not satisfy the distortion tolerance of its attribute. As a result, no watermark is embedded in that tuple and the whole tuple is left unused. The GADEW method, in contrast, can search for TVs that support watermark embedding and also favors the selection of the least distorted CV.

The example above illustrates the optimization for only one tuple, showing four pairs of TVs. The chromosome used in the GADEW method, however, holds only one pair of TVs for every selected tuple of the relation, as shown in Figure 3.1. The CrC values of all TV pairs of the chromosome are summed and provided to equation (3.11) for obtaining TC; the same is done separately for TwD and AwD of the same chromosome. Lastly, TC is computed for all chromosomes to determine their fitness, and the chromosome with the smallest TC is finalized as the near-optimal choice.

3.2.5 Watermark Embedding

The stages involved in watermark embedding are shown in Figure 3.4. The embedding process is divided into three modules. The preprocessing module initializes the values for the DEW method and selects suitable fitness parameters for the subsequent GA module. Tuples are selected using a message authentication code (MAC), which is a keyed hashing method. It requires the primary key of the tuple and a secret key chosen by the owner [34]. Every attribute can tolerate only a defined amount of alteration, given as an upper and a lower limit for the attribute and called its distortion tolerance. The distortion tolerance therefore differs from attribute to attribute.
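MAC-based tuple selection can be sketched as follows. The text only states that the MAC takes the primary key and a secret key, so everything else here is an assumption made for illustration: the use of HMAC with SHA-256, the rule that a tuple is selected when its MAC is divisible by a parameter `gamma`, and the way the watermark bit is derived from a second, separately labelled MAC.

```python
import hashlib
import hmac

def mac_value(secret_key: bytes, primary_key: str) -> int:
    """Keyed hash of a primary key, returned as an integer."""
    digest = hmac.new(secret_key, primary_key.encode("utf-8"), hashlib.sha256).digest()
    return int.from_bytes(digest, "big")

def select_tuples(secret_key, primary_keys, gamma):
    """Assumed rule: keep roughly one tuple in `gamma`."""
    return [pk for pk in primary_keys if mac_value(secret_key, pk) % gamma == 0]

def watermark_bit(secret_key, primary_key):
    """Assumed rule: lowest bit of a second MAC over a labelled primary key."""
    return mac_value(secret_key, "bit|" + primary_key) & 1

key = b"owner-secret"                      # hypothetical secret key
keys = [str(i) for i in range(1000)]       # hypothetical primary keys
selected = select_tuples(key, keys, gamma=4)
print(len(selected), [watermark_bit(key, pk) for pk in selected[:5]])
```

Because the selection depends only on the secret key and the primary key, the detector can repeat it without access to the original relation.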

The tuples of a relation are sorted by primary key in order to handle the tuple deletion attack, whereas the attributes are sorted alphabetically by name. The same sorting that is carried out during the embedding phase is repeated during the extraction phase.

We use three parameters in the GA fitness function, and their weights (W) are used to tune the fitness according to the requirement. Once the initial parameters of the DEW method and the weights of the GA fitness function are determined, the GA module can start its optimization process.

Figure 3.4 Process Of Watermark Embedding

The GA module generates a near-optimal chromosome using the chosen GA fitness function. The chromosome size is determined by the number of selected tuples: it is twice that number, because the DEW method involves two TVs per tuple. The three main GA operations, selection, crossover, and mutation, are used to introduce variety into the chromosomes. The near-optimal chromosome, the best among the available chromosomes of the GA module, is forwarded to the insertion module for embedding the watermark. Watermark bits are obtained using the MAC technique and CVs are obtained using the DEW method. Each CV is checked against the distortion tolerance of its attribute; if the change is within the limit, the TVs are replaced by the CVs.

[Figure 3.4 content. Preprocessing Module: initial values (relation to be watermarked, tuples to be watermarked, private key, distortion tolerance, MAC); sorting of tuples and attributes; selection of GA fitness parameters (capacity related cost, attribute-wise distortion, tuple-wise distortion). GA Module: GA initialization (size of chromosome, size of population, termination criteria, number of GA iterations); fitness evaluation; new generation (crossover, mutation, elitism), repeated until the iterations expire; best individual. Insertion Module: obtaining watermark bits; changing values (CV) using the DEW technique; comparing against the distortion tolerance; inserting changed values (CV) in the dataset.]

3.2.6 Watermark Extraction

Figure 3.5 gives a pictorial representation of the watermark extraction process. The initial step sorts the attributes and the tuples: attribute names are used for sorting the attributes and primary keys for sorting the tuples. The MAC uses the secret key and the primary key to produce the watermark bit that is used in the extraction process. The DEW-based extraction process restores the original values along with extracting the watermark bits.
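The reversibility of DEW-based extraction can be sketched as below. As the embedding equations (3.1) to (3.3) are not reproduced in this section, the sketch assumes the standard integer difference expansion transform and inverts it; the function name is hypothetical. The sample pairs are changed values from Figure 3.3 (b), for which the sketch returns the original target values and the bit 0.

```python
def dew_extract(cv_x, cv_y):
    """Recover the original pair and the embedded bit from a changed pair."""
    avg = (cv_x + cv_y) // 2        # the integer average is preserved by embedding
    expanded = cv_x - cv_y          # expanded difference
    bit = expanded % 2              # its lowest bit is the watermark bit
    diff = expanded // 2            # floor division removes the bit again
    tv_x = avg + (diff + 1) // 2
    tv_y = avg - diff // 2
    return tv_x, tv_y, bit

print(dew_extract(46, 58))  # (49, 55, 0)
print(dew_extract(41, 71))  # (49, 64, 0)
```

Python's `//` and `%` round toward negative infinity, which keeps the inverse correct when the difference is negative, as it is in this example.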

If the restored values do not exceed the distortion tolerance limit, and the extracted watermark bit equals the watermark bit obtained by the MAC method, then the changed values (CVs) are replaced by the restored values. Watermark extraction requires the GA chromosome that was used in the embedding phase. A combination of Huffman coding [74] and RSA [75] is used for compression/decompression and encryption/decryption. RSA is a widely used public-key cryptography method, and Huffman coding is a well-known data compression method.

Figure 3.5 Process Of Watermark Detection

[Figure 3.5 content, steps shown: sorting tuples and attributes of a relation; obtaining watermark bits; obtaining attribute values and bits using the DEW detection technique; checking distortion limit; comparing watermark bits; restoring original relation using obtained values.]

3.3 Results and Analysis

The GA toolbox of Matlab 2008b is used for simulation. The distortion tolerance is kept fixed in the experiments. Increasing the distortion tolerance may improve the chances of watermark insertion in the attributes; as a result, watermark capacity, distortion, and false positive detection all increase.

Two datasets are used for testing the proposed technique. The Random-Dataset (R-dataset) contains randomly generated relations in which the number of tuples varies from 100 to 10000 and the number of attributes is fixed at 1000. Each cell of a relation is filled with a randomly generated value ranging from 1 to 1000. The second dataset is the Forest Cover Type dataset (FCT-dataset), provided by the University of California, Irvine on its website [76]. This dataset contains 581,012 tuples and 54 attributes.

With the DEW technique, a relation of the R-dataset containing 10000 tuples and 1000 columns takes 19 seconds for watermark insertion and 22 seconds for detection. These times are averages over many runs of the insertion and detection algorithms. Approximately 50% of the TVs satisfy the distortion tolerance of their attributes and are successfully watermarked. The lower bound of the distortion tolerance is fixed at 100 and the upper bound at 850 for every attribute.

The GADEW technique, in contrast, completes watermark insertion in 13 minutes and detects the watermark in 8 seconds. Watermark insertion takes longer because 10 chromosomes are initialized and 10 generations are used to select the near-optimal chromosome. Increasing the number of chromosomes and generations favors the selection of a better chromosome but increases the time required by the GA module. The time consumed by the GA module also depends on the chromosome size, which is twice the number of selected tuples because two TVs are selected from each selected tuple.

3.3.1 Capacity Analysis

The probability of successful watermark insertion increases when a relation has a large number of attributes, because the GADEW technique then has more attributes to search when selecting appropriate attributes for watermarking. As a result, capacity improves. The improvement in capacity is smaller when many tuples are selected for watermarking relative to the number of attributes, and larger when fewer tuples are selected.

Using the R-dataset, the watermark is inserted in 44, 105, 475, 1026, and 4953 tuples for dataset sizes of 100, 300, 1000, 2000, and 10000 tuples, respectively. The success rate rises to 70 percent when a small number of tuples is selected for watermarking, but falls to 37 percent when a large number of tuples is selected.



A comparison of capacity between the DEW and GADEW techniques on the R-dataset is provided in Figure 3.6, which shows the average results of all experiments performed for both techniques. An overall improvement of 12 percentage points is observed: with the DEW-based approach, on average only 36 percent of the tuples are successfully watermarked, while with the GADEW approach 48 percent of the tuples are successfully watermarked.

Figure 3.6 Capacity Comparison Of GADEW And DEW Method Using R-Dataset

Using the FCT-dataset, the watermark is inserted in 21777, 10945, 9383, 6081, 4136, and 3110 tuples. The average success rate of watermark insertion is 30.2 % with DEW-based watermarking and improves to 33.57 % with the GADEW method. In this experiment, GA achieves high capacity for a small number of tuples (small chromosome size), but when the chromosome size is large, the capacity difference between DEW and the proposed GADEW technique decreases. The reduced gain is therefore caused by selecting a large number of tuples for watermarking (large chromosome size). Multiple runs of GA might help to reduce this problem: a limit is imposed on the maximum chromosome size, GA is run separately for each set of tuples, and the outcomes of all GA runs are combined. Multiple runs may, however, increase the time of the watermark embedding process, and the improvement in capacity might remain the same.

[Figure 3.6 axes: x-axis, number of tuples watermarked (44, 105, 475, 1026, 4953); y-axis, % of successfully watermarked tuples (0.00 to 80.00); series: DEW and GADEW.]

Figure 3.7 compares the capacity of the GADEW and DEW methods on the FCT-dataset. A larger improvement in capacity is observed on the R-dataset than on the FCT-dataset, because only ten attributes of the FCT-dataset are suitable for the DEW technique, whereas all attributes of the R-dataset are suitable.

Figure 3.7 Capacity Comparison Of GADEW And DEW Method Using FCT-Dataset

3.3.2 Security Analysis

According to Gupta, et al. [35], bit flipping, deletion, and sorting attacks do not cause much harm to the watermark. However, an attacker might learn about the distortion in the database and predict the number of capable attributes (v) used for watermarking [34]. The attacker can then alter all watermarked attributes, which can destroy a major portion of the watermark. Capable attributes are normally few in number, so it is easy for an attacker to predict the value of v. Moreover, the distortion or abrupt changes introduced by watermark insertion can reveal the watermarked attributes to the attacker, which again may result in losing most of the watermark. Two properties help in this regard, namely the randomness of GA and the reduction in distortion. Each is briefly discussed below.

[Figure 3.7 axes: x-axis ticks 3110, 4136, 6081, 9383, 10945, 21777 (tuples watermarked); y-axis, % of successfully watermarked tuples (26 to 36); series: DEW and GADEW.]

3.3.2.1 Randomness of GA

In the proposed technique, the value of v does not determine the selection of attributes. Instead, GA performs the selection among the attributes of a selected tuple. GA introduces randomness in the selection of attributes, so it is very difficult for an attacker to predict the marked attributes. This improves the overall security of the watermark.

3.3.2.2 Reduction in Distortion

GA-based random selection of attributes does not provide complete protection against an attacker. The distortion introduced by watermark insertion may attract the attacker's attention, so minimizing distortion also adds to the security of the database. At the tuple level, we try to select the two attributes that cause minimum distortion in a relation.

At the attribute level, our approach tries to hide the watermarked value among its neighborhood values. This invisibility adds to the overall security of the watermark, because the attacker finds it hard to predict which attributes are marked.

Furthermore, we have performed experiments related to the security of relational databases using two datasets. First, we performed experiments on the R-dataset. Figure 3.8 compares the standard deviation (Std) of the original dataset (OrD) [77] with that of the DEW and GADEW watermarked datasets. It clearly shows that the distortion produced by the DEW method is high compared to the GADEW method.

It can be observed from Figure 3.8 that the Std of the GADEW watermarking technique is closer to the Std of OrD, whereas DEW-based watermarking has a high Std compared to OrD. For the R-dataset we can compare the Std of the whole relation, because this data has no attribute-wise restriction on values: all of it is randomly generated in the range zero to one thousand.



Figure 3.8 Std Comparison Of Ord, DEW, And GADEW Method Using R-Database

Secondly, we used the FCT-dataset. Only the first 10 attributes and 3110 tuples are watermarked using both the DEW and GADEW algorithms, and the mean and Std of the watermarked datasets are then measured. Most attributes of the GADEW watermarked dataset show less distortion than with the DEW method. Table 3.1 shows the mean and Std of four datasets, which are obtained by changing the fitness function during the watermark insertion process.

The names of the first 10 attributes are listed in the first column of Table 3.1, followed by the mean and Std value of each attribute after watermark insertion using four variants of the method. The second column shows values for the DEW-based algorithm. The third column shows values for the GADEW method using attribute-wise distortion only as fitness. The fourth column shows values for the GADEW method using tuple-wise distortion only. The last column shows values for the GADEW method with tuple-wise and attribute-wise distortion combined in a single fitness function.

Difference in Mean = |M_DEW - M_OrD| - |M_GADEW - M_OrD| (3.12)

Difference in Std = |S_DEW - S_OrD| - |S_GADEW - S_OrD| (3.13)
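Equations (3.12) and (3.13) share one form, which can be sketched as a single helper. The function name and the sample numbers are hypothetical; they are not values from the experiments.

```python
def improvement(dew_value, gadew_value, ord_value):
    """Distance of DEW from the original minus distance of GADEW from it.
    Works for means (equation 3.12) and for Std values (equation 3.13)."""
    return abs(dew_value - ord_value) - abs(gadew_value - ord_value)

# Hypothetical statistic of one attribute: original 9, DEW 12, GADEW 10.
print(improvement(12, 10, 9))   # 2
# Swapping the roles gives a negative value.
print(improvement(10, 12, 9))   # -2
```

A positive result means that the GADEW statistic stays closer to the original dataset than the DEW statistic does.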

[Figure 3.8 axes: x-axis ticks 44, 105, 475, 1026, 4953; y-axis, standard deviation of full table (0 to 250); series: DEW, GADEW, and OrD.]

Table 3.1: Results Of Reducing Tuple And Attribute-Wise Distortion

ATTRIBUTE     DEW                  Attribute-wise       Tuple-wise           Tuple & Attribute wise
NAME                               GADEW                GADEW                Combined GADEW
              Mean      Std        Mean      Std        Mean      Std        Mean      Std
ELEVATION     2959.5    280.01     2959.6    279.98     2959.4    279.99     2959.6    279.98
ASPECT        155.63    113.82     155.6     113.7      155.7     113.6      155.6     113.7
SLOPE         14.22     25.83      14.26     25.81      14.26     24.78      14.26     25.81
H_DIST_HY     269.52    213.45     269.5     213.4      269.5     213.2      269.5     213.4
V_DIST_HY     46.57     61.41      46.57     61.19      46.53     60.76      46.57     61.19
H_DIST_RD     2349.7    1559.9     2349.7    1559.9     2349.7    1559.9     2349.7    1559.9
HS_9AM        212.20    33.66      212.2     30.97      212.2     30.82      212.2     30.97
HS_NOON       223.37    29.18      223.4     25.96      223.3     25.81      223.4     25.96
HS_3PM        142.62    43.16      142.6     41.04      142.6     42.39      142.6     41.04
HL_FR_POT     1979.9    1324.7     1980      1324.6     1979.9    1324.6     1980.0    1324.6

M_DEW and S_DEW represent the mean and Std of the DEW-based watermarked dataset. M_OrD and S_OrD represent the mean and Std of the original dataset. M_GADEW and S_GADEW represent the mean and Std of the GADEW-based watermarked dataset. The differences in mean and Std are calculated using equations (3.12) and (3.13) and represent the improvement: the absolute difference of M_GADEW and M_OrD is subtracted from the absolute difference of M_DEW and M_OrD. A positive Difference in Mean indicates that GADEW is closer to the original than DEW, that is, an improvement; a negative difference means that GADEW performs worse. The values of Table 3.1 are used, along with the mean and Std values of OrD, to measure the improvement in terms of distortion.

Table 3.2: Results Of Modification In Std And Mean

ATTRIBUTE     Attribute-wise GADEW        Tuple-wise GADEW            Tuple and Attribute wise
NAME                                                                  Combined GADEW
              Difference   Difference     Difference   Difference     Difference   Difference
              in Mean      in Std         in Mean      in Std         in Mean      in Std
ELEVATION     0.0160       0.0163         0.0897       0.0193         0.0147       0.0224
ASPECT        0.0170       0.2134         0.0053       0.4329         0.0009       0.1379
SLOPE         -0.0345      1.0598         -0.1180      -5.3088        -0.0375      0.0272
H_DIST_HY     0.0365       0.2461         0.0097       -0.0336        0.0006       0.0689
V_DIST_HY     0.0521       0.6555         -0.0302      -0.9078        0.0035       0.2254
H_DIST_RD     -0.0503      -0.0062        -0.0634      -0.0208        -0.0555      -0.0056
HS_9AM        0.0484       2.8369         0.0258       2.4839         0.0406       2.6868
HS_NOON       0.0349       3.3747         0.0360       4.7476         0.0110       3.2223
HS_3PM        0.0451       0.7797         0.0352       0.7322         0.0349       2.1220
HL_FR_POT     0.2316       0.0940         0.0707       0.1151         0.1223       0.0808

In Table 3.2, the first column lists the attribute names, followed by the differences for each attribute. The second column shows the differences for the GADEW method using attribute-wise distortion only. The third column shows the differences for the GADEW method using tuple-wise distortion only. The last column shows the differences for the GADEW method with tuple-wise and attribute-wise distortion combined. Negative values are fewer than positive values, which means that distortion is reduced overall.

3.3.3 Different Attacks

Experimental results of tuple addition, deletion, and bit flipping attacks on the GADEW method are presented, and GADEW is compared with the DEW method under both tuple-wise and attribute-wise multifaceted attacks. A solution to the problem of secondary watermarking attack is also provided.

The figures presented in this section show the detection ratio on the vertical axis, which represents the ratio of successful watermark detection. The detection ratio is one when the watermark is correctly detected from all watermarked tuples. If the attacker manages to alter some watermark bits, the ratio drops below one. False positive bits on the detection side raise the detection ratio above its normal range, and in some situations it may become greater than one, which shows that some non-watermarked bits are falsely detected as watermark bits by the detection algorithm. The horizontal axis represents the attack percentage (%) relative to the size of the dataset, that is, the proportion of the tuples in the dataset that is altered. If a dataset contains 1,000 tuples, a 10 % attack means that 100 tuples are attacked. A 0 % attack represents detection without any attack on the dataset, followed by 10 %, 20 %, 30 %, 40 %, and 50 % attacks on the watermarked data.
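The two axes of these figures can be illustrated with a short sketch. It assumes that the detection ratio is the number of bits reported by the detector divided by the number of bits that were embedded, which is one reading consistent with ratios above one being caused by false positives; the function names and the bit counts are hypothetical.

```python
def detection_ratio(detected_bits, embedded_bits):
    """Bits reported by the detector relative to the bits that were embedded."""
    return detected_bits / embedded_bits

def attacked_tuple_count(total_tuples, attack_percent):
    """Number of tuples altered by an attack of the given percentage."""
    return total_tuples * attack_percent // 100

embedded = 100
print(detection_ratio(100, embedded))   # 1.0, no attack
print(detection_ratio(80, embedded))    # 0.8, 20 watermark bits lost
print(detection_ratio(120, embedded))   # 1.2, 20 false positive bits
print(attacked_tuple_count(1000, 10))   # 100
```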

3.3.3.1 Addition Attack

An adversary performs an addition attack by inserting new tuples in order to affect the watermark detection process. Adding new, non-watermarked tuples can increase the false positive rate. An attacker can also use it to increase the percentage of non-watermarked tuples in order to argue that the detection is weak. In Figure 3.9 the detection ratio is 1.0 when no attack is performed, and it remains 1.0 even after 50 % new tuples are added to the watermarked dataset, which shows that the GADEW method has solved the problem of false positive detection. The addition attack causes no problem because our technique is concerned with watermarked tuples only; non-watermarked or newly added tuples do not affect the process of watermark detection.

Figure 3.9 GADEW method Bit Flipping, Deletion, And Addition Attack Comparison

[Figure 3.9 axes: x-axis, attack percentage according to database size (0 to 50); y-axis, detection ratio (0 to 1.2); series: GADEW addition attack, GADEW bit flipping attack, GADEW deletion attack.]

3.3.3.2 Deletion Attack

A deletion attack removes tuples in order to destroy the watermark. The attacker randomly removes some tuples from the watermarked dataset, hoping that some watermarked tuples are also deleted so that watermark bits are lost. Compared to other tuple-wise attacks, the deletion attack can be more harmful, because once a watermarked tuple is deleted there is no chance of correctly detecting its watermark bit. Without any attack the detection ratio is 1.0, but it decreases as the percentage of deleted tuples increases.

3.3.3.3 Bit flipping Attack

A bit flipping attack chooses tuples at random and flips the least significant bits (LSBs) of the attributes in those tuples. This attack is successful only when a sufficient number of watermarked bits are altered [35]. In order to increase the difference between the results of the deletion and bit flipping attacks, we flipped the LSBs of only 50 % of the attributes in each attacked tuple, rather than of all attributes. Figure 3.9 shows the detection ratio decreasing as the attack percentage increases. The bit flipping attack is less harmful than the deletion attack when only 50 % of the attributes in each attacked tuple are altered.
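The attack described above can be simulated with a sketch such as the following. The data layout (a list of integer rows with the primary key in column 0), the decision to leave the primary key untouched, the fixed random seed, and the function name are assumptions made for illustration.

```python
import random

def bit_flip_attack(rows, attack_percent, attribute_fraction=0.5, seed=0):
    """Flip the LSB of a fraction of the attributes in randomly chosen tuples.

    rows: list of lists of ints; column 0 is the primary key (never altered).
    Returns a new table and leaves `rows` unchanged.
    """
    rng = random.Random(seed)
    attacked = [list(row) for row in rows]
    n_rows = len(rows) * attack_percent // 100
    for i in rng.sample(range(len(rows)), n_rows):
        columns = list(range(1, len(rows[i])))
        for c in rng.sample(columns, int(len(columns) * attribute_fraction)):
            attacked[i][c] ^= 1  # flip the least significant bit
    return attacked

table = [[i, 10, 20, 30, 40] for i in range(20)]      # hypothetical relation
result = bit_flip_attack(table, attack_percent=50)
print(sum(a != b for a, b in zip(result, table)))      # 10
```

With four non-key attributes and an attribute fraction of 0.5, each of the ten attacked tuples has exactly two values changed by one.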

3.3.3.4 Sorting Attack

In a sorting attack, the attacker re-sorts the tuples on some attribute. This does not affect the detection algorithm, because we re-sort the tuples according to the primary key on the detection side. A problem with the DEW approach is that if the position of an attribute is changed in the dataset, the algorithm fails to detect the attributes [71]. Therefore, we sort the attributes by name before watermark insertion and re-sort them in the same way on the detection side.
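The re-sorting step can be sketched as a function that brings any permutation of a relation back to one canonical layout. The representation of tuples as dictionaries, the attribute names, and the function name are hypothetical; placing the primary key first follows the chromosome convention that position 1 is reserved for the PK.

```python
def canonical_order(rows, pk):
    """Sort tuples by primary key and attributes by name (primary key first).

    rows: list of dicts mapping attribute name to value
    Returns the ordered attribute names and the table as a list of lists.
    """
    names = [pk] + sorted(name for name in rows[0] if name != pk)
    table = [[row[name] for name in names] for row in sorted(rows, key=lambda r: r[pk])]
    return names, table

original = [{"id": 1, "slope": 14, "aspect": 155}, {"id": 2, "slope": 9, "aspect": 40}]
# The same relation after tuples and attributes have been re-ordered.
attacked = [{"aspect": 40, "id": 2, "slope": 9}, {"aspect": 155, "id": 1, "slope": 14}]
print(canonical_order(original, "id") == canonical_order(attacked, "id"))  # True
```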

3.3.3.5 Tuple-wise-Multifaceted Attack

Tuple-wise-multifaceted attack consists of three attacks carried out sequentially on the same watermarked dataset. First an addition attack is performed, followed by a deletion attack, and finally a bit flipping attack is applied. A comparison between the detection results of the simple DEW and GADEW methods after the tuple-wise-multifaceted attack is provided in Figure 3.10. The detection ratio on the vertical axis helps us to analyse false positives during the detection process.

It is evident from Figure 3.10 that the false positive rate is high in DEW, whereas the GADEW method handles this problem. The detection rate is 1.2 when no attack is performed on the DEW method, which indicates a false positive rate of 0.2. For the GADEW method, however, the detection rate is 1.0, which shows that GADEW also solves the false positive problem. The horizontal axis represents the attack percentage relative to the size of the dataset, that is, the ratio of tuples in the dataset that are altered. A 10 % tuple-wise-multifaceted attack means a 10 % addition attack followed by a 10 % deletion attack and finally a 10 % bit flipping attack. The same process is repeated for 20, 30, 40 and 50 % tuple-wise-multifaceted attacks.

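Figure 3.10 Tuple-Wise-Multifaceted Attacks Comparison Between DEW And GADEW Method (vertical axis: detection ratio, 0 to 1.4; horizontal axis: attack percentage, 0 to 50; series: DEW, GADEW)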

3.3.3.6 Attribute-wise-Multifaceted Attack

Attribute-wise-multifaceted attack comparison for both GADEW and simple DEW method is shown in Figure 3.11. Attribute-wise-multifaceted attack consists of an updation attack followed by a bit flipping attack on the selected attribute. The attribute updation attack deletes an attribute and its contents and then adds another attribute and its contents at the same location. The purpose of the updation attack is to avoid disturbing the position of attributes in the watermarked dataset, because the detection process can easily be disturbed when the position record of attributes in the watermarked dataset is not maintained [78].

The detection ratio comparison between the DEW and GADEW methods after applying attribute-wise-multifaceted attacks is provided in Figure 3.11. These results help us to analyse false positives during the detection process. The false positive rate is high in the DEW method, whereas GADEW handles this problem.

The effect on detection is greater after attribute-wise-multifaceted attacks than after tuple-wise-multifaceted attacks, because datasets usually have far fewer attributes than tuples. So, the probability of destroying the watermark is higher if attribute-wise attacks are considered. On the other hand, the integrity of the database is also at stake even if a single attribute is altered; in some cases the database might become meaningless.

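Figure 3.11 Attribute-Wise-Multifaceted Attacks Comparison between DEW and GADEW Method (vertical axis: detection ratio, 0 to 1.4; horizontal axis: attack percentage, 0 to 50; series: DEW, GADEW)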

3.3.3.7 Additive Attack

The scenario of an additive (secondary watermarking) attack can be tackled using reversible watermarking [35]. A watermark is inserted in a relation R, giving the watermarked relation Rm (R → Rm). An attacker changes the watermarked relation Rm to Rm' and re-watermarks Rm', resulting in a new relation Rm'm (R → Rm → Rm' → Rm'm). Rm'm contains the secondary watermark with probability 1, since it has not been modified. Rm' still contains the initial watermark with a high probability p, and the attacker removes the initial watermark with probability 1-p.

The judge asks to run detection algorithm for both initial watermark and secondary watermark on

Rm and Rm'm, respectively. Both watermarks are successfully detected in their respective relations

and original relations were restored as R for initial watermark and Rm' for secondary watermark.

Initial watermark is detected in Rm'm with high probability but secondary watermark is detected with

low probability in R. Thus it is clear that the secondary watermark was inserted in the relation already

having initial watermark. Consequently, it is proved that owner of the relation is the one, who inserted

initial watermark into it.

3.4 Chapter Summary

We have used GA to improve the capacity of the DEW method in databases, while keeping the distortion tolerance fixed. GA introduces some randomness into the DEW technique, thus making it difficult for the attacker to predict attributes. Security of the watermarking system is also enhanced by reducing the distortion and minimizing abrupt changes caused by the DEW method. This is achieved by two measures added to the fitness function of GA: first, using the knowledge of the neighborhood values of the relational database, and second, selecting the attributes that result in minimum distortion. The results also show an improvement in watermark capacity.

Consequently, more watermark bits can be embedded in the database, while the distortion introduced in it is minimized. This provides more comfort for the user and leaves fewer options for the attacker to destroy the watermark. The detection technique of the GADEW method resolves the problem of reshuffling attacks on attributes. It is also robust against addition, deletion, sorting, bit flipping, tuple-wise-multifaceted, attribute-wise-multifaceted, and additive attacks. It has also solved the problem of false positives at the detection side. In future, we intend to develop a reversible watermarking technique that can handle both integer and floating point values present in a single relation.


Chapter 4 REVERSIBLE AND BLIND WATERMARKING FOR DATABASES

Currently, evidence of ownership and patent security is one of the ever increasing concerns of most establishments [6, 24, 44, 79]. However, establishing ownership of shared data in a court of law might need some proof. Consequently, before sharing data, a non-disclosure contract may be signed between the proprietor and the receiver, which forbids the receiver to claim the proprietorship of the object or to redistribute it. If the receiver violates the contract, then the proprietor is able to prosecute him in a court of law, but only if the proprietor can demonstrate his proprietorship of the shared data [71].

Watermarking proves to be useful for managing numerous security concerns faced in the movement of diverse multimedia objects such as DNA, images, databases, text etc. [30]. We have proposed a novel reversible and blind watermarking technique for relational databases called RBW-RD. The reversibility of the proposed RBW-RD technique is based on the concept of Contrast Mapping transformation. In the context of relational databases, both watermarking techniques (RCM technique and the proposed RBW-RD technique) are new. The proposed technique is able to achieve high embedding capacity mainly for two reasons. Firstly, all three steps of the Contrast Mapping technique are utilized for watermarking. Secondly, there is no overhead of adding side information to the watermark data. Similarly, watermarking distortion is low because only the first of the three steps causes high distortion, and a distortion tolerance parameter is exploited to control the distortion without affecting the embedding capacity. Additionally, the false positive rate of the proposed RBW-RD is minimal because an automatic bit checking technique is adopted. The robustness of the proposed RBW-RD is tested against different attacks, and comparison with existing watermarking techniques for relational databases shows its effectiveness.



4.1 Proposed Reversible and Blind Watermarking Technique for

Relational Database

The proposed RBW-RD method maintains the properties of reversibility and blindness, besides achieving high watermarking capacity at low watermarking distortion and FP rate. In order to embed a single watermark bit using the RCM method, two values are required [51, 52, 54, 55]. In the proposed RBW-RD technique, the selection of tuples, attribute pairs, and watermark bits is carried out using a message authentication code (MAC) hashing technique (H) [47], which is explained in section 3.2.1 using equation (3.7). Therefore, the proposed RBW-RD technique is able to resist different types of attacks, such as tuple deletion, addition, sorting, and bit flipping attacks.

4.1.1 Automatic Bit Checking

Certain watermarking attacks can result in a huge number of FPs during watermark detection. An FP falsely detects a faulty watermark and restores a false pair, which affects the extraction of the watermark and also disturbs the exact recovery of the original relation (R). However, FPs can be reduced by introducing an automatic bit checking technique at the detection side.

In the proposed RBW-RD technique, for each selected tuple, an automatic watermark bit is generated using H [34]. The obtained watermark bit is embedded in a selected attribute pair using the proposed RBW-RD technique. On the extraction side, for each selected attribute pair, the automatically generated watermark bit (b) is obtained and compared with the extracted watermark bit (b'). If they do not match, the original attribute pair is not restored. Thus, the number of FPs is reduced, and as a result the distortion of the restored relation (RR) caused by different attacks is also minimized.
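The bit check can be sketched as follows. This is an illustrative sketch only: the thesis specifies a keyed MAC H over the primary key and secret key but not a concrete hash, so HMAC-SHA256 from the Python standard library is an assumption here, as are the function names.

```python
import hashlib
import hmac

def mac_value(primary_key, secret_key):
    """Keyed hash H(PK || SK) as an integer (HMAC-SHA256 is an assumption)."""
    digest = hmac.new(secret_key, str(primary_key).encode(), hashlib.sha256).digest()
    return int.from_bytes(digest, "big")

def expected_bit(primary_key, secret_key):
    """Automatically generated watermark bit b = LSB(H(PK || SK))."""
    return mac_value(primary_key, secret_key) & 1

def accept_restoration(primary_key, secret_key, extracted_bit):
    """Restore the attribute pair only when the extracted bit b' equals b."""
    return extracted_bit == expected_bit(primary_key, secret_key)

b = expected_bit(42, b"secret")
```

Because b is recomputed from the primary key and the secret key, no watermark needs to be stored at the detection side, and a tuple whose extracted bit disagrees with b is left unrestored.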



4.1.2 RCM Transform

In order to embed a watermark bit, two attributes x and y are selected from the same selected tuple using H. The integer and fraction portions of both attributes are then separated and denoted Int_x, Frac_x and Int_y, Frac_y. To prevent underflow and overflow, lower and upper limits [0, L] are defined as the RCM domain, as shown in Figure 4.1. The upper limit L ensures that values do not exceed the word-length of each attribute; it is determined by a left shift (L = 2^t - 1), where t (word-length) is the number of bits used to represent an integer value of the attribute. The forward RCM transform converts the original integer pair (Int_x, Int_y) into the watermarked integer pair (Int_x′, Int_y′) using equation (4.1).

Int_x' = 2Int_x - Int_y, Int_y' = 2Int_y - Int_x (4.1)

To prevent overflow/underflow, the conversion is restricted to the RCM domain [0, L] x [0, L], which is given by the following equation.

0 ≤ 2Int_x - Int_y ≤ L, 0 ≤ 2Int_y - Int_x ≤ L (4.2)

Figure 4.1 RCM Domain For 8-Bit Attribute (axes: X and Y, each from 0 to 250)

The inverse RCM transform restores the watermarked integer pair (Int_x′, Int_y′) back to the original integer pair (Int_x, Int_y) using equation (4.3). Ceiling(x) = ⌈x⌉ represents the ceiling function, which gives the smallest integer not less than x.

Int_x = ⌈(2/3)Int_x' + (1/3)Int_y'⌉, Int_y = ⌈(1/3)Int_x' + (2/3)Int_y'⌉ (4.3)

According to equation (4.2), the integer pair (Int_x, Int_y) belongs to the RCM domain if the integer values of its watermarked pair (Int_x′, Int_y′) fulfill the two constraints 0 ≤ Int_x' ≤ L and 0 ≤ Int_y' ≤ L; otherwise it does not belong to the RCM domain.
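As a worked illustration of equations (4.1) to (4.3), the following Python sketch implements the forward transform, the domain test, and the ceiling-based inverse on plain integers. The function names are illustrative assumptions and not taken from the thesis.

```python
def rcm_forward(x, y):
    """Forward RCM transform, equation (4.1)."""
    return 2 * x - y, 2 * y - x

def in_rcm_domain(x, y, limit):
    """Domain test, equation (4.2): both transformed values stay in [0, L]."""
    xp, yp = rcm_forward(x, y)
    return 0 <= xp <= limit and 0 <= yp <= limit

def ceil_div3(n):
    """Ceiling of n / 3 using integer arithmetic only."""
    return -(-n // 3)

def rcm_inverse(xp, yp):
    """Inverse RCM transform with ceiling, equation (4.3)."""
    return ceil_div3(2 * xp + yp), ceil_div3(xp + 2 * yp)

t = 8                     # word-length in bits
L = 2 ** t - 1            # upper limit of the RCM domain, 255 for 8 bits
x, y = 100, 90
xp, yp = rcm_forward(x, y)            # 2*100 - 90 = 110, 2*90 - 100 = 80
restored = rcm_inverse(xp, yp)        # (2*110 + 80)/3 = 100, (110 + 2*80)/3 = 90
```

The pair (200, 10), by contrast, lies outside the domain for L = 255, since 2*200 - 10 = 390 exceeds L.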

4.1.3 Distortion Tolerance (DT) Check

Some attributes of a relation do not tolerate much distortion. Therefore, distortion tolerance (DT) keeps a check on the values of the attributes so that the change does not exceed a certain limit [22, 34]. DT check is incorporated in the proposed RBW-RD approach to avoid too much distortion and thus keep the usability of the watermarked relation (RW) intact.

The forward RCM transforms the pair (Int_x, Int_y) using equation (4.1). This causes a distortion for Int_x of Int_x' - Int_x = 2Int_x - Int_y - Int_x = Int_x - Int_y, and a distortion for Int_y of Int_y' - Int_y = 2Int_y - Int_x - Int_y = Int_y - Int_x. Let δjk be the DT for attribute j. DT provides upper (δInt_xj1) and lower (δInt_xj2) limits for an attribute, depending upon the usability of the values present in that attribute. The DT limits may be different for each attribute. Therefore, the pair (Int_x, Int_y) is transformed only if the distortions are within the DT limits, i.e. δInt_xj2 < Int_x - Int_y < δInt_xj1 and δInt_yj2 < Int_y - Int_x < δInt_yj1.



4.1.4 Watermark Embedding

Watermark embedding process is explained in Figure 4.2. Selection of tuple, attribute pair, and watermark bit is performed using H [34, 35]. The fraction and integer portions are separated, and the watermark bit is embedded using one of the following three steps of RBW-RD technique.

1. If both integers belong to RCM domain, satisfy DT check, and are not both odd, then the pair is transformed using equation (4.1), LSB of Int_x' is set to '1', and the watermark bit is embedded at LSB of Int_y′.

2. If both integers belong to RCM domain, satisfy DT check, and are both odd, then LSB of Int_x is set to '0' and the watermark bit is embedded in LSB of Int_y.

3. If the integer values don't belong to RCM domain or don't satisfy DT check, then LSBs of Int_x and Int_y are saved in the LSBs of the fraction portions (Frac_x and Frac_y), LSB of Int_x is replaced with '0', and LSB of Int_y is replaced with the watermark bit.

Finally, the original attribute pairs are replaced with the watermarked attribute pairs after combining the integer and fraction portions of both attributes.
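The three embedding steps can be sketched for a single attribute pair as follows. This is a simplified sketch under stated assumptions, not the thesis implementation: the fraction portions are represented as non-negative integers holding the fractional bits, a single DT range `dt = (lower, upper)` is applied to both attributes, and all names are hypothetical.

```python
def set_lsb(value, bit):
    """Return value with its least significant bit replaced by bit."""
    return (value & ~1) | bit

def embed_pair(int_x, int_y, frac_x, frac_y, b, limit, dt):
    """Embed watermark bit b; returns (step, int_x', int_y', frac_x', frac_y')."""
    xp, yp = 2 * int_x - int_y, 2 * int_y - int_x          # equation (4.1)
    in_domain = 0 <= xp <= limit and 0 <= yp <= limit      # equation (4.2)
    lower, upper = dt
    within_dt = (lower < int_x - int_y < upper) and (lower < int_y - int_x < upper)
    both_odd = (int_x & 1) == 1 and (int_y & 1) == 1

    if in_domain and within_dt and not both_odd:
        # Step 1: transform, flag the pair with LSB(Int_x') = 1, embed b.
        return 1, set_lsb(xp, 1), set_lsb(yp, b), frac_x, frac_y
    if in_domain and within_dt:
        # Step 2: both odd, no transform; LSB(Int_x) = 0, embed b in Int_y.
        return 2, set_lsb(int_x, 0), set_lsb(int_y, b), frac_x, frac_y
    # Step 3: save the integer LSBs in the fraction LSBs, then embed b.
    return (3, set_lsb(int_x, 0), set_lsb(int_y, b),
            set_lsb(frac_x, int_x & 1), set_lsb(frac_y, int_y & 1))

step1 = embed_pair(100, 90, 0, 0, 1, 255, (-20, 20))
step2 = embed_pair(101, 91, 0, 0, 0, 255, (-20, 20))
step3 = embed_pair(201, 10, 4, 6, 1, 255, (-20, 20))
```

In this toy run, (100, 90) is transformed to (110, 80) and marked as (111, 81); (101, 91) is both odd and becomes (100, 90); (201, 10) falls outside the RCM domain, so its LSBs move to the fraction portions.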


Figure 4.2 Block diagram Of Watermark Embedding Phase.

[Flowchart: select a relation R, secret key SK, and hashing function H; for each tuple in the relation, select the tuple and generate watermark bit b using H, select two attributes from the tuple using H, and separate the fraction and integer portions of both attributes; branch on whether (Int_x, Int_y) belong to RCM domain, satisfy DT, and are both odd; apply the corresponding embedding step; combine the integer and fraction portions and overwrite (x, y) with (x', y'); repeat until the end of the sequence.]

Figure 4.3 Block Diagram Of Watermark Extraction Phase.

[Flowchart: select a relation R, secret key SK, and hashing function H; for each tuple in the relation, select the tuple and generate watermark bit b using H, select two attributes from the tuple using H, and separate the fraction and integer portions of both attributes; branch on whether LSB(Int_x') == 1, and otherwise on whether the candidate pair (Int_xx, Int_yy) belongs to RCM domain and satisfies DT; extract b' = LSB(Int_y') and restore the pair by the corresponding step; combine the integer and fraction portions and overwrite (x', y') with (x, y) if b == b'; repeat until the end of the sequence.]

4.1.5 Watermark Extraction

Watermark extraction process is provided in Figure 4.3. Selection of tuple, attribute pair, and watermark bit is performed using H [15, 17]. Before applying RBW-RD technique for watermark extraction, the integer and fraction portions are separated for each attribute pair. The watermark is then extracted using one of the following three steps of RBW-RD technique.

1. If LSB of Int_x′ is '1', the integer pair belongs to RCM domain and satisfies DT check. Therefore, the LSB of Int_y′ is obtained as the detected watermark bit. LSBs of Int_x′ and Int_y′ are set to '0' and the original integer pair (Int_x, Int_y) is restored by inverse RCM transform using equation (4.3).

2. If LSB of Int_x′ is '0', then LSB of Int_y′ is saved and a candidate pair (Int_xx, Int_yy) is formed from (Int_x′, Int_y′) by setting both LSBs to '1'. If the candidate pair belongs to RCM domain and satisfies DT check, then the saved LSB of Int_y′ is obtained as the detected watermark bit and the candidate pair (Int_xx, Int_yy) is taken as the original integer pair (Int_x, Int_y).

3. If LSB of Int_x′ is '0' and the candidate pair does not belong to RCM domain or does not satisfy DT check, then LSB of Int_y′ is extracted as the detected watermark bit. The original integer pair (Int_x, Int_y) is restored by replacing LSBs of (Int_x′, Int_y′) with the LSBs saved in (Frac_x′, Frac_y′).

LSB of Int_y′ is extracted in all three cases and matched with the automatically generated watermark bit. If both are the same, then the watermarked attributes of the selected tuple are restored. Lastly, the integer and fraction portions of both attributes are combined. The embedding algorithm is devised to exactly restore the relation and the watermark bit, while utilizing every selected attribute for achieving high watermarking capacity. Figure 4.4 provides the detailed algorithms of both the watermark embedding and extraction processes.


Figure 4.4 Embedding and Extraction Algorithms of The Proposed RBW-RD Technique

Watermark Embedding Algorithm (R, Key SK, λ, α)
Input: Original Relation R, Secret Key SK, fraction of tuples 1/λ, mark-able attributes α
Output: Watermarked Relation RW
for each tuple ti in R
loop
  if H(ti.PK || SK) mod λ = 0             // mark this tuple
    x = ti.(H(ti.PK || SK) mod α);        // mark attribute x
    y = ti.(H(ti.PK/2 || SK) mod α);      // mark attribute y
    b = LSB(H(ti.PK || SK));
    Frac_x = Bin(Get_frac(x));
    Int_x = Bin(Get_int(x));
    Frac_y = Bin(Get_frac(y));
    Int_y = Bin(Get_int(y));
    domain = check_domain(Int_x, Int_y)
    if domain == 1, DT = DT_check(Int_x, Int_y)
      if DT == 1, odd = check_odd(Int_x, Int_y);
        if odd == 0
          Int_x', Int_y' = transform(Int_x, Int_y);
          Int_x'(length(Int_x')) = '1';
          Int_y'(length(Int_y')) = b;
        end
        if odd == 1
          Int_x'(length(Int_x)) = '0';
          Int_y'(length(Int_y)) = b;
        end
      end, end
    if domain == 0 || DT == 0
      Frac_x'(length(Frac_x)) = LSB(Int_x);
      Int_x'(length(Int_x)) = '0';
      Frac_y'(length(Frac_y)) = LSB(Int_y);
      Int_y'(length(Int_y')) = b;
    end
    x' = Dec((Int_x') || . || (Frac_x'));
    y' = Dec((Int_y') || . || (Frac_y'));
  end
end loop

Watermark Detection Algorithm (RW, Key SK, λ, α)
Input: Watermarked Relation RW, Secret Key SK, fraction of tuples 1/λ, mark-able attributes α
Output: Restored Relation RR
for each tuple ti in RW
loop
  if H(ti.PK || SK) mod λ = 0             // mark this tuple
    x' = ti.(H(ti.PK || SK) mod α);       // mark attribute x
    y' = ti.(H(ti.PK/2 || SK) mod α);     // mark attribute y
    b = LSB(H(ti.PK || SK));
    Frac_x' = Bin(Get_frac(x'));
    Int_x' = Bin(Get_int(x'));
    Frac_y' = Bin(Get_frac(y'));
    Int_y' = Bin(Get_int(y'));
    if LSB(Int_x') == '1'
      b' = LSB(Int_y');
      if b' == b
        Int_x'(length(Int_x')) = '0';
        Int_y'(length(Int_y')) = '0';
        Int_x, Int_y = inverse_transform(Int_x', Int_y');
      end
    else if LSB(Int_x') == '0'
      Int_xx(length(Int_x')) = '1';
      Int_yy(length(Int_y')) = '1';
      domain = check_domain(Int_xx, Int_yy);
      if domain == 1, DT = DT_check(Int_xx, Int_yy);
        if DT == 1, b' = LSB(Int_y');
          if b' == b
            Int_x = Int_xx;
            Int_y = Int_yy;
          end
        end, end
      if domain == 0 || DT == 0
        b' = LSB(Int_y');
        if b' == b
          Int_x(length(Int_x')) = LSB(Frac_x');
          Int_y(length(Int_y')) = LSB(Frac_y');
        end
      end
    end
    x = Dec((Int_x) || . || (Frac_x'));
    y = Dec((Int_y) || . || (Frac_y'));
  end
end loop

4.1.6 Analyzing Three Steps of RBW-RD

Analysis of the three steps of the proposed RBW-RD technique is given below.

4.1.6.1 First Step

During the first step, Int_x′ and Int_y′ are obtained by applying the forward RCM transform using equation (4.1). The LSBs of Int_x' and Int_y' are lost because the watermark bit is embedded. At the detection side, LSBs of Int_x' and Int_y' are set to '0' after extracting the watermark bit. This step, along with the ceiling function of the inverse RCM transform in equation (4.3), ensures the exact restoration of the original values (Int_x and Int_y). The transformation (2/3 and 1/3) using equation (4.1) and restoration (1/3 and 2/3) using equation (4.3) of RBW-RD technique are the same, except for setting the LSBs of Int_x' and Int_y' to '0' and using the ceiling function at the detection side [51].

4.1.6.2 Second Step

The inverse RCM transform in equation (4.3) can exactly restore the original values (Int_x, Int_y), except when Int_x' and Int_y' are both odd [51]. An LSB of '1' means an odd integer. From equation (4.1), it follows that (Int_x', Int_y') are both odd integers only if (Int_x, Int_y) are odd integers too.
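This property can be checked exhaustively on a small domain. The sketch below is an illustration with assumed names and an assumed 4-bit word-length (L = 15): it clears both LSBs of every transformed pair in the RCM domain, applies the ceiling-based inverse of equation (4.3), and records the pairs that are not restored.

```python
def ceil_div3(n):
    """Ceiling of n / 3 using integer arithmetic only."""
    return -(-n // 3)

def restore_after_clearing_lsbs(xp, yp):
    """Set both LSBs to 0, then apply the inverse transform of equation (4.3)."""
    xp0, yp0 = xp & ~1, yp & ~1
    return ceil_div3(2 * xp0 + yp0), ceil_div3(xp0 + 2 * yp0)

L = 15                                    # assumed 4-bit word-length
failures = []
for x in range(L + 1):
    for y in range(L + 1):
        xp, yp = 2 * x - y, 2 * y - x     # forward transform, equation (4.1)
        if 0 <= xp <= L and 0 <= yp <= L: # RCM domain, equation (4.2)
            if restore_after_clearing_lsbs(xp, yp) != (x, y):
                failures.append((x, y))

only_odd_pairs_fail = all(x % 2 == 1 and y % 2 == 1 for x, y in failures)
```

Every pair that fails to restore is a pair of odd integers, which is why such pairs are handled by the second step without applying the transform.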

4.1.6.3 Third Step

The third step embeds the watermark in those attribute pairs that do not belong to RCM domain or do not fulfill DT check. Previously proposed RCM techniques do not use the third step for watermark embedding [51, 52, 54, 55]. Therefore, in the proposed RBW-RD technique, the third step helps to increase the watermarking capacity of the relational database. The third step embeds the watermark bit at LSB of int_y, whereas LSB of int_x acts as a flag that identifies how the attribute pair was watermarked.

In order to preserve the original LSBs of the integer pair (Int_x, Int_y), LSBs of the fraction portions (frac_x, frac_y) are used. LSBs of the fraction portion can tolerate a small change. Moreover, there is less chance of losing the LSB of the fraction portion, because only the third step of RBW-RD technique utilizes this procedure. Even if the fraction portion is attacked, the watermark can still be recovered exactly. In the proposed RBW-RD approach, only integer portions are targeted for watermark embedding, because the integer portion is less likely to be attacked, whereas the fraction portion can easily be manipulated.

4.1.7 Reduction in Watermarking Distortion

Table 4.1 provides the probability of watermark embedding for the three steps. In order to calculate the probabilities for all three steps of the proposed RBW-RD technique, the LSB representation of the attribute pairs is analyzed. If 50 % of the total selected attribute pairs belong to RCM domain, then the probability of using the first step is 0.375, since three of the four LSB patterns are not both odd (0.50 x 3/4), and the probability of using the second step is 0.125 (0.50 x 1/4). The probability for the third step is 0.50, because 50 % of the total selected attribute pairs don't belong to RCM domain.

The probability of using the first step is therefore expected to be only 0.375, whereas the combined probability of the second and third steps is high (0.625). The first step of RBW-RD is responsible for causing high distortion because it uses equation (4.1) for watermarking. Accordingly, the distortion caused by RBW-RD technique is low, because the probability of using the first step is low, while the second and third steps have a negligible effect on watermarking distortion. Thus, the distortion caused by RBW-RD technique is low while the watermarking capacity is high, because all three steps are utilized for watermarking.
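The arithmetic behind these probabilities can be reproduced with a short sketch. It assumes, as the analysis above implicitly does, that the four LSB patterns of a pair are equally likely and that the DT check is ignored; the function name is hypothetical.

```python
from fractions import Fraction

def step_probabilities(p_domain):
    """Probability of each RBW-RD step for a given share of pairs in the RCM domain."""
    lsb_patterns = [(0, 0), (0, 1), (1, 0), (1, 1)]
    both_odd = sum(1 for a, b in lsb_patterns if a == 1 and b == 1)
    p_both_odd = Fraction(both_odd, len(lsb_patterns))   # 1/4
    first = p_domain * (1 - p_both_odd)    # in domain, not both odd
    second = p_domain * p_both_odd         # in domain, both odd
    third = 1 - p_domain                   # outside the domain
    return first, second, third

first, second, third = step_probabilities(Fraction(1, 2))
print(float(first), float(second), float(third))  # 0.375 0.125 0.5
```

Lowering the share of pairs that pass the domain (or DT) test shifts probability mass from the first step to the third step, which is the mechanism by which distortion is reduced without losing capacity.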

Incorporating DT check can further reduce the use of the first step, which helps to reduce the overall watermarking distortion of the relation. If DT is not fulfilled at the first step, then the third step is used for watermarking, which does not cause much distortion. Thus, the watermarking capacity of the relation is not affected while satisfying the DT check. Falling back to the third step is very useful for both capacity and distortion because, even with a very low value of DT check, the watermarking capacity of the database is not affected.

Table 4.1: Probability Of Watermarking For All Three Steps

 | First Step | Second Step | Third Step
Before watermarking | (int_x, int_y) | (int_x, int_y) | (int_x, int_y)
LSB representation of attribute pair before watermarking | (0,0) (0,1) (1,0) | (1,1) | (0,0) (0,1) (1,0) (1,1)
If 50 % of the total selected pairs of attribute belong to RCM domain | 0.50 (combined with second step) | 0.50 (combined with first step) | 0.50
Probability of using each step | 0.375 | 0.125 | 0.50
After watermarking | (int_x', int_y') | (int_x′, int_y′) | (int_x′, int_y′)
LSB representation of attribute pair after watermarking | (1,0) (1,1) | (0,0) (0,1) | (0,0) (0,1)

4.2 Improvements of The Proposed RBW-RD Technique Over RCM

Technique

This section highlights the advantages of the proposed RBW-RD technique by comparing it with RCM technique for relational database watermarking. The proposed RBW-RD technique achieves higher watermarking capacity than RCM techniques [51, 52, 54, 55], because watermark bits are successfully embedded even if the transformed pairs do not belong to RCM domain. Additionally, there is no need for extra storage, which is another reason for the increase in watermark capacity. Utilizing both integer and fraction portions for watermark embedding therefore helps to increase watermarking capacity.

In addition, a DT factor, which is absent in RCM technique, is added to give the proposed RBW-RD technique extra control. It enforces a limit for each attribute so that the transformed values do not exceed their DT level. As a result, overall distortion is minimized without affecting watermarking capacity.

At the detection side, automatically generated watermark bits are matched with the extracted watermark bits. As a result, FPs caused by addition or bit flipping attacks are minimized. Thus, the mean and std of RR get closer to the mean and std of R, thereby attaining better reversibility.

4.2.1 Increased Watermarking Capacity

The third step is not utilized for watermarking in RCM technique. Further, LSBs of the integer pair are saved separately, along with the watermark information [51, 52, 55]. However, the third step of the proposed RBW-RD technique does not save LSB of int_x separately. This enables blind detection without requiring any additional compression/encryption. Furthermore, a watermark bit can be embedded at LSB of int_y, thereby increasing the watermarking capacity of the algorithm.

In the proposed RBW-RD technique, LSBs of the fraction portions (frac_x, frac_y) are utilized for preserving the original LSBs of the integer pair (Int_x, Int_y). LSBs of the fraction portion can bear minor alteration. Moreover, the watermark can be accurately restored even if the fraction portion is attacked. Thus, the proposed RBW-RD technique achieves more capacity and better reversibility compared to RCM technique.

Figure 4.5 illustrates the improvement in capacity by presenting true positives (TPs) for all three steps of the proposed RBW-RD technique. The fraction (1/λ) of tuples attacked is shown on the horizontal axis, while the embedded watermark is shown on the vertical axis. Experiments are conducted on a relation of 10,000 tuples consisting of 11 attributes, of which the first attribute is the primary key and the remaining 10 attributes are used for watermark embedding. Each attribute contains numeric values, generated randomly from 0 to 999. A fraction of 0.17 of the tuples is used for watermarking, and the DT limit for all attributes is 0 to 500.

4.2.2 Less Distortion with Same Capacity

Three different values of DT check are used to measure the overall distortion of RBW-RD watermarking. The distortion measures for RW using the three values of DT check are represented as A = DT (0-250), B = DT (0-500), and C = DT (0-999). It is evident that the distortion can be minimized while capacity remains the same. Even if the distortion tolerance of the relation is very low, the watermark is embedded without affecting its capacity.

Figure 4.5 Capacity Comparisons After Deletion Attack (vertical axis: embedded watermark, 0 to 1400; horizontal axis: fraction of tuple attacked (1/λ), 0.00 to 0.89; series: TP_First Step, TP_Second Step, TP_Third Step)

Columns 1, 2, and 3 of Table 4.2 show the distortion measure (mean, std) of RW for the three values of DT check. Each row represents the mean and std of a particular attribute of a relation. Therefore, for 10 attributes, mean and std are provided in 10 rows of the table, whereas the last row provides the mean and std for the whole relation.

Equations (4.4) and (4.5) are used to measure improvement in distortion (mean and std) [22], as provided in Table 4.3. A, B, and C represent the (distortion) mean and std measures of RW using the proposed RBW-RD technique with DT check ranges of 0-250, 0-500, and 0-999, respectively. In (4.4) and (4.5), the absolute difference of A and O (mean, std measure of R) is subtracted from the absolute difference of C and O. A positive difference for an attribute implies improvement; a negative difference means that the results have become worse. The maximum improvement in distortion is observed when the improvement of A over C is calculated using equations (4.4) and (4.5).

Table 4.2: Mean And Std (Distortion) By Varying DT For Three Watermarked Relations

Attribute | DT 0-250: DA_Mean | DA_Std | DT 0-500: DB_Mean | DB_Std | DT 0-999: DC_Mean | DC_Std
1 | 518.8760 | 290.2835 | 518.9211 | 290.2631 | 518.7872 | 290.4524
2 | 470.9878 | 287.6695 | 470.9557 | 287.7225 | 471.0778 | 288.5631
3 | 489.9092 | 281.8385 | 489.8773 | 281.8745 | 489.8196 | 282.2471
4 | 486.1661 | 281.9845 | 486.1598 | 282.0138 | 485.9054 | 282.2217
5 | 500.1412 | 256.4161 | 500.1466 | 256.4236 | 500.2926 | 256.8708
6 | 508.9844 | 269.4938 | 508.9803 | 269.5078 | 509.1231 | 269.9312
7 | 510.5157 | 297.8284 | 510.5157 | 297.8284 | 510.5255 | 297.9881
8 | 447.5397 | 301.5168 | 447.5672 | 301.5046 | 447.5090 | 301.7310
9 | 433.2113 | 269.5894 | 433.2129 | 269.5898 | 433.3299 | 269.9873
10 | 511.8894 | 274.2112 | 511.8894 | 274.2112 | 511.8810 | 274.2121
Whole relation | 487.8221 | 13.9421 | 487.8226 | 13.9391 | 487.8251 | 13.8975


Improvement (in Mean) of DA over DC = |DC_Mean - DO_Mean| - |DA_Mean - DO_Mean| (4.4)

Improvement (in Std) of DA over DC = |DC_Std - DO_Std| - |DA_Std - DO_Std| (4.5)

Figure 4.6 Measuring Capacity Against Varying DT (vertical axis: embedded watermark, 0 to 2000; horizontal axis: distortion tolerance (DT) limit, 0-250, 0-500, 0-999; series: TP_First Step, TP_Second Step, TP_Third Step)

Table 4.3: Measuring Improvement In Distortion By Using Different DT

Attribute | DA over DC: Mean | Std | DB over DC: Mean | Std | DA over DB: Mean | Std
1 | 0.0888 | 0.1690 | 0.0627 | 0.1519 | 0.0261 | 0.0171
2 | 0.0558 | 0.8649 | 0.0237 | 0.8406 | 0.0321 | 0.0244
3 | 0.0896 | 0.4060 | 0.0577 | 0.3726 | 0.0319 | 0.0334
4 | 0.2607 | 0.2372 | 0.2544 | 0.2079 | 0.0063 | 0.0293
5 | 0.1348 | 0.4547 | 0.1402 | 0.4473 | -0.0054 | 0.0074
6 | 0.1187 | 0.4373 | 0.1146 | 0.4234 | 0.0041 | 0.0140
7 | -0.0056 | 0.1597 | -0.0056 | 0.1597 | 0.0000 | 0.0000
8 | 0.0307 | 0.2118 | 0.0172 | 0.1996 | 0.0135 | 0.0122
9 | 0.0896 | 0.3979 | 0.0912 | 0.3975 | -0.0016 | 0.0004
10 | 0.0084 | 0.0004 | 0.0084 | 0.0004 | 0.0000 | 0.0000
Whole relation | 0.0871 | 0.2379 | 0.0764 | 0.2332 | 0.0107 | 0.0122

Improvement of DA over DC (Mean) = abs(DC_Mean - DO_Mean) - abs(DA_Mean - DO_Mean)
Improvement of DA over DC (Std) = abs(DC_Std - DO_Std) - abs(DA_Std - DO_Std)
Improvement of DB over DC (Mean) = abs(DC_Mean - DO_Mean) - abs(DB_Mean - DO_Mean)
Improvement of DB over DC (Std) = abs(DC_Std - DO_Std) - abs(DB_Std - O_Std)
Improvement of DA over DB (Mean) = abs(DB_Mean - DO_Mean) - abs(DA_Mean - DO_Mean)
Improvement of DA over DB (Std) = abs(DB_Std - DO_Std) - abs(DA_Std - DO_Std)
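The improvement measure of equations (4.4) and (4.5) is a one-line computation, sketched below. The function name is illustrative. The source does not state the mean of the original relation, so the value of `do_mean` used here is a hypothetical one, chosen only so that the arithmetic can be followed against the first row of the tables.

```python
def improvement(candidate, baseline, original):
    """Improvement of `candidate` over `baseline`, as in equations (4.4)/(4.5):
    |baseline - original| - |candidate - original|.
    A positive result means the candidate is closer to the original."""
    return abs(baseline - original) - abs(candidate - original)

do_mean = 518.8855                      # hypothetical mean of the original relation R
da_mean, dc_mean = 518.8760, 518.7872   # attribute 1, DT 0-250 and DT 0-999
gain = improvement(da_mean, dc_mean, do_mean)   # 0.0983 - 0.0095 = 0.0888
```

The same function applies to the std columns by passing std values instead of means.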



Figure 4.6 shows that the watermark capacity remains the same for different values of DT check. Values of DT check are shown on the horizontal axis, while the embedded watermark is shown on the vertical axis. It is evident that the first step causes high distortion compared to the third step. Therefore, reducing the value of DT check minimizes the use of the first step, and as a result the overall distortion in RW is reduced. Additionally, no effect on capacity is noticed when the value of DT check is reduced, because if the first step does not meet the requirement of the attribute (DT check), then the third step is used.

4.2.3 Reducing FPs and Distortion Because of Addition and Bit Flipping Attack

Bit flipping attack inverts the LSBs of all the attributes of the selected tuple [47, 79]. A high rate of bit flipping attacks can cause many FPs. Consequently, R is not exactly restored. Table 4.4 shows the improvement obtained with bit checking (F) over no bit checking (E). The distortion measure (mean, std) is obtained on RR.

Initially, the improvement in distortion of RR is measured after performing 10 % bit flipping attacks; the results show an improvement when bit checking is incorporated. Measuring the distortion after 80% bit flipping attacks shows a higher improvement than in the previous case. Addition attack adds new tuples to the selected relation in order to weaken the watermark detection process [22, 34, 47]. As shown in Figure 4.7, addition attack causes FPs, whereas the bit checking technique detects FPs. Increasing the ratio of addition attack also increases the number of FPs. The bit checking technique helps to tackle FPs, and as a result the difference between FPs and TPs increases.

Figure 4.7 Watermark Detection After Bit Flipping Attack (With BitCheck)
[Plot omitted. Horizontal axis ticks: 0.00 to 0.79; vertical axis: detected watermark (0 to 1600); series: TP_Third Step, TP_Second Step, TP_First Step.]

FPs are detected and eliminated as shown in Figure 4.8, while the FP rate is high in Figure 4.9. Figure 4.9 shows an increase in the use of the 1st step of the RBW-RD technique, which causes high FPs and distortion. Increasing the ratio of bit flipping attack also increases FP detection and distortion in RR.

Table 4.4: Effect Of Bit Checking On Distortion, Because Of FPs Caused By Bit Flipping And Addition Attack (DT Check = 0-250)

Improvement of (relation) DF over DE

                         10 % Attack                         90 % Attack
      Bit Flipping          Addition      Bit Flipping          Addition
     Mean      Std     Mean      Std     Mean      Std     Mean      Std
   0.0724   0.2119  -0.0270   0.1206   0.2648   1.5193   0.0456   0.2255
   0.0196   0.6699   0.0262   0.0506   0.2230   4.9101   0.1310   -0.154
   0.1580   0.4248   0.0495   0.0535   0.2327   2.5616   0.0013   0.4253
   0.0050   0.1075  -0.0122   0.0773   0.0265   1.6965   0.0110   -0.134
   0.1065   0.3306   0.0150   0.0314   0.3459   1.6317  -0.0219   0.1600
  -0.0110   0.1769  -0.0220   0.0311  -0.0120   1.7467  -0.1750   0.2491
  -0.0010   0.2205   0.0010   0.0525   0.0298   1.7731   0.0415   0.1912
   0.1440   0.1831   0.0428   0.0368   0.5768   1.5695  -0.0223   0.2214
   0.1571   0.1594   0.0335   0.0056   0.6417   1.4351   0.0903   0.1677
   0.0000   0.0000   0.0000   0.0000   0.0000   0.0000   0.0095  -0.0040
   0.0649   0.2484   0.0106   0.0459   0.2328   1.8843   0.0110   0.1347

Improvement of DF over DE = abs(DF_Mean-O_Mean)-abs(DE_Mean-O_Mean)

Improvement of DF over DE= abs(DF_Std-O_Std)-abs(DE_Std-O_Std)

Figure 4.8 Watermark Detection After Addition Attack (With And Without Bitcheck)
[Plot omitted. Horizontal axis ticks: 0.10 to 0.89; vertical axis: detected watermark (0 to 3500); series: Without_BitCheck, With_BitCheck.]

Figure 4.10 shows that bit checking minimizes the FPs for both bit flipping and addition attacks. The horizontal axis represents the fraction of tuples attacked, whereas the vertical axis represents FP detection of the watermark. Because bit flipping inverts the LSBs of Int_x and Int_y, it causes a high number of FPs; thus, bit checking can eliminate more of them. FPs caused by addition attack are also tackled using bit checking; however, fewer FPs are observed than with bit flipping attack. Further, it can be observed that increasing the attack level also increases the number of FPs.

4.3 Robustness Analysis of The Proposed RBW-RD Method

In this section, results and discussion of addition, bit flipping, and subtraction attacks are provided for the proposed RBW-RD technique. The proposed technique utilizes both the integer and fraction portions for watermarking. However, attacks are carried out on the integer portion only, because attacking the fraction portion does not harm the watermark detection process. It is illustrated that bit flipping and subtraction attacks have almost the same effect on the watermark detection process. No FPs are detected by the proposed RBW-RD technique in the absence of an attack; FPs are detected only after addition or bit flipping attacks. Sorting attack does not affect the watermark detection process, because H is used to select the tuples to be marked [31]. In addition, the detection process is completely blind.
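The claim that sorting does not affect detection follows from selecting tuples by a keyed hash of tuple content rather than by position. The sketch below illustrates that idea only; the use of HMAC-SHA256, the primary-key strings, the secret key, and the function names are assumptions for illustration and are not taken from the thesis.

```python
import hashlib
import hmac


def is_selected(secret_key: bytes, primary_key: str, lam: int) -> bool:
    """Select a tuple when its keyed hash is divisible by lam (about 1/lam of tuples)."""
    digest = hmac.new(secret_key, primary_key.encode("utf-8"), hashlib.sha256).digest()
    return int.from_bytes(digest, "big") % lam == 0


def selected_keys(secret_key: bytes, primary_keys, lam: int) -> set:
    """Return the set of primary keys chosen for marking."""
    return {pk for pk in primary_keys if is_selected(secret_key, pk, lam)}


# Hypothetical relation identified by primary keys only.
keys = [f"tuple-{i}" for i in range(1000)]
original = selected_keys(b"owner-secret", keys, lam=10)
reordered = selected_keys(b"owner-secret", list(reversed(keys)), lam=10)
print(original == reordered)  # the same tuples are selected regardless of order
```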

Figure 4.9 Comparisons Of FPs Between Bit Flipping And Addition Attack
[Plot omitted. Horizontal axis ticks: 0.10 to 0.89; vertical axis: FP detection of watermark (0 to 1000); series: Addition Attack, Bit Flipping Attack.]

In bit flipping attack, the LSBs of the integer portion of all the attributes of the selected tuple are changed. Bit flipping attack is executed on different percentages of tuples, ranging from 10% to 80%. Different fractions of tuples are selected for watermark embedding and detection, ranging from 0.01 to 0.64.

Figure 4.10 FP&TP Detection After Bit Flipping Attack (Without BitCheck)
[Plot omitted. Horizontal axis ticks: 0.00 to 0.79; vertical axis: detected watermark (0 to 1800); series: FP_TP_Third Step, FP_TP_Second Step, FP_TP_First Step.]

Figure 4.11 RBW-RD Capacity Comparison Of Simple Relation And After 10% Subtraction, Addition, & Bit Flipping Attacked
[Plot omitted. Horizontal axis: fraction of tuple selected to be watermarked (1/ λ), 0.01 to 0.64; vertical axis: detected watermark (0 to 7000); series: No Attack, 10 % Subtraction, 10 % Addition, 10 % Bit Flipping (50%).]

Results of watermark detection after bit flipping attack are shown in Figure 4.11. The watermark is successfully detected after 10% to 80% bit flipping attacks on 0.01 to 0.64 fractions of tuples.

Subtraction attack deletes all the attributes of the selected tuple, and the targeted tuple may carry a watermark bit [31, 79]. In bit flipping attack, every attribute of a selected tuple is targeted; their LSBs are inverted, which causes loss of watermark information. The tuple is still present in the relation after a bit flipping attack, whereas in a deletion attack the whole tuple is deleted. Figure 4.12 compares bit flipping, subtraction, and addition attacks at a fixed attack level of 10%, with varying fractions of tuples selected to be watermarked (0.01 to 0.64). Addition attack causes FPs because the size of the relation is increased by 10 percent. However, a loss of a few watermark bits is noticed after bit flipping and subtraction attacks.
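The three attacks compared above can be simulated on a toy relation. The following is a rough sketch under stated assumptions: the relation is a list of rows of integer attributes (only the integer portion is attacked), victims are chosen uniformly at random, and all function names and value ranges are hypothetical.

```python
import random


def bit_flip_attack(relation, fraction, rng):
    """Invert the LSB of every integer attribute in a random fraction of tuples."""
    attacked = [list(row) for row in relation]
    victims = rng.sample(range(len(attacked)), round(fraction * len(attacked)))
    for i in victims:
        attacked[i] = [value ^ 1 for value in attacked[i]]
    return attacked


def subtraction_attack(relation, fraction, rng):
    """Delete a random fraction of tuples (all attributes of each victim)."""
    victims = set(rng.sample(range(len(relation)), round(fraction * len(relation))))
    return [list(row) for i, row in enumerate(relation) if i not in victims]


def addition_attack(relation, fraction, rng, low=0, high=999):
    """Append random tuples amounting to the given fraction of the relation size."""
    width = len(relation[0])
    extra = [[rng.randint(low, high) for _ in range(width)]
             for _ in range(round(fraction * len(relation)))]
    return [list(row) for row in relation] + extra


rng = random.Random(0)
relation = [[rng.randint(0, 999) for _ in range(4)] for _ in range(100)]
flipped = bit_flip_attack(relation, 0.10, rng)
subtracted = subtraction_attack(relation, 0.10, rng)
added = addition_attack(relation, 0.10, rng)
print(len(flipped), len(subtracted), len(added))
```

Bit flipping keeps the relation size unchanged, subtraction shrinks it, and addition grows it, which is why only addition introduces new candidate tuples that can be falsely detected.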

Figure 4.12 RBW-RD Capacity Comparison After 10% To 80% Bit Flipping Attack
[Plot omitted. Horizontal axis ticks: 0.01 to 0.64; vertical axis: detected watermark (0 to 7000); series: No Attack, 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%.]

Figure 4.13 shows a comparison at 80% bit flipping, addition, and subtraction attacks, at different fractions of tuples selected for watermarking. It is evident that the FP rate due to addition attack increases with the fraction of selected tuples. Furthermore, a large loss in watermark detection rate is observed in the case of bit flipping and subtraction attacks. However, the difference between the detection rates after bit flipping and subtraction attacks is not prominent. The large loss in detection rate is noticed because more tuples are deleted or altered as the fraction of tuples selected for watermarking increases.

Figure 4.13 RBW-RD Capacity Comparison of Simple Relation And After 80% Subtraction, Addition, & Bit Flipping (50% Attributes Altered)
[Plot omitted. Horizontal axis ticks: 0.01 to 0.64; vertical axis: detected watermark (0 to 9000); series: No Attack, 80 % Subtraction, 80 % Addition.]

Figure 4.13 illustrates that the effects of bit flipping (on 100% of attributes) and subtraction attack on watermark detection are nearly the same, whereas Figure 4.14 shows that bit flipping attack (on 50% of attributes) has less effect on watermark detection than subtraction attack. In the latter case, instead of selecting all attributes for the bit flipping attack, the LSBs of only 50% (half) of the attributes are changed, and the remaining 50% of the attributes are left unaltered.

4.4 Comparison of Proposed RBW-RD Technique with DEW Technique

A comparison of the proposed RBW-RD technique with the DEW technique [22, 34, 35, 38] is provided in this section. It has been reported that the mathematical complexity of the DEW technique is higher than that of the RCM technique, whereas the embedding rate is identical [51]. The proposed RBW-RD technique has high capacity because it uses all three steps for watermarking, compared to only one step in the DEW technique. Moreover, only one step of the proposed RBW-RD technique causes high distortion; thus, the proposed technique has less watermarking distortion.

Moreover, the DEW technique with a high value of DT check results in more FPs [22]; however, no FPs are detected in the proposed RBW-RD technique, even if the value of DT check is high. Addition attack may reduce the performance of FP detection in both DEW and the proposed RBW-RD techniques, but the proposed technique will get a slightly higher rate of FPs because of its high embedding success. Thus, the proposed RBW-RD technique achieves high capacity compared to DEW, while distortion and complexity are lower. Likewise, FPs in the proposed RBW-RD technique are fewer even if the value of DT check is high.

Figure 4.15 DEW Method Comparison After Subtraction, Bit Flipping, Addition Attack, & No Attack On Relation (Average Of 10, 20…90% Attack)
[Plot omitted. Horizontal axis ticks: 0.01, 0.02, 0.04, 0.08, 0.17; vertical axis: detected watermark (0 to 2500); series: 50 % Subtraction, 50 % Bit Flipping, 50 % Addition, No Attack.]

4.4.1 Experimental Analysis of The Proposed RBW-RD Technique against DEW Technique

Results of the proposed RBW-RD and DEW techniques are provided in Figures 4.15 and 4.16. Both figures show 50% subtraction, bit flipping, addition, and no attack on RW. The value of DT check is kept high for both DEW (Figure 4.15) and the proposed RBW-RD technique (Figure 4.16). The average of nine attack levels (10, 20, …, 90% attack) is taken as one point. In both figures, five points (0.01, 0.02, 0.04, 0.08, and 0.17) are selected for the watermarking fraction. Watermarking capacity is low at point 0.01, while watermarking capacity is high at 0.17.

An increase in capacity is noticed with the increase in the fraction of tuples selected to be watermarked; as a result, the difference between the results of all attacks increases. Results show that the watermark embedding and extraction rate of the proposed RBW-RD technique is high compared to the DEW technique. Further, at a high value of DT check, the DEW technique has a high probability of FP detection. Despite FP detection in the DEW technique, the watermarking capacity of the proposed RBW-RD technique is high.

Figure 4.16 RBW-RD And DEW Method Comparison After Bit Flipping Attack (After Taking Average Of 0.01, 0.02, 0.04, 0.08, And 0.17)
[Plot omitted. Horizontal axis: fraction of tuple attacked (1/ λ), 0.00 to 0.89; vertical axis: detected watermark (325 to 675); series: RBW-RD Bit Flipping, DEW Bit Flipping.]

Figure 4.17 provides a comparison between the detection results of DEW and the proposed RBW-RD techniques. In this figure, bit flipping attacks are performed on 0.00 to 0.89 fractions of tuples. However, an overall fraction of 0.064103 of the tuples is used for watermark embedding and detection: the average of five fractions (0.01, 0.02, 0.04, 0.08, and 0.17) is taken as one point. The proposed RBW-RD and DEW techniques are both equipped with DT and bit checking capabilities and have the same procedure for tuple and attribute selection. Therefore, the effect of bit flipping and addition attacks is almost the same in both techniques, because the value of DT check is kept high. Sorting attack does not disturb the process of watermark detection, because both techniques use the same method for tuple and attribute selection [47].

Figure 4.17 RBW-RD Comparison After Subtraction, RBF, Addition Attack, & No Attack On Relation (Average of 10, 20…90% Attack)
[Plot omitted. Horizontal axis ticks: 0.01, 0.02, 0.04, 0.08, 0.17; vertical axis: detected watermark (0 to 2500); series: 50 % Subtraction, 50 % Bit Flipping, 50 % Addition, No Attack.]

Figure 4.18 RBW-RD And DEW Method Comparison After Addition Attack (After Taking Average Of 0.01, 0.02, 0.04, 0.08, And 0.17)
[Plot omitted. Horizontal axis: fraction of tuple attacked (1/ λ), 0.00 to 0.89; vertical axis: detected watermark (0 to 1200); series: RBW-RD Addition, DEW Addition.]

From Figure 4.17, it is evident that after bit flipping at different attack levels, the proposed RBW-RD technique attains a better detection rate than the DEW technique. Figure 4.18 shows that the addition attack results of the proposed RBW-RD technique are consistently higher than those of the DEW technique at all levels. The watermark detection rate after bit flipping attack decreases consistently as the fraction of attacked tuples increases. Conversely, the detection rate after addition attack increases consistently because of the increase in FPs.

Figure 4.19 shows a capacity comparison of the DEW technique for different values of DT check. It illustrates that the watermarking capacity of the DEW technique changes with the value of DT check. However, results of the proposed RBW-RD technique remain high for all values of DT check, as discussed in section 4.2. Consequently, results show that the proposed RBW-RD technique will have high capacity compared to the DEW technique, even at a low value of DT check.

Figure 4.19 Capacity Of DEW Method After Changing Values Of DT

The std of RW is calculated using the proposed RBW-RD technique at a fixed (high) value of DT check (0-999), whereas the std of RW using the DEW technique is calculated while consistently changing the value of DT check. The difference between the std (distortion) of RW for the DEW and the proposed RBW-RD techniques is calculated by using equation (4.6). This difference represents the improvement of the proposed RBW-RD technique over the DEW technique. Figure 4.20 shows the improvement in std (distortion) of the proposed RBW-RD technique over the DEW technique. The improvement in distortion is calculated for varying values of DT check for the DEW technique, while the value of DT check of the proposed RBW-RD technique is kept constant (0-999).

4.5 Chapter Summary

Results and analysis show that the proposed RBW-RD technique is robust, blind, and reversible for relational database watermarking. Comparative analysis of the proposed RBW-RD technique with the RCM and DEW techniques shows noticeable improvement. The proposed watermarking technique has achieved maximum capacity by utilizing both types of attribute pairs, those that belong to the RCM domain and those that do not. Further, no overhead of adding side information is required, and embedding distortion is minimized by utilizing the distortion tolerance parameter. Additionally, the utilization of effective automatic bit checking has enabled the proposed technique to increase watermark security and reduce the FP rate. Moreover, both the integer and fraction portions of the numeric attribute are utilized for watermark embedding. Therefore, even if the fraction portion is attacked, the watermark can be restored exactly.

Figure 4.20 Decrease In Distortion By Using RBW-RD (Fixed DT) Over DEW (Changing DT).


Chapter 5 WATERMARKING OF DNA SEQUENCES

A rapid increase in the amount of information is noticed in the current digital era. As a result, the need for larger data storage devices of small physical size has increased. Likewise, storing large amounts of information using data hiding approaches gives the extra advantage of increased security, which can facilitate data storage as well as provide data authenticity and copyright protection. Researchers have recently exploited the DNA medium both for data storage and for secret information hiding through different data hiding techniques [25, 80]. DNA is an organic compound in which the hereditary information of all living organisms is present. DNA is composed of strands of four nucleotides: Guanine (G), Adenine (A), Thymine (T), and Cytosine (C).

Chemical and biological rules help to understand the formation of DNA nucleotides [81-83]. This information can be useful for embedding watermark information in the DNA medium [84]. The long DNA of microscopic organisms like bacteria can help to store a large amount of information. A watermark can be embedded in as well as extracted from the DNA medium, which strengthens the idea of biological storage devices [85]. DNA watermarking can be helpful for copyright protection of Genetically Modified Organisms (GMOs), which helps to prevent their unlawful usage [86].

The proposed synonymous substitution based watermarking for DNA sequences (SSW-DNA) attains high capacity and achieves robustness against mutations. It exploits the whole coding region; as a result, high data storage is attained. Existing DNA watermarking systems use only 4-fold synonymous codons, which may not increase watermarking capacity, as 2-fold and 3-fold synonymous codons make up a substantial portion of DNA. The proposed method facilitates the use of 4-fold, 3-fold, and 2-fold synonymous codons, enabling high data storage capabilities. Structural information is retained using binary strings, and the watermark is encoded before embedding using Reed Solomon (RS) codes. Additionally, the biologically synonymous substitution method has the advantage of sustaining the amino acid sequence; thus, the DNA functionality is retained.

5.1 Sequences Used for Testing

Several DNA databases are publicly available over the internet. The proposed SSW-DNA method is applied on the DNA sequences shown in Table 5.1. The details of the DNA sequences are obtained from the National Centre for Biotechnology Information (NCBI) database [87]. NC_012806 is the GenBank ID of a yeast mitochondrial DNA sequence, while the remaining DNA sequences are members of the Pisces and Amphibian family [87].

5.2 The Proposed SSW-DNA Method

The presented SSW-DNA watermarking technique consists of a data embedding and a data extraction module, as shown in Figure 5.1. The watermarking data is first converted to binary format and then encoded using the RS coder. Then the synonymous substitution technique is used for embedding the watermark into the DNA sequence. In the extraction module, binary strings are used to align the DNA sequence. The watermarked data is then obtained using the synonymous substitution method. The obtained data is passed through the RS coder to remove the invalid bits. Lastly, the data is converted from binary to the original format.

Table 5.1: Dataset

GenBank ID     Length
NC_012806      626
AB571609       1130
AB571626       714
JQ070418       2461
JQ085958       1070
JQ268556       1796

5.2.1 Data Embedding Section

The watermark data and the DNA are provided to the embedding module as alphanumeric characters. The embedding process ensures that the functionality of the host DNA is not disturbed. The embedding module converts the watermark data into binary format and applies RS codes to protect the data. The RS code is capable of correcting multiple burst bit errors. Thus, the lost watermark data bits can be recovered later, on the detection side.

Figure 5.1 SSW-DNA Method

The data is then embedded in the DNA sequence using the synonymous substitution method. The data is encoded as a nucleotide sequence according to the set of biological rules given in Table 5.2, in which each specific pair of bits is translated to the corresponding nucleotide base. Every single nucleotide base is represented by a pair of bits: 11 corresponds to T, 10 corresponds to G, 01 represents C, and 00 represents A. Therefore, a nucleotide sequence such as CATG can be represented as '01001110' in binary format.
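The bit-pair mapping of Table 5.2 can be written as a small lookup in each direction. This is an illustrative sketch only; the function names are hypothetical.

```python
# Mapping from Table 5.2: each base corresponds to one pair of bits.
BASE_TO_BITS = {"T": "11", "G": "10", "C": "01", "A": "00"}
BITS_TO_BASE = {bits: base for base, bits in BASE_TO_BITS.items()}


def bases_to_bits(sequence: str) -> str:
    """Translate a nucleotide string into its binary representation."""
    return "".join(BASE_TO_BITS[base] for base in sequence)


def bits_to_bases(bits: str) -> str:
    """Translate an even-length bit string back into nucleotides."""
    if len(bits) % 2 != 0:
        raise ValueError("bit string length must be even")
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


print(bases_to_bits("CATG"))      # 01001110
print(bits_to_bases("01001110"))  # CATG
```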

5.2.2 Correction of Errors

Mutations occur naturally and constantly in the cells of every living organism [88-90]. A mutation occurs inside the DNA medium and can change its structural organization and composition. Mutations are of multiple types, subject to multiple factors [91], and can affect the watermark information stored inside the DNA medium. One type of mutation is point mutation, which can alter the arrangement of the DNA sequence. Another type is non-sense mutation, which causes distortion in the reading frames of DNA.

This distortion can disturb the watermark information, because the watermark can be embedded only in the coding region of the DNA sequence. Other types of mutation, for example missense and some variations of point mutation, may also cause problems for exact detection of the watermark. A two-layer methodology is utilized in the proposed technique to reduce the data losses due to mutation. This two-layer methodology consists of RS codes and a sequence alignment strategy.

5.2.3 Employing RS Codes for Restoring Mutation Losses

The error correction scheme adopted by the proposed SSW-DNA technique uses RS codes to solve the problem of data loss caused by mutations. RS codes are block-based linear codes and provide good performance in the recovery of watermark bits [92]. RS codes are used extensively in digital communication for error correction. In the presented method, the RS code is applied to the watermark data, which appends some parity bits to the watermark data. At the extraction side, the parity bits are used for exact recovery of the watermark data. A pictorial representation of the use of the RS coder in the proposed method is shown in Figure 5.2.

Table 5.2: Data Encoding Table

Binary Sequence    Base
11                 T
10                 G
01                 C
00                 A

Figure 5.2 RS Code Implementation
[Diagram omitted. Sender side: user watermark, RS encoder, embedding. In vivo: mutation. Receiver side: extraction, RS decoder, user watermark.]

The data is represented using k bits, whereas n-k is the number of parity bits. The RS coder combines the k data bits with the n-k parity bits, and the final watermark data is represented using n bits. The overall layout of the data bits is shown in Figure 5.3.

Figure 5.3 Structure of Text Encoded Using RS Coder
[Diagram omitted. n bits in total: k data bits followed by n-k parity bits.]

The error correction ability of the RS coder depends upon the ratio n/k. The synonymous substitution technique is used for embedding the watermark data in the DNA sequence. During the watermark extraction process, the RS coder rectifies the erroneous bits in the watermarked data. The parity bits (n-k) embedded along with the watermark data help to correct multiple bit flips that occur in the watermark data.
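To make the dependence on n and k concrete: RS codes operate on multi-bit symbols, and a standard property is that a code with n-k parity symbols can correct up to floor((n-k)/2) symbol errors per block. The sketch below only computes these quantities; the parameters (255, 223) are a commonly used choice and are an assumption here, since the values used in the thesis are not stated in this section.

```python
def rs_parameters(n: int, k: int) -> dict:
    """Summarise an RS(n, k) code: parity symbols, correctable errors, code rate."""
    if not 0 < k < n:
        raise ValueError("require 0 < k < n")
    parity = n - k
    return {
        "parity_symbols": parity,
        "correctable_symbol_errors": parity // 2,
        "code_rate": k / n,
    }


params = rs_parameters(255, 223)  # assumed example parameters
print(params)
```

Increasing the number of parity symbols raises the number of correctable errors at the cost of a lower code rate, that is, fewer watermark bits per embedded block.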



5.2.4 Enhanced Synonymous Substitution Technique

The main focus of the proposed approach is to sustain the normal functionality of DNA after the data is embedded. The synonymous substitution method fulfills this purpose. Synonymous substitution, also referred to as silent mutation, alters the DNA sequence in a specific way that does not affect any of the activities of the cells of living organisms. For example, the generation of amino acids is not affected by embedding a watermark using synonymous substitution, and thus the protein synthesis process is not interrupted. A DNA sequence consists of coding and non-coding regions. Only the coding region of the DNA is used for watermarking; the non-coding region is left unused because the effect of altering the non-coding region has not been studied yet.

Figure 5.4 Synonymous Substitution
[Flowchart omitted. Input: DNA sequence. Until the sequence ends, get the next codon. If the codon is 4-fold: convert 2 watermark bits to a nucleotide, insert it into the least significant base, and replace the original codon. If the codon is 2-fold or 3-fold: select the 1st synonymous codon if bit=0, else the 2nd codon, and replace the original codon.]

During the process of protein synthesis, the coding region of DNA is translated into a specific chain of amino acids. Three successive nucleotides, known as a codon, are translated into an amino acid following a specific genetic rule. Different patterns of successive nucleotides can be translated to the same amino acid. This multiplicity is referred to as the number of folds of the degenerative codons. All the codons that are translated into the same amino acid are known as degenerative codons. For example, both ATG and ATA translate to the MET amino acid. Thus, the MET amino acid has 2-fold degenerative codons.

According to the degenerative codons, the coding region can be divided into three types: 4-fold, 3-fold, and 2-fold degenerative codons. Four-fold codons are sets of four codons translated into the same amino acid; for example, TCA, TCG, TCT, and TCC all translate into the "Ser" amino acid. AUU, AUC, and AUA are an example of three-fold codons, which are translated into the "isoleucine" amino acid. TGT and TGC are an example of two-fold codons, which are translated into the "Cys" amino acid. The proposed SSW-DNA approach utilizes all three types of degenerative codons for hiding watermark information. Figure 5.4 provides a flowchart of the synonymous substitution process. The process of embedding watermark information in the degenerative codons is described in the coming section.

5.2.4.1 Degenerative 4-fold Codons

In the proposed approach, all three types of degenerative codons are used, and in all types the third nucleotide is used for watermarking. In Figure 5.5, the use of 4-fold degenerative codons is illustrated with an example. CTT is the available codon of the DNA for watermarking. CTT and the three other synonymous codons (CTC, CTA, and CTG) translate into the same amino acid, Leucine (LEU). Following the watermark information, the third nucleotide T ('11') is replaced with nucleotide G ('10'). Thus, after watermarking, the codon CTT is converted to the codon CTG. This substitution of the third nucleotide moves between degenerative codons without altering the translation to the relevant amino acid. A few amino acids with the variations in their 4-fold degenerative codons are shown in Table 5.3. Using the watermark information, the suitable synonymous codon is selected from the available possible codons.

Table 5.3: Synonymous Substitution

Amino acid \ Bits     00     01     10     11
Threonine (Thr/T)     ACT    ACG    ACC    ACA
Valine (Val/V)        GTT    GTG    GTC    GTA
Alanine (Ala/A)       GCT    GCG    GCC    GCA
Arginine (Arg/R)      CGT    CGG    CGC    CGA
Glycine (Gly/G)       GGT    GGG    GGC    GGA
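The third-base substitution described for 4-fold codons can be sketched as follows. The bit-to-base mapping is the one from Table 5.2; the function name is hypothetical, and the caller is assumed to have already verified that the codon belongs to a 4-fold family.

```python
# Bit-pair to base mapping from Table 5.2.
BITS_TO_BASE = {"11": "T", "10": "G", "01": "C", "00": "A"}


def embed_in_four_fold(codon: str, two_bits: str) -> str:
    """Replace the third base of a 4-fold codon with the base encoding two_bits."""
    if len(codon) != 3:
        raise ValueError("a codon has exactly three bases")
    if two_bits not in BITS_TO_BASE:
        raise ValueError("two_bits must be one of 00, 01, 10, 11")
    return codon[:2] + BITS_TO_BASE[two_bits]


print(embed_in_four_fold("CTT", "10"))  # CTG, still a Leucine codon
```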

Figure 5.5 Data Insertion In 4-Fold Degenerative Codons
[Diagram omitted. Leucine (LEU/L) codons: CTT, CTC, CTA, CTG. Inserting G into the third base of CTT gives CTG.]

5.2.4.2 Degenerative 2-fold and 3-fold Codons

The presented approach can utilize both 3-fold and 2-fold degenerative codons for embedding watermark information. The synonymous substitution capability is maintained, and one bit of watermark information is embedded per codon (representing one amino acid of the sequence). For both 3-fold and 2-fold degenerative codons, the following steps are followed to store a single watermark bit.

1. The first synonymous codon is used and the second synonymous codon is left unused if the watermark bit is '0'.

2. The second synonymous codon is used and the first synonymous codon is left unused if the watermark bit is '1'.

3. In the case of a 3-fold degenerative codon, the third synonymous codon is left unused throughout.

Figure 5.6 Data Insertion In 2-Fold Degenerative Codons
[Diagram omitted. Histidine (HIS/H) codons: CAT, CAC. Embedding bit 1 into CAT gives CAC.]

An example of a 2-fold degenerative codon is provided in Figure 5.6. CAT is a synonymous codon that translates to the HIS amino acid [80] according to the standard genetic code. If the watermark bit to be embedded is '1', then the second synonymous codon is used for watermarking. Thus, the synonymous codon CAC replaces the synonymous codon CAT.

Three-fold degenerative codons occur rarely in DNA. However, the 'isoleucine' amino acid is an example of a 3-fold degenerative codon, represented by three synonymous codons (AUU, AUC, and AUA). If the watermark bit is '0', the first synonymous codon AUU is used. If the watermark bit is '1', the second synonymous codon AUC is used. The third synonymous codon AUA is not taken into account. After embedding the message into the host DNA sequence, a watermarked sequence is obtained. Figure 5.7 provides a pictorial representation of the embedding module.
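The one-bit rule for 2-fold and 3-fold codons (bit 0 selects the first synonymous codon, bit 1 the second, and the third codon of a 3-fold family is never used) can be sketched as below. The table is a hypothetical, partial lookup written in DNA notation (T instead of U), and the ordering of codons within each family is an assumption based on the examples in the text.

```python
# Hypothetical partial table: ordered synonymous codons per amino acid.
SYNONYMOUS_CODONS = {
    "His": ("CAT", "CAC"),         # 2-fold example from Figure 5.6
    "Ile": ("ATT", "ATC", "ATA"),  # 3-fold example; the third codon is never used
}


def embed_bit(amino_acid: str, bit: int) -> str:
    """Return the synonymous codon that encodes a single watermark bit."""
    if bit not in (0, 1):
        raise ValueError("bit must be 0 or 1")
    return SYNONYMOUS_CODONS[amino_acid][bit]


print(embed_bit("His", 1))  # CAC
print(embed_bit("Ile", 0))  # ATT
```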



Figure 5.7 Data Embedding Module

5.2.5 Data Extraction Section

A high structural mutation rate in DNA can cause problems in the recovery of watermark information. Therefore, sequence alignment of the DNA is an important measure during the data extraction process when the mutation rate is high. Under normal circumstances (no mutation), however, the RS code based error correction technique alone can be sufficient.

5.2.5.1 Sequence Alignment Step to Tackle Mutations

Different techniques have been developed to transport DNA sequences over a communication channel, e.g. Wavelet Transform, Fourier Transform, etc. [93, 94]. On the detection side, binary strings are utilized for setting the alignment of the watermarked DNA [94]. Alignment of sequences helps to remove physical disorders caused by mutations, which can affect the watermark data. Sequence alignment in the proposed SSW-DNA method is used for transferring the watermarked DNA sequence information (as side information) to the data extraction section. The changes caused by mutations are then reverted using this transferred side information. As the side information is transferred as binary strings, the LZWA technique is used for compression; it provides a good compression ratio and high-speed conversion [95]. An example is provided to determine the size of the side information. If a DNA sequence composed of 1000 nucleobases is watermarked, then three binary strings, each of length 1000, will be used for passing the side information. LZWA can be used to compress the side information at a 25.4% compression ratio. Therefore, a total of 724 bits will be needed to pass the side information of a DNA sequence that consists of 1000 nucleobases [96].

5.2.5.2 Producing Binary Strings

Four binary strings are used to represent a DNA sequence; each string represents a separate nucleotide. XG(n), XA(n), XT(n), and XC(n) are the binary strings representing nucleotides G, A, T, and C, one-to-one. The length of a single binary string is the same as the length of the DNA sequence. Table 5.4 shows an example for the DNA sequence "ATGCGATCATGACCTGCA". In each binary sequence, '1' denotes the occurrence of that string's nucleotide at the position, whereas '0' indicates that the nucleotide at that position is recorded in one of the other binary sequences. The binary representation of the DNA sequence helps in reducing the distortion caused by mutation. In this connection, three of the four binary strings are transferred as side information to the extraction section. These binary strings are used for aligning the sequence, to recover the changes introduced by mutations in the DNA medium.

Table 5.4: VOSS Representation Of DNA Sequence

X(n)   A T G C G A T C A T G A C C T G C A
XA(n)  1 0 0 0 0 1 0 0 1 0 0 1 0 0 0 0 0 1
XG(n)  0 0 1 0 1 0 0 0 0 0 1 0 0 0 0 1 0 0
XC(n)  0 0 0 1 0 0 0 1 0 0 0 0 1 1 0 0 1 0
XT(n)  0 1 0 0 0 0 1 0 0 1 0 0 0 0 1 0 0 0
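The four indicator strings of the Voss representation can be produced with a one-line comprehension per base. This sketch is illustrative only; the function name is hypothetical, and the input is the example sequence from Table 5.4.

```python
def voss_indicators(sequence: str) -> dict:
    """Return one binary indicator string per nucleotide (A, G, C, T)."""
    return {
        base: "".join("1" if symbol == base else "0" for symbol in sequence)
        for base in "AGCT"
    }


indicators = voss_indicators("ATGCGATCATGACCTGCA")
for base, bits in indicators.items():
    print(f"X{base}(n) = {bits}")
```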

5.2.5.3 Transfer Channel

Binary strings can be compressed using LZMA or Run-Length Coding [97] and can be transferred on a separate channel to the detection side. At the detection side, the original DNA sequence is reassembled using the binary strings (alignment information). Only three of the four binary strings need to be transferred over the communication channel, because the original DNA sequence can be reconstructed from three binary strings alone. The reconstructed DNA removes structural defects that occurred due to mutations in the DNA sequence. Figure 5.8 provides an example with three binary strings, XC=000010000110000, XA=100000110000010, and XG=011001000001000, and demonstrates the DNA sequence reconstructed from this alignment information.

Figure 5.8 Reconstruction Of DNA Using Binary Strings
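Reconstruction from three indicator strings works because every position holds exactly one nucleotide, so a position where XA, XG, and XC are all 0 must be T. A minimal sketch using the three strings quoted in the text (the function name is hypothetical):

```python
def reconstruct_dna(x_a: str, x_g: str, x_c: str) -> str:
    """Rebuild a DNA sequence from the A, G and C indicator strings; T is implied."""
    if not (len(x_a) == len(x_g) == len(x_c)):
        raise ValueError("indicator strings must have equal length")
    bases = []
    for a, g, c in zip(x_a, x_g, x_c):
        flags = a + g + c
        if flags.count("1") > 1:
            raise ValueError("at most one indicator may be set per position")
        # Pick the base whose indicator is set; fall back to T when none is.
        bases.append({"100": "A", "010": "G", "001": "C", "000": "T"}[flags])
    return "".join(bases)


sequence = reconstruct_dna(
    x_a="100000110000010",
    x_g="011001000001000",
    x_c="000010000110000",
)
print(sequence)  # AGGTCGAATCCGTAT
```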



5.2.5.4 Mechanism of Extracting Data

The process of extracting the watermark data from the DNA sequence is explained in this section. In case of mutations, the DNA sequence is aligned using the binary strings. Sequence alignment helps to restore the alterations that occurred inside the DNA because of mutations. The reconstructed DNA is passed to the extraction module. The different stages of the extraction side are given in Figure 5.9.

[Figure 5.9 flowchart. Stages: Start; load watermarked DNA; sequence alignment using the sequence alignment information; get each codon; determine codon type (4-fold codon: extract 3rd base and convert to binary with the DNA-binary mapping table; 2-fold codon: decode watermark bit); collect extracted data; repeat until sequence end; decode watermark data; extracted watermark; End.]

Figure 5.9 Data Extraction Module



The coding region of the DNA is used for extracting the watermark with the following method.

1. For a 4 fold degenerative codon, the least significant (third) base of the codon is obtained.

2. For a 2 fold or 3 fold degenerative codon, only one bit is obtained as watermark information.

I. If the current codon is the first of the degenerative codons, then '0' is obtained as the watermark bit.

II. If the current codon is the second of the degenerative codons, then '1' is obtained as the watermark bit.

Table 5.2 is used to convert each extracted nucleotide (base) to binary; as the table shows, each nucleotide translates to two bits, whereas 2 fold and 3 fold degenerative codons yield single bits. The watermark information obtained from the DNA is then passed to the RS coder, where the watermark data lost due to mutation is restored.
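The extraction rules above can be sketched in Python as follows. This is a minimal sketch under several assumptions that this section does not specify: the two-bit mapping `BASE_TO_BITS` is a hypothetical stand-in for Table 5.2, the "first" and "second" codon of a 2 fold or 3 fold family are taken in alphabetical order, the third member of a 3 fold family carries no bit, a codon's family is defined as the codons that share its first two bases and its amino acid, and stop codons are skipped. The standard genetic code is used for the codon table.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons enumerated in TCAG order ('*' marks a stop codon)
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TO_AA = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Hypothetical stand-in for Table 5.2 (the real mapping is not shown here)
BASE_TO_BITS = {"A": "00", "C": "01", "G": "10", "T": "11"}

def synonymous_family(codon):
    """Codons sharing the first two bases and the amino acid, sorted."""
    aa = CODON_TO_AA[codon]
    return sorted(codon[:2] + b for b in BASES if CODON_TO_AA[codon[:2] + b] == aa)

def extract_bits(coding_region):
    """Read watermark bits from the third base of each codon."""
    bits = []
    usable = len(coding_region) - len(coding_region) % 3
    for i in range(0, usable, 3):
        codon = coding_region[i:i + 3]
        if CODON_TO_AA[codon] == "*":
            continue  # assumption: stop codons carry no data
        family = synonymous_family(codon)
        if len(family) == 4:
            bits.append(BASE_TO_BITS[codon[2]])  # two bits from the third base
        elif len(family) in (2, 3):
            index = family.index(codon)
            if index < 2:
                bits.append(str(index))          # first codon -> 0, second -> 1
    return "".join(bits)

print(extract_bits("GCTAAATTT"))  # prints 1101
```

In the example, GCT belongs to a 4 fold family and contributes "11" for its third base T, AAA is the first codon of its 2 fold family and contributes "0", and TTT is the second codon of its family and contributes "1".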

5.3 Results and Analysis

The proposed DNA-HSS technique is evaluated using both biological and synthetic DNA sequences. Table 5.1 lists the biological DNA sequences used for evaluation, whereas the synthetic DNA sequences are generated randomly. The experimental results show that the presented technique does not disturb the core functionality of DNA and provides higher data storage capacity than existing methods.

5.3.1 Capacity of Storing Bits

The data hiding capacity of the proposed model is given in Table 5.5. The first column of the table shows the locus of the DNA, whereas the Length (L) column shows the total number of nucleotides. UC shows the number of codons within the coding region of the DNA that are used for storing watermark information, while UnC represents the number of codons in the noncoding region; this portion of the DNA is left unwatermarked. The total number of bits stored in a DNA sequence is shown by bs (bits stored) [69]. In our technique, a 4 fold degenerative codon stores two bits, while a 3 fold or 2 fold codon stores only one bit. Therefore, the number of bits stored inside the DNA sequence is estimated by equation (5.1).

$bs = 2\,N_{4f} + N_{(2,3)f}$    (5.1)



In the equation above, $N_{4f}$ represents the number of 4 fold degenerative codons and $N_{(2,3)f}$ the number of 2 fold and 3 fold codons, counting only codons that belong to the coding regions of the DNA. The capacity of a DNA sequence for storing a watermark is measured in bpn (bits per nucleotide), as given in equation (5.2).

$bpn = \dfrac{bs}{L}$    (5.2)

As Table 5.5 shows, the capacity for storing a watermark increases with the length of the coding region. For example, AB571609, which has a long coding region, has a higher bpn value than sequences with shorter coding regions such as JQ085958.
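As a quick numerical check of the two capacity measures, the sketch below implements them as described in the text: two bits per 4 fold codon, one bit per 2 fold or 3 fold codon, and bpn as stored bits divided by sequence length. The function names are assumptions, and the codon counts in the second call are hypothetical values chosen only for illustration.

```python
def bits_stored(n_4fold, n_2or3fold):
    """Equation (5.1): two bits per 4 fold codon, one per 2 or 3 fold codon."""
    return 2 * n_4fold + n_2or3fold

def bits_per_nucleotide(bs, length):
    """Equation (5.2): stored bits divided by sequence length in nucleotides."""
    return bs / length

# bs and L for AB571609 as listed in Table 5.5
print(round(bits_per_nucleotide(489, 1130), 3))  # prints 0.433

# Hypothetical codon counts, chosen only to illustrate equation (5.1)
print(bits_stored(n_4fold=100, n_2or3fold=50))   # prints 250
```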

5.3.2 RS Codes for Error Correction

Losses due to mutation are handled using RS codes and the binary string representation. Different artificial mutations are applied to the watermarked DNA, and the performance of the error detection and correction methods is analyzed. The binary string representation removes structural changes in the DNA by realigning the sequence at the receiving end, and the RS coder then restores the losses that occurred on the communication channel. Numerous mutation scenarios can arise, depending on the occurrence and density of the mutations, and the RS coder has been tested in most of them. The result of applying the RS coder to burst and point mutations is provided in Figure 5.10, where the RS coder is evaluated in terms of total mutated bits versus mutated bits left uncorrected.
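Artificial point and burst mutations of the kind described above could be generated along the following lines. This is a hedged sketch and not the experimental setup of the thesis: it models substitutions only (no insertions or deletions), and the function names, the seed, and the mutation counts are assumptions.

```python
import random

BASES = "ACGT"

def substitute(base, rng):
    """Replace a base with a different, randomly chosen base."""
    return rng.choice([b for b in BASES if b != base])

def point_mutations(sequence, count, rng):
    """Substitute `count` bases at distinct, randomly scattered positions."""
    mutated = list(sequence)
    for pos in rng.sample(range(len(sequence)), count):
        mutated[pos] = substitute(mutated[pos], rng)
    return "".join(mutated)

def burst_mutation(sequence, length, rng):
    """Substitute `length` consecutive bases starting at a random position."""
    start = rng.randrange(len(sequence) - length + 1)
    mutated = list(sequence)
    for pos in range(start, start + length):
        mutated[pos] = substitute(mutated[pos], rng)
    return "".join(mutated)

def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))

rng = random.Random(0)
dna = "ATGCGATCATGACCTGCA"
print(hamming(dna, point_mutations(dna, 4, rng)))  # 4 scattered substitutions
print(hamming(dna, burst_mutation(dna, 4, rng)))   # 4 adjacent substitutions
```

Both scenarios change the same number of bases; they differ only in whether the changed positions are scattered or contiguous, which is the property the RS coder comparison depends on.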

Table 5.5: Bit Storage Capacity

Locus      Length (L)  UC^a  UnC^a  bs    bpn
NC_012806  626         164   44     236   0.371
AB571609   1130        313   63     489   0.433
AB571626   714         95    143    140   0.196
JQ070418   2461        683   137    1040  0.423
JQ085958   1070        101   255    160   0.150
JQ268556   1796        142   456    217   0.121

a: 1 codon = 3 nucleotides



Figure 5.10 RS Coder Performance For Point And Burst Mutation Scenario

The results for the two mutation types (point and burst) are plotted in Figure 5.10. In the case of burst mutation, the erroneous bits are completely restored when the occurrence of mutation is low. However, when the occurrence of mutation exceeds 42, the success rate of recovering erroneous bits degrades. Point mutation gives comparatively better results than burst mutation: the success ratio of recovering erroneous bits is affected only after the occurrence of mutation exceeds 90. The slope of the decreasing performance is also considerably smaller for point mutation than for burst mutation.



Figure 5.11 RS Coder Performance In Random Mutation Scenario

Figure 5.11 provides comprehensive insight into the use of the RS coder in the case of burst mutation. Data is embedded in the DNA sequence, and mutations are applied to the watermarked DNA sequence. Correctly recovered bits, erroneous bits, total data bits, and the error correction rate are plotted in Figure 5.11. For clarity, the error correction rate is calculated as the percentage of erroneous bits corrected by the RS coder. The results show that the success rate of correcting erroneous bits is 100% in the majority of cases, while only a few cases show a small decrease in success rate. Figure 5.12 shows the results obtained by applying the RS coder in the random mutation scenario, where the success rate is lower than for burst mutation.



Figure 5.12 RS Coder Performance In Point Mutation Scenario

5.4 Comparison with Existing Methods

Table 5.6 compares the proposed SSW-DNA method with existing methods for storing data in the DNA medium. In terms of data storage, the SSW-DNA method provides higher storage capacity than the other techniques. In addition to utilizing 4 fold degenerative codons, the proposed method exploits the data storage capacity more fully by using 2 fold and 3 fold degenerative codons as well. The bpn values for the different DNA sequences are plotted in Figure 5.13.

It is evident from the graph that the proposed method outperforms existing methods in terms of data hiding capability. Figure 5.13 indicates a roughly 33% average increase for the proposed SSW-DNA method over the other techniques.



Figure 5.13 Bpn Comparison

Figure 5.14 shows the error correction results of the RS coder for several block sizes (8, 10, 12, and 14). It is evident that the mutation correction capability of the RS coder decreases as the block size increases. The average number of uncorrected mutations is high at block size 14 and low at the other block sizes. It can be concluded that the block size affects the ability to restore erroneous bits.

[Figure 5.13 plot. x-axis: Locus (ypt7, AB571609, AB571626, JQ070418, JQ085958, JQ268556); y-axis: bpn (0 to 0.5); series: Shimanovsky et al., Arita et al., DNA-Crypt, SSW-DNA.]

Table 5.6: Comparison With Existing Techniques For Data Hiding Capacity

Locus      Length  Shimanovsky et al. [98]  Arita et al. [63]  Heider et al. [64]  SSW-DNA
NC_012806  626     167                      164                144                 236
AB571609   1130    309                      313                352                 489
AB571626   714     87                       95                 90                  140
JQ070418   2461    667                      683                714                 1040
JQ085958   1070    103                      101                118                 160
JQ268556   1796    132                      142                150                 217
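The entries of Table 5.6 can be summarized programmatically. The sketch below averages the relative gain of SSW-DNA over each competing method across all loci. The text does not state how its average increase is normalized, so both normalizations shown here (relative to the existing method and relative to SSW-DNA) are assumptions, as are the names `TABLE_5_6` and `mean_gain`.

```python
# Bits stored per locus, copied from Table 5.6:
# (Shimanovsky et al., Arita et al., Heider et al., SSW-DNA)
TABLE_5_6 = {
    "NC_012806": (167, 164, 144, 236),
    "AB571609": (309, 313, 352, 489),
    "AB571626": (87, 95, 90, 140),
    "JQ070418": (667, 683, 714, 1040),
    "JQ085958": (103, 101, 118, 160),
    "JQ268556": (132, 142, 150, 217),
}

def mean_gain(table, relative_to_proposed):
    """Average of (proposed - other) / base over every locus and method."""
    gains = []
    for *others, proposed in table.values():
        for other in others:
            base = proposed if relative_to_proposed else other
            gains.append((proposed - other) / base)
    return sum(gains) / len(gains)

print(f"gain relative to existing methods: {mean_gain(TABLE_5_6, False):.1%}")
print(f"gain relative to SSW-DNA:          {mean_gain(TABLE_5_6, True):.1%}")
```

Because bpn is the stored bit count divided by the same sequence length for every method, the same ratios apply to the bpn values. With the table values, the second normalization lands near one third, while the first is noticeably larger.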



Figure 5.14 Average Uncorrected Mutations At Different Block Sizes

Figure 5.15 shows the average number of uncorrected mutations for changing n/k ratio and increasing mutation count. As mentioned earlier, changing the n/k ratio also affects the error correction capability of the RS coder. A smaller n/k ratio reduces the number of redundant bits, which reduces the ability to recover data bits from erroneous bits, whereas a higher n/k ratio allows better recovery of data bits even at high mutation rates. From Figure 5.15 it can be observed that, at the same mutation count, an RS coder with a lower n/k ratio leaves more mutations uncorrected, whereas the number of mutations left uncorrected at n/k = 3 is very low. Therefore, n/k = 3 provides the best data retrieval results.
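The effect of the n/k ratio follows from a standard property of Reed-Solomon codes: an RS(n, k) code with n code symbols and k data symbols per block is guaranteed to correct up to floor((n - k) / 2) symbol errors. The sketch below tabulates this bound; the value k = 8 and the listed ratios are hypothetical and are not the configuration used in the experiments.

```python
def correctable_symbols(n, k):
    """Symbol errors per block that an RS(n, k) code is guaranteed to correct."""
    if not 0 < k <= n:
        raise ValueError("require 0 < k <= n")
    return (n - k) // 2

def block_is_recoverable(n, k, corrupted_symbols):
    """True if the number of corrupted symbols is within the guaranteed bound."""
    return corrupted_symbols <= correctable_symbols(n, k)

k = 8  # hypothetical number of data symbols per block
for ratio in (1.5, 2, 3):
    n = int(k * ratio)
    print(f"n/k = {ratio}: n = {n}, corrects up to {correctable_symbols(n, k)} symbols")
```

For k = 8 the guaranteed correction grows from 2 to 8 symbols per block as n/k goes from 1.5 to 3, at the cost of tripling the encoded length.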



Figure 5.15 Average Uncorrected Mutation Trend At Different N/K

RS codes are very useful for restoring lost watermark bits when the channel noise is high, for example under missense and frame shift mutations. These codes are very effective at error correction and provide a high probability of correcting the erroneous bits present in the data. The experiments performed with RS codes have shown that the error correction ability increases with a higher n/k ratio and a smaller block size.



5.5 Chapter Summary

The presented SSW-DNA watermarking technique is useful for different applications, including the advancement of biological data storage devices and copyright protection of GMOs. The proposed SSW-DNA technique is capable of confidentially storing information in DNA without interfering with its properties. To store data without interfering with the functionality of DNA, a synonymous substitution based watermarking technique is used.

Currently, synonymous substitution techniques use only 4-fold codons for watermarking DNA sequences. However, 2-fold and 3-fold synonymous codons form a substantial portion of a DNA sequence; therefore, the proposed SSW-DNA method utilizes 4-fold, 3-fold, and 2-fold synonymous codons to increase the data storage capacity. Additionally, the proposed SSW-DNA technique is able to recover losses caused by mutations by using the structural integrity of the DNA.

Different experiments are performed to analyze the data storage and error correction ability of the proposed SSW-DNA method. The results show that the proposed technique can store more information in DNA than the compared methods. Changing mutation rates and different parameters such as n/k ratio and block size are used to compare the proposed approach with existing watermarking/data hiding techniques. Thus, the proposed SSW-DNA approach is capable of storing data in DNA without causing any threat to the survival of the living organism. As a result, the data is safely stored inside the living organism, and the organism can sustain the stored information over numerous generations.



Chapter 6 CONCLUSIONS AND FUTURE DIRECTIONS

6.1 Thesis Summary

The focus of the current study is on developing capable, constraint-centered watermarking systems for digital multimedia objects, specifically relational databases and the DNA of living organisms. Over the course of this PhD, a number of watermarking requirements such as payload, reversibility, false positive rate, blindness, imperceptibility, distortion, cost, and usability were identified. With a better understanding of these requirements, we were able to construct more adequate and practical watermarking systems.

GA is used to improve the capacity of the DEW method in databases while keeping the distortion tolerance fixed. GA introduces some randomness into the DEW technique, thus making it difficult for an attacker to predict the watermarked attributes. The security of the watermarking system is also enhanced by reducing the distortion and minimizing the abrupt changes caused by the DEW method. This is achieved by two measures added to the fitness function of GA: first, using knowledge of the neighborhood values of the relational database, and second, selecting the attributes that result in minimum distortion. The results show improvement in terms of embedding capacity as well.

Consequently, more watermark bits can be embedded in the database, while the distortion introduced in it is minimized. This provides more comfort for the user and leaves fewer options for the attacker to destroy the watermark. The detection technique of the GADEW method resolves the problem of reshuffling attacks on attributes. It is robust against addition, deletion, sorting, bit flipping, tuple-wise and attribute-wise multifaceted, and additive attacks. It also solves the problem of false positives at the detection side. However, the proposed GADEW is a semi-blind watermarking technique that requires the GA chromosome information at the detection side. Additionally, the embedding phase of the proposed GADEW method requires more time than existing methods, because GA is computationally intensive. In future, we intend to develop a reversible watermarking technique that can handle both integer and floating point values present in a single relation.

A new robust, reversible, and blind watermarking method for relational databases (RBW-RD) is proposed, which exploits the reversible contrast mapping (RCM) technique. The proposed technique achieves high watermarking capacity, because watermark bits are successfully embedded even if the transformed pairs do not belong to the RCM domain. Additionally, there is no need for extra storage, which is another factor that increases the embedding capacity. On the other hand, the DT factor is added to provide extra control over the proposed RBW-RD technique. It enforces the limits for each attribute so that the transformed values do not exceed their DT level. At the detection side, automatically generated watermark bits are matched with the extracted watermark bits. As a result, FPs caused by addition or bit flipping attacks are minimized.

The proposed technique utilizes all three steps of watermarking, while the existing properties of RCM are retained, that is, no compression or encryption is required and the computational complexity is low. Additionally, more capacity is achieved with less distortion and fewer FPs, without storing extra bits, and both the integer and the fraction portion of a numeric attribute are utilized for watermark insertion. The proposed technique handles the sorting attack inherently, and experiments have shown robustness against bit flipping and addition attacks as well. Because the proposed RBW-RD technique utilizes the third step of RCM by exploiting both the integer and the fraction portion of the targeted attribute, it is more suitable for relations containing fraction values. Furthermore, utilizing the third step of the RCM technique creates a 50% chance of losing the LSBs of only the fraction portion of the restored attributes. Finally, the technique is also compared with the DEW based watermarking technique. The comparison shows the superiority of the proposed RBW-RD technique over the DEW technique in terms of capacity, distortion, and false positive rate.

An interesting technique for the DNA medium (SSW-DNA) is presented that secretly stores data inside DNA without disturbing its role of carrying hereditary information. Biologically, the synonymous substitution method maintains the amino acid sequence, so DNA functionality is retained. The proposed SSW-DNA method uses 2-fold and 3-fold codons along with 4-fold codons for watermarking; thus, its data hiding capacity is high. In addition, a two-layered error correction scheme is incorporated in the proposed SSW-DNA technique: structural information is retained using binary strings, and the watermark is encoded with Reed-Solomon codes before embedding.



Different tests are carried out to analyze the performance of the proposed SSW-DNA method with regard to data storage capacity and error correction capability. Experimental results show that the proposed method is able to store more data in DNA than the existing techniques. The dual layer mutation correction approach has been tested for different mutation rates, and parameters including block size and n/k ratio are varied to obtain performance curves. The results show that the proposed technique offers better performance in terms of recovering data losses due to mutations. However, in order to reduce mutation losses, the detection module requires the binary strings for sequence alignment. Therefore, the proposed SSW-DNA approach can be categorized as a semi-blind watermarking technique. The lengthy DNA of microscopic organisms such as bacteria can store a large amount of information, which strengthens the idea of biological storage devices. DNA watermarking also supports copyright protection of Genetically Modified Organisms (GMOs).

6.2 Future Research Directions

6.2.1 Intelligent Watermarking

Intelligent techniques take time to embed a watermark. Ways of reducing the computational time need to be adopted while sustaining the key advantages of the intelligent approaches. Figures 3.6 and 3.7 show that for a small number of tuples (small chromosome size) the GA achieves high watermarking capacity, but when the chromosome size is large, the capacity of the proposed GADEW method decreases. The decrease in capacity is caused by the selection of a large number of tuples to be watermarked (large chromosome size). To reduce this problem, multiple runs of GA might be helpful. Multiple runs may increase the time of the watermark embedding process, but the improvement in capacity might be maintained. A combination of multiple intelligent techniques can also be useful for attaining the desired outcomes. There is considerable scope for exploring different intelligent techniques to improve different properties of the underlying watermarking system.

6.2.2 Reversible Watermarking

The development of new reversible watermarking techniques for different multimedia objects and the improvement of current reversible watermarking systems need to be explored. There are a number of watermarking features that need to be addressed. Reversible watermarking can be blind, semi-blind, or non-blind. Of these, blind watermarking imposes the most stringent conditions but is considered the most favorable. Along with blindness and reversibility, the robustness property is highly desirable. Therefore, different attack methods can be studied and analyzed for the existing reversible watermarking methods.

6.2.3 Watermarking Different Objects

Other potential multimedia objects may be targeted for bringing improvements in watermarking features. These multimedia objects can be software, HTML documents, audio, video, natural language, etc. Reducing FP detection and distortion may be useful for certain types of multimedia objects that do not tolerate permanent changes in their content, e.g. software and natural language. However, high watermarking capacity and low distortion may be of interest for objects like audio, video, and HTML documents.



REFERENCES

1. Ingemar, C., et al., Digital Watermarking and Steganography. 2008: Morgan Kaufmann

Publishers Inc. 624.

2. Ming-Shing, H., T. Din-Chang, and H. Yong-Huai, Hiding digital watermarks using

multiresolution wavelet transform. Industrial Electronics, IEEE Transactions on, 2001. 48(5):

p. 875-882.

3. Suthaharan, S., et al., Perceptually tuned robust watermarking scheme for digital images.

Pattern Recognition Letters, 2000. 21(2): p. 145-149.

4. Kung, C.M., et al. A Robust Watermarking and Image Authentication Technique on Block

Property. in Information Science and Engineering, ISISE . International Symposium on. 2008.

5. Pan, Z., et al., A Double Domain Based Robust Digital Image Watermarking Scheme, in

Technologies for E-Learning and Digital Entertainment. 2008, Springer Berlin Heidelberg. p.

656-663.

6. Khan, A., et al., Machine learning based adaptive watermark decoding in view of anticipated

attack. Pattern Recognition, 2008. 41(8): p. 2594-2610.

7. Khan, A. and A.M. Mirza, Genetic perceptual shaping: Utilizing cover image and conceivable

attack information during watermark embedding. Information Fusion, 2007. 8(4): p. 354-365.

8. Wang, F.-H., et al., Multiuser-based shadow watermark extraction system. Information

Sciences, 2007. 177(12): p. 2522-2532.

9. Suthaharan, S., Fragile image watermarking using a gradient image for improved localization

and security. Pattern Recognition Letters, 2004. 25(16): p. 1893-1903.

10. Yu, D., F. Sattar, and B. Barkat, Multiresolution fragile watermarking using complex chirp

signals for content authentication. Pattern Recognition, 2006. 39(5): p. 935-952.

11. Chin-Chen, C. and H. Chou. A New Public-Key Oblivious Fragile Watermarking for Image

Authentication Using Discrete Cosine Transform. in Future Generation Communication and

Networking Symposia, FGCNS '08. Second International Conference on. 2008.

12. Zhang, X. and S. Wang, Fragile watermarking scheme using a hierarchical mechanism.

Signal Processing, 2009. 89(4): p. 675-679.

13. Kundur, D. and D. Hatzinakos, Digital watermarking for telltale tamper proofing and

authentication. Proceedings of the IEEE, 1999. 87(7): p. 1167-1180.

14. Yang, S.Y., Z.D. Lu, and F.H. Zou. A novel semi-fragile watermarking technique for image

authentication. in Signal Processing, Proceedings. ICSP '04. 7th International Conference on.

2004.



15. Che, S., et al. Semi-fragile Watermarking Algorithm Based on Image Character. in Computer

Science and Software Engineering, International Conference on. 2008.

16. Fridrich, J., M. Goljan, and R. Du. Invertible authentication. 2001.

17. Chamlawi, R. and A. Khan, Digital image authentication and recovery: Employing integer

transform based information embedding and extraction. 2010, Elsevier Science Inc. p. 4909-

4928.

18. Chamlawi, R., A. Khan, and I. Usman, Authentication and recovery of images using multiple watermarks. Computers & Electrical Engineering, 2010. 36(3): p. 578-584.

19. Sion, R., M. Atallah, and P. Sunil, Rights protection for relational data. Knowledge and Data

Engineering, IEEE Transactions on, 2004. 16(12): p. 1509-1525.

20. Malik, S.A., et al., Authentication of images for 3D cameras: Reversibly embedding

information using intelligent approaches. Journal of Systems and Software, 2012. 85(11): p.

2665-2673.

21. Kamran, M. and M. Farooq, An Information-Preserving Watermarking Scheme for Right

Protection of EMR Systems. IEEE Transactions on Knowledge and Data Engineering, 2012: p.

13.

22. Jawad, K. and A. Khan, Genetic algorithm and difference expansion based reversible

watermarking for relational databases. Journal of Systems and Software, 2013. 86(11): p.

2742-2753.

23. Gupta, G., Robust digital watermarking of multimedia objects, in Department of Computing.

2008, Macquarie University. p. 156.

24. Arsalan, M., S.A. Malik, and A. Khan, Intelligent reversible watermarking in integer wavelet

domain for medical images. Journal of Systems and Software, 2011. 85(4): p. 883-894.

25. Guozhen, X., et al., New field of Cryptography: DNA cryptography. Chinese Science Bulletin,

2006. 51(12): p. 1413-1420.

26. Alattar, A.M., Reversible watermark using the difference expansion of a generalized integer

transform. Image Processing, IEEE Transactions on, 2004. 13(8): p. 1147-1156.

27. Roberto, C., F. Francesco, and B. Rudy, Reversible watermarking techniques: an overview

and a classification. EURASIP J. Inf. Secur., 2010: p. 1-19.

28. Feng, J., et al., Reversible Watermarking: Current Status and Key Issues. I. J. Network

Security, 2006: p. 161-170.

29. Cauley, L. (2007) U.S. Net access not all that speedy. USA today Volume,

30. Cox, I., et al., Digital Watermarking. Journal of Electronic Imaging, 2002. 11(3): p. 414.



31. Rakesh, A. and K. Jerry, Watermarking relational databases, in Proceedings of the 28th

international conference on Very Large Data Bases. 2002, VLDB Endowment: Hong Kong,

China.

32. Shehab, M., E. Bertino, and A. Ghafoor, Watermarking Relational Databases Using

Optimization-Based Techniques. Knowledge and Data Engineering, IEEE Transactions on,

2008. 20(1): p. 116-129.

33. Mailing, M., C. Xinchun, and C. Haiting. The Approach for Optimization in Watermark Signal

of Relational Databases by using Genetic Algorithms. in Computer Science and Information

Technology, ICCSIT . International Conference on. 2008.

34. Gupta, G. and J. Pieprzyk, Reversible and blind database watermarking using difference

expansion, in Proceedings of the 1st international conference on Forensic applications and

techniques in telecommunications, information, and multimedia and workshop. 2008, ICST:

Adelaide, Australia.

35. Gupta, G., et al., Database Relation Watermarking Resilient against Secondary Watermarking Attacks. Information Systems Security. Springer Berlin, 2009: p. 222-236.

36. Farfoura, M.E., et al., A blind reversible method for watermarking relational databases based

on a time-stamping protocol. Expert Systems with Applications, 2012. 39(3): p. 3185-3196.

37. Chang, C.C. and P.Y. Lin, Adaptive watermark mechanism for rightful ownership protection.

Journal of Systems and Software, 2008. 81(7): p. 1118-1129.

38. Jun, T., Reversible data embedding using a difference expansion. Circuits and Systems for

Video Technology, IEEE Transactions on, 2003. 13(8): p. 890-896.

39. Wu, H.C., et al., A high capacity reversible data hiding scheme with edge prediction and

difference expansion. Journal of Systems and Software, 2009. 82(12): p. 1966-1973.

40. Afridi, T., A. Khan, and Y. Lee, Mito-GSAAC: mitochondria prediction using genetic

ensemble classifier and split amino acid composition. Amino Acids, 2011: p. 1-12.

41. Hayat, M. and A. Khan, MemHyb: Predicting membrane protein types by hybridizing SAAC

and PSSM. Journal of Theoretical Biology, 2011. 292(0): p. 93-102.

42. Naveed, M. and A. Khan, GPCR-MPredictor: multi-level prediction of G protein-coupled

receptors using genetic ensemble. Amino Acids, 2011: p. 1-15.

43. Tahir, M., A. Khan, and A. Majid, Protein subcellular localization of fluorescence imagery

using spatial and transform domain features. Bioinformatics, 2011. 28(1): p. 91-97.

44. Khan, A., A. M. Mirza, and A. Majid, Intelligent perceptual shaping of a digital watermark:

Exploiting Characteristics of human visual system. 2006, IOS Press. p. 213-223.

45. Khan, A., A novel approach to decoding: Exploiting anticipated attack information using

genetic programming. KES Journal, 2006: p. 337-346.



46. Khan, A., et al. Variable Threshold Based Reversible Watermarking: Hiding Depth Maps. in Mechatronic and Embedded Systems and Applications, MESA. IEEE/ASME International Conference. 2008.

47. Rakesh, A., J.H. Peter, and K. Jerry, Watermarking relational data: framework, algorithms

and analysis. 2003, Springer-Verlag New York, Inc. p. 157-169.

48. Du, J., R. Alhajj, and K. Barker, Genetic algorithms based approach to database vertical

partition. Journal of Intelligent Information Systems, 2006. 26(2): p. 167-183.

49. Kamran, M., S. Suhail, and M. Farooq, A Robust, Distortion Minimizing Technique for

Watermarking Relational Databases Using Once-for-all Usability Constraints. IEEE

Transactions on Knowledge and Data Engineering, 2012. 99(PrePrints): p. 1-1.

50. Kamran, M. and M. Farooq, A Formal Usability Constraints Model for Watermarking of

Outsourced Datasets. Information Forensics and Security, IEEE Transactions on, 2013. 8(6):

p. 1061-1072.

51. Coltuc, D. and J.M. Chassery, Very Fast Watermarking by Reversible Contrast Mapping.

Signal Processing Letters, IEEE, 2007. 14(4): p. 255-258.

52. Yeh-Shun, C. and W. Ran-Zan, Steganalysis of Reversible Contrast Mapping Watermarking.

Signal Processing Letters, IEEE, 2009. 16(2): p. 125-128.

53. Goldman, N., et al., Towards practical, high-capacity, low-maintenance information storage

in synthesized DNA. Nature, 2013. advance online publication.

54. Maiti, D., S.P. Maity, and H. Maity. Modification in contrast mapping: Reversible

watermarking with performance improvement. in Signal Processing and Communications

(SPCOM), 2012 International Conference on. 2012.

55. Mousa, H., et al., Data hiding based on contrast mapping using DNA medium. Int. Arab J. Inf.

Technol, 2011. 8: p. 147-154.

56. Bancroft, C., et al., Long-Term Storage of Information in DNA. Science, 2001. 293(5536): p.

1763-1765.

57. Clelland, C.T., V. Risca, and C. Bancroft, Hiding Data in DNA Microdots. Nature, 1999. 399: p. 533-534.

58. Wong, P.C., K.-K. Wong, and H. Foote, Organic data memory using the DNA approach.

Communications of the ACM, 2003. 46(1): p. 95-98.

59. Modegi, T., Watermark embedding techniques for DNA sequences using codon usage bias

features, in 16th International Conference on Genome Informatics. 2005.

60. Ailenberg, M. and O.D. Rotstein, An improved Huffman coding method for archiving text,

images, and music characters in DNA. BioTechniques, 2009. 47: p. 747-754.



61. Yachie, N., Y. Ohashi, and M. Tomita, Stabilizing synthetic data in the DNA of living organisms. Systems and Synthetic Biology, 2008. 2: p. 19-25.

62. Shimanovsky, B., J. Feng, and M. Potkonjak, Hiding Data in DNA, in Revised papers from the 5th International Workshop on Information Hiding, Lecture Notes in Computer Science, IH. 2002: Noordwijkerhout, The Netherlands. p. 373-386.

63. Arita, M. and Y. Ohashi, Secret Signatures Inside Genomic DNA. Biotechnol. Prog., 2004. 20:

p. 1605-1607.

64. Heider, D. and A. Barnekow, DNA-based watermarks using the DNA-Crypt Algorithm.

Computer Journal of BMC Bioinformatics, 2007. 8(1): p. 176-187.

65. Chang, C.C., et al., Reversible Data Hiding Schemes for Deoxyribonucleic Acid (DNA)

Medium. International Journal of Innovative Computing, Information and Control, 2007. 3(5):

p. 1145-1160.

66. Shiu, H.J., et al., Data Hiding method based upon DNA sequences. ELSEVIER Information

Sciences, 2010. 180: p. 12.

67. Church, G.M., Y. Gao, and S. Kosuri, Next-Generation Digital Information Storage in DNA.

Science, 2012. 337.

68. Driscoll, A.O. and R. Sleator, Synthetic DNA: the next generation of big data storage. Bioengineered, 2013. 4: p. 123-125.

69. Bonnet, J., P. Subsoontorn, and D. Endy, Rewritable digital data storage in live cells via engineered control of recombination directionality. Proceedings of the National Academy of Sciences, 2012. 109: p. 8884-8889.

70. Freese, E., The specific mutagenic effect of base analogues on Phage T4. Journal of Molecular

Biology, 1959. 1(2): p. 87-105.

71. Halder, R., S. Pal, and A. Cortesi, Watermarking Techniques for Relational Databases:

Survey, Classification and Comparison. Journal of Universal Computer Science, 2010: p. 27.

72. Schneier, B., Applied Cryptography. 1996: John Wiley.

73. Yoo, M., Real-time task scheduling by multiobjective genetic algorithm. Journal of Systems

and Software, 2009. 82(4): p. 619-628.

74. Huffman, D.A., A Method for the Construction of Minimum-Redundancy Codes. Proceedings

of the IRE, 1952. 40(9): p. 1098-1101.

75. Rivest, R.L., A. Shamir, and L. Adleman, A method for obtaining digital signatures and

public-key cryptosystems. 1978, ACM. p. 120-126.

76. Frank, A. and A. Asuncion. UCI Machine Learning Repository. 2010; Available from: http://archive.ics.uci.edu/ml.



77. Morford, L., A theoretical application of selectable markers in bacterial episomes for a DNA

cryptosystem. Journal of Theoretical Biology, 2011. 273(1): p. 100–102.

78. Rakesh, A., J.H. Peter, and K. Jerry, Watermarking relational data: framework, algorithms

and analysis. 2003, Springer-Verlag New York, Inc. p. 157-169.

79. Sion, R., M.J. Atallah, and S. Prabhakar, Rights protection for categorical data. Knowledge

and Data Engineering, IEEE Transactions on, 2005. 17(7): p. 912-926.

80. Gehani, A., T.H. LaBean, and J.H. Reif, DNA Based Cryptography. Computer Journal of

IMACS DNA-Based Computer, American Mathematical Society, USA, 2004. 2950(456): p.

34-50.

81. Zhang, J.-H., L.-Y. Wu, and X.-S. Zhang, Reconstruction of DNA sequencing by

hybridization. Bioinformatics, 2003. 19(1): p. 14-21.

82. Clancy, S. and W. Brown, Translation: DNA to mRNA to Protein. Nature Education, 2008.

1(1).

83. Watson, J., et al., Molecular Biology of the Gene. 6th ed. 2008: Pearson/ Benjamin

Cummings.

84. Heider, D. and A. Barnekow, DNA Watermarking: Challenging Perspectives for

Biotechnological Applications. Current Bioinformatics, 2011. 6(3): p. 375.

85. Heider, D. and A. Barnekow, DNA watermarks: A proof of concept. BMC Molecular Biology,

2008. 9(5): p. 45-49.

86. Heider, D., D. Kessler, and A. Barnekow, Watermarking sexually reproducing diploid

organisms. Bioinformatics, 2008. 24(17): p. 1961-1962.

87. NCBI. GenBank. 2010 07-11-11 [cited 2012 06-03-12]; [GenBank® is the NIH genetic

sequence database, an annotated collection of all publicly available DNA sequences].

Available from: www.ncbi.nlm.nih.gov/nuccore.

88. Haughton, D. and F. Balado, Performance of DNA Data Embedding Algorithms under

Substitution Mutations, in 2010 IEEE International Conference on Bioinformatics and

Biomedicine Workshops. 2010: Hong Kong. p. 201-206.

89. Richard, M., et al., Optimality in DNA repair. Journal of Theoretical Biology, 2012. 292(7): p.

39-43.

90. Balado, F., Capacity of DNA Data Embedding Under Substitution Mutations. IEEE

Transactions on Information Theory, 2013. 59(2): p. 928-941.

91. Pera, L.L., P. Marcatili, and A. Tramontano, PCMI: mapping point mutations on genomes.

Bioinformatics, 2010. 26(22): p. 2904-2905.

92. Sklar, B., Digital Communications: Fundamentals and Applications. 2nd ed. 2001: Prentice-

Hall. 1070.

93. Kwan, H.K., R. Atwal, and B.Y.M. Kwan, Wavelet analysis of DNA sequence, in

International Conference on Communications, Circuits and Systems, ICCCAS. 2008: Fujian. p.

816-820.

94. Jleed, H. and S. Agaian, Prediction of Coding Region in the DNA Sequences, in 2011 IEEE

Conference on Systems, Man, and Cybernetics (SMC). 2011, IEEE: San Antonio,

TX, USA. p. 1128-1133.

95. Cui, W., New LZW Data Compression Algorithm and Its FPGA Implementation, in

Picture Coding Symposium. 2007.

96. Collin, L., A Quick Benchmark: Gzip vs. Bzip2 vs. LZMA. 2005.

97. Gonzalez, R.C. and R.E. Woods, Digital Image Processing. 2008, India: Pearson Education

Inc. 954.

98. Shimanovsky, B., J. Feng, and M. Potkonjak, Hiding Data in DNA, in Revised papers from the

5th International Workshop on Information Hiding, Lecture Notes in Computer Science, IH

2002: Noordwijkerhout, The Netherlands. p. 373-386.
