Watermarking Relational Databases CSC 574/474 Information System Security



Cryptography vs. Steganography

Cryptography
  Encryption: translate information into an unintelligible form
  Decryption: decode to retrieve the information
  Attackers cannot recover the information

Steganography
  Hide information in a seemingly ordinary message
  “Security through obscurity”: attackers don’t know where to find the information

Steganography Examples

Greek messengers
  Message tattooed onto a shaved head
Invisible ink in a cover letter
Bits hidden in pictures
  Sounds familiar?
Hide one image inside another
  Least significant bits
Other forms?
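One common way to hide one image inside another with least significant bits can be sketched as follows. This is only an illustration: the function names, the choice of 4 hidden bits, and the flat lists of 8-bit grayscale pixel values are assumptions, not something the slides specify.

```python
def hide(cover, secret, bits=4):
    """Put the top `bits` bits of each secret pixel into the low bits of the cover pixel."""
    keep_mask = 0xFF & ~((1 << bits) - 1)  # clears the low `bits` bits of a cover pixel
    return [(c & keep_mask) | (s >> (8 - bits)) for c, s in zip(cover, secret)]


def reveal(stego, bits=4):
    """Recover an approximation of the secret by moving the low bits back to the top."""
    low_mask = (1 << bits) - 1
    return [(p & low_mask) << (8 - bits) for p in stego]


# Hypothetical 8-bit grayscale pixels (0..255).
cover = [200, 13, 255, 96]
secret = [255, 0, 128, 64]

stego = hide(cover, secret)
recovered = reveal(stego)
```

Each stego pixel differs from its cover pixel by less than 2^bits, which is why the change is hard to see, and the recovered image keeps only the top `bits` bits of the secret.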

Example: image downgrading (figures not reproduced)

Courtesy: http://www.petitcolas.net/fabien/steganography/image%5Fdowngrading/old/
Courtesy: http://www.petitcolas.net/fabien/steganography/image_downgrading/index.html

Illustration of a Steganographic System (figure not reproduced)

http://www.vu.union.edu/~shoemakc/watermarking/watermarking.html

Digital Watermarks

Insert marks into the original data
Used to demonstrate ownership: images, video, audio, software…
  Other uses?
Should not significantly affect the quality of the original data
Should not be easy to destroy
Deter rather than prevent illegal copying

Watermarking Databases

Why?
  Data in a database is intellectual property
Is it possible?
  Some numerical data does not need to be precise to be useful
    Example?
  Some data is imprecise by nature
    Example?

What Makes Watermarking Databases Different

Dealing with multiple objects (tuples) instead of one
Tuple order does not matter
After part of the database is dropped, the remaining part is still valuable

Desirable Features

Detectability
  Allow some marks to be undetectable
Robustness
  Benign updates, malicious attacks
Incremental updatability
  No need to recompute watermarks during updates

Desirable Features

Imperceptibility
  Preserve the usefulness of the database
Blind system
  Does not need the original database for detection
Key-based system
  The watermarking scheme is open
  Only the private key matters

Attacks

Benign updates
Malicious attacks
  Bit attack
  Rounding attack
  Subset attack
  Mix-and-match attack
  Additive attack
  Invertibility attack

Basic Setup

n tuples, v numerical attributes, P: primary key
e: number of least significant bits available for marking
1/r: fraction of tuples marked
w: number of marked tuples (n/r)
a: confidence parameter
t: minimum number of correct marks for detection
H: a one-way hash function
K: private key
F: a MAC function, F(m) = H(K || H(K || m))
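The MAC construction F(m) = H(K || H(K || m)) can be sketched in a few lines. SHA-256 is an assumption here, since the slides do not name a hash function, and the key and message values are hypothetical.

```python
import hashlib


def H(data: bytes) -> bytes:
    """One-way hash. SHA-256 is an assumption; the slides do not name one."""
    return hashlib.sha256(data).digest()


def F(key: bytes, message: bytes) -> int:
    """F(m) = H(K || H(K || m)), returned as a non-negative integer."""
    return int.from_bytes(H(key + H(key + message)), "big")


key = b"hypothetical-private-key"
value = F(key, b"1001")  # a primary key value encoded as bytes
```

Returning an integer makes it convenient to take the result modulo r, v, or e when selecting tuples, attributes, and bits.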

Watermark Insertion Algo.

Tuple layout: P | A1 | A2 | … | Ai | … | Av

(1) If F(P) mod r = 0, the tuple should be marked
(2) Choose to mark attribute Ai, where i = F(P) mod v
(3) Choose the jth bit of Ai to mark, where j = F(P) mod e

Bits of Ai: bk bk-1 … be-1 … bj … b1

bj = 0 if H(K || P) is even
bj = 1 otherwise
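The three insertion steps can be sketched as below. This is a minimal illustration under stated assumptions: SHA-256 stands in for H, a relation is modeled as a dict from primary key to a list of integer attributes, attribute and bit indices are zero-based (bit 0 is the least significant), and all names and sample data are hypothetical.

```python
import hashlib


def H(data: bytes) -> int:
    # One-way hash as an integer; SHA-256 is an assumption.
    return int.from_bytes(hashlib.sha256(data).digest(), "big")


def F(key: bytes, pk: bytes) -> int:
    # F(P) = H(K || H(K || P))
    return H(key + hashlib.sha256(key + pk).digest())


def mark_position(key, pk, r, v, e):
    """Return (attribute index, bit index, mark bit), or None if the tuple is not marked."""
    pk_bytes = str(pk).encode()
    f = F(key, pk_bytes)
    if f % r != 0:                                    # step (1)
        return None
    bit = 0 if H(key + pk_bytes) % 2 == 0 else 1      # even -> 0, odd -> 1
    return f % v, f % e, bit                          # steps (2) and (3)


def insert_watermark(rows, key, r, v, e):
    """Return a marked copy of `rows`, a dict {primary key: [attribute values]}."""
    marked = {}
    for pk, attrs in rows.items():
        attrs = list(attrs)
        pos = mark_position(key, pk, r, v, e)
        if pos is not None:
            i, j, bit = pos
            attrs[i] = (attrs[i] & ~(1 << j)) | (bit << j)  # force bit j to the mark
        marked[pk] = attrs
    return marked


# Hypothetical relation: 200 tuples, 3 integer attributes each.
key = b"hypothetical-private-key"
rows = {pk: [pk * 7 % 1000, pk * 13 % 1000, pk * 31 % 1000] for pk in range(1, 201)}
marked = insert_watermark(rows, key, r=4, v=3, e=2)
```

Because only one of the e lowest bits of one attribute is touched per marked tuple, every value changes by less than 2^e, and re-running the insertion leaves the relation unchanged.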

Watermark Detection Algo.

1. Determine whether a tuple is marked
2. Determine which attribute is marked
   1. totalcount++
3. Determine which bit is marked
4. Check whether the jth bit is the same as the expected mark
   1. If so, matchcount++
5. Check whether the threshold t is met

How to determine threshold t?
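The detection steps can be sketched as below. As with any such illustration, the details are assumptions: SHA-256 stands in for H, a relation is a dict from primary key to a list of integer attributes, indices are zero-based, and `embed` is a hypothetical helper that exists only to produce a marked relation to test against.

```python
import hashlib


def H(data: bytes) -> int:
    # One-way hash as an integer; SHA-256 is an assumption.
    return int.from_bytes(hashlib.sha256(data).digest(), "big")


def F(key: bytes, pk: bytes) -> int:
    # F(P) = H(K || H(K || P))
    return H(key + hashlib.sha256(key + pk).digest())


def embed(rows, key, r, v, e):
    """Hypothetical helper: mark a copy of `rows` so detection has something to find."""
    out = {}
    for pk, attrs in rows.items():
        attrs, pk_bytes = list(attrs), str(pk).encode()
        f = F(key, pk_bytes)
        if f % r == 0:
            i, j, bit = f % v, f % e, H(key + pk_bytes) % 2
            attrs[i] = (attrs[i] & ~(1 << j)) | (bit << j)
        out[pk] = attrs
    return out


def detect(rows, key, r, v, e, t):
    """Return (totalcount, matchcount, detected) for a suspect relation."""
    totalcount = matchcount = 0
    for pk, attrs in rows.items():
        pk_bytes = str(pk).encode()
        f = F(key, pk_bytes)
        if f % r != 0:                          # step 1: tuple is not marked
            continue
        i = f % v                               # step 2: which attribute
        totalcount += 1
        j = f % e                               # step 3: which bit
        expected = H(key + pk_bytes) % 2        # 0 if even, 1 otherwise
        if ((attrs[i] >> j) & 1) == expected:   # step 4: compare with expected mark
            matchcount += 1
    return totalcount, matchcount, matchcount >= t   # step 5: threshold check


key = b"hypothetical-private-key"
rows = {pk: [pk * 7 % 1000, pk * 13 % 1000, pk * 31 % 1000] for pk in range(1, 401)}
marked = embed(rows, key, r=4, v=3, e=2)
total, matches, _ = detect(marked, key, r=4, v=3, e=2, t=0)
_, _, detected = detect(marked, key, r=4, v=3, e=2, t=total)
```

Detection needs only the suspect relation and the private key, not the original data, which is what makes the scheme blind.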

Operations on Watermarked Databases

Query?
Updates?
  Insertion
  Deletion
  Modification

How to Determine Threshold t

1. The probability that bj is not changed by watermarking is __

2. Out of w checks, the probability that exactly t match by chance is __

3. What is the probability that the detection algorithm makes a wrong decision?

bj = 0 if H(K || P) is even
bj = 1 otherwise

How to Determine Threshold t

1. $0.5$
2. $\binom{w}{t} \cdot 0.5^{w}$
3. $\left(\binom{w}{t} + \binom{w}{t+1} + \dots + \binom{w}{w}\right) \cdot 0.5^{w}$    (1)

Let a be the tolerable error rate. We have to choose the minimum t such that (1) < a.
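The choice of t can be computed directly from expression (1). The sketch below assumes, as the slide does, that each checked bit matches by chance with probability 0.5; the function names are hypothetical.

```python
from math import comb


def false_hit_probability(w: int, t: int) -> float:
    """Probability that at least t of w checked bits match purely by chance."""
    return sum(comb(w, i) for i in range(t, w + 1)) / 2 ** w


def min_threshold(w: int, a: float) -> int:
    """Smallest t whose false hit probability is below a; w + 1 if no t <= w qualifies."""
    for t in range(w + 1):
        if false_hit_probability(w, t) < a:
            return t
    return w + 1
```

For example, with w = 10 and a = 0.01 the function returns 10, because requiring 9 matches still leaves a chance probability of 11/1024, which is just above 0.01.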

Robustness Against Attacks

Bit-flipping attack
  Choose s of the n tuples and flip all e least significant bits
  The chance of erasing the watermark is

  $\sum_{i=w-t+1}^{w} \frac{\binom{w}{i} \binom{n-w}{s-i}}{\binom{n}{s}}$
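The sum above is a hypergeometric tail: the attack succeeds when at least w - t + 1 of the s chosen tuples are marked ones. A minimal sketch for evaluating it exactly, with a hypothetical function name and the assumption that s <= n:

```python
from fractions import Fraction
from math import comb


def erase_probability(n: int, w: int, t: int, s: int) -> Fraction:
    """Chance that s randomly chosen tuples (out of n) include at least w - t + 1 marked ones."""
    first = max(0, w - t + 1)
    last = min(w, s)  # cannot hit more marked tuples than were chosen
    total = sum(comb(w, i) * comb(n - w, s - i) for i in range(first, last + 1))
    return Fraction(total, comb(n, s))
```

For a toy case with n = 10, w = 4, t = 3, and s = 5, the result is 186/252 = 31/42.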

Mix-and-Match Attack

Mallory takes a fraction k of the database
Mixes it with his own relation
Creates a new relation of size n

For Alice to detect the watermark:
  $k \cdot \frac{n}{r} + 0.5 \, (1-k) \, \frac{n}{r} \ge t$
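Writing w = n/r, the condition rearranges to w(1 + k)/2 >= t, so Alice needs k >= 2t/w - 1. The sketch below takes the inequality at face value as an expected match count; the function names and sample numbers are hypothetical.

```python
from fractions import Fraction


def expected_matches(n: int, r: int, k) -> Fraction:
    """Expected matching marks: all of Alice's share plus half of Mallory's by chance."""
    w = Fraction(n, r)
    return k * w + Fraction(1, 2) * (1 - k) * w


def min_fraction_for_detection(n: int, r: int, t: int) -> Fraction:
    """Smallest fraction k of Alice's data for which the expected matches reach t."""
    w = Fraction(n, r)
    return max(Fraction(0), 2 * t / w - 1)
```

With n = 1000, r = 10, and t = 60, Mallory must keep at least one fifth of Alice's tuples for the expected number of matches to reach the threshold.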

Additive Attack

Mallory inserts his own watermark into Alice’s database

How to determine who is the original owner?
  If the two watermarking schemes mark the same bit of the same tuple
    Then?

Invertibility Attack

Mallory finds a key that yields a satisfactory watermark on the database
Affected by a
  The larger a is, is it easier or harder to find such a key?

Design Tradeoffs

Change   Effects
↓ a      ↓ false hits    ↑ missed watermarks
↓ r      ↑ robustness    ↑ data errors
↑ v      ↑ robustness
↑ e      ↑ robustness    ↑ data errors

Comments on the Paper

Simple yet effective idea
Thorough analysis
  Coming up with a good approach is hard
  Analyzing, validating, and completing the approach is even harder
No data on key length and hash function
  What is their impact on performance?

Discussion

Possible attacks
  Frequent updates of the same tuple?
  Side channels
    Watermarking a tuple requires extra time
Basic assumption
  The owner’s database is secure
Regulations or laws regarding database copyright?

Discussion

How to handle non-numerical data
  Every change is significant
  But we have to make changes
    Minimize the number of changes
    Encode the message in cross-tuple properties
      E.g., attribute frequency histogram

Discussion

Watermarking semi-structured data, e.g., XML?
  Attribute or element values can be similarly watermarked
  Defining the key is an issue
  The structure of the semi-structured data may also need to be watermarked

Further Reading

Rakesh Agrawal and Jerry Kiernan. Watermarking Relational Databases. International Conference on Very Large Data Bases (VLDB), 2002.

Radu Sion. Rights Assessment for Discrete Digital Data. Ph.D. thesis, Purdue University.
