Statistics and Steganalysis
CSM25 Secure Information Hiding

Dr Hans Georg Schaathun

University of Surrey

Spring 2009 – Week 2

Dr Hans Georg Schaathun Statistics and Steganalysis Spring 2009 – Week 2 1 / 54

Learning Outcomes

After this session, everyone should
• have a basic understanding of statistical hypothesis testing
• understand how statistical methods apply to steganography
• be able to implement the basic χ² test of steganalysis
• be able to interpret output from the χ² test


Suggested Reading

Core Reading

Cox et al. Chapter 13.

Suggested Reading

Gouri K. Bhattacharyya and Richard A. Johnson: Statistical Concepts and Methods (Wiley Series in Probability and Statistics).

Suggested Reading

«Higher-order statistical steganalysis of palette images» by Jessica Fridrich, Miroslav Goljan, David Soukal in Proc. SPIE Electronic Imaging, Jan 2003, pp. 178-190


Visual Steganalysis The LSB plane

The visual attack

Visual inspection is the simplest form of steganalysis
• Consider the complete image
• Extract the LSB plane (or other bit planes)
• Histogram etc.

In Exercise 2, you studied LSB planes
• What did you see?

These slides present some images I have inspected.
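Extracting a bit plane can be sketched in a few lines. The snippet below is an illustration in Python rather than the Matlab used in the exercises; the function name `bit_plane` and the list-of-rows image layout are assumptions made for this sketch.

```python
def bit_plane(image, bit=0):
    """Return one bit plane of a grayscale image.

    `image` is assumed to be a list of rows of integers in 0..255;
    bit=0 selects the least significant bit (LSB), bit=7 the most significant.
    """
    return [[(pixel >> bit) & 1 for pixel in row] for row in image]


tiny = [[12, 13, 200],
        [255, 0, 77]]
print(bit_plane(tiny))         # [[0, 1, 0], [1, 0, 1]]
print(bit_plane(tiny, bit=7))  # [[0, 0, 1], [1, 0, 0]]
```

Displaying the resulting 0/1 matrix as a black-and-white image gives the kind of picture inspected on the following slides.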


Structure in the Image (I)
Example from Wayner’s book


Structure in the Image (II)
Example from Wayner’s book


Comparing
Which is the stego-object?

Example uses EzStego (GIF).


A less good example (I)

• Grayscale
• Spatial domain
• LSB
• Any structure?


A less good example (II)


Structure in the message

• What are the vertical stripes?
• Structure? Of what?
• No relation to the image...
  • i.e. they must relate to the message.
• Conclusion: the plaintext is structured.
• What if we had compressed the plaintext?


Why is the message structured?

• Message as a 384x213 image
• The stripes are there. Why?
• How did we convert text to binary?
• What has happened:
  • All first bits come first
  • All seventh bits come last
  • Bits 6-7 are often zero
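The two bit orderings can be compared with a small sketch. It assumes 7-bit ASCII with the least significant bit numbered first; the slides do not state the exact conversion used, and all function names here are hypothetical.

```python
def to_bits(text, width=7):
    """One row of bits per character, least significant bit first (assumed order)."""
    return [[(ord(ch) >> k) & 1 for k in range(width)] for ch in text]


def plane_by_plane(text, width=7):
    """All first bits, then all second bits, ..., all seventh bits last."""
    rows = to_bits(text, width)
    return [row[k] for k in range(width) for row in rows]


def char_by_char(text, width=7):
    """All bits of the first character, then all bits of the second, ..."""
    return [bit for row in to_bits(text, width) for bit in row]


print(plane_by_plane("AB"))  # [1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
print(char_by_char("AB"))    # [1, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 1]
```

With the plane-by-plane ordering, any bit position that is nearly constant across the text produces a long uniform run in the bit stream, which is one plausible source of visible bands.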


Different message structure
Character by character

• Same message
• Ordered character by character.
• Is there structure?
• Maybe. Definitely harder to exploit.


Can you detect it?


Conclusion

• Message structures are visible in the stego-image.
• Many kinds of structures
  • Ratio of 1s versus 0s.
  • Location of 0s and 1s.
• Such structures disappear with compression
• Textbooks focus on the LSB of the cover image
  • Visible structures in the cover disappear in the stego-image
  • Less obvious if the message is randomly and sparsely distributed.
• We are looking for the unusual


Visual Steganalysis The Histogram

A typical image

• Image histogram made by imhist in Matlab
• Gives the number of pixels per colour value
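As a rough Python stand-in for the counts that such a histogram displays (the function name `histogram` and the list-of-rows image layout are assumptions for this sketch, not part of the course material):

```python
def histogram(image, levels=256):
    """Count how many pixels take each colour value 0..levels-1.

    `image` is assumed to be a list of rows of integer grey levels.
    """
    counts = [0] * levels
    for row in image:
        for pixel in row:
            counts[pixel] += 1
    return counts


tiny = [[0, 1, 1],
        [255, 1, 0]]
counts = histogram(tiny)
print(counts[0], counts[1], counts[255])  # 2 3 1
```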


And a stego-image


What happened?

• Histogram of stego-image: more ragged
• Every other bar sticks out.
• Why?
• 50.8% 1s in the binary message.


What is characteristic?
Pairs of values

• Consider colour 2i (i = 0, 1, ..., 127)
• What happens under LSB embedding?
  • 2i → 2i or 2i + 1
  • Never 2i → 2i − 1.
• Likewise 2i + 1 → 2i or 2i + 1
• (2i, 2i + 1) is a Pair of Values
• A pixel in (2i, 2i + 1) before embedding
  • ... is a pixel in (2i, 2i + 1) after embedding
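The pair-preserving property can be checked with a short sketch. It assumes plain LSB replacement in the first pixels of a flat pixel list; `embed_lsb` and `pair_index` are hypothetical helper names.

```python
import random


def embed_lsb(pixels, bits):
    """Replace the LSB of the first len(bits) pixels with the message bits."""
    stego = list(pixels)
    for i, bit in enumerate(bits):
        stego[i] = (stego[i] & ~1) | bit   # clear the LSB, then set it to the bit
    return stego


def pair_index(pixel):
    """Colours 2i and 2i + 1 share the index i."""
    return pixel >> 1


rng = random.Random(0)
cover = [rng.randrange(256) for _ in range(1000)]
message = [rng.randrange(2) for _ in range(1000)]
stego = embed_lsb(cover, message)

# Every pixel stays inside its pair of values.
print(all(pair_index(c) == pair_index(s) for c, s in zip(cover, stego)))  # True
```

Because only the lowest bit is overwritten, the upper seven bits, and hence the pair index, never change.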


Visual Steganalysis Limitations

Visual methods
Advantages and Limitations

• Human perception is very flexible
  • can exploit the unexpected
  • you don’t have to know what you are looking for
• Manual work
  • the process cannot be automated or computerised
• How do you check a million images?


Statistics Statistical models

The remit of statistics

• Statistics can estimate ‘normal’ behaviour
  • and compare behaviours
• Advantages
  • Automated decisions
  • Extract detail
  • Exact, quantifiable features
  • Aggregate measures


The fundamental question

Wendy the Warden intercepts an image.

Is it a probable, natural image?

Is it a probable stegogramme?

Depends on a model for natural images
• Statistical models and probability distributions

With a perfect model, one could design
• a cipher with ciphertexts distributed as natural images

If Wendy has a better model than Alice and Bob,
• then she can do effective steganalysis

In reality, we do not know what a natural image looks like


Statistics Pairs of Values

Pairs of Values
The statistic

• Image $X$. Random variable $Y_k = \#\{(x, y) \mid X_{xy} = k\}$
• The $Y_k$ form the histogram.
• Recall that $(2l, 2l + 1)$ is a pair of values.
• The first 7 pixel bits are determined by the image colour.
  • i.e. which pair
• The last bit (LSB) is determined by the message
  • i.e. which half of the pair


Pairs of Values
Expected behaviour

• The sum $Y_{2l} + Y_{2l+1}$ is unaffected by embedding.
• For a random-message steganogram,
  • expect a 50-50 split between $2l$ and $2l + 1$
  • i.e. $E(Y_{2l}) = E(Y_{2l+1}) = \frac{1}{2}(Y_{2l} + Y_{2l+1})$
• For a natural image, what can we expect?
• In a given image, we can observe $Y_{2l}$.
  • Is the observation probable?


Hypothesis testing
The principle

We have two possible hypotheses
1. H0: The image is a steganogram with a random message
   • Known distribution: $E(Y_{2l}) = E(Y_{2l+1}) = \frac{1}{2}(Y_{2l} + Y_{2l+1})$
2. H1: The image is a natural image
   • Unknown distribution

Statistics allows us to answer
• are the observed $Y_{2l}$ likely under H0?

We cannot ask a similar question under H1.


The χ² test

• Statistical hypothesis tests exist for many purposes
• The χ² test can
  • compare different distributions
    • i.e. the H0 distribution and the observed distribution
  • aggregate several numbers
    • i.e. $Y_{2l}$ for every $l$


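The χ² statistic

• Several random variables $F_0, F_1, \ldots, F_m$
• Known expectations $E(F_0), E(F_1), \ldots, E(F_m)$

\[ S = \sum_{i=0}^{m} \frac{(F_i - E(F_i))^2}{E(F_i)} \]

Definition

\[ S_{\mathrm{PoV}} = \sum_{l=0}^{127} \frac{\left(Y_{2l} - \frac{1}{2}(Y_{2l} + Y_{2l+1})\right)^2}{\frac{1}{2}(Y_{2l} + Y_{2l+1})} = \sum_{l=0}^{127} \frac{\frac{1}{2}(Y_{2l} - Y_{2l+1})^2}{Y_{2l} + Y_{2l+1}} \]

• $S$ is χ² distributed with $m$ degrees of freedom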
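The simplified form of the Pairs-of-Values sum translates directly into code. The following is a minimal Python sketch, not the Matlab implementation asked for in the exercise; the name `pov_statistic` is hypothetical, and skipping empty pairs (to avoid dividing by zero) is a choice made for this sketch.

```python
def pov_statistic(hist):
    """Pairs-of-Values chi-square statistic from a histogram.

    `hist[k]` is the number of pixels with colour k; pairs are (2l, 2l + 1).
    Pairs with no pixels at all are skipped, since their term is undefined.
    """
    s = 0.0
    for l in range(len(hist) // 2):
        even, odd = hist[2 * l], hist[2 * l + 1]
        if even + odd == 0:
            continue
        s += 0.5 * (even - odd) ** 2 / (even + odd)
    return s


hist = [0] * 256
hist[10], hist[11] = 30, 10   # unbalanced pair: contributes 0.5 * 20**2 / 40
hist[20], hist[21] = 10, 10   # balanced pair: contributes 0
print(pov_statistic(hist))    # 5.0
```

Balanced pairs contribute nothing, so a fully embedded image with a random message tends to give a small statistic, while unbalanced pairs push it up.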


Making Conclusions

• If the observed S is likely under the χ² distribution,
  • the assumed distribution (and thus H0) is plausible
• If the observed S is unlikely under the χ² distribution,
  • H0 is implausible


The χ² PDF


The Pairs-of-Values χ² Distribution

• χ² PDF
• 127 degrees of freedom
• Red: 2% prob.
• +Green: 5%
• +Blue: 10%
• Cumulative Distribution Function (CDF)
  • Area under the curve


The p-value

• Let S be a stochastic χ² distributed variable
• Let s be the observed χ² statistic
• Define the p-value: p = P(S ≥ s)
• I.e. low p-value ⇒ s is unusually large
  • Improbable if the image is a stegogramme.
  • Conclusion: probably a natural image


p-value illustrated

• We read the statistic (χ²) on the x-axis
• The p-value is the area under the PDF to the right
• Compute it with chi2cdf


Corrections

You may have to
• exclude pixel values which do not occur
• have at least four pixels of each pair of values used

• This keeps the χ² distribution a good approximation
• This reduces the degrees of freedom
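One way to apply these corrections is sketched below in Python. It reads the "four pixels" rule as a minimum total count per pair, which is an interpretation rather than something the slide spells out; `corrected_pov` and `min_count` are hypothetical names.

```python
def corrected_pov(hist, min_count=4):
    """Return (statistic, degrees of freedom) using only well-populated pairs.

    A pair (2l, 2l + 1) is used only if it holds at least `min_count` pixels
    in total (assumed reading of the rule). Each excluded pair removes one
    term from the sum and one degree of freedom.
    """
    s, used = 0.0, 0
    for l in range(len(hist) // 2):
        even, odd = hist[2 * l], hist[2 * l + 1]
        if even + odd < min_count:
            continue
        s += 0.5 * (even - odd) ** 2 / (even + odd)
        used += 1
    return s, max(used - 1, 0)


# Pairs (0,1) and (2,3) are too sparse; (4,5) and (6,7) are kept.
print(corrected_pov([0, 0, 1, 2, 30, 10, 8, 8]))  # (5.0, 1)
```

The degrees of freedom returned here follow the convention of the slides, where 128 pairs give 127 degrees of freedom.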


χ² in Matlab

• Defined in the Statistics toolbox
• Simplified functions available on website:
  • chi2pdf (the PDF)
  • chi2cdf(s,v): P(S ≤ s) when S ∼ χ²(v)
  • chi2inv(p,v): s such that P(S ≤ s) = p
• Note that the p-value is P(S ≥ s) = 1 − P(S ≤ s)
  • use chi2cdf to calculate it
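For readers without Matlab, the right-tail probability P(S ≥ s) can also be computed with the Python standard library alone. The sketch below is not the course's chi2cdf; `chi2_sf` is a hypothetical name, and it relies on the recurrence Q(a + 1, z) = Q(a, z) + z^a e^(-z) / Γ(a + 1) for the regularised upper incomplete gamma function, started from Q(1, z) = e^(-z) for even degrees of freedom or Q(1/2, z) = erfc(√z) for odd ones.

```python
import math


def chi2_sf(s, dof):
    """P(S >= s) for S chi-square distributed with `dof` >= 1 degrees of freedom."""
    if s <= 0:
        return 1.0
    z = s / 2.0
    if dof % 2 == 0:
        a, q = 1.0, math.exp(-z)               # Q(1, z)
    else:
        a, q = 0.5, math.erfc(math.sqrt(z))    # Q(1/2, z)
    while a < dof / 2.0:
        # Add z**a * exp(-z) / Gamma(a + 1), computed in log space for stability.
        q += math.exp(a * math.log(z) - z - math.lgamma(a + 1.0))
        a += 1.0
    return min(q, 1.0)


print(chi2_sf(2 * math.log(2), 2))  # close to 0.5
print(chi2_sf(127.0, 127))          # a little below 0.5
```

In Matlab terms this corresponds to 1 - chi2cdf(s, v).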


Statistics I visual approach

Part-image

• The χ² statistic is effective when the image is full of hidden information
• What happens if only a small part is used?
• Basic LSB embedding uses the first N pixels
  • We calculate the χ² and p values for every N
• The result can be plotted
  • use plot or fplot in Matlab
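The scan over growing prefixes can be sketched as follows, in Python rather than Matlab and with the plotting left out. All function names are hypothetical, the minimum pair count of four is an assumed reading of the correction rule, and the right-tail probability is computed with a standard-library recurrence instead of chi2cdf.

```python
import math


def chi2_sf(s, dof):
    """P(S >= s) for a chi-square variable with `dof` >= 1 degrees of freedom."""
    if s <= 0:
        return 1.0
    z = s / 2.0
    if dof % 2 == 0:
        a, q = 1.0, math.exp(-z)
    else:
        a, q = 0.5, math.erfc(math.sqrt(z))
    while a < dof / 2.0:
        q += math.exp(a * math.log(z) - z - math.lgamma(a + 1.0))
        a += 1.0
    return min(q, 1.0)


def pov_test(pixels, min_count=4):
    """Return (statistic, p-value) for a flat list of 8-bit pixel values."""
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    s, used = 0.0, 0
    for l in range(128):
        even, odd = hist[2 * l], hist[2 * l + 1]
        if even + odd < min_count:     # assumed reading of the "four pixels" rule
            continue
        s += 0.5 * (even - odd) ** 2 / (even + odd)
        used += 1
    if used < 2:
        return s, None                 # too few pairs for a meaningful test
    return s, chi2_sf(s, used - 1)


def scan(pixels, step):
    """Run the test on the first N pixels for N = step, 2*step, ..."""
    return [(n,) + pov_test(pixels[:n])
            for n in range(step, len(pixels) + 1, step)]


# Toy cover using only even colours, so every pair is badly unbalanced.
cover = [2 * (i % 20) for i in range(4000)]
for n, s, p in scan(cover, 1000):
    print(n, s, p)
```

Each tuple holds N, the statistic, and the p-value; plotting the second and third entries against the first gives the two curves discussed on the following slides.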


Plots
No message

[Figure panels: χ² statistic; p-value]


Plots
30% of capacity

[Figure panels: χ² statistic; p-value]


Plots
60% of capacity

[Figure panels: χ² statistic; p-value]


Plots
100% of capacity

[Figure panels: χ² statistic; p-value]


Statistics Error types

Classification errors

• Steganalysis is a binary classification problem
  • identify an unknown object (image) as either
    • suspicious
    • innocent
• Two error types
  • False positive: an innocent image wrongly accused
  • False negative: a «guilty» image not identified
• Which type is most severe?
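Given p-values for a set of images whose true status is known, the two error rates can be counted as in this sketch. It assumes the convention p = P(S ≥ s) with the stego hypothesis as H0, so an image is flagged as suspicious when H0 is not rejected; the threshold `alpha` and the function name are hypothetical.

```python
def error_rates(p_values, is_stego, alpha=0.05):
    """Return (false positive rate, false negative rate).

    An image is flagged as suspicious when p >= alpha, i.e. when the
    hypothesis "this is a steganogram" is not rejected.
    """
    false_pos = false_neg = innocents = guilty = 0
    for p, stego in zip(p_values, is_stego):
        flagged = p >= alpha
        if stego:
            guilty += 1
            false_neg += not flagged   # a steganogram that slipped through
        else:
            innocents += 1
            false_pos += flagged       # an innocent image wrongly accused
    fp_rate = false_pos / innocents if innocents else 0.0
    fn_rate = false_neg / guilty if guilty else 0.0
    return fp_rate, fn_rate


print(error_rates([0.9, 0.01, 0.5, 0.001], [True, True, False, False]))  # (0.5, 0.5)
```

Raising `alpha` trades false positives for false negatives, which is why the question of which error is more severe matters.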


Hypothesis testing and errors

• Hypothesis testing is a recurring theme in statistics.
• Typical null hypotheses
  • Treatment A makes patients recover no more quickly than no treatment.
  • The climate in South-East Britain is as warm/cold today as it was 100 years ago.
  • The image sent by Alice is a natural (innocent) image.
• When the hypothesis has been phrased,
  • experiments can tell us whether it is plausible or not.
• Wrongly accepting the null hypothesis is the less serious error


Asymmetry of hypothesis testing

Claim: Treatment A makes patients recover more quickly than no treatment.

• One error is more serious than the other.
• Type I: Accepting this claim (rejecting H0) when it is wrong
  • Patients get ineffective (or unhealthy) medicine.
• Type II: Rejecting this claim (retaining H0) when it is right
  • More research will be made to optimise the treatment.

            H0 retained     H0 rejected
H0 true     No error        Error Type I
H0 false    Error Type II   No error


The weirdness of steganalysis

H0: The message is a steganogram.

• We consider it (implicitly) serious to declare the message innocent when it is a stegogramme.
• Why?
  • Makes for a strong surveillance regime.
  • Might be appropriate for a prison scenario.
• Real reason
  • The probability distribution is known only for stegogrammes.
  • We require a known distribution under H0.


Postlogue

Randomised location

• PoV assumes embedding in consecutive bits
• Generalised χ² proposes a fix
  • Fridrich et al. (2003) suggest an implementation
• No rigid hypothesis test or statistical theory
  • works experimentally


Summary

• Steganalysis can be cast as a problem of statistics
  • standard statistical theory applies
• The Pairs-of-Values χ² test is a simple example
• The weekly exercise is to implement and test this steganalysis technique.
  • See website for detailed assignment.
