Statistics and Steganalysis
CSM25 Secure Information Hiding
Dr Hans Georg Schaathun
University of Surrey
Spring 2009 – Week 2
Dr Hans Georg Schaathun Statistics and Steganalysis Spring 2009 – Week 2 1 / 54
Learning Outcomes
After this session, everyone should
  have a basic understanding of statistical hypothesis testing
  understand how statistical methods apply to steganography
  be able to implement the basic χ2 test of steganalysis
  be able to interpret output from the χ2 test
Suggested Reading
Core Reading
Cox et al. Chapter 13.
Suggested Reading
Gouri K. Bhattacharyya and Richard A. Johnson: Statistical Concepts and Methods (Wiley Series in Probability and Statistics).
Jessica Fridrich, Miroslav Goljan, David Soukal: «Higher-order statistical steganalysis of palette images», in Proc. SPIE Electronic Imaging, Jan 2003, pp. 178-190.
Visual Steganalysis The LSB plane
The visual attack
Visual inspection is the simplest form of steganalysis
  Consider complete image
  Extract LSB plane (or other bit planes)
  Histogram etc.
In Exercise 2, you studied LSB planes
  What did you see?
These slides present some images I have inspected.
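Extracting a bit plane can be sketched in a few lines. This is a minimal illustration, assuming a greyscale image held as a list of rows of 8-bit integers; the function name `bit_plane` and the sample pixel values are made up for the example and do not come from the exercise.

```python
def bit_plane(image, bit=0):
    """Return one bit plane of a greyscale image as rows of 0/1 (bit 0 is the LSB)."""
    return [[(pixel >> bit) & 1 for pixel in row] for row in image]


# Hypothetical 2x4 greyscale image with 8-bit pixel values.
image = [
    [12, 13, 200, 201],
    [255, 0, 77, 128],
]

lsb_plane = bit_plane(image)      # least significant bit of every pixel
msb_plane = bit_plane(image, 7)   # most significant bit, for comparison
```

Displaying `lsb_plane` as a black-and-white image gives the kind of picture shown on the following slides.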
Structure in the Image (I)
Example from Wayner’s book
Structure in the Image (II)
Example from Wayner’s book
Comparing
Which is the stego-object?
Example uses EzStego (GIF).
A less good example (I)
Grayscale
Spatial Domain
LSB
Any structure?
A less good example (II)
Structure in the message
What are the vertical stripes?
Structure? Of what?
No relation to image...
  i.e. must relate to message.
Conclusion: the plaintext is structured.
What if we had compressed the plaintext?
Why is the message structured?
Message as 384x213 image
The stripes are there. Why?
How did we convert text to binary?
What has happened:
  All first-bits come first
  All seventh-bits come last
  Bit 6-7 is often zero
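The two ways of turning text into a bit string can be sketched as follows. This is an illustration only: it assumes 7-bit ASCII with the most significant bit counted as the first bit, which may differ from the convention used to produce the images, and all function names are hypothetical.

```python
def char_bits(ch, width=7):
    """Bits of one character, most significant bit first (assumed 7-bit ASCII)."""
    code = ord(ch)
    return [(code >> (width - 1 - i)) & 1 for i in range(width)]


def bits_char_by_char(text, width=7):
    """All bits of the first character, then all bits of the second, and so on."""
    return [b for ch in text for b in char_bits(ch, width)]


def bits_plane_by_plane(text, width=7):
    """The first bit of every character, then the second bit of every character, and so on."""
    rows = [char_bits(ch, width) for ch in text]
    return [row[i] for i in range(width) for row in rows]


by_char = "".join(str(b) for b in bits_char_by_char("ab"))
by_plane = "".join(str(b) for b in bits_plane_by_plane("ab"))
```

In the plane-by-plane ordering, a bit position that is nearly constant across the text produces a long run of equal bits, which is one way regular stripes can arise when the bits are laid out as an image.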
Different message structure
Character by character
Same message
Ordered character by character.
Is there structure?
Maybe. Definitely harder to exploit.
Can you detect it?
Conclusion
Message structures are visible in the stego-image.
Many kinds of structures
  Ratio of 1-s versus 0-s.
  Location of 0-s and 1-s.
Such structures disappear with compression
Textbooks focus on LSB of cover image
  Visible structures in the cover disappear in the stego-image
  Less obvious if the message is randomly and sparsely distributed.
We are looking for the unusual
Visual Steganalysis The Histogram
A typical image
Image histogram made by imhist in Matlab
Gives number of pixels per colour-value
And a stego-image
What happened?
Histogram of stego-image: More ragged
Every other bar sticks out.
Why?
50.8% 1-s in the binary message.
What is characteristic?
Pairs of values
Consider colour 2i (i = 0, 1, ..., 127)
What happens under LSB embedding?
  2i → 2i, 2i + 1
  Never 2i → 2i − 1.
Likewise 2i + 1 → 2i, 2i + 1
(2i, 2i + 1) is a Pair of Values
A pixel in (2i, 2i + 1) before embedding
  ... is a pixel in (2i, 2i + 1) after embedding
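The pair-of-values property can be checked with a small sketch of sequential LSB replacement. The function `embed_lsb` and the sample data are hypothetical; the sketch assumes a flat list of 8-bit pixel values and a message no longer than the pixel list.

```python
def embed_lsb(pixels, message_bits):
    """Overwrite the LSB of the first len(message_bits) pixels with the message bits."""
    stego = list(pixels)
    for n, bit in enumerate(message_bits):
        stego[n] = (stego[n] & ~1) | bit   # clear the LSB, then set it to the message bit
    return stego


cover = [2, 3, 2, 3, 100, 101, 254, 255]
message = [1, 1, 0, 0, 0, 1, 1, 0]
stego = embed_lsb(cover, message)

# A pixel with value 2i or 2i + 1 has pair index i = value // 2.
pairs_preserved = all(c // 2 == s // 2 for c, s in zip(cover, stego))
```

Every pixel keeps its pair index, so only the split within each pair can change.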
Visual Steganalysis Limitations
Visual methods
Advantages and Limitations
Human perception is very flexible
  can exploit the unexpected
  you don’t have to know what you look for
Manual work
  the process cannot be automated or computerised
How do you check a million images?
Statistics Statistical models
The remit of statistics
Statistics can estimate ‘normal’ behaviour and compare behaviours
Advantages
  Automated decisions
  Extract detail
  Exact, quantifiable features
  Aggregate measures
The fundamental question
Wendy the Warden intercepts an image.
Is it a probable, natural image?
Is it a probable stegogramme?
Depends on a model for natural images
  Statistical models and probability distributions
With a perfect model, one could design a cipher whose ciphertexts are distributed as natural images
If Wendy has a better model than Alice and Bob, then she can do effective steganalysis
In reality, we do not know what a natural image looks like
Statistics Pairs of Values
Pairs of Values
The statistic
Image $X$. Random variable $Y_k = \#\{(x, y) \mid X_{xy} = k\}$
The $Y_k$-s are the histogram.
Recall that $(2l, 2l + 1)$ is a pair of values.
First 7 pixel bits determined by image colour.
  i.e. which pair
Last bit (LSB) determined by message
  i.e. which half of the pair
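The histogram and the split of a pixel value into "which pair" and "which half" can be sketched as below. The names `histogram` and `split_pixel` are hypothetical, and the image is assumed to be a list of rows of 8-bit values.

```python
from collections import Counter


def histogram(image, levels=256):
    """Y[k] = number of pixels with value k, for k = 0 .. levels - 1."""
    counts = Counter(pixel for row in image for pixel in row)
    return [counts.get(k, 0) for k in range(levels)]


def split_pixel(value):
    """Return (l, lsb): the pair (2l, 2l + 1) the value belongs to, and which half."""
    return value >> 1, value & 1


image = [[4, 5, 5], [4, 4, 200]]
Y = histogram(image)
pair_sum = Y[4] + Y[5]   # pixels in the pair (4, 5); unchanged by LSB embedding
```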
Pairs of Values
Expected behaviour
Sum $Y_{2l} + Y_{2l+1}$ unaffected by embedding.
For a random message steganogram,
  Expect 50-50 $2l$ and $2l + 1$
  i.e. $E(Y_{2l}) = E(Y_{2l+1}) = \frac{1}{2}(Y_{2l} + Y_{2l+1})$
For a natural image, what can we expect?
In a given image, we can observe $Y_{2l}$.
  Is the observation probable?
Hypothesis testing
The principle
We have two possible hypotheses
1. H0: The image is a steganogram with random message
   Known distribution: $E(Y_{2l}) = E(Y_{2l+1}) = \frac{1}{2}(Y_{2l} + Y_{2l+1})$
2. H1: The image is a natural image
   Unknown distribution
Statistics allows us to answer
  are the observed $Y_{2l}$-s likely under H0?
We cannot ask a similar question under H1.
The χ2 test
Statistical hypothesis tests exist for many purposes
The χ2 test can
  compare different distributions
    i.e. the H0 distribution and the observed distribution
  aggregate several numbers
    i.e. Y2l for every l
The χ2 statistic
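Several random variables $F_0, F_1, \ldots, F_m$
Known expectations $E(F_0), E(F_1), \ldots, E(F_m)$

$$S = \sum_{i=0}^{m} \frac{(F_i - E(F_i))^2}{E(F_i)}$$

Definition

$$S_{\mathrm{PoV}} = \sum_{l=0}^{127} \frac{\left(Y_{2l} - \frac{1}{2}(Y_{2l} + Y_{2l+1})\right)^2}{\frac{1}{2}(Y_{2l} + Y_{2l+1})} = \sum_{l=0}^{127} \frac{1}{2}\,\frac{(Y_{2l} - Y_{2l+1})^2}{Y_{2l} + Y_{2l+1}}$$

χ2 distributed with m degrees of freedom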
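A minimal sketch of the pairs-of-values statistic, computed from a histogram given as a list of counts. The function name `pov_statistic` is hypothetical. Pairs that contain no pixels are skipped, since their term would be 0/0; the sketch also returns the number of pairs actually used, which is needed later to choose the degrees of freedom.

```python
def pov_statistic(hist):
    """Return (S_PoV, pairs_used) for a histogram hist[0 .. 2n - 1]."""
    s = 0.0
    pairs_used = 0
    for l in range(len(hist) // 2):
        even, odd = hist[2 * l], hist[2 * l + 1]
        total = even + odd
        if total == 0:
            continue                       # empty pair: the term is undefined, skip it
        s += 0.5 * (even - odd) ** 2 / total
        pairs_used += 1
    return s, pairs_used


hist = [0] * 256
hist[0], hist[1] = 10, 10    # balanced pair, contributes 0
hist[2], hist[3] = 30, 10    # unbalanced pair, contributes 0.5 * 20**2 / 40 = 5
s, pairs_used = pov_statistic(hist)
```

A small statistic means the pairs are close to the 50-50 split expected under H0.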
Making Conclusions
If the observed S is likely under χ2 distribution, the assumed distribution (and thus H0) is plausible
If the observed S is unlikely under χ2 distribution, H0 is implausible
The χ2 PDF
The Pairs-of-Values χ2 Distribution
χ2 PDF, 127 degrees of freedom
  Red: 2% prob.
  +Green: 5%
  +Blue: 10%
Cumulative Distribution Function (CDF)
  Area under the curve
The p-value
Let S be a stochastic χ2 distributed variable
Let s be the observed χ2 statistic
Define p-value: p = P(S ≥ s)
I.e. low p-value ⇒ s is unusually large
  Improbable if the image is a stegogramme.
  Conclusion: probably natural image
p-value illustrated
We read the statistic (χ2) on the x-axis
The p-value is the area under the PDF to the right
Compute it with chi2cdf
Corrections
You may have to
  exclude pixel values which do not occur
  have at least four pixels of each pair of values used
This keeps the χ2 distribution a good approximation
This reduces the degrees of freedom
χ2 in Matlab
Defined in the Statistics toolbox
Simplified functions available on website:
  chi2pdf (the PDF)
  chi2cdf(s,v) – P(S ≤ s) when S ∼ χ2(v)
  chi2inv(p,v) – s such that P(S ≤ s) = p
Note that the p-value is P(S ≥ s) = 1 − P(S ≤ s)
  use chi2cdf to calculate it
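Outside Matlab, the same p-value can be computed without a statistics library. The sketch below is a stand-in for 1 − chi2cdf(s, v), not the toolbox implementation: the name `chi2_sf` is hypothetical, and it only handles integer degrees of freedom, using the exact recurrence Q(a + 1, x) = Q(a, x) + x^a e^(−x) / Γ(a + 1) for the upper regularised gamma function.

```python
import math


def chi2_sf(s, dof):
    """P(S >= s) for S chi-square distributed with an integer number dof >= 1 of degrees of freedom."""
    if dof < 1 or dof != int(dof):
        raise ValueError("dof must be a positive integer")
    if s <= 0:
        return 1.0
    x = s / 2.0
    if dof % 2 == 0:
        q, a = math.exp(-x), 1.0               # Q(1, x) = exp(-x)
    else:
        q, a = math.erfc(math.sqrt(x)), 0.5    # Q(1/2, x) = erfc(sqrt(x))
    target = dof / 2.0
    while a < target:
        # Step from Q(a, x) to Q(a + 1, x); the term is evaluated in logs to avoid overflow.
        q += math.exp(a * math.log(x) - x - math.lgamma(a + 1.0))
        a += 1.0
    return min(q, 1.0)


p_value = chi2_sf(127.0, 127)   # a statistic near the mean gives a p-value a little below 0.5
```

A p-value close to 1 means the statistic is small and H0 (steganogram) is plausible; a p-value close to 0 means the statistic is too large for a steganogram.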
Statistics I visual approach
Part-image
The χ2 statistic is effective when the image is full of hidden information
What happens if only a small part is used?
Basic LSB embedding uses the first N pixels
We calculate the χ2 and p values for every N
The result can be plotted
  use plot or fplot in Matlab
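The prefix analysis can be sketched as follows, here computing only the χ2 statistic for growing N; the p-value is left out to keep the example self-contained. The function names and the synthetic cover are assumptions made for the illustration: the cover uses only even pixel values, so it is as far from a 50-50 split as possible, and the "message" is a fixed alternating bit pattern rather than a real random message.

```python
def pov_statistic(pixels, levels=256):
    """Return (S_PoV, pairs_used) for a flat list of pixel values."""
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    s, pairs_used = 0.0, 0
    for l in range(levels // 2):
        total = hist[2 * l] + hist[2 * l + 1]
        if total:                                  # skip pairs with no pixels
            s += 0.5 * (hist[2 * l] - hist[2 * l + 1]) ** 2 / total
            pairs_used += 1
    return s, pairs_used


def prefix_statistics(pixels, step):
    """(N, S_PoV, pairs_used) for the first N pixels, N = step, 2 * step, ..."""
    return [(n,) + pov_statistic(pixels[:n])
            for n in range(step, len(pixels) + 1, step)]


# Synthetic cover: 1600 pixels cycling through the even values 0, 2, ..., 14.
cover = [2 * (i % 8) for i in range(1600)]
# Embed in the first half only: the LSB alternates between cycles, giving a 50-50 split.
stego = [c | ((i // 8) % 2) if i < 800 else c for i, c in enumerate(cover)]

stats = prefix_statistics(stego, 800)
```

The statistic is zero over the embedded half and grows once untouched pixels are included, which is the behaviour the plots on the following slides illustrate.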
Plots: No message
[Figure panels: χ2 statistic; p-value]
Plots: 30% of capacity
[Figure panels: χ2 statistic; p-value]
Plots: 60% of capacity
[Figure panels: χ2 statistic; p-value]
Plots: 100% of capacity
[Figure panels: χ2 statistic; p-value]
Statistics Error types
Classification errors
Steganalysis is a binary classification problem
  identify an unknown object (image) as either
    suspicious
    innocent
Two error types
  False positive: an innocent image wrongly accused
  False negative: a «guilty» image not identified
Which type is most severe?
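Given a labelled test set, the two error rates can be estimated by simple counting. The sketch below is illustrative only; the function name `error_rates` and the sample labels are made up, and it assumes the test set contains at least one innocent and one stego image.

```python
def error_rates(is_stego, flagged):
    """Return (false_positive_rate, false_negative_rate).

    is_stego[i] is the ground truth and flagged[i] is the steganalyser's decision.
    """
    false_pos = sum(1 for truth, flag in zip(is_stego, flagged) if flag and not truth)
    false_neg = sum(1 for truth, flag in zip(is_stego, flagged) if truth and not flag)
    innocents = sum(1 for truth in is_stego if not truth)
    guilty = sum(1 for truth in is_stego if truth)
    return false_pos / innocents, false_neg / guilty


# Hypothetical test set: two stego images and three innocent ones.
is_stego = [True, True, False, False, False]
flagged = [True, False, True, False, False]
fp_rate, fn_rate = error_rates(is_stego, flagged)
```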
Hypothesis testing and errors
Hypothesis testing is a recurring theme in statistics.
Typical null hypotheses
  Treatment A makes patients recover no more quickly than no treatment.
  The climate in South-East Britain is as warm/cold today as it was 100 years ago.
  The image sent by Alice is a natural (innocent) image.
When the hypothesis has been phrased, experiments can tell us whether it is plausible or not.
Wrongly accepting the null hypothesis is the least serious error
Asymmetry of hypothesis testing
Treatment A makes patients recover more quickly than no treatment.
One error is more serious than another.
  Type I: Accepting the hypothesis (rejecting H0) when it is wrong
    Patients get ineffective (or unhealthy) medicine.
  Type II: Rejecting the hypothesis (retaining H0) when it is right
    More research will be made to optimise the treatment.

            H0 retained     H0 rejected
  H0 true   No error        Error Type I
  H0 false  Error Type II   No error
The weirdness of the steganalysis
H0: The message is a steganogram.
We consider it (implicitly) serious to declare the message innocent when it is a stegogramme.
Why?
  Makes strong surveillance regime.
  Might be appropriate for prison scenario.
Real reason
  Probability distribution known only for stegogrammes.
  We require known distribution under H0.
Postlogue
Randomised location
PoV assumes embedding in consecutive bits
Generalised χ2 proposes a fix
  Fridrich et al (2003) suggests an implementation
No rigid hypothesis test or statistical theory
  works experimentally
Summary
Steganalysis can be cast as a problem of statistics
  standard statistical theory applies
The Pairs-of-Values χ2 test is a simple example
The weekly exercise is to implement and test this steganalysis technique.
  See website for detailed assignment.