When Discrete Optimization Meets Multimedia Security (and Beyond)

Dr Shujun Li (李树钧)

Deputy Director, Surrey Centre for Cyber Security (SCCS)

Senior Lecturer, Department of Computer Science

http://www.hooklee.com/

@hooklee75

https://twitter.com/hooklee75

Optimization meets Multimedia Security

The Original Research Problem: Missing DCT Coefficients in Images

How does a digital camera work?

[Figure: Lens → Shutter and Diaphragm → Image Sensor → Image Encoding → Image Storage/Transmission (…001110101001…)]

http://en.wikipedia.org/wiki/File:Matrixw.jpg

http://en.wikipedia.org/wiki/File:Iris_Diaphragm.gif

http://en.wikipedia.org/wiki/File:Verguetung1.jpg

http://en.wikipedia.org/wiki/File:MicroSD_MemoryCard_002.jpg

Image encoding pipeline

[Figure: Raw Image → Encoded Image (…11011001…), via Pre-Processing, Predictive Coding, Lossy Coding, Lossless Coding and Post-Processing stages]

Transform in lossy image coding

[Figure: Encoder: Block Division → Forward Transform → Quantization → Truncation → Lossless Encoding. Decoder: Lossless Decoding → Complement → Inverse Quantization → Inverse Transform → Block Composition.]

Some coefficients can be discarded!

DCT as the most widely used transform

- The DCT (Discrete Cosine Transform) has been found to be one of the best decorrelating transforms for image and video coding.

[Figure: histogram of DCT coefficient amplitudes; x-axis: Amplitude of DCT coefficients (0 to 20); y-axis: Number of DCT coefficients (×10^4)]
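The decorrelation claim can be illustrated with energy compaction: for a smooth block, almost all of the energy lands in a few low-frequency DCT coefficients. A minimal pure-Python sketch, assuming the orthonormal DCT-II (the helper names `dct_matrix`, `dct2` and `energy_in_top_left` are hypothetical, not from the talk):

```python
import math

N = 8  # block size used by JPEG

def dct_matrix(n=N):
    """Orthonormal DCT-II basis: C[k][m] = a(k) * cos(pi * (2m + 1) * k / (2n))."""
    rows = []
    for k in range(n):
        a = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append([a * math.cos(math.pi * (2 * m + 1) * k / (2 * n)) for m in range(n)])
    return rows

def matmul(a, b):
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def transpose(a):
    return [list(row) for row in zip(*a)]

def dct2(block):
    """2-D DCT of a square block: Y = C X C^T."""
    c = dct_matrix(len(block))
    return matmul(matmul(c, block), transpose(c))

def energy_in_top_left(coeffs, k):
    """Fraction of the total energy held by the k x k lowest-frequency coefficients."""
    total = sum(v * v for row in coeffs for v in row)
    low = sum(coeffs[i][j] ** 2 for i in range(k) for j in range(k))
    return low / total

# A smooth block (a horizontal ramp), typical of natural image content.
block = [[100 + 4 * j for j in range(N)] for _ in range(N)]
Y = dct2(block)
print(round(Y[0][0], 3))         # 912.0, i.e. N * mean = 8 * 114
print(energy_in_top_left(Y, 2))  # very close to 1
```

With an orthonormal DCT the total energy is preserved, so the printed fraction directly measures how concentrated the representation is.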

JPEG image coding (DCT based)

- JPEG images are coded as blockwise (8×8) DCT coefficients.

[Figure: 8×8 Blocks → Blockwise 8×8 DCT → Quantizer (with Quantization Table) → Entropy Encoder → JPEG Image]

Quantization table:

 16  11  10  16  24  40  51  61
 12  12  14  19  26  58  60  55
 14  13  16  24  40  57  69  56
 14  17  22  29  51  87  80  62
 18  22  37  56  68 109 103  77
 24  35  55  64  81 104 113  92
 49  64  78  87 103 121 120 101
 72  92  95  98 112 100 103  99
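The quantizer step can be sketched in a few lines. The table on this slide is the example luminance table from Annex K of the JPEG standard; the helper names and the made-up coefficient values below are illustrative assumptions, and real encoders also level-shift pixels by 128 before the DCT, which is omitted here.

```python
import math

# Example luminance quantization table (JPEG standard, Annex K).
Q_LUMA = [
    [16, 11, 10, 16, 24, 40, 51, 61],
    [12, 12, 14, 19, 26, 58, 60, 55],
    [14, 13, 16, 24, 40, 57, 69, 56],
    [14, 17, 22, 29, 51, 87, 80, 62],
    [18, 22, 37, 56, 68, 109, 103, 77],
    [24, 35, 55, 64, 81, 104, 113, 92],
    [49, 64, 78, 87, 103, 121, 120, 101],
    [72, 92, 95, 98, 112, 100, 103, 99],
]

def round_half_away(v):
    """Round to the nearest integer, with halves rounded away from zero."""
    return int(math.copysign(math.floor(abs(v) + 0.5), v))

def quantize(coeffs, q=Q_LUMA):
    """Encoder side: divide each DCT coefficient by its step size and round."""
    return [[round_half_away(c / s) for c, s in zip(crow, qrow)]
            for crow, qrow in zip(coeffs, q)]

def dequantize(levels, q=Q_LUMA):
    """Decoder side: multiply each quantized level by its step size."""
    return [[lev * s for lev, s in zip(lrow, qrow)] for lrow, qrow in zip(levels, q)]

coeffs = [[0.0] * 8 for _ in range(8)]
coeffs[0][0], coeffs[0][1], coeffs[0][7] = -112.0, -72.9, 3.0  # made-up values
levels = quantize(coeffs)
print(levels[0][0], levels[0][1], levels[0][7])  # -7 -7 0
```

The large steps at high frequencies are why many high-frequency coefficients quantize to zero.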

Finally the problem!

- What if some DCT coefficients are missing/unknown at the decoder side?
- This can happen in a number of scenarios.
- When an image is selectively encrypted, one or more DCT coefficients in some or all blocks are often encrypted, so to an attacker those encrypted DCT coefficients are missing.
- … (I will come back to other scenarios later!)

Why selective encryption?

- To achieve format compliance
- To achieve perceptual encryption (different levels of selective encryption → different levels of perceptual quality degradation)
- To achieve fast encryption with minimum bitrate control (re-compression)
- To facilitate joint compression-encryption or encryption of compressed images
- To allow other image processing operations between encryption and decryption without revealing the key
- …

Why attackers?

- In the selective encryption context, we need to evaluate the security of a selective encryption method by looking at how much encrypted information an attacker can recover.
- We assume the attacker has no information other than the ciphertext (the encrypted image), i.e., ciphertext-only attacks!
- In the literature, known- and chosen-plaintext attacks have been well studied, but ciphertext-only attacks have not.

Examples of selective encryption

- Lena (512×512): encrypting the DC coefficients vs. encrypting the first 5 most significant DCT coefficients (i.e., DC + the first 4 AC coefficients)
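Which positions count as "the first 4 AC coefficients" depends on the scan order. Assuming the JPEG zigzag order is meant (an assumption; the slide does not say), the encrypted positions can be listed with a small helper; `zigzag_order` is a hypothetical name.

```python
def zigzag_order(n=8):
    """(row, col) positions of an n x n block in JPEG zigzag scan order."""
    order = []
    for s in range(2 * n - 1):  # s = row + col identifies one anti-diagonal
        diag = [(i, s - i) for i in range(n) if 0 <= s - i < n]  # row increasing
        # Even anti-diagonals are scanned bottom-left to top-right, odd ones the reverse.
        order.extend(reversed(diag) if s % 2 == 0 else diag)
    return order

# Hypothetical selective-encryption mask: the DC plus the first 4 AC coefficients.
encrypted_positions = zigzag_order()[:5]
print(encrypted_positions)  # [(0, 0), (0, 1), (1, 0), (2, 0), (1, 1)]
```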

Naïve ciphertext-only attack: Error concealment attack (1)

- Simply set all encrypted DC coefficients to zero (or another more appropriate value).

Naïve ciphertext-only attack: Error concealment attack (2)

- Simply set all encrypted DCT coefficients to zero (or another more appropriate value).

A smarter ciphertext-only attack: USO (Uehara, Safavi-Naini and Ogunbona) method

- T. Uehara, R. Safavi-Naini, and P. Ogunbona, “Recovering DC coefficients in block-based DCT,” IEEE Transactions on Image Processing, vol. 15, no. 11, pp. 3592-3596, 2006

How does the USO method work?

- Property 1: The difference between two neighboring pixels is a Laplacian variate with zero mean and a small variance.

[Figure: histogram of differences between neighboring pixels; x-axis from −100 to 100, y-axis in units of 10^4]

- Property 2: The range of pixel values calculated only from the AC coefficients constrains the value of the DC coefficient:
- $N\,(t_{\min} - \min(B^*)) \le \mathrm{DC}(B) \le N\,(t_{\max} - \max(B^*))$
- N: block size (e.g., N = 8 for 8×8 blocks)
- $[t_{\min}, t_{\max}]$: valid range of pixel values (0 and 255 for 8-bit gray-scale images)
- B and B*: a block and its DC-free version
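A worked instance of Property 2 with made-up numbers: N = 8, an 8-bit image ($[t_{\min}, t_{\max}] = [0, 255]$), and a DC-free block B* whose values span [−10, 20]:

```latex
\begin{aligned}
8\,\bigl(0 - (-10)\bigr) \;\le\; &\mathrm{DC}(B) \;\le\; 8\,(255 - 20)\\
80 \;\le\; &\mathrm{DC}(B) \;\le\; 1880
\end{aligned}
```

Assuming an orthonormal DCT, where DC(B) = N × (block mean), this says the block mean must lie in [10, 235], which is exactly the condition that no pixel of the block falls outside [0, 255].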

- Step 1: Choose a corner block as the initial reference block B0, and estimate the DC coefficients of all the other blocks relative to DC(B0).
- Step 2: Calculate the valid DC ranges of all blocks and then intersect them to get the range of DC(B0). Take the midpoint and adjust the whole image accordingly.
- Step 3: Repeat Steps 1 and 2 for the four corner blocks and then average the results.
- Step 4: If there are pixel values outside the valid range, apply scaling or clipping.
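Steps 1 and 2 can be sketched in a deliberately simplified 1-D setting: one row of blocks, block means instead of DC values (DC = N × mean for an orthonormal DCT), and the hypothetical rule that the two pixels straddling each block boundary are equal, as a crude stand-in for Property 1. This is not the published USO estimator; Steps 3 and 4 are omitted, and `uso_row` is a hypothetical name.

```python
def uso_row(residual_blocks, tmin=0.0, tmax=255.0):
    """Simplified 1-D sketch of USO Steps 1 and 2 (hypothetical, not the published code).

    residual_blocks: DC-free blocks B* (pixel offsets from each block mean),
    ordered left to right, starting with the reference block B0.
    """
    # Step 1: estimate every block mean relative to the mean of B0, assuming the
    # two pixels on either side of each block boundary are equal.
    rel = [0.0]
    for prev, cur in zip(residual_blocks, residual_blocks[1:]):
        rel.append(rel[-1] + prev[-1] - cur[0])
    # Step 2: Property 2 bounds each block mean; shift each bound by the block's
    # relative offset to get a bound on the mean of B0, then intersect them all.
    lo = max(tmin - min(b) - r for b, r in zip(residual_blocks, rel))
    hi = min(tmax - max(b) - r for b, r in zip(residual_blocks, rel))
    m0 = (lo + hi) / 2.0  # midpoint of the intersection
    return [m0 + r + v for b, r in zip(residual_blocks, rel) for v in b]

pixels = uso_row([[-15, -5, 5, 15], [-15, -5, 5, 15]])
print(pixels)  # [97.5, 107.5, 117.5, 127.5, 127.5, 137.5, 147.5, 157.5]
```

If the intervals do not overlap (lo > hi), the midpoint still exists but some pixels will fall out of range, which is what Step 4 then has to repair.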

Is the USO method perfect? Not quite!

- Recovered pixel value ranges:
- [−83.0, 338.0]
- [−90.3, 345.3]
- [−92.0, 347.0]
- [−136.3, 391.3]

How bad can the result be?

- Pixel value range: [−88.6, 303.0]
- PSNR = 14.3 dB
- SSIM = 0.732
- MS-SSIM = 0.711

(Full-reference objective VQA (Visual Quality Assessment) metrics)

[Figure: Original vs. Recovered]
We Can Do Better!

Discrete Optimization + USO

An improved USO method (ICIP 2010)

FRM: Flow Rate Minimization

- Step 1: Do USO Step 1, but adjust (if necessary) the estimated DC of each block so that no pixel value under/overflows.
- Step 2: Repeat Step 1 for different values of DC(B0) to minimize the blockwise under/overflow rate, where B0 is the first block of each scan.
- Step 3: Same as Step 3 of the USO method.

Shujun Li, Junaid Jameel Ahmad, Dietmar Saupe and C.-C. Jay Kuo, “An Improved DC Recovery Method from AC Coefficients of DCT-Transformed Images,” in Proceedings of 2010 17th IEEE International Conference on Image Processing (ICIP 2010, Hong Kong, China, September 26-29, 2010), pp. 2085-2088, 2010
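FRM Steps 1 and 2 can be sketched in the same simplified 1-D setting (block means instead of DC values, and the hypothetical rule that pixels on either side of a block boundary are equal) as a scan over candidate means for B0. Here the "blockwise under/overflow rate" is interpreted as the fraction of blocks whose estimated mean had to be clamped; that interpretation, and the name `frm_row`, are assumptions.

```python
def frm_row(residual_blocks, candidates, tmin=0.0, tmax=255.0):
    """Simplified 1-D sketch of FRM Steps 1 and 2 (hypothetical).

    For each candidate mean of B0, chain the block means, clamp any block whose
    pixels would under/overflow, and count such blocks.
    Returns (flow_rate, best_candidate, block_means) for the lowest flow rate.
    """
    best = None
    for m0 in candidates:
        means, flows = [], 0
        for k, b in enumerate(residual_blocks):
            # Step 1: B0 takes the candidate; the others follow the boundary rule.
            m = m0 if k == 0 else means[-1] + residual_blocks[k - 1][-1] - b[0]
            lo, hi = tmin - min(b), tmax - max(b)  # Property 2 range of the mean
            if m < lo or m > hi:
                flows += 1
                m = min(max(m, lo), hi)  # adjust so that no pixel under/overflows
            means.append(m)
        rate = flows / len(residual_blocks)  # blockwise under/overflow rate
        if best is None or rate < best[0]:
            best = (rate, m0, means)
    return best

rate, m0, means = frm_row([[-10, 10]] * 3, candidates=range(0, 256, 5))
print(rate, m0, means)  # 0.0 10 [10, 30, 50]
```

Ties keep the first candidate reached; the real method may break ties differently.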

Why does FRM work?

- Minimum under-/overflow rate ≈ ground truth of DC(B0)

[Figure: four plots of the under-/overflow rate against the estimate of DC(B0), from 0 to 2000]

USO vs. FRM

- PSNR: 14.3 → 23.2
- SSIM: 0.732 → 0.900
- MS-SSIM: 0.711 → 0.924

[Figure: Original, USO and FRM results]

USO vs. FRM: 200 test images

- Statistically FRM > USO (and perceptually as well)

[Figure: per-image differences (FRM − USO) on the 200 test images for ten quality metrics, with mean differences: PSNR 1.57, SSIM 0.0248, MS-SSIM 0.0567, WSNR 1.52, NQM 2.07, IFC 0.166, VIF 0.0317, VIFP 0.0265, UQI 0.0345, VSNR 2.04]

Is FRM perfect?

- Unknown DC coefficients only
- Optimization of DC(B0) only → the visual quality of the recovered image is still not always satisfactory.
We Can Do Even Better!

A Completely New Approach Based on Global Discrete Optimization

A general optimization model for any missing DCT coefficients (ICIP 2011)

- Parameters: image size – M×N
- Variables: pixels – x(i,j), DCT coefficients – y(k,l)
- Objective: minimize $f = \sum_{(i,j),(i',j')} |x(i,j) - x(i',j')|$
- (i,j) and (i',j') are coordinates of neighboring pixels
- Constraints:
- $x = Ay$ – the 2-D DCT for each block
- $x_{\min} \le x(i,j) \le x_{\max}$ – the valid range of pixel values
- $y(k,l) = y^*(k,l)$ – the known DCT coefficients

Shujun Li, Andreas Karrenbauer, Dietmar Saupe and C.-C. Jay Kuo, “Recovering Missing Coefficients in DCT-Transformed Images,” in Proceedings of 2011 18th IEEE International Conference on Image Processing (ICIP 2011, Brussels, Belgium, September 11-14, 2011), pp. 1569-1572, 2011
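The constraint x = Ay can be made concrete for a single 8×8 block. A minimal NumPy sketch, assuming an orthonormal DCT-II and row-major vectorization of each block (both are assumptions; the slides do not fix these conventions). For a whole image, A would be block-diagonal with one copy of this 64×64 matrix per block.

```python
import numpy as np

def dct_matrix(n=8):
    """Orthonormal DCT-II matrix C: a block X has coefficients Y = C X C^T."""
    k = np.arange(n)[:, None]
    m = np.arange(n)[None, :]
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * m + 1) * k / (2 * n))
    c[0, :] = np.sqrt(1.0 / n)
    return c

def block_idct_operator(n=8):
    """Matrix A with vec(X) = A @ vec(Y) for one n x n block (row-major vec).

    X = C^T Y C, and row-major vectorization gives vec(P Y Q) = kron(P, Q^T) vec(Y),
    so A = kron(C^T, C^T).
    """
    c = dct_matrix(n)
    return np.kron(c.T, c.T)

A = block_idct_operator()
y = np.zeros(64)
y[0] = 8 * 128.0              # DC only; DC = N * mean for the orthonormal 8x8 DCT
x = A @ y                     # the constraint x = A y for this block
print(np.allclose(x, 128.0))  # True
```

Because A is orthogonal, known coefficients y*(k,l) simply fix some entries of y, and the unknown ones remain free LP variables.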

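Why $\sum |x(i,j) - x(i',j')|$?

- Property 1: The difference between two neighboring pixels is a Laplacian variate with zero mean and a small variance.
- Theorem: Given S observations $z_1, \dots, z_S$ of a zero-mean Laplacian distribution, the maximum likelihood estimator (MLE) of its scale parameter $b$ (the variance is $2b^2$) is $\hat{b} = \frac{1}{S}\sum_{i=1}^{S} |z_i|$, so the MLE of the variance is $2\hat{b}^2$.
- Minimizing $\sum |x(i,j) - x(i',j')|$ is therefore equivalent to minimizing the estimated Laplacian scale (and variance) of the neighboring-pixel differences, in line with Property 1.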
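The estimator in the theorem is just the mean absolute value. A small pure-Python check with toy data (the helper names and the values in `diffs` are made up) shows that it indeed minimizes the negative log-likelihood of a zero-mean Laplacian, $S\log(2b) + \sum|z_i|/b$:

```python
import math

def laplace_scale_mle(z):
    """MLE of the scale b of a zero-mean Laplacian: b_hat = (1/S) * sum |z_i|."""
    return sum(abs(v) for v in z) / len(z)

def laplace_nll(z, b):
    """Negative log-likelihood of zero-mean Laplacian data with scale b."""
    return len(z) * math.log(2 * b) + sum(abs(v) for v in z) / b

diffs = [3, -1, 0, 2, -4, 1, -2, 0, 1, -1]  # toy neighboring-pixel differences
b_hat = laplace_scale_mle(diffs)
print(b_hat)           # 1.5
print(2 * b_hat ** 2)  # 4.5, the MLE of the variance (variance = 2 b^2)
```

Since the estimate grows with the sum of absolute differences, any candidate image with a smaller sum yields a smaller estimated scale, which is the link to the objective f.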

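The model can be linearized into a linear programming (LP) problem

- Variables: x(i,j), y(k,l), and new auxiliary variables h(i,j,i',j')
- New objective: minimize $f = \sum_{(i,j),(i',j')} h(i,j,i',j')$
- (i,j) and (i',j') are coordinates of neighboring pixels
- New constraints:
- $x = Ay$ – the blockwise 2-D DCT
- $x_{\min} \le x \le x_{\max}$ – the valid range of pixel values
- $y(k,l) = y^*(k,l)$ – the known DCT coefficients
- $x(i,j) - x(i',j') \le h(i,j,i',j')$
- $x(i',j') - x(i,j) \le h(i,j,i',j')$

Together, the last two constraints give $h(i,j,i',j') \ge |x(i,j) - x(i',j')| \ge 0$, and at the optimum each h equals the corresponding absolute difference.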
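A toy instance of the linearized model can be handed to any LP solver. This sketch uses SciPy's `scipy.optimize.linprog` (HiGHS backend) in place of CPLEX and makes strong simplifying assumptions: a 1-D signal instead of an image, only DC coefficients unknown (U=1), and block means as the unknowns instead of DC values (DC = N × mean for an orthonormal DCT). The function `recover_means_lp` is hypothetical.

```python
import numpy as np
from scipy.optimize import linprog

def recover_means_lp(residual_blocks, xmin=0.0, xmax=255.0):
    """Toy 1-D, DC-only instance of the linearized model (hypothetical sketch).

    residual_blocks: known DC-free blocks; pixel x = m_b + r for block mean m_b.
    LP variables: the block means m_b, then one h_k per neighboring pixel pair.
    Minimize sum(h) s.t. +/-(x_k - x_{k+1}) <= h_k and xmin <= x <= xmax.
    """
    r = np.concatenate([np.asarray(b, dtype=float) for b in residual_blocks])
    owner = np.concatenate([np.full(len(b), k) for k, b in enumerate(residual_blocks)])
    nb, nh = len(residual_blocks), r.size - 1
    c = np.concatenate([np.zeros(nb), np.ones(nh)])  # objective: sum of h_k
    a_ub, b_ub = [], []
    for k in range(nh):
        # x_k - x_{k+1} = (m_owner[k] - m_owner[k+1]) + (r_k - r_{k+1})
        row = np.zeros(nb + nh)
        row[owner[k]] += 1.0
        row[owner[k + 1]] -= 1.0
        d = r[k] - r[k + 1]
        up = row.copy()
        up[nb + k] = -1.0    #  (x_k - x_{k+1}) - h_k <= 0
        down = -row
        down[nb + k] = -1.0  # -(x_k - x_{k+1}) - h_k <= 0
        a_ub += [up, down]
        b_ub += [-d, d]
    # xmin <= m_b + r <= xmax becomes a simple bound on each mean (cf. Property 2).
    bounds = [(xmin - min(b), xmax - max(b)) for b in residual_blocks] + [(0, None)] * nh
    res = linprog(c, A_ub=np.array(a_ub), b_ub=np.array(b_ub), bounds=bounds, method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return res.x[:nb][owner] + r, res.fun

x, f = recover_means_lp([[-15, -5, 5, 15], [-15, -5, 5, 15]])
print(np.round(x, 2), round(f, 6))
```

Here the optimal objective is 60 (six fixed intra-block steps of 10, plus a zero-cost block boundary), while the overall brightness of x is whichever optimal vertex the solver returns: that is the free variable discussed on the following slide. Note that the LP joins the two blocks seamlessly even though the original ramp had a step of 10 there.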

One remaining problem

- One free variable
- Global brightness: adding the same constant to every pixel leaves every neighboring difference, and hence f, unchanged.
- This does not influence the visual quality of the recovered image.
- But we need to handle this issue.
- Solution
- Shift the histogram towards the center of the valid range of pixel values, $(x_{\min}+x_{\max})/2$, until the left and right margins are equal.
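The histogram-centering fix amounts to a single constant shift. A minimal sketch (the name `center_histogram` is hypothetical), assuming the LP solution already fits inside the valid range so that both margins are non-negative:

```python
def center_histogram(x, xmin=0.0, xmax=255.0):
    """Shift all pixel values by one constant so that the margins below min(x)
    and above max(x) inside [xmin, xmax] are equal. The objective f is unchanged,
    because every neighboring-pixel difference is shift-invariant."""
    shift = (xmin + xmax - min(x) - max(x)) / 2.0
    return [v + shift for v in x]

print(center_histogram([10.0, 20.0, 40.0]))  # [112.5, 122.5, 142.5]
```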

- Any linear programming solver can be used, e.g.:
- IBM ILOG CPLEX
- Commercial software, but with an academic program
- C/C++/MATLAB API
- MATLAB function linprog (in Optimization Toolbox)
- …
- Complexity (number of pixels: n; number of unknown DCT coefficients: U)
- Time complexity: $O(n^2 U)$
- Space complexity: $O(nU)$

FRM vs. LP: U=1

- PSNR: 22.8142 → 26.4866
- SSIM: 0.9022 → 0.9580
- MS-SSIM: 0.8983 → 0.9461

[Figure: Original, FRM and LP results]

FRM vs. LP: U=1 (200 test images)

- Statistically LP > FRM (> USO)

Recovering more than DC: U>1

- Naive recovery: the U unknown DCT coefficients are set to the midpoints of their valid ranges (LN/2 for DC coefficients and 0 for all AC coefficients)
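The naive baseline is easy to state in code. This sketch assumes that L in LN/2 is the maximum pixel value (pixel range [0, L], e.g. L = 255) and an orthonormal DCT, so that the DC range of an 8×8 block is [0, 8L]; both readings, and the name `naive_fill`, are assumptions.

```python
def naive_fill(coeffs, known, n=8, pixel_max=255.0):
    """Naive recovery of one n x n block of DCT coefficients (hypothetical helper).

    coeffs: n x n coefficients (values at unknown positions are ignored).
    known:  n x n booleans, True where the coefficient is available.
    Unknown DC -> L*N/2 with L = pixel_max (midpoint of its valid range);
    unknown AC -> 0 (midpoint of a range symmetric around zero).
    """
    out = [row[:] for row in coeffs]  # leave the input untouched
    for k in range(n):
        for l in range(n):
            if not known[k][l]:
                out[k][l] = pixel_max * n / 2.0 if (k, l) == (0, 0) else 0.0
    return out

coeffs = [[5.0] * 8 for _ in range(8)]
known = [[True] * 8 for _ in range(8)]
known[0][0] = known[0][1] = False  # DC and one AC coefficient are missing
filled = naive_fill(coeffs, known)
print(filled[0][0], filled[0][1])  # 1020.0 0.0
```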

Recovering more than DC: U>1

- LP-based recovery
- Note that no existing method other than the naïve one can recover more than the DC coefficients.

Is the LP based method perfect?

- LP is not as practical as you might think!
- Both time and space complexity become too high for real-time applications when n becomes relatively large (e.g., just 512×512 or 1024×1024).
- For 512×512 images, our implementation based on IBM CPLEX requires 10~30 seconds and >300 MB of memory to solve the easiest problem (DC recovery, U=1).
- When U=2, with “Lenna” as the input: out of memory on my old laptop!
- So we need an algorithm with even lower time/space complexity!

A faster algorithm for DC recovery (ALENEX 2012)

- When U=1, it is possible to convert the LP problem into a combinatorial optimization problem on a min-cost flow network.
- The time complexity is reduced to $O(n^{1.5})$.
- Experiments showed that the actual time/space complexity is reduced drastically.
- For 512×512 images, the gain is on the order of 100.
- New research question
- Can we do a similar thing when U>1?

Sabine Cornelsen, Andreas Karrenbauer and Shujun Li, “Leveling the Grid,” in Proceedings of the Meeting on Algorithm Engineering & Experiments, Kyoto, Japan, January 16, 2012 (ALENEX 2012), pp. 45-54, SIAM, 2012

A faster algorithm for U>1

- Divide-and-conquer can help!
- Step 1: Partition the large image into smaller regions (segments)
- Step 2: Run the LP based DCT recovery method on each segment
- Step 3: Run a second LP DC recovery pass on the whole image
- Ongoing with University of Malaya
- Initial results are positive
- To submit to IEEE Signal Processing Letters
- Will extend to a longer journal paper

Can this be applied to other selective encryption settings?

- Whole DCT coefficients
- Partial DCT coefficients: residuals
- Partial DCT coefficients: sign bits
- Position of DCT coefficients: secret permutations
- …

Integer unknowns → the LP problem becomes a MIP (mixed integer programming) problem, which is NP-hard and so can be much harder to solve!

Recovering DCT sign bits is possible!

- Our recent (unpublished) work on recovering DCT sign bits showed positive results.

Recovering secretly permuted DCT coefficients is possible as well!

- Our recent (unpublished) work on recovering secret permutations of DCT coefficients (within each block) showed positive results.
A More General Research Problem: Missing Information in Digital Media with (Partially) Known Structure


A more general model

- Variables: pixels – x(i,j), DCT coefficients – y(k,l)
- Constraints:
- x=Ay – the 2-D DCT for each block

Changing these will lead to different applications!

From selective encryption to other areas

- (Ciphertext-only attacks on) selective encryption
- Whole DCT coefficients
- Partial DCT coefficients: residuals, sign bits, …
- Position of DCT coefficients: secret permutations
- Information hiding
- Irreversible information hiding
- Content authentication and self-recovery watermarking
- Image compression (coding)
- Leave some information about DCT coefficients uncoded to achieve higher compression efficiency
- Anti-forensics?

From one channel to multiple ones

- All our work focuses on gray-scale images (one “color” channel).
- Generalization to images with multiple color channels is straightforward.
- Each color channel can have a separate optimization process.
- There is normally cross-channel correlation as well, so the multiple optimization processes may be linked in some way.
- Note that some images have more than three channels (e.g., multi-spectral images).

Cosentino, 2014

http://heritagesciencejournal.springeropen.com/articles/10.1186/2050-7445-2-8

From digital images to other media

- Digital video
- In addition to the spatial domain, there is now a temporal domain, where correlations between adjacent frames can also be considered.
- Different quality levels (if they exist) can bring further correlations that can be exploited.
- The intra- and inter-predictive coding methods widely used in video coding schemes can make the optimization model difficult to handle.
- Motion compensation may cause complications.
- Digital audio
- Simpler media (1-D)
- Often part of digital video

From DCT to other transforms

- Lapped transforms such as MDCT (Modified DCT)
- Used in audio coding such as MP3 and some image/video coding schemes, e.g., JPEG-XR and VC-1
- DHT (Discrete Hadamard Transform)
- Used in some image/video coding schemes such as JPEG-XR and H.264/MPEG-4 AVC
- DWT (Discrete Wavelet Transform)
- Used in some image/video coding schemes, e.g., JPEG 2000, DjVu and Dirac
- DST (Discrete Sine Transform)
- Used in HEVC
- …

From transforms back to spatial domain

- We do not have to involve a transform in the optimization model.
- What is more important is the known structure of the missing information.
- We will look at an example where we effectively work in the spatial domain.
- It calls for significant changes to the optimization model.
Generalization to Digital Watermarking: Self-Recovery

Hui Wang, Anthony TS Ho and Shujun Li, “A Novel Image Restoration Scheme Based on Structured Side Information and Its Application to Image Watermarking,” Signal Processing: Image Communication, vol. 29, no. 7, pp. 773-787, Elsevier, 2014

Digital watermarking in spatial domain

- A content authentication and self-recovery image watermarking scheme (Wang et al. IWDW 2011)
- It embeds the mean pixel value of each 4×4 block as a self-recovery watermark and uses a linear regression based method for self-recovery.

[Figure: Embedding; Extraction/Recovery]


Can the previous LP based method be generalized for this application?

- Variables: pixels – x(i,j)
- Constraints:
- $W_2(k) = \frac{1}{16}\sum_{(i,j)\in B(k)} x(i,j)$ – the extracted self-recovery watermark for each 4×4 block
- x(i,j)=x*(i,j) – for known pixel values
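The watermark constraint can be sketched directly: W2(k) is the mean of block B(k), and a candidate restoration is feasible only if every 4×4 block mean matches the extracted watermark. A minimal pure-Python sketch, assuming image sides are multiples of 4 and that the extracted watermark holds exact block means; `block_means` and `satisfies_watermark` are hypothetical names.

```python
def block_means(img, bs=4):
    """Mean of every bs x bs block (image sides assumed to be multiples of bs).

    With bs = 4 this is W2(k) = (1/16) * sum of x(i,j) over block B(k).
    """
    h, w = len(img), len(img[0])
    return [[sum(img[i][j] for i in range(bi, bi + bs) for j in range(bj, bj + bs)) / (bs * bs)
             for bj in range(0, w, bs)]
            for bi in range(0, h, bs)]

def satisfies_watermark(img, w2, bs=4, tol=1e-9):
    """Check the equality constraints: every block mean equals its watermark value."""
    means = block_means(img, bs)
    return all(abs(m - t) <= tol for mrow, trow in zip(means, w2) for m, t in zip(mrow, trow))

img = [[i * 8 + j for j in range(8)] for i in range(8)]
w2 = block_means(img)
print(w2)  # [[13.5, 17.5], [45.5, 49.5]]
```

In the LP these become one linear equality per tampered block, alongside x(i,j) = x*(i,j) for the untouched pixels.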

The simple LP model works but is not perfect

- It tends to assign all pixel values in each 4×4 block to the mean (the extracted watermark), thus creating visible blocking artefacts.

[Figure: Original, Tampered, Self-Recovered]

Experimental results with 100 test images

- Simple LP model > Linear regression based method


A revised model

- Variables: pixels – x(i,j), 1st and 2nd order auxiliary variables – h(i,j,i',j') and h'(i,j,i',j')
- Objective: minimize $f = \sum_{i,j} \big((1-\lambda)\,h(i,j) + \lambda\,h'(i,j)\big)$
- Here, $\lambda$ is a weight between 0 and 1.
- Constraints:
- The same ones as in the simple LP model
- $f(x(i,j)) \le h(i,j)$ and $-f(x(i,j)) \le h(i,j)$
- $f(h(i,j)) \le h'(i,j)$ and $-f(h(i,j)) \le h'(i,j)$
- Here, $f(x(i,j)) = \sum_{d_i\in\{-1,1\}}\sum_{d_j\in\{-1,1\}} \big(x(i,j) - x(i+d_i, j+d_j)\big)$ and $f(h(i,j)) = \sum_{d_i\in\{-1,1\}}\sum_{d_j\in\{-1,1\}} \big(h(i,j) - h(i+d_i, j+d_j)\big)$
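To see what the revised objective rewards, it can be evaluated for a fixed image. At the LP optimum the constraints are tight, so h = |f(x)| and h' = |f(h)|. This pure-Python sketch assumes that border pixels lacking all four diagonal neighbors are simply skipped (the slides do not say how borders are handled); `f_op` and `revised_objective` are hypothetical names.

```python
def f_op(a, i, j):
    """f(a(i,j)): sum over d_i, d_j in {-1, 1} of (a(i,j) - a(i+d_i, j+d_j))."""
    return sum(a[i][j] - a[i + di][j + dj] for di in (-1, 1) for dj in (-1, 1))

def revised_objective(x, lam=0.5):
    """Value of sum((1 - lam) * h + lam * h') at the tightest feasible h, h'."""
    rows, cols = len(x), len(x[0])
    h = [[0.0] * cols for _ in range(rows)]
    for i in range(1, rows - 1):          # first-order terms: h = |f(x)|
        for j in range(1, cols - 1):
            h[i][j] = abs(f_op(x, i, j))
    first = sum(h[i][j] for i in range(1, rows - 1) for j in range(1, cols - 1))
    # Second-order terms h' = |f(h)| need h at all four diagonal neighbors.
    second = sum(abs(f_op(h, i, j)) for i in range(2, rows - 2) for j in range(2, cols - 2))
    return (1 - lam) * first + lam * second

bump = [[0] * 5 for _ in range(5)]
bump[2][2] = 1                          # a single bright pixel
print(revised_objective(bump))          # 10.0
```

A constant image scores 0, while an isolated bright pixel is penalized by both the first-order and the second-order terms; λ balances the two.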

Optimal value of λ: 0.5

- Experiments on 100 test images showed the optimal value.

The revised LP model outperforms the simple LP model

- Revised LP model > Simple LP model > Linear regression based method
Take Home Messages

- Missing coefficients in a DCT-transformed image can be effectively recovered using a general LP optimization model.
- This general model can be applied to many other applications.
- There are many open research questions.
- You are welcome to contact me for collaboration on this topic and beyond!
- You are welcome to visit the University of Surrey in Guildford!
Thanks for your attention!

Questions?