IEEE-WVU, Anchorage - 2008

The Unseen Challenge Data Sets

Anderson Rocha, Walter Scheirer, Siome Goldenstein, Terrance Boult

The Data Sets

• Two data sets are provided
  – PNG: lossless compression
  – JPEG: lossy compression

• Prevalence of images on the Internet
  – Sources: Google Images, Yahoo Images, and Flickr

Message Sizes

• For each tool, we provide four different embedding sizes:
  – Tiny: < 5% of the channel capacity
  – Small: > 5% & < 15% of the channel capacity
  – Medium: > 15% & < 40% of the channel capacity
  – Large: > 40% of the channel capacity

• For the PNG set, the message size is explicitly stated

• For the JPEG set, the message size is NOT stated
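The four size classes above can be expressed as a small lookup. The following is a minimal sketch, not part of the data set tooling: the function name and the bit-based arguments are hypothetical, and because the slide leaves the exact boundary values (5%, 15%, 40%) unassigned, the sketch assumes they fall into the larger class.

```python
def embedding_size_category(message_bits: int, capacity_bits: int) -> str:
    """Map a message size to the Tiny/Small/Medium/Large classes.

    Assumption: a ratio exactly equal to 5%, 15% or 40% is placed in the
    larger class; the slide does not say where those values belong.
    """
    if capacity_bits <= 0:
        raise ValueError("capacity_bits must be positive")
    if message_bits < 0:
        raise ValueError("message_bits must be non-negative")

    ratio = message_bits / capacity_bits
    if ratio < 0.05:
        return "Tiny"
    if ratio < 0.15:
        return "Small"
    if ratio < 0.40:
        return "Medium"
    return "Large"


# Example: a 1,000-bit message in a channel that can carry 10,000 bits.
category = embedding_size_category(1_000, 10_000)
```

For the PNG set this label is given with each image; for the JPEG set it would have to be estimated, since the message size is not stated.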

Message Content

• Random bit sequences

• Snippets of mp3 songs

• Plain text

• Other images

Categories

• Each set consists of clean and stego images

• Clean set
  – Modified: cropping, overlay, object-appending
  – Non-modified: original

• Stego set
  – 4 categories for JPEG, 3 categories for PNG, one for each tool

Categories

• JPEG subcategories
  – Stego
    • Animals
    • Business
    • Maps
    • Natural
    • Tourist
    • Vacation
  – Clean
    • Misc

Clean Manipulated Images

Object Appending

Image Cropping

Overlay

PNG Tools

• Camaleão (http://www.ic.unicamp.br/~rocha/sci/stego)
  – Simple LSB insertion/modification software
  – Uses cyclic permutations and block ciphering to hide messages in LSBs

• SecurEngine (http://www.sharewareplaza.com/SecurEngine-download_4268.html)
  – Incorporates 5 crypto algorithms: Blowfish, Gost, Vernam, Cast256, and Mars
  – LSB encoding

PNG Tools

• Stash-It (http://www.smalleranimals.com/stash.htm)
  – Windows-based stego tool
  – Simple LSB insertion/modification software
  – No encryption feature
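All three PNG tools are described as LSB insertion/modification software. The sketch below shows plain sequential LSB replacement on a buffer of 8-bit samples. It is an illustration under stated assumptions, not the implementation of Camaleão, SecurEngine, or Stash-It: the function names are hypothetical, PNG decoding is left out, and the permutation and encryption steps some of the tools apply are omitted.

```python
def lsb_embed(samples: bytes, message: bytes) -> bytes:
    """Write the message bits, MSB first, into the LSBs of the first samples."""
    bits = [(byte >> shift) & 1 for byte in message for shift in range(7, -1, -1)]
    if len(bits) > len(samples):
        raise ValueError("message does not fit in the cover samples")

    stego = bytearray(samples)
    for i, bit in enumerate(bits):
        # Clear the least significant bit, then set it to the message bit.
        stego[i] = (stego[i] & 0xFE) | bit
    return bytes(stego)


def lsb_extract(samples: bytes, n_bytes: int) -> bytes:
    """Read n_bytes back from the LSBs of the first 8 * n_bytes samples."""
    if 8 * n_bytes > len(samples):
        raise ValueError("not enough samples for the requested length")

    out = bytearray()
    for i in range(n_bytes):
        value = 0
        for sample in samples[8 * i : 8 * i + 8]:
            value = (value << 1) | (sample & 1)
        out.append(value)
    return bytes(out)


cover = bytes(range(64))            # stand-in for decoded pixel samples
stego = lsb_embed(cover, b"hi")     # 16 samples carry the 2-byte message
recovered = lsb_extract(stego, 2)
```

Each sample changes by at most 1, which is why this kind of embedding is visually imperceptible yet still leaves statistical traces that detectors can target.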

JPEG Tools

• F5 (http://www.inf.tu-dresden.de/~aw4)
  – Resilient to the χ² statistical attack
  – Instead of replacing LSBs directly, F5 decreases the absolute value of the DCT coefficients
  – Chooses DCT coefficients randomly
  – Matrix embedding

• JPHide (http://linux01.gwdg.de/~alatham)
  – Uses Blowfish to generate a stream of pseudo-random control bits to define bit encodings
  – Large embeddings trivial to detect
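The "matrix embedding" item listed for F5 can be illustrated with the standard Hamming-code construction, which hides k message bits in n = 2^k - 1 cover bits while changing at most one of them. This is a sketch on abstract bits only, with hypothetical function names. It leaves out what F5 does around this step: selecting coefficients pseudo-randomly, making each change by decreasing a coefficient's absolute value, and handling coefficients that become zero.

```python
def syndrome(lsbs):
    """XOR together the 1-based positions of all cover bits that are set."""
    s = 0
    for position, bit in enumerate(lsbs, start=1):
        if bit:
            s ^= position
    return s


def matrix_embed(lsbs, message, k):
    """Hide a k-bit integer in 2**k - 1 cover bits, flipping at most one bit."""
    n = (1 << k) - 1
    if len(lsbs) != n:
        raise ValueError(f"expected {n} cover bits")
    if not 0 <= message < (1 << k):
        raise ValueError("message does not fit in k bits")

    out = list(lsbs)
    flip = syndrome(out) ^ message   # 0 means the cover already encodes it
    if flip:
        out[flip - 1] ^= 1
    return out


def matrix_extract(lsbs):
    """The hidden value is simply the syndrome of the stego bits."""
    return syndrome(lsbs)


cover_bits = [1, 0, 1, 1, 0, 0, 1]             # n = 7 cover bits, k = 3
stego_bits = matrix_embed(cover_bits, 0b101, 3)
hidden = matrix_extract(stego_bits)
```

Compared with writing three bits into three LSBs directly, this changes fewer cover values per embedded bit, which reduces the statistical footprint of small messages.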

JPEG Tools

• JSteg (http://zooid.org/~paul/crypto/jsteg)
  – 40-bit RC4 encryption
  – Channel capacity determination
  – LSB encoding in quantized DCT coefficients

• Outguess (http://www.outguess.org/detection.php)
  – Preserves statistics based on frequency counts
  – Seed-based iterator available to choose embedding locations
  – Change minimization calculation for each seed
  – Remains one of the most difficult tools to detect
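The JSteg items "channel capacity determination" and "LSB encoding in quantized DCT coefficients" can be sketched together. The sketch assumes, as JSteg is usually described in the steganalysis literature, that coefficients equal to 0 or 1 are skipped; that rule is not stated on the slide. Function names are hypothetical, the coefficients are a plain list of integers rather than a parsed JPEG, and the RC4 step is omitted.

```python
def usable(coefficient: int) -> bool:
    """Assumed JSteg-style rule: coefficients 0 and 1 carry no data."""
    return coefficient not in (0, 1)


def capacity_bits(coefficients) -> int:
    """Channel capacity in bits: one bit per usable coefficient."""
    return sum(1 for c in coefficients if usable(c))


def dct_lsb_embed(coefficients, bits):
    """Replace the LSB of each usable coefficient with the next message bit."""
    out = list(coefficients)
    slots = [i for i, c in enumerate(out) if usable(c)]
    if len(bits) > len(slots):
        raise ValueError("message exceeds channel capacity")
    for i, bit in zip(slots, bits):
        # Python integers behave like two's complement under & and |,
        # so this also works for negative coefficients.
        out[i] = (out[i] & ~1) | bit
    return out


def dct_lsb_extract(coefficients, n_bits):
    """Read the LSBs of the usable coefficients, in order."""
    return [c & 1 for c in coefficients if usable(c)][:n_bits]


quantized = [12, 0, -3, 1, 5, 0, -1, 2]
payload = [1, 0, 1, 1, 0]
stego_coefficients = dct_lsb_embed(quantized, payload)
extracted = dct_lsb_extract(stego_coefficients, len(payload))
```

Because embedding only swaps values within pairs (2k, 2k+1) and the pair (0, 1) is excluded, a usable coefficient never turns into a skipped one, so the extractor visits the same positions as the embedder.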

PNG Data Set - Breakdown

• Training

               Tiny   Small  Medium   Large
Camaleão        400     400     400     400
SecurEngine     380     387     385     380
Stash-It        399     400     400     400
Total         1,179   1,187   1,185   1,180

Non-modified        2,000
Append-modified       666
Crop-modified         667
Overlay-modified      667
Total               4,000

4,731 total images in the PNG stego category

4,000 total images in the PNG clean category
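The totals quoted for the PNG training set follow directly from the per-tool and per-modification counts. A short sketch that recomputes them (the variable names are illustrative only):

```python
# Counts per tool in the order Tiny, Small, Medium, Large (PNG training set).
png_training_stego = {
    "Camaleão": [400, 400, 400, 400],
    "SecurEngine": [380, 387, 385, 380],
    "Stash-It": [399, 400, 400, 400],
}

# Clean PNG training images per modification type.
png_training_clean = {
    "Non-modified": 2000,
    "Append-modified": 666,
    "Crop-modified": 667,
    "Overlay-modified": 667,
}

# Sum each size column across the three tools, then sum the columns.
column_totals = [sum(column) for column in zip(*png_training_stego.values())]
stego_total = sum(column_totals)
clean_total = sum(png_training_clean.values())
```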

PNG Data Set - Breakdown

• Testing

               Tiny   Small  Medium   Large
Camaleão        250     250     250     250
SecurEngine     250     250     250     243
Stash-It        250     250     250     250
Total           750     750     750     743

2,993 total images in the PNG stego category

JPEG Data Set - Breakdown

• Training

                F5   JPHide   JSteg   Outguess
Animals      1,732    2,127     244        436
Business     3,779        -     124         11
Maps         3,361        -     112         68
Natural      5,211    1,113     232         70
Tourist      4,968    1,721     268        160
Vacation     2,960      353     100         35
Total       22,011    5,314   1,080        780

29,185 total images in the JPEG stego category

JPEG Data Set - Breakdown

• Training

Animals-Non-modified        61
Business-Non-modified       31
Maps-Non-modified           28
Natural-Non-modified        58
Tourist-Non-modified        67
Vacation-Non-modified       25
Misc-Non-modified        1,996
Misc-Append-modified       665
Misc-Crop-modified         666
Misc-Overlay-modified      662
Total                    4,259

29,185 total images in the JPEG stego category

JPEG Data Set - Breakdown

• Testing

                 Tiny   Small  Medium   Large
F5                250     250     250     250
JPHide            240     322     318     101
JSteg             198     202     199     198
Outguess 0.2      481     421       -       -
Outguess 0.13     491     425       -       -
Total           1,660   1,620     767     549

4,596 total images in the JPEG stego category

Sample Usage: stegdetect

• JPEG Training Set

                 Detected, C   Detected, I   No Steg
Clean                      -           360     3,809
F5                    22,011             0         0
JPHide                 4,506           604       204
JSteg                    638           421        21
Outguess 0.13            220            10       295
Outguess 0.2              13             5       237

Detected, C: correct algorithm detected

Detected, I: incorrect algorithm detected

Overall false detect rate for the clean image set is 8.6%
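The 8.6% figure can be reproduced from the Clean row of the training table: 360 clean images were flagged as containing steganography and 3,809 were not. A minimal sketch of that calculation (the function name is illustrative):

```python
def false_detect_rate(flagged: int, not_flagged: int) -> float:
    """Fraction of clean images that the detector reported as stego."""
    return flagged / (flagged + not_flagged)


# Clean row of the JPEG training table: 360 detections, 3,809 "No Steg".
training_rate = false_detect_rate(360, 3809)
print(f"{training_rate:.1%}")  # prints 8.6%
```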

Sample Usage: stegdetect

• JPEG Testing Set

                 Detected, C   Detected, I   No Steg
Clean                      -            79       899
F5                         0           216       784
JPHide                   333           153       495
JSteg                    444           353         0
Outguess 0.13            206            31       679
Outguess 0.2              32            37       833

Overall false detect rate for the clean image set is 8.0%

Sample Usage: stegdetect

• Detailed results for JPHide Test Set

               Large   Medium   Small   Tiny
Detected, C       51      249      25      8
Detected, I       47       61      35     10
Negative           3        8     262    222
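Turning the JPHide counts into per-size rates makes the pattern easier to see. The sketch below computes, for each embedding size, the share of test images for which stegdetect named the correct algorithm; the dictionary layout and names are illustrative.

```python
# stegdetect results on the JPHide test images, by embedding size.
jphide_test = {
    "Large": {"correct": 51, "incorrect": 47, "negative": 3},
    "Medium": {"correct": 249, "incorrect": 61, "negative": 8},
    "Small": {"correct": 25, "incorrect": 35, "negative": 262},
    "Tiny": {"correct": 8, "incorrect": 10, "negative": 222},
}


def correct_rate(counts: dict) -> float:
    """Share of images for which the correct algorithm was detected."""
    return counts["correct"] / sum(counts.values())


rates = {size: correct_rate(counts) for size, counts in jphide_test.items()}
for size, rate in rates.items():
    print(f"{size:<6} {rate:.1%}")
```

Medium embeddings have the highest correct-detection rate, Large embeddings come in clearly lower, and Small and Tiny embeddings are rarely identified.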

Sample Usage: stegdetect

• Conclusions
  – Significant differences between the results of training and testing
    • Weaker performance overall for testing
    • Designed difficulty of testing set
  – Stegdetect performs poorly for large embeddings (non-intuitive), as well as small and tiny embeddings (expected)

The Unseen Challenge Data Sets

• Lossy (JPEG) and Lossless (PNG) imagery

• 3 tools for PNG set, 4 tools for JPEG set

• 4 distinct embedding sizes for PNG, varying sizes for JPEG

• Clean imagery across all sets

The Unseen Challenge Data Sets

• Valid approaches for use:
  – Detection
  – Detection and recovery (size or content)
  – Detection and destruction
  – Fusion

No standard data set exists for steg evaluation!

This set is a step in that direction!

Download!

http://www.liv.ic.unicamp.br/wvu/datasets.php