How DHS is Doing Cybersecurity With Content Filtering · 2015-05-19 · How DHS is Doing Cybersecurity with Content Filtering TECH-W01 Department of Homeland Security National Protection

#RSAC

SESSION ID: TECH-W01

How DHS is Doing Cybersecurity with Content Filtering

Tom Ruoff

Department of Homeland Security

National Protection and Programs Directorate

Office of Cybersecurity and Communications/Chief Technology Office

DHS & Content Filtering: Bottom Line Up Front

Q1. Why is DHS working on this?

A1. Because current signature and detonation approaches are not sufficient to allow control of cyber attacks.

Q2. What is better?

A2. Content Filtering. Test results indicate eMIST 3.0.3 is capable of blocking zero day malware at about a 99.5% rate.

Q3. What does DHS want to accomplish?

A3. Strategically: improve cybersecurity. Tactically: stimulate both sides of the supply-demand equation to significantly enable and enhance the cybersecurity posture of Federal Executive Branch Departments and Agencies, as well as the Information Technology systems of critical infrastructure owners and operators, through use of commercially available technology acquired at market-driven cost.

DHS wants to facilitate a cybersecurity culture change that moves the time scale from months to milliseconds.

DHS & Content Filtering

What You Get Out of This Talk – Agenda

1. Technical understanding of what content filtering is

2. How well it works at neutering malware – test results

3. What DHS is doing with this cool stuff to protect itself

4. What are our next steps

5. What can you do with this knowledge

6. Motivation to use this approach to secure your enterprise

DHS & Content Filtering

WHAT IS CONTENT FILTERING?

Defining the terms

What is Content Filtering?

A filtering technology based on a robust understanding of the syntactic structure and semantic meaning of the file type or protocol being filtered, so that only known/validated good content is passed

Uses a bit/byte-level understanding of the file, compared against the specification (e.g. an RFC)

Decomposes objects into the base elements of the file type/object protocol specification and then reassembles a “clean” version that excludes non-essential components

Requires access to the file type/protocol specification (RFC) and/or extensive reverse engineering

Specs frequently don’t match reality, so the decomposition process sometimes fails because the object does not decompose per the specification; a Word doc is sometimes not a Word document per the Word specification, or a Word document masquerades as a PowerPoint

Not signature based

Resulting file is usually very close to the original, with minimal damage/changes
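To make the decompose-and-reassemble idea concrete, here is a minimal Python sketch using PNG as the example format. It is an illustration under stated assumptions, not a description of how eMIST works: the function name `rebuild_png` and the allow-list of chunk types are hypothetical choices. The sketch walks the chunk structure defined by the PNG specification, rejects input that does not parse or fails its CRC, and writes out only allow-listed chunks, which also drops any data appended after the final chunk.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
# Hypothetical policy: keep only the chunks needed to render the image.
ALLOWED_CHUNKS = {b"IHDR", b"PLTE", b"IDAT", b"IEND"}


def rebuild_png(data: bytes) -> bytes:
    """Decompose a PNG into chunks and reassemble only known-good ones."""
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG: bad signature")
    out = [PNG_SIGNATURE]
    pos = len(PNG_SIGNATURE)
    while True:
        if pos + 8 > len(data):
            raise ValueError("truncated chunk header")
        # Each chunk: 4-byte big-endian length, 4-byte type, data, 4-byte CRC.
        length, ctype = struct.unpack_from(">I4s", data, pos)
        end = pos + 8 + length + 4
        if end > len(data):
            raise ValueError("declared chunk length exceeds file size")
        body = data[pos + 8 : pos + 8 + length]
        (crc,) = struct.unpack_from(">I", data, pos + 8 + length)
        if zlib.crc32(ctype + body) != crc:
            raise ValueError("chunk CRC mismatch")
        if ctype in ALLOWED_CHUNKS:
            out.append(data[pos:end])
        pos = end
        if ctype == b"IEND":
            break  # anything after IEND is silently discarded
    return b"".join(out)
```

Dropping ancillary chunks can change rendering (for example, transparency stored in `tRNS`), which is the kind of user impact a policy setting would have to weigh.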

World of Malware – Where Content Filter Fits In

Two types of Malware attacks (1 of 2)

1. Syntactic – The attacker sends incorrect, malformed, or unexpected data to the system in order to execute an exploit. Within syntactic based attacks there are two main variants:

a. Non-compliance with Specification – In this attack, the data does not comply with the file format/protocol specification and the software processing that data does not properly handle it leading to a program crash and possible exploit.

b. Compliance with Specification – In this attack, the data complies with the specification, but an incorrect assumption or decision by the developer on how to implement the specification leads to a potential program crash and exploit. For example, suppose a program processes a length-delimited file and the specification says that a data field is 128 characters, but the developer knew that by convention (i.e. common use) only 16 characters were used, so the array was hardcoded to be 16 characters long. If an attacker sent a specification-compliant data field with 128 characters of data instead of 16, it could lead to a buffer overflow and possible exploit.
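The 128-versus-16 example can be sketched in Python as a parser that sizes its checks from the specification rather than from convention. The record layout (a 1-byte length prefix followed by the data) and the names `SPEC_MAX_FIELD_LEN` and `read_field` are assumptions made for illustration; the source does not describe a concrete format.

```python
import struct

# Limit taken from the (hypothetical) specification, not from common usage.
SPEC_MAX_FIELD_LEN = 128


def read_field(buf: bytes, offset: int = 0) -> tuple[bytes, int]:
    """Read one length-prefixed field and return (value, next_offset).

    Assumed layout: a 1-byte unsigned length followed by that many bytes.
    """
    if offset >= len(buf):
        raise ValueError("truncated input: missing length byte")
    (length,) = struct.unpack_from("B", buf, offset)
    # Covers variant (a): a length the specification does not allow.
    if length > SPEC_MAX_FIELD_LEN:
        raise ValueError(f"field length {length} exceeds spec maximum")
    start = offset + 1
    end = start + length
    # Also variant (a): a declared length that overruns the actual data.
    if end > len(buf):
        raise ValueError("truncated input: declared length exceeds data")
    # Covers variant (b): any spec-compliant length up to 128 is handled,
    # because nothing here assumes the conventional 16 characters.
    return buf[start:end], end
```

Python slices cannot overflow a buffer the way a fixed C array can, so the sketch only shows where the checks belong, not the memory-corruption failure itself.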

World of Malware – Where Content Filter Fits in

Two types of Malware attacks (2 of 2)

2. Semantic – The attacker sends structurally correct but logically incorrect data to the system to cause the device to operate outside of its design parameters (e.g. tell a generator to operate 20K RPM above its design tolerance of 5K RPM).
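A semantic check has to know the device's design envelope, not just the message format. The following Python sketch assumes a hypothetical table of limits (`DESIGN_LIMITS`) and a hypothetical parameter name (`generator_rpm`); the 5K RPM tolerance comes from the example above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Limit:
    low: float
    high: float


# Hypothetical design envelope; real values would come from the device vendor.
DESIGN_LIMITS = {"generator_rpm": Limit(low=0.0, high=5000.0)}


def is_semantically_valid(parameter: str, value: float) -> bool:
    """Return True only if the value lies inside the known design envelope."""
    limit = DESIGN_LIMITS.get(parameter)
    if limit is None:
        # Unknown parameters fail closed, in the spirit of "pass known good".
        return False
    return limit.low <= value <= limit.high
```

A command of 25,000 RPM (20K above the 5K tolerance) is well-formed but falls outside the envelope, so this check would reject it.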

So Why Does Content Filtering Work?

Most malware is very fragile; format-conversion changes to the file can break it (render it operationally useless)

Malware likes to misrepresent itself, e.g. a JPEG claiming to be a TIFF

Malware exploits defects in parsing, usually by providing a structurally wrong or logically incorrect file

Malware developers like to hide in the portions of files used for metadata storage, at the end of the file, between segments/markers in a file, and via steganographic techniques in the payload of files (e.g. image data)
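The "JPEG claiming to be TIFF" case can be caught by comparing the claimed type against the file's leading bytes. This Python sketch uses the well-known signatures for a few formats; the table contents, the extension mapping, and the function names are illustrative assumptions, and a signature match is only a first check, not full validation against the specification.

```python
from pathlib import PurePath
from typing import Optional

# Leading-byte signatures for a few common formats.
MAGIC = {
    "jpeg": (b"\xff\xd8\xff",),
    "tiff": (b"II*\x00", b"MM\x00*"),
    "png": (b"\x89PNG\r\n\x1a\n",),
    "gif": (b"GIF87a", b"GIF89a"),
    "pdf": (b"%PDF-",),
}

# Hypothetical mapping from file extension to the type it claims to be.
EXTENSIONS = {
    ".jpg": "jpeg", ".jpeg": "jpeg",
    ".tif": "tiff", ".tiff": "tiff",
    ".png": "png", ".gif": "gif", ".pdf": "pdf",
}


def sniff_type(header: bytes) -> Optional[str]:
    """Return the format whose signature matches the header, if any."""
    for name, signatures in MAGIC.items():
        if any(header.startswith(sig) for sig in signatures):
            return name
    return None


def claims_match_content(filename: str, header: bytes) -> bool:
    """True only when the extension's claimed type matches the sniffed type."""
    claimed = EXTENSIONS.get(PurePath(filename).suffix.lower())
    return claimed is not None and claimed == sniff_type(header)
```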

Content Filtering: Deep Content Inspection & Sanitization

ASSUMPTIONS

1. Detecting malware is really hard, so don’t try

2. Malware is fragile, so extracting content and re-assembling objects neuters almost all attacks

3. Exploding (detonating) the malware to observe malicious behavior is a good start but not entirely effective

4. Active content within the object protocol (e.g. Excel formulas) is benign; the rest is assumed malicious

5. There is a user impact (like rendering URLs inactive), and it needs to be part of policy settings

6. If the object is not definable (syntactic attack - kind of a Word 2007…), then policy can either drop the file or pass it

Content Filtering Methods

Deep Content Inspection and Sanitization

Verifies file complies with specification, then writes out known good content

Format Conversion

Converts a file to another related format before converting back to the original file format (e.g. PDF to PS to PDF)

File Flattening

Converts file to another similar but usually less complex format that doesn’t have the data attack risks of the original (e.g. PPT to series of JPG files)

Canonicalization

Convert contents from specialized form into normalized/raw form (e.g. audio files into PCM)

Typical Content Filtering Process

[Diagram: a typical Office document containing Image, Excel, and Macro components, with the filters applied to it]

Text dirty word search, based on a “Dirty” and “Clean” word list

Macro removal filter

Images are inspected for format and sanitized for embedded information or malware

Embedded objects are inspected up to a configurable level deep, usually 1

Virus cleaning

How Does it Work: MS Office (1 of 2)

Microsoft Office Filters (97-2010), Word (.doc/.docx), Excel (.xls/.xlsx), PowerPoint (.ppt/.pptx) - Processing Steps

1. Validate that the file type complies with the official specification from Microsoft (2003 and below) or from Microsoft and the ISO (2007+)

2. Recursively process the MS Office file into its constituent parts

3. Perform text extraction for dirty word analysis

How Does it Work: MS Office (2 of 2)

Microsoft Office Filters (97-2010), Word (.doc/.docx), Excel (.xls/.xlsx), PowerPoint (.ppt/.pptx) - Processing Steps continued

4. Send all supported non-MS Office components to other filters. If a file type is not supported, then either fail the MS Office file or remove that object from the MS Office file*

5. Non-MS Office components are filtered by their respective filters and, if possible, reinserted back into the parent MS Office document
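As a rough illustration of steps 2 and 4, the Python sketch below splits a 2007+ Office file into parts and decides which filter each part would be routed to. It relies on the fact that .docx/.xlsx/.pptx files are ZIP packages; the routing categories, the function name `classify_parts`, and the reliance on conventional part names (`vbaProject.bin`, `embeddings/`, `media/`) are assumptions for illustration. A real filter would follow the package's relationship parts rather than trust names, and the 97-2003 binary formats need a different parser entirely.

```python
import io
import zipfile


def classify_parts(package: bytes) -> dict[str, list[str]]:
    """Group the parts of an OOXML package by the filter they would go to."""
    routes: dict[str, list[str]] = {
        "macro": [],     # candidates for the macro removal filter
        "embedded": [],  # embedded objects, sent to their own filters
        "image": [],     # sent to the imagery filter
        "native": [],    # XML parts handled by the Office filter itself
    }
    with zipfile.ZipFile(io.BytesIO(package)) as zf:
        for name in zf.namelist():
            lower = name.lower()
            if lower.endswith("vbaproject.bin"):
                routes["macro"].append(name)
            elif "/embeddings/" in lower:
                routes["embedded"].append(name)
            elif "/media/" in lower:
                routes["image"].append(name)
            else:
                routes["native"].append(name)
    return routes
```

Input that is not a valid ZIP raises `zipfile.BadZipFile`, which corresponds to the "fail the MS Office file" branch of the policy.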

How Does it Work: Imagery

JPEG (.jpg, .jpeg), Windows Bitmap (.bmp/.dib), Windows Metafile (.wmf), Windows Enhanced Metafile (.emf), Graphics Interchange Format (.gif), Portable Network Graphics (.png), Tagged Image File Format (.tiff)

Processing Steps:

1. Validate file type complies with official specification

2. Validate and/or remove metadata

3. Send metadata for dirty word analysis

4. Zeroize the least significant bits of the image data*

5. Rebuild and recompress image

* Does not apply to WMF/EMF files
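Step 4 targets steganography, which commonly hides data in the low-order bits of pixel values. A minimal Python sketch of zeroizing those bits follows; it assumes the image has already been decoded to raw 8-bit samples, and the function name and `bits` parameter are hypothetical.

```python
def zeroize_lsb(samples: bytes, bits: int = 1) -> bytes:
    """Clear the lowest `bits` bits of every 8-bit sample."""
    if not 0 <= bits <= 8:
        raise ValueError("bits must be between 0 and 8")
    # For bits=1 the mask is 0b11111110; for bits=2 it is 0b11111100.
    mask = (0xFF << bits) & 0xFF
    return bytes(sample & mask for sample in samples)
```

With `bits=1` each sample changes by at most 1 out of 255, which is why the result is visually close to the original while any payload stored in those bits is destroyed.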

How Does it Work: Compressed Files

PKzip (.zip), UNIX tar (.tar), GNU zip (.gz), BZip2 (.bz2)

Steps:


2. Check excessive levels of embedding (zip/tar)

3. Extract directory structure data

4. Extract all the files and throw away the container

5. Filter files

6. Rebuild container by reinserting filtered files. Failed files are replaced with zero byte files
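Steps 2 through 6 can be sketched for the .zip case with Python's standard `zipfile` module. This is an illustrative sketch, not the eMIST implementation: `MAX_DEPTH`, the `filter_file` callback (which returns cleaned bytes, or `None` on failure), and the decision to treat any member ending in `.zip` as a nested container are all assumptions.

```python
import io
import zipfile
from typing import Callable, Optional

MAX_DEPTH = 3  # hypothetical policy limit on nested containers


def filter_zip(
    data: bytes,
    filter_file: Callable[[str, bytes], Optional[bytes]],
    depth: int = 0,
) -> Optional[bytes]:
    """Rebuild a zip from filtered members; None if nesting exceeds policy."""
    if depth > MAX_DEPTH:
        return None
    out = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(data)) as src, zipfile.ZipFile(
        out, "w", zipfile.ZIP_DEFLATED
    ) as dst:
        for info in src.infolist():
            if info.is_dir():
                continue  # directory structure is preserved via member paths
            payload = src.read(info)
            if info.filename.lower().endswith(".zip"):
                cleaned = filter_zip(payload, filter_file, depth + 1)
            else:
                cleaned = filter_file(info.filename, payload)
            # Failed members are replaced with zero-byte files, per step 6.
            dst.writestr(info.filename, cleaned if cleaned is not None else b"")
    # The original container is discarded; only the rebuilt one is returned.
    return out.getvalue()
```

The sketch does not guard against decompression bombs; a production filter would also cap the total extracted size and the member count.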

How Does it Work: Text

Text files (.txt/.csv/.log), supporting 7-bit/8-bit ASCII and Unicode UTF-8 - Processing Steps

1. Validate the file is non-executable textual data

2. Apply Regular Expressions to data (usually to neuter URLs)

3. Apply Dirty Word Filter to the text by rotating through a series of commonly used code pages (i.e. character encodings)
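Steps 2 and 3 might look like the following in Python. The URL pattern, the bracket-style defanging, the word list, and the set of encodings are all hypothetical choices for illustration; the source does not say which expressions, words, or code pages eMIST uses.

```python
import re

# Hypothetical pattern: matches http, https, and ftp URLs.
URL_RE = re.compile(r"\b(?:https?|ftp)://\S+", re.IGNORECASE)

DIRTY_WORDS = {"secret", "proprietary"}        # hypothetical dirty word list
CODE_PAGES = ("utf-8", "cp1252", "utf-16-le")  # hypothetical encodings to try


def neuter_urls(text: str) -> str:
    """Make URLs non-clickable by breaking the scheme separator."""
    return URL_RE.sub(lambda m: m.group(0).replace("://", "[://]", 1), text)


def find_dirty_words(raw: bytes) -> set[str]:
    """Search the raw bytes under each encoding and collect any hits."""
    hits: set[str] = set()
    for encoding in CODE_PAGES:
        try:
            text = raw.decode(encoding)
        except UnicodeDecodeError:
            continue  # not valid in this encoding; try the next one
        lowered = text.lower()
        hits.update(word for word in DIRTY_WORDS if word in lowered)
    return hits
```

Rotating through encodings matters because a word stored as UTF-16 is invisible to a search that only decodes the bytes as UTF-8 or an 8-bit code page.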

How Does it Work: PDF

Adobe Portable Document Format (PDF) - Processing Steps


2. Perform text extraction for Dirty Word Analysis

3. Convert PDF to Postscript (PS) then back to PDF

4. Validate that encrypted and JavaScript content were removed
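A first-pass version of the step 4 check can be sketched as a scan of the output PDF for the name tokens associated with JavaScript and encryption. This Python sketch is an assumption-laden illustration: the token list and function names are hypothetical, and a byte scan cannot see inside compressed object streams, so a real validator would parse the object tree.

```python
import re

# PDF name tokens associated with active or encrypted content.
SUSPECT_NAMES = (b"/JavaScript", b"/JS", b"/Encrypt")

# PDF names may encode any character as '#' plus two hex digits.
_HEX_ESCAPE = re.compile(rb"#([0-9A-Fa-f]{2})")


def _decode_name_escapes(data: bytes) -> bytes:
    """Undo #xx escapes so obfuscated names such as /J#61vaScript are seen."""
    return _HEX_ESCAPE.sub(lambda m: bytes([int(m.group(1), 16)]), data)


def remaining_active_content(pdf_bytes: bytes) -> list[str]:
    """Return the suspect name tokens still present in the PDF bytes."""
    normalized = _decode_name_escapes(pdf_bytes)
    return [name.decode("ascii") for name in SUSPECT_NAMES if name in normalized]
```

An empty list is necessary but not sufficient evidence that the conversion removed the content; substring matching can also produce false positives on unrelated names that begin with `/JS`.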

Content Filtering Lab Test Results

Methodology for determining eMIST’s effectiveness at neutralizing malware and determining false positive rates:

1. Collect presumed good and malicious test data.

2. Verify the malicious data using established test bed.

3. Configure eMIST v3.0.3 with the appropriate policies, network configuration, etc.

4. Process files through eMIST v3.0.3.

5. Record output results (e.g., passed, modified, rejected) for each file, per file type.

6. Evaluate malicious test set output files for malicious content using established test bed.

7. Analyze results and calculate 95% confidence-level ranges.
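The source does not say which interval method was used in step 7. As one plausible reading, the Python sketch below computes a rate and its 95% range with the common normal-approximation (Wald) interval for a proportion; the function name and the example counts in the usage line are hypothetical, not taken from the test data.

```python
import math

Z_95 = 1.96  # standard normal quantile for a two-sided 95% interval


def proportion_ci(successes: int, trials: int) -> tuple[float, float]:
    """Return (rate, half_width) so the range is rate ± half_width."""
    if trials <= 0 or not 0 <= successes <= trials:
        raise ValueError("need 0 <= successes <= trials and trials > 0")
    rate = successes / trials
    half_width = Z_95 * math.sqrt(rate * (1 - rate) / trials)
    return rate, half_width


# Hypothetical counts: 950 of 1000 malicious files blocked or cleansed.
rate, half_width = proportion_ci(950, 1000)
print(f"{rate:.2%} ± {half_width:.2%}")
```

Because the half-width shrinks with the square root of the sample count, file types with few test samples produce wide ranges under this method.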

How Well Does Content Filtering Work – Lab Results

Block/cleansing rate by file type, at 95% confidence

File Type   Block/Cleansing Rate (479 Policy)   Block/Cleansing Rate (Basic Policy)
Doc         95.28% ± 2.02%                      98.63% ± 1.56%
Ppt         80.48% ± 24.76% // 99%              71.92% ± 33.67% // 99%
Pdf         99.80% ± 0.16%                      99.87% ± 0.18%
Xls         96.62% ± 1.33% // 98%               98.06% ± 1.43% // 98%
Gif         98.22% ± 2.50% // 100%              96.56% ± 4.78% // 100%
Jpg         2.91% ± 1.33%                       2.88% ± 1.86%
Rtf         N/A // 99.8%                        N/A // 99.8%

How Well Does Content Filtering Work – Lab Testing

False positive rate by file type, at 95% confidence

File Type   False Positive Rate (479 Policy)   False Positive Rate (Basic Policy)
doc         4.28% ± 0.79%                      4.27% ± 1.12%
ppt         5.36% ± 1.53%                      5.68% ± 2.21%
xls         8.26% ± 2.94%                      8.73% ± 4.23%
docx        5.03% ± 0.50%                      44.55% ± 1.62%
pptx        15.39% ± 1.10%                     25.81% ± 1.89%
xlsx        16.73% ± 2.37%                     19.16% ± 3.52%
pdf         1.49% ± 0.20%                      3.39% ± 0.43%
gif         1.73% ± 0.58%                      1.82% ± 0.84%
tiff        1.32% ± 0.32%                      1.36% ± 0.46%
jpg         1.45% ± 0.31%                      1.36% ± 0.42%
png         1.66% ± 0.29%                      1.83% ± 0.42%
bmp         1.88% ± 0.53%                      2.03% ± 0.78%
wmf         1.25% ± 0.56%                      1.31% ± 0.81%
emf         1.35% ± 0.42%                      1.28% ± 0.57%

Review of Lab Testing

Results from testing indicate eMIST 3.0.3 appears to be capable of blocking zero day malware at about a 99.5% rate

Pass rate is 98.5%; it can be improved by tailoring the dirty word list

OR

If an object is not defined, send it to a secondary inspection process, since this means the object may be malicious (take a systems approach)

DHS Operational Testing of eMIST 3.0.3

We will put eMIST 3.0.3 in our operational network (LAN A) to assess operational malicious content kill rate

Test results forthcoming: we ran into operational issues so test results need to be verified before public release

eMist Mail Content Filtering Combined with Behavior-based Tools

[Diagram: current @dhs.gov email path. Labeled elements: Internet, DHS SOC, OneNet, DC2, LAN-A, OneNet Hub Transport Server, @dhs.gov Email Server, MS Outlook Client, Main Inbox]


Pilot adds Endpoint Protection (EPP)-equipped laptops, an EPP server, and the eMist Mail Content Filtering tool

[Diagram: pilot email path. Labeled elements: Internet, OneNet Hub Transport Server, @dhs.gov Email Server, eMist, eMist Email Server, CS&C Participants' EPP-equipped laptops]

Email traffic entering dhs.gov is replicated and goes to both the primary Outlook server and eMist


eMist extracts embedded attachments in emails and cleans them

Emails are reconstructed with their now-cleansed attachments re-inserted

Pilot participants with EPP laptops have their Outlook clients connect to two inboxes: the Main Inbox and the Test Inbox

This allows EPP tools to detect malicious behavior from files originating from either email inbox


EPP on each laptop monitors for and alerts on suspicious behaviors, including references to the files that are the source of the suspect behaviors


EPP-detected behaviors from the laptops are sent to the EPP server

Data aggregated by the EPP server now supports multiple cybersecurity activities


EPP-detected behaviors from laptops identify malicious items successfully blocked by eMist and missed by current mechanisms


Comparing EPP-detected behaviors from .gov emails with EPP-detected behaviors from eMist test emails identifies malicious items not blocked by eMist: candidates for tuning, signature development, or heuristics
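The comparison between the two inboxes reduces to set arithmetic over the items that EPP flagged. The Python sketch below assumes each flagged item can be identified by a key that is the same in both inboxes (here a hypothetical "message ID/attachment name" string, since cleansing changes a file's hash); the function and key names are illustrative and not part of the pilot design.

```python
def compare_inboxes(main_hits: set[str], test_hits: set[str]) -> dict[str, set[str]]:
    """Split EPP detections by whether the filtered copy still misbehaved.

    main_hits: items flagged when opened from the Main Inbox (unfiltered path).
    test_hits: items flagged when opened from the Test Inbox (filtered path).
    """
    return {
        # Flagged only on the unfiltered path: blocked by the content filter
        # but missed by the current mechanisms.
        "blocked_by_filter": main_hits - test_hits,
        # Still flagged after filtering: candidates for tuning, signature
        # development, or heuristics.
        "not_blocked": set(test_hits),
    }
```

Usage, with hypothetical keys: `compare_inboxes({"msg1/a.doc", "msg2/b.pdf"}, {"msg2/b.pdf"})` places `msg1/a.doc` under `blocked_by_filter` and `msg2/b.pdf` under `not_blocked`.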

DHS Use of Content Filtering

What is DHS doing with content filtering to promote its use?

We put eMIST 3.0.3 and follow-on commercial products in our operational network (LAN A) to assess the operational malicious content kill rate (see the preceding slide show)

Will use evidence to justify and encourage procurement of commercial content filtering products

Partnering with vendors to advance state of art for email and web content filtering

DHS Use of Content Filtering

What is DHS doing next with content filtering?

Programming next set of commercial product tests and operational demonstrations of kill rate – email and web

Planning next set of operational tests using a TBD commercial product to perform content filtering on DHS LAN A email

Focus will be on sanitization rate, usability and availability

Using evidence to justify and encourage procurement of commercial content filtering products

Partnering with vendors to advance state of art for email and web content filtering

What Can YOU Do with this Knowledge?

1. Research content filtering technology – become smarter on “pass known good” approach

2. Become familiar with current commercial state of art

3. Go get some and protect your networks!!!

4. Demand vendors improve offerings – the demand side of supply/demand

5. Developers: Go make better commercial offerings to advance state of art and lower cost through competition

Parting Words - Motivation

1. This approach works – 98% zero day kill rate

2. It is not monetarily costly, sort of depends…

3. This approach impacts user experience (based upon policy to block/pass undefinable objects) – this is a good thing as it resets expectations for “cost of security”

4. Really drives bad guys cost up – makes their job harder so maybe we are being strategically impactful

5. Soooo, go get some…..market research!
