Using Cell Processors for Intrusion Detection through Regular Expression Matching with Speculation

1

Authors: Cătălin Radu, Cătălin Leordeanu, Valentin Cristea

Publisher: 2011 International Conference on Complex, Intelligent, and Software Intensive Systems

Presenter: Ye-Zhi Chen

Date: 2011/9/7

2

Introduction

Main purpose: determine whether incoming network traffic matches known attack signatures

Bottleneck: existing signature matching algorithms can scan only one byte at a time

Intrusion Detection System (IDS): an effective way, based on string matching, to provide a degree of security to computers connected to a network

An Internet worm in an incoming network packet is usually identified by a string in the packet payload that represents the executable program's name

3

Introduction

Hardware-based solutions: FPGAs implement specific string matching algorithms, making use of the high parallelism available

Examples: Bloom filters, DFAs

Run an adapted Speculative Parallel Pattern Matching (SPPM) algorithm on the IBM Cell Broadband Engine (Cell BE)

4

Intrusion Detection System

Three methodologies: signature-based, anomaly-based, and stateful protocol analysis

A. DFA matching:

1. Most signature databases contain several regular expressions, which can be combined into a single large DFA

2. The DFAs for distinct signatures are combined into a single DFA that simultaneously represents all the signatures

3. A DFA is a quintuple (Σ, S, s0, δ, F): Σ is the input alphabet; S is a finite set of states; s0 ∈ S is the initial state; δ : S × Σ → S is the transition function; F ⊆ S is the set of final (accepting) states. If an accepting state is reached, an attack signature has been found.
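
The definition above can be illustrated with a minimal serial matcher. This is only a sketch: the three-state DFA (which accepts once the substring "ab" has been seen) and the names `DELTA`, `START`, `ACCEPTING`, and `dfa_match` are hypothetical and do not come from the paper.

```python
from typing import Dict, Set, Tuple

# Hypothetical example DFA over the alphabet {"a", "b"}.
# It reaches the accepting state 2 once the substring "ab" has been seen.
DELTA: Dict[Tuple[int, str], int] = {
    (0, "a"): 1, (0, "b"): 0,
    (1, "a"): 1, (1, "b"): 2,
    (2, "a"): 2, (2, "b"): 2,
}
START = 0
ACCEPTING: Set[int] = {2}


def dfa_match(delta: Dict[Tuple[int, str], int], start: int,
              accepting: Set[int], data: str) -> int:
    """Scan data one symbol at a time.

    Returns the index at which an accepting state is first reached,
    or -1 if the whole input is scanned without a match.
    """
    state = start
    for i, symbol in enumerate(data):
        state = delta[(state, symbol)]  # one transition-table lookup per symbol
        if state in accepting:
            return i
    return -1


print(dfa_match(DELTA, START, ACCEPTING, "bbab"))  # 3
print(dfa_match(DELTA, START, ACCEPTING, "bbba"))  # -1
```

Each loop iteration performs one table lookup, which is the per-character cost that the later slides try to reduce.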

5


State    Input 0    Input 1
S1       S2         S1
S2       S1         S2

6


In the algorithm, the memory access that reads the next state for a given input character and current state takes several processor cycles

In the worst case, when the entire input string is scanned, the serial algorithm takes at least M * |I| cycles, where |I| is the length of the input string and M is the number of processor cycles needed to read an input character

Multi-byte matching methods: in the ideal case, consuming B bytes of the input string at a time gives a running time of M * |I| / B cycles
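
As a worked example of the two estimates above, the following sketch evaluates M * |I| and M * |I| / B. The function names and the sample values (M = 4, |I| = 1500, B = 4) are purely illustrative assumptions, not measurements from the paper.

```python
def serial_cycles(m: int, input_len: int) -> int:
    """Lower bound for the serial scan: M cycles per input character."""
    return m * input_len


def multibyte_cycles(m: int, input_len: int, b: int) -> float:
    """Ideal-case estimate when B bytes are consumed per lookup."""
    if b <= 0:
        raise ValueError("b must be positive")
    return m * input_len / b


# Hypothetical values chosen only for illustration.
M, INPUT_LEN, B = 4, 1500, 4
print(serial_cycles(M, INPUT_LEN))        # 6000
print(multibyte_cycles(M, INPUT_LEN, B))  # 1500.0
```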

7


B. Regular Expression Matching with Speculation

1. The main idea behind SPPM is to divide the input string into several chunks of the same size and process them in parallel

2. Initialization stage: the input string is split into two chunks, and the state variables for the Primary and Secondary threads are initialized

3. Parallel processing stage: the two threads scan their private chunks in lockstep. If either of them finds a match, the algorithm terminates

4. Validation stage: the Primary continues scanning into the Secondary's chunk

8


Three possible outcomes arise:

1. A match is found and the algorithm returns success

2. Coupling occurs before the end of the second chunk (the Primary reaches the same state that the Secondary recorded at that position)

3. The entire second chunk is traversed again and no match is found

[Figure: the three cases: found at parallel processing stage; found at validation stage; not found]
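
The stages and outcomes above can be sketched as a sequential simulation of the two-thread algorithm. This is not the authors' implementation. It assumes that the Secondary speculatively starts from the initial state, that it records its state after every symbol, and that any accepting state reached in the parallel stage ends the search. The example DFA and all names are hypothetical.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical DFA: accepting state 2 is reached once "ab" has been seen.
DELTA: Dict[Tuple[int, str], int] = {
    (0, "a"): 1, (0, "b"): 0,
    (1, "a"): 1, (1, "b"): 2,
    (2, "a"): 2, (2, "b"): 2,
}


def sppm_two_chunks(delta: Dict[Tuple[int, str], int], start: int,
                    accepting: Set[int],
                    data: str) -> Tuple[str, Optional[int]]:
    """Simulate the Primary/Secondary scheme; returns (outcome, index)."""
    mid = len(data) // 2
    second_len = len(data) - mid

    # Initialization stage: both state variables start from the initial state
    # (for the Secondary this is a speculative guess).
    p_state = start
    s_state = start
    history: List[int] = [s_state]  # Secondary's state before each symbol

    # Parallel processing stage, simulated in lockstep.
    for k in range(second_len):
        if k < mid:
            p_state = delta[(p_state, data[k])]
            if p_state in accepting:
                return ("match in parallel stage", k)
        s_state = delta[(s_state, data[mid + k])]
        history.append(s_state)
        if s_state in accepting:
            return ("match in parallel stage", mid + k)

    # Validation stage: the Primary continues into the Secondary's chunk.
    for k in range(second_len):
        if p_state == history[k]:
            # Coupling: from here on the Primary would repeat the
            # Secondary's path, which contained no match.
            return ("coupled, no match", mid + k)
        p_state = delta[(p_state, data[mid + k])]
        if p_state in accepting:
            return ("match in validation stage", mid + k)
    return ("no match", None)


print(sppm_two_chunks(DELTA, 0, {2}, "bbbbab"))  # ('match in parallel stage', 5)
print(sppm_two_chunks(DELTA, 0, {2}, "bbabbb"))  # ('match in validation stage', 3)
print(sppm_two_chunks(DELTA, 0, {2}, "bbaaaa"))  # ('coupled, no match', 4)
```

In the second call the signature straddles the chunk boundary, so only the validation stage can find it. In the third call the Primary couples after one extra symbol and the rest of the second chunk is skipped.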

9


This paper adapts the SPPM algorithm to parallel hardware, using all the processing units available.

Most favorable case: the speedup factor would be K, where K is the total number of processing units (in the parallel stage)

If no match is found in the parallel processing stage, a speedup is still possible in the validation stage if coupling occurs between a processing unit and its right neighbor.

Least favorable case: when no match is found and the entire input buffer is scanned, the complexity of the SPPM algorithm is the same as that of the serial algorithm.

11

Cell Intrusion Detection

The Cell processor consists of four components:

1. External input and output structures

2. Power Processing Element (PPE): the main processor

3. Synergistic Processing Elements (SPEs): eight coprocessors

4. Element Interconnect Bus (EIB): a specialized high-bandwidth circular data bus connecting the PPE, the input/output elements, and the SPEs

12


PPE:

A 64-bit microprocessor based on the PowerPC architecture

It runs at a clock speed of 3.2 GHz

It runs the operating system and coordinates the SPEs

It has a 32 KB L1 cache and a 512 KB L2 cache

13


SPE:

Each SPE contains a Synergistic Processing Unit (SPU), a memory flow controller, a memory management unit, a bus interface, and an atomic unit

The SPU is a RISC processor

Each SPE has 128 registers of 128 bits each

Supports Single Instruction Multiple Data (SIMD) instructions

Suitable for efficient loop unrolling and instruction scheduling

Each SPE has 256 KB of local store (LS) memory, which the SPU can access directly

DMA transfers are used because the SPEs cannot directly access the main memory of the PPE

14


Three different programs perform DFA matching:

1. Single-threaded DFA matching

2. The speculative parallel pattern matching solution (2 SPEs)

3. The speculative parallel pattern matching solution (8 SPEs)

16


Implementation

Step 1: Scan and parse the input file, then load the DFA

Step 2: An input string divider splits the input string into several chunks of a specified length

Step 3: These chunks are then matched against the DFA
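
Step 2 can be sketched as a simple input string divider. The function name `divide_input` and the choice to leave the last chunk shorter when the length does not divide evenly are assumptions made for illustration; the slides do not say how the paper handles a trailing remainder.

```python
from typing import List


def divide_input(data: bytes, chunk_len: int) -> List[bytes]:
    """Split data into consecutive chunks of chunk_len bytes.

    The last chunk may be shorter if len(data) is not a multiple of chunk_len.
    """
    if chunk_len <= 0:
        raise ValueError("chunk_len must be positive")
    return [data[i:i + chunk_len] for i in range(0, len(data), chunk_len)]


print(divide_input(b"abcdefgh", 3))  # [b'abc', b'def', b'gh']
```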

17


In the input file, an accepting state is marked by the string a() after the state number

18


The parser uses three buffers to scan and parse the input file:

The first buffer stores an entire line read from the file

The second buffer holds the state transition part of that line

The third buffer holds one element of this state transition array at a time; each value is then stored in the corresponding position of the DFA data structure

19


DFA data structure :

four main fields :

1. States

2. Final : an array of STATES_NO rows and SYMBOLS_NO_MIN columns

3. Start : starting state of DFA

4. STATES_NO : total number of states

Additional field:

dummy: because the DFA is larger than the maximum size of a single DMA transfer (16 KB), this field holds the remaining number of bytes needed to make the total size of the structure a multiple of 16 KB
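
The size of such a padding field can be computed as in the following sketch. The 16 KB limit comes from the slide. The function names and the 40000-byte example payload are hypothetical.

```python
DMA_MAX_BYTES = 16 * 1024  # maximum size of one DMA transfer (from the slide)


def padding_bytes(payload_size: int) -> int:
    """Bytes to add so the structure size is a multiple of DMA_MAX_BYTES."""
    return (-payload_size) % DMA_MAX_BYTES


def transfers_needed(payload_size: int) -> int:
    """Number of full-size DMA transfers needed for the padded structure."""
    return (payload_size + padding_bytes(payload_size)) // DMA_MAX_BYTES


# Hypothetical structure whose real fields occupy 40000 bytes.
print(padding_bytes(40000))     # 9152
print(transfers_needed(40000))  # 3
```

Padding to a whole number of 16 KB blocks lets the structure be moved with a fixed transfer size, with no special case for a shorter final transfer.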

20


DFA matching for 2 Cell SPUs:

1. The PPU waits for strings to process and divides each into two chunks

2. The PPU passes the two chunks to the two SPUs (called Primary and Secondary)

3. The SPUs run the DFA matching algorithm and return the results to the PPU

4. Based on the results, the PPU decides whether the Primary SPU should begin the validation stage

Parallel approach for 8 processing units:

1. Divide the eight SPUs into four pairs, each of which runs the two-threaded speculative algorithm

2. Each pair follows the same steps as described above

21


A DFA with more than 1500 states does not fit into the local store of an SPU

Solution for large DFAs:

1. Create several input files containing smaller DFAs (550 states is sufficient)

2. Combine these smaller DFAs to obtain the large DFA

3. Use the double-buffering technique, which consists of issuing a DMA transfer and continuing to work instead of waiting for its completion
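
The double-buffering idea in step 3 can be sketched in a hardware-independent way. Here a background thread stands in for the asynchronous DMA engine; this is an assumption made for illustration and not Cell SDK code. The names `fetch` and `process_all` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def fetch(parts: Sequence[T], index: int) -> T:
    """Stand-in for an asynchronous DMA transfer of one DFA part."""
    return parts[index]


def process_all(parts: Sequence[T], work: Callable[[T], R]) -> List[R]:
    """Process parts in order while the next part is already being fetched."""
    results: List[R] = []
    if not parts:
        return results
    with ThreadPoolExecutor(max_workers=1) as dma:
        pending = dma.submit(fetch, parts, 0)  # issue the first transfer
        for i in range(len(parts)):
            current = pending.result()  # wait only when the data is needed
            if i + 1 < len(parts):
                # Issue the next transfer and continue without waiting for it.
                pending = dma.submit(fetch, parts, i + 1)
            results.append(work(current))  # overlaps with the transfer
    return results


print(process_all([[1, 2], [3], [4, 5, 6]], sum))  # [3, 3, 15]
```

The point of the pattern is that the computation on the current buffer overlaps with the transfer of the next one, so transfer latency is hidden whenever the work takes at least as long as the transfer.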

22

Result
