21
A Memory-Efficient Re configurable Aho-Corasick FSM Implemen tation for Intrusion Dete ction Systems Authors: Seongwook Youn and Dennis McLe od Presenter: Yi-Sheng, Lin Date: Sep. 11, 2007 Publisher/Conf. : IC-SAMOS 2007 Dept. of Computer Science and Information Engineering National Cheng Kung University, Taiwan R.O.C.

A Memory-Efficient Reconfigurable Aho-Corasick FSM Implementation for Intrusion Detection Systems

  • Upload
    kiona

  • View
    37

  • Download
    1

Embed Size (px)

DESCRIPTION

A Memory-Efficient Reconfigurable Aho-Corasick FSM Implementation for Intrusion Detection Systems. Authors: Seongwook Youn and Dennis McLeod Presenter: Yi-Sheng, Lin Date: Sep. 11, 2007 Publisher/Conf. : IC-SAMOS 2007. - PowerPoint PPT Presentation

Citation preview

Page 1: A Memory-Efficient Reconfigurable Aho-Corasick FSM Implementation for Intrusion Detection Systems

A Memory-Efficient Reconfigurable

Aho-Corasick FSM Implementation for Intrusion Detection S

ystems Authors: Seongwook Youn and Dennis McLeod Presenter: Yi-Sheng, Lin Date: Sep. 11, 2007 Publisher/Conf. : IC-SAMOS 2007

Dept. of Computer Science and Information Engineering National Cheng Kung University,

Taiwan R.O.C.

Page 2: A Memory-Efficient Reconfigurable Aho-Corasick FSM Implementation for Intrusion Detection Systems

Outline

1. Introduction

2. The Aho-Corasick Algorithm

3. The Split-AC Algorithm

4. IDS Rule Sets and Partitioning

5. Implementation and Evaluation

6. Conclusions

Page 3: A Memory-Efficient Reconfigurable Aho-Corasick FSM Implementation for Intrusion Detection Systems

Introduction

The Aho-Corasick (AC) algorithm is a very powerful but memory-hungry pattern matching algorithm

The main contribution of the "Split-AC“ algorithm requires between 28-75% less memory than the state-of-the-art AC implementation.

A secondary contribution

performs packet header classification, matches one pattern per rule, fits on a single FPGA chip

Page 4: A Memory-Efficient Reconfigurable Aho-Corasick FSM Implementation for Intrusion Detection Systems

The Aho-Corasick Algorithmbat

batch

cat

Page 5: A Memory-Efficient Reconfigurable Aho-Corasick FSM Implementation for Intrusion Detection Systems

The Aho-Corasick Algorithm (Cont.)

The AC algorithm has the significant advantage : Every text character is examined only once, i.e. the lookup cost is

O(N) where N the length of the text.

The major disadvantage of AC algorithm :

It requires large amounts of memory in a straightforward implementation

Page 6: A Memory-Efficient Reconfigurable Aho-Corasick FSM Implementation for Intrusion Detection Systems

The Split-AC Algorithm

Most patterns use only a small subset of the 256 possible characters

Some pattern characters are frequent and appear in a transition almost in every state while others appear infrequently

Partition the single, big FSM into smaller ones Using character translation in order to compress the 8-bit

character input value into using fewer bits

Page 7: A Memory-Efficient Reconfigurable Aho-Corasick FSM Implementation for Intrusion Detection Systems

The Split-AC Algorithm(Cont.)

Using a combination of RAM and CAM (Content-addressable memory ).

Frequently used transitions are kept in a regular table

Infrequent are kept in a CAM

Default transitions (i.e. return to the initial state) are merged into a single entry.

Page 8: A Memory-Efficient Reconfigurable Aho-Corasick FSM Implementation for Intrusion Detection Systems

Split-AC Architecture

Page 9: A Memory-Efficient Reconfigurable Aho-Corasick FSM Implementation for Intrusion Detection Systems

The Split-AC Algorithm(Cont.)

Define : f (c) : the occurrence frequency of character c in

transitions of this fsm Tf : the frequency threshold f (c) > Tf : transitions made on frequent characters are c

alled frequent transitions. 0 < f(c) < Tf : transitions made for these characters are c

alled infrequent transitions. Characters with a zero occurrence frequency are non-exi

stent characters.

Page 10: A Memory-Efficient Reconfigurable Aho-Corasick FSM Implementation for Intrusion Detection Systems

The Split-AC Algorithm(Cont.)

1) The input character c is used to address the character translation table, which produces the translated character c'.

2) The state lookup table and the CAM are addressed concurrently to find the next state. If c' is a "frequent“ character, the answer is offered by the state lookup table, otherwise by the CAM.

3) Check if the next state is also a "final" state. If it is "final", a pattern was located and the appropriate action must be taken.

Page 11: A Memory-Efficient Reconfigurable Aho-Corasick FSM Implementation for Intrusion Detection Systems

The Split-AC Algorithm(Cont.)Split-AC Optimizations

In order to reduce the CAM size

1. Assigning sequential numbers to all fsm states that have at least one infrequent transition

2. The grouped states share a common prefix of s -s' bits and have a individual suffix of s‘ bits.

3. comparing the remaining s -s' bits in the data portion of the CAM

p.s. : The idea here is that usually fewer characters appear in the CAM than in the state lookup table.

Page 12: A Memory-Efficient Reconfigurable Aho-Corasick FSM Implementation for Intrusion Detection Systems

IDS Rule Sets and Partitioning

Using between 3 and 5 different header parameters to classify the rules, which are (depending on the packet type):

. TCP/UDP: protocol, source/destination IP address, source/destination ports. . ICMP: protocol, source/destination IP address, ICMP type. . IP: protocol, source/destination IP address.

Page 13: A Memory-Efficient Reconfigurable Aho-Corasick FSM Implementation for Intrusion Detection Systems

Split-AC Architecture

Page 14: A Memory-Efficient Reconfigurable Aho-Corasick FSM Implementation for Intrusion Detection Systems

IDS Rule Sets and Partitioning

This rule grouping method results in approximately 400 rule sets.

Assigning one Split-AC fsm to each rule set then some of The resulting fsms will still have a large number of states

Partition large rule sets into smaller subsets which run concurrently

Page 15: A Memory-Efficient Reconfigurable Aho-Corasick FSM Implementation for Intrusion Detection Systems

IDS Rule Sets and Partitioning

Split- AC fsm with 512 states (9 state bits) and 128 different frequent characters (7 translated character bits). 512 * 128 * 9 + 256 * 7 = 576K + 1.75K = 577.5 Kbits.

Divided into 8 smaller fsms with 64 states (6 state bits) and 64 different frequent characters (6 translated character bits).

8* (64*64*6+256*6) = 8* (24K+1.5K) = 204 Kbits The cost in this tradeoff is increased control logic, since

we now have 8 fsms running concurrently instead of 1.

Page 16: A Memory-Efficient Reconfigurable Aho-Corasick FSM Implementation for Intrusion Detection Systems

Implementation and Evaluation

Page 17: A Memory-Efficient Reconfigurable Aho-Corasick FSM Implementation for Intrusion Detection Systems

Implementation and Evaluation

The rule set subdivision process can generate up to nearly 900 subsets each of which requires two different memories.

Virtex4 FPGAs have a maximum of 552 dual-port memory blocks

The solution to this problem is :

1. allocate data from mutually exclusive fsms into the same FPGA block.

2. allocating non-exclusive fsm data into different ports in the same FPGA block

p.s. : mutually exclusive : The fsms can never both be active for the same packet.

Page 18: A Memory-Efficient Reconfigurable Aho-Corasick FSM Implementation for Intrusion Detection Systems

Implementation and Evaluation

Page 19: A Memory-Efficient Reconfigurable Aho-Corasick FSM Implementation for Intrusion Detection Systems

Implementation and Evaluation

Page 20: A Memory-Efficient Reconfigurable Aho-Corasick FSM Implementation for Intrusion Detection Systems

Implementation and Evaluation

Page 21: A Memory-Efficient Reconfigurable Aho-Corasick FSM Implementation for Intrusion Detection Systems

Conclusions

Split-AC is small in terms of memory footprint but is not the fastest alternative to string matching.

A future course of research could be to improve the Split-AC architecture in order to make it faster.