POLITECNICO DI MILANO Regular Expression Matching for NIDS Computation [email protected] 3d DRESD 2008

3rd 3DDRESD: ReCPU 4 NIDS

Page 1: 3rd 3DDRESD: ReCPU 4 NIDS

POLITECNICO DI MILANO

Regular Expression Matching for NIDS

Computation

[email protected]

3d DRESD 2008

Page 2: 3rd 3DDRESD: ReCPU 4 NIDS

Rationale and objectives

Growing demand for high-speed packet analysis in network devices

Exploit high-speed regular expression matching in hardware-accelerated Intrusion Detection System devices

Analysis of the ReCPU architecture, adapting it to NIDS computation and implementing it on an FPGA board

Page 3: 3rd 3DDRESD: ReCPU 4 NIDS

Presentation Outline

Pattern matching: State of the Art
Proposed approach: ReCPU
NIDS overview
Conclusions and Future Works

Page 4: 3rd 3DDRESD: ReCPU 4 NIDS

What’s next

Pattern matching
– State of the Art
– Limitations

Proposed approach: ReCPU
NIDS overview
Conclusions and Future Works

Page 5: 3rd 3DDRESD: ReCPU 4 NIDS

Pattern matching: State of the art

3 possible approaches:

AUTOMATON-BASED (DFA or NFA)
Pros: Deterministic execution time (linear in the best cases and exponential in the worst ones), direct support of regular expressions
Cons: Might consume much memory unless the data structure is compressed

HEURISTIC-BASED
Pros: Can skip characters not in a match, sublinear execution time on average
Cons: Might suffer from algorithmic attacks in the worst case

FILTERING-BASED
Pros: Memory efficient thanks to the bit vectors
Cons: Might suffer from algorithmic attacks in the worst case, since it relies on the assumption that the signature rarely appears. Does not natively support wildcards and repetitions
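The "skip characters" behaviour of the heuristic-based family, and its worst case, can be illustrated with a Boyer-Moore-Horspool search. The slides do not name a specific algorithm, so this choice (and the function name `horspool_search`) is only an assumption made for illustration.

```python
def horspool_search(text: str, pattern: str) -> tuple[int, int]:
    """Return (index of the first match or -1, number of alignments tried)."""
    m, n = len(pattern), len(text)
    if m == 0:
        return 0, 0
    # Shift table: distance from the last occurrence of each character
    # (final pattern position excluded) to the end of the pattern.
    shift = {ch: m - 1 - i for i, ch in enumerate(pattern[:-1])}
    pos = 0
    alignments = 0
    while pos + m <= n:
        alignments += 1
        if text[pos:pos + m] == pattern:
            return pos, alignments
        # Skip ahead according to the text character aligned with the
        # last cell of the pattern; unknown characters skip a full pattern.
        pos += shift.get(text[pos + m - 1], m)
    return -1, alignments


# Friendly input: the 12 leading characters are crossed in 3 skips.
print(horspool_search("xxxxxxxxxxxxABCD", "ABCD"))
# Adversarial input: every alignment advances by a single character.
print(horspool_search("a" * 8, "baaa"))
```

The second call shows the kind of input an attacker could craft: the shift degenerates to one character per alignment, so the average-case advantage disappears.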

Page 6: 3rd 3DDRESD: ReCPU 4 NIDS

Limitations

Signature-matching intrusion detection systems have two types of performance limitations:
1) CPU-bound limitations that arise due to string matching
2) I/O-bound limitations caused by the overhead of reading packets from the network interface card (the number of packets may overwhelm the IDS internal packet buffers).

As to the first, it is possible to offload IDS computation to embedded hardware devices

Page 7: 3rd 3DDRESD: ReCPU 4 NIDS

What’s next

Pattern matching: State of the Art

Proposed approach: ReCPU
– RE as a programming language
– ReCPU architecture
– The complete framework
– Adaptability of the design

NIDS overview
Conclusions and Future Works

Page 8: 3rd 3DDRESD: ReCPU 4 NIDS

Proposed Approach: ReCPU

ReCPU: a new hardware approach for regular expression matching
Developed by M. Paolieri, I. Bonesana (ALaRI) and M.D. Santambrogio (Politecnico di Milano) for DNA sequence matching
It is a parallel and pipelined architecture able to deal with Regular Expressions (RE) as a programming language
No need for either a Deterministic or a Non-deterministic Finite Automaton
No additional setup time when the pattern to search for changes: it only requires updating the instruction memory with the new RE (without modifying the underlying hardware)

Page 9: 3rd 3DDRESD: ReCPU 4 NIDS

ReCPU instructions 1/2

Regular Expressions (RE) as a programming language
A RE is a sequence of instructions to be executed by the ReCPU processor

Example: RE = (ABCD)|(aacde) using a 4-comparator cluster

(       call
ABCD    compare with “ABCD”
)|      return and OR operator
(       call
aacd    compare with “aacd”
e)      compare with “e” and return
NOP     end of RE

Page 10: 3rd 3DDRESD: ReCPU 4 NIDS

ReCPU instructions 2/2

Operators like * and + correspond to loop instructions (finding more occurrences of the same pattern by looping on the same RE instruction)
Parentheses are managed as function calls: an open parenthesis is mapped to a call, while a closed one is mapped to a return

Whenever an open parenthesis is encountered, the current context is pushed onto the data stack

A RE is completely matched whenever a NOP instruction is fetched from the instruction memory

Page 11: 3rd 3DDRESD: ReCPU 4 NIDS

Instruction format

The binary code produced by the compiler (see later) is composed of an Opcode and a Reference

The Opcode is divided into 3 slices:
1. the MSB indicates an open parenthesis,
2. the next 2 bits indicate the internal operand (i.e. the one used within the characters of the reference),
3. the last bits stand for the external operand (i.e. loops and closed parentheses)
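The three-slice layout can be sketched as a pair of pack/unpack helpers. The slides give neither the width of the external-operand slice nor the numeric codes of the operators, so `EXT_BITS = 3` and the field values used below are purely hypothetical; only the ordering of the slices (open-parenthesis MSB, 2 internal bits, external bits last) comes from the text.

```python
EXT_BITS = 3  # ASSUMED width of the external-operand slice (not stated in the slides)


def pack_opcode(open_paren: bool, internal: int, external: int) -> int:
    """Build an opcode: [open-paren MSB | 2 internal bits | EXT_BITS external bits]."""
    if not 0 <= internal < 4:
        raise ValueError("internal operand must fit in 2 bits")
    if not 0 <= external < (1 << EXT_BITS):
        raise ValueError("external operand does not fit in EXT_BITS bits")
    return (int(open_paren) << (2 + EXT_BITS)) | (internal << EXT_BITS) | external


def unpack_opcode(opcode: int) -> tuple[bool, int, int]:
    """Split an opcode back into (open_paren, internal, external)."""
    external = opcode & ((1 << EXT_BITS) - 1)
    internal = (opcode >> EXT_BITS) & 0b11
    open_paren = bool((opcode >> (2 + EXT_BITS)) & 1)
    return open_paren, internal, external


# Hypothetical field values, chosen only to exercise the bit layout.
op = pack_opcode(True, 0b10, 0b001)
print(format(op, "06b"), unpack_opcode(op))
```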

Page 12: 3rd 3DDRESD: ReCPU 4 NIDS

Bitwise representation of the opcodes

Page 13: 3rd 3DDRESD: ReCPU 4 NIDS

ReCPU test configuration

Block diagram of ReCPU with 4 Clusters, each with a ClusterWidth of 4. The main blocks are the Control Path and the Data Path, the latter composed of a pipeline of Fetch/Decode and Execution stages

Page 14: 3rd 3DDRESD: ReCPU 4 NIDS

Architecture description 1/5

Architecture
Design a dedicated adaptable architecture
Exploit well-known microarchitectural techniques
High level of parallelism exploited
Throughput higher than 1 character per clock cycle
Requires just O(n) memory locations, where n is the length of the RE

Page 15: 3rd 3DDRESD: ReCPU 4 NIDS

Architecture description 2/5

Architecture details:
Harvard-based architecture
Parallel accesses to memories
Parallel execution of multiple comparisons
Two-stage pipeline
Instruction and data prefetching to avoid pipeline stalls

Page 16: 3rd 3DDRESD: ReCPU 4 NIDS

Architecture description 3/5

Several parallel comparators, grouped in units called Clusters, are placed in the Data Path
Each comparator compares an input text character with a different one from the pattern
The number of elements of a cluster is indicated as ClusterWidth; it represents the number of characters that can be compared every clock cycle whenever a sub-RE is matching
A bigger ClusterWidth corresponds to much better performance whenever the input string starts matching the RE, because a wider sub-expression (i.e. an instruction) is processed in a single clock cycle
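The effect of ClusterWidth can be seen by cutting a literal into the per-instruction references a cluster would compare, one per clock cycle while matching. This is a minimal sketch: the helper name `split_into_chunks` is hypothetical, and call/return latency is ignored.

```python
def split_into_chunks(literal: str, cluster_width: int) -> list[str]:
    """Cut a literal into references of at most cluster_width characters."""
    if cluster_width < 1:
        raise ValueError("cluster_width must be positive")
    return [literal[i:i + cluster_width]
            for i in range(0, len(literal), cluster_width)]


# With a 4-comparator cluster, "aacde" needs two compare instructions,
# as in the (ABCD)|(aacde) example; a narrower cluster needs more.
for width in (2, 4, 8):
    chunks = split_into_chunks("aacde", width)
    print(width, chunks, len(chunks), "cycle(s) while matching")
```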

Page 17: 3rd 3DDRESD: ReCPU 4 NIDS

Architecture description 4/5

The architecture is composed of several Clusters; their total number is indicated as NCluster.
Each comparator Cluster processes the input text shifted by one character with respect to the previous cluster.
Increasing NCluster, more characters are checked in parallel, so ReCPU is faster whenever the pattern is not matching the input text
Due to the higher hardware complexity, the critical path increases and the maximum possible clock frequency decreases

Page 18: 3rd 3DDRESD: ReCPU 4 NIDS

Architecture description 5/5

Each cluster is shifted by one character from the previous one in order to cover a wider set of data in a single clock cycle.

Page 19: 3rd 3DDRESD: ReCPU 4 NIDS

Example

Comparator clusters working on an input text.

The top and bottom pictures correspond to two subsequent clock cycles.
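A software model of one such clock cycle might look as follows. It is only a sketch of the shifted-cluster idea, not the VHDL design: `cluster_cycle` and its parameters are hypothetical names, and the ClusterWidth is taken to be the length of the compared reference.

```python
def cluster_cycle(text: str, base: int, chunk: str, n_cluster: int) -> list[bool]:
    """Model one clock cycle: cluster k sees the text shifted by k characters."""
    width = len(chunk)  # plays the role of ClusterWidth
    hits = []
    for k in range(n_cluster):
        window = text[base + k: base + k + width]
        # Each comparator checks one character; the cluster reports a match
        # only when the window is complete and every comparator agrees.
        hits.append(len(window) == width
                    and all(t == p for t, p in zip(window, chunk)))
    return hits


# Four clusters of four comparators cover 7 text characters in one cycle.
print(cluster_cycle("xxABCDxx", 0, "ABCD", 4))
```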

Page 20: 3rd 3DDRESD: ReCPU 4 NIDS

Data Path 1/2

The ReCPU Data Path can:
fetch the instruction
decode it
verify whether it matches the current part of the text or not.

The ReCPU Data Path cannot:
identify the result of the whole RE
request data or instructions from the external memories.
These tasks are managed by the Control Path (see later)

Page 21: 3rd 3DDRESD: ReCPU 4 NIDS

Data Path 2/2

The pipeline is composed of two stages: Fetch/Decode and Execute. The Control Path spends one cycle to precharge the pipeline and then starts exploiting the prefetching mechanism. Duplicated buffers were introduced in each stage to avoid stalls. Hence, the execution latency is reduced, with a consequent performance improvement.
When an RE starts matching, one buffer is used to prefetch the next instruction and the other is used as a backup of the first one. If the matching process fails (i.e. prefetching is useless), the backup instruction can be used without stalling the pipeline

Page 22: 3rd 3DDRESD: ReCPU 4 NIDS

Control Path 1/2

Page 23: 3rd 3DDRESD: ReCPU 4 NIDS

Control Path 2/2

The core of the Control Path is a Finite State Machine

Page 24: 3rd 3DDRESD: ReCPU 4 NIDS

Non matching state

While the text is not matching, the same instruction address is fetched and the data address advances, performing the comparison by means of the clusters inside the Data Path
If no match is detected, the data memory address is incremented by the number of clusters
This way several characters are compared every single clock cycle, leading to a throughput of clearly more than one character/cc.
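The address advance in the non-matching state can be modelled with a small cycle counter. This is a sketch under simplifying assumptions: a single compare instruction, no pipeline precharge cycle, and hypothetical names (`scan_cycles`, `n_cluster`).

```python
def scan_cycles(text: str, chunk: str, n_cluster: int) -> tuple[int, int]:
    """Return (index where chunk starts matching or -1, cycles spent scanning)."""
    width = len(chunk)
    addr = 0      # data memory address
    cycles = 0
    while addr + width <= len(text):
        cycles += 1
        # All clusters work in the same cycle, each shifted by one character.
        for k in range(n_cluster):
            if text[addr + k: addr + k + width] == chunk:
                return addr + k, cycles
        addr += n_cluster  # no cluster matched: skip NCluster characters
    return -1, cycles


text = "x" * 10 + "ABCD"
print(scan_cycles(text, "ABCD", 1))  # one cluster: one character per cycle
print(scan_cycles(text, "ABCD", 4))  # four clusters: four characters per cycle
```

In this toy run the start of the match is found in 3 cycles with four clusters against 11 cycles with one, which is the "more than one character/cc" behaviour described above.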

Page 25: 3rd 3DDRESD: ReCPU 4 NIDS

Matching state

When an RE starts matching, the FSM goes into the EX_M state and ReCPU switches to the matching mode, using a single comparator cluster to perform the pattern matching task on the data memory. As in the previous case, more than one character per clock cycle is checked by the different comparators of the cluster. When the FSM is in this state and one of the instructions composing the RE fails, the whole process has to be restarted from the point where the RE started to match.

Page 26: 3rd 3DDRESD: ReCPU 4 NIDS

The complete Framework

Page 27: 3rd 3DDRESD: ReCPU 4 NIDS

Adaptability of the design

The VHDL implementation is fully configurable: it is possible to modify some architectural parameters such as:

number and dimensions of the parallel comparator units (ClusterWidth and NCluster)
width of buffer registers and memory addresses

This way it is possible to define the best architecture according to the user requirements, finding a good trade-off between timing, area constraints and desired performance

Page 28: 3rd 3DDRESD: ReCPU 4 NIDS

The compiler

Compiler
Translation of standard high-level RE into ReCPU machine code instructions
Adaptation of text to data memory
Inspired by the VLIW design style, where architectural parameters are exposed to the compiler in order to exploit the parallelism by issuing the instructions to different parallel units

Page 29: 3rd 3DDRESD: ReCPU 4 NIDS

Design Space Exploration

Design Space Exploration to determine optimal architecture configurations on different Xilinx FPGAs

• Changing the number of parallel units

– between {2, 4, 8, 16, 32, 64}

• Definition of a cost function

Tx are time/char:
Tcnm: not matching with AND operator
Tonm: not matching with OR operator
Tm: matching

p1: probability of having an AND operator with a non-matching pattern = 0.25
p2: probability of having an OR operator with a non-matching pattern = 0.25
p3: probability of having a match with any operator = 0.5

$f_{cost} = p_1 T_{cnm} + p_2 T_{onm} + p_3 T_m$
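As a worked example, the cost function is a weighted sum of the three per-character times. The sketch below uses p1 = p2 = 0.25 and p3 = 0.5 as defaults (so that the weights sum to 1); the timing values passed in are invented placeholders, not measurements from the slides.

```python
def cost(t_cnm: float, t_onm: float, t_m: float,
         p1: float = 0.25, p2: float = 0.25, p3: float = 0.5) -> float:
    """Weighted average time per character: p1*Tcnm + p2*Tonm + p3*Tm."""
    return p1 * t_cnm + p2 * t_onm + p3 * t_m


# HYPOTHETICAL time/char values for two configurations, for illustration only.
config_a = cost(t_cnm=2.0, t_onm=4.0, t_m=1.0)
config_b = cost(t_cnm=1.0, t_onm=2.0, t_m=2.0)
print(config_a, config_b)  # the configuration with the lower cost is preferred
```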

Page 30: 3rd 3DDRESD: ReCPU 4 NIDS

Tcp = critical path delay

Page 31: 3rd 3DDRESD: ReCPU 4 NIDS

Design Space Exploration Results

It is possible to identify the best architecture according to area and performance requirements

Page 32: 3rd 3DDRESD: ReCPU 4 NIDS

Performance

Whenever there is a function call (i.e. nested parentheses), one additional clock cycle of latency is required. The throughput of the proposed architecture depends on the RE as well as on the input text, so it is not possible to compute a fixed throughput but only to provide the performance achievable in different cases.

Page 33: 3rd 3DDRESD: ReCPU 4 NIDS

Experimental results

Baseline: grep (www.gnu.org/software/grep) on a Linux Fedora Core 4.0 PC with an Intel Pentium 4 at 2.80 GHz and 512 MB RAM, measuring the execution time with the Linux time command and taking the real value as the result.

• If loop operators are not present, ReCPU performs equally with more than one instruction and OR operators or with a single AND instruction

• With loop operators there is a slowdown in performance, while still achieving a speedup of more than 60.

Page 34: 3rd 3DDRESD: ReCPU 4 NIDS

What’s next

Pattern matching: State of the Art
Proposed approach: ReCPU

NIDS overview
– Packet analysis
– Snort

Conclusions and Future Works

Page 35: 3rd 3DDRESD: ReCPU 4 NIDS

NIDS overview 1/2

A great number of intrusion detection systems (IDS) are software applications running on standard Microsoft Windows or Linux platforms.

For 10 Mbit/s Ethernet links, these solutions provide sufficient power to capture and process the data packets.

However, for higher-speed links (gigabit and higher) hardware accelerators have begun to be integrated into IDS systems, to process packets in real-time (or near real-time).

Page 36: 3rd 3DDRESD: ReCPU 4 NIDS

NIDS overview 2/2

A network-based IDS (NIDS) resides on a network segment and analyzes network traffic in real time to detect malicious packets in transit. Passive network monitors take advantage of “promiscuous mode” access

In particular, we’ll focus on signature-based NIDS, which scan packets for specific characters (the "signature") in the header and/or payload.

The IDS compares the values within these fields to a pre-defined database of values that define a potential attack.

Source: Gregg Judge, FPGA architecture ups intrusion detection performance
Thomas H. Ptacek, Timothy N. Newsham, Insertion, evasion and denial of service: eluding network intrusion detection

Page 37: 3rd 3DDRESD: ReCPU 4 NIDS

Packet analysis

Passive protocol analysis is useful because it is unobtrusive and, at the lowest levels of network operation, extremely difficult to evade. The installation of a sniffer does not cause any disruption to the network or degradation of network performance. Individual machines on the network can be (and usually are) unaware of the presence of a sniffer.
Because the network media provides a reliable way for a sniffer to obtain copies of raw network traffic, there's no obvious way to transmit a packet on a monitored network without it being seen.

Page 38: 3rd 3DDRESD: ReCPU 4 NIDS

Snort

The following fields are of most interest to a basic NIDS, such as Snort (www.snort.org):

Source address
Destination address
Port
Packet payload

e.g. a typical Snort rule is:

alert tcp $EXTERNAL_NET any -> $HOME_NET 79 (msg:"FINGER cmd_rootsh backdoor attempt"; flow:to_server,established; content:"cmd_rootsh"; reference:nessus,10070; reference:url,www.sans.org/y2k/TFN_toolkit.htm; reference:url,www.sans.org/y2k/fingerd.htm; classtype:attempted-admin; sid:320; rev:10;)
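To make the role of the payload signature concrete, the sketch below pulls the `content:"..."` strings out of a rule and checks them against a packet payload. It is a simplification, not Snort's matching engine: hex byte sequences, negation, content modifiers and the header fields are all ignored, and the helper names are hypothetical. The rule is the one quoted above with the reference options dropped for brevity.

```python
import re

RULE = ('alert tcp $EXTERNAL_NET any -> $HOME_NET 79 '
        '(msg:"FINGER cmd_rootsh backdoor attempt"; flow:to_server,established; '
        'content:"cmd_rootsh"; classtype:attempted-admin; sid:320; rev:10;)')


def content_patterns(rule: str) -> list[bytes]:
    """Extract the plain-text strings of every content:"..." option."""
    return [s.encode("ascii") for s in re.findall(r'content:"([^"]*)"', rule)]


def payload_matches(rule: str, payload: bytes) -> bool:
    """True when the rule has content options and all of them occur in the payload."""
    patterns = content_patterns(rule)
    return bool(patterns) and all(p in payload for p in patterns)


print(content_patterns(RULE))
print(payload_matches(RULE, b"cmd_rootsh\r\n"))  # suspicious finger request
print(payload_matches(RULE, b"root\r\n"))        # ordinary finger request
```

The extracted strings are the signatures that would be handed to the pattern-matching stage.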

Page 39: 3rd 3DDRESD: ReCPU 4 NIDS

Page 40: 3rd 3DDRESD: ReCPU 4 NIDS

What’s next

Pattern matching: State of the Art
Proposed approach: ReCPU
NIDS overview

Conclusions and Future Works
– Board Implementation

Page 41: 3rd 3DDRESD: ReCPU 4 NIDS

IP Fragmentation issues

IP defines a mechanism, called “fragmentation”, that allows machines to break individual packets into smaller ones. So, reassembly issues manifest themselves at the IP layer
Insertion attacks disrupt stream reassembly by adding packets to the stream that would cause it to be reassembled differently on the end system, if the end system accepted the disruptive packets.
An IDS that does not properly handle out-of-order fragments is vulnerable; an attacker can intentionally scramble her fragment streams to elude the IDS.
It's also important that the IDS not attempt to reconstruct packets until all fragments have been seen. Another easily made mistake is to attempt to reassemble as soon as the marked final fragment arrives.
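The two reassembly rules above (accept fragments in any order, and wait until every fragment has been seen rather than reacting to the final one) can be sketched as a small buffer. This is an illustrative model only: `FragmentBuffer` is a hypothetical class, offsets are plain byte offsets rather than the 8-byte units of a real IP header, and overlapping fragments are not resolved.

```python
from typing import Optional


class FragmentBuffer:
    """Collects the fragments of one datagram, keyed by byte offset."""

    def __init__(self) -> None:
        self.fragments: dict[int, bytes] = {}
        self.total_length: Optional[int] = None  # known once the final fragment is seen

    def add(self, offset: int, data: bytes, more_fragments: bool) -> None:
        self.fragments.setdefault(offset, data)  # keep the first copy of a duplicate
        if not more_fragments:
            self.total_length = offset + len(data)

    def reassemble(self) -> Optional[bytes]:
        """Return the payload only when it is complete, else None."""
        if self.total_length is None:
            return None  # final fragment not seen yet
        payload = b""
        for offset in sorted(self.fragments):
            if offset != len(payload):
                return None  # hole (or overlap, which this sketch does not resolve)
            payload += self.fragments[offset]
        return payload if len(payload) == self.total_length else None


buf = FragmentBuffer()
buf.add(8, b"sh", more_fragments=False)   # the final fragment arrives first
print(buf.reassemble())                   # still incomplete
buf.add(0, b"cmd_", more_fragments=True)
buf.add(4, b"root", more_fragments=True)
print(buf.reassemble())                   # complete only now
```

Only the reassembled payload contains the full signature; none of the individual fragments does.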

Page 42: 3rd 3DDRESD: ReCPU 4 NIDS

Implementing on Board

It’s necessary to wrap the ReCPU core into an IP-CORE, in order to connect it to the network card and to a RISC processor.
The ReCPU IP-CORE will be attached as a slave to an OPB bus, mastered by a MicroBlaze processor

[Block diagram; labels: ReCPU core, IP CORE, Ethernet, MicroBlaze, OPB bus]

Page 43: 3rd 3DDRESD: ReCPU 4 NIDS

Steps

1. The packet in transit is intercepted by the Ethernet interface listening in promiscuous mode

2. The onboard RISC processor running Linux:
• masters the Ethernet device,
• receives the packet,
• manages fragmentation and reassembly, if needed,
• forwards the level 3 payload to the ReCPU core

3. ReCPU analyzes what it receives from the RISC processor

4. Results of the pattern-matching process are returned to the RISC processor

5. If no match occurs, the packet can be ignored; otherwise, proper action is taken as a consequence

Page 44: 3rd 3DDRESD: ReCPU 4 NIDS

Questions?