MIDeA: A Multi-Parallel Intrusion Detection Architecture

Giorgos Vasiliadis, FORTH-ICS, Greece
Michalis Polychronakis, Columbia U., USA
Sotiris Ioannidis, FORTH-ICS, Greece

CCS 2011, 19 October 2011


Network Intrusion Detection Systems

• Typically deployed at ingress/egress points

– Inspect all network traffic

– Look for suspicious activities

– Alert on malicious actions

[Figure: a NIDS on the 10 GbE link between the Internet and the internal network]

Challenges

• Traffic rates are increasing

– 10 Gbit/s Ethernet speeds are common in metro/enterprise networks

– Up to 40 Gbit/s at the core

• Increasingly complex analysis must be performed at ever higher speeds

– Deep packet inspection

– Stateful analysis

– 1000s of attack signatures

Designing NIDS

• Fast

– Need to handle many Gbit/s

– Scalable

• Moore’s law does not hold anymore

• Commodity hardware

– Cheap

– Easily programmable


Today: fast or commodity

• Fast “hardware” NIDS

– FPGA/TCAM/ASIC based

– Throughput: High

• Commodity “software” NIDS

– Processing by general-purpose processors

– Throughput: Low


MIDeA

• A NIDS out of commodity components

– Single-box implementation

– Easy programmability

– Low price

Can we build a 10 Gbit/s NIDS with commodity hardware?


Outline

• Architecture

• Implementation

• Performance Evaluation

• Conclusions


Single-threaded performance

• Vanilla Snort: 0.2 Gbit/s

[Figure: NIC → Preprocess → Pattern matching → Output]

Problem #1: Scalability

• Single-threaded NIDS have limited performance

– Do not scale with the number of CPU cores


Multi-threaded performance

• Vanilla Snort: 0.2 Gbit/s

• With multiple CPU-cores: 0.9 Gbit/s

[Figure: one NIC feeding three parallel Preprocess → Pattern matching → Output pipelines]

Problem #2: How to split traffic

Synchronization overheads

Cache misses

Receive-Side Scaling (RSS)

[Figure: NIC distributing traffic to CPU cores]

Multi-queue performance

• Vanilla Snort: 0.2 Gbit/s

• With multiple CPU-cores: 0.9 Gbit/s

• With multiple Rx-queues: 1.1 Gbit/s

[Figure: an RSS NIC feeding three parallel Preprocess → Pattern matching → Output pipelines]

Problem #3: Pattern matching is the bottleneck

Offload pattern matching on the GPU

[Figure: the NIC → Preprocess → Pattern matching → Output pipeline, annotated "> 75%"]

Why GPU?

• General-purpose computing

– Flexible and programmable

• Powerful and ubiquitous

– Constant innovation

• Data-parallel model

– More transistors for data processing rather than data caching and flow control

Offloading pattern matching to the GPU

• Vanilla Snort: 0.2 Gbit/s

• With multiple CPU-cores: 0.9 Gbit/s

• With multiple Rx-queues: 1.1 Gbit/s

• With GPU: 5.2 Gbit/s

[Figure: an RSS NIC feeding three parallel Preprocess → Pattern matching → Output pipelines]
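The four throughput figures on this slide can be read as cumulative speedups over vanilla Snort. A minimal sketch of that arithmetic, using only the numbers quoted on the slide (the stage labels and function name are illustrative, not from the talk):

```python
# Throughput figures (Gbit/s) quoted on the slide, in the order they are introduced.
stages = [
    ("Vanilla Snort", 0.2),
    ("Multiple CPU-cores", 0.9),
    ("Multiple Rx-queues", 1.1),
    ("GPU offload", 5.2),
]

def speedups(stages):
    """Return (name, speedup over the first stage, gain over the previous stage)."""
    baseline = stages[0][1]
    previous = baseline
    rows = []
    for name, gbps in stages:
        rows.append((name, gbps / baseline, gbps / previous))
        previous = gbps
    return rows

rows = speedups(stages)
for name, total, step in rows:
    print(f"{name:20s} {total:5.1f}x overall, {step:4.2f}x over previous stage")
```

Under these numbers, GPU offloading is the largest single step: roughly 4.7x over the multi-queue configuration, and 26x over the single-threaded baseline.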

Outline

• Architecture

• Implementation

• Performance Evaluation

• Conclusions


Multiple data transfers

• Several data transfers between different devices

Are the data transfers worth the computational gains offered?

[Figure: NIC, CPU, and GPU, linked over PCIe]

Capturing packets from NIC

• Packets are hashed in the NIC and distributed to different Rx-queues

• Memory-mapped ring buffers for each Rx-queue

[Figure: the network interface with multiple Rx queues, each assigned a ring buffer shared between kernel space and user space]
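The hash-and-distribute step can be illustrated in software. The sketch below is not the NIC's actual hash: real RSS hardware computes its own hash over header fields, and CRC32 is substituted here purely for illustration. The function name, the queue count, and the choice to sort the endpoints (so both directions of a connection map to the same queue) are all assumptions.

```python
import zlib

NUM_RX_QUEUES = 8  # assumption: one Rx queue per CPU core

def rx_queue_for(src_ip, src_port, dst_ip, dst_port, proto,
                 num_queues=NUM_RX_QUEUES):
    """Map a packet's 5-tuple to an Rx queue index.

    Stand-in for the NIC's hardware hash: CRC32 is used only for
    illustration. The two endpoints are sorted first, so both
    directions of a connection hash to the same queue.
    """
    a, b = sorted([(src_ip, src_port), (dst_ip, dst_port)])
    key = f"{a[0]}:{a[1]}|{b[0]}:{b[1]}|{proto}".encode()
    return zlib.crc32(key) % num_queues

# Both directions of one TCP connection land in the same queue.
fwd = rx_queue_for("10.0.0.1", 40000, "192.168.1.5", 80, "tcp")
rev = rx_queue_for("192.168.1.5", 80, "10.0.0.1", 40000, "tcp")
print(fwd, rev)
```

Because the mapping depends only on the flow identifier, every packet of a connection is delivered to the same queue without any coordination between cores.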

CPU Processing

• Packet capturing is performed by different CPU-cores in parallel

– Process affinity

• Each core normalizes and reassembles captured packets into streams

– Remove ambiguities

– Detect attacks that span multiple packets

• Packets of the same connection always end up on the same core

– No synchronization

– Cache locality

• Reassembled packet streams are then transferred to the GPU for pattern matching

– How to access the GPU?
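To see why reassembly matters for attacks that span multiple packets, consider a signature split across two segments. The sketch below is a deliberately simplified illustration, not the stream reassembly used in the system: it assumes segments never overlap and leave no gaps, and the function names and example payloads are hypothetical.

```python
def reassemble(segments):
    """Rebuild a byte stream from (sequence_number, payload) segments.

    Simplified: segments may arrive out of order, but are assumed
    not to overlap and to leave no gaps.
    """
    return b"".join(payload for _, payload in sorted(segments))

def contains(data, pattern):
    """Naive single-pattern search, enough to show the effect."""
    return pattern in data

pattern = b"/etc/passwd"

# The pattern is split across two packets, and the packets arrive out of order.
segments = [(11, b"sswd HTTP/1.0"), (0, b"GET /etc/pa")]

found_per_packet = any(contains(p, pattern) for _, p in segments)
found_in_stream = contains(reassemble(segments), pattern)
print(found_per_packet, found_in_stream)
```

Inspecting each packet on its own misses the signature, while inspecting the reassembled stream finds it.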

Accessing the GPU

• Solution #1: Master/Slave model

• Execution flow example

[Figure: Threads 1-4 connected to the GPU over PCIe (64 Gbit/s). Timeline rows "Transfer to GPU", "GPU execution", and "Transfer from GPU" show only P1. Throughput: 14.6 Gbit/s]

Accessing the GPU

• Solution #2: Shared execution by multiple threads

• Execution flow example

[Figure: Threads 1-4 connected to the GPU over PCIe (64 Gbit/s). Timeline rows "Transfer to GPU", "GPU execution", and "Transfer from GPU" show P1, P2, and P3. Throughput: 48.1 Gbit/s]

Transferring to GPU

• Small transfers degrade PCIe throughput

– Each core batches many reassembled packets into a single buffer

[Figure: a CPU-core pushing packets into a buffer that is scanned by the GPU]
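The batching idea can be sketched as packing many small payloads into one contiguous buffer, together with an index that records where each one starts. This is an illustrative layout only; the slides do not describe the actual buffer format, and the function names are hypothetical.

```python
def batch_streams(streams):
    """Pack many small byte strings into one contiguous buffer.

    Returns the buffer plus an index of (offset, length) pairs, so each
    stream can be located again after a single bulk transfer.
    """
    buffer = bytearray()
    index = []
    for s in streams:
        index.append((len(buffer), len(s)))
        buffer += s
    return bytes(buffer), index

def unpack(buffer, index):
    """Recover the original streams from the packed buffer."""
    return [buffer[off:off + n] for off, n in index]

streams = [b"GET / HTTP/1.1", b"USER anonymous", b"\x00\x01\x02"]
buffer, index = batch_streams(streams)
print(len(buffer), index)
```

One large transfer of `buffer` then replaces many small ones, and the index lets the receiving side treat each stream independently.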

Pattern Matching on GPU

• Uniformly, one GPU core for each reassembled packet stream

[Figure: a packet buffer feeding multiple GPU cores, which output matches]
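The one-core-per-stream mapping works because each stream can be scanned independently of the others. The slide does not name the matching algorithm; the sketch below assumes Aho-Corasick, a common choice for multi-pattern matching, and runs on the CPU in Python rather than as a GPU kernel. All names and the example patterns are illustrative.

```python
from collections import deque

def build_automaton(patterns):
    """Build Aho-Corasick goto/fail/output tables for a list of byte patterns."""
    goto = [{}]     # goto[state][byte] -> next state
    fail = [0]      # failure link per state
    out = [set()]   # ids of patterns recognised on reaching each state
    for pid, pat in enumerate(patterns):
        state = 0
        for byte in pat:
            if byte not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append(set())
                goto[state][byte] = len(goto) - 1
            state = goto[state][byte]
        out[state].add(pid)
    queue = deque(goto[0].values())  # depth-1 states fail to the root
    while queue:
        state = queue.popleft()
        for byte, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and byte not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(byte, 0)
            out[nxt] |= out[fail[nxt]]  # inherit matches ending at the suffix
    return goto, fail, out

def scan(stream, automaton):
    """Scan one stream; return (end_offset, pattern_id) for every match."""
    goto, fail, out = automaton
    state, found = 0, []
    for pos, byte in enumerate(stream):
        while state and byte not in goto[state]:
            state = fail[state]
        state = goto[state].get(byte, 0)
        found.extend((pos + 1, pid) for pid in sorted(out[state]))
    return found

patterns = [b"passwd", b"cmd.exe", b"wd"]
automaton = build_automaton(patterns)
streams = [b"GET /etc/passwd", b"no match here", b"run cmd.exe now"]

# Each stream is scanned independently of the others; a plain loop stands in
# here for assigning one GPU thread to each stream.
results = [scan(s, automaton) for s in streams]
print(results)
```

The automaton is read-only during scanning, so all workers can share it while keeping only their own current state.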

Pipelining CPU and GPU

• Double-buffering

– Each CPU core collects new reassembled packets, while the GPUs process the previous batch

– Effectively hides GPU communication costs

[Figure: CPU and packet buffers]
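Double-buffering can be sketched with two buffers that alternate between a producer and a consumer. In the sketch below a Python thread stands in for the GPU and the main thread for a CPU core; this is an assumption-laden illustration of the pattern, not the system's implementation, and `run_pipeline` and `process` are hypothetical names.

```python
import queue
import threading

def run_pipeline(batches, process):
    """Overlap batch collection with batch processing using two buffers."""
    free = queue.Queue()
    full = queue.Queue()
    for _ in range(2):           # exactly two buffers: one filling, one in flight
        free.put([])
    results = []

    def consumer():              # stands in for the GPU side
        while True:
            buf = full.get()
            if buf is None:      # sentinel: no more batches
                break
            results.append(process(buf))
            buf.clear()
            free.put(buf)        # hand the buffer back for refilling

    worker = threading.Thread(target=consumer)
    worker.start()
    for batch in batches:        # stands in for a CPU core collecting streams
        buf = free.get()         # blocks only if both buffers are busy
        buf.extend(batch)
        full.put(buf)
    full.put(None)
    worker.join()
    return results

batches = [[b"abc", b"de"], [b"f"], [b"ghij", b"k", b"lm"]]
results = run_pipeline(batches, lambda buf: sum(len(s) for s in buf))
print(results)
```

While the consumer works on one buffer, the producer fills the other, so neither side waits unless it gets a full batch ahead of the other.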

Recap

• NIC: Demux (input: packets at 1-10 Gbps)

• CPUs: Per-flow protocol analysis (input: packet streams)

• GPUs: Data-parallel content matching (input: reassembled packet streams)

Outline

• Architecture

• Implementation

• Performance Evaluation

• Conclusions


Setup: Hardware

• NUMA architecture, QuickPath Interconnect

[Figure: system layout showing CPU-0, CPU-1, two memory banks, two IOHs, two GPUs, and the NIC]

Device     Model            Specs
2 x CPU    Intel E5520      2.27 GHz x 4 cores
2 x GPU    NVIDIA GTX480    1.4 GHz x 480 cores
1 x NIC    82599EB          10 GbE

Pattern Matching Performance

• The performance of a single GPU increases as the number of CPU-cores increases

Bounded by PCIe capacity

#CPU-cores    GPU throughput (Gbit/s)
1             14.6
2             26.7
4             42.5
8             48.1

Pattern Matching Performance

• Adding a second GPU: 70.7 Gbit/s
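The scaling behaviour in these measurements can be made explicit with a little arithmetic. The sketch below assumes the chart values map to 1, 2, 4, and 8 CPU-cores in that order, and uses the 64 Gbit/s PCIe figure quoted on the "Accessing the GPU" slides; the variable names are illustrative.

```python
# Single-GPU pattern matching throughput (Gbit/s) by number of CPU-cores,
# assuming the chart values correspond to 1, 2, 4, and 8 cores in order.
single_gpu = {1: 14.6, 2: 26.7, 4: 42.5, 8: 48.1}
PCIE_GBPS = 64.0   # PCIe figure quoted on the "Accessing the GPU" slides
two_gpus = 70.7    # throughput reported with a second GPU

base = single_gpu[1]
for cores, gbps in single_gpu.items():
    speedup = gbps / base
    efficiency = speedup / cores     # 1.0 would be perfect linear scaling
    pcie_share = gbps / PCIE_GBPS    # fraction of the quoted PCIe rate
    print(f"{cores} cores: {speedup:.2f}x, efficiency {efficiency:.2f}, "
          f"{pcie_share:.0%} of PCIe")

second_gpu_gain = two_gpus / single_gpu[8]
print(f"second GPU: {second_gpu_gain:.2f}x over one GPU")
```

Read this way, the gain from 4 to 8 cores is small compared with the earlier steps, which is consistent with the slide's note that throughput is bounded by PCIe capacity, and the second GPU adds roughly 1.5x rather than 2x.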

Setup: Network

[Figure: Traffic Generator/Replayer connected to MIDeA over 10 GbE]

Synthetic traffic

• Randomly generated traffic

[Figure: throughput in Gbit/s by packet size (200b, 800b, 1500b) for Snort (8x cores) and MIDeA. Data labels, in extraction order: 1.5, 4.8, 7.2; 2.1, 1.1, 2.4]

Real traffic

• 5.2 Gbit/s with zero packet loss

– Replayed trace captured at the gateway of a university campus

[Figure: throughput in Gbit/s. MIDeA: 5.2; Snort (8x cores): 1.1]

Summary

• MIDeA: A multi-parallel network intrusion detection architecture

– Single-box implementation

– Based on commodity hardware

– Less than $1500

• Operates at 5.2 Gbit/s with zero packet loss

– 70 Gbit/s pattern matching throughput