41
Kargus: A Highlyscalable Softwarebased Intrusion Detection System M. Asim Jamshed * , Jihyung Lee , Sangwoo Moon , InsuYun * , Deokjin Kim , Sungryoul Lee ,YungYi , KyoungSoo Park * * Networked & Distributed Computing Systems Lab, KAIST † Laboratory of Network Architecture Design & Analysis, KAIST ‡ Cyber R&D Division, NSRI

Kargus: A Highly scalable Software based Intrusion ...shader.kaist.edu/kargus/docs/kargus-slides.pdfInternet Network Intrusion Detection Systems (NIDS) • Detect known malicious activities

  • Upload
    others

  • View
    5

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Kargus: A Highly scalable Software based Intrusion ...shader.kaist.edu/kargus/docs/kargus-slides.pdfInternet Network Intrusion Detection Systems (NIDS) • Detect known malicious activities

Kargus: A Highly‐scalable Software‐based Intrusion Detection System

M. Asim Jamshed*, Jihyung Lee†, SangwooMoon†, InsuYun*, Deokjin Kim‡, Sungryoul Lee‡, Yung Yi†, KyoungSoo Park*

* Networked & Distributed Computing Systems Lab, KAIST† Laboratory of Network Architecture Design & Analysis, KAIST

‡ Cyber R&D Division, NSRI

Page 2: Kargus: A Highly scalable Software based Intrusion ...shader.kaist.edu/kargus/docs/kargus-slides.pdfInternet Network Intrusion Detection Systems (NIDS) • Detect known malicious activities

InternetInternet

Network Intrusion Detection Systems (NIDS)

• Detect known malicious activities– Port scans, SQL injections, buffer overflows, etc.

• Deep packet inspection– Detect malicious signatures (rules) in each packet

• Desirable features– High performance (> 10Gbps) with precision– Easy maintenance

• Frequent ruleset updates

2

NIDSNIDSAttack

Page 3: Kargus: A Highly scalable Software based Intrusion ...shader.kaist.edu/kargus/docs/kargus-slides.pdfInternet Network Intrusion Detection Systems (NIDS) • Detect known malicious activities

Hardware vs. Software

• H/W‐based NIDS– Specialized hardware

• ASIC, TCAM, etc.– High performance– Expensive

• Annual servicing costs– Low flexibility

• S/W‐based NIDS– Commodity machines– High flexibility– Low performance

• DDoS/packet drops

3

IDS/IPS Sensors (10s of Gbps)

IDS/IPS M8000(10s of Gbps)

Open‐source S/W

~ US$ 20,000 ‐ 60,000

~ US$ 10,000 ‐ 24,000

≤ ~2 Gbps

Page 4: Kargus: A Highly scalable Software based Intrusion ...shader.kaist.edu/kargus/docs/kargus-slides.pdfInternet Network Intrusion Detection Systems (NIDS) • Detect known malicious activities

Goals

• S/W‐based NIDS– Commodity machines– High flexibility

4

– High performance

Page 5: Kargus: A Highly scalable Software based Intrusion ...shader.kaist.edu/kargus/docs/kargus-slides.pdfInternet Network Intrusion Detection Systems (NIDS) • Detect known malicious activities

Typical Signature‐based NIDS Architecture 

5

PacketAcquisition

Preprocessing

DecodeFlow management 

Reassembly

MatchSuccess

Match Failure(Innocent Flow)

Multi‐string Pattern Matching

Evaluation Failure(Innocent Flow)

EvaluationSuccess

Rule Options Evaluation

Output

MaliciousFlow

alert tcp $EXTERNAL_NET any -> $HTTP_SERVERS 80 (msg:“possible attack attempt BACKDOOR optix runtime detection"; content:"/whitepages/page_me/100.html"; pcre:"/body=\x2521\x2521\x2521Optix\s+Pro\s+v\d+\x252E\d+\S+sErver\s+Online\x2521\x2521\x2521/")

Bottlenecks

* PCRE: Perl Compatible Regular Expression  

Page 6: Kargus: A Highly scalable Software based Intrusion ...shader.kaist.edu/kargus/docs/kargus-slides.pdfInternet Network Intrusion Detection Systems (NIDS) • Detect known malicious activities

Contributions

A highly‐scalable software‐based NIDS for high‐speed networkGoal A highly‐scalable software‐based NIDS for high‐speed networkGoal

Slow software NIDS Fast software NIDS

Inefficient packet acquisition

Expensive string & PCRE pattern matching

Multi‐core packet acquisition

Parallel processing & GPU offloading

Bottlenecks Solutions

Fastest S/W signature‐based IDS: 33Gbps100% malicious traffic: 10 GbpsReal network traffic: ~24 Gbps

OutcomeFastest S/W signature‐based IDS: 33Gbps100% malicious traffic: 10 GbpsReal network traffic: ~24 Gbps

Outcome

6

Page 7: Kargus: A Highly scalable Software based Intrusion ...shader.kaist.edu/kargus/docs/kargus-slides.pdfInternet Network Intrusion Detection Systems (NIDS) • Detect known malicious activities

Challenge 1: Packet Acquisition

• Default packet module: Packet CAPture (PCAP) library– Unsuitable for multi‐core environment– Low performing– More power consumption

• Multi‐core packet capture library is required

7

[Core 1] [Core 2] [Core 3] [Core 4] [Core 5]

10 Gbps NIC B10 Gbps NIC A

[Core 1] [Core 2] [Core 3] [Core 4] [Core 5]

10 Gbps NIC B10 Gbps NIC A

[Core 7] [Core 8] [Core 9] [Core 10] [Core 11]

10 Gbps NIC D10 Gbps NIC C

[Core 7] [Core 8] [Core 9] [Core 10] [Core 11]

10 Gbps NIC D10 Gbps NIC C

Packet RX bandwidth*

0.4‐6.7 GbpsCPU utilization

100 % 

* Intel Xeon X5680, 3.33 GHz, 12 MB L3 Cache

Page 8: Kargus: A Highly scalable Software based Intrusion ...shader.kaist.edu/kargus/docs/kargus-slides.pdfInternet Network Intrusion Detection Systems (NIDS) • Detect known malicious activities

Solution: PacketShader I/O

• PacketShader I/O– Uniformly distributes packets based on flow info by RSS hashing 

• Source/destination IP addresses, port numbers, protocol‐id– 1 core can read packets from RSS queues of multiple NICs– Reads packets in batches (32 ~ 4096)

• Symmetric Receive‐Side Scaling (RSS)– Passes packets of 1 connection to the same queue

8

* S. Han et al., “PacketShader: a GPU‐accelerated software router”, ACM SIGCOMM 2010 

RxQA1

RxQA1

RxQB1

RxQB1

RxQA2

RxQA2

RxQB2

RxQB2

RxQA3

RxQA3

RxQB3

RxQB3

RxQA4

RxQA4

RxQB4

RxQB4

RxQA5

RxQA5

RxQB5

RxQB5

[Core 1] [Core 2] [Core 3] [Core 4] [Core 5]

10 Gbps NIC B10 Gbps NIC B10 Gbps NIC A10 Gbps NIC A

RxQA1

RxQB1

RxQA2

RxQB2

RxQA3

RxQB3

RxQA4

RxQB4

RxQA5

RxQB5

[Core 1] [Core 2] [Core 3] [Core 4] [Core 5]

10 Gbps NIC B10 Gbps NIC A

Packet RX bandwidth 0.4 ‐ 6.7 Gbps

40 Gbps

CPU utilization 100 % 

16‐29%

Page 9: Kargus: A Highly scalable Software based Intrusion ...shader.kaist.edu/kargus/docs/kargus-slides.pdfInternet Network Intrusion Detection Systems (NIDS) • Detect known malicious activities

Challenge 2: Pattern Matching

• CPU intensive tasks for serial packet scanning

• Major bottlenecks– Multi‐string matching (Aho‐Corasick phase)– PCRE evaluation (if ‘pcre’ rule option exists in rule)

• On an Intel Xeon X5680, 3.33 GHz, 12 MB L3 Cache– Aho‐Corasick analyzing bandwidth per core: 2.15 Gbps– PCRE  analyzing bandwidth per core: 0.52 Gbps

9

Page 10: Kargus: A Highly scalable Software based Intrusion ...shader.kaist.edu/kargus/docs/kargus-slides.pdfInternet Network Intrusion Detection Systems (NIDS) • Detect known malicious activities

Solution: GPU for Pattern Matching

• GPUs– Containing 100s of  SIMD processors

• 512 cores for NVIDIA GTX 580– Ideal for parallel data processing without branches

• DFA‐based pattern matching on GPUs– Multi‐string matching using Aho‐Corasick algorithm– PCRE matching

• Pipelined execution in CPU/GPU– Concurrent copy and execution

10

Engine Thread

Packet Acquisition

PreprocessMulti‐stringMatching

Rule OptionEvaluation

GPU Dispatcher Thread

Offloading Offloading

GPU

Multi‐stringMatching

PCREMatching

Multi‐string Matching Queue PCRE Matching Queue

Engine Thread

Packet AcquisitionPacket 

AcquisitionPreprocessPreprocess

Multi‐stringMatching

Multi‐stringMatching

Rule OptionEvaluationRule OptionEvaluation

GPU Dispatcher Thread

OffloadingOffloading OffloadingOffloading

GPU

Multi‐stringMatching

Multi‐stringMatching

PCREMatchingPCRE

Matching

Multi‐string Matching Queue PCRE Matching Queue

Aho‐Corasick bandwidth 2.15 Gbps

39 Gbps

PCRE bandwidth0.52 Gbps

8.9 Gbps

Page 11: Kargus: A Highly scalable Software based Intrusion ...shader.kaist.edu/kargus/docs/kargus-slides.pdfInternet Network Intrusion Detection Systems (NIDS) • Detect known malicious activities

Optimization 1: IDS Architecture

• How to best utilize the multi‐core architecture?• Pattern matching is the eventual bottleneck

• Run entire engine on each core

11

Function Time % ModuleacsmSearchSparseDFA_Full 51.56 multi‐string matching

List_GetNextState 13.91 multi‐string matching

mSearch 9.18 multi‐string matching

in_chksum_tcp 2.63 preprocessing

* GNU gprof profiling results 

Page 12: Kargus: A Highly scalable Software based Intrusion ...shader.kaist.edu/kargus/docs/kargus-slides.pdfInternet Network Intrusion Detection Systems (NIDS) • Detect known malicious activities

Solution: Single‐process Multi‐thread

• Runs multiple IDS engine threads & GPU dispatcher threads concurrently– Shared address space– Less GPU memory consumption– Higher GPU utilization & shorter service latency

12

GPU memory usage

1/6

Packet Acquisition

Core 1Core 1

Preprocess

Multi‐stringMatching

Rule Option

Evaluation

Packet Acquisition

Core 2Core 2

Preprocess

Multi‐stringMatching

Rule Option

Evaluation

Packet Acquisition

Core 3Core 3

Preprocess

Multi‐stringMatching

Rule Option

Evaluation

Packet Acquisition

Core 4Core 4

Preprocess

Multi‐stringMatching

Rule Option

Evaluation

Packet Acquisition

Core 5Core 5

Preprocess

Multi‐stringMatching

Rule Option

Evaluation

Core 6

GPU Dispatcher ThreadSingle thread pinned at core 1

Packet AcquisitionPacket 

Acquisition

Core 1

PreprocessPreprocess

Multi‐stringMatching

Multi‐stringMatching

Rule Option

Evaluation

Rule Option

Evaluation

Packet AcquisitionPacket 

Acquisition

Core 2

PreprocessPreprocess

Multi‐stringMatching

Multi‐stringMatching

Rule Option

Evaluation

Rule Option

Evaluation

Packet AcquisitionPacket 

Acquisition

Core 3

PreprocessPreprocess

Multi‐stringMatching

Multi‐stringMatching

Rule Option

Evaluation

Rule Option

Evaluation

Packet AcquisitionPacket 

Acquisition

Core 4

PreprocessPreprocess

Multi‐stringMatching

Multi‐stringMatching

Rule Option

Evaluation

Rule Option

Evaluation

Packet AcquisitionPacket 

Acquisition

Core 5

PreprocessPreprocess

Multi‐stringMatching

Multi‐stringMatching

Rule Option

Evaluation

Rule Option

Evaluation

Core 6

GPU Dispatcher ThreadGPU Dispatcher ThreadSingle thread pinned at core 1

Page 13: Kargus: A Highly scalable Software based Intrusion ...shader.kaist.edu/kargus/docs/kargus-slides.pdfInternet Network Intrusion Detection Systems (NIDS) • Detect known malicious activities

Architecture

• Non Uniform Memory  Access (NUMA)‐aware• Core framework as deployed in dual hexa‐core system• Can be configured to various NUMA set‐ups accordingly

13

▲ Kargus configuration on a dual NUMA hexanode machine having 4 NICs, and 2 GPUs

Page 14: Kargus: A Highly scalable Software based Intrusion ...shader.kaist.edu/kargus/docs/kargus-slides.pdfInternet Network Intrusion Detection Systems (NIDS) • Detect known malicious activities

• Caveats– Long per‐packet processing latency:

• Buffering in GPU dispatcher

– More power consumption• NVIDIA GTX 580: 512 cores

• Use:– CPU when ingress rate is low (idle GPU)– GPU when ingress rate is high

Optimization 2: GPU Usage

14

Page 15: Kargus: A Highly scalable Software based Intrusion ...shader.kaist.edu/kargus/docs/kargus-slides.pdfInternet Network Intrusion Detection Systems (NIDS) • Detect known malicious activities

• Load balancing between CPU & GPU– Reads packets from NIC queues per cycle– Analyzes smaller # of packets at each cycle (a < b < c)– Increases analyzing rate  if queue length increases– Activates GPU if queue length increases

CPU CPU GPU

Solution: Dynamic Load Balancing

15

a b

b ca

c

α β γ

Internal packet queue (per engine)

GPU Queue Length

Packet latency withGPU :  640 μsecs

CPU: 13 μsecs

Page 16: Kargus: A Highly scalable Software based Intrusion ...shader.kaist.edu/kargus/docs/kargus-slides.pdfInternet Network Intrusion Detection Systems (NIDS) • Detect known malicious activities

Optimization 3: Batched Processing

• Huge per‐packet processing overhead– > 10 million packets per second for small‐sized packets at 10 Gbps– reduces overall processing throughput

• Function call batching– Reads group of packets from RX queues at once– Pass the batch of packets to each function

16

Decode(p)  Preprocess(p) Multistring_match(p)

Decode(list‐p)  Preprocess(list‐p) Multistring_match(list‐p)

2x faster processing rate

Page 17: Kargus: A Highly scalable Software based Intrusion ...shader.kaist.edu/kargus/docs/kargus-slides.pdfInternet Network Intrusion Detection Systems (NIDS) • Detect known malicious activities

Kargus Specifications

17

NUMA node 1

12 GB DRAM (3GB x 4)

Intel 82599 Gigabit Ethernet Adapter (dual port)

NVIDIA GTX 580 GPU 

NUMA node 2

Intel X5680 3.33 GHz (hexacore)12 MB L3 NUMA‐Shared Cache 

$1,210

$512

$370

$100

Total Cost (incl. serverboard) = ~$7,000

Page 18: Kargus: A Highly scalable Software based Intrusion ...shader.kaist.edu/kargus/docs/kargus-slides.pdfInternet Network Intrusion Detection Systems (NIDS) • Detect known malicious activities

IDS Benchmarking Tool

• Generates packets at line rate (40 Gbps) – Random TCP packets (innocent)– Attack packets are generated by attack rule‐set

• Support packet replay using PCAP files• Useful for performance evaluation

18

Page 19: Kargus: A Highly scalable Software based Intrusion ...shader.kaist.edu/kargus/docs/kargus-slides.pdfInternet Network Intrusion Detection Systems (NIDS) • Detect known malicious activities

Kargus Performance Evaluation

• Micro‐benchmarks– Input traffic rate: 40 Gbps– Evaluate Kargus (~3,000 HTTP rules) against:

• Kargus‐CPU‐only (12 engines)• Snort with PF_RING• MIDeA*

• Refer to the paper for more results

19

* G. Vasiliadis et al., “MIDeA: a multi‐parallel intrusion detection architecture”, ACM CCS ‘11

Page 20: Kargus: A Highly scalable Software based Intrusion ...shader.kaist.edu/kargus/docs/kargus-slides.pdfInternet Network Intrusion Detection Systems (NIDS) • Detect known malicious activities

0

5

10

15

20

25

30

35

64 218 256 818 1024 1518

Throug

hput (G

bps)

Packet size (Bytes)

MIDeASnort w/ PF_RingKargus CPU‐onlyKargus CPU/GPU

0

5

10

15

20

25

30

35

64 218 256 818 1024 1518

Throug

hput (G

bps)

Packet size (Bytes)

Innocent Traffic Performance

20

Actual payload analyzing bandwidth

• 2.7‐4.5x faster than Snort• 1.9‐4.3x faster than MIDeA

Page 21: Kargus: A Highly scalable Software based Intrusion ...shader.kaist.edu/kargus/docs/kargus-slides.pdfInternet Network Intrusion Detection Systems (NIDS) • Detect known malicious activities

Malicious Traffic Performance

21

0

5

10

15

20

25

30

35

64 256 1024 1518

Throug

hput (G

bps)

Packet size (Bytes)

Kargus, 25%Kargus, 50%Kargus, 100%Snort+PF_Ring, 25%Snort+PF_Ring, 50%Snort+PF_Ring, 100%

• 5x faster than Snort

Page 22: Kargus: A Highly scalable Software based Intrusion ...shader.kaist.edu/kargus/docs/kargus-slides.pdfInternet Network Intrusion Detection Systems (NIDS) • Detect known malicious activities

Real Network Traffic

• Three 10Gbps LTE backbone traces of a major ISP in Korea:– Time duration of each trace: 30 mins ~ 1 hour– TCP/IPv4 traffic:

• 84 GB of PCAP traces• 109.3 million packets• 845K TCP sessions

• Total analyzing rate: 25.2 Gbps– Bottleneck: Flow Management (preprocessing)

22

Page 23: Kargus: A Highly scalable Software based Intrusion ...shader.kaist.edu/kargus/docs/kargus-slides.pdfInternet Network Intrusion Detection Systems (NIDS) • Detect known malicious activities

Effects of Dynamic GPU Load Balancing

23

400450500550600650700750800850900

0 5 10 20 33

Kargus w/o LB (polling)

Kargus w/o LB

Kargus w/ LB

Offered Incoming Traffic (Gbps) [Packet Size: 1518 B]

Power Con

sumpt

ion(W

atts)

• Varying incoming traffic rates– Packet size = 1518 B

8.7%20%

Page 24: Kargus: A Highly scalable Software based Intrusion ...shader.kaist.edu/kargus/docs/kargus-slides.pdfInternet Network Intrusion Detection Systems (NIDS) • Detect known malicious activities

Conclusion

• Software‐based NIDS:– Based on commodity hardware

• Competes with hardware‐based counterparts

– 5x faster than previous S/W‐based NIDS– Power efficient– Cost effective

24

> 25 Gbps (real traffic)> 33 Gbps (synthetic traffic)US $~7,000/‐

Page 25: Kargus: A Highly scalable Software based Intrusion ...shader.kaist.edu/kargus/docs/kargus-slides.pdfInternet Network Intrusion Detection Systems (NIDS) • Detect known malicious activities

Thank You

25

fast‐[email protected]

https://shader.kaist.edu/kargus/

Page 26: Kargus: A Highly scalable Software based Intrusion ...shader.kaist.edu/kargus/docs/kargus-slides.pdfInternet Network Intrusion Detection Systems (NIDS) • Detect known malicious activities

Backup Slides

Page 27: Kargus: A Highly scalable Software based Intrusion ...shader.kaist.edu/kargus/docs/kargus-slides.pdfInternet Network Intrusion Detection Systems (NIDS) • Detect known malicious activities

Kargus vs. MIDeA

27

UPDATE MIDEA KARGUS OUTCOME

* G. Vasiliadis, M.Polychronakis, and S. Ioannidis, “MIDeA: a multi‐parallel intrusion detection architecture”, ACM CCS 2011

Page 28: Kargus: A Highly scalable Software based Intrusion ...shader.kaist.edu/kargus/docs/kargus-slides.pdfInternet Network Intrusion Detection Systems (NIDS) • Detect known malicious activities

Kargus vs. MIDeA

28

UPDATE MIDEA KARGUS OUTCOME

Packet acquisition PF_RING PacketShader I/O 70% lower CPU utilization

* G. Vasiliadis, M.Polychronakis, and S. Ioannidis, “MIDeA: a multi‐parallel intrusion detection architecture”, ACM CCS 2011

Page 29: Kargus: A Highly scalable Software based Intrusion ...shader.kaist.edu/kargus/docs/kargus-slides.pdfInternet Network Intrusion Detection Systems (NIDS) • Detect known malicious activities

Kargus vs. MIDeA

29

UPDATE MIDEA KARGUS OUTCOME

Packet acquisition PF_RING PacketShader I/O 70% lower CPU utilization

Detection engineGPU‐support for Aho‐Corasick

GPU‐support for Aho‐Corasick & PCRE

65% faster detection rate

* G. Vasiliadis, M.Polychronakis, and S. Ioannidis, “MIDeA: a multi‐parallel intrusion detection architecture”, ACM CCS 2011

Page 30: Kargus: A Highly scalable Software based Intrusion ...shader.kaist.edu/kargus/docs/kargus-slides.pdfInternet Network Intrusion Detection Systems (NIDS) • Detect known malicious activities

Kargus vs. MIDeA

30

UPDATE MIDEA KARGUS OUTCOME

Packet acquisition PF_RING PacketShader I/O 70% lower CPU utilization

Detection engineGPU‐support for Aho‐Corasick

GPU‐support for Aho‐Corasick & PCRE

65% faster detection rate

Architecture Process‐based Thread‐based 1/6GPU memory usage

* G. Vasiliadis, M.Polychronakis, and S. Ioannidis, “MIDeA: a multi‐parallel intrusion detection architecture”, ACM CCS 2011

Page 31: Kargus: A Highly scalable Software based Intrusion ...shader.kaist.edu/kargus/docs/kargus-slides.pdfInternet Network Intrusion Detection Systems (NIDS) • Detect known malicious activities

Kargus vs. MIDeA

31

UPDATE MIDEA KARGUS OUTCOME

Packet acquisition PF_RING PacketShader I/O 70% lower CPU utilization

Detection engineGPU‐support for Aho‐Corasick

GPU‐support for Aho‐Corasick & PCRE

65% faster detection rate

Architecture Process‐based Thread‐based 1/6GPU memory usage

Batch processingBatching only for

detection engine (GPU)Batching from packet acquisition to output

1.9x higher throughput

* G. Vasiliadis, M.Polychronakis, and S. Ioannidis, “MIDeA: a multi‐parallel intrusion detection architecture”, ACM CCS 2011

Page 32: Kargus: A Highly scalable Software based Intrusion ...shader.kaist.edu/kargus/docs/kargus-slides.pdfInternet Network Intrusion Detection Systems (NIDS) • Detect known malicious activities

Kargus vs. MIDeA

32

UPDATE MIDEA KARGUS OUTCOME

Packet acquisition PF_RING PacketShader I/O 70% lower CPU utilization

Detection engineGPU‐support for Aho‐Corasick

GPU‐support for Aho‐Corasick & PCRE

65% faster detection rate

Architecture Process‐based Thread‐based 1/6GPU memory usage

Batch processingBatching only for

detection engine (GPU)Batching from packet acquisition to output

1.9x higher throughput

Power‐efficient

AlwaysGPU (does not offload 

only when packet sizeis too small)

Opportunistic offloading toGPUs 

(Ingress traffic rate)15% power saving

* G. Vasiliadis, M.Polychronakis, and S. Ioannidis, “MIDeA: a multi‐parallel intrusion detection architecture”, ACM CCS 2011

Page 33: Kargus: A Highly scalable Software based Intrusion ...shader.kaist.edu/kargus/docs/kargus-slides.pdfInternet Network Intrusion Detection Systems (NIDS) • Detect known malicious activities

Receive‐Side Scaling (RSS)

33

• RSS uses Toeplitz hash function (with a random secret key)

Algorithm: RSS Hash Computation

function ComputeRSSHash(Input[], RSK)ret = 0;for each bit b in Input[] do

if b == 1 thenret ^= (left‐most 32 bits of RSK);

endifshift RSK left 1 bit position;

end forend function 

Page 34: Kargus: A Highly scalable Software based Intrusion ...shader.kaist.edu/kargus/docs/kargus-slides.pdfInternet Network Intrusion Detection Systems (NIDS) • Detect known malicious activities

Symmetric Receive‐Side Scaling

34

• Update RSK (Shinae et al.)

0x6d5a 0x6d5a 0x6d5a 0x6d5a

0x6d5a 0x6d5a 0x6d5a 0x6d5a

0x6d5a 0x6d5a 0x6d5a 0x6d5a

0x6d5a 0x6d5a 0x6d5a 0x6d5a

0x6d5a 0x6d5a 0x6d5a 0x6d5a

0x6d5a 0x56da 0x255b 0x0ec2

0x4167 0x253d 0x43a3 0x8fb0

0xd0ca 0x2bcb 0xae7b 0x30b4

0x77cb 0x2d3a 0x8030 0xf20c

0x6a42 0xb73b 0xbeac 0x01fa

Page 35: Kargus: A Highly scalable Software based Intrusion ...shader.kaist.edu/kargus/docs/kargus-slides.pdfInternet Network Intrusion Detection Systems (NIDS) • Detect known malicious activities

Why use a GPU?

35

GTX 580:  512 cores

ALU

Xeon X5680:  6 cores

Control

Cache

ALU

ALU ALU

ALUALU

ALU

VS

*Slide adapted from NVIDIA CUDA C A Programming Guide Version 4.2 (Figure 1‐2)

Page 36: Kargus: A Highly scalable Software based Intrusion ...shader.kaist.edu/kargus/docs/kargus-slides.pdfInternet Network Intrusion Detection Systems (NIDS) • Detect known malicious activities

GPU Microbenchmarks –Aho‐Corasick

36

0

5

10

15

20

25

30

35

40

32 64 128 256 512 1,024 2,048 4,096 8,192 16,384

Throug

hput (G

bps)

The number of packets in a batch (pkts/batch)

GPU throughput (2B per entry)

CPU throughput

2.15 Gbps

39 Gbps

Page 37: Kargus: A Highly scalable Software based Intrusion ...shader.kaist.edu/kargus/docs/kargus-slides.pdfInternet Network Intrusion Detection Systems (NIDS) • Detect known malicious activities

GPU Microbenchmarks – PCRE

37

0

1

2

3

4

5

6

7

8

9

10

32 64 128 256 512 1,024 2,048 4,096 8,192 16,384

Throug

hput (G

bps)

The number of packets in a batch (pkts/batch)

GPU throughput

CPU throughput

0.52 Gbps

8.9 Gbps

Page 38: Kargus: A Highly scalable Software based Intrusion ...shader.kaist.edu/kargus/docs/kargus-slides.pdfInternet Network Intrusion Detection Systems (NIDS) • Detect known malicious activities

• Use of global variables minimal– Avoids compulsory cache misses– Eliminates cross‐NUMA cache bouncing effects

1

1.2

1.4

1.6

1.8

2

2.2

2.4

2.6

2.8

64 128 256 512 1024

Innocent Traffic Malicious Traffic

Effects of NUMA‐aware Data Placement

38

Packet Size (Bytes)

Performan

ce Spe

edup

1518

Page 39: Kargus: A Highly scalable Software based Intrusion ...shader.kaist.edu/kargus/docs/kargus-slides.pdfInternet Network Intrusion Detection Systems (NIDS) • Detect known malicious activities

CPU‐only analysis for small‐sized packets 

• Offloading small‐sized packets to the GPU is expensive– Contention across page‐locked DMA accessible memory with GPU– GPU operational cost of packet metadata increases 

39

0

2,000

4,000

6,000

8,000

10,000

12,000

64 68 72 76 80 84 88 92 96 100 104 108 112 116 120 124 128

Latenc

y (m

sec)

Packet Size (Bytes)

GPU total latency CPU total latencyGPU pattern matching latency CPU pattern matching latency

82

Page 40: Kargus: A Highly scalable Software based Intrusion ...shader.kaist.edu/kargus/docs/kargus-slides.pdfInternet Network Intrusion Detection Systems (NIDS) • Detect known malicious activities

Challenge 1: Packet Acquisition

• Default packet module: Packet CAPture (PCAP) library– Unsuitable for multi‐core environment– Low Performing

40

0.4  0.8  1.5  2.9 5.0  6.7 

0

20

40

60

80

100

0

5

10

15

20

25

30

35

40

64 128 256 512 1024 1518

CPU Utiliz

ation (%

)

Rec

eiving Throu

ghpu

t (Gbp

s)

Packet Size (bytes)

PCAP polling PCAP polling CPU %

Page 41: Kargus: A Highly scalable Software based Intrusion ...shader.kaist.edu/kargus/docs/kargus-slides.pdfInternet Network Intrusion Detection Systems (NIDS) • Detect known malicious activities

Solution: PacketShader* I/O

41

0.4  0.8  1.5  2.9 5.0  6.7 

0

20

40

60

80

100

0

5

10

15

20

25

30

35

40

64 128 256 512 1024 1518

CPU Utiliz

ation (%

)

Rec

eiving Throu

ghpu

t (Gbp

s)

Packet Size (bytes)

PCAP polling PSIO PCAP polling CPU % PSIO CPU %