
Page 1: Efficient Memory Utilization  on Network Processors  for Deep Packet Inspection

Efficient Memory Utilization on Network Processors

for Deep Packet Inspection

Piti Piyachon

Yan Luo

Electrical and Computer Engineering Department

University of Massachusetts Lowell

Page 2: Efficient Memory Utilization  on Network Processors  for Deep Packet Inspection

Our Contributions
• Study the parallelism of a pattern matching algorithm
• Propose the Bit-Byte Aho-Corasick Deterministic Finite Automaton (DFA)
• Construct a memory model to find the optimal settings that minimize the memory usage of the DFA

Page 3: Efficient Memory Utilization  on Network Processors  for Deep Packet Inspection

DPI and Pattern Matching
• Deep Packet Inspection
  – Inspect: packet header & payload
  – Detect: computer viruses, worms, spam, etc.
  – Network intrusion detection applications: Bro, Snort, etc.
• Pattern Matching requirements
  1. Match multiple predefined patterns (keywords, or strings) at the same time.
  2. Keywords can be any size.
  3. Keywords can appear anywhere in the payload of a packet.
  4. Match at line speed.
  5. Flexibility to accommodate new rule sets.

Page 4: Efficient Memory Utilization  on Network Processors  for Deep Packet Inspection

Classical Aho-Corasick (AC) DFA: example 1
• A set of keywords: {he, her, him, his}
[State diagram: a goto trie over states 0-6, with state 0 as the start state and edges labeled h, e, r, i, m, s; the accept states mark matches for he, her, him, and his. Failure edges back to state 1 are shown as dashed lines; failure edges back to state 0 are not shown.]

Page 5: Efficient Memory Utilization  on Network Processors  for Deep Packet Inspection

Memory Matrix Model of AC DFA
• Snort (Dec '05): 2733 keywords
• 256 next-state pointers, each 15 bits wide
• > 27,000 states
• Keyword-ID width = 2733 bits
• 27538 x (2733 + 256 x 15) bits = 22 MB
• 22 MB is too big for on-chip RAM
[Memory matrix: one row per state (state# 0 to 27538); each row holds a 2733-bit keyword-ID field and 256 next-state pointers of 15 bits each.]

Page 6: Efficient Memory Utilization  on Network Processors  for Deep Packet Inspection

Bit-AC DFA (Tan-Sherwood's Bit-Split)
• ASCII encoding of the keyword set:
  k0  h e    0110 1000  0110 0101
  k1  h e r  0110 1000  0110 0101  0111 0010
  k2  h i m  0110 1000  0110 1001  0110 1101
  k3  h i s  0110 1000  0110 1001  0111 0011
• Each input byte is split into its 8 bit positions, so 8 bit-DFA are needed.
[Figure: two of the bit-level DFAs. The Bit-1 DFA has partial-match bit strings k0: he = 0,0; k1: her = 0,0,1; k2: him = 0,0,0; k3: his = 0,0,1. The Bit-3 DFA has k0: he = 1,0; k1: her = 1,0,0; k2: him = 1,1,1; k3: his = 1,1,0. Failure edges are not shown.]
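
A sketch of the bit-split step (a simplified reconstruction, not the Tan-Sherwood implementation): each keyword and each input byte is reduced to a stream of single bits, one stream per bit position, and each stream gets its own tiny binary DFA (the build_ac sketch above works unchanged over the alphabet {0, 1}).

    def bit_stream(data: bytes, bit: int):
        """Extract bit position `bit` (0 = least significant) from every byte."""
        return tuple((b >> bit) & 1 for b in data)

    keywords = [b"he", b"her", b"him", b"his"]

    # One set of binary "keywords" per bit position; bits 1 and 3 reproduce the
    # partial-match strings of the Bit-1 and Bit-3 DFAs in the figure.
    for bit in (1, 3):
        print(bit, {kw.decode(): bit_stream(kw, bit) for kw in keywords})
    # bit 1: he=(0,0) her=(0,0,1) him=(0,0,0) his=(0,0,1)
    # bit 3: he=(1,0) her=(1,0,0) him=(1,1,1) his=(1,1,0)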

Page 7: Efficient Memory Utilization  on Network Processors  for Deep Packet Inspection

Memory Matrix of Bit-AC DFA
• Snort (Dec '05): 2733 keywords
• 2 next-state pointers, each 9 bits wide
• 361 states
• Keyword-ID width = 16 bits
• 1368 DFA
• 1368 x 361 x (16 + 2 x 9) bits = 2 MB
[Memory matrix: one row per state (state# 0 to 361); each row holds a 16-bit keyword-ID and 2 next-state pointers of 9 bits each.]
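
The two memory figures follow directly from the matrix dimensions; a quick back-of-the-envelope check using the numbers from these slides:

    MB = 8 * 1024 * 1024   # bits per megabyte

    # Classical AC: 27,538 states x (2733-bit keyword-ID + 256 pointers x 15 bits)
    classical = 27538 * (2733 + 256 * 15)
    # Bit-split AC: 1368 bit-DFAs x 361 states x (16-bit keyword-ID + 2 pointers x 9 bits)
    bit_split = 1368 * 361 * (16 + 2 * 9)

    print(round(classical / MB, 1), round(bit_split / MB, 1))   # ~21.6 MB and ~2.0 MB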

Page 8: Efficient Memory Utilization  on Network Processors  for Deep Packet Inspection

Bit-AC DFA Techniques
• Shrink the width of the keyword-ID
  – From 2733 to 16 bits
  – By dividing the 2733 keywords into 171 subsets, each holding 16 keywords (see the sketch after this slide)
• Reduce the next-state pointers
  – From 256 to 2 pointers
  – By dividing each input byte into 1-bit slices
  – Needs 8 bit-DFA
• Extra benefits
  – The number of states (per DFA) drops from ~27,000 to ~300.
  – The width of a next-state pointer drops from 15 to 9 bits.
• Memory: reduced from 22 MB to 2 MB
• The number of DFA = ?
  – With 171 subsets, each subset has 8 DFA.
  – Total DFA = 171 x 8 = 1,368 DFA
• What can we do better to reduce the memory usage?
[Memory matrix as on the previous slide.]
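
The subset bookkeeping referred to above, as a small sketch (the keyword count is taken from the slides; the split itself is just grouping):

    import math

    K = 2733                      # keywords in the Snort (Dec '05) rule set
    k = 16                        # keywords per subset = keyword-ID width in bits
    bit_dfas_per_subset = 8       # one bit-DFA per bit position of a byte

    n_subsets = math.ceil(K / k)                  # 171 subsets
    total_dfas = n_subsets * bit_dfas_per_subset  # 1,368 DFA
    print(n_subsets, total_dfas)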

Page 9: Efficient Memory Utilization  on Network Processors  for Deep Packet Inspection

Classical AC DFA: example 2
• Keywords: k0 = elements, k1 = parallel, k2 = manage, k3 = memory
• 28 states
• Match found at: k0 (elements) state 8, k1 (parallel) state 16, k2 (manage) state 22, k3 (memory) state 27
[State diagram: one goto chain per keyword, spelled out character by character over states 0-27. Failure edges are not shown.]

Page 10: Efficient Memory Utilization  on Network Processors  for Deep Packet Inspection

Byte-AC DFA
• Keywords sliced 4 bytes at a time (byte0 byte1 byte2 byte3 | byte0 byte1 byte2 byte3):
  k0  elements  e l e m | e n t s
  k1  parallel  p a r a | l l e l
  k2  manage    m a n a | g e
  k3  memory    m e m o | r y
• Considering 4 bytes at a time
• 4 DFA (Byte 0 through Byte 3)
• < 9 states / DFA
• 256 next-state pointers!
• Similar to Dharmapurikar-Lockwood's JACK DFA, ANCS'05
[Figure: the four byte-position DFAs (Byte 0 to Byte 3), each marking the accept state of every keyword's slice (k0-k3); "(any)" edges cover the don't-care positions of the shorter keywords. Failure edges are not shown.]
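
The byte-interleaving step can be sketched as below (a simplified reconstruction that ignores keyword alignment within the B-byte window): slice i of a keyword collects the characters at positions i, i+B, i+2B, ..., and each slice feeds its own AC DFA.

    def byte_slices(keyword: str, B: int = 4):
        """Split a keyword into B interleaved slices (byte0 .. byte{B-1})."""
        return [keyword[i::B] for i in range(B)]

    for kw in ["elements", "parallel", "manage", "memory"]:
        print(kw, byte_slices(kw))
    # elements ['ee', 'ln', 'et', 'ms']
    # parallel ['pl', 'al', 're', 'al']
    # manage   ['mg', 'ae', 'n', 'a']
    # memory   ['mr', 'ey', 'm', 'o']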

Page 11: Efficient Memory Utilization  on Network Processors  for Deep Packet Inspection

Bit-Byte-AC DFA
• Keywords sliced 4 bytes at a time, shown with their ASCII bit patterns:
  k0  elements  e l e m | e n t s  0110 0101  0110 1100  0110 0101  0110 1101 | 0110 0101  0110 1110  0111 0100  0111 0011
  k1  parallel  p a r a | l l e l  0111 0000  0110 0001  0111 0010  0110 0001 | 0110 1100  0110 1100  0110 0101  0110 1100
  k2  manage    m a n a | g e      0110 1101  0110 0001  0110 1110  0110 0001 | 0110 0111  0110 0101  xxxx xxxx  xxxx xxxx
  k3  memory    m e m o | r y      0110 1101  0110 0101  0110 1101  0110 1111 | 0111 0010  0111 1001  xxxx xxxx  xxxx xxxx
• 4 bytes at a time
• Each byte is divided into bits.
• 32 DFA (= 4 x 8)
• < 6 states/DFA
• 2 next-state pointers
[Figure: two of the 32 bit-byte DFAs. The (bit 3, Byte 0) DFA, states 0-5, has partial-match bit strings k0: elements = 0,0; k1: parallel = 0,1; k2: manage = 1,0; k3: memory = 1,0. The (bit 6, Byte 2) DFA has k0: elements = 1,1; k1: parallel = 1,1; k2: manage = 1,x; k3: memory = 1,x.]
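
Combining the two decompositions gives 4 x 8 = 32 bit streams per keyword; a minimal sketch under the same slicing assumption as above, with the shorter keywords' don't-care positions simply left off the end of their streams:

    def bit_byte_slices(keyword: str, B: int = 4):
        """One bit stream per (byte position, bit position) pair: B x 8 in total."""
        streams = {}
        for byte_pos in range(B):
            chars = keyword[byte_pos::B]       # the byte-position slice
            for bit in range(8):
                streams[(byte_pos, bit)] = tuple((ord(c) >> bit) & 1 for c in chars)
        return streams

    s = bit_byte_slices("elements")
    print(s[(0, 3)])   # (0, 0): bit 3 of slice 'ee', as in the (bit 3, Byte 0) DFA
    print(s[(2, 6)])   # (1, 1): bit 6 of slice 'et', as in the (bit 6, Byte 2) DFA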

Page 12: Efficient Memory Utilization  on Network Processors  for Deep Packet Inspection

Memory Matrix of Bit-Byte-AC DFA
• Snort (Dec '05): 2733 keywords
• 4 bytes at a time
• < 36 states/DFA
• 2 next-state pointers, each 6 bits wide
• Keyword-ID width = 3 bits
• 29152 DFA (= 911 x 32)
• 29152 x 36 x (3 + 2 x 6) bits = 1.9 MB
[Memory matrix: one row per state (state# 0 to 36); each row holds a 3-bit keyword-ID and 2 next-state pointers of 6 bits each.]
• 1.9 MB is only a little better than 2 MB, because this is not an optimal setting.
• Each DFA has a different number of states, so there is no need to provide the same size of memory matrix for every DFA.

Page 13: Efficient Memory Utilization  on Network Processors  for Deep Packet Inspection

Bit-Byte-AC DFA Techniques
• Still keeps the width of the keyword-ID as low as the Bit-DFA.
• Still keeps the next-state pointers as few as the Bit-DFA.
• Reduces the states per DFA by
  – skipping bytes
  – exploiting more shared states than the Bit-DFA
• Results of reducing the states per DFA
  – from ~27,000 to 36 states
  – the width of a next-state pointer drops from 15 to 6 bits.

Page 14: Efficient Memory Utilization  on Network Processors  for Deep Packet Inspection

Construction of Bit-Byte AC DFA
• 4 bytes (considered) at a time; start with bit 3 of byte 0.
[Keyword table as on the Bit-Byte-AC DFA slide, with only k0 (elements) filled in.]

Page 15: Efficient Memory Utilization  on Network Processors  for Deep Packet Inspection

Construction of Bit-Byte AC DFA
• At (bit 3, Byte 0), k0 (elements) contributes the bit string 0,0. Its first bit, 0, adds state 1 (edge 0 -0-> 1).
[Partial (bit 3, Byte 0) DFA: states 0 and 1; keyword table with only k0 filled in.]

Page 16: Efficient Memory Utilization  on Network Processors  for Deep Packet Inspection

Construction of Bit-Byte AC DFA
• The second bit of k0, again 0, adds state 2 (edge 1 -0-> 2); state 2 accepts k0.
[Partial (bit 3, Byte 0) DFA: states 0-2; keyword table with only k0 filled in.]

Page 17: Efficient Memory Utilization  on Network Processors  for Deep Packet Inspection

Construction of Bit-Byte AC DFA
• k1 (parallel) contributes the bit string 0,1 at (bit 3, Byte 0). Its first bit, 0, reuses the existing edge 0 -0-> 1.
[Partial (bit 3, Byte 0) DFA unchanged; keyword table now lists k0 and k1.]

Page 18: Efficient Memory Utilization  on Network Processors  for Deep Packet Inspection

Construction of Bit-Byte AC DFA
• The second bit of k1, 1, adds state 3 (edge 1 -1-> 3); state 3 accepts k1.
[Partial (bit 3, Byte 0) DFA: states 0-3; keyword table lists k0 and k1.]

Page 19: Efficient Memory Utilization  on Network Processors  for Deep Packet Inspection

Construction of Bit-Byte AC DFA
• k2 (manage) contributes the bit string 1,0. Its first bit, 1, adds state 4 (edge 0 -1-> 4).
[Partial (bit 3, Byte 0) DFA: states 0-4; keyword table lists k0, k1, and k2.]

Page 20: Efficient Memory Utilization  on Network Processors  for Deep Packet Inspection

Construction of Bit-Byte AC DFA
• The second bit of k2, 0, adds state 5 (edge 4 -0-> 5); state 5 accepts k2.
[Partial (bit 3, Byte 0) DFA: states 0-5; keyword table lists k0, k1, and k2.]

Page 21: Efficient Memory Utilization  on Network Processors  for Deep Packet Inspection

Construction of Bit-Byte AC DFA
• k3 (memory) also contributes the bit string 1,0. Its first bit reuses the existing edge 0 -1-> 4.
[Partial (bit 3, Byte 0) DFA unchanged; keyword table now lists all of k0-k3.]

Page 22: Efficient Memory Utilization  on Network Processors  for Deep Packet Inspection

Construction of Bit-Byte AC DFA
• The second bit of k3 reuses the edge 4 -0-> 5, so state 5 now accepts k3 as well.
[Complete (bit 3, Byte 0) DFA: states 0-5, with k0: elements = 0,0; k1: parallel = 0,1; k2: manage = 1,0; k3: memory = 1,0.]

Page 23: Efficient Memory Utilization  on Network Processors  for Deep Packet Inspection

Construction of Bit-Byte AC DFA
[Finished (bit 3, Byte 0) DFA: 6 states (0-5), with accept states for k0 (state 2), k1 (state 3), and k2/k3 (state 5). Failure edges are not shown.]

Page 24: Efficient Memory Utilization  on Network Processors  for Deep Packet Inspection

Construction of Bit-Byte AC DFA
• The (bit 6, Byte 2) DFA is built the same way; there k0: elements = 1,1; k1: parallel = 1,1; k2: manage = 1,x; k3: memory = 1,x.
[Figure: the (bit 6, Byte 2) DFA next to the keyword table.]

Page 25: Efficient Memory Utilization  on Network Processors  for Deep Packet Inspection

Construction of Bit-Byte AC DFA
• The (bit 3, Byte 0) and (bit 6, Byte 2) DFAs are two of the 32 bit-byte DFA that need to be constructed, one per (byte position, bit position) pair.
[Figure: both example DFAs with their partial-match bit strings, next to the keyword table.]
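
The construction slides above amount to one loop over the 32 (byte position, bit position) pairs. A hedged sketch, reusing the build_ac and bit_byte_slices helpers defined earlier in this document (the binary streams play the role of keywords, so the output sets hold bit tuples rather than the original strings):

    def build_bit_byte_dfas(keywords, B=4):
        """One binary AC DFA per (byte position, bit position) pair."""
        dfas = {}
        for byte_pos in range(B):
            for bit in range(8):
                patterns = [bit_byte_slices(kw, B)[(byte_pos, bit)] for kw in keywords]
                dfas[(byte_pos, bit)] = build_ac(patterns)
        return dfas

    dfas = build_bit_byte_dfas(["elements", "parallel", "manage", "memory"])
    print(len(dfas))                  # 32 bit-byte DFAs, as on the slide
    goto, fail, output = dfas[(0, 3)]
    print(len(goto))                  # 6 states for the (bit 3, Byte 0) DFA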

Page 26: Efficient Memory Utilization  on Network Processors  for Deep Packet Inspection

Bit-Byte-DFA: Searching
• The input stream is consumed 4 bytes (byte0 byte1 byte2 byte3) at a time; each of the 32 bit-byte DFA tracks its own bit of its own byte position.
[Figure: the example bit-byte DFAs from the construction slides, with an empty input buffer.]

Page 27: Efficient Memory Utilization  on Network Processors  for Deep Packet Inspection

Bit-Byte-DFA: Searching
• Input so far: a b 1 2 = 0110 0001 0110 0010 0011 0001 0011 0010. Each DFA consumes one bit per 4-byte chunk.
[Figure: the example bit-byte DFAs; a failure edge is shown as necessary.]

Page 28: Efficient Memory Utilization  on Network Processors  for Deep Packet Inspection

Bit-Byte-DFA: Searching
• Input so far: a b 1 2 m e m o = 0110 0001 0110 0010 0011 0001 0011 0010 0110 1101 0110 0101 0110 1101 0110 1111.
[Figure: the example bit-byte DFAs advancing on the bits of the second 4-byte chunk.]

Page 29: Efficient Memory Utilization  on Network Processors  for Deep Packet Inspection

Bit-Byte-DFA: Searching
• Input so far: a b 1 2 m e m o r y e f = 0110 0001 0110 0010 0011 0001 0011 0010 0110 1101 0110 0101 0110 1101 0110 1111 0111 0010 0111 1001 0110 0101 0110 0110.
[Figure: the example bit-byte DFAs; a failure edge is shown as necessary.]

Page 30: Efficient Memory Utilization  on Network Processors  for Deep Packet Inspection

Bit-Byte-DFA: Searching
• Match => keyword 'memory'.
• The match counts only when all 32 bit-DFA find it in their own bit streams!
[Figure: the example bit-byte DFAs on the input a b 1 2 m e m o r y e f.]
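
The matching rule on these searching slides can be sketched as a simple intersection: each bit-byte DFA reports the keyword-IDs whose accept state it has just reached, and a keyword counts as found only when every DFA reports it. (A conceptual reconstruction, not the microengine code; the keyword-ID sets below are hypothetical.)

    def match_all(partial_matches):
        """partial_matches: one set of keyword-IDs per bit-byte DFA (32 sets).
        Only keywords reported by every DFA are real matches."""
        found = set(partial_matches[0])
        for pm in partial_matches[1:]:
            found &= set(pm)
        return found

    # 31 DFAs report only 'memory'; one also reports 'manage' as a partial match:
    reports = [{"memory"}] * 31 + [{"memory", "manage"}]
    print(match_all(reports))   # {'memory'}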

Page 31: Efficient Memory Utilization  on Network Processors  for Deep Packet Inspection

Find the optimal settings to minimize memory
• k = keywords per subset
  – The width of the keyword-ID = k bits
  – k = 1, 2, 3, ..., K, where K = the number of keywords in the whole set
  – Snort (Dec. 2005): K = 2733 keywords
• b = bit(s) extracted for each byte
  – b = 1, 2, 4, 8
  – # of next-state pointers = 2^b
  – In example 2: b = 1
  – Beyond b = 8: > 256 next-state pointers
• B = bytes considered at a time
  – B = 1, 2, 3, ...
  – In example 2: B = 4
• Total memory (T) is a function of k, b, and B: T = f(k, b, B)
[Memory matrix as on the Bit-Byte-AC DFA slide.]

Page 32: Efficient Memory Utilization  on Network Processors  for Deep Packet Inspection

T's Formula
Total memory of all bit-ACs in all subsets:

  T(k, b, B) = \sum_{j=1}^{N_{subset}} \sum_{i=1}^{N_{bitDFA}} \eta_{ij} \, (k + 2^b e_{ij}),
  where N_{subset} = \lceil K / k \rceil and N_{bitDFA} = (8 / b) \cdot B.

Definitions
  K          The number of all keywords in the whole rule set
  k          The number of keywords in each subset: 1, 2, 3, ..., K
  b          GroupedBit: the number of bits grouped to divide the 8 bits of a byte: 1, 2, 4, 8
  B          The number of bytes considered at a time: 1, 2, 3, ...
  2^b        The number of next-state pointers: 2, 4, 16, 256
  \eta_{ij}  The number of states in the i-th bit-level AC in the j-th subset
  e_{ij}     The number of bits used to encode states in the i-th bit-level AC in the j-th subset: e_{ij} = \lceil \log_2 \eta_{ij} \rceil
[Memory matrix: per state, a k-bit keyword-ID and 2^b next-state pointers of e bits each.]
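
A sketch of the memory model as code (assuming the per-DFA state counts \eta_{ij} are already known, e.g. from building the bit-byte DFAs for a given setting; the state_counts helper in the commented sweep is hypothetical):

    import math

    def total_memory_bits(k, b, B, states):
        """T(k, b, B): states[j][i] = number of states in the i-th bit-level AC of
        subset j, with ceil(K/k) subsets and (8 // b) * B bit-level ACs per subset."""
        total = 0
        for subset in states:                          # j = 1 .. N_subset
            for eta in subset:                         # i = 1 .. (8/b) * B
                e = max(1, math.ceil(math.log2(eta)))  # next-state pointer width e_ij
                total += eta * (k + (2 ** b) * e)      # k-bit keyword-ID + 2^b pointers
        return total

    # Brute-force sweep for the optimum (state_counts(k, b, B) would rebuild the DFAs):
    # best = min(((k, b, B) for k in range(1, 2734) for b in (1, 2, 4, 8)
    #             for B in (1, 2, 4, 8, 16, 32)),
    #            key=lambda s: total_memory_bits(*s, state_counts(*s)))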

Page 33: Efficient Memory Utilization  on Network Processors  for Deep Packet Inspection

Find the optimal k
• Each pair of (b, B) has one optimal k that gives a minimal T.
• For the Bit-Byte-AC DFA with b = 2 and B = 16, T_min is reached at k = 12.
[Chart: T (KB) versus k (keywords per subset) for B = 16, b = 2; T ranges roughly from 250 to 410 KB over k = 1 to 100, with T_min at k = 12.]

Page 34: Efficient Memory Utilization  on Network Processors  for Deep Packet Inspection

Find the optimal b
• Each setting of (k, b, B) has its own optimal point; only the optimal settings are compared.
• b = 2 is the best.
[Chart data, T in KB at each series' optimal k:]
          B=1    B=2    B=4    B=8    B=16   B=32
  b=1    406.2  355.7  314.9  295.2  295.2  341.6   (k = 3, 3, 4, 6, 9, 18)
  b=2    395.8  345.4  307.4  289.2  271.9  273.4   (k = 3, 3, 4, 5, 12, 33)
  b=4    773.5  672.9  598.0  562.8  502.6  424.0   (k = 3, 3, 4, 4, 18, 34)

Page 35: Efficient Memory Utilization  on Network Processors  for Deep Packet Inspection

Find the optimal B
• b = 2
• T decreases (non-linearly) as B increases.
• Beyond B = 16, T begins to increase again.
• B = 16 is the best for Snort (Dec '05).
[Chart data, T normalized to the base case B = 1 (395 KB), each point at its optimal k:]
   B      1      2      3      4      5      6      7      8      9     16     32
   T   100.00  87.26  82.38  77.65  77.51  74.53  75.56  73.06  72.17  68.69  69.06
   k      3      3      3      4      4      4      4      5      5     12     32

Page 36: Efficient Memory Utilization  on Network Processors  for Deep Packet Inspection

Comparing with Existing Works
• Tan-Sherwood's, Brodie-Cytron-Taylor's, and ours
• Our Bit-Byte DFA at B = 16
  – The optimal point is at b = 2 and k = 12
  – 272 KB
  – 14% of 2001 KB (Tan's)
  – 4% of 6064 KB (Brodie's)
[Chart data, T in KB for our Bit-Byte DFA versus B, against Tan's 2001.57 KB and Brodie's 6064.27 KB:]
   B      1       2       3       4       5       6       7       8       9       16
   T   395.84  345.41  326.10  307.39  306.83  295.02  299.11  289.19  285.67  271.91

Page 37: Efficient Memory Utilization  on Network Processors  for Deep Packet Inspection

Comparing with Existing Works
• Tan-Sherwood's and ours, at B = 1
• Tan's (on ASIC)
  – 2001 KB
  – k = 16 is not the optimal setting for B = 1.
  – Each bit-DFA uses the same storage capacity, sized to fit the largest one (worst case).
• Ours (on NP)
  – 396 KB < 2001 KB
  – k = 3 is the optimal setting for B = 1.
  – Each bit-DFA uses exactly the memory space needed to hold it.
[Chart as on the previous slide.]

Page 38: Efficient Memory Utilization  on Network Processors  for Deep Packet Inspection

Results with an NP Simulator
• NePSim2
  – An open-source IXP24xx/28xx simulator
• NP architecture based on the IXP2855
  – 16 MicroEngines (MEs)
  – 512 KB
  – 1.4 GHz
• Bit-Byte AC DFA: b = 2, B = 16, k = 12
  – T = 272 KB
  – 5 Gbps

Page 39: Efficient Memory Utilization  on Network Processors  for Deep Packet Inspection

Conclusion
• The Bit-Byte DFA model can reduce memory usage by up to 86%.
• Implementing it on an NP uses on-chip memory more efficiently, without wasting space, compared to an ASIC.
• An NP has the flexibility to accommodate
  – the optimal setting of k, b, and B,
  – different sizes of Bit-Byte DFA, and
  – new rule sets in the future (the optimal setting may change).
• The performance (measured with an NP simulator) satisfies line speed up to 5 Gbps throughput.

Page 40: Efficient Memory Utilization  on Network Processors  for Deep Packet Inspection


Thank you

Questions?

[email protected]

[email protected]