40
Efficient Signature Matching with Multiple Alphabet Compression Tables Shijin Kong Randy Smith Cristian Estan Presented at SecureComm 2008, Istanbul, Turkey

Efficient Signature Matching with Multiple Alphabet Compression

  • Upload
    ngophuc

  • View
    231

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Efficient Signature Matching with Multiple Alphabet Compression

Efficient Signature Matching with Multiple Alphabet Compression Tables

Shijin

Kong Randy Smith Cristian

Estan

Presented at SecureComm

2008, Istanbul, Turkey

Page 2: Efficient Signature Matching with Multiple Alphabet Compression

2

Signature Matching

Signature Matching a core component of network devices

Operation (ideal): For a set of signatures, match all relevant sigs in a single pass over payload

Many constraintsEvolving, complex signaturesWirespeed operationLimited memoryActive adversary

Page 3: Efficient Signature Matching with Multiple Alphabet Compression

3

Regular Expressions and DFAs

Regular expressions standard for writing sigsBuffer overflow: /^RETR\s[^\n]{100}/Format string attack: /^SITE\s+EXEC[^\n]*%[^\n]*%/

DFAs used for matching to input

Page 4: Efficient Signature Matching with Multiple Alphabet Compression

4

12

12

12

4

2

8

2

8

25

25

25

41

41

41

5

5

State 0 State 112

12

12

4

4

4

8

8

State 225

25

25

6

41

5

41

5

State 3

input_byte=1

crt_state=1 …

DFA Operation

next_state=12

if accept(next_state)

alert

Page 5: Efficient Signature Matching with Multiple Alphabet Compression

5

Matching with DFAs

AdvantagesFast – minimal per-byte processingComposable – combine many DFAs into one

DisadvantagesStates are heavyweight (1 KB each!)State-space explosion occurs when DFAs combined

Memory exhausted with only a few DFAs!Workaround: many DFAs matched in parallel

Page 6: Efficient Signature Matching with Multiple Alphabet Compression

6

12

12

12

4

2

8

2

8

25

25

25

41

41

41

5

5

State 0 State 112

12

12

4

4

4

8

8

State 225

25

25

6

41

5

41

5

State 3

Key: Reduce memory usage

Reduce size of transition tables

Reduce number of states

Strategy: aggressively reduce memory footprint, keep exec time low

Page 7: Efficient Signature Matching with Multiple Alphabet Compression

7

Main Contribution

Multiple Alphabet Compression TablesLightweight, applicable to hardware or software Compatible with other techniquesWorst case = average case

Results (in software)4x to 70x memory reduction 35% - 85% execution time increase

Page 8: Efficient Signature Matching with Multiple Alphabet Compression

8

Outline

IntroductionAlphabet Compression TablesInteracting with D2FAsExperimental Results

Page 9: Efficient Signature Matching with Multiple Alphabet Compression

9

12

12

12

4

2

8

2

8

25

25

25

41

41

41

5

5

State 0 State 112

12

12

4

4

4

8

8

State 225

25

25

6

41

5

41

5

State 3

Alphabet Compression: core observation

Some input symbols are equivalent; the transitions on those symbols at any state are identical.

Page 10: Efficient Signature Matching with Multiple Alphabet Compression

10

12

4

2

8

2

8

25

41

41

41

5

5

State 0 State 112

4

4

4

8

8

State 225

6

41

5

41

5

State 3

input_byte=1

crt_state=1

0

0

0

1

2

3

4

5

Alphabet compression table

index=0

next_state=12

Alphabet Compression Tables

Page 11: Efficient Signature Matching with Multiple Alphabet Compression

11

12

4

2

8

2

8

25

41

41

41

5

5

State 0 State 112

4

4

4

8

8

State 225

6

41

5

41

5

State 30

0

0

1

2

3

4

5

Alphabet compression table

Even further compression…

Page 12: Efficient Signature Matching with Multiple Alphabet Compression

12

12

4

2

8

2

8

25

41

41

41

5

5

State 0 State 112

4

4

4

8

8

State 225

6

41

5

41

5

State 30

0

0

1

2

3

4

5

Alphabet compression table

Even further compression…

Page 13: Efficient Signature Matching with Multiple Alphabet Compression

13

12

4

2

8

2

8

25

41

41

41

5

5

State 0 State 112

4

4

4

8

8

State 225

6

41

5

41

5

State 30

0

0

1

2

3

4

5

Alphabet compression table

Even further compression…

Page 14: Efficient Signature Matching with Multiple Alphabet Compression

14

State 0 State 1 State 2 State 3

ACT 0

0

0

0

1

1

1

2

2

ACT 1

25

41

5

12

4

2

8

12

4

5

25

6

41

5

0

0

0

1

2

3

2

3

Multiple ACTs

How do we know which ACT to use with which state?

Page 15: Efficient Signature Matching with Multiple Alphabet Compression

15

State 0 State 1 State 2 State 3

crt_act=1

next_state=12

ACT 0

0

0

0

1

1

1

2

2

ACT 1

0 25

1 41

0 5

0 12

1 4

0 2

0 8

0 12

1 4

0 8

0 25

1 6

1 41

0 5

crt_state=1

0

0

0

1

2

3

2

3

index=0input_byte=1

next_act=0

Multiple ACTs

Page 16: Efficient Signature Matching with Multiple Alphabet Compression

16

Constructing Multiple ACTs

Partition states appropriatelyfor example:

{S1, S2, S3, …, Sn} { {S1, S8,}, {S2, S3, S9,}, … }

Construct single ACT for each group of statesSee algorithm in paper

Page 17: Efficient Signature Matching with Multiple Alphabet Compression

17

Partitioning States for ACTs

Input: number of ACTs to use m, DFA DOutput: a partition of states into m subsets

Use greedy, heuristic approach:

States = Set of all states in D;while (m>1) {

Subset = GetEquivClassPartition(States);AddToResult(Subset);States = States

Subset;m--;

}return Result;

Page 18: Efficient Signature Matching with Multiple Alphabet Compression

18

How many Compression Tables?

Avg

trans per state Avg

exec time

Eight ACTs is enough

Page 19: Efficient Signature Matching with Multiple Alphabet Compression

19

Outline

IntroductionAlphabet Compression TablesInteracting with D2FAsExperimental Results

Page 20: Efficient Signature Matching with Multiple Alphabet Compression

20

ACTs

and D2FAs

Two kinds of redundancy

Symbols have identical behavior for large subsets of states

Compress with (multiple) ACTs

S1

S2

S3

S4

a,b,c

d

e

a,b,c

a,b,c

Page 21: Efficient Signature Matching with Multiple Alphabet Compression

21

ACTs

and D2FAs

Two kinds of redundancy

Symbols have identical behavior for large subsets of states

Compress with (multiple) ACTs

Symbols at many states lead to common next states

Compress with D2FAs

S1

S2

S3

S4

a,b,c

d

e

a,b,c

a,b,c

S2

fe

S1

fe

c

c

h

h

Page 22: Efficient Signature Matching with Multiple Alphabet Compression

22

12

12

12

4

2

8

2

8

25

25

25

41

41

41

5

5

State 0 State 112

12

12

4

4

4

8

8

State 225

25

25

6

41

5

41

5

State 3

D2FAs: core observation

Kumar et al (Sigcomm

2006); Kumar et al (ANCS 2006); Becchi

et al (ANCS 2007)

For many pairs of states, the transitions for most characters are identical!

Idea: store only one copy

Page 23: Efficient Signature Matching with Multiple Alphabet Compression

23

12

12

12

4

2

8

2

8

25

25

25

41

41

41

5

5

State 0 State 112

12

12

4

4

4

8

8

State 225

25

25

6

41

5

41

5

State 3

D2FAs: core observation

For many pairs of states, the transitions for most characters are identical!

Idea: store only one copy

Kumar et al (Sigcomm

2006); Kumar et al (ANCS 2006); Becchi

et al (ANCS 2007)

Page 24: Efficient Signature Matching with Multiple Alphabet Compression

24

2

8

2

12

12

12

4

4

4

8

8

State 0 State 1

25

25

25

41

41

41

5

5

State 2

6

5

41

State 3

input_byte=1

crt_state=1 next_state=12

2 0

D2FAs

Default transitions

Issue: good compression,

potentially heavy run- time cost

Page 25: Efficient Signature Matching with Multiple Alphabet Compression

25

ACTs

and D2FAs Together

Combine ACTs and D2FAs to address both kinds of redundancy

Procedure:1.

Apply D2FA compression to DFAs2.

Apply multiple ACT compression to D2FA results

Only slight modification to ACT constructionAdd “not handled here” symbolDeal with default transitions

Page 26: Efficient Signature Matching with Multiple Alphabet Compression

26

2

8

2

12

12

12

4

4

4

8

8

State 0 State 1

25

25

25

41

41

41

5

5

State 2

6

5

41

State 3

2 0

ACTs

+ D2FAs

Default transitions

Page 27: Efficient Signature Matching with Multiple Alphabet Compression

27

2

8

2

12

4

8

State 0 State 1

25

41

5

State 2

6

5

41

State 3

2 0

ACTs

+ D2FAs

ACT 00

0

0

1

1

1

2

2

Page 28: Efficient Signature Matching with Multiple Alphabet Compression

28

0 2

0 8

0 2

0 12

1 4

0 8

State 0 State 1

0 25

1 41

0 5

State 2

1 6

0 5

1 41

State 3

0 2 1 0

ACTs

+ D2FAs

ACT 00

0

0

1

1

1

2

2

ACT 10

0

0

1

2

3

4

0

Page 29: Efficient Signature Matching with Multiple Alphabet Compression

29

0 2

0 8

0 2

0 12

1 4

0 8

State 0 State 1

0 25

1 41

0 5

State 2

1 6

0 5

1 41

State 3

0 2 1 0

ACTs

+ D2FAs

ACT 00

0

0

1

1

1

2

2

ACT 10

0

0

1

2

3

4

0

crt_act=1crt_state=1

index=0

input=1

next_state=12

next_act=0

index=0

Page 30: Efficient Signature Matching with Multiple Alphabet Compression

30

Outline

IntroductionAlphabet Compression TablesInteracting with D2FAsExperimental Results

Page 31: Efficient Signature Matching with Multiple Alphabet Compression

31

Experimental Setup

1550 HTTP, SMTP, FTP signaturesGrouped by protocol and rule set (Snort or Cisco)

DFA Set Splitting (Yu, 2006) to cluster DFAsProvide memory bound a prioriHeuristically combine into as few DFAs as possible

Experiment Environment10 GB traces, run on 3.0 GHz P4Exec time measured with cycle-accurate counters

Page 32: Efficient Signature Matching with Multiple Alphabet Compression

32

Memory vs

Time

4000 States114 DFAs

8000 States82 DFAs

16000 States45 DFAs

ideal

Page 33: Efficient Signature Matching with Multiple Alphabet Compression

33

Memory vs

Time

Snort HTTP Cisco IPS HTTP

Lowest mem, highest exec!

Page 34: Efficient Signature Matching with Multiple Alphabet Compression

34

Memory vs

Time

Snort SMTP Cisco IPS SMTP

Lowest mem, highest exec!

Page 35: Efficient Signature Matching with Multiple Alphabet Compression

35

Conclusion

Multiple alphabet compression tablesLightweightApplicable to hardware or software platformsCompatible with other techniques

Provides better time vs. space performance4x to 70x memory reduction35% to 85% execution time increase

Best technique a function of time, memory limitsACTs add superior design points

Page 36: Efficient Signature Matching with Multiple Alphabet Compression

36

Efficient Signature Matching with Multiple Alphabet Compression Tables

Thank you

Page 37: Efficient Signature Matching with Multiple Alphabet Compression

37

intentionally blank

Page 38: Efficient Signature Matching with Multiple Alphabet Compression

38

Regular Expressions and DFAs

Regular expressions standard for writing sigsBuffer overflow: /^RETR\s[^\n]{100}/Format string attack: /^SITE\s+EXEC[^\n]*%[^\n]*%/

DFAs used for matching to input

Match sig

2!

Payload Header

C0 A8 64 01 site exec % retr

% S1

S2

r

e [^\n]

[^\nrs]

\n

t r

\n

s

[^\n]…

i t e

\n

S3

S2

%…

Page 39: Efficient Signature Matching with Multiple Alphabet Compression

39

Memory Usage

DFA ACT D2FA ACT + D2FA

Snort HTTP 74 8.1 8.8 4.3

Snort SMTP 98 5.7 42 3.4

Snort FTP 94 4.9 9.2 3.9

DFA ACT D2FA ACT + D2FA

Cisco HTTP 116 30 4.7 17

Cisco SMTP 110 29 3.0 18

Cisco FTP 83 5.1 1.7 1.9

All results reported in megabytes (MB)

Page 40: Efficient Signature Matching with Multiple Alphabet Compression

40

Memory vs

Time

Snort FTP Cisco IPS FTP