Hyperscan: A Fast Multi-pattern Regex Matcher for Modern CPUs
Xiang Wang1, Yang Hong1, Harry Chang1, KyoungSoo Park2, Geoff Langdale3, Jiayu Hu1 and Heqing Zhu1
1 Intel Corporation; 2 KAIST; 3 branchfree.org
Network Platforms Group
…
Networking Applications with Regex Matching
• Deep packet inspection (DPI) – key functionality of L7 traffic monitoring
• Regular expression (regex) matching – core element of DPI
• Big problem – regex matching is SLOW
Example applications: IPS/IDS, WAF, Application Identification, …
Current Best Practice: Prefilter-based Pattern Matching
Rule 0:
content:"SEARCH";
pcre:"/\sSEARCH\s\w+\s\{\d+\}[\r]?\n[^\n]*?%/smi"
…
Rule N:
content:"UNSUBSCRIBE";
pcre:"/^\w+\s+UNSUBSCRIBE\s[^\n]{100}/smi";
Two-stage Pattern Matching
1. Multi-string matching of the content keywords (SEARCH, UNSUBSCRIBE, …) against the input
2. Single regex matching for each rule whose keyword matched: Match! or No Match
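The two-stage flow can be sketched in a few lines of Python. This is a minimal illustration, not how Snort implements it: the names RULES and prefilter_match are hypothetical, the /smi flags are mapped to Python's re.S, re.M and re.I, and a plain substring test stands in for a real multi-string matcher.

```python
import re

# (keyword, regex) pairs taken from the two example rules on the slide.
RULES = [
    ("SEARCH",
     re.compile(r"\sSEARCH\s\w+\s\{\d+\}[\r]?\n[^\n]*?%", re.S | re.M | re.I)),
    ("UNSUBSCRIBE",
     re.compile(r"^\w+\s+UNSUBSCRIBE\s[^\n]{100}", re.S | re.M | re.I)),
]


def prefilter_match(payload):
    """Return the indices of the rules whose keyword AND regex both match."""
    matched = []
    for rule_id, (keyword, regex) in enumerate(RULES):
        # Stage 1: cheap string match (stand-in for a multi-string matcher).
        if keyword not in payload:
            continue
        # Stage 2: the expensive regex runs only for rules that passed stage 1.
        if regex.search(payload):
            matched.append(rule_id)
    return matched
```

For example, prefilter_match("a1 SEARCH mbox {12}\r\nfoo%") reaches the regex stage for Rule 0 only, while a payload without either keyword never invokes a regex.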
Problems with Prefilter-based Pattern Matching
Pattern: .*foo[^x]barY+
Input: XfoZbarY
String matching for "bar": X f o Z b a r
Regex matching: X f o Z b a r Y
Manual choice of improper string keywords
Duplicate matching of the string keywords
content:"/";
pcre:"/(?=[defghilmnoqrstwz])(m(ookflolfctm\x2fnmot\.fmu|clvompycem\x2fcen\.vcn)"
Complex regexes lead to slow NFA
Slow
Contributions
Issues
• Problems with current best practices
− Manual choice of improper string keywords
− Duplicate matching of the string keywords
− Complex regexes lead to slow NFA
• Suboptimal matching performance
− Slow multi-string matching
− Slow NFA matching
Solutions
• Novel regex decomposition
• SIMD-based pattern matching
− Efficient multi-string matching
− Fast bit-based NFA
Outcome
• Snort: 8.7x Speedup
• Multi-string matching: 3.2x Speedup over DFC
• Multi-regex matching: 13.5x Speedup over RE2
Wide Adoption of Hyperscan
• Successfully deployed by over 40 commercial projects globally
• In production use by tens of thousands of cloud servers in data centers
• Integrated into 37 open-source projects
Decomposition-based Matching
• Decomposes a pattern into string (STR) and subregex (FA) components
• String matching is the entrance
• All components have to be matched in order
Pattern: .*foo[^x]barY+
Components: FA2 = .*, STR2 = foo, FA1 = [^x], STR1 = bar, FA0 = Y+
Input: XfoZbarY
String matching finds STR1 ("bar"), but STR2 ("foo") never matched, so FA1 is dead: don't trigger FA0.
• No duplicate string keyword matching
• Smaller FAs with fast DFA matching
• Facilitates multi-regex matching
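A minimal sketch of the idea for the slide's pattern .*foo[^x]barY+, hard-coding its five components. The helper names find_all and decomposed_match are hypothetical, and the sketch is not Hyperscan's runtime; it only shows that string matches act as the entrance and that an FA runs only when the component before it has matched.

```python
def find_all(haystack, needle):
    """Return the end offset of every (possibly overlapping) occurrence."""
    ends = []
    start = haystack.find(needle)
    while start != -1:
        ends.append(start + len(needle))
        start = haystack.find(needle, start + 1)
    return ends


def decomposed_match(data):
    """Match .*foo[^x]barY+ as FA2=.*, STR2=foo, FA1=[^x], STR1=bar, FA0=Y+."""
    # FA2 = ".*" accepts any prefix, so STR2 may match anywhere.
    str2_ends = find_all(data, "foo")
    # FA1 = "[^x]" is triggered only where STR2 ended; it consumes one byte.
    fa1_accepts = {e + 1 for e in str2_ends if e < len(data) and data[e] != "x"}
    # STR1 = "bar" triggers FA0 only if it starts where FA1 accepted.
    fa0_triggers = [end for end in find_all(data, "bar")
                    if end - len("bar") in fa1_accepts]
    # FA0 = "Y+" needs at least one "Y" right after STR1.
    return any(e < len(data) and data[e] == "Y" for e in fa0_triggers)
```

With the slide's input XfoZbarY, "bar" is found but fa1_accepts is empty, so FA0 is never run and the function returns False.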
Key Issues with Regex Decomposition
• How to automatically decompose a regex?
• How many real-world regexes can be decomposed?
Graph-based Regex Decomposition
• Textual regex decomposition is often tricky, e.g. /b[il1]l\s{0,10}/
• Graph structure delivers more insights
Regex: (abc|def).*ghi
Glushkov NFA states: 0 ".", 1 "a", 2 "b", 3 "c", 4 "d", 5 "e", 6 "f", 7 ".", 8 "g", 9 "h", 10 "i"
Graph-based Decomposition
1) Dominant Path Analysis
2) Dominant Region Analysis
3) Network Flow Analysis
Decomposed components: STR1 = abc, STR2 = def, FA1 = .*, STR3 = ghi
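To make dominant path analysis concrete, here is a small sketch that computes graph dominators for the regex (abc|def).*ghi and reads off the string that every path to the accept state must pass through. The edge list is an assumption (a standard Glushkov construction using the state numbers from the slide), and dominators and dominant_string are hypothetical names, not Hyperscan functions.

```python
# Assumed Glushkov NFA edges for (abc|def).*ghi; state 0 is the start state.
EDGES = {0: [1, 4], 1: [2], 2: [3], 3: [7, 8], 4: [5], 5: [6],
         6: [7, 8], 7: [7, 8], 8: [9], 9: [10]}
LABELS = {1: "a", 2: "b", 3: "c", 4: "d", 5: "e", 6: "f",
          7: ".", 8: "g", 9: "h", 10: "i"}


def dominators(edges, start):
    """Return dom[n]: the set of states that lie on every path start -> n."""
    nodes = set(edges) | {dst for dsts in edges.values() for dst in dsts}
    preds = {n: set() for n in nodes}
    for src, dsts in edges.items():
        for dst in dsts:
            preds[dst].add(src)
    dom = {n: set(nodes) for n in nodes}
    dom[start] = {start}
    changed = True
    while changed:  # iterate to a fixed point
        changed = False
        for n in nodes - {start}:
            if not preds[n]:
                continue  # unreachable state, nothing to intersect
            new = {n} | set.intersection(*(dom[p] for p in preds[n]))
            if new != dom[n]:
                dom[n], changed = new, True
    return dom


def dominant_string(edges, labels, start, accept):
    """Concatenate the labels of the states dominating the accept state."""
    dom = dominators(edges, start)
    # Dominators of one node form a chain; order them from start to accept.
    path = sorted(dom[accept] - {start}, key=lambda n: len(dom[n]))
    return "".join(labels[n] for n in path)
```

Here dominant_string(EDGES, LABELS, 0, 10) yields "ghi": states 8, 9 and 10 dominate the accept state, whereas neither "abc" nor "def" lies on every path. The sketch assumes the dominating states are adjacent and carry single-character labels, which holds for this graph but not in general.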
Graph-based String Extraction
Dominant Path Analysis
[Figure: example NFA graph for dominant path analysis]
Dominant Region Analysis
[Figure: example NFA graph for dominant region analysis]
Graph-based String Extraction
Network Flow Analysis
• Finds a string (or multiple strings) that ends at the edge
• Assigns a score inversely proportional to the length of the string(s) ending at the edge
• Runs “max-flow min-cut” algorithm [1] to find a minimum cut-set
[Figure: example NFA graph for network flow analysis]
[1] Jack Edmonds and Richard M. Karp. Theoretical improvements in algorithmic efficiency for network flow problems. Journal of the ACM, 19(2):248–264, 1972.
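The scoring and cut steps can be sketched with a compact Edmonds-Karp implementation. The four-edge graph, its strings, and the names min_cut and EDGE_STRINGS are made up for illustration; the score 1/len(string) is one simple reading of "inversely proportional to the length", so the minimum cut prefers edges with long strings.

```python
from collections import deque
from fractions import Fraction

# Hypothetical graph: each edge carries the string that ends at that edge.
EDGE_STRINGS = {("s", "a"): "abcdef", ("a", "t"): "gh",
                ("s", "b"): "xy", ("b", "t"): "foobar"}


def min_cut(capacity, source, sink):
    """Edmonds-Karp max flow; returns (flow value, sorted min-cut edges)."""
    residual = dict(capacity)
    for (u, v) in capacity:
        residual.setdefault((v, u), 0)  # reverse edges start empty
    adj = {}
    for (u, v) in residual:
        adj.setdefault(u, []).append(v)

    def bfs():
        parent = {source: None}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj.get(u, []):
                if v not in parent and residual[(u, v)] > 0:
                    parent[v] = u
                    queue.append(v)
        return parent

    flow = 0
    while True:
        parent = bfs()
        if sink not in parent:
            break  # no augmenting path is left
        path, v = [], sink
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        bottleneck = min(residual[e] for e in path)
        for (u, v) in path:
            residual[(u, v)] -= bottleneck
            residual[(v, u)] += bottleneck
        flow += bottleneck
    # Edges leaving the set still reachable from the source form the cut.
    reachable = set(parent)
    cut = sorted(e for e in capacity
                 if e[0] in reachable and e[1] not in reachable)
    return flow, cut


# Score each edge inversely to the length of the string ending at it.
CAPACITY = {e: Fraction(1, len(s)) for e, s in EDGE_STRINGS.items()}
FLOW, CUT = min_cut(CAPACITY, "s", "t")
CUT_STRINGS = [EDGE_STRINGS[e] for e in CUT]
```

On this toy graph the cut consists of the two edges carrying the six-character strings, so those strings would be the extracted keywords. Exact fractions are used to avoid floating-point residue in the residual capacities.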
Effectiveness of Graph Analysis on Real-world Rules
Ruleset                  Total   All Graph Analyses   Dominant Path   Dominant Region   Network Flow
Snort Talos (May 2015)   1663    94.0%                93.3%           1.9%              1.0%
Snort ET-open 2.9.0      7564    89.3%                86.9%           1.3%              2.7%
Suricata 4.0.4           7430    87.5%                85.0%           1.3%              2.7%
Majority of Regex Rules are Decomposable
Dominant Path Analysis is Effective
Quality of Automatically Extracted Keywords
[Figure: Snort Talos*. x-axis: # of patterns (700, 850, 1000, 1150, 1300); series: Prefilter-based, Hyperscan, Reduction; reduction data labels: 38.5, 391.2, 524.5, 520.6, 697.7]
[Figure: Snort ET-Open*. x-axis: # of patterns (500, 1000, 1500, 2000, 2500); series: Prefilter-based, Hyperscan, Reduction; reduction data labels: 4.6, 131, 139.2, 179.1, 182.4]
* Left vertical axis: # of regex matching process invocations (logarithmic scale, base 10)
* Right vertical axis: reduction by Hyperscan
How to Accelerate Pattern Matching Algorithms?
• Modern CPUs support SIMD (Single Instruction, Multiple Data) to exploit data-level parallelism
• SIMD instructions can boost database pattern matching by 2x [1]
• Goal: accelerate both multi-string and FA matching with SIMD
[1] E. Sitaridi, O. Polychroniou, and K. A. Ross. SIMD-accelerated regular expression matching. In Proceedings of the Workshop on Data Management on New Hardware (DaMoN), 2016
[Figure: SIMD register X (X3, X2, X1, X0) OP SIMD register Y (Y3, Y2, Y1, Y0) gives SIMD register Z (X3 OP Y3, X2 OP Y2, X1 OP Y1, X0 OP Y0), one OP per lane]
Multi-string Pattern Matching Overview
• Extended shift-or matching
− Finds candidate input strings that are likely to match some string patterns
• Verification
− Filters false positives with hashing
− Confirms an exact match with string patterns with the same hash value
[Figure: Input Traffic → Multi-string Shift-or Matching → Candidate Matching Input → Verification (Hashing, then Exact Matching against String Pattern)]
Shift-or String Matching
String pattern: aphp
Input: aphp…
Masks (high bit on the left, low bit on the right):
sh-mask('a') = 11111110
sh-mask('h') = 11111011
sh-mask('p') = 11110101
st-mask      = 11111111
Steps:
m1 = (st-mask << 1) OR sh-mask('a') = 11111110 OR 11111110 = 11111110
m2 = (m1 << 1) OR sh-mask('p') = 11111100 OR 11110101 = 11111101
m3 = (m2 << 1) OR sh-mask('h') = 11111010 OR 11111011 = 11111011
m4 = (m3 << 1) OR sh-mask('p') = 11110110 OR 11110101 = 11110111 → Match!
Limitations:
− Single string pattern matching only
− Cannot benefit from SIMD instructions
[1] Ricardo A. Baeza-Yates and Gaston H. Gonnet. A new approach to text searching. Communications of the ACM (CACM), 35(10):74–82, 1992
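The single-pattern algorithm fits in a few lines of Python. This is a sketch of classic shift-or using the slide's convention (a 0 bit means "still matching"); the 8-bit WIDTH and the function names are assumptions chosen so that the intermediate states reproduce m1 to m4 from the slide.

```python
WIDTH = 8                      # mask width in bits, as drawn on the slide
ALL_ONES = (1 << WIDTH) - 1


def build_sh_masks(pattern):
    """sh-mask(c) has a 0 at bit i exactly when pattern[i] == c."""
    assert 0 < len(pattern) <= WIDTH
    masks = {}
    for i, ch in enumerate(pattern):
        masks[ch] = masks.get(ch, ALL_ONES) & ~(1 << i)
    return masks


def shift_or_trace(pattern, text):
    """Return the state mask after each input character."""
    masks = build_sh_masks(pattern)
    state = ALL_ONES           # st-mask: nothing matched yet
    trace = []
    for ch in text:
        # Shift in a 0 at the low bit, then OR with the character's mask.
        state = ((state << 1) | masks.get(ch, ALL_ONES)) & ALL_ONES
        trace.append(state)
    return trace


def shift_or_search(pattern, text):
    """Return the end offsets of all occurrences of pattern in text."""
    match_bit = 1 << (len(pattern) - 1)
    return [pos for pos, state in enumerate(shift_or_trace(pattern, text))
            if (state & match_bit) == 0]
```

Formatting shift_or_trace("aphp", "aphp") with format(state, "08b") gives 11111110, 11111101, 11111011 and 11110111, the same sequence as m1 to m4; bit 3 of the last state is 0, which signals the match.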
Multi-string Shift-or Matching
• Pattern grouping: Groups the patterns into N buckets
• SIMD acceleration: Uses 128-bit sh-masks with 128-bit SIMD instructions (e.g., pslldq for "left shift" and por for "or")
[Figure: patterns "ab" and "cd" assigned to buckets (Bucket 0, Bucket 1, Bucket 2, …, Bucket N); 128-bit sh-mask('a'), sh-mask('b'), sh-mask('c') and sh-mask('d') shown from high to low bytes, with padding bytes]
Multi-string Shift-or Matching
Example: string patterns "aphp" (Bucket 0) and "hp" (Bucket 4); input "aphp…"
[Figure: the st-mask is combined by OR with sh-mask('a'), sh-mask('p') << 8, sh-mask('h') << 16 and sh-mask('p') << 24; the resulting byte 11101110 has zero bits for Bucket 0 and Bucket 4]
Match! (bucket = 0, position = 3)
Match! (bucket = 4, position = 3)
• Pre-shifting the sh-masks increases instructions per cycle (IPC)!
• 128-bit SIMD operations increase throughput!
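The bucketed variant can be imitated with Python integers standing in for SIMD registers. This is a simplified sketch under stated assumptions, not Hyperscan's matcher: each byte of the state holds one pattern position, each bit of that byte holds one bucket, WINDOW is limited to 4 positions, and verification is a direct string comparison rather than the hashing step described earlier. The names build_masks and scan are hypothetical.

```python
BUCKETS = 8                            # one bit per bucket in every byte
WINDOW = 4                             # pattern positions tracked (assumed)
ALL_ONES = (1 << (8 * WINDOW)) - 1


def build_masks(buckets):
    """Build per-character sh-masks plus the checked length of each bucket."""
    assert len(buckets) <= BUCKETS
    sh_masks, lengths = {}, []
    for b, patterns in enumerate(buckets):
        n = min([len(p) for p in patterns] + [WINDOW]) if patterns else 0
        lengths.append(n)
        for p in patterns:
            for j, ch in enumerate(p[:n]):
                # Clear bit b of byte j: bucket b allows ch at position j.
                sh_masks[ch] = sh_masks.get(ch, ALL_ONES) & ~(1 << (8 * j + b))
    return sh_masks, lengths


def scan(buckets, text):
    """Return verified matches as (bucket, pattern, end position)."""
    sh_masks, lengths = build_masks(buckets)
    state = ALL_ONES
    hits = []
    for pos, ch in enumerate(text):
        # Shift by one byte (like pslldq), then OR (like por).
        state = ((state << 8) | sh_masks.get(ch, ALL_ONES)) & ALL_ONES
        for b, n in enumerate(lengths):
            if n and not (state >> (8 * (n - 1) + b)) & 1:
                start = pos - n + 1        # candidate only: verify it
                for p in buckets[b]:
                    if text.startswith(p, start):
                        hits.append((b, p, start + len(p) - 1))
    return hits
```

Calling scan([["aphp"], [], [], [], ["hp"]], "aphp") reports both patterns ending at position 3, matching the slide. With "ab" and "cd" sharing one bucket, the input "ad" becomes a candidate in the shift-or stage and is rejected by verification, which is why the verification stage is needed.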
Bit-based NFA Matching
• Uses DFA as much as possible – but often impossible
• Classic NFA is slow - O(m) memory lookups per input character (m = # of current states)
• Represents each state with one bit in a state bit-vector
• Exploits parallel bit operations of SIMD to compute the next states
[Figure: example NFA graph with states 0 ".", 1 "A", 2 "B", 3 "C", 4 "D", 5 "A", 6 "F", 7 "F"]
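A bit-vector NFA can be sketched as follows, again with a Python integer in place of a SIMD register. The approach shown, grouping transitions by their span (destination id minus source id) so that one shift handles all transitions of the same span, then filtering by the states reachable on the input character, is a simplification assumed for this sketch. The example graph is the assumed Glushkov NFA for (abc|def).*ghi from the decomposition slides, and compile_nfa and search are hypothetical names.

```python
# Assumed Glushkov NFA for (abc|def).*ghi; state 0 is the start state.
EDGES = [(0, 1), (0, 4), (1, 2), (2, 3), (4, 5), (5, 6), (3, 7), (3, 8),
         (6, 7), (6, 8), (7, 7), (7, 8), (8, 9), (9, 10)]
LABELS = {1: "a", 2: "b", 3: "c", 4: "d", 5: "e", 6: "f",
          7: None, 8: "g", 9: "h", 10: "i"}   # None means "any character"
ACCEPTS = [10]


def compile_nfa(labels, edges):
    """Group source states by transition span; build per-character reach."""
    span_masks = {}
    for src, dst in edges:
        span_masks[dst - src] = span_masks.get(dst - src, 0) | (1 << src)
    reach, any_mask = {}, 0
    for state, label in labels.items():
        if label is None:
            any_mask |= 1 << state
        else:
            reach[label] = reach.get(label, 0) | (1 << state)
    return span_masks, reach, any_mask


def search(labels, edges, accepts, text):
    """Return the offset where a match first ends, or None (unanchored)."""
    span_masks, reach, any_mask = compile_nfa(labels, edges)
    accept_mask = sum(1 << s for s in accepts)
    start = 1 << 0
    current = start
    for pos, ch in enumerate(text):
        successors = 0
        for span, mask in span_masks.items():
            active = current & mask       # states that have this span
            successors |= active << span if span >= 0 else active >> -span
        # Keep only states whose label accepts ch; the start stays active.
        current = (successors & (reach.get(ch, 0) | any_mask)) | start
        if current & accept_mask:
            return pos
    return None
```

The work per input character depends on the number of distinct spans (five for this graph), not on the number of currently active states, which is the point of the bit-based representation.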
Other Subsystems
Small string-set (<80) matching
NFA and DFA cyclic state acceleration
Small-size DFA matching
Anchored pattern matching
Suppression of futile FA matching
…
Evaluation of Hyperscan
• Primary evaluation points:
1. Performance of string matching and regex matching vs. state-of-the-art solutions
2. Application-level performance improvement with Hyperscan
• Experiment setup:
– Machine: Intel Xeon Platinum 8180 CPU @ 2.50GHz (48 GB of RAM); runs with a single core; GCC 5.4
– Ruleset: Snort Talos (May 2015), Snort ET-Open 2.9.0, Suricata rulesets 4.0.4
– Workload: random traffic, real-world web traffic
Multi-String Matching Performance with Snort ET-Open
[Figure: random workload. x-axis: number of string patterns (1k, 5k, 10k, 26k); y-axis: throughput (Gbps); data labels: 3.2, 1.3, 1.2, 1.1]
[Figure: real web traffic trace. x-axis: number of string patterns (1k, 5k, 10k, 26k); y-axis: throughput (Gbps); data labels: 2.5, 2.1, 1.7, 1.5]
Regex Matching Performance
[Figure: Multiple Regex Matching*. y-axis: speed-up by Hyperscan; rulesets: Talos, ET-Open; data labels: 183.3, 6.9, 13.5, 8.4]
[Figure: Single Regex Matching*. y-axis: speed-up by Hyperscan; rulesets: Talos, ET-Open; data labels: 40.1, 24.8, 10.3, 9.1, 2.3, 1.8]
Legend: PCRE, RE2, PCRE2
* Test with Snort Talos (1,300 regexes) and ET-Open (2,800 regexes) rulesets under real Web traffic trace.
Real-world DPI Application - Snort
• Stock Snort (ST-Snort) employs
− AC for multi-string matching
− PCRE for regex matching
− Boyer-Moore algorithm for single-string matching
• Hyperscan-ported Snort (HS-Snort) replaced all the algorithms with Hyperscan
• Snort Talos (May 2015) with real-world web traffic
[Figure: Snort Performance. Throughput (Mbps): ST-Snort 113, HS-Snort 986 (8.7x)]
Conclusion
• Regex matching is at the core of DPI applications
• Hyperscan’s performance advantage is boosted by:
− Novel regex decomposition
− Efficient multi-string matching and bit-based NFA implementation
• Hyperscan achieves significant performance boosts
− 3.2x compared to DFC in multi-string matching
− 13.5x compared to RE2 in regex matching
• Hyperscan accelerates DPI application Snort by 8.7x
Thank You
• Thanks to Matt Barr, Alex Coyte and Justin Viiret for their development contributions
• Source code at https://github.com/intel/hyperscan