Qian Gong, Wenji Wu, Phil DeMar
GPU Technology Conference 2017
May 2017
Deep Packet Inspection Using GPUs
Background
• Main uses for network traffic analysis
– Operations & management
– Capacity planning
– Performance troubleshooting
• Levels of network traffic analysis
– Device counter level (SNMP data)
– Traffic flow level (flow data)
– Packet level (the focus of this work)
• Network security
• Application performance analysis
• Traffic characterization studies
5/11/2017 Deep Packet Inspection Using GPUs, GTC’17
Background (cont.)
Characteristics of packet-based network traffic analysis applications
• Time constraints on packet processing
• Computing and I/O throughput-intensive
• High levels of data parallelism
– Packet parallelism: each packet can be processed independently
– Flow parallelism: each flow can be processed independently
• Extremely poor temporal locality for data
– Typically, data is processed once in sequence and rarely reused
The Challenges
Packet-based traffic analysis tools face performance & scalability challenges within high-performance networks.
– High-performance networks:
• 40GE/100GE link technologies
• Servers are 10GE-connected by default
• 400GE backbone links & 100GE host connections loom on the horizon
– Millions of packets generated & transmitted per second
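To put "millions of packets per second" into numbers, the following sketch computes the worst-case packet rate and the per-packet time budget for a given link speed. It assumes minimum-size 64-byte Ethernet frames plus 20 bytes of preamble and inter-frame gap per frame (standard Ethernet framing, not stated in the slides). The function names are illustrative only.

```python
# Worst-case packet rate on an Ethernet link: minimum-size frames back to back.
MIN_FRAME_BYTES = 64      # minimum Ethernet frame, including the 4-byte FCS
WIRE_OVERHEAD_BYTES = 20  # 7 B preamble + 1 B start delimiter + 12 B inter-frame gap


def max_packets_per_second(link_gbps, frame_bytes=MIN_FRAME_BYTES):
    """Packets per second when frames of `frame_bytes` fill the link."""
    bits_on_wire = (frame_bytes + WIRE_OVERHEAD_BYTES) * 8
    return link_gbps * 1e9 / bits_on_wire


def time_budget_ns(link_gbps, frame_bytes=MIN_FRAME_BYTES):
    """Time available to process one packet before the next one arrives."""
    return 1e9 / max_packets_per_second(link_gbps, frame_bytes)
```

Under these assumptions a 10GE link carries up to about 14.88 Mpps, and a 100GE link leaves under 7 ns per packet, which is why a single sequential core is unlikely to keep up with packet-level analysis at these speeds.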
Packet-based Traffic Analysis Tool Platform (I)
• Requirements on the computing platform for high-performance network traffic analysis applications
– High compute power
– Ample memory & I/O bandwidth
– Capability of handling the data parallelism inherent in network data
– Easy programmability
Packet-based Traffic Analysis Tool Platform (II)
• Three types of computing platforms:
– NPU/ASIC, CPU, GPU
Features | NPU/ASIC | CPU | GPU
High compute power | Varies | ✖ | ✔
High memory bandwidth | Varies | ✖ | ✔
Easy programmability | ✖ | ✔ | ✔
Data-parallel execution model | ✖ | ✔ | ✔
Architecture Comparison: NVIDIA K80 vs. Intel E7-8890
Device | Cores | Memory bandwidth | Double precision | Single precision | Power | Price
NVIDIA K80 | 4992 | 480 GB/s | 2.91 TFLOPS | 8.73 TFLOPS | 300 W | $4,349
Intel E7-8890 | 18 | 102 GB/s | 0.72 TFLOPS | 1.44 TFLOPS | 165 W | $7,174
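As a rough reading aid for the comparison above, this sketch turns the table's figures into ratios. The numbers are copied from the table. The dictionary layout and function names are illustrative, and peak figures say nothing about achieved throughput on a real workload.

```python
# Figures copied from the architecture comparison table.
K80 = {"bandwidth_gbs": 480, "dp_tf": 2.91, "sp_tf": 8.73,
       "power_w": 300, "price_usd": 4349}
E7_8890 = {"bandwidth_gbs": 102, "dp_tf": 0.72, "sp_tf": 1.44,
           "power_w": 165, "price_usd": 7174}


def ratio(metric):
    """How many times larger the K80 figure is than the E7-8890 figure."""
    return K80[metric] / E7_8890[metric]


def sp_gflops_per(unit_key, device):
    """Peak single-precision GFLOPS per watt or per dollar for one device."""
    return device["sp_tf"] * 1000 / device[unit_key]
```

With the table's numbers, `ratio("bandwidth_gbs")` is about 4.7 and `ratio("sp_tf")` is about 6.1, while `sp_gflops_per("power_w", K80)` is about 29 GFLOPS per watt against roughly 8.7 for the CPU.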
Our Solution
Network Traffic Analysis using GPUs
Highlights of our work:
• Demonstrated that GPUs can significantly accelerate network traffic analysis
• Designed and implemented a generic I/O architecture to capture and move network traffic from the wire into the GPU domain
• Implemented a GPU-accelerated library for network traffic analysis
GPU-based Network Traffic Analysis Framework
Running modes
• Online analysis
– Traffic capture
• Offline analysis
GPU-based analysis
• Header analysis
• Payload analysis
Applications
• Network Monitoring
• IPS/IDS
• Traffic Engineering
• And more
[Figure: framework block diagram with three columns: Network Traffic Source, GPU Domain, Applications.
Network Traffic Source: WireCAP Packet Capture Engine (online traffic); Storage with the Libpcap library (offline traffic).
GPU Domain: Filter (BPF); Header Parser; Flow Table (SrcIP, DstIP, SrcPort, DstPort, Proto); Header Analysis; Payload Analysis; Header Parsing and/or Assembly (suspicious packets); Pattern Matching; Traffic Summarization; Abnormal Warning.
Applications: Network Monitoring, IPS/IDS, Traffic Engineering.
Configuration: Analyser Configuration and System Configuration (in standard JSON format).]
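The header-analysis path keys its flow table on the 5-tuple (SrcIP, DstIP, SrcPort, DstPort, Proto). As a CPU-side illustration of that idea only, this sketch parses the 5-tuple from raw IPv4 packets and aggregates per-flow counters. It assumes buffers that start at the IPv4 header and carry TCP or UDP. The names `parse_five_tuple`, `build_flow_table` and `make_ipv4_tcp` are hypothetical, not part of the authors' library.

```python
import socket
import struct
from collections import defaultdict


def parse_five_tuple(ip_packet):
    """Return (src_ip, dst_ip, src_port, dst_port, proto), or None.

    Assumes the buffer starts at an IPv4 header and carries TCP or UDP.
    """
    if len(ip_packet) < 20 or ip_packet[0] >> 4 != 4:
        return None
    ihl = (ip_packet[0] & 0x0F) * 4      # IPv4 header length in bytes
    proto = ip_packet[9]
    if proto not in (6, 17) or len(ip_packet) < ihl + 4:
        return None
    src_ip = socket.inet_ntoa(ip_packet[12:16])
    dst_ip = socket.inet_ntoa(ip_packet[16:20])
    src_port, dst_port = struct.unpack_from("!HH", ip_packet, ihl)
    return (src_ip, dst_ip, src_port, dst_port, proto)


def build_flow_table(packets):
    """Aggregate packet and byte counts per 5-tuple."""
    table = defaultdict(lambda: {"packets": 0, "bytes": 0})
    for pkt in packets:
        key = parse_five_tuple(pkt)
        if key is not None:
            table[key]["packets"] += 1
            table[key]["bytes"] += len(pkt)
    return dict(table)


def make_ipv4_tcp(src, dst, sport, dport, payload=b""):
    """Build a minimal IPv4+TCP packet for the demo (checksums left at zero)."""
    tcp = struct.pack("!HHIIBBHHH", sport, dport, 0, 0, 5 << 4, 0x10, 65535, 0, 0)
    total_len = 20 + len(tcp) + len(payload)
    ip = struct.pack("!BBHHHBBH4s4s", 0x45, 0, total_len, 0, 0, 64, 6, 0,
                     socket.inet_aton(src), socket.inet_aton(dst))
    return ip + tcp + payload
```

Each packet is parsed independently of the others, which is the packet-level parallelism that makes this step a natural fit for one GPU thread per packet.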
System Architecture – Online Analysis
[Figure: online analysis data path with four numbered steps: 1. Traffic Capture, 2. Preprocessing, 3. GPU-based Analysis, 4. Output. Labels: NICs, packets, packet buffers, capturing, captured data, packet chunks, user space, GPU domain, traffic analysis kernels, output.]
Four types of logical entities:
• Traffic Capture
• Preprocessing
• GPU-based Analysis
• Output (in JSON format)
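The four logical entities can be read as a simple staged pipeline. This is a minimal single-threaded sketch of that structure, not the authors' implementation: the capture stage is any Python iterable, the analysis stage is a trivial stand-in for the GPU kernels, and the JSON field names are invented for illustration.

```python
import json


def capture(source):
    """Stage 1 stand-in: yield raw packets from any iterable."""
    yield from source


def preprocess(packets, chunk_size):
    """Stage 2: group packets into fixed-size chunks (the last may be short)."""
    chunk = []
    for pkt in packets:
        chunk.append(pkt)
        if len(chunk) == chunk_size:
            yield chunk
            chunk = []
    if chunk:
        yield chunk


def analyze(chunk):
    """Stage 3 stand-in for the GPU kernels: a per-chunk summary."""
    return {"packets": len(chunk), "bytes": sum(len(p) for p in chunk)}


def run_pipeline(source, chunk_size=4):
    """Stage 4: emit one JSON document per analyzed chunk."""
    return [json.dumps(analyze(chunk), sort_keys=True)
            for chunk in preprocess(capture(source), chunk_size)]
```

Handing packets to the analysis stage in chunks rather than one at a time is what lets a device transfer and a kernel launch be amortized over many packets.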
WireCAP Packet Capture Engine
• An advanced packet capture engine for commodity network
interface cards (NICs) in high-speed networks
– Lossless zero-copy packet capture and delivery
– Zero-copy packet forwarding
– A Libpcap-compatible interface for low-level network access
• WireCAP project website
– http://wirecap.fnal.gov (Note: source code is available)
GPU-based Network Traffic Analysis
• A GPU-accelerated library for network traffic analysis
– Dozens of CUDA kernels
– Can be combined in a variety of ways to perform intended analysis operations
• Two types of GPU-based network traffic analysis
– Header analysis (see our GTC’13 talk)
• http://on-demand.gputechconf.com/gtc/2013/presentations/S3146-Network-Traffic-Monitoring-Analysis-GPUs.pdf
– Packet payload analysis
• Deep packet analysis (TCP streams)
Challenges in Stream Reassembly (I)
--- Parallelism
Why stream reassembly?
• Payloads of packets belonging to the same TCP stream need to be reassembled before they are matched against pre-defined patterns
However…
• Stream reassembly via a parallel hash table requires an atomic lock for each hash key (TCP 4-tuple)
• Data parallelism is limited when few simultaneous TCP connections are present
[Figure: out-of-order payload fragments pass through reordering & normalization and are then matched.]
Challenges in Stream Reassembly (II)
--- Denial of Service Attack
• To address the problem of out-of-order packets, one widely adopted
approach is packet buffering and stream reassembly, i.e., buffer all
packets following a missing one, until they become in-sequence again.
• This approach is intuitive but vulnerable to denial-of-service (DoS)
attacks, whereby attackers exhaust the packet buffer capacity by sending
long segments of out-of-order packets.
[Figure: a TCP stream with a missing segment. Legend: already received and forwarded data; buffered data; new data.]
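To make the buffering weakness concrete, this is a minimal sketch of the buffer-and-reassemble approach for a single stream. It is a deliberately naive illustration, not code from Snort or any other tool: the class name is hypothetical, sequence-number wraparound and overlapping segments are ignored, and there is no buffer limit.

```python
class NaiveReassembler:
    """Buffer-and-reassemble for one TCP stream, with no memory limit."""

    def __init__(self, initial_seq=0):
        self.next_seq = initial_seq   # next byte that can be forwarded in order
        self.pending = {}             # seq -> payload, waiting for a gap to fill
        self.forwarded = bytearray()

    def receive(self, seq, payload):
        """Accept one segment and forward whatever has become in-sequence."""
        if seq > self.next_seq:
            self.pending[seq] = payload      # there is a gap before this segment
            return
        if seq == self.next_seq:
            self._forward(payload)
            # Drain buffered segments that are now contiguous.
            while self.next_seq in self.pending:
                self._forward(self.pending.pop(self.next_seq))
        # Segments with seq < next_seq (retransmissions) are ignored here.

    def _forward(self, payload):
        self.forwarded += payload
        self.next_seq += len(payload)

    def buffered_bytes(self):
        return sum(len(p) for p in self.pending.values())
```

If an attacker withholds the first segment and sends 1,000 later segments of 1,000 bytes each, `buffered_bytes()` reaches 1,000,000 for that one stream while nothing is forwarded, so buffer use is controlled entirely by the sender.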
GPU-based Deep Packet Analysis Pipeline
[Figure: packets → Packet Statistic Analysis / Flow Classification → Per-flow TCP Data Reassembly → Automaton-based Pattern Matching. Stream processing uses a TCB Connection Table: hash(4-tuple) selects a bucket holding the corresponding connection records, with a link to the next hash bucket.]
Hybrid Pattern Matching Pipeline
• Intra-batch TCP packet reordering & assembly
• Inter-batch split detection
Key Mechanisms (I)
Pattern matching without buffering or dropping out-of-order packets
Observation 1
According to previous Internet traffic analysis reports, only 2%-5% of packets are affected by reordering.
When processing packets in batches (~1e6 packets), 0.1%-0.5% of TCP streams spread across batches.
Mechanism 1 --- intra-batch stream reassembly
+ Load packets from the network to the GPUs in batches
+ In-batch packet reordering and reassembly via parallel sorting
GPU-based TCP Stream Reassembly
[Figure: packet reordering and stream normalization on one batch.
1. Packet reordering: raw packets are sorted by (4-tuple | seq #) so that they are in flow and sequence order.
2. A flow identifier is assigned to each sorted packet.
3. Filter + scan builds the next packet array; "end" marks the last packet of a flow.
4. Stream normalization: a scan over seq # gives the bytes of overlapping data for each packet (0, n1, n2, …); new data is preferred.]
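The sort-then-scan idea can be sketched sequentially on the CPU. The code below is an illustration under stated assumptions, not the authors' kernels: packets are (4-tuple, seq, payload) tuples, sequence-number wraparound is ignored, gaps inside a batch are zero-filled as a placeholder, and "prefer new data" is interpreted as "the packet later in sorted order overwrites". The function name is hypothetical.

```python
from itertools import accumulate


def reassemble_batch(packets):
    """packets: list of (four_tuple, seq, payload).

    Returns (streams, flow_ids, overlaps), mirroring the data-parallel steps:
    sort, flag flow heads, scan, normalize.
    """
    # 1. Sort by (4-tuple, seq #). The sort is stable, so ties keep arrival order.
    ordered = sorted(packets, key=lambda p: (p[0], p[1]))

    # 2. Flag the first packet of each flow; an inclusive scan yields flow ids.
    heads = [1 if i == 0 or ordered[i][0] != ordered[i - 1][0] else 0
             for i in range(len(ordered))]
    flow_ids = list(accumulate(heads))

    # 3. Per packet, count bytes that overlap data already covered in its flow.
    overlaps, streams = [], {}
    covered_end = base_seq = 0
    for (flow, seq, payload), head in zip(ordered, heads):
        if head:
            base_seq, covered_end = seq, seq
            streams[flow] = bytearray()
        overlaps.append(max(0, min(covered_end, seq + len(payload)) - seq))
        # 4. Normalize: write at the stream offset so newer data overwrites older.
        start = seq - base_seq
        buf = streams[flow]
        if start > len(buf):
            gap = start - len(buf)          # bytes missing inside this batch
            buf.extend(b"\x00" * gap)       # placeholder; a real system would flag it
        buf[start:start + len(payload)] = payload
        covered_end = max(covered_end, seq + len(payload))
    return {f: bytes(b) for f, b in streams.items()}, flow_ids, overlaps
```

Sorting and prefix scans are standard data-parallel primitives, which is what makes this formulation attractive on a GPU: no per-flow lock is needed, and parallelism does not depend on the number of concurrent connections.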
Key Mechanisms (II)
Observation 2
If a string S is matched across a list of packets P1P2…PN, then a suffix of P1 must match a prefix of S, a prefix of PN must match a suffix of S, and P2…PN-1 must match a prefix of some suffix of S.
Mechanism 2 --- inter-batch split detection
+ Combine the Aho-Corasick (AC) and suffix-AC automata to detect signatures spread over different batches
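Observation 2 can be checked directly on small examples. The sketch below is a brute-force illustration of the three conditions, not the automaton-based detector the slides describe. The function name is hypothetical, and it expects at least two packets given in stream order.

```python
def split_match(s, packets):
    """Return the offset in packets[0] where `s` starts if `s` spans all packets.

    `s` spans P1..PN when it starts inside P1, covers P2..PN-1 entirely,
    and ends inside PN. Returns None otherwise.
    """
    first, middle, last = packets[0], b"".join(packets[1:-1]), packets[-1]
    for head_len in range(1, min(len(first), len(s)) + 1):
        head = s[:head_len]                  # a prefix of s ...
        if not first.endswith(head):         # ... must be a suffix of P1
            continue
        rest = s[head_len:]                  # the remaining suffix of s
        if not rest.startswith(middle):      # P2..PN-1 is a prefix of that suffix
            continue
        tail = rest[len(middle):]            # what is left of s ...
        if tail and last.startswith(tail):   # ... must be a prefix of PN
            return len(first) - head_len
    return None
```

For example, `split_match(b"hers", [b"xxhe", b"rsyy"])` finds the keyword starting at offset 2 of the first packet. The practical consequence is that a packet seen in isolation only needs to be tested against prefixes and suffixes of the signatures, which is what a suffix automaton alongside the AC automaton provides.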
GPU-based Pattern Matching for Out-of-order Packets
Intra-batch: AC automaton
Keywords: X = {he, his, she, hers}
• One thread per packet
• Each thread scans an extra N bytes into the following packet
[Figure: left, state transition automaton for X; right, parallel execution mode with threads k, k+1 and k+2, one per packet.]
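The per-packet scan can be sketched on the CPU with a textbook Aho-Corasick automaton. This is an illustration under assumptions, not the authors' CUDA kernel: packets are taken to be consecutive segments of one reassembled stream, each "thread" is an ordinary function call, and the value of N is left to the caller (the longest keyword length minus one is a natural choice, but the slides do not state it).

```python
from collections import deque


def build_ac(keywords):
    """Build goto/fail/output tables for the Aho-Corasick automaton."""
    goto, fail, out = [{}], [0], [set()]
    for word in keywords:
        state = 0
        for ch in word:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append(set())
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        out[state].add(word)
    queue = deque(goto[0].values())      # depth-1 states fail to the root
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            out[nxt] |= out[fail[nxt]]   # inherit matches ending at the fail state
            queue.append(nxt)
    return goto, fail, out


def scan_packet(ac, packets, k, extra):
    """What 'thread k' does: scan packet k plus `extra` bytes of packet k+1.

    Reports (start_offset_in_packet_k, keyword) only for matches that start
    inside packet k, so neighbouring threads never report a match twice.
    """
    goto, fail, out = ac
    own = packets[k]
    spill = packets[k + 1][:extra] if k + 1 < len(packets) else b""
    state, hits = 0, []
    for pos, ch in enumerate(own + spill):
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for word in out[state]:
            start = pos - len(word) + 1
            if start < len(own):
                hits.append((start, word))
    return sorted(hits)
```

With `extra` set to the longest keyword length minus one, a keyword that straddles two packets is found by the thread that owns its first byte. A keyword spread over three or more very short packets would still be missed by this simple version.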
GPU-based Pattern Matching for Out-of-order Packets
Inter-batch: AC automaton & Suffix-AC automaton
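Suffix Pattern Tree (PST)
Suffix set of X: {e, is, s, he, ers, rs}
struct {
    int nextState[256];     /* next state for each input byte (field types assumed; the slide omits them) */
    int preState;           /* parent state */
    unsigned char preChar;  /* byte on the edge from preState */
} PST;
suffix string = path(state)
[Figure: out-of-order packets, three cases. Case 1: AC state. Case 2: Suffix-AC state. Case 3: AC state and Suffix-AC state. Legend: new packets; received and forwarded data.]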
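The suffix set and the `path(state)` lookup can be reproduced in a few lines. The sketch below builds a plain trie over the proper suffixes of X, using the three field names from the slide stored as parallel arrays. It illustrates the data structure only: how the suffix-AC automaton adds failure links and handles the three cases is not shown in the slides and is not attempted here.

```python
def proper_suffixes(keywords):
    """All proper, non-empty suffixes of the keywords (the 'suffix set')."""
    return {w[i:] for w in keywords for i in range(1, len(w))}


class PST:
    """Trie over the suffix set, with the three fields named on the slide."""

    def __init__(self, strings):
        self.nextState = [[-1] * 256]   # per-state transition table; -1 = none
        self.preState = [-1]            # parent state; the root has none
        self.preChar = [0]              # byte on the edge from the parent
        for s in sorted(strings):
            state = 0
            for ch in s:
                if self.nextState[state][ch] == -1:
                    self.nextState.append([-1] * 256)
                    self.preState.append(state)
                    self.preChar.append(ch)
                    self.nextState[state][ch] = len(self.nextState) - 1
                state = self.nextState[state][ch]

    def find(self, s):
        """State reached by following `s` from the root, or -1."""
        state = 0
        for ch in s:
            state = self.nextState[state][ch]
            if state == -1:
                return -1
        return state

    def path(self, state):
        """Recover the string for `state` by walking preState/preChar to the root."""
        chars = []
        while state > 0:
            chars.append(self.preChar[state])
            state = self.preState[state]
        return bytes(reversed(chars))
```

Storing `preState` and `preChar` means a state number alone is enough to recover which suffix has been matched so far, so only that number needs to be kept per connection between batches.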
Performance Evaluation
Traffic Statistics
• Traffic source: real traffic mirrored from the Fermilab gateway
• Traffic pattern (average per batch)
– # of packets: 1 million
– # of data packets: 776,207
– Mean packet length: 1,415 bytes
– # of connections: 15,500
Base System
• Intel Xeon CPU E5-2650 @ 2.30 GHz, NVIDIA K40
Throughput (without memory transfer)
• TCP reassembly: 72.96 Mpps (192× speedup compared to libnids on CPU)
• TCP state management: 286.85 Mpps
• Pattern matching (AC & Suffix-AC): 5.83 Mpps
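For a rough sense of scale, the packet rates above can be converted into bit rates using the reported mean packet length. This is back-of-the-envelope arithmetic, not a reported result: it assumes every packet has the mean length of 1,415 bytes, and the reported rates exclude host-to-GPU memory transfer.

```python
MEAN_PACKET_BYTES = 1415   # "mean packet length" from the traffic statistics


def mpps_to_gbps(mpps, packet_bytes=MEAN_PACKET_BYTES):
    """Bit rate in Gbps if every packet had the given length."""
    return mpps * 1e6 * packet_bytes * 8 / 1e9


def implied_baseline_mpps(gpu_mpps, speedup):
    """CPU packet rate implied by a reported speedup factor."""
    return gpu_mpps / speedup
```

Under that assumption, pattern matching at 5.83 Mpps corresponds to about 66 Gbps, and the 192× reassembly speedup implies a libnids baseline of about 0.38 Mpps.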
Comparison to Existing Tools
• Comparison to Snort [1], Split-Detect [2], and GASPP [3]
Feature | Snort | Split-Detect | GASPP | Ours
Computing platform | CPU | CPU | GPU | GPU
Methods | Stream reassembly | Split detection | Intra-batch stream reassembly | In-batch stream reassembly + inter-batch split detection
Detection over OOO packets | ✔ | ✔ | limited | ✔
Resistance to fragmentation flood | N | Y | N | Y
Throughput | Low | Low | High | High
[1] Roesch, Martin. "Snort: Lightweight intrusion detection for networks." LISA. Vol. 99. No. 1. 1999.
[2] Varghese, George, J. Andrew Fingerhut, and Flavio Bonomi. "Detecting evasion attacks at high speeds without reassembly." ACM SIGCOMM Computer Communication Review. Vol. 36. No. 4. ACM, 2006.
[3] Vasiliadis, Giorgos, et al. "Design and Implementation of a Stateful Network Packet Processing Framework for GPUs." IEEE/ACM Transactions on Networking (2016).
Functionality Evaluation in the Presence of Adversaries
• Robust stream reassembly in the face of out-of-order packets
• Immune to SYN flood and ‘cold start’ during normalization
• Protected against attacks on available buffer memory by timeout and connection evasion mechanisms
Future Work
• Extend the GPU-based deep packet analysis framework to work with regular expressions
• Complement the header information with the payload detection results for a thorough inspection/analysis
• Optimize and evolve the GPU-based network traffic analysis framework for 40GE/100GE networks
Questions?