Connect. Communicate. Collaborate
Using Temporal Locality for a Better Design of Flow-oriented Applications
Martin Žádník, CESNET
TNC 2007, Lyngby
Motivation
• Optimize the performance of network applications
• Where context is retrieved with every packet arrival
• Such as passive monitoring applications (NetFlow, IDS, …)
• So far, scaling by sampling
Memory limitation
• Context must be stored in memory which is either
  – small and fast, or
  – large and slow
• What about a memory hierarchy?
• Use a large memory with a cache, similarly to PC architecture
• Only if the locality of traffic is good
  – spatial
  – temporal
Steps
• Find a network characteristic for locality
• Apply it on real samples
• Analyze results
• Optimize architecture
• Optimize performance
• Focus on flow-oriented applications
Metric
• The time characteristic depends on the speed of the link
• Pseudo-time is counted in the number of packets
• Not interested directly in time but rather in sequence locality (what comes next)
Characteristic
• Flow gap = the gap (measured in the number of packets of different flows) between two packets of the same flow
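This characteristic can be computed from a packet trace by remembering the last position of each flow; a minimal sketch (in practice the flow keys would be 5-tuples extracted from the headers, the letters here are illustrative):

```python
from collections import defaultdict

def flow_gaps(packets):
    """Compute the gap (in intervening packets) between consecutive
    packets of the same flow. `packets` is a sequence of hashable
    flow keys, one per arriving packet."""
    last_seen = {}
    gaps = defaultdict(list)
    for i, flow in enumerate(packets):
        if flow in last_seen:
            # number of other packets since the previous packet of this flow
            gaps[flow].append(i - last_seen[flow] - 1)
        last_seen[flow] = i
    return dict(gaps)

# Example trace with flow keys abstracted to single letters
trace = ["A", "B", "A", "C", "C", "A"]
print(flow_gaps(trace))  # → {'A': [1, 2], 'C': [0]}
```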
Measurement
• Collecting data
  – samples of 8–30 million packets
  – tcpdump, headers only
  – 195.113.126.154:64540, 130.149.49.26:64510
• Offline processing
  – Perl scripts
  – average gaps, maximum gaps
  – cumulative histograms
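The offline processing step (done with Perl scripts in the talk) can be sketched in Python; the bin width and sample gap values below are illustrative only:

```python
def cumulative_histogram(gaps, bin_width=1000):
    """For each bin boundary, the fraction of gaps that are <= it."""
    n = len(gaps)
    bounds = range(0, max(gaps) + bin_width, bin_width)
    return [(b, sum(1 for g in gaps if g <= b) / n) for b in bounds]

# Illustrative gap values, not measured data
gaps = [10, 200, 1500, 40000]
for bound, fraction in cumulative_histogram(gaps):
    print(bound, fraction)
```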
Results
• The distribution of flow gaps is exponential for common traffic
[Figure: Cumulative histogram of gaps; x-axis: length of gaps (0 to 70 000), y-axis: cumulative percentage of traffic]
Apply results
• Estimate the size of the cache in a system of cache and slow memory (DRAM)
• Optimize the replacement policy
• Estimate the speed-up
• Case study on the FlowMon probe
Real World
• On-chip cache latency: 1 clock cycle
• External cache: 4 clock cycles
• DRAM average latency: 16 cycles
Amdahl’s law
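The slide invokes Amdahl's law without writing it out; applied to a two-level memory hierarchy it can plausibly be stated as below, where $h$ is the cache hit rate and $t_{\mathrm{cache}}$, $t_{\mathrm{DRAM}}$ are the access latencies (a hedged reconstruction, not a formula from the slides):

```latex
S = \frac{t_{\mathrm{DRAM}}}{h \, t_{\mathrm{cache}} + (1 - h)\, t_{\mathrm{DRAM}}}
```

As $h \to 1$ the speed-up approaches $t_{\mathrm{DRAM}} / t_{\mathrm{cache}}$, while a poor hit rate leaves $S$ near 1, which is why the traffic-locality measurements above matter.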
FlowMon context - speedup
• 8× 64-bit words
• Internal cache: 9 cycles
• External cache: 12 cycles
• DRAM: 24 cycles
[Figure: Speed-up vs. size of cache (0 to 35 000) for the internal and external cache; speed-up between 1 and 2]
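The speed-up estimate can be reproduced from the latencies above with a simple Amdahl-style model; the 90 % hit rate below is an assumed example, not a measured value:

```python
def speedup(hit_rate, t_cache, t_mem):
    """Speed-up of a cache + slow-memory hierarchy over the slow memory alone."""
    avg_latency = hit_rate * t_cache + (1 - hit_rate) * t_mem
    return t_mem / avg_latency

# Latencies from the FlowMon case study (cycles): internal cache 9,
# external cache 12, DRAM 24.  The 0.9 hit rate is illustrative.
print(round(speedup(0.9, 9, 24), 2))   # → 2.29 (internal cache)
print(round(speedup(0.9, 12, 24), 2))  # → 1.82 (external cache)
```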
Victim policy
• LRU vs. Random
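The two victim policies can be compared with toy cache simulators run over a recorded trace of flow keys (the function names and the deterministic seed are illustrative, not from the talk):

```python
import random
from collections import OrderedDict

def hit_rate_lru(trace, size):
    """Hit rate of a cache that evicts the least recently used entry."""
    cache = OrderedDict()
    hits = 0
    for key in trace:
        if key in cache:
            hits += 1
            cache.move_to_end(key)         # refresh recency on a hit
        else:
            if len(cache) >= size:
                cache.popitem(last=False)  # evict the LRU victim
            cache[key] = None
    return hits / len(trace)

def hit_rate_random(trace, size, seed=0):
    """Hit rate of a cache that evicts a randomly chosen victim."""
    rng = random.Random(seed)
    cache = set()
    hits = 0
    for key in trace:
        if key in cache:
            hits += 1
        else:
            if len(cache) >= size:
                cache.remove(rng.choice(sorted(cache)))  # random victim
            cache.add(key)
    return hits / len(trace)
```

Running both over the same measured flow-key trace shows which policy exploits the observed temporal locality better for a given cache size.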
Entering policy
• Sample&Hold [Estan, Varghese]
• Targets elephant flows only
• Makes sense only for a really small cache
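Sample&Hold, as proposed by Estan and Varghese, admits a flow into the table only with some probability per packet, but once admitted every later packet of that flow is counted, which biases the table toward elephant flows. A minimal sketch (the parameter names are illustrative):

```python
import random

def sample_and_hold(trace, p, seed=0):
    """Sample&Hold entering policy: admit a new flow with probability
    p per packet (sample), then count all its packets (hold)."""
    rng = random.Random(seed)
    table = {}
    for flow in trace:
        if flow in table:
            table[flow] += 1      # hold: every packet of a held flow counts
        elif rng.random() < p:
            table[flow] = 1       # sample: admit with probability p
    return table
```

With a small p, short flows rarely enter the table at all, so a small cache is spent almost entirely on the heavy hitters.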
Conclusion
• Pseudo-time locality of flows
• Measurements on real samples
• So far, on-chip cache only
• Speed-up of 1.7×
• Memory architecture described in VHDL and used for FlowMon probe on COMBO6X cards
• Future work:
  – Correlation with timestamps
  – Implement LRU or Sample&Hold