D95725004 陳怡安, R96725019 解巽評, R96725023 高榮泰
IEEE/ACM TRANSACTIONS ON NETWORKING OCTOBER 2006 Cristian Estan, George Varghese, Member, IEEE, and Michael Fisk
Advisor: Professor 林永松
112/04/21 Bitmap Algorithms for Counting Active Flows on High-Speed Links 2/64
Introduction
Related Work
Counting Algorithm & Analysis
Measurement Results
Conclusion
This paper presents a family of bitmap algorithms that address the problem of counting the number of distinct header patterns (flows) seen on a high-speed link.
The authors’ new probabilistic algorithms use little memory and are fast.
Detect port/IP scans
Identify DoS attacks
Estimate spreading rate of a worm
Packet scheduling
Counting is especially hard when processing must be done within a packet arrival time
• Naïve solution – use hash tables (like NetFlow)
• Best known prior algorithm – probabilistic counting
• This paper's approach – use bitmaps & probabilistic algorithms
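To make the memory cost of the naive approach concrete, here is a minimal sketch of exact counting that remembers every flow ID. It is only an illustration, not how NetFlow is implemented; the function name `count_flows_exact` and the tuple-shaped flow IDs are assumptions made for this example.

```python
def count_flows_exact(packets):
    """Count distinct flow IDs exactly by storing every ID that was seen."""
    seen = set()
    for flow_id in packets:
        seen.add(flow_id)  # memory grows with the number of distinct flows
    return len(seen)


# Hypothetical packets, each reduced to a (source IP, destination IP) flow ID.
packets = [
    ("10.0.0.1", "10.0.0.2"),
    ("10.0.0.1", "10.0.0.3"),
    ("10.0.0.1", "10.0.0.2"),  # same flow as the first packet
]
print(count_flows_exact(packets))
```

The set needs one entry per distinct flow, which is exactly the memory growth the bitmap algorithms try to avoid.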
General purpose: multiresolution bitmap
A whole family of counting algorithms that further improve performance by taking advantage of the particularities of the specific counting application:
• Adaptive bitmap
• Triggered bitmap
A flow is defined by an identifier given by the values of certain header fields.
Ex: define a flow by its source and destination IP addresses
The problem we wish to solve is counting the number of distinct flow identifiers (flow IDs) seen in a specified measurement interval.
An intrusion detection system looking for port scans could count, for each active source address, the flows defined by destination IP and port, and suspect any source IP that opens more than three flows in 12 s of scanning.
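The per-source rule can be sketched with exact counting first. This assumes the packets of one 12 s window have already been collected as `(source, destination IP, destination port)` tuples; the name `suspect_sources` and the sample addresses are hypothetical.

```python
from collections import defaultdict


def suspect_sources(packets, threshold=3):
    """Return the sources that opened more than `threshold` distinct flows."""
    flows_by_source = defaultdict(set)
    for src, dst_ip, dst_port in packets:
        flows_by_source[src].add((dst_ip, dst_port))  # flow = (dst IP, dst port)
    return {src for src, flows in flows_by_source.items() if len(flows) > threshold}


# Hypothetical window: one source probes four ports, another opens two flows.
window = [("scanner", "10.0.0.9", port) for port in (21, 22, 23, 80)]
window += [("normal", "10.0.0.9", 80), ("normal", "10.0.0.7", 443)]
print(suspect_sources(window))
```

Keeping one exact set per active source is what becomes expensive at line rate, which motivates replacing each set with a small bitmap.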
Cost of large memory
Power consumption
Need solutions that:
1. Use a small amount of memory
2. Have high accuracy
Flajolet, Martin (1985): probabilistic counting; memory use similar to multiresolution bitmap
Whang et al. (1990): introduce the direct bitmap
You, Chang (1996): use a virtual bitmap
Duffield, Lund, Thorup (2002): accurate solutions based on counting TCP SYN flags
Direct Bitmap
Virtual Bitmap
Multiresolution Bitmap
Adaptive Bitmap
Triggered Bitmap
HASH(green)=10001001
Set bits in the bitmap using hash of the flow ID of incoming packets
[Figure: bitmap with bit positions labeled 00, 01, 10, 11; hash values range from 00000000 to 11111111]
HASH(blue)=00100100
Different flows have different hash values
HASH(green)=10001001
Packets from the same flow always hash to the same bit
HASH(violet)=10010101
Collisions OK, estimates compensate for them
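The update step of the direct bitmap can be sketched as follows. This is a minimal illustration, not the authors' implementation: SHA-256 stands in for whatever hash function a router would use, and the names `flow_hash`, `DirectBitmap`, and `zero_bits` are hypothetical.

```python
import hashlib


def flow_hash(flow_id: str) -> int:
    """Deterministic 64-bit hash of a flow ID (SHA-256 is an assumed stand-in)."""
    digest = hashlib.sha256(flow_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


class DirectBitmap:
    def __init__(self, b: int):
        self.b = b      # number of bits in the bitmap
        self.bits = 0   # a Python int used as the bit vector

    def update(self, flow_id: str) -> None:
        # Every packet of the same flow sets the same bit.
        self.bits |= 1 << (flow_hash(flow_id) % self.b)

    def zero_bits(self) -> int:
        return self.b - bin(self.bits).count("1")


bitmap = DirectBitmap(b=8)
for flow in ["green", "blue", "green", "violet"]:
    bitmap.update(flow)
print(bitmap.zero_bits())  # between 5 and 7, depending on collisions
```

Repeated packets of a flow leave the bitmap unchanged, so only the number of distinct flows (minus collisions) affects how many bits are set.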
b is the bitmap size. The probability that a flow hashes to a given bit is 1/b.
n is the number of active flows. The probability that no flow hashes to a given bit is:
$p_n = (1 - 1/b)^n \approx (1/e)^{n/b}$
The expected number of bits not set is:
$E[z] = b\,p_n \approx b\,(1/e)^{n/b}$
The estimate of the number of active flows is:
$\hat{n} = b \ln(b/z)$  (1)
Observation: the estimate goes BAD when z gets close to 0!
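Estimate (1) is a one-line computation once the number of zero bits z is known. The sketch below assumes the function name `estimate_flows` and chooses to raise an error for a full bitmap, since the formula is undefined at z = 0.

```python
import math


def estimate_flows(b: int, z: int) -> float:
    """Estimate the number of active flows from b total bits and z zero bits."""
    if z == 0:
        # All bits set: ln(b/z) is unbounded, so no finite estimate exists.
        raise ValueError("bitmap is full; the estimate is undefined")
    return b * math.log(b / z)


# Worked example: 8-bit bitmap with 5 bits still zero (3 bits set).
print(round(estimate_flows(8, 5), 2))  # 3.76
```

With 3 bits set the estimate is about 3.76 rather than 3, because the formula compensates for flows that may have collided on the same bit.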
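Flow density: $\rho = n/b$
$SD[\hat{n}/n] \approx \frac{\sqrt{e^{\rho} - \rho - 1}}{\rho\sqrt{b}}$
[Figure: plot of $SD[\hat{n}/n]$; axis ticks 0 to 1.6 and 1 to 9]
As $\rho$ increases, the standard deviation increases and z decreases!
$E[z] = b\,p_n \approx b\,(1/e)^{\rho}$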
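The growth of the error with flow density can be tabulated directly. This sketch assumes the direct-bitmap expression $SD[\hat{n}/n] \approx \sqrt{e^{\rho} - \rho - 1}/(\rho\sqrt{b})$ with $\rho = n/b$; the function name `direct_bitmap_sd` and the choice b = 1024 are illustrative.

```python
import math


def direct_bitmap_sd(n: float, b: int) -> float:
    """Approximate relative standard deviation of the direct bitmap estimate."""
    rho = n / b  # flow density: flows per bit
    return math.sqrt(math.exp(rho) - rho - 1) / (rho * math.sqrt(b))


b = 1024
for rho in range(1, 9):
    print(rho, round(direct_bitmap_sd(rho * b, b), 4))
```

The printed values rise with ρ, which is the "estimate goes bad as z approaches 0" effect expressed as a number.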
$Var(V_n)$ is easy to obtain! Using a Taylor expansion and $Var(V_n)$, we obtain $Var(\hat{n}/n)$.
Let $V_n = z/b$ be the fraction of bits not set and $\rho = n/b$:
$p = E[V_n] = (1 - 1/b)^n \approx e^{-\rho}$
$Var(V_n) \approx \frac{1}{b}\,e^{-\rho}\,\big(1 - (1 + \rho)\,e^{-\rho}\big)$
$\hat{n} = b \ln(b/z) = b\,f(V_n)$, with $f(V) = -\ln V$
$\hat{n} = b\,\Big(f(p) - \frac{V_n - p}{p} + \frac{1}{2}\frac{(V_n - p)^2}{p^2} - \frac{1}{3}\frac{(V_n - p)^3}{p^3} + \dots\Big)$
$Var(\hat{n}) \approx \frac{b^2}{p^2}\,Var(V_n) \approx b\,(e^{\rho} - \rho - 1)$
$Var(\hat{n}/n) \approx \frac{b\,(e^{\rho} - \rho - 1)}{n^2} = \frac{e^{\rho} - \rho - 1}{\rho^2\,b}$
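The first-order result can be sanity-checked by simulation. The sketch below assumes ideal uniform hashing (modeled with a seeded pseudo-random generator) and compares the empirical spread of $\hat{n}/n$ with the approximation $\sqrt{e^{\rho} - \rho - 1}/(\rho\sqrt{b})$; the function names, the seed, and the trial count are arbitrary choices for illustration.

```python
import math
import random
import statistics


def simulate_relative_sd(n: int, b: int, trials: int, seed: int = 1) -> float:
    """Empirical standard deviation of n_hat / n over repeated random bitmaps."""
    rng = random.Random(seed)
    ratios = []
    for _ in range(trials):
        bits = [False] * b
        for _ in range(n):
            bits[rng.randrange(b)] = True  # each flow hashes to a uniform bit
        z = bits.count(False)
        if z == 0:
            continue  # estimator undefined for a full bitmap; skipped here
        ratios.append(b * math.log(b / z) / n)
    return statistics.pstdev(ratios)


def predicted_relative_sd(n: int, b: int) -> float:
    rho = n / b
    return math.sqrt(math.exp(rho) - rho - 1) / (rho * math.sqrt(b))


print(simulate_relative_sd(256, 256, 500), predicted_relative_sd(256, 256))
```

The two numbers should be close but not identical, since the formula keeps only the leading term of the Taylor expansion.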
HASH(orange)=11110011
HASH(pink)=11100000
HASH(yellow)=01100011
When the number of flows far exceeds the expected upper limit, the estimates become inaccurate
Solution: use more bits
HASH(green)=10001001
Problem: memory scales with the number of flows
HASH(blue)=00100100
Solution: a) store only a portion of the bitmap
b) scale the estimate by a factor to compensate
[Figure: virtual bitmap; labels 11001101, 11101111 and 00, 01, 10, 11]
HASH(pink)=11100000
HASH(yellow)=01100011
Similar to what we did for the direct bitmap.
n: total number of active flows; m: the number of active flows that hash to the virtual bitmap.
The probability distribution of m is binomial, and its expected value is:
$E[m] = \alpha n$
We can use (1) to estimate m and obtain n by dividing by $\alpha$, the fraction of the hash space covered by the virtual bitmap:
$\hat{n} = \frac{1}{\alpha}\, b \ln(b/z)$
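A virtual bitmap can be sketched by imagining a full bitmap of b/α bits and storing only its first b bits. This is a minimal illustration under assumptions: SHA-256 stands in for the hash function, and the class name `VirtualBitmap` and the flow ID strings are hypothetical.

```python
import hashlib
import math


class VirtualBitmap:
    def __init__(self, b: int, alpha: float):
        self.b = b
        self.virtual_size = round(b / alpha)  # size of the imagined full bitmap
        self.alpha = b / self.virtual_size    # fraction of hash space stored
        self.bits = 0

    def update(self, flow_id: str) -> None:
        digest = hashlib.sha256(flow_id.encode("utf-8")).digest()
        position = int.from_bytes(digest[:8], "big") % self.virtual_size
        if position < self.b:  # flows hashing outside the stored slice are ignored
            self.bits |= 1 << position

    def estimate(self) -> float:
        z = self.b - bin(self.bits).count("1")
        if z == 0:
            raise ValueError("bitmap is full; the estimate is undefined")
        # Estimate the sampled flows with (1), then scale up by 1/alpha.
        return (1 / self.alpha) * self.b * math.log(self.b / z)


vb = VirtualBitmap(b=1024, alpha=0.05)
for i in range(20000):
    vb.update(f"flow-{i}")
print(round(vb.estimate()))
```

Here 1024 bits are used to count about 20000 flows; the price is that only about 5% of the flows are actually observed, so sampling error is added to collision error.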
Slightly different from what we obtained for the direct bitmap (here $\rho = \alpha n / b$):
$SD[\hat{n}/n] \approx \frac{\sqrt{e^{\rho} - 1}}{\rho\sqrt{b}}$
Problem: estimate inaccurate when few flows active
Solution: use many bitmaps, each accurate for a different range
HASH(pink)=11100000
HASH(yellow)=01100011
Use this bitmap to estimate number of flows
Problem: must update up to three bitmaps per packet
Solution: combine the bitmaps into one (OR)
HASH(pink)=11100000
[Figure: multiresolution bitmap; labels 00, 01, 100, 101, 11001101, 11101111; hash values range from 00000000 to 11111111]
HASH(yellow)=01100011
Select as the “Base Component” the coarsest component that has no more than setmax bits set.
Add together the counts from the base component onward and multiply the result by the scaling factor.
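The base-component rule can be sketched as below. This is a simplified illustration, not the paper's exact algorithm: all c components are assumed to have b bits, component i (0-based) is assumed to cover a fraction $(k-1)/k^{i+1}$ of the hash space with the last one taking the remainder, SHA-256 stands in for the hash function, and the default `setmax = 0.8 b` is a hypothetical choice.

```python
import hashlib
import math


class MultiresolutionBitmap:
    def __init__(self, b: int, c: int, k: int = 2, setmax=None):
        self.b, self.c, self.k = b, c, k
        self.setmax = int(0.8 * b) if setmax is None else setmax  # assumed default
        self.components = [0] * c  # index 0 is the coarsest component

    def update(self, flow_id: str) -> None:
        h = int.from_bytes(hashlib.sha256(flow_id.encode("utf-8")).digest()[:16], "big")
        u = (h >> 64) / 2**64             # in [0, 1]: selects the component
        bit = (h & (2**64 - 1)) % self.b  # selects the bit inside it
        i = 0
        while i < self.c - 1 and u >= 1 - self.k ** -(i + 1):
            i += 1                        # move to a finer component
        self.components[i] |= 1 << bit    # exactly one bit is set per packet

    def estimate(self) -> float:
        set_bits = [bin(x).count("1") for x in self.components]
        base = self.c - 1
        # Walk toward coarser components while they are not too full.
        while base > 0 and set_bits[base - 1] <= self.setmax:
            base -= 1
        m = 0.0
        for i in range(base, self.c):
            z = self.b - set_bits[i]
            if z == 0:
                raise ValueError("finest component is full; add components")
            m += self.b * math.log(self.b / z)  # flows hashing to component i
        return self.k ** base * m  # components base.. cover 1/k**base of the hash space


mrb = MultiresolutionBitmap(b=256, c=8)
for i in range(20000):
    mrb.update(f"flow-{i}")
print(round(mrb.estimate()))
```

Each packet still touches a single bit, yet the estimate stays usable from a handful of flows up to roughly $k^{c-1}$ times what one b-bit bitmap could handle.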
Find the most accurate component
Estimate the number of flows hashing to it
Apply the scaling factor
Every component could be the “Base Component”.
If the error of some component is too large, switch to a finer one as the “Base Component”!
[Figure: plot of $SD[\hat{n}/n]$ for the multiresolution bitmap; axis ticks 0 to 1.6 and 0 to 8]
Direct bitmap
Virtual bitmap
Multiresolution bitmap
Combines the accuracy of a well-tuned virtual bitmap with the wide range of a multiresolution bitmap!
A small multiresolution bitmap estimates the order of magnitude of the number of active flows, and a large virtual bitmap counts them precisely.
The resolution of the virtual bitmap can be adjusted.
Two updates per packet? Instead, replace r adjacent components of the multiresolution bitmap with the virtual bitmap.
When the number of flows is large, replace the high-resolution components.
When the number of flows is small, replace the low-resolution components.
When the number of flows is small… replace the low-resolution (coarse) components with the virtual bitmap.
When the number of flows is large… replace the high-resolution (fine) components with the virtual bitmap.
What about port scans?
A multiresolution bitmap for every active source?
Each multiresolution bitmap has to be able to handle a large number of flows.
Most traffic is NOT port scans.
A WASTE!
A small direct bitmap + a large multiresolution bitmap
A small direct bitmap counts the active flows from a given source.
Once the count exceeds the threshold, a large multiresolution bitmap is allocated for this source.
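The allocate-on-trigger idea can be sketched as follows. This is a loose illustration under stated assumptions: the large structure is a plain direct bitmap standing in for the multiresolution bitmap, the trigger is a count of bits set in the small bitmap, SHA-256 stands in for the hash function, and all names and sizes (`TriggeredCounter`, `small_bits=4`, `trigger=2`, `large_bits=1024`) are hypothetical.

```python
import hashlib


def _bit(flow_id: str, size: int) -> int:
    digest = hashlib.sha256(flow_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % size


class TriggeredCounter:
    def __init__(self, small_bits: int = 4, trigger: int = 2, large_bits: int = 1024):
        self.small_bits, self.trigger, self.large_bits = small_bits, trigger, large_bits
        self.small = {}  # source -> small direct bitmap (every active source)
        self.large = {}  # source -> large bitmap (only sources that triggered)

    def update(self, source: str, flow_id: str) -> None:
        if source in self.large:
            # Source already triggered: count its flows in the large bitmap.
            self.large[source] |= 1 << _bit(flow_id, self.large_bits)
            return
        bits = self.small.get(source, 0) | (1 << _bit(flow_id, self.small_bits))
        self.small[source] = bits
        if bin(bits).count("1") > self.trigger:
            self.large[source] = 0  # allocate the large bitmap only now


counter = TriggeredCounter()
for port in range(50):
    counter.update("scanner", f"10.0.0.9:{port}")
counter.update("normal", "10.0.0.9:80")
print(sorted(counter.large))
```

Most sources never leave the few-bit small bitmap, so the large allocation is paid only for the rare sources that look like scanners.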
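$\varepsilon^2 = \frac{e^{\rho} - \rho - 1}{\rho^2\,b}$, with $\rho = N/b$
$e^{\rho} \ge \frac{e^{\rho} - 1}{\rho} = 1 + \varepsilon^2 N$
$\rho \ge \ln(1 + \varepsilon^2 N)$
$b = N/\rho \le N / \ln(1 + \varepsilon^2 N)$
N: the maximum number of flows we plan to measure; $\varepsilon$: the target relative error $SD[\hat{n}/n]$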
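The bitmap size for a target error can also be found numerically. This sketch assumes the direct-bitmap relation $\varepsilon^2 = (e^{\rho} - \rho - 1)/(\rho^2 b)$ with $\rho = N/b$, which gives $(e^{\rho} - \rho - 1)/\rho = \varepsilon^2 N$, and solves it for ρ by bisection; the function name `direct_bitmap_bits` and the example inputs are illustrative.

```python
import math


def direct_bitmap_bits(N: int, eps: float) -> int:
    """Smallest b (approximately) whose predicted relative error at N flows is eps."""
    target = N * eps * eps

    def g(rho: float) -> float:
        # (e^rho - rho - 1) / rho - eps^2 * N, increasing in rho
        return (math.expm1(rho) - rho) / rho - target

    lo, hi = 1e-9, 50.0
    for _ in range(200):  # bisection on the flow density rho
        mid = (lo + hi) / 2
        if g(mid) < 0:
            lo = mid
        else:
            hi = mid
    return math.ceil(N / lo)


N, eps = 10**6, 0.1
b = direct_bitmap_bits(N, eps)
print(b, N / math.log(1 + eps * eps * N))  # exact size vs. the closed-form bound
```

The exact solution is somewhat smaller than the closed-form bound $N/\ln(1 + \varepsilon^2 N)$, which is convenient because the bound needs no iteration.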
Sweet spot! $\rho_{optimal} = 1.594$, $z/b = 20.3\%$
$SD[\hat{n}/n] \approx 1.243/\sqrt{b}$
$b \approx 1.544/\varepsilon^2$
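The sweet-spot numbers can be reproduced by minimizing the virtual-bitmap error factor $\sqrt{e^{\rho} - 1}/\rho$ over ρ. The sketch below uses a simple ternary search, which assumes the factor has a single minimum on the search interval; the function names and the interval [0.1, 5] are illustrative choices.

```python
import math


def virtual_sd_factor(rho: float) -> float:
    """sqrt(b) times the relative standard deviation of the virtual bitmap."""
    return math.sqrt(math.exp(rho) - 1) / rho


def find_optimal_density(lo: float = 0.1, hi: float = 5.0, iters: int = 200) -> float:
    for _ in range(iters):  # ternary search for the minimum
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if virtual_sd_factor(m1) < virtual_sd_factor(m2):
            hi = m2
        else:
            lo = m1
    return (lo + hi) / 2


rho_opt = find_optimal_density()
print(round(rho_opt, 3), round(virtual_sd_factor(rho_opt), 3), round(math.exp(-rho_opt), 3))
```

The three printed values correspond to the optimal density, the constant in $SD \approx 1.243/\sqrt{b}$, and the expected fraction of zero bits z/b at the optimum.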
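Parameters: b, setmax, c, k
$b = f(k)/\varepsilon^2$
$setmax = b\,(1 - e^{-\rho_{max}})$
$c = 2 + \log_k(N/(\rho_{max}\,b))$ (N is the maximum number of flows we want to measure)
f(k)/ln(k) is an indicator of memory usage
[Figure: $SD[\hat{n}/n]$ for the multiresolution bitmap as a function of k, b, and $\rho$]
$E[z] = b\,p_n \approx b\,(1/e)^{n/b}$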
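Given b, k, and a maximum flow density, the remaining two parameters follow from the expressions $setmax = b(1 - e^{-\rho_{max}})$ and $c = 2 + \log_k(N/(\rho_{max} b))$. The sketch below rounds setmax down and the logarithm up so that both are integers, which is an assumption of this example, as are the function name and the input values.

```python
import math


def multires_parameters(b: int, k: int, rho_max: float, N: int):
    """Return (setmax, c) for a multiresolution bitmap with b-bit components."""
    setmax = math.floor(b * (1 - math.exp(-rho_max)))
    # Rounding the logarithm up is an assumption: c must be a whole number.
    c = 2 + math.ceil(math.log(N / (rho_max * b), k))
    return setmax, c


# Hypothetical configuration: 256-bit components, k = 2, up to one million flows.
print(multires_parameters(b=256, k=2, rho_max=1.6, N=10**6))  # (204, 14)
```

The number of components grows only logarithmically with N, which is why the multiresolution bitmap covers a wide range with little memory.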
Data: packet traces (IP headers captured on a link)
Measurement interval: 5 s
Flow definition: 5-tuple of source and destination IP addresses, ports, and protocol
Low density: sampling error
High density: collision error
Comparison
A problem-specific counting method for a specific problem like threshold detection can significantly outperform a one-size-fits-all technique like probabilistic counting.
Configured for average error of 10%
Configured for average error of 3%, 1%
Comparison
Adaptive bitmap can achieve almost the same benefits of virtual bitmap when the number of flows does not vary dramatically.
Overestimating
Three times more accurate
Comparison
Our algorithm reported 84.6% of the sources with four connections, 98.1% of those with five, and all (100%) of the sources that had at least eight connections
Five times less memory
Trade-off between significantly less memory and possibly missing some port scanners.
However, the probability of a port scanner not being detected decreases exponentially with the number of connections it opens. For example, the probability is 1.87% at five connections, 0.23% at six, 0.03% at seven, and so on.
Port scans frequently touch not just a handful of addresses, but an entire block of contiguous addresses
Our algorithms reduce the memory usage by as much as an order of magnitude:
• Count more sources at a time
• Detect stealthy slow scans: counting sources with longer inter-arrival times
Solve the flow counting problem using extremely small amounts of memory while achieving satisfactory accuracy
Customizable counting algorithm for applications :