My Experiences, Results and Remarks on TCP BW and CT Measurement Tools


Jiří Navrátil

SLAC

My interests concern:

• How to achieve full BW utilization
• Analysis of tools for BW estimation
• New methods of CT detection

• Inspiration:
– Les Cottrell (SLAC): IEPM metrics
– Jin Guojun (LBNL): netest-2 recommendation
– Feng Wu-Chun (LANL): simulations
– Dina Katabi, Charles Blake (MIT): bottleneck detection

• Area of saturation = area of full utilization
– TCP window size (not sufficient for HSL, i.e. high-speed links)
– parallel streams
– a combination of both (see the window-sizing sketch below)
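
Sizing the window comes down to the bandwidth-delay product: the window must cover capacity x RTT, and when the OS caps the window the rest has to come from parallel streams. A minimal Python sketch of that arithmetic, using assumed path capacity and RTT values (only the 1024K window matches the runs shown below; everything else is illustrative):

```python
import math

def bdp_bytes(capacity_bps: float, rtt_s: float) -> float:
    """Bandwidth-delay product: bytes that must be in flight to fill the pipe."""
    return capacity_bps * rtt_s / 8

def streams_needed(capacity_bps: float, rtt_s: float, window_bytes: int) -> int:
    """Parallel streams required when one TCP window cannot cover the BDP."""
    return max(1, math.ceil(bdp_bytes(capacity_bps, rtt_s) / window_bytes))

if __name__ == "__main__":
    capacity = 622e6        # assumed path capacity in bit/s (illustrative)
    rtt = 0.160             # assumed round-trip time in seconds (illustrative)
    window = 1024 * 1024    # the 1024K window used in the measurements
    print(f"BDP: {bdp_bytes(capacity, rtt) / 1e6:.1f} MBytes")
    print(f"streams needed with a 1024K window: {streams_needed(capacity, rtt, window)}")
    # The corresponding iperf run would look like, e.g.:
    #   iperf -c <remote-host> -w 1024K -P <streams> -t 30
```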

[Chart] Parallel TCP via Iperf (SLAC to LABS): aggregated speed [Mbps] vs. number of parallel TCP streams (1-128, WS = 1024K), one curve per destination: ARGONNE, CALTECH, LANL, RAL (UK), CERN (CH), IN2P3 (FR), INFN (IT).

[Chart] Parallel TCP via Iperf (SLAC-LANL): total speed [Mbps] vs. number of parallel streams (2-128), three series; the saturation point is marked.

[Chart] Parallel TCP - Bandwidth allocation (1): total speed as the SUM of partial stream speeds [Mbps] vs. number of parallel TCP streams (SLAC-IN2P3.fr).

Underlying per-stream data (stream 0 = aggregate; per stream: MBytes, Mbits):

2 streams: 1: 26.4, 22; 2: 26.3, 22; 0: 52.7, 44 (median 22, average 22, sdev 0)
4 streams: 1: 28.3, 23.6; 2: 28.3, 23.6; 3: 28.4, 23.7; 4: 28.4, 23.7; 0: 113.4, 94.6 (median 23.65, average 23.65, sdev 0.057735)
8 streams: 1: 28.3, 23.6; 2: 28.6, 23.9; 3: 27.7, 23.2; 4: 28.6, 23.9; 5: 28.3, 23.7; 6: 28, 23.4; 7: 26.8, 22.4; 8: 28.4, 23.7; 0: 224.7, 188 (median 23.65, average 23.475, sdev 0.494975)

[Chart] Parallel TCP via Iperf (SLAC-LANL): stream speed [Mbps] vs. number of parallel streams (2-128), three series; the inflexion point is marked.

[Chart] Parallel TCP via Iperf (SLAC-LANL): total speed [Mbps] vs. number of parallel streams (2-128), three series.

[Chart] Parallel TCP via Iperf (SLAC-LANL): stream speed [Mbps] vs. number of parallel streams (2-128), three series.

[Chart] Parallel TCP via Iperf (SLAC-CALTECH): total speed [Mbps] vs. number of parallel streams (2-128), three series.

[Chart] Parallel TCP via Iperf (SLAC-CALTECH): stream speed [Mbps] vs. number of parallel streams (2-96), three series.

Why such a distribution of throughput for individual streams?

[Diagram] The Internet path modelled as a virtual queue (In -> Out).

Problems of slow and heavily loaded lines:

• All parallel streams share the same "virtual queue".
• All "my traffic" shares that queue with outside CT:
– visible in the per-stream speed statistics,
– visible in the time (interval) reports in iperf (see the per-stream statistics sketch below).
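
The per-stream inequity becomes visible once the per-stream summaries are reduced to median / average / sdev, exactly as in the bandwidth-allocation table above. A small Python sketch of that reduction; the sample speeds are the 8-stream SLAC-IN2P3 values from that table, but any list of per-stream throughputs can be fed in:

```python
from statistics import mean, median, stdev

def summarise(per_stream_mbps):
    """Reduce per-stream iperf throughputs (Mbps) to the table's summary columns."""
    total = sum(per_stream_mbps)
    return {
        "streams": len(per_stream_mbps),
        "total_mbps": round(total, 1),
        "median": round(median(per_stream_mbps), 3),
        "average": round(mean(per_stream_mbps), 3),
        "sdev": round(stdev(per_stream_mbps), 3) if len(per_stream_mbps) > 1 else 0.0,
    }

if __name__ == "__main__":
    run_8 = [23.6, 23.9, 23.2, 23.9, 23.7, 23.4, 22.4, 23.7]  # Mbps per stream
    print(summarise(run_8))
    # A sdev that grows relative to the mean suggests the streams are competing
    # with outside cross traffic in the shared virtual queue.
```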

[Chart] Bandwidth allocation (SLAC-IN2P3): Mbps per stream (max, min, mean) vs. number of parallel streams (2-128).

[Chart] Aggregated speed (SLAC-IN2P3): Mbps vs. number of parallel streams, four experiments (ex-1 to ex-4).

[Chart] Parallel TCP bandwidth (iperf from SLAC to CERN): total speed [Mbps] vs. number of parallel streams, four samples.

[Chart] Parallel TCP bandwidth (iperf from SLAC to CERN): stream speed [Mbps] vs. number of parallel streams, five samples.

Why can't I achieve more bandwidth with fewer streams?

Is there CT? I don't know!

[Chart] BW allocation, SLAC-CERN (32 streams): Mbps per stream across the 32 streams.

[Chart] BW allocation (SLAC-CERN): Mbps per stream for runs with N = 16, 24, 32 and 64 streams.

[Chart] Parallel TCP via Iperf (32 streams, SLAC-IN2P3): stream speed [Mbps], four samples.

[Chart] BW allocation (SLAC-LANL): Mbps per stream for runs with N = 16, 24, 32, 64 and 128 streams.

Conclusion:

Other TCP applications behave very similarly in the same environment, but not all applications are as aggressive as Iperf!

Tools used for BW estimation

• Pathrate

• Pathload

• Iperf

• Netest-2

• Incite BWe

• UDPmon

Short characteristics of methods

Pathrate
+ accurate
– not very reliable, long run time

Pathload
+ accurate, fast, light
– limited range of operation (< 155 Mbps)

[Chart] Small comparisons: bandwidth estimates [Mbps] for several test cases (vidconf-1/2/3, C-100-1/2/3, C1000-1/2) across Iperf-max, Pload-min, Pload-max, Incite-BWe and PR-ADR.

Short characteristics of methods

Iperf
+ reliable
– must be configured to use the full BW, and the output must be post-processed (see the parsing sketch below)
– heavy load on the lines during operation

Netest-2
+ reliable, good reports
– must be configured to use the full BW
– accurate (different timing scheme)
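
"Post-processed" here means pulling the per-stream and aggregate throughputs back out of iperf's text report. A Python sketch of such a parser, assuming the classic iperf summary-line format (e.g. "[  3]  0.0-30.0 sec  26.4 MBytes  22.1 Mbits/sec"); other iperf versions or unit scalings would need an adjusted pattern. It reads a report from stdin, so a hypothetical invocation is "iperf -c <host> -w 1024K -P 32 | python parse_iperf.py".

```python
import re
import sys

# Matches per-stream and [SUM] summary lines in the classic iperf report format.
LINE = re.compile(
    r"\[\s*(SUM|\d+)\]\s+[\d.]+-\s*[\d.]+\s+sec\s+"
    r"([\d.]+)\s+MBytes\s+([\d.]+)\s+Mbits/sec")

def parse_report(text):
    streams, total = {}, None
    for m in LINE.finditer(text):
        ident, mbps = m.group(1), float(m.group(3))
        if ident == "SUM":
            total = mbps
        else:
            streams[int(ident)] = mbps       # later lines overwrite interval reports
    if total is None:                        # no [SUM] line (single stream)
        total = sum(streams.values())
    return streams, total

if __name__ == "__main__":
    streams, total = parse_report(sys.stdin.read())
    for ident, mbps in sorted(streams.items()):
        print(f"stream {ident:3d}: {mbps:6.1f} Mbps")
    print(f"aggregate : {total:6.1f} Mbps")
```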

[Chart] Iperf vs. Netest-2, SLAC-CERN (pcgiga): aggregated speed [Mbps] vs. number of parallel streams (2-64).

[Chart] Iperf vs. Netest-2, SLAC-Caltech: aggregated speed [Mbps] vs. number of parallel streams, for Iperf, Netest-2 (April) and Netest-2 (May).

[Chart] Iperf vs. Netest-2 (SLAC-NERSC): aggregated speed [Mbps] vs. number of parallel streams (2-64).

[Chart] Netest-2 parallel streams: Mbps per stream vs. number of parallel streams (1-61), six series.

INCITE: Edge-based Traffic Processing and Service Inference for High-Performance Networks Richard Baraniuk, Rice University; Les Cottrell, SLAC; Wu-chun Feng, LANL

[Diagram] Packet-pair timing: probe pairs P1/P2 are sent with spacing dTS (20 ms); the per-packet delays dtp1 and dtp2 and the return spacing dTR lie within the RTT.

BWe ~ f(VQ) = f(dTR), with dTR in <0, RTT>
freq = volume / dt
volume ~ pkt_length * 8
dt ~ F(VQ) ~ F(Internet)
BWe = Mean(freq)

Open problem: what is pkt_length?

• All "my test traffic" shares the same queue with outside CT (which means the delay the VQ adds to my packets does not depend only on my own packet lengths!).
• The accuracy therefore depends on knowledge of the packet-length distribution on the particular path.
• An average packet length of ~1000 bytes gives reasonable results (see the sketch below).
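
Taken literally, the formulas above turn every observed gap dTR into an instantaneous rate freq = volume/dt, with volume ~ pkt_length * 8, and BWe is the mean of those rates. A Python sketch of that literal reading; the ~1000-byte packet length is the value recommended above, while the dTR samples are invented for illustration:

```python
from statistics import mean

def bwe_estimate(dtr_samples_s, pkt_length_bytes=1000):
    """BWe = Mean(freq), freq = volume / dt, volume ~ pkt_length * 8 (bits)."""
    volume_bits = pkt_length_bytes * 8
    freqs = [volume_bits / dtr for dtr in dtr_samples_s if dtr > 0]
    return mean(freqs) if freqs else 0.0

if __name__ == "__main__":
    # hypothetical dTR samples (seconds) observed between probe-pair replies
    gaps = [0.00021, 0.00034, 0.00019, 0.00045, 0.00028]
    print(f"BWe ~ {bwe_estimate(gaps) / 1e6:.1f} Mbps")
    # With ~1000-byte packets, a 0.2-0.5 ms gap maps to 16-40 Mbps, so the
    # estimate is only as good as the assumed packet-length distribution.
```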

[Charts] Virtual-queue CT monitoring for the ANL and CERN paths: current/average values with filtering & averaging ("Good Morning !" annotation).

[Charts] Further examples: KPNQwest, UK problems, load balancing on a one-path router.

CT = f(dTR)

INCITE: Edge-based Traffic Processing and Service Inference for High-Performance Networks Richard Baraniuk, Rice University; Les Cottrell, SLAC; Wu-chun Feng, LANL

MF-CT features and benefits

• No need for access to routers!
– Current traffic-load monitoring systems are based on SNMP or flows (which need access to routers).
• Low cost:
– allows permanent monitoring (20 pkts/sec ~ overhead of 10 KBytes/sec),
– can be used as a data provider for ABW prediction (ABW = BW - CT); see the sketch below.
• We have 2 data points of CT per second!!
• Weak point for common use
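
The quantitative points in this list reduce to simple arithmetic: ABW = BW - CT, and the probing overhead is the probe rate times the probe size. A minimal Python sketch with illustrative inputs; the ~500-byte probe size is only backed out of the quoted "20 pkts/sec ~ 10 KBytes/sec" figure, it is not stated in the slides:

```python
def available_bw(bw_mbps: float, ct_mbps: float) -> float:
    """ABW = BW - CT, floored at zero."""
    return max(0.0, bw_mbps - ct_mbps)

def probe_overhead_kbytes_per_s(pkts_per_s: float, pkt_size_bytes: float) -> float:
    """Overhead of continuous probing in KBytes/s."""
    return pkts_per_s * pkt_size_bytes / 1000.0

if __name__ == "__main__":
    # e.g. 155 Mbps capacity with 60 Mbps cross traffic leaves 95 Mbps available
    print(available_bw(155.0, 60.0))
    # 20 probe packets/s at an assumed ~500 bytes each ~ 10 KBytes/s overhead
    print(probe_overhead_kbytes_per_s(20, 500))
```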

[Diagram] MF-CT Simulator setup: MATLAB code, SNMP counters and UDP echo endpoints on the SLAC, IN2P3 and CERN paths ("Recommended" configuration).