1

Rules of Thumb in Data Engineering

Jim Gray
International Conference on Data Engineering, San Diego, CA, 4 March 2000
Gray@Microsoft.com, http://research.Microsoft.com/~Gray/Talks/


2

Credits & Thank You!!

Prashant Shenoy, U. Mass Amherst: analysis of the web caching rules. [email protected]

Terrance Kelly, U. Michigan: lots of advice on fixing the paper. [email protected]
Interesting work on caching at: http://ai.eecs.umich.edu/~tpkelly/papers/wcp.pdf

Dave Lomet, Paul Larson, Surajit Chaudhuri: how big should database pages be?

Remzi Arpaci-Dusseau, Kim Keeton, Erik Riedel: discussions about balanced systems and IO.

Windsor Hsu, Alan Smith, & Honesty Young: also studied TPC-C and balanced systems (very nice work!) http://golem.cs.berkeley.edu/~windsorh/DBChar/

Anastassia Ailamaki, Kim Keeton: CPI measurements.

Gordon Bell: discussions on balanced systems.


3

Woops!

… and an Apology: the printed/published paper has MANY bugs! The conclusions are OK (sort of), but typos, flaws, errors,…

Revised version at http://research.microsoft.com/~Gray/ and in CoRR and the MS Research tech report archive, by 15 March 2000.

Sorry!


4

Outline

Moore’s Law and consequences

Storage rules of thumb

Balanced systems rules revisited

Networking rules of thumb

Caching rules of thumb


5

Trends: Moore’s Law

Performance/price doubles every 18 months: 100x per decade.

Progress in the next 18 months = ALL previous progress:
new storage = sum of all old storage (ever);
new processing = sum of all old processing.

E. coli doubles every 20 minutes!


6

Trends: ops/s/$ Had Three Growth Phases

1890-1945: Mechanical, then Relay: 7-year doubling
1945-1985: Tube, transistor,…: 2.3-year doubling
1985-2000: Microprocessor: 1.0-year doubling

[Chart: ops per second per $, 1880-2000, log scale from 1.E-06 to 1.E+09, showing the three doubling regimes (7.5, 2.3, and 1.0 years).]


7

Trends: Gilder’s Law: 3x bandwidth/year for 25 more years

Today: 10 Gbps per channel; 4 channels per fiber: 40 Gbps; 32 fibers/bundle = 1.2 Tbps/bundle.
In the lab: 3 Tbps/fiber (400x WDM). In theory: 25 Tbps per fiber (1 fiber = 25 Tbps).
1 Tbps = USA 1996 WAN bisection bandwidth.
Aggregate bandwidth doubles every 8 months!


8

Trends: Magnetic Storage Densities

Amazing progress, but the ratios have changed:
capacity grows 60%/y; access speed grows 10x more slowly.

[Chart: magnetic disk parameters vs time, 1984-2004, log scale 0.01 to 1,000,000: tpi, kbpi, MBps, Gbpsi.]


9

Trends: Density Limits

The end is near!
Products: 11 Gbpsi
Lab: 35 Gbpsi
“Limit”: 60 Gbpsi
But the limit keeps rising, & there are alternatives.

[Chart: bit density vs time (b/µm² & Gb/in²), 1990-2008, from 0.6 to 3,000 Gb/in², showing CD, DVD, ODD, the wavelength limit, the superparamagnetic limit, and possible successors: NEMS, fluorescent storage, holographic storage, DNA?]

Figure adapted from Franco Vitaliano, “The NEW new media: the growing attraction of nonmagnetic storage”, Data Storage, Feb 2000, pp 21-32, www.datastorage.com


10

Trends: promises. NEMS (Nano Electro Mechanical Systems)
(http://www.nanochip.com/), also Cornell, IBM, CMU,…

• 250 Gbpsi by using a tunneling electron microscope
• Disk replacement
• Capacity: 180 GB now, 1.4 TB in 2 years
• Transfer rate: 100 MB/sec read & write
• Latency: 0.5 msec
• Power: 23W active, 0.05W standby
• 10k$/TB now, 2k$/TB in 2002


11

Consequence of Moore’s Law: need an address bit every 18 months.

Moore’s law gives you 2x more capacity in 18 months, so one more address bit per 18 months.

RAM: today we have 10 MB to 100 GB machines (24-36 bits of addressing). In 9 years we will need 6 more bits: 30-42 bit addressing (4 TB RAM).

Disks: today we have 10 GB to 100 TB file systems/DBs (33-47 bit file addresses). In 9 years, we will need 6 more bits: 40-53 bit file addresses (100 PB files).
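To make the arithmetic concrete, here is a minimal Python sketch of the address-bit rule; the machine size is an illustrative assumption, not the slide's exact endpoint:

    import math

    def address_bits(nbytes: int) -> int:
        """Bits needed to address this many bytes."""
        return math.ceil(math.log2(nbytes))

    # Moore's law: capacity doubles every 18 months => +1 address bit / 18 months.
    today = 64 * 2**30                  # a 64 GB machine: 36-bit addresses
    extra_bits = 9 * 12 // 18           # 9 years => 6 more bits
    future = today * 2**extra_bits

    print(address_bits(today))          # 36
    print(address_bits(future))         # 42  (4 TB of RAM)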


12

Architecture could change this

1-level store: System/38 and AS/400 have a 1-level store. It never re-uses an address, so it needs 96-bit addressing today.

NUMAs and clusters: willing to buy a 100 M$ computer? Then add 6 more address bits.

Only 1-level store pushes us beyond 64 bits. Still, these are “logical” addresses; 64-bit physical will last many years.


13

Outline

Moore’s Law and consequences

Storage rules of thumb

Balanced systems rules revisited

Networking rules of thumb

Caching rules of thumb


14

Storage Latency: How Far Away is the Data?

Registers: 1 clock = My Head (1 min)
On-chip cache: 2 clocks = This Room (2 min)
On-board cache: 10 clocks = This Hotel (10 min)
Memory: 100 clocks = Olympia (1.5 hr)
Disk: 10^6 clocks = Pluto (2 years)
Tape/Optical Robot: 10^9 clocks = Andromeda (2,000 years)


15

Storage Hierarchy: Speed & Capacity vs Cost Tradeoffs

[Two charts. Size vs Speed: typical system size (bytes, 10^3 to 10^15) vs access time (10^-9 to 10^3 seconds) for cache, main memory, secondary storage (disc), and online/nearline/offline tape. Price vs Speed: $/MB (10^-6 to 10^2) vs access time for the same levels.]


16

Disks: Today

Disk is 8 GB to 80 GB; 10-30 MBps; 5k-15k rpm (6 ms to 2 ms rotational latency); 12 ms to 7 ms seek.
7k$/IDE-TB, 20k$/SCSI-TB.

For shared disks, most time is spent waiting in a queue for access to the arm/controller (request timeline: Wait, then Seek, Rotate, Transfer).


17

Standard Storage Metrics

Capacity:
RAM: MB and $/MB; today at 512 MB and 3$/MB.
Disk: GB and $/GB; today at 40 GB and 20$/GB.
Tape: TB and $/TB; today at 40 GB and 10k$/TB (nearline).

Access time (latency):
RAM: 100 ns.
Disk: 15 ms.
Tape: 30 second pick, 30 second position.

Transfer rate:
RAM: 1-10 GB/s.
Disk: 20-30 MB/s (arrays can go to 10 GB/s).
Tape: 5-15 MB/s (arrays can go to 1 GB/s).


18

New Storage Metrics: Kaps, Maps, SCAN

Kaps: how many kilobyte objects served per second. The file server / transaction processing metric. This is the OLD metric.

Maps: how many megabyte objects served per second. The multi-media metric.

SCAN: how long to scan all the data. The data mining and utility metric.

And the price-normalized forms: Kaps/$, Maps/$, TBscan/$.
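As a rough illustration of how these metrics fall out of a drive's characteristics, a minimal Python sketch follows; the seek, rotate, and transfer numbers are assumptions in the ballpark of the 40 GB, 20 MBps disk quoted above, not measurements from the talk:

    # Kaps, Maps, and SCAN for one (assumed) disk.
    capacity_bytes = 40e9
    seek_s, rotate_s = 0.007, 0.003       # average seek + rotational latency
    transfer_Bps = 20e6

    def access_time(object_bytes: float) -> float:
        """One random object fetch: seek + rotate + transfer."""
        return seek_s + rotate_s + object_bytes / transfer_Bps

    kaps = 1 / access_time(1e3)           # ~100 KB objects served per second
    maps = 1 / access_time(1e6)           # ~17 MB objects served per second
    scan_min = capacity_bytes / transfer_Bps / 60   # ~33 minutes, sequential

    print(f"Kaps ~ {kaps:.0f}, Maps ~ {maps:.0f}, SCAN ~ {scan_min:.0f} min")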


22

Storage Ratios Changed

10x better access time
10x more bandwidth
100x more capacity
Data 25x cooler (1 Kaps/20 MB vs 1 Kaps/500 MB)
4,000x lower media price
20x to 100x lower disk price
Scan takes 10x longer (3 min vs 45 min)

[Charts: disk performance vs time, 1980-2000 (seeks per second and bandwidth in MB/s, 1 to 100; capacity in GB, 0.1 to 10); disk accesses per second vs time (1 to 100); storage price vs time (megabytes per kilo-dollar, 0.1 to 10,000).]

DRAM/disk media price ratio changed:
1970-1990: 100:1
1990-1995: 10:1
1995-1997: 50:1
today: ~100:1 (0.03$/MB disk vs 3$/MB DRAM)


23

Data on Disk Can Move to RAM in 10 years

[Chart: storage price vs time, megabytes per kilo-dollar, 1980-2000, annotated with the 100:1 DRAM:disk gap and the 10-year lag it implies.]


24

Kaps over time

[Chart: Kaps/disk (10 to 1,000) and Kaps/$ (1.E+0 to 1.E+6), 1970-2000.]

More Kaps and Kaps/$, but…
Disk accesses got much less expensive: better disks, cheaper disks!
But disk arms are expensive, the scarce resource:
a scan now takes 45 minutes (100 GB at 30 MB/s) vs 5 minutes in 1990.


25

Disk vs Tape

Disk: 40 GB; 20 MBps; 5 ms seek time; 3 ms rotate latency; 7$/GB for drive, 3$/GB for ctlrs/cabinet; 4 TB/rack; 1 hour scan.

Tape: 40 GB; 10 MBps; 10 sec pick time; 30-120 second seek time; 2$/GB for media, 8$/GB for drive+library; 10 TB/rack; 1 week scan.

The price advantage of tape is narrowing, and the performance advantage of disk is growing. At 10k$/TB, disk is competitive with nearline tape.

Guesstimates for CERN: 200 TB on 3480 tapes; 2 col = 50 GB; rack = 1 TB = 20 drives.


27

It’s Hard to Archive a Petabyte

It takes a LONG time to restore it: at 1 GBps it takes 12 days!

So store it in two (or more) places online (on disk?): a geo-plex.
Scrub it continuously (look for errors).
On failure, use the other copy until the failure is repaired, then refresh the lost copy from the safe copy.
The two copies can be organized differently (e.g.: one by time, one by space).
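The 12-day figure is just capacity divided by bandwidth; a one-line check:

    petabyte, rate = 1e15, 1e9            # bytes, and a 1 GBps restore stream
    print(petabyte / rate / 86400)        # ~11.6 days: the slide's "12 days"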


28

The “Absurd” 10x (= 5 year) Disk

1 TB, 100 MB/s, 200 Kaps.
2.5 hr scan time (poor sequential access).
1 aps per 5 GB (VERY cold data).
It’s a tape!


29

How to cool disk data:

Cache data in main memory: see the 5 minute rule later in this presentation.

Fewer, larger transfers: larger pages (512B -> 8KB -> 256KB).

Sequential rather than random access: random 8KB IO is 1.5 MBps, sequential IO is 30 MBps (and the 20:1 ratio is growing).

RAID1 (mirroring) rather than RAID5 (parity).


30

Stripes, Mirrors, Parity (RAID 0, 1, 5)

RAID 0: stripes. Bandwidth. Block layout: 0,3,6,.. | 1,4,7,.. | 2,5,8,..

RAID 1: mirrors, shadows,… Fault tolerance; reads faster, writes 2x slower. Layout: 0,1,2,.. | 0,1,2,..

RAID 5: parity. Fault tolerance; reads faster, writes 4x or 6x slower. Layout: 0,2,P2,.. | 1,P1,4,.. | P0,3,5,..


31

RAID 10 (stripes of mirrors) Wins
“Wastes space, saves arms”

RAID 5 (6 disks, 1 volume):
Performance: 675 reads/sec, 210 writes/sec.
Write = 4 logical IOs, 2 seeks + 1.7 rotates.
SAVES SPACE. Performance degrades on failure.

RAID 1 (6 disks, 3 pairs):
Performance: 750 reads/sec, 300 writes/sec.
Write = 2 logical IOs, 2 seeks, 0.7 rotate.
SAVES ARMS. Performance improves on failure.
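A back-of-envelope sketch of where those write rates come from. The 100 IOs/sec-per-arm figure is an assumption, and the flat 4-IO accounting slightly understates RAID 5 (the data and parity reads pair with their writes, sharing seeks), which is roughly why the slide measures 210 rather than 150 writes/sec:

    # Arm budget for a 6-disk array (assumed 100 random IOs/sec per arm).
    ARMS, IOS_PER_ARM = 6, 100
    total_ios = ARMS * IOS_PER_ARM        # 600 physical IOs/sec in the array

    raid5_writes = total_ios / 4          # write = read data + read parity
                                          #         + write data + write parity
    raid1_writes = total_ios / 2          # write = write both mirrors

    print(raid5_writes)                   # 150  (slide measures 210)
    print(raid1_writes)                   # 300  (matches the slide)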


33

Auto Manage Storage

1980 rule of thumb: a DataAdmin per 10 GB; a SysAdmin per MIPS.

2000 rule of thumb: a DataAdmin per 5 TB; a SysAdmin per 100 clones (varies with app).

Problem: 5 TB costs 60k$ today, 10k$ in a few years, so admin cost >> storage cost!!!!

Challenge: automate ALL storage admin tasks.


34

Summarizing storage rules of thumb (1)

Moore’s law: 4x every 3 years, 100x more per decade.
Implies 2 bits of addressing every 3 years.
Storage capacities increase 100x/decade.
Storage costs drop 100x per decade.
Storage throughput increases 10x/decade.
Data cools 10x/decade.
Disk page sizes increase 5x per decade.


35

Summarizing storage rules of thumb (2)

RAM:Disk and Disk:Tape cost ratios are 100:1 and 3:1.
So, in 10 years, disk data can move to RAM, since prices decline 100x per decade.
A person can administer a million dollars of disk storage: that is 1 TB to 100 TB today.
Disks are replacing tapes as backup devices.
You can’t backup/restore a petabyte quickly, so geoplex it.
Mirror rather than use parity, to save disk arms.


36

Outline

Moore’s Law and consequences

Storage rules of thumb

Balanced systems rules revisited

Networking rules of thumb

Caching rules of thumb


37

Standard Architecture (today)

[Diagram: CPUs and memory on the system bus, bridging to PCI Bus 1 and PCI Bus 2 for IO.]


38

Amdahl’s Balance Laws

Parallelism law: if a computation has a serial part S and a parallel component P, then the maximum speedup is (S+P)/S.

Balanced system law: a system needs a bit of IO per second for each instruction per second: about 8 MIPS per MBps of IO.

Memory law: α = 1; the MB/MIPS ratio (called alpha) of a balanced system is 1.

IO law: programs do one IO per 50,000 instructions.
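A small Python sketch of the first two laws, with illustrative numbers:

    def amdahl_speedup(serial: float, parallel: float) -> float:
        """Maximum speedup with unlimited processors: (S+P)/S."""
        return (serial + parallel) / serial

    # 10% serial work caps speedup at 10x, no matter how many CPUs:
    print(amdahl_speedup(0.1, 0.9))       # 10.0

    # Balanced system law: ~8 MIPS of CPU per MBps of IO.
    mips = 500                            # a 500 MIPS processor...
    print(mips / 8)                       # ...wants ~62 MBps of IO bandwidth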


39

Amdahl’s Laws Valid 35 Years Later?

The parallelism law is algebra: so SURE!

The balanced system laws? Look at TPC results (TPC-C, TPC-H) at http://www.tpc.org/

Some imagination is needed:
What’s an instruction? CPI (clocks per instruction) varies from 1 to 3 across RISC, CISC, VLIW, …
What’s an I/O?


40

TPC systems, normalized for CPI (clocks per instruction):
TPC-C has about 7 ins/byte of IO; TPC-H has 3 ins/byte of IO.
TPC-H needs ½ as many disks per cpu (22 vs 50): sequential vs random access.
Both use 9 GB 10 krpm disks (they need arms, not bytes).

                    MHz/cpu  CPI  mips  KB/IO  IO/s/disk  Disks  MB/s/cpu  Ins/IO Byte
Amdahl              1        1    1     6      -          -      -         8
TPC-C (random)      550      2.1  262   8      100        397    40        7
TPC-H (sequential)  550      1.2  458   64     100        176    141       3


41

TPC systems: What’s alpha (= MB/MIPS)?

Hard to say:
Intel: 32-bit addressing (= 4 GB limit), known CPI.
IBM, HP, Sun: 64 GB limit, unknown CPI.
Look at both, and guess the CPI for IBM, HP, Sun.
Alpha is between 1 and 6:

              Mips                 Memory  Alpha
Amdahl        1                    1       1
tpcC Intel    8 x 262 = 2 Gips     4 GB    2
tpcH Intel    8 x 458 = 4 Gips     4 GB    1
tpcC IBM      24 cpus ?= 12 Gips   64 GB   6
tpcH HP       32 cpus ?= 16 Gips   32 GB   2


43

Amdahl’s Balance Laws Revised

Laws right, just need “interpretation” (imagination?).

Balanced system law: a system needs 8 MIPS/MBps of IO, but the instruction rate must be measured on the workload: sequential workloads have low CPI (clocks per instruction); random workloads tend to have higher CPI.

Alpha (the MB/MIPS ratio) is rising from 1 to 6. This trend will likely continue.

One random IO per 50k instructions. Sequential IOs are larger, so one sequential IO per 200k instructions.


44

PAP vs RAP: Peak Advertised Performance vs Real Application Performance

[Diagram: data flows from disks through SCSI, PCI, and the system bus to the file system and application; each level has a peak and a real rate:]
CPU: 550 MHz x 4 = 2 Bips advertised; at 1-3 cpi, 170-550 mips real
System bus: 1600 MBps advertised, 500 MBps real
PCI: 133 MBps advertised, 90 MBps real
SCSI: 160 MBps advertised, 90 MBps real
Disks: 66 MBps advertised, 25 MBps real


45

Outline

Moore’s Law and consequences

Storage rules of thumb

Balanced systems rules revisited

Networking rules of thumb

Caching rules of thumb


47

Ubiquitous 10 GBps SANs in 5 years

1 Gbps Ethernet is reality now; so are FiberChannel, MyriNet, GigaNet, ServerNet, ATM,…
10 Gbps x4 WDM deployed now (OC192); 3 Tbps WDM working in the lab.
In 5 years, expect 10x. Wow!!

[Diagram: SAN links from 5 MBps, 20 MBps, 40 MBps, 80 MBps up to 120 MBps (1 Gbps) and 1 GBps.]


48

Networking

WANs are getting faster than LANs.
G8 = OC192 = 8 Gbps is “standard”.
Link bandwidth improves 4x per 3 years.
Speed of light is fixed: 60 ms round trip in the US.
Software stacks have always been the problem:

Time = SenderCPU + ReceiverCPU + bytes/bandwidth

The two CPU terms have been the problem.
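A sketch of that cost model; the per-side CPU costs are made-up round numbers for illustration:

    def send_time_s(nbytes, bandwidth_Bps, sender_cpu_s, receiver_cpu_s):
        """Time = SenderCPU + ReceiverCPU + bytes/bandwidth."""
        return sender_cpu_s + receiver_cpu_s + nbytes / bandwidth_Bps

    # 1 KB over 100 Mbps Ethernet with ~100 us of CPU on each side:
    # software (200 us) dominates the wire time (~80 us).
    print(send_time_s(1024, 12.5e6, 100e-6, 100e-6))   # ~2.8e-4 s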


49

The Promise of SAN/VIA: 10x in 2 years. http://www.ViArch.org/

[Chart: time (µs) to send 1 KB, 0-250 µs, split into sender CPU, receiver CPU, and transmit, for 100 Mbps Ethernet vs a Gbps SAN.]

Yesterday: 10 MBps (100 Mbps Ethernet); ~20 MBps tcp/ip saturates 2 cpus; round-trip latency ~250 µs.

Now: wires are 10x faster (Myrinet, Gbps Ethernet, ServerNet,…), and fast user-level communication gives tcp/ip ~100 MBps at 10% cpu, round-trip latency of 15 µs, and 1.6 Gbps demoed on a WAN.


50

How much does wire-time cost? ($/MB)

                   Cost/MB   Time/MB
Gbps Ethernet      0.2 µ$    10 ms
100 Mbps Ethernet  0.3 µ$    100 ms
OC12 (650 Mbps)    0.003 $   20 ms
DSL                0.0006 $  25 sec
POTS               0.002 $   200 sec
Wireless           0.80 $    500 sec

Derivation: $/MB = (seat cost over 3 years) / (bandwidth x seconds in 3 years), with 94,608,000 seconds in 3 years.

            Seat cost $/3y   Bandwidth B/s   $/MB     Time/MB (s)
GBpsE       2,000            1.00E+08        2.E-07   0.010
100MbpsE    700              1.00E+07        7.E-07   0.100
OC12        12,960,000       5.00E+07        3.E-03   0.020
OC3         3,132,000        3.00E+06        1.E-02   0.333
T1          28,800           1.00E+05        3.E-03   10.000
DSL         2,300            4.00E+04        6.E-04   25.000
POTS        1,180            5.00E+03        2.E-03   200.000
Wireless    ?                2.00E+03        8.E-01   500.000
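A sketch of the table's arithmetic, amortizing each seat's 3-year cost over the bytes its link could carry:

    SECONDS_3Y = 94_608_000

    def dollars_per_MB(seat_cost_3y: float, bandwidth_Bps: float) -> float:
        """Amortize the 3-year seat cost over all deliverable megabytes."""
        total_MB = bandwidth_Bps * SECONDS_3Y / 1e6
        return seat_cost_3y / total_MB

    print(dollars_per_MB(2_000, 1e8))     # Gbps Ethernet: ~2e-7 $/MB
    print(dollars_per_MB(2_300, 4e4))     # DSL:           ~6e-4 $/MB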


52

Outline

Moore’s Law and consequences

Storage rules of thumb

Balanced systems rules revisited

Networking rules of thumb

Caching rules of thumb


53

The Five Minute Rule: Trade DRAM for Disk Accesses

Cost of a disk access: Drive_Cost / Accesses_per_second.
Cost of a DRAM page: $/MB / Pages_per_MB.
Break-even has two terms, a technology term and an economic term:

BreakEvenReferenceInterval = (PagesPerMBofDRAM / AccessesPerSecondPerDisk) x (PricePerDiskDrive / PricePerMBofDRAM)

Page size grew to compensate for the changing ratios.
Now at 5 minutes for random IO, 10 seconds for sequential.


54

The 5 Minute Rule Derived

Cost of a RAM page: RAM_$_Per_MB / PagesPerMB

Cost of a disk access, amortized over T (the time between references to the page): (DiskPrice / AccessesPerSecond) / T

Breakeven: RAM_$_Per_MB / PagesPerMB = DiskPrice / (T x AccessesPerSecond)

T = (DiskPrice x PagesPerMB) / (RAM_$_Per_MB x AccessesPerSecond)


55

Plugging in the Numbers

BreakEvenReferenceInterval = (PagesPerMBofDRAM / AccessesPerSecondPerDisk) x (PricePerDiskDrive / PricePerMBofDRAM)

             PPM/aps        Disk$/RAM$     Break even
Random       128/120 ~ 1    1000/3 ~ 300   5 minutes
Sequential   1/30 ~ 0.03    ~ 300          10 seconds

The trend is toward longer intervals, because disk$ is not changing much while RAM$ declines 100x/decade.

The 5 Minute & 10 Second rules.
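A sketch that plugs the table's parameters into the break-even formula:

    def break_even_s(pages_per_MB, accesses_per_s, disk_price, ram_price_MB):
        """T = (PagesPerMB / AccessesPerSecond) x (DiskPrice / RAM$PerMB)."""
        return (pages_per_MB / accesses_per_s) * (disk_price / ram_price_MB)

    # Random 8 KB pages: 128 pages/MB, ~120 accesses/sec, 1000$ disk, 3$/MB DRAM.
    print(break_even_s(128, 120, 1000, 3) / 60)   # ~5.9 min: the "5 minute" rule

    # Sequential 1 MB transfers: 1 page/MB, ~30/sec at 30 MBps.
    print(break_even_s(1, 30, 1000, 3))           # ~11 s: the "10 second" rule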


56

When to Cache Web Pages.

Caching saves user time.
Caching saves wire time.
Caching costs storage.
Caching only works sometimes: new pages are a miss; stale pages are a miss.


57

The 10 Instruction Rule
Spend 10 instructions per second to save 1 byte.

Cost of an instruction: I = ProcessorCost / (MIPS x LifeTime)
Cost of a byte: B = RAM_$_Per_B / LifeTime
Breakeven: N x I = B
N = B/I = (RAM_$_per_B x MIPS) / ProcessorCost
  ~ (3E-6 x 5E8) / 500 = 3 ins/B for Intel
  ~ (3E-6 x 3E8) / 10 = 10 ins/B for ARM
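A minimal sketch of the breakeven, using the slide's Intel numbers (3$/MB RAM, 500 mips, 500$ processor); note that the lifetime term cancels out of the formula:

    def breakeven_ins_per_s_per_byte(ram_dollars_per_B, ins_per_s, cpu_cost):
        """N = (RAM_$_per_B x MIPS) / ProcessorCost."""
        return ram_dollars_per_B * ins_per_s / cpu_cost

    print(breakeven_ins_per_s_per_byte(3e-6, 5e8, 500))  # 3 ins/s per byte saved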


58

Web Page Caching Saves People Time

Assume people cost 20$/hour (or .2 $/hr ???).
Assume 20% hit in browser, 40% in proxy.
Assume 3 second server time.

Caching saves people time: 28$/year to 150$/year of people time, or 0.28$ to 1.5$/year at the cheap rate.

Connection       R_remote (s)   R_local (s)   H (hit rate)   People savings (¢/page)
LAN proxy        3              0.3           0.4            0.6
LAN browser      3              0.1           0.2            0.3
Modem proxy      5              2             0.4            0.7
Modem browser    5              0.1           0.2            0.5
Mobile proxy     13             10            0.4            0.7
Mobile browser   13             0.1           0.2            1.4
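The savings column is just hit_rate x (R_remote - R_local) x wage; a sketch that reproduces two of the rows, assuming the slide's 20$/hour:

    WAGE_CENTS_PER_S = 2000 / 3600        # 20 $/hr in cents per second

    def savings_cents(r_remote, r_local, hit_rate):
        """Expected people-time value saved per page fetch."""
        return hit_rate * (r_remote - r_local) * WAGE_CENTS_PER_S

    print(savings_cents(3, 0.3, 0.4))     # LAN proxy:      ~0.6 cents/page
    print(savings_cents(13, 0.1, 0.2))    # Mobile browser: ~1.4 cents/page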


59

Web Page Caching Saves Resources

Wire cost is a penny (wireless) to 100 µ$ (LAN).
Storage is 8 µ$/mo.
Breakeven (wire cost = storage rent): 4 to 7 months.
Add people cost and the breakeven is ~4 years; with “cheap people” (.2$/hr), 6 to 8 months.

              A: $/10 KB    B: $/10 KB    Break-even cache     C: people cost    Break-even
              download      storage/mo    storage time (A/B)   of download ($)   ((A+C)/B)
Internet/LAN  1.E-04        8.E-06        18 months            0.02              15 years
Modem         2.E-04        8.E-06        36 months            0.03              21 years
Wireless      1.E-02        2.E-04        300 years            0.07              >999 years
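A minimal sketch of the decision rule behind the table. A, B, and C follow the column names; the example values are the Internet/LAN row, and straight division gives ~12.5 months where the slide quotes 18, so treat the cutoffs as order-of-magnitude:

    def should_cache(reref_months, download_cost, storage_rent_mo, people_cost=0.0):
        """Cache a page if storing it is cheaper than re-fetching it."""
        break_even_months = (download_cost + people_cost) / storage_rent_mo
        return reref_months < break_even_months

    # Internet/LAN, resource cost only: A = 1e-4 $, B = 8e-6 $/month.
    print(should_cache(12, 1e-4, 8e-6))   # True:  re-referenced soon enough
    print(should_cache(24, 1e-4, 8e-6))   # False: too cold, cheaper to re-fetch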


60

Caching

Disk caching: 5 minute rule for random IO; 11 second rule for sequential IO.

Web page caching: if the page will be re-referenced within 18 months (with free users) or 15 years (with valuable users), then cache the page in the client/proxy.

Challenges: guessing which pages will be re-referenced; detecting stale pages (page velocity).


61

Outline

Moore’s Law and consequences

Storage rules of thumb

Balanced systems rules revisited

Networking rules of thumb

Caching rules of thumb