1 Rules of Thumb in Data Engineering Jim Gray International Conference on Data Engineering San...


1

Rules of Thumb in Data Engineering

Jim Gray
International Conference on Data Engineering
San Diego, CA, 4 March 2000
Gray@Microsoft.com, http://research.Microsoft.com/~Gray/Talks/

2

Credits & Thank You!!

Prashant Shenoy, U. Mass, Amherst: analysis of web caching rules. shenoy@cs.umass.edu

Terrance Kelly, U. Michigan: lots of advice on fixing the paper, tpkelly@mynah.eecs.umich.edu; interesting work on caching at http://ai.eecs.umich.edu/~tpkelly/papers/wcp.pdf

Dave Lomet, Paul Larson, Surajit Chaudhuri: how big should database pages be?

Remzi Arpaci-Dusseau, Kim Keeton, Erik Riedel: discussions about balanced systems and IO.

Windsor Hsu, Alan Smith, & Honesty Young: also studied TPC-C and balanced systems (very nice work!) http://golem.cs.berkeley.edu/~windsorh/DBChar/

Anastassia Ailamaki, Kim Keeton: CPI measurements.

Gordon Bell: discussions on balanced systems.

3

Whoops! And an apology...

The printed/published paper has MANY bugs! The conclusions are OK (sort of), but there are typos, flaws, errors, ...

A revised version will be at http://research.microsoft.com/~Gray/ and in CoRR and the MS Research tech report archive by 15 March 2000.

Sorry!

4

Outline

Moore’s Law and consequences

Storage rules of thumb

Balanced systems rules revisited

Networking rules of thumb

Caching rules of thumb

5

Trends: Moore’s Law

Performance/price doubles every 18 months
100x per decade
Progress in the next 18 months = ALL previous progress
  New storage = sum of all old storage (ever)
  New processing = sum of all old processing
E. coli doubles every 20 minutes!

[Figure label: 15 years ago]

6

Trends: ops/s/$ Had Three Growth Phases

1890-1945: mechanical, relay. 7-year doubling
1945-1985: tube, transistor, ... 2.3-year doubling
1985-2000: microprocessor. 1.0-year doubling

[Figure: ops per second/$ vs year, 1880-2000, log scale from 1.E-06 to 1.E+09; trend lines labeled "doubles every 7.5 years", "doubles every 2.3 years", "doubles every 1.0 years"]

7

Trends: Gilder’s Law: 3x bandwidth/year for 25 more years

Today:
  10 Gbps per channel
  4 channels per fiber: 40 Gbps
  32 fibers/bundle = 1.2 Tbps/bundle
In lab: 3 Tbps/fiber (400 x WDM)
In theory: 25 Tbps per fiber
1 Tbps = USA 1996 WAN bisection bandwidth
Aggregate bandwidth doubles every 8 months!

[Figure label: 1 fiber = 25 Tbps]

8

Trends: Magnetic Storage Densities

Amazing progress
Ratios have changed
Capacity grows 60%/y
Access speed grows 10x more slowly

[Figure: Magnetic Disk Parameters vs Time; x-axis year, 1984-2004; log y-axis 0.01 to 1,000,000; series tpi, kbpi, MBps, Gbpsi]

9

Trends: Density Limits

The end is near!
Products: 11 Gbpsi
Lab: 35 Gbpsi
“Limit”: 60 Gbpsi
But the limit keeps rising, and there are alternatives.

[Figure: Bit Density vs Time, in b/µm² and Gb/in², 1990-2008; shows CD, DVD, ODD, the wavelength limit, the superparamagnetic limit, and question-marked alternatives: NEMS, fluorescent, holographic, DNA]

Figure adapted from Franco Vitaliano, “The NEW new media: the growing attraction of nonmagnetic storage”, Data Storage, Feb 2000, pp 21-32, www.datastorage.com

10

Trends: promises
NEMS (Nano Electro Mechanical Systems) (http://www.nanochip.com/), also Cornell, IBM, CMU, ...

• 250 Gbpsi by using tunneling electronic microscope
• Disk replacement
• Capacity: 180 GB now, 1.4 TB in 2 years
• Transfer rate: 100 MB/sec R&W
• Latency: 0.5 msec
• Power: 23 W active, .05 W standby
• 10k$/TB now, 2k$/TB in 2002

11

Consequence of Moore’s law: need an address bit every 18 months.

Moore’s law gives you 2x more in 18 months.

RAM
  Today we have 10 MB to 100 GB machines (24-36 bits of addressing).
  In 9 years we will need 6 more bits: 30-42 bit addressing (4 TB RAM).

Disks
  Today we have 10 GB to 100 TB file systems/DBs (33-47 bit file addresses).
  In 9 years we will need 6 more bits: 40-53 bit file addresses (100 PB files).
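
The address-bit arithmetic on this slide can be checked with a small sketch. The function names are illustrative, decimal units (1 MB = 10^6 bytes) are an assumption, and the slide's own figures are rounded, so results can differ from them by a bit.

```python
def address_bits(capacity_bytes: int) -> int:
    """Smallest number of bits that can address every byte of the given capacity."""
    return (capacity_bytes - 1).bit_length()


def extra_bits(years: float, doubling_months: float = 18.0) -> float:
    """Address bits consumed by growth: one extra bit per capacity doubling."""
    return years * 12.0 / doubling_months


# Decimal units are an assumption made for this sketch.
MB, GB, TB = 10**6, 10**9, 10**12

if __name__ == "__main__":
    for label, size in [("10 MB", 10 * MB), ("100 GB", 100 * GB), ("4 TB", 4 * TB)]:
        print(f"{label}: {address_bits(size)} bits")
    print(f"bits added in 9 years: {extra_bits(9):.0f}")
```

With an 18-month doubling period, nine years is six doublings, which is where the "6 more bits" comes from.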

12

Architecture could change this

1-level store: System/38 and AS/400 have a 1-level store. It never re-uses an address, so it needs 96-bit addressing today.

NUMAs and clusters: willing to buy a 100 M$ computer? Then add 6 more address bits.

Only the 1-level store pushes us beyond 64 bits. Still, these are “logical” addresses; 64-bit physical addresses will last many years.

13

Outline

Moore’s Law and consequences

Storage rules of thumb

Balanced systems rules revisited

Networking rules of thumb

Caching rules of thumb

14

Storage Latency: How Far Away is the Data?

Relative distance, with a travel analogy:
  Registers               1      My Head      1 min
  On Chip Cache           2      This Room
  On Board Cache          10     This Hotel   10 min
  Memory                  100    Olympia      1.5 hr
  Disk                    10^6   Pluto        2 Years
  Tape / Optical Robot    10^9   Andromeda    2,000 Years

15

Storage Hierarchy: Speed & Capacity vs Cost Tradeoffs

[Figure, two panels, each plotted against access time (seconds) from 10^-9 to 10^3, with points for Cache, Main, Secondary, Disc, Online Tape, Nearline Tape, Offline Tape]
  Size vs Speed: typical system size (bytes), 10^3 to 10^15
  Price vs Speed: $/MB, 10^-6 to 10^2

16

Disks: Today

Disk is 8 GB to 80 GB
10-30 MBps
5k-15k rpm (6 ms-2 ms rotational latency)
12 ms-7 ms seek
7k$/IDE-TB, 20k$/SCSI-TB
For shared disks, most time is spent waiting in queue for access to the arm/controller.

[Figure: disk access timelines made up of Seek, Rotate, Transfer segments, with an added Wait segment]
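
The rotational latencies quoted for 5k and 15k rpm follow from half a revolution on average. A minimal sketch, with hypothetical function names and a simple seek + rotate + transfer model assumed for the service time (queueing is ignored):

```python
def avg_rotational_latency_ms(rpm: float) -> float:
    """Average rotational latency: half of one revolution, in milliseconds."""
    ms_per_revolution = 60_000.0 / rpm
    return ms_per_revolution / 2.0


def random_access_ms(seek_ms: float, rpm: float,
                     transfer_kb: float, bandwidth_mbps: float) -> float:
    """Service time of one random access: seek + rotate + transfer (no queueing)."""
    transfer_ms = transfer_kb / bandwidth_mbps  # KB / (MB/s) works out to ms
    return seek_ms + avg_rotational_latency_ms(rpm) + transfer_ms


if __name__ == "__main__":
    for rpm in (5_000, 15_000):
        print(f"{rpm} rpm: {avg_rotational_latency_ms(rpm):.1f} ms average rotational latency")
    # Illustrative request: 7 ms seek, 15k rpm, 8 KB at 20 MBps.
    print(f"random 8 KB access: {random_access_ms(7.0, 15_000, 8.0, 20.0):.1f} ms")
```

For small transfers the seek and rotate terms dominate, which is why the slide counts arms rather than bytes.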

17

Standard Storage Metrics

Capacity:
  RAM: MB and $/MB: today at 512 MB and 3$/MB
  Disk: GB and $/GB: today at 40 GB and 20$/GB
  Tape: TB and $/TB: today at 40 GB and 10k$/TB (nearline)

Access time (latency):
  RAM: 100 ns
  Disk: 15 ms
  Tape: 30 second pick, 30 second position

Transfer rate:
  RAM: 1-10 GB/s
  Disk: 20-30 MB/s (arrays can go to 10 GB/s)
  Tape: 5-15 MB/s (arrays can go to 1 GB/s)

18

New Storage Metrics: Kaps, Maps, SCAN

Kaps: how many kilobyte objects served per second
  The file server, transaction processing metric
  This is the OLD metric.

Maps: how many megabyte objects served per second
  The multimedia metric

SCAN: how long to scan all the data
  The data mining and utility metric

And Kaps/$, Maps/$, TBscan/$
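
One way to estimate these three metrics for a single drive is sketched below. The model (one positioning delay plus transfer time per object) and the `Disk` class are assumptions made for illustration, not definitions from the talk, and the example drive parameters are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Disk:
    capacity_bytes: float
    bandwidth_bytes_per_s: float
    access_time_s: float      # seek + rotational latency per random access
    price_dollars: float

    def objects_per_second(self, object_bytes: float) -> float:
        """Random objects served per second: one positioning delay plus transfer each."""
        return 1.0 / (self.access_time_s + object_bytes / self.bandwidth_bytes_per_s)

    def kaps(self) -> float:
        """Kilobyte objects served per second."""
        return self.objects_per_second(1e3)

    def maps(self) -> float:
        """Megabyte objects served per second."""
        return self.objects_per_second(1e6)

    def scan_seconds(self) -> float:
        """Time to read the whole disk sequentially."""
        return self.capacity_bytes / self.bandwidth_bytes_per_s


if __name__ == "__main__":
    # Hypothetical drive: 40 GB, 20 MB/s, 8 ms positioning time, 800 dollars.
    d = Disk(40e9, 20e6, 0.008, 800.0)
    print(f"Kaps {d.kaps():.0f}, Maps {d.maps():.1f}, SCAN {d.scan_seconds() / 60:.0f} min")
    print(f"Kaps/$ {d.kaps() / d.price_dollars:.3f}")
```

Kaps is dominated by positioning time, Maps by a mix of positioning and transfer, and SCAN by bandwidth alone, so the three metrics respond differently to each technology trend.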

22

Storage Ratios Changed

10x better access time
10x more bandwidth
100x more capacity
Data 25x cooler (1 Kaps/20 MB vs 1 Kaps/500 MB)
4,000x lower media price
20x to 100x lower disk price
Scan takes 10x longer (3 min vs 45 min)

DRAM/disk media price ratio changed:
  1970-1990: 100:1
  1990-1995: 10:1
  1995-1997: 50:1
  today: ~100:1 (0.03$/MB disk, 3$/MB DRAM)

[Figures, each 1980-2000 on a log scale: Disk Performance vs Time (seeks per second, bandwidth in MB/s, capacity in GB); Disk accesses/second vs Time; Storage Price vs Time (megabytes per kilo-dollar, 0.1 to 10,000)]

23

Data on Disk Can Move to RAM in 10 years

[Figure: Storage Price vs Time, megabytes per kilo-dollar (0.1 to 10,000, log scale), 1980-2000; annotated "100:1" and "10 years"]

24

Kaps over time

More Kaps and Kaps/$, but...
  Disk accesses got much less expensive
    Better disks
    Cheaper disks!
  But disk arms are expensive: they are the scarce resource
  45 minute scan vs 5 minutes in 1990

[Figure: Kaps/disk (10 to 1000) and Kaps/$ (1.E+0 to 1.E+6) vs year, 1970-2000, log scale; disk drawing labeled 100 GB, 30 MB/s]

25

Disk vs Tape

Disk
  40 GB
  20 MBps
  5 ms seek time
  3 ms rotate latency
  7$/GB for drive, 3$/GB for ctlrs/cabinet
  4 TB/rack
  1 hour scan

Tape
  40 GB
  10 MBps
  10 sec pick time
  30-120 second seek time
  2$/GB for media, 8$/GB for drive+library
  10 TB/rack
  1 week scan

The price advantage of tape is narrowing, and the performance advantage of disk is growing.
At 10k$/TB, disk is competitive with nearline tape.

Guesstimates
  CERN: 200 TB
  3480 tapes
  2 col = 50 GB
  Rack = 1 TB = 20 drives

27

It’s Hard to Archive a Petabyte

It takes a LONG time to restore it.
  At 1 GBps it takes 12 days!
Store it in two (or more) places online (on disk?): a geo-plex.
Scrub it continuously (look for errors).
On failure, use the other copy until the failure is repaired, then refresh the lost copy from the safe copy.
Can organize the two copies differently (e.g., one by time, one by space).
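
The 12-day figure is a straightforward division. A quick check, assuming decimal units (1 PB = 10^15 bytes, 1 GBps = 10^9 bytes/s) and a hypothetical helper name:

```python
SECONDS_PER_DAY = 86_400


def restore_days(total_bytes: float, bytes_per_second: float) -> float:
    """Days needed to stream total_bytes at a sustained rate of bytes_per_second."""
    return total_bytes / bytes_per_second / SECONDS_PER_DAY


if __name__ == "__main__":
    # One petabyte at one gigabyte per second: about 11.6 days, i.e. roughly 12.
    print(f"{restore_days(1e15, 1e9):.1f} days")
```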

28

The “Absurd” 10x (= 5 year) Disk

1 TB, 100 MB/s, 200 Kaps
2.5 hr scan time (poor sequential access)
1 aps / 5 GB (VERY cold data)
It’s a tape!

29

How to cool disk data:

Cache data in main memory
  See the 5 minute rule later in the presentation
Fewer, larger transfers
  Larger pages (512 B -> 8 KB -> 256 KB)
Sequential rather than random access
  Random 8 KB IO is 1.5 MBps
  Sequential IO is 30 MBps (the 20:1 ratio is growing)
RAID1 (mirroring) rather than RAID5 (parity)

30

Stripes, Mirrors, Parity (RAID 0,1, 5)

RAID 0: Stripes
  Bandwidth
  Layout across 3 disks: 0,3,6,.. | 1,4,7,.. | 2,5,8,..

RAID 1: Mirrors, Shadows, ...
  Fault tolerance
  Reads faster, writes 2x slower
  Layout across 2 disks: 0,1,2,.. | 0,1,2,..

RAID 5: Parity
  Fault tolerance
  Reads faster
  Writes 4x or 6x slower
  Layout across 3 disks: 0,2,P2,.. | 1,P1,4,.. | P0,3,5,..

31

RAID 10 (stripes of mirrors) Wins
“wastes space, saves arms”

RAID 5 (6 disks, 1 vol)
  Performance: 675 reads/sec, 210 writes/sec
  Write: 4 logical IO, 2 seek + 1.7 rotate
  SAVES SPACE
  Performance degrades on failure

RAID 1 (6 disks, 3 pairs)
  Performance: 750 reads/sec, 300 writes/sec
  Write: 2 logical IO, 2 seek + 0.7 rotate
  SAVES ARMS
  Performance improves on failure

33

Auto Manage Storage

1980 rule of thumb: a DataAdmin per 10 GB, a SysAdmin per mips
2000 rule of thumb: a DataAdmin per 5 TB, a SysAdmin per 100 clones (varies with app)

Problem: 5 TB is 60k$ today, 10k$ in a few years.
Admin cost >> storage cost!
Challenge: automate ALL storage admin tasks

34

Summarizing storage rules of thumb (1)

Moore’s law: 4x every 3 years, 100x more per decade
  Implies 2 bits of addressing every 3 years.
Storage capacities increase 100x/decade
Storage costs drop 100x per decade
Storage throughput increases 10x/decade
Data cools 10x/decade
Disk page sizes increase 5x per decade.

35

Summarizing storage rules of thumb (2)

RAM:Disk and Disk:Tape cost ratios are 100:1 and 3:1.
So, in 10 years, disk data can move to RAM, since prices decline 100x per decade.
A person can administer a million dollars of disk storage: that is 1 TB - 100 TB today.
Disks are replacing tapes as backup devices.
You can’t backup/restore a petabyte quickly, so geoplex it.
Mirroring rather than parity to save disk arms.

36

Outline

Moore’s Law and consequences

Storage rules of thumb

Balanced systems rules revisited

Networking rules of thumb

Caching rules of thumb

37

Standard Architecture (today)

[Figure: system bus with two attached PCI buses, PCI Bus 1 and PCI Bus 2]

38

Amdahl’s Balance Laws

Parallelism law: if a computation has a serial part S and a parallel component P, then the maximum speedup is (S+P)/S.

Balanced system law: a system needs a bit of IO per second per instruction per second: about 8 MIPS per MBps.

Memory law: α = 1. The MB/MIPS ratio (called alpha, α) in a balanced system is 1.

IO law: programs do one IO per 50,000 instructions.
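
The four laws reduce to simple ratios. A minimal sketch that turns each one into a function; the names and the choice of MIPS as the input unit are assumptions for illustration:

```python
def max_speedup(serial: float, parallel: float) -> float:
    """Parallelism law: best possible speedup is (S + P) / S."""
    if serial <= 0:
        raise ValueError("serial part must be positive")
    return (serial + parallel) / serial


def balanced_io_mbps(mips: float) -> float:
    """Balanced system law: one bit of IO per instruction, i.e. 8 MIPS per MBps."""
    return mips / 8.0


def balanced_memory_mb(mips: float, alpha: float = 1.0) -> float:
    """Memory law: MB of memory = alpha * MIPS, with alpha = 1 in Amdahl's form."""
    return alpha * mips


def ios_per_second(mips: float, instructions_per_io: float = 50_000) -> float:
    """IO law: one IO per 50,000 instructions."""
    return mips * 1e6 / instructions_per_io


if __name__ == "__main__":
    mips = 550.0  # illustrative processor speed
    print(f"speedup with 10% serial work: {max_speedup(1, 9):.0f}x")
    print(f"{mips:.0f} MIPS wants {balanced_io_mbps(mips):.0f} MBps of IO, "
          f"{balanced_memory_mb(mips):.0f} MB of RAM, "
          f"{ios_per_second(mips):.0f} IO/s")
```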

39

Amdahl’s Laws Valid 35 Years Later?

Parallelism law is algebra: so SURE!
Balanced system laws? Look at TPC results (TPC-C, TPC-H) at http://www.tpc.org/
Some imagination needed:
  What’s an instruction (CPI varies from 1-3)? RISC, CISC, VLIW, ... clocks per instruction, ...
  What’s an I/O?

40

TPC systems

Normalize for CPI (clocks per instruction)
  TPC-C has about 7 ins/byte of IO
  TPC-H has 3 ins/byte of IO
TPC-H needs ½ as many disks (sequential vs random)
Both use 9 GB 10 krpm disks (need arms, not bytes)

                     MHz/cpu  CPI  mips  KB/IO  IO/s/disk  Disks  Disks/cpu  MB/s/cpu  Ins/IO byte
Amdahl                     1    1     1      6                                                   8
TPC-C (random)           550  2.1   262      8        100    397         50        40            7
TPC-H (sequential)       550  1.2   458     64        100    176         22       141            3
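
The derived columns of the table can be recomputed from the measured ones. This sketch assumes 8 CPUs per system (taken from the "8x262" and "8x458" entries on the alpha slide) and 1 MB = 1000 KB; the function names are hypothetical.

```python
def mips(mhz: float, cpi: float) -> float:
    """Instruction rate in MIPS from clock rate and clocks per instruction."""
    return mhz / cpi


def mb_per_s_per_cpu(kb_per_io: float, ios_per_s_per_disk: float,
                     disks: int, cpus: int) -> float:
    """IO bandwidth per CPU: total disk throughput divided across the CPUs."""
    return kb_per_io * ios_per_s_per_disk * disks / 1000.0 / cpus


def instructions_per_io_byte(mips_per_cpu: float, mbps_per_cpu: float) -> float:
    """Instructions executed per byte of IO (MIPS divided by MBps)."""
    return mips_per_cpu / mbps_per_cpu


if __name__ == "__main__":
    cpus = 8  # assumption: read off the alpha table, not stated in this table
    for name, cpi, kb_io, disks in [("TPC-C", 2.1, 8, 397), ("TPC-H", 1.2, 64, 176)]:
        m = mips(550, cpi)
        bw = mb_per_s_per_cpu(kb_io, 100, disks, cpus)
        print(f"{name}: {m:.0f} mips, {bw:.0f} MB/s/cpu, "
              f"{instructions_per_io_byte(m, bw):.1f} ins/byte")
```

Under these assumptions the rounded results agree with the table: about 7 instructions per IO byte for TPC-C and about 3 for TPC-H, against Amdahl's 8.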

41

TPC systems: What’s alpha (= MB/MIPS)?

Hard to say:
  Intel: 32-bit addressing (= 4 GB limit). Known CPI.
  IBM, HP, Sun have a 64 GB limit. Unknown CPI.
  Look at both, guess CPI for IBM, HP, Sun.
Alpha is between 1 and 6.

              Mips                    Memory  Alpha
Amdahl        1                       1       1
tpcC Intel    8 x 262 = 2 Gips        4 GB    2
tpcH Intel    8 x 458 = 4 Gips        4 GB    1
tpcC IBM      24 cpus ?= 12 Gips      64 GB   6
tpcH HP       32 cpus ?= 16 Gips      32 GB   2

43

Amdahl’s Balance Laws Revised

Laws right, just need “interpretation” (imagination?)

Balanced System Law: a system needs 8 MIPS per MBps of IO, but the instruction rate must be measured on the workload.
  Sequential workloads have low CPI (clocks per instruction); random workloads tend to have higher CPI.
Alpha (the MB/MIPS ratio) is rising from 1 to 6. This trend will likely continue.
One random IO per 50k instructions.
  Sequential IOs are larger: one sequential IO per 200k instructions.

44

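PAP vs RAP
Peak Advertised Performance vs Real Application Performance

[Figure: data path through Disks, SCSI, PCI, System Bus, CPU, File System, Application Data, with two rates per stage (the larger read here as PAP, the smaller as RAP)]
  CPU: 550 x 4 Mips = 2 Bips; at 1-3 cpi, 170-550 mips
  System bus: 1600 MBps vs 500 MBps
  PCI: 133 MBps vs 90 MBps
  SCSI: 160 MBps vs 90 MBps
  Disks: 66 MBps vs 25 MBps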

45

Outline

Moore’s Law and consequences

Storage rules of thumb

Balanced systems rules revisited

Networking rules of thumb

Caching rules of thumb

47

Ubiquitous 10 GBps SANs in 5 years

1 Gbps Ethernet is a reality now.
  Also Fibre Channel, Myrinet, GigaNet, ServerNet, ATM, ...
10 Gbps x4 WDM deployed now (OC192)
  3 Tbps WDM working in lab
In 5 years, expect 10x, wow!!

[Figure labels: 5 MBps, 20 MBps, 40 MBps, 80 MBps, 120 MBps (1 Gbps), 1 GBps]

48

Networking

WANs are getting faster than LANs
  G8 = OC192 = 8 Gbps is “standard”
Link bandwidth improves 4x per 3 years
Speed of light (60 ms round trip in US)
Software stacks have always been the problem.

Time = SenderCPU + ReceiverCPU + bytes/bandwidth
  The SenderCPU and ReceiverCPU terms have been the problem.
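
The time equation is easy to experiment with. In this sketch the function name is hypothetical and the CPU overheads in the example are illustrative values, not measurements from the talk:

```python
def send_time_us(sender_cpu_us: float, receiver_cpu_us: float,
                 nbytes: float, bandwidth_bytes_per_s: float) -> float:
    """Time = SenderCPU + ReceiverCPU + bytes/bandwidth, in microseconds."""
    wire_us = nbytes / bandwidth_bytes_per_s * 1e6
    return sender_cpu_us + receiver_cpu_us + wire_us


if __name__ == "__main__":
    # 1 KB message; 100 us of software overhead per side is an assumed value.
    for name, bits_per_s in [("100 Mbps", 100e6), ("1 Gbps", 1e9)]:
        total = send_time_us(100.0, 100.0, 1000, bits_per_s / 8)
        print(f"{name}: {total:.0f} us total")
```

With fixed per-message CPU costs, a 10x faster wire barely changes the total for small messages, which is the argument for cutting the software path.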

49

The Promise of SAN/VIA: 10x in 2 years
http://www.ViArch.org/

Yesterday:
  10 MBps (100 Mbps Ethernet)
  ~20 MBps tcp/ip saturates 2 cpus
  round-trip latency ~250 µs
Now:
  Wires are 10x faster: Myrinet, Gbps Ethernet, ServerNet, ...
  Fast user-level communication
    tcp/ip ~100 MBps at 10% cpu
    round-trip latency is 15 µs
1.6 Gbps demoed on a WAN

[Figure: time in µs (0-250) to send 1 KB over 100 Mbps, Gbps, and SAN links, split into sender cpu, receiver cpu, and transmit time]

50

How much does wire-time cost? $/MByte?

                      Cost     Time
Gbps Ethernet         .2 µ$    10 ms
100 Mbps Ethernet     .3 µ$    100 ms
OC12 (650 Mbps)       .003$    20 ms
DSL                   .0006$   25 sec
POTS                  .002$    200 sec
Wireless              .80$     500 sec

            Seat cost ($/3y)  Bandwidth (B/s)  $/MB    Time (s)
GBpsE                   2000         1.00E+08  2.E-07     0.010
100MbpsE                 700         1.00E+07  7.E-07     0.100
OC12                12960000         5.00E+07  3.E-03     0.020
OC3                  3132000         3.00E+06  1.E-02     0.333
T1                     28800         1.00E+05  3.E-03    10.000
DSL                     2300         4.00E+04  6.E-04    25.000
POTS                    1180         5.00E+03  2.E-03   200.000
Wireless                   ?         2.00E+03  8.E-01   500.000

Seconds in 3 years: 94608000
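
The $/MB and Time columns appear to be derived from the seat cost and bandwidth columns by amortizing the three-year seat cost over every byte the link could carry. A sketch of that calculation, assuming 1 MB = 10^6 bytes (function names are illustrative):

```python
SECONDS_IN_3_YEARS = 3 * 365 * 86_400  # 94,608,000, as on the slide


def dollars_per_mb(seat_cost_3y: float, bytes_per_second: float) -> float:
    """Seat cost spread over all bytes a fully used link carries in 3 years."""
    dollars_per_byte = seat_cost_3y / (bytes_per_second * SECONDS_IN_3_YEARS)
    return dollars_per_byte * 1e6


def seconds_per_mb(bytes_per_second: float) -> float:
    """Wire time to move one megabyte."""
    return 1e6 / bytes_per_second


if __name__ == "__main__":
    links = [("GBpsE", 2000, 1.0e8), ("100MbpsE", 700, 1.0e7),
             ("OC12", 12_960_000, 5.0e7), ("T1", 28_800, 1.0e5),
             ("DSL", 2300, 4.0e4), ("POTS", 1180, 5.0e3)]
    for name, seat_cost, bandwidth in links:
        print(f"{name:9s} {dollars_per_mb(seat_cost, bandwidth):.0e} $/MB  "
              f"{seconds_per_mb(bandwidth):.3f} s")
```

This treats the link as 100% utilized for three years, so real per-megabyte costs would be higher.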

52

Outline

Moore’s Law and consequences

Storage rules of thumb

Balanced systems rules revisited

Networking rules of thumb

Caching rules of thumb

53

The Five Minute Rule

Trade DRAM for disk accesses
Cost of an access: Drive_Cost / Access_per_second
Cost of a DRAM page: $/MB / pages_per_MB
Break even has two terms: a technology term and an economic term

$$\text{BreakEvenReferenceInterval} = \frac{\text{PagesPerMBofDRAM}}{\text{AccessPerSecondPerDisk}} \times \frac{\text{PricePerDiskDrive}}{\text{PricePerMBofDRAM}}$$

Grew page size to compensate for changing ratios.
Now at 5 minutes for random, 10 seconds for sequential.

54

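The 5 Minute Rule Derived

T = time between references to the page

Cost of a RAM page:
$$\frac{\text{RAM\_\$\_Per\_MB}}{\text{PagesPerMB}}$$

Disk access cost per T:
$$\frac{\text{DiskPrice}}{T \times \text{AccessesPerSecond}}$$

Breakeven:
$$\frac{\text{RAM\_\$\_Per\_MB}}{\text{PagesPerMB}} = \frac{\text{DiskPrice}}{T \times \text{AccessesPerSecond}}$$

$$T = \frac{\text{DiskPrice} \times \text{PagesPerMB}}{\text{RAM\_\$\_Per\_MB} \times \text{AccessesPerSecond}}$$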

55

Plugging in the Numbers

$$\text{BreakEvenReferenceInterval} = \frac{\text{PagesPerMBofDRAM}}{\text{AccessPerSecondPerDisk}} \times \frac{\text{PricePerDiskDrive}}{\text{PricePerMBofDRAM}}$$

              PPM/aps          disk$/RAM$        Break even
Random        128/120 ~ 1      1000/3 ~ 300      5 minutes
Sequential    1/30 ~ .03       ~ 300             10 seconds

Trend is longer times, because disk$ is not changing much while RAM$ is declining 100x/decade.

5 minute & 10 second rule
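
The break-even interval can be recomputed directly from the numbers on this slide. A minimal sketch; the function name is hypothetical, and the inputs (128 pages/MB at 120 accesses/s for random IO, 1 page/MB at 30 accesses/s for sequential IO, a 1000$ drive, 3$/MB RAM) are read off the table:

```python
def break_even_seconds(pages_per_mb: float, accesses_per_second: float,
                       disk_price: float, ram_price_per_mb: float) -> float:
    """Reference interval at which caching a page in RAM costs the same as re-reading it."""
    technology_term = pages_per_mb / accesses_per_second
    economic_term = disk_price / ram_price_per_mb
    return technology_term * economic_term


if __name__ == "__main__":
    random_io = break_even_seconds(128, 120, 1000, 3)
    sequential_io = break_even_seconds(1, 30, 1000, 3)
    print(f"random: {random_io / 60:.1f} minutes")
    print(f"sequential: {sequential_io:.0f} seconds")
```

Pages referenced more often than the break-even interval are cheaper to keep in RAM; colder pages are cheaper to leave on disk.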

56

When to Cache Web Pages.

Caching saves user time
Caching saves wire time
Caching costs storage
Caching only works sometimes:
  New pages are a miss
  Stale pages are a miss

57

The 10 Instruction Rule

Spend 10 instructions/second to save 1 byte
Cost of instruction: I = ProcessorCost / (MIPS x LifeTime)
Cost of byte: B = RAM_$_Per_B / LifeTime
Breakeven: N x I = B
  N = B/I = (RAM_$_B x MIPS) / ProcessorCost
    ~ (3E-6 x 5E8)/500 = 3 ins/B for Intel
    ~ (3E-6 x 3E8)/10 = 10 ins/B for ARM

58

Web Page Caching Saves People Time

Assume people cost 20$/hour (or .2 $/hr ???)
Assume 20% hit in browser, 40% in proxy
Assume 3 second server time
Caching saves people time: 28$/year to 150$/year of people time, or .28 cents to 1.5$/year.

Connection  Cache    R_remote (s)  R_local (s)  H (hit rate)  People savings (¢/page)
LAN         proxy               3          0.3           0.4                      0.6
LAN         browser             3          0.1           0.2                      0.3
Modem       proxy               5            2           0.4                      0.7
Modem       browser             5          0.1           0.2                      0.5
Mobile      proxy              13           10           0.4                      0.7
Mobile      browser            13          0.1           0.2                      1.4
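
The savings column is consistent with hit rate times seconds saved times the cost of a person-second. That formula is inferred from the table's numbers rather than stated on the slide, so treat this as a sketch under that assumption:

```python
def people_savings_cents(r_remote_s: float, r_local_s: float, hit_rate: float,
                         dollars_per_hour: float = 20.0) -> float:
    """Expected people-time saved per page view, in cents (inferred formula)."""
    cents_per_second = dollars_per_hour * 100.0 / 3600.0
    return hit_rate * (r_remote_s - r_local_s) * cents_per_second


if __name__ == "__main__":
    rows = [("LAN proxy", 3, 0.3, 0.4), ("LAN browser", 3, 0.1, 0.2),
            ("Modem proxy", 5, 2, 0.4), ("Modem browser", 5, 0.1, 0.2),
            ("Mobile proxy", 13, 10, 0.4), ("Mobile browser", 13, 0.1, 0.2)]
    for name, r_remote, r_local, hit_rate in rows:
        print(f"{name:15s} {people_savings_cents(r_remote, r_local, hit_rate):.1f} cents/page")
```

Passing `dollars_per_hour=0.2` gives the "cheap people" case from the slide's first assumption.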

59

Web Page Caching Saves Resources

Wire cost is a penny (wireless) to 100 µ$ (LAN)
Storage is 8 µ$/mo
Breakeven: wire cost = storage rent, 4 to 7 months
Add people cost: breakeven is ~4 years.
“Cheap people” (.2$/hr): 6 to 8 months.

A = $/10 KB download (network)
B = $/10 KB storage/mo
C = people cost of download ($)

              A       B       Time = A/B                        C     Time = (A+C)/B
                              (break-even cache storage time)         (break even)
Internet/LAN  1.E-04  8.E-06  18 months                         0.02  15 years
Modem         2.E-04  8.E-06  36 months                         0.03  21 years
Wireless      1.E-02  2.E-04  300 years                         0.07  >999 years

60

Caching

Disk caching
  5 minute rule for random IO
  11 second rule for sequential IO

Web page caching: if a page will be re-referenced in
  18 months (with free users)
  15 years (with valuable users)
then cache the page in the client/proxy.

Challenge: guessing which pages will be re-referenced, and detecting stale pages (page velocity)

61

Outline

Moore’s Law and consequences

Storage rules of thumb

Balanced systems rules revisited

Networking rules of thumb

Caching rules of thumb
