Jim Gray / Presented at U. Tokyo / 23 Jan 1999
Jim Gray: Talk at University of Tokyo
Personal views on PITAC report: invest in long term research
Preview of Turing lecture: 10 long term research problems
• Bush: Summarize info in cyberspace
• Turing: Intelligent Computers
• 7 9s: build systems that are always up and prove it.
5-Minute rule
• For disks
• For tapes
Sorting Progress
• PennySort
• Terabyte Sort (!)
Slides will be at http://research.Microsoft.com/~Gray/talks
Presidential Advisory Committee on
High Performance Computing and Communications,
Information Technologies, and the Next Generation Internet
Information Technology
http://www.ccic.gov/ac/interim/ or http://research.microsoft.com/~Gray/papers/PITAC_Interim_Report_8_98.doc
Charter for the Committee: provide an independent assessment of
High-Performance Computing and Communications (HPCC)
• Progress
• Balance among research components;
Next Generation Internet initiative;
• Progress
• Balance
IT Research and development
• Maintain United States leadership in
—IT and
—Applications
Committee Members
Co-Chairs:
• Bill Joy, Sun Microsystems
• Ken Kennedy, Rice University
Members:
• Eric Benhamou, 3Com
• Vinton Cerf, MCI
• Ching-chih Chen, Simmons
• David Cooper, LLNL
• Steve Dorfman, Hughes
• David Dorman, PointCast
• Bob Ewald, SGI
• David Farber, U. of Pennsylvania
• Sherri Fuller, U. of Washington
• Hector Garcia-Molina, Stanford
• Susan Graham, UC Berkeley
• Jim Gray, Microsoft
• Danny Hillis, Disney, Inc
• John Miller, Montana State Univ.
• David Nagel, AT&T
• Raj Reddy, Carnegie Mellon
• Ted Shortliffe, Stanford
• Larry Smarr, U. of Illinois @ UC
• Joe Thompson, Miss. State U.
• Les Vadasz, Intel
• Andy Viterbi, Qualcomm
• Steve Wallach, Centerpoint
• Irving Wladawsky-Berger, IBM
My Summary of the Report
1/3 of the US economic growth since 1992 was in the IT sector.
IT is key to our health, wealth, and safety. Created 400 B$ of wealth in last 3 years (!!)
Federal IT research funding of twenty years ago created the boom.
Federal IT research funding for the last decade has been flat (in constant dollars).
Research funding is increasingly near-term & applied development.
The committee recommends increasing long-term research funding in:
• Software design and implementation technologies
• Technologies to scale the Next Generation Internet to 6 billion users.
• Tools, algorithms, and systems for high-performance computing.
Spend a billion dollars over the next 5 years on Lewis and Clark style "expeditions" into cyberspace.
Myths
1. Now that IT is a big business, Industry will do long term research.
FACT:
industry spends LITTLE on long-term research.
it is not in their best interest
2. IT research = buy computers for scientists.
FACT
computer science research
is different from
the application of computers to some discipline.
Research Priorities
Findings:
• Total federal information technology R&D investment is inadequate
• Federal IT R&D is excessively focused on near-term problems
Recommendations:
• Create a strategic initiative in long-term IT R&D
• Increase the investment for research in software, scalable information infrastructure, high-end computing, and socio-economic and workforce impacts
Software Research
Findings:
• Demand for software far exceeds the nation’s ability to produce it
• The nation depends on fragile software
• Technologies to build reliable and secure software are inadequate
• The nation is under-investing in fundamental software research
Recommendations:
• Fund more fundamental research in software development methods and component technologies
• Sponsor a national library of software components
• Make software research a substantive component of every major IT research initiative
• Support research in human-computer interfaces and interaction
Make fundamental software research an absolute priority
Scalable Information Infrastructure
Findings:
• The Internet has grown well beyond the intent of its original designers
• Our nation’s dependence on the information infrastructure is increasing daily
• We cannot safely extend what we currently know to more complex systems
• Learning how to build large-scale, highly reliable and secure systems requires research
Recommendations:
• Increase funding in research and development of core software and communications technologies aimed directly at the challenge of scaling the information infrastructure
• Expand the Next Generation Internet test beds to include additional industry partnerships in order to foster the rapid commercialization and deployment of enabling technologies
High-End Computing
Findings:
• HEC is essential for science and engineering research
• HEC is an element of United States national security
• HEC is ripe for new applications
• HEC suppliers suffer from unusual market pressures
Research & Development Recommendations:
• Fund innovative technologies and architectures
• Fund HEC software (parallel programming)
• Aim for a real application petaops by 2010 through both hardware and software strategies
• Fund HEC systems for science and engineering research
Social, Economic, Workforce Recommendations
Expand research on the social and economic impacts of information technology diffusion and adoption
Expand initiatives to increase IT literacy, access and research capabilities
Address the shortage of high-technology workers
Programs to re-train “stale” IT workers
Encourage participation by women and minorities
Short-term increase in immigration of skilled IT workers
Conclusions
IT is an essential foundation for commerce, education, health care, environmental stewardship, and national security:
• Dramatically transform the way we communicate, learn, deal with information and conduct research
• Transform the nature of work, nature of commerce, product design cycle, practice of health care, and the government itself
The total Federal IT R&D investment is inadequate
The Federal IT R&D is excessively focused on near-term problems
U. S. government must:
• Create a strategic initiative in long-term IT R&D
• Establish an effective structure for managing and coordinating IT
Vannevar Bush: Memex
Memex: Proposed putting all information online (1945)
It will happen
Result: InfoGlut. Too much information in the shoebox
Challenge:
• Organize the information.
• Give answers as good as an expert in the field.
• Anticipate questions and so inform “subscriber”
Protect personal privacy
• A hacker cannot get access to your personal information without your consent.
Turing’s Test (1951): Intelligent Machines
Computers helped with the 4-color problem end game
Computers (and people) won world chess championship
Computers will likely be our 5th brain
• Augment our intelligence
• See for us, hear for us, read for us,
• Prosthetic eyes, ears, voices, arms, legs,….
Probably computers will be intelligent like plants and animals.
Perhaps computers can be intelligent like people
• Pass the Turing Test (easy/impossible?) (70%, 5 minutes, B can lie)
• Translating telephone (as good as a human translator)
• Read a textbook and pass the written exam.
• Pass a graduate programming class
• Pass a graduate literature class
Radical: Download someone.
Dependable Systems
Build a system used by millions of people each day.
Then:
• Prove that it does what it is supposed to do (code matches spec).
• Prove that it delivers 99.99999% (7 9s) availability (1 hr per millennium)
• Prove that it cannot be “hacked” for less than 1B$ (Y2K $)
Then build the system automatically from the specification.
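The "1 hr per millennium" figure for 7 9s can be checked with a little arithmetic. The sketch below is only an illustration; the function name and the 365.25-day year are assumptions, not part of the talk.

```python
# Downtime implied by an availability of N nines.
# Assumption: a year is 365.25 days (8766 hours).
HOURS_PER_YEAR = 365.25 * 24


def downtime_hours(nines: int, years: float) -> float:
    """Expected unavailable hours over `years` at `nines` nines of availability."""
    unavailability = 10.0 ** (-nines)  # 7 nines -> 1e-7
    return unavailability * years * HOURS_PER_YEAR


if __name__ == "__main__":
    # Seven nines over a millennium comes out a little under one hour.
    print(f"{downtime_hours(7, 1000):.2f} hours per millennium")
    # For comparison, five nines over a single year, in minutes.
    print(f"{downtime_hours(5, 1) * 60:.1f} minutes per year")
```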
Storage Hierarchy (9 levels)
Cache 1, 2
Main (1, 2, 3 if NUMA).
Disk (1 (cached), 2)
Tape (1 (mounted), 2)
Meta-Message: Technology Ratios Are Important
If everything gets faster & cheaper at the same rate THEN nothing really changes.
Things getting MUCH BETTER:
• communication speed & cost 1,000x
• processor speed & cost 100x
• storage size & cost 100x
Things staying about the same
• speed of light (more or less constant)
• people (10x more expensive)
• storage speed (only 10x better)
Today’s Storage Hierarchy: Speed & Capacity vs Cost Tradeoffs
[Figure: two log-log panels over access time (seconds, 10^-9 to 10^3). "Size vs Speed" plots typical system size (bytes, 10^3 to 10^15); "Price vs Speed" plots $/MB (10^-4 to 10^4). Both show cache, main, secondary (disc), online tape, nearline tape, and offline tape.]
Storage Ratios Changed
10x better access time
10x more bandwidth
4,000x lower media price
DRAM/DISK 100:1 to 10:1 to 50:1
[Figures, all over the years 1980 to 2000: "Disk Performance vs Time" (access time in ms and bandwidth in MB/s); "Disk Performance vs Time (accesses/second & Capacity)" (accesses per second and disk capacity in GB); "Storage Price vs Time" ($/MB, 0.01 to 10000).]
The 5 Minute Rule Derived
M$: cost of a RAM page
\[ M_\$ = \frac{\text{RAM \$/MB}}{\text{PagesPerMB} \times \text{Lifetime}} \]
A$: cost of a disk access
\[ A_\$ = \frac{\text{DiskPrice}}{\text{AccessesPerSec} \times \text{Lifetime}} \]
RI: Reference Interval, the time between accesses to a page
[Figure: cost ($) against reference interval (time). The disk access cost A$/RI falls as the interval grows and crosses the constant cost of a RAM page, M$, at the breakeven point.]
Breakeven: M$ = A$ / Reference Interval
\[ \text{Reference Interval} = \frac{A_\$}{M_\$} = \frac{\text{DiskPrice} \times \text{PagesPerMB}}{\text{RAMprice} \times \text{AccPerSec}} \]
The Five Minute Rule Observations
Break even has two terms:
(2) Economic term: DiskPrice / RAM_MB_Price ~ 400:4 = 100:1
(1) Technology term: PageSize / DiskAccPerSec ~ 8KB : 80 = 100:1
Economic term trends down
Technology term trends up to compensate.
Still at 5 minute for random, 1 minute sequential
\[ \text{BreakEvenReferenceInterval} = \frac{\text{PagesPerMBofRAM}}{\text{AccessPerSecondPerDisk}} \times \frac{\text{PricePerDiskDrive}}{\text{PricePerMBofRAM}} \]
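As a rough check of the break-even formula, here is a minimal sketch. The input values (128 pages per MB for 8 KB pages, 64 accesses per second, a 2,000$ drive, 15$ per MB of RAM) are illustrative assumptions and are not taken from these slides; the function name is also made up for the example.

```python
def break_even_reference_interval(
    pages_per_mb_of_ram: float,
    access_per_second_per_disk: float,
    price_per_disk_drive: float,
    price_per_mb_of_ram: float,
) -> float:
    """Seconds between references at which RAM and disk cost the same."""
    technology_term = pages_per_mb_of_ram / access_per_second_per_disk
    economic_term = price_per_disk_drive / price_per_mb_of_ram
    return technology_term * economic_term


if __name__ == "__main__":
    # Illustrative inputs only (assumed, not from the slides).
    seconds = break_even_reference_interval(128, 64, 2000, 15)
    print(f"break-even interval: {seconds:.0f} s (~{seconds / 60:.1f} minutes)")
```

With these assumed inputs the interval lands in the neighbourhood of five minutes: pages referenced more often than that are cheaper to keep in RAM.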
Shows Best Index Page Size ~16KB
Index Page Utility vs Page Size and Index Element Size
Page Size (KB)    2     4     8     16    32    64    128
16 byte entries   0.64  0.72  0.78  0.82  0.79  0.69  0.54
32 byte entries   0.54  0.62  0.69  0.73  0.71  0.63  0.50
64 byte entries   0.44  0.53  0.60  0.64  0.64  0.57  0.45
128 byte entries  0.34  0.43  0.51  0.56  0.56  0.51  0.41
Index Page Utility vs Page Size and Disk Performance
Page Size (KB)    2     4     8     16    32    64    128
40 MB/s           0.65  0.74  0.83  0.91  0.97  0.99  0.94
10 MB/s           0.64  0.72  0.78  0.82  0.79  0.69  0.54
5 MB/s            0.62  0.69  0.73  0.71  0.63  0.50  0.34
3 MB/s            0.51  0.56  0.58  0.54  0.46  0.34  0.22
1 MB/s            0.40  0.44  0.44  0.41  0.33  0.24  0.16
Standard Storage Metrics
Capacity:
• RAM: MB and $/MB: today at 10MB & 100$/MB
• Disk: GB and $/GB: today at 10 GB and 200$/GB
• Tape: TB and $/TB: today at .1TB and 25k$/TB (nearline)
Access time (latency)
• RAM: 100 ns
• Disk: 10 ms
• Tape: 30 second pick, 30 second position
Transfer rate
• RAM: 1 GB/s
• Disk: 5 MB/s - - - Arrays can go to 1GB/s
• Tape: 5 MB/s - - - striping is problematic
New Storage Metrics: Kaps, Maps, SCAN?
Kaps: How many KB objects served per second
• The file server, transaction processing metric
• This is the OLD metric.
Maps: How many MB objects served per sec
• The Multi-Media metric
SCAN: How long to scan all the data
• The data mining and utility metric
And
• Kaps/$, Maps/$, TBscan/$
For the Record (good 1998 devices packaged in system http://www.tpc.org/results/individual_results/Dell/dell.6100.9801.es.pdf)
                     DRAM     DISK     TAPE robot
Unit capacity (GB)   1        18       35 X 14
Unit price $         4000     500      10000
$/GB                 4000     28       20
Latency (s)          1.E-7    1.E-2    3.E+1
Bandwidth (Mbps)     500      15       7
Kaps                 5.E+5    1.E+2    3.E-2
Maps                 5.E+2    13.04    3.E-2
Scan time (s/TB)     2        1200     70000
$/Kaps               9.E-11   5.E-8    3.E-3
$/Maps               8.E-8    4.E-7    3.E-3
$/TBscan             $0.08    $0.35    $211
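The Kaps, Maps, and scan-time rows appear to follow from the latency, bandwidth, and unit-capacity rows. The sketch below is one reading that reproduces those figures; the formulas are an assumption, not something the slides state. It takes each request to cost one latency plus the transfer time, treats the bandwidth row as MB per second, assumes units are scanned in parallel, and uses 1 GB = 1000 MB.

```python
def objects_per_second(latency_s: float, bandwidth_mb_s: float, object_mb: float) -> float:
    """Requests served per second if each costs latency plus transfer time."""
    return 1.0 / (latency_s + object_mb / bandwidth_mb_s)


def scan_seconds(unit_capacity_gb: float, bandwidth_mb_s: float) -> float:
    """Time to read one whole unit; with units in parallel, also the time per TB."""
    return unit_capacity_gb * 1000.0 / bandwidth_mb_s


# (unit capacity GB, latency s, bandwidth MB/s), copied from the table.
DEVICES = {
    "DRAM": (1, 1e-7, 500),
    "DISK": (18, 1e-2, 15),
    "TAPE robot": (35 * 14, 3e1, 7),
}

if __name__ == "__main__":
    for name, (capacity_gb, latency_s, bandwidth) in DEVICES.items():
        kaps = objects_per_second(latency_s, bandwidth, 0.001)  # 1 KB objects
        maps = objects_per_second(latency_s, bandwidth, 1.0)    # 1 MB objects
        scan = scan_seconds(capacity_gb, bandwidth)
        print(f"{name:10s} Kaps={kaps:.3g} Maps={maps:.3g} scan={scan:.0f} s")
```

Under these assumptions latency dominates Kaps while transfer time dominates Maps for disk, which is why the two metrics rank devices so differently.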
[Figure: log-scale bar chart (1.E-12 to 1.E+06) of $/GB, bandwidth, Kaps, Maps, scan time, $/Kaps, $/Maps, and $/TBscan for DRAM, disk, and tape robot, using the "For the Record" values.]
How To Get Lots of Maps, SCANs
parallelism: use many little devices in parallel
Beware of the media myth
Beware of the access time myth
At 10 MB/s: 1.2 days to scan 1 Terabyte
1,000 x parallel: 100 seconds SCAN.
Parallelism: divide a big problem into many smaller ones to be solved in parallel.
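The two scan figures on this slide follow from dividing the data evenly across devices. A minimal sketch, assuming perfect striping with no coordination overhead and decimal units (1 TB = 10^12 bytes, 1 MB = 10^6 bytes); the function name is illustrative.

```python
TERABYTE = 10**12
MEGABYTE = 10**6


def scan_time_seconds(data_bytes: int, device_bytes_per_s: int, devices: int = 1) -> float:
    """Scan time when the data is split evenly over `devices` identical devices."""
    return data_bytes / (device_bytes_per_s * devices)


if __name__ == "__main__":
    serial = scan_time_seconds(TERABYTE, 10 * MEGABYTE)
    parallel = scan_time_seconds(TERABYTE, 10 * MEGABYTE, devices=1000)
    print(f"1 device:      {serial:.0f} s ({serial / 86400:.2f} days)")
    print(f"1,000 devices: {parallel:.0f} s")
```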
The Disk Farm On a Card
The 1 TB disc card
An array of discs
Can be used as
• 100 discs
• 1 striped disc
• 10 Fault Tolerant discs
• ....etc
LOTS of accesses/second bandwidth
14"
Life is cheap, it’s the accessories that cost ya.
Processors are cheap, it’s the peripherals that cost ya (a 10k$ disc card).
Tape Farms for Tertiary Storage, Not Mainframe Silos
Scan in 27 hours.
many independent tape robots (like a disc farm)
10K$ robot: 14 tapes, 500 GB, 5 MB/s, 20$/GB, 30 Maps
100 robots: 1M$, 50TB, 50$/GB, 3K Maps, 27 hr Scan
Tape & Optical: Beware of the Media Myth
Optical is cheap: 200 $/platter 2 GB/platter => 100$/GB (2x cheaper than disc)
Tape is cheap: 30 $/tape 20 GB/tape => 1.5 $/GB (100x cheaper than disc).
Tape & Optical Reality: Media is 10% of System Cost
Tape needs a robot (10 k$ ... 3 M$) for 10 ... 1000 tapes (at 20GB each) => 20$/GB ... 200$/GB (1x…10x cheaper than disc)
Optical needs a robot (100 k$): 100 platters = 200GB (TODAY) => 400 $/GB (more expensive than mag disc)
Robots have poor access times
Not good for Library of Congress (25TB)
Data motel: data checks in but it never checks out!
The Access Time Myth
The Myth: seek or pick time dominates
The reality:
(1) Queuing dominates
(2) Transfer dominates BLOBs
(3) Disk seeks often short
Implication: many cheap servers better than one fast expensive server
• shorter queues
• parallel transfer
• lower cost/access and cost/byte
This is now obvious for disk arrays
This will be obvious for tape arrays
[Figure: access timelines made up of Wait, Seek, Rotate, and Transfer segments.]
Penny Sort Ground Rules
http://research.microsoft.com/barc/SortBenchmark
How much can you sort for a penny.
• Hardware and Software cost
• Depreciated over 3 years
• 1M$ system gets about 1 second,
• 1K$ system gets about 1,000 seconds.
• Time (seconds) = 946,080 / SystemPrice ($)
Input and output are disk resident
Input is
• 100-byte records (random data)
• key is first 10 bytes.
Must create output file and fill with sorted version of input file.
Daytona (product) and Indy (special) categories
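The time budget in the ground rules is one penny's share of a system depreciated over 3 years. A minimal sketch of that arithmetic, assuming 365-day years (which gives the 946,080 constant); the function name is illustrative.

```python
SECONDS_IN_3_YEARS = 3 * 365 * 24 * 3600  # 94,608,000


def penny_time_budget(system_price_dollars: float, budget_dollars: float = 0.01) -> float:
    """Seconds of machine time that `budget_dollars` buys on the given system."""
    return budget_dollars * SECONDS_IN_3_YEARS / system_price_dollars


if __name__ == "__main__":
    for price in (1_000_000, 1_000, 1_107):
        print(f"{price:>9,}$ system: {penny_time_budget(price):8.1f} s per penny")
```

The cheaper the system, the longer it may run: a penny buys under a second on a 1M$ machine but several hundred seconds on a machine costing about 1K$.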
PennySort
Hardware
• 266 MHz Intel PPro
• 64 MB SDRAM (10ns)
• Dual Fujitsu DMA 3.2GB EIDE disks
Software
• NT workstation 4.3
• NT 5 sort
Performance
• sort 15 M 100-byte records (~1.5 GB)
• Disk to disk
• elapsed time 820 sec
—cpu time = 404 sec
PennySort Machine (1107$) cost breakdown:
• cpu 32%
• Disk 25%
• board 13%
• Memory 8%
• Other 22% (Network, Video, floppy 9%; Cabinet + Assembly 7%; Software 6%)
Sort Speed Doubles Every Year
[Figure: log-scale plot (1E-3 to 1E+8) over 1985 to 2000 with two trend lines: "Records Sorted per Second Doubles Every Year" and "GB Sorted per Dollar Doubles Every Year".]
Recent Results
NOW Sort: 9 GB on a cluster of 100 UltraSparcs in 1 minute
MilleniumSort: 16x Dell NT cluster: 100 MB in 1.8 Sec (Datamation)
Tandem/Sandia Sort: 68 CPU ServerNet 1 TB in 47 minutes
Rumor of IBM Sort: 7000 cpu Blue Pacific 1 TB in 1024 seconds (17 minutes). 10 Mrps (1GBps)
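The "10 Mrps (1GBps)" figure can be recomputed from the data size and elapsed time. A minimal sketch, assuming the benchmark's 100-byte records and decimal units (1 TB = 10^12 bytes); the function name is illustrative.

```python
TERABYTE = 10**12


def sort_rates(total_bytes: int, elapsed_s: float, record_bytes: int = 100):
    """Return (records per second, bytes per second) for a completed sort."""
    bytes_per_s = total_bytes / elapsed_s
    return bytes_per_s / record_bytes, bytes_per_s


if __name__ == "__main__":
    for label, seconds in (("1 TB in 1024 s", 1024), ("1 TB in 47 min", 47 * 60)):
        rps, bps = sort_rates(TERABYTE, seconds)
        print(f"{label}: {rps / 1e6:.2f} Mrps, {bps / 1e9:.3f} GB/s")
```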