Data Centric Computing
Jim Gray, Microsoft Research
Research.Microsoft.com/~Gray/talks
FAST 2002, Monterey, CA, 14 Oct 1999



Put Everything in Future (Disk) Controllers
(it’s not “if”, it’s “when?”)

Jim Gray
Microsoft Research
http://Research.Microsoft.com/~Gray/talks
FAST 2002 Monterey, CA, 14 Oct 1999

Acknowledgements:

Dave Patterson explained this to me long ago.

Leonard Chung, Kim Keeton, Erik Riedel, and Catharine Van Ingen helped me sharpen these arguments.


First Disk 1956

• IBM 305 RAMAC

• 4 MB

• 50x24” disks

• 1200 rpm

• 100 ms access

• 35k$/y rent

• Included computer & accounting software (tubes not transistors)


10 years later

[Figure label: 1.6 meters]


Disk Evolution

• Capacity: 100x in 10 years
– 1 TB 3.5” drive in 2005
– 20 GB as 1” micro-drive

• System on a chip

• High-speed SAN

• Disk replacing tape

• Disk is super computer!

[Scale graphic: Kilo, Mega, Giga, Tera, Peta, Exa, Zetta, Yotta]


Disks are becoming computers

• Smart drives

• Camera with micro-drive

• Replay / Tivo / Ultimate TV

• Phone with micro-drive

• MP3 players

• Tablet

• Xbox

• Many more…

[Diagram: disk controller + 1 GHz cpu + 1 GB RAM, running a stack of: Applications (Web, DBMS, Files); OS; Comm (Infiniband, Ethernet, radio…)]


Data Gravity: Processing Moves to Transducers
(smart displays, microphones, printers, NICs, disks)

[Diagram: Storage, Network, and Display devices, each with an ASIC]

Today: P = 50 mips, M = 2 MB

In a few years: P = 500 mips, M = 256 MB

Processing decentralized

Moving to data sources

Moving to power sources

Moving to sheet metal

? The end of computers ?


It’s Already True of Printers
Peripheral = CyberBrick

• You buy a printer
• You get
– several network interfaces
– a Postscript engine
  • cpu,
  • memory,
  • software,
  • a spooler (soon)
– and… a print engine.


The (absurd?) consequences of Moore’s Law

• 256 way NUMA?

• Huge main memories:
now: 500MB - 64GB memories
then: 10GB - 1TB memories

• Huge disks
now: 20-200 GB 3.5” disks
then: .1 - 1 TB disks

• Petabyte storage farms
– (that you can’t back up or restore).

• Disks >> tapes
– “Small” disks: one platter, one inch, 10GB

• SAN convergence: 1 GBps point to point is easy

• 1 GB RAM chips

• MAD at 200 Gbpsi

• Drives shrink one quantum

• 10 GBps SANs are ubiquitous

• 1 bips cpus for 10$

• 10 bips cpus at high end


The Absurd Design?

• Further segregate processing from storage

• Poor locality

• Much useless data movement

• Amdahl’s laws: bus: 10 B/ips io: 1 b/ips

[Diagram: Processors (~ 1 Tips), RAM (~ 1 TB), Disks (~ 100TB); link labels: 100 GBps, 10 TBps]
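
A rough check of the bandwidth figures on this slide. This is only a sketch: it assumes the two Amdahl rules are read as 10 bytes of bus traffic and 1 bit of I/O per instruction per second, and the function name is made up for illustration.

```python
# Amdahl's balance rules as quoted on the slide (the unit reading is an assumption).
BUS_BYTES_PER_IPS = 10   # bus: 10 bytes per instruction per second
IO_BITS_PER_IPS = 1      # io: 1 bit per instruction per second

def balanced_bandwidth(ips):
    """Return (bus bytes/s, io bytes/s) needed to keep `ips` instructions/s busy."""
    bus = ips * BUS_BYTES_PER_IPS
    io = ips * IO_BITS_PER_IPS / 8   # convert bits to bytes
    return bus, io

bus, io = balanced_bandwidth(1e12)   # ~1 Tips of processing
print(f"bus: {bus / 1e12:.0f} TBps, io: {io / 1e9:.0f} GBps")
```

Under that reading, 1 Tips wants about 10 TBps of bus bandwidth and on the order of 100 GBps of I/O, which is the scale of data movement the centralized design has to pay for.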


What’s a Balanced System?
(40+ disk arms / cpu)

[Diagram labels: System Bus; PCI Bus; PCI Bus]


Observations re TPC C, H systems

• More than ½ the hardware cost is in disks

• Most of the mips are in the disk controllers

• 20 mips/arm is enough for tpcC

• 50 mips/arm is enough for tpcH

• Need 128MB to 256MB/arm

• Ref:
– Gray & Shenoy: “Rules of Thumb…”
– Keeton, Riedel, Uysal, PhD thesis.

? The end of computers ?
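
The mips-per-arm rules above can be turned into an arms-per-cpu estimate. A minimal sketch, assuming a 1 bips (1,000 mips) processor; that figure and the helper name are illustrative choices, not part of the observations.

```python
def arms_per_cpu(cpu_mips, mips_per_arm):
    """Disk arms one CPU can keep busy under a mips-per-arm rule of thumb."""
    return cpu_mips / mips_per_arm

CPU_MIPS = 1000  # assumption: a 1 bips processor

for workload, mips_per_arm in [("tpcC", 20), ("tpcH", 50)]:
    print(workload, arms_per_cpu(CPU_MIPS, mips_per_arm), "arms per cpu")
```

With these inputs a single processor covers tens of arms, which is why putting comparable mips next to each arm makes the central cpu look optional.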


When each disk has 1bips, no need for ‘cpu’


Implications

Conventional:

• Offload device handling to NIC/HBA

• Higher level protocols: I2O, NASD, VIA, IP, TCP…

• SMP and Cluster parallelism is important.

Radical:

• Move app to NIC/device controller

• Higher-higher level protocols: CORBA / COM+.

• Cluster parallelism is VERY important.

[Diagram labels: Central Processor & Memory; Terabyte/s Backplane]


Interim Step: Shared Logic

• Brick with 8-12 disk drives
• 200 mips/arm (or more)
• 2x Gbps Ethernet
• General purpose OS (except NetApp)
• 10k$/TB to 50k$/TB
• Shared
– Sheet metal
– Power
– Support/Config
– Security
– Network ports

Examples:
Snap™: ~1TB, 12x80GB NAS
NetApp™: ~.5TB, 8x70GB NAS
Maxstor™: ~2TB, 12x160GB NAS


Gordon Bell’s Seven Price Tiers

10$: wrist watch computers

100$: pocket/ palm computers

1,000$: portable computers

10,000$: personal computers (desktop)

100,000$: departmental computers (closet)

1,000,000$: site computers (glass house)

10,000,000$: regional computers (glass castle)

Super-Server: Costs more than 100,000 $
“Mainframe”: Costs more than 1M$
Must be an array of processors, disks, tapes, comm ports


Bell’s Evolution of Computer Classes

Technology enables two evolutionary paths:

1. constant performance, decreasing cost
2. constant price, increasing performance

[Chart: log price vs. time for Mainframes (central), Minis (dep’t.), WSs, and PCs (personals)]

1.26 = 2x/3 yrs -- 10x/decade; 1/1.26 = .8
1.6 = 4x/3 yrs -- 100x/decade; 1/1.6 = .62
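
The two annual factors can be checked by compounding them. A small sketch; the helper name is illustrative.

```python
def growth(annual_factor, years):
    """Compound an annual improvement factor over a number of years."""
    return annual_factor ** years

for annual in (1.26, 1.6):
    print(f"{annual}/yr: {growth(annual, 3):.2f}x in 3 yrs, "
          f"{growth(annual, 10):.1f}x per decade, "
          f"price factor {1 / annual:.2f}/yr at constant performance")
```

The reciprocal is the yearly price multiplier on the "constant performance, decreasing cost" path: what cost 1 this year costs roughly .8 (or .62) next year.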


NAS vs SAN

• Network Attached Storage
– File servers
– Database servers
– Application servers
– (it’s a slippery slope: as Novell showed)

• Storage Area Network
– A lower life form
– Block server: get block / put block
– Wrong abstraction level (too low level)
– Security is VERY hard to understand.
  • (who can read that disk block?)

SCSI and iSCSI are popular.

High level Interfaces are better


How Do They Talk to Each Other?

• Each node has an OS
• Each node has local resources: A federation.
• Each node does not completely trust the others.
• Nodes use RPC to talk to each other
– WebServices/SOAP? CORBA? COM+? RMI?
– One or all of the above.
• Huge leverage in high-level interfaces.
• Same old distributed system story.

[Diagram: two nodes joined by a SAN; each node’s stack is labeled SIO, streams, datagrams, RPC?, Applications]


The Slippery Slope

• If you add function to server

• Then you add more function to server

• Function gravitates to data.

Nothing = Sector Server

Everything = App Server

Something = Fixed App Server


Why Not a Sector Server?
(let’s get physical!)

• Good idea, that’s what we have today.

• But
– cache added for performance
– sector remap added for fault tolerance
– error reporting and diagnostics added
– SCSI commands (reserve, …) are growing
– sharing is problematic (space mgmt, security, …)

• Slipping down the slope to a 2-D block server


Why Not a 1-D Block Server?
Put A LITTLE on the Disk Server

• Tried and true design
– HSC - VAX cluster
– EMC
– IBM Sysplex (3980?)

• But look inside
– Has a cache
– Has space management
– Has error reporting & management
– Has RAID 0, 1, 2, 3, 4, 5, 10, 50,…
– Has locking
– Has remote replication
– Has an OS
– Security is problematic
– Low-level interface moves too many bytes
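
A toy illustration of "low-level interface moves too many bytes". This is an in-memory sketch, not a real storage protocol: the class names, the 4 KB block size, and the "HIT" marker are all hypothetical.

```python
BLOCK_SIZE = 4096  # bytes per block; an illustrative value

class BlockServer:
    """Low-level interface: the client can only fetch whole blocks."""
    def __init__(self, blocks):
        self.blocks = blocks
        self.bytes_sent = 0

    def get_block(self, i):
        self.bytes_sent += len(self.blocks[i])
        return self.blocks[i]

class FilterServer(BlockServer):
    """Higher-level interface: the predicate runs next to the data."""
    def select(self, predicate):
        hits = [b for b in self.blocks if predicate(b)]
        self.bytes_sent += sum(len(b) for b in hits)
        return hits

# 1,000 blocks, of which 10 start with the marker the client wants.
blocks = [(b"HIT" if i % 100 == 0 else b"---").ljust(BLOCK_SIZE, b".")
          for i in range(1000)]
wanted = lambda b: b.startswith(b"HIT")

low = BlockServer(blocks)
client_side = [b for b in (low.get_block(i) for i in range(len(blocks))) if wanted(b)]

high = FilterServer(blocks)
server_side = high.select(wanted)

print(low.bytes_sent, high.bytes_sent)  # every block shipped vs. only the matches
```

Both paths return the same ten blocks; the block interface ships all 1,000 to find them, which is the pressure that pulls function toward the data.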


Why Not a 2-D Block Server?
Put A LITTLE on the Disk Server

• Tried and true design
– Cedar -> NFS
– file server, cache, space,..
– Open file is many fewer msgs

• Grows to have
– Directories + Naming
– Authentication + access control
– RAID 0, 1, 2, 3, 4, 5, 10, 50,…
– Locking
– Backup/restore/admin
– Cooperative caching with client


Why Not a File Server?
Put a Little on the 2-D Block Server

• Tried and true design
– NetWare, Windows, Linux, NetApp, Cobalt, SNAP, ... WebDav

• Yes, but look at NetWare
– File interface grew
– Became an app server
  • Mail, DB, Web,….
– NetWare had a primitive OS
  • Hard to program, so optimized wrong thing


Why Not Everything?

Allow Everything on Disk Server
(thin clients)

• Tried and true design
– Mainframes, Minis, ...
– Web servers,…
– Encapsulates data
– Minimizes data moves
– Scalable

• It is where everyone ends up.

• All the arguments against are short-term.


The Slippery Slope

• If you add function to server

• Then you add more function to server

• Function gravitates to data.

Nothing = Sector Server

Everything = App Server

Something = Fixed App Server


Disk = Node

• has magnetic storage (1TB?)

• has processor & DRAM

• has SAN attachment

• has execution environment

[Diagram: software stack on the disk node. Labels: Applications; Services, DBMS; File System, RPC, ...; OS Kernel; SAN driver, Disk driver]


Hardware

• Homogeneous machines lead to quick response through reallocation

• HP desktop machines, 320MB RAM, 3u high, 4 x 100GB IDE drives

• $4k/TB (street), 2.5 processors/TB, 1GB RAM/TB

• 3 weeks from ordering to operational

Slide courtesy of Brewster Kahle, @ Archive.org
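
The per-terabyte ratios follow from the machine configuration. A quick sketch, assuming one processor per machine (the slide does not say so directly, but 2.5 processors/TB implies it).

```python
DRIVES_PER_MACHINE = 4
DRIVE_GB = 100
RAM_MB_PER_MACHINE = 320
PROCESSORS_PER_MACHINE = 1   # assumption, inferred from the 2.5 processors/TB figure

tb_per_machine = DRIVES_PER_MACHINE * DRIVE_GB / 1000
machines_per_tb = 1 / tb_per_machine
processors_per_tb = machines_per_tb * PROCESSORS_PER_MACHINE
ram_gb_per_tb = machines_per_tb * RAM_MB_PER_MACHINE / 1000

print(f"{processors_per_tb:.1f} processors/TB, {ram_gb_per_tb:.1f} GB RAM/TB")
```

That gives 2.5 processors and about 0.8 GB of RAM per terabyte, which the slide rounds to 1GB RAM/TB.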


Disk as Tape

• Tape is unreliable, specialized, slow, low density, not improving fast, and expensive

• Using removable hard drives to replace tape’s function has been successful

• When a “tape” is needed, the drive is put in a machine and it is online. No need to copy from tape before it is used.

• Portable, durable, fast, media cost = raw tapes, dense. Unknown longevity: suspected good.

Slide courtesy of Brewster Kahle, @ Archive.org


Disk As Tape: What format?

• Today I send NTFS/SQL disks.
• But that is not a good format for Linux.
• Solution: Ship NFS/CIFS/ODBC servers (not disks)
• Plug “disk” into LAN.
– DHCP then file or DB server via standard interface.
– Web Service in long term


Some Questions

• Will the disk folks deliver?

• What is the product?

• How do I manage 1,000 nodes (disks)?

• How do I program 1,000 nodes (disks)?

• How does RAID work?

• How do I backup a PB?

• How do I restore a PB?


Will the disk folks deliver? Maybe!

[Chart: Total Hard Drive Unit Shipments, 1986 to 2001; y-axis: units in millions, 0 to 250. Source: DiskTrend/IDC]

Not a pretty picture (lately)


Most Disks are Personal

• 85% of disks are desktop/mobile (not SCSI)

• Personal media is AT LEAST 50% of the problem.

• How to manage your shoebox of:

– Documents

– Voicemail

– Photos

– Music

– Videos

[Charts: a personal store of 27.1K files & 42K .msg, 17.7 GB, broken down by size and by number of files. File types shown: .doc/html, .jpg, .gif, .xls, .pdf, .ppt, .tif]

Music: 6.9 GB, 1.8K files, 180 CDs
Working: 2.3 GB, 432 folders, 2.9K files
Archive: 5.1 GB, 477 folders, 18.7K files
Video: 2.6 GB, 10 hours, low res
My Books: 98 MB
Mail: .7 GB, 43K msgs


What is the Product?
(see next section on media management)

• Concept: Plug it in and it works!
• Music/Video/Photo appliance (home)
• Game appliance
• “PC”
• File server appliance
• Data archive/interchange appliance
• Web appliance
• Email appliance
• Application appliance
• Router appliance

[Diagram: appliance with two connections, labeled power and network]


Auto Manage Storage

• 1980 rule of thumb:
– A DataAdmin per 10GB, SysAdmin per mips

• 2000 rule of thumb
– A DataAdmin per 5TB
– SysAdmin per 100 clones (varies with app).

• Problem:
– 5TB is 50k$ today, 5k$ in a few years.
– Admin cost >> storage cost !!!!

• Challenge:
– Automate ALL storage admin tasks
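
To make "admin cost >> storage cost" concrete, here is a sketch of the ratio at one DataAdmin per 5TB. The yearly admin cost is a hypothetical placeholder, not a number from the talk; only the 50k$ and 5k$ storage prices come from the slide.

```python
ADMIN_COST_PER_YEAR = 100_000  # hypothetical loaded cost in $, NOT from the talk

def admin_to_storage_ratio(storage_cost, admin_cost=ADMIN_COST_PER_YEAR):
    """Yearly admin cost divided by the purchase price of the storage managed."""
    return admin_cost / storage_cost

for label, cost_of_5tb in [("today", 50_000), ("in a few years", 5_000)]:
    print(label, admin_to_storage_ratio(cost_of_5tb))
```

Whatever admin cost is plugged in, the ratio grows tenfold as 5TB drops from 50k$ to 5k$, unless the rule of thumb itself changes.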


How do I manage 1,000 nodes?

• You can’t manage 1,000 x (for any x).
• They manage themselves.
– You manage exceptional exceptions.

• Auto Manage
– Plug & Play hardware
– Auto-load balance & placement storage & processing
– Simple parallel programming model
– Fault masking

• Some positive signs:
– Few admins at
  Google: 10k nodes, 2 PB
  Yahoo!: ? nodes, 0.3 PB
  Hotmail: 10k nodes, 0.3 PB


How do I program 1,000 nodes?

• You can’t program 1,000 x (for any x).

• They program themselves.
– You write embarrassingly parallel programs
– Examples: SQL, Web, Google, Inktomi, HotMail,….
– PVM and MPI prove it must be automatic (unless you have a PhD)!

• Auto Parallelism is ESSENTIAL
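
A minimal sketch of what "embarrassingly parallel" means in practice: each partition is processed on its own and only a small result comes back. Threads stand in for nodes here, and the function names and sample data are illustrative; the talk does not prescribe any particular framework.

```python
from concurrent.futures import ThreadPoolExecutor

def count_matches(partition, word="disk"):
    """Work done independently on one node's partition; no cross-node communication."""
    return sum(line.count(word) for line in partition)

def parallel_count(partitions, workers=4):
    # Each partition stands in for one node's local data; only small counts come back.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(count_matches, partitions))

partitions = [["disk disk cpu", "tape"], ["disk"], ["cpu", "disk tape disk"]]
print(parallel_count(partitions))
```

The programmer writes the per-partition function once; spreading it over 3 or 1,000 partitions is the runtime's job, which is the sense in which the nodes "program themselves".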


Plug & Play Software

• RPC is standardizing: (SOAP/HTTP, COM+, RMI/IIOP)
– Gives huge TOOL LEVERAGE
– Solves the hard problems:
  • naming,
  • security,
  • directory service,
  • operations,...

• Commoditized programming environments
– FreeBSD, Linux, Solaris,… + tools
– NetWare + tools
– WinCE, WinNT,… + tools
– JavaOS + tools

• Apps gravitate to data.

• General purpose OS on dedicated ctlr can run apps.


It’s Hard to Archive a Petabyte
It takes a LONG time to restore it.

• At 1GBps it takes 12 days!
• Store it in two (or more) places online (on disk?): a geo-plex
• Scrub it continuously (look for errors)
• On failure,
– use other copy until failure repaired,
– refresh lost copy from safe copy.
• Can organize the two copies differently (e.g.: one by time, one by space)
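
The restore-time figure is straightforward arithmetic. A small sketch, assuming decimal units (1 PB = 10^15 bytes, 1 GBps = 10^9 bytes/s) and a transfer that runs at full rate the whole time.

```python
def transfer_days(num_bytes, bytes_per_second):
    """Days needed to move num_bytes at a sustained rate."""
    return num_bytes / bytes_per_second / 86_400  # 86,400 seconds per day

PETABYTE = 10**15
print(f"{transfer_days(PETABYTE, 10**9):.1f} days")
```

That is a million seconds, a bit under 12 days, before counting any slowdowns, which is the case for keeping a second online copy rather than relying on restore.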


CyberBricks

• Disks are becoming supercomputers.
• Each disk will be a file server then SOAP server
• Multi-disk bricks are transitional
• Long-term brick will have OS per disk.
• Systems will be built from bricks.

• There will also be

– Network Bricks

– Display Bricks

– Camera Bricks

– ….



Communications Excitement!!

                Point-to-Point         Broadcast
Immediate       conversation, money    lecture, concert
Time Shifted    mail                   book, newspaper

Diagram labels: Network + DB; Data Base

It’s ALL going electronic
Information is being stored for analysis (so ALL database)
Analysis & Automatic Processing are being added

Slide borrowed from Craig Mundie


Information Excitement!

• But comm just carries information

• Real value added is
– information capture & render: speech, vision, graphics, animation, …
– information storage & retrieval
– information analysis


Information At Your Fingertips

• All information will be in an online database (somewhere)

• You might record everything you
– read: 10MB/day, 400 GB/lifetime (5 disks today)
– hear: 400MB/day, 16 TB/lifetime (2 disks/year today)
– see: 1MB/s, 40GB/day, 1.6 PB/lifetime (150 disks/year maybe someday)

• Data storage, organization, and analysis is the challenge.
– text, speech, sound, vision, graphics, spatial, time…

• Information at Your Fingertips
– Make it easy to capture
– Make it easy to store & organize & analyze
– Make it easy to present & access
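
The read/hear/see figures share one implied lifetime and one implied viewing day. A sketch that backs both out of the slide's numbers; the helper name is illustrative and units are taken as decimal.

```python
def implied_days(lifetime_bytes, bytes_per_day):
    """Number of recording days implied by a daily rate and a lifetime total."""
    return lifetime_bytes / bytes_per_day

# (bytes per day, bytes per lifetime), both as quoted on the slide
figures = {
    "read": (10e6, 400e9),
    "hear": (400e6, 16e12),
    "see": (40e9, 1.6e15),
}

for sense, (per_day, lifetime) in figures.items():
    days = implied_days(lifetime, per_day)
    print(f"{sense}: {days:,.0f} days, about {days / 365:.0f} years")

# 40 GB/day at 1 MB/s implies this many hours of "seeing" per day:
print(f"{40e9 / 1e6 / 3600:.1f} hours/day")
```

All three rows work out to the same 40,000 recording days, and the video rate corresponds to roughly 11 waking hours per day rather than a full 24.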


How much information is there?

• Soon everything can be recorded and indexed

• Most bytes will never be seen by humans.

• Data summarization, trend detection, anomaly detection are key technologies

See Mike Lesk: How much information is there: http://www.lesk.com/mlesk/ksg97/ksg.html

See Lyman & Varian: How much information: http://www.sims.berkeley.edu/research/projects/how-much-info/

[Chart: information scale from Kilo up to Yotta, with markers: A Photo; A Book; A Movie; All LoC books (words); All Books MultiMedia; Everything! Recorded]

Small prefixes: 24 yocto, 21 zepto, 18 atto, 15 femto, 12 pico, 9 nano, 6 micro, 3 milli


Why Put Everything in Cyberspace?

Low rent: min $/byte

Shrinks time: now or later

Shrinks space: here or there

Automate processing: knowbots

[Diagram axes: Point-to-Point OR Broadcast; Immediate OR Time Delayed]

Locate, Process, Analyze, Summarize


Disk Storage Cheaper than Paper

• File Cabinet:
  cabinet (4 drawer)      250$
  paper (24,000 sheets)   250$
  space (2x3 @ 10$/ft2)   180$
  total                   700$
  3 ¢/sheet

• Disk:
  disk (160 GB)           300$
  ASCII: 100 m pages      0.0003 ¢/sheet (10,000x cheaper)
  Image: 1 m photos       0.03 ¢/sheet (100x cheaper)

• Store everything on disk
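
The per-sheet comparison can be recomputed from the line items. A sketch using only the prices and counts on the slide; the helper name is illustrative.

```python
def cents_per_sheet(total_dollars, sheets):
    """Cost per sheet (or page, or photo) in cents."""
    return 100 * total_dollars / sheets

paper = cents_per_sheet(250 + 250 + 180, 24_000)   # cabinet + paper + floor space
ascii_disk = cents_per_sheet(300, 100e6)           # 100 million ASCII pages on one disk
image_disk = cents_per_sheet(300, 1e6)             # 1 million photos on one disk

print(f"paper {paper:.1f} c, ascii {ascii_disk:.4f} c, image {image_disk:.2f} c")
print(f"ratios: {paper / ascii_disk:,.0f}x and {paper / image_disk:,.0f}x")
```

The ratios come out a little under 10,000x and 100x, so the slide's round multipliers hold to within rounding.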


Gordon Bell’s MainBrain™
Digitize Everything

A BIG shoebox?

• Scans: 20 k “pages” tiff @ 300 dpi, 1 GB
• Music: 2 k “tracks”, 7 GB
• Photos: 13 k images, 2 GB
• Video: 10 hrs, 3 GB
• Docs: 3 k (ppt, word,..), 2 GB
• Mail: 50 k messages, 1 GB

Total: 16 GB


Gary Starkweather

• Scan EVERYTHING

• 400 dpi TIFF

• 70k “pages” ~ 14GB

• OCR all scans (98% OCR recognition accuracy)

• All indexed (5 second access to anything)

• All on his laptop.


• Q: What happens when the personal terabyte arrives?

• A: Things will run SLOWLY… unless we add good software


Summary

• Disks will morph to appliances

• Main barriers to this happening
– Lack of Cool Apps
– Cost of Information management