Scaleable Computing Jim Gray Microsoft Corporation Gray@Microsoft

Page 1: Scaleable Computing Jim Gray Microsoft Corporation Gray@Microsoft

Scaleable Computing

Jim Gray
Microsoft Corporation

Gray@Microsoft.com

Page 2: Scaleable Computing Jim Gray Microsoft Corporation Gray@Microsoft

Thesis: Scaleable Servers
• Scaleable servers
    Commodity hardware allows new applications
    New applications need huge servers
    Clients and servers are built of the same “stuff”: commodity software and commodity hardware
• Servers should be able to
    Scale up (grow node by adding CPUs, disks, networks)
    Scale out (grow by adding nodes)
    Scale down (can start small)
• Key software technologies
    Objects, Transactions, Clusters, Parallelism

Page 3: Scaleable Computing Jim Gray Microsoft Corporation Gray@Microsoft

1987: 256 tps Benchmark
• 14 M$ computer (Tandem)
• A dozen people
• False floor, 2 rooms of machines

Simulate 25,600 clients

A 32 node processor array

A 40 GB disk array (80 drives)

OS expert

Network expert

DB expert

Performance expert

Hardware experts

Admin expert

Auditor

Manager

Page 4: Scaleable Computing Jim Gray Microsoft Corporation Gray@Microsoft

1988: DB2 + CICS Mainframe, 65 tps
• IBM 4391
• Simulated network of 800 clients
• 2 M$ computer
• Staff of 6 to do benchmark

2 x 3725 network controllers

16 GB disk farm (4 x 8 x .5 GB)

Refrigerator-sized CPU

Page 5: Scaleable Computing Jim Gray Microsoft Corporation Gray@Microsoft

1997: 10 Years Later
1 Person and 1 Box = 1250 tps
• 1 breadbox ~ 5x 1987 machine room
• 23 GB is hand-held
• One person does all the work
• Cost/tps is 1,000x less: 25 micro dollars per transaction

4 x 200 MHz CPU, 1/2 GB DRAM, 12 x 4 GB disk

Hardware expert, OS expert, Net expert, DB expert, App expert

3 x 7 x 4 GB disk arrays

Page 6: Scaleable Computing Jim Gray Microsoft Corporation Gray@Microsoft

What Happened?
• Moore’s law: things get 4x better every 3 years (applies to computers, storage, and networks)
• New economics: commodity

    class            price/mips ($/mips)    software (k$/year)
    mainframe        10,000                 100
    minicomputer     100                    10
    microcomputer    10                     1

• GUI: human-computer tradeoff; optimize for people, not computers

[Figure: price versus time for mainframe, mini, and micro]
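The compounding behind "4x better every 3 years" can be checked with a few lines of Python. This is only an illustrative sketch: the function name `growth_factor` and its parameters are made up for the example and do not come from the talk.

```python
def growth_factor(years, factor=4.0, period_years=3.0):
    """Improvement after `years` if things get `factor` times better
    every `period_years` (the slide's statement of Moore's law)."""
    return factor ** (years / period_years)


if __name__ == "__main__":
    for years in (3, 6, 10):
        print(f"{years:2d} years: about {growth_factor(years):.0f}x")
```

Under this rule ten years compounds to roughly 100x, so the 1,000x drop in cost/tps reported elsewhere in the deck is larger than the hardware trend alone would give.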

Page 7: Scaleable Computing Jim Gray Microsoft Corporation Gray@Microsoft

What Happens Next
• Last 10 years: 1000x improvement
• Next 10 years: ????
• Today: text and image servers are free
    25 $/hit => advertising pays for them
• Future: video, audio, … servers are free
“You ain’t seen nothing yet!”

[Figure: performance versus time, 1985 to 2005]

Page 8: Scaleable Computing Jim Gray Microsoft Corporation Gray@Microsoft

Kinds Of Information Processing

                    Point-to-point         Broadcast
    Immediate       Conversation, Money    Lecture, Concert     (Network)
    Time-shifted    Mail                   Book, Newspaper      (Database)

• It’s ALL going electronic
• Immediate is being stored for analysis (so ALL database)
• Analysis and automatic processing are being added

Page 9: Scaleable Computing Jim Gray Microsoft Corporation Gray@Microsoft

Why Put Everything In Cyberspace?
• Low rent: min $/byte
• Shrinks time: now or later
• Shrinks space: here or there
• Automate processing: knowbots

[Figure: point-to-point OR broadcast, immediate OR time-delayed; network and database; locate, process, analyze, summarize]

Page 10: Scaleable Computing Jim Gray Microsoft Corporation Gray@Microsoft

Magnetic Storage Cheaper Than Paper
• File cabinet:
    cabinet (four drawer)        250$
    paper (24,000 sheets)        250$
    space (2x3 @ 10$/ft²)        180$
    total                        700$
    3¢/sheet
• Disk:
    disk (4 GB)                  800$
    ASCII: 2 mil pages           0.04¢/sheet (80x cheaper)
    Image: 200,000 pages         0.4¢/sheet (8x cheaper)
• Store everything on disk
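The per-sheet costs on this slide are simple divisions, reproduced below as a small Python sketch. The helper name `cents_per_sheet` is hypothetical; the dollar and page counts are the slide's own totals.

```python
def cents_per_sheet(total_dollars, sheets):
    """Cost of storing one sheet, in cents."""
    return 100.0 * total_dollars / sheets


# Figures taken from the slide.
paper = cents_per_sheet(700, 24_000)          # file cabinet, paper, floor space
disk_ascii = cents_per_sheet(800, 2_000_000)  # 4 GB disk holding ASCII pages
disk_image = cents_per_sheet(800, 200_000)    # 4 GB disk holding page images

if __name__ == "__main__":
    print(f"paper:      {paper:.2f} cents/sheet")
    print(f"disk ASCII: {disk_ascii:.2f} cents/sheet ({paper / disk_ascii:.0f}x cheaper)")
    print(f"disk image: {disk_image:.2f} cents/sheet ({paper / disk_image:.1f}x cheaper)")
```

The unrounded ratios come out slightly below the slide's 80x and 8x, which appear to be rounded from 3¢ per sheet.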

Page 11: Scaleable Computing Jim Gray Microsoft Corporation Gray@Microsoft

Databases: Information at Your Fingertips™, Information Network™, Knowledge Navigator™
• All information will be in an online database (somewhere)
• You might record everything you
    Read: 10 MB/day, 400 GB/lifetime (eight tapes today)
    Hear: 400 MB/day, 16 TB/lifetime (three tapes/year today)
    See: 1 MB/s, 40 GB/day, 1.6 PB/lifetime (maybe someday)
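The lifetime totals follow from the daily rates if a lifetime is taken as 40,000 days. That figure is an assumption inferred from the slide's own numbers (400 GB divided by 10 MB/day), not something the slide states. A minimal Python sketch using decimal units:

```python
MB, GB, TB, PB = 10**6, 10**9, 10**12, 10**15

# Assumption: lifetime length implied by the slide (400 GB / 10 MB per day).
LIFETIME_DAYS = 40_000


def lifetime_bytes(bytes_per_day):
    """Total bytes recorded over the assumed lifetime."""
    return bytes_per_day * LIFETIME_DAYS


if __name__ == "__main__":
    print("read:", lifetime_bytes(10 * MB) / GB, "GB")
    print("hear:", lifetime_bytes(400 * MB) / TB, "TB")
    print("see: ", lifetime_bytes(40 * GB) / PB, "PB")
    # 40 GB/day at 1 MB/s corresponds to 40,000 seconds of recording per day.
    print("seconds of video per day:", (40 * GB) // MB)
```

At 1 MB/s, 40 GB/day is about 11 hours of recording, which suggests the slide counts waking hours rather than a full 24-hour day.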

Page 12: Scaleable Computing Jim Gray Microsoft Corporation Gray@Microsoft

Database Store ALL Data Types
• The old world:
    Millions of objects
    100-byte objects
• The new world:
    Billions of objects
    Big objects (1 MB)
    Objects have behavior (methods)

[Figure: the old People table has Name and Address columns (names Mike, Won, David; addresses NY, Berk, Austin); the new People table adds Papers, Picture, and Voice columns]

• Paperless office
• Library of Congress online
• All information online
    Entertainment
    Publishing
    Business
• WWW and Internet

Page 13: Scaleable Computing Jim Gray Microsoft Corporation Gray@Microsoft

Billions Of Clients
• Every device will be “intelligent”
• Doors, rooms, cars…
• Computing will be ubiquitous

Page 14: Scaleable Computing Jim Gray Microsoft Corporation Gray@Microsoft

Billions Of Clients Need Millions Of Servers
• All clients networked to servers
    May be nomadic or on-demand
• Fast clients want faster servers
• Servers provide
    Shared data
    Control
    Coordination
    Communication

[Figure: clients (mobile clients, fixed clients) connected to servers (server, super server)]

Page 15: Scaleable Computing Jim Gray Microsoft Corporation Gray@Microsoft

Thesis: Many Little Beat Few Big
• Smoking, hairy golf ball
• How to connect the many little parts?
• How to program the many little parts?
• Fault tolerance?
• 1 M SPECmarks, 1 TFLOP
• 10^6 clocks to bulk ram
• Event-horizon on chip
• VM reincarnated
• Multiprogram cache, on-chip SMP

[Figure: mainframe ($1 million), mini ($100 K), micro ($10 K), nano; disk form factors 14", 9", 5.25", 3.5", 2.5", 1.8"; pico processor, 1 mm³; storage hierarchy: 10 pico-second ram, 10 nano-second ram, 10 microsecond ram, 10 millisecond disc, 10 second tape archive; capacities 1 MB, 100 MB, 10 GB, 1 TB, 100 TB]

Page 16: Scaleable Computing Jim Gray Microsoft Corporation Gray@Microsoft

Future Super Server: 4T Machine
• Array of 1,000 4B machines
    1 bps processors
    1 BB DRAM
    10 BB disks
    1 Bbps comm lines
    1 TB tape robot
• A few megabucks
• Challenge:
    Manageability
    Programmability
    Security
    Availability
    Scaleability
    Affordability
• As easy as a single system

Future servers are CLUSTERS of processors, discs
Distributed database techniques make clusters work

[Figure: Cyber Brick, a 4B machine: CPU, 5 GB RAM, 50 GB disc]

Page 17: Scaleable Computing Jim Gray Microsoft Corporation Gray@Microsoft

The Hardware Is In Place… And then a miracle occurs ?
• SNAP: scaleable network and platforms
• Commodity-distributed OS built on:
    Commodity platforms
    Commodity network interconnect
• Enables parallel applications

Page 18: Scaleable Computing Jim Gray Microsoft Corporation Gray@Microsoft

Thesis: Scaleable Servers
• Scaleable servers
    Commodity hardware allows new applications
    New applications need huge servers
    Clients and servers are built of the same “stuff”: commodity software and commodity hardware
• Servers should be able to
    Scale up (grow node by adding CPUs, disks, networks)
    Scale out (grow by adding nodes)
    Scale down (can start small)
• Key software technologies
    Objects, Transactions, Clusters, Parallelism

Page 19: Scaleable Computing Jim Gray Microsoft Corporation Gray@Microsoft

Scaleable Servers: BOTH SMP And Cluster
• Grow up with SMP; 4xP6 is now standard
• Grow out with cluster
• Cluster has inexpensive parts

[Figure: personal system, departmental server, SMP super server, cluster of PCs]

Page 20: Scaleable Computing Jim Gray Microsoft Corporation Gray@Microsoft

SMPs Have Advantages
• Single system image: easier to manage, easier to program (threads in shared memory, disk, Net)
• 4x SMP is commodity
• Software capable of 16x
• Problems:
    >4 not commodity
    Scale-down problem (starter systems expensive)
    There is a BIGGEST one

[Figure: personal system, departmental server, SMP super server]

Page 21: Scaleable Computing Jim Gray Microsoft Corporation Gray@Microsoft

Building the Largest Node
• There is a biggest node (size grows over time)
• Today, with NT, it is probably 1 TB
• We are building it (with help from DEC and SPIN2)
    1 TB GeoSpatial SQL Server database (1.4 TB of disks = 320 drives)
    30K BTU, 8 KVA, 1.5 metric tons
• Will put it on the Web as a demo app
    10 meter image of the ENTIRE PLANET
    2 meter image of interesting parts (2% of land)
    One pixel per meter = 500 TB uncompressed
    Better resolution in US (courtesy of USGS)

www.SQL.1TB.com

[Figure: 1-TB SQL Server DB of satellite and aerial photos, with support files]

[Figure: 1-TB home page]

Page 22: Scaleable Computing Jim Gray Microsoft Corporation Gray@Microsoft

What’s a TeraByte?
• 1 Terabyte:
    1,000,000,000 business letters     150 miles of book shelf
    100,000,000 book pages             15 miles of book shelf
    50,000,000 FAX images              7 miles of book shelf
    10,000,000 TV pictures (mpeg)      10 days of video
    4,000 LandSat images               16 earth images (100m)
    100,000,000 web pages              10 copies of the web HTML
• Library of Congress (in ASCII) is 25 TB
• 1980: $200 million of disc (10,000 discs)
        $5 million of tape silo (10,000 tapes)
• 1997: 200 k$ of magnetic disc (48 discs)
        30 k$ of nearline tape (20 tapes)

Terror Byte!

Page 23: Scaleable Computing Jim Gray Microsoft Corporation Gray@Microsoft

TB DB User Interface

[Figure: TB DB user interface]

Page 24: Scaleable Computing Jim Gray Microsoft Corporation Gray@Microsoft

TPC-C Web-Based Benchmarks
• Client is a Web browser (7,500 of them!)
• Submits
    Order
    Invoice
    Query
  to server via Web page interface
• Web server translates to DB
• SQL does DB work
• Net:
    easy to implement
    performance is GREAT!

[Figure: browsers reach IIS (= Web) over HTTP; IIS reaches SQL over ODBC]

Page 25: Scaleable Computing Jim Gray Microsoft Corporation Gray@Microsoft

Grow UP and OUT

[Figure: personal system, departmental server, SMP super server; 1 Terabyte DB; 1 billion transactions per day]

Cluster:
• a collection of nodes
• as easy to program and manage as a single node

Page 26: Scaleable Computing Jim Gray Microsoft Corporation Gray@Microsoft

Clusters Have Advantages
• Clients and servers made from the same stuff
• Inexpensive: built with commodity components
• Fault tolerance: spare modules mask failures
• Modular growth: grow by adding small modules
• Unlimited growth: no biggest one

Page 27: Scaleable Computing Jim Gray Microsoft Corporation Gray@Microsoft

Windows NT Clusters
• Microsoft & 60 vendors defining NT clusters
    Almost all big hardware and software vendors involved
• No special hardware needed - but it may help
• Fault-tolerant first, scaleable second
    Microsoft, Oracle, SAP giving demos today
• Enables
    Commodity fault-tolerance
    Commodity parallelism (data mining, virtual reality…)
    Also great for workgroups!

Page 28: Scaleable Computing Jim Gray Microsoft Corporation Gray@Microsoft

Billion Transactions per Day Project
• Building a 20-node Windows NT Cluster (with help from Intel), > 800 disks
• All commodity parts
• Using SQL Server & DTC distributed transactions
• Each node has 1/20th of the DB
• Each node does 1/20th of the work
• 15% of the transactions are “distributed”
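One way to picture "each node has 1/20th of the DB" is a key-to-node mapping, where a transaction is "distributed" when its keys land on more than one node. The Python sketch below is illustrative only: the modulo partitioning rule and the names `node_of`, `is_distributed`, and `distributed_fraction` are assumptions for the example, not the project's actual design.

```python
NODES = 20  # cluster size from the slide


def node_of(key):
    """Map an integer key to the node that owns it (assumed modulo scheme)."""
    return key % NODES


def is_distributed(keys):
    """A transaction is distributed if its keys live on more than one node."""
    return len({node_of(k) for k in keys}) > 1


def distributed_fraction(transactions):
    """Fraction of transactions that touch more than one node."""
    if not transactions:
        return 0.0
    return sum(is_distributed(t) for t in transactions) / len(transactions)


if __name__ == "__main__":
    workload = [[5, 25], [5, 6], [1], [2, 3]]
    print(distributed_fraction(workload))
```

Local transactions commit on a single node; only the distributed ones need a coordinator such as the DTC named on the slide.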

Page 29: Scaleable Computing Jim Gray Microsoft Corporation Gray@Microsoft

How Much Is 1 Billion Transactions Per Day?
• 1 Btpd = 11,574 tps (transactions per second) ~ 700,000 tpm (transactions/minute)
• AT&T
    185 million calls (peak day worldwide)
• Visa ~20 M tpd
    400 M customers
    250,000 ATMs worldwide
    7 billion transactions / year (card+cheque) in 1994

[Figure: bar chart of millions of transactions per day (Mtpd), log scale from 0.1 to 1,000, comparing 1 Btpd, Visa, AT&T, BofA, NYSE]
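The rate conversions on this slide are plain unit arithmetic. The short Python sketch below reproduces them; the function names are made up for the example.

```python
SECONDS_PER_DAY = 24 * 60 * 60   # 86,400
MINUTES_PER_DAY = 24 * 60        # 1,440


def per_second(transactions_per_day):
    """Average transactions per second for a daily total."""
    return transactions_per_day / SECONDS_PER_DAY


def per_minute(transactions_per_day):
    """Average transactions per minute for a daily total."""
    return transactions_per_day / MINUTES_PER_DAY


if __name__ == "__main__":
    one_btpd = 1_000_000_000
    print(f"1 Btpd = {per_second(one_btpd):,.0f} tps")
    print(f"1 Btpd = {per_minute(one_btpd):,.0f} tpm")
    # The slide's comparison points, converted to average rates.
    print(f"AT&T peak day: {per_second(185_000_000):,.0f} calls/s")
    print(f"Visa:          {per_second(20_000_000):,.0f} tps")
```

These are daily averages; peak-hour rates for a real workload would be higher.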

Page 30: Scaleable Computing Jim Gray Microsoft Corporation Gray@Microsoft

Parallelism: The OTHER Aspect of Clusters
• Clusters of machines allow two kinds of parallelism
    Many little jobs: online transaction processing (TPC-A, B, C…)
    A few big jobs: data search and analysis (TPC-D, DSS, OLAP)
• Both give automatic parallelism

Page 31: Scaleable Computing Jim Gray Microsoft Corporation Gray@Microsoft

Jim Gray & Gordon Bell: VLDB 95 Parallel Database Systems Survey

Kinds of Parallel Execution

[Figure: pipeline parallelism feeds the output of one "any sequential program" into another; partition parallelism runs copies of "any sequential program" side by side, with outputs split N ways and inputs merged M ways]

Page 32: Scaleable Computing Jim Gray Microsoft Corporation Gray@Microsoft

Partitioned Execution
• Spreads computation and IO among processors
• Partitioned data gives NATURAL parallelism

[Figure: a table partitioned by key range (A...E, F...J, K...N, O...S, T...Z); one Count operator per partition feeds a final Count]
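The partitioned count in the figure can be sketched in Python as follows. This is an illustration under stated assumptions: a thread pool stands in for the separate processors, and the names `partition_of` and `partitioned_count` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

# Key ranges from the figure.
PARTITIONS = [("A", "E"), ("F", "J"), ("K", "N"), ("O", "S"), ("T", "Z")]


def partition_of(name):
    """Index of the key range that holds a name, by its first letter."""
    first = name[0].upper()
    for index, (low, high) in enumerate(PARTITIONS):
        if low <= first <= high:
            return index
    raise ValueError(f"no partition for {name!r}")


def partitioned_count(names, predicate=lambda name: True):
    """Count matching rows per partition in parallel, then sum the partials."""
    buckets = [[] for _ in PARTITIONS]
    for name in names:
        buckets[partition_of(name)].append(name)

    def count(bucket):
        return sum(1 for name in bucket if predicate(name))

    with ThreadPoolExecutor(max_workers=len(PARTITIONS)) as pool:
        partials = list(pool.map(count, buckets))
    return partials, sum(partials)


if __name__ == "__main__":
    rows = ["Alice", "Bob", "Frank", "Kim", "Oscar", "Tom", "Zed"]
    print(partitioned_count(rows))
```

Each partial count touches only its own partition, so the only shared step is the final sum.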

Page 33: Scaleable Computing Jim Gray Microsoft Corporation Gray@Microsoft

N x M way Parallelism
• N inputs, M outputs, no bottlenecks
• Partitioned data
• Partitioned and pipelined data flows

[Figure: five partitions (A...E, F...J, K...N, O...S, T...Z), each with its own Sort and Join operators, feeding three Merge operators]
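An N-input, M-output flow can be sketched as N independent sorts whose outputs are each split M ways and merged by M consumers. The Python below is a minimal illustration, not the systems described in the survey: the `key % m` split rule, the thread pool, and the function name are assumptions.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor


def parallel_sort_merge(partitions, m):
    """Sort N partitions independently, then produce M sorted outputs.

    Each sorted run is split M ways by `key % m` (an assumed split rule);
    output j merges the j-th slice of every run.
    """
    # Stage 1: N sorts, one per input partition, with no shared state.
    with ThreadPoolExecutor(max_workers=max(1, len(partitions))) as pool:
        runs = list(pool.map(sorted, partitions))

    # Stage 2: M merges. Filtering a sorted run keeps it sorted,
    # so heapq.merge can combine the slices directly.
    outputs = []
    for j in range(m):
        slices = [[key for key in run if key % m == j] for run in runs]
        outputs.append(list(heapq.merge(*slices)))
    return outputs


if __name__ == "__main__":
    print(parallel_sort_merge([[5, 1, 4], [8, 2], [7, 3, 6]], m=2))
```

No single operator sees all of the data: each sort sees one input partition and each merge sees one output slice.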

Page 34: Scaleable Computing Jim Gray Microsoft Corporation Gray@Microsoft

The Parallel Law Of Computing
• Grosch's Law: 2x $ is 4x performance
    1 MIPS for 1 $
    1,000 MIPS for 32 $ (.03 $/MIPS)
• Parallel Law: 2x $ is 2x performance
    1 MIPS for 1 $
    1,000 MIPS for 1,000 $
• Needs: linear speedup and linear scale-up
    Not always possible
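The two laws can be written as cost functions. In the sketch below, Grosch's law is modelled as performance growing with the square of cost (so cost grows as the square root of performance), and the parallel law as cost growing linearly with performance. The function names, and the normalisation of 1 MIPS to 1 $, follow the slide's example but are otherwise assumptions.

```python
import math


def grosch_cost(mips):
    """Dollars for `mips` when 2x the dollars buys 4x the performance."""
    return math.sqrt(mips)


def parallel_cost(mips):
    """Dollars for `mips` when 2x the dollars buys 2x the performance."""
    return float(mips)


if __name__ == "__main__":
    for mips in (1, 1_000):
        g, p = grosch_cost(mips), parallel_cost(mips)
        print(f"{mips:5d} MIPS: Grosch {g:7.2f} $ ({g / mips:.3f} $/MIPS), "
              f"parallel {p:7.2f} $ ({p / mips:.3f} $/MIPS)")
```

The linear figure is a best case: it holds only when the workload achieves linear speedup and scale-up, which the slide notes is not always possible.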

Page 35: Scaleable Computing Jim Gray Microsoft Corporation Gray@Microsoft

Thesis: Scaleable Servers
• Scaleable servers
    Commodity hardware allows new applications
    New applications need huge servers
    Clients and servers are built of the same “stuff”: commodity software and commodity hardware
• Servers should be able to
    Scale up (grow node by adding CPUs, disks, networks)
    Scale out (grow by adding nodes)
    Scale down (can start small)
• Key software technologies
    Objects, Transactions, Clusters, Parallelism

Page 36: Scaleable Computing Jim Gray Microsoft Corporation Gray@Microsoft

The BIG Picture: Components and Transactions
• Software modules are objects
• Object Request Broker (a.k.a., Transaction Processing Monitor) connects objects (clients to servers)
• Standard interfaces allow software plug-ins
• Transaction ties execution of a “job” into an atomic unit: all-or-nothing, durable, isolated

[Figure: Object Request Broker]

Page 37: Scaleable Computing Jim Gray Microsoft Corporation Gray@Microsoft

Linking And Embedding
Objects are data modules; transactions are execution modules
• Link: pointer to object somewhere else
    Think URL in Internet
• Embed: bytes are here
• Objects may be active; can callback to subscribers

Page 38: Scaleable Computing Jim Gray Microsoft Corporation Gray@Microsoft

Objects Meet Databases
The basis for universal data servers, access, & integration
• Object-oriented (COM oriented) programming interface to data
• Breaks DBMS into components
• Anything can be a data source
• Optimization/navigation “on top of” other data sources
• A way to componentize a DBMS
• Makes an RDBMS an O-R DBMS (assumes optimizer understands objects)

[Figure: DBMS engine over data sources: database, spreadsheet, photos, mail, map, document]

Page 39: Scaleable Computing Jim Gray Microsoft Corporation Gray@Microsoft

The Three Tiers

[Figure: three-tier architecture.
  Web client: HTML; VB or Java script engine (VBScript, JavaScript); VB or Java virtual machine (VB, Java plug-ins).
  Middleware: Internet ORB (HTTP + DCOM); object server pool; ORB, TP monitor, Web server...
  Object & data server: DCOM (OLE DB, ODBC, ...); LU6.2 to IBM legacy gateways]

Page 40: Scaleable Computing Jim Gray Microsoft Corporation Gray@Microsoft

Server Side Objects: Easy Server-Side Execution
• Give simple execution environment
• Object gets
    start
    invoke
    shutdown
• Everything else is automatic
• Drag & Drop Business Objects

[Figure: a server: network, receiver, queue, connections, thread pool, context, security, synchronization, shared data, service logic, configuration, management]

Page 41: Scaleable Computing Jim Gray Microsoft Corporation Gray@Microsoft

A New Programming Paradigm
• Develop object on the desktop
• Better yet: download them from the Net
• Script work flows as method invocations
• All on desktop
• Then, move work flows and objects to server(s)
• Gives
    desktop development
    three-tier deployment
    Software Cyberbricks

Page 42: Scaleable Computing Jim Gray Microsoft Corporation Gray@Microsoft

Transactions Coordinate Components (ACID)
• Transaction properties
    Atomic: all or nothing
    Consistent: old and new values
    Isolated: automatic locking or versioning
    Durable: once committed, effects survive
• Transactions are built into modern OSs
    MVS/TM, Tandem TMF, VMS DEC-DTM, NT-DTC
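Atomicity ("all or nothing") is easy to demonstrate with any transactional store. The sketch below uses Python's built-in `sqlite3` module purely as a stand-in; it is not one of the transaction managers named on the slide, and the `account` table and `transfer` function are hypothetical.

```python
import sqlite3


def make_bank():
    """Create an in-memory database with two accounts."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER)")
    conn.executemany("INSERT INTO account VALUES (?, ?)", [("a", 100), ("b", 0)])
    conn.commit()
    return conn


def balance(conn, name):
    row = conn.execute("SELECT balance FROM account WHERE name = ?", (name,)).fetchone()
    return row[0]


def transfer(conn, src, dst, amount):
    """Move money atomically: both updates happen, or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute("UPDATE account SET balance = balance - ? WHERE name = ?",
                         (amount, src))
            if balance(conn, src) < 0:
                raise ValueError("insufficient funds")
            conn.execute("UPDATE account SET balance = balance + ? WHERE name = ?",
                         (amount, dst))
        return True
    except ValueError:
        return False


if __name__ == "__main__":
    bank = make_bank()
    print(transfer(bank, "a", "b", 30), balance(bank, "a"), balance(bank, "b"))
    print(transfer(bank, "a", "b", 100), balance(bank, "a"), balance(bank, "b"))
```

The failed transfer leaves no trace: the debit made before the check is undone by the rollback.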

Page 43: Scaleable Computing Jim Gray Microsoft Corporation Gray@Microsoft

Transactions & Objects
• Application requests transaction identifier (XID)
• XID flows with method invocations
• Object Managers join (enlist) in transaction
• Distributed Transaction Manager coordinates commit/abort
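The flow on this slide (get an XID, enlist object managers, let a coordinator decide) is commonly implemented with two-phase commit. The Python sketch below is a toy, in-memory illustration of that protocol; the class and method names are hypothetical and are not the API of DTC or any other product, and real coordinators also log their decisions durably.

```python
import itertools


class ObjectManager:
    """A participant that votes in phase 1 and obeys the decision in phase 2."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.state = {}  # xid -> "prepared" | "committed" | "aborted"

    def prepare(self, xid):
        vote = self.can_commit
        self.state[xid] = "prepared" if vote else "aborted"
        return vote

    def commit(self, xid):
        self.state[xid] = "committed"

    def abort(self, xid):
        self.state[xid] = "aborted"


class TransactionManager:
    """Hands out XIDs, tracks enlisted managers, and coordinates the outcome."""

    def __init__(self):
        self._next_xid = itertools.count(1)
        self._enlisted = {}

    def begin(self):
        xid = next(self._next_xid)
        self._enlisted[xid] = []
        return xid

    def enlist(self, xid, manager):
        if manager not in self._enlisted[xid]:
            self._enlisted[xid].append(manager)

    def commit(self, xid):
        managers = self._enlisted.pop(xid)
        # Phase 1: collect votes; all() stops asking at the first "no".
        if all(manager.prepare(xid) for manager in managers):
            for manager in managers:  # Phase 2: everyone commits
                manager.commit(xid)
            return "committed"
        for manager in managers:      # Phase 2: everyone aborts
            manager.abort(xid)
        return "aborted"


if __name__ == "__main__":
    tm = TransactionManager()
    orders, stock = ObjectManager("orders"), ObjectManager("stock", can_commit=False)
    xid = tm.begin()
    tm.enlist(xid, orders)
    tm.enlist(xid, stock)
    print(tm.commit(xid), orders.state[xid], stock.state[xid])
```

A single "no" vote aborts the transaction at every enlisted manager, which is what makes the outcome atomic across objects.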

Page 44: Scaleable Computing Jim Gray Microsoft Corporation Gray@Microsoft

Distributed Transactions Enable Huge Throughput
• Each node capable of 7 KtpmC (7,000 active users!)
• Can add nodes to cluster (to support 100,000 users)
• Transactions coordinate nodes
• ORB / TP monitor spreads work among nodes
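The sizing implied by these figures is a ceiling division. A minimal Python sketch, assuming load spreads evenly across nodes and ignoring the overhead of distributed transactions; the function name is made up for the example.

```python
def nodes_needed(total_users, users_per_node):
    """Smallest number of nodes whose combined capacity covers the users."""
    if users_per_node <= 0:
        raise ValueError("users_per_node must be positive")
    return -(-total_users // users_per_node)  # integer ceiling division


if __name__ == "__main__":
    # Slide figures: 7,000 active users per node, 100,000 users in total.
    print(nodes_needed(100_000, 7_000))
```

With the slide's numbers this gives 15 nodes, since 14 nodes would cover only 98,000 users.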

Page 45: Scaleable Computing Jim Gray Microsoft Corporation Gray@Microsoft

Distributed Transactions Enable Huge DBs
• Distributed database technology spreads data among nodes
• Transaction processing technology manages nodes

Page 46: Scaleable Computing Jim Gray Microsoft Corporation Gray@Microsoft

Thesis: Scaleable Servers
• Scaleable servers built from Cyberbricks
    Allow new applications
• Servers should be able to
    Scale up, out, down
• Key software technologies
    Clusters (ties the hardware together)
    Parallelism (uses the independent CPUs, stores, wires)
    Objects (software CyberBricks)
    Transactions (mask errors)