Slide 1: Parallel Database Systems: A SNAP Application

Gordon Bell
450 Old Oak Court, Los Altos, CA 94022
GBell@Microsoft.com

Jim Gray
310 Filbert, SF CA 94133
Gray@Microsoft.com

[Title figure: Network and Platform]
Slide 2: Outline

• Cyberspace pep talk:
  • Databases are the dirt of Cyberspace
  • Billions of clients mean millions of servers
• Parallel imperative:
  • Hardware trend: many little devices
  • Consequence: servers are arrays of commodity components
  • PCs are the bricks of Cyberspace
  • Must automate parallel {design / operation / use}
  • Software parallelism via dataflow & data partitioning
• Parallel database techniques:
  • Parallel execution of many little jobs (OLTP)
  • Data partitioning
  • Pipeline execution
  • Automation techniques
• Summary
Slide 3: Kinds of Information Processing

                  Point-to-Point          Broadcast
  Immediate       conversation, money     lecture, concert      (Network)
  Time-Shifted    mail                    book, newspaper       (Database)

• It's ALL going electronic.
• Immediate traffic is being stored for analysis (so ALL of it ends up in a database).
• Analysis & automatic processing are being added.
Slide 4: Why Put Everything in Cyberspace?

• Low rent: min $/byte
• Shrinks time: now or later
• Shrinks space: here or there
• Automate processing: knowbots

[Figure: quadrants of Point-to-Point OR Broadcast vs. Immediate OR Time-Delayed, with Network and DataBase in the middle; knowbots Locate, Process, Analyze, Summarize]
Slide 6: Databases Store ALL Data Types

• The new world:
  • billions of objects
  • big objects (1 MB)
  • objects have behavior (methods)
• The old world:
  – millions of objects
  – 100-byte objects
• Paperless office, Library of Congress online, all information online: entertainment, publishing, business. Information Network, Knowledge Navigator, Information At Your Fingertips.

[Figure: the old-world People table (Name, Address) next to the new-world People table (Name, Address, Papers, Picture, Voice), with rows Mike, Won, David and addresses NY, Berk, Austin]
Slide 7: Magnetic Storage Cheaper than Paper

• File cabinet:  cabinet (4 drawer)        $250
                 paper (24,000 sheets)     $250
                 space (2x3 ft @ $10/ft2)  $180
                 total                     $700   ->  3 ¢/sheet
• Disk:          disk (8 GB)               $4,000
                 ASCII: 4 million pages    ->  0.1 ¢/sheet (30x cheaper than paper)
• Image:         200,000 pages             ->  2 ¢/sheet (similar to paper)
• Conclusion: store everything on disk.
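A back-of-envelope check of this arithmetic (a sketch only; the per-page byte sizes are implied by the slide's 8 GB / 4 million page / 200,000 page figures, not stated):

```python
# Back-of-envelope check of the storage-cost arithmetic above (1995 numbers).
cabinet_total = 250 + 250 + 180              # cabinet + paper + floor space ($)
sheets_per_cabinet = 24_000
print(cabinet_total / sheets_per_cabinet * 100)  # ~2.8 cents/sheet -> "3 ¢/sheet"

disk_cost = 4_000                            # 8 GB disk ($)
ascii_pages = 4_000_000                      # implies ~2 KB of ASCII per page
image_pages = 200_000                        # implies ~40 KB per page image
print(disk_cost / ascii_pages * 100)         # ~0.1 cents/sheet
print(disk_cost / image_pages * 100)         # ~2 cents/sheet
```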
Slide 8: Cyberspace Demographics

• Computer history:
  1950 National computer
  1960 Corporate computer
  1970 Site computer
  1980 Departmental computer
  1990 Personal computer
  2000 ?
• Most computers are small. NEXT: 1 billion X, for some X (phone?).
• Most of the money is in clients and wiring:
  1990: 50% desktop
  1995: 75% desktop

[Figure: installed population (1K, 1M, 100M units) and revenue (1B$, 10B$, 100B$) by class: SUPER, MAINFRAME, MINI, WS, PC]
Slide 9: Billions of Clients

• Every device will be "intelligent": doors, rooms, cars, ...
• Computing will be ubiquitous.
Slide 10: Billions of Clients Need Millions of Servers

• All clients are networked to servers; they may be nomadic or on-demand.
• Fast clients want faster servers.
• Servers provide data, control, coordination, and communication.
• Super servers: large databases, high-traffic shared data.

[Figure: mobile and fixed clients connected to servers, which connect to a super server]
Slide 17: Outline

• Cyberspace pep talk:
  • Databases are the dirt of Cyberspace
  • Billions of clients mean millions of servers
• Parallel imperative:
  • Hardware trend: many little devices
  • Consequence: servers are arrays of commodity parts
  • PCs are the bricks of Cyberspace
  • Must automate parallel {design / operation / use}
  • Software parallelism via dataflow & data partitioning
• Parallel database techniques:
  • Parallel execution of many little jobs (OLTP)
  • Data partitioning
  • Pipeline execution
  • Automation techniques
• Summary
Slide 18: Moore's Law Restated: Many Little Won over Few Big

• Hardware trend: a few generic parts: CPU, RAM, disk & tape arrays.
  • ATM for LAN/WAN; ?? for CAN; ?? for OS.
• These parts will be inexpensive (commodity components).
• Systems will be arrays of these parts.
• Software challenge: how to program arrays.

[Figure: price classes 1 M$, 100 K$, 10 K$ (Mainframe, Mini, Micro, Nano) and disk form factors 9", 5.25", 3.5", 2.5", 1.8"]
Slide 19: Future SuperServer

• An array of processors, disks, tapes, and comm lines.
• Challenge: how to program it; parallelism is a must:
  • pipeline to hide latency,
  • partition for bandwidth and scaleup.

[Figure: 100 nodes (1 Tips), 1,000 discs (10 Terrorbytes), and 100 tape transports (1,000 tapes = 1 PetaByte) on a high-speed network (10 Gb/s)]
Slide 21: The Hardware Is in Place ... and Then a Miracle Occurs

• SNAP: Scaleable Networks And Platforms.
• A commodity distributed OS built on commodity platforms and a commodity network interconnect.
Slide 22: Why Parallel Access to Data?

• At 10 MB/s, scanning a 1-terabyte table takes about 1.2 days.
• With 1,000-way parallelism, the same scan takes about 100 seconds.
• Parallelism: divide a big problem into many smaller ones to be solved in parallel.
• What parallelism buys here is BANDWIDTH.
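The arithmetic behind these numbers, assuming 1 TB = 10^12 bytes and 10 MB/s = 10^7 bytes/s:

```latex
\[
\frac{10^{12}\ \text{bytes}}{10^{7}\ \text{bytes/s}} = 10^{5}\ \text{s} \approx 28\ \text{hours} \approx 1.2\ \text{days},
\qquad
\frac{10^{5}\ \text{s}}{1{,}000} = 100\ \text{s} \approx 1.7\ \text{minutes}.
\]
```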
Slide 23: DataFlow Programming: Prefetch & Postwrite Hide Latency

• We can't afford to wait for each piece of data to arrive.
• We need a memory pipeline that gets the data in advance (~100 MB/s).
• Solution:
  • pipeline from the source (tape, disc, RAM, ...) to the CPU cache,
  • pipeline results to the destination.
• The problem being attacked here is LATENCY.
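A minimal sketch of the prefetch idea: a bounded queue plus a background reader thread, so I/O overlaps computation. The function names (read_block, process), the queue depth, and the end-of-stream convention are illustrative assumptions, not anything from the deck:

```python
import queue, threading

def prefetching_scan(read_block, process, n_blocks, depth=8):
    """Overlap I/O with computation: a reader thread keeps up to `depth`
    blocks in flight while the consumer processes earlier ones."""
    buf = queue.Queue(maxsize=depth)          # bounded queue = flow control

    def reader():
        for i in range(n_blocks):
            buf.put(read_block(i))            # blocks when the pipeline is full
        buf.put(None)                         # end-of-stream marker

    threading.Thread(target=reader, daemon=True).start()
    while (block := buf.get()) is not None:
        process(block)                        # CPU work overlaps the next reads
```

The same idea applies on the output side ("postwrite"): buffer results and let a writer drain them while the producer keeps going.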
Slide 24: The New Law of Computing

• Grosch's Law (old): 2x the money buys 4x the performance; performance grows as the square of the price (1 MIPS for $1, 1,000 MIPS for $32, i.e. $0.03/MIPS).
• Parallel Law (new): 2x the money buys 2x the performance; performance grows linearly with price (1 MIPS for $1, 1,000 MIPS for $1,000).
• The parallel law needs linear speedup and linear scaleup, which are not always possible.
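The two laws restated as proportionalities (a restatement of the slide's "2x $" claims):

```latex
\[
\text{Grosch's Law:}\quad \text{Performance} \propto \text{Price}^{2}
\;\Longrightarrow\; 2\times\text{Price} \to 4\times\text{Performance}
\]
\[
\text{Parallel Law:}\quad \text{Performance} \propto \text{Price}
\;\Longrightarrow\; 2\times\text{Price} \to 2\times\text{Performance}
\]
```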
Slide 25: Parallelism: Performance Is the Goal

• The goal is to get 'good' performance.
• Law 1: a parallel system should be faster than the serial system.
• Law 2: a parallel system should give near-linear speedup, near-linear scaleup, or both.
• Parallelism is faster, not cheaper: it trades money for time.
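For reference, the standard definitions of these two terms (assumed, not spelled out, on the slide):

```latex
\[
\text{Speedup} \;=\; \frac{\text{elapsed time on a small system}}{\text{elapsed time on a big system (same job)}},
\qquad
\text{Scaleup} \;=\; \frac{\text{elapsed time for 1 job on 1 unit of hardware}}{\text{elapsed time for }N\text{ jobs on }N\text{ units}}.
\]
```

Linear speedup means an N-times bigger system runs the same job N times faster; linear scaleup means the scaleup ratio stays at 1 as N grows.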
Slide 27: The Perils of Parallelism

• Startup: creating processes, opening files, optimization.
• Interference: device contention (CPU, disc, bus) and logical contention (locks, hotspots, server, log, ...).
• Skew: if tasks get very small, variance exceeds the service time.

[Figure: speedup vs. processors & discs: the good (near-linear) curve; a curve flattened by the three perils of startup, interference, and skew; and a bad curve with no parallelism benefit, with linearity shown for reference]
Slide 29: Kinds of Parallel Execution

• Pipeline: any sequential program feeds its output stream directly into the next sequential program.
• Partition: many copies of the same sequential program each work on one piece of the data; inputs merge M ways and outputs split N ways.
Slide 30: Data Rivers: Split + Merge Streams

• N producers add records to the river; M consumers take records from it (N x M data streams).
• Each producer and consumer is purely sequential programming.
• The river does flow control and buffering, and does the partition and merge of data records.
• A river is the Exchange operator of Volcano. (A minimal sketch follows.)
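A minimal sketch of a river, assuming hash partitioning on a record key and bounded queues for flow control; the class name, queue depth, and end-of-stream convention are illustrative, not the deck's design:

```python
import queue, threading

class River:
    """N producers -> M consumers. The river partitions records by key and
    buffers them, so producers and consumers stay purely sequential."""
    def __init__(self, n_producers, m_consumers, depth=128):
        self.queues = [queue.Queue(maxsize=depth) for _ in range(m_consumers)]
        self.live_producers = n_producers
        self.lock = threading.Lock()

    def put(self, key, record):                # called by any producer
        self.queues[hash(key) % len(self.queues)].put(record)

    def close(self):                           # each producer calls once when done
        with self.lock:
            self.live_producers -= 1
            if self.live_producers == 0:
                for q in self.queues:
                    q.put(None)                # end-of-stream for every consumer

    def records(self, consumer_id):            # iterated by one consumer
        q = self.queues[consumer_id]
        while (rec := q.get()) is not None:
            yield rec
```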
Slide 31: Partitioned Data and Execution

• A table is partitioned (here by key range: A...E, F...J, K...N, O...S, T...Z).
• One Count operator runs on each partition; a final Count combines the partial results.
• This spreads computation and I/O among the processors.
• Partitioned data gives NATURAL execution parallelism.
Slide 32: Partitioned + Merge + Pipeline Execution

• Each partition (A...E through T...Z) runs its own Sort, which pipelines into a Join; the Join outputs feed a single Merge node.
• Pure dataflow programming gives linear speedup & scaleup.
• But the top (merge) node may become a bottleneck, so...
Slide 33: N x M Way Parallelism

• The same plan, but with several Merge nodes: N inputs, M outputs, and no bottleneck node.
Slide 34: Why Are Relational Operators Successful for Parallelism?

• The relational data model gives uniform operators on uniform data streams, closed under composition.
• Each operator consumes 1 or 2 input streams; each stream is a uniform collection of data; sequential data in and out: pure dataflow.
• Partitioning some operators (e.g., aggregates, non-equi-joins, sort, ...) requires innovation.
• The payoff: AUTOMATIC PARALLELISM.
Slide 35: SQL: a Non-Procedural Programming Language

• SQL is a functional programming language: it describes the answer set, not how to compute it.
• The optimizer picks the best execution plan:
  • the dataflow web (pipelines),
  • the degree of parallelism (partitioning),
  • other execution parameters (process placement, memory, ...).

[Figure: GUI, Schema, Optimizer, Execution Planning, Plan, Monitor, Rivers, Executors]
Slide 36: Database Systems "Hide" Parallelism

• Automate system management via tools:
  • data placement
  • data organization (indexing)
  • periodic tasks (dump / recover / reorganize)
• Automatic fault tolerance:
  • duplexing & failover
  • transactions
• Automatic parallelism:
  • among transactions (locking)
  • within a transaction (parallel execution)
Slide 37: Success Stories

• Online transaction processing (many little jobs):
  • SQL systems support 3,700 tps-A (24 CPUs, 240 disks)
  • SQL systems support 21,000 tpm-C (110 CPUs, 800 disks)
• Batch (decision support and utility): few big jobs, parallelism inside:
  • scan data at 100 MB/s
  • linear scaleup to 50 processors

[Figure: transactions/sec and records/sec both grow linearly with hardware]
Slide 38: Kinds of Partitioned Data

• Split a SQL table across a subset of the nodes & disks.
• Partition within that set by:
  • Range: good for equijoins, range queries, group-by
  • Hash: good for equijoins
  • Round robin: good for spreading load
• Shared-disk and shared-memory systems are less sensitive to partitioning; shared-nothing systems benefit from "good" partitioning. (A sketch of the three schemes follows.)
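A minimal sketch of the three partitioning rules; the five alphabetic ranges echo the figures in this deck, while the function names and key handling are illustrative:

```python
RANGES = ["E", "J", "N", "S", "Z"]            # A...E, F...J, K...N, O...S, T...Z

def range_partition(key: str) -> int:
    """Send a key to the first range whose upper bound covers it."""
    for i, upper in enumerate(RANGES):
        if key[:1].upper() <= upper:
            return i
    return len(RANGES) - 1

def hash_partition(key: str, n: int = 5) -> int:
    """Hash spreads keys uniformly: good for equijoins, useless for ranges."""
    return hash(key) % n

def round_robin_partition(row_number: int, n: int = 5) -> int:
    """Round robin balances load but gives no help to any particular query."""
    return row_number % n
```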
Slide 41: Picking Data Ranges

• Disk partitioning:
  • For range partitioning, sample the load on the disks; cool hot disks by making their ranges smaller.
  • For hash partitioning, cool hot disks by remapping some buckets to other disks.
• River partitioning:
  • Use hashing and assume the data is uniform.
  • If range partitioning, sample the data and use a histogram to level the bulk (see the sketch below).
• Teradata, Tandem, and Oracle use these tricks.
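A minimal sketch of the sample-and-histogram idea for range partitioning: pick split keys from a sorted sample so each range holds about the same number of rows (an equi-depth histogram). The function name and the example data are made up for illustration:

```python
def pick_range_boundaries(sample, n_partitions):
    """Return n_partitions - 1 split keys chosen so each range holds
    roughly the same share of the sampled keys (equi-depth histogram)."""
    keys = sorted(sample)
    step = len(keys) / n_partitions
    return [keys[int(i * step)] for i in range(1, n_partitions)]

# Example: a skewed sample still yields balanced ranges.
sample = ["Adams", "Adams", "Adams", "Baker", "Chen", "Chen", "Diaz",
          "Evans", "Moore", "Singh", "Smith", "Smith", "Wong", "Young"]
print(pick_range_boundaries(sample, 4))       # ['Baker', 'Evans', 'Smith']
```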
Slide 42: Parallel Data Scan

  select image
  from   landsat
  where  date between 1970 and 1990
    and  overlaps(location, :Rockies)
    and  snow_cover(image) > .7;

• The landsat table has temporal (date), spatial (location), and image columns; e.g., dates 1/2/72 ... 4/8/95, locations 33N 120W ... 34N 120W.
• Assign one process per processor/disk:
  • find the images with the right date & location,
  • analyze each image and return it if it is 70% snow.
• Each process applies the date, location, & image tests to its own partition and contributes its rows to the answer.
Slide 44: Parallel Aggregates

• Each aggregate function needs a decomposition strategy:
  • count(S) = sum over partitions i of count(s(i)); ditto for sum()
  • avg(S) = (sum over partitions of sum(s(i))) / (sum over partitions of count(s(i)))
  • and so on...
• For groups, sub-aggregate the groups close to the source, then drop the sub-aggregates into a hash river. (A sketch follows the figure.)

[Figure: as on slide 31, one Count per partition (A...E through T...Z) feeding a final Count]
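A minimal sketch of the two-phase strategy for a grouped average: compute per-partition (sum, count) pairs close to the source, route them by group (as the hash river would), and combine. The helper names and example data are illustrative:

```python
from collections import defaultdict

def local_subaggregate(rows):
    """Phase 1 (runs next to each partition): per-group (sum, count)."""
    partial = defaultdict(lambda: [0.0, 0])
    for group, value in rows:
        partial[group][0] += value
        partial[group][1] += 1
    return partial

def combine_averages(partials):
    """Phase 2 (after the hash river routes each group to one node):
    avg = (sum of sums) / (sum of counts), never an average of averages."""
    total = defaultdict(lambda: [0.0, 0])
    for partial in partials:
        for group, (s, c) in partial.items():
            total[group][0] += s
            total[group][1] += c
    return {group: s / c for group, (s, c) in total.items()}

# Example: two partitions of (state, temperature) rows.
p1 = local_subaggregate([("CA", 70), ("CA", 80), ("TX", 90)])
p2 = local_subaggregate([("CA", 60), ("TX", 100), ("TX", 110)])
print(combine_averages([p1, p2]))             # {'CA': 70.0, 'TX': 100.0}
```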
Slide 46: Parallel Sort

• M-input, N-output sort design:
  • a scan (or other source) feeds a range- or hash-partitioned river,
  • each output stream's sub-sort generates runs, then merges its runs,
  • the disk runs and the merge are not needed if the sort fits in memory.
• Scales nearly linearly: going from 10^6 to 10^12 records only doubles the per-record comparison work, since log(10^12) / log(10^6) = 12/6 = 2; with a million times the hardware, the sort is only about 2x slower.
• Sort is the benchmark from hell for shared-nothing machines: net traffic equals disk bandwidth, and there is no data filtering at the source. (A sketch of the design follows.)
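A minimal sketch of the range-partitioned sort, collapsed into one process for clarity: in a real system each partition would be sorted on its own node (with runs spilled to disk when it doesn't fit in memory), and the boundaries would come from sampling as on slide 41:

```python
def parallel_sort(records, boundaries):
    """Range-partition the input, sort each partition independently, then
    concatenate: the partitions are disjoint key ranges, so the
    concatenation is globally sorted with no final merge step."""
    partitions = [[] for _ in range(len(boundaries) + 1)]
    for rec in records:                        # the "river": route by range
        i = sum(rec > b for b in boundaries)   # index of the range that fits
        partitions[i].append(rec)
    for part in partitions:                    # each node sorts its own range
        part.sort()
    return [rec for part in partitions for rec in part]

print(parallel_sort([9, 3, 7, 1, 8, 5, 2], boundaries=[3, 6]))  # [1, 2, 3, 5, 7, 8, 9]
```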
Slide 47: Blocking Operators = Short Pipelines

• An operator is blocking if it produces no output until it has consumed all of its input.
• Examples: sort, aggregates, hash join (which reads all of one operand before producing output).
• Blocking operators kill pipeline parallelism, which makes partition parallelism all the more important.

[Figure: a database-load template with three blocked phases: scan the tape file and sort runs; merge the runs and insert into the SQL table; then build indexes 1, 2, and 3]
Slide 50: Hash Join

• Hash the smaller table into N buckets (hope N = 1).
• If N = 1, read the larger table and probe the in-memory hash of the smaller one; else, hash the outer table to disk, then do a bucket-by-bucket hash join.
• Purely sequential data behavior; hashing also reduces skew.
• Always beats sort-merge and nested-loops joins unless the data is already clustered.
• Good for equi-, outer-, and exclusion joins.
• Lots of papers; products are just appearing (what went wrong?).
• (A sketch of the in-memory case follows the figure.)

[Figure: the left table hashed into buckets, probed by the right table]
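A minimal sketch of the in-memory case (N = 1): build a hash table on the smaller input, then stream the larger input past it. The spill-to-disk, bucket-by-bucket variant is not shown; names and example data are illustrative:

```python
from collections import defaultdict

def hash_join(small, large, small_key, large_key):
    """Equijoin: build on the smaller input, probe with the larger one.
    The probe side streams through sequentially, which is what makes
    hash join such a good fit for dataflow execution."""
    buckets = defaultdict(list)                # build phase
    for row in small:
        buckets[row[small_key]].append(row)
    for row in large:                          # probe phase
        for match in buckets.get(row[large_key], ()):
            yield {**match, **row}

# Example: join departments (small) with employees (large) on dept id.
depts = [{"dept": 1, "dname": "Sales"}, {"dept": 2, "dname": "Eng"}]
emps  = [{"emp": "Ann", "dept": 2}, {"emp": "Bob", "dept": 1}, {"emp": "Cal", "dept": 2}]
for row in hash_join(depts, emps, "dept", "dept"):
    print(row)
```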
Slide 51: Observation: Execution Is "Easy", Automation Is "Hard"

• It is "easy" to build a fast parallel execution environment (no one has done it, but it is just programming).
• It is hard to write a robust, world-class query optimizer: there are many tricks, and one quickly hits the complexity barrier.
• Common approach:
  • pick the best sequential plan,
  • pick the degree of parallelism based on bottleneck analysis,
  • bind operators to processes,
  • place processes at nodes,
  • place scratch files near the processes,
  • use memory as a constraint.
Slide 52: Systems That Work This Way

• Shared nothing: Teradata (400 nodes), Tandem (110 nodes), IBM SP2 / DB2 (48 nodes), AT&T & Sybase (112 nodes), Informix/SP2 (48 nodes)
• Shared disk: Oracle (170 nodes), Rdb (24 nodes)
• Shared memory: Informix (9 nodes), RedBrick (? nodes)

[Figure: clients connected to shared-nothing, shared-disk, and shared-memory configurations of processors, memory, and disks]
Slide 53: Research Problems

• Automatic data placement (partitioning: random or organized)
• Automatic parallel programming (process placement)
• Parallel concepts, algorithms & tools
• Parallel query optimization
• Execution techniques: load balancing, checkpoint/restart, pacing, ...

[Figure: the slide-19 superserver again: 100 nodes (1 Tips), 1,000 discs (10 Terrorbytes), 1,000 tapes (1 PetaByte), high-speed network (10 Gb/s)]
Slide 54: Summary

• Cyberspace is growing:
  • databases are the dirt of Cyberspace, PCs are the bricks, networks are the mortar
  • many little devices: performance via arrays of {CPU, disk, tape}
• Then a miracle occurs: a scaleable distributed OS and network
  • SNAP: Scaleable Networks and Platforms
• Then parallel database systems give software parallelism:
  • OLTP: lots of little jobs run in parallel
  • Batch TP: dataflow & data partitioning
  • automate processor & storage array administration
  • automate processor & storage array programming
• 2000 platforms as easy as 1 platform.