Commodity Database Servers

Jim Gray

Microsoft Research

Gray@Microsoft.com

http://Research.Microsoft.com/~Gray/talks

Outline

• Status report on Commodity Server Performance

• Why Most VLDBs will be Multi-Media Servers

• Preview of Microsoft’s SQL Server 7

Status Report on Commodity Server Performance

• Standards:
– TPC
– SpecWeb, ...

• Product benchmarks, e.g.:
– SAP
– PeopleSoft, …

• Both indicate that
– NT is 18 months behind Unix-SMP performance
– but clusters can make up the difference

TPC-C

SMP
• HP 9000 16 cpu, Sybase 11: 52.1 ktpmC, 82$/tpmC
• NEC 8 cpu, SQL Server: 14.9 ktpmC, 60$/tpmC

Cluster
• IBM SP2 12x8 cpu, Oracle 8.2: 57 ktpmC, 148$/tpmC

• Prediction: a large, inexpensive NT cluster number this year.

Diseconomy of Scale: Big systems are Expensive

27$/tpmC vs 148$/tpmC

[Chart: tpmC per k$ (y-axis, 0 to 40) versus tpmC (x-axis) for systems at 11,007, 16,101, 52,117, and 57,053 tpmC]
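The slide quotes price/performance in $/tpmC, while the chart plots tpmC per k$. The two are reciprocals, and a minimal Python sketch of the conversion, using only figures quoted on the slide, looks like this (the function name is illustrative, not from the talk):

```python
def dollars_per_tpmc_to_tpmc_per_kdollar(dollars_per_tpmc: float) -> float:
    """Convert price/performance ($/tpmC) into throughput per k$ (tpmC/k$)."""
    return 1000.0 / dollars_per_tpmc


if __name__ == "__main__":
    # Prices quoted on the TPC-C slide.
    for price in (27, 60, 82, 148):
        value = dollars_per_tpmc_to_tpmc_per_kdollar(price)
        print(f"{price:>4} $/tpmC = {value:5.1f} tpmC per k$")
```

Read this way, the 27 $/tpmC system sits near the top of the chart's 0 to 40 scale and the 148 $/tpmC cluster near the bottom, which is the diseconomy of scale the slide describes.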

TPC-D

• Performance champions: NCR/Teradata
– 1 TB: 32x4 node cluster
– 300 GB: 24x4 node cluster
– 100 GB: 8x4 node cluster

• All use Teradata software on NCR WorldMark Intel-based hardware

Database size   System                 QppD    QthD
1,000 GB        NCR WorldMark Server    3069    1205
300 GB          NCR WorldMark Server    9260    3117
100 GB          NCR WorldMark Server   12149    3912

Outline

• Status report on Commodity Server Performance

• Why Most VLDBs will be Multi-Media Servers

• Preview of Microsoft’s SQL Server 7

VLDB Reality Test

• California DMV
– ~20 million cars, drivers, doctors, barbers, ...
– Some drivers have moving violations
– DMV knows about 1.5 KB about each one
– 30 GB total

• Microsoft: too big, says DoJ
– 40 B$ revenue (in company lifetime)
– ~1 billion unit sales @ 100 B = 100 GB
– ~100 M customers @ 1 KB = 100 GB

• Wal-Mart (no one bigger!)
– Sells 10 B items per year
– 100 bytes/item => 1 TB

• ATT
– 300 M calls per day (peak day)
– 10 B calls per year
– 100 B/call = 1 TB
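The sizing estimates on this slide are all record count times bytes per record. A minimal Python sketch of that arithmetic follows, using the slide's own figures; decimal units (1 KB = 1,000 bytes) and the dictionary labels are assumptions made for illustration.

```python
KB, GB, TB = 10**3, 10**9, 10**12  # decimal units, an assumption


def total_bytes(records: float, bytes_per_record: float) -> float:
    """Back-of-envelope database size: record count times record size."""
    return records * bytes_per_record


estimates = {
    "California DMV": total_bytes(20e6, 1.5 * KB),
    "Microsoft unit sales": total_bytes(1e9, 100),
    "Microsoft customers": total_bytes(100e6, 1 * KB),
    "Wal-Mart items per year": total_bytes(10e9, 100),
    "ATT calls per year": total_bytes(10e9, 100),
}

for name, size in estimates.items():
    print(f"{name:<26} {size / GB:8.0f} GB")
```

Even the largest of these transaction-style workloads lands at about 1 TB, which is the point of the reality test.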

VLDB Reality Test

• It's HARD to find 1 TB of transaction data
– 100 M web hits/day
– 250 B/hit
– 1 TB/year

• It's HARD to find 1 TB of text data
– 100 M web pages
– 10 KB/page
– = 1 TB

• How do they do it?

• Lots of indices?
– No: that is only 3x

• Precomputed aggregates?
– Yes: OLAP benchmark
  • Start at 30 MB
  • Use 2.7 GB or 6 GB database
– But: this is dumb

• Email?
– Microsoft: 6 TB
– Hotmail: 3.5 TB
– AOL?

Data Tidal Wave

• Seagate 47 GB drive @ 3 k$
– 100 GB penny-per-MB drive coming in 2000

• 10 $/GB = 10 k$/Terabyte! (in y2k)
– Everyone can afford one

• What’s a terror bite?
– If you sell ten billion items a year (e.g., Wal-Mart)
– And you record 100 bytes on each one
– Then you got a Terror Bite

• Where will the terror bytes come from?
– Multimedia (like the TerraServer) and...

Multi Media: Very Large DBs

• Photo is 100 KB, not 100 B
– So, photo DBs are 1,000x larger

• Examples:
– Scanned documents
– Photo records of products/people/places
– Surveillance
– Scientific monitoring

Some TerrorByte Databases

• EOS/DIS (picture of planet each week)
– 15 PB by 2007

• Federal Reserve Clearing house: images of checks
– 15 PB by 2006 (7 year history)

• Sloan Digital Sky Survey:
– 40 TB raw, 2 TB cooked

• TerraServer:

Scaleup - Big Database

• Build a 1 TB SQL Server database
– Show off Windows NT and SQL Server scalability
– Stress test the product

• Data must be
– 1 TB
– Unencumbered
– Interesting to everyone everywhere
– And not offensive to anyone anywhere

• Loaded
– 1.1 M place names from Encarta World Atlas
– 1 M sq km from USGS (1 meter resolution)
– 2 M sq km from Russian Space Agency (2 m)

• Will be on web (world’s largest atlas)
• Sell images with commerce server
• USGS CRADA: 3 TB more coming

TerraServer: World’s Largest PC!

• 324 disks (2.9 terabytes)

• 8 x 440 MHz Alpha CPUs

• 10 GB DRAM

• NT EE & SQL 7.0

• Photo of the planet: USGS and Russian images

Background

• Earth is 500 tera-meters square
– USA is 10 tm2

• 100 tm2 land in 70ºN to 70ºS

• We have pictures of 6% of it
– 3 tsm from USGS
– 2 tsm from Russian Space Agency

• Compress 5:1 (JPEG) to 1.5 TB

• Slice into 10 KB chunks

• Store chunks in DB

• Navigate with
– Encarta™ Atlas
  • globe
  • gazetteer
– StreetsPlus™ in the USA

• Someday
– multi-spectral image
– of everywhere
– once a day / hour

Image sizes:
40x60 km2 jump image
20x30 km2 browse image
10x15 km2 thumbnail
1.8x1.2 km2 tile

USGS Digital Ortho Quads (DOQ)

• US Geological Survey

• 3 TeraBytes
• Most data not yet published
• Based on a CRADA
– TerraServer makes data available.

[Figure: USGS “DOQ” coverage: 1x1 meter, 4 TB, continental US, new data coming]

Russian Space Agency (SovInformSputnik) SPIN-2 (Aerial Images is Worldwide Distributor)

• 1.5 meter geo-rectified imagery of (almost) anywhere

• Almost equal-area projection

• De-classified satellite photos (from 200 km)

• More data coming (1 m)

• Want to sell imagery on Internet.

• Putting 2 tm2 onto TerraServer.

http://www.TerraServer.com

Demo

[Logos: Microsoft BackOffice, SPIN-2]

Hardware

• 1 TB database server
• AlphaServer 8400 4x400
• 10 GB RAM
• 324 StorageWorks disks
• 10-drive tape library (STC TimberWolf DLT7000)

[Diagram labels: AlphaServer 8400 (8 x 440 MHz Alpha CPUs, 10 GB DRAM); Enterprise Storage Array with seven groups of 48 9 GB Drives; STC 9740 DLT Tape Library; 100 Mbps Ethernet Switch; DS3 Internet; Map Server; Site Servers]

Software

[Architecture diagram labels, grouped as they appear:]
• Web Client: browser, HTML, Java Viewer, connected over The Internet
• Terra-Server Web Site: Internet Information Server 4.0, Active Server Pages, MTS, Image Server
• Terra-Server DB: Sphinx (SQL Server), Terra-Server Stored Procedures
• Automap Server: Microsoft Automap ActiveX Server, Internet Info Server 4.0
• Image Provider Site(s): Internet Information Server 4.0, Microsoft Site Server EE, Image Delivery Application, SQL Server 7

System Management & Maintenance

• Backup and Recovery
– STC 9717 Tape robot
– Legato NetWorker™
– Sphinx Backup/Restore Utility
– Clocked at 80 MBps!!

• SQL Server Enterprise Mgr
– DBA Maintenance
– SQL Performance Monitor

TerraServer File Group Layout

• Convert 324 disks to 28 RAID5 sets plus 28 spare drives

• Make 4 NT volumes (RAID 50), 595 GB per volume

• Build 30 20 GB files on each volume

• DB is File Group of 120 files

[Diagram: seven pairs of HSZ70 controllers (A and B) behind the four NT volumes E:, F:, G:, H:]
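The file-group arithmetic on this slide (4 volumes, 30 files of 20 GB on each, 120 files in all) can be sketched in a few lines of Python. The drive letters come from the slide's diagram; the file names and the `.ndf` extension are hypothetical, since the slide does not give them.

```python
VOLUMES = ["E:", "F:", "G:", "H:"]  # the four NT volumes shown in the diagram
FILES_PER_VOLUME = 30
FILE_SIZE_GB = 20


def file_group_layout():
    """Return (volume, file_name, size_gb) for every file in the file group."""
    layout = []
    for volume in VOLUMES:
        for i in range(1, FILES_PER_VOLUME + 1):
            # File naming is hypothetical; the slide does not give names.
            layout.append((volume, f"{volume}\\terra_{i:02d}.ndf", FILE_SIZE_GB))
    return layout


layout = file_group_layout()
print(len(layout), "files,", sum(size for _, _, size in layout), "GB nominal")
```

The nominal 600 GB per volume is slightly above the 595 GB per volume quoted on the slide, so the 20 GB file size is presumably rounded.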

Gazetteer Design

• Classic Snowflake Schema

• Fast First hint to Optimizer

Tables in the schema diagram (row counts in parentheses):
• Place (1,089,897): PlaceID, AlternateName, Name, CountryID, StateID, TypeID, GazSourcID, Latitude, Longitude, UGridID, ZGridID, DOQdate, SPIN2date, ImageFlag
• FeatureType (13): TypeID, Description
• GazetteerSource (1): GazSrcID, Description
• Country (264): CountryID, CountryName, UNcode
• State (1083): StateID, CountryID, StateName
• CountrySearch (1148): AlternateName, CountryID, GazSrcID
• StateSearch (3776): AlternateName, CountryID, StateID, FeatureID, GazSrcID
• PlaceGrid (50,000,000): ZGridID, BestPlaceName, XDistance, YDistance
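To make the snowflake shape concrete, here is a minimal sketch using Python's built-in `sqlite3` module rather than SQL Server. Table and column names follow the diagram, but the column types, the subset of columns, and the sample row are assumptions for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Country (CountryID INTEGER PRIMARY KEY, CountryName TEXT, UNcode TEXT);
CREATE TABLE State   (StateID INTEGER PRIMARY KEY,
                      CountryID INTEGER REFERENCES Country(CountryID),
                      StateName TEXT);
CREATE TABLE FeatureType (TypeID INTEGER PRIMARY KEY, Description TEXT);
CREATE TABLE Place   (PlaceID INTEGER PRIMARY KEY, Name TEXT,
                      CountryID INTEGER REFERENCES Country(CountryID),
                      StateID INTEGER REFERENCES State(StateID),
                      TypeID INTEGER REFERENCES FeatureType(TypeID),
                      Latitude REAL, Longitude REAL);
""")

# Sample rows are invented for the example.
conn.execute("INSERT INTO Country VALUES (1, 'United States', 'US')")
conn.execute("INSERT INTO State VALUES (1, 1, 'Washington')")
conn.execute("INSERT INTO FeatureType VALUES (1, 'City')")
conn.execute("INSERT INTO Place VALUES (1, 'Redmond', 1, 1, 1, 47.67, -122.12)")


def find_place(name):
    """Resolve a place name by joining outward through the dimension tables."""
    return conn.execute(
        """
        SELECT p.Name, s.StateName, c.CountryName, f.Description
        FROM Place p
        JOIN State s ON s.StateID = p.StateID
        JOIN Country c ON c.CountryID = s.CountryID
        JOIN FeatureType f ON f.TypeID = p.TypeID
        WHERE p.Name = ?
        """,
        (name,),
    ).fetchall()


print(find_place("Redmond"))
```

The large Place table sits in the middle, and the small lookup tables (State, which in turn references Country, and FeatureType) hang off it, which is what distinguishes a snowflake from a flat star.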

Image Data Design

• Image pyramid stored in DBMS (250 M recs)

Tables in the schema diagram (row counts in parentheses):
• ImgSource (2): SrcID, SrcName, SrcTblName, SrcDescription, GridSysID, ImgTypeID
• ImgType (4): ImgTypeID, ImgFileDesc, ImgFileExt, MimeStr
• Pick (10): Name, Description, Link, PickDate
• OriginalMetaData (650 k SPIN2, 2 M USGS): OrigMetaID, SrcID, ImageSource, Agency, SourcePhotoID, SourcePhotoDate, SourceDEMDate, MetaDataDate, ProductionSystem, ProductionDate, DataFileSize, Compression, HeaderBytes, … 80 other fields
• ImageMeta (650 k SPIN2, 2 M USGS): ImgMetaID, OrigMetaID, ImgStatus, ImgDate, ImgTypeID, JumpPixHeight, JumpPixWidth, BrowsePixHeight, BrowsePixWidth, ThumbPixWidth, ThumbPixHeight, CutCol, CutRow, MidLat, MidLong, NELat, NELong, NWLat, NWLong, SELat, SELong, SWLat, SWLong, UGridID, UTMZone, XUtmID, YUtmID, XGridID, YGridID, ZGridID
• TileMeta (16 M SPIN2, 96 M USGS): ImgMetaID, OrigMetaID, SrcID, ImgStatus, ImgDate, ImgTypeID, TilePixHeight, TilePixWidth, CutCol, CutRow, MidLat, MidLong, NELat, NELong, NWLat, NWLong, SELat, SELong, SWLat, SWLong, UGridID, UTMZone, XUtmID, YUtmID, XGridID, YGridID, ZGridID
• Jump (.65 M SPIN2, 1.5 M USGS): UGridID, ZGridID, ZTileGridID, ImgData, ImgDate, ImgTypeID, ImgMetaID, SrcID, EncryptKey, File Name
• Browse (.65 M SPIN2, 1.5 M USGS): same columns as Jump
• Thumb (.65 M SPIN2, 1.5 M USGS): same columns as Jump
• Tile (16 M SPIN2, 96 M USGS): same columns as Jump
• UGridHits (xxx): URL, UGridID, ZTileGridID, count
• Log (xxx): URL, Time, <extensive list of action parameters>

Image Delivery and Load

[Diagram labels: DLT Tape “tar”; \Drop’N’; DoJob; Wait 4 Load; LoadMgr; DB; 100 mbit Ether Switch; Enterprise Storage Array with three sets of 108 9.1 GB Drives; AlphaServer 8400; STC DLT Tape Library; 60 4.3 GB Drives; two AlphaServer 4100s; ESA; LoadMgr; DLT Tape; NTBackup; ImgCutter; \Drop’N’ \Images]

LoadMgr steps:
10: ImgCutter
20: Partition
30: ThumbImg
40: BrowseImg
45: JumpImg
50: TileImg
55: Meta Data
60: Tile Meta
70: Img Meta
80: Update Place
...

SQL 7 Testimonial

• We started using it March 4 1997
– SQL 7 Pre-Alpha
– SQL 7 Alpha
– SQL 7 Beta 1
– SQL 7 Beta

• Loaded the DB twice
– (we made application mistakes)

• Now doing it “right”

• Reliability: Great! SQL 7 never lost data

• Ease of use: Great!

• Functionality: Great!

Outline

• Status report on Commodity Server Performance

• Why Most VLDBs will be Multi-Media Servers

• Preview of Microsoft’s SQL Server 7

SQL 7: Easy & Functional

Easy
• Dynamic self management
• Multi-site management
• Alert/response management
• Job scheduling and execution
• Scriptable management
• Profiling/tuning tools
• Fully Unicode
• English Language Query
• Integrated text search engine

Scalability

Data Warehousing

Made It Easier! (fewer knobs)

• Desktop & Workgroups
– Auto Configure Engine / Dynamic Disk/memory
– Reduce Learning Curve, Increase Productivity
– Self-Managing SQLAgent, Wizards, “Task Pads”

• Large Organizations
– Deploy/manage hundreds of SQL Servers
– Lower TCO for Large Environments
– Multi-Server Operations / “Lights-out” Environment

Multi-Site Management

• Admin servers from one place
• Automate simple stuff
• Wizards for common stuff
• Manage arrays of servers
– operations, security, …
– Replication
– Import/export
• Interface is scriptable
– COM object model
– Script with Java, VB, ...
• Scheduling and multi-step jobs

DBA and Developer Tools

• Built-in GUI
– data/schema design
– data query & edit
– integrated with programming tools

• SQL Server Profiler
– Selected server events and trace criteria
– “Capture” output to screen or replay

• SQL Server Expert
– Analyzes actual server usage history
– Makes recommendations to improve performance
– Recommends index design
– Recommends operations procedures

Wizards and GUIs

• Wizards galore (over 50 at last count)
• MS Access as a query interface
• Built-in data access tools (integrated with tools)
• Graphical show plan

Many New Wizards...

• Create a Database

• Scheduled Backup

• Create a Maintenance Plan

• Create a Scheduled Job

• Create an Alert

• Security Wizard

• Import Data to SQL Server

• Export Data From SQL Server

• Clustering (Wolfpack)

• Index Tuning Wizard

• Web Assistant

• Register Servers

• Configure Replication

• Create Publication

• Create Pull Subscription

• Create Push Subscription

• Replication Partitioning

• Create an Index

• Create a Stored Procedure

• Create a View

• More to come...

Distributed Management Objects (SQL-DMO)

• COM Interfaces for administering SQL Server
– Embedded Administration (no UI)

• All Administration Functions Supported
– Server, Database Configurations, Settings
– Object Creation, Security, Replication, Scripting, ...
– 40+ Objects, 1000+ properties and methods

• Integration Interface for ISV Administration
– E.g., Baan using DMO for Scripted App Install

• Scripting via VBA and JScript + DCOM

DMO: Object Model (Overview)

[Object hierarchy diagram rooted at SQL Server. Objects shown: Databases, Logins, Configurations, Remote Login, Linked Servers, SQLAgent, Alerts, Operators, Tasks, Jobs, Publications, Users, DB Options, Transaction Logs, FileGroups, Files, Table, Columns, Indexes, Keys (PK/FK), Triggers, View, Stored Procs, Rules, Defaults]

DMO Scripting

• Backup a Database

Set MyServer = CreateObject("SQLDMO.SQLServer")   ' Create Server Object
Set MyBackup = CreateObject("SQLDMO.Backup")      ' Create Backup Object

MyServer.Name = "MSSALES"                         ' Identify Server
MyServer.LoginSecure = True                       ' Windows NT Auth
MyServer.Connect                                  ' Connect

MyBackup.Database = "SALESII"                     ' Database to backup
' Backup location, with the backup file named after the database.
' (A comment cannot follow the line-continuation character, so it sits here.)
MyBackup.Files = "\\MyServer\Backups\" _
    + MyBackup.Database + ".bak"
MyBackup.SQLBackup MyServer                       ' Back it Up

MyServer.Disconnect                               ' We're Done!

Scalability

• Win9x/NTW version
• Dynamic row-level locking
• Improved query optimizer
• Intra-query parallelism
• 64-bit support
• Replication
• Distributed query
• High Availability Clusters

Easy

Scalability

Data Warehousing

Scale Down to Windows 95-98

• Full function (same as NTW)

• Self managing

• Many tools

• Integration with next MS Access

• Great for embedded apps

Replication

• Transactional and Merge

• Remote update

• ODBC and OLE DB subscribers

• Wizards

• Performance

[Diagram labels: Publisher; Distributor; Subscribers; Updating Subscriber (immediate updates) using 2PC, RPC; heterogeneous subscribers: DB2, CICS, VSAM, OS 390 DB2]

Parallel Query
SMP & Disk Parallelism

[Diagram: 50,000 rows on disks feed four parallel local aggregations (# of emp. per group, total inc. per group), producing 4 x 50 rows; a global aggregation combines them into a 50-row result]

• Plus Distributed

• Plus Hash Join (fanciest on the planet)

• Plus Optimized Partitioned views
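The local/global aggregation pattern in the diagram can be sketched in Python as follows. This is an illustration of the two-phase idea, not SQL Server's implementation: the data is synthetic, the four partitions are processed one after another rather than on separate CPUs, and all names are made up for the example.

```python
from collections import defaultdict


def local_aggregate(rows):
    """Per-partition pass: [employee count, total income] for each group."""
    partial = defaultdict(lambda: [0, 0])
    for group, income in rows:
        partial[group][0] += 1
        partial[group][1] += income
    return partial


def global_aggregate(partials):
    """Combine the small per-partition results into the final answer."""
    result = defaultdict(lambda: [0, 0])
    for partial in partials:
        for group, (count, total) in partial.items():
            result[group][0] += count
            result[group][1] += total
    return {group: tuple(values) for group, values in result.items()}


# Synthetic data: 50,000 rows in 50 groups, split into 4 contiguous partitions.
rows = [(i % 50, 1000 + i % 7) for i in range(50_000)]
partitions = [rows[k * 12_500:(k + 1) * 12_500] for k in range(4)]

partials = [local_aggregate(p) for p in partitions]   # 4 x 50 rows
result = global_aggregate(partials)                   # 50 rows

print(sum(len(p) for p in partials), "partial rows ->", len(result), "result rows")
```

The global step touches only 200 partial rows instead of 50,000 base rows, which is why the local passes are the part worth running in parallel.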

Distributed Heterogeneous Queries
Data Fusion / Integration

• Join spreadsheets, databases, directories, text DBs, etc.

• Any source that exposes OLE DB interfaces

• SQL Server as gateway, even on the desktop

[Diagram: the SQL 7.0 Query Processor surrounded by its data sources: Database (DB2, VSAM, Oracle, …), Spreadsheet, Photos, Mail, Maps, Documents and the Web, Directory Service]

Utilities: The Key to LARGE Databases

• Backup
– Fuzzy
– Parallel
– Incremental
– Restartable

• Recovery
– Fast
– File granularity

• Reorganize
– shrinks file
– reclusters file

• Auto-repair

Data Warehousing

• Warehousing Framework
• Visual data modeler
• Microsoft repository
• Data transformation services (DTS)
• Plato & Dcube - Multi Dimensional Data Cubes
• English query 2.0
• Built-in text-index engine

Easy

Scalability

Data Warehousing

Key Microsoft Data Warehouse Programs

• Data Warehouse Framework (DWF)
– Process -- for building, using and managing
– Pipeline -- for metadata flow
– Protocols -- to integrate components

• Data Warehouse Alliance (DWA)
– Partners -- ISVs pledged to the framework and its parts
– Products -- complete spectrum from Microsoft and third parties

Microsoft Data Warehousing Framework

[Diagram: components arranged under Building, Using, and Managing, connected by OLE-DB**, with data flow and meta-data flow marked]
• Operational Data (OLE-DB **)
• Data Warehouse Design (logical/physical schema*, data flow**)
• Data Mart Design** (Cubes/Star schema)
• Data Transformations (DTS**)
• Data Marts (SQL Server** & OLAP Server**)
• End-User Tools (Excel**, Access, English Query)
• Data Warehouse Management (Console*, Scheduling**, Events**, Topology*)
• Microsoft Repository** (Persistent Shared Meta-Data): DB Schema**, Transformation**, Scheduling, OLAP

** available in SQL Server 7 (* partially)

Alliance for Data Warehousing

DW Build: BMC, Data Mirror, Execusoft, Informatica, Microsoft, Platinum Technology, Praxis, Prism, Sagent, SAS, Sterling, V-Mark

DW Access: Andyne, Business Objects, Cognos, IQ Software, Microsoft, NCR Data Mining, Pilot, Platinum Technology, Sagent, SAS, Seagate, Wall Data

• Technical and marketing relationship

• Supports SQL Server storage engine

• Third-party products tested with BackOffice

DW Alliance Milestones

• 9/96 - Launched with 8 founding members

• 3/97 - Design review

• 1/97 - 6/97 - Expanded to 21 members

• 7/97 - Repository design review
– Team development of shared metadata

• 9/97 - OLE DB for OLAP API specification

• 1H’98 - Integration development with Sphinx DTS and Replication APIs

Microsoft Repository

• Based on joint Sterling/Microsoft design (shipped 97Q2)
• Wide distribution: VB, Visual Studio and third parties
• Designed with over 60 vendors
• Extended to support DB schema, transformations, OLAP
– Key element of the DW Framework
• UML is abstract model
• Everything viewable in UML terms

UML Unified Modeling Language

UMX UML Extensions

CDE Component Descriptions

COM Component Object Model

DBM Database Model

DTM Data Type Model

GEN Generic

SQL Microsoft SQL Server

OCL Oracle

[Diagram: layering of the models UML, UMX, CDE, COM, DBM, DTM, GEN, SQL, OCL]

Repository & Data Warehousing

• Common infrastructure -- the meta-data pipeline

• Supports interoperability between data warehousing tools and products

• Process:
– Initial spec developed with 12 vendors
– Gathering feedback now
– Final spec review in Redmond, 2/98

Data Transformation

[Diagram labels: Repository Metadata; Transforms; Oracle > SQL Server; Transformation Objects; ActiveX Scripts; SQLAgent Multiserver Operations; Data Pump (IDTSDataPump, IUnknown)]

' Transform(): map the source credit rating to a risk label
Function Example()
    If DTSSource("CreditRating") = "1" Then
        DTSDestination("Risk") = "Good"
    ElseIf DTSSource("CreditRating") = "2" Then
        DTSDestination("Risk") = "Average"
    ElseIf DTSSource("CreditRating") = "3" Then
        DTSDestination("Risk") = "Bad"
    Else
        Example = DTS_SkipRow    ' Unknown rating: skip the row
    End If
End Function

• Workflow system manages Data Pump
– Pre-defined transforms using the DTS GUI
– Procedural VB Script, JavaScript, VBA, any COM

• Multi-stream in, multi-stream out

Transformations

• Data quality and validation
– Missing values, scrubbing, exception handling

• Data integration
– Heterogeneous query, join keys, eliminate duplicates

• Transforms
– Combine/decompose multiple columns to one

• Aggregation

• Central metadata
– Business rules, data lineage

Flexible Architecture

• Debates between MOLAP and ROLAP vendors obscure customer needs

• Plato is the product that best supports MOLAP, ROLAP and Hybrid and offers the most seamless integration of all three

• Users & apps only see cubes

[Diagram labels: MOLAP (User View, Data load, Persistent Store); ROLAP (User View, Data access, MD Cache); Hybrid (User View, MD Cache)]

Plato and Dcube and HOLAP

[Diagram labels: Source table; Partition 1, Partition 2, Partition 3; ROLAP; Europe, USA, Asia; SQL; “Plato” server; “Plato” Designer; MD SQL; Dcube and Client app for User 1; Dcube and Client app for User 2]

[Data cube figure: makes CHEVY, FORD; years 1990, 1991, 1992, 1993; colors RED, WHITE, BLUE; aggregates By Color, By Make, By Year, By Make & Year, By Color & Year, Sum]
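The cube figure on this slide shows a fact table aggregated by every combination of its dimensions (by color, by make and year, and so on, down to the grand total). A minimal Python sketch of that idea follows; the dimension names come from the figure, while the unit counts and function name are invented for illustration and say nothing about how Plato stores its cubes.

```python
from collections import defaultdict
from itertools import combinations

DIMENSIONS = ("make", "year", "color")


def cube(facts):
    """Return {grouping dims: {key tuple: summed units}} for all 2**n group-bys."""
    result = {}
    for r in range(len(DIMENSIONS) + 1):
        for dims in combinations(DIMENSIONS, r):
            totals = defaultdict(int)
            for fact in facts:
                key = tuple(fact[d] for d in dims)
                totals[key] += fact["units"]
            result[dims] = dict(totals)
    return result


# Sample facts are made up; only the dimension values appear on the slide.
facts = [
    {"make": "CHEVY", "year": 1990, "color": "RED", "units": 5},
    {"make": "CHEVY", "year": 1991, "color": "BLUE", "units": 3},
    {"make": "FORD", "year": 1990, "color": "RED", "units": 4},
    {"make": "FORD", "year": 1992, "color": "WHITE", "units": 2},
]

c = cube(facts)
print("Sum:", c[()])
print("By Make:", c[("make",)])
print("By Color & Year:", c[("year", "color")])
```

With three dimensions there are eight group-bys; the count doubles with each added dimension, which is the data explosion the next slides address.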

How Plato Handles Data Explosion

[Diagram: lattice of aggregations derived from the Fact Table, with nodes labeled by product level (Products, Product Family) and time level (Month, Quarter)]

Aggregation Wizard finds the aggregations that feed the most other aggregations

How Plato Handles Data Explosion

• Aggregation Wizard finds the “80-20” rule in the data
– The 20 percent of all possible pre-aggregations that provide 80 percent of the performance gain
– Analyzes level counts for each dimension and parent-child ratios for each level

• Independent of OLAP data model
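One way to picture "aggregations that feed the most other aggregations" is to count, for each candidate aggregation, how many coarser aggregations could be computed from it. The Python sketch below does only that counting over a two-dimension lattice. It is not the Aggregation Wizard's algorithm: the level names come from the previous slide's diagram, the "All" level is an assumption, and a real selection would also weigh aggregation sizes using the level counts and parent-child ratios mentioned above.

```python
from itertools import product

# Levels per dimension, finest first. The fully rolled-up "All" level is assumed.
HIERARCHIES = {
    "product": ["Product", "Product Family", "All"],
    "time": ["Month", "Quarter", "All"],
}


def aggregations():
    """Every combination of one level per dimension."""
    return list(product(*HIERARCHIES.values()))


def can_feed(source, target):
    """True if target can be computed from source, i.e. source is at least
    as fine-grained as target in every dimension."""
    return all(
        levels.index(s) <= levels.index(t)
        for levels, s, t in zip(HIERARCHIES.values(), source, target)
    )


def feed_counts():
    """For each aggregation, the number of other aggregations it can feed."""
    aggs = aggregations()
    return {a: sum(1 for b in aggs if b != a and can_feed(a, b)) for a in aggs}


for agg, n in sorted(feed_counts().items(), key=lambda kv: -kv[1]):
    print(agg, "feeds", n, "other aggregations")
```

The finest aggregation feeds everything but is also the largest to store, so a practical chooser trades the feed count off against size.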

OLE DB For OLAP

• OLE DB extensions to access MD data
– Part of OLE DB 2.0

• One new object: Dataset

• Enhancements to existing objects

• Heavily leverages OLE DB

OLE DB For OLAP
Objects And Interfaces

[Diagram labels: CoCreateInstance, Enumerator, Data source, Session, Command, Schema Rowsets, Dataset, Flattened Rowset, Range Rowset]

English Query

OBJECT RELATIONAL
The Next Great DBMS Wave

• All the DB vendors are adding objects

• Microsoft is adding DBs to Objects

• Integration with COM+

• Gives user-defined types and objects

• Plug-ins will be a billion-dollar industry
– Blades for SQL Server razor

Outline

• Status report on Commodity Server Performance

• Why Most VLDBs will be Multi-Media Servers

• Preview of Microsoft’s SQL Server 7