Massive scale with Microsoft SQL Server 2008 R2 Parallel Data Warehouse Edition

Rushabh Mehta, Managing Director (India) | Solid Quality Mentors | [email protected]


About me: Rushabh Mehta
• Professional Association for SQL Server: President
• Solid Quality Mentors (SolidQ): Business Intelligence Mentor; Managing Director, India
• SQL Server MVP

[email protected] ◊ www.solidq.com ◊ @rushabhmehta


Agenda
• Microsoft Data Warehousing Overview
• SMP vs. MPP Architecture
• Microsoft Parallel Data Warehouse Architecture and Components


Microsoft Data Warehousing Offerings


Microsoft's Commitment to DW and BI: Pervasive Insight

Timeline: SQL 6.0 (1995), SQL 6.5 (1996), SQL 7.0 (1998), SQL 2000 (2000), SQL CE (2000), 64-bit (2001), SQL 2005 (2005), SQL 2008 (2008), Fast Track (2009), PDW (2010)

Capabilities added across these releases: OLAP and ETL, Data Mining, Managed Reporting, Data Warehousing, Ad-hoc Reporting, DW Scale, Data Profiling, Compression, VS Integration, KPIs, Multiple sources, Resource Governor, Partitioning, PowerPivot, Load Optimize, Parallel Processing, Scale to 100s of TB

• Gartner Leaders Quadrant for Business Intelligence, since 2008
• Gartner Leaders Quadrant for Data Warehouse, since 2008
• Leader in "The Forrester Wave: Enterprise Data Warehousing Platforms, Q1 2009"
• Fastest growing of the top 5 data warehouse vendors (IDC)
• Microsoft as a company spends $9.1 billion annually on research


SQL Server Fast Track Data Warehouse

• A method for designing a cost-effective, balanced system for data warehouse workloads
• Reference hardware configurations developed with hardware partners using this method
• Best practices for data layout, loading and management

Solution to help customers and partners accelerate their data warehouse deployments
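The "balanced system" idea can be illustrated with simple arithmetic. Size the storage throughput to match the rate at which the CPU cores can consume data, so that neither side sits idle. The helper below is a minimal, hypothetical sketch. The function name, the per-core consumption rate and the per-array throughput are illustrative assumptions, not figures from the Fast Track reference configurations.

```python
import math


def balanced_storage_arrays(cores: int, per_core_mb_s: float, per_array_mb_s: float) -> int:
    """Number of storage arrays whose combined sequential read throughput
    keeps up with how fast the CPU cores can consume data.

    All rates are hypothetical inputs in MB/s; real figures must be measured.
    """
    if cores <= 0 or per_core_mb_s <= 0 or per_array_mb_s <= 0:
        raise ValueError("all inputs must be positive")
    cpu_consumption_mb_s = cores * per_core_mb_s
    # Round up: a partial array's worth of bandwidth still needs a whole array.
    return math.ceil(cpu_consumption_mb_s / per_array_mb_s)


# Hypothetical server: 8 cores at 200 MB/s each, arrays that deliver 500 MB/s each.
print(balanced_storage_arrays(8, 200, 500))  # 1600 / 500 = 3.2, rounded up to 4
```

Under-provisioning storage leaves cores waiting on I/O. Over-provisioning it adds cost without adding speed. That trade-off is why the sizing starts from the CPU side.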


Fast Track Scope

[Diagram: Fast Track scope along the data path and presentation data flow, grouped into Supporting Systems, BI Data Storage Systems and Presentation Layer Systems. Components shown: Integration Services ETL; data staging and bulk loading; data warehouse on a SAN/storage array; subject-area data marts; Analysis Services cubes; Reporting Services; PerformancePoint Services; SharePoint Services and Microsoft Office SharePoint; web analytic tools. The reference architecture scope is marked with a dashed outline.]


Fast Track Value Proposition


• Appliance-like time to value: reduces DBA effort; fewer indexes, much higher level of sequential I/O
• Choice of HW platforms: Dell, HP, Bull, EMC and IBM, with more in future
• Low TCO through commodity hardware and value pricing; lower storage costs
• High scale: new reference architectures scale up to 48 TB (assuming 2.5x compression)
• Reduced risk: validated by Microsoft; better choice of hardware; application of best practices
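The 48 TB figure depends on the compression ratio. This sketch assumes that "48 TB (assuming 2.5x compression)" means 48 TB of uncompressed user data. On that reading, the disk actually needed is 48 / 2.5 = 19.2 TB. The two hypothetical helpers below do that arithmetic in both directions.

```python
def physical_tb_needed(user_data_tb: float, compression_ratio: float) -> float:
    """Disk space for `user_data_tb` of uncompressed data at the given ratio."""
    if compression_ratio <= 0:
        raise ValueError("compression_ratio must be positive")
    return user_data_tb / compression_ratio


def user_data_tb_supported(physical_tb: float, compression_ratio: float) -> float:
    """Uncompressed data that fits in `physical_tb` of disk at the given ratio."""
    if compression_ratio <= 0:
        raise ValueError("compression_ratio must be positive")
    return physical_tb * compression_ratio


print(physical_tb_needed(48, 2.5))  # 19.2
# If real data only compresses 1.5x, the same 19.2 TB of disk holds much less.
print(user_data_tb_supported(19.2, 1.5))
```

The second call shows why a capacity quoted "assuming" a compression ratio should be checked against the compression your own data actually achieves.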


SMP Architecture
SMP = Symmetric Multiprocessing
• Two or more identical processors connected to a single shared main memory and controlled by a single OS instance
• Any processor can work on any task
• Tasks can easily be moved between processors to balance the workload efficiently

All SQL Server implementations before PDW have been SMP.
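The SMP properties above can be mimicked with threads in a single process: one shared memory, and any processor can take any task. This Python sketch is an analogy only. A thread pool stands in for the processors, and a lock-protected dictionary stands in for shared main memory. In standard CPython builds, threads do not execute Python bytecode truly in parallel, so the sketch shows the programming model, not SMP performance.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

shared_totals: dict = {}        # "shared main memory" visible to every worker
lock = threading.Lock()         # serializes updates to the shared structure


def process(task: tuple) -> None:
    """Any worker may run any task; all workers update the same memory."""
    key, amount = task
    with lock:
        shared_totals[key] = shared_totals.get(key, 0) + amount


tasks = [("north", 10), ("south", 5), ("north", 7), ("east", 3), ("south", 1)]

# The pool hands each task to whichever worker is free (load balancing).
with ThreadPoolExecutor(max_workers=4) as pool:
    list(pool.map(process, tasks))

print(sorted(shared_totals.items()))  # [('east', 3), ('north', 17), ('south', 6)]
```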


MPP Architecture
MPP = Massively Parallel Processing
• Uses many separate CPUs running in parallel to execute a single program
• Each CPU has its own memory
• Applications must be segmented, using high-speed communications between nodes
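By contrast, an MPP program is segmented. Each node works only on data in its own memory and exchanges results over an interconnect. The sketch below simulates that inside one Python process. Each hypothetical `Node` owns a private partition, and nodes communicate only by putting messages on a queue that stands in for the high-speed network. It illustrates the shared-nothing model, not PDW's implementation.

```python
import queue
import threading


class Node:
    """A simulated MPP node: private memory, no access to other nodes' data."""

    def __init__(self, node_id: int, rows: list) -> None:
        self.node_id = node_id
        self._rows = rows  # private partition; only this node reads it

    def run(self, outbox: queue.Queue) -> None:
        # Every node runs the same program segment on its own data...
        partial = sum(self._rows)
        # ...and sends only its small result over the "interconnect".
        outbox.put((self.node_id, partial))


data = list(range(1, 101))                        # values 1..100
nodes = [Node(i, data[i::4]) for i in range(4)]   # 4 nodes, disjoint partitions

interconnect: queue.Queue = queue.Queue()
threads = [threading.Thread(target=n.run, args=(interconnect,)) for n in nodes]
for t in threads:
    t.start()
for t in threads:
    t.join()

# A coordinator combines the per-node messages into the final answer.
total = sum(interconnect.get()[1] for _ in nodes)
print(total)  # 5050
```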


Parallel Data Warehouse

[Diagram: a Control Rack and one or more Data Racks.]


[Diagram: Parallel Data Warehouse appliance. Components: Control Nodes (active/passive), Management Servers, Landing Zone, Backup Node, Compute Nodes each running SQL Server, Storage Nodes, and a Spare Compute Node. Compute Nodes connect to Storage Nodes over dual Fibre Channel, and the nodes are interconnected over dual InfiniBand on a private network. Client drivers, the ETL load interface, the corporate backup solution and support/patching connect from the corporate network. A PDW node = Compute Node + Storage Node.]


Compute Nodes
Each MPP node is a highly tuned SMP node with standard interfaces:
• Dedicated hardware, database and storage
• Running SQL Server 2008 EE
• SQL as the primary interface


Architecture: Compute Server Node Hardware Options

Pre-configured for each SQL Server instance on each compute node:
• Drives configured as RAID1 to avoid appliance failover for a single drive failure
• Dell compute nodes have 2 LUNs (2 RAID1 pairs)
• HP compute nodes have 3 LUNs (3 RAID1 pairs)

tempdb is used for the following purposes:
• Sort-work area for data loading into clustered index tables
• Spill area for hash joins that do not fit into memory
• Temporary PDW tables

[Diagram: compute node with an enterprise-class DBMS, tempdb workspace, dual multi-core processors (CPU, CPU), RAM, dual 4Gb Fibre Channel and dual InfiniBand.]
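One use of tempdb listed above is as a spill area for hash joins that do not fit in memory. A simplified grace-style hash join illustrates the idea. When the build input exceeds a memory budget, both inputs are split into partitions by a hash of the join key, and each partition pair is then joined separately. In this Python sketch, the "spilled" partitions are plain lists standing in for tempdb. The memory budget, fan-out and function names are illustrative assumptions, not SQL Server's actual hash-join algorithm.

```python
from collections import defaultdict


def in_memory_hash_join(build, probe, build_key, probe_key):
    """Classic hash join: hash the build input, then stream the probe input."""
    table = defaultdict(list)
    for row in build:
        table[row[build_key]].append(row)
    return [{**b, **p} for p in probe for b in table.get(p[probe_key], [])]


def hash_join_with_spill(build, probe, build_key, probe_key, memory_rows, fanout=4):
    """Join in memory if `build` fits in `memory_rows`; otherwise partition
    ("spill") both inputs by a hash of the join key and join each partition
    pair separately. Returns (joined_rows, number_of_spill_partitions).
    """
    if len(build) <= memory_rows:
        return in_memory_hash_join(build, probe, build_key, probe_key), 0

    # The same hash on both sides guarantees matching keys share a partition.
    build_parts = [[] for _ in range(fanout)]
    probe_parts = [[] for _ in range(fanout)]
    for row in build:
        build_parts[hash(row[build_key]) % fanout].append(row)
    for row in probe:
        probe_parts[hash(row[probe_key]) % fanout].append(row)

    result = []
    for b_part, p_part in zip(build_parts, probe_parts):
        # Simplification: assume each partition now fits in memory; a real
        # engine would re-partition recursively if it did not.
        result.extend(in_memory_hash_join(b_part, p_part, build_key, probe_key))
    return result, fanout


stores = [{"store_id": i, "store_name": f"Store {i}"} for i in range(10)]
sales = [{"store_id": i % 10, "dollars": i} for i in range(25)]

rows, spilled = hash_join_with_spill(stores, sales, "store_id", "store_id", memory_rows=4)
print(len(rows), spilled)  # 25 4
```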


Data Layout
• Replicated: a table structure that exists as a full copy within each discrete PDW node.
• Distributed: a table structure that is hashed on a single column and uniformly distributed across all nodes on the appliance. Each distribution is a separate physical table in the DBMS.
• Ultra shared nothing: the ability to design a schema of both distributed and replicated tables to minimize data movement between nodes.

Small sets of data can be stored more efficiently in full (replicated). Certain set operations are more efficient against full sets of data (i.e., single-node operations).
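The two table types can be sketched as placement functions. A replicated table puts every row on every node. A distributed table sends each row to exactly one node, chosen by hashing the distribution column. The hash used below (CRC-32 modulo the node count) and the node count are assumptions for illustration; PDW's internal hash function is not described here. The sketch also shows the caveat behind "uniformly": rows spread evenly only when the distribution column has many, evenly used values.

```python
import zlib

NODES = 4  # hypothetical appliance size


def node_for(value) -> int:
    """Deterministically map a distribution-column value to one node (assumed hash)."""
    return zlib.crc32(str(value).encode("utf-8")) % NODES


def place_distributed(rows, column):
    """Distributed table: every row is stored on exactly one node."""
    layout = {n: [] for n in range(NODES)}
    for row in rows:
        layout[node_for(row[column])].append(row)
    return layout


def place_replicated(rows):
    """Replicated table: every node stores a full copy."""
    return {n: list(rows) for n in range(NODES)}


sales = [{"sale_id": i, "store_id": i % 3} for i in range(1000)]
stores = [{"store_id": s, "store_name": f"Store {s}"} for s in range(3)]

by_sale = place_distributed(sales, "sale_id")    # 1000 distinct values: rows spread out
by_store = place_distributed(sales, "store_id")  # only 3 distinct values: skewed
replicated_stores = place_replicated(stores)

print({n: len(r) for n, r in by_sale.items()})
print({n: len(r) for n, r in by_store.items()})           # at most 3 non-empty nodes
print({n: len(r) for n, r in replicated_stores.items()})  # {0: 3, 1: 3, 2: 3, 3: 3}
```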


Data Layout

Example star schema:
• Date Dim: Date Dim ID, Calendar Year, Calendar Qtr, Calendar Mo, Calendar Day
• Store Dim: Store Dim ID, Store Name, Store Mgr, Store Size
• Item Dim: Prod Dim ID, Prod Category, Prod Sub Cat, Prod Desc
• Sales Fact: Date Dim ID, Store Dim ID, Prod Dim ID, Mktg Camp ID, Qty Sold, Dollars Sold
• Promo Dim: Mktg Camp ID, Camp Name, Camp Mgr, Camp Start, Camp End

[Diagram: each PDW node holds a full copy of the dimension tables (DD = Date Dim, SD = Store Dim, ID = Item Dim, PD = Promo Dim) and its own slice of the Sales Fact table (SF1, SF2, ...).]
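This layout, with dimensions replicated and the fact table distributed, matters because of where joins can run. A simplified rule of thumb for inner joins in shared-nothing systems is as follows. The join is local to each node when at least one side is replicated, or when both sides are distributed on the join column. Otherwise, rows must be moved between nodes first. The function below encodes that rule as a hypothetical sketch. It is not PDW's query optimizer, which considers more strategies and costs. The distribution columns and the second fact table are also hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Table:
    name: str
    distribution_column: Optional[str]  # None means the table is replicated

    @property
    def replicated(self) -> bool:
        return self.distribution_column is None


def inner_join_strategy(left: Table, right: Table, left_col: str, right_col: str) -> str:
    """Classify an inner equi-join under a simplified shared-nothing rule.

    Assumes both distributed tables use the same hash function and node count,
    and that the join columns have compatible types.
    """
    if left.replicated or right.replicated:
        return "local"      # each node already holds the whole replicated side
    if left.distribution_column == left_col and right.distribution_column == right_col:
        return "local"      # equal keys were hashed to the same node
    return "move data"      # rows must be redistributed or broadcast first


# Hypothetical distribution choices for the star schema above.
sales_fact = Table("Sales Fact", "Store Dim ID")
store_dim = Table("Store Dim", None)
returns_fact = Table("Returns Fact", "Store Dim ID")  # hypothetical second fact table

print(inner_join_strategy(sales_fact, store_dim, "Store Dim ID", "Store Dim ID"))    # local
print(inner_join_strategy(sales_fact, returns_fact, "Store Dim ID", "Store Dim ID"))  # local
print(inner_join_strategy(sales_fact, returns_fact, "Prod Dim ID", "Prod Dim ID"))    # move data
```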


[Diagram: the appliance diagram with the Control Nodes (active/passive cluster) and client drivers highlighted.]


Control Node & Client Drivers
• Client connections always go through the control node
• The control node contains no persistent user data
• PDW 'secret sauce':
  ◦ Processes SQL requests
  ◦ Prepares the execution plan
  ◦ Orchestrates distributed execution
  ◦ Uses a local SQL Server for final query plan processing / result aggregation
• Client drivers provided by DataDirect:
  ◦ ODBC, OLE-DB, JDBC and ADO.NET client drivers
  ◦ Available for 32-bit and 64-bit
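The control node's role above is to prepare a plan, orchestrate distributed execution, and then let its local SQL Server aggregate the final result. This can be modeled as two-phase aggregation, which is an assumption about the general pattern rather than a description of PDW internals. In the Python sketch below, each compute node returns partial aggregates for its own rows, and the control node merges them. It shows why an average is shipped as a sum and a count: averaging the per-node averages would give the wrong answer. The node data and function names are hypothetical.

```python
from collections import defaultdict


def node_partial(rows):
    """Phase 1 (each compute node): SUM and COUNT per group over local rows."""
    partial = defaultdict(lambda: [0, 0])
    for region, amount in rows:
        partial[region][0] += amount
        partial[region][1] += 1
    return dict(partial)


def control_final(partials):
    """Phase 2 (control node): merge the partials, then compute AVG per group."""
    merged = defaultdict(lambda: [0, 0])
    for partial in partials:
        for region, (s, c) in partial.items():
            merged[region][0] += s
            merged[region][1] += c
    return {region: s / c for region, (s, c) in merged.items()}


node1 = [("east", 10), ("east", 20), ("west", 5)]
node2 = [("east", 60), ("west", 15), ("west", 25)]

print(control_final([node_partial(node1), node_partial(node2)]))
# {'east': 30.0, 'west': 15.0}  (averaging per-node averages would give east = 37.5)
```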


PDW Benefits – Massive Parallel Processing

[Diagram: Control Rack and Data Rack.]
• Query 1 is standard T-SQL submitted to SQL Server on the control node
• The query is executed on all 10 nodes
• Results are sent back to the client


PDW Benefits – Massive Parallel Processing

Fast performance from parallelizing queries across highly optimized, ultra shared-nothing nodes.

[Diagram: Control Rack and Data Rack.]
• Multiple queries are executed simultaneously across all nodes.
• PDW supports querying while data is loading.


[Diagram: the appliance diagram with the Management Nodes (active/passive cluster) and support/patching highlighted.]


Management Node
• Runs a separate domain controller (Active Directory)
• Used for deploying patches to all nodes in the appliance
• Holds images in case a node needs reimaging
• High availability using active/passive clustering


[Diagram: the appliance diagram with the Landing Zone and ETL load interface highlighted.]


Landing Zone
• Provides high-capacity storage for data files from ETL processes
• Integration Services available on the landing zone
• Connected to the internal network
• Available as a sandbox for other applications and scripts that run on the internal network

[Diagram: load flow Source → Landing Zone files → Data Loader → Compute Nodes.]
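The load path in this diagram runs from source files on the landing zone, through a data loader, to the compute nodes. It can be sketched in three steps. Read rows from a delimited file, hash each row's distribution column to pick a target node, and ship rows to that node in batches. The CSV format, batch size, column names, CRC-32 routing and `send_batch` callback are all hypothetical. This illustrates the routing idea only and is not the PDW loader's interface.

```python
import csv
import io
import zlib
from typing import Callable, Dict, List, TextIO


def load_file(
    stream: TextIO,
    distribution_column: str,
    node_count: int,
    send_batch: Callable[[int, List[dict]], None],
    batch_size: int = 2,
) -> None:
    """Route rows from a delimited landing-zone file to nodes in batches."""
    buffers: Dict[int, List[dict]] = {n: [] for n in range(node_count)}
    for row in csv.DictReader(stream):
        # Assumed hash: CRC-32 of the column's text value, modulo the node count.
        node = zlib.crc32(row[distribution_column].encode("utf-8")) % node_count
        buffers[node].append(row)
        if len(buffers[node]) >= batch_size:
            send_batch(node, buffers[node])  # ship a full batch to its node
            buffers[node] = []
    for node, rows in buffers.items():       # flush partial batches at end of file
        if rows:
            send_batch(node, rows)


# Hypothetical file contents standing in for a file on the landing zone.
sample = io.StringIO("sale_id,dollars\n1,10\n2,20\n3,30\n4,40\n5,50\n")
batches = []
load_file(sample, "sale_id", 3, lambda node, rows: batches.append((node, rows)))
for node, rows in batches:
    print(node, [r["sale_id"] for r in rows])
```

Batching amortizes per-message overhead on the interconnect. Hashing in the loader means each row is sent directly to the node that will store it.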


[Diagram: the appliance diagram with the Backup Node and corporate backup solution highlighted.]


Backup Node
• Coordinated backup across the nodes
• Database-level backup: full or differential
• Metadata backup
• Can restore to a larger appliance
• Optional item; one size per configuration
• Up to 524 TB of capacity
• Available in XS, S, M, L and XL
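Full and differential backups combine at restore time. A differential contains the changes since the most recent full backup. Restoring to a given time therefore needs the latest full backup before it, plus at most the latest differential taken after that full. The sketch below picks that chain from a backup history. The `Backup` record and `restore_chain` function are hypothetical. They model standard SQL Server full/differential semantics, not the PDW backup node's interface.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List


@dataclass(frozen=True)
class Backup:
    kind: str            # "full" or "diff"
    taken_at: datetime


def restore_chain(history: List[Backup], target: datetime) -> List[Backup]:
    """Backups to restore, in order, to recover the state as of `target`.

    A differential holds changes since the most recent full backup before it,
    so only the latest full plus the latest later differential are needed.
    """
    usable = [b for b in history if b.taken_at <= target]
    fulls = [b for b in usable if b.kind == "full"]
    if not fulls:
        raise ValueError("no full backup at or before the target time")
    base = max(fulls, key=lambda b: b.taken_at)
    diffs = [b for b in usable if b.kind == "diff" and b.taken_at > base.taken_at]
    chain = [base]
    if diffs:
        chain.append(max(diffs, key=lambda b: b.taken_at))
    return chain


history = [
    Backup("full", datetime(2010, 1, 1)),
    Backup("diff", datetime(2010, 1, 2)),
    Backup("diff", datetime(2010, 1, 3)),
    Backup("full", datetime(2010, 1, 4)),
    Backup("diff", datetime(2010, 1, 5)),
]

for b in restore_chain(history, datetime(2010, 1, 3, 12)):
    print(b.kind, b.taken_at.date())
# full 2010-01-01
# diff 2010-01-03
```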


PDW Software Architecture

[Diagram: PDW software architecture across the Control Node, Compute Nodes, Landing Zone, Backup Node and Management Node. Components shown: SQL Server (with DW Authentication, DW Configuration, DW Queue and DW Schema databases on the control node, and user data on the compute nodes), SQL OS, PDW Services (DSQL Core Engine Services, DMS Manager), DMS (Data Movement Service), IIS with the Admin Console, the Loader, SSIS, HPC and AD. Client access: JDBC, OLE-DB, ODBC and ADO.NET drivers, the Nexus Query Tool, MS BI (AS, RS) and 3rd-party tools. Legend: built by DWPU, existing MS software, 3rd party.]


Conclusion
• MPP architecture supports massive scale through increased parallelization and a shared-nothing architecture
• Microsoft SQL Server 2008 R2 Parallel Data Warehouse Edition brings massive scale wrapped in the simplicity of an appliance


References
Microsoft Parallel Data Warehouse official site: http://www.microsoft.com/pdw


Feedback / Q&A
Your feedback is important! Please take a few moments to fill out our online feedback form at: << Feedback URL – Ask your organizer for this in advance>>

For detailed feedback, use the form at http://www.connectwithlife.co.in/vtd/helpdesk.aspx

Or email us at [email protected]

Use the Question Manager on LiveMeeting to ask your questions now!


Contact
SolidQ: www.solidq.com
Email: [email protected]


© 2007 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries.

The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.