Big Data Working with Terabytes in SQL Server Andrew Novick www.NovickSoftware.com


Page 1: Big Data Working with Terabytes in SQL Server Andrew Novick

Big Data: Working with Terabytes in SQL Server

Andrew Novick

www.NovickSoftware.com

Page 2: Big Data Working with Terabytes in SQL Server Andrew Novick

Agenda

What’s Big?

Concerns
    ETL/Load Performance
    Query Performance
    Backup/Restore Performance

Architecture

Solutions

Page 3: Big Data Working with Terabytes in SQL Server Andrew Novick

Introduction

Andrew Novick – Novick Software, Inc.

Business Application Consulting
    SQL Server
    .Net

www.NovickSoftware.com

Books:
    Transact-SQL UDFs
    SQL 2000 XML Distilled

Page 4: Big Data Working with Terabytes in SQL Server Andrew Novick

SQL Pass 2008

November 18-21 – Seattle

Page 5: Big Data Working with Terabytes in SQL Server Andrew Novick

What’s big?

Page 6: Big Data Working with Terabytes in SQL Server Andrew Novick
Page 7: Big Data Working with Terabytes in SQL Server Andrew Novick

What’s Big?

100’s of gigabytes and up to 10’s of terabytes

100,000,000 rows and up to 100’s of billions of rows

Page 8: Big Data Working with Terabytes in SQL Server Andrew Novick

Big Scenarios

Data Warehouse

Very Large OLTP databases (usually with reporting functions)

Page 9: Big Data Working with Terabytes in SQL Server Andrew Novick

Big Hardware

Multi-core: 8 to 64 cores

RAM: 16 GB to 256 GB

SANs or direct-attached RAID

64-bit SQL Server

Page 10: Big Data Working with Terabytes in SQL Server Andrew Novick

Concerns

Page 11: Big Data Working with Terabytes in SQL Server Andrew Novick

What me worry?

Page 12: Big Data Working with Terabytes in SQL Server Andrew Novick

Concerns

Load Speed (ETL)

Query Speed

Data Management

    Backup / Restore
    DBCC CHECKDB, remove Fragmentation

Page 13: Big Data Working with Terabytes in SQL Server Andrew Novick

Architecture

What do we have to work with?

Page 14: Big Data Working with Terabytes in SQL Server Andrew Novick

SQL Server Storage Architecture

[Diagram: three storage layers, from hardware up to tables]

Physical IO subsystem: Disk, Disk, Disk, Disk, Disk, Disk

Logical Disk System – Windows Drives: Drive C:, Drive D:, Drive E:

SQL Server Storage
    FileGroupA: FileA1
    FileGroupB: FileB1, FileB2
    Tables: Table1, Table2

Page 15: Big Data Working with Terabytes in SQL Server Andrew Novick

Solutions

Page 16: Big Data Working with Terabytes in SQL Server Andrew Novick

Solution to what?

Load Speed (ETL)

Query Speed

Data Management

    Backup / Restore
    DBCC CHECKDB, remove Fragmentation

Page 17: Big Data Working with Terabytes in SQL Server Andrew Novick

Solutions

Use Multiple FileGroups/Files
    Spread Data to maximize resource use
    Sliding Window if there is a time dimension

Partitioned Tables and/or Views

ETL – Insert into empty unindexed tables

Use READ_ONLY FileGroups to minimize maintenance needs.

Page 18: Big Data Working with Terabytes in SQL Server Andrew Novick

I/O Performance

Little has changed in 50 years

Watch out for bottlenecks in the I/O Path

Memory reduces the need for I/O

Disks can only do so many I/O operations per second

The more disk heads you have the higher the I/O throughput.

Page 19: Big Data Working with Terabytes in SQL Server Andrew Novick

At 3 PM on the 1st of the month:

Where do you want your data to be?

Page 20: Big Data Working with Terabytes in SQL Server Andrew Novick
Page 21: Big Data Working with Terabytes in SQL Server Andrew Novick

Sliding Window

[Diagram: one block of "Always There" data next to a row of temporal data blocks, one per month]

Always There Data

Temporal Data: 2008-01, 2008-02, 2008-03, 2008-04, 2008-05

Page 22: Big Data Working with Terabytes in SQL Server Andrew Novick

Read_Only FileGroups

Require only one Backup

Don’t require page or row locks

Don’t require maintenance

Before SQL Server 2008, the ALTER requires exclusive access to the database

ALTER DATABASE <database> MODIFY FILEGROUP <filegroup> READ_ONLY
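
As a minimal sketch of the statement above with concrete names filled in, the following marks one filegroup read-only and then checks the result. The database name SalesDW and the filegroup name FG_2008_01 are hypothetical and are not taken from the slides.

```sql
-- Hypothetical names: database SalesDW, filegroup FG_2008_01.
-- Mark a filegroup that holds closed-out data as read-only.
ALTER DATABASE SalesDW MODIFY FILEGROUP FG_2008_01 READ_ONLY;

-- Confirm the change: is_read_only is 1 for read-only filegroups.
SELECT name, is_read_only
FROM SalesDW.sys.filegroups
ORDER BY name;
```

Running the same statement with READ_WRITE in place of READ_ONLY makes the filegroup writable again, for example before a late correction to old data.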

Page 23: Big Data Working with Terabytes in SQL Server Andrew Novick

Concern - Load Performance (ETL)

4 Hour maximum window for any load

Load into large indexed tables is unacceptably long.

Example: 2 million row insert into 400 million row table with 10 indexes took 12 hours.

Page 24: Big Data Working with Terabytes in SQL Server Andrew Novick

Concern – Query Performance

Users have little patience

Data warehouse Queries
    Frequent small to medium queries to support UI
    Less frequent large queries on fact tables
        may access 10’s of GB

Page 25: Big Data Working with Terabytes in SQL Server Andrew Novick

Partitioning

Page 26: Big Data Working with Terabytes in SQL Server Andrew Novick

Partitioned Views

Available in SQL Server Standard

Available in SQL Server 2000

Created like any view

Check constraints tell SQL Server which data is in which table

CREATE VIEW Fact AS
    SELECT * FROM Fact_20080405
    UNION ALL
    SELECT * FROM Fact_20080406

ALTER TABLE Fact_20080405
    ADD CONSTRAINT CK_FACT_20080405_Date
    CHECK (FactDate >= '2008-04-05' AND FactDate < '2008-04-06')
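
A hedged sketch of how one more day might be added to a view like this: create the member table with its check constraint, then redefine the view. The column list (FactDate, CustID, Amount) and the table Fact_20080407 are assumptions for illustration; the slides do not show the real Fact schema.

```sql
-- Hypothetical member table for one more day; the column list is an assumption.
CREATE TABLE Fact_20080407 (
    FactDate smalldatetime NOT NULL,
    CustID   int           NOT NULL,
    Amount   money         NOT NULL,
    -- The CHECK constraint tells SQL Server which dates live in this table.
    CONSTRAINT CK_Fact_20080407_Date
        CHECK (FactDate >= '2008-04-07' AND FactDate < '2008-04-08')
);
GO

-- ALTER VIEW must be the first statement in its batch, hence the GO above.
-- SELECT * assumes every member table has the same columns in the same order.
ALTER VIEW Fact AS
    SELECT * FROM Fact_20080405
    UNION ALL
    SELECT * FROM Fact_20080406
    UNION ALL
    SELECT * FROM Fact_20080407;
GO
```

Because the new table is loaded and indexed before the view is altered, queries against Fact never see a half-loaded day.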

Page 27: Big Data Working with Terabytes in SQL Server Andrew Novick

Partitioned View - 2

Looks to a query like any table or view

Can take advantage of parallel execution.

Limited to 256 tables

Can cross servers (Performance Warning)

SELECT FactDate, ….. FROM Fact WHERE CustID=334343 AND FactDate = '2008-04-05'

Page 28: Big Data Working with Terabytes in SQL Server Andrew Novick

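[Diagram: Partitioned View. The view Fact sits on top of its member tables, which are spread over dedicated filegroups.]

View Fact

Member tables (as labeled on the slide): Fact_20080330, Fact_20080331, Fact_20080401, Fact_20080401

Fact filegroups and files: FGF1 (F1), FGF2 (F2), FGF3 (F3), FGF4 (F4)

Physical IO subsystem: Disk, Disk, Disk, Disk, Disk, Disk

Logical Disk System – Windows Drives: Drive C:, Drive D:, Drive E:

SQL Server Storage
    FileGroupA: FileA1
    FileGroupB: FileB1, FileB2
    Tables: Table1, Table2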

Page 29: Big Data Working with Terabytes in SQL Server Andrew Novick

Partition Elimination

The query compiler can eliminate partitions from consideration in the plan

Partition elimination happens at query compile time.

Values matching the partitioning column must be constants to allow partition elimination.
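
The difference can be observed with a pair of queries like the following, a sketch that assumes the partitioned view Fact and the FactDate column from the earlier slides. Comparing the two execution plans (for example with the graphical plan, or SET SHOWPLAN_TEXT ON) shows which member tables each query touches.

```sql
-- Constant predicate: the literal can be compared with each member table's
-- CHECK constraint at compile time, so tables for other dates can be left
-- out of the plan.
SELECT COUNT(*) FROM Fact WHERE FactDate = '2008-04-05';

-- Variable predicate: the value is not known when the plan is compiled,
-- so the plan cannot be narrowed to a single member table up front.
DECLARE @d smalldatetime;
SET @d = '2008-04-05';
SELECT COUNT(*) FROM Fact WHERE FactDate = @d;
```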

Page 30: Big Data Working with Terabytes in SQL Server Andrew Novick

Demo 1 – Partitioned Views

Page 31: Big Data Working with Terabytes in SQL Server Andrew Novick

Partitioned Tables

SQL Server Enterprise

SQL Server 2005 and Above

Require a non-null partitioning column

Check constraints tell SQL Server what data is in each partition

All tables are partitioned! (An ordinary table is stored as a single partition.)

Page 32: Big Data Working with Terabytes in SQL Server Andrew Novick

Partitioned Tables 2

Partition Function
    Defines how to split data

Partition Scheme
    Defines where to store each range of data

CREATE PARTITION FUNCTION Fact_PF (smalldatetime)
    AS RANGE RIGHT FOR VALUES ('2001-07-01', '2001-07-02')

CREATE PARTITION SCHEME Fact_PF
    AS PARTITION Fact_pf TO (PRIMARY, FG_20010701, FG_20010702)
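
To make the mapping concrete, here is a sketch that places a table on the scheme and asks the function which partition a given date falls in. It assumes the function and the scheme are both named Fact_PF as on the slide and that the listed filegroups already exist; the column list is hypothetical.

```sql
-- Hypothetical Fact table stored on the partition scheme, split by FactDate.
CREATE TABLE Fact (
    FactDate smalldatetime NOT NULL,
    CustID   int           NOT NULL,
    Amount   money         NOT NULL
) ON Fact_PF (FactDate);

-- $PARTITION returns the partition number for a value. With RANGE RIGHT,
-- each boundary value belongs to the partition on its right.
SELECT
    $PARTITION.Fact_PF(CAST('2001-06-30' AS smalldatetime)) AS before_first_boundary,  -- 1 (PRIMARY)
    $PARTITION.Fact_PF(CAST('2001-07-01' AS smalldatetime)) AS on_first_boundary,      -- 2 (FG_20010701)
    $PARTITION.Fact_PF(CAST('2001-07-02' AS smalldatetime)) AS on_second_boundary;     -- 3 (FG_20010702)
```

Two boundary values produce three partitions, which is why the scheme lists three filegroups.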

Page 33: Big Data Working with Terabytes in SQL Server Andrew Novick

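[Diagram: Partitioned Table. The table Fact is divided into partitions that are spread over dedicated filegroups.]

Table Fact

Partitions: Fact.$Partition=1, Fact.$Partition=2, Fact.$Partition=3, Fact.$Partition=4

Fact filegroups and files: FGF1 (F1), FGF2 (F2), FGF3 (F3), FGF4 (F4)

Physical IO subsystem: Disk, Disk, Disk, Disk, Disk, Disk

Logical Disk System – Windows Drives: Drive C:, Drive D:, Drive E:

SQL Server Storage
    FileGroupA: FileA1
    FileGroupB: FileB1, FileB2
    Tables: Table1, Table2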

Page 34: Big Data Working with Terabytes in SQL Server Andrew Novick

Demo 2 – Partitioned Tables

Page 35: Big Data Working with Terabytes in SQL Server Andrew Novick

Partitioning Goals

Adequate Import Speed

Maximize Query Performance
    Make use of all available resources

Data Management
    Migrate data to cheaper resources
    Delete old data easily

Page 36: Big Data Working with Terabytes in SQL Server Andrew Novick

Achieving Load Speed

Insert into empty tables

Index and add foreign keys after the insert

Add the Slices to
    Partitioned Views
    Partitioned Tables
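
For a partitioned table, the three steps above might look like the following sketch. It is not the presenter's demo script. It assumes a table Fact partitioned by a RANGE RIGHT function whose last boundary is '2001-07-02', with partition 3 empty and stored on filegroup FG_20010702, and with a clustered index on (FactDate, CustID) as its only index. The staging table name, column list, and file path are hypothetical.

```sql
-- 1. Empty, unindexed staging table on the same filegroup as the target partition.
CREATE TABLE Fact_Stage (
    FactDate smalldatetime NOT NULL,
    CustID   int           NOT NULL,
    Amount   money         NOT NULL
) ON FG_20010702;

-- 2. Bulk load into the empty heap (the file path is a placeholder).
BULK INSERT Fact_Stage
FROM 'C:\load\fact_20010702.dat'
WITH (TABLOCK);

-- 3. Index after the load, matching the index on the target table.
CREATE CLUSTERED INDEX CIX_Fact_Stage
    ON Fact_Stage (FactDate, CustID)
    ON FG_20010702;

-- 4. Prove to SQL Server that every row belongs in the target partition.
ALTER TABLE Fact_Stage WITH CHECK
    ADD CONSTRAINT CK_Fact_Stage_Date CHECK (FactDate >= '2001-07-02');

-- 5. Metadata-only operation: the staged rows become partition 3 of Fact.
ALTER TABLE Fact_Stage SWITCH TO Fact PARTITION 3;
```

The switch moves no data, so the large table is never subjected to row-by-row index maintenance during the load.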

Page 37: Big Data Working with Terabytes in SQL Server Andrew Novick

Achieving Query Speed

Eliminate access to partitions during query compile

All disk resources should be used
    Parallel access

All available memory should be used

All available CPUs should be used
    Parallel query

Page 38: Big Data Working with Terabytes in SQL Server Andrew Novick

Solution

Partition at a sufficiently high grain

Spread dimension data to all useable disks
    Separate Data and Index FileGroups
    Multiple files per FileGroup

Spread Fact data by partition key to all useable disks
    Rotate file locations to maximize dispersion

Page 39: Big Data Working with Terabytes in SQL Server Andrew Novick

Concern – Data Management (Backup)

Let’s say you have a 10 TB database.

Now back that up.

Page 40: Big Data Working with Terabytes in SQL Server Andrew Novick

Backup Calculation

10 TB = 10000 GB

Typical Backup speed
    Low end: 1 GB per minute
    High end: 10 GB per minute

At 10 GB/Minute: 10000 GB / 10 GB per minute = 1000 minutes (almost 17 hours)

Who’s got 1000 minutes?
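
The same arithmetic for both ends of the speed range, as a small query. The figures are the ones from the slide; the variable and column names are made up for the example.

```sql
-- Backup duration for a 10 TB database at the slide's two backup speeds.
DECLARE @size_gb decimal(10, 1);
SET @size_gb = 10000;  -- 10 TB expressed in GB

SELECT rate_gb_per_min,
       @size_gb / rate_gb_per_min        AS backup_minutes,
       @size_gb / rate_gb_per_min / 60.0 AS backup_hours
FROM (SELECT 1.0 AS rate_gb_per_min      -- low end
      UNION ALL
      SELECT 10.0) AS rates;             -- high end
```

At the low end the full backup takes 10,000 minutes, which is close to a week, so even the 1000-minute best case is the optimistic figure.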

Page 41: Big Data Working with Terabytes in SQL Server Andrew Novick

Achieving Backup Performance

Backup less!

Maintain data in a READ_ONLY state

Compress Backups

Page 42: Big Data Working with Terabytes in SQL Server Andrew Novick

Partial Backup

Partial Base
    Backs up read_write filegroups

    BACKUP DATABASE <db name> READ_WRITE_FILEGROUPS …..

Partial Differential
    Differential backup of read_write filegroups

    BACKUP DATABASE <db name> READ_WRITE_FILEGROUPS WITH DIFFERENTIAL ….
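
A sketch of how these statements might be completed in practice. The database name SalesDW, the filegroup FG_2008_01, and the backup paths are hypothetical; the elided parts of the slide's statements are not known, so the TO DISK and WITH clauses here are only one plausible choice.

```sql
-- One-time backup of a filegroup after it has been set READ_ONLY.
BACKUP DATABASE SalesDW
    FILEGROUP = 'FG_2008_01'
    TO DISK = 'E:\backup\SalesDW_FG_2008_01.bak'
    WITH INIT;

-- Partial base backup: primary and all read-write filegroups.
BACKUP DATABASE SalesDW READ_WRITE_FILEGROUPS
    TO DISK = 'E:\backup\SalesDW_partial.bak'
    WITH INIT;

-- Partial differential: changes in those filegroups since the partial base.
BACKUP DATABASE SalesDW READ_WRITE_FILEGROUPS
    TO DISK = 'E:\backup\SalesDW_partial_diff.bak'
    WITH DIFFERENTIAL, INIT;
```

The read-only filegroup backup is taken once and kept; only the two partial backups need to be repeated on a schedule.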

Page 43: Big Data Working with Terabytes in SQL Server Andrew Novick

Maintenance Operations

Maintain only READ_WRITE data
    DBCC CHECKFILEGROUP
    ALTER INDEX
        REBUILD PARTITION =
        REORGANIZE PARTITION =

Avoid SHRINK
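
Filled in with hypothetical names, the commands listed above could be used as follows. The database SalesDW, filegroup FG_Current, index CIX_Fact, and partition number 3 are assumptions chosen for illustration.

```sql
USE SalesDW;
GO

-- Check only the filegroup that still receives writes.
DBCC CHECKFILEGROUP ('FG_Current') WITH NO_INFOMSGS;

-- Rebuild a single partition of an index instead of the whole index.
ALTER INDEX CIX_Fact ON Fact REBUILD PARTITION = 3;

-- Or use the lighter-weight option on the same partition.
ALTER INDEX CIX_Fact ON Fact REORGANIZE PARTITION = 3;
```

Partitions that sit on read-only filegroups are skipped entirely, which is what keeps the maintenance window proportional to the active data rather than to the whole database.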

Page 44: Big Data Working with Terabytes in SQL Server Andrew Novick

SQL Server 2008 – What’s New

Row, page, and backup compression

Filtered Indexes

Optimization for star joins

MERGE T-SQL DML

Resource Governor

Fewer operations require exclusive access to the database

Page 45: Big Data Working with Terabytes in SQL Server Andrew Novick

New England Visual Basic Pro

Focused on VB.Net development

Meetings @ MS Waltham – MPR C, 1st Thursday, 6:15 to 8:30 PM

Sept 4 – Jim O’Neil – ASP.Net Dynamic Data
Sept 25 – Chris Hammond – DotNetNuke
Oct 2 – Kathleen Dollard – XML Literals in VB 9
Nov 6 – Joe Stagner – Stupid Hacker Tricks and How 2 Defend
Feb 5 ’09 – Joe Hill – Novell – Mono/VB/etc….

www.NEVB.com

Page 46: Big Data Working with Terabytes in SQL Server Andrew Novick

Thanks for Coming

Andrew Novick


www.NovickSoftware.com