CAT: Azure SQL DB Premium – Deep Dive and Mythbuster

Ewan Fairweather

Senior Program Manager

Azure Customer Advisory Team

Tobias Ternstrom

Principal Program Manager

Data Platform Group

Cloud & Enterprise Customer Team

CAT

Customer

45%

Engineering

45%

Community

10%

Architecture guidance and technology expertise

i.e. patterns, practices and codification

Community

Accelerate cloud adoption

i.e. white-papers, events

Frameworks and code

Platform

Provide an “end to end” Azure customer story on how features work in customer project scenarios, based on learnings from the biggest deployments

Europe:

- Azure Applications

- Azure Data

- Azure Analytics

Agenda

• Persistent data options in Azure

• Azure SQL DB Premium Deep Dive

• Sizing and capacity planning

• Customer experience and learnings

• Summary

Persistent Data Options in Azure

The Application Journey

Azure Storage Options

Platform as a Service

• Azure SQL Database (managed databases)

• Publish and run

• Shared environment

Infrastructure as a Service

• SQL Server running in a Windows Azure VM

• Or any other database you have bits for

• Full control / insight

• More administrative effort

Azure Storage

• Tables

• Blobs

• Queues

• No relational

• Cheap storage

• Optimized for density and scale out

Three different ways to run SQL

[Figure: the three options placed on two axes, resources (dedicated to shared) and “friction”/control (high to low). The figure also carries the labels “Premium” and “Azure”.]

• SQL Server (raw iron)
– Scale-up
– Full h/w control
– Roll-your-own HA/DR/scale

• SQL Server in IaaS (virtualized machine)
– 100% of API, virtualized
– Roll-your-own HA/DR/scale

• SQL Database - PaaS (virtualized database)
– Auto HA, fault tolerance
– Self-provisioning, mgmt @ scale

Decision Points

• Data commonly going to WA Storage (point lookups, minimal relational)
– Telemetry logs, append workloads, primarily key-value lookups
– Blobs for WA SQL DB (lower costs, reduce DB size under 150 GB limit)

• Commonly going to SQL Server in VM (lift and shift, DW)
– Applications needing features not currently in SQL DB (example: Fulltext)
– Light DW workloads

• Commonly going to SQL DB (OLTP)
– Applications whose owners do not want to manage their databases
– Applications that need massive horizontal scale (Internet-facing SaaS ISVs)
– New OLTP applications
– Premium DB extends Azure SQL DB’s capabilities

Typical Performance Factors

Factor and why it matters:

• Latency
– Greater than on-premises
– Higher variance

• Establishing connections
– Initial login goes to the gateway
– Connections are unreliable and will fail

• Multi tenancy
– Unpredictable performance
– Soft throttling
– Hard throttling
– Shared log, max transaction size

• Writes are the most expensive resource in this system

SQL DB Web/Business Performance Variance

• Web/Business Editions provide elastic scale without performance SLA

• There is some variance in performance due to multi tenancy; we will reduce the variance further over time

• SQL DB contains logic to move DBs around to balance load across each cluster to maximize average resources

[Chart: DB resources available over time. Databases can get different resources based on others’ activity.]

Resource management in Azure SQL DB

• SQL Database monitors the usage of the shared resources to keep databases within resource limits

• When resource usage exceeds limits, SQL DB can manage resource usage at the DB or node level by killing connections or denying requests
– Throttling stages: Soft (subset of DBs) and Hard (all DBs)

Decode type and resource

Resource | Limit | Error code
Database size | 150 GB or less, depending on the database quota (MAXSIZE) | 40544
Transaction duration | State 1: 24 hours. State 2: 20 seconds if a transaction locks a resource required by an underlying system task | 40549
Lock count | 1 million locks per transaction | 40550
TempDB | State 1: 5 GB of tempdb space. State 2: 2 GB per transaction in tempdb. State 3: 20% of total log space in tempdb | 40551
Transaction log space | State 1: 2 GB per transaction. State 2: 20% of total log space | 40552
Memory | 16 MB memory grant for more than 20 seconds | 40553
Worker thread governance | Every database has a maximum worker thread concurrency limit | 10928, 10929

Azure SQL DB Premium: How it works

Edition Comparison

• Premium has reserved resources on all 3 nodes

• You can upgrade or downgrade a database

• You should decide sizing based on your resource needs

[Chart: DB resources available over time for P1, P2 and Web/Business.]

Premium Edition

• Some applications require guaranteed resources

• Premium Edition was introduced for customers who need dedicated resources

• Common customer attributes:
– High throughput requirements
– Low latency requirements
– Low performance variance requirements

• Premium Edition details
– Dedicated resources (min=max) to avoid performance variance
– Different sizes (P1-P2) allow adjustment based on resource needs
– Currently in Public Preview

Premium Edition Reservation Sizes

• Reservations are done separately for each database
– Capacity is limited during public preview
– Customers can get 1-2 reservations based on availability

• The monthly price at GA is USD $930 for P1; P2 is 2x that

• P3 and P4 are available at engineering discretion

Size | CPU Cores | Worker Threads | Active Sessions | Disk IO (IOPS) | Memory (GB)
P1 | 1 | 200 | 2000 | 150 | 8
P2 | 2 | 400 | 4000 | 300 | 16

Premium Database

Set Premium Service Objective

Checking Status of Azure SQL DB

• The DB will remain online aside from a few seconds during the final failover

Checking Current SLO

Checking Status of Move

• Lower and upper bound estimates vary between 15 minutes for an empty database and approximately 2 days for a 150 GB database

Premium DB or A Larger VM?

SQL Premium DB Size | SQL Premium GA Monthly Cost | SQL VM Monthly Cost | SQL VM Size (Enterprise Edition)
P1 (M): 1 CPU core, 8 GB RAM, 150 IOPS | $930 | $1,629 | S (A1): 1 CPU core, 1.75 GB RAM, 2x500 IOPS
P2 (L): 2 CPU cores, 16 GB RAM, 300 IOPS | $1,860 | $1,696 | M (A2): 2 CPU cores, 3.5 GB RAM, 4x500 IOPS
- | - | $1,830 | L (A3): 4 CPU cores, 7 GB RAM, 8x500 IOPS
- | - | $2,321 | A6: 4 CPU cores, 28 GB RAM, 8x500 IOPS
- | - | $3,660 | XL (A4): 8 CPU cores, 14 GB RAM, 16x500 IOPS
- | - | $4,642 | A7: 8 CPU cores, 56 GB RAM, 16x500 IOPS

Sizing and Capacity Planning

Sizing Databases

• For a SINGLE database…
– Find largest resource consumer
– Measure peak load over time period
– Choose appropriate reservation size to handle peak load

• Workload type matters
– Batch processing: aim to achieve avg throughput over time (not size for peak)
– Interactive applications need to size for the peak to preserve response times
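The sizing steps above can be sketched in code. This is a minimal Python illustration and not an official sizing tool: the P1 and P2 limits are copied from the reservation size table in this deck, while the function names `pick_reservation` and `largest_consumer`, the dictionary layout, and the sample peak values are hypothetical.

```python
# Reservation limits as listed in this deck (public preview figures).
RESERVATIONS = {
    "P1": {"cpu_cores": 1, "workers": 200, "sessions": 2000, "iops": 150, "memory_gb": 8},
    "P2": {"cpu_cores": 2, "workers": 400, "sessions": 4000, "iops": 300, "memory_gb": 16},
}


def pick_reservation(peak):
    """Return the smallest size whose limits cover every peak value, or None."""
    for name in ("P1", "P2"):  # ordered smallest to largest
        limits = RESERVATIONS[name]
        if all(peak[key] <= limits[key] for key in peak):
            return name
    return None  # exceeds P2: tune the workload or scale out


def largest_consumer(peak, size="P1"):
    """Return the resource with the highest peak utilization relative to a size."""
    limits = RESERVATIONS[size]
    return max(peak, key=lambda key: peak[key] / limits[key])


# Hypothetical peak values measured over one week.
peak = {"cpu_cores": 0.9, "workers": 120, "sessions": 800, "iops": 210, "memory_gb": 6}
print(largest_consumer(peak))  # iops
print(pick_reservation(peak))  # P2
```

For a batch workload, the same functions could be fed average values instead of peaks, in line with the workload-type guidance above.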

[Chart: CPUAvgCoresUsedInHr per hour; x-axis hours 1 to 161, y-axis 0 to 1.2 cores.]

Peak Load Example

• Weekly IO chart of a large customer on WA SQL DB

• We actively work on the load each week
– Query tuning
– Moving maintenance jobs to off-peak hours

• We also do aggressive things
– Split different functions out into different databases
– Rate-meter background jobs to not impact core workloads

[Chart: Avg Hourly Physical Write IOPS (1 week), series “Total”; x-axis from 2013-09-16 00:00 to 2013-09-22 16:00, y-axis 0 to 250. Annotations: daily maintenance job moved to off-peak hours; weekly maintenance moved to Sunday; query tuning to reduce daily peak.]

Azure SQL Database DMV Surface Area

Health (master)
• sys.event_log
• sys.bandwidth_usage
• sys.database_connection_stats

Resource Usage (master)
• sys.resource_usage*
• sys.resource_stats*

Data Access & Usage
• sys.dm_db_index_usage_stats
• sys.dm_db_missing_index_details
• sys.dm_db_missing_index_groups
• sys.dm_db_missing_index_group_stats
• sys.dm_exec_sessions

Performance
• sys.dm_exec_query_stats
• sys.dm_exec_sql_text
• sys.dm_exec_query_plan
• sys.dm_exec_requests
• sys.dm_db_wait_stats

Windows Azure SQL Database and SQL Server -- Performance and Scalability Compared and Contrasted
http://msdn.microsoft.com/en-us/library/windowsazure/jj879332.aspx

Capacity planning

• Use sys.resource_stats (in preview) in master db to determine your application resource needs:

    SELECT *
    FROM sys.resource_stats
    WHERE database_name = 'MyTestDB'
      AND start_time > DATEADD(day, -7, GETDATE())

Investigating resource usage

Percentage of time using more than 1 core:

    SELECT
        (SELECT SUM(DATEDIFF(minute, start_time, end_time))
         FROM sys.resource_stats
         WHERE database_name = 'MyTestDB'
           AND start_time > DATEADD(day, -7, GETDATE())
           AND avg_cpu_cores_used > 1.0) * 1.0
        / SUM(DATEDIFF(minute, start_time, end_time)) AS percentage_more_than_1_core
    FROM sys.resource_stats
    WHERE database_name = 'MyTestDB'
      AND start_time > DATEADD(day, -7, GETDATE())

Avg and Max resource usage:

    SELECT
        avg(avg_cpu_cores_used) AS 'Average CPU Cores Used',
        max(avg_cpu_cores_used) AS 'Maximum CPU Cores Used',
        avg(avg_physical_read_iops + avg_physical_write_iops) AS 'Average Physical IOPS',
        max(avg_physical_read_iops + avg_physical_write_iops) AS 'Maximum Physical IOPS',
        avg(active_memory_used_kb / (1024.0 * 1024.0)) AS 'Average Memory Used in GB',
        max(active_memory_used_kb / (1024.0 * 1024.0)) AS 'Maximum Memory Used in GB',
        avg(active_session_count) AS 'Average # of Sessions',
        max(active_session_count) AS 'Maximum # of Sessions',
        avg(active_worker_count) AS 'Average # of Workers',
        max(active_worker_count) AS 'Maximum # of Workers'
    FROM sys.resource_stats
    WHERE database_name = 'MyTestDB'
      AND start_time > DATEADD(day, -7, GETDATE())

Managing DB Resource Growth

• Assuming your application's resource needs grow over time, you need a plan to deal with that growth. In the box world we are always sizing for a future peak

• The cloud offers two architectural approaches to manage growth, both of which are elastic
– “Scale-up” (limited): Web/Business -> P1 -> P2
– “Scale-out”: use more databases

• Partitioning data by function or by tenant allows you to adjust as needed to growth in resource usage at the database level

• Plan on actively monitoring and alerting on resource-use telemetry so you can adjust to growth before something breaks…
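Partitioning by tenant can be illustrated with a small routing function. The following Python sketch is one possible approach and is not taken from the talk: the database names, the `OVERRIDES` table, and the function `database_for` are hypothetical, and hashing is only one of several ways to assign tenants to databases.

```python
import hashlib

# Hypothetical scale-out targets; in practice these would be connection strings.
DATABASES = ["tenants_db_0", "tenants_db_1", "tenants_db_2", "tenants_db_3"]

# Explicit overrides let a heavy tenant be moved to its own database.
OVERRIDES = {"big-tenant": "big_tenant_db"}


def database_for(tenant_id):
    """Map a tenant to a database: explicit override first, then a stable hash."""
    if tenant_id in OVERRIDES:
        return OVERRIDES[tenant_id]
    digest = hashlib.sha256(tenant_id.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(DATABASES)
    return DATABASES[index]


print(database_for("big-tenant"))  # big_tenant_db
print(database_for("contoso") in DATABASES)  # True
```

Note that plain modulo hashing reassigns many tenants when the number of databases changes; a stored tenant-to-database lookup table avoids that at the cost of an extra lookup.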

Cost Optimization

• Two paths to improve your cloud service
– Spend more money (purchase more capacity)
– Optimize/Tune (more operations in capacity you have)

• The Cloud model lets you choose
– If you have development resources available, you might choose to ‘tune’
– If you are on a time deadline, you might just choose to scale up instead

• This model also works great for seasonal demand changes
– Example: Add capacity before the holiday sales season, remove after. (~$32 per day for a P1)

Customer Experience and Learnings

What’s different with data access in the Cloud?

Two key areas of attention

• Connection management issues

• Less reliable connection state due to multiple layers and network hops

• Retry logic mandatory to implement reliable communications between application and database server

• Higher latency between app tier and database tier compared to an on-premises deployment

• Firewalls, load balancers, gateways

• This amplifies the impact of chatty application behaviors

We will talk more about this in our 11:45 session
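The retry requirement above can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the implementation used by any customer in this deck: `DbError` is a hypothetical stand-in for a driver exception that exposes the SQL error number, `with_retry` and its parameters are invented for the example, and the transient set contains only the error numbers 40501, 10928 and 10929 that this deck names.

```python
import random
import time

# Error numbers named in this deck as throttling-related.
TRANSIENT_ERRORS = {40501, 10928, 10929}


class DbError(Exception):
    """Hypothetical stand-in for a driver exception carrying a SQL error number."""

    def __init__(self, number):
        super().__init__(f"SQL error {number}")
        self.number = number


def with_retry(operation, max_attempts=5, base_delay=0.5, sleep=time.sleep):
    """Run operation(); retry transient failures with exponential backoff and jitter."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except DbError as err:
            # Give up on non-transient errors and on the final attempt.
            if err.number not in TRANSIENT_ERRORS or attempt == max_attempts:
                raise
            delay = base_delay * 2 ** (attempt - 1)
            sleep(delay + random.uniform(0, delay / 2))
```

The `sleep` parameter exists only so the wait can be replaced in tests. A real data access layer would also reopen the connection before retrying, since a failed connection cannot be assumed to be usable.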

Batching inserts

[Diagram: in the app tier, application logic issues async inserts to the data access layer, which buffers and groups items into a batch and sends it to Azure SQL Database as a bulk insert.]

• Time (t) or size (n) window approach can result in the loss of:
– t seconds of data
– n rows of data

Takeaways
• Reliability: Plain ADO.NET single insert with full retry logic
• Density: Async and buffered approach

• How can I improve density?

– Introducing batching

– Reducing application round-trips

– Improve insert performance

• Leverage asynchronous approach

– Buffer across time and number of insertions
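Buffering across time and number of insertions can be sketched like this. The Python class below is a simplified illustration, not the code shown in the session: `InsertBuffer`, its parameters, and the `flush` callback (which would wrap a bulk insert) are hypothetical, and the time window is only checked when a row is added, whereas a production version would also flush from a timer.

```python
import time


class InsertBuffer:
    """Collect rows and flush them as one batch when either window is reached."""

    def __init__(self, flush, max_rows=100, max_seconds=1.0, clock=time.monotonic):
        self._flush = flush              # callable taking a list of rows
        self._max_rows = max_rows        # size window (n)
        self._max_seconds = max_seconds  # time window (t)
        self._clock = clock
        self._rows = []
        self._first_at = None

    def add(self, row):
        if not self._rows:
            self._first_at = self._clock()  # the window starts at the first buffered row
        self._rows.append(row)
        too_many = len(self._rows) >= self._max_rows
        too_old = self._clock() - self._first_at >= self._max_seconds
        if too_many or too_old:
            self.flush()

    def flush(self):
        """Send whatever is buffered; safe to call when the buffer is empty."""
        if self._rows:
            batch, self._rows = self._rows, []
            self._flush(batch)
```

Rows still in the buffer are lost if the process dies, which is the up-to-t-seconds or up-to-n-rows exposure described for the window approach; calling `flush()` on shutdown narrows it but does not remove it.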

Workload tuning options

Scale-Up vs. Scale-Out

• P1-P2 supported during public preview period

• Additional sizes can be introduced by GA

• With a scale-up approach you may lose some flexibility
– E.g. it requires planning for worst case / peaks
– Premium lets you scale up/down between P1 and P2 at most once a day

• Scale-up may not fit all cost/business models
– Unpredictable workloads
– Multiple database deployments

Customer experience: Easyjet

Easyjet Seat Selection System

• 70/30 R/W workload, very efficient (<200 ms max exec time)

• Majority of queries benefitted from switching

• Reduced and more stable response times for both reads and writes

• Reduced impact of 40501, 10928 and 10929 errors

• Remaining exceptions have been mostly due to application issues

[Charts: annotations mark “Switch”, “Major ticket sale” and “Broken build”.]

Another Customer experience

• Availability has greatly improved after the switch (less than 2 min per month)

• Growing trend in CPU usage
– Around 2 on average, with spikes up to 5

• No major errors related to resource issues

• Sporadic throttling for High Log IO waits

[Chart: annotation marks “Switch”.]

Application-Tier Caching

• App-tier caching is a very effective way to reduce data-tier load

• Azure has several caching solutions available to you

• For load spikes, this can often significantly reduce peak load

• Example: Azure SQL DB was used in the last US Presidential Election
– Few writes, massive reads all at once
– App tier caching used to remove reads from the database

CPU graph for the core reporting DB

• 1st 10 seconds – 44K page views/second (est. ~450K DB calls/sec)

• Next 20 seconds – 10K page views/sec (est. ~100K DB calls/sec)

(DB calls mostly removed due to caching)
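The effect of app-tier caching on read load can be illustrated with a small cache-aside helper. This Python sketch is an in-process simplification and not one of the Azure caching services mentioned above: `TtlCache`, its `loader` callback (which would run the database query), and the 30-second default lifetime are all hypothetical.

```python
import time


class TtlCache:
    """Cache-aside helper: serve from memory, fall back to the loader on a miss."""

    def __init__(self, loader, ttl_seconds=30.0, clock=time.monotonic):
        self._loader = loader   # callable that reads the value from the database
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}      # key -> (expires_at, value)

    def get(self, key):
        entry = self._entries.get(key)
        now = self._clock()
        if entry is not None and entry[0] > now:
            return entry[1]     # hit: no database call
        value = self._loader(key)
        self._entries[key] = (now + self._ttl, value)
        return value
```

With few writes and massive simultaneous reads, as in the example above, even a short lifetime means most page views never reach the database. The trade-off is that readers may see data that is up to one lifetime old.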

Summary

• Premium DB provides predictable performance and elasticity

• We offer you a mixture of scale-up and scale-out approaches

• The elastic nature of these options allows you to deal with peaks in a different way to on premises

Resources

• Premium Preview for SQL Database Guidance (http://msdn.microsoft.com/en-us/library/jj853352.aspx)

• Azure SQL Database and SQL Server -- Performance and Scalability Compared and Contrasted (http://msdn.microsoft.com/en-us/library/windowsazure/jj879332.aspx)

• Cloud Service Fundamentals in Windows Azure
– Wiki: http://social.technet.microsoft.com/wiki/contents/articles/17987.cloud-service-fundamentals.aspx

• Best practices on:
– Scale out architecture
– Design for operations
– Telemetry solution
– Reliable architecture

THANK YOU!

• For attending this session and PASS SQLRally Nordic 2013, Stockholm
