Windows Azure Storage: Building applications that scale


Windows Azure Storage: Building applications that scale
Jai Haridas & Joe Giardino
4-004

Agenda
- Introduction
- Scalability Targets
- Best Practices
- Demo – How to build scalable apps?
- Questions

Introduction

Blobs: Store files and the metadata associated with them

Queues: Durable asynchronous messaging system to decouple components

Tables: A strongly consistent NoSQL structured store that auto scales

Drives and Disks: Network mounted durable drives available for applications in the cloud

Windows Azure Storage Abstractions

Windows Azure Storage Characteristics
A "pay for what you use" cloud storage system
- Durable: stores multiple replicas of your data
  - Local replication: synchronous replication before returning success
  - Geo replication: replicated to a data center at least 400 miles away; asynchronous replication after returning success to the user
- Available: multiple replicas are placed to provide fault tolerance
- Scalable: automatically partitions data across servers to meet traffic demands
- Strong consistency: the default behavior is consistent reads once data is committed

All abstractions are backed by the same store
- Same feature set across all abstractions (geo replication, durability, strong consistency, auto scale, monitoring, partitioning logic etc.)
- Reduce costs by blending the different characteristics of each abstraction
- 880K requests/s at peak & 4+ trillion objects
- Great performance for low transaction costs!

Easy to use and open REST APIs
Client libraries in Java, Node.js, PHP, .NET etc.

Windows Azure Storage Characteristics

Xbox: Uses Windows Azure Blobs, Tables & Queues for applications like Cloud Game Saves, Halo multiplayer, Music, Kinect data collection etc.

SkyDrive: Uses Windows Azure Blobs to store pictures, documents etc.

Bing: Uses Windows Azure Blobs, Tables and Queues to implement an ingestion engine that consumes Twitter and Facebook public status feeds and provides them to Bing search

And many more…

Windows Azure Storage – How is it used?

- Facebook/Twitter data stored into blobs
- Ingestion engine processes the blobs: annotates with auth/spam/adult scores, content classification, expands links, etc.
- Uses Tables heavily for indexing
- Queues to manage the work flow
- Results stored back into blobs
- Bing takes the resulting blobs and folds them into the search index

Bing Realtime Facebook/Twitter Search Ingestion Engine – Running on Windows Azure Storage

[Diagram: user postings and status updates flow into Windows Azure Blobs; the Bing Ingestion Engine (Azure Service), running on multiple VMs, processes them using Windows Azure Tables and Queues]

Peak of 40,000 requests/sec; 2~3 billion requests per day
Took 1 dev 2 months to design, build and release to production
Indexes Facebook/Twitter data within 15 seconds of update

[Map: Windows Azure Storage major datacenters and CDN PoPs – North America region (N. Central, S. Central, East and West U.S. sub-regions), Europe region (N. Europe and W. Europe sub-regions), Asia Pacific region (E. Asia and S.E. Asia sub-regions)]

Scalability Targets

Flat network storage design – "Quantum 10" network
- Non-blocking 10Gbps based fully meshed network
- Move to software based load balancer
- Provides an aggregate backplane in excess of 50 Tbps bandwidth per datacenter

Enables high bandwidth scenarios such as Windows Azure IaaS disks, HPC, Map Reduce etc.

Windows Azure Flat Network Storage

Scalability Targets – Storage Account
Storage account level targets by end of 2012 (applies to accounts created after June 7th 2012)
- Capacity – up to 200 TBs
- Transactions – up to 20,000 entities/messages/blobs per second
- Bandwidth for a geo redundant storage account: ingress up to 5 Gbps, egress up to 10 Gbps
- Bandwidth for a locally redundant storage account: ingress up to 10 Gbps, egress up to 15 Gbps

Scalability Targets – Partition
Partition level targets by end of 2012 (applies to accounts created after June 7th 2012)
- Single queue (account name + queue name): up to 2,000 messages per second
- Single table partition (account name + table name + PartitionKey value): up to 2,000 entities per second
- Single blob (account name + container name + blob name): up to 60 MBps

Best Practices

Common Design & Scalability
Common settings:
- Turn off Nagling & Expect 100 (.NET – ServicePointManager)
- Set the connection limit (.NET – ServicePointManager.DefaultConnectionLimit)
- Turn off proxy detection when running in the cloud (.NET – config: autodetect setting in the proxy element)

Design your application to distribute requests across your range of partition keys to avoid hotspots
- Avoid the append/prepend pattern: an access pattern lexically sorted by PartitionKey values
Perform one-time operations at startup rather than on every request:
- Creating containers/tables/queues which should always exist
- Setting required constant ACLs on a container/table/queue
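The partition-key advice above can be sketched as follows. This is a minimal Python illustration (the session's samples are .NET); the bucket count and key layout are illustrative assumptions, not part of the storage API:

```python
import hashlib

NUM_BUCKETS = 16  # illustrative choice; size it to your target throughput

def partition_key(entity_id: str) -> str:
    """Prefix a stable hash bucket so writes spread across partitions
    instead of lexically appending to the last partition (a hotspot)."""
    bucket = int(hashlib.md5(entity_id.encode()).hexdigest(), 16) % NUM_BUCKETS
    return f"{bucket:02d}_{entity_id}"

# Keys for different entities land in different buckets, hence partitions:
keys = {partition_key(f"device-{i}").split("_")[0] for i in range(100)}
```

Because the bucket prefix is derived from the entity id, point reads can still compute the exact partition to query; a full scan becomes a set of parallel per-bucket queries.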

Common Design & Scalability
Turn on analytics & take control of your investigations – Logging and Metrics
- Who deleted my container? Look at the client IP for the delete container request
- Why has my request latency increased? Look at E2E vs. server latency
- What are my user demographics? Use the client request id to trace requests & the client IP
- How can I tune my service usage? Use metrics to analyze API usage & peak traffic stats
- And many more…

Use an appropriate retry policy for intermittent errors
- The storage client uses exponential retry by default
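The default exponential retry behavior can be approximated like this — a hedged Python sketch (the real client library's delay constants and retryable-error classification differ; these values are assumptions for illustration):

```python
import random

def exponential_delays(max_attempts=3, base=3.0, cap=30.0, jitter=0.2):
    """Yield back-off delays (seconds) that double each attempt, with a
    little jitter so many clients do not retry in lockstep."""
    for attempt in range(max_attempts):
        delay = min(cap, base * (2 ** attempt))
        yield delay + random.uniform(-jitter, jitter) * delay

def call_with_retry(operation, is_retryable, max_attempts=3):
    """Run `operation`, retrying only intermittent (retryable) errors."""
    last_error = None
    for delay in exponential_delays(max_attempts):
        try:
            return operation()
        except Exception as err:
            if not is_retryable(err):
                raise            # e.g. 4xx errors: retrying will not help
            last_error = err     # e.g. timeout / 5xx: back off and retry
            # a real client would time.sleep(delay) here
    raise last_error
```

The key point is the `is_retryable` split: retrying a bad request only adds load, while retrying a timeout usually succeeds.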

Storage Accounts
- Collocate storage accounts with your compute roles, as egress is free within the same region
- Use multiple storage accounts to achieve targets that exceed a single storage account's, or to achieve client proximity
- Map multiple clients to the same storage account: use different containers/tables/queues instead of an account for each customer
- Design to add more accounts as needed
- Use a different account for Windows Azure Diagnostics
- Choose locally redundant storage if data can be restored after major disasters, or if geographical boundary constraints limit where data can be stored

Windows Azure Blobs – Scalability
How to upload a single large blob as fast as possible?
- Use parallel block upload and then commit
- Storage client library: CloudBlobClient's SingleBlobUploadThresholdInBytes and ParallelOperationThreadCount
How to upload multiple blobs as fast as possible?
- Use a single thread for each blob, but upload multiple blobs in parallel
Future service update: accommodate "Append/Prepend" blob writes better
How to migrate/backup blobs to a different account?
- Use async copy with REST version 2012-02-12 and above
- Pipeline all copies at once and then poll for completion
In partial-read scenarios, use a 64KB block size when the primary scenario is to read small chunks
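The "parallel block upload, then commit" pattern above can be sketched like this. Python illustration only; `put_block` and `put_block_list` are hypothetical stand-ins for the real block upload and commit operations:

```python
import base64
from concurrent.futures import ThreadPoolExecutor

BLOCK_SIZE = 4 * 1024 * 1024  # 4 MiB blocks, an illustrative choice

def upload_blob_parallel(data: bytes, put_block, put_block_list, threads=8):
    """Split `data` into blocks, upload them concurrently, then commit
    the ordered block list in one final call; the blob only becomes
    visible once the block list is committed."""
    blocks = [data[i:i + BLOCK_SIZE] for i in range(0, len(data), BLOCK_SIZE)]
    # Block ids must be base64-encoded; equal length keeps ordering simple.
    ids = [base64.b64encode(f"block-{n:08d}".encode()).decode()
           for n in range(len(blocks))]
    with ThreadPoolExecutor(max_workers=threads) as pool:
        list(pool.map(lambda p: put_block(p[0], p[1]), zip(ids, blocks)))
    put_block_list(ids)  # commit
    return ids
```

This is roughly the behavior that the .NET client's ParallelOperationThreadCount setting enables for you.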

Table Design & Scalability
- Critical queries: select PartitionKey, RowKey to avoid hotspots
- Batch processing: entities that need to be updated together should have the same PartitionKey
- Schema-less: store multiple types in the same table
- Single index – {PartitionKey, RowKey}: if needed, concatenate columns to form composite keys
- Entity locality: {PartitionKey, RowKey} determines the sort order; store related entities together to reduce IO and improve performance
- Avoid large numbers of scans: cost depends on selectivity – the number of entities the server must read vs. the number returned in the result set
  - Store data in the required pivots in the same partition, i.e. a partition level index
  - Build an index table to provide an eventually consistent index, i.e. a cross partition level index
  - Cache relatively constant and frequently scanned result sets
- Low latency requirements: if the scenario allows, use an async pattern to queue up the request when the online operation is expensive or fails
- Do not reuse DataServiceContext/TableServiceContext across logical operations
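Composite keys and entity locality, as described above, fit in a few lines. A hypothetical customer/order layout (not from the talk); zero-padding makes the lexical sort match the numeric sort:

```python
def row_key(order_date: str, order_id: int) -> str:
    """Composite RowKey: date first so a customer's orders cluster by
    day; zero-padded id so lexical order equals numeric order."""
    return f"{order_date}_{order_id:010d}"

entities = [
    {"PartitionKey": "customer-42", "RowKey": row_key("2012-10-31", 7)},
    {"PartitionKey": "customer-42", "RowKey": row_key("2012-10-30", 123)},
    {"PartitionKey": "customer-42", "RowKey": row_key("2012-10-31", 2)},
]
# Tables return entities sorted by (PartitionKey, RowKey); simulate that:
entities.sort(key=lambda e: (e["PartitionKey"], e["RowKey"]))
```

A RowKey range query for one day then reads one contiguous slice of the partition — entity locality in action.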

Queue Design & Scalability
- Make message processing idempotent
- Use the "Update Message" API to save intermittent processing state
  - Extend visibility time per message, which allows processors to set smaller lease times
  - Save interim processing state
- Use "Message Count" to scale workers
- Use "Dequeue Count" on a message to handle poison messages
- Batch get – increases message processing throughput
- Use blobs to store messages that exceed 64KB or to increase throughput
- Use multiple queues to scale beyond the published targets
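The dequeue-count advice above amounts to the following pattern — a Python sketch in which the message dict and poison queue are simplified stand-ins for the real queue client:

```python
MAX_DEQUEUE_COUNT = 5  # illustrative threshold

def process_message(msg, handler, poison_queue):
    """Process a message idempotently; once its dequeue count shows
    repeated failures, park it so one bad message cannot block the queue."""
    if msg["dequeue_count"] > MAX_DEQUEUE_COUNT:
        poison_queue.append(msg)   # park for offline inspection
        return "poisoned"
    try:
        handler(msg["body"])
        return "done"              # a real worker would delete the message here
    except Exception:
        return "retry"             # message reappears when visibility expires
```

Because processing is idempotent, a message that is handled twice (e.g. after a visibility timeout) does no harm.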

Shared Access Signatures (SAS)
- Use HTTPS to securely use/transport SAS tokens
- Use the minimum permissions needed and restrict the time period for access
- Clock skew: clients should renew the SAS token with sufficient leeway
- Revocable SAS: use a policy to store SAS permissions, expiry etc.
  - Only 5 policies can be associated with a container
  - Change policy IDs when removing and recreating policies
- REST version 2012-02-12 allows expiry > 1 hour even without using a revocable SAS
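The clock-skew point boils down to renewing before the token's expiry minus a safety margin. A small Python sketch (the leeway value is an illustrative assumption):

```python
from datetime import datetime, timedelta

LEEWAY = timedelta(minutes=5)  # renew well before expiry to absorb clock skew

def needs_renewal(sas_expiry: datetime, now: datetime) -> bool:
    """True when the cached SAS token is within the leeway window of
    expiring, so a fresh token should be requested from the service."""
    return now >= sas_expiry - LEEWAY
```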

How to build scalable apps? – Social Graffiti Demo
A Windows 8 app & Windows Azure service that scales using Tables and Queues

Social Graffiti
- The Windows 8 app allows users to select a wall
- A wall has tiles on which users can draw
- Users can zoom into any tile and draw shapes
- Users can see what other users are drawing at the same time
Goals
- Allow a single wall to be used by many users who share graffiti
- The number of active walls is not a concern, as the service controls their creation

Social Graffiti – Demo

Design Choices – #1

[Diagram: devices read/write via the Graffiti Service, which reads/writes the Windows Azure Storage Service]

- The Graffiti Service would need to scale like the Windows Azure Storage Service
- Need to accommodate large throughput
- Latency overhead, as all calls are routed via the service

Design Choices – #2

[Diagram: devices get a SAS from the Graffiti Service, then read/write directly against the Windows Azure Storage Service using that SAS]

- The Graffiti Service scales, as it is only responsible for handing out SAS
- Devices read/write directly to the Windows Azure Storage service using SAS – no routing overhead
- But frequent table scans to retrieve all shapes drawn by other users – scalability targets may be exceeded, and scans should be avoided

Design Choices – #3

[Diagram: devices get a SAS & the shapes to show from the Graffiti Service and write directly to the Windows Azure Storage Service using the SAS; checkpoint workers checkpoint changes to an image and feed cache changes back to the Graffiti Service]

- The Graffiti Service scales, as it is only responsible for handing out SAS and a cached result of shapes
- Devices write directly to the Windows Azure Storage service using SAS – no routing overhead
- Checkpoint workers coalesce shapes into a single record cached by the service – reduces the number of records to scan
- Devices read from the Graffiti service's cache, which reduces scans

Data Model
Master Wall Table – the list of available walls to draw on
- PartitionKey = Ticks, i.e. an append-only pattern; but since we do not require high request throughput here, this works
Wall Table – each wall gets its own table to store all the shapes drawn and the checkpointed records
- PartitionKey = tile location, i.e. the x, y, z coordinates on the wall
- Stores checkpoint and individual shape records; the RowKey prefix determines the record type
Dirty Tile Table – each tile in a wall gets a row marking it as dirty, which the checkpoint worker role monitors
- PartitionKey = tile location, i.e. the x, y, z coordinates on the wall
- RowKey = Wall Table name
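The Wall table's key scheme can be sketched like this (the exact formats are illustrative assumptions, not the demo's actual code):

```python
def tile_partition_key(x: int, y: int, z: int) -> str:
    """All records for one tile share a PartitionKey, so one partition
    query returns everything needed to render the tile."""
    return f"{z:02d}_{x:04d}_{y:04d}"

def shape_row_key(shape_id: str) -> str:
    return "S_" + shape_id      # "S_" prefix marks individual shape records

def checkpoint_row_key() -> str:
    return "C_checkpoint"       # "C_" sorts before "S_": read it first

# Within a tile's partition the checkpoint record sorts ahead of the shapes:
records = sorted([shape_row_key("abc"), checkpoint_row_key(), shape_row_key("xyz")])
```

The RowKey prefix is what lets checkpoint and shape records coexist in one partition while remaining separable with a simple range query.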

Windows Azure Table Client – Service Layer
Option 1 – WCF Data Services
- Good for a fixed schema used like relational tables
- When you do not require control over serialization/deserialization
Option 2 – Table Service Layer's DynamicTableEntity
- An entity containing a dictionary of key-value properties
- Used when the schema is not known, for example: explorers
- Performance!
Option 3 – Table Service Layer's POCO
- POCO derives from ITableServiceEntity or TableServiceEntity
- Control over serialization and deserialization – make your data dance to your tune!
- ETag maintained with entities – easy to update!
- Performance!

Work Flow – Device

[Diagram: devices interact with the Graffiti Service (steps 1-3) and, via SAS, with the Dirty Tile Table and Wall Table (steps 3-4)]

1 – The device asks the Graffiti service for a list of walls
2 – Once the user selects a wall, the device sends a request to the service to indicate interest in that wall; the service returns SAS for the Wall & Dirty Tile tables for a restricted range of PartitionKeys
3 – The device also upserts a record in the Dirty Tile table to indicate that the tile is dirty. Note – all users working on the same tile will have only one record in the Dirty Tile table
4 – When the user draws, the device caches the shapes and then sends a batch request to insert them into the Wall table

Social Graffiti Demo - Code

Work Flow – Checkpoint

[Diagram: a master checkpoint worker, checkpoint queues, checkpoint workers, and the Wall and Dirty Tile tables]

1 – A master checkpoint worker queries the Dirty Tile table records to retrieve all tiles that are dirty
2 – It queues up work for checkpoint workers to checkpoint dirty tiles across all walls
- Uses multiple queues to scale out
3 – A checkpoint worker processes each message, in which it:
  a) Retrieves all shape records for a tile
  b) Generates an image object for the shapes it retrieved and stores it as a checkpoint record
  c) Unconditionally deletes all the shape records retrieved and the previous checkpoint record
  d) Deletes the dirty indicator from the Dirty Tile table only if the ETag matches
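Step (d) relies on optimistic concurrency: the dirty marker is deleted only if its ETag is unchanged, so a draw that arrived mid-checkpoint keeps the tile dirty for the next pass. A Python sketch of that check, with an in-memory dict standing in for the real Table service:

```python
class PreconditionFailed(Exception):
    """Raised when the stored ETag no longer matches (like HTTP 412)."""

def delete_if_match(table: dict, key: str, etag: str):
    """Delete `key` only when its ETag matches, mimicking an If-Match
    conditional delete against the Table service."""
    if key not in table:
        return
    if table[key]["etag"] != etag:
        # The tile was re-marked dirty after we read the marker: leave
        # it so the next checkpoint pass picks up the new shapes.
        raise PreconditionFailed(key)
    del table[key]
```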

Social Graffiti Demo – Code & Analytics

Performance with Storage Client Library 2.0

[Chart: batch stress scenario – per-entity latencies (ms) for Insert, Query and Delete, plus processor time and test duration (s), comparing Storage Client 1.7 against Storage Client 2.0 via DataServices, with reflection, and without reflection]

Faster NoSQL table access
- Up to 72.06% reduction in execution time
- Up to 31.92% reduction in processor time
- Up to 69-90% reduction in latency

Performance with Storage Client Library 2.0

[Chart: large blob scenario (256MB) – resource utilization (total test time and total processor time, s) and upload/download latencies (s), comparing Storage Client 1.7 against Storage Client 2.0]

Faster uploads and downloads
- 31.46% reduction in processor time
- Up to 22.07% reduction in latency

Future Features

- JSON for Tables
- CORS support for Blobs, Tables and Queues
- Content-Disposition header for blobs
- Queue batching for PUT & DELETE
- Provide geo failover control to the user
- Windows Phone library

Questions?

• Storage team blogs @ http://blogs.msdn.com/b/windowsazurestorage/

• Getting Started @ https://www.windowsazure.com/en-us/develop/overview/

• Pricing information @ https://www.windowsazure.com/en-us/pricing/details/

Resources

• Follow us on Twitter @WindowsAzure

• Get Started: www.windowsazure.com/build

Resources

Please submit session evals on the Build Windows 8 App or at http://aka.ms/BuildSessions

© 2012 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.
