Architecting to be Cloud Native On Windows Azure or Otherwise
Bill Wilder
BU MET CS755, Cloud Computing, Dino Konstantopoulos, 21-Mar-2013 (6:00 – 9:00 PM EDT)
An App in the Cloud is not (necessarily) a Cloud-Native App

Page 1: Architecting to be Cloud Native On Windows Azure or Otherwise

Architecting to be Cloud Native

On Windows Azure or Otherwise

BU MET CS755, Cloud Computing, Dino Konstantopoulos 21-Mar-2013 (6:00 – 9:00 PM EDT)

                                        

HELLO my name is Bill Wilder

An App in the Cloud is not (necessarily) a Cloud-Native App

Page 2: Architecting to be Cloud Native On Windows Azure or Otherwise

Who is Bill Wilder?

www.devpartners.com

www.bostonazure.org

www.cloudarchitecturepatterns.com

Page 3: Architecting to be Cloud Native On Windows Azure or Otherwise

Roadmap for this talk…

1. App in the Cloud != Cloud App (or at least not a Cloud-Native App)

2. Put Cloud-Native in context of cloud platform types from software development point of view

3. How to keep running when things go wrong?

4. How to scale?

5. How to minimize costs?

Assumptions:
– You know what “the cloud” is – so we can focus on application architecture using cloud as a toolbox
– You are interested in understanding cloud-native apps

Page 4: Architecting to be Cloud Native On Windows Azure or Otherwise

The term “cloud” is nebulous…

Page 5: Architecting to be Cloud Native On Windows Azure or Otherwise

“Bring Your Own” ____ as a Service

• SaaS: BYO Users
• PaaS: BYO Applications
• IaaS: BYO Virtual Machines

Responsibility & Flexibility: less (SaaS) to more (IaaS)

Most productive platforms for Cloud-Native Apps

NIST: http://csrc.nist.gov/publications/nistpubs/800-145/SP800-145.pdf

Page 6: Architecting to be Cloud Native On Windows Azure or Otherwise

The term “cloud” is nebulous…

A public cloud perspective…

Page 7: Architecting to be Cloud Native On Windows Azure or Otherwise

Windows Azure Feature Map

Page 8: Architecting to be Cloud Native On Windows Azure or Otherwise

What is different about the cloud?

What's different about the [public] cloud?

Page 9: Architecting to be Cloud Native On Windows Azure or Otherwise

1/9th above water

TTM & Sleeping well=

Page 10: Architecting to be Cloud Native On Windows Azure or Otherwise

MTBF MTTR

commodity hardware + multitenant services = cost-efficient cloud

failure is routine (so you better be good at handling it)

Page 11: Architecting to be Cloud Native On Windows Azure or Otherwise

This bar is always open

*and*

has an API

Pay by the Drink

Page 12: Architecting to be Cloud Native On Windows Azure or Otherwise

• Resource allocation (scaling) is:
  – Horizontal
  – Bi-directional
  – Automatable

The “illusion of infinite resources”

Page 13: Architecting to be Cloud Native On Windows Azure or Otherwise

Cloud-Native Applications have their Application Architecture aligned with the Cloud Platform Architecture

– Use the platform in the most natural way
– Let the platform do the heavy lifting where appropriate
– Take responsibility for error handling, self-healing, and some aspects of scaling

Page 14: Architecting to be Cloud Native On Windows Azure or Otherwise

Tells: Traditional vs Cloud-Native

               Traditional                 Cloud-Native
Tells/Clues    • 2-tier                    • 3- or N-tier, SOA
               • Single data center        • Multi-data center
               • Vertical scaling          • Horizontal scaling
               • Ignores failure           • Expects failure
               • Hardware or IaaS          • PaaS
Consequences   • Less flexible             • Agile/faster TTM
               • More manual/attention     • Auto-scaling
               • Less reliable (SPoF)      • Self-healing
               • Maintenance window        • HA
               • Less scalable, more $$    • Geo-LB/FO

Which is “best” architecture?

There is no “best” architecture – it is situational, a Technical Business Decision.

Cloud-native popularity growing in proportion to the shrinking cost and competitive benefits.

Page 15: Architecting to be Cloud Native On Windows Azure or Otherwise

Putting Cloud Services to work

Putting the cloud to work

Page 16: Architecting to be Cloud Native On Windows Azure or Otherwise

Web Tier Web Tier

pageofphotos.com

Original Approach
• 2-tier architecture
• Stateful web nodes

Pros
• Well understood
• Easy to get working

[Potential] Cons
• UX fails for upgrades, hardware failures, app pool recycling
• Limited scale
• Not Cloud-Native

Database

/maura

Page 17: Architecting to be Cloud Native On Windows Azure or Otherwise

Web Tier Web Tier

pageofphotos.com

1. Scale web tier (stateless)

2. Scale service tier (async)

3. Scale data tier (shard)

All while… handling failure and optimizing for cost- & operational-efficiency

Scale the app, not the team!

Database

Service Tier Service Tier

Database

/maura

Page 18: Architecting to be Cloud Native On Windows Azure or Otherwise

Horizontal Scaling Compute Pattern

pattern 1 of 5

Page 19: Architecting to be Cloud Native On Windows Azure or Otherwise

Vertical Scaling vs. Horizontal Scaling

Common Terminology:
• Scaling Up/Down = Vertical Scaling
• Scaling Out/In = Horizontal “Scaling”
  – But really is Horizontal Resource Allocation
• Architectural Decision
  – Big decision… hard to change

Page 20: Architecting to be Cloud Native On Windows Azure or Otherwise

What’s the difference between performance and scale??

Page 21: Architecting to be Cloud Native On Windows Azure or Otherwise

Vertical Scaling (“Scaling Up”)

Resources that can be “Scaled Up”
• Memory: speed, amount
• CPU: speed, number of CPUs
• Disk: speed, size, multiple controllers
• Bandwidth: higher capacity pipe
• … and it sure is EASY

Downsides of Scaling Up
• Hard Upper Limit
• HIGH END HARDWARE → HIGH END CO$T
• Lower value than “commodity hardware”
• May have no other choice (architectural)

Page 22: Architecting to be Cloud Native On Windows Azure or Otherwise

Horizontal Scaling (“Scaling Out”)

Autonomous nodes for scalability (stateless web servers, shared nothing DBs, your custom code in QCW)

*and*

Homogeneous nodes for operational simplicity

*and*

Anonymous nodes: don’t get emotionally involved!

This is how a [public] CLOUD PLATFORM works

*and*

This is how YOUR CLOUD-NATIVE app works

Page 23: Architecting to be Cloud Native On Windows Azure or Otherwise

Load Balancer (Cloud Service)

Managed VMs (Cloud Service)

“Web Role”

Example: Web Tier www.pageofphotos.com

Page 24: Architecting to be Cloud Native On Windows Azure or Otherwise

Horizontal Scaling Considerations

1. Auto-Scale
   • Bidirectional
2. Nodes can fail
   • Releasing VM resources (e.g., via Auto-Scale) is one cause
   • Handle shutdown signals
   • Externalize session state
     • e.g., see ASP.NET Session State Providers for Azure Tables, Azure Cache
   • N+1 rule as UX optimization
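The "externalize session state" advice can be illustrated with a small, platform-neutral sketch. It is not the ASP.NET provider mentioned on the slide: `SessionStore` and `WebNode` are hypothetical names, and a plain in-memory dictionary stands in for an external cache or table service. The point is only that any node can serve any request once the state lives outside the node.

```python
import copy


class SessionStore:
    """Hypothetical stand-in for an external session store (cache or table service)."""

    def __init__(self):
        self._data = {}

    def load(self, session_id):
        # Return a copy so callers never share mutable state with the store.
        return copy.deepcopy(self._data.get(session_id, {}))

    def save(self, session_id, state):
        self._data[session_id] = copy.deepcopy(state)


class WebNode:
    """A stateless web node: it keeps no per-user state between requests."""

    def __init__(self, name, store):
        self.name = name
        self.store = store

    def add_to_cart(self, session_id, item):
        state = self.store.load(session_id)
        state.setdefault("cart", []).append(item)
        self.store.save(session_id, state)
        return state["cart"]


store = SessionStore()
node_a = WebNode("web-1", store)
node_b = WebNode("web-2", store)

node_a.add_to_cart("session-1", "photo-1")
del node_a  # the node goes away (failure, upgrade, or scale-in)
cart = node_b.add_to_cart("session-1", "photo-2")  # another node picks up the session
```

Because the surviving node reads the session from the shared store, the user's cart is intact after the first node disappears, which is what makes bidirectional auto-scaling safe for the web tier.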

Page 25: Architecting to be Cloud Native On Windows Azure or Otherwise

How many users does your cloud-native application need before it needs to be able to horizontally scale??

Page 26: Architecting to be Cloud Native On Windows Azure or Otherwise

Queue-Centric Workflow Pattern

(QCW for short)

pattern 2 of 5

Page 27: Architecting to be Cloud Native On Windows Azure or Otherwise

Extend www.pageofphotos.com into a new Service Tier

QCW enables applications where the UI and back-end services are Loosely Coupled

[ Similar to CQRS Pattern ]

Page 28: Architecting to be Cloud Native On Windows Azure or Otherwise

Web Tier Web Tier

pageofphotos.com

Add service tier (async)

Leave Web Tier to do what it’s good at

Database

Service Tier Service Tier

/maura

Page 29: Architecting to be Cloud Native On Windows Azure or Otherwise

QCW Example: User Uploads Photo www.pageofphotos.com

Web Tier Service Tier

Reliable Queue

Reliable Storage

Page 30: Architecting to be Cloud Native On Windows Azure or Otherwise

QCW

WE NEED:
• Compute (VM) resources to run our code

• Reliable Queue to communicate

• Durable/Persistent Storage

Page 31: Architecting to be Cloud Native On Windows Azure or Otherwise

Where does Windows Azure fit?

Page 32: Architecting to be Cloud Native On Windows Azure or Otherwise

QCW [on Windows Azure]

WE NEED:
• Compute (VM) resources to run our code
  – Web Roles (IIS – Web Tier)
  – Worker Roles (w/o IIS – Service Tier)
• Reliable Queue to communicate
  – Azure Storage Queues
• Durable/Persistent Storage
  – Azure Storage Blobs

Page 33: Architecting to be Cloud Native On Windows Azure or Otherwise

QCW on Azure: User Uploads a Photo

WebRole (IIS)

WorkerRole

Azure Queue

Azure Blob

UX implications: how does user know thumbnail is ready?

www.pageofphotos.com

push pull

Page 34: Architecting to be Cloud Native On Windows Azure or Otherwise

Reliable Queue & 2-step Delete

WebRole:

var url = "http://pageofphotos.blob.core.windows.net/up/<guid>.png";
queue.AddMessage( new CloudQueueMessage( url ) );

Queue

WorkerRole:

// Step 1: dequeue; the message stays hidden from other workers for 10 seconds
var invisibilityWindow = TimeSpan.FromSeconds( 10 );
CloudQueueMessage msg = queue.GetMessage( invisibilityWindow );
// do all necessary processing…
// Step 2: delete only after processing has succeeded
queue.DeleteMessage( msg );
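The two-step delete can be seen end to end in a small simulation. This is a sketch under stated assumptions, not the Azure Storage API: `ReliableQueue` is a hypothetical in-memory class, time is passed in explicitly as `now` so the run is deterministic, and the pop-receipt mechanism of the real service is omitted.

```python
import itertools


class ReliableQueue:
    """Hypothetical in-memory queue with an invisibility window (not the Azure SDK)."""

    def __init__(self):
        self._messages = {}             # message id -> {"body", "visible_at"}
        self._ids = itertools.count(1)

    def add_message(self, body):
        self._messages[next(self._ids)] = {"body": body, "visible_at": 0.0}

    def get_message(self, invisibility_window, now):
        # Step 1: hand out the first visible message and hide it for a while.
        for msg_id, entry in self._messages.items():
            if entry["visible_at"] <= now:
                entry["visible_at"] = now + invisibility_window
                return msg_id, entry["body"]
        return None

    def delete_message(self, msg_id):
        # Step 2: remove the message for good once processing has succeeded.
        self._messages.pop(msg_id, None)


queue = ReliableQueue()
queue.add_message("http://pageofphotos.blob.core.windows.net/up/<guid>.png")

first = queue.get_message(10, now=0)    # worker 1 dequeues, then crashes before deleting
hidden = queue.get_message(10, now=5)   # still inside the invisibility window
retry = queue.get_message(10, now=11)   # window expired: the message reappears
queue.delete_message(retry[0])          # worker 2 finishes and deletes it
empty = queue.get_message(10, now=30)
```

The message is never lost when a worker dies mid-processing; the cost is that it may be processed more than once, which is why the next slide asks for idempotent operations.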

Page 35: Architecting to be Cloud Native On Windows Azure or Otherwise

QCW requires Idempotent

• Perform idempotent operation more than once, end result same as if we did it once
• Example with Thumbnailing (easy case)
• App-specific concerns dictate approaches
  – Compensating action, Last write wins, etc.
• PARTNERSHIP: division of responsibility between cloud platform & app

Transaction cannot span database + queue
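A minimal sketch of why thumbnailing is the "easy case": the output name and content are derived only from the input, so running the handler twice leaves storage in the same state as running it once. The `blob_store` dictionary, the `make_thumbnail` placeholder, and the `thumbs/` naming scheme are assumptions made for illustration.

```python
# Hypothetical in-memory stand-in for blob storage.
blob_store = {"up/cat.png": b"original-image-bytes"}


def make_thumbnail(data):
    """Placeholder for real image resizing; deterministic for a given input."""
    return b"thumb:" + data[:8]


def process_upload(blob_name):
    """Idempotent handler: the thumbnail name and bytes depend only on the input blob."""
    thumb_name = "thumbs/" + blob_name.split("/", 1)[1]
    blob_store[thumb_name] = make_thumbnail(blob_store[blob_name])  # overwrite is harmless
    return thumb_name


process_upload("up/cat.png")
after_once = dict(blob_store)

process_upload("up/cat.png")   # same message delivered a second time
after_twice = dict(blob_store)
```

A handler that instead incremented a counter or appended a row on each delivery would not be idempotent, and would need one of the app-specific approaches listed above (compensating action, last write wins, and so on).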

Page 36: Architecting to be Cloud Native On Windows Azure or Otherwise

QCW expects Poison Messages

• A Poison Message cannot be processed
  – Error condition for non-transient reason
  – Check CloudQueueMessage.DequeueCount property
• Falling off the queue may kill your system
• Determine a Max Retry policy per queue
  – Delete, put on “bad” queue, alert human, …
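One way to picture a max-retry policy is the sketch below. It assumes a message represented as a dictionary carrying a `dequeue_count` (mirroring the role of `CloudQueueMessage.DequeueCount`), a hypothetical limit of 3, and a plain list as the “bad” queue; none of these names come from the Azure SDK.

```python
MAX_DEQUEUE_COUNT = 3  # hypothetical per-queue policy


def handle(message, process, bad_queue):
    """Return True when the message should be deleted from the main queue."""
    if message["dequeue_count"] > MAX_DEQUEUE_COUNT:
        bad_queue.append(message["body"])  # park it for a human to inspect
        return True
    try:
        process(message["body"])
    except Exception:
        return False  # leave it; it reappears after the invisibility window
    return True


def always_fails(body):
    raise ValueError("corrupt image: " + body)


bad_queue = []
message = {"body": "up/corrupt.png", "dequeue_count": 0}
attempts = 0
deleted = False
while not deleted:
    message["dequeue_count"] += 1  # the queue service increments this on each dequeue
    attempts += 1
    deleted = handle(message, always_fails, bad_queue)
```

The message is tried three times, then moved aside on the fourth dequeue, so a single bad input cannot keep crashing workers forever.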

Page 37: Architecting to be Cloud Native On Windows Azure or Otherwise

QCW enables Responsive UX

• Response to interactive users is as fast as a work request can be persisted
• Time consuming work done asynchronously
• Comparable total resource consumption, arguably better subjective UX
• UX challenge – how to express Async to users?
  – Communicate Progress
  – Display Final results
  – Long Polling/Web Sockets (e.g., SignalR or Node.io)

Page 38: Architecting to be Cloud Native On Windows Azure or Otherwise

QCW enables Scalable App

• Decoupled front/back provides insulation
  – Blocking is Bane of Scalability
  – Order processing partner doing maintenance
  – Twitter down
  – Email server unreachable
  – Internet connectivity interruption
• Loosely coupled, concern-independent scaling
  – (see next slide)
  – Get Scale Units right
  – Key to optimizing operational CO$T$

Page 39: Architecting to be Cloud Native On Windows Azure or Otherwise

QCW requires “Plan for Failure”

• VM restarts will happen
  – Hardware failure, O/S patching, crash (bug)
• Bake in handling of restarts into our apps
  – Restarts are routine: system “just keeps working”
  – Idempotent mindset is key
  – Event Sourcing (commonly seen with CQRS) may help
• Not an exception case! Expect it!
• Consider N+1 Rule

Page 40: Architecting to be Cloud Native On Windows Azure or Otherwise

Aside: Is QCW same as CQRS?

• Short answer: “no”
• CQRS
  – Command Query Responsibility Segregation
    • Commands change state
    • Queries ask for current state
    • Any operation is one or the other
    • Sometimes includes Event Sourcing
    • Sometimes modeled using Domain Driven Design (DDD)

Page 41: Architecting to be Cloud Native On Windows Azure or Otherwise

General Case: Many Roles, Many Queues

[Diagram: Web Roles (IIS; Public and Admin), Queue Types 1, 2, and 3, and multiple instances each of Worker Role Type 1 and Worker Role Type 2]

• Scaling is best when Investment ∝ Benefit
• Optimize for CO$T EFFICIENCY
• Logical vs. Physical Architecture depends on current scale

Page 42: Architecting to be Cloud Native On Windows Azure or Otherwise

What about the Data?

• You: Azure Web Roles and Azure Worker Roles
  – Taking user input, dispatching work, doing work
  – Follow a decoupled queue-in-the-middle pattern
  – Stateless compute nodes
• Cloud: “Hard Part”: persistent, scalable data
  – Azure Queue & Blob Services
  – Three copies of each byte
  – Blobs are geo-replicated
  – Busy Signal Pattern

Page 43: Architecting to be Cloud Native On Windows Azure or Otherwise

Database Sharding Pattern

pattern 3 of 5

Page 44: Architecting to be Cloud Native On Windows Azure or Otherwise

Extend www.pageofphotos.com example into Data Tier

What happens when demands on data tier outgrow one physical database?

Page 45: Architecting to be Cloud Native On Windows Azure or Otherwise

Web Tier Web Tier

pageofphotos.com

Scale data tier (shard)

Sharding is horizontal scaling for databases.

Unlike compute nodes, databases are not stateless.

Database

Service Tier Service Tier

Database

/maura

Database

Database

Page 46: Architecting to be Cloud Native On Windows Azure or Otherwise

Database Sharding

• Problem: too much for one physical database
  – Too much data (e.g., 150 GB limit in WASD)
  – Not sufficiently performant
• Solution: split data across multiple databases
  – One Logical Database, multiple Physical Databases
• Each Physical Database Node is a Shard
• Goal is a Shared Nothing design & single shard handles most common business operations
  – May require some denormalization (duplication)

Page 47: Architecting to be Cloud Native On Windows Azure or Otherwise

All shards have same schema

SHARDS

Page 48: Architecting to be Cloud Native On Windows Azure or Otherwise

Sharding is Difficult

• What defines a shard? (Where to put/find stuff?)
  – Example – by HOME STATE: customer_ma, customer_ia, customer_co, customer_ri, …
  – Design to avoid query / join / transact across shards
• What happens if a shard gets too big?
  – Rebalancing shards can get complex
  – Foursquare case study is interesting
• Cache coherence, connection pool management
  – Rolling-your-own is complex

Page 49: Architecting to be Cloud Native On Windows Azure or Otherwise

Where does Windows Azure fit?

Page 50: Architecting to be Cloud Native On Windows Azure or Otherwise

Windows Azure SQL Database (WASD) is SQL Server… with a few diffs…

“Just change the connection string…”

Common

SQL Server Specific (for now)
• Full Text Search
• Transparent Data Encryption (TDE)
• Many more…

WASD Specific
Limitations
• 150 GB size limit
• Busy Signal Pattern
Extra Capabilities
• Managed Service
• Highly Available
• Rental model
• Federations

Additional information on Differences: http://msdn.microsoft.com/en-us/library/ff394115.aspx

Page 51: Architecting to be Cloud Native On Windows Azure or Otherwise

Windows Azure SQL Database Federations for Sharding

• Single “master” database
  – “Query Fanout” makes partitions transparent
  – Instead of customer_ma, customer_ia, etc… we are back to customer database
• Handles redistributing shards
• Handles cache coherence and simplifies connection pooling
• No MERGE (yet); SPLIT only
• Bonus feature for Multitenant Applications

USE FEDERATION myfed (myfedkey = 911) WITH FILTERING=ON, RESET

• http://blogs.msdn.com/b/cbiyikoglu/archive/2011/01/18/sql-azure-federations-robust-connectivity-model-for-federated-data.aspx

Page 52: Architecting to be Cloud Native On Windows Azure or Otherwise

Key Take-away

Database Sharding has historically been an APPLICATION LAYER concern

Windows Azure SQL Database Federations supports sharding lower in the stack as a DATABASE LAYER concern

Page 53: Architecting to be Cloud Native On Windows Azure or Otherwise

My database instance is limited to 150 GB.

Does that mean the cloud doesn’t really offer the illusion of infinite resources??

∞ ∞ ∞

Page 54: Architecting to be Cloud Native On Windows Azure or Otherwise

Busy Signal Pattern

pattern 4 of 5

Page 55: Architecting to be Cloud Native On Windows Azure or Otherwise

• Language/Platform SDKs on www.windowsazure.com
• TOPAZ from Microsoft P&P: http://bit.ly/13R7R6A
• All have Retry Policies
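For readers who have not seen a retry policy, here is a minimal sketch of the idea behind the Busy Signal Pattern: retry transient failures with exponential backoff and jitter, and give up after a bounded number of attempts. It is not TOPAZ or any SDK's policy; `TransientError`, `with_retries`, and the default values are hypothetical.

```python
import random
import time


class TransientError(Exception):
    """Hypothetical marker for failures worth retrying (throttling, failover, network)."""


def with_retries(operation, max_attempts=5, base_delay=0.1, sleep=time.sleep):
    """Call operation(); on a transient failure, wait with exponential backoff and retry."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except TransientError:
            if attempt == max_attempts:
                raise  # out of attempts: surface the failure to the caller
            delay = base_delay * (2 ** (attempt - 1))
            sleep(delay + random.uniform(0, delay))  # jitter avoids synchronized retries


calls = {"count": 0}


def flaky_service():
    calls["count"] += 1
    if calls["count"] < 3:
        raise TransientError("server busy")
    return "ok"


delays = []  # record the waits instead of actually sleeping
result = with_retries(flaky_service, sleep=delays.append)
```

Only errors known to be transient are retried; anything else propagates immediately, which keeps genuine bugs and bad data from being masked by the retry loop.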

Page 56: Architecting to be Cloud Native On Windows Azure or Otherwise

Auto-Scaling Pattern

pattern 5 of 5

Page 57: Architecting to be Cloud Native On Windows Azure or Otherwise

Goal is AUTOSCALING – using a library or services

Microsoft
• “WASABi” block from P&P (you run it)
• MetricsHub is in the Azure store (very basic service)

Third Party Services
• A few SaaS choices for Auto-Scaling and Monitoring
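What an auto-scaling rule decides can be sketched in a few lines. This is not how WASABi or MetricsHub are configured; the function name, the queue-length signal, the per-instance capacity of 100 messages, and the instance limits are all assumptions chosen to show a bidirectional rule with an N+1 spare.

```python
def desired_instances(queue_length, per_instance=100, min_instances=1, max_instances=10):
    """Pick an instance count from an environmental signal (queue backlog)."""
    needed = -(-queue_length // per_instance)  # ceiling division
    with_spare = needed + 1                    # N+1 rule: one node more than strictly needed
    return max(min_instances, min(max_instances, with_spare))


idle = desired_instances(0)       # scales in when there is no backlog
light = desired_instances(50)
busy = desired_instances(250)
spike = desired_instances(5000)   # capped so a surprise cannot run up an unbounded bill
```

The same function drives scaling out and scaling in, and the upper bound doubles as a cost control.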

Page 58: Architecting to be Cloud Native On Windows Azure or Otherwise

In Conclusion

Page 59: Architecting to be Cloud Native On Windows Azure or Otherwise

Optimize for MTTR (1/2)
• Apply Busy Signal Pattern
  – Retry transient failures due to issues with network, throttling, failovers
  – Applies to all cloud services
• Apply Node Failure Pattern
  – Stateless Nodes, QCW Pattern, handle node shutdown signals, covers nodes going away due to scaling action
  – Consider N+1 Rule
• Detect Poison Messages
  – Protect against Bad Data

Page 60: Architecting to be Cloud Native On Windows Azure or Otherwise

Optimize for MTTR (2/2)
• Prevent Resource Failures
  – Environmental-signal-based Auto-Scaling (for surprises)
  – Proactive Auto-Scaling for known spikes (e.g., Superbowl Ad, lunch rush)
  – QCW Pattern (allow work to pile up w/o blocking users)
• Log Everything
  – Gather logs with Windows Azure Diagnostics

Page 61: Architecting to be Cloud Native On Windows Azure or Otherwise

What’s Up? Reliability as EMERGENT PROPERTY

Columns: Typical Site, Any 1 Role Inst, Overall System

Rows:
• Operating System Upgrade
• Application Code Update
• Scale Up, Down, or In
• Hardware Failure
• Software Failure (Bug)
• Security Patch

Page 62: Architecting to be Cloud Native On Windows Azure or Otherwise

Optimize for Cost
• Operational Efficiency Big Factor
  – Human costs can dominate
  – Automate (CI & CD and self-healing)
  – Simplify: homogeneous nodes
• Review costs billed (so transparent!)
  – Be on lookout for missed efficiencies
• “Watch out for money leaks!”
  – Inefficient coding can increase the monthly bill
• Prefer to Buy (Rent) rather than Build
  – Save costs (and TTM) of expensive engineering

Page 63: Architecting to be Cloud Native On Windows Azure or Otherwise

Optimize for Scale
• With the right architecture…
  – Scale efficiently (linearly)
  – Scale all Application Tiers
  – Auto-Scale
  – Scale Globally (8/24 data centers)
• Use Horizontal Resourcing
• Use Stateless Nodes
• Upgrade without Downtime, even at scale
• Do not need to sacrifice User Experience (UX)

Page 64: Architecting to be Cloud Native On Windows Azure or Otherwise

Cloud Architecture Patterns book: Primer Chapters

1. Scalability
2. Eventual Consistency
3. Multitenancy and Commodity Hardware
4. Network Latency

www.cloudarchitecturepatterns.com

Page 65: Architecting to be Cloud Native On Windows Azure or Otherwise

Cloud Architecture Patterns book: Pattern Chapters

1. Horizontally Scaling Compute Pattern
2. Queue-Centric Workflow Pattern
3. Auto-Scaling Pattern
4. MapReduce Pattern
5. Database Sharding Pattern
6. Busy Signal Pattern
7. Node Failure Pattern
8. Colocate Pattern
9. Valet Key Pattern
10. CDN Pattern
11. Multisite Deployment Pattern

Page 66: Architecting to be Cloud Native On Windows Azure or Otherwise

BostonAzure.org

• Boston Azure Cloud User Group
• Focused on Microsoft’s Public Cloud Platform
• Roles: Architect, Dev, IT Pro, DevOps (“WazOps”)
• Talks, Demos, Tools, Hands-on, special events, …
• Monthly, 6:00-8:30 PM in Boston area (free)
• Follow on Twitter: @bostonazure
• More info or to join our Meetup.com group: http://www.bostonazure.org

Page 67: Architecting to be Cloud Native On Windows Azure or Otherwise

Business Card

Page 68: Architecting to be Cloud Native On Windows Azure or Otherwise

My name is Bill Wilder

[email protected] ·· www.devpartners.com

www.cloudarchitecturepatterns.com

community

@bostonazure ·· www.bostonazure.org

@codingoutloud ·· blog.codingoutloud.com ·· [email protected]

Find this slide deck

here!

Page 69: Architecting to be Cloud Native On Windows Azure or Otherwise

Windows Azure Feature Map

Page 70: Architecting to be Cloud Native On Windows Azure or Otherwise

Questions?

Comments?

More information?

?