Embracing Failure: Self-healing, Decentralized Resource Management for Apache CloudStack

John Burwell, Vice President, Software Engineering

@john_burwell | http://twitter.com/john_burwell

@shapeblue #ccceu

About Me

VP of Software Engineering @ ShapeBlue

Member, Apache CloudStack PMC (June 2013)

Ran operations and designed automated provisioning for analytic/virtualization clouds

Led architectural design and server-side development of a SaaS physical security platform

http://www.shapeblue.com/

http://cloudstack.apache.org/

About ShapeBlue

“ShapeBlue are expert builders of public & private clouds. They are the leading global Apache CloudStack integrator & consultancy”

…and we’re hiring!

Bang ups and Hang Ups Can Happen to You

Derive the normative operation design from failure recovery

What is a Resource?

[Diagram: a control plane holds the desired state, three devices hold the actual state, and a resource converges the desired state with the actual state]

Eventually, the desired and actual states will be consistent
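The convergence idea on this slide can be sketched as a small reconciliation step. This is only an illustration under stated assumptions: the `Device` and `Resource` classes, the `apply`, `diff`, and `converge` methods, and the state keys are hypothetical names, not part of CloudStack or of the design presented here.

```python
from dataclasses import dataclass, field


@dataclass
class Device:
    """Hypothetical stand-in for a device; it holds the actual state."""
    actual: dict = field(default_factory=dict)

    def apply(self, key, value):
        # Pretend to reconfigure the device.
        self.actual[key] = value


@dataclass
class Resource:
    """Holds the desired state and converges the device toward it."""
    desired: dict
    device: Device

    def diff(self):
        # Settings whose actual value differs from the desired value.
        return {k: v for k, v in self.desired.items()
                if self.device.actual.get(k) != v}

    def converge(self):
        # Apply every outstanding change and report how many were applied.
        pending = self.diff()
        for key, value in pending.items():
            self.device.apply(key, value)
        return len(pending)


device = Device(actual={"power": "off", "memory_mb": 1024})
vm = Resource(desired={"power": "on", "memory_mb": 2048}, device=device)
changes = vm.converge()
```

Running `converge` a second time applies nothing, which is the property that makes the step safe to repeat after a failure.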

CloudStack partitions resources into zones, pods, and clusters

Failure Modes

Resource status information is stale or lost

Resource definitions conflict with device state

Entropy

Consistency

Availability

Partition Tolerance

Pick 2

Orchestration operations are available and eventually consistent

... but device modifications must be consistent.

Orchestration Tier: AP

Automation Control Tier: CP

Desired Resource State: AP

Actual Resource State: CP

Scheduling: AP

State Convergence: CP

Resource Offers

Resource Status

State Transitions

Hoke

Hoke Design Goals

Simple

Self-contained

Locality

Non-persistent

Runtime Resource View

[Diagram labels: Resource FSM, Management Process, Device, Queue, State Transition, Monitor Process, ResourceOffer, ResourceStatus; two labels read 1]

Actor Model

An actor represents state and behavior

Actors communicate by message passing; each actor has a dedicated queue, or mailbox

Each actor is allocated a lightweight thread, which acts as an implicit lock
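A minimal sketch of the actor idea, assuming a plain Python thread per actor in place of a true lightweight thread. The `Actor` and `ResourceActor` classes and the two-state transition table are hypothetical and exist only for illustration. Because a single thread drains the mailbox, the actor's state is never touched concurrently, which is the implicit lock the slide refers to.

```python
import queue
import threading


class Actor:
    """Minimal actor: private state, a mailbox, and one worker thread."""

    _STOP = object()  # sentinel message that ends the worker loop

    def __init__(self):
        self._mailbox = queue.Queue()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def send(self, message):
        # Callers never touch actor state directly; they only enqueue.
        self._mailbox.put(message)

    def stop(self):
        # Messages queued before the sentinel are still processed.
        self._mailbox.put(self._STOP)
        self._thread.join()

    def _run(self):
        while True:
            message = self._mailbox.get()
            if message is self._STOP:
                break
            self.receive(message)

    def receive(self, message):
        raise NotImplementedError


class ResourceActor(Actor):
    # Hypothetical transition table: (state, event) -> next state.
    TRANSITIONS = {
        ("stopped", "start"): "running",
        ("running", "stop"): "stopped",
    }

    def __init__(self):
        # Set state before the base class starts the worker thread.
        self.state = "stopped"
        self.history = []
        super().__init__()

    def receive(self, event):
        # Events with no matching transition leave the state unchanged.
        self.state = self.TRANSITIONS.get((self.state, event), self.state)
        self.history.append(self.state)


actor = ResourceActor()
for event in ["start", "start", "stop"]:
    actor.send(event)
actor.stop()
```

The duplicate `start` message is absorbed without error, so a sender that retries after a timeout does not corrupt the resource's state.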

Resource Graph

All resources are represented in a directed acyclic graph

The root node of the graph is the region; partitions are organized as region -> zone -> pod -> cluster

Each resource is a child of the partition node that owns it
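The partition hierarchy can be illustrated with a small tree, which is the simplest form of directed acyclic graph. The `Node` class, its `add`, `path`, and `walk` methods, and the example names are assumptions made for this sketch.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)
class Node:
    """A partition (region, zone, pod, cluster) or a resource it owns."""
    name: str
    kind: str
    parent: Optional["Node"] = field(default=None, repr=False)
    children: List["Node"] = field(default_factory=list, repr=False)

    def add(self, name, kind):
        # Attach a child node owned by this partition.
        child = Node(name, kind, parent=self)
        self.children.append(child)
        return child

    def path(self):
        # Walk up to the root to build a region-relative path.
        node, parts = self, []
        while node is not None:
            parts.append(node.name)
            node = node.parent
        return "/".join(reversed(parts))

    def walk(self):
        # Depth-first traversal from this node downward.
        yield self
        for child in self.children:
            yield from child.walk()


region = Node("region-1", "region")
zone = region.add("zone-1", "zone")
pod = zone.add("pod-1", "pod")
cluster = pod.add("cluster-1", "cluster")
host = cluster.add("host-01", "resource")
kinds = [node.kind for node in region.walk()]
```

Here `host.path()` yields the full ownership chain from the region down to the resource.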

Inspiration from Omega

Google’s resource scheduler

Transactional shared state model enabling sophisticated, global decision making

Supports both high churn and low churn workloads

Multiple, pluggable schedulers working in parallel
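A transactional shared-state model is commonly built on optimistic concurrency: each scheduler works from a snapshot and its commit is rejected if the data changed in the meantime. The sketch below illustrates that general technique only. `CellState`, `schedule`, and the per-host version counters are hypothetical, and a truly parallel version would need the commit step to be atomic.

```python
class CellState:
    """Shared state with a version counter per host (hypothetical layout)."""

    def __init__(self, free_cpus):
        self.free = dict(free_cpus)
        self.version = {host: 0 for host in free_cpus}

    def snapshot(self):
        # Schedulers plan against a private copy, not the live state.
        return dict(self.free), dict(self.version)

    def commit(self, host, cpus, seen_version):
        # Reject the claim if the host changed since the snapshot was taken.
        if self.version[host] != seen_version or self.free[host] < cpus:
            return False
        self.free[host] -= cpus
        self.version[host] += 1
        return True


def schedule(cell, cpus, retries=3):
    """Pick the first host that fits; retry from a fresh snapshot on conflict."""
    for _ in range(retries):
        free, version = cell.snapshot()
        candidates = [h for h in sorted(free) if free[h] >= cpus]
        if not candidates:
            return None
        host = candidates[0]
        if cell.commit(host, cpus, version[host]):
            return host
    return None


cell = CellState({"host-a": 4, "host-b": 8})
_, old_version = cell.snapshot()      # one scheduler takes a snapshot
first = schedule(cell, 4)             # another scheduler claims host-a
stale = cell.commit("host-a", 2, old_version["host-a"])  # conflict detected
second = schedule(cell, 4)            # retrying lands on host-b
```

The stale commit fails without blocking anyone, so schedulers that rarely collide pay almost nothing for coordination.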

Inspiration from Mesos

Two-level scheduler

Resource offers

Pessimistic locking

Pluggable

Geared towards high churn workloads
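The two-level, offer-based approach can be sketched as follows, assuming a simplified model in which a host is locked while an offer on it is outstanding. The `Allocator` class and `framework` function are hypothetical illustrations and do not reproduce the Mesos API.

```python
class Allocator:
    """First level: owns the resources and offers each host to one taker at a time."""

    def __init__(self, free_cpus):
        self.free = dict(free_cpus)
        self.locked = set()  # hosts with an outstanding offer

    def offer(self):
        # Offer the first unlocked host with spare capacity, and lock it.
        for host in sorted(self.free):
            if host not in self.locked and self.free[host] > 0:
                self.locked.add(host)
                return host, self.free[host]
        return None

    def respond(self, host, cpus_used):
        # The framework answers the offer; unused capacity is released.
        if host not in self.locked:
            raise ValueError("no outstanding offer for " + host)
        if cpus_used > self.free[host]:
            raise ValueError("cannot use more than was offered")
        self.free[host] -= cpus_used
        self.locked.discard(host)


def framework(needed_cpus, offer):
    """Second level: decide how much of an offer to use (0 declines it)."""
    _host, cpus = offer
    return needed_cpus if cpus >= needed_cpus else 0


allocator = Allocator({"host-a": 2, "host-b": 8})
first = allocator.offer()    # host-a is now locked
second = allocator.offer()   # a second framework is offered host-b instead
allocator.respond(first[0], framework(4, first))     # declined: only 2 CPUs
allocator.respond(second[0], framework(4, second))   # uses 4 of 8 CPUs
```

While an offer is outstanding nobody else can use that host, which is the cost of pessimistic locking and the reason short-lived decisions suit this model.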

Hybrid Scheduler

Best-effort shared-state scheduler

Multiple parallel schedulers distributed by partition

Combines allocators and planners

Pluggable

Auto Scaling, Self Healing

Partition controllers spawn system VMs for their child partitions as needed to address scheduler business and reliability guarantees

Parent partition controllers monitor the health of their child partition controllers and re-spawn them as necessary
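The parent-monitors-child pattern resembles a supervision tree. In this sketch the `Controller` class, the `alive` flag, and the `heal` method are hypothetical, health is reduced to a boolean, and a respawned controller starts with no children of its own.

```python
class Controller:
    """Hypothetical partition controller that supervises its children."""

    def __init__(self, name):
        self.name = name
        self.alive = True
        self.children = {}

    def spawn(self, name):
        child = Controller(name)
        self.children[name] = child
        return child

    def heal(self):
        """Replace dead children and check live ones; return respawned names."""
        respawned = []
        for name, child in list(self.children.items()):
            if not child.alive:
                # Re-spawn a fresh controller in place of the failed one.
                self.children[name] = Controller(name)
                respawned.append(name)
            else:
                # A healthy child supervises its own subtree.
                respawned.extend(child.heal())
        return respawned


zone = Controller("zone-1")
pod1 = zone.spawn("pod-1")
pod2 = zone.spawn("pod-2")
cluster = pod1.spawn("cluster-1")

pod2.alive = False       # simulate two failures at different levels
cluster.alive = False
respawned = zone.heal()
```

Each failure is repaired by the nearest live parent, so no central component has to know about every controller.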

Next Steps

Evaluate implementing the concepts in the Orleans paper to reduce the number of active actors required

Determine the best approach to causality tracking for state transitions (e.g. version vectors)

Create a library implementing these concepts to demonstrate viability, separate concerns, and enable performance testing
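Version vectors, the causality-tracking candidate named in these next steps, can be sketched in a few lines. This shows the general technique, not a design decision from the talk; the function names and the node identifiers are assumptions.

```python
def increment(vector, node):
    """Return a copy of the vector with the node's counter advanced by one."""
    updated = dict(vector)
    updated[node] = updated.get(node, 0) + 1
    return updated


def merge(a, b):
    """Element-wise maximum: the smallest vector that dominates both inputs."""
    return {n: max(a.get(n, 0), b.get(n, 0)) for n in a.keys() | b.keys()}


def compare(a, b):
    """Classify a relative to b as equal, before, after, or concurrent."""
    nodes = a.keys() | b.keys()
    a_le_b = all(a.get(n, 0) <= b.get(n, 0) for n in nodes)
    b_le_a = all(b.get(n, 0) <= a.get(n, 0) for n in nodes)
    if a_le_b and b_le_a:
        return "equal"
    if a_le_b:
        return "before"
    if b_le_a:
        return "after"
    return "concurrent"


base = increment({}, "pod-1")       # first transition, seen by pod-1
left = increment(base, "pod-1")     # pod-1 transitions again
right = increment(base, "pod-2")    # pod-2 transitions independently
ordered = compare(base, left)
conflict = compare(left, right)
resolved = merge(left, right)
```

A "concurrent" result is the signal that two state transitions happened without knowledge of each other and need an explicit resolution rule.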

References

Gilbert, Seth; Lynch, Nancy. Brewer’s Conjecture and the Feasibility of Consistent, Available, Partition-Tolerant Web Services. 2002.

Schwarzkopf, Malte; Konwinski, Andy; et al. Omega: flexible, scalable schedulers for large compute clusters. 2013.
http://research.google.com/pubs/pub41684.html

Hindman, Benjamin; Konwinski, Andy; et al. Mesos: A Platform for Fine-Grained Resource Sharing in the Data Center. 2011.
https://people.csail.mit.edu/matei/papers/2011/nsdi_mesos.pdf

Bernstein, Philip; Bykov, Sergey; et al. Orleans: Distributed Virtual Actors for Programmability and Scalability. 2014.
http://research.microsoft.com/pubs/210931/Orleans-MSR-TR-2014-41.pdf

Questions

Comments

Thank you
