Availability, The Cloud and Everything (version 2, Surge2010)

Availability, the Cloud and Everything

Joe Williams

Saturday, October 2, 2010

Me

• Joe Williams
• Infrastructure Engineer
• Cloudant
• @williamsjoe
• joeandmotorboat.com

• Distributed database built on CouchDB
• Real-time Search and Analytics
• Sign Up! (Free to 256MB)
• cloudant.com
• http://github.com/cloudant/bigcouch

Bias

• Distributed Databases (CouchDB)
• Amazon EC2
• Chef
• Erlang

Availability

Availability

• What is Availability?

Availability

Availability

“System availability refers to the accessibility of system services to users. A system is available if it is operational for an overwhelming fraction of the time. Unlike reliability, availability is instantaneous.”

Availability

“System reliability refers to the property of tolerating constituent component failures, for the longest time. A system is perfectly reliable if it never fails.”

Availability

• Reliability * Availability = Dependability

Availability

• Availability & Reliability
• Mean time to failures
• Mean time to repair
• Durability
• Fault isolation
• Fault tolerance
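
Mean time to failure and mean time to repair are commonly combined into a steady-state availability figure, MTTF / (MTTF + MTTR). The sketch below is not from the talk; the function name and the sample numbers are illustrative assumptions.

```python
def steady_state_availability(mttf_hours: float, mttr_hours: float) -> float:
    """Fraction of time a repairable system is up: MTTF / (MTTF + MTTR).

    Assumes failures and repairs alternate and the averages are stable.
    """
    if mttf_hours <= 0 or mttr_hours < 0:
        raise ValueError("MTTF must be positive and MTTR non-negative")
    return mttf_hours / (mttf_hours + mttr_hours)


# Hypothetical node: fails every 999 hours on average, takes 1 hour to repair.
node = steady_state_availability(mttf_hours=999, mttr_hours=1)

# Halving the repair time helps about as much as doubling the time to failure.
faster_repair = steady_state_availability(mttf_hours=999, mttr_hours=0.5)
fewer_failures = steady_state_availability(mttf_hours=1998, mttr_hours=1)
```

With a fixed failure rate, shortening repair time is often the cheaper lever, which is one argument for disposable, quickly replaced nodes.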

Availability

• Uptime / Downtime
• Perceived
• Actual
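
Uptime targets are often quoted as "nines". As a rough illustration (not from the talk), the sketch below converts an availability fraction into allowed downtime per year, assuming a 365-day year; the function name is hypothetical.

```python
HOURS_PER_YEAR = 365 * 24  # assumption: non-leap year


def downtime_hours_per_year(availability: float) -> float:
    """Hours of downtime per year implied by an availability fraction."""
    if not 0.0 <= availability <= 1.0:
        raise ValueError("availability must be between 0 and 1")
    return (1.0 - availability) * HOURS_PER_YEAR


for target in (0.99, 0.999, 0.9999):
    print(f"{target:.2%} available -> {downtime_hours_per_year(target):.2f} h/year down")
```

This is the "actual" number only if it is measured where users are; perceived availability can be lower when failures cluster at busy times.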

Availability

• Probabilistic Risk Assessment
• Event Tree Analysis
• Fault Tree Analysis

Apthorpe (http://www.usenix.org/events/lisa01/tech/apthorpe/apthorpe.ps)
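
A fault tree combines basic failure events through AND and OR gates to estimate the probability of a top-level failure. The following is a minimal sketch, assuming independent events; the component names and probabilities are made up for illustration and do not come from the talk or from Apthorpe's paper.

```python
from math import prod


def and_gate(*probabilities: float) -> float:
    """Output fails only if every input fails (independent events)."""
    return prod(probabilities)


def or_gate(*probabilities: float) -> float:
    """Output fails if at least one input fails (independent events)."""
    return 1.0 - prod(1.0 - p for p in probabilities)


# Hypothetical system: two database replicas behind one load balancer.
p_replica_fails = 0.01        # assumed per-replica failure probability
p_load_balancer_fails = 0.001  # assumed load balancer failure probability

p_database_down = and_gate(p_replica_fails, p_replica_fails)
p_service_down = or_gate(p_database_down, p_load_balancer_fails)
```

In this made-up tree the single load balancer dominates the result, which is the kind of single point of failure the analysis is meant to expose.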

The Cloud

The Cloud

“It never gets easier, you just go faster.” - Greg LeMond

The Cloud

• Abstraction
• Commoditization
• Homogeneous
• Ephemeral

The Cloud

• Costs
• Loss of Control
• Single Points of Failure
• Network Partitions / Data Locality
• Unreliable
• Performance

The Cloud

• Benefits
• API to everything
• Fast and Flexible Resource Mgmt
• “Unlimited” Resources

The Cloud

• Bootstrapping
• Time and Effort

Adam Jacob and Ezra Zygmuntowicz (http://blip.tv/file/2285124/)

The Cloud

• Nodes are stateless and disposable.

The Cloud

"Clouds are systems ... and with systems, you have to think hard and know how to deal with issues in that environment. The scale is so much bigger, and you don't have the physical control. But we think people should
be optimistic about what we can do here. If we are clever about deploying cloud computing with a clear-eyed notion of what the risk models are, maybe we can actually save the economy through technology."

- Security in the Ether By David Talbot - MIT Technology Review Jan/Feb 2010

What’s Next

• Distributed Systems
• Automation
• Data Driven Operations

Distributed Systems

Baran (http://www.rand.org/pubs/research_memoranda/RM3420/)

Distributed Systems

• RAID ain’t as redundant as it used to be.

Leventhal (http://queue.acm.org/detail.cfm?id=1670144)

Distributed Systems

• Redundancy
• Duplication
• Distribution
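
To see why duplication across nodes raises availability, consider N replicas where any single surviving replica can serve requests. Assuming replica failures are independent (a strong assumption in a shared cloud), the combined availability is 1 - (1 - a)^N. The function names below are hypothetical and the numbers illustrative.

```python
def replicated_availability(single: float, replicas: int) -> float:
    """Availability when at least one of `replicas` independent copies must be up."""
    if not 0.0 < single < 1.0 or replicas < 1:
        raise ValueError("need 0 < single < 1 and replicas >= 1")
    return 1.0 - (1.0 - single) ** replicas


def replicas_needed(single: float, target: float) -> int:
    """Smallest replica count whose combined availability reaches `target`."""
    if not 0.0 < target < 1.0:
        raise ValueError("target must be strictly between 0 and 1")
    count = 1
    while replicated_availability(single, count) < target:
        count += 1
    return count


two_copies = replicated_availability(0.99, 2)
copies_for_five_nines = replicas_needed(0.99, 0.99999)
```

Correlated failures, such as replicas sharing a rack, a zone, or a bad deploy, break the independence assumption, which is where distribution matters in addition to duplication.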

Distributed Systems

• Alphabet Soup
• ACID, CAP, BASE, 2PC, MVCC
• Vector Clocks, Eventual Consistency
• Dynamo, Paxos, Chandra, Byzantine

Distributed Systems

• CAP == Availability

Distributed Systems

• Erlang
• Distributed
• Concurrent
• Fault Tolerant

Distributed Systems

• Erlang
• Supervision Trees

Distributed Systems

• Erlang
• Hot Code Upgrades
• Distributed Upgrades are HARD

Distributed Systems

• Future Work

• Erlang Supervision Trees

• PRA / FTA / ETA

Apthorpe (http://www.usenix.org/events/lisa01/tech/apthorpe/apthorpe.ps)

Automation

Automation

• Optimal use of the cloud.

Automation

• Frequent deployment.

Automation

• Tools
• Chef
• Puppet
• Cfengine
• Bcfg2

Automation

• Erlang + Chef (as of v0.8)
• erl_call Provider

Data Driven Operations

Data Driven Operations

“What gets measured, gets managed.” - Peter Drucker

Data Driven Operations

• Instrumentation

Data Driven Operations

• Logging

Data Driven Operations

• Visualization

Data Driven Operations

• Demo!

Data Driven Operations

• Modeling

• Analysis

• Universal Law of Computational Scalability

• Amdahl’s Law
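
Both models named above fit in a few lines. Amdahl's Law gives the speedup on n processors when a fraction p of the work parallelizes; the Universal Scalability Law adds a coherency term beta to a contention term alpha, so throughput can peak and then decline. The sketch below uses made-up parameter values; in practice alpha and beta would be fitted to measured data.

```python
from math import sqrt


def amdahl_speedup(parallel_fraction: float, n: int) -> float:
    """Amdahl's Law: speedup on n processors."""
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / n)


def usl_capacity(n: int, alpha: float, beta: float) -> float:
    """Universal Scalability Law: relative capacity at n nodes.

    alpha models contention (serialization), beta models coherency (crosstalk).
    """
    return n / (1.0 + alpha * (n - 1) + beta * n * (n - 1))


def usl_peak_nodes(alpha: float, beta: float) -> float:
    """Node count at which USL capacity is maximal (requires beta > 0)."""
    return sqrt((1.0 - alpha) / beta)


# Illustrative parameters only.
speedup_8 = amdahl_speedup(parallel_fraction=0.95, n=8)
peak = usl_peak_nodes(alpha=0.05, beta=0.0001)
```

With beta set to 0 the USL reduces to Amdahl's Law with alpha = 1 - p, so the coherency term is what makes adding nodes eventually counterproductive.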

Data Driven Operations

• Modeling isn’t just for capacity planning.

Montagne (http://queue.acm.org/detail.cfm?id=1862187)

The End

Questions?

Joe Williams - @williamsjoe
