Upload
vuongkhanh
View
224
Download
2
Embed Size (px)
Citation preview
Cloud Computing
Ennan ZhaiComputer Science at Yale University
About Final Project
About Final Project
• Important dates before demo session: - Oct 31: Proposal v1.0 - Nov 7: Source code v1.0- Nov 14: Proposal v2.0- Nov 21: Source code v2.0- Dec 1: Bakeoff version
About Final Project• How to present?
- ~2 min video and ~2 min Q&A
• What to submit? - 1) Source code- 2) Proposal- 3) README
• Others- any programming language- running on zoo machines- individual or group (2-3 members)
About Final Project• Topics:
- Structured P2P system, e.g., Chord- Hybrid P2P system, e.g., BubbleStorm- Email client software- Decentralized read/write file system- Decentralized social network application- Sybil-resistant recommendation system- Fault tolerance (BFT) system- Accountability system, e.g., PeerReview - ... ...
About Final Project
• A few good final projects in the past years:- Workable email client- A picture-based encryption tool- A privacy-preserving social network- ... ...
Questions?
• Cloud Computing
• Challenges in the Clouds
• A Concrete Cloud Reliability Case
Lecture Outline
• Cloud Computing
• Challenges in the Clouds
• A Concrete Cloud Reliability Case
Lecture Outline
What’s the Cloud Computing?
Cloud computing is a business model for enabling convenient
network access to a shared pool of configurable resources
which can be rapidly provisioned and released with minimal
management effort or service provider interaction.
--- according to NIST(National Institute of Standards and Technology)
What’s the Cloud Computing?
Have You Used the Cloud?
Have You Used the Cloud?
Have You Used the Cloud?
Have You Used the Cloud?
Have You Used the Cloud?
Have You Used the Cloud?
Why We Like It?
• Why users like it? - Do not care where it is, it is “just there”- Access from “any” platform
• Why CS researchers like it? - High-performance computation with less money- Lots of hard and interesting challenges
Why We Like It?
• Why users like it? - Do not care where it is, it is “just there”- Access from “any” platform
• Why CS researchers like it? - High-performance computation with less money- Lots of hard and interesting challenges
Why We Like It?
• Why users like it? - Do not care where it is, it is “just there”- Access from “any” platform
• Why CS researchers like it? - High-performance computation with less money- Lots of hard and interesting challenges
Why We Like It?
Cloud V.S. Distributed Systems
What Kinds of Clouds Exist Now?
• Three types of services:- Software as a Service (SaaS)
- Analogy: Restaurant. Prepares&serves entire meal, does the dishes, etc
- Platform as a Service (PaaS)- Analogy: Take-out food. Prepares meal but does not serve it.
- Infrastructure as a Service (IaaS)- Analogy: Grocery store. Provides raw ingredients.
What Kinds of Clouds Exist Now?
• Three types of services:- Software as a Service (SaaS)
- Analogy: Restaurant. Prepares&serves entire meal, does the dishes, etc
- Platform as a Service (PaaS)- Analogy: Take-out food. Prepares meal but does not serve it.
- Infrastructure as a Service (IaaS)- Analogy: Grocery store. Provides raw ingredients.
What Kinds of Clouds Exist Now?
• Three types of services:- Software as a Service (SaaS)
- Analogy: Restaurant. Prepares&serves entire meal, does the dishes, etc
- Platform as a Service (PaaS)- Analogy: Take-out food. Prepares meal but does not serve it.
- Infrastructure as a Service (IaaS)- Analogy: Grocery store. Provides raw ingredients.
What Kinds of Clouds Exist Now?
• Three types of services:- Software as a Service (SaaS)
- Analogy: Restaurant. Prepares&serves entire meal, does the dishes, etc
- Platform as a Service (PaaS)- Analogy: Take-out food. Prepares meal but does not serve it.
- Infrastructure as a Service (IaaS)- Analogy: Grocery store. Provides raw ingredients.
What Kinds of Clouds Exist Now?
Software as a Service (SaaS)
Software as a Service (SaaS)
Hardware
Middleware
Application
Cloud Provider (i.e., SaaS Provider)
Software as a Service (SaaS)
Hardware
Middleware
Application
Cloud Provider (i.e., SaaS Provider)
• SaaS provider offers an entire application- Word processor, spreadsheet, CRM software, etc.- Customer pays cloud provider- Example: Google Apps, Salesforce.com, etc.
Software as a Service (SaaS)
Hardware
Middleware
Application
Cloud Provider (i.e., SaaS Provider)
• SaaS provider offers an entire application- Word processor, spreadsheet, CRM software, etc.- Customer pays cloud provider- Example: Google Apps, Salesforce.com, etc.
Software as a Service (SaaS)
Hardware
Middleware
Application
Customer
Cloud Provider (i.e., SaaS Provider)
• SaaS provider offers an entire application- Word processor, spreadsheet, CRM software, etc.- Customer pays cloud provider and uses the service- Example: Google Apps, Salesforce.com, etc.
Software as a Service (SaaS)
Hardware
Middleware
Application
Customer
Cloud Provider (i.e., SaaS Provider)
• SaaS provider offers an entire application- Word processor, spreadsheet, CRM software, etc.- Customer pays cloud provider and uses the service- Example: Google Apps, Salesforce.com, etc.
Software as a Service (SaaS)
Hardware
Middleware
Application
Customer
Cloud Provider (i.e., SaaS Provider)
• SaaS provider offers an entire application- Word processor, spreadsheet, CRM software, etc.- Customer pays cloud provider and uses the service- Example: Google Apps, Salesforce.com, etc.
A Typical SaaS: Gmail
A Typical SaaS: Gmail
Hardware
Middleware
Application
Gmail Provider
A Typical SaaS: Gmail
Hardware
Middleware
Application
Gmail Provider
• Outsourcing your e-mail software: - Distributed, replicated message store in BigTable- Weak consistency model for some operations (e.g., msg read)- Stronger consistency for others (e.g., send msg)
A Typical SaaS: Gmail
Hardware
Middleware
Application
Gmail Provider
BigTable
• Outsourcing your e-mail software: - Distributed, replicated message store in BigTable- Weak consistency model for some operations (e.g., msg read)- Stronger consistency for others (e.g., send msg)
A Typical SaaS: Gmail
Hardware
Middleware
Application
Gmail Provider
BigTable
BigTable APIs
• Outsourcing your e-mail software: - Distributed, replicated message store in BigTable- Weak consistency model for some operations (e.g., msg read)- Stronger consistency for others (e.g., send msg)
Hardware
Middleware
Application
Gmail Provider
Gmail
A Typical SaaS: Gmail
• Outsourcing your e-mail software: - Distributed, replicated message store in BigTable- Weak consistency model for some operations (e.g., msg read)- Stronger consistency for others (e.g., send msg)
BigTable
BigTable APIs
Hardware
Middleware
Application
Customer
Gmail Provider
Gmail
A Typical SaaS: Gmail
• Outsourcing your e-mail software: - Distributed, replicated message store in BigTable- Weak consistency model for some operations (e.g., msg read)- Stronger consistency for others (e.g., send msg)
BigTable
BigTable APIs
Hardware
Middleware
Application
Customer
Gmail Provider
Gmail
A Typical SaaS: Gmail
• Outsourcing your e-mail software: - Distributed, replicated message store in BigTable- Weak consistency model for some operations (e.g., msg read)- Stronger consistency for others (e.g., send msg)
BigTable
BigTable APIs
Hardware
Middleware
Application
Customer
Gmail Provider
Gmail
• Outsourcing your e-mail software: - Distributed, replicated message store in BigTable- Weak consistency model for some operations (e.g., msg read)- Stronger consistency for others (e.g., send msg)
A Typical SaaS: Gmail
BigTable
BigTable APIs
Hardware
Middleware
Application
Customer
Gmail Provider
Gmail
• Outsourcing your e-mail software: - Distributed, replicated message store in BigTable- Weak consistency model for some operations (e.g., msg read)- Stronger consistency for others (e.g., send msg)
A Typical SaaS: Gmail
BigTable
BigTable APIs
Platform as a Service (PaaS)
Hardware
Middleware
Application
Platform as a Service (PaaS)
• Cloud provides middleware/infrastructure- For example, Microsoft Common Language Runtime (CLR)- Customer pays SaaS provider for the service- SaaS provider pays the cloud for the platform- Example: Windows Azure, Google App Engine, etc.
Cloud Provider (i.e., PaaS Provider)
Application
Hardware
Middleware
Application
Platform as a Service (PaaS)
• Cloud provides middleware/infrastructure- For example, Microsoft Common Language Runtime (CLR)- Customer pays SaaS provider for the service- SaaS provider pays the cloud for the platform- Example: Windows Azure, Google App Engine, etc.
Cloud Provider (i.e., PaaS Provider)
Application
Hardware
Middleware
Application
Platform as a Service (PaaS)
App Provider
• Cloud provides middleware/infrastructure- For example, Microsoft Common Language Runtime (CLR)- App provider pays the cloud for the platform- Customer pays App provider for the service- Example: Windows Azure, Google App Engine, etc.
Cloud Provider (i.e., PaaS Provider)
Application
Hardware
Middleware
Application
Platform as a Service (PaaS)
App Provider
• Cloud provides middleware/infrastructure- For example, Microsoft Common Language Runtime (CLR)- App provider pays the cloud for the platform- Customer pays App provider for the service- Example: Windows Azure, Google App Engine, etc.
Cloud Provider (i.e., PaaS Provider)
Application
Hardware
Middleware
Application
Platform as a Service (PaaS)
CustomerApp Provider
• Cloud provides middleware/infrastructure- For example, Microsoft Common Language Runtime (CLR)- App provider pays the cloud for the platform- Customer pays app provider for the service- Example: Windows Azure, Google App Engine, etc.
Cloud Provider (i.e., PaaS Provider)
Application
Hardware
Middleware
Application
Platform as a Service (PaaS)
CustomerApp Provider
• Cloud provides middleware/infrastructure- For example, Microsoft Common Language Runtime (CLR)- App provider pays the cloud for the platform- Customer pays app provider for the service- Example: Windows Azure, Google App Engine, etc.
Cloud Provider (i.e., PaaS Provider)
Application
Hardware
Middleware
Application
Platform as a Service (PaaS)
CustomerApp Provider
• Cloud provides middleware/infrastructure- For example, Microsoft Common Language Runtime (CLR)- App provider pays the cloud for the platform- Customer pays app provider for the service- Example: Windows Azure, Google App Engine, etc.
Cloud Provider (i.e., PaaS Provider)
Application
A Typical PaaS: Facebook
Hardware
Middleware
Application
A Typical PaaS: Facebook
Facebook Provider
Hardware
Middleware
Application
A Typical PaaS: Facebook
• Facebook offers PaaS capabilities to App provider- Facebook APIs allow access to social network properties- Third-party game applications- Facebook itself also uses PaaS provided by its company, e.g., log
analysis for recommendations
Facebook Provider
Hardware
Middleware
Application
A Typical PaaS: Facebook
• Facebook offers PaaS capabilities to App provider- Facebook APIs allow access to social network properties- Third-party game applications- Facebook itself also uses PaaS provided by its company, e.g., log
analysis for recommendations
Facebook APIs
Facebook Clusters
Facebook Provider
Hardware
Middleware
Application
A Typical PaaS: Facebook
• Facebook offers PaaS capabilities to App provider- Facebook APIs allow access to social network properties- Third-party game applications- Facebook itself also uses PaaS provided by its company, e.g., log
analysis for recommendations
Facebook APIs
Facebook Clusters
Facebook Provider
Hardware
Middleware
Application
A Typical PaaS: Facebook
App Provider
• Facebook offers PaaS capabilities to App provider- Facebook APIs allow access to social network properties- App providers adopt their services (e.g., game) onto Facebook- Facebook itself also uses PaaS provided by its company, e.g., log
analysis for recommendations
Facebook Game
Facebook APIs
Facebook Clusters
Facebook Provider
Hardware
Middleware
Application
A Typical PaaS: Facebook
App Provider
• Facebook offers PaaS capabilities to App provider- Facebook APIs allow access to social network properties- App providers adopt their services (e.g., game) onto Facebook- Facebook itself also uses PaaS provided by its company, e.g., log
analysis for recommendations
Facebook APIs
Facebook Clusters
Facebook Provider
Facebook Game
Hardware
Middleware
Application
A Typical PaaS: Facebook
CustomerApp Provider
• Facebook offers PaaS capabilities to App provider- Facebook APIs allow access to social network properties- App providers adopt their services (e.g., game) onto Facebook- Facebook itself also uses PaaS provided by its company, e.g., log
analysis for recommendations
Facebook Game
Facebook APIs
Facebook Clusters
Facebook Provider
Facebook Game
Hardware
Middleware
Application
A Typical PaaS: Facebook
CustomerApp Provider
• Facebook offers PaaS capabilities to App provider- Facebook APIs allow access to social network properties- App providers adopt their services (e.g., game) onto Facebook- Facebook itself also uses PaaS provided by its company, e.g., log
analysis for recommendations
Facebook Game
Facebook APIs
Facebook Clusters
Facebook Provider
Facebook Game
Hardware
Middleware
Application
A Typical PaaS: Facebook
CustomerApp Provider
• Facebook offers PaaS capabilities to App provider- Facebook APIs allow access to social network properties- App providers adopt their services (e.g., game) onto Facebook- Facebook itself also uses PaaS provided by its company, e.g., log
analysis for recommendations
Facebook Game
Facebook APIs
Facebook Clusters
Facebook Provider
Facebook Game
Infrastructure as a Service (IaaS)
Hardware
Middleware
Application Application
Middleware
• Cloud provides raw computing resources- Virtual machines, blade servers, hard disk, etc.- Customer pays SaaS provider for the service- SaaS provider pays the cloud for the resources- Example: Amazon Web Services, Rackspace Cloud, etc.
Infrastructure as a Service (IaaS)
Cloud Provider (i.e., IaaS Provider)
Hardware
Middleware
Application Application
Middleware
• Cloud provides raw computing resources- Virtual machines, blade servers, hard disk, etc.- Customer pays SaaS provider for the service- SaaS provider pays the cloud for the resources- Example: Amazon Web Services, Rackspace Cloud, etc.
Infrastructure as a Service (IaaS)
Cloud Provider (i.e., IaaS Provider)
Hardware
Middleware
Application Application
Middleware
• Cloud provides raw computing resources- Virtual machines, blade servers, hard disk, etc.- App provider pays the cloud for the resources- Customer pays App provider for the service- Example: Amazon Web Services, Rackspace Cloud, etc.
Infrastructure as a Service (IaaS)
App Provider
Cloud Provider (i.e., IaaS Provider)
Hardware
Middleware
Application Application
Middleware
• Cloud provides raw computing resources- Virtual machines, blade servers, hard disk, etc.- App provider pays the cloud for the resources- Customer pays App provider for the service- Example: Amazon Web Services, Rackspace Cloud, etc.
Infrastructure as a Service (IaaS)
App Provider
Cloud Provider (i.e., IaaS Provider)
Middleware
Application
Hardware
Middleware
Application Application
Customer
Middleware
Infrastructure as a Service (IaaS)
App Provider
Cloud Provider (i.e., IaaS Provider)
• Cloud provides raw computing resources- Virtual machines, blade servers, hard disk, etc.- App provider pays the cloud for the resources- Customer pays App provider for the service- Example: Amazon Web Services, Rackspace Cloud, etc.
Middleware
Application
Hardware
Middleware
Application Application
Customer
Middleware
Infrastructure as a Service (IaaS)
App Provider
Cloud Provider (i.e., IaaS Provider)
• Cloud provides raw computing resources- Virtual machines, blade servers, hard disk, etc.- App provider pays the cloud for the resources- Customer pays App provider for the service- Example: Amazon Web Services, Rackspace Cloud, etc.
Middleware
Application
Hardware
Middleware
Application Application
Customer
Middleware
Infrastructure as a Service (IaaS)
App Provider
Cloud Provider (i.e., IaaS Provider)
• Cloud provides raw computing resources- Virtual machines, blade servers, hard disk, etc.- App provider pays the cloud for the resources- Customer pays App provider for the service- Example: Amazon Web Services, Rackspace Cloud, etc.
Middleware
Application
Hardware
Middleware
Application Application
Middleware
Typical IaaS: EC2 and S3
Amazon
Hardware
Middleware
Application Application
Middleware
Typical IaaS: EC2 and S3
Amazon
EC2 S3
Hardware
Middleware
Application Application
Middleware
Amazon
EC2 S3
Netflix Provider
• Netflix (app) heavily depends on Amazon AWS: - Media files are stored in S3- Transcoding to target devices (e.g., iPad) using EC2- Analysis of streaming sessions based on Elastic MapReduce
Typical IaaS: EC2 and S3
Hardware
Middleware
Application
Middleware
Amazon
EC2 S3
Netflix Provider
• Netflix (app) heavily depends on Amazon AWS: - Media files are stored in S3- Transcoding to target devices (e.g., iPad) using EC2- Analysis of streaming sessions based on Elastic MapReduce
Typical IaaS: EC2 and S3
Hardware
Middleware
Application
Middleware
Amazon
EC2 S3
Netflix Provider
• Netflix (app) heavily depends on Amazon AWS: - Media files are stored in S3- Transcoding to target devices (e.g., iPad) using EC2- Analysis of streaming sessions based on Elastic MapReduce
Typical IaaS: EC2 and S3
Netflix
Hardware
Middleware
Application
Middleware
Amazon
EC2 S3
Netflix Provider
• Netflix (app) heavily depends on Amazon AWS: - Media files are stored in S3- Transcoding to target devices (e.g., iPad) using EC2- Analysis of streaming sessions based on Elastic MapReduce
Typical IaaS: EC2 and S3
Netflix
Hardware
Middleware
Application Application
Middleware
Amazon
EC2 S3
CustomerNetflix Provider
• Netflix (app) heavily depends on Amazon AWS: - Media files are stored in S3- Transcoding to target devices (e.g., iPad) using EC2- Analysis of streaming sessions based on Elastic MapReduce
Typical IaaS: EC2 and S3
Netflix
Hardware
Middleware
Application Application
Middleware
Amazon
EC2 S3
CustomerNetflix Provider
• Netflix (app) heavily depends on Amazon AWS: - Media files are stored in S3- Transcoding to target devices (e.g., iPad) using EC2- Analysis of streaming sessions based on Elastic MapReduce
Typical IaaS: EC2 and S3
Netflix
• Three types of services:- Software as a Service (SaaS)
- Analogy: Restaurant. Prepares&serves entire meal, does the dishes, etc
- Platform as a Service (PaaS)- Analogy: Take-out food. Prepares meal but does not serve it.
- Infrastructure as a Service (IaaS)- Analogy: Grocery store. Provides raw ingredients.
Recall
• Three types of services:- Software as a Service (SaaS)
- Analogy: Restaurant. Prepares&serves entire meal, does the dishes, etc
- Platform as a Service (PaaS)- Analogy: Take-out food. Prepares meal but does not serve it.
- Infrastructure as a Service (IaaS)- Analogy: Grocery store. Provides raw ingredients.
Recall
Zoo?
The Major Cloud Providers
The Major Cloud Providers
• Amazon is the big player: - Infrastructure as a service (e.g., EC2)- Storage as a service (e.g., S3)
• But there are many others:- Microsoft Azure: It has similar services to Amazon, with an
emphasis on .Net programming model- Google App Engine: It offers programming interface, Hadoop, also
software as a service, e.g., Gmail and Google Docs- IBM, HP, Yahoo!: They seem to focus on enterprise scale cloud apps
The Major Cloud Providers
• Amazon is the big player: - Infrastructure as a service (e.g., EC2)- Storage as a service (e.g., S3)
• But there are many others:- Microsoft Azure: It has similar services to Amazon, with an
emphasis on .Net programming model- Google App Engine: It offers programming interface, Hadoop, also
software as a service, e.g., Gmail and Google Docs- IBM, HP, Yahoo!: They seem to focus on enterprise scale cloud apps
• Cloud Computing
• Challenges in the Clouds
• A Concrete Cloud Reliability Case
Lecture Outline
• Cloud Computing
• Challenges in the Clouds
• A Concrete Cloud Reliability Case
Lecture Outline
What Kinds of Challenges?
• Scalability
• Availability and reliability
• Security and privacy
What Kinds of Challenges?
• Scalability
• Availability and reliability
• Security and privacy
What Kinds of Challenges?
Scalability
PC
Scalability
• What if one computer is not enough? - Buy a bigger (server-class) computer
PC
Scalability
• What if one computer is not enough? - Buy a bigger (server-class) computer
PC Server
Scalability
• What if one computer is not enough? - Buy a bigger (server-class) computer
PC Server
• What if the biggest computer is not enough? - Buy many computers
Scalability
• What if one computer is not enough? - Buy a bigger (server-class) computer
PC Server Cluster
• What if the biggest computer is not enough? - Buy many computers
Scalability
ScalabilityRack
ScalabilityNetwork switches
(connects nodes with each other and with other racks)
Rack
ScalabilityNetwork switches
(connects nodes with each other and with other racks)
Many nodes/blades (often identical)
Rack
ScalabilityNetwork switches
(connects nodes with each other and with other racks)
Many nodes/blades (often identical)
Storage device(s)
Rack
Scalability
• What if cluster is too big to fit into machine room? - Build a separate building for the cluster- Building can have lots of cooling and power- Result: Data center
PC Server Cluster
Scalability
• What if cluster is too big to fit into machine room? - Build a separate building for the cluster- Building can have lots of cooling and power- Result: Data center
PC Server Cluster
Scalability
• What if cluster is too big to fit into machine room? - Build a separate building for the cluster- Building can have lots of cooling and power- Result: Data center
PC Server Cluster Data center
Google Data Center in Oregon
Data centers (size of a football field)
Google Data Center in Oregon
• A warehouse-sized computer - A single data center can easily contain 10,000 racks with
100 cores in each rack (1,000,000 cores total)
Google Data Center in OregonData centers (size of a
football field)
Google Data Center Locations
Google Data Centers in the USA
Google Data Centers in Europe
Google Data Centers World Wide
Open Challenges
Open Challenges
• Can you manage thousands of racks effectively?- Cloud monitor systems (e.g., PlanetSeer [OSDI’04])- Can you design more scalable data center network?
Open Challenges
• Can you manage thousands of racks effectively?- Cloud monitor systems (e.g., PlanetSeer [OSDI’04])- Can you design more scalable data center network?
Open Challenges
• Can you make data center more scalable?- Scalable data center architecture (e.g., VL2 [SIGCOMM’09])- Can you design more scalable data center network?
• Can you manage thousands of racks effectively?- Cloud monitor systems (e.g., PlanetSeer [OSDI’04])
Open Challenges
• Can you make data center more scalable?- Scalable data center architecture (e.g., VL2 [SIGCOMM’09])- Can you design more scalable data center network?
• Can you manage thousands of racks effectively?- Cloud monitor systems (e.g., PlanetSeer [OSDI’04])
• Scalability
• Availability and reliability
• Security and privacy
What Kinds of Challenges?
• Scalability
• Availability and reliability
• Security and privacy
What Kinds of Challenges?
Availability & Reliability
• Is the cloud always there when you need it? - Service outages- Connectivity outages
Recent Cloud Disasters
Recent Cloud Disasters
Recent Cloud Disasters
Top10 Cloud Service Outages
Open Challenges
• Can you build a system to find out the root-cause when a service becomes unavailable?- Diagnosis systems (e.g., Sherlock [SIGCOMM’07])
Open Challenges
• Can you build a system to find out the root-cause when a service becomes unavailable?- Diagnosis systems (e.g., Sherlock [SIGCOMM’07])- Accountable cloud (e.g., AVM [OSDI’10])
Open Challenges
• Can you build a system to find out the root-cause when a service becomes unavailable?- Diagnosis systems (e.g., Sherlock [SIGCOMM’07])- Accountable cloud (e.g., AVM [OSDI’10])
Open Challenges
• Can you propose an approach to make the clouds more robust?- Fault tolerate systems (e.g., F10 [NSDI’13])
• Can you build a system to find out the root-cause when a service becomes unavailable?- Diagnosis systems (e.g., Sherlock [SIGCOMM’07])- Accountable cloud (e.g., AVM [OSDI’10])
Open Challenges
• Can you propose an approach to make the clouds more robust?- Fault tolerate systems (e.g., F10 [NSDI’13])
• Can you build a system to find out the root-cause when a service becomes unavailable?- Diagnosis systems (e.g., Sherlock [SIGCOMM’07])- Accountable cloud (e.g., AVM [OSDI’10])
• Scalability
• Availability and reliability
• Security and privacy
What Kinds of Challenges?
• Scalability
• Availability and reliability
• Security and privacy
What Kinds of Challenges?
Security & Privacy
• Compromised your cloud accounts - Hacker does not need to break into your home to steal all your
private data, if he can break or guess your cloud password- Even worse, if hack who cracks your Facebook account get into
your accounts everywhere online
Security & Privacy
• Compromised your cloud accounts - Hacker does not need to break into your home to steal all your
private data, if he can break or guess your cloud password- Even worse, if hack who cracks your Facebook account get into
your accounts everywhere online
• You do not know if the cloud providers read your private data
Open Challenges
• Can you build a system to preserve the privacy of your data on the clouds?- MAC for MapReduce (e.g., Airavat [NSDI’10])- Trusted storage (e.g., Depot [OSDI’10])
Open Challenges
• Can you build a system to preserve the privacy of your data on the clouds?- MAC for MapReduce (e.g., Airavat [NSDI’10])- Trusted storage (e.g., Depot [OSDI’10])
Open Challenges
• Can you build a system to preserve the privacy of your data on the clouds?- MAC for MapReduce (e.g., Airavat [NSDI’10])- Trusted storage (e.g., Depot [OSDI’10])
Open Challenges
• Can you propose an approach to verify if the cloud provider modifies your data?- Trusted cloud computing (e.g., Excalibur [USENIX Sec’12])
• Can you build a system to preserve the privacy of your data on the clouds?- MAC for MapReduce (e.g., Airavat [NSDI’10])- Trusted storage (e.g., Depot [OSDI’10])
Open Challenges
• Can you propose an approach to verify if the cloud provider modifies your data?- Trusted cloud computing (e.g., Excalibur [USENIX Sec’12])
• Can you build a system to preserve the privacy of your data on the clouds?- MAC for MapReduce (e.g., Airavat [NSDI’10])- Trusted storage (e.g., Depot [OSDI’10])
More Risks?
• EverClouds is a project of DeDiS group (Bryan is PI): - aims to solve tricky cloud security problems (e.g., timing channels)- tries to make the clouds more reliable (e.g., failure detection)
• We already have some efforts:- SRA: A cloud structural reliability auditing system (submitted)- iRec: A cloud independence recommender system (HotDep’13)- P-SRA: A privacy-preserving structural-reliability auditor (CCSW’13)- Timing channel control with provider-enforced deterministic
execution (CCSW’10)
More Risks?
• Cloud Computing
• Challenges in the Clouds
• A Concrete Cloud Reliability Case
Lecture Outline
• Cloud Computing
• Challenges in the Clouds
• A Concrete Cloud Reliability Case
Lecture Outline
Realistic Problem
Summary of the October 22, 2012 AWS Service Event in the US-East Region
We’d like to share more about the service event that occurred on Monday, October 22nd in the US-East Region. We have now completed the analysis of the events that affected AWS customers, and we want to describe what happened, our understanding of how customers were affected, and what we are doing to prevent a similar issue from occurring in the future.
The Primary Event and the Impact to Amazon Elastic Block Store (EBS) and Amazon Elastic Compute Cloud (EC2)
Correlated failures resulting from EBSdue to bugs in one EBS server
Realistic Problem
Elastic Compute Cloud (EC2)
Elastic Block Store (EBS)
Realistic Problem
... ...
Elastic Block Store (EBS)
Realistic Problem
... ...... ...
VM1 VM2 VM1 VM3 VM1 VM2 VM3 VM4
Elastic Block Store (EBS)
Realistic Problem
... ...... ...
VM1 VM2 VM1 VM3 VM1 VM2 VM3 VM4
EBS Server2EBS Server1
Realistic Problem
... ...... ...
VM1 VM2 VM1 VM3 VM1 VM2 VM3 VM4
EBS Server2EBS Server1
Realistic Problem
... ...... ...
VM1 VM2 VM1 VM3 VM1 VM2 VM3 VM4
EBS Server2EBS Server1
Realistic Problem
... ...... ...
VM1 VM2 VM1 VM3 VM1 VM2 VM3 VM4
EBS Server2EBS Server1
Realistic Problem
System We Need
• #1: Dependency collections• #2: Dependency representation• #3: Efficient auditing
System We Need
• #1: Dependency collections• #2: Dependency representation• #3: Efficient auditing
Dependency Data Collections
Type Dependency Expression
Network <src=”S” dst=”D” route=”x,y,z”/>
Hardware <hw=”H” type=”T” dep=”x”/>
Software <pgm=”S” hw=”H” dep=”x,y,z”/>
Our defined format
• Reuse existing data collection tools: - Convert the outputs to uniform format. - Three types of format: NET, HW and SW.
Dependency Data Collections
DepDB
NSDMiner
Dependency Data Collections
DepDB
NSDMiner
Dependency Data Collections
DepDB
NSDMiner
Dependency Data Collections
DepDB
NSDMiner
<src=”S1” dst=”Internet” route=”ToR1,Core1”/><src=”S1” dst=”Internet” route=”ToR1,Core2”/><src=”S2” dst=”Internet” route=”ToR1,Core1”/><src=”S2” dst=”Internet” route=”ToR1,Core2”/>
DepDB
NSDMiner
<src=”S1” dst=”Internet” route=”ToR1,Core1”/><src=”S1” dst=”Internet” route=”ToR1,Core2”/><src=”S2” dst=”Internet” route=”ToR1,Core1”/><src=”S2” dst=”Internet” route=”ToR1,Core2”/>
System We Need
• #1: Dependency collections• #2: Dependency representation• #3: Efficient auditing
Example Redundancy
Example Redundancy
SW
HW
NET
Example Redundancy
Building Fault Graph Top-to-Bottom
SW
HW
NET
Redundancy configuration fails
Step1: Root Node
Server 2 failsServer 1 fails
Redundancy configuration fails
Step2: Server Nodes
AND gate: all the sublayer nodes fail, the upper layer node fails
Server 2 failsServer 1 fails
Step2: Server NodesRedundancy configuration fails
Net fails
+" +"
HW fails SW fails SW fails
Server 2 failsServer 1 fails
Net fails HW fails
Redundancy configuration fails
Step3: Dependency Nodes
OR gate: one of the sublayer nodes fails, the upper layer node fails
Net fails
+" +"
HW fails SW fails SW fails
Server 2 failsServer 1 fails
Net fails HW fails
Step3: Dependency NodesRedundancy configuration fails
Net fails
+" +"
HW fails
Disk2CPU2
+"
SW fails SW fails
Server 2 failsServer 1 fails
Disk1CPU1
+"
Net fails HW fails
Step4: Hardware DependencyRedundancy configuration fails
Net fails
+" +"
ToR1 Core2Core1
HW fails
+" +"
Disk2CPU2
+"
Path1 Path2
SW fails SW fails
Server 2 failsServer 1 fails
Disk1CPU1
+"
Net fails HW fails
Step5: Network DependencyRedundancy configuration fails
Net fails
+" +"
ToR1 Core2Core1
HW fails
libc6 libccllibsvnl
+" +"
Disk2CPU2
+"
Path1 Path2
+"
Riak
+"
Query
+"
SW fails SW fails
Server 2 failsServer 1 fails
Disk1CPU1
+"
Net fails HW fails
Step6: Software DependencyRedundancy configuration fails
System We Need
• #1: Dependency collections• #2: Dependency representation• #3: Efficient auditing
• Two algorithms balancing cost and accuracy: - Minimal fault set algorithm- Failure sampling algorithm
Efficient Auditing
• Two algorithms balancing cost and accuracy: - Minimal fault set algorithm- Failure sampling algorithm
Efficient Auditing
Minimal Fault Set Algorithm
• Traditional algorithm in safety engineering- Exponential complexity (NP-hard)
• We are the first to apply it in Cloud area:- Analyzing a fat tree with 30,528 with ~40 hours
• We propose efficient failure sampling algorithm.
Minimal Fault Set Algorithm
• Traditional algorithm in safety engineering- Exponential complexity (NP-hard)
• We are the first to apply it in Cloud area:- Analyzing a fat tree with 30,528 with ~40 hours
Redundancy configuration fails
+" +"
Switch1 fails
Server 2 failsServer 1 fails
Server2’s HW failsServer1’s HW fails
Failure Sampling Algorithm
1 or 0 1 or 0 1 or 0
Failure Sampling Algorithm
Redundancy configuration fails
+" +"
Switch1 fails
Server 2 failsServer 1 fails
Server2’s HW failsServer1’s HW fails
1 or 0 ?
1 or 0 1 or 0 1 or 0
Failure Sampling Algorithm
Redundancy configuration fails
+" +"
Switch1 fails
Server 2 failsServer 1 fails
Server2’s HW failsServer1’s HW fails
Fault Sets
∅
1 or 0 1 or 0 1 or 0
Failure Sampling Algorithm
1 or 0 ?Redundancy configuration fails
+" +"
Switch1 fails
Server 2 failsServer 1 fails
Server2’s HW failsServer1’s HW fails
The 1st Sampling Round
Fault Sets
∅
Redundancy configuration fails
+" +"
Switch1 fails
Server 2 failsServer 1 fails
Server2’s HW failsServer1’s HW fails
1 or 0 1 or 0 1 or 0
The 1st Sampling Round
Fault Sets
∅
Redundancy configuration fails
+" +"
Switch1 fails
Server 2 failsServer 1 fails
Server2’s HW failsServer1’s HW fails
✔✘ ✘
The 1st Sampling Round
Fault Sets
∅
Redundancy configuration fails
+" +"
Switch1 fails
Server 2 failsServer 1 fails
Server2’s HW failsServer1’s HW fails
Redundancy configuration fails
+" +"
Switch1 fails
Server 2 failsServer 1 fails
Server2’s HW failsServer1’s HW fails
✔✘ ✘
✘
The 1st Sampling Round
Fault Sets
∅
✘ ✘
Redundancy configuration fails
+" +"
Switch1 fails
Server 2 failsServer 1 fails
Server2’s HW failsServer1’s HW fails
✔✘ ✘Fault Sets
{Server1’s HW, Server2’s HW}
The 1st Sampling Round
✘
✘ ✘
Fault Sets
{Server1’s HW, Server2’s HW}
The 2nd Sampling Round
Redundancy configuration fails
+" +"
Switch1 fails
Server 2 failsServer 1 fails
Server2’s HW failsServer1’s HW fails
1 or 0 1 or 0 1 or 0
Fault Sets
{Server1’s HW, Server2’s HW}
The 2nd Sampling Round
Redundancy configuration fails
+" +"
Switch1 fails
Server 2 failsServer 1 fails
Server2’s HW failsServer1’s HW fails
✔ ✘✔Fault Sets
{Server1’s HW, Server2’s HW}
The 2nd Sampling Round
Redundancy configuration fails
+" +"
Switch1 fails
Server 2 failsServer 1 fails
Server2’s HW failsServer1’s HW fails
✔ ✘✔
✔
Fault Sets
{Server1’s HW, Server2’s HW}
The 2nd Sampling Round
Redundancy configuration fails
+" +"
Switch1 fails
Server 2 failsServer 1 fails
Server2’s HW failsServer1’s HW fails
✔ ✘
Fault Sets
{Server1’s HW, Server2’s HW}
The 3rd Sampling Round
Redundancy configuration fails
+" +"
Switch1 fails
Server 2 failsServer 1 fails
Server2’s HW failsServer1’s HW fails
✔✘✔Fault Sets
{Server1’s HW, Server2’s HW}
The 3rd Sampling Round
Redundancy configuration fails
+" +"
Switch1 fails
Server 2 failsServer 1 fails
Server2’s HW failsServer1’s HW fails
✔✘✔Fault Sets
{Server1’s HW, Server2’s HW}
The 3rd Sampling Round
Redundancy configuration fails
+" +"
Switch1 fails
Server 2 failsServer 1 fails
Server2’s HW failsServer1’s HW fails
✘
✘ ✘
✔✘✔Fault Sets
{Server1’s HW, Server2’s HW}
{Switch1}
The 3rd Sampling Round
Redundancy configuration fails
+" +"
Switch1 fails
Server 2 failsServer 1 fails
Server2’s HW failsServer1’s HW fails
✘
✘ ✘
Fault Sets{Server1’s HW, Server2’s HW}
{Switch1}
{Switch1, Server2’s HW}
{Switch1}
{Switch1, Server2’s HW}
... ...
After Many (e.g., 107) Rounds
Fault Sets{Server1’s HW, Server2’s HW}
{Switch1}
{Switch1, Server2’s HW}
{Switch1}
{Switch1, Server2’s HW}
... ...
Size-Based Ranking
Fault Sets{Switch1}
{Switch1}
{Switch1, Server2’s HW}
{Switch1, Server2’s HW}
{Server1’s HW, Server2’s HW}
... ...
Size-Based Ranking
Thanks!
Questions?