35
8/14/2019 Chandra Krintz http://slidepdf.com/reader/full/chandra-krintz 1/35 AppScale: Open-source Platform-as-a-Service (PaaS) System for Energy-Aware Cloud Computing Research Chandra Krintz Associate Professor Computer Science Dept and Institute for Energy Efficiency UCSB Nov. 18, 2009

Chandra Krintz

Embed Size (px)

Citation preview

Page 1: Chandra Krintz

8/14/2019 Chandra Krintz

http://slidepdf.com/reader/full/chandra-krintz 1/35

AppScale:

Open-sourcePlatform-as-a-Service (PaaS)

System for Energy-AwareCloud Computing Research

Chandra KrintzAssociate Professor

Computer Science Dept andInstitute for Energy Efficiency

UCSB

Nov. 18, 2009

Page 2: Chandra Krintz

8/14/2019 Chandra Krintz

http://slidepdf.com/reader/full/chandra-krintz 2/35

Cloud Computing

• Software systems for accessing easily and transparentlyscalable CPU/storage/network resources via a networkconnection or web interface – “as-a-service”  

• On a rental basis 

SLAs

Web Services

Virtualization

-• Pay-per-use, flat rate – E-comerced based• Resources are opaque

Page 3: Chandra Krintz

8/14/2019 Chandra Krintz

http://slidepdf.com/reader/full/chandra-krintz 3/35

Cloud Computing

• Software systems for accessing easily and transparentlyscalable CPU/storage/network resources via a networkconnection or web interface – “as-a-service”   –  Culmination of grid/cluster/utility/elastic computing

 –  Advances in processor, virtualization, systems technology

• On a rental basis – via service-level agreements (SLAS) –  Pay-per-use, flat rate – E-comerced based

 –  Resources are opaque

• Have experienced a rapid uptake in the commercial sector –  Public clouds – you run your systems/apps on others’ systems

• Access small fraction of vast resource pools

 –  Offer availability guarantees and extreme scale

Page 4: Chandra Krintz

8/14/2019 Chandra Krintz

http://slidepdf.com/reader/full/chandra-krintz 4/35

Layers

• Software systems for accessing easily and transparentlyscalable CPU/storage/network resources via a networkconnection or web interface – “as-a-service”  

 –  Infrastructure, e.g. Amazon Web Services (AWS)• Provision isolated resources under contract

• ull-s ste i a es de lo ed o er irtual achine onitor

IaaS

 

 –  Platform, e.g. Google AppEngine (GAE), Microsoft Azure• Enable construction of network-accessible applications

• Complete software stack; Process-level runtime isolation

• Specialized/scalable runtime and library support

 –  Software, e.g. Salesforce• Remotely accessible and customizable applications

PaaS

SaaS

Page 5: Chandra Krintz

8/14/2019 Chandra Krintz

http://slidepdf.com/reader/full/chandra-krintz 5/35

Cloudy Issues

• Many institutions and companies own IT infrastructure

• Public cloud features also useful for “on-premise” clouds

 –  Privacy of code and data –  Avoids vendor “lock-in”, pay-per-use, and availability reliance

 –  Potential for hybrid and customized approaches

  – 

o en a or eas ng resource cons ra n s• Storage/data management, cpu/memory, communication

 –  Potential for investigating new energy-aware dist. systems

Public clouds are opaque – open APIs, closed implementation –  How can we research what’s next if we can’t see what’s now?

• Can cloud fabrics support other application domains,

services, performance/availability requirements?

Page 6: Chandra Krintz

8/14/2019 Chandra Krintz

http://slidepdf.com/reader/full/chandra-krintz 6/35

An Opening in the Clouds

• Open-source cloud computing systems from theUCSB Computer Science Department –  Goal: Bring popular cloud fabrics to “on-premise” clusters that

are easy to use and are transparent

 –  To facilitate investigation of• Ener -e icient cloud com utin 

 –  Services, underlying device technology, support technologies –  Customization (availability, performance, application behavior)

• Hybrid cloud solutions (public and on-premise)

Page 7: Chandra Krintz

8/14/2019 Chandra Krintz

http://slidepdf.com/reader/full/chandra-krintz 7/35

An Opening in the Clouds

• Open-source cloud computing systems from theUCSB Computer Science Department –  Goal: Bring popular cloud fabrics to “on-premise” clusters that

are easy to use and are transparent –  To facilitate investigation of

• Energy-efficient cloud computing  –  erv ce , un er y ng ev ce ec no ogy, u or ec no og e

 –  Customization (availability, performance, application behavior)• Hybrid cloud solutions (public and on-premise)

 –  By emulating key cloud layers from the commercial sector• Engender user community, access to real applications/users• Leverage extant software technologies

 –  Not replacement technology for any Public Cloud service

Page 8: Chandra Krintz

8/14/2019 Chandra Krintz

http://slidepdf.com/reader/full/chandra-krintz 8/35

UCSB Cloud Computing

• Software systems for accessing easily and transparentlyscalable CPU/storage/network resources via a networkconnection or web interface – “as-a-service”  

 –  Infrastructure, e.g. Amazon Web Services (AWS)• Eucalyptus: www.eucalyptus.com Prof. Rich Wolski

 

IaaS

 – 

a orm, e.g. oog e pp ng ne , croso zure• Enable construction of network-accessible applications• Complete software stack; Process-level runtime isolation• Specialized/scalable runtime and library support

 –  AppScale: appscale.cs.ucsb.edu• Web services based implementation of Google App Engine

 –  Complete application stack for MVC-based web applications –  Written in Python or Java

PaaS

Page 9: Chandra Krintz

8/14/2019 Chandra Krintz

http://slidepdf.com/reader/full/chandra-krintz 9/35

From Google App Engine

to AppScale• GAE is a full application stack that facilitates construction

of interactive webpages with a database backing store –  Users develop Python and Java apps using well defined APIs

• Execute, test, debug the apps locally using the Google open-source software development kit (SDK)

  – 

SDK implements simple, slow, non-scalable versions of APIs –  Generates indices that the application requires

Page 10: Chandra Krintz

8/14/2019 Chandra Krintz

http://slidepdf.com/reader/full/chandra-krintz 10/35

From Google App Engine

to AppScale• GAE is a full application stack that facilitates construction

of interactive webpages with a database backing store –  Users develop Python and Java apps using well defined APIs

• Execute, test, debug the apps locally using the Google open-source software development kit (SDK)

  – 

SDK implements simple, slow, non-scalable versions of APIs –  Generates indices that the application requires

• Upload application to Google: YOUR_APPNAME.appspot.com

 –  Sandboxed execution, language/behavior restrictions

 –  Resource quotas via free or for-fee –  Highly scalable API implementations (proprietary)

Page 11: Chandra Krintz

8/14/2019 Chandra Krintz

http://slidepdf.com/reader/full/chandra-krintz 11/35

From Google App Engine

to AppScale• GAE is a full application stack that facilitates construction

of interactive webpages with a database backing store –  Users develop Python and Java apps using well defined APIs

• Highly scalable proprietary implementation on Google resources

Datastore -> Bi table/Ma reduce 

MemCache -> in-memory datastoreAuthentication -> Google Accounts

Mail -> GMail

URL-Fetch (for HTTP/S communication)

Images

Task Queues for short background jobs

Page 12: Chandra Krintz

8/14/2019 Chandra Krintz

http://slidepdf.com/reader/full/chandra-krintz 12/35

AppScale

• An Open-source platform-as-a-service (PaaS) system thatemulates Google App Engine (GAE) –  Extension of GAE SDK with API implementations replacedAppLoadBalancer (ALB) Database Master/Peer (DB M/P)

AppServer (AS) Database Slave/Peer (DB S/P)

GAE App

Developer

(AppScale Admin)

GAE App

Users

AppScale

tools

HTTPS

App

Controller

ALB

DB M/P

DB S/P

ASGAE App

UsersGAE App

Users

AppScale Cloud

Page 13: Chandra Krintz

8/14/2019 Chandra Krintz

http://slidepdf.com/reader/full/chandra-krintz 13/35

An AppScale Image

• Implements every AppScale component –  Can instantiate as a particular role (ALB, AS, DB)

 –  Can change functionality and instantiate itself as another• AppLoadBalancer (ALB)

 –  Controller of the system (Ruby-on-Rails application)

  – 

Starts other instances and instantiates each with its role –  Monitors other components once instantiated

 –  Restarts lost components

 –  Current work: ALB availability (shadow ALB(s))

 –  GAE developers contact ALB to manage cloud• Via the AppScale Tools

• Stop/start cloud, upload/start/terminate GAE apps

• Command line and web page administrator status w/ load stats

Page 14: Chandra Krintz

8/14/2019 Chandra Krintz

http://slidepdf.com/reader/full/chandra-krintz 14/35

An AppScale Image

• Implements every AppScale component –  Can instantiate as a particular role (ALB, AS, DB)

 –  Can change functionality and instantiate itself as another• AppLoadBalancer (ALB)

 –  Controller of the system (Ruby-on-Rails application)

  – 

Starts other instances and instantiates each with its role –  Monitors other components once instantiated

 –  Restarts lost components

• Monitors system statistics –  CPU, Memory, page requests, network, application behavior

 –  Grow and shrink AppScale cloud on-demand

Page 15: Chandra Krintz

8/14/2019 Chandra Krintz

http://slidepdf.com/reader/full/chandra-krintz 15/35

An AppScale Image

• Implements every AppScale component –  Can instantiate as a particular role (ALB, AS, DB)

 –  Can change functionality and instantiate itself as another• AppServer (AS)

 –  Web front-end for the GAE applications (GAE API compliant)

  – 

Users of GAE apps interact with ASs directly• After being routed to an AS instance by the ALB upon first

access to a GAE application (for load balancing purposes)

 –  Python GAE interface

 –  Java GAE interface

Page 16: Chandra Krintz

8/14/2019 Chandra Krintz

http://slidepdf.com/reader/full/chandra-krintz 16/35

An AppScale Image

• Implements every AppScale component –  Can instantiate as a particular role (ALB, AS, DB)

 –  Can change functionality and instantiate itself as another• AppServer (AS)

 –  Web front-end for the GAE applications (GAE API compliant)

  – 

Users of GAE apps interact with ASs directly• After being routed to an AS instance by the ALB upon first

access to a GAE application (for load balancing purposes)

 –  Python GAE interface

 –  Java GAE interface• Extensions (non-GAE compliant)

 –  MapReduce integration via an API

 –  GAE developers write their own mappers/reducers• AppScale schedules and load balances

Page 17: Chandra Krintz

8/14/2019 Chandra Krintz

http://slidepdf.com/reader/full/chandra-krintz 17/35

An AppScale Image

• Implements every AppScale component –  Can instantiate as a particular role (ALB, AS, DB)

 –  Can change functionality and instantiate itself as another• AppDBMaster (DBM) and AppDBSlaves (DBS)

 –  Scalable distributed database (Datastore API) implementation

 •

Type and replication specified at AppScale start by admin –  Bigtable like options (schema-free, key-value store)

• HBase (Java) and Hypertable (C++)

• MongoDB (Python)

 –  Peer-to-peer options• Cassandra (Java), Voldemort (Java)

 –  In-Memory, Distibuted – MemchacheDB (Python)

 – 

Relational options• MySQL (sharded, Master/Slave)

Page 18: Chandra Krintz

8/14/2019 Chandra Krintz

http://slidepdf.com/reader/full/chandra-krintz 18/35

AppScale

• Component interoperation –  Google protocol buffers for all app data communication

 –  Inter-language IPC via Thrift

• Deploys and executes automatically over different cloud

a rics an virtua ization ayers –  Eucalyptus

 –  Amazon’s EC2

 –  Xen

 –  KVM

 –  Soon to come…• VMWare/VCloud, IBM’s RC2, greater EC2 scale

Page 19: Chandra Krintz

8/14/2019 Chandra Krintz

http://slidepdf.com/reader/full/chandra-krintz 19/35

AppScale Potential

• Current status –  Stable system, many interesting database options

 –  International user community, real applications• AppScale is used for commercial & private endeavors

• Significant visibility and attention from blogosphere & industry

  – 

Web services application domain• W/ extension that spawns multi-language Map-Reduce jobs

• Research directions:

 –  Measurement and characterization of energy consumption• Multiple components, languages, resource requirements

• Parallelization and concurrency, multicore utilization

• Virtualization and IaaS layers

• Database technologies

Page 20: Chandra Krintz

8/14/2019 Chandra Krintz

http://slidepdf.com/reader/full/chandra-krintz 20/35

AppScale Research Roadmap

• Support for Java and additional DBs

• ASs and DBs grow and shrink according to load and failure –  Resource monitoring & allocation (SLA support)

• Automatic and dynamic renegotiation, improved scaling

• Per ormance ener and availabilit monitorin  –  Capture full-system behavior via sampling

• For debugging, performance/energy feedback, optimization

• Administrator/Developer control of

Replication of data for fault tolerance Scaling triggersType and amount of system monitoring Sandbox restrictions

Parallelism/Concurrency Energy/Power management

• Alternative computation models, e.g. streaming

Page 21: Chandra Krintz

8/14/2019 Chandra Krintz

http://slidepdf.com/reader/full/chandra-krintz 21/35

• Support for novel/emerging application domains –  Computationally and data intensive -- streaming, HPC, hybrids

IaaS + PaaS integration AppScale + Eucalyptus –  Resource allocation, specialization/customization

• Automatic re-negotiation of SLAs

• -

AppScale Research Directions

, ,

 –  Application-level feedback to/from physical power/air systems –  Isolation/energy tradeoffs, e.g. exploit shared memory

Multi-core server

VirtualizationTechnology (SW+HW)

Linux + EucalyptusAppScale

Applications/libraries

Page 22: Chandra Krintz

8/14/2019 Chandra Krintz

http://slidepdf.com/reader/full/chandra-krintz 22/35

Lessons• Open-source and management of a user community• Positives:

 –  Real applications, real systems

• If it works here, it really works –  Immediate feedback – helps debugging, evaluating efficacy of

more “theoretical” and “research” contributions, what is,

 –  Extensive visibility – helps with recruting, funding, impact

 –  Really fun and challenging

• Negatives –  Immediate feedback – can be quite demanding and time

consuming to respond to users effectively• Not responding equal to not putting something out as open source

• Visibility can also be a bad thing!

 –  Engineering oriented

Page 23: Chandra Krintz

8/14/2019 Chandra Krintz

http://slidepdf.com/reader/full/chandra-krintz 23/35

Cloud Computing at UCSB

Open-source implementations of popular cloud systems

• Platform-as-a-service (PaaS) framework

• Web services based implementation of 

Google AppEngine APIs

• Runs over Eucalyptus, Amazon EC2, and

• RACELab students:

Chris Bunch & Navraj Chohan,Nupur Garg, Matt Hubert,

Puneet Lakhina, Yiming Li,

• Implements multiple database backends(Hbase, Hypertable, Cassandra, MySQL,…)

• Real use, real users, real impact

• International user community

http://appscale.cs.ucsb.eduLead: Chandra Krintz

Thanks!

, ,

Michal Weigel Visiting Scholar:Yoshihide Nomura (Fujitsu)

• Supported by: Google,

National Science Foundation,IBM Research

Page 24: Chandra Krintz

8/14/2019 Chandra Krintz

http://slidepdf.com/reader/full/chandra-krintz 24/35

Cloud Computing at UCSB

Open-source implementations of popular cloud systems

• Infrastructure-as-a-service (IaaS) framework

Web services based implementation of elastic/utility/cloud computing infrastructure

• Linux image hosting via virtualization

• Emulates the Amazon AWS interface –

• Platform-as-a-service (PaaS) framework

Web services based implementation of Google AppEngine APIs

• Runs over Eucalyptus, Amazon EC2, and

virtualization layers (Xen/KVM)

•  

•Real use, real users, real impact

• Distributed with Ubuntu

• Large international user community

http://www.eucalyptus.com

Lead: Rich Wolski

 

(Hbase, Hypertable, Cassandra, MySQL,…)• Real use, real users, real impact

• International user community

http://appscale.cs.ucsb.edu

Lead: Chandra Krintz

Thanks!

• RACELab students: Chris Bunch, Jovan Chohan, Navraj Chohan,Nupur Garg, Matt Hubert, Puneet Lakhina, Yiming Li, GauravMehta, Nagy Mostafa, Soo Hwan Park, Michal WeigelVisiting Scholar: Yoshihide Nomura (Fujitsu)

• Supported by: Google, National Science Foundation,IBM Research, You!?

Page 25: Chandra Krintz

8/14/2019 Chandra Krintz

http://slidepdf.com/reader/full/chandra-krintz 25/35

Cloudy Issues

• Many institutions and companies own IT infrastructure

• Public cloud features also useful for “on-premise” clouds –  Privacy of code and data

 –  Avoids vendor “lock-in”, pay-per-use, and availability reliance

 –   

 –  Potential for easing resource constraints• Storage/data management, cpu/memory, communication

Public clouds are opaque – open APIs, closed implementation –  What is your application doing?

• Can cloud fabrics support other application domains,

services, performance/availability requirements?

Page 26: Chandra Krintz

8/14/2019 Chandra Krintz

http://slidepdf.com/reader/full/chandra-krintz 26/35

AppScale

• Component interoperation –  Component handshakes via socket messages

 – 

Google protocol buffers for all data communication –  Inter-language IPC via Thrift

 •

Deploys and executes automatically over different cloudfabrics and virtualization layers –  Eucalyptus

 –  Amazon’s EC2

 –  Xen

 –  KVM

 –  Currently laying the groundwork with VMWare for an

AppScale + VCloud/VSphere effort

Page 27: Chandra Krintz

8/14/2019 Chandra Krintz

http://slidepdf.com/reader/full/chandra-krintz 27/35

AppScale

• An Open-source platform-as-a-service (PaaS) system thatemulates Google App Engine (GAE)

 –  Extension of GAE SDK with API implementations replacedAppLoadBalancer (ALB) Database Master/Peer (DB M/P)

AppServer (AS) Database Slave/Peer (DB S/P)

GAE App

Developer

(AppScale Admin)

GAE App

Users

AppScale

tools

HTTPS

App

Controller

ALB

DB M/P

DB S/P

ASGAE App

UsersGAE App

Users

AppScale Cloud

MapReduce

Tasks

Page 28: Chandra Krintz

8/14/2019 Chandra Krintz

http://slidepdf.com/reader/full/chandra-krintz 28/35

AppScale

• Extension of GAE SDK with API implementations replacedDatastore -> HBase, Hypertable, Cassandra, Voldemort, MySQL,

MongoDB, MemCacheDB in progress: Oracle’s TimesTenMapReduce -> Hadoop MemCache -> Python & Java

Authentication -> built-in, decoupled from Google Accounts

GAE App

Developer

(AppScale Admin)

GAE App

Users

AppScale

tools

HTTPS

App

Controller

ALB

DB M/P

DB S/P

ASGAE App

UsersGAE App

Users

AppScale Cloud

MapReduce

Tasks

Page 29: Chandra Krintz

8/14/2019 Chandra Krintz

http://slidepdf.com/reader/full/chandra-krintz 29/35

RACELab Research

• Cross-language shared memory (Michal Weigel) –  Replace high overhead communication protocols transparently

for co-located runtimes• Multi-language profiling and optimization (Nagy Mostafa)

 –  Track changes in software revisions that impact performance

  –  ross- anguage spec a zat on o nterpreters

• Computationally intensive AppScale (Chris Bunch) –  MapReduce, HPC libraries

 –  Scheduling and load balancing

• Alternative data-intensive computational models in the cloud(Navraj Chohan) –  Cloud storage and persistance of data

 –  Streaming data management

Page 30: Chandra Krintz

8/14/2019 Chandra Krintz

http://slidepdf.com/reader/full/chandra-krintz 30/35

AppScale

Page 31: Chandra Krintz

8/14/2019 Chandra Krintz

http://slidepdf.com/reader/full/chandra-krintz 31/35

AppScale

Page 32: Chandra Krintz

8/14/2019 Chandra Krintz

http://slidepdf.com/reader/full/chandra-krintz 32/35

UCSB Open-Source Cloud Computing

• Eucalyptus Lead: Rich Wolski, MAYHEM Lab

 –  Elastic Utility Computing Architecture Linking YourPrograms T o Useful S ystems

 –  Infrastructure-as-a-service (IaaS) framework –  Web services based implementation of elastic/utility/cloud

computing infrastructure

 – 

Linux image hosting via virtualization –  Emulates the Amazon AWS interfaces

Page 33: Chandra Krintz

8/14/2019 Chandra Krintz

http://slidepdf.com/reader/full/chandra-krintz 33/35

• Interface is based on Amazon’s published WSDL –  EC2/S3/EBS

 – 

EC2 “Availability” zones correspond to individual clusters –  Uses either Amazon tools or Eucalyptus replacements

 –  Web services and REST interface

Does not assume that worker nodes have publicly routableIP addresses

 –  All cloud images have access to a private network interface –  Multiple networking options including elastic IPs and security

groups• WS-security for authentication

• Web-based administration and accounting

E l

Page 34: Chandra Krintz

8/14/2019 Chandra Krintz

http://slidepdf.com/reader/full/chandra-krintz 34/35

Eucalyptus

• Availability –  Via popular Linux distributions: Ubuntu, Debian, CentOS, …

Uptake –  19,400 downloads, > 71 countries

 –  May 2009: Key technology in thep at orm

 –  Fundamental technology of theUbuntu Oct. 2009 distribution

• 10,000,000 potential downloads

• Commercial spinoff: Eucalyptus Systems, Inc –  Support open source community www.eucalyptus.com

 –  Customized cloud services/components/scalability/SLAs

 –  Hybrid clouds - High availability (now available!)

A S l

Page 35: Chandra Krintz

8/14/2019 Chandra Krintz

http://slidepdf.com/reader/full/chandra-krintz 35/35

AppScale

• Open-source platform-as-a-service (PaaS) thatemulates Google App Engine (GAE)

• GAE is a full application stack that facilitates constructionof interactive webpages with a database backing store

 –  Sandbox / restrictionsPure Python or Java, no thread/subprocess spawning, system calls

No writes to file system, reads only to static files uploaded w/app

Storage using key-value, schema-free datastore (Bigtable-based)

HTTP/S communication only, CGI to handle page requests

Limit on number of datastore elements accessed per request

Limit on response duration, task frequency, request rate

Enforced quotas (BW, CPU, requests/s, files, app size, …)