
The glideinWMS approach to the ownership of System Images in the Cloud World

DESCRIPTION

Presentation at CLOSER 2012. Scientific communities that are accustomed to using Grid resources are now considering the use of Cloud resources. However, moving from the Grid to the Cloud brings with it the need to create and maintain the system image used to configure the provisioned resources, and this presents both opportunities and problems for the users. The impact is especially interesting in the context of glideinWMS due to its layered architecture. This presentation describes the various options available to the glideinWMS project team, their advantages and disadvantages, and explains why one of them is to be preferred. CLOSER web page: http://closer.scitevents.org/

Page 1: The glideinWMS approach to the ownership of System Images in the Cloud World

CLOSER 2012 - Apr 2012 glideinWMS - Ownership of VM 1

CLOSER 2012

The glideinWMS approach to the ownership of System Images in the Cloud World

presented by Igor Sfiligoi (1)

co-authors A. Tiradani (2), B. Holzman (2) and D.C. Bradley (3)

(1) UC San Diego, (2) Fermilab, (3) UW Madison

Page 2: The glideinWMS approach to the ownership of System Images in the Cloud World

Our users

● Our primary target audience is the scientific community
  ● In particular, those communities that need massive amounts of CPU cycles
● Large number of users - O(1k)
  ● Organized in groups known as Virtual Organizations (VOs)
● Batch processing is a must
  ● Typical user task requires > 1 CPU year
  ● Users understand the need to split the problem into many (semi-)independent tasks

Page 3: The glideinWMS approach to the ownership of System Images in the Cloud World

Our environment

● World-wide distributed computing a must
  ● No single resource provider can provide enough CPU for all the users
    – Due to both logistical and political constraints
● Until now this meant Grid computing
  ● i.e. federation of independent batch system providers
  ● and essentially all research-funded
● Our VOs are now considering Cloud computing, too (Cloud == IaaS)
  ● Both commercial and research-funded

Page 5: The glideinWMS approach to the ownership of System Images in the Cloud World

Grid vs Cloud

● Similarities
  ● Both are a way to provision resources
● Differences relevant to this talk

CLOUD
● Only bare (virtualized) hardware provided by the resource provider
● Must install OS before running the actual payload
  – But more flexibility

GRID
● OS and system libraries provided by the resource provider
● User just executes his payload
  – Must play well with provided OS

Some user groups find the Cloud model easier

Page 7: The glideinWMS approach to the ownership of System Images in the Cloud World

Why can Cloud be easier?

● Counterintuitive
  ● Installing and maintaining a whole OS a big task!
● Yet, easier than trying to adapt existing scientific applications
  ● Most code not actively maintained
  ● Often written making system assumptions
● Plus, most (virtualized) hardware uniform
  ● And HW API variations are usually much smaller than OS API variations

But someone still has to manage the batch jobs

Enter glideinWMS

Page 8: The glideinWMS approach to the ownership of System Images in the Cloud World

glideinWMS

● An overlay batch system on top of dynamic resources
  ● Based on the pilot principle
● Hides provisioning from final users
  ● To them, it looks and feels like a batch system on top of dedicated resources

[Figure: resources from Provider A, Provider B and Provider C joined into a single Glidein resource pool]

http://tinyurl.com/glideinWMS

Sfiligoi, I. et al. (2009). The pilot way to grid resources using glideinWMS. In Computer Science and Information Engineering, 2009 WRI World Congress on, 2, pp. 428-32. doi:10.1109/CSIE.2009.950

Page 10: The glideinWMS approach to the ownership of System Images in the Cloud World

glideinWMS architecture

● Three independent components
  ● Glidein Factory – interface to resources
  ● VO Frontend – resource provisioning logic
  ● The actual WMS – seen by the final users

[Figure: Resource Providers, Glidein Factory, VO Frontend and WMS, connected by arrows labeled Monitor, Request, Pilot and Job; multiple Glidein Factories, VO Frontends and WMS instances in an N-to-M relationship]

Workload Management System == Batch system
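The N-to-M relationship between Glidein Factories and VO Frontends can be pictured with a small data model. This is only an illustrative sketch: the class names, method names and the "one pilot per idle job" rule are assumptions made for the example and do not reflect the actual glideinWMS code or its provisioning policy.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class GlideinFactory:
    """Hypothetical stand-in for a Glidein Factory: the interface to resources."""
    name: str
    providers: List[str]
    # (frontend name, provider) for every pilot this Factory has submitted
    submitted: List[Tuple[str, str]] = field(default_factory=list)

    def submit_pilots(self, frontend: str, provider: str, count: int) -> int:
        """Submit `count` pilots to `provider`; return how many were submitted."""
        if provider not in self.providers:
            return 0  # this Factory has no interface to that provider
        self.submitted.extend([(frontend, provider)] * count)
        return count


@dataclass
class VOFrontend:
    """Hypothetical stand-in for a VO Frontend: the provisioning logic."""
    name: str
    factories: List[GlideinFactory]

    def request_pilots(self, idle_jobs: int, provider: str) -> int:
        """Request one pilot per idle job (assumed rule) from the Factories in order."""
        granted = 0
        for factory in self.factories:
            if granted >= idle_jobs:
                break
            granted += factory.submit_pilots(self.name, provider, idle_jobs - granted)
        return granted


# N-to-M: one Factory serves several Frontends, one Frontend uses several Factories.
shared_factory = GlideinFactory("factory-1", ["grid-site"])
cloud_factory = GlideinFactory("factory-2", ["cloud-a"])
vo1 = VOFrontend("vo-1", [shared_factory, cloud_factory])
vo2 = VOFrontend("vo-2", [shared_factory])
```

Here `vo1` can reach both Factories while `shared_factory` is used by both Frontends, which is the shape the backup slides describe (one Factory serving about 10 Frontends, one Frontend using 3 Factories).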

Page 12: The glideinWMS approach to the ownership of System Images in the Cloud World

Multiple operation teams

● Few groups operate the whole glideinWMS
  ● Typically separate groups for
    – Glidein Factory
    – VO Frontend + WMS
● Glidein Factory typically generic
  ● Essentially an abstraction layer to resources
  ● Was the interface to Grid resources
    – Adding the support for Cloud resources now

Sfiligoi, I. et al. (2011). Reducing the human cost of grid computing with glideinWMS. In CLOUD COMPUTING 2011, pp. 217-21.
Andrews, W. et al. (2011). Early experience on using glideinWMS in the cloud. J. Phys.: Conf. Ser. 331 062014.

Page 13: The glideinWMS approach to the ownership of System Images in the Cloud World

Multiple operation teams

● Few groups operate the whole glideinWMS
  ● Typically separate groups for
    – Glidein Factory
    – VO Frontend + WMS
● Glidein Factory typically generic
● VO Frontend the actual brain
  ● Contains resource provisioning logic
  ● Provides pilot credential(s)
  ● Customizes the pilot

Sfiligoi, I. et al. (2011). Adapting to the Unknown With a few Simple Rules: The glideinWMS Experience. In ADAPTIVE 2011, pp. 25-28.
Jan 2012 VO Frontend Training Session at UCSD

Page 14: The glideinWMS approach to the ownership of System Images in the Cloud World

Multiple operation teams

● Few groups operate the whole glideinWMS
  ● Typically separate groups for
    – Glidein Factory
    – VO Frontend + WMS
● Glidein Factory typically generic
● VO Frontend the actual brain

So... when moving to the Cloud, who should own the OS image?

Page 15: The glideinWMS approach to the ownership of System Images in the Cloud World

Option 1 – The Grid approach

● The easiest path forward is to mimic the Grid approach
  ● Nobody but the closest layer is aware of the Cloud
● This implies
  ● OS image owned by the Glidein Factory
    – Since it is the one closest to the Cloud
  ● Drop superuser privileges as soon as the system boots up
    – Since superuser ops not allowed in the Grid paradigm
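Dropping superuser privileges at boot can be sketched as below. This is a generic Unix pattern, not the glideinWMS implementation: the function name, the injectable `osmod` parameter (there only so the logic can be exercised without being root) and the target uid/gid are assumptions for the example.

```python
import os


def drop_privileges(uid: int, gid: int, osmod=os) -> None:
    """Permanently switch the current process to an unprivileged uid/gid.

    Order matters: supplementary groups and the gid must be changed while the
    process is still root, because after setuid() it may no longer do so.
    """
    if osmod.getuid() != 0:
        return  # already unprivileged, nothing to drop
    osmod.setgroups([])  # clear supplementary groups inherited from root
    osmod.setgid(gid)
    osmod.setuid(uid)
    # Sanity check: the process must not be root any more.
    if osmod.getuid() == 0:
        raise RuntimeError("privilege drop failed")
```

A boot-time script running as root would call something like `drop_privileges(pilot_uid, pilot_gid)` before starting the pilot, so that everything the pilot does afterwards runs unprivileged, as in the Grid paradigm.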

Page 16: The glideinWMS approach to the ownership of System Images in the Cloud World

Option 1 - Analysis

Advantages:
● VO unaware of the Cloud
● Only system services have superuser privileges
  – Minimize risk
● OS management shared between multiple VOs
  – Better quality
  – Lower security risk

Disadvantages:
● VO cannot take advantage of Cloud flexibility
● Pilot cannot take full advantage of OS features
● More work for the Factory admins
  – May need to maintain several OS images

Page 17: The glideinWMS approach to the ownership of System Images in the Cloud World

Option 2 – Adding root privileges

● One step further is allowing the pilot to run as root (i.e. superuser)
● This implies
  ● OS image still owned by the Glidein Factory
  ● But pilot process now a system service
  ● VO can customize OS configs, e.g.
    – Install new software
    – Configure and start otherwise-dormant system services
● Side effect of VO Frontend being able to customize the pilot
  ● Since running as root, it can change anything in the system
● Acceptable, since pilot running with credential provided by the VO Frontend

Page 18: The glideinWMS approach to the ownership of System Images in the Cloud World

Option 2 - Analysis

Advantages:
● VO only marginally aware of the Cloud
● VO can change system settings
  – And the pilot now a proper batch system daemon
● OS management shared between multiple VOs
  – Better quality
  – Lower security risk

Disadvantages:
● VO cannot provide the OS of their choice
● Increased security risk
  – Dynamic system configuration
  – Superuser processes
● VO must maintain any software it installs
● More work for the Factory admins
  – May need to maintain several OS images

● Added effort
● Increased security risk

Page 19: The glideinWMS approach to the ownership of System Images in the Cloud World

Option 3 – Pure Cloud model

● Move OS handling down to the “user”
  ● The VO Frontend admins in glideinWMS
    – We do not want to expose it to the scientists
● This implies
  ● OS owned by the VO Frontend
  ● Glidein Factory still needs to manipulate it
    – Add pilot bits
    – Proper Cloud-provider-specific contextualization
  ● Pilot process now a system service
    – Not strictly needed, but very little reason not to

Page 20: The glideinWMS approach to the ownership of System Images in the Cloud World

Option 3 - Analysis

Advantages:
● VO can now provide custom OS image
● The pilot is a proper batch system daemon
● Less work for the Glidein Factory admins

Disadvantages:
● VO has to provide an OS image
● Increased security risk
  – Dynamic system configuration
  – Superuser processes
● More work for VO(s)
  – Must maintain the VO image
● Potential image manipulation problems
  – Given arbitrary OS image in Glidein Factory

Expertise problem

Page 21: The glideinWMS approach to the ownership of System Images in the Cloud World

The network traffic cost

● Most Cloud providers charge users both for CPU and WAN network usage
  ● Network was “free” in the Grid world
● Data prestaging required to keep costs down
  ● Data hosting still cheaper than networking
  ● This includes software (including the OS)
● Implications on the OS ownership options:
  ● VO-owned software likely more dynamic
    – Thus higher cost
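The cost argument above can be made concrete with a toy calculation. All prices, sizes and counts below are hypothetical placeholders, not figures from the talk or from any provider's price list; the point is only the relative ordering of the three cases.

```python
def streamed_cost(image_gb: float, vm_starts: int, wan_price_per_gb: float) -> float:
    """Cost of pulling the software over the WAN at every VM start."""
    return image_gb * vm_starts * wan_price_per_gb


def prestaged_cost(image_gb: float, updates: int, months: float,
                   wan_price_per_gb: float, hosting_price_per_gb_month: float) -> float:
    """Cost of uploading each software version once and hosting it at the provider."""
    upload = image_gb * updates * wan_price_per_gb
    hosting = image_gb * months * hosting_price_per_gb_month
    return upload + hosting


# Hypothetical inputs, chosen only for illustration.
IMAGE_GB = 2.0   # size of OS image plus software
WAN = 0.10       # price per GB of WAN traffic
HOSTING = 0.05   # price per GB-month of hosting

streamed = streamed_cost(IMAGE_GB, vm_starts=1000, wan_price_per_gb=WAN)
stable = prestaged_cost(IMAGE_GB, updates=1, months=1,
                        wan_price_per_gb=WAN, hosting_price_per_gb_month=HOSTING)
dynamic = prestaged_cost(IMAGE_GB, updates=20, months=1,
                         wan_price_per_gb=WAN, hosting_price_per_gb_month=HOSTING)
```

Under these assumed numbers, prestaging is far cheaper than streaming at every VM start, and a frequently updated (more dynamic) software stack costs more than a stable one because every new version has to cross the WAN again.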

Page 22: The glideinWMS approach to the ownership of System Images in the Cloud World

Making a choice

● No option a clear winner
  ● Each option has significant advantages and disadvantages
● Still want to pick one
  ● To focus development
  ● As the recommended deployment strategy

But may end up supporting all of them in the code

Page 23: The glideinWMS approach to the ownership of System Images in the Cloud World

Option 2 chosen

● Option 3 not suitable for current user base
  ● Not enough expertise in VOs to maintain full OS (manpower also a major issue)
  ● Happy with only a few OS versions
● Option 1 too restrictive
  ● Some VOs are eager to tweak system settings
    – Major reason to look at the Cloud
● Major drawback of option 2 is security risk (compared to Option 1)
  ● Can be mitigated by using superuser privileges only when absolutely needed
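One common Unix way to use superuser privileges only when absolutely needed is to run with an unprivileged effective uid and raise it to root just around the privileged step. The sketch below shows that pattern; it is an assumption about how the mitigation could look, not the glideinWMS code, and the `as_superuser` name and the injectable `osmod` parameter are hypothetical.

```python
import os
from contextlib import contextmanager


@contextmanager
def as_superuser(osmod=os):
    """Temporarily raise the effective uid to root, then restore it.

    Works only if the process kept root as its real or saved uid, i.e. it
    lowered its effective uid with seteuid() rather than dropping privileges
    permanently with setuid().
    """
    previous = osmod.geteuid()
    osmod.seteuid(0)
    try:
        yield
    finally:
        osmod.seteuid(previous)  # always fall back to the unprivileged uid
```

A pilot would then wrap only the privileged operation, for example `with as_superuser(): start_system_service()` (where `start_system_service` is a placeholder), and run everything else unprivileged.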

Page 24: The glideinWMS approach to the ownership of System Images in the Cloud World

Future work

● The glideinWMS project about to release the first Cloud-enabled version
  ● Works fine in internal tests
● But no real-world Cloud experience yet
  ● Just inferring from our Grid-based deployments
● Eager to put it in real-user hands
  ● And collect their feedback

Page 25: The glideinWMS approach to the ownership of System Images in the Cloud World

Conclusions

● Our user base lives in batch system paradigm
  ● Extensive experience with Grid world
  ● But would like to expand into Cloud resources
● OS ownership the major change
  ● glideinWMS architecture allows for delegating the ownership to an intermediate service
  ● And lower the migration cost
● But is it a good idea?
  ● Yes, if some customization allowed

Page 26: The glideinWMS approach to the ownership of System Images in the Cloud World

Acknowledgments

● This work is partially sponsored by
  ● the US National Science Foundation under Grants No. OCI-0943725 (STCI), PHY-1104549 (AAA), and PHY-0612805 (CMS Maintenance & Operations)
  and
  ● the US Department of Energy under Grants No. DE-SC0002298 (ANDSL) and DE-FC02-06ER41436 subcontract No. 647F290 (OSG).

Page 27: The glideinWMS approach to the ownership of System Images in the Cloud World

Backup slides

Page 28: The glideinWMS approach to the ownership of System Images in the Cloud World

glideinWMS in Numbers

● OSG Glidein Factory at UCSD
  ● Serving ~10 Frontends

Page 29: The glideinWMS approach to the ownership of System Images in the Cloud World

glideinWMS in Numbers

● CMS Frontend at CERN
  ● Using 3 Glidein Factories