Transcript
Page 1: The glideinWMS approach to the ownership of System Images in the Cloud World

CLOSER 2012 - Apr 2012 glideinWMS - Ownership of VM 1

CLOSER 2012

The glideinWMS approach to the ownership of System Images in the Cloud World

presented by Igor Sfiligoi (1)

co-authors A. Tiradani (2), B. Holzman (2) and D. C. Bradley (3)

(1) UC San Diego, (2) Fermilab, (3) UW Madison

Page 2: The glideinWMS approach to the ownership of System Images in the Cloud World

Our users

● Our primary target audience is the scientific community
  ● In particular, those communities that need massive amounts of CPU cycles
● Large number of users - O(1k)
  ● Organized in groups (known as VOs, Virtual Organizations)
● Batch processing is a must
  ● Typical user task requires > 1 CPU year
  ● Users understand the need of splitting the problem in many (semi-)independent tasks

Page 3: The glideinWMS approach to the ownership of System Images in the Cloud World

Our environment

● World-wide distributed computing a must
  ● No single resource provider can provide enough CPU for all the users
    – Both due to logistical and political constraints
● Until now this meant Grid computing
  ● i.e. federation of independent batch system providers
  ● and essentially all research-funded
● Our VOs are now considering Cloud computing, too
  ● Both commercial and research-funded

Cloud == IaaS

Page 4: The glideinWMS approach to the ownership of System Images in the Cloud World

Grid vs Cloud

● Similarities
  ● Both are a way to provision resources
● Differences relevant to this talk
  ● Grid
    – OS and system libraries provided by the resource provider
    – User just executes his payload
    – Must play well with provided OS
  ● Cloud
    – Only bare (virtualized) hardware provided by the resource provider
    – Must install OS before running the actual payload
    – But more flexibility

Page 5: The glideinWMS approach to the ownership of System Images in the Cloud World

Some user groups find the cloud model easier

Page 6: The glideinWMS approach to the ownership of System Images in the Cloud World

Why can Cloud be easier?

● Counterintuitive
  ● Installing and maintaining a whole OS is a big task!
● Yet, easier than trying to adapt existing scientific applications
  ● Most code not actively maintained
  ● Often written making system assumptions
● Plus, most (virtualized) hardware uniform
  ● And HW API variations are usually much smaller than OS API variations

Page 7: The glideinWMS approach to the ownership of System Images in the Cloud World

But someone still has to manage the batch jobs

Enter glideinWMS

Page 8: The glideinWMS approach to the ownership of System Images in the Cloud World

glideinWMS

● An overlay batch system on top of dynamic resources
  ● Based on the pilot principle
● Hides provisioning from final users
  ● Looks and feels like a batch system on top of dedicated resources to them

[Figure: a glidein resource pool spanning Provider A, Provider B and Provider C]

http://tinyurl.com/glideinWMS

Sfiligoi, I. et al., (2009). The pilot way to grid resources using glideinwms. In Computer Science and Information Engineering, 2009 WRI World Congress on, 2, pp.428-32. doi:10.1109/CSIE.2009.950

Page 9: The glideinWMS approach to the ownership of System Images in the Cloud World

glideinWMS architecture

● Three independent components
  ● Glidein Factory – interface to resources
  ● VO Frontend – resource provisioning logic
  ● The actual WMS – seen by the final users

[Figure: Resource Providers, Glidein Factory, VO Frontend and WMS, connected by arrows labeled Monitor, Request, Pilot and Job]

Workload Management System == Batch system
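
The split between the three components on this slide can be illustrated with a toy version of the provisioning decision a VO Frontend has to make. This is only a sketch under stated assumptions, not the actual glideinWMS logic: `PoolState`, `pilots_to_request` and the `max_pending` cap are hypothetical names invented for this example.

```python
from dataclasses import dataclass


@dataclass
class PoolState:
    """Hypothetical snapshot a VO Frontend might read from the WMS."""
    idle_jobs: int       # user jobs waiting for a slot
    idle_pilots: int     # pilots already running but not yet matched to a job
    pending_pilots: int  # pilots requested from the Factory but not yet started


def pilots_to_request(state: PoolState, max_pending: int = 100) -> int:
    """Return how many additional pilots to ask the Glidein Factory for."""
    # Jobs that no running or already-requested pilot could pick up.
    uncovered = state.idle_jobs - state.idle_pilots - state.pending_pilots
    # Never keep more than max_pending pilots waiting at the providers.
    headroom = max_pending - state.pending_pilots
    return max(0, min(uncovered, headroom))


if __name__ == "__main__":
    state = PoolState(idle_jobs=250, idle_pilots=10, pending_pilots=40)
    print(pilots_to_request(state))  # prints 60
```

The Frontend only computes a number here; submitting the pilots to the resource providers would remain the Factory's job, which is what keeps the two components independent.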

Page 10: The glideinWMS approach to the ownership of System Images in the Cloud World

[Figure: several Glidein Factories, VO Frontends and WMS instances in an N-to-M relationship]

Page 11: The glideinWMS approach to the ownership of System Images in the Cloud World

Multiple operation teams

● Few groups operate the whole glideinWMS
● Typically separate groups for
  ● Glidein Factory
  ● VO Frontend + WMS

Sfiligoi, I. et al., (2011). Reducing the human cost of grid computing with glideinWMS. In CLOUD COMPUTING 2011, pp. 217-21.

Page 12: The glideinWMS approach to the ownership of System Images in the Cloud World

● Glidein Factory typically generic
  ● Essentially an abstraction layer to resources
  ● Was the interface to Grid resources
    – Adding the support for Cloud resources now

Sfiligoi, I. et al., (2011). Reducing the human cost of grid computing with glideinWMS. In CLOUD COMPUTING 2011, pp. 217-21.

Andrews, W. et al., (2011). Early experience on using glideinWMS in the cloud. J. Phys.: Conf. Ser. 331 062014

Page 13: The glideinWMS approach to the ownership of System Images in the Cloud World

● Glidein Factory typically generic
● VO Frontend the actual brain
  ● Contains resource provisioning logic
  ● Provides pilot credential(s)
  ● Customizes the pilot

Sfiligoi, I. et al., (2011). Adapting to the Unknown With a few Simple Rules: The glideinWMS Experience. In ADAPTIVE 2011, pp. 25-28.

Jan 2012 VO Frontend Training Session at UCSD

Page 14: The glideinWMS approach to the ownership of System Images in the Cloud World

So... when moving to the Cloud, who should own the OS image?

Page 15: The glideinWMS approach to the ownership of System Images in the Cloud World

Option 1 – The Grid approach

● The easiest path forward is to mimic the Grid approach
  ● Nobody but the layer closest to the Cloud is aware of it
● This implies
  ● OS image owned by the Glidein Factory
    – Since it is the one closest to the Cloud
  ● Drop superuser privileges as soon as the system boots up
    – Since superuser ops not allowed in the Grid paradigm
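
Dropping superuser privileges right after boot, as this option requires, could look like the sketch below. It assumes a POSIX system and an unprivileged account called `nobody`; the function name `drop_privileges` is hypothetical and is not taken from the glideinWMS code.

```python
import os
import pwd


def drop_privileges(username: str = "nobody") -> bool:
    """Permanently switch the current process from root to `username`.

    Returns False (and changes nothing) when the process is not root.
    """
    if os.getuid() != 0:
        return False
    account = pwd.getpwnam(username)
    os.setgroups([])           # drop supplementary groups first
    os.setgid(account.pw_gid)  # then the group ID, while still root
    os.setuid(account.pw_uid)  # finally the user ID; this cannot be undone
    return True
```

The order matters: once the user ID has been changed the process is no longer allowed to change its groups, so the group calls have to come first.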

Page 16: The glideinWMS approach to the ownership of System Images in the Cloud World

Option 1 - Analysis

Advantages:
● VO unaware of the Cloud
● Only system services have superuser privileges
  – Minimize risk
● OS management shared between multiple VOs
  – Better quality
  – Lower security risk

Disadvantages:
● VO cannot take advantage of Cloud flexibility
● Pilot cannot take full advantage of OS features
● More work for the Factory admins
  – May need to maintain several OS images

Page 17: The glideinWMS approach to the ownership of System Images in the Cloud World

Option 2 – Adding root privileges

● One step further is allowing pilot to run as root (i.e. superuser)

● This implies
  ● OS image still owned by the Glidein Factory
  ● But pilot process now a system service
  ● VO can customize OS configs, e.g.
    – Install new software
    – Configure and start otherwise-dormant system services
● Side effect of VO Frontend being able to customize the pilot
  ● Since running as root, it can change anything in the system
● Acceptable, since pilot running with credential provided by the VO FE

Page 18: The glideinWMS approach to the ownership of System Images in the Cloud World

Option 2 - Analysis

Advantages:
● VO only marginally aware of the Cloud
● VO can change system settings
  – And the pilot now a proper batch system daemon
● OS management shared between multiple VOs
  – Better quality
  – Lower security risk

Disadvantages:
● VO cannot provide the OS of their choice
● Increased security risk
  – Dynamic system configuration
  – Superuser processes
● VO must maintain any software it installs
● More work for the Factory admins
  – May need to maintain several OS images
● Added effort
● Increased security risk

Page 19: The glideinWMS approach to the ownership of System Images in the Cloud World

Option 3 – Pure Cloud model

● Move OS handling down to the “user”
  ● The VO Frontend admins in glideinWMS
● This implies
  ● OS owned by the VO Frontend
  ● Glidein Factory still needs to manipulate it
    – Add pilot bits
    – Proper Cloud-provider-specific contextualization
  ● Pilot process now a system service

Not strictly needed, but very little reason not to

We do not want to expose it to the scientists

Page 20: The glideinWMS approach to the ownership of System Images in the Cloud World

Option 3 - Analysis

Advantages:
● VO can now provide custom OS image
● The pilot is a proper batch system daemon
● Less work for the Glidein Factory admins

Disadvantages:
● VO has to provide an OS image
● Increased security risk
  – Dynamic system configuration
  – Superuser processes
● More work for VO(s)
  – Must maintain the VO image
● Potential image manipulation problems
  – Given arbitrary OS image in Glidein Factory

Expertise problem

Page 21: The glideinWMS approach to the ownership of System Images in the Cloud World

The network traffic cost

● Most Cloud providers charge users both for CPU and WAN network usage
  ● Network was “free” in the Grid world
● Data prestaging required to keep costs down
  ● Data hosting still cheaper than networking
  ● This includes software (including the OS)
● Implications on the OS ownership options:
  ● VO-owned software likely more dynamic
    – Thus higher cost
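
The cost argument on this slide can be made concrete with a small back-of-the-envelope model. Everything in it is an assumption made for illustration: the function name `monthly_image_cost`, the idea that each new image version is billed once as WAN traffic, and the prices in the usage lines are invented and do not come from any real provider.

```python
def monthly_image_cost(image_gb: float,
                       updates_per_month: int,
                       wan_price_per_gb: float,
                       storage_price_per_gb_month: float) -> float:
    """Monthly cost of keeping a prestaged image current at one Cloud provider."""
    # Assumption: every new image version crosses the WAN once and is billed.
    transfer = image_gb * updates_per_month * wan_price_per_gb
    # The latest version stays hosted next to the VMs, so booting is WAN-free.
    hosting = image_gb * storage_price_per_gb_month
    return transfer + hosting


if __name__ == "__main__":
    # Made-up prices: 0.10 per GB transferred, 0.10 per GB-month stored.
    stable = monthly_image_cost(2.0, 1, 0.10, 0.10)    # rarely changing image
    dynamic = monthly_image_cost(2.0, 20, 0.10, 0.10)  # frequently rebuilt image
    print(f"stable: {stable:.2f}  dynamic: {dynamic:.2f}")
```

With these invented numbers the hosting term is the same in both cases and only the transfer term grows with the update rate, which is the sense in which more dynamic, VO-owned software costs more.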

Page 22: The glideinWMS approach to the ownership of System Images in the Cloud World

Making a choice

● No option a clear winner
  ● Each option has significant advantages and disadvantages
● Still want to pick one
  ● To focus development
  ● As the recommended deployment strategy

But may end up supporting all of them in the code

Page 23: The glideinWMS approach to the ownership of System Images in the Cloud World

Option 2 chosen

● Option 3 not suitable for current user base
  ● Not enough expertise in VOs to maintain full OS
  ● Happy with only a few OS versions
● Option 1 too restrictive
  ● Some VOs are eager to tweak system settings
    – Major reason to look at the Cloud
● Major drawback of option 2 is security risk
  ● Can be mitigated by using superuser privileges only when absolutely needed

Manpower also a major issue

Compared to Option 1
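
One way to use superuser privileges only when absolutely needed is to run with a lowered effective user ID and raise it just around the operations that require root. The sketch below assumes a POSIX system and a process that was started as root and then lowered only its effective UID; the helper name `as_superuser` is hypothetical and is not part of glideinWMS.

```python
import os
from contextlib import contextmanager


@contextmanager
def as_superuser():
    """Raise the effective UID to root only for the duration of the block."""
    unprivileged = os.geteuid()
    os.seteuid(0)  # works only if the real or saved UID is still 0
    try:
        yield
    finally:
        os.seteuid(unprivileged)  # always fall back, even after an error
```

A pilot could wrap only the system-level steps in `with as_superuser(): ...` and run everything else unprivileged. This narrows the window in which mistakes run as root, but it is weaker than a permanent drop: code running in the same process can still call `os.seteuid(0)` itself.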

Page 24: The glideinWMS approach to the ownership of System Images in the Cloud World

Future work

● The glideinWMS project about to release the first Cloud-enabled version
  ● Works fine in internal tests
● But no real-world Cloud experience yet
  ● Just inferring from our Grid-based deployments
● Eager to put it in real-user hands
  ● And collect their feedback

Page 25: The glideinWMS approach to the ownership of System Images in the Cloud World

Conclusions

● Our user base lives in batch system paradigm
  ● Extensive experience with Grid world
  ● But would like to expand into Cloud resources
● OS ownership the major change
  ● glideinWMS architecture allows for delegating the ownership to an intermediate service
  ● And lower the migration cost
● But is it a good idea?
  ● Yes, if some customization allowed

Page 26: The glideinWMS approach to the ownership of System Images in the Cloud World

Acknowledgments

● This work is partially sponsored by
  ● the US National Science Foundation under Grants No. OCI-0943725 (STCI), PHY-1104549 (AAA), and PHY-0612805 (CMS Maintenance & Operations), and
  ● the US Department of Energy under Grants No. DE-SC0002298 (ANDSL) and DE-FC02-06ER41436 subcontract No. 647F290 (OSG).

Page 27: The glideinWMS approach to the ownership of System Images in the Cloud World

Backup slides

Page 28: The glideinWMS approach to the ownership of System Images in the Cloud World

glideinWMS in Numbers

● OSG Glidein Factory at UCSD
  ● Serving ~10 Frontends

Page 29: The glideinWMS approach to the ownership of System Images in the Cloud World

glideinWMS in Numbers

● CMS Frontend at CERN
  ● Using 3 Glidein Factories

