45
CERN IT Department CH-1211 Genève 23 Switzerland www.cern.ch/it PES Cloud Computing Cloud Computing (and virtualization at CERN) Ulrich Schwickerath et al With special thanks to the many contributors to this presentation! Disclaimer: largely personal view of things GridKa School 2011, Karlsruhe

Cloud Computing - KIT · 2011-09-07 · Cloud computing is certainly a nice new technology, and technology is maturing First attempts to apply it are in progress at several sites,

  • Upload
    others

  • View
    1

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Cloud Computing - KIT · 2011-09-07 · Cloud computing is certainly a nice new technology, and technology is maturing First attempts to apply it are in progress at several sites,

CERN IT DepartmentCH-1211 Genève 23

Switzerlandwww.cern.ch/it

PES Cloud Computing

Cloud Computing

(and virtualization at CERN)

Ulrich Schwickerath et al

With special thanks to the many contributors to this presentation!

Disclaimer: largely personal view of things

GridKa School 2011, Karlsruhe

Page 2: Cloud Computing - KIT · 2011-09-07 · Cloud computing is certainly a nice new technology, and technology is maturing First attempts to apply it are in progress at several sites,

Outline

What is this all about ?• Definition, Key features and manifestations• Is it useful ? For what ? For whom ?• Is it relevant for HEP community

An IaaS cook book• Ingredients and recipe • Caveats and work to do

The practice• Results from a prototype setup at CERN

Conclusions

Page 3: Cloud Computing - KIT · 2011-09-07 · Cloud computing is certainly a nice new technology, and technology is maturing First attempts to apply it are in progress at several sites,

What is Cloud Computing ?

Cloud computing is the delivery of computing as a service rather than a product, whereby shared resources, software and information are provided to computers and other devices as a utility (like the electricity grid) over a network (typically the Internet).

Source: Wikipedia

Page 4: Cloud Computing - KIT · 2011-09-07 · Cloud computing is certainly a nice new technology, and technology is maturing First attempts to apply it are in progress at several sites,

Key features

elastic

performant

reliable

scalable

maintainable secureagile

Source: wikipedia

Page 5: Cloud Computing - KIT · 2011-09-07 · Cloud computing is certainly a nice new technology, and technology is maturing First attempts to apply it are in progress at several sites,

5

Manifestation: XaaS

Software as a Service

SaaS

Platform as a Service

PaaS

Infrastructure as a Service

IaaS

Thanks to Tony Cass

On demand access to applications

Platform for building & deliveringWeb applications

Utility computing

Page 6: Cloud Computing - KIT · 2011-09-07 · Cloud computing is certainly a nice new technology, and technology is maturing First attempts to apply it are in progress at several sites,

The key question is ...

What is the relevance for HEP community ?

Disclaimer: personal and site biased perspective, with a focus on IaaS

Page 7: Cloud Computing - KIT · 2011-09-07 · Cloud computing is certainly a nice new technology, and technology is maturing First attempts to apply it are in progress at several sites,

What are the challenges?

Increasing demand for computing and storage resources

More (new) users to support

More communities to support (VOs), with different requirements

Conflicting software requirements for applications and new hardware

Resource provider challenges

Page 8: Cloud Computing - KIT · 2011-09-07 · Cloud computing is certainly a nice new technology, and technology is maturing First attempts to apply it are in progress at several sites,

What are the challenges?

Decreasing resources for developments and maintenance

Constant or decreasing number of people to provide computing services

Resource provider boundary conditions

Necessity to optimize the use of existing

resources

Page 9: Cloud Computing - KIT · 2011-09-07 · Cloud computing is certainly a nice new technology, and technology is maturing First attempts to apply it are in progress at several sites,

What are the challenges ?

Example:CERN public batch resources

Page 10: Cloud Computing - KIT · 2011-09-07 · Cloud computing is certainly a nice new technology, and technology is maturing First attempts to apply it are in progress at several sites,

What are the challenges ?

VOs often know their users better than the sites do

VOs know best where their active data is

VOs often unhappy with sites scheduling decisions

Example: Pilot job frameworks:

Essentially a work-around for VOs to do the job scheduling themselves Causes some overhead for the sites who have to maintain new services

glExec, SCAS/Argus UID switching within a single user job

Page 11: Cloud Computing - KIT · 2011-09-07 · Cloud computing is certainly a nice new technology, and technology is maturing First attempts to apply it are in progress at several sites,

11

… and the new technologies

Physical hardware

Host operating system

Application Application

Physical hardware

Host operating system

Virtual host Virtual host

Host OS Host OS

Application Application

Virtualization as part of the solution

Page 12: Cloud Computing - KIT · 2011-09-07 · Cloud computing is certainly a nice new technology, and technology is maturing First attempts to apply it are in progress at several sites,

… and the new technologies

Virtualization + sand-boxing

Page 13: Cloud Computing - KIT · 2011-09-07 · Cloud computing is certainly a nice new technology, and technology is maturing First attempts to apply it are in progress at several sites,

… and the new technologies

Virtual machine management systems

Central place to manage your VMs in the computer centerSome complemented by services required for Clouds

ISF

Page 14: Cloud Computing - KIT · 2011-09-07 · Cloud computing is certainly a nice new technology, and technology is maturing First attempts to apply it are in progress at several sites,

How can virtualization help ?

Many dedicated machines

Mainly managed by VOs

Specific applications

Low CPU and I/O usage

Large reliability required

Current CERN solution for consolidation using CERN Virtual Infrastructure (CVI)

• Specialized fully redundant hardware with shared dedicated storage via iSCSI

• Support for live migration for services

• Selected solution: Microsoft Hyper-V + SCVMM as orchestrator

• Completed by Self-Service on cheap hardware

• Different configurations: enclosures+shared storage, disk servers, worker nodes

Resource usage optimization

Page 15: Cloud Computing - KIT · 2011-09-07 · Cloud computing is certainly a nice new technology, and technology is maturing First attempts to apply it are in progress at several sites,

How can virtualization help ?

Virtualization for service consolidation is reality, wide

spread and routine

Example: CERN CVI

Page 16: Cloud Computing - KIT · 2011-09-07 · Cloud computing is certainly a nice new technology, and technology is maturing First attempts to apply it are in progress at several sites,

CVI status (April 2011)

Number of Virtual Machines per Operating System

54% Windows VMs

46% Linux VMs

Sep 2011:1700 VMs Apr 2011: 1250 VMs on 248 hypervisorsNov 2010: 680 VMs on 170 hypervisors

Thanks to Jan van Eldik et al

Page 17: Cloud Computing - KIT · 2011-09-07 · Cloud computing is certainly a nice new technology, and technology is maturing First attempts to apply it are in progress at several sites,

April 2011: CVI 1250 VMs, 8 Customer groups

IT-PES service consolidation

BE-CO

Grid devel

IT-OIS

Custom hosts

Engineering

Thanks to Jan van Eldik et al

Page 18: Cloud Computing - KIT · 2011-09-07 · Cloud computing is certainly a nice new technology, and technology is maturing First attempts to apply it are in progress at several sites,

18

How can virtualization help?

Note: virtualization is NOT cloud

computing

Page 19: Cloud Computing - KIT · 2011-09-07 · Cloud computing is certainly a nice new technology, and technology is maturing First attempts to apply it are in progress at several sites,

19

Other 18%

Databases 4%

VO Services 5%

Mass Storage 25%

Batch 40%

Grid Services 2%

WinServices 6%

Can we gain elsewhere ?

Example: CERN Computer Center in September 2011

Page 20: Cloud Computing - KIT · 2011-09-07 · Cloud computing is certainly a nice new technology, and technology is maturing First attempts to apply it are in progress at several sites,

Batch processing resources

Managed by the site

Grid Worker node setup

Lots of identical machines

Challenging use case due to scale and performance requirements!

CPU, I/O and network demanding

Cheap hardware

individual failures OK

Page 21: Cloud Computing - KIT · 2011-09-07 · Cloud computing is certainly a nice new technology, and technology is maturing First attempts to apply it are in progress at several sites,

Batch virtualization: principles

What is(are) the problem(s) to solve ?

Optimize Job throughputJob success rate

MinimizeOperational overheadDowntime for updates

RequirePerformance and speed

Page 22: Cloud Computing - KIT · 2011-09-07 · Cloud computing is certainly a nice new technology, and technology is maturing First attempts to apply it are in progress at several sites,

22

The price to pay for virtualization

HS06 CPU benchmark test

Bare metal: SLC52x L5520 Intel Xeon2.27GHz

HS06=12.05/core

KVMHW as beforeSLC5/6 hypervisor8 SLC5 guestsNo KSM,ept off (SLC5)Pinned VMs

HS06=11.4/core (SLC5)HS06=11.8/core (SLC6)

HyperVHW as before8 SLC5 guests

HS06=11.7/core

Page 23: Cloud Computing - KIT · 2011-09-07 · Cloud computing is certainly a nice new technology, and technology is maturing First attempts to apply it are in progress at several sites,

23

1 2 3 4 5 6 7 80

100020003000400050006000700080009000

10000

Read

VMBare metal

Threads/VM number

KBytes/sec

1 2 3 4 5 6 7 80

2000

4000

6000

8000

10000

Write

VMBare metal

Threads/VM number

KBytes/sec

I/O Benchmarking

Analysis by Qiulan Huang

(Chinese academy of

science),December 2010

Notes:- Caching off- SLC5 Hypervisor- 20% penalty- write worse- block device disk on LV, exported to VM

preliminary

preliminary

(random)

(random)

Worst case scenario: 1VM/physical core running IOzone, 8 threads on bare metal

Page 24: Cloud Computing - KIT · 2011-09-07 · Cloud computing is certainly a nice new technology, and technology is maturing First attempts to apply it are in progress at several sites,

24

I/O Benchmarking

• Analysis by Qiulan Huang (Chinese academy of science), December 2010

Caching off, SLC5 based KVM hypervisors

LV raw device, imported into the VM

1-8 VM per hypervisor with one IOZone benchmark each

Bare metal test with 1-8 concurrent IOZone threads

20% penalty, write performance penatly is worse• New analysis ongoing by Belmiro Moreira (CERN)

Trying qcow2 and compare to LV

Based on SLC6.1 KVM

No final results yet

Page 25: Cloud Computing - KIT · 2011-09-07 · Cloud computing is certainly a nice new technology, and technology is maturing First attempts to apply it are in progress at several sites,

25

933

934

935

936

937

938

939

940

Network Benchmark

bare metal1vm2VMs3VMs4VMs5VMs6VMs7VMs8VMs

Throughput()Mbits/s)

Network Benchmarking

Iperf with TCP window size of 256k and 60s test time

Analysis by Qiulan Huang

(Chinese academy of science),December 2010, CERN

Penalty <=1%

reference

Page 26: Cloud Computing - KIT · 2011-09-07 · Cloud computing is certainly a nice new technology, and technology is maturing First attempts to apply it are in progress at several sites,

Virtualizing batch resources

Key ideas for prototype at CERN

● Isolation: one job per virtual machine only ● Limited Live-Time for each worker node● Be agile: always start from the latest image with

the newest software● Be demand driven: if possible, adjust running

VMs to current or expected demand

Page 27: Cloud Computing - KIT · 2011-09-07 · Cloud computing is certainly a nice new technology, and technology is maturing First attempts to apply it are in progress at several sites,

27

Virtualization of batch resources

How to provide the right mix of environments matching needs ?

Batch worker nodesdynamically joining LSF

Page 28: Cloud Computing - KIT · 2011-09-07 · Cloud computing is certainly a nice new technology, and technology is maturing First attempts to apply it are in progress at several sites,

28

Automate intrusive interventions: kernel, afs, glibc updates

Draining, runs until job finishes

Accepts jobs for 24 H

LONG JOB

VMSlot #1

Draining, runs until job finishes

Accepts jobs for 24 H

SHORT JOB SHORT JOB SHORT JOB

VMSlot #2

Accepts jobs for 24 H

LONG JOB

Image A Image B

Draining, runs until job finishes

Accepts jobs for 24 H

SHORT JOB SHORT JOB SHORT JOB

Notes:

NEW virtual machines always start with the latest image

Image A and Image B can correspond to different OS versions

Virtualization of batch resources

Thanks to Ricardo Silva

Page 29: Cloud Computing - KIT · 2011-09-07 · Cloud computing is certainly a nice new technology, and technology is maturing First attempts to apply it are in progress at several sites,

Cooking up the infrastructure

Provisioning systemVM Orchestrator

Computing resources

Image repositoryImage creation

Image distribution

Application(batch)

Administrator

Ups ... looks like an IaaS infrastructure, doesn't it ?

Page 30: Cloud Computing - KIT · 2011-09-07 · Cloud computing is certainly a nice new technology, and technology is maturing First attempts to apply it are in progress at several sites,

CERN's “lxcloud” system choices

Image creation:

Software setup can be derived from centrally managed computers

Image distribution:

currently Bit-torrent in use

VM orchestrator:

tested OpenNebula and Platform ISF, interested in OpenStack

Page 31: Cloud Computing - KIT · 2011-09-07 · Cloud computing is certainly a nice new technology, and technology is maturing First attempts to apply it are in progress at several sites,

CERN's “lxcloud” system choices

Virtualization technology

Started with XEN, moved to KVM in 2010

Image treatment:

Prestaged images, using LVM snapshot

Public cloud interface(s)

Tests with ONE EC2 access using ATLAS hammer-cloud tests (selected users only)

Page 32: Cloud Computing - KIT · 2011-09-07 · Cloud computing is certainly a nice new technology, and technology is maturing First attempts to apply it are in progress at several sites,

CERN IT DepartmentCH-1211 Genève 23

Switzerlandwww.cern.ch/it 32ISC-cloud 2010 – Frankfurt / Main

Image distribution

Image distribution performance

Page 33: Cloud Computing - KIT · 2011-09-07 · Cloud computing is certainly a nice new technology, and technology is maturing First attempts to apply it are in progress at several sites,

33

VM provisioning (ONE)Virtual machine provisioning system tests

9513

101781

153225

297369

441 585657

729801

873945 1089

11611233

13051377

14491521

15931665

17371809

18811953

20252097

21692241

23132385

24572529

26012673

27452817

28892961

30333105

31773249

33213393

34653537

36093681

37533825

38973969

4041

0

2000

4000

6000

8000

10000

12000

14000

16000

18000

Time

Nod

es s

een

in L

SF

Peak of ~16,000 VMs

Page 34: Cloud Computing - KIT · 2011-09-07 · Cloud computing is certainly a nice new technology, and technology is maturing First attempts to apply it are in progress at several sites,

34

Is this already a cloud ???

Oh, well, maybe yes, it depends ...

Is this a cloud infrastructure

Page 35: Cloud Computing - KIT · 2011-09-07 · Cloud computing is certainly a nice new technology, and technology is maturing First attempts to apply it are in progress at several sites,

35

Is this a cloud ???

Yes, no, yes, depends ... maybe

Does it matter ?

Is this a cloud infrastructure

It has shares some features with the Cloud definition.

… as long as it does what we need

Page 36: Cloud Computing - KIT · 2011-09-07 · Cloud computing is certainly a nice new technology, and technology is maturing First attempts to apply it are in progress at several sites,

36

Virtual batch at CERN

In production since December 2010 at CERN

Page 37: Cloud Computing - KIT · 2011-09-07 · Cloud computing is certainly a nice new technology, and technology is maturing First attempts to apply it are in progress at several sites,

Batch virtualization: caveat

● The bulk of applications are single threaded.

● Therefore, a full virtualization of current CERN batch resources translates into

> 31,000 VMs in a single batch instance

The scalability challenge

Page 38: Cloud Computing - KIT · 2011-09-07 · Cloud computing is certainly a nice new technology, and technology is maturing First attempts to apply it are in progress at several sites,

38

Batch system tests: resource layer Up to 15,000 nodesUp to 400,000 jobs

Feasibility studies

More than 3x of what is officially

supported

Current production system

Page 39: Cloud Computing - KIT · 2011-09-07 · Cloud computing is certainly a nice new technology, and technology is maturing First attempts to apply it are in progress at several sites,

39

Batch system tests: job submissionUp to 15,000 nodesUp to 400,000 jobs

Feasibility studies

More than 3x of what is officially

supported

Current production system

Page 40: Cloud Computing - KIT · 2011-09-07 · Cloud computing is certainly a nice new technology, and technology is maturing First attempts to apply it are in progress at several sites,

40

Batch virtualization scalability

Simply virtualizing the batch farm, based on a 1Core=1+VM model and a flat batch farm is

unlikely to scale as needed

Need to do something more clever ...

Page 41: Cloud Computing - KIT · 2011-09-07 · Cloud computing is certainly a nice new technology, and technology is maturing First attempts to apply it are in progress at several sites,

Going cloud ...

Lxcloud production

CERN VMIC

CERNVM “Golden Nodes”

Quattor

ONE production master

ONE test master

ONE EC2 interface

Lxcloud - Test

Physical Resources

Image creation

VM provisioning

EnduserVO

ApplicationManager(lxbatch)

Page 42: Cloud Computing - KIT · 2011-09-07 · Cloud computing is certainly a nice new technology, and technology is maturing First attempts to apply it are in progress at several sites,

Public cloud access testsEC2 lxcloud Direct submission Regular GRID jobs

Event rate

Efficiency

Thanks to Daniel van der Ster

Page 43: Cloud Computing - KIT · 2011-09-07 · Cloud computing is certainly a nice new technology, and technology is maturing First attempts to apply it are in progress at several sites,

A more general IaaS model ?

Page 44: Cloud Computing - KIT · 2011-09-07 · Cloud computing is certainly a nice new technology, and technology is maturing First attempts to apply it are in progress at several sites,

Conclusions

● Cloud computing is certainly a nice new technology, and technology is maturing

● First attempts to apply it are in progress at several sites, to solve operational issues rather than to please users

● Virtualization plays a vital role in the applied models, and is already reality

● Application scalability is an issue for larger scale deployments

● Looking forward to an interesting future

Page 45: Cloud Computing - KIT · 2011-09-07 · Cloud computing is certainly a nice new technology, and technology is maturing First attempts to apply it are in progress at several sites,

45

Batch virtualization

Thank you!

Any questions ?