August 20th, Fermi Lab 1
Sebastien Goasguen – [email protected]
School of Computing
Clemson University, Clemson, SC
Scientific Associate at CERN
Summer 2009 and Summer 2010

Outline
• Cloud Basics
• Building a Cloud Provider
– Lxcloud @ CERN
• VOCs and Clouds
– Research done at Clemson

What is Cloud Computing?

A few references
"Above the Clouds: A Berkeley View of Cloud Computing", http://berkeleyclouds.blogspot.com/
"A break in the clouds: towards a cloud definition", L. M. Vaquero et al., SIGCOMM Computer Communication Review, 2008. http://portal.acm.org/citation.cfm?id=1496100
"An EGEE Comparative Study - Grid cloud comparative study", M-Elian Begin, 2009

On the Hype curve
• Now probably at the top of the Hype – Oct 09

Trendy…
• Source: http://www.google.com/trends

Cloud formation
• Slide adapted from Rich Wolski, UCSB

White House is going to the Cloud
• Reduce costs… See Apps.gov

DOE and NASA too (Check novacc.org)

Everything is moving to the Cloud… Stay on Earth though!
• http://contactdubai.com/tag/saas-software-or-storage-as-a-service

An “Old” idea: OSI / Anatomy of the Grid / Windows architectures…

What is the Cloud? The *aaS
• SaaS – Software as a Service
• PaaS – Platform as a Service
• IaaS – Infrastructure as a Service
• Service composition at all layers of a distributed system. Builds a system of systems
• Software and hardware reuse
• Tendency toward *aaS-itis, but these three are the main ones

Software as a Service

Sky is the limit…
• Phone apps… FermiVoice?

Platform as a Service

Infrastructure as a Service / Coming of age of virtualization

What is the Cloud? The *aaS
• SaaS – Software as a Service
– Easy access to hosted applications over the network, most likely using your browser
– API to these applications
• PaaS – Platform as a Service
– Environment to deploy new applications
– Restricted capabilities offered
– API to this platform and access to SaaS API
• IaaS – Infrastructure as a Service
– Access to hardware resources
– API to make resource allocation requests

Key Features
• You don't know what's behind it, but it works
– Transparency
• You pay for what you use
– Utility pricing
• You get what you ask for (on-demand)
– Read the fine print of the SLAs…
• It scales if you need more
– How far does it scale?
– Doesn't this mean the underlying resources are underutilized?

Why now? Evolution of the Mashup Revolution thanks to an API “explosion”

Why now?
• Big Internet companies faced a lot of data to analyze: web logs…
• Developed in house: new file system (Hadoop), new analysis framework (Map-Reduce)
• Massive amounts of resources all across the planet: >500,000 cores for Google?
• Greater need to consolidate: virtualization, energy costs.
• New devices: iPhone/G1
• A truly interconnected planet

A few interesting things… to stir the pot
• Industry is leading. Is academia behind?
• Who cares about standards? (>20 bodies working on cloud standards…)
• We should switch paradigm and rewrite applications once they are 6 months old.

Outline
• Cloud Basics
• Building a Cloud Provider
– Lxcloud @ CERN (in collaboration with Ulrich Schwickerath, Ewan Roche, Belmiro Moreira and Romain Wartel)
• VOCs and Clouds
– Research done at Clemson

IaaS level
• For consolidating services
– Used in IT for a while now
– FermiGrid services running in Xen VMs
• For offering on-demand services
– E.g. VO boxes, replace hardware requests
• For virtualizing large-scale services
– Clusters on demand
• Virtualization is a key enabler for IaaS

Batch Virtualization
Why virtualize “Batch”?
Run batch jobs within virtual machines
• Better application environment
– Custom made by user
• Increased security
• Better control on resource sharing
– Multi-core apps
• Increased flexibility on the admin side
– Can run a preferred OS on the metal

Batch Virtualization
How to virtualize “Batch”… smoothly?
Type 1: Run my jobs (in your VM)
Type 2: Run my jobs in my VM

Moving to the cloud:
Type 3: Give me my infrastructure, i.e. a VM or a batch of VMs

Deployment Models
Innovation in Cloud Computing Architectures

Main components / characteristics
Set of hypervisors
• Physical machines with a virtual machine monitor
• Xen or KVM... or Hyper-V... or VMware ESX...
VM provisioning system
• OpenNebula
• Nimbus
• Eucalyptus
• Platform ISF
• or even traditional schedulers like PBS/Maui.
Image distribution mechanism
• Shared file system (e.g. NFS, AFS, PVFS, Lustre...)
• Copy images (e.g. scp, wget, BitTorrent)
Networking
• Private/public bridged
• NAT

Thoughts for OSG… to stir the pot again
• Sites need to have hypervisors; that's a starting point. Without them there won't be OSG clouds.
• Which VMM/hypervisor they use does not matter… but my guess is that 80% will use KVM
• Which provisioning system they use is a matter of local technical setup, taste and relationships
• Sites can do this now
• The hard problem is in the image transfer and trust… See the HEPiX virtualization working group

CERN's LXCLOUD architecture
• Image repository with Golden nodes.
• VM instances are not Quattor-managed and have a finite lifetime
• Specific IP/MACs are pinned to hypervisors
• Currently testing two provisioning systems: OpenNebula and Platform ISF.

Provisioning system
OpenNebula and Platform ISF are currently being evaluated. Results shown in this talk were obtained with OpenNebula.
OpenNebula, out of the Complutense University of Madrid
• C/C++ core with Ruby drivers and command-line interface
• MySQL and SQLite backends
• Uses ssh for communication between front end and hosts
• XML-RPC API
• Support for LVM contributed by CERN
• Enables hybrid clouds (i.e. instantiation on remote cloud providers)
• Implements a subset of the EC2 interface as well as the upcoming OCCI interface for the public cloud interface.

Comparison with Similar Technologies
OpenNebula - Architecture, Current Status & Roadmap

Feature                        Platform ISF  VMware vSphere  Eucalyptus  Nimbus     OpenNebula
Virtualization Management      VMware, Xen   VMware          Xen, KVM    Xen        Xen, KVM, VMware
Virtual Network Management     Yes           Yes             No          Yes        Yes
Image Management               Yes           Yes             Yes         Yes        Yes
Service Contextualization      No            No              No          Yes        Yes
Scheduling                     Yes           Yes             No          No         Yes
Administration Interface       Yes           Yes             No          No         Yes
Hybrid Cloud Computing         No            No              No          No         Yes
Cloud Interfaces               No            vCloud          EC2         WSRF, EC2  EC2 Query, OGF OCCI, vCloud
Flexibility and Extensibility  Yes           No              Yes         Yes        Yes
Open Source                    No            No              GPL         Apache     Apache

CERN's LXCLOUD details
• Logarithmic scp and BitTorrent image distribution have been implemented
• Hypervisors run utilities to detect which VMs they are allowed to run and which images they need to download
• OpenNebula triggers instantiation via ssh
• Instances are based on LVM snapshots

Image Distribution
Push:
• Sequential SCP
• Logarithmic SCP (scp-wave)
• http://code.google.com/p/scp-wave/
Pull:
• wget via an HTTP-based repository (locally)
• BitTorrent (Romain Wartel, Belmiro Moreira @ CERN)
Shared FS
• NFS
• PVFS, Lustre...
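To make the difference between sequential and logarithmic push concrete, here is a minimal sketch of a wave-style schedule. It is not the scp-wave implementation: it assumes a simplified model in which every host that already holds the image copies it to exactly one new host per round and all copies take the same time. The function names and host names are hypothetical.

```python
def sequential_rounds(n_hosts):
    """Sequential push: the repository copies to one host after another."""
    return n_hosts


def wave_schedule(seed, hosts):
    """Return a list of rounds; each round is a list of (source, target) copies.

    Assumed model: in every round, each host that already has the image
    sends it to one host that does not have it yet.
    """
    have = [seed]            # hosts that already hold the image
    pending = list(hosts)    # hosts still waiting for the image
    rounds = []
    while pending:
        transfers = []
        for src in have:
            if not pending:
                break
            transfers.append((src, pending.pop(0)))
        # Targets only become sources in the next round.
        have.extend(dst for _, dst in transfers)
        rounds.append(transfers)
    return rounds


if __name__ == "__main__":
    targets = [f"hv{i}" for i in range(7)]
    print(sequential_rounds(len(targets)), "rounds sequentially")
    print(len(wave_schedule("repo", targets)), "rounds in waves")
```

Under this model the number of sources doubles every round, so 7 hypervisors are served in 3 rounds instead of 7.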

Image distribution results (thanks to Belmiro)

Guiding the provisioning
• Define policies to compose the batch farm
• Automate the provisioning of the virtual machines such that the policies are enforced.
• E.g. inspect the job queue and deduce the best composition of the batch farm, in terms of SMP VMs, OS...
• A sizer is used to monitor the pool of VM instances and evaluate the policies.
• Currently only one policy: "Keep the pool full with the proper shares of VM types"
• See ICAC 2010 and CCGRID 2009 papers
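The single policy above ("keep the pool full with the proper shares of VM types") can be sketched as follows. This is an illustration only, not the sizer used at CERN: the function name, the integer weights used to express shares, and the VM type names are all assumptions.

```python
def vms_to_start(pool_size, weights, running):
    """Decide how many VMs of each type to start to refill the pool.

    pool_size: total number of VM slots in the pool
    weights:   {vm_type: integer weight} expressing the desired shares
    running:   {vm_type: number of instances currently running}
    """
    total_weight = sum(weights.values())
    free_slots = max(pool_size - sum(running.values()), 0)
    plan = {}
    for vm_type, weight in weights.items():
        # Target count for this type, rounded down to whole VMs.
        target = pool_size * weight // total_weight
        deficit = max(target - running.get(vm_type, 0), 0)
        # Never start more VMs than there are free slots.
        start = min(deficit, free_slots)
        plan[vm_type] = start
        free_slots -= start
    return plan


if __name__ == "__main__":
    # Hypothetical pool of 10 slots with a 3:1 share between two VM types.
    print(vms_to_start(10, {"typeA": 3, "typeB": 1}, {"typeA": 4, "typeB": 2}))
```

A sizer loop would call such a function periodically, after expired instances have been removed from the running counts.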

Autonomic Provisioning Results

Early results of the sizer

Joining the Batch system... A contextualization problem...
CONTEXT = [
  vmid = "$VMID",
  TTL = "3",
  AFS = "off",
  files = "/opt/vmimage/init.sh /opt/vmimage/etchosts /opt/vmimage/etcsysconfigifcfg /opt/vmimage/id_rsa.pub /opt/vmimage/lsfcontext.conf /opt/vmimage/etcsysconfignetwork",
  target = "xvdb" ]
• Files and variables are stored in an ISO created on the fly.
• Startup script mounts this ISO and runs the contextualization script.
• VMs are set up as dynamic hosts in the LSF pool.
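When many VM types share the same contextualization layout, a section like the one above can be generated rather than written by hand. The helper below is a hypothetical sketch, not part of OpenNebula: it only renders text in the same shape as the CONTEXT section on the slide, using a shortened file list for brevity.

```python
def render_context(variables, files, target):
    """Render a CONTEXT section in the style shown on the slide.

    variables: {name: value} pairs made available inside the VM
    files:     paths to be packed into the contextualization ISO
    target:    block device under which the ISO appears in the guest
    """
    attrs = [f'{name} = "{value}"' for name, value in variables.items()]
    attrs.append('files = "' + " ".join(files) + '"')
    attrs.append(f'target = "{target}"')
    return "CONTEXT = [\n  " + ",\n  ".join(attrs) + " ]"


if __name__ == "__main__":
    print(render_context(
        {"vmid": "$VMID", "TTL": "3", "AFS": "off"},
        ["/opt/vmimage/init.sh", "/opt/vmimage/lsfcontext.conf"],
        "xvdb",
    ))
```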

Scalability Tests... 7,500 slots in LSF via OpenNebula

Testing LSF scalability

IaaS at Clemson
• The really easy way:
– KVM on a regular HPC cluster
– NAT networking (every VM gets its own NAT)
– Base image on NFS server
– KVM snapshot mode creates a temporary disk in scratch; the disk is discarded once the instance is shut down
– Submit VMs as PBS jobs
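# Base image on the NFS server; -snapshot leaves it unmodified
IMAGE=/home/sebgoa/kvm/star5.img
# Temporary snapshot disk is written to node-local scratch
export TMPDIR=/local_scratch
kvm -hda "$IMAGE" -net nic,model=e1000 -net user -m 1280 -snapshot -nographic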
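Submitting VMs as PBS jobs means wrapping the command above in a job script. The sketch below generates such a script; the kvm options are the ones from the slide, while the job name, the resource request, the walltime and the function names are assumptions chosen for illustration.

```python
import shlex


def kvm_argv(image, mem_mb=1280):
    """Build the kvm command line from the slide as an argument list."""
    return ["kvm", "-hda", image,
            "-net", "nic,model=e1000", "-net", "user",
            "-m", str(mem_mb), "-snapshot", "-nographic"]


def pbs_script(image, scratch="/local_scratch", walltime="01:00:00"):
    """Return the text of a PBS job script that boots one VM."""
    lines = [
        "#!/bin/bash",
        "#PBS -N kvm-vm",                 # hypothetical job name
        "#PBS -l nodes=1",                # assumed: one node per VM
        f"#PBS -l walltime={walltime}",   # assumed VM lifetime
        f"export TMPDIR={shlex.quote(scratch)}",
        shlex.join(kvm_argv(image)),
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(pbs_script("/home/sebgoa/kvm/star5.img"), end="")
```

The generated text could be saved to a file and handed to the scheduler's submit command; the VM then lives for at most the requested walltime.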

IaaS at Clemson
• But…
– No shared FS between VMs
– Looks like each VM has the same IP
– Can't use regular job management systems to run jobs in those VMs (need a glidein/proxy-like solution)
• This setup has been one of the key drivers for our development of Kestrel: an XMPP-based job management system

Kestrel
• A job management framework using the XMPP protocol
• Started as a student project
• Uses Instant Messaging concepts of notifications
• Practical in adverse network conditions
http://wiki.github.com/legastero/Kestrel/
https://twiki.grid.iu.edu/bin/view/CampusGrids/InstallingKestrel

Booting VMs is extremely fast (20 VMs/sec)

STAR Success with Clemson IaaS and Kestrel
• “But to simulate the equivalent sample of 12.2 Billion Monte-Carlo events with ~10 Million accepted by event triggering after full event reconstruction, we would have taken 3 years at BNL on 50 machines. This Monte-Carlo event generation would essentially not have been done. With the resources from cloud, we took 3-4 weeks.” – Jerome Lauret, BNL.

Conclusions
• The Cloud is here; let's hope it gets sunny
• API explosion opens up possibilities
• Focusing on the IaaS layer, LXCLOUD and Clemson's clusters have been developed/enhanced to provision VMs.
• Great scalability with OpenNebula
• KVM shows great promise, especially with the snapshot mode
• Performance will get even better
• May need specialized job management systems to make use of clouds across multiple sites

Thanks to NSF, DOE and OSG
Thanks to Lance Stout, Mike Murphy, Michael Fenn, Linton Abraham and all the other students…
Thanks to CERN and the IT/PES-PS group
Thanks to Jerome Lauret, Matthew Walker
Questions?: [email protected]
http://cirg.cs.clemson.edu

Outline
• Cloud Basics
• Building a Cloud Provider
– Lxcloud @ CERN (in collaboration with Ulrich Schwickerath, Ewan Roche, Belmiro Moreira and Romain Wartel)
• VOCs and Clouds
– Research done at Clemson

VOC: Virtual Organization Cluster (JGC + FGCS papers)

Why VOCs aka Clouds?
• Observation that what people want is resources with their own OS/apps and central scheduling: pilot job frameworks.
• A cloud is a cluster over WAN
• Therefore there is a need for
– A way to request/start the nodes
– A way to create a virtual network
– A way to run jobs in them
• Very similar to glideinWMS, but the pilots ask to start VMs

Multi-Site Overlay (ICAC 2010)

VOC Implementation
• Multiple configurations:
– Type 1: Shared head node on physical cluster, VO is unaware of VOC (e.g. LXCLOUD)
– Type 2: VO provides virtual head nodes on multiple grid sites.
– Type 3: VO uses an overlay network with a single head node (e.g. STAR).

Type 1: Implementation
• KVM vs. Xen for ease of use
• Normal cluster utilities/techniques
• NFS share
• And PVFS setup
• KVM offers a snapshot mode that gives us the ability to use a single image file. Writes are temporary

Load-Driven Provisioning (CCGRID09)
• Dynamic provisioning is done via a watchdog on the VOC head node
• The watchdog monitors incoming jobs on the OSG gatekeeper (the Condor job manager is used)
• When jobs are in the local scheduler queue, the watchdog starts a VM on a physical host (static mapping between host and guest currently). XML-RPC system
• When a VM starts, Condor inside the VM starts and advertises its presence to the central manager -> jobs run.
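The watchdog decision described above can be sketched as a pure function. This is an illustration under stated assumptions, not the Clemson watchdog: it assumes one VM is started per queued job, takes the static host-to-guest mapping as a dictionary, and uses hypothetical host and guest names. Querying the Condor queue and the XML-RPC call that actually boots a guest are left out.

```python
def plan_vm_starts(queued_jobs, host_to_guest, running_guests):
    """Pick (host, guest) pairs to boot for the jobs waiting in the queue.

    queued_jobs:    number of idle jobs in the local scheduler queue
    host_to_guest:  static mapping {physical host: its guest VM}
    running_guests: set of guest VMs that are already up
    """
    # Only hosts whose statically mapped guest is not running can be used.
    idle = [(host, guest) for host, guest in host_to_guest.items()
            if guest not in running_guests]
    # Assumed policy: one new VM per queued job, limited by idle hosts.
    return idle[:max(queued_jobs, 0)]


if __name__ == "__main__":
    mapping = {"node1": "vm1", "node2": "vm2", "node3": "vm3"}
    print(plan_vm_starts(2, mapping, {"vm1"}))
```

A real watchdog would run this in a loop and would also shut guests down again once the queue has drained, which is what makes the cluster size respond to load.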

Experimental Results
• Engage VO on OSG
• Site Clemson-Birdnest on OSG Production
• Cluster size responds to load; simulation results confirm (pending IPDPS paper; simulator simVOC available at http://cirg.cs.clemson.edu/software/simvoc)
From: ACAT 2010, February 22-27th Jaipur/India
Engage VO on OSG