How the INDIGO-DataCloud computing platform aims at ... · How the INDIGO-DataCloud computing...

Preview:

Citation preview

HowtheINDIGO-DataCloudcomputingplatformaims

athelpingscientificcommunities

RIA-653549Giacinto DONVITO

INDIGOTechnicalDirectorINFNBari

giacinto.donvito@ba.infn.it

INDIGO-DataCloud

• AnH2020projectapprovedinJanuary2015intheEINFRA-1-2014call• 11.1M€,30months (fromApril2015toSeptember2017)

• Who:26Europeanpartnersin11Europeancountries• CoordinationbytheItalianNationalInstituteforNuclearPhysics(INFN)• Includingdevelopersofdistributedsoftware,industrialpartners,researchinstitutes,universities,e-infrastructures

• What:developanopensourceCloudplatform forcomputinganddata(“DataCloud”)tailoredtoscience.

• For:multi-disciplinaryscientificcommunities• E.g.structuralbiology, earthscience,physics,bioinformatics, culturalheritage,astrophysics,lifescience,climatology

• Where:deployableonhybrid(publicorprivate)Cloudinfrastructures• INDIGO=INtegratingDistributeddataInfrastructuresforGlobalExplOitation

• Why:answertothetechnologicalneedsofscientistsseekingtoeasilyexploitdistributedCloud/Gridcomputeanddataresources. 2

FromthePaper“AdvancesinCloud”

• ECExpertGroupReportonCloudComputing,http://cordis.europa.eu/fp7/ict/ssai/docs/future-cc-2may-finalreport-experts.pdf

To reach the full promises of CLOUD computing, major aspects have not yet beendeveloped and realised and in some cases not even researched. Prominent among theseare open interoperation across (proprietary) CLOUD solutions at IaaS, PaaS and SaaSlevels. A second issue is managing multitenancy at large scale and in heterogeneousenvironments. A third is dynamic and seamless elasticity from in- house CLOUD to publicCLOUDs for unusual (scale, complexity) and/or infrequent requirements. A fourth is datamanagement in a CLOUD environment: bandwidth may not permit shipping data to theCLOUD environment and there are many associated legal problems concerning securityand privacy. All these challenges are opportunities towards a more powerful CLOUDecosystem.[…] A major opportunity for Europe involves finding a SaaS interoperable solution acrossmultiple CLOUD platforms. Another lies in migrating legacy applications without losingthe benefits of the CLOUD, i.e. exploiting the main characteristics, such as elasticity etc.

3

INDIGOAddressesCloudGaps

• INDIGOfocusesonusecasespresentedbyitsscientificcommunities toaddressthegapsidentifiedbythepreviouslymentionedECReport,withregardto:• Redundancy/reliability• Scalability(elasticity)• Resourceutilization• Multi-tenancyissues• Lock-in• MovingtotheCloud• Datachallenges:streaming,multimedia,bigdata• Performance

• Reusingexistingopensourcecomponentswhereverpossibleandcontributingtoupstreamprojects (suchasOpenStack,OpenNebula,Galaxy,etc.)forsustainability.

4IntegratingdistributeddatainfrastructureswithINDIGO-DataCloud

INDIGOandotherEuropeanProjects• TheINDIGOservicesarebeingdevelopedaccordingtotherequirementscollectedwithinmanymultidisciplinaryscientificcommunities,suchasELIXIR,WeNMR,INSTRUCT,EGI-FedCloud,DARIAH,INAF-LBT,CMCC-ENES,INAF-CTA,LifeWatch-Algae-Bloom,EMSO-MOIST,EuroBioImaging.However,theyareimplementedsothattheycanbeeasilyreusedbyotherusercommunities.• INDIGOhasstrongrelationshipswithcomplementaryinitiatives,suchasEGI-EngageontheoperationalsideandAARCwithrespecttoAuthN/AuthZ policies.UsersofEC-fundedinitiativessuchasPRACE andEUDAT arealsoexpectedtobenefitfromthedeploymentofINDIGOcomponentsinsuchinfrastructures.• SeveralNational/Regionalinfrastructuresarecoveredbythe26INDIGOpartners,locatedin11Europeancountries.• INDIGOismentionedintherecentImportantProjectofCommonEuropeanInterest(IPCEI) fortheexploitationofHPCandHTCresourcesatnational,regionalandEuropeanlevels.

5IntegratingdistributeddatainfrastructureswithINDIGO-DataCloud

WorkPackages

6IntegratingdistributeddatainfrastructureswithINDIGO-DataCloud

INDIGO-DataCloudGeneralArchitecture

7

JSAGA/JSAGAAdaptorsFuture GatewayEngineFuture GatewayRESTAPI

OtherScienceGateways

Mobile Apps

OpenMobileToolkit

Ophidpiaplugin

LONIplugin

Taverna,Keplerplugin

AdminPortlets

UserPortlets

DataAnalitics

WorkflowPortlets

SGMonGUIClients

FutureGatewayPortal WorkflowsMobileclientsSupportservices

WP6Services

Kubernetes Cluster

IAM

Service

PaaS

Orchestrator

QoS/SLA

CloudProvider

Ranker

Monitoring

Infrastructure

Manager

TOSCA

TOSCAWP5

Services

Onedata Dynafed

FTSDataServices

REST/CDMI/Wedbav/posix/GridftpOIDC

Accounting

Non-INDIGO

IaaS

NativeIaaS API

Heat/IM

TOSCA

WP4Services

Mesos

ClusterMesos

Cluster

Aut.Scaling

Service

Storage

Service

S3/CDMI/Posix/WebdavGridFTP

Smart

Scheduling

SpotIstances

Native

Docker

QoS Support

Identity

Armonization

Local

Repository

IntegratingdistributeddatainfrastructureswithINDIGO-DataCloud

IaaSFeatures(1)

• Improvedschedulingforallocationofresources bypopularopensourceCloudplatforms,i.e.OpenStackandOpenNebula.• Enhancementswilladdressbothbetterschedulingalgorithmsandsupportforspot-instances.Thelatterareinparticularneededtosupportallocationmechanisms similartothoseavailableonpubliccloudssuchasAmazonandGoogle.

• Wewillalsosupportdynamicpartitioningofresourcesamong“traditionalbatchsystems”andCloudinfrastructures(forsomeLRMS).

• SupportforstandardsinIaaSresourceorchestrationengines throughtheuseoftheTOSCAstandard.• ThisovercomestheportabilityandusabilityproblemthatwaysoforchestratingresourcesinCloudcomputingframeworkswidelydifferamongeachother.

• ImprovedIaaSorchestrationcapabilities forpopularopensourceCloudplatforms,i.e.OpenStackandOpenNebula.• EnhancementswillincludethedevelopmentofcustomTOSCAtemplatestofacilitateresourceorchestrationforendusers,increasedscalabilityofdeployedresourcesandsupportoforchestrationcapabilitiesforOpenNebula.

8IntegratingdistributeddatainfrastructureswithINDIGO-DataCloud

IaaSFeatures(2)

• ImprovedQoS capabilitiesofstorageresources.• Bettersupportofhigh-levelstoragerequirementssuchasflexibleallocationofdiskortapestoragespaceandsupportfordatalifecycle.Thisisanenhancementalsowithrespecttowhatiscurrentlyavailableinpublicclouds,suchasAmazonGlacierandGoogleCloudStorage.

• Improvedcapabilitiesfornetworkingsupport.• EnhancementswillincludeflexiblenetworkingsupportinOpenNebula andhandlingofnetworkconfigurationsthroughdevelopmentsoftheOCCIstandardforbothOpenNebula andOpenStack.

• ImprovedandtransparentsupportforDockercontainers.• IntroductionofnativecontainersupportinOpenNebula,developmentofstandardinterfacesusingtheOCCIprotocoltodrivecontainersupportinbothOpenNebulaandOpenStack.

9IntegratingdistributeddatainfrastructureswithINDIGO-DataCloud

PaaSFeatures(1)

• ImprovedcapabilitiesinthegeographicalexploitationofCloudresources.• Endusersneednottoknowwhereresourcesarelocated,becausetheINDIGOPaaSlayerishidingthecomplexityofbothschedulingandbrokering.

• StandardinterfacetoaccessPaaSservices.• Currently,eachPaaSsolutionavailableonthemarketisusingadifferentsetofAPIs,languages,etc.INDIGOwillusetheTOSCAstandardtohidethesedifferences.

• SupportfordatarequirementsinCloudresourceallocations.• Resourcescanbeallocatedwheredataisstored.

• IntegrateduseofresourcescomingfrombothpublicandprivateCloudinfrastructures.• TheINDIGOresourceorchestratoriscapableofaddressingbothtypesofCloudinfrastructuresthroughTOSCAtemplateshandledateitherthePaaSorIaaSlevel.

10IntegratingdistributeddatainfrastructureswithINDIGO-DataCloud

PaaSFeatures(2)

• Distributeddatafederations supportinglegacyapplicationsaswellashighlevelcapabilitiesfordistributedQoS andDataLifecycleManagement.• ThisincludesforexampleremotePosix accesstodata.

• IntegratedIaaSandPaaSsupportinresourceallocations.• Forexample,storageprovidedattheIaaSlayerisautomaticallymadeavailabletohigher-levelallocationresourcesperformedatthePaaSlayer.

• Transparentclient-sideimport/exportofdistributedClouddata.• Thissupportsdropbox-likemechanismsforimportingandexportingdatafrom/totheCloud.ThatdatacanthenbeeasilyingestedbyCloudapplicationsthroughtheINDIGOunifieddatatools.

• Supportfordistributeddatacachingmechanismsandintegrationwithexistingstorageinfrastructures.• INDIGOstoragesolutionsarecapableofprovidingefficientaccesstodataandoftransparentlyconnectingtoPosix filesystemsalreadyavailableindatacenters.

11IntegratingdistributeddatainfrastructureswithINDIGO-DataCloud

PaaSFeatures(3)

• Deployment,monitoringandautomaticscalabilityofexistingapplications.• Forexample,existingapplicationssuchaswebfront-endsorR-Studioserverscanbeautomaticallyanddynamicallydeployedinhighly-availableandscalableconfigurations.

• Integratedsupportforhigh-performanceBigDataanalytics.• ThisincludescustomframeworkssuchasOphidia(providingahighperformanceworkflowexecutionenvironmentforBigDataAnalyticsonlargevolumesofscientificdata)aswellasgeneralpurposeenginesforlarge-scaledataprocessingsuchasSpark,allintegratedtomakeuseoftheINDIGOPaaSfeatures.

• Supportfordynamicandelasticclustersofresources.• ResourcesandapplicationscanbeclusteredthroughtheINDIGOAPIs.Thisincludesforexamplebatchsystemson-demand(suchasHTCondor orTorque)andextensibleapplicationplatforms(suchasApacheMesos)capableofsupportingbothapplicationexecutionandinstantiationoflong-runningservices.

12IntegratingdistributeddatainfrastructureswithINDIGO-DataCloud

AAIFeatures

• Provideanadvancedsetoffeaturesthatincludes:• Userauthentication(supportingSAML,OIDC,X.509)• Identityharmonization(linkheterogeneousAuthN mechanismstoasingleVOidentity)• ManagementofVOmembership(i.e.,groupsandotherattributes)• Managementofregistrationandenrolmentflows• ProvisioningofVOstructureandmembershipinformationtoservices• Management,distributionandenforcementofauthorizationpolicies

13IntegratingdistributeddatainfrastructureswithINDIGO-DataCloud

StorageQualityofServiceandtheCloud

14

Amazon S3 Glacier

Google Standard DurableReducesAvailability Nearline

HPSS/GPFS CorrespondstotheHPSSClasses(customizable)

dCache Resilient TAPEdisk+tape

IntegratingdistributeddatainfrastructureswithINDIGO-DataCloud

Nextstep:DataLifeCycle

15

• DataLifeCycleisjustthetimedependentchangeof• StorageQualityofService• OwnershipandAccessControl(PIOwned,noaccess,SiteOwned,Publicaccess)• Paymentmodel:Payasyougo;Payinadvanceforrestoflifetime.• Maybeotherthings

6m 1years 10years

IntegratingdistributeddatainfrastructureswithINDIGO-DataCloud

DataFederation

AmazonS3

DNS:p-aws-useast

INFNItaly

DockerOneclient

Docker

AWSUSA

DockerOnezone

VMonezone

DockerOneclient

Docker

NFSServer

VMoneprovider

VMnfs

VMoneclient

POSIXVolume

DockerOneclient

DockerUPVSpain

VM:demo-onedata-upv-provider

DockerOneclient

LaptopOSX

SAMBAExport

boot2docker

20IntegratingdistributeddatainfrastructureswithINDIGO-DataCloud

17

FrontendServices/Toolkit

IntegratingdistributeddatainfrastructureswithINDIGO-DataCloud

Integration schemas

• WeprovidethegraphicaluserinterfacesintheformofthescientificgatewaysandworkflowsandthewaytoaccesstheINDIGOPaaS servicesandsoftwarestack,andallowdefineandsetuptheon-demandinfrafortheWP2usecases.• Settingupwholeusecaseinfrastructure:Theadministratorwillbeprovidedwiththereadytousereceiptsthathewillbeabletocustomize.Thefinaluserswillbeprovidedwiththeserviceend-pointsandwillnotbeawareofthebackend.

• UsetheINDIGOfeaturesfromtheirownPortals: Usercommunities, havingtheirownScientificGatewaysetup,canexploittheFutureGateway RESTAPItodealwithINDIGOwholesoftwarestack.

• UseoftheINDIGOtoolsandportals, including theFutureGateway,ScientificWorkflowsSystems,BigDataAnalyticsFrameworks(likeOphidia),MobileApplications.InthisscenariothefinalusersaswellasdomainadministratorswillusetheGUItools.Theadministratorwilluseitasdescribedinfirstcase.Inadditiondomainspecificuserswillbeprovidedwithspecificportlets/workflows/apps thatwillallowgraphicalinteractionwiththeirapplicationsrunviaINDIGOsoftwarestack.

IntegratingdistributeddatainfrastructureswithINDIGO-DataCloud 18

FromCSGFtoFutureGateway

GridEngine

JSAGA

Portlet Portlet …

ClassicCSGF (before INDIGO)

Liferay/Glassfish

JSAGA

Portlet Portlet …

FutureGateway Approach (INDIGO)

Liferay/Tomcat

Comunication Portlet-GridEngine-JSAGAonly possiblewithJAVAlibraries

APIServer

Comunication Portlet-APIServerviaRESTAPIs,thisallowstoserveexternalapplicationsTheAPIServerinteractsviaJAVA librariestoJSAGA

RESTAPIs

Web/MobileApps

• ThesameRESTAPIscouldbeusedbyMobileApps

• ThoseAPIsmakeeasiertheinteractionwiththePaaS layer

• ThoseRESTAPIsprovideaneasyexploitationofINDIGOCapabilitiestonon-INDIGOApplications

IntegratingdistributeddatainfrastructureswithINDIGO-DataCloud19

Ophidia framework

• Ophidia isabigdataanalyticsframeworkforeScience• Primarilyusedfortheanalysisofclimatedata,exploitableinmultipledomains• “Datacube”abstractionandOLAP-basedapproachforbigdata• Supportforarray-baseddataanalysisandscientificdataformats• Parallelcomputingtechniquesandsmartdatadistributionmethods• ~100array-basedprimitivesand~50datacubeoperators

• i.e.:datasub-setting, dataaggregation,array-basedtransformations,datacube roll-up/drill-down,datacubeimport,etc.

IntegratingdistributeddatainfrastructureswithINDIGO-DataCloud 20

INDIGOmoduleforKepler

• TheKeplerscientificworkflowsystemisanopensourcetoolthatenablescreation,executionandsharingofworkflowsacrossabroadrangeofscientificandengineeringdisciplines.• FirstversionoftheINDIGOmoduledelivered,graduallyadded newfunctionalitiesavailablefortheusers.• INDIGOmodulebased ontheFutureGateway API• Atthemoment,itispossibletobuildworkflowsthatdefinetask,preparesinputsandtriggersexecution.WhileataskisexecutedwithinINDIGO'sinfrastructure,itispossibletocheckitsstatus.• FutureGateway APIclient: https://github.com/indigo-dc/indigoclient• Keplerbasedactors: https://github.com/indigo-dc/indigokepler

IntegratingdistributeddatainfrastructureswithINDIGO-DataCloud 21

22

Usecasesexamples

IntegratingdistributeddatainfrastructureswithINDIGO-DataCloud

Service Deployment andapplication execution

Integrating distributed data infrastructures with INDIGO-DataCloud 21

WearenowworkingonaddingaCaliconetworkconfiguration

Application execution tothe PaaS Layer

• TheINDIGOapproachtotheapplicationdistributionandexecutionis:• BasedonDocker• ExploitsMesos+Chronos• AlltheapplicationexecutionsaredescribedexploitingaTOSCATemplatesviasimpleAPIsorPortlets• Theinput/outputareautomaticallymanagedbythePaaS layer(viaOnedata andexternalendpoints)• Dependencies,retryonfailuressupportedbymeansofChronos• Geographicaldata-awareschedulingprovidebyINDIGOPaaS orchestrator

13IntegratingdistributeddatainfrastructureswithINDIGO-DataCloud

• chronos_job_1:• type:tosca.nodes.indigo.Container.Application.Docker.Chronos• properties:• schedule: 'R0/2015-12-25T17:22:00Z/PT5M'• description: 'Executeapp'• command: /bin/bashrun.sh• uris:[]• retries:3• environment_variables:• INPUT_ONEDATA_SPACE: {get_input: InputOnedataSpace}• INPUT_PATH: {get_input: InputPath}• ....• artifacts:• image:• file: indigodatacloud/ambertools_app• type:tosca.artifacts.Deployment.Image.Container.Docker• requirements:• - host:docker_runtime1

• chronos_job_upload:• type:tosca.nodes.indigo.Container.Application.Docker.Chronos• properties:• schedule: 'R0/2015-12-25T17:22:00Z/PT5M'• description: 'Uploadoutputdata'• command: /bin/bashrun.sh• retries:3

• environment_variables:• PROVIDER_HOSTNAME: <ONEDATA_PROVIDER_IP>• ONEDATA_TOKEN:<ROBOTToken>• ONEDATA_SPACE:<path>• INPUT_FILENAME: <inputfilename>• OUTPUT_FILENAME: <inputfilename-->coincindeswithamber-job-01

OUTPUT_FILENAME>• OUTPUT_PROTOCOL:http(s)|ftp(s)|S3|Swift|WebDav• OUTPUT_URL: <outputURL>• OUTPUT_CREDENTIALS: <e.g.username:password>• artifacts:• image:• file: indigodatacloud/jobuploader• type:tosca.artifacts.Deployment.Image.Container.Docker• requirements:• - host:docker_runtime1• - job_predecessor: chronos_job_1•• docker_runtime1:• type:tosca.nodes.indigo.Container.Runtime.Docker• capabilities:• host:• properties:• num_cpus: 0.5• mem_size: 512MB

• tosca_definitions_version:tosca_simple_yaml_1_0• imports:• - indigo_custom_types:https://raw.githubusercontent.com/indigo-dc/tosca-

types/master/custom_types.yaml• description:>• TOSCAexamplesforspecifyingaChronos Jobthatrunsanapplicationusing

Onedata storage.• inputs:• input_onedata_token:• type:string• description:Usertokenrequiredtomounttheuser'sINPUTOnedata space• required:yes• output_onedata_token:• type:string• description:Usertokenrequiredtomounttheuser'sOUTPUTOnedata space.

Itcanbethesameastheinputtoken• required:yes• #data_locality:• #type:boolean• #description:FlagthatcontrolstheINPUTdatalocality:ifyestheorchestrator

willselectthebestprovider,ifnotheuserhastospecifytheprovidertobeused• #required:yes• input_onedata_providers:• type:list• description:ListoffavoriteOnedata providerstobeusedtomounttheInput

Onedata space.Ifnotprovided,datalocalityalgo willbeapplied.• entry_schema:• type:string• default:['']• required:no• output_onedata_providers:• type:list• description:ListoffavoriteOnedata providerstobeusedtomountthe

OutputOnedata space.Ifnotprovided,thesameprovider(s)usedtomounttheinputspacewillbeused.

• entry_schema:• type:string• default:['']• required:no• input_onedata_space:• type:string

• required:yes• output_path:• type:string• description:PathtotheoutputdatainsidetheOutputOnedata space• required:yes• output_filenames:• type:list• description:Listoffilenamesgeneratedbytheapplicationrun• entry_schema:• type:string• required:yes• cpus:• type:float• description:AmountofCPUsforthisjob• required:yes• mem:• type:float• description:AmountofMemory(MB)forthisjob• required:yes• topology_template:• node_templates:• chronos_job:• type:tosca.nodes.indigo.Container.Application.Docker.Chronos• properties:• schedule:'R0/2015-12-25T17:22:00Z/PT5M'• name:'JOB_ID_TO_BE_SET_BY_THE_ORCHESTRATOR'• description:'Executeapp'• command:'/bin/bashrun.sh'• uris:[]• retries:3• environment_variables:• INPUT_ONEDATA_TOKEN: {get_input:input_onedata_token }• OUTPUT_ONEDATA_TOKEN: {get_input:output_onedata_token }• INPUT_ONEDATA_PROVIDERS: {get_input:input_onedata_providers }• OUTPUT_ONEDATA_PROVIDERS: {get_input:output_onedata_providers }• INPUT_ONEDATA_SPACE: {get_input:input_onedata_space }• ....• artifacts:• image:• file:indigodatacloud/ambertools_app• type:tosca.artifacts.Deployment.Image.Container.Docker

UC:Awebportalthat exploits abatch systemtorunapplications

• Ausercommunitymaintainsa“vanilla”versionofportalandcomputingimageplussomespecificrecipestocustomizesoftwaretoolsanddata• Portalandcomputingarepartofthesameimagethatcantakedifferentroles.• Customizationmayincludecreatingspecialusers,copying(andregisteringintheportal)referencedata,installing(andagainregistering)processingtools.• Typicallywebportalimagealsohasabatchqueueserverinstalled.

• Alltherunninginstancesshareacommondirectory.• Differentcredentials:end-userandapplicationdeployment.

13IntegratingdistributeddatainfrastructureswithINDIGO-DataCloud

UCInspiration:Galaxyonthecloud

• Galaxycanbeinstalledonadedicatedmachineorasafront/endtoabatchqueue.• Galaxyexposesawebinterfaceandexecutesalltheinteractions(includingdatauploading)asjobsinabatchqueue.• Requiresashareddirectoryamongtheworkingnodesandthefront/end.• Itsupportsaseparatestorageareafordifferentusers,managingthemthroughtheportal.

28IntegratingdistributeddatainfrastructureswithINDIGO-DataCloud

UC:Awebportalthat exploits abatchsystem torunapplications

1) Thewebportalisinstantiated,installedandconfiguredautomaticallyexploitingAnsible recipesandTOSCATemplates.

2) Aremoteposix shareisautomaticallymountedonthewebportalusingOnedata

3) Thesameposix shareisautomaticallymountedalsoonworkernodesusingOnedata

4) End-userscanseeandaccessthesamefilesviasimplewebbrowsersorsimilar.5) AbatchsystemisdynamicallyandautomaticallyconfiguredviaTOSCA

Templates6) Theportalisautomaticallyconfiguredinordertoexecutejobonthebatch

cluster7) Thebatchclusterisautomaticallyscaledup&downlookingatthejobloadon

thebatchsystem.IntegratingdistributeddatainfrastructureswithINDIGO-DataCloud 29

UC:UseCaseLifecycle

• Preliminary• Theusecaseadministratorcreatesthe“vanilla”imagesoftheportal+computingimage.• Theusecaseadministrator,withthesupportofINDIGOexperts,writestheTOSCAspecificationoftheportal,queue,computingconfiguration.

• Group-specific• Theusecaseadministrator,withthesupportofINDIGOexperts,writesspecificmodulesforportal-specificconfigurations.• Theusecaseadministratordeploysthevirtualappliance.

• Dailywork• UsersAccesstheportalasifitwaslocallydeployedandsubmitJobstothesystemastheywouldhavebeenprovisionedstatically.

30IntegratingdistributeddatainfrastructureswithINDIGO-DataCloud

UC:AGraphic Overview

IntegratingdistributeddatainfrastructureswithINDIGO-DataCloud

Future GatewayAPIServer

WP6

WP5

Front-EndPublic IP

Provider

User2)Deploy TOSCAwithVanilla VM/Container

1)StageData

5)Mount

6)AccessWebPortal

Galaxy

4)Install /Configure

WNWNWN …

VirtualElastic Cluster

Orchestrator

IM

OpenNebula

WP4

Other PaaSCore Services

CloudSite

OpenStack

HeatClues

IM

31

TOSCADocuments andDockerfiles perUseCase

INDIGO-DataCloudDocker Hub Organization

Champion+JRA

1.a.1)build,push

1.a.2)Dockerfile(commit)

1.b)AutomatedBuild

DetailedIaaS Status

CCR Workshop, 17/3/2016 Davide Salomoni 32

#nova-docker

•Docker driverforOpenStack Compute(nova).• 3rdpartycomponent,notfromINDIGO,wehavesentsomepatchesand bugfixes.• Deploymentscenario:• Installapackageinthedesiredcomputenodes.• Configurenovaandglancetobeawareofcontainers.

• ItwillbepackagedforLiberty.

34IntegratingdistributeddatainfrastructureswithINDIGO-DataCloud

#ONEDock

•Docker driverforOpenNebula.• ItrequiresaworkingOpenNebula 4installation.•Deploymentscenario:• InstallandconfigureONEDock inthefrontendnode.•ConfigurethecomputenodeswithDocker andONErequiredconfig.

34IntegratingdistributeddatainfrastructureswithINDIGO-DataCloud

#RepositorySynchronization

•SyncbetweenDocker Huborganizationandlocalregistry/catalogviawebhooks.•ModulesforbothOpenNebula andOpenStack.•StandalonecomponentwithRESTAPI.•Deploymentscenario:• Installapackageandconfigurethewebhooks.•Giveaccesstothesyncusertotheregistry/catalog.

34IntegratingdistributeddatainfrastructureswithINDIGO-DataCloud

#opie

• Preemptible InstancesextensionforOpenStack Compute(nova)•NeedsaworkingOpenStack Compute(nova)environment.• API,SchedulerandHostManager aspluggableexternalmodules.•Deploymentscenario:• Installationofapackage+a*manual*modification(i.e.applyingapatch)onthecomputeAPI.• Configurenova(i.e.nova.conf)tousethenewHostManager and• PreemtibleFilterScheduler.• ItwillbereleasedforLiberty.

34IntegratingdistributeddatainfrastructureswithINDIGO-DataCloud

#Synergy

• FairshareschedulingsupportforOpenStack.•NeedsaworkingOpenStack Compute(nova)environment.• Externalandstandalonecomponent,nomodificationsneeded.•Deploymentscenario:• Installsynergyasaseparateservice,configureittointeract withOpenStack.• ItwillbereleasedforLiberty(forthe1strelease).

34IntegratingdistributeddatainfrastructureswithINDIGO-DataCloud

#uDocker

•Runcontainersinbatchsystemsinuserspace,makingpossibleto downloadandruncontainersbynon-privilegedusers.• ItdoesnotrequireDocker atall.•Deploymentscenario:• Nodeployment.Itisasimplepythontool.

• ThishasbeentestedtorunsomecontainersinlocalHPCsystem(CSIC),withsuccess,soitmaybeusefulforEGIintheGridcontext.

34IntegratingdistributeddatainfrastructureswithINDIGO-DataCloud

#AAIforOpenStack

• TheOpenID ConnectsupporthasbeenimprovedintheKeystoneauthenticationclientlibraries(itdidnotworkoutofthebox).•Deploymentscenario:• Keystoneserverconfigurationneeded.• NothingrequiredattheCLI,changescontributedupstream(althoughnotreleasedyet).

34IntegratingdistributeddatainfrastructureswithINDIGO-DataCloud

#AAIforOpenNebula

•BasedontheTokenTranslationService(TTS).•Deploymentscenario:• Noinstallationrequiredatthesitelevel,onlyconfiguration (similartoPERUN).

34IntegratingdistributeddatainfrastructureswithINDIGO-DataCloud

INDIGOFAQ

• HowdoINDIGOachieveresourceredundancyandhighavailability?• Thisisachievedatmultiplelevels:

• atthedatalevel,redundancycanbeimplemented exploitingthecapabilityofINDIGO'sOnedata ofreplicatingdataacrossdifferentdatacenters.

• atthesitelevel, itispossibletoaskforcopiesofdatatobeforexampleonbothdiskandtapeusingtheINDIGOQoS storagefeatures.

• forservices,theINDIGOarchitectureusesMesos andMarathontoprovideautomaticservicehigh-availabilityandloadbalancing.Thisautomationiseasilyobtainableforstateless services; forstatefulservicesthisisapplication-dependent butitcannormallybeintegratedintoMesos through,forexample,acustomframework(examplesofwhichareprovidedbyINDIGO).

• HowdoINDIGOachieveresourcescalability?• Firstofall,wecandistinguishbetweenvertical(scaleup)andhorizontal(scaleout)scalability.INDIGOprovidesboth:• Mesos andMarathonhandleverticalscalabilitybydeployingDocker containerswithanincreasingamountofresources.

• TheINDIGOPaaS OrchestratorhandleshorizontalscalabilitythroughrequestsmadeattheIaaS leveltoaddresourceswhenneeded. 34

IntegratingdistributeddatainfrastructureswithINDIGO-DataCloud

INDIGOFAQ

• HowdoINDIGOachieveresourcescalability?• TheINDIGOsoftwaredoesthisinasmartway,i.e.forexampleitdoesnotlookatCPUloadonly:• InthecaseofadynamicallyinstantiatedLRMS,itchecksthestatusofjobsandqueuesandaccordinglyaddsorremovecomputingnodes.

• InthecaseofaMesos cluster,incasethereareapplicationstostartandtherenofreeresources,INDIGOstartsupmorenodes.ThishappenswithinthelimitsofthesubmittedTOSCAtemplates.Inotherwords,anygivenuserstayswithinthelimitsoftheTOSCAtemplatehehassubmitted;thisistruealsoforwhatregardsaccountingpurposes.

• Howdoyouknowwhenandwhereresourcesareavailable?• WeareextendingtheInformationSystemavailableintheEuropeanGridInfrastructure(EGI)toinformtheINDIGOPaaS orchestratorabouttheavailableIaaSinfrastructuresandabouttheservicestheyprovide.ItisthereforepossiblefortheINDIGOorchestratortooptimallychooseacertainIaaS infrastructuregiven,forexample,thelocationofacertaindataset.

35IntegratingdistributeddatainfrastructureswithINDIGO-DataCloud

Conclusions

• Firstofficialreleasewillbe:1st August

• Thefirstprototypeisalreadyavailable:• Notalltheservicesandfeaturesareavailable• Thisisforinternalevaluation,butalreadysomeservicescouldbetested

• Alotofimportantdevelopmentarebeingcarriedonwiththeoriginaldeveloperscommunitysothatthecodemantenance isnot(only)inourhands

43 IntegratingdistributeddatainfrastructureswithINDIGO-DataCloud

Thankyou

https://www.indigo-datacloud.euBetterSoftwareforBetterScience.

44IntegratingdistributeddatainfrastructureswithINDIGO-DataCloud

Recommended