Enabling Cloud Bursting for Life Sciences within Galaxy


Enis Afgan, Johns Hopkins University

Galaxy Team

Slides available at bit.ly/gxy-bursting

What is Galaxy?

•  A data analysis and integration tool

•  A (free for everyone) web service integrating a wealth of tools, compute resources, terabytes of reference data and permanent storage

•  Open source software that makes it simple to integrate your own tools and data and to customize Galaxy for your own site

Use it at usegalaxy.org or any of the other 60+ public servers, or run your own:

$ hg clone https://bitbucket.org/galaxy/galaxy-dist
$ cd galaxy-dist
$ sh run.sh

[Diagram: a Galaxy server with its /Tools, /Data, /Indices, a database (DB) and compute resources; several Galaxy instances; example analyses: RNA-Seq, Assembly, Quality Control (QC).]

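[Diagram: local vs. federated Galaxy. Through the Galaxy ObjectStore interface (DB, S3, Swift) and Pulsar, one Galaxy reaches a local site and remote sites A, B and C, each with its own tools, data and indices, while keeping artifact and job provenance. Example analyses: RNA-Seq, Assembly, QC. Remote sites include CloudMan clusters.]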
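The federation diagram rests on an object-store abstraction: Galaxy talks to one interface while datasets may live on local disk, S3 or Swift. The sketch below illustrates that idea only; `Store`, `MemoryStore` and `FederatedStore` are hypothetical names, not Galaxy's ObjectStore classes, and the in-memory backend stands in for a real disk or cloud store.

```python
from abc import ABC, abstractmethod
from typing import Dict, List


class Store(ABC):
    """Hypothetical minimal object-store interface (not Galaxy's ObjectStore API)."""

    @abstractmethod
    def exists(self, key: str) -> bool: ...

    @abstractmethod
    def get(self, key: str) -> bytes: ...

    @abstractmethod
    def put(self, key: str, data: bytes) -> None: ...


class MemoryStore(Store):
    """Stand-in for one backend (local disk, S3, Swift, ...), kept in memory."""

    def __init__(self) -> None:
        self._blobs: Dict[str, bytes] = {}

    def exists(self, key: str) -> bool:
        return key in self._blobs

    def get(self, key: str) -> bytes:
        return self._blobs[key]

    def put(self, key: str, data: bytes) -> None:
        self._blobs[key] = data


class FederatedStore(Store):
    """Reads from whichever backend holds a dataset; writes go to the first backend."""

    def __init__(self, backends: List[Store]) -> None:
        self.backends = backends

    def exists(self, key: str) -> bool:
        return any(b.exists(key) for b in self.backends)

    def get(self, key: str) -> bytes:
        for b in self.backends:
            if b.exists(key):
                return b.get(key)
        raise KeyError(key)

    def put(self, key: str, data: bytes) -> None:
        self.backends[0].put(key, data)
```

Callers only ever see `Store`, so adding another site means adding another backend to the list rather than changing the code that reads datasets.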

Focus on Cloud Bursting

•  Peak usage scenarios

•  Resource heterogeneity

•  Software licensing

•  Software installation restrictions

•  National cyber infrastructure resource access

•  Per-user, merit-based resource access

Burst Triggers

When?

•  Resource capacity

•  Job requirements

•  Data locality

•  System configuration

•  User preferences

Where?

•  Remote resource availability

•  Cost
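One way to read the triggers is as two questions asked per job: should it leave the local cluster at all ("When?"), and if so, which remote resource should take it ("Where?"). The sketch below combines the listed triggers under stated assumptions; every name, field and the `MAX_TRANSFER_GB` cut-off are hypothetical illustrations, not part of Galaxy.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Job:
    cores: int                    # job requirements
    input_gb: float = 0.0         # data locality: how much data would have to move
    data_site: str = "local"      # where the inputs currently live
    allow_remote: bool = True     # user preferences


@dataclass
class Site:
    name: str
    free_cores: int               # remote resource availability
    cost_per_core_hour: float     # cost
    has_tool: bool = True         # system configuration: tool installed or image available


MAX_TRANSFER_GB = 50.0  # hypothetical limit on data shipped off-site


def should_burst(job: Job, local_free_cores: int) -> bool:
    """'When?': burst only if local capacity is short and nothing pins the job locally."""
    if job.cores <= local_free_cores:
        return False  # resource capacity: the local cluster can take the job now
    if not job.allow_remote:
        return False  # user preference: keep the job local
    if job.data_site == "local" and job.input_gb > MAX_TRANSFER_GB:
        return False  # data locality: too much data to move
    return True


def pick_site(job: Job, sites: List[Site]) -> Optional[Site]:
    """'Where?': among sites able to run the job now, prefer the data's site, then the cheapest."""
    usable = [s for s in sites if s.has_tool and s.free_cores >= job.cores]
    if not usable:
        return None
    return min(usable, key=lambda s: (s.name != job.data_site, s.cost_per_core_hour))
```

For example, an 8-core job with only 4 free local cores bursts, and among sites `a` (0.10/core-hour), `b` (0.05) and `c` (too small), `pick_site` returns `b` unless the inputs already live at `a`.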

Burst Architecture

1.  Galaxy dynamic job destination framework

2.  Galaxy CloudMan cluster with Pulsar

3.  A job destination mapper function

[Diagram: within the Galaxy dynamic job destination framework, a mapper function f routes each job either to the local DRM or to one of several CloudMan/Pulsar clusters.]

Pulsar: a standalone job manager server for Galaxy

•  Can be deployed on dedicated or transient servers (even MS Windows!)

•  Handles data staging and remote job execution

[Diagram: the life of a Pulsar job: stage data, submit job, monitor job, send back the data.]
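The four steps of a Pulsar job can be mimicked on one machine to show the control flow. This is a minimal sketch, not Pulsar's API: `run_like_pulsar` and the example tool `count_lines` are hypothetical, a temporary directory stands in for the remote working directory, and a thread pool stands in for the remote job manager.

```python
import shutil
import tempfile
import time
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from typing import Callable, List


def run_like_pulsar(inputs: List[Path], tool: Callable[[List[Path], Path], List[Path]],
                    output_dir: Path, poll_interval: float = 0.05) -> List[Path]:
    """Walk through stage / submit / monitor / send back locally (hypothetical helper)."""
    with tempfile.TemporaryDirectory() as tmp:
        workdir = Path(tmp)
        # 1. Stage data: copy each input into the job's working directory.
        staged = [Path(shutil.copy(p, workdir)) for p in inputs]
        with ThreadPoolExecutor(max_workers=1) as pool:
            # 2. Submit job: hand the tool and its staged inputs to an executor.
            future = pool.submit(tool, staged, workdir)
            # 3. Monitor job: poll until it finishes.
            while not future.done():
                time.sleep(poll_interval)
            produced = future.result()  # re-raises any error from the tool
        # 4. Send back the data: copy outputs out before the working dir is removed.
        Path(output_dir).mkdir(parents=True, exist_ok=True)
        return [Path(shutil.copy(p, output_dir)) for p in produced]


def count_lines(paths: List[Path], workdir: Path) -> List[Path]:
    """Example 'tool': write the total line count of its inputs to one output file."""
    out = workdir / "line_count.txt"
    out.write_text(str(sum(len(p.read_text().splitlines()) for p in paths)))
    return [out]
```

Polling rather than waiting on a callback mirrors the "Monitor job" step, where the submitting side periodically asks for job status.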

1. Galaxy dynamic job destination framework

Define job execution properties

•  Runners: local, Slurm, HTCondor, DRMAA, Pulsar, …

•  Destinations: resource & job properties (e.g., DRM queue, wall time)
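The split between runners (how a job is executed) and destinations (a runner plus resource and job properties) can be modelled in a few lines, including a dynamic destination that defers the choice to a rule function. This is a hypothetical model for illustration, not Galaxy's internal code; the destination ids echo the mapper function shown later in the talk, and the parameter values are placeholders.

```python
from typing import Callable, Dict

# Hypothetical runner ids, following the runners listed on the slide.
RUNNERS = {"local", "slurm", "htcondor", "drmaa", "pulsar"}

# Each destination binds a runner to properties such as a DRM queue or wall time.
DESTINATIONS: Dict[str, dict] = {
    "drmaa_runner": {"runner": "drmaa", "params": {"queue": "normal", "walltime": "24:00:00"}},
    "pulsar_nectar_galaxy": {"runner": "pulsar", "params": {"url": "https://pulsar.example.org:8913/"}},
    "dynamic": {"runner": "dynamic", "params": {"function": "cloud_burst"}},
}


def resolve(dest_id: str, rules: Dict[str, Callable[[], str]]) -> dict:
    """Follow dynamic destinations until a concrete one is reached."""
    dest = DESTINATIONS[dest_id]
    while dest["runner"] == "dynamic":
        # A dynamic destination names a rule function that returns another destination id.
        dest_id = rules[dest["params"]["function"]]()
        dest = DESTINATIONS[dest_id]
    if dest["runner"] not in RUNNERS:
        raise ValueError(f"unknown runner {dest['runner']!r}")
    return {"id": dest_id, **dest}
```

Because the rule runs at dispatch time, the same job can land on the local DRM one hour and on a Pulsar server the next without any configuration change.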

2. CloudMan with Pulsar

A.  Launch a Galaxy on the Cloud instance

B.  Enable the Pulsar service

C.  Add the instance as a destination in the job config

Tool availability

•  Direct tool install

•  Docker images
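Step C amounts to a few lines of job configuration. The sketch below builds a minimal job_conf.xml with Python's `xml.etree.ElementTree`; it assumes the legacy XML layout (plugins, then destinations, with a dynamic destination of type `python` naming the rule function) and the DRMAA and Pulsar REST runner classes as I understand them. It omits handlers and Pulsar authentication (such as a private token), and the URL is a placeholder, so check the Galaxy job configuration documentation for your release before relying on it.

```python
import xml.etree.ElementTree as ET


def build_job_conf(pulsar_url: str) -> str:
    """Build a minimal legacy-style job_conf.xml with a local DRMAA and a Pulsar destination."""
    conf = ET.Element("job_conf")

    plugins = ET.SubElement(conf, "plugins")
    ET.SubElement(plugins, "plugin", id="drmaa", type="runner",
                  load="galaxy.jobs.runners.drmaa:DRMAAJobRunner")
    ET.SubElement(plugins, "plugin", id="pulsar", type="runner",
                  load="galaxy.jobs.runners.pulsar:PulsarRESTJobRunner")

    # Default to the dynamic destination so every job goes through the mapper function.
    dests = ET.SubElement(conf, "destinations", default="dynamic")
    dynamic = ET.SubElement(dests, "destination", id="dynamic", runner="dynamic")
    ET.SubElement(dynamic, "param", id="type").text = "python"
    ET.SubElement(dynamic, "param", id="function").text = "cloud_burst"

    # Local cluster destination.
    ET.SubElement(dests, "destination", id="drmaa_runner", runner="drmaa")

    # Step C: the CloudMan/Pulsar instance becomes just another destination.
    pulsar = ET.SubElement(dests, "destination", id="pulsar_nectar_galaxy", runner="pulsar")
    ET.SubElement(pulsar, "param", id="url").text = pulsar_url

    return ET.tostring(conf, encoding="unicode")
```

Adding a second cloud cluster would mean one more `destination` element and one more return value in the mapper function.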

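3. Job mapper function

Determine the job destination at runtime

    import pyslurm


    def cloud_burst():
        """Galaxy dynamic rule: return the id of the job destination to use."""
        # Ask Slurm for the state of every node in the local cluster.
        nodes_state = pyslurm.node().get()
        # Keep the nodes that report CPUs (the availability test used on this slide).
        available_nodes = [node for node in nodes_state.values()
                           if node['total_cpus'] > 0]
        # No local capacity: burst the job to the remote Pulsar destination.
        if not available_nodes:
            return 'pulsar_nectar_galaxy'
        # Otherwise run it on the local cluster through DRMAA.
        return 'drmaa_runner'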
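As written, the slide's check appears to count any node that reports CPUs as available, regardless of how busy it is. A variant that compares idle cores with what the job needs, and that takes the node snapshot as an argument so it can be tested without a Slurm cluster, might look like the sketch below. The `cpus` and `alloc_cpus` keys and the `cores_needed` parameter are assumptions for illustration; map them to whatever your pyslurm version actually returns.

```python
from typing import Dict, Mapping

LOCAL_DESTINATION = "drmaa_runner"
BURST_DESTINATION = "pulsar_nectar_galaxy"


def choose_destination(nodes_state: Mapping[str, Dict[str, int]], cores_needed: int = 1) -> str:
    """Pick a destination from a node-state snapshot (hypothetical 'cpus'/'alloc_cpus' keys)."""
    # Idle cores per node = total cores minus cores already allocated to jobs.
    idle = [n["cpus"] - n["alloc_cpus"] for n in nodes_state.values()]
    # Stay local if at least one node can fit the job right now; otherwise burst.
    if any(free >= cores_needed for free in idle):
        return LOCAL_DESTINATION
    return BURST_DESTINATION
```

In a Galaxy rule, `cloud_burst` would then only need to fetch the snapshot from Slurm and pass it to `choose_destination`.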

[Diagram (architecture slide revisited): the mapper function f returns the job destination, either the local DRM or one of the CloudMan/Pulsar clusters.]

An outcome?

[Chart: usegalaxy.org, per week from 2013-41 to 2015-03: jobs run to completion (count) and average wait time (minutes), annotated "Start bursting", "No job wait" and "More jobs".]
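Both series in this chart are simple weekly aggregates. The sketch below shows one way to compute them from job records; the `(submitted, started, state)` record layout is hypothetical, and measuring wait only for jobs that ended in state `"ok"` is an assumption about how the chart was built.

```python
from collections import defaultdict
from datetime import datetime
from typing import Dict, Iterable, Tuple

# Hypothetical job record: (submitted, started, final state).
JobRecord = Tuple[datetime, datetime, str]


def weekly_stats(jobs: Iterable[JobRecord]) -> Dict[str, Tuple[int, float]]:
    """Per ISO week: (jobs run to completion, average wait in minutes)."""
    waits = defaultdict(list)
    for submitted, started, state in jobs:
        if state != "ok":
            continue  # only jobs that ran to completion are counted
        year, week, _ = submitted.isocalendar()
        waits[f"{year}-{week:02d}"].append((started - submitted).total_seconds() / 60)
    return {wk: (len(w), sum(w) / len(w)) for wk, w in sorted(waits.items())}
```

With two completed jobs submitted in ISO week 2014-02 that waited 30 and 10 minutes, the result for that week is `(2, 20.0)`.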

An outcome?

[Chart: usegalaxy.org, per week from 2013-41 to 2015-03: jobs deleted while queued as a percentage of jobs submitted (y-axis 0% to 14%), labeled "User frustration level".]
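The "user frustration" metric is a ratio per week: jobs the user deleted before they ever started, divided by all jobs submitted that week. A minimal sketch, assuming a hypothetical `(submission date, ever started, deleted)` record for each job:

```python
from collections import Counter
from datetime import date
from typing import Dict, Iterable, Tuple

# Hypothetical record: (submission date, did the job ever start running?, was it deleted?).
Record = Tuple[date, bool, bool]


def deleted_while_queued_pct(records: Iterable[Record]) -> Dict[str, float]:
    """Per ISO week: percentage of submitted jobs deleted before they started."""
    submitted, deleted = Counter(), Counter()
    for day, started, was_deleted in records:
        year, week, _ = day.isocalendar()
        key = f"{year}-{week:02d}"
        submitted[key] += 1
        # Deleting a running job is not counted: only deletions while still queued.
        if was_deleted and not started:
            deleted[key] += 1
    return {k: 100.0 * deleted[k] / submitted[k] for k in sorted(submitted)}
```

For a week with four submissions, one deleted while queued and one deleted after it started running, the value is 25.0%.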