Transcript
Page 1: Building Scalable Scientific Applications using Makeflow

Building Scalable Scientific Applications using Makeflow

Dinesh Rajan and Douglas ThainUniversity of Notre Dame

Page 2: Building Scalable Scientific Applications using Makeflow

The Cooperative Computing Labhttp://nd.edu/~ccl

Page 3: Building Scalable Scientific Applications using Makeflow

The Cooperative Computing Lab• We collaborate with people who have large

scale computing problems in science, engineering, and other fields.

• We operate computer systems on the O(10,000) cores: clusters, clouds, grids.

• We conduct computer science research in the context of real people and problems.

• We develop open source software for large scale distributed computing.

3http://www.nd.edu/~ccl

Page 4: Building Scalable Scientific Applications using Makeflow

Science Depends on Computing!

AGTCCGTACGATGCTATTAGCGAGCGTGA…

Page 5: Building Scalable Scientific Applications using Makeflow

The Good News:Computing is Plentiful!

5

Page 6: Building Scalable Scientific Applications using Makeflow
Page 7: Building Scalable Scientific Applications using Makeflow

7

I have a standard, debugged, trusted application that runs on my laptop. A toy problem completes in one hour.A real problem will take a month (I think.)

Can I get a single result faster?Can I get more results in the same time?

Last year,I heard aboutthis grid thing.

What do I do next?

This year,I heard about

this cloud thing.

Page 8: Building Scalable Scientific Applications using Makeflow

Should I port my program to MPI or Hadoop?Learn C / JavaLearn MPI / HadoopRe-architectRe-writeRe-testRe-debugRe-certify

Page 9: Building Scalable Scientific Applications using Makeflow

What if my application looks like this?

Page 10: Building Scalable Scientific Applications using Makeflow

Our Philosophy:

• Harness all the resources that are available: desktops, clusters, clouds, and grids.

• Make it easy to scale up from one desktop to national scale infrastructure.

• Provide familiar interfaces that make it easy to connect existing apps together.

• Allow portability across operating systems, storage systems, middleware…

• Make simple things easy, and complex things possible.

• No special privileges required.

Page 11: Building Scalable Scientific Applications using Makeflow

I can get as many machineson the cluster/grid/cloud as I want!

How do I organize my applicationto run on those machines?

Page 12: Building Scalable Scientific Applications using Makeflow

Makeflow:A Portable Workflow System

Page 13: Building Scalable Scientific Applications using Makeflow

An Old Idea: Makefiles

13

part1 part2 part3: input.data split.py ./split.py input.data

out1: part1 mysim.exe ./mysim.exe part1 >out1

out2: part2 mysim.exe ./mysim.exe part2 >out2

out3: part3 mysim.exe ./mysim.exe part3 >out3

result: out1 out2 out3 join.py ./join.py out1 out2 out3 > result

Page 14: Building Scalable Scientific Applications using Makeflow

Makeflow = Make + Workflow

• Provides portability across batch systems.• Enable parallelism (but not too much!)• Fault tolerance at multiple scales.• Data and resource management.

14

Makeflow

Local Condor SGE WorkQueue

http://www.nd.edu/~ccl/software/makeflow

Page 15: Building Scalable Scientific Applications using Makeflow

Makeflow Language - Rules

• Each rule specifies:– a set of target files to

create;– a set of source

files needed to create them;

– a command that generates the target files from the source files.

part1 part2 part3: input.data split.py ./split.py input.data

out1: part1 mysim.exe ./mysim.exe part1 >out1

out2: part2 mysim.exe ./mysim.exe part2 >out2

out3: part3 mysim.exe ./mysim.exe part3 >out3

result: out1 out2 out3 join.py ./join.py out1 out2 out3 > result

out1 : part1 mysim.exemysim.exe part1 > out1

Page 16: Building Scalable Scientific Applications using Makeflow

You must stateall the files

needed by the command.

Page 17: Building Scalable Scientific Applications using Makeflow

PrivateCluster

CampusCondor

Pool

PublicCloud

Provider

CRCSGE

Cluster

Makefile

Makeflow

Local Files and Programs

Makeflow + Batch System

makeflow –T sge

makeflow –T condor

Work Queue

Work Queue

Page 18: Building Scalable Scientific Applications using Makeflow

Drivers

• Local• Condor• SGE• Batch• Hadoop• WorkQueue

• Torque• MPI-Queue• XGrid• Moab

Page 19: Building Scalable Scientific Applications using Makeflow

How to run a Makeflow

• Run a workflow locally (multicore?)– makeflow -T local sims.mf

• Clean up the workflow outputs:– makeflow –c sims.mf

• Run the workflow on Torque:– makeflow –T torque sims.mf

• Run the workflow on Condor:– makeflow –T condor sims.mf

Page 20: Building Scalable Scientific Applications using Makeflow

Example: Biocompute Portal

Generate Makefile

Makeflow

RunWorkflow

ProgressBar

Transaction Log

UpdateStatus

CondorPool

SubmitTasks

BLASTSSAHASHRIMPESTMAKER…

Page 21: Building Scalable Scientific Applications using Makeflow

Makeflow Documentationhttp://www.nd.edu/~ccl/software/makeflow/


Recommended