W6 Exercises Workflows - doc.grid.surfsara.nl

EXERCISES:

BACKGROUND

Workflow management systems offer a useful abstraction layer that makes it
easier to run distributed computations on e-infrastructures. These systems
let you define programs as “workflow components”, which can then be linked
to each other to perform more complex operations. Each component can
potentially run on a different (type of) infrastructure, for example the
grid, a cloud, a local cluster or a server.

WS-PGRADE is a web portal (or science gateway) for creating and running
workflows with the gUSE workflow management system, which coordinates the
execution of jobs on various types of infrastructure. The goal of these
practical exercises is to gain some basic hands-on experience with this
system by creating and executing simple workflows on the local VM cluster
and on the Grid.

BASIC USAGE OF WS-PGRADE:

o Download and set up a new VM with the WS-PGRADE portal installed.
    Get the new VM from here
    Import the image in VirtualBox
o Download and install your certificate on the new VM.
    If you need help, check section 2 of the Quick Start Guide (Lecture 16)
    tutorial.
    You do not need to request tutor VO membership again, unless you have
    not done so yet.
o Start a Firefox session in the new VM and open the WS-PGRADE web page
  using this link:
    http://grid-mooc:8080/liferay-portal-6.1.0/
o Sign in at the top left corner using the following account:
    Email Address: mooc@grid-mooc.test
    Password: mooc

There are two ways to get your proxy on the portal:

A)

• Open a terminal, create a non-vomsified proxy and upload it to the
  MyProxy server. To do this, type:
    $ voms-proxy-init
    $ myproxy-init -l <username> -c 0 -t 100
  Replace <username> with any name you like. You will be asked to enter
  your Grid pass phrase (the password that was sent to you with your
  certificate) and a MyProxy pass phrase, which you will use later in the
  WS-PGRADE portal.
• Sign in to the WS-PGRADE portal and go to Security -> Certificate
• Download the proxy you uploaded to the MyProxy server. Fill in the form
  with the following values:
    o Hostname: px.grid.sara.nl
    o Port: 7512
    o Login: the <username> you typed in the previous myproxy-init command
    o Password: the MyProxy pass phrase you typed in the previous
      myproxy-init command
    o Lifetime: 100 hours
    o Press the Download button
    o Press the Associate to VO button
    o Choose tutor and press OK

From now on you can access grid computing and storage resources from the
portal as a member of the tutor VO.
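Before uploading a proxy it can be useful to check that it is still valid
for long enough. The following is only a small sketch of such a check; the
function name proxy_has_hours and the 12-hour threshold are assumptions made
for illustration and are not part of the course material.

```shell
#!/bin/sh
# Sketch: check that a proxy still has enough lifetime left.
# $1: seconds left on the proxy, $2: minimum number of hours required.
proxy_has_hours() {
    [ "$1" -ge $(( $2 * 3600 )) ]
}

# Example with a made-up value: 100 hours left, 12 hours needed.
if proxy_has_hours 360000 12; then
    echo "proxy is valid long enough"
else
    echo "create a new proxy"
fi

# On the VM the first argument could come from the proxy itself, e.g.
#   proxy_has_hours "$(voms-proxy-info --timeleft)" 12
# (assumption: voms-proxy-info is installed and supports --timeleft).
```

If the check fails, repeat the voms-proxy-init and myproxy-init steps above.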

B) Alternatively, you can upload your .p12 certificate to the portal.

• Sign in to the WS-PGRADE portal and go to Security -> Certificate ->
  Upload
• Then download the proxy as described in A).

3.

In this exercise you will perform the following steps (details below):

a) Create a script (to be executed by the job)
b) Create an abstract workflow (graph)
c) Create a concrete workflow
d) Configure the workflow (specify the script to run and the inputs for the
   workflow)

a) Create script

• This simple script is the same one used in the practice sessions on grid
  and cluster job execution: download hello-script.sh to your VM.
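The content of hello-script.sh is not reproduced in this handout. To give a
rough idea of what such a job script might look like, here is a minimal
sketch. The function name say_hello and the exact messages are assumptions;
the real course script may differ.

```shell
#!/bin/sh
# Hypothetical stand-in for hello-script.sh (the real script may differ).
# The string from the portal's "Parameter" box arrives as the first argument.

say_hello() {
    # Fall back to "world" when no argument is given.
    name="${1:-world}"
    echo "Hello ${name}!"
    # Report where the job ran (uname -n prints the node's host name).
    echo "Running on host: $(uname -n)"
}

say_hello "$@"
```

Running it locally with an argument, for example "sh hello-script.sh grid",
is a quick way to check the script before uploading it to the portal.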

b) Create abstract workflow (graph)

• Go to Workflow -> Graph
• Choose Graph Editor
• In the Graph Editor window:
    o Create a job (Graph -> New Job)
    o Configure the name and description of the job and of the graph
      (right-click, Properties)
    o Save the graph (Graph -> Save As)
• In the browser window, press Refresh
• The new graph should appear in the list


c) Create concrete workflow

• Go to Workflow -> Create Concrete
• Select a graph
• Fill in a name for the workflow
• Press Ok (you should see “workflow has been created successfully”)

d) Configure the concrete workflow

• Go to Workflow -> Concrete
• Find the newly created workflow in the list and press Configure
• Click on the job
• Go to the Job Executable tab
• Click on pbs as the Type (this makes the job run on the VM ‘cluster’)
• For “Executable code of binary”: press Browse and upload the file with
  the script (hello-script.sh)
• Enter a string in the “Parameter” box (it is passed as an argument to
  the script and used in the “hello” message)
• Click on the green “Tick” sign
• You should see: “Save accepted within the browser. Do not forget to
  upload the modified configuration on server!”
• Close the Configure window (top right)
• Save the concrete workflow: press the diskette icon (it should report
  success). Press Ok
• Click the green left arrow

From now on this workflow can be executed.


4.

a) Submit workflow
b) Monitor workflow execution
c) Check the output generated by the workflow

a) Submit workflow

• Find the workflow in the list and make sure it is configured
• Press the Submit button, then press Yes (you should see “The workflow
  has been submitted”), then press Yes

b) Monitor workflow execution

• Press the Details button
• Press Details again (this shows more information about the individual
  jobs, in this case only one)
• Press View finished or View all contents (this shows more details of the
  selected jobs, finished or all)
• Press std. Output or std. Error to see the messages written by the
  program

c) Check the output

• The results of the “Hello” job are written to the standard output file,
  which can be inspected by pressing the std. Output button in the
  detailed view of the jobs

5.

a) Reconfigure the workflow (using gLite instead of pbs)
b) Run the workflow again (see exercise 4)

a) Reconfigure the existing concrete workflow

• Find the workflow in the list and press Configure
• Choose gLite as the Type (this makes the job run on the Grid)
• Choose tutor next to Grid
• Browse for hello-script.sh and enter a string in the “Parameter” box
• Save the concrete workflow (press the diskette icon). Press Ok

b) Submit the workflow as before and check its status from time to time.
The execution might take a while. You can retrieve the results only once
the job is shown as finished (or error).


6.

a) Create script
b) Create abstract workflow (graph)
c) Create concrete (as in exercise 3)
d) Configure concrete
e) Run workflow (as in exercise 4 or 5)
f) Retrieve output files and study their content

a) Create script

This script reads one text input file (inputFile) and receives an argument
($1) to generate an output file (outputFile): download
hello-script-ports.sh to your VM.
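The handout does not show the content of hello-script-ports.sh. The sketch
below illustrates the pattern described above (read inputFile, use $1, write
outputFile) so that it can be tried locally first. The function name
process_ports and the text written to outputFile are assumptions; the real
script may differ.

```shell
#!/bin/sh
# Hypothetical stand-in for hello-script-ports.sh (the real script may differ).
# Reads ./inputFile, combines it with the first argument, writes ./outputFile.

process_ports() {
    message="${1:-hello}"
    if [ ! -r inputFile ]; then
        echo "inputFile not found in $(pwd)" >&2
        return 1
    fi
    {
        echo "Argument: ${message}"
        echo "Input file content:"
        cat inputFile
    } > outputFile
}

# Local dry run in a scratch directory, mimicking the files the portal stages.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
echo "some input text" > inputFile
process_ports "greetings"
cat outputFile
```

The file names inputFile and outputFile matter: they are the internal file
names that are later entered for the input and output ports.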


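b) Create abstract workflow (graph)

• Go to Workflow -> Graph
• Start the Graph Editor
• In the Graph Editor window:
    o Create a job (Graph -> New Job, or use the icon)
    o Create two ports, one input and one output (select the job, then
      Graph -> New port)
    o Save the graph (Graph -> Save As)
• In the browser window, press Refresh
• The new graph should appear in the list

c) Create concrete workflow (as in exercise 3)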


d) Configure the concrete workflow

• Find the newly created workflow in the list and press Configure
• Choose pbs or gLite as the Type
• Go to the Job Executable tab
• For “Executable code of binary”: press Browse and upload the file
  containing the script: hello-script-ports.sh
• Enter a string in the “Parameter” box (it is passed as an argument to
  the script)

The following steps configure the inputs of the job:

• Go to the Job I/O tab
• Click on the input port (a configuration form opens)
• Fill in the box Input Port’s Internal File Name with inputFile (this is
  the name of the input file in the script)
• Click on the big Pi icon as the Source of input directed to this port
• Type a string in the box (it is copied into a file named inputFile and
  sent to the job)

The following steps configure the outputs of the job:

• Go to the Job I/O tab
• Click on the output port (a configuration form opens)
• Fill in the box Output Port’s Internal File Name with outputFile (this
  is the name of the output file generated by the script)

• Save the concrete workflow (press the diskette icon)


e) Run workflow

f) Retrieve output files

• Go to Storage -> Local
• Find the workflow of interest and press get Outputs
• The archive contains the files and logs of each job (look deep into the
  directories; there are several layers of nesting).
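Once the archive has been unpacked, a recursive listing saves some clicking
through the nested directories. The sketch below is one possible way to do
this; the function name list_outputs and the directory and file names in the
demo are made up and do not reflect the portal's actual layout.

```shell
#!/bin/sh
# Sketch: list every regular file under an unpacked output archive so that
# deeply nested job outputs and logs are easy to spot.

list_outputs() {
    # Print paths relative to the archive root, sorted for readability.
    ( cd "$1" && find . -type f | sort )
}

# Demo with a fake, nested layout (hypothetical names).
root=$(mktemp -d)
mkdir -p "$root/workflow/job/outputs/0"
echo "Hello" > "$root/workflow/job/outputs/0/stdout.log"
: > "$root/workflow/job/outputs/0/stderr.log"
list_outputs "$root"
```

On the VM, call list_outputs with the directory into which you unpacked the
downloaded archive.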

7.

This exercise is similar to exercise 6, but here two components are linked
so that the data is processed first by one and then by the other.
The resulting workflow should be executed on pbs and on the Grid.

Steps:

a) Create graph: similar to exercise 6. The graph should have two
   components, each with one input and one output port. The output port of
   component 1 should be linked to the input port of component 2 (click on
   the output port and drag to the input port to connect them). Use
   meaningful names for the ports and components, because these will
   appear in the configuration GUI later.
b) Create concrete
c) Configure each job separately, as in exercise 6. Use the same script in
   both jobs, but different strings in the Parameter field. (Note that you
   cannot configure the input port of the second component, because the
   content of its input file is passed on automatically from the first
   component.)
d) Run and monitor the workflow
e) Retrieve output files
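What the link between the two components does can be imitated locally: the
outputFile of the first job becomes the inputFile of the second. The sketch
below is only an illustration under that assumption; step() is a hypothetical
stand-in for the script used in both jobs, and the two scratch directories
play the role of the jobs' working directories.

```shell
#!/bin/sh
# Local sketch of two linked jobs. step() is a hypothetical stand-in for the
# script used by both components; it reads inputFile and writes outputFile.

step() {
    # $1: directory acting as the job's working directory
    # $2: string that would be entered in the Parameter field
    ( cd "$1" && { cat inputFile; echo "processed by: $2"; } > outputFile )
}

job1=$(mktemp -d)
job2=$(mktemp -d)

echo "initial data" > "$job1/inputFile"
step "$job1" "first"

# The link in the graph: output port of job 1 feeds the input port of job 2.
cp "$job1/outputFile" "$job2/inputFile"
step "$job2" "second"

cat "$job2/outputFile"
```

The final output file carries the traces of both jobs, which is also what
you should be able to recognise in the files retrieved from the portal.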

8.

In this exercise you will create a workflow that splits a file into
smaller chunks, runs a job for each chunk, and then merges the results
back into a single file. This can be achieved using special types of input
and output ports (Generator and Collector).

As an example, consider the scripts splitLines.sh, task.sh and merge.sh,
and the input file uploadInput.txt.

Run the example both on pbs and on the Grid.
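The split, process and merge pattern can be tried out locally before
building the workflow. The sketch below is an illustration only: the three
functions are hypothetical stand-ins for splitLines.sh, task.sh and merge.sh
(whose real content is not shown here), and the upper-casing done in task()
is an arbitrary example of per-chunk work.

```shell
#!/bin/sh
# Local sketch of the split -> task -> merge pattern. The three functions are
# hypothetical stand-ins for splitLines.sh, task.sh and merge.sh.

split_lines() {   # $1: input file, $2: directory for the chunks
    # One chunk per line; chunk files are named chunk_aa, chunk_ab, ...
    split -l 1 "$1" "$2/chunk_"
}

task() {          # $1: chunk file; writes <chunk>.out
    # Example work: upper-case the chunk.
    tr '[:lower:]' '[:upper:]' < "$1" > "$1.out"
}

merge() {         # $1: directory with *.out files, $2: merged result
    cat "$1"/chunk_*.out > "$2"
}

work=$(mktemp -d)
printf 'alpha\nbeta\ngamma\n' > "$work/uploadInput.txt"

split_lines "$work/uploadInput.txt" "$work"
for chunk in "$work"/chunk_??; do
    task "$chunk"      # in the workflow, each chunk is handled by its own job
done
merge "$work" "$work/merged.txt"
cat "$work/merged.txt"
```

In the workflow, the Generator output port takes the place of the chunk
files and the Collector input port takes the place of the final cat.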

9.

In this exercise you will run a workflow for several values on one of the
input ports (parameter sweep). This causes one job to be executed for each
input value in the list.

The same scripts and the same graph are used as in exercise 8. You need to
create a new concrete workflow and then configure it to run for a list of
files instead of a single one.

This is done by configuring the input port with a file that has the
special name paramInputs.zip.

Once this works, change the parameters, make a new archive and run the
workflow again.

Run the example both on pbs and on the Grid.
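Preparing the archive by hand for every run gets tedious, so a small helper
can be used. The sketch below is an illustration, not the course's own
procedure: the function name make_param_inputs is made up, and it ASSUMES
that the portal expects the files inside paramInputs.zip to be named 0, 1,
2, ...; please verify this against the WS-PGRADE documentation.

```shell
#!/bin/sh
# Sketch: stage one input file per parameter value and pack them into
# paramInputs.zip. ASSUMPTION: the archive members are named 0, 1, 2, ...

make_param_inputs() {   # $1: staging directory, remaining args: values
    dir="$1"; shift
    i=0
    for value in "$@"; do
        echo "$value" > "$dir/$i"
        i=$((i + 1))
    done
    # Pack only if the zip tool is installed (-q keeps it quiet).
    if command -v zip >/dev/null 2>&1; then
        ( cd "$dir" && zip -q paramInputs.zip [0-9]* )
    fi
}

stage=$(mktemp -d)
make_param_inputs "$stage" "first value" "second value" "third value"
ls "$stage"
```

To change the parameters, call the function again with a fresh staging
directory and upload the new paramInputs.zip.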

10.

Repeat exercise 8, this time fetching the input file (uploadInput.txt)
from Grid storage:

• Go to Storage -> LFC -> Get LFC Hosts -> List LFC Host content -> Upload
  the file uploadInput.txt
• Go to Concrete -> select the workflow ‘splitmerge’ -> Configure
    o For the ‘Split’ job: set Type to ‘glite’ to run on the Grid
    o Job I/O: splitInput -> Source of input directed to this port:
      Remote -> lfn:/grid/tutor/<path to your file>
    o Run the workflow

ADDITIONAL EXERCISES:

• Workflows

Choose two grid workflow management systems (WfMS) and find information
about them on websites, in papers, or through personal communication.

As examples of possible systems to choose from, you may use the list in
Lecture 27. However, there are many others, so feel free to study others
that better match your own interests and application case.

Based on your study of this documentation, briefly answer the following
questions for these two WfMS:

1) Can the WfMS submit jobs using gLite middleware? Which other middleware
   does the system support?
2) How are workflows described to this WfMS? Is there a language or
   graphical interface for the developer?
3) How are the workflows executed in this system? Does the user start
   them manually, or are the workflows started automatically somehow?
4) How is the monitoring of workflow execution done? Is there a graphical
   interface? Can the user see the intermediate status and the results?
5) How does the WfMS handle errors? Does it retry failed jobs? Does it
   abort the workflow execution when errors occur? What type of
   information does the WfMS show to the user about the errors that
   occurred?

• Science Gateways

Search the web for science gateways in your own area of interest, for
example in astronomy or grid computing. Suggestion: use the pointers in
Lecture 30. Then answer the following questions for one of them:

1) What can one do on this gateway? For example, access data, run
   predefined applications, etc.
2) Who can access the gateway? Is it open, or restricted to a closed
   community?
3) What do you need to bring to the gateway to be able to use it? For
   example, your own data, a grid certificate, payment, etc.
4) Which team or organization maintains the gateway?
5) Would this science gateway actually help your research in some way?
   Why (not)?

SOLUTIONS:

A)

Open a terminal in your VM and type:

    $ voms-proxy-init
    $ myproxy-init -l <username> -c 0 -t 100

o Sign in to the portal here: http://grid-mooc:8080/liferay-portal-6.1.0/
o Go to Security -> Certificate
o Download your proxy:
o Download hello-script.sh to your VM
  Download hello-script-ports.sh to your VM
  Download linkedData1.sh, linkedData2.sh
