The Foundation API: How does it work?


How It Runs...

• At the DE:

• The job is started and runs the Foundation API App with the information provided in json#1 (created in TITO) and the inputs provided by the user.

• As it runs, the App sends a message to TACC. The message tells the Foundation API Application at TACC which application to run, which TACC system to run it on, where the application and its wrapper script are located in iRODS, and which settings or arguments to pass to the wrapper script.

• At TACC:

• The Foundation API Application at TACC runs and builds a bash run script that runs the job on the SGE queue. It builds the script with the help of the application’s wrapper script and json#2, which resides within the FAPI system at TACC. The run script specifies where the input data is in iRODS, where the outputs should be put in iRODS after the job, and what settings to pass to the specific bioinformatics application being run.

An Example: Newbler2.6

runAssembly -o outputname -m -force -large -cpu 1 inputReads.sff

runAssembly -o "${OUTNAME}" -m -force -large -cpu "${CPU}" \
    -a "${MIN_CONTIG_SIZE}" -l "${LARGE_CONTIG_SIZE}" "${OTHER}" "${INPUT_F}"

An Example: Newbler2.6: The first test of the wrapper script

#!/bin/bash

#$ -V                      # Inherit the submission environment
#$ -cwd                    # Start job in submission directory
#$ -N newblertest          # Job Name
#$ -j y                    # Combine stderr and stdout
#$ -o $JOB_NAME.o$JOB_ID   # Name of the output
#$ -pe 1way 12             # Requests 1task/node, 12 cores total
#$ -q development          # Queue name "development"
#$ -l h_rt=01:00:00        # Run time (hh:mm:ss) - 1.0 hours
#$ -M [email protected]   # Use email notification address
#$ -m be                   # Email at Begin and End of job
set -x                     # Echo commands, use "set echo" with csh

#MODE=${mode}
#INPUT="${inputSeqs}"
#OUTNAME="${outputName}"
#OUTFORM=${outputFormat}
INPUT="/iplant/home/rogerab/data/sequencing1/FFGLB5S04.sff"
OUTNAME="NewblerOut"
CPU=12
MIN_CONTIG_SIZE=100
LARGE_CONTIG_SIZE=500

module purge
module load TACC
module swap intel gcc
module load irods

Iinit:password
wait

#Copy from iRODS
iget -fT "${INPUT}"
wait
INPUT_F=$(basename ${INPUT})

/work/01685/rogerab/bin3/bin/runAssembly -o "${OUTNAME}" -m -force -large -cpu "${CPU}" \
    -a "${MIN_CONTIG_SIZE}" -l "${LARGE_CONTIG_SIZE}" "${OTHER}" "${INPUT_F}"

An Example: Newbler2.6: The final form of the wrapper script

CONTENTS OF newbler_wrapper.sh

INPUT="${inputSeqs}"
OUTNAME="${outputName}"
CPU="${cpu}"
MIN_CONTIG_SIZE="${min_contig_size}"
LARGE_CONTIG_SIZE="${large_contig_size}"
OTHER="${other}"

#Copy Input File from iRODS
iget -fT "${INPUT}"
wait
INPUT_F=$(basename ${INPUT})

chmod a+x runAssembly
chmod a+x createProject
chmod a+x addRun
chmod a+x newbler
chmod a+x newMapping
chmod a+x runMapping
chmod a+x runProject
chmod a+x newAssembly
chmod a+x stopRun

runAssembly -o "${OUTNAME}" -m -force -large -cpu "${CPU}" -a "${MIN_CONTIG_SIZE}" -l "${LARGE_CONTIG_SIZE}" "${OTHER}" "${INPUT_F}"
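The placeholders in newbler_wrapper.sh (for example ${inputSeqs} and ${cpu}) use the ids declared in the json, and the actual run script shown later in this deck carries the same assignments with concrete values. One way to picture that step is plain text substitution. The sketch below is an assumption about the mechanism, not the Foundation API's own code; the function name fill_template and the sample values are hypothetical.

```shell
#!/bin/bash
# Hypothetical sketch: replace ${id} placeholders in a wrapper template
# with job values. It illustrates the idea only; it is not FAPI code.
fill_template() {
  local text="$1"
  shift
  local pair key value needle
  for pair in "$@"; do
    key="${pair%%=*}"       # text before the first "="
    value="${pair#*=}"      # text after the first "="
    needle='${'"$key"'}'    # the literal placeholder, e.g. ${cpu}
    # Quoting $needle makes bash match it literally, not as a glob pattern.
    text=${text//"$needle"/$value}
  done
  printf '%s\n' "$text"
}

# A two-line excerpt of a wrapper template (single quotes keep ${...} literal).
template='INPUT="${inputSeqs}"
CPU="${cpu}"'

fill_template "$template" \
  "inputSeqs=/iplant/home/someuser/reads.sff" \
  "cpu=1"
```

Each key=value argument stands for one input or parameter supplied with the job; any placeholder without a matching argument is left untouched.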

An Example: Newbler2.6: The json for the Lonestar Foundation API Application

CONTENTS OF appN.json

{
  "name": "newbler",
  "parallelism": "SERIAL",
  "version": "2.6",
  "helpURI": "https://pods.iplantcollaborative.org/wiki/display/DEapps/Newbler",
  "label": "Newbler 2.6",
  "shortDescription": "Newbler, genome assembler",
  "longDescription": "Genome assembler for 454 sequencing reads",
  "author": "Roger Barthelson",
  "datePublished": "03/20/12",
  "tags": [ "assembler", "NGS", "454", "Roche" ],
  "ontology": [ "http://sswapmeet.sswap.info/sequenceServices/SequenceServices" ],
  "executionHost": "lonestar4.tacc.teragrid.org",
  "executionType": "HPC",
  "deploymentPath": "/iplant/home/rogerab/applications/newbler2.6/bin",
  "templatePath": "newbler_wrapper.sh",
  "testPath": "test/newblerwrapper.sh",
  "checkpointable": "true",
  "modules": [ "purge", "load TACC", "load irods" ],

An Example: Newbler2.6: The json for the Lonestar Foundation API Application

CONTENTS OF appN.json (continued)

  "inputs": [
    {
      "id": "inputSeqs",
      "value": {
        "default": "",
        "validator": "",
        "visible": true,
        "required": true
      },
      "details": {
        "label": "Sequences:",
        "description": "Sequence file in SFF or fasta format"
      },
      "semantics": {
        "ontology": [ "http://sswapmeet.sswap.info/sequence/FASTA" ],
        "minCardinality": 1,
        "maxCardinality": 1,
        "fileTypes": [ "fasta-0" ]
      }
    }
  ],

An Example: Newbler2.6: The json for the Lonestar Foundation API Application

CONTENTS OF appN.json (continued)

  "parameters": [
    {
      "id": "cpu",
      "value": {
        "default": "1",
        "type": "string",
        "validator": "",
        "required": true,
        "visible": true
      },
      "details": {
        "label": "number of threads",
        "description": "Specify the number of cores to be used",
        "visible": true
      },
      "semantics": {
        "ontology": [ "xs:string" ]
      }
    },
    {
      "id": "min_contig_size",
      "value": {
        "default": "200",
        "validator": "",
        "required": false,
        "visible": true,
        "type": "string"
      },
      "details": {
        "label": "minimum contig size",
        "description": "Specify the minimum contig size to be output.",
        "visible": true
      },
      "semantics": {
        "ontology": [ "xs:string" ]
      }
    },

An Example: Newbler2.6: The json for the Lonestar Foundation API Application

CONTENTS OF appN.json (continued)

{
  "name": "newbler",
  "parallelism": "SERIAL",
  "version": "2.6",
  "helpURI": "https://pods.iplantcollaborative.org/wiki/display/DEapps/Newbler",
  "label": "Newbler 2.6",
  "shortDescription": "Newbler, genome assembler",
  "longDescription": "Genome assembler for 454 sequencing reads",
  "author": "Roger Barthelson",
  "datePublished": "03/20/12",
  "tags": [ "assembler", "NGS", "454", "Roche" ],
  "ontology": [ "http://sswapmeet.sswap.info/sequenceServices/SequenceServices" ],
  "executionHost": "lonestar4.tacc.teragrid.org",
  "executionType": "HPC",
  "deploymentPath": "/iplant/home/rogerab/applications/newbler2.6/bin",
  "templatePath": "newbler_wrapper.sh",
  "testPath": "test/newblerwrapper.sh",
  "checkpointable": "true",
  "modules": [ "purge", "load TACC", "load irods" ],
  "inputs": [ ],
  "parameters": [ ]
}

Where is the application?

Where is the json file?

• I don’t know.

• It seems to be entered into a database of information held by the Foundation API application on the TACC side, e.g. on Lonestar. The actual file name doesn’t matter.

• How does it get there?

• curl -X POST -sku "user:password" -F "[email protected]" https://foundation.iplantc.org/apps-v1/apps

• The response should be the contents of the json file. That means it liked your json.

Test the Application on TACC with the Test Application

• https://foundation.iplantcollaborative.org/iplant-test/

• Your App name is the “id” you entered into the json (the "name" field in appN.json), plus the version number.

• Example: id:newbler, version:2.6, becomes newbler-2.6

• What to do:

• Log in with your iplant user id and password

• Find your App under Apps Service>Shared Apps

• Get a job submission form, fill it out, submit it!

• Monitor the results under Job Service
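The naming rule from the list above (id plus version, joined by a hyphen) can be written out as a one-line helper. This is only an illustration of the rule as stated on the slide; the function name app_identifier is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: build the App name shown in the Test Application
# from the id and version entered in the json.
app_identifier() {
  local id="$1" version="$2"
  printf '%s-%s\n' "$id" "$version"
}

app_identifier newbler 2.6   # prints: newbler-2.6
```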

The Test Application with the Apps service section shown.

The Test Application: job submission form for newbler.

Summary of the TACC Portion of the Foundation API

• The json loaded into the TACC Foundation API application is central.

• The json tells the application where everything is: the application, the wrapper script, and what inputs and settings to look for.

• The wrapper script passes the main arguments to the application.

• The input files are in the Data Store.

Actual Run Script for Newbler Through the Test Application on Lonestar

#!/bin/bash

#$ -N newbler4-2_15
#$ -cwd
#$ -V
#$ -o newbler4-2_15$JOB_ID.out
#$ -e newbler4-2_15$JOB_ID.err
#$ -l h_rt=01:00:00
#$ -A TG-MCB110022
#$ -pe 12way 24
#$ -q largemem

curl -k https://foundation.iplantc.org/apps-v1/trigger/job/1625/token/3c6abe00-d9a7-49b6-972d-7706d2779151/status/RUNNING

cd /scratch/0004/iplant/rogerab/job-1625-newbler4-2_15/newbler2.6

# Environmental settings for newbler-2.6:

module purge
module load TACC
module load irods

INPUT="/iplant/home//rogerab/data/sequencing1/FFGLB5S04.sff"
OUTNAME="NewblerOutDir"
CPU="1"
MIN_CONTIG_SIZE="200"
LARGE_CONTIG_SIZE="500"
OTHER=""

#Copy from iRODS
iget -fT "${INPUT}"
wait
INPUT_F=$(basename ${INPUT})

runAssembly -o "${OUTNAME}" -m -force -large -cpu "${CPU}" -a "${MIN_CONTIG_SIZE}" -l "${LARGE_CONTIG_SIZE}" "${OTHER}" "${INPUT_F}"

curl -k https://foundation.iplantc.org/apps-v1/trigger/job/1625/token/3c6abe00-d9a7-49b6-972d-7706d2779151/status/FINISHED

curl -k https://foundation.iplantc.org/apps-v1/trigger/job/1625/token/3c6abe00-d9a7-49b6-972d-7706d2779151/status/ARCHIVING

imkdir -p /iplant/home/rogerab/archive/jobs/job-1625-newbler4-2_15
for i in `find . -maxdepth 1`; do
    exists=`grep -x "$i" .iplant.archive`
    if [ ! -n "$exists" ]; then
        iput -v -f -r $i /iplant/home/rogerab/archive/jobs/job-1625-newbler4-2_15
    fi
done

curl -k https://foundation.iplantc.org/apps-v1/trigger/job/1625/token/3c6abe00-d9a7-49b6-972d-7706d2779151/status/ARCHIVING_FINISHED
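The run script reports progress by calling a trigger URL four times, with the status changing from RUNNING to FINISHED, ARCHIVING, and ARCHIVING_FINISHED. The sketch below only rebuilds those URLs from their visible parts so the pattern is easier to see. It makes no network call; the function name status_url and the token value are hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: compose the status callback URL used in the run script.
# It prints the URL only and does not contact the service.
status_url() {
  local job_id="$1" token="$2" status="$3"
  local base="https://foundation.iplantc.org/apps-v1/trigger"
  printf '%s/job/%s/token/%s/status/%s\n' "$base" "$job_id" "$token" "$status"
}

# The four statuses appear in this order in the run script.
for status in RUNNING FINISHED ARCHIVING ARCHIVING_FINISHED; do
  status_url 1625 "example-token" "$status"
done
```

In the generated script each of these URLs is passed to curl -k at the matching point in the job.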

The Discovery Environment Side of the Foundation API

• Key arguments in the TACC json and in the wrapper script need to be entered by way of the App in the DE

• The Foundation API Application is the tool that is run by the DE (foundational_api_adapter.pl)

• The interface for the App is designed by you in TITO.

The Discovery Environment Side of the Foundation API

• The Foundation API Application has some of its own arguments that it requires for setting up the run at TACC:

• Application ID (appid)

• Maximum Memory (maxMemory)

• Estimated Run Time (requestedTime)

• Job Size (processorCount)

• These are in your json at TACC also, and should be preserved in the precise syntax used here (and in the following examples)

In TITO: foundation_api_adapter.pl is the application you are integrating.

In TITO: Your arguments are ordered in a way similar to their appearance in the json at TACC.

In TITO: The inputs are whatever your application may need. These are the files selected from the Data Store when you set up a run with the application. Note the format!

In TITO: The options are whatever settings your application uses. Note the format!

In TITO: The run options are whatever TACC needs to set up the run. For memory, use a setting below 1000 (GB) for the normal queue, or a setting of 1000 to tell TACC to run the job on the largemem queue. Note the format!
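The memory rule can be read as a small decision: 1000 selects largemem, anything below 1000 selects the normal queue. The sketch below encodes just that rule. The function name choose_queue is hypothetical, and treating values above 1000 as an error is an assumption, since the slide does not say what happens to them.

```shell
#!/bin/bash
# Hypothetical helper: map the maxMemory setting (GB) to a queue name,
# following the rule stated on the slide.
choose_queue() {
  local max_memory="$1"
  if [ "$max_memory" -eq 1000 ]; then
    echo "largemem"
  elif [ "$max_memory" -gt 0 ] && [ "$max_memory" -lt 1000 ]; then
    echo "normal"
  else
    # Assumption: values outside 1..1000 are rejected.
    echo "unsupported maxMemory: $max_memory" >&2
    return 1
  fi
}

choose_queue 24     # prints: normal
choose_queue 1000   # prints: largemem
```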

Actual Run Script for Newbler Through the Test Application on Lonestar

#!/bin/bash

#$ -N newbler4-2_15
#$ -cwd
#$ -V
#$ -o newbler4-2_15$JOB_ID.out
#$ -e newbler4-2_15$JOB_ID.err
#$ -l h_rt=01:00:00
#$ -A TG-MCB110022
#$ -pe 1way 24
#$ -q largemem

curl -k https://foundation.iplantc.org/apps-v1/trigger/job/1625/token/3c6abe00-d9a7-49b6-972d-7706d2779151/status/RUNNING

cd /scratch/0004/iplant/rogerab/job-1625-newbler4-2_15/newbler2.6

# Environmental settings for newbler-2.6:

module purge
module load TACC
module load irods

INPUT="/iplant/home//rogerab/data/sequencing1/FFGLB5S04.sff"
OUTNAME="NewblerOutDir"
CPU="1"
MIN_CONTIG_SIZE="200"
LARGE_CONTIG_SIZE="500"
OTHER=""

#Copy from iRODS
iget -fT "${INPUT}"
wait
INPUT_F=$(basename ${INPUT})

runAssembly -o "${OUTNAME}" -m -force -large -cpu "${CPU}" -a "${MIN_CONTIG_SIZE}" -l "${LARGE_CONTIG_SIZE}" "${OTHER}" "${INPUT_F}"

curl -k https://foundation.iplantc.org/apps-v1/trigger/job/1625/token/3c6abe00-d9a7-49b6-972d-7706d2779151/status/FINISHED

curl -k https://foundation.iplantc.org/apps-v1/trigger/job/1625/token/3c6abe00-d9a7-49b6-972d-7706d2779151/status/ARCHIVING

imkdir -p /iplant/home/rogerab/archive/jobs/job-1625-newbler4-2_15
for i in `find . -maxdepth 1`; do
    exists=`grep -x "$i" .iplant.archive`
    if [ ! -n "$exists" ]; then
        iput -v -f -r $i /iplant/home/rogerab/archive/jobs/job-1625-newbler4-2_15
    fi
done

curl -k https://foundation.iplantc.org/apps-v1/trigger/job/1625/token/3c6abe00-d9a7-49b6-972d-7706d2779151/status/ARCHIVING_FINISHED

The time limit is specified on TACC runs to help conserve resources and manage the queue. The maximum time is 24 h, but providing options encourages users to request less time when they do not think they need it. Note the format!

The number of processors needed is also specified on TACC runs to help conserve resources. What can be used effectively is an important consideration. A serial application such as Newbler, i.e. one that does not use MPI for multiprocessing, will not benefit from large numbers of processors. Serial applications (as specified in the json at TACC) must be set to 1 processor. Apps that set maxMemory=1000 will run on the largemem queue, with 24 cores per node and 48 cores maximum. Note the format!
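The two processor rules above (serial apps use 1 processor; largemem runs top out at 48 cores) can be checked before a job is submitted. The sketch below is an illustration under stated assumptions: check_processors is a hypothetical name, and "PARALLEL" is a placeholder for any non-serial value, since only "SERIAL" appears in the json shown in this deck.

```shell
#!/bin/bash
# Hypothetical pre-submission check based on the rules stated above.
# Arguments: parallelism value from the json, processorCount, maxMemory (GB).
check_processors() {
  local parallelism="$1" processor_count="$2" max_memory="$3"
  if [ "$parallelism" = "SERIAL" ] && [ "$processor_count" -ne 1 ]; then
    echo "serial apps must request exactly 1 processor" >&2
    return 1
  fi
  if [ "$max_memory" -eq 1000 ] && [ "$processor_count" -gt 48 ]; then
    echo "largemem runs are limited to 48 cores" >&2
    return 1
  fi
  return 0
}

check_processors SERIAL 1 1000 && echo "accepted: serial app with 1 processor"
# "PARALLEL" is a placeholder for a non-serial app (assumption).
check_processors PARALLEL 96 1000 || echo "rejected: too many cores for largemem"
```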

[Figure: Foundation API job flow. Components: iPlant DE (Tool: Foundation API, json#1), iPlant Data Store (iRODS) holding the application executables, the application wrapper script, and the user's input data, TACC Foundation API on the TACC server (json#2), TACC SGE queue, and the iPlant DE results store. Labeled steps: the job is initiated in the DE; the DE sends the job request, inputs, and settings; executables and wrapper are requested and returned; input files are requested and returned; the job is submitted and runs on the SGE queue; output files are sent and results are stored; run information and progress are returned.]