The Foundation API: How does it work?
How It Runs...
• At the DE:
• The job starts and runs the Foundation API App with the information provided in json#1 (created in TITO) and the inputs provided by the user.
• As it runs, the App sends a message to TACC. The message tells the Foundation API Application at TACC which application to run, which TACC system to run it on, where the application and its wrapper script are located in iRODS, and which settings or arguments to pass to the wrapper script.
• At TACC:
• The Foundation API Application at TACC builds a bash run script that runs the job on the SGE queue. It uses the application’s wrapper script and json#2, which resides within the FAPI system at TACC. The run script specifies where the input data is in iRODS, where the outputs should be put in iRODS after the job, and what settings to pass to the specific bioinformatics application being run.
An Example: Newbler2.6
runAssembly -o outputname -m -force -large -cpu 1 inputReads.sff
runAssembly -o "${OUTNAME}" -m -force -large -cpu "${CPU}" \
    -a "${MIN_CONTIG_SIZE}" -l "${LARGE_CONTIG_SIZE}" "${OTHER}" "${INPUT_F}"
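Because the template quotes "${OTHER}", an empty value still reaches runAssembly as an empty argument. The following is a minimal sketch of one way to leave out unset optional settings, not the wrapper's actual method. `build_newbler_cmd` is a hypothetical helper, and it only prints the command line, using the flags shown above.

```shell
#!/bin/sh
# Hypothetical helper (not part of Newbler or the Foundation API):
# assemble the runAssembly command line, skipping empty optional settings.
# Arguments: OUTNAME CPU MIN_CONTIG_SIZE LARGE_CONTIG_SIZE OTHER INPUT_F
build_newbler_cmd() {
    nb_out=$1; nb_cpu=$2; nb_min=$3; nb_large=$4; nb_other=$5; nb_input=$6
    set -- runAssembly -o "$nb_out" -m -force -large -cpu "$nb_cpu"
    if [ -n "$nb_min" ]; then set -- "$@" -a "$nb_min"; fi
    if [ -n "$nb_large" ]; then set -- "$@" -l "$nb_large"; fi
    # Deliberately unquoted so that several extra options split into words.
    if [ -n "$nb_other" ]; then set -- "$@" $nb_other; fi
    set -- "$@" "$nb_input"
    printf '%s\n' "$*"
}

# Print the command rather than run it; runAssembly may not be installed.
build_newbler_cmd outputname 1 100 500 "" inputReads.sff
```

With an empty fifth argument, the printed command simply omits it. A real wrapper would execute the assembled words instead of printing them.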
An Example: Newbler2.6: The first test of the wrapper script
#!/bin/bash
#$ -V                     # Inherit the submission environment
#$ -cwd                   # Start job in submission directory
#$ -N newblertest         # Job name
#$ -j y                   # Combine stderr and stdout
#$ -o $JOB_NAME.o$JOB_ID  # Name of the output
#$ -pe 1way 12            # Requests 1 task/node, 12 cores total
#$ -q development         # Queue name "development"
#$ -l h_rt=01:00:00       # Run time (hh:mm:ss) - 1.0 hours
#$ -M [email protected]   # Use email notification address
#$ -m be                  # Email at begin and end of job
set -x                    # Echo commands, use "set echo" with csh

#MODE=${mode}
#INPUT="${inputSeqs}"
#OUTNAME="${outputName}"
#OUTFORM=${outputFormat}
INPUT="/iplant/home/rogerab/data/sequencing1/FFGLB5S04.sff"
OUTNAME="NewblerOut"
CPU=12
MIN_CONTIG_SIZE=100
LARGE_CONTIG_SIZE=500

module purge
module load TACC
module swap intel gcc
module load irods

iinit    # enter the iRODS password
wait

# Copy from iRODS
iget -fT "${INPUT}"
wait
INPUT_F=$(basename ${INPUT})

/work/01685/rogerab/bin3/bin/runAssembly -o "${OUTNAME}" -m -force -large -cpu "${CPU}" \
    -a "${MIN_CONTIG_SIZE}" -l "${LARGE_CONTIG_SIZE}" "${OTHER}" "${INPUT_F}"
An Example: Newbler2.6: The final form of the wrapper script
CONTENTS OF newbler_wrapper.sh
INPUT="${inputSeqs}"
OUTNAME="${outputName}"
CPU="${cpu}"
MIN_CONTIG_SIZE="${min_contig_size}"
LARGE_CONTIG_SIZE="${large_contig_size}"
OTHER="${other}"

# Copy input file from iRODS
iget -fT "${INPUT}"
wait
INPUT_F=$(basename ${INPUT})

chmod a+x runAssembly
chmod a+x createProject
chmod a+x addRun
chmod a+x newbler
chmod a+x newMapping
chmod a+x runMapping
chmod a+x runProject
chmod a+x newAssembly
chmod a+x stopRun
runAssembly -o "${OUTNAME}" -m -force -large -cpu "${CPU}" -a "${MIN_CONTIG_SIZE}" -l "${LARGE_CONTIG_SIZE}" "${OTHER}" "${INPUT_F}"
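The wrapper is a template: names such as ${inputSeqs} and ${cpu} are filled in with the job's values before the script runs. How the Foundation API performs that substitution is not shown here, so the following is only a rough sketch of the idea. `render_template` is a hypothetical function, and it assumes values contain no newlines.

```shell
#!/bin/sh
# Hypothetical sketch: replace ${name} placeholders in a wrapper template.
# Usage: render_template TEMPLATE_FILE name=value ...
render_template() {
    template=$1
    shift
    script=$(cat "$template")
    for pair in "$@"; do
        name=${pair%%=*}
        value=${pair#*=}
        # Escape characters that are special in a sed replacement string.
        escaped=$(printf '%s' "$value" | sed 's/[\\&|]/\\&/g')
        script=$(printf '%s\n' "$script" | sed "s|[$]{$name}|$escaped|g")
    done
    printf '%s\n' "$script"
}

# Demo on a two-line template modeled on newbler_wrapper.sh.
tmp=$(mktemp)
printf '%s\n' 'INPUT="${inputSeqs}"' 'CPU="${cpu}"' > "$tmp"
render_template "$tmp" \
    inputSeqs=/iplant/home/rogerab/data/sequencing1/FFGLB5S04.sff cpu=1
rm -f "$tmp"
```

The demo prints the two assignments with the iRODS path and the CPU count filled in, which matches the shape of the variable block in a generated run script.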
An Example: Newbler2.6: The json for the Lonestar Foundation API Application
CONTENTS OF appN.json
{
  "name": "newbler",
  "parallelism": "SERIAL",
  "version": "2.6",
  "helpURI": "https://pods.iplantcollaborative.org/wiki/display/DEapps/Newbler",
  "label": "Newbler 2.6",
  "shortDescription": "Newbler, genome assembler",
  "longDescription": "Genome assembler for 454 sequencing reads",
  "author": "Roger Barthelson",
  "datePublished": "03/20/12",
  "tags": [ "assembler", "NGS", "454", "Roche" ],
  "ontology": [ "http://sswapmeet.sswap.info/sequenceServices/SequenceServices" ],
  "executionHost": "lonestar4.tacc.teragrid.org",
  "executionType": "HPC",
  "deploymentPath": "/iplant/home/rogerab/applications/newbler2.6/bin",
  "templatePath": "newbler_wrapper.sh",
  "testPath": "test/newblerwrapper.sh",
  "checkpointable": "true",
  "modules": [ "purge", "load TACC", "load irods" ],
  "inputs": [
    {
      "id": "inputSeqs",
      "value": {
        "default": "",
        "validator": "",
        "visible": true,
        "required": true
      },
CONTENTS OF appN.json (continued)
  "inputs": [
    {
      "id": "inputSeqs",
      "value": {
        "default": "",
        "validator": "",
        "visible": true,
        "required": true
      },
      "details": {
        "label": "Sequences:",
        "description": "Sequence file in SFF or fasta format"
      },
      "semantics": {
        "ontology": [ "http://sswapmeet.sswap.info/sequence/FASTA" ],
        "minCardinality": 1,
        "maxCardinality": 1,
        "fileTypes": [ "fasta-0" ]
      }
    }
  ],
CONTENTS OF appN.json (continued)
  "parameters": [
    {
      "id": "cpu",
      "value": {
        "default": "1",
        "type": "string",
        "validator": "",
        "required": true,
        "visible": true
      },
      "details": {
        "label": "number of threads",
        "description": "Specify the number of cores to be used",
        "visible": true
      },
      "semantics": {
        "ontology": [ "xs:string" ]
      }
    },
    {
      "id": "min_contig_size",
      "value": {
        "default": "200",
        "validator": "",
        "required": false,
        "visible": true,
        "type": "string"
      },
      "details": {
        "label": "minimum contig size",
        "description": "Specify the minimum contig size to be output.",
        "visible": true
      },
      "semantics": {
        "ontology": [ "xs:string" ]
      }
    },
CONTENTS OF appN.json (continued)
{
  "name": "newbler",
  "parallelism": "SERIAL",
  "version": "2.6",
  "helpURI": "https://pods.iplantcollaborative.org/wiki/display/DEapps/Newbler",
  "label": "Newbler 2.6",
  "shortDescription": "Newbler, genome assembler",
  "longDescription": "Genome assembler for 454 sequencing reads",
  "author": "Roger Barthelson",
  "datePublished": "03/20/12",
  "tags": [ "assembler", "NGS", "454", "Roche" ],
  "ontology": [ "http://sswapmeet.sswap.info/sequenceServices/SequenceServices" ],
  "executionHost": "lonestar4.tacc.teragrid.org",
  "executionType": "HPC",
  "deploymentPath": "/iplant/home/rogerab/applications/newbler2.6/bin",
  "templatePath": "newbler_wrapper.sh",
  "testPath": "test/newblerwrapper.sh",
  "checkpointable": "true",
  "modules": [ "purge", "load TACC", "load irods" ],
  "inputs": [ ],
  "parameters": [ ]
}
Where is the application?
Where is the json file?
• I don’t know.
• It seems to be entered into a database of information held by the Foundation API application on the TACC side, e.g. on Lonestar. The actual file name doesn’t matter.
• How does it get there?
• curl -X POST -sku "user:password" -F "[email protected]" https://foundation.iplantc.org/apps-v1/apps
• The response should be the contents of the json file. If it is, the service accepted your json.
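Before posting, it can help to confirm that the file at least mentions the keys that tell TACC where everything is. The following is a rough pre-flight sketch, not part of the Foundation API: `check_app_json` is a hypothetical function, the key list is taken from the example json above, and the check is a plain text search rather than real JSON parsing.

```shell
#!/bin/sh
# Hypothetical pre-flight check (an assumption, not a Foundation API tool):
# report which of the keys from the example json are absent from a file.
check_app_json() {
    file=$1
    missing=""
    for key in name version executionHost deploymentPath templatePath; do
        grep -q "\"$key\"[[:space:]]*:" "$file" || missing="$missing $key"
    done
    if [ -n "$missing" ]; then
        echo "missing keys:$missing"
        return 1
    fi
    echo "ok"
}

# Demo with a throwaway fragment; a real appN.json is checked the same way.
demo=$(mktemp)
printf '%s\n' '{ "name": "newbler", "version": "2.6" }' > "$demo"
check_app_json "$demo" || true   # lists the keys this fragment lacks
rm -f "$demo"
```

A passing check says nothing about whether the json is well formed; the service's response to the POST remains the real test.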
Test the Application on TACC with the Test Application
• https://foundation.iplantcollaborative.org/iplant-test/
• Your App name is the “name” you entered into the json, plus the version number.
• Example: name: newbler, version: 2.6 becomes newbler-2.6
• What to do:
• Log in with your iPlant user ID and password
• Find your App under Apps Service > Shared Apps
• Get a job submission form, fill it out, submit it!
• Monitor the results under Job Service
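The App name shown in the Test Application follows the naming rule given above: the identifier and the version joined by a hyphen. A one-line sketch, with `app_id` as a hypothetical helper name:

```shell
#!/bin/sh
# Hypothetical helper: join an app identifier and version with a hyphen.
app_id() {
    printf '%s-%s\n' "$1" "$2"
}

app_id newbler 2.6   # prints newbler-2.6
```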
The Test Application with the Apps service section shown.
The Test Application: job submission form for newbler.
Summary of the TACC Portion of the Foundation API
• The json loaded into the TACC Foundation API application is central.
• The json tells the application where everything is: the application, the wrapper script, and what inputs and settings to look for.
• The wrapper script passes the main arguments to the application.
• The input files are in the Data Store.
Actual Run Script for Newbler Through the Test Application on Lonestar
#!/bin/bash
#$ -N newbler4-2_15
#$ -cwd
#$ -V
#$ -o newbler4-2_15$JOB_ID.out
#$ -e newbler4-2_15$JOB_ID.err
#$ -l h_rt=01:00:00
#$ -A TG-MCB110022
#$ -pe 12way 24
#$ -q largemem

curl -k https://foundation.iplantc.org/apps-v1/trigger/job/1625/token/3c6abe00-d9a7-49b6-972d-7706d2779151/status/RUNNING

cd /scratch/0004/iplant/rogerab/job-1625-newbler4-2_15/newbler2.6

# Environmental settings for newbler-2.6:
module purge
module load TACC
module load irods
INPUT="/iplant/home//rogerab/data/sequencing1/FFGLB5S04.sff"
OUTNAME="NewblerOutDir"
CPU="1"
MIN_CONTIG_SIZE="200"
LARGE_CONTIG_SIZE="500"
OTHER=""

# Copy from iRODS
iget -fT "${INPUT}"
wait
INPUT_F=$(basename ${INPUT})

runAssembly -o "${OUTNAME}" -m -force -large -cpu "${CPU}" -a "${MIN_CONTIG_SIZE}" -l "${LARGE_CONTIG_SIZE}" "${OTHER}" "${INPUT_F}"

curl -k https://foundation.iplantc.org/apps-v1/trigger/job/1625/token/3c6abe00-d9a7-49b6-972d-7706d2779151/status/FINISHED

curl -k https://foundation.iplantc.org/apps-v1/trigger/job/1625/token/3c6abe00-d9a7-49b6-972d-7706d2779151/status/ARCHIVING

imkdir -p /iplant/home/rogerab/archive/jobs/job-1625-newbler4-2_15
for i in `find . -maxdepth 1`; do
    exists=`grep -x "$i" .iplant.archive`
    if [ ! -n "$exists" ]; then
        iput -v -f -r $i /iplant/home/rogerab/archive/jobs/job-1625-newbler4-2_15
    fi
done

curl -k https://foundation.iplantc.org/apps-v1/trigger/job/1625/token/3c6abe00-d9a7-49b6-972d-7706d2779151/status/ARCHIVING_FINISHED
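The archiving loop in the run script uploads only the entries that are not listed in .iplant.archive, so staged inputs and executables are not copied back to iRODS. The following sketch isolates that filter so it can be tried without iRODS. `list_new_outputs` is a hypothetical name, the manifest is assumed to hold one path per line in the form find prints, and the sketch matches names literally (grep -F), whereas the run script matches them as patterns.

```shell
#!/bin/sh
# Sketch of the archive filter: print the entries of DIR that are not
# listed in DIR/.iplant.archive. The run script would iput each of them.
list_new_outputs() {
    dir=$1
    (
        cd "$dir" || exit 1
        find . -maxdepth 1 | while IFS= read -r entry; do
            if ! grep -q -x -F -- "$entry" .iplant.archive 2>/dev/null; then
                printf '%s\n' "$entry"
            fi
        done
    )
}

# Demo: one staged input and one new result in a scratch directory.
work=$(mktemp -d)
touch "$work/input.sff" "$work/result.txt"
printf '%s\n' . ./.iplant.archive ./input.sff > "$work/.iplant.archive"
list_new_outputs "$work"   # prints ./result.txt
rm -rf "$work"
```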
The Discovery Environment Side of the Foundation API
• Key arguments in the TACC json and in the wrapper script need to be entered by way of the App in the DE
• The Foundation API Application is the tool that is run by the DE (foundational_api_adapter.pl)
• The interface for the App is designed by you in TITO.
• The Foundation API Application has some of its own arguments that it requires for setting up the run at TACC:
• Application ID (appid)
• Maximum Memory (maxMemory)
• Estimated Run Time (requestedTime)
• Job Size (processorCount)
• These also appear in your json at TACC, and they must be written in the precise syntax used here (and in the following examples).
In TITO: foundation_api_adapter.pl is the application you are integrating.
In TITO: Your arguments are ordered in a way similar to their appearance in the json at TACC.
In TITO: The inputs are whatever your application may need. These are the files selected from the Data Store when you set up a run with the application.
Note the format!
In TITO: The options are whatever settings your application may use.
Note the format!
In TITO: The run options are whatever TACC needs to set up the run. For memory, use a setting below 1000 (GB) for the normal queue. Use a setting of 1000 to tell TACC to run the job on the largemem queue.
Note the format!
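The memory rule above can be written as a small check. This is a sketch under stated assumptions: `select_queue` is a hypothetical function, the queue names are the ones used in the text, and values above 1000 are rejected because the text does not say how they are handled.

```shell
#!/bin/sh
# Hypothetical helper: map a maxMemory setting to a queue name,
# following the rule "below 1000 -> normal, exactly 1000 -> largemem".
select_queue() {
    case $1 in
        ''|*[!0-9]*)
            echo "maxMemory must be a whole number" >&2
            return 2
            ;;
    esac
    if [ "$1" -lt 1000 ]; then
        echo normal
    elif [ "$1" -eq 1000 ]; then
        echo largemem
    else
        # Assumption: settings above 1000 are not described, so refuse them.
        echo "maxMemory above 1000 is not covered here" >&2
        return 2
    fi
}

select_queue 24     # prints normal
select_queue 1000   # prints largemem
```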
Actual Run Script for Newbler Through the Test Application on Lonestar
#!/bin/bash
#$ -N newbler4-2_15
#$ -cwd
#$ -V
#$ -o newbler4-2_15$JOB_ID.out
#$ -e newbler4-2_15$JOB_ID.err
#$ -l h_rt=01:00:00
#$ -A TG-MCB110022
#$ -pe 1way 24
#$ -q largemem

curl -k https://foundation.iplantc.org/apps-v1/trigger/job/1625/token/3c6abe00-d9a7-49b6-972d-7706d2779151/status/RUNNING

cd /scratch/0004/iplant/rogerab/job-1625-newbler4-2_15/newbler2.6

# Environmental settings for newbler-2.6:
module purge
module load TACC
module load irods
INPUT="/iplant/home//rogerab/data/sequencing1/FFGLB5S04.sff"
OUTNAME="NewblerOutDir"
CPU="1"
MIN_CONTIG_SIZE="200"
LARGE_CONTIG_SIZE="500"
OTHER=""

# Copy from iRODS
iget -fT "${INPUT}"
wait
INPUT_F=$(basename ${INPUT})

runAssembly -o "${OUTNAME}" -m -force -large -cpu "${CPU}" -a "${MIN_CONTIG_SIZE}" -l "${LARGE_CONTIG_SIZE}" "${OTHER}" "${INPUT_F}"

curl -k https://foundation.iplantc.org/apps-v1/trigger/job/1625/token/3c6abe00-d9a7-49b6-972d-7706d2779151/status/FINISHED

curl -k https://foundation.iplantc.org/apps-v1/trigger/job/1625/token/3c6abe00-d9a7-49b6-972d-7706d2779151/status/ARCHIVING

imkdir -p /iplant/home/rogerab/archive/jobs/job-1625-newbler4-2_15
for i in `find . -maxdepth 1`; do
    exists=`grep -x "$i" .iplant.archive`
    if [ ! -n "$exists" ]; then
        iput -v -f -r $i /iplant/home/rogerab/archive/jobs/job-1625-newbler4-2_15
    fi
done

curl -k https://foundation.iplantc.org/apps-v1/trigger/job/1625/token/3c6abe00-d9a7-49b6-972d-7706d2779151/status/ARCHIVING_FINISHED
The time limit is specified on TACC runs to help conserve resources and manage the queue. The maximum time is 24 h, but providing options encourages users to ask for less time when they do not think they need it.
Note the format!
The number of processors needed is also specified on TACC runs to help conserve resources. What can be used effectively is an important consideration. A serial application, e.g. one like Newbler that does not use MPI for multiprocessing, will not benefit from a large number of processors. Serial applications (as specified in the json at TACC) must be set to 1 processor. Apps that are set to maxMemory=1000 will run on the largemem queue, with 24 cores per node and 48 cores maximum.
Note the format!
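The two run-option rules above (a time limit of at most 24 h, and 1 processor for serial applications) can be checked before a job is submitted. The following is a minimal sketch, not part of the DE or the Foundation API. Both function names are hypothetical, the hh:mm:ss format is taken from the h_rt line of the run script, and rejecting a zero-length request is an added assumption.

```shell
#!/bin/sh
# Hypothetical check: accept hh:mm:ss values above zero and up to 24:00:00.
check_requested_time() {
    case $1 in
        [0-9][0-9]:[0-5][0-9]:[0-5][0-9]) ;;
        *) return 1 ;;
    esac
    hh=${1%%:*}
    rest=${1#*:}
    mm=${rest%%:*}
    ss=${rest#*:}
    # Strip one leading zero so that 08 and 09 are not read as octal.
    total=$(( ${hh#0} * 3600 + ${mm#0} * 60 + ${ss#0} ))
    [ "$total" -gt 0 ] && [ "$total" -le 86400 ]
}

# Hypothetical check: $1 is "parallelism" from the json, $2 is processorCount.
check_processor_count() {
    if [ "$1" = "SERIAL" ] && [ "$2" != "1" ]; then
        return 1
    fi
    return 0
}

for t in 01:00:00 24:00:00 25:00:00; do
    if check_requested_time "$t"; then
        echo "$t ok"
    else
        echo "$t rejected"
    fi
done
```

The loop reports the first two values as ok and the third as rejected, since 25 hours exceeds the 24 h maximum.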
[Diagram: Foundation API job flow. Components: the iPlant DE (Tool: Foundation API, json#1, the App; "Initiate Job Here"), the TACC server running the TACC Foundation API (json#2), the TACC SGE queue ("Job Runs Here!"), the iPlant Data Store (iRODS) holding the application executables, the application wrapper script, and the user’s input data, and the iPlant DE results store ("Results Stored Here"). Arrows: Sends Job Request, Inputs, Settings; Run Information, Progress Returned; Requests Executables, Wrapper; Returns Executables, Wrapper; Requests Input Files; Returns Input Files; Submit Job; Sends Output Files.]