Example applications of e-Infrastructure: NGS UI/WMS Jonathan Churchill - STFC/RAL jonathan.churchill@stfc.ac.uk


Page 1: Example applications of e-Infrastructure: NGS UI/WMS Jonathan Churchill - STFC/RAL jonathan.churchill@stfc.ac.uk

Example applications of e-Infrastructure:

NGS UI/WMS

Jonathan Churchill - STFC/RAL jonathan.churchill@stfc.ac.uk

Page 2

Summary
• Overview
• Example case study
  – Logging in: SSH and MyProxy
  – Parametric jobs
  – Head node for the grid
• Submitting simple jobs
• Submitting ‘real’ jobs
• Lab session

Page 3

NGS UI/WMS
• User Interface / Workload Management System
• UMD (gLite) UI and WMS distributions
• NGS usability improvements:
  – ssh / proxy logins
  – Extensive proxy checking
  – GridFTP on the UI
• “Head node for the Grid”
• Works with NGS and gLite/GridPP sites
• 12,000+ CPU cores

Rapid take-up since the Oct ‘09 startup.

Page 4

NGS WMS

[Architecture diagram. Components: UI; WMS; Information Server (BDII); MyProxy; grid sites (RAL-NGS2, Scotgrid-Glasgow, Oxford-OERC, Manchester-NGS2, RAL-LCG, etc.; 12,000+ CPUs). Connection labels: ssh Login: <MEG>; <gsiFTP>; <Soap WS>; [email protected]]

Page 5

Transcriptome Analysis using the NGS UI/WMS

Jonathan Churchill - STFC/RAL jonathan.churchill@stfc.ac.uk

Paul Wilkinson - Exeter University

Page 6

Tobacco Hornworm Moth Manduca sexta

Green Dock Beetle Gastrophysa viridula

Page 7

mRNA Transcript

genome.gov: National Human Genome Research Institute

Page 8

Bio Databases and Applications

[Diagram labels: ftp, rsync]

• Database mirrors: EMBL, UNIPROT, TREMBL, SWISSPROT, PROSITE, PRINTS, REBASE
• Pre-installed applications: BLAST, EMBOSS, FASTA, GROMACS, MrBAYES, EXONERATE, NAMD, Siesta

Page 9

WMS Parametric Case Study
• What proteins do these ‘contigs’ / transcripts code for?
• NCBI BLAST search in the EBI Uniprot database

1 x 55,000 contigs: 1 month elapsed annotation time.

1000 x 55 contigs + NGS + WMS: < 6 hours elapsed annotation time.

Using a WMS ‘Parametric’ JDL file:
  – One JDL for 1000 jobs
  – One submission command
  – One status command
  – Outputs returned to the UI automatically

Page 10

Create your own username / password

Page 11

Login : SSH/PuTTY

Page 12

JDL File

Type = "Job";
JobType = "Parametric";
Executable = "/usr/ngs/BLAST-NCBI";
Arguments = "blastall -p blastx -d uniprot -i contig-_PARAM_.fsa -a 1";
StdOutput = "contig-_PARAM_.out";
StdError = "contig-_PARAM_.err";
Parameters = 997;
ParameterStart = 0;
ParameterStep = 1;
MyProxyServer = "myproxy.ngs.ac.uk";
InputSandbox = {"contig-_PARAM_.fsa"};
InputSandboxBaseURI = "gsiftp://ngsui03.ngs.ac.uk:2811/home/ngs0055/ParamBlast/inputs";
OutputSandbox = {"contig-_PARAM_.out","contig-_PARAM_.err"};
OutputSandboxBaseDestURI = "gsiftp://ngsui03.ngs.ac.uk:2811/home/ngs0055/ParamBlast/outputs";
Requirements = ( Member("NGS-UEE-BLAST-NCBI", other.GlueHostApplicationSoftwareRunTimeEnvironment));
Rank = other.GlueCEStateFreeCPUs;
ShallowRetryCount = -1;

Page 13

Submit & Monitor

• glite-wms-job-submit -a -o jobIDs blast.jdl
• glite-wms-job-status -i jobIDs
• One jobID for all 1000 jobs
• 1000 output files
• Input and output files copied from/to the UI
• Jobs take 2-3 hours each
• Head node for the grid

Peak: 320 jobs in flight. Average: 150 jobs in flight.
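To make the `_PARAM_` substitution in the parametric JDL concrete, here is a minimal Python sketch of how one template could expand into per-job file names. The function name `expand_parametric` is hypothetical, and the code assumes that an integer `Parameters` value acts as an exclusive upper bound; that reading should be checked against the JDL attributes document before relying on it.

```python
def expand_parametric(template, parameters, start=0, step=1, token="_PARAM_"):
    """Replace the token in the template with each parameter value.

    Assumption (not confirmed by the slides): an integer `parameters`
    is an exclusive upper bound, so values run start, start + step, ...
    and stop before reaching `parameters`.
    """
    return [template.replace(token, str(value))
            for value in range(start, parameters, step)]


# Values taken from the parametric JDL: Parameters = 997, start 0, step 1.
inputs = expand_parametric("contig-_PARAM_.fsa", parameters=997)
print(len(inputs), inputs[0], inputs[-1])
```

Under the exclusive-bound assumption the last generated name is `contig-996.fsa`; if the bound turns out to be inclusive, the range would need to end one step later.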

Page 14

Summary

• 55,000-contig analysis in < 6 hours vs 1 month
• ssh username/password logins
• 1000 jobs all managed as one ‘job’
• Input/output on the UI
• Head node for the Grid

Page 15

Summary
• Overview
• Example case study
  – Logging in: SSH and MyProxy
  – Parametric jobs
  – Head node for the grid
• Submitting simple jobs
• Submitting ‘real’ jobs
• Lab session

Page 16

Simple Example
• [ Create your proxy <- the UI does this for you! ]
  – voms-proxy-init --voms ngs.ac.uk
• See what’s available
  lcg-infosites --vo ngs.ac.uk ce
• Submit the job
  glite-wms-job-submit -a -o jobIDs.txt my_test.jdl
  Returned job ID: https://ngswms01.ngs.ac.uk:9000/LHGIagvDl701_msz0jpIg
• Check the status of your job
  glite-wms-job-status -i jobIDs.txt
• Get the output
  glite-wms-job-output -i jobIDs.txt --dir ./outputs

Note: UI/WMS can retrieve outputs automatically
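The submit / status / output cycle can be scripted. The sketch below only builds the command lines, using the `glite-wms-job-*` command names from the Submit & Monitor slide and the flags shown in the slides (`-a`, `-o`, `-i`, `--dir`). The helper names `submit_cmd`, `status_cmd` and `output_cmd` are hypothetical, and nothing is executed.

```python
import shlex


def submit_cmd(jdl, ids_file="jobIDs.txt"):
    """Command that submits a JDL file and appends the job ID to ids_file."""
    return ["glite-wms-job-submit", "-a", "-o", ids_file, jdl]


def status_cmd(ids_file="jobIDs.txt"):
    """Command that queries the status of the jobs recorded in ids_file."""
    return ["glite-wms-job-status", "-i", ids_file]


def output_cmd(ids_file="jobIDs.txt", out_dir="./outputs"):
    """Command that fetches the output sandboxes into out_dir."""
    return ["glite-wms-job-output", "-i", ids_file, "--dir", out_dir]


# Print the three commands in the order they would be run.
for cmd in (submit_cmd("my_test.jdl"), status_cmd(), output_cmd()):
    print(shlex.join(cmd))
```

On a UI where the gLite commands are installed, each list could be passed to `subprocess.run(cmd, check=True)`; keeping them as lists avoids shell-quoting problems with file names.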

Page 17

Simple Example jdl

• Simple JDL file
• Some default parameters set on the UI

Type = "Job";
JobType = "Normal";
Executable = "settings.sh";
StdOutput = "output.out";
StdError = "output.err";
InputSandbox = {"settings.sh"};
OutputSandbox = {"output.err","output.out"};
RetryCount = 1;
Requirements = ( other.GlueCEUniqueID == "ngs.rl.ac.uk:2119/jobmanager-lsf-ngs");
Rank = other.GlueCEStateFreeCPUs;

Alternative requirement:

Requirements = other.GlueCEStateStatus == "Production";

Page 18

Summary
• Overview
• Example case study
  – Logging in: SSH and MyProxy
  – Parametric jobs
  – Head node for the grid
• Submitting simple jobs
• Submitting ‘real’ jobs
• Lab session

Page 19

NGS Applications Docs

Page 20

Input/Output Files
• InputSandbox lists all input files
  – Includes binaries/scripts to run
  – Wildcards OK
• OutputSandbox lists output files to retrieve
  – Wildcards not allowed
  – Tutorial shows ‘Epilogue’ script
• InputSandboxBaseURI
  – Avoids 3rd-party transfers via the WMS server
• OutputSandboxBaseDestURI
  – Outputs go to the UI or elsewhere
  – Output dir must exist
  – Files arrive before the job is “Done”

Type = "Job";
JobType = "mpich";
Executable = "/usr/ngs/DLPOLY2";
CpuNumber = 8;
StdOutput = "std.out";
StdError = "std.err";
MyProxyServer = "myproxy.ngs.ac.uk";
InputSandbox = {"CONFIG","CONTROL","FIELD","REVCON"};
InputSandboxBaseURI = "gsiftp://ngsui03.ngs.ac.uk:2811/home/ngsxxx/dlpoly";
OutputSandbox = {"OUTPUT","STATIS","CONFIG","CONTROL","FIELD","REVCON","REVIVE","std.out","std.err"};
OutputSandboxBaseDestURI = "gsiftp://ngsui03.ngs.ac.uk:2811/home/ngsxxx/dlpoly";
Requirements = ( member("NGS-UEE-DLPOLY2", other.GlueHostApplicationSoftwareRunTimeEnvironment));
ShallowRetryCount = -1;
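Because OutputSandbox entries cannot use wildcards, every file to be retrieved has to be named explicitly, including the files given as StdOutput and StdError. The following is a small pre-submission check sketched in Python. The function name `check_output_sandbox` is hypothetical, and treating `*`, `?`, `[` and `]` as the wildcard characters is an assumption based on shell-style globbing.

```python
# Assumption: shell-style glob characters are what counts as a wildcard.
WILDCARD_CHARS = set("*?[]")


def check_output_sandbox(output_sandbox, std_output, std_error):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    for name in output_sandbox:
        if WILDCARD_CHARS & set(name):
            problems.append(f"wildcard in OutputSandbox entry: {name}")
    for label, name in (("StdOutput", std_output), ("StdError", std_error)):
        if name not in output_sandbox:
            problems.append(f"{label} file {name} is not listed in OutputSandbox")
    return problems


# An illustrative sandbox that forgets the StdError file.
print(check_output_sandbox(["OUTPUT", "STATIS", "std.out"], "std.out", "std.err"))
```

Running a check like this on the UI before submission catches a missing file name earlier than discovering after a multi-hour job that an output never came back.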

Page 21

Key Features Summary

• ssh logins
• Input/output on the UI
• Head node for the Grid
• Single jobs and parametric sweeps
• Normal and MPI jobs
• Example JDLs on www.ngs.ac.uk
• Questions: [email protected]

Page 22

Further Information
• NGS web site UI-WMS page:
  – http://www.ngs.ac.uk/uiwms
  – Links to simple WMS tutorials (2) and app-specific ones (Gaussian, NAMD etc.)
  – http://www.ngs.ac.uk/applications
• Tutorials
  – http://www.ngs.ac.uk/ngs-workload-management-system-and-user-interface-tutorials
  – http://wiki.ngs.ac.uk/index.php?title=UI-WMS_Tutorial
  – http://wiki.ngs.ac.uk/index.php?title=UI-WMS_Tutorial2
• Parametric case study:
  – http://www.ngs.ac.uk/mrna-analysis-using-the-ngs
  – http://www.ngs.ac.uk/sites/default/files/file/newsletters/Dec%202009%20NGS%20news.pdf
• Links to guides and JDL attributes doc
  – http://wiki.ngs.ac.uk/index.php?title=UI-WMS_Tutorial#Further_Resources

Page 23

Questions ?

Page 24

Lab

• http://wiki.ngs.ac.uk/index.php?title=UI-WMS-SeIUCCR-Tutorial
• Login:
  – SSH username = “SummerUserXX” (XX is on your packs)
  – SSH password = “2012-SSXX”
  – Valid until Friday afternoon

Page 25

Inputs>whitefly_assembly.accurate.15_lrc1GGTATCAACGCAGAGTKCGCGGGGAGTAGAACAAAGAGCGTCTGAGAGGACTTCGCGATAGTGTTACGTTAATCGATAGCTCGTGTGTTAAAAAAATCTTTCAAGTCCTTCCTGTCTTTTGACTACTTAATTAGTTAATTATTATTTTGATCGAGACAAGCAAAGAAAAATGAATTCCATATTATCTTTGACCGTTTTCGTAACTTTCACAATTGTCTTGGCTCAAAGTGAACAATTAGACAAGAACTTCGGCGTGGGCGAAATCAAGACTCGCATCCAAGATAAAAAATTTGTTGAGAAGCAGTTGGGCTGTGTCCTAGGGAAAGCCGATTGCGACACCTTAGGAAATCAGTTGAAAGTTGCCATTCCAGAAGTCCTAGTTAAAGGCTGCAAGGATTGCACTCCGGAACAATCTGCAAATGCCAATCGATTAATAGCTTTTATAAAGATGAATTATCCAGCAGAATGGAGTCAAATTGCTGCAAAATATGGTGTGAAAGGTGATGCTGTAAAGAGGCCACGACGACATATCAGAAGGTGAAAGGAGTGATGCCAAAGATGTGATAAGTTTTTATTGTTAACTTTCGAGTCTTGACTTGATTTGATCATTGTGTACGTATGTATTTTAATTCTTCCAATTGTGAGCAGTATTTTAAGAGGGTATTCTAAATAACAGCCGTCCAAAAAGTTTTGAACTGAAATTTAAACTGTTAAGTGTTGATGACTTTTACCAATATTTATTTTTTTATCACCGAACTGTTAGTAATACTGCGACCAATACAAATTTATCTTTAGTCAGCTTGATTTTTTATCAAGTTGATTCTTTTTTTTGGACAATTTTTTTTTTATTATTATTCTTCCTCATTTAATGTATGTTTAAAATTGTTAATTGACCACCATTCGCATTTAATTGATTAAGTTTTTCTTATTTTTTTTTTATATGAACCAATGTTATAATTTTGCTCTCATAAACCTACTGTAAAATATTGAGTGTCCAGTTAAAGCTTTAAACTTTATATATTTTAACAAAAAATTAATGAGCTATTTTATAGAACCTAATAA>whitefly_assembly.accurate.15_lrc2TCGGGGGAGTAAATTCATGAAAGATAATCTAATCGTGCAGCCTTTTTATGAGACGCGCTGAAGTTTCGGATTAGGTTTTAGTCTTTACTAATTAATTGTATTTGTTTAGCTCATTAATTTTAATTATTCCACATTTAAAGATGTCTAAGGAAGAAGCAGCAATCCCTCCTCCAATGATTTGGGCCCAGAGATCTGGTGTTGTCTTTTTAACAATTAATGTAGAGGATTGTAAAGACCCCGAAATTAAAATTGAAGAAGATAAATTTTCTTTTAAAAGTGTTGGTGGTGTTGAAAAGAAGAAATATGAAGTCACAGTAAATCTATTTAAAGAAATAGACCCAGAAAAATCTGTAAAACATGTTCGCGAACGACACATTGAGTTAGTCCTAAAAAAGAAAGAAGACAAAGCTCCTTACTGGCCACAATTGACGAAAGAAAAGACTAAGCACCATTGGTTAAAAGTGGATTTCAATAAGTGGAAGGATGAAGATGATAGCGAAGATGAAGCCGAAGGACAAGACTCAGATTTTGGTGATCTAATGCGGTCGATGGGTCAAGGAGGCGGTATGGGTGGTATGGGCGGTATGGGTGGTATGGGAGGAATGGGAGGTATGGGTATGGGCGGTATGGGTGGTATGGGAATGGGTGGTTTAGGTGACAAGCCCTCTTTCGAAGGAATGGAAGAAGAAGATTCGGACGACGAAGATTTGCCCGACCTCGAAGAGTAATAGTGTTTTTATTACACCATATTCCATTTCCCTGTTATTGCATAAGGCCTCAGAAGAAGATGAAAAAATTGAAGCTATGAACGGACAGTCAAATCGATCACGCAGTTCACTG

• ....55,000 more contigs
• Split up into ~1000 files of ~55 contigs each
• Custom perl script or bioperl routines
• contig-0.fsa ... contig-997.fsa
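The slide mentions a custom Perl script or BioPerl routines for the split. As an illustration only, the same step can be sketched in Python. The function name `split_fasta` and the tiny demo sequences are hypothetical, and the sketch assumes each record starts with a `>` header line, as in the sample above.

```python
def split_fasta(text, per_file=55, prefix="contig-", suffix=".fsa"):
    """Group FASTA records into chunks of per_file records each.

    Returns a dict mapping file names (contig-0.fsa, contig-1.fsa, ...)
    to the FASTA text that belongs in that file.
    """
    records = []
    for line in text.splitlines():
        if line.startswith(">"):
            records.append([line])        # a header starts a new record
        elif records and line.strip():
            records[-1].append(line)      # sequence line of current record
    chunks = {}
    for index in range(0, len(records), per_file):
        name = f"{prefix}{index // per_file}{suffix}"
        group = records[index:index + per_file]
        chunks[name] = "\n".join("\n".join(rec) for rec in group) + "\n"
    return chunks


# Three toy records split two per file.
demo = ">c1\nACGT\n>c2\nGGTA\nTTAA\n>c3\nCCGA\n"
for name, body in split_fasta(demo, per_file=2).items():
    print(name, body.count(">"))
```

Writing each dictionary entry to disk in the directory named by InputSandboxBaseURI would give the numbered inputs that the parametric JDL's `_PARAM_` placeholder selects.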

Page 26

NGS WMS

[Architecture diagram with host names. UI: ngsui03.ngs.ac.uk; WMS: ngswms01.ngs.ac.uk; Information Server (BDII): bdii.ngs.ac.uk; MyProxy: myproxy.ngs.ac.uk; grid sites (RAL-NGS2, Scotgrid-Glasgow, Oxford-OERC, Manchester-NGS2, RAL-LCG, etc.; 12,000+ CPUs). Connection labels: ssh Login MEG; gridftp; [email protected]]

Page 27

Job types

• Single Job
  – Normal: simple batch job
  – MPICH: parallel jobs
  – Interactive: output streamed back to the client
• Parametric
  – Set of similar jobs whose JDL attributes are parameterised
• Collections
  – Group of jobs without dependencies
• DAG
  – Group of dependent jobs