WP10
A demonstration of the use of the DataGrid testbed and services for the biomedical community
Biomedical applications work package
V. Breton, Y. Legré (CNRS/IN2P3)
R. Météry (CS)
Credits: C. Blanchet, T. Contamine, S. Gadras, M. Joubert, A. Minne, J. Montagnat
The Visual DataGrid Blast
• A graphical interface to enter query sequences and select the reference database
• A script to execute the BLAST algorithm on the grid
• A graphical interface to analyze results
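The slide does not show the execution script itself. As a minimal sketch, assuming the worker node provides the legacy NCBI `blastall` program (with its `-p`, `-d`, `-i`, `-o` and `-e` options), the core of such a script might simply assemble the BLAST command line. The function name `blastall_command` and the file names are hypothetical.

```python
import shlex
from pathlib import Path


def blastall_command(query: Path, database: str, output: Path,
                     program: str = "blastp", evalue: float = 10.0) -> list[str]:
    """Build (but do not run) a legacy NCBI blastall command line.

    Hypothetical helper: the real WP10 script is not shown on the slide.
    """
    return [
        "blastall",
        "-p", program,      # BLAST flavour: blastp, blastn, blastx, ...
        "-d", database,     # name of a formatted reference database
        "-i", str(query),   # FASTA file holding the query sequences
        "-o", str(output),  # file that receives the BLAST report
        "-e", str(evalue),  # expectation-value cutoff for reported hits
    ]


if __name__ == "__main__":
    cmd = blastall_command(Path("queries.fasta"), "swissprot", Path("result.txt"))
    print(shlex.join(cmd))
    # On a node where blastall is installed, the job would then run:
    # subprocess.run(cmd, check=True)
```

Returning an argument list rather than a shell string keeps file names with spaces safe when the command is later passed to `subprocess.run`.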
When/Where do biologists use BLAST?
• (When?) As the first step in analysing new sequences: comparing DNA or protein sequences to others stored in personal or public databases
• (Where?) In a laboratory with up-to-date versions of the genomics and post-genomics data banks
– Requires equipment to store the databases and run the algorithms
– Requires manpower for system and network maintenance and for frequent updates of the databases
• Most biologists use “integrated” web portals for their comparative genomics analysis: no need to worry about biological file formats or method arguments
Web portals for biologists under growing pressure
• Biologists enter sequences through a web interface
• Pipelined execution of bioinformatics algorithms
– Comparative genomics analysis
– Phylogenetics
– 2D and 3D molecular structure of proteins…
• The algorithms are executed on a local cluster
– Big labs have big clusters…
– But the pressure is growing:
• More and more biologists
• compare larger and larger sequences (whole genomes)…
• to more and more genomes…
• with fancier and fancier algorithms!
Executing BLAST on the grid
[Figure: BLAST job flow on the DataGrid testbed. The User Interface (UI) submits a JDL job description to the Resource Broker, which consults the Information Service and the Replica Catalog and hands the job to the Job Submission Service; BLAST runs on a Computing Element against a database (DB) held on a Storage Element. Input sandbox: input sequences; output sandbox: BLAST result. Job submit events and job status are recorded by the Logging & Bookkeeping service. Credit: Fabio Hernandez]
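The job submitted from the UI is described in JDL, with the input sequences in the input sandbox and the BLAST result in the output sandbox. A minimal sketch, assuming the common EDG/gLite JDL attributes `Executable`, `Arguments`, `StdOutput`, `StdError`, `InputSandbox` and `OutputSandbox`, of generating such a description; the wrapper script `blast_wrapper.sh`, the file names and the helper `make_jdl` are hypothetical, and the WP10 demo's actual JDL is not shown.

```python
def make_jdl(executable: str, query_file: str, result_file: str) -> str:
    """Render a minimal JDL job description as 'Attribute = value;' lines.

    Hypothetical helper; attribute set assumed from common EDG/gLite usage.
    """
    attributes = [
        ("Executable", f'"{executable}"'),
        ("Arguments", f'"{query_file} {result_file}"'),
        ("StdOutput", '"blast.out"'),
        ("StdError", '"blast.err"'),
        # Input sandbox: files shipped from the UI to the Computing Element
        ("InputSandbox", f'{{"{executable}", "{query_file}"}}'),
        # Output sandbox: files returned to the UI when the job has finished
        ("OutputSandbox", f'{{"blast.out", "blast.err", "{result_file}"}}'),
    ]
    return "".join(f"{name} = {value};\n" for name, value in attributes)


if __name__ == "__main__":
    print(make_jdl("blast_wrapper.sh", "queries.fasta", "result.txt"))
```

The resulting text would be saved to a `.jdl` file and submitted from the UI with the middleware's job-submission command; the Resource Broker then picks a Computing Element.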
Actual demonstration
[Figure: at the UI, an input file of query sequences (Seq1 … Seqn) is spread over several Computing Elements, each running BLAST against a database (DB); the outputs are gathered into a single RESULT.]
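The demonstration spreads the query sequences over several Computing Elements and gathers the outputs into one result. A minimal sketch of that split/merge step, assuming standard FASTA input (`>header` lines followed by sequence lines) and a round-robin split of the queries; all function names are hypothetical and the demo's own splitting strategy is not documented on the slide.

```python
def read_fasta(text: str) -> list[tuple[str, str]]:
    """Parse FASTA text into (header, sequence) pairs."""
    records: list[tuple[str, str]] = []
    header, parts = None, []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(parts)))
            header, parts = line[1:].strip(), []
        elif header is not None:
            parts.append(line)  # sequence lines may wrap; concatenate them
    if header is not None:
        records.append((header, "".join(parts)))
    return records


def split_round_robin(records, n_jobs: int):
    """Deal the query records over n_jobs chunks, one chunk per grid job."""
    chunks = [[] for _ in range(n_jobs)]
    for i, record in enumerate(records):
        chunks[i % n_jobs].append(record)
    return [chunk for chunk in chunks if chunk]  # drop empty chunks


def to_fasta(records) -> str:
    """Serialise records back to FASTA for one job's input sandbox."""
    return "".join(f">{header}\n{seq}\n" for header, seq in records)


def merge_reports(partial_reports) -> str:
    """Concatenate the per-job BLAST reports into one result."""
    return "".join(partial_reports)


if __name__ == "__main__":
    sample = ">seq1\nMKV\nLLA\n>seq2\nGGH\n>seq3\nPPW\n"
    chunks = split_round_robin(read_fasta(sample), 2)
    for n, chunk in enumerate(chunks, start=1):
        print(f"--- job {n} input ---")
        print(to_fasta(chunk), end="")
```

With two jobs, job 1 receives seq1 and seq3 and job 2 receives seq2. Splitting the queries rather than the database means every job searches the full database, so each query's E-values are the same as in a single serial run and the plain-text reports can simply be concatenated.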
The Grid impact on computing
• Swissprot vs Swissprot (100,000 sequences)
– Running time on one CPU: 228 hours
– Tests at the Institut de Biologie et Chimie des Protéines (quadripro): ~49 hours
– Tests on DataGrid (cc-in2p3): 3 hours
• Impacts:
– Reduced pressure on local computing
– Ability to handle very large jobs
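The timings above translate into speedups by simple division. A small worked example using only the slide's figures; the notion of "CPU-equivalents" is an added illustration that assumes every node is as fast as the reference CPU and that the work parallelises perfectly (no queueing or data-transfer overhead).

```python
import math

SERIAL_HOURS = 228.0  # Swissprot vs Swissprot on one CPU (from the slide)

# Wall-clock times reported on the slide
measured_hours = {
    "IBCP quadripro": 49.0,  # given on the slide as approximate (~49 h)
    "DataGrid (cc-in2p3)": 3.0,
}


def speedup(serial: float, parallel: float) -> float:
    """Ratio of serial to parallel wall-clock time."""
    return serial / parallel


def min_cpu_equivalents(serial: float, parallel: float) -> int:
    """Fewest reference-speed CPUs that could finish the work in `parallel` hours."""
    return math.ceil(serial / parallel)


if __name__ == "__main__":
    for site, hours in measured_hours.items():
        print(f"{site}: {hours:g} h, speedup {speedup(SERIAL_HOURS, hours):.1f}x, "
              f">= {min_cpu_equivalents(SERIAL_HOURS, hours)} CPU-equivalents")
```

The grid run is about 76 times faster than one CPU, so under these assumptions it did the work of at least 76 reference CPUs, versus a speedup of roughly 4.7 on the local quad-processor machine.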
The grid impact on data handling
• DataGrid will allow mirroring of databases
– An alternative to the current costly replication mechanism
– Allowing web portals on the grid to access updated databases
[Figure: a biomedical Replica Catalog covering the TrEMBL (EBI) and Swissprot (Geneva) databases.]
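A replica catalog maps one logical database name to the physical copies held at different sites, so a job can read whichever mirror is most convenient. A toy in-memory sketch of that idea, assuming a simple site-preference ranking; the class, the `lfn:` naming and the example.org URLs are hypothetical and do not reproduce the DataGrid Replica Catalog API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Replica:
    site: str  # where the physical copy lives
    url: str   # how to reach it (hypothetical URLs below)


class ReplicaCatalog:
    """Toy catalog: logical file name -> list of physical replicas."""

    def __init__(self) -> None:
        self._entries: dict[str, list[Replica]] = {}

    def register(self, lfn: str, replica: Replica) -> None:
        self._entries.setdefault(lfn, []).append(replica)

    def replicas(self, lfn: str) -> list[Replica]:
        return list(self._entries.get(lfn, []))

    def best_replica(self, lfn: str, preferred_sites: list[str]) -> Replica:
        """Pick the replica whose site ranks highest; unranked sites come last."""
        reps = self.replicas(lfn)
        if not reps:
            raise KeyError(f"no replica registered for {lfn!r}")
        rank = {site: i for i, site in enumerate(preferred_sites)}
        return min(reps, key=lambda r: rank.get(r.site, len(rank)))


if __name__ == "__main__":
    catalog = ReplicaCatalog()
    catalog.register("lfn:swissprot", Replica("geneva", "gsiftp://geneva.example.org/db/swissprot"))
    catalog.register("lfn:swissprot", Replica("lyon", "gsiftp://lyon.example.org/db/swissprot"))
    print(catalog.best_replica("lfn:swissprot", ["lyon", "geneva"]).url)
```

Updating the master copy and re-registering mirrors in the catalog is what lets portals on the grid always find an up-to-date database without each lab maintaining its own copy.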
This demo illustrates how grids can bring a revolution to genomics
• Grids expand the performance of genomics web portals
– Distributed execution of bioinformatics algorithms, even those requiring huge amounts of CPU
– Maintenance of up-to-date biological databases over the network
• Grids open new perspectives in large-scale genomics analysis
– Complete genome annotation
– Cross-genome analysis
– Data mining on distributed databases
– Pipelining of huge automated bioinformatics analyses
– …