Workflows for Biological Research at Notre Dame
Sandra Gesing
Biological Research at Notre Dame
• genomics
• proteomics
• molecular simulations
• docking
• disease modeling
[Figure: Black Swallowtail - larvae and butterfly]
Molecular Simulations and Docking
• Prediction and analysis of molecular structures
• Numerous applications, e.g.
  – Materials science
  – Drug design
[Figure: docking of ligands to a target; labels: binding pocket, scoring functions, binding energy]
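To make the role of scoring functions concrete, the following minimal sketch ranks ligands by predicted binding energy, where a lower (more negative) value indicates stronger predicted binding. The ligand names and energy values are hypothetical, and `rank_ligands` is an illustrative helper, not part of any docking tool mentioned in these slides.

```python
from typing import Dict, List, Tuple


def rank_ligands(scores: Dict[str, float], top_k: int) -> List[Tuple[str, float]]:
    """Return the top_k ligands with the lowest predicted binding energy."""
    # Sort ascending: the most negative energy (strongest predicted binding) first.
    ranked = sorted(scores.items(), key=lambda item: item[1])
    return ranked[:top_k]


# Hypothetical binding energies in kcal/mol, as a scoring function might report them.
example_scores = {
    "ligand_a": -7.2,
    "ligand_b": -9.1,
    "ligand_c": -5.4,
    "ligand_d": -8.3,
}

best = rank_ligands(example_scores, top_k=2)
print(best)
```

In a real screening run the scores would come from the docking program, and the ranking step would only shortlist candidates for closer inspection.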
Disease Modeling
• vector-borne diseases, e.g., lymphatic filariasis, malaria
• mathematical models
• prediction of interventions
• data on weather, demographics, interventions
Biological Research at Notre Dame
• technologies and methods for creating, analyzing, and predicting data are available
• immense amounts of data, e.g.,
  – ZINC database: ~20 million molecular structures
  – Human genome: ~3 billion DNA base pairs
• compute-intensive tasks
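A rough back-of-the-envelope calculation shows why these numbers imply compute-intensive work. The sketch below uses the two figures from the slide; the 2-bits-per-base encoding and the one minute of docking time per ligand are assumptions chosen purely for illustration, not measured values.

```python
# Figures taken from the slide.
GENOME_BASE_PAIRS = 3_000_000_000
ZINC_STRUCTURES = 20_000_000

# Assumption: four bases (A, C, G, T) packed at 2 bits each, no metadata.
BITS_PER_BASE = 2
genome_bytes = GENOME_BASE_PAIRS * BITS_PER_BASE // 8
genome_megabytes = genome_bytes / 1_000_000

# Assumption: docking one ligand takes one CPU minute (hypothetical).
ASSUMED_MINUTES_PER_LIGAND = 1.0
cpu_days = ZINC_STRUCTURES * ASSUMED_MINUTES_PER_LIGAND / 60 / 24
cpu_years = cpu_days / 365

print(f"Packed genome size: {genome_megabytes:.0f} MB")
print(f"Screening all of ZINC on one core: {cpu_years:.1f} CPU years")
```

Under these assumptions, screening the whole database on a single core would take decades, which is the motivation for distributing the work across many machines.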
Workflows
• a sequence of connected steps in a defined order based on their control and data dependencies
12181 acatttctac caacagtgga tgaggttgtt ggtctatgtt ctcaccaaat ttggtgttgt
12241 cagtctttta aattttaacc tttagagaag agtcatacag tcaatagcct tttttagctt
12301 gaccatccta atagatacac agtggtgtct cactgtgatt ttaatttgca ttttcctgct
12361 gactaattat gttgagcttg ttaccattta gacaacttca ttagagaag atagaatcaa
Figure copied from: Stuart Owen „Workflows with Taverna“
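The workflow definition above can be sketched in a few lines: steps are functions, dependencies say whose output feeds into which step, and a topological sort yields a valid execution order. The step names and the tiny GC-content example are hypothetical; `run_workflow` is an illustrative helper and not the API of Taverna or any other workflow system.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def run_workflow(steps, depends_on, initial_input):
    """Run each step after its dependencies, passing their outputs as arguments."""
    results = {}
    # static_order() lists every step after all the steps it depends on.
    order = list(TopologicalSorter(depends_on).static_order())
    for name in order:
        inputs = [results[dep] for dep in depends_on.get(name, [])]
        if not inputs:
            # Steps without dependencies consume the workflow input.
            inputs = [initial_input]
        results[name] = steps[name](*inputs)
    return order, results


# Hypothetical steps: clean a sequence, then compute its GC fraction.
steps = {
    "clean": lambda seq: seq.replace(" ", "").upper(),
    "count_gc": lambda seq: sum(seq.count(base) for base in "GC"),
    "length": lambda seq: len(seq),
    "gc_fraction": lambda gc, n: gc / n,
}
depends_on = {
    "clean": [],
    "count_gc": ["clean"],
    "length": ["clean"],
    "gc_fraction": ["count_gc", "length"],
}

order, results = run_workflow(steps, depends_on, "acatttctac caacagtgga")
print(order)
print(results["gc_fraction"])
```

Note that `count_gc` and `length` depend only on `clean`, so a workflow engine would be free to run them concurrently; this sketch runs them one after the other for simplicity.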
Communities
• users are generally not IT specialists
MoSGrid: Molecular Simulation Grid
• science gateway integrated with underlying compute and data management infrastructure
• distributed workflow management
• data repository
MoSGrid – Application Areas
Molecular Dynamics
• Study and simulation of molecular motion
Quantum Chemistry
• Study and simulation of molecular electronic behavior relative to their chemical reactivity
Docking
• Main focus on evaluation of ligand-receptor interactions (e.g., for drug design)
VectorBase
VectorBase - Galaxy
Disease Modeling – Bayesian Model
An Old Idea: Makefiles

part1 part2 part3: input.data split.py
	./split.py input.data

out1: part1 mysim.exe
	./mysim.exe part1 >out1

out2: part2 mysim.exe
	./mysim.exe part2 >out2

out3: part3 mysim.exe
	./mysim.exe part3 >out3

result: out1 out2 out3 join.py
	./join.py out1 out2 out3 > result
Slide copied from: Douglas Thain „Toward a Common Model for Highly Concurrent Applications“
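The Makefile above already encodes which tasks may run side by side. As a minimal sketch, the rules can be transcribed into a dictionary and grouped into "waves" of targets whose prerequisites are all satisfied. The `RULES` dictionary and `parallel_waves` are illustrative names; this is neither how Make nor how Makeflow is implemented, and the multi-target split rule is simplified to three separate entries.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# target -> prerequisites, transcribed from the Makefile above.
# Simplification: the multi-target rule for part1..part3 is listed per target.
RULES = {
    "part1": ["input.data", "split.py"],
    "part2": ["input.data", "split.py"],
    "part3": ["input.data", "split.py"],
    "out1": ["part1", "mysim.exe"],
    "out2": ["part2", "mysim.exe"],
    "out3": ["part3", "mysim.exe"],
    "result": ["out1", "out2", "out3", "join.py"],
}


def parallel_waves(rules):
    """Group targets into waves; targets within a wave are mutually independent."""
    sorter = TopologicalSorter(rules)
    sorter.prepare()
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        # Source files such as input.data have no rule and need no work.
        targets = [node for node in ready if node in rules]
        if targets:
            waves.append(targets)
        sorter.done(*ready)
    return waves


waves = parallel_waves(RULES)
for number, wave in enumerate(waves, start=1):
    print(number, wave)
```

The three simulation targets land in the same wave, which is exactly the parallelism a batch system can exploit.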
Makeflow = Make + Workflow
[Figure: Makeflow dispatching jobs to Local, Condor, SGE, and Work Queue]
• Provides portability across batch systems.
• Enables parallelism (but not too much!)
• Fault tolerance at multiple scales.
• Data and resource management.
Slide copied from: Douglas Thain „Toward a Common Model for Highly Concurrent Applications“
Outlook
• creation of more workflows in science gateways
• integration of science gateways with ICTBioMed infrastructure
• integration of Makeflow and ICTBioMed infrastructure