BioScience on the TeraGrid
Daniel S. Katz
Director of Science, TeraGrid GIG
Senior Fellow, Computation Institute, University of Chicago & Argonne National Laboratory
Affiliate Faculty, Center for Computation & Technology, LSU
Adjunct Associate Professor, Electrical and Computer Engineering, LSU
What is the TeraGrid
• World’s largest distributed cyberinfrastructure for open scientific research, supported by US NSF
• Integrated high performance computers (>2 PF HPC & >27000 HTC CPUs), data resources (>2 PB disk, >60 PB tape, data collections), visualization, experimental facilities (VMs, GPUs, FPGAs), network at 11 Resource Provider sites
• Allocated to US researchers and their collaborators through national peer-review process
• DEEP: provide powerful computational resources to enable research that can’t otherwise be accomplished
• WIDE: grow the community of computational science and make the resources easily accessible
• OPEN: connect with new resources and institutions
• Integration: Single {portal, sign-on, help desk, allocations process, advanced user support, EOT, campus champions}
http://www.teragrid.org/
Governance
• 11 Resource Providers (RPs) funded under separate agreements with NSF
  – Different start and end dates
  – Different goals
  – Different agreements
  – Different funding models
• 1 Coordinating Body – Grid Infrastructure Group (GIG)
  – University of Chicago/Argonne National Laboratory
  – Subcontracts to all RPs and six other universities
  – 7-8 Area Directors
  – Working groups with members from many RPs
• TeraGrid Forum with Chair
How TeraGrid Is Used
Use Modality                                     Community Size (rough est., number of users)
Batch Computing on Individual Resources          850
Exploratory and Application Porting              650
Workflow, Ensemble, and Parameter Sweep          250
Science Gateway Access                           500
Remote Interactive Steering and Visualization    35
Tightly-Coupled Distributed Computation          10
(2006 data)
How One Uses TeraGrid
[Diagram labels: access through POPS (for now), Science Gateways, User Portal, and Command Line; TeraGrid Infrastructure (Accounting, Network, Authorization, …); Compute Service, Viz Service, and Data Service at RP 1, RP 2, and RP 3 (Network, Accounting, …)]
Science Gateways
• A natural extension of Internet & Web 2.0
• Idea resonates with scientists
  – Researchers can imagine scientific capabilities provided through a familiar interface
    • Mostly a web portal, or a web or client-server program
• Designed by communities; provide interfaces understood by those communities
  – Also provide access to greater capabilities (back end)
  – Without users needing to understand the details of those capabilities
  – Scientists know they can undertake more complex analyses, and that’s all they want to focus on
  – TeraGrid provides tools to help developers
• Seamless access doesn’t come for free
  – Hinges on a very capable developer
Nancy Wilkins-Diehr
TeraGrid -> XD Future
• Current RP agreements end in March 2011
  – Except track 2 centers (current and future)
• TeraGrid XD (eXtreme Digital) starts in April 2011
  – Era of potential interoperation with OSG and others
  – New types of science applications?
• Current TG GIG continues through July 2011
  – Allows four months of overlap in coordination
  – Probable overlap between GIG and XD members
• Blue Waters (track 1) production in 2011
Grid Enabled Neurosurgical Imaging Using Simulation (GENIUS)
Peter Coveney, University College London
Model large-scale, patient-specific cerebral blood flow on a clinically relevant time scale
• Provide simulation support within the operating theatre for neuroradiologists
• Provide new information to surgeons for patient management and therapy:
  1. Diagnosis and risk assessment
  2. Predictive simulation in therapy
• Provide patient-specific information to help plan embolisation of arterio-venous malformations, coiling of aneurysms, etc.
Clinical workflow:
• Book computing resources in advance or use preemption
• Shift imaging data around quickly over high-bandwidth low-latency dedicated links
• Interactive simulations and real-time visualization for immediate feedback
OLSGW Gadgets
• OLSGW integrates bio-informatics applications
  • BLAST, InterProScan, CLUSTALW, MUSCLE, PSIPRED, ACCPRO, VSL2
• 454 Pyrosequencing service under development
• Four OLSGW gadgets have been published in the iGoogle gadget directory. Search for “TeraGrid Life Science”.
Wenjun Wu, Thomas Uram, Michael Papka, ANL
Multiscale Simulation of Arterial Tree
Need to combine multi-scale models: 1D (arteries), 3D Navier Stokes (organs, arterial junctions, etc.), Dissipative Particle Dynamics (capillaries, venules, arterioles, blood cells, etc.), Molecular Dynamics (blood cells, platelets, molecular adhesion, etc.)
NIH/NSF-IMAG project: George Em Karniadakis, Brown
[Figure labels: activated platelets; arterioles/venules, 50 microns]
• Platelet diameter is 2-4 µm
• Normal platelet concentration in blood is 300,000/mm³
• Functions: activation, adhesion to injured walls and to other platelets
Expressed Sequence Tag (EST) Pipeline
• ESTs are a collection of random cDNA sequences, sequenced from a cDNA library or sequencing devices
  – Typical inputs are O(million) sequences
  – Newer 454 devices produce higher volumes and are relatively easy to obtain and operate
  – Stored using FASTA format
• ESTs are clustered and assembled to form contigs
• Contigs are then used to identify potential unknown genes by BLASTing against a known protein database
• Goal: use TeraGrid for backend computing, with existing software, and a gateway frontend
Initial results: a run that took 5 days on a local cluster was done in 2 days; more optimization underway
A. Kulshrestha, S. L. Pallickara, K. N. Muthuram, C. Kong, Q. Dong, M. Pierce, H. Tang, IU
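Because the EST inputs are stored in FASTA format, any gateway front end has to read and count sequences before handing them to clustering and assembly. The following is a minimal sketch of a FASTA reader in Python; the function name `read_fasta` and the two sample records are illustrative assumptions, not part of the pipeline described on this slide.

```python
from io import StringIO


def read_fasta(handle):
    """Yield (header, sequence) pairs from an open FASTA text stream."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new record starts; emit the previous one, if any.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            # Sequence data may wrap across several lines.
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


# Made-up sample input, for illustration only.
sample = StringIO(
    ">est_1 made-up record\nACGTAC\nGGTA\n>est_2 made-up record\nTTGACA\n"
)
records = list(read_fasta(sample))
print(len(records), "sequences read")
```

A generator keeps memory use flat, which matters when inputs reach the O(million) sequences mentioned above.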
Multiscale Computer Simulation of the Immature HIV-1 Virion
G. A. Voth, U. of Chicago
An iterative modeling approach combining experimental imaging (cryo-electron tomography), coarse-grained (CG) simulation, and atomic-level molecular dynamics (MD)
[Figure labels: Experimental structures; Atomic-level simulation; Coarse-grained (CG) model development; CG simulation; New CG Interactions from MD; CG model refinement; Key CG interactions]
Wright, Schooler, Ding, Kieffer, Fillmore, Sundquist, Jenson, EMBO, 26, 2218, 2007
CIPRES Portal: A New Science Gateway for Systematics
• Systematics: study of diversification of life and relationships among living things through time
• CIPRES: a flexible web application that can be sustained by the community at minimal cost even beyond the funding period of the project
• Tools include parallel versions of MrBayes, RAxML, GARLI
• User requirements include:
  – Access to most or all native command line options
  – Add new tools quickly
  – Provide personal user space for storing results
  – Use TeraGrid resources to quickly provide results
• Cited in at least 35 publications, including Nature, PNAS, Cell
  – Examples: New Family Tree for Arthropoda, Genome Sequence of a Transitional Eukaryote, Co-evolution of Beetles and Flowering Plants
• Used routinely in at least 5 undergraduate classes
• Usage: 77% US (incl. 17 EPSCoR states), 23% from 33 other countries
Mark Miller, SDSC
Patient-Specific HIV Drug Therapy
Peter Coveney, University College London
HIV-1 protease is a common target for HIV drug therapy
• Enzyme of HIV responsible for protein maturation
• Target for anti-retroviral inhibitors
• Example of structure-assisted drug design
• 9 FDA inhibitors of HIV-1 protease
So what’s the problem?
• Emergence of drug-resistant mutations in protease
• Render drug ineffective
• Drug-resistant mutants have emerged for all FDA inhibitors
• Too many mutations to be interpreted by a clinician
Solution: build a Binding Affinity Calculator (BAC)
• Provide tools that allow simulations to be used in clinical context, including lightweight client
  – User only needs to specify enzyme, mutations relative to wildtype, drug
    • Other options can be specified but begin as default
• Requires large number of simulations to be constructed and run automatically (across distributed HPC resources)
  – To investigate generalisation
  – Automation is critical for clinical use
• Turn-around time scale of around a week is required
• Trade off between accuracy and time-to-solution
Initial results: ensemble MD calculations for lopinavir vs wildtype & five mutants appear promising; excellent relative ranking in binding free energies
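The minimal input described for BAC (enzyme, mutations relative to wildtype, and drug, with everything else defaulted) could be pictured as a small request object that expands automatically into one simulation job per variant and replica. This is only a sketch in Python: `BacRequest`, `expand_jobs`, the `replicas` default of 5, and the `mutation_N` labels are all hypothetical, since the slides do not describe BAC's actual interface or ensemble size.

```python
from dataclasses import dataclass
from typing import List, Tuple

# A variant is the tuple of mutations relative to wildtype; () means wildtype.
Variant = Tuple[str, ...]


@dataclass(frozen=True)
class BacRequest:
    enzyme: str
    drug: str
    mutants: Tuple[Variant, ...] = ()
    replicas: int = 5              # hypothetical default ensemble size
    include_wildtype: bool = True  # hypothetical default


def expand_jobs(req: BacRequest) -> List[Tuple[str, str, Variant, int]]:
    """Build one (enzyme, drug, variant, replica) job per simulation."""
    variants: List[Variant] = [()] if req.include_wildtype else []
    variants.extend(req.mutants)
    return [
        (req.enzyme, req.drug, variant, replica)
        for variant in variants
        for replica in range(req.replicas)
    ]


# Wildtype plus five mutants, as in the initial results; labels are placeholders.
request = BacRequest(
    enzyme="HIV-1 protease",
    drug="lopinavir",
    mutants=tuple((f"mutation_{i}",) for i in range(1, 6)),
)
jobs = expand_jobs(request)
print(len(jobs), "simulation jobs")
```

Keeping the defaults on the request object means a clinician-facing client only has to fill in three fields, while the job list can still be generated and submitted without manual steps.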
Scripting Protein Structure Prediction
T. Sosnick, K. Freed, G. Hocky, J. DeBartolo, A. Adhikari, J. Xu, W. Wilde, U. Chicago
[Diagram labels: … 1000 predict() calls, followed by analyze()]
int nSim = 1000;
int maxRounds = 3;
Protein pSet[ ] <ext; exec="Protein.map">;
float startTemp[ ] = [ 100.0, 200.0 ];
float delT[ ] = [ 1.0, 1.5, 2.0, 5.0, 10.0 ];
foreach p, pn in pSet {
  foreach t in startTemp {
    foreach d in delT {
      ItFix(p, nSim, maxRounds, t, d);
    }
  }
}

ItFix()
{
  foreach sim in [1:nSim] {
    (structure[sim], log[sim]) = predict(p, t, d);
  }
  result = analyze(structure)
}
10 proteins x 1000 simulations x 3 rounds x 2 temps x 5 delta-T’s = 300K application runs
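The run count follows directly from the nested loops in the script. As a rough check, the same sweep can be enumerated in Python (this is not the Swift script itself) with `itertools.product`. The protein names are placeholders, because the contents of `Protein.map` are not shown, and the count of 10 proteins is taken from the line above.

```python
from itertools import product

n_sim = 1000
max_rounds = 3
start_temp = [100.0, 200.0]
del_t = [1.0, 1.5, 2.0, 5.0, 10.0]
proteins = [f"protein_{i}" for i in range(10)]  # placeholder names


def sweep(proteins, start_temp, del_t):
    """Return one (protein, start temperature, delta-T) tuple per ItFix call."""
    return list(product(proteins, start_temp, del_t))


itfix_calls = sweep(proteins, start_temp, del_t)
# Each ItFix call runs n_sim predictions in each of max_rounds rounds.
total_runs = len(itfix_calls) * n_sim * max_rounds
print(len(itfix_calls), "ItFix calls,", total_runs, "application runs")
```

The sweep produces 100 ItFix calls, and 100 x 1000 x 3 gives the 300K application runs quoted above.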