The iPlant Collaborative Community Cyberinfrastructure for Life Science
Jason Williams, Cold Spring Harbor Laboratory, iPlant
www.iPlantCollaborative.org
The iPlant Collaborative
Vision
How can we prepare for science we can’t anticipate?
Enable life science researchers and educators to use and extend cyberinfrastructure to understand and ultimately predict the complexity of biological systems
Fulfilling our vision will mean enabling access to datasets and tools
Environmental data
Phenotype data
Phylogenetic Inferences
Ecological Models
Crop Models
Association Studies
Molecular Networks
Genomic data and analysis:
• Sequencing/assembly
• Transcriptome profiling
• Variants
• Functional annotation
• Proteomics
Predictive and synthetic
Knowledge gathering
Retrodictive insights
This means working with a vast landscape of data and tools:
Navigating this landscape requires cyberinfrastructure:
The iPlant Collaborative
What is cyberinfrastructure?
Cyberinfrastructure consists of computing systems, data storage systems, instruments and data repositories, visualization environments, and people, linked together by software and networks to improve research productivity and enable breakthroughs not otherwise possible. --Craig Stewart
iPlant makes computation, data storage, cloud services, and software tools easily available to informaticians and researchers, leveraging existing CI investments.
Biological Cyberinfrastructure
The Problem of Big Data in Biology
• Initial funding in 2008
• Almost 2 years of community input gathering – software development starts in 2009
• Major CI components appear late 2010
• Finished 5th year
• Recommended for second 5-year term
• > 9000 users
• > 20K analysis jobs in 2012 (> 10K HPC jobs)
• 500 terabytes of user data
The iPlant Collaborative
Where iPlant is today and where we are going
Image from: http://adammclane.com/2011/12/06/bottlenecks/
iPlant Renewed by NSF
September 2013 begins next 5-year period
Scientific Advisory Board
Focus on Genotype-Phenotype science
NSF Recommended expansion of scope beyond plants
The iPlant Collaborative
What we have to offer you
• Data Management & Storage Resources
• Access to High Performance Computing Resources
• Tool Integration System
• Application Programming Interfaces (APIs)
• Cloud Computing Resources
• Genotype To Phenotype Science Enablement Portfolio
• Tree of Life Science Enablement Portfolio
• Image Analysis Platform
• Support for Molecular Breeding Platform (IBP)
• Support for AgMIP
How iPlant CI Enables Discovery
Challenge: Create an easy-to-use platform powerful enough to handle data-intensive biology
Many bioinformatics tools “off limits” to those without specialized computational backgrounds.
How iPlant CI Enables Discovery
Solution: Discovery Environment
An extensible platform for science
• High-powered computing
• Data sharing/collaboration
• Easy to use interface
• Virtually limitless apps
• Analysis history (provenance)
How iPlant CI Enables Discovery
What the Discovery Environment means to bench biologists
“In one week I was able to align my RNA-Seq samples using a method that had previously took me a month on the bioinformatics laboratory computers…
Richard Barker – Univ. Wisconsin, Madison
How iPlant CI Enables Discovery
Challenge: Collaborate and access software on demand
Frustrated bioinformaticians serving the needs of several users
+ works well / powerful
- expensive / complex
Cartoon: http://phdhumor.blogspot.com/2008/12/on-lazy-day-for-bioinformatician.html
How iPlant CI Enables Discovery
Solution: Atmosphere
On-demand computing resource built on a cloud infrastructure
• Virtual Machine pre-configured with: software, memory requirements, processing power
• iPlant authentication, storage, and HPC capabilities
• Build custom images/appliances and share with community
• Cross-platform desktop access to GUI applications in the cloud (using VNC)
How iPlant CI Enables Discovery
What Atmosphere means to bioinformaticians
“What my users used to call me for, they now do on their own through Atmosphere. Now I can scale up my user community”
Nathan Miller, Univ. Wisconsin, Madison
• BLAST 400k transcripts against NCBI nr in 36 h vs. 2 months
• Use iPlant Data Store to move 1500 high-res images per day for analysis
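As a rough check on the figures above, the sketch below works out the implied speedup and the average image rate. It assumes that "2 months" means about 60 days of wall-clock time, which the slide does not state. The function and variable names are illustrative only.

```python
# Back-of-the-envelope arithmetic for the two figures quoted on the slide.
# Assumption (not from the slides): "2 months" is read as 60 days of wall-clock time.
HOURS_PER_DAY = 24


def speedup(baseline_hours: float, new_hours: float) -> float:
    """Return how many times faster the new run is than the baseline."""
    if new_hours <= 0:
        raise ValueError("new_hours must be positive")
    return baseline_hours / new_hours


baseline_hours = 60 * HOURS_PER_DAY  # assumed reading of "2 months"
iplant_hours = 36                    # from the slide
blast_speedup = speedup(baseline_hours, iplant_hours)
print(f"Approximate BLAST speedup: {blast_speedup:.0f}x")

# 1500 images per day expressed as an average hourly rate.
images_per_hour = 1500 / HOURS_PER_DAY
print(f"Average image rate: {images_per_hour:.1f} images/hour")
```

Reading "2 months" as a different number of days changes the result in proportion, so the speedup should be taken as an order-of-magnitude figure.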
“iPlant is a great equalizer.” Mike Covington, UC Davis
How iPlant CI Enables Discovery
Challenge: Navigate biology’s “Data deluge”
HT image data – GBs per day
HT sequence data – TBs per run
How iPlant CI Enables Discovery
Solution: iPlant Data Store
All data within the same platform: speed and accessibility
• Access your data from multiple iPlant services
• Automatic, redundant data backup between the University of Arizona and the University of Texas (NSF data management plan)
• Multiple ways to share data with collaborators
• Multi-threaded high speed transfers
• Default 100GB allocation. >1TB allocations available with justification
Source              Time (s)
CD                  320
Berkeley Server     150
External Drive      36*
USB2.0 Flash        30
iPlant Data Store   18*
My Computer         15
“The ability to transport 2TB of data overnight using the iRODS system was particularly helpful because previously, we had been mailing hard drives which is not an optimal solution to sharing big data.”
James Koltes, Iowa State
How iPlant CI Enables Discovery
Many more applications not covered here…
• DNA Subway: Annotation, DNA Barcoding, RNA-Seq
• Standalone Apps: TNRS, TreeViewer, PhytoBisque, etc.
• iPlant Semantic Web – “Intelligent” workflow authoring
• Foundation API: For programmers embedding iPlant CI capabilities
Highlighted Objectives and Deliverables
Community-identified priorities
• Increased interoperability with other data providers – e.g. BioMarts, CoGe, MaizeGDB
• Data discovery through interaction with trait repositories (trait/plant ontologies)
• Workflows for variant discovery – SNP detection pipelines
• Scalable Genome Assembly Workflows – expanded capabilities with MAKER, InterProScan
• iPlant Data Commons – Resources for storage, data conversion, and metadata
The iPlant Collaborative
Your colleagues
Staff: Greg Abram, Sonali Aditya, Ritu Arora, Roger Barthelson, Rob Bovill, Brad Boyle, Gordon Burleigh, John Cazes, Mike Conway, Victor Cordero, Rion Dooley, Aaron Dubrow, Andy Edmonds, Dmitry Fedorov, Melyssa Fratkin, Michael Gatto, Utkarsh Gaur, Cornel Ghiban, Steve Gregory, Matthew Hanlon, Natalie Henriques, Uwe Hilgert, Nicole Hopkins, EunSook Jeong, Logan Johnson, Chris Jordan, Kathleen Kennedy, Mohammed Khalfan, David Knapp, Lars Koersterk, Sangeeta Kuchimanchi, Kristian Kvilekval, Sue Lauter, Tina Lee, Andrew Lenards, Monica Lent, Zhenyuan Lu, Eric Lyons, Aaron MarcuseKubitz, Naim Matasci, Sheldon McKay, Robert McLay, Nathan Miller, Steve Mock, Martha Narro, Shannon Oliver, Benoit Parmentier, Jmatt Peterson, Dennis Roberts, Paul Sarando, Jerry Schneider, Bruce Schumaker, Edwin Skidmore, Brandon Smith, Mary Margaret Sprinkle, Sriram Srinivasan, Josh Stein, Lisa Stillwell, Jonathan Strootman, Peter Van Buren, Hans VasquezGross, Rebeka Villarreal, Ramona Walls, Liya Wang, Anton Westveld, Jason Williams, John Wregglesworth, Weijia Xu
Leadership Team:
Steve Goff - UA
Dan Stanzione - TACC
Matthew Vaughn - TACC
Nirav Merchant - UA
Doreen Ware - CSHL
Michael Schatz - CSHL
David Micklos - CSHL
Ann Stapleton - UNC Wilmington
Ron Vetter - UNC Wilmington
Faculty Advisors & Collaborators: Ali Akoglu, Kobus Barnard, Timothy Clausner, Brian Enquist, Damian Gessler, Ruth Grene, John Hartman, Matthew Hudson, David Lowenthal, B.S. Manjunath, David Neale, Brian O’Meara, Sudha Ram, David Salt, Mark Schildhauer, Doug Soltis, Pam Soltis, Edgar Spalding, Alexis Stamatakis, Steve Welch
Students: Peter Bailey, Jeremy Beaulieu, Devi Bhattacharya, Storme Briscoe, YaDi Chen, David Choi, Barbara Dobrin, John Donoghue, Yekatarina Khartianova, Chris La Rose, Amgad Madkour, Aniruddha Marathe, Andre Mercer, Kurt Michaels, Zack Pierce, Andrew Predoehl, Sathee Ravindranath, Kyle Simek, Gregory Striemer, Jason Vandeventer, Nicholas Woodward, Kuan Yang
Postdocs: Barbara Banbury, Christos Noutsos, Solon Pissis, Brad Ruhfel
The iPlant Collaborative
Workshop Goals (and considerations)
• Demonstrate some of the ways iPlant CI can advance your science
• Familiarize you with iPlant tools and services
• Help you identify the best way to get started
• Workshop is fast-paced
• Use the handouts and other resources to complete what we don’t finish (30+ people sharing limited bandwidth!)