Keynote at Supercomputing 2014 (SC14), New Orleans, November 20, 2014. Goal: to engage the HPC community in the work of the NIH.
Meeting the Computational Challenges Associated with Human Health
Philip E. Bourne, PhD, Associate Director for Data Science
National Institutes of Health
SC14, New Orleans, November 20, 2014
We have come a long way in just one researcher’s career
We Have Both Been Very Successful
World Climate Report 2011
http://www.cnet.com/news/china-unseats-u-s-in-supercomputer-ranking/
SDSC Abbreviated Timeline, 1985 to 2010: Cray X-MP/48 (220 Mflops), Cray C90 (5 Gflops), Cray T3E (1 Tflops)
http://www.sdsc.edu/News%20Items/PR101510_25years.html
.. But There is Much to Do
Drugs are:
– Too few
– Too slow to reach market
– Not personalized
Rare diseases are ignored
Clinical trials are too limited in the number of patients, too expensive and not retroactive
Education & training do not match current market needs well
Research is:
– Not cost effective
– Not easily replicated
– Too slow to disseminate
…
.. And there is much ferment in research ..
http://fora.tv/2012/04/20/Congress_Unplugged_Phil_Bourne
.. and healthcare systems
http://www.genomicsengland.co.uk/the-100000-genomes-project/
But we can see the promise, and much of that promise is driven by the data revolution
The NIH Fire Hose Slide
An Example of That Promise: Comorbidity Network for 6.2M Danes over 14.9 Years
Jensen et al. 2014, Nat. Commun. 5:4022
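To make the idea of a comorbidity network concrete, here is a minimal Python sketch under simplifying assumptions: each patient is a hypothetical set of diagnosis codes, and each edge is weighted by relative risk (patients observed with both diagnoses divided by the number expected if the two were independent). It is a toy illustration, not the method of Jensen et al., who additionally analysed the temporal order of diagnoses in national registry data.

```python
from collections import Counter
from itertools import combinations


def comorbidity_network(patients: list[set[str]], min_shared: int = 1) -> dict[tuple[str, str], float]:
    """Build a weighted comorbidity network from per-patient diagnosis sets.

    Returns {(code_a, code_b): relative_risk} for every pair of codes that
    co-occur in at least `min_shared` patients. Codes are hypothetical.
    """
    n = len(patients)
    prevalence: Counter = Counter()  # number of patients carrying each diagnosis
    shared: Counter = Counter()      # number of patients carrying both codes of a pair
    for codes in patients:
        prevalence.update(codes)
        shared.update(combinations(sorted(codes), 2))

    network = {}
    for (a, b), c_ab in shared.items():
        if c_ab >= min_shared:
            # Relative risk: observed co-occurrence / co-occurrence expected under independence.
            network[(a, b)] = c_ab * n / (prevalence[a] * prevalence[b])
    return network
```

With four toy patients `[{"E11", "I10"}, {"E11", "I10"}, {"I10"}, {"J45"}]`, the single edge `("E11", "I10")` gets a relative risk of 2 × 4 / (2 × 3) ≈ 1.33, i.e. the pair co-occurs about a third more often than chance would predict. A real analysis would also need significance testing and correction for multiple comparisons.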
What is the NIH Doing to Fulfill That Promise?
ADDS Mission Statement
To foster an open ecosystem that enables biomedical* research to be conducted as a digital enterprise that enhances health, lengthens life and reduces illness and disability
* Includes biological, biomedical, behavioral, social, environmental, and clinical studies that relate to understanding health and disease.
Elements of The Ecosystem
Community • Policy • Infrastructure
• Sustainability
• Collaboration
• Training
Virtuous Research Cycle
Policies – Now & Forthcoming
Data Sharing
– Genomic data sharing announced
– Data sharing plans on all research awards
– Data sharing plan enforcement
• Machine-readable plan
• Repository requirements to include grant numbers
http://www.nih.gov/news/health/aug2014/od-27.htm
Policies - Forthcoming
Data Citation
– Goal: legitimize data as a form of scholarship
– Process:
• Machine-readable standard for data citation (done)
• Endorsement of data citation for inclusion in NIH biosketches, grants, reports, etc.
• Example formats for human readable data citations
• Slowly work into NLM/NCBI workflow
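One way to picture a citation that is machine-readable yet renders in a human-readable format is the minimal Python sketch below. It assumes a hypothetical record whose field names (`creators`, `year`, `title`, `repository`, `identifier`, `grant_number`) are illustrative only, not the NIH's or any published standard's schema; carrying the grant number mirrors the policy of linking repository records to grants.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class DataCitation:
    # Hypothetical fields for illustration; not an official NIH schema.
    creators: tuple[str, ...]
    year: int
    title: str
    repository: str
    identifier: str    # persistent identifier such as a DOI
    grant_number: str  # links the dataset back to the funding award

    def to_json(self) -> str:
        """Machine-readable form (tuples serialize as JSON arrays)."""
        return json.dumps(asdict(self), sort_keys=True)

    @classmethod
    def from_json(cls, text: str) -> "DataCitation":
        """Rebuild a citation from its machine-readable form."""
        fields = json.loads(text)
        fields["creators"] = tuple(fields["creators"])
        return cls(**fields)

    def to_text(self) -> str:
        """One possible human-readable rendering."""
        authors = ", ".join(self.creators)
        return (f"{authors} ({self.year}). {self.title}. {self.repository}. "
                f"{self.identifier}. Funded by {self.grant_number}.")
```

For a placeholder record such as `DataCitation(("Doe J", "Roe R"), 2014, "Example dataset", "Example Repository", "doi:10.xxxx/placeholder", "GRANT-0000")`, `to_text()` yields "Doe J, Roe R (2014). Example dataset. Example Repository. doi:10.xxxx/placeholder. Funded by GRANT-0000." Keeping one structured record and deriving every human-readable format from it is what lets citations flow into biosketches, grants and reports without retyping.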
Infrastructure - The Commons
[Diagram: the Commons linking six BD2K Centers, the DDICC, Software, Standards and multiple Labs]
What is the Commons?
A conceptual framework for sharing and being FAIR:
– Finding
– Accessing
– Integrating
– Reusing
digital research objects with attribution
The Commons is agnostic of computing platform
The Commons
• Digital Objects (with UIDs)
• Search (indexed metadata)
• Computing Platform
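The three layers of the Commons (digital objects with UIDs, search over indexed metadata, a computing platform) can be sketched in a few lines of Python. This is a toy, in-memory illustration under stated assumptions, not the Commons design: `uuid4` stands in for whatever research object ID scheme the community settles on, `CommonsIndex` and its methods are hypothetical names, and each object records only a location string, so the index stays agnostic of the platform where the data actually live.

```python
import uuid
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class DigitalObject:
    uid: str
    location: str             # where the bytes live: cloud bucket, HPC filesystem, ...
    metadata: dict[str, str]
    creator: str              # kept so that reuse can be attributed


class CommonsIndex:
    """Hypothetical registry: objects get a UID; search runs over indexed metadata."""

    def __init__(self) -> None:
        self._objects: dict[str, DigitalObject] = {}
        self._index: dict[str, set[str]] = defaultdict(set)  # term -> UIDs

    def register(self, location: str, metadata: dict[str, str], creator: str) -> str:
        uid = str(uuid.uuid4())  # stand-in for a real persistent identifier
        self._objects[uid] = DigitalObject(uid, location, dict(metadata), creator)
        for value in metadata.values():
            for term in value.lower().split():
                self._index[term].add(uid)
        return uid

    def search(self, query: str) -> list[str]:
        """Return the UIDs of objects whose metadata contain every query term."""
        terms = query.lower().split()
        if not terms:
            return []
        hits = set.intersection(*(self._index.get(t, set()) for t in terms))
        return sorted(hits)

    def get(self, uid: str) -> DigitalObject:
        return self._objects[uid]
```

For example, after registering an RNA-seq object with metadata `{"assay": "RNA-seq", "organism": "human"}`, `search("human")` returns that object's UID, whether its location is a cloud bucket or a path on an HPC system. A production index would add access control, richer metadata schemas and ranking.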
The Commons: Compute Platforms
The Commons Conceptual Framework
• Public Cloud Platforms: Google, AWS (Amazon), Microsoft (Azure), IBM, other?
• Supercomputing (HPC) Platforms: low access by NIH PIs; Supercomputing 2014 ADDS coordinating meeting with SC centers; NERSC “Commons Pilot”
• Other Platforms?: in-house compute solutions, i.e., private clouds and HPC (Pharma, The Broad, Bionimbus)
The Commons: Research Objects, APIs and Search
Research Object IDs under discussion by the community
– BD2K centers, NCI Cloud pilots (Google & AWS supported)
– Large Public Data Sets, MODs
Search
– BD2K Data and Software Discovery Indices
– Google Search functions
Appropriate APIs are being developed by the community, e.g., GA4GH
Use cases
The Commons: Next Steps
– Currently identifying pilot projects
– If interested, speak with Vivien Bonazzi
Commons – Simple Implementation Stack
• Scalable Hardware
• Big Data Software
• Biomedical Data Software
• APIs | App Store
• Biomedical Data
The Commons: Business Model
[George Komatsoulis]
Community: Training
Data Science Training Goals
1) Build an OPEN digital framework for data science training: NIH Data Science Workforce Development Center
2) Develop short-term training opportunities: courses, educational resources, etc.
3) Develop the discipline of biomedical data science and support cross-training: OPEN courseware
All goals have a diversity component and mandate
What Is Needed? – Some Examples from Across the ICs
Homogenization of disparate large unstructured datasets
Deriving structure from unstructured data
Feature mapping and comparison from image data
Visualization and analysis of multi-dimensional phenotypic datasets
Causal modeling of large-scale dynamic networks and subsequent discovery
Use of data that are sparsely and irregularly sampled and noisy
BD2K can offer reference datasets and points of domain expertise to explore these questions
Potential Outcomes
Mobility: improve the outcomes of surgeries in children with cerebral palsy and gait pathology
Wellness: markers derived from continuously monitored eHealth/mobile health devices, applied to smoking cessation and weight loss
Cancer: further personalization of treatment
Mental Health: better identify factors that protect against and promote brain disease, e.g., schizophrenia, bipolar disorder, major depression, attention deficit hyperactivity disorder (ADHD), obsessive compulsive disorder (OCD), autism
Addiction: utilizing social media to track and treat drug use and addiction
In Summary
[Diagram: The Biomedical Research Digital Enterprise, under the Associate Director for Data Science (image credit: CIA World Fact Book).
Programmatic themes: Commons, BD2K, Efficiency, Sustainability, Education, Innovation, Process, Training.
Example features: cloud (data & compute), search, security, reproducibility standards, app store; coordination, hands-on training, syllabus, MOOCs; community, centers, training grants, catalogs, standards, analysis; data resource support; metrics, best practices, evaluation, portfolio analysis.
Partnerships and collaboration: ICs, researchers, federal agencies, international partners, computer scientists.
Also shown: Scientific Data Council, External Advisory Board.]
NIH… Turning Discovery Into Health
philip.bourne@nih.gov