Meeting the Computational Challenges Associated with Human Health


Keynote at Supercomputing 2014 (SC14), New Orleans, November 20, 2014. Goal: to engage the HPC community in the work of the NIH.


Meeting the Computational Challenges Associated with Human Health

Philip E. Bourne, PhD, Associate Director for Data Science

National Institutes of Health

SC14 New Orleans November 20, 2014

We have come a long way in just one researcher’s career

We Have Both Been Very Successful

World Climate Report 2011

http://www.cnet.com/news/china-unseats-u-s-in-supercomputer-ranking/


SDSC Abbreviated Timeline

[Figure: timeline, 1985 to 2010: Cray-XMP48 (220 Mflops), Cray C90 (5 Gflops), Cray T3E (1 Tflops)]

http://www.sdsc.edu/News%20Items/PR101510_25years.html

... But There Is Much to Do

Drugs are:
– Too few
– Too long to get to market
– Not personalized

Rare diseases are ignored

Clinical trials are too limited in the number of patients, too expensive, and not retroactive

Education and training do not match current market needs well

Research is not cost-effective:
– Not easily replicated
– Too slow to disseminate

…..

.. And there is much ferment in research ..

http://fora.tv/2012/04/20/Congress_Unplugged_Phil_Bourne

.. and healthcare systems

http://www.genomicsengland.co.uk/the-100000-genomes-project/

But we can see the promise, and much of that promise is driven by the data revolution

The NIH Fire Hose Slide

An Example of That Promise: Comorbidity Network for 6.2M Danes over 14.9 Years

Jensen et al. 2014, Nat Comm 5:4022
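A comorbidity network links diseases that co-occur in the same patients more often than chance would predict. The sketch below shows one common edge weight, the relative risk RR = C_ab × N / (P_a × P_b), where C_ab is the number of patients with both diseases, P_a and P_b the number with each, and N the total number of patients. It assumes each patient has already been reduced to a set of diagnosis codes (the codes in the usage example are illustrative), and it ignores the temporal order of diagnoses that the cited study also analyzed. It is not the authors' pipeline.

```python
from collections import Counter
from itertools import combinations


def comorbidity_edges(patients, min_rr=1.0):
    """Return {(disease_a, disease_b): relative_risk} for pairs with RR > min_rr.

    patients: list of sets of diagnosis codes, one set per patient
    (a hypothetical input format chosen for illustration).
    """
    n = len(patients)
    prevalence = Counter()      # P_x: number of patients with disease x
    co_occurrence = Counter()   # C_ab: number of patients with both a and b
    for diagnoses in patients:
        prevalence.update(diagnoses)
        # sorted() gives each unordered pair one canonical key (a, b)
        co_occurrence.update(combinations(sorted(diagnoses), 2))

    edges = {}
    for (a, b), c_ab in co_occurrence.items():
        rr = c_ab * n / (prevalence[a] * prevalence[b])
        if rr > min_rr:  # keep only pairs seen more often than chance
            edges[(a, b)] = rr
    return edges
```

For example, with four patients `[{"E11", "I10"}, {"E11", "I10"}, {"I10"}, {"J45"}]`, the only co-occurring pair gets RR = 2 × 4 / (2 × 3) ≈ 1.33. A real analysis at population scale would also need a significance test per pair, since RR alone is unstable for rare diseases.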

What is the NIH Doing to Fulfill That Promise?

ADDS Mission Statement

To foster an open ecosystem that enables biomedical* research to be conducted as a digital enterprise that enhances health, lengthens life and reduces illness and disability

* Includes biological, biomedical, behavioral, social, environmental, and clinical studies that relate to understanding health and disease.

Elements of The Ecosystem

• Community
• Policy
• Infrastructure

• Sustainability • Collaboration • Training


Virtuous Research Cycle

Policies – Now & Forthcoming

Data Sharing
– Genomic data sharing announced
– Data sharing plans on all research awards
– Data sharing plan enforcement
  • Machine-readable plan
  • Repository requirements to include grant numbers

http://www.nih.gov/news/health/aug2014/od-27.htm
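What "machine-readable plan" plus "repository requirements to include grant numbers" could enable is automated enforcement: a plan and a repository deposit can be cross-checked by software. The sketch below assumes a hypothetical JSON plan format; the field names and the grant number are invented for illustration and do not reflect any actual NIH schema.

```python
import json

# Hypothetical required fields for a machine-readable data sharing plan.
REQUIRED_FIELDS = {"grant_number", "data_types", "repository", "release_timeline"}


def validate_plan(plan_json):
    """Return a sorted list of problems in a plan (empty list if none found)."""
    plan = json.loads(plan_json)
    missing = sorted(REQUIRED_FIELDS - set(plan))
    return [f"missing field: {name}" for name in missing]


def deposit_matches_plan(plan_json, deposit_record):
    """True if a repository deposit record cites the plan's grant number.

    deposit_record is assumed (hypothetically) to carry a "grant_numbers" list,
    which is what a repository requirement to record grant numbers would supply.
    """
    plan = json.loads(plan_json)
    return plan.get("grant_number") in deposit_record.get("grant_numbers", [])


plan = json.dumps({
    "grant_number": "R01-EXAMPLE-0001",   # invented placeholder
    "data_types": ["genomic"],
    "repository": "example-repository",   # invented placeholder
    "release_timeline": "at publication",
})
print(validate_plan(plan))                                                  # []
print(deposit_matches_plan(plan, {"grant_numbers": ["R01-EXAMPLE-0001"]}))  # True
```

The design point is that the grant number acts as the join key between the award, the plan, and the deposited data; without it in the repository record, the check above has nothing to match on.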

Policies – Forthcoming

Data Citation
– Goal: legitimize data as a form of scholarship
– Process:
  • Machine-readable standard for data citation (done)
  • Endorsement of data citation for inclusion in NIH biosketches, grants, reports, etc.
  • Example formats for human-readable data citations
  • Slowly work into the NLM/NCBI workflow
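The process pairs a machine-readable citation standard with example human-readable formats. As a sketch of how the two relate, one metadata record can be rendered into a readable citation string. The record fields and the output layout below are assumptions for illustration; the slide does not name the standard or the example formats.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DatasetRecord:
    """Hypothetical machine-readable citation metadata for one dataset."""
    creators: List[str]
    year: int
    title: str
    repository: str
    identifier: str               # persistent identifier, e.g. a DOI
    version: Optional[str] = None


def format_citation(rec: DatasetRecord) -> str:
    """Render one possible human-readable citation:
    Creators (Year). Title. [Version N.] Repository. Identifier
    """
    # Abbreviate long author lists, as many citation styles do
    if len(rec.creators) > 3:
        authors = f"{rec.creators[0]} et al."
    else:
        authors = ", ".join(rec.creators)
    version = f" Version {rec.version}." if rec.version else ""
    return f"{authors} ({rec.year}). {rec.title}.{version} {rec.repository}. {rec.identifier}"


rec = DatasetRecord(["Doe J", "Roe R"], 2014, "Example expression dataset",
                    "Example Repository", "doi:10.0000/example", version="2")
print(format_citation(rec))
# Doe J, Roe R (2014). Example expression dataset. Version 2. Example Repository. doi:10.0000/example
```

All names, the repository, and the DOI above are invented placeholders. Keeping the machine-readable record as the source of truth means the same data can feed biosketches, grant reports, and bibliographic databases without retyping.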

Infrastructure – The Commons

[Diagram: six BD2K Centers, the DDICC, Software, Standards, and four Labs]

What is the Commons?

A conceptual framework for sharing digital research objects with attribution and being FAIR:
– Finding
– Accessing
– Integrating
– Reusing

The Commons is agnostic of the computing platform

The Commons

[Diagram: The Commons layers: Digital Objects (with UIDs); Search (indexed metadata); Computing Platform]
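The layers above (digital objects with UIDs, search over indexed metadata, any computing platform underneath) can be illustrated with a toy in-memory registry. This is a minimal sketch under stated assumptions: UIDs are random UUIDs, "indexed metadata" is a simple inverted index of lowercase words, and platform agnosticism is modeled by storing an arbitrary location URI. The class and field names are hypothetical, not an NIH API.

```python
import uuid
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass
class DigitalObject:
    uid: str
    location: str             # any URI: cloud bucket, HPC file system, ... (platform-agnostic)
    metadata: Dict[str, str]
    attribution: str          # who to credit when the object is reused


class CommonsIndex:
    """Toy registry: assign UIDs, index metadata words, search by words."""

    def __init__(self) -> None:
        self.objects: Dict[str, DigitalObject] = {}
        self.index: Dict[str, Set[str]] = defaultdict(set)  # word -> UIDs

    def register(self, location: str, metadata: Dict[str, str], attribution: str) -> str:
        uid = str(uuid.uuid4())
        self.objects[uid] = DigitalObject(uid, location, metadata, attribution)
        for value in metadata.values():
            for word in value.lower().split():
                self.index[word].add(uid)
        return uid

    def search(self, *terms: str) -> List[DigitalObject]:
        """Return objects whose metadata contains every query term (AND semantics)."""
        # .get() avoids inserting empty entries into the defaultdict
        matches = [self.index.get(t.lower(), set()) for t in terms]
        hits = set.intersection(*matches) if matches else set()
        return [self.objects[u] for u in sorted(hits)]


idx = CommonsIndex()
a = idx.register("s3://example-bucket/sample1.bam",
                 {"assay": "RNA-seq", "organism": "human"}, "Lab A")
b = idx.register("/hpc/project/sample2.vcf",
                 {"assay": "WGS", "organism": "human"}, "Lab B")
print([o.attribution for o in idx.search("RNA-seq", "human")])  # ['Lab A']
```

Because search only touches the index, the same query finds objects whether they sit in a public cloud or on an HPC file system; the locations shown are invented examples.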


The Commons: Compute Platforms

The Commons Conceptual Framework runs on:
• Public Cloud Platforms: Google, AWS (Amazon), Microsoft (Azure), IBM, other?
• Super Computing (HPC) Platforms: low access by NIH PIs; Super Computing 2014: ADDS coordinating meeting with SC centers; NERSC “Commons Pilot”
• Other Platforms?: in-house compute solutions; private clouds, HPC (Pharma, The Broad, Bionimbus)


The Commons: Research Objects, APIs and Search

Research Object IDs under discussion by the community
– BD2K centers, NCI Cloud pilots (Google & AWS supported)
– Large public data sets, MODs

Search
– BD2K Data and Software Discovery Indices
– Google search functions

Appropriate APIs being developed by the community, e.g., GA4GH

Use cases


The Commons: Next Steps

– Currently identifying pilot projects

Interested? Speak with Vivien Bonazzi


Commons – Simple Implementation Stack

• Scalable Hardware
• Big Data Software
• Biomedical Data Software
• APIs • App Store
• Biomedical DATA

The Commons: Business Model

[George Komatsoulis]

Community: Training

Data Science Training Goals

1) Build an OPEN digital framework for data science training: NIH Data Science Workforce Development Center

2) Develop short-term training opportunities: courses, educational resources, etc.

3) Develop the discipline of biomedical data science and support cross-training – OPEN courseware

All goals have a diversity component and mandate

What Is Needed? – Some Examples from Across the ICs

Homogenization of disparate large unstructured datasets

Deriving structure from unstructured data

Feature mapping and comparison from image data

Visualization and analysis of multi-dimensional phenotypic datasets

Causal modeling of large scale dynamic networks and subsequent discovery

Utilizing data that are sparse, irregularly sampled, and noisy

BD2K can offer reference datasets and points of domain expertise to explore these questions

Potential Outcomes

Mobility: improve the outcomes of surgeries in children with cerebral palsy and gait pathology

Wellness: markers derived from constantly monitored eHealth/mobile health devices – apply to smoking cessation, weight loss

Cancer: further personalization of treatment

Mental Health: better identify factors that resist and promote brain disease e.g., schizophrenia, bipolar disorder, major depression, attention deficit hyperactivity disorder (ADHD), obsessive compulsive disorder (OCD), autism

Addiction: utilizing social media to track and treat drug use and addiction

In Summary

[Source: CIA World Fact Book]

The Biomedical Research Digital Enterprise

Associate Director for Data Science • Scientific Data Council • External Advisory Board

Programmatic Theme / Deliverable: Example Features
• Sustainability / Commons: Cloud – Data & Compute, Search, Security, Reproducibility Standards, App Store
• Education / Training: Coordinate, Hands-on, Syllabus, MOOCs
• Innovation / BD2K: Community, Centers, Training Grants, Catalogs, Standards, Analysis
• Process / Efficiency: Data Resource Support, Metrics, Best Practices, Evaluation, Portfolio Analysis

Partnerships & Collaboration: ICs • Researchers • Federal Agencies • International Partners • Computer Scientists

NIH… Turning Discovery Into Health

philip.bourne@nih.gov
