The Levinthal Lecture Philip E. Bourne Ph.D., FACMI Associate Director for Data Science National Institutes of Health [email protected] http://www.slideshare.net/pebourne Open Eye Meeting, Santa Fe, March 8, 2016

There is No Intelligent Life Down Here

Page 1: There is No Intelligent Life Down Here

The Levinthal Lecture

Philip E. Bourne Ph.D., FACMI

Associate Director for Data Science

National Institutes of Health

[email protected]

http://www.slideshare.net/pebourne

Open Eye Meeting, Santa Fe, March 8, 2016

Page 2: There is No Intelligent Life Down Here

What follows are my personal views and not necessarily those of my employer, the US federal government.

Page 3: There is No Intelligent Life Down Here

There is No Intelligent Life Down Here

With Apologies to Cy

Phil Bourne

Open Eye Meeting, Santa Fe, March 8, 2016

Page 4: There is No Intelligent Life Down Here

My Interactions with Cy

Page 5: There is No Intelligent Life Down Here

…And pray that there's intelligent life somewhere up in space
'Cause there's bugger all down here on Earth

Page 6: There is No Intelligent Life Down Here

Evidence #1

http://www.iucr.org/resources/commissions/crystallographic-computing/schools/school96/banquet-humour

Page 7: There is No Intelligent Life Down Here

Evidence #2

We throttle some but not all scholarly communication

Page 8: There is No Intelligent Life Down Here

Consider Cy’s own words from around 1970 concerning data sharing:

“At that time, it was difficult to obtain crystallographic coordinates although the results of the structural analysis had been published”

Page 9: There is No Intelligent Life Down Here

Local: Cooperative Community Action

– Individual letters to editors of journals

– Committees: IUCr Commission on Biological Macromolecules; ACA/USNCCr Richards committee

– Funding agencies

– Articles in journals

Marvin Cassman, Fred Richards, Richard Dickerson

Courtesy of Helen Berman

Page 10: There is No Intelligent Life Down Here

PDB Growth

http://www.rcsb.org/pdb/statistics/contentGrowthChart.do?content=total&seqid=100

Page 11: There is No Intelligent Life Down Here

A Broad Culture of Sharing

1999: Research Tools Policy

2003: NIH Data Sharing Policy

2004: Model Organism Policy

2007: Genome-wide Association (GWAS) Policy

2008: NIH Public Access Policy (Publications)

2012: Big Data to Knowledge (BD2K) Initiative

2013: White House Initiative (“Holdren Memo”)

2014: Genomic Data Sharing (GDS) Policy; Modernization of NIH Clinical Trials

Page 12: There is No Intelligent Life Down Here

Data Sharing: An Essential Component

Page 13: There is No Intelligent Life Down Here

Modernizing NIH Clinical Trials Activities

NIH-Funded trials published within 100 months of completion

Less than 50% published within 30 months of completion

BMJ 2012;344:d7292

Page 14: There is No Intelligent Life Down Here

Modernizing NIH Clinical Trials Activities:

Call to Action

Page 15: There is No Intelligent Life Down Here

Increasing Clinical Trial Transparency

Proposed November 2014; Final Spring 2016 (est.)

Notice of Proposed Rulemaking: Clinical Trials Registration and Results Submission (FDAAA, Section 801)

– Further implements statutory requirements on private and public sponsors to register; report results on phase 2, 3, and 4 trials

– Includes drugs, biologics, and devices (except small feasibility)

Draft NIH Policy on Clinical Trial Information Dissemination

– Extends Section 801 requirements to all NIH-funded clinical trials

– Includes phase 1 trials and trials of non-FDA regulated interventions such as behavioral trials

Page 16: There is No Intelligent Life Down Here

Evidence #3

Research does not follow a free market economy – you can get rewarded regardless of what you produce

Page 17: There is No Intelligent Life Down Here

True Free Market - Photography

[Chart axes: Time (horizontal); Volume, Velocity, Variety (vertical)]

– Digitization: Digital camera invented by Kodak but shelved

– Deception: Megapixels & quality improve slowly; Kodak slow to react

– Disruption: Film market collapses; Kodak goes bankrupt

– Demonetization: Phones replace cameras

– Dematerialization: Instagram, Flickr become the value proposition

– Democratization: Digital media becomes bona fide form of communication

Page 18: There is No Intelligent Life Down Here

False Market - Biomedical Research?

Digitization of Basic & Clinical Research & EHRs

Deception

We Are Here

Disruption

Demonetization

Dematerialization

Democratization

Open science

Patient centered health care

Page 19: There is No Intelligent Life Down Here

Sustaining the System is a Problem

Source: Michael Bell, http://homepages.cs.ncl.ac.uk/m.j.bell1/blog/?p=830

Page 20: There is No Intelligent Life Down Here

Reproducibility

Changing Value of Scholarship

Page 21: There is No Intelligent Life Down Here

“And that’s why we’re here today. Because something called precision medicine … gives us one of the greatest opportunities for new medical breakthroughs that we have ever seen.”

President Barack Obama, January 30, 2015

New Science

Page 22: There is No Intelligent Life Down Here

Let's get a bit closer to home for this audience…

Page 23: There is No Intelligent Life Down Here

Evidence #4

Molecular graphics has not advanced as it should

http://upload.wikimedia.org/wikipedia/commons/2/2e/Molecular-Graphics-GRIP-75-Console.jpg

Page 24: There is No Intelligent Life Down Here

What Did Cy Say?

1990 – “..although we may not have "chemical insight" there are more and more 3-D structures determined experimentally to aid in understanding which conformational results are reasonable and which are not; as long as we can look at them.”

Page 25: There is No Intelligent Life Down Here

Good News/Bad News of Molecular Graphics Today

Good News:

– It is hard to think of a more powerful way to comprehend complex data

– It has excited generations to the promise of science

– It has adapted to changing technologies

Bad News:

– It is not an adaptive/extensible environment

– It is not a collaborative environment

– It is not an integrative environment

– State is not transferable

BMC Bioinformatics 2005, 6:21

Page 26: There is No Intelligent Life Down Here

Is a database really different than a biological journal?

PLoS Comp Biol 2005 1(3) e34

The Knowledge and Data Cycle

0. Full text of PLoS papers stored in a database

1. A link brings up figures from the paper

2. Clicking the paper figure retrieves data from the PDB which is analyzed

3. A composite view of journal and database content results

4. The composite view has links to pertinent blocks of literature text and back to the PDB

Page 27: There is No Intelligent Life Down Here

Evidence #5

By Pbroks13 (talk) - File:Views on Evolution.jpg; New Scientist Magazine, 19 April 2008, Vol. 198, No. 2652, page 31: "Evolution myths: It doesn't matter if people don't grasp evolution"; New Scientist Magazine, 19 August 2006, Vol. 191, No. 2565, page 11: "Why doesn't America believe in evolution?". Public Domain, https://commons.wikimedia.org/w/index.php?curid=4403503

Page 28: There is No Intelligent Life Down Here

Nature’s Reductionism

There are ~20^300 possible proteins >>>> all the atoms in the Universe
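As a rough check of the scale involved, the arithmetic can be sketched in a few lines. The 20-letter alphabet and the chain length of 300 are read from the slide's figure (20 to the power 300); the value of roughly 10^80 atoms in the observable Universe is a commonly quoted order-of-magnitude estimate and is an assumption here, not something stated in the talk.

```python
import math

AMINO_ACIDS = 20       # size of the standard amino-acid alphabet
CHAIN_LENGTH = 300     # residues, as in the slide's 20^300
ATOMS_EXPONENT = 80    # ASSUMPTION: ~10^80 atoms in the observable Universe


def sequence_space_exponent(alphabet: int, length: int) -> float:
    """Return log10(alphabet ** length) without building the huge integer."""
    return length * math.log10(alphabet)


exponent = sequence_space_exponent(AMINO_ACIDS, CHAIN_LENGTH)
print(f"20^300 is about 10^{exponent:.0f}")
print(f"Excess over the assumed atom count: about 10^{exponent - ATOMS_EXPONENT:.0f}")
```

Working in log space keeps the comparison readable; Python could also compute `20 ** 300` exactly, but the exponent is the part that matters for the argument.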

~58M protein sequences from 58K organisms (source RefSeq)

116,539 protein structures yield 1393 domain folds (SCOP)

Page 29: There is No Intelligent Life Down Here

Is structure a useful discriminator of species?

Yang, Doolittle & Bourne (2005) PNAS 102(2) 373-8

Page 30: There is No Intelligent Life Down Here

Method – Distance Determination

Presence/Absence Data Matrix (rows: SCOP fold superfamilies (FSF) from SUPERFAMILY; columns: organisms)

FSF       C. intestinalis   C. briggsae   F. rubripes
a.1.1     1                 1             1
a.1.2     1                 1             1
a.10.1    0                 0             1
a.100.1   1                 1             1
a.101.1   0                 0             0
a.102.1   0                 1             1
a.102.2   1                 1             1

Distance Matrix

                  C. intestinalis   C. briggsae   F. rubripes
C. intestinalis   0                 101           109
C. briggsae                         0             144
F. rubripes                                       0
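The step from a presence/absence matrix to a distance matrix can be sketched as follows. This is a minimal illustration that assumes a plain Hamming distance (the number of superfamilies present in one proteome but not the other); the slide does not say which distance measure was actually used, and the function name `hamming_distances` is hypothetical. Only the seven example rows shown on the slide are included, so the numbers it produces are not the 101, 109 and 144 of the full data set.

```python
from itertools import combinations

ORGANISMS = ["C. intestinalis", "C. briggsae", "F. rubripes"]

# Seven example rows copied from the slide; the real matrix has many more FSFs.
PRESENCE = {
    "a.1.1":   (1, 1, 1),
    "a.1.2":   (1, 1, 1),
    "a.10.1":  (0, 0, 1),
    "a.100.1": (1, 1, 1),
    "a.101.1": (0, 0, 0),
    "a.102.1": (0, 1, 1),
    "a.102.2": (1, 1, 1),
}


def hamming_distances(organisms, presence):
    """Count, per organism pair, the FSFs present in one proteome but not the other.

    ASSUMPTION: Hamming distance stands in for whatever measure the authors used.
    """
    distances = {}
    for i, j in combinations(range(len(organisms)), 2):
        mismatches = sum(1 for row in presence.values() if row[i] != row[j])
        distances[(organisms[i], organisms[j])] = mismatches
    return distances


for pair, distance in hamming_distances(ORGANISMS, PRESENCE).items():
    print(pair, distance)
```

On these seven rows the two Ciona/Caenorhabditis proteomes differ at one superfamily (a.102.1), and each differs from F. rubripes at a.10.1, which is the kind of signal the full matrix aggregates.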

Page 31: There is No Intelligent Life Down Here

The Answer Would Appear to be Yes

It is possible to generate a reasonable tree of life from merely the presence or absence of superfamilies (FSFs) within a given proteome
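To make the idea of building a tree from such distances concrete, here is a small sketch using UPGMA (average-linkage clustering). UPGMA is swapped in purely for illustration; the slides do not state which tree-building method was used. The three pairwise distances are the ones shown on the distance-matrix slide, and the function name `upgma` is hypothetical.

```python
def upgma(labels, matrix):
    """Cluster taxa by UPGMA and return the tree as nested tuples.

    `matrix` is a full symmetric distance matrix (list of lists).
    ASSUMPTION: UPGMA is an illustrative choice, not the authors' stated method.
    """
    clusters = [(label, 1) for label in labels]   # (subtree, number of leaves)
    d = [row[:] for row in matrix]
    while len(clusters) > 1:
        n = len(clusters)
        # Find the closest pair of current clusters.
        i, j = min(
            ((a, b) for a in range(n) for b in range(a + 1, n)),
            key=lambda pair: d[pair[0]][pair[1]],
        )
        (tree_i, size_i), (tree_j, size_j) = clusters[i], clusters[j]
        keep = [k for k in range(n) if k not in (i, j)]
        # Size-weighted average distance from the merged cluster to the rest.
        new_row = [
            (size_i * d[i][k] + size_j * d[j][k]) / (size_i + size_j) for k in keep
        ]
        # Rebuild the matrix: surviving clusters first, merged cluster last.
        d = [[d[a][b] for b in keep] for a in keep]
        for row, value in zip(d, new_row):
            row.append(value)
        d.append(new_row + [0.0])
        clusters = [clusters[k] for k in keep] + [((tree_i, tree_j), size_i + size_j)]
    return clusters[0][0]


LABELS = ["C. intestinalis", "C. briggsae", "F. rubripes"]
MATRIX = [
    [0, 101, 109],
    [101, 0, 144],
    [109, 144, 0],
]

tree = upgma(LABELS, MATRIX)
print(tree)
```

With only three taxa the result simply joins the closest pair first; a realistic analysis would use many proteomes and, most likely, a method that does not assume a constant rate of change.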

Page 32: There is No Intelligent Life Down Here

Environmental Influence

Chris Dupont, Scripps Institution of Oceanography, UCSD

Dupont, Yang, Palenik, Bourne. 2006 PNAS 103(47) 17822-17827

Page 33: There is No Intelligent Life Down Here

Evolution of the Earth

– 4.5 billion years of change

– 300+50K

– 1-5 atmospheres

– Constant photoenergy

– Chemical and geological changes

– Life has evolved in this time

The ocean was the “cradle” for 90% of evolution

Page 34: There is No Intelligent Life Down Here

Whether the deep ocean became oxic or euxinic following the rise in atmospheric oxygen (~2.3 Gya) is debated, therefore both are shown (oxic ocean-solid lines, euxinic ocean-dashed lines).

The phylogenetic tree symbols at the top of the figure show one idea as to the theoretical periods of diversification for each Superkingdom.

[Figure: Theoretical Levels of Trace Metals and Oxygen in the Deep Ocean Through Earth’s History. Horizontal axis: billions of years before present. Vertical axis: concentration (O2 in arbitrary units, Zn and Fe in moles L-1). Curves: oxygen, zinc, iron, cobalt, manganese. Tree symbols: Bacteria, Archaea, Eukarya.]

Replotted from Saito et al., 2003, Inorganica Chimica Acta 356: 308-318

Page 35: There is No Intelligent Life Down Here

Evidence #6

Data resources including the PDB don’t fully serve the needs of the user at this point?

Page 36: There is No Intelligent Life Down Here

Good News/Bad News for the PDB in this Changing Landscape

Bad News:

– Interface complex and uni-data oriented

– Data accessible; methods accessible (sort of); but not together

– Significant redundancy in services offered

– Sustainability

Good News:

– Annotation!

– Demand is increasing

– Integrated with other data types

– RESTful services

Page 37: There is No Intelligent Life Down Here

General Problem Statement:

How to ensure a high-quality annotated data source that provides the optimal environment for accessibility, integration and analysis by a broad community of diverse users?

Page 38: There is No Intelligent Life Down Here

Enter the Commons

Page 39: There is No Intelligent Life Down Here

The Commons: Components

Computing environment

– cloud or HPC (High Performance Computing)

– supports access, utilization, sharing and storage of digital objects.

Methods for Interoperability

– enables connectivity, shareability and interoperability between digital objects.

Digital object compliance model

– describes the properties of digital objects that enable them to be discoverable and shareable.

Page 40: There is No Intelligent Life Down Here

The Commons: Components

Page 41: There is No Intelligent Life Down Here

[Diagram labels: BD2K Center (×6); DDICC; Software; Standards; Infrastructure - The Commons; Labs (×4)]

Page 42: There is No Intelligent Life Down Here

Commons - Pilots

The Cloud Credits - business model

BD2K Centers

MODs (Model Organism Databases)

HMP Data and tools available in the cloud

NCI Cloud Pilots & Genomic Data Commons

Page 43: There is No Intelligent Life Down Here

The PDB in the Commons

Components:

– Annotated collection of data files

– APIs to access these data files

– Example methods using these APIs

Potential outcomes:

– Nothing happens?

– A new breed of developer starts to use PDB data in new ways?

– The casual user has a broader set of services than previously?

– Quality declines/increases?

Page 44: There is No Intelligent Life Down Here

Evidence #7

The difficulty of translating academic ideas into products

Delineation of polypharmacology across the human structural kinome using a functional site interaction fingerprint approach

Zhao et al. J. Med. Chem., 2016, DOI: 10.1021/acs.jmedchem.5b02041

Page 45: There is No Intelligent Life Down Here

Functional Site Interaction Fingerprint (Fs-IFP) Approach

Step 1. Extract the structural kinome: 208 kinases, 2383 ligand-bound structures

Step 2. All-against-all binding-site comparison

Step 3. Encoding Fs-IFP

Step 4. Statistical analysis and machine learning
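The encoding in Step 3 can be pictured with a toy sketch. Everything here is hypothetical: the four interaction types, the one-bit-per-type layout, the function names and the example contacts are invented for illustration and are not taken from the Fs-IFP paper. The sketch only shows the general idea of turning per-position interactions at aligned binding-site positions into a bit string that can then be compared or fed to a classifier.

```python
# HYPOTHETICAL interaction vocabulary; the published method defines its own.
INTERACTION_TYPES = ("hydrophobic", "hbond_donor", "hbond_acceptor", "aromatic")


def encode_fingerprint(site_positions, contacts):
    """Return a flat bit list: one bit per (aligned site position, interaction type)."""
    bits = []
    for position in site_positions:
        observed = contacts.get(position, set())
        bits.extend(1 if kind in observed else 0 for kind in INTERACTION_TYPES)
    return bits


def tanimoto(a, b):
    """Tanimoto similarity of two equal-length bit lists (1.0 if both are all zero)."""
    both = sum(x & y for x, y in zip(a, b))
    either = sum(x | y for x, y in zip(a, b))
    return both / either if either else 1.0


# Made-up contacts for two ligand-bound structures over three aligned positions.
positions = [1, 2, 3]
complex_a = {1: {"hbond_donor"}, 2: {"hydrophobic"}, 3: {"aromatic"}}
complex_b = {1: {"hbond_donor"}, 2: {"hydrophobic", "hbond_acceptor"}}

fp_a = encode_fingerprint(positions, complex_a)
fp_b = encode_fingerprint(positions, complex_b)
print(fp_a)
print(fp_b)
print(tanimoto(fp_a, fp_b))
```

Because the positions come from a binding-site alignment (Step 2), fingerprints from different kinases have the same length and can be clustered or used as features for the machine learning in Step 4.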

Page 46: There is No Intelligent Life Down Here

Binding Mode Characterization of Kinase Inhibitors

Clustering of Fs-IFP across the structural kinome

Spatial locations for the binding regions for the eight clusters

Page 47: There is No Intelligent Life Down Here

Kinase Binding Profile Prediction Using Fs-IFP

ROC curves of the trained support vector machine model

The performance of predicted binding profile of 51 type-I inhibitors to 344 kinases

Zheng Zhao, 03/05/2016
We predicted the binding profiles of 51 inhibitors against 344 kinases using our trained SVM model, then compared our results with a traditional docking method (here we used Surflex software for docking). In the figure, the x-axis is the 344 kinases and the y-axis is a percentage. A percentage above 0 means our prediction method is better than the traditional docking method. Overall, our model is better than the docking method.

Page 48: There is No Intelligent Life Down Here

Summary

There is more intelligence than we think.

While we study complex systems, they are also why we do not make faster progress

Page 49: There is No Intelligent Life Down Here

Acknowledgements

The 133 Folks who have passed through my lab over the years

Cy Levinthal for giving me this opportunity

https://docs.google.com/spreadsheets/d/1QZ48UaKcwDl_iFCvBmJsT03FK-bMchdfuIHe9Oxc-rw/edit#gid=0

Page 50: There is No Intelligent Life Down Here

NIH…

Turning Discovery Into Health

[email protected]