Pre-competitive Collaboration: Sharing Data to Increase Predictability Jean-Claude Bradley October 17, 2011 3rd Annual Drug Discovery Partnership: Filling the Pipeline Associate Professor of Chemistry Drexel University

Bradley Opal 2011


DESCRIPTION

Jean-Claude Bradley presents at the Opal Events 3rd Annual Drug Discovery Partnership: Filling the Pipeline on Pre-competitive Collaboration: Sharing Data to Increase Predictability


Page 1: Bradley Opal 2011

Pre-competitive Collaboration: Sharing Data to Increase Predictability

Jean-Claude Bradley

October 17, 2011

3rd Annual Drug Discovery Partnership: Filling the Pipeline

Associate Professor of Chemistry, Drexel University

Page 2: Bradley Opal 2011

Opportunities for Competitive Collaboration

Page 3: Bradley Opal 2011

Industry is Sharing More

Page 4: Bradley Opal 2011

Solubility and Melting Points are critical properties in the drug discovery process

Page 5: Bradley Opal 2011

Data quality is essential for both measurements and predictions based on measurements

Page 6: Bradley Opal 2011

Openness is proving to be a powerful tool for assessing the reliability of data

Page 7: Bradley Opal 2011

Solubility prediction for Taxol using Abraham descriptors

(Figure labels: Pred, Exp)

Page 8: Bradley Opal 2011

Predicted temperature-dependent solubility of Taxol in water based on melting point (M)

Page 9: Bradley Opal 2011

The Trusted Source Model

Before online databases (early 90s) searching for properties like melting points using ONE “trusted source” was practical and acceptable as part of the chemistry culture.

• CRC Handbook
• Merck Index
• Chemical Vendor Catalogs (e.g. Sigma-Aldrich)
• Peer-Reviewed Journals

Single values don’t tend to be contradicted

Page 10: Bradley Opal 2011

Question Assumptions

Using technology, we can begin to replace the “trusted source” model with one based on transparency and provenance

Page 11: Bradley Opal 2011

The Chemical Information Validation Sheet

567 curated and referenced measurements from Fall 2010 Chemical Information Retrieval course

Page 12: Bradley Opal 2011

Discovering outliers for melting points (stdev/average)
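A minimal sketch of how a stdev/average screen like the one named on this slide could be computed. The function names, the shift to Kelvin (so the average cannot sit near zero), and the example values are all assumptions made for illustration; none of them come from the validation sheet itself.

```python
from statistics import mean, stdev


def relative_spread(values_c):
    """Return stdev/average for melting points given in Celsius.

    Values are shifted to Kelvin first (an assumption of this sketch) so the
    average cannot sit near zero and inflate the ratio.
    """
    kelvin = [v + 273.15 for v in values_c]
    if len(kelvin) < 2:
        return 0.0  # a single source cannot disagree with itself
    return stdev(kelvin) / mean(kelvin)


def rank_by_spread(measurements):
    """Sort compounds from most to least inconsistent."""
    scores = {name: relative_spread(vals) for name, vals in measurements.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical values for illustration only, not taken from the validation sheet.
example = {
    "compound A": [120.0, 121.0, 119.5],
    "compound B": [45.0, 46.0, 145.0],
    "compound C": [80.0],
}
ranking = rank_by_spread(example)
```

Compounds at the top of `ranking` are the ones whose sources disagree most and would be the first candidates for a manual check. A compound with only one source scores zero here, which means "unchecked", not "reliable".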

Page 13: Bradley Opal 2011

Investigating the m.p. inconsistencies of EGCG

Page 14: Bradley Opal 2011

Investigating the m.p. inconsistencies of cyclohexanone

Page 15: Bradley Opal 2011

Most popular data sources

Page 16: Bradley Opal 2011

Alfa Aesar donates melting points to the public

Page 17: Bradley Opal 2011

Open Melting Point Explorer

(Andrew Lang)

Page 18: Bradley Opal 2011

Outliers

MDPI dataset

PhysProp (EPA donated all data to public also)

Page 19: Bradley Opal 2011

Outliers for ethanol: Alfa Aesar and Oxford MSDS

Page 20: Bradley Opal 2011

Inconsistencies and SMILES problems within MDPI dataset

Page 21: Bradley Opal 2011

MDPI Dataset labeled with High Trust Level

Page 22: Bradley Opal 2011

Open Melting Point Datasets

Currently 27,000 mps for 20,000 compounds

Page 23: Bradley Opal 2011

What is the melting point of 4-benzyltoluene?

• American Petroleum Institute: 5 C
• PHYSPROP: -30 C
• PHYSPROP: 125 C
• peer reviewed journal (2008): 97.5 C
• government database: -30 C
• government database: 4.58 C
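One way to see how badly the six reported values for 4-benzyltoluene disagree is to group those that agree within a tolerance. The sketch below uses the values from the slide; the function name and the 1 C tolerance are assumptions chosen for illustration.

```python
def group_agreeing(values, tolerance=1.0):
    """Group sorted values whose neighbours differ by at most `tolerance`."""
    groups = []
    for v in sorted(values):
        if groups and v - groups[-1][-1] <= tolerance:
            groups[-1].append(v)  # close enough to the previous value
        else:
            groups.append([v])  # start a new group
    return groups


# The six reported values (in C) listed on the slide for 4-benzyltoluene.
reported = [5, -30, 125, 97.5, -30, 4.58]
groups = group_agreeing(reported)
# groups == [[-30, -30], [4.58, 5], [97.5], [125]]
```

Six sources collapse to four incompatible candidates. Agreement within a group does not show that the group is correct, since sources may simply have copied one another.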

Page 24: Bradley Opal 2011

The quest to resolve the melting point of 4-benzyltoluene: liquid at room temp and can be frozen <-30C (Evan Curtin)

Page 25: Bradley Opal 2011

Open Lab Notebook page measuring the melting point of 4-benzyltoluene

Page 26: Bradley Opal 2011

Motivation: Faster Science, Better Science

Page 27: Bradley Opal 2011

Ruling out all melting points above -15C?

Page 28: Bradley Opal 2011

Oops – 4-benzyltoluene freezes after 16 days at -15C!

Page 29: Bradley Opal 2011

Measuring the melting point by slowly heating from -15 C gives 5 C

Page 30: Bradley Opal 2011

There are NO FACTS, only measurements embedded within assumptions

Open Notebook Science maintains the integrity of data provenance by making assumptions explicit

Page 31: Bradley Opal 2011

TRUST

PROOF

Page 32: Bradley Opal 2011

Common errors in datasets

• multiple melting points for the same compound in the same database
• stereochemistry issues
• sign inversion
• conversion errors (Kelvin/Celsius, Fahrenheit/Celsius)
• bad SMILES (non-rendering)
• salts associated with SMILES for free base
• using boiling point for melting point
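Some of the errors listed above leave a numerical fingerprint that can be checked automatically when two sources disagree. The following is a rough sketch covering only sign inversion and the two unit conversion errors; the function name, the 0.5 degree tolerance, and the test values are assumptions for illustration, not part of the original curation workflow.

```python
def possible_errors(a, b, tol=0.5):
    """List simple transcription errors that could explain why two reported
    melting points (both nominally in Celsius) disagree."""
    transforms = {
        "sign inversion": lambda t: -t,
        "Kelvin entered as Celsius": lambda t: t + 273.15,
        "Fahrenheit entered as Celsius": lambda t: t * 9 / 5 + 32,
    }
    if abs(a - b) <= tol:
        return []  # the values already agree
    hits = []
    for label, convert in transforms.items():
        # Check both directions, since either value may be the faulty one.
        if abs(convert(a) - b) <= tol or abs(convert(b) - a) <= tol:
            hits.append(label)
    return hits


# Hypothetical pairs of reported values, for illustration only.
print(possible_errors(-30.0, 30.0))
print(possible_errors(25.0, 298.15))
print(possible_errors(100.0, 212.0))
```

A match is only a hint about where to look. Confirming the error still means going back to the original source of each value.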

Page 33: Bradley Opal 2011

Open Random Forest modeling of Open Melting Point data using CDK descriptors

(Andrew Lang)

R2 = 0.78, TPSA and nHdon most important

Page 34: Bradley Opal 2011

Melting point prediction service

Page 35: Bradley Opal 2011

Melting point predictions and measurements on iPhone/iPad (Andrew Lang and Alex Clark)

Page 36: Bradley Opal 2011

Publication of double+ validated melting point dataset to Nature Precedings and LuLu

Page 37: Bradley Opal 2011
Page 38: Bradley Opal 2011
Page 39: Bradley Opal 2011

Crowdsourcing Solubility Data

Page 40: Bradley Opal 2011

ONS Challenge Judges

Page 41: Bradley Opal 2011

ONS Challenge Award Winners

Page 42: Bradley Opal 2011

Web services for summary data

(Andrew Lang)

Page 43: Bradley Opal 2011

Reaction Attempts Book

Page 44: Bradley Opal 2011

Reaction Attempts Book: Reactants listed Alphabetically

Page 45: Bradley Opal 2011
Page 46: Bradley Opal 2011

Interactive NMR spectra using JSpecView or ChemDoodle and the Open JCAMP-DX format

Page 47: Bradley Opal 2011

Predicting Best Solvent for Imine Formation using solubility and melting point data (Evan Curtin)

Page 48: Bradley Opal 2011

Predicting Yield of Imine Formation in Ethanol

(Evan Curtin)

Page 49: Bradley Opal 2011

Google Apps Scripts web services

Page 50: Bradley Opal 2011

Google Apps Scripts for conveniently exploring melting point data

Page 51: Bradley Opal 2011

Straight chain carboxylic acids from 1 to 10 carbons

Straight chain alcohols from 1 to 10 carbons

Comparison of model with triple validated measurements

Page 52: Bradley Opal 2011

Cyclic primary amines from 3 to 6 carbons (cyclobutylamine flagged for validation – only single source available)

Page 53: Bradley Opal 2011

Google Apps Scripts for planning reactions and creating schemes

Page 54: Bradley Opal 2011

Open Melting Points in Supplementary Data Pages of Wikipedia (Martin Walker)

Page 55: Bradley Opal 2011

All ONS web services

Page 56: Bradley Opal 2011

Some Initiatives Promoting More Openness in Drug Discovery

Page 57: Bradley Opal 2011

Open Primary Research in Drug Design using Web2.0 tools (malaria)

(blogs, wikis, Second Life, mailing lists)

Docking

Synthesis

Testing

Rajarshi Guha, Indiana U

JC Bradley, Drexel U

Phil Rosenthal, UCSF (malaria)

Dan Zaharevitz, NCI (tumors)

Tsu-Soo Tan, Nanyang Inst.

Page 58: Bradley Opal 2011

Outcome of Guha-Bradley-Rosenthal collaboration

Page 59: Bradley Opal 2011

Conclusions

• For science to progress quickly there is great benefit in moving away from a “trusted source” model to one based on transparency and data provenance

• Open Notebook Science can be a useful tool in this context