Mining public domain data as a basis for drug repurposing Antony J Williams, Sean Ekins and Valery Tkachenko ACS Philadelphia August 2012 http://tinyurl.com/d6wodsl


DESCRIPTION

Online databases containing high throughput screening and other property data continue to proliferate in number. Many pharmaceutical chemists will have used databases such as PubChem, ChemSpider, DrugBank, BindingDB and many others. This work will report on the potential value of these databases for providing data to be used to repurpose drugs using cheminformatics-based approaches (e.g. docking, ligand-based machine learning methods). This work will also discuss the potentially related applications of the Open PHACTS project, a European Union Innovative Medicines Initiative project, that is utilizing semantic web based approaches to integrate large scale chemical and biological data in new ways. We will report on how compound and data quality should be taken into account when utilizing data from online databases and how their careful curation can provide high quality data that can be used to underpin the delivery of molecular models that can in turn identify new uses for old drugs.


Page 1: Mining public domain data as a basis for drug repurposing

Mining public domain data as a basis for drug repurposing

Antony J Williams, Sean Ekins and Valery Tkachenko

ACS Philadelphia August 2012

http://tinyurl.com/d6wodsl

Page 2: Mining public domain data as a basis for drug repurposing

Drug Repurposing

Drug repurposing commonly means data reexamination also!

Lots of data mining occurs

Then more screening, which creates more data…

LOTS of public databases used to examine repurposing…

Page 4: Mining public domain data as a basis for drug repurposing

Interlinked on the semantic web

Page 5: Mining public domain data as a basis for drug repurposing

Where do you get your data?

Databases? Patents? Papers? Your own lab? Collaborators? All of the above?

What is likely common to all sources? Data Quality issues. There is no perfect database.

Page 6: Mining public domain data as a basis for drug repurposing

Public Domain Databases

Our databases are a mess…

Non-curated databases are proliferating errors

We source and deposit data between databases

Original sources of errors hard to determine

Curation is time-consuming and challenging
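
One simple way to surface the kind of cross-database disagreement described above is to group records by drug name and flag any name that maps to more than one structure identifier. The sketch below is illustrative only: the source names, drug names, and identifiers are hypothetical placeholders, not real database entries, and the function `find_conflicts` is an assumption of this example rather than part of any tool mentioned in the talk.

```python
from collections import defaultdict

# Hypothetical records: (source database, drug name, structure identifier).
# The identifiers are placeholders, not real InChIKeys or registry numbers.
RECORDS = [
    ("database_a", "drug_x", "STRUCT-001"),
    ("database_b", "Drug_X", "STRUCT-001"),
    ("database_c", "drug_x", "STRUCT-002"),  # disagrees with the other two
    ("database_a", "drug_y", "STRUCT-010"),
    ("database_b", "drug_y", "STRUCT-010"),
]


def find_conflicts(records):
    """Return {name: {identifier: [sources]}} for names with more than one identifier."""
    by_name = defaultdict(lambda: defaultdict(list))
    for source, name, identifier in records:
        # Normalize the name so trivial case/whitespace differences do not hide a match.
        by_name[name.strip().lower()][identifier].append(source)
    return {
        name: {ident: sorted(sources) for ident, sources in idents.items()}
        for name, idents in by_name.items()
        if len(idents) > 1
    }


conflicts = find_conflicts(RECORDS)
for name, idents in conflicts.items():
    print(name, idents)
```

A check like this only shows that sources disagree. Deciding which structure is correct still needs manual curation.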

Page 7: Mining public domain data as a basis for drug repurposing
Page 8: Mining public domain data as a basis for drug repurposing

Availability of libraries of FDA drugs

Johns Hopkins Clinical Compound Library: made compounds available at cost

Page 9: Mining public domain data as a basis for drug repurposing

The FDA Drug Database

Page 10: Mining public domain data as a basis for drug repurposing

The DailyMed Database

Page 11: Mining public domain data as a basis for drug repurposing

Government Databases Should Come With a Health Warning

Williams and Ekins, DDT, 16: 747-750 (2011)

Page 12: Mining public domain data as a basis for drug repurposing

What is Neomycin?

Page 13: Mining public domain data as a basis for drug repurposing

Not this…

Page 14: Mining public domain data as a basis for drug repurposing

Data Errors in the NPC Browser: Analysis of Steroids

Substructure | # of Hits | # of Correct Hits | No stereochemistry | Incomplete stereochemistry | Complete but incorrect stereochemistry
Gonane | 34 | 5 | 8 | 21 | 0
Gon-4-ene | 55 | 12 | 3 | 33 | 7
Gon-1,4-diene | 60 | 17 | 10 | 23 | 10

Williams, Ekins and Tkachenko, Drug Disc Today 17: 685-701 (2012)
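
The steroid counts above can be turned into a percentage of correct hits per substructure. The short sketch below does that arithmetic and checks that the four categories add up to the total hits in each row. The counts are copied from the slide; the names `STEROID_HITS` and `percent_correct` are assumptions of this example.

```python
# Counts transcribed from the slide, per substructure:
# (hits, correct, no stereo, incomplete stereo, complete but incorrect stereo)
STEROID_HITS = {
    "Gonane": (34, 5, 8, 21, 0),
    "Gon-4-ene": (55, 12, 3, 33, 7),
    "Gon-1,4-diene": (60, 17, 10, 23, 10),
}


def percent_correct(table):
    """Return {substructure: percent of hits with a correct structure}."""
    summary = {}
    for name, (hits, correct, none, incomplete, incorrect) in table.items():
        # The four categories should partition the hits for that substructure.
        if correct + none + incomplete + incorrect != hits:
            raise ValueError(f"row for {name} does not add up to {hits}")
        summary[name] = round(100 * correct / hits, 1)
    return summary


for name, pct in percent_correct(STEROID_HITS).items():
    print(f"{name}: {pct}% of hits correct")
```

On these counts, fewer than a third of the hits for any of the three substructures are correct.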

Page 15: Mining public domain data as a basis for drug repurposing
Page 16: Mining public domain data as a basis for drug repurposing

Drug Disambiguation Project

Page 17: Mining public domain data as a basis for drug repurposing

NCATS Discovering “New Therapeutic Uses for Existing Molecules”

58 Molecule names and identifiers. Where are the “structures”?

Page 18: Mining public domain data as a basis for drug repurposing

NCATS dataset

• Several groups tried to collate molecules

• Chris Lipinski provided approximately 30 unique molecules

• Simple molecular descriptors show no difference between compounds classified as discontinued (n = 15) and those in clinical trials (n = 14).

• Where is the definitive set of publicly accessible molecules for computational repurposing and analysis?

Page 19: Mining public domain data as a basis for drug repurposing

Drug structure quality is important…

Many groups ARE doing in silico repositioning

Integrating or using sets of FDA drugs… and if the structures are incorrect, the predictions will be too

Where is the definitive set of FDA approved drugs with correct structures?

Ideally we need linkage between in vitro data and clinical data

Page 20: Mining public domain data as a basis for drug repurposing

We have a problem…

Lots of data available but quality is suspect

Errors proliferate database to database

Data continues to flow in unabated

When errors are identified hard to get fixed!

Data licensing is confusing – “Open Data”

We are “takers” not “givers” mostly…

Standards are lacking:

• Data licensing

• Data processing – structure standardization

Page 21: Mining public domain data as a basis for drug repurposing

So what needs to happen to improve?

• Let’s agree collaboration and crowdsourcing can help

• Provide SIMPLE ways to provide feedback

• Contribute when possible – databases should provide feedback mechanisms

• Adopt standards for structure handling and representation

• Adopt standards for data interchange

• Allow machine handling of data – use the power of the semantic web

Page 22: Mining public domain data as a basis for drug repurposing

Williams, Ekins and Tkachenko, Drug Disc Today 17: 685-701 (2012)

Page 23: Mining public domain data as a basis for drug repurposing

Collaboration on Curation

Collaborate on curation… share through standards and open interfaces

Page 24: Mining public domain data as a basis for drug repurposing

All DBs should take comments!

Page 25: Mining public domain data as a basis for drug repurposing

Standardize

Use the SRS as guidance for standardization

Page 26: Mining public domain data as a basis for drug repurposing

“Appify” curation and collaboration

• The data network is complex

• “Appify” collaboration and curation networks

• Increasing crowdsourcing role for data analysis

Ekins & Williams, Pharm Res, 27: 393-395, 2010.

Page 27: Mining public domain data as a basis for drug repurposing

Mobile Apps for Drug Discovery

Page 28: Mining public domain data as a basis for drug repurposing

Open Drug Discovery Teams

Free iOS app used to expose repurposing data

All of this data has been tweeted

http://tinyurl.com/6l9qy4f

Ekins, Clark and Williams, Mol Informatics, in Press 2012

Page 29: Mining public domain data as a basis for drug repurposing

Open Drug Discovery Teams

Page 30: Mining public domain data as a basis for drug repurposing

Gather stakeholders. Decide if goals are primarily scientific, commercial or mixed.

Explore benefits of open licensing and drawbacks of enclosure. Hold closely to open definitions and standards. Do not write your own IP licenses!

Provide simple explanations for terms of use. Use metadata to indicate licensing terms explicitly - the Creative Commons Rights Expression Language is a good tool.

Do not lock up metadata. If you can’t make the data public domain, make the metadata public domain.

 

Simple Rules for licensing “open” data

Williams, Wilbanks and Ekins. PLoS Comput. Biol. in Press Sept.2012
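
The rules above about stating licensing terms explicitly in metadata, and keeping the metadata itself open, could be sketched as follows. This is plain Python, not ccREL. The field names and the `describe_dataset` helper are hypothetical, chosen only to show the data license and the metadata license as separate, explicit, machine-readable fields.

```python
# Creative Commons URIs used as explicit, machine-readable license statements.
CC0_URI = "https://creativecommons.org/publicdomain/zero/1.0/"
CC_BY_URI = "https://creativecommons.org/licenses/by/3.0/"


def describe_dataset(title, data_license, metadata_license=CC0_URI):
    """Build a metadata record that always states its licensing terms.

    Field names are illustrative assumptions, not a published schema.
    """
    if not data_license:
        raise ValueError("the data license must be stated explicitly")
    return {
        "title": title,
        "data_license": data_license,
        # Even when the data is not public domain, the metadata can be.
        "metadata_license": metadata_license,
    }


record = describe_dataset("Example compound set", CC_BY_URI)
print(record)
```

Defaulting the metadata license to a public domain dedication reflects the last rule: if the data cannot be made public domain, the metadata still can.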

Page 31: Mining public domain data as a basis for drug repurposing

Open PHACTS Project

Develop a set of robust standards…

Implement the standards in a semantic integration hub

Deliver services to support drug discovery programs in pharma and public domain

22 partners, 8 pharmaceutical companies, 3 biotechs

36-month project

Guiding principle is open access, open usage, open source - Key to standards adoption -

Page 32: Mining public domain data as a basis for drug repurposing
Page 33: Mining public domain data as a basis for drug repurposing
Page 34: Mining public domain data as a basis for drug repurposing

To facilitate THIS process!

What’s the structure?

Are they in our file?

What’s similar?

What’s the target?

Pharmacology data?

Known Pathways?

Working On Now?

Connections to disease?

Expressed in right cell type?

Competitors?

IP?

Page 35: Mining public domain data as a basis for drug repurposing

It’s not JUST structures of course…

Page 36: Mining public domain data as a basis for drug repurposing

Taxol: Paclitaxel Bioassay Data

Most Bioassay data associated with structure with one ambiguous stereocenter

Page 37: Mining public domain data as a basis for drug repurposing

Measuring data: dispensing dependencies

Dispensing process | Hydrophobic features (HPF) | Hydrogen bond acceptor (HBA) | Hydrogen bond donor (HBD) | Observed vs. predicted IC50 (r)
Acoustic mediated process | 2 | 1 | 1 | 0.92
Disposable tip mediated process | 0 | 2 | 1 | 0.80

[Figure: pharmacophores for acoustic and disposable tip dispensing]

Data from 2 AstraZeneca patents: Ephrin pharmacophores were developed using data for 14 compounds with IC50 values. Different dispensing methods give different results. These differences affect the resulting hypotheses and could impact drug discovery.

Ekins, Olechno and Williams, Submitted 2012

Page 38: Mining public domain data as a basis for drug repurposing

Acoustically-derived IC50 values were 1.5 to 276.5-fold lower than for tip-based dispensing

• Pharmacophores and other computational models are used to guide medicinal chemistry.

• Non tip-based methods may improve HTS results and avoid misleading computational and statistical models.

• No analysis of influence of dispensing processes on data.

• Public databases should annotate metadata to create larger datasets for comparing different computational methods. How much data is reproducible, accurate, valid? The challenge of high-throughput science.

Measuring data: dispensing dependencies
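
The fold difference quoted above is the ratio of the tip-based IC50 to the acoustically derived IC50 for the same compound. The sketch below shows that calculation under stated assumptions: the paired values are hypothetical and are NOT taken from the patents, and `fold_lower` is a name invented for this example.

```python
def fold_lower(tip_ic50, acoustic_ic50):
    """Return how many times lower the acoustic IC50 is than the tip-based IC50.

    Both values must be positive and in the same units.
    """
    if tip_ic50 <= 0 or acoustic_ic50 <= 0:
        raise ValueError("IC50 values must be positive")
    return tip_ic50 / acoustic_ic50


# Hypothetical paired IC50 values in nM: (tip-based, acoustic).
# These are placeholders for illustration, not measurements from the patents.
PAIRS = {
    "compound_1": (30.0, 20.0),
    "compound_2": (500.0, 5.0),
}

folds = {name: fold_lower(tip, acoustic) for name, (tip, acoustic) in PAIRS.items()}
for name, fold in folds.items():
    print(f"{name}: acoustic IC50 is {fold:.1f}-fold lower")
```

A ratio above 1 means the compound looks more potent with acoustic dispensing. The spread of these ratios across a series is what can change the pharmacophore derived from the data.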

Page 39: Mining public domain data as a basis for drug repurposing

Conclusions

Page 40: Mining public domain data as a basis for drug repurposing

Acknowledgments

Sean Ekins

Christopher Lipinski

Joe Olechno

John Wilbanks

Drug Disambiguation project team

RSC Cheminformatics Team

Page 41: Mining public domain data as a basis for drug repurposing

Thank you

Email: [email protected]
Twitter: @chemconnector
Blog: www.chemconnector.com
SLIDES: www.slideshare.net/AntonyWilliams

Email: [email protected]
Twitter: collabchem
Blog: http://www.collabchem.com/