This is a presentation given at the Opal Events meeting "Drug Discovery Partnerships: Filling the Pipeline". I was speaking in a session with Jean-Claude Bradley regarding "Pre-competitive Collaboration: Sharing Data to Increase Predictability". This presentation discussed some of the work we are doing on Open PHACTS. My thanks especially to Carole Goble, Lee Harland and Sean Ekins for their comments.
Online Resources to Support Open Drug Discovery Systems
Antony Williams
3rd Annual Drug Discovery Partnership: Filling the Pipeline, October 2011
Open Drug Discovery
Pharma Companies spend >$50 billion annually on R&D
How much historical data/knowledge/information is in the public domain? And where is it?
How much generated data is truly competitive?
Pre-competitive and public domain data could deliver high value to drug discovery:
– Data mining
– Model-building
– Integrating into in-house and online systems
Pharma Information Tombs
– Internal and external content
– Built to meet primary use-case
– Tailored indexes and GUIs
– Internal unique language & metadata
– Poor interoperability/integration
– PowerPoint, Documents, Excel
– Many suppliers of systems and content in a single workflow
Literature, Patents, News, Pipeline, SAR, CSRs, Safety, In vivo, Etc.
What could create change?
Harvard Business Review (2010)
“One change would make a substantial difference [to drug R&D]: the creation of agreed-upon standards for digitally representing drug assets.”
It is so difficult to navigate…
– What’s the structure?
– Are they in our file?
– What’s similar?
– What’s the target?
– Pharmacology data?
– Known pathways?
– Working on now?
– Connections to disease?
– Expressed in right cell type?
– Competitors?
– IP?
Where is chemistry online?
– Encyclopedic articles (Wikipedia)
– Chemical vendor databases
– Metabolic pathway databases
– Property databases
– Patents with chemical structures
– Drug Discovery data
– Scientific publications
– Compound aggregators
– Blogs/Wikis and Open Notebook Science
PubChem
ChEMBL
ChemSpider
SciDBs Wiki
Pharma are accessing, processing, storing & re-processing
Literature, PubChem, GenBank, Patents, Databases, Downloads
Data Integration, Data Analysis, Firewalled Databases
Public Domain Drug Discovery Data
New trend: Set Data Free on the Web
Open Algorithms, Descriptors and Closed Data – Can We Unlock It?
The Innovative Medicines Initiative
EC-funded public-private partnership for pharmaceutical research
Focus on key problems:
– Efficacy
– Safety
– Education & Training
– Knowledge Management
Open PHACTS Project
– Develop a set of robust standards…
– Implement the standards in a semantic integration hub
– Deliver services to support drug discovery programs in pharma and public domain
– 22 partners, 8 pharmaceutical companies, 3 biotechs
– 36-month project
Guiding principle is open access, open usage, open source: key to standards adoption
Open PHACTS Project Partners
Example research questions
– Give all compounds with IC50 < xxx for target Y in species W and Z, plus assay data
– What substructures are associated with readout X (target, pathway, disease, …)?
– Give all experimental and clinical data for compound X
– Give all targets for compound X or a compound with a similarity > y%
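The first question is essentially a filter over bioactivity records that keeps the assay data attached. A minimal Python sketch, assuming a flat list of hypothetical `Activity` records with IC50 values in nM (the record layout, units and identifiers are illustrative, not an Open PHACTS data model):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Activity:
    """One hypothetical bioactivity record; IC50 is assumed to be in nM."""
    compound: str
    target: str
    species: str
    ic50_nm: float
    assay_id: str


def compounds_below_ic50(records, target, species, max_ic50_nm):
    """Group records for `target` in any of `species` whose IC50 is strictly
    below `max_ic50_nm`, keyed by compound, keeping the assay records."""
    hits = {}
    for r in records:
        if r.target == target and r.species in species and r.ic50_nm < max_ic50_nm:
            hits.setdefault(r.compound, []).append(r)
    return hits


# Illustrative records only; identifiers and values are made up.
records = [
    Activity("CPD-1", "TARGET-Y", "human", 12.0, "ASSAY-1"),
    Activity("CPD-2", "TARGET-Y", "rat", 850.0, "ASSAY-2"),
    Activity("CPD-3", "TARGET-Y", "mouse", 5.0, "ASSAY-3"),
    Activity("CPD-1", "TARGET-Y", "rat", 40.0, "ASSAY-4"),
]
result = compounds_below_ic50(records, "TARGET-Y", {"human", "rat"}, 100.0)
for compound, acts in result.items():
    print(compound, [a.assay_id for a in acts])  # prints: CPD-1 ['ASSAY-1', 'ASSAY-4']
```

CPD-2 is dropped on potency and CPD-3 on species; in practice, unit normalisation across sources is the hard part this sketch assumes away.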
Prioritised Research Questions Analysis
Prevalent concepts: Compound, Bioassay, Target, Pathway, Disease
Prevalent data relationships:
– Compound – target
– Compound – bioassay
– Bioassay – target
– Compound – target – mode of action
– Target – target classification
– Target – pathway and disease
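These relationships chain together, so a question such as "which diseases connect to compound X?" becomes a walk across compound – target – pathway – disease links. A sketch over an in-memory list of triples; the identifiers and predicate names are hypothetical, and a real semantic integration hub would more likely answer this with a SPARQL query over RDF:

```python
# Hypothetical (subject, predicate, object) triples; the identifiers and
# predicate names are illustrative, not an Open PHACTS vocabulary.
TRIPLES = [
    ("compound:X", "inhibits", "target:T1"),
    ("compound:X", "tested_in", "assay:A1"),
    ("assay:A1", "measures", "target:T1"),
    ("target:T1", "classified_as", "class:kinase"),
    ("target:T1", "participates_in", "pathway:P1"),
    ("pathway:P1", "associated_with", "disease:D1"),
    ("compound:Y", "inhibits", "target:T2"),
]


def objects(triples, subject, predicate):
    """All objects linked to `subject` by `predicate`."""
    return {o for s, p, o in triples if s == subject and p == predicate}


def diseases_for_compound(triples, compound):
    """Follow compound -> target -> pathway -> disease links."""
    diseases = set()
    for target in objects(triples, compound, "inhibits"):
        for pathway in objects(triples, target, "participates_in"):
            diseases |= objects(triples, pathway, "associated_with")
    return diseases


print(sorted(diseases_for_compound(TRIPLES, "compound:X")))  # ['disease:D1']
print(sorted(diseases_for_compound(TRIPLES, "compound:Y")))  # []
```

Compound Y returns nothing because its target has no pathway link, which is the kind of gap integrated data is meant to expose.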
Required cheminformatics functionality
– Chemical substructure searching
– Chemical similarity searching
Required bioinformatics functionality
– Sequence and similarity searching
– Bioprofile similarity searching
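Chemical similarity searching is commonly done by comparing fingerprints with the Tanimoto coefficient, |A ∩ B| / (|A| + |B| − |A ∩ B|). The sketch below assumes fingerprints are already computed and stored as sets of "on" bit positions (generating them, e.g. with a cheminformatics toolkit, is out of scope); the threshold plays the role of the "similarity > y%" in the research questions. The same coefficient could also be applied to binary active/inactive bioprofiles, though that is an assumption about how such profiles would be encoded.

```python
def tanimoto(a: frozenset, b: frozenset) -> float:
    """Tanimoto (Jaccard) coefficient between two fingerprints stored as sets
    of 'on' bit positions."""
    if not a and not b:
        return 0.0  # convention chosen here: two empty fingerprints score 0
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def similarity_search(query, library, threshold):
    """Return (name, score) pairs with score > threshold, best first."""
    scored = ((name, tanimoto(query, fp)) for name, fp in library.items())
    return sorted((hit for hit in scored if hit[1] > threshold),
                  key=lambda hit: hit[1], reverse=True)


# Toy fingerprints: bit positions are made up, not from a real fingerprint scheme.
library = {
    "cpd_a": frozenset({1, 2, 3, 4}),
    "cpd_b": frozenset({1, 2, 9}),
    "cpd_c": frozenset({7, 8}),
}
print(similarity_search(frozenset({1, 2, 3}), library, threshold=0.4))
# [('cpd_a', 0.75), ('cpd_b', 0.5)]
```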
Selection of prioritised data sources
Chemistry: ChEMBL, DrugBank, ChEBI, PubChem, ChemSpider, Human Metabolome DB, Wombat (commercial)
Ontologies: AmiGO (The Gene Ontology), KEGG (Kyoto Encyclopedia of Genes and Genomes), OBI (The Ontology for Biomedical Investigations), Bioassay Ontology, EFO (Experimental Factor Ontology)
Biology: EntrezGene, HGNC, UniProt, InterPro, SCOP, WikiPathways, OMIM, IUPHAR
Linking “Flavors” of Chemistry
Improve Linked Data Access…
– Coordinate effort to clean up chemistry-related data
– Open tools require good validation studies
– Support scientists making data open
– Support companies/groups promoting software for data sharing
– Engage community to help create what they want
Openness and Quality Issues
Williams and Ekins, DDT, 16: 747–750 (2011)
Science Translational Medicine 2011
Chemistry Databases on the Internet
Some public databases are “trusted” as primary sources
Trust is granted without investigation or understanding of the content
What do we know about some of the online resources?
PHYSPROP Database
The freely downloadable database under the EPI Suite prediction software
Very basic filters suggest data quality issues
The stereochemistry challenge: 12,500 chemicals with “missed” stereo
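The slide does not say how the 12,500 chemicals were identified. One very basic filter of this kind is to flag records whose name implies stereochemistry while the structure carries none. A sketch, assuming each record is a (name, SMILES) pair; the name pattern is a hypothetical heuristic, not an exhaustive rule set:

```python
import re

# Name fragments that usually signal defined stereochemistry: descriptors such
# as "(2R,3R)" or "(E)", and the words cis/trans. Illustrative heuristic only.
STEREO_IN_NAME = re.compile(r"\((?:\d*[RSEZ],?)+\)|\b(?i:cis|trans)\b")

# SMILES characters that encode stereo: @ (tetrahedral), / and \ (double bond).
STEREO_IN_SMILES = set("@/\\")


def missed_stereo(records):
    """Return (name, smiles) pairs whose name implies stereochemistry but whose
    SMILES carries no stereo markers at all."""
    return [
        (name, smiles)
        for name, smiles in records
        if STEREO_IN_NAME.search(name) and not STEREO_IN_SMILES & set(smiles)
    ]


records = [
    ("trans-2-Butene", "CC=CC"),    # stereo lost
    ("trans-2-Butene", "C/C=C/C"),  # stereo kept
    ("(2R,3R)-Butanediol bis(methanesulfonate)", "CC(OS(C)(=O)=O)C(C)OS(C)(=O)=O"),
    ("1,3-Dichloropropan-2-one", "ClCC(=O)CCl"),
]
for name, smiles in missed_stereo(records):
    print(name, smiles)
```

Only records with no stereo markers at all are flagged; a SMILES that defines one of two stereocentres would pass, so a filter like this undercounts.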
Searches on ChemSpider
Most searches are text-based: people searching for information about known chemicals
Creating accurate name-structure dictionaries is critical
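One practical check when building a name-structure dictionary is to keep a name only if it resolves to exactly one structure and to route ambiguous names to curation. A minimal sketch with hypothetical structure IDs; the whitespace-only normalisation is an assumption chosen to avoid merging names that differ in meaningful case:

```python
from collections import defaultdict


def normalise(name: str) -> str:
    """Trim and collapse whitespace. Case is deliberately kept: in CIP
    nomenclature lowercase r/s mean something different from R/S."""
    return " ".join(name.split())


def build_name_dictionary(pairs):
    """Build a name -> structure dictionary from (name, structure_id) pairs.

    Names mapping to exactly one structure go into the dictionary; names
    mapping to several structures are returned separately for curation."""
    index = defaultdict(set)
    for name, structure_id in pairs:
        index[normalise(name)].add(structure_id)
    dictionary = {n: next(iter(ids)) for n, ids in index.items() if len(ids) == 1}
    conflicts = {n: sorted(ids) for n, ids in index.items() if len(ids) > 1}
    return dictionary, conflicts


# Hypothetical synonym list; the structure IDs are placeholders.
pairs = [
    ("Aspirin", "STRUCT-1"),
    ("Aspirin  ", "STRUCT-1"),
    ("Acetylsalicylic acid", "STRUCT-1"),
    ("Compound 5", "STRUCT-2"),
    ("Compound 5", "STRUCT-3"),
]
dictionary, conflicts = build_name_dictionary(pairs)
print(dictionary)  # {'Aspirin': 'STRUCT-1', 'Acetylsalicylic acid': 'STRUCT-1'}
print(conflicts)   # {'Compound 5': ['STRUCT-2', 'STRUCT-3']}
```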
NIST Webbook
PubChem
NPC Browser http://tripod.nih.gov/npc/
Cyclic Data Sharing
Data-sharing between open databases is cyclic
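When databases import from each other, a record can travel round the loop and come back looking independently confirmed. Finding such loops is cycle detection in a directed graph. A sketch using depth-first search, assuming a hypothetical map from each database to the sources it imports from:

```python
def find_cycle(graph):
    """Return one cycle in a directed 'imports data from' graph as a list of
    node names (first node repeated at the end), or None if it is acyclic.
    Uses depth-first search with three colours."""
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited / on current DFS path / finished
    colour = {}
    for node, sources in graph.items():
        colour.setdefault(node, WHITE)
        for s in sources:
            colour.setdefault(s, WHITE)
    path = []

    def visit(node):
        colour[node] = GREY
        path.append(node)
        for nxt in graph.get(node, ()):
            if colour[nxt] == GREY:  # back edge: nxt is already on the path
                return path[path.index(nxt):] + [nxt]
            if colour[nxt] == WHITE:
                cycle = visit(nxt)
                if cycle:
                    return cycle
        path.pop()
        colour[node] = BLACK
        return None

    for node in list(colour):
        if colour[node] == WHITE:
            cycle = visit(node)
            if cycle:
                return cycle
    return None


# Hypothetical databases: each key imports records from the listed sources.
sharing = {"DB_A": ["DB_B"], "DB_B": ["DB_C"], "DB_C": ["DB_A"], "DB_D": ["DB_A"]}
print(find_cycle(sharing))                        # ['DB_A', 'DB_B', 'DB_C', 'DB_A']
print(find_cycle({"DB_A": ["DB_B"], "DB_B": []}))  # None
```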
Synonyms on PubChem
1,3-DICHLORO-PROPAN-2-ONE
(2R,3R)-Butanediol bis(methanesulfonate)
Ethyl-1-propenyl ether, mixture of cis and trans
PSS-[2-[(Chloromethyl)phenyl]ethyl]-Heptaisobutyl substituted
1-Chlorobenzylethyl-3,5,7,9,11,13,15-heptaisobutylpentacyclo [9.5.1.1(3,9).1(5,15).1(7,13)]octasiloxane
Synonyms on PubChem
Data Proliferation
www.chemspider.com
ChemSpider…
– >26 million unique molecules
– Links together >400 internet resources
– Linking patents, publications, chemical vendors and online chemical compound databases
– Crowdsourced depositions and curations
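Counting "unique molecules" across many linked resources implies collapsing records that describe the same structure under one key. The slide does not say how ChemSpider does this; the sketch below simply assumes each record already carries a standard structure key (an InChIKey would be one choice) and groups source links by it. Keys, sources and URLs are placeholders.

```python
from collections import defaultdict


def merge_by_key(records):
    """Collapse (structure_key, source, url) records into one entry per unique
    structure, keeping the first link seen from each source."""
    merged = defaultdict(dict)
    for key, source, url in records:
        merged[key].setdefault(source, url)
    return dict(merged)


# Placeholder keys and URLs for illustration only.
records = [
    ("KEY-1", "vendor_1", "https://example.org/vendor1/123"),
    ("KEY-1", "patent_db", "https://example.org/patents/9"),
    ("KEY-2", "vendor_1", "https://example.org/vendor1/456"),
    ("KEY-1", "vendor_1", "https://example.org/vendor1/123-duplicate"),
]
merged = merge_by_key(records)
print(len(merged))      # 2
print(merged["KEY-1"])  # {'vendor_1': 'https://example.org/vendor1/123', 'patent_db': 'https://example.org/patents/9'}
```

The count of unique molecules is only as good as the key: if two records of the same compound get different keys (for example because one lost its stereochemistry), they stay separate.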
A focus on data quality – cleaning data on the web
The structure database under Open PHACTS
Acknowledgments
Sean Ekins – Collaborations in Chemistry
RSC|ChemSpider team
Open PHACTS consortium – especially Lee Harland and Carole Goble
Data depositors and curators
Software providers – ACD/Labs, OpenEye, GGA Software Inc, Open Source Cheminformatics
Thank you
Email: [email protected]
Twitter: ChemConnector
Blog: www.chemspider.com/blog
Personal Blog: www.chemconnector.com
SLIDES: www.slideshare.net/AntonyWilliams