View
962
Download
0
Tags:
Embed Size (px)
DESCRIPTION
Making hidden data discoverable: How to build effective drug discovery engines? Sebastian Radestock (Elsevier, Germany) In a complex IT environment comprising dozens if not hundreds of databases and likely as many user interfaces it becomes difficult if not impossible to find all the relevant information needed to make informed decisions. Historical data get lost, not normalized data cannot be compared and maintenance becomes a nightmare. We will discuss a new approach to address this issue by showing various examples and use cases on how in-house data and public data can be integrated in various ways to address the unique and individual needs of companies to keep the competitive edge.
Citation preview
DR. S. RADESTOCK | 14 OCTOBER 2013 | MAKING HIDDEN DATA DISCOVERABLE
1
DRUG DISCOVERY TODAY
ELN
Biology
Therapeutic
target
Chemistry
Check chemical feasibility
Synthesize or buy
Test
Check ADME/Tox
Report
Analyze SAR
Generate chemistry ideas
In-house
Knowledge
survey
ELN
DBs
Flatfiles
Journals &
Patents
Journals
Docs
Known ligands
2
Dr. Sebastian Radestock
Product Manager Reaxys
Elsevier Information Systems GmbH
Frankfurt am Main Germany
MAKING HIDDEN DATA DISCOVERABLE:
HOW TO BUILD EFFECTIVE DRUG
DISCOVERY ENGINES?
3
DRUG DISCOVERY TOMORROW
Therapeutic
target
Chemistry
Check chemical feasibility
Synthesize or buy
Test
Check ADME/Tox
Report
Analyze SAR
Generate chemistry ideas
In-house
Knowledge
survey
ELN
Known ligands
DBs
Journals &
Patents
ELN
Flatfiles
Docs
Biology
Federated
search system
Chemistry
integrator
Biology
integrator
Integrator for
biomedical
data
Literature
management
tool
Integrator for
drug safety
TWO APPROACHES TO SOLVING THE CHALLENGE OF DATA ACCESS
DR. S. RADESTOCK | 14 OCTOBER 2013 | MAKING HIDDEN DATA DISCOVERABLE
4
DRUG DISCOVERY TOMORROW
ELN ELN
Storage and capture Capture
FEDERATED MODEL
WAREHOUSE APPROACH
Storage Analysis
system
User
CHEMISTRY INTEGRATOR
In-house
DR. S. RADESTOCK | 14 OCTOBER 2013 | MAKING HIDDEN DATA DISCOVERABLE
5
EXAMPLE IMPLEMENTATION OF THE FEDERATED MODEL
Storage and capture
FEDERATED MODEL
Analysis
system
User
CHEMISTRY INTEGRATOR
In-house
Expansion of the existing system by integrating Reaxys
Existing system for consolidating in-house structure and bioactivity data
CONTAINS ALL PUBLISHED AND HISTORICAL CHEMISTRY DATA
DR. S. RADESTOCK | 14 OCTOBER 2013 | MAKING HIDDEN DATA DISCOVERABLE
6
THE REAXYS DATA BASE
• Compounds, substance property
data, preparations, reactions, and
bibliographic information…
…from 400 core chemistry journals
…from relevant chemistry patents
• Manual extraction of all the data
• Coverage of 500 property data fields,
from basics like boiling point or
melting point, via crystal data and
magnetic properties, to spectra
• All together 750 million data points
• Historical chemistry data…
…dating back to 1771
Scie
nti
fic d
ata
model
SUPPORTING MULTI-DISCIPLINARY RESEARCH
DR. S. RADESTOCK | 14 OCTOBER 2013 | MAKING HIDDEN DATA DISCOVERABLE
7
EXPANDED BIBLIOGRAPHIC CONTENT IN REAXYS
• Bibliographic data from 16.000
periodicals covering chemistry and
related sciences have been loaded
into Reaxys
• This goes beyond journals and
patents, it includes conference
proceedings, business articles,
reviews etc.
AGRICULTURAL AND BIOLOGICAL SCIENCES
BIOCHEMISTRY, GENETICS AND MOLECULAR BIOLOGY
CHEMICAL ENGINEERING
DENTISTRY
EARTH AND PLANETARY SCIENCES
ENERGY
ENGINEERING
ENVIRONMENTAL SCIENCE
IMMUNOLOGY AND MICROBIOLOGY
MATERIALS SCIENCE
MEDICINE
NEUROSCIENCE
PHARMACOLOGY, TOXICOLOGY AND PHARMACEUTICS
PHYSICS AND ASTRONOMY
VETERINARY
ETC.
old
new
THE NEXT STEP… COMING 2014
DR. S. RADESTOCK | 14 OCTOBER 2013 | MAKING HIDDEN DATA DISCOVERABLE
8
REAXYS-TREE AND AUTOMATIC INDEXING
Title/Abstract
The Biosynthesis of Aristeromycin. Conversion of Neplanocin A to
Aristeromycin by a Novel Enzymatic Reduction
Partially purified cell-free extracts of the aristeromycin producer
Streptomyces citricolor have been shown to catalyze the NADPH-
dependent reduction of neplanocin A to aristeromycin. Stereochemical
studies revealed that the reduction proceeds with anti-geometry and
involves transfer of the 4 pro-R hydrogen atom of NADPH to the 6'β
position of aristeromycin.
Reaxys Keywords: Aristeromycin – biosynthesis, enzymatic
reduction, Neplanocin A
Index for chemistry terms
Translate chemical names into structures and make them searchable
Step 1
Add chemistry relevant keywords and identified chemical entities
Step 2
Step 3
ACCESS TO REAXYS IS VIA THE REAXYS APPLICATION PROGRAMMING INTERFACE (API)
DR. S. RADESTOCK | 14 OCTOBER 2013 | MAKING HIDDEN DATA DISCOVERABLE
9
FEDERATED SEARCH SYSTEM WITH REAXYS LOOK-UP
How should the API look
like?
LESSONS LEARNED
DR. S. RADESTOCK | 14 OCTOBER 2013 | MAKING HIDDEN DATA DISCOVERABLE
10
THE REAXYS APPLICATION PROGRAMMING INTERFACE (API)
• Customers want to have access to all data in Reaxys
• Substances and substance property data
• Reactions and reaction details
• Citations
• Customers want to have access to all functionality of Reaxys
• Exact structure and reaction searching, similarity and substructure searching
• Factual queries
• Further processing of hitsets
• The Reaxys API was designed to be based on exchanging XML code between the user
and the Reaxys server via HTML POST requests
• Security and usage tracking is an issue
• Secure communication via HTTPS POST is supported
• The Reaxys API is stateful, and login is required
In-house
SOME DISADVANTAGES TO CONSIDER
DR. S. RADESTOCK | 14 OCTOBER 2013 | MAKING HIDDEN DATA DISCOVERABLE
11
LIMITATIONS OF THE FEDERATED MODEL
Storage and capture
FEDERATED MODEL
ELN
Analysis
system
User
CHEMISTRY INTEGRATOR
In-house
Architecture allows easy expansion when new data source becomes available
Performance and availability of the system is dependent on the source systems
No clean-up and normalization of the data
DR. S. RADESTOCK | 14 OCTOBER 2013 | MAKING HIDDEN DATA DISCOVERABLE
12
EXAMPLE IMPLEMENTATION OF THE WAREHOUSE APPROACH
Storage
Capture
WAREHOUSE APPROACH
Analysis
system
User
In-house
Bioactivity data was normalized, and structures were de-duplicated
A customer set up a system that contains structure and bioactivity data, and IP information
All data was extracted, translated and loaded to fit into one unified data model (UDM)
CHEMISTRY INTEGRATOR
Structures
STRUCTURE DATA FROM REAXYS COMES FROM THE REAXYS STRUCTURE FLAT FILE
DR. S. RADESTOCK | 14 OCTOBER 2013 | MAKING HIDDEN DATA DISCOVERABLE
13
PLATFORM FOR STRUCTURE AND BIOACTIVITY DATA
SOME DISADVANTAGES TO CONSIDER
DR. S. RADESTOCK | 14 OCTOBER 2013 | MAKING HIDDEN DATA DISCOVERABLE
14
LIMITATIONS OF THE OF THE WAREHOUSE APPROACH
Storage
Capture
WAREHOUSE APPROACH
Analysis
system
User
In-house
Long implementation time and associated high cost
Difficult to accommodate differences or changes in data types
CHEMISTRY INTEGRATOR
Structures
A CONTENT INTEGRATION SOLUTION THAT IS NOW AVAILABLE TO ALL REAXYS USERS
DR. S. RADESTOCK | 14 OCTOBER 2013 | MAKING HIDDEN DATA DISCOVERABLE
15
EXAMPLE IMPLEMENTATION OF A FLEXIBLE APPROACH
Capture
FEDERATED MODEL
WAREHOUSE APPROACH
User Structures from PubChem and eMolecules have been integrated into Reaxys
Storage and capture
Pricing
Real-time commercial availability and pricing information comes via an eMolecules API
Structures
Structures
Storage
Using multiple storage systems eliminates the need for one UDM
CHEMISTRY INTEGRATOR
16
The substance crosslinking
icon allows to switch to
corresponding substances in
other data sources
Results are available from
different data source tabs
17
Filter for substances that
are/aren’t contained in
other data sources
Results are available from
different data source tabs
18
Filter for substances that
are/aren’t contained in
other data sources
Results are available from
different data source tabs
19
The commercial availability
icon allows to check real-time
pricing information from
eMolecules
20
PubChem property
headers with direct
links to PubChem
LESSONS LEARNED
DR. S. RADESTOCK | 14 OCTOBER 2013 | MAKING HIDDEN DATA DISCOVERABLE
21
FLEXIBLE APPROACH FOR INTEGRATION
• Reaxys has proven to be extremely powerful as analysis and database system
• Separation of the data from different data sources into multiple storage
systems is the way to go…
• … if a powerful crosslinking mechanism is in place
• Some pieces of information that are subject to frequent updates should be
integrated using the federated model
A CONTENT INTEGRATION SOLUTION THAT ELSEVIER BUILT FOR ROCHE
DR. S. RADESTOCK | 14 OCTOBER 2013 | MAKING HIDDEN DATA DISCOVERABLE
22
EXAMPLE IMPLEMENTATION OF A FLEXIBLE APPROACH
FEDERATED MODEL
WAREHOUSE APPROACH
User Integration of Roche proprietary data on chemistry experiments
Storage and capture
Pricing
Reaxys with PubChem and eMolecules integrated
ELN In-house Reactions
Capture
Structures
Structures
Storage
CHEMISTRY INTEGRATOR
THE SITUATION AT ROCHE YESTERDAY… AND TODAY
DR. S. RADESTOCK | 14 OCTOBER 2013 | MAKING HIDDEN DATA DISCOVERABLE
23
INTEGRATION OF ROCHE IN-HOUSE DATA
https://reaxys.roche.com
24
Normalized
Roche-specific
reaction data
Data on references
and/or experiments,
including PDF links
Roche-specific data is included in
the Output (PDF, MS-Word etc.)
Filters on
Roche-specific
data fields
Up to four Roche reaction
data sources are supported
https://reaxys.roche.com
25
A Roche icon allows
to switch to a Roche
in-house repository
The reaction crosslinking icon
allows to switch to
corresponding reactions in other
data sources
26
Start building a synthesis tree by
clicking on the synthesize link
https://reaxys.roche.com
27
https://reaxys.roche.com
Synthesis planner
opens up
The first step of the synthesis
plan is selected from the
Roche data source
28
One step has
been added
https://reaxys.roche.com
Add a second
step
29
New reactions are
loaded
https://reaxys.roche.com
The second step of the
synthesis plan is
selected from Reaxys
30
Roche
Experimental details of the
“mixed” synthesis plan are
summarized in a table.
Another step has
been added
https://reaxys.roche.com
CUSTOMER FEEDBACK
DR. S. RADESTOCK | 14 OCTOBER 2013 | MAKING HIDDEN DATA DISCOVERABLE
31
INTEGRATION OF ROCHE IN-HOUSE DATA
• Usability and acceptance tests by Roche
showed:
• Increased productivity of researchers at
Roche
• Increased discoverability of the Roche
reaction content
• Reduced maintenance effort for Roche:
• Legacy systems were decommissioned
• Roche gets on-going maintenance and
functionality improvements by Elsevier
• Not compromise in security
• Flexible approach:
• Additional data sources have been
added
32
DRUG DISCOVERY TOMORROW
Therapeutic
target
Chemistry
Check chemical feasibility
Synthesize or buy
Test
Check ADME/Tox
Report
Analyze SAR
Generate chemistry ideas
In-house
Knowledge
survey
ELN
Known ligands
DBs
Journals &
Patents
ELN Flatfiles
Docs
MedScan
Biology
Federated
search system
THANK YOU – QUESTIONS?
Dr. Sebastian Radestock Product Manager Reaxys
Elsevier Information Systems GmbH Frankfurt am Main, Germany
33