Edge Informatics and FAIR* Data
Tom Plasterer, PhD
Research & Development Information (RDI); US Cross-Science Director
20 February 2017
Integrated Pharma Informatics
* Findable, Accessible, Interoperable and Reusable
We Want Data Nirvana!
• The right data is there when I need it
• Your data and my data are mutually understandable
• Our data can be effortlessly combined
• I am permitted to use any data I can access
• Data can be reshaped for a different purpose
• Data sharing is rewarded
• ‘I’ can be a human or a machine
Node and Edge Informatics: Interfaces
[Figure, built up over four slides: the R&D pipeline as a chain of domain nodes, with supporting activities shown around the stages]
Pipeline stages (NODEs):
• Target Discovery
• Lead Discovery
• Lead Optimization
• Pre-Clinical Development
• Clinical Development
• Registration
• Marketing & Sales
Activities shown around the stages (in slide order):
• NGS, Exome analysis, Pathway Analysis, Structure Analysis, Disease Contextualization
• RNAi, Assay Development, HTS, SAR
• In vivo non-human testing, Exploratory PK, Exploratory Tox, GLP Tox, Formulation, ADME/PK, Efficacy
• IND, Safety, Tolerability, Phase I-III
• NDA/BLA, MAA
• PMR, REMS, PSUR, Observational Research
Seamless information connectivity (an EDGE) is needed across domain NODEs
FAIR Data: Overview
To be Findable:
• Globally unique, resolvable and persistent identifiers
• Machine-actionable contextual information supporting discovery
To be Accessible:
• Clearly defined access protocol
• Clearly defined rules for authorization/authentication
To be Interoperable:
• Use shared vocabularies and/or ontologies
• Syntactically and semantically machine-accessible format
To be Reusable:
• Be compliant with the F, A and I Principles
• Contextual information, allowing proper interpretation
• Rich provenance information facilitating accurate citation
Mark Wilkinson, Data Interoperability and FAIRness Through Existing Web Technologies
FAIR Data: A Brief History
Moving away from Narrative
• Nanopublications
Incubating Standards in Open PHACTS
• VoID, PROV-O
Lorentz Center Workshop
• FORCE 11 FAIR Guiding Principles
• Participants: IMI members, US researchers, content providers, ELIXIR, European Open Science Cloud, Big Data to Knowledge (BD2K)
Current Status:
• FAIR Data Workshops (EU-ELIXIR nodes)
• Inclusion in Horizon 2020, NIH Advocacy
• IMI2 Data FAIR-ification Call
• Vendors getting up to speed
FAIR Data: Systems Biology Survey
Molecular Systems Biology
Volume 11, Issue 12, 28 DEC 2015 DOI: 10.15252/msb.20156053
http://onlinelibrary.wiley.com/doi/10.15252/msb.20156053/full#msb156053-fig-0001
FAIR Data & Biopharma?
Collaborative & Competitive Intelligence:
• Who do we want to partner with? Are there complementary assets to our portfolio?
• What space is too crowded and not our area of expertise?
• Greenfield situations?
Mergers, Acquisitions, Partnerships:
• How do we efficiently and deeply absorb data generated elsewhere into our systems? How do we efficiently share?
• Does this make a smaller biotech/start-up a more viable partner?
Improved Patient Care:
• Can we share data and outcomes more efficiently in complicated trial settings (basket trials, adaptive trials) to better engage opinion leaders and foster dialog?
• Along with Differential Privacy approaches, can we have the broader research community help mine our data?
Data (Ir)-reproducibility:
• Is preclinical data reproducible?
• Can we utilize data credentialization? (thanks to Dan Crowther @ Sanofi)
Differential Privacy (DP): Clinical Data Anonymization
• A quantifiable method for anonymizing data by modifying the data fields that can aid in the identification of individuals.
• Adopted by large corporations such as Apple and Google to protect the privacy of users of their services.
AZ Differential Privacy Efforts:
• Developed, and now publishing, a DP algorithm designed to anonymize clinical data.
• Developing open source software in R (and Mathematica)
FAIR: DP helps support these guiding principles for scientific data:
• Findable: DP may facilitate pharma patient data transparency
• Accessible
• Interoperable: Analyses of private and DP data yield the same statistics
• Reusable: Enables reuse inside as well as outside the pharma company firewalls.
Enabling FAIR Guiding Principles for Scientific Data
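To make the "quantifiable" part of differential privacy concrete, here is a minimal sketch of the textbook Laplace mechanism applied to a mean. It is not the AZ algorithm mentioned on the slide, which is not described here; it protects an aggregate statistic rather than modifying individual fields. The function names, the age bounds and the sample values are all assumptions made for illustration.

```python
import random


def laplace_noise(scale, rng):
    """Draw Laplace(0, scale) noise as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def dp_mean(values, lower, upper, epsilon, rng=None):
    """Release the mean of `values` with epsilon-differential privacy.

    Assumes the number of records is public and each value is clipped to
    [lower, upper], so changing one record moves the mean by at most
    (upper - lower) / n. That bound is the sensitivity.
    """
    if not values:
        raise ValueError("values must not be empty")
    if epsilon <= 0 or upper <= lower:
        raise ValueError("need epsilon > 0 and upper > lower")
    rng = rng or random.Random()
    clipped = [min(max(v, lower), upper) for v in values]
    sensitivity = (upper - lower) / len(clipped)
    true_mean = sum(clipped) / len(clipped)
    # Smaller epsilon means a larger noise scale and stronger privacy.
    return true_mean + laplace_noise(sensitivity / epsilon, rng)


if __name__ == "__main__":
    ages = [34, 51, 47, 62, 29]  # hypothetical patient ages
    print(dp_mean(ages, lower=18, upper=90, epsilon=0.5))
```

The privacy budget epsilon is what makes the guarantee measurable: the noise scale is sensitivity divided by epsilon, so the trade-off between privacy and accuracy can be stated as a number rather than argued case by case.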
Edge Informatics & FAIR Application: CI360
WINNER
Capture Business Questions: Inventory
Capture Business Questions and Sources
Translate Questions into Concepts: Team Modeling
Domain Expert Concept Map
“Where are the key clinical studies in NSCLC and who are the principal investigators?”
Challenge with Data: Remodel
“Where are the key clinical studies in NSCLC and who are the principal investigators?”
(one example)
Challenge with Linked Data
Source: https://clinicaltrials.gov/ct2/show/NCT02027428
Refine the Answer: Configurable Interfaces
Examine with a Faceted Browser
“What are the open trials in metastatic breast cancer and what drugs are being tested?”
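A faceted question like this one can be read as intersecting facets over linked data and then following one more link. The sketch below illustrates that with an in-memory list of subject-predicate-object triples. Every identifier in it (the `trial:`, `drug:` and `ex:` names) is hypothetical and does not come from CI360 or ClinicalTrials.gov.

```python
# Hypothetical triples: (subject, predicate, object).
TRIPLES = [
    ("trial:EX-001", "ex:condition", "ex:MetastaticBreastCancer"),
    ("trial:EX-001", "ex:status", "ex:Open"),
    ("trial:EX-001", "ex:testsDrug", "drug:EX-B"),
    ("trial:EX-001", "ex:testsDrug", "drug:EX-A"),
    ("trial:EX-002", "ex:condition", "ex:MetastaticBreastCancer"),
    ("trial:EX-002", "ex:status", "ex:Closed"),
    ("trial:EX-002", "ex:testsDrug", "drug:EX-C"),
    ("trial:EX-003", "ex:condition", "ex:NSCLC"),
    ("trial:EX-003", "ex:status", "ex:Open"),
    ("trial:EX-003", "ex:testsDrug", "drug:EX-A"),
]


def subjects(triples, predicate, obj):
    """All subjects that have the given predicate pointing at obj (one facet)."""
    return {s for s, p, o in triples if p == predicate and o == obj}


def objects(triples, subject, predicate):
    """All objects reachable from subject via predicate."""
    return {o for s, p, o in triples if s == subject and p == predicate}


def open_trials_with_drugs(triples, condition):
    """Intersect the condition and status facets, then list each trial's drugs."""
    trials = subjects(triples, "ex:condition", condition) & subjects(
        triples, "ex:status", "ex:Open"
    )
    return {t: sorted(objects(triples, t, "ex:testsDrug")) for t in sorted(trials)}


if __name__ == "__main__":
    print(open_trials_with_drugs(TRIPLES, "ex:MetastaticBreastCancer"))
    # {'trial:EX-001': ['drug:EX-A', 'drug:EX-B']}
```

In a real deployment the same pattern would more likely be a SPARQL query against a triple store. The point here is only that each facet is a triple pattern and the answer is their intersection.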
Share Insights as a Community: Nanopublish
“Can a biomarker defined population be added to a trial record?”
Share insights with a Knowledge Base
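For readers new to nanopublications, the sketch below shows the usual four-part layout: a head graph that ties together an assertion graph, a provenance graph and a publication-info graph. It uses plain Python tuples rather than an RDF library, and the example URIs, the `ex:` predicate and the author are hypothetical. How CI360 actually stores nanopublications is not shown in the slides.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]


def make_nanopub(
    np_uri: str, assertion: List[Triple], author: str, created: str
) -> Dict[str, List[Triple]]:
    """Wrap an assertion in the four named graphs of a nanopublication."""
    head = f"{np_uri}#Head"
    assertion_graph = f"{np_uri}#assertion"
    provenance_graph = f"{np_uri}#provenance"
    pubinfo_graph = f"{np_uri}#pubinfo"
    return {
        # Head: declares the nanopublication and points at its three parts.
        head: [
            (np_uri, "rdf:type", "np:Nanopublication"),
            (np_uri, "np:hasAssertion", assertion_graph),
            (np_uri, "np:hasProvenance", provenance_graph),
            (np_uri, "np:hasPublicationInfo", pubinfo_graph),
        ],
        # Assertion: the scientific claim itself.
        assertion_graph: list(assertion),
        # Provenance: where the assertion came from.
        provenance_graph: [(assertion_graph, "prov:wasAttributedTo", author)],
        # Publication info: metadata about the nanopublication as a whole.
        pubinfo_graph: [
            (np_uri, "dcterms:created", created),
            (np_uri, "dcterms:creator", author),
        ],
    }


if __name__ == "__main__":
    nanopub = make_nanopub(
        "http://example.org/np/1",
        [("trial:EX-001", "ex:hasBiomarkerDefinedPopulation", "ex:population-EX-1")],
        author="http://example.org/people/curator-1",
        created="2017-01-01",
    )
    for graph, triples in nanopub.items():
        print(graph, len(triples))
```

Keeping provenance separate from the assertion is what lets a community add an insight to a trial record while making clear who said it and when.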
Is CI360 FAIR?
To be Findable:
• Resources named with URIs, with a defined policy
• Dataset descriptions published with VoID on intranet
To be Accessible:
• Data reachable via REST and SPARQL APIs
• Application secured via SSO
To be Interoperable:
• Uses well-described internal and public ontologies
• All data is linked data (RDF)
To be Reusable:
• Daily updates tracked with VoID and PROV-O
• Vocabularies used in CI360 already reused in four other applications
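As an illustration of what a dataset description using VoID and PROV-O might look like, the sketch below emits a small Turtle document from Python. The VoID, Dublin Core and PROV terms are real vocabulary terms; the dataset URI, endpoint, title and update activity are placeholders on example.org, not CI360's actual identifiers. The function assumes the title contains no double quotes.

```python
from datetime import date

PREFIXES = """@prefix void: <http://rdfs.org/ns/void#> .
@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix prov: <http://www.w3.org/ns/prov#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
"""


def describe_dataset(
    dataset_uri, title, sparql_endpoint, vocabularies, updated, update_activity_uri
):
    """Return a Turtle description of a dataset and its latest update."""
    lines = [PREFIXES]
    lines.append(f"<{dataset_uri}> a void:Dataset ;")
    lines.append(f'    dcterms:title "{title}" ;')
    lines.append(f'    dcterms:modified "{updated.isoformat()}"^^xsd:date ;')
    lines.append(f"    void:sparqlEndpoint <{sparql_endpoint}> ;")
    for vocabulary in vocabularies:
        lines.append(f"    void:vocabulary <{vocabulary}> ;")
    # PROV-O link from the dataset to the activity that produced this version.
    lines.append(f"    prov:wasGeneratedBy <{update_activity_uri}> .")
    lines.append("")
    lines.append(f"<{update_activity_uri}> a prov:Activity ;")
    lines.append(
        f'    prov:endedAtTime "{updated.isoformat()}T00:00:00Z"^^xsd:dateTime .'
    )
    return "\n".join(lines)


if __name__ == "__main__":
    print(
        describe_dataset(
            "http://example.org/dataset/demo",
            "Demo competitive intelligence dataset",
            "http://example.org/sparql",
            ["http://purl.org/dc/terms/", "http://www.w3.org/ns/prov#"],
            date(2017, 2, 20),
            "http://example.org/activity/daily-update",
        )
    )
```

Regenerating a description like this on every daily update gives each dataset version a machine-readable record of what it is, where to query it, which vocabularies it uses and when it last changed.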
R&D | RDI
FAIR Data and Edge Informatics: Take-aways
Get your plumbing right
• And your data won’t be stuck in a silo
Use Edge Informatics
• Consider handoffs: you don’t know how your data will be used in the future
Leverage working public solutions
• Don’t reinvent the wheel (OK, Ontology…)
Invest in FAIR Data Stewardship
• Investment to future-proof your efforts
Thanks
• Key Influencers in Linked Data Community
• Molecular Medicine Tri-Con 2017 Conference Organizers
• AZ/MedImmune Linked Data Community