WP4: Technical Integration
BioMedBridges Julie McMurry, Technical Project Lead
On behalf of WP4 partners
Goal: Technical Integration
• WP4 delivers connectors for data integration, built on modern web standards
• WP4 focusses on 11 prioritized use cases that address real needs within BioMedBridges
• WP4 pilots employ a common technology “stack” which can be reused in the future
Main achievements
• Six deliverables complete (use cases report, integration paths report, web services, user interfaces, semantic web scalability)
• Eleven pilot projects addressing BMB needs
• Impact: Discovery, Harmonisation, Access, Analysis, Reuse
• Nine BMS RI connected
• All partners have raised data integration readiness
Funded by FP7 Capacities Specific Programme, grant agreement no. 284209
Visualising and leveraging ontologies in queries
Sharing and integrating medical imaging data
Integrating mouse phenotype data for studying diabetes and obesity
Sharing protein engineering knowledge
Federating biobank queries for translational research
Biosample information integration and discovery
Leveraging the utility of compound screening functional assays
Connectivity-Based searching in UniChem
Connect clinical and lab workflows using TranSMART, Galaxy
Visualising ecological context for genome sequencing data**
Linking clinical trials to the drugs, genes, and results they reference
Use-case driven pilots integrate diverse data domains
Genes, Proteins, Organisms, Biosamples, Diseases, Drugs, Trials, Medical imaging
Parallel-path development built on common technology stack of modern web standards
4.1, 4.2 Landscape and planning
4.3 Programmatic access
4.5 User interfaces & query, BioJS
4.4, 4.7, 4.6 RDF, SemWeb
“simple object queries”
SemWeb Pilot Phases
Phase | Del # | Due | Deliverable focus | Nature of the activities
Prep | D4.4 | 2013 | Planning | Overall roadmap
Phase 1 | D4.7 | Dec-14 | SemWeb scalability | ELIXIR establishes mature services and guidelines. Benchmarks technology and assesses scalability.
Phase 2 | D4.6 | Jun-15 | Data integration | Parallel pilots, according to respective roadmaps, using above guidance
Phase 3 | D4.8 | Dec-15 | Report |
Why Semantics and Standards (Ontologies, Identifiers, Formats)
Subject | Relationship (aka ‘Predicate’) | Object
uniprot protein P15056 | is_product_of | gene X
uniprot protein P15056 | is_reagent_in | reaction pathway Y
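The subject / relationship / object statements above can be sketched in a few lines of Python. This is only an illustration of the triple idea, not the representation used by any WP4 pilot: the `Triple` class, the `match` helper, and the `uniprot:`, `gene:` and `pathway:` prefixes are assumptions, and "X" and "Y" remain the placeholders from the slide.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One statement: subject, relationship (predicate), object."""
    subject: str
    predicate: str
    obj: str


# UniProt accession taken from the slide; the prefix notation is illustrative.
P15056 = "uniprot:P15056"

# The two statements from the slide. "gene:X" and "pathway:Y" are placeholders.
TRIPLES = [
    Triple(P15056, "is_product_of", "gene:X"),
    Triple(P15056, "is_reagent_in", "pathway:Y"),
]


def match(triples, subject=None, predicate=None, obj=None):
    """Return the triples that fit a pattern; None acts as a wildcard."""
    return [
        t for t in triples
        if (subject is None or t.subject == subject)
        and (predicate is None or t.predicate == predicate)
        and (obj is None or t.obj == obj)
    ]


if __name__ == "__main__":
    # Everything that is stated about the protein.
    for t in match(TRIPLES, subject=P15056):
        print(t.subject, t.predicate, t.obj)
```

Because every statement has the same three-part shape, statements from different sources can be pooled and queried with one pattern-matching function, which is the property that shared identifiers and ontologies are meant to preserve.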
Technologies by pilot (rows: the eleven pilots listed above; columns: REST, GUI, RDF, SrcCode, BioJS)
Progress on RDF conversions
Table rows: Funding for conversion, App, Triples, Phase (six conversions are marked “prod”)
Triples by data domain:
• Biological models: 11.4M
• Drugs and other chemicals: 375M
• Gene expression data: 447M
• Molecular interactions & pathways: 12.5M
• Protein sequences: 25B
• Biological samples: 102M
• Gene variation data: 316M
• Genomic data: 80M*
• Experimental protein production: *
• Mouse phenotype data: 50000
• Clinical trials: 2700
Linked Open Data Cloud
RDF platform: released Nov 2013, >16 billion RDF triples
Jupp et al (2013). The EBI RDF Platform: Linked Open Data for the Life Sciences. Bioinformatics.
Technical lessons: LOD best practice
• Collect use cases
• Publish the scheme
• Publish a SPARQL endpoint
• Provide representative queries
• Link to other datasets by using their URIs
• Communicate your data release cycle
• Test and validate your dataset
• Define the scope and supported use-cases
• Provide a license and citation info
• Have a strategy for long-term sustainability
https://github.com/dbcls/bh14/wiki/Ten-simple-rules-for-publishing-Linked-Data-for-the-Life-Sciences
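Two of the rules above, publishing a SPARQL endpoint and providing representative queries, can be illustrated with a short sketch. It only builds the request URL and does not contact any server. The endpoint address `https://example.org/sparql` is a hypothetical placeholder, and the query is a generic one rather than a query taken from any BioMedBridges dataset; the only protocol detail relied on is that SPARQL endpoints accept the query text in a `query` parameter of a GET request.

```python
from urllib.parse import urlencode

# Hypothetical endpoint address; replace with the one a dataset actually publishes.
ENDPOINT = "https://example.org/sparql"

# A generic "representative query": fetch ten arbitrary statements.
REPRESENTATIVE_QUERY = """
SELECT ?subject ?predicate ?object
WHERE { ?subject ?predicate ?object }
LIMIT 10
""".strip()


def build_query_url(endpoint, query):
    """Build a GET URL that carries the SPARQL text in the 'query' parameter."""
    return endpoint + "?" + urlencode({"query": query})


if __name__ == "__main__":
    print(build_query_url(ENDPOINT, REPRESENTATIVE_QUERY))
```

Publishing a handful of such queries next to the endpoint gives new users something that works on the first attempt and shows how the dataset's scheme is meant to be traversed.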
Simplifying complex relationships supports data search and integration
http://tinyurl.com/linkeddataprovenance
Key lesson learned
Strategies that are driven by community use cases, implement standards, and consider reusability from the start are inherently more sustainable
Sustainability and Reuse
Table columns: Reuse, Src code, Hosting, Bug fixes, New features, Expertise, Funding
Pilot | Reuse
Visualising and leveraging ontologies in queries | CTTV, Euro BioImaging
Sharing and integrating medical imaging data | CTMM-TRAIT
Integrating mouse phenotype data for studying diabetes and obesity | CTTV
Sharing protein engineering knowledge |
Federating biobank queries for translational research | CTTV
Biosample information integration and discovery | CTTV
Leveraging the utility of compound screening functional assays | CTTV
Connectivity-Based searching in UniChem | CTTV
Connect clinical and lab workflows using TranSMART, Galaxy | CTMM-TRAIT
Linking clinical trials to the drugs, genes, and results they reference | ?
Most of the pilots (and lessons) are already being put to use within translational research projects (e.g. CTTV, CTMM-TRAIT)
Remaining work
• Continue parallel-path development of pilot projects according to their respective roadmaps
  ErasmusMC: Clinical and imaging integration using TranSMART
  EMBL: Semantic Search Widget
  HMGU: RDF transformation, Query
  TUM-MED: RDF transformation, Query
  STFC: RDF transformation, Query
  UDUS: RDF transformation, Query
  VUMC: Clinical and laboratory integration using TranSMART
• Make outputs of work more visible, accessible, and reusable (e.g. BioJS widgets)
• Prepare for end-of-project transition: open source repositories, documentation, reporting
Collaborators
ECRIN
  UDUS: Benjamin Braasch, Christian Krauth, Christian Ohmann, Martin Eckert, Töresin Karakoyun
Euro-BioImaging
  EMBL: Jan Ellenberg
  EMBL-EBI: Gabriella Rustici
  ErasmusMC: Stefan Klein, Wiro Niessen, Erwin Vast
ELIXIR
  CSC: Tommi Nyrönen
  EMBL-EBI: Ewan Birney, Niklas Blomberg, Tony Burdett, Nathalie Conte, Adam Faulconbridge, Jon Ison, Andrew Jenkinson, Rafael Jiminez, Simon Jupp, James Malone, Julie McMurry, Helen Parkinson, Francis Rowland, Drashtti Vasant, Danielle Welter
EMBRC
  Stazione Zoologica Anton Dohrn - Napoli: Remo Sanges
  Max Planck Institute: Renzo Kottman
EU-OPENSCREEN
  EBI: Jon Chambers, John P. Overington
BBMRI
  TUM-MED: Raffael Bild, Sabine Brunner, Klaus Kuhn, Ashish Lamichhane, Willi Mann
EATRIS, BBMRI
  VUMC: Ward Blonde, Jeroen Belien, Freek de Bruijn, Jan Willem Boiten
UMCG: Dennis Hendriksen, Morris Swertz
Infrafrontier
  HMGU: Philipp Gormanns, Christoph Lengger, Holger Maier
Instruct
  STFC: Narayanan Krishnan, Chris Morris, Martyn Winn
WP4 Acknowledgements
Funding: European Union Seventh Framework Programme, grant agreement number 284209
Bridging imaging and clinical data
BioMedBridges
Stefan Klein, Erwin Vast, Wiro Niessen
Erasmus MC
Volume GM = 790 ml; Volume WM = 497 ml
Medical imaging data
www.xnat.org
XNAT – Imaging platform
Bridging imaging and clinical data
Figure labels: MR image; Derived data (Grey matter volume: 790 ml); Genetics + age; Risk score (No Alzheimer / Alzheimer); TranSMART
Bridging XNAT and TranSMART
Created open-source TranSMART plugin to import XNAT data
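The kind of reshaping that such an import involves can be sketched as follows. This is not the plugin's code and uses neither the XNAT nor the TranSMART API: the record layout, the field names (`subject`, `measure`, `value`, `SUBJ_ID`) and the subject identifier `S001` are all assumptions made for illustration. Only the two volumes (790 ml and 497 ml) come from the slides. The sketch pivots image-derived measurements into one row per subject and writes them as tab-separated text, a flat form that subject-level clinical data commonly takes.

```python
import csv
import io

# Hypothetical image-derived measurements in "long" form (one record per value).
# Field names and the subject ID are assumptions; the volumes are from the slides.
DERIVED = [
    {"subject": "S001", "measure": "grey_matter_volume_ml", "value": 790.0},
    {"subject": "S001", "measure": "white_matter_volume_ml", "value": 497.0},
]


def to_subject_rows(measurements):
    """Pivot long-format measurements into one dict per subject."""
    rows = {}
    for m in measurements:
        row = rows.setdefault(m["subject"], {"SUBJ_ID": m["subject"]})
        row[m["measure"]] = m["value"]
    return list(rows.values())


def to_tsv(rows):
    """Serialise subject rows as tab-separated text with a header line."""
    measures = sorted({key for row in rows for key in row if key != "SUBJ_ID"})
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=["SUBJ_ID"] + measures,
        delimiter="\t",
        restval="",          # subjects lacking a measure get an empty cell
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    print(to_tsv(to_subject_rows(DERIVED)), end="")
```

Keeping the pivot and the serialisation separate means the same subject rows could feed a different target format without touching the code that reads the imaging archive.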
Bridging XNAT and TranSMART
New TranSMART administration panel.
XNAT data shown in TranSMART explorer.
Conclusion
• XNAT: storing and sharing medical imaging (derived) data
• TranSMART: analysis of translational research data
• Bridging: import image-derived data from XNAT into TranSMART
Future work:
• Harmonize storage of image-derived data
• Write documentation