BioSHaRE: Advanced Database and catalogue platforms: BiobankConnect and MOLGENIS - Morris Swertz - University Medical Center Groningen

Modular systems for biobank harmonization and genomics data, Morris Swertz & MOLGENIS team, [email protected]

BioSHaRE Tool roll-out, July 28 2015, Milano

Modular data system – query, share, integrate and analyze your data using online modules

Data request: Find and request (biobank) data sets and items

Genome browser: Data sharing and integration, DAS protocol

Upload format: Import data and meta data using EMX format (D4.1)

Meta model registry: Meta-data registry of models for biobanks and molecular data (D4.4)

Annotators: Data integration for diagnostics and personalized medicine

Compute: Large scale computation on computational clusters, grids and clouds

BiobankConnect: Using ontologies to derive harmonization rules for data pooling (D2.2)

RNA pipeline: NGS data quantitation, structure, eQTL, allele-specific expression

Impute pipeline: GWAS harmonization and imputation

R statistics: Use the R data API to up/download data and integrate graphics

Data explorer: Filter and download for further analysis (D4.2)

DNA pipeline: NGS data alignment, SNV/SV calling, QC, NIPT

Swertz et al (2010) BMC Bioinformatics 11(Suppl 12):S12

Open source at http://github.com/molgenis/molgenis

Added many features in 32 releases, including:

Added biobank tooling (to build catalogues, data request, harmonization, federated search)

Completely rebuilt to use industry standards (Java, Maven modules, MySQL, ElasticSearch, HTML5, REST, React, GitHub)

Improved data entry forms (Questionnaire module; skip questions, validation, ontology upload)

Improved scriptable interface (REST/RSQL query API, R API, Python API)

Added genome capabilities (Genome browser + VCF upload + improved scalability)

Added aggregate overviews (counts per biobank, charts, integration of R scripts, custom reports)

Added annotators to integrate public data (1KG, ExAC, CADD, COSMIC, SnpEff, etc.)

Added admin options (user/group permissions, customize menu structure, switch color/theme, custom CSS)

Open source at http://github.com/molgenis/molgenis/releases

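To give a feel for the scriptable REST/RSQL interface listed above, here is a minimal Python sketch that only builds a query URL and sends nothing. The comparison operators (`==`, `=ge=`) and the `;` for AND are standard RSQL. The `/api/v2/<entity>` path and the `q` and `num` parameters reflect my understanding of the MOLGENIS REST API and should be checked against the release in use; the host, the `patients` entity and its attributes are hypothetical.

```python
from urllib.parse import quote, urlencode


def rsql_cmp(attribute, operator, value):
    """Return one RSQL comparison, e.g. age=ge=18 or sex==female."""
    return f"{attribute}{operator}{value}"


def rsql_and(*conditions):
    """Join RSQL comparisons with ';', the RSQL logical AND."""
    return ";".join(conditions)


def build_query_url(base_url, entity, rsql, num=100):
    """Build a GET URL for an entity query.

    Assumption: the server exposes /api/v2/<entity> and accepts the
    'q' (RSQL expression) and 'num' (page size) parameters.
    """
    params = urlencode({"q": rsql, "num": num})
    return f"{base_url}/api/v2/{quote(entity)}?{params}"


# Hypothetical host, entity and attribute names, for illustration only.
query = rsql_and(rsql_cmp("age", "=ge=", 18), rsql_cmp("sex", "==", "female"))
url = build_query_url("https://molgenis.example.org", "patients", query)
```

The resulting URL could then be fetched with any HTTP client; authentication is left out of this sketch.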

Highlights

Observ-EMX (meta)data structure description tool

SORTA data curation to standard (re)coding tool

BiobankConnect (biobank) data pooling tool

OMX – (gen)omics data tools

Challenge 1

Genomic features, mutations, individuals, ontologies ...

Metadata for phenotypes, datasets, samples, panels ...

Genotypes, variants, factors, conditions ...

Phenotypes, NGS, GWAS, eQTLs, Microbiome, Metabolomics ...

(Observ)EMX – customize to your data standard using ‘entity model’ spreadsheet

Swertz et al, 2012, Human Mutation 33(5): 867-73

Open source at http://github.com/molgenis/molgenis

Your Data / Your Meta Data
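The 'entity model' spreadsheet idea can be sketched in a few lines of Python: the meta data is itself a small table with one row per attribute, and data rows are checked against it. The column names (`entity`, `name`, `dataType`, `idAttribute`, `nillable`) follow my understanding of the EMX attributes sheet and should be verified against the EMX documentation; the `patients` entity, its attributes and the validation rules are hypothetical.

```python
import csv
import io

# "Your Meta Data": one row per attribute (hypothetical example model).
ATTRIBUTES = [
    {"entity": "patients", "name": "id", "dataType": "string",
     "idAttribute": "TRUE", "nillable": "FALSE"},
    {"entity": "patients", "name": "age", "dataType": "int",
     "idAttribute": "FALSE", "nillable": "FALSE"},
    {"entity": "patients", "name": "hypertension", "dataType": "bool",
     "idAttribute": "FALSE", "nillable": "TRUE"},
]


def parse_bool(value):
    """Accept only 'true'/'false' (case-insensitive)."""
    if value.strip().lower() not in ("true", "false"):
        raise ValueError(value)
    return value.strip().lower() == "true"


CASTS = {"string": str, "int": int, "bool": parse_bool}


def to_csv(rows):
    """Serialize a list of dicts as CSV text, e.g. for an attributes sheet."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def validate(row, attributes):
    """Check one data row ("Your Data") against the attribute definitions."""
    errors = []
    for attr in attributes:
        value = row.get(attr["name"], "")
        if value == "":
            if attr["nillable"] != "TRUE":
                errors.append(f"{attr['name']}: value required")
            continue
        try:
            CASTS[attr["dataType"]](value)
        except ValueError:
            errors.append(f"{attr['name']}: not a valid {attr['dataType']}")
    return errors


good_row = {"id": "p1", "age": "54", "hypertension": "true"}
bad_row = {"id": "p2", "age": "unknown", "hypertension": ""}
```

Here `validate(good_row, ATTRIBUTES)` reports no problems, while `bad_row` fails on `age` only, because `hypertension` is declared nillable.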


Challenge 2

ATB, MTB, MOUNTAINBIKE, MOUNTAINBIKEN, MOUNTAIN BIKEN, MOUTAINBIKING, CROSS FIETS, CROSS FIETSEN, WIELRENNEN, RACE FIETSEN, RACEFIETSEN, RACEFIETS, RACE FIETS, FIETSEN RACE, WII (NINTENDO) SPORTEN

What is the MET code for this ‘activity’ that was recorded by a LifeLines participant?

MET = The Metabolic Equivalent of Task

SORTA – rapidly (re)code your data using ontologies, e.g. go from free text to categories

Pang et al (submitted)

Open source at http://github.com/molgenis/molgenis
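As a rough illustration of going from free text to categories, the sketch below scores a free-text answer against candidate category labels with a bigram Dice coefficient and proposes the best one. The similarity measure is a stand-in chosen for brevity, not necessarily what SORTA uses, and the category labels are hypothetical; real MET codes are deliberately left out.

```python
import re

# Hypothetical target categories; a real run would use a coded standard.
CATEGORIES = ["mountain biking", "road cycling", "video game sports"]


def bigrams(text):
    """Set of letter bigrams, ignoring case, spaces and punctuation."""
    letters = re.sub(r"[^a-z]", "", text.lower())
    return {letters[i:i + 2] for i in range(len(letters) - 1)}


def dice(a, b):
    """Dice coefficient between the bigram sets of two strings (0..1)."""
    x, y = bigrams(a), bigrams(b)
    if not x or not y:
        return 0.0
    return 2 * len(x & y) / (len(x) + len(y))


def best_match(free_text, categories):
    """Return (label, score) for the most similar category label."""
    scored = ((label, dice(free_text, label)) for label in categories)
    return max(scored, key=lambda pair: pair[1])


# The misspelled answer from the slide still lands on the right category.
label, score = best_match("MOUTAINBIKING", CATEGORIES)
```

Abbreviations such as MTB and the Dutch answers would score poorly against these English labels, which is why a curator confirming or correcting each suggestion remains part of the workflow.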


Increase in blood pressure

CM ever had high blood pressure

Have you ever been told that you have elevated or high blood pressure?

Are you taking medication for high blood pressure?

Hypertension

MICROS

Search for ‘History of hypertension’

History of Hypertension

Challenge 3

BiobankConnect – semi-automatically map data for pooled analysis using ontologies

Pang et al, 2015, JAMIA 22(1):65-75

Open source at http://github.com/molgenis/molgenis
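The role of ontologies in the ‘History of hypertension’ example can be sketched as follows: the search term is expanded with synonyms, and each biobank data item is scored by how many of the expanded query words it contains. This is a simplified illustration, not the BiobankConnect algorithm; the tiny synonym table stands in for a real ontology and the scoring function is an assumption.

```python
import re

STOP_WORDS = {"of", "the", "a", "for", "in"}

# Stand-in for an ontology lookup (hypothetical, hand-written).
SYNONYMS = {"hypertension": ["high blood pressure", "elevated blood pressure"]}

# Data item descriptions taken from the slide.
ITEMS = [
    "Increase in blood pressure",
    "CM ever had high blood pressure",
    "Have you ever been told that you have elevated or high blood pressure?",
    "Are you taking medication for high blood pressure?",
    "Hypertension",
]


def tokens(text):
    """Lower-cased words of a text, without stop words."""
    return {t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOP_WORDS}


def expand(query, synonyms):
    """Return the query plus one variant per synonym of each known term."""
    query = query.lower()
    variants = [query]
    for term, alternatives in synonyms.items():
        if term in query:
            variants += [query.replace(term, alt) for alt in alternatives]
    return variants


def rank(query, items, synonyms):
    """Score items by the best fraction of query-variant words they contain."""
    variants = [tokens(v) for v in expand(query, synonyms)]
    scored = []
    for item in items:
        item_tokens = tokens(item)
        score = max((len(v & item_tokens) / len(v) for v in variants if v),
                    default=0.0)
        scored.append((item, score))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


ranked = rank("History of hypertension", ITEMS, SYNONYMS)
without_ontology = rank("History of hypertension", ITEMS, {})
```

Without the synonym table only the item literally named "Hypertension" gets a non-zero score; with it, the blood pressure questions become candidates as well. The ranking only proposes matches and a person still decides which ones are valid for pooling.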


Challenge 4

DNA / phenotype

Insight / action?

Diagram labels: Population genomic variants; Clinical symptoms & diseases; Biochemical pathways; Imputation; Pathogenicity predictions; Model organism knockouts; Actionable genes; Known pathogenic & benign variants; Allele specific expression; Tissue specific expression; Pedigree analysis; Protein conservation; Age, onset, heritability modes; GWAS and QTL studies; Consequence

Upload VCF file

OMX – extension for dealing with loading, browsing and annotating genomics data

Open source at http://github.com/molgenis/molgenis
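For readers unfamiliar with what a VCF upload has to digest, here is a minimal Python sketch that reads the eight fixed VCF columns (CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO) of each record. It is not MOLGENIS code, it ignores genotype columns and header semantics, and the example record is made up.

```python
def parse_vcf_line(line):
    """Parse the eight fixed, tab-separated columns of one VCF record."""
    chrom, pos, vid, ref, alt, qual, flt, info = line.rstrip("\n").split("\t")[:8]
    info_fields = {}
    for item in info.split(";"):
        if item == ".":  # '.' marks a missing INFO column
            continue
        key, sep, value = item.partition("=")
        info_fields[key] = value if sep else True  # flags carry no value
    return {
        "chrom": chrom,
        "pos": int(pos),
        "id": None if vid == "." else vid,
        "ref": ref,
        "alt": alt.split(","),  # several alternate alleles are comma-separated
        "qual": None if qual == "." else float(qual),
        "filter": flt,
        "info": info_fields,
    }


def read_variants(lines):
    """Yield parsed records, skipping header ('#') and blank lines."""
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        yield parse_vcf_line(line)


# Made-up example content, for illustration only.
example = [
    "##fileformat=VCFv4.1\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
    "1\t12345\t.\tA\tG,T\t50\tPASS\tDP=14;DB\n",
]
variants = list(read_variants(example))
```

Annotators could then add columns to each parsed record, for example a pathogenicity score looked up by chromosome, position and alleles.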

Before the demos: Acknowledgements

The research leading to these results has received funding from the European Union Seventh Framework Programme (FP7/2007-2013) under grant agreement n° 261433 (Biobank Standardisation and Harmonisation for Research Excellence in the European Union - BioSHaRE-EU), grant 284209 (BioMedBridges), TI Food and Nutrition grant TIFN GH001, and BBMRI-NL grant 184.021.007, a research infrastructure financed by the Netherlands Organization for Scientific Research (NWO).

Many thanks to Chao Pang, Joeri van der Velde, Tommy de Boer, Fleur Kelpin, Dennis Hendriksen, Erwin Winder, Mark de Haan, Bart Charbon, Jonathan Jetten, Joel Kuiper, Anna Sijtsma, Annet Sollie, Martijn Dijkstra, Nynke Smidt, David van Enckevort, Rolf Sijmons, Isabel Fortier, Yannick Marcon, Danny Doiron, Hans Hillege, Anthony Brookes, Robert Hastings and many other collaborators in BioSHaRE and beyond.

Demos (http://molgenis.org/youtube)

Demo 1 Upload your data using EMX and basic features

Demo 2 Rapidly code a phenotype dataset to a standard set of values (HPO)

Demo 3 Rapidly map biobanks (LifeLines and Prevend) to a standard schema (HOP) as basis for joint analysis

Demo 4 Uploading and annotating a NGS data file in VCF format

Contact: [email protected] Open source at http://github.com/molgenis/molgenis
