Modular systems for biobank harmonization and genomics data, Morris Swertz & MOLGENIS team, [email protected]
BioSHaRE Tool roll-out, July 28 2015, Milano
Modular data system – query, share, integrate and analyze your data using online modules
Data request: Find and request (biobank) data sets and items
Genome browser: Data sharing and integration via the DAS protocol
Upload format: Import data and metadata using the EMX format (D4.1)
Meta model registry: Metadata registry of models for biobanks and molecular data (D4.4)
Annotators: Data integration for diagnostics and personalized medicine
Compute: Large-scale computation on computational clusters, grids and clouds
Biobank Connect: Using ontologies to derive harmonization rules for data pooling (D2.2)
RNA pipeline: NGS data quantitation, structure, eQTL, allele-specific expression
Impute pipeline: GWAS harmonization and imputation
R statistics: Use the R data API to upload/download data and integrate graphics
Data explorer: Filter and download data for further analysis (D4.2)
DNA pipeline: NGS data alignment, SNV/SV calling, QC, NIPT
Swertz et al (2010) BMC Bioinformatics 11(Suppl 12):S12
Open source at http://github.com/molgenis/molgenis
Added many features in 32 releases, including:
Added biobank tooling (to build catalogues, data requests, harmonization, federated search)
Completely rebuilt on industry standards (Java, Maven modules, MySQL, Elasticsearch, HTML5, REST, React, GitHub)
Improved data entry forms (Questionnaire module; skip questions, validation, ontology upload)
Improved scriptable interfaces (REST/RSQL query API, R API, Python API)
Added genome capabilities (genome browser, VCF upload, improved scalability)
Added aggregate overviews (counts per biobank, charts, integration of R scripts, custom reports)
Added annotators to integrate public data (1KG, ExAC, CADD, COSMIC, SnpEff, etc.)
Added admin options (user/group permissions, customizable menu structure, switchable color/theme, custom CSS)
Open source at http://github.com/molgenis/molgenis/releases
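The REST/RSQL query interface listed above filters data with RSQL expressions. The sketch below builds such expressions in Python using standard RSQL operators (`==`, `=ge=`, `=in=`, `;` for AND, `,` for OR) and percent-encodes them into a `q` query parameter. The helper names, the entity endpoint URL and the assumption that the server reads RSQL from a `q` parameter are illustrative only; check the REST API documentation of your MOLGENIS version for the exact paths.

```python
import re
from urllib.parse import quote

# Arguments made only of these characters are left unquoted; anything else
# (spaces, quotes, RSQL operators) is wrapped in double quotes.
_PLAIN = re.compile(r"^[A-Za-z0-9_.\-]+$")


def _arg(value) -> str:
    """Render one RSQL argument, quoting and escaping it when needed."""
    text = str(value)
    if _PLAIN.match(text):
        return text
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def eq(attr: str, value) -> str:
    return f"{attr}=={_arg(value)}"


def ge(attr: str, value) -> str:
    return f"{attr}=ge={_arg(value)}"


def is_in(attr: str, values) -> str:
    return f"{attr}=in=({','.join(_arg(v) for v in values)})"


def all_of(*clauses: str) -> str:
    """Combine comparisons with ';', the RSQL AND operator."""
    return ";".join(clauses)


def any_of(*clauses: str) -> str:
    """Combine comparisons with ',', the RSQL OR operator, in parentheses."""
    return "(" + ",".join(clauses) + ")"


def query_url(entity_endpoint: str, rsql: str) -> str:
    """Attach the percent-encoded RSQL filter as a 'q' parameter.

    entity_endpoint is a hypothetical collection URL for one entity type.
    """
    return f"{entity_endpoint}?q={quote(rsql, safe='')}"
```

For example, `all_of(eq("biobank", "LifeLines"), ge("age", 40))` yields `biobank==LifeLines;age=ge=40`; values containing spaces, such as `high blood pressure`, are wrapped in double quotes.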
Highlights
Observ-EMX (meta)data structure description tool
SORTA data curation to standard (re)coding tool
BiobankConnect (biobank) data pooling tool
OMX – (gen)omics data tools
Challenge 1
Genomic features, mutations, individuals, ontologies ...
Metadata for phenotypes, datasets, samples, panels …
Genotypes, variants, factors, conditions …
Phenotypes, NGS, GWAS, eQTLs, microbiome, metabolomics ...
(Observ)EMX – customize to your own data standard using an ‘entity model’ spreadsheet
Swertz et al, 2012, Human Mutation 33(5): 867-73
Your data / Your metadata
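An entity model spreadsheet describes each attribute of your data (its name, the entity it belongs to, its data type and whether it may be empty), and uploaded records are checked against that description. The minimal sketch below, loosely modeled on an EMX-style attributes sheet, illustrates the idea with a CSV string standing in for the spreadsheet tab. The column names, data type names and the `load_model`/`validate` helpers are assumptions for illustration, not the actual EMX specification or the MOLGENIS importer.

```python
import csv
import io
from datetime import date

# Hypothetical entity model: one row per attribute, like a spreadsheet tab.
# Column names (name, entity, dataType, nillable) are assumptions.
MODEL_CSV = """name,entity,dataType,nillable
participant_id,participant,string,false
age,participant,int,false
bmi,participant,decimal,true
"""

# Parsers that raise ValueError when a raw string does not fit the type.
PARSERS = {
    "string": str,
    "int": int,
    "decimal": float,
    "date": date.fromisoformat,
}


def load_model(text):
    """Group attribute rows by the entity they belong to."""
    model = {}
    for row in csv.DictReader(io.StringIO(text)):
        model.setdefault(row["entity"], []).append(row)
    return model


def validate(model, entity, record):
    """Return error messages for one record (a dict of raw string values)."""
    errors = []
    for attr in model[entity]:
        raw = record.get(attr["name"], "")
        if raw == "":
            if attr["nillable"].lower() != "true":
                errors.append(f"{attr['name']}: value required")
            continue
        try:
            PARSERS[attr["dataType"]](raw)
        except ValueError:
            errors.append(f"{attr['name']}: not a valid {attr['dataType']}")
    # Columns in the data that the model does not describe are also reported.
    known = {attr["name"] for attr in model[entity]}
    for key in record:
        if key not in known:
            errors.append(f"{key}: not defined in entity model")
    return errors
```

Keeping the model in a spreadsheet rather than in code is what lets a biobank adapt the structure to its own standard without programming.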
Challenge 2
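Free-text entries as recorded (Dutch spelling variants and misspellings of mountain biking, various kinds of cycling and Wii (Nintendo) sports):
ATB, MTB, MOUNTAINBIKE, MOUNTAINBIKEN, MOUNTAIN BIKEN, MOUTAINBIKING, CROSS FIETS, CROSS FIETSEN, WIELRENNEN, RACE FIETSEN, RACEFIETSEN, RACEFIETS, RACE FIETS, FIETSEN RACE, WII (NINTENDO) SPORTEN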
What is the MET code for this ‘activity’ as recorded by a LifeLines participant?
MET = The Metabolic Equivalent of Task
SORTA – rapidly (re)code your data using ontologies, e.g. go from free text to categories
Pang et al (submitted)
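SORTA suggests standard terms for free-text values such as the cycling entries above. As a hedged illustration of the general approach (not SORTA's actual matching algorithm), the sketch below normalizes each value and scores it against category synonyms with a character-bigram Dice coefficient, which tolerates spacing variants and typos such as ‘MOUTAINBIKING’. The category names, synonyms and the 0.5 threshold are hypothetical placeholders, not real MET codes.

```python
import re


def normalize(text: str) -> str:
    """Lower-case and drop everything except letters and digits."""
    return re.sub(r"[^a-z0-9]", "", text.lower())


def bigrams(text: str) -> set:
    return {text[i:i + 2] for i in range(len(text) - 1)}


def dice(a: str, b: str) -> float:
    """Dice coefficient on the character-bigram sets of two normalized strings."""
    x, y = bigrams(normalize(a)), bigrams(normalize(b))
    if not x or not y:
        return 0.0
    return 2 * len(x & y) / (len(x) + len(y))


# Hypothetical target categories with synonyms; in practice these would come
# from an ontology or a coding system such as a list of MET activity codes.
CATEGORIES = {
    "mountain biking": ["mountain biking", "mountainbike", "mtb", "atb"],
    "road cycling": ["road cycling", "racefiets", "wielrennen"],
    "video game sports": ["wii sports", "nintendo wii"],
}


def recode(value: str, threshold: float = 0.5):
    """Return (category, score) for the best-matching synonym, or (None, score)."""
    best_cat, best_score = None, 0.0
    for category, synonyms in CATEGORIES.items():
        for synonym in synonyms:
            score = dice(value, synonym)
            if score > best_score:
                best_cat, best_score = category, score
    if best_score < threshold:
        return None, best_score  # leave for manual curation
    return best_cat, best_score
```

With these toy synonyms, ‘MOUTAINBIKING’ maps to ‘mountain biking’ with a score of about 0.87, while a value with no close synonym returns `None` so that a curator can code it by hand.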
Increase in blood pressure
CM ever had high blood pressure
Have you ever been told that you have elevated or high blood pressure?
Are you taking medication for high blood pressure?
Hypertension
MICROS
Search for ‘History of hypertension’
History of Hypertension
Challenge 3
BiobankConnect – semi-automatically map data for pooled analysis using ontologies
Pang et al, 2015, JAMIA 22(1):65-75
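BiobankConnect expands a target data item with ontology terms before searching the source biobanks' data items, so that ‘History of hypertension’ can also find items phrased as ‘high blood pressure’. The sketch below illustrates that query-expansion idea with simple word overlap. The synonym table, stop-word list and scoring rule are assumptions, not the published BiobankConnect implementation; the candidate descriptions are taken from the example above.

```python
import re

# Small hypothetical stop-word list; a real system would use a fuller one.
STOPWORDS = {"of", "in", "or", "for", "you", "the", "a", "an", "that", "have", "been", "are"}

# Hypothetical ontology synonyms for the target data item.
SYNONYMS = {
    "history of hypertension": ["hypertension", "high blood pressure", "elevated blood pressure"],
}

CANDIDATES = [
    "Increase in blood pressure",
    "CM ever had high blood pressure",
    "Have you ever been told that you have elevated or high blood pressure?",
    "Are you taking medication for high blood pressure?",
    "Hypertension",
]


def tokens(text):
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}


def expand(term, synonyms):
    """Query expansion: the target term plus its ontology synonyms."""
    return [term] + list(synonyms.get(term.lower(), []))


def score(description, terms):
    """Best fraction of any query term's words that occur in the description."""
    desc = tokens(description)
    best = 0.0
    for term in terms:
        words = tokens(term)
        if words:
            best = max(best, len(words & desc) / len(words))
    return best


def rank(target, candidates, synonyms):
    """Score every candidate; Python's sort is stable, so ties keep input order."""
    terms = expand(target, synonyms)
    scored = [(score(c, terms), c) for c in candidates]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)
```

Without the synonyms, ‘CM ever had high blood pressure’ scores 0 against ‘History of hypertension’; with them it scores 1.0. ‘Are you taking medication for high blood pressure?’ also scores 1.0, which is why the mapping is semi-automatic and a curator confirms each suggested match.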
Challenge 4
From DNA / phenotype to insight / action?
Data and knowledge sources shown between DNA / phenotype and insight / action: population genomic variants, clinical symptoms & diseases, biochemical pathways, imputation, pathogenicity predictions, model organism knockouts, actionable genes, known pathogenic & benign variants, allele-specific expression, tissue-specific expression, pedigree analysis, protein conservation, age, onset and heritability modes, GWAS and QTL studies, consequence
Upload VCF file
OMX – extension for loading, browsing and annotating genomics data
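Loading and annotating a VCF file starts with parsing its fixed columns (CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO) and then attaching values from a reference resource to each alternative allele. The sketch below does this with the Python standard library only. The `annotate` helper and its lookup keyed on (chrom, pos, ref, alt) are hypothetical stand-ins for annotators such as those for ExAC or CADD, not the OMX implementation.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, Iterator, List, Tuple


@dataclass
class Variant:
    chrom: str
    pos: int
    vid: str
    ref: str
    alt: List[str]
    qual: str
    filters: str
    info: Dict[str, object] = field(default_factory=dict)


def parse_info(text: str) -> Dict[str, object]:
    """Parse an INFO column: 'DP=14;SOMATIC' -> {'DP': '14', 'SOMATIC': True}."""
    info: Dict[str, object] = {}
    if text == ".":
        return info
    for entry in text.split(";"):
        key, sep, value = entry.partition("=")
        info[key] = value if sep else True  # flags have no '=value' part
    return info


def parse_vcf(lines: Iterable[str]) -> Iterator[Variant]:
    """Yield one Variant per data line, skipping '##' meta and '#CHROM' header lines."""
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        # Only the eight fixed columns are used; sample columns are ignored.
        chrom, pos, vid, ref, alt, qual, filters, info = line.split("\t")[:8]
        yield Variant(chrom, int(pos), vid, ref, alt.split(","), qual, filters, parse_info(info))


def annotate(variants: Iterable[Variant],
             lookup: Dict[Tuple[str, int, str, str], str],
             key: str) -> Iterator[Variant]:
    """Add INFO[key] with one value per ALT allele; '.' marks 'not found'."""
    for v in variants:
        v.info[key] = [lookup.get((v.chrom, v.pos, v.ref, a), ".") for a in v.alt]
        yield v
```

Unmatched alleles get `.`, the VCF missing-value symbol, so the new field stays aligned with the ALT column.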
Before the demos: Acknowledgements
The research leading to these results has received funding from the European Union Seventh Framework Programme (FP7/2007-2013) under grant agreement n° 261433 (Biobank Standardisation and Harmonisation for Research Excellence in the European Union - BioSHaRE-EU), grant 284209 (BioMedBridges), TI Food and Nutrition grant TIFN GH001, and BBMRI-NL grant 184.021.007, a research infrastructure financed by the Netherlands Organization for Scientific Research (NWO).
Many thanks to Chao Pang, Joeri van der Velde, Tommy de Boer, Fleur Kelpin, Dennis Hendriksen, Erwin Winder, Mark de Haan, Bart Charbon, Jonathan Jetten, Joel Kuiper, Anna Sijtsma, Annet Sollie, Martijn Dijkstra, Nynke Smidt, David van Enckevort, Rolf Sijmons, Isabel Fortier, Yannick Marcon, Danny Doiron, Hans Hillege, Anthony Brookes, Robert Hastings and many other collaborators in BioSHaRE and Beyond
Demos (http://molgenis.org/youtube)
Demo 1: Upload your data using EMX, and explore the basic features
Demo 2: Rapidly code a phenotype dataset to a standard set of values (HPO)
Demo 3: Rapidly map biobanks (LifeLines and Prevend) to a standard schema (HOP) as a basis for joint analysis
Demo 4: Upload and annotate an NGS data file in VCF format
Contact: [email protected]