Standards / models / mappings registry Morris Swertz BioMedBridges WP3 workshop June 24, 2014, VUmc, Amsterdam



Standards / models / mappings registry

Morris Swertz

BioMedBridges WP3 workshop

June 24, 2014, VUmc, Amsterdam


Outline

•  Background

•  User stories

•  Implementation pointers

•  Goals of the meeting

•  Open the discussion


Background


Objective

The BMS standards registry aims to facilitate syntactic interoperability across research infrastructures, so that samples and data can be integrated and analysed across ESFRI BMS domains.


Standard?

•  What do we mean by “standard”?

•  So far we have limited ourselves to ‘models’ that are used to describe life science data. These may or may not be formal standards. Examples:

    •  Formats like VCF and MAGE-TAB

    •  Models / guidelines like MIABIS

    •  (Partial) dictionaries like those used in clinical and biobank studies

•  I.e. a ‘standard’ can be any model/format that is used by multiple parties to facilitate data sharing / integration.


Who?

•  Data consumer?

•  Data producer?

•  Software developer?

•  Architect?


History

•  A few meetings within BMB

•  Three ‘idea labs’ @ HandsOn biobanks

•  Private communications


Proposed goal of today?

•  Scope

    •  what types of standards to include

        •  Yes: models, formats, guidelines?

        •  Discussion: identifiers, value sets, ontologies?

    •  priorities of user stories (MoSCoW)

    •  priorities on content - what standards to catalogue first and why

    •  priorities of meta-data to be captured about each standard

    •  overview of stakeholders/users

•  Next steps

    •  ideas about user interfaces needed

    •  pointers to useful existing solutions/contents we can reuse

    •  identify tasks / contributors to the deliverable

    •  summary of the ‘how to demo’ for first public release


User story


Find standard

Synopsis:

As a researcher / data provider I want to find a relevant standard

How to demo:

There is a search box where users can type in a topic. The portal then returns a list of available standards, including contextual information to rapidly assess the relevance and value of each micro standard [what parameters / tags?]. For example, new biobanks starting up could use this to quickly find existing questionnaire modules, and existing biobanks could rapidly assess to which standards they could harmonize their variables, e.g. when translating from a local language to English.
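To make the demo concrete, the search could be sketched as below. This is only an illustration: the `StandardEntry` fields, the tags and the naive scoring are assumptions, not the portal's actual data model or ranking.

```python
from dataclasses import dataclass


@dataclass
class StandardEntry:
    """One registry record; all field names here are hypothetical."""
    name: str
    kind: str            # e.g. "format", "model", "dictionary"
    description: str
    tags: tuple = ()


# Illustrative content only; descriptions are deliberately generic.
REGISTRY = [
    StandardEntry("VCF", "format", "Format for genomic variant data",
                  ("genomics", "variants")),
    StandardEntry("MAGE-TAB", "format", "Format for describing microarray experiments",
                  ("microarray", "expression")),
    StandardEntry("MIABIS", "model", "Guideline for describing biobanks and sample collections",
                  ("biobank", "samples")),
]


def search(registry, topic):
    """Return entries mentioning the topic, best match first.

    Naive scoring (an assumption): name hit = 3, tag hit = 2, description hit = 1.
    """
    needle = topic.strip().lower()
    if not needle:
        return []
    scored = []
    for entry in registry:
        score = 0
        if needle in entry.name.lower():
            score += 3
        if any(needle in tag.lower() for tag in entry.tags):
            score += 2
        if needle in entry.description.lower():
            score += 1
        if score:
            scored.append((score, entry))
    # Stable sort keeps registry order among equally scored entries.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [entry for _, entry in scored]


if __name__ == "__main__":
    for hit in search(REGISTRY, "biobank"):
        print(hit.name, "|", hit.kind, "|", hit.description)
```

Which parameters / tags to show next to each hit is exactly the open question flagged in the demo text; the `kind` and `tags` fields are placeholders for that discussion.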


Biosharing.org

http://www.biosharing.org


EDAM

http://bioportal.bioontology.org/ontologies/EDAM/?p=classes&conceptid=http%3A%2F%2Fedamontology.org%2Fformat_3162&jump_to_nav=true


Drill down on schema to evaluate fit for purpose

Synopsis:

As a researcher I want suitable details on the data elements of the standard so I can assess if the standard is fit for my purpose.

How to demo:

The focus group wanted to see essential data and metadata to ease the search and evaluation process of assessing whether a biobank / sample collection / study is fit for the purpose of the research or experiments currently under consideration. I.e. the portal should ideally have a tight integration between the model definitions, their annotation, and external uses of the standard, such as other repositories containing information that defines how the models are used.
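A drill-down of this kind could look roughly like the sketch below. The `Standard` and `DataElement` classes, the example model and its element names are all hypothetical; the point is only that element-level metadata lets a researcher check which of the items they need are actually defined.

```python
from dataclasses import dataclass, field


@dataclass
class DataElement:
    """A single data element of a standard (hypothetical fields)."""
    name: str
    data_type: str
    description: str


@dataclass
class Standard:
    """A standard with its elements and known external uses (hypothetical)."""
    name: str
    elements: list = field(default_factory=list)
    used_by: list = field(default_factory=list)  # e.g. repositories using the model


def coverage(standard, needed):
    """Split the element names a researcher needs into those the standard
    defines and those it lacks (case-insensitive match on element name)."""
    defined = {element.name.lower() for element in standard.elements}
    found = [name for name in needed if name.lower() in defined]
    missing = [name for name in needed if name.lower() not in defined]
    return found, missing


# Invented example model, not a real standard.
EXAMPLE = Standard(
    name="ExampleBiobankModel",
    elements=[
        DataElement("sample_type", "categorical", "Kind of material stored"),
        DataElement("collection_date", "date", "Date the sample was taken"),
        DataElement("donor_sex", "categorical", "Sex of the donor"),
    ],
    used_by=["ExampleRepository"],
)

if __name__ == "__main__":
    found, missing = coverage(EXAMPLE, ["sample_type", "storage_temperature"])
    print("defined:", found)
    print("missing:", missing)
```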


Proof of concept

http://www.molgenis08.target.rug.nl


Proof of concept

http://www.molgenis08.target.rug.nl


Mapping between standards

Synopsis:

As a user I want to easily find mappings between standards

How to demo:

I can easily generate / curate mappings between standards and their elements so I can rapidly evaluate whether standards are related and how I could move between them. This is motivated in particular by the fact that ESFRIs have been developing in parallel, resulting in some overlap. Moreover, within individual studies and biobanks, data has been collected before standardization took place. Hence, there is a need to rapidly assess the mapping of data elements between models and formats. In collaboration with the BioSHaRE project an ontology-based method has been developed to facilitate this process, which will be integrated in the repository. Figure 2 shows an example of ‘target data elements’ on the left and then proposes mappings for these elements across three data sources.
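The idea of proposing candidate mappings for curation can be illustrated with the sketch below. Note that it swaps in plain string similarity from Python's `difflib` as a stand-in; it is not the ontology-based method developed with BioSHaRE. The element names, source names and threshold are assumptions.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Case-insensitive string similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def propose_mappings(targets, sources, threshold=0.6, top_n=3):
    """For each target element, propose up to top_n source elements.

    `sources` maps a data source name to its list of element labels.
    Proposals are suggestions only and still need human curation.
    """
    proposals = {}
    for target in targets:
        candidates = []
        for source_name, elements in sources.items():
            for element in elements:
                score = similarity(target, element)
                if score >= threshold:
                    candidates.append((round(score, 2), source_name, element))
        candidates.sort(reverse=True)  # highest score first
        proposals[target] = candidates[:top_n]
    return proposals


if __name__ == "__main__":
    targets = ["body weight", "systolic blood pressure"]
    sources = {
        "study_a": ["Body weight", "Height"],
        "study_b": ["Blood pressure systolic", "Weight (kg)"],
    }
    for target, candidates in propose_mappings(targets, sources).items():
        print(target, "->", candidates)
```

String matching misses synonyms and can match unrelated labels, which is why an ontology-based approach and a curation step are needed in practice.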


Proof of concept (presentation of Chao)

http://biobankconnect.org/


Using standards as tool for integration

Synopsis:

As a user I want to integrate my data with other data, using the standards registry as a means to do so

How to demo:

Given data in hand, I can use the mapping tool to map it onto standards and then discover data sources / annotation services that I can integrate with. WP8 (personalized medicine) has provided a good example use case around leukemia, where mutation data needs to be enhanced with knowledge on existing cancer cases (integration with COSMIC) and on whether gene expression can be influenced using drugs (integration with ChEMBL).


Proof of concept (WP8)

Discover annotation services available based on attribute meta data
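A minimal sketch of such discovery is shown below, assuming each dataset attribute carries a semantic type and each service declares the type it accepts. The catalogue structure and the type labels are hypothetical; only the COSMIC and ChEMBL roles come from the WP8 use case.

```python
# Hypothetical catalogue: the kind of attribute each service can annotate.
SERVICES = [
    {"name": "COSMIC", "accepts": "mutation", "adds": "known cancer cases"},
    {"name": "ChEMBL", "accepts": "gene", "adds": "drugs that may influence expression"},
]


def discover_services(attributes, services=SERVICES):
    """Map each dataset attribute to the services accepting its semantic type.

    `attributes` maps an attribute name to its (assumed) semantic type.
    Attributes with no matching service are left out of the result.
    """
    matches = {}
    for attribute_name, semantic_type in attributes.items():
        hits = [s["name"] for s in services if s["accepts"] == semantic_type]
        if hits:
            matches[attribute_name] = hits
    return matches


if __name__ == "__main__":
    dataset_attributes = {
        "variant": "mutation",
        "gene_symbol": "gene",
        "patient_age": "age",
    }
    print(discover_services(dataset_attributes))
```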


Evaluate ‘maturity’

Synopsis:

As a user I want to evaluate the maturity / quality / support of the standard as a basis for deciding whether to use it

How to demo:

I want to see to what extent the standard is used in databases, software tools and institutes, and what their experiences are. I also want to know if there is active support for the standard in terms of software tools and expert groups that can help with using it.


Could not find a good example?


Access to expert knowledge

Synopsis:

As a researcher I want access to expert knowledge about the standard

How to demo:

I can easily drill down on the models to discover background information that supports them. For example, for questionnaire modules it would be discoverable whether there are pitfalls when changing the order of the questions, or whether there is knowledge about their stability when used in a longitudinal setting or in repetition. Moreover, it should be visible which persons or institutes have provided the information, so that users can assess to what extent they want to trust it.


Could not find a good source for this:

E.g. http://biostars.org


Find collaborators

Synopsis:

As a user I want to get into contact with colleagues having similar questions / objectives

How to demo:

I can easily find colleagues who are or have been dealing with questions similar to the ones I am struggling with. For example, a new biobanker developing a new laboratory protocol may wonder what information should be captured to ensure research use, and may want to learn more than is currently available in the portal. The focus group expected that in these cases a forum application would not be enough, and that the portal would provide a perfect platform to even enable drill-down to experts who have indicated that they are willing to be found.


Could not find an example ... Out of scope?


Other stories

•  As a group I want to develop and define the data elements of a new standard (e.g. MIABIS working group)

•  As a user I want to see the version history of the standard (why?)

•  As a user I want to convert between standards (e.g. ETL)

•  As a database owner I want to move my data to a sustainable resource

•  As a user I want to upload my data to a public repository (e.g. EGA)

•  As a user I want to find protocols related to standards (http://www.molmeth.org/)

    •  And associated data items? E.g. blood pressure

•  As a user I want to choose between overlapping formats

•  PLEASE EXTEND


Proof of concept implementation


Proof of concept

•  Code in http://github.com/molgenis/molgenis

•  Early demo on http://molgenis08.target.rug.nl


Basic data capabilities (fine grained meta data / schema)


Basic capabilities (any data)


REST interface (metadata/data, used in WP4 federation)


Excel upload details

https://www.dropbox.com/s/1r26jfh8lmupvqj/MAGE-TAB.xlsx


Now simplifying and detailing a bit more

https://github.com/molgenis/molgenis/wiki/EMX-upload-format


Looking forward

https://docs.google.com/spreadsheets/d/1rDqQ4hz4uWs4JcVKmJTZza__ztBivExVTbkGOx3ev_0


Integrations

•  Tool registry

    •  To link to tools that can implement a mapping / consume / produce a standard

•  Identifiers.org

    •  For merging datasets across identifier spaces (which you need in combination with format / model mapping)

•  Biosharing.org

    •  To not duplicate meta data about standards and formats and their usage?
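Why identifier resolution is needed alongside format / model mapping can be shown with a small sketch: two datasets refer to the same record in different notations, and only merge once the identifiers are normalised. The two notations handled here, the `exampledb` namespace and the URL shape are assumptions for illustration; the sketch does not call Identifiers.org.

```python
def normalise(identifier):
    """Reduce an identifier to a (namespace, accession) pair.

    Handles two notations only (an assumption of this sketch):
    'namespace:accession' and URLs ending in '/namespace/accession'.
    """
    text = identifier.strip()
    if text.startswith(("http://", "https://")):
        parts = text.rstrip("/").split("/")
        namespace, accession = parts[-2], parts[-1]
    else:
        namespace, accession = text.split(":", 1)
    return namespace.lower(), accession


def merge(left, right):
    """Merge two datasets keyed by identifier, after normalising the keys.

    On conflicting fields, values from `right` overwrite those from `left`.
    """
    merged = {}
    for dataset in (left, right):
        for identifier, values in dataset.items():
            merged.setdefault(normalise(identifier), {}).update(values)
    return merged


if __name__ == "__main__":
    dataset_a = {"exampledb:ABC123": {"score": 1}}
    dataset_b = {"http://identifiers.org/exampledb/ABC123": {"label": "x"}}
    print(merge(dataset_a, dataset_b))
```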


Goal of this meeting


Goals of the meeting

•  Scope

    •  what types of standards to include

        •  Yes: models, formats, MIAx guidelines? (syntax?)

        •  Discussion: identifiers, value sets, ontologies? (semantics?)

        •  Tools registry: services, query interfaces, tools, ...

    •  priorities of user stories (MoSCoW)

    •  priorities on content - what standards to catalogue first and why

    •  priorities of meta-data to be captured about each standard

    •  overview of stakeholders/users

•  Next steps

    •  existing solutions/contents and gap analysis

    •  ideas about user interfaces needed

    •  identify tasks / contributors to the deliverable

    •  summary of the ‘how to demo’ for first public release

•  Case studies


Notes

•  License?
