Medicel Integrator platform - views to current themes in systems biology Tommi Aho Computational Systems Biology 1 6.2.2008 (slides modified from material of Medicel)


Page 1

Medicel Integrator platform - views to current themes in systems biology

Tommi Aho
Computational Systems Biology 1

6.2.2008

(slides modified from material of Medicel)

Page 2

Outline

•Part 1: Biology is complex
•Part 2: How to model biology – in theory
•Part 3: Is the data available?
•Part 4: Data integration

Page 3

Part 1 – Biology is complex

http://www.studiodaily.com/main/technique/tprojects/6850.html

Page 4

Part 2 – How to model biology – in theory

Page 5

Part 2 – How to model biology – in theory

Page 6

Part 2 – How to model biology – in theory

Biology
anatomy, biochemistry, botany, cell biology, ecology, evolution, genetics, immunology, histology, microbiology, parasitology, pathology, pharmacology, zoology

Modeling
Structural models, Graph models, FBA, ODE
dx/dt = f(x(t), u(t), t)
y(t) = g(x(t), u(t), t)
Partial differential equations, Statistics, Optimization

Data Integration
XML, SBML, SQL, RDF, OGSA-DAI, etc.
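The ODE form dx/dt = f(x(t), u(t), t), y(t) = g(x(t), u(t), t) listed under modeling can be illustrated with a minimal sketch. The forward Euler scheme, the function name `simulate` and the one-compound toy system (constant production, first-order decay) are assumptions made for illustration only; they are not taken from the slides.

```python
def simulate(f, g, x0, u, t0, t_end, dt):
    """Integrate dx/dt = f(x, u(t), t) with forward Euler and record y = g(x, u(t), t)."""
    n_steps = int(round((t_end - t0) / dt))
    x = x0
    trajectory = []
    for k in range(n_steps + 1):
        t = t0 + k * dt
        trajectory.append((t, g(x, u(t), t)))   # observe the output y(t)
        x = x + dt * f(x, u(t), t)              # Euler step for the state x
    return trajectory


# Hypothetical toy system: input u produces a compound that decays at rate 0.5.
def f(x, u, t):
    return u - 0.5 * x


def g(x, u, t):
    return x  # the output is the state itself


traj = simulate(f, g, x0=0.0, u=lambda t: 1.0, t0=0.0, t_end=20.0, dt=0.01)
final_time, final_output = traj[-1]
```

With these toy parameters the output approaches the steady state u / 0.5 = 2. A real model would normally use a proper ODE solver rather than fixed-step Euler.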

Page 7

Part 3 – Is the data available?

Page 8

Part 3 – Long history of research

•1924: Spemann and Mangold reveal the phenomenon of primary embryonic induction

•1970: Nieuwkoop et al. show that animal hemisphere cells are induced to become mesoderm by signals from vegetal hemisphere cells (so it depends on primary polarity axis definition)

•1990: ...

Fig. from Scott Gilbert, Developmental Biology, Sinauer Press

Page 9

Part 3 – Documentation of components

•1990: Asashima et al., Smith et al., Sokol et al. show that activin A (TGF-β) induces mesoderm. Then noggin and Vg1 join the list. (The chicken ovalbumin gene was cloned 10 years earlier.)

•> 5000 references to text
•> 150 images

Page 10

Part 3 – The components

•Once upon a time: One gene, one protein, one function

activin

Cell differentiation

Page 11

Part 3 – More functions

•One gene, many functions...

activin

Gonadotropin release

Cell differentiation

Inflammation

Carbohydrate metabolism

Protein & steroid metabolism

Page 12

Part 3 – More components

•Redundancy, overlapping, specificity, divergence...

activin

Gonadotropin release

Cell differentiation

Inflammation

Carbohydrate metabolism

Protein & steroid metabolism

noggin

Vg1

wnt

Page 13

Part 3 – Store and share the data

•Scientific documentation today

Page 14

Part 3 – Store and share the data

•Scientific documentation today

The user hard disk

Page 15

Part 3 – Data is far away

•typical

Page 16

Part 3 – Data should be at hand

•integrated

Page 17

Part 3 – Conclusions

•Most of the data is never shared
•No systematic data accumulation
•Lacking meta-data: what parameter was measured, where did the sample come from and when was the parameter measured?
•Seriously impairs our competitiveness
•IT solutions needed - biomedical researchers cannot resolve the problems alone

Page 18

Part 4 – Data integration

“Integration is difficult”
Stein, L.D., Integrating Biological Databases. Nature Rev. Genet. 4, 337-345 (2003)

Page 19

Part 4 – Integration example

Model
- as SBML file
- 612 compounds with IDs

Model
- as Excel file
- 1039 compounds with IDs somewhat similar to those of the SBML model
- 756 corresponding KEGG IDs

KEGG database
- 1843 compounds with KEGG IDs

Page 20

Part 4 – Integration example

538 same

Model
- as SBML file
- 612 compounds with IDs

Model
- as Excel file
- 1039 compounds with IDs somewhat similar to those of the SBML model
- 756 corresponding KEGG IDs

KEGG database
- 1843 compounds with KEGG IDs

501 not found

74 not found

1255 not found

588 same

169 not found
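The "same" and "not found" counts in this example come from comparing two sets of compound IDs. A minimal sketch of such a comparison follows; the function name `compare_ids` and the toy ID lists are hypothetical and do not reproduce the actual models or the numbers on the slide.

```python
def compare_ids(first, second):
    """Count IDs shared by two sources and IDs found in only one of them."""
    a, b = set(first), set(second)
    return {
        "same": len(a & b),
        "only_in_first": len(a - b),    # "not found" in the second source
        "only_in_second": len(b - a),   # "not found" in the first source
    }


# Toy ID lists (hypothetical), standing in for e.g. an SBML and an Excel model.
sbml_ids = ["glc", "atp", "adp", "pyr"]
excel_ids = ["glc", "atp", "nadh"]

result = compare_ids(sbml_ids, excel_ids)
```

Exact set matching only works once the IDs are normalized; "somewhat similar" IDs would need a cleaning or mapping step before this comparison.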

Page 21

Part 4 – Integration difficulties

•Diversity of data
•Heterogeneity of available databases:
› Data stored in different formats
› Often no schema (i.e. structural definition) available

Page 22

Part 4 – Integration difficulties

•Conflicts of terms: What is a gene?
•Namespace difficulties (1):
› One object, multiple names

e.g. P53_HUMAN: P04637, Cellular tumor antigen p53, Antigen NY-CO-13, Tumor suppressor p53, Phosphoprotein p53, p53, ...

P04637 = P53_HUMAN = Phosphoprotein p53 = Tumor suppressor p53 = ...
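One common way to handle "one object, multiple names" is an alias index that maps every known name to a single reference identifier. The sketch below uses the p53 names from the slide; choosing P04637 as the reference key, the case-insensitive matching and the names `build_index` and `resolve` are assumptions for illustration, not a description of how Integrator does it.

```python
# Aliases taken from the slide; the choice of P04637 as the key is an assumption.
ALIASES = {
    "P04637": [
        "P53_HUMAN",
        "Cellular tumor antigen p53",
        "Antigen NY-CO-13",
        "Tumor suppressor p53",
        "Phosphoprotein p53",
        "p53",
    ],
}


def build_index(aliases):
    """Map every name (lower-cased) to its reference identifier."""
    index = {}
    for reference, names in aliases.items():
        for name in [reference, *names]:
            index[name.lower()] = reference
    return index


def resolve(name, index):
    """Return the reference identifier for a name, or None if it is unknown."""
    return index.get(name.strip().lower())


index = build_index(ALIASES)
hit = resolve("Tumor suppressor p53", index)
miss = resolve("some unknown name", index)
```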

Page 23

Part 4 – Integration difficulties

•Namespace difficulties (2):
› Multiple objects, one name

e.g. P53 refers to
• a set of proteins across different species
• a set of transcripts encoding those proteins
• a set of genes encoding those transcripts

Common name: Object 1 != Object 2 != Object 3 != Object 4 != ...
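The reverse problem, one name for several objects, means a name lookup has to return a list and let the caller narrow it down. The following is a minimal sketch under stated assumptions: the class `BioObject`, the identifiers `obj1` to `obj4` and the filter fields (kind and species) are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BioObject:
    object_id: str   # hypothetical internal identifier
    kind: str        # "gene", "transcript" or "protein"
    species: str


# One common name pointing at several distinct objects (toy data).
NAME_INDEX = {
    "p53": [
        BioObject("obj1", "protein", "Homo sapiens"),
        BioObject("obj2", "protein", "Mus musculus"),
        BioObject("obj3", "transcript", "Homo sapiens"),
        BioObject("obj4", "gene", "Homo sapiens"),
    ],
}


def lookup(name, kind=None, species=None):
    """Return all objects sharing a name, optionally filtered by kind and species."""
    hits = NAME_INDEX.get(name.lower(), [])
    return [
        obj for obj in hits
        if (kind is None or obj.kind == kind)
        and (species is None or obj.species == species)
    ]


all_hits = lookup("P53")
human_protein = lookup("P53", kind="protein", species="Homo sapiens")
```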

Page 24

Part 4 – Technical difficulties

•Lack of metadata - or metadata exists, but in unstructured form (e.g. notes) that is not computer readable

•External databases: No standard accession method
•Database versions: Updated vs. old data
•Data model: No unified model available
•Amount of data

Page 25

Part 4 – Databases in Integrator

•The system includes the following data sources:
› ENSEMBL
› NCBI Taxonomy
› NCBI Refseq Proteins
› UniProt/Swissprot
› UniProt/TrEMBL
› Interpro
› Mammalian Phenotype Ontology
› IntAct
› KEGG
› Human Disease Ontology
› GO (Gene Ontology)
› Cell Ontology
› Chebi
› Cytomer
› Brenda Tissue Ontology
› PDB
› PubMed

Current database
• 2.5 million proteins
• 75 000 genes
• 98 000 transcripts
• 10 million connections on 144 000 pathways
• 1200 different species

Page 26

Part 4 – Database Schema - Medicel Infomodel

•Performing efficient searches across databases is a big problem because the database structures are not unified
•Answer -> Structuring of data into a unified schema
•Medicel Infomodel is the framework of the platform
•Explains how data is organized into tables and fields of the database
•A unified schema is indispensable for bringing different experimental data together
•Data is worth much more when it is compatible -> more likely to give rise to new knowledge

Page 27

Part 4 – Database Schema - Medicel Infomodel

•Schema to model biology
•Divided into biological data and metadata
•Biological systems consist of interacting components
•Interactions effect the change in the amounts of the components
•Amounts of the components give the state of the system
•Pathways model these systems

Page 28

Part 4 – Database Schema - Medicel Infomodel

•About 200 data tables constitute a relational database
•Tables define the attributes of objects and the relations of the objects to each other
•E.g. a gene can be annotated to a category and the category annotated to be part of another category
•Data in the tables is structured in rows and columns
› Table -> Object Class
› Row -> Object
› Column -> Property of Object
•Knowledge of the Infomodel is not required of every user
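The table/row/column mapping and the gene-to-category example can be sketched with a small in-memory relational database. This uses Python's standard `sqlite3` module; the table names, column names and the toy rows are all hypothetical and are not the actual Infomodel tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Table -> object class, row -> object, column -> property of the object.
    CREATE TABLE category (
        id        INTEGER PRIMARY KEY,
        name      TEXT NOT NULL,
        parent_id INTEGER REFERENCES category(id)  -- category is part of another category
    );
    CREATE TABLE gene (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL
    );
    CREATE TABLE gene_annotation (                 -- gene annotated to a category
        gene_id     INTEGER REFERENCES gene(id),
        category_id INTEGER REFERENCES category(id)
    );
    INSERT INTO category VALUES (1, 'category A', NULL);
    INSERT INTO category VALUES (2, 'category B', 1);
    INSERT INTO gene VALUES (1, 'gene X');
    INSERT INTO gene_annotation VALUES (1, 2);
""")

# Follow the relations: gene -> its category -> the parent category.
rows = conn.execute("""
    SELECT gene.name, child.name, parent.name
    FROM gene
    JOIN gene_annotation ON gene_annotation.gene_id = gene.id
    JOIN category AS child ON child.id = gene_annotation.category_id
    LEFT JOIN category AS parent ON parent.id = child.parent_id
""").fetchall()
```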

Page 29

Part 4 – Medicel Infomodel at high level

Component Data
System Data
State Data
Laboratory Data

(This is an abstract representation showing only a fraction of the Medicel Infomodel.)

Page 30

Part 4 – What is Component Data

•Definitions of quantifiable components (e.g. protein, genome, gene, macromolecular complex, organism)
› Name is not a real definition
› Structural facts are concrete definitions that
• can be detected in the laboratory
• can be compared by computer algorithms
› Component list (formula)
• implies molecular mass and charge
› Patterns
• Bonds between components
› Sequence
› Features
•Useful definitions can explain system behaviour
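As an illustration of how a component list (formula) implies a molecular mass, the sketch below sums standard atomic masses over the elements of a formula. The dictionary representation, the function name `molecular_mass` and the glucose example are assumptions for illustration; charge is left out of this sketch.

```python
# Standard atomic masses (rounded), in g/mol.
ATOMIC_MASS = {"H": 1.008, "C": 12.011, "N": 14.007, "O": 15.999}


def molecular_mass(component_list):
    """Sum atomic masses over a component list given as {element: count}."""
    return sum(ATOMIC_MASS[element] * count for element, count in component_list.items())


# Hypothetical example component: glucose, C6H12O6.
glucose = {"C": 6, "H": 12, "O": 6}
mass = molecular_mass(glucose)
```

Because the mass is computed from the structure rather than looked up by name, two records with different names but the same component list can be recognized as candidates for the same component.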

Page 31

Part 4 – Where does component data come from?

•Population of databases
› e.g. UniProt (proteins) and Ensembl (genomes)
› The key is to identify “reference objects” -> one unique name which may have many database references
•Own components given as
› Individuals, e.g. patients examined
› Populations, e.g. any group of individuals like ‘the Finns’
› Organisms, e.g. genetically engineered microbe strains

Page 32

Part 4 – What is system data

A system is an assemblage of inter-related elements comprising a unified whole (Wikipedia)

Page 33

Location
•a named real biological system that can be identified
•a unique location needs to be created for each distinct biologically interesting context
•Locations are related through common components
•for each Location, information is recorded about
› Environment
› Population
› Individual
› Organism
› Organ
› Tissue
› Cell type
› Cellular compartment

... an assemblage of inter-related elements comprising a unified whole

Page 34

Locations are related through common components

input
output

L[location1]: En[fermentor]

L[location2]: En[fermentor] Po[population] O[Saccharomyces cerevisiae] Ct[yeast_cell]

L[location3]: En[fermentor] ... Cc[nucleus]
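The location notation above can be sketched as a small record type, with the relation between locations computed from the components they share. The class `Location`, its field names and the component names in the toy data are hypothetical; only the context values for location1 and location2 come from the slide.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class Location:
    name: str
    environment: Optional[str] = None           # En[...]
    population: Optional[str] = None            # Po[...]
    organism: Optional[str] = None              # O[...]
    cell_type: Optional[str] = None             # Ct[...]
    cellular_compartment: Optional[str] = None  # Cc[...]
    components: FrozenSet[str] = frozenset()    # components present in this location


def common_components(a, b):
    """Components through which two locations are related."""
    return a.components & b.components


location1 = Location(
    "location1",
    environment="fermentor",
    components=frozenset({"glucose", "ethanol"}),   # toy components
)
location2 = Location(
    "location2",
    environment="fermentor",
    population="population",
    organism="Saccharomyces cerevisiae",
    cell_type="yeast_cell",
    components=frozenset({"glucose", "ATP"}),       # toy components
)

shared = common_components(location1, location2)
```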

Page 35

Components

•Various kinds of components
› Genes, Transcripts, Proteins, Compounds, Macromolecular complexes...

› but also, at a higher level: Cell types, Individuals, Populations, Environments

• not limited to molecular systems


Page 36

Interaction
•an event (or a process)
› typically, a biochemical event
•Components are connected to Interactions via Connections
› Different types of connections:
• substrate (is consumed)
• product (is produced)
• control (is neither consumed nor produced, but affects)
• outcome (is neither consumed nor produced, but is affected)


Page 37

Example: Transcription

gene

transcript

transcription

connections
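The transcription example can be sketched as an interaction with typed connections to its components. The class and enum names are hypothetical, and the assignment of connection types (gene as control, transcript as product) is an assumption; the slide only shows that gene and transcript are connected to transcription.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Tuple


class ConnectionType(Enum):
    SUBSTRATE = "substrate"  # is consumed
    PRODUCT = "product"      # is produced
    CONTROL = "control"      # neither consumed nor produced, but affects
    OUTCOME = "outcome"      # neither consumed nor produced, but is affected


@dataclass
class Interaction:
    name: str
    connections: List[Tuple[str, ConnectionType]] = field(default_factory=list)

    def connect(self, component, kind):
        """Attach a component to this interaction with a given connection type."""
        self.connections.append((component, kind))

    def components(self, kind):
        """List the components connected with a given connection type."""
        return [c for c, k in self.connections if k is kind]


transcription = Interaction("transcription")
transcription.connect("gene", ConnectionType.CONTROL)        # assumed type
transcription.connect("transcript", ConnectionType.PRODUCT)  # assumed type

products = transcription.components(ConnectionType.PRODUCT)
```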

Page 38

Pathway

•a network model of one location
› a container for the components and interactions
•there can be multiple pathways for one location
› at different abstraction levels
› alternative models from different origins, creators, evidence


Page 39

Part 4 – What is state data?

•State data quantitatively describes the current state of a location (system)
› May quantify something about the location itself or about a component in the location
•Non-state data can be derived from state data
› E.g. p-values are quantitative but not state data

Page 40

Part 4 – Infomodel for state data

(Figure: class diagram of the Infomodel for state data.)

state variable: variable, unit, component, location, pathway
state data point: sample, index, time of observation, free text description, value, x-coordinate, y-coordinate, z-coordinate
Multiplicities shown in the diagram: 1, 0..1, 0..*
Diagram regions: Quantitative information, Biological information, Storing information

Several data points per state variable – one state variable per data point
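The relation "several data points per state variable, one state variable per data point" can be sketched with two record types. The class names, which fields are optional, and the toy values are assumptions; the field names follow the labels readable in the diagram.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class StateVariable:
    variable: str                     # what is quantified, e.g. "concentration"
    unit: Optional[str] = None
    component: Optional[str] = None   # biological context of the variable
    location: Optional[str] = None
    pathway: Optional[str] = None
    # Back-references are excluded from repr/eq to avoid circular recursion.
    data_points: List["StateDataPoint"] = field(
        default_factory=list, repr=False, compare=False
    )

    def add_point(self, value, **details):
        """Create a data point that belongs to exactly this state variable."""
        point = StateDataPoint(state_variable=self, value=value, **details)
        self.data_points.append(point)
        return point


@dataclass
class StateDataPoint:
    state_variable: StateVariable     # exactly one state variable per data point
    value: float
    sample: Optional[str] = None
    index: Optional[int] = None
    time_of_observation: Optional[str] = None
    free_text_description: Optional[str] = None
    x_coordinate: Optional[float] = None
    y_coordinate: Optional[float] = None
    z_coordinate: Optional[float] = None


# Toy usage with hypothetical values.
glucose_level = StateVariable(
    "concentration", unit="mmol/l", component="glucose", location="location1"
)
first = glucose_level.add_point(1.0, sample="sample 1", index=0)
second = glucose_level.add_point(2.0, sample="sample 2", index=1)
```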