Systems biology - Bioinformatics on complete biological systems

Lars Juhl Jensen

Systems biology

Bioinformatics on complete biological systems

can a biologist fix a radio?

Lazebnik, Biochemistry, 2004

one gene

one postdoc

knockout phenotype

name the gene

Lazebnik, Biochemistry, 2004

all aspects

one gene

high-throughput biology

one technology

one lab

all genes

one aspect

systems biology

complete systems

all aspects

all genes

systems-level properties

two subfields

mathematical modeling

small systems

data integration

large systems

mathematical modeling

small systems

Chen, Mol. Biol. Cell, 2004

many equations

Chen, Mol. Biol. Cell, 2004

simulation

Chen, Mol. Biol. Cell, 2004

many parameters

Chen, Mol. Biol. Cell, 2004

requires detailed knowledge
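
To make "many equations, many parameters, simulation" concrete, here is a minimal sketch of what simulating a single rate equation looks like. It is not the Chen et al. model: the equation dx/dt = k_syn - k_deg * x, the parameter names, and the forward-Euler integrator are all assumptions chosen for illustration. A real cell-cycle model couples many such equations, and each one brings its own rate constants that have to be known or fitted.

```python
def simulate(k_syn, k_deg, x0, dt, steps):
    """Integrate dx/dt = k_syn - k_deg * x with forward Euler.

    k_syn and k_deg are hypothetical synthesis and degradation rate
    constants; x0 is the starting concentration.
    """
    x = x0
    trajectory = [x]
    for _ in range(steps):
        # One Euler step: move along the current slope for a time dt.
        x += dt * (k_syn - k_deg * x)
        trajectory.append(x)
    return trajectory


if __name__ == "__main__":
    traj = simulate(k_syn=1.0, k_deg=0.5, x0=0.0, dt=0.1, steps=1000)
    # The concentration approaches the steady state k_syn / k_deg.
    print(round(traj[-1], 6))
```

Even this one-variable toy needs two rate constants and a starting value, which is why the approach is limited to small, well-characterized systems.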

data integration

association networks

guilt by association

STRING

~2.6 million proteins

Szklarczyk, Franceschini et al., Nucleic Acids Research, 2011

genomic context

gene fusion

Korbel et al., Nature Biotechnology, 2004

operons

Korbel et al., Nature Biotechnology, 2004

bidirectional promoters

Korbel et al., Nature Biotechnology, 2004

phylogenetic profiles

Korbel et al., Nature Biotechnology, 2004
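
The idea behind phylogenetic profiles can be sketched in a few lines: each gene is described by the genomes in which it is present or absent, and genes with similar patterns are candidates for a functional association. The gene names, the five-genome profiles, and the choice of Jaccard similarity below are assumptions for illustration only, not the scoring used by STRING.

```python
def jaccard(profile_a, profile_b):
    """Jaccard similarity between two presence/absence profiles."""
    both = sum(1 for a, b in zip(profile_a, profile_b) if a and b)
    either = sum(1 for a, b in zip(profile_a, profile_b) if a or b)
    return both / either if either else 0.0


# Hypothetical profiles: 1 = gene present in that genome, 0 = absent.
PROFILES = {
    "geneA": (1, 1, 0, 1, 0),
    "geneB": (1, 1, 0, 1, 0),
    "geneC": (0, 1, 1, 0, 1),
}

if __name__ == "__main__":
    print(jaccard(PROFILES["geneA"], PROFILES["geneB"]))  # 1.0
    print(jaccard(PROFILES["geneA"], PROFILES["geneC"]))  # 0.2
```

Here geneA and geneB always occur together, so they score 1.0, while geneA and geneC share only one genome out of five.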

a real example

Cell

Cellulosomes

Cellulose

experimental data

gene coexpression

protein interactions

Jensen & Bork, Science, 2008

curated knowledge

complexes

pathways

Letunic & Bork, Trends in Biochemical Sciences, 2008

many databases

different formats

different identifiers

variable quality

not comparable

hard work

quality scores

von Mering et al., Nucleic Acids Research, 2005

calibrate vs. gold standard
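
One way to read "calibrate vs. gold standard" is sketched below: bin the raw scores from one evidence source and, for each bin, measure what fraction of the pairs are confirmed by a trusted reference set. That fraction can then serve as a comparable quality score across sources. The function name, the bin edges, and the toy pairs are hypothetical, and the actual procedure in von Mering et al. may differ in detail.

```python
from bisect import bisect_right


def calibrate(scored_pairs, gold_positive, bin_edges):
    """Return, per raw-score bin, the fraction of pairs in the gold standard.

    scored_pairs: iterable of ((a, b), raw_score)
    gold_positive: set of frozensets holding trusted pairs
    bin_edges: sorted thresholds separating the bins
    """
    totals = [0] * (len(bin_edges) + 1)
    hits = [0] * (len(bin_edges) + 1)
    for pair, raw in scored_pairs:
        i = bisect_right(bin_edges, raw)  # index of the bin for this score
        totals[i] += 1
        if frozenset(pair) in gold_positive:
            hits[i] += 1
    # Bins without any pairs get None rather than a made-up value.
    return [h / t if t else None for h, t in zip(hits, totals)]


if __name__ == "__main__":
    pairs = [(("a", "b"), 0.9), (("a", "c"), 0.8),
             (("b", "c"), 0.2), (("c", "d"), 0.1)]
    gold = {frozenset(("a", "b"))}
    print(calibrate(pairs, gold, bin_edges=[0.5]))  # [0.0, 0.5]
```

Because the output is a fraction of confirmed pairs, scores from sources with entirely different raw scales end up on the same footing.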

missing most of the data

text mining

>10 km

too much to read

computer

as smart as a dog

teach it specific tricks

named entity recognition

comprehensive lexicon

cyclin dependent kinase 1

CDK1

CDC2

flexible matching

spaces and hyphens

cyclin dependent kinase 1

cyclin-dependent kinase 1

orthographic variation

CDC2

hCdc2

“black list”

SDS
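
The named entity recognition steps above (lexicon, flexible matching, orthographic variation, black list) can be sketched as a small dictionary lookup. Everything here is a simplified assumption: the three-entry lexicon, the rule that only a leading "h" is treated as a species prefix, and the function names are illustrative and not the actual tagger.

```python
import re

# Hypothetical mini-lexicon: name or synonym -> canonical identifier.
LEXICON = {
    "cyclin dependent kinase 1": "CDK1",
    "CDK1": "CDK1",
    "CDC2": "CDK1",
    "SDS": "SDS",  # assumed lexicon entry that collides with a common reagent name
}

# Names that are blocked even though they appear in the lexicon.
BLACK_LIST = {"SDS"}


def normalize(name):
    """Lowercase and drop spaces and hyphens so spelling variants compare equal."""
    return re.sub(r"[\s\-]+", "", name.lower())


INDEX = {normalize(name): ident for name, ident in LEXICON.items()}
BLOCKED = {normalize(name) for name in BLACK_LIST}


def lookup(mention):
    """Return the canonical identifier for a mention, or None if unrecognized."""
    key = normalize(mention)
    if key in BLOCKED:
        return None
    if key in INDEX:
        return INDEX[key]
    # Orthographic variation: allow a lowercase "h" prefix as in "hCdc2".
    if len(mention) > 1 and mention[0] == "h" and mention[1].isupper():
        return INDEX.get(normalize(mention[1:]))
    return None


if __name__ == "__main__":
    for text in ["cyclin-dependent kinase 1", "hCdc2", "SDS"]:
        print(text, "->", lookup(text))
```

Normalizing both the lexicon and the text is what lets one entry cover "cyclin dependent kinase 1" and "cyclin-dependent kinase 1" without listing every variant.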

information extraction

count co-mentioning

within documents

within paragraphs

within sentences

scoring scheme
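
Counting co-mentions at the three levels listed above can be sketched as follows. The input is assumed to be already tagged: each document is a list of paragraphs, each paragraph a list of sentences, and each sentence a set of recognized entity names. The weights are hypothetical placeholders that simply reward closer co-mentions more; the real scoring scheme is not given in the slides.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical weights: a shared sentence counts more than a shared document.
WEIGHTS = {"document": 1.0, "paragraph": 2.0, "sentence": 4.0}


def comention_scores(corpus):
    """Sum weighted co-mention counts for every pair of entities."""
    scores = defaultdict(float)
    for document in corpus:
        doc_entities = set()
        for paragraph in document:
            par_entities = set()
            for sentence in paragraph:
                for pair in combinations(sorted(sentence), 2):
                    scores[pair] += WEIGHTS["sentence"]
                par_entities |= sentence
            for pair in combinations(sorted(par_entities), 2):
                scores[pair] += WEIGHTS["paragraph"]
            doc_entities |= par_entities
        for pair in combinations(sorted(doc_entities), 2):
            scores[pair] += WEIGHTS["document"]
    return dict(scores)


if __name__ == "__main__":
    # One toy document: two paragraphs, three sentences in total.
    corpus = [[[{"CDK1", "CCNB1"}, {"CDK1"}], [{"WEE1"}]]]
    print(comention_scores(corpus))
```

In the toy document, CCNB1 and CDK1 share a sentence, a paragraph, and the document (4 + 2 + 1 = 7), while WEE1 is linked to each of them only at the document level (1).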

corpora

~22 million abstracts

no access

~4 million full-text articles

augmented browsing

Reflect

browser add-on

real-time text mining

Pafilis, O’Donoghue, Jensen et al., Nature Biotechnology, 2009

O’Donoghue et al., Journal of Web Semantics, 2010

localization and disease

small molecules

proteins

compartments

tissues

diseases

organisms

environments

suite of web resources

common backend database

jensenlab.org

text mining

curated knowledge

experimental data

computational predictions

quality scores

web-centric databases

DISEASES

visualization

COMPARTMENTS

compartments.jensenlab.org

TISSUES

tissues.jensenlab.org

project onto networks

Szklarczyk, Franceschini et al., Nucleic Acids Research, 2011

compartments.jensenlab.org

tissues.jensenlab.org

diseases.jensenlab.org

summary

bioinformatics

more than alignment

data/text mining

save you much time

Acknowledgments

Protein networks

Christian von Mering

Damian Szklarczyk

Michael Kuhn

Manuel Stark

Samuel Chaffron

Chris Creevey

Jean Muller

Tobias Doerks

Philippe Julien

Alexander Roth

Milan Simonovic

Jan Korbel

Berend Snel

Martijn Huynen

Peer Bork

Literature mining

Sune Frankild

Evangelos Pafilis

Janos Binder

Kalliopi Tsafou

Alberto Santos

Heiko Horn

Michael Kuhn

Nigel Brown

Reinhardt Schneider

Sean O’Donoghue
