Statistical Bioinformatics • Genomics • Transcriptomics • Proteomics • Systems Biology


Page 1: Statistical Bioinformatics Genomics Transcriptomics Proteomics Systems Biology

Statistical Bioinformatics

• Genomics

• Transcriptomics

• Proteomics

• Systems Biology


Multiple Sequence Alignment (MSA)


Multiple Sequence Alignments (MSA):


Some past forces shaping MSAs

• Divergence of sequences by speciation and nucleotide substitution (Phylogenetics).

• Horizontal gene transfer (recombination), especially in bacteria and viruses.


TOPALi v.1: Recombination detection

Frank Wright, Iain Milne & Dirk Husmeier


TOPALi applied to Roseburia and Eubacterium sequences


Some past forces shaping MSAs

• Divergence of sequences by speciation and nucleotide substitution (Phylogenetics).

• Horizontal gene transfer (recombination), especially in bacteria and viruses.

• Selective pressure acting on functional domains.


TOPALi v2 Future plans

• Detect genomic regions under selective pressure (functional domains in proteins)

• Methodology development: combined prediction of breakpoints due to recombination and evolutionary rate change.

• Improved phylogenetic analysis

• Investigate use of UK GRID computational resources for faster analyses


Statistical Bioinformatics

• Genomics

• Transcriptomics

• Proteomics

• Systems Biology


Genes differentially expressed between two conditions

– Affymetrix microarray, mouse liver experiment

– Low fat diet vs high fat diet (6 per group)

– Plot of log-fold change vs. average log intensity.

– Points far away from the horizontal line seem “differentially expressed”.

– Which are significant?
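
The plot described above can be reproduced per gene from the raw intensities. The following is a minimal sketch, assuming log2-scale fold changes and made-up intensity values; the function name `ma_values` and the numbers are illustrative and do not come from the mouse liver experiment.

```python
import math
from statistics import mean

def ma_values(group_a, group_b):
    """Return (A, M) for one gene.

    A is the average log2 intensity across both groups,
    M is the log2 fold change of group_b relative to group_a.
    """
    log_a = [math.log2(x) for x in group_a]
    log_b = [math.log2(x) for x in group_b]
    m = mean(log_b) - mean(log_a)          # log-fold change (vertical axis)
    a = 0.5 * (mean(log_a) + mean(log_b))  # average log intensity (horizontal axis)
    return a, m

# Hypothetical intensities for a single gene, 6 arrays per diet group.
low_fat = [100.0, 120.0, 90.0, 110.0, 105.0, 95.0]
high_fat = [210.0, 190.0, 220.0, 200.0, 180.0, 205.0]

a, m = ma_values(low_fat, high_fat)
print(f"A = {a:.2f}, M = {m:.2f}")
```

A gene with M far from zero lies far from the horizontal line; whether that distance is significant depends on the gene's variability, which this sketch does not assess.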


• Statistical methods (SAM, Limma, …) help to detect significant genes

• BUT: Many methods assume that the variances in both groups are the same

• If this is not the case:

– Algorithms might give wrong answers

– The definition of “differential expression” becomes more difficult

[Figure: Which group gives 'higher' values?]
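
To illustrate why the equal-variance assumption matters, here is a small sketch contrasting the pooled (Student) t statistic with Welch's version, which does not assume equal variances. This is a textbook comparison, not the method used by SAM or Limma, and the data values are invented for illustration.

```python
import math
from statistics import mean, variance

def pooled_t(x, y):
    """Student's t statistic, assuming equal variances in both groups."""
    nx, ny = len(x), len(y)
    sp2 = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / (nx + ny - 2)
    return (mean(x) - mean(y)) / math.sqrt(sp2 * (1 / nx + 1 / ny))

def welch_t(x, y):
    """Welch's t statistic and its approximate degrees of freedom."""
    nx, ny = len(x), len(y)
    vx, vy = variance(x) / nx, variance(y) / ny
    t = (mean(x) - mean(y)) / math.sqrt(vx + vy)
    # Welch-Satterthwaite approximation to the degrees of freedom.
    df = (vx + vy) ** 2 / (vx ** 2 / (nx - 1) + vy ** 2 / (ny - 1))
    return t, df

# Hypothetical log expression values: group x is tight, group y is spread out.
x = [5.1, 5.0, 4.9, 5.2, 5.0, 4.8]
y = [6.0, 8.5, 4.0, 9.0, 5.5, 7.0]

t_pooled = pooled_t(x, y)
t_welch, df_welch = welch_t(x, y)
print(f"pooled t = {t_pooled:.3f} on {len(x) + len(y) - 2} df")
print(f"Welch  t = {t_welch:.3f} on {df_welch:.2f} df")
```

With equal group sizes the two statistics coincide, but Welch's degrees of freedom drop when the variances differ, so the same statistic yields a larger p-value. With unequal group sizes the statistics themselves differ as well.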


[Figure: Check for change in variance. x-axis: expected absolute log(F); y-axis: observed absolute log(F)]
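
A plot of this kind could be built along the following lines. This is a sketch under stated assumptions, not the procedure behind the original figure: F is taken to be the per-gene ratio of sample variances between the two groups, the data are assumed normal, and the expected values are estimated by simulation under equal variances. All names and data are hypothetical.

```python
import math
import random
from statistics import variance

def abs_log_f(x, y):
    """Absolute log of the ratio of sample variances for one gene."""
    return abs(math.log(variance(x) / variance(y)))

def expected_abs_log_f(n_genes, n_per_group, n_sim=200, seed=0):
    """Monte Carlo estimate of the sorted |log F| values expected
    when both groups have the same variance (normal data assumed)."""
    rng = random.Random(seed)
    totals = [0.0] * n_genes
    for _ in range(n_sim):
        sims = sorted(
            abs_log_f([rng.gauss(0, 1) for _ in range(n_per_group)],
                      [rng.gauss(0, 1) for _ in range(n_per_group)])
            for _ in range(n_genes)
        )
        totals = [t + s for t, s in zip(totals, sims)]
    return [t / n_sim for t in totals]

# Simulated "observed" data: 50 genes, 6 samples per group,
# with the first 10 genes given a larger spread in the second group.
rng = random.Random(42)
n_genes, n = 50, 6
observed = []
for g in range(n_genes):
    sd_b = 4.0 if g < 10 else 1.0
    a = [rng.gauss(0, 1) for _ in range(n)]
    b = [rng.gauss(0, sd_b) for _ in range(n)]
    observed.append(abs_log_f(a, b))
observed.sort()

expected = expected_abs_log_f(n_genes, n)
for e, o in list(zip(expected, observed))[-5:]:
    print(f"expected {e:.2f}  observed {o:.2f}")
```

Plotting `observed` against `expected` gives points near the diagonal when variances are equal; points well above the diagonal at the upper end suggest genes whose variance changes between groups.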


Claus Mayer (BioSS)

• More complex statistical tests for detecting differential gene expression.

• Situations where standard assumptions are violated.

• Allows for different variance-covariance structures in the two populations.


Statistical Bioinformatics

• Genomics

• Transcriptomics

• Proteomics

• Systems Biology


Proteomics: 2-D Gels

[Figure: gel 1 and gel 2]

How to compare gels 1 and 2?


Chris Glasbey: Nonlinear Warping

John Gustafsson, Chalmers University, Sweden

WARP


2-D Gel Comparison

Two gels superimposed (in different colours)


Proteomics: 2-D Gel Interpretation

• Graham Horgan

• Identify spots which differ between treatments using variance and covariance information from other spots (differentially expressed proteins)

• Assessment of associations between spot densities and physiological variables.


Statistical Bioinformatics

• Genomics

• Transcriptomics

• Proteomics

• Systems Biology


Detect active pathways in a “known” network

• Network of protein-protein and protein-DNA interactions “known” from the literature

• Gene expression profiling for different conditions

– Bacterial strains: promoting vs. preventing inflammation

– Mice on a low-fat vs. high-fat diet

• Can we identify different pathways associated with these conditions?

• We need a robust method

– Expression data: noisy, missing values

– Post-translational modifications


Cytokine Network

• Collaboration with SCGTI

• Interferon Pathway

– Cytokines

– Pivotal role in modulating the innate and adaptive mammalian immune system

• Network of protein-protein and protein-DNA interactions from the literature

• Two gene expression time series from bone marrow-derived macrophages in mice

– Infected with cytomegalovirus

– Infected and treated with IFN-gamma


[Figure: network diagram. Nodes: casp8, bak, cybb, casp9, cdkn1a, ccl5, b2m, bcl-xl, bcl2, c2ta, casp1, casp3, casp7, fcer2a, fkbp4, g1p2, hist4h4, hla-a, hla-b, hla-c, hla-dra, ifna11, ifna1, hla-drb, ifna14, ifna4, ii, il12a, il12b, il1b, irf1, irf5, irf4, irf3, irf7, isgf3g, itgam, lcsbp1, lfnb, oas1, prkr, psmb10, psme1, psmb9, psmb8, psme2, sfpi1, stat1, stat2, stat6, tap1, tap2, tnfrsf6, tnfsf6, ctss, irf2]


[Figure: the same network diagram. Subnetwork 1 = Infected]


[Figure: the same network diagram. Subnetwork 2 = Infected+treated]


Reverse Engineering of Regulatory Networks

• Can we learn the network structure from postgenomic data themselves?

• Statistical methods to distinguish between

– Direct correlations

– Indirect correlations

• Challenge: Distinguish between

– Correlations

– Causal interactions

• Breaking symmetries with active interventions:

– Gene knockouts (VIGS, RNAi)
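
One standard way to separate direct from indirect correlations is the partial correlation, which measures the association between two variables after removing the linear effect of a third. The sketch below is a generic illustration on simulated data, not the method used in the talk; the chain x → z → y and all names are assumptions made for the example.

```python
import math
import random

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def partial_corr(x, y, z):
    """First-order partial correlation of x and y given z."""
    rxy, rxz, ryz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))

# Simulated chain x -> z -> y: x influences y only through z.
rng = random.Random(1)
x = [rng.gauss(0, 1) for _ in range(2000)]
z = [a + rng.gauss(0, 0.5) for a in x]
y = [b + rng.gauss(0, 0.5) for b in z]

print(f"corr(x, y)             = {pearson(x, y):.3f}")
print(f"partial corr(x, y | z) = {partial_corr(x, y, z):.3f}")
```

In this simulation x and y are strongly correlated, yet the partial correlation given z is close to zero, which marks the x-y association as indirect. Partial correlations alone do not give the direction of an edge; that is where interventions such as knockouts help.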


Evaluation: Raf signalling pathway

• Cellular signalling network of 11 phosphorylated proteins and phospholipids in human immune system cells

• Laboratory data from cytometry experiments

– Down-sampled to 100 measurements

– Sample size indicative of microarray experiments

• Two types of experiments:

– Passive observations

– Active interventions (gene knockouts)

• Literature: “gold-standard” network
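
Comparing a learned network with a gold-standard network usually comes down to counting matching and mismatching edges. The following is a minimal sketch of that bookkeeping, assuming directed edges stored as (source, target) pairs; the node names are placeholders and not the proteins of the Raf pathway.

```python
def edge_counts(predicted, gold):
    """Count true positive, false positive and false negative directed edges."""
    predicted, gold = set(predicted), set(gold)
    tp = len(predicted & gold)   # predicted edges that are in the gold standard
    fp = len(predicted - gold)   # predicted edges that are not
    fn = len(gold - predicted)   # gold-standard edges that were missed
    return tp, fp, fn

# Hypothetical gold-standard and predicted edge sets.
gold = {("A", "B"), ("B", "C"), ("C", "D")}
predicted = {("A", "B"), ("B", "C"), ("A", "D")}

tp, fp, fn = edge_counts(predicted, gold)
print(f"TP = {tp}, FP = {fp}, FN = {fn}")
```

Treating edges as ordered pairs penalises a reversed edge as both a false positive and a false negative; an undirected comparison would instead store each edge as a `frozenset` of its two nodes.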
