Prediction of protein localization and membrane protein topology

Gunnar von Heijne

Department of Biochemistry and Biophysics

Stockholm Bioinformatics Center

Stockholm University

Stockholm Bioinformatics Center

www.sbc.su.se

Protein localization

Protein sorting in a eukaryotic cell

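The 'canonical' signal peptide

[Figure: signal peptide divided into n-, h- and c-regions, with positions -3 and -1 marked relative to the cleavage site]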

n-region: positively charged

h-region: hydrophobic

c-region: more polar, small residues in -1, -3
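The (-3, -1) requirement can be checked mechanically for a candidate cleavage site. The sketch below is a toy heuristic, not SignalP: it assumes that A, G, S, C and T count as "small", takes a hypothetical n-region boundary as input, and uses a made-up, signal-peptide-like sequence; the function names are illustrative.

```python
SMALL = set("AGSCT")    # assumption: residues counted as "small" at -1 and -3
POSITIVE = set("KR")
NEGATIVE = set("DE")


def minus3_minus1_ok(seq, cleave):
    """Check the (-3, -1) rule for a candidate cleavage site.

    `cleave` is the 0-based index of the first mature residue, so position -1
    is seq[cleave - 1] and position -3 is seq[cleave - 3].
    """
    if cleave < 3 or cleave > len(seq):
        return False
    return seq[cleave - 1] in SMALL and seq[cleave - 3] in SMALL


def n_region_net_charge(seq, n_end):
    """Net charge (K+R minus D+E) of a hypothetical n-region seq[:n_end]."""
    region = seq[:n_end]
    return sum(aa in POSITIVE for aa in region) - sum(aa in NEGATIVE for aa in region)


seq = "MKKTAIAIAVALAGFATVAQAAPK"  # made-up signal-peptide-like sequence
print(minus3_minus1_ok(seq, 21))     # True: A at -3 and A at -1
print(n_region_net_charge(seq, 3))   # 2
```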

mTPs are rich in R & K and can form amphiphilic helices

(Abe et al., Cell 100:551)

mTP bound to Tom20
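Amphiphilicity is often quantified with a hydrophobic moment: each residue's hydrophobicity is treated as a vector pointing at its angle around an ideal α-helix (100° per residue), and the vectors are summed. A minimal sketch, assuming the Kyte-Doolittle scale (the slide names no scale) and a hypothetical peptide whose Leu and Arg residues fall on opposite helix faces:

```python
import math

# Kyte-Doolittle hydropathy scale (an assumed choice of scale)
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydrophobic_moment(seq, angle_deg=100.0):
    """Length of the summed hydrophobicity vectors for an ideal helix."""
    theta = math.radians(angle_deg)
    x = sum(KD[aa] * math.cos(i * theta) for i, aa in enumerate(seq))
    y = sum(KD[aa] * math.sin(i * theta) for i, aa in enumerate(seq))
    return math.hypot(x, y)


def fraction_rk(seq):
    """Fraction of residues that are Arg or Lys."""
    return sum(aa in "RK" for aa in seq) / len(seq)


# Hypothetical peptide: Leu on one helix face, Arg on the opposite face
peptide = "LRRLLRRLLRLLRRLLRR"
print(round(hydrophobic_moment(peptide), 1), fraction_rk(peptide))
```

A uniform peptide such as 18 leucines gives a moment near zero, because after 18 residues the 100° steps complete exactly five turns and the vectors cancel.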

Typical chloroplast transit peptide

[Figure: cTP starting with MA-, with regions labelled "no G,P,K,R", "no D,E", "high S,T" and "high R", and the motif "IV X A A" next to the mature protein]
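The compositional trends in the figure can be turned into simple features. This is only an illustration: the 60-residue window is an assumed value, the function name is hypothetical, and ChloroP itself relies on a trained network rather than hand-written rules.

```python
def ctp_features(seq, window=60):
    """Composition features of the N-terminal region of a putative cTP.

    `window` (number of N-terminal residues inspected) is an assumed value.
    """
    region = seq[:window]
    if not region:
        raise ValueError("empty sequence")
    n = len(region)
    return {
        "starts_MA": region.startswith("MA"),
        "frac_ST": sum(aa in "ST" for aa in region) / n,   # expected to be high
        "n_acidic": sum(aa in "DE" for aa in region),      # expected to be ~0
        "frac_R": region.count("R") / n,
    }


print(ctp_features("MASSTSRSSAR"))  # made-up N-terminal fragment
```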

A simple artificial neural network (ANN)

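[Figure: input layer with one unit for each of A, C, G, T at every window position; the window A A G is encoded as 1 0 0 0 | 1 0 0 0 | 0 0 1 0; the output layer has one unit for "ACG" and one for "not ACG"]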
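The sparse input encoding and the training idea can be sketched in a few lines. This is a simplification, assuming a single-layer perceptron with one output unit (1 = "ACG", 0 = "not ACG") instead of the two output units drawn; the perceptron learning rule stands in for whatever training procedure a real predictor uses, and the function names are hypothetical.

```python
from itertools import product

ALPHABET = "ACGT"


def one_hot(window):
    """Sparse encoding: four input units per position, exactly one set to 1."""
    vec = []
    for base in window:
        vec.extend(1 if base == b else 0 for b in ALPHABET)
    return vec


def train_perceptron(examples, epochs=500):
    """Perceptron learning rule; stops after the first error-free epoch."""
    w = [0.0] * len(examples[0][0])
    b = 0.0
    for _ in range(epochs):
        errors = 0
        for x, target in examples:
            out = 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0
            if out != target:
                delta = target - out
                w = [wi + delta * xi for wi, xi in zip(w, x)]
                b += delta
                errors += 1
        if errors == 0:
            break
    return w, b


def predict(w, b, window):
    return 1 if sum(wi * xi for wi, xi in zip(w, one_hot(window))) + b > 0 else 0


# Training set: all 64 three-letter windows, positive only for "ACG"
kmers = ["".join(p) for p in product(ALPHABET, repeat=3)]
data = [(one_hot(k), 1 if k == "ACG" else 0) for k in kmers]
weights, bias = train_perceptron(data)
print(one_hot("AAG"))  # [1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0]
```

Because "ACG" versus everything else is linearly separable in this encoding, the perceptron rule is guaranteed to converge; harder patterns need hidden layers.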

Artificial neural networks: a summary

- a high-quality dataset (positive and negative examples)

- an ANN architecture (can be optimized)

- all internal parameters in the ANN are systematically optimized during a training session

- evaluate the predictive performance using cross-validation
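A minimal sketch of k-fold cross-validation, assuming a simple random split: each example is held out exactly once while the model is trained on the remaining folds. The function name and the `seed` parameter are illustrative.

```python
import random


def k_fold_indices(n_items, k, seed=0):
    """Yield (train, test) index lists; each item appears in exactly one test fold."""
    idx = list(range(n_items))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test


for train, test in k_fold_indices(10, 3):
    print(len(train), len(test))
```

For protein sequences, a purely random split can put close homologs in both the training and the test folds and so overestimate performance; homology-reduced datasets or similarity-aware splits are commonly used instead.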

ChloroP (Prot.Sci. 8:978)

[Figure: ChloroP network score (0-1.0) and MEME score (-30 to 10) plotted against residue position (0-100)]

TargetP - a four-state SP/mTP/cTP/other predictor

(JMB 300:1105)

TargetP sensitivity/specificity

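Class    sens   spec
SP       .91    .96
mTP      .82    .90
cTP      .85    .69
other    .85    .78

sens = tp/(tp+fn), spec = tp/(tp+fp) (spec as defined here is the fraction of predictions for a class that are correct, also known as precision)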
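Per-class figures like these can be computed from a confusion matrix of true versus predicted classes. A minimal sketch; the matrix in the usage lines is hypothetical, and a class that is never predicted would make spec undefined (division by zero).

```python
CLASSES = ["SP", "mTP", "cTP", "other"]


def per_class_sens_spec(confusion, classes=CLASSES):
    """confusion[i][j] = number of proteins of true class i predicted as class j.

    Returns {class: (sens, spec)} with sens = tp/(tp+fn) and spec = tp/(tp+fp).
    """
    k = len(confusion)
    result = {}
    for c in range(k):
        tp = confusion[c][c]
        fn = sum(confusion[c]) - tp                        # true c, predicted otherwise
        fp = sum(confusion[r][c] for r in range(k)) - tp   # predicted c, true class differs
        result[classes[c]] = (tp / (tp + fn), tp / (tp + fp))
    return result


# Hypothetical counts (rows: true class, columns: predicted class)
example = [
    [9, 1, 0, 0],
    [0, 8, 1, 1],
    [0, 0, 5, 0],
    [1, 0, 2, 7],
]
print(per_class_sens_spec(example)["cTP"])  # (1.0, 0.625)
```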

Other ways to predict localization

- amino acid composition

- sequence homology

- domain structure

- phylogenetic profiles

- expression profiles

Popular prediction programs

SignalP (NN, HMM)

ChloroP

TargetP

LipoP

-------

MitoProt

PSORT

www.cbs.dtu.dk

Membrane protein topology

A simulated lipid bilayer (Grubmüller et al.)

Only two basic structures (Quart.Rev.Biophys. 32:285)

Helix bundle and β-barrel

Most membrane proteins (MPs) are synthesized at the ER

The basic model (courtesy Bill Skach)

Topology prediction

TM helix lengths are typically 20-30 residues

(Bowie, JMB 272:780)
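A hydropathy scan for helix-length hydrophobic stretches can be sketched with a sliding-window average. Assumptions: the Kyte-Doolittle scale, a 19-residue window and a 1.6 threshold (values commonly associated with Kyte and Doolittle's hydropathy analysis); segments are reported as 0-based (start, end) ranges with exclusive ends, and the function names are hypothetical.

```python
# Kyte-Doolittle hydropathy scale (an assumed choice of scale)
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq, window=19):
    """Mean hydropathy of each full window; entry i covers seq[i:i + window]."""
    vals = [KD[aa] for aa in seq]
    return [sum(vals[i:i + window]) / window for i in range(len(vals) - window + 1)]


def candidate_tm_segments(seq, window=19, threshold=1.6):
    """Merge runs of above-threshold windows into residue ranges."""
    prof = hydropathy_profile(seq, window)
    segments, start = [], None
    for i, h in enumerate(prof):
        if h >= threshold and start is None:
            start = i
        elif h < threshold and start is not None:
            segments.append((start, i - 1 + window))  # last window ended at i - 2 + window
            start = None
    if start is not None:
        segments.append((start, len(prof) - 1 + window))
    return segments


print(candidate_tm_segments("D" * 20 + "L" * 21 + "D" * 20))
```

The reported range is wider than the 21-residue hydrophobic stretch itself, because windows that only partly overlap it still pass the threshold; predictors refine segment ends with additional rules.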

Trp & Tyr are enriched in the region near the lipid headgroups

(Prot.Sci. 6:808; 7:2026)

Loops tend to be short (Tusnady & Simon, JMB 283:489)

The 'positive inside' rule (EMBO J. 5:3021; EJB 174:671, 205:1207; FEBS Lett. 282:41)

[Figure: membrane protein with N- and C-termini; positive charges (+ + +) on the "in" side]

K+R content on each side of the membrane:

Membrane              in (% KR)   out (% KR)
Bacterial IM          16          4
Eukaryotic PM         17          7
Thylakoid membrane    13          5
Mitochondrial IM      10          3
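A minimal sketch of applying the rule to a single protein, assuming the loops have already been delimited (for example from hydrophobic segments) and are listed in N- to C-terminal order, so that loops 0, 2, 4, ... lie on the same side as the N-terminus. Function names are hypothetical and ties are resolved arbitrarily.

```python
def kr_percent(segment):
    """Percentage of Lys + Arg in a sequence segment."""
    return 100.0 * sum(aa in "KR" for aa in segment) / len(segment)


def orient_by_positive_inside(loops):
    """Return 'N-in' if the N-terminal side carries at least as many K+R, else 'N-out'."""
    n_side = sum(sum(aa in "KR" for aa in loop) for loop in loops[0::2])
    other_side = sum(sum(aa in "KR" for aa in loop) for loop in loops[1::2])
    return "N-in" if n_side >= other_side else "N-out"


print(orient_by_positive_inside(["MKKR", "SDGT", "RKAK"]))  # N-in
print(kr_percent("KRAA"))                                   # 50.0
```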

The positive-inside rule applies to all organisms

(Nilsson, Persson & von Heijne, submitted)

[Figure: number of genomes (0-110) for each amino acid (A-Y) and for the groups (D+E), (K+R) and (W+Y)]

Topology can be manipulated (Nature 341:456)

Lep constructs expressed in E. coli

[Figure: Lep wt, Lep' and Lep'-inv, each with transmembrane segments H1 and H2 and loops P1 and P2 placed in the periplasm or cytoplasm and annotated with their numbers of positive (+) and negative (-) charges; N-terminal sequences f-Met-Ala-Asn-Met-Phe- and Ala-Asn-Met-(Lys)-Phe-; sequence QSLNASASE]

Topology prediction - a classical problem in bioinformatics

MDSQRNLLVIALLFVSFMIWQAWE....

Three important characteristics

~20 hydrophobic residues

'Positive inside' rule

Trp, Tyr near the lipid headgroups

Popular topology predictors

TMHMM (HMM)

HMMTOP (HMM)

TopPred (h-plot + PI-rule)

MEMSAT (dynamic programming)

TMAP (h-plot, mult. alignment)

PHD (NN, mult. alignment)

TopPred (JMB 225:487)

http://bioweb.pasteur.fr/seqanal/interfaces/toppred.html

[Figure: hydrophobicity profile <H> (-3 to 3) against sequence position (0-400), and two candidate topologies annotated with K+R counts per loop, scoring ∆+ = 17 and ∆+ = 9]

- construct all possible topologies

- rank based on ∆+

E. coli LacY
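TopPred's strategy of building all candidate topologies and ranking them by ∆+ can be sketched as below. Assumptions: segments are 0-based (start, end) pairs with exclusive ends; "certain" segments are always kept while every subset of "putative" ones is tried; ∆+ is taken as the absolute K+R difference between the two membrane sides; and all loops are counted regardless of length (the published method may restrict this). Function names are hypothetical.

```python
from itertools import combinations


def loops_for(seq, segments):
    """Split seq into the loops flanking sorted (start, end) TM segments."""
    loops, prev = [], 0
    for start, end in segments:
        loops.append(seq[prev:start])
        prev = end
    loops.append(seq[prev:])
    return loops


def delta_plus(seq, segments):
    """K+R difference between the two sides (loops alternate sides)."""
    kr = [sum(aa in "KR" for aa in loop) for loop in loops_for(seq, segments)]
    return abs(sum(kr[0::2]) - sum(kr[1::2]))


def rank_topologies(seq, certain, putative):
    """All topologies with every certain segment plus any subset of putative ones,
    ranked by delta_plus (largest first)."""
    ranked = []
    for r in range(len(putative) + 1):
        for chosen in combinations(putative, r):
            segs = sorted(list(certain) + list(chosen))
            ranked.append((delta_plus(seq, segs), segs))
    ranked.sort(key=lambda t: t[0], reverse=True)
    return ranked


demo = "KKKK" + "L" * 20 + "DDDD" + "L" * 20 + "KKKK"  # made-up sequence
for score, segs in rank_topologies(demo, certain=[(4, 24)], putative=[(28, 48)]):
    print(score, segs)
```

In the top-ranked topology, the side carrying more K+R would then be placed in the cytoplasm, following the positive-inside rule.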

TMHMM (Sonnhammer et al., ISMB 6:175; Krogh et al., JMB 305:567)

www.cbs.dtu.dk

www.sbc.su.se

A hidden Markov model-based method
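TMHMM's real architecture (helix and loop submodels with length constraints) is considerably more elaborate. As a rough illustration of HMM-based topology prediction only, the sketch below uses a made-up four-state model (inside loop, helix crossing inside-to-outside, outside loop, helix crossing outside-to-inside), collapses residues into three classes, and decodes the most probable labelling with the Viterbi algorithm. It also computes the probability summed over all paths with the forward algorithm, so that the ratio P(best path)/P(all paths) used as a TMHMM reliability score can be formed, and it accepts optional per-residue label constraints, one way to impose experimental information such as a known C-terminal location. All parameter values and function names are hypothetical.

```python
import math

# Toy topology HMM; every number here is made up for illustration.
# Mio / Moi are helix states crossing in->out and out->in, forcing i and o to alternate.
STATES = ["i", "Mio", "o", "Moi"]
LABEL = {"i": "i", "Mio": "M", "o": "o", "Moi": "M"}
TRANS = {
    "i": {"i": 0.9, "Mio": 0.1},
    "Mio": {"Mio": 0.95, "o": 0.05},
    "o": {"o": 0.9, "Moi": 0.1},
    "Moi": {"Moi": 0.95, "i": 0.05},
}
START = {"i": 0.5, "o": 0.5}
EMIT = {  # residue classes: h = hydrophobic, + = Lys/Arg, p = other
    "i": {"h": 0.25, "+": 0.25, "p": 0.50},
    "o": {"h": 0.25, "+": 0.05, "p": 0.70},
    "Mio": {"h": 0.85, "+": 0.01, "p": 0.14},
    "Moi": {"h": 0.85, "+": 0.01, "p": 0.14},
}


def log(p):
    return math.log(p) if p > 0 else -math.inf


def residue_class(aa):
    return "h" if aa in "AVILMFWC" else "+" if aa in "KR" else "p"


def log_emit(state, aa, pos, fixed):
    # A constraint such as {len(seq) - 1: "i"} forbids every other label at that position.
    if pos in fixed and LABEL[state] != fixed[pos]:
        return -math.inf
    return log(EMIT[state][residue_class(aa)])


def logsumexp(xs):
    m = max(xs)
    return m if m == -math.inf else m + math.log(sum(math.exp(x - m) for x in xs))


def viterbi(seq, fixed=None):
    """Most probable label string and its log probability."""
    fixed = fixed or {}
    v = {s: log(START.get(s, 0)) + log_emit(s, seq[0], 0, fixed) for s in STATES}
    backptrs = []
    for pos in range(1, len(seq)):
        new_v, ptr = {}, {}
        for s in STATES:
            prev = max(STATES, key=lambda p: v[p] + log(TRANS[p].get(s, 0)))
            new_v[s] = v[prev] + log(TRANS[prev].get(s, 0)) + log_emit(s, seq[pos], pos, fixed)
            ptr[s] = prev
        v = new_v
        backptrs.append(ptr)
    state = max(STATES, key=lambda s: v[s])
    best = v[state]
    path = [state]
    for ptr in reversed(backptrs):
        state = ptr[state]
        path.append(state)
    return "".join(LABEL[s] for s in reversed(path)), best


def forward_logp(seq, fixed=None):
    """Log probability of seq summed over all state paths."""
    fixed = fixed or {}
    f = {s: log(START.get(s, 0)) + log_emit(s, seq[0], 0, fixed) for s in STATES}
    for pos in range(1, len(seq)):
        f = {
            s: logsumexp([f[p] + log(TRANS[p].get(s, 0)) for p in STATES])
            + log_emit(s, seq[pos], pos, fixed)
            for s in STATES
        }
    return logsumexp(list(f.values()))


seq = "KRKRKRKRKR" + "L" * 20 + "SSSSSSSSSS"  # made-up test sequence
labels, best = viterbi(seq)
s3 = math.exp(best - forward_logp(seq))       # P(best path) / P(all paths)
print(labels, round(s3, 3))
print(viterbi(seq, fixed={len(seq) - 1: "i"})[0])  # C-terminus forced inside
```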

HMMTOP (Tusnady & Simon, JMB 283:489)

Helix & loop models in TMHMM

TMHMM performance (Krogh et al., JMB 305:567; Melén et al., JMB 327:735)

Discrimination of globular vs. membrane proteins: sensitivity and specificity > 98%

Correct topology: 55-60%

Single TM identification: sensitivity 96%, specificity 98%

Training set: 160 membrane proteins, 650 globular proteins

Can performance be improved?

Consensus predictions

Multiple alignments

Experimental constraints

'Consensus' predictions indicate reliability

(FEBS Lett. 486:267)

[Figure: fraction correct and coverage (0-1) versus majority level (5/0, 4/1, 3/2 & 3/1/1, 2/1/1/1) for 60 E. coli proteins]

5 prediction methods used

46% of 764 predicted E. coli IM proteins are in the 5/0 or 4/1 classes
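The majority levels can be computed by grouping identical predictions. A minimal sketch, assuming each method's prediction has been reduced to a comparable value such as a topology string; the trailing "/0" for full agreement simply mirrors the "5/0" label, and the function name is hypothetical.

```python
from collections import Counter


def majority_level(predictions):
    """Agreement pattern among predictions, e.g. '5/0', '4/1', '3/1/1'."""
    sizes = sorted(Counter(predictions).values(), reverse=True)
    if len(sizes) == 1:
        sizes.append(0)  # full agreement is written as 'n/0'
    return "/".join(str(s) for s in sizes)


print(majority_level(["A", "A", "A", "A", "A"]))  # 5/0
print(majority_level(["A", "A", "A", "B", "C"]))  # 3/1/1
```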

TMHMM reliability scores (Melén et al., JMB 327:735)

TMHMM output:

1. Mean probability pmean

2. Minimum probability pmin(label)

3. P(best path) / P(all paths)

Sequence:  M     C     Y     G     K     C     I
p(i):      0.78  0.78  0.78  0.76  0.76  0.08  0.03
p(h):      0.00  0.00  0.02  0.02  0.15  0.85  0.93
p(o):      0.22  0.22  0.20  0.20  0.08  0.07  0.04
Label:     i     i     i     i     i     h     h
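Scores 1 and 2 can be computed from this table, assuming that pmean and pmin are the mean and the minimum over residues of the probability of the predicted label at each position (score 3 additionally needs path probabilities from the model itself). The function name is hypothetical.

```python
posteriors = {
    "i": [0.78, 0.78, 0.78, 0.76, 0.76, 0.08, 0.03],
    "h": [0.00, 0.00, 0.02, 0.02, 0.15, 0.85, 0.93],
    "o": [0.22, 0.22, 0.20, 0.20, 0.08, 0.07, 0.04],
}
labels = "iiiiihh"


def label_probs(posteriors, labels):
    """Probability of the predicted label at each residue."""
    return [posteriors[lab][k] for k, lab in enumerate(labels)]


probs = label_probs(posteriors, labels)
p_mean = sum(probs) / len(probs)
p_min = min(probs)
print(round(p_mean, 3), p_min)  # 0.806 0.76
```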

TMHMM (score 3): prediction accuracy vs. coverage

[Figure: percent correct (60-100) versus coverage (0-100%) for 92 bacterial proteins, with ~70% and ~45% marked]
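An accuracy-versus-coverage curve is obtained by ranking predictions by their reliability score and measuring accuracy on the top-ranked fraction. A minimal sketch with hypothetical inputs: `scores` holds per-protein reliability scores and `correct` marks (1/0) whether each predicted topology matched the experimental one.

```python
def accuracy_vs_coverage(scores, correct):
    """(coverage, accuracy) after keeping the k best-scoring proteins, k = 1..n."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    curve, n_correct = [], 0
    for kept, i in enumerate(order, start=1):
        n_correct += correct[i]
        curve.append((kept / len(scores), n_correct / kept))
    return curve


for cov, acc in accuracy_vs_coverage([0.9, 0.2, 0.7, 0.4], [1, 0, 1, 0]):
    print(f"coverage {cov:.2f}  accuracy {acc:.2f}")
```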

"Experimentally known topologies" is a biased sample

[Figure: percent of proteins (0-40) in each score interval (0-0.25, 0.25-0.5, 0.5-0.75, 0.75-1) for the test set, C. elegans, S. cerevisiae and E. coli]

Correlation between accuracy and TMHMM S3 score

[Figure: percent correct (0-100) versus mean score (0-1)]

Expected TMHMM performance on proteomes

[Figure: percent correct (40-100) versus coverage (0-100%) for E. coli, S. cerevisiae, C. elegans and the test set]
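One way to read these slides is that the accuracy measured per score interval on the test set can be carried over to a whole proteome through that proteome's own score distribution, which corrects for the test-set bias. The sketch below assumes exactly that; the interval accuracies are made-up placeholders rather than values from the figures, and the function names are hypothetical.

```python
import bisect

# Hypothetical calibration: test-set accuracy within each score interval
EDGES = [0.25, 0.5, 0.75]              # boundaries inside [0, 1]
BIN_ACCURACY = [0.2, 0.5, 0.8, 0.95]   # for [0,0.25), [0.25,0.5), [0.5,0.75), [0.75,1]


def expected_accuracy(score):
    """Calibrated probability that a prediction with this score is correct."""
    return BIN_ACCURACY[bisect.bisect_right(EDGES, score)]


def expected_curve(proteome_scores):
    """Expected fraction correct vs. coverage for unlabelled proteins, best scores first."""
    ranked = sorted(proteome_scores, reverse=True)
    curve, total = [], 0.0
    for k, s in enumerate(ranked, start=1):
        total += expected_accuracy(s)
        curve.append((k / len(ranked), total / k))
    return curve


print(expected_curve([0.9, 0.6, 0.1]))
```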

Experimental information helps (JMB 327:735)

Original TMHMM prediction: one TM helix missing

TMHMM prediction with the C-terminus fixed to the inside

When the location of the C-terminus is known, the correct topology is predicted for an estimated ~70% of all membrane proteins (~55% when it is not known).