Secondary structure prediction


Amino acid sequence -> Secondary structure

Alpha helix
Beta strand
Disordered/coil

70% accuracy in 1991, 81% accuracy in 2009


Limits:

Limited to globular proteins

Not for membrane proteins


Applications

Site-directed mutagenesis
Locate functionally important residues
Find structural units / domains


Techniques

Linear statistics
Physicochemical properties
Linear discrimination
Machine learning
Neural networks
K-nearest neighbours
Evolutionary trees
Residue substitution matrices

Using evolutionary information = Multiple sequence alignments.

Jnet

Cuff and Barton (2000)

Neural Network

Training set: 480 proteins (non-homologous)

A multiple sequence alignment (MSA) is built for each protein using BLAST

Neural network

Ni: neuron i, Nj: neuron j
Wij: weight of the connection from Ni to Nj

[Figure: neuron Ni connected to neuron Nj through the weight Wij]

Signal forward propagation
Output from Ni * weight from Ni to Nj
Input to Nj is Ij = Oi * Wij

[Figure: input layer with neurons Ni, Nj and Nk, each connected to the neuron Nz in the output layer; Wij denotes the weight from Ni to Nj]

The network receives input values

[Figure: input values Ii, Ij and Ik enter the input-layer neurons Ni, Nj and Nk, which connect to the output-layer neuron Nz through the weights Wiz, Wjz and Wkz]

Signal forward propagation

Nz receives the sum of the outputs from Ni, Nj and Nk, each multiplied by its weight (Wiz, Wjz, Wkz), and produces the output Oz
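As a minimal sketch of the forward step, the code below assumes the output neuron simply sums its weighted inputs. Most networks additionally pass this sum through an activation function, which is omitted here. The input and weight values are made up for illustration.

```python
# Forward propagation for one output neuron Nz fed by Ni, Nj and Nk.
# All numeric values are illustrative assumptions, not taken from the slides.

def forward(outputs, weights):
    """Return the summed weighted input that reaches Nz."""
    return sum(o * w for o, w in zip(outputs, weights))

outputs = [1.0, 0.0, 1.0]   # Oi, Oj, Ok
weights = [0.5, 0.2, 0.3]   # Wiz, Wjz, Wkz

o_z = forward(outputs, weights)
print(o_z)
```

With these values the neuron Nj contributes nothing, because its output is 0, so only the weights Wiz and Wkz matter for this input.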

Compute the error

Desired value is 1, Oz is 0.8
Error = desired value - Oz = 1 - 0.8 = 0.2

Error backpropagation

The weights (Wiz, Wjz, Wkz) are modified so that the result Oz is a bit closer to what we wanted
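A minimal sketch of one such update, assuming a single linear output neuron and the delta rule (weight change = learning rate * error * input). The learning rate, inputs and starting weights are illustrative assumptions. Full backpropagation through a hidden layer additionally passes the error backwards layer by layer, which is not shown.

```python
# One delta-rule update for a single linear output neuron Nz.
# Inputs, weights and learning rate are illustrative assumptions.

def forward(inputs, weights):
    """Summed weighted input reaching Nz, used here directly as Oz."""
    return sum(x * w for x, w in zip(inputs, weights))

def update(inputs, weights, desired, rate=0.1):
    """Move each weight a small step in the direction that reduces the error."""
    error = desired - forward(inputs, weights)
    return [w + rate * error * x for x, w in zip(inputs, weights)]

inputs = [1.0, 0.0, 1.0]
weights = [0.5, 0.2, 0.3]       # gives Oz = 0.8 for these inputs
desired = 1.0

before = forward(inputs, weights)
weights = update(inputs, weights, desired)
after = forward(inputs, weights)
print(before, after)            # the output moves closer to the desired value
```

Only the weights of neurons that were active for this input change; the weight attached to the silent input stays as it was.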

[Figure: network with an input layer, a hidden layer and an output layer; neurons labelled Ni, Nj, Nk, Np, Nq and Nz]

Input layer: read a sequence

[Figure: the input layer has one neuron per amino acid (A, C, D, … Y) and reads the sequence CTEIL...]

[Figure: the sequence CDEKL... is read one residue at a time. For each residue, the input neuron of that amino acid (A, C, D, … Y) is set to 1 and the others to 0: reading C gives 0 1 0 … 0, reading D gives 0 0 1 … 0]
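The binary input pattern can be sketched as follows. This assumes one input neuron per amino acid, in alphabetical one-letter order, and a single residue read at a time. The constant and function names are illustrative.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues, A ... Y

def one_hot(residue):
    """Return a 20-element list with 1 at the residue's position, 0 elsewhere."""
    return [1 if aa == residue else 0 for aa in AMINO_ACIDS]

# Encode the first residues of the example sequence, one per line.
for residue in "CDEKL":
    print(residue, one_hot(residue))
```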

Output layer: structure
Desired output: known structure

[Figure: the input neurons (A, C, D, … Y) read CDEKL...; the three output neurons are labelled b, c and a. For a residue known to be in an alpha-helix the desired outputs are 1, 0, 0; the outputs of the network are 0.4, 0.6, 0.1]

Error backpropagation = weights are modified
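For the values shown (desired 1, 0, 0 and obtained 0.4, 0.6, 0.1), the error at each output neuron can be computed as below. The three values are paired in the order listed, which is an assumption, and the summed squared error is one common way to condense them into a single number rather than necessarily the measure used by Jnet.

```python
desired = [1, 0, 0]          # known structure: alpha-helix
actual = [0.4, 0.6, 0.1]     # outputs of the network before the weight update

# Error per output neuron, using the same sign convention: desired - output.
errors = [d - o for d, o in zip(desired, actual)]
sum_squared_error = sum(e * e for e in errors)

print([round(e, 2) for e in errors])
print(round(sum_squared_error, 2))
```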

Jnet architecture

[Figure: the window LAPEDCDEKLKLEPNAC feeds a network with three output neurons, b, c and a]

Sequence to structure network

Input layer = window of 17 residues
Hidden layer = 9 neurons
Output layer = 3 neurons
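A sketch of how 17-residue windows could be cut from a sequence, one per residue. It assumes the window is centred on the residue being predicted and that positions beyond the ends of the sequence are filled with a placeholder character. The padding symbol and the function name are assumptions.

```python
def windows(sequence, size=17, pad="-"):
    """Yield (central residue, window) for every position in the sequence."""
    half = size // 2
    padded = pad * half + sequence + pad * half
    for i, residue in enumerate(sequence):
        yield residue, padded[i:i + size]

seq = "FLAPEDCDEKLKLEPNACW"
for residue, window in windows(seq):
    print(residue, window)
```

For this 19-residue example, the window centred on the K at position 10 is LAPEDCDEKLKLEPNAC. Windows near either end contain padding.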

[Figure: the network assigns one state (a, b or c) to each residue of the sequence]

FLAPEDCDEKLKLEPNACW
ccaacaaccbbbbbcbbbc

Structure to structure network

Input layer = window of 19 residues
Hidden layer = 9 neurons
Output layer = 3 neurons

FLAPEDCDEKLKLEPNACW
ccaacaaccbbbbbcbbbc (input: output of the sequence to structure network)
ccaaaaacccbbbbbbbbc (output of the structure to structure network)
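Comparing the two state strings position by position shows what the second network changed. A small sketch, assuming the first string is the output of the sequence to structure network and the second the output of the structure to structure network:

```python
sequence = "FLAPEDCDEKLKLEPNACW"
first_pass = "ccaacaaccbbbbbcbbbc"
second_pass = "ccaaaaacccbbbbbbbbc"

# Collect (position, residue, old state, new state) wherever the strings differ.
changes = [
    (pos, sequence[pos - 1], old, new)
    for pos, (old, new) in enumerate(zip(first_pass, second_pass), start=1)
    if old != new
]
for pos, residue, old, new in changes:
    print(f"{residue}{pos}: {old} -> {new}")
```

In this example the second pass removes a single-residue break inside the helix and another inside the strand, and moves the start of the strand by one residue.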


Cole et al (2008) Nucleic Acids Research

Geoff Barton, University of Dundee

Jpred

Uses algorithm Jnet2.0
Three-state prediction: alpha, beta, coil
Accuracy 81.5% (2008)
But if there is no homolog (orphan sequence): 65.9%!

PSI-BLAST PSSM matrix
HMMer profiles (instead of amino acid frequencies)
Multiple neural networks
100 hidden layer units

First, the query is searched against PDB sequences using BLAST (only to issue a warning)

PSI-BLAST search of UniRef90, 3 iterations; alignment of the hits (filtered at 75% identity)

Profiles are built from the alignment (PSSM and HMMer)

The profiles are the input to Jnet

Alternative: the user provides an alignment (faster)
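To illustrate what a profile is, the sketch below computes per-column residue frequencies from a toy alignment. This is only the simple amino acid frequency representation: the PSSM and HMMer profiles used by Jpred are produced by PSI-BLAST and HMMer and are built differently. The alignment is invented for the example.

```python
from collections import Counter

def column_frequencies(alignment):
    """Return one {residue: frequency} dict per alignment column, ignoring gaps."""
    profile = []
    for column in zip(*alignment):
        counts = Counter(res for res in column if res != "-")
        total = sum(counts.values())
        profile.append({res: n / total for res, n in counts.items()} if total else {})
    return profile

# Invented toy alignment: four sequences, five columns, one gap.
alignment = [
    "CDEKL",
    "CDERL",
    "CNE-L",
    "CDQKL",
]
profile = column_frequencies(alignment)
for i, freqs in enumerate(profile, start=1):
    print(i, freqs)
```

A fully conserved column yields a single residue with frequency 1, while a variable column spreads the weight over several residues. This is the kind of evolutionary information a single sequence cannot provide.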

Advanced Jpred4 usage

Jpred output

Jpred output / Jalview

Jpred output / view all

Jpred output / PDF output

Exercise 1/4: Jalview 2D prediction

Starting Jalview

Open Firefox with JRE (from ZDV)

Go to http://www.jalview.org

Click the pink arrow “Launch Jalview Desktop”

You can close all the demo windows that appear

Load an alignment

Use MR1_fasta.txt
This is an alignment of a fragment of the mineralocorticoid receptor

Open it from File > Input alignment > From file
(Hint: You can load it directly from a URL, e.g. https://cbdm.uni-mainz.de/files/2015/02/MR1_fasta.txt)

The alignment window has its own menu tabs
Try Colour > Clustalx to see conservation

Exercise 2/4: Jalview 2D prediction

Build a phylogenetic tree

Also using MR1_fasta.txt

Try Calculate > Calculate tree > Neighbour joining using PAM 250

Hint: We used identifiers indicating species names

Try now Calculate > Calculate tree > Average distance using PAM 250

Compare results. Which one is better?

[Figure: resulting tree, with leaves labelled by species: Xenopus, Anolis, Taeniopygia, Gallus, Monodelphis, Danio]

Exercise 3/4: Jalview 2D prediction

Web service -> Secondary Structure Prediction -> Jnet secondary str pred

No selection (or all sequences selected) = Jnet runs on the top sequence using the alignment (fast)

One sequence (or region) selected = Jnet runs on that sequence using homologs (slow)

Some sequences selected = Jnet runs on the top one using homologs (slow)

Try with all sequences selected

If this doesn’t work you can run MR1_fasta.txt directly on Jpred4.

Use the advanced option
Use the upload a file option
Select type of input = Multiple alignment (use format FASTA)
Tick the skip PDB search option
There is an option to view the output in Jalview

Annotations:
• Lupas_21, Lupas_14, Lupas_28

Coiled-coil predictions for the sequence. 21, 14 and 28 are the window sizes used.

Annotations:
• JNETHMM, JNETALIGN: predictions using different profiles
• Jnetpred: consensus prediction. Beta sheets: green arrows. Alpha helices: red tubes.

Annotations:
• JNETCONF: confidence in the prediction.

Annotations:
• JNETSOL25, JNETSOL5, JNETSOL0

Solvent accessibility predictions: binary predictions at cut-offs of 25%, 5% or 0% solvent accessibility.
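The sketch below illustrates what three binary tracks at different cut-offs encode. It starts from invented relative accessibility values rather than from a prediction. It assumes that "B" marks a residue whose accessibility is at or below the cut-off (buried) and "-" marks an exposed one; both the symbols and the inclusive boundary are assumptions.

```python
# Cut-offs in percent relative solvent accessibility, keyed by annotation name.
CUTOFFS = {"JNETSOL25": 25.0, "JNETSOL5": 5.0, "JNETSOL0": 0.0}

def burial_states(relative_accessibility):
    """Map each annotation name to a buried/exposed string, one symbol per residue."""
    return {
        name: "".join("B" if acc <= cutoff else "-" for acc in relative_accessibility)
        for name, cutoff in CUTOFFS.items()
    }

accessibility = [0.0, 3.0, 12.0, 40.0, 80.0]  # invented values, in percent
tracks = burial_states(accessibility)
for name, states in tracks.items():
    print(name, states)
```

The stricter the cut-off, the fewer residues count as buried, so the three tracks together give a coarse idea of how deeply each residue is buried.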

Exercise 4/4: 2D prediction of known 3D

Obtain the sequence of the human glutamine synthetase.

Run BLAST with the human sequence against:
1) the archaeon Methanosarcina
2) the bacterium Escherichia coli
3) the fungus Pseudozyma antarctica

NCBI BLAST against a single species is faster!

Get the best homolog from each, align the sequences and use the alignment as input in Jalview. Put the human protein on top.

Load the alignment in Jalview and run the web prediction.

Alternative: run the alignment in the Jpred4 server. (Hint: You could run the human sequence alone, but that will search for homologs and take very long.)

Compare the prediction with the known 3D structure of the human protein (open it in Chimera, File > Fetch by ID > PDB 2QC8)

We need to hide all chains except one.

Select one of the chains (ctrl + click on a residue, then arrow up). Invert the selection (press arrow right).

Actions > Ribbon > Hide
Actions > Atoms/bonds > Hide

Select the chain and focus on it
Actions > Focus

Compare the output of the Jpred/Jalview 2D prediction with the 3D structure of this protein.

For example, locate a predicted helix or beta-strand in Jalview. Find its start and end positions by hovering over the human sequence with the mouse (the numbers on top of the alignment are different from the amino acid positions in each sequence).

Color the corresponding residues in the 3D view using
Select > Atom specifier
and a range, e.g. :113-126 (predicted as helix)
Actions > Color > red

Color some helices in red and strands in green.

Do you see differences? Where are they?
Would you say that the 2D prediction was reasonable?
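Reading the ranges off by hand works for a few elements. For many, a short script can turn a prediction string into residue range specifiers. The sketch below assumes the prediction is given as one character per residue of the ungapped human sequence, with H for helix, E for strand and - for coil. The example string is invented, and the optional offset parameter is a hypothetical way to correct for a numbering difference between the sequence and the PDB file.

```python
from itertools import groupby

def segments(prediction, state):
    """Return 1-based (start, end) ranges where the prediction equals `state`."""
    ranges = []
    position = 1
    for symbol, run in groupby(prediction):
        length = len(list(run))
        if symbol == state:
            ranges.append((position, position + length - 1))
        position += length
    return ranges

def atom_spec(ranges, offset=0):
    """Format ranges as a residue specifier such as :113-126."""
    return ":" + ",".join(f"{s + offset}-{e + offset}" for s, e in ranges)

prediction = "--HHHHHH---EEEE--HHHH-"   # invented example string
helix_spec = atom_spec(segments(prediction, "H"))
strand_spec = atom_spec(segments(prediction, "E"))
print(helix_spec)    # paste into Select > Atom specifier, then color red
print(strand_spec)   # same, then color green
```

The positions must be those of the sequence itself, not alignment columns, for the same reason given above for reading positions in Jalview.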