Study of Protein Prediction Related Problems Ph.D. candidate 2013.10.16 Le-Yi WEI 1

Page 2: Contents

1. Background
2. Methods
3. Experiments

Page 3: Background

Page 4: Definition of protein

>Example
PIVDTGSVAPLSAAEKTKIRSAWAPVYSTYETSGVDILVKFFTSTPAAQEFFPKFKGLTTADELKKSADVRWHAERIINAVDDAVASMDDTEKMSMKLRNLSGKHAKSFQVDPEYFKVLAAVIADTVAAGDAGFEKLMSMI

20 different amino acids: A C D … … V W Y

Page 5: Protein prediction related problems

• Protein structural class prediction
• Protein fold prediction
• Multi-functional enzyme prediction
• Protein remote homology detection
• Protein subcellular localization prediction
• Other protein-related problems, etc.

Page 6: Common points

Treat the protein-related problems as classification tasks.

The framework of a classification task:
Query protein sequence → Data presentation → Classification algorithms → Predicted results

Two major components: data presentation and classification algorithms.

Page 7: Methods

Page 8: Feature extraction methods

• Primary sequence based
  e.g. Physicochemical features, N-gram, Functional Domain, PSSM-profile (auto-covariance), etc.

• Secondary structure based
  e.g. Secondary structure sequence based and probability matrix based

• Sequence-structure based
  e.g. Triple-sequence-structure features

Page 9: Primary-sequence based

• n-gram model

Given a query protein sequence:

Compute

Obtain
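As a rough illustration of the n-gram model named above, the sketch below counts overlapping n-grams over the 20-letter amino acid alphabet and normalizes by the number of windows. The slide's own formulas are not preserved here, so the function name `ngram_features` and the normalization are assumptions, not the author's exact definition.

```python
from collections import Counter
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def ngram_features(sequence, n=2):
    """Return a 20**n-dimensional vector of normalized n-gram frequencies.

    Hypothetical helper: overlapping windows of length n are counted and
    divided by the total number of windows.
    """
    windows = [sequence[i:i + n] for i in range(len(sequence) - n + 1)]
    counts = Counter(windows)
    total = len(windows)
    keys = ["".join(p) for p in product(AMINO_ACIDS, repeat=n)]
    return [counts[k] / total if total else 0.0 for k in keys]


features = ngram_features("SLFEQLGGQAAVQAVTAQFYANIQADA", n=2)
print(len(features))  # 400 features for n = 2
```

With n = 1 the same function reduces to a 20-dimensional composition vector; larger n grows the feature space as 20**n.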

Page 10: Primary-sequence based

• Functional domain

[Figure: a query protein sequence is searched with PSI-BLAST against a functional protein database (database sequences 1, 2, 3, … …, n-2, n-1, n); each database sequence is paired with a binary entry (0 or 1) of the feature vector, e.g. 0, 1, 0, … …, 1, 0, 0.]
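A minimal sketch of how such a binary feature vector could be assembled, assuming that an entry is 1 when PSI-BLAST reports a hit against the corresponding database sequence and 0 otherwise. The slide does not state the exact rule, and the identifiers and hit list below are hypothetical placeholders rather than real PSI-BLAST output.

```python
def domain_feature_vector(hit_ids, database_ids):
    """Binary vector with one entry per database sequence.

    Assumption: 1 marks a database sequence hit by the query, 0 marks no hit.
    """
    hits = set(hit_ids)
    return [1 if db_id in hits else 0 for db_id in database_ids]


# Hypothetical database identifiers and PSI-BLAST hits for one query.
database_ids = ["db1", "db2", "db3", "db4", "db5", "db6"]
hit_ids = ["db2", "db4"]

vector = domain_feature_vector(hit_ids, database_ids)
print(vector)  # [0, 1, 0, 1, 0, 0]
```

The vector length equals the size of the functional protein database, so the representation has the same dimension for every query.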

Page 11: Primary-sequence based

• Evolution information

[Figure: PSI-BLAST is run against a protein database to produce the Position-Specific Score Matrix (PSSM).]

Page 12: Primary-sequence based

• AAC features

Compute

Obtain: 20-D features
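The 20-D AAC features can be sketched as follows, assuming AAC means amino acid composition, i.e. the relative frequency of each of the 20 amino acids in the sequence. The slide's formula is not preserved, so the function name `aac_features` is illustrative only.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def aac_features(sequence):
    """Return the 20-D composition vector: count of each residue / length."""
    if not sequence:
        raise ValueError("sequence must not be empty")
    length = len(sequence)
    return [sequence.count(aa) / length for aa in AMINO_ACIDS]


aac = aac_features("SLFEQLGGQAAVQAVTAQFYANIQADA")
print(len(aac))  # 20
```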

Page 13: Primary-sequence based

• Auto-covariance (AC) transformation

Compute

Obtain: 20*g-D features
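A sketch of one common form of the AC transformation, assuming it is applied column-wise to an L x 20 PSSM with lags 1 to g, which gives the 20*g dimensions quoted above. The exact formula on the slide is not preserved, so the normalization by (L - lag) is an assumption, and the toy matrix is not real PSI-BLAST output.

```python
def ac_transform(pssm, g):
    """Auto-covariance of each PSSM column for lags 1..g.

    pssm: list of L rows, each with one score per amino acid column.
    Returns n_cols * g values (20 * g for a standard PSSM).
    """
    length = len(pssm)
    n_cols = len(pssm[0])
    if g >= length:
        raise ValueError("g must be smaller than the sequence length")
    # Column means over all sequence positions.
    means = [sum(row[j] for row in pssm) / length for j in range(n_cols)]
    features = []
    for j in range(n_cols):
        for lag in range(1, g + 1):
            total = sum(
                (pssm[i][j] - means[j]) * (pssm[i + lag][j] - means[j])
                for i in range(length - lag)
            )
            features.append(total / (length - lag))
    return features


# Toy 5 x 20 matrix standing in for a PSSM.
toy_pssm = [[(i * (j + 1)) % 7 - 3 for j in range(20)] for i in range(5)]
ac = ac_transform(toy_pssm, g=2)
print(len(ac))  # 40 = 20 * g
```

Because the output length depends only on g, proteins of different lengths map to vectors of the same dimension.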

Page 14: Primary-sequence based

• Consensus sequence

[Figure: PSSM profile and frequency profile]

A query sequence:

Consensus sequence:
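One plausible way to derive a consensus sequence from a frequency profile is to take the most frequent amino acid at each position. The slide does not spell out its rule, so the sketch below is an assumption, and the toy profile is made up for illustration.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # column order assumed for the profile


def consensus_sequence(frequency_profile):
    """Pick, for each position, the amino acid with the highest frequency."""
    residues = []
    for row in frequency_profile:
        best = max(range(len(row)), key=row.__getitem__)
        residues.append(AMINO_ACIDS[best])
    return "".join(residues)


# Toy 3-position profile: each row holds 20 frequencies.
profile = [[0.0] * 20 for _ in range(3)]
profile[0][AMINO_ACIDS.index("M")] = 0.7
profile[0][AMINO_ACIDS.index("L")] = 0.3
profile[1][AMINO_ACIDS.index("K")] = 0.6
profile[1][AMINO_ACIDS.index("R")] = 0.4
profile[2][AMINO_ACIDS.index("V")] = 1.0

consensus = consensus_sequence(profile)
print(consensus)  # MKV
```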

Page 15: Secondary structure based

• Secondary structure sequence

An example of a query protein sequence:
SLFEQLGGQAAVQAVTAQFYANIQADA

Predicted secondary structure sequence (from PSI-PRED), which has three states, C (coil), H (helix), E (strand):
CCHEHEEEEECCCCHHHHHHEEEEECC

Page 16: Secondary structure based

• Structure state confidence matrix

An example of a structure state confidence matrix:

[Figure: a query protein sequence, its predicted structure sequence, and the predicted confidence]

Page 17: Secondary structure based

• Global structural features

Compute Obtain

Structure state confidence matrix:

Page 18: Secondary structure based

• Local structural features

Compute Obtain

Structure state confidence matrix:

Page 19: Sequence-structure based

[Figure: the framework of the triple sequence-structure feature extraction method]

Page 20: Classification algorithms

Commonly used classification algorithms

e.g. Support Vector Machine (SVM), Random Forest (RF), SMO, Naive Bayes, etc.

Ensemble classification algorithms

e.g. Majority Vote, Average Probability, Selective Ensemble, etc.
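Two of the ensemble rules named above, Majority Vote and Average Probability, can be sketched in a few lines. The class labels and probabilities are hypothetical, and the tie-breaking behaviour is a choice of this sketch, not something stated in the slides.

```python
from collections import Counter


def majority_vote(predictions):
    """Return the label predicted by most base classifiers.

    Ties go to the label that appears first in the list.
    """
    return Counter(predictions).most_common(1)[0][0]


def average_probability(probabilities):
    """Average per-class probabilities across classifiers, return the top label.

    probabilities: one {label: probability} dict per base classifier.
    """
    labels = list(probabilities[0])
    n = len(probabilities)
    averaged = {lab: sum(p[lab] for p in probabilities) / n for lab in labels}
    return max(averaged, key=averaged.get)


# Hypothetical outputs of three base classifiers for one query protein.
probs = [
    {"alpha": 0.60, "beta": 0.40},
    {"alpha": 0.20, "beta": 0.80},
    {"alpha": 0.55, "beta": 0.45},
]
votes = [max(p, key=p.get) for p in probs]

print(majority_vote(votes))        # alpha
print(average_probability(probs))  # beta
```

The toy numbers are chosen so the two rules disagree: two of three classifiers favour "alpha" by a narrow margin, while the averaged probabilities favour "beta".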

Page 21: Experiments

Page 22: The framework of RF_PSCP

Webserver site: http://59.77.16.70:8080/RF_PSCP/Index.html

Page 23: Datasets

• Protein structural class prediction

Three benchmark datasets

Three updated large-scale datasets

Sequence similarity

Page 24: Results

Comparison with existing methods on three benchmark datasets

Page 25: Results

Tests of the proposed method on three updated large-scale datasets

Page 26: Results

Comparison of different combinations of feature subsets on three benchmark datasets

Page 27: Results

Optimization of the Random Forest classifier

Page 29: Q&A!