Protein Structure and Function Prediction. Predicting 3D Structure –Comparative modeling...

Protein Structure and Function Prediction

Predicting 3D Structure

– Comparative modeling (homology)

– Fold recognition (threading)

An outstanding, difficult problem

Comparative Modeling

Comparative structure prediction produces an all-atom model of a sequence, based on its alignment to one or more related protein structures in the database

Similar sequence suggests similar structure

Comparative Modeling

Modeling of a sequence based on known structures consists of four major steps:
1. Finding one or more known structures related to the sequence to be modeled (templates), using sequence comparison methods such as PSI-BLAST
2. Aligning the sequence with the templates
3. Building a model
4. Assessing the model
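Steps 1 and 2 can be illustrated with a toy global alignment. This is a minimal sketch, not how PSI-BLAST or MODELLER work: it assumes plain Needleman-Wunsch with arbitrary scores (match +1, mismatch -1, gap -2), picks the template with the highest alignment score, and uses hypothetical template names and sequences.

```python
# Toy sketch of steps 1-2: align the query to each candidate template with
# Needleman-Wunsch and keep the best-scoring one. The scoring scheme is an
# illustrative assumption, not the one used by PSI-BLAST or MODELLER.
MATCH, MISMATCH, GAP = 1, -1, -2


def pair_score(x: str, y: str) -> int:
    return MATCH if x == y else MISMATCH


def needleman_wunsch(a: str, b: str) -> tuple[int, str, str]:
    """Return (score, aligned_a, aligned_b) for an optimal global alignment."""
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * GAP
    for j in range(1, m + 1):
        score[0][j] = j * GAP
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + pair_score(a[i - 1], b[j - 1]),
                score[i - 1][j] + GAP,  # gap in b
                score[i][j - 1] + GAP,  # gap in a
            )
    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair_score(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + GAP:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


def pick_template(query: str, templates: dict[str, str]) -> tuple[str, str, str]:
    """Return (template name, aligned query, aligned template) for the best template."""
    best = max(templates, key=lambda name: needleman_wunsch(query, templates[name])[0])
    _, aligned_query, aligned_template = needleman_wunsch(query, templates[best])
    return best, aligned_query, aligned_template


# Hypothetical query and template library.
templates = {"tmplA": "MKTAYIAKQR", "tmplB": "GGSGGSGGSG"}
name, aq, at = pick_template("MKTAYIAKR", templates)
print(name)  # tmplA
print(aq)    # MKTAYIAK-R
print(at)    # MKTAYIAKQR
```

Real template searches use sequence profiles and statistical significance (E-values) rather than a raw global score; the sketch only shows the shape of "search, then align".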

Comparative Modeling

• The accuracy of a comparative model is related to the sequence identity between the target and the template on which it is based:
  – >50% sequence identity = high accuracy
  – 30%-50% sequence identity = about 90% of the main chain modeled accurately
  – <30% sequence identity = low accuracy (many errors)
• Structural similarity is particularly high in the core
  – Alpha helices and beta sheets are preserved
  – Even near-identical sequences vary in loops
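The identity bands above can be applied mechanically once a target-template alignment exists. A minimal sketch, assuming percent identity is counted over aligned positions where neither sequence has a gap (other definitions exist) and that exactly 30% and 50% fall in the middle band; the function names are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Identical residues as a percentage of aligned (non-gap) positions."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(x, y) for x, y in zip(aligned_a, aligned_b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    identical = sum(x == y for x, y in pairs)
    return 100.0 * identical / len(pairs)


def accuracy_band(identity: float) -> str:
    """Map identity to the slide's bands; 30% and 50% count as 'medium' (an assumption)."""
    if identity > 50:
        return "high"
    if identity >= 30:
        return "medium"
    return "low"


identity = percent_identity("ACDEFGHIKL", "ACDEWWWWWW")
print(identity, accuracy_band(identity))  # 40.0 medium
```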

Comparative Modeling Methods

MODELLER (Sali, Rockefeller/UCSF)

SCWRL (Dunbrack, UCSF)

SWISS-MODEL http://swissmodel.expasy.org//SWISS-MODEL.html

Protein Folds

• A fold is a combination of secondary structural units
  – Forms the basic level of classification
• Each protein family belongs to a fold
  – Estimated 1000–3000 different folds
  – A fold is shared among close and distant family members
• Different sequences can share similar folds

[Figure: example folds of hemoglobin and TIM (triosephosphate isomerase)]

Protein Folds: sequential and spatial arrangement of secondary structures

Fold classification (SCOP):
• Class
  – All alpha
  – All beta
  – Alpha/beta
  – Alpha+beta
• Fold
• Superfamily
• Family
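SCOP is a strict hierarchy (class, fold, superfamily, family), so two domains can be compared by walking down the levels until they first disagree. A minimal sketch: the `ScopLabel` type, the function name, and the example labels are hypothetical placeholders, not real SCOP identifiers.

```python
from typing import NamedTuple, Optional


class ScopLabel(NamedTuple):
    """A domain's SCOP classification, from the most general level to the most specific."""
    class_: str  # trailing underscore because "class" is a Python keyword
    fold: str
    superfamily: str
    family: str


def deepest_shared_level(a: ScopLabel, b: ScopLabel) -> Optional[str]:
    """Most specific level at which two domains agree (None if their classes differ).

    Levels are compared top-down, so a shared family name only counts if the
    class, fold and superfamily also match.
    """
    shared = None
    for level in ScopLabel._fields:
        if getattr(a, level) != getattr(b, level):
            break
        shared = level
    return shared


# Hypothetical domains, for illustration only.
d1 = ScopLabel("All alpha", "fold-1", "superfamily-1a", "family-1a-i")
d2 = ScopLabel("All alpha", "fold-1", "superfamily-1b", "family-1b-i")
d3 = ScopLabel("All beta", "fold-2", "superfamily-2a", "family-2a-i")
print(deepest_shared_level(d1, d2))  # fold
print(deepest_shared_level(d1, d3))  # None
```

Domains that share only a fold are the typical targets of fold recognition: their sequences may be too different for comparative modeling to find the relationship.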

Basic steps in Fold Recognition:

Compare the sequence against a library of all known protein folds (a finite number)

Query sequence

MTYGFRIPLNCERWGHKLSTVILKRP...

Goal: find the folding template that the sequence fits best

Find ways to evaluate the sequence-structure fit

Find best fold for a protein sequence: Fold recognition (threading)

[Figure: the query sequence MAHFPGFGQSLLFGYPVYVFGD... is fitted onto each potential fold in the library (1 ... 56 ... n), and each fit receives a score (e.g. -10, -123, 20.5).]
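The idea of scoring one query against every fold in a library can be sketched with a deliberately crude sequence-structure fit. Assumptions, all hypothetical: each fold is reduced to a string of per-position environments, B (buried) or E (exposed); a residue scores +1 when a hydrophobic residue sits in a buried position or a polar one in an exposed position, and -1 otherwise; no gaps are allowed. The threading programs listed below use far richer potentials, profiles and gapped alignment.

```python
HYDROPHOBIC = set("AVILMFWC")  # simplified assumption about which residues are hydrophobic


def position_score(residue: str, environment: str) -> int:
    """+1 if the residue's hydrophobicity matches the environment, else -1."""
    hydrophobic = residue in HYDROPHOBIC
    buried = environment == "B"
    return 1 if hydrophobic == buried else -1


def thread(query: str, environments: str) -> int:
    """Best ungapped fit of any query window onto the fold's positions."""
    k = len(environments)
    if len(query) < k:
        raise ValueError("query is shorter than the fold template")
    return max(
        sum(position_score(r, e) for r, e in zip(query[start:start + k], environments))
        for start in range(len(query) - k + 1)
    )


def rank_folds(query: str, library: dict[str, str]) -> list[tuple[str, int]]:
    """Score the query against every fold in the library; best fit first."""
    scores = [(name, thread(query, env)) for name, env in library.items()]
    return sorted(scores, key=lambda item: item[1], reverse=True)


# Hypothetical two-fold library and query.
library = {"fold_1": "BBEEBB", "fold_2": "EEEEEE"}
print(rank_folds("LIKKVV", library))  # [('fold_1', 6), ('fold_2', -2)]
```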

Programs for fold recognition

• TOPITS (Rost 1995)

• GenTHREADER (Jones 1999)

• SAM-T02 (UCSC HMM)

• 3D-PSSM http://www.sbg.bio.ic.ac.uk/~3dpssm/

Ab Initio Modeling

• Compute molecular structure from the laws of physics and chemistry alone
  – Ideal solution (theoretically)
• Simulate the process of protein folding
  – Apply minimum-energy considerations
• Nearly impossible in practice
  – Exceptionally complex calculations
  – Understanding of the biophysics is incomplete
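Why minimum-energy search is so expensive can be seen even in a drastically simplified model. This sketch uses the 2D HP lattice model, a standard textbook toy and not the energy function of Rosetta or Undertaker: residues are H (hydrophobic) or P (polar), a conformation is a self-avoiding walk on a square lattice, and each H-H pair that touches on the lattice without being sequence neighbours contributes -1. Exhaustive enumeration grows roughly as 3^n, so it only works for very short chains; the function names are hypothetical.

```python
Coord = tuple[int, int]
MOVES = [(1, 0), (-1, 0), (0, 1), (0, -1)]


def hp_energy(seq: str, coords: list[Coord]) -> int:
    """Energy of one conformation: -1 per non-bonded H-H lattice contact."""
    position = {c: i for i, c in enumerate(coords)}
    energy = 0
    for i, (x, y) in enumerate(coords):
        if seq[i] != "H":
            continue
        for dx, dy in MOVES:
            j = position.get((x + dx, y + dy))
            # j > i + 1 skips the bonded neighbour and counts each pair once
            if j is not None and j > i + 1 and seq[j] == "H":
                energy -= 1
    return energy


def minimize(seq: str) -> tuple[int, list[Coord]]:
    """Enumerate every self-avoiding conformation; return (min energy, one optimum)."""
    if len(seq) <= 2:
        return 0, [(i, 0) for i in range(len(seq))]
    coords: list[Coord] = [(0, 0), (1, 0)]  # fixing the first bond removes rotated copies
    occupied = set(coords)
    best_energy, best_coords = 1, []  # any complete conformation scores <= 0

    def extend() -> None:
        nonlocal best_energy, best_coords
        if len(coords) == len(seq):
            e = hp_energy(seq, coords)
            if e < best_energy:
                best_energy, best_coords = e, list(coords)
            return
        x, y = coords[-1]
        for dx, dy in MOVES:
            step = (x + dx, y + dy)
            if step not in occupied:  # self-avoidance
                coords.append(step)
                occupied.add(step)
                extend()
                coords.pop()
                occupied.remove(step)

    extend()
    return best_energy, best_coords


energy, conformation = minimize("HPPH")
print(energy, conformation)
```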

Ab Initio Methods

• Rosetta (Baker lab, Seattle)

• Undertaker (Karplus, UCSC)

Predicting Protein Function

PART 2

Inferring protein function:

• Based on the existence of known protein domains

• Based on homology

Protein Domains

• Domains can be considered as building blocks of proteins.

• Some domains can be found in many proteins with different functions, while others are only found in proteins with a certain function.

• The presence of a particular domain can be indicative of the function of the protein.

[Figure: a zinc-finger DNA-binding domain]

A protein domain can be defined by:

• A motif, e.g. Rxx(F,Y,W)(R,K)SAQ, where x stands for any amino acid and (F,Y,W) for any one of F, Y or W
• A profile (PSSM)
• A Hidden Markov Model

Profile Scoring
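Profile scoring can be sketched as building a position-specific scoring matrix (PSSM) from aligned examples and sliding it along a sequence. A minimal sketch under stated assumptions: a gap-free alignment, a uniform background frequency of 1/20, a flat pseudocount of 1 per amino acid, and log2 odds; the aligned segments are hypothetical examples that happen to fit the motif above, and the function names are illustrative.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(alignment: list[str], pseudocount: float = 1.0) -> list[dict[str, float]]:
    """Log2-odds score of every amino acid at every column of a gap-free alignment."""
    background = 1.0 / len(AMINO_ACIDS)  # uniform background: an assumption
    pssm = []
    for col in range(len(alignment[0])):
        column = [seq[col] for seq in alignment]
        total = len(column) + pseudocount * len(AMINO_ACIDS)
        pssm.append({
            aa: math.log2((column.count(aa) + pseudocount) / total / background)
            for aa in AMINO_ACIDS
        })
    return pssm


def score_window(pssm: list[dict[str, float]], window: str) -> float:
    """Sum of column scores for a window as long as the profile (standard residues only)."""
    return sum(column[aa] for column, aa in zip(pssm, window))


def best_window(pssm: list[dict[str, float]], sequence: str) -> tuple[int, float]:
    """Slide the profile along a longer sequence; return (start, score) of the best window."""
    k = len(pssm)
    return max(
        ((start, score_window(pssm, sequence[start:start + k]))
         for start in range(len(sequence) - k + 1)),
        key=lambda hit: hit[1],
    )


# Hypothetical aligned segments that all fit Rxx(F,Y,W)(R,K)SAQ.
alignment = ["RAAFRSAQ", "RGKYKSAQ", "RLSWRSAQ", "RDEFKSAQ"]
pssm = build_pssm(alignment)
print(best_window(pssm, "MMMMRGTYRSAQMMMM"))
```

Unlike the motif, the profile tolerates a residue never seen in the examples (T at the third position here) at a modest penalty instead of rejecting the match outright.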

PROSITE

• PROSITE is a database of protein domains that can be searched with either regular-expression patterns or sequence profiles.

Zinc_Finger_C2H2: C-x(2,4)-C-x(3)-[LIVMFYWC]-x(8)-H-x(3,5)-H
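A PROSITE pattern maps almost one-to-one onto a regular expression. This sketch translates a subset of PROSITE's syntax (single residues, x for any residue, [..] for allowed residues, {..} for forbidden residues, repeat counts (n) or (n,m), and the < and > terminal anchors) and runs it against a hypothetical sequence built to contain one C2H2 zinc-finger match. The function name is illustrative; it is not part of any PROSITE tool.

```python
import re

# One pattern element: optional "<", then x, a residue, [..] or {..},
# then an optional count (n) or (n,m), then an optional ">".
ELEMENT = re.compile(r"(<?)(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?(>?)")


def prosite_to_regex(pattern: str) -> str:
    """Translate (a subset of) PROSITE pattern syntax into a Python regex."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        m = ELEMENT.fullmatch(element)
        if m is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        start, body, low, high, end = m.groups()
        if body == "x":
            regex = "."                      # any amino acid
        elif body.startswith("{"):
            regex = "[^" + body[1:-1] + "]"  # any amino acid except those listed
        else:
            regex = body                     # single residue or [..] class: same in regex
        if low is not None:
            regex += "{" + low + ("," + high if high else "") + "}"
        parts.append(("^" if start else "") + regex + ("$" if end else ""))
    return "".join(parts)


zinc_finger = prosite_to_regex("C-x(2,4)-C-x(3)-[LIVMFYWC]-x(8)-H-x(3,5)-H")
print(zinc_finger)  # C.{2,4}C.{3}[LIVMFYWC].{8}H.{3,5}H

# Hypothetical sequence containing one match, starting at index 2.
seq = "AA" + "C" + "GG" + "C" + "GGG" + "L" + "G" * 8 + "H" + "GGG" + "H" + "AA"
match = re.search(zinc_finger, seq)
print(match.start(), match.group())
```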

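Profile HMM (Hidden Markov Model)

[Figure: profile HMM for alignment columns 16-19, with match states M16-M19, insert states I16-I19 and delete states D16-D19. Columns 16-19 of the underlying alignment (10 sequences):
D R T R
D R T S
S - - S
S P T R
D R T R
D P T S
D - - S
D - - S
D - - S
D - - R
Match-state emission probabilities: M16: D 0.8, S 0.2; M17: P 0.4, R 0.6; M18: T 1.0; M19: R 0.4, S 0.6. From M16, 50% of the sequences continue to M17 and 50% pass through the delete states D17 and D18.]

An HMM is a probabilistic model of the MSA consisting of a number of interconnected states (match, insert and delete states).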
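The emission probabilities in the figure follow directly from counting residues in each alignment column. A minimal sketch: plain maximum-likelihood counts with no pseudocounts or sequence weighting (profile-HMM software typically adds both), treating all four columns as match columns as the figure does; the function names are hypothetical.

```python
from collections import Counter

# Columns 16-19 of the figure's alignment, one string per sequence ("-" = gap).
ALIGNMENT = ["DRTR", "DRTS", "S--S", "SPTR", "DRTR",
             "DPTS", "D--S", "D--S", "D--S", "D--R"]


def match_emissions(alignment: list[str]) -> list[dict[str, float]]:
    """Maximum-likelihood emission probabilities for each match column (gaps ignored)."""
    emissions = []
    for col in range(len(alignment[0])):
        residues = [seq[col] for seq in alignment if seq[col] != "-"]
        counts = Counter(residues)
        emissions.append({aa: n / len(residues) for aa, n in sorted(counts.items())})
    return emissions


def occupancy(alignment: list[str], col: int) -> float:
    """Fraction of sequences in the match state (rather than the delete state) at a column."""
    return sum(seq[col] != "-" for seq in alignment) / len(alignment)


for position, probs in zip(range(16, 20), match_emissions(ALIGNMENT)):
    print(f"M{position}", probs)
print("fraction in M17:", occupancy(ALIGNMENT, 1))  # 0.5
```

Because every sequence occupies M16, the 0.5 occupancy of column 17 is also the M16 to M17 transition probability shown in the figure.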

• The Pfam database is based on two distinct classes of alignments
  – Seed alignments, which are deemed to be accurate and are used to produce Pfam A
  – Alignments derived by automatic clustering of SwissProt, which are less reliable and give rise to Pfam B
• Pfam contains a large collection of multiple sequence alignments and profile hidden Markov models (HMMs)
• High-quality seed alignments are used to build HMMs, to which sequences are then aligned

InterPro

InterPro was built from protein classification databases such as:
• PROSITE
• ProDom
• SMART
• Pfam
• PRINTS

Uses UniProt (Swiss-Prot and TrEMBL)

Databases and tools for protein families and domains

• InterPro - Integrated Resource of Protein Domains and Functional Sites
• PROSITE - A database of protein families and domains
• BLOCKS - BLOCKS db
• Pfam - Protein families db (HMM derived)
• PRINTS - Protein motif fingerprint db
• ProDom - Protein domain db (automatically generated)
• PROTOMAP - An automatic hierarchical classification of Swiss-Prot proteins
• SBASE - SBASE domain db
• SMART - Simple Modular Architecture Research Tool
• TIGRFAMs - TIGR protein families db

Inferring protein function based on sequence homology

Clusters of Orthologous Groups of proteins (COGs): classification of conserved genes according to their homologous relationships (Koonin et al., NAR)

Homologs - Proteins with a common evolutionary origin

Paralogs - Proteins encoded within a given species that arose from one or more gene duplication events.

Orthologs - Proteins from different species that evolved by vertical descent (speciation).

Clusters of Orthologous Groups of proteins (COGs)

Each COG consists of individual orthologous proteins or orthologous sets of paralogs from at least three lineages.

Orthologs typically have the same function, allowing transfer of functional information from one member to an entire COG.

COGs - Clusters of Orthologous Groups

* All-against-all sequence comparison of the proteins encoded in completed genomes (paralogs/orthologs)

* For a given protein “a” in genome A, if there are several similar proteins in genome B, the most similar one (“b”) is selected

* If, when protein “b” is used as a query, protein “a” in genome A is selected as the best hit, then “a” and “b” can be included in a COG

* Proteins in a COG are more similar to other proteins in the COG than to any other protein in the compared genomes

* A COG is defined when it includes at least three homologous proteins from three distant genomes
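The reciprocal best-hit rule and the three-genome requirement can be sketched on a toy score table. Assumptions, all hypothetical: similarity scores are given as `scores[(query, subject)]` with higher meaning more similar, protein IDs are written "genome:protein", and the sketch stops at finding triangles of mutual best hits across three genomes; merging triangles into larger COGs is not shown.

```python
from itertools import combinations
from typing import Optional

Scores = dict[tuple[str, str], float]


def genome_of(protein: str) -> str:
    return protein.split(":")[0]


def best_hit_in(scores: Scores, query: str, genome: str) -> Optional[str]:
    """Most similar protein to `query` among the proteins of one genome."""
    hits = {s: v for (q, s), v in scores.items() if q == query and genome_of(s) == genome}
    return max(hits, key=hits.get) if hits else None


def reciprocal_best_hits(scores: Scores) -> set[frozenset[str]]:
    """Pairs where b is a's best hit in b's genome and a is b's best hit in a's genome."""
    proteins = {p for pair in scores for p in pair}
    genomes = {genome_of(p) for p in proteins}
    pairs = set()
    for a in proteins:
        for genome in genomes - {genome_of(a)}:
            b = best_hit_in(scores, a, genome)
            if b is not None and best_hit_in(scores, b, genome_of(a)) == a:
                pairs.add(frozenset((a, b)))
    return pairs


def cog_triangles(pairs: set[frozenset[str]]) -> list[tuple[str, str, str]]:
    """Triples from three different genomes in which every pair is a reciprocal best hit."""
    proteins = sorted({p for pair in pairs for p in pair})
    return [
        (a, b, c)
        for a, b, c in combinations(proteins, 3)
        if len({genome_of(a), genome_of(b), genome_of(c)}) == 3
        and {frozenset((a, b)), frozenset((a, c)), frozenset((b, c))} <= pairs
    ]


# Hypothetical scores: A:a2 is a weaker homolog (e.g. a paralog of A:a1).
scores = {
    ("A:a1", "B:b1"): 90, ("A:a1", "C:c1"): 85,
    ("A:a2", "B:b1"): 40, ("A:a2", "C:c1"): 30,
    ("B:b1", "A:a1"): 90, ("B:b1", "A:a2"): 40, ("B:b1", "C:c1"): 80,
    ("C:c1", "A:a1"): 85, ("C:c1", "A:a2"): 30, ("C:c1", "B:b1"): 80,
}
pairs = reciprocal_best_hits(scores)
print(cog_triangles(pairs))  # [('A:a1', 'B:b1', 'C:c1')]
```

A:a2 is left out because, although B:b1 is its best hit in genome B, B:b1's best hit in genome A is A:a1, so the relationship is not reciprocal.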
