Swiss Institute of Bioinformatics / Institut Suisse de Bioinformatique
LF-2002.10
Protein function prediction
  Introduction
  Signal peptides
  Transmembrane regions and topology
  PTM (post-translational modifications)
  Low complexity and biased regions
  Repeats
  Coils
  Secondary structure
  Antigenic peptides
  Domain/Motifs
  Tools
  The EMBOSS package
Different techniques
  Algorithms
    Sliding window, nearest neighbor
    Patterns, regular expressions
    Weight matrices
    HMM, profiles
    Neural networks
    Rules
Sliding window
  THISISATESTSEQVENCETHATDISPLAYSTHESLIDINGWINDQW
  Score 1, Score 2, ..., Score n
  Width (or size) = 11, step = 5
  Results are usually displayed as a graph.
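The sliding-window idea can be sketched in a few lines of Python. This is a minimal illustration, not the method of any particular tool: the per-residue score used here is the Kyte-Doolittle hydropathy scale, which is an assumption (the slide does not name a scale), and the function name `sliding_window_scores` is hypothetical. The window width (11) and step (5) are the values from the slide.

```python
# Kyte-Doolittle hydropathy values, used here only as an example score
# (an assumption: the slide does not say which scale is applied).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def sliding_window_scores(seq, width=11, step=5):
    """Return (1-based window start, mean score) for each full window."""
    scores = []
    for start in range(0, len(seq) - width + 1, step):
        window = seq[start:start + width]
        # Unknown letters count as 0.0 so the sketch never raises.
        mean = sum(KD.get(res, 0.0) for res in window) / width
        scores.append((start + 1, round(mean, 2)))
    return scores


seq = "THISISATESTSEQVENCETHATDISPLAYSTHESLIDINGWINDQW"
for start, score in sliding_window_scores(seq):
    print(f"{start:3d}-{start + 10:3d}  {score:6.2f}")
```

Each printed line is one window (Score 1, Score 2, ..., Score n); plotting the score against the window start gives the usual graph.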
Patterns / regular expressions
  Pattern: <A-x-[ST](2)-x(0,1)-{V}
  Regexp: ^A.[ST]{2}.?[^V]
  Text: The sequence must start with an alanine, followed by any amino acid, followed by two residues that are each a serine or a threonine, followed by any amino acid or nothing, followed by any amino acid except valine.
  Only the syntax differs.
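Because only the syntax differs, a PROSITE-style pattern can be translated mechanically into a regular expression. The sketch below is a simplified converter under stated assumptions: it handles only the elements used in the pattern above (`x`, `[..]`, `{..}`, `(n)`, `(n,m)`, `<`, `>`), and the function name `prosite_to_regex` is hypothetical. It writes `x(0,1)` as `.{0,1}`, which is equivalent to the `.?` shown above.

```python
import re


def prosite_to_regex(pattern):
    """Translate a simple PROSITE pattern into a Python regular expression."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        anchor_start = element.startswith("<")   # '<' anchors at the N-terminus
        anchor_end = element.endswith(">")       # '>' anchors at the C-terminus
        element = element.strip("<>")
        # Split "residue(n)" or "residue(n,m)" into residue and repeat counts.
        m = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        residue, low, high = m.groups()
        if residue == "x":                       # any amino acid
            residue = "."
        elif residue.startswith("{"):            # {V} means "anything but V"
            residue = "[^" + residue[1:-1] + "]"
        if low is not None:
            residue += "{" + low + ("," + high if high else "") + "}"
        parts.append(("^" if anchor_start else "") + residue
                     + ("$" if anchor_end else ""))
    return "".join(parts)


regex = prosite_to_regex("<A-x-[ST](2)-x(0,1)-{V}")
print(regex)
print(bool(re.match(regex, "AGSTKL")))
print(bool(re.match(regex, "AGSTV")))
```

The first test sequence satisfies every element of the pattern; the second fails because the only residue left for the last position is a valine.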
Weight matrices (PSSM)
HMM / profiles
Neural networks
  General principle and example
Signals found in proteins
  N-ter
    exportation - secretion
    mitochondria
    chloroplast
  internal
    NLS (nuclear localization signal)
  C-ter
    GPI-anchor (Glycosyl Phosphatidyl Inositol)
    other membrane anchors (see PTM)
  other
    unknown ?
Signal detection tools
  SignalP, MitoProt, ChloroP, Predotar, PSort, TargetP, Sigcleave (EMBOSS)
  Big-PI, DGPI
Transmembrane regions
  Detection (signal peptide, hydropathy, helices)
  Organisation (topology)
Transmembrane detection tools
  TMHMM, TMPred, TopPred2, DAS, HMMTop, Tmap (EMBOSS)
Post-translational modifications
  Phosphorylation: S - T - Y
  N-glycosylation: N
  O-glycosylation: S - T - (HO)K
  Acetylation, methylation: D - E - K
  Sulfation: Y
  Farnesylation, myristylation, palmitoylation, geranylgeranylation, GPI-anchor: C - Nter - Cter
  Ubiquitination and family: K - Nter
  Inteins (protein splicing)
  Pre-translational
  Selenoprotein: C
PTM detection
  Pattern prediction (PROSITE)
    Short or weak signal
    Frequent hit producer
  Best method is experimental MS/MS detection
  Most methods use "rules" that combine pattern detection with prior knowledge to predict sites.
    NetOGlyc - Prediction of mucin-type O-glycosylation sites in mammalian proteins
    DictyOGlyc - Prediction of GlcNAc O-glycosylation sites in Dictyostelium
    YinOYang - O-beta-GlcNAc attachment sites in eukaryotic protein sequences
    NetPhos - Prediction of Ser, Thr and Tyr phosphorylation sites in eukaryotic proteins
    NMT - Prediction of N-terminal N-myristoylation
    Sulfinator - Prediction of tyrosine sulfation sites
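To see why a short pattern is a frequent hit producer, here is a minimal sketch that scans a sequence for the PROSITE N-glycosylation pattern N-{P}-[ST]-{P}. The function name `n_glycosylation_sites` and the test sequence are made up for illustration; a match only marks a candidate site and says nothing about whether the asparagine is actually glycosylated.

```python
import re

# PROSITE N-glycosylation pattern N-{P}-[ST]-{P}, written as a lookahead
# so that overlapping candidate sites are all reported.
N_GLYC = re.compile(r"(?=N[^P][ST][^P])")


def n_glycosylation_sites(seq):
    """Return 1-based positions of asparagines matching the pattern."""
    return [m.start() + 1 for m in N_GLYC.finditer(seq)]


# Hypothetical test sequence: N at position 6 is followed by P, so it is skipped.
print(n_glycosylation_sites("MNGTANPSNLSA"))
```

A four-residue pattern with two permissive positions matches by chance in many proteins, which is why rule-based tools add further knowledge before calling a site.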
Low complexity regions
  repeats
  compositional bias
  PEST
Low complexity / Repeats
  DUST (DNA) / SEG: de novo detection
  RepeatMasker (DNA): search collection
  REP: search collection
  REPRO, Radar: de novo detection
  PEST, PESTFind: de novo detection
  EMBOSS (DNA): einverted, equicktandem, etandem, palindrome
  EMBOSS (protein): oddcomp
Coils
  Helix of helix
    coiled-coil
    Leu-zipper
Coils detection
  COILS: weight matrices
  Paircoil, Multicoil: pairwise correlation
  Marcoil: HMM
  Pepcoil (EMBOSS): weight matrices
Secondary structure
  Structure to predict
    Alpha-helices, beta-sheets, turns, random coil
  Garnier (EMBOSS), PHD, DSC, PREDATOR, NNSSP, Jpred, Jnet, many others
Antigenic peptides
  Peptides binding to MHC
    class I: 8, 9, 10 mers
    class II: 15 mers (3+9+3)
  Depend highly on MHC type
  Use of experimental knowledge
    Databases of known peptides
  SYFPEITHI, HLA_Bind (BIMAS), MAPPP (combined expert), Antigenic (EMBOSS), many more
  Prediction of proteasome cleavage sites
    NetChop, PaProc
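The peptide lengths above define the search space a predictor has to score. The sketch below only enumerates candidates and splits a class II 15-mer into its 3+9+3 parts; it does no scoring, since that depends on MHC-specific experimental data. The function names and the example sequences are hypothetical.

```python
def candidate_peptides(seq, lengths=(8, 9, 10)):
    """Yield (1-based start, peptide) for every peptide of the given lengths."""
    for k in lengths:
        for start in range(len(seq) - k + 1):
            yield start + 1, seq[start:start + k]


def split_class_ii(peptide):
    """Split a 15-mer into (3-residue flank, 9-residue core, 3-residue flank)."""
    if len(peptide) != 15:
        raise ValueError("expected a 15-mer")
    return peptide[:3], peptide[3:12], peptide[12:]


# Made-up sequences, for illustration only.
class_i = list(candidate_peptides("ACDEFGHIKL"))
print(len(class_i), "class I candidates")
print(split_class_ii("ACDEFGHIKLMNPQR"))
```

Every candidate would then be scored against the matrix or database entries for one specific MHC type.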
Domain / Motif
  All the protein domain descriptors
    PROSITE, PFAM, SMART, PRODOM, BLOCKS, PRINTS, …
    Federation: InterPro
  Many techniques
    Patterns, regexp
    PSSM (PSI-BLAST)
    Profiles
    HMM
Other tools
  You can find some of them on our servers: www.ch.embnet.org
  Or on the ExPASy server: www.expasy.org/tools
  Or ask Google!! www.google.com
European Molecular Biology Open Software Suite
  Free, open source (for most Unix platforms)
  GCG successor (compatible with GCG file format)
  More than 100 programs
  Easy to install locally
    but no interface, requires local databases
  Unix command-line only
  Interfaces
    Jemboss, www2gcg, w2h, wemboss … (with account)
    Pise, EMBOSS-GUI (no account)
  Access: http://www.emboss.org
http://www.hgmp.mrc.ac.uk/Software/EMBOSS/Jemboss/index.html
Pise: a tool to generate Web interfaces for molecular biology programs
http://emboss.ch.embnet.org:8080/Pise
http://bioinfo.pbi.nrc.ca:8090/EMBOSS/
GUI (Canada)
Some details
  Format: USA (Uniform Sequence Address)
    'asis' :: Sequence [start : end : reverse]
    Format :: '@' ListFile [start : end : reverse]
    Format :: 'list' : ListFile [start : end : reverse]
    Format :: Database : Entry [start : end : reverse]
    Format :: Database - SearchField : Word [start : end : reverse]
    Format :: File : Entry [start : end : reverse]
    Format :: File : SearchField : Word [start : end : reverse]
    Format :: Program Program-parameters '|' [start : end : reverse]
  Example: fasta::Swissprot:UBP5_HUMAN[200:300]
  Databases
    Any can be added; use showdb to display the available databases
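To make the USA example concrete, the following sketch splits an address of the form Format :: Database : Entry [start : end : reverse] into its parts. It is a simplified illustration, not the EMBOSS parser: it covers only that one form, the function name `parse_usa` and the key names are hypothetical, and the assumption that the reverse flag is written as a trailing `r` should be checked against the EMBOSS documentation.

```python
import re

# Simplified grammar: [format::]source[:entry][[start:end[:r]]]
# Forms such as '@' list files, search fields and program pipes are not handled.
USA_RE = re.compile(
    r"^(?:(?P<format>\w+)::)?"
    r"(?P<source>[^:\[\]]+)"
    r"(?::(?P<entry>[^:\[\]]+))?"
    r"(?:\[(?P<start>-?\d+):(?P<end>-?\d+)(?::(?P<reverse>r))?\])?$"
)


def parse_usa(usa):
    """Return the parts of a simple USA as a dict; raise on unsupported forms."""
    m = USA_RE.match(usa)
    if m is None:
        raise ValueError(f"unsupported USA: {usa!r}")
    parts = m.groupdict()
    parts["start"] = int(parts["start"]) if parts["start"] else None
    parts["end"] = int(parts["end"]) if parts["end"] else None
    parts["reverse"] = parts["reverse"] is not None  # assumed 'r' suffix
    return parts


print(parse_usa("fasta::Swissprot:UBP5_HUMAN[200:300]"))
```

For the example above this separates the output format (fasta), the database (Swissprot), the entry (UBP5_HUMAN) and the sub-sequence range 200 to 300.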
Some tools for DNA
  redata    Search REBASE for enzyme name, references, suppliers etc
  remap     Display a sequence with restriction cut sites, translation etc
  restover  Finds restriction enzymes that produce a specific overhang
  restrict  Finds restriction enzyme cleavage sites
  showseq   Display a sequence with features, translation etc
  silent    Silent mutation restriction enzyme scan
  cirdna    Draws circular maps of DNA constructs
  lindna    Draws linear maps of DNA constructs
  revseq    Reverse and complement a sequence
  …
Example: remap
ECLAC E.coli lactose operon with lacI, lacZ, lacY and lacA genes.

  GACACCATCGAATGGCGCAAAACCTTTCGCGGTATGGCATGATAGCGCCCGGAAGAGAGT
          10        20        30        40        50        60
  ----:----|----:----|----:----|----:----|----:----|----:----|
  CTGTGGTAGCTTACCGCGTTTTGGAAAGCGCCATACCGTACTATCGCGGGCCTTCTCTCA

# Enzymes that cut  Frequency  Isoschizomers
  AciI              1
  Bsc4I             1
  BsiSI             1
  BssKI             1
  Bsu6I             1
  HhaI              2
  Hin6I             2          HinP1I,HspAI
  TaqI              1

# Enzymes that do not cut
  AclI BamHI BceAI Bse1I BshI ClaI EcoRI EcoRII Hin4I HindII HindIII HpyCH4IV KpnI NotI
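The frequencies reported by remap can be checked by hand with a plain string search. The sketch below is not how remap works internally: it looks only at the 60 bases shown above, uses four well-known palindromic recognition sites (TaqI TCGA, HhaI GCGC, EcoRI GAATTC, BamHI GGATCC), and reports where each recognition site starts rather than the exact cut position. The function name `find_sites` is hypothetical.

```python
# Recognition sequences for a few enzymes; all four are palindromic,
# so scanning one strand is enough.
SITES = {"TaqI": "TCGA", "HhaI": "GCGC", "EcoRI": "GAATTC", "BamHI": "GGATCC"}


def find_sites(seq, sites=SITES):
    """Return {enzyme: [1-based start of each recognition site]}."""
    hits = {}
    for enzyme, site in sites.items():
        positions = []
        pos = seq.find(site)
        while pos != -1:
            positions.append(pos + 1)
            pos = seq.find(site, pos + 1)  # step by 1 to allow overlaps
        hits[enzyme] = positions
    return hits


eclac_60 = "GACACCATCGAATGGCGCAAAACCTTTCGCGGTATGGCATGATAGCGCCCGGAAGAGAGT"
for enzyme, positions in find_sites(eclac_60).items():
    print(f"{enzyme:6s} {len(positions)}  {positions}")
```

The counts agree with the remap table for these enzymes: TaqI cuts once, HhaI twice, and EcoRI and BamHI do not cut this fragment.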
Example: cirdna
File: ../../data/data.cirp

Start 1001
End 4270

group
label
Block 1011 1362 3
ex1
endlabel
label
Tick 1610 8
EcoR1
endlabel
label
Block 1647 1815 1
endlabel
label
Tick 2459 8
BamH1
endlabel
label
Block 4139 4258 3
ex2
endlabel
endgroup

group
label
Range 2541 2812 [ ] 5
Alu
endlabel
label
Range 3322 3497 > < 5
MER13
endlabel
endgroup
Exercises
DEA exercises: web-based sequence analysis
The goal of this exercise is to use web-based tools for protein sequence analysis.
a) Take this TrEMBL sequence (Q9X252) and run a BLAST search against Swiss-Prot with the complete protein and with only the first 70 residues. Explain the difference. Use TMPred, SignalP, and COILS to help you.
b) Pass this sequence through PFSCAN and search all databases. Compare with this command on ludwig-sun1/2: hits -b "prf pat pfam" tr:Q9X252
c) Use the different profile, motif, and pattern databases to get more information about the domain(s) you found.
d) How do you evaluate the PRINTS tropomyosin annotation in this TrEMBL entry (Q9WZH0)?
List of useful links:
  basic BLAST or advanced BLAST or PSI-BLAST
  TMPred prediction tool for transmembrane regions (or TMHMM)
  COILS prediction tool for coiled-coil regions
  SignalP prediction tool for signal-peptide cleavage site
Profile, domain, motif databases and search sites:
  PFSCAN
  InterPro (Pfam, PRINTS, PROSITE, SMART)
  HITS