Bioinformatics and Protein Sequence Analysis
Surabhi Agarwal
With the sequencing of a large number of proteins and the subsequent storage of the data, it has become easier for researchers to study proteins. Such studies provide preliminary insights into the structural and functional aspects of proteins without conducting experiments.
Master Layout (Part 1)
This animation consists of 2 parts:
Part 1: Protein Sequence Alignment
Part 2: Alignment analysis and interpretations
Extract the newly determined amino acid sequence for your query peptide.
Assess the significance of the result with its alignment score.
(Figure: Seq 1, Seq 2 and Seq 3)
Definitions of the components
Part 1 – Protein sequence alignment
1. Query Peptide: This refers to the unknown protein or peptide that is provided as input to the sequence analysis server. The sequence of this protein is determined before it is analyzed further for similarity matches with other proteins.
2. Relevant Algorithm: An algorithm is the sequence of logical steps used to compare the query peptide with other protein sequences. The nature of the query, such as “Local” or “Global” and “Pair-wise alignment” or “Multiple Sequence Alignment”, determines which algorithm is used.
3. Local Alignment: “Local” alignment matches individual blocks of protein sequences; the alignment is broken off at positions where the sequences stop matching. The aim of such alignment studies is to find the longest possible blocks of similarity in the aligned protein sequences.
4. Global Alignment: “Global” alignment is an end-to-end alignment of two or more sequences, in which gaps are introduced at positions where residues in one sequence have no counterpart in the other.
5. Pair-wise sequence alignment: This procedure compares and aligns two given sequences. The comparison can be either global or local, and the quality of the alignment is judged by the alignment score.
6. Multiple Sequence Alignment: This refers to the end-to-end alignment of several sequences provided to the search engine. Multiple alignment tries to introduce as few gaps as possible and finds regions of similarity shared by all the given sequences.
7. Word length: The minimum length of an amino acid stretch that must match exactly to initiate an alignment, which is then extended in either direction. The sensitivity and speed of the alignment depend on the word length chosen by the user: shorter words make the search more sensitive but slower.
8. Scoring Matrix: The matrix of values used to assign a score to each aligned pair of residues. The matrix used for a BLAST search is selected according to the type of sequences being searched. The two common families are the PAM series and the BLOSUM series.
a) PAM: PAM stands for Point Accepted Mutations. It is a log-odds scoring matrix built from the amino acid replacements observed in a set of closely related proteins. The PAM number indicates the amount of accepted mutation: 1 PAM corresponds to an average of one accepted change per 100 amino acid residues (1%).
b) BLOSUM: This stands for “BLOcks SUbstitution Matrix”, and it is constructed from a set of more distantly related proteins. BLOSUM is useful when the evolutionary distance between the proteins is not known beforehand. It is based on the observed frequencies of amino acid substitutions in highly conserved blocks of residues from evolutionarily distant proteins; in BLOSUM62, for example, sequences sharing more than 62% identity are clustered together when these frequencies are counted.
9. Threshold: The threshold provides a measure of the statistical significance of the results of an alignment study and represents the number of matches expected to occur by chance.
10. Gap Penalty and Gap Extension: In an alignment of two or more protein sequences, a gap is introduced where residues in one sequence have no counterpart in the other (an insertion or deletion). In this context, the “gap penalty” (gap existence cost) is deducted from the overall alignment score when a gap is opened, while the “gap extension” penalty is deducted for each position by which an existing gap is lengthened.
11. Alignment Score: The alignment score, usually reported as a bit score (a normalized form of the raw score), provides a comparative measure of the quality of an alignment. The score increases with more residue matches and fewer mismatches and gaps. The alignment with the higher bit score is the better match.
12. Percentage Identity: This is the percentage of aligned positions at which the two compared sequences have identical amino acid residues.
13. E-value: The E-value is the number of alignments with an equal or better score that would be expected to occur by chance, rather than as a biologically meaningful match. For a similarity search against a database, this value depends on the size of the database. The closer the E-value is to zero, the more significant the match.
14. Hit: Each result of a search is called a ‘hit’, and the ‘best hit’ is the top-scoring result for that particular query.
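Definitions 10 and 12 can be illustrated with a short calculation. The Python sketch below takes two sequences that are already aligned (gaps written as '-'), counts identical columns to get the percentage identity, and charges each run of gap characters an existence cost plus one extension cost per gap position, using the “Existence 11, Extension 1” values from the input examples in this animation. Dividing by the full alignment length, gaps included, is an assumption; tools differ in the denominator they use. The function name `alignment_summary` is hypothetical.

```python
def alignment_summary(top: str, bottom: str,
                      existence: int = 11, extension: int = 1) -> tuple[float, int]:
    """Return (percentage identity, total gap penalty) for two aligned sequences.

    Gaps are written as '-'. A run of k gap characters is charged
    existence + k * extension (e.g. 11 + 2 * 1 = 13 for a gap of length 2).
    """
    if len(top) != len(bottom) or not top:
        raise ValueError("aligned sequences must be non-empty and of equal length")

    # Percentage identity: identical residue columns over all alignment columns.
    identical = sum(1 for a, b in zip(top, bottom) if a == b and a != "-")
    identity = 100.0 * identical / len(top)

    # Affine gap penalty: one existence cost per gap run, one extension cost per gap position.
    penalty = 0
    for row in (top, bottom):
        run = 0
        for ch in row + "X":  # sentinel character closes a gap run at the end of the row
            if ch == "-":
                run += 1
            elif run:
                penalty += existence + run * extension
                run = 0
    return identity, penalty


identity, penalty = alignment_summary("KLV--A", "KIVRRA")
print(f"{identity:.1f}% identity, gap penalty {penalty}")  # 50.0% identity, gap penalty 13
```

Charging a gap run once for existence and per position for extension is what makes one long gap cheaper than several short ones of the same total length.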
Step 1: Pair-wise sequence alignment for two given sequences - INPUT
SEQUENCE DATABASE
ALIGNMENT ALGORITHM (BLAST)
Enter sequence 1
Schematic of the process of pair-wise alignment
Follow the animation steps. Re-draw all figures. Show all definitions first by highlighting each parameter. Follow this with the input of the 2 sequences and the parameter values one by one. The drop-down list after “Scoring Matrix” should look like the drop-down lists seen on web pages. Click on the drop-down list and show the BLOSUM62 matrix being selected. Click on the BLAST tool.
Alignment algorithms are computer algorithms that take 2 protein sequences and align them residue by residue. Here we depict an alignment between 2 given sequences. To align two sequences, enter them in the input boxes. We take the example of the CBR-COL-186 protein of Caenorhabditis briggsae and the collagen of Caenorhabditis elegans. The sequences are abridged for the purpose of the animation; to carry out the exact study, users can download the full sequences using the identifiers given in their headers. Enter the parameters according to the nature of the query and the purpose of the search, and finally click on the BLAST tool.
>gi|268576797|ref|XP_002643378.1| C. briggsae CBR-COL-186 protein [Caenorhabditis briggsae]
MKSTEKKSTELDLELEAQSLRRIAFFGVAMSTVATFVCIITVPLAYNKMQQMQSNMIDQYMASARGIRVA …
Enter sequence 2
>gi|6682|emb|CAA35955.1| collagen [Caenorhabditis elegans]
MSEDLKQIAQETESLRKVAFFGIAVSTIATLTAIIAVPMLYNYMQHVQSSLQSEVEFCQHRSNGLWDEYK …
Word Size: 3 (length of the initial set of amino acids that must match before the alignment begins)
Threshold: 10 (expected number of matches that are allowed to occur by chance)
Gap penalty: Existence 11, Extension 1 (values deducted from the overall alignment score when a gap is introduced and when it is extended)
Scoring Matrix: BLOSUM62, selected from a drop-down list offering PAM30 and BLOSUM62 (the reference matrix used to assign scores to pairs of aligned residues)
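The residue-by-residue alignment described in this step is classically computed by dynamic programming: Needleman-Wunsch for global alignment and Smith-Waterman for local alignment (BLAST itself speeds up the search with word-based heuristics). The minimal sketch below returns only the optimal score and, as simplifying assumptions, uses a toy match/mismatch score in place of BLOSUM62 and a single linear gap penalty in place of separate existence and extension costs. The name `align_score` is hypothetical.

```python
def align_score(a: str, b: str, match: int = 1, mismatch: int = -1,
                gap: int = -2, local: bool = False) -> int:
    """Optimal alignment score by dynamic programming.

    local=False: global (Needleman-Wunsch style), end-to-end alignment.
    local=True:  local (Smith-Waterman style), best-scoring block only.
    """
    rows, cols = len(a) + 1, len(b) + 1
    dp = [[0] * cols for _ in range(rows)]
    if not local:
        # Global alignment pays for leading gaps along the first row and column.
        for i in range(1, rows):
            dp[i][0] = i * gap
        for j in range(1, cols):
            dp[0][j] = j * gap

    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            cell = max(dp[i - 1][j - 1] + pair,   # align a[i-1] with b[j-1]
                       dp[i - 1][j] + gap,        # gap in b
                       dp[i][j - 1] + gap)        # gap in a
            if local:
                cell = max(cell, 0)               # a local block may start afresh anywhere
                best = max(best, cell)
            dp[i][j] = cell
    return best if local else dp[-1][-1]


print(align_score("WWWKLVAWWW", "KLVA"))              # -8 (global)
print(align_score("WWWKLVAWWW", "KLVA", local=True))  # 4 (local)
```

With these toy scores, a shared block surrounded by unrelated residues gives a negative global score but a positive local score, which is the distinction drawn between global and local alignment in the definitions.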
Shows the various output formats for pair-wise alignment
Show a smaller image of the server, with each output and its definition emerging from it one at a time, as shown in the PowerPoint animation.
Pair-wise alignment with the BLOSUM62 matrix gives several kinds of results: the alignment itself, the alignment score, a dot-plot, the percentage identity and the E-value. For the same pair of sequences, the raw score is 189 with the BLOSUM62 matrix and 178 with the PAM30 matrix, while the bit scores are 77.4 and 78.7 respectively. The raw scores cannot be compared directly because each matrix uses its own scale; the bit scores are normalized for the scoring system and therefore give a uniform measure of the overall quality of the alignment. The result is highly significant, as the E-value is very close to 0. For a more detailed study of the types of BLAST tools available, visit http://blast.ncbi.nlm.nih.gov/Blast.cgi
Step 2: Pair-wise sequence alignment for two given sequences - OUTPUT
http://blast.ncbi.nlm.nih.gov/Blast.cgi
ALIGNMENT:
Sequence 1  LELEAQSLRRIAFFGVAMSTVATFVCIITVPLAYNKMQQMQSNMIDQYMASARGIRVARR
            +  E +SLR++AFFG+A+ST+AT   II VP+ YN MQ +QS++  +
Sequence 2  IAQETESLRKVAFFGIAVSTIATLTAIIAVPMLYNYMQHVQSSLQSE----------VEF
Shows the match or mismatch between each pair of aligned residues
Gaps are introduced in sequence 2 where it has no residues corresponding to those in sequence 1
DOT-PLOT
A dot-plot is a graphical visualization of two sequences plotted against each other, used to find approximate overlaps and identify regions of close similarity
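A dot-plot can be produced with a simple comparison grid. In the sketch below (function name hypothetical), a cell is marked when a window of `window` residues starting at that position of sequence 1 exactly matches the window starting at that position of sequence 2; runs of marks along a diagonal indicate similar regions. Exact-match windows are a simplifying assumption; dot-plot tools often use a scoring matrix and a threshold instead. The example uses short stretches taken from the two aligned sequences shown in this step.

```python
def dot_plot(seq1: str, seq2: str, window: int = 1) -> list[str]:
    """Text dot-plot: row i, column j is '*' when the window of length `window`
    starting at seq1[i] exactly matches the window starting at seq2[j]."""
    rows = []
    for i in range(len(seq1) - window + 1):
        word = seq1[i:i + window]
        rows.append("".join(
            "*" if seq2[j:j + window] == word else "."
            for j in range(len(seq2) - window + 1)
        ))
    return rows


# Stretches from sequence 1 (rows) and sequence 2 (columns) of the alignment above.
for line in dot_plot("SLRRIAFFG", "SLRKVAFFG", window=3):
    print(line)
```

Raising `window` suppresses isolated chance matches, so that only longer diagonal runs remain visible.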
BIT SCORE
Bit scores are normalized scores, obtained by rescaling the raw score with statistical parameters that depend on the scoring matrix used in the algorithm
77.4 bits
PERCENTAGE IDENTITY
The percentage of residues which were identical in the two sequences
34%
E-VALUE
The statistical measure of the significance of the match. The closer the E-value is to 0, the higher the significance
6e-19
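The normalization behind bit scores follows the Karlin-Altschul statistics used by BLAST: S' = (λS - ln K) / ln 2, and E = m × n × 2^(-S'), where m and n are the effective lengths of the query and the subject (or database). The sketch below assumes λ = 0.267 and K = 0.041, the gapped values that NCBI BLAST reports for BLOSUM62 with gap costs 11/1; these values are not stated in the animation. With them, the raw score of 189 converts to the 77.4 bits shown in this output. The effective lengths behind the E-value of 6e-19 are not given, so the E-value function is shown without reproducing that number. The function names are hypothetical.

```python
import math


def bit_score(raw_score: float, lam: float, k: float) -> float:
    """Karlin-Altschul normalization: S' = (lambda * S - ln K) / ln 2."""
    return (lam * raw_score - math.log(k)) / math.log(2)


def e_value(bits: float, m: float, n: float) -> float:
    """Expected number of chance alignments scoring at least `bits`: E = m * n * 2^(-S')."""
    return m * n * 2.0 ** (-bits)


# Assumed parameters: gapped lambda and K reported by NCBI BLAST for BLOSUM62, gaps 11/1.
LAMBDA_BLOSUM62, K_BLOSUM62 = 0.267, 0.041

bits = bit_score(189, LAMBDA_BLOSUM62, K_BLOSUM62)
print(f"{bits:.1f} bits")  # 77.4 bits
```

Because E is proportional to the search space m × n, the same bit score yields a larger E-value when a query is searched against a larger database.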
Schematic of the process of pair-wise alignment
Alignment can also be done by matching a sequence against a database of related sequences in order to identify it. Input the unknown sequence, then select the database against which it is to be matched. Fill in the parameter values according to the purpose of the search and the nature of the query sequence. In this case, we study the hits using the PAM30 scoring matrix. Click on the BLAST tool once all the parameters have been entered.
Step 3: Pair-wise alignment of sequences against database- INPUT
Follow the animation steps. Re-draw all figures. Show all definitions first by highlighting each parameter. Follow this with the input of 1 sequence. The drop-down lists after “Select Database” and “Scoring Matrix” should look like the drop-down lists seen on web pages. Select “Protein” in the “Select Database” options box as shown in the animation. Follow this by entering the parameter values one by one. Click on the drop-down list next to “Scoring Matrix” and show the PAM30 matrix being selected. Click on the BLAST tool.
SEQUENCE DATABASE
ALIGNMENT ALGORITHM (BLAST)
Enter sequence 1
MSEDLKQIAQETESLRKVAFFGIAVSTIATLTAIIAVPMLYNYMQHVQSSLQSEVEFCQHRSNGLWDEYKRFQGVSGVEGRIKRDAYHRSLGVSGASRKARRQSYGNDAAVGGFGGSSGGSCCSCGSGAAGPAGSPGQDGAPGNDGAPGAPGNPGQDASEDQTAGPDSFCFDCPAGPPGPSGAPGQKGPSGAPGAPGQSGGAALPGPPGP
Word Size: 3
Threshold: 10
Gap penalty: Existence 11, Extension 1
Scoring Matrix: PAM30 (options: PAM30, BLOSUM62)
Select Database: PROTEIN (options: PROTEIN, NUCLEOTIDE, GENE, PROTEOME, GEO, EST, SNP)
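The word size parameter in this form controls how a BLAST-style search finds starting points: it looks for short words shared by the query and each database sequence before extending them into alignments. The sketch below is a deliberate simplification that only counts exact word matches and ranks database entries by that count; real BLAST also accepts similar “neighborhood” words scored with the matrix, then extends and scores each hit. The functions and the two-entry database, built from fragments of sequences shown elsewhere in this animation, are hypothetical.

```python
def word_hits(query: str, subject: str, word_size: int = 3) -> list[tuple[int, int]]:
    """Return (query_pos, subject_pos) pairs where a word of length word_size matches exactly."""
    index: dict[str, list[int]] = {}
    for i in range(len(query) - word_size + 1):
        index.setdefault(query[i:i + word_size], []).append(i)  # word -> query positions
    hits = []
    for j in range(len(subject) - word_size + 1):
        for i in index.get(subject[j:j + word_size], []):
            hits.append((i, j))
    return hits


def rank_database(query: str, database: dict[str, str],
                  word_size: int = 3) -> list[tuple[str, int]]:
    """Rank database entries (name -> sequence) by their number of exact word hits."""
    counts = {name: len(word_hits(query, seq, word_size)) for name, seq in database.items()}
    return sorted(counts.items(), key=lambda item: item[1], reverse=True)


database = {
    "col-13 fragment": "MSEDLKQIAQETESLRKVAFFGIAVSTIATLTAIIAVPMLYNYMQHVQSS",
    "col-96 fragment": "MDEITRRNAYRFVAYSAVTFSVVAVFSLCITLPMVYNYVHGIKSQ",
}
print(rank_database("MSEDLKQIAQETESLRKVAFFGIAVSTIAT", database))
```

Hits that fall on the same diagonal (equal `subject_pos - query_pos`) are the ones a seed-and-extend search would grow into a longer alignment.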
Shows the various output formats for pair-wise alignment
Show a smaller image of the server, with each output and its definition emerging from it one at a time, as shown in the PowerPoint animation.
Pair-wise alignment against a database gives several kinds of results, including alignment views, the alignment score, a dot-plot, the E-value and the percentage identity. Compared with the bit scores of the other hits, the bit score is highest for the collagen protein of Caenorhabditis elegans.
Step 4: Pair-wise alignment of sequences against database- OUTPUT
(Smaller image of the server: SEQUENCE DATABASE, ALIGNMENT ALGORITHM (BLAST), sequence input box, Word Size 3, Threshold 10, Gap penalty Existence 11, Extension 1, Scoring Matrix BLOSUM, Select Database options)
IDENTIFICATION
GENE ID: 179452 col-13 | Collagen [Caenorhabditis elegans]
Identifies the protein sequence and the source organism of the unknown sequence
ALIGNMENT:
Query     MSEDLKQIAQETESLRKVAFFGIAVSTIATLTAIIAVPMLYNYMQHVQSSLQSEVEFCQH
          MSEDLKQIAQETESLRKVAFFGIAVSTIATLTAIIAVPMLYNYMQHVQSSLQSEVEFCQH
Database  MSEDLKQIAQETESLRKVAFFGIAVSTIATLTAIIAVPMLYNYMQHVQSSLQSEVEFCQH
The alignment shows a 100% match with the identified sequence
TOTAL SCORE
624 bits
Measure of the quality of the alignment when compared with the bit scores of the other hits of the search
E-VALUE
1e-176
In the case of database searches, the E-value is obtained by multiplying the pair-wise E-value by the number of sequences in the database.
http://www.ncbi.nlm.nih.gov/BLAST/tutorial/Altschul-1.html; http://pfam.sanger.ac.uk/
PERCENTAGE IDENTITY
100%
Percentage of residues that match exactly between the query sequence and the selected hit
DOMAIN IDENTIFIED (IF ANY)
The query is scanned for domains from the Pfam database. If such a domain is identified, it is shown as part of the result.
(Figure: position of the identified domain along the query sequence, in residues)
Pfam ID: pfam01484; Domain Name: Col_cuticle_N; Description: Nematode cuticle collagen N-terminal domain
SEQUENCE DATABASE
MULTIPLE SEQUENCE ALIGNMENT (CLUSTAL-W)
Enter sequence 1
Schematic of the process of multiple sequence alignment
Follow the animation steps. Enter the first 2 sequences. Click on “Add more sequences”. Open the 3rd input box for entering the 3rd sequence. Show the input of the 3rd sequence. Show the input of the parameters. Select “Absolute” from the “Score Type” drop-down list. The drop-down lists should look like those seen on web pages.
Multiple Sequence Alignment tools are used to compare the amino acid sequences of more than two proteins. The word size is the length of the seed set of amino acids that must match exactly before the alignment is extended in both directions. The window length is the number of residues on either side up to which the alignment will be extended. The gap penalty and gap extension have the same meaning as in pair-wise alignment. For the scores, users can choose to see either absolute scores or percentage values.
>gi|268574584|ref|XP_002642271.1| Hypothetical protein CBG18259 [Caenorhabditis briggsae]
MDEKQRLQAYRFVAYSAVTFSTVAVFSLCITLPLVYNYVDGIKTQINHEIKFCKHSARDIFAEVNHIRANPKNASRFARQAGYGTDEAVSGGS
Enter sequence 2
>gi|32565788|ref|NP_871711.1| COLlagen family member (col-96) [Caenorhabditis elegans]
MDEITRRNAYRFVAYSAVTFSVVAVFSLCITLPMVYNYVHGIKSQINHQISFCKHSARDIFSEVNHIRASPNNATLREKRQAGDCSGCCL
Word Size: 3
Window length: 10
Gap penalty: Existence 11, Extension 1
Score type: ABSOLUTE (options: ABSOLUTE, PERCENTAGE)
Enter sequence 3
>gi|17559060|ref|NP_505677.1| COLlagen family member (col-13) [Caenorhabditis elegans]
MSEDLKQIAQETESLRKVAFFGIAVSTIATLTAIIAVPMLYNYMQHVQSSLQSEVEFCQHRSNGLWDEYKRFQGVSGVEGRIKRDAYH
ADD MORE SEQUENCES
Step 5: Multiple Sequence Alignment - INPUT
The word size is the length of the initial seed set of amino acids that must match exactly for the alignment to be extended in both directions.
The window length is the number of residues on either side of the initial matched stretch up to which the alignment will be extended.
Users can choose to see either absolute scores or percentage values of the scores.
(Smaller image of the server: SEQUENCE DATABASE, MULTIPLE SEQUENCE ALIGNMENT (CLUSTAL-W), input boxes for sequences 1 and 2, Word Size 3, Threshold 10, Gap penalty Existence 11, Extension 1, Scoring Matrix BLOSUM)
MULTIPLE SEQUENCE ALIGNMENT
sequence 1  MDE-----KQRLQAYRFVAYSAVTFSTVAVFSLCITLPLVYNYVDGIKTQ
sequence 2  MDE-----ITRRNAYRFVAYSAVTFSVVAVFSLCITLPMVYNYVHGIKSQ
sequence 3  MSEDLKQIAQETESLRKVAFFGIAVSTIATLTAIIAVPMLYNYMQHVQSS
COLOR CODED ALIGNMENT
Shows the various output formats for multiple sequence alignment
Show the smaller image of the server with every output coming out of it one at a time
Multiple sequence alignment gives several kinds of results. The alignment view in text format displays the residue-wise matching of the input sequences. The color-coded alignment gives a clearer graphical picture, as the amino acid residues are assigned colors based on their physico-chemical properties; here we depict one of the many available color schemes. The alignment score is an absolute value, as selected previously, and can be compared with other scores to assess the quality of the alignment. Users obtain a .output file summarizing the result, an .aln file containing the text alignment and a .dnd file containing the distance-based (guide tree) information. For a detailed understanding of these outputs, visit http://www.ebi.ac.uk/Tools/clustalw2/index.html
Text alignment of query sequences
http://www.ebi.ac.uk/Tools/es/cgi-bin/clustalw2/
Sequence 1, Sequence 2, Sequence 3
ALIGNMENT SCORE
5269
Color coded alignment of query sequences
Alignment score, which can be compared with other scores to measure the quality of the alignment
Mapping of colors to amino acid groups
Step 6: Multiple Sequence Alignment - OUTPUT
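One common way to give a multiple alignment a single absolute score is the sum-of-pairs score: every pair of rows is scored in every column and the results are added. ClustalW computes its reported alignment score in its own way, so this sketch, with toy match/mismatch/gap values and a hypothetical function name, illustrates the idea of an absolute score rather than reproducing the value 5269.

```python
from itertools import combinations


def sum_of_pairs(alignment: list[str], match: int = 1, mismatch: int = -1,
                 gap: int = -2) -> int:
    """Sum-of-pairs score of a multiple alignment (rows are equal-length, gapped strings).

    Every pair of rows is scored in every column: residue vs. residue scores
    `match` or `mismatch`, residue vs. gap scores `gap`, and gap vs. gap scores 0.
    """
    if len({len(row) for row in alignment}) != 1:
        raise ValueError("need at least one row, all of equal length")
    total = 0
    for column in zip(*alignment):
        for a, b in combinations(column, 2):
            if a == "-" and b == "-":
                continue
            if a == "-" or b == "-":
                total += gap
            else:
                total += match if a == b else mismatch
    return total


rows = [
    "MDE-----KQRLQAYRFVAYSAVTFSTVAVFSLCITLPLVYNYVDGIKTQ",
    "MDE-----ITRRNAYRFVAYSAVTFSVVAVFSLCITLPMVYNYVHGIKSQ",
    "MSEDLKQIAQETESLRKVAFFGIAVSTIATLTAIIAVPMLYNYMQHVQSS",
]
print(sum_of_pairs(rows))
```

A score of this kind is only meaningful relative to other alignments scored with the same parameters, which matches the narration's point that the absolute score is used for comparison.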
Master Layout (Part 2)
Protein secondary structures
Structural features that decide function
Phylogram representing evolutionary relationships
Definitions of the components
Part 2 – Alignment analysis and interpretations
1. Computational Phylogenetic Predictions: Sequence alignment studies of proteins can reveal the conserved and variable residues between sequences. Protein sequences from different organisms that share a high degree of similarity are assumed to descend from a common ancestor. Such predictions, which can now be carried out computationally with the help of various algorithms, provide insight into evolutionary processes.
2. Phylogram: A phylogram is a pictorial representation of evolutionary relationships, or phylogeny, in which the branch lengths of the tree are proportional to the evolutionary distance.
3. Cladogram: A cladogram is another pictorial representation of evolutionary relationships, or phylogeny. Unlike a phylogram, the branches of a cladogram are of equal length irrespective of the evolutionary distance.
4. Maximum Parsimony: A method that selects the tree requiring the fewest evolutionary changes. It is used for alignments that show very strong sequence similarity and is usually applied to fewer than twelve sequences.
5. Distance methods: These methods estimate evolutionary distances from the observed sequence variation and can be used on a large number of sequences. As the distance between two sequences increases, the uncertainty of the alignment also increases.
6. Maximum likelihood: This method is useful for predicting evolutionary distances when sequence variability is high; it can be used for alignments with any amount of variability.
7. Protein structure prediction: The three-dimensional structure of a protein is largely specified by its amino acid sequence. Protein secondary structure can be predicted from the sequence with an accuracy of about 70-75%.
8. Functional annotation: The functions of a protein can be predicted when it has well-described homologs. Gene Ontology terms (GO terms) provide unique identifiers for the functions in which a gene product is involved; these functions are organized at different levels of a functional hierarchy.
9. Protein motif: A recurring pattern of residues shared by a set of protein sequences is known as a motif.
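Distance methods start from a number describing how far apart two aligned sequences are. The sketch below uses two standard textbook measures, chosen here as an illustration: the p-distance (fraction of differing sites, skipping gapped columns) and the Poisson correction d = -ln(1 - p), which allows for several substitutions having occurred at the same site. PHYLIP's protdist program offers more elaborate protein models. The function names are hypothetical.

```python
import math


def p_distance(seq1: str, seq2: str) -> float:
    """Fraction of differing sites between two aligned sequences, skipping gapped columns."""
    pairs = [(a, b) for a, b in zip(seq1, seq2) if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("no ungapped columns to compare")
    return sum(a != b for a, b in pairs) / len(pairs)


def poisson_distance(p: float) -> float:
    """Poisson correction d = -ln(1 - p): estimated substitutions per site."""
    if p >= 1.0:
        return math.inf  # saturated: the distance can no longer be estimated
    return -math.log(1.0 - p)


p = p_distance("MDE-----KQRLQAYRFVAYSAVTFSTV", "MDE-----ITRRNAYRFVAYSAVTFSVV")
print(round(p, 3), round(poisson_distance(p), 3))
```

The correction grows without bound as p approaches 1, which is one way to see why the uncertainty of a distance estimate rises as sequences diverge.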
Step 1: Phylogenetic analysis from alignment- Input
SEQUENCE DATABASE
PHYLOGENETIC ANALYSIS (PHYLIP)
Enter a sequence alignment for 2 or more sequences
Schematic of the process of analysis of alignment
Follow the animation steps. Show the description of each method as the mouse hovers over it. Finally, select the “Maximum Parsimony” method. The drop-down list should look like the drop-down lists seen on web pages.
Multiple sequence alignment produces alignment files (.aln), which can be used to determine the evolutionary distances between a set of protein sequences. This can be done with many server-based and stand-alone programs, and the user selects the method for calculating the distances. Here we depict the use of alignment files for phylogenetic analysis.
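Turning pairwise distances from an alignment into a tree can be sketched with UPGMA, chosen here only because it is short: it repeatedly merges the two closest clusters and places each merge at half their distance, which assumes a constant rate of evolution. PHYLIP's neighbor program offers both UPGMA and neighbor-joining, and ClustalW builds its guide tree by neighbor-joining. The function name and the example distances are hypothetical; the output is a Newick string of the kind stored in .dnd files.

```python
def upgma(names: list[str], dist: list[list[float]]) -> str:
    """Cluster taxa by UPGMA and return the rooted tree in Newick format."""
    # cluster id -> (Newick text, height of the cluster's root, number of taxa)
    clusters = {i: (name, 0.0, 1) for i, name in enumerate(names)}
    # distances keyed by (smaller id, larger id)
    d = {(i, j): dist[i][j] for i in range(len(names)) for j in range(i + 1, len(names))}
    next_id = len(names)
    while len(clusters) > 1:
        (i, j), dij = min(d.items(), key=lambda item: item[1])  # closest pair
        text_i, h_i, n_i = clusters.pop(i)
        text_j, h_j, n_j = clusters.pop(j)
        height = dij / 2  # merged node sits halfway between the two clusters
        node = f"({text_i}:{height - h_i:.5f},{text_j}:{height - h_j:.5f})"
        # Keep distances not involving i or j; add size-weighted averages to the new cluster.
        new_d = {key: v for key, v in d.items() if i not in key and j not in key}
        for k in clusters:
            dik = d[(min(i, k), max(i, k))]
            djk = d[(min(j, k), max(j, k))]
            new_d[(k, next_id)] = (n_i * dik + n_j * djk) / (n_i + n_j)
        clusters[next_id] = (node, height, n_i + n_j)
        d = new_d
        next_id += 1
    text, _, _ = next(iter(clusters.values()))
    return text + ";"


example = [[0.0, 0.2, 0.6],
           [0.2, 0.0, 0.6],
           [0.6, 0.6, 0.0]]
print(upgma(["seq 1", "seq 2", "seq 3"], example))
# (seq 3:0.30000,(seq 1:0.10000,seq 2:0.10000):0.20000);
```

Because every leaf ends up at the same height, UPGMA trees are ultrametric; methods such as neighbor-joining drop that assumption when rates differ between lineages.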
Select a method
Seq1 -------------- LLFLFSSAYSRGVFRRDTHK
Seq2 MKWVTFISLLFLFSSAYSRGVFRRDAH
Seq3 MKWVTFLLLLFVSGSAFSRGVFRREA
MAXIMUM PARSIMONY: USED FOR SEQUENCES WITH HIGHLY CONSERVED RESIDUES
DISTANCE METHODS: USED FOR SEQUENCES WITH MODERATELY CONSERVED RESIDUES
MAXIMUM LIKELIHOOD: USED FOR SEQUENCES WITH HIGHLY VARIABLE RESIDUES
Selected method: MAXIMUM PARSIMONY
Step 2: Phylogenetic analysis from alignment- Output
SEQUENCE DATABASE
PHYLOGENETIC ANALYSIS (PHYLIP)
Enter a sequence alignment for 2 or more sequences
Follow the animation steps. The server on the previous slide gives the following outputs.
The outputs of the analysis are a distance file, known as the DND file, and two evolutionary trees: a cladogram and a phylogram. The DND file has a common node, and the value against each sequence is its distance from that node; it therefore gives the distance of the aligned sequences from their common ancestral node. A cladogram is a graphical representation of the branching during the evolution of the aligned proteins; it does not represent evolutionary distances. A phylogram also represents the evolutionary tree in graphical form, but its branch lengths correspond to the evolutionary distances between the proteins. All branches converge to a common ancestral root.
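The DND file in this step is written in Newick format: nested parentheses give the tree shape and each `name:number` pair gives a branch length. The minimal sketch below (hypothetical function name, regular-expression parsing rather than a full Newick parser) extracts the labelled branch lengths. In the star-shaped tree of this example, all three sequences hang from the same common node, so the distance between two sequences is the sum of their branch lengths.

```python
import re


def leaf_branch_lengths(newick: str) -> dict[str, float]:
    """Map each labelled node in a Newick string to its branch length.

    Unlabelled internal nodes such as '):0.05' are skipped.
    """
    return {name.strip(): float(length)
            for name, length in re.findall(r"([^(),:;]+):\s*([-+0-9.eE]+)", newick)}


dnd = "( seq 1:0.13525, Seq 2:0.09868, seq 3:0.09868);"
lengths = leaf_branch_lengths(dnd)
# Star-shaped tree: the path between two leaves runs through the common node.
print(lengths["seq 1"] + lengths["Seq 2"])
```

For nested trees, path lengths also include internal branches, so a real tree library is the better choice beyond this simple case.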
Select a method
Seq 1 PGFPPLVAPEPDALCAAFQDN
Seq 2 PNLPRLVRPEVDVMCTAFHDN
Seq 3 PKLK-PDPNTLCDEFKADEKKF
MAXIMUM PARSIMONY
( seq 1:0.13525, Seq 2:0.09868, seq 3:0.09868);
DND FILE, CLADOGRAM, PHYLOGRAM
Schematic of the process of analysis of alignment
DND file: gives the distance of each aligned sequence from their common ancestral node
Cladogram: branching diagram depicting evolutionary relationships or phylogeny
Phylogram: branching diagram depicting evolutionary relationships or phylogeny, in which the branch lengths are proportional to the evolutionary distance
SEQUENCE DATABASE
Structural and Functional prediction (MEME server)
Enter a sequence alignment for 2 or more sequences
Schematic of the process for structural and functional analysis
Alignment files can also be used for a variety of structural and functional analyses. Here we show how such programs and servers work, taking the prediction of protein motifs as a simple example. The user defines the range of motif widths and the maximum number of motifs to be found.
Seq 1 PGFPPLVAPEPDALCAAFQDN
Seq 2 PNLPRLVRPEVDVMCTAFHDN
Seq 3 PKLK-PDPNTLCDEFKADEKKF
Range for width of the motifs to be found: 6-50
Maximum number of motifs to be found: 3
Follow the animation steps. Input the alignment. Input the parameters. Click on the server tool.
Step 3: Structural and Functional prediction from alignment- Input
http://meme.sdsc.edu/meme4_4_0/intro.html
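MEME itself discovers motifs statistically, fitting a motif model by expectation maximization, so it can find patterns that are similar rather than identical. As a much simpler stand-in that only illustrates the two inputs of this step (width range and maximum number of motifs), the sketch below reports the longest stretches that occur exactly in every sequence after alignment gaps are removed. The function name is hypothetical.

```python
def find_shared_motifs(sequences: list[str], min_width: int = 6, max_width: int = 50,
                       max_motifs: int = 3) -> dict[str, list[int]]:
    """Return up to max_motifs stretches found identically in every sequence,
    longest first, mapped to their start positions in each gap-free sequence."""
    if not sequences:
        return {}
    seqs = [s.replace("-", "") for s in sequences]  # alignment gaps are not residues
    found: list[str] = []

    def with_positions() -> dict[str, list[int]]:
        return {m: [s.find(m) for s in seqs] for m in found}

    first = seqs[0]
    for width in range(min(max_width, min(len(s) for s in seqs)), min_width - 1, -1):
        for start in range(len(first) - width + 1):
            word = first[start:start + width]
            # keep words present in every sequence and not already inside a longer motif
            if all(word in s for s in seqs[1:]) and not any(word in m for m in found):
                found.append(word)
                if len(found) == max_motifs:
                    return with_positions()
    return with_positions()


alignment = ["PGFPPLVAPEPDALCAAFQDN", "PNLPRLVRPEVDVMCTAFHDN", "PKLK-PDPNTLCDEFKADEKKF"]
print(find_shared_motifs(alignment, min_width=6, max_width=50, max_motifs=3))  # {}
```

With the width range 6-50, these three short sequences share no identical stretch at all, which is exactly the situation where a statistical motif finder such as MEME is needed.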
The outputs obtained are:
1. A block diagram of protein motifs, which is the schematic used to visualize the positions and kinds of motifs in the alignment of two or more sequences. The color coding varies from server to server.
2. The sites of the blocks on a residue-by-residue basis.
Color coded block diagram for motifs
Residue-wise sites for motifs
Schematic of the process for structural and functional analysis
SEQUENCE DATABASE
Enter a sequence alignment for 2 or more sequences
Seq 1 PGFPPLVAPEPDALCAAFQDN
Seq 2 PNLPRLVRPEVDVMCTAFHDN
Seq 3 PKLK-PDPNTLCDEFKADEKKF
Range for width of the motifs to be found: 6-50
Maximum number of motifs to be found: 3
Structural and Functional prediction (MEME server)
A block diagram of motif prediction is the schematic used to visualize the positions and kinds of motifs in the alignment of two or more sequences.
The color-coded diagram shows the positions of the motifs in the text alignment of the compared sequences.
Step 4: Structural and Functional prediction from alignment- Output
Follow the animation steps. The server on the previous slide gives the following outputs.
http://meme.sdsc.edu/meme4_4_0/intro.html
Step 5: Structural and Functional prediction from alignment- Further Analysis
Functions that can be predicted from sequence data
The animator needs to re-draw all the images shown, as they have been retrieved from web resources. Show the pie chart. Highlight one quarter of it at a time and show the corresponding diagram next to it while narrating.
Once the protein motifs are detected, they can be used for further analyses, such as:
1. Epitope prediction
2. Active site determination
3. Determination of trans-membrane domains
4. Identification of DNA binding residues
http://qwickstep.com/search/the-active-site-of-an-enzyme.html, http://www.science.uva.nl/research/its/molsim/research/TMsignalling_lizhe/index.html, https://www.uzh.ch/oci/ssl-dir/group/files/14_roverview.jpg, http://medgadget.com/archives/2008/03/3d_imaging_of_bleomycindna_binding.html
Identify DNA binding residues
Subtilisin
Finding Trans-membrane domains
Epitope prediction in antigens
Enzyme Active sites
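Of the further analyses listed in this step, trans-membrane domain detection has a particularly simple classical approach: the Kyte-Doolittle hydropathy plot, which averages a hydrophobicity value over a sliding window and flags strongly hydrophobic windows. The sketch below uses the Kyte-Doolittle scale with the commonly quoted window of 19 residues and cutoff of 1.6. It is an illustrative alternative, not the method behind the animation's figure, and the function names are hypothetical; modern predictors use more sophisticated models.

```python
# Kyte-Doolittle hydropathy values (positive = hydrophobic).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq: str, window: int = 19) -> list[float]:
    """Average hydropathy of each window of `window` residues (gaps and unknown symbols skipped)."""
    values = [KYTE_DOOLITTLE[aa] for aa in seq.upper() if aa in KYTE_DOOLITTLE]
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window + 1)]


def candidate_tm_windows(seq: str, window: int = 19, cutoff: float = 1.6) -> list[int]:
    """Start positions (in the gap-free sequence) of windows whose mean hydropathy exceeds cutoff."""
    return [i for i, h in enumerate(hydropathy_profile(seq, window)) if h > cutoff]


print(candidate_tm_windows("MKWVTFISLLFLFSSAYSRGVFRRDAH"))
```

A run of consecutive flagged windows, roughly the length of a membrane-spanning helix, is what such a plot treats as a candidate trans-membrane segment.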
Input the term “insulin chain A” in the protein database of your choice 1
Choose the protein sequences corresponding to insulin chain A 2
Check the source organism of each protein sequence 3
Store the FASTA sequences listed for human and mouse in separate locations 4
Run the server to obtain the output 6
Input the two sequences into a multiple alignment server 5
Arrange the steps in the order in which they are performed.
Remove the step number from the bottom of each tab. Show all the steps in mixed order. The user must click on the tabs in order. If the user clicks on a tab that is not in the right order, flash the message “Try again”.
All the tabs must be arranged in the right order.
Check the .dnd file to find the evolutionary distance 8
Check for the .aln file and input it into programs for finding phylogenetic distances, such as PHYLIP 7
Interactivity option 1: Find the evolutionary distance between insulin chain A of human and mouse
Match the left column to the right
Match the meaning of the parameter on the right to the name of the parameter on the left. If the matching is correct, turn the tab green, else flash “Try Again”
Results on next slide
Interactivity option 2.a : Match the following
PAM MATRIX
BLOSUM MATRIX
PHYLOGRAM
BIT SCORE
E-VALUE
DOMAIN IDENTIFICATION
EVOLUTIONARY TREE
SIMILARITY BASED SCORING MATRIX
MEASURE OF BIOLOGICAL SIGNIFICANCE
DISTANCE BASED SCORING MATRIX
MEASURE OF QUALITY OF ALIGNMENT, NORMALIZED ACCORDING TO SCORING MATRIX
BLAST RESULT LINKED TO PFAM
Interactivity option 2.b : Match the following
PAM MATRIX
BLOSUM MATRIX
PHYLOGRAM
BIT SCORE
E-VALUE
DOMAIN IDENTIFICATION
EVOLUTIONARY TREE
SIMILARITY BASED SCORING MATRIX
MEASURE OF BIOLOGICAL SIGNIFICANCE
DISTANCE BASED SCORING MATRIX
MEASURE OF QUALITY OF ALIGNMENT, NORMALIZED ACCORDING TO SCORING MATRIX
BLAST RESULT LINKED TO PFAM
Match the left column to the right
Match the meaning of the parameter on the right to the name of the parameter on the left. If the matching is correct, turn the tab green, else flash “Try Again”
Correct Matching
Questionnaire
1. Which is a scoring matrix based on distantly related proteins?
Answers: a) PAM b) BLOSUM c) Both d) None
2. Which parameter signifies whether the match between two sequences is a chance alignment?
Answers: a) word-length b) e-value c) dot-plot d) none
3. Which evolutionary tree has branch lengths corresponding to the evolutionary distances?
Answers: a) Phylogram b) Cladogram c) both d) none
4. Which is NOT a ClustalW output file extension?
Answers: a) .dnd b) .txt c) .aln d) .output
5. Which phylogenetic method is suited to the most variable sequences?
Answers: a) Distance method b) Maximum Distance c) Maximum Parsimony d) Maximum Likelihood
Links for further reading
Reference websites:
http://blast.ncbi.nlm.nih.gov/Blast.cgi
http://www.ebi.ac.uk/Tools/clustalw2/index.html
http://www.pdb.org/pdb/home/home.do
http://expasy.org/sprot/
http://expasy.org/prosite/
http://pfam.sanger.ac.uk/
http://www.psc.edu/general/software/packages/phylip/
The following URLs are used for the animations:
http://www.ncbi.nlm.nih.gov/
http://blast.ncbi.nlm.nih.gov/Blast.cgi
http://www.ncbi.nlm.nih.gov/BLAST/tutorial/Altschul-1.html
http://pfam.sanger.ac.uk/
http://www.ebi.ac.uk/Tools/es/cgi-bin/clustalw2/
http://meme.sdsc.edu/meme4_4_0/intro.html
http://www.ebi.ac.uk/Tools/clustalw2/index.html
http://qwickstep.com/search/the-active-site-of-an-enzyme.html
http://www.science.uva.nl/research/its/molsim/research/TMsignalling_lizhe/index.html
https://www.uzh.ch/oci/ssl-dir/group/files/14_roverview.jpg
http://medgadget.com/archives/2008/03/3d_imaging_of_bleomycindna_binding.html
Books:
Bioinformatics: Sequence and Genome Analysis by David Mount