
BioInformatics

Database of Primer Results

Abstract

In order to help predict how proteins will behave in an organism, biologists cross-examine the amino acid sequences of many proteins. Proteins are built from 20 standard amino acids and often consist of 300 or more of them. A “multiple alignment” is performed on a collection of sequences to maximize the regions where the amino acids are similar across all sequences; online tools are currently available for this task.

Once the multiple alignment is complete, a tedious search begins: the aligned protein sequences are scanned for contiguous subsequences that may help determine properties of the proteins’ functions. Subsequences selected for further analysis are called “primers.” The primer search is often done by hand and can take hours, even for short sequences.

This project consists of a Java program that automates the primer search and a database that organizes the results obtained after primers are generated. The software lets the user examine multiple primers at once and adjust primer lengths. Once the primers are generated, lab tests are performed on them and the results are entered into the database, which can then be queried for results that might be useful to a biologist.

What is a Protein Sequence?

A string of amino acids, each represented by a single letter

There are 20 standard amino acids

Typical proteins are about 300 amino acids long

EXAMPLE:

… I L V K M U T A N K V K M U …

Multiple Alignment Example

Shaded areas show regions of exact match.

A dash (gap) is inserted into the shorter protein sequence to achieve the alignment.

Redundant amino acids in each column are then removed, so each column keeps only its distinct amino acids.
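The column-collapsing step can be sketched in Java, the language of the project's program. This is a minimal illustration, not the project's code: the names `AlignmentColumns` and `distinctPerColumn` are hypothetical, and it assumes every aligned sequence has the same length with `-` marking gaps, which are skipped rather than counted as residues.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical helper: collapses each column of a multiple alignment to its distinct amino acids. */
final class AlignmentColumns {

    /**
     * Returns, for each column, the sorted set of distinct amino acids in that column.
     * Assumption: all sequences have equal length and '-' marks a gap (gaps are skipped).
     */
    static List<Set<Character>> distinctPerColumn(List<String> aligned) {
        List<Set<Character>> columns = new ArrayList<>();
        if (aligned.isEmpty()) {
            return columns;
        }
        int width = aligned.get(0).length();
        for (String seq : aligned) {
            if (seq.length() != width) {
                throw new IllegalArgumentException("aligned sequences must have equal length");
            }
        }
        for (int col = 0; col < width; col++) {
            Set<Character> residues = new TreeSet<>();
            for (String seq : aligned) {
                char c = seq.charAt(col);
                if (c != '-') {
                    residues.add(c); // a Set keeps only one copy of a repeated residue
                }
            }
            columns.add(residues);
        }
        return columns;
    }

    public static void main(String[] args) {
        List<String> aligned = List.of("MKLV", "MR-V", "MKLI");
        System.out.println(distinctPerColumn(aligned)); // prints [[M], [K, R], [L], [I, V]]
    }
}
```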

For each remaining amino acid, the codons that encode it are listed, which shows how many different DNA codons can produce that amino acid.

The total degeneracy is the product of each amino acid’s codon count. The higher this number, the less certain we can be about which DNA sequence the primer originated from, and the less useful it is in experiments.
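A minimal Java sketch of the degeneracy calculation follows, under stated assumptions: codon counts come from the standard genetic code, each input string holds the distinct amino acids at one primer position, a position holding several amino acids gets the sum of their codon counts, and gaps contribute nothing. The names `Degeneracy` and `totalDegeneracy` are hypothetical.

```java
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: total degeneracy of a primer from the distinct amino acids at each position. */
final class Degeneracy {

    // Codons per amino acid in the standard genetic code (61 sense codons in total).
    private static final Map<Character, Integer> CODON_COUNT = Map.ofEntries(
            Map.entry('A', 4), Map.entry('R', 6), Map.entry('N', 2), Map.entry('D', 2),
            Map.entry('C', 2), Map.entry('Q', 2), Map.entry('E', 2), Map.entry('G', 4),
            Map.entry('H', 2), Map.entry('I', 3), Map.entry('L', 6), Map.entry('K', 2),
            Map.entry('M', 1), Map.entry('F', 2), Map.entry('P', 4), Map.entry('S', 6),
            Map.entry('T', 4), Map.entry('W', 1), Map.entry('Y', 2), Map.entry('V', 4));

    /**
     * Each element of positions holds the distinct amino acids at one position, e.g. "KR".
     * Assumption: a position's value is the sum of its amino acids' codon counts,
     * gaps ('-') add nothing, and the total is the product over all positions.
     */
    static long totalDegeneracy(List<String> positions) {
        long total = 1;
        for (String residues : positions) {
            int value = 0;
            for (char aa : residues.toCharArray()) {
                if (aa == '-') {
                    continue;
                }
                Integer codons = CODON_COUNT.get(Character.toUpperCase(aa));
                if (codons == null) {
                    throw new IllegalArgumentException("unknown amino acid: " + aa);
                }
                value += codons;
            }
            if (value > 0) {
                total = Math.multiplyExact(total, value); // throws instead of silently overflowing
            }
        }
        return total;
    }

    public static void main(String[] args) {
        // M (1) x K or R (2 + 6) x W (1) x L (6)
        System.out.println(totalDegeneracy(List.of("M", "KR", "W", "L"))); // prints 48
    }
}
```

`Math.multiplyExact` makes an overflowing product fail loudly rather than wrap around; for very long or highly degenerate primers, `java.math.BigInteger` would be the safer choice.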

Degeneracy Example

Data Mining

We want to find Association Rules in the data collected about primers in order to predict which primers to use.

Association Rules have the form LHS → RHS

Interpretation: If every item in LHS occurs, then it is likely that all of the items in RHS will also occur.

Example:

LHS = protein sequence A contains primers 1, 2 & 3

RHS = protein sequence A contains primers 4 & 5
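One way to represent such a rule in Java is sketched below. The record `AssociationRule` and its methods are hypothetical, primers are identified by number as in the example, and the primer set used for sequence A is made up for illustration. Records need Java 16 or later.

```java
import java.util.Set;

/** Hypothetical sketch of an association rule LHS → RHS over primer numbers. */
record AssociationRule(Set<Integer> lhs, Set<Integer> rhs) {

    /** The rule applies to a sequence when every LHS primer occurs in it. */
    boolean appliesTo(Set<Integer> primersInSequence) {
        return primersInSequence.containsAll(lhs);
    }

    /** The rule is satisfied when it applies and every RHS primer also occurs. */
    boolean isSatisfiedBy(Set<Integer> primersInSequence) {
        return appliesTo(primersInSequence) && primersInSequence.containsAll(rhs);
    }

    public static void main(String[] args) {
        // The example rule: primers {1, 2, 3} in sequence A suggest primers {4, 5}.
        AssociationRule rule = new AssociationRule(Set.of(1, 2, 3), Set.of(4, 5));
        Set<Integer> sequenceA = Set.of(1, 2, 3, 4, 5, 7); // hypothetical primer set
        System.out.println(rule.appliesTo(sequenceA) + " " + rule.isSatisfiedBy(sequenceA)); // prints true true
    }
}
```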

Data Mining: Support & Confidence

Support: How often do LHS & RHS occur together?

Confidence: Whenever LHS occurs, how often does RHS occur as well?
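With the usual definitions, support = (sequences containing every LHS and RHS item) / (all sequences), and confidence = (sequences containing every LHS and RHS item) / (sequences containing every LHS item). The Java sketch below computes both, assuming each protein sequence is reduced to the set of primer numbers it contains; the class `RuleMetrics` and the data in `main` are hypothetical.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch: support and confidence of LHS → RHS, one primer set per protein sequence. */
final class RuleMetrics {

    private static long countContaining(List<Set<Integer>> transactions, Set<Integer> items) {
        return transactions.stream().filter(t -> t.containsAll(items)).count();
    }

    private static Set<Integer> union(Set<Integer> a, Set<Integer> b) {
        Set<Integer> both = new HashSet<>(a);
        both.addAll(b);
        return both;
    }

    /** Support: fraction of all sequences containing every LHS and every RHS primer. */
    static double support(List<Set<Integer>> transactions, Set<Integer> lhs, Set<Integer> rhs) {
        if (transactions.isEmpty()) {
            return 0.0;
        }
        return (double) countContaining(transactions, union(lhs, rhs)) / transactions.size();
    }

    /** Confidence: among sequences containing all LHS primers, the fraction also containing all RHS primers. */
    static double confidence(List<Set<Integer>> transactions, Set<Integer> lhs, Set<Integer> rhs) {
        long lhsCount = countContaining(transactions, lhs);
        if (lhsCount == 0) {
            return 0.0;
        }
        return (double) countContaining(transactions, union(lhs, rhs)) / lhsCount;
    }

    public static void main(String[] args) {
        // Made-up primer sets for four protein sequences (not real data).
        List<Set<Integer>> data = List.of(
                Set.of(1, 2, 3, 4, 5),
                Set.of(1, 2, 3, 4),
                Set.of(1, 2, 3, 4, 5, 6),
                Set.of(2, 4, 5));
        Set<Integer> lhs = Set.of(1, 2, 3);
        Set<Integer> rhs = Set.of(4, 5);
        System.out.println(support(data, lhs, rhs));    // prints 0.5
        System.out.println(confidence(data, lhs, rhs)); // prints 0.6666666666666666
    }
}
```

With this made-up data, 2 of the 4 sequences contain primers 1 through 5 (support 0.5), and 2 of the 3 sequences containing primers 1, 2 & 3 also contain 4 & 5 (confidence about 0.67).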

Scope: Our data set is small compared to online databanks. Drawing on larger sources in the future will help increase the support of any predictions made.

Inspection Window

This window allows the user to manipulate one particular primer chosen from a multiple alignment.

The control buttons at the bottom change the length and position of the primer, and the degeneracy is updated automatically.
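The window's behavior can be modeled roughly as follows. This is a hypothetical sketch, not the actual interface code: `PrimerWindow` and its methods are invented names, and it assumes each alignment column already has a positive degeneracy value, so moving or resizing the primer only requires recomputing the product over the columns it covers.

```java
/**
 * Hypothetical model of the Inspection Window: a primer is the window [start, start + length)
 * over per-column degeneracy values, and every move or resize recomputes the total.
 */
final class PrimerWindow {
    private final int[] columnValues; // assumed: positive degeneracy value of each alignment column
    private int start;
    private int length;
    private long degeneracy;

    PrimerWindow(int[] columnValues, int start, int length) {
        this.columnValues = columnValues.clone();
        set(start, length);
    }

    /** Validates the new window first, so an invalid move leaves the window unchanged. */
    private void set(int newStart, int newLength) {
        if (newLength < 1 || newStart < 0 || newStart + newLength > columnValues.length) {
            throw new IllegalArgumentException("window out of range");
        }
        start = newStart;
        length = newLength;
        long total = 1;
        for (int i = start; i < start + length; i++) {
            total = Math.multiplyExact(total, columnValues[i]);
        }
        degeneracy = total; // updated automatically on every change
    }

    void shiftLeft()  { set(start - 1, length); }
    void shiftRight() { set(start + 1, length); }
    void lengthen()   { set(start, length + 1); }
    void shorten()    { set(start, length - 1); }

    int start()       { return start; }
    int length()      { return length; }
    long degeneracy() { return degeneracy; }

    public static void main(String[] args) {
        int[] values = {1, 8, 1, 6, 4};     // hypothetical per-column values
        PrimerWindow w = new PrimerWindow(values, 0, 3);
        System.out.println(w.degeneracy()); // prints 8
        w.shiftRight();                     // now covers columns 1..3
        System.out.println(w.degeneracy()); // prints 48
        w.lengthen();                       // now covers columns 1..4
        System.out.println(w.degeneracy()); // prints 192
    }
}
```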

Biological Description of the Gene

Name of Gene

Nucleotide Sequence for Gene

Amino Acid Sequence

Oligos Contained in the Gene

Information for the Experiment

Reactions for the Experiment

By clicking on Oligos, you can choose which Oligos occurred in the reaction.

By clicking on Observations, you can record results about each reaction.
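The form fields above suggest a data model along the following lines. This in-memory Java sketch only illustrates how the records might relate and how one query could work; the project's actual database schema is not given here, so the record names, fields, and the `observationsForOligo` query are all hypothetical, and the values in `main` are placeholders rather than results.

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

/** Hypothetical in-memory stand-in for the records shown on the database form. */
final class PrimerResults {

    record Gene(String name, String nucleotideSequence, String aminoAcidSequence, Set<String> oligos) {}

    record Reaction(int id, Set<String> oligos, String observations) {}

    record Experiment(String information, Gene gene, List<Reaction> reactions) {}

    /** Example query: the observations recorded for every reaction in which the given oligo occurred. */
    static List<String> observationsForOligo(List<Experiment> experiments, String oligo) {
        return experiments.stream()
                .flatMap(e -> e.reactions().stream())
                .filter(r -> r.oligos().contains(oligo))
                .map(Reaction::observations)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Placeholder values only; none of these are real experimental results.
        Gene gene = new Gene("geneX", "ATGAAA", "MK", Set.of("oligo1", "oligo2"));
        Experiment exp = new Experiment("trial 1", gene, List.of(
                new Reaction(1, Set.of("oligo1"), "observation 1"),
                new Reaction(2, Set.of("oligo2"), "observation 2")));
        System.out.println(observationsForOligo(List.of(exp), "oligo1")); // prints [observation 1]
    }
}
```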