
Bioinformatics Lab Report


Briana Halbert
Bioinformatics Computer Lab

October 25, 2013

Purpose

The purpose of this activity is to use web-based tools (NCBI, BLAST, and ExPASy ProtParam) for two tasks. The first is to determine the length of the cDNA fragment and its translation initiation and termination sites. The second is to find the protein sequence in one-letter abbreviations, along with its molecular weight, pI, amino acid composition, and extinction coefficient. This information, together with background information, will be used to determine the functional characteristics of the assigned gene. By performing this activity, experimenters will come to understand how protein and DNA sequences relate to a gene's function and identity.
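Locating translation initiation and termination in a cDNA can be sketched in Python. This is a minimal illustration under stated assumptions, not the procedure used in this lab, which read these positions from NCBI. It assumes the input is the sense strand, that the standard genetic code applies, and that translation starts at the first ATG. Bacterial genes can also start at GTG or TTG, and this sketch ignores those cases. `find_orf`, `CODON_TABLE` and the short test sequence are hypothetical, made up for illustration.

```python
# Standard genetic code (NCBI table 1), codons enumerated in TCAG order:
# TTT, TTC, TTA, TTG, TCT, ... GGG.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def find_orf(cdna):
    """Locate the first ATG-initiated open reading frame in a sense-strand cDNA.

    Returns (start, end, protein): start is the 0-based index of the ATG,
    end is the index just past the stop codon, and protein is the one-letter
    translation without the stop. Returns None if no in-frame stop is found.
    """
    seq = cdna.upper().replace("U", "T")
    start = seq.find("ATG")          # translation initiation (ATG only)
    if start == -1:
        return None
    protein = []
    for pos in range(start, len(seq) - 2, 3):
        amino_acid = CODON_TABLE.get(seq[pos:pos + 3], "X")  # X: ambiguous base
        if amino_acid == "*":        # translation termination
            return start, pos + 3, "".join(protein)
        protein.append(amino_acid)
    return None                      # no in-frame stop codon before the end


start, end, protein = find_orf("CCATGGCTTAAGG")
print(start, end, end - start, protein)  # 2 11 9 MA
```

The ORF length in nucleotides is `end - start`, and it includes the stop codon. The protein length is that value divided by 3, minus one for the stop.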

Background

Gene Rv0211 encodes phosphoenolpyruvate carboxykinase, the rate-limiting gluconeogenic enzyme [catalytic activity: GTP + oxaloacetate = GDP + phosphoenolpyruvate + CO2]. The functional category of the gene product is intermediary metabolism and respiration.

Methionine (Met) residues of proteins are readily oxidized to methionine sulfoxide (MetO), especially under oxidative stress conditions. Oxidation of Met produces two stereoisomers, S-MetO and R-MetO. This alteration is reversed by methionine sulfoxide reductases, which prevents irreversible oxidative protein damage: MsrA reduces S-MetO and MsrB reduces R-MetO. This protein is highly conserved, and it carries out the enzymatic reduction of methionine sulfoxide to methionine. This is important because oxidative protein damage has been implicated in Alzheimer's disease, in which high oxidative stress is considered one of the major contributing factors. The proposed function of this gene is the repair of oxidative damage to proteins to restore biological activity.

Mycobacterium tuberculosis is the bacterium that causes tuberculosis (TB) in humans. TB is the leading cause of death worldwide from a bacterial infectious disease. An estimated 1.8 billion people, about one-third of the world population, are infected. M. tuberculosis is an obligate aerobe. For this reason, in the classic case of TB the bacteria are found in the well-aerated upper lobes of the lungs. The disease is transmitted primarily through the air.¹

Because M. tuberculosis is a bacterium, it is a prokaryote, and its genetic material is DNA. As in all organisms, genes are expressed by transcription of DNA into RNA, which is then translated into protein. Transcription is itself regulated by proteins. As mentioned previously, this gene (Rv0211) encodes a rate-limiting gluconeogenic enzyme.

Bioinformatics is the field of science that uses computers to collect and analyze biological information, such as DNA and protein sequences. The field expanded rapidly during the Human Genome Project, which allowed bioinformatics to target biological and genomic information simultaneously.

Procedure

At the beginning of the experiment, the NCBI website (http://www.ncbi.nlm.nih.gov/) was opened. The drop-down menu was used to search the Gene category for the assigned gene, Rv0211. From the search results, the top result was selected and the function of the gene product was recorded. The GenBank link was then followed to view and download the gene sequence, and the gene number was identified. The protein sequence (in one-letter code) and the DNA sequence were both viewed and copied into a document.

The protein sequence was then pasted into the input box of the ExPASy ProtParam tool (web.expasy.org/protparam/), and "Compute parameters" was clicked. The output reported the following properties of the protein:
- the number of amino acids
- the molecular weight
- the theoretical pI
- the amino acid composition
- the extinction coefficients with and without disulfide bonds

The pI was used to determine the net charge of the protein at pH 7.0. Based on that charge, the appropriate ion-exchange column for purification was chosen. The numbers of tyrosine and tryptophan residues were read from the amino acid composition and recorded with their respective extinction coefficients. The genes assigned to the other groups were analyzed in the same way, and their results were recorded.

Finally, the NCBI site (http://www.ncbi.nlm.nih.gov) was used to open the protein database, and the Protein BLAST tool was selected. The protein sequence was pasted into the BLAST query box, and clicking the BLAST button led to a page of homology information. The alignments were examined in blocks of three lines: the query sequence, a middle line, and the subject sequence. Observations on what each of the three lines represents were recorded. A hit 95-98% similar to the assigned protein was selected, and its alignment and search statistics were copied into the results.
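The net-charge and column-selection steps can be sketched in Python. This is an illustrative estimate, not what ProtParam computes. It applies the Henderson-Hasselbalch equation to the ionizable residue counts from Table 2, using assumed textbook pKa values for free amino acids. ProtParam uses a different pK scale, so this estimate of the pI will differ somewhat from the 4.92 in Table 1. All function and variable names are hypothetical.

```python
# ASSUMPTION: textbook free-amino-acid pKa values, not ProtParam's pK scale.
PKA_POSITIVE = {"N_term": 9.69, "K": 10.5, "R": 12.48, "H": 6.0}
PKA_NEGATIVE = {"C_term": 2.34, "D": 3.86, "E": 4.25, "C": 8.33, "Y": 10.07}

# Ionizable residue counts for Rv0211, transcribed from Table 2.
RV0211_COUNTS = {"D": 43, "E": 43, "C": 9, "Y": 16, "H": 12, "K": 28, "R": 31}


def net_charge(counts, ph):
    """Henderson-Hasselbalch estimate of a protein's net charge at a given pH."""
    charge = 0.0
    for group, pka in PKA_POSITIVE.items():
        n = 1 if group == "N_term" else counts.get(group, 0)
        charge += n / (1 + 10 ** (ph - pka))   # protonated fraction, +1 each
    for group, pka in PKA_NEGATIVE.items():
        n = 1 if group == "C_term" else counts.get(group, 0)
        charge -= n / (1 + 10 ** (pka - ph))   # deprotonated fraction, -1 each
    return charge


def estimate_pi(counts, low=0.0, high=14.0, tol=1e-3):
    """Bisect for the pH where net charge is zero (charge falls as pH rises)."""
    while high - low > tol:
        mid = (low + high) / 2
        if net_charge(counts, mid) > 0:
            low = mid
        else:
            high = mid
    return (low + high) / 2


def choose_column(pi, buffer_ph=7.0):
    """Above its pI a protein is net negative and binds an anion exchanger;
    below its pI it is net positive and binds a cation exchanger."""
    return "anion exchange" if buffer_ph > pi else "cation exchange"


print(round(net_charge(RV0211_COUNTS, 7.0), 1))
print(round(estimate_pi(RV0211_COUNTS), 2))
print(choose_column(4.92))  # anion exchange
```

The precise pI depends on the pK scale chosen, but the conclusion is the same. With a pI near 5, Rv0211 is net negative at pH 7.0. It would therefore bind an anion-exchange resin (a positively charged matrix such as DEAE) rather than a cation exchanger.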

Results

Amino Acid Sequence of Rv0211

MTSATIPGLDTAPTNHQGLLSWVEEVAELTQPDRVVFTDGSEEEFQRLCDQLVEAGTFIRLNPEKHKNSYLALSDPSDVARVESRTYICSAKEIDAGPTNNWMDPGEMRSIMKDLYRGCMRGRTMYVVPFCMGPLGAEDPKLGVEITDSEYVVVSMRTMTRMGKAALEKMGDDGFFVKALHSVGAPLEPGQKDVAWPCSETKYITHFPETREIWSYGSGYGGNALLGKKCYSLRIASAMAHDEGWLAEHMLILKLISPENKAYYFAAAFPSACGKTNLAMLQPTIPGWRAETLGDDIAWMRFGKDGRLYAVNPEFGFFGVAPGTNWKSNPNAMRTIAAGNTVFTNVALTDDGDVWWEGLEGDPQHLIDWKGNDWYFRETETNAAHPNSRYCTPMSQCPILAPEWDDPQGVPISGILFGGRRKTTVPLVTEARDWQHGVFIGATLGSEQTAAAEGKVGNVRRDPMAMLPFLGYNVGDYFQHWINLGKHADESKLPKVFFVNWFRRGDDGRFLWPGFGENSRVLKWIVDRIEHKAGGATTPIGTVPAVEDLDLDGLDVDAADVAAALAVDADEWRQELPLIEEWLQFVGEKLPTGVKDEFDALKERLG

Figure 1. Amino Acid Sequence

Fourth gene   No. of amino acids   Molecular weight (g/mol)   Theoretical pI   Extinction coefficient (M⁻¹ cm⁻¹)
Rv0211        606                  67253.0                    4.92             134340

Table 1. ProtParam data for the fourth gene (Rv0211)

Amino acid   Count   Percent
Ala (A)      54      8.9%
Arg (R)      31      5.1%
Asn (N)      22      3.6%
Asp (D)      43      7.1%
Cys (C)       9      1.5%
Gln (Q)      14      2.3%
Glu (E)      43      7.1%
Gly (G)      58      9.6%
His (H)      12      2.0%
Ile (I)      24      4.0%
Leu (L)      49      8.1%
Lys (K)      28      4.6%
Met (M)      19      3.1%
Phe (F)      26      4.3%
Pro (P)      37      6.1%
Ser (S)      26      4.3%
Thr (T)      36      5.9%
Trp (W)      20      3.3%
Tyr (Y)      16      2.6%
Val (V)      39      6.4%
Pyl (O)       0      0.0%
Sec (U)       0      0.0%

Table 2. Amino Acid Composition
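Each percentage in Table 2 is the residue count divided by the sequence length. The table can therefore be checked, or recomputed from the sequence in Figure 1, with a short Python sketch. The counts dictionary is transcribed from Table 2. `composition` is a hypothetical helper, not part of any ProtParam interface.

```python
from collections import Counter

# Counts transcribed from Table 2 (Pyl and Sec are 0 and omitted).
TABLE_2_COUNTS = {
    "A": 54, "R": 31, "N": 22, "D": 43, "C": 9, "Q": 14, "E": 43,
    "G": 58, "H": 12, "I": 24, "L": 49, "K": 28, "M": 19, "F": 26,
    "P": 37, "S": 26, "T": 36, "W": 20, "Y": 16, "V": 39,
}


def composition(seq):
    """Return {residue: (count, percent)}, percent rounded to one decimal."""
    seq = seq.upper()
    n = len(seq)
    return {aa: (c, round(100 * c / n, 1)) for aa, c in sorted(Counter(seq).items())}


total = sum(TABLE_2_COUNTS.values())
print(total)  # 606, matching the length in Table 1
print(round(100 * TABLE_2_COUNTS["G"] / total, 1))  # 9.6
```

Passing the one-letter sequence from Figure 1 to `composition` should reproduce every row of Table 2.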

Total number of tyrosine and tryptophan residues: 16 + 20 = 36

Total number of cysteine residues: 9

Wavelength (nm)   Molar extinction w/o disulfides (M⁻¹ cm⁻¹)   Molar extinction w/ all disulfides (M⁻¹ cm⁻¹)
280               133840                                        134340

Table 3. Extinction Coefficient for Rv0211

Gene #   Trp + Tyr (total)   ε (M⁻¹ cm⁻¹)   Group name
1        12                  34045          Oliver, Faine
2        22                  56965          Young, Hendricks
3        9                   25440          Wilson, Davis, Brownley
4        36                  134340         Graham, Mosley

Table 4. Data from the four experimental groups

Score              Expect   Method                         Identities      Positives       Gaps
1155 bits (2987)   0.0      Compositional matrix adjust.   548/605 (91%)   577/605 (95%)   0/605 (0%)

Table 5. BLAST statistics for the homolog of Rv0211

Query  1    MTSATIPGLDTAPTNHQGLLSWVEEVAELTQPDRVVFTDGSEEEFQRLCDQLVEAGTFIR  60
            MTSATIPGLDTAPTNHQGLLSWV+EVAELTQPDRVVF DGS+EEF RL  QLV+AGTF R
Sbjct  1    MTSATIPGLDTAPTNHQGLLSWVQEVAELTQPDRVVFADGSDEEFHRLSAQLVDAGTFTR  60

Query  61   LNPEKHKNSYLALSDPSDVARVESRTYICSAKEIDAGPTNNWMDPGEMRSIMKDLYRGCM  120
            LN EK  NSYLALSDPSDVARVESRT+ICS +EIDAGPTNNWMDP EMR++M DLYRGCM
Sbjct  61   LNDEKFPNSYLALSDPSDVARVESRTFICSEREIDAGPTNNWMDPSEMRTLMTDLYRGCM  120

Query  121  RGRTMYVVPFCMGPLGAEDPKLGVEITDSEYVVVSMRTMTRMGKAALEKMGDDGFFVKAL  180
            RGRTMYVVPFCMGPLGAEDPKLGVEITDSEYVVVSM+ MTRMG AALEKMG DGFFVKAL
Sbjct  121  RGRTMYVVPFCMGPLGAEDPKLGVEITDSEYVVVSMKVMTRMGTAALEKMGQDGFFVKAL  180

Query  181  HSVGAPLEPGQKDVAWPCSETKYITHFPETREIWSYGSGYGGNALLGKKCYSLRIASAMA  240
            HSVGAPLE GQ DV WPCS+TKYITHFPETREIWSYGSGYGGNALLGKKCYSLRIASAMA
Sbjct  181  HSVGAPLEDGQADVPWPCSDTKYITHFPETREIWSYGSGYGGNALLGKKCYSLRIASAMA  240

Query  241  HDEGWLAEHMLILKLISPENKAYYFAAAFPSACGKTNLAMLQPTIPGWRAETLGDDIAWM  300
             DEGWLAEHMLILKLISPENKAYY AAAFPSACGKTNLAMLQPTIPGWRAETLGDDIAWM
Sbjct  241  RDEGWLAEHMLILKLISPENKAYYIAAAFPSACGKTNLAMLQPTIPGWRAETLGDDIAWM  300

Query  301  RFGKDGRLYAVNPEFGFFGVAPGTNWKSNPNAMRTIAAGNTVFTNVALTDDGDVWWEGLE  360
            RFGKDGRLYAVNPEFGFFGVAPGTNWKSNPNAMRTIAAGNTVFTNVALTDDG+VWWEGLE
Sbjct  301  RFGKDGRLYAVNPEFGFFGVAPGTNWKSNPNAMRTIAAGNTVFTNVALTDDGEVWWEGLE  360

Query  361  GDPQHLIDWKGNDWYFRETETNAAHPNSRYCTPMSQCPILAPEWDDPQGVPISGILFGGR  420
            GDPQHL+DWKGN+WYFRETET AAHPNSRYCTPMSQCPILAPEWDDPQGVPIS ILFGGR
Sbjct  361  GDPQHLVDWKGNEWYFRETETTAAHPNSRYCTPMSQCPILAPEWDDPQGVPISAILFGGR  420

Query  421  RKTTVPLVTEARDWQHGVFIGATLGSEQTAAAEGKVGNVRRDPMAMLPFLGYNVGDYFQH  480
            RKTTVPLVT+ARDWQHGVFIGATLGSEQTAAAEGKVGNVRRDPMAMLPF+GYNVGDY QH
Sbjct  421  RKTTVPLVTQARDWQHGVFIGATLGSEQTAAAEGKVGNVRRDPMAMLPFMGYNVGDYVQH  480

Query  481  WINLGKHADESKLPKVFFVNWFRRGDDGRFLWPGFGENSRVLKWIVDRIEHKAGGATTPI  540
            WI++GK++DESKLP+VFFVNWFRRG+D RFLWPGFGENSRV+KWIVDRIEHKAGG TTPI
Sbjct  481  WIDIGKNSDESKLPQVFFVNWFRRGEDHRFLWPGFGENSRVMKWIVDRIEHKAGGKTTPI  540

Query  541  GTVPAVEDLDLDGLDVDAADVAAALAVDADEWRQELPLIEEWLQFVGEKLPTGVKDEFDA  600
            GTVP VEDLDL+GLD + ADV+ ALAV+A+EWR+ELPLIEEWLQF+GEKLPTG+KDEFDA
Sbjct  541  GTVPTVEDLDLEGLDANPADVSEALAVNAEEWREELPLIEEWLQFIGEKLPTGIKDEFDA  600

Query  601  LKERL  605
            LKERL
Sbjct  601  LKERL  605


Discussion

Homologs are useful for confirming the function of a gene, because a gene of interest is likely to share the known function of a homologous gene. In the homology search, it was important to find a homolog with a high percentage of similarity. The alignment with the chosen homolog covers 605 residues, while Rv0211 has 606 amino acids; its final glycine lies outside the aligned region. Over the aligned region the two sequences are 91% identical (548/605) and 95% similar (577/605).

The alignment's middle line uses three kinds of marks:
- a letter marks an identical residue
- a + marks a substitution between residues with similar chemical characteristics
- a blank space marks a position where the two sequences are not similar

A - in the query or subject line would mark a gap, which is penalized in the alignment score. This alignment contains no gaps (0/605).
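The counting behind the Identities and Positives columns of Table 5 can be made explicit with a small Python sketch. `midline_stats` and `percent` are hypothetical helpers. The sketch assumes the middle line was copied with its letters and + signs intact.

```python
def midline_stats(midline):
    """Count identities and positives on a BLAST alignment middle line.

    A letter marks an identical residue, '+' marks a substitution between
    similar residues, and a blank marks a dissimilar pair.
    """
    identities = sum(ch.isalpha() for ch in midline)
    positives = identities + midline.count("+")
    return identities, positives


def percent(part, whole):
    """Whole-number percentage, matching how Table 5 reports the values."""
    return round(100 * part / whole)


# First ten columns of the middle line for residues 481-540 of Rv0211.
print(midline_stats("WI++GK++DE"))           # (6, 10)
print(percent(548, 605), percent(577, 605))  # 91 95
```

Blank positions are easy to lose when an alignment is copied as text. It is therefore safer to compute the dissimilar positions as the alignment length minus the positives (605 - 577 = 28 here) than to count spaces.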

Like Rv0211, this homolog is a phosphoenolpyruvate carboxykinase (PEPCK), an important enzyme in gluconeogenesis. In mammals, PEPCK is found in both the cytosol and the mitochondria of liver cells. It is regulated by insulin, glucocorticoids, cyclic adenosine monophosphate (cAMP), and diet to maintain glucose homeostasis. Two forms exist: PCK1 (PEPCK1), which is soluble in the cytosol, and PCK2 (PEPCK2), which is found in the mitochondria.

The combined number of tryptophan and tyrosine residues in a protein strongly affects its molar extinction coefficient. The molar extinction coefficient measures how strongly a chemical species absorbs light at a given wavelength. For a protein at 280 nm, it depends almost exclusively on the number of aromatic residues, particularly tryptophan, and can therefore be predicted from the amino acid sequence.

Rv0211 has a Tyr + Trp sum of 36, and its molar extinction coefficients with and without disulfides are 134340 and 133840 M⁻¹ cm⁻¹, respectively. These values are far greater than that of Rv0137c, whose Tyr + Trp sum is 12 and whose molar extinction coefficient is 34045 M⁻¹ cm⁻¹ both with and without disulfides. Rv0137c in turn has a higher molar extinction coefficient with disulfides than Rv0162c, which has a Tyr + Trp sum of 9 and a coefficient of 25440 M⁻¹ cm⁻¹. Rv01472, with a Tyr + Trp sum of 22, has the second-highest molar extinction coefficient with disulfides, 56965 M⁻¹ cm⁻¹.
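The prediction from the sequence can be written out explicitly in Python. The sketch assumes the per-residue coefficients documented for ExPASy ProtParam: 5500 M⁻¹ cm⁻¹ per Trp, 1490 per Tyr, and 125 per cystine (a disulfide-bonded Cys pair). With the counts from Table 2 (20 Trp, 16 Tyr, 9 Cys, hence at most 4 cystines), it reproduces both values in Table 3. `concentration_molar` is a hypothetical helper showing how the coefficient would then be used with the Beer-Lambert law.

```python
def extinction_280(n_trp, n_tyr, n_cys):
    """Estimate molar extinction coefficients at 280 nm (M^-1 cm^-1).

    Returns (all Cys reduced, all Cys paired into cystines), assuming
    ProtParam's per-residue values: Trp 5500, Tyr 1490, cystine 125.
    """
    reduced = 5500 * n_trp + 1490 * n_tyr
    cystines = n_cys // 2            # an unpaired Cys cannot form a disulfide
    return reduced, reduced + 125 * cystines


def concentration_molar(a280, epsilon, path_cm=1.0):
    """Beer-Lambert law, A = epsilon * c * l, solved for c (mol/L)."""
    return a280 / (epsilon * path_cm)


print(extinction_280(n_trp=20, n_tyr=16, n_cys=9))  # (133840, 134340)
```

The sketch also shows why tryptophan dominates. The 20 Trp residues of Rv0211 contribute 110000 of the 133840 M⁻¹ cm⁻¹, while the disulfides add only 500.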

References

1. Todar, Kenneth. "Tuberculosis." Todar's Online Textbook of Bacteriology. N.p., 2008. Web. 5 Oct. 2010. <http://www.textbookofbacteriology.net/tuberculosis.html>.

2. "Patient.co.uk - Trusted Medical Information and Support." Patient.co.uk. N.p., n.d. Web. 1 Nov. 2013.