CSC 498 Project: Analysis of Algal Virus Genomes Using Bioinformatics Tools

By Philippe Traverse, V00136885

Introduction

There are currently over 200 different companies worldwide involved in the research and

production of algae-based biofuels (Santhanam, 2011). Algae-based biofuels provide an interesting

avenue for a sustainable alternative to fossil fuels due to both their high lipid content compared to other

plants, and the fact that they require only seawater to grow, as opposed to all the land-based plants

currently being used to produce biofuel (Halim et al, 2010). One of the types of algae that has the

highest lipid content is Chlorella sp., which has been used for a long time as a model organism of algae in

the literature as it is easy to grow and well documented by other scientists (Radakovits et al, 2010).

Chlorella is a single-celled green alga which belongs to the division Chlorophyta, and can live

symbiotically with Paramecium bursaria by providing it with products of photosynthesis while the

Paramecium provides it with both motility and nutrients in times of low sunlight (Karakashian, 1975).

Within its Paramecium bursaria host, the Chlorella can be infected by a virus known as PBCV-1.

PBCV-1 is a large virus of the Phycodnaviridae family with a double-stranded DNA genome of

333 kbp. Its capsid is icosahedral, measuring approximately 190 nm in diameter (van Etten

et al, 2002). Based on phylogenetic analysis of PBCV-1 and other viruses, and in particular the DNA

polymerase gene encoded by them, viruses of single-celled algae such as PBCV-1 are thought to be some

of the oldest organisms on the planet at nearly 1.2

billion years old (Villareal et al, 2000). Research into

phylogenetic background as well as functional

annotation of genes in PBCV-1 is interesting for both

the evolutionary history of single-celled algae viruses

as well as their potential application as a vector for

genetic modification of Chlorella.

Figure 1. Virus Phylogeny. Source: Villareal et al, 2000.

Methods

Sequence data of PBCV-1 was kindly provided by Dr David Dunigan and Dr Peter van Etten

from the University of Nebraska. This sequence data was put into the algae virus database hosted at

bioinformatics.ca and maintained by Dr Chris Upton at UVic. In addition, Dr Dunigan provided Excel files

containing information about parameters such as gene size, GC content, and

some functional annotations of PBCV-1’s genome (see Appendix 1).
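
As a sanity check on annotation tables of this kind, the size columns can be compared against the gene coordinates. The following Python sketch assumes, based on the rows listed in Appendix 1, that nt = End - Start + 1 and that aa = nt / 3 - 1 (the final codon presumably being the stop codon). The function names are illustrative and are not taken from the scripts in Appendix 2.

```python
def expected_sizes(start, end):
    """Return (nt, aa) implied by 1-based inclusive gene coordinates.

    Assumption: the last codon is a stop codon and is not counted in aa.
    """
    nt = end - start + 1
    aa = nt // 3 - 1
    return nt, aa


def check_row(start, end, nt, aa):
    """True when the tabulated nt and aa values agree with the coordinates."""
    return expected_sizes(start, end) == (nt, aa)


if __name__ == "__main__":
    # (gene, start, end, nt, aa) copied from Appendix 1
    rows = [
        ("a001L", 280, 549, 270, 89),
        ("A003R", 1792, 2217, 426, 141),
        ("A185R", 94548, 97390, 2742, 913),
    ]
    for gene, start, end, nt, aa in rows:
        status = "consistent" if check_row(start, end, nt, aa) else "check manually"
        print(gene, status)
```

A row that fails the check is not necessarily wrong: the coordinates may span a feature that is not translated, or a digit may have been mistranscribed, so flagged rows are best inspected by hand.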

The tools used at virology.ca each have access to seven different databases of viruses, organized

into phylogenetic groups: Adenoviridae, Algal viruses, Asfarviridae, Baculoviridae, Coronaviridae,

Herpesviridae, and Iridoviridae. The databases each contain several genomes of individual viruses, with

the genomes organized into genes. Genes that are homologous across several genomes in the database

are grouped into families; by selecting a single family, all the genes can be retrieved at once and

compared against one another. The main tool used to retrieve and organize sequences by gene family is

the Viral Orthologous Clusters (VOCs) tool.

Once genes have been retrieved from the database, there are several other tools that can be

used for comparison. One of the most common visual ways of comparing related sequences is using a

program called Dotter. In this program, a graph is made with one sequence along the x-axis and the

other along the y-axis, and each pixel in the graph is shaded according to how similar the two sequences

are at that position. The resulting graph shows how similar the two sequences are, with perfect

similarity being represented by a perfect diagonal. This tool is particularly useful for finding regions of a

gene or genome that have been duplicated or reversed, as these regions will appear as lines parallel or

perpendicular to the diagonal, respectively. The genome of PBCV-1 was compared to the genomes of

two closely related strains, AR158 and NY2A.
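
The dot plot idea can be illustrated with a short Python sketch. This is not Dotter's implementation: Dotter shades each pixel by a similarity score, whereas this simplified version marks a cell only when a short window of the two sequences matches at enough positions. The function names, window size, and threshold are assumptions chosen for illustration.

```python
def dot_plot(seq_x, seq_y, window=3, min_matches=3):
    """Build a boolean dot-plot matrix.

    Cell [j][i] is True when the windows starting at seq_x[i] and seq_y[j]
    agree at min_matches or more of their `window` positions.
    """
    cols = len(seq_x) - window + 1
    rows = len(seq_y) - window + 1
    matrix = []
    for j in range(rows):
        row = []
        for i in range(cols):
            matches = sum(
                1 for k in range(window) if seq_x[i + k] == seq_y[j + k]
            )
            row.append(matches >= min_matches)
        matrix.append(row)
    return matrix


def render(matrix):
    """Render the matrix as text, with '#' for a match and '.' otherwise."""
    return "\n".join(
        "".join("#" if cell else "." for cell in row) for row in matrix
    )


if __name__ == "__main__":
    # A sequence compared with itself gives an unbroken main diagonal.
    print(render(dot_plot("ACGTTGCA", "ACGTTGCA")))
```

Comparing a sequence containing a repeat (for example "ACGACG" on the x-axis against "ACG" on the y-axis) produces a second marked cell away from the main diagonal, which is how duplicated regions show up as parallel lines in a full-size plot.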

Alongside whole genome comparison, a subset of the largest genes of PBCV-1 was also

uploaded to PDBAlert, which is a web-based tool hosted by the Max Planck Institute for Developmental

Biology in Tübingen, Germany. The user uploads protein or nucleotide sequences, and the queries are

continuously compared against the institute's databases using HHPred. HHPred builds a sequence profile

for each query using hidden Markov models, from which the likely tertiary structure can be inferred

(Agarval et al, 2008). Sequence profiles are then compared to find regions of similarity, and proteins with

similar tertiary structure to the query sequence are returned as search results, sorted by statistical

significance. This makes HHPred a very useful tool for predicting gene function, as the function of a

protein is closely linked to its structure.

Finally, the genes of PBCV-1 were also classified into several categories: first, they were

separated into major and minor genes, based on both their size and their predicted function; second,

they were separated by stage of expression in the viral life cycle; lastly, they were categorized by

their method of detection by Dr Dunigan in the lab. Based on the gene annotations provided by Dr

Dunigan (see Appendix 1), the genes were grouped into the above categories using simple Python

scripts (see Appendix 2) and then assigned different colours for an easy visual representation of the

whole genome in the VGO genome map program.

Results

Dot plots of PBCV-1 against its closely related strains PBCV-AR158 and PBCV-NY2A reveal that

the strains have very few differences in their genomes. This is somewhat expected,

considering that the three organisms are only different strains of the same virus (see Figure 3).

Interestingly, there are relatively few obvious regions that have been duplicated or reversed.

Sequences uploaded to PDBAlert yielded many interesting and statistically significant hits. For

example, gene A540L of PBCV-1 was found to be highly similar to protein folding chaperone molecules

3gud_A and 3gw6_A, which each come from separate bacteriophages. Many other hits had

already been identified by Dr Dunigan through his own BLAST searches;

however, some entirely new ones were found thanks to HHPred’s structural prediction technique. For a

more complete list of results from HHPred on PDBAlert, see Table 1.

Figure 3. a) AR158 vs NY2A b) PBCV-1 vs AR158 c) PBCV-1 vs NY2A

Gene Name | Best HHPred Search Result
A540L | Intramolecular Chaperone 3gud_A
A561L | Alginate Lyase vAL-1
A565R | HP0958 Protein from Helicobacter pylori CCUG 17874
A583L | Topoisomerase II-DNA cleavage complex; DNA Gyrase Subunit B
A629R | Ribonucleoside-diphosphate reductase large chain
A363R | chromodomain-ATPase portion of Chd1 chromatin remodeler
A456L | Adeno-associated virus type 2 Rep40-ADP complex
A512R | No Useful Hit
A181/182R | Chitinase
A185R | DNA polymerase delta
A189/192R | No Useful Hit
A219/222/226R | Chondroitin synthase
A256/257L | No Useful Hit
A014R | No Useful Hit
A018L | No Useful Hit
A025/027/029L | No Useful Hit
A035L | No Useful Hit
A044L | No Useful Hit
A111/114R | Glycosyltransferase (fucosyltransferase)
A140/145R | No Useful Hit
A002bL | No Useful Hit
A002aR | No Useful Hit
A002L | No Useful Hit
a001L | No Useful Hit

Table 1. HHPred Results. Rows highlighted in yellow represent genes whose function was previously unknown

Categorization of genes in PBCV-1 based on Dr Dunigan’s notes was successfully implemented

using Python scripting techniques to produce colouring files. Colouring files simply contain the name of

each gene followed by a hex or RGB value corresponding to the desired colour. The resulting colouring

files were then used in the Genome Map window of VOCs to produce a visualization of the genome with

genes coloured according to their categorization (see Figure 3). These visualizations were then

forwarded on to Dr Dunigan and Dr van Etten in Nebraska to be used in their manuscript describing their

research on PBCV-1.
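
As an illustration of how such a colouring file might be generated, the following Python sketch maps each gene to a colour using the Major column of Appendix 1. It is not the script from Appendix 2: the tab-separated layout, the colour choices, and the function names are all assumptions made for this example, and the exact file format expected by the genome map tool should be checked before use.

```python
# Hypothetical colour scheme: one hex value per category.
CATEGORY_COLOURS = {"major": "#FF0000", "minor": "#0000FF"}


def classify(major_flag):
    """Map the Major column ('Yes', 'No', 'no', ...) to a category name."""
    return "major" if major_flag.strip().lower() == "yes" else "minor"


def colouring_lines(annotations):
    """Return one 'gene<TAB>colour' line per (gene_name, major_flag) pair."""
    return [
        f"{gene}\t{CATEGORY_COLOURS[classify(flag)]}"
        for gene, flag in annotations
    ]


def write_colouring_file(path, annotations):
    """Write the colouring lines to a plain-text file at `path`."""
    with open(path, "w", encoding="utf-8") as handle:
        handle.write("\n".join(colouring_lines(annotations)) + "\n")


if __name__ == "__main__":
    # Three rows taken from Appendix 1 (gene name, Major column).
    sample = [("a001L", "No"), ("A002L", "Yes"), ("a026R", "no")]
    for line in colouring_lines(sample):
        print(line)
```

Normalising the flag with `strip().lower()` matters here because the Major column mixes "No" and "no". The same pattern extends to the other categorizations by swapping in a different classification function and colour table.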

Figure 3. Visual Categorization of Genes by a) Functional Annotations b) Major/Minor genes c) Method of Detection

Conclusion

Algal viruses continue to be an interesting potential tool for genetic modification of algae, but

before these viruses can be manipulated, more work needs to be done to identify gene

functions. Bioinformatics can provide us with the tools necessary to do this; by comparing against a large

database of related sequences, regions of structural similarity can be found. Furthermore, the functional

annotation of genes enables researchers to use computer software to group these genes into

orthologous groups, allowing for much faster and easier identification of novel genes. Finally,

bioinformatics software is an invaluable asset in the visual representation of large contiguous sequences

and display of genetic information.

Single-celled green algae have the potential to one day change the world with their unique

ability to produce high amounts of lipids and other energy-rich molecules. With the vast amount of

sequence data available today in the many different databases across the web, the future is bright for

genomic comparisons of both viruses and living organisms alike.

Appendices

Appendix 1. Functional Annotations Provided by Dr Dunigan

Gene | Major | Start | End | %AT | nt | aa | KEGG | COG | PfamA | BLASTp
a001L | No | 280 | 549 | 68 | 270 | 89 | N/A | N/A | N/A | N/A
a002aR | No | 1022 | 1177 | 66 | 156 | 51 | N/A | N/A | N/A | N/A
A002BL | Yes | 1174 | 1335 | 56 | 162 | 53 | N/A | N/A | N/A | N/A
A002cR | Yes | 1367 | 1513 | 14 | 147 | 48 | N/A | N/A | N/A | N/A
a002dR | No | 1408 | 1539 | 12 | 132 | 43 | N/A | N/A | N/A | N/A
A002L | Yes | 512 | 1063 | 66 | 552 | 183 | N/A | N/A | N/A | N/A
A003R | Yes | 1792 | 2217 | 70 | 426 | 141 | N/A | N/A | N/A | N/A
a004aL | No | 2221 | 2361 | 52 | 141 | 46 | N/A | N/A | N/A | N/A
a004L | No | 1891 | 2127 | 58 | 237 | 78 | N/A | N/A | N/A | N/A
A005R | Yes | 2288 | 3094 | 66 | 807 | 268 | N/A | FOG: Ankyrin repeat | Ankyrin repeat | Similar to ankyrin 23/unc44 [Aedes aegypti] [2e-33]
a006L | No | 2393 | 2611 | 60 | 219 | 72 | N/A | N/A | N/A | N/A
A007/008L | Yes | 3292 | 4701 | 84 | 1410 | 469 | N/A | FOG: Ankyrin repeat | Ankyrin repeat | Similar to Pfs NACHT and Ankyrin domain protein [Aspergillus fumigatus Af293] [2e-55]
A009R | Yes | 4998 | 5735 | 84 | 738 | 245 | N/A | N/A | Protein of unknown function (DUF1390) | N/A
a010aR | No | 6837 | 7022 | 60 | 186 | 61 | N/A | N/A | N/A | N/A
A010R | Yes | 5768 | 6973 | 88 | 1206 | 401 | N/A | N/A | Large eukaryotic DNA virus major capsid protein | Similar to hypothetical protein OTV1_165 [Ostreococcus tauri virus 1] [6e-42]
A011L | Yes | 6970 | 8181 | 84 | 1212 | 403 | N/A | N/A | Large eukaryotic DNA virus major capsid protein | Similar to hypothetical protein OsV5_190f [Ostreococcus virus OsV5] [5e-44]
a012R | No | 7897 | 8190 | 64 | 294 | 97 | N/A | N/A | N/A | N/A
a013L | No | 8230 | 8571 | 58 | 342 | 113 | N/A | N/A | N/A | N/A
A014R | Yes | 8255 | 12364 | 40 | 4110 | 1369 | N/A | N/A | N/A | N/A
a015L | No | 8863 | 9411 | 48 | 549 | 182 | N/A | N/A | N/A | N/A
a016L | No | 9808 | 10500 | 64 | 693 | 230 | N/A | N/A | N/A | N/A
a017L | No | 11371 | 11571 | 54 | 201 | 66 | N/A | N/A | N/A | N/A
A018L | Yes | 12367 | 16374 | 80 | 4008 | 1335 | N/A | N/A | Chlorovirus glycoprotein repeat | N/A
a019R | No | 12869 | 13519 | 60 | 651 | 216 | N/A | N/A | N/A | N/A
a020R | No | 13889 | 14143 | 62 | 255 | 84 | N/A | N/A | N/A | N/A
a021R | No | 14165 | 14632 | 68 | 468 | 155 | N/A | N/A | N/A | N/A
a022R | No | 14702 | 15031 | 58 | 330 | 109 | N/A | N/A | N/A | N/A
a023R | No | 15482 | 15706 | 70 | 225 | 74 | N/A | N/A | N/A | N/A
a024R | No | 15770 | 16132 | 64 | 363 | 120 | N/A | N/A | N/A | N/A
A025/027/029L | Yes | 16432 | 20511 | 78 | 4080 | 1359 | N/A | N/A | N/A | N/A
a026R | No | 16916 | 17512 | 72 | 597 | 198 | N/A | N/A | N/A | N/A
a030R | No | 18185 | 18721 | 72 | 537 | 178 | N/A | N/A | N/A | N/A
a031R | No | 18755 | 19030 | 54 | 276 | 91 | N/A | N/A | N/A | N/A
a032R | No | 19232 | 19810 | 68 | 579 | 192 | N/A | N/A | N/A | N/A
a033R | No | 20189 | 20428 | 66 | 240 | 79 | N/A | N/A | N/A | N/A
A034R | Yes | 20572 | 21498 | 18 | 927 | 308 | N/A | N/A | Protein kinase domain | N/A
A035L | Yes | 21500 | 23260 | 88 | 1761 | 586 | N/A | N/A | N/A | N/A
a036R | No | 22696 | 23037 | 56 | 342 | 113 | N/A | N/A | N/A | N/A
A037L | Yes | 23288 | 23605 | 68 | 318 | 105 | N/A | N/A | N/A | N/A
a038R | No | 23584 | 23823 | 68 | 240 | 79 | N/A | N/A | N/A | N/A
A039L | Yes | 23623 | 24078 | 80 | 456 | 151 | S-phase kinase-associated protein 1 | SCF ubiquitin ligase SKP1 component | Skp1 family tetramerisation domain / Skp1 family dimerisation domain | Similar to predicted protein [Physcomitrella patens subsp. patens] [2e-27]
a040L | No | 24087 | 24476 | 56 | 390 | 129 | N/A | N/A | N/A | N/A
A041R | Yes | 24148 | 25386 | 34 | 1239 | 412 | N/A | N/A | N/A | Similar to predicted protein [Populus trichocarpa] [6e-10]
a042L | No | 24238 | 24504 | 32 | 267 | 88 | N/A | N/A | N/A | N/A
a043R | No | 24798 | 25025 | 64 | 228 | 75 | N/A | N/A | N/A | N/A
a044aL | No | 25328 | 25492 | 66 | 165 | 54 | N/A | N/A | N/A | N/A
A044L | Yes | 25383 | 27182 | 84 | 1800 | 599 | N/A | ATPases of the AAA+ class | ATPase family associated with various cellular activities (AAA) | Similar to hypothetical protein [Trypanosoma cruzi strain CL Brener] [6e-21]
a045R | No | 26328 | 26540 | 52 | 213 | 70 | N/A | N/A | N/A | N/A
a046R | No | 26720 | 27220 | 68 | 501 | 166 | N/A | N/A | N/A | N/A
a047aL | No | 27217 | 27387 | 58 | 171 | 56 | N/A | N/A | N/A | N/A
a047L | No | 26830 | 27033 | 64 | 204 | 67 | N/A | N/A | N/A | N/A
A048R | Yes | 27248 | 27619 | 18 | 372 | 123 | N/A | N/A | N/A | N/A
A049L | Yes | 27610 | 28269 | 70 | 660 | 219 | glycerophosphoryl diester phosphodiesterase | Glycerophosphoryl diester phosphodiesterase | Glycerophosphoryl diester phosphodiesterase family | Similar to glycerophosphodiester phosphodiesterase [Caldivirga maquilingensis IC-167] [4e-30]
A050aL | Yes | 28821 | 28961 | 84 | 141 | 46 | N/A | N/A | N/A | N/A
A050L | Yes | 28286 | 28711 | 84 | 426 | 141 | N/A | N/A | Pyrimidine dimer DNA glycosylase | Similar to pyrimidine dimer DNA glycosylase/endonuclease V [Prochlorococcus marinus subsp. marinus str. CCMP1375] [2e-29]
A051L | Yes | 28984 | 29595 | 72 | 612 | 203 | N/A | N/A | N/A | Similar to hypothetical protein Bxe_A0110 [Burkholderia xenovorans LB400] [3e-48]
a052R | No | 29167 | 29418 | 62 | 252 | 83 | N/A | N/A | N/A | N/A
A053R | Yes | 29684 | 30775 | 62 | 1092 | 363 | D-lactate dehydrogenase | Lactate dehydrogenase and related dehydrogenases | D-isomer specific 2-hydroxyacid dehydrogenase catalytic domain / D-isomer specific 2-hydroxyacid dehydrogenase NAD binding domain | Similar to fermentative D-lactate dehydrogenase NAD-dependent [Escherichia coli str. K-12 substr. MG1655] [1e-65]
a054L | No | 29800 | 30123 | 56 | 324 | 107 | N/A | N/A | N/A | N/A
a055L | No | 30008 | 30403 | 62 | 396 | 131 | N/A | N/A | N/A | N/A
a056L | No | 30289 | 30603 | 58 | 315 | 104 | N/A | N/A | N/A | N/A
A057aR | Yes | 31650 | 32666 | 38 | 1017 | 338 | N/A | N/A | N/A | N/A
a057R | No | 31034 | 31447 | 74 | 414 | 137 | N/A | N/A | N/A | N/A
A058L | Yes | 30976 | 31623 | 58 | 648 | 215 | N/A | N/A | N/A | N/A
a059L | No | 31779 | 32003 | 52 | 225 | 74 | N/A | N/A | N/A | N/A
A060L | Yes | 32796 | 33500 | 86 | 705 | 234 | N/A | N/A | N/A | N/A
A061L | Yes | 33529 | 34158 | 74 | 630 | 209 | N/A | N/A | Macrocin-O-methyltransferase (TylF) | Similar to macrocin-O-methyltransferase domain-containing protein [Thermomonospora curvata DSM 43183] [9e-31]
a062aR | No | 34016 | 34198 | 54 | 183 | 60 | N/A | N/A | N/A | N/A
a062R | No | 33808 | 34179 | 60 | 372 | 123 | N/A | N/A | N/A | N/A
A063L | Yes | 34191 | 34889 | 96 | 699 | 232 | N/A | N/A | N/A | N/A
A064R | Yes | 34955 | 36872 | | | | | | |
a065L | No | 35265 | 35510 | 60 | 246 | 81 | N/A | N/A | N/A | N/A
a066aL | No | 36851 | 36976 | 58 | 126 | 41 | N/A | N/A | N/A | N/A
a066L | No | 36607 | 36945 | 70 | 339 | 112 | N/A | N/A | N/A | N/A
A067R | Yes | 37043 | 37972 | 62 | 930 | 309 | N/A | N/A | N/A | N/A
a068L | No | 37375 | 37677 | 66 | 303 | 100 | N/A | N/A | N/A | N/A
a069L | No | 37433 | 37873 | 56 | 441 | 146 | N/A | N/A | N/A | N/A
a070L | No | 37798 | 38064 | 66 | 267 | 88 | N/A | N/A | N/A | N/A
A071R | Yes | 38003 | 39067 | 80 | 1065 | 354 | N/A | N/A | N/A | Similar to TPR repeat-containing protein [alpha proteobacterium HIMB114] [8e-16]
a072L | No | 38164 | 38436 | 64 | 273 | 90 | N/A | N/A | N/A | N/A
a073L | No | 38417 | 38626 | 56 | 210 | 69 | N/A | N/A | N/A | N/A
a074L | No | 38660 | 38863 | 74 | 204 | 67 | N/A | N/A | N/A | N/A
a075aR | No | 39784 | 39933 | 60 | 150 | 49 | N/A | N/A | N/A | N/A
A075bL | Yes | 39950 | 40138 | 80 | 189 | 62 | N/A | N/A | N/A | N/A
A075cR | Yes | 40014 | 40172 | 70 | 159 | 52 | N/A | N/A | N/A | N/A
A075L | Yes | 39078 | 39920 | 80 | 843 | 280 | N/A | N/A | Exostosin family | Similar to predicted protein [Chlamydomonas reinhardtii] [1e-15]
A076L | Yes | 40190 | 40501 | 50 | 312 | 103 | N/A | N/A | N/A | N/A
A077L | Yes | 40470 | 40745 | 76 | 276 | 91 | N/A | N/A | N/A | N/A
A078aL | Yes | 41756 | 41878 | 92 | 123 | 40 | N/A | N/A | N/A | N/A
A078bR | Yes | 42007 | 42183 | 52 | 177 | 58 | N/A | N/A | N/A | N/A
a078cL | No | 42431 | 42565 | 62 | 135 | 44 | N/A | N/A | N/A | N/A
A078R | Yes | 40863 | 41759 | 74 | 897 | 298 | N-carbamoylputrescine amidase | Predicted amidohydrolase | Carbon-nitrogen hydrolase | Similar to hydrolase carbon-nitrogen family putative [Heliobacterium modesticaldum Ice1] [4e-81]
A079R | Yes | 42467 | 43195 | 80 | 729 | 242 | N/A | N/A | Protein of unknown function (DUF1390) | N/A
a080L | No | 42638 | 43045 | 58 | 408 | 135 | N/A | N/A | N/A | N/A
A081L | Yes | 43192 | 43761 | 76 | 570 | 189 | N/A | N/A | N/A | N/A
a082R | No | 43760 | 43975 | 60 | 216 | 71 | N/A | N/A | N/A | N/A
a083R | No | 43791 | 44063 | 4 | 273 | 90 | N/A | N/A | N/A | N/A
a084aL | No | 44458 | 44580 | 46 | 123 | 40 | N/A | N/A | N/A | N/A
A084L | Yes | 43811 | 44377 | 76 | 567 | 188 | N/A | N/A | N/A | N/A
A085R | Yes | 44498 | 45226 | 66 | 729 | 242 | N/A | N/A | 2OG-Fe(II) oxygenase superfamily | Similar to Procollagen-proline dioxygenase [Paenibacillus sp. JDR-2] [2e-18]
a086aL | No | 45101 | 45229 | 54 | 129 | 42 | N/A | N/A | N/A | N/A
a086BL | No | 46489 | 46632 | 72 | 144 | 47 | N/A | N/A | N/A | N/A
a086cL | No | 46508 | 46639 | 14 | 132 | 43 | N/A | N/A | N/A | N/A
a086L | No | 44953 | 45195 | 60 | 243 | 80 | N/A | N/A | N/A | N/A
A087R | Yes | 45223 | 46623 | 32 | 1401 | 466 | N/A | N/A | HNH endonuclease | N/A
A088R | Yes | 46755 | 47528 | 72 | 774 | 257 | N/A | N/A | N/A | N/A
a089aL | No | 47826 | 47972 | 76 | 147 | 48 | N/A | N/A | N/A | N/A
A089R | Yes | 47628 | 47834 | 78 | 207 | 68 | N/A | N/A | N/A | N/A
A090R | Yes | 47851 | 48315 | 70 | 465 | 154 | N/A | N/A | N/A | N/A
a092/093aL | No | 48547 | 48732 | 68 | 186 | 61 | N/A | N/A | N/A | N/A
A092/093L | Yes | 48326 | 49627 | 72 | 1302 | 433 | N/A | N/A | PBCV-specific basic adaptor domain | N/A
A094L | Yes | 49662 | 50756 | 74 | 1095 | 364 | N/A | Beta-glucanase/Beta-glucan synthetase | Glycosyl hydrolases family 16 | Similar to glucan endo-13-beta-D-glucosidase [Shewanella frigidimarina NCIMB 400] [3e-21]
a095R | No | 50014 | 50232 | 52 | 219 | 72 | N/A | N/A | N/A | N/A
a096R | No | 50346 | 50549 | 68 | 204 | 67 | N/A | N/A | N/A | N/A
a097R | No | 50678 | 50893 | 68 | 216 | 71 | N/A | N/A | N/A | N/A
A098R | Yes | 50886 | 52592 | 76 | 1707 | 568 | N/A | Glycosyltransferases probably involved in cell wall biogenesis | Chitin synthase | Similar to hypothetical protein BRAFLDRAFT_103600 [Branchiostoma floridae] [3e-58]
a099aL | No | 52539 | 52682 | 60 | 144 | 47 | N/A | N/A | N/A | N/A
a099L | No | 52152 | 52394 | 62 | 243 | 80 | N/A | N/A | N/A | N/A
A100R | Yes | 52691 | 54478 | 60 | 1788 | 595 | glucosamine--fructose-6-phosphate aminotransferase (isomerizing) | Glucosamine 6-phosphate synthetase contains amidotransferase and phosphosugar isomerase domains | Glutamine amidotransferases class-II / SIS domain | Similar to glutamine--fructose-6-phosphate transaminase [Methylibium petroleiphilum PM1] [1e-156]
a101L | No | 52925 | 53188 | 54 | 264 | 87 | N/A | N/A | N/A | N/A
a102L | No | 54468 | 54974 | 56 | 507 | 168 | N/A | N/A | N/A | N/A
A103R | Yes | 54622 | 55614 | 84 | 993 | 330 | N/A | mRNA capping enzyme guanylyltransferase (alpha) subunit | mRNA capping enzyme catalytic domain / mRNA capping enzyme C-terminal domain | Similar to hypothetical protein OTV1_156 [Ostreococcus tauri virus 1] [1e-26]
a104L | No | 55054 | 55344 | 58 | 291 | 96 | N/A | N/A | N/A | N/A
A105L | Yes | 55626 | 56480 | 84 | 855 | 284 | ubiquitin carboxyl-terminal hydrolase 8 | N/A | Ubiquitin carboxyl-terminal hydrolase | Similar to GM11729 [Drosophila sechellia] [9e-12]
a106R | No | 55770 | 56168 | 56 | 399 | 132 | N/A | N/A | N/A | N/A
A107L | Yes | 56516 | 57388 | 80 | 873 | 290 | N/A | N/A | Transcription factor TFIIB repeat | Similar to PREDICTED: hypothetical protein [Vitis vinifera] [1e-09]
A108aR | Yes | 57212 | 57400 | 52 | 189 | 62 | N/A | N/A | N/A | N/A
A108bL | Yes | 57471 | 57977 | 84 | 507 | 168 | N/A | N/A | GIY-YIG catalytic domain | Similar to hypothetical protein VIBHAR_p08226 [Vibrio harveyi ATCC BAA-1116] [1e-07]
a108R | No | 56933 | 57148 | 60 | 216 | 71 | N/A | N/A | N/A | N/A
a110L | No | 58082 | 58390 | 50 | 309 | 102 | N/A | N/A | N/A | N/A
A111/114R | Yes | 58098 | 60680 | 80 | 2583 | 860 | N/A | N/A | Glycosyltransferase family 10 (fucosyltransferase) | Similar to glycosyl transferase [Cyanothece sp. ATCC 51142] [1e-35]
a112L | No | 58775 | 59068 | 42 | 294 | 97 | N/A | N/A | N/A | N/A
a113L | No | 58809 | 59093 | 34 | 285 | 94 | N/A | N/A | N/A | N/A
a115L | No | 59265 | 59495 | 58 | 231 | 76 | N/A | N/A | N/A | N/A
a116R | No | 59398 | 59643 | 52 | 246 | 81 | N/A | N/A | N/A | N/A
a117L | No | 59874 | 60266 | 68 | 393 | 130 | N/A | N/A | N/A | N/A
A118R | Yes | 60726 | 61763 | 74 | 1038 | 345 | GDPmannose 4-6-dehydratase | Nucleoside-diphosphate-sugar epimerases | NAD dependent epimerase/dehydratase family | Similar to GDP-D-mannose dehydratase [Yersinia pseudotuberculosis IP 32953] [1e-113]
a119L | No | 61628 | 62029 | 70 | 402 | 133 | N/A | N/A | N/A | N/A
a120L | No | 61675 | 61947 | 50 | 273 | 90 | N/A | N/A | N/A | N/A
A121R | Yes | 61780 | 62094 | 72 | 315 | 104 | N/A | N/A | N/A | Similar to hypothetical protein [Tetrahymena thermophila SB210] [2e-15]
A122/123aR | Yes | 66133 | 66258 | 68 | 126 | 41 | N/A | N/A | N/A | N/A
a122/123bL | No | 66203 | 66346 | 62 | 144 | 47 | N/A | N/A | N/A | N/A
A122/123cL | Yes | 66407 | 66541 | 64 | 135 | 44 | N/A | N/A | N/A | N/A
A122/123R | Yes | 62145 | 66176 | 88 | 4032 | 1343 | N/A | Autotransporter adhesin | Chlorovirus glycoprotein repeat / Domain of unknown function (DUF3476) | Similar to hypothetical protein Epulo_01906 [Epulopiscium sp. N.t. morphotype B] [2e-25]
a124L | No | 66517 | 66771 | 60 | 255 | 84 | N/A | N/A | N/A | N/A
A125L | Yes | 66178 | 67120 | 84 | 543 | 180 | N/A | DNA-directed RNA polymerase subunit M/Transcription elongation factor TFIIS | Transcription factor S-II (TFIIS) central domain / Transcription factor S-II (TFIIS) | Similar to unnamed protein product [Kluyveromyces lactis] [3e-17]
a126R | No | 66620 | 66820 | 24 | 201 | 66 | N/A | N/A | N/A | N/A
A127R | Yes | 67154 | 67891 | 14 | 738 | 245 | N/A | N/A | N/A | Similar to hypothetical protein OTV1_142 [Ostreococcus tauri virus 1] [4e-20]
a128L | No | 67510 | 67770 | 56 | 261 | 86 | N/A | N/A | N/A | N/A
A129R | Yes | 67952 | 69028 | 82 | 1077 | 358 | N/A | N/A | N/A | N/A
A130aR | Yes | 69261 | 69404 | 62 | 144 | 47 | N/A | N/A | N/A | N/A
a130bR | No | 69344 | 69475 | 58 | 132 | 43 | N/A | N/A | N/A | N/A
A130R | Yes | 69049 | 69366 | 84 | 318 | 105 | N/A | N/A | N/A | N/A
A131L | Yes | 69359 | 69769 | 68 | 411 | 136 | N/A | N/A | N/A | N/A
a132R | No | 69533 | 69805 | 54 | 273 | 90 | N/A | N/A | N/A | N/A
A133R | Yes | 69960 | 70583 | 86 | 624 | 207 | N/A | N/A | Thylakoid formation protein | Similar to inositol phosphatase-like protein [Chlamydomonas reinhardtii] [1e-24]
A134L | Yes | 70558 | 71055 | 92 | 498 | 165 | N/A | N/A | GIY-YIG catalytic domain | N/A
A135L | Yes | 71055 | 71603 | 74 | 549 | 182 | N/A | N/A | N/A | N/A
A136R | Yes | 71134 | 71574 | 30 | 441 | 146 | N/A | N/A | N/A | N/A
A137R | Yes | 71625 | 71843 | 44 | 219 | 72 | N/A | N/A | N/A | N/A
a138aR | No | 72604 | 72747 | 72 | 144 | 47 | N/A | N/A | N/A | N/A
A138R | Yes | 71880 | 72701 | 84 | 822 | 273 | N/A | N/A | N/A | N/A
A139L | Yes | 72698 | 73153 | 60 | 456 | 151 | N/A | N/A | N/A | N/A
A140/145R | Yes | 73107 | 76490 | 90 | 3384 | 1127 | N/A | N/A | N/A | N/A
a142L | No | 74474 | 74875 | 48 | 402 | 133 | N/A | N/A | N/A | N/A
a144L | No | 74244 | 75740 | 64 | 1497 | 498 | N/A | N/A | N/A | N/A
a146L | No | 75752 | 76006 | 68 | 255 | 84 | N/A | N/A | N/A | N/A
a147L | No | 75882 | 76655 | 56 | 774 | 257 | N/A | N/A | N/A | N/A
A148R | Yes | 76540 | 76872 | 82 | 333 | 110 | N/A | N/A | N/A | N/A
a149L | No | 76642 | 77052 | 62 | 411 | 136 | N/A | N/A | N/A | N/A
A150L | Yes | 76856 | 77314 | 74 | 459 | 152 | N/A | N/A | N/A | N/A
A151R | Yes | 77397 | 77804 | 52 | 408 | 135 | N/A | N/A | N/A | N/A
a152L | No | 77822 | 78094 | 58 | 273 | 90 | N/A | N/A | N/A | N/A
A153R | Yes | 77880 | 79259 | 80 | 1380 | 459 | N/A | DNA or RNA helicases of superfamily II | Type III restriction enzyme res subunit | Similar to hypothetical protein OsV5_067f [Ostreococcus virus OsV5] [2e-61]
A154L | Yes | 79256 | 80299 | 70 | 1044 | 347 | N/A | N/A | N/A | Similar to EsV-1-7 [Ectocarpus siliculosus virus 1] [9e-11]
a155R | No | 79412 | 79696 | 52 | 285 | 94 | N/A | N/A | N/A | N/A
a156L | No | 79885 | 80217 | 52 | 333 | 110 | N/A | N/A | N/A | N/A
A157L | Yes | 80393 | 80725 | 80 | 333 | 110 | N/A | N/A | N/A | N/A
A158L | Yes | 80770 | 81084 | 68 | 315 | 104 | N/A | N/A | N/A | N/A
A159R | Yes | 81104 | 81436 | 36 | 333 | 110 | N/A | N/A | N/A | N/A
a160L | No | 81257 | 81751 | 68 | 495 | 164 | N/A | N/A | N/A | N/A
A161R | Yes | 81345 | 81716 | 58 | 372 | 123 | N/A | N/A | N/A | N/A
a162aR | No | 82831 | 82959 | 68 | 129 | 42 | N/A | N/A | N/A | N/A
A162L | Yes | 81717 | 82952 | 78 | 1236 | 411 | N/A | N/A | N/A | N/A
A163R | Yes | 82995 | 84296 | 12 | 1302 | 433 | N/A | N/A | Ligand-gated ion channel | N/A
A164aR | Yes | 84314 | 84493 | 14 | 180 | 59 | N/A | N/A | N/A | N/A
a164L | No | 83873 | 84277 | 80 | 405 | 134 | N/A | N/A | N/A | N/A
A165aL | Yes | 84835 | 85326 | 70 | 492 | 163 | N/A | N/A | N/A | N/A
A165L | Yes | 84486 | 84935 | 60 | 450 | 149 | N/A | N/A | N/A | N/A
A166R | Yes | 85431 | 86237 | 80 | 807 | 268 | N/A | N/A | YqaJ-like viral recombinase domain | Similar to hypothetical protein OsV5_146f [Ostreococcus virus OsV5] [2e-33]
a167aR | No | 86128 | 86283 | 68 | 156 | 51 | N/A | N/A | N/A | N/A
a167L | No | 85677 | 85880 | 64 | 204 | 67 | N/A | N/A | N/A | N/A
A168R | Yes | 86276 | 86776 | 86 | 501 | 166 | N/A | N/A | N/A | N/A
A169R | Yes | 86904 | 87875 | 74 | 972 | 323 | aspartate carbamoyltransferase catalytic subunit | Ornithine carbamoyltransferase | Aspartate/ornithine carbamoyltransferase carbamoyl-P binding domain / Aspartate/ornithine carbamoyltransferase Asp/Orn binding domain | Similar to LOC100282373 [Zea mays] [7e-78]
a170L | No | 86990 | 87259 | 46 | 270 | 89 | N/A | N/A | N/A | N/A
A171R | Yes | 87904 | 89067 | 84 | 1164 | 387 | N/A | N/A | N/A | Similar to predicted protein [Physcomitrella patens subsp. patens] [9e-11]
A172aL | Yes | 88944 | 89111 | 60 | 168 | 55 | N/A | N/A | N/A | N/A
a172L | No | 88627 | 88947 | 52 | 321 | 106 | N/A | N/A | N/A | N/A
A173L | Yes | 89068 | 89934 | 68 | 867 | 288 | N/A | Predicted esterase of the alpha-beta hydrolase superfamily | Patatin-like phospholipase | Similar to hypothetical protein BRAFLDRAFT_104465 [Branchiostoma floridae] [8e-20]
A174L | Yes | 89941 | 90138 | 76 | 198 | 65 | N/A | N/A | N/A | N/A
A175R | Yes | 89974 | 90183 | 22 | 210 | 69 | N/A | N/A | N/A | N/A
A176aL | Yes | 90415 | 90537 | 64 | 123 | 40 | N/A | N/A | N/A | N/A
A176L | Yes | 90180 | 90413 | 62 | 234 | 77 | N/A | N/A | PBCV-specific basic adaptor domain | N/A
A177R | Yes | 90639 | 91379 | 84 | 741 | 246 | N/A | N/A | Protein of unknown function (DUF1390) | N/A
a178L | No | 90915 | 91160 | 66 | 246 | 81 | N/A | N/A | N/A | N/A
a179L | No | 91201 | 91530 | 62 | 330 | 109 | N/A | N/A | N/A | N/A
A180R | Yes | 91466 | 91792 | 78 | 327 | 108 | N/A | Predicted RNA-binding protein homologous to eukaryotic snRNP | Domain of unknown function (DUF814) | Similar to fibronectin-binding A domain-containing protein [Fervidobacterium nodosum Rt17-B1] [1e-12]
A181/182R | Yes | 91810 | 94302 | 80 | 2493 | 830 | bifunctional chitinase/lysozyme | N/A | Cellulose binding domain | Similar to fibronectin type III domain protein [Edwardsiella ictaluri 93-146] [7e-47]
a183L | No | 93624 | 93854 | 60 | 231 | 76 | N/A | N/A | N/A | N/A
a184L | No | 94120 | 94470 | 76 | 351 | 116 | N/A | N/A | N/A | N/A
A185R | Yes | 94548 | 97390 | 40 | 2742 | 913 | DNA polymerase delta subunit 1 | DNA polymerase elongation subunit (family B) | DNA polymerase family B exonuclease domain / DNA polymerase family B | Similar to hypothetical protein OsV5_240f [Ostreococcus virus OsV5] [1e-143]
a186L | No | 94966 | 95220 | 46 | 255 | 84 | N/A | N/A | N/A | N/A
a187L | No | 95295 | 96038 | 58 | 744 | 247 | N/A | N/A | N/A | N/A
A188aR | Yes | 96944 | 97390 | 58 | 447 | 148 | N/A | DNA polymerase elongation subunit (family B) | DNA polymerase family B | Similar to hypothetical protein OTV1_208 [Ostreococcus tauri virus 1] [4e-09]
a188bR | No | 97258 | 97398 | 72 | 141 | 46 | N/A | N/A | N/A | N/A
a188L | No | 96724 | 96957 | 56 | 234 | 77 | N/A | N/A | N/A | N/A
A189/192R | Yes | 97433 | 101332 | 28 | 3900 | 1299 | N/A | N/A | N/A | N/A
a190L | No | 97525 | 97743 | 52 | 219 | 72 | N/A | N/A | N/A | N/A
A193L | Yes | 101340 | 102128 | 72 | 789 | 262 | proliferating cell nuclear antigen | DNA polymerase sliding clamp subunit (PCNA homolog) | Proliferating cell nuclear antigen N-terminal domain / Proliferating cell nuclear antigen C-terminal domain | Similar to hypothetical protein OsV5_115f [Ostreococcus virus OsV5] [4e-36]
a194R No 101887 102135

56 249 82 N/A N/A N/A N/A

a195R No 102102 102350

66 249 82 N/A N/A N/A N/A

A196L Yes 102156 102614

90 459 152 N/A N/A N/A N/A

a197R No 102340 102627

62 288 95 N/A N/A N/A N/A

a198R No 102399 102623

62 225 74 N/A N/A N/A N/A

A199R Yes 102663 102968

6 306 101 N/A N/A N/A N/A

a200aR

No 103320 103484

52 165 54 N/A N/A N/A N/A

A200R Yes 103057 103413

76 357 118 N/A N/A Cytidine and deoxycytidylate deaminase zinc-binding region

N/A

A201aL

Yes 103546 103722

18 177 58 N/A N/A N/A N/A

A201L Yes 103422 103706

74 285 94 N/A N/A N/A N/A

A202L Yes 103726 104067

90 342 113 N/A N/A N/A N/A

A203R Yes 104134 104784

26 651 216 N/A N/A N/A N/A

a204aL

No 104750 104911

58 162 53 N/A N/A N/A N/A

a204L No 104256 104495

62 240 79 N/A N/A N/A N/A

A205R Yes 104811 105431

80 621 206 N/A N/A PBCV-specific basic adaptor domain

N/A

a206L No 105024 105281

64 258 85 N/A N/A N/A N/A

A207R Yes 105509 106627

86 1119

372 ornithine decarboxylase

Arginine decarboxylase (spermidine biosynthesis)

Pyridoxal-dependent decarboxylase pyridoxal binding domain / Pyridoxal-dependent decarboxylase C-terminal sheet domain

Similar to ornithine decarboxylase [Bos taurus] [2e-73]

A208R Yes 106658 107593

76 936 311 N/A N/A N/A N/A

a210L No 107125 107616

64 492 163 N/A N/A N/A N/A

a211R No 107223 107546

56 324 107 N/A N/A N/A N/A

A212R Yes 107615 107782

78 168 55 N/A N/A N/A N/A

a213aL No 108261 108407 56 147 48 N/A N/A N/A N/A

A213L Yes 107779 108225 84 447 148 N/A N/A N/A N/A

a214aL

No 108548 108691

46 144 47 N/A N/A N/A N/A

A214L Yes 108265 108672

76 408 135 N/A N/A N/A N/A

A215L Yes 108746 109711

82 966 321 N/A N/A N/A Similar to hypothetical protein CC1G_06067 [Coprinopsis cinerea okayama7#130] [1e-08]

a216R No 109263 109586

62 324 107 N/A N/A N/A N/A

A217L Yes 109733 110917

70 1185

394 N/A N/A N/A N/A

a218L No 110562 110927

18 366 121 N/A N/A N/A N/A

A219/222/226R

Yes 110893 112926

58 2034

677 N/A Glycosyltransferases probably involved in cell wall biogenesis

N/A Similar to putative transmembrane cellulose synthase [Rhizobium leguminosarum bv. viciae 3841] [2e-82]

a220L No 110995 111384

50 390 129 N/A N/A N/A N/A

a223aL

No 111891 112169

64 279 92 N/A N/A N/A N/A

a223R No 111638 111850

60 213 70 N/A N/A N/A N/A

a224L No 112197 112463

54 267 88 N/A N/A N/A N/A

a225L No 112525 112797

66 273 90 N/A N/A N/A N/A

A227L Yes 112931 113344

80 414 137 N/A N/A N/A N/A

a228aR

No 113268 113402

64 135 44 N/A N/A N/A N/A

a228R No 112989 113219

60 231 76 N/A N/A N/A N/A

A229L Yes 113365 113598

82 234 77 N/A N/A N/A N/A

A230R Yes 113623 114213

36 591 196 N/A N/A N/A N/A

A231L Yes 114214 115365

80 1152

383 N/A N/A N/A N/A

a232aR

No 115274 115408

52 135 44 N/A N/A N/A N/A

a232R No 115006 115269

60 264 87 N/A N/A N/A N/A

A233R Yes 115442 115780

42 339 112 N/A N/A N/A N/A

A234L Yes 115777 116103

72 327 108 N/A N/A N/A N/A

a235R No 115879 116130

54 252 83 N/A N/A N/A N/A

a236L No 116127 116390

68 264 87 N/A N/A N/A N/A

A237R Yes 116167 117723

22 1557

518 N/A Homospermidine synthase

Saccharopine dehydrogenase

Similar to homospermidine synthase [Opitutus terrae PB90-1] [1e-83]

a238L No 117246 117590

46 345 114 N/A N/A N/A N/A

A239L Yes 117726 118172

72 447 148 N/A N/A N/A N/A

A240aL

Yes 118272 118457

72 186 61 N/A N/A N/A N/A

a240bR

No 118336 118476

68 141 46 N/A N/A N/A N/A

a240cL

No 118447 118611

52 165 54 N/A N/A N/A N/A

a240dL

No 118550 118696

60 147 48 N/A N/A N/A N/A

a240L No 117770 117967

48 198 65 N/A N/A N/A N/A

A241R Yes 118556 120733

78 2178

725 ATP-dependent RNA helicase DOB1

Lhr-like helicases

DEAD/DEAH box helicase / DSHCT (NUC185) domain

Similar to hypothetical protein [Monosiga brevicollis MX1] [2e-88]

a242L No 119090 119443

62 354 117 N/A N/A N/A N/A

A243R Yes 120839 121747

92 909 302 N/A N/A N/A N/A

a244L No 121832 122065

44 234 77 N/A N/A N/A N/A

A245R Yes 121875 122438

80 564 187 Cu/Zn superoxide dismutase

Cu/Zn superoxide dismutase

Copper/zinc superoxide dismutase (SODC)

Similar to superoxide dismutase [cu-zn] [7e-42]

A246R Yes 122878 123336

56 459 152 N/A N/A Barwin family Similar to PR4 (PATHOGENESIS-RELATED 4) chitin binding [Arabidopsis thaliana] [8e-10]

A246aR

Yes 122469 122810

82 342 113 N/A N/A N/A N/A

A247R Yes 123427 124578

84 1152

383 N/A FOG: Ankyrin repeat

Ankyrin repeat Similar to hypothetical protein Aasi_1435 [Candidatus Amoebophilus asiaticus 5a2] [9e-26]

A248R Yes 124712 125638

70 927 308 calcium/calmodulin-dependent protein kinase I

Serine/threonine protein kinase

Protein kinase domain

Similar to hypothetical protein [Paramecium tetraurelia strain d4-2] [2e-26]

a249L No 125191 125499

70 309 102 N/A N/A N/A N/A

a250aR

No 125816 125971

54 156 51 N/A N/A N/A N/A

A250R Yes 125670 125954

82 285 94 N/A N/A Ion channel Similar to EsV-1-223 [Ectocarpus siliculosus virus 1] [6e-07]

a251aL

No 126233 126658

56 426 141 N/A N/A N/A N/A

a251bL

No 126958 127113

80 156 51 N/A N/A N/A N/A

A251R Yes 126048 127028

82 981 326 N/A Adenine-specific DNA methylase

D12 class N6 adenine-specific DNA methyltransferase

Similar to Site-specific DNA-methyltransferase (adenine-specific) [Dokdonia donghaensis MED134] [2e-50]

a252aL

No 128124 128378

56 255 84 N/A N/A N/A N/A

a252bL

No 127966 128118

70 153 50 N/A N/A N/A N/A

A252R Yes 127028 128056

74 1029

342 N/A N/A N/A N/A

a253aR

No 128500 128634

56 135 44 N/A N/A N/A N/A

A253R Yes 128133 128585

62 453 150 N/A N/A N/A N/A

A254R Yes 128650 129126

72 477 158 N/A N/A N/A N/A

A255R Yes 129169 129615

86 447 148 N/A N/A N/A N/A

A256/257L

Yes 129608 132121

68 2514

837 N/A N/A N/A N/A

a258R No 131760 132011 60 252 83 N/A N/A N/A N/A

a259aR

No 132433 132675

68 243 80 N/A N/A N/A N/A

a259bR

No 132647 132847

26 201 66 N/A N/A N/A N/A

A259L Yes 132088 132612

78 525 174 N/A N/A N/A N/A

A260aR

Yes 133700 133897

60 198 65 N/A N/A N/A N/A

a260bR

No 133915 134184

54 270 89 N/A N/A N/A N/A

A260R Yes 132729 134246

78 1518

505 N/A Chitinase Glycosyl hydrolases family 18

Similar to hypothetical protein FG10729.1 [Gibberella zeae PH-1] [1e-48]

A261R Yes 134281 134898

88 618 205 N/A N/A N/A N/A

A262/263L

Yes 134966 135736

64 771 256 N/A N/A N/A N/A

a264R No 135649 135852

84 204 67 N/A N/A N/A N/A

A265L Yes 135838 136587

90 750 249 N/A N/A Poxvirus A22 protein

Similar to hypothetical protein OsV5_058f [Ostreococcus virus OsV5] [5e-10]

a266aR

No 136492 136617

58 126 41 N/A N/A N/A N/A

a266R No 136197 136448

68 252 83 N/A N/A N/A N/A

A267L Yes 136638 137582

80 945 314 N/A N/A N/A Similar to hypothetical protein MIMI_R423 [Acanthamoeba polyphaga mimivirus] [2e-15]

a268R No 137001 137216

50 216 71 N/A N/A N/A N/A

a269R No 137259 137465

62 207 68 N/A N/A N/A N/A

a270R No 137775 138056

80 282 93 N/A N/A N/A N/A

A271L Yes 137759 138583

74 825 274 N/A Lysophospholipase

Putative lysophospholipase

Similar to putative hydrolase [marine gamma proteobacterium HTCC2080] [2e-13]

a272aR

No 138380 138619

58 240 79 N/A N/A N/A N/A

a272R No 138026 138229 54 204 67 N/A N/A N/A N/A

A273L Yes 138638 139054

88 417 138 N/A N/A Domain of unknown function (DUF305)

Similar to hypothetical protein MIMI_L153 [Acanthamoeba polyphaga mimivirus] [4e-16]

A274R Yes 139119 139910

16 792 263 N/A N/A N/A N/A

A275R Yes 140138 140896

80 759 252 N/A N/A Protein of unknown function (DUF1390)

N/A

a276L No 140462 140746

60 285 94 N/A N/A N/A N/A

A277L Yes 140812 141723

70 912 303 calcium/calmodulin-dependent protein kinase (CaM kinase) II

Serine/threonine protein kinase

Protein kinase domain

Similar to hypothetical protein DDB_G0289119 [Dictyostelium discoideum AX4] [1e-21]

A278L Yes 141759 143591

80 1833

610 N/A N/A Protein kinase domain / PBCV-specific basic adaptor domain

Similar to protein kinase Fuz7 [Ustilago maydis 521] [8e-06]

a279R No 142146 142364

68 219 72 N/A N/A N/A N/A

a280R No 143242 143523

56 282 93 N/A N/A N/A N/A

a281R No 143601 144320

52 720 239 N/A N/A N/A N/A

A282L Yes 143630 145339

60 1710

569 N/A N/A Protein kinase domain / PBCV-specific basic adaptor domain

Similar to hypothetical protein MGL_1199 [Malassezia globosa CBS 7966] [1e-06]

a283L No 144085 144381

52 297 98 N/A N/A N/A N/A

A284L Yes 145395 146234

82 840 279 choloylglycine hydrolase

Penicillin V acylase and related amidases

Linear amide C-N hydrolases choloylglycine hydrolase family

Similar to penicillin amidase [Planctomyces limnophilus DSM 3776] [1e-41]

a285R No 146062 146277

64 216 71 N/A N/A N/A N/A

a286aR

No 147401 147574

76 174 57 N/A N/A N/A N/A

a286bL

No 147460 147633

52 174 57 N/A N/A N/A N/A

A286R Yes 146261 147397

30 1137

378 N/A N/A N/A N/A

A287R Yes 147481 148287

74 807 268 N/A N/A GIY-YIG catalytic domain / NUMOD1 domain

Similar to hypothetical protein RUMHYD_01972 [Blautia hydrogenotrophica DSM 10507] [1e-17]

a288L No 147552 147818

62 267 88 N/A N/A N/A N/A

A289L Yes 148278 149129

74 852 283 MAP/microtubule affinity-regulating kinase

Serine/threonine protein kinase

Protein kinase domain

Similar to hypothetical protein [Paramecium tetraurelia strain d4-2] [2e-23]

a290R No 148366 148773

48 408 135 N/A N/A N/A N/A

a291R No 148920 149189

68 270 89 N/A N/A N/A N/A

A292L Yes 149247 150233

80 987 328 chitosanase N/A Glycosyl hydrolase family 46

Similar to Chitosanase [Streptosporangium roseum DSM 43021] [1e-16]

a293R No 149329 149835

56 507 168 N/A N/A N/A N/A

a294R No 149926 150150

54 225 74 N/A N/A N/A N/A

A295L Yes 150236 151189

84 954 317 GDP-L-fucose synthase

dTDP-D-glucose 4,6-dehydratase

NAD dependent epimerase/dehydratase family

Similar to NAD-dependent epimerase/dehydratase [Spirosoma linguale DSM 74] [1e-110]

A296R Yes 151233 151706

14 474 157 N/A N/A N/A N/A

a297aL

No 152133 152261

40 129 42 N/A N/A N/A N/A

A297L Yes 151703 152236

88 534 177 N/A N/A N/A Similar to Pc22g24690 [Penicillium chrysogenum Wisconsin 54-1255] [4e-07]

A298L Yes 152334 153011

72 678 225 N/A N/A N/A Similar to hypothetical protein bglu_2g06720 [Burkholderia glumae BGR1] [2e-12]

a299R No 152521 153039

48 519 172 N/A N/A N/A N/A

a300R No 152796 153026

58 231 76 N/A N/A N/A N/A

A301L Yes 153032 153757

80 726 241 N/A N/A N/A N/A

a302R No 153359 153781

50 423 140 N/A N/A N/A N/A

a303L No 153636 153974

68 339 112 N/A N/A N/A N/A

A304R Yes 153814 154050

8 237 78 N/A N/A N/A N/A

A305L Yes 154040 154654

66 615 204 dual specificity phosphatase

N/A Dual specificity phosphatase catalytic domain

Similar to dual specificity protein phosphatase 7 putative [Aedes aegypti] [2e-12]

A306L Yes 154680 154940

82 261 86 N/A N/A N/A N/A

a307aR

No 154921 155163

68 243 80 N/A N/A N/A N/A

a307R No 154685 154921

30 237 78 N/A N/A N/A N/A

A308L Yes 154981 155340

72 360 119 N/A N/A N/A N/A

a309R No 155242 155370 64 129 42 N/A N/A N/A N/A

A310L Yes 155484 155996 88 513 170 N/A N/A N/A N/A

a311R No 155572 155823

54 252 83 N/A N/A N/A N/A

a312aR

No 156732 156890

42 159 52 N/A N/A N/A N/A

A312L Yes 156056 156772

66 717 238 N/A N/A N/A Similar to hypothetical protein OsV5_171r [Ostreococcus virus OsV5] [5e-21]

a313aR

No 157107 157268

64 162 53 N/A N/A N/A N/A

A313L Yes 157000 157215

78 216 71 N/A N/A N/A N/A

A314R Yes 157306 157548

68 243 80 N/A N/A N/A N/A

A315L Yes 157545 158285

80 741 246 N/A N/A GIY-YIG catalytic domain / NUMOD1 domain

Similar to hypothetical protein RUMHYD_01972 [Blautia hydrogenotrophica DSM 10507] [1e-18]

A316R Yes 158346 159662

18 1317

438 N/A N/A N/A Similar to hypothetical protein LNTAR_07704 [Lentisphaera araneosa HTCC2155] [3e-14]

a317L No 158400 158855

50 456 151 N/A N/A N/A N/A

A318R Yes 159613 159786

70 174 57 N/A N/A N/A N/A

a319L No 159131 159385

60 255 84 N/A N/A N/A N/A

A320R Yes 159641 160060

68 420 139 N/A N/A N/A N/A

a321aR

No 160404 160541

56 138 45 N/A N/A N/A N/A

A321R Yes 160112 160471

90 360 119 N/A N/A N/A N/A

A322L Yes 160547 161077

90 531 176 N/A N/A N/A N/A

a323R No 160674 161072

52 399 132 N/A N/A N/A N/A

A324L Yes 161118 162479

68 1362

453 N/A N/A N/A Similar to hypothetical protein OsV5_073r [Ostreococcus virus OsV5] [4e-27]

a325R No 161385 161609

44 225 74 N/A N/A N/A N/A

A326L Yes 162545 163174

66 630 209 N/A N/A N/A N/A

a327R No 163110 163436

60 327 108 N/A N/A N/A N/A

A328L Yes 163205 164272

88 1068

355 N/A N/A N/A N/A

a329aL

No 165515 165649

36 135 44 N/A N/A N/A N/A

A329bR

Yes 166040 166225

80 186 61 N/A N/A N/A N/A

a329cR

No 166204 166389

74 186 61 N/A N/A N/A N/A

A329R Yes 164362 164652

64 291 96 N/A N/A N/A N/A

A330R Yes 166437 167735

84 1299

432 N/A FOG: Ankyrin repeat

Ankyrin repeat Similar to ankyrin repeat protein [Trichomonas vaginalis G3] [7e-31]

a331L No 167096 167299

62 204 67 N/A N/A N/A N/A

A333L Yes 167765 168937

92 1173

390 N/A N/A Chitin binding domain / Chitin binding Peritrophin-A domain

Similar to conserved hypothetical protein [Culex quinquefasciatus] [2e-12]

a334L No 168037 168306

54 270 89 N/A N/A N/A N/A

a335R No 168067 168291

58 225 74 N/A N/A N/A N/A

a336R No 168608 168865

60 258 85 N/A N/A N/A N/A

A337L Yes 169024 169632

76 609 202 N/A N/A N/A N/A

A339L Yes 169647 169832

76 186 61 N/A N/A N/A N/A

a340R No 169567 169845

62 279 92 N/A N/A N/A N/A

a341aR

No 170276 170473

52 198 65 N/A N/A N/A N/A

a341bR

No 170404 170562

8 159 52 N/A N/A N/A N/A

A341L Yes 169952 170359

76 408 135 N/A N/A N/A N/A

A342L Yes 170477 172207

74 1731

576 N/A N/A N/A Similar to hypothetical protein OTV1_098 [Ostreococcus tauri virus 1] [6e-12]

a343R No 170839 171099

56 261 86 N/A N/A N/A N/A

a344R No 170921 171268

44 348 115 N/A N/A N/A N/A

a345L No 171136 171426

54 291 96 N/A N/A N/A N/A

a346L No 172288 172515

48 228 75 N/A N/A N/A N/A

a347L No 172376 172588

40 213 70 N/A N/A N/A N/A

a348aR

No 172752 172907

52 156 51 N/A N/A N/A N/A

A348R Yes 172385 172864

66 480 159 N/A N/A N/A N/A

A349L Yes 172865 173413

66 549 182 N/A N/A N/A N/A

a350aR

No 173627 173761

52 135 44 N/A N/A N/A N/A

A350R Yes 173289 173657

64 369 122 N/A N/A Protein of unknown function (DUF3605)

N/A

a351aL

No 174738 174869

36 132 43 N/A N/A N/A N/A

A351L Yes 173636 174712

80 1077

358 N/A N/A GIY-YIG catalytic domain

N/A

A352L Yes 174838 175461

90 624 207 N/A N/A N/A N/A

a353R No 175155 175361

48 207 68 N/A N/A N/A N/A

A354R Yes 175590 176627

88 1038

345 N/A N/A N/A Similar to endonuclease [Streptococcus pyogenes MGAS8232] [1e-07]

a355L No 176054 176272

64 219 72 N/A N/A N/A N/A

A356R Yes 176506 176829

68 324 107 N/A N/A N/A N/A

A357L Yes 176631 177617

82 987 328 N/A N/A N/A N/A

a358R No 177271 177585

54 315 104 N/A N/A N/A N/A

a359L No 177293 177460

72 168 55 N/A N/A N/A N/A

A363R Yes 177696 181235

48 3540

1179

N/A N/A N/A Similar to D6/D11-like helicase [Marseillevirus] [6e-06]

a364L No 179408 179611

56 204 67 N/A N/A N/A N/A

a365L No 180919 181209

76 291 96 N/A N/A N/A N/A

A366L Yes 181239 182006

72 768 255 N/A N/A Protein of unknown function (DUF1390)

N/A

a367R No 181338 181613

68 276 91 N/A N/A N/A N/A

A368L Yes 182269 183780

66 1512

503 N/A N/A N/A N/A

a370R No 183052 183300

72 249 82 N/A N/A N/A N/A

a371R No 183439 183627

62 189 62 N/A N/A N/A N/A

a372L No 183773 184003

64 231 76 N/A N/A N/A N/A

A373R Yes 183954 184412

76 459 152 N/A N/A N/A N/A

a374L No 184016 184279

64 264 87 N/A N/A N/A N/A

A375R Yes 184452 184973

88 522 173 N/A N/A N/A N/A

a376R No 184910 185101

50 192 63 N/A N/A N/A N/A

A378L Yes 184981 185766

88 786 261 N/A N/A N/A N/A

A379L Yes 185805 186428

76 624 207 N/A N/A N/A N/A

a380R No 186054 186359

42 306 101 N/A N/A N/A N/A

a381R No 186136 186393

66 258 85 N/A N/A N/A N/A

A383R Yes 186572 187954

94 1383

460 N/A N/A Large eukaryotic DNA virus major capsid protein

Similar to hypothetical protein OTV1_165 [Ostreococcus tauri virus 1] [4e-21]

a384aR

No 187752 187964

70 213 70 N/A N/A N/A N/A

A384aL

Yes 187904 188092

68 189 62 N/A N/A N/A N/A

A384bL

Yes 188031 188213

76 183 60 N/A N/A N/A N/A

a384bR

No 188097 188243

42 147 48 N/A N/A N/A N/A

A384cL

Yes 188236 190164

92 1929

642 N/A N/A Chitin binding Peritrophin-A domain / Large eukaryotic DNA virus major capsid protein

Similar to basal body protein [Naegleria gruberi] [1e-05]

a385L No 188307 188537

50 231 76 N/A N/A N/A N/A

a388R No 189372 189701

62 330 109 N/A N/A N/A N/A

a391R No 189884 190171

74 288 95 N/A N/A N/A N/A

A392R Yes 190233 191012

26 780 259 N/A N/A Poxvirus A32 protein

Similar to hypothetical protein OsV5_098r [Ostreococcus virus OsV5] [3e-51]

a393aL

No 191066 191200

2 135 44 N/A N/A N/A N/A

a393L No 190818 191015

74 198 65 N/A N/A N/A N/A

A394R Yes 191103 191468

62 366 121 N/A N/A N/A N/A

A395aL

Yes 191755 191877

72 123 40 N/A N/A N/A N/A

a395bL

No 191872 192009

64 138 45 N/A N/A N/A N/A

A395R Yes 191505 191753

76 249 82 N/A N/A N/A N/A

A396L Yes 191899 192357

82 459 152 N/A N/A N/A N/A

A397R Yes 192530 192988

86 459 152 N/A N/A N/A N/A

a398aR

No 193188 193367

54 180 59 N/A N/A N/A N/A

A398L Yes 192998 193354

60 357 118 N/A N/A N/A N/A

A399R Yes 193444 194028

64 585 194 N/A N/A RNase H N/A

A400R Yes 194063 194419

78 357 118 N/A N/A N/A N/A

A401R Yes 194454 195287

78 834 277 N/A N/A N/A Similar to hypothetical protein OsV5_178f [Ostreococcus virus OsV5] [8e-57]

A402R Yes 195325 196008

78 684 227 N/A N/A N/A N/A

A403R Yes 196114 196395

78 282 93 N/A N/A N/A N/A

A404aL

Yes 197020 197178

84 159 52 N/A N/A N/A N/A

A404R Yes 196440 197015

88 576 191 N/A N/A N/A N/A

A405R Yes 197233 198723

6 1491

496 N/A N/A N/A N/A

a406L No 198489 198689

70 201 66 N/A N/A N/A N/A

A407L Yes 198725 199357

80 633 210 N/A N/A N/A Similar to hypothetical protein OTV1_100 [Ostreococcus tauri virus 1] [5e-09]

A408L Yes 199390 200223

56 834 277 N/A N/A N/A Similar to EsV-1-42 [Ectocarpus siliculosus virus 1] [1e-12]

a409R No 199700 199948

44 249 82 N/A N/A N/A N/A

A410L Yes 200174 200506

72 333 110 N/A N/A N/A Similar to hypothetical protein FeldSpV_gp117 [Feldmannia species virus] [3e-07]

A411R Yes 200616 201128

78 513 170 N/A N/A N/A N/A

A412R Yes 201158 201697

74 540 179 N/A N/A N/A N/A

A413L Yes 201698 202432

80 735 244 N/A N/A N/A N/A

A414R Yes 202444 202725

48 282 93 N/A N/A N/A N/A

a415L No 202491 202700

66 210 69 N/A N/A N/A N/A

A416R Yes 202802 203368

82 567 188 N/A Deoxynucleoside kinases

Deoxynucleoside kinase

Similar to hypothetical protein MIV029R [Invertebrate iridescent virus 3] [5e-20]

A417L Yes 203344 204633

82 1290

429 N/A N/A N/A Similar to hypothetical protein [Paramecium tetraurelia strain d4-2] [4e-06]

a418R No 203704 203943

62 240 79 N/A N/A N/A N/A

a419R No 204434 204646

74 213 70 N/A N/A N/A N/A

A420L Yes 204664 204876

82 213 70 N/A N/A N/A N/A

A421R Yes 204921 205217

10 297 98 N/A N/A N/A N/A

A422aR

Yes 206324 206515

74 192 63 N/A N/A N/A N/A

A422R Yes 205267 206259

82 993 330 N/A N/A NUMOD4 motif / HNH endonuclease

Similar to HNH endonuclease [Acanthamoeba polyphaga mimivirus] [2e-11]

A423R Yes 206523 206996

72 474 157 N/A N/A N/A N/A

A424R Yes 207017 207346

86 330 109 N/A N/A N/A N/A

a425L No 207176 207364

58 189 62 N/A N/A N/A N/A

A426R Yes 207379 207723

52 345 114 N/A N/A N/A N/A

A427L Yes 207726 208085

76 360 119 N/A N/A Thioredoxin N/A

A428L Yes 208133 208570

80 438 145 N/A N/A N/A N/A

A429L Yes 208596 210026

70 1431

476 N/A N/A N/A Similar to PREDICTED: hypothetical protein [Vitis vinifera] [5e-10]

A430L Yes 210155 211468

76 1314

437 N/A N/A Large eukaryotic DNA virus major capsid protein

Similar to hypothetical protein OsV5_190f [Ostreococcus virus OsV5] [5e-96]

A431L Yes 211513 211713

64 201 66 N/A N/A N/A N/A

A432R Yes 211752 212225

14 474 157 N/A N/A N/A N/A

a433aR

No 212149 212283

46 135 44 N/A N/A N/A N/A

a433R No 211961 212302

56 342 113 N/A N/A N/A N/A

a434aR

No 212284 212472

62 189 62 N/A N/A N/A N/A

a434L No 212280 212450

58 171 56 N/A N/A N/A N/A

A435R Yes 212315 212506

30 192 63 N/A N/A N/A N/A

A436L Yes 212299 212490

60 192 63 N/A N/A PBCV-specific basic adaptor domain

N/A

A437aR

Yes 212700 212852

40 153 50 N/A N/A N/A N/A

A437L Yes 212519 212830

76 312 103 N/A N/A Non-histone chromosomal protein MC1

N/A

A438L Yes 212859 213095

86 237 78 N/A Glutaredoxin and related proteins

Glutaredoxin Similar to glutaredoxin 3 [Ochrobactrum intermedium LMG 3301] [3e-07]

A439aR

Yes 213465 213635

70 171 56 N/A N/A N/A N/A

A439R Yes 213120 213458

34 339 112 N/A N/A N/A N/A

A440L Yes 213625 213891

70 267 88 N/A N/A N/A N/A

A441L Yes 213910 214323

72 414 137 N/A N/A N/A N/A

a442R No 213958 214353

10 396 131 N/A N/A N/A N/A

A443R Yes 214463 215389

74 927 308 N/A N/A N/A N/A

A444L Yes 215534 215848

84 315 104 N/A N/A N/A N/A

A445L Yes 215886 217274

86 1389

462 aarF domain-containing kinase

Predicted unusual protein kinase

ABC1 family Similar to ABC-1 domain protein [Sulfolobus islandicus M.16.4] [1e-29]

a446R No 216349 216654

44 306 101 N/A N/A N/A N/A

A447aR

Yes 217321 217458

8 138 45 N/A N/A N/A N/A

a447R No 216679 216978

46 300 99 N/A N/A N/A N/A

a448aL

No 217787 217930

40 144 47 N/A N/A N/A N/A

A448L Yes 217342 217662

76 321 106 N/A N/A Thioredoxin Similar to transglutaminase [Brugia malayi] [2e-07]

A449R Yes 217799 218380

70 582 193 N/A N/A mRNA capping enzyme beta chain

N/A

A450R Yes 218680 219429

86 750 249 N/A N/A Protein of unknown function (DUF1390)

N/A

a451L No 218974 219279

54 306 101 N/A N/A N/A N/A

A452L Yes 219453 219692

82 240 79 N/A N/A N/A N/A

a453R No 219757 220026

10 270 89 N/A N/A N/A N/A

A454L Yes 219770 220639

82 870 289 N/A N/A N/A N/A

a455R No 220218 220661

42 444 147 N/A N/A N/A N/A

A456L Yes 220670 222634

74 1965

654 N/A Predicted ATPase

D5 N terminal like

Similar to hypothetical protein OsV5_188r [Ostreococcus virus OsV5] [1e-100]

a457R No 220932 221162

44 231 76 N/A N/A N/A N/A

a458L No 221107 221358

60 252 83 N/A N/A N/A N/A

a459R No 221286 221588

52 303 100 N/A N/A N/A N/A

a460R No 221801 222037

56 237 78 N/A N/A N/A N/A

A461R Yes 222723 222956

54 234 77 N/A N/A N/A N/A

A462R Yes 222749 222964

36 216 71 N/A N/A N/A N/A

a463L No 222822 223397

54 576 191 N/A N/A N/A N/A

A464aR

Yes 223726 223851

58 126 41 N/A N/A N/A N/A

A464R Yes 222966 223793

50 828 275 N/A dsRNA-specific ribonuclease

RNase3 domain / Double-stranded RNA binding motif

Similar to hypothetical protein OsV5_145f [Ostreococcus virus OsV5] [5e-50]

A465R Yes 223814 224170

74 357 118 N/A Mitochondrial sulfhydryl oxidase involved in the biogenesis of cytosolic Fe/S proteins

Erv1 / Alr family Similar to putative thiol oxidoreductase [Acanthamoeba polyphaga mimivirus] [4e-11]

a466L No 223948 224223

68 276 91 N/A N/A N/A N/A

A467L Yes 224220 225158

78 939 312 N/A N/A N/A Similar to hypothetical protein OTV1_005 [Ostreococcus tauri virus 1] [4e-12]

A468R Yes 225292 226623

72 1332

443 N/A N/A Eukaryotic and archaeal DNA primase small subunit / Herpesviridae UL52/UL70 DNA primase

Similar to hypothetical protein OsV5_086f [Ostreococcus virus OsV5] [9e-19]

a469L No 226311 226541

60 231 76 N/A N/A N/A N/A

a470aL

No 227195 227356

66 162 53 N/A N/A N/A N/A

A470R Yes 226696 227307

80 612 203 N/A N/A N/A Similar to hypothetical protein OsV5_091r [Ostreococcus virus OsV5] [1e-26]

A471R Yes 227367 227888

70 522 173 N/A N/A N/A Similar to hypothetical protein MIMI_L507 [Acanthamoeba polyphaga mimivirus] [6e-24]

a472L No 227642 227914

70 273 90 N/A N/A N/A N/A

A473L Yes 228013 229566

70 1554

517 N/A Glycosyltransferases probably involved in cell wall biogenesis

Glycosyl transferase family 2 / Cellulose synthase

Similar to unnamed protein product [Podospora anserina] [1e-127]

a474R No 228377 228694

46 318 105 N/A N/A N/A N/A

a475R No 229048 229320

50 273 90 N/A N/A N/A N/A

A476R Yes 229705 230679

80 975 324 ribonucleoside-diphosphate reductase subunit M2

Ribonucleotide reductase beta subunit

Ribonucleotide reductase small chain

Similar to ribonucleoside-diphosphate reductase small chain [Candidatus Protochlamydia amoebophila UWE25] [1e-112]

a477L No 229906 230127

52 222 73 N/A N/A N/A N/A

a478aL

No 231306 231812

56 507 168 N/A N/A N/A N/A

A478L Yes 230676 231608

74 933 310 N/A N/A N/A Similar to hypothetical protein MIMI_R423 [Acanthamoeba polyphaga mimivirus] [3e-32]

a479L No 231530 231769

52 240 79 N/A N/A N/A N/A

A480L Yes 231643 231924

82 282 93 N/A N/A N/A N/A

a481aL

No 232680 232802

64 123 40 N/A N/A N/A N/A

A481L Yes 231952 232626

78 675 224 N/A N/A N/A N/A

A482R Yes 232744 233391

86 648 215 N/A N/A MYM-type Zinc finger with FCS sequence motif

Similar to hypothetical protein OTV1_050 [Ostreococcus tauri virus 1] [4e-20]

a483L No 233160 233375

68 216 71 N/A N/A N/A N/A

A484L Yes 233399 233866

82 468 155 N/A N/A N/A N/A

A485R Yes 233951 234397

54 447 148 N/A N/A N/A N/A

A486L No 234401 234859

80 459 152 N/A N/A N/A N/A

a487R Yes 234667 234864

60 198 65 N/A N/A N/A N/A

A488R Yes 235101 236054

82 954 317 N/A N/A N/A Similar to hypothetical protein OTV1_103 [Ostreococcus tauri virus 1] [1e-22]

a489R No 235651 236019

56 369 122 N/A N/A N/A N/A

A490L Yes 236080 237012

82 933 310 N/A N/A N/A Similar to hypothetical protein MIMI_R423 [Acanthamoeba polyphaga mimivirus] [4e-28]

a491aL

No 237279 237428

52 150 49 N/A N/A N/A N/A

A491R Yes 237110 237340

78 231 76 N/A N/A N/A N/A

A492L Yes 237337 237906

70 570 189 N/A N/A N/A N/A

A493L Yes 237944 238519

92 576 191 N/A N/A N/A N/A

A494R Yes 238579 239661

14 1083

360 N/A N/A A2L zinc ribbon domain / Poxvirus Late Transcription Factor VLTF3 like

Similar to hypothetical protein OsV5_117f [Ostreococcus virus OsV5] [2e-45]

A495R Yes 239737 240402

76 666 221 N/A N/A GIY-YIG catalytic domain

Similar to putative SegB homing endonuclease [Staphylococcus phage PH15] [4e-08]

a496L No 240270 240482

60 213 70 N/A N/A N/A N/A

A497R Yes 240443 240883

88 441 146 N/A N/A N/A N/A

a498L No 240457 240831

60 375 124 N/A N/A N/A N/A

a499L No 240483 240716

66 234 77 N/A N/A N/A N/A

A500L Yes 240947 242005

86 1059

352 N/A N/A N/A N/A

A502L Yes 242040 242327

92 288 95 N/A N/A N/A N/A

A503L Yes 242339 243253

72 915 304 N/A N/A N/A N/A

a504aL

No 243222 243350

52 129 42 N/A N/A N/A N/A

a504R No 242786 243076

66 291 96 N/A N/A N/A N/A

A505L Yes 243250 244704

64 1455

484 N/A N/A N/A N/A

a506R No 243470 243682

50 213 70 N/A N/A N/A N/A

a507R No 243634 244194

48 561 186 N/A N/A N/A N/A

a508R No 244334 244567

54 234 77 N/A N/A N/A N/A

a509R No 244423 244728

52 306 101 N/A N/A N/A N/A

a510R No 244616 244819

50 204 67 N/A N/A N/A N/A

a511L No 244999 245220

50 222 73 N/A N/A N/A N/A

A512R Yes 245005 247419

56 2415

804 N/A N/A N/A N/A

a513R No 245138 245500

56 363 120 N/A N/A N/A N/A

a514L No 246105 246539

62 435 144 N/A N/A N/A N/A

a515L No 246172 246783

44 612 203 N/A N/A N/A N/A

a516R No 246416 246622

44 207 68 N/A N/A N/A N/A

A517L Yes 247408 248442

86 1035

344 DNA (cytosine-5-)-methyltransferase

Site-specific DNA methylase

C-5 cytosine-specific DNA methylase

Similar to site-specific DNA-methyltransferase [Acinetobacter baumannii ATCC 19606] [2e-22]

a518R No 248150 248452

62 303 100 N/A N/A N/A N/A

A519L Yes 248519 248767

74 249 82 N/A N/A N/A N/A

A520L Yes 248772 249074

74 303 100 N/A N/A N/A N/A

A521aL

Yes 249668 250276

96 609 202 N/A N/A N/A Similar to 136R [Invertebrate iridescent virus 6] [2e-06]

A521L Yes 249093 249722

70 630 209 N/A N/A N/A N/A

a522R No 249954 250226

62 273 90 N/A N/A N/A N/A

A523R Yes 250331 250846

2 516 171 N/A N/A N/A N/A

a524L No 250420 250743

48 324 107 N/A N/A N/A N/A

a525R No 250527 250727

48 201 66 N/A N/A N/A N/A

A526R Yes 250898 251338

90 441 146 N/A N/A N/A N/A

A527aL

Yes 251724 251900

78 177 58 N/A N/A N/A N/A

A527R Yes 251338 251637

58 300 99 N/A N/A N/A N/A

a528R No 251727 252011

74 285 94 N/A N/A N/A N/A

a529L No 251934 252152

38 219 72 N/A N/A N/A N/A

A530R Yes 251953 252993

2 1041

346 N/A Site-specific DNA methylase

C-5 cytosine-specific DNA methylase

Similar to hypothetical protein StreC_13626 [Streptomyces sp. C] [1e-20]

A531L Yes 252995 253198

64 204 67 N/A N/A N/A N/A

A532aL

Yes 253484 253636

84 153 50 N/A N/A N/A N/A

A532L Yes 253218 253457

78 240 79 N/A N/A N/A N/A

A533R Yes 253765 254889

82 1125

374 N/A N/A N/A N/A

A534R Yes 254810 255127

56 318 105 N/A N/A N/A N/A

A535L Yes 255129 255344

82 216 71 N/A N/A N/A N/A

a536aL

No 255590 255766

48 177 58 N/A N/A N/A N/A

A536L Yes 255394 255615 80 222 73 N/A N/A N/A N/A

A537L Yes 255643 256440 78 798 265 N/A N/A N/A N/A

a538L No 256463 256660

68 198 65 N/A N/A N/A N/A

a539aR

No 257034 257198

64 165 54 N/A N/A N/A N/A

A539R Yes 256565 257086

80 522 173 N/A N/A GIY-YIG catalytic domain

N/A

A540L Yes 257089 260859

82 3771

1256

N/A N/A N/A Similar to hypothetical protein LNTAR_05021 [Lentisphaera araneosa HTCC2155] [7e-24]

a541R No 258710 259000

52 291 96 N/A N/A N/A N/A

a542R No 259347 259565

48 219 72 N/A N/A N/A N/A

A543L No 260940 261107

58 168 55 N/A N/A N/A N/A

A544R Yes 260953 261849

52 897 298 N/A ATP-dependent DNA ligase

ATP dependent DNA ligase domain

Similar to ATP dependent DNA ligase [Chthoniobacter flavus Ellin428] [1e-52]

a545L No 261324 261545

72 222 73 N/A N/A N/A N/A

A546L Yes 261818 263008

78 1191

396 N/A N/A Glycosyl transferases group 1

Similar to hypothetical protein OsV5_020r [Ostreococcus virus OsV5] [6e-21]

a547R No 262104 262337

58 234 77 N/A N/A N/A N/A

A548L Yes 263043 264530

72 1488

495 SWI/SNF-related matrix-associated actin-dependent regulator of chromatin subfamily A member 5

Superfamily II DNA/RNA helicases SNF2 family

SNF2 family N-terminal domain / Helicase conserved C-terminal domain

Similar to hypothetical protein OTV1_129 [Ostreococcus tauri virus 1] [2e-63]

a549R No 263463 263708

56 246 81 N/A N/A N/A N/A

a550aR

No 264418 264576

54 159 52 N/A N/A N/A N/A

a550bL

No 264494 264682

68 189 62 N/A N/A N/A N/A

a550R No 263955 264365

66 411 136 N/A N/A N/A N/A

a551aR

No 264946 265074

44 129 42 N/A N/A N/A N/A

A551bL

Yes 265136 265303

62 168 55 N/A N/A N/A N/A

A551L Yes 264603 265028

70 426 141 dUTP pyrophosphatase

dUTPase dUTPase Similar to dUTPase (Dut) putative [Aspergillus fumigatus Af293] [3e-39]

A552R Yes 265187 266140

86 954 317 N/A N/A Transcription factor TFIID (or TATA-binding protein TBP)

Similar to hypothetical protein OTV1_140 [Ostreococcus tauri virus 1] [5e-09]

a553L No 265624 265839

50 216 71 N/A N/A N/A N/A

A554/556/557L

Yes 266143 267639

70 1497

498 N/A Predicted ATPase of the PP-loop superfamily implicated in cell cycle control

PP-loop family Similar to predicted protein [Ostreococcus lucimarinus CCE9901] [2e-32]

A555aR

Yes 267599 267736

58 138 45 N/A N/A N/A N/A

a555R No 266521 266862

54 342 113 N/A N/A N/A N/A

a558aL

No 268960 269100

36 141 46 N/A N/A N/A N/A

A558L Yes 267736 268938

82 1203

400 N/A N/A Large eukaryotic DNA virus major capsid protein

Similar to hypothetical protein OsV5_190f [Ostreococcus virus OsV5] [1e-46]

A559L Yes 269042 269683

78 642 213 N/A N/A N/A N/A

a560R No 269181 269495

68 315 104 N/A N/A N/A N/A

A561L Yes 269727 271676

90 1950

649 N/A N/A N/A N/A

a562R No 270374 270571

42 198 65 N/A N/A N/A N/A

a563R No 271510 271689

68 180 59 N/A N/A N/A N/A

A564L Yes 271714 272769

70 1056

351 N/A N/A N/A Similar to EsV-1-7 [Ectocarpus siliculosus virus 1] [6e-11]

A565R Yes 272865 274877

70 2013

670 N/A N/A N/A N/A

a566L No 274139 274411

60 273 90 N/A N/A N/A N/A

A567L Yes 274874 275332

76 459 152 N/A N/A N/A N/A

A568L Yes 275356 275895

72 540 179 N/A N/A N/A N/A

a569R No 275519 275773

66 255 84 N/A N/A N/A N/A

A570L Yes 275920 276306

86 387 128 N/A N/A N/A N/A

A571R Yes 276366 276716

14 351 116 N/A N/A PBCV-specific basic adaptor domain

N/A

A572R Yes 276730 277275

74 546 181 N/A N/A N/A N/A

a573L No 276965 277174

60 210 69 N/A N/A N/A N/A

A574aL

Yes 277951 278136

64 186 61 N/A N/A N/A N/A

A574L Yes 277272 278066

64 795 264 proliferating cell nuclear antigen

DNA polymerase sliding clamp subunit (PCNA homolog)

Proliferating cell nuclear antigen N-terminal domain / Proliferating cell nuclear antigen C-terminal domain

Similar to predicted protein [Ostreococcus lucimarinus CCE9901] [2e-32]

A575L Yes 278085 278591

78 507 168 N/A N/A N/A N/A

a576R No 278403 278624

60 222 73 N/A N/A N/A N/A

A577L Yes 278644 279045

84 402 133 N/A N/A N/A N/A

a578L No 279248 279379

62 132 43 N/A N/A N/A N/A

A579L Yes 279084 279800

82 717 238 N/A N/A N/A Similar to hypothetical protein MAR_ORF016 [Marseillevirus] [8e-30]

a580R No 279643 279864

64 222 73 N/A N/A N/A N/A

A581R Yes 279882 280679

52 798 265 DNA adenine methylase

Site-specific DNA methylase

D12 class N6 adenine-specific DNA methyltransferase

Similar to Dam-like adenine-specific DNA methylase [Marseillevirus] [4e-46]

a582aL

No 280684 280851

54 168 55 N/A N/A N/A N/A

a582bL

No 280782 280940

58 159 52 N/A N/A N/A N/A

a582L No 280182 280418

64 237 78 N/A N/A N/A N/A

A583L Yes 280793 283978

82 3186

1061

DNA topoisomerase II

Type IIA topoisomerase (DNA gyrase/topo II topoisomerase IV) B subunit

DNA gyrase B / DNA gyrase/topoisomerase IV subunit A

Similar to predicted protein [Physcomitrella patens subsp. patens] [0.0]

a584R No 281070 281366

56 297 98 N/A N/A N/A N/A

a585R No 281369 281608

62 240 79 N/A N/A N/A N/A

A586R Yes 282021 282248

62 228 75 N/A N/A N/A N/A

a587R No 282455 282799

60 345 114 N/A N/A N/A N/A

a588R No 283130 283480

42 351 116 N/A N/A N/A N/A

A589aL

Yes 283993 284145

80 153 50 N/A N/A N/A N/A

a589L No 283489 283719

56 231 76 N/A N/A N/A N/A

A590aL

Yes 285276 285401

44 126 41 N/A N/A N/A N/A

A590L Yes 284145 285191

84 1047

348 N/A N/A N/A N/A

a591L No 285284 285451

68 168 55 N/A N/A N/A N/A

A592R Yes 285435 285644

82 210 69 N/A N/A N/A N/A

A593R Yes 285694 286593

74 900 299 N/A N/A N/A N/A

a594R No 286070 286420

42 351 116 N/A N/A N/A N/A

a595L No 286404 286658

56 255 84 N/A N/A N/A N/A

A596R Yes 286625 287053

78 429 142 dCMP deaminase

Deoxycytidylate deaminase

Cytidine and deoxycytidylate deaminase zinc-binding region

Similar to deoxycytidylate deaminase [Phage phiJL001] [2e-23]

a597L No 286822 287118

54 297 98 N/A N/A N/A N/A

A598L Yes 287054 288145

84 1092

363 histidine decarboxylase

Glutamate decarboxylase and related PLP-dependent proteins

Pyridoxal-dependent decarboxylase conserved domain

Similar to histidine decarboxylase [Nostoc punctiforme PCC 73102] [2e-58]

a599R No 287117 287656

62 540 179 N/A N/A N/A N/A

a600aR

No 288026 288166

58 141 46 N/A N/A N/A N/A

a600R No 287904 288152

52 249 82 N/A N/A N/A N/A

A601R Yes 288225 288530

52 306 101 N/A N/A N/A N/A

A602L Yes 288531 289136

60 606 201 N/A N/A N/A N/A

A603aL

Yes 289390 289575

72 186 61 N/A N/A N/A N/A

a603bR

No 289445 289591

72 147 48 N/A N/A N/A N/A

a603R No 289083 289400

78 318 105 N/A N/A N/A N/A

A604L Yes 289592 289996

80 405 134 N/A N/A N/A N/A

A605L Yes 290112 290588

80 477 158 N/A N/A N/A N/A

a606L No 290452 290796

50 345 114 N/A N/A N/A N/A

A607R Yes 290633 291808

12 1176

391 N/A N/A N/A Similar to PREDICTED: similar to ankyrin 23/unc44 [Tribolium castaneum] [4e-09]

A609L Yes 291805 292974

82 1170

389 UDPglucose 6-dehydrogenase

UDP-N-acetyl-D-mannosaminuronate dehydrogenase

UDP-glucose/GDP-mannose dehydrogenase family NAD binding domain / UDP-glucose/GDP-mannose dehydrogenase family central domain / UDP-glucose/GDP-mannose dehydrogenase family UDP binding domain

Similar to UDP-glucose dehydrogenase [Vibrio furnissii CIP 102972] [1e-123]

a610R No 291859 292134

66 276 91 N/A N/A N/A N/A

a611R No 292639 292863

62 225 74 N/A N/A N/A N/A

A612L Yes 293059 293418

78 360 119 N/A Proteins containing SET domain

SET domain Similar to nuclear protein SET [Methanoculleus marisnigri JR1] [3e-13]

a613R No 293185 293448

54 264 87 N/A N/A N/A N/A

A614L Yes 293449 295182

86 1734

577 N/A N/A Protein kinase domain

N/A

a615R No 293917 294117

66 201 66 N/A N/A N/A N/A

a616R No 294857 295138

60 282 93 N/A N/A N/A N/A

A617R Yes 295254 296219

34 966 321 N/A N/A N/A N/A

A618L Yes 296241 296636

70 396 131 N/A N/A N/A N/A

A619L Yes 296653 297366

84 714 237 N/A N/A N/A N/A

a620aR

No 297535 297708

56 174 57 N/A N/A N/A N/A

A620L Yes 297436 297687

86 252 83 N/A N/A N/A N/A

a621aR

No 297918 298085

76 168 55 N/A N/A N/A N/A

a621bL

No 298062 298202

56 141 46 N/A N/A N/A N/A

A621L Yes 297719 298072

84 354 117 N/A N/A N/A N/A

A622L Yes 298138 299700

80 1563

520 N/A N/A Large eukaryotic DNA virus major capsid protein

Similar to hypothetical protein OsV5_190f [Ostreococcus virus OsV5] [2e-53]

A623aL

Yes 299923 300102

56 180 59 N/A N/A N/A N/A

A623L Yes 299757 299960

72 204 67 N/A N/A AN1-like Zinc finger

Similar to hypothetical protein SORBIDRAFT_01g005640 [Sorghum bicolor] [2e-07]

A624R Yes 299990 300355

32 366 121 N/A N/A Predicted membrane protein (DUF2177)

N/A

A625R Yes 300424 301722

72 1299

432 N/A Transposase and inactivated derivatives

Helix-turn-helix domain / Putative transposase DNA-binding domain

Similar to transposase IS605 OrfB family [Arthrospira maxima CS-328] [2e-20]

a626aR

No 301617 301775

60 159 52 N/A N/A N/A N/A

a626L No 300882 301106

48 225 74 N/A N/A N/A N/A

a627aR

No 303152 303289

54 138 45 N/A N/A N/A N/A

A627R Yes 301818 303155

68 1338

445 N/A N/A N/A N/A

A628L Yes 303158 303451

82 294 97 N/A N/A N/A N/A

A629R Yes 303605 305920

74 2316

771 ribonucleoside-diphosphate reductase subunit M1

Ribonucleotide reductase alpha subunit

ATP cone domain / Ribonucleotide reductase all-alpha domain / Ribonucleotide reductase barrel domain

Similar to Ribonucleoside-diphosphate reductase large chain putative [Pediculus humanus corporis] [0.0]

a630R No 303900 304139

50 240 79 N/A N/A N/A N/A

A631L Yes 304106 304375

42 270 89 N/A N/A N/A N/A

a632L No 304759 304977

64 219 72 N/A N/A N/A N/A

A633R Yes 305955 306317

76 363 120 N/A N/A N/A N/A

a634aL

No 306761 306889

46 129 42 N/A N/A N/A N/A

A634L Yes 306327 306731

72 405 134 N/A N/A N/A N/A

a635aR

No 307064 307258

64 195 64 N/A N/A N/A N/A

A635R Yes 306774 307031

14 258 85 N/A N/A N/A N/A

A636R Yes 307085 307375

84 291 96 N/A N/A N/A N/A

a637aL

No 307792 307965

62 174 57 N/A N/A N/A N/A

A637R Yes 307446 307871

84 426 141 N/A N/A N/A N/A

A638R Yes 307915 308994

80 1080

359 agmatine deiminase

Peptidylarginine deiminase and related enzymes

Porphyromonas-type peptidyl-arginine deiminase

Similar to conserved hypothetical protein [Clostridiales bacterium 1_7_47_FAA] [1e-111]

a639L No 307938 308336

48 399 132 N/A N/A N/A N/A

a640R No 308165 308404

52 240 79 N/A N/A N/A N/A

a641L No 308469 308726

42 258 85 N/A N/A N/A N/A

A643R Yes 309013 310410

68 1398

465 N/A N/A N/A N/A

a644aR

No 310852 311022

48 171 56 N/A N/A N/A N/A

A644R Yes 310449 310970

84 522 173 N/A N/A N/A N/A

A645R Yes 311064 311435

84 372 123 N/A N/A N/A N/A

A646aL

Yes 312100 312225

84 126 41 N/A N/A N/A N/A

A646L Yes 311460 312011

78 552 183 N/A N/A N/A N/A

A647R Yes 312293 312862

26 570 189 N/A N/A Protein of unknown function (DUF1390)

N/A

a648L No 312869 313078

54 210 69 N/A N/A N/A N/A

A649R Yes 312890 313669

68 780 259 N/A N/A Protein of unknown function (DUF1390)

N/A

a650aL

No 313552 313674

10 123 40 N/A N/A N/A N/A

a650bL

No 313671 313847

72 177 58 N/A N/A N/A N/A

a650cR

No 313683 313865

40 183 60 N/A N/A N/A N/A

a650L No 313404 313667

72 264 87 N/A N/A N/A N/A

A651L Yes 313687 314379

80 693 230 N/A N/A GIY-YIG catalytic domain / NUMOD1 domain

Similar to putative SegB homing endonuclease [Staphylococcus phage PH15] [5e-14]

Appendix 2. Example Python Scripts for Synthesizing Colouring Files

a. Categorizing into proteomic method of detection

b. Categorizing into major/minor genes
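
A minimal sketch of this kind of categorization follows. It is an illustration, not the project's original script. It assumes that major genes are the ones whose names in Appendix 1 begin with an uppercase "A" and minor genes the ones beginning with a lowercase "a". The colour names and the tab-separated output layout are hypothetical placeholders for whatever the colouring file format actually requires.

```python
# Hypothetical colour assignment; the real colouring file may use other values.
COLOURS = {"major": "red", "minor": "blue"}


def classify_gene(name: str) -> str:
    """Return 'major' for names starting with uppercase 'A', else 'minor'.

    Assumption: the case of the first letter encodes the major/minor status.
    """
    return "major" if name.startswith("A") else "minor"


def colouring_lines(names):
    """Build one 'gene<TAB>colour' line per gene name (assumed layout)."""
    return [f"{name}\t{COLOURS[classify_gene(name)]}" for name in names]


if __name__ == "__main__":
    # A few gene names taken from the table in Appendix 1.
    for line in colouring_lines(["A520L", "a522R", "A530R", "a524L"]):
        print(line)
```

Keeping the classification rule in its own function means the rule can be swapped (for example, to read the status from a table column instead of the gene name) without touching the code that writes the file.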

c. Converting from GenBank to FASTA (note: non-functional outside of our specific input)
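
A minimal sketch of such a conversion follows, using only the Python standard library. It is an illustration, not the project's original script. It assumes a single-record GenBank flat file in which the sequence sits between the ORIGIN line and the closing "//" line and the record name is the second field of the LOCUS line. The function name `genbank_to_fasta` and the 70-character line width are choices made for this example.

```python
def genbank_to_fasta(genbank_text: str, width: int = 70) -> str:
    """Convert the text of a single-record GenBank file to a FASTA string.

    Assumptions: one record per file, name taken from the LOCUS line,
    sequence located between 'ORIGIN' and '//'.
    """
    name = "unknown"
    seq_parts = []
    in_origin = False
    for line in genbank_text.splitlines():
        if line.startswith("LOCUS"):
            fields = line.split()
            if len(fields) > 1:
                name = fields[1]
        elif line.startswith("ORIGIN"):
            in_origin = True
        elif line.startswith("//"):
            in_origin = False
        elif in_origin:
            # Sequence lines carry a position number and spaces; keep letters only.
            seq_parts.append("".join(ch for ch in line if ch.isalpha()))
    sequence = "".join(seq_parts).upper()
    # Break the sequence into fixed-width lines for the FASTA body.
    body = "\n".join(sequence[i:i + width] for i in range(0, len(sequence), width))
    return f">{name}\n{body}\n"


if __name__ == "__main__":
    example = (
        "LOCUS       TEST01   12 bp\n"
        "ORIGIN\n"
        "        1 acgtacgtac gt\n"
        "//\n"
    )
    print(genbank_to_fasta(example), end="")
```

For multi-record files or per-gene extraction from the FEATURES table, a dedicated parser would be a safer choice than this line-based approach.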

References

1. V. Agarwal et al., “PDBalert: automatic, recurrent remote homology tracking and protein structure prediction,” BMC structural biology 8, no. 1 (2008): 51.

2. R. Halim et al., “Oil extraction from microalgae for biodiesel production,” Bioresource technology (2010).

3. M. W. Karakashian, “Symbiosis in Paramecium Bursaria,” Symposia of the Society for Experimental Biology, no. 29 (1975): 145-173.

4. R. Radakovits et al., “Genetic Engineering of Algae for Enhanced Biofuel Production,” Eukaryotic Cell 9, no. 4 (February 2010): 486-501.

5. N. Santhanam et al., “Oilgae Comprehensive Report Preview,” Internal Report [Oilgae]. Unpublished.

6. J. L. Van Etten, L. C. Lane, and R. H. Meints, “Viruses and viruslike particles of eukaryotic algae,” Microbiology and Molecular Biology Reviews 55, no. 4 (1991): 586.

7. L. P. Villarreal and V. R. DeFilippis, “A hypothesis for DNA viruses as the origin of eukaryotic replication proteins,” Journal of Virology 74, no. 15 (2000): 7079.
