CSC 498 Project: Analysis of Algal Virus Genomes Using Bioinformatics Tools
By
Philippe Traverse, V00136885
Introduction
There are currently over 200 companies worldwide involved in the research and production of
algae-based biofuels (Santhanam, 2011). Algae-based biofuels are a promising sustainable alternative
to fossil fuels for two reasons: algae have a high lipid content compared to other plants, and they
require only seawater to grow, unlike the land-based plants currently used to produce biofuel
(Halim et al, 2010). One of the algae with the highest lipid content is Chlorella sp., which has long
been used as a model alga in the literature because it is easy to grow and well documented
(Radakovits et al, 2010). Chlorella is a single-celled green alga belonging to the division
Chlorophyta. It can live symbiotically with Paramecium bursaria, providing it with products of
photosynthesis while the Paramecium provides it with both motility and nutrients in times of low
sunlight (Karakashian, 1975). Within its Paramecium bursaria host, the Chlorella can be infected by a
virus known as PBCV-1.
PBCV-1 is a large virus of the Phycodnaviridae family with a double-stranded DNA genome of
333 kbp. Its capsid is icosahedral and measures approximately 190 nm in diameter (van Etten et al,
2002). Based on phylogenetic analysis of PBCV-1 and other viruses, in particular of the DNA
polymerase gene they encode, viruses of single-celled algae such as PBCV-1 are thought to be among
the oldest viral lineages on the planet, at nearly 1.2 billion years old (Villareal et al, 2000).
Research into the phylogenetic background and functional annotation of genes in PBCV-1 is therefore
of interest both for the evolutionary history of single-celled algal viruses and for their potential
application as a vector for genetic modification of Chlorella.
Figure 1. Virus Phylogeny. Source: Villareal et al, 2000.
Methods
Sequence data for PBCV-1 was kindly provided by Dr David Dunigan and Dr Peter van Etten
of the University of Nebraska. This sequence data was loaded into the algal virus database hosted at
bioinformatics.ca and maintained by Dr Chris Upton at UVic. Dr Dunigan also provided Excel files
containing parameters such as gene size and GC content, along with some functional annotations of
PBCV-1’s genome (see Appendix 1).
The tools used at virology.ca each have access to seven different databases of viruses, organized
into phylogenetic groups: Adenoviridae, Algal viruses, Asfarviridae, Baculoviridae, Coronaviridae,
Herpesviridae, and Iridoviridae. The databases each contain several genomes of individual viruses, with
the genomes organized into genes. Genes that are homologous across several genomes in the database
are grouped into families; by selecting a single family, all the genes can be retrieved at once and
compared against one another. The main tool used to retrieve and organize sequences by gene family is
the Viral Orthologous Clusters (VOCs) tool.
Once genes have been retrieved from the database, several other tools can be used to compare
them. One of the most common visual ways of comparing related sequences is a program called
Dotter. In this program, a graph is drawn with one sequence along the x-axis and the other along the
y-axis, and each pixel in the graph is shaded according to how similar the two sequences are at that
position. The resulting graph shows how similar the two sequences are overall, with identical
sequences producing an unbroken main diagonal. This tool is particularly useful for finding regions
of a gene or genome that have been duplicated or reversed, as these regions appear as lines parallel
or perpendicular to the diagonal, respectively. The genome of PBCV-1 was compared to the genomes of
two closely related strains, AR158 and NY2A.
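The dot plot idea described above can be illustrated with a minimal sketch. This is not how Dotter itself is implemented (Dotter scores a sliding window and shades pixels in greyscale); it is a simplified exact word-match version, and the function names `dot_plot`, `render`, and `reverse_complement` as well as the `word` parameter are assumptions made for illustration only.

```python
def reverse_complement(seq):
    """Reverse complement of a DNA string made of A, C, G and T."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def dot_plot(seq_x, seq_y, word=3):
    """Return the set of (i, j) cells where seq_x[i:i+word] == seq_y[j:j+word]."""
    # Index every word of seq_y by its start position.
    positions = {}
    for j in range(len(seq_y) - word + 1):
        positions.setdefault(seq_y[j:j + word], []).append(j)
    # Look up each word of seq_x and record the matching cells.
    hits = set()
    for i in range(len(seq_x) - word + 1):
        for j in positions.get(seq_x[i:i + word], []):
            hits.add((i, j))
    return hits


def render(hits, width, height):
    """Draw the plot as text: seq_x runs left to right, seq_y top to bottom."""
    return "\n".join(
        "".join("#" if (i, j) in hits else "." for i in range(width))
        for j in range(height)
    )


if __name__ == "__main__":
    seq = "ACGTTGCAAT"
    # Comparing a sequence with itself gives an unbroken main diagonal.
    print(render(dot_plot(seq, seq), len(seq) - 2, len(seq) - 2))
```

Under these assumptions, a duplicated region shows up as a second run of hits parallel to the main diagonal, and running `dot_plot(seq_x, reverse_complement(seq_y))` is one simple way to expose reversed regions.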
Alongside whole-genome comparison, a subset of the largest genes of PBCV-1 was also
uploaded to PDBAlert, a web-based tool hosted by the Max Planck Institute for Developmental
Biology in Tübingen, Germany. The user uploads protein or nucleotide sequences, and the queries are
continuously compared against its databases using HHPred. HHPred attempts to predict the tertiary
structure of the sequences being compared, using hidden Markov models to create a sequence profile
(Agarval et al, 2008). Sequence profiles are then compared to find regions of similarity, and genes with
similar tertiary structure to the query sequence are returned as search results, sorted by statistical
significance. This makes HHPred a very useful tool for predicting gene function, since the function of a
protein is closely linked to its structure.
Finally, the genes of PBCV-1 were classified in three ways: first, they were separated into
major and minor genes, based on both their size and their predicted function; second, they were
separated by stage of expression in the virion life cycle; lastly, they were categorized by the
method Dr Dunigan used to detect them in the lab. Using the gene annotations provided by Dr
Dunigan (see Appendix 1), the genes were grouped into these categories with simple Python
scripts (see Appendix 2) and then assigned different colours for an easy visual representation of the
whole genome in the VGO genome map program.
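As an illustration of the grouping step, the following is a minimal sketch and not the author's actual script from Appendix 2. It assumes the annotations have already been read into tuples in the column order of Appendix 1 (gene, major flag, start, end) and relies only on the Major column; the name `group_by_major` and the tuple layout are hypothetical. The four sample rows are taken from Appendix 1.

```python
from collections import defaultdict

# Sample rows copied from Appendix 1: (gene, major flag, start, end).
# The tuple layout itself is an assumption made for this sketch.
ROWS = [
    ("a001L", "No", 280, 549),
    ("A002L", "Yes", 512, 1063),
    ("A003R", "Yes", 1792, 2217),
    ("a004L", "No", 1891, 2127),
]


def group_by_major(rows):
    """Group gene names into 'major' and 'minor' using the Major column."""
    groups = defaultdict(list)
    for gene, major, _start, _end in rows:
        # The flag appears as "Yes", "No" or "no" in the table, so normalise it.
        key = "major" if major.strip().lower() == "yes" else "minor"
        groups[key].append(gene)
    return dict(groups)


if __name__ == "__main__":
    print(group_by_major(ROWS))
```

The same pattern would extend to the other two classifications (stage of expression, method of detection) by switching the column used as the key.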
Results
Dot plots of PBCV-1 against its closely related strains PBCV-AR158 and PBCV-NY2A reveal that
the genomes of the three strains differ very little. This is somewhat expected,
considering that they are only different strains of the same virus (see Figure 3).
Interestingly, there are relatively few obvious regions that have been duplicated or reversed.
Sequences uploaded to PDBAlert yielded many interesting and statistically significant hits. For
example, gene A540L of PBCV-1 was found to be highly similar to the protein folding chaperones
3gud_A and 3gw6_A, each of which comes from a different bacteriophage. Many other hits had
already been identified by Dr Dunigan through his own BLAST searches;
however, some entirely new ones were found thanks to HHPred’s structural prediction technique. For a
more complete list of results from HHPred on PDBAlert, see Table 1.
Figure 3. a) AR158 vs NY2A b) PBCV-1 vs AR158 c) PBCV-1 vs NY2A
Gene Name | Best HHPred Search Result
A540L | Intramolecular Chaperone 3gud_A
A561L | Alginate Lyase vAL-1
A565R | HP0958 Protein from Helicobacter pylori CCUG 17874
A583L | Topoisomerase II-DNA cleavage complex; DNA Gyrase Subunit B
A629R | Ribonucleoside-diphosphate reductase large chain
A363R | chromodomain-ATPase portion of Chd1 chromatin remodeler
A456L | Adeno-associated virus type 2 Rep40-ADP complex
A512R | No Useful Hit
A181/182R | Chitinase
A185R | DNA polymerase delta
A189/192R | No Useful Hit
A219/222/226R | Chondroitin synthase
A256/257L | No Useful Hit
A014R | No Useful Hit
A018L | No Useful Hit
A025/027/029L | No Useful Hit
A035L | No Useful Hit
A044L | No Useful Hit
A111/114R | Glycosyltransferase (fucosyltransferase)
A140/145R | No Useful Hit
A002bL | No Useful Hit
A002aR | No Useful Hit
A002L | No Useful Hit
a001L | No Useful Hit
Table 1. HHPred Results. Rows highlighted in yellow represent genes whose function was previously unknown
Categorization of genes in PBCV-1 based on Dr Dunigan’s notes was implemented with Python
scripts that produce colouring files. A colouring file simply contains the name of
each gene followed by a hex or RGB value corresponding to the desired colour. The resulting colouring
files were then loaded into the Genome Map window of VOCs to produce a visualization of the genome with
genes coloured according to their categorization (see Figure 3). These visualizations were then
forwarded to Dr Dunigan and Dr van Etten in Nebraska for use in their manuscript describing their
research on PBCV-1.
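A colouring file of the kind described above could be produced along the following lines. This is a sketch under stated assumptions: the text only says that each gene name is followed by a hex or RGB value, so the single-space separator, the one-gene-per-line layout, the colour values in `PALETTE`, and the function names are all hypothetical and may not match what the genome map tool actually expects.

```python
# Hypothetical colours for each category; any hex values could be used.
PALETTE = {"major": "#D62728", "minor": "#7F7F7F"}


def colouring_lines(groups, palette):
    """Build one 'gene colour' line per gene (separator is an assumption)."""
    lines = []
    for category, genes in groups.items():
        for gene in genes:
            lines.append(f"{gene} {palette[category]}")
    return lines


def write_colouring_file(path, groups, palette):
    """Write the colouring lines to a plain-text file, one gene per line."""
    with open(path, "w", encoding="utf-8") as handle:
        handle.write("\n".join(colouring_lines(groups, palette)) + "\n")


if __name__ == "__main__":
    # Gene names taken from Appendix 1; the grouping is shown inline here.
    example = {"major": ["A002L", "A003R"], "minor": ["a001L"]}
    for line in colouring_lines(example, PALETTE):
        print(line)
```

Keeping the palette separate from the grouping means the same categorization can be recoloured without re-reading the annotations.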
Figure 3. Visual Categorization of Genes by a) Functional Annotations b) Major/Minor Genes c) Method of Detection
Conclusion
Algal viruses continue to be an interesting potential tool for genetic modification of algae, but
before these viruses can be manipulated, more work is needed to identify gene
functions. Bioinformatics provides the tools necessary to do this: comparing a query against a large
database of related sequences can reveal regions of structural similarity. Furthermore, the functional
annotation of genes enables researchers to use software to group these genes into
orthologous groups, allowing for much faster and easier identification of novel genes. Finally,
bioinformatics software is an invaluable asset for the visual representation of large contiguous
sequences and the display of genetic information.
Single-celled green algae have the potential to one day change the world with their unique
ability to produce high amounts of lipids and other energy-rich molecules. With the vast amount of
sequence data available today in the many different databases across the web, the future is bright for
genomic comparisons of both viruses and living organisms alike.
Appendices
Appendix 1. Functional Annotations Provided by Dr Dunigan
Gene | Major | Start | End | %AT | nt | aa | KEGG | COG | PfamA | BLASTp
a001L | No | 280 | 549 | 68 | 270 | 89 | N/A | N/A | N/A | N/A
a002aR | No | 1022 | 1177 | 66 | 156 | 51 | N/A | N/A | N/A | N/A
A002BL | Yes | 1174 | 1335 | 56 | 162 | 53 | N/A | N/A | N/A | N/A
A002cR | Yes | 1367 | 1513 | 14 | 147 | 48 | N/A | N/A | N/A | N/A
a002dR | No | 1408 | 1539 | 12 | 132 | 43 | N/A | N/A | N/A | N/A
A002L | Yes | 512 | 1063 | 66 | 552 | 183 | N/A | N/A | N/A | N/A
A003R | Yes | 1792 | 2217 | 70 | 426 | 141 | N/A | N/A | N/A | N/A
a004aL | No | 2221 | 2361 | 52 | 141 | 46 | N/A | N/A | N/A | N/A
a004L | No | 1891 | 2127 | 58 | 237 | 78 | N/A | N/A | N/A | N/A
A005R | Yes | 2288 | 3094 | 66 | 807 | 268 | N/A | FOG: Ankyrin repeat | Ankyrin repeat | Similar to ankyrin 23/unc44 [Aedes aegypti] [2e-33]
a006L | No | 2393 | 2611 | 60 | 219 | 72 | N/A | N/A | N/A | N/A
A007/008L | Yes | 3292 | 4701 | 84 | 1410 | 469 | N/A | FOG: Ankyrin repeat | Ankyrin repeat | Similar to Pfs NACHT and Ankyrin domain protein [Aspergillus fumigatus Af293] [2e-55]
A009R | Yes | 4998 | 5735 | 84 | 738 | 245 | N/A | N/A | Protein of unknown function (DUF1390) | N/A
a010aR | No | 6837 | 7022 | 60 | 186 | 61 | N/A | N/A | N/A | N/A
A010R | Yes | 5768 | 6973 | 88 | 1206 | 401 | N/A | N/A | Large eukaryotic DNA virus major capsid protein | Similar to hypothetical protein OTV1_165 [Ostreococcus tauri virus 1] [6e-42]
A011L | Yes | 6970 | 8181 | 84 | 1212 | 403 | N/A | N/A | Large eukaryotic DNA virus major capsid protein | Similar to hypothetical protein OsV5_190f [Ostreococcus virus OsV5] [5e-44]
a012R | No | 7897 | 8190 | 64 | 294 | 97 | N/A | N/A | N/A | N/A
a013L | No | 8230 | 8571 | 58 | 342 | 113 | N/A | N/A | N/A | N/A
A014R | Yes | 8255 | 12364 | 40 | 4110 | 1369 | N/A | N/A | N/A | N/A
a015L | No | 8863 | 9411 | 48 | 549 | 182 | N/A | N/A | N/A | N/A
a016L | No | 9808 | 10500 | 64 | 693 | 230 | N/A | N/A | N/A | N/A
a017L | No | 11371 | 11571 | 54 | 201 | 66 | N/A | N/A | N/A | N/A
A018L | Yes | 12367 | 16374 | 80 | 4008 | 1335 | N/A | N/A | Chlorovirus glycoprotein repeat | N/A
a019R | No | 12869 | 13519 | 60 | 651 | 216 | N/A | N/A | N/A | N/A
a020R | No | 13889 | 14143 | 62 | 255 | 84 | N/A | N/A | N/A | N/A
a021R | No | 14165 | 14632 | 68 | 468 | 155 | N/A | N/A | N/A | N/A
a022R | No | 14702 | 15031 | 58 | 330 | 109 | N/A | N/A | N/A | N/A
a023R | No | 15482 | 15706 | 70 | 225 | 74 | N/A | N/A | N/A | N/A
a024R | No | 15770 | 16132 | 64 | 363 | 120 | N/A | N/A | N/A | N/A
A025/027/029L | Yes | 16432 | 20511 | 78 | 4080 | 1359 | N/A | N/A | N/A | N/A
a026R | No | 16916 | 17512 | 72 | 597 | 198 | N/A | N/A | N/A | N/A
a030R | No | 18185 | 18721 | 72 | 537 | 178 | N/A | N/A | N/A | N/A
a031R | No | 18755 | 19030 | 54 | 276 | 91 | N/A | N/A | N/A | N/A
a032R | No | 19232 | 19810 | 68 | 579 | 192 | N/A | N/A | N/A | N/A
a033R | No | 20189 | 20428 | 66 | 240 | 79 | N/A | N/A | N/A | N/A
A034R | Yes | 20572 | 21498 | 18 | 927 | 308 | N/A | N/A | Protein kinase domain | N/A
A035L | Yes | 21500 | 23260 | 88 | 1761 | 586 | N/A | N/A | N/A | N/A
a036R | No | 22696 | 23037 | 56 | 342 | 113 | N/A | N/A | N/A | N/A
A037L | Yes | 23288 | 23605 | 68 | 318 | 105 | N/A | N/A | N/A | N/A
a038R | No | 23584 | 23823 | 68 | 240 | 79 | N/A | N/A | N/A | N/A
A039L | Yes | 23623 | 24078 | 80 | 456 | 151 | S-phase kinase-associated protein 1 | SCF ubiquitin ligase SKP1 component | Skp1 family tetramerisation domain / Skp1 family dimerisation domain | Similar to predicted protein [Physcomitrella patens subsp. patens] [2e-27]
a040L | No | 24087 | 24476 | 56 | 390 | 129 | N/A | N/A | N/A | N/A
A041R | Yes | 24148 | 25386 | 34 | 1239 | 412 | N/A | N/A | N/A | Similar to predicted protein [Populus trichocarpa] [6e-10]
a042L | No | 24238 | 24504 | 32 | 267 | 88 | N/A | N/A | N/A | N/A
a043R | No | 24798 | 25025 | 64 | 228 | 75 | N/A | N/A | N/A | N/A
a044aL | No | 25328 | 25492 | 66 | 165 | 54 | N/A | N/A | N/A | N/A
A044L | Yes | 25383 | 27182 | 84 | 1800 | 599 | N/A | ATPases of the AAA+ class | ATPase family associated with various cellular activities (AAA) | Similar to hypothetical protein [Trypanosoma cruzi strain CL Brener] [6e-21]
a045R | No | 26328 | 26540 | 52 | 213 | 70 | N/A | N/A | N/A | N/A
a046R | No | 26720 | 27220 | 68 | 501 | 166 | N/A | N/A | N/A | N/A
a047aL | No | 27217 | 27387 | 58 | 171 | 56 | N/A | N/A | N/A | N/A
a047L | No | 26830 | 27033 | 64 | 204 | 67 | N/A | N/A | N/A | N/A
A048R | Yes | 27248 | 27619 | 18 | 372 | 123 | N/A | N/A | N/A | N/A
A049L | Yes | 27610 | 28269 | 70 | 660 | 219 | glycerophosphoryl diester phosphodiesterase | Glycerophosphoryl diester phosphodiesterase | Glycerophosphoryl diester phosphodiesterase family | Similar to glycerophosphodiester phosphodiesterase [Caldivirga maquilingensis IC-167] [4e-30]
A050aL | Yes | 28821 | 28961 | 84 | 141 | 46 | N/A | N/A | N/A | N/A
A050L | Yes | 28286 | 28711 | 84 | 426 | 141 | N/A | N/A | Pyrimidine dimer DNA glycosylase | Similar to pyrimidine dimer DNA glycosylase/endonuclease V [Prochlorococcus marinus subsp. marinus str. CCMP1375] [2e-29]
A051L | Yes | 28984 | 29595 | 72 | 612 | 203 | N/A | N/A | N/A | Similar to hypothetical protein Bxe_A0110 [Burkholderia xenovorans LB400] [3e-48]
a052R | No | 29167 | 29418 | 62 | 252 | 83 | N/A | N/A | N/A | N/A
A053R | Yes | 29684 | 30775 | 62 | 1092 | 363 | D-lactate dehydrogenase | Lactate dehydrogenase and related dehydrogenases | D-isomer specific 2-hydroxyacid dehydrogenase catalytic domain / D-isomer specific 2-hydroxyacid dehydrogenase NAD binding domain | Similar to fermentative D-lactate dehydrogenase NAD-dependent [Escherichia coli str. K-12 substr. MG1655] [1e-65]
a054L | No | 29800 | 30123 | 56 | 324 | 107 | N/A | N/A | N/A | N/A
a055L | No | 30008 | 30403 | 62 | 396 | 131 | N/A | N/A | N/A | N/A
a056L | No | 30289 | 30603 | 58 | 315 | 104 | N/A | N/A | N/A | N/A
A057aR | Yes | 31650 | 32666 | 38 | 1017 | 338 | N/A | N/A | N/A | N/A
a057R | No | 31034 | 31447 | 74 | 414 | 137 | N/A | N/A | N/A | N/A
A058L | Yes | 30976 | 31623 | 58 | 648 | 215 | N/A | N/A | N/A | N/A
a059L | No | 31779 | 32003 | 52 | 225 | 74 | N/A | N/A | N/A | N/A
A060L | Yes | 32796 | 33500 | 86 | 705 | 234 | N/A | N/A | N/A | N/A
A061L | Yes | 33529 | 34158 | 74 | 630 | 209 | N/A | N/A | Macrocin-O-methyltransferase (TylF) | Similar to macrocin-O-methyltransferase domain-containing protein [Thermomonospora curvata DSM 43183] [9e-31]
a062aR | No | 34016 | 34198 | 54 | 183 | 60 | N/A | N/A | N/A | N/A
a062R | No | 33808 | 34179 | 60 | 372 | 123 | N/A | N/A | N/A | N/A
A063L | Yes | 34191 | 34889 | 96 | 699 | 232 | N/A | N/A | N/A | N/A
A064R | Yes | 34955 | 36872
a065L | No | 35265 | 35510 | 60 | 246 | 81 | N/A | N/A | N/A | N/A
a066aL | No | 36851 | 36976 | 58 | 126 | 41 | N/A | N/A | N/A | N/A
a066L | No | 36607 | 36945 | 70 | 339 | 112 | N/A | N/A | N/A | N/A
A067R | Yes | 37043 | 37972 | 62 | 930 | 309 | N/A | N/A | N/A | N/A
a068L | No | 37375 | 37677 | 66 | 303 | 100 | N/A | N/A | N/A | N/A
a069L | No | 37433 | 37873 | 56 | 441 | 146 | N/A | N/A | N/A | N/A
a070L | No | 37798 | 38064 | 66 | 267 | 88 | N/A | N/A | N/A | N/A
A071R | Yes | 38003 | 39067 | 80 | 1065 | 354 | N/A | N/A | N/A | Similar to TPR repeat-containing protein [alpha proteobacterium HIMB114] [8e-16]
a072L | No | 38164 | 38436 | 64 | 273 | 90 | N/A | N/A | N/A | N/A
a073L | No | 38417 | 38626 | 56 | 210 | 69 | N/A | N/A | N/A | N/A
a074L | No | 38660 | 38863 | 74 | 204 | 67 | N/A | N/A | N/A | N/A
a075aR | No | 39784 | 39933 | 60 | 150 | 49 | N/A | N/A | N/A | N/A
A075bL | Yes | 39950 | 40138 | 80 | 189 | 62 | N/A | N/A | N/A | N/A
A075cR | Yes | 40014 | 40172 | 70 | 159 | 52 | N/A | N/A | N/A | N/A
A075L | Yes | 39078 | 39920 | 80 | 843 | 280 | N/A | N/A | Exostosin family | Similar to predicted protein [Chlamydomonas reinhardtii] [1e-15]
A076L | Yes | 40190 | 40501 | 50 | 312 | 103 | N/A | N/A | N/A | N/A
A077L | Yes | 40470 | 40745 | 76 | 276 | 91 | N/A | N/A | N/A | N/A
A078aL | Yes | 41756 | 41878 | 92 | 123 | 40 | N/A | N/A | N/A | N/A
A078bR | Yes | 42007 | 42183 | 52 | 177 | 58 | N/A | N/A | N/A | N/A
a078cL | No | 42431 | 42565 | 62 | 135 | 44 | N/A | N/A | N/A | N/A
A078R | Yes | 40863 | 41759 | 74 | 897 | 298 | N-carbamoylputrescine amidase | Predicted amidohydrolase | Carbon-nitrogen hydrolase | Similar to hydrolase carbon-nitrogen family putative [Heliobacterium modesticaldum Ice1] [4e-81]
A079R | Yes | 42467 | 43195 | 80 | 729 | 242 | N/A | N/A | Protein of unknown function (DUF1390) | N/A
a080L | No | 42638 | 43045 | 58 | 408 | 135 | N/A | N/A | N/A | N/A
A081L | Yes | 43192 | 43761 | 76 | 570 | 189 | N/A | N/A | N/A | N/A
a082R | No | 43760 | 43975 | 60 | 216 | 71 | N/A | N/A | N/A | N/A
a083R | No | 43791 | 44063 | 4 | 273 | 90 | N/A | N/A | N/A | N/A
a084aL | No | 44458 | 44580 | 46 | 123 | 40 | N/A | N/A | N/A | N/A
A084L | Yes | 43811 | 44377 | 76 | 567 | 188 | N/A | N/A | N/A | N/A
A085R | Yes | 44498 | 45226 | 66 | 729 | 242 | N/A | N/A | 2OG-Fe(II) oxygenase superfamily | Similar to Procollagen-proline dioxygenase [Paenibacillus sp. JDR-2] [2e-18]
a086aL | No | 45101 | 45229 | 54 | 129 | 42 | N/A | N/A | N/A | N/A
a086BL | No | 46489 | 46632 | 72 | 144 | 47 | N/A | N/A | N/A | N/A
a086cL | No | 46508 | 46639 | 14 | 132 | 43 | N/A | N/A | N/A | N/A
a086L | No | 44953 | 45195 | 60 | 243 | 80 | N/A | N/A | N/A | N/A
A087R | Yes | 45223 | 46623 | 32 | 1401 | 466 | N/A | N/A | HNH endonuclease | N/A
A088R | Yes | 46755 | 47528 | 72 | 774 | 257 | N/A | N/A | N/A | N/A
a089aL | No | 47826 | 47972 | 76 | 147 | 48 | N/A | N/A | N/A | N/A
A089R | Yes | 47628 | 47834 | 78 | 207 | 68 | N/A | N/A | N/A | N/A
A090R | Yes | 47851 | 48315 | 70 | 465 | 154 | N/A | N/A | N/A | N/A
a092/093aL | No | 48547 | 48732 | 68 | 186 | 61 | N/A | N/A | N/A | N/A
A092/093L | Yes | 48326 | 49627 | 72 | 1302 | 433 | N/A | N/A | PBCV-specific basic adaptor domain | N/A
A094L | Yes | 49662 | 50756 | 74 | 1095 | 364 | N/A | Beta-glucanase/Beta-glucan synthetase | Glycosyl hydrolases family 16 | Similar to glucan endo-13-beta-D-glucosidase [Shewanella frigidimarina NCIMB 400] [3e-21]
a095R | No | 50014 | 50232 | 52 | 219 | 72 | N/A | N/A | N/A | N/A
a096R | No | 50346 | 50549 | 68 | 204 | 67 | N/A | N/A | N/A | N/A
a097R | No | 50678 | 50893 | 68 | 216 | 71 | N/A | N/A | N/A | N/A
A098R | Yes | 50886 | 52592 | 76 | 1707 | 568 | N/A | Glycosyltransferases probably involved in cell wall biogenesis | Chitin synthase | Similar to hypothetical protein BRAFLDRAFT_103600 [Branchiostoma floridae] [3e-58]
a099aL | No | 52539 | 52682 | 60 | 144 | 47 | N/A | N/A | N/A | N/A
a099L | No | 52152 | 52394 | 62 | 243 | 80 | N/A | N/A | N/A | N/A
A100R | Yes | 52691 | 54478 | 60 | 1788 | 595 | glucosamine--fructose-6-phosphate aminotransferase (isomerizing) | Glucosamine 6-phosphate synthetase contains amidotransferase and phosphosugar isomerase domains | Glutamine amidotransferases class-II / SIS domain | Similar to glutamine--fructose-6-phosphate transaminase [Methylibium petroleiphilum PM1] [1e-156]
a101L | No | 52925 | 53188 | 54 | 264 | 87 | N/A | N/A | N/A | N/A
a102L | No | 54468 | 54974 | 56 | 507 | 168 | N/A | N/A | N/A | N/A
A103R | Yes | 54622 | 55614 | 84 | 993 | 330 | N/A | mRNA capping enzyme guanylyltransferase (alpha) subunit | mRNA capping enzyme catalytic domain / mRNA capping enzyme C-terminal domain | Similar to hypothetical protein OTV1_156 [Ostreococcus tauri virus 1] [1e-26]
a104L | No | 55054 | 55344 | 58 | 291 | 96 | N/A | N/A | N/A | N/A
A105L | Yes | 55626 | 56480 | 84 | 855 | 284 | ubiquitin carboxyl-terminal hydrolase 8 | N/A | Ubiquitin carboxyl-terminal hydrolase | Similar to GM11729 [Drosophila sechellia] [9e-12]
a106R | No | 55770 | 56168 | 56 | 399 | 132 | N/A | N/A | N/A | N/A
A107L | Yes | 56516 | 57388 | 80 | 873 | 290 | N/A | N/A | Transcription factor TFIIB repeat | Similar to PREDICTED: hypothetical protein [Vitis vinifera] [1e-09]
A108aR | Yes | 57212 | 57400 | 52 | 189 | 62 | N/A | N/A | N/A | N/A
A108bL | Yes | 57471 | 57977 | 84 | 507 | 168 | N/A | N/A | GIY-YIG catalytic domain | Similar to hypothetical protein VIBHAR_p08226 [Vibrio harveyi ATCC BAA-1116] [1e-07]
a108R | No | 56933 | 57148 | 60 | 216 | 71 | N/A | N/A | N/A | N/A
a110L | No | 58082 | 58390 | 50 | 309 | 102 | N/A | N/A | N/A | N/A
A111/114R | Yes | 58098 | 60680 | 80 | 2583 | 860 | N/A | N/A | Glycosyltransferase family 10 (fucosyltransferase) | Similar to glycosyl transferase [Cyanothece sp. ATCC 51142] [1e-35]
a112L | No | 58775 | 59068 | 42 | 294 | 97 | N/A | N/A | N/A | N/A
a113L | No | 58809 | 59093 | 34 | 285 | 94 | N/A | N/A | N/A | N/A
a115L | No | 59265 | 59495 | 58 | 231 | 76 | N/A | N/A | N/A | N/A
a116R | No | 59398 | 59643 | 52 | 246 | 81 | N/A | N/A | N/A | N/A
a117L | No | 59874 | 60266 | 68 | 393 | 130 | N/A | N/A | N/A | N/A
A118R | Yes | 60726 | 61763 | 74 | 1038 | 345 | GDPmannose 4-6-dehydratase | Nucleoside-diphosphate-sugar epimerases | NAD dependent epimerase/dehydratase family | Similar to GDP-D-mannose dehydratase [Yersinia pseudotuberculosis IP 32953] [1e-113]
a119L | No | 61628 | 62029 | 70 | 402 | 133 | N/A | N/A | N/A | N/A
a120L | No | 61675 | 61947 | 50 | 273 | 90 | N/A | N/A | N/A | N/A
A121R | Yes | 61780 | 62094 | 72 | 315 | 104 | N/A | N/A | N/A | Similar to hypothetical protein [Tetrahymena thermophila SB210] [2e-15]
A122/123aR | Yes | 66133 | 66258 | 68 | 126 | 41 | N/A | N/A | N/A | N/A
a122/123bL | No | 66203 | 66346 | 62 | 144 | 47 | N/A | N/A | N/A | N/A
A122/123cL | Yes | 66407 | 66541 | 64 | 135 | 44 | N/A | N/A | N/A | N/A
A122/123R | Yes | 62145 | 66176 | 88 | 4032 | 1343 | N/A | Autotransporter adhesin | Chlorovirus glycoprotein repeat / Domain of unknown function (DUF3476) | Similar to hypothetical protein Epulo_01906 [Epulopiscium sp. N.t. morphotype B] [2e-25]
a124L | No | 66517 | 66771 | 60 | 255 | 84 | N/A | N/A | N/A | N/A
A125L | Yes | 66178 | 67120 | 84 | 543 | 180 | N/A | DNA-directed RNA polymerase subunit M/Transcription elongation factor TFIIS | Transcription factor S-II (TFIIS) central domain / Transcription factor S-II (TFIIS) | Similar to unnamed protein product [Kluyveromyces lactis] [3e-17]
a126R | No | 66620 | 66820 | 24 | 201 | 66 | N/A | N/A | N/A | N/A
A127R | Yes | 67154 | 67891 | 14 | 738 | 245 | N/A | N/A | N/A | Similar to hypothetical protein OTV1_142 [Ostreococcus tauri virus 1] [4e-20]
a128L | No | 67510 | 67770 | 56 | 261 | 86 | N/A | N/A | N/A | N/A
A129R | Yes | 67952 | 69028 | 82 | 1077 | 358 | N/A | N/A | N/A | N/A
A130aR | Yes | 69261 | 69404 | 62 | 144 | 47 | N/A | N/A | N/A | N/A
a130bR | No | 69344 | 69475 | 58 | 132 | 43 | N/A | N/A | N/A | N/A
A130R | Yes | 69049 | 69366 | 84 | 318 | 105 | N/A | N/A | N/A | N/A
A131L | Yes | 69359 | 69769 | 68 | 411 | 136 | N/A | N/A | N/A | N/A
a132R | No | 69533 | 69805 | 54 | 273 | 90 | N/A | N/A | N/A | N/A
A133R | Yes | 69960 | 70583 | 86 | 624 | 207 | N/A | N/A | Thylakoid formation protein | Similar to inositol phosphatase-like protein [Chlamydomonas reinhardtii] [1e-24]
A134L | Yes | 70558 | 71055 | 92 | 498 | 165 | N/A | N/A | GIY-YIG catalytic domain | N/A
A135L | Yes | 71055 | 71603 | 74 | 549 | 182 | N/A | N/A | N/A | N/A
A136R | Yes | 71134 | 71574 | 30 | 441 | 146 | N/A | N/A | N/A | N/A
A137R | Yes | 71625 | 71843 | 44 | 219 | 72 | N/A | N/A | N/A | N/A
a138aR | No | 72604 | 72747 | 72 | 144 | 47 | N/A | N/A | N/A | N/A
A138R | Yes | 71880 | 72701 | 84 | 822 | 273 | N/A | N/A | N/A | N/A
A139L | Yes | 72698 | 73153 | 60 | 456 | 151 | N/A | N/A | N/A | N/A
A140/145R | Yes | 73107 | 76490 | 90 | 3384 | 1127 | N/A | N/A | N/A | N/A
a142L | No | 74474 | 74875 | 48 | 402 | 133 | N/A | N/A | N/A | N/A
a144L | No | 74244 | 75740 | 64 | 1497 | 498 | N/A | N/A | N/A | N/A
a146L | No | 75752 | 76006 | 68 | 255 | 84 | N/A | N/A | N/A | N/A
a147L | No | 75882 | 76655 | 56 | 774 | 257 | N/A | N/A | N/A | N/A
A148R | Yes | 76540 | 76872 | 82 | 333 | 110 | N/A | N/A | N/A | N/A
a149L | No | 76642 | 77052 | 62 | 411 | 136 | N/A | N/A | N/A | N/A
A150L | Yes | 76856 | 77314 | 74 | 459 | 152 | N/A | N/A | N/A | N/A
A151R | Yes | 77397 | 77804 | 52 | 408 | 135 | N/A | N/A | N/A | N/A
a152L | No | 77822 | 78094 | 58 | 273 | 90 | N/A | N/A | N/A | N/A
A153R | Yes | 77880 | 79259 | 80 | 1380 | 459 | N/A | DNA or RNA helicases of superfamily II | Type III restriction enzyme res subunit | Similar to hypothetical protein OsV5_067f [Ostreococcus virus OsV5] [2e-61]
A154L | Yes | 79256 | 80299 | 70 | 1044 | 347 | N/A | N/A | N/A | Similar to EsV-1-7 [Ectocarpus siliculosus virus 1] [9e-11]
a155R | No | 79412 | 79696 | 52 | 285 | 94 | N/A | N/A | N/A | N/A
a156L | No | 79885 | 80217 | 52 | 333 | 110 | N/A | N/A | N/A | N/A
A157L | Yes | 80393 | 80725 | 80 | 333 | 110 | N/A | N/A | N/A | N/A
A158L | Yes | 80770 | 81084 | 68 | 315 | 104 | N/A | N/A | N/A | N/A
A159R | Yes | 81104 | 81436 | 36 | 333 | 110 | N/A | N/A | N/A | N/A
a160L | No | 81257 | 81751 | 68 | 495 | 164 | N/A | N/A | N/A | N/A
A161R | Yes | 81345 | 81716 | 58 | 372 | 123 | N/A | N/A | N/A | N/A
a162aR | No | 82831 | 82959 | 68 | 129 | 42 | N/A | N/A | N/A | N/A
A162L | Yes | 81717 | 82952 | 78 | 1236 | 411 | N/A | N/A | N/A | N/A
A163R | Yes | 82995 | 84296 | 12 | 1302 | 433 | N/A | N/A | Ligand-gated ion channel | N/A
A164aR | Yes | 84314 | 84493 | 14 | 180 | 59 | N/A | N/A | N/A | N/A
a164L | No | 83873 | 84277 | 80 | 405 | 134 | N/A | N/A | N/A | N/A
A165aL | Yes | 84835 | 85326 | 70 | 492 | 163 | N/A | N/A | N/A | N/A
A165L | Yes | 84486 | 84935 | 60 | 450 | 149 | N/A | N/A | N/A | N/A
A166R | Yes | 85431 | 86237 | 80 | 807 | 268 | N/A | N/A | YqaJ-like viral recombinase domain | Similar to hypothetical protein OsV5_146f [Ostreococcus virus OsV5] [2e-33]
a167aR | No | 86128 | 86283 | 68 | 156 | 51 | N/A | N/A | N/A | N/A
a167L | No | 85677 | 85880 | 64 | 204 | 67 | N/A | N/A | N/A | N/A
A168R | Yes | 86276 | 86776 | 86 | 501 | 166 | N/A | N/A | N/A | N/A
A169R | Yes | 86904 | 87875 | 74 | 972 | 323 | aspartate carbamoyltransferase catalytic subunit | Ornithine carbamoyltransferase | Aspartate/ornithine carbamoyltransferase carbamoyl-P binding domain / Aspartate/ornithine carbamoyltransferase Asp/Orn binding domain | Similar to LOC100282373 [Zea mays] [7e-78]
a170L | No | 86990 | 87259 | 46 | 270 | 89 | N/A | N/A | N/A | N/A
A171R | Yes | 87904 | 89067 | 84 | 1164 | 387 | N/A | N/A | N/A | Similar to predicted protein [Physcomitrella patens subsp. patens] [9e-11]
A172aL | Yes | 88944 | 89111 | 60 | 168 | 55 | N/A | N/A | N/A | N/A
a172L | No | 88627 | 88947 | 52 | 321 | 106 | N/A | N/A | N/A | N/A
A173L | Yes | 89068 | 89934 | 68 | 867 | 288 | N/A | Predicted esterase of the alpha-beta hydrolase superfamily | Patatin-like phospholipase | Similar to hypothetical protein BRAFLDRAFT_104465 [Branchiostoma floridae] [8e-20]
A174L | Yes | 89941 | 90138 | 76 | 198 | 65 | N/A | N/A | N/A | N/A
A175R | Yes | 89974 | 90183 | 22 | 210 | 69 | N/A | N/A | N/A | N/A
A176aL | Yes | 90415 | 90537 | 64 | 123 | 40 | N/A | N/A | N/A | N/A
A176L | Yes | 90180 | 90413 | 62 | 234 | 77 | N/A | N/A | PBCV-specific basic adaptor domain | N/A
A177R | Yes | 90639 | 91379 | 84 | 741 | 246 | N/A | N/A | Protein of unknown function (DUF1390) | N/A
a178L | No | 90915 | 91160 | 66 | 246 | 81 | N/A | N/A | N/A | N/A
a179L | No | 91201 | 91530 | 62 | 330 | 109 | N/A | N/A | N/A | N/A
A180R | Yes | 91466 | 91792 | 78 | 327 | 108 | N/A | Predicted RNA-binding protein homologous to eukaryotic snRNP | Domain of unknown function (DUF814) | Similar to fibronectin-binding A domain-containing protein [Fervidobacterium nodosum Rt17-B1] [1e-12]
A181/182R | Yes | 91810 | 94302 | 80 | 2493 | 830 | bifunctional chitinase/lysozyme | N/A | Cellulose binding domain | Similar to fibronectin type III domain protein [Edwardsiella ictaluri 93-146] [7e-47]
a183L | No | 93624 | 93854 | 60 | 231 | 76 | N/A | N/A | N/A | N/A
a184L | No | 94120 | 94470 | 76 | 351 | 116 | N/A | N/A | N/A | N/A
A185R | Yes | 94548 | 97390 | 40 | 2742 | 913 | DNA polymerase delta subunit 1 | DNA polymerase elongation subunit (family B) | DNA polymerase family B exonuclease domain / DNA polymerase family B | Similar to hypothetical protein OsV5_240f [Ostreococcus virus OsV5] [1e-143]
a186L | No | 94966 | 95220 | 46 | 255 | 84 | N/A | N/A | N/A | N/A
a187L | No | 95295 | 96038 | 58 | 744 | 247 | N/A | N/A | N/A | N/A
A188aR | Yes | 96944 | 97390 | 58 | 447 | 148 | N/A | DNA polymerase elongation subunit (family B) | DNA polymerase family B | Similar to hypothetical protein OTV1_208 [Ostreococcus tauri virus 1] [4e-09]
a188bR | No | 97258 | 97398 | 72 | 141 | 46 | N/A | N/A | N/A | N/A
a188L | No | 96724 | 96957 | 56 | 234 | 77 | N/A | N/A | N/A | N/A
A189/192R | Yes | 97433 | 101332 | 28 | 3900 | 1299 | N/A | N/A | N/A | N/A
a190L | No | 97525 | 97743 | 52 | 219 | 72 | N/A | N/A | N/A | N/A
A193L | Yes | 101340 | 102128 | 72 | 789 | 262 | proliferating cell nuclear antigen | DNA polymerase sliding clamp subunit (PCNA homolog) | Proliferating cell nuclear antigen N-terminal domain / Proliferating cell nuclear antigen C-terminal domain | Similar to hypothetical protein OsV5_115f [Ostreococcus virus OsV5] [4e-36]
a194R | No | 101887 | 102135 | 56 | 249 | 82 | N/A | N/A | N/A | N/A
a195R | No | 102102 | 102350 | 66 | 249 | 82 | N/A | N/A | N/A | N/A
A196L | Yes | 102156 | 102614 | 90 | 459 | 152 | N/A | N/A | N/A | N/A
a197R | No | 102340 | 102627 | 62 | 288 | 95 | N/A | N/A | N/A | N/A
a198R | No | 102399 | 102623 | 62 | 225 | 74 | N/A | N/A | N/A | N/A
A199R | Yes | 102663 | 102968 | 6 | 306 | 101 | N/A | N/A | N/A | N/A
a200aR | No | 103320 | 103484 | 52 | 165 | 54 | N/A | N/A | N/A | N/A
A200R | Yes | 103057 | 103413 | 76 | 357 | 118 | N/A | N/A | Cytidine and deoxycytidylate deaminase zinc-binding region | N/A
A201aL | Yes | 103546 | 103722 | 18 | 177 | 58 | N/A | N/A | N/A | N/A
A201L | Yes | 103422 | 103706 | 74 | 285 | 94 | N/A | N/A | N/A | N/A
A202L | Yes | 103726 | 104067 | 90 | 342 | 113 | N/A | N/A | N/A | N/A
A203R | Yes | 104134 | 104784 | 26 | 651 | 216 | N/A | N/A | N/A | N/A
a204aL | No | 104750 | 104911 | 58 | 162 | 53 | N/A | N/A | N/A | N/A
a204L | No | 104256 | 104495 | 62 | 240 | 79 | N/A | N/A | N/A | N/A
A205R | Yes | 104811 | 105431 | 80 | 621 | 206 | N/A | N/A | PBCV-specific basic adaptor domain | N/A
a206L | No | 105024 | 105281 | 64 | 258 | 85 | N/A | N/A | N/A | N/A
A207R | Yes | 105509 | 106627 | 86 | 1119 | 372 | ornithine decarboxylase | Arginine decarboxylase (spermidine biosynthesis) | Pyridoxal-dependent decarboxylase pyridoxal binding domain / Pyridoxal-dependent decarboxylase C-terminal sheet domain | Similar to ornithine decarboxylase [Bos taurus] [2e-73]
A208R | Yes | 106658 | 107593 | 76 | 936 | 311 | N/A | N/A | N/A | N/A
a210L | No | 107125 | 107616 | 64 | 492 | 163 | N/A | N/A | N/A | N/A
a211R | No | 107223 | 107546 | 56 | 324 | 107 | N/A | N/A | N/A | N/A
A212R | Yes | 107615 | 107782 | 78 | 168 | 55 | N/A | N/A | N/A | N/A
a213aL | No | 108261 | 108407 | 56 | 147 | 48 | N/A | N/A | N/A | N/A
A213L | Yes | 107779 | 108225 | 84 | 447 | 148 | N/A | N/A | N/A | N/A
a214aL | No | 108548 | 108691 | 46 | 144 | 47 | N/A | N/A | N/A | N/A
A214L | Yes | 108265 | 108672 | 76 | 408 | 135 | N/A | N/A | N/A | N/A
A215L | Yes | 108746 | 109711 | 82 | 966 | 321 | N/A | N/A | N/A | Similar to hypothetical protein CC1G_06067 [Coprinopsis cinerea okayama7#130] [1e-08]
a216R | No | 109263 | 109586 | 62 | 324 | 107 | N/A | N/A | N/A | N/A
A217L | Yes | 109733 | 110917 | 70 | 1185 | 394 | N/A | N/A | N/A | N/A
a218L | No | 110562 | 110927 | 18 | 366 | 121 | N/A | N/A | N/A | N/A
A219/222/226R | Yes | 110893 | 112926 | 58 | 2034 | 677 | N/A | Glycosyltransferases probably involved in cell wall biogenesis | N/A | Similar to putative transmembrane cellulose synthase [Rhizobium leguminosarum bv. viciae 3841] [2e-82]
a220L | No | 110995 | 111384 | 50 | 390 | 129 | N/A | N/A | N/A | N/A
a223aL | No | 111891 | 112169 | 64 | 279 | 92 | N/A | N/A | N/A | N/A
a223R | No | 111638 | 111850 | 60 | 213 | 70 | N/A | N/A | N/A | N/A
a224L | No | 112197 | 112463 | 54 | 267 | 88 | N/A | N/A | N/A | N/A
a225L | No | 112525 | 112797 | 66 | 273 | 90 | N/A | N/A | N/A | N/A
A227L | Yes | 112931 | 113344 | 80 | 414 | 137 | N/A | N/A | N/A | N/A
a228aR | No | 113268 | 113402 | 64 | 135 | 44 | N/A | N/A | N/A | N/A
a228R | No | 112989 | 113219 | 60 | 231 | 76 | N/A | N/A | N/A | N/A
A229L | Yes | 113365 | 113598 | 82 | 234 | 77 | N/A | N/A | N/A | N/A
A230R | Yes | 113623 | 114213 | 36 | 591 | 196 | N/A | N/A | N/A | N/A
A231L | Yes | 114214 | 115365 | 80 | 1152 | 383 | N/A | N/A | N/A | N/A
a232aR | No | 115274 | 115408 | 52 | 135 | 44 | N/A | N/A | N/A | N/A
a232R | No | 115006 | 115269 | 60 | 264 | 87 | N/A | N/A | N/A | N/A
A233R | Yes | 115442 | 115780 | 42 | 339 | 112 | N/A | N/A | N/A | N/A
A234L | Yes | 115777 | 116103 | 72 | 327 | 108 | N/A | N/A | N/A | N/A
a235R | No | 115879 | 116130 | 54 | 252 | 83 | N/A | N/A | N/A | N/A
a236L | No | 116127 | 116390 | 68 | 264 | 87 | N/A | N/A | N/A | N/A
A237R | Yes | 116167 | 117723 | 22 | 1557 | 518 | N/A | Homospermidine synthase | Saccharopine dehydrogenase | Similar to homospermidine synthase [Opitutus terrae PB90-1] [1e-83]
a238L | No | 117246 | 117590 | 46 | 345 | 114 | N/A | N/A | N/A | N/A
A239L | Yes | 117726 | 118172 | 72 | 447 | 148 | N/A | N/A | N/A | N/A
A240aL | Yes | 118272 | 118457 | 72 | 186 | 61 | N/A | N/A | N/A | N/A
a240bR | No | 118336 | 118476 | 68 | 141 | 46 | N/A | N/A | N/A | N/A
a240cL | No | 118447 | 118611 | 52 | 165 | 54 | N/A | N/A | N/A | N/A
a240dL | No | 118550 | 118696 | 60 | 147 | 48 | N/A | N/A | N/A | N/A
a240L | No | 117770 | 117967 | 48 | 198 | 65 | N/A | N/A | N/A | N/A
A241R | Yes | 118556 | 120733 | 78 | 2178 | 725 | ATP-dependent RNA helicase DOB1 | Lhr-like helicases | DEAD/DEAH box helicase / DSHCT (NUC185) domain | Similar to hypothetical protein [Monosiga brevicollis MX1] [2e-88]
a242L | No | 119090 | 119443 | 62 | 354 | 117 | N/A | N/A | N/A | N/A
A243R | Yes | 120839 | 121747 | 92 | 909 | 302 | N/A | N/A | N/A | N/A
a244L | No | 121832 | 122065 | 44 | 234 | 77 | N/A | N/A | N/A | N/A
A245R | Yes | 121875 | 122438 | 80 | 564 | 187 | Cu/Zn superoxide dismutase | Cu/Zn superoxide dismutase | Copper/zinc superoxide dismutase (SODC) | Similar to superoxide dismutase [cu-zn] [7e-42]
A246R | Yes | 122878 | 123336 | 56 | 459 | 152 | N/A | N/A | Barwin family | Similar to PR4 (PATHOGENESIS-RELATED 4) chitin binding [Arabidopsis thaliana] [8e-10]
A246aR | Yes | 122469 | 122810 | 82 | 342 | 113 | N/A | N/A | N/A | N/A
A247R | Yes | 123427 | 124578 | 84 | 1152 | 383 | N/A | FOG: Ankyrin repeat | Ankyrin repeat | Similar to hypothetical protein Aasi_1435 [Candidatus Amoebophilus asiaticus 5a2] [9e-26]
A248R | Yes | 124712 | 125638 | 70 | 927 | 308 | calcium/calmodulin-dependent protein kinase I | Serine/threonine protein kinase | Protein kinase domain | Similar to hypothetical protein [Paramecium tetraurelia strain d4-2] [2e-26]
a249L | No | 125191 | 125499 | 70 | 309 | 102 | N/A | N/A | N/A | N/A
a250aR | No | 125816 | 125971 | 54 | 156 | 51 | N/A | N/A | N/A | N/A
A250R | Yes | 125670 | 125954 | 82 | 285 | 94 | N/A | N/A | Ion channel | Similar to EsV-1-223 [Ectocarpus siliculosus virus 1] [6e-07]
a251aL | No | 126233 | 126658 | 56 | 426 | 141 | N/A | N/A | N/A | N/A
a251bL | No | 126958 | 127113 | 80 | 156 | 51 | N/A | N/A | N/A | N/A
A251R | Yes | 126048 | 127028 | 82 | 981 | 326 | N/A | Adenine-specific DNA methylase | D12 class N6 adenine-specific DNA methyltransferase | Similar to Site-specific DNA-methyltransferase (adenine-specific) [Dokdonia donghaensis MED134] [2e-50]
a252aL | No | 128124 | 128378 | 56 | 255 | 84 | N/A | N/A | N/A | N/A
a252bL | No | 127966 | 128118 | 70 | 153 | 50 | N/A | N/A | N/A | N/A
A252R | Yes | 127028 | 128056 | 74 | 1029 | 342 | N/A | N/A | N/A | N/A
a253aR | No | 128500 | 128634 | 56 | 135 | 44 | N/A | N/A | N/A | N/A
A253R | Yes | 128133 | 128585 | 62 | 453 | 150 | N/A | N/A | N/A | N/A
A254R | Yes | 128650 | 129126 | 72 | 477 | 158 | N/A | N/A | N/A | N/A
A255R | Yes | 129169 | 129615 | 86 | 447 | 148 | N/A | N/A | N/A | N/A
A256/257L | Yes | 129608 | 132121 | 68 | 2514 | 837 | N/A | N/A | N/A | N/A
a258R No 131760 13201 60 252 83 N/A N/A N/A N/A
24
1a259aR
No 132433 132675
68 243 80 N/A N/A N/A N/A
a259bR
No 132647 132847
26 201 66 N/A N/A N/A N/A
A259L Yes 132088 132612
78 525 174 N/A N/A N/A N/A
A260aR
Yes 133700 133897
60 198 65 N/A N/A N/A N/A
a260bR
No 133915 134184
54 270 89 N/A N/A N/A N/A
A260R Yes 132729 134246
78 1518
505 N/A Chitinase Glycosyl hydrolases family 18
Similar to hypothetical protein FG10729.1 [Gibberella zeae PH-1] [1e-48]
A261R Yes 134281 134898
88 618 205 N/A N/A N/A N/A
A262/263L
Yes 134966 135736
64 771 256 N/A N/A N/A N/A
a264R No 135649 135852
84 204 67 N/A N/A N/A N/A
A265L Yes 135838 136587
90 750 249 N/A N/A Poxvirus A22 protein
Similar to hypothetical protein OsV5_058f [Ostreococcus virus OsV5] [5e-10]
a266aR
No 136492 136617
58 126 41 N/A N/A N/A N/A
a266R No 136197 136448
68 252 83 N/A N/A N/A N/A
A267L Yes 136638 137582
80 945 314 N/A N/A N/A Similar to hypothetical protein MIMI_R423 [Acanthamoeba polyphaga mimivirus] [2e-15]
a268R No 137001 137216
50 216 71 N/A N/A N/A N/A
a269R No 137259 137465
62 207 68 N/A N/A N/A N/A
a270R No 137775 138056
80 282 93 N/A N/A N/A N/A
A271L Yes 137759 138583
74 825 274 N/A Lysophospholipase
Putative lysophospholipase
Similar to putative hydrolase [marine gamma proteobacterium HTCC2080] [2e-13]
a272aR
No 138380 138619
58 240 79 N/A N/A N/A N/A
a272R No 138026 138229
54 204 67 N/A N/A N/A N/A
A273L Yes 138638 139054
88 417 138 N/A N/A Domain of unknown function (DUF305)
Similar to hypothetical protein MIMI_L153 [Acanthamoeba polyphaga mimivirus] [4e-16]
A274R Yes 139119 139910
16 792 263 N/A N/A N/A N/A
A275R Yes 140138 140896
80 759 252 N/A N/A Protein of unknown function (DUF1390)
N/A
a276L No 140462 140746
60 285 94 N/A N/A N/A N/A
A277L Yes 140812 141723
70 912 303 calcium/calmodulin-dependent protein kinase (CaM kinase) II
Serine/threonine protein kinase
Protein kinase domain
Similar to hypothetical protein DDB_G0289119 [Dictyostelium discoideum AX4] [1e-21]
A278L Yes 141759 143591
80 1833
610 N/A N/A Protein kinase domain / PBCV-specific basic adaptor domain
Similar to protein kinase Fuz7 [Ustilago maydis 521] [8e-06]
a279R No 142146 142364
68 219 72 N/A N/A N/A N/A
a280R No 143242 143523
56 282 93 N/A N/A N/A N/A
a281R No 143601 144320
52 720 239 N/A N/A N/A N/A
A282L Yes 143630 145339
60 1710
569 N/A N/A Protein kinase domain / PBCV-specific basic adaptor domain
Similar to hypothetical protein MGL_1199 [Malassezia globosa CBS 7966] [1e-06]
a283L No 144085 144381
52 297 98 N/A N/A N/A N/A
A284L Yes 145395 146234
82 840 279 choloylglycine hydrolase
Penicillin V acylase and related amidases
Linear amide C-N hydrolases choloylglycine hydrolase family
Similar to penicillin amidase [Planctomyces limnophilus DSM 3776] [1e-41]
a285R No 146062 146277
64 216 71 N/A N/A N/A N/A
a286aR
No 147401 147574
76 174 57 N/A N/A N/A N/A
a286bL
No 147460 147633
52 174 57 N/A N/A N/A N/A
A286R Yes 146261 147397
30 1137
378 N/A N/A N/A N/A
A287R Yes 147481 148287
74 807 268 N/A N/A GIY-YIG catalytic domain / NUMOD1 domain
Similar to hypothetical protein RUMHYD_01972 [Blautia hydrogenotrophica DSM 10507] [1e-17]
a288L No 147552 147818
62 267 88 N/A N/A N/A N/A
A289L Yes 148278 149129
74 852 283 MAP/microtubule affinity-regulating kinase
Serine/threonine protein kinase
Protein kinase domain
Similar to hypothetical protein [Paramecium tetraurelia strain d4-2] [2e-23]
a290R No 148366 148773
48 408 135 N/A N/A N/A N/A
a291R No 148920 149189
68 270 89 N/A N/A N/A N/A
A292L Yes 149247 150233
80 987 328 chitosanase N/A Glycosyl hydrolase family 46
Similar to Chitosanase [Streptosporangium roseum DSM 43021] [1e-16]
a293R No 149329 149835
56 507 168 N/A N/A N/A N/A
a294R No 149926 150150
54 225 74 N/A N/A N/A N/A
A295L Yes 150236 151189
84 954 317 GDP-L-fucose synthase
dTDP-D-glucose 4,6-dehydratase
NAD dependent epimerase/dehydratase family
Similar to NAD-dependent epimerase/dehydratase [Spirosoma linguale DSM 74] [1e-110]
A296R Yes 151233 151706
14 474 157 N/A N/A N/A N/A
a297aL
No 152133 152261
40 129 42 N/A N/A N/A N/A
A297L Yes 151703 152236
88 534 177 N/A N/A N/A Similar to Pc22g24690 [Penicillium chrysogenum Wisconsin 54-1255] [4e-07]
A298L Yes 152334 153011
72 678 225 N/A N/A N/A Similar to hypothetical protein bglu_2g06720 [Burkholderia glumae BGR1] [2e-12]
a299R No 152521 153039
48 519 172 N/A N/A N/A N/A
a300R No 152796 153026
58 231 76 N/A N/A N/A N/A
A301L Yes 153032 153757
80 726 241 N/A N/A N/A N/A
a302R No 153359 153781
50 423 140 N/A N/A N/A N/A
a303L No 153636 153974
68 339 112 N/A N/A N/A N/A
A304R Yes 153814 154050
8 237 78 N/A N/A N/A N/A
A305L Yes 154040 154654
66 615 204 dual specificity phosphatase
N/A Dual specificity phosphatase catalytic domain
Similar to dual specificity protein phosphatase 7 putative [Aedes aegypti] [2e-12]
A306L Yes 154680 154940
82 261 86 N/A N/A N/A N/A
a307aR
No 154921 155163
68 243 80 N/A N/A N/A N/A
a307R No 154685 154921
30 237 78 N/A N/A N/A N/A
A308L Yes 154981 155340
72 360 119 N/A N/A N/A N/A
a309R No 155242 155370
64 129 42 N/A N/A N/A N/A
A310L Yes 155484 155996
88 513 170 N/A N/A N/A N/A
a311R No 155572 155823
54 252 83 N/A N/A N/A N/A
a312aR
No 156732 156890
42 159 52 N/A N/A N/A N/A
A312L Yes 156056 156772
66 717 238 N/A N/A N/A Similar to hypothetical protein OsV5_171r [Ostreococcus virus OsV5] [5e-21]
a313aR
No 157107 157268
64 162 53 N/A N/A N/A N/A
A313L Yes 157000 157215
78 216 71 N/A N/A N/A N/A
A314R Yes 157306 157548
68 243 80 N/A N/A N/A N/A
A315L Yes 157545 158285
80 741 246 N/A N/A GIY-YIG catalytic domain / NUMOD1 domain
Similar to hypothetical protein RUMHYD_01972 [Blautia hydrogenotrophica DSM 10507] [1e-18]
A316R Yes 158346 159662
18 1317
438 N/A N/A N/A Similar to hypothetical protein LNTAR_07704 [Lentisphaera araneosa HTCC2155] [3e-14]
a317L No 158400 158855
50 456 151 N/A N/A N/A N/A
A318R Yes 159613 159786
70 174 57 N/A N/A N/A N/A
a319L No 159131 159385
60 255 84 N/A N/A N/A N/A
A320R Yes 159641 160060
68 420 139 N/A N/A N/A N/A
a321aR
No 160404 160541
56 138 45 N/A N/A N/A N/A
A321R Yes 160112 160471
90 360 119 N/A N/A N/A N/A
A322L Yes 160547 161077
90 531 176 N/A N/A N/A N/A
a323R No 160674 161072
52 399 132 N/A N/A N/A N/A
A324L Yes 161118 162479
68 1362
453 N/A N/A N/A Similar to hypothetical protein OsV5_073r [Ostreococcus virus OsV5] [4e-27]
a325R No 161385 161609
44 225 74 N/A N/A N/A N/A
A326L Yes 162545 163174
66 630 209 N/A N/A N/A N/A
a327R No 163110 163436
60 327 108 N/A N/A N/A N/A
A328L Yes 163205 164272
88 1068
355 N/A N/A N/A N/A
a329aL
No 165515 165649
36 135 44 N/A N/A N/A N/A
A329bR
Yes 166040 166225
80 186 61 N/A N/A N/A N/A
a329cR
No 166204 166389
74 186 61 N/A N/A N/A N/A
A329R Yes 164362 164652
64 291 96 N/A N/A N/A N/A
A330R Yes 166437 167735
84 1299
432 N/A FOG: Ankyrin repeat
Ankyrin repeat Similar to ankyrin repeat protein [Trichomonas vaginalis G3] [7e-31]
a331L No 167096 167299
62 204 67 N/A N/A N/A N/A
A333L Yes 167765 168937
92 1173
390 N/A N/A Chitin binding domain / Chitin binding Peritrophin-A domain
Similar to conserved hypothetical protein [Culex quinquefasciatus] [2e-12]
a334L No 168037 168306
54 270 89 N/A N/A N/A N/A
a335R No 168067 168291
58 225 74 N/A N/A N/A N/A
a336R No 168608 168865
60 258 85 N/A N/A N/A N/A
A337L Yes 169024 169632
76 609 202 N/A N/A N/A N/A
A339L Yes 169647 169832
76 186 61 N/A N/A N/A N/A
a340R No 169567 169845
62 279 92 N/A N/A N/A N/A
a341aR
No 170276 170473
52 198 65 N/A N/A N/A N/A
a341bR
No 170404 170562
8 159 52 N/A N/A N/A N/A
A341L Yes 169952 170359
76 408 135 N/A N/A N/A N/A
A342L Yes 170477 172207
74 1731
576 N/A N/A N/A Similar to hypothetical protein OTV1_098 [Ostreococcus tauri virus 1] [6e-12]
a343R No 170839 171099
56 261 86 N/A N/A N/A N/A
a344R No 170921 171268
44 348 115 N/A N/A N/A N/A
a345L No 171136 171426
54 291 96 N/A N/A N/A N/A
a346L No 172288 172515
48 228 75 N/A N/A N/A N/A
a347L No 172376 172588
40 213 70 N/A N/A N/A N/A
a348aR
No 172752 172907
52 156 51 N/A N/A N/A N/A
A348R Yes 172385 172864
66 480 159 N/A N/A N/A N/A
A349L Yes 172865 173413
66 549 182 N/A N/A N/A N/A
a350aR
No 173627 173761
52 135 44 N/A N/A N/A N/A
A350R Yes 173289 173657
64 369 122 N/A N/A Protein of unknown function (DUF3605)
N/A
a351aL
No 174738 174869
36 132 43 N/A N/A N/A N/A
A351L Yes 173636 174712
80 1077
358 N/A N/A GIY-YIG catalytic domain
N/A
A352L Yes 174838 175461
90 624 207 N/A N/A N/A N/A
a353R No 175155 175361
48 207 68 N/A N/A N/A N/A
A354R Yes 175590 176627
88 1038
345 N/A N/A N/A Similar to endonuclease [Streptococcus pyogenes MGAS8232] [1e-07]
a355L No 176054 176272
64 219 72 N/A N/A N/A N/A
A356R Yes 176506 176829
68 324 107 N/A N/A N/A N/A
A357L Yes 176631 177617
82 987 328 N/A N/A N/A N/A
a358R No 177271 177585
54 315 104 N/A N/A N/A N/A
a359L No 177293 177460
72 168 55 N/A N/A N/A N/A
A363R Yes 177696 181235
48 3540
1179
N/A N/A N/A Similar to D6/D11-like helicase [Marseillevirus] [6e-06]
a364L No 179408 179611
56 204 67 N/A N/A N/A N/A
a365L No 180919 181209
76 291 96 N/A N/A N/A N/A
A366L Yes 181239 182006
72 768 255 N/A N/A Protein of unknown function (DUF1390)
N/A
a367R No 181338 181613
68 276 91 N/A N/A N/A N/A
A368L Yes 182269 183780
66 1512
503 N/A N/A N/A N/A
a370R No 183052 183300
72 249 82 N/A N/A N/A N/A
a371R No 183439 183627
62 189 62 N/A N/A N/A N/A
a372L No 183773 184003
64 231 76 N/A N/A N/A N/A
A373R Yes 183954 184412
76 459 152 N/A N/A N/A N/A
a374L No 184016 184279
64 264 87 N/A N/A N/A N/A
A375R Yes 184452 184973
88 522 173 N/A N/A N/A N/A
a376R No 184910 185101
50 192 63 N/A N/A N/A N/A
A378L Yes 184981 185766
88 786 261 N/A N/A N/A N/A
A379L Yes 185805 186428
76 624 207 N/A N/A N/A N/A
a380R No 186054 186359
42 306 101 N/A N/A N/A N/A
a381R No 186136 186393
66 258 85 N/A N/A N/A N/A
A383R Yes 186572 187954
94 1383
460 N/A N/A Large eukaryotic DNA virus major capsid protein
Similar to hypothetical protein OTV1_165 [Ostreococcus tauri virus 1] [4e-21]
a384aR
No 187752 187964
70 213 70 N/A N/A N/A N/A
A384aL
Yes 187904 188092
68 189 62 N/A N/A N/A N/A
A384bL
Yes 188031 188213
76 183 60 N/A N/A N/A N/A
a384bR
No 188097 188243
42 147 48 N/A N/A N/A N/A
A384cL
Yes 188236 190164
92 1929
642 N/A N/A Chitin binding Peritrophin-A domain / Large eukaryotic DNA virus major capsid protein
Similar to basal body protein [Naegleria gruberi] [1e-05]
a385L No 188307 188537
50 231 76 N/A N/A N/A N/A
a388R No 189372 189701
62 330 109 N/A N/A N/A N/A
a391R No 189884 190171
74 288 95 N/A N/A N/A N/A
A392R Yes 190233 191012
26 780 259 N/A N/A Poxvirus A32 protein
Similar to hypothetical protein OsV5_098r [Ostreococcus virus OsV5] [3e-51]
a393aL
No 191066 191200
2 135 44 N/A N/A N/A N/A
a393L No 190818 191015
74 198 65 N/A N/A N/A N/A
A394R Yes 191103 191468
62 366 121 N/A N/A N/A N/A
A395aL
Yes 191755 191877
72 123 40 N/A N/A N/A N/A
a395bL
No 191872 192009
64 138 45 N/A N/A N/A N/A
A395R Yes 191505 191753
76 249 82 N/A N/A N/A N/A
A396L Yes 191899 192357
82 459 152 N/A N/A N/A N/A
A397R Yes 192530 192988
86 459 152 N/A N/A N/A N/A
a398aR
No 193188 193367
54 180 59 N/A N/A N/A N/A
A398L Yes 192998 193354
60 357 118 N/A N/A N/A N/A
A399R Yes 193444 194028
64 585 194 N/A N/A RNase H N/A
A400R Yes 194063 194419
78 357 118 N/A N/A N/A N/A
A401R Yes 194454 195287
78 834 277 N/A N/A N/A Similar to hypothetical protein OsV5_178f [Ostreococcus virus OsV5] [8e-57]
A402R Yes 195325 196008
78 684 227 N/A N/A N/A N/A
A403R Yes 196114 196395
78 282 93 N/A N/A N/A N/A
A404aL
Yes 197020 197178
84 159 52 N/A N/A N/A N/A
A404R Yes 196440 197015
88 576 191 N/A N/A N/A N/A
A405R Yes 197233 198723
6 1491
496 N/A N/A N/A N/A
a406L No 198489 198689
70 201 66 N/A N/A N/A N/A
A407L Yes 198725 199357
80 633 210 N/A N/A N/A Similar to hypothetical protein OTV1_100 [Ostreococcus tauri virus 1] [5e-09]
A408L Yes 199390 200223
56 834 277 N/A N/A N/A Similar to EsV-1-42 [Ectocarpus siliculosus virus 1] [1e-12]
a409R No 199700 199948
44 249 82 N/A N/A N/A N/A
A410L Yes 200174 200506
72 333 110 N/A N/A N/A Similar to hypothetical protein FeldSpV_gp117 [Feldmannia species virus] [3e-07]
A411R Yes 200616 201128
78 513 170 N/A N/A N/A N/A
A412R Yes 201158 201697
74 540 179 N/A N/A N/A N/A
A413L Yes 201698 202432
80 735 244 N/A N/A N/A N/A
A414R Yes 202444 202725
48 282 93 N/A N/A N/A N/A
a415L No 202491 202700
66 210 69 N/A N/A N/A N/A
A416R Yes 202802 203368
82 567 188 N/A Deoxynucleoside kinases
Deoxynucleoside kinase
Similar to hypothetical protein MIV029R [Invertebrate iridescent virus 3] [5e-20]
A417L Yes 203344 204633
82 1290
429 N/A N/A N/A Similar to hypothetical protein [Paramecium tetraurelia strain d4-2] [4e-06]
a418R No 203704 203943
62 240 79 N/A N/A N/A N/A
a419R No 204434 204646
74 213 70 N/A N/A N/A N/A
A420L Yes 204664 204876
82 213 70 N/A N/A N/A N/A
A421R Yes 204921 205217
10 297 98 N/A N/A N/A N/A
A422aR
Yes 206324 206515
74 192 63 N/A N/A N/A N/A
A422R Yes 205267 206259
82 993 330 N/A N/A NUMOD4 motif / HNH endonuclease
Similar to HNH endonuclease [Acanthamoeba polyphaga mimivirus] [2e-11]
A423R Yes 206523 206996
72 474 157 N/A N/A N/A N/A
A424R Yes 207017 207346
86 330 109 N/A N/A N/A N/A
a425L No 207176 207364
58 189 62 N/A N/A N/A N/A
A426R Yes 207379 207723
52 345 114 N/A N/A N/A N/A
A427L Yes 207726 208085
76 360 119 N/A N/A Thioredoxin N/A
A428L Yes 208133 208570
80 438 145 N/A N/A N/A N/A
A429L Yes 208596 210026
70 1431
476 N/A N/A N/A Similar to PREDICTED: hypothetical protein [Vitis vinifera] [5e-10]
A430L Yes 210155 211468
76 1314
437 N/A N/A Large eukaryotic DNA virus major capsid protein
Similar to hypothetical protein OsV5_190f [Ostreococcus virus OsV5] [5e-96]
A431L Yes 211513 211713
64 201 66 N/A N/A N/A N/A
A432R Yes 211752 212225
14 474 157 N/A N/A N/A N/A
a433aR
No 212149 212283
46 135 44 N/A N/A N/A N/A
a433R No 211961 212302
56 342 113 N/A N/A N/A N/A
a434aR
No 212284 212472
62 189 62 N/A N/A N/A N/A
a434L No 212280 212450
58 171 56 N/A N/A N/A N/A
A435R Yes 212315 212506
30 192 63 N/A N/A N/A N/A
A436L Yes 212299 212490
60 192 63 N/A N/A PBCV-specific basic adaptor domain
N/A
A437aR
Yes 212700 212852
40 153 50 N/A N/A N/A N/A
A437L Yes 212519 212830
76 312 103 N/A N/A Non-histone chromosomal protein MC1
N/A
A438L Yes 212859 213095
86 237 78 N/A Glutaredoxin and related proteins
Glutaredoxin Similar to glutaredoxin 3 [Ochrobactrum intermedium LMG 3301] [3e-07]
A439aR
Yes 213465 213635
70 171 56 N/A N/A N/A N/A
A439R Yes 213120 213458
34 339 112 N/A N/A N/A N/A
A440L Yes 213625 213891
70 267 88 N/A N/A N/A N/A
A441L Yes 213910 214323
72 414 137 N/A N/A N/A N/A
a442R No 213958 214353
10 396 131 N/A N/A N/A N/A
A443R Yes 214463 215389
74 927 308 N/A N/A N/A N/A
A444L Yes 215534 215848
84 315 104 N/A N/A N/A N/A
A445L Yes 215886 217274
86 1389
462 aarF domain-containing kinase
Predicted unusual protein kinase
ABC1 family Similar to ABC-1 domain protein [Sulfolobus islandicus M.16.4] [1e-29]
a446R No 216349 216654
44 306 101 N/A N/A N/A N/A
A447aR
Yes 217321 217458
8 138 45 N/A N/A N/A N/A
a447R No 216679 216978
46 300 99 N/A N/A N/A N/A
a448aL
No 217787 217930
40 144 47 N/A N/A N/A N/A
A448L Yes 217342 217662
76 321 106 N/A N/A Thioredoxin Similar to transglutaminase [Brugia malayi] [2e-07]
A449R Yes 217799 218380
70 582 193 N/A N/A mRNA capping enzyme beta chain
N/A
A450R Yes 218680 219429
86 750 249 N/A N/A Protein of unknown function (DUF1390)
N/A
a451L No 218974 219279
54 306 101 N/A N/A N/A N/A
A452L Yes 219453 219692
82 240 79 N/A N/A N/A N/A
a453R No 219757 220026
10 270 89 N/A N/A N/A N/A
A454L Yes 219770 220639
82 870 289 N/A N/A N/A N/A
a455R No 220218 220661
42 444 147 N/A N/A N/A N/A
A456L Yes 220670 222634
74 1965 654 N/A Predicted ATPase
D5 N terminal like
Similar to hypothetical protein OsV5_188r [Ostreococcus virus OsV5] [1e-100]
a457R No 220932 221162
44 231 76 N/A N/A N/A N/A
a458L No 221107 221358
60 252 83 N/A N/A N/A N/A
a459R No 221286 221588
52 303 100 N/A N/A N/A N/A
a460R No 221801 222037
56 237 78 N/A N/A N/A N/A
A461R Yes 222723 222956
54 234 77 N/A N/A N/A N/A
A462R Yes 222749 222964
36 216 71 N/A N/A N/A N/A
a463L No 222822 223397
54 576 191 N/A N/A N/A N/A
A464aR
Yes 223726 223851
58 126 41 N/A N/A N/A N/A
A464R Yes 222966 223793
50 828 275 N/A dsRNA-specific ribonuclease
RNase3 domain / Double-stranded RNA binding motif
Similar to hypothetical protein OsV5_145f [Ostreococcus virus OsV5] [5e-50]
A465R Yes 223814 224170
74 357 118 N/A Mitochondrial sulfhydryl oxidase involved in the biogenesis of cytosolic Fe/S proteins
Erv1 / Alr family Similar to putative thiol oxidoreductase [Acanthamoeba polyphaga mimivirus] [4e-11]
a466L No 223948 224223
68 276 91 N/A N/A N/A N/A
A467L Yes 224220 225158
78 939 312 N/A N/A N/A Similar to hypothetical protein OTV1_005 [Ostreococcus tauri virus 1] [4e-12]
A468R Yes 225292 226623
72 1332
443 N/A N/A Eukaryotic and archaeal DNA primase small subunit / Herpesviridae UL52/UL70 DNA primase
Similar to hypothetical protein OsV5_086f [Ostreococcus virus OsV5] [9e-19]
a469L No 226311 226541
60 231 76 N/A N/A N/A N/A
a470aL
No 227195 227356
66 162 53 N/A N/A N/A N/A
A470R Yes 226696 227307
80 612 203 N/A N/A N/A Similar to hypothetical protein OsV5_091r [Ostreococcus virus OsV5] [1e-26]
A471R Yes 227367 227888
70 522 173 N/A N/A N/A Similar to hypothetical protein MIMI_L507 [Acanthamoeba polyphaga mimivirus] [6e-24]
a472L No 227642 227914
70 273 90 N/A N/A N/A N/A
A473L Yes 228013 229566
70 1554
517 N/A Glycosyltransferases probably involved in cell wall biogenesis
Glycosyl transferase family 2 / Cellulose synthase
Similar to unnamed protein product [Podospora anserina] [1e-127]
a474R No 228377 228694
46 318 105 N/A N/A N/A N/A
a475R No 229048 229320
50 273 90 N/A N/A N/A N/A
A476R Yes 229705 230679
80 975 324 ribonucleoside-diphosphate reductase subunit M2
Ribonucleotide reductase beta subunit
Ribonucleotide reductase small chain
Similar to ribonucleoside-diphosphate reductase small chain [Candidatus Protochlamydia amoebophila UWE25] [1e-112]
a477L No 229906 230127
52 222 73 N/A N/A N/A N/A
a478aL
No 231306 231812
56 507 168 N/A N/A N/A N/A
A478L Yes 230676 231608
74 933 310 N/A N/A N/A Similar to hypothetical protein MIMI_R423 [Acanthamoeba polyphaga mimivirus] [3e-32]
a479L No 231530 231769
52 240 79 N/A N/A N/A N/A
A480L Yes 231643 231924
82 282 93 N/A N/A N/A N/A
a481aL
No 232680 232802
64 123 40 N/A N/A N/A N/A
A481L Yes 231952 232626
78 675 224 N/A N/A N/A N/A
A482R Yes 232744 233391
86 648 215 N/A N/A MYM-type Zinc finger with FCS sequence motif
Similar to hypothetical protein OTV1_050 [Ostreococcus tauri virus 1] [4e-20]
a483L No 233160 233375
68 216 71 N/A N/A N/A N/A
A484L Yes 233399 233866
82 468 155 N/A N/A N/A N/A
A485R Yes 233951 234397
54 447 148 N/A N/A N/A N/A
A486L No 234401 234859
80 459 152 N/A N/A N/A N/A
a487R Yes 234667 234864
60 198 65 N/A N/A N/A N/A
A488R Yes 235101 236054
82 954 317 N/A N/A N/A Similar to hypothetical protein OTV1_103 [Ostreococcus tauri virus 1] [1e-22]
a489R No 235651 236019
56 369 122 N/A N/A N/A N/A
A490L Yes 236080 237012
82 933 310 N/A N/A N/A Similar to hypothetical protein MIMI_R423 [Acanthamoeba polyphaga mimivirus] [4e-28]
a491aL
No 237279 237428
52 150 49 N/A N/A N/A N/A
A491R Yes 237110 237340
78 231 76 N/A N/A N/A N/A
A492L Yes 237337 237906
70 570 189 N/A N/A N/A N/A
A493L Yes 237944 238519
92 576 191 N/A N/A N/A N/A
A494R Yes 238579 239661
14 1083
360 N/A N/A A2L zinc ribbon domain / Poxvirus Late Transcription Factor VLTF3 like
Similar to hypothetical protein OsV5_117f [Ostreococcus virus OsV5] [2e-45]
A495R Yes 239737 240402
76 666 221 N/A N/A GIY-YIG catalytic domain
Similar to putative SegB homing endonuclease [Staphylococcus phage PH15] [4e-08]
a496L No 240270 240482
60 213 70 N/A N/A N/A N/A
A497R Yes 240443 240883
88 441 146 N/A N/A N/A N/A
a498L No 240457 240831
60 375 124 N/A N/A N/A N/A
a499L No 240483 240716
66 234 77 N/A N/A N/A N/A
A500L Yes 240947 242005
86 1059
352 N/A N/A N/A N/A
A502L Yes 242040 242327
92 288 95 N/A N/A N/A N/A
A503L Yes 242339 243253
72 915 304 N/A N/A N/A N/A
a504aL
No 243222 243350
52 129 42 N/A N/A N/A N/A
a504R No 242786 243076
66 291 96 N/A N/A N/A N/A
A505L Yes 243250 244704
64 1455
484 N/A N/A N/A N/A
a506R No 243470 243682
50 213 70 N/A N/A N/A N/A
a507R No 243634 244194
48 561 186 N/A N/A N/A N/A
a508R No 244334 244567
54 234 77 N/A N/A N/A N/A
a509R No 244423 244728
52 306 101 N/A N/A N/A N/A
a510R No 244616 244819
50 204 67 N/A N/A N/A N/A
a511L No 244999 245220
50 222 73 N/A N/A N/A N/A
A512R Yes 245005 247419
56 2415
804 N/A N/A N/A N/A
a513R No 245138 245500
56 363 120 N/A N/A N/A N/A
a514L No 246105 246539
62 435 144 N/A N/A N/A N/A
a515L No 246172 246783
44 612 203 N/A N/A N/A N/A
a516R No 246416 246622
44 207 68 N/A N/A N/A N/A
A517L Yes 247408 248442
86 1035
344 DNA (cytosine-5-)-methyltransferase
Site-specific DNA methylase
C-5 cytosine-specific DNA methylase
Similar to site-specific DNA-methyltransferase [Acinetobacter baumannii ATCC 19606] [2e-22]
a518R No 248150 248452
62 303 100 N/A N/A N/A N/A
A519L Yes 248519 248767
74 249 82 N/A N/A N/A N/A
A520L Yes 248772 249074
74 303 100 N/A N/A N/A N/A
A521aL
Yes 249668 250276
96 609 202 N/A N/A N/A Similar to 136R [Invertebrate iridescent virus 6] [2e-06]
A521L Yes 249093 249722
70 630 209 N/A N/A N/A N/A
a522R No 249954 250226
62 273 90 N/A N/A N/A N/A
A523R Yes 250331 250846
2 516 171 N/A N/A N/A N/A
a524L No 250420 250743
48 324 107 N/A N/A N/A N/A
a525R No 250527 250727
48 201 66 N/A N/A N/A N/A
A526R Yes 250898 251338
90 441 146 N/A N/A N/A N/A
A527aL
Yes 251724 251900
78 177 58 N/A N/A N/A N/A
A527R Yes 251338 251637
58 300 99 N/A N/A N/A N/A
a528R No 251727 252011
74 285 94 N/A N/A N/A N/A
a529L No 251934 252152
38 219 72 N/A N/A N/A N/A
A530R Yes 251953 252993
2 1041
346 N/A Site-specific DNA methylase
C-5 cytosine-specific DNA methylase
Similar to hypothetical protein StreC_13626 [Streptomyces sp. C] [1e-20]
A531L Yes 252995 253198
64 204 67 N/A N/A N/A N/A
A532aL
Yes 253484 253636
84 153 50 N/A N/A N/A N/A
A532L Yes 253218 253457
78 240 79 N/A N/A N/A N/A
A533R Yes 253765 254889
82 1125
374 N/A N/A N/A N/A
A534R Yes 254810 255127
56 318 105 N/A N/A N/A N/A
A535L Yes 255129 255344
82 216 71 N/A N/A N/A N/A
a536aL
No 255590 255766
48 177 58 N/A N/A N/A N/A
A536L Yes 255394 255615
80 222 73 N/A N/A N/A N/A
A537L Yes 255643 256440
78 798 265 N/A N/A N/A N/A
a538L No 256463 256660
68 198 65 N/A N/A N/A N/A
a539aR
No 257034 257198
64 165 54 N/A N/A N/A N/A
A539R Yes 256565 257086
80 522 173 N/A N/A GIY-YIG catalytic domain
N/A
A540L Yes 257089 260859
82 3771
1256
N/A N/A N/A Similar to hypothetical protein LNTAR_05021 [Lentisphaera araneosa HTCC2155] [7e-24]
a541R No 258710 259000
52 291 96 N/A N/A N/A N/A
a542R No 259347 259565
48 219 72 N/A N/A N/A N/A
A543L No 260940 261107
58 168 55 N/A N/A N/A N/A
A544R Yes 260953 261849
52 897 298 N/A ATP-dependent DNA ligase
ATP dependent DNA ligase domain
Similar to ATP dependent DNA ligase [Chthoniobacter flavus Ellin428] [1e-52]
a545L No 261324 261545
72 222 73 N/A N/A N/A N/A
A546L Yes 261818 263008
78 1191
396 N/A N/A Glycosyl transferases group 1
Similar to hypothetical protein OsV5_020r [Ostreococcus virus OsV5] [6e-21]
a547R No 262104 262337
58 234 77 N/A N/A N/A N/A
A548L Yes 263043 264530
72 1488
495 SWI/SNF-related matrix-associated actin-dependent regulator of chromatin subfamily A member 5
Superfamily II DNA/RNA helicases SNF2 family
SNF2 family N-terminal domain / Helicase conserved C-terminal domain
Similar to hypothetical protein OTV1_129 [Ostreococcus tauri virus 1] [2e-63]
a549R No 263463 263708
56 246 81 N/A N/A N/A N/A
a550aR
No 264418 264576
54 159 52 N/A N/A N/A N/A
a550bL
No 264494 264682
68 189 62 N/A N/A N/A N/A
a550R No 263955 264365
66 411 136 N/A N/A N/A N/A
a551aR
No 264946 265074
44 129 42 N/A N/A N/A N/A
A551bL
Yes 265136 265303
62 168 55 N/A N/A N/A N/A
A551L Yes 264603 265028
70 426 141 dUTP pyrophosphatase
dUTPase dUTPase Similar to dUTPase (Dut) putative [Aspergillus fumigatus Af293] [3e-39]
A552R Yes 265187 266140
86 954 317 N/A N/A Transcription factor TFIID (or TATA-binding protein TBP)
Similar to hypothetical protein OTV1_140 [Ostreococcus tauri virus 1] [5e-09]
a553L No 265624 265839
50 216 71 N/A N/A N/A N/A
A554/556/557L
Yes 266143 267639
70 1497
498 N/A Predicted ATPase of the PP-loop superfamily implicated in cell cycle control
PP-loop family Similar to predicted protein [Ostreococcus lucimarinus CCE9901] [2e-32]
A555aR
Yes 267599 267736
58 138 45 N/A N/A N/A N/A
a555R No 266521 266862
54 342 113 N/A N/A N/A N/A
a558aL
No 268960 269100
36 141 46 N/A N/A N/A N/A
A558L Yes 267736 268938
82 1203
400 N/A N/A Large eukaryotic DNA virus major capsid protein
Similar to hypothetical protein OsV5_190f [Ostreococcus virus OsV5] [1e-46]
A559L Yes 269042 269683
78 642 213 N/A N/A N/A N/A
a560R No 269181 269495
68 315 104 N/A N/A N/A N/A
A561L Yes 269727 271676
90 1950
649 N/A N/A N/A N/A
a562R No 270374 270571
42 198 65 N/A N/A N/A N/A
a563R No 271510 271689
68 180 59 N/A N/A N/A N/A
A564L Yes 271714 272769
70 1056
351 N/A N/A N/A Similar to EsV-1-7 [Ectocarpus siliculosus virus 1] [6e-11]
A565R Yes 272865 274877
70 2013
670 N/A N/A N/A N/A
a566L No 274139 274411
60 273 90 N/A N/A N/A N/A
A567L Yes 274874 275332
76 459 152 N/A N/A N/A N/A
A568L Yes 275356 275895
72 540 179 N/A N/A N/A N/A
a569R No 275519 275773
66 255 84 N/A N/A N/A N/A
A570L Yes 275920 276306
86 387 128 N/A N/A N/A N/A
A571R Yes 276366 276716
14 351 116 N/A N/A PBCV-specific basic adaptor domain
N/A
A572R Yes 276730 277275
74 546 181 N/A N/A N/A N/A
a573L No 276965 277174
60 210 69 N/A N/A N/A N/A
A574aL
Yes 277951 278136
64 186 61 N/A N/A N/A N/A
A574L Yes 277272 278066
64 795 264 proliferating cell nuclear antigen
DNA polymerase sliding clamp subunit (PCNA homolog)
Proliferating cell nuclear antigen N-terminal domain / Proliferating cell nuclear antigen C-terminal domain
Similar to predicted protein [Ostreococcus lucimarinus CCE9901] [2e-32]
A575L Yes 278085 278591
78 507 168 N/A N/A N/A N/A
a576R No 278403 278624
60 222 73 N/A N/A N/A N/A
A577L Yes 278644 279045
84 402 133 N/A N/A N/A N/A
a578L No 279248 279379
62 132 43 N/A N/A N/A N/A
A579L Yes 279084 279800
82 717 238 N/A N/A N/A Similar to hypothetical protein MAR_ORF016 [Marseillevirus] [8e-30]
a580R No 279643 279864
64 222 73 N/A N/A N/A N/A
A581R Yes 279882 280679
52 798 265 DNA adenine methylase
Site-specific DNA methylase
D12 class N6 adenine-specific DNA methyltransferase
Similar to Dam-like adenine-specific DNA methylase [Marseillevirus] [4e-46]
a582aL
No 280684 280851
54 168 55 N/A N/A N/A N/A
a582bL
No 280782 280940
58 159 52 N/A N/A N/A N/A
a582L No 280182 280418
64 237 78 N/A N/A N/A N/A
A583L Yes 280793 283978
82 3186
1061
DNA topoisomerase II
Type IIA topoisomerase (DNA gyrase/topo II topoisomerase IV) B subunit
DNA gyrase B / DNA gyrase/topoisomerase IV subunit A
Similar to predicted protein [Physcomitrella patens subsp. patens] [0.0]
a584R No 281070 281366
56 297 98 N/A N/A N/A N/A
a585R No 281369 281608
62 240 79 N/A N/A N/A N/A
A586R Yes 282021 282248
62 228 75 N/A N/A N/A N/A
a587R No 282455 282799
60 345 114 N/A N/A N/A N/A
a588R No 283130 283480
42 351 116 N/A N/A N/A N/A
A589aL
Yes 283993 284145
80 153 50 N/A N/A N/A N/A
a589L No 283489 283719
56 231 76 N/A N/A N/A N/A
A590aL
Yes 285276 285401
44 126 41 N/A N/A N/A N/A
A590L Yes 284145 285191
84 1047
348 N/A N/A N/A N/A
a591L No 285284 285451
68 168 55 N/A N/A N/A N/A
A592R Yes 285435 285644
82 210 69 N/A N/A N/A N/A
A593R Yes 285694 286593
74 900 299 N/A N/A N/A N/A
a594R No 286070 286420
42 351 116 N/A N/A N/A N/A
a595L No 286404 286658
56 255 84 N/A N/A N/A N/A
A596R Yes 286625 287053
78 429 142 dCMP deaminase
Deoxycytidylate deaminase
Cytidine and deoxycytidylate deaminase zinc-binding region
Similar to deoxycytidylate deaminase [Phage phiJL001] [2e-23]
a597L No 286822 287118
54 297 98 N/A N/A N/A N/A
A598L Yes 287054 288145
84 1092
363 histidine decarboxylase
Glutamate decarboxylase and related PLP-dependent proteins
Pyridoxal-dependent decarboxylase conserved domain
Similar to histidine decarboxylase [Nostoc punctiforme PCC 73102] [2e-58]
a599R No 287117 287656
62 540 179 N/A N/A N/A N/A
a600aR
No 288026 288166
58 141 46 N/A N/A N/A N/A
a600R No 287904 288152
52 249 82 N/A N/A N/A N/A
A601R Yes 288225 288530
52 306 101 N/A N/A N/A N/A
A602L Yes 288531 289136
60 606 201 N/A N/A N/A N/A
A603aL
Yes 289390 289575
72 186 61 N/A N/A N/A N/A
a603bR
No 289445 289591
72 147 48 N/A N/A N/A N/A
a603R No 289083 289400
78 318 105 N/A N/A N/A N/A
A604L Yes 289592 289996
80 405 134 N/A N/A N/A N/A
A605L Yes 290112 290588
80 477 158 N/A N/A N/A N/A
a606L No 290452 290796
50 345 114 N/A N/A N/A N/A
A607R Yes 290633 291808
12 1176
391 N/A N/A N/A Similar to PREDICTED: similar to ankyrin 23/unc44 [Tribolium castaneum] [4e-09]
A609L Yes 291805 292974
82 1170
389 UDPglucose 6-dehydrogenase
UDP-N-acetyl-D-mannosaminuronate dehydrogenase
UDP-glucose/GDP-mannose dehydrogenase family NAD binding domain / UDP-glucose/GDP-mannose dehydrogenase family central domain / UDP-glucose/GDP-mannose dehydrogenase family UDP binding domain
Similar to UDP-glucose dehydrogenase [Vibrio furnissii CIP 102972] [1e-123]
a610R No 291859 292134
66 276 91 N/A N/A N/A N/A
a611R No 292639 292863
62 225 74 N/A N/A N/A N/A
A612L Yes 293059 293418
78 360 119 N/A Proteins containing SET domain
SET domain Similar to nuclear protein SET [Methanoculleus marisnigri JR1] [3e-13]
a613R No 293185 293448
54 264 87 N/A N/A N/A N/A
A614L Yes 293449 295182
86 1734
577 N/A N/A Protein kinase domain
N/A
a615R No 293917 294117
66 201 66 N/A N/A N/A N/A
a616R No 294857 295138
60 282 93 N/A N/A N/A N/A
A617R Yes 295254 296219
34 966 321 N/A N/A N/A N/A
A618L Yes 296241 296636
70 396 131 N/A N/A N/A N/A
A619L Yes 296653 297366
84 714 237 N/A N/A N/A N/A
a620aR
No 297535 297708
56 174 57 N/A N/A N/A N/A
A620L Yes 297436 297687
86 252 83 N/A N/A N/A N/A
a621aR
No 297918 298085
76 168 55 N/A N/A N/A N/A
a621bL
No 298062 298202
56 141 46 N/A N/A N/A N/A
A621L Yes 297719 298072
84 354 117 N/A N/A N/A N/A
A622L Yes 298138 299700
80 1563
520 N/A N/A Large eukaryotic DNA virus major capsid protein
Similar to hypothetical protein OsV5_190f [Ostreococcus virus OsV5] [2e-53]
A623aL
Yes 299923 300102
56 180 59 N/A N/A N/A N/A
A623L Yes 299757 299960
72 204 67 N/A N/A AN1-like Zinc finger
Similar to hypothetical protein SORBIDRAFT_01g005640 [Sorghum bicolor] [2e-07]
A624R Yes 299990 300355
32 366 121 N/A N/A Predicted membrane protein (DUF2177)
N/A
A625R Yes 300424 301722
72 1299
432 N/A Transposase and inactivated derivatives
Helix-turn-helix domain / Putative transposase DNA-binding domain
Similar to transposase IS605 OrfB family [Arthrospira maxima CS-328] [2e-20]
a626aR
No 301617 301775
60 159 52 N/A N/A N/A N/A
a626L No 300882 301106
48 225 74 N/A N/A N/A N/A
a627aR
No 303152 303289
54 138 45 N/A N/A N/A N/A
A627R Yes 301818 303155
68 1338
445 N/A N/A N/A N/A
A628L Yes 303158 303451
82 294 97 N/A N/A N/A N/A
A629R Yes 303605 305920
74 2316
771 ribonucleoside-diphosphate reductase subunit M1
Ribonucleotide reductase alpha subunit
ATP cone domain / Ribonucleotide reductase all-alpha domain / Ribonucleotide reductase barrel domain
Similar to Ribonucleoside-diphosphate reductase large chain putative [Pediculus humanus corporis] [0.0]
a630R No 303900 304139
50 240 79 N/A N/A N/A N/A
A631L Yes 304106 304375
42 270 89 N/A N/A N/A N/A
a632L No 304759 304977
64 219 72 N/A N/A N/A N/A
A633R Yes 305955 306317
76 363 120 N/A N/A N/A N/A
a634aL
No 306761 306889
46 129 42 N/A N/A N/A N/A
A634L Yes 306327 306731
72 405 134 N/A N/A N/A N/A
a635aR
No 307064 307258
64 195 64 N/A N/A N/A N/A
A635R Yes 306774 307031
14 258 85 N/A N/A N/A N/A
A636R Yes 307085 307375
84 291 96 N/A N/A N/A N/A
a637aL
No 307792 307965
62 174 57 N/A N/A N/A N/A
A637R Yes 307446 307871
84 426 141 N/A N/A N/A N/A
A638R Yes 307915 308994
80 1080
359 agmatine deiminase
Peptidylarginine deiminase and related enzymes
Porphyromonas-type peptidyl-arginine deiminase
Similar to conserved hypothetical protein [Clostridiales bacterium 1_7_47_FAA] [1e-111]
a639L No 307938 308336
48 399 132 N/A N/A N/A N/A
a640R No 308165 308404
52 240 79 N/A N/A N/A N/A
a641L No 308469 308726
42 258 85 N/A N/A N/A N/A
A643R Yes 309013 310410
68 1398
465 N/A N/A N/A N/A
a644aR
No 310852 311022
48 171 56 N/A N/A N/A N/A
A644R Yes 310449 310970
84 522 173 N/A N/A N/A N/A
A645R Yes 311064 311435
84 372 123 N/A N/A N/A N/A
A646aL
Yes 312100 312225
84 126 41 N/A N/A N/A N/A
A646L Yes 311460 312011
78 552 183 N/A N/A N/A N/A
A647R Yes 312293 312862
26 570 189 N/A N/A Protein of unknown function (DUF1390)
N/A
a648L No 312869 313078
54 210 69 N/A N/A N/A N/A
A649R Yes 312890 313669
68 780 259 N/A N/A Protein of unknown function (DUF1390)
N/A
a650aL
No 313552 313674
10 123 40 N/A N/A N/A N/A
a650bL
No 313671 313847
72 177 58 N/A N/A N/A N/A
a650cR
No 313683 313865
40 183 60 N/A N/A N/A N/A
a650L No 313404 313667
72 264 87 N/A N/A N/A N/A
A651L Yes 313687 314379
80 693 230 N/A N/A GIY-YIG catalytic domain / NUMOD1 domain
Similar to putative SegB homing endonuclease [Staphylococcus phage PH15] [5e-14]
Appendix 2. Example Python Scripts for Synthesizing Colouring Files
a. Categorizing into proteomic method of detection
b. Categorizing into major/minor genes
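The original script for this step is not reproduced in the text. As a minimal sketch only, the following assumes that each gene row begins with the gene name, a Yes/No flag marking major ORFs, and the start and end coordinates, as in the Appendix 1 table. The function names, the colours and the tab-separated output layout are hypothetical, since the real colouring-file format is not specified here.

```python
# Hypothetical sketch, not the authors' script.
# Assumption: a row looks like "A246R Yes 122878 123336", where the second
# field is "Yes" for a major ORF and "No" for a minor ORF.

def parse_row(line):
    """Split a table row into (name, is_major, start, end)."""
    name, flag, start, end = line.split()[:4]
    return name, flag.lower() == "yes", int(start), int(end)


def colouring_lines(rows, colours=None):
    """Return one 'gene<TAB>colour' line per row.

    The colour names and the output layout are placeholders; adapt them
    to whatever format the genome viewer in use expects.
    """
    colours = colours or {True: "red", False: "grey"}
    out = []
    for line in rows:
        name, is_major, _start, _end = parse_row(line)
        out.append(f"{name}\t{colours[is_major]}")
    return out


if __name__ == "__main__":
    example = ["A246R Yes 122878 123336", "a249L No 125191 125499"]
    for entry in colouring_lines(example):
        print(entry)
```

Reading the Yes/No flag rather than the capitalisation of the gene name keeps the categories tied to the table itself, even where a name and its flag appear to disagree.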
c. Converting from Genbank to FASTA (note: non-functional outside of our specific input)
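The original conversion script is likewise not reproduced in the text. The sketch below is a generic stand-in rather than that script: it uses only the Python standard library and assumes a GenBank flat file in which each record has a LOCUS line, an ORIGIN block holding the sequence, and a terminating "//" line. The function name genbank_to_fasta is hypothetical, and only the whole-record nucleotide sequence is extracted, not individual gene features.

```python
# Hypothetical sketch, not the authors' script.
# Assumption: standard GenBank flat-file layout (LOCUS ... ORIGIN ... //).
import textwrap


def genbank_to_fasta(genbank_text, width=60):
    """Convert the ORIGIN sequence of each GenBank record to FASTA text."""
    records = []
    name, seq_parts, in_origin = None, [], False
    for line in genbank_text.splitlines():
        if line.startswith("LOCUS"):
            fields = line.split()
            name = fields[1] if len(fields) > 1 else None
        elif line.startswith("ORIGIN"):
            in_origin = True
        elif line.startswith("//"):
            # End of record: join the sequence and wrap it to fixed width.
            seq = "".join(seq_parts).upper()
            body = "\n".join(textwrap.wrap(seq, width))
            records.append(f">{name or 'unknown'}\n{body}")
            name, seq_parts, in_origin = None, [], False
        elif in_origin:
            # Sequence lines carry a position number and spaced blocks;
            # keep only the letters.
            seq_parts.append("".join(ch for ch in line if ch.isalpha()))
    return "\n".join(records) + "\n"


if __name__ == "__main__":
    demo = "LOCUS       DEMO 12 bp\nORIGIN\n        1 acgtacgtac gt\n//\n"
    print(genbank_to_fasta(demo), end="")
```

Where Biopython is available, its SeqIO.convert function performs the same conversion and handles annotations and edge cases that this sketch ignores.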
References
1. V. Agarwal et al., “PDBalert: automatic, recurrent remote homology tracking and protein structure prediction,” BMC structural biology 8, no. 1 (2008): 51.
2. R. Halim et al., “Oil extraction from microalgae for biodiesel production,” Bioresource technology (2010).
3. M. W. Karakashian, “Symbiosis in Paramecium bursaria,” Symposia of the Society for Experimental Biology, no. 29 (1975): 145-173.
4. R. Radakovits et al., “Genetic Engineering of Algae for Enhanced Biofuel Production,” Eukaryotic Cell 9, no. 4 (February 2010): 486-501.
5. N. Santhanam et al., “Oilgae Comprehensive Report Preview,” Internal Report [Oilgae]. Unpublished.
6. J. L. Van Etten, L. C. Lane, and R. H. Meints, “Viruses and viruslike particles of eukaryotic algae,” Microbiology and Molecular Biology Reviews 55, no. 4 (1991): 586.
7. L. P. Villarreal and V. R. DeFilippis, “A hypothesis for DNA viruses as the origin of eukaryotic replication proteins,” Journal of Virology 74, no. 15 (2000): 7079.