IMPACT OF THE GROWTH ANOMALY DISEASE ON GENE EXPRESSION IN THE CORAL, MONTIPORA CAPITATA, AT WAIʻŌPAE, HAWAIʻI

A THESIS PRESENTED TO THE GRADUATE DIVISION OF THE UNIVERSITY OF HAWAIʻI AT HILO IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE

DEGREE OF

MASTER OF SCIENCE

IN

TROPICAL CONSERVATION BIOLOGY AND ENVIRONMENTAL SCIENCE

MAY 2016

By Monika Frazier

Thesis Committee:

Misaki Takabayashi, Chairperson

Mahdi Belcaid

Scott M. Geib

Susan Jarvi

Keywords: coral disease, growth anomaly, RNA-seq, metatranscriptome, Montipora capitata, Hawaiʻi

Acknowledgements

I would like to thank my advisor Misaki Takabayashi for giving me the opportunity to conduct this research and supporting me in every way possible throughout this project. Thank you to my committee members Mahdi Belcaid, Scott Geib and Susan Jarvi for all their support and advisement throughout the years. A big thank you goes out to the sampling teams who helped to collect coral samples in the field (John Burns, Ben Clark, Liz Clemens, Dan Jennings-Kam, Lauren Kapono, Keo Lopes, Kanoe Steward and Julia Stewart)! Special thanks go to Steve Tam for his help in the lab and to Brian Hall and Teddy DeRego for continuously helping me with troubleshooting bioinformatics analyses and computing issues. I would especially like to thank Scott Geib for dedicating his time and resources to this project, and for allowing me to conduct bioinformatics analyses on the USDA computing cluster. I would also like to thank Renee Bellinger and Martin Helmkampf for reigniting my motivation to complete this project and providing constructive feedback for bioinformatics analyses. I would also like to acknowledge the funding sources for this project, the National Science Foundation’s Experimental Program to Stimulate Competitive Research grant entitled IMUA III: Pacific High Island Evolutionary Biogeography: Impacts of Invasive Species, Anthropogenic Activity and Climate Change on Hawaiian Focal Species (EPS-0903833) and the National Science Foundation Center for Research Excellence in Science and Technology (Grant No. 0833211) for the Center in Tropical Ecology and Evolution in Marine and Terrestrial Environments. Finally, I would like to thank Kaleohone Roback for his unwavering support and endless patience through the ups and downs of graduate school.

Statement of Collaboration and Publication

This research is the result of collaboration with Misaki Takabayashi of the University of Hawaiʻi at Hilo Marine Science Department, Scott M. Geib of the United States Department of Agriculture, Agriculture Research Service Pacific Basin Agricultural Research Center, and Renee Bellinger and Martin Helmkampf of the University of Hawaiʻi at Hilo Tropical Conservation Biology and Environmental Science Graduate Program, who have contributed ideas for analyses as well as edits for this thesis, and will all be co-authors on the manuscript that is being submitted for publication in a peer-reviewed journal. However, I have conducted the lab work and bioinformatics analyses as well as the writing of this thesis; therefore, my thesis committee has approved the publication of this work as my thesis to fulfill the degree requirements for Master of Science in Tropical Conservation Biology and Environmental Science.

Abstract

Background

Scleractinian corals are a vital component of coral reef ecosystems, which have significant ecological, cultural and economic values worldwide. Anthropogenic and natural stressors are contributing to a global decline in coral health. Assessing coral health is complex due to the presence of a multitude of symbionts that can impact the physiology of the holobiont (coral host and associated symbionts). Growth anomaly (GA) is a coral disease that has significant negative impacts on several biological functions, yet our understanding of its etiology and pathology is lacking. In this study we use high-throughput mRNA sequencing along with de novo transcriptome assembly and ortholog assignment to identify coral genes that are expressed by healthy and GA-affected Montipora capitata coral colonies at a site with high GA prevalence in this species. We conducted pairwise comparisons of three distinct tissue types: healthy tissue from healthy corals, GA lesion tissue from diseased corals (“GA-affected tissue”) and apparently healthy tissue from diseased corals (“GA-unaffected tissue”).

Results

The quality-filtered de novo assembled metatranscriptome contained 76,063 genes, of which 13,643 were identified as putative coral genes. A total of 105 coral genes were differentially expressed among tissue types (healthy, GA-affected, GA-unaffected), and included genes involved in immune system pathways, regulation of apoptosis, bone development and growth, as well as disease pathology in other species. Pairwise comparison of gene expression among healthy, GA-affected and GA-unaffected tissues showed the greatest number of differentially expressed genes between healthy and GA-affected tissues (93 genes), followed by healthy and GA-unaffected tissues (33 genes) and GA-affected and GA-unaffected tissues (7 genes). Differentially expressed genes of interest include deleted in malignant brain tumors 1, tumor necrosis factor receptor-associated factors 3, 5 and 6, low-density lipoprotein receptor-related proteins 4, 5 and 6, and bone morphogenetic protein 1.

Conclusion

The gene expression data and metatranscriptome assembly developed through this study represent a significant addition to the molecular information available to further our understanding of M. capitata GA. This is the first study to use mRNA-seq to investigate coral GA, and the first RNA-seq metatranscriptome assembly for the M. capitata holobiont. Differentially expressed putative coral genes identified in this study can serve as candidates for future targeted gene expression research in M. capitata GA. Furthermore, in the absence of a sequenced genome for this species of coral, the assembled transcriptome has vast application in the field of coral health.

Table of Contents

Acknowledgements

Abstract

List of Tables

List of Figures

List of Abbreviations

Background
  Coral Reef Ecosystem
  Threats to Coral Reefs
  Coral Disease
  Coral Growth Anomaly

Results and Discussion
  De novo Transcriptome Assembly
  Coral Transcript Detection
  Holobiont Gene Expression
  Coral Gene Expression

Conclusions

Methods
  Sample Collection and Processing
  RNA Extraction and Sequencing
  Metatranscriptome Assembly Construction and Quality Analysis
  Coral Transcript Detection
  Symbiodinium Clade Determination
  Differential Gene Expression Analysis
  Gene Annotation

Appendix 1: Table of Annotated Differentially Expressed Genes

Appendix 2: Yale Center for Genomic Analysis Library Preparation Protocol

Appendix 3: RNA Extraction Protocol

Appendix 4: Command Line Scripts for Bioinformatic Analyses

References

List of Tables

Table 1. Descriptive Statistics for Assembled Metatranscriptome, Quality Filtering and Ortholog Assignment
Table 2. Differentially expressed genes among tissue types

List of Figures

Figure 1. GC content of protein coding transcripts by taxa
Figure 2. Heat map showing genes that are differentially expressed among tissue types

List of Abbreviations

BMP1    Bone morphogenetic protein 1
cp23S   Chloroplast 23S
DMBT1   Deleted in malignant brain tumors 1
DNA     Deoxyribonucleic acid
FC      Fold change
FDR     False discovery rate
FPKM    Fragments per kilobase of transcript per million mapped reads
GA      Growth anomaly
GP-340  Glycoprotein-340
ITS     Internal transcribed spacer
LRP     Low-density lipoprotein receptor-related protein
mRNA    Messenger ribonucleic acid
ORF     Open reading frame
PE      Paired-end
RNA     Ribonucleic acid
rRNA    Ribosomal ribonucleic acid
SAG     Salivary agglutinin
TNF     Tumor necrosis factor
TPM     Transcripts per million
TRAF    Tumor necrosis factor receptor-associated factor
YCGA    Yale Center for Genomic Analysis

Background

Coral Reef Ecosystem

Tropical coral reefs are among the most diverse of Earth’s ecosystems [1]. At the foundation of these systems are scleractinian corals, sessile calcifying cnidarians in close association with a plethora of symbionts. The vast physical structures of coral reefs serve as habitat for their diverse communities and reduce coastal erosion caused by waves. The coral holobiont (coral host along with its symbionts) provides a high rate of primary production, which supports the highly diverse and functionally complex coral reef ecosystem despite the oligotrophic (low nutrient) nature of tropical waters. Coral reefs around the world provide countless ecosystem services to humanity. Coral reef ecosystems support subsistence and commercial fisheries on a large scale, with an estimated six million reef fishers globally [2]. The vast cultural importance of coral and reef organisms among indigenous peoples speaks to the importance of coral reefs for livelihood. In Hawaiʻi, as in many places, coral reefs are valued in the billions of dollars, supporting fisheries, recreation, wave hazard mitigation and tourism as well as ecological processes [3].

Threats to Coral Reefs

Natural and anthropogenic factors have led to the rapid decline of coral reef ecosystems across the globe in recent decades. As sessile organisms, corals are sensitive to human impacts and environmental change. In extreme cases, disturbances to coral reefs can cause phase shifts in which coral reefs become dominated by algae, causing a drastic reduction in ecosystem diversity [4]. Pollution on coral reefs takes many forms, including untreated sewage [5], nutrient-rich terrestrial runoff [6] and the introduction of heavy metals to the marine environment [7]. Indirect anthropogenic impacts on coral reefs are wide-ranging, from increased sedimentation as a result of land clearing and urban development [8] to global climate change resulting from fossil fuel emissions [9]. Natural variations in the environment can also have catastrophic impacts on coral health and mortality, such as high intensity storms [10] and increased sea surface temperature as a result of El Niño [11]. Coral disease is an ever-growing threat to coral health and mortality [12]. Coral reefs in relatively low diversity regions such as Hawaiʻi are the most vulnerable [13]. With fewer species to fill the functional roles that are important to ecosystem health and resilience, each species becomes more critical to a properly functioning reef ecosystem.

Coral Disease

Diseases are currently a threat to corals on a global scale, and their prevalence and distribution are predicted to increase as an effect of global climate change [14]. Coral diseases cannot be managed without knowing the etiology and physiological effects of each disease on the coral holobiont. The coral holobiont, also referred to as the coral symbiome, describes the coral host and its symbionts: bacteria, fungi, archaea, viruses, protists and animals closely associated with the coral colony. Coral symbionts have the capacity to impact coral health and even change the environmental thresholds that their sessile hosts can withstand [15–18]. Coral diseases have been recorded in over 100 coral species spanning the entire globe [19]. Coral diseases are thought to arise from influences of abiotic and biotic factors, and have been suggested as potential biological indicators of disturbance and stress on coral reefs [19]. In some cases, coral disease severity has been linked to anthropogenic factors [20, 21]. Changes to global climate can alter disease dynamics, potentially increasing the prevalence and severity of future coral disease outbreaks [12]. Current and future management of coral reefs must incorporate the potential causes and effects of coral diseases [22]. In order for scientists to be able to accurately predict the future effects of coral diseases in light of climate change, we must first understand the pathology of these diseases [14].

Coral Growth Anomaly

Growth anomaly (GA; also referred to as skeletal growth anomaly in previous literature) is a widespread coral disease, one of only four coral diseases that have been identified at locations around the entire globe [23]. GA has been found to affect 40 species of scleractinian corals from 20 genera in the Indo-Pacific and Caribbean [23]. To date, studies of coral GA have focused on the morphology, histopathology, ecology, and the physiological effects of GA on the coral host. Though GA morphology among genera and species is not identical, GA is generally characterized by circumscribed lesions with abnormal skeletal and tissue structure, including reduced density of polyps and symbiotic algae [24–28]. Reduced density of coral polyps and symbiotic algae results in decreased fitness, as polyps capture food sources from the water column and symbiotic algae produce energy used for coral growth and reproduction [29]. Further reduction of photosynthetic capacity has been measured in Montipora capitata GA via quantum yield, suggesting that the micromorphology of GA provides high light stress, causing photoinhibition [30]. Decreased reproductive capacity in GA-affected corals, evidenced by decreased density and partial development of gonads in GA tissue, has been seen in coral of the genera Acropora [26], Montipora [28], and Porites [25]. Histopathological analyses reveal hyperplasia (tissue enlargement caused by increased cell production) of gastrovascular canals connecting polyps, possibly due to the increased need for energy transport from adjacent healthy tissue to GA tissue [25, 28]. This transport of energy from adjacent, apparently healthy, tissue to GA results in decreased growth in the adjacent healthy tissue as compared to healthy tissue from coral colonies without GA, as well as increased GA growth as connectivity to healthy tissue increases [24, 25, 31].

Though coral GA represents a significant decrease in fitness of GA-affected colonies, it generally does not result in colony mortality [25, 31]. Our current understanding of the pathology of GA is poor, with much of the evidence based on small sample sizes, short-term assessments or contradicting evidence among studies. Various environmental factors have been identified as potential predictors of GA prevalence, including coastal development [32], human population density [21, 33], coral host density [21] and high sea surface temperatures associated with coral bleaching [26, 34], although lack of a clear etiology limits our understanding of the effects of biotic and abiotic factors on GA prevalence and severity. Irikawa and colleagues suggest that GA is a sign of natural senescence in corals, evidenced by the concentration of GA in the central (oldest) region of coral colonies as well as in larger colonies, with large GA experiencing necrosis in the center [26]. Significantly higher prevalence of GA in large colonies and the central region of GA-affected colonies has been seen in other genera [25, 27], although necrosis is not present in all cases [28]. The upregulation, in P. compressa tissue affected by GA, of four proteins associated with hyperplasia in mammals suggests that Porites GA may be a hyperplastic condition, although upregulation of these proteins could also be caused by a combination of stressors, warranting further investigation into the molecular pathology of GA [25]. In an investigation into the molecular pathology of GA in Montipora capitata, the expression of human oncogene homologs indicative of neoplasia (uncontrolled growth of cells that is not under physiologic control) was analyzed, with results suggesting that GA is not a neoplastic condition, supporting the possibility of hyperplasia [35]. In this case, molecular differences could be detected between GA-affected and GA-unaffected tissue where previous histological analyses showed no significant difference [28, 35]. Investigations of molecular pathology can therefore be an important step in understanding the dynamics of GA. The purpose of this research is to elucidate the molecular processes in the coral Montipora capitata that are affected by growth anomaly using high throughput RNA sequencing.

Results and Discussion

De novo Transcriptome Assembly

Illumina high throughput sequencing (HiSeq2000) of poly-A selected mRNA extracted from tissue samples (N=18) of healthy and diseased coral colonies generated 1.3 billion 75 bp paired-end (PE) reads (103 gigabases). Sequencing resulted in an average of 25 million read pairs per sample (range: 17-42 million read pairs). Quality filtering and sequence trimming were conducted on raw reads, with 97.8% of read pairs retained. We constructed the de novo metatranscriptome assembly from 119 million normalized (to 50x coverage) read pairs, assembling a total of 660,340 transcripts representing 441,520 genes (Table 1). We then quality-filtered the de novo assembly, selecting protein coding genes that are well represented by raw read mapping and that encode a protein sequence less than 90% similar to all other proteins in the dataset (see full description of quality filtering parameters in Methods section), to produce the final metatranscriptome assembly (87,085 transcripts; Table 1).
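
The read-mapping criterion above is expressed in Table 1 as FPKM≥0.5 and TPM≥0.5. The following is a minimal sketch of how these two measures are conventionally computed from mapped fragment counts and transcript lengths. It is an illustration only, not the pipeline used in this study; the function names and the toy data are hypothetical.

```python
def fpkm(counts, lengths):
    """Fragments per kilobase of transcript per million mapped fragments."""
    total = sum(counts)
    return [c * 1e9 / (total * length) for c, length in zip(counts, lengths)]


def tpm(counts, lengths):
    """Transcripts per million: normalise by length first, then rescale to 1e6."""
    rates = [c / length for c, length in zip(counts, lengths)]
    scale = sum(rates)
    return [r * 1e6 / scale for r in rates]


def well_represented(counts, lengths, minimum=0.5):
    """True where both FPKM and TPM reach the minimum (0.5 in Table 1)."""
    return [f >= minimum and t >= minimum
            for f, t in zip(fpkm(counts, lengths), tpm(counts, lengths))]


# Hypothetical toy data: mapped fragment counts and transcript lengths in bases.
toy_counts = [0, 500, 100]
toy_lengths = [1000, 2500, 900]
toy_flags = well_represented(toy_counts, toy_lengths)
```

TPM values always sum to one million within a sample, which makes them easier to compare across samples than FPKM; requiring both thresholds is an assumption made here to mirror the two rows of Table 1.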

Table 1. Descriptive Statistics for De novo Assembled Metatranscriptome, Quality Filtering and Ortholog Assignment

                                                                   Length (bases)
                                Genes  Isoforms   %GC¹  Total bases   Mean    N50
De novo Assembly              441,520   660,340  45.53  601,736,076    911  1,556
Quality Filtering Parameters*
  FPKM²≥0.5                   146,298   237,332  46.47  307,002,357  1,293  1,916
  TPM³≥0.5                    146,180   237,482  46.48  307,522,843  1,294  1,916
  Complete ORF⁴                46,876    91,876  46.42  209,031,715  2,275  2,689
  Internal ORF                 23,610    27,492  53.70   24,431,475    888  1,177
  5’ Partial ORF               53,546    76,197  52.37  124,301,357  1,631  1,929
  3’ Partial ORF               14,368    20,431  49.34   31,264,758  1,530  1,912
  <90% Similarity⁵            114,925   137,299  50.78  214,880,995  1,565  1,956
Quality-Filtered Assembly      76,063    87,085  50.61  143,828,498  1,652  1,996
Coral Orthologs                13,643    20,461  41.56   39,739,502  1,942  2,409

*Statistics for quality filtering parameters are listed individually for each parameter; “Quality-filtered assembly” statistics represent transcripts that met all quality filtering parameters; ¹%GC is the percent of nucleotide bases in sequences that are either G or C; ²FPKM=fragments per kilobase of transcript per million mapped reads; ³TPM=transcripts per million; ⁴ORF=open reading frame; ⁵Protein sequences with <90% similarity (for proteins with >90% similarity to each other, the longest sequence was retained as the representative sequence for that cluster)
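
For readers unfamiliar with the length statistics in Table 1, the sketch below shows the standard definitions of mean length and N50 (the length at which sequences of that length or longer contain at least half of all assembled bases). It is a generic illustration with hypothetical function names and toy lengths, not the script used to produce the table. The reported means appear consistent with total bases divided by the number of isoforms, which the last line checks for the de novo assembly row.

```python
def mean_length(lengths):
    """Arithmetic mean of sequence lengths."""
    return sum(lengths) / len(lengths)


def n50(lengths):
    """Length L such that sequences of length >= L hold at least half of all bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    raise ValueError("lengths must be non-empty")


# Hypothetical toy transcript lengths in bases.
toy_lengths = [100, 200, 300, 400, 500]
toy_mean = mean_length(toy_lengths)
toy_n50 = n50(toy_lengths)

# Values taken from the de novo assembly row of Table 1.
table1_mean = round(601_736_076 / 660_340)
```

Because N50 weights long sequences more heavily than the mean does, it is larger than the mean in every row of the table.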

Coral Transcript Detection

In order to parse coral and symbiont transcripts from the metatranscriptome assembly, GC content was determined for protein-coding transcripts, showing two major peaks of GC content-associated transcript abundance (Figure 1). Protein-coding transcripts that had BLAST hits (e-value<1e-10) to phylum Cnidaria (1,564), class Dinophyceae (2,425), superkingdom Bacteria (669) and kingdom Fungi (355), representing coral and symbiont lineages, were plotted according to GC content. BLAST hits to superkingdom Viruses (10) were not included in this GC analysis due to their low number, but were subtracted from the coral subset in the final step of coral transcript determination. By overlaying taxonomic classifications for a subset of identifiable transcripts, we were able to determine that the two peaks in GC-associated transcript abundance predominantly correspond to the abundance of coral and symbiont transcripts, respectively. We used a cutoff value of 47% GC content as a primary delineation between coral and symbiont transcripts, extracting 75,012 transcripts from the subset of 205,156 protein coding transcripts (which includes all transcripts with an open reading frame). By comparing the list of transcripts having GC<47% and transcripts identified as inparalogs via reciprocal best hits to protein sequences derived from the Acropora digitifera genome and A. hyacinthus and A. tenuis aposymbiotic transcriptomes, we were able to extract a further refined coral subset of 21,063 transcripts. We then subtracted out any remaining transcripts that had BLAST hits to symbiont taxa, resulting in a final coral transcript subset of 20,461 transcripts from 13,643 genes.
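
The three-step selection described above (GC content below 47%, support from reciprocal best hits to Acropora proteins, and removal of transcripts with symbiont BLAST hits) can be sketched as simple set operations. This is a minimal illustration under stated assumptions: the function names, the handling of ambiguous bases and the toy sequences are hypothetical, and the reciprocal best hit and BLAST results are assumed to be available already as sets of transcript identifiers.

```python
def gc_percent(seq):
    """Percent of unambiguous bases (A, C, G, T) that are G or C."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        return 0.0
    return 100.0 * (seq.count("G") + seq.count("C")) / acgt


def putative_coral(transcripts, rbh_ids, symbiont_hit_ids, cutoff=47.0):
    """Select transcript ids passing all three steps.

    transcripts: dict mapping transcript id -> nucleotide sequence
    rbh_ids: ids with a reciprocal best hit to a coral reference protein
    symbiont_hit_ids: ids with a BLAST hit to a symbiont taxon
    """
    low_gc = {tid for tid, seq in transcripts.items() if gc_percent(seq) < cutoff}
    return (low_gc & set(rbh_ids)) - set(symbiont_hit_ids)


# Hypothetical toy data.
toy_transcripts = {
    "t1": "ATATATGC",  # 25% GC, coral-like
    "t2": "GCGCGCAT",  # 75% GC, symbiont-like
    "t3": "ATGCATAT",  # 25% GC, but flagged as a symbiont hit below
    "t4": "AATTAATT",  # 0% GC, no reciprocal best hit
}
toy_coral = putative_coral(toy_transcripts, {"t1", "t2", "t3"}, {"t3"})
```

In the toy data only `t1` survives: `t2` fails the GC cutoff, `t3` is removed as a symbiont hit, and `t4` lacks a reciprocal best hit.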

Figure 1. GC content of protein coding transcripts by taxa. %GC content of protein coding transcripts and subsets thereof with BLAST hits (e-value<1e-10) to relevant coral holobiont taxa. *Y-axis values for protein coding transcripts (dashed line; 205,156 transcripts) are represented on the right axis. [Axes: x, %GC; left y, number of transcripts; right y, number of coding transcripts. Series: Cnidaria, Dinophyceae, Bacteria, Fungi, coding genes*.]

Holobiont Gene Expression

Pairwise comparisons of gene expression among healthy, GA-affected, and GA-unaffected tissue samples using the quality-filtered assembly showed only three significantly differentially expressed genes (DEGs), all between healthy and GA-affected tissue types (FDR≤0.0009). Of the two genes that were upregulated in healthy as compared to GA-affected tissue, one is an uncharacterized protein, and the other is transposase IS4, whose functional role is to transpose genomic DNA. The gene that was upregulated in GA-affected as compared to healthy tissue is uncharacterized, but has a PB1 domain, which is a highly conserved domain involved in cell signaling, protein-protein interactions and protein turnover. The unusually low number of DEGs among such distinct tissue samples at the holobiont level may be an indication of the complexity of the holobiont metatranscriptome and its response to GA.

To further investigate the gene expression patterns at the holobiont level, we assessed the impact of Symbiodinium spp., the dinoflagellate symbiont, on holobiont gene expression. Extensive previous research has shown differential effects of several Symbiodinium spp. clades on coral physiology [16–18]. Therefore, we identified the clades that are present in our 12 coral colonies. We selected three Symbiodinium genes that can distinguish clades in the metatranscriptome assembly: the internal transcribed spacer region of nuclear rRNA (ITS), the chloroplast 23S rRNA (cp23S), and the photosystem II protein D1 (psbA). Based on phylogenetic mapping with clade-specific isoforms of these genes, we identified six coral colonies exclusively housing Symbiodinium clade C, five colonies with only clade D and one colony with both clades C and D. As ITS, cp23S, and psbA represent three types of genetic markers (nuclear ribosomal DNA, chloroplast ribosomal DNA, and nuclear protein-coding DNA), their agreement provides strong support for these results. In our analysis of these genes, we found that the divergence of sequences between these two clades of Symbiodinium was so great that the assembly software identified orthologous genes in each clade as separate unigenes. Therefore, there are pairs of orthologs serving the same function in both clades, whose expression is assessed independently. These genes would inevitably (and erroneously) be identified as differentially expressed. In fact, we found that 78% of holobiont genes in the quality-filtered assembly were differentially expressed between colonies housing different clades. Furthermore, for 1,408 genes that are expressed in samples housing clade C, all samples housing clade D show zero reads mapped. Similarly, 697 genes are expressed in samples housing clade D, with all clade C samples having zero reads mapped. With 17-42 million PE reads per sample, it is possible that a small number of reads will erroneously map to a gene from one sample, making it difficult to determine such a pattern on a larger scale. For example, 48,862 genes (82% of DEGs) show the same pattern with normalized (trimmed mean of M values) expression values less than 1 for all samples within a clade. It is possible that some of these low expression values are real, but also likely that many are due to the presence of Symbiodinium orthologs with divergent sequence.

The strong impact of Symbiodinium clade on holobiont gene expression due to the presence of multiple clades within each tissue type explains the dearth of detectable DEGs among tissue types at the holobiont level. Further refinement of holobiont gene expression analysis is being conducted in order to account for the divergence of Symbiodinium orthologs between clades C and D.
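
The clade-exclusive pattern described above (reads mapped in samples housing one clade and zero reads in every sample housing the other) can be screened for with a few lines of code. The sketch below is illustrative only: the function name, the data layout and the toy counts are hypothetical, and it assumes that samples from the mixed-clade colony are set aside rather than assigned to either clade.

```python
def clade_exclusive_genes(counts, sample_clade):
    """Find genes with reads in one clade's samples and none in the other's.

    counts: dict mapping gene id -> dict of sample id -> mapped read count
    sample_clade: dict mapping sample id -> "C", "D" or another label
    """
    result = {"C": [], "D": []}
    for gene, by_sample in counts.items():
        totals = {"C": 0, "D": 0}
        for sample, n_reads in by_sample.items():
            clade = sample_clade.get(sample)
            if clade in totals:  # skip mixed-clade or unlabelled samples
                totals[clade] += n_reads
        if totals["C"] > 0 and totals["D"] == 0:
            result["C"].append(gene)
        elif totals["D"] > 0 and totals["C"] == 0:
            result["D"].append(gene)
    return result


# Hypothetical toy data: five samples, one from a mixed-clade colony.
toy_clades = {"s1": "C", "s2": "C", "s3": "D", "s4": "D", "s5": "C+D"}
toy_counts = {
    "g1": {"s1": 12, "s2": 7, "s3": 0, "s4": 0, "s5": 3},
    "g2": {"s1": 0, "s2": 0, "s3": 5, "s4": 9, "s5": 1},
    "g3": {"s1": 4, "s2": 0, "s3": 2, "s4": 0, "s5": 0},
}
toy_exclusive = clade_exclusive_genes(toy_counts, toy_clades)
```

A strict zero-read criterion is sensitive to a handful of mismapped reads, which is why the text also considers a relaxed threshold on normalized expression values.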

Coral Gene Expression

Gene expression comparison of putative coral transcripts among tissue types revealed 105 differentially expressed genes (FDR<0.01, FC≥2; Figure 2; see Appendix 1 for list of genes and annotations). Presence of DEGs at the coral level indicates a significant impact of GA on coral physiology, including both GA-affected and GA-unaffected tissue. A total of 93 genes were differentially expressed between healthy and GA-affected tissue types (Table 2), more than in any other pairwise comparison, indicating the significant impact that GA lesions have on coral physiology. The impact of GA on the entire coral colony is implied by the fact that 33 genes are differentially expressed between GA-unaffected (apparently healthy tissue from GA-affected coral colonies) and healthy tissue types. Interestingly, seven genes were differentially expressed between GA-affected and GA-unaffected tissue, showing that although GA has an impact on the entire colony, GA-affected and GA-unaffected tissue are still differentially impacted. We selected 11 of the above differentially expressed genes that highlight the effects of GA on coral physiology to explore in detail.

Table 2. Differentially expressed genes among tissue types

                 Healthy   GA-unaffected   GA-affected
Healthy                -              18            42
GA-unaffected         15               -             5
GA-affected           51               2             -

Number of differentially expressed genes for each pairwise comparison. Column headers represent tissue that genes were upregulated in (e.g. 18 genes are upregulated in GA-unaffected compared to healthy).
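
As an illustration of how the directional counts in Table 2 relate to the thresholds stated above (FDR<0.01, FC≥2), the sketch below counts genes upregulated in each tissue of a pairwise comparison and then checks that the two directions in Table 2 add up to the per-comparison totals given in the text. The function name, the input layout (log2 fold change and FDR per gene) and the toy values are hypothetical and do not reproduce the analysis software used in this study.

```python
import math


def count_directional_degs(results, fdr_max=0.01, min_fold=2.0):
    """Count significant genes by direction for a comparison of tissue A vs B.

    results: iterable of (log2_fold_change, fdr); positive log2 fold change
    means higher expression in tissue A.
    Returns (genes upregulated in A, genes upregulated in B).
    """
    threshold = math.log2(min_fold)
    up_a = up_b = 0
    for log2fc, fdr in results:
        if fdr >= fdr_max:
            continue
        if log2fc >= threshold:
            up_a += 1
        elif log2fc <= -threshold:
            up_b += 1
    return up_a, up_b


# Hypothetical (log2 fold change, FDR) pairs, not values from this study.
toy_results = [(1.5, 0.001), (-2.3, 0.0005), (0.4, 0.0001), (3.0, 0.2), (-1.0, 0.009)]
toy_up_a, toy_up_b = count_directional_degs(toy_results)

# Table 2 values: (reference tissue, tissue with higher expression) -> gene count.
table2 = {
    ("healthy", "GA-unaffected"): 18,
    ("healthy", "GA-affected"): 42,
    ("GA-unaffected", "healthy"): 15,
    ("GA-unaffected", "GA-affected"): 5,
    ("GA-affected", "healthy"): 51,
    ("GA-affected", "GA-unaffected"): 2,
}


def pair_total(a, b):
    """Total DEGs for one pairwise comparison, summing both directions."""
    return table2[(a, b)] + table2[(b, a)]
```

The pair totals computed this way are 93, 33 and 7, matching the counts reported in the text for the three pairwise comparisons.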

Figure 2. Heat map showing genes that are differentially expressed among tissue types. Tissue samples are represented in columns, with labels H=healthy, A=GA-affected and U=GA-unaffected tissue types, and numbers representing the coral colony from which samples were collected. Each differentially expressed gene is represented in a row. Colors correspond to fold change values.

Genes involved in disease

Deleted in malignant brain tumors 1 (DMBT1) is significantly downregulated in GA-affected tissue as compared to healthy tissue by an average 4.8 FC (FDR=0.0001). Although GA-unaffected tissue shows reduced gene expression by an average of 2 FC in comparison to that of healthy tissue, the difference in expression is not significant. In humans and other mammalian systems, DMBT1 is a gene in the highly conserved scavenger receptor cysteine-rich superfamily that is associated with innate immune response and epithelial cell differentiation [36]. Expression of DMBT1 results in the secretion of the DMBT1 protein, also known as salivary agglutinin (SAG) and glycoprotein-340 (GP-340), which readily binds to bacteria and viruses, and interacts with immune proteins including IgA, lactoferrin and lysozyme [37]. DMBT1 has been observed to have low to no expression in various cancers including those of the brain, breast, digestive tract, lung and anus [38–42]. DMBT1 is upregulated due to inflammation, as observed in diseases affecting mucosa such as inflammatory bowel diseases [43]. The observed pattern in M. capitata GA of reduced expression in diseased tissue is consistent with the expression pattern of DMBT1 in human cancer tissue.

Three tumor necrosis factor receptor-associated factors (TRAFs) were identified as differentially expressed among the three coral tissue types. Two TRAFs with putative homology to TRAFs 3, 5 and 6 were significantly upregulated, by an average of 3-5 FC, in GA-affected tissue compared to healthy tissue (FDR=0.007). A third TRAF gene with putative homology to TRAF5 was significantly upregulated, by upwards of 300 FC, in both GA-affected and GA-unaffected tissue compared to healthy tissue (FDR≤0.0008). In humans and other vertebrate systems, tumor necrosis factor (TNF) is a conserved cytokine involved in apoptosis and immune functions including inflammation, tumor lysis and antiviral responses. Because it can induce its own production, and that of other inflammatory cytokines and chemokines, TNF has a great capacity to induce inflammation. TNF has also been identified as a risk factor in all stages of tumor formation and progression [44]. TRAFs are integral proteins in TNF pathways. TRAF3 has been shown to play an important role in anti-inflammatory and antiviral response pathways [45, 46]. TRAF6 plays an integral role in immune response, as it is a key component of the cascade that leads to the activation of T cells [47]. TRAF5 is involved in cell signaling via the lymphotoxin-β receptor that regulates apoptosis to prevent cell death [48]. The magnitude of TRAF5 upregulation in both GA-affected and GA-unaffected tissue, together with the role of TRAF5 in apoptosis regulation, suggests an important role for TRAF5 in M. capitata GA, potentially allowing the proliferation of tumor-like cells that lead to GA.

Genes involved in bone growth and development

Six putative coral low-density lipoprotein receptor-related protein (LRP) genes were identified as differentially expressed among the three tissue types. All four LRP4 DEGs are downregulated in GA-affected compared to healthy tissue, and one of them is downregulated in GA-affected tissue compared to both GA-unaffected and healthy tissue (FC=5-10, FDR≤0.0002). The two genes identified as LRP5 and LRP6 were both downregulated in GA-affected tissue compared to healthy tissue (FC=8-9, FDR≤0.0002). In humans and other vertebrates, LRPs are a family of genes involved in the regulation of bone growth that affect bone density, mass and development. Deficiency in LRP4 has been shown to result in reduced bone mineral content and density in mice [49]. Common LRP5 alleles in humans have been shown to have differential effects on bone mineral density [50]. Evidence suggests an interaction between LRP5 and LRP6, where deficiency in both genes, as compared to just one gene, causes exacerbated reduction in bone mineral density and deformation [51]. Downregulation of the LRP genes in GA-affected coral tissue supports previous research findings of reduced skeletal density in M. capitata GA, along with normal skeletal density in GA-unaffected tissue of diseased colonies [28], and is consistent with previous studies of bone density in other species. Further research is necessary to understand the mechanism of reduced LRP expression in GA-affected tissues. Additionally, single nucleotide polymorphism analyses of LRP isoforms in M. capitata may be an important future consideration given the differential impact of alleles, particularly those of LRP5.

Bone morphogenetic protein 1 (BMP1) was found to be downregulated in GA-affected compared to healthy tissue by 3.2 FC (FDR=0.0002). In humans and other mammalian species, BMP1 is an extracellular metalloproteinase that is critical to the formation of the extracellular matrix, specifically the processing of procollagens I, II and III for subsequent assembly into mature collagens [52]. Previous research in mice has shown that knockout of BMP1 results in gut herniation and death shortly after birth, indicating that BMP1 is also an important gene in development [53]. A missense mutation in BMP1 has been shown to cause a rare form of osteogenesis imperfecta [54], a genetic disorder that results in fragile, deformed bones and growth deficiency, and whose etiology is linked to type I collagen. This mutation affects the astacin protease module of BMP1 [55], resulting in abnormal processing of procollagen I [54]. In a similar case, osteogenesis imperfecta is caused by a missense mutation in the signal peptide of BMP1, which affects the protein's secretion, localization and posttranslational glycosylation [56]. Patients with this mutation showed increased osteoclast activity, decreased BMP1 secretion and nonglycosylated BMP1 [57], which in itself can cause reduced protein secretion [56]. The same study tested the impact of this mutation on the in vitro processing by BMP1 of procollagen I and chordin, a protein that affects dorsal-ventral patterning in zebrafish, showing that the mutation reduced BMP1 processing of both targets [57]. Previous research in corals has shown that a coral BMP1 ortholog likely plays a role in coral skeletogenesis, evidenced by low expression in the developmental stages of larvae, followed by increased expression in calcifying coral polyps [58]. The reduced density and irregular growth of the coral skeleton in M. capitata GA lesions observed in previous studies [27, 28] may be correlated with our finding of downregulation of BMP1 in GA-affected tissue compared to healthy tissue. The fact that GA-unaffected coral tissue has normal skeletal density and growth, and does not exhibit downregulation of BMP1, also supports the hypothesis that BMP1 expression may be impacting the skeletal structure and growth of GA lesions.

Assessing the transcriptomic approach

De novo transcriptome assembly is often preferred to reference-based transcriptome assembly and targeted transcript approaches, especially in non-model organisms, due to its ability to identify novel transcripts and provide a global analysis of transcription. To illustrate this point, we compare the current study to a previous study of M. capitata GA gene expression that used parallel tissue sample definitions and a targeted approach, assessing the expression of five coral homologs of human oncogenes in an effort to understand the molecular pathology of corals [35]. A major benefit of a global transcriptome analysis is that differential expression can potentially be assessed for all genes within a pathway. For example, the previous study targeted TNF and found no differential expression among tissue types [35]. Using a transcriptomic approach, we were able to identify receptor-associated factors that indicate differential expression within the TNF pathway. Similarly, in the previous study galaxin, a coral organic matrix protein putatively involved in calcification and expected to be overexpressed in GA-affected tissue, was instead overexpressed in GA-unaffected tissue compared to healthy and GA-affected tissue types [35]. In comparison, we were able to identify four different genes that are involved in the regulation of bone growth in vertebrates, and which show similar expression patterns and resulting phenotypes to those seen in other species. These examples illustrate the utility of an unbiased whole transcriptome analysis for understanding gene expression in an organism with few genetic references available.

Conclusions

De novo transcript assembly of diseased and healthy coral holobionts has provided a detailed molecular snapshot of the coral and its symbionts. Through a comparative analysis of translated protein sequences to previously sequenced genomic references, we were able to identify a subset of transcripts that putatively originate from the coral host, Montipora capitata. Differential gene expression analysis among healthy, GA-affected and GA-unaffected coral tissue revealed gene expression patterns that are consistent with previous studies detailing the impact of GA on coral physiology, as well as with the roles these genes play in other species and diseases. Such results warrant further targeted study of genes including DMBT1, TRAFs, LRPs, BMP1 and their associated molecular pathways to elucidate their role in the onset and progression of M. capitata GA. This study represents the first RNA-seq profile of coral growth anomaly, and provides a direction for elucidating the molecular etiology of GA. However, coral is just one component of this symbiotic system, and there is evidence that Symbiodinium has a large impact on coral holobiont gene expression, a topic that will be addressed in future research.

Methods

Sample Collection and Processing

Collection of M. capitata fragments was carried out in January and February 2013 at Waiʻōpae, East Hawaiʻi Island (19°29’55′′ N, 154°49’06′′ W). Waiʻōpae has previously been identified as a site with a high prevalence of M. capitata GA [27]. Healthy (N=6) and GA-affected (N=6) coral colonies were identified at Waiʻōpae. Coral fragments of approximately 1 cm³ were collected from colonies using a hammer and chisel, with permission from the Hawaiʻi State Division of Aquatic Resources (Special Activity Permit 2013-33). One fragment was collected from each healthy colony (colony not affected by GA; N=6). Two fragments were collected from each GA-affected colony: GA lesion tissue (“GA-affected”; N=6) and apparently healthy tissue adjacent to the GA lesion (“GA-unaffected”; N=6). Coral fragments were placed in liquid nitrogen immediately after collection for transport to the lab. Upon arrival at the lab, tissue was scraped from the fragments using a sterile razor and crushed to a powder with a mortar and pestle, using liquid nitrogen to prevent the samples from thawing during processing. The resulting powder was placed in 1.5ml RNase-free tubes in ~0.1g aliquots and frozen at -80˚C.

RNA Extraction and Sequencing

Total RNA was extracted from ~0.1g tissue using a combination of TRIzol reagent (Life Technologies) and the RNeasy Mini Kit (Qiagen). TRIzol was added to sample tubes (1ml per 0.1g tissue) and tubes were incubated for 5 minutes at room temperature. Samples were then centrifuged (Heraeus Biofuge Fresco) at 12,000g at 4˚C for 10 minutes, and the supernatant was transferred to an RNase-free 1.5ml tube. Chloroform (Sigma-Aldrich) was added (0.2ml per 1ml TRIzol), and the samples were shaken vigorously for 20-30 seconds and then incubated at room temperature for 3 minutes. Samples were then centrifuged at 18,000g at 4˚C for 18 minutes. The top layer of aqueous solution was transferred to an RNase-free tube and an equal volume of 200 proof molecular grade ethanol (Sigma-Aldrich) was added. RNA was purified using the RNeasy Mini Kit with a 25-minute DNA digestion step (Qiagen RNase-free DNase Set) and eluted in 60µl DEPC-treated water (Omega). Total RNA was stored at -80˚C in 10µl aliquots in RNase-free tubes. RNA quality and quantity were determined using the Qubit RNA Broad Range Assay Kit (Life Technologies) and Agilent RNA 6000 Nano Kit (Agilent). Once total RNA of sufficient quantity and quality was attained, 3µg total RNA was sent to the Yale Center for Genomic Analysis (YCGA) for mRNA sequencing on an Illumina HiSeq2000 sequencer. Strand-specific libraries were constructed using a modified protocol developed by YCGA (Appendix 2). Paired-end sequencing was conducted for 75 cycles for each read pair, with nine samples multiplexed per lane using the TruSeq Paired-End Cluster Kit v3-cBot-HS (Illumina).
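As a rough check on sequencing depth, the per-sample yield can be estimated from the lane yield quoted in Appendix 2 (170-200 million passing-filter clusters per lane) and the nine samples multiplexed per lane. The sketch below assumes that clusters are divided evenly among samples, which real lanes only approximate; the function name `clusters_per_sample` is illustrative and is not part of the pipeline used in this study.

```shell
#!/usr/bin/env bash
# Estimate clusters (read pairs) per sample for an evenly multiplexed lane.
# Assumption: every sample receives an equal share of the lane.
clusters_per_sample() {
    local lane_clusters=$1 samples_per_lane=$2
    echo $(( lane_clusters / samples_per_lane ))   # integer division
}

# Lane yields quoted in Appendix 2, with nine samples per lane.
clusters_per_sample 170000000 9   # lower bound of the quoted range
clusters_per_sample 200000000 9   # upper bound of the quoted range
```

Under that even-split assumption, each sample would receive roughly 19 to 22 million read pairs.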

Metatranscriptome Assembly Construction and Quality Analysis

Raw data sequence quality was assessed using FastQC [59] and quality filtering was completed using Trimmomatic [60] (see Appendix 4 for parameter specifications). Data were then in silico normalized to 50X coverage using the normalization script provided with the Trinity package [61]. A transcriptome assembly was constructed from the normalized reads using default Trinity parameters [62]. The metatranscriptome assembly was filtered for highly similar sequences using CD-HIT with a cutoff value of 0.9 [63]. For sequences with similarity greater than 90%, the longest sequence was retained as the representative sequence for that group. RSEM [64] was used to determine FPKM for all transcripts for quality filtering, using a cutoff value of 0.5. Putative protein coding regions based on open reading frames were identified using TransDecoder [65]. Only transcripts that passed all three filters, CD-HIT (longest sequence from each group of sequences with >90% similarity), RSEM (sequences with FPKM>0.5) and TransDecoder (sequences with an open reading frame), were retained for further analysis.
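The final retention step amounts to intersecting three lists of transcript IDs. A minimal sketch of that intersection follows, assuming each filter has already been reduced to a plain text file with one ID per line and that all three files use the same identifiers; the function name `intersect_ids` and the file names in the usage comment are hypothetical.

```shell
#!/usr/bin/env bash
# Print the IDs that occur in every one of the given ID-list files.
# Assumptions: one ID per line, no whitespace inside an ID.
intersect_ids() {
    local n=$#
    # De-duplicate within each file so that a count of n means "present in all n files".
    for f in "$@"; do
        sort -u "$f"
    done | sort | uniq -c | awk -v n="$n" '$1 == n { print $2 }'
}

# Usage (hypothetical file names):
#   intersect_ids cdhit_representatives.txt fpkm_passed.txt with_orf.txt > retained_ids.txt
```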

Coral Transcript Detection

In order to extract coral transcripts from the metatranscriptome assembly, GC-coverage plots were constructed for metatranscriptome protein-coding transcripts using the Blobology pipeline [66]. The one modification to this protocol was that we used annotations from all sequences that had a BLAST hit (e-value<1e-10) instead of using a subset of only 10,000 transcripts, as suggested in the protocol.
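The e-value threshold can be applied directly to tabular BLAST output. The sketch below assumes the default 12-column `-outfmt 6` layout, in which the e-value is column 11, and lists each query that has at least one hit below the cutoff; the function name `hits_below_evalue` is illustrative.

```shell
#!/usr/bin/env bash
# List query IDs with at least one BLAST hit whose e-value is below a cutoff.
# Assumption: tab-delimited default "-outfmt 6" output (column 1 = query, column 11 = e-value).
hits_below_evalue() {
    local blast_tab=$1 cutoff=${2:-1e-10}
    awk -F'\t' -v cutoff="$cutoff" '$11 + 0 < cutoff + 0 { print $1 }' "$blast_tab" | sort -u
}

# Usage (hypothetical file name):
#   hits_below_evalue assembly_vs_nr.outfmt6 1e-10 > annotated_ids.txt
```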

Coral and algal inparalogs were independently identified using Inparanoid [67] by comparing the translated quality-filtered assembly to reference proteomes. Coral references included the Acropora digitifera genome [68] and two transcriptomes produced by the Matz lab for A. hyacinthus and A. tenuis [69]. The algal reference used was the genome of Symbiodinium minutum clade B1 [70]. The S. minutum proteome was also used as an outgroup in all Inparanoid analyses using coral references.

Transcripts that had GC content <47% and were found to be inparalogs to Acropora spp. were extracted from the metatranscriptome assembly to create a coral subset of transcripts. Transcripts that had inparalog matches to S. minutum or BLAST hits (e-value<1e-10) to sequences in class Dinophyceae, superkingdom Bacteria, kingdom Fungi and superkingdom Viruses were removed from the coral subset of transcripts.
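The GC-content criterion can be sketched with awk alone. The example below is illustrative rather than the script used in this study: it assumes a nucleotide FASTA file, counts every character of a sequence line (including any N) toward the length, and prints the IDs of records whose GC content falls below the cutoff. The function name `gc_below` is hypothetical.

```shell
#!/usr/bin/env bash
# Print IDs of FASTA records whose GC content (percent) is below a cutoff.
# Assumption: all sequence characters, including ambiguous bases, count toward length.
gc_below() {
    local fasta=$1 cutoff=$2
    awk -v cutoff="$cutoff" '
        function report() {
            if (id != "" && len > 0 && 100 * gc / len < cutoff) print id
        }
        /^>/ {
            report()                 # finish the previous record
            id = substr($1, 2)       # ID is the header up to the first space
            gc = 0; len = 0
            next
        }
        {
            seq = toupper($0)
            len += length(seq)
            gc += gsub(/[GC]/, "", seq)   # gsub returns the number of G/C characters
        }
        END { report() }
    ' "$fasta"
}

# Usage (hypothetical file name):
#   gc_below protein_coding_transcripts.fasta 47 > low_gc_ids.txt
```

The resulting ID list would still need to be combined with the Inparanoid and BLAST criteria described above.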

Symbiodinium Clade Determination

To identify Symbiodinium lineages associated with M. capitata, and quantify their relative abundance in each colony, transcripts of three target genes were retrieved from the holobiont transcriptome assembly: the internal transcribed spacer region of nuclear rRNA (ITS), the chloroplast 23S rRNA (cp23S), and the photosystem II protein D1 (psbA). For each gene, reference sequences of clades A–H [71] obtained from NCBI GenBank were used as blastn queries, and aligned with MAFFT version 7 [72] to hits exceeding an e-value cutoff of 1e-10 and a length cutoff of 50%. Phylogenetic analyses were performed to establish which lineage each transcript represents, using the neighbor joining method (Jukes-Cantor model) implemented in the same software. Expression levels of each transcript, measured as FPKM values, were then compared to characterize the Symbiodinium community composition in each coral sample.
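One simple way to compare lineages within a sample is to express each lineage's summed FPKM as a percentage of the total for that marker gene. The sketch below illustrates this under stated assumptions and is not necessarily the calculation performed in this study: the input is a hypothetical two-column, tab-delimited file (lineage label, FPKM) for one sample and one marker, and the function name `clade_shares` is illustrative.

```shell
#!/usr/bin/env bash
# Summarize per-lineage FPKM as a percentage of the total for one sample and marker.
# Assumption: input has two tab-delimited columns, lineage label and FPKM, with no header.
clade_shares() {
    awk -F'\t' '
        { total += $2; sum[$1] += $2 }
        END {
            if (total > 0)
                for (c in sum) printf "%s\t%.1f\n", c, 100 * sum[c] / total
        }
    ' "$1" | sort
}

# Usage (hypothetical file name):
#   clade_shares colony1_ITS_fpkm.tsv
```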

Differential Gene Expression Analysis

Raw reads were mapped to the metatranscriptome assembly and coral transcriptome subset using RSEM [64] and bowtie2 [73]. Differential gene expression (FDR≤0.01, FC≥2) was calculated separately for the holobiont metatranscriptome and the putative coral transcriptome, for pairwise comparisons of tissue types (healthy, GA-affected and GA-unaffected), using edgeR [74].
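The two thresholds can be applied to an edgeR results table as a final filtering step. The sketch below assumes a tab-delimited table whose header row names every column, including `logFC` and `FDR`; because edgeR reports fold change on a log2 scale, FC≥2 in either direction corresponds to an absolute logFC of at least 1. The function name `filter_degs` and the input layout are assumptions made for illustration.

```shell
#!/usr/bin/env bash
# Keep rows of an edgeR results table that pass FDR and fold-change thresholds.
# Assumptions: tab-delimited input; header row names all columns, including "logFC" and "FDR".
filter_degs() {
    local table=$1 fdr_max=${2:-0.01} fc_min=${3:-2}
    awk -F'\t' -v fdr_max="$fdr_max" -v fc_min="$fc_min" '
        NR == 1 {
            for (i = 1; i <= NF; i++) col[$i] = i   # map column names to positions
            print
            next
        }
        {
            lfc = $(col["logFC"]) + 0
            if (lfc < 0) lfc = -lfc                 # absolute log2 fold change
            if ($(col["FDR"]) + 0 <= fdr_max && lfc >= log(fc_min) / log(2)) print
        }
    ' "$table"
}

# Usage (hypothetical file name):
#   filter_degs affected_vs_healthy.edgeR.tsv 0.01 2 > affected_vs_healthy.DEGs.tsv
```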

Gene Annotation

BLAST hits were determined using BLASTP (version 2.3.0) against the National Center for Biotechnology Information’s non-redundant (nr) and UniProt peptide databases with an e-value cutoff of 1e-4. Pfam domains were identified using the TransDecoder Trinity plugin to search the Pfam-A database. The online KEGG GhostKOALA program [75] was used to determine KEGG annotations. Gene ontology terms were determined using mapping files available from the Gene Ontology Consortium [76].

Appendix 1: Table of Annotated Differentially Expressed Genes

GeneID | Comparison¹ (AH AU UH) | Gene Annotation | Source²

c230619_g4 | A | Acetylcholine receptor subunit alpha-like 2 | 2,4
c227218_g2 | A | Adenosine receptor A3 | 2,4
c203847_g3 | H H | Alcohol dehydrogenase transcription factor Myb/SANT-like | 4
c214820_g1 | H | Antho-RFamide neuropeptides | 1,2
c219571_g4 | U | ASL1/Clec4a2 fusion protein | 1,2,3,4
c240815_g2 | H | Bone morphogenetic protein 1 homolog | 2,4
c208053_g1 | U | BTB and MATH domain-containing protein 38 | 2,4
c223975_g1 | H | Calumenin | 2,4
c244503_g2 | A | Collagen alpha-1 (XIV) | 2,4
           |   | Collagen alpha (VI) | 3,4
c235651_g1 | A | Collagen alpha-6 (VI) | 1,2,3,4
c214713_g1 | A | DDE superfamily endonuclease | 4
c233675_g3 | H | DDE superfamily endonuclease | 4
c218091_g3 | A U | Death domain-containing protein | 4
c228448_g2 | A | E3 ubiquitin-protein ligase RNF213 | 2
c228448_g3 | A | E3 ubiquitin-protein ligase RNF213 | 2
c209755_g2 | H | Forkhead box protein L2 | 1,2,3,4
c215386_g1 | A | Frizzled 5 | 2,3,4
c216293_g5 | A U | G-protein-signaling modulator 1 | 1,2,4
c230266_g2 | H | Glypican-5 | 1,2,4
c238733_g2 | H H | GP26 | 1
c222220_g8 | H H | Guanylate-binding protein 6 | 2,4
c232579_g1 | H | HECT-domain-containing protein (ubiquitin-transferase) | 4
c213861_g5 | H | HECT-domain-containing protein (ubiquitin-transferase) | 4
c210228_g4 | H | Hemicentin | 3
c216235_g2 | A | Homeobox protein engrailed | 2,3,4
c224998_g2 | H | Inhibin beta B | 2,4
           |   | Activin-like protein | 1,4
c165785_g1 | H | Integrase core domain-containing protein | 4
c221254_g3 | A | ISXO2-like transposase domain-containing protein | 4
c199868_g1 | H | Lectin | 1
c239238_g1 | H U | Low-density lipoprotein receptor protein 4 | 1,2,4
c244330_g1 | H | Low-density lipoprotein receptor-related protein 4 | 1,2,4
c229161_g1 | H | Low-density lipoprotein receptor-related protein 4 | 1,2
c240897_g1 | H | Low-density lipoprotein receptor-related protein 4 | 1,2
           |   | Deleted in malignant brain tumors 1 protein | 2,3,4
c231597_g2 | H | Low-density lipoprotein receptor-related protein 4 | 1,2,4
           |   | Low-density lipoprotein receptor-related protein 1 | 3
c221008_g1 | H | Low-density lipoprotein receptor-related protein 5 | 2,3
           |   | Low-density lipoprotein receptor-related protein 6 | 1,2,3
c244330_g2 | H | Low-density lipoprotein receptor-related protein 6 | 1,2
c233968_g6 | A U | Lysosomal acid lipase/cholesteryl ester hydrolase | 1,2,3,4
c244278_g1 | A | Macrophage mannose receptor 1 | 1,2,3,4
c195139_g1 | H | Muscle segment homeobox 3 | 1,2,3,4
c218604_g3 | H H | Neuroendocrine convertase 1 | 2,4
           |   | Proprotein convertase subtilisin/kexin type 1 | 3,4
c233223_g2 | H | Neuronal acetylcholine receptor subunit alpha-10 | 2,4
c221191_g3 | H | Neuronal acetylcholine receptor subunit alpha-7 | 2,4
           |   | Nicotinic acetylcholine receptor | 3,4
c175583_g1 | A | Notch-regulated ankyrin repeat-containing protein | 2,4
c194031_g2 | A | Pao retrotransposon peptidase | 4
c168944_g1 | A U | Pao retrotransposon peptidase family protein-like | 1,4
c232386_g1 | A U | PB1 domain-containing protein | 4
c219411_g9 | H | Peroxidasin homolog | 1
           |   | Hemicentin-2 | 2,4
           |   | Neurotrimin | 3,4
c239515_g1 | U | Peroxisomal N(1)-acetyl-spermine/spermidine oxidase | 2,3,4
c237792_g3 | H | Prestin | 2,4
           |   | Solute carrier family 26 member 5,6 | 2,3,4
c223010_g2 | H H | Proprotein convertase subtilisin type 5 | 2,3,4
c211676_g2 | H | Protein DD3-3 | 2
c165248_g1 | A | Protein phosphatas | 3,4
c231457_g2 | U | Protein unc-13 homolog C | 2
           |   | Endonuclease/Exonuclease/phosphatase family | 4
c234697_g1 | H | Protein Wnt-2b | 2,4
c239225_g1 | H | Protein Wnt-4 | 2,4
c224323_g1 | H | Protein Wnt-7b | 2,3,4
c222845_g1 | U | Protein-methionine sulfoxide oxidase mical3a | 2,3,4
c172787_g2 | H H | Putative ATP-dependent RNA helicase | 2,3,4
c221008_g4 | H | Putative Low-density lipoprotein receptor repeat class B | 1
c216902_g1 | H | Putative reverse transcriptase | 1
c215315_g1 | H H | Retrotransposon-like family member (retr-1)-like | 1,4
c219411_g4 | H | SEA domain-containing protein | 4
c216324_g2 | A | ShK domain-like-containing protein | 4
c237299_g2 | H | ShK domain-like-containing protein | 4
c238232_g2 | H | DNA-directed RNA polymerase, omega subunit family protein | 1
           |   | Myosin heavy chain | 2
c229212_g3 | A U | Similar to RNase H and integrase-like protein | 1,4
c239797_g2 | H | Sodium/glucose cotransporter 2 | 2,4
c196457_g1 | A | Sprouty homolog 2 | 1,2,3,4
c235981_g3 | A | Sulfite exporter TauE/SafE | 4
c210797_g1 | H | Tctex 1 domain-containing protein | 2,4
c225255_g2 | A U | Tetratricopeptide repeat protein | 2,4
c240086_g4 | H | THAP domain-containing protein | 4
c236094_g1 | A | TNF receptor-associated factor 3 | 2,3,4
           |   | TNF receptor-associated factor 5 | 2,4
           |   | TNF receptor-associated factor 6 | 2,3,4
c183125_g3 | A U | TNF receptor-associated factor 5 | 1,2,4
           |   | TNF receptor-associated factor 6 | 3,4
c239805_g2 | H | TPR repeat-containing protein | 2
c210088_g1 | A A | Transcription factor HES-1 | 2,3,4
c208131_g3 | A A | Transcription factor HES-1 | 3,4
           |   | Transcription factor HES-4-A | 2,4
c216072_g1 | H | Transient receptor potential cation channel subfamily A member 1 | 1,2,3,4
c208317_g1 | H | Tropomyosin alpha-4 chain | 2,4
c223729_g1 | H | Twist | 1,2,3,4
c220027_g3 | A | Type III iodothyronine deiodinase | 2,3,4
c217092_g1 | H | Uncharacterized WD repeat-containing protein alr2800 | 2
c209865_g1 | A A | Unidentified protein |
c233110_g2 | A | Unidentified protein |
c228720_g1 | H U | Unidentified protein |
c209028_g1 | H H | Unidentified protein |
c214709_g3 | H H | Unidentified protein |
c169334_g4 | H | Unidentified protein |
c240174_g2 | A U | Unidentified protein |
c241337_g2 | A U | Unidentified protein |
c206573_g1 | A U | Unidentified protein |
c217203_g1 | A U | Unidentified protein |
c226765_g1 | A U | Unidentified protein |
c222182_g1 | A | Unidentified protein |
c222575_g2 | A | Unidentified protein |
c212713_g4 | A | Unidentified protein |
c216717_g1 | A | Unidentified protein |
c218779_g10 | A | Unidentified protein |
c223018_g1 | A | Unidentified protein |
c230280_g4 | A | Unidentified protein |
c238564_g1 | A | Unidentified protein |
c233718_g4 | H | Unidentified protein |
c224126_g3 | H | Uromodulin | 2,3
c210228_g7 | H H | Vascular endothelial growth factor receptor | 2,4
c238492_g2 | H | Zinc finger protein | 1,2,4

For genes with multiple annotations, each line represents a different annotation.
¹For pairwise comparisons, letter corresponds to the tissue type in which the gene was upregulated; A=GA-affected, U=GA-unaffected, H=Healthy tissue.
²Annotation sources are 1=NCBI nr, 2=Uniprot, 3=KEGG, 4=Pfam domain

Appendix 2: Yale Center for Genomic Analysis Library Preparation Protocol

RNA Seq Quality Control: Total RNA quality is determined by estimating the A260/A280 and A260/A230 ratios by NanoDrop. RNA integrity is determined by running an Agilent Bioanalyzer gel, which measures the ratio of the ribosomal peaks.

RNA Seq Library Prep: mRNA is purified from approximately 500ng of total RNA with oligo-dT beads and sheared by incubation at 94˚C. Following first-strand synthesis with random primers, second-strand synthesis is performed with dUTP for generating strand-specific sequencing libraries. The cDNA library is then end-repaired and A-tailed, adapters are ligated, and second-strand digestion is performed by Uracil-DNA-Glycosylase. Indexed libraries that meet appropriate cut-offs for both are quantified by qRT-PCR using a commercially available kit (KAPA Biosystems) and insert size distribution determined with the LabChip GX. Samples with a yield of ≥0.5 ng/ul are used for sequencing.

Flow Cell Preparation and Sequencing: Sample concentrations are normalized to 2 nM and loaded onto Illumina version 3 flow cells at a concentration that yields 170-200 million passing filter clusters per lane. Samples are sequenced using 75 bp paired end sequencing on an Illumina HiSeq 2000 according to Illumina protocols. The 6 bp index is read during an additional sequencing read that automatically follows the completion of read 1. Data generated during sequencing runs are simultaneously transferred to the YCGA high performance computing cluster. A positive control (prepared bacteriophage Phi X library) provided by Illumina is spiked into every lane at a concentration of 0.3% to monitor sequencing quality in real time.

Data Analysis and Storage: Signal intensities are converted to individual base calls during a run using the system's Real Time Analysis (RTA) software. Base calls are transferred from the machine's dedicated personal computer to the Yale High Performance Computing cluster via a 1 Gigabit network mount for downstream analysis. Primary analysis (sample de-multiplexing and alignment to the human genome) is performed using Illumina's CASAVA 1.8.2 software suite. The data are returned to the user if the sample error rate is less than 2% and the distribution of reads per sample in a lane is within reasonable tolerance. Data are retained on the cluster for at least 6 months, after which they are transferred to a tape backup system.

Appendix 3: RNA Extraction Protocol

Before starting:
- Turn on centrifuge in hood to 4˚C (~1 hour to cool to temp)
- Clean countertops, hood, centrifuge, pipettes
- Label tubes (13-1.5ml and 1 column per sample + 1-1.5ml for DNaseI solution)

Cut 0.1g tissue (minimize skeleton)
Place into labeled RNase-free 1.5ml tube
Add 1ml TRIzol per 0.1g tissue
Use homogenizer with sterile tip to mix
Incubate @ RT for 5 min
Centrifuge at 12,000xg for 10min at 4˚C
Transfer supernatant to 1.5ml tube
Add 0.2ml chloroform per 1ml TRIzol
Shake vigorously for 20sec
Incubate @ RT for 2-3min
Centrifuge at 10,000xg for 18min at 4˚C
Transfer top layer of liquid to 1.5ml tube
Add equal volume of RNA-free EtOH
Load sample into RNeasy column (700µl)
Centrifuge at 8,000xg for 30sec
Discard flow-through
Add 350µl Buffer RW1
Centrifuge at 8,000xg for 15sec
Discard flow-through
Mix 10µl DNase I stock solution to 70µl Buffer RDD (per reaction) in a 1.5ml tube
Add 80µl DNase I incubation mix to membrane
Incubate @ RT for 25min
Add 350µl Buffer RW1 to column
Centrifuge at 8,000xg for 15sec
Discard flow-through
Transfer column to a new collection tube
Add 500µl Buffer RPE
Centrifuge 8,000xg for 30sec
Discard flow-through
Add 500µl Buffer RPE
Centrifuge 8,000xg for 2min
Discard flow-through
Centrifuge at 8,000xg for 1min
Transfer column to 1.5ml collection tube
Add 50µl DEPC-treated water onto membrane
Incubate @ RT for 2min
Centrifuge 8,000xg for 1min
Transfer 10µl to each pre-labeled tube
Store at -80˚C immediately

Appendix 4: Command Line Scripts for Bioinformatic Analyses

FastQC: Assess Raw Reads Quality

for FILE in Sample1 Sample2 Sample3
do
    ./FastQC/fastqc -t 32 "$FILE.fastq"
done

Trimmomatic: Trim Raw Data

IN1=A1.R1.fastq
IN2=A1.R2.fastq
WINSIZE=5
WINCUTOFF=20
LEADING=20
TRAILING=20
MINLEN=50
java -jar trimmomatic-0.32.jar PE -phred33 $IN1 $IN2 \
    trimmed.$IN1.$WINSIZE.$WINCUTOFF.R1.fastq single.$IN1.$WINSIZE.$WINCUTOFF.R1.fastq \
    trimmed.$IN2.$WINSIZE.$WINCUTOFF.R2.fastq single.$IN2.$WINSIZE.$WINCUTOFF.R2.fastq \
    ILLUMINACLIP:TruSeq3-PE.fa:2:30:10 LEADING:$LEADING TRAILING:$TRAILING \
    SLIDINGWINDOW:$WINSIZE:$WINCUTOFF MINLEN:$MINLEN

Trinity: Assemble Transcriptome

Required programs in path: bowtie samtools
Perl modules: PerlIO::gzip.pm
Command parameters:
/path/Trinity --seqType fq --JM 200G --left all.trimmed.R1.fastq --right all.trimmed.R2.fastq --SS_lib_type RF --CPU 32 --normalize_reads

TransDecoder: Translate Amino Acid

Required programs/files in path: PfamA.hmm cd-hit hmmer
Command parameters:
/path/trinity-plugins/TransDecoder_r20131110/TransDecoder -t Trinity.fasta --reuse -S --search_pfam /path/Pfam-A.hmm --MPI --CPU 100 --cd_hit_est /path/cd-hit-est

CD-HIT: Cluster Similar Proteins

Command parameters:
/path/cd-hit -i Trinity.fasta.transdecoder.pep -M 0 -T 0 -g 1 -c 0.5 -n 2 -o Output_File

BLAST: Protein Annotation

Command parameters:
/path/blastp -query Input_file -db nr -out Output_file -evalue 1e-4 -num_threads 50 -max_target_seqs 1 -outfmt 6
/path/blastp -query Input_file -db uniprot_sprot.fasta -out Output_file -evalue 1e-4 -num_threads 50 -max_target_seqs 1 -outfmt 6

RSEM: Align raw reads to assembly

Required programs in path: bowtie samtools RSEM
Command parameters:
/path/trinityrnaseq_r20140717/util/align_and_estimate_abundance.pl --transcripts Trinity.fasta --left all.trimmed.R1.fastq --right all.trimmed.R2.fastq --seqType fq --SS_lib_type RF --thread_count 32 --est_method RSEM --aln_method bowtie2 --trinity_mode --prep_reference

Parse results file:
FPKM=value
# Count genes whose FPKM (column 7 of RSEM.genes.results) is at or above the cutoff.
# The cutoff is passed to awk with -v; inside single quotes, $FPKM would not be the shell variable.
cat RSEM.genes.results | sed '1,1d' | awk -v fpkm="$FPKM" '$7 >= fpkm' | wc -l

Inparanoid: Identify Acropora and Symbiodinium inparalogs

Required programs in path: blastall formatdb
Command parameters:
perl inparanoid.pl Proteome_from_assembled_transcriptome Acropora_reference Symbiodinium_reference

RSEM: Align raw reads and estimate gene expression

Required programs in path: bowtie samtools RSEM
Command parameters:
/path/trinityrnaseq_r20140717/util/align_and_estimate_abundance.pl --transcripts Transcriptome_assembly --left Sample_name.R1.fastq --right Sample_name.R2.fastq --seqType fq --SS_lib_type RF --thread_count 32 --est_method RSEM --aln_method bowtie2 --output_prefix Sample_name --trinity_mode --prep_reference

EdgeR: Define differentially expressed genes

Required programs in path: R
Command parameters:
/path/trinityrnaseq_r20140717/util/abundance_estimates_to_matrix.pl --est_method RSEM Sample1.genes.results Sample2.genes.results --out_prefix Output_file

References 1. Reaka-Kudla ML: The Global Biodiversity of Coral Reefs: A Comparison with Rain Forests. In Biodivers II Underst Prot Our Biol Resour. Edited by Reaka-Kudla ML, Wilson DE, Wilson EO.; 1997:83–108. 2. Teh LSL, Teh LCL, Sumaila UR: A Global Estimate of the Number of Coral Reef Fishers. PLoS One 2013, 8:e65397. 3. Cesar HSJ, van Beukering PJH: Economic Valuation of the Coral Reefs of Hawai‘i. Pacific Sci 2004, 58:231–242. 4. Hughes TP: Catastrophes, phase shifts, and large-scale degradation of a Caribbean coral reef. Science (80- ) 1994, 265:1547–1551. 5. Sutherland KP, Porter JW, Turner JW, Thomas BJ, Looney EE, Luna TP, Meyers MK, Futch JC, Lipp EK: Human sewage identified as likely source of white pox disease of the threatened Caribbean elkhorn coral, Acropora palmata. Environ Microbiol 2010, 12:1122–31. 6. Szmant AM: Nutrient Enrichment on Coral Reefs: Is It a Major Cause of Coral Reef Decline? Estuaries 2002, 25:743–766. 7. Venn AA, Quinn J, Jones R, Bodnar A: P-glycoprotein (multi-xenobiotic resistance) and heat shock protein gene expression in the reef coral Montastraea franksi in response to environmental toxicants. Aquat Toxicol 2009, 93:188–195. 8. Cramer KL, Jackson JBC, Angioletti C V, Leonard-Pingel J, Guilderson TP: Anthropogenic mortality on coral reefs in Caribbean Panama predates coral disease and bleaching. Ecol Lett 2012, 15:561–7. 9. Wild CA, Hoegh-Guldberg O, Naumann MS, Colombo-Pallotta MF, Ateweberhan M, Fitt WK, Iglesias-Prieto R, Palmer C, Bythell JC, Ortiz J-C, Loya Y, van Woesik R: Climate change impedes scleractinian corals as primary reef ecosystem engineers. Mar Freshw Res 2011, 62:205–215. 10. Coles SL, Brown EK: Twenty-five years of change in coral coverage on a hurricane impacted reef in Hawai‘i: the importance of recruitment. Coral Reefs 2007, 26:705–717. 11. Kelmo F, Attrill MJ: Severe impact and subsequent recovery of a coral assemblage following the 1997-8 El Niño event: A 17-year study from Bahia, Brazil. 
PLoS One 2013, 8:e65073.
12. Sokolow S: Effects of a changing climate on the dynamics of coral infectious disease: a review of the evidence. Dis Aquat Organ 2009, 87:5–18.
13. Bellwood DR, Hughes TP: Regional-scale assembly rules and biodiversity of coral reefs. Science 2001, 292:1532–1534.
14. Altizer S, Ostfeld RS, Johnson PTJ, Kutz S, Harvell CD: Climate change and infectious diseases: from evidence to a predictive framework. Science 2013, 341:514–519.
15. Gates RD, Ainsworth TD: The nature and taxonomic composition of coral symbiomes as drivers of performance limits in scleractinian corals. J Exp Mar Bio Ecol 2011, 408:94–101.
16. Rouze H, Lecellier G, Saulnier D, Berteaux-Lecellier V: Symbiodinium clades A and D differentially predispose Acropora cytherea to disease and Vibrio spp. colonization. Ecol Evol 2016, 6:560–572.
17. Yuyama I, Harii S, Hidaka M: Algal symbiont type affects gene expression in juveniles of the coral Acropora tenuis exposed to thermal stress. Mar Environ Res 2012, 76:41–7.
18. DeSalvo MK, Sunagawa S, Fisher PL, Voolstra CR: Coral host transcriptomic states are correlated with Symbiodinium genotypes. 2010:1174–1186.
19. Green EP, Bruckner AW: The significance of coral disease epizootiology for coral reef conservation. Biol Conserv 2000, 96:347–361.
20. Redding JE, Myers-Miller RL, Baker DM, Fogel M, Raymundo LJ, Kim K: Link between sewage-derived nitrogen pollution and coral disease severity in Guam. Mar Pollut Bull 2013, 73:57–63.
21. Aeby GS, Williams GJ, Franklin EC, Kenyon J, Cox EF, Coles S, Work TM: Patterns of coral disease across the Hawaiian archipelago: relating disease to environment. PLoS One 2011, 6:e20370.
22. Beeden R, Maynard JA, Marshall PA, Heron SF, Willis BL: A framework for responding to coral disease outbreaks that facilitates adaptive management. Environ Manage 2012, 49:1–13.
23. Sutherland KP, Porter JW, Torres C: Disease and immunity in Caribbean and Indo-Pacific zooxanthellate corals. Mar Ecol Prog Ser 2004, 266:273–302.
24. Yasuda N, Nakano Y, Yamashiro H, Hidaka M: Skeletal structure and progression of growth anomalies in Porites australiensis in Okinawa, Japan. Dis Aquat Organ 2012, 97:237–247.
25. Domart-Coulon IJ, Traylor-Knowles N, Peters E, Elbert D, Downs CA, Price K, Stubbs J, McLaughlin S, Cox E, Aeby G, Brown PR, Ostrander GK: Comprehensive characterization of skeletal tissue growth anomalies of the finger coral Porites compressa. Coral Reefs 2006, 25:531–543.
26. Irikawa A, Casareto BE, Suzuki Y, Agostini S, Hidaka M, van Woesik R: Growth anomalies on Acropora cytherea corals. Mar Pollut Bull 2011, 62:1702–1707.
27. Burns JHR, Rozet NK, Takabayashi M: Morphology, severity, and distribution of growth anomalies in the coral, Montipora capitata, at Wai‘ōpae, Hawai‘i. Coral Reefs 2011, 30:819–826.
28. Burns JHR, Takabayashi M: Histopathology of growth anomaly affecting the coral, Montipora capitata: implications on biological functions and population viability. PLoS One 2011, 6:e28854.
29. Yellowlees D, Rees TAV, Leggat W: Metabolic interactions between algal symbionts and invertebrate hosts. Plant Cell Environ 2008, 31:679–694.
30. Burns JHR, Gregg TM, Takabayashi M: Does Coral Disease Affect Symbiodinium? Investigating the Impacts of Growth Anomaly on Symbiont Photophysiology. PLoS One 2013, 8:e72466.
31. Stimson J: Ecological characterization of coral growth anomalies on Porites compressa in Hawai‘i. Coral Reefs 2011, 30:133–142.
32. Becker CG, Dalziel BD, Kersch-Becker MF, Park MG, Mouchka M: Indirect Effects of Human Development Along the Coast on Coral Health. Biotropica 2013, 45:401–407.
33. Aeby GS, Williams GJ, Franklin EC, Haapkyla J, Harvell CD, Neale S, Page CA, Raymundo L, Vargas-Ángel B, Willis BL, Work TM, Davy SK: Growth anomalies on the coral genera Acropora and Porites are strongly associated with host density and human population size across the Indo-Pacific. PLoS One 2011, 6:e16887.
34. McClanahan TR, Weil E, Maina J: Strong relationship between coral bleaching and growth anomalies in massive Porites. Glob Chang Biol 2009, 15:1804–1816.
35. Spies NP, Takabayashi M: Expression of galaxin and oncogene homologs in growth anomaly in the coral Montipora capitata. Dis Aquat Organ 2013, 104:249–256.


36. Mollenhauer J, Herbertz S, Holmskov U, Tolnay M, Krebs I, Merlo A, Schrøder HD, Maier D, Breitling F, Wiemann S, Grone H-J, Poustka A: DMBT1 Encodes a Protein Involved in the Immune Defense and in Epithelial Differentiation and Is Highly Unstable in Cancer. Cancer Res 2000, 60:1704–1710.
37. Ligtenberg AJ, Veerman EC, Amerongen AVN, Mollenhauer J: Salivary agglutinin/glycoprotein-340/DMBT1: a single molecule with variable composition and with different functions in infection, inflammation and cancer. Biol Chem 2007, 388:1275–1289.
38. Somerville RPT, Shoshan Y, Eng C, Barnett G, Miller D, Cowell JK: Molecular analysis of two putative tumour suppressor genes, PTEN and DMBT, which have been implicated in glioblastoma multiforme disease progression. Oncogene 1998, 17:1755–1757.
39. Braidotti P, Nuciforo PG, Mollenhauer J, Poustka A, Pellegrini C, Moro A, Bulfamante G, Coggi G, Bosari S, Pietra GG: DMBT1 expression is down-regulated in breast cancer. BMC Cancer 2004, 4.
40. Mollenhauer J, Herbertz S, Helmke B, Kollender G, Krebs I, Madsen J, Holmskov U, Sorger K, Schmitt L, Wiemann S, Otto HF, Grone H-J, Poustka A: Deleted in Malignant Brain Tumors 1 Is a Versatile Mucin-like Molecule Likely to Play a Differential Role in Digestive Tract Cancer. Cancer Res 2001, 61:8880–8886.
41. Wu W, Kemp BL, Proctor ML, Gazdar AF, Minna JD, Hong WK, Mao L: Expression of DMBT1, a Candidate Tumor Suppressor Gene, Is Frequently Lost in Lung Cancer. 1999:1846–1851.
42. Helmke BM, Renner M, Poustka A, Schirmacher P, Mollenhauer J, Kern MA: DMBT1 expression distinguishes anorectal from cutaneous melanoma. Histopathology 2009, 54:233–240.
43. Renner M, Bergmann G, Krebs I, End C, Lyer S, Hilberg F, Helmke B, Gassler N, Autschbach F, Bikker F, Strobel-Freidekind O, Gronert-Sum S, Benner A, Blaich S, Wittig R, Hudler M, Ligtenberg AJ, Madsen J, Holmskov U, Annese V, Latiano A, Schirmacher P, Amerongen VN, D’Amato M, Kioschis P, Hafner M, Poustka A, Mollenhauer J: DMBT1 Confers Mucosal Protection In Vivo and a Deletion Variant Is Associated With Crohn’s Disease. Gastroenterology 2007, 133:1499–1509.
44. Wen-Ming C: Tumor necrosis factor. Cancer Lett 2013, 328:222–225.
45. Hacker H, Redecke V, Blagoev B, Kratchmarova I, Hsu L, Wang GG, Kamps MP, Raz E, Wagner H, Hacker G, Mann M, Karin M: Specificity in Toll-like receptor signalling through distinct effector functions of TRAF3 and TRAF6. Nature 2006, 439:10–13.
46. Oganesyan G, Saha SK, Guo B, He JQ, Shahangian A, Zarnegar B, Perry A, Cheng G: Critical role of TRAF3 in the Toll-like receptor-dependent and -independent antiviral response. Nature 2006, 439:208–211.
47. Sun L, Deng L, Ea C, Xia Z-P, Chen ZJ: The TRAF6 Ubiquitin Ligase and TAK1 Kinase Mediate IKK Activation by BCL10 and MALT1 in T Lymphocytes. Mol Cell 2004, 14:289–301.
48. Nakano H, Oshima H, Chung W, Williams-Abbott L, Ware CF, Yagita H, Okumura K: TRAF5, an Activator of NF-κB and Putative Signal Transducer for the Lymphotoxin-β Receptor. J Biol Chem 1996, 271:14661–14664.
49. Choi HY, Dieckmann M, Herz J, Niemeier A: Lrp4, a Novel Receptor for Dickkopf 1 and Sclerostin, Is Expressed by Osteoblasts and Regulates Bone Growth and Turnover In Vivo. PLoS One 2009, 4:e7930.
50. van Meurs JBJ, Trikalinos TA, Ralston SH, Balcells S, Brandi ML, Brixen K, Kiel DP, Langdahl BL, Lips P, Ljunggren O, Lorenc R, Obermayer-Pietsch B, Ohlsson C, Pettersson U, Reid DM, Rousseau F, Scollen S, Van Hul W, Agueda L, Akesson K, Benevolenskaya LI, Ferrari SL, Hallmans G, Hofman A, Husted LB, Kruk M, Kaptoge S, Karasik D, Karlsson MK, Lorentzon M, et al.: Large-Scale Analysis of Association Between LRP5 and LRP6 Variants in Osteoporosis. J Am Med Assoc 2008, 299:1277–1290.
51. Holmen SL, Giambernardi TA, Zylstra CR, Buckner-Berghuis BD, Resau JH, Hess JF, Glatt V, Bouxsein ML, Ai M, Warman ML, Williams BO: Decreased BMD and Limb Deformities in Mice Carrying Mutations in Both Lrp5 and Lrp6. J Bone Miner Res 2004, 19:2033–2040.
52. Kessler E, Takahara K, Biniaminov L, Brusel M, Greenspan DS: Bone Morphogenetic Protein-1: The Type I Procollagen C-Proteinase. Science 1996, 271:360–362.
53. Suzuki N, Labosky PA, Furuta Y, Hargett L, Dunn R, Fogo AB, Takahara K, Peters DMP, Greenspan DS, Hogan BLM: Failure of ventral body wall closure in mouse embryos lacking a procollagen C-proteinase encoded by Bmp1, a mammalian gene related to Drosophila tolloid. Development 1996, 122:3587–3595.
54. Martinez-Glez V, Valencia M, Caparros-Martin JA, Aglan M, Temtamy S, Tenorio J, Pulido V, Lindert U, Rohrbach M, Eyre D, Giunta C, Lapunzina P, Ruiz-Perez VL: Identification of a Mutation Causing Deficient BMP1/mTLD Proteolytic Activity in Autosomal Recessive Osteogenesis. Hum Mutat 2011, 33:343–350.
55. Guevara T, Yiallouros I, Kappelhoff R, Bissdorf S, Stocker W, Gomis-Ruth FX: Proenzyme Structure and Activation of Astacin Metallopeptidase. J Biol Chem 2010, 285:13958–13965.
56. Garrigue-Antar L, Hartigan N, Kadler KE: Post-translational Modification of Bone Morphogenetic Protein-1 Is Required for Secretion and Stability of the Protein. J Biol Chem 2002, 277:43327–43334.
57. Asharani PV, Keupp K, Semler O, Wang W, Li Y, Thiele H, Yigit G, Pohl E, Becker J, Frommolt P, Sonntag C, Altmuller J, Zimmermann K, Greenspan DS, Akarsu NA, Netzer C, Schonau E, Wirth R, Hammerschmidt M, Nurnberg P, Wollnik B, Carney TJ: Attenuated BMP1 Function Compromises Osteogenesis, Leading to Bone Fragility in Humans and Zebrafish. Am J Hum Genet 2012, 90:661–674.
58. Yuyama I, Suzuki Y, Watanabe T: Identification of differentially expressed genes during early growth of Acropora tenuis. In Proc 12th Int Coral Reef Symp; 2012.
59. FastQC [http://www.bioinformatics.babraham.ac.uk/projects/fastqc]
60. Bolger AM, Lohse M, Usadel B: Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics 2014:btu170.
61. Grabherr MG, Haas BJ, Yassour M, Levin JZ, Thompson DA, Amit I, Adiconis X, Fan L, Raychowdhury R, Zeng Q, Chen Z, Mauceli E, Hacohen N, Gnirke A, Rhind N, di Palma F, Birren BW, Nusbaum C, Lindblad-Toh K, Friedman N, Regev A: Trinity: reconstructing a full-length transcriptome without a genome from RNA-Seq data. Nat Biotechnol 2011, 29:644–652.
62. Haas BJ, Papanicolaou A, Yassour M, Grabherr M, Blood PD, Bowden J, Couger MB, Eccles D, Li B, Lieber M, Macmanes MD, Ott M, Orvis J, Pochet N, Strozzi F, Weeks N, Westerman R, William T, Dewey CN, Henschel R, LeDuc RD, Friedman N, Regev A: De novo transcript sequence reconstruction from RNA-Seq: reference generation and analysis with Trinity. Nat Protoc 2013, 8:1494–512.
63. Li W, Godzik A: Cd-hit: a fast program for clustering and comparing large sets of protein or nucleotide sequences. Bioinformatics 2006, 22:1658–1659.
64. Li B, Dewey CN: RSEM: accurate transcript quantification from RNA-Seq data with or without a reference genome. BMC Bioinformatics 2011, 12:323–338.
65. TransDecoder [http://transdecoder.github.io]
66. Kumar S, Jones M, Koutsovoulos G, Clarke M, Blaxter M: Blobology: exploring raw genome data for contaminants, symbionts, and parasites using taxon-annotated GC-coverage plots. Front Genet 2013, 4:1–12.
67. Ostlund G, Schmitt T, Forslund K, Köstler T, Messina DN, Roopra S, Frings O, Sonnhammer ELL: InParanoid 7: new algorithms and tools for eukaryotic orthology analysis. Nucleic Acids Res 2010, 38:D196–203.
68. Shinzato C, Shoguchi E, Kawashima T, Hamada M, Hisata K, Tanaka M, Fujie M, Fujiwara M, Koyanagi R, Ikuta T, Fujiyama A, Miller DJ, Satoh N: Using the Acropora digitifera genome to understand coral responses to environmental change. Nature 2011, 476:320–323.
69. Matz Lab Data [http://www.bio.utexas.edu/research/matz_lab/matzlab/Data.html]
70. Shoguchi E, Shinzato C, Kawashima T, Gyoja F, Mungpakdee S, Koyanagi R, Takeuchi T, Hisata K, Tanaka M, Fujiwara M, Hamada M, Seidi A, Fujie M, Usami T, Goto H, Yamasaki S, Arakaki N, Suzuki Y, Sugano S, Toyoda A, Kuroki Y, Fujiyama A, Medina M, Coffroth MA, Bhattacharya D, Satoh N: Draft Assembly of the Symbiodinium minutum Nuclear Genome Reveals Dinoflagellate Gene Structure. Curr Biol 2013, 23:1399–1408.
71. Pochon X, Putnam HM, Burki F, Gates RD: Identifying and Characterizing Alternative Molecular Markers for the Symbiotic and Free-Living Dinoflagellate Genus Symbiodinium. PLoS One 2012, 7:e29816.
72. Katoh K, Standley DM: MAFFT Multiple Sequence Alignment Software Version 7: Improvements in Performance and Usability. Mol Biol Evol 2013, 30:772–780.
73. Langmead B, Salzberg SL: Fast gapped-read alignment with Bowtie 2. Nat Methods 2013, 9:357–359.
74. Robinson MD, McCarthy DJ, Smyth GK: edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics 2010, 26:139–140.
75. Kanehisa M, Sato Y, Morishima K: BlastKOALA and GhostKOALA: KEGG Tools for Functional Characterization of Genome and Metagenome Sequences. J Mol Biol 2015, 428:726–31.
76. Mappings of External Classification Systems to GO [http://geneontology.org/page/download-mappings]