

Keywords: coconut oil, copra, differential expression, molecular markers, nut yield, RNA-seq

Differential Expression Analysis in High-yielding and Low-yielding Philippine Coconut through Transcriptome Sequencing

¹Philippine Genome Center (PGC), University of the Philippines, Diliman, Quezon City 1101 Philippines

²National Institute of Molecular Biology and Biotechnology, University of the Philippines, Diliman, Quezon City 1101 Philippines

³Philippine Coconut Authority – Zamboanga Research Center (PCA-ZRC), San Ramon, Zamboanga City 7000 Philippines

Ma. Regina Punzalan¹,², Gamaliel Lysander Cabria¹,², Ma. Anita Bautista¹,², Ernesto Emmanuel³, Ramon Rivera³, Susan Rivera³, and Cynthia Saloma¹,²*

The demand for coconut oil (CNO) continues to rise in the global market. This puts pressure on coconut-producing countries such as the Philippines to increase CNO and copra production. Baybay Tall (BAYT) is known to have the highest copra yield among the tall coconut varieties in the Philippines. However, traditional breeding techniques that rely on morphological markers are limited, laborious, and time-consuming. To improve breeding strategies for increased copra production, differential gene expression analysis was performed on the shell and kernel of high-yielding and low-yielding palms. High-quality RNA was isolated from the endosperm (ES or kernel) and endocarp (EC or shell) of nut tissues, followed by transcriptome sequencing using Illumina HiSeq 2000. De novo transcriptome assembly was performed using Trinity. Read abundance was estimated using Corset, and differentially expressed genes were identified using edgeR. In total, 1,945 genes were found to be differentially expressed (FDR < 0.05) in the nut tissues. Annotation of the transcripts revealed that only 82 of the differentially expressed genes have significant annotation. Potential gene-targeted markers (GTMs) were designed for 64 candidate genes, which can be further validated for possible use in the marker-assisted selection of high-yielding palms. Microsatellite (SSR) sequences were identified in 19,147 unigenes in the EC and 17,394 in the ES. However, only two SSRs were found among differentially expressed genes in the EC and only one in the ES. Functional analysis revealed that high nut yield could arise from the concerted actions of several transcription activators and regulatory proteins, leading to increased cell division, secondary cell wall formation, enhanced energy metabolism, and an activated stress response. Taken together, these processes contribute to increased kernel volume and thus to increased copra yield. The genes identified in this study can be used as potential targets for improving productivity in the Philippine coconut.

Philippine Journal of Science 148 (S1): 83-95, Special Issue on Genomics
ISSN 0031 - 7683
Date Received: 21 Mar 2019

*Corresponding Author: [email protected]


INTRODUCTION

Copra, or dried coconut kernel, is a highly sought-after commodity in the tropics. CNO is extracted from copra, and the residue from extraction can be used as livestock feed. CNO is favored over other vegetable oils because of its lower long-chain fatty acid content, higher medium-chain saturated fatty acid content, higher burning point, and perceived medical advantages (Young 1983, Dyer et al. 2008, DebMandal and Mandal 2011).

In addition, CNO has characteristics similar to those of petrol (i.e., specific energy, higher cetane number, and very low iodine value) and has thus been used as a biofuel, either directly as an alternative oil or as CNO methyl ester (Lujaji et al. 2010, Salmani et al. 2015).

The Philippines is one of the top producers of copra and CNO in the world. It remains the top copra and CNO exporter, with 350,000 metric tons of copra meal and 1 million metric tons of CNO exported in 2018 (Index Mundi 2019). In the 1970s, up to 8% (2.3 million ha) of the Philippines' total land area was dedicated to coconut farming, and up to 1.86 million metric tons of CNO were exported annually (PCA 2019). However, coconut production declined as planted palms aged past their optimal productivity, in combination with attacks by various pathogens and pests such as Phytophthora palmivora, Cadang-cadang viroid, mycoplasma, and coconut scale insect, and with the increased demand for coco timber (Batugal et al. 2005, FAO 2001, Rivera et al. 1999).

Fortunately, national breeding centers and programs have been established to boost the competence of the Philippine coconut industry through the breeding of varieties with improved nut yield, oil content, and biotic stress tolerance (Batugal et al. 2009, Herrán et al. 2000).

Modern breeding programs using molecular marker-assisted selection have been employed instead of the traditional breeding techniques, which require a lot of time and resources (Rivera et al. 1999, Herrán et al. 2000, Teulat et al. 2000, Batugal et al. 2009).

Characterization of coconut germplasm using molecular markers has previously been done using random DNA markers (RDMs) such as RAPD, RFLP, and AFLP, and non-specific markers such as genomic SSRs and transposable elements. These have been useful for rapid analysis of genetic diversity and for trait mapping. However, genetic recombination can break the established linkage between an RDM and the trait allele locus, while genomic SSRs can be located in either transcribed or non-transcribed regions (Varshney et al. 2005, Poczai et al. 2013).

Instead of using genomic markers for genetic identity, which may or may not be responsible for the desired trait, alternative crop-breeding strategies use markers with functional information, such as GTMs and functional markers (FMs), to improve the selection of seedlings that carry the desired trait allele (Liu et al. 2012, Poczai et al. 2013). Unfortunately, the availability of markers with functional information is limited due to the shortage of data on gene targets related to desired functions or phenotypes.

This study aims to identify putative gene targets conferring increased nut yield or copra weight through RNA sequencing (RNA-seq) and transcriptome-wide differential expression analysis of the BAYT coconut variety. RNA-seq is a robust and cost-effective approach to gene expression analysis since it provides high-throughput sequence data from which functional information can be derived. BAYT is an advanced-generation population of the traditional and widely cultivated Laguna Tall (LAGT). BAYT is known to have a high copra yield (5 t ha–1) and is a preferred tall parent of hybrids (Batugal et al. 2009).

Differential gene expression analysis was performed on BAYT EC and ES transcriptomes to mine for potential gene and genic SSR markers that can be used in the molecular marker-assisted selection of high-yielding palms.

MATERIALS AND METHODS

RNA Isolation of BAYT Tissues

Coconut samples were acquired from the PCA’s coconut gene bank in Zamboanga City, Philippines. The BAYT populations in Blocks 04 and 13A, aged 30 and 24 years, were identified as the low-yielding and high-yielding populations, respectively. Seven-month-old coconut fruits were harvested from two representative palms in each block. Tissues from the same block were considered biological replicates (Figure 1).

The coconut fruits were opened with a bolo, and the solid ES and EC tissues were isolated using a sterile spatula and immediately flash-frozen in liquid nitrogen. Tissues were macerated using pre-cooled mortars and pestles, and the powdered tissue was immediately processed in ~100 mg portions using the Agilent™ Plant RNA Isolation Mini kit following the manufacturer’s instructions. RNA was eluted in 50 µL ddH2O.

The total RNA quality was assessed using NanoDrop© 2000 spectrophotometer and Shimadzu MultiNA Microchip Electrophoresis. The RNA concentration was measured using Qubit™ High Sensitivity RNA kit (Thermo Fisher Scientific).

Punzalan et al.: Transcriptome Sequencing Analysis in Philippine Coconut

Special Issue on Genomics


RNA-seq and De Novo Assembly of BAYT ES and EC

High-quality RNA samples (28S:18S ratio ≥ 1.7; RIN ≥ 7; A260:A280 ≥ 1.8) were sent to Macrogen© Korea for sequencing, where RNA quality was further verified using an Agilent™ 2100 Bioanalyzer. The DNA library was prepared using the Illumina® TruSeq™ RNA Sample Preparation kit. Enrichment of RNA, cDNA strand synthesis, adapter ligation, and DNA library enrichment were done following the manufacturer’s instructions. The libraries were paired-end sequenced on the Illumina® HiSeq 2000 platform.
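The acceptance thresholds above amount to a three-way check per RNA sample. A minimal Python sketch of that check follows; the sample names and measurements are hypothetical and are not taken from the study.

```python
from dataclasses import dataclass


@dataclass
class RnaSample:
    name: str
    ratio_28s_18s: float  # 28S:18S rRNA ratio
    rin: float            # RNA integrity number
    a260_a280: float      # absorbance ratio (purity)


def passes_qc(sample, min_ratio=1.7, min_rin=7.0, min_a260_a280=1.8):
    """Return True if the sample meets all three thresholds quoted in the text."""
    return (
        sample.ratio_28s_18s >= min_ratio
        and sample.rin >= min_rin
        and sample.a260_a280 >= min_a260_a280
    )


# Hypothetical measurements, for illustration only.
samples = [
    RnaSample("ES_high_rep1", 1.9, 8.2, 2.0),
    RnaSample("EC_low_rep1", 1.5, 6.4, 1.9),
]
passed = [s.name for s in samples if passes_qc(s)]
print(passed)
```

Only samples that clear every threshold would be forwarded for sequencing under this reading of the criteria.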

Generated raw reads were assessed using FastQC. Adapters were removed and low-quality reads were trimmed and filtered (Phred Q30) using the Trimmomatic paired-end trimmer v. 0.32 (Bolger et al. 2014). Initial de novo transcriptome assembly was done on pooled reads using three assemblers: Trinity 2.0.6 (Grabherr et al. 2011), SOAPdenovo-Trans (Xie et al. 2014), and Velvet-Oases (Schulz et al. 2012).

Figure 1. Basic workflow for the differential gene expression analysis for nut yield. The figure shows an overview of the workflow used in this study. Sample collection from the identified high-yielding block (BAYT 13) and low-yielding block (BAYT 4) was conducted at the PCA-ZRC. Total RNA of coconut EC and ES was isolated using the Agilent Plant RNA Isolation Mini kit, followed by Illumina HiSeq 2000 sequencing. Trimmomatic was used to evaluate read quality, and only reads with a quality score of at least Q30 were used for the Trinity assembly. Corset clustering was performed to identify unigenes and estimate read counts. All unigenes were annotated against the NCBI nr database (E-value: 1E–5). Differentially expressed genes were identified using edgeR (FDR < 0.05). SSRs were mined using MISA, and candidate genes were selected from the list of differentially expressed genes.

SOAPdenovo-Trans and Velvet-Oases assemblies were done using k-mer sizes of 21–71 nt at 10-nt intervals, while the Trinity assembly used its pre-determined 25-nt k-mer size. The best transcriptome assembly was determined based on N50 statistics, the highest total number of gene clusters produced by the gene clustering software Corset v. 1.0.1 (Davidson and Oshlack 2014), and the highest ratio of clusters to transcripts per assembler. Afterward, pooled EC and ES reads were also assembled using Trinity, followed by clustering with Corset. Unigenes were selected as the longest transcript of each gene cluster. Assembly statistics such as N50, size, and gene count were also measured using custom scripts. The scripts are publicly available on GitHub (https://github.com/jeevangelista/bioinformatics).
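For readers unfamiliar with the N50 statistic used to compare the assemblies, the following Python sketch computes it from a list of transcript lengths. It follows the standard definition and is not the authors' custom script; the toy lengths are made up.

```python
def n50(lengths):
    """Return the N50 of a set of sequence lengths.

    N50 is the length L such that sequences of length >= L together
    account for at least half of the total assembled length.
    """
    if not lengths:
        raise ValueError("at least one sequence length is required")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Toy example: total length is 1,500, so the halfway point is 750.
# 500 + 400 = 900 >= 750, hence the N50 is 400.
toy_lengths = [100, 200, 300, 400, 500]
print(n50(toy_lengths))
```

A larger N50 indicates that more of the assembly is contained in long transcripts, which is why it is used here as one criterion for choosing an assembler.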

Functional Annotation of the BAYT Transcriptome

Unigenes were annotated using the MPI-BLAST BLASTx command-line tool against the NCBI non-redundant protein database with an E-value cut-off of 1E–20. InterProScan (Jones et al. 2014, Zdobnov and Apweiler 2001) and gene ontology (GO) mapping were done using the Blast2GO 3.0 pipeline (Conesa et al. 2005), followed by KEGG enzyme pathway analysis. Open reading frame annotation and protein translation were done using the ab initio gene identification software GeneMarkS-T (Tang et al. 2015).

Protein ortholog clustering was done on the translated protein FASTA file using the web-based pipeline OrthoVenn (Wang et al. 2015). Orthologous genes in the merged BAYT transcriptome were compared with those of the previously sequenced Hainan Tall (HNT) (Fan et al. 2013) and Laguna Tall (LAGT) transcriptomes and the built-in Oryza sativa indica transcriptome from the OrthoVenn database (https://orthovenn2.bioinfotoolkits.net).

Assembly Validation

The assembly was validated by testing for the presence of transcripts via quantitative real-time PCR (qPCR) of housekeeping genes found in the Trinity assembly. Primers for sequences encoding glyceraldehyde 3-phosphate dehydrogenase (GAPDH), ubiquitin-conjugating enzyme E2 10 (UBC10), and agmatine coumaroyltransferase (ACT) were designed using Primer3 1.0.1 (Rozen and Skaletsky 2000). The SuperScript VILO cDNA synthesis kit was used to prepare cDNA from the total RNA extract following the product manual. Eighty nanograms (80 ng) of each sample was subjected to qPCR analysis using the Applied Biosystems 7500 Fast real-time PCR system and Fast SYBR Green master mix, with ROX as a passive reference and 18S as an endogenous control.

Differential Expression Analysis of Genes in Coconut ES and EC

Abundance estimation was performed using the Corset pipeline. Read counts from the individual tissue assemblies were normalized using the trimmed mean of M-values (TMM) method. BAYT 13 (high yield) was treated as the primary contrast condition, while BAYT 4 (low yield) was treated as the primary reference condition. A simple pairwise differential expression analysis was conducted using the edgeR Bioconductor package within the Blast2GO Pro Suite 4.0.7 (Götz et al. 2008, Robinson et al. 2010).

A generalized linear model (GLM) likelihood ratio test was applied, and the false discovery rate (FDR) threshold was set to < 0.05. Differentially expressed genes were annotated against the NCBI nr protein database with an E-value cut-off of 1E–05. GO mapping and validation were performed using the Blast2GO suite (Conesa et al. 2005).
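The text does not state how the FDR values were derived from the test p-values. Assuming they are Benjamini-Hochberg adjusted p-values, a common choice for this kind of analysis, the adjustment can be sketched in Python as follows. The p-values below are invented for illustration.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical raw p-values for five genes.
raw = [0.001, 0.008, 0.039, 0.041, 0.6]
fdr = benjamini_hochberg(raw)
significant = [i for i, q in enumerate(fdr) if q < 0.05]
print(fdr, significant)
```

In this toy case, the third and fourth genes have raw p-values below 0.05 but adjusted values just above it, so only the first two genes pass the FDR < 0.05 cut-off.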

Selection for Candidate Genes Associated with High Nut Yield

All annotated differentially expressed genes were selected, and their corresponding amino acid sequences were aligned with their top BLAST hits. Sequences with low homology were discarded. Gene-targeted primers were designed for all transcripts with significant annotation using Primer3 1.0.1 with the following settings: primer length of 18–27 bp, GC content of 20–80%, Tm of 57–63 °C, and a PCR product size of 80–500 bp. The primers were designed to target regions covered by the BLAST hit.
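The primer constraints listed above can be expressed as a simple filter. The Python sketch below checks only primer length, GC content, and product size; the Tm window is left out because the melting temperature depends on a thermodynamic model that the text does not specify. The primer sequences are hypothetical and the function is an illustration, not a substitute for Primer3.

```python
def gc_percent(primer):
    """Percentage of G and C bases in a primer sequence."""
    primer = primer.upper()
    return 100.0 * (primer.count("G") + primer.count("C")) / len(primer)


def primer_pair_ok(forward, reverse, product_size,
                   min_len=18, max_len=27,
                   min_gc=20.0, max_gc=80.0,
                   min_product=80, max_product=500):
    """Check a primer pair against the length, GC, and product-size settings."""
    for primer in (forward, reverse):
        if not min_len <= len(primer) <= max_len:
            return False
        if not min_gc <= gc_percent(primer) <= max_gc:
            return False
    return min_product <= product_size <= max_product


# Hypothetical primer pair, for illustration only.
fwd = "ATGCGTACGTTAGCCTAGGA"   # 20 nt, 50% GC
rev = "TTAGGCATCGATCCGTAA"     # 18 nt, about 44% GC
print(primer_pair_ok(fwd, rev, product_size=250))
```

For the SSR primers described in the next paragraph, the same function could be called with min_product=100 and max_product=300.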

Differentially expressed ES and EC unigenes were also subjected to SSR mining using the PrimerPro pipeline (http://webdocs.cs.ualberta.ca/∼yifeng/primerpro/). Mononucleotide repeats of at least 10 units and motifs of 2–6 nucleotides with at least five perfect repeats were mined using the MISA 1.0 (Thiel et al. 2003) algorithm of the pipeline, while Primer3 1.0.1 was used to design the primers with the following settings: primer length of 18–27 bp, GC content of 20–80%, Tm of 57–63 °C, and a PCR product size of 100–300 bp. All sequences with poor annotation were discarded.
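The repeat thresholds above (at least 10 units for mononucleotides and at least five units for motifs of 2–6 nt) can be illustrated with a small regular-expression search. This Python sketch is a simplified stand-in for perfect-repeat detection and is not the MISA implementation; the test sequence is invented.

```python
import re

# Minimum number of repeat units per motif length, as quoted in the text.
MIN_REPEATS = {1: 10, 2: 5, 3: 5, 4: 5, 5: 5, 6: 5}


def find_ssrs(seq):
    """Return (motif, repeat_count, start) for each perfect SSR found."""
    seq = seq.upper()
    hits = []
    for unit_len, min_rep in MIN_REPEATS.items():
        # A motif of unit_len bases followed by at least min_rep - 1 copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_rep - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            # Skip motifs that are repeats of a shorter unit (e.g. "AA", "ATAT"),
            # since those are reported under the shorter motif length.
            if any(motif == motif[:k] * (unit_len // k)
                   for k in range(1, unit_len) if unit_len % k == 0):
                continue
            hits.append((motif, len(match.group(0)) // unit_len, match.start()))
    return hits


# Invented sequence: a run of 12 A's and six copies of AG.
example = "GGC" + "A" * 12 + "TTCC" + "AG" * 6 + "CCT"
print(find_ssrs(example))
```

Each hit gives the motif, the number of repeat units, and the 0-based start position, which is enough to place flanking primers around the repeat.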

RESULTS AND DISCUSSION

Analysis of BAYT ES and EC Transcriptomes Suggests That Lipid Acts as the Primary Carbon Storage of the Developing Coconut Seed

PCA-ZRC hosts around 286 coconut germplasm accessions of various hybrids and foreign and local varieties. ES and EC from the BAYT parental population (Block 04) and a high-copra-selected population (Block 13A) were used for RNA-seq analysis of BAYT. Illumina sequencing of each tissue yielded 66.9–88.9 million raw reads, for a total of 586,631,950 100-bp paired-end reads. A Phred score threshold of Q30 was set to ensure high-quality reads, and about 43% of the reads were discarded as a result. The remaining 334,181,119 reads (56.9%) were used for de novo transcriptome assembly using SOAPdenovo-Trans, Velvet-Oases, and Trinity.

Comparison of the different assemblies showed that the Velvet-Oases and Trinity assemblies have larger N50 transcript sizes than the SOAPdenovo-Trans assembly (Table S1). Comparison of cumulative length also shows that the SOAPdenovo-Trans assembly has a much smaller cumulative length and genome coverage (genome size estimated to be 2.1 Gb) than both Velvet-Oases and Trinity. Using a smaller k-mer size, and hence lower stringency, Velvet-Oases shows a cumulative assembly size comparable with that of Trinity, albeit with a higher number of contigs. This indicates that Trinity has a much larger mean transcript size than Velvet-Oases. Corset clustering of transcripts and identification of unigenes showed that SOAPdenovo-51 (k-mer size 51) has the highest number of unigenes at 136,713, while Trinity has 126,576 and Oases-41 has 104,736 unigenes. However, the Trinity assembly has the highest ratio of unigenes to total transcripts among the three, indicating that it contains fewer redundant transcripts than the others. This makes Trinity the most suitable assembler for downstream use in this context. The basic bioinformatics workflow for differential expression analysis used in this study is outlined in Figure 1. Previous coconut transcriptome studies have also used Trinity (Fan et al. 2013, Nejat et al. 2015).
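The two clustering-based criteria used here, picking the longest transcript of each cluster as its unigene and comparing the ratio of clusters to transcripts, can be sketched in Python as follows. The transcript IDs, lengths, and cluster assignments are hypothetical; in practice they would come from the assembly FASTA file and the clustering output.

```python
def select_unigenes(transcript_lengths, transcript_to_cluster):
    """Map each cluster ID to its longest transcript (the unigene)."""
    best = {}
    for transcript, cluster in transcript_to_cluster.items():
        current = best.get(cluster)
        if current is None or transcript_lengths[transcript] > transcript_lengths[current]:
            best[cluster] = transcript
    return best


def cluster_to_transcript_ratio(transcript_to_cluster):
    """Number of clusters divided by number of transcripts.

    A value closer to 1 means fewer redundant transcripts per gene cluster.
    """
    n_clusters = len(set(transcript_to_cluster.values()))
    return n_clusters / len(transcript_to_cluster)


# Hypothetical toy assembly: four transcripts grouped into three clusters.
lengths = {"t1": 500, "t2": 1200, "t3": 800, "t4": 300}
clusters = {"t1": "c1", "t2": "c1", "t3": "c2", "t4": "c3"}

unigenes = select_unigenes(lengths, clusters)
ratio = cluster_to_transcript_ratio(clusters)
print(unigenes, ratio)
```

Under this measure, an assembler that yields many near-duplicate transcripts per cluster scores a lower ratio, which matches the reasoning used to prefer Trinity.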

Coding region prediction and in silico translation were also done using the unsupervised ab initio gene prediction program GeneMarkS-T (Tang et al. 2015). GeneMarkS-T computes log-odds scores for each prediction per transcript and outputs the highest-scoring prediction as the putative gene, with only one predicted gene per transcript. The Trinity assembly of BAYT was found to have 46,475 complete coding sequences, while the Velvet-Oases assembly has 41,993 and the SOAPdenovo-Trans assembly has 40,033. Gene prediction in the Trinity assemblies of individual tissues identified 36,737 predicted proteins in the EC and 34,185 in the ES.

Predicted proteins in the BAYT Trinity assembly were used in evaluating orthologous groups of LAGT, HNT, and rice transcriptomes. Due to the limitation on the input size of the program, only the BAYT-Trinity assembly was directly compared to HNT, LAGT, and rice assemblies. Separate comparisons for BAYT-SOAP and Oases assemblies can be found in Table II and Figure II. The different BAYT assemblies were shown to have minimal differences in the number of orthologous proteins clustered with HNT.

Clustering of orthologous proteins in the BAYT-Trinity assembly produced 28,778 clusters, with most of the orthologous clusters shared by at least two organisms (10,107) and only 4,836 singletons. BAYT-Trinity has 1,109 unique protein clusters while sharing a high number of orthologous proteins with LAGT (19,280) and HNT (13,598). The rice (Oryza sativa indica) transcriptome has the fewest shared proteins at 10,417 (Figure 2). The higher number of similar proteins between BAYT and LAGT compared with the others could be attributed to their small phylogenetic distance, since BAYT is a localized advanced generation of LAGT. GO enrichment of BAYT singletons showed that ‘adenosylhomocysteinase activity,’ ‘sucrose synthase activity,’ and ‘auxin influx transmembrane transporter activity’ are the unique processes found only in the BAYT assembly (Table III).

Individual tissue assemblies of EC and ES were also done using the same Trinity pipeline. These assemblies were much smaller than the combined assembly, with 234,588 and 284,696 transcripts for EC and ES, respectively. The individual tissue assemblies also have much lower assembly statistics (N50, mean, and median) than the merged Trinity assembly. Corset clustering resulted in 95,474 unigenes in the EC and 103,979 unigenes in the ES. The lower assembly statistics can be attributed to each tissue assembly having used only a subset of the reads used by the merged assembly, and thus lacking reads that are available only in the other subset. Nonetheless, individual tissue assemblies were used in the differential gene expression analysis for nut yield.

EC and ES were also compared to analyze contrasting processes available for each tissue. Figure 2 shows around 93.66% similarity in orthologous proteins with only 518 unique protein clusters on EC tissue assembly; meanwhile, ES tissue assembly has 699 unique protein clusters. GO enrichment reveals unique processes only available to each tissue with the ES assembly having ‘RNA Pol II salt stress response regulation,’ ‘nuclear protein import processes,’ and ‘etioplast organization;’ while EC has ‘molecular oxygen oxidoreductase activity,’ ‘DNA integration,’ ‘RNA directed DNA polymerase activity,’ ‘endonuclease activity,’ and ‘paclitaxel biosynthetic process.’

The BAYT assembly was annotated against the NCBI non-redundant plant (nr-plant) and SwissProt protein databases using a local BLASTx program. Up to 35,650 unigenes (28.1%) initially matched with the nr-plant database (E-value: 1E–20), while 14,662 and 5,413 of the remaining unigenes matched with the nr-plant and SwissProt databases (E-value: 1E–5), respectively. Top hit species include the African oil palm (Elaeis guineensis) and date palm (Phoenix dactylifera), followed by Vitis vinifera and Musa acuminata (Figure I). The BLAST hits matched the nearest palm relatives rather than HNT or other coconut variety transcriptomes because of the small number of coconut transcriptome accessions available in the NCBI database.

Table 1. Pairwise differential expression results for nut tissues.

                                      ES         EC
Number of features                    103,979    95,474
Number of features after filtering    103,978    95,474
Upregulated genes (logFC > 0.0)       30         1,165
Downregulated genes (logFC < 0.0)     15         735


In-depth analysis of functions and processes in the transcriptome was done through GO annotation and mapping using the Blast2GO pipeline. There were 31,254 unigenes (with BLAST matches) assigned to at least one GO ID, giving a total of 132,963 functional terms. Functional classification of the unigenes was performed using the WEGO online software (Ye et al. 2006).

GO mapping and enrichment were also done on both individual tissue assemblies to compare the processes evident in each. EC has 23,400 genes with 106,237 GO IDs, while ES has 118,932 functional terms mapped in 25,121 unigenes. There are no significant differences in the abundance of functional processes between the individual tissue assemblies and the merged assembly (Figure 3).

Most of the functional terms mapped to biological processes with 55,072 (41.4%) terms in the merged assembly and 48,512 (45.7%) and 54,222 (45.6%) terms in the EC and ES, respectively. These functional terms mostly mapped to cellular metabolic processes, stimulus-response, and organizational processes. Around 1.1% of functional terms mapped to cell death processes that mostly include apoptosis regulation and developmentally programmed cell death, while 1.5% of the functional terms mapped to growth and developmental processes. The presence of cell death and growth processes indicates potential high turn-over of cells due to the ongoing development among seven-month-old nut tissues. A plant defense functional term, plant defensin, or defensin-like gene was mapped only in the ES, which may function in protecting the coconut meat against infectious diseases such as fungal pathogens (Balandín et al. 2005).
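The biological-process percentages quoted above follow directly from the term counts given in the text. A short Python check, using only those figures:

```python
# (biological-process terms, total functional terms) per assembly, from the text.
go_counts = {
    "merged": (55_072, 132_963),
    "EC": (48_512, 106_237),
    "ES": (54_222, 118_932),
}

# Share of functional terms that map to biological processes, in percent.
share = {
    name: round(100 * bp_terms / total_terms, 1)
    for name, (bp_terms, total_terms) in go_counts.items()
}
print(share)
```

The computed shares agree with the 41.4%, 45.7%, and 45.6% reported for the merged, EC, and ES assemblies.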

Other mapped terms were related to regulation, biosynthetic processes, and cell communication processes. On cellular components, mapped terms were mostly intracellular and organelle components. Several terms also mapped to external encapsulating structures (2.8%) and symplasts (4.8%), which is expected for ES and EC tissues providing support for the developing embryo (Kozieradzka-Kiszkurno and Płachno 2012, Sebaa and Harche 2014, Stadler et al. 2005).

On molecular function, most of the terms were related to binding and catalytic processes. Hydrolase, lyase, isomerase, and oxidoreductase activities were among the top terms in catalytic activity. Interestingly, a few transposase genes also mapped to the transcriptome. Ion binding, lipid binding, and transmembrane transport were also among the top mapped terms of molecular functions (Figure 3).

Further analyses of level 1 processes and pathways in the BAYT merged assembly were done using the Annocript pipeline (Musacchia et al. 2015).

Figure 2. BAYT assembly comparisons through orthologous gene homology (left) and tissue comparison of EC and ES (right). GeneMarkS-T was used to predict protein sequences from the BAYT transcriptome. All predicted protein sequences were then subjected to OrthoVenn clustering to compare orthologous sequences among HNT, LAGT, and Oryza sativa indica. Orthologs in the individual ES and EC assemblies were also compared. The figure shows the number of orthologous sequences shared by the different assemblies.

Figure 3. GO classification of the BAYT unigenes. Unigenes from the BAYT assembly (merged EC-ES) and from the EC- and ES-specific assemblies were used for GO mapping and show a similar trend among the level 3 GO terms. Blast2GO was used to map GO terms to the merged assembly (in red) and the individual EC (blue) and ES (yellow) assemblies. The figure shows the percentage of genes in each assembly that mapped to GO terms under ‘Cellular Component,’ ‘Molecular Function,’ and ‘Biological Process.’

Figure 4. Analysis of Level 1 processes and pathways in the BAYT transcriptome. The figure shows that ‘Protein modification’ has the largest number of mapped terms, followed by ‘Amino-Acid biosynthesis,’ ‘Lipid metabolism,’ and ‘Carbohydrate degradation.’ All transcripts from the BAYT assembly were mapped against the KEGG pathway database. The figure shows the percentage of reads that mapped to the different pathways and processes.

Protein modification (30.69%) and amino-acid biosynthesis (9.42%) are the top pathways found, followed by lipid metabolism and carbohydrate degradation at 6.32% and 6.17%, respectively (Figure 4). The high share of carbohydrate degradation and lipid metabolism indicates the importance of these processes in building a carbon reservoir for the developing fruit and embryo. The high abundance of these pathways, coupled with the relatively low abundance of carbohydrate biosynthesis pathways (2.17%), implies that lipid acts as the primary carbon storage of the developing coconut seed. Storage lipid formation often depends on the conversion of soluble sugars, such as sucrose, to lipids and may account for 40–50% of total carbon in oilseed crops (Murphy et al. 1993, Vigeolas et al. 2003).

Carbon storage in lipid form is reversed during germination and seedling establishment to allow rapid embryo development through increased activity of carbohydrate biosynthesis pathways (Eastmond et al. 2000, To et al. 2002). Further functional annotation of the genes was done by searching the predicted protein sequences against the COG database (Tatusov et al. 2000) using Reverse Position-Specific BLAST (RPS-BLAST).

Aside from the ‘General function [R]’ cluster, most genes were grouped into DNA- and expression-related clusters such as ‘Translation [J]’ (5.50%), ‘Transcription [K]’ (9.04%), ‘Replication, recombination and repair [L]’ (8.62%), and ‘Posttranslational modification [O]’ (7.82%) (Figure 5). Signal transduction and carbohydrate metabolism are also among the most abundant clusters, with 1,752 genes (8.1%) and 1,255 genes (5.86%), respectively. The smallest clusters are ‘Nuclear Structure [Y]’ (0.04%) and ‘Cell motility [N]’ (0.17% of matched genes).

Validation of the individual tissue assemblies was done via qPCR of housekeeping genes (Figure III). It can be observed that GAPDH, UBC10, and ACT exhibit a similar pattern of expression in high-yielding and low-yielding nut tissues.

High Nut Yield Results From Concerted Actions of Several Transcription Activators and Regulatory Proteins

Individual tissue assemblies of ES and EC were used in the differential expression analysis. Tissues from BAYT 13 (high yield) were treated as the primary contrast condition, while tissues from BAYT 4 (low yield) were treated as the primary reference condition (Tables IV and V). Upregulated genes (logFC > 0.0) were classified as significantly enriched in BAYT 13 relative to BAYT 4, whereas downregulated genes (logFC < 0.0) were more highly expressed in BAYT 4 than in BAYT 13. Counts obtained from the Corset pipeline were normalized using TMM, and transcripts with zero read counts were discarded. A GLM likelihood ratio test was applied to identify differentially expressed sequences (FDR < 0.05). An FDR of 0.05 implies that about 5% of the genes called differentially expressed are expected to be false positives (Conesa et al. 2005). Analysis with lower FDR cutoffs would characterize DE genes with higher certainty. Table I summarizes the differential expression results for the two nut tissues, and Figure IV shows the MA and volcano plots for the differential analysis in the two tissues.
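The FDR cutoff and the logFC sign convention described above can be illustrated with a small Python sketch. It is not the edgeR code used in the study: it assumes the FDR values come from a Benjamini-Hochberg adjustment of per-gene p-values, and the function names and the four example genes are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values (FDR) in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that monotonicity is enforced.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def classify(log_fc, fdr, fdr_cutoff=0.05):
    """Label a gene relative to the reference condition (BAYT 4)."""
    if fdr >= fdr_cutoff:
        return "not significant"
    return "up in BAYT 13" if log_fc > 0.0 else "down in BAYT 13"


# Hypothetical p-values and logFC values for four genes (illustration only).
p_vals = [0.001, 0.04, 0.03, 0.5]
log_fcs = [2.5, -1.2, 0.8, 0.1]
fdrs = benjamini_hochberg(p_vals)
labels = [classify(fc, q) for fc, q in zip(log_fcs, fdrs)]
```

In this toy input only the first gene passes the 0.05 cutoff after adjustment, which shows why a raw p-value below 0.05 is not enough to call a gene differentially expressed.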

The BAYT ES has 45 differentially expressed genes, 30 of which are upregulated in high-yielding palms, whereas 1,900 differentially expressed genes were identified in the EC, with more than 1,000 genes upregulated in high-yielding palms. All differentially expressed genes were annotated against the NCBI non-redundant protein database (E-value: 1E–5) and mapped against GO databases using the Blast2Go software. Out of the 45 genes in the ES, only three have a substantial annotation, while only 79 of the 1,900 genes in the EC were annotated. Most of the differentially expressed genes have no match against the databases or map poorly to known functional domains. Purine metabolism is the top pathway enriched in the differentially expressed genes, followed by thiamine metabolism, oxidative phosphorylation, and starch and sucrose metabolism. Remarkably, most of the sequences with BLAST hits have very poor sequence similarity with their corresponding hits. These sequences were omitted from downstream analyses in order to better identify gene targets with functional information.

Figure 5. Functional characterization of annotated BAYT unigenes. The unigene clusters were classified using RPS-BLAST on conserved domain database (COG). Genemark S-T was used to predict protein sequences from the BAYT transcriptome. Predicted protein sequences were then subjected to RPS-BLAST against the COG database. The figure shows the number of sequences identified for each function [A–Z].

Sequences of all annotated differentially expressed unigenes were closely inspected. Each unigene was aligned to the full sequence of its top hit. Only 64 of these differentially expressed genes aligned well with their BLAST hits and were thus used in selecting candidate genes implicated in increased copra production. Table VI summarizes the differentially expressed genes with significant annotation and their corresponding primer sequences, which can be used in developing gene-targeted and functional markers (FMs). These sequences have been shared with a PGC–PCA-ZRC project team for validation of molecular markers in field populations. Three gene targets were identified in the ES and 61 in the EC. The fold change indicates the degree of expression of these genes in the high-yielding palms compared with the low-yielding palms. A greater absolute fold change implies a greater difference in the transcript abundance of a gene between the high-yielding and low-yielding tissues.
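As a reading aid for Table VI, the following Python sketch converts a logFC value into a linear expression ratio and recomputes the %GC of a primer. It assumes that logFC is a base-2 logarithm, as edgeR reports it; the function names are hypothetical and the Tm column is not recomputed here.

```python
def expression_ratio(log2_fc):
    """Convert a log2 fold change into a linear high/low expression ratio."""
    return 2.0 ** log2_fc


def gc_percent(primer):
    """Return the percentage of G and C bases in a primer sequence."""
    primer = primer.upper()
    gc_count = sum(1 for base in primer if base in "GC")
    return 100.0 * gc_count / len(primer)


# Values copied from the cell division cycle 27 homolog row of Table VI.
cdc27_ratio = expression_ratio(10.23944649)      # on the order of a thousand-fold
cdc27_gc = gc_percent("TGTGTTCTTACCACGGCAAA")    # first listed primer of that row
```

A logFC of 1 therefore corresponds to a twofold higher abundance in the high-yielding palm, and a logFC of -1 to half the abundance.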

Based on the annotation of the differentially expressed genes, it seems that high nut yield in coconut is the result of concerted actions of several transcription activators and regulatory proteins. Coconut meat (ES) or copra production is likely enhanced in high-yielding palms due to a complex system of pathway interactions in coconut.

Cell division plays a vital role in regulating nut yield. This process is evidently favored by the upregulation of CDC 27 in the ES (Table VI). In plants, CDC 27 is encoded by the Hobbit gene, which is involved in cell cycle progression and cell differentiation (Blilou et al. 2002). However, several cell division-related genes were found to be downregulated in the EC of high-yielding palms, such as STK 13 and tyrosine-sulfated glycopeptide receptor 1 (PSYR1). STK 13 is a serine/threonine kinase and a key regulator of mitosis. It is a component of the chromosomal passenger complex, which is involved in organizing microtubules during mitosis. PSYR1, on the other hand, is the receptor for a tyrosine-sulfated glycopeptide, an extracellular ligand that promotes cell proliferation and expansion (Amano et al. 2007).

There are also several cell wall-modifying enzymes differentially expressed in the tissues studied. Pectin is among the primary components of the plant cell wall and can be classified into homogalacturonan, xylogalacturonan, apiogalacturonan, and rhamnogalacturonan I and II (Wolf et al. 2009). A probable pectinesterase inhibitor 28 is upregulated in high-yielding EC, suggesting reduced activity of pectin methylesterases implicated in the regulation of fruit development, carbohydrate metabolism, and cell wall extension (Carmadella et al. 2000). Moreover, a subtilisin-like protease, whose expression has been previously demonstrated to elevate levels of pectin methyltransferase activity in the cell wall (Rautengarten et al. 2008), was found to be downregulated in the high-yielding palm. This protease actively triggers the accumulation of cell wall-modifying enzymes necessary for loosening the outer primary cell wall to facilitate extrusion or swelling of the pectinaceous mucilage upon imbibition during seed development (Rautengarten et al. 2008).

MYB308, a transcription factor involved in lignin biosynthesis, is also upregulated in the EC of high-yielding palms. Lignin is the second most abundant constituent of plant cell walls after cellulose and is produced through the phenylpropanoid pathway, which is regulated by MYB308 (Tamagnone et al. 1998). In addition, several genes encoding cellulose synthase G3, a Golgi-localized beta-glycan synthase that polymerizes the backbones of plant cell wall hemicelluloses, are also upregulated. Another class of cell wall-modifying enzymes downregulated in the high-yielding EC is the xyloglucan endotransglucosylase/hydrolases, which cleave and re-ligate xyloglucan polymers, an essential constituent of the primary cell wall involved in ABA-regulated cell wall construction of growing tissues (Cho et al. 2006).

Our results suggest that, while processes involving cell division and antimicrobial activity are enriched in the ES of the high-yielding palm, the EC is enriched in processes relating to EC hardening, lignification, and secondary cell wall formation. However, these processes in the EC have very high carbon and energy demand (Carrari and Fernie 2006), which may compete with the carbon flux to the developing seed and ES. Interestingly, several genes related to the citric acid cycle and ATP synthesis-coupled electron transport were also found to be upregulated in the EC. These genes are ATP synthase, NADH dehydrogenase, NADH-plastoquinone oxidoreductase, subunits of photosystems I and II, and pyruvate dehydrogenase. It seems that high copra yield in coconut is an effect of a more efficient energy and carbon metabolism in the developing fruit.

Several genes related to oxidative stress were also found to be upregulated in the high-yielding nut. One of these is non-symbiotic hemoglobin 1. Non-symbiotic hemoglobin scavenges nitric oxide (NO) molecules, thereby improving oxidative stress tolerance and repression of energy-demanding processes (Thiel et al. 2011). In plants, NO is an important signaling molecule that can greatly influence energy and carbon metabolism. A previous study (Thiel et al. 2011) showed that overexpression of non-symbiotic hemoglobin in Arabidopsis seeds under non-stress conditions resulted in higher seed yield. Another upregulated gene is 1-Cys peroxiredoxin, which relieves mild oxidative stress and functions as a molecular chaperone during seed development (Kim et al. 2011).

Moreover, the SUMO-conjugating enzyme SCE 1 is also among the upregulated genes. Sumoylation is a posttranslational modification similar to ubiquitylation wherein SUMO is covalently attached to its target protein. It is implicated in diverse developmental processes and was found to be induced in response to abiotic and biotic stress. In maize, the strong expression of SCE1 and accumulation of SUMO conjugates were observed in the developing ES (Augustine et al. 2016).

There are also several differentially expressed genes involved in lipid metabolism and fatty acid synthesis such as very-long-chain (3R)-3-hydroxyacyl-CoA dehydratase 2 (PAS 2), 3-oxoacyl-ACP synthase, and GDSL esterase lipase 5. Other notable genes include receptor-like kinases, which have been demonstrated to regulate ES development and stress-related signaling pathways (Gao et al. 2012). There are also three unigenes with high homology with protein sequences from Phoenix dactylifera and Elaeis guineensis but do not yet have characterized functions. These proteins may play critical roles in regulating expression directed towards nut yield and should be further studied.

Both individual assemblies were mined for SSR markers. Up to 19,147 unigenes in the EC and 17,394 in the ES were found to contain SSRs. SSR mining in the differentially expressed genes revealed that only one transcript in the ES and two in the EC contain SSRs. These transcripts encode an uncharacterized protein, U3 MPP10, and the non-symbiotic hemoglobin 1 protein, with a perfect dinucleotide repeat, a compound repeat, and a perfect mononucleotide repeat, respectively (Table VII).
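What a perfect SSR looks like in sequence terms can be sketched with a regular expression in Python. This is only an illustration, not the SSR mining tool used in the study: the minimum repeat counts below are assumed thresholds, the function name is hypothetical, and compound repeats are not handled.

```python
import re

# Minimum repeat counts per motif length. These thresholds are assumptions
# chosen for illustration and are not taken from the study.
MIN_REPEATS = {1: 10, 2: 6, 3: 5}


def find_perfect_ssrs(sequence):
    """Return (start, motif, repeat_count) for each perfect SSR found."""
    sequence = sequence.upper()
    hits = []
    for motif_len, min_rep in MIN_REPEATS.items():
        # The group captures one motif; the backreference demands at least
        # (min_rep - 1) further uninterrupted copies of it.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (motif_len, min_rep - 1))
        for match in pattern.finditer(sequence):
            motif = match.group(1)
            # Skip motifs that are a single base repeated, e.g. "AA" or "AAA".
            if motif_len > 1 and len(set(motif)) == 1:
                continue
            hits.append((match.start(), motif, len(match.group(0)) // motif_len))
    return sorted(hits)


# Hypothetical sequence containing an (AG)7 and an (A)12 repeat.
example = "CCGT" + "AG" * 7 + "TTCG" + "A" * 12 + "GC"
ssrs = find_perfect_ssrs(example)
```

Here `ssrs` holds one dinucleotide hit starting at position 4 and one mononucleotide hit starting at position 22 (0-based).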

CONCLUSION

There is a need to improve copra production in Philippine coconuts to be able to compete effectively in the world market. Increased copra production is best achieved by increasing both the number of nuts per palm and the weight of copra per nut. However, these traits have very different underlying mechanisms, and a single causative gene that confers enhanced copra yield is unlikely to be found. Despite belonging to an already established high-yielding coconut variety, BAYT palms do not all perform well in terms of copra production. The yield potential of coconut plantations can be further improved by employing molecular breeding strategies during seedling selection to ensure that only high-yield palms are propagated and distributed to coconut farmers.

This study presents a differential expression analysis between high-yielding and low-yielding BAYT palms through transcriptome sequencing. Characterization of the BAYT transcriptome and the differentially expressed genes showed that nut yield is influenced by diverse biological processes in the developing nut. Three events in the seven-month-old coconut fruit seem to be the main contributing factors leading to increased copra production: (1) cell division, (2) enhanced carbon and energy metabolism, and (3) an activated stress response. The solid ES or coconut meat, from which copra and CNO are derived, showed upregulation in cell cycle-related processes, while the EC or nutshell showed enriched secondary cell wall formation with upregulated ATP production and carbon metabolism. For both tissues, the upregulation of several abiotic and biotic stress response genes in the high-yielding tissues suggests that a plant’s ability to adapt to environmental stresses and pathogenic infections is crucial in improving copra production.

Possible markers for further evaluation were developed from the candidate genes found to be differentially expressed in the coconut ES and EC. However, validation and testing of these markers are necessary before they can be fully used in seedling selection and crop breeding strategies. Nonetheless, these gene targets can be further evaluated as potential targets for genetic manipulation and other studies improving coconut productivity.

ACKNOWLEDGMENTS

This project was funded by the Department of Science and Technology – Philippine Council for Agriculture, Aquatic and Natural Resources Research and Development under the Coconut Genomics Program of the PGC. All coconut samples were provided by PCA-ZRC. We thank the PGC – Core Facility for Bioinformatics for their assistance and use of their computing resources.

NOTES ON APPENDICES

The complete appendices section of the study is accessible at http://philjournsci.dost.gov.ph

REFERENCES

AMANO Y, TSUBOUCHI H, SHINOHARA H, OGAWA M, MATSUBAYASHI Y. 2007. Tyrosine-sulfated glycopeptide involved in cellular proliferation and expansion in Arabidopsis. Proceedings of the National Academy of Sciences 104(46): 18333–18338.

AUGUSTINE RC, YORK SL, RYTZ TC, VIERSTRA RD. 2016. Defining the SUMO system in maize: SUMOylation is up-regulated during endosperm development and rapidly induced by stress. Plant Physiology 171(3): 2191–2210.

BALANDÍN M, ROYO J, GÓMEZ E, MUNIZ LM, MOLINA A, HUEROS G. 2005. A protective role for the embryo surrounding region of the maize endosperm, as evidenced by the characterisation of ZmESR-6, a defensin gene specifically expressed in this region. Plant Mol. Biol. 58: 269–282.

BATUGAL P, BOURDEIX R, BAUDOUIN L. 2009. Chapter 10: Coconut Breeding. In: Jain SM, Priyadarshan PM eds. Breeding Plantation Tree Crops: Tropical Species. New York: Springer New York.

BATUGAL P, RAO VR, OLIVER J eds. 2005. Coconut Genetic Resources. Serdang, Malaysia: International Plant Genetic Resources Institute – Regional Office for Asia, the Pacific and Oceania (IPGRI-APO).

BOLGER AM, LOHSE M, USADEL B. 2014. Trimmomatic: A flexible trimmer for Illumina sequence data. Bioinformatics 30(15): 2114–2120.

BLILOU I, FRUGIER F, FOLMER S, SERRALBO O, WILLEMSEN V, WOLKENFELT H, ELOY NB, FERREIRA PC, WEISBEEK P, SCHERES B. 2002. The Arabidopsis HOBBIT gene encodes a CDC27 homolog that links the plant cell cycle to progression of cell differentiation. Genes & Development 16(19): 2566–2575.

CARMADELLA L, CARRATORE V, CIARDIELLO MA, SERVILLO L, BALESTREIRI C, GIOVANE A. 2000. Kiwi protein inhibitor of pectin methylesterase: Amino-acid sequence and structural importance of two disulfide bridges. European J of Biochemistry 267(14): 4561–4565.

CARRARI F, FERNIE AR. 2006. Metabolic regulation underlying tomato fruit development. J Exp. Bot. 57: 1883–1897.

CHO SK, KIM JE, PARK JA, EOM TJ, KIM WT. 2006. Constitutive expression of abiotic stress-inducible hot pepper CaXTH3, which encodes a xyloglucan endotransglucosylase/hydrolase homolog, improves drought and salt tolerance in transgenic Arabidopsis plants. FEBS Letters 580(13): 3136–3144.

CONESA A, GÖTZ S, GARCÍA-GÓMEZ JM, TEROL J, TALÓN M, ROBLES M. 2005. Blast2GO: A universal tool for annotation, visualization and analysis in functional genomics research. Bioinformatics 21: 3674–3676.

DAVIDSON NM, OSHLACK A. 2014. Corset: Enabling differential gene expression analysis for de novo assembled transcriptomes. Genome Biol. 15: 410.

DEBMANDAL M, MANDAL S. 2011. Coconut (Cocos nucifera L.: Arecaceae): In health promotion and disease prevention. Asian Pac. J Trop. Med. 4: 241–247.

DYER JM, STYMNE S, GREEN AG, CARLSSON AS. 2008. High-value oils from plants. Plant J 54: 640–655.

EASTMOND PJ, GERMAIN V, LANGE PR, BRYCE JH, SMITH SM, GRAHAM IA. 2000. Postgerminative growth and lipid catabolism in oilseeds lacking the glyoxylate cycle. Proc. Natl. Acad. Sci. 97: 5669–5674.

FAN H, XIAO Y, YANG Y, XIA W, MASON AS, XIA Z, QIAO F, ZHAO S, TANG H. 2013. RNA-Seq Analysis of Cocos nucifera: Transcriptome Sequencing and De Novo Assembly for subsequent functional genomics approaches. PLoS One 8(3): e59997.

[FAO] Food and Agriculture Organization. 2001. Non-forest tree plantations. Report based on the work of W. Killmann. Forest Plantation Thematic Papers, Working Paper 6.

GAO LL, XUE HW. 2012. Global analysis of expression profiles of rice receptor-like kinase genes. Molecular Plant 5(1): 143–153.

GÖTZ S, GARCÍA-GÓMEZ JM, TEROL J, WILLIAMS TD, NAGARAJ SH, NUEDA MJ, ROBLES M, TALÓN M, DOPAZO J, CONESA A. 2008. High-throughput functional annotation and data mining with the Blast2Go suite. Nucleic Acids Research 36(10): 3420–3435.

GRABHERR MG, HAAS BJ, YASSOUR M, LEVIN JZ, THOMPSON DA, AMIT I, ADICONIS X, FAN L, RAYCHOWDHURY R, ZENG Q, CHEN Z, MAUCELI E, HACOHEN N, GNIRKE A, RHIND N, DI PALMA F, BIRREN BW, NUSBAUM C, LINDBLAD-TOH K, FRIEDMAN N, REGEV A. 2011. Full-length transcriptome assembly from RNA-Seq data without a reference genome. Nature Biotechnology 29(7): 644–652.

HERRÁN A, ESTIOKO L, BECKER D, RODRIGUEZ MJB, ROHDE W, RITTER E. 2000. Linkage mapping and QTL analysis in coconut (Cocos nucifera L.). Theor. Appl. Genet. 101: 292–300.

INDEX MUNDI. 2019. Philippine Coconut Oil Exports by Year. Retrieved from http://www.indexmundi.com/agriculture/?country=ph&commodity=coconut-oil&graph=exports on 11 Mar 2019.

JONES P, BINNS D, CHANG HY, FRASER M, LI W, MCANULLA C, MCWILLIAM H, MASLEN J, MITCHELL A, NUKA G, PESSEAT S, QUINN AF, SANGRADOR-VEGAS A, SCHEREMETJEW M, YONG SY, LOPEZ R, HUNTER S. 2014. InterProScan 5: Genome-scale protein function classification. Bioinformatics 30: 1236–1240.

KIM SY, PAENG SK, NAWKAR GM, MAIBAM P, LEE ES, KIM KS, LEE DH, PARK DJ, KANG SB, KIM MR, LEE JH, KIM YH, KIM WY, KANG CH. 2011. The 1-Cys peroxiredoxin, a regulator of seed dormancy, functions as a molecular chaperone under oxidative stress conditions. Plant Science 181(2): 119–124.

KOZIERADZKA-KISZKURNO M, PŁACHNO BJ. 2012. Are there symplastic connections between the endosperm and embryo in some angiosperms? A lesson from the Crassulaceae family. Protoplasma 249: 1081–1089.

LIU Y, HE Z, APPELS R, XIA X. 2012. Functional markers in wheat: Current status and future prospects. Theor. Appl. Genet. 125: 1–10.

LUJAJI F, BERECZKY A, JANOSI L, NOVAK C, MBARAWA M. 2010. Cetane number and thermal properties of vegetable oil, biodiesel, 1-butanol and diesel blends. J Therm. Anal. Calorim. 102: 1175–1181.

MURPHY DJ, RAWSTHORNE S, HILLS MJ. 1993. Storage lipid formation in seeds. Seed Sci. Res. 3: 79–95.

MUSACCHIA F, BASU S, PETROSINO G, SALVEMINI M, SANGES R. 2015. Annocript: A flexible pipeline for the annotation of transcriptomes able to identify putative long noncoding RNAs. Bioinformatics 31: 2199–2201.

NEJAT N, CAHILL DM, VADAMALAI G, ZIEMANN M, ROOKES J, NADERALI N. 2015. Transcriptomics-based analysis using RNA-Seq of the coconut (Cocos nucifera) leaf in response to yellow decline phytoplasma infection. Mol. Genet. Genomics 290: 1899–1910.

[PCA] Philippine Coconut Authority. 2019. History of the coconut industry in the Philippines. Retrieved from www.pca.da.gov.ph/index.php/2015-10-26-03-15-57/2015-10-26-03-19-51 on 3 Jul 2019.

POCZAI P, VARGA I, LAOS M, CSEH A, BELL N, VALKONEN JP, HYVÖNEN J. 2013. Advances in plant gene-targeted and functional markers: A review. Plant Methods 9: 1–32.

RAUTENGARTEN C, USADEL B, NEUMETZLER L, HARTMANN J, BÜSSIS D, ALTMANN T. 2008. A subtilisin-like serine protease essential for mucilage release from Arabidopsis seed coats. The Plant Journal 54(3): 466–480.

RIVERA R, EDWARDS KJ, BARKER J, ARNOLD GM, AYAD G, HODGKIN T, KARP A. 1999. Isolation and characterization of polymorphic microsatellites in Cocos nucifera L. Genome 42: 668–675.

ROBINSON MD, MCCARTHY DJ, SMYTH GK. 2010. edgeR: A Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics 26: 139–140.

ROZEN S, SKALETSKY H. 2000. Primer3 on the WWW for general users and for biologist programmers. Methods in Molecular Biology 132: 365–386.

SALMANI MH, REHMAN S, ZAIDI K, HASAN AK. 2015. Study of ignition characteristics of microemulsion of coconut oil under off diesel engine conditions. Eng. Sci. Technol. Int. J 18: 318–324.

SCHULZ MH, ZERBINO DR, VINGRON M, BIRNEY E. 2012. Oases: Robust de novo RNA-seq assembly across the dynamic range of expression levels. Bioinformatics 28: 1086–1092.

STADLER R, LAUTERBACH C, SAUER N. 2005. Cell-to-cell movement of green fluorescent protein reveals post-phloem transport in the outer integument and identifies symplastic domains in Arabidopsis seeds and embryos. Plant Physiology 139(2): 701–712.

SEBAA HS, HARCHE MK. 2014. Anatomical structure and ultrastructure of the endocarp cell walls of Argania spinosa (L.) Skeels (Sapotaceae). Micron 67: 100–106.

TAMAGNONE L, MERIDA A, PARR A, MACKAY S, CULIANEZ-MACIA FA, ROBERTS K, MARTIN C. 1998. The AmMYB308 and AmMYB330 transcription factors from Antirrhinum regulate phenylpropanoid and lignin biosynthesis in transgenic tobacco. The Plant Cell 10(2): 135–154.

TANG S, LOMSADZE A, BORODOVSKY M. 2015. Identification of protein coding regions in RNA transcripts. Nucleic Acids Res. 43: e78–e78.

TATUSOV RL, GALPERIN MY, NATALE DA, KOONIN EV. 2000. The COG database: A tool for genome-scale analysis of protein functions and evolution. Nucleic Acids Res. 28(1): 33–36.

TEULAT B, ALDAM C, TREHIN R, LEBRUN P, BARKER JHA, ARNOLD GM, KARP A, BAUDOUIN L, ROGNON F. 2000. An analysis of genetic diversity in coconut (Cocos nucifera) populations from across the geographic range using sequence-tagged microsatellites (SSRs) and AFLPs. Theor. Appl. Genet. 100: 764–771.

THIEL J, ROLLETSCHEK H, FRIEDEL S, LUNN JE, NGUYEN TH, FEIL R, TSCHIERSCH H, MÜLLER M, BORISJUK L. 2011. Seed-specific elevation of non-symbiotic hemoglobin AtHb1: beneficial effects and underlying molecular networks in Arabidopsis thaliana. BMC Plant Biology 11(1): 48.

THIEL T, MICHALEK W, VARSHNEY R, GRANER A. 2003. Exploiting EST databases for the development and characterization of gene-derived SSR-markers in barley (Hordeum vulgare L.). Theor. Appl. Genet. 106: 411.

TO JP, REITER WD, GIBSON SI. 2002. Mobilization of seed storage lipid by Arabidopsis seedlings is retarded in the presence of exogenous sugars. BMC Plant Biol. 2: 4.

VARSHNEY RK, GRANER A, SORRELLS ME. 2005. Genic microsatellite markers in plants: Features and applications. Trends Biotechnol. 23: 48–55.

VIGEOLAS H, VAN DONGEN JT, WALDECK P, HÜHN D, GEIGENBERGER P. 2003. Lipid Storage Metabolism Is Limited by the Prevailing Low Oxygen Concentrations within Developing Seeds of Oilseed Rape. Plant Physiol. 133: 2048–2060.

WANG Y, COLEMAN-DERR D, CHEN G, GU YQ. 2015. OrthoVenn: A web server for genome wide comparison and annotation of orthologous clusters across multiple species. Nucleic Acids Res. 43(W1): W78–W84.

WOLF S, MOUILLE G, PELLOUX J. 2009. Homogalacturonan methyl-esterification and plant development. Molecular Plant 2(5): 851–860.

XIE Y, WU G, TANG J, LUO R, PATTERSON J, LIU S, HUANG W, HE G, GU S, LI S, ZHOU X, LAM TW, LI Y, XU X, WONG GK, WANG J. 2014. SOAPdenovo-Trans: De novo transcriptome assembly with short RNA-Seq reads. Bioinformatics 30(12): 1660–1666.

YE J, FANG L, ZHENG H, ZHANG Y, CHEN J, ZHANG Z, WANG J, LI S, LI R, BOLUND L, WANG J. 2006. WEGO: A web tool for plotting GO annotations. Nucleic Acids Res. 34: W293–W297.

YOUNG FVK. 1983. Palm Kernel and coconut oils: Analytical characteristics, process technology and uses. J Am. Oil Chem. Soc. 60: 374–379.

ZDOBNOV EM, APWEILER R. 2001. InterProScan – An integration platform for the signature-recognition methods in InterPro. Bioinformatics 17: 847–848.

Table I. Assembly statistics of de novo assemblers: Trinity, Oases, and SoapDenovo-Trans.

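| Assembly | K-mer size | N50 | Contigs | Assembly size (Mb) | Genome coverage (%) | Average depth |
|---|---|---|---|---|---|---|
| Oases | 21 | 2230 | 564,559 | 416.250286 | 19.8 | 80.3 |
| Oases | 31 | 2287 | 560,265 | 391.387566 | 18.6 | 85.4 |
| Oases | 41 | 2261 | 354,308 | 334.105343 | 15.9 | 100.0 |
| Oases | 51 | 1984 | 246,892 | 263.783198 | 12.6 | 126.7 |
| Oases | 61 | 2103 | 201,497 | 224.048342 | 10.7 | 149.2 |
| Oases | 71 | 1803 | 162,240 | 161.301693 | 7.7 | 207.2 |
| SoapDenovo-Trans | 21 | 270 | 1,177,433 | 221.761631 | 10.6 | 150.7 |
| SoapDenovo-Trans | 31 | 357 | 1,005,851 | 230.237508 | 11.0 | 145.1 |
| SoapDenovo-Trans | 41 | 501 | 592,353 | 188.927227 | 9.0 | 176.9 |
| SoapDenovo-Trans | 51 | 635 | 407,728 | 163.567959 | 7.8 | 204.3 |
| SoapDenovo-Trans | 61 | 692 | 322,181 | 144.669139 | 6.9 | 231.0 |
| SoapDenovo-Trans | 71 | 758 | 224,240 | 116.882899 | 5.6 | 285.9 |
| Trinity | 25 | 2229 | 336,873 | 395.83131 | 18.8 | 84.4 |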
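The N50 column in Table I can be read with the help of a short Python sketch. It uses the common definition of N50 (the contig length at which the cumulative length of contigs, sorted from longest to shortest, first reaches half of the assembly size); the function name and the example contig lengths are hypothetical and are not data from the study.

```python
def n50(contig_lengths):
    """Return the N50 of an assembly given the lengths of its contigs."""
    total = sum(contig_lengths)
    running = 0
    # Add contigs from longest to shortest until half the assembly is covered.
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    raise ValueError("contig_lengths must not be empty")


# Hypothetical contig lengths (bp), for illustration only.
example_lengths = [100, 200, 300, 400, 500]
example_n50 = n50(example_lengths)
```

A higher N50 means that more of the assembly sits in long contigs, which is why the Oases and Trinity rows compare favorably with the SoapDenovo-Trans rows on this metric.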

Table II. Protein predictions and orthologous clusters in different assemblies.

| Assembly | Proteins | Clusters | Singletons | Non-redundant proteins (total) | Non-redundant proteins (percentage) |
|---|---|---|---|---|---|
| BAYT – SOAP | 40033 | 21012 | 14601 | 35613 | 88.95910874 |
| BAYT – Oases | 41993 | 20533 | 15573 | 36106 | 85.98099683 |
| BAYT – Trinity | 46475 | 22475 | 18171 | 40646 | 87.457773 |
| Oryza indica | 40745 | 14543 | 12824 | 27367 | 67.1665235 |
| LAGT | 36635 | 21932 | 10672 | 32604 | 88.99686093 |
| HNT | 29951 | 16357 | 8789 | 25146 | 83.95712998 |
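The last two columns of Table II appear to follow from the first three: in every row, the non-redundant total equals clusters plus singletons, and the percentage is that total divided by the number of predicted proteins. This relationship is inferred from the numbers rather than stated in the text, and the function name below is hypothetical.

```python
def non_redundant_summary(proteins, clusters, singletons):
    """Return (non-redundant total, percentage of all predicted proteins)."""
    total = clusters + singletons
    return total, 100.0 * total / proteins


# Values copied from the BAYT – Trinity row of Table II.
trinity_total, trinity_pct = non_redundant_summary(46475, 22475, 18171)
```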

Table III. Singleton GO annotation of orthologous proteins.

| GO ID | GO enriched processes | Level 1 | p-value |
|---|---|---|---|
| ES only processes | | | |
| GO:0061416 | regulation of transcription from RNA polymerase II promoter in response to salt stress | biological_process | 0.023981 |
| GO:0009662 | etioplast organization | biological_process | 0.023981 |
| GO:0006606 | protein import into nucleus | biological_process | 0.037633 |
| EC only processes | | | |
| GO:0003964 | RNA-directed DNA polymerase activity | molecular_function | 0.020672 |
| GO:0015074 | DNA integration | biological_process | 0.023644 |
| GO:0016709 | oxidoreductase activity, acting on paired donors, with incorporation or reduction of molecular oxygen, NAD(P)H as one donor, and incorporation of one atom of oxygen | molecular_function | 0.032449 |
| GO:0042617 | paclitaxel biosynthetic process | biological_process | 0.032449 |
| GO:0004519 | endonuclease activity | molecular_function | 0.042952 |
| BAYT only processes | | | |
| GO:0004013 | adenosylhomocysteinase activity | molecular_function | 0.01777 |
| GO:0016157 | sucrose synthase activity | molecular_function | 0.027867 |
| GO:0010328 | auxin influx transmembrane transporter activity | molecular_function | 0.038758 |

Table IV. Experimental design used for the pairwise differential expression analysis of the BAYT EC.

| Sample | Library size (pre-filter) (bp) | Library size (post-filter) (bp) | Condition | Palm |
|---|---|---|---|---|
| BAYT 13 EC 1 | 32744684 | 32744684 (100 %) | Control | 1 |
| BAYT 13 EC 2 | 30596900 | 30596900 (100 %) | Control | 1 |
| BAYT 4 EC 1 | 26629426 | 26629426 (100 %) | Treatment | 2 |
| BAYT 4 EC 2 | 29013035 | 29013035 (100 %) | Treatment | 2 |

Table V. Experimental design used for the pairwise differential expression analysis of the BAYT ES.

| Sample | Library size (pre-filter) (bp) | Library size (post-filter) (bp) | Condition | Palm |
|---|---|---|---|---|
| BAYT 13 ES 1 | 27936946 | 27936946 (100 %) | Control | 1 |
| BAYT 13 ES 2 | 31496401 | 31496401 (100 %) | Control | 1 |
| BAYT 4 ES 1 | 20445675 | 20445675 (100 %) | Treatment | 2 |
| BAYT 4 ES 2 | 29586162 | 29586162 (100 %) | Treatment | 2 |

Table VI. Candidate genes potentially involved in increased copra yield.

| Name | Description | logFC | FDR | Primers | Tm | %GC | Prod. size (bp) |
|---|---|---|---|---|---|---|---|
| Cluster-11151.0* | uncharacterized protein | 1.899859781 | 0.003636541 | TGATGTCGCCTATCCTAAGTGA | 59.73 | 45 | 123 |
| | | | | CGGAGAAGGATTGAACCAAG | 59.66 | 50 | |
| Cluster-69820.0* | cell division cycle 27 homolog isoform 2 | 10.23944649 | 0.015068085 | TGTGTTCTTACCACGGCAAA | 60.15 | 45 | 214 |
| | | | | GGAGGATGCCAAAGTGTTGT | 59.97 | 50 | |
| Cluster-63256.0* | disease resistance RGA3 | 7.059235399 | 0.042617636 | GGTTGCCCATGTTGAGAGAT | 59.93 | 50 | 380 |
| | | | | GGTGAGGGAGCAGAGTGAAG | 59.99 | 60 | |
| Cluster-9737.0 | Cysteine-rich RLK 8 | 5.583639497 | 0.02932236 | TGTGGGACACTATGGCTGAA | 60.11 | 50 | 147 |
| | | | | TCGTCCTTGTCAATGCTCTG | 59.98 | 50 | |
| Cluster-79308.0 | cytochrome b5 | -3.872425511 | 0.027661966 | GAAGGATGCAACCAACGATT | 59.94 | 45 | 208 |
| | | | | CAAGATCAAGACAGGCACGA | 59.98 | 50 | |
| Cluster-66310.0 | U3 MPP10 | 3.857271117 | 0.033108013 | CGGTGGAAGACGAGAAGAGA | 60.52 | 55 | 141 |
| | | | | AAGGGAGGAGAGCGGAGTAG | 59.97 | 60 | |
| Cluster-26403.0 | pectinesterase inhibitor 28 | 6.407987158 | 0.004372268 | CTCGACACATGCTTCTACGC | 59.62 | 55 | 150 |
| | | | | GATCCAGGTTTGCCCATAGA | 59.89 | 50 | |
| Cluster-383.0 | uncharacterized protein | -6.781461397 | 0.013775674 | AGCACTGCATGCACAAAGTC | 60.06 | 50 | 162 |
| | | | | CCGCTTCACTGATAGCCTTC | 59.98 | 55 | |
| Cluster-52289.0 | photosystem II N | 5.405947874 | 2.64E-004 | GGAAACAGCAACCCTAGTCG | 59.73 | 55 | 131 |
| | | | | ACTAGTCCCCGTGTTCTTCG | 59.21 | 55 | |
| Cluster-182.0 | pathogenesis-related 1-like | -8.185652938 | 0.001040674 | GGAGCTGTGAGATCGAGTCC | 59.95 | 60 | 182 |
| | | | | GCTAAAAGGCATTGCTGAGG | 59.98 | 50 | |
| Cluster-6427.0 | uncharacterized protein LOC109506209 | 4.159850614 | 0.038036122 | GGCCACATATCCTGCACTTT | 59.96 | 50 | 317 |
| | | | | TTGTCAACCCCAAAAAGGAG | 59.94 | 45 | |
| Cluster-1142.0 | tyrosine-sulfated glycopeptide receptor 1 | -4.883055754 | 0.001524976 | GCAGCCACTTGTTCCTTCTC | 60 | 55 | 364 |
| | | | | AATACCACGAGGTGGTCAGC | 60 | 55 | |
| Cluster-51538.0 | pre translocase subunit | 9.910346774 | 7.07E-004 | CTGTTGCTGTTGGTTGCAGT | 59.95 | 50 | 117 |
| | | | | TACACGACGACCTTGCTGAC | 59.9 | 55 | |
| Cluster-41241.0 | NADH-plastoquinone oxidoreductase subunit 7 | 5.044086813 | 6.43E-004 | TGCACCAGAACAATTGGAAA | 60.09 | 40 | 211 |
| | | | | CATCATTCGCATACCTGTGG | 59.95 | 50 | |
| Cluster-41974.0 | NRT1 PTR FAMILY | -4.264618813 | 0.003786818 | AATAACGCCAACTGGACAGG | 59.99 | 50 | 369 |
| | | | | CCCGATGTTGATCGAGAAGT | 60.07 | 50 | |
| Cluster-57477.0 | non-symbiotic hemoglobin 1 | 4.059655185 | 0.00887499 | TCATGGACCGTGATGAAGAA | 60.05 | 45 | 105 |
| | | | | GCGCAGAAAAGAGAACAACC | 60 | 50 | |
| Cluster-59617.0 | pyruvate dehydrogenase E1 component subunit beta | -3.874207337 | 0.019363258 | TCACAGCGGTTGGAATCATA | 60.07 | 45 | 201 |
| | | | | CTCTGCCTCCTCCAGACAAC | 59.99 | 60 | |
| Cluster-122.0 | serine-threonine kinase STK13 | -6.578718897 | 0.019900298 | TGCCCTTCAAAAGAGAGCAT | 59.96 | 45 | 159 |
| | | | | TTTTCGTCCACCAACCATTT | 60.21 | 40 | |
| Cluster-47457.0 | subtilisin-like protease | -2.81202487 | 0.020096783 | CACTACGCTGGATCGTGAGA | 60.01 | 55 | 377 |
| | | | | CTGGTGACATTCTGGTGGTG | 60 | 55 | |
| Cluster-81036.0 | lecithin-cholesterol acyltransferase-like 4 | -6.693001017 | 0.016301405 | GCAGGATGGTTCTGGTGAAT | 59.93 | 50 | 165 |
| | | | | CGAGTTTCATTTGCCCATTT | 59.94 | 40 | |
| Cluster-48945.0 | RNA polymerase beta subunit | 5.476349387 | 2.84E-004 | CGGGGTATTCAGGAAGAACA | 59.93 | 50 | 269 |
| | | | | GAAACCCCCGAATCTAGAGC | 60.04 | 55 | |
| Cluster-35501.0 | serine-threonine kinase STK2 | 6.108634977 | 0.00887499 | GAGGCCTTCAGCTCTCTCCT | 60.24 | 60 | 129 |
| | | | | ATCTTTGCCGAACGAATCAC | 60.08 | 45 | |
| Cluster-50481.0 | NADH-plastoquinone oxidoreductase subunit 4 | 4.023384784 | 0.006546047 | GAGACTGGGAATCGATGGAC | 59.47 | 55 | 238 |
| | | | | TCCTCCCCACATGGATAAAA | 60.13 | 45 | |
| Cluster-49279.0 | ribosomal S2 | 8.19711574 | 0.006050978 | TTAGGGTCGAACCAACAACC | 59.83 | 50 | 230 |
| | | | | CCCCGTTCAAGAATCACTGT | 59.97 | 50 | |
| Cluster-35036.1 | myb-related 308 | 3.766755234 | 0.036543915 | ACCGTCTCATCGCCTACATC | 60.1 | 55 | 155 |
| | | | | CTCTTCCTCGGTGAAGTTGC | 59.99 | 55 | |
| Cluster-79635.0 | ACT domain-containing ACR3 | 3.620286485 | 0.001000276 | CCACCTTTCAAGCTGTGGAT | 60.11 | 50 | 377 |
| | | | | GTAGGGGCCATCAGATGAGA | 60.03 | 55 | |
| Cluster-82179.0 | pentatricopeptide repeat-containing protein | -4.028412313 | 0.020143994 | TGCTTGTGCCAAGAAAGATG | 59.99 | 45 | 109 |
| | | | | GATGCTGGCTAACATGCTGA | 59.98 | 50 | |
| Cluster-50998.0 | 1-Cys peroxiredoxin | 3.266714533 | 0.00305012 | ACTACGTAGGCGATGGTTGG | 60.02 | 55 | 164 |
| | | | | TTCCATGTGGCACTTGACAT | 59.97 | 45 | |
| Cluster-42553.0 | SUMO-conjugating enzyme SCE1 | 4.062634002 | 0.005852489 | CGAAGCCAGAAACACTGTCA | 60.02 | 50 | 235 |
| | | | | CTCCACCCACTGTCCTCATT | 59.96 | 55 | |
| Cluster-129.0 | 1-aminocyclopropane-1-carboxylate | -7.467778851 | 0.003950223 | GGCAATTCCTGTCATCGACT | 60.08 | 50 | 120 |
| | | | | GGAATCCCATGGTTCACAAC | 60.03 | 50 | |
| Cluster-129.1 | 1-aminocyclopropane-1-carboxylate | -4.389856173 | 0.04983725 | TCTGGAAAAGAGAGGGCTGA | 60.07 | 50 | 89 |
| | | | | GGAATCCCATGGTTCACAAC | 60.03 | 50 | |
| Cluster-50560.0 | hypothetical chloroplast RF2 | 4.795427592 | 1.47E-004 | ACCCCCGAATTTGGAGTATC | 60.02 | 50 | 170 |
| | | | | AGGAAGAAGCCCCATCAAAT | 59.9 | 45 | |
| Cluster-82235.0 | purine permease 3-like | 3.077207817 | 0.039875236 | CTCCGCCTCTACTTTGTTCG | 60.01 | 55 | 206 |
| | | | | CAGTCGAGGGCGATAAGAAG | 59.97 | 55 | |
| Cluster-58200.0 | 30S ribosomal protein S1 | -5.394166325 | 4.93E-004 | ACCTTACGGTGCCTTCATTG | 59.99 | 50 | 159 |
| | | | | CTTACTCGGCCTCTTTCACG | 60.01 | 55 | |


Table VI. Candidate genes potentially involved in increased copra yield.

| ID | Description | logFC | FDR | Primers | Tm | GC% | Prod. size |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Cluster-62964.0 | mediator of RNA polII transcription subunit 32 | 3.184296477 | 0.032660842 | AAGAATCGGTGGGAGGTGTT | 60.75 | 50 | 196 |
| | | | | TCGAGTGAACCACATGAAGC | 59.84 | 50 | |
| Cluster-45834.0 | inositol phosphorylceramide glucuronosyltransferase 1 | 4.753878601 | 0.001094 | TCTGGGTGCAGTCTCTGTTG | 60.02 | 55 | 80 |
| | | | | GCCTGGGTACAACTGCAAAT | 60 | 50 | |
| Cluster-39407.0 | very-long-chain (3R)-3-hydroxyacyl- dehydratase 2 | 5.821671441 | 0.016983877 | GTCTCCTCTCCACCACAAGC | 59.84 | 60 | 167 |
| | | | | AGATCTCCAGCAGAGGCGTA | 60.12 | 55 | |
| Cluster-84564.0 | transcript antisense to ribosomal RNA | 6.492765779 | 2.16E-004 | ACAAAGGCTTGGTGTCCAAC | 60.01 | 50 | 117 |
| | | | | CCCTAGGCCCGAGTTGTAAT | 60.33 | 55 | |
| Cluster-53671.0 | GDSL esterase lipase 5 | 2.978689438 | 0.03614376 | GGGAGAGTTGGGTGATGAGA | 60.05 | 55 | 499 |
| | | | | CTTCCCGCAACTGAAGACTC | 59.99 | 55 | |
| Cluster-48979.0 | ENHANCED DISEASE RESISTANCE 2 | 3.267262738 | 0.036543915 | TCTGGTGCAGACAGTGGAAG | 60.02 | 55 | 489 |
| | | | | ATCACGGGGCCATATAAACA | 60.04 | 45 | |
| Cluster-24884.0 | cellulose synthase G3 | 4.557607643 | 0.014124903 | TTCTCCCCATCACTTCAAGG | 60.04 | 50 | 309 |
| | | | | CAATTGGAGCCCACGTAGTT | 59.99 | 50 | |
| Cluster-45489.0 | G- coupled receptor 1 | 3.208735503 | 0.034925916 | GATGTTGGGTTTGCTGGACT | 59.97 | 50 | 183 |
| | | | | CATCAGCGACACAAGCTCAT | 60.02 | 50 | |
| Cluster-11750.0 | gag-pol polyprotein | 4.192758154 | 0.012439889 | GTTCCATGGCCTCACAACTT | 59.97 | 50 | 500 |
| | | | | TGCCTGCTTTGTGCATTATC | 59.97 | 45 | |
| Cluster-29022.2 | xyloglucan endotransglucosylase hydrolase 23 | -6.447337891 | 0.024567664 | AGATGCAGTTCCACCTCTGG | 60.26 | 55 | 99 |
| | | | | GTGTGCCATCAACCATGAAG | 59.97 | 50 | |
| Cluster-68234.0 | 3-oxoacyl-[acyl-carrier- ] synthase | 2.559893439 | 0.037870289 | AGTGATCAGCGAAGCCTCAT | 59.98 | 50 | 114 |
| | | | | GGAAAGGGACCATTTGGTTT | 60.03 | 45 | |
| Cluster-47578.1 | alpha-tubulin | 4.367874667 | 0.006175533 | GCACTGGTCTTCAAGGCTTC | 60 | 55 | 95 |
| | | | | GTCGACAGACAGACGCTCAA | 60.19 | 55 | |
| Cluster-58509.0 | fructose-1,6-biphosphatase | 6.057182354 | 0.049217648 | TGCTGATGTTCATCGTACCC | 59.53 | 50 | 134 |
| | | | | GACCACCAGCTTGTTCCATC | 60.52 | 55 | |
| Cluster-609.6 | xyloglucan endotransglucosylase hydrolase | -4.380909447 | 0.003892738 | CAAGGACCAACCCATGATGA | 61.73 | 50 | 80 |
| | | | | ACATTGGTGTGGAGCGTGTA | 60.03 | 50 | |
| Cluster-50832.0 | ATP synthase beta subunit | 5.986601826 | 0.022893679 | CTATCCTTGGGTTGGACGAA | 59.93 | 50 | 96 |
| | | | | CCACGAAGAAGGGTTGTGAT | 59.97 | 50 | |
| Cluster-226.0 | CBF-like transcription factor 2 | -9.577766035 | 5.96E-005 | CGAGGAGGCATACTCGACTG | 60.95 | 60 | 132 |
| | | | | ACCTCGCAGACCCACTTGT | 90.73 | 58 | |
| Cluster-59126.0 | EXPORTIN 1A | 4.504108814 | 8.09E-004 | CAGTTGAGCAACGAGATGGA | 59.98 | 50 | 94 |
| | | | | TATAGCCTCTCCTGCCGAAA | 59.94 | 50 | |
| Cluster-70170.0 | aldehyde dehydrogenase family 2 member | 3.928928109 | 0.015041299 | ACAAGGTCCTCAGGTTGACG | 60.15 | 55 | 165 |
| | | | | ATCGCCATTTCATCCTTCAC | 59.9 | 45 | |
| Cluster-50488.0 | 11S globulin isoform 2 | 4.910889691 | 9.10E-004 | TGCTGAGAGGGTGGTTCTCT | 59.99 | 55 | 93 |
| | | | | CCTCGTCCTCCGGTACAATA | 59.95 | 55 | |
| Cluster-53451.0 | ATP synthase subunit | 7.618550495 | 0.001656577 | TTTGGTGCGAGGAACTTACC | 60.11 | 50 | 232 |
| | | | | GCCACGACCAGTCCATAAAT | 59.82 | 50 | |
| Cluster-68022.0 | phosphatase 2C 51 | 3.388040994 | 0.007071638 | GGTGCCACCGGATATAATTG | 60.04 | 50 | 185 |
| | | | | CCCCAGCCTGAATAATCCTT | 60.28 | 50 | |


Table VI. Candidate genes potentially involved in increased copra yield.

| ID | Description | logFC | FDR | Primers | Tm | GC% | Prod. size |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Cluster-68809.0 | UPF0496 At1g20180 | 6.031951373 | 0.012724548 | TAATCCATTCCCCAATCCAA | 59.95 | 40 | 175 |
| | | | | TCCAACCACTCCACAAATGA | 59.94 | 45 | |
| Cluster-78671.0 | ubiquitin-40S ribosomal S27a | -2.913857271 | 0.015155026 | TCTTCGTCAAGACCCTCACC | 60.24 | 55 | 98 |
| | | | | GCCCTCTTTGTCCTGGATTT | 60.44 | 50 | |
| Cluster-33001.0 | cellulose synthase G3 | 6.894364769 | 0.001183721 | CAAAGCGAGCGATATGACAA | 59.98 | 45 | 237 |
| | | | | CTTTCCACCATCCTTCCTCA | 60.04 | 50 | |
| Cluster-67759.1 | photosystem I subunit IX | 4.554864846 | 0.001339199 | GGAGGATTTTCAATGCGAGA | 60.16 | 45 | 83 |
| | | | | CCTGCTAAAGACCCAAACCA | 60.1 | 50 | |
| Cluster-11613.0 | cellulose synthase G3 | 5.925043709 | 0.013478944 | CTAGCCGATCTTGTCCTTGC | 59.98 | 55 | 323 |
| | | | | CAGCAGAATGGCAACCAGTA | 59.86 | 50 | |
| Cluster-33281.0 | kinesin KIN-5C isoform X2 | 4.610883924 | 0.012591791 | ATCTCGCCACGAGAAAGAAA | 59.96 | 45 | 87 |
| | | | | GCGTTATTCCTCAGCTCGTC | 59.98 | 55 | |
| Cluster-49467.0 | Tubulin beta-8 chain | 3.625522299 | 0.010395999 | GCAAGGAAGCAGAGAATTGC | 60.1 | 50 | 133 |
| | | | | ATCATCATTCGATCCGGGTA | 60.12 | 45 | |
| Cluster-33.1 | Leaf rust 10 disease resistance receptor | -6.321318983 | 0.030273666 | AAATGGTTGGTGGACGAAAA | 60.21 | 40 | 199 |
| | | | | GGTCGATCTACAGGCTGCAT | 60.25 | 55 | |
| Cluster-53457.1 | NADH dehydrogenase | 5.565575156 | 0.020816035 | GCTCATGGGCAGCTTATCTC | 59.95 | 55 | 111 |
| | | | | GCAAAGTTCCCACTGATGGT | 59.97 | 50 | |

*Sequences found in the ES
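The GC% values listed for the Table VI primers can be recomputed directly from the primer sequences. The following is a minimal Python sketch, not part of the original analysis: the function name `gc_percent` is hypothetical, and the three primers with their reported GC% values are copied from the Table VI rows for Cluster-53457.1 and Cluster-226.0.

```python
def gc_percent(primer: str) -> float:
    """Return the percentage of G and C bases in a primer sequence."""
    seq = primer.upper()
    gc = sum(1 for base in seq if base in "GC")
    return 100.0 * gc / len(seq)


# (primer, GC% as reported in Table VI); values copied from the table.
REPORTED = [
    ("GCTCATGGGCAGCTTATCTC", 55),  # Cluster-53457.1, first primer
    ("GCAAAGTTCCCACTGATGGT", 50),  # Cluster-53457.1, second primer
    ("ACCTCGCAGACCCACTTGT", 58),   # Cluster-226.0, second primer (19 nt)
]

for primer, reported in REPORTED:
    computed = gc_percent(primer)
    print(f"{primer}  computed={computed:.1f}  reported={reported}")
```

The same check does not extend to the Tm column, because melting temperature depends on the thermodynamic model and salt settings used by the primer design tool, which are not stated in this part of the paper.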

Table VII. SSR markers mined from the differentially expressed genes.

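| ID | SSR type | SSR | Size | Start | End | Primers | Tm | Prod. size | Target |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Cluster-11151.0* | p2 | (CT)11 | 22 | 60 | 81 | GGAACTGGCTTCCTGGAGTC | 60.0 | 193 | 337–529 |
| | | | | | | TCGGACTTCGATTTTGGGCA | 60.0 | | |
| Cluster-66310.0 | c | (GAG)6gacgacgacgac(GAA)8 | 54 | 619 | 672 | CTACTCCGCTCTCCTCCCTT | 60.1 | 275 | 302–576 |
| | | | | | | CCTCCTCATCTTCCTCCCCA | 60.0 | | |
| Cluster-57477.0 | p1 | (T)11 | 11 | 690 | 700 | AAAGAGAACAACCGCGTTGC | 60.0 | 105 | 446–550 |
| | | | | | | GCTGAAGTCATGGACCGTGA | 60.0 | | |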

*Sequences found in the ES
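The Size, Start/End, Prod. size, and Target columns of Table VII are related by simple arithmetic, which the following Python sketch illustrates. It is not the authors' SSR-mining procedure: the helper names `ssr_length` and `span_length` are hypothetical, and the parser assumes the notation used in the table, where `(MOTIF)n` denotes n tandem copies and lowercase letters denote the spacer inside a compound (type c) SSR.

```python
import re

# One token is either "(MOTIF)count" or a lowercase spacer between repeats.
TOKEN = re.compile(r"\(([ACGT]+)\)(\d+)|([acgt]+)")


def ssr_length(notation: str) -> int:
    """Total length in bases of an SSR written as in Table VII."""
    total = 0
    pos = 0
    for match in TOKEN.finditer(notation):
        if match.start() != pos:
            raise ValueError(f"unexpected text in {notation!r}")
        motif, count, spacer = match.groups()
        if motif:
            total += len(motif) * int(count)
        else:
            total += len(spacer)
        pos = match.end()
    if pos != len(notation):
        raise ValueError(f"unexpected text in {notation!r}")
    return total


def span_length(start: int, end: int) -> int:
    """Length of a 1-based inclusive coordinate range."""
    return end - start + 1


# Rows copied from Table VII:
# (ID, SSR, size, start, end, product size, target start, target end)
ROWS = [
    ("Cluster-11151.0", "(CT)11", 22, 60, 81, 193, 337, 529),
    ("Cluster-66310.0", "(GAG)6gacgacgacgac(GAA)8", 54, 619, 672, 275, 302, 576),
    ("Cluster-57477.0", "(T)11", 11, 690, 700, 105, 446, 550),
]

for cid, ssr, size, start, end, prod, t_start, t_end in ROWS:
    consistent = (
        ssr_length(ssr) == size
        and span_length(start, end) == size
        and span_length(t_start, t_end) == prod
    )
    print(cid, "consistent" if consistent else "inconsistent")
```

All three rows satisfy these relations, which supports reading Start/End and Target as 1-based inclusive coordinates on the unigene.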


Figure I. Species and top-hit distribution of BLAST results. The unigenes of the BAYT assembly, annotated through BLAST against the plant NR database, show the highest match with the African oil palm (Elaeis guineensis) and the date palm (Phoenix dactylifera) for both the total number of hits and the top-hit species identity.

Figure II. Venn diagrams of BAYT – OASES and SOAPdenovo assemblies showing the number of orthologous predicted proteins present in the LAGT, HNT, and rice assemblies.

Figure III. qPCR validation of housekeeping genes in the EC and ES of high-yielding and low-yielding coconut.


Figure IV. Volcano (top) and MA (bottom) plots of read counts in ES and EC. The volcano plot shows the negative log of adjusted p-values versus the fold changes while the MA plot shows the relationship between logFC and average expression level of the differentially expressed genes.
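The volcano-plot coordinates described in the caption can be sketched in a few lines of Python. This is an illustration only, not the authors' plotting code: the function name `volcano_point` is hypothetical, the logarithm is assumed to be base 10 (the caption says only "negative log"), and the three example rows are copied from Table VI. MA-plot coordinates are not shown because the average expression values are not listed in the table.

```python
import math

FDR_CUTOFF = 0.05  # significance threshold stated in the abstract (FDR < 0.05)


def volcano_point(log_fc: float, fdr: float) -> tuple[float, float]:
    """Return (x, y) for a volcano plot: x = logFC, y = -log10(FDR).

    Base 10 is an assumption; the caption does not state the base.
    """
    return (log_fc, -math.log10(fdr))


# (ID, logFC, FDR) copied from Table VI.
ROWS = [
    ("Cluster-226.0", -9.577766035, 5.96e-05),
    ("Cluster-51538.0", 9.910346774, 7.07e-04),
    ("Cluster-129.1", -4.389856173, 0.04983725),
]

for cid, log_fc, fdr in ROWS:
    x, y = volcano_point(log_fc, fdr)
    status = "significant" if fdr < FDR_CUTOFF else "not significant"
    print(f"{cid}: x={x:.2f}, y={y:.2f} ({status})")
```

Genes with a smaller FDR sit higher on the plot, and the sign of logFC places them to the left or right of zero.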
