Please cite this article in press as: Hook, S.E., et al., 454 pyrosequencing-based analysis of gene expression profiles in the amphipod Melita plumulosa: Transcriptome assembly and toxicant induced changes. Aquat. Toxicol. (2014), http://dx.doi.org/10.1016/j.aquatox.2013.11.022

ARTICLE IN PRESS. G Model AQTOX-3691; No. of Pages 16

Aquatic Toxicology xxx (2014) xxx–xxx

Contents lists available at ScienceDirect

Aquatic Toxicology

Journal homepage: www.elsevier.com/locate/aquatox

454 pyrosequencing-based analysis of gene expression profiles in the amphipod Melita plumulosa: Transcriptome assembly and toxicant induced changes

Sharon E. Hook a,∗, Natalie A. Twine b, Stuart L. Simpson a, David A. Spadaro a, Philippe Moncuquet c, Marc R. Wilkins b

a CSIRO Land and Water, Locked Bag 2007, Kirrawee, NSW 2232, Australia
b NSW Systems Biology Initiative, School of Biotechnology and Biomolecular Sciences, University of New South Wales, Sydney, NSW 2052, Australia
c CSIRO Mathematics, Informatics, and Statistics, Acton, ACT, 2601, Australia

∗ Corresponding author. Tel.: +61 02 9710 6839. E-mail address: [email protected] (S.E. Hook).

Article history: Received 23 June 2013; Received in revised form 26 November 2013; Accepted 28 November 2013

Keywords: De novo assembly; Transcriptome assembly; RNA Seq; Amphipod; Toxicogenomics

Abstract

Next generation sequencing using Roche’s 454 pyrosequencing platform can be used to generate genomic information for non-model organisms, although there are bioinformatic challenges associated with these studies. These challenges are compounded by a lack of a standardized protocol to either assemble data or to evaluate the quality of a de novo transcriptome. This study presents an assembly of the control and toxicant responsive transcriptome of Melita plumulosa, an Australian amphipod commonly used in ecotoxicological studies. RNA was harvested from control amphipods, juvenile amphipods, and from amphipods exposed to either metal or diesel contaminated sediments. This RNA was used as the basis for a 454 based transcriptome sequencing effort. Sequencing generated 1.3 million reads from control, juvenile, metal-exposed and diesel-exposed amphipods. Different read filtering and assembly protocols were evaluated to generate an assembly that (i) had an optimal number of contigs; (ii) had long contigs; (iii) contained a suitable representation of conserved genes; and (iv) had long ortholog alignment lengths relative to the length of each contig. A final assembly, generated using fixed-length trimming based on the sequence quality scores, followed by assembly using the MIRA algorithm, produced the best results. The 26,625 contigs generated via this approach were annotated using Blast2GO, and the differential expression between treatments and control was determined by mapping with BWA followed by DESeq. Although the mapping generated low coverage, many differentially expressed contigs, including some with known developmental or toxicological function, were identified. This study demonstrated that 454 pyrosequencing is an effective means of generating reference transcriptome information for organisms, such as the amphipod M. plumulosa, that have no genomic information available in databases or in closely related sequenced species. It also demonstrated how optimization of read filtering protocols and assembly approaches changes the utility of results obtained from next generation sequencing studies, and establishes criteria to determine the quality of a de novo assembly in species lacking a reference genome. This new transcriptomic knowledge provides the genomic foundation for the creation of microarray and qPCR assays, serving as a reference transcriptome in future RNAseq studies, and allowing both the biology and ecotoxicology of this organism to be better understood. This approach will allow genomics-based methodology to be applied to a wider range of environmentally relevant species.

0166-445X/$ see front matter. Crown Copyright © 2013 Published by Elsevier B.V. All rights reserved. http://dx.doi.org/10.1016/j.aquatox.2013.11.022

1. Introduction

Environmental contaminants are often discharged into waterways and are ultimately deposited in sediments. Amphipods are sediment-dwelling crustaceans that are sensitive to many environmental contaminants and are often the first to disappear from contaminated sediments (Mann et al., 2010). As a consequence, they are typically used in toxicity testing. The epibenthic amphipod, Melita plumulosa, is commonly found in estuarine habitats across South Eastern Australia (Hyne et al., 2005; King et al., 2005). It typically lives at the sediment–water interface, and ranges from the intertidal region to depths of up to 25 m (King et al., 2006a). It is a detrital, deposit-feeder and has been shown to take up contaminants from both the sediment pore water and via ingestion (King et al., 2005, 2006a). It efficiently accumulates metals and is more sensitive to metal contaminants than other Australasian



amphipods (King et al., 2006b). It also is easily cultured in the laboratory (Hyne et al., 2005; Mann et al., 2010), making it an ideal species for use in sediment toxicity testing (Simpson and Spadaro, 2011).

The “genomic revolution” has led to significant advances in the field of ecotoxicology, allowing researchers to better examine the mode of contaminant action, stressor interactions, and impacts at environmentally realistic contaminant doses (Poynton and Vulpe, 2009; Villeneuve and Garcia-Reyero, 2011). However, the need for a priori knowledge of the genomic sequence of the organism being studied meant that early studies were largely confined to a few species of fish (Hook, 2010). Far fewer genomics studies have been carried out in crustaceans, despite their environmental and ecotoxicological importance, because of the paucity of sequence information (Stillman et al., 2008). To our knowledge, the only complete crustacean genome is that of Daphnia pulex (Colbourne et al., 2011), although next generation sequencing projects have been completed for other crustaceans including a copepod (Lee et al., 2010), an amphipod commonly used as a model species in developmental biology (Zeng et al., 2011), and for prawns (Kawahara-Miki et al., 2011; Jung et al., 2011; Li et al., 2012; Ma et al., 2012).

Toxicogenomic studies frequently interpret changes in transcript levels as an indication of potential changes in key functions within the organism, i.e. in steroid hormone receptors that may cause endocrine disruption; in oxidative stress responses or in immunological processes (Perez-Casanova et al., 2011; Williams et al., 2013; Wiseman et al., 2013). However, cells have a multitude of regulatory processes in between transcription and translation (Taylor et al., 2013), and not all cellular processes are regulated at the level of the transcript and may instead be regulated via post-translational modification. As a consequence, not all changes detected at the transcript level are observed at the protein level, or more importantly, at the effect level (Taylor et al., 2013). As with most tools used in risk assessment, transcriptomic responses provide an indication of potential changes to higher “organizational levels” (i.e. to functions and potential toxicity) (Poynton and Vulpe, 2009), but in common with other tools do not provide certainty of observing a predicted potential effect (e.g. Nikkinmaa and Rytkonen, 2011; Van Straalen and Feder, 2012).

We intend to apply the recent ecotoxicogenomic advances towards our ongoing work with M. plumulosa. However, like most crustaceans, almost no sequence information was available at the onset of this study. Consequently, we have used Roche 454 pyrosequencing to characterize its transcriptome. Next generation transcriptome sequencing has been used to identify novel gene transcripts and levels of gene expression in many organisms that lack reference genomes (e.g. Lowe et al., 2011; Zeng et al., 2011; Craft et al., 2010; Hudson, 2008). Because of our interest in ecotoxicology, cDNA libraries were created from healthy laboratory cultures of adult organisms, as well as from organisms exposed to metal- and diesel-contaminated sediments that caused reproductive impairment but no change in mortality. We also created a cDNA library from juvenile organisms, to capture genes that could be altered at different developmental stages. The aims of this study were to use RNA Seq from M. plumulosa to: (1) characterize the transcriptome to enable the construction of microarrays and qPCR assays, as well as to serve as a template for low cost transcriptomic assays (such as with ion torrent technology); (2) examine differential gene expression following exposure to environmental contaminants; and (3) to examine differential gene expression at multiple life stages (as discussed in Mehinto et al., 2012). We have also defined criteria by which to judge the quality of a next-generation sequencing (NGS) assembly without the benefit of a reference transcriptome. This should provide guidance to the increasing number of researchers doing de novo transcriptome assembly. The data presented in this work will substantially increase the genomic resources available for this regionally important species. It will also demonstrate the utility of using NGS techniques to understand transcriptional responses to environmental contaminants, and set clear benchmarks that other researchers can use to evaluate their results in de novo transcriptomics projects.

2. Methods

2.1. Test sediments and analyses

The sediments for culturing the amphipods and for use as controls were collected as described previously from Bonnet Bay, NSW, an estuarine site that had been characterized and found to have relatively low concentrations of metal and organic contaminants (Spadaro et al., 2008; Simpson and Spadaro, 2011).

A metal-contaminated sediment was collected from an estuarine river site that had been previously shown to have metal concentrations that cause chronic toxicity, but negligible concentrations of organic contaminants (Simpson and Spadaro, 2011). This sediment was stored at 4 °C in the dark for approximately 10 weeks until used in tests.

A diesel-spiked sediment was prepared in the laboratory by addition of diesel to the control sediment. The diesel-spiked sediments were maintained in glass containers with minimal headspace and lids were securely fastened to avoid losses through evaporation. The spiked sediments were homogenized by vigorous shaking of the containers followed by 2–3 h on a bottle roller 3 times per week and stored at 4 °C with a nitrogen filled headspace when equilibrating. A sediment with 10% (w/v) diesel was initially prepared. More dilute sediments were prepared by mixing 10% diesel sediment with control sediment to create a concentration series of 0.25, 0.5, 1, 2 and 4% (w/v) diesel for chronic toxicity tests. For the gene expression exposures, a sediment containing 5% (w/v) diesel was prepared as described above. (Whenever the phrase gene expression is used in this article, it refers to gene transcription, although we acknowledge that gene expression is also regulated at other levels, e.g. transcript editing, mRNA stability, translation and protein stability.) These sediments were homogenized by vigorous shaking of containers followed by 2–3 h on a bottle roller, at least 3 times per week, for one month before use in tests.
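The dilution series above can be illustrated with a short mass-balance sketch. It assumes that the 10% stock and the control sediment mix linearly, that the control sediment contains no diesel, and that proportions can be expressed by mass; the function name `mixing_masses` and the 100 g batch size are hypothetical and do not come from the study.

```python
def mixing_masses(target_pct, total_mass_g, stock_pct=10.0):
    """Return (stock_g, control_g) needed to reach target_pct diesel.

    Assumes simple linear mixing and a diesel-free control sediment.
    """
    if not 0 <= target_pct <= stock_pct:
        raise ValueError("target must lie between 0 and the stock concentration")
    stock_g = total_mass_g * target_pct / stock_pct
    return stock_g, total_mass_g - stock_g


# Concentration series used for the chronic toxicity tests (% diesel).
series = [0.25, 0.5, 1, 2, 4]

# 100 g per treatment is a hypothetical batch size, not a value from the study.
recipe = {pct: mixing_masses(pct, total_mass_g=100.0) for pct in series}

for pct, (stock_g, control_g) in recipe.items():
    print(f"{pct}% diesel: {stock_g:.1f} g stock + {control_g:.1f} g control")
```

Under these assumptions each treatment is simply the target concentration divided by the stock concentration, expressed as a fraction of the batch.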

The sediments were analysed in terms of physicochemical properties (organic carbon, particle size, acid-volatile sulphide) and concentrations of major contaminants. All glass and plastic-ware for analyses were cleaned by soaking in 10% (v/v) HNO3 (BDH, Analytical Reagent grade) for a minimum of 24 h, followed by thorough rinsing with deionized water (Milli-Q, 18 MΩ cm), or were acetone rinsed for analyses of organic contaminants. All chemicals were analytical reagent grade or equivalent analytical purity. Water pH, salinity, temperature and dissolved oxygen measurements were made with instruments calibrated according to manufacturer instructions. Methods for sediment particle size (by wet sieving through 63 µm nylon sieves followed by gravimetry), particulate organic carbon (POC, by high temperature TOC analyser) and total recoverable metals (TRM, by microwave-assisted aqua regia digestion) were as described in Spadaro et al. (2008). Dissolved metal concentrations in acid digests of water and sediment samples were determined by inductively coupled plasma–atomic emission spectrometry (ICP–AES, Varian 730-ES). Analyses of organic contaminants were made according to standard methods (USEPA, 1996). The organic analyses included polycyclic aromatic hydrocarbons (PAHs) and total petroleum hydrocarbons (TPHs). Past studies had shown that concentrations of tributyltin and most common organochlorine and organophosphate pesticides were generally negligible in the estuarine sediments (Simpson and Spadaro, 2011) and were not tested for in the present study. The following analyses were made for quality assurance: acid-digest blanks; replicates for >20% of samples; analyte sample-spikes for metals; and internal standards for all organic analyses. The certified reference material (CRM), PACS-2 (National Research Council Canada, NRCC) was also analysed for metals. A minimum of three replicates were within 20%; recoveries for spikes and sediment CRM for metals were within 85–99% of expected values and recoveries for organic analyses were between 80 and 120%. The limits of reporting for the various methods were less than 10% of the lowest measured values.

2.2. M. plumulosa culturing and bioassays

M. plumulosa was collected from the Hawkesbury River (NSW, Australia) and was cultured in silty sediments as described previously (Spadaro et al., 2008; Mann et al., 2009). The cultures were fed 1 mg/adult of Sera Micron fish food, in combination with 1 × 10^5 cells of the diatom Phaeodactylum tricornutum per culture, twice a week. Individuals were isolated from the cultures by transferring Nylon mesh patches from the stock cultures, and rinsing individuals off the patch with clean seawater. Gravid females were identified under a dissecting microscope.

All chronic toxicity bioassays and gene-expression exposures were undertaken at a temperature of 21 ± 1 °C in an environmental chamber (Labec Refrigerated Cycling Incubator, Laboratory Equipment Pty) on a 12-h light/12-h dark cycle (light intensity = 3.5 µmol photons/s/m²) for the test duration. For quality control purposes, physicochemical parameters, including dissolved oxygen (>85% saturation was achieved by algal photosynthesis), pH (7.9–8.2), salinity (30 ± 2‰), and temperature (20–22 °C), were monitored at the beginning, periodically throughout the test, and at test termination in at least one replicate test vessel per sediment bioassay to ensure they remained within acceptable limits.

The 10-d amphipod reproduction bioassays were undertaken in 250 mL beakers with 40 g of test sediment and 180 mL of overlying seawater, following the procedure described in Simpson and Spadaro (2011). This test measures, along with survival, the number of embryos and <1 d-old juveniles in the second brood following exposure of M. plumulosa to test sediments over a 10 d period. The exposure condition allows for the removal of juveniles from the first brood which is typically unaffected by contaminants in the test sediment because they were already ‘conceived’ before exposure to test sediments (Mann et al., 2009). The organisms were fed during the exposures. For quality assurance purposes, 9–18 juveniles per female were required in all controls for tests to be considered acceptable.

2.3. Toxicant exposures

Sub-lethal contaminant exposures were chosen to maximize the number of contaminant-induced changes in gene expression (e.g. Poynton et al., 2007). The toxicant exposures to collect organisms for transcriptome sequencing were undertaken in 1 L glass beakers containing 150 g of the test sediment, filled with ∼900 mL of filtered seawater (0.45 µm, 30‰) and allowed to equilibrate overnight at 21 °C in a temperature controlled cabinet. The overlying water was exchanged with fresh seawater before 75 adult amphipods (>0.5 mm, 2–5 months old) were added to the test beaker.

At fixed times (1, 2, 4 and 10 d) during the exposure, replicate exposures were terminated and the organisms collected. The organisms collected from all time points were pooled so that the ultimate stressor responsive library would contain transcripts with different kinetics of response following contaminant exposure. Animals were collected by gently sieving the sediment through 180 µm mesh and washed in a sorting tray to remove any sediment. The exposed animals were pipetted into a 3 mL cryogenic tube, excess seawater was syringed away and then tubes were flash frozen in liquid nitrogen, which both instantly euthanized the organisms and preserved the tissues for RNA extraction. Additional adults were harvested from our laboratory cultures for the control exposures and euthanized by flash freezing. Juvenile amphipods were collected by sieving the sediment through 180 µm mesh and also flash frozen. All tissues were stored at −80 °C until use.

2.4. RNA extractions

To obtain sufficient yield for 454 sequencing, RNA was extracted from aliquots of 50 (adult) or 75 (juvenile) Melita using a combined Trizol (Invitrogen)/RNeasy (Qiagen) approach. Briefly, 50 adult or 75 juvenile amphipods were homogenized in the Trizol reagent using the MP Biomedical lysing matrix E and a Qiagen tissue lyser. The Trizol procedure was followed through the phase separation, and the aqueous phase was treated as lysate in the early steps of the RNeasy (Qiagen) procedure from animal tissue. After isolation, the RNA was DNase-treated using TURBO DNA free (Ambion). RNA was checked for purity (260/280 ratios > 2.0) using a nanodrop analyser, and all aliquots and timepoints were pooled for subsequent analyses. An aliquot was saved for quality assurance and mRNA was then isolated using the Oligotex mRNA purification kit. mRNA was quantified and further checked for purity using the nanodrop, and aliquots of the total and messenger RNA were analysed for integrity using a Shimadzu multiNA microchip electrophoresis system. Although the amounts of rRNA decreased following mRNA purification, the microchip analysis showed that rRNA contamination was persistent in all samples.

2.5. Preparation of cDNA libraries and pyrosequencing

Construction of cDNA libraries and pyrosequencing were conducted at the Ramaciotti Centre for Gene Function Analysis, University of New South Wales, Kensington, NSW, Australia. Libraries were prepared as described in “cDNA Rapid Library Preparation Method Manual - GS FLX Titanium Series - October 2009 (Rev. Jan 2010)”, except for the following modifications: RNA was fragmented for 125 s; enzymatic reactions and size selection were performed on the SpriWorks (Beckman Coulter). Samples were multiplexed using MID-labelled primers (multiplex identifiers) RL1, RL2, RL3 and RL4. Combined cDNAs were then sequenced on a picotitre plate using the GS-FLX platform (454, Roche, Maryland, USA). Emulsion PCR (emPCR) titrations were carried out as described in “emPCR Method Manual - Lib-L SV - GS FLX Titanium Series - October 2009 (Rev. Jan 2010).” Bulk emPCRs were carried out as described in “emPCR Method Manual - Lib-L LV - GS FLX Titanium Series - October 2009 (Rev. Jan 2010).” Sequencing was performed on a GS FLX (Roche) using the GS FLX Titanium Sequencing Kit XLR70 as described in “Sequencing Method Manual - GS FLX Titanium Series - October 2009 (Rev. November 2010).” Image and signal processing were performed with the GS Sequencer, contained within Roche’s 454 system sequencing software package version 2.6.

2.6. Data analysis

Three different assembly protocols were used: a Newbler assembly (Margulies et al., 2005), an assembly in CLC Genomics workbench (CLC Genomics workbench v6.0.1. CLC Bio A/S, Denmark), and an assembly using MIRA (Chevreux et al., 1999). These were used with three different approaches to read filtering: unfiltered reads, fixed length filtered reads and a sliding window filtering approach. The read filtering and the final assembly were performed in the CSIRO instance of Galaxy (Goecks et al., 2010; Giardine et al., 2005). For the Newbler assembly, the sequencing reads were filtered for quality by the GS Run Processor software. A ribosomal RNA sequence database (including both parc SSU.fa and parc LSU.fa files) was downloaded from http://www.arb-silva.de (Pruesse et al., 2007). The total read dataset was used to build a consensus de novo assembly with the Newbler v2.6 gs Assembler (Roche, Maryland, USA) using the −cDNA option. Reads mapping to the rRNA database files were excluded from the assembly by using the filter option (−vt) during assembly. The resultant consensus assembly included files representing isogroups (putative genes), isotigs (transcript variants) and contigs (exons) as well as a number of metric and other files. For the CLC Genomics workbench assembly, the reads were uploaded into CLC Genomics as FASTA files and assembled using default parameters. The following workflows were designed and used to filter and assemble the reads in Galaxy (as shown in Fig. 1): both FASTA and Quality files were imported into Galaxy, and a base quality distribution file was built. Sequencing artifacts (primers and MIDs) were removed using the “clip” tool, then the FASTA and QUAL files were combined to FASTQ, and sequences were trimmed to 350 bp using FASTQ tools within Galaxy (Blankenberg et al., 2010), as the base distribution suggested a remaining repeated sequence. Reads were then filtered such that those without a quality score of 20 for more than 95% of bases were discarded. A second filtering protocol was also trialed to optimize the assembly. In this alternate assembly, reads were trimmed as previously and a sliding window approach was used to filter the datasets read by read. The window size was 20, the minimum quality score was 20 (Blankenberg et al., 2010) and the number of bases to exclude was 1. Filtered reads from both approaches were assembled using the MIRA algorithm (version 0.0.4) (Chevreux et al., 1999). Reads were deposited into NCBI’s databases with the accession number SRR0851120-4. This Transcriptome Shotgun Assembly project has been deposited at DDBJ/EMBL/GenBank under the accession GAKD00000000. The version described in this paper is the first version, GAKD01000000. The data are also available at: Hook S, Twine N, Simpson S, Spadaro D, Moncuquet P, Wilkins M (2013): Raw reads and final assembled contigs for Melita project. v2. CSIRO. Data Collection (http://dx.doi.org/10.4225/08/519F04075BB7F).

Fig. 1. Schematic of the workflow showing the steps taken to filter and assemble the reads into contigs using the CSIRO Galaxy instance.
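The two read-filtering protocols described in this section (fixed-length trimming with a per-read quality filter, and sliding-window trimming) can be sketched in plain Python. This is only an illustration of the logic, not the Galaxy FASTQ tools used in the study: the function names are hypothetical, the 95% threshold is treated as "at least 95%" as an assumption, and the sliding window is assumed to trim from the 3' end until a window passes after ignoring the single lowest-scoring base.

```python
def trim_fixed(seq, quals, max_len=350):
    """Fixed-length trimming: keep at most max_len bases from the 5' end."""
    return seq[:max_len], quals[:max_len]


def passes_quality_filter(quals, min_q=20, min_percent=95):
    """Keep a read only if at least min_percent of its bases score >= min_q.

    Integer arithmetic avoids floating-point edge cases at the threshold.
    """
    if not quals:
        return False
    good = sum(q >= min_q for q in quals)
    return good * 100 >= min_percent * len(quals)


def sliding_window_trim(quals, window=20, min_q=20, exclude=1):
    """Return how many 5' bases to keep after trimming from the 3' end.

    A window passes when, after dropping its `exclude` lowest scores,
    every remaining score is >= min_q. Reads shorter than one window are
    rejected (length 0) in this sketch; that choice is an assumption.
    """
    end = len(quals)
    while end >= window:
        scores = sorted(quals[end - window:end])[exclude:]
        if scores and scores[0] >= min_q:
            return end
        end -= 1
    return 0


# Hypothetical read: 30 good bases followed by a low-quality tail.
example_quals = [30] * 30 + [10] * 5
kept = sliding_window_trim(example_quals)
print(f"bases kept after sliding-window trimming: {kept}")
```

In this sketch the fixed-length protocol discards whole reads that fail the filter, whereas the sliding-window protocol shortens reads instead, which is one reason the two approaches can yield different numbers of usable reads for assembly.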

The output from all four assembly strategies was sent to Blast2GO v2.5.0 (http://www.blast2go.org; Conesa et al., 2005) for sequence annotation and Gene Ontology mapping. Contig sequences were compared to the BLAST non redundant protein database using BLASTx (within the Blast2GO tool). Sequences were assigned an annotation if the best BLAST hit had an e-value <1 × 10^-3. Sequence annotation was made more parsimonious using the GO SLIM annotations (McCarthy et al., 2006). The taxonomic distribution of the BLAST results from a BLASTn search version 0.0.11 (conducted via the CSIRO instance of Galaxy) was obtained using the “metagenomic analysis” tools within Galaxy.
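The annotation rule (keep the best BLAST hit for a contig only if its e-value is below 1 × 10^-3) can be sketched as follows. This is a minimal illustration and not Blast2GO itself; the function name `best_hits`, the tuple layout and the example hits are all hypothetical.

```python
def best_hits(hits, max_evalue=1e-3):
    """Map each contig to its lowest e-value hit, if below max_evalue.

    `hits` is an iterable of (contig_id, subject_description, evalue).
    """
    best = {}
    for contig, subject, evalue in hits:
        if evalue >= max_evalue:
            continue  # too weak to support an annotation
        if contig not in best or evalue < best[contig][1]:
            best[contig] = (subject, evalue)
    return best


# Hypothetical hits for illustration only.
hits = [
    ("contig_1", "heat shock protein 70", 1e-50),
    ("contig_1", "hypothetical protein", 1e-5),
    ("contig_2", "chymotrypsin", 0.5),
]
annotations = best_hits(hits)
print(annotations)
```

Contigs whose only hits are above the threshold, such as the hypothetical `contig_2` here, simply remain unannotated.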

Table 1. Primer and PCR specifications used to validate the differential expression findings.

Columns, in order: Primer; Forward sequence; Reverse sequence; Probe sequence; PCR efficiency (%); Annealing temperature (°C)

HSP70 CCGATGAGTCTTTCTGTGTCC TGTTGGTATTGATCTGGGCAC TGTGTTCCA GCATGGCAAGGTG 91 50
ABC cassette TGT CGA GCA TGT GGA ATA TCC CCC TCG TGA CCA ATC TGA AG TGT CCC TGC ACC AGC CCT 84 50
Chymotrypsin GTC ACT AAC ACC TCC GAT GC GCC AAT CAC TTT CAC AGA TGC ATG AGT CTT GCC CCA TCC GGT T 95 50
Carboxypeptidase CAC ATA CGG TCA TCG TTC CAG CGA CGT TCA CAT TCT CTC AGT C ACA ACC CTG ACG GCT ATG CCT T 93 50
Elongation factor GGT ACG CTG GAT TGA CTT CTC TCG CTT CTA CGC TTT TGG TC TTC TCT GGC AAG GTT GGA ACT GGT 92 50

Table 2. Chemical parameters of the test sediments.

Sediment | Grain size <63 µm (%) | TOC (%) | AVS (µmol/g) | Fe (mg/kg) | Mn (mg/kg) | Cd (mg/kg) | Cu (mg/kg) | Ni (mg/kg) | Pb (mg/kg) | Zn (mg/kg) | PAHs (mg/kg) | TPHs (mg/kg)
Control | 98 | 4.7 | 4.5 | 24100 | 70 | 0.2 | 25 | 11 | 60 | 220 | <0.2 | <250
Metals | 59 | 4.5 | <0.1 | 37000 | 580 | 0.58 | 835 | 33 | 45 | 400 | <0.2 | <250
5% Diesel | 98 | 4.7 | 4.0 | 24100 | 70 | 0.2 | 25 | 11 | 60 | 220 | 5.3 | 2700

Fe to Zn are total recoverable metals; PAHs and TPHs are total organics. TOC = total organic carbon. PW-NH3 = porewater ammonia. AVS = acid-volatile sulphide. To[…] and 8270). TPHs = represents the sum of C10–C36 total petroleum hydrocarbons where th[…]

For read mapping, the fixed length filtered read dataset (as described in first paragraph of this section) was used. Reads were mapped to contigs from the fixed length trimmed MIRA assembly using BWA-SW (version 0.6.2) (Li and Durbin, 2010). The resultant SAM alignment files were filtered to include only those alignments with a mapping quality (MAPQ) score of ≥1 (which identifies alignments that are not ambiguous). The Samtools (v 0.1.18) (Li et al., 2009) idxstats program was used to generate counts for each contig, from the SAM alignment files. The DESeq package and R (Anders and Huber, 2010; Ihaka and Gentleman, 1996) were used for library size normalization and differential gene expression calculations for the count datasets of the 4 samples. As there was a single sample comprised of pooled individuals, in each condition, the expression variance for each gene was estimated by using the variance across the 4 conditions for that gene. This is known as a ‘blind’ method of dispersion estimate (Anders and Huber, 2010). This estimation can result in overly conservative statistical significance testing of differential gene expression, but is suggested as an appropriate step to take in the absence of biological replicates (Anders and Huber, 2010). Venn diagrams were generated using VENNY (Oliveros, 2007).

.7. qPCR validation

As described previously (Hook and Osborn, 2012, Osborn andook, 2013), RNA extracted as described above was used for qPCRalidation. 200 ng from each aliquot was reverse transcribed intoDNA using Qiagen’s quanti tect cDNA synthesis kit. Primers withual-labelled 6′FAM – ZEN – Iowa Black probes were designed forhe transcripts in Table 1 using IDT DNA’s RealTime PCR software.

Please cite this article in press as: Hook, S.E., et al., 454 pyrosequencingplumulosa: Transcriptome assembly and toxicant induced changes. Aquat.

ranscript levels were measured using a relative quantificationrotocol on Applied Biosystems Fast 7500 system using the man-facturer’s protocols and reagents, with the exception of an addedrimer anneal step at 50 ◦C. All PCR efficiencies were comparable

able 3ummary of consensus assembly data, as obtained with different assembly and read filte

InputNumber of reads: 1,359,398Number of bases: 411,220,893

Newbler CLC genomics MIRA

Consensus resultsNumber of reads available: 1,359,398 1,359,398 1,35Number of reads assembled: 909,060 1,359,398 678,

(70.8%) (100%) (49.9Contig metricsNumber of contigs: 58,043 50,352 83,8Number of bases: 24,500,908 24,336,346 45,4Average contig size: 422 483 542

Isotig metricsNumber of Isotigs: 52,655Average contigs count: 1.2Number of bases: 12,925,492Average isotig size: 526Isogroup metricsNumber of isogroups 49,097Average contig count 1.2Average isotig count 1.1

tal PAHs = the sum of 16 polycyclic aromatic hydrocarbons (U.S. EPA methods 3550e sum was of (PAHs) and (TPHs).

(approximately 90%), and no amplification was measured in theNTC. Fold change was calculated by the ��Ct method, as describedby Livak and Schmittgen (2001). Expression was normalized toelongation factor-1-alpha levels. Significance was determinedusing a t-test on the ��Ct values to assure normality.

3. Results

3.1. Properties and chronic effects of test sediments

In order to maximize the diversity of transcripts captured and toinclude transcripts that are expressed upon exposure to toxicants,two groups of M. plumulosa amphipods were exposed to contami-nated sediments prior to harvest for RNA extraction. The propertiesof the sediments used for control, juvenile, metal and diesel expo-sures are provided in Table 2. While the concentrations of metalsin the control and diesel-spiked sediments appear to be elevated,past studies have demonstrated that the bioavailable metal con-centrations in sediments collected from the same site with similar

-based analysis of gene expression profiles in the amphipod MelitaToxicol. (2014), http://dx.doi.org/10.1016/j.aquatox.2013.11.022

total metal concentrations did not cause acute or chronic effects tobenthic invertebrates (Simpson and Spadaro, 2011).

For all sediments (controls, metal and diesel contaminated),survival was >85% during the 10-day chronic bioassays. Sublethal

ring protocols.

(unfiltered) MIRA (fixed length) MIRA (sliding window)

9,398 586,481 1,333,664324 310,697 258,587%) (22.9%) (19.0%)

80 26,265 22,38362,960 14,343,760 14,011,758

532 626

IN PRESSG Model

A

6 Toxicology xxx (2014) xxx– xxx

coer0(6sl

3

(

(

rpt

bsasscbduo1rPos(gSaw5c

ebfictrnwomrarc

Fig. 2. (a). Distribution of contig lengths obtained with different assembly algo-

ARTICLEQTOX-3691; No. of Pages 16

S.E. Hook et al. / Aquatic

ontaminant exposures were chosen to maximize the numberf contaminant-induced changes in gene expression (Poyntont al., 2007). The reproductive outputs of M. plumulosa (% controlesponse) were 34 ± 10% for metal contaminated sediment. For the.25, 0.5, 1, 2 and 4% diesel series the reproductive outputs weremean ± standard error) 106 ± 3%, 106 ± 3%, 109 ± 3%, 75 ± 5% and8 ± 16% of the control, respectively. For the gene-expression expo-ures a diesel concentration of 5% was used to provide a suitableevel of sublethal effects.

.2. Sequence output and assembly statistics

Four separate cDNA libraries were generated:

(i) one that contained RNA collected from adult amphipods fromour laboratory cultures;

(ii) one that contained RNA from juvenile amphipods;iii) one that contained RNA from amphipods exposed to metal-

contaminated sediments and harvested at different timepoints; and

iv) one that contained RNA from amphipods exposed to diesel-contaminated sediments and harvested at different timepoints.

These samples were multiplexed with specialized primers andun on one pyrosequencing plate. The 454 sequencing of the M.lumulosa pooled transcriptome generated 1,359,398 raw readsotaling 411 million base pairs, excluding rRNA.

The assembly statistics were different for each of the assem-lers used, as shown in Table 3. After trimming to remove adapterequences, the assembly program in CLC Genomics workbenchssembled 1,359,398 reads. The other algorithms only used a sub-et of reads. The unused reads were discarded as partial assemblies,ingletons, tandem repeats, outliers (which were excluded as likelyhimeras or artifacts), or reads less than 50 bp. These were assem-led into 50,352 contigs with an average length of 483. (Theistribution of contig lengths is shown in Fig. 2.) For reference, thenfiltered read lengths are also plotted. For the Newbler assembly,nce the reads were trimmed to remove poor quality sequences,,284,723 reads were available for assembly. A total of 74,675eads (approximately 5%) were excluded as a result of trimming.roducts from Newbler sequence assemblies have three levels ofrganization: contigs, which are contiguous sequences and repre-ent putative exons; isotigs, which are the assembly of the contigsexons) into one or more transcript variants; and isogroups, whichroup together transcript variants (isotigs) of the same gene (454equencing System Software Manual, May 2011). The consensusssembly contained 909,060 reads, (approximately 71%), whichere assembled into 49,097 isogroups. The average isotig size was

26 bp (distribution shown in Fig. 2) and the average number ofontigs per isotig was 1.2.

Galaxy has several options available to filter reads, which werevaluated to determine what influence these had on contig assem-ly, as shown in Fig. 2b. Using MIRA to assemble reads without priorltering resulted in 678,324 reads being assembled into 83,880ontigs with an average length of 542 bp (the distribution of con-ig lengths is shown in Fig. 2). Use of Galaxy’s FASTQ tools to trimeads from the 3′ end to a fixed length and exclude those that areot 95% Q ≥ 20 resulted in 586,481 reads. 310,697 of these readsere assembled into 26,625 contigs with an average contig length

f 532 bp. The distribution of contig lengths is shown in Fig. 2. Trim-ing reads using a sliding window via Galaxy, then assembling the

Please cite this article in press as: Hook, S.E., et al., 454 pyrosequencingplumulosa: Transcriptome assembly and toxicant induced changes. Aquat.

eads using the MIRA algorithm resulted in 22,383 contigs withn average length of 626 bp assembled from the 1,333,664 readsemaining after filtering. When the contig length distributions areompared, we see that trimming the reads prior to assembly in

rithms, and (b). Distribution of contig lengths obtained with different read trimmingprotocols. For reference, the distribution of lengths of the unfiltered reads is givenin panel c.

MIRA, either by a fixed length or sliding window algorithm, resultedin the lowest proportion of very short contigs (less than 250 bp)(Fig. 2). Comparable procedures to trim and filter reads are notoptions in the other two software packages.
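The contig metrics compared in this section (number of contigs, average contig length, and the proportion of very short contigs below 250 bp) are simple to compute once contig lengths are available. The following Python sketch shows one way to do so; the function name and the toy lengths are assumptions for illustration and do not reproduce the values reported in Table 3.

```python
def contig_length_summary(lengths, short_cutoff=250):
    """Summarize contig lengths (bp): count, mean, and fraction below a cut-off."""
    if not lengths:
        raise ValueError("no contigs supplied")
    n = len(lengths)
    mean_length = sum(lengths) / n
    # Proportion of "very short" contigs, using the 250 bp threshold from the text.
    short_fraction = sum(1 for x in lengths if x < short_cutoff) / n
    return {
        "n_contigs": n,
        "mean_length": mean_length,
        "short_fraction": short_fraction,
    }


# Hypothetical contig lengths, for illustration only.
toy_lengths = [120, 240, 500, 640, 1000]
summary = contig_length_summary(toy_lengths)
```

Running the same summary on each assembly gives a like-for-like comparison of the criteria discussed above without relying on each assembler's own report format.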

The cDNA libraries generated for the metal exposed amphipods contained 689,384 reads, whereas diesel exposed amphipods had 225,010 reads, and juvenile and control amphipods contained 196,626 and 174,992 reads respectively (Table 4). When the total number of reads obtained for each library is compared, many more sequence reads for the metal exposed sample were obtained than for the other 3 samples (Table 4). This bias was likely introduced during the cDNA library preparation and has been discussed elsewhere (454 Sequencing Manual, Guidelines for Amplicon Experimental Design, March 2011). It does not reflect RNA quality or experimental parameters. The variation was accounted for by normalization based on read library size in the DESeq package.

Table 4. Summary of mapping statistics for individual treatments.

| | Control | Diesel | Metal | Juvenile |
|---|---|---|---|---|
| Number of reads | 174,992 | 225,010 | 689,384 | 196,626 |
| Total number of alignments | 94,072 | 109,875 | 313,768 | 82,700 |
| With zero mapping | 68,007 | 78,852 | 215,117 | 60,834 |
| With non zero mapping | 26,025 | 31,023 | 98,651 | 21,866 |

3.3. Sequence annotation

Of the 26,625 contigs assembled by the MIRA algorithm using fixed length trimming criteria, 12,907 (approximately 48%) could be aligned to another sequence in the non-redundant protein database (NCBI) using BLASTx (using an e-value cut-off of 1 × 10⁻³) (Altschul et al., 1990). The percentage of sequences that could be aligned with BLASTn (using the same e-value cut-off) was higher (58%; 15,562 of 26,625). The majority of sequences with an ortholog in the nucleotide database, but not the protein database, encoded ribosomal RNA. When a sliding window algorithm was used to filter reads prior to assembly in MIRA, the assembly generated fewer contigs with BLASTn orthologs: 12,625 (56% of 22,383 contigs). By comparison, using the MIRA assembler with unfiltered reads resulted in 45,748 contigs (54% of 83,880) that could be aligned with BLASTx. Use of the CLC genomics assembly resulted in 14,026 contigs (approximately 30% of 50,352) that could be aligned by BLASTx, whereas the NEWBLER assembly resulted in 15,388 (30% of 58,043) isotigs that had an ortholog in the non-redundant protein database.

Contigs from the trimmed MIRA assembly aligned best to sequences derived from a variety of different taxa, as shown in Fig. 3. This was true for all assemblies; however, only the fixed length MIRA assembly is discussed specifically. As would be expected, the majority of contigs aligned to a crustacean species. Over 11,000 of the sequences with an ortholog aligned to an arthropod sequence when measured using a BLASTn search, and more than 8000 contigs were most similar to a sequence from one of the malacostracan crustaceans. The majority of non-arthropod “top hits” matched chordates. A variety of other taxa, including viruses, archaea, bacteria, fungi and plants, were all represented in the taxonomic distribution, but each of these groups made up less than 2% of the annotated contigs.

Fig. 3. Taxonomic distribution of the top-hit species for the fixed length trimmed MIRA assembly, as generated by metagenomic analysis tools following a BLASTn search.

Table 5. Summary of the contig annotation using the different BLAST algorithms. Nucleotide orthologs were identified using a BLASTn search and protein orthologs using a BLASTx search; both searches used an e-value cut-off of 1 × 10⁻³.

| Metric | Value |
|---|---|
| Number of contigs | 26,625 |
| Number of contigs with a nucleotide ortholog | 15,562 |
| Number of contigs with a protein ortholog | 12,907 |
| Number of contigs with no ortholog in either database | 9057 |
| Number of unique nucleotide orthologs (number of distinct transcripts) | 5965 |

The BLAST results show that there are multiple contigs that align to different parts of a single transcript. The data in Table 5 show that there are many non-unique nucleotide alignments. This finding indicates that the contig assembly contains many fragments or partial coding sequences and very few full length transcripts, which is perhaps unsurprising given the average contig length of 532 bp. This leads to a high degree of redundancy in the annotation. For instance, of the BLASTn results, only 5965 coded for distinct transcripts (i.e. hit sequences with different accession numbers in the NCBI database).

To evaluate the quality of the different assemblies, the BLAST results were used to determine whether the contigs produced represented a single transcript. Many of the contigs produced by our initial assemblies, using Newbler and CLC Genomics, assembled into “chimeric” contigs, where only fragments of these had an ortholog in the database (data not shown). The remainder of the contigs either did not align to any sequences in the database, or in some cases, aligned to another transcript encoding a different gene product. We calculated the ratio of the alignment length to the contig length for those contigs with an ortholog in the NCBI database, with the rationale that a high ratio of alignment length to contig length would indicate assemblies of reads encoding the same gene product. Alignment lengths are shown in Fig. 4 for the two MIRA assemblies generated with different read trimming protocols. Since the other assembly programs generated very short alignment lengths (typically less than 25% of the length of the contig), they are not plotted. Although there are alignments of a variety of lengths, the greatest proportion of contigs with an alignment between 90 and 100% of the overall length was generated when the “fixed length” read filtering protocol was used. Using a sliding window protocol, the proportion of short alignment lengths was much higher (as shown in Fig. 4). Fixed length read trimming was thus used for the final assembly.

Fig. 4. Proportion of the contig that aligns to a GenBank sequence with different read trimming protocols.

To gauge how thoroughly the assembly captured the transcriptome, all 26,625 contigs were compared to the set of 248 highly conserved eukaryotic proteins curated by CEGMA (Parra et al., 2007). This set of proteins has been refined to exclude the majority of paralogs. Of these proteins, 66.5% (165 out of 248) were identified amongst our contigs using BLASTx (E ≤ 1e−7) (Supplemental Table S1). If unassembled reads are included in the BLAST, 84% of conserved eukaryotic proteins were captured. This result suggests that the majority of expressed genes were captured by pyrosequencing, but that many remain singletons following the assembly.
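The alignment-length-to-contig-length ratio used in this section to compare assemblies can be illustrated with a short Python sketch. It assumes that both lengths are expressed in the same units (for example nucleotides) and that each contig contributes only its best alignment; the function names and toy values are hypothetical and are not taken from the study's data.

```python
def alignment_ratio(alignment_length, contig_length):
    """Fraction of the contig covered by its best database alignment (capped at 1)."""
    if contig_length <= 0:
        raise ValueError("contig length must be positive")
    return min(alignment_length / contig_length, 1.0)


def fraction_in_bin(ratios, low, high):
    """Share of contigs whose ratio lies in the closed interval [low, high]."""
    if not ratios:
        return 0.0
    return sum(1 for r in ratios if low <= r <= high) / len(ratios)


# (alignment length, contig length) pairs: hypothetical values for illustration.
toy_pairs = [(500, 520), (100, 600), (450, 450), (200, 800)]
ratios = [alignment_ratio(a, c) for a, c in toy_pairs]

# Share of contigs with 90-100% of their length aligned, as summarized in Fig. 4.
high_share = fraction_in_bin(ratios, 0.9, 1.0)
```

A low ratio flags contigs in which only a fragment matches a known sequence, which is the pattern described above for the putatively chimeric contigs.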

3.4. Functional annotation

Of the 26,625 contigs generated by the MIRA assembler using the fixed length trimmed reads, 8149 could be assigned Gene Ontology (GO) annotation using the Blast2GO tool (Conesa et al., 2005) and 5206 could be fully annotated (Supplementary Table S2). Of the 37,781 GO terms loaded, 5223 unique terms were assigned to the fully annotated contigs. These matches were then grouped into generic GO terms using the GOSlim viewer tool (McCarthy et al., 2006). As shown in Supplementary Fig. 1, functional annotation of those transcripts with full annotation displayed a diverse range of functions. There was no discernible enrichment of any set of functions, showing that our sequencing efforts captured a broad cross section of cellular function. However, since less than half of the transcripts could be assigned a GO term, any estimate of representational coverage by comparing the abundance of GO terms to that found in a related taxon (e.g. Fraser et al., 2011) would have to be interpreted with caution.

3.5. Differential expression

The 100 most abundant contigs in the control treatment are shown in Supplementary Table S3. Often, different contigs encoding different portions (transcript fragments) of the same gene are included in the list. The most abundant contig in control amphipods codes for vitellogenin (Supplementary Table S3) (roughly 658 counts). Vitellogenin (or apolipocrustacein in crustaceans (Hayward et al., 2010)) is the major protein found in yolk and is associated with reproductive maturity in oviparous females (Hyne, 2011). Aside from those contigs encoding vitellogenin, many other frequently measured contigs are common “housekeeping” genes including actin, myosin, troponin, elongation factors, mitochondrial cytochromes and GAPDH. Contigs encoding ribosomal RNA were also common. More intriguingly, transcripts encoding cell cycle control genes, an acetylcholine receptor, and a juvenile hormone inducible protein were also abundant in control amphipods.

Fig. 5. Overlap amongst the most abundant contigs in each treatment.

The most abundant genes show a high degree of overlap amongst treatments (Fig. 5). As in the control, this list included the housekeeping genes actin, myosin and troponin. Intriguingly, vitellogenin is amongst the most common transcripts in the control, but not in the juveniles (which are sexually immature; 347 counts) or in the diesel-exposed organisms (which are reproductively impaired; no counts). The contigs that were uniquely abundant in one group only were frequently unknown or not fully annotated.

The fifty transcripts that had changed the most in the juvenile, metal-treated and diesel-treated amphipods, relative to control, are given in Supplementary Tables S4–S6. These lists showed less overlap between treatments than the most abundant transcripts, as shown in Fig. 6a and b. Unfortunately, all three gene lists contain many contigs (53, 48, and 54 respectively) with no orthologs in the reference database. Also, all three treatments had many ribosomal RNA subunit genes with both increased and decreased abundance. Transcripts encoding a myb transcription factor, which regulates hematopoiesis, were approximately 20-fold increased in abundance in the juvenile only treatment, as were transcripts for several cell signaling proteins. Contigs that align to senescence associated genes were down-regulated between 30- and 40-fold in the juvenile only treatments, and an adult-specific troponin transcript was also down-regulated 40-fold (Supplementary Table S4) in this treatment. The contigs most increased in abundance in the metal treatment relative to the control included transcripts for multiple digestive enzymes (e.g. chymotrypsin and carboxypeptidase), which were increased between 20- and 30-fold, as well as transcripts for proteins involved in mitotic control (e.g. a kinesin like protein) (increased 40-fold) and neurological development (e.g. a 5′-AMP-activated protein kinase) (increased 16-fold) (Supplementary Table S5). Contigs with decreased abundance in the metal treatments include transcripts for DNA repair enzymes (e.g. a mismatch specific glycosylase) (decreased more than 80-fold) as well as many contigs encoding proteins with unknown function. In the diesel-exposed treatments, levels of contigs encoding rRNA promoter binding proteins were increased between 20- and 50-fold, as were contigs that encode lachesin (involved in cell adhesion), which was 30-fold more abundant. Contigs encoding vitellogenin were over 250-fold less abundant in amphipods from diesel contaminated sediment (as discussed above), as were transcripts encoding proteins with roles in transcriptional control (approximately 30-fold less abundant) (Supplementary Table S6). It was notable that different contigs encoding fragments of the same transcript typically had similar patterns of abundance when treatments were compared.
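The abundances compared in this section were normalized for library size with DESeq. The sketch below illustrates the median-of-ratios size factor described by Anders and Huber (2010) in plain Python. It is a simplified re-implementation for illustration, not the DESeq code itself, and the function names and toy counts are assumptions.

```python
import math
from statistics import median


def size_factors(counts):
    """Median-of-ratios size factors.

    `counts` maps sample name -> list of raw counts, one per contig,
    in the same contig order for every sample.
    """
    samples = list(counts)
    n_contigs = len(counts[samples[0]])
    # Geometric mean per contig across samples; contigs with a zero count
    # in any sample are skipped because their log is undefined.
    references = []
    for i in range(n_contigs):
        values = [counts[s][i] for s in samples]
        if all(v > 0 for v in values):
            log_mean = sum(math.log(v) for v in values) / len(values)
            references.append((i, math.exp(log_mean)))
    if not references:
        raise ValueError("no contig has non-zero counts in every sample")
    # Size factor: median ratio of a sample's counts to the per-contig reference.
    return {s: median(counts[s][i] / g for i, g in references) for s in samples}


def normalize(counts):
    """Divide each sample's counts by its size factor."""
    factors = size_factors(counts)
    return {s: [c / factors[s] for c in counts[s]] for s in counts}


# Hypothetical raw counts: the "metal" library is sequenced twice as deeply.
toy_counts = {"control": [10, 20, 30], "metal": [20, 40, 60]}
factors = size_factors(toy_counts)
normalized = normalize(toy_counts)
```

In this toy case the deeper library receives a size factor twice that of the shallower one, so the normalized counts of the two samples coincide. Division by a size factor is also why normalized read counts are generally not whole numbers.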

Fig. 6. (a) Degree of overlap among the 50 transcripts with the greatest increases in abundance for each treatment. (b) Overlap amongst the 50 contigs with the greatest decreases in abundance in each treatment.

When genes with known function are examined, we again notice that contigs encoding different fragments of the same transcript have similar patterns of expression (e.g. vitellogenin, Table 6). We also noticed that the patterns of expression were consistent with the environmental contaminant exposure. For instance, contigs encoding vitellogenin (an egg yolk protein, important for the maturation of the oocyte) were less numerous in juvenile and diesel treated organisms than in controls and metal treated organisms (De Schamphelaere et al., 2008). Transcripts encoding a sodium–potassium ATP pump, caspase-1, and HSP 70 and 60 were 2–3-fold more abundant in metal exposed amphipods (Werner and Nagel, 1997) (Table 6). Contigs encoding an ABC cassette and an MDR protein were five times more abundant in amphipods exposed to diesel (e.g. Feldstein et al., 2006; Yawetz et al., 2010), whereas a contig encoding HSP 90 was three times less abundant in diesel exposed amphipods than controls. GST-delta encoding contigs were approximately 3-fold more abundant in amphipods exposed to metals (e.g. Barata et al., 2005), whereas GST-mu encoding contigs were two times less abundant in metal treated organisms (Espinoza et al., 2012) (Table 6), although low coverage makes the significance of these results difficult to evaluate.

Surprisingly, the contigs that encode gene products with a role in hormone regulation were more responsive to contaminant treatment than to life stage (Table 6). For instance, contigs encoding a juvenile hormone inducible gene product (which controls reproductive maturation and gonadal development (LeBlanc, 2007)) were under-expressed in the juveniles-only treatment, but were even less frequent in the diesel-treated organisms than in the juveniles. Contigs encoding allatostatin preprohormone (which activates methyl farnesoate (Kwok et al., 2005)) were ten times more abundant in metal-exposed amphipods than any other group, and many of the nuclear hormone receptors responded similarly to diesel or metals (Table 6), but not the other life stages. Farnesoic acid O-methyltransferase encoding transcripts (which act on methyl farnesoate, the crustacean form of juvenile hormone (LeBlanc, 2007)) were also approximately ten times more abundant in the metals treatment for two of three contigs, and a transcript encoding an ecdysone-regulated protein (ecdysone regulates molting (LeBlanc, 2007)) was approximately ten times more abundant in the metals treatment. Several contigs for nuclear receptors that regulate ecdysone (Hannas et al., 2010) were also more abundant in metal-treated amphipods (Table 6).

To maximize the information regarding the functional classification of the different contigs in each treatment, GO term enrichment analysis was conducted on all differentially expressed (fold change greater than 5) transcripts (Table 7). When the differentially expressed contigs are compared, changes in GO categories that are indicative of function are observed. For instance, the most frequent cellular component to be up-regulated by the metal treatments is “extracellular region”, which corresponds to the digestive enzymes up-regulated by this treatment. Following diesel exposure, the “membrane” and “integral to membrane” components are among the most up-regulated. When the “biological processes” are compared amongst treatment groups, transcripts involved in the processes of “translation” and “translational elongation” are more abundant in the juvenile treatments than in the others (Table 7).
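The kind of tally reported in Table 7 (how often each GO term occurs among contigs whose fold change exceeds 5) can be sketched as follows. This is an illustrative Python example under stated assumptions: the denominator is taken to be the number of selected contigs that carry any GO annotation, and the function name, contig identifiers and values are hypothetical.

```python
from collections import Counter


def go_term_percentages(contig_terms, fold_changes, min_fold=5.0):
    """Percentage of selected contigs carrying each GO term.

    contig_terms: {contig_id: set of GO term names}
    fold_changes: {contig_id: fold change relative to control}
    Contigs with a fold change above `min_fold` are treated as up-regulated.
    """
    selected = [
        c for c, fc in fold_changes.items()
        if fc > min_fold and c in contig_terms
    ]
    if not selected:
        return {}
    tally = Counter(term for c in selected for term in contig_terms[c])
    return {term: 100.0 * n / len(selected) for term, n in tally.items()}


# Hypothetical annotations and fold changes, for illustration only.
toy_terms = {
    "c1": {"extracellular region"},
    "c2": {"extracellular region", "cytoplasm"},
    "c3": {"membrane"},
    "c4": {"nucleus"},
}
toy_fold = {"c1": 20.0, "c2": 8.0, "c3": 2.0, "c4": 6.0}
percentages = go_term_percentages(toy_terms, toy_fold)
```

Because a contig can carry several GO terms, the percentages within one category need not sum to 100.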

3.6. qPCR validation

As shown in Table 8, a few transcripts that were putatively differentially expressed were validated via qPCR. Direct comparison is difficult because the 454 measures represent a pool of different exposure times, ranging from 24 h to 2 weeks, whereas the qPCR measures represent the 48 h time period only. Nevertheless, there is overall agreement between the two measures, although the magnitude of change measured via 454 was often greater. The HSP70 transcript was of higher abundance as measured via 454 but not as measured via qPCR; whether this is due to the kinetics of gene expression or the relatively low magnitude of change in expression measured is uncertain.
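The qPCR fold changes were calculated with the ΔΔCt method of Livak and Schmittgen (2001), normalized to elongation factor-1-alpha. A minimal Python sketch of that calculation is given below. It assumes amplification efficiencies close to 100%, and the Ct values are hypothetical rather than measurements from this study.

```python
def delta_delta_ct(ct_target_treated, ct_ref_treated,
                   ct_target_control, ct_ref_control):
    """ΔΔCt: reference-normalized Ct of the treated sample minus that of the control."""
    delta_treated = ct_target_treated - ct_ref_treated
    delta_control = ct_target_control - ct_ref_control
    return delta_treated - delta_control


def fold_change(ddct):
    """Relative expression, 2^(-ΔΔCt), assuming a doubling of product per cycle."""
    return 2.0 ** (-ddct)


# Hypothetical Ct values; the reference gene stands in for elongation factor-1-alpha.
ddct = delta_delta_ct(
    ct_target_treated=24.0, ct_ref_treated=18.0,
    ct_target_control=26.0, ct_ref_control=18.0,
)
fc = fold_change(ddct)
```

Here the target crosses the threshold two cycles earlier in the treated sample relative to the reference gene, which corresponds to a four-fold increase in abundance.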

Table 6. Normalized read counts for selected contigs with known toxicological or developmental function. The E score was derived from a BLASTx search performed via Blast2GO. Read counts have fractions associated with them because of the DESeq normalization process, which normalizes to total library size.

| Description | Contig | E score | Control read count | Juvenile read count | Metal read count | Diesel read count |
|---|---|---|---|---|---|---|
| Vitellogenin 2 | mira rep c7192 | 1.17E−52 | 657.91 | 347.44 | 615.98 | 1.19 |
| | mira rep c7205 | 1.80E−23 | 282.57 | 160.56 | 232.63 | 1.19 |
| Vitellogenin | mira rep c7344 | 9.38E−62 | 182.67 | 90.81 | 132.46 | 1.19 |
| | mira rep c7689 | 1.95E−85 | 19.98 | 23.69 | 23.40 | 1.18 |
| | mira rep c7834 | 1.02E−28 | 12.84 | 3.95 | 9.36 | 1.19 |
| Sodium–potassium transporting pump | mira rep c7247 | 8.01E−141 | 71.357 | 59.22 | 162.89 | 99.89 |
| HSP 90 | mira rep c7508 | 0 | 32.83 | 31.59 | 33.23 | 13.08 |
| HSP 70 | mira rep c7225 | 0 | 47.095 | 36.85 | 100.64 | 36.87 |
| | mira c471 | 0 | 8.563 | 3.95 | 18.26 | 8.32 |
| | mira rep c16162 | 0 | 7.14 | 5.26 | 17.32 | 1.19 |
| HSP 60 | mira rep c7901 | 5.43E−101 | 9.99 | 10.529 | 20.127 | 1.19 |
| Caspase-1 | mira c462 | 2.46E−68 | 5.71 | 5.26 | 14.04 | 3.57 |
| GST (delta) | mira rep c9444 | 2.53E−87 | 4.28 | 6.85 | 12.17 | 5.96 |
| | mira rep c16708 | 2.68E−72 | 2.85 | 2.63 | 5.15 | 3.57 |
| | mira rep c17286 | 2.23E−91 | 2.84 | 1.32 | 5.15 | 1.19 |
| | mira rep c18119 | 1.98E−75 | 1.43 | 1.32 | 4.68 | 5.95 |
| GST (mu) | mira rep c11197 | 2.47E−67 | 2.85 | 5.26 | 0.94 | 3.57 |
| ABC transporter | mira c7070 | 5.52E−21 | 1.43 | 2.63 | 2.34 | 9.52 |
| Multi-drug transporter | mira c6745 | 2.01E−21 | 1.43 | 1.13 | 2.34 | 10.70 |
| Juvenile hormone inducible | mira rep c7298 | 9.17E−34 | 77.10 | 18.43 | 42.59 | 1.19 |
| | mira rep c14677 | 2.26E−21 | 19.98 | 3.95 | 8.43 | 1.19 |
| Nuclear hormone receptor e75 | mira rep c15361 | 3.34E−29 | 1.43 | 2.63 | 2.81 | 10.70 |
| Allatostatin c preprohormone | mira rep c8591 | 3.66E−22 | 1.43 | 1.32 | 14.98 | 2.37 |
| Probable nuclear hormone receptor hr3-like | mira c6397 | 5.75E−28 | 1.43 | 1.31 | 2.81 | 8.32 |
| Steroid hormone receptor | mira rep c16110 | 1.59E−41 | 1.43 | 2.63 | 9.83 | 4.76 |
| Farnesoic acid o-methyltransferase | mira rep c7446 | 7.01E−124 | 18.55 | 14.48 | 14.98 | 10.70 |
| | mira rep c7837 | 2.39E−93 | 2.854 | 1.32 | 18.25 | 2.38 |
| | mira c2002 | 7.41E−22 | 1.43 | 1.32 | 6.09 | 1.19 |
| Ecdysteroid-regulated protein | mira rep c13950 | 6.58E−30 | 1.43 | 1.32 | 14.04 | 1.19 |

4. Discussion

This paper describes the first generation and analysis of the M. plumulosa transcriptome. We found that the choice of assembly algorithm and read filtering protocol greatly influenced the quality of our final assemblies. After evaluating several different assembly and read filtering protocols for optimal results, we chose MIRA after filtering the reads to a fixed length. This generated 26,625 contigs, which we considered as putative transcripts. Of the assembled contigs, 15,562 have an ortholog in the NCBI nucleotide database and we were able to annotate a large number of genes with known function in both growth and toxicological response.

One challenge in performing a de novo transcriptome assembly using NGS data is that there are currently no established “best practice” methods for generating assemblies or for evaluating the quality and accuracy of results in the absence of a closely related reference species (Martin and Wang, 2011; Hornett and Wheat, 2012; Schliesky et al., 2012). This problem is greater for species, such as the amphipod M. plumulosa, that have a great deal of phylogenetic distance to the most closely related sequenced species (Cahais et al., 2012; Hornett and Wheat, 2012). The existence of, and adherence to, quality assurance guidelines would not only assist the production of higher quality transcriptomes, but would also increase the reproducibility of NGS experiments between laboratories (Nekrutenko and Taylor, 2012). There are conflicting accounts of which assemblers perform best (e.g. Kumar and Blaxter, 2010; Brautigam et al., 2011; Cahais et al., 2012; Mundry et al., 2012), and suggested protocols for quality assurance often require a reference sequence (e.g. Martin and Wang, 2011). Despite this uncertainty, we developed the following criteria to assess the accuracy of our transcriptome assembly:

(i) There should be an optimal number of contigs (Schliesky et al., 2012). An assembly with a high number of contigs is likely to contain a high number of fragmented transcripts or highly polymorphic genes (e.g. Schliesky et al., 2012). Nevertheless, it is not uncommon in the literature to see de novo transcriptome projects where the number of contigs exceeds the number of genes in related species by 4–5-fold (e.g. Ma et al., 2012).

(ii) There should be a high proportion of long contigs and the proportion of very short contigs should be low. This criterion is commonly used for judging the efficiency of assemblers (O’Neil et al., 2010).

(iii) The most highly conserved eukaryotic proteins should be represented in the assembly. These proteins are both conserved and frequently expressed so should be present in nearly every organism (Parra et al., 2007). Other studies have also used this criterion to gauge the comprehensiveness of their assembled transcriptomes (e.g. Martinez-Barnatche et al., 2012).

(iv) There should be a low proportion of “chimeric” contigs that either align to multiple sequences in BLAST or have only short sections of the transcript that align. Previous studies have used an “ortholog hit ratio” (the proportion of the length of the alignment to the coding sequence in BLAST) to determine quality (O’Neil et al., 2010). One caveat to this is that NGS-based approaches also measure immature transcripts that may not have been spliced to coding sequences. Also, the transcripts for the same gene may differ between two species for evolutionary reasons, and such differences may not represent a sequencing artifact.

As the goal of this study was to generate a reference transcriptome for future low cost RNA Seq studies (such as Ion Torrent), microarray construction and qPCR assays (as reviewed in Mehinto et al., 2012), the parameters of the read filtering and the assembly program used were conservative, so that unrelated reads were less likely to be joined together into a single contig. As a consequence, the majority of transcripts had an alignment length to contig length ratio greater than 50%, which we only found with the fixed length trimmed MIRA assembly. The contigs created by the two programs that used De Bruijn type algorithms were almost exclusively assembly artefacts where the software had joined unrelated reads, and as a consequence, did not describe transcripts that encoded proteins. Because the reads were stringently filtered before assembly, we do not believe that the short alignment ratios can be attributed to sequencing error. There were some disadvantages associated with conservative assemblies and stringent read filtering, however. One is that the contigs represent many partial

Please cite this article in press as: Hook, S.E., et al., 454 pyrosequencing-based analysis of gene expression profiles in the amphipod Melitaplumulosa: Transcriptome assembly and toxicant induced changes. Aquat. Toxicol. (2014), http://dx.doi.org/10.1016/j.aquatox.2013.11.022

ARTICLE IN PRESSG Model

AQTOX-3691; No. of Pages 16

S.E. Hook et al. / Aquatic Toxicology xxx (2014) xxx– xxx 11

Table 7Frequency (as a percentage of the total annotated contigs) of Gene Ontology terms amongst up and down regulated contigs in each treatment.

Most up regulated

Metals Diesel Juvenile

Cellular componentExtracellular region 9.28 2.82 0.54Cytoplasm 7.17 3.95 7.03Membrane 1.27 6.21 2.16Integral to membrane 1.69 6.21 4.32Nucleus 3.38 5.08 7.03Cytosol 2.11 3.95 2.70Z disc 3.80 2.26 3.24Plasma membrane 2.95 3.95 4.32Molecular functionATP binding 8.51 6.81 5.63Protein binding 2.84 4.66 3.87Actin binding 3.61 1.08 1.76Protein homodimerization activity 3.35 1.43 0.70Serine-type peptidase activity 3.09 1.43 NAHydrolase activity 1.29 0.72 2.82Metal ion binding 1.03 2.51 2.11Zinc ion binding 1.80 2.15 1.41Protein serine/threonine kinase activity 1.80 1.08 2.11GTP binding 1.29 1.43 2.11Motor activity 2.06 0.72 0.70Nucleotide binding 1.80 1.43 1.76Nucleic acid binding 0.77 1.79 1.06Biological processOxidation–reduction process 1.52 0.96 2.54Protein phosphorylation 1.52 0.96 1.69Phosphorylation 0.46 1.68 1.41Translation 0.00 0.00 1.41Translational elongation 0.00 0.00 1.41Transport 0.76 1.20 0.56Nuclear-transcribed mrna catabolic process, nonsense-mediated decay 0.00 0.48 1.13Gluconeogenesis 0.30 0.72 1.13Mesoderm development 0.30 0.96 1.13Cytokinesis 1.06 0.72 0.56Muscle attachment 1.06 0.48 0.56Regulation of tube length, open tracheal system 1.06 0.48 0.56Imaginal disc-derived wing hair organization 1.06 0.48 0.56Flight 1.06 0.72 0.56

Most down regulated

Metals Diesel Juvenile

Cellular componentNucleus 5.43 6.10 5.67Nucleolus 3.83 3.66 4.25Cytoplasm 3.83 6.10 5.95Cytosol 2.88 6.10 5.38Integral to membrane 3.19 3.05 3.40Membrane 2.88 1.83 1.70Nucleoplasm 2.24 2.74 3.40Plasma membrane 1.92 2.74 2.83Intracellular 1.60 2.44 2.83Cytosolic small ribosomal subunit 1.92 2.44 1.98Mitochondrion 1.60 1.52 2.55Ribosome 1.60 1.22 0.85Microtubule associated complex 0.32 1.22 1.70Lipid particle 1.28 1.52 1.13Molecular functionATP binding 7.09 6.78 6.68Metal ion binding 3.07 2.19 2.71Protein binding 3.07 4.81 4.18Structural constituent of ribosome 3.07 3.50 2.92Zinc ion binding 1.65 2.84 3.76DNA binding 2.13 2.84 2.51Nucleotide binding 2.36 2.19 2.51Hydrolase activity 1.65 2.41Nucleic acid binding 1.42 2.19 1.67Binding 1.89 1.97 1.46Calcium ion binding 1.65 1.31 1.67GTP binding 0.71 1.75 1.88Catalytic activity 1.18 1.75 1.46Gtpase activity 0.47 1.75 1.25Actin binding 1.65

ARTICLE IN PRESSG Model

AQTOX-3691; No. of Pages 16

12 S.E. Hook et al. / Aquatic Toxicology xxx (2014) xxx– xxx

Table 7 (Continued)

Most down regulated

Metals Diesel Juvenile

RNA binding 1.42 1.31 1.46Structural constituent of cytoskeleton 0.24 1.53 1.25Unfolded protein binding 1.18 1.53 1.04Transcription coactivator activity 1.42 1.04Apical protein localization 1.42 1.31 1.04Sequence-specific DNA binding transcription factor activity 1.42 0.44 0.84Protein homodimerization activity 0.71 0.88 1.25Biological processSodium ion transport 2.42 0.62 0.43Translation 1.61 1.85 1.43Oxidation–reduction process 1.45 1.54 1.43Cellular process 0.64 1.23 1.43Protein phosphorylation 0.48 0.77 1.29GTP catabolic process 0.32 1.23 0.86

sdiptil

tttdisa2saea

de1t(tcewaamtaoc

TCt

Metabolic process

ATP catabolic process

Positive regulation of transcription from RNA polymerase II promoter

equences and comparatively few full-length cDNAs. Anotherisadvantage is that fewer conserved eukaryotic proteins were

ncluded in the assembly, as evident from the comparison pro-ortion of assembled contigs which align to the CEGMA databaseo those unassembled reads that align (Parra et al., 2007). Thencomplete assembly may also have resulted from our relativelyow overall sequence coverage.

One of the advantages of using a De Bruijn type algorithm overhe older consensus length overlap algorithms is that the De Bruijn-ype algorithms are better suited to finding splice variants withinranscripts. Alternative splicing is used to increase the functionaliversity of the proteome (Maniatis and Tasic, 2002). In general,

nvertebrates typically have much lower frequency of alternativelypliced genes than vertebrates (Kim et al., 2007). Crustaceans have

very high gene count relative to other organisms (Colbourne et al.,011), so may not utilize alternative splicing to increase the diver-ity of their transcriptomes. As a consequence, using an assemblylgorithm that does not capture the alternative splicing variantsfficiently may not have introduced as many duplicate reads formphipods as it would for vertebrate species.

Approximately half of the contigs characterized in this assemblyid not match any sequence in the NCBI protein database (using an

value cutoff of 1−3) (13,718 out of a total of 26,625). An additional567 aligned best to a hypothetical protein. It is not surprisinghat so many contigs are unknown as the malacostran crustaceanswhich includes the amphipods) diverged from the other crus-aceans 400 million years ago (Rewitz and Gilbert, 2008). Only onerustacean genome (D. pulex) has been fully sequenced (Colbournet al., 2011) and it is also composed of a large number of sequencesith no known homolog in the NCBI nr database. It is thought that

substantial portion of those genes confer an “eco-responsive”bility, i.e. an ability for D. pulex to adapt to a changing environ-ent (Colbourne et al., 2011). Since M. plumulosa would also need

Please cite this article in press as: Hook, S.E., et al., 454 pyrosequencingplumulosa: Transcriptome assembly and toxicant induced changes. Aquat.

o adapt to rapid environmental changes, it is possible that thismphipod also has a large portion of its transcriptome composed ofther “eco-responsive” genes. It is also possible that the “unknown”ontigs are untranslated regions of described transcripts (although

able 8omparison of differential expression as measured via 454 to that measured via quantithe q PCR data represents the 48 h time point only. For the qPCR data, fold change and a r

Transcript Treatment

Chymotrypsin Metal mixture

Carboxypeptidase Metal mixture

ABC transporter Diesel

HSP 70 Metal mixture

0.97 1.08 0.860.97 0.93 0.430.97 0.00 0.43

mRNA is enriched for, Roche 454 technology will sequence all RNApresent in the sample) or assembly artifacts (Jung et al., 2011).

Since there was no reference genome for the M. plumulosa tran-scriptome, it was not surprising that the best BLAST hits were toa diverse array of species (Fig. 3, Supplementary Table S2). Mostof these species (approximately 8500) fall into the crustacean subphylum. Another 2663 contigs aligned best to sequences derivedfrom the insects. Well-studied vertebrates accounted for ∼2000 ofthe top BLAST hits, which is no doubt a factor of their represen-tation in the genomic databases. Many bacteria (∼300) and algae(∼300) are also represented, which may be a result of measuringthe transcriptome of a sediment infaunal organism, which certainlyhas these bacteria both as a content of the gut and associated withits tissues.

To maximize transcriptome coverage from the 454 pyrose-quencing technique, a pool of different individuals from eachtreatment was analysed without biological replicates. This is com-mon for studies using Roche’s 454 pyrosequencing platform (e.g.Bellin et al., 2009; Hale et al., 2009), even when differential expres-sion experiments are being conducted. Other studies that usedpooled instead of replicate treatments include one comparing thegenomes of different lepidopteran species (O’Neil et al., 2010), oth-ers comparing expression levels in different tissues of birds andprawns (e.g. Santure et al., 2011; Jung et al., 2011), in differentamphipod developmental stages (Zeng et al., 2011), or betweenstressed and control individual coral colonies (Traylor-Knowleset al., 2011). Indeed, some of our transcripts of interest (e.g. manyof the cytochrome p450 isoforms) had read counts that were toolow for differential expression analysis, a problem which wouldhave been compounded if we had further split our run into repli-cates. A lack of biological replication in this study means we havea limited ability to infer statistical significance on the differentialgene expression results (Anders and Huber, 2010). As a conse-

-based analysis of gene expression profiles in the amphipod MelitaToxicol. (2014), http://dx.doi.org/10.1016/j.aquatox.2013.11.022

quence, our focus was on the contigs with the greatest abundance(as determined by read count), the greatest change in abundance (asdetermined by fold change), and those that previous studies haveshown to be either toxicologically or developmentally important.

ative PCR. The 454 data represent a combination of different time points, whereasange of expression values are presented (n = 3–5).

Fold change (454) Fold change (qPCR)

20 7.9 (2.5–35.8)20 4.4 (1.63–8.8)

6 5 (2.3–9.3)2 NC (0.3–1.8)
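As a rough illustration of how the two measurements in Table 8 can be compared, the following Python sketch checks, for each transcript, whether the 454 and qPCR fold changes point in the same direction and whether the 454 value falls inside the reported qPCR range. The values are transcribed from Table 8; treating “NC” as no change, and treating a fold change above 1 as an increase, are assumptions of this sketch rather than definitions given in the table.

```python
# (transcript, fold change by 454, qPCR fold change or None for "NC", qPCR range)
TABLE_8 = [
    ("Chymotrypsin", 20.0, 7.9, (2.5, 35.8)),
    ("Carboxypeptidase", 20.0, 4.4, (1.63, 8.8)),
    ("ABC transporter", 6.0, 5.0, (2.3, 9.3)),
    ("HSP 70", 2.0, None, (0.3, 1.8)),
]


def same_direction(fc_454, fc_qpcr):
    """True if both fold changes indicate an increase or both a decrease.

    A qPCR entry of None ("NC", assumed to mean no change) never agrees.
    """
    if fc_qpcr is None:
        return False
    return (fc_454 > 1.0) == (fc_qpcr > 1.0)


def within_qpcr_range(fc_454, qpcr_range):
    """True if the 454 fold change lies inside the reported qPCR range."""
    low, high = qpcr_range
    return low <= fc_454 <= high


direction_agreement = [same_direction(fc454, fcq) for _, fc454, fcq, _ in TABLE_8]
magnitude_agreement = [within_qpcr_range(fc454, rng) for _, fc454, _, rng in TABLE_8]
```

Under these assumptions the direction agrees for three of the four transcripts, while the magnitude agrees less often.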


Differences in abundance when read counts are less than 10 should be interpreted with caution (Anders and Huber, 2010), hence we excluded contigs with read counts less than 10 across all four treatments in the differential analysis. In addition, different exposure times were pooled to ensure that transcripts with different kinetics of expression were included in this description of the transcriptome. This may have masked any differential expression that was apparent at specific points after exposure only. To substantiate the differences in expression that were observed, given that we were unable to replicate, a few transcripts were validated via qPCR. The transcriptional changes were consistent in direction and significance, if not magnitude. Similar trends have been observed for microarray data (as discussed in Hook and Osborn, 2012; Osborn and Hook, 2013).
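The read-count filter described above can be sketched in a few lines of Python. The sketch assumes that a contig is excluded when none of the four treatments reaches 10 reads, which is one possible reading of the criterion, and it uses a simple pseudocount-based log2 ratio rather than the normalization performed by DESeq. The function names, the pseudocount and the example counts are hypothetical.

```python
import math

MIN_READS = 10  # threshold stated in the text
TREATMENTS = ("control", "juvenile", "metals", "diesel")


def passes_read_filter(counts, min_reads=MIN_READS):
    """Keep a contig if at least one treatment has min_reads or more reads.

    Assumption: "less than 10 across all four treatments" is read as
    "below the threshold in every treatment".
    """
    return any(counts.get(t, 0) >= min_reads for t in TREATMENTS)


def filter_contigs(table, min_reads=MIN_READS):
    """Return only the contigs that pass the read-count filter."""
    return {cid: c for cid, c in table.items() if passes_read_filter(c, min_reads)}


def log2_fold_change(treated, control, pseudocount=1.0):
    """Illustrative log2 ratio; the pseudocount avoids division by zero."""
    return math.log2((treated + pseudocount) / (control + pseudocount))


# Made-up read counts, not data from the study.
read_counts = {
    "contig_0001": {"control": 3, "juvenile": 5, "metals": 2, "diesel": 0},
    "contig_0002": {"control": 15, "juvenile": 12, "metals": 63, "diesel": 7},
}
kept = filter_contigs(read_counts)
```

Raw ratios of this kind ignore differences in library size between treatments, so they indicate direction only.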

The differential transcript expression we describe in this study matched a priori expectations. The most abundant transcripts were from housekeeping genes (such as actin, myosin, troponin, and GAPDH) and ribosomal RNA subunits, which are commonly found among the most abundant genes in transcriptomic sequencing projects in related species (e.g. Jung et al., 2011). Ribosomal subunit genes were frequently amongst the most common contigs, as well as the most common contigs with increased or decreased abundance, likely because of experimental artifact resulting from differential efficiency of the removal of rRNA from the mRNA prior to sequencing. As would be expected, the most abundantly expressed genes were frequently the same across treatments. There were genes that showed differential expression following each treatment (Supplementary Tables S4–S6). Many of these genes matched expectations based on the toxicological literature. For instance, contigs encoding vitellogenin, the dominant yolk protein, were less abundant in juvenile organisms, which are sexually immature, and in diesel exposed organisms, which were not reproducing. The metal exposed organisms, although reproductively impaired, still had vitellogenin transcripts. A previous study has also shown that exposure to zinc does not cause a decreased abundance of vitellogenin transcripts (De Schamphelaere et al., 2008). Metal-exposed amphipods also increased transcript levels of a variety of different digestive enzymes. Prior work has shown that ingestion of metals inhibits digestive enzymes in deposit-feeding invertebrates (Chen et al., 2002; Dedourge-Geffard et al., 2009; Seebaugh et al., 2011), hence this transcript upregulation may be a compensatory response to the enzyme inactivation. Transcription of different isoforms of glutathione-S-transferase was also responsive to the contaminant treatments. Contigs related to delta class glutathione-S-transferase, a protein previously thought to be specific to insects but since found in other crustaceans (Zhao et al., 2010), were more abundant in metal- and diesel-exposed amphipods, which may relate to its role in oxidative stress (e.g. Barata et al., 2005). In amphipods exposed to metal-contaminated sediments, however, decreased abundance in a glutathione-S-transferase mu was observed. Previous studies in fish have shown that some isoforms of GST can have decreased abundance following exposure to metals (Espinoza et al., 2012). Contigs related to multidrug resistant proteins and ABC cassettes, members of a related protein family that transports organic molecules out of cells (Epel, 1998; Leslie et al., 2005), were more prevalent in the diesel exposed organisms, consistent with previous studies (e.g. Feldstein et al., 2006; Yawetz et al., 2010). Transcripts for heat shock protein 70 and 60 had increased abundance in metal exposed amphipods, which others have also reported in amphipods exposed to cadmium (Werner and Nagel, 1997). However, a large number of differentially expressed contigs either had no annotation or aligned best to a hypothetical protein, making interpretation of some of these differences impossible.

We were surprised that our metal-exposed adult amphipods, rather than the juveniles, showed the most dynamic expression levels of transcripts involved in the crustacean endocrine system. If the changes we observe at the transcript level are manifested at the protein level, there are several possible explanations for this finding. First, we would hypothesize that metal exposure influences the molt cycle in our amphipods. Previous studies have shown that exposure to cadmium can affect molting (Depledge and Billinghurst, 1999), and that exposure to zinc changes expression levels of genes associated with molting (De Schamphelaere et al., 2008). Secondly, many modern pesticides are designed to disrupt molting processes in insects, and would likely interfere with normal cycling of hormones such as ecdysone (the molting hormone), and methyl farnesoate (an analog to juvenile hormone) (LeBlanc, 2007) in crustaceans (e.g. Weston et al., 2005). Other GABA-inhibiting pesticides (such as fipronil) are also known to cause changes in crustacean reproduction (Gaertner et al., 2012). All organochlorine and organophosphate pesticides measured in all treatments were below the limit of detection (0.001 mg/kg); however, this limit is above the concentrations known to cause endocrine effects, and many putative invertebrate endocrine disruptors, such as bifenthrin, cyfluthrin and cypermethrin (e.g. Weston et al., 2005), have not been measured. Since these samples were collected in a developed area of subtropical Australia, we cannot rule out the possibility that they contain juvenile hormone agonist pesticides. Finally, in the metals treatment the concentration of TBT, a known endocrine disruptor in crustaceans (Wang et al., 2011), was approximately 10 µg/kg (2 µg Sn/kg), a concentration that was comparable to that found in control sediments in other studies (Jacobson et al., 2011), and approximately an order of magnitude below those concentrations which cause effects measured in traditional bioassays (Jacobson et al., 2011). We could not predict whether the concentrations of TBT would be sufficient to cause changes in gene expression.

We speculate that the juveniles showed similar contig abundances as the adults for the transcripts involved in the regulation of endocrine processes as a result of our sampling. Many crustacean hormones have multiple roles and function in embryonic and larval development, metamorphosis, sexual development as well as gonadal maturation (LeBlanc, 2007). Since we pooled different growth stages in the juvenile only treatments, there may be genes that are truly differentially expressed at one developmental stage and were “swamped” once different stages were pooled. This may be compounded by the fact that the control-only treatment contained adults, including gravid females carrying developing embryos, as well as adults at different stages of gonad maturation.

The overall diversity of GO terms available for expressed transcripts suggested that this sequencing effort captured a diversity of function within the transcriptome. However, since less than half of the expressed genes could be assigned a GO term, we cannot meaningfully compare the representation to other taxa. This problem is compounded by the fact that D. pulex, the only crustacean with a fully sequenced genome at the time of writing, also has a very high abundance of genes with no homology in the available nucleotide databases and therefore no functional annotation (Colbourne et al., 2011). The few changes observed in GO functional categories when differentially expressed contigs are compared likely reflect this incomplete annotation. However, there were a few notable differences in GO amongst the most differentially expressed contigs. For instance, metal exposure caused an upregulation of contigs associated with the extracellular region. As discussed previously, exposure to metals is known to alter digestive enzyme function, and this transcriptional change may be a compensatory response. Diesel exposure also affected contigs associated with the membrane, which would be expected as narcotic compounds, including many of the components of diesel, are known to interfere with cell membranes (Di Toro et al., 2000; Hook and Osborn, 2012). The juvenile only treatment had an increased proportion of contigs associated with the biological processes of “translation” and “translational elongation”. We would hypothesize that these result from the active growth associated with organismal development. In addition, the most frequent GO terms encountered in the differential expression lists are not the most frequently encountered GO terms in the assembly as a whole, giving further support to the notion that these functions are truly enriched in our treatments.
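The comparison made in this paragraph, between the GO terms that dominate a differential expression list and those that dominate the whole assembly, can be sketched as follows. The Python example expresses each GO term as a percentage of annotated contigs, as in Table 7, and lists the most frequent terms in each set. The data structure (a mapping from contig identifier to GO terms), the function names and the toy annotations are assumptions for illustration, not output from Blast2GO.

```python
from collections import Counter


def go_term_percentages(annotations):
    """Percentage of annotated contigs carrying each GO term.

    annotations maps a contig id to a list of GO terms; contigs with an
    empty list are treated as unannotated and left out of the denominator.
    """
    annotated = [set(terms) for terms in annotations.values() if terms]
    if not annotated:
        return {}
    counts = Counter(term for terms in annotated for term in terms)
    return {term: 100.0 * n / len(annotated) for term, n in counts.items()}


def top_terms(percentages, n=3):
    """Most frequent terms, ties broken alphabetically."""
    ranked = sorted(percentages.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:n]]


# Toy annotations, not data from the study.
assembly = {
    "c1": ["ATP binding", "nucleus"],
    "c2": ["ATP binding"],
    "c3": ["translation"],
    "c4": [],
}
juvenile_up = {"c3": ["translation"]}

assembly_pct = go_term_percentages(assembly)
juvenile_pct = go_term_percentages(juvenile_up)
```

If the top terms of a differential expression list differ from those of the whole assembly, as in this toy case, the list is not simply mirroring the background annotation; a formal enrichment test would still be needed to quantify the difference.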

Although this study provides valuable information about the transcriptome of M. plumulosa and about the process that would be taken to utilize NGS data, it is not without limitations. First, the study only describes the transcriptome. Not all transcriptomic changes will result in changes in proteins and not all cellular processes are regulated at the transcriptomic level. Previous work has discussed the hazards of excluding other types of measurements including proteins (e.g. Nikkinmaa and Rytkonen, 2011; Van Straalen and Feder, 2012). A second limitation is that many contigs identified could not be fully quantified because of the low read depth of 454 pyrosequencing. More recent platforms, such as Illumina HiSeq, would provide a greater number of reads and more reliable quantitation (e.g. Tarazona et al., 2011). Finally, as often occurs for non model organisms, a substantial number of the contigs identified do not have clear homologs and thus functions. It is possible that some of these will be identified as additional Crustacea are sequenced.

5. Conclusions

In conclusion, this work demonstrated the efficacy of using 454 pyrosequencing as a means of transcriptome discovery in a completely undescribed genome with no closely related reference. It also shows the value of having predetermined criteria by which to judge the quality of de novo assembly when there is no reference transcriptome or genome for the work. An iterative approach to the data analysis was used in which we trialed different assembly algorithms and read filtering protocols until we had an assembly that met our criteria for success. In the case of this data set, a fixed-length read-filtering protocol and the MIRA assembler yielded a plausible number of contigs, a reasonable contig length, good transcriptome coverage (as estimated by the representation of GO terms and the inclusion of conserved eukaryotic proteins in our assembly), and transcripts with a chimeric rate that was lower than any of the other assembly protocols we tried. This study also shows changes in gene expression following exposure to contaminants and between adults and juveniles that were consistent with prior studies. The transcriptomic data generated by this study can now be used to generate microarrays and qPCR assays (reviewed in Mehinto et al., 2012), for use both in subsequent ecotoxicological studies and to better understand the biology of these crustaceans.

Acknowledgements and funding

This work was supported by a grant from BioPlatforms Australia to conduct the 454 sequencing. The Systems Biology Initiative acknowledges support from the EIF Super Science Scheme, the NSW State Government Science Leveraging Fund and the University of New South Wales. Additional support was made available by the CSIRO Wealth from Oceans flagship and the CSIRO Bioinformatics Core. The authors acknowledge the assistance of Mr. Ian Hamilton (CSIRO, Land and Water) with animal husbandry and exposures. Mr. Jason Koval (Ramaciotti Centre for Gene Function Analysis, University of New South Wales) performed the 454 sequencing. Ms. Hannah Osborn performed the quantitative PCR. The primary data and workflows can be found at: Hook S, Twine N, Simpson S, Spadaro D, Moncuquet P, Wilkins M (2013): Raw reads and final assembled contigs for Melita project. v2. CSIRO. Data Collection (http://dx.doi.org/10.4225/08/519F04075BB7F). This manuscript was improved by CSIRO internal reviewers, Peter Bain, Graeme Batley and Anthony Chariton, and by two anonymous reviewers.

Appendix A. Supplementary data

Supplementary data associated with this article can be found, in the online version, at http://dx.doi.org/10.1016/j.aquatox.2013.11.022.

References

Altschul, S.F., Gish, W., Miller, W., Myers, E.W., Lipman, D.J., 1990. Basic local alignment search tool. J. Mol. Biol. 215, 403–410.

Anders, S., Huber, W., 2010. Differential expression analysis for sequence count data. Genome Biol. 11, R106.

Barata, C., Varo, I., Navarro, J.C., Arun, S., Porte, C., 2005. Antioxidant enzyme activities and lipid peroxidation in the freshwater cladoceran Daphnia magna exposed to redox cycling compounds. Comp. Biochem. Physiol. C 140, 175–186.

Bellin, D., Ferrarini, A., Chimento, A., Kaiser, O., Levenkova, N., Bouffard, P., Delledorne, M., 2009. Combining next-generation pyrosequencing with microarray for large scale expression analysis in non-model species. BMC Genomics 10, 555.

Blankenberg, D., Gordon, A., Von Kuster, G., Coraor, N., Taylor, J., Nekrutenko, A., The Galaxy Team, 2010. Manipulation of FASTQ data with Galaxy. Bioinformatics 26, 1783–1785.

Brautigam, A., Mullick, T., Schliesky, S., Weber, A.P.M., 2011. Critical assessment of assembly strategies for non-model species mRNA-Seq data and application of next-generation sequencing to the comparison of C3 and C4 species. J. Exp. Bot. 62, 3093–3102.

Cahais, V., Gayral, P., Tsagkogeorga, G., Melo-Ferreira, J., Ballenghien, M., Weinert, L., Chiari, Y., Belkhir, K., Ranwez, V., Galtier, N., 2012. Reference-free transcriptome assembly in non-model animals from next-generation sequencing data. Mol. Ecol. Resour. 12, 834–845.

Chen, Z., Mayer, L.M., Weston, D.P., Bock, M.J., Jumars, P.A., 2002. Inhibition of digestive enzyme activities by copper in the guts of various marine benthic invertebrates. Environ. Toxicol. Chem. 21, 1243–1248.

Chevreux, B., Wetter, T., Suhai, S., 1999. Genome sequence assembly using trace signals and additional sequence information. In: Computer Science and Biology: Proceedings of the German Conference on Bioinformatics (GCB), vol. 99, pp. 45–56.

Colbourne, J.K., Pfrender, M.E., Gilbert, D., Thomas, W.K., Tucker, A., Oakley, T.H., Tokishita, S., Aerts, A., Arnold, G.J., Basu, M.K., Bauer, D.J., Caceres, C.E., Carmel, L., Casola, C., Choi, J.H., Detter, J.C., Dong, Q.F., Dusheyko, S., Eads, B.D., Frohlich, T., Geiler-Samerotte, K.A., Gerlach, D., Hatcher, P., Jogdeo, S., Krijgsveld, J., Kriventseva, E.V., Kultz, D., Laforsch, C., Lindquist, E., Lopez, J., Manak, J.R., Muller, J., Pangilinan, J., Patwardhan, R.P., Pitluck, S., Pritham, E.J., Rechtsteiner, A., Rho, M., Rogozin, I.B., Sakarya, O., Salamov, A., Schaack, S., Shapiro, H., Shiga, Y., Skalitzky, C., Smith, Z., Souvorov, A., Sung, W., Tang, Z.J., Tsuchiya, D., Tu, H., Vos, H., Wang, M., Wolf, Y.I., Yamagata, H., Yamada, T., Ye, Y.Z., Shaw, J.R., Andrews, J., Crease, T.J., Tang, H.X., Lucas, S.M., Robertson, H.M., Bork, P., Koonin, E.V., Zdobnov, E.M., Grigoriev, I.V., Lynch, M., Boore, J.L., 2011. The ecoresponsive genome of Daphnia pulex. Science 331, 555–561.

Conesa, A., Götz, S., Garcia-Gomez, J.M., Terol, J., Talon, M., Robles, M., 2005. Blast2GO: a universal tool for annotation, visualization and analysis in functional genomics research. Bioinformatics 21, 3674–3676.

Craft, J.A., Gilbert, J.A., Temperton, B., Dempsey, K.E., Ashelford, K., Tiwari, B., Hutchinson, T.H., Chipman, J.K., 2010. Pyrosequencing of Mytilus galloprovincialis cDNAs: tissue-specific expression patterns. PLoS ONE 5, e8875.

Dedourge-Geffard, O., Palias, F., Biagianti-Risbourg, S., Geffard, O., Geffard, A., 2009. Effects of metals on feeding rate and digestive enzymes in Gammarus fossarum: an in situ experiment. Chemosphere 77, 1569–1576.

Depledge, M.H., Billinghurst, Z., 1999. Ecological significance of endocrine disruption in marine invertebrates. Mar. Pollut. Bull. 39, 32–38.

De Schamphelaere, K.A.C., Vandenbrouck, T., Muyssen, B.T.A., Soetaert, A., Blust, R., De Coen, W., Janssen, C.R., 2008. Integration of molecular with higher-level effects of dietary zinc exposure in Daphnia magna. Comp. Biochem. Physiol. D 3, 307–314.

Di Toro, D.M., McGrath, J.A., Hansen, D.J., 2000. Technical basis for narcotic chemicals and polycyclic aromatic hydrocarbon criteria. I: water and tissue. Environ. Toxicol. Chem. 19, 1951–1970.

Epel, D., 1998. Use of multidrug transporters as first lines of defense against toxins in aquatic organisms. Comp. Biochem. Physiol. A 120, 23–28.

Espinoza, H.M., Williams, C.R., Gallagher, E.P., 2012. Effect of cadmium on glutathione S-transferase and metallothionein gene expression in coho salmon liver, gill and olfactory tissues. Aquat. Toxicol. 110–111, 37–44.

Feldstein, T., Nelson, N., Mokady, O., 2006. Cloning and expression of MDR transporters from marine bivalves, and their potential use in biomonitoring. Mar. Environ. Res. 62, S118–S121.

Fraser, B.A., Weadick, C.J., Janowitz, I., Rodd, F.H., Hughes, K.A., 2011. Sequencing and characterization of the guppy (Poecilia reticulata) transcriptome. BMC Genomics 12, 202.


aertner, K., Chandler, G.T., Quattro, J., Ferguson, P.L., Sabo-Atwood, T., 2012. Iden-tification and expression of the ecdysone receptor in the harpacticoid copepod,Amphiacus tenuiremis, in response to fipronil. Ecotoxicol. Environ. Saf. 76, 39–45.

iardine, B., Riemer, C., Hardison, R.C., Burhans, R., Elnitski, L., Shah, P., Zhang, Y.,Blankenberg, D., Albert, I., Taylor, J., Miller, W., Kent, W.J., Nekrutenko, A., 2005.Galaxy: a platform for interactive large-scale genome analysis. Genome Res. 15,1451–1455.

oecks, J., Nekrutenko, A., Taylor, J., The Galaxy Team, 2010. Galaxy: a com-prehensive approach for supporting accessible, reproducible, and transparentcomputational research in the life sciences. Genome Biol. 25, R86.

ale, M.C., McCormick, C.R., Jackson, J.R., DeWoody, J.A., 2009. Next-generationpyrosequencing of gonad transcriptomes in the polyploidy lake sturgeon(Acipenser fulvescens): the relative merits of normalization and rarefaction ingene discovery. BMC Genomics 10, 203.

annas, B.R., Wang, Y.H., Baldwin, W.S., Li, Y., Wallace, A.D., LeBlanc, G.A., 2010.Interactions of the crustacean nuclear receptors HR3 and E75 in the regulationof gene transcription. Gen. Comp. Endocrinol. 167, 268–278.

ayward, A., Takahashi, T., Bendena, W.G., Tobe, S.S., Hui, J.H.L., 2010. Comparativegenomic and phylogenetic analysis of vitellogenin and other large lipid transferproteins in metazoans. FEBS Lett. 584, 1273–1278.

ook, S.E., 2010. Promise and progress in environmental genomics: a status reporton the applications of microarray studies in ecologically relevant fish species. J.Fish Biol. 77, 1999–2022.

ook, S.E., Osborn, H.L., 2012. Comparison of toxicity and transcriptomic profiles ina diatom exposed to oil, dispersants, dispersed oil. Aquat. Toxicol. 124, 139–151.

ornett, E.A., Wheat, C.W., 2012. Quantitative RNA-Seq analysis in non-modelspecies: assessing transcriptome assemblies as a scaffold and the utility of evo-lutionary divergent genomic reference species. BMC Genomics 13, 361.

udson, M.E., 2008. Sequencing breakthroughs for genomic ecology and evolution-ary biology. Mol. Ecol. Resour. 8, 3–17.

yne, R.V., Gale, S.A., King, C.K., 2005. Laboratory culture and life-cycle experimentswith the benthic amphipod Melita plumulosa (Zeider). Environ. Toxicol. Chem.24, 2065–2073.

yne, R.V., 2011. Review of the reproductive biology of amphipods and theirendocrine regulation: identification of mechanistic pathways for reproductivetoxicants. Environ. Toxicol. Chem. 30, 2647–2657.

haka, R., Gentleman, R., 1996. R: a language for data analysis and graphics. J. Comput.Graph. Stat. 5, 299–314.

acobson, T., Sundelin, B., Yang, G.D., Ford, A.T., 2011. Low dose TBT exposuredecreases amphipod immunocompetence and reproductive fitness. Aquat. Tox-icol. 101, 72–77.

ung, H., Lyons, R.E., Dinh, H., Hurwood, D.A., McWilliam, S., Mather, P.B., 2011. Tran-scriptomics of a Giant freshwater prawn, (Macrobrachium rosenbergii): De Novoassembly, annotation and marker discovery. PLoS ONE 12, 1–14, e27938.

awahara-Miki, R., Wada, K., Azuma, N., Chiba, S., 2011. Expression profiling with-out genome sequence information in a non-model species, Pandalid Shrimp(Pandalus latirostris), by Next-Generation Sequencing. PLoS ONE 6, e26043.

im, E., Magen, A., Ast, G., 2007. Different levels of alternative splicing among eukary-otes. Nucleic Acids Res. 35, 125–131.

ing, C.K., Simpson, S.L., Smith, S.V., Stauber, J.L., Batley, G.E., 2005. Short-term accu-mulation of Cd and Cu from water, sediment and algae by the amphipod Melitaplumulosa and the bivalve Tellina deltoidalis. Mar. Ecol. Prog. Ser. 287, 177–188.

ing, C.K., Gale, S.A., Stauber, J.L., 2006a. Acute toxicity and bioaccumulation of aque-ous and sediment-bound metals in the estuarine amphipod Melita plumulosa.Environ. Toxicol. 21, 489–504.

ing, C.K., Gale, S.A., Hyne, R.V., Stauber, J.L., Simpson, S.L., Hickey, C.W., 2006b.Sensitivities of Australian and New Zealand amphipods to copper and zinc inwaters and metal-spiked sediments. Chemosphere 63, 1466–1476.

Kumar, S., Blaxter, M.L., 2010. Comparing de novo assemblers for 454 transcriptome data. BMC Genomics 11, 571.

Kwok, R., Zhang, J.R., Tobe, S.S., 2005. Regulation of methyl farnesoate production by mandibular organs in the crayfish, Procambarus clarkii: a possible role for allatostatins. J. Insect Physiol. 51, 367–378.

LeBlanc, G.A., 2007. Crustacean endocrine toxicology: a review. Ecotoxicology 16, 61–81.

Lee, J.S., Rhee, J.S., Kim, R.O., Hwang, D.S., Han, J., Choi, B.S., Park, G.S., Kim, I.C., Park, H.G., Lee, Y.M., 2010. The copepod Tigriopus japonicus genomic DNA information (574 Mb) and molecular anatomy. Mar. Environ. Res. 69, S21–S23.

Leslie, E.M., Deeley, R.G., Cole, S.P.C., 2005. Multidrug resistance proteins: role of P-glycoprotein, MRP-1, MRP-2 and BCRP (ABCG2) in tissue defense. Toxicol. Appl. Pharmacol. 204, 216–237.

Li, H., Handsaker, B., Wysoker, A., Fennell, T., Ruan, J., Homer, N., Marth, G., Abecasis, G., Durbin, R., 1000 Genome Project Data Processing Subgroup, 2009. The Sequence alignment/map (SAM) format and SAMtools. Bioinformatics 25, 2078–2079.

Li, H., Durbin, R., 2010. Fast and accurate long-read alignment with Burrows-Wheeler transform. Bioinformatics 26, 589–595.

Li, C., Weng, S., Chen, Y., Yu, X., Lu, L., Zhang, H., He, J., Xu, X., 2012. Analysis of Litopenaeus vannamei transcriptome using the next generation sequencing technique. PLoS ONE 7, e47442.

Livak, K.J., Schmittgen, T.D., 2001. Analysis of relative gene expression data using real-time quantitative PCR and the 2−ΔΔCt method. Methods 25, 402–408.

Lowe, C.D., Mello, L.V., Samatar, N., Martin, L.E., Montagnes, D.J.S., Watts, P.C., 2011. The transcriptome of the novel dinoflagellate Oxyrrhis marina (Alveolata: Dinophyceae): response to salinity examined by 454 sequencing. BMC Genomics 12, 519.


Ma, K., Qiu, G., Feng, J., Li, J., 2012. Transcriptome analysis of the Oriental River Prawn, Macrobrachium nipponense using 454 pyrosequencing for discovery of genes and markers. PLoS ONE 7, e39727.

Maniatis, T., Tasic, B., 2002. Alternative pre-mRNA splicing and proteome expansion in metazoans. Nature 418, 236–243.

Mann, R.M., Hyne, R.V., Spadaro, D.A., Simpson, S.L., 2009. Development and application of a rapid amphipod reproduction test for sediment quality assessment. Environ. Toxicol. Chem. 28, 1244–1254.

Mann, R.M., Hyne, R.V., Simandjuntak, D.L., Simpson, S.L., 2010. A rapid amphipod reproduction test for sediment quality assessment: in situ bioassays do not replicate laboratory bioassays. Environ. Toxicol. Chem. 29, 2566–2574.

Margulies, M., Egholm, M., Altman, W.E., Attiya, S., Bader, J.S., Bemben, L.A., Berka, J., Braverman, M.S., Chen, Y.J., Chen, Z.T., Dewell, S.B., Du, L., Fierro, J.M., Gomes, X.V., Godwin, B.C., He, W., Helgesen, S., Ho, C.H., Irzyk, G.P., Jando, S.C., Alenquer, M.L.I., Jarvie, T.P., Jirage, K.B., Kim, J.B., Knight, J.R., Lanza, J.R., Leamon, J.H., Lefkowitz, S.M., Lei, M., Li, J., Lohman, K.L., Lu, H., Makhijani, V.B., McDade, K.E., McKenna, M.P., Myers, E.W., Nickerson, E., Nobile, J.R., Plant, R., Puc, B.P., Ronan, M.T., Roth, G.T., Sarkis, G.J., Simons, J.F., Simpson, J.W., Srinivasan, M., Tartaro, K.R., Tomasz, A., Vogt, K.A., Volkmer, G.A., Wang, S.H., Wang, Y., Weiner, M.P., Yu, P.G., Begley, R.F., Rothberg, J.M., 2005. Genome sequencing in microfabricated high-density picolitre reactors. Nature 437, 376–380.

Martin, J.A., Wang, Z., 2011. Next-generation transcriptome assembly. Nat. Rev. Genet. 12, 671–682.

Martinez-Barnatche, J., Gomez-Barreto, R.E., Ovilla-Munoz, M., Tellez-Sosa, J., Garcia Lopez, D.E., Dinglasan, R.R., Mohein, C.U., MacCallum, R.M., Redmond, S.N., Gibbons, J.G., Rokas, A., Machado, C., Cazares-Raga, F.E., Gonzalez-Ceron, L., Hernandez-Martinez, S., Rodriguez Lopez, M.H., 2012. Transcriptome of the adult female malaria mosquito vector Anopheles albimanus. BMC Genomics 13, 207.

McCarthy, F.M., Wang, N., Magee, G.B., Nanduri, B., Lawrence, M.L., Camon, E.B., Barrell, D.G., Hill, D.P., Dolan, M.E., Williams, W.P., Luthe, D.S., Bridges, S.M., Burgess, S.C., 2006. AgBase: a functional genomics resource for agriculture. BMC Genomics 7, 229.

Mehinto, A.C., Martyniuk, C.J., Spade, D.J., Denslow, N.D., 2012. Applications of next generation sequencing in fish ecotoxicogenomics. Front. Genet. 3, 62.

Mundry, M., Bornberg-Bauer, E., Sammeth, M., Feulner, P.G.D., 2012. Evaluating characteristics of De Novo Assembly software on 454 transcriptome data: a simulation approach. PLoS ONE 7, e13410.

Nekrutenko, A., Taylor, J., 2012. Next generation sequencing data interpretation: enhancing reproducibility and accessibility. Nat. Rev. Genet. 13, 667–672.

Nikinmaa, M., Rytkonen, K.T., 2011. Functional genomics in aquatic toxicology – do not forget the function. Aquat. Toxicol. 105, 16–24.

Oliveros, J.C., 2007. VENNY. An Interactive Tool for Comparing Lists with Venn Diagrams. http://bioinfogp.cnb.csic.es/tools/venny/index.html

O’Neil, S.T., Dzurisin, J.D.K., Carmichael, R.D., Lobo, N.F., Emrich, S.J., Hellmann, J.J., 2010. Population-level transcriptome sequencing of nonmodel organisms Erynnis propertinus and Papilio zelicaon. BMC Genomics 11, 310.

Osborn, H.L., Hook, S.E., 2013. Using transcriptomic profiles in the diatom Phaeodactylum tricornutum to identify and prioritize stressors. Aquat. Toxicol. 138, 12–25.

Parra, G., Bradnam, K., Korf, I., 2007. CEGMA: a pipeline to accurately annotate core genes in eukaryotic genomes. Bioinformatics 23, 1061–1067.

Perez-Casanova, J.C., Hamoutene, D., Hobbs, K., Lee, K., 2011. Effects of chronic exposure to the aqueous fraction of produced water on growth, detoxification, and immune factors of Atlantic cod. Ecotoxicol. Environ. Saf. 86, 239–249.

Poynton, H.C., Vulpe, C.D., 2009. Ecotoxicogenomics: emerging technologies for emerging contaminants. J. Am. Water Resour. Assoc. 45, 83–96.

Poynton, H.C., Varshavsky, J.R., Chang, B., Cavigiolio, G., Chan, S., Holman, P.S., Loguinov, A.V., Bauer, D.J., Komachi, K., Theil, E.C., Perkins, E.J., Hughes, O., Vulpe, C.D., 2007. Daphnia magna ecotoxicogenomics provides mechanistic insights into metal toxicity. Environ. Sci. Technol. 41, 1044–1050.

Pruesse, E., Quast, C., Knittel, K., Fuchs, B., Ludwig, W., Peplies, J., Glockner, F.O., 2007. SILVA: a comprehensive online resource for quality checked and aligned ribosomal RNA sequence data compatible with ARB. Nucleic Acids Res. 35, 7188–7196.

Rewitz, K.F., Gilbert, L.I., 2008. Daphnia Halloween genes that encode cytochrome P450s mediating the synthesis of arthropod molting hormone: evolutionary implications. BMC Evol. Biol. 8, 60.

Santure, A.W., Gratten, J., Mossman, J.A., Sheldon, B.C., Slate, J., 2011. Characterization of the transcriptome of a wild great tit Parus major population by next generation sequencing. BMC Genomics 12, 283.

Schliesky, S., Gowik, U., Weber, A.P.M., Brautigam, A., 2012. RNA-seq assembly – are we there yet? Front. Plant Sci. 3, 220.

Seebaugh, D.R., L’Amoreaux, W.J., Wallace, W.G., 2011. Digestive toxicity in grass shrimp collected along an impact gradient. Aquat. Toxicol. 105, 609–617.

Simpson, S.L., Spadaro, D.A., 2011. Performance and sensitivity of rapid sublethal sediment toxicity tests with the amphipod Melita plumulosa and copepod Nitocra spinipes. Environ. Toxicol. Chem. 30, 2326–2334.

Spadaro, D.A., Micevska, T., Simpson, S.L., 2008. Effect of nutrition on toxicity of contaminants to the epibenthic amphipod Melita plumulosa. Arch. Environ. Contam. Toxicol. 55, 593–602.

Stillman, J.H., Colbourne, J.K., Lee, C.L., Patel, N.H., Philips, M.R., Towle, D.W., Eads, B.D., Gelembiuk, G.W., Henry, R.P., Johnson, E.A., Pfrender, M.E., Terwilliger, N.B., 2008. Recent advances in crustacean genomics. Integr. Comp. Biol. 48, 852–868.

Tarazona, S., Garcia-Alcalde, F., Dopazo, J., Ferrer, A., Conesa, A., 2011. Differential expression in RNA-seq: a matter of depth. Genome Res. 21, 2213–2223.


Taylor, R.C., Webb Robertson, B.J.M., Markillie, L.M., Serres, M.H., Linggi, B.E., Aldrich, J.T., Hill, E.A., Romine, M.F., Lipton, M.S., Wiley, H.S., 2013. Changes in translational efficiency is a dominant regulatory mechanism in the environmental response of bacteria. Integr. Biol. 5, 1393–1406.

Traylor-Knowles, N., Granger, B.R., Lubinski, T., Parikh, J.R., Garamszegi, S., Xia, Y., Marto, J.A., Kaufman, K., Finnerty, J.R., 2011. Production of a reference transcriptome and a transcriptomic database (PocilloporaBase) for the cauliflower coral, Pocillopora damicornis. BMC Genomics 12, 585.

U.S. Environmental Protection Agency, 1996. Test Methods for Evaluating Solid Waste, Physical/Chemical Methods. SW-846. Office of Solid Waste, Washington, DC.

Van Straalen, N.M., Feder, M.E., 2012. Ecological and evolutionary functional genomics – how can it contribute to the risk assessment of chemicals. Environ. Sci. Technol. 46, 3–9.

Villeneuve, D.L., Garcia-Reyero, N., 2011. Predictive Ecotoxicology in the 21st Century. Environ. Toxicol. Chem. 30, 1–8.

Wang, T.H., Kwon, G., Li, H., LeBlanc, G.A., 2011. Tributyltin synergizes with 20-hydroxyecdysone to produce endocrine toxicity. Toxicol. Sci. 123, 71–79.

Werner, I., Nagel, R., 2011. Stress proteins hsp60 and hsp70 in three species of amphipods exposed to cadmium, diazinon, dieldrin and fluoranthene. Environ. Toxicol. Chem. 16, 2393–2403.

Weston, D.P., Holmes, R.W., You, J., Lydy, M.J., 2005. Aquatic toxicity due to residential use of pyrethroid insecticides. Environ. Sci. Technol. 39, 9778–9784.

Williams, T.D., Diab, A.M., Gubbins, M., Collins, C., Matejusova, I., Kerr, R., Chipman, J.K., Kuiper, R., Vethaak, A.D., George, S.G., 2013. Transcriptomic responses of European flounder (Platichthys flesus) liver to a brominated flame retardant mixture. Aquat. Toxicol. 142–143, 45–52.

Wiseman, S.B., Anderson, J.C., Liber, K., Giesy, J.P., 2013. Endocrine disruption and oxidative stress in larvae of Chironomus dilutus following short-term exposure to fresh or aged oil sands process-affected water. Aquat. Toxicol. 142–143, 414–421.

Yawetz, A., Fishelson, L., Bresler, V., Manelis, R., 2010. Comparison of the effects of pollution on the marine bivalve Donax trunculus in the vicinity of polluted sites with specimens from a clean reference site (Mediterranean Sea). Mar. Pollut. Bull. 60, 225–229.

Zeng, V., Villanueva, K.E., Ewen-Campen, B.S., Alwes, F., Browne, W.E., Extavour, C.G., 2011. De novo assembly and characterization of a maternal and developmental transcriptome for the emerging model crustacean Parhyale hawaiensis. BMC Genomics 12, 581.

Zhao, D., Chen, L., Qin, C., 2010. A delta-class glutathione transferase from the Chinese mitten crab Eriocheir sinensis: cDNA cloning, characterization and mRNA expression. Fish Shellfish Immunol. 29, 698–703.