Genome Assembly and Allele Distribution of Sulfolobus ... than Sulfulobus islandicus, despite being...

Preview:

Citation preview

Sulfolobus isolateswerepurified from threehot springs inNymphLakeYellowstoneNationalParkin2012(Fig.1)(Table1).

These isolateswere sequencedon an IlluiminaHiSEq2000.Among them, three isolates fromthreedifferenthotspringswerealsosequencedusingPacBioRSII.Allgenomeswereassembledwiththea5pipeline(Trittetal.,2012).SPAdes(Nurk,Bankevichetal.,2013),SSPACE(Boetzeret al., 2011) and MIRA assembler (Chevreux et al., 1999) was used to perform a hybridassemblywithilluminareadsandPacBiodata,whilePBJelly2(Englishetal.,2012)wasusedtofillthegapsbetweenscaffoldsusingPacBioreadsasreferences.

GenomeAssemblyandAlleleDistributionofSulfolobusIslandicusY.Zhou1R.Anderson2andR.J.Whitaker3,1yzhou103@illinois.edu2rikander@illinois.edu3rwhitaker@life.illinois.edu.

InstituteforGenomicBiology,UniversityofIllinoisatUrbana-Champaign

IntroductionExtremophiles such as the genus Sulfolobus thrive in a range of extreme habitats thatwereoncethoughttobe inhospitablefor life.Theseextremehabitatsprovidebarrierstodispersal,allowing us to better investigate population differentiation and its relationship to ecologicalconditions. Previous study has shown that Sulfulobus acidocaldarius genomes are moreconserved than Sulfulobus islandicus, despite being closely related and sharing the samegeothermalhabitats.Therefore,wesuspectconsiderablevariationingenecontentandincoregenomevariationofS.islandicus.

Also, previous study of S. islandicus specific neutral loci YG5714_0138 (“Aldhy” marker) hasshowedthatdifferentALDallelesdominatethroughoutdifferenthotspringsandtimescales.Since the ALD alleles are S. islandicus specific loci that are under natural selection, theirphylogenetic relationship is likely representing the evolutionary pattern of core or variablegenomesofS. islandicus. Inthisstudy,wefocuson41Sulfolobus islandicusgenomesisolatedfromNymphLakeYellowstoneNationalParkin2012.

Questionsthatweseektoaddressinclude:-HowS.islandicusaremoresubjecttointer-speciesvariation?-What’sthespatialandtemporalpatternofgenomevariationofS.islandicus?-Howgenesaredistributedamongstrainsfromdifferenthotspringpopulations?

Approachesthatweusedinclude:-ComparativeanalysisofgenomesequencesofcloselyrelatedgenomessampledovertimeandspacefromYNP;-Correlatingthechangeinfrequencyofcandidatelociovertimewithbioticandabioticchangesbetweennaturalpopulations.

MaterialsandMethods

Results

ACKNOWLEDGEMENTSFundingwasreceivedfromaNASAPostdoctoralFellowshipthroughtheNASAAstrobiology InstitutetoREA,andNASAExobiologyandEvolutionaryBiologyNNX09AM92GtoR.J.W.REFERENCES1.Tritt,Andrew,etal.“Anintegratedpipelinefordenovoassemblyofmicrobialgenomes.”(2012):e42304.2. Nurk, S., et al. “Assembling genomes andmini-metagenomes from highly chimeric reads, p 158–170.”Research in computationalmolecular biology. Springer Verlag, Berlin,Germany(2013).

S.islandicusismorevariablethanS.acidocaldarius

Mao and Grogan (2012) showed that S. acidocaldarius genomes are much more highlyconserved thanS. islandicus, despite being closely related and sharing the same geothermalhabitats. We observe a similar, though less pronounced, trend. Pairwise average nucleotidedivergenceandpolymorphismsamongcoregenomesinS.islandicusis100to1000foldshigherthanthoseinS.acidocaldarius(Table.2).(LNP=LassenNationalPark)

GenomeassemblyofS.islandicus

The de novo assembly of Illumina reads using a5 pipeline yielded between 150 and 220scaffolds, and the longest scaffolds is less than 1/10 of thewhole genome. TheS. islandicusgenomesweremuchmoredifficulttoassemble, likelyduetothepresenceofISelements.Byusing different hybrid assembly methods, we’ve managed to get three consensus draftgenomes(Fig.2).Theresultingscaffoldswerecontiguatedintoasinglesequenceusingabacas(Assefa et al., 2009) for each genome, using the consensus draft genome as a reference. Toensurepropercontiguation,readsweremappedtotheassembliesandvisualizedwithbwaandsamtools(Lietal.,2009).Furthermore,thecontiguatedscaffoldsweremanuallycheckedandrearrangedifnecessarywithArtemis(Rutherfordetal.,2000).

-Despiteoccupyingthesamegenusandgeothermalhabitats,thegenomeofS.acidocaldariusismuchmorehighlyconservedthanS.islandicus.Thismaybeindicativeofdifferenthistoricalpopulationstructure.-2x104SNPswereidentifiedamong37S.islandicusgenomes,whichis100foldshigherthanS.acidocaldarius.-ALDallelesA55,A44,A46andA03werefoundtohavehighsequenceabundanceoccurinalargenumberofsamples,andaredifferentbyonetotwoSNPs.-ALDallelesA46,A55andA44represent91.2%genotypeofalltheisolatesfromNymphLakeYNP2012.

Sofarwe’vegot37draftgenomesofS.islandicus.ThegenomewiseSNPsbetweenalltheisolatesisaround2x104,whichis100foldshigherthanS.acidocaldarius.Tobetterassessgenomevariationamongthesepopulations,weproposetosequencemorestrainsthatwereisolatedinour2008-2010high-frequencysamplingofourfocalsprings.Thecombinedsetofgenomeswillrepresentthegenomedifferencesintimeandspace.

3.English,AdamC.,etal."Mindthegap:upgradinggenomeswithPacificBiosciencesRSlong-readsequencingtechnology."(2012):e47768.4.Boetzer,Marten,etal."Scaffoldingpre-assembledcontigsusingSSPACE."Bioinformatics27.4(2011):578-579.5.Didelot,X.,&Falush,D.(2007).Inferenceofbacterialmicroevolutionusingmultilocussequencedata.Genetics,175(3),1251–66.doi:10.1534/genetics.106.0633056.MaoD,GroganD.(2012).Genomicevidenceofrapid,global-scalegeneflowinaSulfolobusspecies.ISMEJ.6:1613–6.7.RenoML,HeldNL,FieldsCJ,BurkePV,WhitakerRJ.(2009).BiogeographyoftheSulfolobusislandicuspan-genome.Proc.Natl.Acad.Sci.U.S.A.106:8605–10.8.Assefa,Samuel,etal."ABACAS:algorithm-basedautomaticcontiguationofassembledsequences."Bioinformatics25.15(2009):1968-1969.9.Li,Heng,etal."Thesequencealignment/mapformatandSAMtools."Bioinformatics25.16(2009):2078-2079.10.Rutherford,Kim,etal."Artemis:sequencevisualizationandannotation."Bioinformatics16.10(2000):944-945.

#Isolates Species Date Location Spring Temp pH Conduc-tivity

NL01B_C01 25 S.islandicus 9/24/2012 NymphLake Monitor n/a n/a n/a

NL03_C02 12 S.islandicus 9/25/2012 NymphLake North/Spanky

85.8 2.31 1.894

NL13_C01 4 S.islandicus 9/24/2012 NymphLake ProsperousPoint

n/a n/a n/a

Figure1.Summaryof sampling sites in YNP. Left: showsgeographic locationof six basins inwhich acidichot springsweresampled. Right:Sulfate/chlorideplotsofacidichotspringssampledfrompreviousproposal. Focalsprings for thisproposalsampledthroughtimeareshowninred.

For phylogenetic analysis of ALD alleles, 17 samples from three distinct regions withinYellowstoneNational Park:GibbonGeyser Basin (GG),Nymph Lake (NL),NorrisGeyser Basin(NG) at three different time points: 2010, 2011 and 2012. These alleles were amplified andpyrosequencedusingtheRoche/454genomesequencerFLXTitanium+.Forthe41S.islandicusgenomes, all the Illumina short readsmatched to theALDgenewereextractedusingblastn,andassembledusingSequencher.AlltheALDalleleswerealignedusingclustalwandPhylipwasusedtobuildthephylogenetictrees.

Genomecomparisons

AveragePairwiseNucleotideDivergence

Species Samplinglocations Reference

Comparisonsamongall47S.acidocaldariusstrainsfromYNP2012

6.21x10-5 S.acidocaldarius YNP(2012) Andersonetal.,unpublished

DSM639vs.N8 3.70x10-6 S.acidocaldarius YNP(1972)vs.Japan Mao&Grogan,2012

N8vs.Ron12/I 1.30x10-5 S.acidocaldarius Japanvs.Germany Mao&Grogan,2012

Ron12/Ivs.DSM639 1.66x10-5 S.acidocaldarius Germanyvs.YNP(1972) Mao&Grogan,2012

ComparisonsamongYellowstonestrains

2.60x10-3 S.islandicus YNP Renoetal.,2009

ComparisonsbetweenNorthAmericanstrains

4.60x10-3 S.islandicus YNPvs.LNP Renoetal.,2009

ComparisonsbetweeneachNorthAmericanandMutnovskystrain

1.11x10-2 S.islandicus YNP/LNPvs.Mutnovsky Renoetal.,2009

Table1.Listofisolatessentforcompletegenomesequencing

The phylogenetic analysis of dominant ALD alleles and ALD genes from 2012 S. islandicusisolates shows the similarpattern.18/34 isolateshave theA46allele,9/34 isolateshave theA55allele,and4/34isolateshavetheA44allele.A46,A55andA44represent91.2%genotypeof all the isolates from Nymph Lake 2012. Also, there is no significant spacial distributionpatternoftheALDalleles(Fig.4).

ALDalleledistributionofS.islandicus

FortheAldhymarker,previousstudy identified148allelesfrom203,058Aldhysequences.19allelesweredominant,andtheycomprise99%ofthetotalsequenceabundance.AllelesA55,A44,A46andA03were found tohavehigh sequenceabundanceoccur ina largenumberofsamples, and are different by one to two SNPs. Allele A55 and/or A44 is found in highabundanceinNG05(>50%)whilealleleA46isfoundprimarilyinNL01(Fig.3),butalsooccursinNL03andNL13. AlthoughalleleA46 ispresentbutneverfoundatgreaterthan1%inNG05,allelesA55,A44,andA46canallbefoundtovaryingdegreesinNL01,NL03,andNL13.

0.01

0.1

1

10

100

1000

46 55 44 3 85 137 34 82 23 43 77 73 40 28 91 83 69 104 81 35 78 136 94 74 95 106 75 102 15 123 31 45 18 8 80 38 22 13 56

NL01.09.19.10.ALD

MaximumParsimonytree Consensusneighbor-jointree

Conclusions

Recommended