Supporting Information
Husnik and McCutcheon 10.1073/pnas.1603910113
SI Materials and Methods
Symbiont Genome Assembly, Annotation, and Analyses. Endosymbiont genomes were closed into circular mapping molecules by a combination of PCR and Sanger sequencing. General Tremblaya primers for closing problematic regions, such as the duplicated rRNA operon, were designed to be applicable to most Tremblaya princeps species (Table S4). Given the unclear GC skew in some of the species, the origin of replication was set to the same region as in already published Tremblaya and Moranella genomes to standardize comparative genomic analyses. Pilon v1.12 (86) and REAPR v1.0.17 (87) were used to diagnose and improve potential misassemblies, collapsed repeats, and polymorphisms. Genome annotations and reannotations [abbreviations combine Tremblaya princeps (TP) with species abbreviations such as PCIT; i.e., for TPPCIT, MEPCIT, and Tremblaya phenacola from Phenacoccus avenae (TPPAVE)] were carried out by the Prokka v1.10 pipeline (88) with the default discarding of ORFs overlapping tRNAs disabled. Our comparative data allowed us to reannotate many genes and pseudogenes previously annotated as hypothetical proteins and to uncover pseudogene remnants (Dataset S2A). The Tremblaya panproteome was curated manually with extensive use of MetaPathways v2.0 (89), PathwayTools v17.0 (90), and InterProScan v5.10 (91) and then used in Prokka as trusted proteins for annotation. This approach was used to obtain identical gene names for all seven Tremblaya genomes (TPPAVE, TPPCIT, TPMHIR, TPFVIR, TPPLON, TPPMAR, and TPTPER). tRNA and tmRNA regions were reannotated using the tFind.pl wrapper (bioinformatics.sandia.gov/software). Tremblaya pseudogenes were reannotated in the Artemis browser (92) based on a genome alignment of all Tremblaya genomes.

Genomes of γ-proteobacterial symbionts were annotated as described for Tremblaya genomes, except that several approaches were used to assist in pseudogene annotation. Proteins split into two or more ORFs were joined into a single pseudogene feature. All proteins were then searched against the National Center for Biotechnology Information (NCBI) nonredundant (NR) protein database, and their lengths were compared. If the endosymbiont protein was shorter than 60% of its 10 top hits, it was called a pseudogene unless it was known to be a bifunctional protein and at least one of its domains was intact. All intergenic regions were then screened by BlastX (e value 1e−4) against NR to reveal pseudogene remnants.

Multigene matrices of conserved orthologous genes for β-proteobacteria (49 genes) and Enterobacteriaceae (80 genes) were generated by the PhyloPhlAn package (93). Sequences of genes for 16S and 23S rRNA were downloaded from the NCBI nucleotide database and used for Tremblaya- and Sodalis-allied, species-rich phylogenies. All matrices were aligned by the MAFFT v6 L-INS-i algorithm (94). Ambiguously aligned positions were excluded by trimAl v1.2 (95) with the -automated1 flag set for likelihood-based phylogenetic methods. Maximum likelihood (ML) and Bayesian inference (BI) phylogenetic methods were applied to the single-gene and concatenated amino acid alignments. ML trees were inferred using RAxML 8.2.4 (96) under the LG + G model with the subtree pruning and regrafting tree search algorithm and 1,000 bootstrap pseudoreplicates. BI analyses were conducted in MrBayes 3.2.2 (97) under the LG + I + G model with 5 million generations [prset aamodel = fixed(lg), lset rates = invgamma ngammacat = 4, mcmcp checkpoint = yes ngen = 5,000,000]. Concatenated 16S–23S rRNA gene phylogenies for mealybug endosymbionts were inferred as above, except that the GTR + I + G model was used. For BI analyses, a proportion of invariable sites (I) was estimated from the data, and heterogeneity of evolutionary rates was modeled by four substitution rate categories of the γ- (G) distribution with the γ-shape parameter (α) estimated from the data. Exploration of Markov chain Monte Carlo convergence and burn-in determination were performed in AWTY (ceb.csit.fsu.edu/awty) and Tracer v1.5 (evolve.zoo.ox.ac.uk). Additionally, concatenated protein and Dayhoff6 recoded datasets were analyzed under the CAT + GTR + G model in PhyloBayes MPI 1.5a (98). Posterior distributions obtained under four independent PhyloBayes runs were compared using the tracecomp and bpcomp programs, and runs were considered converged at maximum discrepancy value <0.1 and minimum effective size >100.

Tremblaya genomes were aligned using progressiveMauve v2.3.1 (99). Clusters of orthologous genes were generated using OrthoMCL v1.4 (100). Orthologs missed because of low homology (BLAST e value 1e−5) were curated with the help of identical gene order and annotations. All genomes were visualized as linear with links connecting positions of orthologous genes in Processing 3 (https://processing.org/). Additional figures were drawn or curated in Inkscape (https://inkscape.org/en/).
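The length-based pseudogene criterion described above for the γ-proteobacterial symbionts can be illustrated with a minimal Python sketch. The text does not say how the lengths of the 10 top hits were summarized, so comparing against their median is an assumption made here, and the function name and arguments are hypothetical rather than part of the published workflow.

```python
from statistics import median


def is_length_pseudogene(query_len, hit_lens, top_n=10, min_fraction=0.60,
                         intact_bifunctional_domain=False):
    """Flag a protein as a putative pseudogene by relative length.

    query_len: length of the endosymbiont protein (amino acids).
    hit_lens: lengths of its database hits, best hit first.
    intact_bifunctional_domain: True if the protein is a known bifunctional
        protein with at least one intact domain (exempt from the rule).
    """
    if intact_bifunctional_domain:
        return False
    top_hits = hit_lens[:top_n]
    if not top_hits:
        # Without hits there is nothing to compare against.
        return False
    # Assumption: the top hits are summarized by their median length.
    reference_len = median(top_hits)
    return query_len < min_fraction * reference_len


hits = [300, 310, 295, 305, 298, 302, 299, 301, 300, 297]
print(is_length_pseudogene(150, hits))   # 150 is below 60% of the median
print(is_length_pseudogene(250, hits))   # 250 is above the cutoff
print(is_length_pseudogene(150, hits, intact_bifunctional_domain=True))
```

A different summary (for example, requiring the protein to be shorter than 60% of every top hit) would change only the line that computes `reference_len`.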
Contamination Screening and Filtering of Draft Mealybug Genomes. The presence of additional species, such as facultative symbionts, environmental bacteria, and contamination, in the genome data was visualized by Taxon-Annotated GC-Coverage (TAGC; drl.github.io/blobtools/) plots (101, 102), and the tool was also used to extract contigs of two γ-proteobacterial symbionts from the Pseudococcus longispinus mealybug and Wolbachia sp. from the Maconellicoccus hirsutus mealybug. We confirmed that there were no other organisms present in our data at high coverage, except the expected endosymbionts. Although there are now reliable methodologies to remove the majority of contamination from data sequenced using several independent libraries (102, 103), recognizing low-coverage contamination (in our case, mostly of bacterial, human, and plant origin) from single-library sequencing data can be problematic. Using the TAGC tool, we were able to recognize low-coverage Propionibacterium spp. and human contamination in several of the samples (megablast e value 1e−25) and plant contamination in the P. longispinus sample. These short sequences were filtered out, and all (nonsymbiont) contigs or scaffolds shorter than 200 bp and/or having coverage lower than 3× were also excluded from the total assemblies.
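The final length and coverage filter can be sketched in Python as follows. This is only an illustration of the two stated thresholds (200 bp and 3×); the tuple layout and the function name are hypothetical, and symbiont contigs are assumed to have been set aside beforehand because the filter applies to nonsymbiont sequences only.

```python
def keep_contig(length_bp, coverage, min_length=200, min_coverage=3.0):
    """Keep a nonsymbiont contig only if it passes both thresholds.

    A contig is excluded when it is shorter than min_length and/or has
    coverage lower than min_coverage.
    """
    return length_bp >= min_length and coverage >= min_coverage


# Hypothetical (name, length in bp, coverage) records for illustration.
contigs = [
    ("c1", 1500, 25.0),  # passes both thresholds
    ("c2", 150, 40.0),   # too short
    ("c3", 5000, 2.1),   # coverage too low
    ("c4", 200, 3.0),    # exactly at both thresholds, kept
]

kept = [name for name, length_bp, cov in contigs if keep_contig(length_bp, cov)]
print(kept)
```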
Draft Insect Genomes and HGTs. Endosymbiont contigs and PhiX contigs (from the spike-in of Illumina libraries) were excluded from assemblies, and insect genome assemblies were evaluated by the QUAST v2.3 tool (104) for basic assembly statistics and by CEGMA v2.5 (105) and BUSCO v1.1 (106) with the Arthropoda dataset for gene completeness (Table S2). Lacking RNA sequencing data to properly annotate the draft genomes, we carried out only preliminary gene predictions by unsupervised GeneMark-ES (107) runs to get exon structures for scaffolds with HGTs.

Horizontally transferred genes previously identified in the Planococcus citri genome were used as queries for BlastN, tBlastN, and tBlastX searches against custom databases made of scaffolds from individual species. Additionally, two approaches were used to minimize false negative results possibly caused by highly diverged and/or fragmented HGTs undetected by BLAST searches. First, nucleotide alignments of individual HGTs (see above) were used as Hidden Markov Model profiles in nhmmer (108) searches against scaffolds of individual assemblies. Second, BLAST databases were made out of all raw fastq reads and searched by tBlastN using protein HGTs from P. citri as queries.
Husnik and McCutcheon www.pnas.org/cgi/content/short/1603910113 1 of 9
Lineage-specific candidates of HGT were detected as reported previously (9) using the NR database (downloaded March 17, 2015). We used stringent screening criteria: only genes present on long scaffolds containing insect genes or present in several mealybug genomes were considered as strongly supported HGT candidates here (Table S3). Moreover, all scaffolds of HGT candidates presented here were confirmed by mapping raw read data and manually examined for low-coverage regions and potential misassemblies created by the joining of low-coverage contigs of bacterial contaminants with bona fide insect contigs.

A multigene mealybug phylogeny was inferred as above using 419 concatenated protein sequences of the core eukaryotic proteins identified from six mealybug genomes by the CEGMA package. Phylogenetic trees for individual HGTs were inferred as reported previously (9), except that the workflow was implemented using the ETE3 Python Toolkit (109).
Microscopy. Whole-mealybug individuals stored in absolute ethanol were postfixed with 4% (vol/vol) paraformaldehyde in PBS for 1 h; dehydrated by 1-h incubations in 80%, 90%, and 100% (vol/vol) ethanol; cleared in xylene two times for 1 h each; and paraffin embedded overnight. Paraffin blocks were cut into 5–7 μm sections, deparaffinized in xylene two times for 5 min each, and then hydrated through a 100%, 85%, and 70% (vol/vol) ethanol series. Hybridization was done according to the work by van Leuven et al. (110). No-probe and RNase A controls were used to assess insect tissue autofluorescence. The following fluorochrome-labeled oligonucleotide probes targeting 16S rRNA were used for endosymbiont in situ hybridization of M. hirsutus [TPMHIR: 5′-Cy3-ATGCCACCCTTCCTCCCGAA-3′; Doolittlea endobia MHIR (DEMHIR): 5′-Cy5-CTTTCATTTTCTTCCCCGTT-3′] and Paracoccus marginatus [TPPMAR: ACGCCCYCCTTCATCCCGAA; Mikella endobia PMAR (MEPMAR): 5′-Cy5-TAATAACTTTCTTCCTTGCT-3′]. An Olympus FV 1000 IX Inverted Laser Scanning Confocal Microscope was used for imaging with 60× and 100× oil immersion lenses. Image postprocessing was done in Fiji v1.51a (111).
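The TPPMAR probe listed above contains the degenerate base Y, which in IUPAC nucleotide notation stands for C or T. As a small illustration (not part of the published protocol), the following Python sketch expands a degenerate oligonucleotide into the concrete sequences it represents; the helper name `expand_degenerate` is hypothetical.

```python
from itertools import product

# Standard IUPAC nucleotide codes mapped to the bases they represent.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def expand_degenerate(seq):
    """Return every concrete sequence encoded by a degenerate oligo."""
    choices = (IUPAC[base] for base in seq.upper())
    return ["".join(combo) for combo in product(*choices)]


probe = "ACGCCCYCCTTCATCCCGAA"  # TPPMAR probe sequence as given in the text
variants = expand_degenerate(probe)
for variant in variants:
    print(variant)
```

With a single Y, the probe corresponds to two sequences that differ only at that position.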
Fig. S1. Supplementary phylogenetic trees. Values at nodes represent support from ML bootstrap pseudoreplicates. (A) Multigene ML phylogeny of Tremblaya within β-proteobacteria inferred from 49 concatenated protein sequences. (B) Zoomed-in Tremblaya ML phylogeny inferred from the 16S–23S rRNA alignment. (C) Multigene mealybug ML phylogeny inferred from 419 concatenated CEGMA protein sequences. (D) ML phylogeny of γ-proteobacterial symbionts inferred from the 16S–23S rRNA alignment. Clade labels A–G were adopted from the work by Thao et al. (43).
Fig. S2. Schematic diagrams of insect scaffolds containing HGTs involved in amino acid and B vitamin metabolism. Insect exons (predicted by GeneMark-ES) are color-coded as green rectangles and, when in close proximity to HGTs, annotated by their putative functions. Genes of bacterial origin are highlighted in yellow. (A) Genome localization of bioABD, ribAD, lysA, dapF, and tms HGTs confirming that they are present on insect scaffolds. Only the longest scaffold for each HGT is shown, because the scaffolds from different mealybug species share gene order. (B) Alignments of M. hirsutus, P. marginatus, and F. virgata scaffolds showing cysK acquisition after divergence of the Maconellicoccus clade and cysK duplication in F. virgata (also present in P. citri and P. longispinus) and riboflavin transporter duplication in P. marginatus.
Fig. S3. FISH confirming that intrabacterial symbionts reside inside Tremblaya cells in (A) M. hirsutus and (B) P. marginatus mealybugs. Tremblaya cells are in green, and γ-proteobacterial symbionts (DEMHIR and MEPMAR) are in red. (Scale bar: 10 μm.)
Table S1. Extended assembly metrics for draft mealybug genomes
Assembly metric | MHIR | FVIR | PCIT (reassembly) | PLON | TPER | PMAR
Total assembly size (bp) | 163,044,544 | 304,570,832 | 377,829,872 | 284,990,201 | 237,582,518 | 191,208,351
Total no. of scaffolds | 12,889 | 32,723 | 167,514 | 66,857 | 80,386 | 60,102
No. of scaffolds ≥1,000 bp | 8,043 | 21,984 | 64,930 | 40,284 | 58,090 | 33,617
Largest scaffold (bp) | 393,850 | 322,873 | 82,122 | 182,788 | 54,847 | 76,575
N50 / N75 | 47,025 / 22,300 | 25,562 / 12,551 | 7,078 / 3,639 | 10,126 / 4,908 | 4,681 / 2,689 | 6,799 / 3,788
G + C (%) | 35.3 | 34.2 | 34.3 | 33.7 | 31.5 | 36.1
No. of Ns per 100 kbp | 97.8 | 20.7 | 152.6 | 26.2 | 8.8 | 34.1
CEGMA complete (of 248) | 239 (96.37%) | 239 (96.37%) | 236 (95.16%) | 229 (92.34%) | 236 (95.16%) | 242 (97.58%)
CEGMA complete plus partial | 246 (99.19%) | 243 (97.98%) | 245 (98.79%) | 244 (98.39%) | 247 (99.60%) | 245 (98.79%)
BUSCOs Eukaryota (n = 429) | C:85% [D:7.4%], F:3.0%, M:11% | C:84% [D:5.1%], F:3.9%, M:11% | C:80% [D:6.9%], F:7.2%, M:11% | C:78% [D:3.4%], F:9.0%, M:12% | C:77% [D:4.1%], F:10%, M:12% | C:82% [D:5.8%], F:5.5%, M:11%
BUSCOs Arthropoda (n = 2,675) | C:76% [D:3.5%], F:14%, M:9.4% | C:76% [D:3.3%], F:13%, M:9.9% | C:71% [D:4.8%], F:16%, M:12% | C:70% [D:2.3%], F:16%, M:13% | C:66% [D:2.3%], F:16%, M:16% | C:72% [D:3.0%], F:15%, M:12%
All values were calculated without endosymbiont and low-coverage contamination contigs. BUSCOs Arthropoda assessments for Acyrthosiphon pisum genome assembly as a reference: C:72% [D:6.1%], F:15%, M:12%. C, complete; D, duplicated; F, fragmented; M, missing.
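The N50 and N75 values in Table S1 summarize scaffold contiguity. As a hedged illustration of the usual definition (the length L such that scaffolds of length at least L together contain at least the given fraction of the total assembly size), the statistic can be computed as in the Python sketch below. This is not the QUAST implementation, and the toy scaffold lengths are invented for the example.

```python
def nx(lengths, fraction):
    """Return the Nx statistic for a list of scaffold lengths.

    Scaffolds are added from longest to shortest until their cumulative
    length reaches `fraction` of the total; the length of the scaffold
    that crosses the threshold is returned.
    """
    ordered = sorted(lengths, reverse=True)
    threshold = fraction * sum(ordered)
    running = 0
    for length in ordered:
        running += length
        if running >= threshold:
            return length
    raise ValueError("lengths must contain at least one scaffold")


toy_lengths = [100, 80, 60, 40, 20]  # total of 300 bp, illustrative only
n50 = nx(toy_lengths, 0.50)  # threshold 150 bp, reached by the 80-bp scaffold
n75 = nx(toy_lengths, 0.75)  # threshold 225 bp, reached by the 60-bp scaffold
print(n50, n75)
```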
Table S2. Insect scaffolds containing horizontally transferred genes
Each row lists, for one HGT, the scaffold name, length, and k-mer coverage (merged k-mers) in MHIR, FVIR, PCIT, PLON, TPER, and PMAR.

B vitamin metabolism
bioA. MHIR: NODE_1095_43437_30.5427_ID_2189*; FVIR: NODE_2692_29264_45.8233_ID_5383; PCIT: NODE_1158_22396_21.4066_ID_2315; PLON: NODE_13454_6321_92.0554_ID_26907; TPER: NODE_5755_7749_42.2835_ID_11509; PMAR: NODE_15638_3963_48.9007_ID_31275. Additional scaffold in this row (species column not recoverable): NODE_3702_14332_30.9377_ID_7403
bioB. MHIR: NODE_206_103677_26.441_ID_411*; FVIR: NODE_1537_39445_37.3168_ID_3073; PCIT: NODE_1118_22642_22.6156_ID_223; PLON: NODE_11460_7325_111.564_ID_22919; TPER: NODE_386_19325_14.7471_ID_771; PMAR: NODE_1524_15917_46.9302_ID_3047
bioD. MHIR: NODE_407_76514_32.3402_ID_813*; FVIR: NODE_10823_7722_46.6698_ID_21645; PCIT: NODE_17050_6003_28.5577_ID_34099; PLON: NODE_6031_11780_41.4639_ID_12061; TPER: NODE_6741_7177_24.074_ID_13481; PMAR: NODE_21598_2996_39.4689_ID_43195
ribA. MHIR: NODE_36_178330_31.6542_ID_71*; FVIR: NODE_854_51798_32.6572_ID_1707; PCIT: NODE_12118_7709_9.0878_ID_24235; PLON: NODE_10187_8129_45.4534_ID_20373; TPER: NODE_22346_3461_14.5056_ID_44691; PMAR: NODE_1442_16334_42.4795_ID_2883
ribD. MHIR: NODE_3471_11646_37.1715_ID_6941; FVIR: NODE_4692_19496_33.1948_ID_9383*; PCIT: NODE_22359_4879_37.4948_ID_44717; PLON: NODE_4881_13443_38.1156_ID_9761; TPER: NODE_10832_5498_46.801_ID_21663; PMAR: NODE_9906_5436_52.0394_ID_19811
panC. MHIR: NA; FVIR: NODE_1895_35506_42.1294_ID_3789*; PCIT: NA; PLON: NA; TPER: NA; PMAR: NA

Amino acid metabolism
cysK. MHIR: NA; FVIR: NODE_1251_43541_36.8307_ID_2501*; PCIT: NODE_5169_12355_8.40561_ID_10337; PLON: NODE_6319_11425_96.4325_ID_12637; TPER: NODE_5086_8195_13.8618_ID_10171; PMAR: NODE_317_27801_43.9358_ID_633. Additional scaffolds in this row (species column not recoverable): NODE_1576_20002_20.754_ID_3151; NODE_28193_2829_70.8861_ID_56385; NODE_3332_15001_36.8971_ID_6663
dapF. MHIR: NODE_2062_24285_26.5533_ID_4123*; FVIR: NODE_5954_15883_38.1428_ID_11907; PCIT: NODE_962_23955_17.9039_ID_1923; PLON: NODE_20465_4268_36.1901_ID_40929; TPER: NODE_6454_7335_15.475_ID_12907; PMAR: NODE_28986_1694_175.113_ID_57971
lysA. MHIR: NODE_59_148786_27.0847_ID_117; FVIR: NODE_4_297799_35.7395_ID_7*; PCIT: NODE_30394_3749_14.3249_ID_60787; PLON: NODE_8644_9224_44.8211_ID_17287; TPER: NODE_7424_6818_19.9689_ID_14847; PMAR: NODE_1012_18919_46.8622_ID_2023
tms. MHIR: NODE_1166_41617_26.8228_ID_2331*; FVIR: NA; PCIT: NODE_6634_10945_16.722_ID_13267; PLON: NODE_13050_6499_55.789_ID_26099; TPER: NODE_34438_2417_29.2671_ID_68875; PMAR: NODE_8338_6146_45.9414_ID_16675. Additional scaffolds in this row (species column not recoverable): NODE_5474_3263_6.97353_ID_10947; NODE_7749_10066_15.9657_ID_15497; NODE_25746_3290_140.852_ID_51491; NODE_6297_7425_23.0654_ID_12593; NODE_4174_9644_62.5393_ID_8347; NODE_11115_8160_6.39149_ID_22229; NODE_5435_12567_33.0441_ID_10869; NODE_19614_3786_320.495_ID_39227; NODE_12227_4696_36.585_ID_24453; NODE_3006_17561_42.3735_ID_6011; NODE_34895_2381_28.3895_ID_69789

Peptidoglycan metabolism
murA. MHIR: NA; FVIR: NODE_460_66309_43.1341_ID_919*; PCIT: NODE_11354_8054_20.7208_ID_22707; PLON: NODE_115_61230_35.6962_ID_229; TPER: NA; PMAR: NA
murB. MHIR: NA; FVIR: NODE_12758_5717_33.0994_ID_25515 (possible pseudogene); PCIT: NODE_369_31461_35.5337_ID_737*; PLON: NODE_20534_4254_49.7111_ID_41067; TPER: NA; PMAR: NA
murC. MHIR: NA; FVIR: NA; PCIT: NODE_1601_19897_15.255_ID_3201*; PLON: NODE_22793_3812_49.6747_ID_45585; TPER: NA; PMAR: NA
murD. MHIR: NA; FVIR: NA; PCIT: NODE_13782_7024_6.68302_ID_27563*; PLON: NODE_24019_3587_40.034_ID_48037; TPER: NA; PMAR: NA
murE. MHIR: NA; FVIR: NA; PCIT: NODE_6492_11057_8.87711_ID_12983*; PLON: NODE_17363_4962_30.2712_ID_34725; TPER: NA; PMAR: NA
murF. MHIR: NA; FVIR: NA; PCIT: NODE_594_27680_14.487_ID_1187*; PLON: NODE_4718_13704_42.3641_ID_9435; TPER: NA; PMAR: NA
amiD. MHIR: NODE_127_124060_24.8687_ID_253; FVIR: NA; PCIT: NODE_37192_2984_10.5592_ID_74383; PLON: NA; TPER: NA; PMAR: NA
mltB. MHIR: NA; FVIR: NA; PCIT: NODE_19703_5383_17.8893_ID_39405; PLON: NA; TPER: NA; PMAR: NA
b-Lactamase. MHIR: NA; FVIR: NODE_5744_16411_32.7118_ID_11487; PCIT: NODE_41741_2494_87.3403_ID_83481; PLON: NODE_4491_14129_34.1056_ID_8981; TPER: NODE_27744_2940_11.9002_ID_55487; PMAR: NODE_1286_17279_50.7695_ID_2571. Additional scaffolds in this row (species column not recoverable): NODE_15550_3679_32.2136_ID_31099; NODE_9718_8869_20.8995_ID_19435; NODE_16462_5218_33.5475_ID_32923; NODE_14178_4665_12.505_ID_28355; NODE_19161_5497_22.6318_ID_38321; NODE_27195_3022_166.648_ID_54389; NODE_28052_2913_15.796_ID_56103; NODE_24154_3566_51.4341_ID_48307; NODE_6508_7297_15.3206_ID_13015; NODE_2155_20606_37.4645_ID_4309*
ddlB. MHIR: NA; FVIR: NODE_52_125214_31.955_ID_103*; PCIT: NODE_2593_16610_22.7297_ID_5185; PLON: NODE_7871_9825_39.2015_ID_15741; TPER: NODE_29017_2831_20.6286_ID_58033; PMAR: NA

Other
DUR1,2. MHIR: NA; FVIR: NA; PCIT: NODE_1398_20965_17.3176_ID_2795* (both urea carboxylase and allophanate hydrolase); PLON: NODE_2264_20156_44.516_ID_4527 (only allophanate hydrolase); TPER: NA; PMAR: NA
gshA. MHIR: NA; FVIR: NA; PCIT: NODE_33435_3399_30.6965_ID_66869; PLON: NA; TPER: NA; PMAR: NA
Type III effector. MHIR: NA; FVIR: NODE_4508_20239_32.5829_ID_9015; PCIT: NODE_2326_17345_10.1749_ID_4651 (+ more than 10 other copies); PLON: NODE_935_28982_40.2817_ID_1869 (+ more than 10 other copies); TPER: NODE_174_23133_18.8075_ID_347 (+ more than 10 other copies); PMAR: NODE_1751_14972_78.5305_ID_3501. Additional scaffolds in this row (species column not recoverable): NODE_932_49868_38.3909_ID_1863; NODE_31_48312_40.329_ID_61; NODE_955_49231_37.1248_ID_1909; NODE_4166_9650_48.8279_ID_8331; NODE_444_67489_35.7914_ID_887*; NODE_6268_7513_43.5062_ID_12535; NODE_1448_40490_35.3609_ID_2895
chitinase. MHIR: NA; FVIR: NA; PCIT: NA; PLON: NA; TPER: NODE_1934_11960_15.4634_ID_3867; PMAR: NODE_378_26435_39.2884_ID_755*
rlmI. MHIR: NA; FVIR: NA; PCIT: NODE_8054_9863_19.3512_ID_16107; PLON: NA; TPER: NA; PMAR: NA
AAA-ATPases. MHIR: NODE_36_178330_31.6542_ID_71 (+ numerous other hits); FVIR: NODE_854_51798_32.6572_ID_1707 (+ numerous other hits); PCIT: NODE_3869_14076_36.1782_ID_7737 (+ numerous other hits); PLON: NODE_3376_16544_40.2929_ID_6751 (+ numerous other hits); TPER: NODE_4822_8396_19.8446_ID_9643 (+ numerous other hits); PMAR: NODE_1442_16334_42.4795_ID_2883 (+ numerous other hits)
Ankyrin repeat protein (likely opposite HGT direction; i.e., from insects to Wolbachia). MHIR: NA; FVIR: NODE_942_49564_41.7943_ID_1883 (+ numerous other hits to ankyrin proteins); PCIT: NODE_1287_21600_20.3177_ID_2573 (+ numerous other hits to ankyrin proteins); PLON: NODE_1876_21872_38.6857_ID_3751 (+ numerous other hits to ankyrin proteins); TPER: NODE_2986_10256_23.7463_ID_5971 (+ numerous other hits to ankyrin proteins); PMAR: NODE_1130_18092_47.7632_ID_2259 (+ numerous other hits to ankyrin proteins)

NA, not applicable.
*Longest scaffold for each HGT candidate.
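The scaffold names in Table S2 pack several values into one string. Assuming, as the column heading suggests, that each name follows the pattern NODE_<index>_<length>_<k-mer coverage>_ID_<identifier>, optionally followed by the asterisk that marks the longest scaffold, the fields can be separated with a short Python sketch like the one below. The pattern, the `Scaffold` type, and the field names are assumptions made for illustration.

```python
import re
from typing import NamedTuple


class Scaffold(NamedTuple):
    node: int             # scaffold index within the assembly
    length_bp: int        # scaffold length in bp
    kmer_coverage: float  # k-mer coverage (merged k-mers)
    scaffold_id: int      # trailing identifier
    starred: bool         # True if marked with the table's asterisk


# Assumed layout: NODE_<index>_<length>_<coverage>_ID_<identifier>[*]
PATTERN = re.compile(r"^NODE_(\d+)_(\d+)_(\d+(?:\.\d+)?)_ID_(\d+)(\*?)$")


def parse_scaffold(name):
    """Split a Table S2 scaffold name into its numeric fields."""
    match = PATTERN.match(name.strip())
    if match is None:
        raise ValueError(f"unrecognized scaffold name: {name!r}")
    node, length_bp, coverage, scaffold_id, star = match.groups()
    return Scaffold(int(node), int(length_bp), float(coverage),
                    int(scaffold_id), star == "*")


example = parse_scaffold("NODE_1095_43437_30.5427_ID_2189*")
print(example.length_bp, example.kmer_coverage, example.starred)
```

Parsed this way, the entries can be sorted by length or compared with the coverage threshold used during contamination filtering.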
Table S3. Overview of evidence that the HGTs are encoded on the insect genomes
HGT | Phylogenetic origin (does not necessarily mean donor) | Present in several mealybug species and forms a single clade | Other bacterial genes on the scaffold | Insect genes on the insect scaffolds | Overall HGT evidence
bioA | α-Proteobacteria: Rickettsiales | Yes | No | Yes | Strong support
bioB | α-Proteobacteria: Rickettsiales | Yes | No | Yes | Strong support
bioD | α-Proteobacteria: Rickettsiales | Yes | No | Yes | Strong support
ribA | γ-Proteobacteria: Enterobacteriales | Yes | AAA-ATPase HGT | Yes | Strong support
ribD | α-Proteobacteria: Rickettsiales | Yes | No | Yes | Strong support
panC | β-Proteobacteria | No, only FVIR | No | Yes | Moderate support
cysK | γ-Proteobacteria: Enterobacteriales | Yes | No | Yes | Strong support
dapF | α-Proteobacteria: Rickettsiales | Yes | No | Yes | Strong support
lysA | α-Proteobacteria: Rickettsiales | Yes | No | Yes | Strong support
tms | γ-Proteobacteria or β-proteobacteria | Yes | No | Yes | Strong support
murA | γ-Proteobacteria: Enterobacteriales | Yes | No | Yes | Strong support
murB | Bacteroidetes | Yes | No | Yes | Strong support
murC | Bacteroidetes (PCIT); α-Proteobacteria: Rickettsiales (PLON) | Yes but different origin | No | Yes | Moderate support
murD | α-Proteobacteria: Rickettsiales | Yes | No | Yes | Strong support
murE | α-Proteobacteria: Rickettsiales | Yes | No | No | Moderate support
murF | α-Proteobacteria: Rickettsiales | Yes | No | Yes | Strong support
amiD | γ-Proteobacteria: Enterobacteriales (PCIT); α-Proteobacteria: Rickettsiales (MHIR) | Yes but different origin | No | Yes | Moderate support
mltB | γ-Proteobacteria: Enterobacteriales | No, only PCIT | No | No | Weaker support
b-Lactamase | γ-Proteobacteria: Enterobacteriales | Yes | No | Yes | Strong support
ddlB | α-Proteobacteria: Rickettsiales | Yes | No | Yes | Strong support
DUR1,2 | γ-Proteobacteria: Enterobacteriales | Yes but different origin | No | No | Moderate support
gshA | γ-Proteobacteria: Enterobacteriales | No, only PCIT | No | No | Weaker support
Type III effector | γ-Proteobacteria or β-proteobacteria | Yes | No | Yes | Strong support
chitinase | γ-Proteobacteria or β-proteobacteria | Yes | No | Yes | Strong support
rlmI | γ-Proteobacteria: Enterobacteriales | No, only PCIT | No | No | Weaker support
AAA-ATPases | α-Proteobacteria: Rickettsiales | NA | ribA HGT | Yes | Moderate support
Ankyrin repeat proteins | α-Proteobacteria: Rickettsiales | NA | No | Yes | Moderate support
Related to Fig. 4 and Fig. S2. NA, not applicable.
Table S4. Tremblaya primers
Genome region | Forward primer | Reverse primer(s)
leuA_fwd ↔ rpsO_rRNA_fwd_rev | CTAAGGGCTGAGGACGTTGG | CCCCTACGCAGCCTGTTTAT
rpsO_rRNA_fwd_rev ↔ prs_rev | CCCCTACGCAGCCTGTTTAT | GGGTAGCTCAGCGGTAAGAG
tRNA_Gly_fwd ↔ rsmH_rev | GCCTAGTGCAGGGATAGAAGG | CACTGAGGCTCTGAGTTGGC
tRNA_Gly_fwd ↔ 23S_rRNA_rev1 | GCCTAGTGCAGGGATAGAAGG | CGTTGATAGGCTGGGTGTGT
tRNA_Gly_fwd ↔ 23S_rRNA_rev2 | GCCTAGTGCAGGGATAGAAGG | AAGTTCCGACCTGCACGAAT
argG_fwd ↔ rib_pseudo_rev | CCCTGGCCTATGCTTCTGAC | GGAGGTCAGATTCGAGGCAG
ilvD_fwd ↔ hypothetical_protein_rev | ATAAGGAGGAGGGTGCCTGT | GTGATGGTGTTAGGTTGCGG
These primers were used for the duplicated rRNA operons and one more region breaking the assembly of five T. princeps genomes.
Dataset S1. Phylogenetic trees for individual HGTs
Dataset S1 (http://www.pnas.org/lookup/suppl/doi:10.1073/pnas.1603910113/-/DCSupplemental/pnas.1603910113.sd01.pdf)
Values at nodes represent support from ML bootstrap pseudoreplicates. Extremely short inner branches were extended by dashed lines for better legibility. a, bioA; b, bioD; c, bioB; d, ribA; e, tms; f, cysK; g, ribD; h, panC; i, DUR1 and DUR12; j, dapF; k, lysA; l, b-lact; m, chiA; n, amiD; o, ddlB; p, murA; q, murB; r, murC; s, murD; t, murE; u, murF.
Dataset S2. Tremblaya gene information
Dataset S2 (http://www.pnas.org/lookup/suppl/doi:10.1073/pnas.1603910113/-/DCSupplemental/pnas.1603910113.sd02.xls)
(A) Gene order, functional categories from Clusters of Orthologous Groups (COG), Enzyme Commission (E.C.) numbers, protein products, and gene abbreviations for all Tremblaya genomes. The Tremblaya phenacola PAVE inversion is designated by light yellow color, pseudogenes are in red, noncoding RNAs are in magenta (tRNAs are not shown), and hypothetical proteins are in blue. (B) Raw data to reproduce Fig. 3. There are two copies of leuA in TPTPER and two copies of aroDQ in DEMHIR. Only glyS is found in TPPAVE and not glyQ. 0, missing gene; 1, found on the endosymbiont genome; 2, pseudogene; 3, HGT found on the insect genome; 4, insect gene.