Supporting Information - PNAS · 8/25/2016 · the CAT + GTR + G model in PhyloBayes MPI 1.5a (98). Pos-terior distributions obtained under four independent PhyloBayes runs were

Supporting Information
Husnik and McCutcheon 10.1073/pnas.1603910113

SI Materials and Methods

Symbiont Genome Assembly, Annotation, and Analyses. Endosymbiont genomes were closed into circular mapping molecules by the combination of PCR and Sanger sequencing. General Tremblaya primers for closing of problematic regions, such as the duplicated rRNA operon, were designed to be applicable to most Tremblaya princeps species (Table S4). Given unclear GC skew in some of the species, the origin of replication was set to the same region as in already published Tremblaya and Moranella genomes to standardize comparative genomic analyses. Pilon v1.12 (86) and REAPR v1.0.17 (87) were used to diagnose and improve potential misassemblies, collapsed repeats, and polymorphisms. Genome annotations and reannotations [abbreviations combine Tremblaya princeps (TP) with species abbreviations such as PCIT; i.e., for TPPCIT, MEPCIT, and Tremblaya phenacola from Phenacoccus avenae (TPPAVE)] were carried out by the Prokka v1.10 pipeline (88) with disabled default discarding of ORFs overlapping tRNAs. Our comparative data allowed us to reannotate many genes and pseudogenes previously annotated as hypothetical proteins and uncover pseudogene remnants (Dataset S2A). The Tremblaya panproteome was curated manually with an extensive use of MetaPathways v2.0 (89), PathwayTools v17.0 (90), and InterProScan v5.10 (91) and then used in Prokka as trusted proteins for annotation. This approach was used to obtain identical gene names for all seven Tremblaya genomes (TPPAVE, TPPCIT, TPMHIR, TPFVIR, TPPLON, TPPMAR, and TPTPER). tRNA and tmRNA regions were reannotated using the tFind.pl wrapper (bioinformatics.sandia.gov/software). Tremblaya pseudogenes were reannotated in the Artemis browser (92) based on genome alignment of all Tremblaya genomes.

Genomes of γ-proteobacterial symbionts were annotated as described for Tremblaya genomes, except that several approaches were used to assist in pseudogene annotation. Proteins split into two or more ORFs were joined into a single pseudogene feature. All proteins were then searched against the National Center for Biotechnology Information (NCBI) nonredundant protein database (NR), and their length was compared. If the endosymbiont protein was shorter than 60% of its 10 top hits, it was called a pseudogene unless it was known to be a bifunctional protein and at least one of its domains was intact. All intergenic regions were then screened by BlastX (e value 1e−4) against NR to reveal pseudogene remnants.

Multigene matrices of conserved orthologous genes for β-proteobacteria (49 genes) and Enterobacteriaceae (80 genes) were generated by the PhyloPhlAN package (93). Sequences of genes for 16S and 23S rRNA were downloaded from the NCBI nucleotide database and used for Tremblaya- and Sodalis-allied, species-rich phylogenies. All matrices were aligned by the MAFFT v6 L-INS-i algorithm (94). Ambiguously aligned positions were excluded by trimAl v1.2 (95) with the -automated1 flag set for likelihood-based phylogenetic methods. Maximum likelihood (ML) and Bayesian inference (BI) phylogenetic methods were applied to the single-gene and concatenated amino acid alignments. ML trees were inferred using RAxML 8.2.4 (96) under the LG + G model with subtree pruning and regrafting tree search algorithm and 1,000 bootstrap pseudoreplicates. BI analyses were conducted in MrBayes 3.2.2 (97) under the LG + I + G model with 5 million generations [prset aamodel = fixed(lg), lset rates = invgamma ngammacat = 4, mcmcp checkpoint = yes ngen = 5,000,000]. Concatenated 16S–23S rRNA gene phylogenies for mealybug endosymbionts were inferred as above, except that the GTR + I + G model was used. For BI analyses, a proportion of invariable sites (I) was estimated from the data, and heterogeneity of evolutionary rates was modeled by four substitution rate categories of the γ (G) distribution with the γ-shape parameter (α) estimated from the data. Exploration of Markov chain Monte Carlo convergence and burn-in determination were performed in AWTY (ceb.csit.fsu.edu/awty) and Tracer v1.5 (evolve.zoo.ox.ac.uk). Additionally, concatenated protein and Dayhoff6 recoded datasets were analyzed under the CAT + GTR + G model in PhyloBayes MPI 1.5a (98). Posterior distributions obtained under four independent PhyloBayes runs were compared using tracecomp and bpcomp programs, and runs were considered converged at maximum discrepancy value 100.

Tremblaya genomes were aligned using progressiveMauve v2.3.1 (99). Clusters of orthologous genes were generated using OrthoMCL v1.4 (100). Orthologs missed because of low homology (BLAST e value 1e−5) were curated with the help of identical gene order and annotations. All genomes were visualized as linear with links connecting positions of orthologous genes in Processing 3 (https://processing.org/). Additional figures were drawn or curated in Inkscape (https://inkscape.org/en/).
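The length criterion used for pseudogene calling in the γ-proteobacterial symbionts can be illustrated with a minimal Python sketch. The text does not say how the lengths of the 10 top hits were summarized, so the median is used here as an assumption; the function name, its parameters, and the example lengths are hypothetical and are not taken from the authors' pipeline.

```python
from statistics import median


def is_pseudogene_candidate(query_len, top_hit_lens,
                            intact_bifunctional_domain=False,
                            threshold=0.60):
    """Flag a protein as a pseudogene candidate by relative length.

    query_len: length of the endosymbiont protein (amino acids).
    top_hit_lens: lengths of its best NR hits; only the first 10 are used.
    intact_bifunctional_domain: True for a known bifunctional protein
        with at least one intact domain (the exception in the text).
    threshold: 0.60 mirrors the 60% cutoff stated in the text.
    """
    if not top_hit_lens or intact_bifunctional_domain:
        return False
    # Assumption: the median summarizes the top-hit lengths.
    reference = median(top_hit_lens[:10])
    return query_len < threshold * reference


# Hypothetical lengths, for illustration only.
hits = [300, 310, 295, 305, 300, 298, 302, 299, 301, 304]
print(is_pseudogene_candidate(150, hits))
print(is_pseudogene_candidate(290, hits))
print(is_pseudogene_candidate(150, hits, intact_bifunctional_domain=True))
```

Using the median rather than the mean keeps a single unusually long or short hit from shifting the cutoff, but either choice is compatible with the wording of the criterion.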

Contamination Screening and Filtering of Draft Mealybug Genomes. The presence of additional species, such as facultative symbionts, environmental bacteria, and contamination in the genome data was visualized by the Taxon-Annotated GC-Coverage (TAGC; drl.github.io/blobtools/) plots (101, 102), and the tool was also used to extract contigs of two γ-proteobacterial symbionts from the Pseudococcus longispinus mealybug and Wolbachia sp. from the Maconellicoccus hirsutus mealybug. We confirmed that there were no other organisms present in our data at high coverage, except the expected endosymbionts. Although there are now reliable methodologies to remove the majority of contamination from data sequenced using several independent libraries (102, 103), recognizing low-coverage contamination (in our case, mostly of bacterial, human, and plant origin) from single-library sequencing data can be problematic. Using the TAGC Tool, we were able to recognize low-coverage Propionibacterium spp. and human contamination in several of the samples (megablast e value 1e−25) and plant contamination in the P. longispinus sample. These short sequences were filtered out, and also, all (nonsymbiont) contigs or scaffolds shorter than 200 bp and/or having coverage lower than 3× were excluded from the total assemblies.
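The final length and coverage filter described above can be sketched in Python as follows. This is only an illustration of the stated thresholds (200 bp and 3×); the Contig record, the function name, and the example values are hypothetical, and the actual filtering in the study was done with the TAGC tool and is not reproduced here.

```python
from typing import NamedTuple


class Contig(NamedTuple):
    name: str
    length: int               # length in bp
    coverage: float           # mean coverage
    is_symbiont: bool = False


def passes_filter(contig, min_length=200, min_coverage=3.0):
    """Return True if the contig is kept in the total assembly.

    Nonsymbiont contigs are dropped when they are shorter than
    min_length and/or have coverage below min_coverage; symbiont
    contigs are not subject to this filter.
    """
    if contig.is_symbiont:
        return True
    return contig.length >= min_length and contig.coverage >= min_coverage


# Hypothetical contigs, for illustration only.
contigs = [
    Contig("scaffold_a", 1500, 35.2),
    Contig("scaffold_b", 150, 40.0),   # too short
    Contig("scaffold_c", 900, 2.1),    # coverage too low
    Contig("symbiont_d", 180, 2.5, is_symbiont=True),
]
kept = [c.name for c in contigs if passes_filter(c)]
print(kept)
```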

Draft Insect Genomes and HGTs. Endosymbiont contigs and PhiX contigs (from the spike in of Illumina libraries) were excluded from assemblies, and insect genome assemblies were evaluated by the Quast v.2.3 Tool (104) for basic assembly statistics and by the CEGMA v2.5 (105) and BUSCO v1.1 (106) with Arthropoda dataset for gene completeness (Table S1). Lacking RNA Sequencing data to properly annotate the draft genomes, only preliminary gene predictions were carried out by unsupervised GeneMark-ES (107) runs to get exon structures for scaffolds with HGTs.

Horizontally transferred genes previously identified in the Planococcus citri genome were used as queries for BlastN, tBlastN, and tBlastX searches against custom databases made of scaffolds from individual species. Additionally, two approaches were used to minimize false negative results possibly caused by highly diverged and/or fragmented HGTs undetected by BLAST searches. First, nucleotide alignments of individual HGTs (see above) were used as Hidden Markov Model profiles in nhmmer (108) searches against scaffolds of individual assemblies. Second, BLAST databases were made out of all raw fastq reads and searched by tBlastN using protein HGTs from P. citri as queries.


Lineage-specific candidates of HGT were detected as reported previously (9) using the NR database (downloaded March 17, 2015). We used stringent screening criteria: only genes present on long scaffolds containing insect genes or present in several mealybug genomes were considered as strongly supported HGT candidates here (Table S3). Moreover, all scaffolds of HGT candidates presented here were confirmed by mapping raw read data and manually examined for low-coverage regions and potential misassemblies created by the joining of low-coverage contigs of bacterial contaminants with bona fide insect contigs.

A multigene mealybug phylogeny was inferred as above using 419 concatenated protein sequences of the core eukaryotic proteins identified from six mealybug genomes by the CEGMA package. Phylogenetic trees for individual HGTs were inferred as reported previously (9), except that the workflow was implemented using the ETE3 Python Toolkit (109).

Microscopy. Whole-mealybug individuals stored in absolute ethanol were postfixed with 4% (vol/vol) paraformaldehyde in PBS for 1 h; dehydrated by 1-h incubations in 80%, 90%, and 100% (vol/vol) ethanol; cleared in xylene two times for 1 h each; and paraffin embedded overnight. Paraffin blocks were sectioned to 5–7 μm sections, deparaffinized in xylene two times for 5 min each, and then hydrated through a 100%, 85%, and 70% (vol/vol) ethanol series. Hybridization was done according to the work by van Leuven et al. (110). No probe and RNase A controls were used to assess insect tissue autofluorescence. The following fluorochrome-labeled oligonucleotide probes targeting 16S rRNA were used for endosymbiont in situ hybridization of M. hirsutus [TPMHIR: 5′-Cy3-ATGCCACCCTTCCTCCCGAA-3′; Doolittlea endobia MHIR (DEMHIR): 5′-Cy5-CTTTCATTTTCTTCCCCGTT-3′] and Paracoccus marginatus [TPPMAR: ACGCCCYCCTTCATCCCGAA; Mikella endobia PMAR (MEPMAR): 5′-Cy5-TAATAACTTTCTTCCTTGCT-3′]. An Olympus FV 1000 IX Inverted Laser Scanning Confocal Microscope was used for imaging with 60× and 100× oil immersion lenses. Image postprocessing was done in Fiji v1.51a (111).



Fig. S1. Supplementary phylogenetic trees. Values at nodes represent support from ML bootstrap pseudoreplicates. (A) Multigene ML phylogeny of Tremblaya within β-proteobacteria inferred from 49 concatenated protein sequences. (B) Zoomed-in Tremblaya ML phylogeny inferred from the 16S–23S rRNA alignment. (C) Multigene mealybug ML phylogeny inferred from 419 concatenated CEGMA protein sequences. (D) ML phylogeny of γ-proteobacterial symbionts inferred from the 16S–23S rRNA alignment. Clade labels A–G were adopted from the work by Thao et al. (43).



Fig. S2. Schematic diagrams of insect scaffolds containing HGTs involved in amino acid and B vitamin metabolism. Insect exons (predicted by GeneMark-ES) are color-coded as green rectangles and when in close proximity to HGTs, annotated by their putative functions. Genes of bacterial origin are highlighted in yellow. (A) Genome localization of bioABD, ribAD, lysA, dapF, and tms HGTs confirming that they are present on insect scaffolds. Only the longest scaffold for each HGT is shown, because the scaffolds from different mealybug species share gene order. (B) Alignments of M. hirsutus, P. marginatus, and F. virgata scaffolds showing cysK acquisition after divergence of the Maconellicoccus clade and cysK duplication in F. virgata (also present in P. citri and P. longispinus) and riboflavin transporter duplication in P. marginatus.



Fig. S3. FISH confirming that intrabacterial symbionts reside inside Tremblaya cells in (A) M. hirsutus and (B) P. marginatus mealybugs. Tremblaya cells are in green, and γ-proteobacterial symbionts (DEMHIR and MEPMAR) are in red. (Scale bar: 10 μm.)



Table S1. Extended assembly metrics for draft mealybug genomes

Assembly metric | MHIR | FVIR | PCIT (reassembly) | PLON | TPER | PMAR
Total assembly size (bp) | 163,044,544 | 304,570,832 | 377,829,872 | 284,990,201 | 237,582,518 | 191,208,351
Total no. of scaffolds | 12,889 | 32,723 | 167,514 | 66,857 | 80,386 | 60,102
No. of scaffolds ≥1,000 bp | 8,043 | 21,984 | 64,930 | 40,284 | 58,090 | 33,617
Largest scaffold (bp) | 393,850 | 322,873 | 82,122 | 182,788 | 54,847 | 76,575
N50/N75 | 47,025/22,300 | 25,562/12,551 | 7,078/3,639 | 10,126/4,908 | 4,681/2,689 | 6,799/3,788
G + C (%) | 35.3 | 34.2 | 34.3 | 33.7 | 31.5 | 36.1
No. of Ns per 100 kbp | 97.8 | 20.7 | 152.6 | 26.2 | 8.8 | 34.1
CEGMA complete (of 248) | 239 (96.37%) | 239 (96.37%) | 236 (95.16%) | 229 (92.34%) | 236 (95.16%) | 242 (97.58%)
CEGMA complete plus partial | 246 (99.19%) | 243 (97.98%) | 245 (98.79%) | 244 (98.39%) | 247 (99.60%) | 245 (98.79%)
BUSCOs Eukaryota (n = 429) | C:85% [D:7.4%], F:3.0%, M:11% | C:84% [D:5.1%], F:3.9%, M:11% | C:80% [D:6.9%], F:7.2%, M:11% | C:78% [D:3.4%], F:9.0%, M:12% | C:77% [D:4.1%], F:10%, M:12% | C:82% [D:5.8%], F:5.5%, M:11%
BUSCOs Arthropoda (n = 2,675) | C:76% [D:3.5%], F:14%, M:9.4% | C:76% [D:3.3%], F:13%, M:9.9% | C:71% [D:4.8%], F:16%, M:12% | C:70% [D:2.3%], F:16%, M:13% | C:66% [D:2.3%], F:16%, M:16% | C:72% [D:3.0%], F:15%, M:12%

All values were calculated without endosymbiont and low-coverage contamination contigs. BUSCOs Arthropoda assessments for Acyrthosiphon pisum genome assembly as a reference: C:72% [D:6.1%], F:15%, M:12%. C, complete; D, duplicated; F, fragmented; M, missing.
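For readers unfamiliar with the N50 and N75 metrics reported in Table S1, the following Python sketch shows the commonly used definition: the scaffold length at which scaffolds of that length or longer cover at least 50% (or 75%) of the total assembly. It is assumed, not confirmed by the text, that the values in the table follow this definition; the function name and the example lengths are hypothetical.

```python
def nx(lengths, fraction):
    """Return the Nx value of a set of scaffold lengths.

    Scaffolds are sorted from longest to shortest and summed until the
    running total reaches `fraction` of the assembly size; the length
    of the scaffold that crosses this point is returned.
    """
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= fraction * total:
            return length
    raise ValueError("no scaffold lengths given")


# Hypothetical scaffold lengths (bp), for illustration only.
scaffolds = [10, 8, 6, 4, 2]
print(nx(scaffolds, 0.50))  # N50
print(nx(scaffolds, 0.75))  # N75
```

Because N75 requires a larger share of the assembly to be covered, it is never larger than N50 for the same assembly, which matches the paired values in the table.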



Table S2. Insect scaffolds containing horizontally transferred genes

For each gene category and HGT: scaffold name, length, and k-mer coverage (merged k-mers). Species columns: MHIR, FVIR, PCIT, PLON, TPER, PMAR. Entries are separated by semicolons; a species can have more than one scaffold.

B vitamin metabolism
bioA: NODE_1095_43437_30.5427_ID_2189*; NODE_2692_29264_45.8233_ID_5383; NODE_1158_22396_21.4066_ID_2315; NODE_13454_6321_92.0554_ID_26907; NODE_5755_7749_42.2835_ID_11509; NODE_15638_3963_48.9007_ID_31275; NODE_3702_14332_30.9377_ID_7403
bioB: NODE_206_103677_26.441_ID_411*; NODE_1537_39445_37.3168_ID_3073; NODE_1118_22642_22.6156_ID_223; NODE_11460_7325_111.564_ID_22919; NODE_386_19325_14.7471_ID_771; NODE_1524_15917_46.9302_ID_3047
bioD: NODE_407_76514_32.3402_ID_813*; NODE_10823_7722_46.6698_ID_21645; NODE_17050_6003_28.5577_ID_34099; NODE_6031_11780_41.4639_ID_12061; NODE_6741_7177_24.074_ID_13481; NODE_21598_2996_39.4689_ID_43195
ribA: NODE_36_178330_31.6542_ID_71*; NODE_854_51798_32.6572_ID_1707; NODE_12118_7709_9.0878_ID_24235; NODE_10187_8129_45.4534_ID_20373; NODE_22346_3461_14.5056_ID_44691; NODE_1442_16334_42.4795_ID_2883
ribD: NODE_3471_11646_37.1715_ID_6941; NODE_4692_19496_33.1948_ID_9383*; NODE_22359_4879_37.4948_ID_44717; NODE_4881_13443_38.1156_ID_9761; NODE_10832_5498_46.801_ID_21663; NODE_9906_5436_52.0394_ID_19811
panC: NA; NODE_1895_35506_42.1294_ID_3789*; NA; NA; NA; NA

Amino acid metabolism
cysK: NA; NODE_1251_43541_36.8307_ID_2501*; NODE_5169_12355_8.40561_ID_10337; NODE_6319_11425_96.4325_ID_12637; NODE_5086_8195_13.8618_ID_10171; NODE_317_27801_43.9358_ID_633; NODE_1576_20002_20.754_ID_3151; NODE_28193_2829_70.8861_ID_56385; NODE_3332_15001_36.8971_ID_6663
dapF: NODE_2062_24285_26.5533_ID_4123*; NODE_5954_15883_38.1428_ID_11907; NODE_962_23955_17.9039_ID_1923; NODE_20465_4268_36.1901_ID_40929; NODE_6454_7335_15.475_ID_12907; NODE_28986_1694_175.113_ID_57971
lysA: NODE_59_148786_27.0847_ID_117; NODE_4_297799_35.7395_ID_7*; NODE_30394_3749_14.3249_ID_60787; NODE_8644_9224_44.8211_ID_17287; NODE_7424_6818_19.9689_ID_14847; NODE_1012_18919_46.8622_ID_2023
tms: NODE_1166_41617_26.8228_ID_2331*; NA; NODE_6634_10945_16.722_ID_13267; NODE_13050_6499_55.789_ID_26099; NODE_34438_2417_29.2671_ID_68875; NODE_8338_6146_45.9414_ID_16675; NODE_5474_3263_6.97353_ID_10947; NODE_7749_10066_15.9657_ID_15497; NODE_25746_3290_140.852_ID_51491; NODE_6297_7425_23.0654_ID_12593; NODE_4174_9644_62.5393_ID_8347; NODE_11115_8160_6.39149_ID_22229; NODE_5435_12567_33.0441_ID_10869; NODE_19614_3786_320.495_ID_39227; NODE_12227_4696_36.585_ID_24453; NODE_3006_17561_42.3735_ID_6011; NODE_34895_2381_28.3895_ID_69789

Peptidoglycan metabolism
murA: NA; NODE_460_66309_43.1341_ID_919*; NODE_11354_8054_20.7208_ID_22707; NODE_115_61230_35.6962_ID_229; NA; NA
murB: NA; NODE_12758_5717_33.0994_ID_25515 (possible pseudogene); NODE_369_31461_35.5337_ID_737*; NODE_20534_4254_49.7111_ID_41067; NA; NA
murC: NA; NA; NODE_1601_19897_15.255_ID_3201*; NODE_22793_3812_49.6747_ID_45585; NA; NA
murD: NA; NA; NODE_13782_7024_6.68302_ID_27563*; NODE_24019_3587_40.034_ID_48037; NA; NA
murE: NA; NA; NODE_6492_11057_8.87711_ID_12983*; NODE_17363_4962_30.2712_ID_34725; NA; NA
murF: NA; NA; NODE_594_27680_14.487_ID_1187*; NODE_4718_13704_42.3641_ID_9435; NA; NA
amiD: NODE_127_124060_24.8687_ID_253; NA; NODE_37192_2984_10.5592_ID_74383; NA; NA; NA
mltB: NA; NA; NODE_19703_5383_17.8893_ID_39405; NA; NA; NA
b-Lactamase: NA; NODE_5744_16411_32.7118_ID_11487; NODE_41741_2494_87.3403_ID_83481; NODE_4491_14129_34.1056_ID_8981; NODE_27744_2940_11.9002_ID_55487; NODE_1286_17279_50.7695_ID_2571; NODE_15550_3679_32.2136_ID_31099; NODE_9718_8869_20.8995_ID_19435; NODE_16462_5218_33.5475_ID_32923; NODE_14178_4665_12.505_ID_28355; NODE_19161_5497_22.6318_ID_38321; NODE_27195_3022_166.648_ID_54389; NODE_28052_2913_15.796_ID_56103; NODE_24154_3566_51.4341_ID_48307; NODE_6508_7297_15.3206_ID_13015; NODE_2155_20606_37.4645_ID_4309*
ddlB: NA; NODE_52_125214_31.955_ID_103*; NODE_2593_16610_22.7297_ID_5185; NODE_7871_9825_39.2015_ID_15741; NODE_29017_2831_20.6286_ID_58033; NA

Other
DUR1,2: NA; NA; NODE_1398_20965_17.3176_ID_2795* (both urea carboxylase and allophanate hydrolase); NODE_2264_20156_44.516_ID_4527 (only allophanate hydrolase); NA; NA
gshA: NA; NA; NODE_33435_3399_30.6965_ID_66869; NA; NA; NA
Type III effector: NA; NODE_4508_20239_32.5829_ID_9015; NODE_2326_17345_10.1749_ID_4651 (+ more than 10 other copies); NODE_935_28982_40.2817_ID_1869 (+ more than 10 other copies); NODE_174_23133_18.8075_ID_347 (+ more than 10 other copies); NODE_1751_14972_78.5305_ID_3501; NODE_932_49868_38.3909_ID_1863; NODE_31_48312_40.329_ID_61; NODE_955_49231_37.1248_ID_1909; NODE_4166_9650_48.8279_ID_8331; NODE_444_67489_35.7914_ID_887*; NODE_6268_7513_43.5062_ID_12535; NODE_1448_40490_35.3609_ID_2895
chitinase: NA; NA; NA; NA; NODE_1934_11960_15.4634_ID_3867; NODE_378_26435_39.2884_ID_755*
rlmI: NA; NA; NODE_8054_9863_19.3512_ID_16107; NA; NA; NA
AAA-ATPases: NODE_36_178330_31.6542_ID_71 (+ numerous other hits); NODE_854_51798_32.6572_ID_1707 (+ numerous other hits); NODE_3869_14076_36.1782_ID_7737 (+ numerous other hits); NODE_3376_16544_40.2929_ID_6751 (+ numerous other hits); NODE_4822_8396_19.8446_ID_9643 (+ numerous other hits); NODE_1442_16334_42.4795_ID_2883 (+ numerous other hits)
Ankyrin repeat protein (likely opposite HGT direction; i.e., from insects to Wolbachia): NA; NODE_942_49564_41.7943_ID_1883 (+ numerous other hits to ankyrin proteins); NODE_1287_21600_20.3177_ID_2573 (+ numerous other hits to ankyrin proteins); NODE_1876_21872_38.6857_ID_3751 (+ numerous other hits to ankyrin proteins); NODE_2986_10256_23.7463_ID_5971 (+ numerous other hits to ankyrin proteins); NODE_1130_18092_47.7632_ID_2259 (+ numerous other hits to ankyrin proteins)

NA, not applicable.
*Longest scaffold for each of the HGT candidates.
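The scaffold labels in Table S2 pack several values into one string. As a minimal sketch, and assuming from the column heading (scaffold name, length, and k-mer coverage) that a label such as NODE_1095_43437_30.5427_ID_2189* reads as node number, length in bp, k-mer coverage, and an assembler-assigned ID, the fields can be separated in Python as shown below. The field interpretation, the class, and the function name are assumptions made for illustration.

```python
import re
from typing import NamedTuple


class Scaffold(NamedTuple):
    node: int
    length: int        # assumed to be the scaffold length in bp
    coverage: float    # assumed to be the k-mer coverage
    scaffold_id: int
    longest: bool      # True when the label carries the * footnote mark


# Assumed layout: NODE_<node>_<length>_<coverage>_ID_<id>, optional trailing *
_LABEL = re.compile(r"^NODE_(\d+)_(\d+)_(\d+(?:\.\d+)?)_ID_(\d+)(\*?)$")


def parse_scaffold(label):
    """Split a Table S2 scaffold label into its assumed fields."""
    match = _LABEL.match(label.strip())
    if match is None:
        raise ValueError(f"unrecognized scaffold label: {label!r}")
    node, length, coverage, scaffold_id, star = match.groups()
    return Scaffold(int(node), int(length), float(coverage),
                    int(scaffold_id), star == "*")


example = parse_scaffold("NODE_1095_43437_30.5427_ID_2189*")
print(example)
```

Parsed this way, the scaffolds for one HGT can be sorted by length to check which entry carries the longest-scaffold mark.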



Table S3. Overview of evidence that the HGTs are encoded on the insect genomes

HGT | Phylogenetic origin (does not necessarily mean donor) | Present in several mealybug species and forms a single clade | Other bacterial genes on the scaffold | Insect genes on the insect scaffolds | Overall HGT evidence
bioA | α-Proteobacteria: Rickettsiales | Yes | No | Yes | Strong support
bioB | α-Proteobacteria: Rickettsiales | Yes | No | Yes | Strong support
bioD | α-Proteobacteria: Rickettsiales | Yes | No | Yes | Strong support
ribA | γ-Proteobacteria: Enterobacteriales | Yes | AAA-ATPase HGT | Yes | Strong support
ribD | α-Proteobacteria: Rickettsiales | Yes | No | Yes | Strong support
panC | β-Proteobacteria | No, only FVIR | No | Yes | Moderate support
cysK | γ-Proteobacteria: Enterobacteriales | Yes | No | Yes | Strong support
dapF | α-Proteobacteria: Rickettsiales | Yes | No | Yes | Strong support
lysA | α-Proteobacteria: Rickettsiales | Yes | No | Yes | Strong support
tms | γ-Proteobacteria or β-proteobacteria | Yes | No | Yes | Strong support
murA | γ-Proteobacteria: Enterobacteriales | Yes | No | Yes | Strong support
murB | Bacteroidetes | Yes | No | Yes | Strong support
murC | Bacteroidetes (PCIT); α-Proteobacteria: Rickettsiales (PLON) | Yes but different origin | No | Yes | Moderate support
murD | α-Proteobacteria: Rickettsiales | Yes | No | Yes | Strong support
murE | α-Proteobacteria: Rickettsiales | Yes | No | No | Moderate support
murF | α-Proteobacteria: Rickettsiales | Yes | No | Yes | Strong support
amiD | γ-Proteobacteria: Enterobacteriales (PCIT); α-Proteobacteria: Rickettsiales (MHIR) | Yes but different origin | No | Yes | Moderate support
mltB | γ-Proteobacteria: Enterobacteriales | No, only PCIT | No | No | Weaker support
b-Lactamase | γ-Proteobacteria: Enterobacteriales | Yes | No | Yes | Strong support
ddlB | α-Proteobacteria: Rickettsiales | Yes | No | Yes | Strong support
DUR1,2 | γ-Proteobacteria: Enterobacteriales | Yes but different origin | No | No | Moderate support
gshA | γ-Proteobacteria: Enterobacteriales | No, only PCIT | No | No | Weaker support
Type III effector | γ-Proteobacteria or β-proteobacteria | Yes | No | Yes | Strong support
chitinase | γ-Proteobacteria or β-proteobacteria | Yes | No | Yes | Strong support
rlmI | γ-Proteobacteria: Enterobacteriales | No, only PCIT | No | No | Weaker support
AAA-ATPases | α-Proteobacteria: Rickettsiales | NA | ribA HGT | Yes | Moderate support
Ankyrin repeat proteins | α-Proteobacteria: Rickettsiales | NA | No | Yes | Moderate support

Related to Fig. 4 and Fig. S2. NA, not applicable.

Table S4. Tremblaya primer

Genome region | Forward primer | Reverse primer(s)
leuA_fwd ↔ rpsO_rRNA_fwd_rev | CTAAGGGCTGAGGACGTTGG | CCCCTACGCAGCCTGTTTAT
rpsO_rRNA_fwd_rev ↔ prs_rev | CCCCTACGCAGCCTGTTTAT | GGGTAGCTCAGCGGTAAGAG
tRNA_Gly_fwd ↔ rsmH_rev | GCCTAGTGCAGGGATAGAAGG | CACTGAGGCTCTGAGTTGGC
tRNA_Gly_fwd ↔ 23S_rRNA_rev1 | GCCTAGTGCAGGGATAGAAGG | CGTTGATAGGCTGGGTGTGT
tRNA_Gly_fwd ↔ 23S_rRNA_rev2 | GCCTAGTGCAGGGATAGAAGG | AAGTTCCGACCTGCACGAAT
argG_fwd ↔ rib_pseudo_rev | CCCTGGCCTATGCTTCTGAC | GGAGGTCAGATTCGAGGCAG
ilvD_fwd ↔ hypothetical_protein_rev | ATAAGGAGGAGGGTGCCTGT | GTGATGGTGTTAGGTTGCGG

These primers were used for duplicated rRNA operons and one more region-breaking assembly of five T. princeps genomes.

Dataset S1. Phylogenetic trees for individual HGTs

Dataset S1

Values at nodes represent support from ML bootstrap pseudoreplicates. Extremely short inner branches were extended by dashed lines for better legibility. a, bioA; b, bioD; c, bioB; d, ribA; e, tms; f, cysK; g, ribD; h, panC; i, DUR1 and DUR12; j, dapF; k, lysA; l, b-lact; m, chiA; n, amiD; o, ddlB; p, murA; q, murB; r, murC; s, murD; t, murE; u, murF.


http://www.pnas.org/lookup/suppl/doi:10.1073/pnas.1603910113/-/DCSupplemental/pnas.1603910113.sd01.pdf

Dataset S2. Tremblaya gene information

Dataset S2

(A) Gene order, functional categories from Clusters of Orthologous (COG) groups, Enzyme Commission (E.C.) numbers, protein products, and gene abbreviations for all Tremblaya genomes. Tremblaya phenacola PAVE inversion is designated by light yellow color, pseudogenes are in red, noncoding RNAs are in magenta (tRNAs are not shown), and hypothetical proteins are in blue. (B) Raw data to reproduce Fig. 3. There are two copies of leuA in TPTPER and two copies of aroDQ in DEMHIR. Only glyS is found in TPPAVE and not glyQ. 0, Missing gene; 1, found on the endosymbiont genome; 2, pseudogene; 3, HGT found on the insect genome; 4, insect gene.

http://www.pnas.org/lookup/suppl/doi:10.1073/pnas.1603910113/-/DCSupplemental/pnas.1603910113.sd02.xls
