Supporting Information
Janouškovec et al. 10.1073/pnas.1003335107
SI Materials and Methods
Analysis of rearrangements between the plastid genomes of CCMP3155 and P. falciparum was done using GRIMM (1). Homologous genes from the large single-copy region and genes from the inverted repeats were analyzed either together (circular topology) or separately (linear topology); both approaches gave the same total of four inversions. Ribosomal RNA sequences were aligned using arb-aligner (http://www.arb-silva.de/aligner/). Amino acid sequences were aligned using MAFFT v6.240 (2). Alignments were edited using BioEdit v7.0.9 (3) and Gblocks v0.91b (4). The subset of 34 conserved plastid genes (Fig. 5) was selected based on the maximum likelihood distances inferred with TREE-PUZZLE 5.2 (5) (cutoff value set to 0.82). The concatenated nuclear dataset (i; see below) was analyzed using RAxML 7.1 (6) (LG+Gamma+F model for protein genes and GTR+Gamma for rRNA genes, 1,000 bootstrap replicates) and MrBayes 3.1.2 (7) (WAG+Gamma+F and GTR+Gamma models, two Markov chains run under default priors for 2 × 10^6 generations; the first 5 × 10^4 generations were excluded from consensus topology reconstruction as burn-in). Plastid datasets (ii, iii, v, vi) were analyzed under the CpREV+Gamma+F empirical model in RAxML (500 bootstrap replicates) and MrBayes (two Markov chains, default priors, 5 × 10^5 generations, burn-in 5 × 10^4), and under the CAT mixture model in PhyloBayes 3.2 (8) and PhyML 3.0-CAT (9, 10) (C50 model for the best tree, C20 for the 100-replicate bootstrap analysis). The form II Rubisco dataset (iv) was analyzed using RAxML (LG+Gamma+F, 1,000 bootstrap replicates), MrBayes, and PhyloBayes (settings the same as for ii, iii, v, vi). Some bioinformatic analyses were carried out on the freely available Bioportal (www.bioportal.uio.no).
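The MrBayes settings above can be read as a fraction of each run discarded as burn-in. The following is a minimal sketch that only restates the generation counts given in the text; the function name `burn_in_fraction` is illustrative and is not part of MrBayes.

```python
def burn_in_fraction(burn_in: int, generations: int) -> float:
    """Return the fraction of an MCMC run discarded as burn-in."""
    if generations <= 0 or not 0 <= burn_in <= generations:
        raise ValueError("burn-in must lie between 0 and the run length")
    return burn_in / generations


# (burn-in generations, total generations) as stated in the methods text.
RUNS = {
    "nuclear dataset (i)": (5 * 10**4, 2 * 10**6),
    "plastid datasets (ii, iii, v, vi)": (5 * 10**4, 5 * 10**5),
}

for name, (burn_in, generations) in RUNS.items():
    fraction = burn_in_fraction(burn_in, generations)
    print(f"{name}: {fraction:.1%} of generations discarded")
```

Under these numbers the nuclear analysis discards 2.5% of the run and the plastid analyses discard 10%.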
Phylogenetic Analyses Were Conducted on the Following Datasets.
(i) Dataset of nuclear genes (6 protein + 2 rRNA genes; 7,137 positions); Fig. 1
Genes: hsp90, hsp70, alpha-tubulin, beta-tubulin, biP, eF2, SSU rRNA, LSU rRNA
(ii) Dataset of plastid genes limited to genes retained in dinoflagellate plastid genomes (11 genes, 4,212 amino acid positions); Fig. S7A
Genes: atpA, atpB, petB, petD, psaA, psaB, psbA, psbB, psbC, psbD, psbE
(iii) Dataset of plastid genes limited to the content of apicomplexan plastid genomes (23 genes, 4,438 amino acid positions); Fig. S7B
Genes: clpC, rpl2, rpl4, rpl6, rpl11, rpl14, rpl16, rpl23, rpoB, rpoC1, rpoC2, rps2, rps3, rps4, rps5, rps7, rps8, rps11, rps12, rps17, rps19, sufB, tufA
(iv) Dataset of form II Rubisco (455 amino acid positions); Fig. 3
(v) Dataset of 34 conserved plastid genes (7,599 amino acid positions); Fig. 5
Genes: acsF, atpA, atpB, atpH, atpI, clpC, petB, petD, petG, petN, psaA, psaB, psaC, psaD, psbA, psbB, psbC, psbD, psbE, psbF, psbH, psbI, psbJ, psbK, psbN, psbT, rpl11, rpl14, rpl16, rps12, rps19, rps5, sufB, tufA
(vi) Dataset of all plastid genes present in CCMP3155 or C. velia (68 genes, 15,736 amino acid positions); Fig. S8
Genes: acsF, atpA, atpB, atpH, atpI, ccs1, ccsA, clpC, petA, petB, petD, petG, petN, psaA, psaB, psaC, psaD, psbA, psbB, psbC, psbD, psbE, psbF, psbH, psbI, psbJ, psbK, psbN, psbT, psbV, rpl2, rpl3, rpl4, rpl5, rpl6, rpl11, rpl14, rpl16, rpl19, rpl20, rpl23, rpl27, rpl31, rpoA, rpoB, rpoC1, rpoC2, rps2, rps3, rps4, rps5, rps7, rps8, rps11, rps12, rps13, rps14, rps16, rps17, rps18, rps19, secA, secY, sufB, tatC, tufA, ycf3, ycf4
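The gene counts stated for each dataset can be cross-checked against the gene lists. The sketch below transcribes the lists from the dataset descriptions and confirms that each list is free of duplicates and matches its stated count; the dictionary layout and the names `DATASETS` and `genes_of` are illustrative choices, not part of the original analysis.

```python
# Stated gene count and comma-separated gene list for each dataset,
# transcribed from the dataset descriptions (dataset iv is a single protein).
DATASETS = {
    "i": (8, "hsp90, hsp70, alpha-tubulin, beta-tubulin, biP, eF2, "
             "SSU rRNA, LSU rRNA"),
    "ii": (11, "atpA, atpB, petB, petD, psaA, psaB, psbA, psbB, psbC, "
               "psbD, psbE"),
    "iii": (23, "clpC, rpl2, rpl4, rpl6, rpl11, rpl14, rpl16, rpl23, rpoB, "
                "rpoC1, rpoC2, rps2, rps3, rps4, rps5, rps7, rps8, rps11, "
                "rps12, rps17, rps19, sufB, tufA"),
    "v": (34, "acsF, atpA, atpB, atpH, atpI, clpC, petB, petD, petG, petN, "
              "psaA, psaB, psaC, psaD, psbA, psbB, psbC, psbD, psbE, psbF, "
              "psbH, psbI, psbJ, psbK, psbN, psbT, rpl11, rpl14, rpl16, "
              "rps12, rps19, rps5, sufB, tufA"),
    "vi": (68, "acsF, atpA, atpB, atpH, atpI, ccs1, ccsA, clpC, petA, petB, "
               "petD, petG, petN, psaA, psaB, psaC, psaD, psbA, psbB, psbC, "
               "psbD, psbE, psbF, psbH, psbI, psbJ, psbK, psbN, psbT, psbV, "
               "rpl2, rpl3, rpl4, rpl5, rpl6, rpl11, rpl14, rpl16, rpl19, "
               "rpl20, rpl23, rpl27, rpl31, rpoA, rpoB, rpoC1, rpoC2, rps2, "
               "rps3, rps4, rps5, rps7, rps8, rps11, rps12, rps13, rps14, "
               "rps16, rps17, rps18, rps19, secA, secY, sufB, tatC, tufA, "
               "ycf3, ycf4"),
}


def genes_of(label: str) -> list[str]:
    """Split the stored gene string for a dataset into individual names."""
    return DATASETS[label][1].split(", ")


for label, (stated_count, _) in DATASETS.items():
    genes = genes_of(label)
    assert len(genes) == len(set(genes)), f"duplicate gene in dataset {label}"
    assert len(genes) == stated_count, f"count mismatch in dataset {label}"
    print(f"dataset {label}: {len(genes)} genes, matches stated count")

# Nesting of the plastid datasets, as implied by the lists themselves.
print("ii within v:", set(genes_of("ii")) <= set(genes_of("v")))
print("v within vi:", set(genes_of("v")) <= set(genes_of("vi")))
print("iii within vi:", set(genes_of("iii")) <= set(genes_of("vi")))
```

All three nesting checks hold for the lists as given, so datasets ii, iii, and v are subsets of the 68-gene dataset vi.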
1. Tesler G (2002) GRIMM: Genome rearrangements web server. Bioinformatics 18:492–493.
2. Katoh K, Kuma K, Toh H, Miyata T (2005) MAFFT version 5: Improvement in accuracy of multiple sequence alignment. Nucleic Acids Res 33:511–518.
3. Hall T (1999) BioEdit: A user-friendly biological sequence alignment editor and analysis program for Windows 95/98/NT. Nucleic Acids Symp Ser 41:95–98.
4. Castresana J (2000) Selection of conserved blocks from multiple alignments for their use in phylogenetic analysis. Mol Biol Evol 17:540–552.
5. Schmidt HA, Strimmer K, Vingron M, von Haeseler A (2002) TREE-PUZZLE: Maximum likelihood phylogenetic analysis using quartets and parallel computing. Bioinformatics 18:502–504.
6. Stamatakis A (2006) RAxML-VI-HPC: Maximum likelihood-based phylogenetic analyses with thousands of taxa and mixed models. Bioinformatics 22:2688–2690.
7. Huelsenbeck JP, Ronquist F (2001) MRBAYES: Bayesian inference of phylogenetic trees. Bioinformatics 17:754–755.
8. Lartillot N, Philippe H (2004) A Bayesian mixture model for across-site heterogeneities in the amino-acid replacement process. Mol Biol Evol 21:1095–1109.
9. Guindon S, Gascuel O (2003) A simple, fast, and accurate algorithm to estimate large phylogenies by maximum likelihood. Syst Biol 52:696–704.
10. Quang S, Gascuel O, Lartillot N (2008) Empirical profile mixture models for phylogenetic reconstruction. Bioinformatics 24:2317–2323.
Janouškovec et al. www.pnas.org/cgi/content/short/1003335107 1 of 7
Fig. S1. Transmission electron microscopy of the CCMP3155 plastid. (A) View of a plastid lobe showing thylakoid lamellae organized in stacks of three, a feature also known from C. velia and dinoflagellates. (Scale bar, 200 nm.) (B) Characteristic pyrenoid with several thylakoid lamellae entering the site of carbon fixation, surrounded by a sheath of starch. (Scale bar, 500 nm.) (C–E) The plastid is bounded by four membranes (white arrowheads), similarly to plastids of C. velia and apicomplexans. (Scale bars, 100 nm.)
[Fig. S2 image: plastid genome maps. Chromera velia, 119798 bp (~121200 bp), with repeat 1 and repeat 2 marked; CCMP3155, 85535 bp, with IR1, IR2, LSC, and SSC marked. Legend (functional categories): expression, photosystems, biosynthesis, other & YCFs, ORFs, pseudogene. Additional labels: L(UAA) with asterisk, CCMP2878.]
Fig. S2. Plastid genome maps of C. velia and CCMP3155. Genes on the outside are transcribed counterclockwise. All genes are colored according to the functional categories (Upper Right). The asterisk next to the gene for tRNA-Leu (UAA) indicates an intron in the anticodon triplet. Crosses in C. velia genes label pseudogenes. The plastid genome of C. velia has not been proven to map as a circle (dotted line). The relative sizes of both plastid genomes are proportional.
[Fig. S3 image: panels A and B, each with size markers labeled 24.3, 48.5, 97, and 145.5.]
Fig. S3. Size estimate of the C. velia plastid genome. (A) PFGE of genomic DNA revealed a faint band at approximately 120 kb (arrow). (B) Southern hybridization of a different PFGE run. A radioactively labeled psbA plastid gene probe showed a hybridization signal at the corresponding size. We assume that the lower smear corresponds to sheared plastid DNA. The experiment was reproduced twice with the same result.
[Fig. S4 image: gene-order diagram of the plastid ribosomal superoperon (S10 + spc + alpha operon cluster and str operon, with the position of fusion marked). Rows: Red algae, Cryptophytes, Haptophyte, Heterokonts*, CCMP3155, Chromera, Plasmodium, Toxoplasma, Eimeria, Theileria, Babesia.]
Fig. S4. The plastid ribosomal superoperon gives evidence for the red algal origin of alveolate plastids. The superoperon originated by fusion of the S10+spc+alpha operon cluster and the str operon (Top). Genes in the superoperon are transcribed in left-to-right order, and solid horizontal lines connect neighboring genes (L = rpl and S = rps ribosomal protein genes). Diagonal lines show the transposition of rpl31 in CCMP3155 and C. velia (solid) and two additional possible transpositions in the ancestor of alveolates (dotted). The white type of the cryptophyte and haptophyte rpl36 gene indicates that it was acquired by horizontal gene replacement from a noncyanobacterial donor. The asterisk denotes further modifications of the superoperon in heterokont algae: the presence of ycf88 between rps19 and rpl22 in the diatoms Odontella sinensis, Phaeodactylum tricornutum, and Thalassiosira pseudonana, and the loss of rpl4, rpl29, and rpl18 in the pelagophytes Aureoumbra lagunensis and Aureococcus anophagefferens. Red algae: Porphyra purpurea, Porphyra yezoensis, Gracilaria tenuistipitata, Cyanidioschyzon merolae, Cyanidium caldarium; Cryptophytes: Guillardia theta, Rhodomonas salina; Haptophyte: Emiliania huxleyi; Heterokonts: Vaucheria litorea, Heterosigma akashiwo, T. pseudonana, P. tricornutum, O. sinensis, A. lagunensis, A. anophagefferens, Fucus vesiculosus, and Ectocarpus siliculosus.
[Fig. S5 image: comparison of plastid gene clusters in CCMP3155, Plasmodium, Toxoplasma, and Eimeria, including the ribosomal superoperon and a 9.7 kb region of mostly photosystem genes. Legend, genes: homologous genes unique to CCMP3155 and apicomplexans; homologous gene elsewhere in the CCMP3155 plastid genome; non-homologous genes; genes lost in apicomplexans. Legend, gene order: conserved in other plastid genomes; putatively homologous.]
Fig. S5. Conserved gene order in alveolate plastid genomes. The plastid genomes of CCMP3155 and apicomplexans share several uniquely organized gene clusters (see the legend) not found in other plastid genomes. Genes lost in apicomplexans are mostly connected to photosynthetic function in the plastid of CCMP3155. ORFs in apicomplexans have no homology to other plastid genes.
[Fig. S6 alignments. Panel labels in the figure: psbC mRNA 3′UTR and psbB mRNA 3′UTR (first two alignments), psaA mRNA 3′UTR (third alignment).]

Alignment 1 (DNA and clones cDNA-1 to cDNA-4):
DNA   TTGGGAGTAGGTTTTCACATAGGTTAGACGTGTAAACTTTTCCCATCGTTTTTTATAACAGATATTTTAACTAAAAAAATGTTATTAA
cDNA  TTGGGAGTAGGTTTTCACATAGGTTAGACGTGTAAACTTTTCCCATCGTTTTTTATAACAGATATTTTAACTTTTTTTTTT
cDNA  TTGGGAGTAGGTTTTCACATAGGTTAGACGTGTAAACTTTTCCCATCGTTTTTTATAACAGATATTTTAACTTTTTTTTTT
cDNA  TTGGGAGTAGGTTTTCACATAGGTTAGACGTGTAAACTTTTCCCATCGTTTTTTATAACAGATATTTTAACTTTTTTTTT
cDNA  TTGGGAGTAGGTTTTCACATAGGTTAGACGTGTAAACTTTTCCCATCGTTTTTTATAACAGATATTTTAATTTTTTTTTT

Alignment 2 (DNA and clones cDNA-1 and cDNA-2):
DNA   TCGTATTATTAGTGTTAGGAGCTGTAAAAAGGCCCCAATCGACCTAAAGATTAATGCAAAACAACCATTACGTTTAAGTAACACTGCC
cDNA  TCGTATTATTAGTGTTAGGAGCTGTAAAAAGGCCCCAATCGACCTAAAGATTAATGCAAAACAACCATTAATTTTTTTTTTT
cDNA  TCGTATTATTAGTGTTAGGAGCTGTAAAAAGGCCCCAATCGACCTAAAGATTAATGCAAAACAACCATTTTTTTTTT

Alignment 3, psaA mRNA 3′UTR (DNA and clone cDNA-1):
DNA   GCACAAGTAGTGCCGTCTTATAGAAGACCTTTTTTTGTTTAAGGGCTTGTCTTCTTAGGCCCTCATTCCATTTGAGCATTTAGTTTA
cDNA  GCACAAGTAGTGCCGTCTTATAGAAGACCTTTTTTTGTTTAAGGGCTTGTCTTCTTAGGCCCTCATTCCTTTTTTTTTTTT
Fig. S6. Polyuridylylation of plastid transcripts in C. velia. Alignment of genomic DNA sequences of three plastid genes, psbC, psbB, and psaA, with the corresponding cDNA sequences. All cDNA clones terminate in thymidine stretches (in bold) that are absent from genomic DNA, suggesting the presence of transcript polyuridylylation in the plastid. Underlined thymidines may correspond to the 5′UTRs of the circularized transcripts. The presence of polyU tails in psbB and psbC transcripts was validated by sequencing three clones from 3′RACE products obtained with oligo-dA and gene-specific primers.
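The comparison described in this legend, a cDNA clone that follows the genomic sequence and then ends in thymidines not encoded in the genome, can be sketched as a shared-prefix check. The example below is a minimal illustration under stated assumptions: the function names are hypothetical, and the two sequences are made-up toy strings, not the sequences from the figure.

```python
def non_templated_tail(genomic: str, cdna: str) -> str:
    """Return the part of the cDNA that follows its shared prefix with the DNA."""
    shared = 0
    for g_base, c_base in zip(genomic, cdna):
        if g_base != c_base:
            break
        shared += 1
    return cdna[shared:]


def looks_like_poly_u(tail: str, min_length: int = 5) -> bool:
    """True if the tail is a run of T only (U in the transcript).

    min_length is an arbitrary illustrative threshold.
    """
    return len(tail) >= min_length and set(tail) == {"T"}


# Toy sequences (hypothetical): the cDNA diverges from the genomic
# sequence after 13 shared bases and ends in eight thymidines.
toy_genomic = "ACGTTGCAAGGCCATTAGC"
toy_cdna = "ACGTTGCAAGGCC" + "T" * 8

tail = non_templated_tail(toy_genomic, toy_cdna)
print(tail, looks_like_poly_u(tail))
```

If the genomic sequence itself continues with thymidines at the junction, those bases count as shared, so the returned tail is a lower bound on the length of the added stretch.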
[Fig. S7 image: two phylogenetic trees (panels A and B; scale bars 0.5 and 0.1) with branch support values. Clade labels: HETEROKONTS, HACROBIANS, RED ALGAE, GREEN ALGAE AND PLANT, GLAUCOPHYTE, CYANOBACTERIA, DINOFLAGELLATES (Gonyaulax polyedra, Heterocapsa triquetra, Amphidinium carterae), APICOMPLEXANS (Eimeria tenella, Plasmodium falciparum, Toxoplasma gondii); CCMP3155 and Chromera velia appear in both panels.]
Fig. S7. Concatenated plastid phylogenies support the common origin of alveolate plastids. (A) Analyses of 11 plastid genes retained in dinoflagellate plastids support the monophyly of alveolate sequences, their relationship to heterokonts, and the monophyly of all chromalveolate plastids. (B) Analyses of 23 genes retained in apicoplasts support sequences of CCMP3155 and C. velia as their closest sister group. Maximum likelihood trees constructed using the CAT model (A and B) display PhyML-CAT/RAxML/MrBayes/PhyloBayes branch supports; solid circles indicate 100/100/1/1 supports. Supports ≥60/≥50/≥0.98/≥0.98 are shown as significant.
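The four-part branch labels in this legend follow a simple rule: each method's support is printed only if it reaches that method's threshold, and a dash is printed otherwise. A minimal sketch of that rule is shown below; the function name and the raw support values in the second call are hypothetical, and only the thresholds are taken from the legend.

```python
# Thresholds from the legend, in the order
# PhyML-CAT / RAxML / MrBayes / PhyloBayes.
FIG_S7_THRESHOLDS = (60, 50, 0.98, 0.98)


def support_label(values, thresholds=FIG_S7_THRESHOLDS) -> str:
    """Format per-method supports, replacing sub-threshold values with '-'."""
    if len(values) != len(thresholds):
        raise ValueError("one support value is required per method")
    parts = []
    for value, threshold in zip(values, thresholds):
        if value is not None and value >= threshold:
            parts.append(format(value, "g"))  # 1.0 -> '1', 0.98 -> '0.98'
        else:
            parts.append("-")
    return "/".join(parts)


# A branch supported by all four methods.
print(support_label((87, 100, 1.0, 1.0)))
# Hypothetical raw values: the first method falls below its threshold.
print(support_label((40, 92, 1.0, 1.0)))
```

Passing a different `thresholds` tuple adapts the same rule to figures that list the methods in another order or use other cutoffs.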
[Fig. S8 image: phylogenetic tree (scale bar, 0.2) with branch support values. Clade labels: HETEROKONTS, RED ALGAE, HACROBIANS, GLAUCOPHYTE, GREEN ALGAE AND PLANTS, CYANOBACTERIA; CCMP3155 and Chromera; arrows marked A + H and RP.]
Fig. S8. Concatenated plastid phylogeny of all 68 genes found in the CCMP3155 plastid (excluding rpl36). The tree displays complete support for grouping alveolate (represented by CCMP3155) and heterokont plastids (A + H, arrow) and red algal plastids (RP, arrow). The dotted branch indicates the placement of the C. velia sequence, which received complete support in all analyses. The maximum likelihood tree constructed using the CpREV+Gamma+F model displays PhyML-CAT (aLRT)/RAxML/MrBayes/PhyloBayes supports over branches; solid circles indicate 1/100/1/1 supports. Only supports ≥0.98/≥60/≥0.98/≥0.98 are shown as significant.