The genomes of three uneven siblings – footprints of the
lifestyle of three Trichoderma species
SUPPLEMENTARY MATERIAL
Genome defense mechanisms
Figures S1 – S3
Chromatin rearrangement and histone modification
Figures S4 – S18
CAZymes
Figures S19 – S22
Nitrogen metabolism
Figures S23 – S25
Protein kinases
Figure S26
Protein phosphatases
Figure S27
Calcium signaling
Figure S28
Transcription factors
Supplementary note 1
Supplementary note 2
Figures S29 – S30
Genes related to competition and defense
Supplementary note 3
Proteins with known effector domains
Figures S31 – S32
REFERENCES
Genome Defense Mechanisms
Figure S1. Phylogenetic tree of fungal Dicer-like proteins. TA: T. atroviride, TV: T. virens, TR: T. reesei, TAs:
T. asperellum, TH: T. harzianum, FG: Fusarium graminearum, MG: Magnaporthe grisea, NC: N. crassa, CP:
Cryphonectria parasitica, MC: Mucor circinelloides, CN: Cryptococcus neoformans, SP: Schizosaccharomyces
pombe, AN: Aspergillus nidulans, AT: Arabidopsis thaliana. AT RNAi proteins were used as the outgroup. The
number indicates the paralogs in each strain. M: Meiotic Silencing by Unpaired DNA, Q: Quelling. Distance
trees were constructed by the neighbor-joining (NJ) method with the default settings using SeaView software
(1). The robustness of the trees was estimated by performing 1000 bootstrap replicates (expressed as percentages
in the figures).
[Tree image not reproduced. SeaView tree Dcr-NJ_tree (Wed Sep 23 22:04:30 2015); NJ, 330 sites, Poisson
distances, 1000 bootstrap replicates; clades labelled Q and M. Sequences shown: AT_Dcl4_At5g20320,
AT_Dcl-2_At3g03300, AT_Dcl-3_At3g43920, AT_Dcl-1_At1g01040, SP_DCR1_SPCC188.13c, MC_DCL1_104148,
MC_DCL2_116481, TH1_497427, TV1_171147, TR1_69494, TAs1_138816, TA_Dcr1_292263, FG1_FG09025,
CP_DCL-1_69967, NC_DCL-1/SMS-3_NCU08270, MG_MDL1_MG01541, CN_DCR2_CNAG02745, CN_DCR1_CNAG02742,
MG_MDL2_MG07167, TH2_89465, TV2_47151, TR2_79823, TAs2_148991, TA_Dcr2_291296, FG2_FG04408,
CP_DCL-2_276559, NC_DCL-2_NCU06766, AN_AN3189.]
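The Poisson-corrected distance named in the tree legends (NJ, Poisson) can be illustrated with a short Python sketch. It computes d = -ln(1 - p), where p is the proportion of differing residues between two aligned sequences. This is not the SeaView implementation: the function names are hypothetical, and skipping columns with a gap in either sequence is an assumption, since the gap treatment is not stated here.

```python
import math


def p_distance(seq_a: str, seq_b: str) -> float:
    """Proportion of differing residues over columns without gaps."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differing = 0
    for a, b in zip(seq_a, seq_b):
        # Assumption: columns with a gap in either sequence are ignored.
        if a == "-" or b == "-":
            continue
        compared += 1
        if a != b:
            differing += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return differing / compared


def poisson_distance(seq_a: str, seq_b: str) -> float:
    """Poisson-corrected amino acid distance, d = -ln(1 - p)."""
    p = p_distance(seq_a, seq_b)
    if p >= 1.0:
        raise ValueError("distance is undefined when all residues differ")
    return -math.log(1.0 - p)


if __name__ == "__main__":
    # Toy aligned fragments, not taken from the Dicer alignment.
    print(round(poisson_distance("ACDEFG-H", "ACDQFGKH"), 4))
```

A matrix of such pairwise distances is the input that a neighbor-joining procedure would then cluster into a tree.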
Figure S2. Phylogenetic tree of fungal Argonaute proteins. TA: T. atroviride, TV: T. virens, TR: T. reesei,
TAs: T. asperellum, TH: T. harzianum, FG: Fusarium graminearum, MG: Magnaporthe grisea, NC: N. crassa,
CP: Cryphonectria parasitica, MC: Mucor circinelloides, CN: Cryptococcus neoformans, SP:
Schizosaccharomyces pombe, AN: Aspergillus nidulans, AT: Arabidopsis thaliana. AT RNAi proteins were used
as the outgroup. The number indicates the paralogs in each strain. M: Meiotic Silencing by Unpaired DNA, Q:
Quelling. Distance trees were constructed by the neighbor-joining (NJ) method with the default settings using
SeaView software (1). The robustness of the trees was estimated by performing 1000 bootstrap replicates
(expressed as percentages in the figures).
[Tree image not reproduced. SeaView tree agos1-NJ_tree (Wed Sep 23 22:16:49 2015); NJ, 311 sites, Poisson
distances, 1000 bootstrap replicates; clades labelled Q and M. Sequences shown: AT_AGO10_At5g43810,
AT_AGO1_At1g48410, AT_AGO5_At2g27880, AT_AGO7_At1g69440, AT_AGO3_At1g31290, AT_AGO2_At1g31280,
AT_AGO9_At5g21150, AT_AGO4_At2g27040, AT_AGO6_At2g32940, SP_AGO1_SPCC736.1, CN_AGO1_CNAG04609,
MC_Ago2_104162, MC_Ago3_104163, MC_Ago1_154929, TH3_530248, TAs3_44486, TA_Ago3_36522, TV3_37110,
TR3_107068, FG2_FG00348, CP_AGL4_261854, MG2_MG11029, NC_SMS-2_NCU09434, CP_AGL3_268359, TH2_78235,
TV2_181363, TR2_60270, TA_Ago2_20708, CP_AGL2_292762, MG3_MG10003, AN_AN1519, TH1_417954, TV1_112874,
TAs2_26314, TAs1_132085, TA_Ago1_245602, TR1_49832, FG1_FG08752, MG1_MG01294, NC_QDE-2_NCU04730,
CP_AGL1_74333.]
Figure S3. Phylogenetic tree of fungal RNA-dependent RNA polymerases (RdRp). TA: T. atroviride, TV: T.
virens, TR: T. reesei, TAs: T. asperellum, TH: T. harzianum, FG: Fusarium graminearum, MG: Magnaporthe
grisea, NC: N. crassa, CP: Cryphonectria parasitica, MC: Mucor circinelloides, CN: Cryptococcus neoformans,
SP: Schizosaccharomyces pombe, AN: Aspergillus nidulans, AT: Arabidopsis thaliana. AT RNAi proteins were
used as the outgroup. The number indicates the paralogs in each strain. M: Meiotic Silencing by Unpaired DNA, Q:
Quelling. Distance trees were constructed by the neighbor-joining (NJ) method with the default settings using
SeaView software (1). The robustness of the trees was estimated by performing 1000 bootstrap replicates
(expressed as percentages in the figures).
[Tree image not reproduced. SeaView tree Rdr-NJ_tree (Wed Sep 23 22:27:06 2015); NJ, 234 sites, Poisson
distances, 1000 bootstrap replicates; clades labelled Q and M. Sequences shown: AT6_sde-1/sgs2_At3g49500,
AT1_At1g14790, FG5_FG09076, MG2_MG02748, TH2_489404, TV2_28428, TAs2_451362, TA_Rdr2_225118, TR2_103470,
FG2_FG08716, CP3_10929, NC_SAD1_NCU02178, AN1_AN4790, SP_RDR1_SPAC6F12, CN_CNAG03466, MC2_144762,
TH3_95176, TV3_10390, TR3_49048, TAs3_195554, TA_Rdr3_317554, FG3_FG01582, CP4_339656, NC_RRP3_NCU08435,
AN2_AN2717, MG3_MG06205, AT4_At2g19920, AT3_At2g19910, TH1_134243, TV1_122493, TR1_67742, TAs1_181102,
TA_Rdr1_321718, FG1_FG06504, CP1_35624, NC_QDE1_NCU07534, MG1_MG07682, CP2_270014, FG4_FG04619,
MC1_82874.]
Chromatin rearrangement and histone modification
Figure S4. Phylogenetic tree of Trichoderma spp. histone variants. The unrooted phylogenetic tree was
constructed by the neighbor-joining method, using MEGA 5.01 (2). Numbers on branches indicate the
bootstrap values obtained after 1000 replications. The tree shows the different histone variants marked with
different colors. Ch: Colletotrichum higginsianum, Fo: Fusarium oxysporum, Nh: Nectria haematococca, Nc:
Neurospora crassa, Gz: Gibberella zeae, Sc: Saccharomyces cerevisiae.
Figure S5. Molecular architecture of the centromere and kinetochore. The scheme represents the
conserved proteins that are involved in the structure of the centromere and kinetochore, as well as in nucleosome
modifications. For our purposes, we assume a similar architecture for Trichoderma spp. The centromere is
composed of centric chromatin and centromeric heterochromatin, with CENP-B delimiting the two regions. The
centromeric heterochromatin is characterized by H3K9 trimethylation, which is recognized by HP1, together
with the presence of methylated cytosines. In the centric chromatin, CENP-A nucleosomes are interspersed
between nucleosomes containing canonical H3 histones and bearing the histone modifications H3K9 and H4K20
methylation. The kinetochore has been partitioned into the inner kinetochore, which includes chromatin-proximal
components, and the outer kinetochore, which is composed of complexes that mediate microtubule (MT)
attachment and is broadly conserved. In the inner kinetochore, the Constitutive Centromere-Associated Network
(CCAN) includes CENP-C, which binds to DNA and CENP-A at one end, and to other CCAN proteins at the
other end. CCAN proteins associate closely with members of the Mis12 complex. In the outer kinetochore, the
major microtubule-binding activity of the kinetochore is mediated by an assembly of both the Knl1 and the
Ndc80 complexes at the Mis12 complex. This structure attaches cooperatively to microtubules. The Ndc80
complex is a rod-like molecule, with globular domains at either end of the rod that are involved in MT
attachment. The Dam1 complex forms rings around microtubules, and localizes behind the Ndc80 head domains.
At this location, a Dam1 complex ring encircles the MT lattice as well as the coiled-coil domain of the Ndc80
complex. Together, Dam1 and Ndc80 complexes are the major microtubule-binding kinetochore proteins that
organize end-on kinetochore-microtubule attachment.
Figure S6. Open or closed chromatin conformations and chromatin remodelers. Chromatin is the complex
assemblage of DNA, histone proteins, and other non-histone protein components. Changes in chromatin
structure result in modification of functional properties, since many regulatory processes occur at this level.
Transcriptionally silent chromatin involves the action of histone methyltransferases, which catalyze H3K9
di- and trimethylation (H3K9me2/3), which in turn is recognized by the chromodomain of HP1. Another
post-translational modification of histones that is associated with silent chromatin is ubiquitination
(H2AK119Ub), as are a hypoacetylated state of histones, DNA methylation and incorporation of the linker
histone H1.
The closed chromatin is converted to an open and transcriptionally active state by exchange/incorporation of
chromatin components and by removal and addition of covalent modifications of histones and DNA. In contrast to
heterochromatin, histone hyperacetylation (H3K9) is a mark of permissive chromatin and is catalyzed by
coactivator complexes such as SAGA. Other non-histone protein components include HMGA and HMGB, which
participate in the establishment of transcription factors on gene promoters. During gene expression, the
positioning of nucleosomes is altered: promoter regions of active genes either seem to be largely
devoid of nucleosomes or their promoter-associated nucleosomes have been repositioned, for instance by the
action of ATP-dependent chromatin-remodeling enzymes present in SWI/SNF or RSC complexes.
Chromatin-remodeling complexes perturb or reposition nucleosomes, allowing access to DNA-binding sites.
Figure S7a. Tree view of Snf2 family – Rad5/16 like proteins. Unrooted neighbour-joining tree (using
bootstrap) from a multiple alignment of Snf2 predicted protein sequences from T. virens, T. atroviride, T.
reesei, N. crassa and G. zeae. Phylogenetic and molecular evolutionary analyses were conducted using MEGA
version 4 (3).
Figure S7b. Tree view of Snf2 family – other proteins. Unrooted neighbour-joining tree (using bootstrap)
from a multiple alignment of Snf2 predicted protein sequences from T. virens, T. atroviride, T. reesei, N. crassa
and G. zeae. Phylogenetic and molecular evolutionary analyses were conducted using MEGA version 4 (3).
Figure S8. Structure and phylogeny of SNF5 family proteins. (A) The SNF5 domain (orange) has been
predicted as an enzymatic unit for multiple demethylases. The Sfh1 protein has a GATA zinc finger domain
(green). (B) Unrooted neighbour-joining tree (using bootstrap) from a multiple alignment of predicted Snf5
protein sequences from T. virens, T. atroviride, T. reesei, N. crassa and G. zeae. Phylogenetic and molecular
evolutionary analyses were conducted using MEGA version 4 (3).
Figure S9. Phylogenetic analysis of HMG family proteins. Unrooted neighbour-joining tree (using bootstrap)
from a multiple alignment of HMG protein sequences from T. virens, T. atroviride, T. reesei, N. crassa and G.
zeae. Phylogenetic and molecular evolutionary analyses were conducted using MEGA version 4 (3).
Figure S10. Phylogenetic tree of histone acetyltransferases in Trichoderma spp. The unrooted tree was
constructed by the neighbor-joining method using MEGA 5.01 (2). N. crassa and G. zeae were used as outgroup.
Numbers on branches indicate the bootstrap values obtained after 500 replications. The tree shows the different
families of histone acetyltransferases.
Figure S11. Phylogenetic analysis and structure of bromodomain proteins found in Trichoderma spp. (A)
Phylogenetic tree of bromodomain proteins present in the Trichoderma spp. genomes. The unrooted tree was
constructed by the neighbor-joining method, using MEGA 5.01 (2). Numbers on branches indicate the bootstrap
values obtained after 500 replications. (B) Schematic structures of bromodomain proteins in Trichoderma spp.
The bromodomain is represented by orange boxes. The protein length is indicated at the right side of pictures.
Owing to the high degree of homology between the Trichoderma spp. bromodomains, only the T. atroviride
bromodomain is shown.
Figure S12. Proteins containing SANT domains present in Trichoderma spp. (A) Phylogenetic tree of
SANT-domain proteins. The unrooted tree was constructed by the neighbor-joining method using MEGA 5.01
(2). Numbers on branches indicate the bootstrap values obtained after 500 replications. The tree shows the
relationship of Rsc8 and Ada2 to the rest of the SANT-domain proteins. (B) Sequence alignment of the SANT
domain proteins from Trichoderma spp. with Ada2 and Rsc8 from S. cerevisiae (Sc). Asterisks indicate the
conserved residues that are predicted to form the core of the SANT domain. The shaded residues indicate
identical and conserved amino acids.
[Panels A (tree) and B (alignment) not reproduced. Tree labels: TA 268637; TR 4124; TV 193762; GZ 46126337;
NC 85095018; GZ 46111675; NC 164424105; NC 85108949; TR 21557; GZ 46110086; TA 263654; TV 198070;
TA 273518; TV 231090; TR 120908; NC 40882161; TA 292194; TV 192559; TR 123619. Other labels: ada2, Rsc8,
SANT_1, SANT_2.]
Figure S13. Schematic representation of histone deacetylases (HDACs) in fungi. The structures of fungal
histone deacetylases are indicated schematically: RPD3, PHD1 and HOSB have a histone deacetylase domain
(blue boxes) and a long C-terminal region, whereas HDA1 has a histone deacetylase domain (blue boxes) and an
Arb2 domain (orange box). The structure of sirtuin 2 (SIR2) represents the sirtuin-typical catalytic core domain.
The catalytic domains are flanked by distinct N- and C-terminal extensions (gray boxes).
Figure S14. Phylogenetic analysis of histone deacetylases (HDACs) in filamentous fungi using the Poisson
model. The HDAC sequences used to construct the tree were from T. atroviride (TA), T. virens (TV), T. reesei
(TR), N. crassa (NC) and G. zeae (GZ). The numbers at nodes correspond to the percentages of 500 bootstrap
replications. The phylogenetic analysis was performed using MEGA 5 (2).
Figure S15. Structure and phylogeny of JmjC-domain-containing proteins. (A) The JmjC domain (blue) has
been predicted as an enzymatic unit for multiple demethylases. JARID-like proteins have a C5HC2 (gray)
domain that has been shown to be important for demethylase activity. (B) Unrooted neighbour-joining tree
(using bootstrap) from a multiple sequence alignment of predicted JmjC protein sequences from T. virens, T.
atroviride, T. reesei, N. crassa and G. zeae. Phylogenetic and molecular evolutionary analyses were
conducted using MEGA version 4 (3).
Figure S16. The ubiquitin system as involved in histone modification. (A) Scheme of the structure of proteins of
the ubiquitin system. E2 ubiquitin-conjugating enzymes contain the UBC domain (green) and donate
ubiquitin to the ε-amino group of specific lysine residues, often in a substrate-specific manner. BRE1 is a RING
finger-containing E3 ubiquitin ligase, and the RING domain (orange) is essential for its ubiquitin ligase
activity. (B) Phylogenetic analysis of proteins of the ubiquitin system. Unrooted neighbour-joining tree (using
bootstrap) from a multiple alignment of the predicted ubiquitin system related protein sequences from T. virens,
T. atroviride, T. reesei, N. crassa and G. zeae. Phylogenetic and molecular evolutionary analyses were
conducted using MEGA version 4 (3).
Figure S17. SUMO functions. SUMO is translated as a precursor protein and it has to be processed by the Ulp1
protease. Mature SUMO is then activated by the heterodimeric SUMO-activating enzyme (E1), in an ATP-
dependent manner. Later, activated SUMO is transferred to the conjugating enzyme (E2), which in conjunction
with the SUMO ligase (E3), conjugates the SUMO protein to lysine residues of a wide variety of substrate
proteins, including histone proteins.
Figure S18. Phylogenetic tree of proteins involved in sumoylation in Trichoderma spp. The unrooted tree
was constructed using the neighbor-joining method of MEGA 5.01. Numbers on branches indicate the bootstrap
values obtained after 1000 replications. The tree shows the proteins involved in sumoylation, such as the
E1-activating enzyme (Aos1/Uba2), the E2-conjugating enzyme (Ubc9) and E3 ligases (SizA and Mms2), as well
as the SUMO protein Smt3.
CAZymes
Figure S19. Comparison of amino acid similarities of cellulases between T. atroviride, T. reesei and T.
virens. Average % aa-positives (grey bars) and % aa-identities (black bars) are shown. Protein sequences were
compared by pairwise alignments (BLASTP) and average values [%] of identical amino acids (black bars) or
amino acids with similar biochemical properties (positives; grey bars) were calculated from all three possible
pairwise alignments (TA-TV, TA-TR, TV-TR). The obtained data are indicative of the overall conservation of
the respective proteins among the three tested Trichoderma species. For example, Cel3c is a particularly well
conserved protein, whereas Cel61c protein sequences exhibit considerable sequence differences.
[Bar chart not reproduced. Axis: % (60 to 100); series: aa-identities, aa-positives; proteins: CEL3a, CEL3b,
CEL3c, CEL3d, CEL3e, CEL5a, CEL5b, CEL6a, CEL7a, CEL7b, CEL12a, CEL12b, CEL12c, CEL45a, CEL45b, CEL61a,
CEL61b, CEL61c, CEL74a.]
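The averaging described in the legend of Figure S19 (one value per protein, averaged over the three pairwise alignments TA-TV, TA-TR and TV-TR) can be sketched in Python as follows. The function name and the example percentages are hypothetical and are not values from the figure; the per-pair percentages are assumed to have been read from BLASTP output beforehand.

```python
from itertools import combinations
from statistics import mean

SPECIES = ("TA", "TV", "TR")


def average_over_pairs(percent_by_pair):
    """Average a per-pair percentage over all three species pairs."""
    expected = {frozenset(pair) for pair in combinations(SPECIES, 2)}
    if set(percent_by_pair) != expected:
        raise ValueError("need exactly the pairs TA-TV, TA-TR and TV-TR")
    return mean(list(percent_by_pair.values()))


if __name__ == "__main__":
    # Hypothetical % identities for one protein, for illustration only.
    example_identities = {
        frozenset(("TA", "TV")): 90.0,
        frozenset(("TA", "TR")): 80.0,
        frozenset(("TV", "TR")): 85.0,
    }
    print(average_over_pairs(example_identities))
```

The same function would apply unchanged to the % positives, giving the second bar for each protein.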
Figure S20. Phylogenetic relationships of cellulases between T. atroviride, T. reesei and T. virens. Sequences
were retrieved from the respective JGI genome databases. Multiple alignments were created with ClustalX 2.0
(4) and manually refined in GeneDoc (5), and phylogenetic analysis was carried out with MEGA 4 (3) using the
neighbour-joining method (a distance-based algorithm). The stability of clades was evaluated by 1000 bootstrap
rearrangements.
Figure S21. Comparison of amino acid similarities of the main chitinases and so far characterized
glucanases between T. atroviride, T. reesei and T. virens. Average % aa-positives (grey bars) and % aa-identities
(black bars) are shown. Protein sequences were compared by pairwise alignments (BLASTP) and average values
[%] of identical amino acids (black bars) or amino acids with similar biochemical properties (positives; grey
bars) were calculated from all three possible pairwise alignments (TA-TV, TA-TR, TV-TR). The obtained data
are indicative of the overall conservation of the respective proteins among the three tested Trichoderma species.
For example, NAG1 is a particularly well conserved protein, whereas ECH30 protein sequences exhibit more
sequence differences.
[Bar chart not reproduced. Axis: % (60 to 100); series: aa-identities, aa-positives; proteins: ECH42, NAG1,
CHIT33, CHIT36, ECH30, Endo T, BGN3 (GH15), BGN16.3 (GH30).]
Figure S22. Phylogenetic relationships of chitinases and glucanases, respectively, between T.
atroviride, T. reesei and T. virens. Sequences were retrieved from the respective JGI genome databases.
Multiple alignments were created with ClustalX 2.0 (4) and manually refined in GeneDoc (5), and phylogenetic
analysis was carried out with MEGA 4 (3) using the neighbour-joining method (a distance-based algorithm). The
stability of clades was evaluated by 1000 bootstrap rearrangements.
[Tree image not reproduced. Scale bar: 0.2. Sequences shown: Ta 49469 Chit33; Tv 178019 Chit33;
Tr 43873 Chit33 Chi18-12; Tr 119859 Ech30 Chi18-13; Ta 79492 Ech30 Chi18-13; Tv 58102 Ech30 Chi18-13;
Tv 46824 BGN16.3; Ta 43222 BGN16.3; Tr 3094 BGN16.3; Ta 91075 BGN3; Tv 51057 BGN3; Tr 64906 BGN3;
Ta 131598 Ech42; Tr 80833 Ech42; Tv 111866 Ech42; Ta 217415 EndoT; Tr 122812 EndoT; Tv 47547 EndoT;
Tr 59791 Chit36; Tv 89999 Chit36; Ta 83999 Chit36; Tr 21725 Nag1; Tv 111394 Nag1; Ta 136120 Nag1;
Ta 41039 Nag2; Tr 23346 Nag2; Tv Nag2.]
Nitrogen metabolism
Figure S23. Phylogenetic analysis of proteins involved in nitrate assimilation. NJ phylogenetic tree based on
the alignment of N. crassa (Nc) proteins involved in nitrate assimilation, and homologous proteins found in T.
atroviride (Ta), T. virens (Tv), T. reesei (Tr) and G. zeae (Gz). The unrooted tree was constructed using the
neighbor-joining method of MEGA 5.01. Numbers on branches indicate the bootstrap values obtained after 1000
replications.
Figure S24. Phylogenetic analysis of proteins involved in purine catabolism. NJ phylogenetic tree showing
clusters of homologs of N. crassa (Nc) proteins from T. atroviride (Ta), T. virens (Tv), T. reesei (Tr) and G. zeae
(Gz) involved in the purine catabolic pathway. The unrooted tree was constructed using the neighbor-joining
method of MEGA 5.01. Numbers on branches indicate the bootstrap values obtained after 1000 replications.
Figure S25. Phylogenetic analysis of proteins involved in the glutamine assimilation pathway. NJ
phylogenetic tree showing clusters of homologs of N. crassa (Nc) proteins in T. atroviride (Ta), T. virens (Tv), T.
reesei (Tr) and G. zeae (Gz) involved in the glutamine assimilation pathway. The unrooted tree was constructed
using the neighbor-joining method of MEGA 5.01. Numbers on branches indicate the bootstrap values obtained
after 1000 replications.
Protein kinases
Figure S26. Phylogenetic analysis of histidine kinases of T. atroviride (TA), T. virens (TV) and T. reesei
(TR) along with sequences from N. crassa (NCU) and Gibberella. Sequences were aligned using Clustal X (4)
and phylogenetic analysis was performed with MEGA4 (3) using the minimum evolution algorithm with 500
bootstrap cycles.
Protein phosphatases
Figure S27. Phylogenetic tree of Trichoderma Ser/Thr protein phosphatases. The phylogenetic tree was
constructed by the minimum-evolution method, using MEGA 4 (3). Numbers on branches indicate the bootstrap
values obtained after 500 replications. TR: T. reesei, TV: T. virens, TA: T. atroviride, NCU: N. crassa, AN:
Aspergillus nidulans, An: Aspergillus niger, AO: Aspergillus oryzae, AFL: Aspergillus flavus, Afu: Aspergillus
fumigatus.
[Tree image not reproduced. Scale bar: 0.2. Family labels: PPP, PPM. Group labels: PP2A, PP5, PP2B, PP1,
FCP/SCP, PP2C, PP2A-related, PP2C-related. Sequences shown: TV 76450; TR 56872; TA 81292; NCU06563;
An18g04600; A.tubingensis 0200203; TR 48910; TA 141173; TV 82688; NCU06630; An11g00420; TV 84259;
TA 301210; TR 74884; NCU08301; TR 55868; TV 77355; TA 148949; TA 296970; TR 52144; TV 197274; AN8820;
AO 090020000552; TA 207023; NCU03804; TR 59944; TV 211614; TR 79535; TV 32949; TA 253020; NCU07489;
AN3793; TV 193217; TA 301856; TV 111355; TR 120722; NCU00043; Aterreus 02596; AN0410; TR 122050;
TV 157255; NCU08380; TA 165307; NCU02943; TR 28199; TV 30939; TA 175017; TR 74030; TA 156154; TV 189826;
NCU03495; TR 21256; TV 84520; TA 50094; NCU01767; AO 090001000488; AFL 09131; TV 40568; TR 124001;
TA 33728; TV 181584; TV 181508; TR 58587; NCU00434; TA 128623; Afu5g13340; AO 090005001595; NCU04600;
TR 81164; TA 302981; TV 72651.]
Calcium signaling
Figure S28. Phylogenetic tree of calcium ATPases of T. reesei, T. atroviride and T. virens together with
identified calcium ATPases of N. crassa. The red box highlights the group of new ATPases. Sequences were
aligned using Clustal X and phylogenetic analysis was performed with MEGA4 (3) using the minimum
evolution algorithm with 500 bootstrap cycles.
[Tree image not reproduced. Scale bar: 0.1. Sequences shown: TV 34827; TR 81536; TA 128193; NC NCU07966;
NC NCU05046 ena1; TV 67662; TV 57750; TR 122972; NC NCU08147 ph7/ena2; TA 219964; TR 81430; TV 59192;
TV 87963; TR 120627; TA 85476; NC NCU03305 nca1; NC NCU03292 pmr1; TA 161034; TR 119592; TV 112028;
TR 62362; TV 69284; TA 322548; TR 58952; TV 33876; NC NCU05154 nca3; NC NCU04736 nca2; TA 133801;
TR 75347; TV 210318; TV 34006; TA 139416; TR 23221; NC NCU04898; NC NCU10143; TR 123183; TA 315257;
TV 78217.]
Transcription Factors
Supplementary note 1
Identification of transcription factors.
In order to identify transcription factors, we searched the predicted proteins
in the three Trichoderma genomes for DNA-binding domains using the Pfam database (6)
with the Web Server Batch Sequence Search
(http://pfam.sanger.ac.uk/search#TabView=tab1), using an E-value cutoff of ≤ 1e-2.
Subsequently, the candidate transcription factors were compared against the Fungal
Transcription Factor Database (FTFD; http://ftfd.snu.ac.kr/tf.php, (7)) to group the putative
transcription factors into families. Finally, all proteins were analyzed manually.
Supplementary note 2
Global comparison of Trichoderma TFs to other fungi
We compared the repertoire of Trichoderma transcription factors with those of other
filamentous fungi using Bidirectional Best Hit (BBH) analysis against fungi with different
lifestyles: a saprobe, N. crassa
(http://www.broad.mit.edu/annotation/genome/neurospora/Home.html); two plant pathogens,
Fusarium graminearum
(http://www.broad.mit.edu/annotation/genome/fusarium_graminearum/MultiHome.html) and
Fusarium oxysporum
(http://www.broadinstitute.org/annotation/genome/fusarium_group/MultiHome.html); and the
model fungus A. nidulans
(http://www.broadinstitute.org/annotation/genome/aspergillus_group/MultiHome.html), using
an E-value ≤ 1e-5. Additionally, we compared the Trichoderma TFs to proteins encoded in the
genomes of two insect-pathogenic fungi, Cordyceps militaris
(http://www.ncbi.nlm.nih.gov/genome/?term=Cordyceps militaris) and Metarhizium
anisopliae (http://www.ncbi.nlm.nih.gov/genome/2190), using BLASTP with an E-value ≤
1e-3. T. atroviride and T. virens, the two mycoparasitic species, have more orthologous or
homologous TFs among those found in fungal pathogens than among those present in N.
crassa (Table 3 and supplementary table S2).
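The Bidirectional Best Hit criterion can be illustrated with a minimal Python sketch: two proteins are paired when each is the best-scoring hit of the other. The function names and the tuple layout (query, subject, E-value, bit score) are assumptions made for illustration, as is the choice of bit score to rank hits; the scripts actually used for this analysis are not described here.

```python
def best_hits(hits, max_evalue):
    """Map each query to its best subject (highest bit score) among hits
    that pass the E-value cutoff."""
    best = {}
    for query, subject, evalue, bitscore in hits:
        if evalue > max_evalue:
            continue
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return {query: subject for query, (subject, _) in best.items()}


def bidirectional_best_hits(forward, reverse, max_evalue=1e-5):
    """Return sorted (a, b) pairs where b is the best hit of a in the
    forward search and a is the best hit of b in the reverse search."""
    fwd = best_hits(forward, max_evalue)
    rev = best_hits(reverse, max_evalue)
    return sorted((a, b) for a, b in fwd.items() if rev.get(b) == a)


if __name__ == "__main__":
    # Hypothetical identifiers and scores, for illustration only.
    forward = [
        ("TA_1", "NC_1", 1e-50, 200.0),
        ("TA_1", "NC_2", 1e-20, 90.0),
        ("TA_2", "NC_2", 1e-30, 120.0),
    ]
    reverse = [
        ("NC_1", "TA_1", 1e-48, 195.0),
        ("NC_2", "TA_1", 1e-22, 95.0),
    ]
    print(bidirectional_best_hits(forward, reverse))
```

In this toy input TA_2 finds NC_2, but NC_2 finds TA_1 in return, so only the TA_1/NC_1 pair is reported.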
The genes specific to one or two Trichoderma species also show an interesting distribution of
homologous genes in other fungi.
Of the 6 TFs common to T. atroviride and T. reesei, 4 have orthologs in N. crassa, F.
graminearum, F. oxysporum or A. nidulans: one has orthologs in N. crassa, F. graminearum
and F. oxysporum; one is shared with F. graminearum, F. oxysporum and A. nidulans; one with
F. graminearum and F. oxysporum; and one with N. crassa and A. nidulans. Two had no
orthologs in any of them (supplementary figure S30 and supplementary table S2). In the case
of the TFs shared by T. virens and T. reesei, 11 have orthologs in N. crassa, F. graminearum,
F. oxysporum or A. nidulans: four have orthologs in N. crassa, F. graminearum, F. oxysporum
and A. nidulans; one in N. crassa, F. graminearum and F. oxysporum; one in F. graminearum,
F. oxysporum and A. nidulans; four only in F. graminearum and F. oxysporum; and one in N.
crassa and A. nidulans. Nine have no orthologs in any of these fungi. Moreover, 8 have a
homolog in M. anisopliae and C. militaris, and 12 have no homolog (supplementary figure S30,
supplementary table S2).
Interestingly, among the 102 TFs shared by T. atroviride and T. virens, more orthologs are
shared only with the phytopathogenic fungi: 12 have orthologs in N. crassa, F.
graminearum, F. oxysporum or A. nidulans; 5 have an ortholog in N. crassa, F. graminearum
or F. oxysporum; and 3 have orthologs only in N. crassa; whereas 15 have orthologs in F.
graminearum, F. oxysporum or A. nidulans, 20 in F. graminearum or F. oxysporum, and 10
TFs are shared only with A. nidulans. In support of these results, of the 102 TFs shared by
T. atroviride and T. virens, 56 have homologs in insect-pathogenic fungi (M. anisopliae or
C. militaris). In addition, 38 of these transcription factors have no orthologs in N. crassa, F.
graminearum, F. oxysporum or A. nidulans. These factors could be key elements in the ability
of these two species to antagonize other fungi or in the plant-symbiont interaction
(supplementary figure S30 and supplementary table S2).
Additionally, the unique TFs of T. atroviride and T. virens also show an increase in the
number of orthologs with pathogenic fungi. The same happens with homologs in M.
anisopliae and C. militaris (supplementary figure S30 and supplementary table S2). Finally,
44 TFs of T. atroviride and 65 of T. virens had no orthologs in any of the other fungal
genomes analyzed, whereas only 7 of T. reesei were found to fall in this group
(supplementary figure S30 and supplementary table S2). The latter set of genes may be
considered species specific, and may be involved in processes that are unique to each one of
the species or might reflect major regulatory changes for genes that are common to all three
species.
Fig. S29. Distribution of transcription factors (TF) found in the three species of
Trichoderma with shared orthologs or homologs in N. crassa (Nc), F. graminearum (Fg), F.
oxysporum (Fo), A. nidulans (An), C. militaris (Cm) and M. anisopliae (Ma). Blue bars
indicate TF with orthologs in NcFgFoAn, red bars indicate TF without orthologs in
NcFgFoAn, green bars indicate TF with homologs in MaCm, purple bars indicate TF without
homologs in MaCm and orange bars indicate TF without orthologs or homologs in the fungi
analyzed. The number above each bar indicates the number of TFs.
Fig. S30. Distribution of orthologs and homologs of transcription factors (TF) shared in
Trichoderma spp., with N. crassa (Nc), F. graminearum (Fg), F. oxysporum (Fo), A. nidulans
(An), C. militaris (Cm) and M. anisopliae (Ma). TrTaTv (TF common to all three Trichoderma
species); TF shared between Ta and Tr (TaTr), Tv and Tr (TvTr) and Ta and Tv (TaTv);
transcription factors unique to T. atroviride (Ta), T. virens (Tv) and T. reesei (Tr). Blue bars
indicate TF with orthologs in NcFgFoAn, red bars indicate TF without orthologs in
NcFgFoAn, green bars indicate TF with homologs in MaCm, purple bars indicate TF without
homologs in MaCm and orange bars indicate TF without orthologs or homologs in the fungi
analyzed. The number above each bar indicates the number of TFs.
Genes related to competition and defense (biocontrol)
Supplementary note 3
Prediction and annotation of extracellular proteins
The set of extracellular proteins was predicted as follows. Predicted proteome datasets for T.
virens, T. reesei and T. atroviride were downloaded from the JGI website
[http://genome.jgi-psf.org/TriviGv29_8_2/TriviGv29_8_2.home.html,
http://genome.jgi-psf.org/Trire2/Trire2.home.html and
http://genome.jgi.doe.gov/Triat2/Triat2.home.html, respectively].
An initial set of potential extracellular proteins was defined using a pipeline automated in Perl
scripts as follows. We first ran CBS SignalP 4.0 (8) locally
[http://www.cbs.dtu.dk/services/SignalP/] on the predicted proteomes of the three fungi. All
sequences with a predicted signal peptide (but without transmembrane helices) were initially
selected. This new dataset was then scanned for endoplasmic reticulum retention signal motifs
(ERrs [PS00014]) by using ScanProsite (9, 10) [http://prosite.expasy.org/scanprosite/] locally,
and for GPI-anchor signals by using FragAnchor: GPI-Anchored Protein Prediction Tandem
System (NN+HMM) (11) [http://navet.ics.hawaii.edu/~fraganchor/NNHMM/NNHMM.html].
All sequences containing an ERrs or a GPI-anchor signal were removed from the initial dataset.
The remaining proteins were annotated by using the Blast2GO (12, 13) pipeline with default
parameters. We will refer to these proteins as the extracellular dataset hereafter.
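The selection logic of this pipeline can be summarized in a short Python sketch. It only combines per-protein yes/no predictions that are assumed to have been parsed beforehand from the SignalP, ScanProsite and FragAnchor outputs; the class and field names are hypothetical, and the sketch is not the Perl pipeline itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Prediction:
    """Per-protein flags, assumed to be parsed from the tool outputs."""
    protein_id: str
    has_signal_peptide: bool       # SignalP
    has_tm_helix: bool             # transmembrane helix predicted
    has_er_retention_motif: bool   # ScanProsite, PS00014
    has_gpi_anchor: bool           # FragAnchor


def is_extracellular(p: Prediction) -> bool:
    """Keep proteins with a signal peptide and no TM helix, ER retention
    motif or GPI anchor."""
    return (
        p.has_signal_peptide
        and not p.has_tm_helix
        and not p.has_er_retention_motif
        and not p.has_gpi_anchor
    )


def extracellular_dataset(predictions):
    """Return the identifiers of proteins that pass all filters."""
    return [p.protein_id for p in predictions if is_extracellular(p)]


if __name__ == "__main__":
    # Hypothetical proteins, for illustration only.
    example = [
        Prediction("prot_1", True, False, False, False),
        Prediction("prot_2", True, False, True, False),
        Prediction("prot_3", False, False, False, False),
    ]
    print(extracellular_dataset(example))
```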
Identification of putative effector proteins in Trichoderma spp.
We first compiled a database of 80 proteins reported in the literature as effectors and
supported by experimental evidence of effector activity in other systems. The database was
assembled by text mining with Textpresso (14) and comprised 68 different kinds of effector
proteins. Homologs of these 68 different effectors were searched by BLASTp in all three
Trichoderma spp. (e-value cutoff of 10E-5). Using Perl scripts, we retained hits if: 1) the
e-value was less than or equal to 10E-5; 2) the alignment length was at least 50 amino acids;
3) more than 50% of the query sequence was represented in the alignment with the subject
sequence. An InterPro (15, 16) analysis was conducted on all 68 different sequences known to
be effectors and on the proteins of all Trichoderma species studied here. Using Perl scripts,
we then identified all Trichoderma proteins with a domain structure similar to that of our
validated dataset (the 68 different proteins). We then searched the extracellular dataset (see
the above section) for known and experimentally validated effector motifs such as RxLR (17,
18) and RxFLAK (19), and for possible variants of them such as [RKH]x[LYMFYW][RKH] (20).
These motifs were searched between amino acid positions 15 and 75 of each sequence, by using
Perl scripts.
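The hit-selection criteria and the positional motif search can be illustrated with a short Python sketch (the original analysis used Perl scripts, which are not reproduced here). Several points are assumptions of this example: the function names are hypothetical, query coverage is computed as the aligned query span divided by the query length, the e-value cutoff is passed in explicitly rather than hard-coded, positions 15 to 75 are treated as 1-based and inclusive, and a motif is counted only if it lies entirely inside that window.

```python
import re

# Motifs written as regular expressions; "x" in the text becomes "." (any residue).
MOTIFS = {
    "RxLR": re.compile(r"R.LR"),
    "RxFLAK": re.compile(r"R.FLAK"),
    "RxLR-like": re.compile(r"[RKH].[LYMFYW][RKH]"),
}


def passes_hit_filter(evalue, aln_len, q_start, q_end, q_len, max_evalue):
    """Apply the three hit criteria: e-value, alignment length, query coverage."""
    # Assumption: coverage = aligned query span / full query length.
    query_cov = (q_end - q_start + 1) / q_len
    return evalue <= max_evalue and aln_len >= 50 and query_cov > 0.5


def motifs_in_window(seq, first=15, last=75):
    """Return (motif name, 1-based start) for motifs inside positions first..last."""
    window = seq[first - 1:last]  # convert 1-based inclusive bounds to a slice
    hits = []
    for name, pattern in MOTIFS.items():
        for m in pattern.finditer(window):
            hits.append((name, m.start() + first))
    return hits
```

Because the RxLR-like pattern is a superset of RxLR, a canonical RxLR site is reported under both names; a caller that needs one label per site would have to deduplicate by position.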
Protein internal repeats were predicted in the extracellular datasets of T. atroviride, T.
virens and T. reesei using XSTREAM (21) (http://jimcooperlab.mcdb.ucsb.edu/xstream/) with
default parameters.
Small Secreted Cysteine-Rich Proteins (SSCRP) were identified by analyzing all sequences
shorter than 300 amino acids with a cysteine content of at least 3%. All sequences with these
features were analyzed for disulfide bridges using Disulfind (22) and R scripts. Sequences
that met the length and cysteine-content cutoffs and had at least two predicted disulfide
bridges were considered SSCRP.
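The SSCRP criteria can be expressed as a simple predicate. The following Python sketch is illustrative only: the function names are hypothetical, and the number of predicted disulfide bridges is assumed to come from a separate Disulfind run and is passed in as an integer.

```python
def is_sscrp_candidate(seq, max_len=300, min_cys_fraction=0.03):
    """True if the sequence is shorter than max_len and has >= 3% cysteines."""
    if not seq or len(seq) >= max_len:
        return False
    return seq.count("C") / len(seq) >= min_cys_fraction


def is_sscrp(seq, predicted_bridges):
    """Combine the length/cysteine cutoffs with the disulfide-bridge requirement.

    `predicted_bridges` is assumed to be the bridge count reported by an
    external predictor; it is not computed here.
    """
    return is_sscrp_candidate(seq) and predicted_bridges >= 2
```

For example, a 100-residue sequence with four cysteines and two predicted bridges would pass, whereas the same sequence with a single predicted bridge would not.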
All sequences identified as extracellular and having one or more of the features found in the
validated set were considered potential effector proteins in these three species. We refer to
these proteins as the potential effector dataset.
Distribution and phylogenetic analysis of effector proteins
Once we had identified the dataset of effector proteins predicted in the Trichoderma
proteomes (i.e., the potential effector dataset), we screened for orthologs in T. atroviride,
T. virens and T. reesei as well as G. zeae and Chaetomium globosum using BranchClust (23). We
aligned all orthogroups locally with T-Coffee (24) and identified the best-fit evolutionary
model using ProtTest (25). We then reconstructed phylogenies for all groups of homologs in
the potential effector dataset using PhyML (26) (100 bootstrap replicates). Finally, we used
MEGA5 (2) to edit the resulting trees.
Proteins with known effector motifs
Figure S31. Evolutionary relationships of NEPs. (A) The evolutionary history was inferred using the
Neighbor-Joining method (13) with 1000 bootstrap cycles (12). The tree is drawn to scale, with branch lengths in
the same units as those of the evolutionary distances used to infer the phylogenetic tree. The evolutionary
distances were computed using the Poisson correction method (14) and are in units of the number of amino acid
substitutions per site. The analysis involved 28 amino acid sequences. Evolutionary analyses were conducted in
MEGA5 (2). (B) Sequence logos of the conserved domain within NEPs in Trichoderma spp. The graphic was
generated using WebLogo (27).
Figure S32. Phylogenetic analysis of LysM domains in fungi.
The evolutionary history was inferred using the Neighbor-Joining method (13) with the bootstrap test (1000
replicates) (12, 23). The evolutionary distances were computed using the JTT matrix-based method (24) and are
in units of the number of amino acid substitutions per site. Evolutionary analyses (Maximum Parsimony
analysis) were conducted in MEGA5.
REFERENCES
1. Gouy M, Guindon S, Gascuel O. 2010. SeaView version 4: A multiplatform
graphical user interface for sequence alignment and phylogenetic tree building. Mol
Biol Evol 27:221-224.
2. Tamura K, Peterson D, Peterson N, Stecher G, Nei M, Kumar S. 2011. MEGA5:
molecular evolutionary genetics analysis using maximum likelihood, evolutionary
distance, and maximum parsimony methods. Mol Biol Evol 28:2731-2739.
3. Tamura K, Dudley J, Nei M, Kumar S. 2007. MEGA4: Molecular Evolutionary
Genetics Analysis (MEGA) software version 4.0. Mol Biol Evol 24:1596-1599.
4. Thompson JD, Gibson TJ, Plewniak F, Jeanmougin F, Higgins DG. 1997. The
CLUSTAL_X windows interface: flexible strategies for multiple sequence alignment
aided by quality analysis tools. Nucleic Acids Res 25:4876-4882.
5. Nicholas K, Nicholas H. 1997. GeneDoc: a tool for editing and annotating multiple
sequence alignments. Distributed by the authors (www.psc.edu/biomed/genedoc).
6. Punta M, Coggill PC, Eberhardt RY, Mistry J, Tate J, Boursnell C, Pang N,
Forslund K, Ceric G, Clements J, Heger A, Holm L, Sonnhammer EL, Eddy SR,
Bateman A, Finn RD. 2012. The Pfam protein families database. Nucleic Acids Res
40:D290-301.
7. Park J, Park J, Jang S, Kim S, Kong S, Choi J, Ahn K, Kim J, Lee S, Kim S,
Park B, Jung K, Kim S, Kang S, Lee YH. 2008. FTFD: an informatics pipeline
supporting phylogenomic analysis of fungal transcription factors. Bioinformatics
24:1024-1025.
8. Petersen TN, Brunak S, von Heijne G, Nielsen H. 2011. SignalP 4.0: discriminating
signal peptides from transmembrane regions. Nat Methods 8:785-786.
9. Sigrist CJ, Cerutti L, de Castro E, Langendijk-Genevaux PS, Bulliard V, Bairoch
A, Hulo N. 2010. PROSITE, a protein domain database for functional characterization
and annotation. Nucleic Acids Res 38:D161-166.
10. Hulo N, Bairoch A, Bulliard V, Cerutti L, De Castro E, Langendijk-Genevaux
PS, Pagni M, Sigrist CJ. 2006. The PROSITE database. Nucleic Acids Res 34:D227-230.
11. Poisson G, Chauve C, Chen X, Bergeron A. 2007. FragAnchor: a large-scale
predictor of glycosylphosphatidylinositol anchors in eukaryote protein sequences by
qualitative scoring. Genomics Proteomics Bioinformatics 5:121-130.
12. Conesa A, Gotz S, Garcia-Gomez JM, Terol J, Talon M, Robles M. 2005.
Blast2GO: a universal tool for annotation, visualization and analysis in functional
genomics research. Bioinformatics 21:3674-3676.
13. Gotz S, Garcia-Gomez JM, Terol J, Williams TD, Nagaraj SH, Nueda MJ,
Robles M, Talon M, Dopazo J, Conesa A. 2008. High-throughput functional
annotation and data mining with the Blast2GO suite. Nucleic Acids Res 36:3420-3435.
14. Muller HM, Kenny EE, Sternberg PW. 2004. Textpresso: An ontology-based
information retrieval and extraction system for biological literature. PLoS Biol
2:1984-1998.
15. Hunter S, Apweiler R, Attwood TK, Bairoch A, Bateman A, Binns D, Bork P,
Das U, Daugherty L, Duquenne L, Finn RD, Gough J, Haft D, Hulo N, Kahn D,
Kelly E, Laugraud A, Letunic I, Lonsdale D, Lopez R, Madera M, Maslen J,
McAnulla C, McDowall J, Mistry J, Mitchell A, Mulder N, Natale D, Orengo C,
Quinn AF, Selengut JD, Sigrist CJA, Thimma M, Thomas PD, Valentin F,
Wilson D, Wu CH, Yeats C. 2009. InterPro: the integrative protein signature
database. Nucleic Acids Res 37:D211-D215.
16. McDowall J, Hunter S. 2011. InterPro protein classification. Methods Mol Biol
694:37-47.
17. Birch PRJ, Boevink PC, Gilroy EM, Hein I, Pritchard L, Whisson SC. 2008.
Oomycete RXLR effectors: delivery, functional redundancy and durable disease
resistance. Curr Opin Plant Biol 11:373-379.
18. Govers F, Bouwmeester K. 2008. Effector Trafficking: RXLR-dEER as Extra Gear
for Delivery into Plant Cells. Plant Cell 20:1728-1730.
19. Liu T, Ye W, Ru Y, Yang X, Gu B, Tao K, Lu S, Dong S, Zheng X, Shan W,
Wang Y, Dou D. 2011. Two Host Cytoplasmic Effectors Are Required for
Pathogenesis of Phytophthora sojae by Suppression of Host Defenses. Plant Physiol
155:490-501.
20. Rouxel T, de Wit P. 2011. Dothideomycete Effectors Facilitating Biotrophic and
Necrotrophic Lifestyles, p 426. In Martin F, Kamoun S (ed), Effectors in Plant-
Microbe Interactions, 1st ed. Wiley-Blackwell, Oxford, UK.
21. Newman AM, Cooper JB. 2007. XSTREAM: a practical algorithm for identification
and architecture modeling of tandem repeats in protein sequences. BMC
Bioinformatics 8:382.
22. Ceroni A, Passerini A, Vullo A, Frasconi P. 2006. DISULFIND: a disulfide bonding
state and cysteine connectivity prediction server. Nucleic Acids Res 34:W177-W181.
23. Poptsova MS, Gogarten JP. 2007. BranchClust: a phylogenetic algorithm for
selecting gene families. BMC Bioinformatics 8.
24. Notredame C, Higgins DG, Heringa J. 2000. T-Coffee: A novel method for fast and
accurate multiple sequence alignment. J Mol Biol 302:205-217.
25. Abascal F, Zardoya R, Posada D. 2005. ProtTest: selection of best-fit models of
protein evolution. Bioinformatics 21:2104-2105.
26. Guindon S, Dufayard JF, Hordijk W, Lefort V, Gascuel O. 2009. PhyML: Fast
and Accurate Phylogeny Reconstruction by Maximum Likelihood. Infect Genet Evol
9:384-385.
27. Crooks GE, Hon G, Chandonia JM, Brenner SE. 2004. WebLogo: a sequence logo
generator. Genome Res 14:1188-1190.