
The genomes of three uneven siblings – footprints of the lifestyle of three Trichoderma species

SUPPLEMENTARY MATERIAL

Genome defense mechanisms: Figures S1-S3

Chromatin rearrangement and histone modification: Figures S4-S18

CAZymes: Figures S19-S22

Nitrogen metabolism: Figures S23-S25

Protein kinases: Figure S26

Protein phosphatases: Figure S27

Calcium signaling: Figure S28

Transcription factors: Supplementary note 1; Figures S29-S30

Genes related to competition and defense

Proteins with known effector domains: Figures S31-S32

References


Genome Defense Mechanisms

Figure S1. Phylogenetic tree of fungal Dicer-like proteins. TA: T. atroviride, TV: T. virens, TR: T. reesei, TAs: T. asperellum, TH: T. harzianum, FG: Fusarium graminearum, MG: Magnaporthe grisea, NC: N. crassa, CP: Cryphonectria parasitica, MC: Mucor circinelloides, CN: Cryptococcus neoformans, SP: Schizosaccharomyces pombe, AN: Aspergillus nidulans, AT: Arabidopsis thaliana. A. thaliana RNAi proteins were used as the outgroup. The number after each species abbreviation distinguishes paralogs within a strain. M: meiotic silencing by unpaired DNA; Q: quelling. Distance trees were constructed by the neighbor-joining (NJ) method with default settings in SeaView (1). Tree robustness was estimated from 1000 bootstrap replicates (shown as percentages in the figures).
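The SeaView footer preserved with Figure S1 reports an NJ tree built from 330 sites under a Poisson model. The minimal Python sketch below illustrates what that model does. It computes the Poisson-corrected protein distance d = -ln(1 - p), where p is the proportion of differing residues. It assumes that gapped columns are skipped pair by pair, which may not match how SeaView selected its 330 sites. The function names are hypothetical.

```python
import math
from typing import Dict, List


def p_distance(seq_a: str, seq_b: str) -> float:
    """Proportion of differing residues over gap-free aligned sites."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = differing = 0
    for a, b in zip(seq_a, seq_b):
        if a == "-" or b == "-":
            continue  # ignore columns with a gap in either sequence
        compared += 1
        if a != b:
            differing += 1
    if compared == 0:
        raise ValueError("no gap-free sites to compare")
    return differing / compared


def poisson_distance(seq_a: str, seq_b: str) -> float:
    """Poisson-corrected distance d = -ln(1 - p) for amino acid sequences."""
    p = p_distance(seq_a, seq_b)
    if p >= 1.0:
        return math.inf  # saturation: the correction is undefined
    return -math.log(1.0 - p)


def distance_matrix(alignment: Dict[str, str]) -> List[List[float]]:
    """Symmetric matrix of Poisson distances, rows in dict insertion order."""
    names = list(alignment)
    return [[poisson_distance(alignment[x], alignment[y]) for y in names]
            for x in names]
```

The correction grows faster than p as sequences diverge. This compensates for multiple substitutions at the same site, which a raw p-distance would miss.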

[Figure S1 image: SeaView NJ tree of Dicer-like proteins (330 sites, Poisson model, 1000 bootstrap replicates), with groups labeled Q and M.]

Figure S2. Phylogenetic tree of fungal Argonaute proteins. TA: T. atroviride, TV: T. virens, TR: T. reesei, TAs: T. asperellum, TH: T. harzianum, FG: Fusarium graminearum, MG: Magnaporthe grisea, NC: N. crassa, CP: Cryphonectria parasitica, MC: Mucor circinelloides, CN: Cryptococcus neoformans, SP: Schizosaccharomyces pombe, AN: Aspergillus nidulans, AT: Arabidopsis thaliana. A. thaliana RNAi proteins were used as the outgroup. The number after each species abbreviation distinguishes paralogs within a strain. M: meiotic silencing by unpaired DNA; Q: quelling. Distance trees were constructed by the neighbor-joining (NJ) method with default settings in SeaView (1). Tree robustness was estimated from 1000 bootstrap replicates (shown as percentages in the figures).
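Bootstrap support values like those in Figures S1-S3 come from resampling alignment columns. The sketch below is a minimal illustration, not SeaView's own code. It draws pseudo-replicate alignments with replacement and converts a clade count into a percentage. Building a tree for each replicate and counting clades are left out, and all names are hypothetical.

```python
import random
from typing import Dict, List


def bootstrap_alignment(alignment: Dict[str, str],
                        rng: random.Random) -> Dict[str, str]:
    """One pseudo-replicate: alignment columns drawn with replacement."""
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all sequences must have the same aligned length")
    n_sites = lengths.pop()
    # draw n_sites column indices with replacement (same length as original)
    columns = [rng.randrange(n_sites) for _ in range(n_sites)]
    # apply the same column sample to every sequence so columns stay intact
    return {name: "".join(seq[i] for i in columns)
            for name, seq in alignment.items()}


def bootstrap_replicates(alignment: Dict[str, str], n_replicates: int = 1000,
                         seed: int = 1) -> List[Dict[str, str]]:
    """Generate n_replicates pseudo-alignments with a reproducible seed."""
    rng = random.Random(seed)
    return [bootstrap_alignment(alignment, rng) for _ in range(n_replicates)]


def support_percent(n_with_clade: int, n_replicates: int) -> float:
    """Bootstrap support: share of replicate trees containing a clade, in %."""
    return 100.0 * n_with_clade / n_replicates
```

For example, a clade recovered in 870 of 1000 replicate trees would be labeled 87 on the branch.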

[Figure S2 image: SeaView NJ tree of Argonaute proteins, with groups labeled Q and M.]

Figure S3. Phylogenetic tree of fungal RNA-dependent RNA polymerases (RdRp). TA: T. atroviride, TV: T. virens, TR: T. reesei, TAs: T. asperellum, TH: T. harzianum, FG: Fusarium graminearum, MG: Magnaporthe grisea, NC: N. crassa, CP: Cryphonectria parasitica, MC: Mucor circinelloides, CN: Cryptococcus neoformans, SP: Schizosaccharomyces pombe, AN: Aspergillus nidulans, AT: Arabidopsis thaliana. A. thaliana RNAi proteins were used as the outgroup. The number after each species abbreviation distinguishes paralogs within a strain. M: meiotic silencing by unpaired DNA; Q: quelling. Distance trees were constructed by the neighbor-joining (NJ) method with default settings in SeaView (1). Tree robustness was estimated from 1000 bootstrap replicates (shown as percentages in the figures).
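The neighbor-joining method named in these captions can be illustrated with a compact implementation of the Saitou-Nei algorithm. This is a teaching sketch, not the code used by SeaView or MEGA. It assumes a complete, symmetric distance matrix and breaks ties by taking the first pair found. The function name and the Newick formatting are our own.

```python
from typing import List


def neighbor_joining(names: List[str], dist: List[List[float]]) -> str:
    """Unrooted neighbor-joining tree (Saitou & Nei), returned as Newick text."""
    if len(names) < 3:
        raise ValueError("need at least three taxa")
    nodes = list(names)
    d = [row[:] for row in dist]  # working copy; shrinks as nodes are joined
    while len(nodes) > 3:
        n = len(nodes)
        r = [sum(row) for row in d]  # total distance from each node to all others
        best = None
        for i in range(n):
            for j in range(i + 1, n):
                q = (n - 2) * d[i][j] - r[i] - r[j]  # Q-criterion
                if best is None or q < best[0]:
                    best = (q, i, j)
        _, i, j = best
        # branch lengths from the new internal node to the joined pair
        li = 0.5 * d[i][j] + (r[i] - r[j]) / (2 * (n - 2))
        lj = d[i][j] - li
        joined = f"({nodes[i]}:{li:.4f},{nodes[j]}:{lj:.4f})"
        keep = [k for k in range(n) if k not in (i, j)]
        # distance from the new node to each remaining node
        new_row = [0.5 * (d[i][k] + d[j][k] - d[i][j]) for k in keep]
        d = [[d[a][b] for b in keep] for a in keep]
        for row, value in zip(d, new_row):
            row.append(value)
        d.append(new_row + [0.0])
        nodes = [nodes[k] for k in keep] + [joined]
    # resolve the final three nodes as a star (unrooted trifurcation)
    l0 = 0.5 * (d[0][1] + d[0][2] - d[1][2])
    l1 = 0.5 * (d[0][1] + d[1][2] - d[0][2])
    l2 = 0.5 * (d[0][2] + d[1][2] - d[0][1])
    return f"({nodes[0]}:{l0:.4f},{nodes[1]}:{l1:.4f},{nodes[2]}:{l2:.4f});"
```

Take the five-taxon matrix a-e with rows [0,5,9,9,8], [5,0,10,10,9], [9,10,0,8,7], [9,10,8,0,3], [8,9,7,3,0]. The sketch first joins a and b and returns (d:2.0000,e:1.0000,(c:4.0000,(a:2.0000,b:3.0000):3.0000):2.0000);.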

[Figure S3 image: SeaView NJ tree of RNA-dependent RNA polymerases, with groups labeled Q and M.]

Chromatin rearrangement and histone modification

Figure S4. Phylogenetic tree of Trichoderma spp. histone variants. The unrooted phylogenetic tree was constructed by the neighbor-joining method using MEGA 5.01 (2). Numbers on branches indicate bootstrap values obtained from 1000 replications. The different histone variants are marked in different colors. Ch: Colletotrichum higginsianum, Fo: Fusarium oxysporum, Nh: Nectria haematococca, Nc: Neurospora crassa, Gz: Gibberella zeae, Sc: Saccharomyces cerevisiae.


Figure S5. Molecular architecture of the centromere and kinetochore. The scheme shows the conserved proteins involved in the structure of the centromere and kinetochore, as well as in nucleosome modifications. For our purposes, we assume a similar architecture in Trichoderma spp. The centromere is composed of centric chromatin and centromeric heterochromatin, with CENP-B delimiting the two regions. Centromeric heterochromatin is characterized by H3K9 trimethylation, which is recognized by HP1, together with the presence of methylated cytosines. In centric chromatin, CENP-A nucleosomes are interspersed with nucleosomes containing canonical histone H3, which carry H3K9 and H4K20 methylation. The kinetochore is partitioned into the inner kinetochore, which includes chromatin-proximal components, and the outer kinetochore, which is composed of broadly conserved complexes that mediate microtubule (MT) attachment. In the inner kinetochore, the constitutive centromere-associated network (CCAN) includes CENP-C, which binds DNA and CENP-A at one end and other CCAN proteins at the other. CCAN proteins associate closely with members of the Mis12 complex. In the outer kinetochore, the major microtubule-binding activity is mediated by the Knl1 and Ndc80 complexes assembled on the Mis12 complex; this structure attaches cooperatively to microtubules. The Ndc80 complex is a rod-like molecule with globular domains at either end of the rod that are involved in MT attachment. The Dam1 complex forms rings around microtubules and localizes behind the Ndc80 head domains, where a Dam1 ring encircles both the MT lattice and the coiled-coil domain of the Ndc80 complex. Together, the Dam1 and Ndc80 complexes are the major microtubule-binding kinetochore proteins that organize end-on kinetochore-microtubule attachment.


Figure S6. Open or closed chromatin conformations and chromatin remodelers. Chromatin is the complex assemblage of DNA, histone proteins and other non-histone protein components. Because many regulatory processes act at this level, changes in chromatin structure alter its functional properties. Transcriptionally silent chromatin involves the action of histone methyltransferases, which catalyze H3K9 di- and trimethylation (H3K9me2/3); this mark is in turn recognized by the chromodomain of HP1. Other features associated with silent chromatin are histone ubiquitination (H2AK119Ub), a hypoacetylated state of histones, DNA methylation and incorporation of the linker histone H1.

Closed chromatin is converted to an open, transcriptionally active state by exchange or incorporation of chromatin components and by removal and addition of covalent modifications of histones and DNA. In contrast to heterochromatin, permissive chromatin is marked by histone hyperacetylation (H3K9), which is catalyzed by coactivator complexes such as SAGA. Other non-histone protein components, including HMGA and HMGB, participate in establishing transcription factors on gene promoters. During gene expression, nucleosome positioning changes: promoter regions of active genes appear to be largely devoid of nucleosomes, or their promoter-associated nucleosomes have been repositioned, for instance by the ATP-dependent chromatin-remodeling enzymes of the SWI/SNF or RSC complexes. Chromatin-remodeling complexes perturb or reposition nucleosomes, allowing access to DNA-binding sites.


Figure S7a. Tree view of Snf2 family – Rad5/16-like proteins. Unrooted neighbour-joining tree (with bootstrap) from a multiple alignment of predicted Snf2 protein sequences from T. virens, T. atroviride, T. reesei, N. crassa and G. zeae. Phylogenetic and molecular evolutionary analyses were conducted using MEGA version 4 (3).


Figure S7b. Tree view of Snf2 family – other proteins. Unrooted neighbour-joining tree (using bootstrap)

from a multiple alignment of Snf2 predicted protein sequences from T. virens, T. atroviride, T. reesei, N. crassa

and G. zeae. Phylogenetic and molecular evolutionary analyses were conducted using MEGA version 4 (3).


Figure S8. Structure and phylogeny of SNF5 family proteins. (A) Domain structure: the SNF5 domain is shown in orange, and the Sfh1 protein additionally has a GATA zinc finger domain (green). (B) Unrooted neighbour-joining tree (with bootstrap) from a multiple alignment of predicted Snf5 protein sequences from T. virens, T. atroviride, T. reesei, N. crassa and G. zeae. Phylogenetic and molecular evolutionary analyses were conducted using MEGA version 4 (3).


Figure S9. Phylogenetic analysis of HMG family proteins. Unrooted neighbour-joining tree (using bootstrap)

from a multiple alignment of HMG protein sequences from T. virens, T. atroviride, T. reesei, N. crassa and G.

zeae. Phylogenetic and molecular evolutionary analyses were conducted using MEGA version 4 (3).


Figure S10. Phylogenetic tree of histone acetyltransferases in Trichoderma spp. The unrooted tree was

constructed by the neighbor-joining method using MEGA 5.01 (2). N. crassa and G. zeae were used as outgroup.

Numbers on branches indicate the bootstrap values obtained after 500 replications. The tree shows the different

families of histone acetyltransferases.


Figure S11. Phylogenetic analysis and structure of bromodomain proteins found in Trichoderma spp. (A)

Phylogenetic tree of bromodomain proteins present in the Trichoderma spp. genomes. The unrooted tree was

constructed by the neighbor-joining method, using MEGA 5.01 (2). Numbers on branches indicate the bootstrap

values obtained after 500 replications. (B) Schematic structures of bromodomain proteins in Trichoderma spp.

The bromodomain is represented by orange boxes, and protein length is indicated to the right of each diagram. Because of the high degree of homology among the Trichoderma spp. bromodomains, only the T. atroviride bromodomain is shown.


Figure S12. Proteins containing SANT domains in Trichoderma spp. (A) Phylogenetic tree of SANT-domain proteins. The unrooted tree was constructed by the neighbor-joining method using MEGA 5.01 (2). Numbers on branches indicate bootstrap values obtained from 500 replications. The tree shows the relationship of Rsc8 and Ada2 to the other SANT-domain proteins. (B) Sequence alignment of the SANT domains of Trichoderma spp. proteins with Ada2 and Rsc8 from S. cerevisiae (Sc). Asterisks indicate the conserved residues predicted to form the core of the SANT domain. Shaded residues indicate identical and conserved amino acids.

[Figure S12 image: (A) NJ tree of SANT-domain proteins from TA, TV, TR, GZ and NC, with Ada2 and Rsc8 groups labeled; (B) alignment of the SANT_1 and SANT_2 regions, with asterisks marking core residues.]

Figure S13. Schematic representation of histone deacetylases (HDACs) in fungi. The structures of fungal histone deacetylases are shown schematically. RPD3, PHD1 and HOSB consist of a histone deacetylase domain (blue boxes) and a long C-terminal region, whereas HDA1 has a histone deacetylase domain (blue boxes) and an Arb2 domain (orange box). The structure of sirtuin 2 (SIR2) shows the typical sirtuin catalytic core domain. The catalytic domains are flanked by distinct N- and C-terminal extensions (gray boxes).


Figure S14. Phylogenetic analysis of histone deacetylases (HDACs) in filamentous fungi using the Poisson

model. The HDAC sequences used to construct the tree were from T. atroviride (TA), T. virens (TV), T. reesei

(TR), N. crassa (NC) and G. zeae (GZ). The numbers at nodes correspond to the percentages of 500 bootstrap

replications. The phylogenetic analysis was performed using MEGA 5 (2).


Figure S15. Structure and phylogeny of JmjC-domain-containing proteins. (A) The JmjC domain (blue) has been predicted to be the enzymatic unit of multiple demethylases. JARID-like proteins have a C5HC2 domain (gray) that has been shown to be important for demethylase activity. (B) Unrooted neighbour-joining tree (with bootstrap) from a multiple sequence alignment of predicted JmjC protein sequences from T. virens, T. atroviride, T. reesei, N. crassa and G. zeae. Phylogenetic and molecular evolutionary analyses were conducted using MEGA version 4 (3).


Figure S16. The ubiquitin system as involved in histone modification. (A) Domain structures of ubiquitin-system proteins. E2 ubiquitin-conjugating enzymes contain the UBC domain (green); they donate ubiquitin to the ε-amino group of specific lysine residues, often in a substrate-specific manner. BRE1 is a RING finger-containing E3 ubiquitin ligase, and its RING domain (orange) is essential for its ubiquitin ligase activity. (B) Phylogenetic analysis of ubiquitin-system proteins. Unrooted neighbour-joining tree (with bootstrap) from a multiple alignment of predicted ubiquitin-system protein sequences from T. virens, T. atroviride, T. reesei, N. crassa and G. zeae. Phylogenetic and molecular evolutionary analyses were conducted using MEGA version 4 (3).


Figure S17. SUMO functions. SUMO is translated as a precursor protein that must be processed by the Ulp1 protease. Mature SUMO is then activated in an ATP-dependent manner by the heterodimeric SUMO-activating enzyme (E1). Activated SUMO is subsequently transferred to the conjugating enzyme (E2), which, together with the SUMO ligase (E3), conjugates SUMO to lysine residues of a wide variety of substrate proteins, including histones.


Figure S18. Phylogenetic tree of proteins involved in sumoylation in Trichoderma spp. The unrooted tree was constructed using the neighbor-joining method of MEGA 5.01. Numbers on branches indicate bootstrap values obtained from 1000 replications. The tree includes proteins involved in sumoylation, such as the E1-activating enzyme (Aos1/Uba2), the E2-conjugating enzyme (Ubc9) and E3 ligases (SizA and Mms2), as well as the SUMO protein Smt3.


CAZymes

Figure S19. Comparison of amino acid similarities of cellulases between T. atroviride, T. reesei and T.

virens. Average % aa-positives (grey bars) and % aa-identities (black bars) are shown. Protein sequences were

compared by pairwise alignments (BLASTP) and average values [%] of identical amino acids (black bars) or

amino acids with similar biochemical properties (positives; grey bars) were calculated from all three possible

pairwise alignments (TA-TV, TA-TR, TV-TR). The obtained data are indicative of the overall conservation of

the respective proteins among the three tested Trichoderma species. For example, Cel3c is a particularly well

conserved protein, whereas Cel61c protein sequences exhibit considerable sequence differences.
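The averaging described for Figure S19, which is repeated for Figure S21, can be sketched as follows. The sketch assumes that the % identities and % positives of each BLASTP alignment have already been extracted. The function name, the species codes and the input layout are hypothetical.

```python
from itertools import combinations
from statistics import mean
from typing import Dict, FrozenSet, Sequence, Tuple

# Hypothetical species codes, matching the abbreviations in the caption
SPECIES = ("TA", "TV", "TR")


def average_conservation(
    pairwise: Dict[FrozenSet[str], Tuple[float, float]],
    species: Sequence[str] = SPECIES,
) -> Tuple[float, float]:
    """Mean % identities and mean % positives over all pairwise comparisons.

    `pairwise` maps an unordered species pair to (% identities, % positives)
    taken from one BLASTP alignment of the orthologous proteins.
    """
    pairs = [frozenset(p) for p in combinations(species, 2)]  # TA-TV, TA-TR, TV-TR
    missing = [sorted(p) for p in pairs if p not in pairwise]
    if missing:
        raise KeyError(f"missing pairwise alignments: {missing}")
    identities = [pairwise[p][0] for p in pairs]
    positives = [pairwise[p][1] for p in pairs]
    return mean(identities), mean(positives)
```

Using unordered pairs as keys means a TA-TV alignment and a TV-TA alignment count as the same comparison. This matches the three comparisons listed in the caption.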

[Figure S19 image: bar chart of average % aa-identities and % aa-positives (axis 60-100%) for CEL3a, CEL3b, CEL3c, CEL3d, CEL3e, CEL5a, CEL5b, CEL6a, CEL7a, CEL7b, CEL12a, CEL12b, CEL12c, CEL45a, CEL45b, CEL61a, CEL61b, CEL61c and CEL74a.]

Figure S20. Phylogenetic relationships of cellulases between T. atroviride, T. reesei and T. virens. Sequences were retrieved from the respective JGI genome databases. Multiple alignments were created with ClustalX 2.0 (4) and manually refined in GeneDoc (5). Phylogenetic analysis was carried out with MEGA 4 (3) using neighbour-joining, a distance-based method, and the stability of clades was evaluated with 1000 bootstrap replicates.


Figure S21. Comparison of amino acid similarities of the main chitinases and the glucanases characterized so far between T. atroviride, T. reesei and T. virens. Average % aa-positives (grey bars) and % aa-identities (black bars) are shown. Protein sequences were compared by pairwise alignments (BLASTP), and average values [%] of identical amino acids (black bars) or amino acids with similar biochemical properties (positives; grey bars) were calculated from all three possible pairwise alignments (TA-TV, TA-TR, TV-TR). The obtained data are indicative of the overall conservation of the respective proteins among the three tested Trichoderma species. For example, NAG1 is a particularly well conserved protein, whereas ECH30 protein sequences exhibit more sequence differences.

[Figure S21 image: bar chart of average % aa-identities and % aa-positives (axis 60-100%) for ECH42, NAG1, CHIT33, CHIT36, ECH30, Endo T, BGN3 (GH15) and BGN16.3 (GH30).]

Figure S22. Phylogenetic relationships of chitinases and glucanases between T. atroviride, T. reesei and T. virens. Sequences were retrieved from the respective JGI genome databases. Multiple alignments were created with ClustalX 2.0 (4) and manually refined in GeneDoc (5). Phylogenetic analysis was carried out with MEGA 4 (3) using neighbour-joining, a distance-based method, and the stability of clades was evaluated with 1000 bootstrap replicates.

[Figure S22 image: NJ tree (scale bar 0.2) of T. atroviride (Ta), T. reesei (Tr) and T. virens (Tv) orthologs of Chit33, Ech30, BGN16.3, BGN3, Ech42, EndoT, Chit36, Nag1 and Nag2.]

Nitrogen metabolism

Figure S23. Phylogenetic analysis of proteins involved in nitrate assimilation. NJ phylogenetic tree based on

the alignment of N. crassa (Nc) proteins involved in nitrate assimilation, and homologous proteins found in T.

atroviride (Ta), T. virens (Tv), T. reesei (Tr) and G. zeae (Gz). The unrooted tree was constructed using the

neighbor-joining method of MEGA 5.01. Numbers on branches indicate the bootstrap values obtained after 1000

replications.


Figure S24. Phylogenetic analysis of proteins involved in purine catabolism. NJ phylogenetic tree showing clusters of homologs of N. crassa (Nc) purine catabolic pathway proteins in T. atroviride (Ta), T. virens (Tv), T. reesei (Tr) and G. zeae (Gz). The unrooted tree was constructed using the neighbor-joining method of MEGA 5.01. Numbers on branches indicate the bootstrap values obtained after 1000 replications.


Figure S25. Phylogenetic analysis of proteins involved in the glutamine assimilation pathway. NJ phylogenetic tree showing clusters of homologs of N. crassa (Nc) glutamine assimilation proteins in T. atroviride (Ta), T. virens (Tv), T. reesei (Tr) and G. zeae (Gz). The unrooted tree was constructed using the neighbor-joining method of MEGA 5.01. Numbers on branches indicate the bootstrap values obtained after 1000 replications.


Protein kinases

Figure S26. Phylogenetic analysis of histidine kinases of T. atroviride (TA), T. virens (TV) and T. reesei

(TR) along with sequences from N. crassa (NCU) and Gibberella. Sequences were aligned using Clustal X (4)

and phylogenetic analysis was performed with MEGA4 (3) using the minimum evolution algorithm with 500

bootstrap cycles.


Protein phosphatases

Figure S27. Phylogenetic tree of Trichoderma Ser/Thr protein phosphatases. The phylogenetic tree was

constructed by the minimum-evolution method, using MEGA 4 (3). Numbers on branches indicate the bootstrap

values obtained after 500 replications. TR: T. reesei, TV: T. virens, TA: T. atroviride, NCU: N. crassa, AN:

Aspergillus nidulans, An: Aspergillus niger, AO: Aspergillus oryzae, AFL: Aspergillus flavus, Afu: Aspergillus

fumigatus.

[Figure S27 image: minimum-evolution tree (scale bar 0.2) of Ser/Thr protein phosphatases, with labeled groups PPP, PPM, PP1, PP2A, PP2A-related, PP2B, PP5, PP2C, PP2C-related and FCP/SCP.]

Calcium signaling

Figure S28. Phylogenetic tree of calcium ATPases of T. reesei, T. atroviride and T. virens together with the identified calcium ATPases of N. crassa. The red box highlights the group of new ATPases. Sequences were aligned using Clustal X and phylogenetic analysis was performed with MEGA4 (3) using the minimum evolution algorithm with 500 bootstrap cycles.

[Figure S28 image: minimum-evolution tree (scale bar 0.1) of calcium ATPases from TA, TR, TV and N. crassa; labeled N. crassa references include ena1, ph7/ena2, nca1, nca2, nca3 and pmr1.]

Transcription Factors

Supplementary note 1

Identification of transcription factors.

To identify transcription factors, we searched the predicted proteins of the three Trichoderma genomes for DNA-binding domains in the Pfam database (6), using the web server's batch sequence search (http://pfam.sanger.ac.uk/search#TabView=tab1) with an E-value cutoff of ≤ 1e-2. The candidate transcription factors were then compared against the Fungal Transcription Factor Database (FTFD; http://ftfd.snu.ac.kr/tf.php, (7)) to group the putative transcription factors into families. Finally, all proteins were analyzed manually.
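The first filtering step can be sketched as below. It assumes the Pfam batch-search results have already been parsed into (protein, family, E-value) records. The set of DNA-binding families is a small illustrative subset chosen for this example, not the list used in this study. The FTFD-based family assignment and the manual curation are not reproduced.

```python
from collections import defaultdict
from typing import Dict, Iterable, NamedTuple, Set


class DomainHit(NamedTuple):
    protein_id: str
    pfam_name: str   # Pfam family short name, e.g. "Zn_clus"
    evalue: float


# Illustrative subset of DNA-binding Pfam families; NOT the list used in the study
DNA_BINDING_FAMILIES = {"Zn_clus", "zf-C2H2", "GATA", "bZIP_1", "HLH",
                        "Myb_DNA-binding", "HMG_box"}


def candidate_tfs(hits: Iterable[DomainHit], max_evalue: float = 1e-2,
                  families: Set[str] = DNA_BINDING_FAMILIES) -> Dict[str, Set[str]]:
    """Map each protein to the DNA-binding domains that pass the E-value cutoff."""
    found: Dict[str, Set[str]] = defaultdict(set)
    for hit in hits:
        if hit.evalue <= max_evalue and hit.pfam_name in families:
            found[hit.protein_id].add(hit.pfam_name)
    return dict(found)
```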


Global comparison of Trichoderma TFs to other fungi

We compared the repertoire of Trichoderma transcription factors with those of other filamentous fungi with different lifestyles, using bidirectional best hit (BBH) analysis with an E-value cutoff of ≤ 1e-5. The fungi used as probes were N. crassa (http://www.broad.mit.edu/annotation/genome/neurospora/Home.html); two plant pathogens, Fusarium graminearum (http://www.broad.mit.edu/annotation/genome/fusarium_graminearum/MultiHome.html) and Fusarium oxysporum (http://www.broadinstitute.org/annotation/genome/fusarium_group/MultiHome.html); and the model fungus A. nidulans (http://www.broadinstitute.org/annotation/genome/aspergillus_group/MultiHome.html). Additionally, we compared the Trichoderma TFs with proteins encoded in the genomes of two insect-pathogenic fungi, Cordyceps militaris (http://www.ncbi.nlm.nih.gov/genome/?term=Cordyceps%20militaris) and Metarhizium anisopliae (http://www.ncbi.nlm.nih.gov/genome/2190), using BLASTP with an E-value cutoff of ≤ 1e-3. T. atroviride and T. virens, the two mycoparasitic species, share more orthologous or homologous TFs with the fungal pathogens than with N. crassa (Table 3 and supplementary table S2).
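A bidirectional best hit is a pair of proteins that are each other's top-scoring match when each proteome is searched against the other. The sketch below is minimal and rests on two assumptions. First, both searches were run with BLAST+ and saved in the default 12-column tabular format (-outfmt 6). Second, the best hit is chosen by bit score after applying the E-value cutoff. The function names are hypothetical.

```python
from typing import Dict, Iterable, Set, Tuple


def best_hits(blast_tab_lines: Iterable[str],
              max_evalue: float = 1e-5) -> Dict[str, str]:
    """Best subject per query from BLAST tabular (-outfmt 6) lines, by bit score.

    Columns: qseqid sseqid pident length mismatch gapopen qstart qend
    sstart send evalue bitscore.
    """
    best: Dict[str, Tuple[float, str]] = {}
    for line in blast_tab_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank and comment lines
        fields = line.rstrip("\n").split("\t")
        query, subject = fields[0], fields[1]
        evalue, bitscore = float(fields[10]), float(fields[11])
        if evalue > max_evalue:
            continue
        # keep the highest-scoring subject; ties keep the first one seen
        if query not in best or bitscore > best[query][0]:
            best[query] = (bitscore, subject)
    return {q: s for q, (_, s) in best.items()}


def bidirectional_best_hits(a_vs_b: Iterable[str], b_vs_a: Iterable[str],
                            max_evalue: float = 1e-5) -> Set[Tuple[str, str]]:
    """Pairs (a, b) where b is a's best hit and a is b's best hit."""
    ab = best_hits(a_vs_b, max_evalue)
    ba = best_hits(b_vs_a, max_evalue)
    return {(a, b) for a, b in ab.items() if ba.get(b) == a}
```

Requiring reciprocity discards one-directional matches. For example, a Trichoderma TF whose best hit in N. crassa has itself a better match elsewhere in the Trichoderma proteome is not counted as an ortholog.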

The TFs specific to one or two Trichoderma species also show an interesting distribution of homologs in other fungi.

Of the 6 TFs common to T. atroviride and T. reesei, 4 have orthologs in N. crassa, F. graminearum, F. oxysporum or A. nidulans. One has orthologs in N. crassa, F. graminearum and F. oxysporum; one is shared with F. graminearum, F. oxysporum and A. nidulans; one with F. graminearum and F. oxysporum; and one with N. crassa and A. nidulans. The remaining two have no orthologs in any of these fungi (supplementary figure S30 and supplementary table S2). Of the TFs shared by T. virens and T. reesei, 11 have orthologs in N. crassa, F. graminearum, F. oxysporum or A. nidulans. Four have orthologs in all of N. crassa, F. graminearum, F. oxysporum and A. nidulans; one in N. crassa, F. graminearum and F. oxysporum; one in F. graminearum, F. oxysporum and A. nidulans; four only in F. graminearum and F. oxysporum; and one in N. crassa and A. nidulans. Nine have no orthologs in any of these fungi. Moreover, 8 have a homolog in M. anisopliae and C. militaris, and 12 have no homolog (supplementary figure S30 and supplementary table S2).

Interestingly, the 102 TFs shared by T. atroviride and T. virens more often have orthologs only in the phytopathogenic fungi. Of these, 12 have orthologs in N. crassa, F. graminearum, F. oxysporum or A. nidulans, 5 in N. crassa, F. graminearum or F. oxysporum, and only 3 exclusively in N. crassa. In contrast, 15 have orthologs in F. graminearum, F. oxysporum or A. nidulans, 20 in F. graminearum or F. oxysporum, and 10 only in A. nidulans. Consistent with these results, 56 of the 102 TFs shared by T. atroviride and T. virens have homologs in insect-pathogenic fungi (M. anisopliae or C. militaris). In addition, 38 of these transcription factors have no orthologs in N. crassa, F. graminearum, F. oxysporum or A. nidulans. These factors could be key elements in the ability of these two species to antagonize other fungi or in the plant-symbiont interaction (supplementary figure S30 and supplementary table S2).

Additionally, the TFs unique to T. atroviride or T. virens also have more orthologs in the pathogenic fungi, and the same holds for homologs in M. anisopliae and C. militaris (supplementary figure S30 and supplementary table S2). Finally, 44 TFs of T. atroviride and 65 of T. virens have no orthologs in any of the other fungal genomes analyzed, whereas only 7 of T. reesei fall into this group (supplementary figure S30 and supplementary table S2). This last set of genes may be considered species specific. These genes may be involved in processes unique to each species, or they may reflect major regulatory changes for genes common to all three species.


Fig. S29. Distribution of transcription factors (TFs) found in the three Trichoderma species with shared orthologs or homologs in N. crassa (Nc), F. graminearum (Fg), F. oxysporum (Fo), A. nidulans (An), C. militaris (Cm) and M. anisopliae (Ma). Blue bars indicate TFs with orthologs in NcFgFoAn, red bars TFs without orthologs in NcFgFoAn, green bars TFs with homologs in MaCm, purple bars TFs without homologs in MaCm, and orange bars TFs without orthologs or homologs in any of the fungi analyzed. The number above each bar gives the number of TFs.


Fig. S30. Distribution of orthologs and homologs of transcription factors (TFs) shared among Trichoderma spp. with N. crassa (Nc), F. graminearum (Fg), F. oxysporum (Fo), A. nidulans (An), C. militaris (Cm) and M. anisopliae (Ma). TrTaTv: TFs common to all three Trichoderma species; TaTr, TvTr and TaTv: TFs shared between Ta and Tr, Tv and Tr, and Ta and Tv, respectively; Ta, Tv and Tr: TFs unique to T. atroviride, T. virens and T. reesei, respectively. Blue bars indicate TFs with orthologs in NcFgFoAn, red bars TFs without orthologs in NcFgFoAn, green bars TFs with homologs in MaCm, purple bars TFs without homologs in MaCm, and orange bars TFs without orthologs or homologs in any of the fungi analyzed. The number above each bar gives the number of TFs.


Genes related to competition and defense (biocontrol)


Prediction and annotation of extracellular proteins

The set of extracellular proteins was predicted as follows. Predicted proteome datasets for T. virens, T. reesei and T. atroviride were downloaded from the JGI website [http://genome.jgi-psf.org/TriviGv29_8_2/TriviGv29_8_2.home.html, http://genome.jgi-psf.org/Trire2/Trire2.home.html and http://genome.jgi.doe.gov/Triat2/Triat2.home.html, respectively].

An initial set of potential extracellular proteins was defined with a pipeline automated in Perl scripts, as follows. We first ran CBS SignalP 4.0 (8) locally [http://www.cbs.dtu.dk/services/SignalP/] on the predicted proteomes of the three fungi and selected all sequences with a predicted signal peptide but without transmembrane helices. This subset was then scanned locally for endoplasmic reticulum retention signal motifs (ERrs [PS00014]) with ScanProsite (9, 10) [http://prosite.expasy.org/scanprosite/], and for GPI-anchor signals with FragAnchor: GPI-Anchored Protein Prediction Tandem System (NN+HMM) (11) [http://navet.ics.hawaii.edu/~fraganchor/NNHMM/NNHMM.html]. All sequences containing an ERrs motif or a GPI-anchor signal were removed from the initial dataset. The remaining proteins were annotated with the Blast2GO pipeline (12, 13) using default parameters. Hereafter, we refer to these proteins as the extracellular dataset.
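The selection steps above can be sketched in Python as follows. This is a minimal illustration, not the authors' Perl pipeline: the `Prediction` container and its field names are hypothetical stand-ins for already-parsed SignalP, transmembrane and FragAnchor results, and the regular expression is our transcription of the PROSITE PS00014 pattern as we understand it (`[KRHQSA]-[DENQ]-E-L>`, anchored at the C-terminus), which should be checked against the current PROSITE entry.

```python
import re
from dataclasses import dataclass

# Our transcription of PROSITE PS00014 (ER targeting sequence), assumed to be
# [KRHQSA]-[DENQ]-E-L> ; "$" anchors the match at the C-terminus.
ER_RETENTION = re.compile(r"[KRHQSA][DENQ]EL$")


@dataclass
class Prediction:
    """Hypothetical container for per-protein predictor outputs."""
    sequence: str
    has_signal_peptide: bool  # from SignalP 4.0
    n_tm_helices: int         # predicted transmembrane helices
    has_gpi_anchor: bool      # from FragAnchor


def is_extracellular(p: Prediction) -> bool:
    """Apply the filters in the order described in the text."""
    # Keep only proteins with a signal peptide and no transmembrane helices.
    if not p.has_signal_peptide or p.n_tm_helices > 0:
        return False
    # Drop proteins carrying a C-terminal ER retention signal.
    if ER_RETENTION.search(p.sequence.rstrip("*")):
        return False
    # Drop GPI-anchored proteins.
    if p.has_gpi_anchor:
        return False
    return True


def extracellular_dataset(preds: dict[str, Prediction]) -> list[str]:
    """Return the sorted IDs of proteins that pass every filter."""
    return sorted(pid for pid, p in preds.items() if is_extracellular(p))
```

Ordering the cheap boolean checks first is only a convenience; the result is the same set regardless of filter order.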

Identification of putative effector proteins in Trichoderma spp.

We first compiled a database of 80 proteins reported in the literature as effectors and with experimental evidence of effector activity in other systems. The database was compiled by text mining with Textpresso (14). This dataset comprised 68 different kinds of effector proteins. Homologs of these 68 effectors were searched by BLASTp in all three Trichoderma spp. (e-value cutoff of 10E-5). Using Perl scripts, we retained hits that met all of the following criteria: 1) an e-value equal to or less than 10E-5; 2) an alignment length equal to or greater than 50 amino acids; and 3) more than 50% of the query sequence represented in the alignment with the subject sequence. An InterPro (15, 16) analysis was conducted on the 68 known effector sequences and on the proteomes of all three Trichoderma species studied here. Using Perl scripts, we then identified all Trichoderma proteins with a domain structure similar to that of the validated dataset (the 68 different proteins). We then searched the extracellular dataset (see the previous section) for known, experimentally validated effector motifs such as RxLR (17, 18) and RxFLAK (19), as well as possible variants such as [RKH]x[LYMFYW][RKH] (20). These motifs were searched between amino acid positions 15 and 75 of each sequence, using Perl scripts.
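The BLASTp hit filter and the positional motif scan can be sketched as below. The sketch assumes BLAST+ default tabular output (`-outfmt 6`, twelve columns) and a separately known query length; the function and constant names are ours, and reading the text's "10E-5" as 10^-5 is an assumption. Motifs are translated to Python regular expressions with `x` mapped to `.`, and a motif is counted only if it lies entirely within residues 15 to 75.

```python
import re

# Thresholds from the text; "10E-5" is read here as 1e-5 (assumption).
EVALUE_MAX = 1e-5
MIN_ALN_LEN = 50
MIN_QUERY_COVERAGE = 0.5


def keep_hit(row: list[str], query_len: int) -> bool:
    """Filter one BLASTp hit given as a split line of -outfmt 6 output.

    Assumed column order: qseqid sseqid pident length mismatch gapopen
    qstart qend sstart send evalue bitscore.
    """
    aln_len = int(row[3])
    qstart, qend = int(row[6]), int(row[7])
    evalue = float(row[10])
    coverage = (qend - qstart + 1) / query_len  # fraction of query aligned
    return (evalue <= EVALUE_MAX and aln_len >= MIN_ALN_LEN
            and coverage > MIN_QUERY_COVERAGE)


# Motifs named in the text, as regular expressions ("x" = any residue).
MOTIFS = {
    "RxLR": "R.LR",
    "RxFLAK": "R.FLAK",
    "[RKH]x[LYMFYW][RKH]": "[RKH].[LYMFYW][RKH]",
}


def scan_motifs(seq: str, start: int = 15, end: int = 75) -> list[tuple[str, int]]:
    """Return (motif name, 1-based position) for every motif occurrence lying
    wholly within residues start..end; the lookahead reports overlaps too."""
    window = seq[start - 1:end]
    hits = []
    for name, pattern in MOTIFS.items():
        for m in re.finditer(f"(?=({pattern}))", window):
            hits.append((name, m.start() + start))
    return sorted(hits, key=lambda h: (h[1], h[0]))
```

Because RxLR is a special case of [RKH]x[LYMFYW][RKH], a single site can be reported under both names.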

Internal protein repeats were predicted in the extracellular datasets of T. atroviride, T. virens and T. reesei using XSTREAM (21) (http://jimcooperlab.mcdb.ucsb.edu/xstream/) with default parameters.
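To make "internal repeats" concrete, the following sketch finds exact, back-to-back (tandem) copies of a short unit. It is deliberately much simpler than XSTREAM, which also detects divergent copies; the function name and default period range are hypothetical choices for illustration only.

```python
def exact_tandem_repeats(seq: str, min_period: int = 2, max_period: int = 20,
                         min_copies: int = 2) -> list[tuple[int, int, int]]:
    """Return (1-based start, period, copies) for maximal runs in which a
    primitive unit of length `period` is repeated back to back without errors.
    Not XSTREAM: mismatches and indels between copies are not tolerated."""
    results = []
    n = len(seq)
    for p in range(min_period, max_period + 1):
        k = p
        while k < n:
            if seq[k] != seq[k - p]:
                k += 1
                continue
            run_start = k
            # Extend while each residue equals the one p positions earlier.
            while k < n and seq[k] == seq[k - p]:
                k += 1
            start0 = run_start - p         # 0-based start of the repeat region
            copies = (k - start0) // p     # complete copies of the unit
            unit = seq[start0:start0 + p]
            # Skip units that are themselves repeats (e.g. "GSGS" at period 4).
            primitive = (unit + unit).find(unit, 1) == p
            if copies >= min_copies and primitive:
                results.append((start0 + 1, p, copies))
    return sorted(results)
```

For example, in `MGSPEPTPEPTPEPTWK` the sketch reports one repeat starting at residue 4, with a period of 4 and three copies of `PEPT`.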


Small secreted cysteine-rich proteins (SSCRPs) were identified by analyzing all sequences shorter than 300 amino acids with a cysteine content of at least 3%. All sequences meeting these criteria were analyzed for disulfide bridges using DISULFIND (22) and R scripts. Sequences that met the length and cysteine-content cutoffs and had at least two predicted disulfide bridges were considered SSCRPs.
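The SSCRP criteria reduce to three thresholds, sketched below. The disulfide-bridge count is assumed to have been parsed beforehand from DISULFIND output (parsing not shown), and the function and parameter names are ours.

```python
def cysteine_fraction(seq: str) -> float:
    """Fraction of residues that are cysteine (C)."""
    return seq.upper().count("C") / len(seq) if seq else 0.0


def is_sscrp(seq: str, n_disulfide_bridges: int, max_len: int = 300,
             min_cys_fraction: float = 0.03, min_bridges: int = 2) -> bool:
    """True if shorter than max_len residues, at least 3% cysteine and with
    at least two predicted disulfide bridges."""
    return (len(seq) < max_len
            and cysteine_fraction(seq) >= min_cys_fraction
            and n_disulfide_bridges >= min_bridges)
```

The length test uses a strict `<` because the text specifies sequences shorter than 300 amino acids.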

All sequences identified as extracellular that had one or more of the features found in the validated set were considered effector proteins in these three species. We refer to these proteins as the potential effector dataset.
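This final selection amounts to intersecting the extracellular dataset with the union of the evidence sets. In the sketch below, which evidence types count as "features found in the validated set" (labeled here, hypothetically, e.g. "blast", "domain", "motif", "sscrp") is an assumption, and recording the supporting evidence for each protein is our addition for traceability.

```python
def potential_effectors(extracellular: set[str],
                        evidence: dict[str, set[str]]) -> dict[str, list[str]]:
    """Map each extracellular protein ID supported by at least one evidence
    type to the sorted list of those evidence types."""
    result: dict[str, list[str]] = {}
    for pid in sorted(extracellular):
        support = sorted(kind for kind, ids in evidence.items() if pid in ids)
        if support:
            result[pid] = support
    return result
```

Proteins with evidence but no extracellular prediction are excluded, matching the requirement that effectors come from the extracellular dataset.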

Distribution and phylogenetic analysis of effector proteins

After identifying the effector proteins predicted in the Trichoderma proteomes (i.e., the potential effector dataset), we screened for orthologs in T. atroviride, T. virens and T. reesei, as well as in G. zeae and Chaetomium globosum, using BranchClust (23). All orthogroups were aligned locally with T-Coffee (24), and the best-fit evolutionary model was identified with ProtTest (25). Phylogenies were then reconstructed for all groups of homologs in the effector dataset with PhyML (26) (100 bootstrap replicates). Finally, the resulting trees were edited in MEGA5 (2).
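For the PhyML step, one way to turn a ProtTest model name such as `LG+I+G+F` into a PhyML command line is sketched below. The mapping (`+I` to `-v e`, `+G` to `-a e`, `+F` to `-f e`, otherwise no invariant sites, a single rate category and model frequencies) reflects our reading of the PhyML options and should be checked against the manual of the installed PhyML version; the function name and file names are hypothetical, and PhyML expects PHYLIP-formatted alignments.

```python
def phyml_command(alignment_phy: str, prottest_model: str,
                  bootstraps: int = 100) -> list[str]:
    """Build a PhyML argument list for a protein alignment (a sketch)."""
    base, *modifiers = prottest_model.split("+")
    cmd = ["phyml", "-i", alignment_phy, "-d", "aa", "-m", base,
           "-b", str(bootstraps)]
    # +I: estimate the proportion of invariable sites; otherwise fix it at 0.
    cmd += ["-v", "e"] if "I" in modifiers else ["-v", "0"]
    # +G: estimate the gamma shape parameter (PhyML's default rate categories);
    # otherwise use one rate category, i.e. no rate heterogeneity.
    cmd += ["-a", "e"] if "G" in modifiers else ["-c", "1"]
    # +F: empirical amino-acid frequencies; otherwise the model's own.
    cmd += ["-f", "e"] if "F" in modifiers else ["-f", "m"]
    return cmd
```

Each returned list can be passed directly to `subprocess.run(cmd, check=True)`, one call per orthogroup.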


Proteins with known effector motifs

Figure S31. Evolutionary relationships of NEPs. (A) The evolutionary history was inferred using the Neighbor-Joining method (13) with 1000 bootstrap replicates (12). The tree is drawn to scale, with branch lengths in the same units as those of the evolutionary distances used to infer the phylogenetic tree. The evolutionary distances were computed using the Poisson correction method (14) and are in units of the number of amino acid substitutions per site. The analysis involved 28 amino acid sequences. Evolutionary analyses were conducted in MEGA5 (2). (B) Sequence logos of the conserved domain of NEPs in Trichoderma spp. The graphic was generated using WebLogo (27).
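The Poisson correction converts the observed proportion p of differing aligned sites into an estimate of substitutions per site, d = -ln(1 - p), which accounts for multiple substitutions at the same site. A minimal Python version follows; skipping gapped sites pair by pair (pairwise deletion) is our assumption, since the caption does not state how gaps were treated.

```python
import math


def poisson_distance(a: str, b: str, gap: str = "-") -> float:
    """Poisson-corrected distance d = -ln(1 - p) between two aligned protein
    sequences, where p is the proportion of differing comparable sites."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    # Pairwise deletion: ignore any column with a gap in either sequence.
    pairs = [(x, y) for x, y in zip(a.upper(), b.upper())
             if x != gap and y != gap]
    if not pairs:
        raise ValueError("no comparable sites")
    p = sum(x != y for x, y in pairs) / len(pairs)
    if p >= 1.0:
        return math.inf  # saturated: distance is undefined (infinite)
    return -math.log(1.0 - p)
```

With half of the sites differing (p = 0.5), the corrected distance is ln 2 (about 0.69), noticeably larger than the uncorrected 0.5.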


Figure S32. Phylogenetic analysis of LysM domains in fungi. The evolutionary history was inferred using the Neighbor-Joining method (13), with the bootstrap test (1000 replicates) (12, 23). The evolutionary distances were computed using the JTT matrix-based method (24) and are in units of the number of amino acid substitutions per site. Evolutionary analyses (Maximum Parsimony analysis) were conducted in MEGA5 (2).
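The Neighbor-Joining method used for both trees can be illustrated with a compact implementation of the standard algorithm: pick the pair minimizing the Q-criterion, compute its branch lengths from the row sums, and update distances to the new node. This sketch is not MEGA5's implementation: ties are broken by input order, negative branch lengths are not corrected, and bootstrapping is omitted. The function name and internal node labels (`node1`, `node2`, ...) are hypothetical.

```python
def neighbor_joining(names: list[str], dist: list[list[float]]):
    """Return (joins, newick); each join is (i, len_i, j, len_j, new_node)."""
    if len(names) < 3:
        raise ValueError("need at least three taxa")
    d = {a: {b: float(dist[i][j]) for j, b in enumerate(names)}
         for i, a in enumerate(names)}
    label = {n: n for n in names}  # Newick string for each active node
    nodes = list(names)
    joins = []
    while len(nodes) > 2:
        n = len(nodes)
        r = {i: sum(d[i][k] for k in nodes) for i in nodes}
        # Choose the pair minimizing Q(i, j) = (n - 2) d(i, j) - r(i) - r(j).
        best = None
        for x in range(n):
            for y in range(x + 1, n):
                i, j = nodes[x], nodes[y]
                q = (n - 2) * d[i][j] - r[i] - r[j]
                if best is None or q < best[0]:
                    best = (q, i, j)
        _, i, j = best
        # Branch lengths from the new node u to i and to j.
        li = 0.5 * d[i][j] + (r[i] - r[j]) / (2 * (n - 2))
        lj = d[i][j] - li
        u = f"node{len(joins) + 1}"
        d[u] = {u: 0.0}
        for k in nodes:
            if k not in (i, j):
                d[u][k] = d[k][u] = 0.5 * (d[i][k] + d[j][k] - d[i][j])
        label[u] = f"({label[i]}:{li:g},{label[j]}:{lj:g})"
        joins.append((i, li, j, lj, u))
        nodes = [k for k in nodes if k not in (i, j)] + [u]
    a, b = nodes
    newick = f"({label[a]},{label[b]}:{d[a][b]:g});"
    return joins, newick
```

The distance matrix would normally come from a corrected distance such as the Poisson or JTT distances described in the captions.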


REFERENCES

1. Gouy M, Guindon S, Gascuel O. 2010. SeaView version 4: a multiplatform graphical user interface for sequence alignment and phylogenetic tree building. Mol Biol Evol 27:221-224.

2. Tamura K, Peterson D, Peterson N, Stecher G, Nei M, Kumar S. 2011. MEGA5: molecular evolutionary genetics analysis using maximum likelihood, evolutionary distance, and maximum parsimony methods. Mol Biol Evol 28:2731-2739.

3. Tamura K, Dudley J, Nei M, Kumar S. 2007. MEGA4: Molecular Evolutionary Genetics Analysis (MEGA) software version 4.0. Mol Biol Evol 24:1596-1599.

4. Thompson JD, Gibson TJ, Plewniak F, Jeanmougin F, Higgins DG. 1997. The CLUSTAL_X windows interface: flexible strategies for multiple sequence alignment aided by quality analysis tools. Nucleic Acids Res 25:4876-4882.

5. Nicholas K, Nicholas H. 1997. GeneDoc: a tool for editing and annotating multiple sequence alignments. Distributed by the authors (www.psc.edu/biomed/genedoc).

6. Punta M, Coggill PC, Eberhardt RY, Mistry J, Tate J, Boursnell C, Pang N, Forslund K, Ceric G, Clements J, Heger A, Holm L, Sonnhammer EL, Eddy SR, Bateman A, Finn RD. 2012. The Pfam protein families database. Nucleic Acids Res 40:D290-301.

7. Park J, Park J, Jang S, Kim S, Kong S, Choi J, Ahn K, Kim J, Lee S, Kim S, Park B, Jung K, Kim S, Kang S, Lee YH. 2008. FTFD: an informatics pipeline supporting phylogenomic analysis of fungal transcription factors. Bioinformatics 24:1024-1025.

8. Petersen TN, Brunak S, von Heijne G, Nielsen H. 2011. SignalP 4.0: discriminating signal peptides from transmembrane regions. Nat Methods 8:785-786.

9. Sigrist CJ, Cerutti L, de Castro E, Langendijk-Genevaux PS, Bulliard V, Bairoch A, Hulo N. 2010. PROSITE, a protein domain database for functional characterization and annotation. Nucleic Acids Res 38:D161-166.

10. Hulo N, Bairoch A, Bulliard V, Cerutti L, De Castro E, Langendijk-Genevaux PS, Pagni M, Sigrist CJ. 2006. The PROSITE database. Nucleic Acids Res 34:D227-230.

11. Poisson G, Chauve C, Chen X, Bergeron A. 2007. FragAnchor: a large-scale predictor of glycosylphosphatidylinositol anchors in eukaryote protein sequences by qualitative scoring. Genomics Proteomics Bioinformatics 5:121-130.

12. Conesa A, Gotz S, Garcia-Gomez JM, Terol J, Talon M, Robles M. 2005. Blast2GO: a universal tool for annotation, visualization and analysis in functional genomics research. Bioinformatics 21:3674-3676.

13. Gotz S, Garcia-Gomez JM, Terol J, Williams TD, Nagaraj SH, Nueda MJ, Robles M, Talon M, Dopazo J, Conesa A. 2008. High-throughput functional annotation and data mining with the Blast2GO suite. Nucleic Acids Res 36:3420-3435.


14. Muller HM, Kenny EE, Sternberg PW. 2004. Textpresso: an ontology-based information retrieval and extraction system for biological literature. PLoS Biol 2:1984-1998.

15. Hunter S, Apweiler R, Attwood TK, Bairoch A, Bateman A, Binns D, Bork P, Das U, Daugherty L, Duquenne L, Finn RD, Gough J, Haft D, Hulo N, Kahn D, Kelly E, Laugraud A, Letunic I, Lonsdale D, Lopez R, Madera M, Maslen J, McAnulla C, McDowall J, Mistry J, Mitchell A, Mulder N, Natale D, Orengo C, Quinn AF, Selengut JD, Sigrist CJA, Thimma M, Thomas PD, Valentin F, Wilson D, Wu CH, Yeats C. 2009. InterPro: the integrative protein signature database. Nucleic Acids Res 37:D211-D215.

16. McDowall J, Hunter S. 2011. InterPro protein classification. Methods Mol Biol 694:37-47.

17. Birch PRJ, Boevink PC, Gilroy EM, Hein I, Pritchard L, Whisson SC. 2008. Oomycete RXLR effectors: delivery, functional redundancy and durable disease resistance. Curr Opin Plant Biol 11:373-379.

18. Govers F, Bouwmeester K. 2008. Effector trafficking: RXLR-dEER as extra gear for delivery into plant cells. Plant Cell 20:1728-1730.

19. Liu T, Ye W, Ru Y, Yang X, Gu B, Tao K, Lu S, Dong S, Zheng X, Shan W, Wang Y, Dou D. 2011. Two host cytoplasmic effectors are required for pathogenesis of Phytophthora sojae by suppression of host defenses. Plant Physiol 155:490-501.

20. Rouxel T, de Wit P. 2011. Dothideomycete effectors facilitating biotrophic and necrotrophic lifestyles, p 426. In Martin F, Kamoun S (ed), Effectors in Plant-Microbe Interactions, 1st ed. Wiley-Blackwell, Oxford, UK.

21. Newman AM, Cooper JB. 2007. XSTREAM: a practical algorithm for identification and architecture modeling of tandem repeats in protein sequences. BMC Bioinformatics 8:382.

22. Ceroni A, Passerini A, Vullo A, Frasconi P. 2006. DISULFIND: a disulfide bonding state and cysteine connectivity prediction server. Nucleic Acids Res 34:W177-W181.

23. Poptsova MS, Gogarten JP. 2007. BranchClust: a phylogenetic algorithm for selecting gene families. BMC Bioinformatics 8.

24. Notredame C, Higgins DG, Heringa J. 2000. T-Coffee: a novel method for fast and accurate multiple sequence alignment. J Mol Biol 302:205-217.

25. Abascal F, Zardoya R, Posada D. 2005. ProtTest: selection of best-fit models of protein evolution. Bioinformatics 21:2104-2105.

26. Guindon S, Dufayard JF, Hordijk W, Lefort V, Gascuel O. 2009. PhyML: fast and accurate phylogeny reconstruction by maximum likelihood. Infect Genet Evol 9:384-385.

27. Crooks GE, Hon G, Chandonia JM, Brenner SE. 2004. WebLogo: a sequence logo generator. Genome Res 14:1188-1190.