Supplementary Information for Functional cooperation of the glycine synthase-reductase and Wood-Ljungdahl pathways for autotrophic growth of Clostridium drakei Yoseb Song, Jin Soo Lee, Jongoh Shin, Gyu Min Lee, Sangrak Jin, Seulgi Kang, Jung-Kul Lee, Dong Rip Kim, Eun Yeol Lee, Sun Chang Kim, Suhyung Cho, Donghyuk Kim* and Byung-Kwan Cho* *To whom correspondence should be addressed. E-mail: [email protected] and [email protected]

The file includes:

SI Appendix Materials and Methods

SI Appendix Figures Fig. S1. Comparison between the assembled genomes. Fig. S2. Genome sequence validation. Fig. S3. Gene clusters in C. drakei. Fig. S4. In silico analysis of Clostridium drakei (iSL771). Fig. S5. Cell morphology of Clostridium drakei under heterotrophic and autotrophic conditions. Fig. S6. Quality of RNA-Seq. Fig. S7. Transcriptome analysis of Clostridium drakei in heterotrophic and autotrophic conditions. Fig. S8. Metabolic pathways, energy conservation system, and transcriptional dynamics of C. drakei. Fig. S9. Introduction of Glycine synthase-reductase pathway in E. limosum.

SI Appendix Text Text S1. Genome sequencing of C. drakei. Text S2. Genome annotation. Text S3. The Wood-Ljungdahl pathway in C. drakei. Text S4. Energy conservation in C. drakei. Text S5. Genome-scale model construction and prediction. Text S6. Transcriptome analysis.

SI Appendix References

Other supplementary materials for this manuscript include the following: Datasets S1, S2, S3, S4, S5, S6, S7, S8, S9, S10, S11, S12, and S13 iSL771.mat (iSL771 model file)

www.pnas.org/cgi/doi/10.1073/pnas.1912289117

SI Appendix Materials and Methods

Bacterial strains and growth conditions. C. drakei SL1T was obtained from the Leibniz Institute DSMZ-German Collection of Microorganisms and Cell Cultures. Cultures were grown anaerobically at 30 °C in 100 ml DSM 135 medium (pH 7.0), which comprised 1 g/l ammonium chloride, 2 g/l yeast extract, 10 g/l sodium bicarbonate, 0.1 g/l magnesium sulfate heptahydrate, 0.3 g/l cysteine-hydrochloride, 10 ml vitamin solution (4 mg/l biotin, 4 mg/l folic acid, 20 mg/l pyridoxine-HCl, 10 mg/l thiamine-HCl, 10 mg/l riboflavin, 10 mg/l nicotinic acid, 10 mg/l pantothenate, 0.2 mg/l vitamin B12, 10 mg/l p-aminobenzoic acid, and 10 mg/l lipoic acid), 4.6 mM KH2PO4, 5.4 mM K2HPO4, 4 µM resazurin, and 20 ml trace element solution (1.0 g/l nitrilotriacetic acid, 3.0 g/l MgSO4·7H2O, 0.5 g/l MnSO4·H2O, 1.0 g/l NaCl, 0.1 g/l FeSO4·7H2O, 180 mg/l CoSO4·7H2O, 0.1 g/l CaCl2·2H2O, 180 mg/l ZnSO4·7H2O, 10 mg/l CuSO4·5H2O, 20 mg/l KAl(SO4)2·12H2O, 10 mg/l H3BO3, 10 mg/l Na2MoO4·2H2O, 30 mg/l NiCl2·6H2O, 0.3 mg/l Na2SeO3·5H2O, 0.4 mg/l Na2WO4·2H2O), supplemented with either 5 g/l fructose or H2/CO2 (80:20) at a pressure of 200 kPa in 50 ml of headspace for heterotrophic and autotrophic growth conditions, respectively. For the main culture, precultured cells were prepared by cultivation in medium containing the corresponding carbon source; the cell pellet was then collected by anaerobic centrifugation at 3,000 × g and 4 °C for 15 min and washed three times with basal DSM 135 medium. The washed cell pellet was inoculated into fresh 100 ml DSM 135 medium for the main culture, which was performed in biological triplicate, except for the RNA-Seq measurements, which were performed in biological duplicate.

DNA isolation. At exponential growth, the cell pellet was collected by anaerobic centrifugation at 3,000 × g and 4 °C for 15 min; the cells were then ground in a mortar using liquid nitrogen. Subsequently, 17.5 ml of lysis buffer, composed of 6.5 ml buffer A (0.1 M Tris-HCl (pH 9.0), 5 mM ethylenediaminetetraacetic acid (EDTA) (pH 8.0), and 0.35 M sorbitol), 6.5 ml buffer B (0.2 M Tris-HCl (pH 9.0), 50 mM EDTA (pH 8.0), 2 M NaCl, and 2% (w/v) hexadecyltrimethylammonium bromide), 2.6 ml of 5% (w/v) N-lauroylsarcosine sodium salt, 1.75 ml of 1% (w/v) polyvinylpyrrolidone, and 125 µl of 20 mg/ml proteinase K (Qiagen, Hilden, Germany), was added to the ground cells, and the mixture was incubated at 65 °C for 30 min. Following the incubation, 5.75 ml of 5 M potassium acetate (pH 7.5) was added, and the sample was mixed by inversion, incubated on ice for 30 min, and centrifuged at 5,000 × g and 4 °C for 20 min. The supernatant was transferred to a new tube, and 23 ml of chloroform:isoamyl alcohol (24:1) was added. The mixture was centrifuged at 4,000 × g and 4 °C for 10 min, and the aqueous phase was transferred to a new tube. To the transferred solution, 100 µl of 10 mg/ml RNase A (Qiagen) was added, followed by incubation at 37 °C for 120 min. Following the incubation, 0.1 volume of 3 M sodium acetate and 1.0 volume of isopropanol were added, and the sample was incubated at room temperature for 5 min. The sample was then centrifuged at 10,000 × g and 4 °C for 30 min. The pellet was washed with 2 ml of 70% (v/v) ethanol and centrifuged at 10,000 × g and 4 °C for 10 min, and the washing step was repeated two additional times. The remaining pellet was dried at room temperature for 5 min and resuspended in 500 µl of nuclease-free water. The extracted DNA was further purified using a QIAGEN Genomic-tip according to the manufacturer’s instructions.

Genome sequencing and annotation. The C. drakei genome was sequenced using the PacBio system with an average size of 20 kb, based on polymerase version P5 and C3 chemistry. In total, 300,584 reads were sequenced; reads with a quality lower than 0.75 were filtered out using SMRT Analysis v2.3.0, resulting in 115,636 reads composed of 924,907,730 bases with an average length of 7,998 bp. Using HGAP.2 SMRT Analysis v2.3.0, the trimmed reads were used for de novo assembly, resulting in a single circular contig of 5,695,241 bp (see SI Appendix, Text S1 for detailed description). To compare and improve the accuracy of the constructed genome, the previously sequenced, highly accurate Illumina data were aligned to the assembled contig using the default parameters of CLC Genomics Workbench (mismatch cost = 2, insertion cost = 3, deletion cost = 3, length fraction = 0.8, and similarity fraction = 0.8) (1, 2). Following the alignment, conflicting sites were extracted (Dataset S2) and confirmed using Sanger sequencing (see SI Appendix, Text S1 for detailed description). Using the assembled and confirmed genome, the origin and terminus of replication were determined using GenSkew (http://genskew.csb.univie.ac.at). Gene annotation was performed using the NCBI Prokaryotic Genome Automatic Annotation Pipeline (PGAAP). tRNA and rRNA were confirmed using tRNAscan-SE 1.31 and RNAmmer 1.2, respectively (3, 4). The amino acid sequences of the predicted and annotated genes were aligned against the protein database to identify Gene Ontology (GO), KEGG Orthology ID, and COGs (5, 6).

RNA-Seq library preparation. At the mid-exponential phase, the cells were harvested by centrifugation at 4,000 × g for 15 min at 4 °C. The collected cells were resuspended in 500 µl of lysis buffer, comprising 20 mM Tris-HCl (pH 7.4), 140 mM NaCl, 5 mM MgCl2, and 1% Triton X-100, then frozen and ground using a mortar and pestle. The lysed cells were thawed on ice, and the debris was removed by centrifugation at 4,000 × g for 15 min at 4 °C. Subsequently, total RNA was isolated using TRIzol (Thermo Scientific, Waltham, MA, USA) according to the manufacturer’s instructions. To remove the remaining genomic DNA (gDNA), the RNA was treated with 4 U of rDNase I (Ambion, Austin, TX, USA) for 1 h at 37 °C, then incubated at 75 °C for 10 min to deactivate the enzyme. To remove ribosomal RNA (rRNA) from the gDNA-depleted RNA, the Ribo-Zero™ rRNA Removal Kit for Meta-bacteria (Epicentre, Madison, WI, USA) was used according to the manufacturer’s instructions. The quality of the rRNA-depleted RNA was checked using an Agilent 2200 TapeStation system (Agilent Technologies, Santa Clara, CA, USA). The quality-confirmed RNA was converted into RNA-Seq libraries using the TruSeq Stranded mRNA Library Prep Kit (Illumina, San Diego, CA, USA). The libraries were sequenced using the 150 bp read recipe on an Illumina MiSeq™ system.

Data processing. The adapter sequences of the RNA-Seq reads were trimmed. Trimmed reads shorter than 20 bp were discarded to improve the accuracy of the mapping result. Using CLC Genomics Workbench, the trimmed reads were mapped onto the C. drakei genome using default parameters (mismatch cost = 2, deletion cost = 3, insertion cost = 3, length fraction = 0.8, and similarity fraction = 0.8), and only the uniquely mapped reads were retained. Using the obtained reads, the gene expression was normalized using the DESeq2 package in R with default parameters.

Quantitative real-time PCR.
A total of eight genes associated with the WLP, electron conservation, and housekeeping systems were selected for confirmation of the RNA-Seq results by qPCR: B9W14_02065, B9W14_03705, B9W14_04700, B9W14_20060, B9W14_22270, B9W14_22300, B9W14_25420, and B9W14_25535. The same RNA samples used for RNA-Seq library construction were reverse transcribed to cDNA using the SuperScript III First-Strand Synthesis System (Invitrogen, Carlsbad, CA, USA), as instructed by the manufacturer. Real-time PCR was performed using the SYBR FAST qPCR master mix (KAPA BIO, Wilmington, MA, USA) and monitored on a CFX96™ Real-Time PCR Detection System under the following conditions: 98 °C for 10 s; 62 °C for 30 s; 72 °C for 10 s for 40 cycles. The data were normalized using the results for B9W14_02065 (GTP binding protein Era) and B9W14_25420 (transcription termination factor Rho). Primers and target genes for gene expression of the GSRP and the WLP in E. limosum are listed in Dataset S12.

Model reconstruction. The C. drakei genome-scale metabolic network was reconstructed using gene annotation information. All model reconstruction was performed using the COBRA Toolbox v3.0.3 and COBRApy v0.14.1 software (7). Using the protein sequences, a draft model was constructed via the ModelSEED database. Next, C. drakei genes were aligned to previously constructed models, such as those for C. acetobutylicum, C. thermocellum, C. ljungdahlii, and Escherichia coli, using Smith-Waterman alignment. Genes with an amino acid sequence identity over 60% were considered homologs and employed for model construction. The excluded genes were examined in the BiGG, BioCyc, KEGG, and UniProt databases and were then reconciled to the draft model (8-11). Following construction, the gaps were reconciled using a gap-filling program, and the necessary reactions were added. Based on phylogenetic relatedness, the biomass objective function for C. drakei was determined using the previous model, iSL771 (12).
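The homolog screen described above can be illustrated with a minimal sketch of Smith-Waterman local alignment followed by an identity cutoff. The scoring values (match = 2, mismatch = -1, linear gap = -2), the definition of identity as matches divided by aligned columns, and the function names are assumptions made for illustration; the source does not state the alignment parameters or software actually used.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Return (best_score, percent_identity) of the best local alignment.

    Scoring values are illustrative assumptions, not the study's settings.
    """
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)

    # Fill the dynamic-programming matrix; cells never drop below zero.
    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(0, H[i - 1][j - 1] + sub,
                          H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the best cell until a zero cell is reached.
    i, j = best_pos
    matches = length = 0
    while i > 0 and j > 0 and H[i][j] > 0:
        sub = match if a[i - 1] == b[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + sub:
            matches += a[i - 1] == b[j - 1]
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
        length += 1

    identity = 100.0 * matches / length if length else 0.0
    return best, identity


def is_homolog(query, target, identity_cutoff=60.0):
    """Apply the >60% identity criterion to one pair of protein sequences."""
    _, identity = smith_waterman(query, target)
    return identity > identity_cutoff
```

With these assumed scores, two eight-residue toy peptides that differ at a single position align over all eight columns with seven matches, which passes the 60% cutoff.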
To select cofactors for the key pathways, previous studies on acetogens were considered, based on the genetic structure and gene similarity of the associated gene clusters. In addition, the cofactor candidates were tested against the experimental results to determine which cofactor most accurately reflects the in vivo result. Mass-imbalanced or flux-inconsistent reactions were constrained to avoid miscalculation. Subsequently, the RNA-Seq data obtained under heterotrophic and autotrophic conditions were integrated using the GIMME program; the scripts were deposited at https://github.com/SBL-Kimlab/Project_c.drakei. The simulation and constraint-based analysis were performed using the IBM CPLEX linear programming solver. Following reconstruction of the model, the flux distribution was determined by the Markov chain Monte Carlo sampling method in the COBRA Toolbox, for 100,000 points in each condition (7, 13).
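The screening for mass-imbalanced reactions mentioned above can be illustrated with a small element-balance check. This is a sketch under stated assumptions: the formula parser handles only simple formulas without brackets or charges, and the metabolite identifiers and the neutral formulas used for the formate dehydrogenase example are hypothetical choices, not entries taken from iSL771.

```python
import re
from collections import Counter


def parse_formula(formula):
    """Parse a simple formula such as 'CH2O2' into element counts."""
    counts = Counter()
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] += int(number) if number else 1
    return counts


def mass_imbalance(stoichiometry, formulas):
    """Return the net element counts of a reaction.

    Reactants carry negative coefficients and products positive ones.
    An empty dict means the reaction is element-balanced.
    """
    net = Counter()
    for metabolite, coefficient in stoichiometry.items():
        for element, n in parse_formula(formulas[metabolite]).items():
            net[element] += coefficient * n
    return {element: n for element, n in net.items() if n != 0}


# Hypothetical identifiers with neutral formulas (illustration only).
formulas = {"co2": "CO2", "h2": "H2", "for": "CH2O2"}

# CO2 + H2 -> formate is balanced; dropping H2 leaves two unmatched H atoms.
balanced = mass_imbalance({"co2": -1, "h2": -1, "for": 1}, formulas)
unbalanced = mass_imbalance({"co2": -1, "for": 1}, formulas)
```

COBRApy offers a comparable per-reaction check through `Reaction.check_mass_balance()`, which also accounts for charge; whether the authors used that call or a different procedure is not stated in the source.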

13C isotope labelling assay. For 13C-based analysis, 100% U-13C-fructose and 100% 13C-sodium bicarbonate were supplemented in the media described above instead of the unlabelled carbon sources. The cultured cells were harvested at the same point as the RNA-Seq sampling point. The cells were centrifuged at 13,000 × g and 4 °C for 15 min, and the supernatant was removed. The pellet was resuspended in 200 µl of 6 M HCl and incubated at 100 °C for 12 h; the sample was then dried at 95 °C for 8 h. The dried pellet was resuspended in 20 µl of N,N-dimethylformamide, then 20 µl of N-methyl-N-(tert-butyldimethylsilyl)trifluoroacetamide was added, and the sample was incubated at 85 °C for 1 h. The prepared sample was then transferred to a vial, and amino acids were measured using gas chromatography-mass spectrometry as described in a previous study (14).

Thioredoxin reductase assay. A thioredoxin reductase assay kit (Sigma-Aldrich, St. Louis, MO, USA) was used for the assay. At the mid-exponential phase, the cells, which had been cultured in heterotrophic and autotrophic conditions in biological duplicate, were harvested by centrifugation at 4,000 × g for 15 min at 4 °C. The collected cells were washed twice with 500 µl of lysis buffer, which comprised 20 mM Tris-HCl (pH 7.4), 140 mM NaCl, 5 mM MgCl2, and 1% Triton X-100. Prior to homogenization, cell pellets were resuspended in 500 µl of lysis buffer, which was composed of 20 mM Tris-HCl (pH 7.4), 140 mM NaCl, 5 mM MgCl2, 1 mM protease inhibitor cocktail (Sigma-Aldrich), and 1% Triton X-100. The sample was frozen using liquid nitrogen, then ground using a mortar and pestle. The cell lysate was thawed on ice, and the debris was removed by centrifugation at 4,000 × g for 15 min at 4 °C. To measure the protein concentration, 5× Bradford Reagent (Bio-Rad, Hercules, CA, USA) was mixed with deionized water at a 1:4 ratio. Then, 200 µl of the mixed reagent was added to each well, to which 10 µl of the prepared sample was added. The mixed solution was incubated at room temperature for 5 min, and the absorbance was measured at 595 nm. Prior to the thioredoxin assay, all of the sample proteins were diluted to 1 mg of total protein, then the diluted protein was measured again using the same method. Using the diluted protein, the activity of the samples with and without an inhibitor that selectively inhibits thioredoxin reductase was measured as instructed by the manufacturer. The actual thioredoxin reductase activity was obtained by subtracting the value of the sample with the inhibitor from that of the sample without the inhibitor. The calculated results were normalized by the total protein in the sample.

Plasmid construction. Escherichia coli strain NEB Express was used for plasmid propagation and cloning. Plasmid for electroporation into E. limosum was isolated from Escherichia coli strain ER2275 to allow in vivo methylation. Escherichia coli was cultivated in LB medium at 37 °C. For DNA amplification, Pfu-X polymerase (Solgent, Daejeon, Korea) was used. C. drakei genomic DNA was used for amplification of the glycine synthase-reductase pathway associated genes. All of the primers are indicated in Dataset S13 and were used for amplification under the following conditions: 95 °C for 2 min; 95 °C for 20 s; 60 °C for 30 s; 72 °C for 30 s/kb for 35 cycles; 72 °C for 5 min. Plasmids were constructed by inserting the amplicons into the pJIR750ai plasmid (American Type Culture Collection, VA, USA) using the restriction enzyme site combinations SacI and NotI, SacI and BamHI, BamHI and XbaI, and XbaI and SalI for amplicons gcv1, gcv2, gcv3, and gcv4, respectively, followed by ligation using T4 DNA ligase.

Transformation. E. limosum was cultured in 100 ml of DSM 135 medium supplemented with 5 g/l glucose. At the early-exponential phase (OD600 0.3–0.5), the cells were harvested by centrifugation at 3,000 × g for 15 min at 4 °C. The harvested cells were washed with 50 ml of 270 mM sucrose buffer (pH 6) and resuspended to achieve a final concentration of 10^11 cells/ml. Approximately 1.5–2 µg of plasmid was added to the electrocompetent cells, and the solution was transferred to a 0.1-cm-gap Gene Pulser cuvette (Bio-Rad, Hercules, CA). Thereafter, the cells were pulsed at 2.0 kV and immediately resuspended in 0.9 ml of reinforced clostridial medium (RCM). The cells were recovered on ice for 5 min and incubated at 37 °C for 16 h. The recovered cells were plated on an RCM plate (1.5% agar) containing 15 µg/ml thiamphenicol. A single colony was selected and cultured in DSM 135 medium supplemented with 5 g/l glucose.
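For the thioredoxin reductase assay described earlier in this section, the activity calculation (signal without inhibitor minus signal with inhibitor, divided by total protein) can be written as a short sketch. The function name, the argument names, and the example numbers are hypothetical; the units depend on the kit readout and are not specified here.

```python
def trxr_specific_activity(rate_without_inhibitor, rate_with_inhibitor,
                           total_protein_mg):
    """Inhibitor-sensitive activity normalized by total protein.

    The subtraction removes signal that does not come from thioredoxin
    reductase, following the calculation described in the text.
    """
    if total_protein_mg <= 0:
        raise ValueError("total protein must be positive")
    specific_rate = rate_without_inhibitor - rate_with_inhibitor
    return specific_rate / total_protein_mg


# Hypothetical readings for one sample diluted to 1 mg of total protein.
example_activity = trxr_specific_activity(0.050, 0.020, 1.0)
```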

SI Appendix Figures

Fig. S1. Comparison between the assembled genomes. Comparison of previous genome sequencing assembly to the assembly generated in this study (1). Contigs indicate the contigs generated from previous studies, which were aligned to the complete assembly constructed in this study using long read sequencing. Gaps represent missing genome sites in the previous studies. Short-read coverage reflects mapping the previous study reads onto the complete genome. Sequence conflicts represent sequence mismatches between the short reads and the constructed genome, resulting in 131 conflicting sites.

Fig. S2. Genome sequence validation. Of the 131 total conflicting sites, 30 were confirmed using Sanger sequencing. The long read sequences represent sequences generated using the PacBio platform and short read sequences represent sequences generated by the Illumina platform. Red sequence represents conflicting sequence that was corrected by Sanger sequencing. Of 30 validated sites, all matched to sequences generated by long reads.

Fig. S3. Gene clusters in C. drakei. (A) 16S rRNA sequence based phylogenetic tree. (B) Overview of Wood-Ljungdahl pathway, Rnf complex, ATP synthase, and hydrogenase system identified in C. drakei. Abbreviations: A: Acetobacterium; Ac: Acetohalobium; Ca: Carboxydothermus; C: Clostridium; E: Eubacterium; M: Moorella; Te: Thermacetogenium; Th: Thermoanaerobacter; Tr: Treponema.

Fig. S4. In silico analysis of Clostridium drakei (iSL771). (A) Flow of iSL771 development. The steps included draft model construction using ModelSEED, then reconciliation with homologous genes from four models. Excluded genes mined from the databases were integrated into the draft model, then the model was curated to fulfil biomass production. The final metabolic model of C. drakei was termed iSL771. (B) Experimental validation of predicted growth and production rates. Grey bars represent experimental data obtained from culturing C. drakei. Darker grey bars represent predicted data obtained from iSL771. (C) Effect of changes in the fluxes of CO2-fixing reactions on the predicted growth rate. (D) Correlation between CODH/ACS and GLYR fluxes in iSL771.

Fig. S5. Cell morphology of Clostridium drakei under heterotrophic and autotrophic conditions. (A) Cell density of C. drakei in heterotrophic (black circles, 5 g/L fructose) and autotrophic conditions (red rhombi, H2/CO2 (80:20)). (B) Substrate consumption of fructose (red circles), production of acetic acid (grey rhombi), and production of butyrate (yellow triangles) under heterotrophic conditions are shown. Production of acetic acid (blue squares) under autotrophic conditions is shown.

Fig. S6. Quality of RNA-Seq. (A) Duplicates of RNA-Seq data are highly reproducible (R² ≥ 0.92). (B) Validation of mRNA expression of six genes using quantitative real-time PCR. B9W14_02065 and B9W14_25420 were used as reference genes.
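The reproducibility statistic in panel A can be sketched as the squared Pearson correlation between the expression values of two replicates. The log2(count + 1) transform and the function names are assumptions for illustration; the source does not state whether raw, normalized, or log-transformed values were compared.

```python
import math


def log_transform(counts):
    """Log2-transform counts with a pseudocount of 1 (an assumed choice)."""
    return [math.log2(c + 1) for c in counts]


def r_squared(x, y):
    """Squared Pearson correlation between two equal-length vectors."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two vectors of equal length >= 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("a vector with zero variance has no correlation")
    return sxy * sxy / (sxx * syy)
```

A typical call would be `r_squared(log_transform(rep1), log_transform(rep2))`, where `rep1` and `rep2` are per-gene counts from the two biological replicates of one condition.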

Fig. S7. Transcriptome analysis of Clostridium drakei in heterotrophic and autotrophic conditions. (A) Volcano plot comparing the heterotrophic condition to the autotrophic condition. The Y-axis represents log ratio of the P-value and the X-axis represents fold change. The red dots indicate P values less than 0.05 with fold change of greater or less than two-fold, for both upregulation and downregulation. (B) Change of differentially expressed genes categorized by Clusters of Orthologous Groups (COGs). Of the total genes, the proportion of 693 significantly upregulated genes (red bar) and 651 downregulated genes (blue bar) in each category is shown. Abbreviations of the categories are listed below. (C) Transcriptional expression of the gene cluster (B9W14_22240–22310) encoding the Wood-Ljungdahl pathway. Green color indicates genes encoding the Wood-Ljungdahl pathway, ‘H’ indicates heterotrophic condition, and ‘A’ represents autotrophic condition. Abbreviations: J, Translation, ribosomal structure, and biogenesis; K, Transcription; L, Replication, recombination, and repair; D, Cell cycle control, cell division, chromosome partitioning; V, Defense mechanism; T, Signal transduction mechanism; M, Cell wall/membrane/envelope biogenesis; N, Cell motility; U, Intracellular trafficking, secretion, and vesicular transport; O, Post-translational modification, protein turnover, and chaperones; C, Energy production and conversion; G, Carbohydrate transport and metabolism; E, Amino acid transport and metabolism; F, Nucleotide transport and metabolism; H, Coenzyme transport and metabolism; I, Lipid transport and metabolism; P, Inorganic ion transport and metabolism; Q, Secondary metabolites biosynthesis, transport, and catabolism.
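The thresholds used for the red points in panel A (P value below 0.05 and more than a two-fold change in either direction) can be expressed as a small classifier. This is a sketch: the function names are hypothetical, the fold change is assumed to be a positive ratio, and the choice of which condition is the numerator is left to the caller.

```python
import math


def classify_gene(fold_change, p_value, p_cutoff=0.05, fc_cutoff=2.0):
    """Label a gene 'up', 'down', or 'ns' (not significant)."""
    if fold_change <= 0:
        raise ValueError("fold change must be a positive ratio")
    log2_fc = math.log2(fold_change)
    limit = math.log2(fc_cutoff)
    if p_value < p_cutoff and log2_fc > limit:
        return "up"
    if p_value < p_cutoff and log2_fc < -limit:
        return "down"
    return "ns"


def volcano_point(fold_change, p_value):
    """Coordinates on a volcano plot: (log2 fold change, -log10 P)."""
    return math.log2(fold_change), -math.log10(p_value)
```

Using a log2 scale makes up- and downregulation symmetric around zero, so a four-fold increase and a four-fold decrease sit at +2 and -2 on the X-axis.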

Fig. S8. Metabolic pathways, energy conservation system, and transcriptional dynamics of C. drakei. (A) General metabolic pathway and energy conservation system associated with heterotrophic growth and autotrophic growth is shown. Central metabolic pathway (blue rectangle), incomplete tricarboxylic acid (TCA) cycle (purple rectangle), the Wood-Ljungdahl pathway (WLP) (green rectangle), and glycine synthase-reductase pathway and reductive glycine pathway (GSRP/RGP) (light purple rectangle) are shown. Metabolites are shown as rectangles and reactions are shown as arrows. All gene numbers are listed in the rectangle next to the arrows. (B) The heatmap is composed of three columns; the top and the middle boxes represent normalized RNA reads from heterotrophic and autotrophic conditions, respectively. The bottom box indicates fold change, as autotrophic expression over heterotrophic expression. In addition, gene numbers responsible for the expression are located under the bottom box of the heatmap. A full list and expression data can be found in Dataset S6. In addition, the Wood-Ljungdahl pathway (WLP), the glycine synthase-reductase pathway (GSRP) and reductive glycine pathway (RGP), and central metabolic pathway associated transcriptional expressions can be found in Datasets S8, S9, S10, and S11, respectively. Asterisk represent insignificant change of the corresponding genes.

Fig. S9. Introduction of glycine synthase-reductase pathway in E. limosum. (A) A scheme of introduction of glycine synthase-reductase pathway coding genes from C. drakei into E. limosum strain, which does not contain the pathway. E. limosum with the pathway encoding genes is termed GSRP and without the genes is termed control. (B) Amplicons of gene expression of GSRP under autotrophic growth condition. The abbreviations are as following: 1, gcvT; 2, gcvH; 3, gcvPA; 4, gcvPB; 5, grdX; 6, trxB; 7, trxA; 8, grdE; 9, grdA; 10, grdB; 11, grdC; 12, grdD; M, marker; Ref, reference gene; C1, thiamphenicol resistance gene; C2, guanylate kinase gene. (C) Amplicons of gene expression of the native Wood-Ljungdahl pathway associated genes in E. limosum with and without GSRP coding genes. The abbreviations are as following: 1, fhs; 2, fchA; 3, folD; 4, metV; 5, metF; 6, lpd; 7, cooC; 8, acsD; 9, acsC; 10, acsE; 11, acsA; 12, cooC; 13, acsB; Ref, reference gene; C1, thiamphenicol resistance gene; C2, guanylate kinase gene.

SI Appendix Text

SI Appendix Text S1 Genome sequencing of C. drakei. In a previous report, the C. drakei genome was constructed using the Illumina short-read platform, resulting in 140 contigs with a genome size of 5,635,531 bp, 5,763 coding sequences (CDS), 11 rRNAs, and 100 tRNAs (1). Although the WLP-associated genes were identified, potentially important genetic features may be absent in the draft genome owing to missing genome regions. To incorporate the missing information, in the present study the C. drakei SL1T genome was sequenced using a long read sequencer, the Single Molecule Real-Time (SMRT) platform, then assembled via a hierarchical genome assembly process pipeline (SI Appendix, Fig. S1) (15). Consequently, a complete genome, composed of 5,695,241 nt with 29.7% GC content, was assembled by using the trimmed 115,636 reads with 924,907,730 bases (SI Appendix, Fig. S1).
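The read and assembly figures quoted above can be cross-checked with simple arithmetic. The sketch below uses only numbers stated in this document; the variable names are illustrative, and the coverage value is a nominal estimate (total filtered bases divided by genome length) rather than a figure reported by the authors.

```python
# Values quoted in the text for the filtered SMRT reads and the assembly.
filtered_reads = 115_636
filtered_bases = 924_907_730
genome_length_bp = 5_695_241

# Mean read length: total bases divided by number of reads.
mean_read_length = filtered_bases / filtered_reads

# Nominal sequencing depth: total bases divided by genome length.
nominal_coverage = filtered_bases / genome_length_bp

print(f"mean read length: {mean_read_length:.0f} bp")
print(f"nominal coverage: {nominal_coverage:.1f}x")
```

The mean read length rounds to the 7,998 bp stated in the Materials and Methods, and the nominal depth works out to roughly 162-fold.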

However, SMRT sequencing often presents random errors in the reads, resulting in incorrect nucleotide sequences in the constructed genome; thus, additional nucleotide sequence confirmation is required to provide accurate genome information (2). To validate the genome sequences, the reads generated by Jeong et al., which were sequenced using the relatively low-error Illumina platform, were aligned to the constructed genome, and 131 potentially incorrect nucleotide sites were identified (SI Appendix, Fig. S1 and Dataset S2) (1). All 131 conflicts were sequence mismatches: one site contained two tandem conflicting sequences, and the remaining sites each contained a single conflicting sequence. Of these, 30 conflicting sites were selected for confirmation using Sanger sequencing, which determined that all of the sites matched the SMRT read sequences (SI Appendix, Fig. S2). Notably, this outcome contradicts the initial expectation that the relatively low-error Illumina reads, rather than the SMRT reads, would match the Sanger sequencing results. One possible explanation is the occurrence of mutations in the bacteria. Although the identical strain was sequenced, the C. drakei SL1T used by Jeong et al. and that used in the present study may carry different mutations owing to natural adaptation to laboratory culture conditions. Under this consideration, the nucleotide sequences of the remaining 101 conflicting sites, as well as the 30 validated sites, were retained as in the initial genome sequence of C. drakei obtained herein.

SI Appendix Text S2 Genome annotation. Using the validated genome sequence, genome features responsible for autotrophic growth of C. drakei were identified via gene annotation, which was predicted using the NCBI Prokaryotic Genome Automatic Annotation Pipeline (PGAAP) (Dataset S3).
According to the genome analysis, the origin and terminus of replication were located at genomic positions 5,461,506 and 2,682,346, respectively. After annotation, a total of 5,144 genes, including 30 rRNAs and 90 tRNAs (16), were identified in the genome. Among the 5,144 genes, 5,024 were predicted as CDS, of which 3,840 (76.43%) were functionally predicted genes and 1,304 (25.96%) were hypothetical genes (Dataset S3). Using the annotated genetic features, comparative phylogenetic analysis of the 16S rRNA of C. drakei and those of 14 acetogenic bacteria with completed sequences that are capable of autotrophic growth indicated that C. drakei is closely related to C. carboxidivorans DSM 15243 and C. scatologenes ATCC 25775; notably, these had initially been classified as the same species and were reclassified in a subsequent study (SI Appendix, Fig. S3A) (17, 18). Furthermore, the three bacteria were reported to inhabit a similar temperature and pH range and to exhibit similar growth rates using fructose or CO as substrates (17). In addition, C. drakei is closely related to the phylogenetically indistinguishable C. autoethanogenum DSM 10061 and C. ljungdahlii DSM 13528, which share a large portion of identical genome sequences (19, 20). Consistent with the phylogenetic analysis, a total of nine sigma factors were identified in the genome, including the housekeeping sigma factor (B9W14_03085, σ70), the stress response sigma factor (B9W14_00390, σ54), and the sporulation master regulation sigma factor (B9W14_04605, Spo0A), similar to those in other Clostridium species (Dataset S3) (21). Of the identified sigma factors, the housekeeping gene potentially plays an important role in autotrophic growth by transcriptionally regulating hydrogenase and WLP-associated genes with a conserved binding motif at the −10 and −35 regions (22).
In addition, Spo0A regulates the formation of terminal and free spores (23), which then sequentially activate sporulation-specific sigma factors σE (B9W14_05225), σF (B9W14_20840), σG (B9W14_05230), σH (B9W14_23705), and σK (B9W14_05195) in the mother cell and the forespore. However, the sporulation phosphotransfer reaction-coding Spo0B and Spo0F components

were not located, which is similar to the case in other Clostridium species (21).

SI Appendix Text S3

The Wood-Ljungdahl pathway in C. drakei. The initial reaction of the WLP reduces CO2 to formate, catalyzed by formate dehydrogenase (FDH), a complex whose composition varies among acetogens (SI Appendix, Fig. S3B) (24). In the C. drakei genome, a total of three clusters containing FDH were identified (B9W14_06825, B9W14_08960, and B9W14_20090), of which one includes a seleno-dependent and two include seleno-independent FDH-coding genes. The two seleno-independent FDH genes (B9W14_06825 and B9W14_08960) were found without any proximal cofactor genes, such as those for ferredoxin or NAD(P)H (Dataset S3). In contrast, the third, seleno-dependent FDH gene (B9W14_20090) was located with six and four cofactor-coding genes upstream and downstream of the gene, respectively. As mentioned above, genes in the cluster consisted of hydrogenases that bifurcate and utilize ferredoxin and NADPH, which is identical to the reactions in the phylogenetically related C. autoethanogenum (Fig. 1A and B) (25). Based on the gene composition of the hydrogenase and FDH, ferredoxin and NADPH reduced by the hydrogenase complexes appear to cooperate with FDH to convert CO2 to formate, as the initial reaction of the WLP (SI Appendix, Fig. S3B).

Using the converted formate, the proteins encoded by WLP-associated genes, which catalyze the reactions downstream of FDH, synthesize acetyl-CoA with an additional CO2. Genes for the carbonyl and methyl branches of the WLP in the C. drakei genome were identified in a single cluster of 15 genes, identical to the WLP clusters in other Clostridium species (Fig. 1B). The first gene in the cluster (acsV: B9W14_22240) encodes the ferredoxin responsible for corrinoid activation/regeneration of the CO dehydrogenase (CODH)/acetyl-CoA synthase (ACS) complex, and the second gene (gcvH: B9W14_22245) encodes the lipoate-containing protein H of the glycine cleavage system.

The next five genes are responsible for the carbonyl branch of the WLP, encoding acetyl-CoA synthase (acsB: B9W14_22250), methyltransferase (acsE: B9W14_22255), corrinoid/iron sulphur protein (acsC: B9W14_22260, acsD: B9W14_22265), and CODH accessory protein (cooC: B9W14_22270). Following the carbonyl branch genes, lpdA (B9W14_22275) encoding dihydrolipoyl dehydrogenase and the five methyl branch-associated genes encoding subunits of methylene-tetrahydrofolate (THF) reductase (metF: B9W14_22280, metV: B9W14_22285), methylene-THF dehydrogenase (folD: B9W14_22290), methenyl-THF cyclohydrolase (fchA: B9W14_22295), and formate-THF synthetase (fhs: B9W14_22300) were identified. Although B9W14_22295 was initially annotated as a sugar ABC transporter substrate-binding protein, the annotation was corrected to methenyl-THF cyclohydrolase based on the high amino acid identity (99%) with this gene in C. scatologenes (26).

The last two genes encode the CODH complex (cooC: B9W14_22305 and acsA: B9W14_22310), which is responsible for converting CO2 to CO, which is then converted into acetyl-CoA with the ACS complex. In addition, an extra cluster composed of CODH-coding genes (B9W14_11690–11700) is located outside of the conserved WLP cluster. Of the three genes, cooF (B9W14_11695) and cooS (B9W14_11700) are responsible for capturing and oxidizing CO, which likely plays an important role during the utilization of CO for acetogenic growth.

The gene composition and organization of the clusters are identical to those in the phylogenetically related C. carboxidivorans and C. scatologenes, with a minimum protein identity of 93% and a maximum E-value of 5E-84. Furthermore, the cluster showed high similarity with the clusters in C. ljungdahlii and C. autoethanogenum, ranging from 73% to 90% identity (27). Based on the identical gene composition and high gene similarity, WLP reactions in C. drakei likely convert CO2 to formate and then to acetyl-CoA with ferredoxin and NADPH, which is similar to the pathways in C. autoethanogenum and C. ljungdahlii.

SI Appendix Text S4

Energy conservation in C. drakei. For autotrophic growth, hydrogenase is essential for reducing CO2 using H2. A total of seven hydrogenase clusters are located in the C. drakei genome: B9W14_10930–10950, B9W14_11370–11395, B9W14_14030–14045, B9W14_14790, B9W14_22055, B9W14_20060–20085, and B9W14_20095. Among these, the first cluster (B9W14_10930–10950) is composed of Ni/Fe hydrogenases along with the Ni-inserting maturation factor; the second cluster contains hydrogenases and formate hydrogenlyase (B9W14_11385) that generate H2 and CO2 from

formate; the third cluster consists of three genes encoding hydrogenase and a single maturation protein for the assembly of the protein; and the fourth and the fifth clusters encode iron-only hydrogenase genes. Notably, the sixth and the seventh clusters are located upstream of formate dehydrogenase (FDH, B9W14_20090) along with ferredoxin- and NADPH-specific electron-bifurcating hydrogenase, similar to the organization in C. autoethanogenum, with a minimum identity of 58% (25).

Energy conservation plays an important role in the autotrophic growth of acetogens, which create transmembrane ion gradients by using proton-dependent cytochromes and proton/sodium ion-translocating ferredoxin:NAD+ oxidoreductase (Rnf), and obtain energy from differences between redox couples. In the C. drakei genome, a single ATP synthase cluster was identified with nine associated genes (B9W14_25495–25535), which showed high similarity to the complex in C. autoethanogenum with a minimum identity of 62% (Fig. 1A). The ATP synthase complex in C. autoethanogenum was reported to utilize translocated H+ to generate ATP. Given the absence of a Na+ binding motif in the c subunit of the complex, as in C. autoethanogenum, the ATP synthase in C. drakei utilizes the H+ gradient generated by the Rnf complex, which translocates ions by oxidizing ferredoxin and reducing NAD+. In the C. drakei genome, six genes encoding Rnf complex proteins (B9W14_04700–04725), which are similar to the complex in C. autoethanogenum with a minimum identity of 64%, were identified. As expected, all of the genes, except the NAD+-reducing gene (rnfC), have membrane signal peptides in their N-terminal sequences, supporting that the complex functions as an integral membrane protein complex. As in other acetogens, the combination of the membrane-bound Rnf complex and ATP synthase generates the energy required during acetogenic growth (SI Appendix, Fig. S3B).

Electron transfer flavoprotein (ETF) and electron-bifurcating transhydrogenase (Nfn) are located in seven clusters (B9W14_01145–01150, B9W14_06520–06525, B9W14_06690, B9W14_06735–06740, B9W14_07270–07275, B9W14_07910–07915, and B9W14_15585–15590) and three clusters (B9W14_04810–04815, B9W14_09425–09430, and B9W14_22080–22085), respectively (Dataset S3). For ETF, two clusters (B9W14_07270–07275 and B9W14_15585–15590) are located proximal to the lactate dehydrogenase genes (B9W14_07280 and B9W14_15580), and the encoded proteins are speculated to accept electrons from the lactate dehydrogenase reaction, as described in studies on A. woodii (28). ETF accepts electrons from NADH and then reduces Fd. In addition, reduced Fd donates electrons to Nfn to reduce NAD+ and NADP+. Although the Nfn protein is encoded as a single peptide in the phylogenetically related C. autoethanogenum and C. ljungdahlii, genes encoding two subunits of Nfn (nfnA and nfnB) were identified in the C. drakei genome, similar to C. kluyveri.

SI Appendix Text S5

Genome-scale model construction and prediction. For the model construction, the gene annotation of C. drakei was initially processed through the Model SEED database (29). As a result, a total of 922 reactions from 771 genes were first reconstructed, based on the genome annotation of C. drakei (29). Then, a homologous reaction search against the C. acetobutylicum, C. thermocellum, C. ljungdahlii, and Escherichia coli models was performed, and reactions with amino acid sequence identity greater than 60% were included (30-32). The Clostridia models were used owing to their close phylogenetic distance to C. drakei, and the Escherichia coli model was used because of its abundant data pool. The reactions of C. drakei that were excluded because of a low similarity score or the absence of information in other models were searched in the BiGG, BioCyc, KEGG, and UniProt databases, and were manually added to the model (8, 33-35).
The functional performance of the constructed model was calculated using a biomass objective function, and the model was then evaluated using constraint-based reconstruction and analysis (COBRA) to identify gaps in the metabolic network that affect the production of necessary biomass components (7, 36). Eventually, a final model with 771 genes, 922 reactions, and 854 metabolites was constructed and named iSL771.
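The homology-based transfer step described above can be sketched as a simple identity filter. This is only an illustration under stated assumptions: the gene names, reaction identifiers, and identity values are hypothetical, and only the 60% cutoff comes from the text. Hits that fail the cutoff are set aside for the manual database search.

```python
# Hypothetical homology hits: (C. drakei gene, template-model reaction, % identity).
# None of these identifiers or values come from the study.
hits = [
    ("gene_A", "RXN_1", 82.5),
    ("gene_B", "RXN_2", 60.0),
    ("gene_C", "RXN_3", 47.1),
    ("gene_D", "RXN_1", 91.0),
]

IDENTITY_CUTOFF = 60.0  # percent; the text includes hits with identity greater than 60%


def split_by_identity(hits, cutoff=IDENTITY_CUTOFF):
    """Map reactions to the genes whose identity exceeds the cutoff; collect the rest."""
    accepted = {}
    rejected = []
    for gene, reaction, identity in hits:
        if identity > cutoff:
            accepted.setdefault(reaction, []).append(gene)
        else:
            rejected.append(gene)
    return accepted, rejected


accepted, needs_manual_curation = split_by_identity(hits)
print(accepted)
print(needs_manual_curation)
```

A strict "greater than" comparison is used here because that is how the cutoff is worded; a hit at exactly 60% is therefore left for manual curation.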

Using the iSL771 model, the functional capability of C. drakei under heterotrophic and autotrophic conditions was validated against its growth under these conditions. The growth rate of C. drakei in the exponential phase was 0.184 h−1 under the heterotrophic condition and 0.044 h−1 under the autotrophic condition (SI Appendix, Fig. S5A). Fructose was completely consumed by 48 h, which was the time point of the highest OD 600 nm (SI Appendix, Fig. S5B). A total of 73.3 mM acetate and 7.0 mM butyrate accumulated over a 143 h span in the heterotrophic condition, and a total of 11.1 mM acetate was produced in the autotrophic condition (SI Appendix, Fig. S5B). To compare the experimental results with the simulation generated by the model, acetate production rates were calculated in mmol per gram of dry weight per hour (mmol/gDW/h); the experimental acetate production rates were 4.524 and 3.278 mmol/gDW/h. The

experimentally validated consumption rates were used as constraints and resulted in in silico growth rates of 0.178 and 0.046 h−1 for fructose and H2/CO2, respectively, which were in good agreement with the experimental growth rates (SI Appendix, Fig. S4B). In addition, the predicted acetate production rates were 4.490 and 3.291 mmol/gDW/h for fructose and H2/CO2, respectively (SI Appendix, Fig. S4B). Following the validation, the ATP balance of the model was simulated under the heterotrophic and autotrophic conditions, yielding 1 Fructose → 3 Acetate + 4.380 ATP for the heterotrophic condition and 4 H2 + 2 CO2 → 1 Acetate + 0.875 ATP for the autotrophic condition.

SI Appendix Text S6

Transcriptome analysis. For RNA-Seq, C. drakei was cultured at 30 °C in DSM 135 media supplemented with 5 g/L glucose as the heterotrophic condition or purged with H2/CO2 (80:20) at 200 kPa as the autotrophic condition. For sampling, the mid-exponential phase was selected, corresponding to an optical density at 600 nm (OD 600 nm) of 1.15 at 42 h and 0.198 at 32 h for the heterotrophic and autotrophic conditions, respectively. At these time points, 22.91 and 5.70 mM acetate production was observed in the respective conditions. Subsequently, ribosomal RNAs were removed from the total RNAs obtained from the heterotrophic and autotrophic conditions, and the remaining RNA was converted into cDNA libraries. Using the Illumina sequencing platform, a total of 10.6 million reads were generated, which were trimmed to 10.2 million reads with an average length of 137.0 bp, mapped onto the C. drakei genome, and normalized using the Bioconductor package DESeq2 (Dataset S5) (37). The normalized RNA-Seq data were reproducible, with high correlation between the biological duplicates (Pearson correlation coefficient r2 > 0.92) as determined by hierarchical clustering (SI Appendix, Fig. S6A). Based on the analysis, transcriptional regulation in C. drakei appeared to alter cellular function between the heterotrophic and the autotrophic conditions.
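The normalization step can be illustrated with the median-of-ratios size-factor calculation that DESeq2 applies by default. The sketch below reproduces only that one step in plain Python, not the full DESeq2 workflow, and the count table is hypothetical.

```python
import math
from statistics import median


def size_factors(counts):
    """Median-of-ratios size factors. counts maps gene -> raw counts, one per sample."""
    n_samples = len(next(iter(counts.values())))
    log_ratios = [[] for _ in range(n_samples)]
    for gene_counts in counts.values():
        if any(c <= 0 for c in gene_counts):
            continue  # genes with a zero count in any sample are skipped
        log_geo_mean = sum(math.log(c) for c in gene_counts) / n_samples
        for j, c in enumerate(gene_counts):
            log_ratios[j].append(math.log(c) - log_geo_mean)
    return [math.exp(median(r)) for r in log_ratios]


def normalize(counts):
    """Divide every raw count by the size factor of its sample."""
    factors = size_factors(counts)
    return {g: [c / f for c, f in zip(cs, factors)] for g, cs in counts.items()}


# Hypothetical counts for two samples; sample 2 was sequenced twice as deeply.
counts = {"g1": [100, 200], "g2": [50, 100], "g3": [10, 20], "g4": [0, 5]}
factors = size_factors(counts)
normalized = normalize(counts)
print(factors)
print(normalized["g1"])
```

Because each size factor is a median over genes, a few strongly regulated genes have little influence on the scaling, which is the usual reason for preferring this over scaling by total read count.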

To validate the RNA-Seq result, gene expression of eight genes was investigated using qPCR analysis. GTPase (B9W14_02065) and transcription termination factor (B9W14_25420) coding genes were used as reference genes, which showed insignificant transcriptional change in both conditions. Glucokinase (B9W14_03705), a subunit of the Rnf complex (B9W14_04700), hydrogenase (B9W14_20060), CODH (B9W14_22270), formate-tetrahydrofolate ligase (B9W14_22300), and a subunit of ATP synthase (B9W14_25535), which exhibited fold change (log2) variation between heterotrophic and autotrophic condition from −0.27 to 4.54, were selected. The qPCR expression values of the six genes were calculated into fold changes between the conditions, then compared to the fold change of the RNA-Seq data, showing a positive correlation (Pearson correlation coefficient r2 = 0.91) (SI Appendix, Fig. S6B). The high correlation value validated the RNA-Seq results as accurate and reproducible.
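The comparison between qPCR and RNA-Seq fold changes can be sketched as follows. The text does not state how the qPCR fold changes were computed, so the ΔΔCt method (which assumes full amplification efficiency) is an assumption here. All Ct and fold-change values below are hypothetical.

```python
import math


def log2_fold_change(ct_target_a, ct_ref_a, ct_target_b, ct_ref_b):
    """log2 fold change of condition B over condition A by the ddCt method (assumed)."""
    d_ct_a = ct_target_a - ct_ref_a  # target normalized to reference, condition A
    d_ct_b = ct_target_b - ct_ref_b  # target normalized to reference, condition B
    return -(d_ct_b - d_ct_a)        # a lower Ct means higher expression


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical log2 fold changes for four genes measured by both methods.
rnaseq_log2fc = [-0.3, 1.0, 2.0, 4.5]
qpcr_log2fc = [0.0, 1.0, 2.5, 4.0]
r = pearson_r(rnaseq_log2fc, qpcr_log2fc)
print(r, r ** 2)
```

Printing both r and its square makes it explicit which quantity is being compared against a reported threshold.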

The significantly changed genes, based on two-fold change and P-value lower than 0.05, were analyzed using COGs (SI Appendix, Fig. S7A and Dataset S7). The COG classification provides insight towards understanding the change of the cellular functions. The COG analysis revealed cell motility (N), energy production and conversion (C), secondary metabolite biosynthesis (Q), carbohydrate transport and metabolism (G), and amino acid transport and metabolism (E) as reflecting the most upregulated gene proportion with 80.0%, 72.3%, 66.7%, 64.4%, and 63.9%, respectively (SI Appendix, Fig. S7B). In contrast, intracellular trafficking, secretion, and vesicular transport (U), translation, ribosomal structure, and biogenesis (J), defense mechanisms (V), replication, recombination and repair (L), and cell wall/membrane/envelope biogenesis (M) contained the greatest proportions of downregulated genes with 100.0%, 92.9%, 90.0%, 85.0%, and 80.49%, respectively (SI Appendix, Fig. S7B). Taken together, the results indicated that C. drakei transcriptionally activates energy production and conservation-associated genes in an effort to maintain the energy necessary during CO2 fixation, but regulates energy expense for cell wall biosynthesis, cell replication, and translation process, which translates to the low growth rate in the autotrophic condition.
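The selection and tallying described above can be sketched as a filter followed by a per-category count. Two points are assumptions: a two-fold change is read as an absolute log2 fold change of at least 1, and each percentage is read as a share of the significantly changed genes in that COG category. The gene records are hypothetical.

```python
from collections import defaultdict


def significant(gene, min_abs_log2fc=1.0, max_p=0.05):
    """At least two-fold change (|log2FC| >= 1, assumed) and P-value below 0.05."""
    return abs(gene["log2fc"]) >= min_abs_log2fc and gene["p"] < max_p


def cog_direction_shares(genes):
    """Per COG category, the percentage of significant genes that go up or down."""
    tally = defaultdict(lambda: {"up": 0, "down": 0})
    for g in genes:
        if not significant(g):
            continue
        direction = "up" if g["log2fc"] > 0 else "down"
        tally[g["cog"]][direction] += 1
    shares = {}
    for cog, t in tally.items():
        total = t["up"] + t["down"]
        shares[cog] = {
            "up_pct": 100 * t["up"] / total,
            "down_pct": 100 * t["down"] / total,
        }
    return shares


# Hypothetical records; none of these values come from the datasets.
genes = [
    {"id": "g1", "cog": "C", "log2fc": 2.1, "p": 1e-5},
    {"id": "g2", "cog": "C", "log2fc": 1.4, "p": 0.01},
    {"id": "g3", "cog": "C", "log2fc": -1.2, "p": 0.03},
    {"id": "g4", "cog": "C", "log2fc": 3.0, "p": 0.20},   # fails the P-value cutoff
    {"id": "g5", "cog": "J", "log2fc": -2.5, "p": 1e-4},
    {"id": "g6", "cog": "J", "log2fc": 0.4, "p": 0.001},  # fails the fold-change cutoff
]
shares = cog_direction_shares(genes)
print(shares)
```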

Gene expression of the energy conservation complexes also showed transcriptional changes. During autotrophic growth, transcription of the Rnf complex genes increased with a minimum fold change of 2.69 (DESeq P < 4.19 × 10−20) (SI Appendix, Fig. S8A, B, and Dataset S9), which was consistent with the transcriptional change in C. autoethanogenum but in contrast to that in C. ljungdahlii (38, 39). Despite the change of the Rnf complex, the transcriptional abundance of the genes encoding the ATP synthase complex was insignificantly upregulated with a minimum fold change of 0.69 (DESeq P < 6.28 × 10−2) (SI Appendix, Fig. S8A, B, and Dataset S9). One possible explanation for the minimal change may be the active transcription of the complex in heterotrophic conditions with minimum

normalized reads of 192.27, which was more than the transcriptional expression of other energy conservation system-coding genes in this condition. Unlike in other reported acetogens, the ATP synthase is transcriptionally active in the presence of fructose, as well as under H2/CO2, whereas gene expression of the Rnf complex is active only during autotrophic conditions (32, 38, 39).

Additional energy conservation complexes, such as electron transfer flavoprotein (ETF) and electron-bifurcating transhydrogenase (Nfn), which reduce and oxidize Fd, NAD+, and NADP+, were also identified in C. drakei (SI Appendix, Fig. S8; see Text S4 for a detailed description) (24, 40, 41). Of the seven ETF clusters, two (B9W14_07270–07275 and B9W14_15585–15590), which are located in the lactate dehydrogenase (LDH) cluster, were transcriptionally activated with a minimum fold change of 6.93 (DESeq P < 8.86 × 10−61) (SI Appendix, Fig. S8 and Dataset S9). Although lactate was not detected in either condition, gene expression of LDH, along with the genes located within the cluster, was significantly upregulated with a minimum fold change of 6.04 (DESeq P < 1.82 × 10−56), indicating that the whole gene cluster is required for oxidizing NADH to reduce Fd. Consistent with the gene expression of ETF, the transcriptional abundance of the NADP+-reducing Nfn genes was significantly upregulated with a minimum fold change of 2.04 (DESeq P < 2.15 × 10−3) in two clusters (B9W14_09425–09430 and B9W14_22080–22085) (Dataset S9). Collectively, these results suggest that C. drakei selectively transcribes two ETF and two Nfn gene clusters to generate and distribute the redox energy required under autotrophic conditions.

To fully comprehend the systemic change of C. drakei under the autotrophic condition, the transcriptional abundance of central metabolism genes, such as those of glycolysis and the TCA cycle, was examined (Dataset S11). Notably, of the genes responsible for the glycolysis pathway, the fbp gene, which encodes the protein responsible for the fructose 1,6-bisphosphate to fructose 6-phosphate reaction, a major regulator of gluconeogenesis, was upregulated with a fold change of 3.20 (DESeq P < 3.76 × 10−35) (Dataset S11). In addition, the gluconeogenesis-associated glyceraldehyde-3-phosphate dehydrogenase (B9W14_07390: gapA) and pyruvate ferredoxin oxidoreductase complex (B9W14_0910–09025 and B9W14_23275–23290: por) coding genes were significantly upregulated with 8.27- and over 4.22-fold changes, respectively (DESeq P < 1.17 × 10−38) (Dataset S11). Consistent with this result, the pyruvate phosphate kinase (B9W14_02085: ppsA) and phosphoenolpyruvate carboxykinase (B9W14_01175: pckA) coding genes, which encode rate-limiting steps of gluconeogenesis, were significantly upregulated with a minimum fold change of 1.80 (DESeq P < 9.31 × 10−13) (42). In contrast, the other genes encoding glycolysis factors either remained unchanged or were significantly downregulated (SI Appendix, Fig. S8A, B, and Dataset S11). This indicated that during acetogenesis in the autotrophic condition, glycolysis was repressed whereas gluconeogenesis was upregulated, perhaps to provide precursors derived from the glycolytic pathway that were lacking under the heterotrophic condition (38, 43). In addition, most of the TCA cycle-coding genes were significantly downregulated, except the gene cluster containing the 2-oxoacid:ferredoxin oxidoreductase complex-encoding genes (B9W14_12115–12130: kor) and the succinate dehydrogenase gene (B9W14_10620), which were significantly upregulated with a fold change greater than 2.50 (DESeq P < 1.25 × 10−10) (Dataset S11).

Supplementary References

1. Y. Jeong, Y. Song, H. S. Shin, B. K. Cho, Draft genome sequence of acid-tolerant Clostridium drakei SL1T, a potential chemical producer through syngas fermentation. Genome Announc. 2 (2014).

2. S. Koren et al., Hybrid error correction and de novo assembly of single-molecule sequencing reads. Nat. Biotechnol. 30, 693-700 (2012).

3. T. M. Lowe, S. R. Eddy, tRNAscan-SE: a program for improved detection of transfer RNA genes in genomic sequence. Nucleic Acids Res. 25, 955-964 (1997).

4. K. Lagesen et al., RNAmmer: consistent and rapid annotation of ribosomal RNA genes. Nucleic Acids Res. 35, 3100-3108 (2007).

5. J. Huerta-Cepas et al., eggNOG 4.5: a hierarchical orthology framework with improved functional annotations for eukaryotic, prokaryotic and viral sequences. Nucleic Acids Res. 44, D286-293 (2016).

6. M. Kanehisa, Y. Sato, K. Morishima, BlastKOALA and GhostKOALA: KEGG Tools for functional characterization of genome and metagenome sequences. J. Mol. Biol. 428, 726-731 (2016).

7. J. Schellenberger et al., Quantitative prediction of cellular metabolism with constraint-based models: the COBRA Toolbox v2.0. Nat. Protoc. 6, 1290-1307 (2011).

8. M. Kanehisa, S. Goto, KEGG: kyoto encyclopedia of genes and genomes. Nucleic Acids Res. 28, 27-30 (2000).

9. R. Caspi et al., The MetaCyc database of metabolic pathways and enzymes and the BioCyc collection of pathway/genome databases. Nucleic Acids Res. 44, D471-480 (2016).

10. The UniProt Consortium, UniProt: the universal protein knowledgebase. Nucleic Acids Res. 45, D158-D169 (2017).

11. Z. A. King et al., BiGG Models: A platform for integrating, standardizing and sharing genome-scale models. Nucleic Acids Res. 44, D515-522 (2016).

12. A. M. Feist, B. O. Palsson, The biomass objective function. Curr. Opin. Microbiol. 13, 344-349 (2010).

13. J. D. Orth, I. Thiele, B. O. Palsson, What is flux balance analysis? Nat. Biotechnol. 28, 245-248 (2010).

14. N. Zamboni, S. M. Fendt, M. Ruhl, U. Sauer, 13C-based metabolic flux analysis. Nat. Protoc. 4, 878-892 (2009).

15. C. S. Chin et al., Nonhybrid, finished microbial genome assemblies from long-read SMRT sequencing data. Nat. Methods. 10, 563-569 (2013).

16. T. Tatusova et al., NCBI prokaryotic genome annotation pipeline. Nucleic Acids Res. 44, 6614-6624 (2016).

17. J. S. Liou, D. L. Balkwill, G. R. Drake, R. S. Tanner, Clostridium carboxidivorans sp. nov., a solvent-producing clostridium isolated from an agricultural settling lagoon, and reclassification of the acetogen Clostridium scatologenes strain SL1 as Clostridium drakei sp. nov. Int. J. Syst. Evol. Microbiol. 55, 2085-2091 (2005).

18. K. Kusel, T. Dorsch, G. Acker, E. Stackebrandt, H. L. Drake, Clostridium scatologenes strain SL1 isolated as an acetogenic bacterium from acidic sediments. Int. J. Syst. Evol. Microbiol. 50 Pt 2, 537-546 (2000).

19. E. Stackebrandt, I. Kramer, J. Swiderski, H. Hippe, Phylogenetic basis for a taxonomic dissection of the genus Clostridium. FEMS Immunol. Med. Microbiol. 24, 253-258 (1999).

20. J. M. Bruno-Barcena, M. S. Chinn, A. M. Grunden, Genome sequence of the autotrophic acetogen Clostridium autoethanogenum JA1-1 strain DSM 10061, a producer of ethanol from carbon monoxide. Genome Announc. 1 (2013).

21. J. A. Hoch, Regulation of the phosphorelay and the initiation of sporulation in Bacillus subtilis. Annu. Rev. Microbiol. 47, 441-465 (1993).

22. Y. Song et al., Determination of the genome and primary transcriptome of syngas fermenting Eubacterium limosum ATCC 8486. Sci. Rep. 7, 13694 (2017).

23. I. H. Huang, M. Waters, R. R. Grau, M. R. Sarker, Disruption of the gene (spo0A) encoding sporulation transcription factor blocks endospore formation and enterotoxin production in enterotoxigenic Clostridium perfringens type A. FEMS Microbiol. Lett. 233, 233-240 (2004).

24. K. Schuchmann, V. Muller, Autotrophy at the thermodynamic limit of life: a model for energy conservation in acetogenic bacteria. Nat. Rev. Microbiol. 12, 809-821 (2014).

25. S. Wang et al., NADP-specific electron-bifurcating [FeFe]-hydrogenase in a functional complex with formate dehydrogenase in Clostridium autoethanogenum grown on CO. J. Bacteriol. 195, 4373-4386 (2013).

26. G. Bruant, M. J. Levesque, C. Peter, S. R. Guiot, L. Masson, Genomic analysis of carbon monoxide utilization and butanol production by Clostridium carboxidivorans strain P7T. PLoS One 5, e13033 (2010).

27. A. Poehlein et al., The complete genome sequence of Clostridium aceticum: a missing link between rnf- and cytochrome-containing autotrophic acetogens. MBio 6, e01168-01115 (2015).

28. M. C. Weghoff, J. Bertsch, V. Muller, A novel mode of lactate metabolism in strictly anaerobic bacteria. Environ. Microbiol. 17, 670-677 (2015).

29. C. S. Henry et al., High-throughput generation, optimization and analysis of genome-scale metabolic models. Nat. Biotechnol. 28, 977-982 (2010).

30. J. Lee, H. Yun, A. M. Feist, B. O. Palsson, S. Y. Lee, Genome-scale reconstruction and in silico analysis of the Clostridium acetobutylicum ATCC 824 metabolic network. Appl. Microbiol. Biotechnol. 80, 849-862 (2008).

31. S. B. Roberts, C. M. Gowen, J. P. Brooks, S. S. Fong, Genome-scale metabolic analysis of Clostridium thermocellum for bioethanol production. BMC Syst. Biol. 4, 31 (2010).

32. H. Nagarajan et al., Characterizing acetogenic metabolism using a genome-scale metabolic reconstruction of Clostridium ljungdahlii. Microb. Cell Fact. 12, 118 (2013).

33. Z. A. King et al., BiGG Models: A platform for integrating, standardizing and sharing genome-scale models. Nucleic Acids Res. 44, D515-522 (2016).

34. R. Caspi et al., The MetaCyc database of metabolic pathways and enzymes and the BioCyc collection of pathway/genome databases. Nucleic Acids Res. 44, D471-480 (2016).

35. The UniProt Consortium, UniProt: the universal protein knowledgebase. Nucleic Acids Res. 45, D158-D169 (2017).

36. N. E. Lewis, H. Nagarajan, B. O. Palsson, Constraining the metabolic genotype-phenotype relationship using a phylogeny of in silico methods. Nat. Rev. Microbiol. 10, 291-305 (2012).

37. S. Anders, W. Huber, Differential expression analysis for sequence count data. Genome Biol. 11, R106 (2010).

38. E. Marcellin et al., Low carbon fuels and commodity chemicals from waste gases - systematic approach to understand energy metabolism in a model acetogen. Green Chem. 18, 3020-3028 (2016).

39. Y. Tan, J. Liu, X. Chen, H. Zheng, F. Li, RNA-seq-based comparative transcriptome analysis of the syngas-utilizing bacterium Clostridium ljungdahlii DSM 13528 grown autotrophically and heterotrophically. Mol. Biosyst. 9, 2775-2784 (2013).

40. F. Li et al., Coupled ferredoxin and crotonyl coenzyme A (CoA) reduction with NADH catalyzed by the butyryl-CoA dehydrogenase/Etf complex from Clostridium kluyveri. J. Bacteriol. 190, 843-850 (2008).

41. S. Wang, H. Huang, J. Moll, R. K. Thauer, NADP+ reduction with reduced ferredoxin and NADP+ reduction with NADH are coupled via an electron-bifurcating enzyme complex in Clostridium kluyveri. J. Bacteriol. 192, 5115-5123 (2010).

42. Y. P. Chao, R. Patnaik, W. D. Roof, R. F. Young, J. C. Liao, Control of gluconeogenic growth by pps and pck in Escherichia coli. J. Bacteriol. 175, 6939-6944 (1993).

43. I. A. Berg et al., Autotrophic carbon fixation in archaea. Nat. Rev. Microbiol. 8, 447-460 (2010).