Methodological impact on metagenomics analyses: the skin microbiome
and beyond
Cyrille Jarrin1, Patrick Robe1, Daniel Auriol1, David Villanova1, Kuno Schweikert2
1 Libragen, Canal Biotech, 3 rue des Satellites, F-31400 Toulouse, France
2 Induchem, Industriestrasse 8a, CH-8604 Volketswil, Switzerland
Introduction
The term metagenome was coined by Handelsman and coworkers in 1998 [1] and was
defined as the collective genome of microflora found in a defined environment.
Metagenomics is a culture-independent molecular approach to analyzing environmental
samples of cohabiting microbial populations; it has given biologists access to the large
“hard to isolate” fraction of environmental microbial communities. The initial studies aiming at
the characterization of the full extent of microbial diversity included DNA purification,
fragmentation and cloning in an easily cultivable host. The recombinant cells were
subsequently grown to obtain sufficient DNA for the downstream analyses. These analyses
were designed to elucidate taxonomic composition (16S rRNA) or to identify functional
properties (in particular through sometimes sophisticated methods for detecting enzymatic
activities). With the advent of so-called next-generation sequencing technologies
(NGS), metagenome cloning is no longer necessary since the required amounts of DNA are
far smaller. Although nowadays often inaccurately restricted to the study of environmental
samples by sequencing [2], metagenomics has brought a considerable increase in
knowledge of both the taxonomic and functional microbial diversity of natural ecosystems.
Moreover, the exploration of microbial communities associated with human body sites (gut,
skin, mouth, vagina …) enables the deciphering of close relationships between human health
and inhabiting microbiota (e.g. [3]).
Realizing the potential of metagenomics for discovering novel genes from the as yet untapped
microbial diversity, libragen has for 15 years carried out analyses of microbial communities
[4] as well as research programs aimed at discovering new, efficient biocatalysts
and metabolic pathways that address industrial issues [5-6].
The apparent practical simplicity of DNA extraction from natural samples using dedicated
commercial kits, together with the rapid spread of NGS facilities, makes it possible to define
the genetic diversity of bacterial communities and to predict the associated gene functions.
However, the procedures applied in the publications are not standardized and are sometimes
poorly described. As a consequence, results can hardly be compared, in particular when a
defined environment such as the human body is considered.
Many sources of technical bias that can significantly affect the
results have been identified: experimental design, sampling, sample storage [7], insufficient
purity of the extracted nucleic acids, inappropriate selection of the 16S variable region [8], poor
choice of primer set [9], insufficient control of the libraries produced, inappropriate raw data
processing [10] and deficient statistical analyses.
The objective of libragen was to evaluate more precisely the impact of sample storage
conditions, DNA extraction methods and primer set selection on the observed
taxonomic profiles of two selected environments, the human gut and human skin. The aim is
to provide detailed and reliable methodological information for establishing an experimental
plan dedicated to the characterization of microbiomes.
Materials and Methods
Microbiota sampling and storage
Gut microbiota. Stool from a healthy adult volunteer was freshly collected. Three different
starting materials were considered for DNA extraction: 1) a fraction of the fresh stool, extracted
using different commercial kits (see DNA extraction section); 2) a fraction of the fresh stool
that had first been frozen (-20°C); 3) two fractions of the initial fresh stool treated with
commercial storage kits. The first fraction was treated with the Omnigene® Gut kit (DNA Genotek, Ottawa,
Canada) and the second with the PSP® Spin stool DNA plus kit (Stratec Molecular GmbH,
Berlin, Germany). Both resulting media were kept at 4°C until DNA extraction.
Skin microbiota. Non-invasive skin samples were collected from the forehead and forearm of
healthy volunteers by swabbing with sterile 5 × 5 cm gauze pre-moistened with a sterile
solution of 0.15 M NaCl and 0.1% Tween 20. Gauze samples were placed in a 50 mL plastic
tube and frozen at -20°C until DNA extraction.
DNA extraction
Gut microbiota. Four different commercial kits were used for DNA extraction following the
manufacturers' recommendations:
PowerLyzer® PowerSoil® DNA Isolation Kit (MO BIO Laboratories, Inc., Carlsbad, USA)
QIAamp Fast DNA Stool Kit (Qiagen, Valencia, USA)
PSP® Spin stool DNA plus kit (Stratec Molecular GmbH, Berlin, Germany)
FastDNA™ SPIN Kit for Feces (MP Biomedicals, Santa Ana, USA).
Skin microbiota. Each tube was filled completely with NaCl-Tween 20 solution and shaken
horizontally for 15 min at 800 rpm. The NaCl-Tween 20 suspension was transferred to new tubes
and the gauzes were spun dry (in a biological safety cabinet) to collect most of the suspension
volume. Suspensions were then centrifuged at 8000 rpm for 30 min to obtain a cell pellet. DNA
extraction was performed using different methods based on mechanical lysis:
1) In house method. Briefly, the cell pellet was resuspended in a hexadecyltrimethylammonium
bromide (CTAB) extraction buffer (0.5 mL) and split into two equal parts
for duplicate DNA extractions. An equal volume of phenol-chloroform-isoamyl alcohol
(25:24:1) was added to each cell suspension. Cells were lysed for 30 s at 5.5 m/s using the
FastPrep FP120 bead beating system (Bio-101, Vista, California). Samples were
centrifuged for 10 min at 4°C and 11 000 rpm. The aqueous phase was collected and DNA
was precipitated with 2 volumes of 100% ethanol and 1/10 volume of 5 M NaCl at
4°C overnight. Purification was then performed using the Illustra GFX purification kit (GE
HealthCare, Pittsburgh, USA) to obtain highly purified DNA.
2) Commercial method. The PowerLyzer® PowerSoil® DNA Isolation Kit (MO BIO
Laboratories, Inc., Carlsbad, USA) was used, according to the recommended procedure.
Sequencing
16S rRNA gene sequencing
Illumina technology. Sequencing was performed on the MiSeq device (Illumina, Inc., San
Diego, CA, USA) in a 600-cycle paired-end run, targeting three 16S variable regions:
the V1V2 region; amplicon length: 350 bp,
the V3V4 region (16S-Mi341F forward primer 5’-CCTACGGGNGGCWGCAG-3’ and
16S-Mi805R reverse primer 5’-GACTACHVGGGTATCTAATCC-3’), producing about
460 bp amplicons,
and the V4V5 region; amplicon lengths: 425 bp (V4V5a) and 470 bp (V4V5b).
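The V3V4 primers quoted above contain IUPAC ambiguity codes (N, W, H, V), so each primer is in practice a mixture of several unambiguous oligonucleotides. The following minimal sketch, which is an illustration and not part of the published protocol, expands a degenerate primer into the sequences it encodes; the function name is hypothetical.

```python
from itertools import product

# IUPAC nucleotide codes mapped to the bases they stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

def expand_primer(primer):
    """Return every unambiguous sequence encoded by a degenerate primer."""
    choices = [IUPAC[base] for base in primer.upper()]
    return ["".join(combo) for combo in product(*choices)]

forward_341f = "CCTACGGGNGGCWGCAG"      # 16S-Mi341F, as given in the text
reverse_805r = "GACTACHVGGGTATCTAATCC"  # 16S-Mi805R, as given in the text

forward_variants = expand_primer(forward_341f)
reverse_variants = expand_primer(reverse_805r)
print(len(forward_variants), "forward variants,", len(reverse_variants), "reverse variants")
```

The degeneracy is simply the product of the number of bases allowed at each ambiguous position, which is one reason why different primer sets can recover different taxa from the same DNA.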
PCR1 reactions were performed as follows: 4 µL of template DNA (20 ng) were mixed with 0.6 µL of
each of the forward and reverse primers (10 µM), 6 µL of KAPA HiFi Fidelity Buffer (5X), 0.9 µL of
KAPA dNTP Mix (10 mM each), 6.5 µL of distilled water (DH2O), and 0.6 µL of KAPA HiFi
hotstart Taq (1 U/µL), for a total volume of 30 µL. Each amplification was duplicated, and
duplicates were pooled after amplification. The PCR1 program consisted of 95°C for 3 min, then
27 cycles of 95°C for 30 s, 59°C for 30 s, and 72°C for 30 s, followed by a final extension at
72°C for 5 min, on an MJ Research PTC200 thermocycler. Negative controls were included
in all steps to check for contamination. All duplicate pools were checked by gel
electrophoresis, and amplicons were quantified by fluorometry.
Sequencing-ready libraries were then produced following the Illumina guidelines for 16S
metagenomics library preparation. Briefly, the PCR1 amplicons were purified and checked
using an Agilent 2100 Bioanalyzer (Agilent Technologies, Santa Clara, USA). To enable the
simultaneous analysis of multiple samples (multiplexing), Nextera® XT indexes (Illumina)
were added during PCR2 using 15 to 30 ng of PCR1 amplicons. The PCR2 program
consisted of 94°C for 1 min, then 12 cycles of 94°C for 60 s, 65°C for 60 s, and 72°C for
60 s, followed by a final extension at 72°C for 10 min. Indexed libraries were purified,
quantified and checked using an Agilent 2100 Bioanalyzer. Validated indexed libraries were
pooled to obtain an equimolar mixture.
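As an illustration of what equimolar pooling involves, the sketch below converts library concentrations into molarities and derives the volume of each library that carries the same molar amount. It assumes the common approximation of 660 g/mol per base pair of double-stranded DNA; the library names, concentrations and lengths are hypothetical and are not taken from this study.

```python
# Average molar mass of one base pair of double-stranded DNA (g/mol); a common approximation.
AVG_BP_MASS = 660.0

def molarity_nm(conc_ng_per_ul, mean_length_bp):
    """Convert a library concentration (ng/µL) into nanomolar (nM, i.e. fmol/µL)."""
    return conc_ng_per_ul * 1e6 / (AVG_BP_MASS * mean_length_bp)

def pooling_volumes_ul(libraries, fmol_per_library):
    """Volume of each library (µL) that contributes the same molar amount."""
    return {
        name: fmol_per_library / molarity_nm(conc, length)
        for name, (conc, length) in libraries.items()
    }

# Hypothetical libraries: name -> (concentration in ng/µL, mean fragment length in bp)
example_libraries = {"lib_A": (6.6, 500), "lib_B": (3.3, 500), "lib_C": (13.2, 1000)}
volumes = pooling_volumes_ul(example_libraries, fmol_per_library=40.0)
for name, volume in volumes.items():
    print(f"{name}: {volume:.2f} µL")
```

A library that is twice as concentrated in mass but also twice as long has the same molarity, which is why the Bioanalyzer size estimate matters as much as the quantification.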
The run was performed on a MiSeq sequencer (Illumina) using the MiSeq Reagent Kit v3 (600
cycles; Illumina). This kit yields up to 25 million paired-end reads (2 × 300 bases), i.e. up
to 15 gigabases. Library preparation and the MiSeq run were performed by libragen at the
GeT-PlaGe platform (INRA, Auzeville, France).
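The yield figures quoted above can be checked with a short calculation, sketched here. The number of multiplexed samples (96) is a hypothetical value chosen only to show how the per-sample depth follows from the run output.

```python
def run_yield_bases(read_pairs, read_length, reads_per_pair=2):
    """Total number of sequenced bases for a paired-end run."""
    return read_pairs * reads_per_pair * read_length

def mean_pairs_per_sample(read_pairs, n_samples):
    """Average number of read pairs per sample if the pool is perfectly balanced."""
    return read_pairs // n_samples

total_bases = run_yield_bases(read_pairs=25_000_000, read_length=300)
print(total_bases / 1e9, "gigabases")

# 96 samples is an assumption for illustration, not a figure from the study.
print(mean_pairs_per_sample(25_000_000, 96), "read pairs per sample")
```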
After the MiSeq run, raw sequence data were demultiplexed and quality-checked to remove all
reads with ambiguous bases. Index and primer sequences were then trimmed, and the
forward and reverse reads were paired. The paired sequences were then processed with the
QIIME pipeline [11] to remove chimeras and reads with PCR errors. Good-quality paired
sequences were mapped to the RDP database (Release 11, update 3;
http://rdp.cme.msu.edu/) for taxonomic assignment. Assigned sequences were finally grouped
into Operational Taxonomic Units (OTUs) at a 3% dissimilarity level.
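To make the 3% dissimilarity threshold concrete, here is a toy greedy clustering sketch. It is not the QIIME implementation: it assumes pre-aligned sequences of equal length and measures dissimilarity as the fraction of mismatching positions, whereas real pipelines rely on proper alignments and abundance-sorted input. All names and sequences are hypothetical.

```python
def dissimilarity(a, b):
    """Fraction of mismatching positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("this toy measure needs pre-aligned, equal-length sequences")
    mismatches = sum(x != y for x, y in zip(a, b))
    return mismatches / len(a)

def greedy_otus(sequences, max_dissimilarity=0.03):
    """Assign each sequence to the first OTU seed within the threshold,
    or open a new OTU seeded by that sequence."""
    seeds = []       # one representative sequence per OTU
    assignment = []  # OTU index of each input sequence, in input order
    for seq in sequences:
        for index, seed in enumerate(seeds):
            if dissimilarity(seq, seed) <= max_dissimilarity:
                assignment.append(index)
                break
        else:
            seeds.append(seq)
            assignment.append(len(seeds) - 1)
    return seeds, assignment

reference = "ACGT" * 25                   # 100 nt toy sequence
near_copy = "TT" + reference[2:]          # 2 mismatches, i.e. 2% dissimilarity
distant = "T" * 10 + reference[10:]       # 8 mismatches, i.e. 8% dissimilarity
seeds, assignment = greedy_otus([reference, near_copy, distant])
print(assignment)  # [0, 0, 1]
```

Because the outcome of a greedy scheme depends on the order of the input, two analyses of the same reads can produce slightly different OTU tables, which is one more reason to report data processing in detail.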
Roche technology. Sequencing was carried out by pyrosequencing.
It was performed by Beckman Coulter Genomics (Danvers, MA,
USA) on the 454 GS FLX platform, targeting two 16S variable regions: the
V1V3 region (16S-0027F forward primer 5’-AGAGTTTGATCCTGGCTCAG-3’ and
16S-0533R reverse primer 5’-TTACCGCGGCTGCTGGCAC-3’), producing about 520 bp
amplicons, and the V4V6 region (16S-0515F forward primer 5’-TGYCAGCMGCCGCGGTA-3’
and 16S-1061R reverse primer 5’-TCACGRCACGAGCTGACG-3’), producing about 560 bp
amplicons. To enable the simultaneous analysis of multiple samples (multiplexing), GS FLX
Standard Multiplex Identifiers (MID, Roche Life Sciences, Indianapolis, IN, USA) were tailed
to each end of the primers.
The PCRs were carried out as follows: 3 ng of each template DNA were mixed with 2.5 µL of
each of the forward and reverse primers (10 µM), 1.25 units of Promega Taq polymerase (Promega
Corporation, Madison, USA) and 7.9 µL of distilled water (DH2O). The PCR program consisted of
95°C for 3 min, then 25 cycles of 95°C for 1 min, 58°C for 30 s, and 72°C for 1 min,
followed by a final extension at 72°C for 5 min, on an MJ Research PTC200 thermocycler
(Ramsey, MN, USA). Negative controls were included in all steps to check for contamination.
Bioinformatic processing of the raw sequence data was carried out by Beckman Coulter
Genomics. It included demultiplexing of the sequenced samples, assembly of the forward and
reverse sequences (clustering step), Blastn analysis (i.e. searching nucleotide databases
using nucleotide queries [12]) of both clustered and singleton reads against a curated copy
of the 16S RDP database [13], and taxonomic assignment through comparison of the taxonomy
and scores of the 25 best hits for each blasted sequence (MEGAN4) [14].
Whole (meta)Genome Sequencing (WGS)
25 ng of metagenomic DNA were used for library preparation. Metagenomic DNA was
fragmented in a Covaris™ M220 instrument (Woburn, MA, USA) to an average size of
approximately 250 bp, according to the supplier's suggested protocol. Fragmented DNA was
used to synthesize indexed sequencing libraries using the TruSeq Nano DNA Sample Prep
Kit (Illumina, Inc., San Diego, CA, USA) according to the manufacturer's recommended
protocol. Cluster generation was performed on the cBot instrument using the TruSeq PE
Cluster Kit v3 reagents (Illumina). Libraries were sequenced on an Illumina HiSeq 2000
using the TruSeq SBS Kit v3 reagents (Illumina) for paired-end sequencing with read
lengths of 150 base pairs (300 cycles). High-throughput sequencing reads were quality
filtered using the fastq_quality_filter program provided with the FASTX-Toolkit. Only
reads with a quality score higher than 17 over at least 80% of the read length (i.e., a
probability of correct base call close to 98%) were retained.
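The relation between the quality threshold and the 98% figure follows from the Phred definition, Q = -10 log10(P_error). The sketch below reproduces that conversion and a per-read filter with the same two parameters; it presumably mirrors what the `-q 17 -p 80` options of fastq_quality_filter request, but it is an independent illustration, and treating the threshold as inclusive (quality >= 17) is an assumption.

```python
def phred_to_accuracy(quality):
    """Probability that a base call with the given Phred score is correct."""
    return 1.0 - 10 ** (-quality / 10.0)

def passes_filter(qualities, min_quality=17, min_percent=80):
    """Keep a read if at least min_percent of its bases reach min_quality.

    The inclusive comparison (>=) is an assumption made for this sketch.
    """
    good_bases = sum(q >= min_quality for q in qualities)
    return good_bases * 100 >= min_percent * len(qualities)

print(f"Q17 accuracy: {phred_to_accuracy(17):.4f}")

# Hypothetical 10-base reads described by their per-base Phred scores.
kept_read = [30] * 8 + [10] * 2      # 80% of bases at or above Q17
dropped_read = [30] * 7 + [10] * 3   # only 70% of bases at or above Q17
print(passes_filter(kept_read), passes_filter(dropped_read))
```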
Gene catalogs for each sample were created using the MOCAT pipeline [15]. Briefly, the
pipeline performs quality control of the raw reads, removes human contamination by
mapping to the reference human genome, assembles the reads and predicts protein-coding
genes on the assembled overlapping reads (contigs) and scaftigs (contigs that were
extended and linked using the paired-end information of the sequencing reads).
Predicted proteins were compared to the non-redundant NCBI RefSeq database using
BLAST [12]. Taxonomic analysis was based on the NCBI taxonomy; functional analysis was
performed with MEGAN4 using the SEED classification [13-16]. Taxonomic analysis is
performed by placing each sequence read onto a node of the NCBI taxonomy, based on
gene content. For each read that matches the sequence of some gene, the program places
the read on the lowest common ancestor (LCA) node of those taxa in the taxonomy that
are known to have that gene. This is called the LCA algorithm.
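The LCA principle described above can be sketched in a few lines. This is a simplified illustration and not the MEGAN4 implementation (which also applies score and support thresholds); the toy taxonomy below skips several ranks and is used only as an example.

```python
def lineage(taxon, parent):
    """Path from a taxon up to the root, following child -> parent links."""
    path = [taxon]
    while taxon in parent:
        taxon = parent[taxon]
        path.append(taxon)
    return path

def lowest_common_ancestor(taxa, parent):
    """Deepest node shared by the lineages of all the given taxa."""
    if not taxa:
        raise ValueError("at least one taxon is required")
    paths = [lineage(taxon, parent) for taxon in taxa]
    shared = set(paths[0]).intersection(*paths[1:])
    # Walking up from the first taxon, the first shared node is the deepest one.
    for node in paths[0]:
        if node in shared:
            return node
    raise ValueError("the taxa do not share any ancestor")

# Toy taxonomy (child -> parent) with intermediate ranks omitted for brevity.
toy_parent = {
    "Staphylococcus epidermidis": "Staphylococcus",
    "Staphylococcus aureus": "Staphylococcus",
    "Staphylococcus": "Bacillales",
    "Bacillus subtilis": "Bacillus",
    "Bacillus": "Bacillales",
    "Bacillales": "Bacteria",
    "Propionibacterium acnes": "Actinomycetales",
    "Actinomycetales": "Bacteria",
    "Bacteria": "root",
}

# A read whose gene is found in two Staphylococcus species is placed at the genus.
print(lowest_common_ancestor(["Staphylococcus epidermidis", "Staphylococcus aureus"], toy_parent))
# A gene shared with Bacillus pushes the read up to the order.
print(lowest_common_ancestor(["Staphylococcus aureus", "Bacillus subtilis"], toy_parent))
```

The more widely a gene is conserved, the higher the read is placed in the taxonomy, which is why LCA assignments are conservative by design.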
Results
Impact of the DNA extraction methods
DNA extraction was carried out as follows:
human gut microbiota study: in triplicate from a unique frozen stool using four
commercial kits;
skin microbiota study: in duplicate from a unique sample using an in house procedure
and a commercial kit.
Sequencing targeted the 16S rRNA gene V3V4 region (Illumina technology).
Gut microbiota
The first step of the experimental plan was to confirm that the kits selected for DNA
extraction gave a stool microbial profile at the phylum level that was globally consistent with
current knowledge, i.e. showing a majority of Firmicutes and Bacteroidetes [17]. The second step
was to assess the reproducibility of the different DNA extraction methods, considering
that good reproducibility between triplicates would also indicate good reproducibility of the
sequencing process (reproducibility is evaluated on the basis of the sequencing
results).
Figure 1. Relative abundance (%) of the most abundant phyla when considering 4 commercial kits.
DNA extraction kit suppliers: Stratec Molecular GmbH (STRATEC), MP Biomedicals (MP), MO BIO
Laboratories (MOBIO) and Qiagen (QIAGEN).
Figure 1 reports the relative abundances of the most abundant phyla obtained with the four
selected DNA extraction kits. All the phylum profiles obtained are consistent with current
knowledge of stool microbial composition; nevertheless, some variations depending on
the DNA extraction kit used can be observed. In particular, the Verrucomicrobia phylum is only
detected with the kits supplied by MP Biomedicals and Stratec Molecular GmbH, with relative
abundances of 1.2% and 0.1% respectively.
The heat map representation (Figure 2) shows that the triplicates of a given method cluster
together: the Pearson product-moment correlation values are between 0.91 and 1, a
range indicating that the replicate results largely overlap. As a consequence, the
triplicate average counts can be used for the subsequent analyses. There are some differences
between the extraction methods, since the profiles obtained with one method cannot be
superimposed on those obtained with the others.
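For readers who want to reproduce this kind of comparison, the sketch below computes a Pearson product-moment correlation between two genus profiles. The normalization used in the study is not specified, so scaling counts to relative abundances is an assumption, and the counts are hypothetical.

```python
import math

def relative_abundance(counts):
    """Scale raw counts so that they sum to 1 (an assumed normalization)."""
    total = sum(counts)
    return [count / total for count in counts]

def pearson_r(x, y):
    """Pearson product-moment correlation coefficient of two equal-length vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return covariance / math.sqrt(var_x * var_y)

# Hypothetical genus counts for two extraction replicates (same genus order).
replicate_1 = [520, 310, 90, 60, 20]
replicate_2 = [498, 330, 85, 70, 17]
r = pearson_r(relative_abundance(replicate_1), relative_abundance(replicate_2))
print(f"r = {r:.4f}")
```

Note that (r) is dominated by the most abundant genera, so a high value does not exclude marked differences among rare taxa.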
Figure 2. Observed distances, based on the normalized genus counts, between samples from a
unique frozen stool treated with four different DNA extraction kits (triplicates). Dark blue squares
highlight close samples; white squares indicate strong differences between samples. The notations -1,
-2 and -3 mark the DNA extraction triplicates. DNA extraction kits suppliers: Qiagen (Qia), Stratec
Molecular GmbH (Str), MO BIO Laboratories (Mo), MP Biomedicals (MP).
Figure 3 reports the (r) values between the different methods, based on the normalized
genus counts. The methods involving the kits supplied by MO BIO Laboratories and by MP
Biomedicals gave an (r) value of 0.9718. The other (r) values are lower than 0.9, so a linear
correlation between those samples cannot be assumed.
Figure 3. Correlation studies of the normalized genus counts between the different extraction methods.
Plots illustrate the linear relation between each condition, and the value of the correlation coefficients
(r) highlights the strength of this relation. DNA extraction kits suppliers: Qiagen, Stratec Molecular
GmbH (Stratec), MO BIO Laboratories (MoBio), MP Biomedicals (MP).
The Qiagen QIAamp Fast DNA Stool Kit was chosen as the reference to evaluate the efficiency of
the other kits (Figure 4). Exactly the same genera were detected with all four
extraction methods. However, differences were observed in the relative abundance of each
genus. For example, with the extraction kits supplied by MO BIO Laboratories and MP
Biomedicals, the Bacteroides genus appeared significantly less abundant than with
the kits supplied by Qiagen and Stratec Molecular GmbH.
Figure 4. Each DNA extraction kit is compared to the Qiagen QIAamp Fast DNA Stool Kit. A fold
change of “+2” implies a doubling of the normalized read counts for the compared kit relative to the
Qiagen kit. A fold change of “-2” implies a 2-fold reduction. Significant differences are marked with a
star (*). DNA extraction kit suppliers: Qiagen (Qia), Stratec Molecular GmbH (Stra), MO BIO
Laboratories (Mo), MP Biomedicals (MP).
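The signed fold-change convention used in Figure 4 can be written as a small function, sketched below. How the authors handled genera with zero counts is not stated, so this sketch simply rejects non-positive values; the counts are hypothetical.

```python
def signed_fold_change(compared, reference):
    """Signed fold change of a compared count relative to a reference count.

    Returns +k when the compared count is k times the reference,
    and -k when it is k times smaller (so values between -1 and +1 never occur).
    """
    if compared <= 0 or reference <= 0:
        raise ValueError("counts must be strictly positive in this sketch")
    ratio = compared / reference
    return ratio if ratio >= 1 else -1.0 / ratio

# Hypothetical normalized read counts for one genus.
print(signed_fold_change(200, 100))  # doubling: 2.0
print(signed_fold_change(50, 100))   # 2-fold reduction: -2.0
```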
Skin microbiota
DNA was extracted from a human skin microbiota sample using either an in house method or
a commercial kit. Sequencing using the Illumina technology was performed by targeting the
V3V4 region of the 16S rRNA gene. Results reported in Figure 5 show the correlations
obtained when considering two libraries obtained with the in house method (Lib-1 and Lib-2)
and two libraries obtained using the PowerLyzer® PowerSoil® DNA Isolation kit from MO BIO
Laboratories (MoBio-1 and MoBio-2). The extractions made in duplicate overlap (r > 0.99)
when considering the results at the genus level.
[Figure 4 plot: y-axis, fold change (-6 to 4); series (Mo)/(Qia), (MP)/(Qia) and (Stra)/(Qia); stars (*) mark significant differences.]
Figure 5. Correlation studies of the normalized genus counts between the different extraction methods.
Plots illustrate the linear relation between each condition, and the correlation coefficients (r) highlight
the strength of this relation.
Despite the strong correlation between the observed profiles, the relative proportions
obtained using one method are different from those obtained using the other method (Figure
6). In comparison with the profiles obtained with the in house method, the kit provided by MO
BIO Laboratories seems to enhance the proportion of Propionibacterium and to reduce the
Corynebacterium proportion.
Figure 6. Relative abundances (%) of the most abundant genera. Conditions: DNA from a human skin
microbiota sample; DNA extraction using either an in house method (LIB), or a MO BIO Laboratories
commercial kit (MOBIO).
Impact of sample storage conditions
Four different storage conditions of a stool sample were compared in order to evaluate the
impact of storage on the observed microbiota composition. The microbiota profile of the
fresh stool was taken as the reference. The DNA extraction kit provided by Qiagen was used.
Sequencing was performed with the Illumina technology, targeting the V3V4 region of the
16S rRNA gene.
When considering the bacterial community composition at the genus level, (r) values higher
than 0.96 were obtained; all the profiles were thus well correlated (Figure 7).
Figure 7. Correlation studies of the normalized genus counts between the following storage conditions:
fresh, frozen, treated with Omnigene® Gut kit (Omnigut_Stab) and PSP® Spin stool DNA plus kit
(Stratec_Stab). Plots illustrate the linear relation between each condition, and the correlation
coefficients (r) highlight the strength of this relation.
In accordance with the (r) values higher than 0.96, the distribution of the 15 most abundant
genera shows little dispersion (Figure 8). It can thus be stated that the storage condition has
only a minor impact on the observed microbial composition.
Figure 8. Box plot representation of the normalized reads counts for the most abundant genera among
the 4 storage conditions.
Impact of the targeted 16S rRNA gene region and primer set selection
The impact of targeting different regions of the 16S rRNA gene, or of using different primer sets,
on the observed composition of a given microbial community was studied as follows:
Gut microbiota study: DNA was extracted from a frozen stool with the kit provided by
Qiagen;
Skin microbiota study: DNA was extracted using the in house method.
Sequencing was carried out using the Illumina technology.
Gut microbiota
Considering the relative abundance of the most abundant genera (Figure 9), the same
genera are observed with the four primer sets, but at different levels of relative abundance.
The most abundant genera are representative of the stool
microbiota of healthy human adults. As far as the impact of the primer sets is
concerned, the V1V2 and V3V4 regions gave very similar diversity patterns at the genus level.
With the V4V5a primer set, the Akkermansia genus was very highly represented.
The V4V5b primer set favors the representation of the Rhodobacter genus and limits that of
the Ruminococcus genus.
The apparent proximity between the V1V2 and V3V4 profiles is confirmed by the
Pearson product-moment correlation value of 0.9290 reported in Figure 10. The (r) values
between the V4V5b profile and the V1V2 and V3V4 profiles are rather high, 0.8403 and 0.9053
respectively. The (r) values between the V4V5a profile and the V3V4, V1V2 and
V4V5b profiles are lower, 0.8166, 0.7892 and 0.6499 respectively.
Figure 9. Relative abundance (%) of the most abundant genera. DNA was extracted from a frozen
stool. 16S rRNA gene regions were targeted using 4 primer sets: V1V2, V3V4, V4V5a and V4V5b.
Amplicons were then sequenced using Illumina technology.
Figure 10. Correlation studies of the normalized genus counts between the different primer sets.
Plots illustrate the linear relation between each condition, and the correlation coefficients (r) highlight
the strength of this relation.
Skin microbiota
Figure 11 shows the relative abundances of the most abundant genera obtained with
the four selected primer sets.
Figure 11. Relative abundance (%) of the most abundant genera. Conditions: DNA extracted from a
human skin microbiota sample using an in house method (LIB); targeted 16S rRNA gene regions:
V1V2, V3V4, V4V5a, V4V5b; sequencing using Illumina technology.
The V3V4 and V4V5a primer sets gave very similar profiles. In comparison with the V3V4,
V4V5a and V4V5b profiles, the profile obtained with the V1V2 primer set shows a high
proportion of the genus Propionibacterium and an extremely low representation of the genus
Corynebacterium. The V4V5b primer set gave a profile closer to those obtained with
the V3V4 and V4V5a primer sets than to the one obtained with the V1V2 primer set.
Impact of PCR
The possible bias introduced by the PCR amplification was assessed through the
comparison of the profile produced from a selected sample of skin microbiota either with
PCR of the V4V6 and V1V3 regions of the 16S gene (sequenced using Roche technology) or
without PCR. In the latter case, sequencing was performed using the Whole (meta)Genome
Sequencing (WGS) approach. DNA extraction was done through the in house method.
Figure 12 shows the different microbial profiles obtained using either amplicon-based
sequencing (V1V3 and V4V6) or WGS approaches.
Figure 12. Relative abundance of the most abundant orders, regarding the sequencing method. V1V3
and V4V6 make reference to the targeted regions of the 16S gene sequenced by pyrosequencing.
The WGS approach produces a relative abundance of the two major orders (Bacillales and
Actinomycetales) similar to that obtained with the V4V6 amplicon-based approach (Table 1).
Additional information is specifically provided by the WGS approach, concerning the relative
abundance of fungal organisms (Malasseziales and Ustilaginales, see also Table 1). The
V1V3 primer set reveals a highly unbalanced microbiota, dominated by Actinomycetales
(83.4%) with Bacillales at only 8.3%. In contrast, the V4V6 primer set displays a more balanced
bacterial diversity. The orders reported in Table 1 show that some bacterial
orders (e.g. Burkholderiales and Pasteurellales) need a PCR step to be detected. These
low-abundance orders would require a very large sequencing effort to be detected using WGS
approaches.
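The sequencing effort needed to observe a rare taxon can be estimated with a simple sampling model, sketched below. It assumes that reads are drawn independently and that every read is taxonomically informative, which is optimistic for WGS data where only a fraction of reads can be assigned; the abundance values are hypothetical.

```python
import math

def reads_needed(relative_abundance, detection_probability=0.95):
    """Smallest number of reads giving at least the requested probability
    of sampling one or more reads from a taxon of the given abundance."""
    if not 0 < relative_abundance < 1:
        raise ValueError("relative_abundance must lie strictly between 0 and 1")
    if not 0 < detection_probability < 1:
        raise ValueError("detection_probability must lie strictly between 0 and 1")
    # P(at least one read) = 1 - (1 - p)^n >= target  =>  n >= ln(1 - target) / ln(1 - p)
    return math.ceil(math.log(1 - detection_probability) / math.log(1 - relative_abundance))

# Hypothetical abundances: 0.1% and 0.001% of the assignable reads.
for abundance in (1e-3, 1e-5):
    print(abundance, reads_needed(abundance))
```

The required depth grows roughly in inverse proportion to the abundance of the taxon, whereas a PCR step enriches the 16S target before sequencing.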
ORDERS                     KINGDOM     WGS    V4V6    V1V3
BACILLALES                 Bacteria   42.0    31.8     8.3
ACTINOMYCETALES            Bacteria   30.2    28.4    83.4
MALASSEZIALES              Fungi      13.8       -       -
USTILAGINALES              Fungi       7.5       -       -
TREMELLALES                Fungi       1.0       -       -
SCHIZOSACCHAROMYCETALES    Fungi       0.8       -       -
SACCHAROMYCETALES          Fungi       0.7       -       -
EUROTIALES                 Fungi       0.3       -       -
LACTOBACILLALES            Bacteria    0.3     3.2     0.7
CLOSTRIDIALES              Bacteria    0.3     8.2     1.9
ANAEROLINEALES             Bacteria    0.0     1.6     0.2
BURKHOLDERIALES            Bacteria    0.0     7.5     0.6
NEISSERIALES               Bacteria    0.0     1.5     0.2
PASTEURELLALES             Bacteria    0.0     3.2     1.0
PSEUDOMONADALES            Bacteria    0.0     0.8     0.2
SPHINGOMONADALES           Bacteria    0.0     0.8     0.0
SPIROCHAETALES             Bacteria    0.0     1.0     0.2
Table 1. Relative abundance (%) of the most abundant orders, regarding the sequencing method.
V1V3 and V4V6 refer to the targeted regions of the pyrosequenced 16S rRNA gene.
Discussion
Sample storage and DNA extraction reproducibility
Sampling, sample storage before processing and DNA extraction are critical steps
that can greatly affect the results. Processing fresh microbial samples can only
be considered a theoretical scenario, since microbial sampling and DNA processing are
most often carried out at different places. Immediate sample freezing is a widely used
approach, from which stabilization of the microbial communities is expected. From a practical
point of view, the requirement for a freezer or dry ice may be an issue.
Stabilization of samples by homogenization in stabilizing solutions is a more
recent, commercially available strategy claimed to maintain the integrity of the microbial
community as well or even better. Although DNA extraction comes after sampling and
sample storage in the processing sequence, the focus was first on DNA extraction
methods, since a sine qua non condition for continuing the investigations was that the
results obtained were in global accordance with the currently accepted composition of the gut
or skin microbiota. In both
cases, the most abundant phyla (stool) and genera (stool and skin) obtained with the
selected DNA extraction kits were those usually reported in the literature. The reliability and
robustness of the DNA extraction methods were investigated on the gut microbiota. The
Pearson correlation values ranged, for each set of triplicates, from 0.91 to 1.0; it was therefore
concluded that, for each extraction kit, the replicate results were overlapping. Nevertheless,
when looking at more precise taxonomic information, in particular as far as less abundant
phyla or orders are concerned, some differences could be detected from one DNA extraction
kit to another. The DNA extraction kit with the least consistent triplicate results was also the
one with the lowest DNA yield.
As for the impact of storage conditions on the microbial community profile, it was shown that
the three options have only a small impact when compared with the fresh stool results.
Impact of methodological choices on revealed diversity
With the objective of describing microbiome diversity, the DNA extraction method has
to be chosen first. For human skin microbiota, the two tested methods gave strongly
correlated microbial profiles, although the relative proportions of some genera differed. In our
experience, technical expertise and laboratory habits are then relevant criteria
for the choice. For the human gut microbiome, even though the methods gave very close
global results, significantly different results were obtained when considering less abundant
phyla or orders. Depending on the context (existing previous studies, data about the presence
of certain phyla, classes or genera, interest in the most abundant or in less abundant
organisms), several DNA extraction methods should be considered.
The second choice concerns whether or not PCR is required, this requirement being linked to
the sequencing technology. The strategy to choose is case-dependent. On the one hand, PCR is
not required in the WGS approach, so no amplification bias alters the population
composition; on the other hand, WGS only gives access to the most
dominant taxa and consequently gives an incomplete representation of the microbiome. By
contrast, the PCR-based approach makes it possible to zoom in on specific taxa, which may be
non-dominant. Two possible objectives can then be considered:
to formally characterize a given microbiota: a WGS approach with a large allocation of
sequencing depth should be preferred;
to evaluate the dynamic modification of a given microbial community over time or
after applying a treatment: the PCR-based approach should be preferred.
Finally, if the PCR-based approach is selected, the most appropriate primer set for the
study must be defined, since it has just been shown that, as with the DNA extraction
methods, some primer sets favor the relative abundance of specific genera.
Conclusion
The objective of this work was to obtain detailed and reliable information for properly
establishing experimental plans dedicated to the characterization of microbiomes. Based on
two different microbiomes, and consequently on different microbial populations (phyla to
orders), it clearly shows that several technical solutions exist for DNA extraction. The
evaluated techniques are robust but do not yield strictly identical populations, either
qualitatively or quantitatively. In addition, several storage techniques exist that preserve the
integrity of the microbial populations from sampling to DNA extraction, so that proximity
between the collecting unit and the analysis team is not mandatory. Once highly purified
DNA is obtained, methodological choices (in terms of sequencing method, bioinformatics, …)
must be made that will strongly affect the final results of the metagenomic analyses. Some
knowledge of the environment of interest may guide these choices (provided that the
available tools are sufficiently characterized to discriminate between them). In the
case of completely unknown environments, the least biased solution should be selected.
It is important to note that neither targeted sequencing nor WGS approaches are absolute
quantitative methods. They make it possible to compare environments across space and time
and, when quantitative data are required, to formulate hypotheses that have to be confirmed
with specialized techniques, for example RT-PCR.
Considering the taxonomic analyses, the present weakness of taxonomic assignment is due to
the short read length; the 300 bp reads usually obtained do not allow good resolution, and
assignment beyond the genus level is very unreliable. WGS can provide information at the
species level, provided that the literature is sufficiently rich. It can also provide information
about potential functions, with the risk of missing the least abundant species.
To circumvent the short-read issues, the long-read Pacific Biosciences technology
(http://www.pacificbiosciences.com/) is a very promising technique that we are investigating to
sequence full-length amplified 16S rRNA genes.
Finally, the take-home message is that metagenomic analyses require appropriate
experimental conditions from sampling to data processing, and libragen's 15 years of
experience in the field is a clear advantage.
Bibliographic references
1. Handelsman, J. et al. (1998) Chem. Biol., 5(10): R245-R249
2. Tringe, S.G. and Rubin, E.M. (2005) Nat. Rev. Genet., 6(11): 805-814
3. Surana, N.K. and Kasper, D.L. (2014) J. Clin. Invest., 124(10): 4197-4203
4. Manichanh, C. et al. (2006) Gut, 55(2): 205-211
5. Lefevre, F. et al. (2007) Biocat. Biotrans., 25(2-4): 242-250
6. Lefevre, F. et al. (2008) Res. Microbiol., 159(3): 153-161
7. Cardona, S. et al. (2012) BMC Microbiol., 12: 158-168
8. Guo, F. et al. (2013) PLoS One, 8(10): e76185
9. Starke, I.C. et al. (2014) Mol. Biol. Int., 2014: 548683
10. Zhou, Q. et al. (2014) Sci. Rep., 4: e6957
11. Caporaso, J.G. et al. (2010) Nat. Methods, 7(5): 335-336
12. Altschul, S.F. et al. (1997) Nucleic Acids Res., 25(17): 3389-3402
13. Huson, D.H. et al. (2011) Genome Res., 21(9): 1552-1560
14. Overbeek, R. et al. (2005) Nucleic Acids Res., 33(17): 5691-5702
15. Kultima, J.R. et al. (2012) PLoS One, 7(10): e47656
16. Huson, D.H. et al. (2007) Genome Res., 17(3): 377-386
17. Durban, A. et al. (2011) Microb. Ecol., 61(1): 123-133