Methodological impact on metagenomics analyses: the skin microbiome and beyond

Cyrille Jarrin1, Patrick Robe1, Daniel Auriol1, David Villanova1, Kuno Schweikert2

1 Libragen, Canal Biotech, 3 rue des Satellites, F-31400 Toulouse, France
2 Induchem, Industriestrasse 8a, CH-8604 Volketswil, Switzerland

Introduction

The term metagenome was coined by Handelsman and coworkers in 1998 [1] and was defined as the collective genome of the microflora found in a given environment. Metagenomics is a culture-independent molecular approach to analyzing environmental samples of cohabiting microbial populations; it has given biologists access to the large "hard-to-isolate" fraction of environmental microbial communities. The initial studies aiming to characterize the full extent of microbial diversity involved DNA purification, fragmentation and cloning into an easily cultivable host. The recombinant cells were subsequently grown to obtain sufficient DNA for downstream analyses. These analyses were designed either to elucidate taxonomic composition (16S rRNA) or to identify functional properties (in particular using sometimes sophisticated methods for detecting enzymatic activities). With the advent of so-called next-generation sequencing (NGS) technologies, metagenome cloning is no longer necessary, since far smaller amounts of DNA are required. Although it is nowadays often inaccurately restricted to the sequencing-based study of environmental samples [2], metagenomics has brought about a considerable increase in knowledge of both the taxonomic and the functional microbial diversity of natural ecosystems. Moreover, exploring the microbial communities associated with human body sites (gut, skin, mouth, vagina, etc.) makes it possible to decipher the close relationships between human health and the resident microbiota (e.g. [3]).

Realizing the potential of metagenomics for discovering novel genes in as yet untapped microbial diversity, libragen has for 15 years carried out analyses of microbial communities [4] as well as research programs aimed at discovering new, high-performance biocatalysts and metabolic pathways that address industrial issues [5-6].

The apparent practical simplicity of DNA extraction from natural samples using dedicated commercial kits, together with the explosion of NGS facilities, makes it possible to define the genetic diversity of bacterial communities and to predict the associated gene functions. However, the procedures applied in published studies are not standardized and are sometimes poorly described. As a consequence, results can hardly be compared, in particular when a defined environment such as the human body is considered.

Many sources of technical bias that can significantly affect the results have been identified: experimental design, sampling, sample storage [7], insufficient purity of the extracted nucleic acids, inappropriate selection of the 16S variable region [8], poor choice of primer set [9], insufficient control of the libraries produced, inappropriate raw data processing [10] and deficient statistical analyses.

The objective of libragen was to evaluate more precisely the impact of sample storage conditions, DNA extraction methods and primer set selection on the taxonomic profiles observed in two selected environments, the human gut and human skin. This provides detailed and reliable methodological information for designing experimental plans dedicated to the characterization of microbiomes.

Materials and Methods

Microbiota sampling and storage

Gut microbiota. Stool from a healthy adult volunteer was freshly collected. Three different approaches were considered for DNA extraction: 1) from a fraction of the fresh stool, using different commercial kits (see the DNA extraction section); 2) from fresh stool that had previously been frozen (-20°C); 3) from two fractions of the initial fresh stool treated with commercial storage kits. The first fraction was treated with the Omnigene® Gut kit (DNA Genotek, Ottawa, Canada) and the second with the PSP® Spin Stool DNA Plus kit (Stratec Molecular GmbH, Berlin, Germany). Both resulting media were kept at 4°C until DNA extraction.

Skin microbiota. Non-invasive skin samples were collected from the forehead and forearm of healthy volunteers by swabbing with sterile 5 x 5 cm gauze pre-moistened with a sterile solution of 0.15 M NaCl containing 0.1% Tween 20. Gauze samples were placed in 50 mL plastic tubes and frozen at -20°C until DNA extraction.

DNA extraction

Gut microbiota. Four different commercial kits were used for DNA extraction, following the manufacturers' recommendations:
PowerLyzer® PowerSoil® DNA Isolation Kit (MO BIO Laboratories, Inc., Carlsbad, USA)
QIAamp Fast DNA Stool Kit (Qiagen, Valencia, USA)
PSP® Spin Stool DNA Plus kit (Stratec Molecular GmbH, Berlin, Germany)
FastDNA™ SPIN Kit for Feces (MP Biomedicals, Santa Ana, USA)

Skin microbiota. Each tube was completely filled with NaCl-Tween 20 solution and shaken horizontally for 15 min at 800 rpm. The NaCl-Tween 20 suspension was transferred to new tubes, and the gauzes were spun dry (in a biological safety cabinet) to collect most of the suspension volume. Suspensions were then centrifuged at 8000 rpm for 30 min to obtain a cell pellet. DNA extraction was performed using two different methods based on mechanical lysis:

1) In-house method. Briefly, the cell pellet was resuspended in hexadecyltrimethylammonium bromide (CTAB) extraction buffer (0.5 mL) and split into two equal parts for duplicate DNA extractions. An equal volume of phenol-chloroform-isoamyl alcohol (25:24:1) was added to each cell suspension. Cells were lysed for 30 s at 5.5 m/s using the FastPrep FP120 bead-beating system (Bio-101, Vista, California). Samples were centrifuged for 10 min at 11,000 rpm and 4°C. The aqueous phase was collected, and DNA was precipitated overnight at 4°C with 2 volumes of 100% ethanol and 1/10 volume of 5 M NaCl. Purification was then performed with the Illustra GFX purification kit (GE Healthcare, Pittsburgh, USA) to obtain highly purified DNA.

2) Commercial method. The PowerLyzer® PowerSoil® DNA Isolation Kit (MO BIO Laboratories, Inc., Carlsbad, USA) was used according to the recommended procedure.

Sequencing

16S rRNA gene sequencing

Illumina technology. Sequencing was performed on the MiSeq instrument (Illumina, Inc., San Diego, CA, USA) in a 600-cycle paired-end run, targeting three 16S variable regions:
the V1V2 region (amplicon length: 350 bp);
the V3V4 region (16S-Mi341F forward primer 5’-CCTACGGGNGGCWGCAG-3’ and 16S-Mi805R reverse primer 5’-GACTACHVGGGTATCTAATCC-3’), producing amplicons of about 460 bp;
the V4V5 region (amplicon lengths: 425 bp for V4V5a and 470 bp for V4V5b).
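The V3V4 primers contain IUPAC degenerate bases (N, W, H, V), so each primer is in fact a mixture of oligonucleotides. As a hedged illustration, the Python sketch below counts how many distinct sequences each degenerate primer encodes and checks whether a target site is compatible with a primer. The helper names (`degeneracy`, `matches`) and the example target site are hypothetical, introduced only for illustration.

```python
from math import prod

# IUPAC nucleotide codes and the bases each one stands for
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def degeneracy(primer: str) -> int:
    """Number of distinct oligonucleotides encoded by a degenerate primer."""
    return prod(len(IUPAC[base]) for base in primer.upper())


def matches(primer: str, site: str) -> bool:
    """True if a target site (same length, same 5'->3' orientation as the primer)
    is compatible with the primer at every position."""
    if len(primer) != len(site):
        return False
    return all(s in IUPAC[p] for p, s in zip(primer.upper(), site.upper()))


primers = {
    "16S-Mi341F": "CCTACGGGNGGCWGCAG",
    "16S-Mi805R": "GACTACHVGGGTATCTAATCC",
}
for name, seq in primers.items():
    print(f"{name}: {len(seq)} nt, {degeneracy(seq)} variants")
```

For example, `matches("CCTACGGGNGGCWGCAG", "CCTACGGGAGGCAGCAG")` is `True`, whereas a site carrying C at the W position does not match. For the reverse primer, the site would have to be taken from the reverse-complement strand.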

PCR1 reactions were performed as follows: 4 µL of template DNA (20 ng) were mixed with 0.6 µL of each of the forward and reverse primers (10 µM), 6 µL of KAPA HiFi Fidelity Buffer (5X), 0.9 µL of KAPA dNTP Mix (10 mM each), 6.5 µL of distilled water (dH2O) and 0.6 µL of KAPA HiFi HotStart DNA polymerase (1 U/µL), for a total volume of 30 µL. Each amplification was performed in duplicate, and the duplicates were pooled after amplification. PCR1 cycling consisted of 95°C for 3 min, then 27 cycles of 95°C for 30 s, 59°C for 30 s and 72°C for 30 s, followed by a final extension at 72°C for 5 min, on an MJ Research PTC200 thermocycler. Negative controls were included at all steps to check for contamination. All duplicate pools were checked by gel electrophoresis, and amplicons were quantified by fluorometry.
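The reaction recipe can be checked with the dilution relation C1·V1 = C2·V2. The sketch below reuses the volumes and stock concentrations listed above and computes the final concentration of each component, assuming the 30 µL total stated in the text. It also sums the listed volumes. These come to 19.2 µL, so the computed final concentrations hold only if the reaction is indeed brought to 30 µL. The dictionary layout and function name are illustrative choices.

```python
TOTAL_UL = 30.0  # total reaction volume as stated in the text

# name: (volume in µL, stock concentration, unit of the stock concentration)
components = {
    "forward primer": (0.6, 10.0, "µM"),
    "reverse primer": (0.6, 10.0, "µM"),
    "KAPA HiFi Fidelity Buffer": (6.0, 5.0, "X"),
    "dNTPs (each)": (0.9, 10.0, "mM"),
    "KAPA HiFi HotStart polymerase": (0.6, 1.0, "U/µL"),
}
other_ul = {"template DNA": 4.0, "distilled water": 6.5}


def final_concentration(volume_ul: float, stock: float, total_ul: float = TOTAL_UL) -> float:
    """C1*V1 = C2*V2, hence C2 = C1*V1/V2."""
    return stock * volume_ul / total_ul


for name, (vol, stock, unit) in components.items():
    print(f"{name}: {final_concentration(vol, stock):.2f} {unit}")

listed_total = sum(v for v, _, _ in components.values()) + sum(other_ul.values())
print(f"sum of listed volumes: {listed_total:.1f} µL (stated total: {TOTAL_UL} µL)")
```

With a 30 µL total, the primers end up at 0.2 µM each, the buffer at 1X and each dNTP at 0.3 mM.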

Libraries ready for analysis were then produced following the Illumina guidelines for 16S metagenomic sequencing library preparation. Briefly, the PCR1 amplicons were purified and checked using an Agilent 2100 Bioanalyzer (Agilent Technologies, Santa Clara, USA). To enable the simultaneous analysis of multiple samples (multiplexing), Nextera® XT indexes (Illumina) were added during PCR2, using 15 to 30 ng of PCR1 amplicons. PCR2 cycling consisted of 94°C for 1 min, then 12 cycles of 94°C for 60 s, 65°C for 60 s and 72°C for 60 s, followed by a final extension at 72°C for 10 min. Indexed libraries were purified, quantified and checked using an Agilent 2100 Bioanalyzer. Validated indexed libraries were pooled to obtain an equimolar mixture.
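Equimolar pooling requires converting each library's mass concentration into a molar concentration, which depends on fragment length. The sketch below uses the common approximation of about 660 g/mol per base pair of double-stranded DNA. It then computes the volume of each library that contributes the same number of femtomoles. The library names, concentrations, fragment lengths and the 50 fmol target are hypothetical values, not measurements from this study.

```python
def ng_per_ul_to_nm(conc_ng_per_ul: float, mean_length_bp: float) -> float:
    """dsDNA mass concentration -> molarity (nM), assuming ~660 g/mol per base pair."""
    return conc_ng_per_ul * 1e6 / (660.0 * mean_length_bp)


def equimolar_volumes(libraries: dict, fmol_per_library: float) -> dict:
    """Volume (µL) of each library that contributes the same amount of DNA.
    Since 1 nM = 1 fmol/µL, volume = fmol / nM."""
    return {name: fmol_per_library / ng_per_ul_to_nm(conc, length)
            for name, (conc, length) in libraries.items()}


# Hypothetical measurements: (concentration in ng/µL, mean fragment length in bp)
libraries = {
    "gut_V3V4": (12.0, 600),
    "skin_V3V4": (8.5, 620),
    "gut_V1V2": (20.3, 490),
}
for name, vol in equimolar_volumes(libraries, fmol_per_library=50.0).items():
    print(f"{name}: {vol:.2f} µL")
```

Measured fragment lengths (for example, from the Bioanalyzer traces) matter here: the same ng/µL corresponds to fewer molecules for a longer library.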

The run was performed on a MiSeq sequencer (Illumina) using the MiSeq Reagent Kit v3 (600 cycles, Illumina), which provides an output of up to 25 million paired-end reads of 300 bases, i.e. up to 15 gigabases. The libraries and the MiSeq run were produced by libragen at the GeT-PlaGe platform (INRA, Auzeville, France).
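The quoted output follows from simple arithmetic: each read pair contributes two reads, so 25 million pairs × 2 × 300 bases = 15 × 10^9 bases. The sketch below makes this explicit. It also shows how the output would be shared across multiplexed libraries, using a hypothetical 96-plex that is not a figure from the text.

```python
def run_yield_bases(read_pairs: float, read_length: int) -> float:
    """Total sequenced bases for a paired-end run (two reads per pair)."""
    return read_pairs * 2 * read_length


def pairs_per_library(read_pairs: float, n_libraries: int) -> float:
    """Average read pairs per library if the run output were shared evenly."""
    return read_pairs / n_libraries


print(f"{run_yield_bases(25e6, 300) / 1e9:.0f} Gb")  # 15 Gb
print(f"{pairs_per_library(25e6, 96):,.0f} read pairs per library (hypothetical 96-plex)")
```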

After the MiSeq run, raw sequence data were demultiplexed and quality-checked to remove all reads containing ambiguous bases. Index and primer sequences were then trimmed, and the forward and reverse reads were paired. The paired sequences were then processed with the QIIME pipeline [11] to remove chimeras and reads with PCR errors. High-quality paired sequences were mapped to the RDP database (Release 11, update 3; http://rdp.cme.msu.edu/) for taxonomic assignment. Assigned sequences were finally clustered into Operational Taxonomic Units (OTUs) at a 3% dissimilarity level.
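To illustrate what a 3% dissimilarity (97% identity) threshold means, here is a deliberately simplified sketch. It is not the QIIME implementation, whose OTU-picking methods rely on sequence alignment. The code assumes that reads are already aligned to equal length and applies a greedy, abundance-ordered centroid scheme: unique sequences are processed from most to least abundant, and each one joins the first existing OTU centroid it matches at 97% identity or above, otherwise it founds a new OTU.

```python
from collections import Counter


def identity(a: str, b: str) -> float:
    """Fraction of identical positions between two pre-aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def greedy_otus(reads, min_identity=0.97):
    """Return (centroid sequence, read count) for each OTU."""
    centroids, sizes = [], []
    for seq, n in Counter(reads).most_common():  # most abundant sequences first
        for i, centroid in enumerate(centroids):
            if identity(seq, centroid) >= min_identity:
                sizes[i] += n
                break
        else:  # no centroid close enough: start a new OTU
            centroids.append(seq)
            sizes.append(n)
    return list(zip(centroids, sizes))


def mutate(seq, positions):
    """Hypothetical helper: substitute a different base at each given position."""
    s = list(seq)
    for p in positions:
        s[p] = "A" if s[p] != "A" else "C"
    return "".join(s)


base = "ACGT" * 25                                   # 100-nt toy sequence
reads = ([base] * 10
         + [mutate(base, [0, 1])] * 3                # 98% identical -> same OTU
         + [mutate(base, [10, 20, 30, 40, 50])] * 2)  # 95% identical -> new OTU
for centroid, size in greedy_otus(reads):
    print(size, centroid[:12] + "...")
```

Greedy clustering depends on input order, which is why the sequences are sorted by abundance first: abundant sequences are more likely to be genuine and therefore make better centroids.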

Roche technology. Sequence analysis was performed by pyrosequencing at Beckman Coulter Genomics (Danvers, MA, USA) on the 454 GS FLX platform, targeting two 16S variable regions: the V1V3 region (16S-0027F forward primer 5’-AGAGTTTGATCCTGGCTCAG-3’ and 16S-0533R reverse primer 5’-TTACCGCGGCTGCTGGCAC-3’), producing amplicons of about 520 bp, and the V4V6 region (16S-0515F forward primer 5’-TGYCAGCMGCCGCGGTA-3’ and 16S-1061R reverse primer 5’-TCACGRCACGAGCTGACG-3’), producing amplicons of about 560 bp. To enable the simultaneous analysis of multiple samples (multiplexing), GS FLX Standard Multiplex Identifiers (MIDs; Roche Life Sciences, Indianapolis, IN, USA) were tailed to the forward and reverse primers.
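Because every sample carries its own MID tag, pooled reads can be assigned back to their samples after sequencing. As a hedged illustration of this demultiplexing step, the sketch assigns each read to the sample whose barcode matches its 5' end and then trims the barcode. Reads that match no barcode, or more than one, are left unassigned. The sample names and 10-nt barcodes are hypothetical placeholders, not the official Roche MID sequences. Production pipelines also handle primer trimming, orientation and sequencing errors more carefully.

```python
def mismatches(a: str, b: str) -> int:
    """Number of differing positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def demultiplex(reads, barcodes, max_mismatches=0):
    """Bin reads by 5' barcode; returns {sample: [trimmed reads], 'unassigned': [...]}."""
    bins = {name: [] for name in barcodes}
    bins["unassigned"] = []
    for read in reads:
        hits = [name for name, bc in barcodes.items()
                if len(read) >= len(bc)
                and mismatches(read[:len(bc)], bc) <= max_mismatches]
        if len(hits) == 1:
            bc = barcodes[hits[0]]
            bins[hits[0]].append(read[len(bc):])  # remove the barcode
        else:
            bins["unassigned"].append(read)  # no match, or ambiguous match
    return bins


barcodes = {"skin_A": "ACACACACAC", "skin_B": "GTGTGTGTGT"}  # hypothetical tags
reads = [
    "ACACACACAC" + "AGAGTTTGATCCTGGCTCAG",
    "GTGTGTGTGT" + "TGCCAGCAGCCGCGGTA",
    "TTTTTTTTTT" + "AGAGTTTG",
]
for sample, seqs in demultiplex(reads, barcodes).items():
    print(sample, len(seqs))
```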

PCRs were carried out as follows: 3 ng of each template DNA were mixed with 2.5 µL of each of the forward and reverse primers (10 µM), 1.25 units of Promega Taq polymerase (Promega Corporation, Madison, USA) and 7.9 µL of distilled water (dH2O). PCR cycling consisted of 95°C for 3 min, then 25 cycles of 95°C for 1 min, 58°C for 30 s and 72°C for 1 min, followed by a final extension at 72°C for 5 min, on an MJ Research PTC200 thermocycler (Ramsey, MN, USA). Negative controls were included at all steps to check for contamination.

Bioinformatic processing of the raw sequence data was performed by Beckman Coulter Genomics. It included demultiplexing of the sequenced samples, assembly of forward and reverse sequences (clustering step), Blastn analysis (i.e. searching nucleotide databases using nucleotide queries [12]) of both clustered and singleton reads against a curated copy of the 16S RDP database [13], and taxonomic assignment by comparing the taxonomy and scores of the 25 best hits for each blasted sequence (MEGAN4) [14].

Whole (meta)Genome Sequencing (WGS)

A total of 25 ng of metagenomic DNA was used for library preparation. Metagenomic DNA was fragmented in a Covaris™ M220 instrument (Woburn, MA, USA) to an average size of approximately 250 bp, according to the supplier's suggested protocol. Fragmented DNA was used to construct indexed sequencing libraries with the TruSeq Nano DNA Sample Prep Kit (Illumina, Inc., San Diego, CA, USA), following the manufacturer's recommended protocol. Cluster generation was performed on the cBot instrument using TruSeq PE Cluster Kit v3 reagents (Illumina). Libraries were sequenced on an Illumina HiSeq 2000 using TruSeq SBS Kit v3 reagents (Illumina) for paired-end sequencing with 150 bp reads (300 cycles). High-throughput sequencing reads were quality-filtered with the fastq_quality_filter program of the FASTX-Toolkit. Only reads with a quality score higher than 17 over at least 80% of the read length (i.e. a probability of correct base call of about 98%) were retained.
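The "about 98%" figure comes from the Phred definition Q = -10·log10(P_error): at Q = 17, P_error = 10^-1.7 ≈ 0.02. The sketch below reproduces this conversion and a read-level rule of the kind described above: keep a read if at least 80% of its bases reach the quality threshold. It assumes Phred+33 encoded quality strings and a "quality >= threshold" comparison. This is our reading of FASTX's `-q`/`-p` options, so the exact boundary convention should be checked against the toolkit's documentation.

```python
def phred_to_accuracy(q: float) -> float:
    """Probability that a base call is correct for Phred score Q: 1 - 10^(-Q/10)."""
    return 1.0 - 10 ** (-q / 10)


def decode_quality(qual_line: str, offset: int = 33) -> list:
    """Decode a FASTQ quality string into integer Phred scores (Phred+33 assumed)."""
    return [ord(ch) - offset for ch in qual_line]


def passes_filter(qual_line: str, min_q: int = 17, min_percent: float = 80.0,
                  offset: int = 33) -> bool:
    """Keep a read if at least min_percent % of its bases have quality >= min_q."""
    scores = decode_quality(qual_line, offset)
    if not scores:
        return False
    good = sum(q >= min_q for q in scores)
    return 100 * good >= min_percent * len(scores)


print(f"Q17 -> {phred_to_accuracy(17):.4f}")  # Q17 -> 0.9800
# '?' encodes Q30 and '+' encodes Q10 in Phred+33
print(passes_filter("?" * 9 + "+"), passes_filter("?" * 7 + "+" * 3))
```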

Gene catalogs for each sample were created using the MOCAT pipeline [15]. Briefly, the pipeline performs quality control of the raw reads, removes human contamination by mapping the reads to the reference human genome, assembles the reads, and predicts protein-coding genes on the assembled overlapping reads (contigs) and on scaftigs (contigs that were extended and linked using the paired-end information of the sequencing reads).

Predicted proteins were compared with the non-redundant NCBI RefSeq database using BLAST [12]. Taxonomic analysis was based on the NCBI taxonomy; functional analysis was performed with MEGAN4 using the SEED classification [13-16]. For taxonomic analysis, each sequence read is placed onto a node of the NCBI taxonomy based on gene content: for each read that matches the sequence of some gene, the program places the read on the lowest common ancestor (LCA) node of the taxa that are known to have that gene. This is called the LCA algorithm.
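The LCA principle can be sketched in a few lines if each hit's taxon is represented by its root-to-taxon lineage: the lowest common ancestor is then the longest prefix shared by all lineages. The sketch adds one assumption: before the LCA is computed, hits are restricted to those scoring within a chosen percentage of the best bit score (the `top_percent` parameter here is hypothetical, in the spirit of MEGAN's score filters). The lineages and scores are illustrative, not taken from the study.

```python
def lca(lineages):
    """Lowest common ancestor: the longest prefix shared by all root-to-taxon lineages."""
    common = []
    for ranks in zip(*lineages):
        if all(r == ranks[0] for r in ranks):
            common.append(ranks[0])
        else:
            break
    return common


def assign_read(hits, top_percent=10.0):
    """hits: list of (bit_score, lineage) for one read.
    Keep hits within top_percent of the best score, then place the read on their LCA."""
    if not hits:
        return []
    best = max(score for score, _ in hits)
    threshold = best * (1 - top_percent / 100)
    return lca([lineage for score, lineage in hits if score >= threshold])


firmicutes = ["Bacteria", "Firmicutes", "Bacilli", "Bacillales"]
staph = firmicutes + ["Staphylococcaceae", "Staphylococcus"]
hits = [
    (250.0, staph + ["Staphylococcus epidermidis"]),
    (245.0, staph + ["Staphylococcus aureus"]),
    (180.0, firmicutes + ["Bacillaceae", "Bacillus", "Bacillus subtilis"]),
]
print(assign_read(hits)[-1])                    # Staphylococcus
print(assign_read(hits, top_percent=30.0)[-1])  # Bacillales
```

The example shows the characteristic behavior of LCA assignment: the more divergent the accepted hits, the higher (and less specific) the node on which the read is placed.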

Results

Impact of the DNA extraction methods

DNA extraction was carried out as follows:
human gut microbiota study: in triplicate from a single frozen stool sample, using four commercial kits;
skin microbiota study: in duplicate from a single sample, using an in-house procedure and a commercial kit.
Sequencing targeted the V3V4 region of the 16S rRNA gene (Illumina technology).

Gut microbiota

The first step of the experimental plan was to confirm that the kits selected for DNA extraction gave a stool microbial profile at the phylum level that was globally consistent with current knowledge, i.e. showing a majority of Firmicutes and Bacteroidetes [17]. The second step was to assess the reproducibility of the different DNA extraction methods. Since reproducibility was evaluated on the basis of sequencing results, good reproducibility between triplicates would also demonstrate good reproducibility of the sequencing process.

Figure 1. Relative abundance (%) of the most abundant phyla obtained with 4 commercial kits. DNA extraction kit suppliers: Stratec Molecular GmbH (STRATEC), MP Biomedicals (MP), MO BIO Laboratories (MOBIO) and Qiagen (QIAGEN).

Figure 1 reports the relative abundances of the most abundant phyla obtained with the four selected DNA extraction kits. All the phylum profiles obtained are consistent with current knowledge of stool microbial composition; nevertheless, some variations depending on the DNA extraction kit used can be observed. In particular, the Verrucomicrobia phylum is detected only with the kits supplied by MP Biomedicals and Stratec Molecular GmbH, with relative abundances of 1.2% and 0.1%, respectively.
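Relative abundances such as those in Figure 1 are the read counts assigned to each taxon divided by the sample total. Whether a low-abundance phylum such as Verrucomicrobia counts as "detected" then depends on the read threshold applied. The sketch below shows both operations. The counts and the detection threshold (`min_reads`) are hypothetical and chosen only for illustration.

```python
def relative_abundance(counts: dict) -> dict:
    """Convert raw read counts per taxon into percentages of the sample total."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sample has no assigned reads")
    return {taxon: 100.0 * n / total for taxon, n in counts.items()}


def detected(counts: dict, min_reads: int = 1) -> set:
    """Taxa with at least min_reads assigned reads (the threshold is an assumption)."""
    return {taxon for taxon, n in counts.items() if n >= min_reads}


# Hypothetical phylum-level read counts for one extraction kit
counts = {"Firmicutes": 6000, "Bacteroidetes": 3500, "Actinobacteria": 300,
          "Proteobacteria": 180, "Verrucomicrobia": 20}
for taxon, pct in relative_abundance(counts).items():
    print(f"{taxon}: {pct:.1f}%")
print(sorted(detected(counts, min_reads=50)))
```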

The heat map representation (Figure 2) shows that the triplicates of a given method cluster together: the Pearson product-moment correlation coefficients range from 0.91 to 1, indicating that the triplicate results overlap. As a consequence, the average counts of the triplicates can be used in subsequent analyses. There are, however, some differences between the extraction methods, since the genetic information obtained with each kit cannot be superimposed on that obtained with the others.
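The reproducibility check relies on the Pearson product-moment correlation coefficient r between vectors of normalized genus counts. The sketch below implements r directly from its definition (covariance divided by the product of standard deviations) and computes it for every pair of replicates. The replicate names and count vectors are hypothetical. On Python 3.10 or later, `statistics.correlation` gives the same coefficient.

```python
from itertools import combinations
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation coefficient of two equal-length vectors."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two vectors of equal length >= 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)  # undefined if either vector is constant


# Hypothetical normalized genus counts (same genus order) for three extraction replicates
replicates = {
    "Kit-1": [520, 310, 120, 45, 5],
    "Kit-2": [500, 330, 110, 50, 10],
    "Kit-3": [540, 300, 125, 40, 0],
}
for (a, xa), (b, xb) in combinations(replicates.items(), 2):
    print(f"{a} vs {b}: r = {pearson_r(xa, xb):.4f}")
```

Because r on raw counts is dominated by the most abundant genera, high values can coexist with marked differences in less abundant taxa, which is consistent with what is observed between kits.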

Figure 2. Observed distances, based on normalized genus counts, between samples from a single frozen stool treated with four different DNA extraction kits (triplicates). Dark blue squares highlight close samples; white squares indicate strong differences between samples. The suffixes -1, -2 and -3 denote the DNA extraction triplicates. DNA extraction kit suppliers: Qiagen (Qia), Stratec Molecular GmbH (Str), MO BIO Laboratories (Mo), MP Biomedicals (MP).

Figure 3 reports the correlation coefficients (r) between the different methods, based on normalized genus counts. The methods using the kits supplied by MO BIO Laboratories and MP Biomedicals gave an r value of 0.9718. The other r values are lower than 0.9, indicating a weaker linear correlation between those methods.

Figure 3. Correlation of normalized genus counts between the different extraction methods. Plots illustrate the linear relationship between each pair of conditions, and the correlation coefficient (r) indicates the strength of this relationship. DNA extraction kit suppliers: Qiagen, Stratec Molecular GmbH (Stratec), MO BIO Laboratories (MoBio), MP Biomedicals (MP).

The Qiagen QIAamp Fast DNA Stool Kit was chosen as the reference for evaluating the other kits (Figure 4). Exactly the same genera were detected with all four extraction methods. However, differences were observed in the relative abundance of each genus. For example, with the extraction kits supplied by MO BIO Laboratories and MP Biomedicals, the genus Bacteroides appeared significantly less abundant than with the kits supplied by Qiagen and Stratec Molecular GmbH.

Figure 4. Comparison of each DNA extraction kit with the Qiagen QIAamp Fast DNA Stool Kit. A fold change of "+2" indicates a doubling of the normalized read counts for the compared kit relative to the Qiagen kit; a fold change of "-2" indicates a 2-fold reduction. Significant differences are marked with an asterisk (*). DNA extraction kit suppliers: Qiagen (Qia), Stratec Molecular GmbH (Stra), MO BIO Laboratories (Mo), MP Biomedicals (MP).
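The signed fold-change convention used in Figure 4 reports ratios of 1 or more as they are (+2 means doubled) and ratios below 1 as the negative reciprocal (-2 means halved). The sketch below encodes this convention. The optional `pseudocount`, used to avoid division by zero for genera absent from one kit, is an assumption of this sketch, not a parameter described in the text. Note that with this convention "no change" is 1, and values between -1 and +1 never occur.

```python
def signed_fold_change(compared: float, reference: float, pseudocount: float = 0.0) -> float:
    """Signed fold change of a normalized count versus the reference kit.
    ratio >= 1 -> ratio (+2 = doubled); ratio < 1 -> -1/ratio (-2 = halved)."""
    a, b = compared + pseudocount, reference + pseudocount
    if a <= 0 or b <= 0:
        raise ValueError("counts must be positive; consider a pseudocount")
    ratio = a / b
    return ratio if ratio >= 1 else -1 / ratio


# Hypothetical normalized Bacteroides counts: reference (Qiagen) vs another kit
print(signed_fold_change(400, 1200))  # about -3: three times fewer reads than the reference
print(signed_fold_change(200, 100))   # 2.0: doubled
```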

Skin microbiota

DNA was extracted from a human skin microbiota sample using either the in-house method or a commercial kit. Sequencing was performed with Illumina technology, targeting the V3V4 region of the 16S rRNA gene. Figure 5 shows the correlations between two libraries obtained with the in-house method (Lib-1 and Lib-2) and two libraries obtained with the PowerLyzer® PowerSoil® DNA Isolation Kit from MO BIO Laboratories (MoBio-1 and MoBio-2). The duplicate extractions overlap (r > 0.99) at the genus level.


Figure 5. Correlation of normalized genus counts between the different extraction methods. Plots illustrate the linear relationship between each pair of conditions, and the correlation coefficients (r) indicate the strength of this relationship.

Despite the strong correlation between the observed profiles, the relative proportions obtained with one method differ from those obtained with the other (Figure 6). Compared with the profiles obtained with the in-house method, the MO BIO Laboratories kit appears to increase the proportion of Propionibacterium and to reduce that of Corynebacterium.

Figure 6. Relative abundances (%) of the most abundant genera. Conditions: DNA from a human skin microbiota sample; DNA extraction using either the in-house method (LIB) or a MO BIO Laboratories commercial kit (MOBIO).

Impact of sample storage conditions

Four different storage conditions of a stool sample were compared to evaluate the impact of storage on the observed microbiota composition. The microbiota profile of the fresh stool was used as the reference. The DNA extraction kit provided by Qiagen® was used. Sequencing was performed with Illumina technology, targeting the V3V4 region of the 16S rRNA gene.

At the genus level, r values higher than 0.96 were obtained between the bacterial community compositions; all the profiles were thus well correlated (Figure 7).

Figure 7. Correlation of normalized genus counts between the following storage conditions: fresh, frozen, treated with the Omnigene® Gut kit (Omnigut_Stab) and treated with the PSP® Spin Stool DNA Plus kit (Stratec_Stab). Plots illustrate the linear relationship between each pair of conditions, and the correlation coefficients (r) indicate the strength of this relationship.

Consistent with the r values higher than 0.96, the distribution of the 15 most abundant genera across conditions shows little dispersion (Figure 8). It can thus be concluded that the storage condition has only a slight impact on the microbial composition.

Figure 8. Box plot of the normalized read counts for the most abundant genera across the 4 storage conditions.

Impact of the targeted 16S rRNA gene region and of primer set selection

The impact of targeting different regions of the 16S rRNA gene, or of using different primer sets for the same region, on the observed composition of a given microbial community was studied as follows:
Gut microbiota study: DNA was extracted from a frozen stool sample with the kit provided by Qiagen;
Skin microbiota study: DNA was extracted using the in-house method.
Sequencing was carried out using Illumina technology.

Gut microbiota

Considering the relative abundances of the most abundant genera (Figure 9), the same genera are observed with all four primer sets, but at different levels of relative abundance. These most abundant genera are representative of the stool microbiota of healthy human adults. Regarding the impact of the primer sets used, the V1V2 and V3V4 regions gave very similar diversity patterns at the genus level. With the V4V5a primer set, the genus Akkermansia was very highly represented. The V4V5b primer set favored the representation of the genus Rhodobacter and limited that of the genus Ruminococcus.

The apparent proximity between the V1V2 and V3V4 profiles is confirmed by the Pearson product-moment correlation coefficient of 0.9290 reported in Figure 10. The r values between the V4V5b profile and the V1V2 and V3V4 profiles are rather high (0.8403 and 0.9053, respectively). The r values between the V4V5a profile and the V3V4, V1V2 and V4V5b profiles are rather low (0.8166, 0.7892 and 0.6499, respectively).

Figure 9. Relative abundance (%) of the most abundant genera. DNA was extracted from a frozen stool sample. 16S rRNA gene regions were targeted using 4 primer sets: V1V2, V3V4, V4V5a and V4V5b. Amplicons were then sequenced using Illumina technology.

Figure 10. Correlation of normalized genus counts between the different primer sets. Plots illustrate the linear relationship between each pair of conditions, and the correlation coefficients (r) indicate the strength of this relationship.

Skin microbiota

Figure 11 shows the relative abundances of the most abundant genera obtained with the four selected primer sets.

Figure 11. Relative abundance (%) of the most abundant genera. Conditions: DNA extracted from a human skin microbiota sample using the in-house method (LIB); targeted 16S rRNA gene regions: V1V2, V3V4, V4V5a, V4V5b; sequencing using Illumina technology.

The V3V4 and V4V5a primer sets gave very similar profiles. Compared with the V3V4, V4V5a and V4V5b profiles, the profile obtained with the V1V2 primer set shows a high proportion of the genus Propionibacterium and an extremely low representation of the genus Corynebacterium. The V4V5b primer set yielded a profile closer to those obtained with the V3V4 and V4V5a primer sets than to the one obtained with the V1V2 primer set.

Impact of PCR

The possible bias introduced by PCR amplification was assessed by comparing the profiles produced from a selected skin microbiota sample either with PCR of the V4V6 and V1V3 regions of the 16S rRNA gene (sequenced using Roche technology) or without PCR. In the latter case, sequencing was performed using the whole (meta)genome sequencing (WGS) approach. DNA extraction was done with the in-house method. Figure 12 shows the microbial profiles obtained with either the amplicon-based (V1V3 and V4V6) or the WGS approach.

Figure 12. Relative abundance of the most abundant orders by sequencing method. V1V3 and V4V6 refer to the targeted regions of the 16S rRNA gene sequenced by pyrosequencing.

The WGS approach yields relative abundances of the two major orders (Bacillales and Actinomycetales) similar to those obtained with the V4V6 amplicon-based approach. The WGS approach also specifically provides information on the relative abundance of fungal organisms (Malasseziales and Ustilaginales; see also Table 1). The V1V3 primer set reveals a highly unbalanced microbiota, dominated by Actinomycetales (83.4%) with only 8.3% Bacillales. By contrast, the V4V6 primer set displays a more balanced bacterial diversity. The orders reported in Table 1 show that some bacterial orders (e.g. Burkholderiales and Pasteurellales) require a PCR step to be detected. Detecting these low-abundance orders with WGS approaches would require a huge sequencing effort.

ORDER                    KINGDOM      WGS   V4V6   V1V3
BACILLALES               Bacteria    42.0   31.8    8.3
ACTINOMYCETALES          Bacteria    30.2   28.4   83.4
MALASSEZIALES            Fungi       13.8      -      -
USTILAGINALES            Fungi        7.5      -      -
TREMELLALES              Fungi        1.0      -      -
SCHIZOSACCHAROMYCETALES  Fungi        0.8      -      -
SACCHAROMYCETALES        Fungi        0.7      -      -
EUROTIALES               Fungi        0.3      -      -
LACTOBACILLALES          Bacteria     0.3    3.2    0.7
CLOSTRIDIALES            Bacteria     0.3    8.2    1.9
ANAEROLINEALES           Bacteria     0.0    1.6    0.2
BURKHOLDERIALES          Bacteria     0.0    7.5    0.6
NEISSERIALES             Bacteria     0.0    1.5    0.2
PASTEURELLALES           Bacteria     0.0    3.2    1.0
PSEUDOMONADALES          Bacteria     0.0    0.8    0.2
SPHINGOMONADALES         Bacteria     0.0    0.8    0.0
SPIROCHAETALES           Bacteria     0.0    1.0    0.2

Table 1. Relative abundance (%) of the most abundant orders by sequencing method ("-": not detected). V1V3 and V4V6 refer to the targeted regions of the pyrosequenced 16S rRNA gene.

Discussion

Sample storage and DNA extraction reproducibility

Sampling, sample storage before processing, and DNA extraction are critical steps that can greatly affect the results. Processing fresh microbial samples can only be considered a theoretical scenario, since microbial sampling and DNA processing are most often carried out at different locations. Immediate sample freezing is a widely used approach, which is expected to stabilize microbial communities. From a practical point of view, however, the need for a freezer or dry ice may be an issue. Stabilization of samples by homogenization in stabilizing solutions is a more recent, commercially available strategy claimed to preserve microbial community integrity as well as, or even better than, freezing. Although DNA extraction comes after sampling and sample storage in the processing sequence, DNA extraction methods were examined first, since a sine qua non condition for continuing the investigation was that the results obtained be globally consistent with the currently accepted composition of the gut or skin microbiota. In both cases, the most abundant phyla (stool) and genera (stool and skin) obtained with the selected DNA extraction kits were those usually reported in the literature. The reliability and robustness of the DNA extraction methods were investigated using the gut microbiota. Pearson correlation coefficients ranged from 0.91 to 1.0 within each set of triplicates, and it was therefore concluded that, for each extraction kit, the results overlapped. Nevertheless, at a finer taxonomic level, in particular for less abundant phyla or orders, some differences could be detected between DNA extraction kits. The kit that performed worst in terms of agreement between triplicates was also the one with the lowest DNA yield.

Regarding the impact of storage conditions on the microbial community profile, all three storage options were shown to have a fairly low impact compared with the fresh stool results.

Impact of methodological choices on revealed diversity

To describe microbiome diversity, the DNA extraction method must be chosen first. For the human skin microbiota, both tested methods gave highly correlated microbial profiles; in our experience, technical expertise and laboratory habits can guide the choice. For the human gut microbiome, even though the methods gave very similar global results, significantly different results were obtained for less abundant phyla or orders. Depending on the context (previous studies, available data on the presence of certain phyla, classes or genera, and whether the focus is on the most abundant or the less abundant organisms), different DNA extraction methods may be considered.

The second choice concerns whether PCR is required, which depends on the sequencing technology. The strategy is case-dependent. On the one hand, PCR is not required in the WGS approach, so no amplification bias alters the population composition; on the other hand, at a given sequencing depth WGS only gives access to the most dominant taxa, and therefore provides an incomplete representation of the microbiome. Conversely, the PCR-based approach makes it possible to zoom in on specific taxa, which may be non-dominant. Two possible objectives can then be considered:
to formally characterize a given microbiota: a WGS approach with a large allocation of sequencing depth should be preferred;
to evaluate the dynamic changes of a given microbial community over time or after a treatment: the PCR-based approach should be preferred.

Finally, if the PCR-based approach is selected, the most appropriate primer set for the study must be defined. Since some primer sets were shown in this study to favor the relative abundance of specific genera, the primer set, like the DNA extraction method, must be chosen to suit the objectives of the study.

Conclusion

The objective of this work was to obtain detailed and reliable information for properly designing experimental plans dedicated to the characterization of microbiomes. Based on two different microbiomes, and consequently on different microbial populations (from phyla to orders), it clearly shows that several technical solutions exist for DNA extraction. The techniques evaluated are robust but not strictly identical in the populations they identify, in terms of both quantity and quality. In addition, several storage techniques exist that preserve the integrity of microbial populations from sampling to DNA extraction, so proximity between the sampling unit and the analysis team is not mandatory. Once highly purified DNA is obtained, methodological choices (sequencing method, bioinformatics, etc.) must be made that will strongly affect the final results of metagenomic analyses. Some knowledge of the environment of interest may guide these choices (provided that the available tools are sufficiently characterized to discriminate between them). For completely unknown environments, the least biased solution should be selected.

It is important to note that neither targeted sequencing nor WGS approaches are absolute quantitative methods. They make it possible to compare environments from a spatio-temporal perspective and, when quantitative data are required, to formulate hypotheses that must be confirmed with specialized techniques, for example RT-PCR.

Regarding taxonomic analyses, the current weakness of taxonomic assignment stems from short read length: the 300 bp reads usually obtained do not provide good resolution, and assignment beyond the genus level is very uncertain. WGS can provide information at the species level, provided that the literature is sufficiently rich. It also provides information about potential functions, with the risk of missing the least abundant species.

To circumvent the short-read issue, the long-read Pacific Biosciences technology (http://www.pacificbiosciences.com/) is a very promising technique that we are investigating for sequencing full-length amplified 16S rRNA genes.

Finally, the take-home message is that metagenomic analyses require appropriate experimental conditions from sampling to data processing, and libragen's 15 years of experience in the field is a clear advantage.

Bibliographic references

1. Handelsman, J. et al. (1998) Chem. Biol., 5(10): 245-249
2. Tringe, S.G. and Rubin, E.M. (2005) Nat. Rev. Genet., 6(11): 805-814
3. Surana, N.K. and Kasper, D.L. (2014) J. Clin. Invest., 124(10): 4197-4203
4. Manichanh, C. et al. (2006) Gut, 55(2): 205-211
5. Lefevre, F. et al. (2007) Biocatal. Biotransform., 25(2-4): 242-250
6. Lefevre, F. et al. (2008) Res. Microbiol., 159(3): 153-161
7. Cardona, S. et al. (2012) BMC Microbiol., 12: 158-168
8. Guo, F. et al. (2013) PLoS One, 8(10): e76185
9. Starke, I.C. et al. (2014) Mol. Biol. Int., 2014: 548683
10. Zhou, Q. et al. (2014) Sci. Rep., 4: e6957
11. Caporaso, J.G. et al. (2010) Nat. Methods, 7(5): 335-336
12. Altschul, S.F. et al. (1997) Nucleic Acids Res., 25(17): 3389-3402
13. Huson, D.H. et al. (2011) Genome Res., 21(9): 1552-1560
14. Overbeek, R. et al. (2005) Nucleic Acids Res., 33(17): 5691-5702
15. Kultima, J.R. et al. (2012) PLoS One, 7(10): e47656
16. Huson, D.H. et al. (2007) Genome Res., 17(3): 377-386
17. Durban, A. et al. (2011) Microb. Ecol., 61(1): 123-133
