Course: Bioinformatics for Biomedical Research (2014). Session: 2.1.3 - Next Generation Sequencing. Technologies and Applications. Part III: NGS Applications II. Statistics and Bioinformatics Unit (UEB) & High Technology Unit (UAT) from Vall d'Hebron Research Institute (www.vhir.org), Barcelona.
Vall d’Hebron Institut de Recerca (VHIR)
Rosa Prieto, Head of the High Tech Unit
15/05/2014
Institut d’Investigació Sanitària acreditat per l’Instituto de Salud Carlos III (ISCIII)
NEXT GENERATION SEQUENCING TECHNOLOGIES AND APPLICATIONS
COURSE ON BIOINFORMATICS FOR BIOMEDICAL RESEARCH
Index
1. INTRODUCTION TO NGS
2. NGS TECHNOLOGY OVERVIEW
3. NGS APPLICATIONS OVERVIEW
4. WHAT IS NEXT IN SEQUENCING TECHNOLOGIES?
NGS applications
-Amplicon sequencing
-Targeted DNA resequencing
-Exome sequencing
-Whole genome sequencing
-Metagenomics
-RNA sequencing
-Targeted RNA resequencing
-Epigenomics
-Sequencing of free DNA/RNA (plasma/serum)
Metagenomics is the study of a collection of genetic material (genomes) from a mixed community of organisms. Metagenomics usually refers to the study of microbial communities.
What can we study?
•The biosphere contains between 10^30 and 10^31 microbial genomes, at least 2–3 orders of magnitude more than the number of plant and animal cells combined.
•Microbes associated with the human body outnumber human cells by at least a factor of ten.
•The vast majority cannot be cultured.
Metagenomics
(16S rRNA)
The 16S rRNA gene consists of highly conserved regions interspersed with more variable regions, allowing PCR primers to be designed that are complementary to universally conserved regions flanking variable regions.
Wu et al. BMC Microbiol. 2010; 10: 206.
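The primer design idea above can be illustrated with a minimal in-silico sketch. The sequences are toy strings invented for illustration (they are not real 16S primers or gene fragments), and the function names are hypothetical: two templates share the same conserved flanks, so a single primer pair recovers a different variable region from each.

```python
# Toy in-silico PCR: conserved flanks are shared, the variable region differs.
# All sequences below are made up for illustration only.

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA string (A/C/G/T only)."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def extract_amplicon(template, forward_primer, reverse_primer):
    """Return the sequence between the two primer sites, or None.

    The forward primer must match the template strand; the reverse primer
    must match the opposite strand, so its reverse complement is searched
    downstream of the forward primer site.
    """
    start = template.find(forward_primer)
    if start == -1:
        return None
    insert_start = start + len(forward_primer)
    end = template.find(reverse_complement(reverse_primer), insert_start)
    if end == -1:
        return None
    return template[insert_start:end]


if __name__ == "__main__":
    conserved_left = "ACGTACGT"    # hypothetical conserved region
    conserved_right = "GATTACAG"   # hypothetical conserved region
    forward = conserved_left
    reverse = reverse_complement(conserved_right)

    taxon_a = "CC" + conserved_left + "TTTGGGCCC" + conserved_right + "AA"
    taxon_b = "CC" + conserved_left + "TTAGGACCA" + conserved_right + "AA"

    for name, template in (("taxon A", taxon_a), ("taxon B", taxon_b)):
        print(name, extract_amplicon(template, forward, reverse))
```

Exact string matching is a simplification; real primers tolerate mismatches and degenerate bases, which this sketch does not model.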
Unidirectional sequencing
Types of metagenomics studies using NGS
-Population screening and diversity
-Genome assembly
-Gene prediction and annotation
-Functional genomics
-Ecology
-Taxonomy
Sampling and pyrosequencing methods for characterizing bacterial communities in the human gut using 16S sequence tags.
Wu et al. BMC Microbiol. 2010; 10: 206.
This is a study of methods for surveying bacterial communities in human feces using 454/Roche pyrosequencing of 16S rRNA gene tags.
Comparison of different methods of sample storage (no effect), DNA extraction and purification (great effect), set of primers for amplification of several variable regions (effect) and GS FLX vs. GS FLX Titanium sequencing (no effect).
Composition of the gut microbiome in the ten subjects studied.
We did find that the choice of 16S rRNA gene region used for analysis had a noticeable effect, with the V6-V9 region representing an outlier. The V6-V9 primers consistently showed the lowest percentage of taxonomic assignments at the genus level. We note that our choice of V6-V9 primer and sequencing direction did not cover the V6 regions efficiently.
NIH Human Microbiome Project
“our other genome”
•To establish associations between the genes of the human intestinal microbiota and our health and disease.
•Focused on two disorders of increasing importance in Europe, Inflammatory Bowel Disease (IBD) and obesity.
MetaHIT Project
Intestinal microbiota deep-sequencing for patient stratification:
•rich microbiota
•poor microbiota (obesity, metabolic disturbance, weight increase)
The obese individuals among the lower bacterial richness group also gain more weight over time. Only a few bacterial species are sufficient to distinguish between individuals with high and low bacterial richness, and even between lean and obese participants. Our classifications based on variation in the gut microbiome identify subsets of individuals in the general white adult population who may be at increased risk of progressing to adiposity-associated co-morbidities.
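As a rough illustration of stratifying by richness, the sketch below counts how many taxa are detected in each sample and splits the samples at a cut-off. The taxon names, counts and threshold are invented for the example, and the function names are hypothetical; the study quoted above derives its own richness measure and groups from sequencing data, which this toy code does not reproduce.

```python
# Toy richness-based stratification. Data and threshold are hypothetical.

def richness(counts):
    """Number of taxa observed at least once in one sample."""
    return sum(1 for n in counts.values() if n > 0)


def stratify(samples, threshold):
    """Label each sample 'rich' or 'poor' by comparing richness to a cut-off."""
    return {
        name: "rich" if richness(counts) >= threshold else "poor"
        for name, counts in samples.items()
    }


if __name__ == "__main__":
    samples = {
        "subject_1": {"taxon_a": 120, "taxon_b": 30, "taxon_c": 5, "taxon_d": 1},
        "subject_2": {"taxon_a": 400, "taxon_b": 0, "taxon_c": 0, "taxon_d": 2},
    }
    # The threshold of 3 taxa is arbitrary and chosen only for this example.
    print(stratify(samples, threshold=3))
```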
The first Genomics technique: microarrays
One gene at a time
Many genes at the same time
PRE-GENOMICS ERA
GENOMICS ERA
Description of two-colour arrays
What is a microarray?
SOLID SURFACE
PROBES
SAMPLE (TARGET)
Fluorescence scanning
Image analysis
Raw data
Wang et al., Nat. Rev. Genetics 10 (2009)
500 pg RNAt 100 pg RNAt (Illumina), 10 pg (ultralow Illumina), 500 pg (Roche)
RNAseq vs microarrays for transcriptome analysis
•Much more sensitive than microarrays
•Higher dynamic range
•Real count of sequences vs. fluorescence intensities
•All RNA species can be sequenced (microarray probes are more focused on coding genes)
•Available for all kinds of organisms
•Protocols optimized for very low input
•Cost is falling rapidly
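Because RNAseq yields read counts rather than intensities, samples sequenced to different depths are usually put on a common scale before comparison. One common and simple scaling is counts per million (CPM); the sketch below uses made-up gene names and counts and a hypothetical function name, and is not presented as the normalization used in this course.

```python
# Counts per million (CPM): scale raw read counts by library size.
# Gene names and counts are invented for illustration.

def counts_per_million(counts):
    """Convert raw per-gene read counts of one sample to CPM."""
    library_size = sum(counts.values())
    if library_size == 0:
        raise ValueError("sample has no reads")
    return {gene: n * 1_000_000 / library_size for gene, n in counts.items()}


if __name__ == "__main__":
    shallow = {"gene_a": 50, "gene_b": 150, "gene_c": 0}
    deep = {"gene_a": 500, "gene_b": 1500, "gene_c": 0}
    # Same relative expression, tenfold difference in depth: CPM values match.
    print(counts_per_million(shallow))
    print(counts_per_million(deep))
```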
RNAseq library construction
Very high dynamic range (10^5 to 10^7)
Total RNAseq
Nat. Rev. Genetics 2009
More than 95% of the transcripts will be ribosomal
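A quick back-of-the-envelope calculation shows why the ribosomal fraction matters for sequencing depth. Taking the 95% figure above as the rRNA fraction, and a target of 10 million non-ribosomal reads chosen only as an example, the sketch below estimates how many total reads would be needed. The function name is hypothetical.

```python
# How many total reads are needed to obtain a given number of
# non-ribosomal reads, if a fixed fraction of the library is rRNA?

def total_reads_needed(target_non_rrna_reads, rrna_fraction):
    """Total reads required so that the non-rRNA share reaches the target."""
    if not 0 <= rrna_fraction < 1:
        raise ValueError("rrna_fraction must be in [0, 1)")
    return target_non_rrna_reads / (1 - rrna_fraction)


if __name__ == "__main__":
    target = 10_000_000  # example target, not a recommendation
    for fraction in (0.95, 0.50, 0.05):
        needed = total_reads_needed(target, fraction)
        print(f"rRNA fraction {fraction:.2f}: about {needed:,.0f} total reads")
```

With 95% rRNA, only one read in twenty is informative, so roughly twenty times the target must be sequenced; this is the motivation for rRNA removal or poly A+ selection before library construction.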
•Poly A+ selection for mRNAseq: 1st strand synthesis done on oligo-dT attached to magnetic beads.
PROs: very effective at removing ribosomal species. Less sequencing required for the same coverage compared to total RNA.
CONs: RNA quality is an issue (degraded RNA makes it difficult to sequence the 5’ end). Many RNA species get lost (non-coding, miRNA…).
•Standard library construction does not preserve directionality (but protocols are available to generate libraries that do preserve strandedness). This may be particularly useful for finding unannotated genes and ncRNAs and for de novo sequencing.
•Small RNAseq requires specific isolation and RNA library construction protocols.
•FFPE or very poor quality samples can also be sequenced using specific kits and protocols that do not rely on polyA tails.
•Illumina and Ion Torrent sell specific kits for all these kinds of RNA libraries.
•Targeted RNA custom panels also exist.
Other kinds of RNA libraries
Third generation sequencing: PacBio RSII
•AMPLIFICATION OF SAMPLE IS NOT REQUIRED (LOW INPUT, AVOIDS BIAS, MORE UNIFORM COVERAGE, ANALYSIS OF HETEROGENEOUS SAMPLES)
•SMRT Technology (Single Molecule Real Time): highly processive DNA polymerase + labeled phospholinked fluorescent nucleotides recorded in real time → direct observation of nucleotide incorporation
•Long reads (6-10 kb), a small number of reads up to 18 kb
•Single reads show a very high error rate (15%, compared to 0.1-1% for other platforms), but errors are stochastic and are reduced by circular consensus sequencing (high-quality consensus sequence)
•Amplification not required (avoids bias, more uniform coverage)
•Quick delivery of results (runs last from 30 min to 3 hr)
•No problem for GC-rich regions. Modification status of the template nucleotides (5-mC, 5-hmC) can be detected
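Why stochastic errors can be averaged out by circular consensus can be seen with a simplified model. The sketch below assumes each pass over a base is wrong independently with the 15% single-read rate quoted above, and that the consensus is wrong whenever more than half of the passes are wrong. This is a deliberately crude assumption (real consensus calling uses quality-aware models, and wrong passes rarely agree on the same wrong base), and the function name is hypothetical.

```python
# Simplified majority-vote model of circular consensus sequencing (CCS).
# Assumption: per-pass errors are independent with the same probability.
from math import comb


def majority_error(per_pass_error, passes):
    """Probability that more than half of an odd number of passes are wrong."""
    if passes < 1 or passes % 2 == 0:
        raise ValueError("passes must be a positive odd number")
    correct = 1 - per_pass_error
    first_majority = passes // 2 + 1
    return sum(
        comb(passes, k) * per_pass_error ** k * correct ** (passes - k)
        for k in range(first_majority, passes + 1)
    )


if __name__ == "__main__":
    for n in (1, 3, 5, 9, 15):
        print(f"{n:2d} passes: consensus error ~ {majority_error(0.15, n):.6f}")
```

Under this model the consensus error falls steadily as the number of passes grows, which would not happen if the errors were systematic rather than stochastic.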
http://smrt.med.cornell.edu/Strategies.html
2016: end of 454 commercialization and support by Roche
https://ncifrederick.cancer.gov/atp/cms/wp-content/uploads/2011/10/pacbio_technology_backgrounder.pdf
Oxford Nanopore Technologies
https://www.nanoporetech.com/technology/the-minion-device-a-miniaturised-sensing-system/the-minion-device-a-miniaturised-sensing-system
Third generation sequencing: nanopore technology
https://www.nanoporetech.com/technology/introduction-to-nanopore-sensing/introduction-to-nanopore-sensing
GridION
Expected to be released in late Nov. 2014
$1000 genome for everybody
??
•18 Tb/run, 2x150 bp length
•Human sequencing only
•Bioinformatics/interpretation not included
In:
-Macrogen (Seoul)
-Broad Institute in Cambridge (Massachusetts)
-Garvan Institute (Sydney)
Human genomes at 30x coverage
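The relation between run yield, read length and coverage is simple arithmetic: coverage equals total sequenced bases divided by genome size. The sketch below uses the 18 Tb/run and 2x150 bp figures from the slide and assumes a human genome size of about 3.1 Gb (an approximation introduced here, not a number from the slide). The function names are hypothetical.

```python
# Coverage arithmetic: coverage = sequenced bases / genome size.
# HUMAN_GENOME_BP is an approximate value assumed for this example.

HUMAN_GENOME_BP = 3_100_000_000


def bases_for_coverage(genome_size_bp, coverage):
    """Bases that must be sequenced to reach a mean coverage."""
    return genome_size_bp * coverage


def genomes_per_run(run_yield_bp, genome_size_bp, coverage):
    """Whole genomes that fit in one run at the requested coverage."""
    return run_yield_bp // bases_for_coverage(genome_size_bp, coverage)


def read_pairs_needed(genome_size_bp, coverage, read_length_bp):
    """Paired-end read pairs needed for one genome (rounded up)."""
    bases_per_pair = 2 * read_length_bp
    total = bases_for_coverage(genome_size_bp, coverage)
    return -(-total // bases_per_pair)  # ceiling division on integers


if __name__ == "__main__":
    run_yield = 18 * 10**12  # 18 Tb/run, as stated on the slide
    print("bases per 30x genome:", bases_for_coverage(HUMAN_GENOME_BP, 30))
    print("genomes per run:", genomes_per_run(run_yield, HUMAN_GENOME_BP, 30))
    print("2x150 bp pairs per genome:", read_pairs_needed(HUMAN_GENOME_BP, 30, 150))
```

These are nominal figures; they ignore duplicates, low-quality reads and uneven coverage, all of which reduce the usable depth in practice.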
2012
2014
And now... what?
-Sequencing capabilities have been dramatically increased, so obtaining Tb of sequences is no longer an issue.
-Issues to deal with:
Data management
Clinical information
VHIR’s HIGH TECHNOLOGY UNIT (UAT)
•Genomics
•Metabolomics
•Cytomics
•Microscopy
•Statistics and Bioinformatics Unit
Unitat d’Alta Tecnologia (UAT)
VHIR - Mediterrània Building - Ground floor
We offer a set of high-tech services that support teaching activities and research activities in the biomedical field: