Transcript
Page 1: Next Generation Sequencing - Applications - MSImaths-people.anu.edu.au/~foret/next_gen_seq/hts_applications.pdf · 1 Genome sequencing De novo genome sequencing Genome resequencing

Next Generation Sequencing - Applications

Sylvain Foret

March 2010

http://dayhoff.anu.edu.au/~sf/next_gen_seq

Page 2

1 Genome sequencing

2 Transcriptome sequencing

3 Bisulfite sequencing

4 ChIP-seq

Page 3

1 Genome sequencing
    De novo genome sequencing
    Genome resequencing

2 Transcriptome sequencing

3 Bisulfite sequencing

4 ChIP-seq

Page 4

De Novo Genome Sequencing

Definition

Sequencing a genome from scratch, without any pre-existing template

[Figure: Biological Sample → DNA extraction → DNA fragmentation → Sequencing]

Page 5

How many sequences?

Coverage depth

$$\text{coverage} = a = \frac{NL}{G}$$

where N is the number of reads, L the read size, and G the genome size.

Assuming that reads are uniformly distributed, and ignoring end effects, the probability of a read starting in an interval [x, x + h] is h/G.

The number of reads falling in this interval thus follows a binomial distribution with mean Nh/G.

For large N (many reads) and small h (h = L, reads are short), the number of reads covering a segment of size L can be approximated by a Poisson distribution with mean a.
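A short worked consequence of the Poisson approximation, as a sketch. The symbol k (the number of reads covering a given position) is notation introduced here and does not appear on the slides.

```latex
P(k \text{ reads cover a position}) \approx \frac{a^{k} e^{-a}}{k!},
\qquad
P(\text{position not covered}) \approx e^{-a},
\qquad
\text{expected proportion covered} \approx 1 - e^{-a}
```

For example, a = 2 gives 1 - e^{-2} ≈ 0.865 under this model.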

Page 6

How many sequences?

Proportion of the genome covered

Coverage               2       4       6        8
Expected proportion    0.864   0.981   0.997    0.999
Expected contig size   1,600   6,700   33,500   186,000

NB: the Poisson approximation usually overestimates the actual proportion covered.
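The figures on this slide can be approximately reproduced with a small sketch. Two assumptions are made here that the slide does not state: the expected contig size is taken to follow the idealised form L(e^a - 1)/a (no minimum overlap required between reads), and the read length is taken to be 500 bases, a value chosen only because it gives numbers close to those shown. The function names are illustrative.

```python
import math

READ_LENGTH = 500  # assumption: not stated on the slide


def proportion_covered(a):
    """Expected fraction of the genome covered at depth a (Poisson model)."""
    return 1.0 - math.exp(-a)


def expected_contig_size(a, read_length=READ_LENGTH):
    """Idealised expected contig length, ignoring any minimum-overlap requirement."""
    return read_length * (math.exp(a) - 1.0) / a


if __name__ == "__main__":
    for a in (2, 4, 6, 8):
        print(a, round(proportion_covered(a), 3), round(expected_contig_size(a)))
```

Real assemblies require a minimum overlap between reads, so observed contigs are typically shorter than this estimate.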

Page 7

Genome Assembly

Reads

Contigs

Scaffolds

Super-Scaffolds

Page 8

Building Contigs

Alignments: theory

Aligning 2 sequences of size n has complexity $O(n^2)$

Aligning m sequences has complexity $O(n^m)$

⇒ Need faster algorithms

Alignments: heuristics

Find ‘similar’ reads by looking for common words ($O(n)$)

Align clusters of similar reads

Allow for more mismatches at the ends of the reads
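The common-word heuristic can be sketched as follows. This is an illustration only, not the method of any particular assembler: the word size `k`, the `min_shared` threshold and the function names are all assumptions. Reads are indexed by the words (k-mers) they contain, and only pairs sharing enough words are kept as candidates for a full alignment.

```python
from collections import defaultdict
from itertools import combinations


def kmers(seq, k):
    """Set of all words of length k occurring in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def candidate_pairs(reads, k=8, min_shared=2):
    """Return pairs of read indices sharing at least min_shared distinct words."""
    index = defaultdict(set)           # word -> indices of reads containing it
    for read_id, seq in enumerate(reads):
        for word in kmers(seq, k):
            index[word].add(read_id)

    shared = defaultdict(int)          # (i, j) -> number of shared words
    for read_ids in index.values():
        for pair in combinations(sorted(read_ids), 2):
            shared[pair] += 1

    return {pair for pair, count in shared.items() if count >= min_shared}


if __name__ == "__main__":
    reads = ["ACGTACGTGGCTAGCTA", "CGTGGCTAGCTATTTT", "TTTTTTTTGGGGGGGGCCCC"]
    print(candidate_pairs(reads, k=4, min_shared=3))
```

Building the index is linear in the total read length. Words that occur in very many reads (repeats) make the pairing step expensive, so practical tools usually mask or down-weight them.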

Page 9

Building Scaffolds

Physical map

For instance: micro-satellites

One marker on the contig: located

Two markers on the contig: oriented

Mate pairs

One mate pair: oriented with other contig

Can provide accurate distance between contigs

Long insert libraries (cosmids, fosmids) are usually part of genome sequencing projects

Page 10

Super-Scaffolds

Any other type of information ...

Weak matches (e.g. poor quality reads)

ESTs

Protein homology

Long range PCR

. . .

Often a manual (and tedious) process

Page 11

Next Generation Sequencing

Which Technique?

The curse of repeats and low complexity

454 is a reasonable choice

Other technologies mainly applied to prokaryotes

However: Panda genome sequencing with Illumina (!!!)

Page 12

1 Genome sequencing
    De novo genome sequencing
    Genome resequencing

2 Transcriptome sequencing

3 Bisulfite sequencing

4 ChIP-seq

Page 13

Genome Resequencing

Definition

Sequencing the genome of a species that already has a sequenced genome. Reads are mapped onto this template; no assembly is involved.

[Figure: Biological Sample → DNA extraction → DNA fragmentation → Sequencing]

Page 14

Genome Resequencing

Looking for differences

Single nucleotide polymorphisms (SNPs)

Insertions and deletions

Other molecular markers: micro-satellites, mini-satellites, ...

Segmental duplications and other genomic re-arrangements

. . .

Page 15

SNPs

Template

ATAGCAGTGCACACGTGCGCACAATATACGACAAACGTTACC

Reads

ATAGTAGTGCACACGTGCGCACAATATCCGACAAACGTTACC
ATAGTAGTGCACACGTGCGCACAATATCCGACAAACGTTACC
ATAGTAGTGCACACGTGCGCACAATATCCGACAAACGTTACC
ATAGTAGTGCACACGTGCGCACAATATCCGACAAACGTTACC
ATAGTAGTGCACACGTGCGCACAATATACGACAAACGTTACC
ATAGTAGTGCACACGTGCGCACAATATACGACAAACGTTACC
ATAGTAGTGCACACGTGCGCACAATATACGACAAACGTTACC
ATAGTAGTGCACACGTGCGCACAATATACGACAAACGTTACC

Homozygous SNP (position 5: C → T in all reads); heterozygous SNP (position 28: A → C in half of the reads)

Source: http://solid.appliedbiosystems.com
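The distinction between the two kinds of SNP can be sketched as a simple pile-up count. This is an illustration under strong assumptions: every read is taken to be already aligned to the template at offset 0 and to span its full length, there are no sequencing errors, and the allele-fraction thresholds `het_min` and `hom_min` are arbitrary values chosen for the example. The template and reads are those of the figure, read as one template plus eight reads.

```python
from collections import Counter

TEMPLATE = "ATAGCAGTGCACACGTGCGCACAATATACGACAAACGTTACC"
READS = (
    ["ATAGTAGTGCACACGTGCGCACAATATCCGACAAACGTTACC"] * 4
    + ["ATAGTAGTGCACACGTGCGCACAATATACGACAAACGTTACC"] * 4
)


def call_snps(template, reads, het_min=0.2, hom_min=0.9):
    """Classify each mismatching column by the fraction of reads carrying it.

    Positions are reported 1-based. Thresholds are illustrative assumptions.
    """
    calls = []
    for pos, ref in enumerate(template):
        column = Counter(read[pos] for read in reads)
        for base, count in column.items():
            if base == ref:
                continue
            fraction = count / len(reads)
            if fraction >= hom_min:
                calls.append((pos + 1, ref, base, "homozygous"))
            elif fraction >= het_min:
                calls.append((pos + 1, ref, base, "heterozygous"))
    return calls


if __name__ == "__main__":
    for call in call_snps(TEMPLATE, READS):
        print(call)
```

Real SNP callers also weigh base qualities, mapping qualities and coverage depth, since a handful of reads cannot separate a heterozygous site from a sequencing error with any confidence.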

Page 16

Genomic Re-Arrangements

Source: http://solid.appliedbiosystems.com

Page 17

Applications

Resequencing applications

Comparing closely related species (e.g. Homo sapiens vs H. neanderthalensis)

Genome wide association studies (GWAS)

Tumor-associated mutations

. . .

Page 18

Targeted Resequencing

Page 19

Next Generation Sequencing

Which Technique?

To resequence the same species, small reads are more cost-effective

For different species, 454 may be preferable

Page 20

Resequencing Example: Myeloid Leukaemia

From Ley et al, Nature 2008

Page 21

Resequencing Example: Maternal Blood

From Chiu et al, PNAS 2008

Page 22

1 Genome sequencing

2 Transcriptome sequencing
    De novo transcriptome sequencing
    Transcriptome profiling
    Differential gene expression

3 Bisulfite sequencing

4 ChIP-seq

Page 23

De Novo Transcriptome Sequencing

Pros

‘Genome of the poor’: Only a small proportion of eukaryotic genomes is protein coding. Therefore sequencing a transcriptome is cheaper than a genome.

Can give more information than a genome: genes can be hard to predict in silico. Here, no need for prediction.

Provides access to alternative splicing.

Cons

No insight into the non-expressed functional elements

Adequate coverage is difficult for genes expressed at low levels

Long transcripts can be difficult to sequence entirely

Page 24

Transcriptome Assembly

Assembly

Same basic procedure as for genomes (reads → contigs)

BUT:

Genomes are linear segments (or circular)

Transcripts are graphs of alternatively spliced exons

No assembler can currently handle this

[Figure: exons E1, E2, E3; transcript 1 = E1 E2 E3; transcript 2 = E1 E3; splice graph over E1, E2, E3]
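The splice graph of the figure can be sketched as a directed graph in which each exon is a node and an edge joins two exons that are adjacent in at least one transcript. This is a minimal illustration; the function name and the dictionary-of-sets representation are assumptions, and exons are treated as opaque labels rather than sequences.

```python
def splice_graph(transcripts):
    """Build exon -> set of exons that directly follow it in some transcript."""
    edges = {}
    for exons in transcripts:
        for exon in exons:
            edges.setdefault(exon, set())
        # Each consecutive exon pair in a transcript contributes one edge.
        for upstream, downstream in zip(exons, exons[1:]):
            edges[upstream].add(downstream)
    return edges


if __name__ == "__main__":
    transcript_1 = ["E1", "E2", "E3"]
    transcript_2 = ["E1", "E3"]   # E2 is skipped
    for exon, successors in splice_graph([transcript_1, transcript_2]).items():
        print(exon, "->", sorted(successors))
```

Each transcript is a path through this graph, which is why an assembler that expects a single linear sequence cannot represent both isoforms at once.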

Page 25

Next Generation Sequencing

Which Technique?

Longer reads make assembly easier

Short reads, especially with mate pairs, can be useful to complement an existing assembly

Page 26

1 Genome sequencing

2 Transcriptome sequencing
    De novo transcriptome sequencing
    Transcriptome profiling
    Differential gene expression

3 Bisulfite sequencing

4 ChIP-seq

Page 27

Transcriptome Profiling

Genome + Transcriptome

Combining a high-quality genome assembly with high-throughput transcriptome sequencing has provided unprecedented insight into the complexity of eukaryotic transcriptomes.

Page 28

Mapping Reads

From Cloonan et al, Nature Methods 2008

Page 29

Mapping Reads

Multiple hits

Read size   M1    M5    M10   M100
25          62%   33%   5%    2%
30          73%   20%   5%    2%
35          79%   17%   4%    2%

From Cloonan et al, Nature Methods 2008

Page 30

Saturating the Transcriptome

From Cloonan et al, Nature Methods 2008

Page 31

Recent Discoveries

Transcriptome profiling breakthroughs

Alternative splicing: 92-94% of human genes undergo alternative splicing

Patterns of alternative splicing are highly dynamic

Discovery of many non-coding RNAs (ncRNA)

Page 32

Next Generation Sequencing

Which Technique?

Short reads are more cost-effective

Mate pairs can improve mapping

Mate pairs impose restrictions on sequence size

Page 33

1 Genome sequencing

2 Transcriptome sequencing
    De novo transcriptome sequencing
    Transcriptome profiling
    Differential gene expression

3 Bisulfite sequencing

4 ChIP-seq

Page 34

Differential Gene Expression

Definition

Identifying genes expressed at different levels in different conditions.

Examples:

Diseased vs healthy

Treated vs non-treated

Mutant vs wild-type

Dose response

More complex, factorial designs

Page 35

Differential Gene Expression

Assumptions

Number of reads from a given transcript is proportional to:

molar concentration

length of transcript

A possible unit of measurement is reads per kilobase of exon model per million mapped reads (RPKM; Mortazavi et al, Nature Methods 2008)

From Mortazavi et al, Nature Methods 2008
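RPKM follows directly from its name: the read count is divided by the exon model length in kilobases and by the number of mapped reads in millions. A minimal sketch, with illustrative function and parameter names:

```python
def rpkm(read_count, exon_length_bp, total_mapped_reads):
    """Reads per kilobase of exon model per million mapped reads."""
    # Equivalent to read_count / (exon_length_bp / 1e3) / (total_mapped_reads / 1e6)
    return read_count * 1e9 / (exon_length_bp * total_mapped_reads)


if __name__ == "__main__":
    # 1,000 reads on a 2 kb exon model, out of 10 million mapped reads
    print(rpkm(1000, 2000, 10_000_000))
```

Dividing by length reflects the assumption that the number of reads is proportional to transcript length, and dividing by the total number of mapped reads makes runs of different depth comparable.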

Page 36

Statistical modelling

The model

Hypothesis: the number of reads mapping to a given gene is a Poisson random variable

Recall that the Poisson distribution is the limit of the binomial as the number of ‘trials’ gets big but the probability of ‘success’ gets small:

$$\mathrm{bin}(n, p) \to \mathrm{Pois}(\mu) \quad \text{as } n \to \infty,\ p \to 0,\ np = \mu$$

Here, $n \sim 10^8$, and for a given gene $j$:

$$p_j = \frac{\text{number of transcripts from gene } j \text{ in flow cell}}{\text{total number of transcripts in flow cell}} \sim 10^{-3} - 10^{-6}$$

Then the number of reads of gene $j$ is Poisson with mean

$$\mu_j = n p_j \sim 10^2 - 10^5$$
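The quality of the Poisson approximation in this regime can be checked numerically. The sketch below compares the two probability mass functions through their logarithms, which avoids overflow for large n. The function names and the range of k values examined are choices made for this example.

```python
import math


def binom_logpmf(k, n, p):
    """Log of the binomial(n, p) probability of k successes."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log1p(-p))


def poisson_logpmf(k, mu):
    """Log of the Poisson(mu) probability of k events."""
    return k * math.log(mu) - mu - math.lgamma(k + 1)


def max_abs_pmf_gap(n, p, ks):
    """Largest absolute difference between the two pmfs over the given k values."""
    mu = n * p
    return max(abs(math.exp(binom_logpmf(k, n, p)) - math.exp(poisson_logpmf(k, mu)))
               for k in ks)


if __name__ == "__main__":
    # Orders of magnitude from the slide: n ~ 1e8, p_j ~ 1e-6, so mu_j ~ 100
    print(max_abs_pmf_gap(10**8, 1e-6, range(50, 151)))
    # A case far from the limit, for contrast
    print(max_abs_pmf_gap(100, 0.5, range(30, 71)))
```

The first gap is tiny while the second is not, which shows that the approximation depends on p being small, not only on n being large.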

Page 37

Poisson distribution and empirical distribution

From Marioni et al, Genome Research 2008

Page 38

Statistical modelling

Hypothesis testing

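Null hypothesis: $\mu_{j1} = \mu_{j2}$

Alternate hypothesis: $\mu_{j1} \neq \mu_{j2}$

Procedure

$x_{jk} \sim \mathrm{Pois}(\mu_{jk})$ where $\mu_{jk} = C_k p_j$

Note: $\hat{\mu}$ means estimate of $\mu$.

If the reads are distributed randomly amongst the N samples:

$$X_j = \sum_{k=1}^{N} \frac{(x_{jk} - \hat{\mu}_{jk})^2}{\hat{\mu}_{jk}} \sim \chi^2_{N-1}$$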
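The statistic X_j can be computed as in the sketch below. Two assumptions are made that the slide does not spell out: C_k is taken to be the total number of mapped reads in sample k, and p_j is estimated by pooling the samples (reads for gene j divided by all reads). The p-value helper covers only the two-sample case (one degree of freedom), where the chi-square tail has a closed form. All names are illustrative.

```python
import math


def poisson_chi2(counts, totals):
    """Chi-square statistic for one gene across N samples.

    counts[k]: reads mapping to the gene in sample k (x_jk)
    totals[k]: assumed to be the total mapped reads in sample k (C_k)
    Returns (statistic, degrees of freedom).
    """
    p_hat = sum(counts) / sum(totals)      # pooled estimate of p_j
    statistic = 0.0
    for x, c in zip(counts, totals):
        mu_hat = c * p_hat                 # expected reads under the null
        statistic += (x - mu_hat) ** 2 / mu_hat
    return statistic, len(counts) - 1


def chi2_sf_df1(statistic):
    """Upper tail probability of a chi-square with 1 degree of freedom."""
    return math.erfc(math.sqrt(statistic / 2.0))


if __name__ == "__main__":
    stat, df = poisson_chi2([100, 200], [1_000_000, 1_000_000])
    print(stat, df, chi2_sf_df1(stat))
```

With more than two samples, the tail probability would come from a chi-square distribution with N - 1 degrees of freedom, for which a statistics library is the usual choice.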

Page 39

Next Generation Sequencing

Which Technique?

Multiplexing can be very useful:

Technical or biological replicates

Complex factorial designs

Cost savings

Page 40

1 Genome sequencing

2 Transcriptome sequencing

3 Bisulfite sequencing
    CpG methylation
    Genome-wide CpG profiles

4 ChIP-seq

Page 41

DNA Methylation

Biological significance

DNA methylation involves the addition of a methyl group to some nucleotides

Present in all realms of life

Involved in various functions

Can be inherited

DNA methylation in animals

Mostly CpG dinucleotides

Gene silencing (chromatin remodelling)

Imprinting

Widespread in mammals

Involved in a number of diseases: cancer, obesity, . . .

Poorly understood

Page 42

DNA Methylation: An Example

DNA methylation in the honeybee

Some insects have a mammalian-like methylase gene set

For instance, the honeybee

Workers and queens, same genome

Dnmt3 knockdown ⇒ queens

This illustrates the importance of methylation in the integration of environmental cues

Page 43

1 Genome sequencing

2 Transcriptome sequencing

3 Bisulfite sequencing
    CpG methylation
    Genome-wide CpG profiles

4 ChIP-seq

Page 44

Bisulfite Sequencing

Page 45

Next Generation Sequencing

Which Technique?

454: longer is better (loss of complexity), but more homopolymers

Short reads are more cost-effective

Mate pairs can improve mapping

SOLiD has the advantage of color-space

Page 46

Bisulfite Sequencing Example: Leukaemia

From Taylor et al, Cancer Research 2008

Page 47

1 Genome sequencing

2 Transcriptome sequencing

3 Bisulfite sequencing

4 ChIP-seq
    Method
    Example
    Ribosome profiling

Page 48

DNA-Protein Interactions

Page 49

ChIP-seq

From Mardis, Nature Methods 2007

Page 50

ChIP-seq

Source: http://solid.appliedbiosystems.com

Page 51

1 Genome sequencing

2 Transcriptome sequencing

3 Bisulfite sequencing

4 ChIP-seq
    Method
    Example
    Ribosome profiling

Page 52

Histone Profiles

Page 53

Histone Profiles

From Schones and Zhao, Nature Reviews Genetics 2008

Page 54

Histone Profiles

From Barski et al, Cell 2007

Page 55

1 Genome sequencing

2 Transcriptome sequencing

3 Bisulfite sequencing

4 ChIP-seq
    Method
    Example
    Ribosome profiling

Page 56

Ribosome Profiling

Problems with RNA-based methods

RNA-based methods (RNA-seq, microarrays, quantitative PCR, . . . ) provide a proxy for protein concentration

However, these methods ignore post-transcriptional events

Ribosome profiling provides a better proxy for protein concentration

Ribosome profiling

Technology similar to ChIP-seq

Measures RNA sequences attached to ribosomes

Very new, might or might not be practical

Page 57

Ribosome Profiling

[Figure: two panels, "Stalled translation" and "Active translation", each plotting Number of Reads against Location on transcript]

