Functional Genomics with Next-Generation Sequencing Jen Taylor Bioinformatics Team CSIRO Plant Industry

Page 1

Functional Genomics with Next-Generation Sequencing

Jen Taylor

Bioinformatics Team

CSIRO Plant Industry

Page 2

CSIRO. INI Meeting July 2010 - Tutorial - Applications

Capacity and Resolution

• Next generation sequencing
• Increasing capacity leads to increased resolution

Eric Lander, Broad Institute

Page 3

How Does a Genome Work?

Parts Description
• Function?
• Interconnectedness?

Comparisons
• Population-level
• Between genomes

Page 4

Application domains

Reference genome

No Reference Genome

Partially sequenced

UNsequenced

“PUN Genomes”

Page 5

Impact of a Reference Genome

Sequence Data

Alignment

Read Density

Characterisation

Genome Assembly

Contigs

Page 6

Applications of Next Generation Sequencing

• Profiling of Variation
  • Genetic variation
  • Transcript variation
  • Epigenetic variation
  • Metagenomic variation

• Discovery
  • Novel genomes
  • Novel genes
  • Novel transcripts
  • Small / long non-coding RNA

RNA Sequencing (RNASeq)

• Coding and non-coding transcript profiling

• Dynamic and Context dependent

Epigenomics

• Genome-wide protein-DNA interactions, DNA modifications

• Heritable and reversible regulation of gene expression

Today

Page 7

RNASeq

• Qualitative – transcript diversity
• Quantitative – transcript abundance
• Impact of NGS
  • Observation of transcript complexity
  • Transcript discovery
  • Small / long non-coding RNA
• Analytical challenges
  • Transcript complexity
  • Compositional properties

Page 8

RNASeq

Library Construction

Sample

Total RNA

PolyA RNA

Small RNA

Sequencing

Base calling & QC

Mapping to Genome

Assembly to Contigs

Digital “Counts”

Reads per kilobase of exon model per million mapped reads (RPKM)

Transcript structure

Secondary structure

Targets or Products

Reference

PUN

Analysis
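The RPKM normalisation named in the workflow above can be sketched as follows. This is a minimal illustration, assuming a single transcript length in base pairs and a known count of mapped reads; the function name and the example numbers are hypothetical and are not taken from the slides.

```python
def rpkm(read_count, transcript_length_bp, total_mapped_reads):
    """Reads per kilobase of transcript per million mapped reads."""
    if transcript_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("length and library size must be positive")
    kilobases = transcript_length_bp / 1_000
    millions = total_mapped_reads / 1_000_000
    return read_count / (kilobases * millions)


# Hypothetical example: 500 reads on a 2,000 bp transcript
# in a library of 10 million mapped reads.
example_rpkm = rpkm(500, 2_000, 10_000_000)
print(example_rpkm)
```

Dividing by length makes long and short transcripts comparable within a sample; dividing by library size makes samples of different depth comparable.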

Page 9

RNASeq – Transcript Complexity

Mapping:
• Reads with multiple locations
  • Conserved domains?
  • Sequencing error?
• Reads spanning exons
  • Gapped alignments?
  • Sequencing error?

ERANGE pipeline: Mortazavi et al., Nature Methods 5(7), July 2008
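One way to handle reads with multiple locations is to split each such read among its candidate genes in proportion to the uniquely mapped reads those genes already have. The sketch below illustrates that proportional-allocation heuristic only; it is not the ERANGE code, and the function name, gene names and counts are hypothetical.

```python
from collections import defaultdict


def allocate_multireads(unique_counts, multiread_hits):
    """Split each multi-mapping read among its candidate genes in
    proportion to the uniquely mapped read counts of those genes."""
    totals = defaultdict(float)
    for gene, count in unique_counts.items():
        totals[gene] += count
    for candidates in multiread_hits:
        weights = [unique_counts.get(gene, 0) for gene in candidates]
        weight_sum = sum(weights)
        for gene, weight in zip(candidates, weights):
            # Fall back to an even split when no candidate has unique reads.
            share = weight / weight_sum if weight_sum else 1 / len(candidates)
            totals[gene] += share
    return dict(totals)


# Hypothetical data: two paralogous genes and ten reads that map to both.
unique = {"geneA": 90, "geneB": 10}
multi = [("geneA", "geneB")] * 10
allocated = allocate_multireads(unique, multi)
print(allocated)
```

The alternative of discarding multireads is simpler but systematically under-counts genes that share conserved domains.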

Page 10

RNASeq – Compositional properties

Depth of Sequence
• Sequence count ≈ Transcript Abundance
• The majority of the data can be dominated by a small number of highly abundant transcripts
• The ability to observe low-abundance transcripts depends on sequencing depth
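The dependence of detection on depth can be made concrete with a simple model. The sketch below assumes the read count for a transcript is Poisson-distributed with mean equal to its fraction of the library times the number of reads; that model, the function name and the example fractions are assumptions for illustration, not something stated on the slide.

```python
import math


def detection_probability(transcript_fraction, total_reads, min_reads=1):
    """Probability of seeing at least `min_reads` reads from a transcript,
    treating its read count as Poisson with mean fraction * depth."""
    mean = transcript_fraction * total_reads
    p_fewer = sum(
        math.exp(-mean) * mean ** k / math.factorial(k)
        for k in range(min_reads)
    )
    return 1.0 - p_fewer


# Hypothetical rare transcript making up 1 in 10 million sequenced fragments.
shallow = detection_probability(1e-7, 1_000_000)    # 1 million reads
deep = detection_probability(1e-7, 50_000_000)      # 50 million reads
print(shallow, deep)
```

Under this model the same rare transcript is usually missed in the shallow library and almost always seen in the deep one.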

Page 11

RNASeq – Compositional properties

Composition
• Sequence counts are a composition of a fixed number of total sequence reads
• Therefore they are sum-constrained and not independent
• Large variations in component numbers and sizes can produce artefacts

True Reads

RPKM
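The sum constraint can be illustrated with a small simulation. In the sketch below only transcript C changes in true abundance between two samples, yet the RPKM of A and B falls because the library size is fixed. Expected reads are assumed to be proportional to abundance times length; all names and numbers are hypothetical.

```python
def rpkm_profile(true_abundance, lengths_bp, total_reads):
    """Expected RPKM per transcript for a fixed-size library."""
    # Expected reads are assumed proportional to abundance x length.
    weights = {t: true_abundance[t] * lengths_bp[t] for t in true_abundance}
    weight_sum = sum(weights.values())
    reads = {t: total_reads * w / weight_sum for t, w in weights.items()}
    return {
        t: reads[t] / (lengths_bp[t] / 1e3) / (total_reads / 1e6)
        for t in reads
    }


lengths = {"A": 1000, "B": 1000, "C": 1000}
sample1 = {"A": 100, "B": 100, "C": 100}
sample2 = {"A": 100, "B": 100, "C": 1000}  # only C changes

r1 = rpkm_profile(sample1, lengths, 1_000_000)
r2 = rpkm_profile(sample2, lengths, 1_000_000)
print(r1["A"] / r2["A"])  # apparent fold change of the unchanged transcript A
```

A naive comparison would report A and B as down-regulated even though their true abundance is identical in both samples.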

Page 12

RNASeq - Correspondence

• Good correspondence with:
  • Expression Arrays
  • Tiling Arrays
  • qRT-PCR
• Range of up to 5 orders of magnitude
• Better detection of low abundance transcripts
• Greater power to detect:
  • Transcript sequence polymorphism
  • Novel trans-splicing
  • Paralogous genes
  • Individual cell type expression

Page 13

Reference Genome - RNASeq

Page 14

Reference Genome - RNASeq

Human Exome

Number of exons targeted: ~180,000 (CCDS database), plus 700+ miRNA (Sanger v13) and 300+ ncRNA

Page 15

Epigenome

• Protein-DNA interactions [ChIPSeq]
  • Nucleosome positioning
  • Histone modification
  • Transcription factor interactions
• Methylation [MethylSeq]
• Impact of NextGen
  • Whole genome profiling
  • Resolution
• Analytical challenges
  • Systematic bias
  • Unambiguous mapping
  • Robust event calling

Image : ClearScience

Page 16

ChIPSeq

MNase Linker Digest

Sequence & Align

Remove Nucleosomes

Page 17

ChIPSeq

MNase Digest

Sequence & Align

Remove Nucleosomes

Page 18

ChIPSeq methods

Pepke et al., 2009

CisGenome

ERANGE

FindPeaks

F-Seq

GLITR

MACS

PeakSeq

QuEST
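The common idea behind event calling from read density can be sketched with a toy window-based caller. This is not the algorithm of any tool listed above: it assumes a uniform Poisson background, fixed non-overlapping windows and no control sample, and every name and parameter is hypothetical.

```python
import math
from collections import Counter


def poisson_sf(k, lam):
    """P(X >= k) for X ~ Poisson(lam)."""
    cdf = sum(math.exp(-lam) * lam ** i / math.factorial(i) for i in range(k))
    return max(0.0, 1.0 - cdf)


def call_enriched_windows(read_starts, genome_length, window=200, p_cutoff=1e-5):
    """Return (start, end, read_count) for windows with more reads than
    expected under a uniform Poisson background."""
    counts = Counter(pos // window for pos in read_starts)
    n_windows = math.ceil(genome_length / window)
    lam = len(read_starts) / n_windows  # mean reads per window
    peaks = []
    for idx, k in sorted(counts.items()):
        if poisson_sf(k, lam) < p_cutoff:
            end = min((idx + 1) * window, genome_length)
            peaks.append((idx * window, end, k))
    return peaks


# Hypothetical data: one background read per window plus a pile-up at 5,000.
background = list(range(0, 10_000, 200))
pile_up = list(range(5_000, 5_030))
peaks = call_enriched_windows(background + pile_up, genome_length=10_000)
print(peaks)
```

Real callers differ mainly in how they model the background, use strand information and correct for systematic bias, which is why robust event calling is listed as a challenge.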

Page 19

MethylSeq using Bisulfite conversion

Cytosine -> (bisulfite conversion) -> Uracil -> (PCR) -> Thymine

5-methylcytosine -> (bisulfite conversion) -> 5-methylcytosine -> (PCR) -> Cytosine

Page 20

Limited publications from BS-Seq

• Mammals

• Methylation predominantly occurs at CpG sites

• Several publications in human

• One publication in mouse

• Plants

• Methylation occurs at CG, CHH, CHG sites

• Two publications in Arabidopsis

H = A, C, T
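The three sequence contexts can be assigned from the two bases that follow a cytosine. The sketch below is a minimal classifier for forward-strand cytosines, using the IUPAC meaning of H (A, C or T, that is, any base except G); the function name and example sequences are hypothetical.

```python
def cytosine_context(sequence, position):
    """Classify the cytosine at `position` (0-based, forward strand)
    as 'CG', 'CHG' or 'CHH'; return None if the context is incomplete."""
    seq = sequence.upper()
    if seq[position] != "C":
        raise ValueError("position does not hold a cytosine")
    following = seq[position + 1:position + 3]
    if len(following) >= 1 and following[0] == "G":
        return "CG"
    if len(following) < 2 or "N" in following:
        return None  # too close to the sequence end, or ambiguous base
    return "CHG" if following[1] == "G" else "CHH"


print(cytosine_context("ACGT", 1))
print(cytosine_context("ACAGT", 1))
print(cytosine_context("ACTTA", 1))
```

Cytosines on the reverse strand would be classified the same way after reverse-complementing the sequence.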

Page 21

Problems of mapping BS-seq reads

• Reduced sequence complexity

Cm = methylated cytosine; C = unmethylated cytosine

Watson >>A Cm G T T C T C C A G T C>>

Bisulfite conversion

>>A Cm G T T T T T T A G T T>>

>>A C G T T T T T T A G T T>>
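The loss of sequence complexity can be reproduced in silico. The sketch below applies the conversion rule to the Watson strand from this slide: unmethylated C is read as T, methylated C is still read as C. The function name and the way methylated positions are passed in are assumptions for illustration.

```python
def bisulfite_convert(sequence, methylated_positions=()):
    """Read-out of one strand after bisulfite treatment and PCR:
    unmethylated C becomes T, methylated C stays C."""
    methylated = set(methylated_positions)
    return "".join(
        "T" if base == "C" and i not in methylated else base
        for i, base in enumerate(sequence.upper())
    )


watson = "ACGTTCTCCAGTC"                       # Cm is at index 1
converted = bisulfite_convert(watson, methylated_positions=[1])
fully_unmethylated = bisulfite_convert(watson)
print(converted)
print(fully_unmethylated)
```

After conversion most of the read consists of only three letters, so more genomic locations look alike and unique placement becomes harder.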

Page 22

Problems of mapping BS-seq reads

• Increased search space

Watson >> A Cm G T T C T C C A G T C >>
Crick  << T G Cm A A G A G G T C A G <<

After bisulfite conversion:
BSW >> A Cm G T T T T T T A G T T >>
BSC << T G Cm A A G A G G T T A G <<

After PCR:
BSW  >> A Cm G T T T T T T A G T T >>
BSWR << T G C A A A A A A T C A A <<
BSCR >> A C G T T C T C C A A T C >>
BSC  << T G Cm A A G A G G T T A G <<
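The four strands that a mapper has to consider can be generated from a single Watson sequence. The sketch below follows the slide's layout, writing Crick as the base-by-base complement aligned under Watson, so positions index both strings left to right. BSWR and BSCR are taken to be the complements of BSW and BSC produced by PCR; the function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def convert(strand, methylated_positions):
    """Bisulfite read-out: unmethylated C becomes T, methylated C stays C."""
    methylated = set(methylated_positions)
    return "".join(
        "T" if base == "C" and i not in methylated else base
        for i, base in enumerate(strand)
    )


def four_strands(watson, watson_methylated=(), crick_methylated=()):
    """Return BSW, BSWR, BSC and BSCR, each written aligned to Watson."""
    crick = watson.translate(COMPLEMENT)
    bsw = convert(watson, watson_methylated)
    bsc = convert(crick, crick_methylated)
    return {
        "BSW": bsw,
        "BSWR": bsw.translate(COMPLEMENT),
        "BSC": bsc,
        "BSCR": bsc.translate(COMPLEMENT),
    }


strands = four_strands("ACGTTCTCCAGTC", watson_methylated=[1], crick_methylated=[2])
for name, sequence in strands.items():
    print(name, sequence)
```

Because BSW and BSC are no longer complementary after conversion, a read may originate from any of four distinct sequences rather than two, which is the increased search space.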

Page 23

ELAND

• Mapping reads to genome sequences

• Mapping reads to two converted genome sequences

• Cross match for reads mapping to multiple positions in converted genomes

• Mapping results were combined to generate methylation information

• Eland only allows 2 mismatches.

Lister et al. Cell (2008)
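The idea of mapping against converted genome sequences can be sketched as follows. This toy example handles only the Watson-derived case (C to T conversion of both genome and read), uses exact matching instead of ELAND, and keeps a read only if it has one location; the second converted genome (G to A) would be handled analogously. All names and sequences are hypothetical.

```python
def c_to_t(sequence):
    """In-silico conversion used for both the genome and the read."""
    return sequence.replace("C", "T")


def map_bisulfite_read(read, genome):
    """Match in converted space, then compare original bases.
    Returns (position, methylated_sites, unmethylated_sites), or None
    when the read has no hit or more than one hit."""
    conv_genome = c_to_t(genome)
    conv_read = c_to_t(read)
    hits = []
    start = conv_genome.find(conv_read)
    while start != -1:
        hits.append(start)
        start = conv_genome.find(conv_read, start + 1)
    if len(hits) != 1:
        return None
    pos = hits[0]
    methylated = [pos + i for i, base in enumerate(read)
                  if base == "C" and genome[pos + i] == "C"]
    unmethylated = [pos + i for i, base in enumerate(read)
                    if base == "T" and genome[pos + i] == "C"]
    return pos, methylated, unmethylated


genome = "GGA" + "ACGTTCTCCAGTC" + "GAA"
read = "ACGTTTTTTAGTT"  # bisulfite read with one surviving (methylated) C
result = map_bisulfite_read(read, genome)
print(result)
```

A C in the read over a genomic C is called methylated, and a T over a genomic C is called unmethylated; reads that map to several converted positions are discarded here rather than cross-matched.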

Page 24

BSMAP

• Based on a hash table seeding algorithm

Xi and Li BMC Bioinformatics (2009)

Page 25

Re-mapping of Lister’s data using BSMAP

Raw reads: 144,704,372

Method   Uniquely mapped reads   Unique and nonclonal reads   Unique and nonclonal reads (%)
Eland    55,805,931              39,113,599                   27.03
BSMAP    67,975,425              48,498,687                   35.52

Lister et al. Cell (2008)

Page 26

Methylation pattern throughout chromosomes

[Figure: methylation level per 50 Kb along Arabidopsis chromosome 3 (x-axis: position), shown for the CG, CHG and CHH contexts on the Watson and Crick strands]

Page 27

Partially / Unsequenced Genomes

Options for dealing with partial or unsequenced genomes
• Wait for or generate the genome sequence
• ‘Borrow’ a reference genome from a phylogenetic neighbour
• Take a deep breath and ‘do de novo’
  • De novo genome
  • De novo transcriptome

DNA or RNA Sequence Data

Partial Sequence Database

Partial Assembly

Gene Annotation

Genetic Variation

Non-coding RNA

Transcript Variation

Page 28

Plant Genomes – Haploid Size

Human

Arabidopsis

Rice

Potato

Sugarcane

Cotton

Barley

Wheat

Diameter proportional to haploid genome size

Page 29

Plant Genomes – Total Size

Human Cotton Barley Sugarcane

Wheat

Page 30

De novo RNASeq

• Why transcriptome?
  • Large genome sizes with high repeat content are difficult to assemble
  • Transcriptomes are more constant in size
  • Enriched for functional content
• Aims:
  • Transcript discovery
  • Small / long non-coding RNA profiling
• Analytical challenges
  • Assembly – ABySS, Velvet, Euler-SR
  • Comparisons between non-discrete, overlapping transcripts
  • Annotation
  • Ploidy
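The assemblers named above are built around de Bruijn graphs of k-mers. The toy sketch below shows only that core idea: build the graph from error-free reads and extend a contig while the path is unambiguous. It ignores sequencing errors, reverse complements, coverage and in-branching, and the function names and reads are hypothetical.

```python
from collections import defaultdict


def de_bruijn_graph(reads, k):
    """Map each (k-1)-mer to the set of (k-1)-mers that follow it."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def extend_contig(graph, start):
    """Walk forward from `start` while exactly one successor exists."""
    contig = start
    node = start
    seen = {start}
    while len(graph.get(node, ())) == 1:
        (next_node,) = graph[node]
        if next_node in seen:  # stop on cycles
            break
        contig += next_node[-1]
        seen.add(next_node)
        node = next_node
    return contig


# Hypothetical overlapping reads from one short transcript.
reads = ["ATGGCGT", "GGCGTGC", "GTGCAAT"]
contig = extend_contig(de_bruijn_graph(reads, k=4), "ATG")
print(contig)
```

Alternative isoforms and paralogous transcripts share k-mers, so their paths branch and merge in the graph, which is one reason overlapping, non-discrete transcripts are hard to compare.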

Page 31

Summary – Impacts and Challenges

• RNASeq
  • Increased resolution
  • Increased power for transcript complexity and variation
  • Analytical challenges – transcript complexity, compositional bias
  • Large gains in small and long non-coding RNA profiling
• Epigenomics
  • ChIPSeq and MethylSeq
  • Genome-wide with resolution
  • Robust event calling is challenging
• De novo transcriptomics
  • Attractive option for large, repeat rich genomes

Page 32

Acknowledgements

CSIRO PI Bioinformatics Team

Andrew Spriggs

Stuart Stephen

Emily Ying

Jose Robles

Michael James

CSIRO Biostatistics

David Lovell