Functional Genomics with Next-Generation Sequencing
Jen Taylor
Bioinformatics Team
CSIRO Plant Industry
CSIRO. INI Meeting July 2010 - Tutorial - Applications
Capacity and Resolution
• Next generation sequencing
• Increasing capacity leads to increased resolution
Eric Lander, Broad Institute
How Does a Genome Work?
Parts Description
• Function?
• Interconnectedness?
Comparisons
• Population-level
• Between genomes
Application domains
• Reference genome
• No reference genome: Partially sequenced or UNsequenced (“PUN genomes”)
Impact of a Reference Genome
Sequence Data
Alignment
Read Density
Characterisation
Genome Assembly
Contigs
Applications of Next Generation Sequencing
• Profiling of Variation
  • Genetic variation
  • Transcript variation
  • Epigenetic variation
  • Metagenomic variation
• Discovery
  • Novel genomes
  • Novel genes
  • Novel transcripts
  • Small / long non-coding RNA
Today:
RNA Sequencing (RNASeq)
• Coding and non-coding transcript profiling
• Dynamic and context dependent
Epigenomics
• Genome-wide protein-DNA interactions, DNA modifications
• Heritable and reversible regulation of gene expression
RNASeq
• Qualitative – transcript diversity
• Quantitative – transcript abundance
• Impact of NGS
  • Observation of transcript complexity
  • Transcript discovery
  • Small / long non-coding RNA
• Analytical challenges
  • Transcript complexity
  • Compositional properties
RNASeq
Library Construction
• Sample: Total RNA, PolyA RNA or Small RNA
Sequencing
• Base calling & QC
Analysis
• Reference genome: Mapping to Genome
• PUN genome: Assembly to Contigs
• Digital “Counts”: reads per kilobase of transcript per million mapped reads (RPKM)
• Transcript structure
• Secondary structure
• Targets or Products
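The RPKM normalisation named in the workflow can be sketched as follows. This is a minimal illustration of the standard definition (read count divided by transcript length in kilobases and by library size in millions of mapped reads); the function name and the example numbers are hypothetical and do not come from the slides.

```python
def rpkm(read_count: int, transcript_length_bp: int, total_mapped_reads: int) -> float:
    """Reads per kilobase of transcript per million mapped reads."""
    if transcript_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("transcript length and total mapped reads must be positive")
    kilobases = transcript_length_bp / 1_000
    millions_of_reads = total_mapped_reads / 1_000_000
    return read_count / kilobases / millions_of_reads


# Hypothetical example: 500 reads on a 2 kb transcript in a library of 10 million mapped reads.
example_rpkm = rpkm(500, 2_000, 10_000_000)
print(example_rpkm)
```

Dividing by length makes long and short transcripts comparable within a sample; dividing by library size makes samples of different depth comparable.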
RNASeq – Transcript Complexity
Mapping:
• Reads with multiple locations
  • Conserved domains?
  • Sequencing error?
• Reads spanning exons
  • Gapped alignments?
  • Sequencing error?
ERANGE pipeline: Mortazavi et al., Nature Methods, Vol. 5 No. 7, July 2008
RNASeq – Compositional properties
Depth of Sequence
• Sequence count ≈ Transcript abundance
• A small number of highly abundant transcripts can dominate the majority of the data
• The ability to observe low-abundance transcripts depends on sequence depth
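The dependence on depth can be made concrete with a small calculation. The sketch below assumes, as a simplification not stated in the slides, that each read is drawn independently with probability equal to the transcript's share of the library; the function names and numbers are illustrative only.

```python
def expected_reads(fraction: float, depth: int) -> float:
    """Expected read count for a transcript making up `fraction` of the library."""
    return fraction * depth


def prob_at_least_one_read(fraction: float, depth: int) -> float:
    """Probability of seeing the transcript at all, assuming independent sampling."""
    return 1.0 - (1.0 - fraction) ** depth


# Hypothetical rare transcript: one read in a million on average.
rare_fraction = 1e-6
for depth in (1_000_000, 10_000_000):
    print(depth, expected_reads(rare_fraction, depth), prob_at_least_one_read(rare_fraction, depth))
```

Under this assumption a transcript at one part per million is missed in roughly a third of 1 million read libraries, but is almost always seen at 10 million reads.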
RNASeq – Compositional properties
Composition
• Sequence counts are a composition of a fixed number of total sequence reads
• Therefore they are sum-constrained and not independent
• Large variations in component numbers and sizes can produce artefacts
[Figure panels: True Reads, RPKM]
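The sum constraint can be illustrated with a toy library. In this sketch the gene names and abundances are invented; only one gene changes between the two conditions, yet the expected counts of the unchanged genes fall because the total number of reads is fixed.

```python
def expected_counts(abundances: dict, total_reads: int) -> dict:
    """Share a fixed number of reads among transcripts in proportion to abundance."""
    total_abundance = sum(abundances.values())
    return {name: total_reads * a / total_abundance for name, a in abundances.items()}


# Hypothetical abundances: only geneC differs between the two conditions.
baseline = {"geneA": 100.0, "geneB": 100.0, "geneC": 100.0}
treated = {"geneA": 100.0, "geneB": 100.0, "geneC": 1000.0}

baseline_counts = expected_counts(baseline, 1_000_000)
treated_counts = expected_counts(treated, 1_000_000)

# geneA is biologically unchanged, but its share of the fixed read total shrinks.
print(baseline_counts["geneA"], treated_counts["geneA"])
```

A naive comparison of counts would report geneA and geneB as down-regulated, which is the kind of artefact the slide warns about.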
RNASeq - Correspondence
• Good correspondence with:
  • Expression arrays
  • Tiling arrays
  • qRT-PCR
• Range of up to 5 orders of magnitude
• Better detection of low abundance transcripts
• Greater power to detect:
  • Transcript sequence polymorphism
  • Novel trans-splicing
  • Paralogous genes
  • Individual cell type expression
Reference Genome - RNASeq
Human Exome
Number of exons targeted: ~180,000 (CCDS database), plus 700+ miRNA (Sanger v13) and 300+ ncRNA
Epigenome
• Protein-DNA interactions [ChIPSeq]
  • Nucleosome positioning
  • Histone modification
  • Transcription factor interactions
• Methylation [MethylSeq]
• Impact of NextGen
  • Whole genome profiling
  • Resolution
• Analytical challenges
  • Systematic bias
  • Unambiguous mapping
  • Robust event calling
Image: ClearScience
ChIPSeq
[Figure labels: MNase linker digest; Remove nucleosomes; Sequence & align]
ChIPSeq methods
Pepke et al., 2009
CisGenome
ERANGE
FindPeaks
F-Seq
GLITR
MACS
PeakSeq
QuEST
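To give a feel for what "event calling" involves, here is a deliberately naive sketch: reads are binned into fixed windows and a window is reported when it has enough ChIP reads and sufficient enrichment over a control. This is not the algorithm of any tool listed above; the window size, thresholds, pseudocount and read positions are all assumptions made for illustration.

```python
from collections import Counter


def window_counts(read_starts, window=100):
    """Count read start positions per fixed-size window."""
    return Counter(pos // window for pos in read_starts)


def call_enriched_windows(chip_starts, control_starts, window=100, min_reads=5, min_fold=3.0):
    """Return (start, end, chip_count) for windows enriched in ChIP over control."""
    chip = window_counts(chip_starts, window)
    control = window_counts(control_starts, window)
    peaks = []
    for w, n in sorted(chip.items()):
        fold = n / (control.get(w, 0) + 1)  # +1 pseudocount avoids division by zero
        if n >= min_reads and fold >= min_fold:
            peaks.append((w * window, (w + 1) * window, n))
    return peaks


# Hypothetical read start positions on one chromosome.
chip_starts = [105, 110, 120, 130, 150, 160, 510, 900]
control_starts = [120, 515]
peaks = call_enriched_windows(chip_starts, control_starts)
print(peaks)
```

Real peak callers additionally model strand shift, local background and statistical significance, which is where the tools listed above differ.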
MethylSeq using Bisulfite conversion
Unmethylated: Cytosine → (bisulfite conversion) → Uracil → (PCR) → Thymine
Methylated: 5-methylcytosine → (bisulfite conversion) → 5-methylcytosine → (PCR) → Cytosine
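The two-step chemistry can be mimicked in silico. The sketch below is illustrative only: methylated positions are passed in as a set of 0-based indices (an assumed representation), and the example sequence is the Watson strand used in the mapping slides of this tutorial, with its second base methylated.

```python
def bisulfite_convert(seq: str, methylated: set) -> str:
    """Unmethylated C becomes U; C at a methylated index is protected."""
    return "".join(
        "U" if base == "C" and i not in methylated else base
        for i, base in enumerate(seq.upper())
    )


def pcr_amplify(converted: str) -> str:
    """After PCR, U is read as T, and a protected 5-methylcytosine is read as C."""
    return converted.replace("U", "T")


watson = "ACGTTCTCCAGTC"  # the C at index 1 is methylated in the tutorial's example
after_bisulfite = bisulfite_convert(watson, {1})
sequenced_read = pcr_amplify(after_bisulfite)
print(after_bisulfite, sequenced_read)
```

Any C that survives in the sequenced read therefore marks a methylated cytosine, while a T at a reference C marks an unmethylated one.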
Limited publications from BS-Seq
• Mammals
  • Methylation predominantly occurs at CpG sites
  • Several publications in human
  • One publication in mouse
• Plants
  • Methylation occurs at CG, CHH and CHG sites (H = A, C or T)
  • Two publications in Arabidopsis
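The three sequence contexts can be told apart from the two bases downstream of a cytosine. A minimal sketch, assuming the sequence is given 5' to 3' on the same strand as the cytosine and using the IUPAC meaning of H (A, C or T, that is, any base except G); the function name is hypothetical.

```python
H_BASES = set("ACT")  # IUPAC code H: any base except G


def methylation_context(seq: str, i: int) -> str:
    """Classify the cytosine at 0-based index i as CG, CHG or CHH."""
    seq = seq.upper()
    if seq[i] != "C":
        raise ValueError("position is not a cytosine")
    first = seq[i + 1:i + 2]   # empty string if past the end of the sequence
    second = seq[i + 2:i + 3]
    if first == "G":
        return "CG"
    if first in H_BASES and second == "G":
        return "CHG"
    if first in H_BASES and second in H_BASES:
        return "CHH"
    return "unknown"  # too close to the end, or an ambiguous base such as N


print(methylation_context("ACGT", 1), methylation_context("ACAGT", 1), methylation_context("ACATT", 1))
```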
Problems of mapping BS-seq reads
• Reduced sequence complexity
Cm: methylated cytosine; C: unmethylated cytosine
Watson                     >> A Cm G T T C T C C A G T C >>
After bisulfite conversion >> A Cm G T T T T T T A G T T >>
As sequenced               >> A C  G T T T T T T A G T T >>
Problems of mapping BS-seq reads
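• Increased search space
Watson >> A Cm G T T C T C C A G T C >>
Crick  << T G Cm A A G A G G T C A G <<
Bisulfite conversion
BSW    >> A Cm G T T T T T T A G T T >>
BSC    << T G Cm A A G A G G T T A G <<
PCR
BSW    >> A Cm G T T T T T T A G T T >>
BSWR   << T G C  A A A A A A T C A A <<
BSCR   >> A C G  T T C T C C A A T C >>
BSC    << T G Cm A A G A G G T T A G <<
BSW / BSC: bisulfite-converted Watson / Crick strand; BSWR / BSCR: their complementary strands produced by PCR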
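The four strand types can be derived programmatically from the example sequence. This is a sketch under stated assumptions: strands are written position by position under the Watson strand (so the Crick strand is the plain complement, read right to left), methylated cytosines are given as 0-based indices, and all helper names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def complement(seq: str) -> str:
    """Base-by-base complement, kept in the same left-to-right column order."""
    return seq.translate(COMPLEMENT)


def convert(seq: str, methylated: set) -> str:
    """Bisulfite conversion followed by PCR: unmethylated C is read as T."""
    return "".join(
        "T" if base == "C" and i not in methylated else base
        for i, base in enumerate(seq)
    )


watson = "ACGTTCTCCAGTC"
crick = complement(watson)

bsw = convert(watson, {1})   # methylated C at index 1 on Watson
bsc = convert(crick, {2})    # methylated C at index 2 on Crick
bswr = complement(bsw)       # strand synthesised against BSW during PCR
bscr = complement(bsc)       # strand synthesised against BSC during PCR

for name, strand in (("BSW", bsw), ("BSWR", bswr), ("BSCR", bscr), ("BSC", bsc)):
    print(name, strand)
```

Before conversion the two strands are complementary and one reference index serves both; after conversion the four products are all distinct, which is why the search space grows.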
ELAND
• Mapping reads to genome sequences
• Mapping reads to two converted genome sequences
• Cross match for reads mapping to multiple positions in converted genomes
• Mapping results were combined to generate methylation information
• ELAND allows only 2 mismatches
Lister et al. Cell (2008)
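The idea of mapping against a converted genome and then recovering methylation calls can be sketched in a few lines. This is not ELAND and not the published pipeline: it handles the forward strand only, uses exact string matching with no mismatches, and the toy genome and function names are assumptions for illustration.

```python
def c_to_t(seq: str) -> str:
    """Fully convert a sequence in silico (every C becomes T)."""
    return seq.upper().replace("C", "T")


def map_read(read: str, genome: str) -> list:
    """All start positions where the converted read matches the converted genome exactly."""
    conv_genome, conv_read = c_to_t(genome), c_to_t(read)
    hits = []
    start = conv_genome.find(conv_read)
    while start != -1:
        hits.append(start)
        start = conv_genome.find(conv_read, start + 1)
    return hits


def call_methylation(read: str, genome: str, start: int) -> dict:
    """Compare the original read with the original genome at each reference C."""
    calls = {}
    reference = genome.upper()[start:start + len(read)]
    for offset, (r, g) in enumerate(zip(read.upper(), reference)):
        if g == "C" and r in "CT":
            calls[start + offset] = "methylated" if r == "C" else "unmethylated"
    return calls


genome = "GGACGTTCTCCAGTCAA"   # hypothetical reference containing the example sequence
read = "ACGTTTTTTAGTT"         # bisulfite read from the tutorial's example
hits = map_read(read, genome)
calls = call_methylation(read, genome, hits[0]) if len(hits) == 1 else {}
print(hits, calls)
```

Reads with more than one hit are left uncalled here, which mirrors the need to treat reads mapping to multiple positions separately.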
BSMAP
• Based on a hash table seeding algorithm
Xi and Li BMC Bioinformatics (2009)
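A generic seed-and-verify scheme built on a hash table can be sketched as follows. This is not BSMAP's implementation; it only illustrates the general idea of looking up a short seed in a hash index and then verifying candidates with an asymmetric rule in which a read T may align to a reference C. The seed length, toy genome and function names are assumptions.

```python
from collections import defaultdict


def build_seed_index(sequence: str, k: int) -> dict:
    """Hash table from every k-mer in the sequence to its start positions."""
    index = defaultdict(list)
    for i in range(len(sequence) - k + 1):
        index[sequence[i:i + k]].append(i)
    return index


def bs_match(read: str, reference: str) -> bool:
    """Asymmetric comparison: read T may match reference C or T; all else must be equal."""
    return len(read) == len(reference) and all(
        r == g or (r == "T" and g == "C") for r, g in zip(read, reference)
    )


def seed_and_verify(read: str, genome: str, k: int = 4) -> list:
    """Find candidates via a C->T converted seed, then verify against the original genome."""
    index = build_seed_index(genome.replace("C", "T"), k)
    seed = read[:k].replace("C", "T")
    return [p for p in index.get(seed, []) if bs_match(read, genome[p:p + len(read)])]


genome = "GGACGTTCTCCAGTCAA"  # hypothetical reference
print(seed_and_verify("ACGTTTTTTAGTT", genome))  # read T over reference C is allowed
print(seed_and_verify("ACGTCTTTTAGTT", genome))  # read C over reference T is rejected
```

Verifying against the unconverted genome keeps the C/T asymmetry, which a fully converted three-letter comparison would discard.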
Re-mapping of Lister’s data using BSMAP
Raw reads: 144,704,372
Method   Uniquely mapped reads   Unique and nonclonal reads   Unique and nonclonal reads (% of raw reads)
Eland    55,805,931              39,113,599                   27.03%
BSMAP    67,975,425              48,498,687                   33.52%
Lister et al. Cell (2008)
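The percentage column is the count of unique, nonclonal reads divided by the number of raw reads. A short sketch that recomputes it from the counts on this slide (the variable and function names are hypothetical):

```python
RAW_READS = 144_704_372

# (uniquely mapped reads, unique and nonclonal reads) as given on the slide
RESULTS = {
    "Eland": (55_805_931, 39_113_599),
    "BSMAP": (67_975_425, 48_498_687),
}


def percent_of_raw(count: int, raw_reads: int = RAW_READS) -> float:
    """Express a read count as a percentage of all raw reads, to two decimals."""
    return round(100 * count / raw_reads, 2)


for method, (unique, nonclonal) in RESULTS.items():
    print(method, percent_of_raw(unique), percent_of_raw(nonclonal))
```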
Methylation pattern throughout chromosomes
[Figure: methylation level per 50 Kb by position along Arabidopsis chromosome 3, plotted for the Watson and Crick strands in the CG, CHG and CHH contexts]
Partially / Unsequenced Genomes
Options for dealing with partial or unsequenced genomes
• Wait for or generate the genome sequence
• ‘Borrow’ a reference genome from a phylogenetic neighbour
• Take a deep breath and ‘do de novo’
  • De novo genome
  • De novo transcriptome
[Figure labels: DNA or RNA Sequence Data; Partial Sequence Database; Partial Assembly; Gene Annotation; Genetic Variation; Non-coding RNA; Transcript Variation]
Plant Genomes – Haploid Size
[Figure: circles for Human, Arabidopsis, Rice, Potato, Sugarcane, Cotton, Barley and Wheat; diameter proportional to haploid genome size]
Plant Genomes – Total Size
[Figure: total genome size for Human, Cotton, Barley, Sugarcane and Wheat]
De novo RNA Seq
• Why transcriptome?
  • Large genome sizes with high repeat content are difficult to assemble
  • Transcriptomes are more constant in size
  • Enriched for functional content
• Aims:
  • Transcript discovery
  • Small / long non-coding RNA profiling
• Analytical challenges
  • Assembly – ABySS, Velvet, Euler-SR
  • Comparisons between non-discrete, overlapping transcripts
  • Annotation
  • Ploidy
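Short-read assemblers such as those named above are built around de Bruijn graphs of k-mers. The toy sketch below shows only the core idea and is not how any of those tools is implemented: it ignores sequencing errors, reverse complements, coverage and branching into a node, and the reads and function names are invented for illustration.

```python
from collections import defaultdict


def de_bruijn(reads, k):
    """Graph whose nodes are (k-1)-mers and whose edges are the k-mers seen in the reads."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def walk_unambiguous(graph, start):
    """Extend a contig from `start` while exactly one outgoing edge exists."""
    contig, node, seen = start, start, {start}
    while len(graph.get(node, ())) == 1:
        (next_node,) = graph[node]
        if next_node in seen:  # stop rather than loop forever on a cycle
            break
        contig += next_node[-1]
        seen.add(next_node)
        node = next_node
    return contig


# Hypothetical overlapping reads from one transcript.
reads = ["ATGGCGT", "GGCGTGC", "GTGCAAT"]
contig = walk_unambiguous(de_bruijn(reads, 4), "ATG")
print(contig)
```

Repeats and overlapping isoforms create nodes with several outgoing edges, where this walk simply stops; resolving such branches is part of what makes transcriptome assembly hard.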
Summary – Impacts and Challenges
• RNASeq
  • Increased resolution
  • Increased power for transcript complexity and variation
  • Analytical challenges – transcript complexity, compositional bias
  • Large gains in small and long non-coding RNA profiling
• Epigenomics
  • ChIPSeq and MethylSeq
  • Genome-wide with resolution
  • Robust event calling is challenging
• De novo transcriptomics
  • Attractive option for large, repeat-rich genomes
Acknowledgements
CSIRO PI Bioinformatics Team
Andrew Spriggs
Stuart Stephen
Emily Ying
Jose Robles
Michael James
CSIRO Biostatistics
David Lovell