
Next-generation genomics: an integrative approach


Page 1: Next-generation genomics: an integrative approach

Permissions: you are free to blog or live-blog about this presentation as long as you attribute the work to its authors

Korea Center for Disease Control & Prevention

Next-generation genomics: an integrative approach

Chang Bum Hong

Division of Structural and functional Genomics, Center for Genome Sciences, NIH

Page 2: Next-generation genomics: an integrative approach

twitter

Page 3: Next-generation genomics: an integrative approach

APPLICATIONS OF NEXT-GENERATION SEQUENCING

2011
• Genome structural variation discovery and genotyping
• RNA sequencing: advances, challenges and opportunities
• Charting histone modifications and the functional organization of mammalian genomes

2010
• Evaluating genome-scale approaches to eukaryotic DNA replication
• Advances in understanding cancer genomes through second-generation sequencing
• Genome-wide allele-specific analysis: insights into regulatory variation
• Next-generation genomics: an integrative approach
• Uncovering the roles of rare variants in common disease through whole-genome sequencing
• Principles and challenges of genome-wide DNA methylation analysis
• Prokaryotic transcriptomics: a new view on regulation, physiology and pathogenicity
• Sequencing technologies - the next generation
• RNA processing and its regulation: global insights into biological networks

2009
• The complex eukaryotic transcriptome: unexpected pervasive transcription and novel small RNAs
• ChIP-seq: advantages and challenges of a maturing technology
• Insights from genomic profiling of transcription factors
• RNA-Seq: a revolutionary tool for transcriptomics

Page 4: Next-generation genomics: an integrative approach
Page 5: Next-generation genomics: an integrative approach

DNA

RNA

Protein

Complete genome resequencing
Targeted genomic resequencing
de novo sequencing

Translated into proteins

DNA being transcribed into RNA

Phenotype
Disease

Chromatin immunoprecipitation sequencing
Sequencing of bisulfite-treated DNA

Epigenome
Transcriptome sequencing
Small RNA sequencing

Proteomics

Transcriptomics

Genomics

Genome-scale data
GWAS, ChIP-seq and RNA-seq

Page 6: Next-generation genomics: an integrative approach

Next-generation sequencing

• We define this as the use of established sequencing platforms, including the

• Illumina/Solexa Genome Analyzer

• Roche/454 Genome Sequencer

• Applied Biosystems SOLiD

• Helicos and Pacific Biosciences

HiSeq 2000

5500xl SOLiD System

MiSeq

Ion Personal Genome Machine

Genome Sequencer FLX System

GS Junior

HeliScope Single Molecule Sequencer
PacBio RS
Jay Flatley Greg Lucier

Page 7: Next-generation genomics: an integrative approach

Jay Flatley Greg Lucier Stephen Quake

Jim Watson Craig Venter

John West
Former Illumina CEO
Founder of Helicos
Life Technologies CEO
Illumina CEO

?

Page 8: Next-generation genomics: an integrative approach

BGI: 1 x 454, 27 x SOLiD3/4, 128 x Illumina HiSeq

Broad Institute: 94 x Illumina GA2, 10 x 454, 8 x SOLiD3/4, 1 x Heliscope, 1 x Polonator, 1 x PacBio

Next Generation Genomics: World Map of High-throughput Sequencers
http://pathogenomics.bham.ac.uk/hts/

GMI at Seoul National University College of Medicine: 10 x Illumina GA2
Macrogen: 10 x Illumina GA2, 1 x 454, 2 x SOLiD3/4
NICEM: Illumina GA2, 454
Gachon University of Medicine and Science: Illumina GA2, 2 x SOLiD3/4
KRIBB: 1 x Illumina GA2

Page 9: Next-generation genomics: an integrative approach

• Next-next....-generation: how many ‘next’s are there?

• First Generation: automated version of Sanger sequencing (DNA-sequencing method invented by Fred Sanger in the 1970s)

• Second Generation

• Roche/454 sequencing machine from 454 Life Sciences (2005)

• 450 bases per read / $0.02 per 1000 bases / 2 days per Gb

• Solexa from Illumina (2006)

• 75 bases per read / $0.01 per 1000 bases / 0.5 days per Gb

• SOLiD from Applied Biosystems (2006)

• 50 bases per read / $0.001 per 1000 bases / 0.5 days per Gb

• Next-Next-Gen - Third Generation?

• HiSeq 2000 from Illumina - 0.04 days per Gb

• Helicos Heliscope

• Pacific Biosciences SMRT

Sequencing technologies
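To put the per-platform figures quoted above into perspective, the following is a minimal arithmetic sketch. The platform numbers are the ones on the slide; the 3 Gb genome size and 30x depth in the usage example are illustrative assumptions, not values from the presentation, and the function names are hypothetical.

```python
# Per-platform figures as quoted on the slide:
# (read length in bases, USD per 1000 bases, days per gigabase).
PLATFORMS = {
    "Roche/454": (450, 0.02, 2.0),
    "Illumina/Solexa": (75, 0.01, 0.5),
    "SOLiD": (50, 0.001, 0.5),
}

BASES_PER_GB = 1_000_000_000


def cost_per_gb(usd_per_kb: float) -> float:
    """Convert a price per 1000 bases into a price per gigabase."""
    return usd_per_kb * BASES_PER_GB / 1000


def project_totals(genome_gb: float, depth: float) -> dict:
    """Return {platform: (total USD, total days)} for one sequencing project."""
    total_gb = genome_gb * depth  # raw bases to generate, in Gb
    return {
        name: (cost_per_gb(usd_per_kb) * total_gb, days_per_gb * total_gb)
        for name, (_read_len, usd_per_kb, days_per_gb) in PLATFORMS.items()
    }


if __name__ == "__main__":
    # Assumed project: a 3 Gb genome at 30x depth (illustrative values only).
    for name, (usd, days) in project_totals(3.0, 30.0).items():
        print(f"{name}: {usd:,.0f} USD, {days:.0f} days")
```

At $0.02 per 1000 bases, one gigabase costs about 20,000 USD, which is why a tenfold drop in per-base price matters so much at genome scale.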

Page 10: Next-generation genomics: an integrative approach

Shendure & Ji, 2008

Michael L. Metzker, 2010

Sequencing technologies
Feature generation

Page 11: Next-generation genomics: an integrative approach

Sequencing technologies
Sequencing by synthesis

Michael L. Metzker, 2010

Page 12: Next-generation genomics: an integrative approach

• Sequencing

• How deep?

• Single, Paired read or both

• Alignment

• Reference, assembly or both

• Experiment-specific analysis

• A ‘one-size-fits-all’ program does not exist

NGS typical procedure
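The "How deep?" question is usually answered with the standard mean-coverage relation, depth = (number of reads x read length) / genome size. The sketch below is one way to work with it, assuming uniformly distributed reads (the Lander-Waterman/Poisson idealization); the function names are hypothetical and the numbers in the usage lines are illustrative.

```python
import math


def mean_depth(n_reads: int, read_len: int, genome_size: int) -> float:
    """Mean coverage depth: total sequenced bases divided by genome size."""
    return n_reads * read_len / genome_size


def reads_for_depth(depth: float, read_len: int, genome_size: int) -> int:
    """Number of reads needed to reach a target mean depth."""
    return math.ceil(depth * genome_size / read_len)


def fraction_covered(depth: float) -> float:
    """Expected fraction of bases covered at least once, assuming reads
    land uniformly at random (Poisson approximation)."""
    return 1.0 - math.exp(-depth)


if __name__ == "__main__":
    # Illustrative assumption: 3 Gb genome, 75-base reads, 30x target depth.
    print(reads_for_depth(30, 75, 3_000_000_000))
    print(round(fraction_covered(1.0), 3))
```

For paired reads, the same read count corresponds to half as many sequenced fragments. Real coverage is less even than this model suggests, so the result is best read as a lower bound on the reads required.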

Page 13: Next-generation genomics: an integrative approach

• Sequence assembly

• Whole Genome Assembly (Reference, De novo)

• Transcriptome Assembly

• Short Sequence Alignment

• Single read

• Paired read

• Genomic Variation Detection

• Detection of Single Nucleotide Polymorphism (SNP)

• Detection of Alternative Splicing Events

• Detection of major/minor transcript isoforms

Applications
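As an illustration of SNP detection from aligned reads, here is a deliberately naive sketch: count the bases that reads show at one reference position and report the most frequent non-reference base if it has enough support. The thresholds (`min_depth`, `min_alt_fraction`) and the function name are assumptions for illustration; production callers also model base quality, mapping quality and genotype likelihoods.

```python
from collections import Counter
from typing import Iterable, Optional

VALID_BASES = {"A", "C", "G", "T"}


def call_snp(ref_base: str, pileup_bases: Iterable[str],
             min_depth: int = 8, min_alt_fraction: float = 0.2) -> Optional[str]:
    """Return the best-supported non-reference base at one position, or None.

    pileup_bases holds the base each overlapping read shows at the position.
    """
    bases = [b.upper() for b in pileup_bases if b.upper() in VALID_BASES]
    if len(bases) < min_depth:
        return None  # too few reads to make a call
    counts = Counter(bases)
    alt_counts = {b: c for b, c in counts.items() if b != ref_base.upper()}
    if not alt_counts:
        return None  # every read agrees with the reference
    alt, alt_count = max(alt_counts.items(), key=lambda kv: kv[1])
    if alt_count / len(bases) >= min_alt_fraction:
        return alt
    return None


if __name__ == "__main__":
    print(call_snp("A", "AAAAGGGGAA"))  # 4 of 10 reads support G
```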

Page 14: Next-generation genomics: an integrative approach

Shendure & Ji, 2008

Applications

Page 15: Next-generation genomics: an integrative approach

Bioinformatics tools

Shendure & Ji, 2008

Page 16: Next-generation genomics: an integrative approach

• Sequence Reads

• fastq

• fasta

• Alignment

• SAM (Sequence Alignment/Map)

• BAM (Binary Alignment Map)

• Variation

• VCF (Variant Call Format)

File Format
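To make the FASTQ format concrete: each read is four lines, a header starting with "@", the sequence, a separator starting with "+", and one quality character per base. The following minimal reader is a sketch that assumes unwrapped four-line records and a Phred+33 quality encoding by default. Some Illumina pipelines of that era used an offset of 64 instead, hence the `offset` parameter. `FastqRecord` and `read_fastq` are hypothetical names.

```python
import io
from typing import Iterator, NamedTuple, TextIO


class FastqRecord(NamedTuple):
    name: str
    sequence: str
    qualities: list  # one integer Phred score per base


def read_fastq(handle: TextIO, offset: int = 33) -> Iterator[FastqRecord]:
    """Yield records from a FASTQ stream with four lines per read."""
    while True:
        header = handle.readline()
        if not header:
            return  # end of file
        header = header.strip()
        if not header:
            continue  # tolerate blank lines between records
        seq = handle.readline().strip()
        plus = handle.readline().strip()
        qual = handle.readline().strip()
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record near {header!r}")
        if len(seq) != len(qual):
            raise ValueError(f"sequence/quality length mismatch in {header!r}")
        yield FastqRecord(header[1:], seq, [ord(c) - offset for c in qual])


if __name__ == "__main__":
    example = io.StringIO("@read1\nACGT\n+\nII!I\n")
    for record in read_fastq(example):
        print(record.name, record.sequence, record.qualities)
```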

Page 17: Next-generation genomics: an integrative approach

Data: Sequence Reads

Page 18: Next-generation genomics: an integrative approach

Data: Sequence Reads

A challenge calling for a new compression algorithm
Compression of genomic sequences in FASTQ format

Page 19: Next-generation genomics: an integrative approach

Sebastian Deorowicz et al., 2011

Data: Sequence Reads

Compression type   Compression time   Size
gzip               14s                28M
bzip2              9.75s              23M
dsrc               1.36s              21M
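A comparison of this kind can be reproduced for the two general-purpose compressors using only the Python standard library. The sketch below times `gzip` and `bz2` on synthetic FASTQ data. dsrc is a separate specialized tool and is not covered here. The synthetic reads are random, so the sizes and times it prints will not match the table, which was measured on real data.

```python
import bz2
import gzip
import random
import time


def make_fastq(n_reads: int, read_len: int = 75, seed: int = 0) -> bytes:
    """Build synthetic FASTQ text (random bases, random high qualities)."""
    rng = random.Random(seed)
    lines = []
    for i in range(n_reads):
        seq = "".join(rng.choice("ACGT") for _ in range(read_len))
        qual = "".join(rng.choice("FGHI") for _ in range(read_len))
        lines.append(f"@read{i}\n{seq}\n+\n{qual}\n")
    return "".join(lines).encode("ascii")


def benchmark(data: bytes) -> dict:
    """Return {codec: (seconds, compressed size in bytes)}."""
    results = {}
    for name, compress in (("gzip", gzip.compress), ("bzip2", bz2.compress)):
        start = time.perf_counter()
        packed = compress(data)
        results[name] = (time.perf_counter() - start, len(packed))
    return results


if __name__ == "__main__":
    data = make_fastq(20_000)
    print(f"raw: {len(data)} bytes")
    for name, (seconds, size) in benchmark(data).items():
        print(f"{name}: {seconds:.2f}s, {size} bytes")
```

General-purpose compressors treat headers, bases and qualities as one byte stream. FASTQ-specific tools can model each stream separately, which is the motivation behind the paper cited on this slide.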

Page 20: Next-generation genomics: an integrative approach

• ChIP-Seq

• allows you to assay where a protein binds DNA and how strongly, such as a transcription factor bound to the start site of a gene, or histones of a certain type

• RNA-Seq

• Mapping transcription start sites

• Characterization of alternative splicing patterns

• Gene fusion detection

• Estimation of the abundance of the transcripts from their depth of coverage in the mapping

Example of Applications
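One widely used way to turn depth of coverage into a transcript abundance estimate is RPKM (reads per kilobase of transcript per million mapped reads), introduced by Mortazavi et al. in 2008. The slides do not name a specific measure, so using RPKM here is an assumption; the function name is hypothetical.

```python
def rpkm(read_count: int, transcript_len_bp: int, total_mapped_reads: int) -> float:
    """Reads per kilobase of transcript per million mapped reads.

    RPKM = read_count * 1e9 / (total_mapped_reads * transcript_len_bp)
    """
    if transcript_len_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("length and total read count must be positive")
    return read_count * 1e9 / (total_mapped_reads * transcript_len_bp)


if __name__ == "__main__":
    # Illustrative numbers: 1000 reads on a 2 kb transcript,
    # in a library of 10 million mapped reads.
    print(rpkm(1000, 2000, 10_000_000))
```

Dividing by transcript length corrects for long transcripts collecting more reads, and dividing by library size makes samples sequenced to different depths comparable.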

Page 21: Next-generation genomics: an integrative approach

ChIP-Seq

Barski A & Zhao K, 2009

Chromatin immunoprecipitation (ChIP)

Kharchenko et al, 2008

Shirley et al., 2009

Page 22: Next-generation genomics: an integrative approach

ChIP-Seq

Shirley et al., 2009

Page 23: Next-generation genomics: an integrative approach

ChIP-Seq Software packages

Shirley et al., 2009

Page 24: Next-generation genomics: an integrative approach

RNA-Seq

Zhong Wang, 2009

RNA-Seq (De novo transcriptome assembly)

RNA-Seq (Transcriptome resequencing)

Page 25: Next-generation genomics: an integrative approach

RNA-Seq

RNA-Seq mapping of short reads in exon-exon junctions

RNA-Seq mapping of short reads over exon-exon junctions: depending on where each end maps, the junction can be defined as a trans or a cis event.

from wikipedia.org
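The cis/trans distinction for a read spanning an exon-exon junction can be sketched as a small classifier over where the two halves of the read align. The rule used here (same chromosome, strand and gene means cis, anything else means trans) is a simplifying assumption for illustration, and `SegmentHit` with its gene identifiers is hypothetical.

```python
from typing import NamedTuple


class SegmentHit(NamedTuple):
    """Alignment of one half of a junction-spanning read."""
    chrom: str
    strand: str
    pos: int
    gene: str  # hypothetical gene identifier from an annotation


def classify_junction(left: SegmentHit, right: SegmentHit) -> str:
    """Label a split read 'cis' if both halves fall in the same gene on the
    same chromosome and strand, otherwise 'trans' (assumed rule)."""
    same_locus = (
        left.chrom == right.chrom
        and left.strand == right.strand
        and left.gene == right.gene
    )
    return "cis" if same_locus else "trans"


if __name__ == "__main__":
    a = SegmentHit("chr1", "+", 1_000, "geneA")
    b = SegmentHit("chr1", "+", 5_000, "geneA")
    c = SegmentHit("chr7", "-", 9_000, "geneB")
    print(classify_junction(a, b), classify_junction(a, c))
```

Reads labelled trans by a rule like this are candidates for gene fusions or trans-splicing, but they are also where mapping artifacts concentrate, so they need extra scrutiny.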

Page 26: Next-generation genomics: an integrative approach

RNA-Seq Software packages

Shirley et al., 2009

Page 27: Next-generation genomics: an integrative approach

• Genes in DNA being transcribed into RNA

• might be spliced

• transported to an appropriate cellular compartment

• translated into proteins

• Regulated at many levels

• DNA methylation

• chromatin modification

• binding of transcription factors to the DNA

• binding of splicing factors to the RNA and RNA transport

DNA encodes heritable traits

Page 28: Next-generation genomics: an integrative approach

• What types of genomic data sets are available?

• Why perform integrative genomic analysis?

• Approaches to an integrative analysis

• Using large-scale data sets for integrative analysis

• Future perspectives

NGG (Next-generation genomics): an integrative approach

Page 29: Next-generation genomics: an integrative approach

• Sequence variation data

• SNP genotyping arrays

• resequencing

• Transcriptomic data

• RNA-Seq

• identify transcripts arising from gene fusion events

• detect novel classes of non-coding RNAs

• Epigenomic data

• Bisulphite treatment

• Chromatin immunoprecipitation

• Interactome data

• RNA-protein interaction

• protein-protein interaction networks

• define genetic and signaling pathways

What types of genomic data sets are available?

Page 30: Next-generation genomics: an integrative approach

• Annotating functional features of the genome

• Inferring the function of genetic variants

• Understanding mechanisms of gene regulation

Figure 1 | Annotating the genome through detecting transcription-factor binding sites and histone-modification states.

Why perform integrative genomic analysis?

Figure 2 | Identification of regulatory SNPs
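A basic integrative step behind identifying candidate regulatory SNPs is to intersect two data sets: variant positions and transcription-factor binding regions from ChIP-seq. The sketch below does this with a sorted interval index. It assumes half-open coordinates and non-overlapping peaks per chromosome; the SNP identifiers and peak coordinates in the example are made up.

```python
from bisect import bisect_right
from collections import defaultdict


def index_peaks(peaks):
    """Group (chrom, start, end) peaks by chromosome, sorted by start.

    Coordinates are half-open: start <= position < end.
    """
    by_chrom = defaultdict(list)
    for chrom, start, end in peaks:
        by_chrom[chrom].append((start, end))
    for intervals in by_chrom.values():
        intervals.sort()
    return dict(by_chrom)


def snps_in_peaks(snps, peak_index):
    """Return ids of (snp_id, chrom, pos) SNPs that fall inside a peak.

    Assumes peaks on the same chromosome do not overlap each other.
    """
    hits = []
    for snp_id, chrom, pos in snps:
        intervals = peak_index.get(chrom, [])
        # Last interval whose start is <= pos is the only candidate.
        i = bisect_right(intervals, (pos, float("inf"))) - 1
        if i >= 0 and intervals[i][0] <= pos < intervals[i][1]:
            hits.append(snp_id)
    return hits


if __name__ == "__main__":
    peaks = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 50, 80)]
    snps = [("snp_a", "chr1", 150), ("snp_b", "chr1", 200), ("snp_c", "chr2", 60)]
    print(snps_in_peaks(snps, index_peaks(peaks)))
```

A SNP inside a binding region is only a candidate. Evidence such as allele-specific binding or expression is what supports a regulatory role.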

Page 31: Next-generation genomics: an integrative approach

Approaches to an integrative analysis

• Data complexity reduction

• summarize each experiment as a collection of genomic regions with strong enrichment of signal

• especially important to inspect at least some of the results by eye

• Unsupervised integration

• The goal is not to find a single correct answer but to discover structure within the data set

• Clustering: partitioning a large data set into easily digestible, conceptual pieces

• Supervised integration

• A technique that learns how to make predictions from example inputs and outputs

• Bayesian network
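The data complexity reduction step, summarizing an experiment as a collection of regions with strong signal enrichment, can be sketched as thresholding binned read counts and merging adjacent enriched bins. This is a minimal illustration that uses the mean bin count as background. Actual peak callers use control samples and statistical models, and the names and the `min_fold` default are assumptions.

```python
def enriched_regions(counts, bin_size, min_fold=3.0):
    """Reduce per-bin read counts to a list of enriched (start, end) regions.

    A bin is enriched when its count is at least min_fold times the mean
    bin count; adjacent enriched bins are merged into one region.
    """
    if not counts:
        return []
    threshold = min_fold * sum(counts) / len(counts)
    regions = []
    start = None  # index of the first bin of the region being built
    for i, count in enumerate(counts):
        if count > 0 and count >= threshold:
            if start is None:
                start = i
        elif start is not None:
            regions.append((start * bin_size, i * bin_size))
            start = None
    if start is not None:  # region runs to the end of the signal
        regions.append((start * bin_size, len(counts) * bin_size))
    return regions


if __name__ == "__main__":
    signal = [1, 1, 1, 20, 25, 1, 1, 1, 30, 1]
    print(enriched_regions(signal, bin_size=100, min_fold=2.0))
```

The resulting region lists are small enough to inspect by eye in a genome browser and to pass on to clustering or supervised methods.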

Page 32: Next-generation genomics: an integrative approach

Approaches to an integrative analysis

An intronic H3K4me1 peak predicts an enhancer element

Promoter

Transcribed

Page 33: Next-generation genomics: an integrative approach

UCSC browser with ENCODE data

Page 34: Next-generation genomics: an integrative approach

Using large-scale data sets for integrative analysis

• For the bench scientist

• open-source web browser, such as Firefox

• add-ons: gatekeepers

Page 35: Next-generation genomics: an integrative approach

Using large-scale data sets for integrative analysis

• For the bench scientist

• stand-alone analytical system: CisGenome

• genome browser: UCSC browser, Anno-J

Figure 4 | Flow chart for data analysis
Workflow for ChIP-seq analysis

Galaxy

UCSC browser

Online or stand-alone tools

Page 36: Next-generation genomics: an integrative approach

Using large-scale data sets for integrative analysis

• Bioinformatics hurdles

• normalized data

Page 37: Next-generation genomics: an integrative approach

Future perspectives

• Data integration itself is not an end

• designed to generate novel hypotheses and help to test them

• Community-wide effort, akin to Wikipedia

• Searchable with Google-like capabilities

Page 38: Next-generation genomics: an integrative approach

Future perspectives

Page 39: Next-generation genomics: an integrative approach

Future perspectives

Page 40: Next-generation genomics: an integrative approach

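Toby Segaran: designed algorithms at Genstruct for understanding drug mechanisms of action

Satnam Alag: Vice President of Engineering at NextBio, which develops a vertical search engine for the life science community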