Introduction to NGS analyses - FORTH-IMBB · Introduction to NGS analyses Giorgio L Papadopoulos...

Preview:

Citation preview

Introduction to NGS analyses

Giorgio L Papadopoulos

Institute of Molecular Biology and Biotechnology

Bioinformatics Support Group

04/12/2015

Papadopoulos GL (IMBB, FORTH) IMBB NGS Seminar 04/12/2015 1 / 28

Overview

1 IntroductionDNA sequencingApplications

2 Primary AnalysisChIPseqRNAseq

3 InterpretationVisualizationAnalysis SoftwareDownstreamAnalysis

4 Online ResourcesENCODEGEOENA

Papadopoulos GL (IMBB, FORTH) IMBB NGS Seminar 04/12/2015 2 / 28

Overview

1 IntroductionDNA sequencingApplications

2 Primary AnalysisChIPseqRNAseq

3 InterpretationVisualizationAnalysis SoftwareDownstreamAnalysis

4 Online ResourcesENCODEGEOENA

Papadopoulos GL (IMBB, FORTH) IMBB NGS Seminar 04/12/2015 2 / 28

Overview

1 IntroductionDNA sequencingApplications

2 Primary AnalysisChIPseqRNAseq

3 InterpretationVisualizationAnalysis SoftwareDownstreamAnalysis

4 Online ResourcesENCODEGEOENA

Papadopoulos GL (IMBB, FORTH) IMBB NGS Seminar 04/12/2015 2 / 28

Overview

1 IntroductionDNA sequencingApplications

2 Primary AnalysisChIPseqRNAseq

3 InterpretationVisualizationAnalysis SoftwareDownstreamAnalysis

4 Online ResourcesENCODEGEOENA

Papadopoulos GL (IMBB, FORTH) IMBB NGS Seminar 04/12/2015 2 / 28

Introduction DNA sequencing

Fred SangerLate 1970s

... Several million times ...Late 2000s

Papadopoulos GL (IMBB, FORTH) IMBB NGS Seminar 04/12/2015 3 / 28

Introduction DNA sequencing

Fred SangerLate 1970s

... Several million times ...Late 2000s

Papadopoulos GL (IMBB, FORTH) IMBB NGS Seminar 04/12/2015 3 / 28

Introduction DNA sequencing

Fred SangerLate 1970s

... Several million times ...Late 2000s

Papadopoulos GL (IMBB, FORTH) IMBB NGS Seminar 04/12/2015 3 / 28

Introduction DNA sequencing

Fred SangerLate 1970s

... Several million times ...Late 2000s

Papadopoulos GL (IMBB, FORTH) IMBB NGS Seminar 04/12/2015 3 / 28

Introduction Applications

Reuter et al, Mol Cell, 2015

Papadopoulos GL (IMBB, FORTH) IMBB NGS Seminar 04/12/2015 4 / 28

Introduction Applications

Reuter et al, Mol Cell, 2015

Papadopoulos GL (IMBB, FORTH) IMBB NGS Seminar 04/12/2015 4 / 28

Introduction ChIPseq

Genome Wide Occupancy Profiling

Overlapping reads come from different cells!!!

NGS is a cell population readout ( 50 million cells)

Papadopoulos GL (IMBB, FORTH) IMBB NGS Seminar 04/12/2015 5 / 28

Introduction ChIPseq

Genome Wide Occupancy Profiling

Overlapping reads come from different cells!!!

NGS is a cell population readout ( 50 million cells)

Papadopoulos GL (IMBB, FORTH) IMBB NGS Seminar 04/12/2015 5 / 28

Introduction ChIPseq

Genome Wide Occupancy Profiling

Overlapping reads come from different cells!!!

NGS is a cell population readout ( 50 million cells)

Papadopoulos GL (IMBB, FORTH) IMBB NGS Seminar 04/12/2015 5 / 28

Introduction ChIPseq

Genome Wide Occupancy Profiling

Overlapping reads come from different cells!!!

NGS is a cell population readout ( 50 million cells)

Papadopoulos GL (IMBB, FORTH) IMBB NGS Seminar 04/12/2015 5 / 28

Introduction ChIPseq

Genome Wide Occupancy Profiling

Overlapping reads come from different cells!!!

NGS is a cell population readout ( 50 million cells)

Papadopoulos GL (IMBB, FORTH) IMBB NGS Seminar 04/12/2015 5 / 28

Introduction ChIPseq

Genome Wide Occupancy Profiling

Overlapping reads come from different cells!!!

NGS is a cell population readout ( 50 million cells)

Papadopoulos GL (IMBB, FORTH) IMBB NGS Seminar 04/12/2015 5 / 28

Introduction ChIPseq

Genome Wide Occupancy Profiling

Overlapping reads come from different cells!!!

NGS is a cell population readout ( 50 million cells)

Papadopoulos GL (IMBB, FORTH) IMBB NGS Seminar 04/12/2015 5 / 28

Introduction ChIPseq

Primary Sequencing Data

Papadopoulos GL (IMBB, FORTH) IMBB NGS Seminar 04/12/2015 6 / 28

Introduction ChIPseq

Primary Sequencing Data

Papadopoulos GL (IMBB, FORTH) IMBB NGS Seminar 04/12/2015 6 / 28

Introduction ChIPseq

Genome wide, highly accurate representation of genomic statesFairly unbiased

Complementary and overlapping information

Papadopoulos GL (IMBB, FORTH) IMBB NGS Seminar 04/12/2015 7 / 28

Introduction ChIPseq

Genome wide, highly accurate representation of genomic statesFairly unbiased

Complementary and overlapping information

Papadopoulos GL (IMBB, FORTH) IMBB NGS Seminar 04/12/2015 7 / 28

Primary Analysis ChIPseq

Raw Data

Aligned Data

Enriched

RegionsRead Density

VisualizationTarget Genes

Functional

Analysis

Distribution

Motifs

Overlaps

Papadopoulos GL (IMBB, FORTH) IMBB NGS Seminar 04/12/2015 8 / 28

Primary Analysis ChIPseq

Raw Data

Aligned Data

Enriched

RegionsRead Density

VisualizationTarget Genes

Functional

Analysis

Distribution

Motifs

Overlaps

Papadopoulos GL (IMBB, FORTH) IMBB NGS Seminar 04/12/2015 8 / 28

Primary Analysis ChIPseq

Raw Data

Aligned Data

Enriched

RegionsRead Density

VisualizationTarget Genes

Functional

Analysis

Distribution

Motifs

Overlaps

Papadopoulos GL (IMBB, FORTH) IMBB NGS Seminar 04/12/2015 8 / 28

Primary Analysis ChIPseq

Raw Data

Aligned Data

Enriched

RegionsRead Density

Visualization

Target Genes

Functional

Analysis

Distribution

Motifs

Overlaps

Papadopoulos GL (IMBB, FORTH) IMBB NGS Seminar 04/12/2015 8 / 28

Primary Analysis ChIPseq

Raw Data

Aligned Data

Enriched

RegionsRead Density

VisualizationTarget Genes

Functional

Analysis

Distribution

Motifs

Overlaps

Papadopoulos GL (IMBB, FORTH) IMBB NGS Seminar 04/12/2015 8 / 28

Primary Analysis ChIPseq

Raw Data

Aligned Data

Enriched

RegionsRead Density

VisualizationTarget Genes

Functional

Analysis

Distribution

Motifs

Overlaps

Papadopoulos GL (IMBB, FORTH) IMBB NGS Seminar 04/12/2015 8 / 28

Primary Analysis ChIPseq

Raw Data

Papadopoulos GL (IMBB, FORTH) IMBB NGS Seminar 04/12/2015 9 / 28

Primary Analysis ChIPseq

Raw Data

Papadopoulos GL (IMBB, FORTH) IMBB NGS Seminar 04/12/2015 9 / 28

Primary Analysis ChIPseq

Raw Data

Operations:DeMultiplexingQuality Control

Papadopoulos GL (IMBB, FORTH) IMBB NGS Seminar 04/12/2015 9 / 28

Primary Analysis ChIPseq

Raw Data

Operations:DeMultiplexingQuality Control

FastQC

Papadopoulos GL (IMBB, FORTH) IMBB NGS Seminar 04/12/2015 9 / 28

Primary Analysis ChIPseq

Raw Data

Operations:DeMultiplexingQuality Control

FastQC

Papadopoulos GL (IMBB, FORTH) IMBB NGS Seminar 04/12/2015 9 / 28

Primary Analysis ChIPseq

Raw Data

Operations:DeMultiplexingQuality Control

FastQC

Papadopoulos GL (IMBB, FORTH) IMBB NGS Seminar 04/12/2015 9 / 28

Primary Analysis ChIPseq

Raw Data

Operations:DeMultiplexingQuality Control

FastQC

Align Data...

Papadopoulos GL (IMBB, FORTH) IMBB NGS Seminar 04/12/2015 9 / 28

Primary Analysis ChIPseq

Align Data...

Aligner (Mapper)

Reference GenomeHuman (hg18, hg19)Mouse (mm9, mm10)

...(iGenomes, UCSC, NCBI, EBI...)

Different mappers will usedifferent Index Formats...

bowtie.2

% of alignment...

Low percentage?Contaminations,

adapter/barcode trimming,wrong reference...

Bad Library...

Papadopoulos GL (IMBB, FORTH) IMBB NGS Seminar 04/12/2015 10 / 28

Primary Analysis ChIPseq

Align Data...

Aligner (Mapper)

Reference GenomeHuman (hg18, hg19)Mouse (mm9, mm10)

...(iGenomes, UCSC, NCBI, EBI...)

Different mappers will usedifferent Index Formats...

bowtie.2

% of alignment...

Low percentage?Contaminations,

adapter/barcode trimming,wrong reference...

Bad Library...

Papadopoulos GL (IMBB, FORTH) IMBB NGS Seminar 04/12/2015 10 / 28

Primary Analysis ChIPseq

Align Data...

Aligner (Mapper)

http://www.ebi.ac.uk/~nf/hts_mappers/

Reference GenomeHuman (hg18, hg19)Mouse (mm9, mm10)

...(iGenomes, UCSC, NCBI, EBI...)

Different mappers will use different Index Formats...

bowtie.2

% of alignment...

Low percentage?Contaminations,

adapter/barcode trimming,wrong reference...

Bad Library...

Papadopoulos GL (IMBB, FORTH) IMBB NGS Seminar 04/12/2015 10 / 28

Primary Analysis ChIPseq

Align Data...

Aligner (Mapper)

Reference GenomeHuman (hg18, hg19)Mouse (mm9, mm10)

...(iGenomes, UCSC, NCBI, EBI...)

Different mappers will usedifferent Index Formats...

bowtie.2

% of alignment...

Low percentage?Contaminations,

adapter/barcode trimming,wrong reference...

Bad Library...

Papadopoulos GL (IMBB, FORTH) IMBB NGS Seminar 04/12/2015 10 / 28

Primary Analysis ChIPseq

Align Data...

Aligner (Mapper)

Reference GenomeHuman (hg18, hg19)Mouse (mm9, mm10)

...(iGenomes, UCSC, NCBI, EBI...)

Different mappers will usedifferent Index Formats...

bowtie.2

% of alignment...

Low percentage?Contaminations,

adapter/barcode trimming,wrong reference...

Bad Library...

Papadopoulos GL (IMBB, FORTH) IMBB NGS Seminar 04/12/2015 10 / 28

Primary Analysis ChIPseq

Align Data...

Aligner (Mapper)

Reference GenomeHuman (hg18, hg19)Mouse (mm9, mm10)

...(iGenomes, UCSC, NCBI, EBI...)

Different mappers will usedifferent Index Formats...

bowtie.2

Fast...Versatile...

Documented and supported...

% of alignment...

Low percentage?Contaminations,

adapter/barcode trimming,wrong reference...

Bad Library...

Papadopoulos GL (IMBB, FORTH) IMBB NGS Seminar 04/12/2015 10 / 28

Primary Analysis ChIPseq

Align Data...

Aligner (Mapper)

Reference GenomeHuman (hg18, hg19)Mouse (mm9, mm10)

...(iGenomes, UCSC, NCBI, EBI...)

Different mappers will usedifferent Index Formats...

bowtie.2

% of alignment...

Low percentage?Contaminations,

adapter/barcode trimming,wrong reference...

Bad Library...

Papadopoulos GL (IMBB, FORTH) IMBB NGS Seminar 04/12/2015 10 / 28

Primary Analysis ChIPseq

Align Data...

Aligner (Mapper)

Reference GenomeHuman (hg18, hg19)Mouse (mm9, mm10)

...(iGenomes, UCSC, NCBI, EBI...)

Different mappers will usedifferent Index Formats...

bowtie.2

% of alignment...

Low percentage?Contaminations,

adapter/barcode trimming,wrong reference...

Bad Library...

Papadopoulos GL (IMBB, FORTH) IMBB NGS Seminar 04/12/2015 10 / 28

Primary Analysis ChIPseq

Align Data...

Aligner (Mapper)

Reference GenomeHuman (hg18, hg19)Mouse (mm9, mm10)

...(iGenomes, UCSC, NCBI, EBI...)

Different mappers will usedifferent Index Formats...

bowtie.2

% of alignment...

Low percentage?Contaminations,

adapter/barcode trimming,wrong reference...

Bad Library...

Papadopoulos GL (IMBB, FORTH) IMBB NGS Seminar 04/12/2015 10 / 28

Primary Analysis ChIPseq

Aligned Data

SAM ⇔ BAM ⇒ BED⇓

Enriched

Regions

⇓Chr Start Stop ID Score

SET ofGENOMIC COORDINATES

(PEAKS...)

How Many???

Where???

With Who???

Function???

HOW???

Papadopoulos GL (IMBB, FORTH) IMBB NGS Seminar 04/12/2015 11 / 28

Primary Analysis ChIPseq

Aligned Data

SAM ⇔ BAM ⇒ BED

⇓Enriched

Regions

⇓Chr Start Stop ID Score

SET ofGENOMIC COORDINATES

(PEAKS...)

How Many???

Where???

With Who???

Function???

HOW???

Papadopoulos GL (IMBB, FORTH) IMBB NGS Seminar 04/12/2015 11 / 28

Primary Analysis ChIPseq

Aligned Data

SAM ⇔ BAM ⇒ BED⇓

Enriched

Regions

⇓Chr Start Stop ID Score

SET ofGENOMIC COORDINATES

(PEAKS...)

How Many???

Where???

With Who???

Function???

HOW???

Papadopoulos GL (IMBB, FORTH) IMBB NGS Seminar 04/12/2015 11 / 28

Primary Analysis ChIPseq

Aligned Data

SAM ⇔ BAM ⇒ BED⇓

Enriched

Regions

⇓Chr Start Stop ID Score

SET ofGENOMIC COORDINATES

(PEAKS...)

How Many???

Where???

With Who???

Function???

HOW???

Papadopoulos GL (IMBB, FORTH) IMBB NGS Seminar 04/12/2015 11 / 28

Primary Analysis ChIPseq

Aligned Data

SAM ⇔ BAM ⇒ BED⇓

Enriched

Regions

⇓Chr Start Stop ID Score

SET ofGENOMIC COORDINATES

(PEAKS...)

How Many???

Where???

With Who???

Function???

HOW???

Papadopoulos GL (IMBB, FORTH) IMBB NGS Seminar 04/12/2015 11 / 28

Primary Analysis ChIPseq

Aligned Data

SAM ⇔ BAM ⇒ BED⇓

Enriched

Regions

⇓Chr Start Stop ID Score

SET ofGENOMIC COORDINATES

(PEAKS...)

How Many???

Where???

With Who???

Function???

HOW???

Papadopoulos GL (IMBB, FORTH) IMBB NGS Seminar 04/12/2015 11 / 28

Primary Analysis ChIPseq

Aligned Data

SAM ⇔ BAM ⇒ BED⇓

Enriched

Regions

⇓Chr Start Stop ID Score

SET ofGENOMIC COORDINATES

(PEAKS...)

How Many???

Where???

With Who???

Function???

HOW???

Papadopoulos GL (IMBB, FORTH) IMBB NGS Seminar 04/12/2015 11 / 28

Primary Analysis ChIPseq

Aligned Data

SAM ⇔ BAM ⇒ BED⇓

Enriched

Regions

⇓Chr Start Stop ID Score

SET ofGENOMIC COORDINATES

(PEAKS...)

How Many???

Where???

With Who???

Function???

HOW???

Papadopoulos GL (IMBB, FORTH) IMBB NGS Seminar 04/12/2015 11 / 28

Primary Analysis ChIPseq

Aligned Data

SAM ⇔ BAM ⇒ BED⇓

Enriched

Regions

⇓Chr Start Stop ID Score

SET ofGENOMIC COORDINATES

(PEAKS...)

How Many???

Where???

With Who???

Function???

HOW???

Papadopoulos GL (IMBB, FORTH) IMBB NGS Seminar 04/12/2015 11 / 28

Primary Analysis ChIPseq

Aligned Data

SAM ⇔ BAM ⇒ BED⇓

Enriched

Regions

⇓Chr Start Stop ID Score

SET ofGENOMIC COORDINATES

(PEAKS...)

How Many???

Where???

With Who???

Function???

HOW???

Papadopoulos GL (IMBB, FORTH) IMBB NGS Seminar 04/12/2015 11 / 28

Primary Analysis Peak Calling

“Peak calling is a computational method used to identify areas in agenome that have been enriched with aligned reads“

Challenge: Not one single ’peak shape’...

Number of peaksPeak distribution

Peak heightRange of measurements

Experimental variation (IP...)

Papadopoulos GL (IMBB, FORTH) IMBB NGS Seminar 04/12/2015 12 / 28

Primary Analysis Peak Calling

“Peak calling is a computational method used to identify areas in agenome that have been enriched with aligned reads“

Challenge: Not one single ’peak shape’...

Number of peaksPeak distribution

Peak heightRange of measurements

Experimental variation (IP...)

Papadopoulos GL (IMBB, FORTH) IMBB NGS Seminar 04/12/2015 12 / 28

Primary Analysis Peak Calling

“Peak calling is a computational method used to identify areas in agenome that have been enriched with aligned reads“

Challenge: Not one single ’peak shape’...

Number of peaksPeak distribution

Peak heightRange of measurements

Experimental variation (IP...)

Papadopoulos GL (IMBB, FORTH) IMBB NGS Seminar 04/12/2015 12 / 28

Primary Analysis Peak Calling

“Peak calling is a computational method used to identify areas in agenome that have been enriched with aligned reads“

Challenge: Not one single ’peak shape’...

Number of peaksPeak distribution

Peak heightRange of measurements

Experimental variation (IP...)

Papadopoulos GL (IMBB, FORTH) IMBB NGS Seminar 04/12/2015 12 / 28

Primary Analysis Peak Calling

Solutions:Different Peak CallersOptimal Parameters

Correct Experimental DesignGood ChIP...

Papadopoulos GL (IMBB, FORTH) IMBB NGS Seminar 04/12/2015 13 / 28

Primary Analysis Peak Calling

Solutions:Different Peak CallersOptimal Parameters

Correct Experimental DesignGood ChIP...

Papadopoulos GL (IMBB, FORTH) IMBB NGS Seminar 04/12/2015 13 / 28

Primary Analysis Peak Calling

Solutions:Different Peak CallersOptimal Parameters

Correct Experimental DesignGood ChIP...

Papadopoulos GL (IMBB, FORTH) IMBB NGS Seminar 04/12/2015 13 / 28

Primary Analysis Peak Calling

Solutions:Different Peak CallersOptimal Parameters

Correct Experimental DesignGood ChIP...

Papadopoulos GL (IMBB, FORTH) IMBB NGS Seminar 04/12/2015 13 / 28

Primary Analysis Peak Calling

Solutions:Different Peak CallersOptimal Parameters

Correct Experimental DesignGood ChIP...

Papadopoulos GL (IMBB, FORTH) IMBB NGS Seminar 04/12/2015 13 / 28

Primary Analysis Peak Calling

Solutions:Different Peak CallersOptimal Parameters

Correct Experimental DesignGood ChIP...

Peaks represent only a ’summary’ of a ChIPseq experiment...

Papadopoulos GL (IMBB, FORTH) IMBB NGS Seminar 04/12/2015 13 / 28

Primary Analysis Peak Calling

Solutions:Different Peak CallersOptimal Parameters

Correct Experimental DesignGood ChIP...

Peaks represent only a ’summary’ of a ChIPseq experiment...

Useful metrics:FRiP: Fraction of Reads in Peaks (>1%)

IDR: Irreproducible Discovery RateSignificance: pValue, ChIPtoINPUT ratio, # of reads...

Papadopoulos GL (IMBB, FORTH) IMBB NGS Seminar 04/12/2015 13 / 28

Primary Analysis Peak Calling

Solutions:Different Peak CallersOptimal Parameters

Correct Experimental DesignGood ChIP...

Peaks represent only a ’summary’ of a ChIPseq experiment...

Empirical:Known Motif: Enriched in peaks, Central Position

Distribution: Near TSS, Within GeneBody, EnhancersTarget Genes: Gene expression changes, Functional Annotation

Papadopoulos GL (IMBB, FORTH) IMBB NGS Seminar 04/12/2015 13 / 28

Primary Analysis Peak Calling

Solutions:Different Peak CallersOptimal Parameters

Correct Experimental DesignGood ChIP...

Peaks represent only a ’summary’ of a ChIPseq experiment...

Empirical:Known Motif: Enriched in peaks, Central Position

Distribution: Near TSS, Within GeneBody, EnhancersTarget Genes: Gene expression changes, Functional Annotation

Browser inspection and previously known sites

Papadopoulos GL (IMBB, FORTH) IMBB NGS Seminar 04/12/2015 13 / 28

Primary Analysis Peak Calling

TFs: MACSHMs: SICER

Papadopoulos GL (IMBB, FORTH) IMBB NGS Seminar 04/12/2015 14 / 28

Primary Analysis Peak Calling

TFs: MACSHMs: SICER

Running MACS: macs14 -t ChIP -c Input -g mm -n Name

Papadopoulos GL (IMBB, FORTH) IMBB NGS Seminar 04/12/2015 14 / 28

Primary Analysis Peak Calling

TFs: MACSHMs: SICER

Running MACS: macs14 -t ChIP -c Input -g mm -n Name

−600 −400 −200 0 200 400 600

0.0

0.1

0.2

0.3

0.4

Peak Model

Distance to the middle

Per

cent

age

forward tagsreverse tagsshifted tags

d=119

Papadopoulos GL (IMBB, FORTH) IMBB NGS Seminar 04/12/2015 14 / 28

Primary Analysis Peak Calling

TFs: MACSHMs: SICER

Running MACS: macs14 -t ChIP -c Input -g mm -n Name

−600 −400 −200 0 200 400 600

0.0

0.1

0.2

0.3

0.4

Peak Model

Distance to the middle

Per

cent

age

forward tagsreverse tagsshifted tags

d=119

−600 −400 −200 0 200 400 600

0.04

0.05

0.06

0.07

0.08

0.09

0.10

Peak Model

Distance to the middle

Per

cent

age

forward tagsreverse tagsshifted tags

d=50

Papadopoulos GL (IMBB, FORTH) IMBB NGS Seminar 04/12/2015 14 / 28

Primary Analysis Peak Calling

TFs: MACSHMs: SICER

Running MACS: macs14 -t ChIP -c Input -g mm -n Name

−600 −400 −200 0 200 400 600

0.0

0.1

0.2

0.3

0.4

Peak Model

Distance to the middle

Per

cent

age

forward tagsreverse tagsshifted tags

d=119

−600 −400 −200 0 200 400 600

0.04

0.05

0.06

0.07

0.08

0.09

0.10

Peak Model

Distance to the middle

Per

cent

age

forward tagsreverse tagsshifted tags

d=50

Papadopoulos GL (IMBB, FORTH) IMBB NGS Seminar 04/12/2015 14 / 28

Primary Analysis Peak Calling

TFs: MACSHMs: SICER

Running MACS: macs14 -t ChIP -c Input -g mm -n Name

−600 −400 −200 0 200 400 600

0.0

0.1

0.2

0.3

0.4

Peak Model

Distance to the middle

Per

cent

age

forward tagsreverse tagsshifted tags

d=119

−600 −400 −200 0 200 400 600

0.04

0.05

0.06

0.07

0.08

0.09

0.10

Peak Model

Distance to the middle

Per

cent

age

forward tagsreverse tagsshifted tags

d=50

Papadopoulos GL (IMBB, FORTH) IMBB NGS Seminar 04/12/2015 14 / 28

Primary Analysis Peak Calling

TFs: MACSHMs: SICER

Running MACS: macs14 -t ChIP -c Input -g mm -n Name

−600 −400 −200 0 200 400 600

0.0

0.1

0.2

0.3

0.4

Peak Model

Distance to the middle

Per

cent

age

forward tagsreverse tagsshifted tags

d=119

−600 −400 −200 0 200 400 600

0.04

0.05

0.06

0.07

0.08

0.09

0.10

Peak Model

Distance to the middle

Per

cent

age

forward tagsreverse tagsshifted tags

d=50

WIG file: Normalized Read Density text format⇓

bigWig file: Binary Read Density format

Papadopoulos GL (IMBB, FORTH) IMBB NGS Seminar 04/12/2015 14 / 28

Primary Analysis Peak Calling

TFs: MACSHMs: SICER

Running MACS: macs14 -t ChIP -c Input -g mm -n Name

−600 −400 −200 0 200 400 600

0.0

0.1

0.2

0.3

0.4

Peak Model

Distance to the middle

Per

cent

age

forward tagsreverse tagsshifted tags

d=119

−600 −400 −200 0 200 400 600

0.04

0.05

0.06

0.07

0.08

0.09

0.10

Peak Model

Distance to the middle

Per

cent

age

forward tagsreverse tagsshifted tags

d=50

Papadopoulos GL (IMBB, FORTH) IMBB NGS Seminar 04/12/2015 14 / 28

Primary Analysis ChIPseq Summary

General Guidelines:

Biological Replicates (2x)

INPUT (noIP sample)

20-25 mln reads per sample

Optimize ChIP conditions before library preparation

Minimize experimental variation

Look at the data

’ChIP-seq guidelines and practices of the ENCODE and modENCODE consortia’Landt et al, Genome Res, 2012

’Practical Guidelines for the Comprehensive Analysis of ChIP-seq Data’Bailey et al, PLoS Comp Biol, 2013

Papadopoulos GL (IMBB, FORTH) IMBB NGS Seminar 04/12/2015 15 / 28

Primary Analysis ChIPseq Summary

General Guidelines:

Biological Replicates (2x)

INPUT (noIP sample)

20-25 mln reads per sample

Optimize ChIP conditions before library preparation

Minimize experimental variation

Look at the data

’ChIP-seq guidelines and practices of the ENCODE and modENCODE consortia’Landt et al, Genome Res, 2012

’Practical Guidelines for the Comprehensive Analysis of ChIP-seq Data’Bailey et al, PLoS Comp Biol, 2013

Papadopoulos GL (IMBB, FORTH) IMBB NGS Seminar 04/12/2015 15 / 28

Primary Analysis RNAseq

Raw Data

Aligned Data

Read Counts

Condition 1

Read Counts

Condition 2

Differential

Expression

Papadopoulos GL (IMBB, FORTH) IMBB NGS Seminar 04/12/2015 16 / 28

Primary Analysis RNAseq

Raw Data

Aligned Data

Read Counts

Condition 1

Read Counts

Condition 2

Differential

Expression

Papadopoulos GL (IMBB, FORTH) IMBB NGS Seminar 04/12/2015 16 / 28

Primary Analysis RNAseq

Raw Data

Aligned Data

Read Counts

Condition 1

Read Counts

Condition 2

Differential

Expression

Papadopoulos GL (IMBB, FORTH) IMBB NGS Seminar 04/12/2015 16 / 28

Primary Analysis RNAseq

Raw Data

Aligned Data

Read Counts

Condition 1

Read Counts

Condition 2

Differential

Expression

Papadopoulos GL (IMBB, FORTH) IMBB NGS Seminar 04/12/2015 16 / 28

Primary Analysis RNAseq

Aligner (Mapper)

Reference GenomeHuman (hg18, hg19)Mouse (mm9, mm10)

...Spliced alignment

Build reference based onknown transcripts

STAR

VERY Fast...(45 million paired readsper hour per processor)

TopHat2

If you want to use CuffLinks...

Papadopoulos GL (IMBB, FORTH) IMBB NGS Seminar 04/12/2015 17 / 28

Primary Analysis RNAseq

Aligner (Mapper)

Reference GenomeHuman (hg18, hg19)Mouse (mm9, mm10)

...Spliced alignment

Build reference based onknown transcripts

STAR

VERY Fast...(45 million paired readsper hour per processor)

TopHat2

If you want to use CuffLinks...

Papadopoulos GL (IMBB, FORTH) IMBB NGS Seminar 04/12/2015 17 / 28

Primary Analysis RNAseq

Aligner (Mapper)

Reference GenomeHuman (hg18, hg19)Mouse (mm9, mm10)

...Spliced alignment

Build reference based onknown transcripts

STAR

VERY Fast...(45 million paired readsper hour per processor)

TopHat2

If you want to use CuffLinks...

Papadopoulos GL (IMBB, FORTH) IMBB NGS Seminar 04/12/2015 17 / 28

Primary Analysis RNAseq

Aligner (Mapper)

Reference GenomeHuman (hg18, hg19)Mouse (mm9, mm10)

...Spliced alignment

Build reference based onknown transcripts

STAR

VERY Fast...(45 million paired readsper hour per processor)

TopHat2

If you want to use CuffLinks...

Papadopoulos GL (IMBB, FORTH) IMBB NGS Seminar 04/12/2015 17 / 28

Primary Analysis RNAseq

Aligner (Mapper)

Reference GenomeHuman (hg18, hg19)Mouse (mm9, mm10)

...Spliced alignment

Build reference based onknown transcripts

STAR

VERY Fast...(45 million paired readsper hour per processor)

TopHat2

If you want to use CuffLinks...

Papadopoulos GL (IMBB, FORTH) IMBB NGS Seminar 04/12/2015 17 / 28

Primary Analysis RNAseq

Aligner (Mapper)

Reference GenomeHuman (hg18, hg19)Mouse (mm9, mm10)

...Spliced alignment

Build reference based onknown transcripts

STAR

VERY Fast...(45 million paired readsper hour per processor)

TopHat2

If you want to use CuffLinks...

Papadopoulos GL (IMBB, FORTH) IMBB NGS Seminar 04/12/2015 17 / 28

Primary Analysis RNAseq

Raw Data

Aligned Data

Read Counts

Condition 1

Read Counts

Condition 2

htseqCount

Differential

ExpressionDESeqedgeR

Papadopoulos GL (IMBB, FORTH) IMBB NGS Seminar 04/12/2015 18 / 28

Primary Analysis RNAseq

Raw Data

Aligned Data

Read Counts

Condition 1

Read Counts

Condition 2

htseqCount

Differential

ExpressionDESeqedgeR

Papadopoulos GL (IMBB, FORTH) IMBB NGS Seminar 04/12/2015 18 / 28

Primary Analysis RNAseq

Raw Data

Aligned Data

Read Counts

Condition 1

Read Counts

Condition 2

htseqCount

Differential

ExpressionDESeqedgeR

Papadopoulos GL (IMBB, FORTH) IMBB NGS Seminar 04/12/2015 18 / 28

Primary Analysis RNAseq Design

Raw Data

Adapted from: Kostadyma M, EBI Workshop

Papadopoulos GL (IMBB, FORTH) IMBB NGS Seminar 04/12/2015 19 / 28

Primary Analysis RNAseq Design

Raw Data

Adapted from: Kostadyma M, EBI Workshop

Papadopoulos GL (IMBB, FORTH) IMBB NGS Seminar 04/12/2015 19 / 28

Primary Analysis RNAseq Design

Raw Data

Adapted from: Kostadyma M, EBI Workshop

Papadopoulos GL (IMBB, FORTH) IMBB NGS Seminar 04/12/2015 19 / 28

Primary Analysis RNAseq Design

Raw Data

Adapted from: Kostadyma M, EBI WorkshopPapadopoulos GL (IMBB, FORTH) IMBB NGS Seminar 04/12/2015 19 / 28

Interpretation Visualization

GenomeGraphs (R package) Ariadne (GeneViewer)

Papadopoulos GL (IMBB, FORTH) IMBB NGS Seminar 04/12/2015 20 / 28

Interpretation Visualization

Local

GenomeGraphs (R package)Ariadne (GeneViewer)

Papadopoulos GL (IMBB, FORTH) IMBB NGS Seminar 04/12/2015 20 / 28

Interpretation Visualization

Local

GenomeGraphs (R package)

Online

Ariadne (GeneViewer)

Papadopoulos GL (IMBB, FORTH) IMBB NGS Seminar 04/12/2015 20 / 28

Interpretation Visualization

Local

IGV (Integrative Genomics Viewer)

GenomeGraphs (R package)

Online

Ariadne (GeneViewer)

Papadopoulos GL (IMBB, FORTH) IMBB NGS Seminar 04/12/2015 20 / 28

Interpretation Visualization

Local

IGV (Integrative Genomics Viewer)

GenomeGraphs (R package)

Online

UCSC Genome Browser

Ariadne (GeneViewer)

Papadopoulos GL (IMBB, FORTH) IMBB NGS Seminar 04/12/2015 20 / 28

Interpretation Visualization

Local

IGV (Integrative Genomics Viewer)

Cross Platform (Win, Mac, Linux)Different file formats

Fast

GenomeGraphs (R package)

Online

UCSC Genome Browser

Ariadne (GeneViewer)

Papadopoulos GL (IMBB, FORTH) IMBB NGS Seminar 04/12/2015 20 / 28

Interpretation Visualization

Local

IGV (Integrative Genomics Viewer)

Cross Platform (Win, Mac, Linux)Different file formats

Fast

GenomeGraphs (R package)

Online

UCSC Genome Browser

Web basedSlow

Comprehensive database

Ariadne (GeneViewer)

Papadopoulos GL (IMBB, FORTH) IMBB NGS Seminar 04/12/2015 20 / 28

Interpretation Visualization

Local

IGV (Integrative Genomics Viewer)

Cross Platform (Win, Mac, Linux)Different file formats

Fast

GenomeGraphs (R package)

Online

UCSC Genome Browser

Web basedSlow

Comprehensive database

Scalechr11:

STS Markers

RatHuman

OrangutanDog

HorseOpossum

ChickenStickleback

RepeatMasker

SNPs (128)

50 kb mm918,780,000 18,790,000 18,800,000 18,810,000 18,820,000 18,830,000 18,840,000 18,850,000 18,860,000 18,870,000 18,880,000 18,890,000 18,900,000 18,910,000 18,920,000

STS Markers on Genetic and Radiation Hybrid Maps

UCSC Genes (RefSeq, GenBank, tRNAs & Comparative Genomics)

G1E DNaseI HS Peaks Rep 2 from ENCODE/PSU

G1E-ER4 Estradiol 24hr DNaseI HS DNase-seq Peaks Rep 2 from ENCODE/PSU

G1E DNaseI HS Signal Rep 2 from ENCODE/PSU

G1E-ER4 Estradiol 24hr DNaseI HS DNase-seq Signal Rep 2 from ENCODE/PSU

CH12 CTCF TFBS ChIP-seq Peaks from ENCODE/Stanford/Yale

CH12 CTCF TFBS ChIP-seq Signal from ENCODE/Stanford/Yale

CH12 p300 (SC-584) TFBS ChIP-seq Peaks from ENCODE/Stanford/Yale

CH12 p300 (SC-584) TFBS ChIP-seq Signal from ENCODE/Stanford/Yale

CH12 Pol2 TFBS ChIP-seq Peaks from ENCODE/Stanford/YaleCH12 Pol2 TFBS ChIP-seq Signal from ENCODE/Stanford/Yale

CH12 c-Jun TFBS ChIP-seq Peaks from ENCODE/Stanford/Yale

CH12 c-Jun TFBS ChIP-seq Signal from ENCODE/Stanford/Yale

MEL CTCF TFBS ChIP-seq Peaks from ENCODE/Stanford/Yale

MEL CTCF TFBS ChIP-seq Signal from ENCODE/Stanford/Yale

MEL GATA-1 TFBS ChIP-seq Peaks from ENCODE/Stanford/Yale

MEL GATA-1 TFBS ChIP-seq Signal from ENCODE/Stanford/Yale

MEL p300 (SC-584) TFBS ChIP-seq Peaks from ENCODE/Stanford/Yale

MEL p300 (SC-584) TFBS ChIP-seq Signal from ENCODE/Stanford/Yale

MEL Pol2 TFBS ChIP-seq Peaks from ENCODE/Stanford/Yale

MEL Pol2 TFBS ChIP-seq Signal from ENCODE/Stanford/Yale

Placental Mammal Basewise Conservation by PhyloP

Multiz Alignments of 30 Vertebrates

Repeating Elements by RepeatMasker

Simple Nucleotide Polymorphisms (dbSNP build 128)

BC050140AK018427

Meis1Meis1

Meis1Gm16140

Meis1AK144295

chr11.2463chr11.2464

chr11.2467 chr11.2474 chr11.2477 chr11.2485chr11.2488

chr11.4792chr11.4794

chr11.4817 chr11.4841chr11.4846

chr11.4858chr11.4862

chr11.4866chr11.4869

chr11.4871

G1E 2

0.15 _

0 _

G1E-ER4 E2 24 2

0.15 _

0 _

CH12 CTCF

CH12 p300 SC-584

CH12 Pol2

CH12 c-Jun

MEL CTCF

MEL GATA-1

MEL p300 SC-584

MEL Pol2

Mammal Cons

2.1 _

-3.3 _

0 -

Ariadne (GeneViewer)

Papadopoulos GL (IMBB, FORTH) IMBB NGS Seminar 04/12/2015 20 / 28

Interpretation Visualization

Local

GenomeGraphs (R package)

Online

UCSC Genome Browser

Web basedSlow

Comprehensive database

Scalechr11:

STS Markers

RatHuman

OrangutanDog

HorseOpossum

ChickenStickleback

RepeatMasker

SNPs (128)

50 kb mm918,780,000 18,790,000 18,800,000 18,810,000 18,820,000 18,830,000 18,840,000 18,850,000 18,860,000 18,870,000 18,880,000 18,890,000 18,900,000 18,910,000 18,920,000

STS Markers on Genetic and Radiation Hybrid Maps

UCSC Genes (RefSeq, GenBank, tRNAs & Comparative Genomics)

G1E DNaseI HS Peaks Rep 2 from ENCODE/PSU

G1E-ER4 Estradiol 24hr DNaseI HS DNase-seq Peaks Rep 2 from ENCODE/PSU

G1E DNaseI HS Signal Rep 2 from ENCODE/PSU

G1E-ER4 Estradiol 24hr DNaseI HS DNase-seq Signal Rep 2 from ENCODE/PSU

CH12 CTCF TFBS ChIP-seq Peaks from ENCODE/Stanford/Yale

CH12 CTCF TFBS ChIP-seq Signal from ENCODE/Stanford/Yale

CH12 p300 (SC-584) TFBS ChIP-seq Peaks from ENCODE/Stanford/Yale

CH12 p300 (SC-584) TFBS ChIP-seq Signal from ENCODE/Stanford/Yale

CH12 Pol2 TFBS ChIP-seq Peaks from ENCODE/Stanford/YaleCH12 Pol2 TFBS ChIP-seq Signal from ENCODE/Stanford/Yale

CH12 c-Jun TFBS ChIP-seq Peaks from ENCODE/Stanford/Yale

CH12 c-Jun TFBS ChIP-seq Signal from ENCODE/Stanford/Yale

MEL CTCF TFBS ChIP-seq Peaks from ENCODE/Stanford/Yale

MEL CTCF TFBS ChIP-seq Signal from ENCODE/Stanford/Yale

MEL GATA-1 TFBS ChIP-seq Peaks from ENCODE/Stanford/Yale

MEL GATA-1 TFBS ChIP-seq Signal from ENCODE/Stanford/Yale

MEL p300 (SC-584) TFBS ChIP-seq Peaks from ENCODE/Stanford/Yale

MEL p300 (SC-584) TFBS ChIP-seq Signal from ENCODE/Stanford/Yale

MEL Pol2 TFBS ChIP-seq Peaks from ENCODE/Stanford/Yale

MEL Pol2 TFBS ChIP-seq Signal from ENCODE/Stanford/Yale

Placental Mammal Basewise Conservation by PhyloP

Multiz Alignments of 30 Vertebrates

Repeating Elements by RepeatMasker

Simple Nucleotide Polymorphisms (dbSNP build 128)

BC050140AK018427

Meis1Meis1

Meis1Gm16140

Meis1AK144295

chr11.2463chr11.2464

chr11.2467 chr11.2474 chr11.2477 chr11.2485chr11.2488

chr11.4792chr11.4794

chr11.4817 chr11.4841chr11.4846

chr11.4858chr11.4862

chr11.4866chr11.4869

chr11.4871

G1E 2

0.15 _

0 _

G1E-ER4 E2 24 2

0.15 _

0 _

CH12 CTCF

CH12 p300 SC-584

CH12 Pol2

CH12 c-Jun

MEL CTCF

MEL GATA-1

MEL p300 SC-584

MEL Pol2

Mammal Cons

2.1 _

-3.3 _

0 -

Ariadne (GeneViewer)

Papadopoulos GL (IMBB, FORTH) IMBB NGS Seminar 04/12/2015 20 / 28

Interpretation Visualization

Local

GenomeGraphs (R package)

Online

Ariadne (GeneViewer)

Papadopoulos GL (IMBB, FORTH) IMBB NGS Seminar 04/12/2015 20 / 28

Interpretation Analysis Software

Papadopoulos GL (IMBB, FORTH) IMBB NGS Seminar 04/12/2015 21 / 28

Interpretation Analysis Software

GALAXY server

Papadopoulos GL (IMBB, FORTH) IMBB NGS Seminar 04/12/2015 21 / 28

Interpretation Analysis Software

GALAXY server

Online collection of widely usedbioinformatics tools

(e.g. FastQC, bowtie.2, CuffLinks,MACS, SICER ...)

Papadopoulos GL (IMBB, FORTH) IMBB NGS Seminar 04/12/2015 21 / 28

Interpretation Analysis Software

GALAXY server

Online collection of widely usedbioinformatics tools

(e.g. FastQC, bowtie.2, CuffLinks,MACS, SICER ...)

Learn Galaxy...

Papadopoulos GL (IMBB, FORTH) IMBB NGS Seminar 04/12/2015 21 / 28

Interpretation Analysis Software

GALAXY server

Online collection of widely usedbioinformatics tools

(e.g. FastQC, bowtie.2, CuffLinks,MACS, SICER ...)

Learn Galaxy...

https://galaxyproject.org/

Papadopoulos GL (IMBB, FORTH) IMBB NGS Seminar 04/12/2015 21 / 28

Interpretation Analysis Software

GALAXY server

Online collection of widely usedbioinformatics tools

(e.g. FastQC, bowtie.2, CuffLinks,MACS, SICER ...)

Learn Galaxy...

https://galaxyproject.org/

Chipster

Papadopoulos GL (IMBB, FORTH) IMBB NGS Seminar 04/12/2015 21 / 28

Interpretation Analysis Software

GALAXY server

Online collection of widely usedbioinformatics tools

(e.g. FastQC, bowtie.2, CuffLinks,MACS, SICER ...)

Learn Galaxy...

https://galaxyproject.org/

Chipster

User-friendly analysis software forhigh-throughput data (GUI)

Papadopoulos GL (IMBB, FORTH) IMBB NGS Seminar 04/12/2015 21 / 28

Interpretation Downstream Analysis

Papadopoulos GL (IMBB, FORTH) IMBB NGS Seminar 04/12/2015 22 / 28

Interpretation Downstream Analysis

Genomic Regions Enrichment ofAnnotations Tool (GREAT)

Papadopoulos GL (IMBB, FORTH) IMBB NGS Seminar 04/12/2015 22 / 28

Interpretation Downstream Analysis

Genomic Regions Enrichment ofAnnotations Tool (GREAT)

Functional annotation ofcis-regulatory regions

Papadopoulos GL (IMBB, FORTH) IMBB NGS Seminar 04/12/2015 22 / 28

Interpretation Downstream Analysis

Genomic Regions Enrichment ofAnnotations Tool (GREAT)

Functional annotation ofcis-regulatory regions

Input: BED file

Papadopoulos GL (IMBB, FORTH) IMBB NGS Seminar 04/12/2015 22 / 28

Interpretation Downstream Analysis

Genomic Regions Enrichment ofAnnotations Tool (GREAT)

Functional annotation ofcis-regulatory regions

Input: BED file

Output:List of potential target genes

Peak distribution plotsGeneset enrichment analysis

(GO, Pathways, Phenotypes ...)

Papadopoulos GL (IMBB, FORTH) IMBB NGS Seminar 04/12/2015 22 / 28

Interpretation Downstream Analysis

Genomic Regions Enrichment ofAnnotations Tool (GREAT)

Functional annotation ofcis-regulatory regions

Input: BED file

Output:List of potential target genes

Peak distribution plotsGeneset enrichment analysis

(GO, Pathways, Phenotypes ...)

http:

//bejerano.stanford.edu/great/public/html/index.php

Papadopoulos GL (IMBB, FORTH) IMBB NGS Seminar 04/12/2015 22 / 28

Interpretation Downstream Analysis

Genomic Regions Enrichment ofAnnotations Tool (GREAT)

Functional annotation ofcis-regulatory regions

Input: BED file

Output:List of potential target genes

Peak distribution plotsGeneset enrichment analysis

(GO, Pathways, Phenotypes ...)

http:

//bejerano.stanford.edu/great/public/html/index.php

MEME-ChIP

Papadopoulos GL (IMBB, FORTH) IMBB NGS Seminar 04/12/2015 22 / 28

Interpretation Downstream Analysis

Genomic Regions Enrichment ofAnnotations Tool (GREAT)

Functional annotation ofcis-regulatory regions

Input: BED file

Output:List of potential target genes

Peak distribution plotsGeneset enrichment analysis

(GO, Pathways, Phenotypes ...)

http:

//bejerano.stanford.edu/great/public/html/index.php

MEME-ChIP

Perform motif discovery, motifenrichment analysis and clustering

Papadopoulos GL (IMBB, FORTH) IMBB NGS Seminar 04/12/2015 22 / 28

Interpretation Downstream Analysis

Genomic Regions Enrichment ofAnnotations Tool (GREAT)

Functional annotation ofcis-regulatory regions

Input: BED file

Output:List of potential target genes

Peak distribution plotsGeneset enrichment analysis

(GO, Pathways, Phenotypes ...)

http:

//bejerano.stanford.edu/great/public/html/index.php

MEME-ChIP

Perform motif discovery, motifenrichment analysis and clustering

Input: FASTA file

Papadopoulos GL (IMBB, FORTH) IMBB NGS Seminar 04/12/2015 22 / 28

Interpretation Downstream Analysis

Genomic Regions Enrichment ofAnnotations Tool (GREAT)

Functional annotation ofcis-regulatory regions

Input: BED file

Output:List of potential target genes

Peak distribution plotsGeneset enrichment analysis

(GO, Pathways, Phenotypes ...)

http:

//bejerano.stanford.edu/great/public/html/index.php

MEME-ChIP

Perform motif discovery, motifenrichment analysis and clustering

Input: FASTA file

Output:Enriched motifs

Motif distribution plots

Papadopoulos GL (IMBB, FORTH) IMBB NGS Seminar 04/12/2015 22 / 28

Interpretation Downstream Analysis

Genomic Regions Enrichment ofAnnotations Tool (GREAT)

Functional annotation ofcis-regulatory regions

Input: BED file

Output:List of potential target genes

Peak distribution plotsGeneset enrichment analysis

(GO, Pathways, Phenotypes ...)

http:

//bejerano.stanford.edu/great/public/html/index.php

MEME-ChIP

Perform motif discovery, motifenrichment analysis and clustering

Input: FASTA file

Output:Enriched motifs

Motif distribution plots

http://meme-suite.org/tools/meme-chip

Papadopoulos GL (IMBB, FORTH) IMBB NGS Seminar 04/12/2015 22 / 28

Online Resources ENCODE

Encyclopedia of DNA Elements

Aim: build a comprehensive parts list offunctional elements in the human genome

Papadopoulos GL (IMBB, FORTH) IMBB NGS Seminar 04/12/2015 23 / 28

Online Resources ENCODE

Encyclopedia of DNA Elements

Aim: build a comprehensive parts list offunctional elements in the human genome

New website is extremely easy to use!

Papadopoulos GL (IMBB, FORTH) IMBB NGS Seminar 04/12/2015 23 / 28

Online Resources ENCODE

Encyclopedia of DNA Elements

Papadopoulos GL (IMBB, FORTH) IMBB NGS Seminar 04/12/2015 24 / 28

Online Resources ENCODE

Encyclopedia of DNA Elements

Papadopoulos GL (IMBB, FORTH) IMBB NGS Seminar 04/12/2015 24 / 28

Online Resources ENCODE

Encyclopedia of DNA Elements

Papadopoulos GL (IMBB, FORTH) IMBB NGS Seminar 04/12/2015 24 / 28

Online Resources ENCODE

Encyclopedia of DNA Elements

Papadopoulos GL (IMBB, FORTH) IMBB NGS Seminar 04/12/2015 24 / 28

Online Resources ENCODE

Encyclopedia of DNA Elements

Papadopoulos GL (IMBB, FORTH) IMBB NGS Seminar 04/12/2015 24 / 28

Online Resources ENCODE

Encyclopedia of DNA Elements

Papadopoulos GL (IMBB, FORTH) IMBB NGS Seminar 04/12/2015 24 / 28

Online Resources ENCODE

Encyclopedia of DNA Elements

Papadopoulos GL (IMBB, FORTH) IMBB NGS Seminar 04/12/2015 24 / 28

Online Resources ENCODE

Encyclopedia of DNA Elements

Papadopoulos GL (IMBB, FORTH) IMBB NGS Seminar 04/12/2015 24 / 28

Online Resources GEO

Gene Expression Omnibus

Papadopoulos GL (IMBB, FORTH) IMBB NGS Seminar 04/12/2015 25 / 28

Online Resources GEO

Gene Expression Omnibus

GEO is a public functional genomics data repository

Papadopoulos GL (IMBB, FORTH) IMBB NGS Seminar 04/12/2015 25 / 28

Online Resources GEO

Gene Expression Omnibus

GEO is a public functional genomics data repository

All NGS datasets have to be deposited to GEO prior to publication

Papadopoulos GL (IMBB, FORTH) IMBB NGS Seminar 04/12/2015 25 / 28

Online Resources GEO

Gene Expression Omnibus

GEO is a public functional genomics data repository

All NGS datasets have to be deposited to GEO prior to publication

All datasets included in GEO are publicly available for academic use

Papadopoulos GL (IMBB, FORTH) IMBB NGS Seminar 04/12/2015 25 / 28

Online Resources GEO

Gene Expression Omnibus

GEO is a public functional genomics data repository

All NGS datasets have to be deposited to GEO prior to publication

All datasets included in GEO are publicly available for academic use

The search function is pretty close to awful

Papadopoulos GL (IMBB, FORTH) IMBB NGS Seminar 04/12/2015 25 / 28

Online Resources GEO

Gene Expression Omnibus

GEO is a public functional genomics data repository

All NGS datasets have to be deposited to GEO prior to publication

All datasets included in GEO are publicly available for academic use

The search function is pretty close to awful

Best way to query the GEO database is by Accession Number(declared in the manuscript, GSE30142, GSM746581)

Papadopoulos GL (IMBB, FORTH) IMBB NGS Seminar 04/12/2015 25 / 28

Online Resources GEO

Gene Expression Omnibus

GEO is a public functional genomics data repository

All NGS datasets have to be deposited to GEO prior to publication

All datasets included in GEO are publicly available for academic use

The search function is pretty close to awful

Best way to query the GEO database is by Accession Number(declared in the manuscript, GSE30142, GSM746581)

FTP access to SRA experiment (Sequence Read Archive)

Papadopoulos GL (IMBB, FORTH) IMBB NGS Seminar 04/12/2015 25 / 28

Online Resources ENA

European Nucleotide Archive

Papadopoulos GL (IMBB, FORTH) IMBB NGS Seminar 04/12/2015 26 / 28

Online Resources ENA

European Nucleotide Archive

ENA provides a comprehensive record of theworld’s nucleotide sequencing information

Papadopoulos GL (IMBB, FORTH) IMBB NGS Seminar 04/12/2015 26 / 28

Online Resources ENA

European Nucleotide Archive

ENA provides a comprehensive record of theworld’s nucleotide sequencing information

It covers raw sequencing data, sequence assembly informationand functional annotation

Papadopoulos GL (IMBB, FORTH) IMBB NGS Seminar 04/12/2015 26 / 28

Online Resources ENA

European Nucleotide Archive

ENA provides a comprehensive record of theworld’s nucleotide sequencing information

It covers raw sequencing data, sequence assembly informationand functional annotation

The search function is pretty close to awful

Papadopoulos GL (IMBB, FORTH) IMBB NGS Seminar 04/12/2015 26 / 28

Online Resources ENA

European Nucleotide Archive

ENA provides a comprehensive record of theworld’s nucleotide sequencing information

It covers raw sequencing data, sequence assembly informationand functional annotation

The search function is pretty close to awful

Large overlap with GEO database

Papadopoulos GL (IMBB, FORTH) IMBB NGS Seminar 04/12/2015 26 / 28

Comprehensive database of genomic features

Systemic understandingof biological processes

Papadopoulos GL (IMBB, FORTH) IMBB NGS Seminar 04/12/2015 27 / 28

Thank you...

Papadopoulos GL (IMBB, FORTH) IMBB NGS Seminar 04/12/2015 28 / 28

Recommended