genomation is an R package that contains a collection of tools for visualizing and analyzing genome-wide data sets. The package works with a variety of genomic interval file types and enables easy summarization and annotation of high-throughput data sets with given genomic annotations. http://al2na.github.io/genomation/
genomation package (Altuna Akalın)
Outline: Usage and ubiquity of genomic interval summaries; Using genomation; More information
genomation: a toolkit to summarize, annotate and visualize genomic intervals
Altuna Akalın (presenter)
February 24, 2014
Package developed by Altuna Akalın and Vedran Franke
Quick introduction
genomation is an R package that expedites genomic interval summary and annotation. It has the following features:
1. Annotation of genomic intervals, e.g. see what fraction of your intervals overlap with exons, introns or promoters.
2. Summary of genomic scores or read coverage over pre-defined regions, e.g. extract the conservation profile over ChIP-seq binding sites (equi-width regions) or CpG islands (non-equi-width regions).
3. Visualization of genomic interval summaries as meta-region plots or heatmaps.
4. Support for multiple file formats, e.g. BAM, BED, bigWig, GFF and generic tabular text files containing chromosome location information.
5. Do all of this in R :)
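The first feature, annotating intervals by overlap, can be illustrated in base R. This is only a minimal sketch under stated assumptions: the coordinates are made up, everything sits on a single chromosome, and `overlaps_any()` is a hypothetical helper, not a genomation function. The package itself works on GRanges objects.

```r
# Toy intervals of interest and annotation features (single chromosome,
# 1-based closed coordinates). All values are made up for illustration.
query    <- data.frame(start = c(10, 50, 120, 300), end = c(20, 60, 130, 310))
promoter <- data.frame(start = c(1, 115),  end = c(15, 125))
exon     <- data.frame(start = c(55, 200), end = c(70, 250))

# overlaps_any() is a hypothetical helper: TRUE for each query interval
# that overlaps at least one feature interval
overlaps_any <- function(q, f) {
  vapply(seq_len(nrow(q)), function(i) {
    any(q$start[i] <= f$end & q$end[i] >= f$start)
  }, logical(1))
}

in_promoter <- overlaps_any(query, promoter)
in_exon     <- overlaps_any(query, exon)

# Percentage of query intervals overlapping each feature type
pct <- c(promoter = mean(in_promoter), exon = mean(in_exon)) * 100
print(pct)
```

Percentages of this kind are what an annotation pie chart would display.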
Genomic interval summaries are widely used
Summaries of genomic intervals are a useful way to communicate high-dimensional data. Traditionally, regions of interest are picked and the distribution of genomic intervals is summarized over those regions.
Genomic interval summaries are widely used: examples from literature
Figure: Erkek S. et al. (2013) Molecular determinants of nucleosome retention at CpG-rich sequences in mouse spermatozoa. Nature Structural & Molecular Biology.
Genomic interval summaries are widely used: examples from literature
Figure: Stadler M., Murr R., Burger L. et al. (2011) DNA-binding factors shape the mouse methylome at distal regulatory regions. Nature.
Utility and futility of average profiles
Does this mean all of the windows (viewpoints) have a similar enrichment profile?
[Figure: average profile around anchor; x-axis: bases (-100 to 100), y-axis: average score]
Utility and futility of average profiles
Only 13 of windows have such enrichment. Be careful when you are interpreting the average profiles.
[Figure: scores of the individual windows around the anchor; x-axis: bases (-100 to 100)]
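The warning above can be reproduced with simulated data. The sketch below is a base R illustration, not the code behind the slides: the number of windows, the background level, the peak shape and the share of enriched windows (one third, an arbitrary choice) are all assumptions.

```r
set.seed(1)
pos       <- seq(-100, 100)   # positions around the anchor
n_windows <- 300              # arbitrary number of windows (rows)

# Every window gets noisy background scores around 4
mat <- matrix(rnorm(n_windows * length(pos), mean = 4, sd = 0.3),
              nrow = n_windows)

# Only the first 100 windows (one third, an arbitrary choice) carry a peak
enriched <- seq_len(100)
peak <- 3 * exp(-(pos / 20)^2)
mat[enriched, ] <- sweep(mat[enriched, ], 2, peak, "+")

# The average profile shows a bump at the anchor ...
avg <- colMeans(mat)
plot(pos, avg, type = "l", xlab = "bases", ylab = "average score")

# ... although most individual windows have no enrichment there
center   <- which(pos == 0)
has_peak <- mat[, center] > 5.5
mean(has_peak)   # fraction of windows with a peak at the anchor
```

The column means alone cannot distinguish "every window is mildly enriched" from "a minority of windows is strongly enriched", which is why a per-window view such as a heatmap is worth plotting next to the average.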
Genomic interval summaries are widely used: examples from literature
Figure: Lister R. et al. (2009) Human DNA methylomes at base resolution show widespread epigenomic differences. Nature.
Genomic interval summaries are widely used: examples from literature
Figure: Feng S. et al. (2010) Conservation and divergence of methylation patterning in plants and animals. PNAS.
Issues to keep in mind when developing summary methods
- Genomic data comes in many formats: we need a method that is able to work with multiple flat file formats.
- We need a method that is not specialized for one type of data set such as read counts; it should also work easily with other scoring schemes (e.g. conservation scores).
- Regions of interest are not always equi-width: you should be able to normalize for length differences by binning.
- Multiple visualization options and fast heatmap generation should be available.
- Clustering of regions based on multiple summaries (e.g. binding of different TFs on the same set of regions) should be possible on the heatmap.
- Ease of use: it should not take hours of coding to generate and visualize summaries.
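The binning point can be made concrete with a short base R sketch. It assumes per-base scores are already available as numeric vectors; `bin_scores()` is a hypothetical helper written for this illustration and does not reproduce how genomation bins regions internally.

```r
# bin_scores() is a hypothetical helper: summarize a per-base score vector
# into a fixed number of consecutive, nearly equal-sized bins
bin_scores <- function(scores, n_bins, fun = mean) {
  stopifnot(length(scores) >= n_bins)
  # map base i to a bin index between 1 and n_bins
  bin_id <- ceiling(seq_along(scores) * n_bins / length(scores))
  as.numeric(tapply(scores, bin_id, fun))
}

# Two regions of different width (toy scores)
region_a <- 1:10                 # 10 bases
region_b <- seq(2, 60, by = 2)   # 30 bases

# After binning, both regions fit into the same 5-column matrix
binned <- rbind(a = bin_scores(region_a, 5), b = bin_scores(region_b, 5))
print(binned)
```

Once every region is reduced to the same number of bins, regions of unequal width can share one matrix and be averaged or plotted together.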
Overview of genomation features
[Figure: workflow diagram]
- Genomic intervals (input): BAM, BigWig, BED, GFF, tabular text, GRanges
- Annotation (input): BED, GFF, tabular text, GRanges
- Summarize: ScoreMatrix / ScoreMatrixList object (rows: region 1 to region m; columns: base-pairs or bins 1 to n)
- Annotate: pie charts for annotation (intergenic, intron, exon, promoter)
- Visualize: meta-region plots, meta-region heatmaps, heatmaps for genomic interval sets
Installation of the package and the example data
We can install the package and the data using the install_github() function from the devtools package.
    # install dependencies
    install.packages(c("data.table", "plyr", "reshape2", "ggplot2", "gridBase", "devtools"))
    source("http://bioconductor.org/biocLite.R")
    biocLite(c("GenomicRanges", "rtracklayer", "impute", "Rsamtools"))
    # install the packages
    library(devtools)
    install_github("genomation", username = "al2na")
    # install the data package
    # needed for examples
    install_github("genomationData", username = "al2na")
Data import
Various file formats can be used in genomation. You can read in annotation or your genomic intervals of interest.
    library(genomation)
    tabfile1 <- system.file("extdata/tab1.bed", package = "genomation")
    readGeneric(tabfile1)
    ## GRanges with 6 ranges and 0 metadata columns:
    ##       seqnames             ranges strand
    ##          <Rle>          <IRanges>  <Rle>
    ##   [1]    chr21 [9437272, 9439473]      *
    ##   [2]    chr21 [9483485, 9484663]      *
    ##   [3]    chr21 [9647866, 9648116]      *
    ##   [4]    chr21 [9708935, 9709231]      *
    ##   [5]    chr21 [9825442, 9826296]      *
    ##   [6]    chr21 [9909011, 9909218]      *
    ##   ---
    ##   seqlengths:
    ##    chr21
    ##       NA
Extraction of data over pre-defined genomic regions
ScoreMatrix() and ScoreMatrixBin() are the functions used to extract data over predefined windows.
- ScoreMatrix() is used when all of the windows have the same width (e.g. regions around TSSs).
- ScoreMatrixBin() is designed for use with windows of unequal width (e.g. enrichment of methylation over exons).
    data(cage)
    data(promoters)
    sm <- ScoreMatrix(target = cage, windows = promoters)
    sm
    ## scoreMatrix with dims: 1055 2001
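The idea behind an equal-width score matrix is one row per window and one column per position relative to the anchor. The base R sketch below shows that layout on made-up data; `score_matrix()` is a hypothetical helper and not the genomation implementation, and it assumes a plain per-base score vector for a single chromosome.

```r
# Per-base scores along a toy chromosome of length 100 (made-up data)
scores <- sin(seq_len(100) / 10)

# Equal-width windows: anchor positions with a fixed flank on both sides
anchors <- c(20, 50, 80)
flank   <- 5

# score_matrix() is a hypothetical helper, not the genomation implementation
score_matrix <- function(scores, anchors, flank) {
  offsets <- seq(-flank, flank)
  # one row per window, one column per position relative to the anchor
  m <- t(vapply(anchors, function(a) scores[a + offsets],
                numeric(length(offsets))))
  colnames(m) <- offsets
  m
}

sm_toy <- score_matrix(scores, anchors, flank)
dim(sm_toy)   # 3 windows x 11 positions
```

A flank of 5 gives 2 * 5 + 1 = 11 columns, in the same way that windows spanning -1000 to 1000 around an anchor give 2001 columns.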
Visualizing ScoreMatrix: summary of genomic intervals over pre-defined regions
plotMeta(), heatMeta(), heatMatrix() and multiHeatMatrix() are the visualization functions.
    # save the outer margins so that they can be restored afterwards
    oldoma <- par()$oma
    par(oma = c(0, 0, 0, 0))
    heatMatrix(sm, xcoords = c(-1000, 1000))
    plotMeta(sm, xcoords = c(-1000, 1000), line.col = "blue")
    par(oma = oldoma)
[Figure: heatmap and meta-region plot of sm; x-axis: bases (-1000 to 1000); y-axis of the meta-region plot: average score]
Working with BAM files
BAM files can also be used in the ScoreMatrix() and ScoreMatrixBin() functions.
    bamfile = system.file("tests/test.bam", package = "genomation")
    windows = GRanges(rep(c(1, 2), each = 2),
                      IRanges(rep(c(1, 2), times = 2), width = 5))
    scores3 = ScoreMatrix(target = bamfile, windows = windows, type = "bam")
Working with bigWig files
ScoreMatrix() and ScoreMatrixBin() can handle bigWig files. Here we use ENCODE DHS scores downloaded from http://goo.gl/fEVu0g.
    mybed12file = system.file("extdata/chr21.refseq.hg19.bed", package = "genomation")
    feats = readTranscriptFeatures(mybed12file, up.flank = 500, down.flank = 500)
    sm = ScoreMatrix(target = "wgEncodeUwDnaseA549RawRep1.bw",
                     windows = feats$promoters, type = "bigWig", strand.aware = TRUE)
    plotMeta(sm, xcoords = c(-500, 500), main = "DHS around TSS", line.col = "blue")
[Figure: meta-region plot titled "DHS around TSS"; x-axis: bases, y-axis: average score]
Multiple profiles
Multiple heatmap profiles can be plotted using multiHeatMatrix(), which takes a ScoreMatrixList object. Here we used CTCF, P300, Suz12, Rad21 and Znf143 BAM files from the genomationData package.
    ctcfpeaks = readRDS("ctcfpeaks.rds")
    dataPath = system.file("extdata", package = "genomationData")
    bamfiles = list.files(dataPath, full.names = TRUE, pattern = "bam$")[c(1:4, 6)]
    sml = ScoreMatrixList(bamfiles, ctcfpeaks, bin.num = 50, type = "bam")
    names(sml) = c("CTCF", "P300", "Suz12", "Rad21", "Znf143")
    multiHeatMatrix(sml, xcoords = c(-500, 500), cex.axis = 0.35, common.scale = TRUE,
                    col = c("lightgray", "blue"), winsorize = c(0, 95))
[Figure: five heatmaps (CTCF, P300, Suz12, Rad21, Znf143) on a common colour scale from 0 to 8; x-axis: -500 to 500]
Multiple profiles
multiHeatMatrix() can also apply k-means clustering. Extreme values are trimmed using the "winsorize" argument.
    multiHeatMatrix(sml, xcoords = c(-500, 500), kmeans = TRUE, k = 3,
                    common.scale = TRUE, cex.axis = 0.4,
                    col = c("lightgray", "blue"), winsorize = c(0, 95))
[Figure: the same five heatmaps (CTCF, P300, Suz12, Rad21, Znf143) with rows grouped into k-means clusters 1, 2 and 3; x-axis: -500 to 500; colour scale from 0 to 8]
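What trimming and clustering do to a score matrix can be sketched in base R. The example assumes that the two winsorize numbers are lower and upper percentiles. `winsorize_matrix()` is a hypothetical helper and the data are simulated, so this illustrates the technique rather than what multiHeatMatrix() does internally.

```r
set.seed(42)
# Toy score matrix: 60 windows x 20 bins, two groups with different signal
m <- rbind(matrix(rpois(30 * 20, lambda = 2),  nrow = 30),
           matrix(rpois(30 * 20, lambda = 10), nrow = 30))
m[1, 1] <- 500   # one extreme value that would dominate a colour scale

# winsorize_matrix() is a hypothetical helper: clamp values to the given
# lower and upper percentiles of the whole matrix
winsorize_matrix <- function(x, probs = c(0, 95)) {
  lim <- quantile(x, probs = probs / 100)
  x[x < lim[1]] <- lim[1]
  x[x > lim[2]] <- lim[2]
  x
}

mw <- winsorize_matrix(m, c(0, 95))

# Cluster windows (rows) by their profiles, then order rows by cluster
cl <- kmeans(mw, centers = 2, nstart = 10)
mw_sorted <- mw[order(cl$cluster), ]
table(cl$cluster)
```

Clamping first keeps a single outlier from stretching the colour scale, and ordering rows by cluster is what makes groups of similar windows appear as blocks in a heatmap.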
Multiple profiles
Multiple average profiles can be visualized with heatMeta(). Here we also apply a scaling function to all the matrices.
    # take log2 of all matrices
    sml2 = scaleScoreMatrixList(sml, scalefun = function(x) log2(x + 1))
    heatMeta(sml2, legend.name = "average profiles", xcoords = c(-500, 500),
             xlab = "bp around peaks")
[Figure: heatmap of average profiles with one row each for Znf143, Rad21, Suz12, P300 and CTCF; x-axis: bp around peaks; legend: average profiles]
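The two steps on this slide, scaling every matrix with the same function and reducing each matrix to one average profile, look roughly like this in base R. The matrices are simulated and the object names are made up; this is a sketch of the idea, not of the genomation functions.

```r
# Toy list of score matrices (4 windows x 10 bins each); values are made up
set.seed(7)
mats <- list(
  A = matrix(rpois(40, lambda = 3),  nrow = 4),
  B = matrix(rpois(40, lambda = 30), nrow = 4)
)

# Apply the same scaling function to every matrix, here log2(x + 1)
scaled <- lapply(mats, function(x) log2(x + 1))

# One average profile per matrix: the mean of each column (bin)
profiles <- t(sapply(scaled, colMeans))
dim(profiles)   # 2 profiles x 10 bins
```

The log transform compresses the difference in magnitude between data sets, which makes profiles with very different read depths easier to compare on one colour scale.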
Multiple profiles
Multiple average profiles can also be visualized with plotMeta().
    plotMeta(sml2, profile.names = names(sml2), xcoords = c(-500, 500), main = "mult profiles")
[Figure: line plot titled "mult profiles" with one line each for CTCF, P300, Suz12, Rad21 and Znf143; x-axis: bases, y-axis: average score]
Future work
- Explore overlap statistics between two genomic data sets: do TF1 binding site locations overlap with TF2 sites more than expected? This was previously explored by the GenometriCorr package. This functionality can be included in the form of a dependency.
- Performance improvement of certain functions: faster is always better.
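One generic way to ask whether two site sets overlap "more than expected" is a permutation test. The base R sketch below is only an illustration of that idea on simulated positions; it is not the GenometriCorr method, and the chromosome length, site width, number of permutations and the helper `count_overlaps()` are all assumptions.

```r
set.seed(123)
chrom_len <- 10000   # toy chromosome length
width     <- 50      # all sites have the same width

# Toy binding sites (start positions): half of the TF2 sites are placed
# next to TF1 sites on purpose, the rest are random
tf1 <- sort(sample(chrom_len - width, 40))
tf2 <- c(tf1[1:20] + 10, sample(chrom_len - width, 20))

# count_overlaps() is a hypothetical helper: number of sites in `a` that
# overlap at least one site in `b` (equal widths, so compare start positions)
count_overlaps <- function(a, b, width) {
  sum(vapply(a, function(s) any(abs(s - b) < width), logical(1)))
}

observed <- count_overlaps(tf1, tf2, width)

# Null distribution: the same number of TF2 sites placed uniformly at random
null <- replicate(1000,
  count_overlaps(tf1, sample(chrom_len - width, length(tf2)), width))

# One-sided empirical p-value with the usual +1 correction
p_value <- (sum(null >= observed) + 1) / (length(null) + 1)
p_value
```

Uniform random placement is the simplest possible null model; real genomes would call for a null that respects chromosome structure and mappability.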
Further information
- The genomation package is available at http://al2na.github.io/genomation. You can find the link to the vignette on the webpage as well.
- Code that generated this presentation is available at http://github.com/al2na/genomation_presentation.
Questions and bug reports
- You can view/open issues on GitHub: https://github.com/al2na/genomation/issues?state=open
- You can ask questions by sending an e-mail to genomation@googlegroups.com or by using the web interface to Google Groups.
Developed by Altuna Akalın and Vedran Franke
Session Info
    sessionInfo()
    ## R version 3.0.2 (2013-09-25)
    ## Platform: x86_64-apple-darwin10.8.0 (64-bit)
    ##
    ## locale:
    ## [1] C
    ##
    ## attached base packages:
    ## [1] methods   grid      stats     graphics  grDevices utils     datasets
    ## [8] base
    ##
    ## other attached packages:
    ## [1] genomation_0.99.0.2 knitr_1.5
    ##
    ## loaded via a namespace (and not attached):
    ##  [1] BSgenome_1.30.0      BiocGenerics_0.8.0   Biostrings_2.30.0
    ##  [4] GenomicRanges_1.14.3 IRanges_1.20.5       MASS_7.3-29
    ##  [7] RColorBrewer_1.0-5   RCurl_1.95-4.1       Rsamtools_1.14.1
    ## [10] XML_3.95-0.2         XVector_0.2.0        bitops_1.0-6
    ## [13] colorspace_1.2-4     data.table_1.8.10    dichromat_2.0-0
    ## [16] digest_0.6.3         evaluate_0.5.1       formatR_0.10
    ## [19] ggplot2_0.9.3.1      gridBase_0.4-6       gtable_0.1.2
    ## [22] highr_0.3            impute_1.36.0        labeling_0.2
    ## [25] munsell_0.4.2        parallel_3.0.2       plyr_1.8
    ## [28] proto_0.3-10         reshape2_1.2.2       rtracklayer_1.22.0
    ## [31] scales_0.2.3         stats4_3.0.2         stringr_0.6.2
    ## [34] tools_3.0.2          zlibbioc_1.8.0
genomation
package
Altuna Akalın
Usage and
ubiquity of
genomic
interval
summaries
Using
genomation
More
information
Quick introduction
The genomation is an R package that expedites genomicinterval summary and annotation It has the following features
1 Annotation of genomic intervals eg see what of yourintervals overlap with exonintronpromoters
2 Summary of genomic scores or read coverages over pre-definedregions
eg extract the conservation profile over ChIP-seq binding sites
(equi-width regions) or CpG islands (nonequi-width regions)
3 Visualize genomic interval summaries as meta-region plots orheatmaps
4 Work with multiple file formatseg BAM BED bigWig GFF and generic tabular text files
containing chromosome location information
5 do all these in R )
genomation
package
Altuna Akalın
Usage and
ubiquity of
genomic
interval
summaries
Using
genomation
More
information
Quick introduction
The genomation is an R package that expedites genomicinterval summary and annotation It has the following features
1 Annotation of genomic intervals eg see what of yourintervals overlap with exonintronpromoters
2 Summary of genomic scores or read coverages over pre-definedregions
eg extract the conservation profile over ChIP-seq binding sites
(equi-width regions) or CpG islands (nonequi-width regions)
3 Visualize genomic interval summaries as meta-region plots orheatmaps
4 Work with multiple file formatseg BAM BED bigWig GFF and generic tabular text files
containing chromosome location information
5 do all these in R )
genomation
package
Altuna Akalın
Usage and
ubiquity of
genomic
interval
summaries
Using
genomation
More
information
Quick introduction
The genomation is an R package that expedites genomicinterval summary and annotation It has the following features
1 Annotation of genomic intervals eg see what of yourintervals overlap with exonintronpromoters
2 Summary of genomic scores or read coverages over pre-definedregions
eg extract the conservation profile over ChIP-seq binding sites
(equi-width regions) or CpG islands (nonequi-width regions)
3 Visualize genomic interval summaries as meta-region plots orheatmaps
4 Work with multiple file formatseg BAM BED bigWig GFF and generic tabular text files
containing chromosome location information
5 do all these in R )
genomation
package
Altuna Akalın
Usage and
ubiquity of
genomic
interval
summaries
Using
genomation
More
information
Quick introduction
The genomation is an R package that expedites genomicinterval summary and annotation It has the following features
1 Annotation of genomic intervals eg see what of yourintervals overlap with exonintronpromoters
2 Summary of genomic scores or read coverages over pre-definedregions
eg extract the conservation profile over ChIP-seq binding sites
(equi-width regions) or CpG islands (nonequi-width regions)
3 Visualize genomic interval summaries as meta-region plots orheatmaps
4 Work with multiple file formatseg BAM BED bigWig GFF and generic tabular text files
containing chromosome location information
5 do all these in R )
genomation
package
Altuna Akalın
Usage and
ubiquity of
genomic
interval
summaries
Using
genomation
More
information
Quick introduction
The genomation is an R package that expedites genomicinterval summary and annotation It has the following features
1 Annotation of genomic intervals eg see what of yourintervals overlap with exonintronpromoters
2 Summary of genomic scores or read coverages over pre-definedregions
eg extract the conservation profile over ChIP-seq binding sites
(equi-width regions) or CpG islands (nonequi-width regions)
3 Visualize genomic interval summaries as meta-region plots orheatmaps
4 Work with multiple file formatseg BAM BED bigWig GFF and generic tabular text files
containing chromosome location information
5 do all these in R )
genomation
package
Altuna Akalın
Usage and
ubiquity of
genomic
interval
summaries
Using
genomation
More
information
Genomic interval summaries are widely used
Summaries of genomic intervals are one of the useful ways tocommunicate high-dimensional dataTraditionally regions of interest are picked and distribution ofgenomic intervals are summarized on those regions
genomation
package
Altuna Akalın
Usage and
ubiquity of
genomic
interval
summaries
Using
genomation
More
information
Genomic interval summaries are widely usedExamples from literature
Figure Erkek S et al (2013) Molecular determinants of nucleosomeretention at CpG-rich sequences in mouse spermatozoa Nature Structuralamp Molecular Biology
genomation
package
Altuna Akalın
Usage and
ubiquity of
genomic
interval
summaries
Using
genomation
More
information
Genomic interval summaries are widely usedExamples from literature
Figure Stadler M Murr R Burger L et al (2011) DNA-bindingfactors shape the mouse methylome at distal regulatory regions Nature
genomation
package
Altuna Akalın
Usage and
ubiquity of
genomic
interval
summaries
Using
genomation
More
information
Utility and futility of average profiles
Does this mean all of the windows (viewpoints) have a similarenrichment profile
minus100 0 50 100
35
40
45
50
average profile around anchor
bases
aver
age
scor
e
genomation
package
Altuna Akalın
Usage and
ubiquity of
genomic
interval
summaries
Using
genomation
More
information
Utility and futility of average profiles
Only 13 of windows have such enrichment Be careful when you areinterpreting the average profiles
05
2 1
0 1
6 2
1 minus100 0 50 100
genomation
package
Altuna Akalın
Usage and
ubiquity of
genomic
interval
summaries
Using
genomation
More
information
Genomic interval summaries are widely usedExamples from literature
Figure Lister R et al (2009) Human DNA methylomes at baseresolution show widespread epigenomic differences Nature
genomation
package
Altuna Akalın
Usage and
ubiquity of
genomic
interval
summaries
Using
genomation
More
information
Genomic interval summaries are widely usedExamples from literature
Figure Feng S et al (2010) Conservation and divergence ofmethylation patterning in plants and animals PNAS
genomation
package
Altuna Akalın
Usage and
ubiquity of
genomic
interval
summaries
Using
genomation
More
information
Issues to keep in mind when developing summarymethods
Genomic data comes in many formats we need a method that isable to work with multiple flat file formatsWe need a method that is not specialized on one type of dataset such as read counts it should also work on other scoringschemes(eg conservation scores) easilyRegions of interest are not always equi-width you should be ableto normalize for length differences by binningMultiple visualization options and fast heatmap generationshould be availableClustering of regions based on multiple summaries (egbinding for different TFs on the same set of regions) on theheatmapEase of use it should not take hours of coding to generate andvisualize summaries
genomation
package
Altuna Akalın
Usage and
ubiquity of
genomic
interval
summaries
Using
genomation
More
information
Overview of genomation features
BAMBigWigBEDGFFTab txtGRanges
BEDGFFTab txtGRanges
Summarize
Annotation
Genomic Intervals
Annotate
Visualize
Base-pairs bins1 2 3 4 n
ScoreMatrixScoreMatrixList object
region 1
region 2
region 3
region 4
region m
IntergenicIntronExonPromoter409
116
218257
iuml iuml 0 500 1000
00
02
04
06
08
10
base-pairs around anchor
read
per
milli
on
TF4TF3TF2TF1
iuml
iuml
0
500
100
0
0 05 1 15 2
TF 4
iuml
iuml
0
500
100
0
0 05 1 15 2 25
TF 3
iuml
iuml
0
500
100
0
0 05 1 15 2 25
TF 2
iuml
iuml
0
500
100
0
0 05 1 15 2 25
TF 1
iuml iuml 0 500 1000
base-pairs around anchor
TF1
TF2
TF3
TF4
007
20
340
60
861
1
meta-region plots meta-region heatmaps
heatmaps for genomic interval sets
Piecharts for annotation
genomation
package
Altuna Akalın
Usage and
ubiquity of
genomic
interval
summaries
Using
genomation
More
information
installation of the package and the example data
We can install the package and the data using install_github()
function from the devtools package
install dependencies
installpackages( c(datatableplyrreshape2ggplot2gridBasedevtools))
source(httpbioconductororgbiocLiteR)biocLite(c(GenomicRangesrtracklayerimputeRsamtools))
install the packages
library(devtools)install_github(genomation username = al2na)
install the data package
needed for examples
install_github(genomationData username = al2na)
genomation
package
Altuna Akalın
Usage and
ubiquity of
genomic
interval
summaries
Using
genomation
More
information
Data import
Various file formats can be used in genomation You can read inannotation or your genomic intervals of interest
library(genomation)tabfile1 lt- systemfile(extdatatab1bed package = genomation)readGeneric(tabfile1)
GRanges with 6 ranges and 0 metadata columns seqnames ranges strand ltRlegt ltIRangesgt ltRlegt [1] chr21 [9437272 9439473] [2] chr21 [9483485 9484663] [3] chr21 [9647866 9648116] [4] chr21 [9708935 9709231] [5] chr21 [9825442 9826296] [6] chr21 [9909011 9909218] --- seqlengths chr21 NA
genomation
package
Altuna Akalın
Usage and
ubiquity of
genomic
interval
summaries
Using
genomation
More
information
Extraction of data over pre-defined genomic regions
ScoreMatrix() and ScoreMatrixBin() are functions used to extractdata over predefined windows
ScoreMatrix is used when all of the windows have the samewidth (eg region around TSS)ScoreMatrixBin is designed for use with windows of unequalwidth (eg enrichment of methylation over exons)
data(cage)data(promoters)sm lt- ScoreMatrix(target = cage windows = promoters)sm
scoreMatrix with dims 1055 2001
genomation
package
Altuna Akalın
Usage and
ubiquity of
genomic
interval
summaries
Using
genomation
More
information
Visualizing ScoreMatrix summary of genomicinvervals over pre-defined regions
plotMeta()heatMeta() heatMatrix() and multiHeatMatrix()are the visualization functions
oldmar lt- par()$marpar(oma = c(0 0 0 0))heatMatrix(sm xcoords = c(-1000 1000))plotMeta(sm xcoords = c(-1000 1000)linecol=blue)par(oma = oldmar)
00
751
52
2 3
minus1000 minus500 0 500 1000 minus1000 minus500 0 500 1000
000
005
010
015
020
025
bases
aver
age
scor
e
genomation
package
Altuna Akalın
Usage and
ubiquity of
genomic
interval
summaries
Using
genomation
More
information
Working with BAM files
BAM files can also be used in ScoreMatrix() and ScoreMatrixBin()functions
bamfile = systemfile(teststestbam package=genomation)windows = GRanges(rep(c(12)each=2)
IRanges(rep(c(12) times=2) width=5))scores3 = ScoreMatrix(target=bamfilewindows=windows type=bam)
genomation
package
Altuna Akalın
Usage and
ubiquity of
genomic
interval
summaries
Using
genomation
More
information
Working with bigWig files
ScoreMatrix() and ScoreMatrixBin() are functions can handlebigWig files Here we use ENCODE DHS scores downloaded fromhttpgooglfEVu0g
mybed12file=systemfile(extdatachr21refseqhg19bedpackage = genomation)
feats=readTranscriptFeatures(mybed12fileupflank=500downflank=500)sm=ScoreMatrix(target=wgEncodeUwDnaseA549RawRep1bw
windows=feats$promoterstype=bigWigstrandaware=TRUE)plotMeta(smxcoords=c(-500500)main=DHS around TSSlinecol=blue)
minus400 0 200
46
810
14
DHS around TSS
bases
aver
age
scor
e
genomation
package
Altuna Akalın
Usage and
ubiquity of
genomic
interval
summaries
Using
genomation
More
information
Multiple profiles
Multiple heatmap profiles can be plotted using multiHeatMatrix()which takes in a ScoreMatrixList object Here we used CTCF P300 Suz12 Rad21 Znf143 BAM files from genomationData package
ctcfpeaks=readRDS(ctcfpeaksrds)dataPath = systemfile(extdata package = genomationData)bamfiles = listfiles(dataPath full= Tpattern = bam$)[c(146)]sml = ScoreMatrixList(bamfiles ctcfpeaks binnum = 50type = bam)names(sml)=c(CTCFP300Suz12Rad21Znf143)multiHeatMatrix(sml xcoords = c(-500 500)cexaxis=035commonscale = T
col = c(lightgray blue)winsorize=c(095))
minus50
0 minus
250
0
250
500
0 2 4 6 8
CTCF
minus50
0 minus
250
0
250
500
0 2 4 6 8
P300
minus50
0 minus
250
0
250
500
0 2 4 6 8
Suz12
minus50
0 minus
250
0
250
500
0 2 4 6 8
Rad21
minus50
0 minus
250
0
250
500
0 2 4 6 8
Znf143
genomation
package
Altuna Akalın
Usage and
ubiquity of
genomic
interval
summaries
Using
genomation
More
information
Multiple profiles
multiHeatMatrix() can also apply K-means clustering Extremevalues are trimmed using with ldquowinsorizerdquo argument
multiHeatMatrix(sml xcoords = c(-500 500)kmeans=TRUEk=3commonscale = Tcexaxis=04col = c(lightgray blue)winsorize=c(095))
1
2
3
minus50
0 minus
250
0
250
500
0 2 4 6 8
CTCF
minus50
0 minus
250
0
250
500
0 2 4 6 8
P300
minus50
0 minus
250
0
250
500
0 2 4 6 8
Suz12
minus50
0 minus
250
0
250
500
0 2 4 6 8
Rad21
minus50
0 minus
250
0
250
500
0 2 4 6 8
Znf143
genomation
package
Altuna Akalın
Usage and
ubiquity of
genomic
interval
summaries
Using
genomation
More
information
Multiple profiles
Multiple average profiles can be visualized with heatMeta() Herewe also apply a scaling function to all the matrices
take log2 of all matrices
sml2=scaleScoreMatrixList(smlscalefun=function(x) log2(x+1))heatMeta(sml2legendname=average profilesxcoords=c(-500 500)
xlab=bp around peaks)
minus400 minus200 0 200 400
bp around peaks
Znf143
Rad21
Suz12
P300
CTCF
021
061
11
41
8av
erag
e pr
ofile
s
genomation
package
Altuna Akalın
Usage and
ubiquity of
genomic
interval
summaries
Using
genomation
More
information
Multiple profiles
Multiple average profiles can also be visualized with plotMeta()
plotMeta(sml2profilenames=names(sml2)xcoords=c(-500 500)main=mult profiles)
minus400 minus200 0 200 400
05
10
15
mult profiles
bases
aver
age
scor
e
CTCFP300Suz12Rad21Znf143
genomation
package
Altuna Akalın
Usage and
ubiquity of
genomic
interval
summaries
Using
genomation
More
information
Future work
Explore overlap statistics between two genomic data sets DoesTF1 binding site locations overlap with TF2 sites more thanexpectedThis is previously explored with GenometriCorr package Thesefunctionality can be included in the form of a dependencyPerformance improvement on certain functions faster is alwaysbetter
genomation
package
Altuna Akalın
Usage and
ubiquity of
genomic
interval
summaries
Using
genomation
More
information
Further information
The genomation package is available athttpal2nagithubiogenomation You can find the linkto the vignette on the webpage as wellCode that generated this presentation is available athttpgithubcomal2nagenomation_presentation
Questions and bug reportsYou can viewopen issues in githubhttpsgithubcomal2nagenomationissuesstate=open
You can ask questions by sending an e-mail togenomationgooglegroupscom or using the web interface togoogle groups
Developed by Altuna Akalın and Vedran Franke
genomation
package
Altuna Akalın
Usage and
ubiquity of
genomic
interval
summaries
Using
genomation
More
information
Session Info
sessionInfo()
R version 302 (2013-09-25) Platform x86_64-apple-darwin1080 (64-bit) locale [1] C attached base packages [1] methods grid stats graphics grDevices utils datasets [8] base other attached packages [1] genomation_09902 knitr_15 loaded via a namespace (and not attached) [1] BSgenome_1300 BiocGenerics_080 Biostrings_2300 [4] GenomicRanges_1143 IRanges_1205 MASS_73-29 [7] RColorBrewer_10-5 RCurl_195-41 Rsamtools_1141 [10] XML_395-02 XVector_020 bitops_10-6 [13] colorspace_12-4 datatable_1810 dichromat_20-0 [16] digest_063 evaluate_051 formatR_010 [19] ggplot2_0931 gridBase_04-6 gtable_012 [22] highr_03 impute_1360 labeling_02 [25] munsell_042 parallel_302 plyr_18 [28] proto_03-10 reshape2_122 rtracklayer_1220 [31] scales_023 stats4_302 stringr_062 [34] tools_302 zlibbioc_180
genomation
package
Altuna Akalın
Usage and
ubiquity of
genomic
interval
summaries
Using
genomation
More
information
Quick introduction
genomation is an R package that expedites genomic interval summary and annotation. It has the following features:
1. Annotation of genomic intervals, e.g. see what % of your intervals overlap with exon/intron/promoters.
2. Summary of genomic scores or read coverages over pre-defined regions, e.g. extract the conservation profile over ChIP-seq binding sites (equi-width regions) or CpG islands (non-equi-width regions).
3. Visualize genomic interval summaries as meta-region plots or heatmaps.
4. Work with multiple file formats, e.g. BAM, BED, bigWig, GFF and generic tabular text files containing chromosome location information.
5. Do all these in R :)
Genomic interval summaries are widely used
- Summaries of genomic intervals are one of the useful ways to communicate high-dimensional data.
- Traditionally, regions of interest are picked and the distribution of genomic intervals is summarized on those regions.
Genomic interval summaries are widely used: examples from literature
Figure: Erkek S, et al. (2013) Molecular determinants of nucleosome retention at CpG-rich sequences in mouse spermatozoa. Nature Structural & Molecular Biology.
Genomic interval summaries are widely used: examples from literature
Figure: Stadler M, Murr R, Burger L, et al. (2011) DNA-binding factors shape the mouse methylome at distal regulatory regions. Nature.
Utility and futility of average profiles
Does this mean all of the windows (viewpoints) have a similar enrichment profile?
Figure: average profile around anchor (x-axis: bases, y-axis: average score)
Utility and futility of average profiles
Only 13 of windows have such enrichment. Be careful when you are interpreting the average profiles.
Figure: heatmap around the anchor (x-axis: -100 to 100)
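The warning about average profiles can be reproduced with simulated data. The sketch below uses only base R; the number of windows, the score scale, the bump shape and the enriched fraction (one third here) are all assumptions made for illustration, not values taken from the presentation's data.

```r
set.seed(42)
n_windows <- 300
positions <- seq(-100, 100)             # bases around the anchor
n_bases   <- length(positions)

# Background signal for every window (hypothetical score scale).
scores <- matrix(rnorm(n_windows * n_bases, mean = 3.5, sd = 0.5),
                 nrow = n_windows, ncol = n_bases)

# Add a bump centred on the anchor to a subset of the windows only.
enriched <- seq_len(n_windows / 3)
bump <- 4 * exp(-(positions / 20)^2)
scores[enriched, ] <- sweep(scores[enriched, , drop = FALSE], 2, bump, "+")

# The average profile shows a clear peak at the anchor ...
avg_profile <- colMeans(scores)

# ... although most individual windows have no peak at all.
center <- which(positions == 0)
flank  <- which(abs(positions) >= 80)
has_peak <- scores[, center] - rowMeans(scores[, flank]) > 2
frac_with_peak <- mean(has_peak)
```

Plotting avg_profile next to a heatmap of scores makes the point visually: the peak in the mean is driven entirely by the enriched subset.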
Genomic interval summaries are widely used: examples from literature
Figure: Lister R, et al. (2009) Human DNA methylomes at base resolution show widespread epigenomic differences. Nature.
Genomic interval summaries are widely used: examples from literature
Figure: Feng S, et al. (2010) Conservation and divergence of methylation patterning in plants and animals. PNAS.
Issues to keep in mind when developing summary methods
- Genomic data comes in many formats: we need a method that is able to work with multiple flat file formats.
- We need a method that is not specialized on one type of data set such as read counts; it should also work easily on other scoring schemes (e.g. conservation scores).
- Regions of interest are not always equi-width: you should be able to normalize for length differences by binning.
- Multiple visualization options and fast heatmap generation should be available.
- Clustering of regions based on multiple summaries (e.g. binding for different TFs on the same set of regions) on the heatmap.
- Ease of use: it should not take hours of coding to generate and visualize summaries.
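Normalizing for length differences by binning can be illustrated in base R. This is a minimal sketch of the idea only, not genomation's implementation; the helper bin_scores and the toy regions are hypothetical.

```r
# Summarise a per-base score vector into a fixed number of bins by averaging.
bin_scores <- function(scores, bin_num = 10) {
  # Assign each base to one of bin_num equal-length position intervals.
  bins <- cut(seq_along(scores), breaks = bin_num, labels = FALSE)
  as.numeric(tapply(scores, bins, mean))
}

# Three hypothetical regions of different widths: low first half, high second.
regions <- list(
  short  = rep(c(0, 1), each = 10),     # 20 bases
  medium = rep(c(0, 1), each = 50),     # 100 bases
  long   = rep(c(0, 1), each = 250)     # 500 bases
)

# Every region becomes one row with the same number of columns.
binned <- t(sapply(regions, bin_scores, bin_num = 10))
dim(binned)
```

After binning, regions of 20, 100 and 500 bases share the same 10 columns, so they can be stacked in one matrix and averaged or drawn as a heatmap.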
Overview of genomation features
Figure: overview diagram. Input boxes labelled "Genomic Intervals" and "Annotation" list the accepted formats (BAM, BigWig, BED, GFF, tab txt, GRanges and BED, GFF, tab txt, GRanges). The steps are Summarize, Annotate and Visualize. Summaries are stored in a ScoreMatrix/ScoreMatrixList object (rows: region 1 to region m; columns: base-pairs/bins 1 to n). Outputs shown: meta-region plots, meta-region heatmaps, heatmaps for genomic interval sets (TF1 to TF4, base-pairs around anchor) and pie charts for annotation (Intergenic, Intron, Exon, Promoter).
Installation of the package and the example data
We can install the package and the data using the install_github() function from the devtools package.
# install dependencies
install.packages(c("data.table", "plyr", "reshape2", "ggplot2", "gridBase", "devtools"))
source("http://bioconductor.org/biocLite.R")
biocLite(c("GenomicRanges", "rtracklayer", "impute", "Rsamtools"))
# install the packages
library(devtools)
install_github("genomation", username = "al2na")
# install the data package
# needed for examples
install_github("genomationData", username = "al2na")
Data import
Various file formats can be used in genomation. You can read in annotation or your genomic intervals of interest.
library(genomation)
tabfile1 <- system.file("extdata/tab1.bed", package = "genomation")
readGeneric(tabfile1)
GRanges with 6 ranges and 0 metadata columns:
      seqnames             ranges strand
         <Rle>          <IRanges>  <Rle>
  [1]    chr21 [9437272, 9439473]      *
  [2]    chr21 [9483485, 9484663]      *
  [3]    chr21 [9647866, 9648116]      *
  [4]    chr21 [9708935, 9709231]      *
  [5]    chr21 [9825442, 9826296]      *
  [6]    chr21 [9909011, 9909218]      *
  ---
  seqlengths:
   chr21
      NA
Extraction of data over pre-defined genomic regions
ScoreMatrix() and ScoreMatrixBin() are functions used to extract data over predefined windows.
- ScoreMatrix is used when all of the windows have the same width (e.g. region around TSS).
- ScoreMatrixBin is designed for use with windows of unequal width (e.g. enrichment of methylation over exons).
data(cage)
data(promoters)
sm <- ScoreMatrix(target = cage, windows = promoters)
sm
scoreMatrix with dims: 1055 2001
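The reported dimensions (1055 2001) correspond to one row per window and one column per base. The base R sketch below shows that layout for equal-width windows. It is a conceptual illustration with made-up scores and window positions, not how ScoreMatrix() is implemented.

```r
# Per-base scores along one hypothetical chromosome of 1000 bases.
chr_scores <- numeric(1000)
chr_scores[c(100, 300, 305)] <- c(5, 2, 7)

# Equal-width windows, given by their start positions (width 21).
win_start <- c(90, 290, 600)
win_width <- 21

# One row per window, one column per base within the window.
score_mat <- t(vapply(win_start, function(s) {
  chr_scores[s:(s + win_width - 1)]
}, numeric(win_width)))

dim(score_mat)
```

Column means of such a matrix give the average profile, and the matrix itself is what a heatmap displays.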
Visualizing ScoreMatrix: summary of genomic intervals over pre-defined regions
plotMeta(), heatMeta(), heatMatrix() and multiHeatMatrix() are the visualization functions.
# save the outer margins so they can be restored afterwards
oldoma <- par()$oma
par(oma = c(0, 0, 0, 0))
heatMatrix(sm, xcoords = c(-1000, 1000))
plotMeta(sm, xcoords = c(-1000, 1000), line.col = "blue")
par(oma = oldoma)
Figure: heatmap and meta-region plot of sm (x-axis: bases from -1000 to 1000; y-axis of the meta-region plot: average score)
Working with BAM files
BAM files can also be used in the ScoreMatrix() and ScoreMatrixBin() functions.
bamfile = system.file("tests/test.bam", package = "genomation")
windows = GRanges(rep(c(1, 2), each = 2),
                  IRanges(rep(c(1, 2), times = 2), width = 5))
scores3 = ScoreMatrix(target = bamfile, windows = windows, type = "bam")
Working with bigWig files
ScoreMatrix() and ScoreMatrixBin() can also handle bigWig files. Here we use ENCODE DHS scores downloaded from http://goo.gl/fEVu0g
mybed12file = system.file("extdata/chr21.refseq.hg19.bed", package = "genomation")
feats = readTranscriptFeatures(mybed12file, up.flank = 500, down.flank = 500)
sm = ScoreMatrix(target = "wgEncodeUwDnaseA549RawRep1.bw",
                 windows = feats$promoters, type = "bigWig", strand.aware = TRUE)
plotMeta(sm, xcoords = c(-500, 500), main = "DHS around TSS", line.col = "blue")
Figure: DHS around TSS (x-axis: bases, y-axis: average score)
Multiple profiles
Multiple heatmap profiles can be plotted using multiHeatMatrix(), which takes in a ScoreMatrixList object. Here we used CTCF, P300, Suz12, Rad21 and Znf143 BAM files from the genomationData package.
ctcfpeaks = readRDS("ctcfpeaks.rds")
dataPath = system.file("extdata", package = "genomationData")
# keep five of the BAM files, one per name assigned below
bamfiles = list.files(dataPath, full.names = TRUE, pattern = "bam$")[c(1:4, 6)]
sml = ScoreMatrixList(bamfiles, ctcfpeaks, bin.num = 50, type = "bam")
names(sml) = c("CTCF", "P300", "Suz12", "Rad21", "Znf143")
multiHeatMatrix(sml, xcoords = c(-500, 500), cex.axis = 0.35, common.scale = TRUE,
                col = c("lightgray", "blue"), winsorize = c(0, 95))
Figure: heatmaps for CTCF, P300, Suz12, Rad21 and Znf143 (x-axis: -500 to 500, common color scale from 0 to 8)
Multiple profiles
multiHeatMatrix() can also apply K-means clustering. Extreme values are trimmed using the "winsorize" argument.
multiHeatMatrix(sml, xcoords = c(-500, 500), kmeans = TRUE, k = 3, common.scale = TRUE,
                cex.axis = 0.4, col = c("lightgray", "blue"), winsorize = c(0, 95))
Figure: heatmaps for CTCF, P300, Suz12, Rad21 and Znf143 with rows grouped into clusters 1, 2 and 3 (x-axis: -500 to 500, common color scale from 0 to 8)
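What trimming extreme values and K-means clustering do to a score matrix can be sketched in base R. The winsorize() helper below is hypothetical and only mimics the idea of capping values at percentiles such as c(0, 95); it is not the code used inside multiHeatMatrix(), and the toy matrix is simulated.

```r
# Cap values outside the given lower/upper percentiles (0-100 scale).
winsorize <- function(mat, probs = c(0, 95)) {
  limits <- quantile(mat, probs = probs / 100, na.rm = TRUE)
  mat[mat < limits[1]] <- limits[1]
  mat[mat > limits[2]] <- limits[2]
  mat
}

set.seed(7)
# Toy matrix: 60 regions x 20 bins, in three groups with different levels.
mat <- rbind(
  matrix(rpois(20 * 20, lambda = 1),  nrow = 20),
  matrix(rpois(20 * 20, lambda = 10), nrow = 20),
  matrix(rpois(20 * 20, lambda = 30), nrow = 20)
)
mat[1, 1] <- 1000                      # one extreme outlier

trimmed <- winsorize(mat, probs = c(0, 95))

# Cluster rows into k = 3 groups and order the rows by cluster,
# which is the row order a clustered heatmap would display.
km <- kmeans(trimmed, centers = 3, nstart = 10)
ordered <- trimmed[order(km$cluster), ]
```

Without trimming, the single outlier would stretch the color scale and hide the differences between the three groups.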
Multiple profiles
Multiple average profiles can be visualized with heatMeta(). Here we also apply a scaling function to all the matrices.
# take log2 of all matrices
sml2 = scaleScoreMatrixList(sml, scalefun = function(x) log2(x + 1))
heatMeta(sml2, legend.name = "average profiles", xcoords = c(-500, 500),
         xlab = "bp around peaks")
Figure: heatmap of average profiles for Znf143, Rad21, Suz12, P300 and CTCF (x-axis: bp around peaks, legend: average profiles)
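The effect of the log2(x + 1) scaling on average profiles can be checked by hand on toy data. This base R sketch uses a plain list of matrices named toy_sml (a hypothetical stand-in for a ScoreMatrixList) and does not call genomation.

```r
# Two toy score matrices (regions x bins) on very different scales.
toy_sml <- list(
  low  = matrix(c(0, 1, 3, 1, 0,
                  0, 1, 3, 1, 0), nrow = 2, byrow = TRUE),
  high = matrix(c(0, 15, 255, 15, 0,
                  0, 15, 255, 15, 0), nrow = 2, byrow = TRUE)
)

# Apply the same scaling function to every matrix in the list.
scale_fun  <- function(x) log2(x + 1)
toy_scaled <- lapply(toy_sml, scale_fun)

# One average profile per matrix; stacked, they form the rows of a
# profile heatmap.
avg_profiles <- t(sapply(toy_scaled, colMeans))
avg_profiles
```

On the raw scale the "high" profile would dominate a shared color scale; after the log transform both profiles stay visible.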
Multiple profiles
Multiple average profiles can also be visualized with plotMeta().
plotMeta(sml2, profile.names = names(sml2), xcoords = c(-500, 500), main = "mult profiles")
Figure: mult profiles (x-axis: bases, y-axis: average score; one line each for CTCF, P300, Suz12, Rad21 and Znf143)
genomation
package
Altuna Akalın
Usage and
ubiquity of
genomic
interval
summaries
Using
genomation
More
information
Quick introduction
The genomation is an R package that expedites genomicinterval summary and annotation It has the following features
1 Annotation of genomic intervals eg see what of yourintervals overlap with exonintronpromoters
2 Summary of genomic scores or read coverages over pre-definedregions
eg extract the conservation profile over ChIP-seq binding sites
(equi-width regions) or CpG islands (nonequi-width regions)
3 Visualize genomic interval summaries as meta-region plots orheatmaps
4 Work with multiple file formatseg BAM BED bigWig GFF and generic tabular text files
containing chromosome location information
5 do all these in R )
genomation
package
Altuna Akalın
Usage and
ubiquity of
genomic
interval
summaries
Using
genomation
More
information
Quick introduction
The genomation is an R package that expedites genomicinterval summary and annotation It has the following features
1 Annotation of genomic intervals eg see what of yourintervals overlap with exonintronpromoters
2 Summary of genomic scores or read coverages over pre-definedregions
eg extract the conservation profile over ChIP-seq binding sites
(equi-width regions) or CpG islands (nonequi-width regions)
3 Visualize genomic interval summaries as meta-region plots orheatmaps
4 Work with multiple file formatseg BAM BED bigWig GFF and generic tabular text files
containing chromosome location information
5 do all these in R )
genomation
package
Altuna Akalın
Usage and
ubiquity of
genomic
interval
summaries
Using
genomation
More
information
Genomic interval summaries are widely used
Summaries of genomic intervals are one of the useful ways tocommunicate high-dimensional dataTraditionally regions of interest are picked and distribution ofgenomic intervals are summarized on those regions
genomation
package
Altuna Akalın
Usage and
ubiquity of
genomic
interval
summaries
Using
genomation
More
information
Genomic interval summaries are widely usedExamples from literature
Figure Erkek S et al (2013) Molecular determinants of nucleosomeretention at CpG-rich sequences in mouse spermatozoa Nature Structuralamp Molecular Biology
genomation
package
Altuna Akalın
Usage and
ubiquity of
genomic
interval
summaries
Using
genomation
More
information
Genomic interval summaries are widely usedExamples from literature
Figure Stadler M Murr R Burger L et al (2011) DNA-bindingfactors shape the mouse methylome at distal regulatory regions Nature
genomation
package
Altuna Akalın
Usage and
ubiquity of
genomic
interval
summaries
Using
genomation
More
information
Utility and futility of average profiles
Does this mean all of the windows (viewpoints) have a similarenrichment profile
minus100 0 50 100
35
40
45
50
average profile around anchor
bases
aver
age
scor
e
genomation
package
Altuna Akalın
Usage and
ubiquity of
genomic
interval
summaries
Using
genomation
More
information
Utility and futility of average profiles
Only 13 of windows have such enrichment Be careful when you areinterpreting the average profiles
05
2 1
0 1
6 2
1 minus100 0 50 100
genomation
package
Altuna Akalın
Usage and
ubiquity of
genomic
interval
summaries
Using
genomation
More
information
Genomic interval summaries are widely usedExamples from literature
Figure Lister R et al (2009) Human DNA methylomes at baseresolution show widespread epigenomic differences Nature
genomation
package
Altuna Akalın
Usage and
ubiquity of
genomic
interval
summaries
Using
genomation
More
information
Genomic interval summaries are widely usedExamples from literature
Figure Feng S et al (2010) Conservation and divergence ofmethylation patterning in plants and animals PNAS
Issues to keep in mind when developing summary methods
Genomic data comes in many formats: we need a method that is able to work with multiple flat file formats.
We need a method that is not specialized on one type of data set, such as read counts; it should also work easily with other scoring schemes (e.g. conservation scores).
Regions of interest are not always equi-width: you should be able to normalize for length differences by binning.
Multiple visualization options and fast heatmap generation should be available.
Clustering of regions based on multiple summaries (e.g. binding for different TFs on the same set of regions) should be possible on the heatmap.
Ease of use: it should not take hours of coding to generate and visualize summaries.
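Normalizing for length differences by binning can be sketched in a few lines of base R. This is only an illustration of the idea, not how genomation implements it; `bin_scores()` and the toy `regions` list are hypothetical names introduced here.

```r
# Summarise a per-base score vector into a fixed number of bins
bin_scores <- function(scores, n_bins = 10) {
  stopifnot(length(scores) >= n_bins)
  # assign each base to one of n_bins roughly equal-sized bins
  bin_id <- cut(seq_along(scores), breaks = n_bins, labels = FALSE)
  # average the scores that fall into each bin
  as.numeric(tapply(scores, bin_id, mean))
}

set.seed(42)
# three hypothetical regions of very different widths
regions <- list(short = runif(120), medium = runif(450), long = runif(2000))

# every region is reduced to 10 values, so the rows are comparable
binned <- t(sapply(regions, bin_scores, n_bins = 10))
dim(binned)
```

After binning, regions of 120, 450 and 2000 bases all occupy one row of the same width, so they can be stacked into a single matrix and plotted as a heatmap or averaged into a meta-region profile.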
Overview of genomation features
[Figure: workflow overview.
Inputs: genomic intervals (BAM, BigWig, BED, GFF, tabular text, GRanges) and annotation (BED, GFF, tabular text, GRanges).
Steps: Summarize, Annotate, Visualize.
Summaries are stored in a ScoreMatrix/ScoreMatrixList object (rows: regions 1 to m; columns: base-pairs or bins 1 to n).
Outputs: meta-region plots, meta-region heatmaps, heatmaps for genomic interval sets (TF1 to TF4, base-pairs around anchor, reads per million), and pie charts for annotation (intergenic, intron, exon, promoter).]
Installation of the package and the example data
We can install the package and the data using the install_github() function from the devtools package.
    # install dependencies
    install.packages(c("data.table", "plyr", "reshape2", "ggplot2", "gridBase", "devtools"))
    source("http://bioconductor.org/biocLite.R")
    biocLite(c("GenomicRanges", "rtracklayer", "impute", "Rsamtools"))
    # install the packages
    library(devtools)
    install_github("genomation", username = "al2na")
    # install the data package
    # needed for examples
    install_github("genomationData", username = "al2na")
Data import
Various file formats can be used in genomation. You can read in annotation or your genomic intervals of interest.
    library(genomation)
    tabfile1 <- system.file("extdata/tab1.bed", package = "genomation")
    readGeneric(tabfile1)
    ## GRanges with 6 ranges and 0 metadata columns:
    ##       seqnames             ranges strand
    ##          <Rle>          <IRanges>  <Rle>
    ##   [1]    chr21 [9437272, 9439473]      *
    ##   [2]    chr21 [9483485, 9484663]      *
    ##   [3]    chr21 [9647866, 9648116]      *
    ##   [4]    chr21 [9708935, 9709231]      *
    ##   [5]    chr21 [9825442, 9826296]      *
    ##   [6]    chr21 [9909011, 9909218]      *
    ##   ---
    ##   seqlengths:
    ##    chr21
    ##       NA
Extraction of data over pre-defined genomic regions
ScoreMatrix() and ScoreMatrixBin() are functions used to extract data over predefined windows.
ScoreMatrix is used when all of the windows have the same width (e.g. region around TSS).
ScoreMatrixBin is designed for use with windows of unequal width (e.g. enrichment of methylation over exons).
    data(cage)
    data(promoters)
    sm <- ScoreMatrix(target = cage, windows = promoters)
    sm
    ## scoreMatrix with dims: 1055 2001
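The dimensions reported above read as windows by positions: one row per window and one column per base inside it. A toy version of that layout can be built in base R. The sketch below is not the genomation implementation; `chrom_scores`, `win_start` and `win_width` are hypothetical names and the data are simulated.

```r
# per-base scores along a toy chromosome of 1000 bases (hypothetical data)
set.seed(7)
chrom_scores <- rpois(1000, lambda = 2)

# three equal-width windows, given by their (1-based) start positions
win_start <- c(101, 400, 750)
win_width <- 51

# one row per window, one column per base within the window
score_mat <- t(sapply(win_start,
                      function(s) chrom_scores[s:(s + win_width - 1)]))
dim(score_mat)
```

Because every window has the same width, the rows line up column by column, and `colMeans(score_mat)` is the average profile that a meta-region plot displays.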
Visualizing ScoreMatrix: summary of genomic intervals over pre-defined regions
plotMeta(), heatMeta(), heatMatrix() and multiHeatMatrix() are the visualization functions.
    # set the outer margins to zero, remembering the previous setting
    oldpar <- par(oma = c(0, 0, 0, 0))
    heatMatrix(sm, xcoords = c(-1000, 1000))
    plotMeta(sm, xcoords = c(-1000, 1000), line.col = "blue")
    # restore the previous outer margins
    par(oldpar)
[Figure: heatmap and meta-region plot of the ScoreMatrix; x-axis: bases (-1000 to 1000); y-axis of the meta-region plot: average score]
Working with BAM files
BAM files can also be used in the ScoreMatrix() and ScoreMatrixBin() functions.
    bamfile = system.file("tests/test.bam", package = "genomation")
    windows = GRanges(rep(c(1, 2), each = 2),
                      IRanges(rep(c(1, 2), times = 2), width = 5))
    scores3 = ScoreMatrix(target = bamfile, windows = windows, type = "bam")
Working with bigWig files
ScoreMatrix() and ScoreMatrixBin() can also handle bigWig files. Here we use ENCODE DHS scores downloaded from http://goo.gl/fEVu0g
    mybed12file = system.file("extdata/chr21.refseq.hg19.bed", package = "genomation")
    feats = readTranscriptFeatures(mybed12file, up.flank = 500, down.flank = 500)
    sm = ScoreMatrix(target = "wgEncodeUwDnaseA549RawRep1.bw",
                     windows = feats$promoters, type = "bigWig", strand.aware = TRUE)
    plotMeta(sm, xcoords = c(-500, 500), main = "DHS around TSS", line.col = "blue")
[Figure: DHS around TSS; x-axis: bases; y-axis: average score]
Multiple profiles
Multiple heatmap profiles can be plotted using multiHeatMatrix(), which takes a ScoreMatrixList object. Here we use the CTCF, P300, Suz12, Rad21 and Znf143 BAM files from the genomationData package.
    ctcfpeaks = readRDS("ctcfpeaks.rds")
    dataPath = system.file("extdata", package = "genomationData")
    bamfiles = list.files(dataPath, full.names = TRUE, pattern = "bam$")[c(1:4, 6)]
    sml = ScoreMatrixList(bamfiles, ctcfpeaks, bin.num = 50, type = "bam")
    names(sml) = c("CTCF", "P300", "Suz12", "Rad21", "Znf143")
    multiHeatMatrix(sml, xcoords = c(-500, 500), cex.axis = 0.35, common.scale = TRUE,
                    col = c("lightgray", "blue"), winsorize = c(0, 95))
[Figure: heatmaps for CTCF, P300, Suz12, Rad21 and Znf143; x-axis: -500 to 500; common colour scale: 0 to 8]
Multiple profiles
multiHeatMatrix() can also apply K-means clustering. Extreme values are trimmed using the "winsorize" argument.
    multiHeatMatrix(sml, xcoords = c(-500, 500), kmeans = TRUE, k = 3, common.scale = TRUE,
                    cex.axis = 0.4, col = c("lightgray", "blue"), winsorize = c(0, 95))
[Figure: heatmaps for CTCF, P300, Suz12, Rad21 and Znf143 with rows grouped into clusters 1, 2 and 3; x-axis: -500 to 500; common colour scale: 0 to 8]
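What trimming extreme values and clustering rows mean can be shown on a plain matrix. The sketch below uses only base R and stats::kmeans() on simulated data; `winsorize_matrix()` is a hypothetical helper written for this example and is not claimed to match what genomation does internally.

```r
# Clip values outside the given lower/upper percentiles
winsorize_matrix <- function(mat, probs = c(0, 95)) {
  lims <- quantile(mat, probs = probs / 100, na.rm = TRUE)
  mat[mat < lims[1]] <- lims[1]
  mat[mat > lims[2]] <- lims[2]
  mat
}

set.seed(3)
# hypothetical score matrix: 90 windows x 50 bins, three groups of rows
mat <- rbind(matrix(rexp(30 * 50, rate = 1),    nrow = 30),
             matrix(rexp(30 * 50, rate = 0.2),  nrow = 30),
             matrix(rexp(30 * 50, rate = 0.05), nrow = 30))

# values above the 95th percentile are set to the 95th percentile
mat_w <- winsorize_matrix(mat, probs = c(0, 95))

# cluster the rows (windows) and reorder them by cluster for plotting
cl <- kmeans(mat_w, centers = 3, nstart = 10)
ordered <- mat_w[order(cl$cluster), ]
```

Clipping first keeps a handful of very large values from dominating both the colour scale and the distances that k-means works with.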
Multiple profiles
Multiple average profiles can be visualized with heatMeta(). Here we also apply a scaling function to all the matrices.
    # take log2 of all matrices
    sml2 = scaleScoreMatrixList(sml, scalefun = function(x) log2(x + 1))
    heatMeta(sml2, legend.name = "average profiles", xcoords = c(-500, 500),
             xlab = "bp around peaks")
[Figure: heatmap of average profiles, one row each for CTCF, P300, Suz12, Rad21 and Znf143; x-axis: bp around peaks; legend: average profiles]
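The same two steps, scaling every matrix and then reducing each one to a single average profile, look like this on plain matrices. This is a base-R sketch with simulated counts; `score_list` and its element names `A` and `B` are made up for illustration.

```r
set.seed(11)
# two hypothetical score matrices (20 windows x 10 bins) on different scales
score_list <- list(
  A = matrix(rpois(20 * 10, lambda = 5),  nrow = 20),
  B = matrix(rpois(20 * 10, lambda = 50), nrow = 20)
)

# apply the same scaling function to every matrix; +1 avoids log2(0)
scaled <- lapply(score_list, function(m) log2(m + 1))

# one average profile per matrix, stacked as rows of a small matrix
profiles <- t(sapply(scaled, colMeans))
dim(profiles)
```

Each row of `profiles` corresponds to one data set, which is the layout a heatmap of average profiles displays.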
Multiple profiles
Multiple average profiles can also be visualized with plotMeta().
    plotMeta(sml2, profile.names = names(sml2), xcoords = c(-500, 500), main = "mult profiles")
[Figure: mult profiles; overlaid average profiles for CTCF, P300, Suz12, Rad21 and Znf143; x-axis: bases; y-axis: average score]
Future work
Explore overlap statistics between two genomic data sets: do TF1 binding site locations overlap with TF2 sites more than expected?
This was previously explored in the GenometriCorr package. This functionality can be included in the form of a dependency.
Performance improvement on certain functions: faster is always better.
Further information
The genomation package is available at http://al2na.github.io/genomation. You can find the link to the vignette on the webpage as well.
Code that generated this presentation is available at http://github.com/al2na/genomation_presentation
Questions and bug reports: you can view/open issues on GitHub at https://github.com/al2na/genomation/issues?state=open
You can ask questions by sending an e-mail to genomation@googlegroups.com or using the web interface to Google Groups.
Developed by Altuna Akalın and Vedran Franke
Session Info
    sessionInfo()
    ## R version 3.0.2 (2013-09-25)
    ## Platform: x86_64-apple-darwin10.8.0 (64-bit)
    ##
    ## locale:
    ## [1] C
    ##
    ## attached base packages:
    ## [1] methods   grid      stats     graphics  grDevices utils     datasets
    ## [8] base
    ##
    ## other attached packages:
    ## [1] genomation_0.99.0.2 knitr_1.5
    ##
    ## loaded via a namespace (and not attached):
    ##  [1] BSgenome_1.30.0      BiocGenerics_0.8.0   Biostrings_2.30.0
    ##  [4] GenomicRanges_1.14.3 IRanges_1.20.5       MASS_7.3-29
    ##  [7] RColorBrewer_1.0-5   RCurl_1.95-4.1       Rsamtools_1.14.1
    ## [10] XML_3.95-0.2         XVector_0.2.0        bitops_1.0-6
    ## [13] colorspace_1.2-4     data.table_1.8.10    dichromat_2.0-0
    ## [16] digest_0.6.3         evaluate_0.5.1       formatR_0.10
    ## [19] ggplot2_0.9.3.1      gridBase_0.4-6       gtable_0.1.2
    ## [22] highr_0.3            impute_1.36.0        labeling_0.2
    ## [25] munsell_0.4.2        parallel_3.0.2       plyr_1.8
    ## [28] proto_0.3-10         reshape2_1.2.2       rtracklayer_1.22.0
    ## [31] scales_0.2.3         stats4_3.0.2         stringr_0.6.2
    ## [34] tools_3.0.2          zlibbioc_1.8.0
genomation
package
Altuna Akalın
Usage and
ubiquity of
genomic
interval
summaries
Using
genomation
More
information
Genomic interval summaries are widely used
Summaries of genomic intervals are one of the useful ways tocommunicate high-dimensional dataTraditionally regions of interest are picked and distribution ofgenomic intervals are summarized on those regions
genomation
package
Altuna Akalın
Usage and
ubiquity of
genomic
interval
summaries
Using
genomation
More
information
Genomic interval summaries are widely usedExamples from literature
Figure Erkek S et al (2013) Molecular determinants of nucleosomeretention at CpG-rich sequences in mouse spermatozoa Nature Structuralamp Molecular Biology
genomation
package
Altuna Akalın
Usage and
ubiquity of
genomic
interval
summaries
Using
genomation
More
information
Genomic interval summaries are widely usedExamples from literature
Figure Stadler M Murr R Burger L et al (2011) DNA-bindingfactors shape the mouse methylome at distal regulatory regions Nature
genomation
package
Altuna Akalın
Usage and
ubiquity of
genomic
interval
summaries
Using
genomation
More
information
Utility and futility of average profiles
Does this mean all of the windows (viewpoints) have a similarenrichment profile
minus100 0 50 100
35
40
45
50
average profile around anchor
bases
aver
age
scor
e
genomation
package
Altuna Akalın
Usage and
ubiquity of
genomic
interval
summaries
Using
genomation
More
information
Utility and futility of average profiles
Only 13 of windows have such enrichment Be careful when you areinterpreting the average profiles
05
2 1
0 1
6 2
1 minus100 0 50 100
genomation
package
Altuna Akalın
Usage and
ubiquity of
genomic
interval
summaries
Using
genomation
More
information
Genomic interval summaries are widely usedExamples from literature
Figure Lister R et al (2009) Human DNA methylomes at baseresolution show widespread epigenomic differences Nature
genomation
package
Altuna Akalın
Usage and
ubiquity of
genomic
interval
summaries
Using
genomation
More
information
Genomic interval summaries are widely usedExamples from literature
Figure Feng S et al (2010) Conservation and divergence ofmethylation patterning in plants and animals PNAS
genomation
package
Altuna Akalın
Usage and
ubiquity of
genomic
interval
summaries
Using
genomation
More
information
Issues to keep in mind when developing summarymethods
Genomic data comes in many formats we need a method that isable to work with multiple flat file formatsWe need a method that is not specialized on one type of dataset such as read counts it should also work on other scoringschemes(eg conservation scores) easilyRegions of interest are not always equi-width you should be ableto normalize for length differences by binningMultiple visualization options and fast heatmap generationshould be availableClustering of regions based on multiple summaries (egbinding for different TFs on the same set of regions) on theheatmapEase of use it should not take hours of coding to generate andvisualize summaries
genomation
package
Altuna Akalın
Usage and
ubiquity of
genomic
interval
summaries
Using
genomation
More
information
Overview of genomation features
BAMBigWigBEDGFFTab txtGRanges
BEDGFFTab txtGRanges
Summarize
Annotation
Genomic Intervals
Annotate
Visualize
Base-pairs bins1 2 3 4 n
ScoreMatrixScoreMatrixList object
region 1
region 2
region 3
region 4
region m
IntergenicIntronExonPromoter409
116
218257
iuml iuml 0 500 1000
00
02
04
06
08
10
base-pairs around anchor
read
per
milli
on
TF4TF3TF2TF1
iuml
iuml
0
500
100
0
0 05 1 15 2
TF 4
iuml
iuml
0
500
100
0
0 05 1 15 2 25
TF 3
iuml
iuml
0
500
100
0
0 05 1 15 2 25
TF 2
iuml
iuml
0
500
100
0
0 05 1 15 2 25
TF 1
iuml iuml 0 500 1000
base-pairs around anchor
TF1
TF2
TF3
TF4
007
20
340
60
861
1
meta-region plots meta-region heatmaps
heatmaps for genomic interval sets
Piecharts for annotation
genomation
package
Altuna Akalın
Usage and
ubiquity of
genomic
interval
summaries
Using
genomation
More
information
installation of the package and the example data
We can install the package and the data using install_github()
function from the devtools package
install dependencies
installpackages( c(datatableplyrreshape2ggplot2gridBasedevtools))
source(httpbioconductororgbiocLiteR)biocLite(c(GenomicRangesrtracklayerimputeRsamtools))
install the packages
library(devtools)install_github(genomation username = al2na)
install the data package
needed for examples
install_github(genomationData username = al2na)
genomation
package
Altuna Akalın
Usage and
ubiquity of
genomic
interval
summaries
Using
genomation
More
information
Data import
Various file formats can be used in genomation You can read inannotation or your genomic intervals of interest
library(genomation)tabfile1 lt- systemfile(extdatatab1bed package = genomation)readGeneric(tabfile1)
GRanges with 6 ranges and 0 metadata columns seqnames ranges strand ltRlegt ltIRangesgt ltRlegt [1] chr21 [9437272 9439473] [2] chr21 [9483485 9484663] [3] chr21 [9647866 9648116] [4] chr21 [9708935 9709231] [5] chr21 [9825442 9826296] [6] chr21 [9909011 9909218] --- seqlengths chr21 NA
genomation
package
Altuna Akalın
Usage and
ubiquity of
genomic
interval
summaries
Using
genomation
More
information
Extraction of data over pre-defined genomic regions
ScoreMatrix() and ScoreMatrixBin() are functions used to extractdata over predefined windows
ScoreMatrix is used when all of the windows have the samewidth (eg region around TSS)ScoreMatrixBin is designed for use with windows of unequalwidth (eg enrichment of methylation over exons)
data(cage)data(promoters)sm lt- ScoreMatrix(target = cage windows = promoters)sm
scoreMatrix with dims 1055 2001
genomation
package
Altuna Akalın
Usage and
ubiquity of
genomic
interval
summaries
Using
genomation
More
information
Visualizing ScoreMatrix summary of genomicinvervals over pre-defined regions
plotMeta()heatMeta() heatMatrix() and multiHeatMatrix()are the visualization functions
oldmar lt- par()$marpar(oma = c(0 0 0 0))heatMatrix(sm xcoords = c(-1000 1000))plotMeta(sm xcoords = c(-1000 1000)linecol=blue)par(oma = oldmar)
00
751
52
2 3
minus1000 minus500 0 500 1000 minus1000 minus500 0 500 1000
000
005
010
015
020
025
bases
aver
age
scor
e
genomation
package
Altuna Akalın
Usage and
ubiquity of
genomic
interval
summaries
Using
genomation
More
information
Working with BAM files
BAM files can also be used in ScoreMatrix() and ScoreMatrixBin()functions
bamfile = systemfile(teststestbam package=genomation)windows = GRanges(rep(c(12)each=2)
IRanges(rep(c(12) times=2) width=5))scores3 = ScoreMatrix(target=bamfilewindows=windows type=bam)
genomation
package
Altuna Akalın
Usage and
ubiquity of
genomic
interval
summaries
Using
genomation
More
information
Working with bigWig files
ScoreMatrix() and ScoreMatrixBin() are functions can handlebigWig files Here we use ENCODE DHS scores downloaded fromhttpgooglfEVu0g
mybed12file=systemfile(extdatachr21refseqhg19bedpackage = genomation)
feats=readTranscriptFeatures(mybed12fileupflank=500downflank=500)sm=ScoreMatrix(target=wgEncodeUwDnaseA549RawRep1bw
windows=feats$promoterstype=bigWigstrandaware=TRUE)plotMeta(smxcoords=c(-500500)main=DHS around TSSlinecol=blue)
minus400 0 200
46
810
14
DHS around TSS
bases
aver
age
scor
e
genomation
package
Altuna Akalın
Usage and
ubiquity of
genomic
interval
summaries
Using
genomation
More
information
Multiple profiles
Multiple heatmap profiles can be plotted using multiHeatMatrix()which takes in a ScoreMatrixList object Here we used CTCF P300 Suz12 Rad21 Znf143 BAM files from genomationData package
ctcfpeaks=readRDS(ctcfpeaksrds)dataPath = systemfile(extdata package = genomationData)bamfiles = listfiles(dataPath full= Tpattern = bam$)[c(146)]sml = ScoreMatrixList(bamfiles ctcfpeaks binnum = 50type = bam)names(sml)=c(CTCFP300Suz12Rad21Znf143)multiHeatMatrix(sml xcoords = c(-500 500)cexaxis=035commonscale = T
col = c(lightgray blue)winsorize=c(095))
minus50
0 minus
250
0
250
500
0 2 4 6 8
CTCF
minus50
0 minus
250
0
250
500
0 2 4 6 8
P300
minus50
0 minus
250
0
250
500
0 2 4 6 8
Suz12
minus50
0 minus
250
0
250
500
0 2 4 6 8
Rad21
minus50
0 minus
250
0
250
500
0 2 4 6 8
Znf143
genomation
package
Altuna Akalın
Usage and
ubiquity of
genomic
interval
summaries
Using
genomation
More
information
Multiple profiles
multiHeatMatrix() can also apply K-means clustering Extremevalues are trimmed using with ldquowinsorizerdquo argument
multiHeatMatrix(sml xcoords = c(-500 500)kmeans=TRUEk=3commonscale = Tcexaxis=04col = c(lightgray blue)winsorize=c(095))
1
2
3
minus50
0 minus
250
0
250
500
0 2 4 6 8
CTCF
minus50
0 minus
250
0
250
500
0 2 4 6 8
P300
minus50
0 minus
250
0
250
500
0 2 4 6 8
Suz12
minus50
0 minus
250
0
250
500
0 2 4 6 8
Rad21
minus50
0 minus
250
0
250
500
0 2 4 6 8
Znf143
genomation
package
Altuna Akalın
Usage and
ubiquity of
genomic
interval
summaries
Using
genomation
More
information
Multiple profiles
Multiple average profiles can be visualized with heatMeta() Herewe also apply a scaling function to all the matrices
take log2 of all matrices
sml2=scaleScoreMatrixList(smlscalefun=function(x) log2(x+1))heatMeta(sml2legendname=average profilesxcoords=c(-500 500)
xlab=bp around peaks)
minus400 minus200 0 200 400
bp around peaks
Znf143
Rad21
Suz12
P300
CTCF
021
061
11
41
8av
erag
e pr
ofile
s
genomation
package
Altuna Akalın
Usage and
ubiquity of
genomic
interval
summaries
Using
genomation
More
information
Multiple profiles
Multiple average profiles can also be visualized with plotMeta()
plotMeta(sml2profilenames=names(sml2)xcoords=c(-500 500)main=mult profiles)
minus400 minus200 0 200 400
05
10
15
mult profiles
bases
aver
age
scor
e
CTCFP300Suz12Rad21Znf143
genomation
package
Altuna Akalın
Usage and
ubiquity of
genomic
interval
summaries
Using
genomation
More
information
Future work
Explore overlap statistics between two genomic data sets DoesTF1 binding site locations overlap with TF2 sites more thanexpectedThis is previously explored with GenometriCorr package Thesefunctionality can be included in the form of a dependencyPerformance improvement on certain functions faster is alwaysbetter
Further information
The genomation package is available at http://al2na.github.io/genomation. You can find the link to the vignette on the webpage as well.
Code that generated this presentation is available at http://github.com/al2na/genomation_presentation
Questions and bug reports:
You can view/open issues on GitHub: https://github.com/al2na/genomation/issues?state=open
You can ask questions by sending an e-mail to genomation@googlegroups.com or using the web interface to Google Groups.
Developed by Altuna Akalın and Vedran Franke
Session Info
sessionInfo()
R version 3.0.2 (2013-09-25)
Platform: x86_64-apple-darwin10.8.0 (64-bit)

locale:
[1] C

attached base packages:
[1] methods   grid      stats     graphics  grDevices utils     datasets
[8] base

other attached packages:
[1] genomation_0.99.0.2 knitr_1.5

loaded via a namespace (and not attached):
 [1] BSgenome_1.30.0      BiocGenerics_0.8.0   Biostrings_2.30.0
 [4] GenomicRanges_1.14.3 IRanges_1.20.5       MASS_7.3-29
 [7] RColorBrewer_1.0-5   RCurl_1.95-4.1       Rsamtools_1.14.1
[10] XML_3.95-0.2         XVector_0.2.0        bitops_1.0-6
[13] colorspace_1.2-4     data.table_1.8.10    dichromat_2.0-0
[16] digest_0.6.3         evaluate_0.5.1       formatR_0.10
[19] ggplot2_0.9.3.1      gridBase_0.4-6       gtable_0.1.2
[22] highr_0.3            impute_1.36.0        labeling_0.2
[25] munsell_0.4.2        parallel_3.0.2       plyr_1.8
[28] proto_0.3-10         reshape2_1.2.2       rtracklayer_1.22.0
[31] scales_0.2.3         stats4_3.0.2         stringr_0.6.2
[34] tools_3.0.2          zlibbioc_1.8.0
Genomic interval summaries are widely used
Examples from literature
Figure: Erkek S. et al. (2013) Molecular determinants of nucleosome retention at CpG-rich sequences in mouse spermatozoa. Nature Structural & Molecular Biology
Genomic interval summaries are widely used
Examples from literature
Figure: Stadler M., Murr R., Burger L., et al. (2011) DNA-binding factors shape the mouse methylome at distal regulatory regions. Nature
Utility and futility of average profiles
Does this mean all of the windows (viewpoints) have a similar enrichment profile?
Figure: "average profile around anchor", average score against bases (-100 to 100).
Utility and futility of average profiles
Only 13 of windows have such enrichment. Be careful when you are interpreting the average profiles.
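The caveat is easy to reproduce with simulated data. In this minimal sketch everything is hypothetical: the matrix size, the bump shape, and the assumption that one third of the windows carry the enrichment. It shows a clear peak in the average profile even though most windows have none.

```r
# Hypothetical data: 300 windows x 201 positions; only the first third
# of the windows carries an enrichment around the anchor (column 101).
set.seed(4)
n_win <- 300
n_pos <- 201
scores <- matrix(rnorm(n_win * n_pos, mean = 3.5, sd = 0.5), nrow = n_win)
enriched <- seq_len(n_win / 3)
bump <- 4.5 * dnorm(seq(-100, 100), mean = 0, sd = 20) / dnorm(0, sd = 20)
scores[enriched, ] <- sweep(scores[enriched, ], 2, bump, "+")

# The average profile shows a clear peak at the anchor.
avg_profile <- colMeans(scores)

# Per-window inspection: compare the center of each window with its flanks.
center <- 91:111
flank  <- c(1:40, 162:201)
window_gain <- rowMeans(scores[, center]) - rowMeans(scores[, flank])
frac_enriched <- mean(window_gain > 1)
```

Here frac_enriched recovers the simulated fraction, which the average profile alone cannot reveal; a per-window heatmap serves the same purpose visually.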
Figure: plot of the individual windows around the anchor, x-axis from -100 to 100.
Genomic interval summaries are widely used
Examples from literature
Figure: Lister R. et al. (2009) Human DNA methylomes at base resolution show widespread epigenomic differences. Nature
Genomic interval summaries are widely used
Examples from literature
Figure: Feng S. et al. (2010) Conservation and divergence of methylation patterning in plants and animals. PNAS
Issues to keep in mind when developing summary methods
Genomic data comes in many formats: we need a method that is able to work with multiple flat file formats.
We need a method that is not specialized on one type of data set such as read counts; it should also work easily on other scoring schemes (e.g. conservation scores).
Regions of interest are not always equi-width: you should be able to normalize for length differences by binning.
Multiple visualization options and fast heatmap generation should be available.
Clustering of regions based on multiple summaries (e.g. binding for different TFs on the same set of regions) on the heatmap.
Ease of use: it should not take hours of coding to generate and visualize summaries.
Overview of genomation features
Figure: workflow diagram with the steps Summarize, Annotate and Visualize. Input formats shown for genomic intervals and annotation: BAM, BigWig, BED, GFF, tab-delimited text and GRanges. Summaries are stored in a ScoreMatrix or ScoreMatrixList object (rows: region 1 to region m; columns: base-pairs or bins 1 to n). Outputs shown: pie charts for annotation (intergenic, intron, exon, promoter), meta-region plots, meta-region heatmaps, and heatmaps for genomic interval sets (TF1 to TF4, base-pairs around anchor).
Installation of the package and the example data
We can install the package and the data using the install_github() function from the devtools package.
# install dependencies
install.packages(c("data.table", "plyr", "reshape2", "ggplot2", "gridBase", "devtools"))
source("http://bioconductor.org/biocLite.R")
biocLite(c("GenomicRanges", "rtracklayer", "impute", "Rsamtools"))
# install the packages
library(devtools)
install_github("genomation", username = "al2na")
# install the data package
# needed for examples
install_github("genomationData", username = "al2na")
Data import
Various file formats can be used in genomation. You can read in annotation or your genomic intervals of interest.
library(genomation)
tabfile1 <- system.file("extdata/tab1.bed", package = "genomation")
readGeneric(tabfile1)
GRanges with 6 ranges and 0 metadata columns:
      seqnames             ranges strand
         <Rle>          <IRanges>  <Rle>
  [1]    chr21 [9437272, 9439473]      *
  [2]    chr21 [9483485, 9484663]      *
  [3]    chr21 [9647866, 9648116]      *
  [4]    chr21 [9708935, 9709231]      *
  [5]    chr21 [9825442, 9826296]      *
  [6]    chr21 [9909011, 9909218]      *
  ---
  seqlengths:
   chr21
      NA
Extraction of data over pre-defined genomic regions
ScoreMatrix() and ScoreMatrixBin() are functions used to extract data over predefined windows.
ScoreMatrix is used when all of the windows have the same width (e.g. region around TSS).
ScoreMatrixBin is designed for use with windows of unequal width (e.g. enrichment of methylation over exons).
data(cage)
data(promoters)
sm <- ScoreMatrix(target = cage, windows = promoters)
sm
scoreMatrix with dims: 1055 2001
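The difference between the two cases can be pictured in base R. This is a conceptual sketch, not what ScoreMatrix() or ScoreMatrixBin() do internally: the per-base scores, the window coordinates and the helper bin_window() are hypothetical, and the mean is assumed as the bin summary.

```r
# Hypothetical per-base scores along a 1000 bp stretch.
set.seed(5)
per_base <- runif(1000)

# Equal-width windows: each row is simply the score at every base of a window.
starts <- c(1, 201, 401)
equal_width <- t(sapply(starts, function(s) per_base[s:(s + 99)]))

# Unequal-width windows: split each window into the same number of bins
# and summarize each bin (here with the mean), so rows become comparable.
bin_window <- function(x, n_bins) {
  bin_id <- cut(seq_along(x), breaks = n_bins, labels = FALSE)
  as.vector(tapply(x, bin_id, mean))
}
windows <- list(c(1, 50), c(101, 400), c(501, 1000))
binned <- t(sapply(windows, function(w) bin_window(per_base[w[1]:w[2]], 10)))
```

Windows of 50, 300 and 500 bp all end up as rows of 10 bins, which is the length normalization the binning provides.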
Visualizing ScoreMatrix: summary of genomic intervals over pre-defined regions
plotMeta(), heatMeta(), heatMatrix() and multiHeatMatrix() are the visualization functions.
# save the outer margins so they can be restored afterwards
oldoma <- par()$oma
par(oma = c(0, 0, 0, 0))
heatMatrix(sm, xcoords = c(-1000, 1000))
plotMeta(sm, xcoords = c(-1000, 1000), line.col = "blue")
par(oma = oldoma)
Figure: heatmap from heatMatrix() and average-score profile from plotMeta(), each with x-axis from -1000 to 1000 bases.
Working with BAM files
BAM files can also be used in ScoreMatrix() and ScoreMatrixBin() functions.
bamfile = system.file("tests/test.bam", package = "genomation")
windows = GRanges(rep(c(1, 2), each = 2),
                  IRanges(rep(c(1, 2), times = 2), width = 5))
scores3 = ScoreMatrix(target = bamfile, windows = windows, type = "bam")
Working with bigWig files
ScoreMatrix() and ScoreMatrixBin() can also handle bigWig files. Here we use ENCODE DHS scores downloaded from http://goo.gl/fEVu0g
mybed12file = system.file("extdata/chr21.refseq.hg19.bed", package = "genomation")
feats = readTranscriptFeatures(mybed12file, up.flank = 500, down.flank = 500)
sm = ScoreMatrix(target = "wgEncodeUwDnaseA549RawRep1.bw",
                 windows = feats$promoters, type = "bigWig", strand.aware = TRUE)
plotMeta(sm, xcoords = c(-500, 500), main = "DHS around TSS", line.col = "blue")
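The idea behind a strand-aware summary can be shown on a toy matrix. This base-R sketch is an assumption about the concept, not genomation's code: orient_by_strand() is a hypothetical helper that reverses the rows belonging to minus-strand windows so that all rows run from upstream to downstream.

```r
# Hypothetical score matrix: 4 windows x 5 positions, with window strands.
scores <- matrix(1:20, nrow = 4, byrow = TRUE)
strand <- c("+", "-", "+", "-")

# Reverse the columns of minus-strand rows so that column 1 is always
# the most upstream position relative to the feature.
orient_by_strand <- function(mat, strand) {
  minus <- strand == "-"
  mat[minus, ] <- mat[minus, ncol(mat):1, drop = FALSE]
  mat
}

oriented <- orient_by_strand(scores, strand)
```

Without this step, promoter profiles from genes on opposite strands would be averaged in mirrored orientation and the signal around the TSS would be smeared.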
Figure: "DHS around TSS", average score against bases.
Multiple profiles
Multiple heatmap profiles can be plotted using multiHeatMatrix(), which takes in a ScoreMatrixList object. Here we used CTCF, P300, Suz12, Rad21 and Znf143 BAM files from the genomationData package.
ctcfpeaks = readRDS("ctcfpeaks.rds")
dataPath = system.file("extdata", package = "genomationData")
bamfiles = list.files(dataPath, full.names = TRUE, pattern = "bam$")[c(1:4, 6)]
sml = ScoreMatrixList(bamfiles, ctcfpeaks, bin.num = 50, type = "bam")
names(sml) = c("CTCF", "P300", "Suz12", "Rad21", "Znf143")
multiHeatMatrix(sml, xcoords = c(-500, 500), cex.axis = 0.35, common.scale = TRUE,
                col = c("lightgray", "blue"), winsorize = c(0, 95))
Figure: heatmaps for CTCF, P300, Suz12, Rad21 and Znf143; x-axis from -500 to 500, common color scale from 0 to 8.
genomation
package
Altuna Akalın
Usage and
ubiquity of
genomic
interval
summaries
Using
genomation
More
information
Utility and futility of average profiles
Does this mean all of the windows (viewpoints) have a similarenrichment profile
minus100 0 50 100
35
40
45
50
average profile around anchor
bases
aver
age
scor
e
genomation
package
Altuna Akalın
Usage and
ubiquity of
genomic
interval
summaries
Using
genomation
More
information
Utility and futility of average profiles
Only 13 of windows have such enrichment Be careful when you areinterpreting the average profiles
05
2 1
0 1
6 2
1 minus100 0 50 100
genomation
package
Altuna Akalın
Usage and
ubiquity of
genomic
interval
summaries
Using
genomation
More
information
Genomic interval summaries are widely usedExamples from literature
Figure Lister R et al (2009) Human DNA methylomes at baseresolution show widespread epigenomic differences Nature
genomation
package
Altuna Akalın
Usage and
ubiquity of
genomic
interval
summaries
Using
genomation
More
information
Genomic interval summaries are widely usedExamples from literature
Figure Feng S et al (2010) Conservation and divergence ofmethylation patterning in plants and animals PNAS
genomation
package
Altuna Akalın
Usage and
ubiquity of
genomic
interval
summaries
Using
genomation
More
information
Issues to keep in mind when developing summarymethods
Genomic data comes in many formats we need a method that isable to work with multiple flat file formatsWe need a method that is not specialized on one type of dataset such as read counts it should also work on other scoringschemes(eg conservation scores) easilyRegions of interest are not always equi-width you should be ableto normalize for length differences by binningMultiple visualization options and fast heatmap generationshould be availableClustering of regions based on multiple summaries (egbinding for different TFs on the same set of regions) on theheatmapEase of use it should not take hours of coding to generate andvisualize summaries
genomation
package
Altuna Akalın
Usage and
ubiquity of
genomic
interval
summaries
Using
genomation
More
information
Overview of genomation features
BAMBigWigBEDGFFTab txtGRanges
BEDGFFTab txtGRanges
Summarize
Annotation
Genomic Intervals
Annotate
Visualize
Base-pairs bins1 2 3 4 n
ScoreMatrixScoreMatrixList object
region 1
region 2
region 3
region 4
region m
IntergenicIntronExonPromoter409
116
218257
iuml iuml 0 500 1000
00
02
04
06
08
10
base-pairs around anchor
read
per
milli
on
TF4TF3TF2TF1
iuml
iuml
0
500
100
0
0 05 1 15 2
TF 4
iuml
iuml
0
500
100
0
0 05 1 15 2 25
TF 3
iuml
iuml
0
500
100
0
0 05 1 15 2 25
TF 2
iuml
iuml
0
500
100
0
0 05 1 15 2 25
TF 1
iuml iuml 0 500 1000
base-pairs around anchor
TF1
TF2
TF3
TF4
007
20
340
60
861
1
meta-region plots meta-region heatmaps
heatmaps for genomic interval sets
Piecharts for annotation
genomation
package
Altuna Akalın
Usage and
ubiquity of
genomic
interval
summaries
Using
genomation
More
information
installation of the package and the example data
We can install the package and the data using install_github()
function from the devtools package
install dependencies
installpackages( c(datatableplyrreshape2ggplot2gridBasedevtools))
source(httpbioconductororgbiocLiteR)biocLite(c(GenomicRangesrtracklayerimputeRsamtools))
install the packages
library(devtools)install_github(genomation username = al2na)
install the data package
needed for examples
install_github(genomationData username = al2na)
genomation
package
Altuna Akalın
Usage and
ubiquity of
genomic
interval
summaries
Using
genomation
More
information
Data import
Various file formats can be used in genomation You can read inannotation or your genomic intervals of interest
library(genomation)tabfile1 lt- systemfile(extdatatab1bed package = genomation)readGeneric(tabfile1)
GRanges with 6 ranges and 0 metadata columns seqnames ranges strand ltRlegt ltIRangesgt ltRlegt [1] chr21 [9437272 9439473] [2] chr21 [9483485 9484663] [3] chr21 [9647866 9648116] [4] chr21 [9708935 9709231] [5] chr21 [9825442 9826296] [6] chr21 [9909011 9909218] --- seqlengths chr21 NA
genomation
package
Altuna Akalın
Usage and
ubiquity of
genomic
interval
summaries
Using
genomation
More
information
Extraction of data over pre-defined genomic regions
ScoreMatrix() and ScoreMatrixBin() are functions used to extractdata over predefined windows
ScoreMatrix is used when all of the windows have the samewidth (eg region around TSS)ScoreMatrixBin is designed for use with windows of unequalwidth (eg enrichment of methylation over exons)
data(cage)data(promoters)sm lt- ScoreMatrix(target = cage windows = promoters)sm
scoreMatrix with dims 1055 2001
genomation
package
Altuna Akalın
Usage and
ubiquity of
genomic
interval
summaries
Using
genomation
More
information
Visualizing ScoreMatrix summary of genomicinvervals over pre-defined regions
plotMeta()heatMeta() heatMatrix() and multiHeatMatrix()are the visualization functions
oldmar lt- par()$marpar(oma = c(0 0 0 0))heatMatrix(sm xcoords = c(-1000 1000))plotMeta(sm xcoords = c(-1000 1000)linecol=blue)par(oma = oldmar)
00
751
52
2 3
minus1000 minus500 0 500 1000 minus1000 minus500 0 500 1000
000
005
010
015
020
025
bases
aver
age
scor
e
genomation
package
Altuna Akalın
Usage and
ubiquity of
genomic
interval
summaries
Using
genomation
More
information
Working with BAM files
BAM files can also be used in ScoreMatrix() and ScoreMatrixBin()functions
bamfile = systemfile(teststestbam package=genomation)windows = GRanges(rep(c(12)each=2)
IRanges(rep(c(12) times=2) width=5))scores3 = ScoreMatrix(target=bamfilewindows=windows type=bam)
genomation
package
Altuna Akalın
Usage and
ubiquity of
genomic
interval
summaries
Using
genomation
More
information
Working with bigWig files
ScoreMatrix() and ScoreMatrixBin() are functions can handlebigWig files Here we use ENCODE DHS scores downloaded fromhttpgooglfEVu0g
mybed12file=systemfile(extdatachr21refseqhg19bedpackage = genomation)
feats=readTranscriptFeatures(mybed12fileupflank=500downflank=500)sm=ScoreMatrix(target=wgEncodeUwDnaseA549RawRep1bw
windows=feats$promoterstype=bigWigstrandaware=TRUE)plotMeta(smxcoords=c(-500500)main=DHS around TSSlinecol=blue)
minus400 0 200
46
810
14
DHS around TSS
bases
aver
age
scor
e
genomation
package
Altuna Akalın
Usage and
ubiquity of
genomic
interval
summaries
Using
genomation
More
information
Multiple profiles
Multiple heatmap profiles can be plotted using multiHeatMatrix()which takes in a ScoreMatrixList object Here we used CTCF P300 Suz12 Rad21 Znf143 BAM files from genomationData package
ctcfpeaks=readRDS(ctcfpeaksrds)dataPath = systemfile(extdata package = genomationData)bamfiles = listfiles(dataPath full= Tpattern = bam$)[c(146)]sml = ScoreMatrixList(bamfiles ctcfpeaks binnum = 50type = bam)names(sml)=c(CTCFP300Suz12Rad21Znf143)multiHeatMatrix(sml xcoords = c(-500 500)cexaxis=035commonscale = T
col = c(lightgray blue)winsorize=c(095))
minus50
0 minus
250
0
250
500
0 2 4 6 8
CTCF
minus50
0 minus
250
0
250
500
0 2 4 6 8
P300
minus50
0 minus
250
0
250
500
0 2 4 6 8
Suz12
minus50
0 minus
250
0
250
500
0 2 4 6 8
Rad21
minus50
0 minus
250
0
250
500
0 2 4 6 8
Znf143
genomation
package
Altuna Akalın
Usage and
ubiquity of
genomic
interval
summaries
Using
genomation
More
information
Multiple profiles
multiHeatMatrix() can also apply K-means clustering Extremevalues are trimmed using with ldquowinsorizerdquo argument
multiHeatMatrix(sml xcoords = c(-500 500)kmeans=TRUEk=3commonscale = Tcexaxis=04col = c(lightgray blue)winsorize=c(095))
1
2
3
minus50
0 minus
250
0
250
500
0 2 4 6 8
CTCF
minus50
0 minus
250
0
250
500
0 2 4 6 8
P300
minus50
0 minus
250
0
250
500
0 2 4 6 8
Suz12
minus50
0 minus
250
0
250
500
0 2 4 6 8
Rad21
minus50
0 minus
250
0
250
500
0 2 4 6 8
Znf143
genomation
package
Altuna Akalın
Usage and
ubiquity of
genomic
interval
summaries
Using
genomation
More
information
Multiple profiles
Multiple average profiles can be visualized with heatMeta() Herewe also apply a scaling function to all the matrices
take log2 of all matrices
sml2=scaleScoreMatrixList(smlscalefun=function(x) log2(x+1))heatMeta(sml2legendname=average profilesxcoords=c(-500 500)
xlab=bp around peaks)
Figure: heatmap of average profiles with one row each for Znf143, Rad21, Suz12, P300 and CTCF; x-axis "bp around peaks" from -400 to 400; color legend "average profiles".
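As a rough illustration of what an average-profile heatmap summarizes, the sketch below applies one scaling function to a list of toy matrices and takes column means. The list mats stands in for a ScoreMatrixList and is an assumption for illustration; it does not use genomation's classes.

```r
set.seed(42)
# Two toy score matrices (regions x bins) standing in for a ScoreMatrixList.
mats <- list(
  A = matrix(rpois(100 * 10, lambda = 4),  nrow = 100),
  B = matrix(rpois(100 * 10, lambda = 40), nrow = 100)
)

# Apply the same scaling function to every matrix, as on the slide.
scaled <- lapply(mats, function(m) log2(m + 1))

# One average profile (column means) per matrix: rows = matrices, columns = bins.
profiles <- t(sapply(scaled, colMeans))
dim(profiles)
round(profiles, 2)
```

Each row of profiles corresponds to one row of the heatmap, and the same rows could be drawn as lines for a meta-region plot.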
Multiple profiles
Multiple average profiles can also be visualized with plotMeta().
plotMeta(sml2, profile.names = names(sml2), xcoords = c(-500, 500), main = "mult profiles")
Figure: line plot titled "mult profiles" with one curve each for CTCF, P300, Suz12, Rad21 and Znf143; x-axis "bases" from -400 to 400, y-axis "average score".
Future work
Explore overlap statistics between two genomic data sets: do TF1 binding site locations overlap with TF2 sites more than expected?
This was previously explored in the GenometriCorr package. This functionality can be included in the form of a dependency.
Performance improvement on certain functions: faster is always better.
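One simple way to ask whether two interval sets overlap more than expected is a permutation test. The sketch below is base R only and is not the GenometriCorr method; the helpers count_overlaps() and random_sites(), the toy data, and the uniform-placement null model are all assumptions made for illustration.

```r
# Count how many intervals in `a` overlap at least one interval in `b`.
# Both are data frames with integer columns `start` and `end` (closed intervals).
count_overlaps <- function(a, b) {
  sum(vapply(seq_len(nrow(a)), function(i) {
    any(a$start[i] <= b$end & a$end[i] >= b$start)
  }, logical(1)))
}

# Place n intervals of fixed width uniformly at random on one chromosome.
random_sites <- function(n, width, chrom_len) {
  start <- sample.int(chrom_len - width + 1L, n, replace = TRUE)
  data.frame(start = start, end = start + width - 1L)
}

set.seed(7)
chrom_len <- 1000000L
tf2 <- random_sites(200, 200L, chrom_len)

# Toy TF1 set: 50 sites are shifted copies of TF2 sites, 50 are random.
near <- tf2[sample.int(nrow(tf2), 50), ]
near$start <- near$start + 50L
near$end <- near$end + 50L
tf1 <- rbind(near, random_sites(50, 200L, chrom_len))

observed <- count_overlaps(tf1, tf2)

# Null distribution: overlap counts after re-placing the TF1 sites at random.
null <- replicate(500, count_overlaps(random_sites(nrow(tf1), 200L, chrom_len), tf2))
p_value <- (sum(null >= observed) + 1) / (length(null) + 1)

observed
p_value
```

A uniform null ignores mappability, chromosome structure and site clustering, so a real analysis would need a more careful background model.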
Further information
The genomation package is available at http://al2na.github.io/genomation. You can find the link to the vignette on the webpage as well.
Code that generated this presentation is available at http://github.com/al2na/genomation_presentation.
Questions and bug reports
You can view/open issues on GitHub: https://github.com/al2na/genomation/issues?state=open
You can ask questions by sending an e-mail to genomation@googlegroups.com or using the web interface to Google Groups.
Developed by Altuna Akalın and Vedran Franke
Session Info
sessionInfo()
## R version 3.0.2 (2013-09-25)
## Platform: x86_64-apple-darwin10.8.0 (64-bit)
##
## locale:
## [1] C
##
## attached base packages:
## [1] methods   grid      stats     graphics  grDevices utils     datasets
## [8] base
##
## other attached packages:
## [1] genomation_0.99.0.2 knitr_1.5
##
## loaded via a namespace (and not attached):
##  [1] BSgenome_1.30.0      BiocGenerics_0.8.0   Biostrings_2.30.0
##  [4] GenomicRanges_1.14.3 IRanges_1.20.5       MASS_7.3-29
##  [7] RColorBrewer_1.0-5   RCurl_1.95-4.1       Rsamtools_1.14.1
## [10] XML_3.95-0.2         XVector_0.2.0        bitops_1.0-6
## [13] colorspace_1.2-4     data.table_1.8.10    dichromat_2.0-0
## [16] digest_0.6.3         evaluate_0.5.1       formatR_0.10
## [19] ggplot2_0.9.3.1      gridBase_0.4-6       gtable_0.1.2
## [22] highr_0.3            impute_1.36.0        labeling_0.2
## [25] munsell_0.4.2        parallel_3.0.2       plyr_1.8
## [28] proto_0.3-10         reshape2_1.2.2       rtracklayer_1.22.0
## [31] scales_0.2.3         stats4_3.0.2         stringr_0.6.2
## [34] tools_3.0.2          zlibbioc_1.8.0
Utility and futility of average profiles
Only 13 of windows have such enrichment. Be careful when you are interpreting the average profiles.
Figure: plot of the example windows; x-axis from -100 to 100.
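The caveat can be reproduced with simulated data: a clear peak in the average profile may come from a minority of windows. Everything in this sketch is made up for illustration, including the fraction of enriched windows (frac_enriched), the peak height and the threshold used to call a peak.

```r
set.seed(3)
n_windows <- 1000
n_bins <- 41
frac_enriched <- 0.13  # illustrative assumption, not taken from real data

# Background: low Poisson noise in every window.
mat <- matrix(rpois(n_windows * n_bins, lambda = 1), nrow = n_windows)

# Add a strong central peak to a minority of windows only.
enriched <- seq_len(n_windows) <= frac_enriched * n_windows
mat[enriched, 19:23] <- mat[enriched, 19:23] + 30

# The average profile peaks in the centre ...
avg_profile <- colMeans(mat)
which.max(avg_profile)

# ... yet most windows carry no such signal.
has_peak <- rowMeans(mat[, 19:23]) > 10
mean(has_peak)
```

Looking at the per-window heatmap alongside the average profile makes this kind of heterogeneity visible.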
Genomic interval summaries are widely used
Examples from literature
Figure: Lister R. et al. (2009). Human DNA methylomes at base resolution show widespread epigenomic differences. Nature.
Genomic interval summaries are widely used
Examples from literature
Figure: Feng S. et al. (2010). Conservation and divergence of methylation patterning in plants and animals. PNAS.
Issues to keep in mind when developing summary methods
Genomic data comes in many formats: we need a method that is able to work with multiple flat file formats.
We need a method that is not specialized on one type of data set, such as read counts; it should also work easily with other scoring schemes (e.g. conservation scores).
Regions of interest are not always equi-width: you should be able to normalize for length differences by binning.
Multiple visualization options and fast heatmap generation should be available.
Clustering of regions based on multiple summaries (e.g. binding for different TFs on the same set of regions) on the heatmap.
Ease of use: it should not take hours of coding to generate and visualize summaries.
Overview of genomation features
Figure: overview diagram of genomation. Inputs: BAM, BigWig, BED, GFF, tab-delimited text or GRanges (scores), and BED, GFF, tab-delimited text or GRanges (genomic intervals and annotation). Summarize: a ScoreMatrix/ScoreMatrixList object with regions 1 to m as rows and base-pairs/bins 1 to n as columns. Annotate: overlap of intervals with promoter, exon, intron and intergenic regions. Visualize: meta-region plots, meta-region heatmaps, heatmaps for genomic interval sets, and pie charts for annotation.
Installation of the package and the example data
We can install the package and the data using the install_github() function from the devtools package.
# install dependencies
install.packages(c("data.table", "plyr", "reshape2", "ggplot2", "gridBase", "devtools"))
source("http://bioconductor.org/biocLite.R")
biocLite(c("GenomicRanges", "rtracklayer", "impute", "Rsamtools"))
# install the packages
library(devtools)
install_github("genomation", username = "al2na")
# install the data package
# needed for examples
install_github("genomationData", username = "al2na")
Data import
Various file formats can be used in genomation. You can read in annotation or your genomic intervals of interest.
library(genomation)
tabfile1 <- system.file("extdata/tab1.bed", package = "genomation")
readGeneric(tabfile1)
## GRanges with 6 ranges and 0 metadata columns:
##       seqnames             ranges strand
##          <Rle>          <IRanges>  <Rle>
##   [1]    chr21 [9437272, 9439473]      *
##   [2]    chr21 [9483485, 9484663]      *
##   [3]    chr21 [9647866, 9648116]      *
##   [4]    chr21 [9708935, 9709231]      *
##   [5]    chr21 [9825442, 9826296]      *
##   [6]    chr21 [9909011, 9909218]      *
##   ---
##   seqlengths:
##    chr21
##       NA
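For readers without the package at hand, the core of reading a generic tabular interval file can be sketched in base R. This is not what readGeneric() does internally; the inlined text, the column names and the 1-based inclusive width calculation are assumptions for illustration, with coordinates borrowed from the output above.

```r
# A few tab-delimited lines (chromosome, start, end), inlined so the sketch
# does not depend on any file on disk.
bed_text <- "chr21\t9437272\t9439473
chr21\t9483485\t9484663
chr21\t9647866\t9648116"

intervals <- read.table(text = bed_text, sep = "\t", header = FALSE,
                        col.names = c("chr", "start", "end"),
                        stringsAsFactors = FALSE)

# Width under the assumption of 1-based, inclusive coordinates.
intervals$width <- intervals$end - intervals$start + 1L
intervals
```

A GRanges object adds strand, sequence lengths and metadata columns on top of this basic table.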
Extraction of data over pre-defined genomic regions
ScoreMatrix() and ScoreMatrixBin() are functions used to extract data over predefined windows.
ScoreMatrix is used when all of the windows have the same width (e.g. region around TSS).
ScoreMatrixBin is designed for use with windows of unequal width (e.g. enrichment of methylation over exons).
data(cage)
data(promoters)
sm <- ScoreMatrix(target = cage, windows = promoters)
sm
## scoreMatrix with dims: 1055 2001
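The reason unequal-width windows need binning is that a matrix requires the same number of columns in every row. The sketch below shows the idea in base R; bin_scores() is a hypothetical helper and not genomation's implementation, and the three toy windows are invented.

```r
# Hypothetical helper: summarise a per-base score vector into `bin_num` bins.
bin_scores <- function(scores, bin_num = 10, bin_op = mean) {
  # Assign each position to one of bin_num equal-width bins along the window.
  bin_id <- cut(seq_along(scores), breaks = bin_num, labels = FALSE)
  as.numeric(tapply(scores, bin_id, bin_op))
}

# Three windows of different widths with per-base scores.
windows <- list(
  short  = rep(1, 50),
  medium = seq(0, 1, length.out = 200),
  long   = rep(c(0, 2), each = 500)
)

# Every window becomes one row with the same number of columns.
binned <- t(sapply(windows, bin_scores, bin_num = 10))
dim(binned)
round(binned, 2)
```

The helper assumes each window is at least bin_num positions long; shorter windows would leave some bins empty.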
Visualizing ScoreMatrix: summary of genomic intervals over pre-defined regions
plotMeta(), heatMeta(), heatMatrix() and multiHeatMatrix() are the visualization functions.
oldpar <- par(oma = c(0, 0, 0, 0))  # par() returns the previous settings
heatMatrix(sm, xcoords = c(-1000, 1000))
plotMeta(sm, xcoords = c(-1000, 1000), line.col = "blue")
par(oldpar)  # restore the previous outer margins
Figure: heatmap of the score matrix and the corresponding meta-region plot; x-axes from -1000 to 1000 ("bases"), meta-plot y-axis "average score" from 0.00 to 0.25.
Working with BAM files
BAM files can also be used in the ScoreMatrix() and ScoreMatrixBin() functions.
bamfile = system.file("tests/test.bam", package = "genomation")
windows = GRanges(rep(c(1, 2), each = 2),
                  IRanges(rep(c(1, 2), times = 2), width = 5))
scores3 = ScoreMatrix(target = bamfile, windows = windows, type = "bam")
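Conceptually, a score matrix built from aligned reads holds per-base read coverage for each window. The base-R sketch below computes this for a handful of invented reads; window_coverage() is a hypothetical helper, and real BAM handling (chromosomes, strands, read filtering) is left out.

```r
# Hypothetical helper: per-base read coverage inside one window.
# read_start/read_end are 1-based inclusive alignment coordinates.
window_coverage <- function(read_start, read_end, win_start, win_end) {
  positions <- win_start:win_end
  vapply(positions, function(p) sum(read_start <= p & read_end >= p), integer(1))
}

# Four toy reads on a single chromosome.
reads <- data.frame(start = c(1, 2, 2, 4), end = c(3, 5, 6, 8))

# Two windows of width 5, in the spirit of the toy windows on the slide.
win <- data.frame(start = c(1, 2), end = c(5, 6))

# One row per window, one column per base.
score_matrix <- t(mapply(function(s, e) window_coverage(reads$start, reads$end, s, e),
                         win$start, win$end))
score_matrix
```

Because both windows have the same width, the rows line up base by base without binning.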
Working with bigWig files
ScoreMatrix() and ScoreMatrixBin() can also handle bigWig files. Here we use ENCODE DHS scores downloaded from http://goo.gl/fEVu0g
mybed12file = system.file("extdata/chr21.refseq.hg19.bed", package = "genomation")
feats = readTranscriptFeatures(mybed12file, up.flank = 500, down.flank = 500)
sm = ScoreMatrix(target = "wgEncodeUwDnaseA549RawRep1.bw",
                 windows = feats$promoters, type = "bigWig", strand.aware = TRUE)
plotMeta(sm, xcoords = c(-500, 500), main = "DHS around TSS", line.col = "blue")
Figure: meta-region plot titled "DHS around TSS"; x-axis "bases", y-axis "average score".
Genomic interval summaries are widely usedExamples from literature
Figure Feng S et al (2010) Conservation and divergence ofmethylation patterning in plants and animals PNAS
genomation
package
Altuna Akalın
Usage and
ubiquity of
genomic
interval
summaries
Using
genomation
More
information
Issues to keep in mind when developing summarymethods
Genomic data comes in many formats we need a method that isable to work with multiple flat file formatsWe need a method that is not specialized on one type of dataset such as read counts it should also work on other scoringschemes(eg conservation scores) easilyRegions of interest are not always equi-width you should be ableto normalize for length differences by binningMultiple visualization options and fast heatmap generationshould be availableClustering of regions based on multiple summaries (egbinding for different TFs on the same set of regions) on theheatmapEase of use it should not take hours of coding to generate andvisualize summaries
genomation
package
Altuna Akalın
Usage and
ubiquity of
genomic
interval
summaries
Using
genomation
More
information
Overview of genomation features
BAMBigWigBEDGFFTab txtGRanges
BEDGFFTab txtGRanges
Summarize
Annotation
Genomic Intervals
Annotate
Visualize
Base-pairs bins1 2 3 4 n
ScoreMatrixScoreMatrixList object
region 1
region 2
region 3
region 4
region m
IntergenicIntronExonPromoter409
116
218257
iuml iuml 0 500 1000
00
02
04
06
08
10
base-pairs around anchor
read
per
milli
on
TF4TF3TF2TF1
iuml
iuml
0
500
100
0
0 05 1 15 2
TF 4
iuml
iuml
0
500
100
0
0 05 1 15 2 25
TF 3
iuml
iuml
0
500
100
0
0 05 1 15 2 25
TF 2
iuml
iuml
0
500
100
0
0 05 1 15 2 25
TF 1
iuml iuml 0 500 1000
base-pairs around anchor
TF1
TF2
TF3
TF4
007
20
340
60
861
1
meta-region plots meta-region heatmaps
heatmaps for genomic interval sets
Piecharts for annotation
genomation
package
Altuna Akalın
Usage and
ubiquity of
genomic
interval
summaries
Using
genomation
More
information
installation of the package and the example data
We can install the package and the data using install_github()
function from the devtools package
install dependencies
installpackages( c(datatableplyrreshape2ggplot2gridBasedevtools))
source(httpbioconductororgbiocLiteR)biocLite(c(GenomicRangesrtracklayerimputeRsamtools))
install the packages
library(devtools)install_github(genomation username = al2na)
install the data package
needed for examples
install_github(genomationData username = al2na)
genomation
package
Altuna Akalın
Usage and
ubiquity of
genomic
interval
summaries
Using
genomation
More
information
Data import
Various file formats can be used in genomation You can read inannotation or your genomic intervals of interest
library(genomation)tabfile1 lt- systemfile(extdatatab1bed package = genomation)readGeneric(tabfile1)
GRanges with 6 ranges and 0 metadata columns seqnames ranges strand ltRlegt ltIRangesgt ltRlegt [1] chr21 [9437272 9439473] [2] chr21 [9483485 9484663] [3] chr21 [9647866 9648116] [4] chr21 [9708935 9709231] [5] chr21 [9825442 9826296] [6] chr21 [9909011 9909218] --- seqlengths chr21 NA
genomation
package
Altuna Akalın
Usage and
ubiquity of
genomic
interval
summaries
Using
genomation
More
information
Extraction of data over pre-defined genomic regions
ScoreMatrix() and ScoreMatrixBin() are functions used to extractdata over predefined windows
ScoreMatrix is used when all of the windows have the samewidth (eg region around TSS)ScoreMatrixBin is designed for use with windows of unequalwidth (eg enrichment of methylation over exons)
data(cage)data(promoters)sm lt- ScoreMatrix(target = cage windows = promoters)sm
scoreMatrix with dims 1055 2001
genomation
package
Altuna Akalın
Usage and
ubiquity of
genomic
interval
summaries
Using
genomation
More
information
Visualizing ScoreMatrix summary of genomicinvervals over pre-defined regions
plotMeta()heatMeta() heatMatrix() and multiHeatMatrix()are the visualization functions
oldmar lt- par()$marpar(oma = c(0 0 0 0))heatMatrix(sm xcoords = c(-1000 1000))plotMeta(sm xcoords = c(-1000 1000)linecol=blue)par(oma = oldmar)
00
751
52
2 3
minus1000 minus500 0 500 1000 minus1000 minus500 0 500 1000
000
005
010
015
020
025
bases
aver
age
scor
e
genomation
package
Altuna Akalın
Usage and
ubiquity of
genomic
interval
summaries
Using
genomation
More
information
Working with BAM files
BAM files can also be used in ScoreMatrix() and ScoreMatrixBin()functions
bamfile = systemfile(teststestbam package=genomation)windows = GRanges(rep(c(12)each=2)
IRanges(rep(c(12) times=2) width=5))scores3 = ScoreMatrix(target=bamfilewindows=windows type=bam)
genomation
package
Altuna Akalın
Usage and
ubiquity of
genomic
interval
summaries
Using
genomation
More
information
Working with bigWig files
ScoreMatrix() and ScoreMatrixBin() are functions can handlebigWig files Here we use ENCODE DHS scores downloaded fromhttpgooglfEVu0g
mybed12file=systemfile(extdatachr21refseqhg19bedpackage = genomation)
feats=readTranscriptFeatures(mybed12fileupflank=500downflank=500)sm=ScoreMatrix(target=wgEncodeUwDnaseA549RawRep1bw
windows=feats$promoterstype=bigWigstrandaware=TRUE)plotMeta(smxcoords=c(-500500)main=DHS around TSSlinecol=blue)
minus400 0 200
46
810
14
DHS around TSS
bases
aver
age
scor
e
genomation
package
Altuna Akalın
Usage and
ubiquity of
genomic
interval
summaries
Using
genomation
More
information
Multiple profiles
Multiple heatmap profiles can be plotted using multiHeatMatrix()which takes in a ScoreMatrixList object Here we used CTCF P300 Suz12 Rad21 Znf143 BAM files from genomationData package
ctcfpeaks=readRDS(ctcfpeaksrds)dataPath = systemfile(extdata package = genomationData)bamfiles = listfiles(dataPath full= Tpattern = bam$)[c(146)]sml = ScoreMatrixList(bamfiles ctcfpeaks binnum = 50type = bam)names(sml)=c(CTCFP300Suz12Rad21Znf143)multiHeatMatrix(sml xcoords = c(-500 500)cexaxis=035commonscale = T
col = c(lightgray blue)winsorize=c(095))
minus50
0 minus
250
0
250
500
0 2 4 6 8
CTCF
minus50
0 minus
250
0
250
500
0 2 4 6 8
P300
minus50
0 minus
250
0
250
500
0 2 4 6 8
Suz12
minus50
0 minus
250
0
250
500
0 2 4 6 8
Rad21
minus50
0 minus
250
0
250
500
0 2 4 6 8
Znf143
genomation
package
Altuna Akalın
Usage and
ubiquity of
genomic
interval
summaries
Using
genomation
More
information
Multiple profiles
multiHeatMatrix() can also apply K-means clustering Extremevalues are trimmed using with ldquowinsorizerdquo argument
multiHeatMatrix(sml xcoords = c(-500 500)kmeans=TRUEk=3commonscale = Tcexaxis=04col = c(lightgray blue)winsorize=c(095))
1
2
3
minus50
0 minus
250
0
250
500
0 2 4 6 8
CTCF
minus50
0 minus
250
0
250
500
0 2 4 6 8
P300
minus50
0 minus
250
0
250
500
0 2 4 6 8
Suz12
minus50
0 minus
250
0
250
500
0 2 4 6 8
Rad21
minus50
0 minus
250
0
250
500
0 2 4 6 8
Znf143
genomation
package
Altuna Akalın
Usage and
ubiquity of
genomic
interval
summaries
Using
genomation
More
information
Multiple profiles
Multiple average profiles can be visualized with heatMeta() Herewe also apply a scaling function to all the matrices
take log2 of all matrices
sml2=scaleScoreMatrixList(smlscalefun=function(x) log2(x+1))heatMeta(sml2legendname=average profilesxcoords=c(-500 500)
xlab=bp around peaks)
minus400 minus200 0 200 400
bp around peaks
Znf143
Rad21
Suz12
P300
CTCF
021
061
11
41
8av
erag
e pr
ofile
s
genomation
package
Altuna Akalın
Usage and
ubiquity of
genomic
interval
summaries
Using
genomation
More
information
Multiple profiles
Multiple average profiles can also be visualized with plotMeta()
plotMeta(sml2profilenames=names(sml2)xcoords=c(-500 500)main=mult profiles)
minus400 minus200 0 200 400
05
10
15
mult profiles
bases
aver
age
scor
e
CTCFP300Suz12Rad21Znf143
genomation
package
Altuna Akalın
Usage and
ubiquity of
genomic
interval
summaries
Using
genomation
More
information
Future work
Explore overlap statistics between two genomic data sets DoesTF1 binding site locations overlap with TF2 sites more thanexpectedThis is previously explored with GenometriCorr package Thesefunctionality can be included in the form of a dependencyPerformance improvement on certain functions faster is alwaysbetter
genomation
package
Altuna Akalın
Usage and
ubiquity of
genomic
interval
summaries
Using
genomation
More
information
Further information
The genomation package is available athttpal2nagithubiogenomation You can find the linkto the vignette on the webpage as wellCode that generated this presentation is available athttpgithubcomal2nagenomation_presentation
Questions and bug reportsYou can viewopen issues in githubhttpsgithubcomal2nagenomationissuesstate=open
You can ask questions by sending an e-mail togenomationgooglegroupscom or using the web interface togoogle groups
Developed by Altuna Akalın and Vedran Franke
genomation
package
Altuna Akalın
Usage and
ubiquity of
genomic
interval
summaries
Using
genomation
More
information
Session Info
sessionInfo()
R version 302 (2013-09-25) Platform x86_64-apple-darwin1080 (64-bit) locale [1] C attached base packages [1] methods grid stats graphics grDevices utils datasets [8] base other attached packages [1] genomation_09902 knitr_15 loaded via a namespace (and not attached) [1] BSgenome_1300 BiocGenerics_080 Biostrings_2300 [4] GenomicRanges_1143 IRanges_1205 MASS_73-29 [7] RColorBrewer_10-5 RCurl_195-41 Rsamtools_1141 [10] XML_395-02 XVector_020 bitops_10-6 [13] colorspace_12-4 datatable_1810 dichromat_20-0 [16] digest_063 evaluate_051 formatR_010 [19] ggplot2_0931 gridBase_04-6 gtable_012 [22] highr_03 impute_1360 labeling_02 [25] munsell_042 parallel_302 plyr_18 [28] proto_03-10 reshape2_122 rtracklayer_1220 [31] scales_023 stats4_302 stringr_062 [34] tools_302 zlibbioc_180
Issues to keep in mind when developing summary methods
- Genomic data comes in many formats: we need a method that is able to work with multiple flat file formats.
- We need a method that is not specialized for one type of data set, such as read counts; it should also work easily with other scoring schemes (e.g. conservation scores).
- Regions of interest are not always equi-width: you should be able to normalize for length differences by binning.
- Multiple visualization options and fast heatmap generation should be available.
- Clustering of regions based on multiple summaries (e.g. binding for different TFs on the same set of regions) should be possible on the heatmap.
- Ease of use: it should not take hours of coding to generate and visualize summaries.
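The point about normalizing for length differences by binning can be illustrated with a minimal base-R sketch. It does not use genomation: the helper bin_scores and the toy score vector are hypothetical, and the binning rule (mean score per bin, bins cut at equally spaced boundaries) is an assumption made for illustration, not necessarily what the package does internally.

```r
# Toy per-base scores along a hypothetical 100 bp chromosome.
set.seed(1)
scores <- runif(100)

# Two regions of unequal width (start, end), 1-based and inclusive.
regions <- data.frame(start = c(1, 41), end = c(20, 100))

# Hypothetical helper: split one region into `bin.num` bins of (nearly)
# equal size and return the mean score in each bin.
bin_scores <- function(scores, start, end, bin.num = 5) {
  pos <- start:end
  # Assign each position to a bin 1..bin.num.
  bin <- cut(seq_along(pos), breaks = bin.num, labels = FALSE)
  as.numeric(tapply(scores[pos], bin, mean))
}

# One row per region, one column per bin: widths 20 and 60 both map to 5 columns.
binned <- t(mapply(bin_scores, regions$start, regions$end,
                   MoreArgs = list(scores = scores, bin.num = 5)))
dim(binned)
```

After binning, regions of different lengths share the same number of columns, so they can be stacked in one matrix and averaged or drawn as a heatmap.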
Overview of genomation features
[Figure: overview diagram. Input formats: BAM, BigWig, BED, GFF, tab-delimited text, GRanges. Steps on genomic intervals: summarize, annotate, visualize. Summaries are stored as ScoreMatrix / ScoreMatrixList objects (regions 1..m by base-pairs or bins 1..n). Outputs: meta-region plots, meta-region heatmaps, heatmaps for genomic interval sets, and pie charts for annotation (intergenic, intron, exon, promoter).]
Installation of the package and the example data
We can install the package and the data using the install_github() function from the devtools package.
# install dependencies
install.packages(c("data.table", "plyr", "reshape2", "ggplot2", "gridBase", "devtools"))
source("http://bioconductor.org/biocLite.R")
biocLite(c("GenomicRanges", "rtracklayer", "impute", "Rsamtools"))
# install the packages
library(devtools)
install_github("genomation", username = "al2na")
# install the data package
# needed for examples
install_github("genomationData", username = "al2na")
Data import
Various file formats can be used in genomation. You can read in annotation or your genomic intervals of interest.
library(genomation)
tabfile1 <- system.file("extdata/tab1.bed", package = "genomation")
readGeneric(tabfile1)
GRanges with 6 ranges and 0 metadata columns:
      seqnames             ranges strand
         <Rle>          <IRanges>  <Rle>
  [1]    chr21 [9437272, 9439473]      *
  [2]    chr21 [9483485, 9484663]      *
  [3]    chr21 [9647866, 9648116]      *
  [4]    chr21 [9708935, 9709231]      *
  [5]    chr21 [9825442, 9826296]      *
  [6]    chr21 [9909011, 9909218]      *
  ---
  seqlengths:
   chr21
      NA
Extraction of data over pre-defined genomic regions
ScoreMatrix() and ScoreMatrixBin() are functions used to extract data over predefined windows.
- ScoreMatrix is used when all of the windows have the same width (e.g. region around TSS).
- ScoreMatrixBin is designed for use with windows of unequal width (e.g. enrichment of methylation over exons).
data(cage)
data(promoters)
sm <- ScoreMatrix(target = cage, windows = promoters)
sm
scoreMatrix with dims: 1055 2001
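To make the shape of such a result concrete, the following base-R sketch builds a small score matrix by hand for equal-width windows. It is an illustration only, not genomation's implementation; the toy score vector, the anchor positions and the flank size are assumptions.

```r
# Toy per-base scores along a hypothetical 1000 bp chromosome.
set.seed(42)
scores <- rpois(1000, lambda = 2)

# Hypothetical anchor positions (e.g. TSSs) and a flank of 50 bp on each side,
# so every window has the same width: 2 * 50 + 1 = 101 bp.
anchors <- c(100, 400, 750)
flank <- 50

# One row per window, one column per base position relative to the anchor.
score_mat <- t(sapply(anchors, function(a) scores[(a - flank):(a + flank)]))
colnames(score_mat) <- -flank:flank

dim(score_mat)               # 3 windows x 101 positions
meta <- colMeans(score_mat)  # average profile across windows
```

The dimensions 1055 by 2001 reported above are consistent with the same layout: one row per window and one column per base of a 2001 bp window.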
Visualizing ScoreMatrix: summary of genomic intervals over pre-defined regions
plotMeta(), heatMeta(), heatMatrix() and multiHeatMatrix() are the visualization functions.
# save the outer margins so they can be restored afterwards
oldoma <- par()$oma
par(oma = c(0, 0, 0, 0))
heatMatrix(sm, xcoords = c(-1000, 1000))
plotMeta(sm, xcoords = c(-1000, 1000), line.col = "blue")
par(oma = oldoma)
[Figure: heatMatrix heatmap and plotMeta line plot of sm. x-axis: bases, -1000 to 1000; y-axis of the line plot: average score, 0.00 to 0.25.]
Working with BAM files
BAM files can also be used in the ScoreMatrix() and ScoreMatrixBin() functions.
bamfile = system.file("tests/test.bam", package = "genomation")
windows = GRanges(rep(c(1, 2), each = 2),
                  IRanges(rep(c(1, 2), times = 2), width = 5))
scores3 = ScoreMatrix(target = bamfile, windows = windows, type = "bam")
Working with bigWig files
ScoreMatrix() and ScoreMatrixBin() can also handle bigWig files. Here we use ENCODE DHS scores downloaded from http://goo.gl/fEVu0g
mybed12file = system.file("extdata/chr21.refseq.hg19.bed", package = "genomation")
feats = readTranscriptFeatures(mybed12file, up.flank = 500, down.flank = 500)
sm = ScoreMatrix(target = "wgEncodeUwDnaseA549RawRep1.bw",
                 windows = feats$promoters, type = "bigWig", strand.aware = TRUE)
plotMeta(sm, xcoords = c(-500, 500), main = "DHS around TSS", line.col = "blue")
[Figure: plotMeta line plot titled "DHS around TSS". x-axis: bases; y-axis: average score.]
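The bigWig example requests strand-aware extraction. Strand awareness is understood here to mean that the values of windows on the minus strand are reversed, so that every row reads 5' to 3' relative to its transcript. The base-R sketch below shows that reorientation on a toy matrix; the matrix and strand vector are made up, and this is not genomation's code.

```r
# Toy score matrix: 3 windows x 5 positions, in genomic (left-to-right) order.
mat <- rbind(c(1, 2, 3, 4, 5),
             c(5, 4, 3, 2, 1),
             c(0, 1, 5, 1, 0))
strand <- c("+", "-", "+")

# Reverse the columns of minus-strand rows so every row reads 5' to 3'.
oriented <- mat
minus <- strand == "-"
oriented[minus, ] <- oriented[minus, ncol(mat):1, drop = FALSE]

oriented[2, ]
```

Without this step, signal upstream of a minus-strand TSS would be averaged together with signal downstream of plus-strand TSSs, which blurs the average profile.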
Multiple profiles
Multiple heatmap profiles can be plotted using multiHeatMatrix(), which takes in a ScoreMatrixList object. Here we use the CTCF, P300, Suz12, Rad21 and Znf143 BAM files from the genomationData package.
ctcfpeaks = readRDS("ctcfpeaks.rds")
dataPath = system.file("extdata", package = "genomationData")
bamfiles = list.files(dataPath, full.names = TRUE, pattern = "bam$")[c(1:4, 6)]
sml = ScoreMatrixList(bamfiles, ctcfpeaks, bin.num = 50, type = "bam")
names(sml) = c("CTCF", "P300", "Suz12", "Rad21", "Znf143")
multiHeatMatrix(sml, xcoords = c(-500, 500), cex.axis = 0.35, common.scale = TRUE,
                col = c("lightgray", "blue"), winsorize = c(0, 95))
[Figure: multiHeatMatrix heatmaps, one panel each for CTCF, P300, Suz12, Rad21 and Znf143. x-axis: -500 to 500; color scale: 0 to 8.]
Multiple profiles
multiHeatMatrix() can also apply K-means clustering. Extreme values are trimmed using the "winsorize" argument.
multiHeatMatrix(sml, xcoords = c(-500, 500), kmeans = TRUE, k = 3, common.scale = TRUE,
                cex.axis = 0.4, col = c("lightgray", "blue"), winsorize = c(0, 95))
[Figure: multiHeatMatrix heatmaps for CTCF, P300, Suz12, Rad21 and Znf143, with rows grouped into k-means clusters 1, 2 and 3. x-axis: -500 to 500; color scale: 0 to 8.]
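What winsorizing and k-means clustering do to a score matrix can be sketched in base R. This is an illustration under assumptions, not genomation's code: the helper winsorize_mat is hypothetical, the toy matrix is simulated, and clipping at the 0th and 95th percentiles is an assumption about what the winsorize argument in the call above requests.

```r
set.seed(7)
# Toy score matrix: 60 regions x 20 bins, three groups with different signal.
mat <- rbind(matrix(rpois(20 * 20, 1), nrow = 20),
             matrix(rpois(20 * 20, 5), nrow = 20),
             matrix(rpois(20 * 20, 10), nrow = 20))
mat[1, 1] <- 500  # one extreme value that would dominate the color scale

# Hypothetical helper: clip values below/above the given percentiles.
winsorize_mat <- function(m, probs = c(0, 95)) {
  lim <- quantile(m, probs = probs / 100, na.rm = TRUE)
  m[m < lim[1]] <- lim[1]
  m[m > lim[2]] <- lim[2]
  m
}

wmat <- winsorize_mat(mat, c(0, 95))
range(wmat)  # the outlier is pulled down to the 95th percentile

# Cluster regions (rows) into k = 3 groups and order rows by cluster,
# so that similar regions end up next to each other in a heatmap.
km <- kmeans(wmat, centers = 3, nstart = 10)
ordered <- wmat[order(km$cluster), ]
table(km$cluster)
```

Clipping before clustering keeps a handful of extreme regions from stretching the color scale and from dominating the distance computation.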
Multiple profiles
Multiple average profiles can be visualized with heatMeta(). Here we also apply a scaling function to all the matrices.
# take log2 of all matrices
sml2 = scaleScoreMatrixList(sml, scalefun = function(x) log2(x + 1))
heatMeta(sml2, legend.name = "average profiles", xcoords = c(-500, 500),
         xlab = "bp around peaks")
[Figure: heatMeta heatmap of average profiles. Rows: Znf143, Rad21, Suz12, P300, CTCF; x-axis: bp around peaks, -400 to 400; legend: average profiles.]
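The effect of applying one scaling function to every matrix, then averaging each matrix column-wise into a profile, can be sketched in base R. This is not genomation's implementation: a plain list of toy matrices stands in for a ScoreMatrixList, and all names used are hypothetical.

```r
set.seed(3)
# A plain list of toy matrices stands in for a ScoreMatrixList:
# 30 regions x 10 bins per (hypothetical) factor.
sml_toy <- list(
  CTCF = matrix(rpois(300, 8), nrow = 30),
  P300 = matrix(rpois(300, 2), nrow = 30)
)

# Apply the same scaling function to every matrix.
scale_fun <- function(x) log2(x + 1)
sml_scaled <- lapply(sml_toy, scale_fun)

# Average profile per matrix: column means across regions.
# Rows of `profiles` are factors, columns are bins; this is the kind of
# matrix a heatmap of average profiles would display.
profiles <- t(sapply(sml_scaled, colMeans))
dim(profiles)
```

Adding 1 before taking log2 keeps zero scores finite, and the log compresses large counts so that weaker profiles remain visible on a shared scale.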
Multiple profiles
Multiple average profiles can also be visualized with plotMeta().
plotMeta(sml2, profile.names = names(sml2), xcoords = c(-500, 500), main = "mult profiles")
[Figure: plotMeta line plot titled "mult profiles". x-axis: bases, -400 to 400; y-axis: average score; lines: CTCF, P300, Suz12, Rad21, Znf143.]
genomation
package
Altuna Akalın
Usage and
ubiquity of
genomic
interval
summaries
Using
genomation
More
information
Future work
Explore overlap statistics between two genomic data sets DoesTF1 binding site locations overlap with TF2 sites more thanexpectedThis is previously explored with GenometriCorr package Thesefunctionality can be included in the form of a dependencyPerformance improvement on certain functions faster is alwaysbetter
genomation
package
Altuna Akalın
Usage and
ubiquity of
genomic
interval
summaries
Using
genomation
More
information
Further information
The genomation package is available athttpal2nagithubiogenomation You can find the linkto the vignette on the webpage as wellCode that generated this presentation is available athttpgithubcomal2nagenomation_presentation
Questions and bug reportsYou can viewopen issues in githubhttpsgithubcomal2nagenomationissuesstate=open
You can ask questions by sending an e-mail togenomationgooglegroupscom or using the web interface togoogle groups
Developed by Altuna Akalın and Vedran Franke
Session Info
sessionInfo()
R version 3.0.2 (2013-09-25)
Platform: x86_64-apple-darwin10.8.0 (64-bit)

locale:
[1] C

attached base packages:
[1] methods   grid      stats     graphics  grDevices utils     datasets
[8] base

other attached packages:
[1] genomation_0.99.0.2 knitr_1.5

loaded via a namespace (and not attached):
 [1] BSgenome_1.30.0      BiocGenerics_0.8.0   Biostrings_2.30.0
 [4] GenomicRanges_1.14.3 IRanges_1.20.5       MASS_7.3-29
 [7] RColorBrewer_1.0-5   RCurl_1.95-4.1       Rsamtools_1.14.1
[10] XML_3.95-0.2         XVector_0.2.0        bitops_1.0-6
[13] colorspace_1.2-4     data.table_1.8.10    dichromat_2.0-0
[16] digest_0.6.3         evaluate_0.5.1       formatR_0.10
[19] ggplot2_0.9.3.1      gridBase_0.4-6       gtable_0.1.2
[22] highr_0.3            impute_1.36.0        labeling_0.2
[25] munsell_0.4.2        parallel_3.0.2       plyr_1.8
[28] proto_0.3-10         reshape2_1.2.2       rtracklayer_1.22.0
[31] scales_0.2.3         stats4_3.0.2         stringr_0.6.2
[34] tools_3.0.2          zlibbioc_1.8.0
Extraction of data over pre-defined genomic regions
ScoreMatrix() and ScoreMatrixBin() are functions used to extract data over predefined windows.
ScoreMatrix is used when all of the windows have the same width (e.g. a region around the TSS).
ScoreMatrixBin is designed for use with windows of unequal width (e.g. enrichment of methylation over exons).
data(cage)
data(promoters)
sm <- ScoreMatrix(target = cage, windows = promoters)
sm
## scoreMatrix with dims: 1055 2001
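The difference between the two functions can be sketched in base R without genomation. The example below assumes a per-base score vector for a single chromosome: equal-width windows map directly to matrix rows, while unequal-width windows are first averaged into a fixed number of bins. The data, the helper bin_window() and the object names are hypothetical, and the binning shown here (mean per bin) is only one plausible choice, not a statement about the package's internals.

```r
# Toy per-base score track for one chromosome (hypothetical data).
scores <- c(0, 0, 1, 3, 5, 3, 1, 0, 0, 2, 4, 2, 0, 0, 0, 0, 1, 1, 1, 0)

# Equal-width windows: one row per window, one column per base.
starts <- c(2, 9, 15)
width  <- 5
sm_toy <- t(sapply(starts, function(s) scores[s:(s + width - 1)]))

# Unequal-width windows: average each window into a fixed number of bins
# so that all windows end up with the same number of columns.
bin_window <- function(x, bin.num) {
  bins <- cut(seq_along(x), breaks = bin.num, labels = FALSE)
  as.numeric(tapply(x, bins, mean))
}
win_list <- list(scores[1:6], scores[8:19])   # widths 6 and 12
smb_toy  <- t(sapply(win_list, bin_window, bin.num = 3))

dim(sm_toy)   # 3 windows x 5 bases
dim(smb_toy)  # 2 windows x 3 bins
```

In both cases the result is a rectangular matrix with one row per window, which is what makes the later heatmap and meta-profile plots possible.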
Visualizing ScoreMatrix: summary of genomic intervals over pre-defined regions
plotMeta(), heatMeta(), heatMatrix() and multiHeatMatrix() are the visualization functions.
# save the outer margins so they can be restored afterwards
oldoma <- par()$oma
par(oma = c(0, 0, 0, 0))
heatMatrix(sm, xcoords = c(-1000, 1000))
plotMeta(sm, xcoords = c(-1000, 1000), line.col = "blue")
par(oma = oldoma)
[Figure: heatMatrix() heatmap (left) and plotMeta() profile (right) of sm; x-axis: bases from -1000 to 1000; y-axis of the profile: average score]
Working with BAM files
BAM files can also be used in the ScoreMatrix() and ScoreMatrixBin() functions.
bamfile = system.file("tests/test.bam", package = "genomation")
windows = GRanges(rep(c(1, 2), each = 2),
                  IRanges(rep(c(1, 2), times = 2), width = 5))
scores3 = ScoreMatrix(target = bamfile, windows = windows, type = "bam")
Working with bigWig files
ScoreMatrix() and ScoreMatrixBin() can also handle bigWig files. Here we use ENCODE DHS scores downloaded from http://goo.gl/fEVu0g.
mybed12file = system.file("extdata/chr21.refseq.hg19.bed", package = "genomation")
feats = readTranscriptFeatures(mybed12file, up.flank = 500, down.flank = 500)
sm = ScoreMatrix(target = "wgEncodeUwDnaseA549RawRep1.bw",
                 windows = feats$promoters, type = "bigWig", strand.aware = TRUE)
plotMeta(sm, xcoords = c(-500, 500), main = "DHS around TSS", line.col = "blue")
[Figure: plotMeta() profile "DHS around TSS"; x-axis: bases; y-axis: average score]
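The strand.aware = TRUE argument matters for promoters because genes lie on both strands. The base R sketch below assumes that strand awareness means reversing the columns of minus-strand windows so that every row reads in the direction of transcription; that is how the option is commonly described, but treat it as an assumption here. The matrix, the strand vector and the helper flip_minus() are hypothetical.

```r
# Toy score matrix: rows are windows, columns are positions in genomic order.
toy <- matrix(c(1, 2, 3, 4,
                5, 6, 7, 8,
                4, 3, 2, 1), nrow = 3, byrow = TRUE)
strand <- c("+", "-", "-")

# Reverse the columns of minus-strand rows so that every row reads 5' to 3'.
flip_minus <- function(mat, strand) {
  minus <- strand == "-"
  mat[minus, ] <- mat[minus, ncol(mat):1, drop = FALSE]
  mat
}
oriented <- flip_minus(toy, strand)

# Average profile with and without taking strand into account.
colMeans(toy)
colMeans(oriented)
```

Without the flip, signal upstream of minus-strand genes would be averaged together with signal downstream of plus-strand genes, which blurs the profile around the TSS.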
Multiple profiles
Multiple heatmap profiles can be plotted using multiHeatMatrix(), which takes a ScoreMatrixList object. Here we use the CTCF, P300, Suz12, Rad21 and Znf143 BAM files from the genomationData package.
ctcfpeaks = readRDS("ctcfpeaks.rds")
dataPath = system.file("extdata", package = "genomationData")
bamfiles = list.files(dataPath, full.names = TRUE, pattern = "bam$")[c(1:4, 6)]
sml = ScoreMatrixList(bamfiles, ctcfpeaks, bin.num = 50, type = "bam")
names(sml) = c("CTCF", "P300", "Suz12", "Rad21", "Znf143")
multiHeatMatrix(sml, xcoords = c(-500, 500), cex.axis = 0.35, common.scale = TRUE,
                col = c("lightgray", "blue"), winsorize = c(0, 95))
[Figure: multiHeatMatrix() heatmaps for CTCF, P300, Suz12, Rad21 and Znf143; x-axis: -500 to 500 bp around the CTCF peaks; common color scale from 0 to 8]
Multiple profiles
multiHeatMatrix() can also apply k-means clustering. Extreme values are trimmed using the "winsorize" argument.
multiHeatMatrix(sml, xcoords = c(-500, 500), kmeans = TRUE, k = 3,
                common.scale = TRUE, cex.axis = 0.4,
                col = c("lightgray", "blue"), winsorize = c(0, 95))
[Figure: multiHeatMatrix() heatmaps for CTCF, P300, Suz12, Rad21 and Znf143 with rows grouped into k-means clusters 1, 2 and 3; x-axis: -500 to 500 bp; common color scale from 0 to 8]
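The two ideas on this slide, trimming extreme values and clustering rows, can be sketched in base R. The example assumes that winsorize = c(0, 95) means that values above the 95th percentile are set to the 95th percentile value; the helper winsorize_mat(), the toy matrix and the choice of Lloyd's algorithm are illustrative assumptions, not genomation's internals.

```r
# Clip values outside the given percentiles (a simple winsorization).
winsorize_mat <- function(mat, probs = c(0, 95)) {
  lim <- quantile(mat, probs = probs / 100, na.rm = TRUE)
  pmin(pmax(mat, lim[1]), lim[2])
}

set.seed(1)  # k-means picks its starting centers at random

# Toy matrix: 20 windows with a central peak, 20 flat windows, one outlier.
peak <- matrix(rep(c(0, 2, 8, 2, 0), each = 20), nrow = 20)
flat <- matrix(1, nrow = 20, ncol = 5)
toy  <- rbind(peak, flat)
toy[1, 3] <- 1000

# Trim the outlier, then cluster the rows into two groups.
clipped  <- winsorize_mat(toy, probs = c(0, 95))
clusters <- kmeans(clipped, centers = 2, algorithm = "Lloyd")$cluster

# Order rows by cluster, as a clustered heatmap would display them.
ordered <- clipped[order(clusters), ]
table(clusters)
```

Trimming before clustering keeps a single extreme window from dominating both the color scale and the distance calculations, so the remaining structure in the heatmap stays visible.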
Multiple profiles
Multiple average profiles can be visualized with heatMeta(). Here we also apply a scaling function to all the matrices.
# take log2 of all matrices
sml2 = scaleScoreMatrixList(sml, scalefun = function(x) log2(x + 1))
heatMeta(sml2, legend.name = "average profiles", xcoords = c(-500, 500),
         xlab = "bp around peaks")
[Figure: heatMeta() heatmap of average profiles, one row each for Znf143, Rad21, Suz12, P300 and CTCF; x-axis: bp around peaks (-500 to 500); colour legend: average profiles]
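To make the scaling step concrete, here is a minimal base-R sketch of what the slide describes: apply log2(x + 1) to every matrix in a list, then reduce each matrix to one average profile. It does not call genomation; the toy matrices are made-up values, the names `toy_list`, `scale_fun` and `avg_profiles` are hypothetical, and it assumes the average profile is the column-wise mean.

```r
# Toy stand-ins for score matrices: rows = regions, columns = positions.
# The values are simulated for illustration only.
set.seed(1)
toy_list <- list(
  CTCF = matrix(rpois(20, lambda = 8), nrow = 4),
  P300 = matrix(rpois(20, lambda = 2), nrow = 4)
)

# Same transformation as the scalefun on the slide: log2(x + 1).
scale_fun  <- function(x) log2(x + 1)
toy_scaled <- lapply(toy_list, scale_fun)

# One average profile per matrix: the mean score at each position.
# sapply() returns positions x samples, so transpose to samples x positions.
avg_profiles <- t(sapply(toy_scaled, colMeans))
print(round(avg_profiles, 2))
```

The resulting matrix has one row per sample and one column per position, which is the shape a meta-profile heatmap displays.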
Multiple profiles
Multiple average profiles can also be visualized with plotMeta().
plotMeta(sml2, profile.names = names(sml2), xcoords = c(-500, 500),
         main = "mult profiles")
[Figure: plotMeta() line plot titled "mult profiles"; x-axis: bases (-500 to 500); y-axis: average score; one line each for CTCF, P300, Suz12, Rad21 and Znf143]