ACE & RACE: annotation of
complex/combinatorial expressions
Self-introduction
Andrey Zinovyev
M.Sc. in theoretical physics (1997)
Ph.D. in computer science (2001), "Method of elastic maps and applications in bioinformatics"
Programming, industrial information systems (C++, Delphi)
Web-services development (Java, JSP)
Senior postdoctoral fellow at IHES, France
http://www.ihes.fr/~zinovyev or type “zinovyev” in Google
Plan of the talk
ACE framework: introduction, what we have
What will be in RACE?
ACE software: C++ code, web-application
Plans for ACE and RACE
Computational environment
Genome as database: everything is annotation
ATGCGTGCAAATGCTCTTTGTGTAACGTGTCGACGTACGTGTGTAACGTGCGACGTACGT
Genomes: human, chimp, mouse, rat
Gene annotation
Probability profiles
TF1
TF2
b.ace
RNA structures
r.ace
Microarrays
m.ace
common format for annotation files (binary p-files)
Genome preprocessing: compile once, run everywhere
ATGCGTGCAAATGCTCTTTGTGTAACGTGTCGACGTACGTGTGTAACGTGCGACGTACGT
b.ace: potential TF binding sites
r.ace: potential RNA structures, splicing sites
m.ace: gene expression data
c.ace: chromatin structure and dynamics
ace.annotate, ace.RNAtools, ace.annotate, ace.map, arc
ace.enhance, ace.cluster, ace.display, ace.dyCr, ace.stat
Structure space: the truth is out there
set of annotations
Structure space
Multidimensional combinatorial space of all possible structures appearing in a scanning window
ace.enhance: be more abstract
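As a rough illustration of the scanning-window idea, the sketch below anchors a fixed-width window at each annotated hit and records which combination of annotation labels falls inside it. The `(position, label)` input format, the function name `window_structures`, and the choice to anchor windows at hit positions are all assumptions made for this example; it is not the ACE implementation.

```python
from collections import Counter

def window_structures(hits, window):
    """Count label combinations seen in windows anchored at each hit.

    hits:   list of (position, label) tuples (hypothetical input format)
    window: window width in bp
    """
    hits = sorted(hits)
    counts = Counter()
    for i, (start, _) in enumerate(hits):
        labels = []
        for pos, label in hits[i:]:
            if pos >= start + window:
                break  # hits are sorted, so nothing further fits
            labels.append(label)
        # A "structure" here is just the sorted multiset of labels.
        counts[tuple(sorted(labels))] += 1
    return counts

hits = [(10, "TF1"), (25, "TF2"), (40, "TF1"), (200, "TF2")]
structures = window_structures(hits, window=50)
```

Each distinct key of `structures` is one point of a (very simplified) structure space; a real structure would also carry order, spacing and strand.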
Accessing and masking structure space: ace.enhance
expression (heuristic mask)
Method_01, Method_02
Method_11, …
ace.enhance annotation
view in genome browser (ace.display)
compare with experiment (cross-annotation)
(ace.dyCr) construct more abstract space and apply ace.enhance further
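Comparing a computed annotation with experimental data can be pictured as an interval-overlap query. The following is a minimal sketch under stated assumptions: annotations are half-open `(start, end)` intervals on one chromosome, and `cross_annotate` is a hypothetical name, not a function of ace.dyCr.

```python
def overlaps(a, b):
    """True if half-open intervals a=(start, end) and b=(start, end) overlap."""
    return a[0] < b[1] and b[0] < a[1]

def cross_annotate(predicted, experimental):
    """Map each predicted interval to the experimental intervals it overlaps."""
    return {p: [e for e in experimental if overlaps(p, e)] for p in predicted}

predicted = [(100, 150), (400, 450)]
experimental = [(140, 200), (300, 350)]
matches = cross_annotate(predicted, experimental)
supported = [p for p, found in matches.items() if found]
```

The quadratic scan is only suitable for small examples; for genome-scale data a sorted sweep or an interval tree would be the usual choice.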
b.ace
TF1
TF2
Transfac release Genome release
b.ace: ~1.2 Tbyte
ace.annotate
ace.enhance
Enhance methods:
1. Fixed spacing of sites
2. Fixed order of sites
3. Fixed strand orientation of sites
4. Multiple copies of site
5. Minimal spacing of sites
6. Maximal spacing of sites
7. Variable, defined spacing between sites
8. Minimal p-value for weight matrix
9. Maximal p-value for weight matrix
10. Bias weight-matrix
M1&&M2||M3||M4||M5
… + ace.cluster: simplified version of enhance for detecting clusters of repetitions of one motif
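A boolean expression over methods, such as M1&&M2||M3, can be read as a mask applied to every scanning window. The sketch below is an assumption-laden illustration: the three method constructors, the `(position, label)` site format, and the C-like precedence (&& binding tighter than ||) are choices made for this example and are not taken from the ace.enhance query language.

```python
def min_copies(label, n):
    """Method: at least n copies of a site (cf. 'multiple copies of site')."""
    return lambda sites: sum(lab == label for _, lab in sites) >= n

def max_spacing(a, b, d):
    """Method: some a-site and some b-site lie at most d bp apart."""
    def method(sites):
        pos_a = [p for p, lab in sites if lab == a]
        pos_b = [p for p, lab in sites if lab == b]
        return any(abs(x - y) <= d for x in pos_a for y in pos_b)
    return method

def fixed_order(a, b):
    """Method: some a-site lies upstream of some b-site."""
    def method(sites):
        pos_a = [p for p, lab in sites if lab == a]
        pos_b = [p for p, lab in sites if lab == b]
        return any(x < y for x in pos_a for y in pos_b)
    return method

m1 = fixed_order("TF1", "TF2")
m2 = max_spacing("TF1", "TF2", 20)
m3 = min_copies("TF2", 3)

def mask(sites):
    # M1 && M2 || M3, assuming && binds tighter than ||
    return (m1(sites) and m2(sites)) or m3(sites)

window_a = [(100, "TF1"), (112, "TF2")]   # ordered and close together
window_b = [(100, "TF2"), (160, "TF1")]   # wrong order, single TF2 copy
accepted = [mask(window_a), mask(window_b)]
```

Building each method as a closure keeps the expression itself a plain boolean formula, which is one simple way to make methods pluggable.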
Example 1: 4 transcription factors, chr14 of UCSC_HG15
rarHS – 659,631 hits
cMyb – 1,647,505 hits
CEBP – 1,189,196 hits
PU.1 – 472,383 hits
ace.annotate =>
ace.enhance expression, window 50bp:PU.1 && rarHS — rarHS || rarHS — rarHS && CEBP< cMyb
11**
8**
Result: 102 hits
[Figure: hit locations with strands drawn 5’ to 3’; labels 14.1, 14.2, 14.3]
Example 2: clusters of motifs, chr14
jfl_im = TAGAGA, TAGAGT, TAGGGA, TAGGGT
ace.annotate => 183,389 hits
ace.enhance expression, window 300 bp: jfl_im, 10 copies
Result: 51 hits in 5 groups
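The cluster query in this example can be sketched in a few lines. The four jfl_im variants are written as the pattern TAG[AG]G[AT]; everything else (forward strand only, the function names, merging of overlapping windows into groups) is an assumption for illustration and not how ace.cluster is implemented.

```python
import re

# TAGAGA, TAGAGT, TAGGGA, TAGGGT; the lookahead also reports overlapping matches.
MOTIF = re.compile(r"(?=TAG[AG]G[AT])")

def motif_positions(seq):
    """Start positions of all motif matches on the forward strand."""
    return [m.start() for m in MOTIF.finditer(seq)]

def clusters(positions, window, copies):
    """Merged (first, last) position pairs where `copies` hits start within `window` bp."""
    found = []
    for i in range(len(positions) - copies + 1):
        first, last = positions[i], positions[i + copies - 1]
        if last - first < window:
            if found and first <= found[-1][1]:
                found[-1] = (found[-1][0], last)   # extend the current group
            else:
                found.append((first, last))        # start a new group
    return found

# Toy sequence: three copies, a 50 bp spacer, then two more copies.
seq = "TAGAGA" * 3 + "C" * 50 + "TAGGGT" * 2
pos = motif_positions(seq)
groups = clusters(pos, window=30, copies=3)
```

With the parameters of the slide, the call would be `clusters(pos, window=300, copies=10)` on the chr14 hit positions.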
ACE C++ tools
aceLib: wraps system-dependent code; generic programming for code reusability
ace.annotate – probability-based annotations and motif search
ace.enhance – accessing (masking) structure space: combinatorial query language
ace.cluster – extracting clusters of repetitions: simplified version of enhance
ace.dyCr – first step in structure space analysis: dynamic cross-annotation
ace.stat – statistical significance analysis
ACE web-application (JSP): ace.uit
database layout: .ace
modules layout: ace.rte/ace.annotate
modules layout: ace.rte/ace.enhance
data layout: my.ace
documentation layout: ace.doc
Plans with ACE: principal problem
false-positive rate
ace.stat: statistical model of random noise, maximum entropy principle, significance analysis
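To make the false-positive problem concrete, here is one simple noise model, offered as a sketch and not as the model used by ace.stat: bases are drawn independently with fixed frequencies (the maximum-entropy choice when only base composition is constrained), and the number of chance matches is approximated as Poisson. The Poisson step ignores overlaps between matches, and all names below are hypothetical.

```python
import math

def expected_hits(motifs, base_freq, seq_len):
    """Expected number of exact motif matches in an independent-base random sequence."""
    total = 0.0
    for motif in motifs:
        p = math.prod(base_freq[b] for b in motif)   # match probability at one position
        total += p * (seq_len - len(motif) + 1)      # number of possible start positions
    return total

def poisson_tail(k, lam):
    """P(X >= k) for X ~ Poisson(lam)."""
    cdf = sum(math.exp(-lam) * lam**i / math.factorial(i) for i in range(k))
    return max(0.0, 1.0 - cdf)

uniform = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
variants = ["TAGAGA", "TAGAGT", "TAGGGA", "TAGGGT"]
lam = expected_hits(variants, uniform, seq_len=4101)   # 4096 start positions
p_value = poisson_tail(10, lam)                        # chance of 10 or more hits
```

A small `p_value` for an observed count suggests the cluster is unlikely under this noise model; it says nothing about biological function.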
Plans with ACE: visualizing structure space
creating 2D maps of structure space
data visualization, dimension reduction
ace.eva, ace.net
Plans with ACE: integrating m.ace
m.ace
ace.map
Plans with ACE: model of chromatin structure and dynamics
c.ace
immunoprecipitation experiments
chromatin state profiles
silencing structures in space
arc
Plans with ACE: comparative genomics
genome1 genome2
Installation of b.ace in Lille: http://ace.ibl.fr
1.2 Tbyte PowerVault storage, Dell PowerEdge server
Installation of RACE in Sherbrooke (golf)
UCSC
local UCSC browser
Gbrowser
LISA DB ace
r.ace DB
Distributed environment: database synchronization protocol
b.ace: Lille, France
public dbs
new genome release
where?
r.ace: Sherbrooke, Canada
m.ace: INSERM, Paris
c.ace: IHES, Paris
LISA: Sherbrooke, Canada
RACE platform for integration
ace.annotate: find simple motifs (loops, hairpins)
ace.RNAtools: pluggable algorithms
p-files (r.ace database)
ace.enhance: pluggable methods
ace.display, ace.stat, ace.dyCr
ACE team
aceLib, ace C++: Thomas Bücher, Inst.Neur.
arc : Graham Smith, IHES
ace.map : Sebastian Noth, INSERM
ace team leader : Arndt Benecke, IHES
ace.stat : Richard Madden, UdSh
ace.uit, ace C++: Andrey Zinovyev, IHES