ACE & RACE annotation of complex/combinatorial expressions

Page 1: ACE & RACE: annotation of complex/combinatorial expressions

Page 2: Self-introduction

Andrey Zinovyev
- M.Sc. in theoretical physics (1997)
- Programming, industrial information systems (C++, Delphi)
- Ph.D. in computer science (2001): method of elastic maps and applications in bioinformatics
- Web-services development (Java, JSP)
- Senior postdoctoral fellow at IHES, France

http://www.ihes.fr/~zinovyev, or type “zinovyev” in Google

Page 3: Plan of the talk

- ACE framework: introduction, what we have
- What will be in RACE?
- ACE software: C++ code, web-application
- Plans for ACE and RACE
- Computational environment

Page 4: Genome as database: everything is annotation

ATGCGTGCAAATGCTCTTTGTGTAACGTGTCGACGTACGTGTGTAACGTGCGACGTACGT

Genomes: human, chimp, mouse, rat

Annotation layers:
- Gene annotation
- Probability profiles (TF1, TF2): b.ace
- RNA structures: r.ace
- Microarrays: m.ace

Common format for annotation files (binary p-files)
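The slides name a common binary format for annotation files (p-files) but do not describe its layout. As a minimal sketch of the idea of computing a per-position profile once and reloading it from any tool, the Python below assumes a hypothetical layout: a 4-byte magic `PFIL`, a little-endian uint32 position count, and one little-endian float32 score per genome position. The names `write_pfile` and `read_pfile` are illustrative, not part of ACE.

```python
import os
import struct
import sys
import tempfile
from array import array

# Hypothetical p-file layout (an assumption; the slides do not specify it):
#   4-byte magic b"PFIL", little-endian uint32 count of positions,
#   then one little-endian float32 score per genome position.
MAGIC = b"PFIL"
HEADER = struct.Struct("<4sI")


def write_pfile(path, scores):
    """Store a per-position probability profile once, in binary form."""
    values = array("f", scores)  # C float, 4 bytes on common platforms
    if sys.byteorder == "big":
        values.byteswap()  # the layout above is little-endian
    with open(path, "wb") as fh:
        fh.write(HEADER.pack(MAGIC, len(values)))
        values.tofile(fh)


def read_pfile(path):
    """Load a profile written by write_pfile as a list of floats."""
    with open(path, "rb") as fh:
        magic, count = HEADER.unpack(fh.read(HEADER.size))
        if magic != MAGIC:
            raise ValueError(f"{path}: not a p-file")
        values = array("f")
        values.fromfile(fh, count)
    if sys.byteorder == "big":
        values.byteswap()
    return values.tolist()


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "tf1.p")
    write_pfile(path, [0.0, 0.25, 0.5, 1.0])
    print(read_pfile(path))  # [0.0, 0.25, 0.5, 1.0]
```

A fixed-width binary array makes position `i` directly addressable at byte offset `8 + 4 * i`, which is what lets a precomputed genome-wide profile be reused without reparsing.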

Page 5: Genome preprocessing: compile once, run everywhere

ATGCGTGCAAATGCTCTTTGTGTAACGTGTCGACGTACGTGTGTAACGTGCGACGTACGT

- b.ace: potential TF binding sites
- r.ace: potential RNA structures, splicing sites
- m.ace: gene expression data
- c.ace: chromatin structure and dynamics

Preprocessing tools: ace.annotate, ace.RNAtools, ace.map, arc
Analysis tools: ace.enhance, ace.cluster, ace.display, ace.dyCr, ace.stat

Page 6: Structure space: the truth is out there

Set of annotations => structure space: the multidimensional combinatorial space of all possible structures appearing in a scanning window
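A minimal Python sketch of how a scanning window turns a set of annotations into points of structure space. It assumes each annotation is a hypothetical `(name, position)` pair and that a structure is just the set of annotation names present in the window, ignoring order, spacing and strand (the properties the enhance methods look at). `window_structures` is an illustrative name.

```python
from collections import Counter


def window_structures(annotations, window, step=1):
    """Count how many scanning windows show each combination of annotations.

    annotations: list of (name, position) pairs, e.g. TF hits.
    Returns a Counter mapping frozenset of names -> number of windows.
    """
    if not annotations:
        return Counter()
    hits = sorted(annotations, key=lambda a: a[1])
    first, last = hits[0][1], hits[-1][1]
    structures = Counter()
    # every window [start, start + window) that contains at least one hit
    for start in range(first - window + 1, last + 1, step):
        end = start + window
        names = frozenset(name for name, pos in hits if start <= pos < end)
        if names:
            structures[names] += 1
    return structures
```

For example, hits `("TF1", 10)`, `("TF2", 14)` and `("TF1", 40)` in a 10bp window give three distinct structures: `{TF1}` in 14 windows, `{TF1, TF2}` in 6 and `{TF2}` in 4. The quadratic scan is only meant to make the definition concrete.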

Page 7: ace.enhance: be more abstract

Accessing and masking structure space with ace.enhance:
- write an expression (heuristic mask) over Method_01, Method_02, Method_11…
- ace.enhance produces an annotation
- view it in the genome browser (ace.display)
- compare it with experiment (cross-annotation)
- construct a more abstract space (ace.dyCr) and apply ace.enhance further
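The masking step can be illustrated with a small evaluator for expressions such as `M1&&M2||M3`. This sketch assumes each method reports a boolean per window and that `&&` binds tighter than `||`, as in C; the real ace.enhance query language is not specified in the slides, and `evaluate` and `mask` are hypothetical names.

```python
import re

# names may contain dots, e.g. PU.1
TOKEN = re.compile(r"\s*(&&|\|\||\(|\)|!|[A-Za-z_][A-Za-z0-9_.]*)")


def tokenize(expr):
    tokens, pos = [], 0
    expr = expr.strip()
    while pos < len(expr):
        m = TOKEN.match(expr, pos)
        if not m:
            raise SyntaxError(f"unexpected input at {pos}: {expr[pos:]!r}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens


def evaluate(expr, hits):
    """Evaluate an enhance-style boolean expression for one window.

    hits: dict mapping method name -> bool (did the method fire here?).
    Grammar (assumed): or := and ('||' and)*; and := not ('&&' not)*;
    not := '!' not | '(' or ')' | name.
    """
    tokens = tokenize(expr)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take(expected=None):
        nonlocal pos
        tok = peek()
        if tok is None or (expected is not None and tok != expected):
            raise SyntaxError(f"expected {expected!r}, got {tok!r}")
        pos += 1
        return tok

    def parse_or():
        value = parse_and()
        while peek() == "||":
            take("||")
            rhs = parse_and()  # always parse the operand, then combine
            value = value or rhs
        return value

    def parse_and():
        value = parse_not()
        while peek() == "&&":
            take("&&")
            rhs = parse_not()
            value = value and rhs
        return value

    def parse_not():
        tok = take()
        if tok == "!":
            return not parse_not()
        if tok == "(":
            value = parse_or()
            take(")")
            return value
        if tok in ("&&", "||", ")"):
            raise SyntaxError(f"unexpected {tok!r}")
        return bool(hits.get(tok, False))  # unknown method: did not fire

    result = parse_or()
    if peek() is not None:
        raise SyntaxError(f"trailing token {peek()!r}")
    return result


def mask(expr, windows):
    """Indices of windows whose method results satisfy expr."""
    return [i for i, hits in enumerate(windows) if evaluate(expr, hits)]
```

With C-like precedence, `M3||M1&&M2` reads as `M3||(M1&&M2)`, so a window where only `M3` fires still passes the mask.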

Page 8: b.ace

TF1, TF2 (probability profiles)

Transfac release + genome release => ace.annotate => b.ace (~1.2 Tbyte)
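ace.annotate is described as producing probability-based annotations from Transfac and genome releases. As a stand-in, not ACE's actual scoring, the sketch below uses the common log-odds position weight matrix scan on both strands; the pseudocount of 0.5, the uniform 0.25 background and the score threshold are assumptions, and `log_odds` and `scan` are hypothetical names.

```python
import math

BASES = "ACGT"
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def log_odds(counts, background=0.25, pseudocount=0.5):
    """Convert a count matrix (one dict per position) to log2-odds scores."""
    matrix = []
    for column in counts:
        total = sum(column[b] for b in BASES) + 4 * pseudocount
        matrix.append({b: math.log2((column[b] + pseudocount) / total / background)
                       for b in BASES})
    return matrix


def scan(sequence, matrix, threshold):
    """Return (position, strand, score) for each window scoring >= threshold."""
    width = len(matrix)
    hits = []
    for i in range(len(sequence) - width + 1):
        site = sequence[i:i + width]
        if any(b not in BASES for b in site):
            continue  # skip N and other ambiguity codes
        reverse = site.translate(COMPLEMENT)[::-1]  # reverse complement
        for strand, s in (("+", site), ("-", reverse)):
            score = sum(col[b] for col, b in zip(matrix, s))
            if score >= threshold:
                hits.append((i, strand, score))
    return hits
```

A matrix built from 10 counts each of T, G, A scores a perfect `TGA` at 3·log2(3.5), about 5.42; any mismatch drops the score to about 1.03, so a threshold of 5.0 keeps only exact sites. Running such a scan for every matrix over a whole genome is what makes the stored result as large as the ~1.2 Tbyte quoted above.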

Page 9: ace.enhance

Enhance methods:
1. Fixed spacing of sites
2. Fixed order of sites
3. Fixed strand orientation of sites
4. Multiple copies of a site
5. Minimal spacing of sites
6. Maximal spacing of sites
7. Variable, defined spacing between sites
8. Minimal p-value for weight matrix
9. Maximal p-value for weight matrix
10. Bias weight-matrix

Methods combine into boolean expressions, e.g. M1&&M2||M3||M4||M5

… + ace.cluster: simplified version of enhance for detecting clusters of repetitions of one motif
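Methods 1 to 7 constrain sites relative to each other. A minimal sketch, assuming each hit is a hypothetical `Site(name, start, strand)` and that spacing is measured start to start (the slides do not say which convention ACE uses). The p-value methods 8 to 10 are left out because they depend on the scoring model; `pairs` and `copies` are illustrative names.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Site:
    name: str
    start: int   # 0-based position of the site's first base
    strand: str  # "+" or "-"


def pairs(sites, first, second, min_gap=None, max_gap=None,
          allowed_gaps=None, ordered=True, same_strand=False):
    """Find (a, b) pairs of sites satisfying spacing, order and strand rules.

    Mapping to the enhance methods (an interpretation):
      1 fixed spacing      -> allowed_gaps={d}
      2 fixed order        -> ordered=True (b must start after a)
      3 strand orientation -> same_strand=True
      5 minimal spacing    -> min_gap
      6 maximal spacing    -> max_gap
      7 variable, defined  -> allowed_gaps={d1, d2, ...}
    """
    a_sites = [s for s in sites if s.name == first]
    b_sites = [s for s in sites if s.name == second]
    found = []
    for a, b in product(a_sites, b_sites):
        if a is b:
            continue
        gap = b.start - a.start
        if ordered and gap <= 0:
            continue
        gap = abs(gap)
        if min_gap is not None and gap < min_gap:
            continue
        if max_gap is not None and gap > max_gap:
            continue
        if allowed_gaps is not None and gap not in allowed_gaps:
            continue
        if same_strand and a.strand != b.strand:
            continue
        found.append((a, b))
    return found


def copies(sites, name, k):
    """Method 4: at least k copies of the same site."""
    return sum(1 for s in sites if s.name == name) >= k
```

Each predicate yields a yes/no answer per window, which is exactly the input a boolean combination such as `M1&&M2||M3` needs.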

Page 10: Example 1: 4 transcription factors, chr14 of UCSC_HG15

- rarHS: 659,631 hits
- cMyb: 1,647,505 hits
- CEBP: 1,189,196 hits
- PU.1: 472,383 hits

ace.annotate =>

ace.enhance expression, window 50bp:PU.1 && rarHS — rarHS || rarHS — rarHS && CEBP< cMyb

Result: 102 hits

[Figure: site arrangements drawn 5' to 3' with labels 11** and 8**; hit locations on chr14 bands 14.1, 14.2, 14.3]

Page 11: Example 2: clusters of motifs, chr14

jfl_im = TAGAGA, TAGAGT, TAGGGA, TAGGGT

ace.annotate => 183,389 hits

ace.enhance expression, window 300bp: jfl_im 10 copies

Result: 51 hits in 5 groups
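The cluster search in this example (several motif variants, at least 10 copies within 300bp) can be sketched in Python as a two-pointer scan over motif occurrences. It assumes forward-strand exact matches only and merges overlapping qualifying windows into one group; how ace.cluster actually counts "hits" and "groups" is not stated, so `find_occurrences`, `clusters` and the merge rule are illustrative.

```python
import re


def find_occurrences(sequence, variants):
    """Start positions of every occurrence (overlaps included) of any variant."""
    pattern = re.compile("(?=(" + "|".join(map(re.escape, variants)) + "))")
    return [m.start() for m in pattern.finditer(sequence)]


def clusters(sequence, variants, window, min_copies):
    """Merge windows holding >= min_copies occurrences into (start, end) groups.

    Intervals are half-open. Assumes all variants have the same length.
    """
    width = len(variants[0])
    starts = find_occurrences(sequence, variants)
    intervals = []
    j = 0
    for i, left in enumerate(starts):
        # advance j so starts[i:j] are the occurrences beginning in
        # [left, left + window); j never moves backwards
        while j < len(starts) and starts[j] < left + window:
            j += 1
        if j - i >= min_copies:
            intervals.append((left, starts[j - 1] + width))
    merged = []
    for start, end in intervals:  # already sorted by start
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged
```

With the slide's parameters the call would be `clusters(chr14_sequence, ["TAGAGA", "TAGAGT", "TAGGGA", "TAGGGT"], window=300, min_copies=10)`, where `chr14_sequence` is a placeholder for the loaded chromosome. The lookahead regex is what lets overlapping copies such as those in `TAGAGAGA` all be counted.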

Page 12: ACE C++ tools

- aceLib: wraps system-dependent code; generic programming for code reusability
- ace.annotate: probability-based annotations and motif search
- ace.enhance: accessing (masking) structure space; combinatorial query language
- ace.cluster: extracting clusters of repetitions; simplified version of enhance
- ace.dyCr: first step in structure space analysis; dynamic cross-annotation
- ace.stat: statistical significance analysis
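ace.dyCr is described only as dynamic cross-annotation. The basic operation under any cross-annotation is finding which intervals of one annotation track overlap intervals of another; below is a minimal sketch with half-open integer intervals and binary search over sorted starts. `cross_annotate` is a hypothetical name, and the "dynamic" part of ace.dyCr is not modeled.

```python
from bisect import bisect_left


def cross_annotate(track_a, track_b):
    """For each half-open (start, end) interval in track_a, list overlapping track_b intervals.

    Coordinates are integers; returns a list of (interval_a, [overlapping intervals]).
    """
    b_sorted = sorted(track_b)
    b_starts = [start for start, _ in b_sorted]
    longest = max((end - start for start, end in b_sorted), default=0)
    result = []
    for start, end in track_a:
        # an overlapping b-interval must start in (start - longest, end)
        lo = bisect_left(b_starts, start - longest + 1)
        hi = bisect_left(b_starts, end)
        overlaps = [(s, e) for s, e in b_sorted[lo:hi] if s < end and e > start]
        result.append(((start, end), overlaps))
    return result
```

Because intervals are half-open, `(20, 25)` does not overlap `(10, 20)`; the longest-interval bound keeps the candidate slice small when track_b intervals have similar lengths.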

Page 13: ACE web-application (JSP): ace.uit

Page 14: database layout: .ace

Page 15: modules layout: ace.rte/ace.annotate

Page 16: modules layout: ace.rte/ace.enhance

Page 17: data layout: my.ace

Page 18: documentation layout: ace.doc

Page 19: Plans with ACE: principal problem

False-positive rate

ace.stat:
- statistical model of random noise
- maximum entropy principle
- significance analysis
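One way to combine a "statistical model of random noise" with the maximum entropy principle: if the only constraints are the single-base frequencies, the maximum-entropy sequence model treats bases as independent with those frequencies. The sketch below derives the expected number of random hits for a set of equal-length words from that model and a Poisson p-value for an observed count. The Poisson approximation ignores overlapping and self-overlapping occurrences, and all names are hypothetical; this is not ace.stat itself.

```python
import math
from collections import Counter


def base_frequencies(sequence):
    """Single-base frequencies: the only constraints of this maximum-entropy model."""
    counts = Counter(b for b in sequence.upper() if b in "ACGT")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no A/C/G/T")
    return {b: counts[b] / total for b in "ACGT"}


def word_probability(word, freqs):
    """Chance that a given position starts `word` when bases are independent."""
    return math.prod(freqs[b] for b in word)


def poisson_tail(observed, expected):
    """P(X >= observed) for X ~ Poisson(expected), terms computed in log space."""
    if observed <= 0:
        return 1.0
    if expected <= 0.0:
        return 0.0

    def pmf(k):
        return math.exp(k * math.log(expected) - expected - math.lgamma(k + 1))

    if observed > expected:
        # sum the upper tail directly; terms shrink once k exceeds `expected`
        total, k = 0.0, observed
        while True:
            term = pmf(k)
            total += term
            if term == 0.0 or term < 1e-16 * total:
                return total
            k += 1
    # otherwise the lower part is small: take the complement
    return max(0.0, 1.0 - sum(pmf(k) for k in range(observed)))


def significance(sequence, words, observed):
    """Expected random hits for equal-length `words` and the p-value of `observed`."""
    freqs = base_frequencies(sequence)
    positions = len(sequence) - len(words[0]) + 1
    expected = positions * sum(word_probability(w, freqs) for w in words)
    return expected, poisson_tail(observed, expected)
```

For a 100bp sequence with uniform composition, the word `ACG` is expected 98 × (1/4)³ = 1.53125 times, so seeing it 25 times gives a p-value far below 1e-10. The same calculation on a real chromosome is one way to judge whether a count such as 183,389 motif hits exceeds the random-noise level.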

Page 20: Plans with ACE: visualizing structure space

Creating 2D maps of structure space (data visualization, dimension reduction)
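The author's own dimension-reduction method is elastic maps (see the self-introduction). The sketch below uses plain principal component analysis instead, as a simpler stand-in for building a 2D map. Each input point could be, for example, a vector of annotation counts for one scanning window (an assumption). The code uses power iteration with deflation in pure Python, so it is meant for small examples only; `pca_2d` is a hypothetical name.

```python
import math
import random


def _matvec(matrix, vector):
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def _top_eigenvector(matrix, iterations=500, seed=0):
    """Dominant eigenvector of a symmetric PSD matrix by power iteration."""
    rng = random.Random(seed)
    v = [rng.uniform(-1.0, 1.0) for _ in matrix]
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    for _ in range(iterations):
        w = _matvec(matrix, v)
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break  # nothing left to extract; keep the current unit vector
        v = [x / norm for x in w]
    return v


def pca_2d(points):
    """Project d-dimensional points onto their two leading principal axes."""
    n, d = len(points), len(points[0])
    mean = [sum(p[j] for p in points) / n for j in range(d)]
    centered = [[p[j] - mean[j] for j in range(d)] for p in points]
    cov = [[sum(r[i] * r[j] for r in centered) / n for j in range(d)]
           for i in range(d)]
    axes = []
    for _ in range(2):
        axis = _top_eigenvector(cov)
        eigenvalue = sum(a * b for a, b in zip(axis, _matvec(cov, axis)))
        axes.append(axis)
        # deflation: remove the found component so the next pass finds the next axis
        cov = [[cov[i][j] - eigenvalue * axis[i] * axis[j] for j in range(d)]
               for i in range(d)]
    return [[sum(a * b for a, b in zip(r, axis)) for axis in axes] for r in centered]
```

When the data really lie in a plane, the 2D projection preserves all pairwise distances. Elastic maps differ in fitting a curved grid rather than a flat plane, which is why they can capture nonlinear structure that PCA flattens.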

Page 21: Plans with ACE: integrating m.ace

m.ace, ace.map, ace.eva, ace.net

Page 22: Plans with ACE: model of chromatin structure and dynamics

- c.ace
- immunoprecipitation experiments
- chromatin state profiles
- silencing structures in space
- arc

Page 23: Plans with ACE: comparative genomics

genome1, genome2

Page 24: Installation of b.ace in Lille

http://ace.ibl.fr

1.2 Tbyte PowerVault storage, Dell PowerEdge server

Page 25: Installation of RACE in Sherbrooke (golf)

- UCSC, local UCSC browser
- Gbrowser
- LISA DB
- ace
- r.ace DB

Page 26: Distributed environment: database synchronization protocol

- b.ace: Lille, France
- r.ace: Sherbrooke, Canada
- m.ace: INSERM, Paris
- c.ace: IHES, Paris
- LISA: Sherbrooke, Canada

Public dbs: new genome release, where?

Page 27: RACE: platform for integration

- ace.annotate: find simple motifs (loops, hairpins)
- ace.RNAtools: pluggable algorithms
- p-files (r.ace database)
- ace.enhance: pluggable methods
- ace.display, ace.stat, ace.dyCr
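The slide describes ace.RNAtools and ace.enhance as pluggable. A minimal Python sketch of a plugin registry, with a toy hairpin finder as the plugged-in algorithm standing in for "find simple motifs (loops, hairpins)". The registry, the `plugin` decorator and `find_hairpins` are hypothetical and far simpler than real RNA structure prediction.

```python
from typing import Callable, Dict, List

# Hypothetical registry: algorithm name -> function(sequence) -> hit positions.
REGISTRY: Dict[str, Callable[[str], List[int]]] = {}

PAIR = str.maketrans("ACGU", "UGCA")  # Watson-Crick complements, RNA alphabet


def plugin(name):
    """Decorator that registers an annotation algorithm under `name`."""
    def register(func):
        if name in REGISTRY:
            raise ValueError(f"plugin {name!r} is already registered")
        REGISTRY[name] = func
        return func
    return register


@plugin("hairpins")
def find_hairpins(sequence, stem=4, min_loop=3, max_loop=8):
    """Toy motif finder: a stem whose reverse complement follows after a short loop.

    Only exact Watson-Crick stems are found (no G-U wobble, no energy model).
    """
    hits = []
    for i in range(len(sequence) - 2 * stem - min_loop + 1):
        partner = sequence[i:i + stem].translate(PAIR)[::-1]
        for loop in range(min_loop, max_loop + 1):
            j = i + stem + loop
            if sequence[j:j + stem] == partner:
                hits.append(i)
                break
    return hits


def run_all(sequence):
    """Run every registered algorithm and collect its hits by name."""
    return {name: func(sequence) for name, func in REGISTRY.items()}
```

New algorithms join the platform by decorating a function, without changes to `run_all`; refusing duplicate names keeps two plugins from silently replacing each other.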

Page 28: ACE team

aceLib, ace C++: Thomas Bücher, Inst.Neur.

arc : Graham Smith, IHES

ace.map : Sebastian Noth, INSERM

ace team leader : Arndt Benecke, IHES

ace.stat : Richard Madden, UdSh

ace.uit, ace C++: Andrey Zinovyev, IHES