Department of Plant Systems Biology

Research at the Bioinformatics & Computational Biology research groups


2 Yvan Saeys, Donostia 2004

Department of Plant Systems Biology

• Headed by Prof. Dirk Inzé
  – 203 people (179 research staff, 24 technical/administrative staff)
• 6 Research Divisions
  – Biology (146)
    • Molecular Genetics Division (87)
    • Functional Genomics Division (19)
    • Plant-Microbe Division (19)
    • Genome Dynamics and Gene Regulation Division (19)
  – (Bio)Informatics (33)
    • Bioinformatics and Evolutionary Genomics Division (24)
    • Computational Biology Division (9)

2 “Computational” research groups

• Bioinformatics and Evolutionary Genomics (BEG)
  – Mainly deal with sequence data
    • Comparative Genomics (Yves Van de Peer)
    • Gene prediction & Annotation (Pierre Rouzé)
• Computational Biology Division (CBD)
  – Explore biological systems (networks)
    • Headed by Martin Kuiper

Group Leaders

• Dr. Martin Kuiper
• Prof. Yves Van de Peer
• Dr. Pierre Rouzé

Research activities

• Comparative Genomics
• Gene Prediction & Genome Annotation
• Annotation of genomes
• Machine Learning
• Ancient large-scale gene duplications
• Functional divergence of duplicated genes
• Promoters and regulatory elements
• Transcription factors
• Bacterial comparative genomics
• Non coding RNAs
• Gene network modelling
• Heterosis

Ancient large-scale gene duplications

• Investigate major events during evolutionary past of genomes:
  – Large scale gene duplications
  – Genome duplications
• Research
  – Algorithms to detect colinear regions
  – Compare intra and inter species
  – Arabidopsis: 3 whole genome duplications
  – Comparisons between Arabidopsis and Rice
  – Duplications in vertebrate genomes

Klaas Vandepoele, Cedric Simillion
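To illustrate what detecting colinear regions involves, here is a minimal Python sketch. It is not the group's algorithm: the function name `colinear_chain`, the `max_gap` parameter and the toy gene labels are all assumptions made for this example. Each segment is an ordered list of gene-family labels, and the sketch looks for the longest chain of homologous gene pairs that keep the same order in both segments.

```python
def colinear_chain(seg_a, seg_b, max_gap=3):
    """Longest chain of anchors (i, j) with seg_a[i] == seg_b[j].

    Both indices must strictly increase along the chain, and each index
    may advance by at most max_gap from one anchor to the next.
    Hypothetical helper for illustration only.
    """
    # Every pair of positions carrying the same gene-family label is an anchor.
    anchors = sorted((i, j)
                     for i, gene_a in enumerate(seg_a)
                     for j, gene_b in enumerate(seg_b)
                     if gene_a == gene_b)
    if not anchors:
        return []

    best_len = [1] * len(anchors)   # best chain length ending at each anchor
    prev = [-1] * len(anchors)      # back-pointer to rebuild the chain
    for k, (i, j) in enumerate(anchors):
        for m in range(k):
            pi, pj = anchors[m]
            if (0 < i - pi <= max_gap and 0 < j - pj <= max_gap
                    and best_len[m] + 1 > best_len[k]):
                best_len[k] = best_len[m] + 1
                prev[k] = m

    # Walk back from the anchor that ends the longest chain.
    k = max(range(len(anchors)), key=lambda idx: best_len[idx])
    chain = []
    while k != -1:
        chain.append(anchors[k])
        k = prev[k]
    return chain[::-1]


# Toy segments (invented): seg_b has lost g1 and g5 and gained g3.
seg_a = ["g1", "g2", "g4", "g5", "g6", "g7"]
seg_b = ["g2", "g3", "g4", "g6", "g7"]
print(colinear_chain(seg_a, seg_b))
```

A real tool would also score the chain statistically; this sketch only shows the conserved-order idea.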


Large-scale duplications

[Figure: synteny versus colinearity; ancient and recent duplications; chromosome labels HsaC1, HsaC9, C2, C4]

Ancient large-scale gene duplications

Building genomic profiles

[Figure: gene order of two homologous segments A and B, and of a third segment C. Compared with A or B alone, C is not significant; compared with the profile that combines A and B, C shows significant homology.]
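The point of this slide, that a segment can match a combined profile even when it matches neither contributing segment well enough on its own, can be sketched in a few lines of Python. This is a toy illustration, not the published method: the helper names, the gene-family labels and the fixed `MIN_ANCHORS` cutoff are assumptions, and a real tool would use a statistical test rather than a simple count.

```python
def count_anchors(query, reference):
    """Number of genes in `query` whose family also occurs in `reference`."""
    reference_families = set(reference)
    return sum(1 for gene in query if gene in reference_families)


def build_profile(*segments):
    """Pool the gene families of several homologous segments (hypothetical helper)."""
    profile = set()
    for segment in segments:
        profile.update(segment)
    return profile


# Invented toy data: A and B are assumed to be already-detected homologous segments.
seg_a = ["f1", "f2", "f4", "f5"]
seg_b = ["f2", "f3", "f6", "f7"]
seg_c = ["f1", "f3", "f5", "f7", "f9"]
MIN_ANCHORS = 3  # assumed cutoff standing in for a significance test

hits_a = count_anchors(seg_c, seg_a)
hits_b = count_anchors(seg_c, seg_b)
hits_profile = count_anchors(seg_c, build_profile(seg_a, seg_b))

for name, hits in (("A", hits_a), ("B", hits_b), ("profile A+B", hits_profile)):
    verdict = "significant" if hits >= MIN_ANCHORS else "not significant"
    print(f"C vs {name}: {hits} anchors, {verdict}")
```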


Functional divergence of duplicated genes

• Duplications stimulate biological novelties
  – Investigate what happens to duplicated genes
  – Study of models for gene evolution
  – Genes are not individual entities, but members of gene families
• Research
  – Up to 65% of the genes in Arabidopsis belong to a gene family
  – Divergence at the regulatory/expression level
  – Divergence at the coding level

Tine Casneuf, Jeroen Raes


Bacterial comparative genomics

• Investigation of multiple bacterial genomes
  – Genomes evolve over time, changing in subtle or radical ways, constantly adapting to the surrounding environment
  – Genomes can evolve gradually through vertical transmission of mutations, gene duplications, deletions, and rearrangements
  – Alternatively, they can evolve more suddenly and sporadically via horizontal transfer of genetic information between different microbial species
• Research
  – Assess the contribution of gene duplications to genome evolution in prokaryotes

Dirk Gevers


Bacterial comparative genomics

Functional Landscape of the Paranome (FLOP):
• Linking functional information to the paranome information
• Allows us to determine whether paralog retention is biased towards specific functional classes for each of the bacterial strains
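One common way to ask whether paralog retention is biased towards a functional class is a hypergeometric over-representation test. The slides do not say which statistic FLOP uses, so the Python sketch below is an assumed stand-in; the function name `hypergeom_sf` and all counts are invented for illustration.

```python
from math import comb


def hypergeom_sf(k, total_genes, class_genes, paralogs):
    """P(X >= k) when `paralogs` genes are drawn without replacement from
    `total_genes`, of which `class_genes` belong to the functional class."""
    upper = min(class_genes, paralogs)
    denominator = comb(total_genes, paralogs)
    numerator = sum(
        comb(class_genes, x) * comb(total_genes - class_genes, paralogs - x)
        for x in range(k, upper + 1)
    )
    return numerator / denominator


# Invented counts for one strain and one functional class.
total_genes = 4000      # genes in the genome
class_genes = 200       # genes annotated with the class
paralogs = 500          # genes that belong to the paranome
observed = 60           # paralogs annotated with the class

expected = paralogs * class_genes / total_genes
p_value = hypergeom_sf(observed, total_genes, class_genes, paralogs)
print(f"expected {expected:.1f}, observed {observed}, p = {p_value:.3g}")
```

Repeating the test per class and per strain would call for a multiple-testing correction, which is left out here.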


Transcription factors

• Towards a better understanding of the link between evolution and development (evo-devo)
  – Transcription factors play a major role in the regulation of gene expression
  – Study the evolutionary and functional divergence of genes belonging to large transcription factor gene families
• Research
  – Structural and phylogenetic analyses of the MADS-box gene family
  – Comprehensive view on the regulatory role of MADS-box genes in plant development
  – Phylogenetic footprinting

Stefanie De Bodt


Genome Annotation

• Structural annotation of genes/genomes
  – Locate genes in genomes
  – Find the exact gene structures
  – Investigation of particular gene families
• Research
  – Development of an automatic annotation platform that can be applied to different genomes
  – Genomes: Arabidopsis, Poplar, Medicago, Ostreococcus tauri

Stephane Rombauts, Lieven Sterck, Steven Robbens

Genome Annotation platform

[Diagram: evidence sources combined by EuGene]
• Intrinsic approaches: coding potential search, SplicePredictor, Netstart, NetGene2
• Extrinsic approaches: RepeatMasker, Blastn, Blastx
• EuGene: predicted genes (structural annotation)

Dataset construction for Poplar

[Diagram: EuGene framework for Poplar]
• Extrinsic approaches: Blastn, RepeatMasker, Blastx
• Intrinsic approaches: IMM; Splicing: WAM; Start: const
• Let EuGene make prediction based on extrinsic data
• Blast against Arabidopsis proteins with full length, discard cDNAs that have no hit
• Select predicted genes covered by FL cDNA
• Training set of mapped cDNAs
• Poplar IMM, SpliceMachine, Start prediction
• Final prediction of EuGene

Annotation of core cell cycle genes in Ostreococcus tauri

The CDK gene family

Cell cycle genes   Arabidopsis thaliana   Ostreococcus tauri
CDK A              1                      1
CDK B              4                      1
CDK C              2                      1
CDK D              3                      1
CDK E              1                      0
CDK F              1                      0
Cyclin A           10                     1
Cyclin B           9                      1
Cyclin D           10                     1
Cyclin H           1                      1
Wee1               1                      1
Cdc25              0                      1
Rb                 1                      1
Dp                 2                      1
E2F                3                      1
DEL                3                      1
Cks                2                      1

Machine Learning (applied to genome annotation)

• Computational techniques to identify structural elements
  – Supervised classification methods
  – Support Vector Machines
  – Feature selection for knowledge extraction
• Research
  – New splice site prediction models
  – New feature selection techniques for gene prediction
  – Leads to more accurate gene models

Sven Degroeve, Yvan Saeys

Splice Machine


Feature selection for acceptor prediction

[Figure: feature map around the acceptor site. Columns: positions 50 to 1 on one side of the site and 1 to 50 on the other. Rows: position-specific nucleotides (A, T, C, G), trinucleotides (AAA … GGG, one column per side) and dinucleotides (AA … GG).]
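The feature types shown for acceptor prediction (nucleotides at each position, plus dinucleotides and trinucleotides on each side of the site) can be built as in the Python sketch below. Several details are assumptions rather than the authors' definitions: the function name `site_features`, the reading of the two sides as upstream and downstream, numbering positions outward from the site, and the use of k-mer counts.

```python
from itertools import product

NUCLEOTIDES = "ATCG"


def kmers(k):
    """All k-mers over A, T, C, G in a fixed order."""
    return ["".join(p) for p in product(NUCLEOTIDES, repeat=k)]


def site_features(upstream, downstream):
    """Feature dictionary for one candidate site (hypothetical encoding).

    Position 1 is the base next to the site on either side, so upstream
    positions are numbered from the end of the `upstream` string.
    """
    features = {}
    for side, seq in (("up", upstream), ("down", downstream)):
        # Position-specific nucleotides, one indicator per base and position.
        for idx, base in enumerate(seq):
            pos = len(seq) - idx if side == "up" else idx + 1
            for nucleotide in NUCLEOTIDES:
                features[f"{side}_pos{pos}_{nucleotide}"] = int(base == nucleotide)
        # Position-independent di- and trinucleotide counts for this side.
        for k in (2, 3):
            for kmer in kmers(k):
                features[f"{side}_{kmer}"] = 0
            for start in range(len(seq) - k + 1):
                key = f"{side}_{seq[start:start + k]}"
                if key in features:  # skips k-mers with ambiguous bases such as N
                    features[key] += 1
    return features


# Toy context of 3 bases per side; the slide's figure uses 50 per side.
toy = site_features("AAG", "GTA")
print(len(toy), toy["up_pos1_G"], toy["down_GT"])
```

With 50 bases on each side this encoding yields 2 × (50 × 4 + 16 + 64) = 560 features, which is the kind of space a feature selection method would then prune.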


Promoter prediction

• Computational identification of promoter regions
  – Signal elements
  – Structural features
  – Still many false positives
• Research
  – Develop new tools and approaches for the automatic delineation of promoters
  – Motif detection
  – Detecting cis-regulatory elements
  – Phylogenetic footprinting

Kobe Florquin

Promoter prediction

[Figure, panels a. to l.: sensitivity and specificity (y-axis from 0.75 to 1) for nucleosome positioning, B-DNA twist and DNA denaturation, plotted for different promoter classes (x-axis 1 to 10) and for different percentages of promoters (x-axis 50, 25, 10, 5). Legend: Intron-Exon | Simple | Dinucleotide | Markov]
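For reference, sensitivity and specificity can be computed from labelled predictions as in the Python sketch below. It assumes the standard definitions TP/(TP+FN) and TN/(TN+FP); some gene-prediction work reports TP/(TP+FP) under the name specificity, and the slides do not say which was used. The function name and the toy labels are invented.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels (1 = promoter)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    tn = sum(1 for t, p in zip(y_true, y_pred) if not t and not p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    sensitivity = tp / (tp + fn) if tp + fn else float("nan")
    specificity = tn / (tn + fp) if tn + fp else float("nan")
    return sensitivity, specificity


# Invented labels: 4 promoters and 4 non-promoters.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
sens, spec = sensitivity_specificity(y_true, y_pred)
print(f"sensitivity = {sens:.2f}, specificity = {spec:.2f}")
```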


Non coding RNAs

• Many RNA molecules are not protein coding but instead function through their RNA form
  – Known for a long time: transfer RNAs (tRNA), ribosomal RNAs (rRNA)
  – Only recently discovered: small interfering RNAs (siRNA), micro RNAs (miRNA), …
  – Regulate gene expression at the post-transcriptional level
• Research
  – Developing different computational tools and techniques to detect and characterize non-coding RNAs in Arabidopsis and other plant genomes

Jan Wuyts, Eric Bonnet

Non coding RNAs: MIRfinder


Comparison between plant species


Genetic networks

• Integrate functional genomics data of all types in a global network that reflects the regulatory wiring and modularity of an organism
  – Micro-array data from perturbation experiments
  – Leaf development
• Research
  – Novel methods, based on combinatorial statistics and graph theory
  – Unsupervised classification techniques (k-core clustering, Kohonen maps)

Steven Maere, Steven Vercruysse
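As background for the k-core clustering mentioned above: the k-core of a graph is the largest subgraph in which every node has at least k neighbours, and it can be found by repeatedly pruning low-degree nodes. The Python sketch below shows only that graph-theoretic step; how the group builds the gene network and turns cores into clusters is not described in the slides, and the function name and toy graph are assumptions.

```python
def k_core(adjacency, k):
    """Return the node set of the k-core of an undirected graph.

    `adjacency` maps each node to an iterable of its neighbours and is
    assumed to be symmetric.
    """
    adj = {node: set(neighbours) for node, neighbours in adjacency.items()}
    changed = True
    while changed:
        changed = False
        # Remove every node whose current degree is below k, then re-check,
        # because each removal can lower the degree of its neighbours.
        for node in [n for n, nb in adj.items() if len(nb) < k]:
            for neighbour in adj.pop(node):
                if neighbour in adj:
                    adj[neighbour].discard(node)
            changed = True
    return set(adj)


# Toy co-expression graph (invented): a triangle a-b-c with a tail c-d-e.
graph = {
    "a": ["b", "c"],
    "b": ["a", "c"],
    "c": ["a", "b", "d"],
    "d": ["c", "e"],
    "e": ["d"],
}
print(sorted(k_core(graph, 2)))
```

Here the tail is pruned and only the densely connected triangle survives as the 2-core.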


Genetic networks

[Diagram labels: gene profiles × experiments matrix; Comb. p-value < 0.01; k-core clustering; GO labeling & visualization]

Genetic networks

• Hierarchical clustering
• Self-organizing map
• Many other algorithms…

Goal: getting information about:
- Regulatory interactions
- Protein function (same profile => same biol. process?)

Heterosis

• Modeling of “hybrid vigour”
  – Improved performance of F1 hybrids with respect to the parents
  – Dominance Model
  – Over-dominance Model
  – Epistatic Model
  – Biometrics versus soft-computing approach
• Research
  – Additive versus dominance effects
  – Estimation of the molecular phenotype of the hybrid

Jeroen Meeus, Elena Tsiporkova
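The quantities behind "additive versus dominance effects" can be written out with textbook quantitative-genetics definitions, as in the Python sketch below. These formulas are a generic illustration and are not taken from the group's models; the function name `heterosis_measures` and the trait values are invented.

```python
def heterosis_measures(parent1, parent2, hybrid):
    """Textbook heterosis summaries for one trait (e.g. biomass)."""
    mid_parent = (parent1 + parent2) / 2
    best_parent = max(parent1, parent2)
    additive = abs(parent1 - parent2) / 2   # half the difference between parents
    dominance = hybrid - mid_parent         # deviation of the F1 from the mid-parent
    return {
        "mid_parent_heterosis": (hybrid - mid_parent) / mid_parent,
        "best_parent_heterosis": (hybrid - best_parent) / best_parent,
        "additive_effect": additive,
        "dominance_effect": dominance,
        # Ratio above 1 means the hybrid exceeds the better parent.
        "dominance_ratio": dominance / additive if additive else None,
    }


# Invented trait values for two parents and their F1 hybrid.
result = heterosis_measures(parent1=10.0, parent2=14.0, hybrid=15.0)
for name, value in result.items():
    print(f"{name}: {value:.3f}")
```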


Heterosis: Biometrics Approach

[Diagram: molecular phenotypes (25000 genes) and morphological phenotypes (biomass, leaf size) for 10 parents and 45 hybrids; labels: heterotic, non-heterotic]
• Step 1: correlation hybrid-parents
• Step 2: correlation morphological-molecular phenotypes
• Step 3: prediction

Heterosis: Soft-Computing Approach

[Diagram: molecular phenotypes (25000 genes) and morphological phenotypes (biomass, leaf size) for 10 parents and 45 hybrids; labels: heterotic, non-heterotic; links: direct classification, simulation, association]

Databases
• European ribosomal RNA database: http://www.psb.ugent.be/rRNA/
• European Plant Promoter database (PlantCARE): http://oberon.fvms.ugent.be:8080/PlantCARE/index.html
• European Federated Plant Database Network (Planet): http://mips.gsf.de/proj/planet/about.html

Software
• Tree construction: TreeCon
• Tools: ForCon, SPADS, ZT, AFLPinSilico
• Large-scale duplications: Adhore, i-Adhore, ASaturA

Website
• http://bioinformatics.psb.ugent.be

• Francis Dierick: databases, webmaster, support
• Gert Sclep: CATMA and CAGE databases

“Part-time” PhD students
• Guy Baele: Modelling the covarion hypothesis
• Dirk Vandycke: Extrinsic gene prediction approaches

Secretary
• Ann Bostyn

Thanks to…