Department of Plant Systems Biology
Research at the Bioinformatics & Computational Biology research groups
2 Yvan Saeys, Donostia 2004
Department of Plant Systems Biology
• Headed by Prof. Dirk Inzé
  – 203 people (179 research staff, 24 technical/administrative staff)
• 6 Research Divisions
  – Biology (146)
    • Molecular Genetics Division (87)
    • Functional Genomics Division (19)
    • Plant-Microbe Division (19)
    • Genome Dynamics and Gene Regulation Division (19)
  – (Bio)Informatics (33)
    • Bioinformatics and Evolutionary Genomics Division (24)
    • Computational Biology Division (9)
2 “Computational” research groups
• Bioinformatics and Evolutionary Genomics (BEG)
  – Mainly deal with sequence data
    • Comparative Genomics (Yves Van de Peer)
    • Gene prediction & Annotation (Pierre Rouzé)
• Computational Biology Division (CBD)
  – Explore biological systems (networks)
    • Headed by Martin Kuiper
Research activities
• Comparative Genomics
• Gene Prediction & Genome Annotation
• Annotation of genomes
• Machine Learning
• Ancient large-scale gene duplications
• Functional divergence of duplicated genes
• Promoters and regulatory elements
• Transcription factors
• Bacterial comparative genomics
• Non coding RNAs
• Gene network modelling
• Heterosis
Ancient large-scale gene duplications
• Investigate major events during the evolutionary past of genomes:
  – Large-scale gene duplications
  – Genome duplications
• Research
  – Algorithms to detect colinear regions
  – Compare intra and inter species
  – Arabidopsis: 3 whole genome duplications
  – Comparisons between Arabidopsis and Rice
  – Duplications in vertebrate genomes
Klaas Vandepoele, Cedric Simillion
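
One simple way to look for colinear regions is to treat every pair of homologous genes between two segments as an anchor point (i, j) and to search for the longest chain of anchors whose positions increase on both segments. The sketch below is a minimal illustration under that assumption; it is not the ADHoRe or i-ADHoRe algorithm, and the function name, the `max_gap` parameter and the toy data are hypothetical.

```python
def longest_colinear_chain(anchors, max_gap=3):
    """Return the longest chain of anchors that preserves gene order.

    anchors: list of (i, j) pairs, meaning gene i on segment A is
    homologous to gene j on segment B (positions are gene indices).
    max_gap: hypothetical limit on how many positions may be skipped
    between consecutive anchors on either segment.
    """
    pts = sorted(set(anchors))
    if not pts:
        return []
    best_len = [1] * len(pts)   # best chain length ending at each anchor
    prev = [-1] * len(pts)      # back-pointers for chain reconstruction
    for b, (ib, jb) in enumerate(pts):
        for a in range(b):
            ia, ja = pts[a]
            in_order = ia < ib and ja < jb
            close = ib - ia <= max_gap + 1 and jb - ja <= max_gap + 1
            if in_order and close and best_len[a] + 1 > best_len[b]:
                best_len[b] = best_len[a] + 1
                prev[b] = a
    end = max(range(len(pts)), key=best_len.__getitem__)
    chain = []
    while end != -1:
        chain.append(pts[end])
        end = prev[end]
    return chain[::-1]


# Toy example (made up): five anchors in conserved order plus one stray hit.
toy_anchors = [(1, 2), (2, 3), (4, 4), (5, 6), (7, 7), (3, 9)]
chain = longest_colinear_chain(toy_anchors)
```

The quadratic dynamic programme is fine for segments of a few hundred genes; a real tool would also need a statistical test to decide whether a chain of a given length is significant.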
Large-scale duplications
[Figure: illustration of synteny and colinearity, with examples labelled "ancient duplication" and "recent duplication"; chromosome labels HsaC1, HsaC9, C2, C4]
Ancient large-scale gene duplications
[Figure: building genomic profiles. Segment C shows no significant homology when compared with segment A or segment B alone, but significant homology when compared with the profile that combines A and B.]
Functional divergence of duplicated genes
• Duplications stimulate biological novelties
  – Investigate what happens to duplicated genes
  – Study of models for gene evolution
  – Genes are not individual entities, but members of gene families
• Research
  – Up to 65% of the genes in Arabidopsis belong to a gene family
  – Divergence at the regulatory/expression level
  – Divergence at the coding level
Tine Casneuf, Jeroen Raes
Bacterial comparative genomics
• Investigation of multiple bacterial genomes
  – Genomes evolve over time, changing in subtle or radical ways, constantly adapting to the surrounding environment
  – Genomes can evolve gradually through vertical transmission of mutations, gene duplications, deletions, and rearrangements
  – Alternatively, they can evolve more suddenly and sporadically via horizontal transfer of genetic information between different microbial species
• Research
  – Assess the contribution of gene duplications to genome evolution in prokaryotes
Dirk Gevers
Bacterial comparative genomics
Functional Landscape of the Paranome (FLOP):
• Linking functional information to the paranome information
• Allows us to determine whether paralog retention is biased towards specific functional classes for each of the bacterial strains
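
Whether paralog retention is biased towards a functional class can be framed as an over-representation test. The slides do not say which statistic FLOP uses; the sketch below assumes a one-sided hypergeometric test, and the function name and the toy counts are made up for illustration.

```python
from math import comb


def hypergeom_enrichment_p(total, in_class, retained, retained_in_class):
    """P(X >= retained_in_class) under a hypergeometric null model.

    total: number of genes in the genome
    in_class: genes annotated with the functional class
    retained: genes that belong to the paranome (have a paralog)
    retained_in_class: paranome genes annotated with the class
    """
    upper = min(in_class, retained)
    denom = comb(total, retained)
    tail = sum(
        comb(in_class, k) * comb(total - in_class, retained - k)
        for k in range(retained_in_class, upper + 1)
    )
    return tail / denom


# Toy counts (made up): 20 genes, 5 in the class, 6 in the paranome,
# 4 of which fall in the class.
p_value = hypergeom_enrichment_p(20, 5, 6, 4)
```

A small p-value suggests that the class is over-represented among retained paralogs. Testing many classes and strains at once would also call for a multiple-testing correction.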
Transcription factors
• Towards a better understanding of the link between evolution and development (evo-devo)
  – Transcription factors play a major role in the regulation of gene expression
  – Study the evolutionary and functional divergence of genes belonging to large transcription factor gene families
• Research
  – Structural and phylogenetic analyses of the MADS-box gene family
  – Comprehensive view on the regulatory role of MADS-box genes in plant development
  – Phylogenetic footprinting
Stefanie De Bodt
Genome Annotation
• Structural annotation of genes/genomes
  – Locate genes in genomes
  – Find the exact gene structures
  – Investigation of particular gene families
• Research
  – Development of an automatic annotation platform that can be applied to different genomes
  – Genomes: Arabidopsis, Poplar, Medicago, Ostreococcus tauri
Stephane Rombauts, Lieven Sterck, Steven Robbens
Genome Annotation platform
[Diagram: RepeatMasker, coding potential search, SplicePredictor, Netstart, NetGene2, Blastn and Blastx provide intrinsic and extrinsic evidence to EuGene, which outputs the predicted genes (structural annotation).]
Dataset construction for Poplar
[Diagram: EuGene framework used to build the Poplar dataset. Labelled elements, not necessarily in workflow order:]
• Extrinsic approaches: Blastn, Blastx, RepeatMasker
• Intrinsic approaches: IMM; Splicing: WAM; Start: const
• Let EuGene make prediction based on extrinsic data
• Select predicted genes covered by FL cDNA
• Blast against Arabidopsis proteins with full length, discard cDNAs that have no hit
• Training set of mapped cDNAs
• Poplar IMM, SpliceMachine, Start prediction
• Final prediction of EuGene
Annotation of core cell cycle genes in Ostreococcus tauri
The CDK gene family

Cell cycle genes   Arabidopsis thaliana   Ostreococcus tauri
CDK A              1                      1
CDK B              4                      1
CDK C              2                      1
CDK D              3                      1
CDK E              1                      0
CDK F              1                      0
Cyclin A           10                     1
Cyclin B           9                      1
Cyclin D           10                     1
Cyclin H           1                      1
Wee1               1                      1
Cdc25              0                      1
Rb                 1                      1
Dp                 2                      1
E2F                3                      1
DEL                3                      1
Cks                2                      1
Machine Learning (applied to genome annotation)
• Computational techniques to identify structural elements
  – Supervised classification methods
  – Support Vector Machines
  – Feature selection for knowledge extraction
• Research
  – New splice site prediction models
  – New feature selection techniques for gene prediction
  – Leads to more accurate gene models
Sven Degroeve, Yvan Saeys
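
To make "feature selection for knowledge extraction" concrete, the sketch below encodes fixed-length sequence windows as position-specific nucleotide features and ranks them with a very simple filter score, the absolute difference in feature frequency between true sites and decoys. This score is a stand-in chosen for brevity, not the feature selection technique or the Support Vector Machine models mentioned above; the function names and the toy sequences are hypothetical.

```python
from collections import Counter


def position_features(seq):
    """Encode a fixed-length window as position-specific nucleotide features."""
    return {f"{pos}:{base}" for pos, base in enumerate(seq)}


def rank_features(positives, negatives):
    """Rank features by absolute frequency difference between the classes.

    Assumption: this simple filter score stands in for whatever
    feature selection method is used in practice.
    """
    pos_counts = Counter(f for s in positives for f in position_features(s))
    neg_counts = Counter(f for s in negatives for f in position_features(s))
    scores = {}
    for feat in set(pos_counts) | set(neg_counts):
        p = pos_counts[feat] / len(positives)
        n = neg_counts[feat] / len(negatives)
        scores[feat] = abs(p - n)
    # Highest score first; ties broken alphabetically for reproducibility.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Toy windows (made up): the true sites carry "AG" at 0-based positions 4-5.
true_sites = ["TTCCAGGT", "CTTTAGGA", "TCCTAGAT"]
decoys = ["TTCCTTGT", "CATTCCGA", "GCCTGTAT"]
ranking = rank_features(true_sites, decoys)
```

On this toy data the two top-ranked features are the conserved A and G of the acceptor dinucleotide, which is the kind of biological signal a feature ranking is meant to surface.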
Feature selection for acceptor prediction
[Figure: feature map for acceptor prediction. Horizontal axis: sequence positions 50 to 1 and 1 to 50 on either side of the site. Rows: single nucleotides (A, T, C, G), dinucleotides (AA to GG) and trinucleotides (AAA to GGG).]
Promoter prediction
• Computational identification of promoter regions
  – Signal elements
  – Structural features
  – Still many false positives
• Research
  – Develop new tools and approaches for the automatic delineation of promoters
  – Motif detection
  – Detecting cis-regulatory elements
  – Phylogenetic footprinting
Kobe Florquin
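
The remark about false positives can be quantified with the usual evaluation measures. The sketch below computes sensitivity, specificity and precision from binary labels. Note that in the gene-prediction literature "specificity" sometimes denotes TP / (TP + FP), which is called precision here; the slides do not state which definition they use. The function name and the example labels are made up.

```python
def confusion_rates(labels, predictions):
    """Compute sensitivity, specificity and precision from binary labels."""
    pairs = list(zip(labels, predictions))
    tp = sum(1 for y, p in pairs if y and p)          # true positives
    fn = sum(1 for y, p in pairs if y and not p)      # missed promoters
    fp = sum(1 for y, p in pairs if not y and p)      # false alarms
    tn = sum(1 for y, p in pairs if not y and not p)  # true negatives
    nan = float("nan")
    return {
        "sensitivity": tp / (tp + fn) if tp + fn else nan,
        "specificity": tn / (tn + fp) if tn + fp else nan,
        "precision": tp / (tp + fp) if tp + fp else nan,
    }


# Made-up example: 4 real promoters and 6 non-promoter regions.
truth = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
called = [1, 1, 1, 0, 1, 1, 0, 0, 0, 0]
rates = confusion_rates(truth, called)
```

Because non-promoter sequence vastly outnumbers promoters in a genome, even a high specificity can leave precision low, which is one way to read "still many false positives".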
Promoter prediction
[Figure: sensitivity and specificity plots (vertical axis 0.75 to 1) for three structural features: nucleosome positioning, B-DNA twist and DNA denaturation. Panels a to l cover different promoter classes (horizontal axis 1 to 10) and different percentages of promoters (50, 25, 10, 5). Legend: Intron-Exon, Simple, Dinucleotide, Markov.]
Non coding RNAs
• Many RNA molecules are not protein coding but instead function through their RNA form
  – Known for a long time: transfer RNAs (tRNA), ribosomal RNAs (rRNA)
  – Only recently discovered: small interfering RNAs (siRNA), micro RNAs (miRNA), …
  – Regulate gene expression at the post-transcriptional level
• Research
  – Developing different computational tools and techniques to detect and characterize non-coding RNAs in Arabidopsis and other plant genomes
Jan Wuyts, Eric Bonnet
Genetic networks
• Integrate functional genomics data of all types in a global network that reflects the regulatory wiring and modularity of an organism
  – Micro-array data from perturbation experiments
  – Leaf development
• Research
  – Novel methods, based on combinatorial statistics and graph theory
  – Unsupervised classification techniques (k-core clustering, Kohonen maps)
Steven Maere, Steven Vercruysse
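
The graph-theoretic notion behind k-core clustering is the k-core: the largest subgraph in which every node has at least k neighbours inside the subgraph. The sketch below computes it by repeatedly peeling off low-degree nodes. How the group's clustering method builds on this is not described in the slides, so this shows only the standard definition; the helper names and the toy gene graph are hypothetical.

```python
def build_adjacency(edges):
    """Build an undirected adjacency map from a list of (a, b) edges."""
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    return adjacency


def k_core(adjacency, k):
    """Return the set of nodes in the k-core of an undirected graph."""
    degree = {node: len(nbrs) for node, nbrs in adjacency.items()}
    alive = set(adjacency)
    queue = [node for node in alive if degree[node] < k]
    while queue:
        node = queue.pop()
        if node not in alive:
            continue  # already removed via an earlier queue entry
        alive.remove(node)
        for nbr in adjacency[node]:
            if nbr in alive:
                degree[nbr] -= 1
                if degree[nbr] < k:
                    queue.append(nbr)
    return alive


# Toy graph (made up): g1..g4 are fully connected, g5 and g6 hang off g4.
toy_edges = [
    ("g1", "g2"), ("g1", "g3"), ("g1", "g4"),
    ("g2", "g3"), ("g2", "g4"), ("g3", "g4"),
    ("g4", "g5"), ("g5", "g6"),
]
core3 = k_core(build_adjacency(toy_edges), 3)
```

In a gene network where edges link genes with significantly similar profiles, the densely connected core (here g1 to g4) is the candidate module, while loosely attached genes are peeled away.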
Genetic networks
[Figure: gene profiles (genes × experiments); comb. p-value < 0.01; k-core clustering; GO labeling & visualization]
Genetic networks
• Hierarchical clustering
• Self-organizing map
• Many other algorithms…
Goal: getting information about:
- Protein function (same profile => same biol. process?)
- Regulatory interactions
Heterosis
• Modeling of “hybrid vigour”
  – Improved performance of F1 hybrids with respect to the parents
  – Dominance Model
  – Over-dominance Model
  – Epistatic Model
  – Biometrics versus soft-computing approach
• Research
  – Additive versus dominance effects
  – Estimation of the molecular phenotype of the hybrid
Jeroen Meeus, Elena Tsiporkova
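
For a single trait, additive and dominance effects and the degree of heterosis can be written down with textbook quantitative-genetics formulas. The sketch below assumes positive trait values where larger means better performance (for example biomass); the function name and the numbers are made up, and it does not represent the group's biometrics or soft-computing models.

```python
def heterosis_measures(parent1, parent2, hybrid):
    """Single-trait summaries for one F1 hybrid and its two parents."""
    mid_parent = (parent1 + parent2) / 2
    best_parent = max(parent1, parent2)
    return {
        # Half the difference between the two parental values.
        "additive_effect": abs(parent1 - parent2) / 2,
        # Deviation of the hybrid from the mid-parent value.
        "dominance_effect": hybrid - mid_parent,
        # Relative gain over the parental mean.
        "mid_parent_heterosis": (hybrid - mid_parent) / mid_parent,
        # Relative gain over the better parent.
        "best_parent_heterosis": (hybrid - best_parent) / best_parent,
    }


# Made-up trait values for two parents and their F1 hybrid.
m = heterosis_measures(parent1=10.0, parent2=14.0, hybrid=15.0)
```

Here the dominance effect (3.0) exceeds the additive effect (2.0), so the hybrid outperforms even the better parent, the pattern an over-dominance model is meant to capture.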
Heterosis: Biometrics Approach
[Figure: molecular phenotypes (25000 genes) and morphological phenotypes (biomass, leaf size, …) measured for 10 parents and 45 hybrids. Step 1: correlation hybrid-parents. Step 2: correlation morphological-molecular phenotypes. Step 3: prediction (heterotic versus non-heterotic).]
Heterosis: Soft-Computing Approach
[Figure: molecular phenotypes (25000 genes) and morphological phenotypes (biomass, leaf size, …) measured for 10 parents and 45 hybrids. Labels: association, simulation, direct classification (heterotic versus non-heterotic).]
Databases
• European ribosomal RNA database: http://www.psb.ugent.be/rRNA/
• European Plant Promoter database (PlantCARE): http://oberon.fvms.ugent.be:8080/PlantCARE/index.html
• European Federated Plant Database Network (Planet): http://mips.gsf.de/proj/planet/about.html
Software
• Tree construction: TreeCon
• Tools: ForCon, SPADS, ZT, AFLPinSilico
• Large-scale duplications: Adhore, i-Adhore, ASaturA
Website: http://bioinformatics.psb.ugent.be
Francis Dierick: databases, webmaster, support
Gert Sclep: CATMA and CAGE databases
“Part-time” PhD students
• Guy Baele: Modelling the covarion hypothesis
• Dirk Vandycke: Extrinsic gene prediction approaches
Secretary
• Ann Bostyn