Research at Genome Bioinformatics Lab

Josep F. Abril Ferrando and Genís Parra Farré

Genome BioInformatics Research Lab


RGBI @ (IMIM – UPF – CRG)

Introduction

Visualization of Genomic Annotations

Comparative Genomics

Human and Mouse Genomes

Exon Structural Selection

BIOINFORMÀTICA UPF T23 – 2003/03/06 – J.F. Abril and G. Parra @ Genome BioInformatics Lab – RGBI (IMIM-UPF-CRG)

SUMMARY

Computational Analysis of Genomic Sequences

Sequencing → DNA SEQUENCE → Assembling → ASSEMBLED SEQUENCE → Analyzing → ANNOTATED SEQUENCE

From Genes to Genomes: Single Genes

From Genes to Genomes: Chromosomes

From Genes to Genomes: Whole Genomes

Comparative Genomics: Single Genes

Comparative Genomics: Syntenic Regions

Programming in PostScript (I)

%!PS
%
%% Variable Definition: $counter = 0
/counter 0 def
%
%% Function Definition: sub box(x,y) {...}
/box {             %%% y x box
  gsave            %
  20 mul           % y X
  0                % y X 0
  moveto           % y
  20 mul           % Y
  dup              % Y Y
  10 0             % Y Y 10 0
  rlineto          % Y Y
  0                % Y Y 0
  exch             % Y 0 Y
  rlineto          % Y
  -10 0            % Y -10 0
  rlineto          % Y
  neg              % -Y
  0                % -Y 0
  exch             % 0 -Y
  rlineto          %
  closepath        %
  0 1 0            % 0 1 0
  setrgbcolor      % "green-color"
  fill             %
  grestore         %
} def              %

Vector Graphics Language

Postfix Notation

Stacks: exec, paths, dicts, ...

Dictionaries: Identifier → Object

%
%% Initialization
100 100 translate  % New Coords Origin
2 5 scale          % Re-scaling x-axis*2
                   %            y-axis*5
%
%% BaseLine
gsave              %
0 0 moveto         %
90 0 lineto        %
0 setgray          %
1 setlinewidth     %
stroke             %
grestore           %
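The stack comments in the listing above can be checked by hand with a toy evaluator. What follows is a minimal Python sketch, not a PostScript interpreter. It assumes a token list instead of real PostScript syntax, supports only a few of the operators used on this slide (mul, add, neg, dup, exch, def), and uses a plain dict in place of the PostScript dictionary stack. The function name run is hypothetical.

```python
def run(tokens, stack=None, names=None):
    """Evaluate a tiny PostScript-like token list on an operand stack.

    Illustrative only: `names` is a plain dict standing in for the
    PostScript dictionary stack, and only a few operators are supported.
    """
    stack = [] if stack is None else stack
    names = {} if names is None else names
    for tok in tokens:
        if isinstance(tok, (int, float)):
            stack.append(tok)                    # numbers are pushed as-is
        elif tok.startswith("/"):
            stack.append(tok)                    # literal name, e.g. /counter
        elif tok == "mul":
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
        elif tok == "add":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif tok == "neg":
            stack.append(-stack.pop())
        elif tok == "dup":
            stack.append(stack[-1])
        elif tok == "exch":
            stack[-1], stack[-2] = stack[-2], stack[-1]
        elif tok == "def":
            value, key = stack.pop(), stack.pop()
            names[key[1:]] = value               # bind identifier -> object
        elif tok in names:
            stack.append(names[tok])             # executable name: look it up
        else:
            raise ValueError(f"unsupported token: {tok!r}")
    return stack, names


# The same operators used in the body of box, starting with 3 on the stack.
stack, _ = run([3, 20, "mul", "dup", 0, "exch"])
print(stack)   # [60, 0, 60], matching the "Y 0 Y" comment

# "/counter 0 def" followed by "/counter counter 1 add def counter".
stack, names = run(["/counter", 0, "def",
                    "/counter", "counter", 1, "add", "def", "counter"])
print(stack, names)   # [1] {'counter': 1}
```

Operands always come before the operator that consumes them, which is why each comment in the listing shows the stack contents after the line has run.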

Programming in PostScript (II)

%
%% Main Loop
mark               % mark
0.25 0.35 0.15     % mark 0.25 0.35 0.15
counttomark        % mark 0.25 0.35 0.15 3
{                  %%%%%%%%%%%%%% begin loop (x3)
  /counter         %%
  counter          %%
  1 add            %%
  def              %% $counter = $counter + 1
  counter          %
    % 1st loop: mark 0.25 0.35 0.15 counter==1
    % 2nd loop: mark 0.25 0.35 counter==2
    % 3rd loop: mark 0.25 counter==3
  box              % mark ...
} repeat           %%%%%%%%%%%%%% finish loop (x3)
pop                % clean up stack (removes "mark")
%
showpage
%%EOF%%
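To see where the three boxes land on the page, the arithmetic of the program can be replayed outside PostScript. The Python sketch below is an illustration under stated assumptions: it hard-codes the constants from the listing (a unit of 20, a box width of 10, the 100 100 translate and the 2 5 scale), and the helper names box_corners, to_page and page_boxes are hypothetical.

```python
HEIGHTS = [0.25, 0.35, 0.15]   # pushed in this order; the loop pops from the top


def box_corners(y, x, unit=20, width=10):
    """User-space corners of the rectangle traced by `y x box`."""
    left, top = x * unit, y * unit
    return [(left, 0), (left + width, 0), (left + width, top), (left, top)]


def to_page(point, origin=(100, 100), scale=(2, 5)):
    """Map a user-space point through `100 100 translate` then `2 5 scale`."""
    px, py = point
    return (origin[0] + scale[0] * px, origin[1] + scale[1] * py)


def page_boxes(heights):
    """Return (counter, page-space corners) for each pass of the loop."""
    result = []
    counter = 0
    for y in reversed(heights):        # top of the operand stack comes first
        counter += 1                   # /counter counter 1 add def
        corners = [to_page(c) for c in box_corners(y, counter)]
        result.append((counter, corners))
    return result


for counter, corners in page_boxes(HEIGHTS):
    print(counter, [(round(px, 3), round(py, 3)) for px, py in corners])
```

Under these assumptions the first box drawn uses the height 0.15, since that value is on top of the stack. It spans x from 140 to 160 points and rises 15 points above the baseline at y = 100.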

GFF2PS and GFF2APLOT

Visualizing Genomic Annotations

J.F. Abril and R. Guigó.
"gff2ps: visualizing genomic annotations"
Bioinformatics 16(8):743-744 (2000).

M.G. Reese, G. Hartzell, N.L. Harris, U. Ohler, J.F. Abril and S.E. Lewis.
"Genome Annotation Assessment in Drosophila melanogaster"
Genome Research 10(4):483-501 (2000).

M.D. Adams et al. (including J.F. Abril).
"The Genome Sequence of Drosophila melanogaster"
Science 287(5461):2185-2195 (2000).

J.C. Venter et al. (including J.F. Abril and R. Guigó).
"The Sequence of the Human Genome"
Science 291(5507):1304-1351 (2001).

R.A. Holt et al. (including J.F. Abril and R. Guigó).
"The Genome Sequence of the Malaria Mosquito Anopheles gambiae"
Science 298(5591):129-149 (2002).


http://genome.imim.es/software/gfftools/GFF2PS.html

Whole Genome Gene-Finding

[Figure labels: Homo sapiens; GENES; ab initio; DATABASE; homology]

Whole Genome Gene-Finding: Comparative Approach

[Figure labels: Homo sapiens; Mus musculus; GENES (one set per species); gene prediction; homology]

Whole Genome Gene-Finding Results Analysis

Human and Mouse Comparative Genomics

Mouse Genome Sequencing Consortium (including J.F. Abril, G. Parra and R. Guigó).
"Initial sequencing and comparative analysis of the mouse genome"
Nature 420(6915):520-562 (2002).

G. Parra, P. Agarwal, J.F. Abril, T. Wiehe, J.W. Fickett and R. Guigó.
"Comparative gene prediction in human and mouse"
Genome Research 13(1):108-117 (2003).

R. Guigó, E.T. Dermitzakis, P. Agarwal, C.P. Ponting, G. Parra, A. Reymond, J.F. Abril, E. Keibler, R. Lyle, C. Ucla, S.E. Antonarakis and M.R. Brent.
"Comparison of mouse and human genomes followed by experimental verification yields an estimated 1,019 additional genes"
PNAS 100(3):1140-1145 (2003).

Predicting “Novel” Genes in the Mouse Genome (I)

[Figure labels: golden path annotations; additional blastn matches to ENSEMBL + REFSEQ]

Predicting “Novel” Genes in the Mouse Genome (II)

[Figure labels: tblastx; geneid exons; sgp genes; additional blastn matches to ENSEMBL + REFSEQ]

Homology and Gene Structure Filtering

[Figure labels: Homo sapiens predictions; Mus musculus predictions; Homology: Blastp; Structural Alignment: Exstral; GENES: Enriched Pool]

Exon Structure over an Alignment

RT-PCR Validation

            Number of predictions   Tested   Success Rate
Enriched    1428                    214      62.15%
Similar     2125                    38       10.53%
Other       3659                    63       3.17%
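The slide gives success rates but not the number of positive RT-PCR results behind them. The Python sketch below back-computes those counts. It rests on an assumption the slide does not state: that the success rate is the number of positives divided by the number of predictions tested. The implied counts are therefore an inference, not reported data, and the helper name implied_positives is hypothetical.

```python
# (number of predictions, tested, success rate in %) as listed on the slide.
RESULTS = {
    "Enriched": (1428, 214, 62.15),
    "Similar": (2125, 38, 10.53),
    "Other": (3659, 63, 3.17),
}


def implied_positives(tested, rate_percent):
    """Nearest whole number of positives consistent with the rounded rate.

    Assumes rate = positives / tested, which the slide does not state.
    """
    return round(tested * rate_percent / 100)


for name, (predicted, tested, rate) in RESULTS.items():
    positives = implied_positives(tested, rate)
    recomputed = round(100 * positives / tested, 2)
    print(f"{name}: {tested} of {predicted} tested, "
          f"about {positives} positives, recomputed rate {recomputed}%")
```

Recomputing the rate from the rounded count is a consistency check on the assumption: if the recomputed percentage did not match the slide, the rate would have to be defined some other way.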

Results of the Experimental Validation

Example of a Bash Script

http://genome.imim.es/