Chani & Malki present : Project adviser: Dr. Ron Wides The OdzFinder


Chani & Malki present:

Project adviser: Dr. Ron Wides

The OdzFinder

Page 2

WANTED

Name: Odz

a.k.a: Ten-m

Family: pair-rule gene

Length: 10,000 bp

Page 3

Getting to Know Odz…

Discovered in D. melanogaster in 1994

Odz protein is expressed in neurons, developing brain and hindgut

Odz protein is expressed during segmentation

Belongs to the pair-rule gene family

Plays a crucial role in the CNS during fetal development

Page 4

The Odz Family

Odz gene orthologs have been found in 3 phyla:

Vertebrates: Ten-m1, Ten-m2, Ten-m3, Ten-m4

Arthropods: Ten-a, Ten-m

Nematodes: Ten-m

Page 5

The Odz Protein

2731 Amino Acids

The only pair-rule gene whose protein product is not a transcription factor!

Contains 3 domains:

I. extracellular EGF-like repeats

II. tyrosine kinase phosphorylation sites

III. hydrophobic sequences, probably a transmembrane sequence

[Diagram of ODZ: EGF-like domain; intracellular kinase substrate domain]

Page 6

EGF-like Repeats

x(4)-C-x(0,48)-C-x(3,12)-C-x(1,70)-C-x(1,6)-C-x(2)-G-a-x(0,21)-G-x(2)-C-x

EGF-like domain: 30 - 40 amino acid residues

Significant homology to epidermal growth factor (EGF)

Has been found in single or multiple copies in a number of other proteins

Generally found in the extracellular domain of membrane proteins or secreted proteins

Involved in receptor-ligand interactions

Includes 6 conserved cysteine residues involved in disulfide bonds
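The consensus pattern on this slide can be turned into a regular expression. The sketch below is in Python rather than the Perl used for the OdzFinder, and it assumes that `x` means any residue, `x(m,n)` means m to n residues, and `a` means an aromatic residue (taken here to be F, W or Y). The names `EGF_PATTERN` and `find_egf_like` and the toy sequence are illustrative only.

```python
import re

# Assumed reading of the slide's pattern:
#   x      -> any residue          x(m,n) -> m to n residues
#   C, G   -> literal residues     a      -> aromatic residue (F, W or Y)
EGF_PATTERN = re.compile(
    r".{4}C.{0,48}C.{3,12}C.{1,70}C.{1,6}C.{2}G[FWY].{0,21}G.{2}C."
)


def find_egf_like(protein):
    """Return (start, end) spans of non-overlapping matches (0-based, end exclusive)."""
    return [match.span() for match in EGF_PATTERN.finditer(protein)]


# Toy sequence built to satisfy the pattern once; it is not a real Odz fragment.
toy = "AAAA" "C" "AAA" "C" "AAAA" "C" "AA" "C" "A" "C" "AA" "GY" "AAA" "G" "AA" "C" "A"
print(find_egf_like(toy))
```

A regular expression of this kind only flags candidate repeats; it says nothing about whether the cysteines actually form disulfide bonds.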

Page 7

The lab’s goals:

Genomics:

To find a broad family of Odz genes

Phylogenetic trees to discover segmentation mechanism

Massive alignment to find conserved regions

Biological in-vivo experiments to change regions

Proteomics:

The protein’s role

How the protein functions

The protein’s interactions with other proteins (e.g., Notch)

Page 8

Finding Odz Genes

BLASTing existing databases

BLASTing new EST libraries

Extracting DNA from various innocent creatures

[Diagram: Databases, EST Libraries, and Sequences discovered in the lab feed into the Odz DataBase]

Page 9

Odz Database

The collected data was organized by Michal Markovitz in a relational database.

The database consists of 10 different tables.

For example:

Page 10

2 problems remained:

1. Blast results include many non-Odz hits:

• prokaryotic hits
• non-metazoan hits
• EGF region hits
• Low similarity

[Chart (axis 0 to 80): hits by category: low score, prokaryotic, non-metazoan, Odz, EGF-like]

2. Every day…

• New sequences are added to the existing databases
• New EST libraries are released

We need a program to automatically extract Odz hits from NCBI Blast results!!!

Page 11

The OdzFinder

A Perl program that automatically extracts Odz hits from NCBI Blast results.

Page 12

S.O.F.T - Screen Odz Flow Template

[Flowchart. Input: Blast Report and Tax Report, joined in a Combination step. Decision nodes: Prokaryote?, Metazoan?, EGF? (branches: No EGF, All EGF, Mixed EGF), Look up table, Score>x?, Evalue<y?. Outputs: Odz, Update Database.]

Page 13

Input: Blast Report

BLASTs are performed on the Odz orthologs.

The results are sent to the OdzFinder program to be filtered.

The program extracts relevant information from each hit:

>gi|163076235|gb|AC765764.7 Apis mellifera BAC clone RP11-18D7 , complete sequence
Length = 184032
Score = 153 bits (328), Expect = 3e-36 Identities = 59/59 (100%), Positives = 59/59 (100%)
Frame = +3 / +3
Query: 3 IQHKTFKFHGNYIKQRFHPRIYK*RYKYQRFHPRIYK*NLNLYRVCCSHIILECLQTAH 179
         IQHKTFKFHGNYIKQRFHPRIYK*RYKYQRFHPRIYK*NLNLYRVCCSHIILECLQTAH
Subjct: 3 IQHKTFKFHGNYIKQRFHPRIYK*RYKYQRFHPRIYK*NLNLYRVCCSHIILECLQTAH 179
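As a rough illustration of the extraction step, the sketch below pulls the filtering fields out of the hit shown on this slide. It is written in Python rather than Perl, and the field names and regular expressions are assumptions about what "relevant information" covers. It does not handle every variation of text BLAST output.

```python
import re

# Hit text as shown on the slide (alignment midline omitted).
HIT_TEXT = """\
>gi|163076235|gb|AC765764.7 Apis mellifera BAC clone RP11-18D7 , complete sequence
Length = 184032
Score = 153 bits (328), Expect = 3e-36 Identities = 59/59 (100%), Positives = 59/59 (100%)
Frame = +3 / +3
Query: 3 IQHKTFKFHGNYIKQRFHPRIYK*RYKYQRFHPRIYK*NLNLYRVCCSHIILECLQTAH 179
Subjct: 3 IQHKTFKFHGNYIKQRFHPRIYK*RYKYQRFHPRIYK*NLNLYRVCCSHIILECLQTAH 179
"""

# Hypothetical field list: name -> (regex with one capture group, converter).
FIELDS = {
    "gi": (r">gi\|(\d+)\|", str),
    "length": (r"Length = (\d+)", int),
    "bits": (r"Score = ([\d.]+) bits", float),
    "evalue": (r"Expect = ([\d.eE+-]+)", float),
    "identity_pct": (r"Identities = \d+/\d+ \((\d+)%\)", int),
    "frame": (r"Frame = (\S+ / \S+)", str),
    "query_start": (r"Query: (\d+) ", int),
    "query_end": (r"Query: \d+ \S+ (\d+)", int),
}


def parse_hit(text):
    """Return a dict of the fields above; a field that is not found is None."""
    hit = {}
    for name, (pattern, convert) in FIELDS.items():
        match = re.search(pattern, text)
        hit[name] = convert(match.group(1)) if match else None
    return hit


hit = parse_hit(HIT_TEXT)
print(hit["gi"], hit["bits"], hit["evalue"], hit["query_start"], hit["query_end"])
```

Keeping the patterns in one table makes it easy to add or drop fields without touching the parsing loop.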

Page 14

Taxonomy Report
Eukaryota .................................. 2502 hits 41 orgs [root; cellular organisms]
. Bilateria ................................ 2421 hits 33 orgs [Fungi/Metazoa group; Metazoa; Eumetazoa]
. . Coelomata .............................. 2396 hits 31 orgs
. . . Deuterostomia ........................ 2322 hits 23 orgs
. . . . Chordata ........................... 2296 hits 22 orgs
. . . . . Euteleostomi ..................... 2236 hits 21 orgs [Craniata; Vertebrata; Gnathostomata; Teleostomi]
. . . . . . Tetrapoda ...................... 2022 hits 14 orgs [Sarcopterygii]
. . . . . . . Amniota ...................... 1908 hits 12 orgs
. . . . . . . . Eutheria ................... 1634 hits 10 orgs [Mammalia; Theria]

Search for eukaryotic and metazoan results.

Build prokaryotic database for possible future use.

Evolutionary distance becomes relevant when dealing with EGF-like repeats.

The program will receive the BLAST hit’s Taxonomy Report and manipulate it into a manageable hash table.

A default Taxonomy Report will be available when BLASTing against ESTs.

Input: Blast Report, Tax Report

root; cellular organisms; Eukaryota; Fungi/Metazoa group; Metazoa; Eumetazoa; Bilateria; Coelomata; Protostomia; Panarthropoda; Arthropoda; Mandibulata; Pancrustacea; Hexapoda; Insecta; Dicondylia; Pterygota; Neoptera; Endopterygota; Hymenoptera; Apocrita; Aculeata; Apoidea; Apidae; Apinae; Apini; Apis
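One way to make the lineage "manageable" is to split it into a list of taxa and test for the groups the filter cares about. The Python sketch below (a stand-in for the Perl hash table) assumes that a hit counts as prokaryotic when its lineage contains Bacteria or Archaea and as metazoan when it contains Metazoa. The function names and category labels are hypothetical.

```python
# Lineage for Apis as it appears in the slide, separators included.
RAW_LINEAGE = (
    "root ;cellular organisms ;Eukaryota ;Fungi/Metazoa group ;Metazoa ;"
    "Eumetazoa ;Bilateria ;Coelomata ;Protostomia ;Panarthropoda ;Arthropoda ;"
    "Mandibulata ;Pancrustacea ;Hexapoda ;Insecta ;Dicondylia ;Pterygota ;"
    "Neoptera ;Endopterygota ;Hymenoptera ;Apocrita ;Aculeata ;Apoidea; "
    "Apidae; Apinae; Apini; Apis"
)

PROKARYOTE_ROOTS = {"Bacteria", "Archaea"}  # assumption, not from the slides


def parse_lineage(text):
    """Split a ';'-separated lineage into clean taxon names, root first."""
    return [taxon.strip() for taxon in text.split(";") if taxon.strip()]


def classify_lineage(taxa):
    """Label a lineage as 'prokaryotic', 'metazoan' or 'non-metazoan'."""
    names = set(taxa)
    if names & PROKARYOTE_ROOTS:
        return "prokaryotic"
    if "Metazoa" in names:
        return "metazoan"
    return "non-metazoan"


# Hash table keyed by the last taxon of the lineage (here the genus).
lineages = {}
taxa = parse_lineage(RAW_LINEAGE)
lineages[taxa[-1]] = taxa
print(taxa[-1], classify_lineage(lineages["Apis"]))
```

Storing the full lineage, rather than only the label, keeps the door open for measuring how far apart query and subject are.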

Page 15

Tenascin-m (odz) includes 8 EGF-like repeats

The conserved EGF region gave problematic results.

Many hits appear only due to their similarity to the EGF region.

[Figure: Query and Subject aligned over the EGF region only (EGF?), giving a High score!!!]

Page 16

There are three possible positions regarding the hit’s relation to the query’s EGF-like region:

I. The hit is completely outside the query’s EGF-region

[Diagram: Query with EGF region at 525 to 804; Hit lies outside it]

II. The hit is completely inside the query’s EGF-region

[Diagram: Query (marks at 525, 804, 2750) with EGF region at 525 to 804; Hit lies within it]

III. The hit is partially in the query’s EGF-region

[Diagram: Query with EGF region at 525 to 804; Hit partly overlaps it]
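The three positions amount to an interval comparison between the aligned part of the query and its EGF-like region. The Python sketch below assumes that the 525 and 804 marks in the diagrams are the boundaries of that region on the query. The function name and the returned labels (borrowed from the flowchart) are illustrative.

```python
EGF_START, EGF_END = 525, 804  # read off the slide's diagram; assumed boundaries


def egf_position(hit_start, hit_end, egf_start=EGF_START, egf_end=EGF_END):
    """Classify the query span of a hit relative to the query's EGF-like region."""
    lo, hi = sorted((hit_start, hit_end))  # tolerate reversed coordinates
    if hi < egf_start or lo > egf_end:
        return "No EGF"      # completely outside
    if lo >= egf_start and hi <= egf_end:
        return "All EGF"     # completely inside
    return "Mixed EGF"       # partial overlap


for span in [(3, 179), (600, 700), (500, 600)]:
    print(span, egf_position(*span))
```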

Page 17

Get a better picture..

Page 18

Position I:

The hit is completely outside the query’s EGF-like region

Score & e-value are examined

Set low thresholds to ensure that very small hits are not missed; sometimes they are translocations

[Flow: No EGF -> Score>x? (yes) -> Evalue<y? (yes) -> Odz]
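The two checks for a hit outside the EGF-like region can be sketched as a single predicate. The slides do not give values for x and y, so the defaults below are placeholders chosen only to make the Python example runnable.

```python
def passes_thresholds(bits, evalue, min_bits=40.0, max_evalue=1e-5):
    """Accept a hit when Score > x and E-value < y.

    min_bits (x) and max_evalue (y) are hypothetical defaults; the slides
    only say that the thresholds are set low so small hits are not missed.
    """
    return bits > min_bits and evalue < max_evalue


# The hit from the Blast Report slide: 153 bits, Expect = 3e-36.
print(passes_thresholds(153.0, 3e-36))
```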

Page 19

Position II:

The hit is completely inside the query’s EGF-like region

In order to prevent acceptance of non-Odz hits with high scores due to their EGF region, a look up table was established:

evolutionarily close query & subject: high id % demanded

evolutionarily distant query & subject: low id % demanded

Look up table example:

Query          Hit                        Odz Ortholog   Odz Paralog
Mus musculus   Homo sapiens               95%            70%
Mus musculus   Drosophila melanogaster    75%            55%

[Flow: All EGF -> Look up table? -> Score>x? (yes) -> Evalue<y? (yes) -> Odz]
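The look up table can be modeled as a mapping from a species pair to the identity percentages demanded. The Python sketch below uses only the two rows shown on the slide and assumes that each column is the minimum identity needed to call the hit an ortholog or a paralog. The names `LOOKUP` and `classify_egf_hit` are hypothetical.

```python
# (query species, hit species) -> minimum identity % per relationship.
LOOKUP = {
    ("Mus musculus", "Homo sapiens"): {"ortholog": 95, "paralog": 70},
    ("Mus musculus", "Drosophila melanogaster"): {"ortholog": 75, "paralog": 55},
}


def classify_egf_hit(query_species, hit_species, identity_pct):
    """Return 'ortholog', 'paralog' or 'rejected'; None if the pair is not in the table."""
    row = LOOKUP.get((query_species, hit_species))
    if row is None:
        return None
    if identity_pct >= row["ortholog"]:
        return "ortholog"
    if identity_pct >= row["paralog"]:
        return "paralog"
    return "rejected"


print(classify_egf_hit("Mus musculus", "Homo sapiens", 80))
print(classify_egf_hit("Mus musculus", "Drosophila melanogaster", 80))
```

The same 80% identity lands in different categories for the two pairs, which is the point of scaling the demand with evolutionary distance.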

Page 20

Position III:

The hit is partially inside the query’s EGF-like region

2 Possibilities:

A. False call! An EGF hit with insignificant similarity outside of EGF domains.

B. The Real Thing! EGF with adjacent regions of significant similarity.

[Flow: Mixed EGF -> Is it more like A or like B? -> A: Treat like II; B: Treat like I]

Page 21

Update Database

DBI

A database interface module for Perl

Enables Perl applications to access multiple database types

Provides a consistent database interface independent of the actual database being used

Data flow through DBI:

Perl Script -> DBI -> DBD::MSQL -> MySQL RDBMS
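The update step can be illustrated with Python's DB-API, which plays a role comparable to Perl's DBI: the script talks to one interface and a driver handles the specific database. The sketch below uses the standard-library `sqlite3` driver with an in-memory database instead of MySQL so that it runs anywhere. The table name `odz_hits` and its columns are assumptions, not the lab's schema.

```python
import sqlite3

# In-memory SQLite stands in for the MySQL server used in the project.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE odz_hits (gi TEXT PRIMARY KEY, score REAL, species TEXT)"
)


def store_hits(connection, hits):
    """Insert accepted hits; a hit whose gi already exists is overwritten."""
    with connection:  # commits on success, rolls back on error
        connection.executemany(
            "INSERT OR REPLACE INTO odz_hits (gi, score, species) VALUES (?, ?, ?)",
            hits,
        )


# The hit from the Blast Report slide, stored twice to show there is no duplicate row.
store_hits(conn, [("163076235", 153.0, "Apis mellifera")])
store_hits(conn, [("163076235", 153.0, "Apis mellifera")])

rows = conn.execute("SELECT gi, score, species FROM odz_hits").fetchall()
print(rows)
```

Placeholders (`?`) keep the values out of the SQL string itself. Perl DBI offers the same facility through bind parameters.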

Page 22

Results!

gi         score   species
49256537   140     Xenopus
48096180   637     Apis mellifera
45382362   619     Gallus gallus
42658224   125     Homo sapiens
34932761   384     Rattus norvegicus
38087011   463     Mus musculus
45446084   419     Drosophila melanogaster
32565715   1604    Caenorhabditis elegans
41469033   760     Gasterosteus aculeatus

[Chart: hits by category: EGF, Odz, not Metazoa, Prokaryotic]

Page 23

Special thanks to our project adviser

Dr. Ron Wides

For his guidance, patience & Krispy Kreme donuts