Information System for Bee Gene Annotation
Xin He
Beespace Grouping Meeting
Nov 30, 2005
Motivation
Analysis of bee microarray expression data requires an information system that provides functions not available elsewhere
No public database dedicated to the honey bee
Non-traditional queries. Example: EST queries, find similarly expressed genes, etc.
Tasks
Gene → homologs
Gene → GO terms
GO term → genes
Gene → genes with similar expression
Gene → genes with similar GO annotation
Database Design: Basic Entities
Ids: biological sequences. Three subtypes: Gene, Protein, EST
Gonames: GO terms
Database Design: Basic Relationships
Homologs: pairwise sequence similarity
Gos: gene annotation
Gosims: pairwise similarity of GO annotations
Exprsims: pairwise similarity of gene expression patterns
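The entities and relationships above could be laid out as relational tables. The following is only a sketch using Python's built-in sqlite3 module: the table names follow the slides, but every column name, the choice of SQLite, and the two helper functions are assumptions made for illustration, not the actual Beespace schema.

```python
import sqlite3

# Table names follow the slides; all column names are hypothetical.
SCHEMA = """
CREATE TABLE ids (
    id   TEXT PRIMARY KEY,
    kind TEXT NOT NULL CHECK (kind IN ('gene', 'protein', 'est'))
);
CREATE TABLE gonames (
    go_id TEXT PRIMARY KEY,
    name  TEXT NOT NULL
);
CREATE TABLE homologs (
    id1    TEXT NOT NULL REFERENCES ids(id),
    id2    TEXT NOT NULL REFERENCES ids(id),
    evalue REAL NOT NULL,
    PRIMARY KEY (id1, id2)
);
CREATE TABLE gos (
    id    TEXT NOT NULL REFERENCES ids(id),
    go_id TEXT NOT NULL REFERENCES gonames(go_id),
    PRIMARY KEY (id, go_id)
);
CREATE TABLE gosims (
    id1   TEXT NOT NULL REFERENCES ids(id),
    id2   TEXT NOT NULL REFERENCES ids(id),
    score REAL NOT NULL,
    PRIMARY KEY (id1, id2)
);
CREATE TABLE exprsims (
    id1   TEXT NOT NULL REFERENCES ids(id),
    id2   TEXT NOT NULL REFERENCES ids(id),
    score REAL NOT NULL,
    PRIMARY KEY (id1, id2)
);
"""


def create_database(path=":memory:"):
    """Create the sketch schema and return an open connection."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def go_terms_for_gene(conn, gene_id):
    """Gene -> GO terms: look up annotations in the gos table."""
    rows = conn.execute(
        "SELECT go_id FROM gos WHERE id = ? ORDER BY go_id", (gene_id,)
    )
    return [row[0] for row in rows]


def genes_for_go_term(conn, go_id):
    """GO term -> genes: the reverse lookup on the same table."""
    rows = conn.execute(
        "SELECT id FROM gos WHERE go_id = ? ORDER BY id", (go_id,)
    )
    return [row[0] for row in rows]
```

With this layout, the two GO lookups in the task list are reads of the same gos table in opposite directions, and the three pairwise relationships share one shape (two sequence ids plus a score or E-value).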
Implementation of Tasks
Gene → homologs: BLAST all pairs of genes. Choose E-value threshold 10E-10
Gene → GO terms
Fly: downloaded from Gene Ontology
Bee: from bee biologists
GO term → genes
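As a sketch of the homolog step, the snippet below filters all-against-all BLAST hits by E-value. It assumes the hits were saved in BLAST's 12-column tabular format, where the E-value is the 11th column; the slides do not say which output format was used. The function name homolog_pairs is hypothetical, and the default cutoff of 1e-10 is one reading of the slide's "10E-10", so pass a different value if another reading is intended.

```python
import csv


def homolog_pairs(blast_lines, max_evalue=1e-10):
    """Return (query, subject, evalue) for hits at or below max_evalue.

    blast_lines: an iterable of tab-separated lines in BLAST tabular
    format (an assumption about how the hits were stored).
    Self-hits are dropped, since a gene is trivially similar to itself.
    """
    pairs = []
    for row in csv.reader(blast_lines, delimiter="\t"):
        # Skip blank lines and comment lines.
        if not row or row[0].startswith("#"):
            continue
        query, subject = row[0], row[1]
        evalue = float(row[10])  # 11th column holds the E-value
        if query != subject and evalue <= max_evalue:
            pairs.append((query, subject, evalue))
    return pairs
```

The resulting pairs are what a Homologs relationship would store; keeping the E-value alongside each pair allows a stricter cutoff to be applied later without rerunning BLAST.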
Implementation of Tasks
Gene → genes with similar expression: compute pairwise Pearson correlation. Choose threshold 0.5
Gene → genes with similar GO annotation
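The expression-similarity step can be sketched as follows. The function names and the dictionary layout (gene id mapped to a list of expression values) are assumptions for illustration. The slides give only the threshold 0.5, so keeping pairs with r >= 0.5 (rather than using the absolute value of r) and treating a constant profile as having correlation 0 are also assumptions.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        # Correlation is undefined for a constant profile; 0.0 is an assumption.
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def similar_expression_pairs(profiles, threshold=0.5):
    """Return (gene1, gene2, r) for every gene pair with r >= threshold.

    profiles: dict mapping gene id -> list of expression values, with the
    same experiments in the same order for every gene.
    """
    pairs = []
    for (g1, p1), (g2, p2) in combinations(sorted(profiles.items()), 2):
        r = pearson(p1, p2)
        if r >= threshold:
            pairs.append((g1, g2, r))
    return pairs
```

Computing every pair is quadratic in the number of genes, which is why the result is worth precomputing and storing (as in Exprsims) rather than recomputing per query.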
GO-based Similarity
Idea: two genes are similar if they share some GO terms. Favor specific GO terms
View each gene as a document and a GO term as a term
Vector-space model: let t be a term and g be a gene, then
TF(t,g) = 1 if g is annotated with t; 0 otherwise
IDF(t) = log[n/n(t)]
n(t): number of genes annotated with t
Cosine similarity
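A minimal sketch of this similarity measure follows. The function names and the input layout (gene id mapped to a set of GO term ids) are hypothetical. The slide defines only n(t); taking n to be the total number of annotated genes is an assumption based on the usual definition of IDF.

```python
import math


def idf_weights(annotations):
    """IDF(t) = log(n / n(t)) for every GO term t.

    annotations: dict mapping gene id -> set of GO term ids.
    n is taken to be the number of genes in the dict (an assumption).
    """
    n = len(annotations)
    counts = {}
    for terms in annotations.values():
        for t in set(terms):
            counts[t] = counts.get(t, 0) + 1
    return {t: math.log(n / c) for t, c in counts.items()}


def go_similarity(g1, g2, annotations, idf):
    """Cosine similarity of two genes' TF-IDF vectors.

    TF is binary, so each gene's vector has weight IDF(t) on every term
    it is annotated with and 0 elsewhere.
    """
    t1, t2 = set(annotations[g1]), set(annotations[g2])
    dot = sum(idf[t] ** 2 for t in t1 & t2)
    norm1 = math.sqrt(sum(idf[t] ** 2 for t in t1))
    norm2 = math.sqrt(sum(idf[t] ** 2 for t in t2))
    if norm1 == 0 or norm2 == 0:
        # A gene with only zero-weight terms has no usable vector.
        return 0.0
    return dot / (norm1 * norm2)
```

A term annotated to every gene gets IDF log(1) = 0 and so contributes nothing, while rarely used terms dominate the score. That is how the measure favors specific GO terms.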
Demonstration…
For Discussion
Internal database, shared by all Beespace projects. Include: Genes, Proteins, GO Terms, Expression
Ontology-based similarity: applications? “Candidate genes” retrieval. Example: find all genes involved in the segmentation clock