Information System for Bee Gene Annotation Xin He Beespace Grouping Meeting Nov 30, 2005

Motivation

Analysis of bee microarray expression data requires an information system that provides functions not available elsewhere:

No public database is dedicated to the honey bee.
Non-traditional queries are needed. Examples: EST queries, finding similarly expressed genes, etc.

Tasks

Gene → homologs
Gene → GO terms
GO term → genes
Gene → genes with similar expression
Gene → genes with similar GO annotation

Database Design: Basic Entities

Ids: biological sequences. Three subtypes: Gene, Protein, EST
Gonames: GO terms

Database Design: Basic Relationships

Homologs: pairwise sequence similarity
Gos: gene annotation
Gosims: pairwise similarity of GO annotations
Exprsims: pairwise similarity of gene expression patterns
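
The entities and relationships named on the design slides could map onto relational tables roughly as sketched here. This is only an illustration: the table names come from the slides, but every column name, the `kind` field for the three sequence subtypes, and the choice of SQLite are assumptions, not the actual Beespace schema.

```python
import sqlite3

# Table names follow the slides; all column names are hypothetical.
SCHEMA = """
CREATE TABLE ids (
    id   TEXT PRIMARY KEY,
    kind TEXT NOT NULL CHECK (kind IN ('gene', 'protein', 'est'))
);
CREATE TABLE gonames (
    go_id TEXT PRIMARY KEY,
    name  TEXT NOT NULL
);
CREATE TABLE homologs (
    id1    TEXT NOT NULL REFERENCES ids(id),
    id2    TEXT NOT NULL REFERENCES ids(id),
    evalue REAL NOT NULL,
    PRIMARY KEY (id1, id2)
);
CREATE TABLE gos (
    id    TEXT NOT NULL REFERENCES ids(id),
    go_id TEXT NOT NULL REFERENCES gonames(go_id),
    PRIMARY KEY (id, go_id)
);
CREATE TABLE gosims (
    id1   TEXT NOT NULL REFERENCES ids(id),
    id2   TEXT NOT NULL REFERENCES ids(id),
    score REAL NOT NULL,
    PRIMARY KEY (id1, id2)
);
CREATE TABLE exprsims (
    id1   TEXT NOT NULL REFERENCES ids(id),
    id2   TEXT NOT NULL REFERENCES ids(id),
    score REAL NOT NULL,
    PRIMARY KEY (id1, id2)
);
"""


def create_database(path=":memory:"):
    """Create an empty database with the sketched tables."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def go_terms_for_gene(conn, gene_id):
    """Gene -> GO terms: return (go_id, name) pairs for one gene."""
    rows = conn.execute(
        "SELECT g.go_id, n.name FROM gos AS g "
        "JOIN gonames AS n ON n.go_id = g.go_id "
        "WHERE g.id = ? ORDER BY g.go_id",
        (gene_id,),
    )
    return rows.fetchall()


def genes_for_go_term(conn, go_id):
    """GO term -> genes: return the ids annotated with one GO term."""
    rows = conn.execute(
        "SELECT id FROM gos WHERE go_id = ? ORDER BY id", (go_id,)
    )
    return [row[0] for row in rows]
```

With the similarity relationships stored as precomputed pair tables, each lookup task reduces to a single indexed query rather than a computation at request time.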

Implementation of Tasks

Gene → homologs: BLAST all pairs of genes. Choose E-value threshold 10E-10

Gene → GO terms
Fly: downloaded from Gene Ontology
Bee: from bee biologists

GO term → genes
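
The homolog step could be sketched as a filter over BLAST hits. The sketch assumes the all-against-all results are available in BLAST's 12-column tabular format, where the E-value is the 11th field; the slides do not say which output format was used. It also assumes the threshold "10E-10" is meant as 1e-10 (read literally it would be 1e-9), so the cutoff is left as a parameter. The function name is hypothetical.

```python
import csv

# Assumed reading of the slide's "10E-10"; pass a different value if needed.
EVALUE_THRESHOLD = 1e-10


def homolog_pairs(tabular_lines, threshold=EVALUE_THRESHOLD):
    """Yield (query, subject, evalue) for hits at or below the threshold.

    tabular_lines is any iterable of tab-separated BLAST hit lines.
    Comment lines, blank lines and self-hits are skipped.
    """
    for row in csv.reader(tabular_lines, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue
        query, subject, evalue = row[0], row[1], float(row[10])
        if query != subject and evalue <= threshold:
            yield query, subject, evalue
```

An open file handle on the BLAST output can be passed directly as `tabular_lines`, and the resulting pairs would populate the Homologs relationship.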

Implementation of Tasks

Gene → genes with similar expression: compute pairwise Pearson correlation. Choose threshold 0.5

Gene → genes with similar GO annotation
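
The expression-similarity step might look like the following sketch. It assumes each gene has an expression profile of equal length across the same microarray conditions, and that the 0.5 threshold applies to the signed correlation (r >= 0.5) rather than its absolute value; the slides do not specify either point. Function names are hypothetical.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        # A flat profile has no defined correlation; treated as 0 here.
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def similar_expression_pairs(profiles, threshold=0.5):
    """Return (gene1, gene2, r) for every gene pair with r >= threshold.

    profiles maps a gene id to its list of expression values.
    """
    pairs = []
    for g1, g2 in combinations(sorted(profiles), 2):
        r = pearson(profiles[g1], profiles[g2])
        if r >= threshold:
            pairs.append((g1, g2, r))
    return pairs
```

Computing all pairs is quadratic in the number of genes, which is one reason to compute the pairs once and store them in the Exprsims relationship.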

GO-based Similarity

Idea: two genes are similar if they share some GO terms. Favor specific GO terms

View each gene as a document and a GO term as a term

Vector-space model: let t be a term and g be a gene. Then
TF(t, g) = 1 if g is annotated with t; 0 otherwise
IDF(t) = log[n / n(t)]
n: total number of genes
n(t): number of genes annotated with t

Cosine similarity
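
A minimal sketch of this similarity, assuming each gene's vector component for a term is the product TF × IDF (the slide lists TF and IDF but does not spell out how they are combined) and that n is the number of genes in the annotation table. Function names are hypothetical.

```python
import math


def idf_weights(annotations):
    """Compute IDF(t) = log(n / n(t)) for every GO term.

    annotations maps a gene id to the collection of GO terms annotating it.
    """
    n = len(annotations)
    counts = {}
    for terms in annotations.values():
        for term in set(terms):
            counts[term] = counts.get(term, 0) + 1
    return {term: math.log(n / count) for term, count in counts.items()}


def go_similarity(gene1, gene2, annotations, idf):
    """Cosine similarity of two genes' TF-IDF vectors over GO terms.

    TF is binary, so a gene's weight for an annotated term is IDF(t).
    """
    terms1 = set(annotations[gene1])
    terms2 = set(annotations[gene2])
    dot = sum(idf[t] ** 2 for t in terms1 & terms2)
    norm1 = math.sqrt(sum(idf[t] ** 2 for t in terms1))
    norm2 = math.sqrt(sum(idf[t] ** 2 for t in terms2))
    if norm1 == 0 or norm2 == 0:
        return 0.0
    return dot / (norm1 * norm2)
```

Because a term shared by every gene gets IDF(t) = log(1) = 0, it contributes nothing to the score, which is how the weighting favors specific GO terms over general ones.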

Demonstration…

For Discussion

Internal database, shared by all Beespace projects. Includes: Genes, Proteins, GO Terms, Expression

Ontology-based similarity: applications?
“Candidate genes” retrieval. Example: find all genes involved in the segmentation clock