Confessions/Disclaimers, Ontologies and REDfly, CARO, SO, OBO Foundry

•Confessions/Disclaimers

•Ontologies and REDfly

•CARO

•SO

•OBO Foundry

REDfly: Regulatory Element Database for Drosophila

Database of known Drosophila cis-regulatory elements

•over 675 CRMs so far from >150 genes

•includes sequence and expression pattern

•includes links to other relevant databases

•soon will include transcription factor binding site data (FlyReg)

•soon will include images of expression patterns

•data exchange with ORegAnno in development

[Figure: reporter construct (green fluorescent protein, minimal promoter, CRM); make transgenic animal]

Ontologies and REDfly

Our overall goal is to create a comprehensive source of sequence and expression pattern data for Drosophila transcriptional cis-regulatory modules within a completely interoperable, ontology-compliant database framework. This framework can then be used as a model for managing CRM data from any other organism.

The OBO Foundry
Barry Smith
University at Buffalo
http://ontology.buffalo.edu/smith

Ontology: A Vision for the Future and Its Realization

(Slide legend: new; undergoing rigorous reform)

•GO Gene Ontology

•ChEBI Chemical Ontology

•CL Cell Ontology

•FMA Foundational Model of Anatomy

•PaTO Phenotype Quality Ontology

•SO Sequence Ontology

•CARO Common Anatomy Reference Ontology

•CTO Clinical Trial Ontology

•FuGO OBI Ontology of Biomedical Investigation

•PrO Protein Ontology

•RnaO RNA Ontology

•RO Relation Ontology

•CARO/Drosophila anatomy ontology

•OBI

•GO

•organism/species ontology

•CRM type (promoter, silencer, etc.)

•SO

Ontologies and REDfly: anatomy

Ontologies and REDfly: GO

Table 1: GO terms representing > 20% of genes associated with the REDfly subset CRMs

GO ID        % of Genes   GO Term

molecular function
GO:0005488   80.56%       binding
GO:0003676   55.56%       nucleic acid binding
GO:0030528   54.17%       transcription regulator activity
GO:0003677   48.61%       DNA binding
GO:0003700   40.97%       transcription factor activity
GO:0005515   25.69%       protein binding

biological process
GO:0007582   93.75%       physiological process
GO:0007275   83.33%       development
GO:0008152   75.69%       metabolism
GO:0044238   73.61%       primary metabolism
GO:0050789   65.28%       regulation of biological process
GO:0006139   59.72%       nucleobase, nucleoside, nucleotide and nucleic acid metabolism
GO:0006350   57.64%       transcription
GO:0009653   50.69%       morphogenesis
GO:0030154   45.83%       cell differentiation
GO:0009790   43.75%       embryonic development
GO:0007154   28.47%       cell communication
GO:0007165   26.39%       signal transduction
GO:0000003   25.00%       reproduction
GO:0016043   24.31%       cell organization and biogenesis
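A table like this can be derived by counting, for each GO term, the share of genes annotated to it and keeping the terms above the 20% cutoff. The sketch below is only illustrative: the gene names and annotations are made up, the function name `frequent_terms` is hypothetical, and it assumes each gene's term set already includes the ancestor terms (the slides do not say how the REDfly numbers were computed).

```python
from collections import Counter


def frequent_terms(gene_to_terms, threshold=0.20):
    """Return {GO ID: fraction of genes} for terms annotating more than `threshold` of genes."""
    n_genes = len(gene_to_terms)
    # Count each term at most once per gene.
    counts = Counter(term for terms in gene_to_terms.values() for term in set(terms))
    return {term: n / n_genes for term, n in counts.items() if n / n_genes > threshold}


# Hypothetical toy annotations (not REDfly data).
annotations = {
    "geneA": {"GO:0005488", "GO:0003677"},
    "geneB": {"GO:0005488"},
    "geneC": {"GO:0005488", "GO:0007275"},
    "geneD": {"GO:0007275"},
    "geneE": {"GO:0007275", "GO:0005488"},
}

result = frequent_terms(annotations)
for go_id, fraction in sorted(result.items(), key=lambda item: -item[1]):
    print(f"{go_id}   {fraction:.2%}")
```

In this toy set GO:0003677 annotates exactly one gene in five, so it sits at the cutoff and is dropped because the comparison is strictly "greater than".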

Ontologies and REDfly: complex queries

Mus musculus

evx1

central nervous system
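The three labels above read as the parts of one combined query: species, gene, and anatomical expression term. A minimal sketch of such a query follows, assuming a hypothetical record layout (`CrmRecord`) and made-up records; it is not the REDfly schema or search interface.

```python
from dataclasses import dataclass, field


@dataclass
class CrmRecord:
    # Hypothetical record layout, for illustration only.
    name: str
    species: str
    gene: str
    anatomy_terms: set = field(default_factory=set)


def find_crms(records, species=None, gene=None, anatomy=None):
    """Return names of records matching every criterion that is not None."""
    hits = []
    for rec in records:
        if species is not None and rec.species != species:
            continue
        if gene is not None and rec.gene != gene:
            continue
        if anatomy is not None and anatomy not in rec.anatomy_terms:
            continue
        hits.append(rec.name)
    return hits


# Made-up example records.
RECORDS = [
    CrmRecord("crm_1", "Mus musculus", "evx1", {"central nervous system"}),
    CrmRecord("crm_2", "Mus musculus", "evx1", {"limb"}),
    CrmRecord("crm_3", "Drosophila melanogaster", "eve", {"central nervous system"}),
]

hits = find_crms(RECORDS, species="Mus musculus", gene="evx1",
                 anatomy="central nervous system")
print(hits)
```

Such a query only works across organisms if the species, gene, and anatomy fields draw their values from shared ontologies rather than free text.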

CARO: Common Anatomy Reference Ontology
http://www.bioontology.org/wiki/index.php/CARO:Main_Page

The main focus of this workshop is to pave the way for interoperability between the anatomical ontologies developed for various organisms (including human) by agreeing on shared methodologies for building our respective ontologies.

1. a list of relations (especially part_of) used within anatomical ontologies, including definitions and rules for consistent use within anatomy ontologies;

2. a list of major organizational units of biological organisms at all levels of granular partitions (e.g. biological macromolecule, cell, organ);

3. a representation of developmental stages of organisms; are anatomy and development two separate ontologies or a single integrated one? If separate, what are the relations between them and how should they be applied?

4. a method that allows automated reasoners to recognize homologous anatomical structures of different species.
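Item 1 matters because a consistently defined part_of relation lets software infer indirect containment. The sketch below illustrates this with a made-up hierarchy; the terms and the function name `part_of_closure` are hypothetical and are not taken from CARO or any published anatomy ontology.

```python
def part_of_closure(term, part_of):
    """Return every structure that `term` is directly or indirectly part_of."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for whole in part_of.get(current, ()):
            if whole not in seen:  # also guards against accidental cycles
                seen.add(whole)
                stack.append(whole)
    return seen


# Hypothetical toy hierarchy: child -> list of direct wholes.
PART_OF = {
    "mushroom body": ["brain"],
    "brain": ["central nervous system"],
    "central nervous system": ["nervous system"],
    "nervous system": ["organism"],
}

wholes = part_of_closure("brain", PART_OF)
print(sorted(wholes))
```

With such a closure, a record annotated to "brain" can be returned by a query for "central nervous system", provided part_of is used transitively and with the same meaning in every ontology.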

To meet our objectives, we need to create a common anatomy reference ontology (CARO) designed to ensure interoperability of the anatomy ontologies developed for specific organisms. This common ontology will comprise both top-level categories and a common set of relations to be used within anatomical ontologies; CARO will be embedded in a set of principles for constructing anatomy ontologies for different organisms at different developmental stages.

Committed or likely to commit:

zebrafish (ZFIN)

Drosophila (FlyBase)

human (FMA)

amphibians

cypriniform fishes

hymenoptera (bees, wasps, ants)

mouse (MGI)

Dictyostelium (dictyBase)

C. elegans (WormBase)

SO: Sequence Ontology
http://www.sequenceontology.org

The Sequence Ontology is a set of terms and relationships used to describe the features and attributes of biological sequence. It encompasses both "raw" features, such as nucleotide similarity hits, and interpretations such as gene models. It also provides a rich set of attributes to describe these features such as "polycistronic" and "maternally imprinted".

REDfly: CRM, TFBS

ORegAnno: Regulatory haplotype, regulatory polymorphism, regulatory region, TFBS

Obi asks (discussion item): Should further sub-categorization of regulatory regions be allowed (e.g. silencer, enhancer, locus-control region, etc.)?

I say: not only “should”; these sub-categories need to be present.

SO: Some problems

enhancer: A cis-acting sequence that increases the utilization of (some) eukaryotic promoters, and can function in either orientation and in any location (upstream or downstream) relative to the promoter.

is_a TF_module

TF_module: A regulatory_region where more than 1 TF_binding_site together are regulatorily active. Synonyms: CRM

TF_binding_site: A region of a molecule that binds to a transcription factor.

SO: More problems?

The region on a DNA molecule involved in RNA polymerase binding to initiate transcription.

A region of sequence which is part of a promoter.

partial listing of known core promoter motifs

silencer: Combination of short DNA sequence elements which suppress the transcription of an adjacent gene or genes.

“sequence elements” not defined in ontology

is_a TF_module?

LCR: A DNA region that includes DNAse hypersensitive sites located 5' to a gene that confers the high-level, position-independent, and copy number-dependent expression to that gene.

is_a TF_module

A TF_module that regulates the activity of more than one gene within a defined locus
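One practical reason the is_a placement of these terms matters: a query for TF_module (CRM) features should also return features typed with its subtypes. The sketch below encodes the is_a links discussed on these slides as a plain dictionary. It treats the silencer link as given even though the slide marks it with "?", and the feature names and function names are hypothetical; this is not the actual SO file or an SO library API.

```python
# is_a links as discussed on the slides (child -> parent).
IS_A = {
    "TF_module": "regulatory_region",
    "enhancer": "TF_module",
    "silencer": "TF_module",  # shown with "?" on the slide; assumed here
    "LCR": "TF_module",
}


def subtypes(term, is_a):
    """Return `term` plus every term that reaches it through is_a links."""
    result = {term}
    changed = True
    while changed:
        changed = False
        for child, parent in is_a.items():
            if parent in result and child not in result:
                result.add(child)
                changed = True
    return result


def select_by_type(features, term, is_a):
    """Keep features whose SO type is `term` or one of its subtypes."""
    wanted = subtypes(term, is_a)
    return [name for name, so_type in features if so_type in wanted]


# Made-up annotated features: (name, SO type).
features = [
    ("feature_1", "enhancer"),
    ("feature_2", "TF_binding_site"),
    ("feature_3", "LCR"),
    ("feature_4", "TF_module"),
]

crm_like = select_by_type(features, "TF_module", IS_A)
print(crm_like)
```

If a term such as silencer is not placed under TF_module, features annotated with it silently drop out of this kind of query, which is why the placement questions raised here have practical consequences.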

What terms are required to adequately capture cis-regulatory information?

Yuh and Davidson (1998)

“Composite_CRM” ?

Please provide suggestions/examples/cases to me ([email protected]), the RegCreative Wiki, or via the SO mailing list (https://lists.sourceforge.net/lists/listinfo/song-devel)

CRM_part?

What additional terms & definitions do we need to add and what needs to be changed?

Please provide suggestions/examples/cases to me ([email protected]), the RegCreative Wiki, or the SO mailing list (https://lists.sourceforge.net/lists/listinfo/song-devel)