Karen Eilbeck 7/22/08 Ontological relations and computable definitions for sequences at DNA, RNA and protein levels Karen Eilbeck Neocles Leontis Thomas Bittner Colin Batchelor

Karen Eilbeck 7/22/08

Ontological relations and computable definitions for sequences at DNA, RNA and protein levels

Karen Eilbeck

Neocles Leontis

Thomas Bittner

Colin Batchelor

Two sections

1. A report on the joint RNAO and SO meeting held in SLC in April 2008 (Eilbeck, Leontis and Bittner)

2. Computable definitions for 1D and 2D structures (Batchelor)

Ontological Relations for Sequences at DNA, RNA, and protein levels

A report on the joint RNAO and SO meeting held in SLC in April 2008.

Karen Eilbeck

Neocles Leontis

Thomas Bittner

Aim of meeting

• Coordinate the development of relationships between SO and RNAO

Universals and instances

• Universal: repeatable or recurrent entities that can be instantiated or exemplified by many particular things.

• Instance: A universal may have instances, known as its particulars. They identify single objects such as “that chromosome under that microscope”.

What is a sequence?

• Sequence is a universal. The same sequence can be located in many places at the same time.

• Manifestation of the sequence happens at the molecular level.

Same sequence, different molecule.

Identifying regions and relations between regions

• Category theory.
  – Morphism: relationship between some posited domain and codomain.
  – Isomorphism between DNA and RNA (both directions).
  – Morphism between RNA and protein (information loss from protein to RNA).
  – Morphism between DNA and protein.
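The distinction between an isomorphism and a plain morphism can be illustrated with a minimal sketch. It assumes sequences are plain strings, treats transcription as the T to U letter substitution on the coding strand, and uses a deliberately tiny subset of the standard genetic code; the function names and the CODONS table are hypothetical and not part of SO or RNAO.

```python
# Illustrative sketch only: strings stand in for sequences.

def transcribe(dna: str) -> str:
    """DNA -> RNA as a letter substitution (coding strand assumed)."""
    return dna.replace("T", "U")

def reverse_transcribe(rna: str) -> str:
    """RNA -> DNA: the exact inverse of transcribe, so the map is an isomorphism."""
    return rna.replace("U", "T")

# Small subset of the standard genetic code: four codons all encode alanine (A).
CODONS = {"GCU": "A", "GCC": "A", "GCA": "A", "GCG": "A", "AUG": "M", "UGG": "W"}

def translate(rna: str) -> str:
    """RNA -> protein, codon by codon. Raises KeyError for codons outside the subset."""
    return "".join(CODONS[rna[i:i + 3]] for i in range(0, len(rna) - 2, 3))

dna = "ATGGCT"
rna = transcribe(dna)
assert reverse_transcribe(rna) == dna            # round trip recovers the DNA
assert translate("AUGGCU") == translate("AUGGCC")  # two RNAs, one protein: no inverse
```

Because several codons map to the same residue, the protein sequence alone cannot be mapped back to a unique RNA, which is the information loss noted above.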

Next step 1: core terms and relations

http://song.cvs.sourceforge.net/*checkout*/song/ontology/working_draft.obo

Next step 2: even more relationships

• Homology and similarity relationships

• Topological relationships

• Supportive evidence relationships

Next step 3: Description logic

• Conversion of core types and relations to formal logic.

• A sound foundation to build upon for the features and other types in RNAO and SO

People:

Karen Eilbeck - SO
University of [email protected]

Neocles Leontis - RNAO
BGSU
[email protected]

Thomas Bittner - OBO
Buffalo
[email protected]

Colin Batchelor - relations in SO
RSC
[email protected]

Computable definitions

Colin Batchelor

Computable definitions

These consist of necessary and sufficient conditions, generally written in OBO or OWL format.

Example from SO: any primary transcript that is adjacent to a cap must be a capped_primary_transcript, and conversely all capped_primary_transcripts are primary transcripts that are adjacent to caps.

id: SO:0000861
name: capped_primary_transcript
def: "A primary transcript that is capped." [SO:xp]
intersection_of: SO:0000185 ! primary_transcript
intersection_of: adjacent_to SO:0000581 ! cap
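A rough way to see what "necessary and sufficient" means in practice is to write the two intersection_of clauses as a predicate. The sketch below is not a reasoner and does not parse OBO; the Feature class and its fields are hypothetical stand-ins for annotated sequence features.

```python
from dataclasses import dataclass, field

@dataclass
class Feature:
    """Hypothetical annotated feature: asserted types plus adjacent features."""
    types: set
    adjacent_to: list = field(default_factory=list)

def is_capped_primary_transcript(f: Feature) -> bool:
    # Both intersection_of clauses must hold, and together they are enough.
    is_primary_transcript = "primary_transcript" in f.types
    touches_cap = any("cap" in n.types for n in f.adjacent_to)
    return is_primary_transcript and touches_cap

cap = Feature({"cap"})
capped = Feature({"primary_transcript"}, [cap])
uncapped = Feature({"primary_transcript"})

assert is_capped_primary_transcript(capped)
assert not is_capped_primary_transcript(uncapped)
```

A feature never has to be labelled capped_primary_transcript by hand here: the label follows from its asserted type and its adjacency.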

What does this buy us?

It makes ontology maintenance easier for the curators.

But most importantly:

With computable definitions, reasoners can in principle annotate automatically…

Loops (1)

Consider an example 1D sequence:

......(((((....((....))..))).))...

The definition of a tetraloop could look like this:

tetraloop = "...." that (adjacent_to "(") and (adjacent_to ")")

Much like the capture group in the regex \((\.{4})\)
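As a quick check, the regex can be run with Python's re module against the example sequence. The string below is the slide's example with the typographic ellipses expanded into plain dots, which is an assumption about the original notation.

```python
import re

# Example dot-bracket string (ellipsis characters expanded to dots: an assumption).
structure = "......(((((....((....))..))).))..."

# "(" followed by exactly four unpaired positions followed by ")".
TETRALOOP = re.compile(r"\((\.{4})\)")

# Record the start offset and text of each captured loop.
matches = [(m.start(1), m.group(1)) for m in TETRALOOP.finditer(structure)]
print(matches)  # [(17, '....')]
```

Only the second run of four dots matches. The first run, at offset 11, has "(" on both sides, so it fails the adjacent_to ")" condition.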

Loops (2): (includes cardinality)

loop = ".+" that adjacent_to "(" and adjacent_to ")"

diloop = loop that has_part "." cardinality exactly 2

triloop = loop that has_part "." cardinality exactly 3
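The cardinality-based definitions can be mimicked by matching the general loop pattern first and then naming the loop by its length. This is a sketch on dot-bracket strings, not on OWL classes; classify_loops and the NAMES table are hypothetical.

```python
import re

# General loop: one or more unpaired positions between "(" and ")".
LOOP = re.compile(r"\((\.+)\)")

# Named subclasses keyed by the cardinality of "." parts.
NAMES = {2: "diloop", 3: "triloop", 4: "tetraloop"}

def classify_loops(structure: str) -> list:
    """Return (start offset, class name) for each loop found."""
    return [
        (m.start(1), NAMES.get(len(m.group(1)), "loop"))
        for m in LOOP.finditer(structure)
    ]

print(classify_loops("((((..))))"))  # [(4, 'diloop')]
print(classify_loops("((...))"))     # [(2, 'triloop')]
```

Loops whose length has no dedicated name fall back to the parent class "loop", mirroring the subclass relationship in the definitions.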

Loops (3): stem-loops

Assume no kinks or bulges or pseudoknots. Take a simple example: ((((..))))

"424 stemloop" = sequence that
has_part ("(" cardinality exactly 4 that adjacent_to diloop)
and has_part (diloop that adjacent_to "(")
and has_part (diloop that adjacent_to ")")
and has_part (")" cardinality exactly 4 that adjacent_to diloop)

But what about the general case?

Loops (4): stem-loops and formal grammar

This:

\({n}\.+\){n}

is not a valid regular expression, because n would have to stand for the same unspecified count on both sides. It reduces to a^n b^n, which is well known to be non-regular.

Likewise in OWL you cannot say cardinality exactly n. So what do we do?
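To make the obstacle concrete, the sketch below recognises the simple stem-loop shape (no kinks, bulges or pseudoknots). The regex only splits the string into three runs; the comparison of the two run lengths happens outside the regex, and that comparison is precisely the step a regular expression cannot express. The function name is hypothetical.

```python
import re

def is_simple_stem_loop(s: str) -> bool:
    """True for n '(' then one or more '.' then exactly n ')'."""
    m = re.fullmatch(r"(\(+)(\.+)(\)+)", s)
    # The regex accepts any counts; equal counts must be checked separately.
    return bool(m) and len(m.group(1)) == len(m.group(3))

print(is_simple_stem_loop("((((..))))"))  # True
print(is_simple_stem_loop("(((..))))"))   # False
```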

A way out

Write the necessary and sufficient conditions in terms of the 2D structure.

Hence:

stem-loop = structure that
has_part (base-pair that bound_to base-pair)
and has_part (base-pair that bound_to loop)
and has_part loop
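One possible reading of this definition on dot-bracket input is sketched below. It assumes balanced brackets, and it interprets "base-pair bound_to loop" as a pair enclosing only unpaired positions and "base-pair bound_to base-pair" as a pair stacked directly outside another; both interpretations, and the function names, are assumptions of this sketch rather than definitions from SO or RNAO.

```python
def base_pairs(structure: str) -> list:
    """Recover (open, close) index pairs from balanced dot-bracket notation."""
    stack, pairs = [], []
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            pairs.append((stack.pop(), i))  # IndexError if unbalanced
    return pairs

def has_stem_loop(structure: str) -> bool:
    pairs = set(base_pairs(structure))
    # Base pair bound to a loop: everything strictly between its ends is unpaired.
    closing = [
        (i, j) for (i, j) in pairs
        if j - i > 1 and set(structure[i + 1:j]) == {"."}
    ]
    # Base pair bound to a base pair: a second pair stacked directly outside.
    return any((i - 1, j + 1) in pairs for (i, j) in closing)

print(base_pairs("((..))"))    # [(1, 4), (0, 5)]
print(has_stem_loop("((..))")) # True
print(has_stem_loop("(..)"))   # False
```

No count n appears anywhere in the check: the condition is stated over pairs and their neighbours, which is what lets the definition avoid the a^n b^n problem.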

What next?

Write necessary and sufficient conditions for some example motifs.

Take 2D structures in RNAML that contain known example motifs.

Convert RNAML to OWL.

Run reasoner.