Collaborative Construction of Large Biological Ontologies

Page 1: Collaborative Construction of Large Biological Ontologies

Iowa State University, Department of Computer Science, Artificial Intelligence Research Laboratory

Collaborative Construction of Large Biological Ontologies

Jie Bao (a)

This work is in collaboration with Zhiliang Hu (b), LaRon Hughes (b), Doina Caragea (c), Peter Wong (a), James Reecy (b), Vasant G Honavar (a)

(a) Artificial Intelligence Research Laboratory, Department of Computer Science; Center for Computational Intelligence, Learning, and Discovery

(b) Department of Animal Science, Iowa State University, Ames, IA 50011, USA

(c) Department of Computing and Information Sciences, Kansas State University, Manhattan, KS 66506

Email: {baojie, zhu, laron, pwwong, jreecy, honavar}@iastate.edu, [email protected]

Page 2: Collaborative Construction of Large Biological Ontologies

Outline

• Collaborative Ontology Building (COB) Desiderata

• Limitations of CVS-based Collaboration

• COB based on Modular Ontologies

• The COB Editor

Page 3: Collaborative Construction of Large Biological Ontologies

Large Biological Ontologies

Gramineae Taxonomy

Plant Ontology

Gene Ontology

MGED Ontology

(microarray)

Page 4: Collaborative Construction of Large Biological Ontologies

Example: Gene Ontology

Page 5: Collaborative Construction of Large Biological Ontologies

Non-collaborative Ontology Building

Download Ontology → Local Editing → Upload Ontology

(single curator)

(Protégé) (OBO-Edit)

Page 6: Collaborative Construction of Large Biological Ontologies

Collaboration In Need

Example: Gene Ontology Consortium

Page 7: Collaborative Construction of Large Biological Ontologies

Collaboration In Need (2)

Example 2: an animal trait ontology that involves multiple research groups across the world

Species: Swine, Cattle, Chicken, Horse

Each group works on an ontology module for a particular species (according to the group’s best expertise)

Page 8: Collaborative Construction of Large Biological Ontologies

Challenges

• Knowledge Integration

• Concurrency Management

• Consistency Maintenance

• Privilege Management

• History Maintenance

• Scalability

Page 9: Collaborative Construction of Large Biological Ontologies

Solutions

1. Pipeline <= Very limited collaboration

• Divide the ontology building process into sequential phases

• Each phase is assigned to a particular contributor.

2. CVS <= Collaboration with high cost

• CVS = Concurrent Versions System

• Treat an ontology as a single file/document;

• Use collaborative tools like CVS to build the ontology.

3. Modular Ontology <= Our approach

• Build the ontology with fine-grained modules;

• Different contributors can concurrently edit different modules.

Page 10: Collaborative Construction of Large Biological Ontologies

Outline

• Collaborative Ontology Building (COB) Desiderata

• Limitations of CVS-based Collaboration

• COB based on Modular Ontologies

• The COB Editor

Page 11: Collaborative Construction of Large Biological Ontologies

CVS-based Ontology Building

Get GO CVS Account

Get SourceForge Account

Set Up CVS Access

Submit Change Request

Track the Request

User submits change suggestion (in natural language)

Get SourceForge Account

Take a Change Request

Curator

Download Whole GO Flat File

Local Editing

Make Local Log File

Save GO Flat File

Manual Version Control

Commit the Whole New Ontology to CVS

Page 12: Collaborative Construction of Large Biological Ontologies

Unprincipled Authorization and Organization

• No principled mechanism for managing curator privilege assignments.

• No clear organizational division of the whole ontology into smaller manageable units.

Page 13: Collaborative Construction of Large Biological Ontologies

Risk of Inconsistency

• No principled way to avoid unintended couplings and over-writing.

• The validity and consistency of the ontology depend heavily on
  – curator discipline and
  – good community communication (e.g., via email lists).

Page 14: Collaborative Construction of Large Biological Ontologies

Lack of Partial Editing/Reuse

• A curator has to
  – download the entire ontology before editing,
  – and submit the entire modified ontology after editing;

• A user cannot download and reuse only a selected subset of the ontology

• High communication and memory overhead!

Page 15: Collaborative Construction of Large Biological Ontologies

Expensive History Maintenance

• Even a minor edit of the ontology causes the ontology file to be replicated in its entirety

• Tracing the change history of a term requires processing the entire ontology file for comparisons (e.g., diff)
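
To make the cost concrete, here is a minimal sketch of tracing one term across two versions of a single-file ontology. It is not the Gene Ontology tooling; the flat-file snippets, the term identifiers, and the helper name `term_changes` are made up for illustration, and only Python's standard `difflib` is used.

```python
import difflib


def term_changes(old_text: str, new_text: str, term_id: str) -> list[str]:
    """Return the added/removed lines that mention term_id.

    Both complete files are compared even though only one term
    is of interest.
    """
    diff = difflib.unified_diff(
        old_text.splitlines(), new_text.splitlines(), lineterm="", n=0
    )
    return [
        line
        for line in diff
        if line.startswith(("+", "-"))
        and not line.startswith(("+++", "---"))  # skip the file headers
        and term_id in line
    ]


# Hypothetical flat-file contents (illustrative, not the real GO syntax).
OLD_VERSION = "T:0001 name=egg weight\nT:0002 name=litter size\n"
NEW_VERSION = "T:0001 name=egg weight\nT:0002 name=litter size at birth\n"

changes = term_changes(OLD_VERSION, NEW_VERSION, "T:0002")
```

Every lookup of this kind reads both full versions, so the work grows with the size of the whole ontology rather than with the size of the change.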

Page 16: Collaborative Construction of Large Biological Ontologies

Limited Participation

• Since all editing has global effect, it is difficult to
  – grant scoped privileges to different types of users (e.g., core curators versus normal curators)
  – accept/deny/modify/revert local changes made by other curators

• The curator community has to be limited to a small number of trusted curators.

Page 17: Collaborative Construction of Large Biological Ontologies

Outline

• Collaborative Ontology Building (COB) Desiderata

• Limitations of CVS-based Collaboration

• COB based on Modular Ontologies

• The COB Editor

Page 18: Collaborative Construction of Large Biological Ontologies

Basic Strategy

• Localize the interactions among different parts of a large ontology.

• Build an ontology with fine-grained organizational structure.

• Allow group collaboration on different ontology modules.

Page 19: Collaborative Construction of Large Biological Ontologies

Package-based Ontologies

• The whole ontology consists of a set of packages

• Each package represents a fragment of the whole ontology

• Each term has a "home package"

[Figure: Animal Trait ontology with packages General, Cattle, Pig, Chicken; terms are labeled with their home package, e.g. Egg (Chicken), Reproduction (General)]
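
The package and home-package idea on this slide can be sketched as a small data structure. This is a minimal illustration, not the COB Editor's data model: the class and method names are hypothetical, and the term-to-package assignments are an illustrative reading of the figure labels.

```python
from dataclasses import dataclass, field


@dataclass
class Package:
    """A fragment of the whole ontology."""
    name: str
    terms: set[str] = field(default_factory=set)


class PackagedOntology:
    """An ontology as a set of packages; each term has one home package."""

    def __init__(self) -> None:
        self.packages: dict[str, Package] = {}
        self._home: dict[str, str] = {}  # term -> home package name

    def add_package(self, name: str) -> None:
        self.packages[name] = Package(name)

    def add_term(self, term: str, package: str) -> None:
        if term in self._home:
            raise ValueError(f"{term!r} already has home package {self._home[term]!r}")
        self.packages[package].terms.add(term)
        self._home[term] = package

    def home_package(self, term: str) -> str:
        return self._home[term]


ato = PackagedOntology()
for name in ("General", "Cattle", "Pig", "Chicken"):
    ato.add_package(name)
ato.add_term("Egg", "Chicken")
ato.add_term("Reproduction", "General")
```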

Page 20: Collaborative Construction of Large Biological Ontologies

Package Nesting

• A nested package is a part of another package

• Could be used to represent the organizational structure of an ontology
  – Arrange knowledge
  – Enforce hierarchical management of knowledge

[Figure: Animal trait ontology; package labels: General, Pig, Pig Health]

Page 21: Collaborative Construction of Large Biological Ontologies

Division of Labor

• A package can be assigned to curators with the best knowledge of the relevant sub-domain.
  – e.g. Pig Health, Pig Reproduction

• The package hierarchy helps to manage interactions among experts with different degrees of expertise.
  – e.g. Pig, Pig Health
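
One way to picture hierarchy-based division of labor is a privilege check over nested package paths. The rule below (a curator assigned to a package may also edit everything nested inside it) is an assumption made for this sketch, and the curator names and function names are hypothetical; the slides do not spell out the exact policy.

```python
# A package is identified by its path from the top of the hierarchy,
# e.g. ("Pig", "Pig Health") is nested inside ("Pig",).
PackagePath = tuple[str, ...]


def is_within(package: PackagePath, ancestor: PackagePath) -> bool:
    """True if `package` is `ancestor` itself or nested inside it."""
    return package[: len(ancestor)] == ancestor


def can_edit(
    assignments: dict[str, list[PackagePath]], curator: str, package: PackagePath
) -> bool:
    """Assumed rule: an assignment covers the package and all nested packages."""
    return any(is_within(package, owned) for owned in assignments.get(curator, []))


# Hypothetical curator assignments.
ASSIGNMENTS = {
    "pig_expert": [("Pig",)],
    "pig_health_expert": [("Pig", "Pig Health")],
}

broad = can_edit(ASSIGNMENTS, "pig_expert", ("Pig", "Pig Health"))
narrow = can_edit(ASSIGNMENTS, "pig_health_expert", ("Pig", "Pig Reproduction"))
```

Here `broad` is true because Pig Health sits inside Pig, while `narrow` is false because the Pig Health curator has no assignment covering Pig Reproduction.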

Page 22: Collaborative Construction of Large Biological Ontologies

Partial Reuse

[Figure, left: Animal Trait Ontology (Centralized); labels: General, Cattle, Pig, Chicken, Pork]

[Figure, right: Animal Trait Ontology (Package-based); packages: General, Pig, Cattle, Chicken, Pork; connected by semantic importing]

Legend: knowledge incorporated in Pork ontology; knowledge not presented in Pork ontology
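
Partial reuse through importing can be sketched as computing which packages must be loaded for a given package. This is a simplification at whole-package granularity, and the import graph below is hypothetical: the slide shows that Pork reuses knowledge through semantic importing but does not list the exact import links.

```python
def packages_to_load(imports: dict[str, set[str]], root: str) -> set[str]:
    """Return `root` plus every package it imports, directly or indirectly."""
    needed: set[str] = set()
    stack = [root]
    while stack:
        package = stack.pop()
        if package not in needed:
            needed.add(package)
            stack.extend(imports.get(package, ()))
    return needed


# Hypothetical import graph for the animal trait ontology.
IMPORTS = {
    "Pork": {"Pig"},
    "Pig": {"General"},
    "Cattle": {"General"},
    "Chicken": {"General"},
    "General": set(),
}

pork_view = packages_to_load(IMPORTS, "Pork")
```

Under these assumed imports, a Pork user loads Pork, Pig, and General, and never touches Cattle or Chicken.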

Page 23: Collaborative Construction of Large Biological Ontologies

Scalability

• Reduction in communication overhead and computational time cost
  – Parsing
  – Transferring
  – Consistency check

• Reduction in memory requirements
  – Ontology can be partially loaded into memory

• Reduction in history tracking cost
  – Effect of changes is localized

Page 24: Collaborative Construction of Large Biological Ontologies

Broadened Participation

• Open-community collaboration success witnessed by DMOZ and Wikipedia

• Package-based ontology management can
  – Control the scope of an editing action
  – Minimize the risk of vandalism

• Better tradeoff between broader participation and ontology quality
  – There are different levels of curators, e.g. ontology admins, pig experts, pig health experts.
  – An editing action can be approved or denied by a curator with higher privileges

Page 25: Collaborative Construction of Large Biological Ontologies

Outline

• Collaborative Ontology Building (COB) Desiderata

• Limitations of CVS-based Collaboration

• COB based on Modular Ontologies

• The COB Editor

Page 26: Collaborative Construction of Large Biological Ontologies

The COB Editor

Pig Package

Cattle Package

Chicken Package

[Bao et al. BIDM06]

Page 27: Collaborative Construction of Large Biological Ontologies

Collaborative Ontology Building

Ontology modularity facilitates collaborative building

• Each package can be independently developed

• Different curators can concurrently edit the ontology on different packages

• Ontology can be only partially loaded

• Unwanted interactions are minimized by limiting term and axiom visibility

• Module access privileges can be controlled by the package hierarchy

Page 28: Collaborative Construction of Large Biological Ontologies

Work with COB Editor

Download

• http://www.animalgenome.org/bioinfo/projects/ATO/

• http://sourceforge.net/projects/cob (source code)

Get Ontology Account

Check out a package

Curator

Create new package

or Lock Package

Edit the Package

Commit the Package

(Auto) Server Change Log
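
The curator workflow on this slide (lock, check out, edit, commit, automatic change log) can be sketched with an in-memory stand-in for the server. This is not the COB Editor's API: `PackageServer`, its methods, the curator names, and the trait names are all hypothetical, and a real deployment would sit on the shared database server.

```python
from dataclasses import dataclass, field


@dataclass
class PackageServer:
    packages: dict[str, set[str]] = field(default_factory=dict)  # package -> terms
    locks: dict[str, str] = field(default_factory=dict)          # package -> curator
    change_log: list[str] = field(default_factory=list)

    def lock(self, package: str, curator: str) -> None:
        holder = self.locks.setdefault(package, curator)
        if holder != curator:
            raise PermissionError(f"{package} is locked by {holder}")

    def check_out(self, package: str) -> set[str]:
        # Only the requested package is copied, not the whole ontology.
        return set(self.packages[package])

    def commit(self, package: str, curator: str, terms: set[str]) -> None:
        if self.locks.get(package) != curator:
            raise PermissionError(f"{curator} does not hold the lock on {package}")
        old = self.packages.get(package, set())
        added, removed = sorted(terms - old), sorted(old - terms)
        self.packages[package] = set(terms)
        # The server writes the log entry; the curator keeps no manual log.
        self.change_log.append(f"{curator} committed {package}: +{added} -{removed}")
        del self.locks[package]


server = PackageServer({"Pig": {"litter size"}})
server.lock("Pig", "alice")
working_copy = server.check_out("Pig")
working_copy.add("backfat thickness")
server.commit("Pig", "alice", working_copy)
```

Compared with the CVS workflow earlier in the deck, the unit of locking, transfer, and logging here is a single package.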

Page 29: Collaborative Construction of Large Biological Ontologies

More Features

• Support import/export from/to OWL and OBO format
  – can be used for Gene Ontology and others

• Ontology shared on a database server

• Allows multi-relational hierarchies
  – e.g. both is-a and part-of

• Visibility of a term can be controlled by scope limitation modifiers
  – e.g. public, private, protected
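
A minimal sketch of how scope limitation modifiers might restrict term visibility. The semantics are assumed by analogy with programming-language access modifiers (public: visible everywhere; private: visible only in the home package; protected: visible in the home package and packages nested inside it); the slides name the modifiers but do not define them, and all identifiers below are hypothetical.

```python
from enum import Enum

PackagePath = tuple[str, ...]


class Scope(Enum):
    PUBLIC = "public"
    PROTECTED = "protected"
    PRIVATE = "private"


def is_visible(scope: Scope, home: PackagePath, viewer: PackagePath) -> bool:
    """Decide whether a term with the given scope is visible from `viewer`."""
    if scope is Scope.PUBLIC:
        return True
    if scope is Scope.PRIVATE:
        return viewer == home
    # PROTECTED (assumed): the home package and any package nested inside it.
    return viewer[: len(home)] == home


PIG = ("Pig",)
PIG_HEALTH = ("Pig", "Pig Health")
CATTLE = ("Cattle",)

protected_from_nested = is_visible(Scope.PROTECTED, PIG, PIG_HEALTH)
protected_from_other = is_visible(Scope.PROTECTED, PIG, CATTLE)
private_from_nested = is_visible(Scope.PRIVATE, PIG, PIG_HEALTH)
```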

Page 30: Collaborative Construction of Large Biological Ontologies

Conclusions

• Modular ontologies can improve collaborative ontology building in many aspects

• Package-based Ontology offers an importing-based ontology language.

• COB Editor provides the necessary tool to collaboratively build well-structured, large-scale, biomedical ontologies

Page 31: Collaborative Construction of Large Biological Ontologies

Future Work

• Support of inference and consistency checking

• Accommodation and modularization of existing ontologies, e.g. GO, EC, SCOP

• Support of ontology mapping and ontology integration

• Support of more expressive ontologies, e.g. UMLS, SNOMED

Page 32: Collaborative Construction of Large Biological Ontologies

Thanks!