21
A method to propagate permissions in biomedical data using a semantic web framework Helena F. Deus and Jonas S. Almeida [email protected] The University of Texas M. D. Anderson Cancer Center

A method to propagate permissions in biomedical data using a semantic web framework

  • Upload
    keran

  • View
    34

  • Download
    1

Embed Size (px)

DESCRIPTION

A method to propagate permissions in biomedical data using a semantic web framework Helena F. Deus and Jonas S. Almeida [email protected] The University of Texas M. D. Anderson Cancer Center. History of the web. Web 1.0 Links -> Documents Web 2.0 - PowerPoint PPT Presentation

Citation preview

Page 1: A method to propagate permissions in biomedical data using a semantic web framework

A method to propagate permissions in biomedical data using a semantic

web framework

Helena F. Deus and Jonas S. [email protected]

The University of Texas M. D. Anderson Cancer Center

Page 2: A method to propagate permissions in biomedical data using a semantic web framework

Web 1.0

Links -> Documents

Web 2.0

Links -> Data Structures -> Web services

Web 3.0

Links -> Web Services -> Links -> Web Services -> Links -> Web Services .…

History of the webHistory of the web

Page 3: A method to propagate permissions in biomedical data using a semantic web framework

Evolution of data representationEvolution of data representation

Nature Biotechnology. 2005 Vol 23 Nr 29Nature Biotechnology. 2005 Vol 23 Nr 29

Page 4: A method to propagate permissions in biomedical data using a semantic web framework

Clinical/Medical data

MDAxxxxMDAxxxx

MDAxxxx

Electronic Health Records

RDBMS

Life is good!

Data management in the life Data management in the life sciencessciences

Page 5: A method to propagate permissions in biomedical data using a semantic web framework

MDAxxxxMDAxxxx

MDAxxxx

RDBMS

Clinical/Medical dataCore facilities data

Microarrays

Protein Arrays

Pulse Field Gel Electrophoresis

DNA Sequencing

Data everywhere!

Heterogeneous data managementHeterogeneous data management

Page 6: A method to propagate permissions in biomedical data using a semantic web framework

Semantic web of data: a set of best practicesSemantic web of data: a set of best practices

Page 7: A method to propagate permissions in biomedical data using a semantic web framework

W3CW3C

Data

Information

Knowledge

Wisdom

Files

TEXT

XML

RDF

OWL, OBO

SPARQL

A data pyramidA data pyramid

Page 8: A method to propagate permissions in biomedical data using a semantic web framework

S3DB Core ModelS3DB Core Model

Page 9: A method to propagate permissions in biomedical data using a semantic web framework

Snapshots of interfaces using S3DB’s API (Application Programming Interface). These applications exemplify why the semantic web designs can be particularly effective at enabling generic tools to assist users in exploring data documenting very specific and very complex relationships. Snapshot A was taken from S3DB’s web interface, which is included in the downloadable package. This interface was developed to assist in managing the database model and, therefore, is centered on the visualization and manipulation of the domain of discourse, its Collections of Items and Rules defining the documentation of their relations. The application depicted on snapshots B-D describe a document management tool S3DBdoc, freely available as a Bioinformatics Station module (see Figure 6). The navigation is performed starting from the Project (C), then to the Collection (B) and finally to the editing of the Statements about an Item (D). The snapshot B illustrates an intermediate step in the navigation where the list of Items (in this case samples assayed by tissue arrays, for which there is clinical information about the donor) is being trimmed according to the properties of a distant entity, Age at Diagnosis, which is a property of the Clinical Information Collection associated with the sample that originated the array results. This interaction would have been difficult and computationally intensive to manage using a relational architecture. The RDF formatted query result produced by the API was also visualized using a commercial tool, Sentient Knowledge Explorer (IO-Informatics Inc), shown in snapshot E, and by Welkin, F, developed by the digital inter-operability SIMILE project at the Massachusetts Institute of Technology. See text for discussion of graphic representations by these tools. To protect patient confidentiality some values in snapshots B and D are scrambled and numeric sample and patient identifiers elsewhere are altered.

PLoS ONE. 2008 Aug 13;3(8):e2946PLoS ONE. 2008 Aug 13;3(8):e2946

Page 10: A method to propagate permissions in biomedical data using a semantic web framework

http://tcga.s3db.org

Example: TCGA data structureExample: TCGA data structure

Page 11: A method to propagate permissions in biomedical data using a semantic web framework

Patient

Sample

Patient

Sample

blood

S3DB Rule

Tissue

tumor

??

sampleX patientYR427

http://tcga.s3db.org/R247

S3DB Statement http://tcga.s3db.org/S234

Page 12: A method to propagate permissions in biomedical data using a semantic web framework

TCGA domain - instanceTCGA domain - instance

PLoS ONE. 2008 Dec;3(12):e4076PLoS ONE. 2008 Dec;3(12):e4076

Page 13: A method to propagate permissions in biomedical data using a semantic web framework

SPARQLSPARQL

Page 14: A method to propagate permissions in biomedical data using a semantic web framework

Code portability and distributed dataCode portability and distributed data

API API

APISPARQL

Page 15: A method to propagate permissions in biomedical data using a semantic web framework

Permission managementPermission management

Markov Model

Page 16: A method to propagate permissions in biomedical data using a semantic web framework

Permission propagationPermission propagation

Page 17: A method to propagate permissions in biomedical data using a semantic web framework

Upper ontologies

Intermediate Ontologies

Domain-Specific Ontologies

MGED and others

Experimental, evolving Data Models

Proposed entry level for computation

Current entry level for computation

Raw data

Experimental evolving ontologiesExperimental evolving ontologies

Page 18: A method to propagate permissions in biomedical data using a semantic web framework

exfoliatins104enterotoxins103ClfB102

LN2 viability test101institution100antibiotic consumption97MRSE frequency96MRSA frequency95Plasmid analysis81mechanism and genes74target73name63number of children62DCC61bed size60specialty59category58SCCmec typing57Rep-PCR56Dot-blot55LN2 freezing54patient clinical data53Hospital52final classification51species and tests50code49indoor area48outdoor area47number of employees46number of rooms45country, city44country, state/province/county, city43-80oC42isolate reference41susceptibility40ITQB isolate39MIC38alternative name373-4 letter code36name35country, state/province/county, city34PCR genes amplification33Agr32susceptibility31beta-lactamase30isolates from same subject29MIC28setting, hospital/DCC/heard, service/room, ICU27project, period26collection date25disk inhibition24subject type23full name22class21abbreviation20Antibiotic19SmaI hybridization bands18Phagetyping17Ribotyping16other15hemolysins14leukocidins13project, station12disk inhibition11PFGE10ClaI-mecA::Tn5549MLST8patient (or subject) demographic data7patient admittance data6collection site5RAPD4monthly fee3Doubling time2Spa typing1Entity#

exfoliatins104enterotoxins103ClfB102

LN2 viability test101institution100antibiotic consumption97MRSE frequency96MRSA frequency95Plasmid analysis81mechanism and genes74target73name63number of children62DCC61bed size60specialty59category58SCCmec typing57Rep-PCR56Dot-blot55LN2 freezing54patient clinical data53Hospital52final classification51species and tests50code49indoor area48outdoor area47number of employees46number of rooms45country, city44country, state/province/county, city43-80oC42isolate reference41susceptibility40ITQB isolate39MIC38alternative name373-4 letter code36name35country, state/province/county, city34PCR genes amplification33Agr32susceptibility31beta-lactamase30isolates from same subject29MIC28setting, hospital/DCC/heard, service/room, ICU27project, period26collection date25disk inhibition24subject type23full name22class21abbreviation20Antibiotic19SmaI hybridization bands18Phagetyping17Ribotyping16other15hemolysins14leukocidins13project, station12disk inhibition11PFGE10ClaI-mecA::Tn5549MLST8patient (or subject) demographic data7patient admittance data6collection site5RAPD4monthly fee3Doubling time2Spa typing1Entity#

Day 5

Day 17

Day 365

Ontology-centric web clientS3DB is equipped with REST application programming interface (API ) , that is, client applications can be easily weaved by composing URL calls with variable values.

A year A year in the life of in the life of a semantic a semantic databasedatabase

A year A year in the life of in the life of a semantic a semantic databasedatabase

• Seeding: The first stage of usage of the semantic database is characterized by a focus on the domain of discourse. In this seeding stage many Rules are inserted without validation by submission of actual data (Statements).

• Seeding: The first stage of usage of the semantic database is characterized by a focus on the domain of discourse. In this seeding stage many Rules are inserted without validation by submission of actual data (Statements).

Day 152

Growth: This third pattern of usage is much longer than the previous two and corresponds to a relative light activity editing the domain of discourse while, on the contrary, an intensification of the database access by the target community of users. This is distinct from the preceding Calibration state where data submission is frequently aided or even mediated by the database developers.

• Maturation: The end of the data acquisition program that motivated the creation of the database is sometimes associated with a decrease in the insertion of new data (Statements) and a near stop in the editing of the domain of discourse (Rules). This period of maturation therefore produces a stable data service that remains useful and is accessed regularly. We found this period to be ideal for harvesting: exporting the database schema for analysis of the knowledge domain, including the designing of intuitive Graphic User Interfaces.

Document-centric clients… and client side applications can be easily developed, relying only on the

REST protocol to interoperate with the S3DB DBMS service.

S3DB is being used for a variety of molecular epidemiology domains, for

example, for Cancer Research:

Day 25

S

essions 0 100 200 300 400 500 600 700 800 900 1000

Rules

0 10 20 30 40 50 60 70

Users

0

5

10

15

20

25

Statements per rule

0

500

1000

1500

2000

2500

0

• Calibration: once the submission of data triples (Statements) intensifies, the seed data model is reconsidered and is significantly edited. This second stage is characterized by heavy activity both regarding expanding or updating the domain of discourse and also regarding submission of data. We found this to be the right time to engage the user community with training programs.

• Calibration: once the submission of data triples (Statements) intensifies, the seed data model is reconsidered and is significantly edited. This second stage is characterized by heavy activity both regarding expanding or updating the domain of discourse and also regarding submission of data. We found this to be the right time to engage the user community with training programs.

Page 19: A method to propagate permissions in biomedical data using a semantic web framework

What is S3DB?

It is a web service that manages semantic web content distinguishing the domain of discourse from its instantiation. It was configured specifically for the needs of Biomedical Informatics projects where:

1.Those who submit the data keep a fine tuned control over its access and use.

2.The data model is deployed over a core ontology that allows its editing.

3.It has a distributed deployment designed to deal with heterogeneous environments.

What S3DB is not?

1. It is not a client application.

2. It is not a “work in progress”: a SPARQL endpoint assures that experimental data is not kept outside of the Linked Data Web until is matures

S3DB.ORG

Page 20: A method to propagate permissions in biomedical data using a semantic web framework

In ConclusionIn ConclusionDissolution of boundaries

between data structures is a good thing… But doing it without losing the role of each data element is even better

Some level of explicit granularity in the data is necessary to implement a permission model.

Page 21: A method to propagate permissions in biomedical data using a semantic web framework

AcknowledgemenAcknowledgementsts

Jonas S. AlmeidaKadir AkdemirMiriã CoelhoCintia PalúPablo Freire

The Integrative Bioinformatics Lab at the University of Texas MD Anderson Cancer Center (Houston, Tx)

Instituto de Tecnologia Quimica e Biologica, Universidade Nova de Lisboa (Lisbon, Portugal)

http://s3db.org