TOOLS AND REPOSITORIES FOR DATA STORAGE OR DISTRIBUTED DATA ACCESS HARIDIMOS KONDYLAKIS FORTH-ICS VPH WORKSHOP ON CLINICAL DATA MANAGEMENT & SUSTAINABILITY AMSTERDAM, MARCH 2015


TOOLS AND REPOSITORIES FOR DATA STORAGE OR DISTRIBUTED DATA ACCESS

HARIDIMOS KONDYLAKIS

FORTH-ICS

VPH WORKSHOP ON CLINICAL DATA MANAGEMENT & SUSTAINABILITY

AMSTERDAM, MARCH 2015


OUTLINE

1. MOTIVATION

2. INTEROPERABILITY & DATA INTEGRATION

3. EXPERIENCES

4. LESSONS


1. MOTIVATION: BIG DATA ERA


1.1 SCIENTIFIC DATA GROWTH

[Howe & Halperin, 2012]


1.2 E-SCIENCE

• Enhanced quality of care
• Readily available information
• Improved cost efficiency
• Ubiquitous access to medical information


2. INTEROPERABILITY & DATA INTEGRATION

• Although people often build databases in isolation, they still want to share their data.
• Interoperability is the ability of two or more information systems to accept data from each other [Bhartiya & Mehrotra, 2014].
• Data integration is the problem of providing unified and transparent access to a collection of data stored in multiple, autonomous, and heterogeneous data sources [Calvanese, 2006].
• Integration generally goes beyond mere interoperability and also involves some degree of functional dependency.
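The definition of data integration above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the two in-memory sources, their field names, and the unified schema (`patient`, `drug`) are hypothetical. Each source gets a small wrapper that translates its own syntax and field names into the unified schema, so a query can be asked once without knowing which source answers it.

```python
import csv
import io
import json

# Two hypothetical, autonomous sources with different syntaxes and field names.
CSV_SOURCE = "patient_id,drug\np1,DONEPEZIL\n"
JSON_SOURCE = '[{"pid": "p2", "medication": "DONEPEZIL"}]'


def read_csv_source(text):
    """Wrapper for the CSV source: yields records in the unified schema."""
    for row in csv.DictReader(io.StringIO(text)):
        yield {"patient": row["patient_id"], "drug": row["drug"]}


def read_json_source(text):
    """Wrapper for the JSON source: yields records in the unified schema."""
    for item in json.loads(text):
        yield {"patient": item["pid"], "drug": item["medication"]}


def patients_on(drug):
    """Unified query: the caller does not need to know which source answers."""
    records = list(read_csv_source(CSV_SOURCE)) + list(read_json_source(JSON_SOURCE))
    return sorted(r["patient"] for r in records if r["drug"] == drug)
```

Calling `patients_on("DONEPEZIL")` gathers matching patients from both sources. A real mediator would also have to handle source discovery, access control, and semantic mismatches, which this sketch leaves out.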


2.1 WHY IS IT A DIFFICULT PROBLEM?

• Number of sources / size of the problem
• Location of the sources / source discovery
    • Does a source that supposedly fulfills my information needs exist?
    • Where is it located?
• Heterogeneity of the sources
    • System (Web Services, WSDL/SOAP, etc.)
    • Syntactic (HTML, XML, RDF, RDBMS, etc.)
    • Structural (DB schemas, XML DTDs, RDF/OWL ontologies)
    • Semantic (class Painter =? Creator, “John Smith” =? http://foo-ns/Smith)
• Autonomy, volatility, and differing capabilities of the sources
• Legal, security, and privacy issues
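The semantic heterogeneity questions above ("Painter =? Creator") can be made concrete with a small sketch. It assumes a curator has already answered both questions with "yes" and recorded the answers in two mapping tables. The tables, the record layout, and the `normalize` helper are hypothetical. In practice, Painter may well be a subclass of Creator rather than an equivalent class, which a flat table like this cannot express.

```python
# Hypothetical correspondences decided by a curator; the "=?" in the slide
# marks exactly the kind of question such a table has to answer.
CLASS_MAPPINGS = {"Painter": "Creator"}
INSTANCE_MAPPINGS = {"John Smith": "http://foo-ns/Smith"}


def normalize(record):
    """Rewrite a source record's class and name into the target vocabulary.

    Unmapped values are kept as-is so nothing is silently dropped.
    """
    return {
        "class": CLASS_MAPPINGS.get(record["class"], record["class"]),
        "id": INSTANCE_MAPPINGS.get(record["name"], record["name"]),
    }


source_a = {"class": "Painter", "name": "John Smith"}
source_b = {"class": "Creator", "name": "http://foo-ns/Smith"}

# After normalization the two records can be recognized as the same entity.
same_entity = normalize(source_a) == normalize(source_b)
```

Without the mapping tables, the two records share no field value and would be treated as unrelated.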


2.2 ONTOLOGIES/TERMINOLOGIES

Ontologies: “Formal models about how we perceive a domain of interest and provide a precise, logical account of the intended meaning of terms, data structures and other elements modeling the real world”


2.3 LINKING STRUCTURED DATA

Sources: Oracle Clinical - EHR, INDIVO-X PHR

Medications
Name       Dose  Route  Start    End
DONEPEZIL  5mg   Oral   01/2012  06/2013

Encounter
Symptoms  Encounter Date
nausea    02/02/2012
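One way to link the two tables on this slide is by time: an encounter symptom is linked to a medication when the encounter date falls inside the medication period. The sketch below makes several assumptions that the slide does not state: both rows belong to the same patient, encounter dates are day-first (`dd/mm/yyyy`), and medication periods are compared at month granularity. The function and field names are hypothetical.

```python
from datetime import datetime

# Rows transcribed from the slide; the dictionary keys are illustrative.
medications = [
    {"name": "DONEPEZIL", "dose": "5mg", "route": "Oral",
     "start": "01/2012", "end": "06/2013"},
]
encounters = [
    {"symptom": "nausea", "date": "02/02/2012"},
]


def month_key(text, fmt):
    """Parse a date string and reduce it to a comparable (year, month) pair."""
    parsed = datetime.strptime(text, fmt)
    return (parsed.year, parsed.month)


def symptoms_during_medication(meds, encs):
    """Link each symptom to every medication whose period covers its month."""
    links = []
    for med in meds:
        start = month_key(med["start"], "%m/%Y")
        end = month_key(med["end"], "%m/%Y")
        for enc in encs:
            when = month_key(enc["date"], "%d/%m/%Y")  # assumed day-first
            if start <= when <= end:
                links.append((med["name"], enc["symptom"], enc["date"]))
    return links
```

With the slide's rows, `symptoms_during_medication(medications, encounters)` links the nausea encounter to the donepezil prescription. A temporal overlap is only a candidate link. It says nothing about causation.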


2.4 ADDING METADATA TO UNSTRUCTURED DATA

• it's just data

• it's data describing other data

• it's meant for machine consumption

[Figure: example annotation labels: disease, name, symptoms, drug, administration]
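A minimal sketch of metadata attached to unstructured text, in the sense of the bullets above: the annotations are plain data, they describe spans of other data, and a program can consume them. The lexicon, the sample note, and the `annotate` function are hypothetical. Real annotators rely on curated terminologies and NLP rather than a two-entry dictionary.

```python
import re

# Hypothetical lexicon mapping surface terms to metadata labels.
LEXICON = {
    "nausea": "symptoms",
    "donepezil": "drug",
}


def annotate(text):
    """Return stand-off annotations: label plus character offsets into text."""
    annotations = []
    for term, label in LEXICON.items():
        pattern = r"\b" + re.escape(term) + r"\b"
        for match in re.finditer(pattern, text, flags=re.IGNORECASE):
            annotations.append({
                "label": label,
                "start": match.start(),
                "end": match.end(),
                "text": match.group(0),
            })
    # Sort by position so the result follows reading order.
    return sorted(annotations, key=lambda a: a["start"])


note = "Patient reports nausea after starting Donepezil."
metadata = annotate(note)
```

The original note is left untouched. Because the metadata refers to the text by offsets, several annotation layers can coexist over the same text.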


2.5 ONTOLOGIES & EXCHANGE STANDARDS (JUST A FEW)

• MeSH
    • Medical Subject Headings, National Library of Medicine
    • 22,000 descriptors
• EMTREE
    • Commercial (Elsevier), drugs and diseases
    • 45,000 terms, 190,000 synonyms
• UMLS
    • Integrates 100 different vocabularies
• SNOMED
    • 200,000 concepts, College of American Pathologists
• Gene Ontology
    • 15,000 terms in molecular biology
• NCBI Cancer Ontology
    • 17,000 classes (about 1M definitions)

EHR Exchange Standards

• HL7
    • A collection of message formats and related clinical standards
• DICOM
    • Attributes that contain a multitude of image-related information
• HITECH
    • Developing and harmonizing standards that could support the diversified needs of interoperable health data exchange


3.1 PAST EXPERIENCES: THE ACGT

• Project outcomes (with respect to data management)
    • The ACGT Master Ontology
    • The ACGT Semantic Mediator
    • The ACGT Data Access Services
• Issues identified
    • Performance issues
    • One ontology is not enough
    • Interaction with the data access wrappers is not easy

One Ontology to rule them all


3.2 EXPERIENCES: THE P-MEDICINE

• Project outcomes (with respect to data management)
    • HDOT ontology, this time trying to integrate existing ontologies
    • Ontology Annotator / Data Translator
    • Data Warehouse
• Issues identified
    • Interfaces should conform to highly restrictive legal policies

One Ontology to unify them all


3.3 EXPERIENCES: EURECA & INTEGRATE

[Figure: data normalization architecture. Labels recovered from the diagram:]

• Locations: @Data Provider, @EURECA Platform
• Data sources: EHR, Research DW
• Export: HL7 Export / Mirth Connect; Export to HL7-based CSV; Coding Rules (CSV / HL7 templates); Guidelines for data exportation
• Formats and templates: HL7v3; CSV; CSV Template + Doc; IHE-based HL7 Templates
• Steps: Syntactic Normalization: Data Push Service; Validation; Semantic Normalization Pipeline
• Data models: HL7-based CDM (original); HL7-based CDM (normalized)
• Tools and services: Mirth Connect; Pentaho; SNOMED Normal Form; Terminology Binding Service (SNOMED, LOINC, HGNC to HL7); Terminology Linking Service (BioPortal); NLP
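The two normalization steps named on this slide can be sketched as a two-stage pipeline. This is not the EURECA implementation. The column names, the binding table, and the placeholder codes below are assumptions made for illustration. In the project, bindings to SNOMED, LOINC, or HGNC would come from the terminology services rather than from a hard-coded dictionary.

```python
import csv
import io

# Hypothetical columns expected in a CSV export.
REQUIRED_COLUMNS = ("patient_id", "local_code", "value")

# Hypothetical binding table with placeholder codes (not real terminology codes).
TERMINOLOGY_BINDING = {
    "LOC-NAUSEA": ("EXAMPLE-TERMINOLOGY", "T-0001"),
}


def syntactic_normalization(csv_text):
    """Validate the export's structure and return uniform, trimmed records."""
    reader = csv.DictReader(io.StringIO(csv_text))
    missing = [c for c in REQUIRED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    return [{c: row[c].strip() for c in REQUIRED_COLUMNS} for row in reader]


def semantic_normalization(records):
    """Bind local codes to a target terminology; keep unbound records aside."""
    normalized, unbound = [], []
    for rec in records:
        binding = TERMINOLOGY_BINDING.get(rec["local_code"])
        if binding is None:
            unbound.append(rec)  # left for manual curation
            continue
        system, code = binding
        normalized.append({**rec, "system": system, "code": code})
    return normalized, unbound


export = "patient_id,local_code,value\np1, LOC-NAUSEA ,present\np1,LOC-UNKNOWN,7\n"
normalized, unbound = semantic_normalization(syntactic_normalization(export))
```

Keeping the two stages separate means structural problems are rejected before any terminology lookup runs. Codes without a binding are reported instead of being dropped.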


3.4 EXPERIENCES: MYHEALTHAVATAR


4.1 LESSONS LEARNED

• Lesson 1: Data integration is hard.
    • No single solution exists.
    • We like experimenting (in many cases “try and cry”).
• Lesson 2: There is a lack of coordination between standards, terminologies, and ontologies.
    • There are semantic inconsistencies between them.


4.2 LESSONS LEARNED

• Lesson 3: Multiple ontologies/terminologies are needed for clinical data management.
    • No single ontology to rule them all.
    • The community can benefit from guidance on vocabularies to represent data and from an integrated library of recommended ontologies.
• Lesson 4: In many cases, technical problems are less important than the legal and economic issues.
    • Most of the data are proprietary. Getting approval from the legal department can be challenging.
    • The right of patients to own their data is crucial.


REFERENCES

• [Howe & Halperin, 2012] Howe, B., Halperin, D.: Advancing Declarative Query in the Long Tail of Science, IEEE Data Engineering Bulletin, vol. 35, no. 3, pp. 16-26, September 2012.
• [Bhartiya & Mehrotra, 2014] Bhartiya, S., Mehrotra, D.: Challenges and Recommendations to Healthcare Data Exchange in an Interoperable Environment, eJHI Journal, 8 (16), 2014.
• [Calvanese, 2006] Calvanese, D.: Query Processing in Data Integration Systems, BIT PhD Summer School, 2006.


THANK YOU

QUESTIONS?