TOOLS AND REPOSITORIES FOR DATA STORAGE OR DISTRIBUTED DATA ACCESS
HARIDIMOS KONDYLAKIS
FORTH-ICS
VPH WORKSHOP ON CLINICAL DATA MANAGEMENT & SUSTAINABILITY
AMSTERDAM, MARCH 2015
OUTLINE
1. MOTIVATION
2. INTEROPERABILITY & DATA INTEGRATION
3. EXPERIENCES
4. LESSONS
1. MOTIVATION: BIG DATA ERA
1.1 SCIENTIFIC DATA GROWTH
[Howe & Halperin, 2012]
1.2 E-SCIENCE
• Enhanced quality of care
• Readily available information
• Improved cost efficiency
• Ubiquitous access to medical information
2. INTEROPERABILITY & DATA INTEGRATION
• Although people often build databases in isolation, they want to share their
data
• Interoperability means the ability of two or more information systems to
accept data from each other [Bhartiya & Mehrotra, 2014]
• Data integration is the problem of providing unified and transparent access to
a collection of data stored in multiple, autonomous, and heterogeneous data
sources [Calvanese, 2006]
• Integration generally goes beyond mere interoperability and also involves some
degree of functional dependency.
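A minimal sketch of what "unified and transparent access" can look like, assuming two hypothetical sources whose field names and date formats differ. Each wrapper translates its source into one global schema (roughly the global-as-view style of mediation), and the mediator answers a query over that schema by running it across all wrapped records. The source contents, field names and the `Medication` schema are illustrative assumptions, not part of any system described here (Python 3.9+).

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator


# Hypothetical global schema exposed to users: one medication per record.
@dataclass(frozen=True)
class Medication:
    patient_id: str
    drug: str
    start: str  # month in "YYYY-MM" form


# Hypothetical source A: EHR-style export, start month written as "MM/YYYY".
SOURCE_A = [{"pid": "A1", "Name": "DONEPEZIL", "Start": "01/2012"}]

# Hypothetical source B: PHR-style export with different field names.
SOURCE_B = [{"patient": "B7", "medication": "donepezil", "started": "2013-03"}]


def wrap_a(rows: Iterable[dict]) -> Iterator[Medication]:
    """Wrapper for source A: rename fields and convert MM/YYYY to YYYY-MM."""
    for r in rows:
        month, year = r["Start"].split("/")
        yield Medication(r["pid"], r["Name"].lower(), f"{year}-{month}")


def wrap_b(rows: Iterable[dict]) -> Iterator[Medication]:
    """Wrapper for source B: only field names and case need normalizing."""
    for r in rows:
        yield Medication(r["patient"], r["medication"].lower(), r["started"])


# The mediator knows every source together with the wrapper that maps it.
WRAPPERS = [(wrap_a, SOURCE_A), (wrap_b, SOURCE_B)]


def query(predicate: Callable[[Medication], bool]) -> list[Medication]:
    """Evaluate a predicate over the global schema across all sources."""
    return [m for wrap, rows in WRAPPERS for m in wrap(rows) if predicate(m)]


if __name__ == "__main__":
    for med in query(lambda m: m.drug == "donepezil"):
        print(med)
```

Calling `query(lambda m: m.drug == "donepezil")` returns one record from each source, with both start months in the same `YYYY-MM` form. Real mediators usually rewrite the query and push selections down to the sources instead of scanning every record.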
2.1 WHY IS IT A DIFFICULT PROBLEM?
• Number of sources / size of the problem
• Location of the sources / source discovery
• does a source that supposedly fulfills
my info needs exist?
• where is it located?
• Heterogeneity of the sources
• system (Web Services, WSDL/SOAP, etc.)
• syntactic (HTML, XML, RDF, RDBMS, etc.)
• structural (DB schemas, XML DTDs, RDF/OWL ontologies)
• semantic (class Painter =? Creator, “John Smith” =? http://foo-ns/Smith)
• Autonomy, Volatility & different capabilities of the sources
• Legal, security, and privacy issues
2.2 ONTOLOGIES/TERMINOLOGIES
Ontologies: “Formal models about how we perceive a domain of interest and provide a
precise, logical account of the intended meaning of terms, data structures and other elements
modeling the real world”
Example sources: Oracle Clinical (EHR) and Indivo X (PHR)
Medications
Name | Dose | Route | Start | End
DONEPEZIL | 5 mg | Oral | 01/2012 | 06/2013
Encounter
Symptoms | Encounter Date
nausea | 02/02/2012
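To illustrate how an ontology lets records like the ones above be queried together, here is a sketch with a hypothetical toy ontology (not SNOMED or any real terminology). Local values are annotated with ontology concepts, and a query for a general concept such as `Drug` also finds values annotated with its subconcepts by following is-a links. The concept names and the annotation table are assumptions made for illustration.

```python
# Hypothetical toy ontology: each concept points to its parent ("is-a" link).
IS_A = {
    "Donepezil": "CholinesteraseInhibitor",
    "CholinesteraseInhibitor": "Drug",
    "Nausea": "Symptom",
}

# Hypothetical annotations: (table, column, local value) -> ontology concept.
ANNOTATIONS = {
    ("Medications", "Name", "DONEPEZIL"): "Donepezil",
    ("Encounter", "Symptoms", "nausea"): "Nausea",
}


def ancestors(concept: str) -> set[str]:
    """Return every concept that subsumes `concept`, following is-a links upward."""
    found: set[str] = set()
    while concept in IS_A:
        concept = IS_A[concept]
        found.add(concept)
    return found


def is_a(concept: str, target: str) -> bool:
    """True if `concept` equals `target` or is one of its subconcepts."""
    return concept == target or target in ancestors(concept)


def find(target: str) -> list[tuple[str, str, str]]:
    """Local values, from any table, whose annotation is subsumed by `target`."""
    return [key for key, concept in ANNOTATIONS.items() if is_a(concept, target)]


if __name__ == "__main__":
    print(find("Drug"))     # [('Medications', 'Name', 'DONEPEZIL')]
    print(find("Symptom"))  # [('Encounter', 'Symptoms', 'nausea')]
```

The query never mentions `DONEPEZIL` or the column it lives in; the shared ontology carries that knowledge, which is what allows differently structured sources to answer the same question.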
2.3 LINKING STRUCTURED DATA
2.4 ADDING METADATA TO UNSTRUCTURED DATA
• It's just data
• It's data describing other data
• It's meant for machine consumption
(Figure: unstructured text annotated with the labels disease, name, symptoms, drug and administration)
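A minimal sketch of adding metadata to unstructured text by dictionary lookup, assuming a hypothetical lexicon that maps surface terms to labels such as disease, symptom, drug and administration. Each match is returned with its label and character offsets, so the annotation describes the text without altering it. Clinical annotators in practice also handle synonyms, abbreviations and negation, which this sketch ignores.

```python
import re

# Hypothetical lexicon: surface term (lowercase) -> metadata label.
LEXICON = {
    "alzheimer's disease": "disease",
    "nausea": "symptom",
    "donepezil": "drug",
    "orally": "administration",
}


def annotate(text: str) -> list[tuple[str, str, int, int]]:
    """Return (matched text, label, start, end) for each lexicon term in `text`."""
    annotations = []
    for term, label in LEXICON.items():
        # Whole-word, case-insensitive match of the literal term.
        pattern = r"\b" + re.escape(term) + r"\b"
        for m in re.finditer(pattern, text, flags=re.IGNORECASE):
            annotations.append((m.group(0), label, m.start(), m.end()))
    # Report annotations in the order they appear in the text.
    return sorted(annotations, key=lambda a: a[2])


if __name__ == "__main__":
    note = "Patient with Alzheimer's disease reported nausea after taking donepezil orally."
    for matched, label, start, end in annotate(note):
        print(f"{label:15} {matched!r} [{start}:{end}]")
```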
2.5 ONTOLOGIES & EXCHANGE STANDARDS (JUST A FEW)
• MeSH
• Medical Subject Headings, National Library of Medicine
• 22,000 descriptors
• Emtree
• Commercial (Elsevier), drugs and diseases
• 45,000 terms, 190,000 synonyms
• UMLS
• Integrates 100 different vocabularies
• SNOMED
• 200,000 concepts, College of American Pathologists
• Gene Ontology
• 15,000 terms in molecular biology
• NCBI Cancer Ontology
• 17,000 classes (about 1M definitions)
EHR Exchange Standards
• HL7
• A collection of message formats and related
clinical standards
• DICOM
• Attributes that contain a wide range of
image-related information
• HITECH
• Developing and harmonizing standards that
could support the diverse needs of
interoperable health data exchange
3.1 PAST EXPERIENCES: THE ACGT
• Project outcomes (with respect to data management)
• The ACGT Master Ontology
• The ACGT Semantic Mediator
• The ACGT Data Access Services
• Issues identified
• Performance issues
• One ontology not enough
• The interaction with data access wrappers
is not easy
One Ontology to rule them all
3.2 EXPERIENCES: THE P-MEDICINE
• Project outcomes (with respect to data management)
• HDOT ontology, this time trying
to integrate existing ontologies
• Ontology Annotator/Data
Translator
• Data Warehouse
• Issues identified
• Interfaces should conform to
highly restrictive legal policies
One Ontology to unify them all
3.3 EXPERIENCES: EURECA & INTEGRATE
(Figure: EURECA data integration architecture)
• At the data provider: EHR and research data warehouse (ResearchDW) data, exported as HL7v3 (HL7 Export / Mirth Connect) or as HL7-based CSV, following coding rules (CSV / HL7 templates) and guidelines for data exportation (CSV template + documentation, IHE-based HL7 templates)
• At the EURECA platform:
• Syntactic normalization (data push service): validation, Mirth Connect and Pentaho, feeding the original HL7-based common data model (CDM)
• Semantic normalization pipeline: SNOMED normal form, Terminology Binding Service (SNOMED, LOINC, HGNC to HL7), Terminology Linking Service (BioPortal) and NLP, producing the normalized HL7-based CDM
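The architecture separates syntactic normalization (getting an export into the expected format) from semantic normalization (binding local codes to standard terminologies). A minimal sketch of those two steps, assuming a hypothetical CSV template and a hypothetical binding table; it is not the EURECA implementation, and the target codes are placeholders rather than real SNOMED CT or LOINC identifiers.

```python
import csv
import io

# Hypothetical export template: columns every CSV export must provide.
REQUIRED_COLUMNS = ("patient_id", "item_type", "local_code", "value")

# Hypothetical binding table: local code -> (terminology, code).
# The codes are placeholders, NOT real SNOMED CT or LOINC identifiers.
BINDINGS = {
    "SYM_NAUSEA": ("SNOMED CT", "placeholder-0001"),
    "LAB_HB": ("LOINC", "placeholder-0002"),
}


def syntactic_normalization(raw_csv: str) -> list[dict]:
    """Parse a CSV export and reject it if template columns are missing."""
    reader = csv.DictReader(io.StringIO(raw_csv))
    present = reader.fieldnames or []
    missing = [c for c in REQUIRED_COLUMNS if c not in present]
    if missing:
        raise ValueError(f"export does not follow the template, missing: {missing}")
    return [dict(row) for row in reader]


def semantic_normalization(rows: list[dict]) -> tuple[list[dict], list[dict]]:
    """Bind each local code to a standard terminology; keep unbound rows apart."""
    bound, unbound = [], []
    for row in rows:
        binding = BINDINGS.get(row["local_code"])
        if binding is None:
            unbound.append(row)  # left for review or a further linking step
        else:
            system, code = binding
            bound.append({**row, "system": system, "code": code})
    return bound, unbound


if __name__ == "__main__":
    export = (
        "patient_id,item_type,local_code,value\n"
        "p1,symptom,SYM_NAUSEA,present\n"
        "p1,lab,LAB_HB,13.5\n"
        "p2,lab,LAB_XYZ,1\n"
    )
    ok, pending = semantic_normalization(syntactic_normalization(export))
    print(len(ok), "bound,", len(pending), "unbound")
```

Rows that cannot be bound are returned separately instead of being dropped silently, so a human or an additional linking step can resolve them later.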
3.4 EXPERIENCES: MYHEALTHAVATAR
4.1 LESSONS LEARNED
• Lesson 1: Data Integration is hard.
• No single solution exists.
• We like experimenting (in many cases “try and cry”)
• Lesson 2: Lack of coordination between standards/terminologies/ontologies.
• Semantic inconsistencies between them.
4.2 LESSONS LEARNED
• Lesson 3: Multiple Ontologies/Terminologies for Clinical Data Management are needed.
• No single ontology to rule them all.
• The community can benefit from guidance on vocabularies to represent data and an integrated library
with the recommended ontologies.
• Lesson 4: In many cases technical problems are less important than the legal and economic
issues.
• Most of the data are proprietary. Getting approval from the legal department can be challenging.
• The patient's right to own their data is crucial.
REFERENCES
• [Howe & Halperin, 2012] Howe, B., Halperin, D.: Advancing Declarative Query in the
Long Tail of Science. IEEE Data Engineering Bulletin, vol. 35, no. 3, pp. 16-26,
September 2012.
• [Bhartiya & Mehrotra, 2014] Bhartiya, S., Mehrotra, D.: Challenges and
Recommendations to Healthcare Data Exchange in an Interoperable Environment.
eJHI Journal, vol. 8, no. 16, 2014.
• [Calvanese, 2006] Calvanese, D.: Query Processing in Data Integration Systems. BIT
PhD Summer School, 2006.
THANK YOU
QUESTIONS?