Demo of end-to-end federation of human studies design data using semantic web approaches and the Ontology of Clinical Research as the reference semantics.
Human Studies Database Project
CTSA Informatics All Hands Meeting
October 13, 2011
Ida Sim, UCSF, for the HSDB team
Funding: CTSAs and R01-RR-026040
Jim Brinkley (U Wash)
Simona Carini (UCSF)
Todd Detwiler (U Wash)
Harold Lehmann (Hopkins)
Brad Pollock (UTHSC S Ant)
Shamim Mollah (Rockefeller)
Ida Sim (UCSF)
Harold Solbrig (Mayo)
Samson Tu (Stanford)
Knut Wittkowski (Rockefeller)
And: Rob Wynden (UCSF), Davera Gabriel (UCDavis), Herb Hagler (UTSW), Meredith Nahm (Duke), Swati Chakraborty (Duke), Jahangheer Shaik (WUSL), Aniket Bhandare (WUSL), Richard Scheuermann (UTSW), Alan Rector (U Manchester)
BERD
BERD
The HSDB Team
Outline
• HSDB Overview
• Quick Pass Demo
• Under the Hood
• Deeper Dive Demo and Discussion
• Summary
Broad Long-Term Objective
• Human studies a most valuable source of evidence
• Goal is a federated, CTSA-wide database of past and ongoing human studies
– interventional and observational
• To enable large-scale computational reuse of human studies data for clinical and translational research
– data mining
– systematic review
– planning future studies
Go for the Gold?
46.4 (39.2-51.2) 45.1 (39.9-50.5)
0.83 (0.79-0.99) 0.91 (0.93-1.04)
2.2 (1.7-3.4) 2.7 (1.1 - 4.1)
110 (87-134) 121 (99-129)
Main Results Table
Need Standardized Metadata
• e.g., SNOMED code for serum ionized calcium: 391084000
Age 46.4 (39.2-51.2) 45.1 (39.9-50.5)
ICa 0.83 (0.79-0.99) 0.91 (0.93-1.04)
Creatinine 2.2 (1.7-3.4) 2.7 (1.1 - 4.1)
Weight (kg) 110 (87-134) 121 (99-129)
Description of Study Protocol Critical for Interpreting Results
• Baseline, primary, or secondary outcomes?
• What do the columns represent?
Garlic Chocolate
Age 46.4 (39.2-51.2) 45.1 (39.9-50.5)
ICa 0.83 (0.79-0.99) 0.91 (0.93-1.04)
Creatinine 2.2 (1.7-3.4) 2.7 (1.1 - 4.1)
Weight (kg) 110 (87-134) 121 (99-129)
Need Ontology of Clinical Research
• Large scale computation of human studies data requires an ontology of study protocol metadata
– for interpretive context around the results tables
HSDB Project Aims and Status
• Define the Ontology of Clinical Research (OCRe)
– modeled study design typology, interventions, outcomes, analyses, basic administrative data
• Define the data sharing architecture using OCRe as the reference semantics
– using semantic web technology (after several detours)
• Pilot human studies data sharing from multiple CTSAs
– demo of federated data queries over study designs
  • sample studies at Rockefeller (n=186), Hopkins (n=2), and UCSF (n=4)
– sharing of summary-level results is Phase II
6 HIV and 186 Studies
Columns: Studies; Study design; Intervention/Factor; Primary outcome(s); Secondary outcome(s)

Taha (#1, UCSF*)
Study design: Parallel group, randomized
Intervention/Factor: Arm 1: metronidazole + erythromycin; Arm 2: placebo
Primary outcome(s):
- Infant HIV Infection at 4-6 wks
- Composite of infant HIV infection and mortality, at 1 year of age
Secondary outcome(s):
- Infant HIV Infection at 24-48 hours, and 12 months
- etc.

Metzger (#2, UCSF*)
Study design: Parallel group, randomized
Intervention/Factor: Arm 1: Buprenorphine/Naloxone 3 wks + 52 wks; Arm 2: Buprenorphine/Naloxone max 18 days; All arms: counseling
Primary outcome(s): HIV-1 Infection or death, at 104 week visit
Secondary outcome(s):
- Death, through week 156
- HIV-1 Infection every 6 months at scheduled follow up visits
- etc.

German (#3, Hopkins)
Study design: Cohort
Intervention/Factor: HIV status at baseline
Primary outcome(s): Recognized HIV Infection, Wave 2 (3 years)
Secondary outcome(s):
- Unrecognized HIV Infection, Wave 2 (3 years)
- etc.

Wawer (#4, Hopkins)
Intervention/Factor: Arm 1: Immediate circumcision; Arm 2: Delayed circumcision
Primary outcome(s): Male-to-female HIV transmission, throughout study

El-Sadr (#5, UCSF*)
Study design: Cohort
Intervention/Factor: Assigned to drug conservation (DC) arm or assigned to viral suppression (VS) arm in SMART study
Primary outcome(s): HIV transmission risk behavior, end of study
Secondary outcome(s): HIV transmission risk behavior in participants who are not on ART at enrollment, end of study

Cohen (#6, UCSF*)
Study design: Cohort
Intervention/Factor: HIV-1 infection status: proven acute, established, or uninfected
Outcome(s):
- Prevalence of acute HIV infection, throughout study
- etc.

Rockefeller: 186 studies
Interventional Observational
[Figure: data sharing architecture. Labels: Data Sources (Protocol Documents; Electronic IRB, iMedRIS); Manual and Bulk Upload; Local Servers holding XML at Johns Hopkins, Rockefeller, and UCSF (AWS); OCRe-XSD, auto-generated from OCRe; Registry; Query Integrator.]
Calls BioPortal with a subsumption query on SNOMED, for children of “macrolide” [428787002]
Retrieve all studies where a macrolide antibiotic was administered
Searches for interventions within arms, for codes matching “macrolide” or children
Outline
• HSDB Overview
• Quick Pass Demo
• Under the Hood
– OCRe
– Data Sharing Architecture
– Data Acquisition
– Federated Query
• Deeper Dive Demo and Discussion
• Summary
OCRe
• OWL 2.0 project at HSDBwiki.org, NCBO BioPortal
• Models human studies for scientific query and analysis
• Domain
– all studies in which humans, parts of humans, or groups of humans are enrolled, exposed, or observed
• Scope
– all clinical domains, all variable types (quantitative, qualitative, imaging, genomics, etc.)
Sim I, et al. AMIA CRI Summit 2010, p.51-55.
OCRe Import Graph
• OCRe: core OWL ontology containing all primitive concepts and relationships and selected defined classes
• OCRe_ext: extended with application/project specific defined classes
• HSDB_OCRe: includes HSDB information model annotations (for autogenerating XSD)
Modeling Study Outcomes and Analyses
HIV Infection
has_code 86406008
has_code_system_name SNOMED-CT
has_code_system_version 2011_01_31
has_display_name Human immunodeficiency virus infection
Primary
4-6 weeks
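A minimal sketch of how the coded outcome above could be held in code. The four has_* attribute names and their values come from the slide; the class names and the priority and timepoint fields are illustrative assumptions, not OCRe's actual class structure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CodedConcept:
    """A terminology code, using the attribute names shown on the slide."""
    has_code: str
    has_code_system_name: str
    has_code_system_version: str
    has_display_name: str


@dataclass(frozen=True)
class Outcome:
    """Hypothetical container pairing a coded variable with priority and timepoint."""
    variable: CodedConcept
    priority: str   # "Primary" or "Secondary"
    timepoint: str  # kept as free text in this sketch


hiv_infection = CodedConcept(
    has_code="86406008",
    has_code_system_name="SNOMED-CT",
    has_code_system_version="2011_01_31",
    has_display_name="Human immunodeficiency virus infection",
)

primary_outcome = Outcome(variable=hiv_infection, priority="Primary", timepoint="4-6 weeks")
print(primary_outcome.variable.has_code, primary_outcome.priority, primary_outcome.timepoint)
```

Keeping the code system name and version next to the code lets a query tell apart identical code strings from different terminologies or releases.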
Composite Outcomes: HIV Infection or Death at 1 year of age
• Need an expression grammar (HIV infection OR Death)
Var1: HIV Infection
Var2: Death
1 year of age
Primary
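The slides list an expression grammar for composite outcomes as a need, without defining one. The following is only a sketch of what a small boolean expression tree for "HIV infection OR Death" might look like; the Var, Or, and And classes and the evaluate and render helpers are hypothetical and not part of OCRe.

```python
from dataclasses import dataclass
from typing import Dict, Tuple, Union


@dataclass(frozen=True)
class Var:
    """A single outcome variable, referenced by name."""
    name: str


@dataclass(frozen=True)
class Or:
    """True when at least one operand is true."""
    operands: Tuple["Expr", ...]


@dataclass(frozen=True)
class And:
    """True when every operand is true."""
    operands: Tuple["Expr", ...]


Expr = Union[Var, Or, And]


def evaluate(expr: Expr, observed: Dict[str, bool]) -> bool:
    """Evaluate the expression against per-variable observations for one participant."""
    if isinstance(expr, Var):
        return observed[expr.name]
    if isinstance(expr, Or):
        return any(evaluate(e, observed) for e in expr.operands)
    if isinstance(expr, And):
        return all(evaluate(e, observed) for e in expr.operands)
    raise TypeError(f"unsupported expression: {expr!r}")


def render(expr: Expr) -> str:
    """Print the expression in the parenthesized form used on the slide."""
    if isinstance(expr, Var):
        return expr.name
    joiner = " OR " if isinstance(expr, Or) else " AND "
    return "(" + joiner.join(render(e) for e in expr.operands) + ")"


composite = Or((Var("HIV Infection"), Var("Death")))
print(render(composite))
print(evaluate(composite, {"HIV Infection": False, "Death": True}))
```

A tree like this keeps Var1 and Var2 as separately coded variables, so a query for HIV infection can still find the study through the composite.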
• CShare, BRIDG, OCRe in UML, OpenMDR, CDEs, LexEVS, BioPortal, caGrid, SHRINE, Dynamic Extensions, HOM, i2b2, etc.
October, 2010
OCRe-XSD
• Automatically generated from HSDB_OCRe
– guided by annotations
• Elements are indexed to OCRe IDs via purl URIs
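The slides do not show the generator itself. As a rough sketch of the idea, the code below emits one XSD element per annotated class and records the class's URI in an xs:appinfo source attribute. The class list, the identifiers, and the purl base are placeholders invented for illustration; how the real OCRe-XSD encodes the link is not stated in the slides.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"
ET.register_namespace("xs", XS)

# Hypothetical annotated classes: (XSD element name, ontology identifier).
ANNOTATED_CLASSES = [
    ("Study", "EXAMPLE_0001"),
    ("Arm", "EXAMPLE_0002"),
]
PURL_BASE = "http://purl.example.org/ocre#"  # placeholder, not the project's purl base


def build_schema(classes, purl_base):
    """Emit one xs:element per class, indexed to its ontology URI via xs:appinfo."""
    schema = ET.Element(f"{{{XS}}}schema")
    for name, identifier in classes:
        element = ET.SubElement(schema, f"{{{XS}}}element", {"name": name})
        annotation = ET.SubElement(element, f"{{{XS}}}annotation")
        ET.SubElement(annotation, f"{{{XS}}}appinfo", {"source": purl_base + identifier})
    return schema


schema = build_schema(ANNOTATED_CLASSES, PURL_BASE)
print(ET.tostring(schema, encoding="unicode"))
```

Generating the schema from the ontology, rather than writing it by hand, means the XML instances stay traceable to ontology terms when the ontology changes.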
RU HSDB Data Mapping Workflow
[Figure: workflow from the RU iMedRIS Oracle DB to RU HSDB xml. Labeled components and steps:]
• RU iMedRIS Oracle DB
– extracts data elements
• HSDB xsd schema
– generates rules for mapping data elements
• SQL Mapper
1. Maps data elements using xsd
2. Transforms extracted data using analytics (data conversion, masking, concatenation, etc.)
– generates data elements table
• XML Generator
1. Links IORG number
2. Cleans textual data
3. Generates xml file
• RU HSDB xml
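The mapping workflow can be sketched roughly as follows. This is not the Rockefeller implementation: an in-memory SQLite table stands in for the iMedRIS Oracle database, and the table columns, element names, mapping rules, and IORG value are all made up for illustration.

```python
import sqlite3
import xml.etree.ElementTree as ET


def make_source_db():
    """Build an in-memory table standing in for the source database (hypothetical columns)."""
    conn = sqlite3.connect(":memory:")
    conn.row_factory = sqlite3.Row
    conn.execute(
        "CREATE TABLE protocol (id TEXT, title TEXT, pi_first TEXT, pi_last TEXT, contact TEXT)"
    )
    conn.execute(
        "INSERT INTO protocol VALUES (?, ?, ?, ?, ?)",
        ("P-001", "  A  sample\tstudy title ", "Ada", "Example", "555-0100"),
    )
    return conn


def mask(value):
    """Masking: hide the value but keep its length."""
    return "X" * len(value)


def clean_text(value):
    """Cleaning: collapse runs of whitespace and trim the ends."""
    return " ".join(value.split())


# "SQL Mapper" rules: target element name -> function of a source row.
MAPPING = {
    "StudyIdentifier": lambda row: row["id"],
    "Title": lambda row: row["title"],
    "Investigator": lambda row: f'{row["pi_first"]} {row["pi_last"]}',  # concatenation
    "Contact": lambda row: mask(row["contact"]),                        # masking
}


def map_rows(conn):
    """Extract rows and apply the mapping rules, giving a data elements table."""
    rows = conn.execute("SELECT * FROM protocol").fetchall()
    return [{target: rule(row) for target, rule in MAPPING.items()} for row in rows]


def generate_xml(records, iorg_number):
    """'XML Generator': attach the IORG number, clean text, and build the XML tree."""
    root = ET.Element("Studies", {"iorg": iorg_number})
    for record in records:
        study = ET.SubElement(root, "Study")
        for name, value in record.items():
            ET.SubElement(study, name).text = clean_text(value)
    return root


xml_root = generate_xml(map_rows(make_source_db()), "IORG-PLACEHOLDER")
print(ET.tostring(xml_root, encoding="unicode"))
```

Separating the mapping rules from the XML generation mirrors the two boxes in the workflow: a site with a different source schema would only need to change the rules.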
Hopkins and UCSF Workflow
• Manual selection of HIV-related protocols
– from local IRB, ClinicalTrials.gov
• Manual instantiation into XSD-conformant XML
– using Oxygen XML editor
Registry File of Published XML Instances
• http://purl.org/sig/reg/hsdb/hsdb_demo_registry.xml
– Rockefeller, 186 studies, at Rockefeller server
– UCSF, 4 studies, at Amazon Web Services
– Hopkins, 2 studies, at Hopkins server
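A federated query would start by reading the registry to find each site's published instances. The sketch below shows that step only. The registry layout, attribute names, and example.org URLs are assumptions, since the slides do not show the registry file's actual schema; only the site names and study counts come from the slide.

```python
import xml.etree.ElementTree as ET

# Hypothetical registry layout with placeholder URLs.
REGISTRY_XML = """
<registry>
  <source site="Rockefeller" studies="186" url="http://rockefeller.example.org/hsdb.xml"/>
  <source site="UCSF" studies="4" url="http://ucsf.example.org/hsdb.xml"/>
  <source site="Hopkins" studies="2" url="http://hopkins.example.org/hsdb.xml"/>
</registry>
"""


def list_sources(registry_xml):
    """Return (site, study count, instance URL) for every registered source."""
    root = ET.fromstring(registry_xml.strip())
    return [
        (source.get("site"), int(source.get("studies")), source.get("url"))
        for source in root.findall("source")
    ]


sources = list_sources(REGISTRY_XML)
total_studies = sum(count for _, count, _ in sources)
for site, count, url in sources:
    print(site, count, url)
print("total:", total_studies)
```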
Query Integrator: Brinkley Lab, U Wash
• Write, save, reuse, and chain queries over any web accessible XML or RDF source
• SPARQL, XQuery, IML, etc.
http://www.si.washington.edu/projects/QI
[Figure: Query Integrator architecture. Labels: QI Client; Other Clients; QI Server; QI Core; QES; Query Database; RDF Store; QI “Plugins”; vSPARQL Service; DXQuery Service; Other Services; Remote Services; XML; RDF/OWL; BioPortal REST services (SNOMED); UCSF HSDB Data; OCRe in OWL.]
Four Illustrative Queries
• Interventions
– all studies administering macrolide
• Study Design
– all interventional studies
– all placebo-controlled randomized studies
• Outcomes
– all studies with primary outcome of HIV Infection
Demo: Query on Interventions
• Interventions Query 1: all studies administering a macrolide
– demonstrates query exploiting SNOMED’s semantic hierarchies
– demonstrates modeling of arm structure in interventional studies
Retrieve all studies where a macrolide antibiotic was administered
• Chains a subsumption query to SNOMED, macrolide ID = 428787002
• BioPortal SNOMED subclass query: REST call to SNOMED in BioPortal; cleans and returns all subclasses of a SNOMED ID (e.g., 428787002 for Macrolide)
• Matches studies where Arm tags contain the SNOMED code of Macrolide or its children
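The chained query can be approximated in a few lines. In this sketch the BioPortal call is replaced by a stub, the child codes are labeled placeholders rather than real SNOMED IDs, and the XML element names are hypothetical rather than taken from the HSDB XSD. Only the macrolide ID 428787002 and the placebo ID 182886004 come from the slides.

```python
import xml.etree.ElementTree as ET

MACROLIDE = "428787002"  # SNOMED ID for macrolide, as given on the slide


def fetch_descendants(code):
    """Stub for the BioPortal subsumption call; returns placeholder child codes."""
    stub = {MACROLIDE: {"PLACEHOLDER-ERYTHROMYCIN"}}
    return stub.get(code, set())


# Hypothetical instance layout: studies contain arms, arms contain coded interventions.
STUDIES_XML = """
<Studies>
  <Study name="Taha">
    <Arm>
      <Intervention code="PLACEHOLDER-METRONIDAZOLE"/>
      <Intervention code="PLACEHOLDER-ERYTHROMYCIN"/>
    </Arm>
    <Arm><Intervention code="182886004"/></Arm>
  </Study>
  <Study name="Wawer">
    <Arm><Intervention code="PLACEHOLDER-CIRCUMCISION"/></Arm>
  </Study>
</Studies>
"""


def studies_administering(code, studies_xml):
    """Return names of studies with any arm intervention coded as `code` or a descendant."""
    wanted = {code} | fetch_descendants(code)
    root = ET.fromstring(studies_xml.strip())
    hits = []
    for study in root.findall("Study"):
        arm_codes = {i.get("code") for i in study.findall("Arm/Intervention")}
        if arm_codes & wanted:
            hits.append(study.get("name"))
    return hits


print(studies_administering(MACROLIDE, STUDIES_XML))
```

The study is found even though no arm is coded with the macrolide ID itself, which is the point of expanding the code to its children before matching.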
Query on Study Design
• Design Query 1: all interventional studies
– demonstrates use of OCRe’s study design typology
– returns “parallel group” studies
Retrieve all interventional studies
• Finding all study designs matching OCRe ID for “interventional” or its children
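Matching a design "or its children" amounts to walking up a hierarchy. The sketch below uses plain labels instead of OCRe IDs. That "parallel group" falls under "interventional" follows from the slide; placing "cohort" under "observational" is an assumption, and the study-to-design assignments are taken from the six-study table.

```python
# Child -> parent links in a toy design typology (labels stand in for OCRe IDs).
PARENT = {
    "parallel group": "interventional",
    "cohort": "observational",  # assumed placement
}


def is_a(design, ancestor):
    """True if `design` equals `ancestor` or sits anywhere beneath it."""
    while design is not None:
        if design == ancestor:
            return True
        design = PARENT.get(design)
    return False


STUDY_DESIGNS = {
    "Taha": "parallel group",
    "Metzger": "parallel group",
    "German": "cohort",
    "Cohen": "cohort",
}

interventional = sorted(
    name for name, design in STUDY_DESIGNS.items() if is_a(design, "interventional")
)
print(interventional)
```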
Query on Study Design
• Design Query 2: placebo-controlled RCTs
– demonstrates explicit modeling
– Intervention = placebo (re-using “macrolide” query)
– StudyDesign = parallel group
– AllocationType = Random Allocation, or any OCRe child of it
Retrieve all placebo-controlled randomized trials
• Finding all study designs matching OCRe ID for “parallel group”
• Finding allocation schemes under OCRe’s “random allocation” hierarchy
• Finding interventions = SNOMED code for “placebo” [182886004]
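The placebo-controlled RCT query is a conjunction of the three conditions above. The sketch below shows that combination under stated assumptions: the Study record, the allocation hierarchy entry, and the non-placebo intervention codes are hypothetical, while the placebo code 182886004 and the designs of the sample studies come from the slides.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Optional

PLACEBO = "182886004"  # SNOMED code for placebo, as given on the slide

# Child -> parent links under "random allocation"; the child entry is made up.
ALLOCATION_PARENT: Dict[str, str] = {
    "hypothetical stratified randomization": "random allocation",
}


@dataclass(frozen=True)
class Study:
    name: str
    design: str
    allocation: Optional[str]
    intervention_codes: FrozenSet[str]


def under(term: Optional[str], ancestor: str, parents: Dict[str, str]) -> bool:
    """True if `term` equals `ancestor` or is one of its descendants."""
    while term is not None:
        if term == ancestor:
            return True
        term = parents.get(term)
    return False


def is_placebo_controlled_rct(study: Study) -> bool:
    """All three conditions from the slide must hold."""
    return (
        PLACEBO in study.intervention_codes
        and study.design == "parallel group"
        and under(study.allocation, "random allocation", ALLOCATION_PARENT)
    )


STUDIES = [
    Study("Taha", "parallel group", "random allocation",
          frozenset({"PLACEHOLDER-ERYTHROMYCIN", PLACEBO})),
    Study("Metzger", "parallel group", "random allocation",
          frozenset({"PLACEHOLDER-BUPRENORPHINE-NALOXONE"})),
    Study("German", "cohort", None, frozenset()),
]

print([s.name for s in STUDIES if is_placebo_controlled_rct(s)])
```

Metzger is randomized and parallel group but has no placebo arm, so it is excluded. Modeling the three features separately is what makes that distinction queryable.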
Query on Outcome Variables
• Outcome Query 1: Any study for which HIV infection is a Primary outcome (single variable outcome)
– demonstrates use of SNOMED hierarchy for “HIV infection”
– illustrates timepoint-specific primary and secondary outcomes
All studies with HIV Infection as any single variable outcome
• Same BioPortal SNOMED subsumption call as for Macrolide query
• Matches any outcome variable code to SNOMED ID for HIV Infection or children
All studies with HIV infection as a Primary single variable outcome
• Call query for HIV infection as a single variable outcome with outcome priority = Primary
• Primary outcome is HIV Infection at 4-6 weeks
• HIV Infection at 24-48 hours, and at 12 months, are Secondary outcomes
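The two-stage outcome query can be sketched as a general filter plus a priority restriction. The Outcome record and the placeholder code for the non-HIV outcome are assumptions; the HIV infection code 86406008 and the Taha timepoints come from the slides. The SNOMED subsumption step is left out here, so only the HIV infection code itself is matched.

```python
from dataclasses import dataclass
from typing import Optional

HIV_INFECTION = "86406008"  # SNOMED code shown on the outcome modeling slide


@dataclass(frozen=True)
class Outcome:
    study: str
    code: str
    priority: str
    timepoint: str


OUTCOMES = [
    Outcome("Taha", HIV_INFECTION, "Primary", "4-6 weeks"),
    Outcome("Taha", HIV_INFECTION, "Secondary", "24-48 hours"),
    Outcome("Taha", HIV_INFECTION, "Secondary", "12 months"),
    Outcome("El-Sadr", "PLACEHOLDER-RISK-BEHAVIOR", "Primary", "end of study"),
]


def outcomes_matching(outcomes, codes, priority: Optional[str] = None):
    """Keep outcomes whose code is in `codes`, optionally restricted by priority."""
    return [
        o for o in outcomes
        if o.code in codes and (priority is None or o.priority == priority)
    ]


wanted_codes = {HIV_INFECTION}  # in the demo this set would also hold SNOMED children
any_priority = outcomes_matching(OUTCOMES, wanted_codes)
primary_only = outcomes_matching(OUTCOMES, wanted_codes, priority="Primary")

print(len(any_priority), [(o.study, o.timepoint) for o in primary_only])
```

Because each outcome carries its own timepoint and priority, the same variable can be primary at one timepoint and secondary at others within a single study.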
Summary
• Human studies most valuable source of evidence on therapies, etc., should be computable at large scale
• Requires standardized semantics of study protocol features
• HSDB Project has demonstrated
– OCRe can be used to describe range of studies
– can capture OCRe-standardized data from source systems via XSD schema
– can federate data queries over XML instances, OCRe, and SNOMED (via live queries to BioPortal)
• Data acquisition remains challenging
Future Work
• OCRe
– expression grammar for composite outcomes
– ERGO for eligibility criteria, summary-level results
• Data federation
– converting XML instances to RDF
– data curation user interface
– friendlier query interface
• Data acquisition
– mapping and bulk uploads from local source systems
– policy and staffing issues
Federate Your Data With Us!
• Bulk transformation of instances
– map from your local schema to our XSD
– generate XML instance file
• Curate
– when data acquisition/curation interface available, help test
– curate data
• Publish locally
– may install local Query Integrator
Links
• Project Links
– HSDB http://hsdbwiki.org/
– OCRe http://code.google.com/p/ontology-of-clinical-research/
– Query Integrator http://sig.biostr.washington.edu/projects/queryintegrator
• Contacts
– Overall: Ida Sim, [email protected]
– OCRe: Samson Tu, [email protected]
– Data federation: Jim Brinkley, [email protected]; Todd Detwiler, [email protected]