Human Studies Database Project CTSA Informatics All Hands Meeting October 13, 2011 Ida Sim, UCSF, for the HSDB team Funding: CTSAs and R01-RR-026040

Human Studies Database Project (demo)


DESCRIPTION

Demo of end-to-end federation of human studies design data using semantic web approaches and the Ontology of Clinical Research as the reference semantics.


The HSDB Team

• Jim Brinkley, U Wash

• Simona Carini, UCSF

• Todd Detwiler, U Wash

• Harold Lehmann, Hopkins

• Brad Pollock, UTHSC S Ant

• Shamim Mollah, Rockefeller

• Ida Sim, UCSF

• Harold Solbrig, Mayo

• Samson Tu, Stanford

• Knut Wittkowski, Rockefeller

And: Rob Wynden (UCSF), Davera Gabriel (UCDavis), Herb Hagler (UTSW), Meredith Nahm (Duke), Swati Chakraborty (Duke), Jahangheer Shaik (WUSL), Aniket Bhandare (WUSL), Richard Scheuermann (UTSW), Alan Rector (U Manchester)

Outline

• HSDB Overview

• Quick Pass Demo

• Under the Hood

• Deeper Dive Demo and Discussion

• Summary

Broad Long-Term Objective

• Human studies are a most valuable source of evidence

• Goal is a federated, CTSA-wide database of past and ongoing human studies

– interventional and observational

• To enable large-scale computational reuse of human studies data for clinical and translational research

– data mining

– systematic review

– planning future studies

Go for the Gold?

46.4 (39.2-51.2) 45.1 (39.9-50.5)

0.83 (0.79-0.99) 0.91 (0.93-1.04)

2.2 (1.7-3.4) 2.7 (1.1 - 4.1)

110 (87-134) 121 (99-129)

Main Results Table

Need Standardized Metadata

• e.g., SNOMED code for serum ionized calcium: 391084000

Age 46.4 (39.2-51.2) 45.1 (39.9-50.5)

ICa 0.83 (0.79-0.99) 0.91 (0.93-1.04)

Creatinine 2.2 (1.7-3.4) 2.7 (1.1 - 4.1)

Weight (kg) 110 (87-134) 121 (99-129)

Description of Study Protocol Critical for Interpreting Results

              Garlic             Chocolate
Age           46.4 (39.2-51.2)   45.1 (39.9-50.5)
ICa           0.83 (0.79-0.99)   0.91 (0.93-1.04)
Creatinine    2.2 (1.7-3.4)      2.7 (1.1-4.1)
Weight (kg)   110 (87-134)       121 (99-129)

• Baseline, primary, or secondary outcomes?

• What do the columns represent?


Need Ontology of Clinical Research

• Large-scale computation of human studies data requires an ontology of study protocol metadata

– for interpretive context around the results tables

HSDB Project Aims and Status

• Define the Ontology of Clinical Research (OCRe)

– modeled study design typology, interventions, outcomes, analyses, basic administrative data

• Define the data sharing architecture using OCRe as the reference semantics

– using semantic web technology (after several detours)

• Pilot human studies data sharing from multiple CTSAs

– demo of federated data queries over study designs

  • sample studies at Rockefeller (n=186), Hopkins (n=2), and UCSF (n=4)

– sharing of summary-level results is Phase II

Quick Pass Demo

6 HIV and 186 Studies

Taha (#1, UCSF*)
Study design: Parallel group, randomized
Intervention/Factor: Arm 1: metronidazole + erythromycin; Arm 2: placebo
Primary outcome(s): Infant HIV Infection at 4-6 wks; Composite of infant HIV infection and mortality, at 1 year of age
Secondary outcome(s): Infant HIV Infection at 24-48 hours, and 12 months; etc.

Metzger (#2, UCSF*)
Study design: Parallel group, randomized
Intervention/Factor: Arm 1: Buprenorphine/Naloxone 3 wks + 52 wks; Arm 2: Buprenorphine/Naloxone max 18 days; All arms: counseling
Primary outcome(s): HIV-1 Infection or death, at 104 week visit
Secondary outcome(s): Death, through week 156; HIV-1 Infection every 6 months at scheduled follow up visits; etc.

German (#3, Hopkins)
Study design: Cohort
Intervention/Factor: HIV status at baseline
Primary outcome(s): Recognized HIV Infection, Wave 2 (3 years)
Secondary outcome(s): Unrecognized HIV Infection, Wave 2 (3 years); etc.

Wawer (#4, Hopkins)
Intervention/Factor: Arm 1: Immediate circumcision; Arm 2: Delayed circumcision
Primary outcome(s): Male-to-female HIV transmission, throughout study

El-Sadr (#5, UCSF*)
Study design: Cohort
Intervention/Factor: Assigned to drug conservation (DC) arm or assigned to viral suppression (VS) arm in SMART study
Primary outcome(s): HIV transmission risk behavior, end of study
Secondary outcome(s): HIV transmission risk behavior in participants who are not on ART at enrollment, end of study

Cohen (#6, UCSF*)
Study design: Cohort
Intervention/Factor: HIV-1 infection status: proven acute, established, or uninfected
Outcome(s): Prevalence of acute HIV infection, throughout study; etc.

Rockefeller: 186 studies

Interventional Observational

[Figure: HSDB data sharing architecture]

• Data Sources: Protocol Documents (Manual), Electronic IRB (iMedRIS) (Bulk Upload)

• OCRe-XSD: auto-generation from OCRe

• Local Servers: Johns Hopkins, Rockefeller, UCSF (AWS), each publishing XML

• Registry

• Query Integrator

Calls BioPortal with a subsumption query on SNOMED, for children of “macrolide” [428787002]

Retrieve all studies where a macrolide antibiotic was administered

Searches for interventions within arms, for codes matching “macrolide” or children

Phase III Trial of Antibiotics to Reduce Chorioamnionitis-Related Perinatal HIV Transmission

Outline

• HSDB Overview

• Quick Pass Demo

• Under the Hood

– OCRe

– Data Sharing Architecture

– Data Acquisition

– Federated Query

• Deeper Dive Demo and Discussion

• Summary


OCRe

• OWL 2.0 project at HSDBwiki.org, NCBO BioPortal

• Models human studies for scientific query and analysis

• Domain

– all studies in which humans, parts of humans, or groups of humans are enrolled, exposed, or observed

• Scope

– all clinical domains, all variable types (quantitative, qualitative, imaging, genomics, etc.)

Sim I, et al. AMIA CRI Summit 2010, p.51-55.

OCRe Import Graph

• OCRe: core OWL ontology containing all primitive concepts and relationships and selected defined classes

• OCRe_ext: extended with application/project specific defined classes

• HSDB_OCRe: includes HSDB information model annotations (for autogenerating XSD)

Modeling Study Outcomes and Analyses

HIV Infection

has_code 86406008

has_code_system_name SNOMED-CT

has_code_system_version 2011_01_31

has_display_name Human immunodeficiency virus infection

Primary

4-6 weeks

Composite Outcomes: HIV Infection or Death at 1 year of age

• Need an expression grammar (HIV infection OR Death)

Var1: HIV Infection

Var2: Death

1 year of age

Primary
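The coded outcome and the composite outcome on these slides can be sketched as plain data structures. This is only an illustration under stated assumptions, not OCRe's actual representation: the class names `CodedConcept`, `OutcomeVariable`, and `CompositeOutcome` are hypothetical, the fields simply mirror the `has_*` properties shown above, and no code is attached to Death because the slide does not give one.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class CodedConcept:
    """A terminology-coded concept, mirroring the has_* properties on the slide."""
    code: str
    code_system_name: str
    code_system_version: str
    display_name: str


@dataclass(frozen=True)
class OutcomeVariable:
    """A single study variable; the concept is optional when no code is shown."""
    name: str
    concept: Optional[CodedConcept] = None


@dataclass(frozen=True)
class CompositeOutcome:
    """Hypothetical expression node: variables joined by one boolean operator."""
    operator: str                          # "OR" or "AND"
    variables: Tuple[OutcomeVariable, ...]
    timepoint: str
    priority: str                          # "Primary" or "Secondary"

    def expression(self) -> str:
        joined = f" {self.operator} ".join(v.name for v in self.variables)
        return f"({joined})"


hiv = OutcomeVariable(
    "HIV Infection",
    CodedConcept("86406008", "SNOMED-CT", "2011_01_31",
                 "Human immunodeficiency virus infection"),
)
death = OutcomeVariable("Death")  # no code is given on the slide

composite = CompositeOutcome("OR", (hiv, death), "1 year of age", "Primary")
print(composite.expression())
```

A real expression grammar would also need nesting and negation; the flat single-operator form here covers only the example on the slide.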


The HSDB 4-Quadrant Diagram

October, 2010

• CShare, BRIDG, OCRe in UML, OpenMDR, CDEs, LexEVS, BioPortal, caGrid, SHRINE, Dynamic Extensions, HOM, i2b2, etc.

April, 2011

• Dropped OCRe in UML, CShare, caGrid

May, 2011

• Dropped OpenMDR, CDEs, SHRINE, i2b2, HOM

• Moved to fully semantic web approach

June, 2011

• Dropped Dynamic Extensions, brought in VIVO/VITRO

October, 2010 / October, 2011

• For demo: staying in XML (not RDF), no logical curation, no data curation interface

OCRe-XSD

• Automatically generated from HSDB_OCRe

– guided by annotations

• Elements are indexed to OCRe IDs via purl URIs


RU HSDB Data Mapping Workflow

• Inputs: RU iMedRIS Oracle DB; HSDB xsd schema

• HSDB xsd schema generates rules for mapping data elements

• SQL Mapper (extracts data elements, generates data elements table):

1. Maps data elements using xsd

2. Transforms extracted data using analytics (data conversion, masking, concatenation, etc.)

• XML Generator:

1. Links IORG number

2. Cleans textual data

3. Generates xml file

• Output: RU HSDB xml
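The mapper-then-generator flow can be sketched in a few lines. This is an illustration under stated assumptions, not the Rockefeller implementation: an in-memory SQLite table stands in for the iMedRIS Oracle DB, and the table name, column names, element names, `MAPPING_RULES`, and the IORG placeholder are all hypothetical.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical mapping rules: source column -> XML element name.
# In the workflow above, such rules would be derived from the HSDB xsd.
MAPPING_RULES = {"protocol_title": "Title", "design": "StudyDesign"}


def clean_text(value: str) -> str:
    """Collapse runs of whitespace, a stand-in for 'cleans textual data'."""
    return " ".join(value.split())


def extract_rows(conn: sqlite3.Connection) -> list:
    """Extract the mapped columns from the stand-in source database."""
    columns = ", ".join(MAPPING_RULES)
    cursor = conn.execute(f"SELECT {columns} FROM protocols")
    names = [d[0] for d in cursor.description]
    return [dict(zip(names, row)) for row in cursor.fetchall()]


def generate_xml(rows: list, iorg_number: str) -> str:
    """Link the IORG number, clean each value, and emit one XML document."""
    root = ET.Element("Studies", {"iorg": iorg_number})
    for row in rows:
        study = ET.SubElement(root, "Study")
        for column, element_name in MAPPING_RULES.items():
            ET.SubElement(study, element_name).text = clean_text(row[column])
    return ET.tostring(root, encoding="unicode")


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE protocols (protocol_title TEXT, design TEXT)")
conn.execute("INSERT INTO protocols VALUES (?, ?)",
             ("  Example   study ", "Cohort"))
xml_text = generate_xml(extract_rows(conn), iorg_number="IORG-PLACEHOLDER")
print(xml_text)
```

A production version would also validate the generated file against the XSD before publishing it.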

Hopkins and UCSF Workflow

• Manual selection of HIV-related protocols

– from local IRB, ClinicalTrials.gov

• Manual instantiation into XSD-conformant XML

– using Oxygen XML editor

Registry File of Published XML Instances

• http://purl.org/sig/reg/hsdb/hsdb_demo_registry.xml

• Rockefeller, 186 studies, at Rockefeller server

• UCSF, 4 studies, at Amazon Web Services

• Hopkins, 2 studies, at Hopkins server


Query Integrator: Brinkley Lab, U Wash

• Write, save, reuse, and chain queries over any web accessible XML or RDF source

• SPARQL, XQuery, IML, etc.

http://www.si.washington.edu/projects/QI

[Figure: Query Integrator architecture]

• Sources: BioPortal REST services (SNOMED), UCSF HSDB Data, OCRe in OWL (XML, RDF/OWL)

• Remote Services: vSPARQL Service, DXQuery Service, Other Services

• QI Server: QI Core, QI “Plugins”, QES, Query Database, RDF Store

• Clients: QI Client, Other Clients


Four Illustrative Queries

• Interventions

– all studies administering macrolide

• Study Design

– all interventional studies

– all placebo-controlled randomized studies

• Outcomes

– all studies with primary outcome of HIV Infection

Demo: Query on Interventions

• Interventions Query 1: all studies administering a macrolide

– demonstrates query exploiting SNOMED’s semantic hierarchies

– demonstrates modeling of arm structure in interventional studies

Querying BioPortal

Chains a subsumption query to SNOMED, macrolide ID = 428787002

Retrieve all studies where a macrolide antibiotic was administered

BioPortal SNOMED subclass query

REST call to SNOMED in BioPortal

Cleans and returns all subclasses of SNOMED ID (e.g., 428787002 for Macrolide)

SNOMED Results: All Children of 428787002

Matches studies where Arm tags contain SNOMED code of Macrolide or its children

Retrieve all studies where a macrolide antibiotic was administered

Arm Structure Explicitly Modeled
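The two chained steps (subsumption lookup, then matching codes inside arms) can be sketched as follows. This is a minimal illustration, not the Query Integrator's actual queries: a local dictionary stands in for the live BioPortal call, the child codes are placeholders rather than real SNOMED IDs, and the XML element and attribute names are hypothetical rather than taken from OCRe-XSD. Only the macrolide ID 428787002 comes from the slides.

```python
import xml.etree.ElementTree as ET

MACROLIDE = "428787002"  # SNOMED ID given on the slide

# Stand-in for the BioPortal subsumption call. The child codes are
# placeholders, not real SNOMED IDs.
HIERARCHY = {
    MACROLIDE: ["CHILD-A", "CHILD-B"],
    "CHILD-A": ["GRANDCHILD-A1"],
}


def descendants(code, hierarchy):
    """Return the code itself plus all of its transitive children."""
    found, stack = set(), [code]
    while stack:
        current = stack.pop()
        if current not in found:
            found.add(current)
            stack.extend(hierarchy.get(current, []))
    return found


# Hypothetical instance layout; the real OCRe-XSD element names may differ.
INSTANCES = """
<Studies>
  <Study id="study-1">
    <Arm name="Arm 1"><Intervention code="GRANDCHILD-A1"/></Arm>
    <Arm name="Arm 2"><Intervention code="PLACEBO-CODE"/></Arm>
  </Study>
  <Study id="study-2">
    <Arm name="Arm 1"><Intervention code="OTHER-DRUG"/></Arm>
  </Study>
</Studies>
"""


def studies_administering(code, xml_text, hierarchy):
    """Match studies where any arm holds an intervention coded at or below code."""
    wanted = descendants(code, hierarchy)
    root = ET.fromstring(xml_text)
    return [
        study.get("id")
        for study in root.findall("Study")
        if any(i.get("code") in wanted
               for i in study.findall("Arm/Intervention"))
    ]


print(studies_administering(MACROLIDE, INSTANCES, HIERARCHY))
```

Because interventions sit inside arms, the match is made per arm, which is what lets the same function be reused with a different code (for example, placebo).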

Query on Study Design

• Design Query 1: all interventional studies

– demonstrates use of OCRe’s study design typology

  • returns “parallel group” studies

Finding all study designs matching OCRe ID for “interventional” or its children

Retrieve all interventional studies

OCRe study design typology

OCRe_XSD

XML instance

Query on Study Design

• Design Query 2: placebo-controlled RCTs

– demonstrates explicit modeling

  • Intervention = placebo (re-using “macrolide” query)

  • StudyDesign = parallel group

  • AllocationType = Random Allocation, or an OCRe child of it

Finding all study designs matching OCRe ID for “parallel group”

Retrieve all placebo-controlled randomized trials

Finding allocation schemes under OCRe’s “random allocation” hierarchy

Finding interventions = SNOMED code for “placebo” [182886004]

OCRe Hierarchy of Allocation Type
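The three criteria for a placebo-controlled randomized trial can be combined as a simple conjunction. The sketch below assumes studies have already been loaded into dictionaries; the design and allocation identifiers (`ParallelGroup`, `RandomAllocation`, and its children) are placeholders, since OCRe's actual class IDs are not shown in the deck. Only the placebo code 182886004 comes from the slides.

```python
PLACEBO = "182886004"  # SNOMED code for "placebo", from the slide

# Placeholder identifiers; OCRe's actual class IDs are not shown in the deck.
ALLOCATION_CHILDREN = {
    "RandomAllocation": ["SimpleRandomAllocation", "BlockRandomAllocation"],
}


def in_hierarchy(value, root, children):
    """True if value is root or any transitive child of root."""
    if value == root:
        return True
    return any(in_hierarchy(value, child, children)
               for child in children.get(root, []))


def is_placebo_controlled_rct(study):
    """Apply the three criteria listed for Design Query 2."""
    has_placebo = any(PLACEBO in arm["codes"] for arm in study["arms"])
    is_parallel = study["design"] == "ParallelGroup"
    is_random = in_hierarchy(study["allocation"], "RandomAllocation",
                             ALLOCATION_CHILDREN)
    return has_placebo and is_parallel and is_random


STUDIES = [
    {"id": "study-1", "design": "ParallelGroup",
     "allocation": "BlockRandomAllocation",
     "arms": [{"codes": ["DRUG-CODE"]}, {"codes": [PLACEBO]}]},
    {"id": "study-2", "design": "ParallelGroup",
     "allocation": "RandomAllocation",
     "arms": [{"codes": ["DRUG-CODE"]}, {"codes": ["OTHER-DRUG"]}]},
    {"id": "study-3", "design": "Cohort",
     "allocation": "NoAllocation", "arms": []},
]

matches = [s["id"] for s in STUDIES if is_placebo_controlled_rct(s)]
print(matches)
```

Here study-2 is randomized and parallel group but has no placebo arm, and study-3 is not interventional, so only study-1 matches.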

Query on Outcome Variables

• Outcome Query 1: Any study for which HIV infection is a Primary outcome (single variable outcome)

– demonstrates use of SNOMED hierarchy for “HIV infection”

– illustrates timepoint-specific primary and secondary outcomes

Same BioPortal SNOMED subsumption call as for Macrolide query

Matches any outcome variable code to SNOMED ID for HIV Infection or children

All studies with HIV Infection as any single variable outcome

Call query for HIV infection as a single variable outcome with outcome priority = Primary

All studies with HIV infection as a Primary single variable outcome

Primary outcome is HIV Infection at 4-6 weeks

HIV Infection at 24-48 hours, and at 12 months, are Secondary outcomes
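The outcome query is built by reuse: a general query finds HIV infection as any single variable outcome, and a second call narrows it by outcome priority. The sketch below illustrates that chaining under stated assumptions: the study records, field names, and the child code are hypothetical, and a fixed set stands in for the BioPortal subsumption result. Only the code 86406008 comes from the deck.

```python
HIV_INFECTION = "86406008"  # SNOMED-CT code for HIV infection, from the deck

# Stand-in for the BioPortal subsumption result; the child code is a placeholder.
HIV_CODES = {HIV_INFECTION, "HIV-CHILD-PLACEHOLDER"}

STUDIES = [
    {"id": "study-1", "outcomes": [
        {"code": HIV_INFECTION, "timepoint": "4-6 weeks", "priority": "Primary"},
        {"code": HIV_INFECTION, "timepoint": "24-48 hours", "priority": "Secondary"},
        {"code": HIV_INFECTION, "timepoint": "12 months", "priority": "Secondary"},
    ]},
    {"id": "study-2", "outcomes": [
        {"code": "OTHER-CODE", "timepoint": "end of study", "priority": "Primary"},
        {"code": "HIV-CHILD-PLACEHOLDER", "timepoint": "1 year", "priority": "Secondary"},
    ]},
]


def outcomes_matching(study, codes, priority=None):
    """Outcomes of one study coded within codes, optionally of one priority."""
    return [o for o in study["outcomes"]
            if o["code"] in codes
            and (priority is None or o["priority"] == priority)]


def studies_with_outcome(studies, codes, priority=None):
    """IDs of studies that have at least one matching outcome."""
    return [s["id"] for s in studies if outcomes_matching(s, codes, priority)]


any_outcome = studies_with_outcome(STUDIES, HIV_CODES)
primary_only = studies_with_outcome(STUDIES, HIV_CODES, priority="Primary")
print(any_outcome, primary_only)
```

Keeping the timepoint on each outcome record is what allows the same variable to be primary at one timepoint and secondary at others, as in the study shown above.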

Discussion


Summary

• Human studies are the most valuable source of evidence on therapies, etc., and should be computable at large scale

• Requires standardized semantics of study protocol features

• HSDB Project has demonstrated

– OCRe can be used to describe range of studies

– can capture OCRe-standardized data from source systems via XSD schema

– can federate data queries over XML instances, OCRe, and SNOMED (via live queries to BioPortal)

• Data acquisition remains challenging

Future Work

• OCRe

– expression grammar for composite outcomes

– ERGO for eligibility criteria, summary-level results

• Data federation

– converting XML instances to RDF

– data curation user interface

– friendlier query interface

• Data acquisition

– mapping and bulk uploads from local source systems

– policy and staffing issues

Federate Your Data With Us!

• Bulk transformation of instances

– map from your local schema to our XSD

– generate XML instance file

• Curate

– when data acquisition/curation interface available, help test

– curate data

• Publish locally

– may install local Query Integrator

Links

• Project Links

– HSDB http://hsdbwiki.org/

– OCRe http://code.google.com/p/ontology-of-clinical-research/

– Query Integrator http://sig.biostr.washington.edu/projects/queryintegrator

• Contacts

– Overall: Ida Sim, [email protected]

– OCRe: Samson Tu, [email protected]

– Data federation:

  • Jim Brinkley, [email protected]

  • Todd Detwiler, [email protected]