Using EMR Data for Population Registries Diana Gumas, JHMCIS Senior Director for Research Systems, EPR and EPR2020/Amalga David Thiemann, Center for Clinical Data Analysis


Using EMR Data for Population Registries

Diana Gumas, JHMCIS Senior Director for Research Systems, EPR and EPR2020/Amalga

David Thiemann, Center for Clinical Data Analysis


Potential Data Uses

• Sample Size Estimates (aggregate data without IRB approval)
  – Feasibility, grant applications, statistical planning

• Identifying patients for enrollment/recruitment
  – By diagnosis, pathology, stage, labs, meds

• Identifying/creating matched study controls

• Obtaining current demographics (name, address) for mail solicitation
  – From research list or by clinic, provider, clinical criteria

• Obtaining ongoing clinical + administrative data on a registry panel
  – Labs, visits, procedures, immunizations, CPT/ICD9 codes, resource use


Possible research data sources

• EPR (JHH & JHBMC)
• Sunrise Clinical Manager (JHH – inpatient)
• Meditech (Bayview)
• Casemix Datamart
• GE Centricity (JHCP)
• EPR2020
• Departmental Systems (ED, OR, Anesthesia)
• Clinical Research Management System (CRMS)
• IDX (professional fees)
• Death Registry


Methods for Data Access

• Historical: Researcher negotiates access with clinical system/DBA
  – Logistical nightmare, technical challenge

• Clinical Research Management System (CRMS)
  – Study cohort with real-time links to enterprise data

• Center for Clinical Data Analysis (CCDA)
  – Monthly/quarterly data extracts from designated systems


Clinical Research Management System (CRMS)

• 1,054 Users
• 1,079 Active Studies
• 25,430 Participants

Data Available in CRMS
– eIRB
– EPR (patient demographics)
– Study participants / accruals
– Electronic Case Report Forms (in next 2-3 months)


Clinical Research Management System (CRMS)

Ways to extract data
– Canned reports
– Ad hoc querying using SQL
– Automated study-specific data extracts (possible with CCDA support)


EPR2020 Data for Researchers

Today (from EPR)
• 4.2M patients, 23.4M visits
• 12.3M documents, 6.8M radiology reports
• 25.6M lab results
• 1.5M problems, 2.2M medications, 140K allergies

Planned
• Bayview & JHCP data
• ICD9 diagnosis codes and CPT charges (IDX)

Future
• Death Registry
• Blood product data for transfusions
• Eclipsys SCM order data
• HMED (ED), ORMIS, eADR/Medivision


My Participant’s Lab Data


Reliable. Driven by the CRMS Participant Registry. Exportable.


Registry Cohort Discovery using EPR2020

A JHM investigator wants to find and enroll diabetic patients:

• aged 45-65 years
• with hemoglobin A1C between 7 and 9%
• with serum creatinine < 2 mg/dL
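The cohort criteria on this slide map naturally onto a single query. The following is only a minimal sketch against an in-memory SQLite database: the table and column names (`patient`, `problem`, `lab_result`, `age_years`, `problem_term`, and so on) and the toy rows are assumptions made for illustration, not the actual EPR2020 schema, and the choice to test each patient's most recent lab value is likewise an assumption.

```python
import sqlite3

# Hypothetical schema and toy rows; NOT the real EPR2020 tables.
SCHEMA = """
CREATE TABLE patient (patient_id INTEGER PRIMARY KEY, age_years INTEGER);
CREATE TABLE problem (patient_id INTEGER, problem_term TEXT);
CREATE TABLE lab_result (patient_id INTEGER, test TEXT, value REAL, drawn TEXT);
"""

COHORT_SQL = """
WITH latest AS (
    -- most recent result per patient and test (assumed rule)
    SELECT patient_id, test, value
    FROM lab_result AS l
    WHERE drawn = (SELECT MAX(drawn) FROM lab_result
                   WHERE patient_id = l.patient_id AND test = l.test)
)
SELECT DISTINCT p.patient_id
FROM patient AS p
JOIN latest AS a1c ON a1c.patient_id = p.patient_id AND a1c.test = 'HBA1C'
JOIN latest AS cr  ON cr.patient_id  = p.patient_id AND cr.test  = 'CREATININE'
WHERE p.age_years BETWEEN 45 AND 65      -- aged 45-65 years
  AND a1c.value BETWEEN 7 AND 9          -- hemoglobin A1C 7-9%
  AND cr.value < 2                       -- serum creatinine < 2 mg/dL
  AND EXISTS (SELECT 1 FROM problem AS d
              WHERE d.patient_id = p.patient_id
                AND d.problem_term = 'diabetes')
ORDER BY p.patient_id
"""


def build_demo_db():
    """Create an in-memory database with four made-up patients."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO patient VALUES (?, ?)",
                     [(1, 50), (2, 70), (3, 60), (4, 55)])
    conn.executemany("INSERT INTO problem VALUES (?, ?)",
                     [(1, "diabetes"), (2, "diabetes"),
                      (3, "diabetes"), (4, "hypertension")])
    conn.executemany("INSERT INTO lab_result VALUES (?, ?, ?, ?)", [
        (1, "HBA1C", 9.5, "2010-01-01"), (1, "HBA1C", 8.1, "2010-06-01"),
        (1, "CREATININE", 1.1, "2010-06-01"),
        (2, "HBA1C", 8.0, "2010-06-01"), (2, "CREATININE", 1.0, "2010-06-01"),
        (3, "HBA1C", 7.5, "2010-06-01"), (3, "CREATININE", 2.4, "2010-06-01"),
        (4, "HBA1C", 7.8, "2010-06-01"), (4, "CREATININE", 0.9, "2010-06-01"),
    ])
    return conn


def find_cohort(conn):
    """Return patient_ids meeting all four criteria."""
    return [row[0] for row in conn.execute(COHORT_SQL)]
```

In the toy data only patient 1 qualifies: patient 2 is too old, patient 3 fails the creatinine limit, and patient 4 has no diabetes entry. A real query would also have to deal with the naming and coding variations discussed on the later slides.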


Center for Clinical Data Analysis (CCDA)

Provides periodic (monthly/quarterly) bulk data extracts (delimited/flat files, .xls):

• Preliminary, anonymous data for feasibility, grant applications and statistical sample-size estimates

• IRB-approved case-finding for study enrollment (mailings, phone solicitation), chart review, and cohort/case-control studies

• Research data extracts - monthly/quarterly integrated extracts from EPR, POE, ORMIS, lab/PDS, billing systems, vaccination/transfusion/culture data, etc.


How CCDA works

• Email [email protected], cc: [email protected]; phone 410-955-65558 (Thiemann)

• For IRB-approved research:
  – Provide full protocol + IRB approval
  – Meet to discuss query methods, format
  – Iterate, then schedule production (email extracts, Jshare)
  – Cost: $100/hour

• For non-IRB projects (exploratory analyses, QI)
  – Same process, cost subsidized by ICTR/JHM
  – Do NOT implicitly morph a QI project into IRB research


The Basics: Getting Clinical Data Into a Registry Database

• Real work, not ad hoc/bootstrap

• Need $$$ and FTE(s)

• Smart analyst(s) who know database technology and understand (or can learn) nuances of the sources and content domain

• Hands-on PI management/guidance

• Statistical liaison early, before database schema and ETL methods are set in stone


The Extract-Transform-Load process: Getting Clinical Data into Research DB

• Raw clinical/administrative data is useless for research

• Build an intermediate (staging) database

– Don’t do data management in SAS/Stata/Excel

• Data dictionary: derivation for each field

• Templated, tested, documented cleanup scripts/routines

• Intermediate tables: Log each step/modification
  – For each batch, be able to re-create the data transform from scratch
  – Version control, change control and documentation are vital
  – Build data versioning into the database
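One way to picture the staging approach described on this slide is a small database in which raw rows are never modified, each cleanup run writes a new versioned copy, and every run is logged. The sketch below uses SQLite only for convenience; the table names (`raw_lab`, `clean_lab`, `transform_log`), the columns, and the trivial trim/uppercase cleanup step are all hypothetical and stand in for whatever a real registry would need.

```python
import sqlite3

# Hypothetical staging schema: raw rows stay untouched, each transform run
# writes a new version into clean_lab and records itself in transform_log.
SCHEMA = """
CREATE TABLE raw_lab (batch_id INTEGER, mrn TEXT, test TEXT, result TEXT);
CREATE TABLE clean_lab (batch_id INTEGER, version INTEGER,
                        mrn TEXT, test TEXT, result TEXT);
CREATE TABLE transform_log (batch_id INTEGER, version INTEGER,
                            step TEXT, rows_out INTEGER);
"""


def load_batch(conn, batch_id, rows):
    """Load raw (mrn, test, result) strings exactly as received."""
    conn.executemany("INSERT INTO raw_lab VALUES (?, ?, ?, ?)",
                     [(batch_id, *r) for r in rows])


def run_transform(conn, batch_id, step="trim_and_uppercase_test"):
    """Re-create the cleaned rows for one batch from raw data as a new version."""
    (prev,) = conn.execute(
        "SELECT COALESCE(MAX(version), 0) FROM transform_log WHERE batch_id = ?",
        (batch_id,)).fetchone()
    version = prev + 1
    # Illustrative cleanup only: strip whitespace, normalize test-name case.
    conn.execute(
        """INSERT INTO clean_lab
           SELECT batch_id, ?, TRIM(mrn), UPPER(TRIM(test)), TRIM(result)
           FROM raw_lab WHERE batch_id = ?""",
        (version, batch_id))
    (rows_out,) = conn.execute(
        "SELECT COUNT(*) FROM clean_lab WHERE batch_id = ? AND version = ?",
        (batch_id, version)).fetchone()
    # Log the step so the batch history can be audited later.
    conn.execute("INSERT INTO transform_log VALUES (?, ?, ?, ?)",
                 (batch_id, version, step, rows_out))
    conn.commit()
    return version
```

Because the transform always starts from `raw_lab`, rerunning it after a script fix yields a fresh version rather than an edit in place, and earlier versions remain available for comparison.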


Transforming Data

• Raw data are typically string (char/text) fields

• Unanalyzable characters (* < >, comments) still have meaning
  – Put non-numeric data in a separate field. Avoid numerical recoding (e.g., 999)

• ~3% of patients have multiple/non-preferred MRNs
  – Need a 1-to-many link table

• Assays, reference ranges and coding change over time
  – Avoid using raw codes (CPT/ICD) in the research database
  – Map clinical codes to research terms

• Defer analytic assumptions. When recoding data, anticipate problems. Keep options open.
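The advice about keeping non-numeric content in a separate field can be illustrated with a small parser. This is a sketch under assumptions: the field names (`value`, `qualifier`, `comment`, `raw`) and the regular expression are hypothetical, and real lab feeds will contain formats this pattern does not anticipate.

```python
import re
from typing import NamedTuple, Optional


class ParsedResult(NamedTuple):
    value: Optional[float]  # numeric part, or None when there is none
    qualifier: str          # leading '<', '>', '<=' or '>=', else ''
    comment: str            # any remaining non-numeric text
    raw: str                # original string, kept verbatim


# Optional comparison sign, then a plain decimal number, then free text.
_PATTERN = re.compile(r"^\s*(<=|>=|<|>)?\s*([-+]?\d+(?:\.\d+)?)\s*(.*)$")


def parse_result(raw: str) -> ParsedResult:
    """Split a raw result string without discarding or recoding anything."""
    m = _PATTERN.match(raw)
    if m is None:
        # No leading number: keep the text as a comment, leave value empty
        # instead of substituting a sentinel such as 999.
        return ParsedResult(None, "", raw.strip(), raw)
    qualifier, number, rest = m.groups()
    return ParsedResult(float(number), qualifier or "", rest.strip(), raw)
```

Storing the original string alongside the parsed pieces keeps options open: an analyst can later decide how to treat "<0.5" or a flagged value without going back to the source system.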


More Data Transform Challenges

• NEVER trust raw data. Learn business logic of source system.

  – CPTs morph annually, with internal complexity/redundancy
  – Lab assays/reference ranges/terms change
  – Parsing is inherently unreliable
  – Administrative names/groups change (clinic #s, departments)

• Duplicate-value problems (labs, orders)

• System-attribution source/datetime (POE, lab)

• Always run an aggregate (“group by”) query to identify alternative names (e.g., lab name) and values (number, result) before transforming. Otherwise you’ll miss something.
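The "group by" survey recommended on this slide might look like the following. It is a minimal sketch using SQLite; the helper name `survey_names` and the example table and column passed to it are hypothetical.

```python
import sqlite3


def survey_names(conn, table, column):
    """Return (raw_value, row_count) for every distinct spelling in a column."""
    # table and column are interpolated into the SQL text, so pass only
    # trusted identifiers here, never user input.
    sql = (f'SELECT "{column}", COUNT(*) AS n FROM "{table}" '
           f'GROUP BY "{column}" ORDER BY n DESC, "{column}"')
    return conn.execute(sql).fetchall()
```

Calling `survey_names(conn, "raw_lab", "test")` on a raw extract would list each distinct test name with its frequency, so spellings such as "HbA1c", "HBA1C" and "Hemoglobin A1C" show up side by side before any mapping is written.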


Understanding Business Logic

• Trust but verify: Test coding accuracy
  – Providers may habitually use imprecise/inaccurate diagnosis codes (especially in profee data)
  – ICD9 procedure indications are often a billing fiction
  – Trained coders may make systematic errors
  – Different content domains may have different standards (inpatient vs. outpatient coders)
  – Don’t infer/assume dependencies unless enforced by the source system

• Run min/max queries, aggregates, outer joins
  – Confirm date ranges, data ranges, relative proportions by year

• Don’t assume that null rows actually are empty. Maybe the query missed something.
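The verification queries named on this slide (min/max, aggregates by year, outer joins) can be sketched as three small checks. The schema assumed here (`patient`, `visit` with an ISO-formatted `visit_date` text column, and `lab_result`) is hypothetical and chosen only to keep the example self-contained.

```python
import sqlite3


def date_range(conn):
    """Earliest and latest visit date (ISO strings) present in the extract."""
    return conn.execute(
        "SELECT MIN(visit_date), MAX(visit_date) FROM visit").fetchone()


def rows_by_year(conn):
    """Row counts per calendar year; a sudden drop may signal a missing feed."""
    return conn.execute(
        "SELECT substr(visit_date, 1, 4) AS yr, COUNT(*) FROM visit "
        "GROUP BY yr ORDER BY yr").fetchall()


def patients_without_labs(conn):
    """Outer join: registry patients for whom no lab rows came back at all."""
    return [r[0] for r in conn.execute(
        "SELECT p.patient_id FROM patient AS p "
        "LEFT JOIN lab_result AS l ON l.patient_id = p.patient_id "
        "WHERE l.patient_id IS NULL ORDER BY p.patient_id")]
```

A patient returned by `patients_without_labs` is a prompt to investigate, not proof that no labs exist: the extract may have missed an alternate MRN or a differently named test.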


JHM Clinical Data Landscape: Past, Present and Future

Past: Babble of unintegrated systems

• EPR (antiquated technology, VSAM files, DB2) contains text, not queryable, analyzable data

Present: EPR2020 (aka Amalga) – integrated data!!

• Has everything in EPR, plus JHCP, and is gradually adding data from clinical/departmental/administrative systems (IDX CPTs, transfusion medicine, ORMIS, HMED, eADR, death registry, ad infinitum)

Future: ? Epic, ? JHM Data Warehouse

• Epic: One system replacing all major JHM systems
• JHH timeline: 4+ years


JHM Data Sources: Casemix Datamart

• Gold standard for JHM (non-profee) administrative data, including payer/insurance data

• Combines data from Keane (hospital charges), ADT (admission/discharge/transfer), HDM (ICD9 diagnosis + procedure coding), HSCRC (regulatory submissions)

• Not a true data warehouse; meager reconciliation

• Best source for length of stay, resource use, ICD9 diagnoses

• Outpatient ICD9s limited

• Has JHH + BMC + HCGH data


JHM Data Sources: IDX (profee)

• Gold standard for inpatient + outpatient CPT (profee charge) data

• ICD9 diagnosis data problematic

• Limitation: No data from non-faculty providers (private physicians, etc.)

• Difficult to query. Has a data warehouse, but access is limited.

• Early target for EPR2020/Amalga integration.


JHH Data Sources: SCM/POE

• Sunrise Clinical Manager/Provider Order Entry

• Replicated transactional database, difficult to query

• For registry purposes POE has large attribution/process challenges: Stutter-step orders, multiple alerts, imputed times

• Great source for inpatient meds, labs, physiologic monitor data

• No codified ICD9/SNOMED/RxNorm data

• No outpatient data


JHH Data Sources: SCC/AIM

• Sunrise Critical Care (aka Emtek, Eclipsys). JHH ICUs + stepdown units + oncology

• AIM analytic database contains selected but comprehensive batch extract

• Sunsets as ICUs switch to POE ClinDoc

• Challenging to query. Lots of denormalized fields


JHH + BMC Data Sources: PDS

• PDS=Pathology Data Systems

• Includes lab, transfusion medicine, anatomic pathology, cytopath, John Boitnott’s death registry

• Lab data also available via EPR2020/Amalga and POE


BMC Data Sources: Meditech

• Shrink-wrapped, comprehensive inpatient + outpatient clinical + financial system

• Difficult for ad hoc research queries.

• Exports data to Datamart and EPR2020

• BMC-JHH patient linkage doable but difficult, needs caution


JHCP Data Sources: GE Centricity

• All clinical + administrative data for JHCP clinics

• Largely opaque to research query; JHCP sometimes collaborates directly, especially for its physician/investigators

• Early target for EPR2020/Amalga integration

• Linkage challenges to BMC and JHH MRNs


JHH Departmental Data: ORMIS + eADR/Medivision

• ORMIS: Operating Room Management Information System

• Mostly transactional scheduling/tracking/administrative data, limited clinical data.

• Has diagnoses, procedures, case start/stop times

• eADR/Medivision (anesthesia) still evolving, limited research data access

• Design challenges similar to legacy SCC critical-care system.


JHH Departmental Data: HMED (Emergency Department)

• Mostly opaque to research

• Replicated data hosted by Datamart
