A Centralized De-Duplication Service
2003 Immunization Registry Conference
Paul Schaeffer, MPA, NYC DOHMH, pschaeff@health.nyc.gov
Daryl Chertcoff, HLN Consulting, [email protected]
Co-Authors: Alexandra Ternier, Angel Aponte (DOHMH)


Page 1:

A Centralized De-Duplication Service

2003 Immunization Registry Conference

Paul Schaeffer, MPA, NYC DOHMH, pschaeff@health.nyc.gov

Daryl Chertcoff, HLN Consulting, [email protected]

Co-Authors: Alexandra Ternier, Angel Aponte (DOHMH)

Page 2:

Objectives

To describe the NYC Department of Health and Mental Hygiene’s (DOHMH) centralized de-duplication service

Page 3:

Rationale – Centralized De-Duplication Service

Duplication of records – a department-wide database problem

Page 4:

Duplication Rates - DOHMH Databases

Program    Current Estimated Duplication Rate
CIR        30%
LQ         7%
CDSS       30%

Page 5:

Key Terms

Master Client Index (MCI) – database that stores information from different programs for matching

Core Services – implementation of Business Rules governing the MCI

De-Duplication Service – matches duplicate records

Page 6:

Background - MCI

The MCI integrates data from and provides a centralized de-duplication service to:

  Citywide Immunization Registry (CIR)
  Lead Quest Registry (LQ) from the Lead Poisoning and Prevention Program
  Vital birth records
  Communicable Disease (Spring 2004)
  Additional health databases (in the future)

Page 7:

Development of MCI

Developing Requirements & Specs

Selecting middleware technology

Building MCI Core Services

Configuring servers and platforms

Building MCI Administration Tools

Page 8:

Development of MCI (Continued)

Modifying CIR and LQ (first clients)

Training artificial intelligence de-duplication software

Data loads into MCI

Deployment

Page 9:

Master Client Index

[Architecture diagram. Components shown:]

MCI: De-Duplication Service and MCI Core Services (Win 2000 servers); MCI Database (Oracle, Unix server); MCI Administration Tools (VB application)

CIR Client: CIR Database (Oracle, Unix server); CIR Front End (PowerBuilder application)

LQ Client: LQ Database (Microsoft SQL, Win 2000 server); LQ Front End (PowerBuilder application)

CDSS Client: CDSS Database (Microsoft SQL, Win 2000 server); CDSS Front End (JSP web application)

Page 10:

MCI – Core Services

MCI’s main function - to facilitate matching and be extensible to all DOHMH databases

Data model - designed with attributes common to all systems

Information specific to a particular system may also be stored in the MCI to improve matching
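A data model of this shape can be sketched as a record with shared attributes plus an open slot for program-specific ones. This is only an illustration: the class name, field names, and sample values below are hypothetical and are not taken from the MCI schema.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Dict, Optional


@dataclass
class MciRecord:
    """Hypothetical MCI record; every field name here is illustrative."""

    mci_id: int
    # Attributes assumed to be common to all client systems.
    first_name: str
    last_name: str
    date_of_birth: Optional[date] = None
    # Attributes only one program collects, keyed by program, then by name.
    program_specific: Dict[str, Dict[str, str]] = field(default_factory=dict)

    def add_program_attribute(self, program: str, name: str, value: str) -> None:
        """Store an extra attribute that may help matching for one program."""
        self.program_specific.setdefault(program, {})[name] = value


# Made-up sample data, for illustration only.
record = MciRecord(1, "Maria", "Lopez", date(2001, 5, 17))
record.add_program_attribute("CIR", "mother_maiden_name", "Diaz")
```

Keeping the program-specific values in a separate mapping lets a new client system add attributes without changing the shared fields.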

Page 11:

MCI – Core Services (Continued)

“Person-centric” model

Artificial intelligence is “trained” by program-specific data

Matching based on probabilistic algorithm
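The slides do not name the probabilistic algorithm. As a stand-in, the sketch below uses the classic Fellegi-Sunter approach, in which each field contributes a log-likelihood weight depending on whether it agrees. The field list, the m/u probabilities, and the sample records are all made-up assumptions.

```python
import math

# Hypothetical per-field probabilities (not from the presentation):
# m = P(field agrees | same person), u = P(field agrees | different people).
FIELD_PROBS = {
    "last_name": (0.95, 0.01),
    "first_name": (0.90, 0.02),
    "date_of_birth": (0.97, 0.003),
}


def match_weight(record_a: dict, record_b: dict) -> float:
    """Sum log2 likelihood ratios over the fields (Fellegi-Sunter style)."""
    total = 0.0
    for name, (m, u) in FIELD_PROBS.items():
        a, b = record_a.get(name), record_b.get(name)
        if a is None or b is None:
            continue  # a missing value is treated as no evidence either way
        if a == b:
            total += math.log2(m / u)  # agreement raises the score
        else:
            total += math.log2((1 - m) / (1 - u))  # disagreement lowers it
    return total


# Made-up records: a and b differ only in the spelling of the first name.
a = {"last_name": "LOPEZ", "first_name": "MARIA", "date_of_birth": "2001-05-17"}
b = {"last_name": "LOPEZ", "first_name": "MARIE", "date_of_birth": "2001-05-17"}
c = {"last_name": "CHEN", "first_name": "DAVID", "date_of_birth": "1999-11-02"}
print(match_weight(a, b) > match_weight(a, c))
```

In a trained system the m and u values would be estimated from reviewed record pairs rather than typed in by hand.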

Page 12:

De-Duplication: Features

Potential duplicate pairs are reviewed by humans to train the model

“Artificial Intelligence” model created

Match thresholds are determined
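One common way to use match thresholds is a two-cut-off rule: scores above the upper threshold are merged automatically, scores below the lower threshold are inserted as new records, and anything in between is held for a person to review. The sketch below assumes that scheme, which is consistent with the merge, insert, and hold-queue outcomes reported later in the deck. The threshold values are hypothetical.

```python
from enum import Enum


class Decision(Enum):
    MERGE = "merge"            # confident the record is a duplicate
    INSERT = "insert"          # confident the record is a new person
    HOLD = "hold_for_review"   # uncertain, so a person decides


# Hypothetical cut-offs; the slides do not give actual threshold values.
UPPER_THRESHOLD = 8.0
LOWER_THRESHOLD = -2.0


def route(score: float,
          upper: float = UPPER_THRESHOLD,
          lower: float = LOWER_THRESHOLD) -> Decision:
    """Map a match score to an action using two thresholds."""
    if lower > upper:
        raise ValueError("lower threshold must not exceed upper threshold")
    if score >= upper:
        return Decision.MERGE
    if score <= lower:
        return Decision.INSERT
    return Decision.HOLD


print(route(11.6), route(3.0), route(-12.0))
```

Moving the two thresholds closer together sends fewer records to human review but raises the risk of a wrong automatic merge or insert.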

Page 13:

De-Duplication: Process

Incoming records to MCI (not client systems)

De-Duplication happens in MCI and trickles down to client systems

Clients have access to each other’s data for human review process
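The flow on this slide can be sketched as a central index that receives every incoming record, decides whether to merge or insert, and then notifies each registered client system. This is a toy illustration: the class and method names are hypothetical, and exact matching on name and date of birth stands in for the real probabilistic matcher.

```python
from typing import Callable, Dict, List, Tuple

Listener = Callable[[str, int], None]


class MasterClientIndex:
    """Toy central index; exact-key matching stands in for the real matcher."""

    def __init__(self) -> None:
        self._ids_by_key: Dict[Tuple[str, str, str], int] = {}
        self._listeners: List[Listener] = []

    def register_client(self, listener: Listener) -> None:
        """Register a client system to be told about every decision."""
        self._listeners.append(listener)

    def submit(self, record: Dict[str, str]) -> int:
        """Match centrally, then push the outcome down to all clients."""
        key = (record["last_name"], record["first_name"], record["date_of_birth"])
        if key in self._ids_by_key:
            mci_id, event = self._ids_by_key[key], "merged"
        else:
            mci_id, event = len(self._ids_by_key) + 1, "inserted"
            self._ids_by_key[key] = mci_id
        for listener in self._listeners:
            listener(event, mci_id)
        return mci_id


# Two hypothetical client systems that simply log what they are told.
cir_log: List[Tuple[str, int]] = []
lq_log: List[Tuple[str, int]] = []
mci = MasterClientIndex()
mci.register_client(lambda event, mci_id: cir_log.append((event, mci_id)))
mci.register_client(lambda event, mci_id: lq_log.append((event, mci_id)))

# Made-up record submitted twice, as if reported by two programs.
report = {"last_name": "LOPEZ", "first_name": "MARIA", "date_of_birth": "2001-05-17"}
first_id = mci.submit(report)
second_id = mci.submit(dict(report))
```

Because the decision is made once in the index, both clients see the same identifier for the person and the same merge event.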

Page 14:

De-Duplication Service – Some Numbers

Estimated 94% of new reports will be either merged or inserted

Remaining 6% - sent to hold queue for Human Review

99.7% accuracy of De-Duplication Service

Page 15:

Benefits – Centralized De-Duplication Service

Cross-program leveraging of resources

Programs have access to other programs’ data

Fewer FTEs needed for human review – able to re-deploy staff

Page 16:

Challenges

Who will be responsible for cross-program record review – individual programs, or an MCI team?

Ownership of data – CIR will now disseminate LQ data

Confidentiality issues
  All clients have access to VR information
  CIR has access to LQ data

Page 17:

Challenges (Continued)

Fiscal issues

Joint Project Activities – data dissemination

MCI System Operations & Maintenance – need to divide responsibilities between MIS, MCI, CIR and LQ staff

Page 18:

Future Plans

Environmental Health - Adult Heavy Metal Poisoning Database

Expanding the MCI to the rest of DOHMH