(Closet Skeletons Version) Richard Pham Enterprise Architect
OI&T Corporate Data Warehouse Architecture
[email protected]
Slide 2
Slide 3
[Diagram: CDW Informatics and Analytics Ecosystem, Regions 1-4]
RDW (Regional Data Warehouse) groups, each with an RPC Farm: V18-V22; V12, V15, V16, V17, V23; V1-V5; V6-V11
CDW (Corporate Data Warehouse), with an RPC Farm: SAS Grid, VINCI, Ana Apps, ePM, GIS
Hardware Stats: 411 Servers, 4 PB Storage, 54 Racks
BI Farm: SQL Server Data Center Build (Engine, SSIS, SSAS, SSRS), Excel Services, SharePoint/PerformancePoint Services, Team Foundation Services, SAS, Stata, TreeAge
Slide 4
Some Things Never Change
VHA and OI&T have a tense/unhappy relationship
OI&T project management bureaucracy is onerous
The use and oversight of contractors is problematic
Pharmacy knows what they are doing (more so than OI&T)
Slide 5
In the beginning, there were files (early 70s)
Slide 6
There were problems
How do I maintain each file?
If I change one file, what happens to the other files?
How do I control growth of the files?
Slide 7
Then came databases (late 70s)
Slide 8
And there were more problems
How can the databases share common elements like patient?
What if some idiot changes one table structure that collapses everything else?
Who remembers how this database was designed?
Slide 9
This is only two packages; think of the 100+ that are in VistA
Now, try extrapolating those trends in your head
Have a picture in your mind?
Slide 10
Did That Picture Look Like This?! (~7% of VistA as of
2010)
Slide 11
Slide 12
One more extension: let's try to analyze this data
Slide 13
This Is What Happens With Extracts (90s)
Slide 14
Even more problems
Is my data timely (extract-to-production system time lag)?
Are the extracts one-time? Are they repeatable?
Who manages all these extracts?
No, seriously, this becomes a really ugly problem
Slide 15
Why Am I Giving This Presentation?
Quite simply, feedback on:
"I don't understand what you mean when you say File or Pointer."
"Where does the data come from?"
"How does the data get to CDW?"
Also, while you are using the CDW to prepare your work, it really helps to know where the data originates
Slide 16
DHCP/VistA/CPRS/HealthEVet
VistA - Veterans Health Information Systems and Technology Architecture: the 2nd-generation architecture. Refers both to the architecture and to the database that the architecture supports
DHCP - Decentralized Hospital Computer Program: the DOS (Unix-like) system where many of VistA's non-clinical entries take place
CPRS - Computerized Patient Record System: a user-friendly GUI providing access to clinical order entry functions
HealthEVet - 3rd generation of VA's EMR. Planned inclusions are patient-facing applications, better alignment with coding standards, and MDS compliance.
Slide 17
The Health Care Process Is More Complicated Than We Think
Slide 18
Objectives
The main objective is to understand the data lifecycle of VA's VistA/CPRS and the user experience of VistA/CPRS
A high-level overview of VistA Internals
Learn about data structures and outputs in VistA
Learn where data enters and travels throughout the VA
Try to make sense of data resources within the VA and how they are accessed
Slide 19
The VA Data Lifecycle
Slide 20
Slide 21
Core Patient Care Functionality VistA is first and foremost an
Electronic Medical Record. The architecture design supports veteran
health care.
Slide 22
Core Patient Care Functionality VistA Internals DHCP CPRS
Slide 23
VistA Internals 101
MUMPS Server and Operating System
Kernel
Three Wise Men (Managers): TaskMan, MailMan, FileMan
Modules
Slide 24
Slide 25
Why Is Med Safety at Ann Arbor?
Slide 26
To Best Care Anywhere
Slide 27
Massachusetts General Hospital Utility Multi-Programming System (MUMPS or M)
My definition in English: M is a programming language designed for hierarchical databases that is convenient for medical applications or anything else where speed and data storage upkeep are a problem and programmer intelligence/organization is not
My technical definition: M is a Turing-complete, low- and high-level, imperative, machine-compiled (no longer interpreted) programming language utilizing a hierarchical global array file structure
Used commonly in healthcare and financial industry settings
Slide 28
Structure of The Veterans Administration Data Efforts (Late 1970s)
VHA Ancestor: Department of Medicine and Surgery (DMAS)
VHA-OI Ancestor: Computer Assisted System Staff (CASS)
OI&T Ancestor: Office of Data Management & Telecommunications (ODM&T)
Slide 29
Comparing The Two Offices
CASS: Decentralized design philosophy; rapid, agile development; SME-involved development
ODM&T: Centralized design philosophy; bureaucratic, process-focused development; development without SMEs
Slide 30
Slide 31
Highlights of ODM&T Development
Took 6 years to deploy APPLES Pharmacy at 10 sites
A 1980 paper detailing ODM&T's transactional patient treatment file (PTF) system promised an interactive national solution by 1990.
Navigating the mandated 17 steps between system specification and deployment alone is said to have required at least 3 years.
Slide 32
Slide 33
Beginnings of DHCP
There were subject matter experts who believed that they could put out useful applications faster than the ODM&T sloth
Development of the testing and principles was done unofficially from the early to the late 1970s
Slide 34
Original DHCP Design Principles
A commitment to rapid prototype development
All use ANSI MUMPS
Modular Design
Actively Maintained Data Dictionary
Code Sharing/Portability
Involve the SMEs
Slide 35
DHCP Kernel
Functions as both an operating system for VistA applications and an M virtual machine
Kernel shields DHCP modules from needing to know hardware and OS configurations on the server
Isolates M to the ANSI standard (1995)
Provides a toolbox of standard functions for most programmers
Slide 36
Slide 37
MUMPS Classic Database
One Data Type: String (Text)
Other types: Cardinal Numbers, Float Numbers, $H Dates
One Data Storage Type: Multidimensional Array, aka Globals
Dynamic (duck) typing
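
A note on the $H dates mentioned above: M's $HOROLOG value is a string of the form "days,seconds", where the day count starts from December 31, 1840 and the seconds are counted from midnight. The sketch below is a minimal Python illustration of that conversion; the function names are made up for this example and are not part of any VA tooling.

```python
from datetime import datetime, timedelta

# $HOROLOG day 0 is December 31, 1840, so day 1 is January 1, 1841.
HOROLOG_EPOCH = datetime(1840, 12, 31)


def horolog_to_datetime(h: str) -> datetime:
    """Convert a '$H'-style 'days,seconds' string to a Python datetime."""
    days_text, _, seconds_text = h.partition(",")
    seconds = int(seconds_text) if seconds_text else 0
    return HOROLOG_EPOCH + timedelta(days=int(days_text), seconds=seconds)


def datetime_to_horolog(moment: datetime) -> str:
    """Convert a datetime (on or after the epoch) back to 'days,seconds'."""
    delta = moment - HOROLOG_EPOCH
    return f"{delta.days},{delta.seconds}"


print(horolog_to_datetime("58074,3600"))  # 2000-01-01 01:00:00
```

Because everything is stored as a string, the interpretation of "58074,3600" as a date lives entirely in the code that reads it, which is the duck typing the slide refers to.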
Slide 38
VistA Data Organization
Namespace: 654 (VAMC Reno)
File: 120.5 (GMR Vitals)
Field: 0.1 (DATE/TIME VITALS TAKEN)
Record: IEN-1, BP, 140/90
Most Files have an entry at the 0.001 Field called IEN, or Internal Entry Number, an identity key that marks the record as unique
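
One way to picture the namespace / file / record / field hierarchy is as nested dictionaries. The sketch below is only an analogy in Python, not how M stores globals; the variable and function names are hypothetical, and the timestamp value is made up.

```python
# Hypothetical in-memory stand-in for a VistA global, keyed the way the slide
# describes: namespace (station) -> file number -> IEN -> field.
vista = {
    "654": {                      # namespace: VAMC Reno
        "120.5": {                # file: GMR Vitals
            1: {                  # IEN 1, the record's identity key
                "date_time_vitals_taken": "2010-06-01T08:30",  # made-up value
                "vital_type": "BP",
                "rate": "140/90",
            },
        },
    },
}


def get_field(store, namespace, file_number, ien, field):
    """Walk the hierarchy one level at a time; return None if any level is missing."""
    record = store.get(namespace, {}).get(file_number, {}).get(ien, {})
    return record.get(field)


print(get_field(vista, "654", "120.5", 1, "rate"))   # 140/90
print(get_field(vista, "654", "120.5", 2, "rate"))   # None: no such IEN
```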
Slide 39
From The Beginning - Entry
An entry is a piece of data
Richard (First Name), Pham (Last Name), 05/03/1983 (Date of Birth)
Slide 40
Record (Row)
A group of related data
Richard, Pham, M, 05/03/1983
Slide 41
Field
A group of related data
Richard (First Name), Pham (Last Name), 05/03/1983 (Date of Birth)
Slide 42
File
A group of related fields and the records that we have
File 200 NEW PERSON: Richard, First Name, 200; Pham, Last Name, 200; Date of Birth, -
File Relationships - Pointers
When two files share a common field with each other, this is called a pointer
There are four major types:
Pointer - One record in one file matches to one record in another file
Self-Referential - One record in one file matches to one record in the same file (in the past or the future)
Multiple - One record in one file matches to many records in one file (parent-child)
Variable - One record and some logic matches to one file
Slide 45
Pointers
File 52 PRESCRIPTION, Field 2 (Patient) points to File 2 PATIENT (all fields)
One-to-one
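
The prescription-to-patient pointer above can be sketched as a lookup by IEN. This is a minimal Python illustration under stated assumptions: the two dictionaries are hypothetical miniature stand-ins for File 2 and File 52, and every name and value in them is made up.

```python
# Hypothetical miniature copies of File 2 (PATIENT) and File 52 (PRESCRIPTION),
# keyed by IEN. All names and values are invented for illustration.
patient_file = {
    7: {"name": "PATIENT,EXAMPLE"},
}
prescription_file = {
    101: {"rx_number": "A1001", "patient": 7},  # pointer field stores the target IEN
}


def resolve_pointer(record, pointer_field, target_file):
    """Follow a pointer field to the record it names, or None if it dangles."""
    target_ien = record.get(pointer_field)
    return target_file.get(target_ien)


rx = prescription_file[101]
print(resolve_pointer(rx, "patient", patient_file))  # {'name': 'PATIENT,EXAMPLE'}
```

The pointer field holds only the IEN, so anyone reading the prescription record alone sees a number, not a patient; the meaning comes from knowing which file the field points to.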
Slide 46
Self-Referential Pointer
File 100 OE/RR, Field 9 (Replaced Order) points to File 100 OE/RR (Past Order): Present-to-Past
File 100 OE/RR, Field 9.1 (Replaced Order) points to File 100 OE/RR (Future Order): Present-to-Future
Warning: DO NOT $o these fields without programmer assistance! You will bring down DHCP this way!!!
Slide 47
Multiple Subfile
File 52 PRESCRIPTION, Field 52 (Refill Subfile) points to File 52.1 REFILL (all fields)
One-to-many
Slide 48
Multiple Subfile
File 120.8 PATIENT ALLERGIES, Field 1 (GMR Allergy)
File 50 DRUG
One-to-many files
File 50.6 NATIONAL DRUG
120.2 GMR ALLERGIES
File 50.416 DRUG INGREDIENT
File 50.605 DRUG CLASS
Slide 49
Computed/MCode
A placeholder that does not contain any stored information
Calculated ad hoc when you look up the value
Warning: For this reason, the value ALWAYS has the possibility of changing
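
Age is a natural example of a computed field: nothing is stored, and the answer depends on when you ask. The Python sketch below is an analogy only, not FileMan's mechanism; the class and method names are hypothetical, and the date of birth is the example value used earlier in these slides.

```python
from datetime import date


class PatientRecord:
    """Stores the date of birth; age is never stored, only computed on lookup."""

    def __init__(self, name, date_of_birth):
        self.name = name
        self.date_of_birth = date_of_birth

    def age(self, as_of=None):
        """Compute age in whole years as of a given day (defaults to today)."""
        as_of = as_of or date.today()
        had_birthday = (as_of.month, as_of.day) >= (
            self.date_of_birth.month,
            self.date_of_birth.day,
        )
        return as_of.year - self.date_of_birth.year - (0 if had_birthday else 1)


patient = PatientRecord("Richard Pham", date(1983, 5, 3))
print(patient.age(date(2010, 5, 2)))  # 26
print(patient.age(date(2010, 5, 3)))  # 27: same record, different answer a day later
```

This is why an extract that copies a computed value freezes a number that the source system would report differently tomorrow.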
Slide 50
How Complicated Is The Pharmacy Package?
440 files in the File 50 Series
3,175 fields
527 Pointers
310 External References
Slide 51
VistA to Relational Database Terminology
VistA (Example) | Relational Database (Example)
Namespace (VHAFRE) | Database (VA Fresno)
Package (not hardcoded) | Schema (RxOutpatient)
File (50.68 VA PRODUCT) | Table (NationalDrug)
Field (.01 NAME) | Column (DrugNameWithDose)
Domain (cardinal/decimal, set of codes, free text/word processing) | Field Type (numeric, boolean, varchar)
Internal Entry Number (IEN or .001) | ~Key (9722)
Record | Tuple/Row (ISOSORBIDE MONONITRATE 120MG TAB,SA)
Pointer (IEN) | Foreign Key (VAClassIEN)
Multiple Pointer | No equivalent
Computed/MCode Field | Trigger (Age Trigger)
Slide 52
Upside of Using Globals
Faster - No joins
Faster - All parameter pointers built in
Faster - Direct and planned programmatic access to database (look at SQL execution plans)
Less data storage overhead and faster paging - If a data point does not exist in the array, no fixed slot has to be reserved for it, as there would be in a relational table
Downside of Using Globals
No Intrinsic Structure and No Enforcement* - M believes whatever you put into the globals (most M programmers view this as an advantage, while relational programmers have an MI)
ACID-compliance not mandated
(Il)logical data structures guaranteed
There are many interesting* ways that M programmers modeled the data that do not make sense to later viewers
Slide 54
MUMPS Quirks
Whitespace (Space) matters
Requires knowledge of kernel and sometimes lower-level concepts
Programming Without Type or Structure Enforcement
VA programming standards and conventions
Slide 55
Slide 56
The Three Wise Men (Managers)
TaskMan - The man(ager) that schedules tasks to the kernel
MailMan - The man(ager) that handles messages between the user, TaskMan, and any other two-way communication between packages
FileMan - The man(ager) that controls internal file (data structure) interactions
Slide 57
TaskMan
TaskMan handles application processing:
Creation of application processing tasks
Scheduling these tasks
Monitoring health/statistics of these tasks
If the kernel is the brain, then TaskMan is the body of the operation
If programming, NEVER EVER use the TaskMan global. This subverts TaskMan's scheduling queue and can cause a system memory leak. Use the calls instead
Slide 58
MailMan
VistA needs a way to pass and receive data from the database to other areas
MailMan has fulfilled this function since the pre-TCP/IP days
Electronic mail doesn't mean just email
Practically any message between the database and anyone else (the end-user, another site, an application, etc.) can be moved this way
Gives programmers methods to both receive and return data to the database
MailMan is its own protocol, but will use HL7 when communicating with non-DHCP programs
Slide 59
FileMan
A higher-level method to access the VistA database without exposing a programmer interface
Mostly menu-driven
One can use limited programming
Serves as the model for all other modules that interact with the VistA database
Slide 60
ODM&T Initial Action Plan To DHCP Development (1980)
Ordered that development stop
Fired the developers
Removed the hardware
Cut the DMAS budget so it would never happen again
Slide 61
The official history
Slide 62
Development Goes Underground
Developers who survived the ODM&T purge continued their work as a black project in DMAS
During 1980 and 1981, the survivors (Underground Railroad) continued work on developing modules for system integration
Slide 63
Modules
Modules are programmed to interact with the VistA database
Most use FileMan as a model for programming
Slide 64
Some of the Many Modules
Medicine, Surgery, Dentistry, Nursing, Pharmacy, Laboratory, Care Management, Patient Care Encounters, ADT, Mental Health, EDIS, Oncology, Nutrition and Food Service, Imaging/PACS, Prosthetics
Not really in the scope of this presentation to cover each module.
Try the VistA Documentation Library: http://www4.va.gov/vdl/
Or VHA eHealth University (VeHU): http://www.vehu.va.gov/
Slide 65
Acceptance and DHCP 1.0
Once there was a critical mass of packages that were shown to be useful, the tide turned and the project was blessed
Initial testing/installation done in 1980-83
1.0 installation was in 1985
Most of the underlying packages can still be recognized by the original programmers
Slide 66
Computerized Patient Record System (CPRS)
A Real-Time Order Checking System that alerts clinicians during the ordering session that a possible problem could exist if the order is processed
A Notification System that immediately alerts clinicians about clinically significant events
A Patient Posting System, displayed on every CPRS screen, that alerts clinicians to issues related specifically to the patient, including crisis notes, warnings, adverse reactions, and advance directives
The Clinical Reminder System, which allows caregivers to track and improve preventive health care for patients and ensure timely clinical interventions are initiated
Remote Data View functionality that allows clinicians to view a patient's medical history from other VA facilities to ensure the clinician has access to all clinically relevant data available at VA facilities
Slide 67
CPRS Internals
Written in Embarcadero Delphi (NOT in MUMPS)
Connects from the Graphical User Interface to the VistA database using a Remote Procedure Call (RPC) Broker
This Remote Procedure Call Broker translates instruction sets from other languages into M
Slide 68
Slide 69
Present State of VistA
Large MUMPS database
50+ Main Clinical Packages
10,000+ Tables
Each medical center holds somewhere between 2 and 4 TB of data accumulated over 30 years (mostly imaging)
Many processes: 300+ MB of running executable at any given time
Over 20,000 subroutines (VDL)
Many simultaneous users
Slide 70
Analytic Coursework
\\r01scrdwh65.r01.med.va.gov\vadatalifecycle\sql
SQL: T-SQL dialect is for VHA; PL/SQL dialect is for VBA
SQL Server Reporting Services
SQL Server Analysis Services
Statistical Analysis Programs: SAS, Stata (preferred), TreeAge
Slide 71
Slide 72
Slide 73
Next Class
SQL Basic query - Optional introduction lecture on basic computer science (algorithms, heaps, sorts, data structures). Two 50 minute lectures for five weeks
Basic Reporting - Two 50 minute lectures for five weeks
Advanced Programming - One 50 minute lecture every other week
Class is placed on the site
Current version has the DBZ Abridged Disclaimer
Slide 74
The VA Data Lifecycle
Slide 75
National Analytic Systems
A list of systems that support policy, planning, and congressional needs
There are more extracts than this, but I have chosen the most common ones
Slide 76
Systems to Support Planning
Decision Support System (DSS) - Supports accounting and costing for the OIG, GAO, CBO, and other auditing agencies
Allocation Resource Center - Supports personnel and resource allocation at the medical center level
Workload capture, resource allocation
Basis for the VERA (VA's Fund Control Point) Model
Slide 77
Systems to Support Planning and Research
National Patient Care Database - An integrated set of data that captures a patient's care encounter with the VA
Corporate Data Warehouse - A near real-time accumulation of much of the same data
The result of the Health Data Repository process
Slide 78
Slide 79
NPCD Processing
DSS data extracted
Flat files are indexed and loaded into the database daily
Data is checked for duplicates bi-monthly
Data is extracted and filtered for reporting twice a month
[Diagram labels: Oracle on Unix; NPCD; UNIX; Master Extract File (MEF); SAS; z900 (MAINFRAME); VSSC/KLF Menu; WINDOWS; Daily Data Loading]
Slide 80
NPCD Data Flow Diagram
NPCD data is sent from the facilities to the AAC via MailMan messaging
Once a message reaches the AAC MailMan server, it automatically moves to the Data Management Interface System (DMI)
Data received in DMI 24x7
Acknowledgement messages are sent to facilities
Data extracted & backed up nightly M-F
NPCD and other applications retrieve their respective data from DMI for use
[Diagram labels: VistA MailMan; MailMan Message; Austin MailMan Server; Acknowledgement message; DMI; Data Stream; HL7 data to Oracle DB; z900; NPCD; Data extracted by application]
Slide 81
Secrets of the VA Data Universe
This was an extremely brief introduction to a complicated area
I have another presentation on the availability of databases in the VA and how to access them for operations and/or research
Slide 82
The VA Data Lifecycle
Slide 83
Regional Remote Data Processing Center Shadow Systems
An offsite backup process to ensure continuity of operations for VistA Patient Care
Slide 84
Regional Data Processing Centers (RDPCs)
Started as backups
Read-only backup VistA systems are set up to take journaling files
When a record is written or altered in a local medical center's VistA, a journal file with that entry is prepared and sent to a Regional Data Processing Center
This maintains an active backup in case the local medical center's VistA goes down
Nowadays, even the production systems work from there
Region I and IV fully (? On status) Region I and III
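
The journaling idea can be sketched as replaying a list of changes against a copy. This Python sketch is a simplified illustration, not the actual journal format used by VistA or Cache: the entry layout (op, file, ien, field, value) is an assumption invented for this example.

```python
def apply_journal(shadow, journal):
    """Replay journal entries, in order, against a shadow copy of the data."""
    for entry in journal:
        key = (entry["file"], entry["ien"], entry["field"])
        if entry["op"] == "set":
            shadow[key] = entry["value"]      # record written or altered
        elif entry["op"] == "kill":
            shadow.pop(key, None)             # record removed at the source
    return shadow


# Hypothetical journal: a write, a correction to the same field, then a delete.
journal = [
    {"op": "set", "file": "120.5", "ien": 1, "field": "rate", "value": "140/90"},
    {"op": "set", "file": "120.5", "ien": 1, "field": "rate", "value": "138/88"},
    {"op": "set", "file": "120.5", "ien": 2, "field": "rate", "value": "120/80"},
    {"op": "kill", "file": "120.5", "ien": 2, "field": "rate"},
]

shadow = apply_journal({}, journal)
print(shadow)  # {('120.5', 1, 'rate'): '138/88'}
```

Because entries are replayed in order, the shadow ends up in the same state as the source without ever querying the production system, which is also what makes the journal attractive as an indirect feed for a warehouse.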
Slide 85
Regions and RDPCs
Region I RDPC - Sacramento (SAC) and Denver (DEN)
Region II RDPC - Little Rock (LIT)
Region III RDPC - Durham (DUR) and Augusta
Region IV RDPC - Philadelphia (PHI) and Brooklyn
Slide 86
RDPC Denver and Brooklyn
Slide 87
The VA Data Lifecycle
Slide 88
Business Intelligence
Slide 89
Business Intelligence in the VA: Making the Data Work For Us
VistA has a wealth of clinical and administrative data available
In the past, giving a value-added, timely VistA dataset was hard
Querying the active system with minimal impact
Needed an interface between M and analyst languages (SAS, SQL, etc.)
Easy-to-read reports were hard to build
Slide 90
[Diagram: BISL Informatics and Analytics Ecosystem, Regions 1-4]
RDW (Regional Data Warehouse) groups, each with an RPC Farm: V18-V22; V12, V15, V16, V17, V23; V1-V5; V6-V11
CDW (Corporate Data Warehouse), with an RPC Farm: SAS Grid, VINCI, Ana Apps, ePM, GIS
Hardware Stats: 411 Servers, 1.5 PB Storage, 54 Racks
BI SharePoint (MOSS) Farm: Performance Point Services, Excel Services, Reporting Services, Analysis Services, SharePoint Services, Team Foundation Services
Slide 91
Different Ways To Access DHCP Data
Direct Methods
FileMan
Individual methods
M Routines - Not favored (permanent moratorium in Region I)
CPRS Injection
MDWS (this is HI2's major method)
Cache Direct
HDR Extractor (CDW Method)
VDEF - VistA Data Extraction Framework
Indirect Methods
Journal Reader (CDW method)
Slide 92
MDR Extractor
Slide 93
Shadow Servers
Slide 94
Slide 95
Corporate/Regional Data Warehouse
Takes a copy of the journal file that goes into the backup shadow system
Translated from the M array to a relational database format using InterSystems Cache's class mapping program
Staged in a Feeder-Collector system for collection
Indexed and value-added columns produced and loaded to a VISN RDW Server
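
The translation step, from a hierarchical array to flat rows, can be sketched roughly as below. This is a Python illustration of the idea only; the real work is done by the Cache class mapping named above, and the function name, column names, and sample values here are all hypothetical.

```python
def flatten_file(station, records):
    """Turn {ien: {field: value}} for one file into a list of flat row dicts.

    Each row gains 'station' and 'ien' columns so that rows from different
    medical centers can share one warehouse table without key collisions.
    """
    rows = []
    for ien, fields in sorted(records.items()):
        row = {"station": station, "ien": ien}
        row.update(fields)
        rows.append(row)
    return rows


# Hypothetical hierarchical data for one file at one station.
vitals = {
    2: {"vital_type": "BP", "rate": "120/80"},
    1: {"vital_type": "BP", "rate": "140/90"},
}

for row in flatten_file("654", vitals):
    print(row)
# {'station': '654', 'ien': 1, 'vital_type': 'BP', 'rate': '140/90'}
# {'station': '654', 'ien': 2, 'vital_type': 'BP', 'rate': '120/80'}
```

The IEN is only unique within one site's file, so carrying the station alongside it is what keeps records from different medical centers distinguishable once they are pooled.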
Slide 96
CDW Governance
[Diagram labels: VHA Business Owners/SMEs; VHA-OI Data Quality; OI&T Corporate Data Warehouse; 10N, OIA, VBA]
CDW Governance Board
Communicates Organizational Priorities
Organizes SMEs and Data Stewards
Provides Documentation and Clarification of Business Logic
Sets and monitors domain, work priorities, and timelines for completion.
Slide 97
CDW Governance Is In VHA's Hands
Ordered By VHA
Domain and Work Prioritization By CDW Governance Board
Chair KLF (OIA)
Vice-Chair Larry Mole (Public Health SHG)
Monitored and Accountable To VHA
Project management provided by John Quinn (National Data Systems) and KLF (OIA)
Supported By VHA
OI Data Quality
Business Owners
PBM's Data Steward is Rob Silverman
Slide 98
As the number of eyes goes up, the number of bugs goes down.
Writing documentation about the business logic of the files and fields
Answering end user questions about the data
Data validation
Preferably before Inpatient Pharmacy
ADR/Allergy Package
1st category models are simple - V Health Factor Source Mapping
FMFile | FMField | ResolveFld | DWTableName | DWFieldName
V HEALTH FACTORS | HEALTH FACTOR | | HealthFactor | HealthFactorTypeIEN
V HEALTH FACTORS | HEALTH FACTOR | 0.01 | HealthFactor | HealthFactorType
V HEALTH FACTORS | PATIENT NAME | | HealthFactor | PatientIEN
V HEALTH FACTORS | EVENT DATE AND TIME | | HealthFactor | EventDateTime
V HEALTH FACTORS | VISIT | 0.01 | HealthFactor | VisitVistaDate
V HEALTH FACTORS | VISIT | 0.01 | HealthFactor | VisitDateTime
V HEALTH FACTORS | LEVEL/SEVERITY | | HealthFactor | LevelSeverity
V HEALTH FACTORS | VISIT | | HealthFactor | VisitIEN
V HEALTH FACTORS | ENCOUNTER PROVIDER | | HealthFactor | EncounterStaffIEN
V HEALTH FACTORS | COMMENTS | | HealthFactor | Comments
Slide 100
2nd category models require transformation - Prescription
[Diagram labels: Prescription and 1st fill; Refill; Prescription Only; All Fills; Partial Fill; FileMan; Data Warehouse]
Slide 101
3rd category models not usable without transformation - PCMM
Slide 102
Levels of Data
National - Corporate Data Warehouse (CDW)
Region - Regional Data Warehouse (RDW)
VISN - VISN Data Warehouse (VDW)
Medical Center - Local Data
Slide 103
Entities Who Produce Business Intelligence Products
National - VSSC, PSSG, DMDC, HEC, ARC, DSS, BIPL, OQP, PCS, PBM
Region - Regional BISL Teams
VISN - VISN Data Warehouse, VISN PBM
Local - DSS
Bolded are ones that have substantial resources in clinical business intelligence
PSSG handles much of the GIS and Statistical Demography for the VA
Slide 104
Data Access
VISN and Station Level - Contact Your VISN Database Manager
Regional/Corporate Access - Contact NDS for the 9957 Permissions
Slide 105
Operational Challenges of VistA
System Resources
$8 Billion investment over 20 years
New needs for new domains
MUMPS programmers must be internally trained (and many of them are retiring or dying)
Communication with Other Systems
HIMSS compliance with data interchange
E-functions (billing, prescribing, verification)
Interagency Cooperation
DoD and NHIN
Business Intelligence
Closing the data lifecycle and bringing back clinical data for knowledge discovery
Slide 106
Challenges CDW Faces
Finding personnel who are able and willing to help us define the data
PCMM
Giving analytic advice and documentation
What date should I use?
Where is this data?
Building Advanced Tier II products
Multifact table cubes
Syndromic Surveillance monitoring models with high dimensionality scoring
Slide 107
Acknowledgments
Kernel - Jack Schram (Oakland OIFO)
SQLI - Ellen Zufall (SF IRMS)
FileMan/History of Production System - Chuck Cobalis
RPC Broker and MUMPS coding - Perry Richmond (VISN 18 BI)
Regional Data Process - Vincent Bui and Ken Koenig (Region I SQL Back Office Team)
Slide 108
Acknowledgments
OI&T Business Intelligence Product Line (BISL)
Jack Bates - Manager, OI&T BIPL
Stephen Anderson - Lead Data Architect
Mike Baker - Lead ETL Architect
Denver Griffith/Ken Fuchsel - Server Administrators
Dave Fackler
Ron Talmage
Dan Hardan, Jeff King, Jeff Price
Slide 109
Questions
Slide 110
Further Information On The Background
For the VA Base M Training: http://vaww.vistau.med.va.gov/VistaU/MTraining/Default.htm
For the VA Programming Standards and Conventions: http://vista.med.va.gov/sacc/
For the VA Document Library: http://vista.med.va.gov/vdl/
Slide 111
Resources for Further Information
VA Information Resource Center (ViREC): http://www.virec.research.va.gov
National Patient Care Database (Internal): http://vaww.aac.va.gov/npcd/
National Data Systems (NDS) (Internal): http://vaww4.va.gov/NDS/DataAccess.asp
Slide 112
Reading for Fun
Official History: "VistA - U.S. Department of Veterans Affairs national-scale HIS." Steven H. Brown, Michael J. Lincoln, Peter J. Groen, Robert M. Kolodner. International Journal of Medical Informatics 69 (2003) 135-156
VistA Document Library (VDL): www4.va.gov/vdl