Data Warehouse Core Common Models:
Progress and Future Direction
Jim Tepin
Health and Human Services Data
Warehouse Redevelopment Project
HHS Data Warehouse Redevelopment Project
● Best Practices
  o Data Audit Trails; Common Tables; Physical Data Model Standards; Person Matching; Address Cleansing
● Common Standards
  o Physical Database Design & Security Role Standards
● High-Level Architecture
  o Including a Statewide Central “Lookup” Database
● Data Sharing / Central Views / Audit Compliance
  o Security Architecture Design
● Common Models – Address, Citizen, Events
Gartner: State of Michigan Data Warehouse Strategy
For internal use of State of Michigan only. © 2006 Gartner, Inc. and/or its affiliates. All rights reserved. Page 5
29 September 2006
DRAFT
Target State Infrastructure: DW Architecture (Cont’d)
Option 1 – Single Unified Data Warehouse
[Figure: Data Sources (Agency 1 to Agency N applications, OLTP applications, external sources) feed ETL tools into a Foundation Layer organized by subject (Subject A, B, C); an Optimization Layer of logical views and data marts serves End Users.]
● A single unified data warehouse for the BI needs of all participating Departments / Agencies
● Follows the best-practice hybrid model
● Nothing bypasses the Foundation Layer
● No Department / Agency versions of data or independent data marts are part of this option
Target State Infrastructure: DW Architecture (Cont’d)
Option 2 – Multiple Data Warehouses
[Figure: Data Sources (Agency 1 to Agency N applications, OLTP applications, external sources) feed Department / Agency ETL into separate Agency 1 to Agency N data warehouses; End Users see dimensional views of the Department / Agency data warehouses. Callout: each data warehouse includes common data that is acquired independently.]
● Shared data warehouse infrastructure for the Departments / Agencies that elect to use it; a slight variation of the status quo
● Data warehouses remain completely under the control of each Department / Agency
● Data sharing is achieved on a Department / Agency to Department / Agency basis
Target State Infrastructure: DW Architecture (Cont’d)
Option 3 – “Master” Data Warehouse
[Figure: Data Sources (Agency 1 and Agency N applications, Agency 2 sources, OLTP applications, external sources) feed ETL processes that load a Master Data Warehouse (Subject A, Subject B) and example Department / Agency data warehouses (Agency 1, 2, 3, N, N+1).]
● Master Data Warehouse contains a subset of common data identified as being widely useful
Target State Infrastructure: DW Architecture (Cont’d)
Option 3 – “Master” Data Warehouse
Strengths:
● Provides Department / Agency control
● For the defined subset of State-wide data, a single foundation data model supports consistent results (a single version of the truth)
● Provides for sharing of the most widely needed data
● Provides a moderate degree of reuse and leverage of the technology infrastructure and staff
● Potentially lower total cost of ownership than Option 2
Challenges:
● Deciding what should be included in the MDW is very challenging, and the answer will change over time, causing rework
● Provides NO WAY to guarantee consistent results across all Departments / Agencies, as there are no built-in controls to ensure the shared data source is used
● Adding new data types and relationships can be complex, costly and slow
● A centralized data warehouse team must be created to manage the Master Data Warehouse
● A 360-degree view of citizens and the resulting outcome analysis may be only partially supported
● Only limited consistency of results and measures across Departments / Agencies is achieved
● Substantial redundancy of technologies, tools, staff and data acquisition through duplicated effort
● Substantially larger total cost of ownership than Option 1
● Potential single point of failure
Common Address Model
HHS Common Address Prototype
Results:
• Total Records: 132.6 million
• Unique Raw Records: 34.2 million (74% reduction)
Lansing Subset:
• Total Records: 2.3 million
• Unique Raw Records: 575 thousand (75% reduction)
• Unique Cleansed Records: 158 thousand (93% reduction)
Addresses from multiple agencies (CSES, DCH, DHS, Judicial) were gathered, analyzed and cleansed.
The reductions above are based on record counts. A common model can also apply technical measures such as compression consistently to conserve disk space.
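Each reduction figure above is one minus the ratio of unique records to total records. The minimal Python sketch below reproduces that arithmetic from the prototype's counts and adds a toy illustration of why cleansing shrinks the unique count further than raw de-duplication. The `normalize` function is a hypothetical stand-in for postal (CASS) cleansing: it only folds case, whitespace and two suffix spellings, whereas the prototype used postal cleansing software.

```python
def reduction_pct(total: float, unique: float) -> int:
    """Percentage reduction from total records to unique records, rounded to a whole percent."""
    return round(100 * (1 - unique / total))

# Record counts from the HHS Common Address Prototype.
print(reduction_pct(132.6e6, 34.2e6))  # statewide, unique raw -> 74
print(reduction_pct(2.3e6, 575e3))     # Lansing, unique raw -> 75
print(reduction_pct(2.3e6, 158e3))     # Lansing, unique cleansed -> 93

# Toy illustration only: a hypothetical normalization, NOT real CASS cleansing.
SUFFIXES = {"STREET": "ST", "AVENUE": "AVE"}

def normalize(addr: str) -> str:
    """Upper-case, collapse whitespace, and abbreviate two street suffixes."""
    return " ".join(SUFFIXES.get(word, word) for word in addr.upper().split())

raw = ["123 Main Street", "123 Main Street", "123 MAIN ST", "45 Oak Avenue", "45 oak ave"]
unique_raw = set(raw)                          # exact-string de-duplication
unique_cleansed = {normalize(a) for a in raw}  # de-duplication after normalization
print(len(raw), len(unique_raw), len(unique_cleansed))  # 5 4 2
```

Raw de-duplication only removes byte-identical records; standardizing first lets differently typed versions of the same address collapse together, which is the effect the Lansing cleansed figure reflects.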
Common Address Model – Goals
● Data Architecture
  o Common location of both raw and cleansed addresses
  o Secure
  o Central/Common Orientation
● Process Architecture
  o Simple Integration
  o “Open”
  o Leverage Available Tools
● Compliance
  o HHS standards compliant
  o Audit compliant
Common Address Physical Model – P_SOM_COMMON
Cleansed_Address
  Key: Cleansed_Address_Id
  Columns: Street_Number, Street_Pre_Direction, Street_Name, Street_Suffix, Street_Post_Direction, Unit_Type, Unit_Number, Extra_Secondary_Address_Data, Extra_Private_Mailbox_Data, City, State_Abbreviation, Zip_Code, Zip_Code_4, Urbanization_Name, State_FIPS_Code, County_FIPS_Code, Foreign_Address_Indicator, CASS_Record_Type, Record_Type, Record_Type_Default_Indicator, Zip_Code_Type, Primary_USPS_Unit_Type, Geo_Match_Code, Geo_Lattitude, Geo_Longitude, Michigan_X_Coordinate, Michigan_Y_Coordinate, Geo_FIPS_County_Code, Geo_FIPS_City_Code, Geo_Census_Tract, Geo_CBSA_Number, Geo_MSA_Number, Congressional_District_Number, Postal_Facility_Type_Code, Zip_Code_Realignment_Code, Default_Match_Flag, Carrier_Route_Number, Carrier_Route_Sort_Zone, Line_Of_Travel_Number, Line_Of_Travel_Sortation, Delivery_Point_Check_Digit, Delivery_Point_Barcode_Addend, Record_Add_Timestamp, Record_Update_Timestamp
Cleansed_Address_Link
  Key: Address_Id, Cleansed_Address_Id
  Columns: Primary_Dual_Code, Status_Code, Error_Code, LACS_Indicator, City_Zip5_Match_Flag, City_Zip4_Match_Flag, Unsuitable_Delivery_Code, City_Place, Firm_Name, Address_Remainder, Extra_1_Data, Extra_1_Data_Attn_Co_Code, Extra_2_Data, Extra_2_Data_Attn_Co_Code, Extra_3_Data, Extra_3_Data_Attn_Co_Code, Extra_4_Data, Extra_4_Data_Attn_Co_Code, Record_Add_Timestamp, Record_Update_Timestamp
Cleansed_Address_Link_History
  Key: Address_Id, Cleansed_Address_Id, Record_Add_Timestamp
  Columns: Primary_Dual_Code, Status_Code, Error_Code, LACS_Indicator, City_Zip5_Match_Flag, City_Zip4_Match_Flag, Unsuitable_Delivery_Code, City_Place, Firm_Name, Address_Remainder, Extra_1_Data, Extra_1_Data_Attn_Co_Code, Extra_2_Data, Extra_2_Data_Attn_Co_Code, Extra_3_Data, Extra_3_Data_Attn_Co_Code, Extra_4_Data, Extra_4_Data_Attn_Co_Code, Record_Update_Timestamp, Record_Expire_Timestamp
Common_Address
  Key: Address_Id
  Columns: Address_Line1, Address_Line2, Address_Line3, Address_Line4, City, State, Zip_Code, Record_Add_Timestamp, Record_Update_Timestamp, Address_Cleansed_Date
Common_Address_Agency_Link
  Key: Address_Id, DW_Host_Code, Owner_Code, Data_Source_Code
  Columns: Record_Add_Timestamp, Record_Update_Timestamp
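Cleansed_Address_Link ties each raw Common_Address row to its cleansed address, and Cleansed_Address_Link_History (keyed additionally by Record_Add_Timestamp and carrying Record_Expire_Timestamp) suggests that superseded links are retained. The sketch below assumes that reading: when an address is re-cleansed to a different result, the current link row is copied to history with an expire timestamp and then replaced. It uses Python's built-in `sqlite3` with only a handful of each table's columns; the `relink` helper and the sample rows are hypothetical, not part of the documented model.

```python
import sqlite3

# Minimal subsets of the P_SOM_COMMON tables; most columns are omitted.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE Common_Address (
    Address_Id INTEGER PRIMARY KEY, Address_Line1 TEXT, City TEXT, State TEXT);
CREATE TABLE Cleansed_Address (
    Cleansed_Address_Id INTEGER PRIMARY KEY, Street_Number TEXT,
    Street_Name TEXT, Street_Suffix TEXT, City TEXT, Zip_Code TEXT);
CREATE TABLE Cleansed_Address_Link (
    Address_Id INTEGER, Cleansed_Address_Id INTEGER, Record_Add_Timestamp TEXT,
    PRIMARY KEY (Address_Id, Cleansed_Address_Id));
CREATE TABLE Cleansed_Address_Link_History (
    Address_Id INTEGER, Cleansed_Address_Id INTEGER, Record_Add_Timestamp TEXT,
    Record_Expire_Timestamp TEXT,
    PRIMARY KEY (Address_Id, Cleansed_Address_Id, Record_Add_Timestamp));
""")

def relink(con, address_id, new_cleansed_id, ts):
    """Hypothetical helper (assumed semantics): archive the current link(s), then insert the new one."""
    with con:  # commit the three statements together, or roll back on error
        con.execute(
            "INSERT INTO Cleansed_Address_Link_History "
            "SELECT Address_Id, Cleansed_Address_Id, Record_Add_Timestamp, ? "
            "FROM Cleansed_Address_Link WHERE Address_Id = ?", (ts, address_id))
        con.execute("DELETE FROM Cleansed_Address_Link WHERE Address_Id = ?", (address_id,))
        con.execute("INSERT INTO Cleansed_Address_Link VALUES (?, ?, ?)",
                    (address_id, new_cleansed_id, ts))

# Illustrative data: one raw address cleansed twice with different results.
con.execute("INSERT INTO Common_Address VALUES (1, '123 Main Street', 'Lansing', 'MI')")
con.executemany("INSERT INTO Cleansed_Address VALUES (?, ?, ?, ?, ?, ?)",
                [(10, "123", "MAIN", "ST", "LANSING", "48933"),
                 (11, "123", "MAIN", "ST", "LANSING", "48906")])
relink(con, 1, 10, "2007-01-01 00:00:00")
relink(con, 1, 11, "2007-06-01 00:00:00")

print(con.execute("SELECT Cleansed_Address_Id FROM Cleansed_Address_Link").fetchall())
# [(11,)]
print(con.execute("SELECT Cleansed_Address_Id, Record_Expire_Timestamp "
                  "FROM Cleansed_Address_Link_History").fetchall())
# [(10, '2007-06-01 00:00:00')]
```

Keeping the raw address, its current cleansed form and the prior links in separate tables lets many agency records share one cleansed address while preserving an audit trail of every re-cleansing.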
Common Address Physical Model – P_SOM_LOOKUP
ACE_CASS_Record_Type_Info
  Key: CASS_Record_Type
  Columns: CASS_Record_Type_Description
ACE_Error_Code_Info
  Key: Error_Code
  Columns: Error_Code_Description
ACE_LACS_Indicator_Info
  Key: LACS_Indicator
  Columns: LACS_Indicator_Description
ACE_Record_Type_Info
  Key: Record_Type
  Columns: Record_Type_Description
ACE_Status_Code_Info
  Key: Status_Code
  Columns: ACE_Truncation_Note, ACE_City_State_Zip_Assn_Note, ACE_Street_Dir_Suffix_Note, ACE_County_CART_Unit_Note, ACE_Lot_Urb_Note
FIPS_County_Class_Info
  Key: FIPS_Class_Code
  Columns: FIPS_Class_Description
FIPS_County_Info
  Key: FIPS_State_Code, FIPS_County_Code
  Columns: County_Name, FIPS_Class_Code
FIPS_Sovereignty_Status_Info
  Key: Sovereignty_Status_Code
  Columns: Sovereignty_Status_Description
FIPS_State_Territory_Info
  Key: FIPS_State_Code
  Columns: State_Territory_Name, State_Territory_Alpha_Code, Territory_Type_Code, Sovereignty_Status_Code
FIPS_Territory_Type_Info
  Key: Territory_Type_Code
  Columns: Territory_Type_Description
P_SOM_Lookup Database
● Common Area for System Codes & Values
● Common Area for Federal Standards Codes (FIPS, NAICS, etc.)
● Great Starting Point for Enterprise DW
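Because cleansed addresses carry coded values (State_FIPS_Code, County_FIPS_Code) while their descriptions live once in the lookup database, reporting queries decode them with joins across the two databases. The sketch below illustrates that pattern with Python's `sqlite3`, using `ATTACH DATABASE` so that an in-memory database stands in for P_SOM_LOOKUP next to the main one standing in for P_SOM_COMMON. Only a few columns are kept; the single sample row (Michigan, FIPS 26, and Ingham County, FIPS 065) is illustrative.

```python
import sqlite3

# main stands in for P_SOM_COMMON; the attached database stands in for P_SOM_LOOKUP.
con = sqlite3.connect(":memory:")
con.execute("ATTACH DATABASE ':memory:' AS P_SOM_LOOKUP")
con.executescript("""
CREATE TABLE P_SOM_LOOKUP.FIPS_State_Territory_Info (
    FIPS_State_Code TEXT PRIMARY KEY, State_Territory_Name TEXT,
    State_Territory_Alpha_Code TEXT);
CREATE TABLE P_SOM_LOOKUP.FIPS_County_Info (
    FIPS_State_Code TEXT, FIPS_County_Code TEXT, County_Name TEXT,
    PRIMARY KEY (FIPS_State_Code, FIPS_County_Code));
CREATE TABLE Cleansed_Address (
    Cleansed_Address_Id INTEGER PRIMARY KEY,
    State_FIPS_Code TEXT, County_FIPS_Code TEXT);
INSERT INTO P_SOM_LOOKUP.FIPS_State_Territory_Info VALUES ('26', 'Michigan', 'MI');
INSERT INTO P_SOM_LOOKUP.FIPS_County_Info VALUES ('26', '065', 'Ingham County');
INSERT INTO Cleansed_Address VALUES (1, '26', '065');
""")

# Decode the stored codes; LEFT JOIN keeps addresses whose codes are missing from the lookup.
rows = con.execute("""
    SELECT a.Cleansed_Address_Id, s.State_Territory_Alpha_Code, c.County_Name
    FROM Cleansed_Address AS a
    LEFT JOIN P_SOM_LOOKUP.FIPS_State_Territory_Info AS s
           ON s.FIPS_State_Code = a.State_FIPS_Code
    LEFT JOIN P_SOM_LOOKUP.FIPS_County_Info AS c
           ON c.FIPS_State_Code = a.State_FIPS_Code
          AND c.FIPS_County_Code = a.County_FIPS_Code
""").fetchall()
print(rows)  # [(1, 'MI', 'Ingham County')]
```

The codes are stored as text so that leading zeros (as in county 065) survive; county codes are only unique within a state, which is why the county join uses both columns.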
Common Address Physical Model – P_SOM_Control
SOM_Common_Proc_Load_Control
  Key: DW_Host_Code, Common_Process_Id, Common_Process_Phase, Host_Process_Id, Record_Add_Timestamp
  Columns: Load_Status, Record_Update_Timestamp, Attempted_Seconds
SOM_Common_Process
  Key: DW_Host_Code, Common_Process_Id, Host_Process_Id
  Columns: Expiration_Timestamp, Effective_Timestamp, Source_Database_Name, Source_Table_Name, Owner_Code, Data_Source_Code, Notes, Row_Add_User
SOM_Common_Process_Columns
  Key: COMMON_PROCESS_ID
  Columns: DATABASE_NAME, TABLE_NAME, COLUMN_NAME, EXPIRATION_TIMESTAMP, EFFECTIVE_TIMESTAMP, FIELD_ID, COLUMN_DEFINITION, UPI_FLAG, NUPI_FLAG, USI_FLAG, NUSI_FLAG, NOTES, ROW_ADD_USER
SOM_Common_Process_Email
  Key: DW_Host_Code, Common_Process_Id, Host_Process_Id, User_Kind, User_Id, Email_Address
  Columns: Expiration_Timestamp, Effective_Timestamp, Email_On_Action, Notes, Row_Add_User
SOM_Common_Process_Field_Map
  Key: DW_HOST_CODE, COMMON_PROCESS_ID, HOST_PROCESS_ID, DATABASE_NAME, TABLE_NAME, COLUMN_NAME, MAP_SEQUENCE
  Columns: EXPIRATION_TIMESTAMP, EFFECTIVE_TIMESTAMP, SOURCE_DATABASE_NAME, SOURCE_TABLE_NAME, SOURCE_COLUMN_NAME, COLUMN_SELECT_TEXT, SQL_TRIM_FLAG, NOTES, ROW_ADD_USER
SOM_Common_Process_Job_Audit
  Key: System_Job_Name, Job_Name, Step_Name, Step_Start_Timestamp
  Columns: Action_Start_Timestamp, Action_End_Timestamp, Batch_Id, Audit_Tag, Action_Duration_Seconds, Records_Input, Records_Output, Records_Inserted, Records_Updated, Records_Deleted, Records_Processed, Records_Not_Processed, Error_Records, Error_Records2, Record_Add_Timestamp
SOM_Common_Process_Job_Detail
  Key: System_Job_Name, Job_Name, Step_Name, Step_Start_Timestamp
  Columns: Batch_Id, Job_Start_Timestamp, Step_End_Timestamp, Step_Type_Description, Step_Sequence, Step_Run_Indicator, Step_Fail_Indicator, Step_Return_Code, Step_Fail_Action_Code, Unix_Process_Id, Record_Add_Timestamp
SOM_Common_Process_Table_Key
  Key: Source_Database_Name, Source_Table_Name
  Columns: Source_Column_Name, Expiration_Timestamp, Effective_Timestamp, Notes, Row_Add_User, SQL_Null_Default
SOM_Common_Process_User
  Key: DW_Host_Code, Common_Process_Id, Host_Process_Id, User_Kind, User_Id
  Columns: Expiration_Timestamp, Effective_Timestamp, Notes, Row_Add_User, Retry_Seconds
SOM_DW_Data_Source
  Key: DW_Host_Code, Owner_Code, Data_Source_Code
  Columns: Data_Source_Description, Row_Add_User
SOM_DW_Host_Agency
  Key: DW_Host_Code
  Columns: Expiration_Timestamp, Effective_Timestamp, Agency_Description, Row_Add_User
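SOM_Common_Process_Field_Map suggests a metadata-driven load: for each target column, ordered by MAP_SEQUENCE, it names a source column, an optional COLUMN_SELECT_TEXT expression and a SQL_TRIM_FLAG, all bounded by effective and expiration timestamps. The sketch below assumes that interpretation and builds an `INSERT ... SELECT` statement from the map rows in effect at a given time. The `FieldMap` class, the `build_load_sql` function, the 'Y'/'N' flag values, the half-open effective window and the single source table per process are all assumptions for illustration; the actual load process may work differently.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional

@dataclass
class FieldMap:
    """Hypothetical in-memory form of one SOM_Common_Process_Field_Map row
    (process key columns omitted)."""
    database_name: str
    table_name: str
    column_name: str
    map_sequence: int
    effective_timestamp: datetime
    expiration_timestamp: datetime
    source_database_name: str
    source_table_name: str
    source_column_name: str
    column_select_text: Optional[str] = None  # assumed: expression overriding the source column
    sql_trim_flag: str = "N"                  # assumed 'Y'/'N' values

def build_load_sql(rows: List[FieldMap], as_of: datetime) -> str:
    """Build INSERT ... SELECT from the map rows in effect at `as_of`.

    Assumes a half-open effective window and one source table per process.
    """
    active = sorted(
        (r for r in rows if r.effective_timestamp <= as_of < r.expiration_timestamp),
        key=lambda r: r.map_sequence,
    )
    if not active:
        raise ValueError("no field-map rows in effect")
    first = active[0]
    targets, exprs = [], []
    for r in active:
        expr = r.column_select_text or r.source_column_name
        if r.sql_trim_flag == "Y":
            expr = f"TRIM({expr})"
        targets.append(r.column_name)
        exprs.append(expr)
    return (f"INSERT INTO {first.database_name}.{first.table_name} ({', '.join(targets)})\n"
            f"SELECT {', '.join(exprs)}\n"
            f"FROM {first.source_database_name}.{first.source_table_name}")

# Illustrative rows; AGENCY_DB, CLIENT_ADDR, ADDR1 and CITY_NM are hypothetical names.
START, HIGH_DATE = datetime(2006, 1, 1), datetime(9999, 12, 31)
rows = [
    FieldMap("P_SOM_COMMON", "Common_Address", "City", 2, START, HIGH_DATE,
             "AGENCY_DB", "CLIENT_ADDR", "CITY_NM", "UPPER(CITY_NM)"),
    FieldMap("P_SOM_COMMON", "Common_Address", "Address_Line1", 1, START, HIGH_DATE,
             "AGENCY_DB", "CLIENT_ADDR", "ADDR1", sql_trim_flag="Y"),
]
print(build_load_sql(rows, datetime(2007, 1, 1)))
# INSERT INTO P_SOM_COMMON.Common_Address (Address_Line1, City)
# SELECT TRIM(ADDR1), UPPER(CITY_NM)
# FROM AGENCY_DB.CLIENT_ADDR
```

Driving the column list from effective-dated metadata means a new agency feed, or a change to how one column is derived, can be handled by adding or expiring map rows rather than editing load code.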
Common Address Demonstration
Common Address – Internal Processing
[Flowchart. Phase I: Start; check Phase I contention in Common Load Control (if contention, read settings, then pause or abort); otherwise load raw data through a staging table into Common (Raw) Address maintenance, which assigns Address Ids and records the address/agency relation in Common Raw Addresses. Common Process Agency Settings determine whether Option II (Phase II) runs; if not, Common Load Control marks the process complete and it ends. Phase II: check Phase II contention (if contention, read settings, then pause or abort); extract addresses to cleanse as raw address data (NAS); postal cleansing (Windows) produces cleansed address data (NAS); load staging (Unix) into Cleansed Address Staging; populate Cleansed Address and Cleansed Address Link, repeating for Dual/Canada; Common Load Control marks Phase II complete.]
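The contention checks in the flowchart can be read as: before a phase runs, consult Common Load Control; if another run of that phase is still in progress, read the process settings and either pause and retry or abort. The minimal in-memory sketch below follows that reading. The `LoadControl` class, the `run_phase` function, and the use of a retry interval plus a total wait budget (loosely suggested by the Retry_Seconds and Attempted_Seconds columns) are assumptions; `sleep` is injected so the logic can be exercised without actually waiting.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Set

@dataclass
class LoadControl:
    """Hypothetical stand-in for SOM_Common_Proc_Load_Control: phases currently in progress."""
    in_progress: Set[str] = field(default_factory=set)

def run_phase(ctl: LoadControl, phase: str, work: Callable[[], None],
              retry_seconds: int, max_wait_seconds: int,
              sleep: Callable[[float], None] = time.sleep) -> str:
    """Run `work` for `phase` unless contention outlasts the (assumed) wait budget."""
    waited = 0
    # Contention check: another run of this phase is still marked in progress.
    while phase in ctl.in_progress:
        if waited + retry_seconds > max_wait_seconds:
            return "aborted"            # wait budget exhausted: abort
        sleep(retry_seconds)            # pause, then check again
        waited += retry_seconds
    ctl.in_progress.add(phase)          # claim the phase
    try:
        work()
    finally:
        ctl.in_progress.discard(phase)  # "Mark Complete" (also released if work fails)
    return "complete"

# Contention never clears: three 10-second pauses, then abort.
busy = LoadControl({"PHASE_I"})
pauses = []
result_busy = run_phase(busy, "PHASE_I", lambda: None, 10, 30, sleep=pauses.append)
print(result_busy, pauses)  # aborted [10, 10, 10]

# The other run finishes during the first pause, so this run proceeds.
shared = LoadControl({"PHASE_II"})
def other_run_finishes(seconds: float) -> None:
    shared.in_progress.discard("PHASE_II")
result_free = run_phase(shared, "PHASE_II", lambda: None, 10, 30, sleep=other_run_finishes)
print(result_free)  # complete
```

In a real warehouse the check-and-claim step would need to be atomic (for example, a single conditional insert into the load control table inside a transaction) so that two hosts cannot both see "no contention" and start the same phase.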
Common Address – On the Horizon
● Integration with Common Citizen
● Security Mechanisms
● IQ8 – Delivery Point Validation
Common Citizen Model
Common Citizen – Physical Model
Common_Demographic
  Key: CPD_Id
  Columns: Social_Security_Number, Date_Of_Birth, First_Name, Middle_Name, Last_Name, Name_Suffix, Name_Title, Record_Add_Timestamp
Common_Demographic_NYSIIS
  Key: CPD_Id
  Columns: First_Name_NYSIIS, Middle_Name_NYSIIS, First_Middle_NYSIIS, Middle_First_NYSIIS, Last_Name_NYSIIS, Last_Name_Part1, Last_Name_Part2, Last_Name_Part1_NYSIIS, Last_Name_Part2_NYSIIS, Last_Name_P2_P1_NYSIIS
Person
  Key: Person_Id, DW_Host_Code, Owner_Code, Data_Source_Code
  Columns: Primary_CPD_Id, MDOS_Id, Gender, Ethnicity_Code, Date_Of_Death, Death_Verification_Code, US_Residency_Status_Code, Migrant_Flag, Height, Weight, Eye_Color, Hair_Color, Identifying_Marks, Education_Level_Code, English_Literacy_Code, Primary_Language_Code, Religion_Code, Unique_Citizen_Identifier, Veteran_Indicator, Multiple_Race_Flag, Source_Effective_Date, Record_Add_Timestamp, Record_Update_Timestamp, US_Citizen_Status_Code
Person_Demographic_Link
  Key: Person_Id, DW_Host_Code, Owner_Code, Data_Source_Code, CPD_Id
  Columns: Primary_Alias_Code, SSN_Verification_Code, Birth_Date_Verification_Code, Source_Effective_Date, Record_Update_Timestamp, Record_Add_Timestamp
PERSON_MATCH
  Key: Person_Id, DW_Host_Code, Owner_Code, Data_Source_Code, CPD_Id, Matched_Person_Id, Matched_DW_Host_Code, Matched_Owner_Code, Matched_Data_Source_Code, Matched_CPD_Id
  Columns: Primary_Alias_Code, Matched_Primary_Alias_Code, SSN_Matched_Digits, Last_Name_Match, Last_Name4_Match, First_Name_Match, First_Initial_Match, Name_Suffix_Match, DOB_Year_Match, DOB_Month_Match, DOB_Day_Match, DOB_Null_Indicator, DOB_Month_01_Indicator, DOB_Day_01_Indicator, Matched_DOB_Null_Indicator, Matched_DOB_Month_01_Indicator, Matched_DOB_Day_01_Indicator, Cross_Ref_Name_Match, Last_Name_NYSIIS_Match, First_Name_NYSIIS_Match, Cross_Ref_Partial_Name_Match, First_Name_2_3_Char_Match, Name_In_Name_Match, Cross_Ref_Name_In_Name_Match, Cross_Ref_DOB_Match, Gender_Match, Gender_Null_Indicator, Matched_Gender_Null_Indicator, Drivers_License_Match, Source_Matched_Indicator, Middle_Name_Mismatch, First_Name_Blank_Indicator, Last_Name_Blank_Indicator, Record_Add_Timestamp, Record_Update_Timestamp
Person_Race
  Key: Person_Id, DW_Host_Code, Owner_Code, Data_Source_Code, Race_Code
  Columns: Primary_Race_Flag, Primary_Provided_Code, Record_Add_Timestamp
Person_Worker
  Key: Person_Id, DW_Host_Code, Owner_Code, Data_Source_Code, Worker_Id
  Columns: Record_Add_Timestamp, Relation_Description, Record_Expire_Date
Primary_Language_Codes
  Key: Primary_Language_Code
  Columns: Primary_Language_Description, Record_Add_Timestamp
Religion_Codes
  Key: Religion_Code
  Columns: Religion_Description, Record_Add_Timestamp
US_Census_Ethnicity_Codes
  Key: Ethnicity_Code
  Columns: Code_Check_Digit, Ethnicity_Concept_Level1, Ethnicity_Concept_Level2, Ethnicity_Concept_Level3, Ethnicity_Concept_Level4, Ethnicity_Description, Ethnicity_Synonym, Date_Added_to_Version
US_Census_Race_Codes
  Key: Race_Code
  Columns: Code_Check_Digit, Race_Concept_Level1, Race_Concept_Level2, Race_Concept_Level3, Race_Concept_Level4, Race_Description, Race_Synonym, Date_Added_to_Version
US_Citizen_Status_Codes
  Key: US_Citizen_Status_Code
  Columns: US_Citizen_Description, Record_Add_Timestamp
US_Residency_Status_Codes
  Key: US_Residency_Status_Code, US_Citizen_Status_Code
  Columns: US_Residency_Description, Record_Add_Timestamp
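PERSON_MATCH stores, for each pair of person records, which individual attributes agree (SSN digits, name parts, date-of-birth parts, gender and so on) rather than a single yes/no verdict, so match rules can be tuned later without recomparing records. The minimal Python sketch below computes a few of those comparison columns for one pair. The definitions are assumptions inferred from the column names (for example, that SSN_Matched_Digits counts positions whose digits agree and Last_Name4_Match compares the first four letters); the model does not state them, and the NYSIIS and cross-reference columns are omitted. `PersonRec` and `match_flags` are hypothetical names.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

@dataclass
class PersonRec:
    """Hypothetical subset of Common_Demographic / Person columns."""
    ssn: Optional[str]
    first_name: str
    last_name: str
    dob: Optional[date]
    gender: Optional[str]

def match_flags(a: PersonRec, b: PersonRec) -> dict:
    """Compute a few PERSON_MATCH-style comparison columns (definitions assumed)."""
    def same(x, y) -> int:
        # 1 only when both values are present and equal.
        return int(x is not None and y is not None and x == y)

    fa, fb = a.first_name.upper(), b.first_name.upper()
    la, lb = a.last_name.upper(), b.last_name.upper()
    return {
        "SSN_Matched_Digits": sum(1 for x, y in zip(a.ssn or "", b.ssn or "") if x == y),
        "Last_Name_Match": same(la, lb),
        "Last_Name4_Match": same(la[:4], lb[:4]),
        "First_Name_Match": same(fa, fb),
        "First_Initial_Match": same(fa[:1], fb[:1]),
        "DOB_Year_Match": same(a.dob and a.dob.year, b.dob and b.dob.year),
        "DOB_Month_Match": same(a.dob and a.dob.month, b.dob and b.dob.month),
        "DOB_Day_Match": same(a.dob and a.dob.day, b.dob and b.dob.day),
        "DOB_Null_Indicator": int(a.dob is None),
        "Gender_Match": same(a.gender, b.gender),
    }

# Illustrative pair (SSNs from the range reserved for advertising use).
a = PersonRec("987654320", "Jon", "Smithson", date(1970, 5, 1), "M")
b = PersonRec("987654329", "JONATHAN", "SMITH", date(1970, 5, 1), "M")
flags = match_flags(a, b)
print(flags["SSN_Matched_Digits"], flags["Last_Name4_Match"], flags["First_Name_Match"])  # 8 1 0
```

Storing the individual agreements lets a later scoring step weigh them, for instance treating eight matching SSN digits plus a full date-of-birth match as strong evidence even though the full first and last names differ.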
Citizen Events
Overview:
● Merged View of Various “Events”
● Very Extensible (e.g. date of birth)
● Events tend to be the relationship of a person, an organization and a time element
● Can be “one-time” or over a duration
Intent:
● Micro-analysis
● Macro-analysis
Citizen Events Model
Person_Match
  Key: Person_Id, DW_Host_Code, Owner_Code, Data_Source_Code, CPD_Id, Matched_Person_Id, Matched_DW_Host_Code, Matched_Owner_Code, Matched_Data_Source_Code, Matched_CPD_Id
  Columns: Primary_Alias_Code, Matched_Primary_Alias_Code (additional columns not shown)
Event_Organization
  Key: Organization_Id
  Columns: Organization_Description, Federal_Tax_Id
Person
  Key: Person_Id, DW_Host_Code, Owner_Code, Data_Source_Code
  Columns: US_Citizen_Status_Code, Primary_CPD_Id, MDOS_Id, Gender, Ethnicity_Code, Date_Of_Death, Death_Verification_Code, US_Residency_Status_Code, Migrant_Flag, Height, Weight, Eye_Color, Hair_Color, Identifying_Marks, Education_Level_Code, English_Literacy_Code, Primary_Language_Code, Religion_Code, Source_Effective_Date, Unique_Citizen_Identifier, Veteran_Indicator, Multiple_Race_Flag, Record_Add_Timestamp, Record_Update_Timestamp
Person_Event
  Key: Person_Id, DW_Host_Code, Owner_Code, Data_Source_Code, Organization_Id, Event_Type_Code
  Columns: Begin_Date, End_Date, End_Date_Scheduled, Completion_Code, Supplemental_Character_Data, Supplemental_Numeric_Data, UOM_Count, UOM_Count2, UOM_Amount, UOM_Amount2, UOM_Amount3, Date1, Date2, Date3, Address_Id, Record_Add_Timestamp, Record_Update_Timestamp
Event_Completion_Codes
  Key: Event_Type_Code, Completion_Code
  Columns: Completion_Description, Source_Completion_Code, Record_Add_Timestamp
Event_Type_Info
  Key: Event_Type_Code
  Columns: Event_Description, Supplemental_Character_Desc, Supplemental_Numeric_Desc, UOM_Frequency_Code, UOM_Count_Description, UOM_Count_Description2, UOM_Amount_Description, UOM_Amount_Description2, UOM_Amount_Description3, Date1_Description, Date2_Description, Date3_Description, Record_Add_Timestamp
Event_UOM_Frequency_Codes
  Key: UOM_Frequency_Code
  Columns: UOM_Frequency_Description, Annual_Multiplier, Monthly_Multiplier, Weekly_Multiplier
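Event_Type_Info ties each event type to a UOM_Frequency_Code, and Event_UOM_Frequency_Codes carries Annual_, Monthly_ and Weekly_Multiplier columns, which suggests that amounts are stored at their native frequency and converted when queried. The sketch below assumes each multiplier converts one native-period amount into the target period. The frequency codes 'Q' and 'M', their multiplier values, and the `normalize_amount` function are hypothetical examples, not values taken from the model.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class UomFrequency:
    """Hypothetical row of Event_UOM_Frequency_Codes."""
    uom_frequency_code: str
    uom_frequency_description: str
    annual_multiplier: float
    monthly_multiplier: float
    weekly_multiplier: float

# Illustrative rows only; real codes and multipliers would come from the lookup table.
FREQUENCIES = {
    "Q": UomFrequency("Q", "Quarterly", 4.0, 1 / 3, 4 / 52),
    "M": UomFrequency("M", "Monthly", 12.0, 1.0, 12 / 52),
}

def normalize_amount(amount: float, frequency_code: str, target: str = "annual") -> float:
    """Convert a UOM_Amount from its native frequency to `target`, rounded to cents."""
    f = FREQUENCIES[frequency_code]
    multiplier = {
        "annual": f.annual_multiplier,
        "monthly": f.monthly_multiplier,
        "weekly": f.weekly_multiplier,
    }[target]
    return round(amount * multiplier, 2)

# A quarterly amount expressed annually and monthly.
print(normalize_amount(948.17, "Q"))             # 3792.68
print(normalize_amount(948.17, "Q", "monthly"))  # 316.06
```

Keeping the multipliers in a lookup table rather than in report code lets macro-analysis compare events reported at different frequencies on one scale.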
Citizen Event Sample
Event Description | Begin Date | End Date | Completion Description | UOM Amount Description | UOM Amount | UOM Amount Description 2 | UOM Amount 2
Quarterly Wage Record | 2001-07-01 | 2004-03-31 | Reported Earnings | Wages Paid | 85.75 | ? | ?
Quarterly Wage Record | 2002-01-01 | 2002-09-30 | Reported Earnings | Wages Paid | 284.93 | ? | ?
SSI Rate Change | 2005-03-30 | 2005-06-10 | Application for SSI is pending | Gross Payment | 0 | Current Payment | 0
Quarterly Wage Record | 2005-04-01 | 2005-09-30 | Reported Earnings | Wages Paid | 948.17 | ? | ?
SSI Rate Change | 2005-06-10 | 2006-05-20 | SSI application denied | Gross Payment | 0 | Current Payment | 0
RSDI Rate Change | 2005-11-03 | 9999-12-31 | Disallowed claim | Gross Payment | 0 | Net Payment | 0
SSI Rate Change | 2006-05-20 | 9999-12-31 | Closure of SSI record | Gross Payment | 0 | Current Payment | 0
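The sample records each event as an interval, uses 9999-12-31 as an open-ended "high date", and chains successive SSI Rate Change rows so that one End_Date equals the next Begin_Date. The small Python sketch below runs a point-in-time (micro-analysis) query over those sample rows. Treating each interval as half-open, [Begin_Date, End_Date), so that a boundary date belongs to the later row, is an assumption rather than something the model specifies; `Event` and `active_on` are hypothetical names.

```python
from datetime import date
from typing import List, NamedTuple

class Event(NamedTuple):
    description: str
    begin: date
    end: date          # 9999-12-31 marks an open-ended event
    completion: str

EVENTS = [  # rows from the Citizen Event Sample (amount columns omitted)
    Event("Quarterly Wage Record", date(2001, 7, 1), date(2004, 3, 31), "Reported Earnings"),
    Event("Quarterly Wage Record", date(2002, 1, 1), date(2002, 9, 30), "Reported Earnings"),
    Event("SSI Rate Change", date(2005, 3, 30), date(2005, 6, 10), "Application for SSI is pending"),
    Event("Quarterly Wage Record", date(2005, 4, 1), date(2005, 9, 30), "Reported Earnings"),
    Event("SSI Rate Change", date(2005, 6, 10), date(2006, 5, 20), "SSI application denied"),
    Event("RSDI Rate Change", date(2005, 11, 3), date(9999, 12, 31), "Disallowed claim"),
    Event("SSI Rate Change", date(2006, 5, 20), date(9999, 12, 31), "Closure of SSI record"),
]

def active_on(events: List[Event], as_of: date) -> List[Event]:
    """Events in effect on `as_of`, treating each interval as half-open [begin, end)."""
    return [e for e in events if e.begin <= as_of < e.end]

# On the SSI boundary date only the later SSI row applies.
for e in active_on(EVENTS, date(2005, 6, 10)):
    print(e.description, "-", e.completion)
# Quarterly Wage Record - Reported Earnings
# SSI Rate Change - SSI application denied
```

With the half-open convention, chained rows never overlap, so a point-in-time query returns at most one SSI status for any date; an inclusive end date would return both the pending and the denied row on 2005-06-10.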
Citizen Events – On the Horizon
● Prove the concept
● Integrate with Common Address
● Establish Security Architecture
● Business Intelligence Competency Center