Data Warehouse Core Common Models: Progress and Future Direction Jim Tepin

Page 1

Data Warehouse Core Common Models: Progress and Future Direction

Jim Tepin

Page 2

Health and Human Services Data Warehouse Redevelopment Project

Page 3

HHS Data Warehouse Redevelopment Project

● Best Practices
  o Data Audit Trails; Common Tables; Physical Data Model Standards; Person Matching; Address Cleansing
● Common Standards
  o Physical Database Design & Security Role Standards
● High-Level Architecture
  o (including) Statewide Central “Lookup” Database
● Data Sharing / Central Views / Audit Compliance
  o Security Architecture Design
● Common Models – Address, Citizen, Events

Page 4

Gartner State of Michigan Data Warehouse Strategy

Page 5

For internal use of State of Michigan only. © 2006 Gartner, Inc. and/or its affiliates. All rights reserved. Page 5

29 September 2006

DRAFT

Target State Infrastructure: DW Architecture (Cont’d)

Option 1 – Single Unified Data Warehouse

[Diagram: data sources (Agency 1 to Agency N applications, OLTP applications, external sources) feed ETL tools that load a single Foundation Layer organized by subject (Subject A, B, C); an Optimization Layer of logical views and data marts serves end users.]

● A single unified data warehouse serves the BI needs of all participating Departments / Agencies
● Follows the best-practice hybrid model
● Nothing bypasses the Foundation Layer
● No Department / Agency versions of data or independent data marts are part of this option

Page 6


Target State Infrastructure: DW Architecture (Cont’d)

Option 2 – Multiple Data Warehouses

[Diagram: data sources (Agency 1 to Agency N applications, OLTP applications, external sources) feed separate Department / Agency ETL processes into independent Agency 1 to Agency N data warehouses, each offering dimensional views to its own end users. Callout: each data warehouse includes common data that is acquired independently.]

● Shared data warehouse infrastructure for those who elect to use it; a slight variation of the status quo
● Data warehouses remain completely under the control of each Department / Agency
● Data sharing is achieved on a Department / Agency to Department / Agency basis

Page 7


Target State Infrastructure: DW Architecture (Cont’d)

Option 3 – “Master” Data Warehouse

[Diagram: data sources (Agency 1 and Agency N applications, Agency 2 sources, OLTP applications, external sources) feed ETL processes into a Master Data Warehouse (Subject A, Subject B) alongside example Department / Agency data warehouses (Agency 1 to Agency N+1).]

● The Master Data Warehouse contains a subset of common data identified as being widely useful

Page 8


Target State Infrastructure: DW Architecture (Cont’d)

Option 3 – “Master” Data Warehouse

Strengths:
● Provides Department / Agency control
● For the defined subset of State-wide data, a single foundation data model supports consistent results (a single version of the truth)
● Provides for sharing of the most widely needed data
● Provides a moderate degree of reuse and leverage of the technology infrastructure and staff
● Potentially lower total cost of ownership than Option 2

Challenges:
● Deciding what should be included in the MDW is very challenging, AND this will change over time, causing rework
● Provides NO WAY to guarantee consistent results across all Departments / Agencies, as there are no built-in controls to ensure the shared data source is used
● Adding data types and relationships can be complex, costly and slow
● A centralized data warehouse team must be created to manage the Master Data Warehouse
● A 360-degree view of citizens and the resulting outcome analysis may be only partially supported
● Only limited consistency of results and measures across Departments / Agencies is achieved
● Substantial redundancy of technologies, tools, staff and data acquisition through duplicated effort
● Substantially larger total cost of ownership than Option 1
● Potential single point of failure

Page 9


Common Address Model

Page 10

HHS Common Address Prototype

Results:
• Total Records: 132.6 million
• Unique Raw Records: 34.2 million (74% reduction)

Lansing Subset:
• Total Records: 2.3 million
• Unique Raw Records: 575 thousand (75% reduction)
• Unique Cleansed Records: 158 thousand (93% reduction)

Addresses across agencies (CSES, DCH, DHS, Judicial) were gathered, analyzed and cleansed.

The reductions above are based on record counts. A common model can also apply technical means such as compression consistently to conserve disk space.
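Each reduction figure is one minus the ratio of unique to total records. The sketch below reproduces the slide's percentages and contrasts exact-match ("raw") deduplication with deduplication after normalization. The `normalize` rule (uppercase, collapse whitespace) is a hypothetical stand-in for illustration only; it is not the postal cleansing the project used, which standardizes far more.

```python
from typing import Iterable


def reduction_pct(total: float, unique: float) -> float:
    """Percentage of records eliminated when `total` records collapse to `unique`."""
    return 100.0 * (1.0 - unique / total)


def normalize(raw: str) -> str:
    # Hypothetical normalization: uppercase and collapse runs of whitespace.
    return " ".join(raw.upper().split())


def count_unique(addresses: Iterable[str], cleanse: bool = False) -> int:
    """Count distinct addresses, either exactly as entered or after normalizing."""
    return len({normalize(a) if cleanse else a for a in addresses})


# Record counts from the slide, in thousands.
print(round(reduction_pct(132_600, 34_200)))  # statewide, raw -> 74
print(round(reduction_pct(2_300, 575)))       # Lansing, raw -> 75
print(round(reduction_pct(2_300, 158)))       # Lansing, cleansed -> 93

sample = ["123 Main St", "123 Main St", "123  MAIN ST", "9 Oak Ave"]  # hypothetical
print(count_unique(sample))                # exact duplicates removed -> 3
print(count_unique(sample, cleanse=True))  # after normalization -> 2
```

The gap between the raw and cleansed Lansing figures is the point of the cleansing step: spelling and formatting variants that survive exact-match deduplication collapse once addresses are standardized.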

Page 11

Common Address Model - Goals

● Data Architecture
  o Common location of both raw and cleansed addresses
  o Secure
  o Central/Common Orientation
● Process Architecture
  o Simple Integration
  o “Open”
  o Leverage Available Tools
● Compliance
  o HHS standards compliant
  o Audit compliant

Page 12

Common Address Physical Model – P_SOM_COMMON

Cleansed_Address
  Key: Cleansed_Address_Id
  Columns: Street_Number, Street_Pre_Direction, Street_Name, Street_Suffix, Street_Post_Direction, Unit_Type, Unit_Number, Extra_Secondary_Address_Data, Extra_Private_Mailbox_Data, City, State_Abbreviation, Zip_Code, Zip_Code_4, Urbanization_Name, State_FIPS_Code, County_FIPS_Code, Foreign_Address_Indicator, CASS_Record_Type, Record_Type, Record_Type_Default_Indicator, Zip_Code_Type, Primary_USPS_Unit_Type, Geo_Match_Code, Geo_Lattitude, Geo_Longitude, Michigan_X_Coordinate, Michigan_Y_Coordinate, Geo_FIPS_County_Code, Geo_FIPS_City_Code, Geo_Census_Tract, Geo_CBSA_Number, Geo_MSA_Number, Congressional_District_Number, Postal_Facility_Type_Code, Zip_Code_Realignment_Code, Default_Match_Flag, Carrier_Route_Number, Carrier_Route_Sort_Zone, Line_Of_Travel_Number, Line_Of_Travel_Sortation, Delivery_Point_Check_Digit, Delivery_Point_Barcode_Addend, Record_Add_Timestamp, Record_Update_Timestamp

Cleansed_Address_Link
  Key: Address_Id, Cleansed_Address_Id
  Columns: Primary_Dual_Code, Status_Code, Error_Code, LACS_Indicator, City_Zip5_Match_Flag, City_Zip4_Match_Flag, Unsuitable_Delivery_Code, City_Place, Firm_Name, Address_Remainder, Extra_1_Data, Extra_1_Data_Attn_Co_Code, Extra_2_Data, Extra_2_Data_Attn_Co_Code, Extra_3_Data, Extra_3_Data_Attn_Co_Code, Extra_4_Data, Extra_4_Data_Attn_Co_Code, Record_Add_Timestamp, Record_Update_Timestamp

Cleansed_Address_Link_History
  Key: Address_Id, Cleansed_Address_Id, Record_Add_Timestamp
  Columns: Primary_Dual_Code, Status_Code, Error_Code, LACS_Indicator, City_Zip5_Match_Flag, City_Zip4_Match_Flag, Unsuitable_Delivery_Code, City_Place, Firm_Name, Address_Remainder, Extra_1_Data, Extra_1_Data_Attn_Co_Code, Extra_2_Data, Extra_2_Data_Attn_Co_Code, Extra_3_Data, Extra_3_Data_Attn_Co_Code, Extra_4_Data, Extra_4_Data_Attn_Co_Code, Record_Update_Timestamp, Record_Expire_Timestamp

Common_Address
  Key: Address_Id
  Columns: Address_Line1, Address_Line2, Address_Line3, Address_Line4, City, State, Zip_Code, Record_Add_Timestamp, Record_Update_Timestamp, Address_Cleansed_Date

Common_Address_Agency_Link
  Key: Address_Id, DW_Host_Code, Owner_Code, Data_Source_Code
  Columns: Record_Add_Timestamp, Record_Update_Timestamp
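The link tables separate raw agency addresses (Common_Address) from their cleansed form (Cleansed_Address): many Address_Id values can resolve to one Cleansed_Address_Id, and Cleansed_Address_Link_History keeps superseded links with a Record_Expire_Timestamp. The following is a minimal in-memory sketch using Python's standard `sqlite3` module with only a few of the model's columns. The `relink` routine (copy the current links to history, expire them, insert the new link) is an assumption about how the history table is maintained, not the project's documented process, and the sample rows are hypothetical.

```python
import sqlite3

# Minimal subset of the P_SOM_COMMON tables (most columns omitted).
SCHEMA = """
CREATE TABLE Common_Address (
    Address_Id INTEGER PRIMARY KEY,
    Address_Line1 TEXT, City TEXT, State TEXT, Zip_Code TEXT
);
CREATE TABLE Cleansed_Address (
    Cleansed_Address_Id INTEGER PRIMARY KEY,
    Street_Number TEXT, Street_Name TEXT, Street_Suffix TEXT,
    City TEXT, State_Abbreviation TEXT, Zip_Code TEXT
);
CREATE TABLE Cleansed_Address_Link (
    Address_Id INTEGER, Cleansed_Address_Id INTEGER, Record_Add_Timestamp TEXT,
    PRIMARY KEY (Address_Id, Cleansed_Address_Id)
);
CREATE TABLE Cleansed_Address_Link_History (
    Address_Id INTEGER, Cleansed_Address_Id INTEGER,
    Record_Add_Timestamp TEXT, Record_Expire_Timestamp TEXT,
    PRIMARY KEY (Address_Id, Cleansed_Address_Id, Record_Add_Timestamp)
);
"""


def relink(conn: sqlite3.Connection, address_id: int, cleansed_id: int, ts: str) -> None:
    """Point a raw address at a cleansed address, moving any old link to history."""
    conn.execute(
        """INSERT INTO Cleansed_Address_Link_History
           SELECT Address_Id, Cleansed_Address_Id, Record_Add_Timestamp, ?
           FROM Cleansed_Address_Link WHERE Address_Id = ?""",
        (ts, address_id),
    )
    conn.execute("DELETE FROM Cleansed_Address_Link WHERE Address_Id = ?", (address_id,))
    conn.execute("INSERT INTO Cleansed_Address_Link VALUES (?, ?, ?)", (address_id, cleansed_id, ts))


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
# Two raw spellings of one location, plus two cleansed addresses (hypothetical data).
conn.executemany(
    "INSERT INTO Common_Address VALUES (?, ?, ?, ?, ?)",
    [(1, "123 Main St", "Lansing", "MI", "48933"),
     (2, "123 MAIN STREET", "LANSING", "MI", "48933")],
)
conn.executemany(
    "INSERT INTO Cleansed_Address VALUES (?, ?, ?, ?, ?, ?, ?)",
    [(10, "123", "MAIN", "ST", "LANSING", "MI", "48933"),
     (11, "125", "MAIN", "ST", "LANSING", "MI", "48933")],
)
for addr_id in (1, 2):
    relink(conn, addr_id, 10, "2007-01-01 00:00:00")

# Many raw addresses resolve to one cleansed address.
raw_per_cleansed = conn.execute(
    "SELECT Cleansed_Address_Id, COUNT(*) FROM Cleansed_Address_Link GROUP BY 1"
).fetchall()
print(raw_per_cleansed)  # [(10, 2)]

# A later re-cleanse moves raw address 2; the old link survives in history.
relink(conn, 2, 11, "2007-02-01 00:00:00")
history = conn.execute("SELECT * FROM Cleansed_Address_Link_History").fetchall()
print(history)  # [(2, 10, '2007-01-01 00:00:00', '2007-02-01 00:00:00')]
```

Keeping the raw address and its link separate means re-cleansing (for example, after a postal database update) only touches the link tables, while agency-supplied raw data stays unchanged.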

Page 13

Common Address Physical Model – P_SOM_LOOKUP

ACE_CASS_Record_Type_Info
  Key: CASS_Record_Type
  Columns: CASS_Record_Type_Description

ACE_Error_Code_Info
  Key: Error_Code
  Columns: Error_Code_Description

ACE_LACS_Indicator_Info
  Key: LACS_Indicator
  Columns: LACS_Indicator_Description

ACE_Record_Type_Info
  Key: Record_Type
  Columns: Record_Type_Description

ACE_Status_Code_Info
  Key: Status_Code
  Columns: ACE_Truncation_Note, ACE_City_State_Zip_Assn_Note, ACE_Street_Dir_Suffix_Note, ACE_County_CART_Unit_Note, ACE_Lot_Urb_Note

FIPS_County_Class_Info
  Key: FIPS_Class_Code
  Columns: FIPS_Class_Description

FIPS_County_Info
  Key: FIPS_State_Code, FIPS_County_Code
  Columns: County_Name, FIPS_Class_Code

FIPS_Sovereignty_Status_Info
  Key: Sovereignty_Status_Code
  Columns: Sovereignty_Status_Description

FIPS_State_Territory_Info
  Key: FIPS_State_Code
  Columns: State_Territory_Name, State_Territory_Alpha_Code, Territory_Type_Code, Sovereignty_Status_Code

FIPS_Territory_Type_Info
  Key: Territory_Type_Code
  Columns: Territory_Type_Description
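Most lookup tables are simple code-to-description maps, but FIPS_County_Info is keyed on both FIPS_State_Code and FIPS_County_Code, because county codes are only unique within a state. A minimal sketch of decoding the State_FIPS_Code / County_FIPS_Code pair carried on Cleansed_Address: the dictionaries stand in for the P_SOM_LOOKUP tables, and the two rows are illustrative (Michigan is FIPS state 26, Ingham County is county 065). The fallback text for unknown codes is a hypothetical choice.

```python
# Illustrative rows standing in for P_SOM_LOOKUP tables.
FIPS_STATE_TERRITORY_INFO = {"26": "Michigan"}        # FIPS_State_Code -> State_Territory_Name
FIPS_COUNTY_INFO = {("26", "065"): "Ingham County"}  # (FIPS_State_Code, FIPS_County_Code) -> County_Name


def describe_location(state_fips: str, county_fips: str) -> str:
    """Decode a cleansed address's FIPS codes into readable names."""
    state = FIPS_STATE_TERRITORY_INFO.get(state_fips, f"unknown state {state_fips}")
    # County codes repeat across states, so the lookup needs the full composite key.
    county = FIPS_COUNTY_INFO.get((state_fips, county_fips), f"unknown county {county_fips}")
    return f"{county}, {state}"


print(describe_location("26", "065"))  # Ingham County, Michigan
```

Keeping these federal code sets in one shared database means every agency decodes a given code the same way, which is the consistency argument the slides make for P_SOM_LOOKUP.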

Page 14

P_SOM_Lookup Database

● Common Area for System Codes & Values
● Common Area for Federal Standards Codes (FIPS, NAICS, etc.)
● Great Starting Point for an Enterprise DW

Page 15

Common Address Physical Model – P_SOM_Control

SOM_Common_Proc_Load_Control
  Key: DW_Host_Code, Common_Process_Id, Common_Process_Phase, Host_Process_Id, Record_Add_Timestamp
  Columns: Load_Status, Record_Update_Timestamp, Attempted_Seconds

SOM_Common_Process
  Key: DW_Host_Code, Common_Process_Id, Host_Process_Id
  Columns: Expiration_Timestamp, Effective_Timestamp, Source_Database_Name, Source_Table_Name, Owner_Code, Data_Source_Code, Notes, Row_Add_User

SOM_Common_Process_Columns
  Key: COMMON_PROCESS_ID
  Columns: DATABASE_NAME, TABLE_NAME, COLUMN_NAME, EXPIRATION_TIMESTAMP, EFFECTIVE_TIMESTAMP, FIELD_ID, COLUMN_DEFINITION, UPI_FLAG, NUPI_FLAG, USI_FLAG, NUSI_FLAG, NOTES, ROW_ADD_USER

SOM_Common_Process_Email
  Key: DW_Host_Code, Common_Process_Id, Host_Process_Id, User_Kind, User_Id, Email_Address
  Columns: Expiration_Timestamp, Effective_Timestamp, Email_On_Action, Notes, Row_Add_User

SOM_Common_Process_Field_Map
  Key: DW_HOST_CODE, COMMON_PROCESS_ID, HOST_PROCESS_ID, DATABASE_NAME, TABLE_NAME, COLUMN_NAME, MAP_SEQUENCE
  Columns: EXPIRATION_TIMESTAMP, EFFECTIVE_TIMESTAMP, SOURCE_DATABASE_NAME, SOURCE_TABLE_NAME, SOURCE_COLUMN_NAME, COLUMN_SELECT_TEXT, SQL_TRIM_FLAG, NOTES, ROW_ADD_USER

SOM_Common_Process_Job_Audit
  Key: System_Job_Name, Job_Name, Step_Name, Step_Start_Timestamp
  Columns: Action_Start_Timestamp, Action_End_Timestamp, Batch_Id, Audit_Tag, Action_Duration_Seconds, Records_Input, Records_Output, Records_Inserted, Records_Updated, Records_Deleted, Records_Processed, Records_Not_Processed, Error_Records, Error_Records2, Record_Add_Timestamp

SOM_Common_Process_Job_Detail
  Key: System_Job_Name, Job_Name, Step_Name, Step_Start_Timestamp
  Columns: Batch_Id, Job_Start_Timestamp, Step_End_Timestamp, Step_Type_Description, Step_Sequence, Step_Run_Indicator, Step_Fail_Indicator, Step_Return_Code, Step_Fail_Action_Code, Unix_Process_Id, Record_Add_Timestamp

SOM_Common_Process_Table_Key
  Key: Source_Database_Name, Source_Table_Name
  Columns: Source_Column_Name, Expiration_Timestamp, Effective_Timestamp, Notes, Row_Add_User, SQL_Null_Default

SOM_Common_Process_User
  Key: DW_Host_Code, Common_Process_Id, Host_Process_Id, User_Kind, User_Id
  Columns: Expiration_Timestamp, Effective_Timestamp, Notes, Row_Add_User, Retry_Seconds

SOM_DW_Data_Source
  Key: DW_Host_Code, Owner_Code, Data_Source_Code
  Columns: Data_Source_Description, Row_Add_User

SOM_DW_Host_Agency
  Key: DW_Host_Code
  Columns: Expiration_Timestamp, Effective_Timestamp, Agency_Description, Row_Add_User
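The control tables read as metadata for a generic, table-driven load: SOM_Common_Process_Field_Map lists, in MAP_SEQUENCE order, which source column feeds each target column, and configuration rows carry an Effective_Timestamp / Expiration_Timestamp window. The sketch below shows one way such metadata could generate an extract query. Its rules are assumptions, not documented behavior: COLUMN_SELECT_TEXT, when present, overrides the plain source column; SQL_TRIM_FLAG = "Y" wraps the expression in TRIM(); and a row is active when Effective_Timestamp <= as_of < Expiration_Timestamp. The source database, table and column names are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class FieldMapRow:
    """Subset of a SOM_Common_Process_Field_Map row."""
    column_name: str                   # COLUMN_NAME (target)
    map_sequence: int                  # MAP_SEQUENCE
    source_column_name: str            # SOURCE_COLUMN_NAME
    column_select_text: Optional[str]  # COLUMN_SELECT_TEXT (assumed override)
    sql_trim_flag: str                 # SQL_TRIM_FLAG ("Y" assumed to mean TRIM)
    effective: datetime                # EFFECTIVE_TIMESTAMP
    expiration: datetime               # EXPIRATION_TIMESTAMP


def is_active(row: FieldMapRow, as_of: datetime) -> bool:
    # Assumed convention: effective timestamp inclusive, expiration exclusive.
    return row.effective <= as_of < row.expiration


def build_select(rows, source_db: str, source_table: str, as_of: datetime) -> str:
    """Generate the extract SELECT from the mappings active at `as_of`."""
    active = sorted((r for r in rows if is_active(r, as_of)), key=lambda r: r.map_sequence)
    parts = []
    for row in active:
        expr = row.column_select_text or row.source_column_name
        if row.sql_trim_flag == "Y":
            expr = f"TRIM({expr})"
        parts.append(f"{expr} AS {row.column_name}")
    return f"SELECT {', '.join(parts)} FROM {source_db}.{source_table}"


# Hypothetical mappings for a hypothetical agency table.
START, OPEN = datetime(2007, 1, 1), datetime(9999, 12, 31)
rows = [
    FieldMapRow("Address_Line1", 1, "ADDR1", None, "Y", START, OPEN),
    FieldMapRow("City", 2, "CITY_NM", None, "Y", START, OPEN),
    FieldMapRow("Zip_Code", 3, "ZIP", "SUBSTR(ZIP, 1, 5)", "N", START, OPEN),
    FieldMapRow("Address_Line2", 4, "ADDR2", None, "Y", START, datetime(2007, 6, 1)),  # expired
]
sql = build_select(rows, "AGENCY_DB", "CLIENT_ADDRESS", datetime(2008, 1, 1))
print(sql)
# SELECT TRIM(ADDR1) AS Address_Line1, TRIM(CITY_NM) AS City, SUBSTR(ZIP, 1, 5) AS Zip_Code FROM AGENCY_DB.CLIENT_ADDRESS
```

Effective dating lets a mapping be retired (as Address_Line2 is here) without deleting the row, so the configuration in force for any past load can still be reconstructed.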

Page 16

Common Address Demonstration

Page 17

Common Address – Internal Processing

[Flowchart, two phases.
Phase I (Option I): check Phase I contention in Common Load Control; on contention, read the Common Process Agency Settings and pause or abort; otherwise load raw data from a staging table through Common (Raw) Address maintenance, recording the Address / Agency relation and Address Id. If Option II is not selected, mark Common Load Control complete and end.
Phase II (Option II): check Phase II contention (pause or abort per settings); extract addresses to cleanse from Common Raw Addresses to raw address data on NAS; run postal cleansing (Windows) to produce cleansed address data on NAS; load Cleansed Address Staging (Unix); populate Cleansed Address and Cleansed Address Link, repeating for Dual/Canada; mark Common Load Control complete and end.]
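Each phase starts by checking Common Load Control for contention and, if another run holds the phase, reading the settings to decide whether to pause or abort. The sketch below shows that decision loop. The settings (`on_contention`, `retry_seconds`, `max_wait_seconds`) are hypothetical; the control model does have Retry_Seconds and Attempted_Seconds columns, but the slides do not show how they are used. `is_contended` is an injected callable standing in for a real query against SOM_Common_Proc_Load_Control, and `sleep` is injectable so the loop can be tested without waiting.

```python
import time
from dataclasses import dataclass
from typing import Callable


@dataclass
class ContentionSettings:
    on_contention: str       # "PAUSE" or "ABORT" (hypothetical setting values)
    retry_seconds: float     # wait between checks while paused
    max_wait_seconds: float  # give up once this budget would be exceeded


class LoadAborted(Exception):
    """Raised when a phase cannot start because of contention."""


def acquire_phase(
    is_contended: Callable[[], bool],
    settings: ContentionSettings,
    sleep: Callable[[float], None] = time.sleep,
) -> float:
    """Wait until the phase is free and return the seconds spent waiting.

    Raises LoadAborted if the settings say ABORT or the wait budget runs out.
    """
    waited = 0.0
    while is_contended():
        if settings.on_contention == "ABORT":
            raise LoadAborted("phase busy; aborting per settings")
        if waited + settings.retry_seconds > settings.max_wait_seconds:
            raise LoadAborted(f"phase still busy after {waited:.0f}s")
        sleep(settings.retry_seconds)
        waited += settings.retry_seconds
    return waited


# Simulate a phase that is busy for two checks, then free.
checks = iter([True, True, False])
waited = acquire_phase(lambda: next(checks), ContentionSettings("PAUSE", 30, 300), sleep=lambda s: None)
print(waited)  # 60.0
```

Bounding the pause by a wait budget keeps a stuck Phase I run from blocking Phase II indefinitely; whether the project used such a limit is not stated.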

Page 18

Common Address – On the Horizon

● Integration with Common Citizen
● Security Mechanisms
● IQ8 – Delivery Point Validation

Page 19

Common Citizen Model

Page 20

Common Citizen - Physical Model

Common_Demographic
  Key: CPD_Id
  Columns: Social_Security_Number, Date_Of_Birth, First_Name, Middle_Name, Last_Name, Name_Suffix, Name_Title, Record_Add_Timestamp

Common_Demographic_NYSIIS
  Key: CPD_Id
  Columns: First_Name_NYSIIS, Middle_Name_NYSIIS, First_Middle_NYSIIS, Middle_First_NYSIIS, Last_Name_NYSIIS, Last_Name_Part1, Last_Name_Part2, Last_Name_Part1_NYSIIS, Last_Name_Part2_NYSIIS, Last_Name_P2_P1_NYSIIS

Person
  Key: Person_Id, DW_Host_Code, Owner_Code, Data_Source_Code
  Columns: Primary_CPD_Id, MDOS_Id, Gender, Ethnicity_Code, Date_Of_Death, Death_Verification_Code, US_Residency_Status_Code, Migrant_Flag, Height, Weight, Eye_Color, Hair_Color, Identifying_Marks, Education_Level_Code, English_Literacy_Code, Primary_Language_Code, Religion_Code, Unique_Citizen_Identifier, Veteran_Indicator, Multiple_Race_Flag, Source_Effective_Date, Record_Add_Timestamp, Record_Update_Timestamp, US_Citizen_Status_Code

Person_Demographic_Link
  Key: Person_Id, DW_Host_Code, Owner_Code, Data_Source_Code, CPD_Id
  Columns: Primary_Alias_Code, SSN_Verification_Code, Birth_Date_Verification_Code, Source_Effective_Date, Record_Update_Timestamp, Record_Add_Timestamp

PERSON_MATCH
  Key: Person_Id, DW_Host_Code, Owner_Code, Data_Source_Code, CPD_Id, Matched_Person_Id, Matched_DW_Host_Code, Matched_Owner_Code, Matched_Data_Source_Code, Matched_CPD_Id
  Columns: Primary_Alias_Code, Matched_Primary_Alias_Code, SSN_Matched_Digits, Last_Name_Match, Last_Name4_Match, First_Name_Match, First_Initial_Match, Name_Suffix_Match, DOB_Year_Match, DOB_Month_Match, DOB_Day_Match, DOB_Null_Indicator, DOB_Month_01_Indicator, DOB_Day_01_Indicator, Matched_DOB_Null_Indicator, Matched_DOB_Month_01_Indicator, Matched_DOB_Day_01_Indicator, Cross_Ref_Name_Match, Last_Name_NYSIIS_Match, First_Name_NYSIIS_Match, Cross_Ref_Partial_Name_Match, First_Name_2_3_Char_Match, Name_In_Name_Match, Cross_Ref_Name_In_Name_Match, Cross_Ref_DOB_Match, Gender_Match, Gender_Null_Indicator, Matched_Gender_Null_Indicator, Drivers_License_Match, Source_Matched_Indicator, Middle_Name_Mismatch, First_Name_Blank_Indicator, Last_Name_Blank_Indicator, Record_Add_Timestamp, Record_Update_Timestamp

Person_Race
  Key: Person_Id, DW_Host_Code, Owner_Code, Data_Source_Code, Race_Code
  Columns: Primary_Race_Flag, Primary_Provided_Code, Record_Add_Timestamp

Person_Worker
  Key: Person_Id, DW_Host_Code, Owner_Code, Data_Source_Code, Worker_Id
  Columns: Record_Add_Timestamp, Relation_Description, Record_Expire_Date

Primary_Language_Codes
  Key: Primary_Language_Code
  Columns: Primary_Language_Description, Record_Add_Timestamp

Religion_Codes
  Key: Religion_Code
  Columns: Religion_Description, Record_Add_Timestamp

US_Census_Ethnicity_Codes
  Key: Ethnicity_Code
  Columns: Code_Check_Digit, Ethnicity_Concept_Level1, Ethnicity_Concept_Level2, Ethnicity_Concept_Level3, Ethnicity_Concept_Level4, Ethnicity_Description, Ethnicity_Synonym, Date_Added_to_Version

US_Census_Race_Codes
  Key: Race_Code
  Columns: Code_Check_Digit, Race_Concept_Level1, Race_Concept_Level2, Race_Concept_Level3, Race_Concept_Level4, Race_Description, Race_Synonym, Date_Added_to_Version

US_Citizen_Status_Codes
  Key: US_Citizen_Status_Code
  Columns: US_Citizen_Description, Record_Add_Timestamp

US_Residency_Status_Codes
  Key: US_Residency_Status_Code, US_Citizen_Status_Code
  Columns: US_Residency_Description, Record_Add_Timestamp
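PERSON_MATCH stores one row per candidate pair of person records, with a column for each comparison (SSN digits in common, full and partial name matches, date-of-birth parts, gender, and so on). The sketch below computes a handful of those columns for two demographic records. The definitions are assumptions inferred from the column names, not the project's matching rules: SSN_Matched_Digits counts positions where the digits agree, Last_Name4_Match compares the first four characters, Name_In_Name_Match tests whether one last name contains the other, Cross_Ref_Name_Match tests for swapped first and last names, and the _01_ indicators are taken to flag a January or first-of-month birth date, which may be a placeholder value. NYSIIS comparisons are omitted. The sample records are hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Demographic:
    """A few Common_Demographic / Person columns for one side of a pair."""
    ssn: str
    first_name: str
    last_name: str
    dob: Optional[date]
    gender: Optional[str]


def match_flags(a: Demographic, b: Demographic) -> dict:
    """Compute a subset of PERSON_MATCH comparison columns (assumed definitions)."""
    fa, fb = a.first_name.upper(), b.first_name.upper()
    la, lb = a.last_name.upper(), b.last_name.upper()
    both_dob = a.dob is not None and b.dob is not None
    return {
        # Positions at which the two SSNs carry the same digit.
        "SSN_Matched_Digits": sum(x == y for x, y in zip(a.ssn, b.ssn)),
        "Last_Name_Match": int(la == lb),
        "Last_Name4_Match": int(la[:4] == lb[:4]),
        "First_Name_Match": int(fa == fb),
        "First_Initial_Match": int(bool(fa) and fa[0] == fb[:1]),
        # One last name contained in the other (e.g. a hyphenated surname).
        "Name_In_Name_Match": int(bool(la) and bool(lb) and (la in lb or lb in la)),
        # First and last names swapped between the two records.
        "Cross_Ref_Name_Match": int(fa == lb and la == fb),
        "DOB_Year_Match": int(both_dob and a.dob.year == b.dob.year),
        "DOB_Month_Match": int(both_dob and a.dob.month == b.dob.month),
        "DOB_Day_Match": int(both_dob and a.dob.day == b.dob.day),
        "DOB_Null_Indicator": int(a.dob is None),
        # Possible placeholder dates: January, or the first of the month.
        "DOB_Month_01_Indicator": int(a.dob is not None and a.dob.month == 1),
        "DOB_Day_01_Indicator": int(a.dob is not None and a.dob.day == 1),
        "Gender_Match": int(a.gender is not None and a.gender == b.gender),
    }


a = Demographic("123456789", "Mary", "Smith-Jones", date(1970, 1, 1), "F")
b = Demographic("123456780", "MARY", "Smith", date(1970, 1, 15), "F")
flags = match_flags(a, b)
print(flags["SSN_Matched_Digits"], flags["Name_In_Name_Match"])  # 8 1
```

Storing the individual comparison results rather than a single score lets analysts tune or audit a match rule later without recomputing every pair.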

Page 21

Citizen Events

Overview:
● Merged View of Various “Events”
● Very Extensible (e.g., date of birth)
● Tend to be the relationship of a person, an organization and a time element
● Can be “one-time” or over a duration

Intent:
● Micro-analysis
● Macro-analysis

Page 22

Citizen Events Model

Person_Match
  Key: Person_Id, DW_Host_Code, Owner_Code, Data_Source_Code, CPD_Id, Matched_Person_Id, Matched_DW_Host_Code, Matched_Owner_Code, Matched_Data_Source_Code, Matched_CPD_Id
  Columns: Primary_Alias_Code, Matched_Primary_Alias_Code (additional columns not shown)

Event_Organization
  Key: Organization_Id
  Columns: Organization_Description, Federal_Tax_Id

Person
  Key: Person_Id, DW_Host_Code, Owner_Code, Data_Source_Code
  Columns: US_Citizen_Status_Code, Primary_CPD_Id, MDOS_Id, Gender, Ethnicity_Code, Date_Of_Death, Death_Verification_Code, US_Residency_Status_Code, Migrant_Flag, Height, Weight, Eye_Color, Hair_Color, Identifying_Marks, Education_Level_Code, English_Literacy_Code, Primary_Language_Code, Religion_Code, Source_Effective_Date, Unique_Citizen_Identifier, Veteran_Indicator, Multiple_Race_Flag, Record_Add_Timestamp, Record_Update_Timestamp

Person_Event
  Key: Person_Id, DW_Host_Code, Owner_Code, Data_Source_Code, Organization_Id, Event_Type_Code
  Columns: Begin_Date, End_Date, End_Date_Scheduled, Completion_Code, Supplemental_Character_Data, Supplemental_Numeric_Data, UOM_Count, UOM_Count2, UOM_Amount, UOM_Amount2, UOM_Amount3, Date1, Date2, Date3, Address_Id, Record_Add_Timestamp, Record_Update_Timestamp

Event_Completion_Codes
  Key: Event_Type_Code, Completion_Code
  Columns: Completion_Description, Source_Completion_Code, Record_Add_Timestamp

Event_Type_Info
  Key: Event_Type_Code
  Columns: Event_Description, Supplemental_Character_Desc, Supplemental_Numeric_Desc, UOM_Frequency_Code, UOM_Count_Description, UOM_Count_Description2, UOM_Amount_Description, UOM_Amount_Description2, UOM_Amount_Description3, Date1_Description, Date2_Description, Date3_Description, Record_Add_Timestamp

Event_UOM_Frequency_Codes
  Key: UOM_Frequency_Code
  Columns: UOM_Frequency_Description, Annual_Multiplier, Monthly_Multiplier, Weekly_Multiplier
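Event_Type_Info ties each event type to a UOM_Frequency_Code, and Event_UOM_Frequency_Codes carries annual, monthly and weekly multipliers, which suggests that amounts reported at different frequencies can be normalized to a common period before comparison. A minimal sketch, assuming each multiplier converts one period's amount to the target period (for example, Annual_Multiplier = 4 for a quarterly amount); the frequency codes, event type codes and amounts are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UomFrequency:
    """One Event_UOM_Frequency_Codes row."""
    description: str
    annual_multiplier: float
    monthly_multiplier: float
    weekly_multiplier: float


# Hypothetical Event_UOM_Frequency_Codes rows.
UOM_FREQUENCY = {
    "Q": UomFrequency("Quarterly", 4.0, 1 / 3, 4 / 52),
    "M": UomFrequency("Monthly", 12.0, 1.0, 12 / 52),
    "W": UomFrequency("Weekly", 52.0, 52 / 12, 1.0),
}

# Hypothetical Event_Type_Info rows: Event_Type_Code -> UOM_Frequency_Code.
EVENT_TYPE_FREQUENCY = {"QWAGE": "Q", "SSIRATE": "M"}


def normalize_amount(event_type: str, amount: float, target: str = "annual") -> float:
    """Convert an event's UOM_Amount to an annual, monthly or weekly figure."""
    freq = UOM_FREQUENCY[EVENT_TYPE_FREQUENCY[event_type]]
    multiplier = {
        "annual": freq.annual_multiplier,
        "monthly": freq.monthly_multiplier,
        "weekly": freq.weekly_multiplier,
    }[target]
    return round(amount * multiplier, 2)


print(normalize_amount("QWAGE", 1000.0))             # 4000.0
print(normalize_amount("QWAGE", 1000.0, "monthly"))  # 333.33
```

Keeping the multipliers in a lookup table rather than in code means a new event type only needs a row in Event_Type_Info to participate in cross-program comparisons.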

Page 23

Citizen Event Sample

Event Description | Begin Date | End Date | Completion Description | UOM Amount Description | UOM Amount | UOM Amount Description 2 | UOM Amount 2
Quarterly Wage Record | 2001-07-01 | 2004-03-31 | Reported Earnings | Wages Paid | 85.75 | ? | ?
Quarterly Wage Record | 2002-01-01 | 2002-09-30 | Reported Earnings | Wages Paid | 284.93 | ? | ?
SSI Rate Change | 2005-03-30 | 2005-06-10 | Application for SSI is pending | Gross Payment | 0 | Current Payment | 0
Quarterly Wage Record | 2005-04-01 | 2005-09-30 | Reported Earnings | Wages Paid | 948.17 | ? | ?
SSI Rate Change | 2005-06-10 | 2006-05-20 | SSI application denied | Gross Payment | 0 | Current Payment | 0
RSDI Rate Change | 2005-11-03 | 9999-12-31 | Disallowed claim | Gross Payment | 0 | Net Payment | 0
SSI Rate Change | 2006-05-20 | 9999-12-31 | Closure of SSI record | Gross Payment | 0 | Current Payment | 0
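Because every event carries a begin and end date, with 9999-12-31 apparently marking an event that is still open, the sample supports both kinds of analysis the slides name: a micro view of what was in effect for one person on a given day, and a macro view such as counts by event type. The sketch below uses the sample rows above. It assumes End_Date is exclusive, so an SSI rate change that ends on the day the next one begins is not counted twice; the slides do not state which convention the model uses.

```python
from collections import Counter
from datetime import date
from typing import List, NamedTuple


class PersonEvent(NamedTuple):
    event_description: str
    begin_date: date
    end_date: date  # 9999-12-31 appears to mark an open-ended event
    completion_description: str


OPEN_ENDED = date(9999, 12, 31)

# Rows from the Citizen Event Sample.
EVENTS = [
    PersonEvent("Quarterly Wage Record", date(2001, 7, 1), date(2004, 3, 31), "Reported Earnings"),
    PersonEvent("Quarterly Wage Record", date(2002, 1, 1), date(2002, 9, 30), "Reported Earnings"),
    PersonEvent("SSI Rate Change", date(2005, 3, 30), date(2005, 6, 10), "Application for SSI is pending"),
    PersonEvent("Quarterly Wage Record", date(2005, 4, 1), date(2005, 9, 30), "Reported Earnings"),
    PersonEvent("SSI Rate Change", date(2005, 6, 10), date(2006, 5, 20), "SSI application denied"),
    PersonEvent("RSDI Rate Change", date(2005, 11, 3), OPEN_ENDED, "Disallowed claim"),
    PersonEvent("SSI Rate Change", date(2006, 5, 20), OPEN_ENDED, "Closure of SSI record"),
]


def active_on(events: List[PersonEvent], as_of: date) -> List[PersonEvent]:
    """Micro-analysis: events in effect on `as_of` (End_Date assumed exclusive)."""
    return [e for e in events if e.begin_date <= as_of < e.end_date]


# Macro-analysis: how many events of each type.
counts = Counter(e.event_description for e in EVENTS)

print([e.completion_description for e in active_on(EVENTS, date(2006, 6, 1))])
# ['Disallowed claim', 'Closure of SSI record']
```

If the model instead treats End_Date as inclusive, the comparison becomes `as_of <= e.end_date`, and back-to-back rate changes would both appear on their shared boundary date.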

Page 24

Citizen Events – On the Horizon

● Prove the concept
● Integrate with Common Address
● Establish Security Architecture
● Business Intelligence Competency Center