Siperian Hub Implementer Guide

Siperian Hub XU

Implementers Guide

2007 Siperian, Inc.

Copyright 2007 Siperian, Inc. [Unpublished - rights reserved under the Copyright Laws of the United States]

THIS DOCUMENTATION CONTAINS CONFIDENTIAL INFORMATION AND TRADE SECRETS OF SIPERIAN, INC. USE, DISCLOSURE OR REPRODUCTION IS PROHIBITED WITHOUT THE PRIOR EXPRESS WRITTEN PERMISSION OF SIPERIAN, INC.

Contents

Preface
  Intended Audience
  Contents
  Learning About Siperian Hub
  Contacting Siperian

Chapter 1: Introducing Siperian Hub Implementation
  Siperian Implementation Methodology
    Reducing Project Risk
    Core Principles
  Roles in a Siperian Hub Implementation Project
  Phases in a Siperian Hub Implementation Project
    Discover Phase
    Analyze Phase
    Design Phase
    Build Phase
    Deploy Phase

Chapter 2: Analyzing Data
  Getting Started
  Defining the Flow of Data Between Siperian Hub and Source/Target Systems
    Determine Data Source Characteristics
    Assemble a Statistically Representative Sample Data Set
    Consider Data Sizing
    Consider the Relationship Between Data and Business Processes
  Consider Data Cleansing and Standardization Rules
  Consider Trust Levels and Validation Rules
    Trust Levels
    Validation Rules
  Consider Match Rules

Chapter 3: Designing the Data Model
  About Data Modeling for MRM
    Data Model Design Deliverables
    Conceptual Model
    Logical Model
    Physical Model
  Design Principles
    Principle 1: Consider Deep Versus Wide
    Principle 2: Match Requirements Drive the Model
    Principle 3: Consolidation Counts
    Principle 4: Pass the Independence Test
    Principle 5: Mix Different Types of Customers Carefully
    Principle 6: Landing and Staging Data
  Design Patterns
    Households
    Addresses
    Populating the Address Household Object
    Communication Channel Models

Chapter 4: Using Trust Settings and Validation Rules
  Using Trust Levels
    About Trust Levels
    How Trust Works
    Ranking Source Systems According to Trustworthiness
    Trust Best Practices
    Configuring Trust Levels
    Example Stored Procedure to Calculate Decayed Trust
  Using Validation
    About Validation Rules
    How Validation Works
    Best Practices for Validation Rules
  Using Trust and Validation Together
    Scenarios Involving Trust and Validation for a Column
    What Happens When a Record Is Updated
    Example Using Trust Levels and Validation Rules Together

Chapter 5: Configuring and Tuning Match Rules
  About Matching
    Before You Start Defining Your Match Rules
    Steps in the Match Process
  Populations
  Tokens for Match Keys
    Determining When to Tokenize Your Data
    Match Key Widths
    Match Key Types and Mixed Data
  Search Strategies
  Match Purposes
    Using the Match Purposes to Match People
    Using the Match Purposes to Match Organizations
    Using the Match Purposes to Match Addresses
    Name Formats
    Field Types Used in Purposes
    Match Levels
  Defining and Testing Your Match Rules
    About Testing
  Matching Best Practices
  Exact Match Column Properties
    Null Match
    Segment Match
    Using Matching on Dependent Tables
  Setting Match Batch Sizes
  Using Dynamic Match Analysis Threshold
  Tuning Match for Performance
  About Merging

Chapter 6: Implementing Hierarchy Manager
  About Hierarchy Manager
  Before You Begin Implementing Hierarchy Manager
    Defining Your Goals
    Understanding the Data
    Assembling the Team
    Determining Resources
  About Implementing a Hierarchy Manager System
    Step 1: Analyze Your Data
    Step 2: Build the Data Model
    Step 3: Configure Your Hierarchy Manager Implementation
    Step 4: Load Data

Chapter 7: Scheduling Batch Jobs and Batch Groups
  About Scheduling Siperian Hub Batch Jobs
  Setting Up Job Execution Scripts
    Metadata in the C_REPOS_TABLE_OBJECT_V View
    Identifiers in C_REPOS_TABLE_OBJECT_V
    Determining Available Execution Scripts
    Retrieving Values from C_REPOS_TABLE_OBJECT_V at Execution Time
    Running Scripts Asynchronously
  Monitoring Job Results and Statistics
    Error Messages and Return Codes
    Job Execution Status
  Job Scheduling Reference
    Alphabetical List of Jobs
    Autolink Jobs
    Auto Match and Merge Jobs
    Automerge Jobs
    BVT Snapshot Jobs
    Generate Match Token Jobs
    Key Match Jobs
    Load Jobs
    Manual Link Jobs
    Manual Unlink Jobs
    Match Jobs
    Match Analyze Jobs
    Match for Duplicate Data Jobs
    Stage Jobs
    Unmerge Jobs
  Scheduling Batch Groups
    About Batch Groups
    Stored Procedures for Batch Groups
  Developing Custom Stored Procedures for Batch Jobs
    About Custom Stored Procedures
    Required Execution Parameters for Custom Batch Jobs
    Example Custom Stored Procedure
    Registering a Custom Stored Procedure

Chapter 8: Implementing Custom Buttons in Hub Console Tools
  About Custom Buttons in the Hub Console
    How Custom Buttons Appear in the Hub Console
    What Happens When a User Clicks a Custom Button
  Adding Custom Buttons
    Writing a Custom Function
    Controlling the Custom Button Appearance
    Deploying Custom Buttons

Preface

Welcome to the Siperian Hub Implementers Guide. This guide explains how to design and implement your Master Reference Manager (MRM) system.

This guide has been written for database administrators, system administrators, data stewards, application developers, and other members of an MRM implementation team who are responsible for MRM implementation and configuration tasks. To learn more, see Intended Audience.

You must be familiar with the platform on which Siperian Hub is installed. If that platform is Windows, then you must also have knowledge of Microsoft Windows Component Services, which is required for Siperian Hub. Database administrators must be familiar with the database environment on which they have installed MRM. Knowledge of Oracle administration is particularly important.

Other administration and configuration tasks are described in the Siperian Hub Administrators Guide and Siperian Hub Users Guide.

This guide assumes that MRM and all supporting software components have been installed. To learn more about installing MRM, see the Siperian Hub Installation Guide for your platform.

Intended Audience

This guide is intended for the following audiences:

MRM Implementers

Those responsible for designing, developing, testing, and deploying MRM according to the requirements of the organization. All of the chapters in this book are recommended for implementers.

Hierarchy Manager Implementers

Those responsible for designing, developing, testing, and deploying Hierarchy Manager according to the requirements of the organization. See Chapter 6, Implementing Hierarchy Manager.

Data Stewards

Custodians of data quality. In Siperian terms, data stewards are the people responsible for reviewing and, where necessary, correcting and manually merging business data on a regular and ongoing basis. While the primary resource for data stewards is the Siperian Hub Users Guide, data stewards will also find the following chapters useful:
  Chapter 1, Introducing Siperian Hub Implementation
  Chapter 2, Analyzing Data
  Chapter 3, Designing the Data Model
  Chapter 4, Using Trust Settings and Validation Rules

Siperian Administrators

IT people responsible for configuring or updating a Hub Store so that it provides the rules and functionality required by the data stewards. While the primary resource for administrators is the Siperian Hub Administrators Guide, administrators will also find the following chapters useful:
  Chapter 5, Configuring and Tuning Match Rules
  Chapter 4, Using Trust Settings and Validation Rules

Contents

This guide contains the following chapters:

Chapter 1, Introducing Siperian Hub Implementation

Introduces the overall Siperian Hub implementation process and describes key concepts you need to understand before starting a Siperian Hub implementation project.

Chapter 2, Analyzing Data

Describes activities involved with analyzing data for a Siperian Hub implementation project.

Chapter 3, Designing the Data Model

Describes what implementers need to know before building the data model for a Siperian Hub implementation project.

Chapter 4, Using Trust Settings and Validation Rules

Provides a brief overview of how trust settings and validation rules work together, best practice recommendations, and examples.

Chapter 5, Configuring and Tuning Match Rules

Describes how to use and tune match rules.

Chapter 6, Implementing Hierarchy Manager

Describes concepts, methodology, design patterns, and other information that implementers need to know before beginning a Hierarchy Manager (HM) implementation project.

Chapter 7, Scheduling Batch Jobs and Batch Groups

Explains how to schedule Siperian Hub batch jobs using job execution scripts.

Chapter 8, Implementing Custom Buttons in Hub Console Tools

Explains how to add custom buttons to tools in the Hub Console that allow users to invoke external services on demand.

Learning About Siperian Hub

Siperian Hub Documentation Navigator

The Siperian Hub Documentation Navigator directs you to the books in the Siperian Hub documentation that are most useful to you based on your role.

Siperian Hub Installation Guide

The Siperian Hub Installation Guide for your platform explains how to install Siperian Hub and Cleanse Match Server. There is a Siperian Hub Installation Guide for each supported platform.

Siperian Hub Release Notes

The Siperian Hub Release Notes contain important information about this release of Siperian Hub. Read the Siperian Hub Release Notes before installing Siperian Hub.

What's New in Siperian Hub

What's New in Siperian Hub provides an enhanced description of the new features for this release.

Siperian Hub Tutorial

The Siperian Hub Tutorial walks you through various Siperian Hub implementation tasks on a step-by-step basis.

Siperian Hub Administrators Guide

The Siperian Hub Administrators Guide explains how to configure, administer, and manage a Siperian Hub implementation. It provides a description of the Siperian Hub platform through a discussion of Siperian Hub concepts, services, tools, and databases. Administrators should read the Siperian Hub Administrators Guide first.

Siperian Hub Users Guide

The Siperian Hub Users Guide explains how to use Siperian Hub. It provides a description of the Siperian Hub platform through a discussion of Siperian Hub concepts and tasks. Data stewards and users who are new to Siperian Hub should read the Siperian Hub Users Guide first.

Siperian Hub Implementers Guide

The Siperian Hub Implementers Guide explains how to design, implement, test, and deploy a Siperian Hub implementation. Implementers must be familiar with the content of the Siperian Hub Administrators Guide as well as the Siperian Hub Implementers Guide before starting a Siperian Hub implementation.

Siperian Services Integration Framework Guide

The Siperian Services Integration Framework Guide explains how to use the Siperian Hub Services Integration Framework (SIF) to integrate Siperian Hub functionality with your applications and how to create applications using the data provided by Siperian Hub. SIF allows you to integrate Siperian Hub smoothly with your organization's applications.

Siperian Training and Materials

Siperian provides live, instructor-based training to help you become a proficient user as quickly as possible. From initial installation onward, a dedicated team of qualified trainers ensures that your staff is equipped to take advantage of this powerful platform. To inquire about training classes or to find out where and when the next training session is offered, please visit our web site or contact Siperian directly.

Contacting Siperian

Technical support is available to answer your questions and to help you with any problems encountered using Siperian products. Please contact your local Siperian representative or distributor as specified in your support agreement. If you have a current Siperian Support Agreement, you can contact Siperian Technical Support:

Method            Contact Information
World Wide Web    http://www.siperian.com
E-Mail            [email protected]
Voice             U.S.: 1-866-SIPERIAN (747-3742)

We are interested in hearing your comments about this book. Send your comments to:

by E-Mail: [email protected]

by Postal Service:
  Documentation Manager
  Siperian, Inc.
  1820 Gateway Dr., Suite 109
  San Mateo, CA 94404

Chapter 1: Introducing Siperian Hub Implementation

This chapter introduces the overall Siperian Hub implementation process and describes key concepts you need to understand before starting a Siperian Hub implementation project. It provides a framework and methodology for implementing Siperian Hub in a Siperian customer environment. This framework is intended to help with implementation planning in conjunction with the particular requirements of your Siperian Hub implementation. Although every Siperian Hub implementation is unique in specific ways, certain principles, patterns, and best practices can apply generally across most Siperian Hub implementations.

Before you attempt to implement your Siperian Hub system, you should be intimately familiar with Siperian Hub and proficient in using the Siperian Hub tools. To learn more about using Siperian Hub, read through the following documents:

Siperian Hub Users Guide

Siperian Hub Administrators Guide

Chapter Contents

Siperian Implementation Methodology

Roles in a Siperian Hub Implementation Project

Phases in a Siperian Hub Implementation Project

Siperian Implementation Methodology

The Siperian implementation methodology provides a comprehensive set of procedures, guidelines, best practices, templates, and checklists for implementing Siperian Hub in a customer environment. It is intended to provide project teams with the flexibility to tailor an implementation project to meet their specific needs, while still providing the structure and guidance required to successfully implement Siperian Hub.

Reducing Project Risk

The main focus of the Siperian implementation methodology is to reduce project risk by:

Standardizing the approach to implementing Siperian solutions through the use of best practices and templates

Applying a risk avoidance-based scheduling approach to all project plans so that high-risk components of the project plan are completed as early as possible

Including checkpoint review processes to help keep projects on track

Providing sufficient knowledge transfer of Siperian products and implementation methodology, along with associated skills, to customers and implementation partners

Core Principles

The Siperian implementation methodology is deliverables-based, not time-based. Deliverables are produced by specific activities that are grouped into five gated phases (described in Phases in a Siperian Hub Implementation Project). Gated means that the project must pass through a checkpoint gate (a specific review process) before any activities for the next phase can begin.

The objective of checkpoint gate reviews is not to enforce a rigid waterfall methodology in which everything must be completed, approved, and signed off before any activities in the next phase can begin. Used on its own, the Siperian implementation methodology allows for overlap between phases, with as much concurrency as possible, without exposing the project to unacceptable risk. The checkpoint gate reviews determine whether a sufficient portion of the deliverables from the current phase has been delivered with acceptable quality for the phase to be considered complete.

The Siperian implementation methodology can be used on its own or it can be incorporated into many other methodologies, such as PMBOK, PRINCE2, Iterative, Waterfall, RAD, and others (including your own in-house methodology). If you do incorporate the Siperian implementation methodology into your enterprise project management methodology, then your approach to starting a new phase will be determined by the guidelines of your particular enterprise project management methodology.

The Siperian implementation methodology is a project-based methodology that is based on the following principles:

A project is a temporary and unique endeavor.

A project has a start date and an end date.

A project has a specific scope that is constrained by time, cost, and quality.

A project contains risk that must be managed.

The final goal of any project implemented under the guidelines of the Siperian implementation methodology is to deliver a fully configured, tested, and deployed Siperian Hub environment with the appropriate levels of project documentation.


Roles in a Siperian Hub Implementation Project

A Siperian Hub implementation project usually involves the following roles, several of which might be held by the customer, Siperian, or a third-party integrator.

Typical Roles in a Siperian Hub Implementation Project

Role: Customer Project Manager
Responsibilities: Manages the overall project, including: provides day-to-day project management, planning, and tracking; ensures that all issues and change requests have been communicated/resolved in a timely manner; defines and communicates resource needs; provides best practices and program management guidance; assists in requirements definition.

Role: Technical Lead
Responsibilities: Primary technical representative on the project team; participates in analysis, design, and testing activities; manages Master Data design and implementation, including Data Modeling, Business Rules, Data Loads, Rules Tuning, Consolidation QA, and Package/View Configuration.

Role: Database Administrator
Responsibilities: Configures the database for Siperian Hub; sets up the Hub databases; works with the Solution Architect during Hub database performance testing and tuning.

Role: System Administrator
Responsibilities: Configures the required hardware and infrastructure software.

Role: Solution Architect
Responsibilities: Provides expert advice, counsel, and technical expertise to the project team to help assure that Siperian solutions are designed and developed in the optimal manner and in accordance with industry and Siperian best practices.

Role: Hub Builder
Responsibilities: Assists with Siperian Hub design, development, testing, and deployment.

Role: EAI Specialist
Responsibilities: Provides the design and development of EAI programs.

Role: ETL Specialist
Responsibilities: Provides the design and development of ETL programs/modules.

Role: Web Services Specialist
Responsibilities: Provides the design and development of Web interface applications.

Role: Checkpoint Reviewer
Responsibilities: Provides an independent review of designs and deliverables at key junctures in the project to help assure the quality of the end product.

The distinctions here are fluid and project-dependent. For a given Siperian Hub implementation project, a single team member might be responsible for multiple roles, and a single role might be shared among multiple team members.

Phases in a Siperian Hub Implementation Project

A Siperian Hub implementation can be broken down into five distinct phases:

Discover Phase

Analyze Phase

Design Phase

Build Phase

Deploy Phase

Each phase has specific activities and deliverables.

Note: A sixth phase, the management of steady-state processes for supporting the environment post-deployment, is outside the scope of this document.

Discover Phase

The Discover phase initiates the implementation project and includes the following activities:

Identifying the overall vision driving the need for the project

Analyzing the high-level requirements for the project


Defining scope restrictions for the project

Defining the high-level solution architecture

Project planning and costing, along with all underlying assumptions


Assessing project risk and defining risk mitigation strategies

Defining service level agreements (SLAs) for key systemic qualities, such as scalability, high availability, and performance

Note: Describing the Discover phase is outside the scope of this document.

Analyze Phase

The Analyze phase involves refining the analysis of the system requirements, including:

Detailed source data analysis

Detailed requirements definition

Detailed gap analysis

Evaluation and acquisition of any third party solutions

Refining the solution architecture

Design Phase

The Design phase focuses on translating the requirements of the Analyze phase into concrete designs that can be implemented and tested in the Build phase. It includes:

Data modeling

Interface design

Definition of business rules for cleansing, matching, merging, and maintaining data

Codification of standards and conventions

Definition of test cases



Build Phase

The Build phase focuses on the following activities in a development environment:

Siperian Hub installation and setup


Configuring Siperian Hub to implement the data model and rules defined in the design phase

Fine-tuning the rules

Developing any interfaces between Siperian Hub and the source and target systems

Security and rules configuration

Testing the interfaces and rules

Deploy Phase

The Deploy phase involves:

Deploying the fully built, tested, and accepted solution into a production environment

Wrapping up the project

Handing the system over to the appropriate system support team

Training

Chapter 2: Analyzing Data

This chapter describes activities involved with analyzing data for a Siperian Hub implementation project.

Chapter Contents

Getting Started

Defining the Flow of Data Between Siperian Hub and Source/Target Systems

Consider Data Cleansing and Standardization Rules

Consider Trust Levels and Validation Rules

Consider Match Rules

Getting Started

A critical early step in a Siperian Hub implementation project is to gain a thorough understanding of the data that you are integrating. For example, for each data source, you must know the data's relative accuracy, structure, size, trends in the data, the amount of data, the expected growth of the data set, and any other characteristics that are peculiar to the data.

Data analysis is performed in the Analyze phase. The Analyze phase follows the Discover phase, during which a high-level data analysis is performed in order to identify any data issues or gaps that could impact project scope, timeline, costs, or risks. The Analyze phase includes both data analysis and business and functional requirements analysis. Data analysis and requirements analysis tend to happen in parallel with each other. The findings from data analysis often impact the requirements specification, and vice versa. However, data analysis is not dependent on requirements analysis.


Defining the Flow of Data Between Siperian Hub and Source/Target Systems

Data analysis begins by determining the source systems that will feed data into MRM. You must know exactly what data is coming, and where it is coming from, by understanding what sources feed data into Siperian Hub, as well as what target systems are fed updates from Siperian Hub. At a high level (in the Discover phase), the data flow is just a system-level bubble diagram. By the time the technical design document is completed in the Design phase, it has evolved to the level of specific files or tables.

Determine Data Source Characteristics

For each data source, consider the following tasks:

For each data source, determine the size, data type, data age, quality, quantity, source, and any other characteristics that are peculiar to the data set.

Determine any data quality issues.

Check the primary keys that are available in the data.

Gain an understanding of the data cardinality between entities, as well as consolidation cardinality.

Determine total data volumes, expected delta volumes, and load frequencies per source.

Identify any special initial data load requirements for the system.

Analyze data for invalid conditions, and then perform frequency analysis to determine how often those conditions occur per source.

Differentiate between invalid data conditions that can or cannot be remedied through data cleansing. The latter data conditions are the ones that should be considered in defining trust and validation rules.

It is important to identify what is the more correct data, not just the more correctly formatted data.


Consider which external systems, including source systems, should be updated when data changes in a base object. For example, you might want to update the CRM system whenever a customer's address gets changed. Message queue triggers can be configured in the Hub Console so that data changes can be published to outbound message queues for retrieval by external systems. To learn more, see the Siperian Hub Administrators Guide.

Assemble a Statistically Representative Sample Data Set

To assist in data analysis, assemble a complete, diverse, but statistically representative sample of your production data from each source system. This sample should contain various types of non-identical duplicates. The more closely the sample data reflects the typical characteristics of the production data set, the more useful it will be. Having a sample data set is an invaluable resource for designing, configuring, and testing match rules.

Consider Data Sizing

Developing detailed knowledge about data sources provides the basis for correctly sizing your MRM implementation. Consider the following factors:

data volume: number of rows, size of rows, large data sets, amount of raw data, ratio of raw to consolidated records, how "matchy" the data is

data volatility: the frequency of updates to the data within the source system

load frequency: how often this data will be brought into MRM to update the master records

data model: number of base objects

history retention and audit requirements

number of source systems

match rules

performance requirements, if applicable


Consider the Relationship Between Data and Business Processes


It is essential to understand:

the importance of each column's data to the business processes and business users that produce it

the quality of the data capture processes and data validation processes in each source system

how closely your use of the data aligns with the purposes of the people with whom the data originates (the closer the alignment, the more reliable the data)

Consider Data Cleansing and Standardization Rules

When analyzing data, consider source attributes that would benefit from data cleansing and standardization rules. Cleanse lists are intended to facilitate data conversion during the staging process to ensure that the data that ends up in the staging table is in a standardized, consistent format. For each source, the appropriate transformation from source-specific codes to the standard codes can be achieved with a cleanse list maintained in MRM. This will also enable the base objects to contain the actual standardized code values (as opposed to the Rowid_Object pointing to the standard code value).

If cleanse lists are used to standardize codes, then a lookup table can be set up in MRM for each code to validate the code during data loading, ensuring that any record containing an erroneous code for which there is not a cleanse list entry does not get propagated into the base objects.
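The interplay between a cleanse list and a lookup table can be sketched outside of MRM in a few lines of Python. This is only an illustration of the idea, not how MRM implements it: the source system names, the gender codes, and the function names (cleanse_code, stage_records) are all hypothetical.

```python
# Hypothetical cleanse list: maps (source system, source code) to a standard code.
CLEANSE_LIST = {
    ("CRM", "M"): "MALE",
    ("CRM", "F"): "FEMALE",
    ("BILLING", "1"): "MALE",
    ("BILLING", "2"): "FEMALE",
}

# Hypothetical lookup table holding the valid standard codes.
GENDER_LOOKUP = {"MALE", "FEMALE"}


def cleanse_code(source_system, source_code):
    """Return the standard code, or the original value if no cleanse entry exists."""
    key = (source_system, source_code.strip().upper())
    return CLEANSE_LIST.get(key, source_code)


def stage_records(source_system, records):
    """Cleanse each record, then split into (accepted, rejected) using the lookup."""
    accepted, rejected = [], []
    for record in records:
        cleansed = dict(record, gender=cleanse_code(source_system, record["gender"]))
        if cleansed["gender"] in GENDER_LOOKUP:
            accepted.append(cleansed)
        else:
            # No cleanse list entry produced a valid code, so the record is held back.
            rejected.append(cleansed)
    return accepted, rejected


accepted, rejected = stage_records(
    "BILLING",
    [{"id": 1, "gender": "1"}, {"id": 2, "gender": "9"}],
)
```

In this sketch the record with the unmapped code "9" ends up in the rejected list, which mirrors the goal of keeping erroneous codes out of the base objects.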

Consider Trust Levels and Validation Rules

During the analysis and design phases of a project, it is important to identify the factors affecting the trust levels of your source data, and to determine what validation rules need to be implemented. Although configuring trust levels occurs later in the Siperian Hub implementation process, you should begin thinking about trust level settings and validation rules during data analysis. As you analyze the data, you learn more about its varying levels of accuracy. This knowledge contributes to the trust rules design.


The quality of the data (as defined by the relative importance of the source system and the relative quality of the data coming from that source system) is the main factor in determining trust settings. If you find during data analysis that some data is typically erroneous, then you probably want to give it a lower trust score.


To learn more about defining trust settings and validation rules, see Chapter 4, Using Trust Settings and Validation Rules. For more information on using the MRM tools to set trust levels, see the Siperian Hub Administrators Guide.

Trust Levels

In MRM, the Siperian Trust Framework ensures that its consolidated records, at the cell level, contain the most reliable information available from the data sources. Trust is a mechanism for measuring the confidence factor associated with each cell based on its source system, change history, and other business rules. Trust takes into account the validity of the data, the age of the data, and how much its reliability has decayed over time. For more information about trust settings, see Using Trust Levels on page 52.

Trust is assigned at the column level. It can be specified, for example, that Source System 1 is more reliable for customer name but Source System 2 is more reliable for phone number. There are several parameters that can be set to assign Trust for each source system's column, such as:

Maximum (initial) Trust level for a new data value

Minimum Trust level for an old data value

Decay Period or length of time that the trust level takes to decay from the Maximum Trust to the Minimum Trust

Decay Type or the shape of the decay curve (a straight line or a curve)

For example, the Email Address from a Web application might be assigned Maximum Trust of 80, Minimum Trust of 20, Decay Period of 1 year, and Decay Type of SIRL (Slow Initial, Rapid Later), indicating a curve that decays gently at first and more rapidly later.
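To make the four parameters concrete, the following Python sketch computes a trust score for a value of a given age. It is an illustration only. The linear case follows directly from the description of a straight-line decay; the exact curve that MRM uses for SIRL is not given here, so the quadratic shape below is an assumption chosen only because it decays gently at first and more rapidly later. The function name trust_score is hypothetical.

```python
def trust_score(age_days, max_trust, min_trust, decay_days, decay_type="LINEAR"):
    """Illustrative trust decay from max_trust down to min_trust over decay_days."""
    if decay_days <= 0:
        raise ValueError("decay_days must be positive")
    # Fraction of the decay period that has elapsed, clamped to [0, 1].
    fraction = min(max(age_days / decay_days, 0.0), 1.0)
    if decay_type == "LINEAR":
        shape = fraction
    elif decay_type == "SIRL":
        # ASSUMPTION: a quadratic stands in for the real SIRL curve.
        shape = fraction ** 2
    else:
        raise ValueError(f"unsupported decay type: {decay_type}")
    return max_trust - (max_trust - min_trust) * shape


# Email Address example: Maximum Trust 80, Minimum Trust 20, Decay Period 1 year.
halfway_linear = trust_score(182.5, 80, 20, 365, "LINEAR")
halfway_sirl = trust_score(182.5, 80, 20, 365, "SIRL")
```

Halfway through the decay period, the assumed SIRL curve still reports a higher trust score than the straight line, and both reach the Minimum Trust once the Decay Period has elapsed.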

In addition to internal data sources, consider data sources that are not controlled within your organization. For example, suppose your organization purchases data sets from a third-party provider. These data sources might be guaranteed to consist of unique records with a high level of accuracy. Accordingly, you could decide to designate a high level of trust for this data.


Validation Rules

A validation rule tells Siperian Hub the condition under which a data value is not valid. If data meets the criterion specified by the validation rule, then the trust value for that data is downgraded by the percentage specified in the validation rule. To learn more about validation rules, see Using Validation on page 65.

Here are some examples of validation rules:

Downgrade trust on Last Name if length(last_name) < 3 and last_name <> 'NG'

Downgrade trust on middle_name if middle_name is null

Downgrade trust on Address Line 1, City, State, Zip and Valid_address_ind if Valid_address_ind = False

If the Reserve Minimum Trust flag is enabled (checked) for a column, then the trust cannot be downgraded below the columns minimum trust setting.
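The effect of a validation rule on a trust score, including the Reserve Minimum Trust flag, can be sketched as follows. This is a minimal illustration under the assumption that the downgrade is applied as a simple percentage reduction of the current trust value; the function names and the sample numbers are hypothetical, and MRM's internal calculation may differ.

```python
def middle_name_is_invalid(record):
    """Example rule condition: middle_name is null."""
    return record.get("middle_name") is None


def apply_validation(trust, downgrade_pct, min_trust, reserve_minimum_trust):
    """Downgrade trust by a percentage, optionally flooring it at min_trust."""
    # ASSUMPTION: the downgrade is a plain percentage reduction.
    downgraded = trust * (1 - downgrade_pct / 100)
    if reserve_minimum_trust:
        return max(downgraded, min_trust)
    return downgraded


record = {"first_name": "Ana", "middle_name": None}
trust = 30
if middle_name_is_invalid(record):
    with_reserve = apply_validation(trust, 50, 20, reserve_minimum_trust=True)
    without_reserve = apply_validation(trust, 50, 20, reserve_minimum_trust=False)
```

With the flag enabled, the downgraded value stops at the column's minimum trust of 20; with it disabled, the same rule takes the value down to 15.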

Consider Match Rules

Although configuring match rules occurs later in the Siperian Hub implementation process, you should begin thinking about match rules during data analysis because the data analysis will turn up data characteristics that govern the match rules. Therefore, as you analyze data, do so with match rules in mind.

During data analysis, identify which columns are appropriate for matching. For example, if a gender column is null 80% of the time, then this column is probably not a column to use in a match rule. Similarly, investigate the distribution of data so that you can assess in advance how selective a match rule needs to be for certain columns.
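A quick column profile over the sample data set is one way to gather these numbers. The Python sketch below computes the null rate and the number of distinct values for a column; the sample rows, column names, and the function name profile_column are hypothetical, and real profiling would normally run against the source database or a profiling tool.

```python
from collections import Counter


def profile_column(rows, column):
    """Return null rate, distinct count, and most common value for one column."""
    values = [row.get(column) for row in rows]
    total = len(values)
    non_null = [v for v in values if v not in (None, "")]
    counts = Counter(non_null)
    return {
        "null_rate": (total - len(non_null)) / total if total else 0.0,
        "distinct": len(counts),
        "most_common": counts.most_common(1)[0] if counts else None,
    }


# Hypothetical sample rows.
rows = [
    {"gender": None, "postal_code": "94105"},
    {"gender": None, "postal_code": "10001"},
    {"gender": None, "postal_code": "94105"},
    {"gender": None, "postal_code": "60601"},
    {"gender": "F", "postal_code": "73301"},
]
gender_profile = profile_column(rows, "gender")
postal_profile = profile_column(rows, "postal_code")
```

Here the gender column is null in four of five rows, which marks it as a poor match column, while postal_code is fully populated and reasonably selective.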


To learn more about defining match rules, see Chapter 5, Configuring and Tuning Match Rules. For more information on using the MRM tools to configure match rules, see the Siperian Hub Administrators Guide.



Chapter 3: Designing the Data Model

This chapter describes what implementers need to know before building the data model for a Siperian Hub implementation project. It is recommended for all implementers and anyone else who must understand the Master Reference Manager data model. To learn more about the data model, see the Siperian Hub Administrators Guide.

Note: This chapter assumes that the reader is familiar with conventional data modeling methodologies. It supplements conventional data modeling techniques with MRM-specific recommendations.

Chapter Contents

About Data Modeling for MRM

Design Principles

Design Patterns

About Data Modeling for MRM

Data modelers and design consultants responsible for defining the data model for MRM require expertise in relational modeling at the conceptual, logical, and physical levels. The following sections introduce the various types of models necessary to develop a Siperian Hub implementation:

Data Model Design Deliverables

Conceptual Model

Logical Model

Physical Model

Data Model Design Deliverables

The process of designing the data model for consolidated reference data for a Siperian Hub implementation involves a series of deliverables. The following figure shows the major phases of the Siperian implementation methodology, along with the data model delivered in each phase.

The design starts with a conceptual model, which identifies the main objects to be managed in MRM. It also identifies which objects will be consolidated, because match criteria ultimately drive modeling decisions for the physical model.

The conceptual model is used as the starting point for the logical model, which provides a logical representation of the entities and attributes to be managed in MRM.

The logical model is transformed into a physical model, which is the model that is then defined in MRM using the Schema Manager in the Hub Console. Transitioning from a logical model to an ideal MRM physical model involves design principles that are described in Design Principles on page 27 later in this chapter. The physical model is the final output from the data modeling design steps, and it is the model that the business and system owners need to approve.

The following figure shows the increasing level of detail and number of entities in conceptual, logical, and physical models.

Conceptual Model

The purpose of the conceptual model is to identify and describe the main objects needed to create a global business view of the data, with little detail. This step is often skipped in typical IT projects, or it might be combined with the logical model. However, for Siperian Hub implementations, it is very important to go through this step because it starts the process of thinking about match requirements, which impact the physical model design.

The conceptual model for a Siperian Hub implementation shows the business entities that will need to be managed in MRM, along with the relationships among the business entities and some high-level design properties. If you have worked with entity relationship diagrams (ERDs), the conceptual model might look similar. To facilitate logical and physical (or logical to physical) data model design, the Match and Merge and Intertable Match Parent properties are the most critical properties to identify (to learn more, see the Siperian Hub Administrators Guide). One approach is to begin with the worst case match scenario, determine the elements in the token match table, and then trim this down to the tables that would be realistically used for matching.



The following figure shows an example of a conceptual data model.


The conceptual model must be derived from the system requirements, with inputs from analyses of internal and external business system data sources.

Note: For some projects, a pre-existing logical data model might be available. In such cases, it is still important to create a conceptual data model to ensure that you have identified the Match and Merge requirements that can have a significant impact on the subsequent physical data model.

Logical Model

The purpose of building a logical model is to confirm that the application will satisfy the business requirements. A logical model represents the entities, relationships, and attributes that are representative of the business information needs. A logical model is usually a normalized model. Normalization is the process of determining stable attribute groupings in entities with high interdependency and affinity.

By defining entities, attributes, and their relationships, you might discover data model design flaws that could produce anomalies. Data flaws include:

Missing entities

Multiple entities that represent the same conceptual entity


Many-to-many relationships that need additional entities to resolve the many-to-many relationship by creating an intersection table, thus turning the many-to-many relationship into two one-to-many relationships.

Multivalued and redundant attributes
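The many-to-many case in the list above can be illustrated with a small, generic relational sketch using Python's built-in sqlite3 module. It is not MRM-specific: the table names, column names, and sample rows are hypothetical, and the point is only to show an intersection table turning one many-to-many relationship into two one-to-many relationships.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE address  (address_id  INTEGER PRIMARY KEY, city TEXT);
    -- Intersection table: one row per customer/address pairing.
    CREATE TABLE customer_address (
        customer_id INTEGER REFERENCES customer(customer_id),
        address_id  INTEGER REFERENCES address(address_id),
        PRIMARY KEY (customer_id, address_id)
    );
    INSERT INTO customer VALUES (1, 'Acme'), (2, 'Globex');
    INSERT INTO address  VALUES (10, 'Boston'), (20, 'Denver');
    INSERT INTO customer_address VALUES (1, 10), (1, 20), (2, 10);
""")


def addresses_for(customer_id):
    """Cities linked to a customer through the intersection table."""
    rows = conn.execute(
        "SELECT a.city FROM address a "
        "JOIN customer_address ca ON ca.address_id = a.address_id "
        "WHERE ca.customer_id = ? ORDER BY a.city",
        (customer_id,),
    )
    return [city for (city,) in rows]


def customers_at(address_id):
    """Customer names linked to an address through the intersection table."""
    rows = conn.execute(
        "SELECT c.name FROM customer c "
        "JOIN customer_address ca ON ca.customer_id = c.customer_id "
        "WHERE ca.address_id = ? ORDER BY c.name",
        (address_id,),
    )
    return [name for (name,) in rows]
```

One customer can have several addresses and one address can serve several customers, yet each foreign key in customer_address is a plain one-to-many reference.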

Example Logical Model with Design Flaws

The following figure shows an example of a logical model that has some design flaws.

This logical model is based on the previous conceptual model example shown in the figure in Conceptual Model on page 19. It has the following design flaws:

1. Affiliation Role probably needs a Lookup table to define the different types of Affiliation Roles (missing entity).

2. Repeating attributes (phone numbers, fax numbers, email addresses) can be normalized into an Electronic Address entity.



Example Logical Model with Fixed Design Flaws

The following figure shows the logical model after it has been fully normalized and missing entities have been added.

The logical model includes the following new entities:

3. An Electronic Address entity has been added to handle the repeating phone and fax number attributes (which have therefore been removed from the Customer Address intersection table).

4. An Electronic Address Type table has been added to provide definitions for the types of electronic address represented in each record.

5. An Affiliation Role lookup table has been added.


Pre-Existing and New Logical Models

Before considering how the logical model will transition to a physical model, it is important to get the logical model right. In some Siperian Hub implementations, a pre-defined logical model is available. In such situations, you still need to evaluate the logical model to make sure that:

it meets the stated business needs

it makes sense logically

the entities and attributes in the logical model can be populated from the source systems (there is little point in modeling entities or attributes that cannot be populated from the source systems)

The pre-existing logical model might not be tuned to work particularly well in MRM. Therefore, you will need to determine how to transition that logical model to a suitable physical model.

In other Siperian Hub implementations, you will need to define the logical model from scratch. In such cases, the logical model can be defined in a way that suits the business needs and is more closely aligned with the models for which MRM is tuned.

Objects in the Logical Model

When modeling for MRM, the logical model must focus on the actual entities that will be defined in MRM as base objects or dependent objects.

Objects in the Logical Model

Type of Object: Base Objects
Description: Used to describe central business entities, such as Customer, Product, or Employee. In a base object, data from multiple sources can be consolidated or merged. Trust settings are used to determine the most reliable value for each base object cell. In addition, one-to-many relationships (foreign keys) can be defined between base objects.

Type of Object: Dependent Objects
Description: Used to store detailed information about the rows in a base object (such as header-detail relationships). One row in a base object table can map to several rows in a dependent object table.



You do not model history, cross-references, and so on, as MRM automatically creates and manages these structures for you. In addition, avoid adding landing tables or staging tables to the logical model, because they clutter the model unnecessarily. You can model landing tables as part of the physical model.


Remember that the logical model is not an enterprise-wide data model. The logical model is a model for reference data only, and it is usually only for a specific subset of the reference data (such as Customer data). Similarly, do not include transaction data in the logical model, and limit the model to the reference data that is to be managed in MRM. Finally, bear in mind that the physical model, not the logical model, is the actual model that you will implement for MRM.

Physical Model

The physical model is the actual model that you define using the Schema Manager in the Hub Console (to learn more, see the Siperian Hub Administrators Guide). It is thus a subset of the complete physical schema that will be generated by MRM. The physical model diagram shows the base objects, dependent objects, and landing tables to be implemented in MRM.

The rule of thumb for physical model diagrams is to show the user-defined entities and attributes, plus the primary and foreign keys, so that relationships can be modeled correctly. In the physical model, avoid showing MRM-generated entities or attributes other than primary and foreign keys. All supporting tables (such as cross-references, history tables, control tables, and staging tables) will be created by MRM and therefore are not included in the physical model diagram.

MRM is flexible enough to implement any logical model as a physical model, but it is tuned to work better with some types of models than with others. Performance is the main driver for differences between the logical model and the physical model. Before you develop a physical model for a Siperian Hub implementation, you must carefully review your logical model in light of its performance implications. An ideal physical model for MRM strikes a balance between a completely denormalized model (best performance) and a highly normalized model (best flexibility).


The following figure shows an example physical model based on the logical model described previously.

Notice that all of the entities defined in the logical model will be implemented as base objects and that ROWID_OBJECT is used for all primary keys. In addition, notice that the many-to-many relationship between the Customer and Address entities in the logical model has been changed to a one-to-many relationship in the physical model. The reasons for these changes will be explained in the Design Principles on page 27 section later in this chapter.

When designing the physical model, consider the following factors:

Required Functionality

Performance and Scalability

Flexibility for Future Use

Siperian Product Roadmap



Required Functionality

Required functionality is one of the key factors affecting design decisions in the physical model. Some examples of functionality requirements include:

If you must keep a history of changes to attributes of an object, then define that object as a base object.

Performance and Scalability

A completely denormalized model gives the fastest performance, particularly for merge and unmerge, as there are fewer child tables to be updated on merge or unmerge. However, a completely denormalized model limits both flexibility and functionality.

The more denormalized the model, the fewer levels of consolidation are available, and the more difficult it can be to add new data sources and new attributes or entities in the future. You must therefore find a balance between modeling for performance (denormalizing) and modeling for functionality/flexibility (normalizing). You should not denormalize simply for the sake of denormalizing. There are some areas that are better to denormalize than others, as they yield the most performance benefit with the least functionality/flexibility loss. These issues are discussed in detail in the Design Principles on page 27 section later in this chapter.

Flexibility for Future Use

When defining the physical model, it is important to keep possible future requirements in mind, but without adding entities or attributes that cannot yet be maintained or that are not yet fully understood. Sometimes building in system flexibility is as simple as naming things flexibly. For example, if you are building a Customer master for Organization data and you know that the plan is to add Person data to that Customer master within the next year, then consider using a name other than Organization (such as Business Party) for the Customer table because the table may well end up containing both Organization and Person data.

Be wary of adding physical limitations that might later cause problems. One example of this is specifying user-defined unique keys on base objects. If you define a unique key on a base object, you cannot merge records in that base object. Although this might not be a problem in the initial implementation of a project, it is not uncommon for new sources that are later added to a system to bring their own values for the base object with the unique key, making it desirable to use match and merge functionality to consolidate the new system's data with the original system's data.


Siperian Product Roadmap

An optimal physical design for a Siperian Hub implementation takes into account what is known of future requirements, the Siperian product roadmap, and the intersection between them. If you have any questions about how your model relates to the Siperian product roadmap, arrange (through Siperian Support) for a data model review with Siperian Solutions Delivery and Engineering.

If you model types of objects (such as households) or types of relationships that are not discussed in this document, then you should review the data model with your Siperian Solution Architect to make sure that the model does not run contrary to any assumptions in MRM design, QA, or planned features. This review should be conducted as part of the data model checkpoint review that should already be built into your project plan.

Design Principles

This section describes some underlying design principles for transitioning from a highly normalized logical model to a physical model.

Principle 1: Consider Deep Versus Wide

Principle 2: Match Requirements Drive the Model

Principle 3: Consolidation Counts

Principle 4: Pass the Independence Test

Principle 5: Mix Different Types of Customers Carefully

Principle 6: Landing and Staging Data


Principle 1: Consider Deep Versus Wide

This design principle refers to the number of direct child tables linked to a parent table. The following figure shows the two different types of designs.

This principle applies when you want to merge or unmerge on the parent table. The design principle mainly affects performance of the merge and unmerge processes.

The more directly linked child tables a parent table has, the more tables must have foreign key references updated when records merge in the parent table. Therefore, the more child tables a parent table has, the slower merges on the parent table will be.

This principle applies to the unmerge process as well. For unmerges in a deep model, consider how far you allow unmerges to cascade. Which child tables need to have cascade unmerge enabled? How many child tables deep should you choose to enable the Unmerge on Parent Unmerge flag? The more child tables you have with merged records and the Unmerge on Parent Unmerge flag enabled, the more work the unmerge needs to do, and therefore the slower the unmerge process.
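The cost described above can be pictured with a toy model in Python. This sketch is not how MRM performs a merge internally; it only illustrates, with hypothetical table names and rows held in memory, that every directly linked child table has to be visited so that foreign keys pointing at the losing parent record are re-pointed to the surviving one.

```python
def merge_parent(survivor_id, loser_id, child_tables):
    """Re-point child foreign keys from loser to survivor; return rows updated per table."""
    updated = {}
    for table_name, rows in child_tables.items():
        count = 0
        for row in rows:
            if row["parent_id"] == loser_id:
                row["parent_id"] = survivor_id
                count += 1
        updated[table_name] = count
    return updated


# Hypothetical "wide" design: three child tables hang directly off the parent.
wide_children = {
    "phone": [{"parent_id": 2}, {"parent_id": 1}],
    "email": [{"parent_id": 2}],
    "address": [{"parent_id": 2}, {"parent_id": 2}],
}
result = merge_parent(survivor_id=1, loser_id=2, child_tables=wide_children)
```

Each additional directly linked child table adds another pass to the loop, which is the intuition behind weighing wide designs against deep ones when merge performance matters.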


Principle 2: Match Requirements Drive the Model

Match criteria also drive physical data model decisions with respect to functionality. Intertable match criteria involve the use of attributes from one table in the match rules of a related table (for example, matching customers using address information from the Address table). For more information, see Address Example on page 31.

Another area in which required match functionality can affect the physical model is the way in which match rules must be defined. If you need to define an AND match rule, you need to denormalize repeating attributes that are to be used in the match rule. Normalizing repeated attributes into a child table allows OR match rules on the normalized attributes, not AND match rules.

For example, if you create an Electronic Address table that contains phone numbers and e-mail addresses, you can use these in a match rule that identifies records as matching if their phone numbers are the same OR if their email addresses are the same. If you need a match rule that identifies records as matching if their phone numbers match AND their email addresses match, then you need to denormalize these into separate columns.
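The difference between the two structures can be sketched in a few lines. The customer identifiers, phone numbers, and e-mail addresses below are hypothetical, and the functions `or_matches` and `and_matches` are illustrative stand-ins, not Siperian Hub match rules. The sketch assumes that, in the normalized form, each child row is compared on its own, so a shared row of either type is enough to produce a match.

```python
from itertools import combinations

# Hypothetical normalized Electronic Address rows: (customer_id, type, value).
normalized = [
    ("A", "phone", "555-0100"), ("A", "email", "a@example.com"),
    ("B", "phone", "555-0100"), ("B", "email", "a@example.com"),
    ("C", "phone", "555-0100"), ("C", "email", "c@example.com"),
]

# The same data denormalized into one row per customer with separate columns.
denormalized = {
    "A": {"phone": "555-0100", "email": "a@example.com"},
    "B": {"phone": "555-0100", "email": "a@example.com"},
    "C": {"phone": "555-0100", "email": "c@example.com"},
}


def or_matches(rows):
    """Pairs of customers that share at least one (type, value) row."""
    by_value = {}
    for customer_id, kind, value in rows:
        by_value.setdefault((kind, value), set()).add(customer_id)
    pairs = set()
    for customers in by_value.values():
        pairs.update(combinations(sorted(customers), 2))
    return pairs


def and_matches(rows_by_customer):
    """Pairs of customers whose phone AND email columns are both equal."""
    pairs = set()
    for left, right in combinations(sorted(rows_by_customer), 2):
        a, b = rows_by_customer[left], rows_by_customer[right]
        if a["phone"] == b["phone"] and a["email"] == b["email"]:
            pairs.add((left, right))
    return pairs


print(sorted(or_matches(normalized)))
print(sorted(and_matches(denormalized)))
```

In this data, the OR comparison pairs all three customers through the shared phone number, while the AND comparison pairs only A and B.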

The following figure shows an example of a normalized Electronic Address table that supports OR match conditions only.

This Electronic Address table supports match rules in which phone numbers match OR e-mail addresses match. In the example shown, Customer IDs 12345, 45678, and 00001 would all be identified as matches for one another because of their matching phone numbers.


The following figure shows an example of denormalized attributes to support AND match conditions.

Logically, this table shows the same data as in the normalized Electronic Address table, but the physical structure has been denormalized to support match conditions that specify AND criteria. In this example, Customer IDs 12345 and 45678 would match because their phone numbers match AND their e-mail addresses match. Customer ID 00001 would not be considered a match for the other two records because it has a different e-mail address. For more information, see Communication Channel Models on page 46.

Principle 3: Consolidation Counts

The physical model must take into account the required results after consolidation and, particularly, the desired cardinality of base object to cross-references after consolidation (where cardinality is the ratio of the number of records in the base object to the number of records in the cross-reference). The physical model must also consider the effects of source updates on the surviving record. This section describes several examples to illustrate this principle.

Physician Specialties Example

A physician can have one or more specialties. Pharmaceutical companies are often interested in identifying only the primary specialty for a physician. However, when two physician records are merged from different sources, those sources might provide different values for the physician's primary specialty. If the required cardinality after merging the specialties is one surviving primary specialty, then you should include Primary Specialty as a column on the Physician base object. However, if the pharmaceutical company wants to keep all of the specialties for the merged physician record, then Physician Specialty must be a child table of the Physician table.


Address Example

Logically, a single address can belong to multiple customers. For example, office addresses can be shared by colleagues at the same location, or group practice addresses can be shared by partners in the same law firm. Of course, a customer can also have multiple addresses. For this reason, logical models usually have customer and address as distinct entities with a many-to-many intersection table between them.

However, in a physical model for consolidated data, this approach is not necessarily practical, especially if you are trying to reduce duplication in addresses from multiple sources. Consolidating addresses when they are not directly linked to customers means that you are consolidating addresses across customers. For example, in the following figure, N.E. One and Ann Other both have the same address. If the two address records are merged, then one surviving address record will remain, and that record will be linked to both N.E. One and Ann Other through the Customer Address intersection table.

Avoid consolidating addresses across customers unless there is a real business need for an enterprise-wide unique ID per physical address location. Even if there is a real business need, there are other ways to model this instead. For more information, see Design Patterns on page 42.

Consolidating addresses across customers raises three concerns: limiting address changes to the right customers, performance, and functionality.


Limiting Address Changes to the Right Customers

If one Customer changes their address, then you need to make sure that the address change is not automatically applied to the consolidated address record for all customers. For example, in the figure shown in Address Example on page 31, if N.E. One moves their office, it does not mean that Ann Other has also moved their office, so the consolidated address that was previously linked to both N.E. One and Ann Other now belongs only to Ann Other.

Performance Considerations

Consolidating addresses across customers means that you usually have a high degree of cardinality between the source addresses and the resultant consolidated addresses. The higher the number of duplicate records, the more work the merge must do to process them. The cardinality is reduced if Customer ID is one of the match criteria for addresses (that is, if addresses are consolidated only within customer records, not across them). The following figure shows the recommended approach for customer address relationships.

Using this approach also reduces the number of tables that must be staged and loaded. This approach does not necessarily yield a large performance gain if your implementation involves only a handful of source systems to process. However, the more source systems that are configured to process, the greater the performance impact that each additional target table has on stage and load batches. For example, a Siperian Hub implementation with five sources for the previous model (shown without consolidated addresses in Business Party and Differentiated Customer Models on page 36) requires 15 stage jobs and 15 load jobs. An implementation with ten sources for that same model requires 30 stage and 30 load jobs. For the model with consolidated addresses, five sources require ten stage and ten load jobs, and ten sources require 20 stage and 20 load jobs.
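The job counts quoted above are consistent with one stage job and one load job per source system per target table. The following sketch reproduces that arithmetic. The table counts of three and two are an assumption inferred from the quoted figures (15 jobs for five sources, ten jobs for five sources), and the function name `batch_job_counts` is hypothetical.

```python
def batch_job_counts(source_systems: int, target_tables: int) -> dict:
    """Stage and load jobs, assuming one of each per source per target table."""
    jobs = source_systems * target_tables
    return {"stage": jobs, "load": jobs}


# Assumed table counts: 3 for the first model, 2 for the second.
for sources in (5, 10):
    for tables in (3, 2):
        counts = batch_job_counts(sources, tables)
        print(f"{sources} sources x {tables} tables -> {counts}")
```

Under this assumption, each additional target table adds one stage job and one load job for every configured source system.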

Functionality Considerations

Modeling customer address as a direct (one-to-many) relationship between customer and address means that customer address attributes can be stored directly on the Customer Address base object or as a child base object linked to Customer Address. As long as the attributes are part of a base object, MRM tracks their history. This approach also means that Customer can use attributes from child tables of the Customer Address table for matching.

Similarly, keeping customer address attributes in base objects means that duplicate or overlapping attribute values from multiple sources can be consolidated to get to best-of-breed values for those attributes.

Principle 4: Pass the Independence Test

Independent base objects are base objects that are not linked to the core consolidated object through a one-to-many or a many-to-one relationship, but are instead linked through many-to-many intersection tables. If a base object is modeled as an independent base object, then its records should make sense on their own, without any reference to the core base object. It should make sense to consolidate its records to a distinct set of values.

Steps for Testing Independence

The independence test for a physical model includes the following steps:

1. Identify the core base object that is being consolidated in the Hub Store: Customer in a Customer Master, Supplier in a Supplier Master, and so on.

2. Look for any many-to-many relationships (direct or indirect).

3. Inspect the base object that is on the other side of the many-to-many relationship and ask the question: What can the business do with a distinct list of the things in this object without knowing who the Customer is? If the answer is Nothing, then change the many-to-many to a one-to-many relationship.
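The steps above can be sketched as a simple check over a list of relationships. The `Relationship` class, the function `apply_independence_test`, and the sample answers are hypothetical. The answer to the business question must come from the modeler, so the sketch takes it as input (an empty string stands for "Nothing") instead of trying to derive it.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Relationship:
    core: str          # core base object, for example Customer
    other: str         # base object on the other side of the relationship
    cardinality: str   # "many-to-many" or "one-to-many"


def apply_independence_test(
    relationships: List[Relationship],
    standalone_use: Dict[str, str],
) -> List[Relationship]:
    """Convert many-to-many relationships whose far side has no standalone use."""
    result = []
    for rel in relationships:
        has_use = bool(standalone_use.get(rel.other, ""))
        if rel.cardinality == "many-to-many" and not has_use:
            result.append(Relationship(rel.core, rel.other, "one-to-many"))
        else:
            result.append(rel)
    return result


model = [
    Relationship("Customer", "Specialty", "many-to-many"),
    Relationship("Customer", "Address", "many-to-many"),
    Relationship("Customer", "Electronic Address", "many-to-many"),
]

# Answers supplied by the modeler; an empty string means "Nothing".
answers = {
    "Specialty": "lookup list for data capture",
    "Address": "",
    "Electronic Address": "",
}

revised = apply_independence_test(model, answers)
for rel in revised:
    print(rel.other, "->", rel.cardinality)
```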


Example Using a Highly Normalized Model

The following figure shows an example of a highly normalized model.

In this model, Specialty, Address, and Electronic Address are all linked to the core object (Customer) through many-to-many relationships. You can therefore apply the independence test by asking the following questions:

Question: What can the business do with a distinct list of Specialties without knowing who the Customer is?
Answer: The distinct list of Specialties can be used to provide a pick or lookup list of Specialty values in a capture screen for new Customer information. The business wants to standardize the list of Specialties it uses in reporting by assigning each source specialty to a consolidated enterprise specialty value.

Question: What can the business do with a distinct list of Addresses without knowing who the Customer is?
Answer: In most cases, the answer to this question is Nothing. Addresses are usually meaningful only in terms of the Customer to whom the Address belongs.

Question: What can the business do with a distinct list of Electronic Addresses (for example, telephone numbers) without knowing who the Customer is?
Answer: Nothing. A telephone number has no significance in its own right.


Converting relationships from many-to-many to one-to-many for the objects that failed the independence test would result in the model shown in the following figure.


Principle 5: Mix Different Types of Customers Carefully

In Siperian Hub implementations, you must be careful when mixing different types of customers.

Business Party and Differentiated Customer Models

This principle focuses on the consequences of implementing two different models: a Business Party model versus a Differentiated Customer model, which are shown in the following figure.

Business Party Model: All Customer records are loaded into the same Business Party table, and an attribute on that table identifies the type or classification of the Customer records. In this example of a Business Party model, the Class of Customer attribute distinguishes Organizations from Individuals.

Differentiated Customer Model: The type or classification of the Customer records is implied by where the records are stored. In this example of a Differentiated Customer model, the Organization table holds Customers classified as Organizations, and the Individual table holds Customers classified as Individuals.

Data modelers often prefer the Differentiated Customer model because it reduces null attributes on the Customer table (for example, the Organization Customer does not need to carry any attributes that apply only to an Individual Customer). However, there are definite advantages to using a Business Party model over a Differentiated Customer model, even if it does result in more null attributes on the Business Party table. Such advantages include:

The Business Party model easily supports any number of chained relationships between different classes of customers and/or the same classes of customers.

The Business Party model allows you to model networks, not just parent/child hierarchies.

The Business Party model provides a single unique identifier for each Customer without any chance of overlap.

The Business Party model allows you to search for Customers in one place without needing to know anything about the type of Customer.

The Business Party model allows you to identify source records that have given Customers incorrect types.

Mixing Models

In your Siperian Hub implementation, you might decide to implement a Business Party model so that you get one unique Customer identifier and you can model Customer Affiliations flexibly. If you want to avoid too many redundant or null value columns on the Business Party base object, you can use child tables to carry some of the attributes that apply only to particular sub-types of Customers. However, if you do this, you must be very careful about how you mix the Business Party and Differentiated Customer models.


The following figure shows a poor mix of these models.


The following figure shows a better way to mix these models.

This is a better mix than the figure showing a poor mix of models because it simplifies the relationships between the objects and reduces the number of cross-table joins required to get the match data. The preferred model is still the full Business Party model shown in Business Party and Differentiated Customer Models on page 36, as that reduces the number of child tables to be maintained on merge and unmerge.

The Customer match attributes have been denormalized so that they are attributes of the Business Party base object instead of the Organization and Individual base objects. This reduces the number of cross-joins used in populating the match token.

In the better mix, all relationships have been defined at the Business Party level, making it easier to navigate and maintain the relationships. The poor mix has an uneasy mixture of relationships, with Addresses having nullable foreign keys to either Individual or Organization.

If merge performance is a concern, then consider using a pure Business Party model, as shown in the figure in Business Party and Differentiated Customer Models on page 36.


Principle 6: Landing and Staging Data

This principle considers how you design landing and staging tables in your Siperian Hub implementation.

Landing Table

Although we have no strong design recommendations with respect to landing tables, consider the following issues for your Siperian Hub implementation:

Some implementations have used source-specific landing tables (a landing table per source table/source file). This keeps the landing table format closer to the source format and means that the ETL process does not need to transform all sources to a standard layout, which could simplify the process of making changes for one source or adding new sources with different attributes later. However, it usually also means a very large number of landing tables, which can be tedious and cumbersome to set up.

Other implementations have used one landing table per target table, which means that the ETL needs to transform all sources for the same target to the same standard layout. This approach does allow the ETL to be standardized, making it much faster to develop and test for the first implementation (where typically a large number of sources need to be coded). It is possible that this approach also makes it more costly to maintain after initial deployment, because changes from one source could potentially affect multiple ETL mappings.

If you use one landing table per target table in your Siperian Hub implementation, then the landing table needs to include a source identifier, which must be used in filtering the data mapped to each staging table. The landing table should also have a range partition specified in Oracle to partition it according to source system, which allows partitions to be truncated before the ETL inserts data from a source, rather than having records deleted from the landing table.
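The reload pattern described above can be sketched with an in-memory stand-in for a shared landing table. The class `PartitionedLandingTable`, the column name `src_system`, and the source names are hypothetical. The sketch does not show Oracle partition DDL. It shows only the behavior that partitioning by source system makes possible: clearing one source's rows in a single step without touching the rows of any other source.

```python
from typing import Dict, List


class PartitionedLandingTable:
    """In-memory stand-in for a shared landing table partitioned by source."""

    def __init__(self) -> None:
        self._partitions: Dict[str, List[dict]] = {}

    def truncate_partition(self, source_system: str) -> None:
        # Stands in for truncating one partition; other sources are untouched.
        self._partitions[source_system] = []

    def insert(self, source_system: str, rows: List[dict]) -> None:
        # Every row carries the source identifier used later for filtering.
        partition = self._partitions.setdefault(source_system, [])
        for row in rows:
            partition.append({**row, "src_system": source_system})

    def rows_for_staging(self, source_system: str) -> List[dict]:
        # Filter on the source identifier when mapping to a staging table.
        return list(self._partitions.get(source_system, []))


landing = PartitionedLandingTable()
landing.insert("CRM", [{"name": "N.E. One"}])
landing.insert("BILLING", [{"name": "Ann Other"}])

# Reload CRM: truncate its partition first, then insert the new extract.
landing.truncate_partition("CRM")
landing.insert("CRM", [{"name": "N.E. One"}, {"name": "A. N. Example"}])

print(landing.rows_for_staging("CRM"))
print(landing.rows_for_staging("BILLING"))
```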


Staging Tables

Staging tables must be based on the columns provided by the source system for the target base object or dependent object for which the staging table is defined, even if the landing tables are shared across multiple source systems. If you do not make the columns on staging tables source-specific, then you create unnecessary trust and validation requirements.

Trust is a powerful mechanism, but it carries performance overhead. Use trust where it is appropriate and necessary, but not where the most recent cell value will suffice for the surviving record. For more information, see Using Trust Levels on page 52.

If you limit the columns in the staging tables to the columns actually provided by the source systems, then you can restrict the trust columns to those that come from two or more staging tables. Use this approach instead of treating every column as if it comes from every source, which would mean needing to add trust for every column, and then validation rules to downgrade the trust on null values for all of the sources that do not provide values for the columns.
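One way to apply this rule is to list the columns of each staging table and keep only those supplied by two or more of them. The staging table and column names below are hypothetical, and `columns_needing_trust` is an illustrative helper, not a Siperian Hub function.

```python
from collections import Counter
from typing import Dict, List


def columns_needing_trust(staging_columns: Dict[str, List[str]]) -> List[str]:
    """Columns that are supplied by two or more staging tables."""
    counts = Counter(
        column
        for columns in staging_columns.values()
        for column in set(columns)  # count each staging table once per column
    )
    return sorted(column for column, n in counts.items() if n >= 2)


# Hypothetical staging tables for one target base object.
staging = {
    "stg_crm_customer": ["first_name", "last_name", "birth_date"],
    "stg_billing_customer": ["first_name", "last_name", "credit_limit"],
}

trust_candidates = columns_needing_trust(staging)
print(trust_candidates)
```

In this sample, `birth_date` and `credit_limit` each come from a single source, so the most recent value can survive without trust.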

More trust columns and validation rules affect the load and the merge processes. Also, the more trusted columns there are, the longer the update statements for the control table will be. Bear in mind that Oracle and DB2 have a 32K limit on the size of the SQL buffer for SQL statements. For this reason, more than 40 trust columns result in a horizontal split in the update of the control table: MRM will try to update only 40 columns at a time.
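The effect of the split can be pictured as dividing the list of trust columns into groups of at most 40, with one update per group. The sketch below shows only that grouping. How MRM actually builds and issues the update statements is not described here, and the function and column names are hypothetical.

```python
from typing import List


def split_trust_columns(columns: List[str], batch_size: int = 40) -> List[List[str]]:
    """Split the trust columns into consecutive groups of at most batch_size."""
    return [
        columns[start:start + batch_size]
        for start in range(0, len(columns), batch_size)
    ]


# Hypothetical base object with 95 trust columns.
trust_columns = [f"col_{n:03d}" for n in range(95)]
batches = split_trust_columns(trust_columns)

print([len(batch) for batch in batches])
```

With 95 trust columns, the grouping yields three updates instead of one, which is part of the cost of adding trust where it is not needed.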


Design Patterns

This section summarizes the following typical physical data model design scenarios and describes options for implementing them:

Households

Addresses

Populating the Address Household Object

Communication Channel Models

Households

A Household is a grouping of customer records according to geographic location. For example, all of the people living at one address could be considered a household, or a group of doctors practicing at one hospital could be considered a household.

Create Household as a base object that is the parent of Customer. The easiest type of household is one in which the household has no attributes of its own. It uses inter-table match to match on selected Customer match columns that usually include the Address match columns.


The following figure shows an example of a logical model for Households.

Addresses

The ideal model for addresses involves a one-to-many relationship from Business Party to Address, with Address match rules that include Business Party ID to prevent matches across different Business Parties. However, there are occasionally business cases for consolidating addresses across Business Parties, such as to get a single identifying key for all addresses for the same location, regardless of which Business Parties use that address. If there are business reasons for consolidating Addresses regardless of the Business Parties using the Addresses, then the following consolidated address model is recommended.

In this model, the Business Party Address base object consolidates the Addresses per Business Party. The Business_Party_ROWID is one of the match criteria for the Business Party Address base object, and Business Party Addresses should merge only if they have the same Business_Party_ROWID value.
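The effect of including Business_Party_ROWID in the match criteria can be sketched by making it part of the grouping key. The sample rows, the `normalize` helper (a crude stand-in for real address match rules), and the function `match_groups` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List


def normalize(address_line: str) -> str:
    """Crude stand-in for address matching: lowercase and collapse spaces."""
    return " ".join(address_line.lower().split())


def match_groups(addresses: List[Dict]) -> List[List[int]]:
    """Group rows that match on business party ROWID AND normalized address."""
    groups = defaultdict(list)
    for row in addresses:
        key = (row["business_party_rowid"], normalize(row["address_line"]))
        groups[key].append(row["rowid"])
    # Only groups with more than one row are merge candidates.
    return [rowids for rowids in groups.values() if len(rowids) > 1]


addresses = [
    {"rowid": 101, "business_party_rowid": 1, "address_line": "10 Main St"},
    {"rowid": 102, "business_party_rowid": 1, "address_line": "10  main st"},
    {"rowid": 201, "business_party_rowid": 2, "address_line": "10 Main St"},
]

merge_candidates = match_groups(addresses)
print(merge_candidates)
```

Row 201 has the same street address as the other two rows but belongs to a different business party, so it stays out of the merge group.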

The Business Party Address base object gives you the distinct set of addresses for each business party, but it does not give you a distinct set of all the addresses with a unique ID for each unique address. To get a unique set of Address identifiers, the Address base object would need to be included in the data model.

At its simplest, the Address base object does not include any attributes of its own, other than a Status Indicator to indicate whether the Address ID is active or inactive. Instead, it uses intertable match to match using the attributes from the Business Party Address table. This approach assumes that tight matching rules are used for the Address base object, and that survivorship of household-specific attributes is not required. If household-specific attributes need to be survived, then those attributes must be defined and populated for the Address base object, along with the appropriate Trust rules.


Populating the Address Household Object

The Address household object is a standard base object that is populated through landing and staging tables. At the cross-reference level, there is one-to-one cardinality between the Address base object (Address cross-reference) and the Business Party Address base object (Business Party Address cross-reference).

Landing Tables

The Address object should share landing tables with the Business Party Address base object. The Address base object uses the same pkey_src_object values as the Business Party Address.

Staging Tables

The Address object must have its own staging tables. As for any other base object, the Address base object requires a separate staging table for each source system that can populate it. Each Address staging table usually only has pkey_src_object and last_update_date columns, unless there are other, household-specific attributes to be included.

If hard delete detection is being used to deactivate unused address identifi
