Copyright © 2004, SAS Institute Inc. All rights reserved. Building and Implementing Integrated Data Models Nancy Wills, Director, Access, Query and Data Mgmt Ralph Hollinshead, Manager, Solutions Data Integration



Overview

Part One: Building an Integrated Data Model

Part Two: Deploying and Scaling the Data Architecture


SAS® Banking Intelligence Solutions Framework

Solutions:
• Customer Retention
• Cross-Sell / Up-Sell
• Marketing Automation
• Credit Scoring
• Credit Risk
• Strategic Performance Management
• New Solutions

Foundation: Banking Intelligence Architecture

• Integrated, extendable architecture
• Focused on business issues
• Based on experience


Independent Solutions

Data flow: Enterprise Source Systems -> Extract and Cleanse Files -> Solution Data Marts -> Solutions

Solutions:
• SAS® Cross-Sell and Up-Sell for Banking
• SAS® Customer Retention for Banking
• SAS® Credit Scoring for Banking
• SAS® Credit Risk Management


Integrated Data Model: Not All Customers are the Same

Customer A: No Data Warehouse
• Interested in Multiple SAS Solutions

Customer B: With Data Warehouse
• Averse to Data Replication

Customer C: With Data Warehouse
• No Data Marts Allowed – Active Data Warehousing Approach


Customer A: Full SAS Data Architecture

Data flow: Enterprise Source Systems -> Extract and Cleanse Files -> SAS Banking Detail Data Store -> Solution Data Marts -> Solutions

Solutions:
• SAS® Cross-Sell and Up-Sell for Banking
• SAS® Customer Retention for Banking
• SAS® Credit Scoring for Banking
• SAS® Credit Risk Management

Flexible Options to Meet Customer Needs!


Customer B: Partial SAS Data Architecture

Data flow: Enterprise Source Systems -> Extract and Cleanse Files -> Customer Enterprise Data Warehouse -> Solution Data Marts -> Solutions

Solutions:
• SAS® Cross-Sell and Up-Sell for Banking
• SAS® Customer Retention for Banking
• SAS® Credit Scoring for Banking
• SAS® Credit Risk Management

Flexible Options to Meet Customer Needs!


Customer C: Customer Data Architecture

Data flow: Enterprise Source Systems -> Extract and Cleanse Files -> Customer Enterprise Data Warehouse -> Information Maps -> Solutions

Solutions:
• SAS® Marketing Automation


Scorecard for Data Architecture Approach

Data Management Issue                              Score
Sensitivity to Data Replication                    0 to -5
Sensitivity to H/W processor and storage budget    0 to -5
Existing warehouse quality                         0 to -5
Implementation time constraints                    0 to -5
Intentions to implement >1 SAS solution            0 to +5
Historical data requirements                       0 to +5

Score  Decision
-25    No DDS. Marts only if absolutely necessary. Information maps may be appropriate.
0      Use DDS to persist current extract from source systems. Marts hold multiple extracts up to full history.
+25    Implement full warehouse, persist history in DDS and as much as wanted in the marts.
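
The scorecard arithmetic can be sketched in a few lines of Python. This is only an illustration, not part of the SAS solutions: the function names, the dictionary keys, and the `band` cut-off used to pick a decision are all assumptions. The slide itself gives only the six rated issues and the three anchor scores (-25, 0, +25).

```python
# Issues that subtract from the score (each rated 0-5).
NEGATIVE_ISSUES = (
    "data_replication_sensitivity",
    "hardware_budget_sensitivity",
    "existing_warehouse_quality",
    "implementation_time_constraints",
)
# Issues that add to the score (each rated 0-5).
POSITIVE_ISSUES = (
    "multiple_sas_solutions",
    "historical_data_requirements",
)


def architecture_score(ratings):
    """Sum the signed 0-5 ratings from the scorecard."""
    total = 0
    for issue in NEGATIVE_ISSUES + POSITIVE_ISSUES:
        rating = ratings[issue]
        if not 0 <= rating <= 5:
            raise ValueError(f"{issue} must be rated 0-5, got {rating}")
        total += -rating if issue in NEGATIVE_ISSUES else rating
    return total


def leaning(total, band=3):
    """Pick a decision from the total; `band` is an assumed cut-off."""
    if total < -band:
        return "No DDS; marts only if absolutely necessary"
    if total > band:
        return "Full warehouse; history in DDS and marts"
    return "DDS holds current extract; marts hold history"
```

A site that rates every negative issue at 5 and every positive issue at 0 lands at the low end of the scale and leans toward the "No DDS" decision.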


Techniques for Data Model Integration

Detail Data Store
• Varying Industries
• General Standards
• Warehousing Techniques

Data Marts
• Approach Compared to DDS


Integrating Models at the Industry Level

Industry-specific subject areas:
• Banking: Accounts, Account Transactions, etc.
• Telco: Subscriptions, Equipment, Networks, Calls, etc.
• Insurance: Premiums, Claims, Benefits, etc.

Common subject areas: Customer, Supplier, Employee, GL, Account, Product, etc.


Detail Data Store Standards Needed for Integration

Data Types / Lengths / Classifier Codes

Naming Conventions

Standards for Data Structures
• Hierarchies
• Subtypes
• Reference Data


Data Administration Standards

Domain                 Data Type  Width  Applicable Class Codes  Comment/Example
Identifier             Varchar    32     ID                      Typically the identifier from the source system.
Small Code             Varchar    3      CD                      Short length codes such as ADDRESS_TYPE_CD.
Medium Code            Varchar    10     CD                      Medium length codes such as EXCHANGE_SYMBOL_CD.
Large Code             Varchar    20     CD                      Long length codes such as POSTAL_CD.
Standard Count Code    Numeric    6      CNT                     Standard counts such as AUTHORIZED_USERS_CNT.
Name                   Varchar    40     NM                      Proper name. For example, LAST_NM, FIRST_NM, etc.
Short Length Text      Varchar    20     TXT                     Short freeform text.
Medium Length Text     Varchar    100    TXT, DESC               Longer freeform text and descriptions associated with code tables.
Indicator Field        Character  1      FLG                     Binary indicator flag (Y or N).
Surrogate Key          Numeric    10     RK, SK                  Generated surrogate keys.
Currency Amount        Numeric    18,5   AMT                     Standard currency amount.
Rates and Percentages  Numeric    9,4    PCT, RT                 For example, exchange rates.
DateTime               Date              DT, DTTM                Accommodate dates as well as date/time.
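
One way to put these standards to work is to check column names against the class codes. The sketch below is an assumption-laden illustration rather than a SAS tool: it assumes the class code is the last underscore-delimited token of a column name (as in ADDRESS_TYPE_CD or LAST_NM), and the names `DOMAINS_BY_CLASS_CODE`, `class_code`, and `candidate_domains` are hypothetical.

```python
# Class code suffix -> list of (domain, data type, width), transcribed
# from the Data Administration Standards table.
DOMAINS_BY_CLASS_CODE = {
    "ID": [("Identifier", "Varchar", "32")],
    "CD": [
        ("Small Code", "Varchar", "3"),
        ("Medium Code", "Varchar", "10"),
        ("Large Code", "Varchar", "20"),
    ],
    "CNT": [("Standard Count Code", "Numeric", "6")],
    "NM": [("Name", "Varchar", "40")],
    "TXT": [
        ("Short Length Text", "Varchar", "20"),
        ("Medium Length Text", "Varchar", "100"),
    ],
    "DESC": [("Medium Length Text", "Varchar", "100")],
    "FLG": [("Indicator Field", "Character", "1")],
    "RK": [("Surrogate Key", "Numeric", "10")],
    "SK": [("Surrogate Key", "Numeric", "10")],
    "AMT": [("Currency Amount", "Numeric", "18,5")],
    "PCT": [("Rates and Percentages", "Numeric", "9,4")],
    "RT": [("Rates and Percentages", "Numeric", "9,4")],
    "DT": [("DateTime", "Date", "")],
    "DTTM": [("DateTime", "Date", "")],
}


def class_code(column_name):
    """Return the trailing token of a column name (assumed naming rule)."""
    return column_name.rsplit("_", 1)[-1].upper()


def candidate_domains(column_name):
    """List the domains a column may use; empty means a non-standard name."""
    return DOMAINS_BY_CLASS_CODE.get(class_code(column_name), [])
```

For example, `candidate_domains("POSTAL_CD")` offers the three code widths, while a name with an unknown suffix returns an empty list and can be flagged for review.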


Detail Data Store: Data Warehousing Standards

Surrogate Keys, Point-in-Time, and Rapidly Changing Data

CUSTOMER
CUSTOMER_RK  VALID_FROM_DT  VALID_TO_DT  ACCOUNT_RK  MARITAL_STATUS_CD  FIRST_NM  LAST_NM
100          01JAN1999      29FEB2000    201         S                  John      Smith
100          01MAR2000      31DEC4747    201         M                  John      Smith

FINANCIAL_ACCOUNT
ACCOUNT_RK  VALID_FROM_DT  VALID_TO_DT  CUSTOMER_RK  FINANCIAL_ACCOUNT_TYPE_CD  OPEN_DT
201         01JAN1999      31DEC4747    100          SAVINGS                    01JAN2000

FINANCIAL_ACCOUNT_CHNG
ACCOUNT_RK  VALID_FROM_DT  VALID_TO_DT  BALANCE_AMT  CURRENCY_CD
201         01JAN1999      31JAN1999    2500.75      USD
201         01FEB1999      28FEB1999    4300.25      USD
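
The point-in-time design above means a query picks the row whose validity interval contains the date of interest. A minimal Python sketch of that lookup follows, using the CUSTOMER rows from the slide. The helper names `parse_date9` and `row_as_of` are hypothetical, and treating both interval ends as inclusive is an assumption read off the sample dates.

```python
from datetime import date

MONTHS = {m: i for i, m in enumerate(
    "JAN FEB MAR APR MAY JUN JUL AUG SEP OCT NOV DEC".split(), start=1)}


def parse_date9(text):
    """Parse a ddMONyyyy string such as '01MAR2000' into a date."""
    return date(int(text[-4:]), MONTHS[text[-7:-4].upper()], int(text[:-7]))


CUSTOMER = [
    {"CUSTOMER_RK": 100, "VALID_FROM_DT": "01JAN1999",
     "VALID_TO_DT": "29FEB2000", "MARITAL_STATUS_CD": "S"},
    {"CUSTOMER_RK": 100, "VALID_FROM_DT": "01MAR2000",
     "VALID_TO_DT": "31DEC4747", "MARITAL_STATUS_CD": "M"},
]


def row_as_of(rows, key_column, key, as_of):
    """Return the version of `key` valid on `as_of` (inclusive), or None."""
    when = parse_date9(as_of)
    for row in rows:
        valid_from = parse_date9(row["VALID_FROM_DT"])
        valid_to = parse_date9(row["VALID_TO_DT"])
        if row[key_column] == key and valid_from <= when <= valid_to:
            return row
    return None
```

Calling `row_as_of(CUSTOMER, "CUSTOMER_RK", 100, "15JUN1999")` selects the first version of customer 100, and any date from 01MAR2000 onward selects the second.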


Conformed Dimensions


Tools: Extending Models

Entities shown: CUSTOMER, EXTERNAL_ORG, SUPPLIER, INTERNAL_ORG, INTERNAL_ORG_ASSOC, INTERNAL_ORG_ASSOC_TYPE, COMPETITORS


Change Analysis Tool


Deploying the Integrated Data Architecture


Option A: Full SAS Data Architecture

Data flow: Enterprise Source Systems -> Extract and Cleanse Files -> SAS Banking Detail Data Store -> Solution Data Marts -> Solutions

Solutions:
• SAS® Cross-Sell and Up-Sell for Banking
• SAS® Customer Retention for Banking
• SAS® Credit Scoring for Banking
• SAS® Credit Risk Management

Flexible Options to Meet Customer Needs!


Populate DDS and Data Mart

Source Data: Excel, SAS, SAP, Oracle, PeopleSoft

Step 1: Extract, cleanse, and transform source data into a flat file

Step 2: ETL processing to load the data warehouse (DDS)
• Data validation
• Key creation
• Slowly changing dimensions

Step 3: Transform into the data mart model (Banking Data Mart)
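
The key creation and slowly changing dimension handling in Step 2 can be illustrated with a small Python sketch. It is not the SAS ETL Studio load process: `load_change`, `key_map`, and the `RK` column name are hypothetical, the sketch versions rows by closing the current one and appending a new one, and it reuses 31DEC4747 as the open-ended date because that is the value shown in the point-in-time example.

```python
from datetime import date, timedelta

OPEN_END = date(4747, 12, 31)  # open-ended VALID_TO_DT value


def load_change(table, key_map, source_id, attrs, as_of):
    """Apply one validated source record to a versioned table.

    table   -- list of row dicts with RK, VALID_FROM_DT, VALID_TO_DT
    key_map -- dict mapping source-system IDs to generated surrogate keys
    """
    # Key creation: assign the next surrogate key to an unseen source ID.
    rk = key_map.setdefault(source_id, len(key_map) + 1)
    current = next((r for r in table
                    if r["RK"] == rk and r["VALID_TO_DT"] == OPEN_END), None)
    if current is not None:
        if all(current.get(k) == v for k, v in attrs.items()):
            return rk  # nothing changed; keep the current version
        # Close the current version the day before the change takes effect.
        current["VALID_TO_DT"] = as_of - timedelta(days=1)
    table.append({"RK": rk, "VALID_FROM_DT": as_of,
                  "VALID_TO_DT": OPEN_END, **attrs})
    return rk
```

Loading a customer as single on 01JAN1999 and as married on 01MAR2000 yields two rows, the first closed on 29FEB2000, which matches the shape of the CUSTOMER example.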


Deployment Focus

Scalability and Performance

ETL flows

Physical data model


Deployment: What Did We Do?

Create and Generate Data

Deploy Hardware and Software

Populate DDS

Populate Data Mart

Analyze ETL Flows

Analyze DDS Model

Change Management


It All Starts with Data

Bought and Built Data Generators

Built Simulated Data

Applied Business Rules

Scaled: 5 GB -> 50 GB -> 500 GB -> 1 TB
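
A simulated-data generator with a business rule can be sketched as follows. This is not the generator the team bought or built: the function name, the columns, and the rule that an account cannot open before its customer relationship starts are all invented for illustration. The volume scales by raising `n_customers`.

```python
import random
from datetime import date, timedelta


def generate_accounts(n_customers, seed=0):
    """Yield simulated account rows; all field values here are made up."""
    rng = random.Random(seed)  # fixed seed gives repeatable test data
    start = date(1999, 1, 1)
    account_rk = 0
    for customer_rk in range(1, n_customers + 1):
        customer_since = start + timedelta(days=rng.randrange(365))
        for _ in range(rng.randint(1, 3)):
            account_rk += 1
            yield {
                "ACCOUNT_RK": account_rk,
                "CUSTOMER_RK": customer_rk,
                "CUSTOMER_SINCE_DT": customer_since,
                # Example business rule: an account cannot be opened
                # before its customer relationship starts.
                "OPEN_DT": customer_since + timedelta(days=rng.randrange(365)),
                "BALANCE_AMT": round(rng.uniform(0, 10000), 2),
            }
```

Because the generator is seeded, the same `n_customers` and `seed` reproduce the same rows, which keeps timings comparable from one tuning run to the next.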


Deploy Hardware and Software

Choose Software Components
• SAS for the DDS or Data Warehouse
• Databases for the DDS or Data Warehouse
• SAS for the Data Marts

Install and Configure SAS Software

Configure Hardware

Design for Progressively Larger Deployments


Windows Server

Dell PowerEdge 1600SC

Windows 2003

Dual hyper-threaded 2.8 GHz processors

4 GB RAM

4 internal IDE drives: 60 GB C drive, 275 GB D drive

Single I/O channel

5 GB -> 50 GB of data


AIX UNIX Servers

IBM P630 eServer (50 GB -> 500 GB of data)
• AIX 5.3
• 4 processors
• 4 I/O channels
• 8 GB RAM
• 4 x 72 GB disks
• 14-drive SCSI storage array

IBM P670 eServer (500 GB -> 1 TB of data)
• AIX 5.3
• 16 processors
• 8 x 1-gig fiber I/O channels
• Dynamic logical partitioning
• 2 TB disks


Populate DDS and Data Mart

Ran ETL Flows
• Registered in SAS Metadata Repository
• Loaded Data into Tables
• Used Slowly Changing Dimension Load Process

Analyze ETL Flows


Example of SAS ETL Studio Flow Analysis


Change Management

Loaded New Release of DDS in TST Repository

Compared PRD Repository to TST Repository

Ran Batch Reports to Examine Differences

Ran Impact Analysis on Columns and Tables
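
The repository comparison can be pictured as a diff between two model definitions. The sketch below is a generic illustration and does not use the SAS Metadata Repository or its change analysis tooling: `diff_models` is a hypothetical helper, and each model is assumed to be a plain dictionary of table names mapped to column definitions.

```python
def diff_models(prd, tst):
    """Compare two {table: {column: type}} dicts and list the differences."""
    changes = []
    for table in sorted(set(prd) | set(tst)):
        if table not in prd:
            changes.append(("table added", table, None))
            continue
        if table not in tst:
            changes.append(("table dropped", table, None))
            continue
        old, new = prd[table], tst[table]
        for column in sorted(set(old) | set(new)):
            if column not in old:
                changes.append(("column added", table, column))
            elif column not in new:
                changes.append(("column dropped", table, column))
            elif old[column] != new[column]:
                changes.append(("column changed", table, column))
    return changes
```

Each reported change is then a candidate for impact analysis: any ETL flow or mart that reads the affected table or column needs review before the new release is promoted.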


What Did We Find?

Specific Techniques that Work Best

Recommendations

Tremendous Performance Gains!


Specific Techniques Examples

ETL Flows

Parallel ETL flows

SAS coding techniques to use

Use hash table instead of look up

Make sure the I/O buffer size is tuned

Drop constraints
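
The hash-table recommendation comes down to building an in-memory index once instead of searching the lookup table for every input row. The slide presumably has SAS hash tables in mind; the Python sketch below only illustrates the idea with a dictionary. The function names and sample columns are hypothetical, and ACCOUNT_RK is assumed to be unique in the lookup table.

```python
def lookup_by_scan(transactions, accounts):
    """Naive lookup: rescans the account list for every transaction."""
    out = []
    for txn in transactions:
        for acct in accounts:
            if acct["ACCOUNT_RK"] == txn["ACCOUNT_RK"]:
                out.append({**txn, "CURRENCY_CD": acct["CURRENCY_CD"]})
                break
    return out


def lookup_by_hash(transactions, accounts):
    """Hash lookup: build the table once, then probe it per transaction."""
    # Assumes ACCOUNT_RK is unique within `accounts`.
    by_rk = {acct["ACCOUNT_RK"]: acct for acct in accounts}
    out = []
    for txn in transactions:
        acct = by_rk.get(txn["ACCOUNT_RK"])
        if acct is not None:
            out.append({**txn, "CURRENCY_CD": acct["CURRENCY_CD"]})
    return out
```

Both functions return the same rows; the scan does work proportional to transactions times accounts, while the hash version does one pass over each input.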


Specific Techniques Examples

DDS Model

Indexes – when and when not to add

Denormalized some tables

Separate tables for data with high volume changes

Partition data by usage (date ranges)
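
Partitioning by date range pays off when a query can skip whole partitions. A minimal Python sketch of that idea follows, assuming monthly partitions keyed on a date column. The function names and the choice of VALID_FROM_DT as the partitioning column are illustrative assumptions, not the DDS physical design.

```python
from collections import defaultdict
from datetime import date


def partition_by_month(rows, date_column="VALID_FROM_DT"):
    """Group rows into one partition per (year, month) of `date_column`."""
    partitions = defaultdict(list)
    for row in rows:
        when = row[date_column]
        partitions[(when.year, when.month)].append(row)
    return dict(partitions)


def rows_in_range(partitions, start, end, date_column="VALID_FROM_DT"):
    """Return rows dated within [start, end], skipping unrelated partitions."""
    first, last = (start.year, start.month), (end.year, end.month)
    selected = []
    for key in sorted(partitions):
        if first <= key <= last:  # partition pruning
            selected.extend(r for r in partitions[key]
                            if start <= r[date_column] <= end)
    return selected
```

A query for February 1999 touches only the (1999, 2) partition, however many other months are stored.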


Recommendations

Debugging techniques

Sorting and memory usage

Joins

Understand disk requirements

I/O optimization

Compression and performance


Above All

Write ETL

Test, Tune

Test, Tune

Test, Tune!!!!


Summary and Conclusions

Data integration is key

Different approaches for customers

Change management is vital

Performance tuning is vital

Technology evolving


Questions?