
© 2008 IBM Corporation

Effective Data Integration & Industry Data Model: Critical Success Factor for Business Intelligence / Data Warehouses

InfoSphere software

Information On Demand is Delivering Value: Trusted Information Enables Business Success

Mastering Your Data is the Foundation for Success: Quality, Governance, Integration, Consolidation

Market Dynamics
• Mergers & Acquisitions
• Competitive Pressures
• Regulatory Compliance
• Business Efficiency
• Innovation & Growth

Business Initiatives
• Risk & Compliance
• Product/Service Optimization
• Enterprise Intelligence
• Customer Centricity

Major IT Projects
• ERP/CRM Deployments
• CDI/MDM
• Data Warehousing
• Legacy System Consolidations

IBM Information Platform
• IBM Information Server
• IBM Master Data Server
• Industry Models & Accelerators


Global 2000 Profiting from Intelligent Information


Asia Pacific Client List (Partial)

Analyst Validation: Gartner’s Magic Quadrant

"IBM demonstrates the best vision in the market for extensive data integration capabilities, as it continues to progress toward bringing together all its data integration components atop common metadata, common design tooling, and a common look and feel."

Customers Achieve Significant Productivity Benefits [1] with Effective Data Integration

Project activities, in slide order: Source System Analysis; Data Cleansing; Transformation Logic Construction; Data Management Services; Application System Connectivity

Productivity gains, in slide order: 50+%, 20+%, 40+%, 30+%, 50+%

Approx. project effort, in slide order: 30%, 20%, 20%, 15%, 10% (total labelled 100%)

[1] Compared to hand coding; gathered from IBM project studies

Costs of Inefficient Data Integration and Data Quality Management

Inaccurate or incomplete data is a leading cause of failure
■ 83% of business-intelligence and CRM data integration projects either overrun or fail
■ Low data quality costs companies $611 billion annually
■ Undetected defects will cost 10 to 100 times as much to fix upstream
■ 25% of time is spent clarifying bad data

Lack of consumer confidence
Lost opportunities
Scrap and rework
Increased costs

The IBM Solution: IBM Information Server
Delivering information you can trust

IBM Information Server
• Understand: Discover, model, and govern information structure and content
• Cleanse: Standardize, merge, and correct information
• Transform: Combine and restructure information for new uses
• Deliver: Replicate, virtualize and move information for in-line delivery

Platform Services
• Parallel Processing Services
• Connectivity Services
• Metadata Services
• Deployment Services
• Administration Services

Parallel Processing; Rich Connectivity to Applications, Data, and Content; Unified Deployment; Unified Metadata Management

Data Transformation & Movement: WebSphere DataStage

Transform: Transform and aggregate any volume of information in batch or real time through visually designed logic

■ Provides codeless visual design of data flows with hundreds of built-in transformation functions
  • Optimized reuse of integration objects
  • Supports batch & real-time operations
  • Produces reusable components that can be shared across projects
■ Complete ETL functionality with metadata-driven productivity
■ Supports team-based development and collaboration
■ Provides integration from across the broadest range of sources

Diagram labels: Architects, Developers, WebSphere DataStage®, Transform, Deliver

Job Execution

■ Job sequencer for sequencing and controlling job flow
■ DataStage Director
  • Used to validate, schedule, run, and monitor DataStage jobs
■ Command line interface

dsjob -run
    [ -mode [ NORMAL | RESET | VALIDATE ] ]
    [ -param name=value ]
    [ -warn n ]
    [ -rows n ]
    [ -wait ]
    [ -stop ]
    [ -jobstatus ]
    [ -userstatus ]
    [ -local ]
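The option list above can also be assembled programmatically. The following is a minimal sketch, not IBM tooling: it only builds (and does not execute) an argument list for `dsjob -run` from options named on the slide. The helper name `build_dsjob_run`, the project `dstage1`, the job `LoadCustomers` and the parameter `RUN_DATE` are hypothetical, and placing the project and job names after the options is an assumption, since the slide does not show them.

```python
from typing import Dict, List, Optional

VALID_MODES = ("NORMAL", "RESET", "VALIDATE")  # modes listed on the slide


def build_dsjob_run(project: str, job: str,
                    mode: Optional[str] = None,
                    params: Optional[Dict[str, str]] = None,
                    warn: Optional[int] = None,
                    rows: Optional[int] = None,
                    wait: bool = False,
                    jobstatus: bool = False) -> List[str]:
    """Build (but do not execute) an argument list for `dsjob -run`."""
    if mode is not None and mode not in VALID_MODES:
        raise ValueError(f"unknown mode: {mode}")
    argv = ["dsjob", "-run"]
    if mode is not None:
        argv += ["-mode", mode]
    for name, value in (params or {}).items():
        argv += ["-param", f"{name}={value}"]
    if warn is not None:
        argv += ["-warn", str(warn)]
    if rows is not None:
        argv += ["-rows", str(rows)]
    if wait:
        argv.append("-wait")
    if jobstatus:
        argv.append("-jobstatus")
    # Assumption: project and job are positional and come last.
    argv += [project, job]
    return argv


# Hypothetical project, job and parameter names.
command = build_dsjob_run("dstage1", "LoadCustomers", mode="NORMAL",
                          params={"RUN_DATE": "2008-09-18"}, warn=50,
                          jobstatus=True)
print(" ".join(command))
```

On a machine where `dsjob` is installed, the resulting list could be handed to `subprocess.run`; keeping construction separate from execution makes the command easy to log and test.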

Job Monitoring & Logging

■ Detailed job monitoring information available during and after job execution
  • Start and elapsed times
  • Record counts per link
  • % CPU used by each process
  • Data skew across partitions
■ Available in the Director
■ Also available from command line
  • dsjob -report <project> <job> [<type>]
    type = BASIC, DETAIL, XML

Monitor information at partition level
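Data skew across partitions can be summarized with a single number. The sketch below is one common way to do that, not necessarily how the Director computes it: it divides the largest partition's record count by the mean count, so 1.0 means a perfectly even spread. The function name `partition_skew` and the sample counts are hypothetical.

```python
from typing import Sequence


def partition_skew(row_counts: Sequence[int]) -> float:
    """Return the largest partition's row count divided by the mean count."""
    if not row_counts:
        raise ValueError("at least one partition is required")
    mean = sum(row_counts) / len(row_counts)
    if mean == 0:
        return 1.0  # no rows anywhere: treat as evenly (un)loaded
    return max(row_counts) / mean


even = partition_skew([250, 250, 250, 250])    # balanced partitions
skewed = partition_skew([700, 100, 100, 100])  # one hot partition
print(even, skewed)
```

A value well above 1.0 suggests that one partition is doing most of the work, which usually points at the choice of partitioning key.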


Parallel Runtime Execution

Scalable Performance

Benchmark: Scalable Data Integration Using Ascential DataStage Enterprise Edition

[Chart: throughput in records per second (y-axis, 0 to 100,000) against CPU/node count (x-axis, 2 to 24), shown with a 1:1 ratio linear-scaling line.]

Note: Contact Ascential for an audited Performance Benchmark Report.
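The 1:1 linear line on the chart is the throughput that would result if every added CPU contributed as much as the first ones. A small sketch of that arithmetic follows; the function names and all figures are hypothetical and are not taken from the benchmark.

```python
def linear_target(base_cpus: int, base_rate: float, cpus: int) -> float:
    """Throughput expected at `cpus` if scaling from the baseline were 1:1."""
    return base_rate * cpus / base_cpus


def scaling_efficiency(base_cpus: int, base_rate: float,
                       cpus: int, measured_rate: float) -> float:
    """Measured throughput as a fraction of the 1:1 linear target."""
    return measured_rate / linear_target(base_cpus, base_rate, cpus)


# Hypothetical figures: 8,000 rec/s on 2 CPUs, 86,400 rec/s on 24 CPUs.
target = linear_target(2, 8000.0, 24)
efficiency = scaling_efficiency(2, 8000.0, 24, 86400.0)
print(target, efficiency)
```

An efficiency close to 1.0 across the whole CPU range is what a near-linear curve on such a chart represents.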

Change Data Capture and Replication

Deliver: Minimizes impact on performance of production systems

■ Provides real time changed-data capture and delivery for
  • Dynamic warehousing, eBusiness
  • Synchronization
  • Replication
■ Provides high-volume, low-latency replication for
  • Business continuity
  • Workload distribution
  • Business integration scenarios
■ Minimal impact on production systems
■ High scalability and end-to-end performance
■ Wide breadth of RDBMS support

Diagram labels: Architects, Developers, Transformation Server, Replication Server, Data Event Publisher, iReflect
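The idea behind changed-data capture is that only the changes travel to the target, rather than full reloads. The sketch below illustrates the delivery side with an in-memory target; it is a toy model, not how the products listed on the slide are implemented. The change format `(operation, key, columns)` and the sample rows are hypothetical.

```python
from typing import Any, Dict, Iterable, Tuple

Change = Tuple[str, Any, Dict[str, Any]]  # (operation, key, columns)


def apply_changes(target: Dict[Any, Dict[str, Any]],
                  changes: Iterable[Change]) -> Dict[Any, Dict[str, Any]]:
    """Apply captured insert/update/delete events to a target table in order."""
    for op, key, columns in changes:
        if op == "insert":
            target[key] = dict(columns)
        elif op == "update":
            target.setdefault(key, {}).update(columns)
        elif op == "delete":
            target.pop(key, None)
        else:
            raise ValueError(f"unknown operation: {op}")
    return target


replica = {1: {"name": "Ann", "city": "Boston"},
           3: {"name": "Cy", "city": "Orlando"}}
captured = [("update", 1, {"city": "Salem"}),
            ("insert", 2, {"name": "Bob", "city": "Andover"}),
            ("delete", 3, {})]
apply_changes(replica, captured)
print(replica)
```

Because only three small events are shipped, the source system is not rescanned, which is the sense in which the slide speaks of minimal impact on production systems.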

Technical Metadata: WebSphere Information Analyzer

Understand: Analyze source data structures, and monitor adherence to integration and quality rules

■ Provides in-depth analysis of existing systems
  • Data-centric analysis of application, database, and file-based sources for content, quality, and structure
  • Secure, detailed profiling of fields, and relationship analysis across fields and across sources
■ Enables ongoing measurement and baseline reporting of information quality
■ Creates metadata that describes where information is managed across systems
  • Provides an understanding of the fitness of specific sources and highlights data that may need downstream attention

Diagram labels: Data Analysts, Subject Matter Experts, WebSphere Information Analyzer, Physical View, Business Glossary, Other Product Modules
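Field profiling of the kind described above typically reports counts, nulls, distinct values and the formats found in a column. The following is a minimal sketch under that assumption, not the Information Analyzer algorithm. The function names are hypothetical; the sample values are the Tax ID column from the cleansing examples later in the deck, plus one missing value.

```python
import re
from collections import Counter
from typing import Any, Dict, Optional, Sequence


def format_pattern(value: str) -> str:
    """Map every digit to 9 and every ASCII letter to A, keeping punctuation."""
    return re.sub(r"[A-Za-z]", "A", re.sub(r"\d", "9", value))


def profile_column(values: Sequence[Optional[str]]) -> Dict[str, Any]:
    """Return simple content statistics for one column."""
    non_null = [v for v in values if v not in (None, "")]
    return {
        "count": len(values),
        "nulls": len(values) - len(non_null),
        "distinct": len(set(non_null)),
        "patterns": Counter(format_pattern(v) for v in non_null),
    }


tax_ids = ["228-02-1975", "025-37-1888", "34-2671434", "508-466-1200", None]
profile = profile_column(tax_ids)
print(profile["patterns"].most_common())
```

Three different patterns in a column that should hold one kind of identifier is exactly the sort of "data surprise" that profiling is meant to surface before integration starts.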

Introducing Data Rules for Information Analyzer
Adding monitoring to assure accuracy and increase trust

■ Establish Benchmarks for Variance Tracking
■ Create Metrics across single or multiple Data Rules
■ Organize Metrics and Rules within user-defined categories
■ View Metric & Benchmark summaries
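One way to read the relationship between rules, metrics and benchmarks: a rule is a test applied to each record, a metric aggregates rule results, and a benchmark is the level the metric is expected to reach. The sketch below follows that reading; it does not use Information Analyzer's rule syntax, and the rule, the rows and the 95% benchmark are hypothetical.

```python
import re
from typing import Callable, Dict, List

Row = Dict[str, str]


def rule_pass_rate(rows: List[Row], rule: Callable[[Row], bool]) -> float:
    """Metric: fraction of rows that satisfy the data rule."""
    if not rows:
        return 1.0
    return sum(1 for row in rows if rule(row)) / len(rows)


def variance_from_benchmark(metric: float, benchmark: float) -> float:
    """Positive when the metric beats the benchmark, negative when it falls short."""
    return metric - benchmark


def zip_is_five_digits(row: Row) -> bool:
    """Hypothetical data rule: the zip field holds exactly five digits."""
    return re.fullmatch(r"\d{5}", row.get("zip", "")) is not None


rows = [{"zip": "02116"}, {"zip": "02116"}, {"zip": "2116"}, {"zip": "01456"}]
metric = rule_pass_rate(rows, zip_is_five_digits)
variance = variance_from_benchmark(metric, benchmark=0.95)
print(metric, round(variance, 2))
```

Tracking the variance over successive runs shows whether data quality is drifting away from or towards the agreed benchmark.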

Business Glossary
Create and manage business vocabulary and relationships

Features
■ Facilitate business & IT communications by creating & managing a common business vocabulary
■ Web based interface shared across enterprise business teams
■ Allows creation of stewards & assignment of their responsibilities for terms & assets
■ Link business terms / concepts to Electronically Stored Information (technical assets)

Benefits
■ Aligns the efforts of IT with the goals of the business
■ Provides business context to information technology assets
■ Establishes responsibility and accountability in accordance with data governance policies

Diagram labels: Subject Matter Experts, Analyst, Web Browser, Steward Console
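The glossary features listed for this slide (terms, stewards, links to technical assets) can be pictured as a small data structure. This is only an illustrative sketch, not the Business Glossary data model; the class names, the term, the steward and the asset identifiers are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Term:
    name: str
    definition: str
    steward: str                                     # person responsible for the term
    assets: List[str] = field(default_factory=list)  # linked technical assets


class Glossary:
    def __init__(self) -> None:
        self._terms: Dict[str, Term] = {}

    def add(self, term: Term) -> None:
        self._terms[term.name.lower()] = term

    def lookup(self, name: str) -> Term:
        """Case-insensitive lookup of a business term."""
        return self._terms[name.lower()]

    def terms_for_asset(self, asset: str) -> List[str]:
        """Business terms that give context to a technical asset."""
        return sorted(t.name for t in self._terms.values() if asset in t.assets)


glossary = Glossary()
glossary.add(Term("Customer", "A party that has bought a product or service.",
                  steward="J. Doe", assets=["dw.customer_dim", "crm.customer"]))
print(glossary.lookup("customer").steward)
print(glossary.terms_for_asset("dw.customer_dim"))
```

Keeping the steward on the term itself is what lets a reader of a definition see who is accountable for it.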

Business Glossary Browser
Designed for simplicity: read-only access to Business Glossary

Features
■ Designed based on two key principles: “simplicity lasts” and “cut right to the chase”
■ Read-only browser interface
■ Search and browse the enterprise glossary graphically or textually
■ View details for terms, categories, stewards and other objects
■ Send feedback directly to stewards

Benefits
■ Facilitates business-IT alignment by encouraging the acceptance and growth of a corporate business glossary
■ Supports adherence to data governance standards
■ Promotes trust in business glossary assets through collaboration

Diagram labels: Business Users, Web Browser, Simple Search, Graphical navigation

Business Glossary Anywhere
Real-time access to Business Glossary from any desktop application

Features
■ From any desktop application, click on a term & view its business definition in a pop-up window without any loss of context or focus
■ Intelligent matching returns best candidates in a single search
■ Search engine for terms and categories
■ Access steward contact information directly

Benefits
■ Increased trust and acceptance of information by delivering definitions in context
■ Expanded adoption of enterprise glossary outside of Information Platform technologies
■ Improved information availability with multiple access mechanisms for electronically stored information (ESI)

Diagram labels: ANY User; From Any Application...; Pop the Definition!

Logical Metadata: Rational Data Architect

Data Modeling & Mapping: Create and manage business vocabulary and relationships, while linking to physical sources

■ Data modeling for data structures and federations
■ Federated data discovery
■ Metadata relationship discovery & mapping
■ Impact analysis, and synchronization across models
■ SQL & XML generation capabilities

Diagram labels: Subject Matter Experts, Architects, Rational Data Architect

Introducing Information Server FastTrack
To reduce Costs of Integration Projects through Automation

■ Business analysts and IT collaborate in context to create project specification
■ Leverages source analysis, target models, and metadata to facilitate mapping process
■ Auto-generation of data transformation jobs & reports

Diagram labels: Specification, Flexible Reporting, Auto-generates DataStage jobs

FastTrack Interface

Screenshot callouts:
• Drag & drop metadata browser
• Customizable spreadsheet view hosted in metadata repository
• Details of source-to-target mapping: source column info, target column info, transformation rule and/or function
• DataStage job generation
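A source-to-target mapping of the kind shown in the FastTrack screenshot (source column, target column, optional transformation rule) can be treated as data that drives the transformation. The sketch below shows that idea in miniature; it is not FastTrack's specification format or its job generator, and the column names and the rule are hypothetical.

```python
from typing import Callable, Dict, List, NamedTuple, Optional


class Mapping(NamedTuple):
    source: str                                  # source column name
    target: str                                  # target column name
    rule: Optional[Callable[[str], str]] = None  # optional transformation


def apply_mappings(record: Dict[str, str],
                   mappings: List[Mapping]) -> Dict[str, str]:
    """Produce a target record by applying each mapping row to the source record."""
    out: Dict[str, str] = {}
    for m in mappings:
        value = record[m.source]
        out[m.target] = m.rule(value) if m.rule else value
    return out


spec = [Mapping("CUST_NM", "customer_name", str.title),
        Mapping("ZIP", "postal_code")]
target_row = apply_mappings({"CUST_NM": "kate roberts", "ZIP": "02116"}, spec)
print(target_row)
```

Holding the specification as rows rather than as hand-written code is what makes it possible to review the mapping with business analysts and to generate jobs from it.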

Role-Based Tools with Integrated Metadata

■ Simplify integration
■ Increase trust and confidence in information
■ Increase compliance to standards
■ Facilitate change management & reuse

Diagram labels: Design, Operational, Developers, Subject Matter Experts, Data Analysts, Business Users, Architects, DBAs, Unified Metadata Management

Metadata lineage: IBM Metadata Workbench

Understand: Provides IT professionals with a tool for exploring and understanding the assets generated and used by the Information Server suite.

■ Web-based exploration of Information Assets generated and used by Information Server applications
■ Cross-tool reporting on data movement, data lineage, business meaning, impact of changes and dependencies
■ Cross-tool tracing of data lineage for Business Intelligence Reports to provide basis for compliance with legislation such as Sarbanes-Oxley and Basel II

Diagram labels: Data Integration Managers, Developers, IBM Metadata Workbench®

Where does a field of data in this report come from?

■ Import & Browse Full BI Report Metadata
■ Navigate through report attributes
■ Visually navigate through data lineage across tools
■ Increases trust and understanding of business information

Diagram labels: Source Tables, IBM Information Server

What happens if I change this column?

■ Show complete change impact in graphical or list form
■ Includes impact on reports in BI tools
■ Allows impact analysis on any object type
■ Reduces the cost associated with IT changes
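Both questions on these slides, where a report field comes from and what a column change affects, are traversals of the same data-flow graph in opposite directions. The sketch below shows that with a breadth-first search; it is an illustration, not the Metadata Workbench implementation, and the asset names are hypothetical.

```python
from collections import deque
from typing import Dict, List, Set


def reachable(edges: Dict[str, List[str]], start: str) -> Set[str]:
    """Every node reachable from `start` by following edges (breadth-first)."""
    seen: Set[str] = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in edges.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def invert(edges: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Reverse every edge so that downstream becomes upstream."""
    inverted: Dict[str, List[str]] = {}
    for src, targets in edges.items():
        for dst in targets:
            inverted.setdefault(dst, []).append(src)
    return inverted


# Hypothetical data flow: source column -> ETL job -> warehouse column -> report.
flows = {
    "crm.customer.zip": ["job.load_customer"],
    "job.load_customer": ["dw.customer_dim.postal_code"],
    "dw.customer_dim.postal_code": ["report.sales_by_region"],
}
impact = reachable(flows, "crm.customer.zip")                 # what a change affects
lineage = reachable(invert(flows), "report.sales_by_region")  # where a field comes from
print(sorted(impact))
print(sorted(lineage))
```

The traversal only works across tools when all of them record their flows in one shared metadata store, which is the point the slides make about unified metadata.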

Why Should I Care About Cleansing Information?

■ Lack of information standards
  • Different formats & structures across different systems
■ Data surprises in individual fields
  • Data misplaced in the database
■ Information buried in free-form fields
■ Data myopia
  • Lack of consistent identifiers inhibits a single view
■ The redundancy nightmare
  • Duplicate records with a lack of standards

Sample records shown on the slide (the misspellings and inconsistencies are part of the examples):

Kate A. Roberts | 416 Columbus Ave #2, Boston, Mass 02116
Catherine Roberts | Four sixteen Columbus APT2, Boston, MA 02116
Mrs. K. Roberts | 416 Columbus Suite #2, Suffolk County 02116

Name | Tax ID | Telephone
J Smith DBA Lime Cons. | 228-02-1975 | 6173380300
Williams & Co. C/O Bill | 025-37-1888 | 415-392-2000
1st Natl Provident | 34-2671434 | 3380321
HP 15 State St. | 508-466-1200 | Orlando

WING ASSY DRILL 4 HOLE USE 5J868A HEXBOLT 1/4 INCH
WING ASSEMBY, USE 5J868-A HEX BOLT .25” - DRILL FOUR HOLES
USE 4 5J868A BOLTS (HEX .25) - DRILL HOLES FOR EA ON WING ASSEM
RUDER, TAP 6 WHOLES, SECURE W/KL2301 RIVETS (10 CM)

19-84-103 RS232 Cable 6' M-F CandS
CS-89641 6 ft. Cable Male-F, RS232 #87951
C&SUCH6 Male/Female 25 PIN 6 Foot Cable

90328574 | IBM | 187 N.Pk. Str. | Salem NH 01456
90328575 | I.B.M. Inc. | 187 N.Pk. St. | Salem NH 01456
90238495 | Int. Bus. Machines | 187 No. Park St | Salem NH 04156
90233479 | International Bus. M. | 187 Park Ave | Salem NH 04156
90233489 | Inter-Nation Consults | 15 Main Street | Andover MA 02341
90345672 | I.B. Manufacturing | Park Blvd. | Bostno MA 04106

Data Cleansing: WebSphere QualityStage

Cleanse: Standardize and correct source data fields, and match records together across sources to create a single view

■ Provides specialized data quality processing
  • Ensures clean, standardized, de-duplicated information
  • Enables a single version of the truth
  • Supports global postal verification
■ Provides visual tools for designing quality rules and matching logic
  • Seamlessly integrated with DataStage (one engine, one metamodel, one UI)
  • Precisely calibrates matching rules
■ Allows quality logic to be deployed seamlessly within ETL, or as shared services

Diagram labels: Subject Matter Experts, Data Analysts, WebSphere QualityStage™, Visual Match Rule Design

InfoSphere software

QualityStage Methodology

Data Quality

Assessment (DQA)

Investigation

Data Re-Engineering (DRE)

Standardization Matching Survivorship

Blk 1, 1 St, 05-00

05-00 Frist St, Block 1

1 First Str, #05-00

1, St, #05-00

Blk 1| First St|05-00

Blk 1| First St|05-00

1|First St |#05-00

1|St|#05-00

Blk 1|First St|05-00

Blk 1|First St|05-00

1|First St|#05-00

1|St|#05-00

#05-00, Blk 1, First St

#05-00, 1, St
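The three re-engineering steps named in the methodology can be sketched in a few lines. This is a deliberately naive illustration, not QualityStage's rule sets or matching algorithm: standardization uses a tiny hypothetical abbreviation table, matching uses an exact, order-insensitive token key, and survivorship simply keeps the longest standardized form in each group. The sample addresses are hypothetical and only loosely modelled on the slide.

```python
import re
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical standardization table; real rule sets are far richer.
ABBREVIATIONS = {"str": "St", "street": "St", "st": "St",
                 "block": "Blk", "blk": "Blk"}


def standardize(address: str) -> str:
    """Split into tokens and replace known variants with one standard form."""
    tokens = re.findall(r"[#\w-]+", address)
    return " ".join(ABBREVIATIONS.get(t.lower(), t) for t in tokens)


def match_key(address: str) -> Tuple[str, ...]:
    """Order-insensitive key: sorted standardized tokens, ignoring '#' and case."""
    return tuple(sorted(t.lstrip("#").lower()
                        for t in standardize(address).split()))


def survivors(addresses: List[str]) -> List[str]:
    """Group matching records and keep the longest standardized form per group."""
    groups: Dict[Tuple[str, ...], List[str]] = defaultdict(list)
    for address in addresses:
        groups[match_key(address)].append(standardize(address))
    return [max(group, key=len) for group in groups.values()]


records = ["1 First Str, #05-00", "05-00 First Street 1",
           "1 First St 05-00", "15 Main Street"]
result = survivors(records)
print(result)
```

Production matching is probabilistic and tolerant of typos, which an exact key like this one is not; the sketch only shows how the three steps hand off to one another.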

Illustration of Information as a Service (Web Services)

Flow: Data / web entry → calls data cleansing / scrubbing web services → invokes RTI web service (DataStage + QualityStage) for data validation / cleansing / etc. → validated and cleansed result

Logical Metadata: IBM Industry Data Models

Data Modeling & Mapping: Delivers proven industry expertise, models and methodology for six industries with Information Server

■ Industry-proven models, including KPIs and compliance metrics
  • Trusted, single analytical view of the business
■ Proven data model methodology with over 400 clients
  • Accelerated, business-centric development
■ Models automatically populate and generate metadata in IBM Information Server
  • Reduces project complexity and risk

Diagram labels: Subject Matter Experts, Architects, IBM Industry Models

Information Server and IBM Industry Data Models

Banking (Banking Data Warehouse)
■ Profitability
■ Relationship Marketing
■ Risk Management
■ Asset and Liability Mgmt
■ Compliance

Financial Markets (Financial Markets Data Warehouse)
■ Risk Management
■ Asset and Liability Mgmt
■ Compliance

Health Plan (Health Plan Data Warehouse)
■ Claims
■ Medical Management
■ Provider and Network
■ Sales, Marketing and Membership
■ Financials

Insurance (Insurance Information Warehouse)
■ Customer centricity
■ Claims
■ Intermediary Performance
■ Compliance
■ Risk Management

Retail (Retail Data Warehouse)
■ Customer centricity
■ Merchandising Management
■ Store Operations & Product Mgmt
■ Supply Chain Management
■ Compliance

Telco (Telecommunications Data Warehouse)
■ Churn Management
■ Relationship Mgmt & Segmentation
■ Sales and Marketing
■ Service Quality & Product Lifecycle
■ Usage Profile

Data Model Impact on Data Warehouse Projects

Biggest Impact on DW Project: Anticipated Insurance Client Benefits
■ Mitigated project risk: 20%
■ Fit with our business requirements: 85%
■ Savings from initial analysis phase: 75%
■ Design phase (inc. logical and physical data models): 65%
■ ETL activities: 20%
■ Anticipated reuse savings on next project: 50%

ROI: POC Metrics from Major Electronics Retailer
■ 2 to 4 weeks for KPI selection to 2 hours
■ 6 to 12 weeks for Logical model build to 2 days
■ 0% of useful current KPIs to 99%, 1 BST Added
■ 3-5 days for business metadata capture to Minutes
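The before-and-after durations in the POC metrics can be turned into speed-up factors. The arithmetic below is a rough sketch that assumes a 40-hour working week and a 5-day working week; neither assumption comes from the slide, and the function name is hypothetical.

```python
from typing import Tuple

HOURS_PER_WEEK = 40  # assumption, not stated on the slide
DAYS_PER_WEEK = 5    # assumption, not stated on the slide


def speedup_range(before_low: float, before_high: float,
                  after: float) -> Tuple[float, float]:
    """Speed-up factors for the low and high ends of the 'before' range."""
    return (before_low / after, before_high / after)


# KPI selection: 2 to 4 weeks down to 2 hours (compared in hours).
kpi_selection = speedup_range(2 * HOURS_PER_WEEK, 4 * HOURS_PER_WEEK, 2)
# Logical model build: 6 to 12 weeks down to 2 days (compared in days).
logical_model = speedup_range(6 * DAYS_PER_WEEK, 12 * DAYS_PER_WEEK, 2)
print(kpi_selection, logical_model)
```

Under these assumptions, KPI selection is roughly 40 to 80 times faster and the logical model build roughly 15 to 30 times faster.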

Plugging Industry Data Models into IBM Information Server
Five Ways the IBM Data Models Accelerate DW Development

• Identify business analysis areas and data requirements
• Define enterprise-wide data definitions and data standards
• Create target data warehouse and mart structures for trusted data
• Explain to business users the definition of the data they’re using
• Simplify data warehouse design, reuse and lifecycle management

Information Server is the ONLY data integration platform capable of exploiting the full value of the data models

IBM Information Server
• Understand: Discover, model, and govern information structure and content
• Cleanse: Standardize, merge, and correct information
• Transform: Combine and restructure information for new uses
• Deliver: Synchronize, virtualize and move information for in-line delivery

Parallel Processing; Rich Connectivity to Applications, Data, and Content; Unified Deployment; Unified Metadata Management

The IBM Information Server Advantage
Simplifying Information Integration

IBM Information Server accelerates information integration speed and flexibility by providing:

■ An easily deployable, unified foundation for enterprise information architectures
■ Metadata-driven automation, accelerating productivity and flexibility for integrating, enriching and understanding information
■ Simplified scalability at lower cost to manage current and future data requirements
■ Data governance capabilities to ensure consistent and accurate compliance with information-centric regulations and requirements
■ Broadest and deepest connectivity and platform support to leverage and extend existing IT investments
■ Integrated with IBM Industry Data Model

Information Platform & Solutions: Fast Track Your Master Data

■ http://www-01.ibm.com/software/data/ips/
■ Ivan Lee
  • 94500635
  • [email protected]

Next step…

■ InfoSphere Warehouse Proof of Technology Workshop
  • Sept 18, 2008 (Thur)
  • 2:30 – 5PM
  • IBM Solution Centre
    – 10/F, PCCW Tower, Taikoo Place, 979 King’s Road, Quarry Bay, Hong Kong
  • Demonstrating Cognos, Information Server & InfoSphere Warehouse
