© 2008 IBM Corporation
Effective Data Integration & Industry Data Model
- Critical Success Factor for Business Intelligence / Data Warehouses
InfoSphere software
Mastering Your Data is the Foundation for Success
Quality
Governance
Integration
Consolidation
Mergers & Acquisitions
Competitive Pressures
Regulatory Compliance
Business Efficiency
Innovation & Growth
Business Initiatives: Market Dynamics, Risk & Compliance, Product/Service Optimization, Enterprise Intelligence, Customer Centricity
Major IT Projects: ERP/CRM Deployments, CDI/MDM, Data Warehousing, Legacy System Consolidations
IBM Information Platform: IBM Information Server, IBM Master Data Server, Industry Models & Accelerators
Information On Demand is Delivering Value
Trusted Information Enables Business Success
Analyst Validation: Gartner’s Magic Quadrant
"IBM demonstrates the best vision in the market for extensive data integration capabilities, as it continues to progress toward bringing together all its data integration components atop common metadata, common design tooling, and a common look and feel."
Customers Achieve Significant Productivity Benefits (1) with Effective Data Integration
Activity                             Approx. Project Effort   Productivity Gain
Source System Analysis               30%                      50+% gain
Data Cleansing                       20%                      20+% gain
Transformation Logic Construction    20%                      40+% gain
Data Management Services             15%                      30+% gain
Application System Connectivity      10%                      50+% gain
Total                                100%
(1) Compared to hand coding; gathered from IBM project studies
Costs of Inefficient Data Integration and Data Quality Management
Inaccurate or incomplete data is a leading cause of failure
83% of business-intelligence and CRM data integration projects either overrun or fail
Low data quality costs companies $611 billion annually
Undetected defects will cost 10 to 100 times as much to fix upstream
25% of time is spent clarifying bad data
Lack of consumer confidence
Lost opportunities
Scrap and rework
Increased costs
The IBM Solution: IBM Information Server
Delivering information you can trust
IBM Information Server
Understand: Discover, model, and govern information structure and content
Cleanse: Standardize, merge, and correct information
Transform: Combine and restructure information for new uses
Deliver: Replicate, virtualize and move information for in-line delivery
Platform Services: Parallel Processing Services, Connectivity Services, Metadata Services, Deployment Services, Administration Services
Parallel Processing
Rich Connectivity to Applications, Data, and Content
Unified Deployment
Unified Metadata Management
Data Transformation & Movement: WebSphere DataStage
▪ Provides codeless visual design of data flows with hundreds of built-in transformation functions
• Optimized reuse of integration objects
• Supports batch & real-time operations
• Produces reusable components that can be shared across projects
▪ Complete ETL functionality with metadata-driven productivity
▪ Supports team-based development and collaboration
▪ Provides integration from across the broadest range of sources
Transform: Transform and aggregate any volume of information in batch or real time through visually designed logic
Hundreds of Built-in Transformation Functions
Architects, Developers
WebSphere DataStage®
Deliver
Job Execution
▪ Job sequencer for sequencing and controlling job flow
▪ DataStage Director
• Used to validate, schedule, run, and monitor DataStage jobs
▪ Command line interface
dsjob -run
    [ -mode [ NORMAL | RESET | VALIDATE ] ]
    [ -param name=value ]
    [ -warn n ]
    [ -rows n ]
    [ -wait ]
    [ -stop ]
    [ -jobstatus ]
    [ -userstatus ]
    [ -local ]
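The option list above can be assembled programmatically before a job is launched from a scheduler or script. The following is a minimal sketch, not an IBM-supplied wrapper: it only builds the argument list and does not run anything. The function name, the project name `dwh_project`, the job name `load_customers`, and the parameter `RUN_DATE` are hypothetical, and placing the project and job names as trailing arguments is an assumption, since the slide shows only the options.

```python
import shlex
from typing import Dict, List, Optional


def build_dsjob_run(project: str, job: str, mode: str = "NORMAL",
                    params: Optional[Dict[str, str]] = None,
                    warn: Optional[int] = None, rows: Optional[int] = None,
                    wait: bool = False, jobstatus: bool = False) -> List[str]:
    """Assemble an argument list for `dsjob -run` from the options on the slide."""
    if mode not in ("NORMAL", "RESET", "VALIDATE"):
        raise ValueError(f"unsupported mode: {mode}")
    args = ["dsjob", "-run", "-mode", mode]
    for name, value in (params or {}).items():
        args += ["-param", f"{name}={value}"]
    if warn is not None:
        args += ["-warn", str(warn)]
    if rows is not None:
        args += ["-rows", str(rows)]
    if wait:
        args.append("-wait")
    if jobstatus:
        args.append("-jobstatus")
    # Assumption: project and job are the trailing positional arguments.
    args += [project, job]
    return args


# Hypothetical project, job, and parameter names, for illustration only.
cmd = build_dsjob_run("dwh_project", "load_customers",
                      params={"RUN_DATE": "2008-09-18"}, warn=50, jobstatus=True)
print(shlex.join(cmd))
```

Keeping the command as a list rather than a single string avoids quoting problems if it is later passed to `subprocess.run`.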
Job Monitoring & Logging
▪ Detailed job monitoring information available during and after job execution
• Start and elapsed times
• Record counts per link
• % CPU used by each process
• Data skew across partitions
▪ Available in the Director
▪ Also available from the command line
• dsjob -report <project> <job> [<type>]
  type = BASIC, DETAIL, XML
Monitor information at partition level
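To make "data skew across partitions" concrete, here is a minimal sketch that computes one common skew measure from per-partition record counts. The measure (largest partition divided by the mean) is an assumption chosen for illustration, not the formula the Director uses, and the link names and counts are hypothetical rather than real `dsjob -report` output.

```python
from typing import Dict, List


def partition_skew(rows_per_partition: List[int]) -> float:
    """Return the largest partition load divided by the mean load.

    1.0 means rows are spread evenly; larger values mean one partition
    is doing a disproportionate share of the work.
    """
    if not rows_per_partition:
        raise ValueError("need at least one partition")
    mean = sum(rows_per_partition) / len(rows_per_partition)
    if mean == 0:
        return 1.0
    return max(rows_per_partition) / mean


# Hypothetical record counts per partition for two links.
link_counts: Dict[str, List[int]] = {
    "lnk_customers": [25000, 25100, 24900, 25000],
    "lnk_orders": [70000, 10000, 10000, 10000],
}
for link, counts in link_counts.items():
    print(f"{link}: {sum(counts)} rows, skew {partition_skew(counts):.2f}")
```

A link with high skew usually points to a partitioning key with few distinct values, which limits how much parallel processing can help.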
Scalable Performance
Benchmark: Scalable Data Integration Using Ascential DataStage Enterprise Edition
[Chart: records per second (0 to 100,000) against CPUs/nodes (2 to 24), with a 1:1 linear-scaling reference line]
Note: Contact Ascential for an audited Performance Benchmark Report.
Change Data Capture and Replication
▪ Provides real-time changed-data capture and delivery for
• Dynamic warehousing, eBusiness
• Synchronization
• Replication
▪ Provides high-volume, low-latency replication for
• Business continuity
• Workload distribution
• Business integration scenarios
▪ Minimal impact on production systems
▪ High scalability and end-to-end performance
▪ Wide breadth of RDBMS support
Architects, Developers
Transformation Server, Replication Server, Data Event Publisher, iReflect
Deliver
Minimizes impact on performance of production systems
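The idea of capturing only changed data and delivering it to a target can be illustrated with a small sketch. It compares two snapshots of a keyed table and replays the differences on a replica. Snapshot comparison is a deliberately simple stand-in used here for illustration; the slide does not describe how the products capture changes, and the table contents are hypothetical.

```python
from typing import Dict, List, Tuple

Row = Dict[str, object]
Change = Tuple[str, object, Row]  # (operation, primary key, row image)


def diff_snapshots(before: Dict[object, Row], after: Dict[object, Row]) -> List[Change]:
    """List the inserts, updates, and deletes that turn `before` into `after`."""
    changes: List[Change] = []
    for key, row in after.items():
        if key not in before:
            changes.append(("INSERT", key, row))
        elif before[key] != row:
            changes.append(("UPDATE", key, row))
    for key, row in before.items():
        if key not in after:
            changes.append(("DELETE", key, row))
    return changes


def apply_changes(target: Dict[object, Row], changes: List[Change]) -> None:
    """Replay captured changes on a replica, touching only the changed rows."""
    for operation, key, row in changes:
        if operation == "DELETE":
            target.pop(key, None)
        else:
            target[key] = dict(row)


# Hypothetical source table at two points in time, keyed by customer id.
before = {1: {"name": "Kate Roberts", "city": "Boston"},
          2: {"name": "J Smith", "city": "Salem"}}
after = {1: {"name": "Kate Roberts", "city": "Cambridge"},
         3: {"name": "Williams & Co.", "city": "Orlando"}}

replica = {key: dict(row) for key, row in before.items()}
changes = diff_snapshots(before, after)
apply_changes(replica, changes)
print([operation for operation, _, _ in changes])
```

Only three change records travel to the replica here, regardless of how large the unchanged part of the table is, which is the property that keeps the load on production systems low.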
Technical Metadata: WebSphere Information Analyzer
▪ Provides in-depth analysis of existing systems
• Data-centric analysis of application, database, and file-based sources for content, quality, and structure
• Secure, detailed profiling of fields, and relationship analysis across fields and across sources
▪ Enables ongoing measurement and baseline reporting of information quality
▪ Creates metadata that describes where information is managed across systems
• Provides an understanding of the fitness of specific sources and highlights data that may need downstream attention
Understand: Analyze source data structures, and monitor adherence to integration and quality rules
WebSphere Information Analyzer
Data Analysts, Subject Matter Experts
[Diagram: Business Glossary; Physical View; Other Product Modules]
Introducing Data Rules for Information Analyzer
Adding monitoring to assure accuracy and increase trust
▪ Establish Benchmarks for Variance Tracking
▪ Create Metrics across single or multiple Data Rules
▪ Organize Metrics and Rules within user-defined categories
▪ View Metric & Benchmark summaries
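The relationship between data rules, metrics, and benchmarks can be sketched in a few lines. This is an illustration of the concepts only, not Information Analyzer's rule language: the two rules, the averaging used for the metric, the 90% benchmark, and the sample records are all assumptions.

```python
from typing import Callable, Dict, List

Record = Dict[str, str]
Rule = Callable[[Record], bool]


def postal_code_is_five_digits(record: Record) -> bool:
    """Data rule: the postal code must be exactly five digits."""
    code = record.get("postal_code", "")
    return len(code) == 5 and code.isdigit()


def name_is_present(record: Record) -> bool:
    """Data rule: the name field must not be blank."""
    return bool(record.get("name", "").strip())


def pass_rate(rule: Rule, records: List[Record]) -> float:
    """Fraction of records that satisfy a single rule."""
    if not records:
        return 1.0
    return sum(1 for record in records if rule(record)) / len(records)


def metric(rules: List[Rule], records: List[Record]) -> float:
    """A metric across multiple rules: here, the mean of their pass rates."""
    return sum(pass_rate(rule, records) for rule in rules) / len(rules)


# Hypothetical sample data and benchmark.
records = [
    {"name": "Kate A. Roberts", "postal_code": "02116"},
    {"name": "Catherine Roberts", "postal_code": "0211"},
    {"name": "", "postal_code": "02116"},
    {"name": "Mrs. K. Roberts", "postal_code": "02116"},
]
RULES = [postal_code_is_five_digits, name_is_present]
BENCHMARK = 0.90

score = metric(RULES, records)
variance = score - BENCHMARK
print(f"metric={score:.2f} benchmark={BENCHMARK:.2f} variance={variance:+.2f}")
```

Tracking the variance over successive runs, rather than the raw score, is what turns a one-off profile into ongoing monitoring.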
Business Glossary
Create and manage business vocabulary and relationships
Subject Matter Experts, Analyst
Web Browser
Features
▪ Facilitate business & IT communications by creating & managing a common business vocabulary
▪ Web-based interface shared across enterprise business teams
▪ Allows creation of stewards & assignment of their responsibilities for terms & assets
▪ Link business terms / concepts to Electronically Stored Information (technical assets)
Benefits
▪ Aligns the efforts of IT with the goals of the business
▪ Provides business context to information technology assets
▪ Establishes responsibility and accountability in accordance with data governance policies
Steward Console
Business Glossary Browser
Designed for simplicity: read-only access to Business Glossary
Web Browser
Simple Search
Graphical navigation
Business Users
Features
▪ Designed based on two key principles: “simplicity lasts” and “cut right to the chase”
▪ Read-only browser interface
▪ Search and browse the enterprise glossary graphically or textually
▪ View details for terms, categories, stewards and other objects
▪ Send feedback directly to stewards
Benefits
▪ Facilitates business-IT alignment by encouraging the acceptance and growth of a corporate business glossary
▪ Adherence to data governance standards
▪ Promotes trust in business glossary assets through collaboration
Business Glossary Anywhere
Real-time access to Business Glossary from any desktop application
Features
▪ From any desktop application, click on a term & view its business definition in a pop-up window without any loss of context or focus
▪ Intelligent matching returns best candidates in a single search
▪ Search engine for terms and categories
▪ Access steward contact information directly
Benefits
▪ Increased trust and acceptance of information by delivering definitions in context
▪ Expanded adoption of enterprise glossary outside of Information Platform technologies
▪ Improved information availability with multiple access mechanisms for electronically stored information (ESI)
ANY User
From Any Application...
Pop the Definition!
Logical Metadata: Rational Data Architect
▪ Data modeling for data structures and federations
▪ Federated data discovery
▪ Metadata relationship discovery & mapping
▪ Impact analysis, and synchronization across models
▪ SQL & XML generation capabilities
Subject Matter Experts
Create and manage business vocabulary and relationships, while linking to physical sources
Data Modeling & Mapping
Architects
Rational Data Architect
Introducing Information Server FastTrack
To reduce Costs of Integration Projects through Automation
▪ Business analysts and IT collaborate in context to create project specification
▪ Leverages source analysis, target models, and metadata to facilitate mapping process
▪ Auto-generation of data transformation jobs & reports
Auto-generates DataStage jobs
Flexible Reporting
Specification
FastTrack Interface
Source column info
Target column info
Transformation rule and/or function
Drag & drop metadata browser
Details of source-to-target mapping
Customizable spreadsheet view hosted in metadata repository
DataStage job generation
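A source-to-target mapping specification of the kind shown in the interface (source column, target column, transformation rule) can be treated as data and applied mechanically, which is what makes job generation possible. The sketch below illustrates that idea only; it is not FastTrack's specification format, and the column names and rules are hypothetical.

```python
from typing import Callable, Dict, List, NamedTuple


class Mapping(NamedTuple):
    source: str                  # source column name
    target: str                  # target column name
    rule: Callable[[str], str]   # transformation applied to the source value


# Hypothetical three-row mapping specification.
spec: List[Mapping] = [
    Mapping("cust_nm", "customer_name", str.strip),
    Mapping("st_cd", "state_code", str.upper),
    Mapping("zip", "postal_code", lambda value: value.zfill(5)),
]


def apply_spec(mappings: List[Mapping], row: Dict[str, str]) -> Dict[str, str]:
    """Produce a target row by applying each mapping rule to its source column."""
    return {m.target: m.rule(row[m.source]) for m in mappings}


source_row = {"cust_nm": " Kate A. Roberts ", "st_cd": "ma", "zip": "2116"}
target_row = apply_spec(spec, source_row)
print(target_row)
```

Because the specification is plain data, the same rows can drive both the generated job and the documentation that analysts review.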
Role-Based Tools with Integrated Metadata
▪ Simplify Integration
▪ Increase trust and confidence in information
▪ Increase compliance to standards
▪ Facilitate change management & reuse
Design, Operational
Developers, Subject Matter Experts, Data Analysts, Business Users, Architects, DBAs
Unified Metadata Management
Metadata lineage: IBM Metadata Workbench
Data Integration Managers
Developers
Provides IT professionals with a tool for exploring and understanding the assets generated and used by the Information Server suite.
IBM Metadata Workbench®
Understand
▪ Web-based exploration of Information Assets generated and used by Information Server applications
▪ Cross-tool reporting on data movement, data lineage, business meaning, impact of changes and dependencies
▪ Cross-tool tracing of data lineage for Business Intelligence Reports to provide basis for compliance with legislation such as Sarbanes-Oxley and Basel II
Where does a field of data in this report come from?
Source Tables
IBM Information Server
▪ Import & Browse Full BI Report Metadata
▪ Navigate through report attributes
▪ Visually navigate through data lineage across tools
▪ Increases trust and understanding of business information
What happens if I change this column?
▪ Show complete change impact in graphical or list form
▪ Includes impact on reports in BI tools
▪ Allows impact analysis on any object type
▪ Reduces the cost associated with IT changes
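Impact analysis of this kind amounts to walking a dependency graph downstream from the changed object. The sketch below shows that traversal on a small hand-written graph; it does not use the Metadata Workbench or its repository, and every asset name is hypothetical.

```python
from collections import deque
from typing import Dict, List

# Each key feeds the assets listed as its value (hypothetical assets).
FEEDS: Dict[str, List[str]] = {
    "src.CUSTOMER.ZIP": ["job.load_customer"],
    "job.load_customer": ["dw.DIM_CUSTOMER.POSTAL_CODE"],
    "dw.DIM_CUSTOMER.POSTAL_CODE": ["report.sales_by_region", "job.build_mart"],
    "job.build_mart": ["mart.REGION_SALES.POSTAL_CODE"],
    "mart.REGION_SALES.POSTAL_CODE": ["report.regional_dashboard"],
}


def impacted_by(asset: str, feeds: Dict[str, List[str]]) -> List[str]:
    """Breadth-first walk returning every asset downstream of `asset`."""
    seen = set()
    order: List[str] = []
    queue = deque([asset])
    while queue:
        current = queue.popleft()
        for consumer in feeds.get(current, []):
            if consumer not in seen:
                seen.add(consumer)
                order.append(consumer)
                queue.append(consumer)
    return order


for item in impacted_by("dw.DIM_CUSTOMER.POSTAL_CODE", FEEDS):
    print(item)
```

Running the same walk over the reversed edges answers the lineage question instead: which sources a given report field ultimately comes from.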
Why Should I Care About Cleansing Information?
▪ Lack of information standards
• Different formats & structures across different systems
▪ Data surprises in individual fields
• Data misplaced in the database
▪ Information buried in free-form fields
▪ Data myopia
• Lack of consistent identifiers inhibits a single view
▪ The redundancy nightmare
• Duplicate records with a lack of standards
Kate A. Roberts 416 Columbus Ave #2, Boston, Mass 02116
Catherine Roberts Four sixteen Columbus APT2, Boston, MA 02116
Mrs. K. Roberts 416 Columbus Suite #2, Suffolk County 02116
Name                      Tax ID         Telephone
J Smith DBA Lime Cons.    228-02-1975    6173380300
Williams & Co. C/O Bill   025-37-1888    415-392-2000
1st Natl Provident        34-2671434     3380321
HP 15 State St.           508-466-1200   Orlando
WING ASSY DRILL 4 HOLE USE 5J868A HEXBOLT 1/4 INCH
WING ASSEMBY, USE 5J868-A HEX BOLT .25” - DRILL FOUR HOLES
USE 4 5J868A BOLTS (HEX .25) - DRILL HOLES FOR EA ON WING ASSEM
RUDER, TAP 6 WHOLES, SECURE W/KL2301 RIVETS (10 CM)
19-84-103 RS232 Cable 6' M-F CandS
CS-89641 6 ft. Cable Male-F, RS232 #87951
C&SUCH6 Male/Female 25 PIN 6 Foot Cable
90328574 IBM 187 N.Pk. Str. Salem NH 01456
90328575 I.B.M. Inc. 187 N.Pk. St. Salem NH 01456
90238495 Int. Bus. Machines 187 No. Park St Salem NH 04156
90233479 International Bus. M. 187 Park Ave Salem NH 04156
90233489 Inter-Nation Consults 15 Main Street Andover MA 02341
90345672 I.B. Manufacturing Park Blvd. Bostno MA 04106
Data Cleansing: WebSphere QualityStage
▪ Provides specialized data quality processing
• Ensures clean, standardized, de-duplicated information
• Enables a single version of the truth
• Supports global postal verification
▪ Provides visual tools for designing quality rules and matching logic
• Seamlessly integrated with DataStage (one engine, one metamodel, one UI)
• Precisely calibrates matching rules
▪ Allows quality logic to be deployed seamlessly within ETL, or as shared services
Cleanse
Subject Matter Experts
Standardize and correct source data fields, and match records together across sources to create a single view
WebSphere QualityStage™
Visual Match Rule Design
Data Analysts
QualityStage Methodology
Data Quality Assessment (DQA): Investigation
Data Re-Engineering (DRE): Standardization, Matching, Survivorship

Blk 1, 1 St, 05-00
05-00 Frist St, Block 1
1 First Str, #05-00
1, St, #05-00

Blk 1| First St|05-00
Blk 1| First St|05-00
1|First St |#05-00
1|St|#05-00

Blk 1|First St|05-00
Blk 1|First St|05-00
1|First St|#05-00
1|St|#05-00

#05-00, Blk 1, First St
#05-00, 1, St
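The standardize, match, and survive sequence can be sketched on the four input addresses from the slide. This is a toy illustration under stated assumptions, not QualityStage's rule sets: the abbreviation table, the three-field key, exact-key matching, and emitting the standardized key as the survivor are all hypothetical choices, so the grouping it produces is not meant to reproduce the slide's output.

```python
import re
from collections import defaultdict
from typing import Dict, List, Tuple

Key = Tuple[str, str, str]  # (number, street, unit)

# Hypothetical lookup table for abbreviations and one known misspelling.
ABBREVIATIONS = {"BLK": "BLOCK", "STR": "ST", "STREET": "ST", "FRIST": "FIRST"}


def standardize(raw: str) -> Key:
    """Parse a free-form address into comparable (number, street, unit) fields."""
    tokens = [ABBREVIATIONS.get(t, t)
              for t in re.findall(r"[A-Z]+|\d+(?:-\d+)?", raw.upper())]
    unit = next((t for t in tokens if "-" in t), "")
    number = next((t for t in tokens if t.isdigit()), "")
    street = " ".join(t for t in tokens if t.isalpha() and t != "BLOCK")
    return number, street, unit


def match(records: List[str]) -> Dict[Key, List[str]]:
    """Group records whose standardized keys are identical."""
    groups: Dict[Key, List[str]] = defaultdict(list)
    for record in records:
        groups[standardize(record)].append(record)
    return dict(groups)


def survive(groups: Dict[Key, List[str]]) -> List[str]:
    """Emit one record per group, built from the standardized fields."""
    return [f"{unit}, {number} {street}" for number, street, unit in groups]


records = [
    "Blk 1, 1 St, 05-00",
    "05-00 Frist St, Block 1",
    "1 First Str, #05-00",
    "1, St, #05-00",
]
groups = match(records)
for survivor in survive(groups):
    print(survivor)
```

Real matching is probabilistic and weighs partial agreement between fields; exact-key grouping is used here only to keep the example short.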
Web Services
Illustration of Information as a Service
Calls Data Cleansing/ Scrubbing Web Services
Data / Web Entry
Invokes RTI web service (DataStage + QualityStage) for data validation / cleansing / etc.
Enters
Validated and Cleansed Result
Logical Metadata: IBM Industry Data Models
▪ Industry-proven models, including KPIs and compliance metrics
• Trusted, single analytical view of the business
▪ Proven data model methodology with over 400 clients
• Accelerated, business-centric development
▪ Models automatically populate and generate metadata in IBM Information Server
• Reduces project complexity and risk
Subject Matter Experts
Data Modeling & Mapping
Architects
IBM Industry Models
Delivers proven industry expertise, models and methodology for six industries with Information Server
Information Server and IBM Industry Data Models
Banking (Banking Data Warehouse)
▪ Profitability
▪ Relationship Marketing
▪ Risk Management
▪ Asset and Liability Mgmt
▪ Compliance
Financial Markets (Financial Markets Data Warehouse)
▪ Risk Management
▪ Asset and Liability Mgmt
▪ Compliance
Health Plan (Health Plan Data Warehouse)
▪ Claims
▪ Medical Management
▪ Provider and Network
▪ Sales, Marketing and Membership
▪ Financials
Insurance (Insurance Information Warehouse)
▪ Customer centricity
▪ Claims
▪ Intermediary Performance
▪ Compliance
▪ Risk Management
Retail (Retail Data Warehouse)
▪ Customer centricity
▪ Merchandising Management
▪ Store Operations & Product Mgmt
▪ Supply Chain Management
▪ Compliance
Telco (Telecommunications Data Warehouse)
▪ Churn Management
▪ Relationship Mgmt & Segmentation
▪ Sales and Marketing
▪ Service Quality & Product Lifecycle
▪ Usage Profile
Data Model Impact on Data Warehouse Projects
Anticipated Insurance Client Benefits
▪ Mitigated project risk: 20%
▪ Fit with our business requirements: 85%
▪ Savings from initial analysis phase: 75%
▪ Design phase (inc. logical and physical data models): 65%
▪ ETL activities: 20%
▪ Anticipated reuse savings on next project: 50%
Biggest Impact on DW Project
ROI
POC Metrics from Major Electronics Retailer
▪ 2 to 4 weeks for KPI selection to 2 hours
▪ 6 to 12 weeks for Logical model build to 2 days
▪ 0% of useful current KPIs to 99%, 1 BST Added
▪ 3-5 days for business metadata capture to Minutes
Plugging Industry Data Models into IBM Information Server
Five Ways the IBM Data Models Accelerate DW Development
IBM Information Server
Understand: Discover, model, and govern information structure and content
Cleanse: Standardize, merge, and correct information
Transform: Combine and restructure information for new uses
Deliver: Synchronize, virtualize and move information for in-line delivery
Parallel Processing
Rich Connectivity to Applications, Data, and Content
Unified Deployment
Unified Metadata Management
Identify business analysis areas and data requirements
Define enterprise-wide data definitions and data standards
Create target data warehouse and mart structures for trusted data
Explain to business users the definition of the data they’re using
Simplify data warehouse design, reuse and lifecycle management
Information Server is the ONLY data integration platform capable of exploiting the full value of the data models
The IBM Information Server Advantage
Simplifying Information Integration
IBM Information Server accelerates information integration speed and flexibility by providing:
▪ An easily deployable, unified foundation for enterprise information architectures
▪ Metadata-driven automation, accelerating productivity and flexibility for integrating, enriching and understanding information
▪ Simplified scalability at lower cost to manage current and future data requirements
▪ Data governance capabilities to ensure consistent and accurate compliance with information-centric regulations and requirements
▪ Broadest and deepest connectivity and platform support to leverage and extend existing IT investments
▪ Integration with IBM Industry Data Models
Information Platform & Solutions: Fast Track Your Master Data
▪ http://www-01.ibm.com/software/data/ips/
▪ Ivan Lee
• 94500635
Next step…
▪ InfoSphere Warehouse Proof of Technology Workshop
• Sept 18, 2008 (Thur)
• 2:30 – 5PM
• IBM Solution Centre
– 10/F, PCCW Tower, Taikoo Place, 979 King’s Road, Quarry Bay, Hong Kong
• Demonstrating Cognos, Information Server & InfoSphere Warehouse