Oracle Data Integrator Solution Overview
Nguyen Tuan Khang, [email protected]
Senior Solutions Consultant
Fusion Middleware
Oracle Vietnam
Why Data Integration?
Data Integration: Data Warehousing, Master Data Management, Data Synchronization, Real Time Messaging, Federation, Migration
HAVE: Data in Disparate Sources
ERP
CRM
Legacy
Best-of-breed Applications
NEED: Information How and Where you Want It
Business Intelligence
Corporate Performance Management
Business Activity Monitoring
Business Process Management
3 Pillars of Data Integration
Batch
Async
Sync
Enterprise Information Integration: The Traditional Approach
Source Applications -> Extract -> Transform -> Load -> Target Data Warehouse
ETL processes often use batch processing approaches
Example: Customer nightly batch runs can take > 24 hours!
Services that operate on data are not easily reusable in other contexts
ETL services and processes are insecure and hard to monitor (i.e., no SLA)
Challenges in Data Integration
CHALLENGE
1. Increasing data volumes; decreasing batch windows
2. Non-integrated integration
3. Complexity, manual effort of conventional ETL design
4. Lack of knowledge capture
Oracle Data Integrator: Based on Technology from Sunopsis
Data Movement and Transformation from Multiple Sources to Heterogeneous Targets
DIFFERENTIATOR | BENEFIT
1. Heterogeneous E-LT | Best Performance
2. Declarative Design | Productivity
3. Declarative CDC | Real-time Integration
4. Knowledge Modules | Hot-Pluggable
5. The Chosen Integration Technology of Oracle Fusion | Future Proof
Typical Considerations for ODI
High-volume data synchronization: more than 20 MB/min
Heterogeneous data sources: DB2/AS400, Oracle, Excel, File, SQL, BAM
Capture new data changes regardless of data source: CDC using Native Journal, LogMiner, or Trigger
Real-time data synchronization
Easy to implement the solution without changing your current IT infrastructure: no separate server required
Challenges & Emerging Solutions in Data Integration
CHALLENGE | EMERGING SOLUTION
1. Increasing data volumes; decreasing batch windows | Shift from E-T-L to E-LT
2. Non-integrated integration | Convergence of integration solutions
3. Complexity, manual effort of conventional ETL design | Shift from custom coding to declarative design
4. Lack of knowledge capture | Shift to pattern-driven development
E-LT Architecture: High Performance
Conventional ETL Architecture (e.g., Informatica, IBM DataStage): Extract -> Transform -> Load
Transform in Separate ETL Server
Proprietary Engine
Poor Performance
High Costs
Next Generation Architecture, E-LT (Oracle Data Integrator): Extract -> Load -> Transform
Transform in Existing RDBMS
Leverage Resources
Efficient
High Performance
Benefits
Optimal Performance & Scalability
Easier to Manage & Lower Cost
Traditional E-T-L: Technical Detail
Conventional ETL Architecture
Diagram: sources S1, S2, S3 -> Extract -> Transform Server (ETL DB with repository and staging tables) -> Load -> Target 1
Needs one powerful server for the Transform Server and its staging data tables
High total cost for maintenance
Not flexible when adding more source and target data sources
Requires coding
Bad performance (more I/O among staging tables and source/target)
E-LT Architecture: Technical Detail
Next Generation Architecture: E-LT
Diagram: sources S1, S2, S3 -> Extract -> Load -> staging tables and Transform on Target 1
ODI Designer: no need at production
ODI Agent: for scheduling and real-time monitoring of changes only
Leverage existing resources for transformation: high performance, less I/O, and lower license cost
Design data flow with pre-defined templates, open for all types of data sources (drag & drop)
Capture changed data for near real-time data synchronization
No coding required
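To make the E-LT idea concrete, here is a minimal sketch that loads raw rows into a staging table and then transforms them with one set-based SQL statement inside the target database. It is an illustration only, using Python's built-in sqlite3 as a stand-in target; the tables (stg_orders, orders) and the sample rows are hypothetical, and ODI generates its own code for real targets.

```python
import sqlite3

# Hypothetical rows as they might arrive from a source extract (all text, uncleaned).
source_rows = [
    (1, " alice ", "12.50"),
    (2, "BOB", "7.25"),
    (3, "carol", "not-a-number"),
]

target = sqlite3.connect(":memory:")
target.executescript("""
    CREATE TABLE stg_orders (id INTEGER, customer TEXT, amount TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, amount REAL);
""")

# Extract + Load: copy the raw rows into staging on the target, untransformed.
target.executemany("INSERT INTO stg_orders VALUES (?, ?, ?)", source_rows)

# Transform: a single set-based statement executed by the target engine itself,
# so no separate transformation server touches the data.
target.execute("""
    INSERT INTO orders (id, customer, amount)
    SELECT id, UPPER(TRIM(customer)), CAST(amount AS REAL)
    FROM stg_orders
    WHERE amount GLOB '[0-9]*.[0-9]*'
""")

loaded = target.execute(
    "SELECT id, customer, amount FROM orders ORDER BY id"
).fetchall()
```

The row with a non-numeric amount stays in staging, which is roughly where an error table would come in.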
Active Integration: Batch, Event-based, and Service-oriented Integration
Oracle Data Integrator
Data Conductor: Data-oriented Integration
Event Conductor: Event-oriented Integration
Service Conductor: Service-oriented Integration
Declarative Design
Metadata
Enables real-time data warehousing and operational data hubs
Services plug into Oracle SOA Suite for comprehensive integration
Benefits
Evolve from Batch to Near Real-time Warehousing on Common Platform
Unify the Silos of Data Integration
Data Integrity on the Fly
Services Plug into Oracle SOA Suite
Declarative Design: Developer Productivity
Conventional ETL Design
Specify ETL Data Flow Graph
Developer must define every step of complex ETL flow logic
Traditional approach requires specialized ETL skills and significant development and maintenance efforts
Declarative Set-based Design
Reduces the number of steps
Automatically generates the data flow, whatever the sources and target DB
Benefits
Significantly reduce the learning curve
Shorter implementation times
Streamline access to non-IT pros
ODI Declarative Design
Define What You Want
Define How: Built-in Templates
Automatically Generate Dataflow
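The split between "what you want" and "how" can be sketched as a tiny code generator: the developer declares only the column mappings, and a template turns them into a set-based statement. This is a hypothetical illustration, not ODI's generator; the function generate_insert_select and the table and column names are assumptions made for the example.

```python
def generate_insert_select(target, source, mappings, where=None):
    """Build an INSERT ... SELECT statement from a declarative column mapping.

    mappings maps each target column to the source expression that fills it.
    """
    columns = ", ".join(mappings)
    expressions = ", ".join(mappings.values())
    sql = f"INSERT INTO {target} ({columns}) SELECT {expressions} FROM {source}"
    if where:
        sql += f" WHERE {where}"
    return sql


# "What": the mapping rules. "How": the template inside generate_insert_select.
mapping = {
    "customer_id": "src.id",
    "full_name": "UPPER(src.name)",
}
sql = generate_insert_select(
    "dw_customer", "crm_customer src", mapping, where="src.active = 1"
)
```

Swapping the template (for example, to emit a merge instead of an insert) would change the generated flow without touching the mapping.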
Pluggable Data Integration Architecture: Hot-Pluggable (Modular, Flexible, Extensible)
Pluggable Architecture
Reverse: Reverse Engineer Metadata
Journalize: Read from CDC Source
Load: From Sources to Staging
Check: Constraints before Load
Integrate: Transform and Move to Targets
Service: Expose Data and Transformation Services
Diagram elements: CDC, Sources, Staging Tables, Target Tables, WS
Depending on the specific data source, select the right pre-defined code module (Knowledge Module) -> Hot-Pluggable
Supports all types of data sources (DB2/AS400, Oracle, Excel, File)
Benefits
Tailor to existing best practices
Ease administration work
Reduce cost of ownership
Knowledge Modules: Hot-Pluggable (Modular, Flexible, Extensible)
Pluggable Knowledge Modules Architecture
Reverse: Reverse Engineer Metadata
Journalize: Read from CDC Source
Load: From Sources to Staging
Check: Constraints before Load
Integrate: Transform and Move to Targets
Service: Expose Data and Transformation Services
Diagram elements: CDC, Sources, Staging Tables, Error Tables, Target Tables, WS
Sample out-of-the-box Knowledge Modules
SAP/R3
Siebel
Log Miner
DB2 Journals
SQL Server Triggers
Oracle DBLink
DB2 Exp/Imp
JMS Queues
Check MS Excel
Check Sybase
Oracle SQL*Loader
TPump/Multiload
Type II SCD
Oracle Merge
Siebel EIM Schema
Oracle Web Services
DB2 Web Services
Benefits
Tailor to existing best practices
Ease administration work
Reduce cost of ownership
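One way to picture the hot-pluggable idea is a registry that maps an integration step and a technology to an interchangeable module. The sketch below is an analogy only and does not reflect how ODI stores or executes Knowledge Modules; the registry, the decorator, and the two sample modules are all hypothetical.

```python
from typing import Callable, Dict, Tuple

# Hypothetical registry: (step, technology) -> function describing the work.
KNOWLEDGE_MODULES: Dict[Tuple[str, str], Callable[[str], str]] = {}


def knowledge_module(step: str, technology: str):
    """Decorator that plugs a module into the registry under (step, technology)."""
    def register(func: Callable[[str], str]) -> Callable[[str], str]:
        KNOWLEDGE_MODULES[(step, technology)] = func
        return func
    return register


@knowledge_module("load", "file")
def load_file(table: str) -> str:
    return f"bulk-load flat file into staging for {table}"


@knowledge_module("load", "oracle")
def load_oracle(table: str) -> str:
    return f"copy {table} into staging over a database link"


def plan(step: str, technology: str, table: str) -> str:
    """Pick the module registered for this step and technology and apply it."""
    try:
        module = KNOWLEDGE_MODULES[(step, technology)]
    except KeyError:
        raise LookupError(f"no knowledge module for {step}/{technology}") from None
    return module(table)
```

Adding support for another technology then means registering one more function; the code that calls plan does not change.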
KMs: Truly Heterogeneous
Generic SQL DB
Oracle DB 9i
Oracle DB 10g
Oracle DB 10g XE
IBM DB2/400
IBM DB2/UDB
IBM Informix SE
IBM LDAP Server
MS SQL Server 2000
MS SQL Server 2005
MS SQL Server 2005 SE
MS Office Access 2000
MS Office Excel 2000
MS Active Directory
Sybase ASA 8.x & 9.x
Sybase IQ 12.x
Sonic MQ v7.0
Teradata V2R5.x
Teradata V2R6.x
Netezza Performance Server 2.2.1
Hyperion Essbase
PostgreSQL 8.1
MySQL 4.0
MySQL 5.0
Oracle BI Suite 10g
Oracle BAM 10g
Oracle Internet Directory 9i
OpenLDAP 2.3
Siebel CRM 7.8
JD Edwards
PeopleSoft
SAP R/3
Oracle EBusiness Suite
Oracle AQ 10g
Oracle SOA Suite
Oracle ESB 10g
SalesForce.com App Exchange
Any JMS Standard Implementation
Out-of-Box Knowledge Modules
Popular Usage Scenarios
E-LT for Data Warehouse: Create Data Warehouse for Business Intelligence
Populate Warehouse with High Performance ODI
Heterogeneous sources and targets
Incremental load
Slowly changing dimensions
Data integrity and consistency
Changed data capture
Data lineage
Diagram elements: Operational, Data Warehouse, Cubes, Analytics, Metadata
Diagram steps: Load, Transform, Capture Changes, Incremental Update, Data Integrity, Aggregate, Export
Data Transformation, Data Warehousing
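The incremental update step listed for this scenario can be sketched as follows: only staging rows that are new or differ from the warehouse are applied. This is a simplified illustration on Python's built-in sqlite3; the tables (stg_product, dw_product) and data are hypothetical, and a real integration module would typically also handle deletes and slowly changing dimensions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE stg_product (sku TEXT PRIMARY KEY, price REAL);
    CREATE TABLE dw_product (sku TEXT PRIMARY KEY, price REAL);
    INSERT INTO dw_product VALUES ('A', 1.0), ('B', 2.0);
    INSERT INTO stg_product VALUES ('A', 1.0), ('B', 2.5), ('C', 3.0);
""")

# Rows that are missing from the warehouse or whose price differs.
DELTA_QUERY = """
    SELECT s.sku, s.price
    FROM stg_product AS s
    LEFT JOIN dw_product AS d ON d.sku = s.sku
    WHERE d.sku IS NULL OR d.price <> s.price
    ORDER BY s.sku
"""


def changed_rows(conn):
    """Return the staging rows that an incremental update would apply."""
    return conn.execute(DELTA_QUERY).fetchall()


def incremental_update(conn):
    """Apply only the delta: insert new keys, replace rows whose values changed."""
    conn.execute(
        "INSERT OR REPLACE INTO dw_product (sku, price) " + DELTA_QUERY
    )


delta = changed_rows(conn)
incremental_update(conn)
rows = conn.execute("SELECT sku, price FROM dw_product ORDER BY sku").fetchall()
```

Unchanged row 'A' is never rewritten, which is the point of an incremental load when batch windows are shrinking.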
ODI for Master Data Management: Common Data Quality and Middleware Services
Solutions & Applications: Vertical Driven, Data Object Centric, Application Focus
Industry Solutions: Telco, Energy, Banking, Retail, Mfr, ..
MDM Applications: Customer, Supplier, Employee, Product, Asset, ..
Fusion Middleware Foundation
Middleware Foundation: Process Orchestration, Business Intelligence, Registry & Policies, Data Integration & Quality
Oracle Data Integrator: Batch & Real-time Integration, Data Quality & Profiling, Transformation & Data Routing
Diagram elements: Siebel CRM, Oracle EBS, PeopleSoft, SAP/R3, Other Sources; Oracle Data Integrator (E-LT Metadata, E-LT Agent); Golden Master Records; Master Data Management
ODI Enhances Oracle BI: Populate Warehouse with High Performance ODI
Oracle Business Intelligence Suite EE:
Simplified Business Model View
Advanced Calculation & Integration Engine
Intelligent Request Generation
Optimized Data Access
Oracle Data Integrator:
Populate Enterprise Data Warehouse
Optimized Performance for Load and Transform
Extensible Pre-packaged E-LT Content
Diagram elements: Siebel CRM, Oracle EBS, PeopleSoft, SAP/R3, Other Sources; Oracle Data Integrator (E-LT Metadata, E-LT Agent; Bulk E-LT); Oracle BI Enterprise Data Warehouse; Oracle BI Suite EE (Oracle BI Server; Oracle BI Presentation Server: Answers, Interactive Dashboards, Publisher, Delivers)
ODI Enhances Oracle SOA Suite: Add Bulk Data Transformation to BPEL Process
Oracle SOA Suite: BPEL Process Manager for Business Process Orchestration
Oracle Data Integrator: Efficient Bulk Data Processing as Part of Business Process
Interact via Data Services and Transformation Services
Diagram elements: Oracle SOA Suite (Business Activity Monitoring, Web Services Manager, Declarative Rules Engine, Enterprise Service Bus, BPEL Process Manager); Bulk Data Processing; Oracle Data Integrator (E-LT Metadata, E-LT Agent)
ODI with BAM: Populate BAM with ETL Data Efficiently
Oracle SOA Suite: Business Activity Monitoring for Real-time Business Insight
Message-based, event-driven, memory-resident architecture
Oracle Data Integrator: High Performance Loading of BAM's Active Data Cache
Pre-built and Integrated via Knowledge Modules
BAM Java APIs Exposed through Interface Like Any Other Target
Sample Combined Use Cases
Monitor Together Events and the Aggregate Implications of Events
Diagram elements: Data Warehouse, SAP/R3, PeopleSoft, Message Queues, CDC; Oracle Data Integrator (Metadata, Agent; Bulk and Real-Time Data Processing); Oracle SOA Suite (BPEL Process Manager, Web Services Manager, Business Rules Engine, Enterprise Service Bus); Business Activity Monitoring (Active Data Cache, Event Engine, Report Cache, Event Monitoring, Web Applications)
Integration with SOA/BI/Fusion: Resolve All Integration Challenges
Diagram elements:
Oracle Data Integrator: E-LT Agent, Metadata Repository, Knowledge Modules
Integration styles: High speed Batch ELT, High speed JMS ELT, CDC based ELT
Sources: XML, Oracle JMS, CDC, Service as Data Source
Generate Data Services: Transformation Services, Data Services (WSDL)
Invoke links: BPEL Process Manager, Oracle BPA and Human Workflow, Oracle BAM (Active Data Cache)
Oracle BI Enterprise Data Warehouse; Oracle BI: Dashboards, Reporting, Analysis, Publishing
Performance
ODI vs. ESB
Legend: Recommended, Considered, Can use
Performance Report
Source and Target: 2 dual-core CPUs, 12 GB RAM
Real-life Scenarios
Chart axis, Data Volume Processing: Message by Message, Mini Batches, Large Volume (over 1M)
Chart axis, Data Latency: Synchronous (immediate), Asynchronous, Batch (over 2 hours)
Chart regions: Oracle Enterprise Service Bus, ODI with ESB, Oracle Data Integrator
Understanding Performance Choices: When you need to transform data at large size
Row and column headers give the source and target formats.

Less than 10MB | XML | File | DB
XML | ESB | ESB | ESB
File | ESB | ESB | depends
DB | ESB | depends | ODI

Between 10-50MB | XML | File | DB
XML | depends | depends | ODI
File | depends | ODI | ODI
DB | ODI | ODI | ODI

Greater than 50MB | XML | File | DB
XML | depends | ODI | ODI
File | ODI | ODI | ODI
DB | ODI | ODI | ODI

Notes on "depends":
Depends on whether an intermediary XML format is useful for other processing (use ESB), or if joining File data to tabular RDB data is required (use ODI)
Depends on how much cross-referencing among the data values and rows is required during transformation: the more there is, the faster ODI will perform relative to ESB
If the source and target are both XML, and there is no cross-referencing of data among rows, then a streaming-type or parallel-engine-type approach might scale
*Caveat: always benchmark if you are unsure and require the best possible results
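The three grids above can be encoded as a simple lookup, which makes the guidance easy to apply consistently. The sketch below only restates the slide's cells; the function names and the handling of the exact 10 MB and 50 MB boundaries (both placed in the middle band) are assumptions.

```python
FORMATS = ("XML", "File", "DB")

# One 3x3 grid per size band, in XML / File / DB order on both axes.
# The grids on the slide are symmetric, so axis orientation does not matter.
MATRIX = {
    "<10MB": [
        ["ESB", "ESB", "ESB"],
        ["ESB", "ESB", "depends"],
        ["ESB", "depends", "ODI"],
    ],
    "10-50MB": [
        ["depends", "depends", "ODI"],
        ["depends", "ODI", "ODI"],
        ["ODI", "ODI", "ODI"],
    ],
    ">50MB": [
        ["depends", "ODI", "ODI"],
        ["ODI", "ODI", "ODI"],
        ["ODI", "ODI", "ODI"],
    ],
}


def size_band(size_mb: float) -> str:
    """Map a payload size in MB to one of the three bands (boundary choice assumed)."""
    if size_mb < 10:
        return "<10MB"
    if size_mb <= 50:
        return "10-50MB"
    return ">50MB"


def recommend(size_mb: float, source: str, target: str) -> str:
    """Return 'ESB', 'ODI', or 'depends' for the given size and formats."""
    grid = MATRIX[size_band(size_mb)]
    return grid[FORMATS.index(source)][FORMATS.index(target)]
```

A "depends" result still needs the judgment described in the notes, and the slide's own caveat about benchmarking applies to every cell.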
Topology 1: Oracle to Oracle (Vietnamese Customer PoC)
Databases: Oracle 10.2+/Win, Oracle 10.2+/Linux
Hardware: Quad Core/4 GB RAM; Dual Core/2 GB RAM
Components: Repositories, Agent, ODI Designer
Data Synchronization
Performance Results
100k rows, 15 fields
Load: LKM DBLink 3s
Real-time synchronization (JKM DBLink): Update 65k: 13s; Delete 30k: 8s
1.2m rows, 8 fields (about 120 bytes/row)
Load: LKM DBLink 24s, JDBC 4.5 minutes
Real-time synchronization (JKM DBLink): Update 5000 rows, 8s; Delete 5000 rows, 8s
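For a rough sense of scale, the 1.2m-row load times can be converted into throughput. The sketch below uses only the figures reported above (1.2 million rows, about 120 bytes per row, 24 seconds over DBLink, 4.5 minutes over JDBC); treating 1 MB as 1,000,000 bytes is an assumption, and the results are approximations because the row size is itself approximate.

```python
ROWS = 1_200_000
BYTES_PER_ROW = 120  # "about 120 bytes/row" as reported


def throughput_mb_per_s(rows: int, bytes_per_row: int, seconds: float) -> float:
    """Approximate load throughput in MB/s, assuming 1 MB = 1,000,000 bytes."""
    return rows * bytes_per_row / seconds / 1_000_000


dblink_seconds = 24
jdbc_seconds = 4.5 * 60  # 4.5 minutes

dblink_mb_s = throughput_mb_per_s(ROWS, BYTES_PER_ROW, dblink_seconds)
jdbc_mb_s = throughput_mb_per_s(ROWS, BYTES_PER_ROW, jdbc_seconds)
speedup = jdbc_seconds / dblink_seconds

print(f"DBLink: {dblink_mb_s:.2f} MB/s")
print(f"JDBC:   {jdbc_mb_s:.2f} MB/s")
print(f"DBLink is about {speedup:.2f}x faster for this load")
```

Under these assumptions the data set is roughly 144 MB, so the DBLink load runs at about 6 MB/s against about 0.53 MB/s for JDBC.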
Real-time Synchronization with CDC: CPU Usage
Without CDC: CPU 10%, 1s-1.5s
Enable CDC (LogMiner) and use Agent Scheduler: CPU 2%, 1s-1.5s
Scenario with 1.2m rows
Update 3900 rows, CPU 23%, 2s
Delete 3900 rows, CPU 21%, 2s
Summary
Oracle Data Integrator
Data Movement and Transformation from Multiple Sources to Heterogeneous Targets
DIFFERENTIATOR | BENEFIT
1. Heterogeneous E-LT | Best Performance
2. Declarative Design | Productivity
3. Declarative CDC | Real-time Integration
4. Knowledge Modules | Hot-Pluggable
5. The Chosen Integration Technology of Oracle Fusion | Future Proof
Reference Customers
Customer: Overstock.com
Solution: High-Volume Real-Time Data Transformation
Technology: Oracle Data Integrator, Oracle 9i & 10g RAC, Dell Linux, IBM AIX, Teradata 8-node 54000
Overstock.com, Inc. (NASDAQ: OSTK) operates as an online retailer offering bed-and-bath goods, furniture, watches, jewelry, electronics, sporting goods, and designer accessories.
Business Problem:
Wanted to enable sales, finance, marketing and merchandising teams to have access to near real-time data so that they could make timely, more intelligent business decisions.
Wanted to know at any point in time if company performance is meeting the target metrics.
Needed a data integration product that could handle our high-volume loading and transformation requirements in near real time.
Oracle Data Integrator Solution:
Found a way to ensure that Teradata data warehouse was constantly updated.
Even highly complex transformations are automated within the
Supporting several terabytes of data stored in the enterprise warehouse, and millions of daily transactions
Solution Architecture:
Data Integration Architecture
Data Sources, Targets, and Platforms: Oracle 9i RAC & 10g RAC; Teradata 8-node 54000; GoldenGate TDM Transactional Management
Platforms: IBM AIX, Dell Linux
Oracle Data Integrator: 100% Java architecture, high-performance E-LT transformations, business-rules driven transformation design tool, automatic load script generation
>1.2M SKUs, >5M daily transactions, >300 users, deployable for both batch and real-time use cases, leverages power of Teradata engine for improved speed of data transformation
Oracle Data Integrator is helping us turn our data into gold
Data Integrator allows us to perform data transformations using the power of our Teradata Enterprise Warehousing platform. [...] With Oracle, over 300 users are now able to have access to their relevant data in real-time, hourly, daily, or weekly depending upon their needs.
Having access to key business metrics in real-time is no longer a fantasy.
In short, Oracle Data Integrator give us the ability to make better decisions and better manage our bottom line.
Company: Overstock.com Product: Oracle Data Integrator Contact: Miranda Nash Email: [email protected]
Customer: Sabre Holdings
Solution: High-Volume Real-Time Data Transformation
Technology: Oracle Data Integrator, Oracle DB, MQ sources, Teradata Data Warehouse target
For more than 40 years, Sabre Holdings (NYSE: TSG) has transformed the airline industry through technological advancement. The Company offers a portfolio of travel marketing, distribution and technology solutions.
Business Problem:
High costs associated with Data Warehouse loading from new sources
Large Teradata Data Warehouse requires top performance for loading data in near-real time
Integrated views of data require complex transformations, expensive to maintain
Oracle Data Integrator Solution:
E-LT architecture maximizes performance and leverages existing investment in Teradata infrastructure
Lower development and maintenance costs for E-LT driven by declarative design tools
Bottom Line: Integrated travel industry data in consolidated view enables Sabre to better serve their customers and travel suppliers
Solution Architecture:
Data Integration Architecture
Data Sources, Targets, and Platforms: Oracle RDBMS; Flat Files; various other sources over MQ; Teradata Data Warehouse
Oracle Data Integrator: 100% Java architecture, high-performance E-LT transformations, business-rules driven transformation design tool, automatic load script generation
We needed a data integration tool that would reduce our dependency on manual coding of E-LT scripts and leverage the power of our Teradata Warehouse for data transformation.
Company: Sabre Holdings Product: Oracle Data Integrator Contact: Miranda Nash Email: [email protected]
Customer: DHL
Solution: High-Volume Real-Time Data Transformation
Technology: Oracle Data Integrator, Oracle RDBMSs, Teradata Data Warehouse, Cobol Flat Files
For more than 35 years, DHL has built the world's premier global delivery network by trailblazing express shipping in one country after another. Over 220 countries and territories later, DHL is the global market leader of the international express and logistics industry.
Business Problem:
24/7 business cannot be compromised by long ETL batches (via an ETL Tool)
Every daily load cannot last more than one hour
When the volume of data doubles, execution time triples
Data Integration was the bottleneck in providing more services
Oracle Data Integrator Solution:
With Oracle Data Integrator, every batch that used to last one hour now lasts seconds
Reducing window time is critical to adding more functionality
Running mini-batches more often results in more customer services and more revenue
Using the RDBMS as an engine for data transformation simplifies the administrative workload
Solution Architecture:
Data Integration Architecture
Data Sources, Targets, and Platforms: Oracle RDBMS; Flat Files; Teradata Data Warehouse
Platforms: Linux, Cobol
Oracle Data Integrator: 100% Java architecture, high-performance E-LT transformations, business-rules driven transformation design tool, automatic load script generation
2.5 terabytes loaded every 15 minutes from 8 major data sources
>50 events, >5 shipments and > piece/parcel records per day
Solution completely meets our needs. [...] Oracle Data Integrator was developed by ETL developers, who really know and understand ETL concerns and pains, and how to do things better.
Company: DHL Product: Oracle Data Integrator Contact: Miranda Nash Email: [email protected]
Customer: iBasis
Solution: High-Volume Real-Time Data Transformation
Technology: Oracle Data Integrator, Oracle 10g, Netezza PowerCenter NPS8350 Warehouse Appliance
Founded in 1996, iBasis (NASDAQ: IBAS) is one of the largest carriers of international voice traffic in the world and a leading provider of prepaid calling services.
Business Problem and Oracle Data Integrator Solution:
Data warehouse had become obsolete and could not respond to the growing requirements of management, sales, and operational centers
Needed more accurate and timely data
Replaced entire Data Warehouse infrastructure
Needed a data integration that would provide the scalability and performance they needed to aggregate, transform, and load their data
Solution Architecture:
Data Integration Architecture
Data Sources, Targets, and Platforms: Oracle RDBMS; Flat Files; Netezza PowerCenter NPS8350
Applications (future): Call Billing, Network Monitoring
Oracle Data Integrator: 100% Java architecture, high-performance E-LT transformations, business-rules driven transformation design tool, automatic load script generation
4.5TB data warehouse, > 8 billion records, company processes >150 million transactions per day
The first thing that struck us was the speed with which we ramped up our ETL developments with Oracle Data Integrator.
"Given the massive volumes of data we need to process every day, getting timely data in the data warehouse requires high performance loading processes. Using Oracle Data Integrator's set of Knowledge Modules for Netezza, we are able to take advantage of the massively parallel processing capabilities of Netezza and to reduce load times significantly. [...] as our goal is to go more and more toward real-time, it will be easy for us to change the latency of these flows without having to redevelop them."
Company: iBasis Product: Oracle Data Integrator Contact: Miranda Nash Email: [email protected]
Analysts Coverage
Gartner
Sunopsis (Oracle) has made strides in building market awareness beyond its base in Europe. Sunopsis has a range of capabilities, spanning ETL and real-time messaging, and an architecture that enables distribution of transformation workload across data sources and targets.
Ted Friedman, Bill Gassman, Magic Quadrant for Extraction, Transformation and Loading, 1H05, May 11, 2005
Bloor Research
While there are many relatively young vendors within the ETL market, Sunopsis has undoubtedly made the biggest impression, both in terms of the users that it has gained and in the way that its approach has influenced the market.
Philip Howard, Bullseye Report - Extract, Transform & Load, March 28, 2006
Gartner
By purchasing Sunopsis, Oracle has acquired a server-independent and platform-independent data integration tool, which will be renamed Oracle Data Integrator (ODI). OFM and Oracle Applications customers will welcome the addition of the ODI's database independence. In particular, the acquisition could provide needed new momentum for Fusion Middleware. Fusion Middleware customers have heterogeneous IT environments, as do former PeopleSoft, Siebel Systems and JD Edwards customers, who have an ongoing requirement for integration with non-Oracle systems. The acquisition will provide OFM with a data integration tool that is capable of deploying small-grained data services within a service-oriented architecture (SOA) environment. This capability could have a positive influence on Fusion Middleware - if Oracle leverages the Sunopsis philosophy.
Mark A. Beyer, Ted Friedman, Sunopsis Data Integration May Fuel Oracle Fusion Middleware, October 23, 2006
Forrester Research
Oracle has recognized that its customers require diverse data integration features without having to integrate and manage products from many vendors. Integrating Sunopsis' heterogeneous extract, load, transform (ELT) and event-driven CDC capabilities within its middleware offerings is a great start.
Rob Karel, Oracle Makes Serious Move In Data Heterogeneity by Acquiring Sunopsis, October 29, 2006