
Green Field March Edition






March 2010 - Edition 1

INSIDE

gREEN FIELD - TABLE OF CONTENTS

From the Editor 2
The Saga of ETL - Traversing the open route 3
Talend - The Vanguard 4
Kettle - Transform Data into Profits 5
Apatar - Bringing Business Closer to IT 6
CloverETL - ETL made Easy 6, 7
ETL - Time for a walk in the clouds 7
Maximize ROI on Data Integration with Informatica 8
Datastage - The solution to enterprise data integration 9
Ab Initio - A new beginning 10
Redefine Data Quality with ODI 10
SSIS - Microsoft's Bet in the ETL Market 11
Merging Horizons - Of ETL, EAI and EII 12
ETL in the times to come 13

DID YOU KNOW?

Banita Rout, a BI Team Member, is a key contributor to this newsletter. You can read more of her articles:

» Talend - The Vanguard pg. 4
» DataStage - The solution to Enterprise Data Integration pg. 9

KETTLE - TRANSFORM DATA INTO PROFITS
Stuck between "Build" or "Buy"?

» Read more on pg. 5


The SAGA of ETL

BY SWETA GUPTA

more on pg. 3

REVERSIDE BI NEWSLETTER


From the Editor's Desk
by Sweta Gupta

Business Intelligence is the buzzword that occupies the top rank in the list of priorities of CIOs.

The main challenge of Business Intelligence is to gather and serve organized information regarding all relevant factors that drive the business, and to enable end-users to access that knowledge easily and efficiently, in effect maximizing the success of an organization. As competition gets fiercer in the market, opting for the correct BI solution assumes increased importance.

With this growing popularity of Business Intelligence, we at Reverside BI Labs focus on exploring the gREEN areas, with the idea of coming up with our independent opinion about the leading BI strategies, tools and vendors. A team of IT professionals is dedicated to doing research and surveys of the BI market and building capacity. We dive to the depth of the various propositions that the leading tools and vendors in the market promise, showcasing the available alternatives and helping our valued clients find the best-suited solution.

We are pleased to bring forth gREEN FIELD, our BI newsletter, with the idea of providing a space for showcasing the research and views of the BI Labs and sharing it with all the members of the organization.

In this first issue, we focus on one of the most important areas that Business Intelligence (BI) covers from a technical standpoint - ETL, or Extract, Transform and Load. We have tried to put forth a unified view of the past, the present and the future of ETL, an integral part of BI. We have discussed some leading proprietary ETL tools like Oracle Data Integrator, Ab Initio and Datastage, as well as open source ones like KETTLE, Talend, CloverETL and Apatar. The comparison of these tools has been based on statistics and findings by the BI lab team.

So next time you are looking for what's new and latest in the BI market, ASK US!!

Thanks to the BI lab team members for their contributions. We hope that you like this edition; please do share your comments to help us make gREEN FIELD better.




The SAGA of ETL
by Sweta Gupta

These have been the causes of nightmares for organizations both big and small. So how do we prepare to face these challenges and ascertain that they will not haunt us again?

In this era of high-end technology and information, anything that is used and bought by an organization translates to a source of a huge amount of data. We are flooded with information from numerous sources, which adds more complexity in harnessing it as well as deriving substantial conclusions. There is too much data but very little insight. The absence of a "Single Version of Truth" causes chaos.

Business decisions are driven by the "Availability of Right information at Right time". For example, Gross Margin can be defined differently by Finance and Marketing, which influences how and what numbers are reported.

Data Warehousing emerged as the savior for organizations by harnessing information and translating it to profits. With growth in mind, it has been on the minds of organizations across all domains. As the keeper of highly refined and detailed information, data warehouses form the base and core of strategic decisions taken by business.

Thanks to ETL (an acronym for Extract, Transform and Load), which sieves out only the required and fine-grained data from the transactional systems and routes it to the data warehouse. As an acronym, however, ETL only tells part of the story. ETL tools also commonly move or transport data between sources and targets, document how data elements change as they move between source and target (i.e., metadata), exchange this metadata with other applications as needed, and administer all run-time processes and operations (e.g., scheduling, error management, audit logs, and statistics).

With ETL

STOP Crunching numbers 

START Crushing them!!! 

ETL, since the time of its introduction, has continuously improved and evolved to help users take better and more informed decisions. In the early nineties, the ETL process was hand coded. Developers used a combination of different languages like Shell, SAS, Perl and database procedures to write custom code to perform the ETL task. But these hand-written ETL programs had several drawbacks (a minimal hand-coded sketch follows the list):

1. They were lengthy and hard to document.

2. Hand coding of ETL required maintenance of metadata tables separately.

3. Any new changes required manual changes to the metadata tables.

4. Also, these programs, being single threaded, had a slower rate of execution.
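To make those drawbacks concrete, here is a minimal sketch of what such a hand-coded ETL step might have looked like, rewritten in Java for illustration. The file layout, table name and cleansing rule are hypothetical; the point is that every mapping and metadata detail lives in code, exactly the maintenance burden the list above describes.

import java.io.BufferedReader;
import java.io.FileReader;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

// Hypothetical hand-coded ETL: extract rows from a flat file,
// apply one transformation, load into a warehouse table via JDBC.
public class HandCodedEtl {
    public static void main(String[] args) throws Exception {
        try (Connection dw = DriverManager.getConnection(
                 "jdbc:oracle:thin:@dwhost:1521:DW", "etl_user", "secret");
             BufferedReader in = new BufferedReader(new FileReader("sales.csv"))) {
            PreparedStatement load = dw.prepareStatement(
                "INSERT INTO sales_fact (region, amount) VALUES (?, ?)");
            String line;
            while ((line = in.readLine()) != null) {         // extract
                String[] f = line.split(",");
                String region = f[0].trim().toUpperCase();   // transform: standardize codes
                double amount = Double.parseDouble(f[1]);    // transform: cast to number
                load.setString(1, region);
                load.setDouble(2, amount);
                load.executeUpdate();                        // load, one row at a time
            }
        }
        // Single threaded, row by row, and every rule is buried in code:
        // any schema change means editing and redeploying this program.
    }
}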

In the mid 90s, vendors recognized the opportunity and started shipping ETL tools that would lessen the arduous task of writing programs for ETL. And thus Code Generation ETL tools came into the market. These tools provided a graphical user interface which would generate the code for the ETL process.

However, this did not succeed in the long run.... Reasons??

1. The tools would produce the code in third-generation languages like COBOL, and hence maintenance of the code was difficult, as it required extensive knowledge of the specific language.

2. Also, they did not automate the run-time environment.

3. Often, administrators had to manually distribute and manage compiled code, schedule and run jobs, or copy and transport files.

All these were reasons enough for vendors to bring to the market the engine-based ETL tools. These products, launched in the mid to late nineties, employed proprietary scripting languages running within an ETL or DBMS server. Developers use a graphical interface to design the ETL workflows, which are stored in a metadata repository. The ETL engine, which typically sits on a Windows or UNIX machine, connects directly to a relational data source and reads the repository at runtime to determine how to process the incoming data. It can also connect to non-relational databases, using third-party gateways or by creating a flat file. It is also possible to process ETL workflows in parallel across multiple processors.

Although the engine-based approach unifies the design and the runtime environments, it does not necessarily eliminate all custom coding and maintenance. Users still needed to write code for complex requirements, which made maintenance difficult. The volume of data kept increasing exponentially through time, and the parameters for measuring performance grew more complex. The vendors soon realized the weight of the "T" in ETL.

Bulk processing had to be adopted to meet the challenge. The only way it could be achieved was by moving the transformation overhead from the ETL engines to the source and target databases. With the transformations being in-database, data could flow from source to target and then be transformed by the target database. This eliminates the row-by-row processing, thereby improving the efficiency and performance of the ETL process.
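To contrast with the row-by-row program sketched earlier, an in-database step pushes the transformation into the target database as a single set-based statement. A minimal sketch, with hypothetical staging and fact tables:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

// Hedged sketch: in-database transformation. The raw data is already in a
// staging table; one set-based SQL statement transforms and loads it,
// letting the target database do the work instead of the ETL engine.
public class InDatabaseTransform {
    public static void main(String[] args) throws Exception {
        try (Connection dw = DriverManager.getConnection(
                 "jdbc:oracle:thin:@dwhost:1521:DW", "etl_user", "secret");
             Statement stmt = dw.createStatement()) {
            stmt.executeUpdate(
                "INSERT INTO sales_fact (region, amount) " +
                "SELECT UPPER(TRIM(region)), SUM(amount) " +
                "FROM sales_staging " +
                "GROUP BY UPPER(TRIM(region))");   // transform happens in the DB
        }
    }
}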

Today, several DBMS vendors embed ETL capabilities in their DBMS products (as well as OLAP and data mining capabilities). Since these database-centric ETL vendors offer ETL capabilities at little or no extra charge, organizations are seriously exploring this option, because it promises to reduce costs and simplify their BI environments.

So it's not weird to hear users asking, "Why should we purchase a third-party ETL tool when we can get ETL capabilities for free from our database vendor of choice? What would be the additional benefits of buying a third-party ETL tool?"

Is it that ETL tools are used only with warehouses?

The answer is NO. ETL does not only work with data warehouses, but also when it comes to moving data among (web-based) applications, customer data integration and database consolidation. The non-data-warehouse usage is growing rapidly and already accounts for more than 40% of total ETL industry usage. User practices for ETL continue to evolve to keep pace with new requirements for data integration. The result is a growing market and innovative user practices.

Market growth proves ETL is here to stay. Hey, wait!!! This is not the end of the story. There is a lot more in store for you.


» Read more about "ETL in the times to come..." pg. 13


Talend - The Vanguard
by Banita Rout

There is nothing new about the fact that organizations' information systems tend to grow in complexity. The reasons for this include the "layer stackup trend" and the fact that information systems need to be more and more connected to those of vendors, partners and customers. Another reason is the multiplication of data storage formats, protocols and database technologies.

So how do we manage a proper integration of this data scattered throughout the company's information systems? Various functions lie behind the data integration principle: business intelligence or analytics integration, and operational integration.

By 2000, data warehousing had begun to emerge as a concept that was applicable to companies from medium-sized and small to large, but there was a gap in the marketplace. While there certainly was a need for ETL, the deal size was so large that midsize and smaller companies could not afford the ETL software. It was then that a new type of ETL was introduced: ETL for the midsize marketplace. And Talend emerged as the numero uno in this space. Both ETL for analytics and ETL for operational integration needs are addressed by Talend Open Studio.

Talend has a functional ETL tool set, but at open systems prices. This means that there is affordability for the midsize world. Talend offers its basic kernel for free; it can be downloaded from the Internet. Sitting on top of the Talend basic kernel are other features and services.

Talend fits into the marketplace with good functionality at a price significantly below that of any other competitor. This is indeed good news for midsize companies who need ETL, but who do not need the price tag of a full-blown ETL package offered to and used by much larger companies.

Talend hits all the highlights one would look for in traditional integration platforms:

• Batch delivery
• Transforms
• ETL
• Data governance
• And a strong set of connectivity adapters

At the same time, it keeps pace with important trends with such features as change data capture, metadata support, federated views, and SOA-based access to data services. Talend is capable of scaling from small departmental file migrations to large-scale enterprise warehousing projects.

Talend Open Studio operates as a code generator, allowing data transformation scripts and underlying programs to be generated in either Java or Perl. Its GUI is made of a metadata repository and a graphical designer. Talend Open Studio is a metadata-driven solution: all metadata is stored and managed in the repository shared by all the modules. Jobs are designed using graphical components for transformation, connectivity, and other operations. The jobs created can be executed from within the studio or as standalone scripts.
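Because each job is generated as ordinary Java (or Perl) code, an exported job can be launched like any other program. The sketch below assumes a job exported from Talend Open Studio as a Java class; the package, class, context and parameter names are hypothetical examples of the pattern Talend generates, not a definitive recipe.

// Hedged sketch: running a Talend-generated Java job outside the studio.
// Exported jobs ship as plain Java classes with their own main() method.
public class RunTalendJob {
    public static void main(String[] args) throws Exception {
        demo_project.customer_load_0_1.CustomerLoad.main(new String[] {
            "--context=Production",                    // assumed context name
            "--context_param", "run_date=2010-03-01"   // override a context variable
        });
    }
}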

In a nutshell, Talend Open Studio is a solution that addresses all of an organization's data integration needs:

• Synchronization or replication of databases
• Right-time or batch exchanges of data
• ETL for analytics
• Data migration
• Complex data transformation and loading
• Data quality

Continuous efforts are being put in to make Talend a tough competitor to the products from the commercial space. With every version we see enhanced features being added. Some of the features seen in the latest version of Talend are:

• OpenBravo components
• Die on Error on tamp
• Enable Informix bulk inserts (tInformixBulkOutput)
• tELTMSSQL, tELTSybase and tELTPostgreSQL components
• Enable PreparedStatement for all DB Row components
• MacOS X ini file points to correct launcher
• Analysis of a set of columns is enhanced
• Ability to use Java user-defined indicators
• New type of UDI (with numeric values)
• Menus to drill down into the values on pattern matching indicators

Here is a comparative analysis of Talend with some of its biggest open source and proprietary competitors.

Talend Vs Pentaho:
Pentaho is a metadata-driven framework which is tightly integrated into a BI framework, whereas Talend is a code generator which can be easily integrated into any BI platform.

Pentaho supports Java as the programming language, whereas Talend supports both Perl and Java. And there are no limitations on loading of data in the case of Talend.

And the most important thing is that we don't need to install and configure the Talend software.

Talend Vs CloverETL:
CloverETL is also a metadata-driven framework, where there is a limitation on loading huge numbers of records.

CloverETL doesn't accept .xlsx files, whereas Talend can easily handle them.

Talend Vs Informatica:
Informatica comes at a cost, while Talend is open source and can be downloaded and configured easily.

In Talend, we can export a job as a script and run it through the command prompt, which is not the case with Informatica.

Talend is now the recognized market leader in open source data integration and has become a competitor to the other market leaders. There are now more than 1,000 paying customers around the globe, including Yahoo!, Virgin Mobile, Sony and Swiss Life.

It's not unreasonable to say that Talend will definitely go a long way.



KETTLE - Transform Data into Profits
by Harapriya Montry

Stuck between "Build" or "Buy"?

Well, nothing strange. Both are very attractive choices, which makes the decision even more difficult. This scenario is the same with any data warehousing project, where you need to decide whether to populate your data warehouse manually using custom code or choose a proprietary ETL tool like Informatica or Oracle Warehouse Builder.

Then you know there is always this one good thing about the open source tools.... You get exactly what you need for free. Well, this has brought smiles to the faces of business in a lot of organizations. And a handful of these are ones who went on to choose an open source product for ETL.

Let's dig a little deeper.

The 'build' solution is appealing in that there are no upfront costs associated with software licensing, and you can build the solution to your exact specifications. However, businesses today are in a constant state of change, and the ongoing costs to maintain a custom solution often negate the initial savings. Proprietary ETL offerings will get your project off the ground faster and provide dramatic savings in maintenance costs over time, but often carry a six-figure price tag just to get started.

Pentaho Data Integration delivers the best of both worlds, with no upfront license costs and a significant reduction in TCO compared to custom-built solutions. An annual subscription providing professional support, certified builds, and IP indemnification is also available at a fraction of the cost of proprietary offerings.

When the KETTLE open source product moved under the Pentaho umbrella, it gave the product a new lease on life. This was already one of the most (if not the most) popular open source ETL tools, with a vibrant developer community. However, it was at risk of falling behind those open source ETL offerings (like Talend) that were backed by a funded company. Pentaho is a good match for KETTLE, as it puts it into a complementary suite and offers some out-of-the-box integration between products.

Companies looking for all-in-one open source business intelligence are going to like this suite - Pentaho Data Integration.

Unlike the traditional ETL process (extract, transform and load), KETTLE has a slightly modified content, "ETTL":

• Data extraction from source databases
• Transport of the data
• Data transformation
• Loading of data into a data warehouse

Kettle is 100% metadata based, without requiring any code generation in order to run properly. Metadata-driven ETL tools are worth their weight in gold because they don't require code changes in order to fully manage and control the tool.

It uses an innovative metadata-driven approach and has a strong and very easy-to-use GUI. It has a strong community of 13,500 registered users. It has a stand-alone Java engine that processes the jobs and tasks.

Kettle comes with 4 tools (a short embedding sketch follows the list):

• Spoon: GUI allowing you to design complex transformations
• Pan: batch executor of transformations (XML or in repository)
• Chef: GUI allowing you to design complex jobs
• Kitchen: batch executor of jobs (XML or in repository)
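Because the transformations are plain metadata (XML .ktr files), the same engine that Pan drives in batch mode can also be embedded in a Java program. The sketch below uses what I understand to be the Kettle 4-era API; treat the class names and the file name as assumptions rather than a definitive recipe.

import org.pentaho.di.core.KettleEnvironment;
import org.pentaho.di.trans.Trans;
import org.pentaho.di.trans.TransMeta;

// Hedged sketch: running a Kettle transformation from Java, much as Pan
// does from the command line. "load_sales.ktr" is a hypothetical file.
public class RunKettleTransformation {
    public static void main(String[] args) throws Exception {
        KettleEnvironment.init();                         // bootstrap the engine
        TransMeta meta = new TransMeta("load_sales.ktr"); // parse the XML metadata
        Trans trans = new Trans(meta);
        trans.execute(null);                              // start all step threads
        trans.waitUntilFinished();
        if (trans.getErrors() > 0) {
            throw new RuntimeException("Transformation finished with errors");
        }
    }
}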

Why should we go for Kettle?

It is one of the oldest open source ETL tools. It has a large user community and new drive from the support of Pentaho. It can run on Windows, UNIX and Linux. It has integration with other Pentaho open source products such as BI, EII and EAI. There is no fee or license. It has a strong, easy-to-use GUI requiring less training. It includes a transformation library with over 70 mapping objects. Almost every popular database is supported. Many advanced features exist to allow fast inserts, such as batch updates. There is a Pentaho forum, an issue tracker and a Pentaho community with deep technical articles that are better than some premium ETL vendor sites.

Pentaho Data Integration is a full-featured ETL solution including:

• Transformations and jobs are made up of 100% metadata. This metadata is parsed by Kettle and executed; no code generation is involved.
• Pentaho Kettle has relatively richer features (compared to other open source alternatives like Talend, CloverETL etc.) in its open source version
• Fairly large connectivity options to support all databases and systems
• Very rich library of transformation objects, which can be extended
• Supports real-time debugging
• Command line or application interface to control and run jobs - available in both open source and commercial editions
• One of the very important ETL needs, "Dimension Lookup/Update" to handle slowly changing dimensions, is available and easy to use
• Error logs are easily available and easy to configure; no need to code them explicitly
• Pentaho Services monitoring console available for monitoring Pentaho-related services
• Though error recovery is manual, the "Text File Input" and "Excel Input" operators are capable of logging the error rows and re-running only those error rows when run again
• Clustering feature is available in the open source edition
• It provides a plug-in mechanism that allows us to create plug-ins for any possible data acquisition or transformation purpose
• It is one of the only ETL tools on the market to support partitioned tables on PostgreSQL, by allowing records to be inserted into different inherited tables
• It can schedule tasks, but needs a scheduler for that
• It can run remote jobs on "slave servers" on other machines
• It has data quality features: from its own GUI, writing more customized SQL queries, JavaScript and regular expressions
• It supports a parallel processing architecture by distributing ETL tasks across multiple servers
• Out-of-the-box integration with other Pentaho open source products such as BI, EII and EAI
• The GUI designer interface, the out-of-the-box transformer objects and the support for slowly changing dimensions should enable increased developer productivity
• Community articles show an enthusiastic sharing of tips and tricks
• Enterprise-class performance and scalability
• SAP connector also available

KETTLE Vs Talend

Both Talend and Kettle are among the industry-leading open source ETL tools; let's compare a few of their features.

Ease of Use:

Pentaho Kettle - It has the most easy-to-use GUI of all the ETL tools. Training can also be found online or within the community.

Talend - It also has a GUI, but as an add-on inside Eclipse RCP.

Speed:

Pentaho Kettle - It is faster than Talend, but the Java connector slows it down somewhat. It also requires manual tweaking, like Talend. It can be clustered by being placed on many machines to reduce network traffic.

Talend - It is slower than Pentaho. It requires manual tweaking and prior knowledge of the specific data source to reduce network traffic and processing.

Data Quality:

Pentaho - It has DQ features in its GUI and allows for customized SQL statements, using JavaScript and regular expressions. It also has some additional modules available with a subscription.

Talend - It has DQ features in its GUI and allows for customized SQL statements, using Java.

Connectivity:

Pentaho Kettle - It can connect to a very wide variety of databases, flat files, XML files, Excel files and web services.

Talend - It can connect to all the current databases, flat files, XML files, Excel files and web services, but is reliant on Java drivers to connect to those data sources.

The best part is that Pentaho Data Integration's metadata-driven approach lets you simply specify WHAT you want to do, not HOW you want to do it. Administrators can create complex transformations and jobs in a graphical, drag-and-drop environment without having to generate any custom code.

And definitely, Pentaho is a good match for KETTLE, as it puts it into a complementary suite and offers some out-of-the-box integration between products. KETTLE transforms data into profits.


Apatar - Bringing Business Closer to IT
by Jagyanseni Das

CloverETL - ETL made Easy
by Sodyam Bebarta

A programmer spent hours of his spare time switching from one database format to another, with big troubles related to handling duplicates and converting formats. Then he was supposed to integrate thousands of records from several MS Excel spreadsheets to an Oracle DB.

A business analyst wonders how much time would be required to integrate all the data. How would the company's data warehouse be fed with current and past records?

This is where an ETL tool comes to the rescue of developers as well as the business. ETL tools are meant to extract, transform and load the data into a data warehouse for decision making. They make the job of a programmer easier by providing an easy way to update and insert records to different sources very quickly. There are various ETL tools available in the market.

Apatar is one of the market-leading ETL tools. The Apatar open source project was founded in 2005 by Apatar, Inc. At the beginning of 2007, Alex Malkov, Apatar Product Manager, felt it was time to share the results of their hard work with business users, saving them from spending endless hours of coding building "pipes" between data sources and applications.

Why Apatar?

It is very user-friendly and, even for a non-technical user, it would take just a couple of hours to get trained. Where data integration is required on a regular basis, Apatar will benefit from reusable connectors and mappings, with a further quantum leap in productivity. Customers don't have to pay for the software, as it is open source.

Total Cost of Ownership is the major concern for all organizations, irrespective of geography, domain and area of operation. The ETL market is hot with many vendors and products; however, there are very few products which address all the above-mentioned scenarios.

One such product is CloverETL. It is platform-independent and resource-efficient. Due to its high scalability, it can be used on low-cost PCs as well as high-end multi-processor servers. CloverETL is enhanced with a tool for the visual design of data transformations, CloverETL Designer. It allows for the easy design of any data-manipulating application through a suitable combination of standard predefined ETL components, using a visual editor.

CloverETL Engine is an open source tool distributed under a dual license, which allows total transparency and control over the tool, as the complete source code of the engine is available to all customers and end-users.

In order to use any software tool in a professional environment, it is necessary to have competent support and service to provide bug fixes and application enhancements, and to have expert consultants at hand who have practical knowledge and experience with the tool.


Once the programmer installs and starts working in the environment, we don't need to pay the programmer more for maintenance. A little bit of technical knowledge and training is required to start working with Apatar. Apatar has been developed using sophisticated techniques to achieve data integration by "drag and drop" of connectors, operations and data quality services.

Features of Apatar:

• Integrate data across the enterprise
• Populate data warehouses and data marts
• Cross systems such as: source systems; flat files; FTP logic; queue-to-queue; and application-to-application
• Cross time-zones and currency barriers efficiently
• Overcome brittle "mainframe" or legacy code uplinks that transfer data, sometimes unreliably
• Schedule and maintain no-code or little-code connections to many different systems
• Platform independent

Advantages:

• No coding. A visual job designer is used to develop all kinds of mappings and transformations.
• It provides connectivity for more than 40 different data sources.

Customer support as well as training and consulting services for CloverETL Engine/CloverETL Designer are offered by the OpenSys company.

How does CloverETL work?

It is based on the transformation graph. A transformation graph is a set of certain specialized components, which are interlinked via data pipes. Every such component performs a certain operation. Data processed by CloverETL flows through the transformation graph and is step-by-step transformed into the required format. While performing the transformation, data can be merged, sorted, split or enriched in many other ways.

Pros

Embedded technology
Being completely platform independent, CloverETL can be easily embedded in other applications as a powerful transformation library.

• It is Unicode-compliant (meaning it can handle any language)
• It provides a mailing facility for notification
• It provides the CDYNE Death Index for verifying Social Security information and preventing deceased credit fraud. If the customer is deceased, it provides information about the date of death, date of birth, and zip code of the last known residence. All of the information is cross-referenced with CDYNE's Death Index master file, which is updated directly from the U.S. Social Security Administration once a month.
• It provides the CDYNE Phone Verification web service, which allows determining the validity of any U.S. or Canadian phone number.

New Releases:

• Apatar allows text to be automatically truncated to a specified length
• Apatar Data Integration parses the text of e-mails, omitting system information
• Apatar controls the number of scheduled data integration launches
• Apatar Data Integration replaces multiple field values

With its ease of use, Apatar is bridging the gap between business and IT.

Development is still ongoing to provide connectors for SAP, Microsoft Exchange Server, and Microsoft Dynamics CRM. Let's see what the next release has for us.

Small footprint
Compared to its competitors, CloverETL shows modest memory requirements even when performing complex data transformation tasks.

Rapid customization
Thanks to its modular structure, CloverETL can be easily extended with custom Java-coded components. Such components can be used like any other component contained in the standard package.

Reduced cost of ownership
The CloverETL suite offers a wide range of solutions to meet any user requirements. Ranging from the developer-oriented CloverETL Engine to the enterprise-oriented CloverETL Server, CloverETL delivers the best price-performance ratio.


» continue on pg.7


Short development time
CloverETL is continuously developed by a stable team of programmers, which allows flexible reaction to customer needs and feedback. Customizations can be delivered within a few days.

Easy installation
There is no need for expensive on-site assistance; CloverETL can be easily installed, configured, and run by its users. There is no need to install expensive proprietary applications as a running environment.

New Features and Components:

Infobright Data Writer:
This component writes data into Infobright software, a column-oriented relational database. Infobright is a provider of solutions designed to deliver a scalable data warehouse optimized for analytic queries. Infobright is a highly compressed column-oriented database based on the MySQL engine. In this database, data is stored column-by-column instead of the more typical row-by-row. There are many advantages to column orientation, including the ability to do more efficient data compression, allowing compression to be optimized for each particular data type. The higher efficiency can be achieved because each column stores a single data type, as opposed to rows that typically contain several data types. However, the main purpose of the Infobright software is to deliver a scalable data warehouse database optimized for analytic queries.

ETL - Time for a walk in the clouds
by Sudip Basu

Enterprises are slowly moving into cloud computing, and IT giants like Microsoft, Amazon, IBM and many others are gearing up to facilitate this change. Technology is used by businesses to cut costs, and shifting to cloud computing will not only cut costs but will allow companies to focus on their core business.

"Cloud computing is a way of computing, via the Internet, that broadly shares computer resources instead of having a local personal computer handle specific applications."

Cloud computing can actually be sliced into three main layers:

1. Hardware/Infrastructure - storage or CPU power
2. Software/Application - you can use the software as a utility, like renting a car
3. Platform - here you can build and deploy your web applications

Cloud computing can be viewed as a stack of the above layers.

Now, with cloud computing becoming popular, we will soon see data in different data formats in many different clouds. With time, we will require cloud-to-cloud integration, or even cloud-to-enterprise integration; that's where ETL (Extract, Transform and Load) comes into the picture. Due to the adoption of cloud computing, we have data scattered all around, so we now need to be sure that the data is up to date, accurate and complete.

Cloud Data Integration:
Importing and exporting data needs the ETL to read data in different formats and convert them to the right format of the target system.

Web Services component:
The new component makes communication with Web Services easier than ever. It provides a user-friendly graphical interface for mapping your data into Web Service fields, and automatically generates requests and processes responses. In addition to reading plain data from Microsoft Excel sheets, the Excel component is now also capable of reading user-formatted values such as currencies, dates or numbers.

New tracking option:
Customers can now see absolute speed rates for all finished data transformations, facilitating comparative analysis in pursuit of process improvements.

New Aspell Lookup table:
A brand new implementation of this component brings better performance, improved configuration and better customization.

Improved treatment of empty (NULL) values:
Developers can now specify special strings that should be treated as empty (NULL) when data is being parsed. This feature simplifies the processing of typical application export files, which often contain values insignificant for ETL processing. Additionally, it may lead to improved processing throughput and lower memory consumption during data transformation.

More user-friendly File URL dialog and improved LDAP functionality.

CloverETL Vs Others
CloverETL is a metadata-based tool; it does not require any code generation in order to run jobs.

We would need the ETL to map different file formats, like relational database to flat file, or flat file to web service. Using the power of graphical tools, all this can be just a click, a drag and a drop.

Synchronization of data:
Having several applications, we need proper synchronization of data between them; this too can be done using the power of the ETL, where the look-up functionality of the transformation can sync the data between the different data sources.

Optimization of the data:
Using the power of the ETL, we can look for duplicate data or even check for data integrity. Data quality checks can be done on the data, and the errors can be logged.

Replication of data:
ETL can be used to replicate data to back it up, or even for moving from an on-premise database to the cloud.

You may be wondering that our common ETL does all these things already, but what's new with a cloud?

1. As we can get Infrastructure as a Service (IaaS), we can do the transformations in no time. We can increase the speed of the processing by increasing the infrastructure required. The bulk data that needs to be processed can be processed in parallel, and the infrastructure can be released when completed.

2. We no longer need to worry about the software installation and license maintenance; it would all be a web-based experience (SaaS, Software as a Service).

Changes made to transformations outside the visual editor are reflected once loaded back into the designer.

It brings a smaller palette of components, but their functionality is more complex; they beat Talend's equivalents in many aspects. It's also easier to choose a suitable component in CloverETL's palette than in Talend's palette.

It has a special ETL scripting language, CTL (Clover Transformation Language), which is easy to learn and enables users without programming skills to develop a complex transformation in a short time.

CloverETL and Talend both support component and pipeline parallelism to speed up executions. Test results show that Talend is not able to efficiently utilize more CPUs to speed up an execution.

Even for an experienced ETL developer, Pentaho Data Integration is definitely more difficult to learn than CloverETL. Components often have unexpected names and a confusing interface.

Many components require sorted input, making graphs more cluttered.

News of the Hour: Customers can evaluate these new features, along with CloverETL's other leading capabilities, with a free 30-day trial of the CloverETL Designer Pro evaluation, which is available at www.cloveretl.com. Information management professionals can also evaluate the enterprise integration features of CloverETL Server via an online demo.

3. We will be able to experience the true integration capabilities of an ETL.

The combination of the ETL technology with a cloud, with proper planning, could set any small business up and running, and it can use the same technologies that any big-size company may be using. Integration is key here, as an ETL can make or break the harmony between the clouds.

Large companies who have already spent huge sums of money on data centers and applications would not want to move into cloud computing, forgoing what they have already invested in; rather, they would integrate the existing infrastructure with a cloud. This could allow them the flexibility to experiment with new technology without having to worry about the infrastructure or the cost of licenses.

With clouds, we will also see unstructured data all around, and the challenge of maintaining this data can be met using Content ETL, which can map the different models, look for the permissions, metadata and users, and then perform the actual transfer. Thus the ETL can now be used for content transfers between the clouds.

The cloud, using the ETL, could take data wherever and whenever required, and this could also mean optimized use of resources, which in turn could reduce the cooling cost of servers and the maintenance of large data centers.

Could it actually be that a proper integration using ETL and cloud computing is here to make the world a greener place?

CloverETL cont...



Maximize ROI on Data Integration with Informatica
by BI Lab's Members

What better an introduction could there be for the world's number one independent leader in data integration software - Informatica.

Informatica Corporation provides data integration and data quality software and services for various businesses, industries and government organizations, including telecommunications, health care, insurance, and financial services.

Informatica comprises six business units: Data Integration, Data Quality, Cloud Data Integration, Application Information Lifecycle Management (ILM), Complex Event Processing (CEP) and B2B.

What gives way to ETL tools like Informatica? Think of GE; the company has over 100 years of history and presence in almost all industries. Over these years, the company's management style has changed from bookkeeping to SAP. This was not a single-day transition. In the transition from bookkeeping to SAP, they used a wide array of technologies, ranging from mainframes to PCs, data storage ranging from flat files to relational databases, and programming languages ranging from COBOL to Java. This transformation resulted in different businesses, or to be precise different sub-businesses within a business, running different applications, different hardware and different architectures. Technologies are introduced as and when invented, and as and when required.

This directly resulted in a scenario like the HR department of the company running on Oracle Applications, Finance running SAP, some part of the process chain supported by mainframes, some data stored on Oracle, some data on mainframes, some data in VSAM files, and the list goes on. If one day the company requires a consolidated report of assets, there are two ways:

• First, the completely manual way: generate different reports from different systems and integrate them.
• Second, fetch all the data from the different systems/applications, make a data warehouse, and generate reports as per the requirement.

Obviously, the second approach is going to be a better bet.

Now, fetching the data from different systems, making it coherent, and loading it into a data warehouse requires some kind of extraction, cleansing, integration, and load. ETL stands for Extraction, Transformation & Load.

ETL tools provide the facility to extract data from different non-coherent systems, cleanse it, merge it and load it into target systems.

Informatica - what and how?

Informatica is an easy-to-use ETL tool. It has a simple visual interface, like forms in Visual Basic. You just need to drag and drop different objects (known as transformations) and design the process flow for data extraction, transformation and load. These process flow diagrams are known as mappings. Once a mapping is made, it can be scheduled to run as and when required. In the background, the Informatica server takes care of fetching data from the source, transforming it, and loading it to the target systems/databases.

Informatica can communicate with all major data sources (mainframe/RDBMS/flat files/XML/VSAM/SAP etc.) and can move and transform data between them. It can move huge volumes of data in a very effective way, many times better than even bespoke programs written for specific data movement only. It can throttle the transactions (do big updates in small chunks to avoid long locking and filling the transaction log). It can effectively join data from two distinct data sources (even an XML file can be joined with a relational table). In all, Informatica has the ability to effectively integrate heterogeneous data sources and convert raw data into useful information.
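As a concrete illustration of "scheduled to run as and when required": PowerCenter workflows are typically kicked off by an external scheduler through the pmcmd command-line utility. The sketch below wraps such a call in Java; the service, domain, folder and workflow names are hypothetical, and the exact pmcmd flags can differ between PowerCenter versions.

import java.io.IOException;

// Hedged sketch: starting a PowerCenter workflow via the pmcmd CLI.
// All names (service, domain, folder, workflow) are made-up examples.
public class StartInformaticaWorkflow {
    public static void main(String[] args) throws IOException, InterruptedException {
        ProcessBuilder pb = new ProcessBuilder(
            "pmcmd", "startworkflow",
            "-sv", "IntSvc_Prod",        // integration service
            "-d", "Domain_ETL",          // domain
            "-u", "etl_user", "-p", "secret",
            "-f", "SalesDW",             // repository folder
            "wf_load_sales");            // workflow name
        pb.inheritIO();                  // show pmcmd output on our console
        int exit = pb.start().waitFor();
        System.out.println("pmcmd exit code: " + exit);
    }
}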

Architecture illustration:
The Informatica ETL product, known as Informatica PowerCenter, consists of 3 main components.

1. Informatica PowerCenter Client Tools: These are the development tools installed at the developer end. These tools enable a developer to:

• Define the transformation process, known as a mapping (Designer)
• Define run-time properties for a mapping, known as sessions (Workflow Manager)
• Monitor execution of sessions (Workflow Monitor)
• Manage the repository, useful for administrators (Repository Manager)
• Report metadata (Metadata Reporter)

2. Informatica PowerCenter Repository: The repository is the heart of the Informatica tools. The repository is a kind of data inventory where all the data related to mappings, sources, targets etc. is kept. This is the place where all the metadata for your application is stored. All the client tools and the Informatica server fetch data from the repository. An Informatica client and server without a repository is like a PC without memory/hard disk: it has the ability to process data but has no data to process. This can be treated as the backend of Informatica.

3. Informatica PowerCenter Server: The server is the place where all the executions take place. The server makes physical connections to sources/targets, fetches data, applies the transformations mentioned in the mapping and loads the data into the target system.

Informatica Transformations - A Value Add
A transformation is a repository object that generates, modifies, or passes data. The Designer provides a set of transformations that perform specific functions. For example, an Aggregator transformation performs calculations on groups of data. Informatica has a strong list of built-in transformations that ease an ETL developer's work.

The Information Cloud
SaaS has gained enormous ground in the competitive market today. Informatica had sensed this shift towards cloud computing before any other ETL tool provider and launched a dozen adapters, like Affymetrix, Brocade Communications Systems and PowerData. A change in direction was observed after Informatica joined hands with Salesforce.com and the ApexConnect program on the AppExchange. This has turned out to be a strategic relationship for its customers, ensuring they can manage and share all of their enterprise data and information on demand.

As cloud computing has become widely adopted in organizations of all sizes, Informatica has continued to expand its focus on cloud data integration. In 2009, the company announced Informatica Cloud 9, "a comprehensive offering for cloud data integration." It featured:

• The Informatica Cloud Platform - a multi-tenant, enterprise-class data integration platform-as-a-service (PaaS).
• Informatica Cloud Services - purpose-built, software-as-a-service (SaaS) data integration applications designed for non-technical users.
• Informatica Data Quality and PowerCenter Cloud Editions - the ability for customers to run Informatica software on infrastructure-as-a-service platforms such as Amazon EC2.

Challengers?
In spite of being the best-of-breed product in the data integration space, Informatica faces tough competition from hand coding. Some of the proprietary competing ETL tools are IBM DataStage, Ab Initio, Business Objects Data Integrator, and Microsoft's SQL Server Integration Services.

Not to forget are some of the open source offerings like Apatar, CloverETL, Pentaho Kettle and Talend, who are pushing from the low end by offering less expensive solutions meeting quality expectations.

Informatica has answered all the competitors by continuously acquiring the best of what is available in the market, and the latest HOT news reads: Informatica Acquires Siperian. Details at http://www.informatica.com/news_events/Pages/siperian.aspx



DataStage - The solution to Enterprise Data Integration
by Banita Rout

Over the past decade, IT departments at many organizations have built large, sophisticated data integration and management infrastructures using industry-leading products to deliver business value in terms of better customer understanding, faster time to market, and business agility, all at lower costs.

Most of today's critical business initiatives cannot succeed without effective integration of information. Initiatives such as single view of the customer, business intelligence, supply chain management, and Basel II and Sarbanes-Oxley compliance require consistent, complete, and trustworthy information.

IBM® Information Server is the industry's first comprehensive, unified foundation for enterprise information architectures. It is an integrated set of components (including WebSphere DataStage and QualityStage, WebSphere Information Analyzer, Federation Server, and Business Glossary) that share a common metadata repository, common administration, common logging and common reporting. It is capable of scaling to meet any information volume requirement, so that companies can deliver business results within these initiatives faster and with higher quality.

DataStage has seen major transformations in the past years, from an extract-transform-load tool running in what was called the Universe engine to what is now the DataStage engine. With the need to adapt to the demands of volume processing, a parallel processing engine has been integrated into DataStage.

IBM Information Server supports all of these initiatives:

• Business intelligence
• Master data management
• Infrastructure rationalization
• Business transformation
• Risk and compliance

Capabilities

IBM Information Server features a unified set of separately orderable product modules, or suite components, that solve multiple types of business problems. Information validation, access and processing rules can be reused across projects, leading to a higher degree of consistency, stronger control over data, and improved efficiency in IT projects.

There are many new features and added functionalities in WebSphere DataStage that help cut development time, simplify job design and improve job performance.

Among these features, there are some more advanced ones which make it the first choice of the retailer, like:

• Single interface to integrate heterogeneous applications
• Flexible development environment
• Data Connection object
• ODBC Connector
• Slowly Changing Dimension stage
• Range look-up
• Advanced and Quick Find
• Parameter sets
• Common logging
• Reuse, versioning and sharing
• Resource Estimation tool
• Performance Analysis tool and Job Compare

Business advantages of using DataStage as an ETL tool:

Apart from the other advantages, DataStage provides retailers more benefits for which it is the preferred ETL tool:

• Significant ROI over hand-coding
• Learning curve - quick development and reduced maintenance with a GUI tool
• Development partnerships - easy integration with top market products interfaced with the data warehouse, such as SAP, Cognos, Oracle, Teradata, SAS
• Single-vendor solution for bulk data transfer and complex transformations (DataStage versus DataStage TX)
• Transparent and wide range of licensing options

And now let's see how it's better than its leading market competitors.

DataStage Vs Informatica

Datastage has the more powerful transformation engine; by using functions and routines, we can do almost any transformation. Informatica is more visual and programmer-friendly.

Lookups in Datastage are much faster than in Informatica because of the way the hash files are built. We can tune the hash files to get optimal performance.

DataStage has a command line interface. The dsjob command can be used by any scheduling tool, or from the command line, to run jobs and check the results and logs of jobs.

DataStage Vs SSIS

SSIS introduces the partitioned sort, but DataStage shows much more evidence of a parallel processing architecture to handle very high volume transformation, cleansing and load. Almost every type of transformation in a DataStage and/or QualityStage parallel job can partition data and run on multiple nodes.

IBM is one of the market's leading vendors, so it has always tried to maintain the performance level of its products, as it has done with DataStage. Every new release of the product fixes some bugs and adds advanced features to compete with other market-leading ETL tools and establish itself as a leading solution.



Ab Initio - A new beginning
by BI Lab's Members

While the selection of a database and a hardware platform is a must, the selection of an ETL tool is highly recommended, but it's not a must.

When you evaluate ETL tools, it pays to look for the following characteristics:

• Functional capability
• Ability to read directly from your data source
• Metadata support

However, Ab Initio itself is an absolute piece of cake to use. It does require some thinking, but that's more to do with the logic of the process than use of the tool itself. Understanding what you want to achieve is stage one, establishing which graph components to use is stage two, and the easy stage is the last one: putting the graph together.

Ab Initio is a suite of applications containing various components, but generally when people say Ab Initio, they mean the Ab Initio Co>Operating System, which is primarily a GUI-based ETL application. It gives the user the ability to drag and drop different components and attach them, quite akin to drawing.

The strength of Ab Initio ETL is massively parallel processing, which gives it the capability of handling large volumes of data.

Redefine Data Quality with ODI
by BI Lab's Members

Today's integration project teams face the daunting challenge of deploying integrations that fully meet functional, performance, and quality specifications, on time and within budget. These processes must be maintainable over time, and the completed work should be reusable further.

Traditional "Extract, Transform, Load" tools closely intermix data transformation rules with the integration process procedures, requiring the development of both data transformations and data flow. ODI-EE takes a different approach to integration by clearly separating the declarative rules (the "what") from the actual implementation (the "how").

Integrating data and applications throughout the enterprise, and presenting a unified view of them, is a complex proposition. Not only are there broad disparities in data structures and application functionality, but there are also fundamental differences in integration architectures.

The Ab Initio software is a suite of products which together provide a platform for data processing applications. The core Ab Initio products are:

• Co>Operating System
• The Component Library
• Graphical Development Environment
• Enterprise Meta>Environment
• Data Profiler
• Conduct>It

Ab Initio has added lots of features over the years, especially in response to prospect or customer requests:

• IBM OS/390 support
• SOAP/XML support
• A compressed file system that can directly store 100s of TBs of user data
• Dynamic script generation
• PDL and component folding
• Handling run-time-related errors
• Efficient use of components
• Documentation tools
• Run history tracking
• Mastery of parallel processing, high performance computing and ETL job performance
• Understanding of associated environments and technologies

Some integration needs are data-oriented, especially those involving large data volumes. Other integration projects lend themselves to an event-oriented architecture for asynchronous or synchronous integration.

Changes tracked by Changed Data Capture constitute data events. The ability to track these events and process them regularly, in batches or in real time, is the key to the success of an event-driven integration architecture. ODI-EE provides rapid implementation and maintenance for all types of integration projects.

The ODI-EE architecture is organized around a modular repository, which is accessed in client-server mode by components (graphical modules and execution agents) that are written entirely in Java.

Ab Initio Vs Informatica

Informatica and Ab Initio both support parallelism, but Informatica supports only one type of parallelism, while Ab Initio supports three types:

1. Component parallelism
2. Data parallelism
3. Pipeline parallelism

Ab Initio supports different types of text files that are not possible in Informatica.

Informatica is an engine-based ETL tool, so we can't see or modify the code that it generates after development. Ab Initio is a code-based ETL tool, which generates ksh or bat etc. code that can be modified to achieve any goals that cannot be taken care of through the ETL tool itself.

In Ab Initio you can attach error and reject files to each transformation and capture and analyze the messages and data separately. Informatica has one huge log! Very inefficient when working on a large process with numerous points of failure.

Informatica is very basic as far as transformations go, whereas Ab Initio is much more robust. So go ahead and open up your ETL options with Ab Initio.

The architecture also includes a Web application, Metadata Navigator, which enables users to access information through a Web interface.

Poor-quality data afflicts almost every company of moderate size and operational complexity. In fact, inconsistent, inaccurate, incomplete, and out-of-date data are often the root cause of expensive business problems such as operational inefficiencies, faulty analysis for business optimization, unrealized economies of scale, and dissatisfied customers. Savvy IT managers can solve a host of these and other business-level problems by committing to a program of comprehensive data quality. Oracle Data Integrator offers a comprehensive data quality solution to meet any data quality challenge, for any type of global data, with a single, well-integrated technology package.


» continue on pg.11


Oracle's solution for comprehensive data quality includes three products: Oracle Data Integrator, Oracle Data Profiling, and Oracle Data Quality for Oracle Data Integrator. These three best-of-breed technologies work seamlessly together to solve the most challenging enterprise data governance problems.

The first step in a comprehensive data quality program is to assess the quality of your data through data profiling. Profiling data means reverse-engineering metadata from various data stores, detecting patterns in the data so that additional metadata can be inferred, and comparing the actual data values to expected data values. Profiling provides an initial baseline for understanding the ways in which actual data values in the systems fail to conform to expectations. Oracle Data Integrator's profiling capabilities ensure data assessment is not a one-time activity, but an ongoing practice that ensures data quality over time. Once data problems are well understood, the rules to repair those problems can be created and executed by data quality engines. For both standard data quality and advanced data quality, an initial set of rules can be generated based on the results of profiling; then users who understand the data can refine and extend those rules.

Comprehensive data quality should be a key enabling technology for any IT infrastructure, and it is critical to solving a range of expensive business problems. It is particularly important in the context of any data integration process, to prevent data quality problems from proliferating. Oracle Data Integrator’s inline, stepped approach to comprehensive data quality ensures that data is adequately verified, validated, and cleansed at every point of the integration process.

After quality, security is the next concern to be taken care of. The first steps in securing an integration project are setting up access to Oracle Data Integrator objects and defining user profiles and access privileges for those users. Oracle Data Integrator can provide the security an integration project requires, even in the most highly sensitive environments.

The next challenge is version management: development teams face a great deal of trouble in managing a project’s hundreds—sometimes thousands—of work units throughout the development process and beyond. Success or failure of the version management process can greatly affect the integrity and success of any development project.

Regardless of the databases or applications within the IT ecosystem, the ODI solution can be optimized to drive the highest-performance bulk or real-time transformations. Oracle’s vision is to combine and enable these capabilities from within a next-generation, unbreakable Service-Oriented Architecture that will continue to drive business value within the enterprise for many years to come.

SSIS
Microsoft’s Bet in the ETL Market
by BI Lab’s Members

The data to which a company has access is key to its future success, but obtaining meaningful information from data can be far from straightforward. Companies may need to harvest data from multiple geographical locations, and it is unlikely that all the data will be stored in a single format. Microsoft Office Excel spreadsheets, Microsoft Access databases, XML documents, SQL Server databases, Oracle databases, Teradata data warehouses, and SAP systems are just a few of the data stores that contemporary organizations use. Other issues, such as data ownership and compliance with regulatory requirements, can further complicate matters.

Data consolidation can be time consuming and resource intensive, and batch windows can be hard to find in an increasingly globalized environment. Furthermore, the value of data can also depreciate in a relatively short period of time. Consequently, making reliable data available in a timely and efficient manner is a major challenge for the modern data worker.

SQL Server 2008 includes SQL Server Integration Services (first introduced in SQL Server 2005), an enterprise-level data integration and workflow platform for performing extract, transform, and load (ETL) operations. Integration Services provides a set of powerful features that enable the merging and consolidation of data from heterogeneous sources, and includes tools for extracting, cleaning, standardizing, transforming, and loading data. A wide variety of built-in connectors support these operations, enabling Integration Services to interact not just with SQL Server databases but with many other proprietary and non-proprietary data sources.

The SQL Server 2008 implementation of Integration Services builds upon the strengths of the previous release, and as a result the new release is a robust enterprise ETL platform that is even more productive and extensible.

Two key areas of development in SQL Server 2008 Integration Services are:

• Improved options for connectivity
• Significant gains in performance

Integration Services provides a wide range of data source connectors out of the box, and many add-on connectors are available from Microsoft and from third-party vendors. As a result, Integration Services is able to work with a broader range of sources than ever before. The new connectivity options have also contributed to improving performance, and SQL Server now has the fastest ETL tool available.

We can use Integration Services to create packages that encapsulate a specific business requirement, such as extracting data from an Oracle database, cleaning the data, and then loading it into an Analysis Services database. Packages consist of one or more control flow tasks, where each task feeds into the next.
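Packages are typically designed graphically and then executed with the dtexec command-line utility. As a hedged illustration, the snippet below launches a package from Python and overrides one of its variables; the package path and variable name are hypothetical.

```python
import subprocess

# Run a (hypothetical) SSIS package, overriding a package variable.
result = subprocess.run(
    [
        "dtexec",
        "/F", r"C:\etl\LoadSales.dtsx",  # package file to execute
        "/SET", r"\Package.Variables[BatchDate].Value;2010-03-01",
    ],
    capture_output=True,
    text=True,
)
print(result.returncode)  # 0 indicates the package executed successfully
```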

There are also add-on connectors that offer connectivity to sources that have no built-in connector, such as Teradata and SAP BI, or that offer improved performance over existing connectors for sources that are already supported, such as Oracle.

The built-in connectors that are available include OLE DB, ADO.NET, FLATFILE, MULTIFLATFILE, FILE, FTP and HTTP, MSMQ, MSOLAP100, SMOSERVER, SMTP, SQLMOBILE, WMI, and XML.

In addition to the extensive range of built-in connectors, there are many more that we can install as add-ons. Some of these connectors are provided by Microsoft and others by third parties. There are two main reasons why vendors create add-on connectors:

• To facilitate access to a data source that is not supported by any of the built-in connectors
• To provide an improvement in performance over existing connectors

[Figure: an Oracle Source and an Oracle Destination in use as part of the data flow.]



How many of you have heard the myth that Microsoft SQL Server Integration Services (SSIS) does not scale?

Well, here is a question as an answer!

“Does your system need to scale beyond 4.5million sales transaction rows per second?”

SQL Server Integration Services is a high-performance Extract-Transform-Load (ETL) platform that scales to the most extreme environments. SQL Server Integration Services can process at the scale of 4.5 million sales transaction rows per second. That should bring a smile to a lot of faces.


Merging Horizons
of ETL, EAI and EII
by Gitanjali Kahaly

When managing complex database environments, IT vendors and buyers agree on the three top priorities: Integration, Integration and Integration.

And restrictions on IT spending and staffing have further fuelled the need to integrate existing systems rather than investing in new technology. Data Integration refers to the organization’s inventory of data and information assets, as well as the tools, strategies and philosophies by which fragmented data assets are aligned to support business goals. Data Integration problems are becoming a barrier to business success, and a company must have an enterprise-wide data integration strategy if it is to overcome this barrier. That explains why so many vendors seem to be bragging about their integration capabilities these days. Broadly speaking, enterprise business integration can occur at four different levels in an IT system: data, application, business process and user interaction.

Many technologies neatly fit into one of these categories, but there is a trend in the industry towards IT applications supporting multiple integration levels; it is therefore very important to design an integration architecture that can incorporate all four levels of enterprise business integration.

Data Integration provides a unified view of the business data that is scattered throughout an organization. It may be a physical view of data that has been captured from multiple disparate data sources and consolidated into an integrated data store like a data warehouse or operational data store, or it may be a virtual federated view of disparate data that is assembled dynamically at data access time. A third option is to provide a view of data that has been integrated by propagating data from one database to another, like merging customer data from a CRM database into an ERP database, for example.

Over time, companies are migrating to the philosophy of a service-oriented architecture (SOA) that applies Web protocols and standards for self-identifying application and data end points. This transition is proceeding slowly and selectively, as companies are reluctant to abandon proven systems, including mainframes and traditional messaging, which remain mission-critical to business operations.

The four levels of enterprise business integration do not operate in isolation from each other.

In a fully integrated business environment, interaction often occurs between the different integration levels. In the data warehousing environment, some data integration tools work with application integration software to capture events from an application workflow, and transform and load the event data into an operational data store (ODS) or data warehouse. The results of analyzing this integrated data are often presented to users through business dashboards that operate under the control of an enterprise portal that implements user interaction integration.

Both IT staff and vendors now realize that data integration cannot be considered in isolation. Instead, a data integration strategy and infrastructure must take into account the application, business process, and user interaction integration strategies of the organization.

As with any technology, there’s convergence in the marketplace.

Convergence across EII, EAI, ETL, and web-services.

SOA is the architectural icing on the cake.

Let’s analyze these three Es of data integra-tion in a little more detail.

EII (Enterprise Information Integration) provides an optimized and transparent data access and transformation layer, providing a single relational interface across all enterprise data. It enables the integration of structured and unstructured data to provide real-time read and write access, to transform data for business analysis and data interchange, and to manage data placement for performance, currency, and availability.

ETL (Extract, Transform and Load) is designed to process very large amounts of data. It provides a suitable platform for:

• Improved productivity by reuse of objects and transformations
• Strict methodology
• Better metadata support, including impact analysis

EAI (Enterprise Application Integration) provides message-based, transaction-oriented, point-to-point (or point-to-hub) brokering and transformation for application-to-application integration. The core benefits offered by EAI are:

• A focus on integrating both business-level processes and data
• A focus on reuse and distribution of business processes and data
• A focus on simplifying application integration by reducing the amount of detailed, application-specific knowledge required by users

These three approaches, EII, ETL and EAI, differ along two axes: the need for real-time versus batch integration, and the need for integration of data versus integration of applications.

So, we need to judge our needs and identify a matching solution. For organizations that need real-time data integration, EII fits in. For those who require batch data integration, ETL would be the best bet. And for those who need either batch or real-time application integration, EAI is the most appropriate tool.

But gradually the horizons of these three Es are merging. EAI, ETL and EII can co-exist, and in fact they do in most of today’s organizations. Every organization needs EAI so that its various data entry systems (inventory, payroll, marketing, operations) can talk to each other. ETL then follows, storing the same data in central repositories from which it can be extracted as required. After all this is done, EII comes into play, delivering to the decision maker a customizable view that might draw data from a single database, multiple databases or even OLTP applications.

A classic reference architecture in which all three tools play a part: transactional applications are integrated through EAI, data from these applications flows into an Enterprise Data Warehouse (EDW) by leveraging ETL capability, and then EII tools help to combine data from OLTP applications, the EDW, external data repositories and local Excel sheets for business decision making.

EAI, EII and ETL complement each other, and when implemented together within an organization’s data integration architecture they strengthen the foundations of every decision, hence promising growth.



Credits

Editor: Sweta Gupta

Content & Layout Design: Jakkie Swart 

Assistant editors: Sodyam Bebarta, Sudip Basu, Harapriya Montry, Jagyanseni Das, Banita Rout & Gitanjali Kahaly.


ETL in the times to come…
by Sweta Gupta

Even after so much has been written, blogged and discussed about this three-lettered word ETL, which spans the Extract, Transform and Load of data from varied sources to specific targets, there is still so much left to be told. Here is a candid look at the current and the future perspective of ETL.

ETL technology has continuously evolved from the legacy code generators to proprietary engines and on to the current third generation: ELT. This exchange of places between “T” and “L” has made all the difference. With the early generations of ETL already having been discussed in “The SAGA of ETL”, let’s talk about how the letter “T” in ETL has proved to be the determining factor when it comes to measuring performance, efficiency and ease of use.

The proprietary engine-based ETL tools had the ETL hub server sitting between the source and the target. All data had to be routed through this server, where it would be transformed row by row before it reached the target (the warehouse). This made the ETL process slow and ineffective, and with it arose a need for an alternative that would reduce this overhead. The database vendors heard the cry and invested significantly in bettering their RDBMSs, adding new functionality to build complex transformation logic in-house and thus leverage the power of their traditional SQL.

ETL architecture then turned into ELT architecture, where users were presented with a highly interactive GUI with the ability to generate native SQL to execute data transformations on the data warehouse server. ELT also enabled bulk processing of data after it has been loaded to the target. Performance reportedly improved by as much as 1,000 times, and the gain only grows with increasing volumes of data flowing in. Since database engines can be both the source and the target, the SQL code can be distributed among them to achieve the best performance. Today’s RDBMSs have the power to perform any data integration work. Third-generation E-L-T tools take advantage of this power by leveraging and orchestrating the work of these systems — and processing all data transformations in bulk.
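The pattern is easy to see in miniature: land the raw data first, then issue one set-based SQL statement inside the target engine to transform it in bulk. In the sketch below, sqlite3 stands in for the warehouse engine, and the staging and target tables are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stg_sales (amount TEXT, region TEXT)")
conn.execute("CREATE TABLE dw_sales (amount REAL, region TEXT)")

# Load: bulk-insert the raw, untransformed rows into a staging table.
conn.executemany(
    "INSERT INTO stg_sales VALUES (?, ?)",
    [("12.50", " east "), ("8.00", "WEST")],
)

# Transform: a single set-based statement runs inside the target engine,
# instead of row-by-row processing on a separate ETL hub server.
conn.execute(
    "INSERT INTO dw_sales "
    "SELECT CAST(amount AS REAL), UPPER(TRIM(region)) FROM stg_sales"
)
print(conn.execute("SELECT * FROM dw_sales").fetchall())
```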

Change being the trend, the needs of business have never been constant. This is an era of real-time applications generating huge volumes of data every millisecond, and we live in a browser-centric world today. The next generation of integration technology has to support and service data integration, data warehousing and e-business applications and services. One solution is Enterprise Application Integration (EAI) tools, which respond to the real-time needs of the Internet and other applications but lack the extract and load capability of ETL tools.

Even DQM (Data Query Management) could not prove to be an ideal solution. Though it bypasses the data warehousing architecture and provides real-time data access and integration in heterogeneous DBMS/platform environments, it is not ideal for large volumes of data.

So what should we call a complete solution? The answer is a blend of ETL, EAI and DQM tools that would route data to and from information-craving entities based on prescribed business rules. GartnerGroup has named this flexible, scalable and intelligent solution the Information Logistics Network (ILN).

ETL vendors are now focused on getting their tools to encompass the full range of data integration capabilities needed for the integration and management of business processes and transactions across ERP and CRM systems. The tools cannot afford to stay the same: they need to integrate both structured and unstructured real-time data and to manage and share everything from technical to external industry data. ETL must evolve into integration technology that solves issues at the data level and beyond.

Nowadays real-time databases appear distributed over the web and contain specific information. Hence the challenge for an ILN is to collect the interactions taking place and send them securely to the data warehouse for analysis and action. ILNs can use the underlying software and hardware to leverage parallel processing from source to target. Some ETL vendors are making their tools critical to the process of sharing data from data warehousing to business-to-business (B2B) exchange. They foresee that eventually metadata will be XML-based because, as an interchange format, XML offers a great deal of flexibility.

“ETL for HTML” is a popular phrase used to describe how most of us will access web data. It encompasses Web 2.0 and Enterprise Data Management. Unlike traditional ETL, Web Data Services provides two-way access to data. This means we can leave the data where it resides best and get full programmatic access by using a Web Data Server to “wrap” the applications into standard service APIs like REST, SOAP or .NET. With the data explosion around us it becomes impractical to move and synchronize data into one common data repository. The data we need to perform our analysis and drive business decisions will change more and more rapidly. We will need new data sources daily, or at least weekly, to react to the ever-changing business needs of the future.

Let’s hope the marriage of ETL, EAI and DQM begins a new trend in the world of Data Integration.

