Big Data for Dummies using DataStage. By Peter Bjelvert, InfoSphere Architect, Middlecon AB


Page 1: Big data for dummies using data stage   live tool demo

Big Data for Dummies using DataStage

By Peter Bjelvert

InfoSphere Architect

Middlecon AB

Page 2: Big data for dummies using data stage   live tool demo

ETL – Relational DB

Extract

Transform in DataStage

Load

Your powerful DataStage server handles all complex transformations, and the database is used only for reading and writing.

Page 3: Big data for dummies using data stage   live tool demo

ELT – Relational DB

Extract

Load with Transform

If you have powerful database servers, you can push much of the work down to the database. DataStage then mostly controls the flow.
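The difference between the two approaches can be sketched in a few lines. This is only an illustration, assuming an in-memory SQLite database in place of a real relational server and a made-up `src`/`tgt` table pair; plain Python plays the part of the DataStage server here.

```python
import sqlite3

def make_db():
    """Create an in-memory database with a small, made-up source table."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE src (id INTEGER, city TEXT)")
    con.execute("CREATE TABLE tgt (id INTEGER, city TEXT)")
    con.executemany("INSERT INTO src VALUES (?, ?)",
                    [(1, " stockholm "), (2, "malmo"), (3, " Lund")])
    return con

def run_etl(con):
    """ETL: pull the rows out, transform them in the client, write them back."""
    rows = con.execute("SELECT id, city FROM src").fetchall()   # extract
    cleaned = [(i, c.strip().upper()) for i, c in rows]         # transform
    con.executemany("INSERT INTO tgt VALUES (?, ?)", cleaned)   # load

def run_elt(con):
    """ELT: a single SQL statement, so the database does the transform."""
    con.execute("INSERT INTO tgt SELECT id, UPPER(TRIM(city)) FROM src")

def target_rows(con):
    return con.execute("SELECT id, city FROM tgt ORDER BY id").fetchall()

etl_db, elt_db = make_db(), make_db()
run_etl(etl_db)
run_elt(elt_db)
print(target_rows(etl_db) == target_rows(elt_db))
```

Both runs leave the same rows in the target. What differs is where the CPU time is spent: in the client for `run_etl`, in the database for `run_elt`.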

Page 4: Big data for dummies using data stage   live tool demo

Balanced Optimization

Bal. Opt. creates a second copy of the job that pushes everything into the target. It creates one big SQL statement.

Bal. Opt. creates a new copy of the job that pushes the load into the source and the target.

Use DataStage Balanced Optimization to select how to push the load:

- To Source
- To Target
- To Both

The DataStage job is rewritten into SQL code.
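To give a feel for what "rewriting a job into SQL" means, here is a toy sketch. It is not how Balanced Optimization works internally: the `Job` class, the `push_to_target` function and the `BRANCH_DIM` table are all hypothetical names invented for the illustration.

```python
from dataclasses import dataclass

@dataclass
class Job:
    """A hypothetical, heavily simplified job: read, filter, derive, write."""
    source: str
    target: str
    columns: dict      # target column name -> SQL expression over the source
    where: str = ""    # optional row filter

def push_to_target(job: Job) -> str:
    """Render the whole job as one INSERT ... SELECT statement."""
    names = ", ".join(job.columns)
    exprs = ", ".join(f"{expr} AS {name}" for name, expr in job.columns.items())
    sql = f"INSERT INTO {job.target} ({names}) SELECT {exprs} FROM {job.source}"
    if job.where:
        sql += f" WHERE {job.where}"
    return sql

job = Job(
    source="BANK_BRANCH",
    target="BRANCH_DIM",   # made-up target table
    columns={"CITY": "UPPER(BRANCH_CITY)", "ZIP": "BRANCH_ZIP"},
    where="BRANCH_STATE IS NOT NULL",
)
print(push_to_target(job))
```

The point is only that the filter and the derivations end up inside one statement that the database executes, instead of rows travelling through the ETL server.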

Page 5: Big data for dummies using data stage   live tool demo

Balanced Optimization feature of DataStage

ETL

DataStage is doing the main work.

ELT – PushDown

Bal. Opt. creates a new copy of the job with SQL code:

SELECT * FROM (SELECT distinct BRANCH_CITY, BRANCH_STATE, BRANCH_ZIP FROM JK_BANK2.BANK_BRANCH) AS A, ( Select distinct BRANCH_CITY,

The DB server is doing the main job.

Page 6: Big data for dummies using data stage   live tool demo

Hadoop Distributed File System - HDFS

Application Layer

Workload mgmt Layer

Data Layer

One file, 3 copies
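The "one file, 3 copies" idea can be sketched as follows. HDFS splits a file into blocks and stores each block on several nodes. The placement rule below is a simple round-robin assumed for the illustration (real HDFS placement is rack-aware), and the node names and sizes are made up.

```python
def place_blocks(file_size, block_size, nodes, replication=3):
    """Split a file into blocks and put each block on `replication` distinct nodes."""
    if replication > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    n_blocks = -(-file_size // block_size)   # ceiling division
    placement = {}
    for b in range(n_blocks):
        # Round-robin: consecutive nodes, starting one step further per block.
        placement[b] = [nodes[(b + r) % len(nodes)] for r in range(replication)]
    return placement

def readable_after_failure(layout, failed_node):
    """True if every block still has at least one copy on a surviving node."""
    return all(any(n != failed_node for n in holders)
               for holders in layout.values())

nodes = ["node1", "node2", "node3", "node4"]
layout = place_blocks(file_size=300, block_size=128, nodes=nodes)
for block, holders in layout.items():
    print(block, holders)
print(all(readable_after_failure(layout, n) for n in nodes))
```

With three copies of every block, losing any single node still leaves the whole file readable.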

Page 7: Big data for dummies using data stage   live tool demo

MapReduce example
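A minimal word-count sketch of the MapReduce idea, in plain Python on one machine. On a Hadoop cluster the map and reduce steps run in parallel on many nodes; the function names and the sample sentences here are assumptions made for the illustration.

```python
from collections import defaultdict

def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]

def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce: collapse the values of one key into a single count."""
    return key, sum(values)

def word_count(lines):
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())

counts = word_count(["Deer Bear River", "Car Car River", "Deer Car Bear"])
print(counts)
```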

Page 8: Big data for dummies using data stage   live tool demo

Hadoop application stack

Application Layer: Jaql, AQL…

Workload mgmt Layer: MapReduce

Data Layer: HDFS

Page 9: Big data for dummies using data stage   live tool demo

IBM’s Hadoop implementation

Page 10: Big data for dummies using data stage   live tool demo

ETL – HDFS

Extract

Transform in DataStage

Load

Your powerful DataStage server can read from and write to the distributed file system.

Page 11: Big data for dummies using data stage   live tool demo

DataStage HDFS example

Read from and write to a Hadoop system using the new Big Data File stage (BDFS).

Page 12: Big data for dummies using data stage   live tool demo

ELT – Hadoop system

Extract

Use DataStage Balanced Optimization to select how to push the load:

- To Source
- To Target
- To Both

The DataStage job is rewritten into Jaql code.

Load with Transform

Page 13: Big data for dummies using data stage   live tool demo

DataStage Jaql example

Bal. Opt. creates a second copy of the job that pushes everything into the target. It creates one big Jaql statement.

Page 14: Big data for dummies using data stage   live tool demo

Balanced Optimization feature of DataStage

Relational DB

ETL: DataStage is doing the main work.

ELT – PushDown: Bal. Opt. creates a new copy of the job with SQL code:

SELECT * FROM (SELECT distinct BRANCH_CITY, BRANCH_STATE, BRANCH_ZIP FROM JK_BANK2.BANK_BRANCH) AS A, ( Select distinct BRANCH_CITY,

The DB server is doing the main job.

HDFS

ETL: DataStage is doing the main work.

ELT – PushDown: Bal. Opt. creates a new copy of the job with Jaql code:

setOptions({conf:{"mapred.job.name":"Data Stage BalOp job BIGDATA:dstage1 ff_read_write_to_hadoop_jaql_balopt_join CustomerTarget 16_#DSJobInvocationId#"}});
setOptions({conf:{"mapred.reduce.tasks":1}});

The Hadoop application server executes the Jaql code on all nodes.

Page 15: Big data for dummies using data stage   live tool demo

Extract, Transform and filter in DataStage

Load good data into HDFS

DataStage can read from many different sources. Convert common data (like time/date) to one format to facilitate later queries. Send unwanted data to the garbage.
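The cleaning step described above can be sketched like this. It is a minimal illustration in plain Python, not a DataStage job: the record layout, the list of accepted date formats and the function names are all assumptions.

```python
from datetime import datetime

# Date layouts we assume the different sources might use.
KNOWN_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%Y%m%d")

def to_iso_date(text):
    """Return the date as YYYY-MM-DD, or None if no known format matches."""
    for fmt in KNOWN_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return None

def clean(records):
    """Split records into good rows (dates normalized) and rejected rows."""
    good, rejected = [], []
    for rec in records:
        iso = to_iso_date(rec.get("date", ""))
        if iso is None or not rec.get("customer"):
            rejected.append(rec)                 # unwanted data
        else:
            good.append({**rec, "date": iso})    # ready to load into HDFS
    return good, rejected

records = [
    {"customer": "A", "date": "2013-05-01"},
    {"customer": "B", "date": "01/06/2013"},
    {"customer": "C", "date": "20130702"},
    {"customer": "", "date": "2013-05-01"},
    {"customer": "D", "date": "not a date"},
]
good, rejected = clean(records)
print([r["date"] for r in good], len(rejected))
```

Only the good rows would be loaded. Every date then has the same layout, so later queries on the Hadoop side do not need to handle format variants.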

A good scenario for DataStage customers

Analytic functions

AQL …

Page 16: Big data for dummies using data stage   live tool demo

LIVE DEMO

Page 17: Big data for dummies using data stage   live tool demo

Borrowed images from Google

- Slide 6: https://yoyoclouds.wordpress.com/tag/hadoop/
- Slide 7: http://kickstarthadoop.blogspot.se/2011/04/word-count-hadoop-map-reduce-example.html
- Slide 8: http://www.rosebt.com/1/post/2012/07/hadoop-internal-software-architecture.html
- Slide 9: http://www.ndm.net/datawarehouse/IBM/ibm-infosphere-biginsights

Page 18: Big data for dummies using data stage   live tool demo

Handling Big Data without angst