
Datastage4u - Webnodefiles.datastage.webnode.com/200000156-0509105fde... · 1) Need to take a snap shot of the WareHouse final target dimensional table and store in a DataSet or Temp


12/6/2014 Datastage4u: SCD Types and How Many ways to develope the SCD's

http://datastageganesh.blogspot.in/2013/06/scdtypesandhowmanywaystodevelope.html 1/13

Datastage4u

FRIDAY, JUNE 21, 2013

SCD Types and How Many Ways to Develop the SCDs

1. What types of SCD have you used so far?

Slowly Changing Dimensions (SCDs) are dimensions whose data changes slowly over time. For example, we may have a dimension in our database that tracks the sales records of the company's salespeople, and a salesperson may be transferred from one regional office to another.

Dealing with these changes involves SCD management methodologies referred to as Type 0, 1, 2, 3, 4, and 6. Type 6 SCDs are also sometimes called Hybrid SCDs.

The Type 0 method is a passive approach to managing dimension value changes, in which no action is taken. Values remain as they were at the time the dimension record was first entered.

The Type 1 methodology overwrites old data with new data, and therefore does not track historical data at all. This is most appropriate when correcting certain types of data errors, such as the spelling of a name (assuming we won't ever need to know how it used to be misspelled in the past).

The Type 2 method tracks historical data by creating multiple records in the dimensional tables with separate keys. With Type 2, we have unlimited history preservation, as a new record is inserted each time a change is made.

The Type 3 method tracks changes using separate columns. Whereas Type 2 has unlimited history preservation, Type 3 has limited history preservation, as it is limited to the number of columns we designate for storing historical data. Where the original table structure in Type 1 and Type 2 is very similar, Type 3 adds additional columns to the tables.

The Type 4 method is usually just referred to as using "history tables", where one table keeps the current data and an additional table is used to keep a record of some or all changes.

The Type 6 method combines the approaches of Types 1, 2 and 3 (1 + 2 + 3 = 6). It is not frequently used because it has the potential to complicate end user access, but it has some advantages over the other approaches, especially when techniques are employed to mitigate the downstream complexity.
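To make the difference between Types 1, 2 and 3 concrete, here is a minimal Python sketch that applies the same change (a salesperson moving from one region to another) in the three ways described above. It is only an illustration: the column names (sk, salesperson_id, region, previous_region, start_date, end_date, is_current) and the sample values are assumptions, not taken from any real warehouse.

```python
from copy import deepcopy

def apply_type1(rows, key, new_region):
    """Type 1: overwrite the value in place; no history is kept."""
    for row in rows:
        if row["salesperson_id"] == key:
            row["region"] = new_region
    return rows

def apply_type2(rows, key, new_region, change_date):
    """Type 2: close the current row and add a new row with its own key."""
    next_sk = max(row["sk"] for row in rows) + 1
    for row in rows:
        if row["salesperson_id"] == key and row["is_current"]:
            row["is_current"] = False
            row["end_date"] = change_date
    rows.append({"sk": next_sk, "salesperson_id": key, "region": new_region,
                 "start_date": change_date, "end_date": None,
                 "is_current": True})
    return rows

def apply_type3(rows, key, new_region):
    """Type 3: keep only the immediately previous value in an extra column."""
    for row in rows:
        if row["salesperson_id"] == key:
            row["previous_region"] = row["region"]
            row["region"] = new_region
    return rows

# Hypothetical starting dimension row
base = [{"sk": 1, "salesperson_id": "S100", "region": "East",
         "start_date": "2013-01-01", "end_date": None, "is_current": True}]

type1 = apply_type1(deepcopy(base), "S100", "West")
type2 = apply_type2(deepcopy(base), "S100", "West", "2013-06-21")
type3 = apply_type3(deepcopy(base), "S100", "West")
```

After the change, type1 still holds one row (the old region is gone), type2 holds two rows (the closed one and the current one), and type3 holds one row with both the current and the previous region.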

2. How did you implement Type 2 SCD in Datastage?

The following steps are required to implement SCD Type 2 in Datastage.

1) Take a snapshot of the warehouse's final target dimension table and store it in a DataSet or temp table, or read directly from the data warehouse target table itself.

2) Retrieve the new records from the source (source table, flat file, view, or any other source) and look them up against the snapshot with a Lookup or Join stage, based on the primary key.

3) Pass both values for each SCD column (the columns affected by the change) under different column names (for example SalesTerritorySource and SalesTerritoryLkp).

4) In the next step, in a Transformer, compare these two values for every primary key using stage variables. If the values differ, close the previous record on the first output link of the Transformer: use the SalesTerritoryLkp value (coming from the target snapshot) for the SalesTerritory column, set CURRENT_RECORD='N', and set END_DATE to the current time (when the records are being processed). At the same time, insert one more record into the target table with the new value (SalesTerritorySource) in the SCD columns, START_DATE set to the current time, CURRENT_RECORD='Y', and END_DATE set to null.


5) To keep these two records unique, generate a surrogate key and use it as the primary key of the target dimension table.
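The five steps can be sketched outside Datastage as plain Python. This is not a Datastage job, only an illustration of the logic. The names SalesPersonId, SK and scd2_merge are hypothetical; SalesTerritory, CURRENT_RECORD, START_DATE and END_DATE follow the steps above. The sketch also assumes that a source key with no match in the snapshot is inserted as a new current record.

```python
from datetime import datetime
from itertools import count

def scd2_merge(target, source, now, key_gen):
    """Snapshot lookup, compare, close the old row, insert the new row."""
    # Step 1: snapshot of the current target rows, indexed by business key
    snapshot = {row["SalesPersonId"]: row for row in target
                if row["CURRENT_RECORD"] == "Y"}
    for src in source:
        # Step 2: look the source record up in the snapshot
        lkp = snapshot.get(src["SalesPersonId"])
        # Step 3: carry both values under different names
        territory_source = src["SalesTerritory"]
        territory_lkp = lkp["SalesTerritory"] if lkp is not None else None
        # Step 4: compare; identical values need no action
        if lkp is not None and territory_source == territory_lkp:
            continue
        if lkp is not None:
            lkp["CURRENT_RECORD"] = "N"   # close the previous record
            lkp["END_DATE"] = now
        # Steps 4 and 5: insert the new version under a fresh surrogate key
        target.append({
            "SK": next(key_gen),
            "SalesPersonId": src["SalesPersonId"],
            "SalesTerritory": territory_source,
            "START_DATE": now,
            "END_DATE": None,
            "CURRENT_RECORD": "Y",
        })
    return target

# Hypothetical sample data
target = [{"SK": 1, "SalesPersonId": 7, "SalesTerritory": "North",
           "START_DATE": datetime(2013, 1, 1), "END_DATE": None,
           "CURRENT_RECORD": "Y"}]
source = [{"SalesPersonId": 7, "SalesTerritory": "South"},
          {"SalesPersonId": 8, "SalesTerritory": "East"}]
now = datetime(2013, 6, 21, 10, 15)

result = scd2_merge(target, source, now, count(start=2))
```

The surrogate key SK, not the business key, is what keeps the closed and the current record for the same salesperson distinct.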

SCD Type 1

The Type 1 Slowly Changing Dimension architecture applies when no history is kept in the database. The new, changed data simply overwrites the old entries. This approach is used quite often for data that changes over time as a result of correcting data quality errors (misspellings, data consolidations, trimming spaces, language-specific characters).

Type 1 SCD is easy to maintain and is used mainly when losing the ability to track the old history is not an issue.

SCD 1 implementation in Datastage

The job described and depicted below shows how to implement SCD Type 1 in Datastage. It is one of many possible designs which can implement this dimension. The example is based on loading customers into a data warehouse.

Datastage SCD1 job design

The most important facts and stages of the CUST_SCD2 job processing:
• There is a hashed file (Hash_NewCust) which handles a lookup of the new data coming from the text file.
• A T001_Lookups transformer does a lookup into a hashed file and maps new and old values to separate columns.

SCD1 Transformer mapping


http://etltools.info/images/scd1transformermapnewdata.jpg

• A T002 transformer updates old values with new ones, disregarding the overwritten data.

SCD1 Transformer update old entries
http://etltools.info/images/scd1transformeroverwritedata.jpg

• The database is updated in a target ODBC stage (with the 'update existing rows'update action)
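The overwrite behaviour of the SCD1 job can be sketched with Python's built-in sqlite3 module standing in for the ODBC target. This is an illustration only: the table d_customer, its columns and the sample rows are hypothetical, and the function updates existing rows only, in the spirit of the 'update existing rows' action.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE d_customer (cust_id TEXT PRIMARY KEY, cust_name TEXT, city TEXT)"
)
conn.executemany(
    "INSERT INTO d_customer VALUES (?, ?, ?)",
    [("C1", "Jon Smith", "Leeds"), ("C2", "Ann Lee", "York")],
)

# New data keyed by customer id (plays the role of the lookup on new data)
new_data = {"C1": ("John Smith", "Leeds"), "C2": ("Ann Lee", "York")}

def scd1_update(conn, new_data):
    """Overwrite changed rows in place; the old values are not kept."""
    changed = 0
    for cust_id, (name, city) in new_data.items():
        old = conn.execute(
            "SELECT cust_name, city FROM d_customer WHERE cust_id = ?",
            (cust_id,),
        ).fetchone()
        # Update existing rows only, and only when something differs
        if old is not None and old != (name, city):
            conn.execute(
                "UPDATE d_customer SET cust_name = ?, city = ? WHERE cust_id = ?",
                (name, city, cust_id),
            )
            changed += 1
    return changed

updated = scd1_update(conn, new_data)
rows = conn.execute("SELECT * FROM d_customer ORDER BY cust_id").fetchall()
```

Only the corrected spelling for C1 is written; the misspelled value is lost, which is exactly the Type 1 trade-off.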

SCD Type 2

Slowly changing dimension Type 2 is a model where the whole history is stored in the database. An additional dimension record is created for each change, so the split between the old record values and the new (current) value is easy to extract and the history is clear.
The fields 'effective date' and 'current indicator' are very often used in that dimension, and the fact table usually stores the dimension key and version number.

SCD 2 implementation in Datastage

The job described and depicted below shows how to implement SCD Type 2 in Datastage. It is one of many possible designs which can implement this dimension.
For this example, we will use a table with customer data (its name is D_CUSTOMER_SCD2) which has the following structure and data:


Datastage SCD2 job design
http://etltools.info/images/scd2jobdesign.jpg

The most important facts and stages of the CUST_SCD2 job processing:
• The dimension table with customers is refreshed daily and one of the data sources is a text file. For the purpose of this example, the record with CUST_ID=ETIMAA5 differs from the one stored in the database and it is the only record with changed data. It has the following structure and data:

SCD 2 Customers file extract

• There is a hashed file (Hash_NewCust) which handles a lookup of the new data coming from the text file.
• A T001_Lookups transformer does a lookup into a hashed file and maps new and old values to separate columns.

SCD 2 lookup transformer


http://etltools.info/images/scd2transformerlookup.jpg

• A T002_Check_Discrepacies_exist transformer compares old and new values of records and passes through only records that differ.

SCD 2 check discrepancies transformer
http://etltools.info/images/scd2transformerfinddiscrepancies.jpg

• A T003 transformer handles the UPDATE and INSERT actions of a record. The old record is updated with the current indicator flag set to no, and the new record is inserted with the current indicator flag set to yes, the record version increased by 1, and the current date.

SCD 2 insert-update record transformer


http://etltools.info/images/scd2transformerinsertupdaterecord.jpg

• ODBC Update stage (O_DW_Customers_SCD2_Upd): the update action is 'Update existing rows only' and the selected key columns are CUST_ID and REC_VERSION, so they will appear in the constructed WHERE part of the SQL statement.
• ODBC Insert stage (O_DW_Customers_SCD2_Ins): the insert action is 'insert rows without clearing' and the key column is CUST_ID.
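The update and insert pair at the end of the SCD2 job can be approximated with Python's sqlite3 module. This is a sketch under assumptions, not the statements Datastage generates: CUST_ID, REC_VERSION and D_CUSTOMER_SCD2 come from the description above, while CUST_NAME, REC_EFFDATE, REC_CURRENT_IND and the sample values are hypothetical.

```python
import sqlite3
from datetime import date

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE D_CUSTOMER_SCD2 (
        CUST_ID TEXT, CUST_NAME TEXT, REC_VERSION INTEGER,
        REC_EFFDATE TEXT, REC_CURRENT_IND TEXT
    )
""")
conn.execute(
    "INSERT INTO D_CUSTOMER_SCD2 VALUES ('ETIMAA5', 'Old Name', 1, '2013-01-01', 'Y')"
)

def apply_change(conn, cust_id, new_name, today):
    """Close the current version and insert the next one."""
    row = conn.execute(
        "SELECT REC_VERSION FROM D_CUSTOMER_SCD2 "
        "WHERE CUST_ID = ? AND REC_CURRENT_IND = 'Y'",
        (cust_id,),
    ).fetchone()
    if row is None:
        return None
    version = row[0]
    # Update keyed on CUST_ID and REC_VERSION, as in the update stage
    conn.execute(
        "UPDATE D_CUSTOMER_SCD2 SET REC_CURRENT_IND = 'N' "
        "WHERE CUST_ID = ? AND REC_VERSION = ?",
        (cust_id, version),
    )
    # Insert the new version with the flag set to yes and version + 1
    conn.execute(
        "INSERT INTO D_CUSTOMER_SCD2 VALUES (?, ?, ?, ?, 'Y')",
        (cust_id, new_name, version + 1, today.isoformat()),
    )
    return version + 1

new_version = apply_change(conn, "ETIMAA5", "New Name", date(2013, 6, 21))
history = conn.execute(
    "SELECT * FROM D_CUSTOMER_SCD2 ORDER BY REC_VERSION"
).fetchall()
```

Keying the update on both CUST_ID and REC_VERSION ensures that only the version being closed is touched, even when several historical versions of the same customer exist.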

SCD TYPE 3

In the Type 3 Slowly Changing Dimension, only the information about a previous value of a dimension is written into the database. An 'old' or 'previous' column is created which stores the immediately previous attribute value. In Type 3 SCD users are able to describe history immediately and can report both forward and backward from the change.
However, this model can't track all historical changes, such as when a dimension changes twice or more. That would require creating further columns to store historical data and could make the whole data warehouse schema very complex.

To implement SCD Type 3 in Datastage, use the same processing as in the SCD2 example, only changing the destination stages to update the old value with the new one and update the previous value field.

SCD TYPE 4

The Type 4 SCD idea is to store all historical changes in a separate historical data table for each of the dimensions.

To implement SCD Type 4 in Datastage, use the same processing as in the SCD2 example, only changing the destination stages to insert the old value into the destination stage connected to the historical data table (D_CUSTOMER_HIST for example) and update the old value with the new one.

SCD 2 implementation in Datastage Parallel Jobs 7.5X2

Change Capture Stage: "It is a processing stage that captures whether a record from a table is a copy, edited, inserted, or to be deleted, recording the result in the code column."


Simple example of change capture:

Change_capture

Properties of Change Capture:

Change keys
o Key = EID (key column name)
  Sort order = ascending order

Change values
o Values = ENAME
o Values = ADD

Options
o Change mode = (explicit keys & values / explicit keys, values)
o Drop output for copy = (false/ true) "false – default"
o Drop output for delete = (false/ true) "false – default"
o Drop output for edit = (false/ true) "false – default"
o Drop output for insert = (false/ true) "false – default"
  Copy code = 0
  Delete code = 2
  Edit code = 3
  Insert code = 1
  Code column name = <column name>
o Log statistics = (false/ true) "false – default"
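The effect of these properties can be imitated in a few lines of Python. This is not the Datastage stage itself, only a sketch of its semantics: the function name change_capture, the drop_copy parameter and the code column name change_code are hypothetical, while the key EID, the value columns ENAME and ADD, and the codes 0, 1, 2, 3 follow the properties listed above.

```python
COPY, INSERT, DELETE, EDIT = 0, 1, 2, 3  # codes as listed in the properties

def change_capture(before, after, key="EID", values=("ENAME", "ADD"),
                   drop_copy=False, code_column="change_code"):
    """Compare two data sets on the key and tag each record with a change code."""
    before_by_key = {row[key]: row for row in before}
    after_by_key = {row[key]: row for row in after}
    output = []
    # Process keys in ascending order, like the sort order property
    for k in sorted(before_by_key.keys() | after_by_key.keys()):
        b, a = before_by_key.get(k), after_by_key.get(k)
        if b is None:
            code, row = INSERT, a          # only in the after set
        elif a is None:
            code, row = DELETE, b          # only in the before set
        elif all(b[v] == a[v] for v in values):
            code, row = COPY, a            # same key, same values
        else:
            code, row = EDIT, a            # same key, different values
        if code == COPY and drop_copy:
            continue
        output.append({**row, code_column: code})
    return output

# Hypothetical sample data
before = [{"EID": 1, "ENAME": "A", "ADD": "HYD"},
          {"EID": 2, "ENAME": "B", "ADD": "SEC"},
          {"EID": 3, "ENAME": "C", "ADD": "DEL"}]
after = [{"EID": 1, "ENAME": "A", "ADD": "GDK"},
         {"EID": 2, "ENAME": "B", "ADD": "SEC"},
         {"EID": 4, "ENAME": "D", "ADD": "MCI"}]

changes = change_capture(before, after)
codes = [row["change_code"] for row in changes]
```

With this data, EID 1 is an edit, EID 2 a copy, EID 3 a delete and EID 4 an insert.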

Change Apply Stage: "It is a processing stage that applies the captured changes to the records of a table."

Change Apply

Properties of Change Apply:

Change keys
o Key = EID
  Sort order = ascending order

Options
o Change mode = explicit keys & values
o Check value columns on delete = (false/ true) "true – default"
o Log statistics = false
o Code column name = <column name> \\ must be the SAME column name used in Change Capture
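A matching sketch of the apply side, again in plain Python rather than Datastage: it takes a before data set plus a list of coded change records and rebuilds the after data set. The function name change_apply, the code column name change_code and the sample rows are hypothetical; the codes follow the Change Capture properties (copy 0, insert 1, delete 2, edit 3).

```python
COPY, INSERT, DELETE, EDIT = 0, 1, 2, 3

def change_apply(before, changes, key="EID", code_column="change_code"):
    """Apply coded change records to the before data set."""
    result = {row[key]: dict(row) for row in before}
    for change in changes:
        code = change[code_column]
        # Strip the code column; it must be the same name Change Capture used
        row = {col: val for col, val in change.items() if col != code_column}
        if code == DELETE:
            result.pop(row[key], None)
        elif code in (INSERT, EDIT):
            result[row[key]] = row
        # COPY records leave the before row unchanged
    return [result[k] for k in sorted(result)]

# Hypothetical sample data
before = [{"EID": 1, "ENAME": "A", "ADD": "HYD"},
          {"EID": 2, "ENAME": "B", "ADD": "SEC"},
          {"EID": 3, "ENAME": "C", "ADD": "DEL"}]
changes = [{"EID": 1, "ENAME": "A", "ADD": "GDK", "change_code": 3},
           {"EID": 2, "ENAME": "B", "ADD": "SEC", "change_code": 0},
           {"EID": 3, "ENAME": "C", "ADD": "DEL", "change_code": 2},
           {"EID": 4, "ENAME": "D", "ADD": "MCI", "change_code": 1}]

applied = change_apply(before, changes)
```

The result contains EID 1 with its edited address, EID 2 unchanged, and the newly inserted EID 4; EID 3 has been deleted.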

SCD II in version 7.5.x2


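Design:

[Job design diagram not preserved. Surviving labels: before.txt, after.txt; Key = EID; option: e k & v (explicit keys & values); links c = 3 and c = all; ESDATE = current date (), EEDATE = "99991231", ACF = "Y".]

Column derivations, where c is the change code:

ESDATE = currentdate ()
EEDATE = if c = 3 then DFJD(JDFD(CD()) - 1) else "99991231"
ACF = if (c = 3) then "N" else "Y"

Here DFJD, JDFD and CD abbreviate the transformer functions DateFromJulianDay, JulianDayFromDate and CurrentDate, so the EEDATE expression for c = 3 evaluates to yesterday's date.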

Example table of SCD data:

SID  CID  CNAME  ADD  AF  ESDATE  EEDATE    RV  UID
1    11   A      HYD  N   030606  291110    1   1
2    22   B      SEC  N   030606  070907    1   2
3    33   C      DEL  Y   030606  99991231  1   3
4    22   B      DEL  N   080907  291110    2   2
5    44   D      MCI  Y   080907  99991231  1   5
6    11   A      GDK  Y   301110  99991231  2   1
7    22   B      RAJ  Y   301110  99991231  3   2
8    55   E      CUL  Y   301110  99991231  1   8

Table: example SCD data; the description is given above.
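The ESDATE, EEDATE and ACF derivations can be checked against the table with a short Python sketch. It assumes the dates in the table are written as ddmmyy (so 301110 is 30 November 2010), which the source does not state explicitly; the function name derive_columns is hypothetical.

```python
from datetime import date, timedelta

HIGH_DATE = "99991231"  # open-ended end date used in the table

def derive_columns(change_code, today):
    """Derive ESDATE, EEDATE and ACF from the change code."""
    esdate = today.strftime("%d%m%y")
    if change_code == 3:
        # Edited record: end it yesterday and mark it as not active
        eedate = (today - timedelta(days=1)).strftime("%d%m%y")
        acf = "N"
    else:
        eedate = HIGH_DATE
        acf = "Y"
    return {"ESDATE": esdate, "EEDATE": eedate, "ACF": acf}

run_date = date(2010, 11, 30)
closed = derive_columns(3, run_date)   # branch for an edited record
opened = derive_columns(1, run_date)   # branch for a new record
```

For a run on 30 November 2010, the edit branch gives EEDATE 291110 with flag N, and the other branch gives ESDATE 301110, EEDATE 99991231 and flag Y, matching rows 1 and 6 of the table.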

SCD I & SCD II (Design and Properties)

SCD – I: Type 1 (Design and Properties):

[Job design diagram not preserved. Surviving labels:
Transfer job: OE_SRC; OE_DIM (before: 10, 20, 30); after: 10, 20, 40; fact output to DS_FACT (10, 20, 40); dim output to DS_TRG_DIM (10, 20, 40).
Load job: DS_TRG_DIM to OE_UPSERT (10, 20, 40), update and insert.]

In Oracle we have to create table1 and table2.

Table1: Create table SRC(SNO number, SNAME varchar2(25));

o Insert into src values(111, 'naveen');
o Insert into src values(222, 'munna');
o Insert into src values(333, 'kumar');

Table2: Create table DIM(SKID number, SNO number, SNAME varchar2(25));

o No records to display;

Processes of transform job SCD1:

Step 1: Load the plug-in metadata from Oracle for the before and after data, as shown on the links above, which come from different sources.

Step 2: “SCD1 properties”

Fast path 1 of 5: select output link as:

Fast path 2 of 5: navigating the key column value between before and after tables

Fast path 3 of 5: selecting source type and source name.

Source type: source name:

NOTE: before every run of the job we should empty the source name file (i.e. empty.txt); otherwise the surrogate key will continue from the last stored value.

Fast path 4 of 5: select output in DIM.

Fast path 5 of 5: set the output paths to the FACT data set.
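The note about emptying the source name file is easier to follow with a small model of a file-backed surrogate key generator. This Python sketch is only an analogy: a real Datastage state file is not a plain text file, and the function next_surrogate_keys is hypothetical. It shows why keys continue from the last stored value until the file is emptied.

```python
import os
import tempfile

def next_surrogate_keys(state_path, n):
    """Return n new keys, continuing from the last value stored in state_path."""
    last = 0
    if os.path.exists(state_path):
        with open(state_path) as f:
            text = f.read().strip()
        if text:
            last = int(text)
    keys = list(range(last + 1, last + n + 1))
    # Persist the highest key handed out so the next run continues from it
    with open(state_path, "w") as f:
        f.write(str(last + n))
    return keys

state_file = os.path.join(tempfile.mkdtemp(), "empty.txt")

first_run = next_surrogate_keys(state_file, 3)
second_run = next_surrogate_keys(state_file, 2)   # continues after the first run

open(state_file, "w").close()                      # empty the file before a rerun
after_reset = next_surrogate_keys(state_file, 2)   # starts again from 1
```

The second run picks up where the first one stopped, and only after the file is emptied do the keys start again from 1.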


Step 3: In the next job, i.e. the load job, if the source table has been changed or edited, then when loading into Oracle we must set write method = upsert, which has two options:

update then insert \\ if the key column value already exists.
insert then update \\ if the key column value is new.
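The two upsert orders can be illustrated with Python's sqlite3 module in place of Oracle. This is a sketch of the idea, not of what the Oracle stage executes: the DIM table and its columns follow the CREATE TABLE statement above, while the UNIQUE constraint on SNO, the function names and the sample rows are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE DIM (SKID INTEGER PRIMARY KEY, SNO INTEGER UNIQUE, SNAME TEXT)"
)
conn.execute("INSERT INTO DIM VALUES (1, 111, 'naveen')")

def update_then_insert(conn, skid, sno, sname):
    """Try the update first; insert only when no row matched the key."""
    cur = conn.execute("UPDATE DIM SET SNAME = ? WHERE SNO = ?", (sname, sno))
    if cur.rowcount == 0:
        conn.execute("INSERT INTO DIM VALUES (?, ?, ?)", (skid, sno, sname))
        return "insert"
    return "update"

def insert_then_update(conn, skid, sno, sname):
    """Try the insert first; fall back to an update when the key exists."""
    try:
        conn.execute("INSERT INTO DIM VALUES (?, ?, ?)", (skid, sno, sname))
        return "insert"
    except sqlite3.IntegrityError:
        conn.execute("UPDATE DIM SET SNAME = ? WHERE SNO = ?", (sname, sno))
        return "update"

r1 = update_then_insert(conn, 2, 111, "naveen k")   # key exists
r2 = insert_then_update(conn, 2, 222, "munna")      # key is new
r3 = insert_then_update(conn, 3, 222, "munna s")    # key exists now
dim_rows = conn.execute("SELECT * FROM DIM ORDER BY SKID").fetchall()
```

Both orders end in the same table state; the choice mainly affects which statement is attempted first, so update then insert suits loads where most keys already exist and insert then update suits loads of mostly new keys.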

Here SCD I result is for the below input

SCD – II: (Design and Properties):

[Job design diagram not preserved. Surviving labels:
Transfer job: OE_SRC; OE_DIM (before: 10, 20, 30); after: 10, 20, 40; fact output to DS_FACT (10, 20, 20, 30, 40); dim output to DS_TRG_DIM (10, 20, 20, 30, 40).
Load job: DS_TRG_DIM to OE_UPSERT (10, 20, 20, 30, 40), update and insert.]

Step 1: In the transformer stage: add some columns to the before table and convert the EEDATE and ESDATE columns to timestamps in the transformer stage, in order to perform SCD II.


In TX properties:

In SCD II properties:

Fast path 1 of 5: select output link as:

Fast path 2 of 5: navigating the key column value between before and after tables

Fast path 3 of 5: selecting source type and source name.

Source type: source name:

NOTE: before every run of the job we should empty the source name file (i.e. empty.txt); otherwise the surrogate key will continue from the last stored value.

Fast path 4 of 5: select output in DIM.

Fast path 5 of 5: set the output paths to the FACT data set.

Step 3: In the next job, i.e. the load job, if the source table has been changed or edited, then when loading into Oracle we must set write method = upsert, which has two options:

update then insert \\ if the key column value already exists.
insert then update \\ if the key column value is new.

Here SCD II result is for the below input

Posted by ganesh s at 10:15 AM
