Data Twisting
OUGN Spring Seminar 10-12 March 2016
Kim Berg Hansen, Senior Consultant

Page 2: Data twisting

Data Twisting2 05/01/2023

About me

• Danish geek
• SQL & PL/SQL developer since 2000
• Developer at Trivadis AG since 2016 http://www.trivadis.dk
• Oracle Certified Expert in SQL
• Oracle ACE
• Blogger at http://www.kibeha.dk
• SQL quizmaster at http://plsqlchallenge.oracle.com
• Likes to cook
• Reads sci-fi
• Chairman of local chapter of Danish Beer Enthusiasts

Page 3: Data twisting

About Trivadis

Trivadis is a market leader in IT consulting, system integration, solution engineering and the provision of IT services focusing on and technologies in Switzerland, Germany, Austria and Denmark. We offer our services in the following strategic business fields:

OPERATION: Trivadis Services takes over the interacting operation of your IT systems.

Page 4: Data twisting

With over 600 specialists and IT experts in your region

Branches: Copenhagen, Munich, Lausanne, Bern, Zurich, Brugg, Geneva, Hamburg, Düsseldorf, Frankfurt, Stuttgart, Freiburg, Basel, Vienna

• 14 Trivadis branches and more than 600 employees
• 260 Service Level Agreements
• Over 4,000 training participants
• Research and development budget: EUR 5.0 million
• Financially self-supporting and sustainably profitable
• Experience from more than 1,900 projects per year at over 800 customers

Page 5: Data twisting

Agenda for Data Twisting

1. Why do we need to Twist, Shake, Rattle 'n' Roll
2. Twist
   – UNPIVOT with single or multi-column dimensions
   – Unpivoting with row generators
3. Shake
   – PIVOT with single or multi-column dimensions, with or without grouping
   – Pivoting with GROUP BY and CASE
4. Rattle
   – Turning delimited data into columns and rows
   – ODCI dynamic table function parser
5. Roll
   – LISTAGG to turn rows into delimited data
   – Alternative methods for string aggregation
6. Coda

Page 6: Data twisting

Twist, Shake, Rattle ’n’ Roll

Page 7: Data twisting

Twist Columns to Rows

Category  EMEA    AMER    ASOC
Beer      200000  150000  225000
Wine      10000   25000   17500

[Diagram: each EMEA, AMER and ASOC column value becomes its own row with columns Category, Region, Sales]

Page 8: Data twisting

Shake Rows to Columns

[Diagram: rows with columns Category, Region, Sales become one row per category with a column per region]

Category  EMEA    AMER    ASOC
Beer      200000  150000  225000
Wine      10000   25000   17500

Page 9: Data twisting

Rattle Delimited Data to Columns

Category;EMEA;AMER;ASOC
Beer;200000;150000;225000
Wine;10000;25000;17500

Category  EMEA    AMER    ASOC
Beer      200000  150000  225000
Wine      10000   25000   17500

Page 10: Data twisting

Rattle Delimited Data to Rows

Category  TypeList
Beer      Pilsner;Ale;Stout
Wine      Red;Champagne

Category  Type
Beer      Pilsner
Beer      Ale
Beer      Stout
Wine      Red
Wine      Champagne

Page 11: Data twisting

Roll Rows to Delimited Data

Category  Type
Beer      Pilsner
Beer      Ale
Beer      Stout
Wine      Red
Wine      Champagne

Category  TypeList
Beer      Pilsner;Ale;Stout
Wine      Red;Champagne

Page 12: Data twisting

Twist

Page 13: Data twisting

Single dimension and measure

create table sales1 (
   category  varchar2(10)
 , emea      number
 , amer      number
 , asoc      number
);

insert into sales1 values ('Beer', 200000, 150000, 225000);

insert into sales1 values ('Wine', 10000, 25000, 17500);

Table of beverage sales with columns per region

Page 14: Data twisting

Single dimension and measure

select category, region, sales
  from sales1
unpivot (
   sales
   for region in (
      emea as 'EMEA'
    , amer as 'AMER'
    , asoc as 'ASOC'
   )
)
 order by category, region;

UNPIVOT creates dimension REGION and measure SALES

CATEGORY   REGI      SALES
---------- ---- ----------
Beer       AMER     150000
Beer       ASOC     225000
Beer       EMEA     200000
Wine       AMER      25000
Wine       ASOC      17500
Wine       EMEA      10000
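
One detail worth knowing when comparing UNPIVOT with the row generator variants: by default UNPIVOT leaves out rows whose measure is NULL, while a CASE-based row generator keeps them. The sketch below assumes a hypothetical extra 'Cider' row (not part of the original script) and uses INCLUDE NULLS to keep its empty AMER value as a row.

```sql
-- Hypothetical extra row for illustration only: no AMER sales recorded
insert into sales1 values ('Cider', 5000, null, 2500);

-- INCLUDE NULLS keeps the Cider/AMER row with a NULL in SALES;
-- without it (the default is EXCLUDE NULLS) that row is not returned
select category, region, sales
  from sales1
unpivot include nulls (
   sales
   for region in (
      emea as 'EMEA'
    , amer as 'AMER'
    , asoc as 'ASOC'
   )
)
 order by category, region;

-- Undo the illustrative row so the later examples match the slides
rollback;
```

The rollback assumes the two original inserts into sales1 were committed beforehand.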

Page 15: Data twisting

Single dimension and measure

select category
     , case n#
          when 1 then 'EMEA'
          when 2 then 'AMER'
          when 3 then 'ASOC'
       end region
     , case n#
          when 1 then emea
          when 2 then amer
          when 3 then asoc
       end sales
  from sales1
 cross join (
   select level n# from dual connect by level <= 3
 )
 order by category, region;

Generate 3 rows - Cartesian join – CASE logic for dimension and measure

CATEGORY   REGI      SALES
---------- ---- ----------
Beer       AMER     150000
Beer       ASOC     225000
Beer       EMEA     200000
Wine       AMER      25000
Wine       ASOC      17500
Wine       EMEA      10000

Page 16: Data twisting

Single dimension and measure

with r (region) as (
   select 'EMEA' from dual union all
   select 'AMER' from dual union all
   select 'ASOC' from dual
)
select category, region
     , case region
          when 'EMEA' then emea
          when 'AMER' then amer
          when 'ASOC' then asoc
       end sales
  from sales1
 cross join r
 order by category, region;

Generate 3 rows with dimension - Cartesian join – CASE logic for measure

CATEGORY   REGI      SALES
---------- ---- ----------
Beer       AMER     150000
Beer       ASOC     225000
Beer       EMEA     200000
Wine       AMER      25000
Wine       ASOC      17500
Wine       EMEA      10000

Page 17: Data twisting

Multiple dimensions and measures

create table sales2 (
   category       varchar2(10)
 , dk_b2b_qty     number
 , dk_b2b_amount  number
 , dk_b2c_qty     number
 , dk_b2c_amount  number
 , uk_b2b_qty     number
 , uk_b2b_amount  number
 , uk_b2c_qty     number
 , uk_b2c_amount  number
);

insert into sales2 values ('Beer', 500, 5000, 250, 2500, 100, 1000, 200, 2000);

insert into sales2 values ('Wine', 150, 3000, 200, 4000, 400, 8000, 300, 6000);

Table of beverage sales with qty and amount columns per country and channel

Page 18: Data twisting

Multiple dimensions and measures

select category, country, channel, qty, amount
  from sales2
unpivot (
   ( qty, amount )
   for ( country, channel ) in (
      (dk_b2b_qty, dk_b2b_amount) as ('DK', 'B2B')
    , (dk_b2c_qty, dk_b2c_amount) as ('DK', 'B2C')
    , (uk_b2b_qty, uk_b2b_amount) as ('UK', 'B2B')
    , (uk_b2c_qty, uk_b2c_amount) as ('UK', 'B2C')
   )
)
 order by category, country, channel;

UNPIVOT creates dimensions COUNTRY, CHANNEL and measures QTY, AMOUNT

CATEGORY   CO CHA   QTY  AMOUNT
---------- -- --- ----- -------
Beer       DK B2B   500    5000
Beer       DK B2C   250    2500
Beer       UK B2B   100    1000
Beer       UK B2C   200    2000
Wine       DK B2B   150    3000
Wine       DK B2C   200    4000
Wine       UK B2B   400    8000
Wine       UK B2C   300    6000

Page 19: Data twisting

Single dimension and multiple measures

select category, country_and_channel, qty, amount
  from sales2
unpivot (
   ( qty, amount )
   for ( country_and_channel ) in (
      (dk_b2b_qty, dk_b2b_amount) as ('DK_B2B')
    , (dk_b2c_qty, dk_b2c_amount) as ('DK_B2C')
    , (uk_b2b_qty, uk_b2b_amount) as ('UK_B2B')
    , (uk_b2c_qty, uk_b2c_amount) as ('UK_B2C')
   )
)
 order by category, country_and_channel;

UNPIVOT creates dimension COUNTRY_AND_CHANNEL - measures QTY, AMOUNT

CATEGORY   COUNTR   QTY  AMOUNT
---------- ------ ----- -------
Beer       DK_B2B   500    5000
Beer       DK_B2C   250    2500
Beer       UK_B2B   100    1000
Beer       UK_B2C   200    2000
Wine       DK_B2B   150    3000
Wine       DK_B2C   200    4000
Wine       UK_B2B   400    8000
Wine       UK_B2C   300    6000

Page 20: Data twisting

Multiple dimensions and single measure

select category, country, channel, amount
  from sales2
unpivot (
   ( amount )
   for ( country, channel ) in (
      (dk_b2b_amount) as ('DK', 'B2B')
    , (dk_b2c_amount) as ('DK', 'B2C')
    , (uk_b2b_amount) as ('UK', 'B2B')
    , (uk_b2c_amount) as ('UK', 'B2C')
   )
)
 order by category, country, channel;

UNPIVOT creates dimensions COUNTRY, CHANNEL - measure AMOUNT

CATEGORY   CO CHA     AMOUNT
---------- -- --- ----------
Beer       DK B2B       5000
Beer       DK B2C       2500
Beer       UK B2B       1000
Beer       UK B2C       2000
Wine       DK B2B       3000
Wine       DK B2C       4000
Wine       UK B2B       8000
Wine       UK B2C       6000

Page 21: Data twisting

Shake

Page 22: Data twisting

Single dimension and measure

create table sales3 (
   category  varchar2(10)
 , region    varchar2(10)
 , sales     number
);

insert into sales3 values ('Beer', 'EMEA', 200000);
insert into sales3 values ('Beer', 'AMER', 150000);
insert into sales3 values ('Beer', 'ASOC', 225000);
insert into sales3 values ('Wine', 'EMEA', 10000);
insert into sales3 values ('Wine', 'AMER', 25000);
insert into sales3 values ('Wine', 'ASOC', 17500);

Table of beverage sales per region

Page 23: Data twisting

Single dimension and measure

select category, emea, amer, asoc
  from sales3
 pivot (
   sum(sales)
   for region in (
      'EMEA' as emea
    , 'AMER' as amer
    , 'ASOC' as asoc
   )
 )
 order by category;

PIVOT creates 3 columns for 3 dimension values and 1 measure

CATEGORY      EMEA    AMER    ASOC
---------- ------- ------- -------
Beer        200000  150000  225000
Wine         10000   25000   17500

Page 24: Data twisting

Single dimension and measure

select category
     , sum(case region when 'EMEA' then sales end) as emea
     , sum(case region when 'AMER' then sales end) as amer
     , sum(case region when 'ASOC' then sales end) as asoc
  from sales3
 group by category
 order by category;

GROUP BY using a CASE expression within SUM for each of the 3 dimension values

CATEGORY      EMEA    AMER    ASOC
---------- ------- ------- -------
Beer        200000  150000  225000
Wine         10000   25000   17500

Page 25: Data twisting

Single dimension and measure

insert into sales3 values ('Beer', 'AMER', 25000);
commit;

select category, emea, amer, asoc
  from sales3
 pivot (
   sum(sales)
   for region in (
      'EMEA' as emea
    , 'AMER' as amer
    , 'ASOC' as asoc
   )
 )
 order by category;

Aggregations used for non-unique dimensions

CATEGORY      EMEA    AMER    ASOC
---------- ------- ------- -------
Beer        200000  175000  225000
Wine         10000   25000   17500

Page 26: Data twisting

Single dimension and multiple measures

select *
  from sales3
 pivot (
   sum(sales)
 , count(*)
   for region in (
      'EMEA' as emea
    , 'AMER' as amer
    , 'ASOC' as asoc
   )
 )
 order by category;

Columns are named <dim>_<measure>, so without measure aliases both measures produce the same column names

ERROR at line 1:
ORA-00918: column ambiguously defined

Page 27: Data twisting

Single dimension and multiple measures

CATEGORY EMEA_SALE EMEA_CNT AMER_SALE AMER_CNT ASOC_SALE ASOC_CNT
-------- --------- -------- --------- -------- --------- --------
Beer        200000        1    175000        2    225000        1
Wine         10000        1     25000        1     17500        1

select category, emea_sale, emea_cnt, amer_sale, amer_cnt, asoc_sale, asoc_cnt
  from sales3
 pivot (
   sum(sales) as sale
 , count(*) as cnt
   for region in (
      'EMEA' as emea
    , 'AMER' as amer
    , 'ASOC' as asoc
   )
 )
 order by category;

With measure aliases we get 3x2 columns named as <dim>_<measure> combinations

Page 28: Data twisting

Multiple dimensions and measures

create table sales4 (
   category  varchar2(10)
 , country   varchar2(10)
 , channel   varchar2(10)
 , qty       number
 , amount    number
);

insert into sales4 values ('Beer', 'DK', 'B2B', 500, 5000);
insert into sales4 values ('Beer', 'DK', 'B2C', 250, 2500);
insert into sales4 values ('Beer', 'UK', 'B2B', 100, 1000);
insert into sales4 values ('Beer', 'UK', 'B2C', 200, 2000);
insert into sales4 values ('Wine', 'DK', 'B2B', 150, 3000);
insert into sales4 values ('Wine', 'DK', 'B2C', 200, 4000);
insert into sales4 values ('Wine', 'UK', 'B2B', 400, 8000);
insert into sales4 values ('Wine', 'UK', 'B2C', 300, 6000);

Table of beverage sales measured in qty and amount per country and channel

Page 29: Data twisting

Multiple dimensions and measures

CATEGORY   DK_B2B_QTY DK_B2B_AMOUNT DK_B2C_QTY DK_B2C_AMOUNT UK_B2B_QTY UK_B2B_AMOUNT UK_B2C_QTY UK_B2C_AMOUNT
---------- ---------- ------------- ---------- ------------- ---------- ------------- ---------- -------------
Beer              500          5000        250          2500        100          1000        200          2000
Wine              150          3000        200          4000        400          8000        300          6000

select category
     , dk_b2b_qty, dk_b2b_amount, dk_b2c_qty, dk_b2c_amount
     , uk_b2b_qty, uk_b2b_amount, uk_b2c_qty, uk_b2c_amount
  from sales4
 pivot (
   sum(qty) as qty
 , sum(amount) as amount
   for ( country, channel ) in (
      ('DK', 'B2B') as dk_b2b
    , ('DK', 'B2C') as dk_b2c
    , ('UK', 'B2B') as uk_b2b
    , ('UK', 'B2C') as uk_b2c
   )
 )
 order by category;

With dimension and measure aliases we get (2x2)x2 columns

Page 30: Data twisting

Rattle

Page 31: Data twisting

Delimited data to columns

create table sales5 (
   txt  varchar2(100)
);

insert into sales5 values ('Beer;200000;150000;225000');
insert into sales5 values ('Wine;10000;25000;17500');

Table of beverage sales as semi-colon separated text

Page 32: Data twisting

Delimited data to columns

CATEGORY EMEA   AMER   ASOC
-------- ------ ------ ------
Beer     200000 150000 225000
Wine     10000  25000  17500

select substr(txt, 1, instr(txt,';') - 1) category
     , substr(
          txt
        , instr(txt,';') + 1
        , instr(txt,';',1,2) - instr(txt,';') - 1
       ) emea
     , substr(
          txt
        , instr(txt,';',1,2) + 1
        , instr(txt,';',1,3) - instr(txt,';',1,2) - 1
       ) amer
     , substr(txt, instr(txt,';',1,3) + 1) asoc
  from sales5
 order by category;

Using SUBSTR and INSTR

Page 33: Data twisting

Delimited data to columns

CATEGORY EMEA   AMER   ASOC
-------- ------ ------ ------
Beer     200000 150000 225000
Wine     10000  25000  17500

select regexp_substr(txt, '[^;]+', 1, 1) category
     , regexp_substr(txt, '[^;]+', 1, 2) emea
     , regexp_substr(txt, '[^;]+', 1, 3) amer
     , regexp_substr(txt, '[^;]+', 1, 4) asoc
  from sales5
 order by category;

Using REGEXP_SUBSTR
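
A caveat with the '[^;]+' pattern: it matches only non-empty runs of characters, so an empty field is skipped and every later field shifts one position to the left. The sketch below uses a hypothetical row with an empty EMEA field (not in the original script) and compares it with one commonly used alternative pattern that captures each field up to the next delimiter or the end of the string. It assumes 11g or later, where REGEXP_SUBSTR accepts the subexpression argument.

```sql
with t (txt) as (
   -- Hypothetical input: the EMEA field is empty
   select 'Beer;;150000;225000' from dual
)
select regexp_substr(txt, '[^;]+', 1, 2)               as emea_skipping
       -- 2nd occurrence of "field followed by ; or end of string",
       -- returning subexpression 1 (the field itself)
     , regexp_substr(txt, '(.*?)(;|$)', 1, 2, null, 1) as emea_positional
  from t;
```

Here EMEA_SKIPPING returns the AMER value 150000, while EMEA_POSITIONAL returns NULL for the empty field.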

Page 34: Data twisting

Delimited data to rows

create table beverages1 (
   category  varchar2(10)
 , typelist  varchar2(100)
);

insert into beverages1 values ('Beer', 'Pilsner;Ale;Stout');
insert into beverages1 values ('Wine', 'Red;Champagne');

Table of beverage types as semi-colon separated text

Page 35: Data twisting

Delimited data to rows

create type beverage_collection_type as table of varchar2(10);
/
create or replace function beverage_typelist_to_coll (
   typelist in beverages1.typelist%type
)
   return beverage_collection_type pipelined
is
   list_len  pls_integer;
   from_pos  pls_integer;
   to_pos    pls_integer;
begin
   list_len := length(typelist);
   from_pos := 1;
   loop
      to_pos := nvl(nullif(instr(typelist, ';', from_pos), 0), list_len+1);
      pipe row (substr(typelist, from_pos, to_pos-from_pos));
      exit when to_pos > list_len;
      from_pos := to_pos + 1;
   end loop;
end beverage_typelist_to_coll;
/

Collection type and pipelined function to parse string and pipe out collection

Page 36: Data twisting

Delimited data to rows

select category
     , column_value as beverage_type
  from beverages1
     , table(beverage_typelist_to_coll(typelist))
 order by category, beverage_type;

Use pipelined table function within TABLE

CATEGORY BEVERAGE_T
-------- ----------
Beer     Ale
Beer     Pilsner
Beer     Stout
Wine     Champagne
Wine     Red

Page 37: Data twisting

Delimited data to rows

select category
     , regexp_substr(typelist, '[^;]+', 1, sub#) beverage_type
  from beverages1
 cross join lateral (
   select level sub#
     from dual
  connect by level <= regexp_count(typelist, ';') + 1
 )
 order by category, beverage_type;

Generate count of delimiters + 1 rows per category (note: LATERAL requires 12c)

CATEGORY BEVERAGE_T
-------- ----------
Beer     Ale
Beer     Pilsner
Beer     Stout
Wine     Champagne
Wine     Red
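
On 11g, where LATERAL is not available, a similar per-row generator can be sketched with a correlated collection expression. This variant is not from the slides; it assumes the built-in collection type SYS.ODCINUMBERLIST is usable as the cast target and relies on TABLE() being allowed to reference columns of tables listed earlier in the FROM clause.

```sql
select b.category
     , regexp_substr(b.typelist, '[^;]+', 1, s.column_value) as beverage_type
  from beverages1 b
     , table(
          cast(
             multiset(
                -- one row per list element: number of delimiters + 1
                select level
                  from dual
               connect by level <= regexp_count(b.typelist, ';') + 1
             ) as sys.odcinumberlist
          )
       ) s
 order by b.category, beverage_type;
```

The result should match the LATERAL version; the generated element number is exposed through the COLUMN_VALUE pseudocolumn.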

Page 38: Data twisting

Delimited/structured data to rows and columns

create table beverages2 (
   category  varchar2(10)
 , typelist  varchar2(100)
);

insert into beverages2 values ('Beer', 'Pilsner|Light;Ale|Medium;Stout|Dark');
insert into beverages2 values ('Wine', 'Red|Red;Champagne|Clear');

Table of beverage types and colors as semi-colon and pipe separated text

Page 39: Data twisting

Delimited/structured data to rows and columns

create or replace type delimited_col_row as object (
   {globals}
 , static function parser( {params} )
      return anydataset pipelined using delimited_col_row
 , static function odcitabledescribe( {params} ) return number
 , static function odcitableprepare( {params} ) return number
 , static function odcitablestart( {params} ) return number
 , member function odcitablefetch( {params} ) return number
 , member function odcitableclose( {params} ) return number
)
/

create or replace type body delimited_col_row as
   {implementation}
end;
/

Object type implementing ODCI functions (complete code in script: http://bit.ly/kibeha_datatwist_sql)

Page 40: Data twisting

Delimited/structured data to rows and columns

select category, beverage_type, color
  from beverages2
     , table(
          delimited_col_row.parser(
             typelist
           , 'BEVERAGE_TYPE|VARCHAR2(10);COLOR|VARCHAR2(10)'
           , '|'
           , ';'
          )
       ) type_and_color
 order by category, beverage_type;

Use ODCI parser function within TABLE – Column definition string must be a literal

CATEGORY BEVERAGE_T COLOR
-------- ---------- ----------
Beer     Ale        Medium
Beer     Pilsner    Light
Beer     Stout      Dark
Wine     Champagne  Clear
Wine     Red        Red

Page 41: Data twisting

Roll

Page 42: Data twisting

Rows to delimited data

create table beverages3 (
   category       varchar2(10)
 , beverage_type  varchar2(10)
);

insert into beverages3 values ('Beer', 'Pilsner');
insert into beverages3 values ('Beer', 'Ale');
insert into beverages3 values ('Beer', 'Stout');
insert into beverages3 values ('Wine', 'Red');
insert into beverages3 values ('Wine', 'Champagne');

Table of beverage types per category

Page 43: Data twisting

Rows to delimited data

select category
     , listagg(beverage_type, ';') within group (
          order by beverage_type
       ) typelist
  from beverages3
 group by category
 order by category;

LISTAGG built-in aggregate function (11.2)

CATEGORY TYPELIST
-------- --------------------
Beer     Ale;Pilsner;Stout
Wine     Champagne;Red
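
LISTAGG returns a VARCHAR2, so a group whose concatenated result exceeds the maximum VARCHAR2 length raises ORA-01489. As a sketch that goes beyond the slides, and assuming Oracle Database 12.2 or later, the ON OVERFLOW TRUNCATE clause cuts the list short instead of failing. The '...' truncation indicator is an arbitrary choice for illustration.

```sql
-- Assumes 12.2 or later; earlier releases do not accept ON OVERFLOW
select category
     , listagg(beverage_type, ';' on overflow truncate '...' with count)
          within group (order by beverage_type) as typelist
  from beverages3
 group by category
 order by category;
```

With the small beverages3 table nothing is truncated, so the output is the same as above; the clause only changes behavior for groups that would otherwise overflow.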

Page 44: Data twisting

Rows to delimited data

create type beverage_collection_type as table of varchar2(10);
/
create or replace function beverage_typecoll_to_string (
   typecoll in beverage_collection_type
)
   return varchar2
is
   type_string varchar2(4000);
begin
   for idx in typecoll.first .. typecoll.last loop
      if idx = typecoll.first then
         type_string := typecoll(idx);
      else
         type_string := type_string || ';' || typecoll(idx);
      end if;
   end loop;
   return type_string;
end beverage_typecoll_to_string;
/

Create collection type and a function to turn collection into delimited string

Page 45: Data twisting

Rows to delimited data

select category
     , beverage_typecoll_to_string(
          cast(
             collect(
                beverage_type order by beverage_type
             ) as beverage_collection_type
          )
       ) typelist
  from beverages3
 group by category
 order by category;

Use COLLECT to aggregate into collection, then call function to create string

CATEGORY TYPELIST
-------- --------------------
Beer     Ale;Pilsner;Stout
Wine     Champagne;Red

Page 46: Data twisting

Rows to delimited data

create or replace type string_agg_type as object (
   total varchar2(4000)
 , static function ODCIAggregateInitialize( {params} ) return number
 , member function ODCIAggregateIterate( {params} ) return number
 , member function ODCIAggregateTerminate( {params} ) return number
 , member function ODCIAggregateMerge( {params} ) return number
);
/
create or replace type body string_agg_type
   {implementation}
end;
/
create or replace function stragg( input varchar2 )
   return varchar2
   parallel_enable aggregate using string_agg_type;
/

Tom Kyte STRAGG function using ODCI implementation of user aggregate function
https://asktom.oracle.com/pls/apex/f?p=100:11:0::::P11_QUESTION_ID:2196162600402

Page 47: Data twisting

Rows to delimited data

select category
     , stragg(beverage_type) typelist
  from beverages3
 group by category
 order by category;

Use STRAGG like any aggregate – Note: unlike LISTAGG, this cannot ORDER BY

CATEGORY TYPELIST
-------- --------------------
Beer     Pilsner;Stout;Ale
Wine     Red;Champagne

Page 48: Data twisting

Coda

Page 49: Data twisting

We Can Boogie

Twist Columns to Rows

– UNPIVOT or dummy row generators

Shake Rows to Columns

– PIVOT or GROUP BY with CASE

Rattle Delimited Data to Columns or Rows

– Parse delimited data

Roll Rows to Delimited Data

– LISTAGG or other string aggregation techniques

Boogie!

Page 50: Data twisting

Links

This presentation PowerPoint http://bit.ly/kibeha_datatwist_pptx

Script with all examples from this presentation http://bit.ly/kibeha_datatwist_sql

Page 51: Data twisting

Questions & Answers

Kim Berg Hansen
Senior Consultant

[email protected]
