Data Twisting
OUGN Spring Seminar 10-12 March 2016
Kim Berg Hansen
Senior Consultant
Data Twisting2 05/01/2023
About me
• Danish geek
• SQL & PL/SQL developer since 2000
• Developer at Trivadis AG since 2016 (http://www.trivadis.dk)
• Oracle Certified Expert in SQL
• Oracle ACE
• Blogger at http://www.kibeha.dk
• SQL quizmaster at http://plsqlchallenge.oracle.com
• Likes to cook
• Reads sci-fi
• Chairman of local chapter of Danish Beer Enthusiasts
About Trivadis
Trivadis is a market leader in IT consulting, system integration, solution engineering and the provision of IT services focusing on and technologies in Switzerland, Germany, Austria and Denmark.
We offer our services in the following strategic business fields:
OPERATION: Trivadis Services takes over the interacting operation of your IT systems.
COPENHAGEN
MUNICH
LAUSANNE
BERN
ZURICH
BRUGG
GENEVA
HAMBURG
DÜSSELDORF
FRANKFURT
STUTTGART
FREIBURG
BASEL
VIENNA
With over 600 specialists and IT experts in your region
14 Trivadis branches and more than 600 employees
260 Service Level Agreements
Over 4,000 training participants
Research and development budget: EUR 5.0 million
Financially self-supporting and sustainably profitable
Experience from more than 1,900 projects per year at over 800 customers
Agenda for Data Twisting
1. Why do we need to Twist, Shake, Rattle 'n' Roll
2. Twist
   UNPIVOT with single or multi-column dimensions
   Unpivoting with row generators
3. Shake
   PIVOT with single or multi-column dimensions, with or without grouping
   Pivoting with GROUP BY and CASE
4. Rattle
   Turning delimited data into columns and rows
   ODCI dynamic table function parser
5. Roll
   LISTAGG to turn rows into delimited data
   Alternative methods for string aggregation
6. Coda
Twist, Shake, Rattle ’n’ Roll
Twist Columns to Rows
Category  EMEA    AMER    ASOC
Beer      200000  150000  225000
Wine      10000   25000   17500

Category  Region  Sales
Beer      EMEA    200000
Beer      AMER    150000
Beer      ASOC    225000
Wine      EMEA    10000
Wine      AMER    25000
Wine      ASOC    17500
Shake Rows to Columns
Category  Region  Sales
Beer      EMEA    200000
Beer      AMER    150000
Beer      ASOC    225000
Wine      EMEA    10000
Wine      AMER    25000
Wine      ASOC    17500

Category  EMEA    AMER    ASOC
Beer      200000  150000  225000
Wine      10000   25000   17500
Rattle Delimited Data to Columns
Category;EMEA;AMER;ASOC
Beer;200000;150000;225000
Wine;10000;25000;17500

Category  EMEA    AMER    ASOC
Beer      200000  150000  225000
Wine      10000   25000   17500
Rattle Delimited Data to Rows
Category  TypeList
Beer      Pilsner;Ale;Stout
Wine      Red;Champagne

Category  Type
Beer      Pilsner
Beer      Ale
Beer      Stout
Wine      Red
Wine      Champagne
Roll Rows to Delimited Data
Category  Type
Beer      Pilsner
Beer      Ale
Beer      Stout
Wine      Red
Wine      Champagne

Category  TypeList
Beer      Pilsner;Ale;Stout
Wine      Red;Champagne
Twist
Single dimension and measure
create table sales1 (
   category  varchar2(10)
 , emea      number
 , amer      number
 , asoc      number
);
insert into sales1 values ('Beer', 200000, 150000, 225000);
insert into sales1 values ('Wine', 10000, 25000, 17500);
Table of beverage sales with columns per region
Single dimension and measure
select category, region, sales
  from sales1
unpivot (
   sales
   for region in (
      emea as 'EMEA'
    , amer as 'AMER'
    , asoc as 'ASOC'
   )
)
 order by category, region;
UNPIVOT creates dimension REGION and measure SALES
CATEGORY   REGI      SALES
---------- ---- ----------
Beer       AMER     150000
Beer       ASOC     225000
Beer       EMEA     200000
Wine       AMER      25000
Wine       ASOC      17500
Wine       EMEA      10000
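One detail worth knowing about UNPIVOT is that, by default, it leaves out rows where the measure is NULL. The following is a minimal sketch of how INCLUDE NULLS keeps such rows. The inline data set sales1_nulls and its 'Cider' row are assumptions made up for illustration and are not part of the presentation script; the column alias list on the WITH clause assumes 11.2 or later.

```sql
-- Hypothetical data: one row where the AMER figure is missing (NULL)
with sales1_nulls (category, emea, amer, asoc) as (
   select 'Cider', 5000, cast(null as number), 7500 from dual
)
select category, region, sales
  from sales1_nulls
unpivot include nulls (   -- the default is EXCLUDE NULLS
   sales
   for region in (
      emea as 'EMEA'
    , amer as 'AMER'
    , asoc as 'ASOC'
   )
)
 order by category, region;
```

With INCLUDE NULLS the AMER row is returned with a NULL in SALES; if the keyword is dropped, that row should disappear from the result.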
Single dimension and measure
select category
     , case n#
          when 1 then 'EMEA'
          when 2 then 'AMER'
          when 3 then 'ASOC'
       end region
     , case n#
          when 1 then emea
          when 2 then amer
          when 3 then asoc
       end sales
  from sales1
 cross join (
    select level n# from dual connect by level <= 3
 )
 order by category, region;
Generate 3 rows - Cartesian join - CASE logic for dimension and measure
CATEGORY   REGI      SALES
---------- ---- ----------
Beer       AMER     150000
Beer       ASOC     225000
Beer       EMEA     200000
Wine       AMER      25000
Wine       ASOC      17500
Wine       EMEA      10000
Single dimension and measure
with r (region) as (
   select 'EMEA' from dual union all
   select 'AMER' from dual union all
   select 'ASOC' from dual
)
select category, region
     , case region
          when 'EMEA' then emea
          when 'AMER' then amer
          when 'ASOC' then asoc
       end sales
  from sales1
 cross join r
 order by category, region;
Generate 3 rows with dimension - Cartesian join - CASE logic for measure
CATEGORY   REGI      SALES
---------- ---- ----------
Beer       AMER     150000
Beer       ASOC     225000
Beer       EMEA     200000
Wine       AMER      25000
Wine       ASOC      17500
Wine       EMEA      10000
Multiple dimensions and measures
create table sales2 (
   category       varchar2(10)
 , dk_b2b_qty     number
 , dk_b2b_amount  number
 , dk_b2c_qty     number
 , dk_b2c_amount  number
 , uk_b2b_qty     number
 , uk_b2b_amount  number
 , uk_b2c_qty     number
 , uk_b2c_amount  number
);
insert into sales2 values ('Beer', 500, 5000, 250, 2500, 100, 1000, 200, 2000);
insert into sales2 values ('Wine', 150, 3000, 200, 4000, 400, 8000, 300, 6000);
Table of beverage sales with qty and amount columns per country and channel
Multiple dimensions and measures
select category, country, channel, qty, amount
  from sales2
unpivot (
   ( qty, amount )
   for ( country, channel ) in (
      (dk_b2b_qty, dk_b2b_amount) as ('DK', 'B2B')
    , (dk_b2c_qty, dk_b2c_amount) as ('DK', 'B2C')
    , (uk_b2b_qty, uk_b2b_amount) as ('UK', 'B2B')
    , (uk_b2c_qty, uk_b2c_amount) as ('UK', 'B2C')
   )
)
 order by category, country, channel;
UNPIVOT creates dimensions COUNTRY, CHANNEL and measures QTY, AMOUNT
CATEGORY   CO CHA   QTY  AMOUNT
---------- -- --- ----- -------
Beer       DK B2B   500    5000
Beer       DK B2C   250    2500
Beer       UK B2B   100    1000
Beer       UK B2C   200    2000
Wine       DK B2B   150    3000
Wine       DK B2C   200    4000
Wine       UK B2B   400    8000
Wine       UK B2C   300    6000
Single dimension and multiple measures
select category, country_and_channel, qty, amount
  from sales2
unpivot (
   ( qty, amount )
   for ( country_and_channel ) in (
      (dk_b2b_qty, dk_b2b_amount) as ('DK_B2B')
    , (dk_b2c_qty, dk_b2c_amount) as ('DK_B2C')
    , (uk_b2b_qty, uk_b2b_amount) as ('UK_B2B')
    , (uk_b2c_qty, uk_b2c_amount) as ('UK_B2C')
   )
)
 order by category, country_and_channel;
UNPIVOT creates dimension COUNTRY_AND_CHANNEL and measures QTY, AMOUNT
CATEGORY   COUNTR   QTY  AMOUNT
---------- ------ ----- -------
Beer       DK_B2B   500    5000
Beer       DK_B2C   250    2500
Beer       UK_B2B   100    1000
Beer       UK_B2C   200    2000
Wine       DK_B2B   150    3000
Wine       DK_B2C   200    4000
Wine       UK_B2B   400    8000
Wine       UK_B2C   300    6000
Multiple dimensions and single measure
select category, country, channel, amount
  from sales2
unpivot (
   ( amount )
   for ( country, channel ) in (
      (dk_b2b_amount) as ('DK', 'B2B')
    , (dk_b2c_amount) as ('DK', 'B2C')
    , (uk_b2b_amount) as ('UK', 'B2B')
    , (uk_b2c_amount) as ('UK', 'B2C')
   )
)
 order by category, country, channel;
UNPIVOT creates dimensions COUNTRY, CHANNEL and measure AMOUNT
CATEGORY   CO CHA     AMOUNT
---------- -- --- ----------
Beer       DK B2B       5000
Beer       DK B2C       2500
Beer       UK B2B       1000
Beer       UK B2C       2000
Wine       DK B2B       3000
Wine       DK B2C       4000
Wine       UK B2B       8000
Wine       UK B2C       6000
Shake
Single dimension and measure
create table sales3 (
   category  varchar2(10)
 , region    varchar2(10)
 , sales     number
);
insert into sales3 values ('Beer', 'EMEA', 200000);
insert into sales3 values ('Beer', 'AMER', 150000);
insert into sales3 values ('Beer', 'ASOC', 225000);
insert into sales3 values ('Wine', 'EMEA', 10000);
insert into sales3 values ('Wine', 'AMER', 25000);
insert into sales3 values ('Wine', 'ASOC', 17500);
Table of beverage sales per region
Single dimension and measure
select category, emea, amer, asoc
  from sales3
 pivot (
    sum(sales)
    for region in (
       'EMEA' as emea
     , 'AMER' as amer
     , 'ASOC' as asoc
    )
 )
 order by category;
PIVOT creates 3 columns for 3 dimension values and 1 measure
CATEGORY      EMEA    AMER    ASOC
---------- ------- ------- -------
Beer        200000  150000  225000
Wine         10000   25000   17500
Single dimension and measure
select category
     , sum(case region when 'EMEA' then sales end) as emea
     , sum(case region when 'AMER' then sales end) as amer
     , sum(case region when 'ASOC' then sales end) as asoc
  from sales3
 group by category
 order by category;
GROUP BY using a CASE expression within SUM for each of the 3 dimension values
CATEGORY      EMEA    AMER    ASOC
---------- ------- ------- -------
Beer        200000  150000  225000
Wine         10000   25000   17500
Single dimension and measure
insert into sales3 values ('Beer', 'AMER', 25000);
commit;
select category, emea, amer, asoc
  from sales3
 pivot (
    sum(sales)
    for region in (
       'EMEA' as emea
     , 'AMER' as amer
     , 'ASOC' as asoc
    )
 )
 order by category;
The aggregate is applied when dimension values are not unique
CATEGORY      EMEA    AMER    ASOC
---------- ------- ------- -------
Beer        200000  175000  225000
Wine         10000   25000   17500
Single dimension and multiple measures
select *
  from sales3
 pivot (
    sum(sales)
  , count(*)
    for region in (
       'EMEA' as emea
     , 'AMER' as amer
     , 'ASOC' as asoc
    )
 )
 order by category;
Columns are named <dim>_<measure>, so without measure aliases both measures produce the same column names
ERROR at line 1:
ORA-00918: column ambiguously defined
Single dimension and multiple measures
select category, emea_sale, emea_cnt, amer_sale, amer_cnt, asoc_sale, asoc_cnt
  from sales3
 pivot (
    sum(sales) as sale
  , count(*) as cnt
    for region in (
       'EMEA' as emea
     , 'AMER' as amer
     , 'ASOC' as asoc
    )
 )
 order by category;
With measure aliases we get 3x2 columns named by the <dim>_<measure> combinations
CATEGORY EMEA_SALE EMEA_CNT AMER_SALE AMER_CNT ASOC_SALE ASOC_CNT
-------- --------- -------- --------- -------- --------- --------
Beer        200000        1    175000        2    225000        1
Wine         10000        1     25000        1     17500        1
Multiple dimensions and measures
create table sales4 (
   category  varchar2(10)
 , country   varchar2(10)
 , channel   varchar2(10)
 , qty       number
 , amount    number
);
insert into sales4 values('Beer', 'DK', 'B2B', 500, 5000);
insert into sales4 values('Beer', 'DK', 'B2C', 250, 2500);
insert into sales4 values('Beer', 'UK', 'B2B', 100, 1000);
insert into sales4 values('Beer', 'UK', 'B2C', 200, 2000);
insert into sales4 values('Wine', 'DK', 'B2B', 150, 3000);
insert into sales4 values('Wine', 'DK', 'B2C', 200, 4000);
insert into sales4 values('Wine', 'UK', 'B2B', 400, 8000);
insert into sales4 values('Wine', 'UK', 'B2C', 300, 6000);
Table of beverage sales measured in qty and amount per country and channel
Multiple dimensions and measures
select category, dk_b2b_qty, dk_b2b_amount, dk_b2c_qty, dk_b2c_amount
     , uk_b2b_qty, uk_b2b_amount, uk_b2c_qty, uk_b2c_amount
  from sales4
 pivot (
    sum(qty) as qty
  , sum(amount) as amount
    for ( country, channel ) in (
       ('DK', 'B2B') as dk_b2b
     , ('DK', 'B2C') as dk_b2c
     , ('UK', 'B2B') as uk_b2b
     , ('UK', 'B2C') as uk_b2c
    )
 )
 order by category;
With dimension and measure aliases we get (2x2)x2 columns
CATEGORY   DK_B2B_QTY DK_B2B_AMOUNT DK_B2C_QTY DK_B2C_AMOUNT UK_B2B_QTY UK_B2B_AMOUNT UK_B2C_QTY UK_B2C_AMOUNT
---------- ---------- ------------- ---------- ------------- ---------- ------------- ---------- -------------
Beer              500          5000        250          2500        100          1000        200          2000
Wine              150          3000        200          4000        400          8000        300          6000
Rattle
Delimited data to columns
create table sales5 (
   txt varchar2(100)
);
insert into sales5 values ('Beer;200000;150000;225000');
insert into sales5 values ('Wine;10000;25000;17500');
Table of beverage sales as semicolon-separated text
Delimited data to columns
select substr(txt, 1, instr(txt,';') - 1) category
     , substr(
          txt, instr(txt,';') + 1, instr(txt,';',1,2) - instr(txt,';') - 1
       ) emea
     , substr(
          txt, instr(txt,';',1,2) + 1, instr(txt,';',1,3) - instr(txt,';',1,2) - 1
       ) amer
     , substr(txt, instr(txt,';',1,3) + 1) asoc
  from sales5
 order by category;
Using SUBSTR and INSTR
CATEGORY EMEA   AMER   ASOC
-------- ------ ------ ------
Beer     200000 150000 225000
Wine     10000  25000  17500
Delimited data to columns
select regexp_substr(txt, '[^;]+', 1, 1) category
     , regexp_substr(txt, '[^;]+', 1, 2) emea
     , regexp_substr(txt, '[^;]+', 1, 3) amer
     , regexp_substr(txt, '[^;]+', 1, 4) asoc
  from sales5
 order by category;
Using REGEXP_SUBSTR
CATEGORY EMEA   AMER   ASOC
-------- ------ ------ ------
Beer     200000 150000 225000
Wine     10000  25000  17500
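The pattern '[^;]+' only matches non-empty runs of characters, so an empty field (two delimiters in a row) is skipped and the later values shift one position to the left. A common workaround, sketched here as an assumption rather than as part of the presentation script, is to match "anything up to the next delimiter or end of string" and return only the first subexpression. The inline data set sales5_gap and its 'Cider' row are made up for illustration; the subexpression argument of REGEXP_SUBSTR assumes 11g or later.

```sql
-- Hypothetical data: the AMER field (third position) is empty
with sales5_gap (txt) as (
   select 'Cider;5000;;7500' from dual
)
select regexp_substr(txt, '[^;]+', 1, 3)               as amer_skipping
       -- Subexpression 1 is the field text without its trailing delimiter
     , regexp_substr(txt, '(.*?)(;|$)', 1, 3, null, 1) as amer_positional
  from sales5_gap;
```

Here the first expression picks up the ASOC value 7500 as if it were AMER, while the positional variant should return NULL for the empty third field.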
Delimited data to rows
create table beverages1 (
   category  varchar2(10)
 , typelist  varchar2(100)
);
insert into beverages1 values ('Beer', 'Pilsner;Ale;Stout');
insert into beverages1 values ('Wine', 'Red;Champagne');
Table of beverage types as semicolon-separated text
Delimited data to rows
create type beverage_collection_type as table of varchar2(10);
/
create or replace function beverage_typelist_to_coll (
   typelist in beverages1.typelist%type
) return beverage_collection_type pipelined
is
   list_len pls_integer;
   from_pos pls_integer;
   to_pos   pls_integer;
begin
   -- A NULL list has no elements; without this guard the loop below never exits
   if typelist is null then
      return;
   end if;
   list_len := length(typelist);
   from_pos := 1;
   loop
      -- Position of the next delimiter, or one past the end when none is left
      to_pos := nvl(nullif(instr(typelist, ';', from_pos), 0), list_len+1);
      pipe row (substr(typelist, from_pos, to_pos-from_pos));
      exit when to_pos > list_len;
      from_pos := to_pos + 1;
   end loop;
   return;
end beverage_typelist_to_coll;
/
Collection type and pipelined function to parse string and pipe out collection
Delimited data to rows
select category
     , column_value as beverage_type
  from beverages1
     , table(beverage_typelist_to_coll(typelist))
 order by category, beverage_type;
Use pipelined table function within TABLE
CATEGORY BEVERAGE_T
-------- ----------
Beer     Ale
Beer     Pilsner
Beer     Stout
Wine     Champagne
Wine     Red
Delimited data to rows
select category
     , regexp_substr(typelist, '[^;]+', 1, sub#) beverage_type
  from beverages1
 cross join lateral (
    select level sub#
      from dual
    connect by level <= regexp_count(typelist, ';') + 1
 )
 order by category, beverage_type;
Generate count of delimiters + 1 rows per category (note: LATERAL requires 12c)
CATEGORY BEVERAGE_T
-------- ----------
Beer     Ale
Beer     Pilsner
Beer     Stout
Wine     Champagne
Wine     Red
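On versions before 12c, where LATERAL is not available, a similar per-row generator can be written with a correlated MULTISET cast to a collection. This is a sketch of a commonly used alternative, not something taken from the presentation script; it assumes the built-in collection type SYS.ODCINUMBERLIST and the table aliases b and s, which are chosen here for illustration.

```sql
select b.category
     , regexp_substr(b.typelist, '[^;]+', 1, s.column_value) as beverage_type
  from beverages1 b
     , table(
          cast(
             multiset(
                -- One row per list element: count of delimiters + 1
                select level
                  from dual
                connect by level <= regexp_count(b.typelist, ';') + 1
             ) as sys.odcinumberlist
          )
       ) s
 order by b.category, beverage_type;
```

The TABLE expression may refer to columns of tables listed to its left in the FROM clause, which is what lets the row generator depend on each row's typelist.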
Delimited/structured data to rows and columns
create table beverages2 (
   category  varchar2(10)
 , typelist  varchar2(100)
);
insert into beverages2 values ('Beer', 'Pilsner|Light;Ale|Medium;Stout|Dark');
insert into beverages2 values ('Wine', 'Red|Red;Champagne|Clear');
Table of beverage types and colors as semicolon- and pipe-separated text
Delimited/structured data to rows and columns
create or replace type delimited_col_row as object (
   {globals}
 , static function parser( {params} )
      return anydataset pipelined using delimited_col_row
 , static function odcitabledescribe( {params} ) return number
 , static function odcitableprepare( {params} ) return number
 , static function odcitablestart( {params} ) return number
 , member function odcitablefetch( {params} ) return number
 , member function odcitableclose( {params} ) return number
)
/
create or replace type body delimited_col_row as
   {implementation}
end;
/
Object type implementing ODCI functions (complete code in script: http://bit.ly/kibeha_datatwist_sql)
Delimited/structured data to rows and columns
select category, beverage_type, color
  from beverages2
     , table(
          delimited_col_row.parser(
             typelist
           , 'BEVERAGE_TYPE|VARCHAR2(10);COLOR|VARCHAR2(10)'
           , '|'
           , ';'
          )
       ) type_and_color
 order by category, beverage_type;
Use ODCI parser function within TABLE - column definition string must be a literal
CATEGORY BEVERAGE_T COLOR
-------- ---------- ----------
Beer     Ale        Medium
Beer     Pilsner    Light
Beer     Stout      Dark
Wine     Champagne  Clear
Wine     Red        Red
Roll
Rows to delimited data
create table beverages3 (
   category       varchar2(10)
 , beverage_type  varchar2(10)
);
insert into beverages3 values ('Beer', 'Pilsner');
insert into beverages3 values ('Beer', 'Ale');
insert into beverages3 values ('Beer', 'Stout');
insert into beverages3 values ('Wine', 'Red');
insert into beverages3 values ('Wine', 'Champagne');
Table of beverage types per category
Rows to delimited data
select category
     , listagg(beverage_type, ';') within group (
          order by beverage_type
       ) typelist
  from beverages3
 group by category
 order by category;
LISTAGG built-in aggregate function (available from 11.2)
CATEGORY TYPELIST
-------- --------------------
Beer     Ale;Pilsner;Stout
Wine     Champagne;Red
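LISTAGG returns a VARCHAR2, so aggregating many or long values can exceed the maximum string length and raise ORA-01489. Releases from 12.2 onwards (newer than the versions this presentation was written for) add an ON OVERFLOW clause. The following is a minimal sketch of that clause against the beverages3 table; with data this small no truncation takes place, so it only illustrates the syntax.

```sql
select category
     , listagg(beverage_type, ';' on overflow truncate '...' with count)
          within group (order by beverage_type) as typelist
  from beverages3
 group by category
 order by category;
```

When the result would be too long, the string is cut at a whole value and ends with the '...' indicator followed by the number of omitted values; the default behaviour, ON OVERFLOW ERROR, raises the error instead.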
Rows to delimited data
-- Skip this CREATE TYPE if the type already exists from the Rattle section
create type beverage_collection_type as table of varchar2(10);
/
create or replace function beverage_typecoll_to_string (
   typecoll in beverage_collection_type
) return varchar2
is
   type_string varchar2(4000);
begin
   -- 1 .. count also copes with an empty collection (the loop body is skipped)
   for idx in 1 .. typecoll.count loop
      if idx = 1 then
         type_string := typecoll(idx);
      else
         type_string := type_string || ';' || typecoll(idx);
      end if;
   end loop;
   return type_string;
end beverage_typecoll_to_string;
/
Create collection type and a function to turn collection into delimited string
Rows to delimited data
select category , beverage_typecoll_to_string( cast( collect( beverage_type order by beverage_type ) as beverage_collection_type ) ) typelist from beverages3 group by category order by category;
Use COLLECT to aggregate into collection, then call function to create string
CATEGORY TYPELIST-------- --------------------Beer Ale;Pilsner;StoutWine Champagne;Red
Rows to delimited data
create or replace type string_agg_type as object (
   total varchar2(4000)
 , static function ODCIAggregateInitialize( {params} ) return number
 , member function ODCIAggregateIterate( {params} ) return number
 , member function ODCIAggregateTerminate( {params} ) return number
 , member function ODCIAggregateMerge( {params} ) return number
);
/
create or replace type body string_agg_type
   {implementation}
end;
/
create or replace function stragg( input varchar2 ) return varchar2
   parallel_enable aggregate using string_agg_type;
/
Tom Kyte STRAGG function using ODCI implementation of user aggregate function
https://asktom.oracle.com/pls/apex/f?p=100:11:0::::P11_QUESTION_ID:2196162600402
Rows to delimited data
select category
     , stragg(beverage_type) typelist
  from beverages3
 group by category
 order by category;
Use STRAGG like any aggregate - note that, unlike LISTAGG, it cannot ORDER BY
CATEGORY TYPELIST
-------- --------------------
Beer     Pilsner;Stout;Ale
Wine     Red;Champagne
Coda
We Can Boogie
Twist Columns to Rows
– UNPIVOT or dummy row generators
Shake Rows to Columns
– PIVOT or GROUP BY with CASE
Rattle Delimited Data to Columns or Rows
– Parse delimited data
Roll Rows to Delimited Data
– LISTAGG or other string aggregation techniques
Boogie!
Links
This presentation PowerPoint http://bit.ly/kibeha_datatwist_pptx
Script with all examples from this presentation http://bit.ly/kibeha_datatwist_sql
Questions & Answers
Kim Berg Hansen
Senior Consultant