Data Warehouses and Analytical Data Processing in CERN’s Administrative Decision Making Support Systems
Jan Janke
Software Engineer, CERN / GS-AIS
October 25 - 29, 2010
JINR/CERN Grid and Management Information Systems
Jan Janke: "Data Warehouses and Analytical Data Processing ..." 2
Agenda
• Data Warehouses in Administrative Computing
• Recap: Data Warehouses Theory
• Data Warehouses and Information Systems in AIS
  ◦ Foundation, HR and FI Information Systems
  ◦ Complex Data Extraction Processes
  ◦ Pixel-Perfect Reporting
  ◦ Dashboards
• Detailed Data Warehouse Example
  ◦ Management Data Layer (MDL)
Administrative Computing
• Provides means to administrate CERN
• Enables physicists to focus on their work
• Allows management to make the right moves
Why Data Warehouses?
• Heterogeneous computing landscape
• Various specialised OLTP systems
• Planning needs
• Legal requirements

• Support administrative staff
• Enforce security and safety on site
• Allow management to make decisions
Example: Keep Finances Under Control
• Specialised Systems
  ◦ Accounting, ERP for CERN stores
  ◦ External contracts management
  ◦ Payroll, treasury management, …

Specialised small user groups
Distinct databases
High availability and performance, real-time data
Systems only accessible to authorised specialists
Example: Keep Finances Under Control
• General Financial Information System
  ◦ Single system
  ◦ Access to data from multiple sources
  ◦ Different levels of complexity

Users from all areas of CERN
Single data warehouse
High availability and performance, but no necessity for real-time data
Security is extremely important! System is accessible CERN wide.
AIS’ Financial Data Warehouse
• Keep data in sync with data providers
• Master complex data extraction process
• Ensure high query performance
• Base for detailed data analysis

Technologies:
  ◦ ORACLE RAC database
  ◦ Java Enterprise web applications
  ◦ In-house developed frameworks
  ◦ Third-party BI and reporting tools
OLTP vs OLAP

                 | OLTP              | OLAP
Data source      | Operations        | OLTP (consolidated)
Data purpose     | Run the business  | Reporting, analysis
Inserts, updates | High              | Periodic batch jobs
Query complexity | Low               | High
DB design        | Normalized        | Star, snowflake
Availability     | Critical          | Less critical
Target           | Operational staff | Middle/higher Mgmt.
OLTP vs OLAP
That’s theory! Real world is not that easy…

                 | OLTP              | OLAP
Data source      | Operations        | OLTP (consolidated)
Data purpose     | Run the business  | Reporting, analysis
Inserts, updates | High              | Periodic batch jobs
Query complexity | Low               | Depends …
DB design        | Normalized        | Snowflake and others
Availability     | Critical          | May be very critical
Target           | Operational staff | Mgmt. + Operations
Normalisation (Codd/Boyce)
• 1NF
  ◦ 1 table = 1 relation, no repeating groups or duplicate rows
• 2NF
  ◦ All non-prime attributes depend on all parts (attributes) of a composite key
• 3NF
  ◦ All non-prime attributes depend only on the (whole) key

Course     | Category  | Winner    | Origin
Monaco ‘10 | Formula 1 | M. Webber | Australia
Japan ‘10  | Formula 1 | S. Vettel | Germany
Japan ‘10  | Rally     | S. Ogier  | France

Not in 3NF, why?
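One plausible answer to the question above, sketched in Python: the key of the example table is (Course, Category), while Origin depends on Winner rather than directly on that key, which is a transitive dependency. The names `results` and `split_3nf` are illustrative assumptions and do not come from the talk.

```python
# Rows from the slide: (course, category, winner, origin).
results = [
    ("Monaco '10", "Formula 1", "M. Webber", "Australia"),
    ("Japan '10", "Formula 1", "S. Vettel", "Germany"),
    ("Japan '10", "Rally", "S. Ogier", "France"),
]


def split_3nf(rows):
    """Decompose the table so that Origin is stored once per winner."""
    # (course, category) -> winner: depends on the whole composite key.
    winners = {(course, category): winner for course, category, winner, _ in rows}
    # winner -> origin: the transitive dependency moved into its own relation.
    origins = {winner: origin for _, _, winner, origin in rows}
    return winners, origins


winners, origins = split_3nf(results)

# Joining the two relations reproduces the original rows, so nothing is lost.
rejoined = [(c, k, w, origins[w]) for (c, k), w in winners.items()]
print(sorted(rejoined) == sorted(results))
```

After the split, a change to a winner's origin touches exactly one row in `origins`, however many races that person has won.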
Star Schema
Source: http://www.executionmih.com/data-warehouse/star-snowflake-schema.php (16/10/2010)

Sales Fact Table
  Keys: time_key, item_key, branch_key, location_key
  Measures: units_sold, dollars_sold, avg_sales
Dimension tables
  time: time_key, day, day_of_the_week, month, quarter, year
  item: item_key, item_name, brand, type, supplier_type
  Branch: branch_key, branch_name, branch_type
  location: location_key, street, city, state_or_province, country
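A minimal sketch of how such a star schema is queried, using Python's built-in `sqlite3` instead of ORACLE so that it runs anywhere. Only two of the four dimensions are created, the time dimension is named `time_dim` here, and all sample rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE time_dim (time_key INTEGER PRIMARY KEY, day INTEGER,
                       month INTEGER, quarter INTEGER, year INTEGER);
CREATE TABLE item (item_key INTEGER PRIMARY KEY, item_name TEXT,
                   brand TEXT, type TEXT, supplier_type TEXT);
-- The fact table holds only dimension keys and measures.
CREATE TABLE sales_fact (
    time_key INTEGER REFERENCES time_dim(time_key),
    item_key INTEGER REFERENCES item(item_key),
    units_sold INTEGER,
    dollars_sold REAL
);
INSERT INTO time_dim VALUES (1, 25, 10, 4, 2010), (2, 26, 10, 4, 2010);
INSERT INTO item VALUES (1, 'Pen', 'Acme', 'office', 'wholesale'),
                        (2, 'Ink', 'Acme', 'office', 'retail');
INSERT INTO sales_fact VALUES (1, 1, 10, 15.0), (2, 1, 5, 7.5), (2, 2, 2, 8.0);
""")

# Every dimension is exactly one join away from the fact table.
rows = conn.execute("""
    SELECT t.year, i.brand, SUM(f.units_sold), SUM(f.dollars_sold)
    FROM sales_fact f
    JOIN time_dim t ON t.time_key = f.time_key
    JOIN item i ON i.item_key = f.item_key
    GROUP BY t.year, i.brand
""").fetchall()
print(rows)
```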
Snowflake Schema
Source: http://www.executionmih.com/data-warehouse/star-snowflake-schema.php (16/10/2010)

Sales Fact Table
  Keys: time_key, item_key, branch_key, location_key
  Measures: units_sold, dollars_sold, avg_sales
Dimension tables
  time: time_key, day, day_of_the_week, month, quarter, year
  item: item_key, item_name, brand, type, supplier_key
    supplier: supplier_key, supplier_type
  Branch: branch_key, branch_name, branch_type
  location: location_key, street, city_key
    city: city_key, city, state_or_province, country
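The practical difference to the star schema is the extra join through the normalised dimension. The sketch below illustrates this for the location and city tables, again with `sqlite3` and made-up sample rows; the other dimensions are left out to keep it short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE city (city_key INTEGER PRIMARY KEY, city TEXT,
                   state_or_province TEXT, country TEXT);
CREATE TABLE location (location_key INTEGER PRIMARY KEY, street TEXT,
                       city_key INTEGER REFERENCES city(city_key));
CREATE TABLE sales_fact (location_key INTEGER REFERENCES location(location_key),
                         units_sold INTEGER);
INSERT INTO city VALUES (1, 'Geneva', 'GE', 'Switzerland'),
                        (2, 'Dubna', 'Moscow Oblast', 'Russia');
INSERT INTO location VALUES (10, 'Street A', 1), (11, 'Street B', 1),
                            (12, 'Street C', 2);
INSERT INTO sales_fact VALUES (10, 3), (11, 4), (12, 5);
""")

# Country now lives two joins away from the fact table: fact -> location -> city.
units_by_country = dict(conn.execute("""
    SELECT c.country, SUM(f.units_sold)
    FROM sales_fact f
    JOIN location l ON l.location_key = f.location_key
    JOIN city c ON c.city_key = l.city_key
    GROUP BY c.country
""").fetchall())
print(units_by_country)
```

City attributes are stored once per city instead of once per location, at the price of one more join per query.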
From Operations to Reporting
Source: http://www.deakin.edu.au/ddw/what-is.php (16/10/2010)
[Figure: operational source systems ERP, FI, HR, …]
Analysis
• Data Mining
• Drilldown
  ◦ Finer detail granularity (e.g. add a group-by column)
• Slice & dice
  ◦ Play with the dimensions
    ▪ Combine different dimensions
    ▪ Remove/add a dimension
    ▪ Analyse fact changes
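The two operations can be illustrated with a few lines of plain Python. This is only a sketch: the fact rows, the column names and the helper `aggregate` are invented for the example.

```python
from collections import defaultdict

# Made-up fact rows: (year, unit, category, amount).
COLUMNS = ("year", "unit", "category", "amount")
facts = [
    (2009, "A", "travel", 100),
    (2009, "B", "travel", 50),
    (2010, "A", "travel", 70),
    (2010, "A", "material", 30),
    (2010, "B", "material", 20),
]


def aggregate(rows, group_by, where=None):
    """Sum `amount` per combination of the `group_by` dimensions."""
    where = where or {}
    totals = defaultdict(int)
    for row in rows:
        record = dict(zip(COLUMNS, row))
        # Slice: keep only rows that match the fixed dimension values.
        if any(record[dim] != value for dim, value in where.items()):
            continue
        totals[tuple(record[dim] for dim in group_by)] += record["amount"]
    return dict(totals)


by_year = aggregate(facts, ["year"])                     # coarse view
drilled = aggregate(facts, ["year", "unit"])             # drill-down: extra group-by column
sliced = aggregate(facts, ["category"], {"year": 2010})  # slice: fix one dimension, regroup
print(by_year, drilled, sliced)
```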
Foundation
• Common data layer for various AIS services
• Data interfaces for other CERN services
• Common applications (e.g. mgmt. of roles)
HR Information System (HRT)
FI Information System (CET)
… more domain specific information systems
Operative systems
Various Specialised Systems
• ORACLE HR
• CERN Training Application
• Safety & access systems
• EDH (Electronic Document Handling)
• Accounting Application
• ERP system for CERN stores
• Contract follow-up
• …
Technical Environment
• Source databases:
  ◦ ORACLE 10g
  ◦ Microsoft Excel
• HR/FI Information Systems:
  ◦ ORACLE 10g
  ◦ Java Enterprise web applications
  ◦ SAP Business Objects tool family
Data Extractions
• Nightly scheduled batch jobs
• Extractions organised in SQL scripts
• Run by self-developed “batch runner”
  ◦ Controls
    ▪ Order of execution (sequential, parallel)
    ▪ Criticality
    ▪ Logging
    ▪ Problem escalation (automatic emails)
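The slides do not show the batch runner itself, so the following is only a rough sketch of the four controls listed above, not the AIS implementation. The command format `(name, action, critical)`, the function names and the `escalate` callback (standing in for the automatic emails) are all assumptions.

```python
import logging
from concurrent.futures import ThreadPoolExecutor

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("batch_runner")


def run_one(command):
    """Run one (name, action, critical) command; report (name, error, critical)."""
    name, action, critical = command
    try:
        action()
        log.info("%s finished", name)
        return name, None, critical
    except Exception as exc:
        log.error("%s failed: %s", name, exc)
        return name, exc, critical


def run_batch(commands, parallel=False, escalate=print):
    """Run commands sequentially or in parallel; return the names that succeeded."""
    if parallel:
        with ThreadPoolExecutor() as pool:
            outcomes = list(pool.map(run_one, commands))
    else:
        outcomes = []
        for command in commands:
            outcomes.append(run_one(command))
            _, error, critical = outcomes[-1]
            if error is not None and critical:
                break  # a critical failure skips the remaining commands
    completed = []
    for name, error, _ in outcomes:
        if error is None:
            completed.append(name)
        else:
            escalate(f"{name} failed: {error}")  # stand-in for an automatic email
    return completed


def fail():
    raise RuntimeError("source database unreachable")


alerts = []
done = run_batch(
    [("extract_hr", lambda: None, False),
     ("extract_fi", fail, True),
     ("build_summaries", lambda: None, False)],
    escalate=alerts.append,
)
print(done, alerts)
```

In this sketch criticality only affects sequential batches; in parallel mode every command is started and failures are merely escalated.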
Definition of Extraction Process (1)
General definitions
Definition of Extraction Process (2)
Batches & commands
Importance of Monitoring
New hardware for DEV databases (gain > 1h)
Turtle or Leopard ?
The difference may be subtle …
ORACLE Materialised Views
• Pre-aggregated summaries
• Benefit from query rewrite
Source: ORACLE 10g Documentation / Data Warehousing Guide
Materialised (Summary) Views
• Don’t use remote tables if you need query rewrite
• Create materialized view log on all source tables
Snapshots
• Use snapshots to efficiently access remote tables
  ◦ Syntax: CREATE SNAPSHOT … AS [Your Query]
  ◦ Refresh options:
    ▪ FAST
    ▪ COMPLETE
    ▪ FORCE
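In ORACLE, a FAST refresh applies only the changes recorded in the log of the source table, COMPLETE re-runs the defining query, and FORCE tries FAST and falls back to COMPLETE. The toy model below illustrates that difference in plain Python; the dictionaries, the `write` and `refresh` helpers and the omission of deletes are simplifying assumptions, not ORACLE behaviour in detail.

```python
source = {1: 10, 2: 20}   # stand-in for a remote table: key -> value
change_log = []           # stand-in for the change log on the source table
snapshot = {}             # local copy that queries read from


def write(key, value):
    """Change the source and record the key for a later fast refresh."""
    source[key] = value
    change_log.append(key)


def refresh(mode, log_available=True):
    """Refresh the snapshot and return the number of rows copied."""
    if mode == "FORCE":
        mode = "FAST" if log_available else "COMPLETE"
    if mode == "FAST":
        if not log_available:
            raise ValueError("a fast refresh needs a change log")
        keys = set(change_log)   # only rows changed since the last refresh
    elif mode == "COMPLETE":
        snapshot.clear()
        keys = set(source)       # re-read everything
    else:
        raise ValueError(f"unknown refresh mode: {mode}")
    for key in keys:
        snapshot[key] = source[key]
    change_log.clear()
    return len(keys)


first = refresh("COMPLETE")   # initial load copies every row
write(2, 25)
second = refresh("FAST")      # copies only the row that changed
print(first, second, snapshot)
```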
Pipelined Functions
• PL/SQL is data source instead of a table
• May increase performance in environments with heavy PL/SQL use

Step 1:
  CREATE OR REPLACE TYPE myTableFormat AS OBJECT
  ( col_a NUMBER, col_b DATE, col_c VARCHAR2(25) )
  /
  CREATE OR REPLACE TYPE myTableType AS TABLE OF myTableFormat
  /
Pipelined Functions
Step 2:
  CREATE OR REPLACE FUNCTION myFunc RETURN myTableType PIPELINED IS
  BEGIN
    FOR i IN 1 .. 5 LOOP
      PIPE ROW ( myTableFormat( i, SYSDATE+i, 'Row '||i ) );
    END LOOP;
    RETURN;
  END;
  /
Pipelined Functions
Step 3:
  SELECT * FROM TABLE( myFunc() );

  col_a  col_b       col_c
  -----  ----------  -----
  1      27/10/2010  Row 1
  2      28/10/2010  Row 2
  3      29/10/2010  Row 3
  4      30/10/2010  Row 4
  5      31/10/2010  Row 5
Use a pipelined function if you require a data source other than a table!
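For readers more at home outside PL/SQL, a Python generator is a close analogy: rows are handed to the consumer one at a time instead of being collected first. The sketch mirrors `myFunc` from the slides; the fixed start date stands in for SYSDATE and is inferred from the output shown above.

```python
from datetime import date, timedelta
from itertools import islice


def my_func(start=date(2010, 10, 26)):
    """Yield (col_a, col_b, col_c) rows one at a time, like PIPE ROW."""
    for i in range(1, 6):
        yield (i, start + timedelta(days=i), f"Row {i}")


rows = my_func()
first_two = list(islice(rows, 2))  # the consumer gets rows before the producer is done
remaining = list(rows)             # the generator continues where it stopped
print(first_two, len(remaining))
```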
Database Design
• Star-schema-like
• Highly de-normalised incl. duplication of data
• Use single-attribute keys wherever possible
• Performance matters!
  ◦ Be careful when extracting over database links
  ◦ Certain tables from operational systems are copied
  ◦ Deletion & recreation of indexes
  ◦ Use partitions
  ◦ Manual control of statistics collection
  ◦ Optimizing execution plans is very time-consuming
Reporting Application Framework
• Column and ordering selection
• Sub reports
• Various output formats (e.g. HTML, PDF)
• Charts
• Self-service reporting
• Automated scheduled report execution
• Row and column based access control
Data access

Name    | Unit | Tel   | Salary  | Category
Meyer   | A    | 12345 | $ 4,900 | 3
Schmidt | B    | 23456 | $ 6,400 | 1
Cook    | B    | 34567 | $ 5,700 | 2
Which rows are visible to me? Unit leader of B only sees persons from Unit B.
Which data (columns) am I allowed to see? As a supervisor I may not be entitled to see the health insurance category. A safety or medical officer may not see the salary, etc.
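The two questions translate into a row filter and a column filter. The sketch below applies both to the example table; the role names and the `VISIBLE_COLUMNS` mapping are hypothetical and only illustrate the idea, they are not the framework's actual rule set.

```python
people = [
    {"name": "Meyer", "unit": "A", "tel": "12345", "salary": 4900, "category": 3},
    {"name": "Schmidt", "unit": "B", "tel": "23456", "salary": 6400, "category": 1},
    {"name": "Cook", "unit": "B", "tel": "34567", "salary": 5700, "category": 2},
]

# Hypothetical roles and the columns each of them may see.
VISIBLE_COLUMNS = {
    "unit_leader": {"name", "unit", "tel", "salary"},        # no insurance category
    "medical_officer": {"name", "unit", "tel", "category"},  # no salary
}


def visible_data(rows, role, unit=None):
    """Apply the row rule (own unit only) and the column rule (per role)."""
    columns = VISIBLE_COLUMNS[role]
    return [
        {key: value for key, value in row.items() if key in columns}
        for row in rows
        if unit is None or row["unit"] == unit
    ]


leader_view = visible_data(people, "unit_leader", unit="B")
medical_view = visible_data(people, "medical_officer")
print(leader_view)
print(medical_view)
```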
Pixel Perfect Forms
• Use of Apache FOP library
  ◦ Examples:
    ▪ Employment & training attestations
    ▪ Swiss / French card application forms
• Business Objects XI Enterprise
  ◦ Direct use
  ◦ Indirect use via Business Objects Java SDK
  ◦ Examples:
    ▪ Salary slips
    ▪ Car stickers
    ▪ Work orders
Business Objects
• Commercial tool family from SAP
• Advantages
  ◦ Rich reporting possibilities (interactive or via SDK)
  ◦ Appealing dashboards using Xcelsius
  ◦ Only a few users need the knowledge to design reports
• Drawbacks
  ◦ Two-way data storage (file system & database)
  ◦ Sometimes stability problems
  ◦ Time-intensive administration and maintenance
  ◦ Expensive
Management Dashboards
Designed locally using MS Office and Xcelsius.
Data comes from the MDL data warehouse.
Published as Flash to the BO Server.
Management Data Layer (MDL)
• KPI data warehouse
• Very extensible
• Fixed generic schema
• Feeds management dashboards
Performance: Currently ca. 170 GB data in two tables
Generality: Different forms of data sources, new sources are added and removed all the time.
Integration with existing tools and development frameworks (ORACLE, Excel, BO, …)
MDL Data Model
[Diagram: tables MDL_HEADERS, MDL_DIMENSIONS, MDL_VALUES, MDL_RAW_DATA, MDL_SUMMARY_DATA, MDL_LOOKUP_INFO and MDL_LOOKUP_DATA, connected by relationships marked "n"; two of the relationships are labelled "describes".]
Fact Table Partitioning
• Range Partitioning by year: …, 2008, 2009, 2010, …
  ◦ Hash Partitioning within each year: Data Set 1, Data Set 2, …, Data Set n
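How a row might be routed under such a composite scheme can be sketched as follows. This is an illustration only: the assumption that the hash key is the data set identifier, the number of hash partitions and the use of CRC32 as hash are all invented here; ORACLE applies its own internal hash function.

```python
import zlib

HASH_PARTITIONS = 4  # assumed number of hash sub-partitions per year


def partition_for(value_date, data_id):
    """Return (range partition, hash sub-partition) for a fact row.

    value_date is a number in YYYYMMDD form, data_id identifies the data set.
    """
    year = value_date // 10000  # range partition: one per year
    bucket = zlib.crc32(str(data_id).encode()) % HASH_PARTITIONS  # stable hash
    return year, bucket


def partitions_to_scan(years, min_date):
    """Partition pruning: keep only years that can contain value_date > min_date."""
    return [year for year in years if year * 10000 + 1231 > min_date]


print(partition_for(20100315, 45))
print(partitions_to_scan([2008, 2009, 2010], 20100000))
```

A filter on the date lets the database skip whole yearly partitions, while the hash level spreads the data sets evenly inside each year.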
Query optimisation
• Keep it simple
• Redesign / add data source if required
• Use partitions and indexes

  SELECT dimension1, dimension2, SUM(value2)
  FROM mdl_raw_data
  WHERE data_id = 45
    AND value_date > 20100000
  GROUP BY dimension1, dimension2
  ORDER BY 1, 2;
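The query can be tried against a small in-memory table. The sketch uses `sqlite3` rather than ORACLE, a reduced set of columns and made-up rows, and its select list uses the GROUP BY columns; the index name and the composite index itself are assumptions about what "use indexes" could mean here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE mdl_raw_data (
    data_id INTEGER, value_date INTEGER,
    dimension1 TEXT, dimension2 TEXT, value2 REAL
);
-- Composite index matching the WHERE clause (equality column first).
CREATE INDEX idx_raw_data_id_date ON mdl_raw_data (data_id, value_date);
INSERT INTO mdl_raw_data VALUES
    (45, 20091215, 'A', 'x', 1.0),
    (45, 20100115, 'A', 'x', 2.0),
    (45, 20100220, 'A', 'y', 3.0),
    (45, 20100305, 'A', 'x', 4.0),
    (46, 20100115, 'B', 'x', 9.0);
""")

QUERY = """
    SELECT dimension1, dimension2, SUM(value2)
    FROM mdl_raw_data
    WHERE data_id = 45 AND value_date > 20100000
    GROUP BY dimension1, dimension2
    ORDER BY 1, 2
"""
totals = conn.execute(QUERY).fetchall()
print(totals)

# Inspect the access path the engine chose (the wording depends on the SQLite version).
for step in conn.execute("EXPLAIN QUERY PLAN " + QUERY):
    print(step)
```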
Remember:
• High data volumes + analysis = data warehouse
• OLTP vs. OLAP
• Use the facilities the tool provides
  ◦ Materialized views, snapshots, pipelined functions
• Keep things extensible and simple!
• Partitions are very helpful
Thank You!