Business Intelligence
www.robertomarchetto.com
History
● The term "Business Intelligence" was first used in 1958 by Hans Peter Luhn, an IBM researcher
● Originally: an automatic method to provide current-awareness services to scientists and engineers
● Current definition: a combination of processes and technologies for gathering, storing, analyzing and providing access to information, to help enterprise users make informed decisions
Main concept
● Collect data from different sources
● Integrate and clean up data in a common, easy-to-analyze repository
● Provide business-related analysis for managers and decision makers
● Focus on business, data integration, data presentation
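The collect, integrate, clean and analyze steps above can be sketched in a few lines of Python. This is only an illustration: the two sources (`crm_rows`, `erp_rows`), their field names and the sample values are assumptions made up for the example, not part of any specific BI product.

```python
from collections import defaultdict

# Hypothetical extracts from two sources that use different field names.
crm_rows = [
    {"customer": " Acme ", "amount": "100.50"},
    {"customer": "Globex", "amount": "20"},
]
erp_rows = [
    {"cust_name": "ACME", "total": 49.5},
    {"cust_name": "globex", "total": None},  # missing value, dropped during clean-up
]

def normalize(name):
    """Clean a customer name so both sources share the same key."""
    return name.strip().lower()

def integrate(crm, erp):
    """Merge both sources into one common list of (customer, amount) records."""
    repository = []
    for row in crm:
        repository.append((normalize(row["customer"]), float(row["amount"])))
    for row in erp:
        if row["total"] is not None:
            repository.append((normalize(row["cust_name"]), float(row["total"])))
    return repository

def revenue_by_customer(repository):
    """Business-level analysis: total revenue per customer."""
    totals = defaultdict(float)
    for customer, amount in repository:
        totals[customer] += amount
    return dict(totals)

repository = integrate(crm_rows, erp_rows)
report = revenue_by_customer(repository)
print(report)
```

The common repository is what the analysis runs against; the original sources are never queried directly by the report.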
Data Warehouse
● Bill Inmon: a collection of data in support of the decision-making process
● End-user oriented
● Collected from different sources
● Time dependent
● Data is not editable
● In theory, the term refers to a group of processes
● In the real world, it is often used for the database itself
OLTP: On-Line Transaction Processing
● Commonly used in ERP, CRM systems and database applications
● Focus on the transaction level (one invoice, one sales order, a search query, etc.)
● Updates and insertions are frequent
● Relational model with many tables, using normalization rules
OLAP: On-Line Analytical Processing
● A system designed for analysis purposes
● Focused on exploring the data as a whole
● Once added, data changes far less frequently
● Dr. Codd's 12 rules for OLAP (1993)
● Multidimensional view
● Intuitive data manipulation
● Dimensions, Facts, Hierarchy levels, Cardinality
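To make dimensions, facts, hierarchy levels and cardinality more concrete, here is a minimal Python sketch. It assumes a single Date dimension with the levels year, month and day and a handful of invented sales facts; the names `facts`, `LEVELS`, `roll_up` and `cardinality` are hypothetical.

```python
from collections import defaultdict
from datetime import date

# Facts: one measure (sales amount) recorded against the Date dimension.
facts = [
    (date(2010, 1, 15), 10.0),
    (date(2010, 1, 20), 5.0),
    (date(2010, 2, 3), 7.5),
    (date(2011, 1, 9), 4.0),
]

# Hierarchy levels of the Date dimension, from coarse to fine.
LEVELS = {
    "year": lambda d: (d.year,),
    "month": lambda d: (d.year, d.month),
    "day": lambda d: (d.year, d.month, d.day),
}

def roll_up(facts, level):
    """Aggregate the measure at the requested hierarchy level."""
    key = LEVELS[level]
    totals = defaultdict(float)
    for day, amount in facts:
        totals[key(day)] += amount
    return dict(totals)

def cardinality(facts, level):
    """Number of distinct members present at a hierarchy level."""
    key = LEVELS[level]
    return len({key(day) for day, _ in facts})

by_year = roll_up(facts, "year")
by_month = roll_up(facts, "month")
print(by_year, by_month, cardinality(facts, "day"))
```

Moving from `by_month` to `by_year` is a roll-up; going the other way is a drill-down.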
Relational OLAP
● Uses relational database schemas and SQL to store and access OLAP cubes
● Reuse of RDBMS technology
● Many tools and vendors available
● SQL can be used directly by many tools
● Scalability
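A minimal sketch of the ROLAP idea, using Python's built-in `sqlite3` module: the cube is stored as ordinary relational tables (one fact table plus dimension tables) and queried with plain SQL. The table names, column names and sample rows are assumptions chosen for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical star schema: two dimension tables and one fact table.
conn.executescript("""
CREATE TABLE dim_store (store_id INTEGER PRIMARY KEY, country TEXT, state TEXT);
CREATE TABLE dim_date  (date_id  INTEGER PRIMARY KEY, year INTEGER);
CREATE TABLE fact_sales (store_id INTEGER, date_id INTEGER, store_sales REAL);

INSERT INTO dim_store VALUES (1, 'USA', 'CA'), (2, 'USA', 'WA');
INSERT INTO dim_date  VALUES (1, 2002), (2, 2003);
INSERT INTO fact_sales VALUES
    (1, 1, 100.0), (1, 2, 150.0), (2, 1, 70.0), (1, 1, 25.0);
""")

# Store sales per year for California: joins resolve the dimensions,
# GROUP BY performs the aggregation.
rows = conn.execute("""
SELECT d.year, SUM(f.store_sales)
FROM fact_sales f
JOIN dim_store s ON s.store_id = f.store_id
JOIN dim_date  d ON d.date_id  = f.date_id
WHERE s.country = 'USA' AND s.state = 'CA'
GROUP BY d.year
ORDER BY d.year
""").fetchall()

print(rows)
```

Aggregates are computed at query time here, which is why ROLAP leans on the scalability and tooling of the underlying RDBMS.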
Multidimensional OLAP (MOLAP), Hybrid OLAP (HOLAP)
● MOLAP uses optimized multidimensional arrays
● Requires pre-computation and storage of the cube (processing)
● Often performs better than ROLAP: better caching, multidimensional indexing
● Compression techniques, statistical indexes
● Less scalable than ROLAP on high volumes of data; fewer tools and vendors available
● Hybrid OLAP (HOLAP) is the combination of ROLAP and MOLAP
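The pre-computation step can be sketched as follows. As a simplification, the cube is held in a Python dictionary rather than a real optimized multidimensional array, and the sample facts, the `ALL` marker and the `build_cube` function are hypothetical names used only for this example.

```python
from itertools import product

# Hypothetical facts: (state, year, store sales).
facts = [
    ("CA", 2002, 100.0),
    ("CA", 2003, 150.0),
    ("WA", 2002, 70.0),
    ("CA", 2002, 25.0),
]

ALL = "*"  # marker for "aggregated over every member of this dimension"

def build_cube(facts):
    """Pre-compute every combination of detail and ALL for both dimensions."""
    cube = {}
    for state, year, sales in facts:
        for s, y in product((state, ALL), (year, ALL)):
            cube[(s, y)] = cube.get((s, y), 0.0) + sales
    return cube

# "Processing" the cube: done once, ahead of the queries.
cube = build_cube(facts)

# Queries are now plain lookups, with no aggregation at query time.
ca_2002 = cube[("CA", 2002)]
ca_total = cube[("CA", ALL)]
grand_total = cube[(ALL, ALL)]
print(ca_2002, ca_total, grand_total)
```

The trade-off is visible even at this scale: lookups are cheap, but the number of stored cells grows with every added dimension and member.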
Slowly Changing Dimensions
● In some Business Intelligence implementations, data is always added and almost never modified
● This makes it possible to go back along the timeline
● For example, if an employee was hired in a given time period, you can analyze the data as it was in that period and count the exact number of employees
● A common approach to implementing Slowly Changing Dimensions is to add special fields to the database records that give each record a time-related validity
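The validity-field approach can be illustrated with a short Python sketch. The field names `valid_from` and `valid_to`, the employee records and the helper functions are assumptions for this example; real implementations differ in naming and in how the open-ended current record is marked.

```python
from datetime import date

# Each change adds a new record instead of overwriting the old one.
# valid_to = None marks the record that is currently valid.
employees = [
    {"name": "Anna", "department": "Sales",
     "valid_from": date(2009, 1, 1), "valid_to": date(2010, 6, 30)},
    {"name": "Anna", "department": "Marketing",
     "valid_from": date(2010, 7, 1), "valid_to": None},
    {"name": "Marco", "department": "Sales",
     "valid_from": date(2010, 3, 1), "valid_to": None},
]

def valid_on(record, day):
    """True if the record was the valid version on the given day."""
    ends = record["valid_to"]
    return record["valid_from"] <= day and (ends is None or day <= ends)

def headcount(records, department, day):
    """Count employees of a department as of a given day."""
    return sum(
        1 for r in records
        if r["department"] == department and valid_on(r, day)
    )

sales_april_2010 = headcount(employees, "Sales", date(2010, 4, 1))
sales_august_2010 = headcount(employees, "Sales", date(2010, 8, 1))
print(sales_april_2010, sales_august_2010)
```

Because Anna's move to Marketing was stored as a new record, the April 2010 headcount for Sales still includes her.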
MDX
● Multidimensional Expressions (MDX) is a query language for OLAP databases
● MDX is to OLAP databases what SQL is to OLTP databases
● Powerful for computing indexes and navigating through OLAP dimensions
● SELECT {[Measures].[Store Sales]} ON COLUMNS, {[Date].[2002], [Date].[2003]} ON ROWS FROM Sales WHERE ([Store].[USA].[CA])
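To show what the three parts of the MDX query above ask for, here is a rough Python sketch that mimics it: the COLUMNS axis selects measures, the ROWS axis selects years, and the WHERE clause acts as a slicer on the Store dimension. This is not how an MDX engine works internally; the `cells` data and the `query` function are invented for illustration.

```python
# Hypothetical cube cells keyed by (store path, year); each cell holds measures.
cells = {
    (("USA", "CA"), 2002): {"Store Sales": 125.0, "Unit Sales": 40},
    (("USA", "CA"), 2003): {"Store Sales": 150.0, "Unit Sales": 52},
    (("USA", "WA"), 2002): {"Store Sales": 70.0, "Unit Sales": 21},
}

def query(cells, columns, rows, slicer):
    """Return a grid with one row per year and one column per measure.

    columns: measure names (the ON COLUMNS set)
    rows:    years (the ON ROWS set)
    slicer:  store path that restricts the whole query (the WHERE tuple)
    """
    grid = {}
    for year in rows:
        cell = cells.get((slicer, year), {})
        grid[year] = [cell.get(measure) for measure in columns]
    return grid

result = query(
    cells,
    columns=["Store Sales"],
    rows=[2002, 2003],
    slicer=("USA", "CA"),
)
print(result)
```

The slicer does not appear on either axis of the result; it only restricts which cells are read.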
Features for a BI platform
● Data storage, data management
● Data integration, process scheduling
● Querying and reporting
● On-Line Analytical Processing (OLAP)
● Document management, versioning
● Statistical computations
● Microsoft Office or OpenOffice support
● Ease of use and end-user self-service creation of documents (independence from developers)
Data Mining
● Requires a strong background in computational statistics
Open Source offerings
● Reporting
● OLAP
● Charts
● Portal containers
● Data integration tools
● Libraries, CMS, scheduler
● Databases
SpagoBI (BI Suite)
● Engineering Informatica (Italy)
● Integration of components using drivers
● Comprehensive
● Full Open Source
Pentaho (BI Suite)
● Pentaho (USA)
● Acquisition instead of integration
● Strong marketing
● Commercial and Open Source
JasperServer (BI Suite)
● JasperSoft (USA)
● Famous for JasperReports
● Easy to use
● Commercial and Open Source
Palo (In memory OLAP)
● Jedox (Germany)
● Interesting technology (M-OLAP, GPU)
● Excel and OpenOffice plugins
● Web spreadsheet and reporting
● Open Source and Commercial support
Talend (Data Integration)
● Talend (France)
● Gartner "Cool Vendor" for Data Integration
● Data Integration, Data Quality, Data Management, ESB
● Open Source and Commercial support