On Line Analytical Modeling

  • 8/11/2019 On Line Analytical Modeling

    17/08/2014 2

    What is

    On-Line: A process controlled by a computer.

    Analytical Processing needs Analytical Data.

    Analytical Data: Data that involves analysis. Analytical Data consists of Business Data.

    Business Data: Time, Customers, Sales, Stores, Products, etc.

    [Diagram labels: Business Data, Analytical Data, Analytical Processing, Client]

    Interactive, exploratory analysis of multidimensional data to discover patterns

    [Figure axes: age, accidents, gender]

    Online analytical processing is a category of software technology that enables analysts, managers, and executives to gain insight into data through fast, consistent, interactive access to a wide variety of possible views of information that has been transformed from raw data to reflect the real dimensionality of the enterprise as understood by the user.

    Advanced data analysis environment
    - Supports decision making, business modeling, and operations research activities

    Characteristics of OLAP
    - Use multidimensional data analysis techniques
    - Provide advanced database support
    - Provide easy-to-use end-user interfaces
    - Support client/server architecture
    - Facilitate interactive querying and complex analysis for the user
    - Allow drill-down and roll-up
    - Perform intricate calculations and comparisons
    - Present results in meaningful ways, such as charts and graphs

    August 17, 2014 Data Mining: Concepts and Techniques 10

    Feature | OLTP | OLAP
    users | clerk, IT professional | knowledge worker
    function | day-to-day operations | decision support
    DB design | application-oriented | subject-oriented
    data | current, up-to-date, detailed, flat relational, isolated | historical, summarized, multidimensional, integrated, consolidated
    usage | repetitive | ad hoc
    access | read/write, index/hash on prim. key | lots of scans
    unit of work | short, simple transaction | complex query
    # records accessed | tens | millions
    # users | thousands | hundreds
    DB size | 100MB-GB | 100GB-TB
    metric | transaction throughput | query throughput, response

    - Multidimensional conceptual view
    - Transparency
    - Accessibility
    - Consistent reporting performance
    - Client/server architecture
    - Generic dimensionality
    - Dynamic sparse matrix handling
    - Multiuser support
    - Unrestricted cross-dimensional operations
    - Intuitive data manipulation
    - Flexible reporting
    - Unlimited dimensions and aggregation levels

    17/08/2014

    Theodoros CHRYSAFIS - Academix

    s3ctit03 -www.city.academic.gr/academix 14

    OLAP Taxonomy

    Multi-dimensional OLAP (MOLAP): A k-dimensional matrix based on a non-relational storage structure. (Agrawal et al.)

    Relational OLAP (ROLAP): A relational back-end wherein operations on the data are translated to relational queries. (Agrawal et al.)

    Hybrid OLAP (HOLAP): Integration of MOLAP and ROLAP.

    Desktop OLAP (DOLAP): Provides a specific cube for analysis. Simplified version of MOLAP or ROLAP.

    OLAP functionality to multidimensional databases (MDBMS)
    - Data is stored in a multidimensional data cube
    - N-dimensional cubes are called hypercubes
    - Cube cache memory speeds processing
    - Affected by how the database system handles the density of the data cube, called sparsity
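    As a rough illustration of sparsity, the sketch below counts how many cells of a small three-dimensional cube are actually populated. The fact values and the `sparsity` helper are assumptions made up for this example; real MDBMS products measure and handle sparsity in their own ways.

```python
# Illustrative facts: (product, store, day) -> amount. Values are assumptions for this sketch.
FACTS = {
    ("p1", "c1", 1): 12,
    ("p2", "c1", 1): 11,
    ("p1", "c3", 1): 50,
    ("p2", "c2", 1): 8,
    ("p1", "c1", 2): 44,
    ("p1", "c2", 2): 4,
}


def sparsity(facts):
    """Return the fraction of cells in the full cube that hold no value."""
    total_cells = 1
    for axis in range(3):
        # Number of distinct members seen along this dimension.
        total_cells *= len({key[axis] for key in facts})
    return 1 - len(facts) / total_cells


print(sparsity(FACTS))  # 0.5: the cube has 2 * 3 * 2 = 12 cells, 6 of them populated
```

    A dense array would reserve space for all 12 cells, which is why the way a system stores empty cells matters as the number of dimensions grows.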

    OLAP functionality

    - Uses relational DB query tools
    - Extensions to RDBMS:
      - Multidimensional data schema support
      - Data access language and query performance optimized for multidimensional data
      - Support for very large databases (VLDBs)

    General features

    Basic features are:
    - Multidimensional analysis
    - Consistent performance
    - Fast response time
    - Drill down and roll up
    - Navigation in and out of details
    - Slice and dice, rotation
    - Multiple view modes
    - Easy scalability
    - Time intelligence

    Three-Dimensional Cube Display

    [Figure: page = Region (North); columns = Sales (Red blob, Blue blob, Total); rows = Year (1996, 1997, Total)]

    Six-Dimensional Cube

    Dimension | Example
    Brand | Mt. Airy
    Store | Atlanta
    Customer segment | Business
    Product group | Desks
    Period | January
    Variable | Units sold

    MDS structure

    A hypercube is a general metaphor for representing multidimensional data.

    Region | Sales variance
    Africa | 105%
    Asia | 57%
    Europe | 122%
    North America | 97%
    Pacific | 85%
    South America | 163%

    Nation | Sales variance
    China | 123%
    Japan | 52%
    India | 87%
    Singapore | 95%

    Just a snippet from http://www.olapreport.com/ProductsIndex.htm; not an end


    Advance Database Techniques 28

    MOLAP

    - The database is stored in a special structure that is optimized for multidimensional analysis.
    - Data is aggregated and stored according to predicted usage.
    - Very fast query response time, as data is mostly pre-calculated.
    - Systems are best used when data is desired for a specific application.
    - Tight coupling between application and presentation layer.


    MOLAP

    - Practical limit on the size:
      - time taken to calculate the database and the space required to hold these pre-calculated values
      - good for smaller storage space (< 50 GB)
    - Navigation of data is limited
    - Costly to maintain
    - Does not scale well


    Advantages

    Excellent performance: MOLAP cubes are built for fast data retrieval and are optimal for slicing and dicing operations.

    Can perform complex calculations: All calculations have been pre-generated when the cube is created. Hence, complex calculations are not only doable, but they return quickly.


    MOLAP

    Disadvantages:

    Handles limited data: Because all calculations are performed when the cube is built, it is not possible to include a large amount of data in the cube itself.

    Requires additional investment: Cube technology is often proprietary and does not already exist in the organization. Therefore, to adopt MOLAP technology, chances are that additional investments in human and capital resources are needed.


    ROLAP is an alternative to the MOLAP technology.

    ROLAP differs significantly in that it does not require the pre-computation and storage of information.

    ROLAP tools access the data in a relational database and generate SQL queries to calculate information at the appropriate level when an end user requests it.

    It is possible to create additional database tables (summary tables and aggregations) which summarize the data at any desired combination of dimensions.
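    The idea of generating SQL at the level the user asks for can be sketched as follows. This is a minimal illustration, not how any particular ROLAP product works: the `sale` table layout, the `build_rollup_sql` helper, and the sample rows are assumptions made for the example, run against an in-memory SQLite database.

```python
import sqlite3

# Hypothetical dimension columns of an assumed fact table named "sale".
ALLOWED_DIMENSIONS = {"prodId", "storeId", "date"}


def build_rollup_sql(dimensions):
    """Return a query that sums amt at the requested combination of dimensions."""
    unknown = set(dimensions) - ALLOWED_DIMENSIONS
    if unknown:
        raise ValueError(f"unknown dimensions: {sorted(unknown)}")
    if not dimensions:
        return "SELECT sum(amt) FROM sale"
    cols = ", ".join(dimensions)
    return f"SELECT {cols}, sum(amt) FROM sale GROUP BY {cols} ORDER BY {cols}"


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sale (prodId TEXT, storeId TEXT, date INTEGER, amt INTEGER)")
conn.executemany(
    "INSERT INTO sale VALUES (?, ?, ?, ?)",
    [("p1", "c1", 1, 12), ("p2", "c1", 1, 11), ("p1", "c3", 1, 50),
     ("p2", "c2", 1, 8), ("p1", "c1", 2, 44), ("p1", "c2", 2, 4)],
)

by_day = conn.execute(build_rollup_sql(["date"])).fetchall()
grand_total = conn.execute(build_rollup_sql([])).fetchone()[0]
print(by_day)       # [(1, 81), (2, 48)]
print(grand_total)  # 129
```

    Nothing is pre-computed here: each request becomes a fresh GROUP BY query. A summary table would simply store the result of one of these queries for reuse.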


    The database is a standard relational database and the database model is a multidimensional model, often referred to as a star or snowflake model or schema.


    ROLAP

    A multi-dimensional user view on relational data storage using Star or Snowflake Database Schemata.

    [Figure, Star Schema: Sales table linked to Product Dimension, Time Dimension, Region Dimension, and Customer Dimension]

    [Figure, Snowflake Schema: Sales table linked to Product Dimension, Year Dimension, Country Dimension, and Customer Dimension, with further tables Customer Characteristics, Product Kind, Region, and Month]


    ROLAP

    Advantages: Easy to understand, easy to model, easy to implement.

    Further research: on dynamic optimisation, on meta-models, on functional extensions for the ROLAP engines, on user-defined functions for OLAP.


    Advantages:

    Can handle large amounts of data: The data size limitation of ROLAP technology is the limitation on data size of the underlying relational database. In other words, ROLAP itself places no limitation on data amount.

    Can leverage functionalities inherent in the relational database: Often, the relational database already comes with a host of functionalities. ROLAP technologies, since they sit on top of the relational database, can therefore leverage these functionalities.

    Easy to understand, easy to model, easy to implement.


    ROLAP

    Disadvantages:

    Performance can be slow: Because each ROLAP report is essentially a SQL query (or multiple SQL queries) in the relational database, the query time can be long if the underlying data size is large.

    Limited by SQL functionalities: Because ROLAP technology mainly relies on generating SQL statements to query the relational database, and SQL statements do not fit all needs (for example, it is difficult to perform complex calculations using SQL), ROLAP technologies are traditionally limited by what SQL can do. ROLAP vendors have mitigated this risk by building out-of-the-box complex functions into the tool, as well as the ability to allow users to define their own functions.


    ROLAP vs. MOLAP and HOLAP


    HOLAP

    - A hybrid of ROLAP and MOLAP
    - Can be thought of as a virtual database whereby the higher levels of the database are implemented as MOLAP and the lower levels of the database as ROLAP
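    One way to picture the split between higher and lower levels is the toy class below. It is only a sketch under stated assumptions: `TinyHolap`, its method names, the `sale` table layout, and the sample rows are all hypothetical, with summaries held in a Python dict (the MOLAP-like part) and detail rows left in SQLite (the ROLAP-like part).

```python
import sqlite3


class TinyHolap:
    """Hypothetical hybrid store: summaries in memory, detail rows in SQL."""

    def __init__(self, rows):
        self.conn = sqlite3.connect(":memory:")
        self.conn.execute(
            "CREATE TABLE sale (prodId TEXT, storeId TEXT, date INTEGER, amt INTEGER)"
        )
        self.conn.executemany("INSERT INTO sale VALUES (?, ?, ?, ?)", rows)
        # Higher level (MOLAP-like): totals per product, computed once at build time.
        self.total_by_product = dict(
            self.conn.execute("SELECT prodId, sum(amt) FROM sale GROUP BY prodId")
        )

    def product_total(self, prod_id):
        # Answered from the pre-computed summary; no SQL runs at query time.
        return self.total_by_product.get(prod_id, 0)

    def product_detail(self, prod_id):
        # Lower level (ROLAP-like): detail rows are fetched with SQL on demand.
        cur = self.conn.execute(
            "SELECT storeId, date, amt FROM sale WHERE prodId = ? ORDER BY date, storeId",
            (prod_id,),
        )
        return cur.fetchall()


store = TinyHolap([
    ("p1", "c1", 1, 12), ("p2", "c1", 1, 11), ("p1", "c3", 1, 50),
    ("p2", "c2", 1, 8), ("p1", "c1", 2, 44), ("p1", "c2", 2, 4),
])
print(store.product_total("p1"))   # 110
print(store.product_detail("p2"))  # [('c1', 1, 11), ('c2', 1, 8)]
```

    The caller does not need to know which layer answered, which is the kind of transparency a hybrid design aims for.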


    HOLAP (contd.)

    A system which supports (and integrates) multi-dimensional and relational storage for data in an equivalent manner, in order to benefit from the corresponding characteristics and optimization techniques.

    Advantages: use of the best techniques introduced in MOLAP and ROLAP; transparency between MOLAP and ROLAP systems.


    HOLAP (contd.)

    Development issues:
    - Results in lots of data redundancy
    - Allows users to build custom cubes, causing data inconsistencies
    - Only limited amounts of data can be maintained efficiently

    Almost all systems utilize HOLAP in some respects.


    DOLAP (Desktop OLAP)

    The previous terms are used to refer to server-based OLAP technologies.

    DOLAP enables users to quickly pull together small cubes that run on their desktops or laptops.


    2014.08.17.OLAP operations 52


    Roll up (drill-up): summarize data
    - by climbing up a hierarchy, or
    - by dimension reduction

    Drill down (roll down): reverse of roll-up
    - from a higher-level summary to a lower-level summary or detailed data, or
    - by introducing new dimensions


    OLAP operations II.

    Slice and dice: project and select

    Pivot (rotate): reorient the cube; visualization, 3D to a series of 2D planes

    Other operations:
    - drill across: involving (across) more than one fact table
    - drill through: through the bottom level of the cube to its back-end relational tables


    sale
    prodId | storeId | date | amt
    p1 | c1 | 1 | 12
    p2 | c1 | 1 | 11
    p1 | c3 | 1 | 50
    p2 | c2 | 1 | 8
    p1 | c1 | 2 | 44
    p1 | c2 | 2 | 4

    Add up amounts for day 1.
    In SQL: SELECT sum(amt) FROM SALE WHERE date = 1

    Result: 81


    Add up amounts by day, using the same sale table.
    In SQL: SELECT date, sum(amt) FROM SALE GROUP BY date

    ans
    date | sum
    1 | 81
    2 | 48


    Add up amounts by day and product, using the same sale table.
    In SQL: SELECT prodId, date, sum(amt) FROM SALE GROUP BY date, prodId

    sale
    prodId | date | amt
    p1 | 1 | 62
    p2 | 1 | 19
    p1 | 2 | 48

    Going from the by-day totals to the by-day, by-product totals is a drill-down; going back is a rollup.
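    The three queries on the sale table can be checked with Python's built-in `sqlite3` module. This is a small sketch: the in-memory database and the ORDER BY clauses are additions for the example, and the last query selects prodId as well so that each output row is labeled.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sale (prodId TEXT, storeId TEXT, date INTEGER, amt INTEGER)")
conn.executemany(
    "INSERT INTO sale VALUES (?, ?, ?, ?)",
    [("p1", "c1", 1, 12), ("p2", "c1", 1, 11), ("p1", "c3", 1, 50),
     ("p2", "c2", 1, 8), ("p1", "c1", 2, 44), ("p1", "c2", 2, 4)],
)

# Total for one day.
day1_total = conn.execute("SELECT sum(amt) FROM sale WHERE date = 1").fetchone()[0]

# Coarser level: one row per day.
by_day = conn.execute(
    "SELECT date, sum(amt) FROM sale GROUP BY date ORDER BY date"
).fetchall()

# Finer level (drill-down): one row per day and product.
by_day_product = conn.execute(
    "SELECT prodId, date, sum(amt) FROM sale GROUP BY date, prodId ORDER BY date, prodId"
).fetchall()

# Rolling the finer result back up over products reproduces the coarser one.
rolled_up = {}
for _prod, day, amt in by_day_product:
    rolled_up[day] = rolled_up.get(day, 0) + amt

print(day1_total)                 # 81
print(by_day)                     # [(1, 81), (2, 48)]
print(by_day_product)             # [('p1', 1, 62), ('p2', 1, 19), ('p1', 2, 48)]
print(sorted(rolled_up.items()))  # [(1, 81), (2, 48)]
```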


    Operators: sum, count, max, min, median, avg

    "Having" clause

    Using dimension hierarchy:
    - average by region (within store)
    - maximum by month (within date)


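    day 1 | c1 | c2 | c3
    p1 | 12 | | 50
    p2 | 11 | 8 |

    day 2 | c1 | c2 | c3
    p1 | 44 | 4 |
    p2 | | |

    summed over days | c1 | c2 | c3
    p1 | 56 | 4 | 50
    p2 | 11 | 8 |

    summed over days and products | c1 | c2 | c3
    sum | 67 | 12 | 50

    summed over days and stores | sum
    p1 | 110
    p2 | 19

    Grand total: 129

    . . .

    Example cells: sale(c1,*,*) = 67, sale(c2,p2,*) = 8, sale(*,*,*) = 129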


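    day 1 | c1 | c2 | c3 | *
    p1 | 12 | | 50 | 62
    p2 | 11 | 8 | | 19
    * | 23 | 8 | 50 | 81

    day 2 | c1 | c2 | c3 | *
    p1 | 44 | 4 | | 48
    p2 | | | |
    * | 44 | 4 | | 48

    * (all days) | c1 | c2 | c3 | *
    p1 | 56 | 4 | 50 | 110
    p2 | 11 | 8 | | 19
    * | 67 | 12 | 50 | 129

    Example cell: sale(*,p2,*) = 19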
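    The cube with its `*` (all) entries can be reproduced with a few lines of Python. This is a sketch, not a production cube algorithm: the `build_cube` helper is hypothetical, and cells are keyed as (storeId, prodId, date) to match the sale(c1,*,*) notation.

```python
from itertools import product

# Rows of the sale fact table: (prodId, storeId, date, amt).
SALES = [
    ("p1", "c1", 1, 12),
    ("p2", "c1", 1, 11),
    ("p1", "c3", 1, 50),
    ("p2", "c2", 1, 8),
    ("p1", "c1", 2, 44),
    ("p1", "c2", 2, 4),
]


def build_cube(rows):
    """Sum amt for every (storeId, prodId, date) combination, with '*' meaning 'all'."""
    cube = {}
    for prod, store, day, amt in rows:
        # Each fact feeds 2**3 cells: every dimension is either kept or replaced by '*'.
        for cell in product((store, "*"), (prod, "*"), (day, "*")):
            cube[cell] = cube.get(cell, 0) + amt
    return cube


cube = build_cube(SALES)
print(cube[("c1", "*", "*")])   # 67
print(cube[("c2", "p2", "*")])  # 8
print(cube[("*", "p2", "*")])   # 19
print(cube[("*", "*", "*")])    # 129
```

    Empty cells of the tables (for example p2 on day 2) simply have no key in the dictionary.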


    day 1 | c1 | c2 | c3
    p1 | 12 | | 50
    p2 | 11 | 8 |

    day 2 | c1 | c2 | c3
    p1 | 44 | 4 |
    p2 | | |

    by region | region A | region B
    p1 | 56 | 54
    p2 | 11 | 8

    Hierarchy levels: customer, region, country

    (customer c1 in Region A; customers c2, c3 in Region B)
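    Rolling up along the hierarchy amounts to replacing each customer by its region before summing. The sketch below assumes a hypothetical `REGION_OF` lookup built from the note above and sums over both days.

```python
# Rows of the sale fact table: (prodId, storeId, date, amt).
SALES = [
    ("p1", "c1", 1, 12),
    ("p2", "c1", 1, 11),
    ("p1", "c3", 1, 50),
    ("p2", "c2", 1, 8),
    ("p1", "c1", 2, 44),
    ("p1", "c2", 2, 4),
]

# One level of the hierarchy: customer c1 in Region A; customers c2, c3 in Region B.
REGION_OF = {"c1": "A", "c2": "B", "c3": "B"}


def roll_up_to_region(rows):
    """Sum amt per (product, region), climbing one level up the hierarchy."""
    totals = {}
    for prod, store, _day, amt in rows:
        key = (prod, REGION_OF[store])
        totals[key] = totals.get(key, 0) + amt
    return totals


by_region = roll_up_to_region(SALES)
print(by_region[("p1", "A")])  # 56
print(by_region[("p1", "B")])  # 54
print(by_region[("p2", "A")])  # 11
print(by_region[("p2", "B")])  # 8
```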


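    Fact table view:

    sale
    prodId | storeId | date | amt
    p1 | c1 | 1 | 12
    p2 | c1 | 1 | 11
    p1 | c3 | 1 | 50
    p2 | c2 | 1 | 8
    p1 | c1 | 2 | 44
    p1 | c2 | 2 | 4

    Multi-dimensional cube:

    day 1 | c1 | c2 | c3
    p1 | 12 | | 50
    p2 | 11 | 8 |

    day 2 | c1 | c2 | c3
    p1 | 44 | 4 |
    p2 | | |

    summed over days | c1 | c2 | c3
    p1 | 56 | 4 | 50
    p2 | 11 | 8 |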


    Slicing and Dicing

    Slicing is selecting a group of cells from the entire multidimensional array by specifying a specific value for one or more dimensions.

    Dicing involves selecting a subset of cells by specifying a range of attribute values. This is equivalent to defining a subarray from the complete array.

    In practice, both operations can also be accompanied by aggregation over some dimensions.
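    The two selections can be sketched on the sale data as plain filters. The helper names `slice_cube`, `dice_cube`, and `total` are hypothetical, and a "range" is modeled here as a set of allowed values per dimension, which is an assumption made for this example.

```python
COLUMNS = ("prodId", "storeId", "date", "amt")
RAW = [
    ("p1", "c1", 1, 12),
    ("p2", "c1", 1, 11),
    ("p1", "c3", 1, 50),
    ("p2", "c2", 1, 8),
    ("p1", "c1", 2, 44),
    ("p1", "c2", 2, 4),
]
SALES = [dict(zip(COLUMNS, row)) for row in RAW]


def slice_cube(rows, dimension, value):
    """Slice: fix one dimension to a single value."""
    return [r for r in rows if r[dimension] == value]


def dice_cube(rows, allowed):
    """Dice: keep cells whose value lies in the allowed set for each listed dimension."""
    return [r for r in rows if all(r[dim] in values for dim, values in allowed.items())]


def total(rows):
    """Aggregate the selected cells by summing amt."""
    return sum(r["amt"] for r in rows)


day1 = slice_cube(SALES, "date", 1)
sub_cube = dice_cube(SALES, {"prodId": {"p1"}, "storeId": {"c1", "c2"}})
print(total(day1))      # 81
print(total(sub_cube))  # 60
```

    The `total` call at the end is the optional aggregation step that often accompanies either selection.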


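    [Figure: age index with upper-level keys 20 and 23 over keys 18, 19, 20, 21, 22, 23, 25, 26, pointing into the data records]

    data records
    id | name | age
    1 | joe | 20
    2 | fred | 20
    3 | sally | 21
    4 | nancy | 20
    5 | tom | 20
    6 | pat | 25
    7 | dave | 21
    8 | jeff | 26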
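    The purpose of the age index can be illustrated with a dictionary from age to record ids. This is a deliberate simplification: it swaps in a plain Python dict for the tree-shaped structure the figure appears to show, and `build_age_index` is a hypothetical helper.

```python
from collections import defaultdict

# Data records from the figure: (id, name, age).
RECORDS = [
    (1, "joe", 20),
    (2, "fred", 20),
    (3, "sally", 21),
    (4, "nancy", 20),
    (5, "tom", 20),
    (6, "pat", 25),
    (7, "dave", 21),
    (8, "jeff", 26),
]


def build_age_index(records):
    """Map each age to the ids of the records that have that age."""
    index = defaultdict(list)
    for record_id, _name, age in records:
        index[age].append(record_id)
    return dict(index)


age_index = build_age_index(RECORDS)
print(age_index[20])          # [1, 2, 4, 5]
print(age_index[21])          # [3, 7]
print(age_index.get(22, []))  # []
```

    A lookup by age now touches only the matching ids instead of scanning every record.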


    sale
    prodId | storeId | date | amt
    p1 | c1 | 1 | 12
    p2 | c1 | 1 | 11
    p1 | c3 | 1 | 50
    p2 | c2 | 1 | 8
    p1 | c1 | 2 | 44
    p1 | c2 | 2 | 4

    product
    id | name | price
    p1 | bolt | 10
    p2 | nut | 5

    Combine SALE, PRODUCT relations.
    In SQL: SELECT * FROM SALE, PRODUCT WHERE SALE.prodId = PRODUCT.id

    joinTb
    prodId | name | price | storeId | date | amt
    p1 | bolt | 10 | c1 | 1 | 12
    p2 | nut | 5 | c1 | 1 | 11
    p1 | bolt | 10 | c3 | 1 | 50
    p2 | nut | 5 | c2 | 1 | 8
    p1 | bolt | 10 | c1 | 2 | 44
    p1 | bolt | 10 | c2 | 2 | 4


    join index

    product
    id | name | price | jIndex
    p1 | bolt | 10 | r1,r3,r5,r6
    p2 | nut | 5 | r2,r4

    sale
    rId | prodId | storeId | date | amt
    r1 | p1 | c1 | 1 | 12
    r2 | p2 | c1 | 1 | 11
    r3 | p1 | c3 | 1 | 50
    r4 | p2 | c2 | 1 | 8
    r5 | p1 | c1 | 2 | 44
    r6 | p1 | c2 | 2 | 4
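    A join index of this kind can be built in one pass over the sale rows. The sketch below is illustrative only: the dictionaries standing in for the two tables and the helpers `build_join_index` and `sales_of` are assumptions made for the example.

```python
# sale rows keyed by row id: (prodId, storeId, date, amt).
SALE = {
    "r1": ("p1", "c1", 1, 12),
    "r2": ("p2", "c1", 1, 11),
    "r3": ("p1", "c3", 1, 50),
    "r4": ("p2", "c2", 1, 8),
    "r5": ("p1", "c1", 2, 44),
    "r6": ("p1", "c2", 2, 4),
}

# product rows keyed by id: (name, price).
PRODUCT = {"p1": ("bolt", 10), "p2": ("nut", 5)}


def build_join_index(sale, product):
    """For each product id, list the ids of the sale rows that reference it."""
    index = {prod_id: [] for prod_id in product}
    for row_id, row in sale.items():
        index[row[0]].append(row_id)
    return index


def sales_of(prod_id, sale, index):
    """Fetch the matching sale rows through the index, without scanning the table."""
    return [sale[row_id] for row_id in index[prod_id]]


join_index = build_join_index(SALE, PRODUCT)
print(join_index["p1"])                  # ['r1', 'r3', 'r5', 'r6']
print(join_index["p2"])                  # ['r2', 'r4']
print(sales_of("p2", SALE, join_index))  # [('p2', 'c1', 1, 11), ('p2', 'c2', 1, 8)]
```

    The index is computed once, so joining a product to its sales later is a direct lookup of row ids.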

    Bitmapped join index

    File organisation

    Web-based OLAP

    Web OLAP approaches:
    - Browser plug-ins
    - Precreated HTML documents
    - OLAP in the server
