
Introduction to BI

  • Upload
    sammy

DESCRIPTION

Introduction to BI. Automated Decision-Making Framework. BI (according to Wikipedia): http://he.wikipedia.org/wiki/%D7%91%D7%99%D7%A0%D7%94_%D7%A2%D7%A1%D7%A7%D7%99%D7%AA Table of contents: 1 History, 2 Work process, 3 Data warehouse and BI, 4 Online analytical processing (OLAP), 5 Data mining (all the learning methods we studied)

Page 1: Introduction to BI

Introduction to BI

Page 2: Introduction to BI

Automated Decision-Making Framework

Page 3: Introduction to BI

BI (according to Wikipedia)

• http://he.wikipedia.org/wiki/%D7%91%D7%99%D7%A0%D7%94_%D7%A2%D7%A1%D7%A7%D7%99%D7%AA

Table of contents
• 1 History
• 2 Work process
• 3 Data warehouse and BI
• 4 Online analytical processing (OLAP)
• 5 Data mining (all the learning methods we studied)
• 6 Operational business intelligence
• 7 Main uses
• 8 BI products

Page 4: Introduction to BI

History of DSS

Classical Definitions of DSS

• "Interactive computer-based systems, which help decision makers utilize data and models to solve unstructured problems" - Gorry and Scott-Morton, 1971

• Decision support systems couple the intellectual resources of individuals with the capabilities of the computer to improve the quality of decisions. It is a computer-based support system for management decision makers who deal with semistructured problems - Keen and Scott-Morton, 1978

Page 5: Introduction to BI

Types of DSS
• Two major types:
  – Model-oriented DSS
  – Data-oriented DSS
• Evolution of DSS into Business Intelligence
  – Use of DSS moved from specialists to managers, and then to whomever, whenever, wherever
  – Enabling tools such as OLAP, data warehousing, data mining, and intelligent systems, delivered via Web technology, have collectively led to the terms “business intelligence” (BI) and “business analytics”

Page 6: Introduction to BI

From Wikipedia...

Since the mid-2000s, new business intelligence tools have existed under an approach called Business Intelligence 2.0 (BI 2.0). They allow employees to run queries against the organization's data in real time. The term BI 2.0 was coined by analogy with Web 2.0, because this kind of processing follows the Web approach and runs in a browser environment. BI 2.0 tools allow reports that are more dynamic than the static reports typical of previous-generation tools. An important basis for this kind of processing is the use of SOA, which comes together with the use of more flexible middleware products and the use of standards for transferring information.

Service Oriented Architecture = SOA

Page 7: Introduction to BI

DSS Description

• DSS application: A DSS program built for a specific purpose (e.g., a scheduling system for a specific company)

• Business intelligence (BI): A conceptual framework for decision support. It combines architecture, databases (or data warehouses), analytical tools, and applications

Page 8: Introduction to BI

Business Intelligence (BI)
• BI is an evolution of decision support concepts over time.
  – Meaning of EIS/DSS…
    • Then: Executive Information System
    • Now: Everybody’s Information System (BI)
• BI systems are enhanced with additional visualizations, alerts, and performance measurement capabilities.
• The term BI emerged from industry apps.

Page 9: Introduction to BI

The Evolution of BI Capabilities

Page 10: Introduction to BI

The Architecture of BI
• A BI system has four major components
  – a data warehouse, with its source data
  – business analytics, a collection of tools for manipulating, mining, and analyzing the data in the data warehouse
  – business performance management (BPM) for monitoring and analyzing performance
  – a user interface (e.g., dashboard)

In recent years, business intelligence has taken a central place in information systems. The large growth in the information accumulated in computerized systems requires relevant data to be presented and consolidated so that the information is meaningful. One sign of the field's importance is the acquisition of prominent companies specializing in it by large software companies.

Page 11: Introduction to BI

A High-Level Architecture of BI

Page 12: Introduction to BI

Learning Objectives
• Explain data integration and the extraction, transformation, and load (ETL) processes
• Describe real-time (a.k.a. right-time and/or active) data warehousing
• Understand data warehouse administration and security issues

Page 13: Introduction to BI

Stage 1: Data Warehouse
• A physical repository where relational data are specially organized to provide enterprise-wide, cleansed data in a standardized format
• “The data warehouse is a collection of integrated, subject-oriented databases designed to support DSS functions, where each unit of data is non-volatile and relevant to some moment in time”

Page 14: Introduction to BI

DW Framework

[Figure: DW framework. Data sources (ERP, Legacy, POS, Other OLTP/Web, External data) feed the ETL process (Select, Extract, Transform, Integrate, Load), which populates the enterprise data warehouse (with Metadata and Replication). The warehouse serves data marts (Engineering, Marketing, Finance, ...), or is accessed directly under the "no data marts" option. Access goes through API / Middleware to the applications (visualization): data/text mining; custom-built applications; OLAP, dashboard, Web; routine business reporting.]

Page 15: Introduction to BI

Extraction, transformation, and load (ETL)

Data Integration and the Extraction, Transformation, and Load (ETL) Process

[Figure: ETL process. Sources (packaged application, legacy system, other internal applications, transient data source) pass through Extract, Transform, Cleanse, and Load into the data warehouse and data mart.]

Page 16: Introduction to BI

Data Mart
A departmental data warehouse that stores only relevant data

– Dependent data mart: A subset that is created directly from a data warehouse

– Independent data mart: A small data warehouse designed for a strategic business unit or a department

Page 17: Introduction to BI

OLAP vs. OLTP
Online Analytical Processing vs. Online Transaction Processing

Page 18: Introduction to BI

OLAP

Slicing Operations on a Simple Three-Dimensional Data Cube

[Figure: A 3-dimensional OLAP cube with slicing operations. Axes: Product, Time, Geography. Cells are filled with numbers representing sales volumes. The three slices shown are:
– Sales volumes of a specific Product on variable Time and Region
– Sales volumes of a specific Region on variable Time and Products
– Sales volumes of a specific Time on variable Region and Products]
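The slicing operations in the cube figure can be sketched in a few lines of Python. This is a minimal illustration, not how an OLAP server stores data: the cube is a plain dictionary, and the products, quarters, regions, and unit counts are hypothetical values made up for the example.

```python
from collections import defaultdict

# Hypothetical sales facts: (product, time, region) -> units sold.
SALES = {
    ("Laptop", "Q1", "North"): 10,
    ("Laptop", "Q1", "South"): 7,
    ("Laptop", "Q2", "North"): 12,
    ("Phone", "Q1", "North"): 20,
    ("Phone", "Q2", "South"): 15,
}
DIMENSIONS = ("product", "time", "region")


def slice_cube(cube, dimension, value):
    """Fix one dimension to a single value; return the remaining 2-D table."""
    axis = DIMENSIONS.index(dimension)
    return {
        key[:axis] + key[axis + 1:]: units
        for key, units in cube.items()
        if key[axis] == value
    }


def roll_up(cube, dimension):
    """Sum all cells, grouped by the values of one dimension."""
    axis = DIMENSIONS.index(dimension)
    totals = defaultdict(int)
    for key, units in cube.items():
        totals[key[axis]] += units
    return dict(totals)


laptop_slice = slice_cube(SALES, "product", "Laptop")  # Time x Region table
units_per_region = roll_up(SALES, "region")
```

Slicing on "product" corresponds to the first slice in the figure (one product, variable time and region); slicing on "region" or "time" gives the other two.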

Page 19: Introduction to BI

Star vs Snowflake Schema

[Figure: Star schema. Fact table SALES (UnitsSold, ...) surrounded by four dimension tables: TIME (Quarter, ...), PEOPLE (Division, ...), PRODUCT (Brand, ...), GEOGRAPHY (Country, ...).]

[Figure: Snowflake schema. Fact table SALES (UnitsSold, ...) with dimension tables DATE (Date, ...), PEOPLE (Division, ...), PRODUCT (LineItem, ...), STORE (LocID, ...), and further normalized dimension tables BRAND (Brand, ...), CATEGORY (Category, ...), LOCATION (State, ...), MONTH (M_Name, ...), QUARTER (Q_Name, ...).]
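A star schema can be tried out with Python's built-in sqlite3 module. The sketch below is only an illustration: it keeps two of the dimensions named in the figure (TIME with Quarter, PRODUCT with Brand), and the table names, key columns, and rows are assumptions made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Dimension tables: one row per quarter / per brand (hypothetical rows).
    CREATE TABLE dim_time    (time_id INTEGER PRIMARY KEY, quarter TEXT);
    CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, brand TEXT);
    -- Fact table: one row per sale, pointing at the dimensions.
    CREATE TABLE fact_sales (
        time_id    INTEGER REFERENCES dim_time(time_id),
        product_id INTEGER REFERENCES dim_product(product_id),
        units_sold INTEGER
    );
    INSERT INTO dim_time    VALUES (1, 'Q1'), (2, 'Q2');
    INSERT INTO dim_product VALUES (1, 'BrandA'), (2, 'BrandB');
    INSERT INTO fact_sales  VALUES (1, 1, 10), (1, 2, 5), (2, 1, 7), (2, 1, 3);
""")


def units_by_quarter_and_brand(connection):
    """Join the fact table to its dimensions and total the units sold."""
    return connection.execute("""
        SELECT t.quarter, p.brand, SUM(f.units_sold)
        FROM fact_sales AS f
        JOIN dim_time    AS t ON t.time_id = f.time_id
        JOIN dim_product AS p ON p.product_id = f.product_id
        GROUP BY t.quarter, p.brand
        ORDER BY t.quarter, p.brand
    """).fetchall()


report = units_by_quarter_and_brand(conn)
```

In a snowflake variant, `brand` would move out of `dim_product` into its own BRAND table, adding one more join to the same query.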

Page 20: Introduction to BI

Another example of a SNOWFLAKE schema

Page 21: Introduction to BI

Data mining
• Classification (equal or not equal to ...)
  – Whether to lend money, invest in a field, open a new branch
• Cluster analysis (Clustering)
  – How many types of customers are there? What unites them?
• Regression analysis
  – How much will we earn, optimization

Page 22: Introduction to BI

Types of information
• Mining information from data
  – The "simplest" kind
• Mining information from texts
  – INFORMATION RETRIEVAL
  – TREND ANALYSIS, SENTIMENT ANALYSIS

Page 23: Introduction to BI

Categories of Models

Category | Objective | Techniques
Optimization of problems with few alternatives | Find the best solution from a small number of alternatives | Decision tables, decision trees
Optimization via algorithm | Find the best solution from a large number of alternatives using a step-by-step process | Linear and other mathematical programming models
Optimization via an analytic formula | Find the best solution in one step using a formula | Some inventory models
Simulation | Find a good enough solution by experimenting with a dynamic model of the system | Several types of simulation
Heuristics | Find a good enough solution using “common-sense” rules | Heuristic programming and expert systems
Predictive and other models | Predict future occurrences, what-if analysis, … | Forecasting, Markov chains, financial, …

Page 24: Introduction to BI

Static and Dynamic Models

• Static Analysis
  – Single snapshot of the situation
  – Single interval
  – Steady state
• Dynamic Analysis
  – Dynamic models
  – Evaluate scenarios that change over time
  – Time dependent
  – Represents trends and patterns over time
  – More realistic: Extends static models

Page 25: Introduction to BI

Decision Analysis: A Few Alternatives

Single Goal Situations

• Decision trees
  – Graphical representation of relationships
  – Multiple criteria approach
  – Demonstrates complex relationships
  – Cumbersome, if many alternatives exist

Page 26: Introduction to BI

Decision Tables

• Investment example

• One goal: maximize the yield after one year

• Yield depends on the status of the economy (the state of nature)
  – Solid growth
  – Stagnation
  – Inflation

Page 27: Introduction to BI

Investment Example: Possible Situations

1. If solid growth in the economy, bonds yield 12%; stocks 15%; time deposits 6.5%

2. If stagnation, bonds yield 6%; stocks 3%; time deposits 6.5%

3. If inflation, bonds yield 3%; stocks lose 2%; time deposits yield 6.5%
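The three situations above form a decision table, which can be evaluated with a short Python sketch. The yields come from the slide. The choice of decision criteria (optimistic, pessimistic, expected value) and the state probabilities are assumptions added for illustration; the slides do not give probabilities.

```python
# Yields (% after one year) taken from the three situations listed above.
YIELDS = {
    "bonds":         {"solid growth": 12.0, "stagnation": 6.0, "inflation": 3.0},
    "stocks":        {"solid growth": 15.0, "stagnation": 3.0, "inflation": -2.0},
    "time deposits": {"solid growth": 6.5,  "stagnation": 6.5, "inflation": 6.5},
}

# Hypothetical probabilities for the states of nature (not from the slides).
ASSUMED_PROBABILITIES = {"solid growth": 0.5, "stagnation": 0.3, "inflation": 0.2}


def maximax(table):
    """Optimistic rule: pick the alternative with the best best-case yield."""
    return max(table, key=lambda alt: max(table[alt].values()))


def maximin(table):
    """Pessimistic rule: pick the alternative with the best worst-case yield."""
    return max(table, key=lambda alt: min(table[alt].values()))


def expected_yields(table, probabilities):
    """Probability-weighted yield of each alternative."""
    return {
        alt: sum(probabilities[state] * y for state, y in row.items())
        for alt, row in table.items()
    }


optimistic_choice = maximax(YIELDS)
pessimistic_choice = maximin(YIELDS)
expected = expected_yields(YIELDS, ASSUMED_PROBABILITIES)
```

The preferred investment depends on the criterion: an optimist looks only at each row's best case, a pessimist only at its worst case, and the expected-value rule needs probabilities that the decision maker must supply.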

Page 28: Introduction to BI

Optimization via Mathematical Programming

• Mathematical programming: A family of tools designed to help solve managerial problems in which the decision maker must allocate scarce resources among competing activities to optimize a measurable goal

• Optimal solution: The best possible solution to a modeled problem
  – Linear programming (LP): A mathematical model for the optimal solution of resource allocation problems. All the relationships are linear

Page 29: Introduction to BI

LP Problem Characteristics

1. Limited quantity of economic resources
2. Resources are used in the production of products or services
3. Two or more ways (solutions, programs) to use the resources
4. Each activity (product or service) yields a return in terms of the goal
5. Allocation is usually restricted by constraints

Page 30: Introduction to BI

Linear Programming Steps
• 1. Identify the …
  – Decision variables
  – Objective function
  – Objective function coefficients
  – Constraints
    • Capacities / Demands
• 2. Represent the model
  – LINDO: Write mathematical formulation
  – EXCEL: Input data into specific cells in Excel
• 3. Run the model and observe the results

Page 31: Introduction to BI

LP Example: The Product-Mix Linear Programming Model
• MBI Corporation
• Decision: How many computers to build next month?
• Two types of mainframe computers: CC7 and CC8
• Constraints: Labor limits, Materials limit, Marketing lower limits

                 CC7      CC8      Rel   Limit
Labor (days)     300      500      <=    200,000 /mo
Materials ($)    10,000   15,000   <=    8,000,000 /mo
Units            1                 >=    100
Units                     1        >=    200
Profit ($)       8,000    12,000   Max

Objective: Maximize Total Profit / Month
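The product-mix model above is small enough to solve by enumerating corner points, which the following Python sketch does. This is an illustrative substitute for the LINDO or Excel runs mentioned in the steps, not a general LP solver: it relies on the fact that a bounded two-variable LP attains its optimum at a vertex of the feasible region. The coefficients are taken from the table; the variable names x1 (CC7 units) and x2 (CC8 units) are chosen for the example.

```python
from itertools import combinations

# Each constraint is (a, b, c), meaning a*x1 + b*x2 <= c,
# with x1 = CC7 units and x2 = CC8 units built per month.
CONSTRAINTS = [
    (300, 500, 200_000),          # labor days
    (10_000, 15_000, 8_000_000),  # materials ($)
    (-1, 0, -100),                # x1 >= 100 (marketing lower limit)
    (0, -1, -200),                # x2 >= 200 (marketing lower limit)
]
PROFIT = (8_000, 12_000)          # profit ($) per CC7 and per CC8


def is_feasible(point, tol=1e-6):
    x1, x2 = point
    return all(a * x1 + b * x2 <= c + tol for a, b, c in CONSTRAINTS)


def corner_points():
    """Intersect every pair of constraint boundaries; keep the feasible ones."""
    points = []
    for (a1, b1, c1), (a2, b2, c2) in combinations(CONSTRAINTS, 2):
        det = a1 * b2 - a2 * b1
        if det == 0:
            continue  # parallel boundaries never intersect
        x1 = (c1 * b2 - c2 * b1) / det   # Cramer's rule
        x2 = (a1 * c2 - a2 * c1) / det
        if is_feasible((x1, x2)):
            points.append((x1, x2))
    return points


def total_profit(point):
    return PROFIT[0] * point[0] + PROFIT[1] * point[1]


def solve():
    best = max(corner_points(), key=total_profit)
    return best, total_profit(best)


(best_cc7, best_cc8), best_profit = solve()
```

Under these numbers labor is the binding resource: the best corner builds only the minimum 200 CC8 units and spends the remaining labor on CC7. The solution is fractional (about 333.3 CC7 units) because plain LP does not force integer quantities.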

Page 32: Introduction to BI

Sensitivity, What-if, and Goal Seeking Analysis

• Sensitivity
  – Assesses impact of change in inputs on outputs
  – Eliminates or reduces variables
  – Can be automatic or trial and error
• What-if
  – Assesses solutions based on changes in variables or assumptions (scenario analysis)
• Goal seeking
  – Backwards approach, starts with goal
  – Determines values of inputs needed to achieve goal
  – Example is break-even point determination
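What-if analysis and goal seeking can be contrasted with a small break-even sketch in Python. The profit model and all of its numbers (price, unit cost, fixed cost) are hypothetical, and bisection is just one simple way to search backwards from a goal; it assumes the model increases with the input over the search interval.

```python
def profit(units, price=25.0, unit_cost=15.0, fixed_cost=50_000.0):
    """Hypothetical profit model: contribution margin minus fixed cost."""
    return units * (price - unit_cost) - fixed_cost


def goal_seek(model, target, low, high, tol=1e-6):
    """Find the input for which model(input) == target, by bisection.

    Assumes model is increasing on [low, high] and the target lies
    between model(low) and model(high).
    """
    while high - low > tol:
        mid = (low + high) / 2
        if model(mid) < target:
            low = mid
        else:
            high = mid
    return (low + high) / 2


# What-if: change an input and observe the output.
profit_at_6000_units = profit(6_000)

# Goal seeking: start from the goal (zero profit) and find the input.
break_even_units = goal_seek(profit, target=0.0, low=0.0, high=100_000.0)
```

What-if runs the model forwards from a changed input, while goal seeking runs it backwards from a desired output, which is why break-even determination is the standard example.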

Page 33: Introduction to BI

Heuristic Programming

• Cuts the search space
• Gets satisfactory solutions more quickly and less expensively
• Finds good enough feasible solutions to very complex problems
• Heuristics can be
  – Quantitative
  – Qualitative (in ES)
• Traveling Salesman Problem >>>

Page 34: Introduction to BI

Heuristic Programming - SEARCH

Page 35: Introduction to BI

Traveling Salesman Problem
• What is it?
  – A traveling salesman must visit customers in several cities across the country, visiting each city only once. Goal: Find the shortest possible route
  – Total number of unique routes (TNUR): TNUR = (1/2) (Number of Cities – 1)!

Number of Cities   TNUR
5                  12
6                  60
9                  20,160
20                 6.08 × 10^16
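The TNUR formula is easy to check with Python's standard library. The sketch below applies (1/2)(n – 1)! directly; the function name is chosen for the example. For 20 cities the formula gives 19!/2, about 6.08 × 10^16. The figure 1.22 × 10^18 corresponds to 20!/2 instead.

```python
from math import factorial


def unique_routes(number_of_cities):
    """TNUR = (1/2) * (number_of_cities - 1)!

    The start city is fixed, and each route is counted once regardless
    of travel direction, hence the division by two.
    """
    return factorial(number_of_cities - 1) // 2


route_counts = {n: unique_routes(n) for n in (5, 6, 9, 20)}
```

The factorial growth is the reason exhaustive search becomes impractical and heuristics are used instead.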

Page 36: Introduction to BI

When to Use Heuristics

When to Use Heuristics
  – Inexact or limited input data
  – Complex reality
  – Reliable, exact algorithm not available
  – Computation time excessive
  – For making quick decisions

Limitations of Heuristics
  – Cannot guarantee an optimal solution

Page 37: Introduction to BI

Modern Heuristic Methods

• Tabu search
  – Intelligent search algorithm
• Genetic algorithms
  – Survival of the fittest
• Simulated annealing
  – Analogy to thermodynamics

Page 38: Introduction to BI

Simulation

• Technique for conducting experiments with a computer on a comprehensive model of the behavior of a system

• Frequently used in DSS tools

Page 39: Introduction to BI

Major Characteristics of Simulation

• Imitates reality and captures its richness
• Technique for conducting experiments
• Descriptive, not normative tool
• Often used to “solve” very complex problems

Simulation is normally used only when a problem is too complex to be treated using numerical optimization techniques

Page 40: Introduction to BI

Advantages of Simulation

• The theory is fairly straightforward
• Great deal of time compression
• Experiment with different alternatives
• The model reflects manager’s perspective
• Can handle wide variety of problem types
• Can include the real complexities of problems
• Produces important performance measures
• Often it is the only DSS modeling tool for non-structured problems

Page 41: Introduction to BI

Limitations of Simulation

• Cannot guarantee an optimal solution
• Slow and costly construction process
• Cannot transfer solutions and inferences to solve other problems (problem specific)
• So easy to explain/sell to managers that it may lead to overlooking analytical solutions
• Software may require special skills

Page 42: Introduction to BI

Simulation Types
• Stochastic vs. Deterministic Simulation
  – In stochastic simulations: We use distributions (Discrete or Continuous probability distributions)
• Time-dependent vs. Time-independent Simulation
  – Time-independent stochastic simulation via Monte Carlo technique (X = A + B)
• Discrete event vs. Continuous simulation
• Steady State vs. Transient Simulation
• Simulation Implementation
  – Visual simulation
  – Object-oriented simulation
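The Monte Carlo case X = A + B mentioned above can be sketched with Python's random module. The distributions are assumptions picked for illustration (A uniform on [10, 20], B normal with mean 5 and standard deviation 1); the slides do not specify them.

```python
import random


def simulate_x(trials, seed=None):
    """Draw `trials` independent samples of X = A + B.

    Assumed inputs (hypothetical, for illustration only):
      A ~ Uniform(10, 20)   continuous distribution
      B ~ Normal(5, 1)      continuous distribution
    """
    rng = random.Random(seed)
    return [rng.uniform(10, 20) + rng.gauss(5, 1) for _ in range(trials)]


samples = simulate_x(20_000, seed=1)
estimated_mean = sum(samples) / len(samples)
share_above_22 = sum(x > 22 for x in samples) / len(samples)
```

The simulation is time-independent because each trial is a single draw with no clock. The estimated mean should land close to the analytical value 15 + 5 = 20, while quantities such as the share of trials above a threshold are where simulation saves manual derivation.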

Page 43: Introduction to BI

Data Mining Methods: Classification

• Most frequently used DM method
• Part of the machine-learning family
• Employs supervised learning
• Learn from past data, classify new data
• The output variable is categorical (nominal or ordinal) in nature
• Classification versus regression?
• Classification versus clustering?

Page 44: Introduction to BI

Assessment Methods for Classification

• Predictive accuracy
  – Hit rate
• Speed
  – Model building; predicting
• Robustness
• Scalability
• Interpretability
  – Transparency, explainability

Page 45: Introduction to BI

Accuracy of Classification Models
• In classification problems, the primary source for accuracy estimation is the confusion matrix

Confusion matrix (rows: predicted class, columns: true class)

                     True Positive                True Negative
Predicted Positive   True Positive Count (TP)     False Positive Count (FP)
Predicted Negative   False Negative Count (FN)    True Negative Count (TN)

True Positive Rate = TP / (TP + FN)
True Negative Rate = TN / (TN + FP)
Accuracy = (TP + TN) / (TP + TN + FP + FN)
Precision = TP / (TP + FP)
Recall = TP / (TP + FN)
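The measures derived from the confusion matrix are the true positive rate TP/(TP+FN), the true negative rate TN/(TN+FP), accuracy (TP+TN)/(TP+TN+FP+FN), precision TP/(TP+FP), and recall TP/(TP+FN). A short Python sketch computes them from the four cell counts; the example counts are hypothetical.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute the standard measures from the four confusion-matrix counts."""
    return {
        "true_positive_rate": tp / (tp + fn),
        "true_negative_rate": tn / (tn + fp),
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "precision": tp / (tp + fp),
        "recall": tp / (tp + fn),  # same quantity as the true positive rate
    }


# Hypothetical counts for 100 test records.
metrics = classification_metrics(tp=40, fp=10, tn=45, fn=5)
```

With these counts accuracy is 0.85 and precision is 0.80. The sketch does not guard against empty classes, so a zero denominator raises ZeroDivisionError.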

Page 46: Introduction to BI

Estimation Methodologies for Classification

• Simple split (or holdout or test sample estimation)
  – Split the data into 2 mutually exclusive sets: training (~70%) and testing (~30%)

[Figure: Preprocessed data is split into training data (2/3), used for model development to produce a classifier, and testing data (1/3), used for model assessment (scoring) to estimate prediction accuracy.]

Page 47: Introduction to BI

Estimation Methodologies for Classification

• k-Fold Cross Validation (rotation estimation)
  – Split the data into k mutually exclusive subsets
  – Use each subset as testing while using the rest of the subsets as training
  – Repeat the experimentation k times
  – Aggregate the test results for a true estimate of prediction accuracy
• Other estimation methodologies
  – Leave-one-out, bootstrapping, jackknifing
  – Area under the ROC curve
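The k-fold procedure can be sketched in plain Python. The helper names and the `evaluate` callback are hypothetical; in practice `evaluate` would train a classifier on the training indices and return its accuracy on the test indices. The sketch splits indices in order, so real data would normally be shuffled first.

```python
def k_fold_splits(n_samples, k):
    """Yield (train_indices, test_indices) for k mutually exclusive folds."""
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


def cross_validate(n_samples, k, evaluate):
    """Run `evaluate(train, test)` on every fold and average the scores."""
    scores = [evaluate(train, test) for train, test in k_fold_splits(n_samples, k)]
    return sum(scores) / len(scores)


folds = list(k_fold_splits(10, 3))
```

Setting k equal to the number of samples turns the same function into leave-one-out estimation.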

Page 48: Introduction to BI

Classification Techniques

• Decision tree analysis
• Statistical analysis
• Neural networks
• Support vector machines
• Case-based reasoning
• Bayesian classifiers
• Genetic algorithms
• Rough sets

Page 49: Introduction to BI

Decision Trees

• Employs the divide and conquer method
• Recursively divides a training set until each division consists of examples from one class
  1. Create a root node and assign all of the training data to it
  2. Select the best splitting attribute
  3. Add a branch to the root node for each value of the split. Split the data into mutually exclusive subsets along the lines of the specific split
  4. Repeat steps 2 and 3 for each and every leaf node until the stopping criterion is reached
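The four steps above can be sketched as a small recursive tree builder in Python. The slides do not say how the "best" splitting attribute is chosen; this sketch assumes an ID3-style rule (lowest weighted entropy after the split), handles categorical attributes only, and uses a made-up loan-approval dataset.

```python
from collections import Counter
from math import log2


def entropy(rows, target):
    counts = Counter(row[target] for row in rows)
    return -sum(c / len(rows) * log2(c / len(rows)) for c in counts.values())


def best_attribute(rows, attributes, target):
    """Step 2: pick the attribute whose split leaves the least entropy."""
    def entropy_after_split(attr):
        values = Counter(row[attr] for row in rows)
        return sum(
            n / len(rows) * entropy([r for r in rows if r[attr] == v], target)
            for v, n in values.items()
        )
    return min(attributes, key=entropy_after_split)


def build_tree(rows, attributes, target):
    labels = [row[target] for row in rows]
    # Stopping criteria: the node is pure, or no attributes are left.
    if len(set(labels)) == 1 or not attributes:
        return Counter(labels).most_common(1)[0][0]
    attr = best_attribute(rows, attributes, target)
    remaining = [a for a in attributes if a != attr]
    # Step 3: one branch per value, each with its own exclusive subset.
    return {attr: {
        value: build_tree([r for r in rows if r[attr] == value], remaining, target)
        for value in sorted({row[attr] for row in rows})
    }}


def classify(tree, row):
    while isinstance(tree, dict):
        attr = next(iter(tree))
        tree = tree[attr][row[attr]]
    return tree


# Hypothetical training data.
TRAINING = [
    {"income": "high", "owns_home": "yes", "approve": "yes"},
    {"income": "high", "owns_home": "no",  "approve": "yes"},
    {"income": "low",  "owns_home": "yes", "approve": "yes"},
    {"income": "low",  "owns_home": "no",  "approve": "no"},
]
tree = build_tree(TRAINING, ["income", "owns_home"], "approve")
```

The sketch has no pruning and cannot classify attribute values it never saw during training, both of which real decision tree algorithms handle.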

Page 50: Introduction to BI

Decision Trees

• DT algorithms mainly differ on
  – Splitting criteria
    • Which variable to split first?
    • What values to use to split?
    • How many splits to form for each node?
  – Stopping criteria
    • When to stop building the tree
  – Pruning (generalization method)
    • Pre-pruning versus post-pruning
• Most popular DT algorithms include
  – ID3, C4.5, C5; CART; CHAID; M5

Page 51: Introduction to BI

Cluster Analysis for Data Mining

• k-Means Clustering Algorithm
  – k : pre-determined number of clusters
  – Algorithm (Step 0: determine value of k)
    Step 1: Randomly generate k random points as initial cluster centers
    Step 2: Assign each point to the nearest cluster center
    Step 3: Re-compute the new cluster centers
    Repetition step: Repeat steps 2 and 3 until some convergence criterion is met (usually that the assignment of points to clusters becomes stable)
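The k-means steps can be sketched in plain Python. One simplification is assumed here: the initial centers are picked at random from the data points themselves rather than generated as arbitrary random points. The six sample points are hypothetical.

```python
import random


def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def k_means(points, k, seed=0, max_iterations=100):
    """Return (centers, assignment) for a list of equal-length tuples."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)          # Step 1: initial cluster centers
    assignment = None
    for _ in range(max_iterations):
        # Step 2: assign each point to the nearest cluster center.
        new_assignment = [
            min(range(k), key=lambda c: squared_distance(point, centers[c]))
            for point in points
        ]
        if new_assignment == assignment:     # convergence: assignment is stable
            break
        assignment = new_assignment
        # Step 3: re-compute each center as the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:                      # keep the old center if a cluster empties
                centers[c] = tuple(sum(coord) / len(members) for coord in zip(*members))
    return centers, assignment


# Hypothetical 2-D data with two visible groups.
POINTS = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers, assignment = k_means(POINTS, k=2)
```

The loop alternates assignment and re-centering until no point changes cluster, which is the stability criterion named in the repetition step.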

Page 52: Introduction to BI

Cluster Analysis for Data Mining - k-Means Clustering Algorithm

Step 1 Step 2 Step 3

Page 53: Introduction to BI

Data Mining Myths

• Data mining …
  – provides instant solutions/predictions
  – is not yet viable for business applications
  – requires a separate, dedicated database
  – can only be done by those with advanced degrees
  – is only for large firms that have lots of customer data
  – is another name for the good-old statistics

Page 54: Introduction to BI

Common Data Mining Mistakes
1. Selecting the wrong problem for data mining
2. Ignoring what your sponsor thinks data mining is and what it really can/cannot do
3. Leaving insufficient time for data acquisition, selection and preparation
4. Looking only at aggregated results and not at individual records/predictions
5. Being sloppy about keeping track of the data mining procedure and results

Page 55: Introduction to BI

Common Data Mining Mistakes
6. Ignoring suspicious (good or bad) findings and quickly moving on
7. Running mining algorithms repeatedly and blindly, without thinking about the next stage
8. Naively believing everything you are told about the data
9. Naively believing everything you are told about your own data mining analysis
10. Measuring your results differently from the way your sponsor measures them

Page 56: Introduction to BI

Text Mining Application Areas

• Information extraction
• Topic tracking
• Summarization
• Categorization
• Clustering
• Concept linking
• Question answering

Page 57: Introduction to BI

Text Mining Terminology

• Unstructured or semistructured data
• Corpus (and corpora)
• Terms
• Concepts
• Stemming
• Stop words (and include words)
• Synonyms (and polysemes)
• Tokenizing

Page 58: Introduction to BI

Text Mining Terminology

• Term dictionary
• Word frequency
• Part-of-speech tagging
• Morphology
• Term-by-document matrix
  – Occurrence matrix
• Singular value decomposition
  – Latent semantic indexing
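Several of the terms above (tokenizing, stop words, word frequency, term-by-document matrix) can be illustrated together in a short Python sketch. The stop-word list and the two sample documents are made up for the example, and the tokenizer is deliberately naive: it lowercases and splits on whitespace, with no stemming or punctuation handling.

```python
from collections import Counter

# Illustrative stop-word list; real lists are much longer.
STOP_WORDS = {"the", "a", "of", "and", "is", "in"}


def tokenize(text):
    """Lowercase, split on whitespace, and drop stop words."""
    return [word for word in text.lower().split() if word not in STOP_WORDS]


def term_document_matrix(documents):
    """Return (terms, matrix) where matrix[i][j] counts term i in document j."""
    counts = [Counter(tokenize(doc)) for doc in documents]
    terms = sorted(set().union(*counts))
    matrix = [[doc_counts[term] for doc_counts in counts] for term in terms]
    return terms, matrix


DOCUMENTS = ["the data warehouse", "data mining of data"]
terms, matrix = term_document_matrix(DOCUMENTS)
```

Each row of the matrix is one term and each column one document. Singular value decomposition, as used in latent semantic indexing, would be applied to this occurrence matrix to reduce its dimensionality.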

Page 59: Introduction to BI

Natural Language Processing (NLP)
• Structuring a collection of text
  – Old approach: bag-of-words
  – New approach: natural language processing
• NLP is …
  – a very important concept in text mining
  – a subfield of artificial intelligence and computational linguistics
  – the study of "understanding" the natural human language
• Syntax versus semantics based text mining

Page 60: Introduction to BI

Natural Language Processing (NLP)
• Challenges in NLP
  – Part-of-speech tagging
  – Text segmentation
  – Word sense disambiguation
  – Syntax ambiguity
  – Imperfect or irregular input
  – Speech acts
• Dream of AI community
  – to have algorithms that are capable of automatically reading and obtaining knowledge from text

Page 61: Introduction to BI

NLP Task Categories
• Information retrieval
• Information extraction
• Named-entity recognition
• Question answering
• Automatic summarization
• Natural language generation and understanding
• Machine translation
• Foreign language reading and writing
• Speech recognition
• Text proofing
• Optical character recognition

Page 62: Introduction to BI

Text Mining Applications
• Marketing applications
  – Enables better CRM
• Security applications
  – ECHELON, OASIS
  – Deception detection (…)
• Medicine and biology
  – Literature-based gene identification (…)
• Academic applications
  – Research stream analysis

Page 63: Introduction to BI

Web Mining Success Stories
• Amazon.com, Ask.com, Scholastic.com, …
• Website Optimization Ecosystem

[Figure: Website optimization ecosystem. Customer Interaction on the Web feeds Analysis of Interactions (Web Analytics, Voice of Customer, Customer Experience Management), which yields Knowledge about the Holistic View of the Customer.]

Page 64: Introduction to BI

Web Mining Tools

Product Name URL

Angoss Knowledge WebMiner angoss.com

ClickTracks clicktracks.com

LiveStats from DeepMetrix deepmetrix.com

Megaputer WebAnalyst megaputer.com

MicroStrategy Web Traffic Analysis microstrategy.com

SAS Web Analytics sas.com

SPSS Web Mining for Clementine spss.com

WebTrends webtrends.com

XML Miner scientio.com

Page 65: Introduction to BI

Machine Learning Methods

Machine Learning
• Supervised Learning
  – Classification: Decision Tree, Neural Networks, Support Vector Machines, Case-based Reasoning, Rough Sets, Discriminant Analysis, Logistic Regression, Rule Induction
  – Regression: Regression Trees, Neural Networks, Support Vector Machines, Linear Regression, Non-linear Regression, Bayesian Linear Regression
• Unsupervised Learning
  – Clustering / Segmentation: SOM (Neural Networks), Adaptive Resonance Theory, Expectation Maximization, K-Means, Genetic Algorithms
  – Association: Apriori, ECLAT Algorithm, FP-Growth, One-attribute Rule, Zero-attribute Rule
• Reinforcement Learning
  – Q-Learning, Adaptive Heuristic Critic (AHC), State-Action-Reward-State-Action (SARSA), Genetic Algorithms, Gradient Descent

Page 66: Introduction to BI

BPM versus BI

• BPM is an outgrowth of BI and incorporates many of its technologies, applications, and techniques.
  – The same companies market and sell them.
  – BI has evolved so that many of the original differences between the two no longer exist (e.g., BI used to be focused on departmental rather than enterprise-wide projects).
  – BI is a crucial element of BPM.

• BPM = BI + Planning (a unified solution)

Page 67: Introduction to BI

Performance Measurement: KPIs and Operational Metrics

• Key performance indicator (KPI): A KPI represents a strategic objective and metric that measures performance against a goal

• Distinguishing features of KPIs
  – Strategy
  – Targets
  – Ranges
  – Encodings
  – Time frames
  – Benchmarks

Page 68: Introduction to BI

Performance Measurement

• Key performance indicator (KPI): Outcome KPIs (lagging indicators, e.g., revenues) vs. Driver KPIs (leading indicators, e.g., sales leads)

• Operational areas covered by driver KPIs
  – Customer performance
  – Service performance
  – Sales operations
  – Sales plan/forecast

Page 69: Introduction to BI

BPM Methodologies

• The meaning of “balance”
  – BSC is designed to overcome the limitations of systems that are financially focused
  – Nonfinancial objectives fall into one of three perspectives:
    1. Customer
    2. Internal business process
    3. Learning and growth

Page 70: Introduction to BI

BPM Methodologies

• In BSC, the term “balance” arises because the combined set of measures is supposed to encompass indicators that are:
  – Financial and nonfinancial
  – Leading and lagging
  – Internal and external
  – Quantitative and qualitative
  – Short term and long term

Page 71: Introduction to BI

BPM Methodologies

Strategy map: A visual display that delineates the relationships among the key organizational objectives for all four BSC perspectives

Page 72: Introduction to BI

Performance Dashboards

• Dashboards and scorecards both provide visual displays of important information that is consolidated and arranged on a single screen so that information can be digested at a single glance and easily explored

Page 73: Introduction to BI

Performance Dashboards

Page 74: Introduction to BI

Performance Dashboards

• Dashboards versus scorecards
  – Performance dashboards: Visual display used to monitor operational performance (free form)
  – Performance scorecards: Visual display used to chart progress against strategic and tactical goals and targets (predetermined measures)