1 1
The Analyst’s Perspective: Ad-hoc Analysis with Microsoft PowerPivot and Office 2010 Excel
Rafal Lukawiecki, Strategic Consultant, Project Botticelli Ltd
2 2
Objectives
Introduce powerful self-service analysis with PowerPivot
Show use of Microsoft SQL Server 2008 Analysis Services Data Mining
The information herein is for informational purposes only and represents the opinions and views of Project Botticelli and/or Rafal Lukawiecki. The material presented is not certain and may vary based on several factors. Microsoft makes no warranties, express, implied or statutory, as to the information in this presentation. Portions © 2010 Project Botticelli Ltd & entire material © 2010 Microsoft Corp. Some slides contain quotations from copyrighted materials by other authors, as individually attributed or as already covered by Microsoft Copyright ownerships. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Project Botticelli Ltd as of the date of this presentation. Because Project Botticelli & Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft and Project Botticelli cannot guarantee the accuracy of any information provided after the date of this presentation. Project Botticelli makes no warranties, express, implied or statutory, as to the information in this presentation. E&OE.
This seminar is based on a number of sources, including a few dozen Microsoft-owned presentations, used with permission. Thank you to Chris Dial, Tara Seppa, Aydin Gencler, Ivan Kosyakov, Bryan Bredehoeft, Marin Bezic, and Donald Farmer with his entire team for all the support.
4 4
Massive Data Volumes
With a few mouse clicks, a user can create and publish intuitive and interactive self-service analysis solutions
7 7
Published Reports
SharePoint Farm
Report-Based Data Feeds
OLTP and OLAP Data Sources
Reporting Services as a Data Source
9 9
Share and Collaborate
With SharePoint:
Publish your PowerPivots as Web applications for your team
Schedule data refreshes to keep your analysis up-to-date
Manage security just like a document
10 10
PowerPivot Infrastructure Overview
SharePoint Farm
WFE
App Servers
Content dBs
NLB
Excel, RB, PerfPoint
Power User
Data Sources
Excel Services
PowerPivot Mid-Tier
AS Engine
Browser
Standard User
PowerPivot Add-In
11 11
PowerPivot Infrastructure: Excel
SharePoint Farm
WFE
App Servers
Content dBs
NLB
Excel Services
Gemini Mid-Tier
Gemini Engine
Browser
Standard User
Excel, RB, PerfPoint
Power User
Data Sources
• Uses the IMBI engine: an in-memory, column-based store
• Once data is imported, all calculations are performed on the client
• Excel now has its own local SSAS engine
• Adds Excel-style power functions for Gemini, called DAX (Data Analysis eXpressions)
• Uses a new compression algorithm to compress the data significantly, roughly 10:1
• Adds slicer functionality: not just for the UI but also for smoother SharePoint integration
PowerPivot Add-In
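The slides do not say which compression algorithm the engine uses. As a rough illustration of why a column-based store can compress well, the sketch below applies dictionary encoding, a technique commonly used in column stores, to a single column. The function names and the sample column are hypothetical, and this is not a description of PowerPivot's actual implementation.

```python
from typing import Dict, List, Tuple


def dictionary_encode(column: List[str]) -> Tuple[List[str], List[int]]:
    """Replace each value with a small integer index into a list of distinct values."""
    dictionary: List[str] = []
    index_of: Dict[str, int] = {}
    codes: List[int] = []
    for value in column:
        if value not in index_of:
            index_of[value] = len(dictionary)
            dictionary.append(value)
        codes.append(index_of[value])
    return dictionary, codes


def dictionary_decode(dictionary: List[str], codes: List[int]) -> List[str]:
    """Rebuild the original column from the dictionary and the integer codes."""
    return [dictionary[code] for code in codes]


# Hypothetical column with many repeated values, as is typical for a dimension attribute.
country = ["Ireland", "Ireland", "Poland", "Ireland", "Poland", "Poland"]
dictionary, codes = dictionary_encode(country)
restored = dictionary_decode(dictionary, codes)
```

Each distinct string is stored once, and the rows hold only small integers, which is where the saving comes from when a column has few distinct values.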
12 12
Excel, RB, PerfPoint
Power User
Data Sources
Browser
Standard User
SharePoint Farm
WFE
App Servers
Content dBs
NLB
Excel Services
PowerPivot Mid-Tier
AS Engine
PowerPivot SharePoint Integration: ECS Viewing
Excel Web Access
13 13
Excel, RB, PerfPoint
Power User
Data Sources
Browser
Standard User
SharePoint Farm
WFE
App Servers
Content dBs
NLB
Excel Services
PowerPivot Mid-Tier
AS Engine
PowerPivot SharePoint Integration: Server Action
Excel Web Access
14 14
Data Analysis Expressions (DAX)
Simple Excel-style formulas
Define new fields in the PivotTable field list
Enable Excel users to perform powerful data analysis using the skills they already have
Has elements of MDX but does not replace MDX
15 15
Data Analysis Expressions (DAX)
No notion of addressing individual cells or ranges
DAX functions refer to columns in the data
Sample DAX expression, and what it means:
=[First Name] & " " & [Last Name]
  String concatenation, just like Excel
=SUM(Sales[Amount])
  SUM function takes a column name instead of a range of cells
=RELATED(Product[Cost])
  New RELATED function follows the relationship between tables
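To make the column-oriented idea concrete, here is a minimal Python sketch of what a column aggregate and a relationship lookup do conceptually. The table contents, the `ProductID` key, and the helper names `column_sum` and `related` are assumptions made for illustration; this is not how the DAX engine is implemented.

```python
from typing import Dict, List

# Hypothetical tables, stored column by column rather than cell by cell.
product: Dict[str, List[float]] = {"ProductID": [1, 2], "Cost": [4.0, 7.5]}
sales: Dict[str, List[float]] = {"ProductID": [1, 2, 1], "Amount": [10.0, 20.0, 12.0]}


def column_sum(table: Dict[str, List[float]], column: str) -> float:
    """Aggregate a whole column, in the spirit of =SUM(Sales[Amount])."""
    return sum(table[column])


def related(key: float, lookup: Dict[str, List[float]],
            key_column: str, value_column: str) -> float:
    """Follow a relationship to another table, in the spirit of =RELATED(Product[Cost])."""
    position = lookup[key_column].index(key)
    return lookup[value_column][position]


total_amount = column_sum(sales, "Amount")
# One related cost per Sales row, as a calculated column would produce.
cost_per_sale = [related(pid, product, "ProductID", "Cost") for pid in sales["ProductID"]]
```

Neither helper takes a cell address or a range: both refer to a column by name, which is the point the slide makes.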
16 16
DAX Aggregation Functions
DAX implements aggregation functions from Excel, including SUM, AVERAGE, MIN, MAX, and COUNT, but instead of taking multiple arguments (a list of ranges), they take a reference to a column
DAX also adds some new aggregation functions which aggregate any expression over the rows of a table
SUMX (Table, Expression)
AVERAGEX (Table, Expression)
COUNTAX (Table, Expression)
MINX (Table, Expression)
MAXX (Table, Expression)
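The "X" functions can be read as: evaluate the expression once per row of the table, then aggregate the results. A minimal Python sketch of that reading follows. The row layout and the `Quantity` and `UnitPrice` columns are hypothetical, and details such as blank handling are not modelled.

```python
from typing import Callable, Dict, List

Row = Dict[str, float]


def sumx(table: List[Row], expression: Callable[[Row], float]) -> float:
    """Evaluate the expression for every row, then add up the results."""
    return sum(expression(row) for row in table)


def averagex(table: List[Row], expression: Callable[[Row], float]) -> float:
    """Evaluate the expression for every row, then take the arithmetic mean."""
    values = [expression(row) for row in table]
    return sum(values) / len(values)


def minx(table: List[Row], expression: Callable[[Row], float]) -> float:
    """Smallest value of the expression over the rows."""
    return min(expression(row) for row in table)


def maxx(table: List[Row], expression: Callable[[Row], float]) -> float:
    """Largest value of the expression over the rows."""
    return max(expression(row) for row in table)


# Hypothetical Sales table with no precomputed revenue column.
sales: List[Row] = [
    {"Quantity": 2, "UnitPrice": 5.0},
    {"Quantity": 1, "UnitPrice": 20.0},
    {"Quantity": 3, "UnitPrice": 10.0},
]


def revenue(row: Row) -> float:
    return row["Quantity"] * row["UnitPrice"]


total_revenue = sumx(sales, revenue)
average_revenue = averagex(sales, revenue)
smallest_sale = minx(sales, revenue)
largest_sale = maxx(sales, revenue)
```

The benefit over a plain column aggregate is that the per-row expression does not have to exist as a stored column first.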
17 17
More than 80 Excel Functions in DAX
Date and Time: DATE, DATEVALUE, DAY, EDATE, EOMONTH, HOUR, MINUTE, MONTH, NOW, SECOND, TIME, TIMEVALUE, TODAY, WEEKDAY, WEEKNUM, YEAR, YEARFRAC
Information: ISBLANK, ISERROR, ISLOGICAL, ISNONTEXT, ISNUMBER, ISTEXT
Logical: AND, IF, IFERROR, NOT, OR, FALSE, TRUE
Math and Trig: ABS, CEILING, ISO.CEILING, EXP, FACT, FLOOR, INT, LN, LOG, LOG10, MOD, MROUND, PI, POWER, QUOTIENT, RAND, RANDBETWEEN, ROUND, ROUNDDOWN, ROUNDUP, SIGN, SQRT, SUM, SUMSQ, TRUNC
Statistical: AVERAGE, AVERAGEA, COUNT, COUNTA, COUNTBLANK, MAX, MAXA, MIN, MINA
Text: CONCATENATE, EXACT, FIND, FIXED, LEFT, LEN, LOWER, MID, REPLACE, REPT, RIGHT, SEARCH, SUBSTITUTE, TRIM, UPPER, VALUE
18 18
Example: Functions over a Time Period
TotalMTD (Expression, Date_Column [, SetFilter])
TotalQTD (Expression, Date_Column [, SetFilter])
TotalYTD (Expression, Date_Column [, SetFilter] [,YE_Date])
OpeningBalanceMonth (Expression, Date_Column [,SetFilter])
OpeningBalanceQuarter (Expression, Date_Column [,SetFilter])
OpeningBalanceYear (Expression, Date_Column [,SetFilter] [,YE_Date])
ClosingBalanceMonth (Expression, Date_Column [,SetFilter])
ClosingBalanceQuarter (Expression, Date_Column [,SetFilter])
ClosingBalanceYear (Expression, Date_Column [,SetFilter] [,YE_Date])
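As a rough sketch of what the month-, quarter-, and year-to-date totals compute, the Python below sums an amount over all rows dated from the start of the period up to a chosen date. It assumes a calendar year ending on 31 December and does not model the optional SetFilter or YE_Date arguments. The function name `total_to_date` and the sample data are hypothetical.

```python
from datetime import date
from typing import List, Tuple


def total_to_date(rows: List[Tuple[date, float]], as_of: date, period: str) -> float:
    """Sum amounts dated from the start of the month, quarter, or year up to as_of."""
    if period == "year":
        start = date(as_of.year, 1, 1)
    elif period == "quarter":
        first_month = 3 * ((as_of.month - 1) // 3) + 1
        start = date(as_of.year, first_month, 1)
    elif period == "month":
        start = date(as_of.year, as_of.month, 1)
    else:
        raise ValueError("period must be 'month', 'quarter', or 'year'")
    return sum(amount for day, amount in rows if start <= day <= as_of)


# Hypothetical (date, amount) rows; the first and last fall outside every window below.
sales = [
    (date(2009, 12, 31), 100.0),
    (date(2010, 1, 15), 10.0),
    (date(2010, 4, 2), 20.0),
    (date(2010, 5, 20), 30.0),
    (date(2010, 6, 1), 40.0),
]
as_of = date(2010, 5, 31)
mtd = total_to_date(sales, as_of, "month")
qtd = total_to_date(sales, as_of, "quarter")
ytd = total_to_date(sales, as_of, "year")
```

In a PivotTable the "as of" date would come from the current filter context rather than from an explicit argument as it does here.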
22 22
Typical Uses
Data Mining
Seek Profitable Customers
Understand Customer Needs
Anticipate Customer Churn
Predict Sales & Inventory
Build Effective Marketing Campaigns
Detect and Prevent Fraud
Correct Data During ETL
23 23
Analysis Services Server
Mining Model
Data Mining Algorithm Data Source
Server Mining Architecture
Excel/Visio/SSRS/Your App
OLE DB/ADOMD/XMLA
Deploy
BIDS Excel Visio SSMS
App Data
24 24
Mining Model Mining Model Mining Model
Mining Process
DM Engine DM Engine
Training data
Data to be predicted
Mining Model
With predictions
25
Microsoft Decision Trees
Use for:
Classification: churn and risk analysis
Regression: predict profit or income
Association analysis based on multiple predictable variables
Builds one tree for each predictable attribute
Fast
27 27
Profitability and Risk
Finding what makes a customer profitable is also classification or regression
Typically solved with: Decision Trees (Regression), Linear Regression, and Neural Networks or Logistic Regression
Often used for prediction
Important to predict the probability of the predicted outcome, or the expected profit
Risk scoring: Logistic Regression and Neural Networks
28 28
Neural Network & Logistic Regression
Applied to: Classification, Regression
Great for finding complicated relationships among attributes
Difficult to interpret results
Gradient Descent method
LR is NNet with no hidden layers
Input Layer: Age, Education, Sex, Income
Hidden Layers
Output Layer: Loyalty
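The remark that logistic regression is a neural network with no hidden layers can be checked with a small forward-pass sketch. It assumes sigmoid activations, and the weights, bias, and scaled input values are made up for illustration. Training by gradient descent is not shown.

```python
import math
from typing import List, Tuple

Layer = Tuple[List[List[float]], List[float]]  # (one weight vector per unit, one bias per unit)


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def forward(inputs: List[float], layers: List[Layer]) -> List[float]:
    """Feed inputs through each layer in turn; every unit applies a sigmoid to a weighted sum."""
    activations = inputs
    for weights, biases in layers:
        activations = [
            sigmoid(sum(w * a for w, a in zip(unit_weights, activations)) + bias)
            for unit_weights, bias in zip(weights, biases)
        ]
    return activations


def logistic_regression(inputs: List[float], weights: List[float], bias: float) -> float:
    """Textbook logistic regression: sigmoid of a weighted sum of the inputs."""
    return sigmoid(sum(w * x for w, x in zip(weights, inputs)) + bias)


# Hypothetical scaled inputs in the order: age, education, sex, income.
customer = [0.4, 0.7, 1.0, 0.5]
weights = [0.8, -0.3, 0.1, 1.2]
bias = -0.5

# A network whose only layer is the output layer, with a single "loyalty" unit.
network_output = forward(customer, [([weights], [bias])])[0]
lr_output = logistic_regression(customer, weights, bias)
```

With no hidden layer the two computations are the same formula, so the outputs agree. Adding hidden layers to `layers` is what lets the network capture more complicated relationships, at the cost of interpretability.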
30
Time Series
Uses:
Forecast sales
Inventory prediction
Web hits prediction
Stock value estimation
Regression trees with extras
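The slide gives no detail on the algorithm itself. As a loose illustration of the underlying autoregressive idea, predicting the next value from earlier values of the same series, the sketch below fits a single straight-line regression of each value on the previous one. This is a plain AR(1) least-squares fit chosen for brevity, not the Microsoft Time Series algorithm, and the data is invented.

```python
from typing import List, Tuple


def fit_ar1(series: List[float]) -> Tuple[float, float]:
    """Least-squares fit of value[t] = intercept + slope * value[t - 1]."""
    previous = series[:-1]
    current = series[1:]
    n = len(previous)
    mean_prev = sum(previous) / n
    mean_curr = sum(current) / n
    variance = sum((p - mean_prev) ** 2 for p in previous)
    covariance = sum((p - mean_prev) * (c - mean_curr) for p, c in zip(previous, current))
    slope = covariance / variance
    intercept = mean_curr - slope * mean_prev
    return intercept, slope


def forecast(series: List[float], steps: int) -> List[float]:
    """Roll the fitted relationship forward, feeding each prediction back in."""
    intercept, slope = fit_ar1(series)
    last = series[-1]
    predictions = []
    for _ in range(steps):
        last = intercept + slope * last
        predictions.append(last)
    return predictions


# Hypothetical monthly sales figures with a steady upward trend.
monthly_sales = [2.0, 4.0, 6.0, 8.0, 10.0]
next_two = forecast(monthly_sales, 2)
```

A regression tree variant would split the history into segments and fit a separate relationship of this kind in each leaf.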
32 32
Data Mining Techniques
Algorithm: Description
Decision Trees: Finds the odds of an outcome based on values in a training set
Association Rules: Identifies relationships between cases
Clustering: Classifies cases into distinctive groups based on any attribute sets
Naïve Bayes: Clearly shows the differences in a particular variable for various data elements
Sequence Clustering: Groups or clusters data based on a sequence of previous events
Time Series: Analyzes and forecasts time-based data, combining the power of ARTXP (developed by Microsoft Research) for short-term predictions with ARIMA (in SQL 2008) for long-term accuracy
Neural Nets: Seeks to uncover non-intuitive relationships in data
Linear Regression: Determines the relationship between columns in order to predict an outcome
Logistic Regression: Determines the relationship between columns in order to evaluate the probability that a column will contain a specific state
34 34
Summary
Self-service analysis is now very powerful
Works with huge data sets
PowerPivot for columnar and multidimensional analysis
Data Mining for pattern discovery
To start, all you need is PowerPivot, Excel 2010, and perhaps SQL Analysis Services
35 35
© 2010 Microsoft Corporation & Project Botticelli Ltd. All rights reserved.