From data to business advantage

Rafal Lukawiecki, Strategic Consultant

Project Botticelli Ltd

rafal@projectbotticelli.com

@rafaldotnet

Objectives

The information herein is for informational purposes only and represents the opinions and views of Project Botticelli and/or Rafal Lukawiecki. The material presented is not certain and may vary based on several factors. Microsoft makes no warranties, express, implied or statutory, as to the information in this presentation.

Portions © 2014 Project Botticelli Ltd & entire material © 2014 Microsoft Corp unless noted otherwise. Some slides contain quotations from copyrighted materials by other authors, as individually attributed or as already covered by Microsoft Copyright ownerships. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Project Botticelli Ltd as of the date of this presentation. Because Project Botticelli & Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft and Project Botticelli cannot guarantee the accuracy of any information provided after the date of this presentation. Project Botticelli makes no warranties, express, implied or statutory, as to the information in this presentation. E&OE.

Microsoft data dividend formula: [data + analytics + people] @ speed

Microsoft transformation

Mobile-first Cloud-first

Data-driven

Water pumps

Outbreaks

Public domain picture. Wikimedia. http://commons.wikimedia.org/wiki/File:Snow-cholera-map.jpg

Transformative opportunity

Business analytics

Good BI is key to business analytics

Integrate + Cleanse: Power Query

Model + Enrich: Power Pivot

Visualise: Power View & Power Map

Query: Power BI Q&A

Share: Power BI Sites + SharePoint

Power BI for cloud collaboration and new experiences

Excel as the BI tool for everyone

Self-service, cloud

IT

SharePoint + SQL/APS: IT scalability & control

Excel user-driven

Corporate self-service, on-prem

Power BI for cloud collaboration and new experiences

Hybrid

IT

SharePoint + SQL/APS: IT scalability & control

Big data, or just complex data?

velocity

variety complexity

volume

Data

interpreting, preparing

Today’s big data, tomorrow’s little data

Complexity vs. current capabilities

FAA International Flight Service Station, Honolulu, Hawaii, 1964 (Public Domain Image)

So… what is big data?

Machine learning

Machine learning = data mining?

Best together

Data wrangling (munging), retrieval

+ storage

Data mining & machine learning

Statistics

Big data

Domain: Common big data scenarios

Financial services: Modeling true risk; threat analysis and fraud detection; trade surveillance; credit scoring and analysis

Media & Entertainment: Recommendation engines; ad targeting; search quality; abuse and click fraud detection

Retail: Point of sales transaction analysis; customer churn analysis; sentiment analysis

Telecommunications: Customer churn prevention; network performance optimization; Call Detail Record (CDR) analysis; network failure prediction

Government: Cyber security (botnets, fraud); traffic congestion and re-routing; environmental monitoring; antisocial monitoring via social media

Healthcare: Genomics research; cancer research; health pandemics early detection; air quality monitoring

Do you need it?

Process

Understand & change

data

Discover patterns, build & validate models

Change business

People

Data expert

Data scientist

Domain expert

Start of an engagement

Data sucks, slightly

Are there any useful patterns?

Unclear business goals

Example: fraud

Does enough data show examples of fraud?

Are there predictable patterns of fraud?

Can we reduce fraud?

What is fraud?
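The first of these questions can be checked before any modelling starts. The sketch below is only an illustration: the function name `label_summary`, the `min_positive` threshold of 100 and the sample labels are all assumptions made up for this example, not figures from the talk.

```python
from collections import Counter


def label_summary(labels, min_positive=100):
    """Summarise how many known fraud cases a labelled data set contains.

    labels: iterable of truthy/falsy values (truthy = known fraud).
    min_positive: hypothetical rule-of-thumb threshold, not a standard.
    """
    counts = Counter(bool(x) for x in labels)
    total = counts[True] + counts[False]
    rate = counts[True] / total if total else 0.0
    return {
        "total": total,
        "fraud": counts[True],
        "fraud_rate": rate,
        "enough_examples": counts[True] >= min_positive,
    }


# Made-up labels: 3 known fraud cases in 1,000 transactions.
labels = [True] * 3 + [False] * 997
summary = label_summary(labels)
print(summary)
```

With so few positive cases a model has very little to learn from, which is one reason the question "What is fraud?" has to be settled with the domain expert first.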

In-house intelligence

Understand & change

data

Discover patterns, build & validate models

Change business

What tools do data scientists use?

Purple = data analyst role

SQL 42%!

#1 data science tool

Chart from "2013 Data Science Salary Survey" (ISBN 978-1-491-94914-6)

© 2014 O'Reilly Media, used with permission.

For more info, and great titles on data science, visit oreilly.com

My analytical toolkit at Project Botticelli

main tools

secondary

only if I can’t avoid it

might try

My toolkit (chronologically)

○ SQL Server

○ DB engine

○ SSAS for data mining

○ Excel

○ now + Power Query

○ R and RStudio

○ Stats

○ Great charts

○ Curve fitting

○ Rattle for data mining

○ Mahout in Hadoop

○ HDP, HDInsight, or just *nix Hadoop

○ Evaluating H2O + Spark now

○ Python 3

○ PyCharm IDE on OS X

○ Visual Studio with Microsoft Python tools on Windows

○ Azure ML

○ Data mining

Algorithm: Description

Decision Trees: Finds the odds of an outcome based on values in a training set, presents visually

Association Rules: Identifies relationships between cases

Clustering: Classifies cases into distinctive groups based on any attribute sets

Naïve Bayes: Clearly shows the differences in a particular variable for various data elements

Sequence Clustering: Groups or clusters data based on a sequence of previous events

Time Series: Analyzes and forecasts time-based data combining the power of ARTXP (developed by Microsoft Research) for short-term predictions with ARIMA for long-term accuracy.

Neural Nets: Seeks to uncover non-intuitive relationships in data

Linear Regression: Determines the relationship between columns in order to predict an outcome

Logistic Regression: Determines the relationship between columns in order to evaluate the probability that a column will contain a specific state
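To make the logistic regression entry concrete, here is a minimal sketch in plain Python. It is not how SSAS or any other product implements the algorithm; it fits a one-variable model by batch gradient descent, and the data (number of support calls versus whether a customer churned) is invented for illustration.

```python
import math


def sigmoid(z):
    """Map any real number to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(xs, ys, lr=0.1, epochs=2000):
    """Fit p(y=1 | x) = sigmoid(w*x + b) by batch gradient descent on log-loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(w * x + b) - y  # prediction error for this case
            grad_w += err * x
            grad_b += err
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b


# Invented training set: x = support calls, y = 1 if the customer churned.
xs = [0, 1, 2, 3, 4, 5, 6, 7]
ys = [0, 0, 0, 1, 0, 1, 1, 1]

w, b = fit_logistic(xs, ys)
p_low = sigmoid(w * 0 + b)   # estimated churn probability with 0 calls
p_high = sigmoid(w * 7 + b)  # estimated churn probability with 7 calls
print(f"w={w:.2f} b={b:.2f} p(0 calls)={p_low:.2f} p(7 calls)={p_high:.2f}")
```

The output is a probability for the "churned" state rather than a hard yes/no, which is what distinguishes this row from the linear regression row above it.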

Algorithm: Description

Random Forests: Like decision tree, but can be more accurate, and difficult to understand

Boosting: Like random forest, but using any other algorithm (not just DT), “boosts” model accuracy for less frequent items

Survival Analysis: Finds the risk of an outcome given periods of time

Ensemble: Combination of multiple models (in SQL or R)
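One simple way to combine multiple models is majority voting. The sketch below uses Python rather than the SQL or R mentioned in the table, and the three rule-based "models" and the transaction fields are hypothetical stand-ins for real trained models.

```python
from collections import Counter


def majority_vote(models, case):
    """Ask every model for a prediction and return the most common answer."""
    votes = [model(case) for model in models]
    return Counter(votes).most_common(1)[0][0]


# Three hypothetical stand-in models; real ones would be trained on data.
def by_amount(tx):
    return "fraud" if tx["amount"] > 1000 else "ok"


def by_country(tx):
    return "fraud" if tx["country"] != tx["home_country"] else "ok"


def by_hour(tx):
    return "fraud" if tx["hour"] < 6 else "ok"


models = [by_amount, by_country, by_hour]

# Invented transactions for illustration.
night_tx = {"amount": 2500, "country": "CZ", "home_country": "CZ", "hour": 3}
day_tx = {"amount": 50, "country": "CZ", "home_country": "CZ", "hour": 14}

print(majority_vote(models, night_tx))  # two of three models vote "fraud"
print(majority_vote(models, day_tx))    # all three models vote "ok"
```

An odd number of two-class models avoids ties. Weighted votes or averaged probabilities are common alternatives when the models differ in quality.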

Microsoft tech for big data

Prebuilt & performance-tuned appliance

Linear scale-out to petabytes of data

MPP design & in-memory columnstore for 100x speed improvement

Dedicated region for Hadoop

Joining relational & non-relational data with PolyBase

Analytics Platform System (APS)

MPP SQL Server

Hadoop

PolyBase

Apache Hadoop distribution

Developed by Hortonworks & Microsoft

Integrated with Microsoft BI

Microsoft HDInsight

Part 1: the job

Big, fast, or complex data

HDInsight

Tabular

OLAP

SQL

Interaction, exploration, reporting, visualisation

APS + PolyBase

Hadoop cluster

Yahoo! Hadoop cluster, about 2007.

Source: http://developer.yahoo.com. Picture used with permission.

Hadoop cluster

Buster Cluster, an early research project

by Miles Osborne, University of

Edinburgh, School of Informatics.

Picture used with permission.

http://homepages.inf.ed.ac.uk/miles/

Cloud rent-a-Hadoop-cluster, or:

“Supercomputer for cents”

Windows Azure HDInsight

Processing logic in HDInsight 3.0 & 3.1

Hadoop 2.2/2.4: interactive, online, stream, or batch (Hadoop 1.x was batch processing only)
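Batch processing in Hadoop 1.x followed the MapReduce pattern: map each record to key/value pairs, group the pairs by key, then reduce each group. The sketch below imitates those three phases in memory with plain Python. It does not use any Hadoop or HDInsight API, and the function names and sample lines are made up for illustration.

```python
from collections import defaultdict


def map_phase(records):
    """Map: emit a (word, 1) pair for every word in every input line."""
    for line in records:
        for word in line.lower().split():
            yield word, 1


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: collapse each key's list of values into a single total."""
    return {key: sum(values) for key, values in groups.items()}


# Made-up input lines standing in for files stored on a cluster.
lines = ["big data", "complex data", "Big Data tomorrow"]
counts = reduce_phase(shuffle_phase(map_phase(lines)))
print(counts)
```

On a real cluster the map and reduce steps run in parallel on many nodes and the shuffle moves data between them, which is why a job of this shape runs as a batch rather than interactively.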

Hadoop data science

Collaborative filtering, recommenders, clustering, singular value decomposition, parallel frequent pattern mining, naive Bayes, decision tree
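As an illustration of the first item, here is a small user-based collaborative filtering sketch in plain Python. It is not Mahout code and makes no claim about how Mahout implements recommenders; the users, items and ratings are invented, and cosine similarity is just one common choice of similarity measure.

```python
import math
from collections import defaultdict


def cosine(a, b):
    """Cosine similarity of two rating dicts; unrated items count as 0."""
    dot = sum(a[item] * b[item] for item in set(a) & set(b))
    norm_a = math.sqrt(sum(r * r for r in a.values()))
    norm_b = math.sqrt(sum(r * r for r in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def recommend(ratings, user, top_n=1):
    """Rank items the user has not rated by similarity-weighted average rating."""
    scores = defaultdict(float)
    weights = defaultdict(float)
    for other, other_ratings in ratings.items():
        if other == user:
            continue
        sim = cosine(ratings[user], other_ratings)
        if sim <= 0:
            continue  # ignore users with nothing in common
        for item, rating in other_ratings.items():
            if item not in ratings[user]:
                scores[item] += sim * rating
                weights[item] += sim
    predicted = {item: scores[item] / weights[item] for item in scores}
    return sorted(predicted, key=predicted.get, reverse=True)[:top_n]


# Invented ratings on a 1 to 5 scale.
ratings = {
    "ann": {"A": 5, "B": 4},
    "bob": {"A": 5, "B": 5, "C": 5, "D": 1},
    "eve": {"A": 1, "C": 2, "D": 5},
}
print(recommend(ratings, "ann"))
```

Ann's ratings resemble Bob's far more than Eve's, so Bob's high rating of item C dominates the prediction. A distributed implementation computes the same kind of similarity over far larger rating matrices.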

Part 2: the results

Azure ML = data science in

Turning data into advantage

Summary

projectbotticelli.com

BI video tutorials, PPTs, and articles

15% Off: 15PRAGUE2014

Valid until end of November 2014

Follow: @rafaldotnet

Email: rafal@projectbotticelli.com

Discover: rafal.net
