Microsoft Technologies for Data Science Mark Tabladillo, Ph.D. Lead Data Scientist (Architect) Microsoft December 2016: SQL Saturday BI Atlanta, GA

Page 1: Microsoft Technologies for Data Science 201612

Microsoft Technologies for Data Science

Mark Tabladillo, Ph.D.

Lead Data Scientist (Architect)

Microsoft

December 2016: SQL Saturday BI Atlanta, GA

Page 2: Microsoft Technologies for Data Science 201612

Networking

Interactive

Page 5: Microsoft Technologies for Data Science 201612

Terms Definition

Data Science

Machine Learning

Data Mining

Applied Statistics

the automated or semi-automated process of discovering patterns in data

Applied scientific method

Page 6: Microsoft Technologies for Data Science 201612

http://www.kdnuggets.com/2016/06/r-python-top-analytics-data-mining-data-science-software.html

Page 8: Microsoft Technologies for Data Science 201612

https://azure.microsoft.com/en-us/documentation/learning-paths/data-science-process/

Page 9: Microsoft Technologies for Data Science 201612

https://azure.microsoft.com/en-us/documentation/learning-paths/data-science-process/

Page 10: Microsoft Technologies for Data Science 201612

https://azure.microsoft.com/en-us/documentation/learning-paths/data-science-process/

Page 13: Microsoft Technologies for Data Science 201612

Technology Choices

SQL SERVER ANALYSIS SERVICES: Enterprise, Business Intelligence

EXCEL ADD-IN FOR SSAS: Office 365, Office 2013 or Higher x64

SEMANTIC SEARCH: Enterprise, Business Intelligence, Standard, Web, Express with Advanced Services

MICROSOFT AZURE ML: Free (Size Limited), Paid (Web Service): Experiment + Query

F#: Open Source

SQL SERVER R SERVICES: SQL Server 2016 or higher

Page 15: Microsoft Technologies for Data Science 201612

http://download.microsoft.com/download/F/C/2/FC21C981-4351-4434-A78A-3384CA7515BF/SQL_Server_2016_Deeper_Insights_Across_Data_White_Paper.pdf

Page 16: Microsoft Technologies for Data Science 201612

SS

SQL

AS

NoSQL

Page 18: Microsoft Technologies for Data Science 201612

Data mining add-in for business analysts

• Ease of use

• Rich data mining

• Scalable

Page 22: Microsoft Technologies for Data Science 201612

Rowset Output with Scores

Varchar, NVarchar, Office, PDF

Page 23: Microsoft Technologies for Data Science 201612

Documents

Full-Text Keyword Index “FTI”

iFilters

Semantic Document Similarity Index “DSI”

Semantic Database

Semantic Key Phrase Index – Tag Index “TI”
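
The labels above name three kinds of index: a full-text keyword index, a key phrase (tag) index, and a document similarity index. The Python sketch below builds toy analogues of each to show roughly what such indexes hold. It is an illustration only and not how SQL Server Semantic Search is implemented: the function names, the stopword list, and the sample documents are all hypothetical, key phrases are approximated by the most frequent terms, and similarity is approximated by cosine similarity over raw term counts.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical, deliberately tiny stopword list for the demo.
STOPWORDS = {"the", "a", "of", "and", "in", "to", "is", "for", "with"}


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens that are not stopwords."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


def build_keyword_index(docs):
    """Toy keyword index: term -> set of ids of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def build_tag_index(docs, top_n=3):
    """Toy tag index: document id -> its top_n most frequent terms."""
    return {
        doc_id: [term for term, _ in Counter(tokenize(text)).most_common(top_n)]
        for doc_id, text in docs.items()
    }


def cosine(a, b):
    """Cosine similarity between two term-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def build_similarity_index(docs):
    """Toy similarity index: (id_a, id_b) -> cosine similarity, for id_a < id_b."""
    vectors = {doc_id: Counter(tokenize(text)) for doc_id, text in docs.items()}
    ids = sorted(vectors)
    return {
        (x, y): cosine(vectors[x], vectors[y])
        for i, x in enumerate(ids)
        for y in ids[i + 1:]
    }


# Hypothetical sample documents.
docs = {
    "d1": "Data mining discovers patterns in data",
    "d2": "Machine learning finds patterns in data",
    "d3": "Excel add-in for business analysts",
}

keyword_index = build_keyword_index(docs)
tag_index = build_tag_index(docs)
similarity_index = build_similarity_index(docs)

print(sorted(keyword_index["patterns"]))
print(tag_index["d1"])
print(similarity_index)
```

In this toy version the keyword index answers "which documents contain this term", the tag index answers "what is this document about", and the similarity index answers "which documents resemble each other".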

Page 24: Microsoft Technologies for Data Science 201612

Simplified Chinese

British English

Portuguese

Chinese (Hong Kong SAR, PRC)

Spanish

Chinese (Singapore)

Chinese (Macau SAR)

Page 25: Microsoft Technologies for Data Science 201612

Time in Seconds vs. Number of Documents

(2011 – K. Mukerjee, T. Porter, S. Gherman – Microsoft)

http://users.cis.fiu.edu/~lzhen001/activities/KDD2011Program/docs/p213.pdf

Page 30: Microsoft Technologies for Data Science 201612

Features: Microsoft R Open (R Distribution, Free), Microsoft R Client (Free), Microsoft R Server (Commercial)

Big Data
• Microsoft R Open: In-memory bound. Can only process datasets that fit into the available memory.
• Microsoft R Client: In-memory bound. Can process datasets that fit into the available memory. Operates on large volumes when connected to R Server.
• Microsoft R Server: Disk scalability. Operates on bigger volumes & factors.

Speed of Analysis
• Microsoft R Open: Multi-threaded when MKL is installed for non-ScaleR functions.
• Microsoft R Client: Multi-threaded with MKL for non-ScaleR functions. Up to 2 threads for ScaleR functions with a local compute context.
• Microsoft R Server: Full parallel threading & processing.

Enterprise Readiness
• Microsoft R Open: Community support
• Microsoft R Client: Community support
• Microsoft R Server: Commercial support

Analytic Breadth & Depth
• Microsoft R Open: 8000+ open source packages
• Microsoft R Client: Leverage & optimize open source R packages plus 'Big Data'-ready ScaleR packages
• Microsoft R Server: Leverage & optimize open source R packages plus 'Big Data'-ready + Multithreaded ready ScaleR packages

Commercial Viability
• Microsoft R Open: Risk of deployment to open source
• Microsoft R Client: Free for everyone
• Microsoft R Server: Commercial licenses

DeployR Enterprise
• Microsoft R Open: Not available
• Microsoft R Client: Not available
• Microsoft R Server: Included
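
The contrast between "in-memory bound" and "disk scalability" can be illustrated generically. The Python sketch below is not ScaleR and does not use any Microsoft R API; it only shows the general idea of processing a file in fixed-size chunks so that no more than one chunk is held in memory at a time. The function names, the CSV layout, and the chunk size are assumptions made for the example.

```python
import csv
import itertools
import os
import tempfile


def chunked_mean(path, column, chunk_rows=1000):
    """Mean of a numeric CSV column, holding at most chunk_rows rows in memory."""
    total = 0.0
    count = 0
    with open(path, newline="") as handle:
        reader = csv.DictReader(handle)
        while True:
            # Pull the next block of rows; an empty block means end of file.
            chunk = list(itertools.islice(reader, chunk_rows))
            if not chunk:
                break
            total += sum(float(row[column]) for row in chunk)
            count += len(chunk)
    if count == 0:
        raise ValueError("no rows found")
    return total / count


def write_demo_file(n_rows):
    """Write a demo CSV with one column 'value' holding 1..n_rows; return its path."""
    handle = tempfile.NamedTemporaryFile(
        "w", suffix=".csv", delete=False, newline=""
    )
    with handle:
        writer = csv.writer(handle)
        writer.writerow(["value"])
        for i in range(1, n_rows + 1):
            writer.writerow([i])
    return handle.name


# Hypothetical demo: 2500 rows read in chunks of 1000 rows.
demo_path = write_demo_file(2500)
demo_mean = chunked_mean(demo_path, "value", chunk_rows=1000)
os.remove(demo_path)
print(demo_mean)
```

Only running totals survive between chunks, which is why statistics such as sums, counts, and means extend naturally to data larger than memory; algorithms that need all rows at once require more work to adapt.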

Page 31: Microsoft Technologies for Data Science 201612

Microsoft R Server Editions (Description, Install ScaleR, Get Started)

• R Server for Hadoop: Scale your analysis transparently by distributing work across nodes without complex programming. Install ScaleR: Doc. Get Started: Doc.
• R Server for Teradata DB: Run advanced analytics in-database for seamless data analysis. Install ScaleR: Doc. Get Started: Doc.
• R Server for Linux: Bring predictive and prescriptive analytics power to your Linux environments. Install ScaleR: Doc. Get Started: Doc.

Page 32: Microsoft Technologies for Data Science 201612

http://datacamp.com

Page 36: Microsoft Technologies for Data Science 201612

Classic Open Source
• Mutable: Java
• Immutable: Scala

.NET (Now Open Source)
• Mutable: C#, C++, VB.NET
• Immutable: F#
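
The grouping above sorts languages by whether they lean toward mutable or immutable data. As a language-neutral illustration of that distinction (written in Python, which is not one of the languages on the slide), the sketch below contrasts updating an object in place with producing a modified copy of a frozen object. The class and function names are hypothetical.

```python
from dataclasses import FrozenInstanceError, dataclass, replace


@dataclass
class MutablePoint:
    """Fields can be reassigned after construction."""
    x: float
    y: float


@dataclass(frozen=True)
class ImmutablePoint:
    """Fields cannot be reassigned; changes produce a new object."""
    x: float
    y: float


def shift_in_place(point, dx):
    """Mutable style: modify the existing object."""
    point.x += dx
    return point


def shift_copy(point, dx):
    """Immutable style: leave the original untouched and return a new value."""
    return replace(point, x=point.x + dx)


m = MutablePoint(1.0, 2.0)
shift_in_place(m, 3.0)          # m itself now has x == 4.0

p = ImmutablePoint(1.0, 2.0)
q = shift_copy(p, 3.0)          # p is unchanged; q is a new object

try:
    p.x = 10.0                  # rejected because the dataclass is frozen
    assignment_blocked = False
except FrozenInstanceError:
    assignment_blocked = True

print(m, p, q, assignment_blocked)
```

Because the immutable value never changes after construction, it can be shared between functions or threads without defensive copying, which is one common argument for immutable-first languages.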

Page 41: Microsoft Technologies for Data Science 201612

https://www.microsoft.com/en-us/cloud-platform/what-is-cortana-intelligence-suite

Page 42: Microsoft Technologies for Data Science 201612

Capabilities and Products

Preconfigured solutions
• Capabilities: Business scenarios
• Products: Forecasting, churn, etc.

Intelligence
• Capabilities: Integration with Cortana; Bot services; Cognitive services
• Products: Cortana; Bot Framework; Cognitive Services

Dashboards and visualizations
• Capabilities: Dashboards and visualizations
• Products: Power BI

Machine learning and advanced analytics
• Capabilities: Machine learning; Hadoop; Distributed analytics; Complex event processing
• Products: Machine Learning; HDInsight (Data Lake service); Data Lake analytics; Stream Analytics

Big data stores
• Capabilities: Big Data repository; Elastic data warehouse
• Products: Data Lake store, Blobs; SQL Data Warehouse

Information management
• Capabilities: Data orchestration; Data catalog; Event ingestion
• Products: Data Factory; Data catalog; Event Hubs

Page 43: Microsoft Technologies for Data Science 201612

https://github.com/jakevdp/sklearn_pycon2015

Page 45: Microsoft Technologies for Data Science 201612

http://www.bing.com/explore/predicts

Page 46: Microsoft Technologies for Data Science 201612

https://techcrunch.com/2016/07/07/microsoft-now-helps-businesses-use-the-data-that-powers-bing-predicts/

Page 48: Microsoft Technologies for Data Science 201612

https://academy.microsoft.com/en-US/professional-degree/data-science/

https://borntolearn.mslearn.net/b/weblog/posts/announcing-the-microsoft-professional-degree-mpd-program

Page 49: Microsoft Technologies for Data Science 201612

http://www.kdnuggets.com/2015/09/free-data-science-books.html

Page 50: Microsoft Technologies for Data Science 201612

https://channel9.msdn.com/Blogs/Windows-Azure

https://mva.microsoft.com/

Page 51: Microsoft Technologies for Data Science 201612

http://blogs.technet.com/b/machinelearning/

http://social.msdn.microsoft.com/forums/azure/en-US/home?forum=MachineLearning

http://sqlserverdatamining.com

http://marktab.net

http://curah.microsoft.com/342704/azure-machine-learning-videos-february-2015

Page 52: Microsoft Technologies for Data Science 201612

http://datascience.sqlpass.org/

https://www.youtube.com/channel/UCqB3xWdwjA9soFV6EOu7qfg
