Microsoft SQL Server 2016 R Services

Microsoft SQL Server 2016 R Servicesdownload.microsoft.com/.../0324SQLserver2016_session3.pdf · Data import –Delimited, fixed, SAS, SPSS, OBDC ... Gaussian, inverse Gaussian, Poisson,


Page 1

Microsoft SQL Server 2016 R Services

Page 2

SQL Server 2016: Everything built-in

Consistent experience from on-premises to cloud

Self-service BI per user: Microsoft $120, Tableau $480, Oracle $2,230

In-memory across all workloads, at massive scale: built-in

[Chart comparing SQL Server, Oracle, MySQL and SAP HANA over six periods, with TPC-H rankings: SQL Server #1, #2 and #3; Oracle #5]

Page 3

From data to decisions and actions

Data

Decisions and actions

Value: $1.6 trillion

Page 4

Microsoft advanced analytics products

Cortana Analytics Suite

SQL Server 2016

Page 5

A typical advanced analytics lifecycle

Preparation: Ingest → Transform → Explore

Modeling: Model ƒ(x)

Production: Deploy → Score → Visualize → Measure

Page 6

Data scientists should be focused on creating and testing models

Data scientist

Preparation: Ingest → Transform → Explore

Modeling: Model ƒ(x)

Production: Deploy → Score → Visualize → Measure

Page 7

But the reality is...

Data scientist focus time

Preparation: Ingest → Transform → Explore

Modeling: Model ƒ(x)

Production: Deploy → Score → Visualize → Measure

Focus-time shares shown on the slide: 80%, 5%, 15%

Page 8

Advanced analytics is a team sport

Preparation

Model

Put into production

Decide

Page 9

What is R?

Open-source “lingua franca” for analytics, computing and modeling

Global community: millions of users, 7,000+ packages

Big data ecosystem

Scalability

Page 10

CRAN: The Comprehensive R Archive Network

Open Source “lingua franca”

Analytics, Computing, Modeling

Besides CRAN, R packages are also distributed through Bioconductor, GitHub and other sources

Page 11

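Why R?

A large pool of talent already knows how to use it

Scales computation to where the data resides

Easier to secure critical data

Efficient creation and use across roles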

Page 12

Challenges of open-source R

Uncertain total cost of ownership and return on investment

Integrating R with existing and ever-changing data infrastructure

Scale and performance

Data movement limits efficient access to data for modeling

Page 13

Benefits of Microsoft R (open-source R compared with Microsoft R)

Big data. Open-source R: in-memory bound. Microsoft R: hybrid memory and disk scalability. Benefit: operates on bigger volumes and factors.

Speed of analysis. Open-source R: single-threaded. Microsoft R: parallel threading and processing. Benefit: shrinks analysis time.

Enterprise readiness. Open-source R: community support. Microsoft R: commercial support. Benefit: delivers full-service production support.

Analytic breadth and depth. Open-source R: 7,000+ innovative analytic packages. Microsoft R: leverages and optimizes open-source packages, plus big-data-ready packages. Benefit: supercharges R.

Commercial viability. Open-source R: risk in deploying open source. Microsoft R: commercial licenses. Benefit: eliminates risk with open source.

Page 14

Faster And More Scalable

Page 15

Custom parallelization

PEMA-R API

rxDataStep

rxExec
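The PEMA idea is to split a computation into per-chunk work whose partial results can be merged in any order, which lets one algorithm run in parallel and on data larger than memory. The Python sketch below illustrates that shape for a simple mean. The class, its method names, the chunk size and the thread pool are illustrative assumptions, not the PEMA-R API or its signatures.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Iterable, Iterator, List


class PemaMean:
    """Hypothetical PEMA-style mean (illustrative only; not the PEMA-R API)."""

    def initialize(self) -> None:
        # Every partial result starts empty.
        self.total = 0.0
        self.count = 0

    def process_data(self, chunk: List[float]) -> None:
        # Sees one chunk only, so the full data set never has to fit in memory.
        self.total += sum(chunk)
        self.count += len(chunk)

    def update_results(self, other: "PemaMean") -> None:
        # Merge another worker's partial result; the order of merges does not matter.
        self.total += other.total
        self.count += other.count

    def process_results(self) -> float:
        # Final computation once all partial results are combined.
        return self.total / self.count


def iter_chunks(values: Iterable[float], size: int) -> Iterator[List[float]]:
    """Yield fixed-size chunks, standing in for blocks read from disk."""
    chunk: List[float] = []
    for value in values:
        chunk.append(value)
        if len(chunk) == size:
            yield chunk
            chunk = []
    if chunk:
        yield chunk


def run_pema(values: Iterable[float], chunk_size: int = 4, workers: int = 2) -> float:
    def work(chunk: List[float]) -> PemaMean:
        partial = PemaMean()
        partial.initialize()
        partial.process_data(chunk)
        return partial

    result = PemaMean()
    result.initialize()
    # Chunks are processed on a thread pool; partial results are merged one by one.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for partial in pool.map(work, iter_chunks(values, chunk_size)):
            result.update_results(partial)
    return result.process_results()


print(run_pema(range(1, 11)))  # 5.5
```

Because update_results only adds totals and counts, the same class works whether chunks are processed sequentially, on threads, or on separate machines.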

Data step

Data import: delimited, fixed-format, SAS, SPSS, ODBC

Variable creation & transformation

Recode variables

Factor variables

Missing value handling

Sort, merge, split

Aggregate by category (means, sums)
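A data step of this kind can run one chunk at a time as long as each aggregate is kept as running sums and counts. A minimal Python sketch, assuming rows arrive as dictionaries in chunks with hypothetical `age` and `amount` fields: it recodes age into a two-level factor, skips rows with a missing amount, and returns the mean amount per category.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def data_step(chunks: Iterable[List[dict]]) -> Dict[str, float]:
    """Chunked recode, missing-value filter and mean by category (hypothetical fields)."""
    sums: Dict[str, float] = defaultdict(float)
    counts: Dict[str, int] = defaultdict(int)
    for chunk in chunks:
        for row in chunk:
            amount = row.get("amount")
            if amount is None:
                # Missing-value handling: drop rows without an amount.
                continue
            # Recode: turn numeric age into a two-level factor.
            group = "under_40" if row["age"] < 40 else "40_plus"
            sums[group] += amount
            counts[group] += 1
    # Aggregate by category: only running sums and counts were kept in memory.
    return {group: sums[group] / counts[group] for group in sums}


sample_chunks = [
    [{"age": 25, "amount": 10.0}, {"age": 52, "amount": None}],
    [{"age": 33, "amount": 30.0}, {"age": 61, "amount": 50.0}],
]
print(data_step(sample_chunks))  # {'under_40': 20.0, '40_plus': 50.0}
```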

Descriptive statistics

Min/max, mean, median (approx.)

Quantiles (approx.)

Standard deviation

Variance

Correlation

Covariance

Sum of squares (cross-product matrix for set variables)

Pairwise cross tabs

Risk ratio & odds ratio

Cross-tabulation of data (standard tables & long form)

Marginal summaries of cross tabulations
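Risk ratio and odds ratio both come from a 2×2 cross-tabulation of an exposure against an outcome. With counts a (exposed, event), b (exposed, no event), c (unexposed, event) and d (unexposed, no event), RR = [a/(a+b)] / [c/(c+d)] and OR = (a·d)/(b·c). The plain-Python sketch below computes both; representing each observation as an (exposed, event) pair of booleans is an assumption for illustration.

```python
from collections import Counter
from typing import Iterable, Tuple


def cross_tab(pairs: Iterable[Tuple[bool, bool]]) -> Counter:
    """Pairwise cross-tabulation of two binary variables: counts of (exposed, event)."""
    return Counter(pairs)


def risk_and_odds_ratio(table: Counter) -> Tuple[float, float]:
    a = table[(True, True)]    # exposed, event
    b = table[(True, False)]   # exposed, no event
    c = table[(False, True)]   # unexposed, event
    d = table[(False, False)]  # unexposed, no event
    # Assumes no empty cells; a zero in b, c or a row total would divide by zero.
    risk_ratio = (a / (a + b)) / (c / (c + d))
    odds_ratio = (a * d) / (b * c)
    return risk_ratio, odds_ratio


pairs = [(True, True)] * 20 + [(True, False)] * 80 + [(False, True)] * 10 + [(False, False)] * 90
print(risk_and_odds_ratio(cross_tab(pairs)))  # (2.0, 2.25)
```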

Statistical tests

Chi-squared test

Kendall Rank Correlation

Fisher’s Exact Test

Student’s t-Test
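As an illustration of the first test in this list, a Pearson chi-squared test of independence on a 2×2 table compares observed counts with the counts expected from the row and column totals. This plain-Python sketch omits the Yates continuity correction and uses the fact that, with one degree of freedom, the upper-tail probability is erfc(√(x/2)). It is not the RevoScaleR implementation.

```python
import math
from typing import List, Tuple


def chi_squared_test(table: List[List[float]]) -> Tuple[float, float]:
    """Pearson chi-squared test of independence on a 2x2 table, no continuity correction."""
    if len(table) != 2 or any(len(row) != 2 for row in table):
        raise ValueError("this sketch only handles 2x2 tables (1 degree of freedom)")
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence: row total * column total / n.
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    # With 1 degree of freedom, P(chi2 > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


stat, p = chi_squared_test([[20, 80], [10, 90]])
print(round(stat, 4))  # 3.9216
```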

Sampling

Subsample (observations & variables)

Random sampling

Predictive models

Sum of squares (cross-product matrix for set variables)

Multiple linear regression

Generalized linear models (GLM): exponential-family distributions (binomial, Gaussian, inverse Gaussian, Poisson, Tweedie); standard link functions (cauchit, identity, log, logit, probit); user-defined distributions and link functions

Covariance & correlation matrices

Logistic regression

Classification & regression trees

Predictions/scoring for models

Residuals for all models
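The cross-product matrix listed under predictive models is what makes chunked linear regression possible: X'X and X'y are sums over rows, so each chunk can add its contribution independently, and the coefficients follow from solving (X'X)β = X'y. A minimal plain-Python sketch under that assumption; it uses no regularization, expects well-conditioned data, and substitutes Gauss-Jordan elimination for a production solver.

```python
from typing import Iterable, List, Sequence, Tuple

Matrix = List[List[float]]
Chunk = Sequence[Tuple[Sequence[float], float]]  # (features, target) rows


def accumulate(chunk: Chunk, xtx: Matrix, xty: List[float]) -> None:
    """Add one chunk's contribution to X'X and X'y (intercept column included)."""
    for features, target in chunk:
        row = [1.0, *features]
        for i, ri in enumerate(row):
            xty[i] += ri * target
            for j, rj in enumerate(row):
                xtx[i][j] += ri * rj


def solve(a: Matrix, b: List[float]) -> List[float]:
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_linear_regression(chunks: Iterable[Chunk], n_features: int) -> List[float]:
    size = n_features + 1
    xtx = [[0.0] * size for _ in range(size)]
    xty = [0.0] * size
    for chunk in chunks:  # only X'X and X'y stay in memory
        accumulate(chunk, xtx, xty)
    return solve(xtx, xty)  # [intercept, slope_1, ..., slope_k]


# y = 1 + 2x, split across two chunks
line_chunks = [[((0.0,), 1.0), ((1.0,), 3.0)], [((2.0,), 5.0), ((3.0,), 7.0)]]
print([round(c, 6) for c in fit_linear_regression(line_chunks, n_features=1)])  # [1.0, 2.0]
```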

Simulation

Simulation (e.g., Monte Carlo)

Parallel random number generation

Cluster analysis

K-Means
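K-means fits the same chunked pattern: one iteration assigns each point to its nearest center and only needs per-cluster sums and counts, which can be accumulated chunk by chunk. A minimal Python sketch of Lloyd's algorithm with caller-supplied initial centers; the `read_chunks` callable, which stands in for re-reading data from disk on each pass, is an assumption.

```python
import math
from typing import Callable, Iterable, List, Sequence, Tuple

Point = Tuple[float, ...]


def nearest(point: Point, centers: Sequence[Point]) -> int:
    """Index of the closest center (Euclidean distance)."""
    return min(range(len(centers)), key=lambda k: math.dist(point, centers[k]))


def kmeans_step(chunks: Iterable[Sequence[Point]], centers: Sequence[Point]) -> List[Point]:
    """One Lloyd iteration: assign points chunk by chunk, then recompute centers."""
    dim = len(centers[0])
    sums = [[0.0] * dim for _ in centers]
    counts = [0] * len(centers)
    for chunk in chunks:
        for point in chunk:
            k = nearest(point, centers)
            counts[k] += 1
            for d in range(dim):
                sums[k][d] += point[d]
    # A center with no assigned points stays where it was.
    return [
        tuple(s / counts[k] for s in sums[k]) if counts[k] else tuple(centers[k])
        for k in range(len(centers))
    ]


def kmeans(read_chunks: Callable[[], Iterable[Sequence[Point]]],
           centers: Sequence[Point], iterations: int = 10) -> List[Point]:
    """Repeat the step; read_chunks re-reads the data on every pass."""
    current = list(centers)
    for _ in range(iterations):
        current = kmeans_step(read_chunks(), current)
    return current


points_by_chunk = [[(0, 0), (0, 1)], [(10, 10), (10, 11)]]
print(kmeans(lambda: points_by_chunk, [(0, 0), (0, 1)], iterations=5))  # [(0.0, 0.5), (10.0, 10.5)]
```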

Classification

Decision trees

Decision forests

Gradient-boosted decision trees

Naïve Bayes

Parallelized, Remote Executing Algorithms

Page 16

In-database advanced analytics

Data scientist: interacts directly with data

SQL developer/DBA: manages data and analytics together

Built into SQL Server 2016: relational data, analytics library, T-SQL interface, R integration, extensibility

Example solutions: sales forecasting, warehouse efficiency, predictive maintenance, credit risk protection

Real-time operational analytics without moving data

R with in-memory scalability
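In SQL Server 2016 the T-SQL entry point for R is the sp_execute_external_script stored procedure. As a runnable stand-in for the idea of scoring without moving data, the sketch below uses Python's built-in sqlite3 module instead: a hypothetical linear model is registered as a SQL function, so scoring happens inside the query and only the aggregate leaves the database. SQLite, the `readings` table and the coefficients are assumptions for illustration.

```python
import sqlite3

# Hypothetical model coefficients, assumed to have been trained elsewhere.
INTERCEPT, SLOPE = 1.0, 2.0


def score_row(x: float) -> float:
    """Linear model evaluated row by row inside the database engine."""
    return INTERCEPT + SLOPE * x


conn = sqlite3.connect(":memory:")
conn.create_function("score", 1, score_row)  # expose the model as a SQL function
conn.execute("CREATE TABLE readings (id INTEGER PRIMARY KEY, x REAL)")
conn.executemany("INSERT INTO readings (x) VALUES (?)", [(0.5,), (1.5,), (3.0,)])

# Scoring runs inside the query; only the single aggregate value is returned.
avg_score = conn.execute("SELECT AVG(score(x)) FROM readings").fetchone()[0]
print(avg_score)
conn.close()
```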

Page 17

[Chart: rows processed versus minutes, comparing external access with in-database execution]

Page 18

Flexibility & agility

Write once, deploy anywhere: no model rewrites across platforms, and no rewrites from modeling to scoring

Hybrid modeling & scoring: model on premises, score on premises; model on premises, score in the cloud; model in the cloud, score on premises

Prepare, model and score in SQL Server, with parallelized models
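A minimal sketch of the "no rewrites from modeling to scoring" idea, assuming the model can be reduced to a small serializable artifact: one side fits and serializes it, the other loads it and calls the same scoring function. JSON and the one-predictor least-squares model are illustrative choices, not how Microsoft R persists models.

```python
import json
from typing import Dict, List, Sequence


def train(xs: Sequence[float], ys: Sequence[float]) -> Dict[str, float]:
    """Ordinary least squares with one predictor."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return {"intercept": mean_y - slope * mean_x, "slope": slope}


def score(model: Dict[str, float], xs: Sequence[float]) -> List[float]:
    """The same scoring code runs wherever the model artifact is loaded."""
    return [model["intercept"] + model["slope"] * x for x in xs]


# Modeling side (e.g. on premises): fit and serialize.
payload = json.dumps(train([0, 1, 2, 3], [1, 3, 5, 7]))

# Scoring side (e.g. in the cloud): load the artifact and score.
model = json.loads(payload)
print(score(model, [4, 5]))  # [9.0, 11.0]
```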

Page 19

Some Microsoft R customers

Industries: financial services; digital media & retail; healthcare & pharma; government & academia; analytics service providers; manufacturing & high tech

Page 20

SQL Server 2016 R Services (in-database)

In-database analytics

Parallel threading and processing

Easy to operationalize

Developers, DBAs and data scientists can use their preferred tools

Model on premises and score in the cloud, or vice versa

Overcomes memory limitations, enabling analysis of larger data sets

Included in SQL Server 2016

Reuse and optimization of existing R code

Reduced recoding and training costs
