dashDB Advanced Warehouse Analytics in the Cloud Torsten Steinbach Armin Stegerer © 2014 IBM Corporation

IBM Insight 2014 - Advanced Warehouse Analytics in the Cloud


Page 1: IBM Insight 2014 - Advanced Warehouse Analytics in the Cloud

dashDB Advanced Warehouse Analytics in the Cloud

Torsten Steinbach, Armin Stegerer

© 2014 IBM Corporation

Page 2: IBM Insight 2014 - Advanced Warehouse Analytics in the Cloud

Please Note

• IBM’s statements regarding its plans, directions, and intent are subject to change or withdrawal without notice at IBM’s sole discretion.

• Information regarding potential future products is intended to outline our general product direction and it should not be relied on in making a purchasing decision.

• The information mentioned regarding potential future products is not a commitment, promise, or legal obligation to deliver any material, code or functionality. Information about potential future products may not be incorporated into any contract.

• The development, release, and timing of any future features or functionality described for our products remains at our sole discretion.

Performance is based on measurements and projections using standard IBM benchmarks in a controlled environment. The actual throughput or performance that any user will experience will vary depending upon many factors, including considerations such as the amount of multiprogramming in the user’s job stream, the I/O configuration, the storage configuration, and the workload processed. Therefore, no assurance can be given that an individual user will achieve results similar to those stated here.


Page 3: IBM Insight 2014 - Advanced Warehouse Analytics in the Cloud

Disclaimer

© Copyright IBM Corporation 2014. All rights reserved. U.S. Government Users Restricted Rights - Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM Corp.

THE INFORMATION CONTAINED IN THIS PRESENTATION IS PROVIDED FOR INFORMATIONAL PURPOSES ONLY. WHILE EFFORTS WERE MADE TO VERIFY THE COMPLETENESS AND ACCURACY OF THE INFORMATION CONTAINED IN THIS PRESENTATION, IT IS PROVIDED “AS IS” WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED. IN ADDITION, THIS INFORMATION IS BASED ON IBM'S CURRENT PRODUCT PLANS AND STRATEGY, WHICH ARE SUBJECT TO CHANGE BY IBM WITHOUT NOTICE. IBM SHALL NOT BE RESPONSIBLE FOR ANY DAMAGES ARISING OUT OF THE USE OF, OR OTHERWISE RELATED TO, THIS PRESENTATION OR ANY OTHER DOCUMENTATION. NOTHING CONTAINED IN THIS PRESENTATION IS INTENDED TO, NOR SHALL HAVE THE EFFECT OF, CREATING ANY WARRANTIES OR REPRESENTATIONS FROM IBM (OR ITS SUPPLIERS OR LICENSORS), OR ALTERING THE TERMS AND CONDITIONS OF ANY AGREEMENT OR LICENSE GOVERNING THE USE OF IBM PRODUCTS AND/OR SOFTWARE.

IBM's statements regarding its plans, directions, and intent are subject to change or withdrawal without notice at IBM's sole discretion. Information regarding potential future products is intended to outline our general product direction and it should not be relied on in making a purchasing decision. The information mentioned regarding potential future products is not a commitment, promise, or legal obligation to deliver any material, code or functionality. Information about potential future products may not be incorporated into any contract. The development, release, and timing of any future features or functionality described for our products remains at our sole discretion.

IBM, the IBM logo, ibm.com, Information Management, DB2, DB2 Connect, DB2 OLAP Server, pureScale, System Z, Cognos, solidDB, Informix, Optim, InfoSphere, and z/OS are trademarks or registered trademarks of International Business Machines Corporation in the United States, other countries, or both. If these and other IBM trademarked terms are marked on their first occurrence in this information with a trademark symbol (® or ™), these symbols indicate U.S. registered or common law trademarks owned by IBM at the time this information was published. Such trademarks may also be registered or common law trademarks in other countries. A current list of IBM trademarks is available on the Web at “Copyright and trademark information” at www.ibm.com/legal/copytrade.shtml

Other company, product, or service names may be trademarks or service marks of others.

Page 4: IBM Insight 2014 - Advanced Warehouse Analytics in the Cloud

• The Analytics Challenge
• Bringing Analytics to the Data
• Analytics with dashDB
• Predictive Analytics with R
• GeoSpatial Analytics with ESRI
• Analytic Extension Framework

Page 5: IBM Insight 2014 - Advanced Warehouse Analytics in the Cloud

Data is the Basis of New Competitive Advantage

Increasing investment: 71% of analytics customers plan to increase their analytics budgets within the next 2 years (1)

Faster ROI: 13 to 1. Analytics pays back US$13.01 for every dollar spent – 1.2 times more than it did 3 years ago (2)

Data warehouses will get you there: over 90% of big data implementations will augment, not replace, existing data warehouses (3)

(1) “2014 Analytics Market Survey,” research note, Nucleus Research, September 2014.
(2) “Analytics Pays Back $13.01 for Every Dollar Spent,” research note, Nucleus Research, September 2014.
(3) "Predicts 2014: Why You Should Modernize Your Information Infrastructure", November 28, 2013. Gartner.

Page 6: IBM Insight 2014 - Advanced Warehouse Analytics in the Cloud


The Analytics Challenge

FRAUD DETECTION, BUYING BEHAVIOR, CROSS-SELLING, HEALTH RISK ASSESSMENT, PORTFOLIO MANAGEMENT, DIGITAL MARKETING, STORE PLACEMENT, ROUTE OPTIMIZATION, PRODUCT PRICING, NEAREST SHOP

Telco, Health, Banking, Insurance, Retail, Transportation, Government, Manufacturing

Big Data

Reading the data into the analytic tools

Page 7: IBM Insight 2014 - Advanced Warehouse Analytics in the Cloud

Advanced Analytics Is Much More than OLAP or Calculating Statistics

Source: Wiki:: CRISP-DM Reference Model

Page 8: IBM Insight 2014 - Advanced Warehouse Analytics in the Cloud

Most Time is Spent in Data Discovery and Preparation

Source: Rexer Analytics Data Miner Survey 2008

Some more recent sources claim this to be up to 60-70%

Page 9: IBM Insight 2014 - Advanced Warehouse Analytics in the Cloud


Data + Data > 2 x Data

Public Data
• Weather
• News
• Stocks
• Social Media
• ...

Enterprise Data
• Orders
• CRM
• Master Data
• Operations
• ...

Systems of Engagement
• IoT
• Mobile Apps
• Cloud Apps

Correlation of Structured Data

Combining various data in a DW can be a fusion reactor for analytics
• Speed to market
• Improved accuracy
• Lower cost

Optimal ROI of in-db Analytics through overall reduction of systems, not data movements, improved utilization and the power of mature structured data processing

Page 10: IBM Insight 2014 - Advanced Warehouse Analytics in the Cloud

In-db Analytics provides support for all phases of the analytical process

Legend:
* Fuzzy Logix DB Lytix capabilities
+ Netezza Analytics and Fuzzy Logix DB Lytix capabilities

Transformations
• Data Profiling / Descriptive Statistics+
• General Diagnostics
• Statistics+
• Sampling
• Data prep

Mathematical
• Basic Math*
• Permutation and Combination*
• Greatest Common Divisor and Least Common Multiple*
• Conversion of Values*
• Exponential and Logarithm*
• Gamma and Beta Functions
• Matrix Algebra+
• Area Under Curve*
• Interpolation Methods*

Time Series
• Autoregressive+
• Forecasting*

Predictive
• Linear Regression+
• Logistic Regression+
• Classification
• Bayesian
• Sampling
• Model Testing

Geospatial
• Geospatial Data Type
• Geometric Functions
• Geometric Analysis

Statistics
• Descriptive Statistics+
• Distance Measures*
• Hypothesis Testing*
• Chi-Square & Contingency Tables*
• Univariate & Multivariate Distributions+
• Monte Carlo Simulation*

Data Mining
• Association Rules+
• Clustering+
• Feature Extraction+
• Discriminant Analysis*

Page 11: IBM Insight 2014 - Advanced Warehouse Analytics in the Cloud

• The Analytics Challenge
• Bringing Analytics to the Data
• Analytics with dashDB
• Predictive Analytics with R
• GeoSpatial Analytics with ESRI
• Analytic Extension Framework

Page 12: IBM Insight 2014 - Advanced Warehouse Analytics in the Cloud

The One Big Reason for In-Database Analytics: Bring Analytics to the Data

• Scalable and high-performance analytics -> Analytics Accelerator
  – Shorten response times
  – Scale analyzed data volume
  (both by two to three digit factors)
• The secret sauce
  – Data Proximity: avoid moving data to analytic tools
  – Scale-out: Run code on the MPP architecture of the WH engine
  – Talk the language of the user and the application developer: R, SQL, Java, Python, C++, LUA, etc.
  – Flexible runtime model: scalar, aggregate or table functions, external executables
  – Coverage: wide variety of algorithms and operators out-of-the-box: Predictive, Statistical and GeoSpatial Analytics
• Complements analytic tools … because it allows them to accelerate and scale their analytics
  – SPSS, R, SAS, ESRI, FuzzyLogix, Zementis, Aginity, …

Page 13: IBM Insight 2014 - Advanced Warehouse Analytics in the Cloud

IBM Netezza Analytics Ecosystem

PureData for Analytics: AMPP Platform

Software Development Kit
• User-Defined Extensions (UDF, UDA, UDTF, UDAP)
• Language Support (Map/Reduce, Java, R, Python, Lua, Perl, C, C++, Fortran)

Netezza In-Database Analytics
• Transformations
• Mathematical
• Geospatial
• Predictive
• Statistics
• Time Series
• Data Mining

3rd Party In-Database Analytics
• Fuzzy Logix
• SAS
• Zementis
• IBM SPSS
• Mathworks
• Open Source R
• Esri

Other labels in the diagram
• BI Tools
• Visualization Tools
• Eclipse
• Open Source R
• SAS
• IBM SPSS
• Apache Hadoop
• Cloudera
• IBM InfoSphere BigInsights
• IBM InfoSphere Streams

Netezza Analytics is one of the leaders for in-database analytics, making Netezza an attractive platform for users and third-party vendors in the predictive analytics space

Page 14: IBM Insight 2014 - Advanced Warehouse Analytics in the Cloud

Analytic Code & Algorithms:

Analytic Data:

Data pulled out and processed in analytic application

Analytic Applications

This is where we start from: All analytic processing done on application side

Analytics of Warehouse Data

Page 15: IBM Insight 2014 - Advanced Warehouse Analytics in the Cloud

SQLs

Analytic Code & Algorithms:

Analytic Data:

Simple data lookup & massage operations pushed down as SQL operations

Analytic Applications

Benefit: Acceleration with no SQL skills required

SQLs

Push Down Step 1: BLU tables only logically represented in analytic application

Accelerate Analytics for Warehouse Data

Page 16: IBM Insight 2014 - Advanced Warehouse Analytics in the Cloud

SQLs

Analytic Code & Algorithms:

Analytic Data:

Call built-in functions via SQL to execute typical algorithms inside db

Cloud Tooling

Analytic Applications

Benefit: Bring Standard Analytics to the Data

SQLs

Canned Algorithms

Push Down Step 2: Typical and popular algorithms pushed down to canned UDFs in the db

Accelerate Analytics for Warehouse Data

Page 17: IBM Insight 2014 - Advanced Warehouse Analytics in the Cloud

Language Framework (UDX & AE)

Analytic Code & Algorithms:

Analytic Data:

Deploy customer code and call via special SQL function interfaces

SQLs

Canned Algorithms

Analytic Applications

Benefit: Bring Custom Analytics to the Data

Push Down Step 3: Execute entire customer analytic programs inside the db

Accelerate Analytics for Warehouse Data

Page 18: IBM Insight 2014 - Advanced Warehouse Analytics in the Cloud

• The Analytics Challenge
• Bringing Analytics to the Data
• Analytics with dashDB
• Predictive Analytics with R
• GeoSpatial Analytics with ESRI
• Analytic Extension Framework

Page 19: IBM Insight 2014 - Advanced Warehouse Analytics in the Cloud

Modernize existing Data Warehousing with on-demand cloud agility

Embrace the concept of the logical data warehouse by combining cloud and on-premises deployments

Faster insight without the up front infrastructure investment

Full support for hybrid “ground to cloud” deployments

Organizations gaining competitive advantage through cloud adoption are reporting 2x revenue growth and 2.5x higher gross profit as compared to peer companies who are more cautious about cloud computing (1)

• 77% of enterprises are in the initial stages of cloud adoption (2)
• 84% of CIOs cut application costs by moving to the cloud (2)
• 58% of IT Decision Makers think cloud solutions give them better control of their data (3)

(1) http://www-03.ibm.com/press/us/en/pressrelease/42304.wss
(2) http://www.huffingtonpost.com/vala-afshar/the-top-100-cloud-computi_b_3756172.html
(3) http://www.businesswire.com/news/home/20100722005325/en/Cloud-Computing-Delivering-Promise-Doubts-Hold-Adoption#.UufrRKX0B8Y

Cloud is Essential to the Modern Data Warehouse

Page 20: IBM Insight 2014 - Advanced Warehouse Analytics in the Cloud

IBM’s Analytics Cloud Service Ecosystem

DataWorks
• Enable self service access & integration of multiple data sources
• Simplified tools to prepare, refine & secure data
• Open application programming interfaces for application development
• On-premise and cloud / internal & external data

dashDB
• Rapid deployment of large scale data warehouses
• Enables scaling of both volume and processing speed
• Unified architecture that enables hybrid data processing, on-premise & in the cloud
• In-database analytic capabilities for the best analytic performance

Watson Analytics
• Cloud-based predictive & cognitive analytics discovery platform
• Designed for business use
• Integrated social collaboration
• Freemium to enterprise versions

Page 21: IBM Insight 2014 - Advanced Warehouse Analytics in the Cloud


dashDB

Page 22: IBM Insight 2014 - Advanced Warehouse Analytics in the Cloud

dashDB – Available With Three Deployment Choices

• Enterprise Plan
• Dedicated infrastructure
• Terabyte-scale capacity
• Closed Beta for qualified accounts

• Deploy within Bluemix cloud-based environment for analytics and warehousing services
• Ingest data from a wide variety of sources
• In-database analytics included
• Pay as you go
• Rapid Deployment

• Auto-provisioning from Cloudant management GUI
• Built-in automated synchronization from Cloudant JSON data stores
• Built-in analytics for Cloudant data
• Pay as you go
• Rapid Deployment

Page 23: IBM Insight 2014 - Advanced Warehouse Analytics in the Cloud


dashDB Entry Plan

Bare Metal: 4x8 core, 256GB RAM
– 2x 500GB HDD: /root, /opt, /etc
– 12x 200GB SSD: /mnt/bludata0
– Legacy iSCSI drives (detachable), mounted at /mnt/blumeta0
– Swift Object Storage: Backup and Metadata
– Data Center 1Gbps connection

1TB local HDD has the OS installed with necessary binaries and scripts stored.
– /mnt/blutmp0 (16GB swap space)
– /opt, /etc, /usr, /bin …

1.2TB local SSD
– /mnt/bludata0 – used for database

Legacy iSCSI drives are used to store DB2 database and configuration.
– /mnt/blumeta0 – used for configuration

Backups are stored in Swift Object Storage
– Run commands to backup and restore
– Backs up from iSCSI LUNs to Swift
– Restores from Swift to iSCSI LUNs

Data Center
– Guardium (Shared): Public Shared 8 Core, 16GB RAM, 100GB SAN
– DSM (Shared): Public Shared 16 Core, 64GB RAM, 1000GB SAN

Page 24: IBM Insight 2014 - Advanced Warehouse Analytics in the Cloud


dashDB Enterprise 1TB Plan

VM #1: Public Shared 16 core @ 2.0GHz, 64GB RAM
– 100GB SAN1 (OS): /root, /opt, /etc
– 1TB SAN2 (detachable), mounted at /mnt/bludata0 and /mnt/bludata0/blumeta0
– Swift Object Storage: Backup and Metadata
– Data Center 1Gbps connection

100GB SAN1 has the OS installed with necessary binaries and scripts stored.
– /mnt/blutmp0 (16GB swap space)
– /opt, /etc, /usr, /bin …

1TB SAN2 holds the database and configuration for DB2.
– /mnt/bludata0 – used for database
– /mnt/blumeta0 -> /mnt/bludata0/blumeta0 – used for configuration

Backups are stored in Swift Object Storage
– Backs up from SAN2 to Swift
– Restores from Swift to SAN2

Data Center
– Guardium (Shared): Public Shared 8 Core, 16GB RAM, 100GB SAN
– DSM (Shared): Public Shared 16 Core, 64GB RAM, 1000GB SAN

Page 25: IBM Insight 2014 - Advanced Warehouse Analytics in the Cloud


dashDB Enterprise 4TB Plan – Compute Optimized

Bare Metal: 32 core, 256GB RAM
– 2x 500GB HDD: /root, /opt, /etc
– 4TB Consistent Perf Storage, mounted at /mnt/bludata0 and /mnt/blumeta0
– Swift Object Storage: Backup and Metadata
– Data Center 10 Gbps connection

1TB local HDD has the OS installed with necessary binaries and scripts stored.
– /mnt/blutmp0 (16GB swap space)
– /opt, /etc, /usr, /bin …

4TB Consistent Performance Storage 6K IOPs holds the database and configuration for DB2.
– /mnt/bludata0 – used for database
– /mnt/blumeta0 – used for configuration

Backups are stored in Swift Object Storage
– Backs up from Consistent Perf Storage to Swift
– Restores from Swift to Consistent Perf Storage

Data Center
– Guardium (Shared): Public Shared 8 Core, 16GB RAM, 100GB SAN
– DSM (Shared): Public Shared 16 Core, 64GB RAM, 1000GB SAN

Page 26: IBM Insight 2014 - Advanced Warehouse Analytics in the Cloud


dashDB Enterprise 12TB Plan – Storage Optimized

Bare Metal: 32 core, 256GB RAM
– 2x 500GB HDD: /root, /opt, /etc
– 12TB Consistent Perf Storage, mounted at /mnt/bludata0 and /mnt/blumeta0
– Swift Object Storage: Backup and Metadata
– Data Center 10 Gbps connection

1TB local HDD has the OS installed with necessary binaries and scripts stored.
– /mnt/blutmp0 (16GB swap space)
– /opt, /etc, /usr, /bin …

12TB Consistent Performance Storage 6K IOPs holds the database and configuration for DB2.
– /mnt/bludata0 – used for database
– /mnt/blumeta0 – used for configuration

Backups are stored in Swift Object Storage
– Backs up from Consistent Perf Storage to Swift
– Restores from Swift to Consistent Perf Storage

Data Center
– Guardium (Shared): Public Shared 8 Core, 16GB RAM, 100GB SAN
– DSM (Shared): Public Shared 16 Core, 64GB RAM, 1000GB SAN

Page 27: IBM Insight 2014 - Advanced Warehouse Analytics in the Cloud


Server Outage Availability Scenario for dashDB Enterprise 4TB & 12 TB

Bare Metal #1 - Primary: 32 core, 256GB RAM
– 2x 500GB HDD: /root, /opt, /etc
– 4/12TB Consistent Perf Storage, mounted at /mnt/bludata0 and /mnt/blumeta0
– Swift Object Storage: Backup and Metadata
– Data Center 10 Gbps connection

Bare Metal #2 - Standby: 32 core, 256GB RAM
– 2x 500GB HDD: /root, /opt, /etc

When primary server (BM #1) fails, its Perf Consistent iSCSI volume is re-mapped from primary server (BM #1) to standby server (BM #2)

Backups are stored in Swift Object Storage
– Backs up from Consistent Perf Storage to Swift
– Restores from Swift to Perf Consistent iSCSI

Data Center
– Guardium (Shared): Public Shared 8 Core, 16GB RAM, 100GB SAN
– DSM (Shared): Public Shared 16 Core, 64GB RAM, 1000GB SAN

Page 28: IBM Insight 2014 - Advanced Warehouse Analytics in the Cloud


Coming Up Soon: Initial dashDB MPP Offering

• Probably 8 partitions per node of initial cluster
• 2 TB storage per cluster node
• One node comparable to the 4TB SMP offering
  – bare metal, 16 cores, 128 or 256 GB memory, local storage
• Smallest cluster offered: 3 nodes, i.e. 6 TB
• Grow in one node steps, up to 10 nodes (i.e. 20 TB)
  – Distributing entire MLNs of initial cluster instead of redistributing data
• Larger MPP offerings are going to be rolled out in a second phase
• All this might still change until we release

Page 29: IBM Insight 2014 - Advanced Warehouse Analytics in the Cloud

We Bring The Same Compatible Analytic Platform from Netezza to the Cloud

Application Integration
• R Wrapper
• Watson Analytics
• ESRI ArcGIS Connector
• Analytics Applications of ISVs and Customers
• …

Canned Analytics
• OLAP Functions: ROW_NUMBER, RANK, DENSE_RANK, LAG, LEAD, …
• STDDEV, COVAR, …
• Linear Regression
• Kmeans Clustering
• Decision Tree
• Association Rules
• Naive Bayes
• Spatial Operators: Contains, Touches, Within, Intersects, Crosses, Overlaps, …

Analytic Extension Framework
• UDX C++ API
• AE Framework: In-DB R, In-DB LUA, In-DB Python, In-DB Perl

Page 30: IBM Insight 2014 - Advanced Warehouse Analytics in the Cloud

• The Analytics Challenge
• Bringing Analytics to the Data
• Analytics with dashDB
• Predictive Analytics with R
• GeoSpatial Analytics with ESRI
• Analytic Extension Framework

Page 31: IBM Insight 2014 - Advanced Warehouse Analytics in the Cloud

Predictive Analytics With R

• Very popular language for statisticians and data miners

    > val1 <- c(23,54,100,134,200,252,311)
    > val2 <- sqrt(val1)
    > lm_vals <- lm(val1~val2)
    > summary(lm_vals)

    Residuals:
          1       2       3       4       5       6       7
     23.480  -3.052 -16.814 -18.330 -10.170   2.785  22.102

    Coefficients:
                Estimate Std. Error t value Pr(>|t|)
    (Intercept) -108.570     20.645  -5.259   0.0033 **
    val2          22.538      1.667  13.523 3.96e-05 ***
    :
    > plot(lm_vals)

• Built-in support for graphs and charting; large set of mathematical and statistical packages due to extensibility and a very active community

• Data Frames: tables of data maintained in memory of R runtime

    > col1 <- c(23,54,100)
    > col2 <- c("xyz", "abc", "123")
    > col3 <- c(TRUE, FALSE, TRUE)
    > myDf <- data.frame(col1, col2, col3)

• Data frames can be populated from DB tables via RODBC package

    > library(RODBC)
    > myconn <- odbcConnect("mydsn", uid="db2inst1", pwd="secret")
    > myDf <- sqlQuery(myconn, "select * from employees")
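The coefficients printed by `summary(lm_vals)` can be cross-checked without an R runtime, because a regression with a single predictor has a closed-form least-squares solution. The following C++ sketch is only an illustration under that assumption: the names `LinearFit`, `fitLine` and `fitSlideExample` are invented here and belong neither to R nor to dashDB.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Result of fitting y = intercept + slope * x by ordinary least squares.
struct LinearFit {
    double intercept;
    double slope;
};

// Closed-form simple linear regression with one predictor.
LinearFit fitLine(const std::vector<double>& x, const std::vector<double>& y) {
    assert(x.size() == y.size() && x.size() >= 2);
    const double n = static_cast<double>(x.size());

    double sumX = 0.0, sumY = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        sumX += x[i];
        sumY += y[i];
    }
    const double meanX = sumX / n;
    const double meanY = sumY / n;

    // Centered sums of squares and cross products.
    double sxx = 0.0, sxy = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        sxx += (x[i] - meanX) * (x[i] - meanX);
        sxy += (x[i] - meanX) * (y[i] - meanY);
    }

    const double slope = sxy / sxx;
    return LinearFit{meanY - slope * meanX, slope};
}

// Same model as the R session: val1 regressed on val2 = sqrt(val1).
LinearFit fitSlideExample() {
    const std::vector<double> val1 = {23, 54, 100, 134, 200, 252, 311};
    std::vector<double> val2;
    for (double v : val1) {
        val2.push_back(std::sqrt(v));
    }
    return fitLine(val2, val1);
}
```

Calling `fitSlideExample()` gives a slope of about 22.538 and an intercept of about -108.570, which agrees with the `Estimate` column of the R output.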

Page 32: IBM Insight 2014 - Advanced Warehouse Analytics in the Cloud

Predictive Analytics With R In dashDB 1/3

• Built-in R runtime & R Studio

• ibmdbR package

Data frames logically representing data physically residing in Dynamite tables

    > con <- idaConnect("BLUDB", "", "")
    > idaAnalyticsInit(con)
    > sysusage <- ida.data.frame('DB2INST1.SHOWCASE_SYSUSAGE')
    > systems <- ida.data.frame('DB2INST1.SHOWCASE_SYSTEMS')
    > systypes <- ida.data.frame('DB2INST1.SHOWCASE_SYSTYPES')

Push down of R data preparation to Dynamite

    > sysusage2 <- sysusage[sysusage$MEMUSED>50000,c("MEMUSED","USERS")]
    > mergedSys <- idaMerge(systems, systypes, by='TYPEID')
    > mergedUsage <- idaMerge(sysusage2, mergedSys, by='SID')

Push down of analytic algorithms to in-db execution

    > lm1 <- idaLm(MEMUSED~USERS, mergedUsage)
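The idea of a data frame that only logically represents a table can be sketched in a few lines: filtering and column selection merely rewrite a query description, and SQL text is produced when a result is needed, so no rows travel to the client in between. The C++ below is a hypothetical illustration; `LazyFrame` and its methods are invented for this sketch and say nothing about how ibmdbR is actually implemented.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a data frame that describes a table
// instead of holding its rows.
struct LazyFrame {
    std::string table;                 // fully qualified table name
    std::vector<std::string> columns;  // empty means "all columns"
    std::string predicate;             // empty means "no row filter"

    // Column selection returns a new description; nothing is executed.
    LazyFrame select(const std::vector<std::string>& cols) const {
        LazyFrame next = *this;
        next.columns = cols;
        return next;
    }

    // Row filters are combined with AND; nothing is executed.
    LazyFrame where(const std::string& cond) const {
        LazyFrame next = *this;
        next.predicate = predicate.empty()
            ? cond
            : "(" + predicate + ") AND (" + cond + ")";
        return next;
    }

    // Only here is the accumulated description turned into SQL text,
    // which would then be sent to the database in one round trip.
    std::string toSql() const {
        assert(!table.empty());
        std::string cols;
        for (std::size_t i = 0; i < columns.size(); ++i) {
            if (i > 0) cols += ", ";
            cols += columns[i];
        }
        if (cols.empty()) cols = "*";
        std::string sql = "SELECT " + cols + " FROM " + table;
        if (!predicate.empty()) sql += " WHERE " + predicate;
        return sql;
    }
};
```

With this sketch, a frame over `DB2INST1.SHOWCASE_SYSUSAGE` filtered by `MEMUSED > 50000` and reduced to `MEMUSED` and `USERS` turns into a single `SELECT ... WHERE ...` statement, which mirrors the `sysusage2` step in the R session.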

[Diagram labels: R Studio (Browser), Any R Runtime, ibmdbR]

Page 33: IBM Insight 2014 - Advanced Warehouse Analytics in the Cloud

Predictive Analytics With R In dashDB 2/3

Dynamite-native implementation of statistical functions
• colnames, cor, cov, dim, head, length, max, mean, min, names, print, sd, summary, var

Logically derived columns pushed down to Dynamite

    > myDF <- ida.data.frame('DB2INST1.SHOWCASE_SYSUSAGE')
    > myDF$MemPerUser <- myDF$MEMUSED / myDF$USERS

Sampling of tables in Dynamite

    > idaSample(myDF, 3)
      SID DATE                       USERS MEMUSED ALERT MemPerUser
    1   8 2014-02-14 23:39:00.000000    34    5015     f        147
    2   5 2014-01-22 07:52:00.000000    96   11512     f        119
    3   7 2013-09-12 05:17:00.000000    39    5592     t        143

Statistics about tables in Dynamite

    > summary(myDF)
         SID             USERS              MEMUSED             ALERT          MemPerUser
     Min.   :0.000   Min.   :  3.000   Min.   :  350.000   f   :3655563   Min.   :105.000
     1st Qu.:2.000   1st Qu.: 35.000   1st Qu.: 5113.000   t   :1344437   1st Qu.:135.000
     Median :4.500   Median : 64.000   Median : 9455.000   NA's:     NA   Median :150.000
     Mean   :   NA   Mean   :     NA   Mean   :       NA                  Mean   :     NA
     3rd Qu.:7.000   3rd Qu.:111.000   3rd Qu.:16517.000                  3rd Qu.:165.000
     Max.   :9.000   Max.   :347.000   Max.   :62379.000                  Max.   :209.000

Statistics about categorical values

    > idaTable(myDF)
    ALERT
          f       t
    3655563 1344437

Page 34: IBM Insight 2014 - Advanced Warehouse Analytics in the Cloud

Predictive Analytics With R In dashDB 3/3

Store R objects in Dynamite database

    > myPrivateObjects <- ida.list(type='private')
    > myPrivateObjects['series100'] <- 1:100
    > x <- myPrivateObjects['series100']
    > x
     [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22
    [23] 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44
    [45] 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66
    [67] 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88
    [89] 89 90 91 92 93 94 95 96 97 98 99 100
    > names(myPrivateObjects)
    [1] "series100"
    > myPrivateObjects['series100'] <- NULL

Manage Dynamite tables

    > idaExistTable('DB2INST1.SHOWCASE_SYSUSAGE')
    [1] TRUE
    > idaShowTables()
        Schema                   Name    Owner Type
    1 BLUADMIN      R_OBJECTS_PRIVATE BLUADMIN    T
    2 BLUADMIN R_OBJECTS_PRIVATE_META BLUADMIN    T
    3 BLUADMIN       R_OBJECTS_PUBLIC BLUADMIN    T
    4 BLUADMIN  R_OBJECTS_PUBLIC_META BLUADMIN    T
    > myView <- idaCreateView(myDF)
    > idaIsView(myView)
    [1] TRUE
    > idaDropView(myView)
    > idaIsView(myView)
    [1] FALSE

Page 35: IBM Insight 2014 - Advanced Warehouse Analytics in the Cloud

• The Analytics Challenge
• Bringing Analytics to the Data
• Analytics with dashDB
• Predictive Analytics with R
• GeoSpatial Analytics with ESRI
• Analytic Extension Framework

Page 36: IBM Insight 2014 - Advanced Warehouse Analytics in the Cloud

The Power of Place

• Spatial Awareness is a dramatically increasing property of big data due to mobile computing and Internet of Things

• Spatial Insight is directly available in dashDB through built-in spatial data type and operators, for instance:
  – WITHIN – E.g.: Show me the clients that are affected by a power outage!
  – OVERLAPS – E.g.: Which of my cell phone customers are at risk of cell tower service outage due to upcoming tornados?
  – TOUCHES – E.g.: Give me the neighboring ZIP areas per customer for customized marketing campaigns!
  – DISTANCE – E.g.: List the top 5 closest stores!
  – DISJOINT – E.g.: Which claims are candidates for insurance fraud because a client submitted a claim from a different place than the one the case is for?
  – … and ~100 further operators

• Supported and leveraged by ESRI – major spatial tooling vendor
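To make two of the operators above concrete, the following C++ sketch shows what a WITHIN test (point in outage area) and a DISTANCE ranking (closest stores) compute. It is a simplified illustration on planar coordinates and assumes simple polygons; `Point`, `within`, `planarDistance` and `closestStores` are names made up here, and nothing is implied about how dashDB evaluates its spatial functions or handles geodetic coordinates.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct Point {
    double x;
    double y;
};

// WITHIN for a point and a simple polygon, using even-odd ray casting:
// count how many polygon edges a horizontal ray from p crosses.
bool within(const Point& p, const std::vector<Point>& polygon) {
    assert(polygon.size() >= 3);
    bool inside = false;
    for (std::size_t i = 0, j = polygon.size() - 1; i < polygon.size(); j = i++) {
        const Point& a = polygon[i];
        const Point& b = polygon[j];
        const bool straddles = (a.y > p.y) != (b.y > p.y);
        if (straddles) {
            // x coordinate where the edge crosses the ray's height.
            const double xAtY = a.x + (p.y - a.y) * (b.x - a.x) / (b.y - a.y);
            if (p.x < xAtY) inside = !inside;
        }
    }
    return inside;
}

// Straight-line distance in the plane (not a geodesic distance).
double planarDistance(const Point& a, const Point& b) {
    return std::hypot(a.x - b.x, a.y - b.y);
}

// DISTANCE-style ranking: indices of the k stores closest to the customer,
// ordered from nearest to farthest.
std::vector<std::size_t> closestStores(const Point& customer,
                                       const std::vector<Point>& stores,
                                       std::size_t k) {
    std::vector<std::size_t> order(stores.size());
    for (std::size_t i = 0; i < order.size(); ++i) order[i] = i;
    k = std::min(k, order.size());
    std::partial_sort(order.begin(),
                      order.begin() + static_cast<std::ptrdiff_t>(k),
                      order.end(),
                      [&](std::size_t l, std::size_t r) {
                          return planarDistance(customer, stores[l]) <
                                 planarDistance(customer, stores[r]);
                      });
    order.resize(k);
    return order;
}
```

In the database, the same questions would be expressed as spatial predicates in SQL so that the join runs next to the data; the sketch only shows the geometry behind them.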

Page 37: IBM Insight 2014 - Advanced Warehouse Analytics in the Cloud

GeoSpatial Analytics In dashDB

• Implements ISO SQL/MM standard for spatial
  – See http://www.iso.org/iso/catalogue_detail.htm?csnumber=38651

• Spatial data type ST_GEOMETRY (hierarchy)

• Enables spatial joins in database through spatial operators available as user defined functions

• Dedicated support in ESRI tools starting with V10.3
  – http://www.esri.com/software/arcgis/arcgis-for-desktop/free-trial

• GeoSpatial Applications Examples
  – Telco Location Data
  – Utilities Smart Grid
  – GPS Tracking in Transportation
  – Insurance Demographics
  – Cable Marketing Campaigns
  – Retail Store Placement

Page 38: IBM Insight 2014 - Advanced Warehouse Analytics in the Cloud

Examples of using ESRI ArcGIS with dashDB 1/3

• Load spatial data into dashDB
• Discover & browse spatial data with ArcCatalog

Counties

Tornado paths over recent 50 years

Page 39: IBM Insight 2014 - Advanced Warehouse Analytics in the Cloud

Examples of using ESRI ArcGIS with dashDB 2/3

• Combine spatial data from dashDB into interactive maps with ArcMap

Page 40: IBM Insight 2014 - Advanced Warehouse Analytics in the Cloud

Examples of using ESRI ArcGIS with dashDB 3/3

• Perform spatial joins in dashDB using query layers and visualize results in ArcMap

Tornado risk per county

Page 41: IBM Insight 2014 - Advanced Warehouse Analytics in the Cloud

Insurance Risk Analysis – Showcase overview

Public spatial data sets available online
- Historical tornados from 1950s to today: http://www.spc.noaa.gov/gis/svrgis/
- Current tornado weather warnings: http://www.nws.noaa.gov/regsci/gis/shapefiles/
- US counties: https://www.census.gov/geo/maps-data/data/tiger-line.html

Components of the showcase
- Mobile application generating spatial data for insurance claims for tornado damage
- Cloud service for persistency of system of engagement
- Cloud warehouse service for analytics and correlation between customer data and public or third party data
- Insurance Master Data (customers)
- Visualization and spatial analysis capabilities by Esri ArcGIS

Services shown: www.bluemix.net, www.cloudant.com, dashDB

Page 42: IBM Insight 2014 - Advanced Warehouse Analytics in the Cloud


Twitter-dashDB Show Case (www.youtube.com/watch?v=9yVNwOs9L4c)

http://american-sniper-analysis.mybluemix.net

Page 43: IBM Insight 2014 - Advanced Warehouse Analytics in the Cloud

• The Analytics Challenge
• Bringing Analytics to the Data
• Analytics with dashDB
• Predictive Analytics with R
• GeoSpatial Analytics with ESRI
• Analytic Extension Framework

Page 44: IBM Insight 2014 - Advanced Warehouse Analytics in the Cloud

The Two Elements of Analytic Extension Framework

1. User Defined Extension – UDX – C++ API

Three types of UDXs:

• Scalar Functions
    SELECT MyXForm(Col1, Col2) FROM MyTab

• Aggregate Functions
    SELECT Col1, MyAgg(Col2) FROM MyTab GROUP BY Col1

• Table Functions
    SELECT b.MyCol1 FROM MyTab a, TABLE(MyTableFunc(a.Col1, a.Col2)) AS b

C++ code compiled and linked within dashDB service

Registered via DDL, e.g.

    CREATE FUNCTION MyXForm(VARCHAR(ANY), INTEGER) RETURNS VARCHAR(ANY) LANGUAGE CPP PARAMETER STYLE NPSGENERIC EXTERNAL NAME 'mylib.so!cMyFunc'
    CREATE FUNCTION MyAgg(INTEGER) LANGUAGE CPP RETURNS DOUBLE AGGREGATE WITH (SUM INTEGER) PARAMETER STYLE NPSGENERIC EXTERNAL NAME 'mylib.so!cMyAgg'
    CREATE FUNCTION MyTableFunc(VARARGS) RETURNS TABLE (Col1 INTEGER) LANGUAGE CPP PARAMETER STYLE NPSGENERIC EXTERNAL NAME 'mylib.so!cMyUDTF'

2. REST API & tooling for development & deployment:

pushFile, pullFile, executeCC, compile, link, promote, createPackage, deployPackage, getProjList, getFileList, executeDDL, executeSQL, dropUDX, ...

Page 45: IBM Insight 2014 - Advanced Warehouse Analytics in the Cloud

    class cMyFunc : public nz::udx_ver2::Udf
    {
    public:
        cMyFunc(UdxInit *pInit) : Udf(pInit) { }
        static nz::udx_ver2::Udf* instantiate(UdxInit *pInit);

        virtual nz::udx_ver2::ReturnValue evaluate()
        {
            int int1 = int32Arg(0);
            int int2 = int32Arg(1);
            int retVal = int1 * int2;
            NZ_UDX_RETURN_INT32(retVal);
        }
    };

    nz::udx_ver2::Udf* cMyFunc::instantiate(UdxInit *pInit)
    {
        return new cMyFunc(pInit);
    }

User Defined Scalar Function API Example

Page 46: IBM Insight 2014 - Advanced Warehouse Analytics in the Cloud

    class cMyAgg : public nz::udx_ver2::Uda
    {
    public:
        cMyAgg(UdxInit *pInit) : Uda(pInit) { }
        static nz::udx_ver2::Uda* instantiate(UdxInit *pInit);

        void initializeState()
        {
            int64 *s = int64State(0);
            *s = 0;
            setStateNull(0, false);
        }

        // Accumulate data in states.
        virtual void accumulate()
        {
            if (isArgNull(0)) return;
            int64 *s = int64State(0);
            *s += int16Arg(0);
        }

        // States flowed in as input; merge back in state.
        virtual void merge()
        {
            accumulate();
        }

        // Merged data copied to input.
        virtual ReturnValue finalResult()
        {
            if (isArgNull(0)) NZ_UDX_RETURN_NULL();
            setReturnNull(false);
            NZ_UDX_RETURN_INT64(int64Arg(0));
        }
    };

User Defined Aggregate Function API Example
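The four callbacks of an aggregate (initializeState, accumulate, merge, finalResult) make most sense when the data is spread over several partitions: each partition builds a partial state, and the partial states are then merged into one result. The C++ below is a standalone sketch of that life cycle under this assumption. It does not use the UDX API; `Cell`, `SumState` and `aggregateAcrossPartitions` are hypothetical names, and the calling order is an illustration rather than a description of the engine's scheduler.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// A nullable integer argument, as one row would deliver it.
struct Cell {
    bool isNull;
    int value;
};

// The aggregate state: a running sum that may itself be NULL.
struct SumState {
    bool isNull;
    std::int64_t sum;
};

// Start from zero with a non-NULL state, as in the slide's initializeState().
SumState initializeState() {
    return SumState{false, 0};
}

// Fold one input row into a partition-local state; NULL inputs are skipped.
void accumulate(SumState& state, const Cell& arg) {
    if (arg.isNull) return;
    state.sum += arg.value;
}

// Fold one partial state into another; this is what lets partitions
// work independently before their results are combined.
void merge(SumState& target, const SumState& partial) {
    if (partial.isNull) return;
    target.sum += partial.sum;
}

// Produce the final value from the fully merged state.
std::int64_t finalResult(const SumState& state) {
    assert(!state.isNull);
    return state.sum;
}

// Hypothetical driver: one partial state per partition, then a merge phase.
std::int64_t aggregateAcrossPartitions(
        const std::vector<std::vector<Cell>>& partitions) {
    std::vector<SumState> partials;
    for (const std::vector<Cell>& rows : partitions) {
        SumState state = initializeState();
        for (const Cell& cell : rows) accumulate(state, cell);
        partials.push_back(state);
    }
    SumState total = initializeState();
    for (const SumState& partial : partials) merge(total, partial);
    return finalResult(total);
}
```

Because merging partial sums gives the same result as summing all rows in one place, the partitions never have to exchange raw rows, only their small states.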

Page 47: IBM Insight 2014 - Advanced Warehouse Analytics in the Cloud

    class OneUdtf : public nz::udx_ver2::Udtf
    {
    private:
        int32 argInt, xcount;

    public:
        OneUdtf(UdxInit *pInit) : Udtf(pInit) { }
        static nz::udx_ver2::Udtf* instantiate(UdxInit *pInit);

        virtual void newInputRow()
        {
            argInt = 0;
            for (int i = 0; i < numArgs(); i++)
            {
                if (argType(i) == UDX_INT32)
                {
                    argInt = int32Arg(i);
                }
                else
                {
                    throwUdxException("Unknown type");
                }
            }
            xcount = 1;
        }

        virtual DataAvailable nextOutputRow()
        {
            if (xcount > 5)
                return Done;

            for (int i = 0; i < numReturnColumns(); i++)
            {
                setReturnColumnNull(i, false);
                if (returnTypeColumn(i) == UDX_INT32)
                {
                    *int32ReturnColumn(i) = argInt + xcount;
                }
                else
                {
                    throwUdxException("Unknown type");
                }
            }
            xcount++;
            return MoreData;
        }
    };

User Defined Table Function API Example
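The table function above follows a pull protocol: the function is told about a new input row once and is then asked for output rows until it reports that it is done. The following standalone C++ sketch mimics that protocol with plain types so the control flow can be followed without the UDX headers. `FiveRowsPerInput` and `runTableFunction` are hypothetical names, and the driver loop is an assumption about how a caller would use the two callbacks, not the engine's actual code.

```cpp
#include <cassert>
#include <vector>

enum class DataAvailable { MoreData, Done };

// Plain-C++ analogue of the slide's table function: for each input value
// it emits five rows, input + 1 through input + 5.
class FiveRowsPerInput {
public:
    // Called once per input row; resets the per-row counter.
    void newInputRow(int arg) {
        argInt_ = arg;
        xcount_ = 1;
    }

    // Called repeatedly; writes one output value per call until Done.
    DataAvailable nextOutputRow(int& out) {
        if (xcount_ > 5) return DataAvailable::Done;
        out = argInt_ + xcount_;
        ++xcount_;
        return DataAvailable::MoreData;
    }

private:
    int argInt_ = 0;
    int xcount_ = 6;  // nothing to emit before the first newInputRow()
};

// Hypothetical driver loop standing in for the caller of the callbacks.
std::vector<int> runTableFunction(const std::vector<int>& inputs) {
    FiveRowsPerInput fn;
    std::vector<int> output;
    for (int arg : inputs) {
        fn.newInputRow(arg);
        int value = 0;
        while (fn.nextOutputRow(value) == DataAvailable::MoreData) {
            output.push_back(value);
        }
    }
    assert(output.size() == inputs.size() * 5);
    return output;
}
```

For a single input row with the value 10, this sketch produces the rows 11, 12, 13, 14 and 15, which is the behavior the slide's `argInt + xcount` logic describes.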

Page 48: IBM Insight 2014 - Advanced Warehouse Analytics in the Cloud

Analytic Extension Development Process

[Diagram: a dashDB developer works from the command line, 3rd party IDEs, or a Cloud Web IDE against the dashDB REST API. Steps shown: pushFile (.cpp), compile (.o), link (.so), promote (release), createPackage (.zip), deployPackage, executeDDL (BLUDB catalog), Run SQL (DRDA), and pullFile (logs, .o). Further labels: Setup; parts of the diagram are marked "under construction" and "under consideration".]

Page 49: IBM Insight 2014 - Advanced Warehouse Analytics in the Cloud

Some Examples Highlighting the REST API

Login and keep a cookie for the session

    curl -d j_username=<User> -d j_password=<PW> https://<IP>:8443/services/loginService -c ck.dat

Upload source files

    curl -F cmd=pushFile -F proj=udsf1 -F subDir=src --form "file[0]=@./udsf1.cpp" --form "file[1]=@./opr.cpp" --form "file[2]=@./opr.h" https://<IP>:8443/ida -b ck.dat

Compile source files

    curl -d cmd=compile -d proj=udsf1 -d targetDir=bin -d "files={\"files\":[\"src/udsf1.cpp\"]}" https://<IP>:8443/ida -b ck.dat

Link object files

    curl -d cmd=link -d proj=udsf1 -d targetDir=bin -d "files={\"files\":[\"bin/udsf1.o\"]}" https://<IP>:8443/ida -b ck.dat

Alternatively: low-level cc invocation

    curl -d cmd=executeCC -d proj=udsf1 -d "args=-m64 -Wall -fPIC -c -D_CPLUSPLUS src/udsf1.cpp -I/mnt/blumeta0/home/db2inst1/sqllib/include -o udsf1.o" https://<IP>:8443/ida -b ck.dat

Promote linked binaries to release directory

    curl -d cmd=promote -d proj=udsf1 -d "files=lib*.so" https://<IP>:8443/ida -b ck.dat

Register UDX with DDL

    curl -d cmd=executeDDL -d profileName=BLUDB -d "ddl=CREATE FUNCTION udf1(INT) RETURNS INT LANGUAGE CPP PARAMETER STYLE NPSGENERIC FENCED EXTERNAL NAME '/mnt/blumeta0/home/bluadmin/projects/udsf1/release/libudsf1.so!CUdf';" https://<IP>:8443/blushiftservices/BluShiftHttp.do -b ck.dat

Page 50: IBM Insight 2014 - Advanced Warehouse Analytics in the Cloud

A Proof Point of UDX Support in dashDB

We have a working prototype of the entire Netezza SQL Extension Toolkit for dashDB!