Zementis © Michael Zeller, Ph.D. CEO Zementis, Inc. EDM Summit October 30, 2008 Agile Deployment of Predictive Analytics Using Amazon EC2

Page 1: Agile Deployment of Predictive Analytics Using Amazon EC2

Michael Zeller, Ph.D.
CEO, Zementis, Inc.

EDM Summit
October 30, 2008

Page 2: Presentation Outline

1. Predictive Analytics

2. Development, Integration, and Deployment of predictive models … from R to WS to PMML

Focus on PMML, the Predictive Modeling Markup Language

3. Cloud Computing and Amazon EC2

4. Bringing it all together: from the desktop to the cloud

Page 3: What is Predictive Analytics?

http://en.wikipedia.org/wiki/Predictive_analytics

[Diagram: Data → Predictive Models → Decisions]

Predictive Analytics

• Science that makes decisions smarter by uncovering hidden data trends not obvious to the human eye.
• Predicts future customer behaviour with today’s data.

Page 4: Business Objective for Predictive Analytics

• Improve Processes
• Leverage in Every Business Process
• Shorten Time-to-Market
• Reduce Complexity and Cost
• Make Smarter Decisions
• Automate Decisions
• Agility to Quickly Change with Market Conditions
• Ensure Consistent Decisions

Page 5: Presentation Outline

1. Predictive Analytics

2. Development, Integration, and Deployment of predictive models … from R to WS to PMML

Focus on PMML, the Predictive Modeling Markup Language

3. Cloud Computing and Amazon EC2

4. Bringing it all together: from the desktop to the cloud

Page 6: Development, Integration, and Deployment

Open Standards

• Development: R allows for reliable data manipulation and model building.
• Integration: Web-service calls allow for fast integration.
• Deployment: PMML allows for easy expression and deployment of data transformations and data-mining models.

Page 8: The R Project

R: Model Development

• R is an integrated suite of software facilities for data manipulation, calculation and graphical display.
• R provides a wide variety of statistical techniques and is highly extensible.
• R is similar to the S language and environment developed at Bell Labs.
• It is Open Source and a GNU project. R is available for free at http://www.r-project.org/

Page 9: Web-Services for Integration

WS: Integration

• Service Oriented Architecture (SOA)
    • Defined as a group of services that communicate with each other
    • Implementation via Web Services
• Establishes a “loosely coupled” infrastructure
    • Allows the business to respond more quickly and cost-effectively to changes in market conditions
• Increases Interoperability
    • Integrate various systems
    • Agnostic to specific languages (Java, .Net, Cobol)
    • Utilize internal and external services
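
To make the language-agnostic point concrete, here is a minimal sketch of what a web-service scoring call could look like on the wire. The SOAP 1.1 envelope namespace is standard; the service namespace `urn:example:scoring`, the `Score`, `Input`, `ScoreResponse` and `Result` element names, and the model name are hypothetical and do not describe any particular product's interface. The response is hand-written so the sketch runs without a network.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
SVC_NS = "urn:example:scoring"  # hypothetical service namespace


def build_score_request(model_name, inputs):
    """Build a SOAP 1.1 request asking a scoring service to apply a model."""
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    call = ET.SubElement(body, f"{{{SVC_NS}}}Score", {"model": model_name})
    for name, value in inputs.items():
        field = ET.SubElement(call, f"{{{SVC_NS}}}Input", {"name": name})
        field.text = str(value)
    return ET.tostring(envelope, encoding="unicode")


def parse_score_response(xml_text):
    """Extract the numeric result from a (hypothetical) ScoreResponse message."""
    root = ET.fromstring(xml_text)
    result = root.find(f".//{{{SVC_NS}}}Result")
    if result is None:
        raise ValueError("response contains no Result element")
    return float(result.text)


request_xml = build_score_request("churn_model", {"age": 42, "tenure_months": 18})

# A response such as a server might return; hand-written so the sketch runs offline.
response_xml = (
    f'<soap:Envelope xmlns:soap="{SOAP_NS}" xmlns:s="{SVC_NS}">'
    "<soap:Body><s:ScoreResponse><s:Result>0.73</s:Result></s:ScoreResponse></soap:Body>"
    "</soap:Envelope>"
)
score = parse_score_response(response_xml)
print(score)
```

Because the contract is only the XML message, a Java, .Net or Cobol client could produce the same request without sharing any code with the service.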

Page 10: Predictive Modeling Markup Language

PMML: Deployment and Execution

• PMML is an XML-based language to
    • Define statistical and data mining models
    • Share models between compliant applications
• Standard for exchange of models to
    • Avoid proprietary issues and incompatibilities
    • Deploy models in operational infrastructure
• Clear separation of tasks
    • Model development vs. model deployment
    • Scientists focus on building the best model
    • Eliminates need for custom model deployment
    • Ensures scalability and reliability
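
As an illustration of a model expressed in XML, the following sketch serializes a small linear model as a PMML `RegressionModel`. The element and attribute names follow the PMML regression vocabulary and the PMML 3.2 namespace; the field names, coefficients and the helper `linear_model_to_pmml` are made up for this example, and the output has not been validated against the DMG schema.

```python
import xml.etree.ElementTree as ET

PMML_NS = "http://www.dmg.org/PMML-3_2"  # namespace used by PMML version 3.2


def linear_model_to_pmml(target, intercept, coefficients):
    """Serialize y = intercept + sum(coef * x) as a PMML RegressionModel."""
    ET.register_namespace("", PMML_NS)  # emit PMML as the default namespace
    q = lambda tag: f"{{{PMML_NS}}}{tag}"
    pmml = ET.Element(q("PMML"), {"version": "3.2"})
    ET.SubElement(pmml, q("Header"), {"copyright": "example"})

    # Data dictionary: every input field plus the predicted field.
    fields = list(coefficients) + [target]
    dictionary = ET.SubElement(
        pmml, q("DataDictionary"), {"numberOfFields": str(len(fields))}
    )
    for name in fields:
        ET.SubElement(
            dictionary,
            q("DataField"),
            {"name": name, "optype": "continuous", "dataType": "double"},
        )

    model = ET.SubElement(
        pmml, q("RegressionModel"), {"modelName": "example", "functionName": "regression"}
    )
    schema = ET.SubElement(model, q("MiningSchema"))
    for name in coefficients:
        ET.SubElement(schema, q("MiningField"), {"name": name})
    ET.SubElement(schema, q("MiningField"), {"name": target, "usageType": "predicted"})

    # The regression table carries the actual model parameters.
    table = ET.SubElement(model, q("RegressionTable"), {"intercept": repr(intercept)})
    for name, coef in coefficients.items():
        ET.SubElement(
            table, q("NumericPredictor"), {"name": name, "coefficient": repr(coef)}
        )
    return ET.tostring(pmml, encoding="unicode")


pmml_text = linear_model_to_pmml("spend", 12.5, {"age": 0.8, "visits": 3.0})
print(pmml_text)
```

The resulting text file is the only artifact handed from model development to deployment, which is the separation of tasks the slide describes.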

Page 11: PMML Industry Support

Mature and Supported by Industry

• Data Mining Group: http://www.dmg.org
• Mature standard
    • Current version 3.2
    • Active group and constant enhancements
• Vendor independent consortium
• Industry supporters
    • Major Players: IBM, Oracle, SAP, Microsoft
    • Analytics: SAS, SPSS, Fair Isaac, Zementis
    • Business Intelligence: MicroStrategy, Teradata
    • Open Source: R

Page 12: PMML: Bringing Data and Models Together

Predictive Modeling Markup Language

PMML defines a standard not only to represent data-mining models, but also data handling and data transformations (pre- and post-processing). Data transformations and data-mining models come together in PMML.

• A Data Dictionary defines all the raw data fields (including missing value strategy and outlier treatment).
• Several Data Transformation strategies allow for intelligent extraction of feature detectors from raw data (“data massaging”).
• A comprehensive list of Data-Mining Models offers power and flexibility.
• Post-processing of results allows for tailored decisions.
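
The four stages listed above can be pictured as a small pipeline. The sketch below is a plain-Python analogy, not a PMML engine: the function names, field names, value ranges, weights and threshold are all invented for illustration.

```python
def prepare(raw, default, low, high):
    """Data handling: fill a missing value, then clamp outliers to [low, high]."""
    value = default if raw is None else raw
    return max(low, min(high, value))


def normalize(value, low, high):
    """Transformation: rescale a value from [low, high] to [0, 1]."""
    return (value - low) / (high - low)


def model(features):
    """Stand-in model: a weighted sum with made-up weights."""
    return 0.6 * features["income"] + 0.4 * features["tenure"]


def decide(score, threshold=0.5):
    """Post-processing: turn the numeric score into a business decision."""
    return "approve" if score >= threshold else "review"


record = {"income": 250_000, "tenure": None}  # one outlier, one missing value

income = prepare(record["income"], default=50_000, low=0, high=200_000)
tenure = prepare(record["tenure"], default=24, low=0, high=120)
features = {
    "income": normalize(income, 0, 200_000),
    "tenure": normalize(tenure, 0, 120),
}
score = model(features)
print(round(score, 2), decide(score))
```

Keeping all four stages in one document means the deployed model sees exactly the same data preparation that was used when it was built.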

Page 13: Presentation Outline

1. Predictive Analytics

2. Development, Integration, and Deployment of predictive models … from R to WS to PMML

Focus on PMML, the Predictive Modeling Markup Language

3. Cloud Computing and Amazon EC2

4. Bringing it all together: from the desktop to the cloud

Page 14: Got Models… What Now?

[Diagram: Data Analysis → Statistical Model → PMML Export]

Page 15: Cloud Computing

• You are already using it every day: Google search, Gmail, Salesforce.com, …
• Computing as a “Utility”
• Evolve from “World Wide Web” to “World Wide Computer”
• Applications/Services/Storage/CPUs on the Internet.
• Incorporate via SOA Standards and Web Services.
• Emerging Utility Providers: Amazon, Sun, Google...
• Promise: Scalable, secure, reliable and more cost-effective than do-it-yourself.

Page 16: Cloud Computing Benefits

SOA / Open Standards

• Open Standards vs. Proprietary Code
• Select Best-of-Breed Services & Applications
• Avoid Vendor Lock-in

Utility Computing Paradigm

• Deployment in Minutes vs. Months
• No In-house Hardware/Software to Maintain
• Scales with Business Demand

SaaS / Pay-As-You-Go

• Operational Cost vs. Capital Expenditures
• No Long-term Commitment
• Only Pay for Actual Usage
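
The pay-as-you-go argument is easiest to see with a little arithmetic. The sketch below compares a usage-based bill with an owned server over one year. Every figure in it (hourly rate, purchase price, upkeep) is hypothetical and chosen only to keep the numbers simple; none of them are actual prices.

```python
def pay_as_you_go_cost(hours_used, hourly_rate):
    """Operational cost: pay only for the hours actually consumed."""
    return hours_used * hourly_rate


def owned_server_cost(purchase_price, monthly_upkeep, months):
    """Capital expenditure up front plus upkeep, regardless of utilization."""
    return purchase_price + monthly_upkeep * months


# All figures are hypothetical and exist only to illustrate the comparison.
HOURLY_RATE = 0.50        # price per instance-hour
PURCHASE_PRICE = 3000.0   # one-time hardware purchase
MONTHLY_UPKEEP = 100.0    # power, hosting, maintenance
MONTHS = 12

for hours_per_month in (40, 200, 720):
    cloud = pay_as_you_go_cost(hours_per_month * MONTHS, HOURLY_RATE)
    owned = owned_server_cost(PURCHASE_PRICE, MONTHLY_UPKEEP, MONTHS)
    print(f"{hours_per_month:>3} h/month: cloud {cloud:8.2f} vs owned {owned:8.2f}")
```

Under these assumed numbers the usage-based bill is far lower at light or bursty utilization and only overtakes the owned server when the instance runs around the clock.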

Page 17: Amazon Elastic Compute Cloud (EC2)

Amazon Web Services

• Cost-effective and Reliable
    • Software as a Service (SaaS)
    • Based on Amazon’s Infrastructure
• Secure
    • Dedicated, Controlled Instances
    • HTTPS & WS-Security
• Elastic
    • Choice of Instance Type (S, L, XL)
    • Launch Multiple Instances on Demand
• Superior Time-to-Market
    • Commission Your Instance in Minutes
    • Minimal Learning Curve: Based on LINUX / Open Source and Commodity Hardware

Page 18: Presentation Outline

1. Process overview and Predictive Analytics

2. Development, Integration, and Deployment of predictive models … from R to WS to PMML

Focus on PMML, the Predictive Modeling Markup Language

3. Cloud Computing and Amazon EC2

4. Bringing it all together: from the desktop to the cloud

Page 19: Bringing it all together

Recapping …

Predictive Analytics

• Science that makes decisions smarter by uncovering hidden data trends not obvious to the human eye.
• Predicts future customer behaviour with today’s data.

Cloud Computing

• Internet-centric use of applications, services, or computers as a “utility”.
• Incorporates SaaS and SOA concepts.
• Scalable, Secure, Cost-effective.

Bridging the gap: Open Standards & Open Source Solutions

Page 20: From the Desktop to the Cloud: Bridging the Gap with ADAPA

• Model Building: Extensive data analysis and manipulation as well as selection of the most appropriate statistical modeling technique. These steps can then be represented as a PMML file.
• Model Execution: Amazon EC2 offers utility computing with virtually unlimited scalability as well as flexible choice of server size based on memory and processing needs.

ADAPA (WS & PMML)

Page 21: SaaS Predictive Analytics on Amazon EC2: The ADAPA Example

ADAPA

• Scalable Execution Platform: Data transformation and model execution in real-time or batch mode.
• Environment to Manage Models and Rules: Deploys one or many models or rule sets. Manages and maintains these through a web console.
• Framework for SOA-based IT Integration: Completely standards-based and easily integrated into any existing infrastructure.
• ADAPA is not ...: Not a model development environment.
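
The difference between real-time and batch execution can be sketched generically: the same scoring logic is applied either to one record per request or to a whole file. The code below is an illustration only and does not represent ADAPA's interface; the `score` function, its weights and the CSV columns are invented.

```python
import csv
import io


def score(record):
    """Stand-in for a deployed model; the weights are made up."""
    return 10 + 0.5 * float(record["age"]) + 2 * float(record["visits"])


def score_realtime(record):
    """Real-time mode: one record in, one score out."""
    return score(record)


def score_batch(csv_text):
    """Batch mode: read CSV rows and score each one in turn."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [score(row) for row in reader]


print(score_realtime({"age": 40, "visits": 3}))
print(score_batch("age,visits\n40,3\n20,0\n"))
```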

Page 22: Bridging the Gap: Highlights

Predictive Analytics & Cloud Computing

• Ability to Quickly Deploy Predictive Models
• Open Standards for Integration and Models
• Leverages a Scalable & Secure Infrastructure
• Ability to Execute Predictive Models in Real-time
• Web 2.0 Support
• Low TCO and Fast Time-to-Market

Page 23: 1 through 6 – From Raw Data to Smart Decisions

1. Data Extraction and Analysis
2. Model Building
3. PMML Export
4. PMML Import
5. Web-Service Calls
6. Model Execution
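
The import and execution steps can be sketched as follows: a PMML file is parsed, the regression parameters are read out, and a record is scored. The embedded PMML document, its field names and coefficients are invented for this example, and the tiny reader handles only a single `RegressionTable`; a real scoring engine supports far more of the standard.

```python
import xml.etree.ElementTree as ET

# A small, made-up PMML document standing in for the file exported by the modeling tool.
PMML_TEXT = """
<PMML version="3.2" xmlns="http://www.dmg.org/PMML-3_2">
  <Header copyright="example"/>
  <DataDictionary numberOfFields="3">
    <DataField name="age" optype="continuous" dataType="double"/>
    <DataField name="visits" optype="continuous" dataType="double"/>
    <DataField name="spend" optype="continuous" dataType="double"/>
  </DataDictionary>
  <RegressionModel modelName="example" functionName="regression">
    <MiningSchema>
      <MiningField name="age"/>
      <MiningField name="visits"/>
      <MiningField name="spend" usageType="predicted"/>
    </MiningSchema>
    <RegressionTable intercept="10">
      <NumericPredictor name="age" coefficient="0.5"/>
      <NumericPredictor name="visits" coefficient="2"/>
    </RegressionTable>
  </RegressionModel>
</PMML>
"""

NS = {"p": "http://www.dmg.org/PMML-3_2"}


def load_regression(pmml_text):
    """PMML import: read intercept and coefficients of the first RegressionTable."""
    root = ET.fromstring(pmml_text.strip())
    table = root.find(".//p:RegressionTable", NS)
    intercept = float(table.get("intercept"))
    coefficients = {
        predictor.get("name"): float(predictor.get("coefficient"))
        for predictor in table.findall("p:NumericPredictor", NS)
    }
    return intercept, coefficients


def score(model, record):
    """Model execution: evaluate intercept + sum(coefficient * input)."""
    intercept, coefficients = model
    return intercept + sum(coef * record[name] for name, coef in coefficients.items())


model = load_regression(PMML_TEXT)
print(score(model, {"age": 40, "visits": 3}))  # 10 + 0.5*40 + 2*3 = 36.0
```

The scoring code never needs to know which tool produced the model, which is what allows the export and import steps to sit on different machines.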

Page 24: Your Enterprise Decision Management Strategy

Page 25: Thank You!

E-mail: [email protected]

U.S.A

6125 Cornerstone Court East, Suite 250
San Diego, CA 92121
Tel: +1 619 330-0780
Fax: +1 858 535-0227

Asia

19/F., Unit A, Ho Lee Commercial Building
38-44 D’Aguilar Street
Central, Hong Kong (S.A.R.)
Tel: +852 2868-0878
Fax: +852 2845-6027