Slides pentaho-hadoop-weka

F**** around with Big Data and Predictive Analytics

Featuring Kettle, Weka & Hadoop.

© 2012, Pentaho. All Rights Reserved. pentaho.com. Worldwide +1 (866) 660-7555

Pentahuh?

What’s Pentaho exactly?

CENTRAL ADMINISTRATION, AUDITING & MONITORING

DELIVER When & Where Users Need It

STREAMLINE Information Delivery

VISUALIZE & Report Information In Any Style

ACCESS All Enterprise Data Sources

ISV & Packaged Applications

SaaS / Cloud Applications

EMBEDDED

Web

Mobile

Print

E-Mail

STANDALONE

‣ Advanced & Predictive Analytics

DATA MINING

‣ Interactive

‣ Operational

‣ Enterprise

REPORTING

‣ Ad hoc Exploration

‣ Multi-Dimensional

ANALYSIS

‣ Interactive Metrics

‣ Rich Visualizations

DASHBOARDS

ERP / CRM / Enterprise Apps (e.g. SAP, Oracle)

Hadoop & NoSQL Data

Unstructured & semi-structured (XML, Excel, Files, etc.)

Relational Data Sources

Cloud (e.g. Salesforce, Amazon, Dell)

‣ Data Integration

‣ Graphical ETL Designer

INTEGRATE, CLEANSE, & ENRICH DATA

‣ In Memory Caching

‣ High Performance

ANALYTICS ACCELERATOR

‣ Direct Access

‣ Hadoop Clustering / Scheduling

‣ Instant OLAP Cubes

‣ Enterprise Scalability

We do open source analytics.

Why does Pentaho claim to have anything to do with Big Data??

Project Kettle: powerful Extraction, Transformation and Loading (ETL) capabilities using an innovative, metadata-driven approach

Bring the code to the data

[Diagram in three builds; surviving labels: JDBC, then JDBC with three Kettle instances, then Kettle alone]
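
The slides do not spell out what "bring the code to the data" looks like in practice, so here is a minimal sketch of the general idea, not of Kettle's or Hadoop's actual mechanics. The partitions, the `run_on_nodes` scheduler stand-in, and the row layout are all hypothetical. Each "node" runs the same small job on its own rows and returns only a compact partial result, which is then merged; the alternative pulls every row to one place first.

```python
from typing import Callable, Dict, List, Tuple

Row = Tuple[int, int]  # (customer_id, unit_sales), a hypothetical layout

# Hypothetical data nodes: each list is the partition stored on one node.
PARTITIONS: List[List[Row]] = [
    [(1, 10), (2, 5), (1, 7)],
    [(2, 3), (3, 8)],
    [(1, 1), (3, 2), (3, 4)],
]


def sum_by_customer(rows: List[Row]) -> Dict[int, int]:
    """The 'code' that gets shipped: aggregate one partition locally."""
    totals: Dict[int, int] = {}
    for customer_id, unit_sales in rows:
        totals[customer_id] = totals.get(customer_id, 0) + unit_sales
    return totals


def run_on_nodes(job: Callable[[List[Row]], Dict[int, int]],
                 partitions: List[List[Row]]) -> List[Dict[int, int]]:
    """Stand-in for a cluster scheduler: run the job where each partition lives."""
    return [job(partition) for partition in partitions]


def merge(partials: List[Dict[int, int]]) -> Dict[int, int]:
    """Combine the small per-node results into one answer."""
    merged: Dict[int, int] = {}
    for partial in partials:
        for customer_id, subtotal in partial.items():
            merged[customer_id] = merged.get(customer_id, 0) + subtotal
    return merged


def totals_code_to_data() -> Dict[int, int]:
    # Only the per-node dictionaries travel, never the raw rows.
    return merge(run_on_nodes(sum_by_customer, PARTITIONS))


def totals_data_to_code() -> Dict[int, int]:
    # Every raw row is moved to a single place before any work happens.
    all_rows = [row for partition in PARTITIONS for row in partition]
    return sum_by_customer(all_rows)
```

Both functions return the same totals; the difference is how much data has to move to get there.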

Project Weka: a comprehensive set of tools for machine learning and data mining

Bring Weka to the data

[Diagram in two builds; surviving labels: Kettle, JDBC]

JDBC Services for Kettle: runtime optimization and SQL pushdown

A smart(er) JDBC Layer

[Diagram; surviving labels: three Kettle nodes and a Kettle JDBC layer. The SQL statements that follow annotate the diagram.]

SELECT CUSTOMER_ID, SUM(UNIT_SALES)
FROM SALES_FACT
WHERE AGE_GROUP_ID > 3
GROUP BY CUSTOMER_ID;

SELECT CUSTOMER_ID
FROM SALES_FACT;

SELECT CUSTOMER_ID, SUM(UNIT_SALES)
FROM SALES_FACT
WHERE AGE_GROUP_ID > 3
GROUP BY CUSTOMER_ID;
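
The slides do not show how the Kettle JDBC layer decides what to push down, so the following is only a generic illustration of SQL pushdown, with an in-memory SQLite database standing in for the data store. The column types and sample rows are invented for the example; only the table name, column names, and the aggregate query come from the slide. `client_side` fetches every row and filters and sums in the client, while `pushed_down` lets the store evaluate the WHERE and GROUP BY and returns only the aggregated rows.

```python
import sqlite3
from typing import Dict


def make_store() -> sqlite3.Connection:
    """Build a tiny stand-in data store (sample rows are made up)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE SALES_FACT ("
        "CUSTOMER_ID INTEGER, AGE_GROUP_ID INTEGER, UNIT_SALES REAL)"
    )
    conn.executemany(
        "INSERT INTO SALES_FACT VALUES (?, ?, ?)",
        [(1, 4, 2.0), (1, 5, 3.0), (2, 2, 9.0), (2, 4, 1.5), (3, 1, 7.0)],
    )
    return conn


def client_side(conn: sqlite3.Connection) -> Dict[int, float]:
    """No pushdown: pull every row, then filter and aggregate in the client."""
    totals: Dict[int, float] = {}
    cursor = conn.execute(
        "SELECT CUSTOMER_ID, AGE_GROUP_ID, UNIT_SALES FROM SALES_FACT"
    )
    for customer_id, age_group_id, unit_sales in cursor:
        if age_group_id > 3:
            totals[customer_id] = totals.get(customer_id, 0.0) + unit_sales
    return totals


def pushed_down(conn: sqlite3.Connection) -> Dict[int, float]:
    """Pushdown: the store filters and aggregates; only results come back."""
    cursor = conn.execute(
        "SELECT CUSTOMER_ID, SUM(UNIT_SALES) "
        "FROM SALES_FACT "
        "WHERE AGE_GROUP_ID > 3 "
        "GROUP BY CUSTOMER_ID"
    )
    return dict(cursor.fetchall())
```

With the sample rows, both paths agree, but the pushed-down version transfers two result rows instead of all five fact rows.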

The gains

• Job design and administration become trivial.

• Runs the rich Kettle plugin environment directly on the nodes.

• Performs much better than Hive.

• The JDBC layer is pretty neat.

The caveats

• True parallel machine learning algorithms are rare and hard to design.

• Not an actual production-ready design.

• Clients might have caches, which must be notified by the Big Data store when the data is updated.
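
To make the first caveat concrete: an algorithm parallelizes cleanly when each node can reduce its rows to a small summary and the summaries can simply be added together. Ordinary least-squares regression on a single feature is one such case, and the sketch below shows it. This is not Weka's implementation; the `Stats` class, function names, and sample partitions are all hypothetical. Many other learners have no such mergeable summary, which is where the design difficulty comes from.

```python
from dataclasses import dataclass
from typing import List, Tuple

Point = Tuple[float, float]  # (x, y)


@dataclass
class Stats:
    """Sufficient statistics for single-feature least squares."""
    n: int = 0
    sx: float = 0.0
    sy: float = 0.0
    sxx: float = 0.0
    sxy: float = 0.0

    def merge(self, other: "Stats") -> "Stats":
        # Summaries from different nodes combine by plain addition.
        return Stats(self.n + other.n, self.sx + other.sx, self.sy + other.sy,
                     self.sxx + other.sxx, self.sxy + other.sxy)


def local_stats(rows: List[Point]) -> Stats:
    """Runs on one node: a single pass over that node's rows."""
    stats = Stats()
    for x, y in rows:
        stats.n += 1
        stats.sx += x
        stats.sy += y
        stats.sxx += x * x
        stats.sxy += x * y
    return stats


def fit(stats: Stats) -> Tuple[float, float]:
    """Solve for (slope, intercept) from the merged summary."""
    denom = stats.n * stats.sxx - stats.sx ** 2
    if denom == 0:
        raise ValueError("need at least two distinct x values")
    slope = (stats.n * stats.sxy - stats.sx * stats.sy) / denom
    intercept = (stats.sy - slope * stats.sx) / stats.n
    return slope, intercept


def fit_distributed(partitions: List[List[Point]]) -> Tuple[float, float]:
    total = Stats()
    for partition in partitions:
        total = total.merge(local_stats(partition))
    return fit(total)
```

Fitting the merged summaries gives the same line as fitting all rows in one place, and each node only ever sends five numbers.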

Demo!

Thank you!

Join the conversation. You can find us on:

blog.pentaho.com

@Pentaho

Facebook.com/Pentaho

Pentaho Business Analytics

Want to learn more?

Learning Linear Models in Hadoop with Weka: http://markahall.blogspot.ca/2013/03/learning-linear-models-in-hadoop-with.html

Introduction to MapReduce with Pentaho Data Integration: http://www.youtube.com/watch?v=KZe1UugxXcs
