Turbocharging CDAP Applications with Ampool

Milind Bhandarkar (@techmilind), Founder & CEO, @AmpoolIO

Prepared for: BDA Meetup
©2015

Slide 2: Outline

1. Ampool Vision
2. Pipelines w/ CDAP
3. IMDG w/ Geode
4. Ampool w/ CDAP
5. Q & A

Slide 3: Data Processing & Storage layers have evolved for scale-out

[Diagram: Persistence and Processing layers, with axis labels Unstructured / Structured, Immutable / Mutable, and Unmanaged / Managed. Other labels: Log Publish, QTx, ETL.]

In the beginning…

As app users & data grew…

Big Data / App Explosion!

Slide 4: Build a Processing & Storage-agnostic Memory Architecture

[Diagram: Persistence and Processing layers, with axis labels Unstructured / Structured, Immutable / Mutable, and Unmanaged / Managed. Other labels: Log Publish, QTx, ETL, ampool, Data Frame, Data Set.]

Unify data processing

Design for Scale-out

Best of breed data engines!

Slide 5

Ampool’s Mission: To help build real-time customer experiences through high-performance analytics built for modern, commodity hardware platforms.

For the community: To speed up big, real-time analytics in a democratic way through a memory-centric architecture (complementing existing architectures), driving better interoperability between compute and storage layers.

Slide 6: Big Data Processing Pipelines… use slow, persistent storage for data exchange today!

[Diagram: pipeline stages Ingest, ETL, Analytics, App Use.]

Slide 7: AMPOOL: Fast memory across distributed compute clusters… driving performance, simplicity and agility

[Diagram: pipeline stages Ingest, ETL, Analytics, App Use, with ampool.]

Slide 8: Energy Management: IoT Analytics

[Diagram: pipeline stages Ingest, ETL, Analytics, App Use, with ampool and HDFS.]

Data ingestion flows:
• Smart meter data (Kafka)

Hive processing:
• De-norm, Sessionize
• Aggregations

Spark processing:
• Linear Regression
• Export to HBase

Downstream Apps:
• Web app integration

Slide 9: Pipeline implemented in CDAP

Slide 10: CDAP Application

Slide 11: In-memory Technology

What is Apache Geode?

Slide 12: How does it compare with the Big Data stack?

YCSB: Geode & HBase

Slide 13: Ampool with CDAP

CDAP with HBase (as-is Application)

Configuration Changes
Extension modules/directory
Distributed Mode table/stream

CDAP with Ampool (powered by Geode)

Slide 14: CDAP Demo Pipeline (Video)

Slide 15: Ampool with CDAP

Pipeline Baseline: Ampool & HBase

Slide 16: Key Takeaways

Ampool complements CDAP…

• CDAP simplifies the development of complex big data pipelines and offers extensibility at multiple layers
• In-memory technology such as Geode promises higher performance in certain use cases
• Ampool, powered by Geode, is able to show immediate performance gains without any pipeline re-engineering!
• Future…

Slide 17: Compatible with the Future

Slide 18: Customer Behavior: Predictive Modeling

[Diagram: pipeline stages Ingest, ETL, Analytics, App Use, with ampool and HDFS.]

Data ingestion flows:
• Click streams (Kafka)
• Dim. tables (Sqoop)

2-stage MR pipeline:
• Cleanse data
• Sessionize clickstream

HAWQ stages:
• Data import (PXF)
• Exp. features (MADlib)

Spark modeling stages:
• Feature analysis (MLlib)
• Scoring (R / HAWQ)

Slide 19: Security Analytics: Big Data Insights

[Diagram: pipeline stages Ingest, ETL, Analytics, App Use, with ampool and HDFS.]

Data ingestion flows:
• Security Logs (Flume)

Pig data processing:
• Joins logs w/ catalog
• Stores denorm. logs

Kylin stages:
• Pre-aggregations
• Export to HBase

Downstream Apps:
• Drill-down API for logs
• Web app integration