Breathe New Life Into Your Data Warehouse by Offloading ETL on Hadoop Shahab Kamal Supreet Oberoi


Page 1: Breathe new life into your data warehouse by offloading etl processes to hadoop

Breathe New Life Into Your Data Warehouse by Offloading ETL on Hadoop

Shahab Kamal, Supreet Oberoi

Page 2: Breathe new life into your data warehouse by offloading etl processes to hadoop

ABOUT CONCURRENT

TRUSTED by over 10,000 companies as their big data app platform

BACKED by top Silicon Valley investors True Ventures, Rembrandt VP, Bain Capital

FOUNDED in 2008, with headquarters in San Francisco

Page 3: Breathe new life into your data warehouse by offloading etl processes to hadoop

ABOUT BITWISE

• Founded in 1995

• HQ in Chicago, IL

• Offices in India & Australia

• ISO 9001:2008 & ISO 27001:2005 Certified

Most experienced data professionals

Proprietary frameworks and accelerators for guaranteed, efficient and cost-effective services for data projects

Page 4: Breathe new life into your data warehouse by offloading etl processes to hadoop

ENTERPRISE ENGAGEMENT WITH HADOOP IS GAINING DEPTH…

Improving brand experience, creating new revenue channels, enhancing operational visibility into risk & compliance, and reducing TCO have been the key drivers, with engagement at the level of CEO, CIO, and CDO

Page 5: Breathe new life into your data warehouse by offloading etl processes to hadoop

EMERGING ENTERPRISE ARCHITECTURE FOR HADOOP

[Architecture diagram labels: Reporting; Data Mining; Analytics; Exploratory; Discovery; Search; Data Mart; Stage; Transform; Archive; Data Lake]

Page 6: Breathe new life into your data warehouse by offloading etl processes to hadoop

CASE STUDY

[Diagram labels, data flow: Data Source; ETL Application; RDBMS; Recovery Application; Analytics; Reporting]

[Diagram labels, supporting tools: Automated ETL Conversion; Data Quality Monitoring; ETL Testing]

[Diagram labels, ETL tool: Developer UI; XML; Custom Code; Execution Service; Cascading Framework; Generate Cascading Flow; Launch MapReduce Jobs; On Execution]

Page 7: Breathe new life into your data warehouse by offloading etl processes to hadoop

BITWISE ETL TOOL ARCHITECTURE

[Diagram labels: ETL Conversion; QualiDI; Data Quality Framework; Developer UI; XML; Custom Code; Execution Service; Cascading Framework; Development Environment]
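The Developer UI, XML, and Execution Service labels suggest a declarative job definition that an execution layer turns into a runnable flow. The following is a minimal sketch of that idea only, written in Python rather than against the Cascading API. The XML element names, attribute names, and the `parse_job` and `plan` helpers are all hypothetical and are not the Bitwise format or tooling.

```python
import xml.etree.ElementTree as ET

# Hypothetical job definition. The element and attribute names are invented
# for illustration; they are not the Bitwise XML format.
JOB_XML = """
<job name="orders_etl">
  <source id="orders" path="/data/in/orders.csv"/>
  <filter id="valid" input="orders" expr="amount &gt; 0"/>
  <aggregate id="totals" input="valid" group_by="customer_id" op="sum" field="amount"/>
  <sink id="out" input="totals" path="/data/out/totals"/>
</job>
"""


def parse_job(xml_text):
    """Return (job_name, steps), where steps is a list of dicts in file order."""
    root = ET.fromstring(xml_text)
    steps = [{"kind": el.tag, **el.attrib} for el in root]
    return root.get("name"), steps


def plan(steps):
    """Order step ids so each step comes after the step named by its 'input'.

    Assumes every 'input' refers to a defined step and there are no cycles.
    """
    by_id = {s["id"]: s for s in steps}
    ordered, seen = [], set()

    def visit(step):
        if step["id"] in seen:
            return
        upstream = step.get("input")
        if upstream is not None:
            visit(by_id[upstream])
        seen.add(step["id"])
        ordered.append(step["id"])

    for s in steps:
        visit(s)
    return ordered


job_name, steps = parse_job(JOB_XML)
execution_order = plan(steps)
```

In a real tool, an execution service would map each planned step onto the underlying framework's sources, operations, and sinks instead of returning a list of ids.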

Page 8: Breathe new life into your data warehouse by offloading etl processes to hadoop

DRIVEN PROVIDES OPERATIONAL READINESS TO ETL WORKLOADS

PERFORMANCE MANAGEMENT FOR BIG DATA APPLICATIONS

BUILD higher quality big data apps

RUN big data apps more reliably

MANAGE big data apps more effectively

Page 9: Breathe new life into your data warehouse by offloading etl processes to hadoop

BUILD HIGHER QUALITY BIG DATA APPS

Fully visualize your entire data pipeline

Quickly and easily identify execution errors

[Screenshot labels: SOURCES; OPERATIONS (functions, filters, joins, and aggregators); RESULTS]

Page 10: Breathe new life into your data warehouse by offloading etl processes to hadoop

BUILD HIGHER QUALITY BIG DATA APPS

Fully visualize your entire data pipeline

Quickly and easily identify execution errors

Page 11: Breathe new life into your data warehouse by offloading etl processes to hadoop

RUN BIG DATA APPS MORE RELIABLY

Watch your apps execute in real time

Easily detect apps that violate SLAs and policies

Pinpoint bottlenecks and identify causes

[Screenshot label: CURRENTLY EXECUTING]

Page 12: Breathe new life into your data warehouse by offloading etl processes to hadoop

RUN BIG DATA APPS MORE RELIABLY

Watch your apps execute in real time

Easily detect apps that violate SLAs and policies

Pinpoint bottlenecks and identify causes

[Screenshot labels: EXECUTING; WAITING; DETAILED MAPPER/REDUCER STATS]

Page 13: Breathe new life into your data warehouse by offloading etl processes to hadoop

RUN BIG DATA APPS MORE RELIABLY

Watch your apps execute in real time

Easily detect apps that violate SLAs and policies

Pinpoint bottlenecks and identify causes

For example, see metrics for all apps on the production cluster that failed to execute in under 5 minutes…

…or all applications that use more than their allotment of mappers
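The two example queries above amount to simple filters over per-run metrics. A minimal sketch in Python, assuming a hypothetical `AppRun` record and made-up sample data; this does not use Driven's query interface, which the slides do not show.

```python
from dataclasses import dataclass


@dataclass
class AppRun:
    # Hypothetical per-run metrics; field names are assumptions for illustration.
    name: str
    cluster: str
    duration_s: float
    mappers_used: int
    mapper_allotment: int


RUNS = [
    AppRun("nightly_load", "production", 420.0, 64, 50),
    AppRun("dedupe", "production", 120.0, 20, 50),
    AppRun("qa_smoke", "qa", 900.0, 10, 20),
    AppRun("enrich", "production", 300.0, 30, 30),
]


def slow_runs(runs, cluster, limit_s):
    """Runs on the given cluster that did not finish in under limit_s seconds."""
    return [r for r in runs if r.cluster == cluster and r.duration_s >= limit_s]


def over_allotment(runs):
    """Runs that used more mappers than they were allotted."""
    return [r for r in runs if r.mappers_used > r.mapper_allotment]


slow_on_production = [r.name for r in slow_runs(RUNS, "production", 5 * 60)]
greedy = [r.name for r in over_allotment(RUNS)]
```

A run that takes exactly five minutes counts as slow here, because the condition in the slide is "failed to execute in under 5 minutes".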

Page 14: Breathe new life into your data warehouse by offloading etl processes to hadoop

MANAGE BIG DATA APPS MORE EFFECTIVELY

See how all apps consume resources as they run

Compare performance, resource consumption, and other metrics across departments, teams and any segment you define

Page 15: Breathe new life into your data warehouse by offloading etl processes to hadoop

MANAGE BIG DATA APPS MORE EFFECTIVELY

See how all apps consume resources as they run

Segment performance by team, by department or custom tags for role-based views, chargeback models, and capacity planning

For example, see performance of all apps owned by the DevOps team

[Example segments: Marketing; Sales; Compliance; Data science team; QA cluster; Production cluster]
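Segmenting by team, department, or custom tag can be thought of as a group-by over tagged run records. A minimal sketch in Python, assuming hypothetical run dictionaries with invented app names and numbers; it illustrates the aggregation only, not how Driven stores or computes these views.

```python
from collections import defaultdict

# Hypothetical tagged runs. App names, tags, and metrics are made up.
RUNS = [
    {"app": "clickstream_load", "tags": {"Marketing", "Production cluster"},
     "duration_s": 240, "mappers": 40},
    {"app": "lead_scoring", "tags": {"Sales", "Production cluster"},
     "duration_s": 600, "mappers": 120},
    {"app": "audit_export", "tags": {"Compliance", "QA cluster"},
     "duration_s": 90, "mappers": 10},
    {"app": "campaign_stats", "tags": {"Marketing", "QA cluster"},
     "duration_s": 60, "mappers": 8},
]


def summarize_by_tag(runs):
    """Total run count, duration, and mapper usage for every tag."""
    totals = defaultdict(lambda: {"runs": 0, "duration_s": 0, "mappers": 0})
    for run in runs:
        # A run with several tags contributes to each of its segments.
        for tag in run["tags"]:
            entry = totals[tag]
            entry["runs"] += 1
            entry["duration_s"] += run["duration_s"]
            entry["mappers"] += run["mappers"]
    return dict(totals)


summary = summarize_by_tag(RUNS)
```

Because a run counts toward every tag it carries, totals across segments can exceed the cluster-wide total; a chargeback model would need a rule for splitting shared runs.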

Page 16: Breathe new life into your data warehouse by offloading etl processes to hadoop

MANAGE BIG DATA APPS FOR COMPLIANCE

Visualize Lineage – See exactly how each app ingests, manipulates and outputs data

Further inspect lineage by detecting apps that write to, or read from, a given dataset

[Screenshot labels: SOURCES; OPERATIONS (functions, filters, joins, and aggregators); RESULTS]

Page 17: Breathe new life into your data warehouse by offloading etl processes to hadoop

MANAGE BIG DATA APPS FOR COMPLIANCE

Visualize Lineage – See exactly how each app ingests, manipulates and outputs data

Further inspect lineage by detecting apps that write to, or read from, a given dataset

For example, show all apps that interact with the dataset in “rain.txt”
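Finding every app that reads from or writes to a dataset is an inverted-index lookup over each app's inputs and outputs. A minimal sketch in Python, assuming hypothetical app names and paths (only “rain.txt” comes from the slide); it is not Driven's lineage implementation.

```python
from collections import defaultdict

# Hypothetical apps with their input and output datasets.
APPS = {
    "ingest_weather": {"reads": ["raw/stations.csv"], "writes": ["rain.txt"]},
    "rain_report": {"reads": ["rain.txt"], "writes": ["reports/rain_summary.csv"]},
    "sales_rollup": {"reads": ["sales.csv"], "writes": ["reports/sales.csv"]},
}


def build_index(apps):
    """Map each dataset to the sets of apps that read it and write it."""
    index = defaultdict(lambda: {"readers": set(), "writers": set()})
    for app, io in apps.items():
        for dataset in io["reads"]:
            index[dataset]["readers"].add(app)
        for dataset in io["writes"]:
            index[dataset]["writers"].add(app)
    return index


def apps_touching(index, dataset):
    """Sorted names of all apps that read from or write to the dataset."""
    entry = index.get(dataset)  # .get avoids creating an entry for unknown datasets
    if entry is None:
        return []
    return sorted(entry["readers"] | entry["writers"])


index = build_index(APPS)
rain_apps = apps_touching(index, "rain.txt")
```

Keeping readers and writers in separate sets also makes it possible to answer the narrower questions, such as which apps produce a dataset.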

Page 18: Breathe new life into your data warehouse by offloading etl processes to hadoop

MANAGE BIG DATA APPS FOR COLLABORATION

Create Jira issues with views and data to collaborate quickly on resolving performance problems

Integrate alerts with popular notification platforms like HipChat, PagerDuty, & Nagios

With one click, create a Jira issue with a link to this view

Page 19: Breathe new life into your data warehouse by offloading etl processes to hadoop

MANAGE BIG DATA APPS FOR COLLABORATION

Create Jira issues with views and data to collaborate quickly on resolving performance problems

Integrate alerts with popular notification platforms like HipChat, PagerDuty, & Nagios

Automatically send app status notifications via webhooks or JMX

Page 20: Breathe new life into your data warehouse by offloading etl processes to hadoop

NURTURE A CULTURE OF OPERATIONAL EXCELLENCE

“The coolest part about Driven is being able to visualize data pipelines and inspect components in real time for easy troubleshooting and optimization. I don't know of any other tool that's close in functionality.”

- Neville Li, Software Engineer, Spotify

“Driven has given us a way to monitor the performance of our data-driven applications in a manner which is visually intuitive to both engineering and business users.”

- Joao Vicente, Performance Architect, Dun & Bradstreet

Page 21: Breathe new life into your data warehouse by offloading etl processes to hadoop

… THROUGH A SCALABLE, SEARCHABLE METADATA STORE

End-to-end operational telemetry metadata for big data applications

Accessible via Web browser, command-line interface (CLI), or simple search queries

Easy integrations through JMX and upcoming Driven SDK

[Architecture diagram labels: HADOOP CLUSTERS; HADOOP APPS AND INFRASTRUCTURE; YARN; APPLICATIONS; Plugin; Telemetry metadata (SSL); WAR files; Web App Server; Web; CLI; JMX]

Page 22: Breathe new life into your data warehouse by offloading etl processes to hadoop

THANK YOU