Hadoop WorkFlow Scheduler / Automation Engine Azkaban & Oozie Praveen Thirukonda Senior Associate Data & Analytics Orange County, CA 09/11/2014

Azkaban - WorkFlow Scheduler/Automation Engine


DESCRIPTION

Azkaban - WorkFlow Scheduler/Automation Engine Seminar given at KPMG by Praveen Thirukonda.




© 2014 KPMG LLP, a Delaware limited liability partnership and the U.S. member firm of the KPMG network of independent member firms affiliated with KPMG International Cooperative (“KPMG International”), a Swiss entity. All rights reserved. INTERNAL USE ONLY. Not for distribution to clients unless the technical and policy review requirements of Tax Services Manual section 23.7 are satisfied.


What is a workflow?

- A workflow is a Directed Acyclic Graph (DAG) of “jobs” where each job has one or more inputs and outputs.

- A workflow scheduler helps us manage the coordination among the various jobs.
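
To make the DAG idea concrete, here is a minimal Python sketch that derives a valid run order from declared dependencies. The job names are hypothetical and not from the talk; the ordering uses the standard library's `graphlib` module (Python 3.9+), not any scheduler's own code.

```python
from graphlib import TopologicalSorter, CycleError

# Hypothetical jobs: each key lists the jobs whose output it needs.
workflow = {
    "extract": set(),
    "clean": {"extract"},
    "aggregate": {"clean"},
    "report": {"aggregate"},
}

def run_order(dag):
    """Return the jobs in an order that respects every dependency."""
    try:
        return list(TopologicalSorter(dag).static_order())
    except CycleError as exc:
        # A cycle means the graph is not a DAG and cannot be scheduled.
        raise ValueError(f"not a DAG, cycle found: {exc.args[1]}") from exc

order = run_order(workflow)
print(order)
```

Because each job here depends on exactly one predecessor, only one order is valid; with branching dependencies several orders may be equally correct.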


When do we need a workflow scheduler?

- In a data pipeline, batch jobs need to be scheduled to run periodically.

- They also typically have intricate dependency chains, for example dependencies on various data extraction processes or on previous steps.

- Larger processes might have 50 or 60 steps, of which some might run in parallel and others must wait for the output of earlier steps.
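
The mix of parallel and waiting steps can be sketched in Python as follows. The step names are hypothetical, and grouping steps into "waves" is a simplification: a real scheduler would start each step as soon as its own dependencies finish rather than waiting for a whole wave.

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: two extractions can run side by side,
# the join must wait for both, and the report waits for the join.
pipeline = {
    "extract_orders": set(),
    "extract_customers": set(),
    "join": {"extract_orders", "extract_customers"},
    "report": {"join"},
}

def parallel_waves(dag):
    """Group steps into waves; steps in one wave have no pending dependencies."""
    sorter = TopologicalSorter(dag)
    sorter.prepare()
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # everything runnable right now
        waves.append(ready)
        sorter.done(*ready)  # pretend the whole wave has finished
    return waves

waves = parallel_waves(pipeline)
for number, wave in enumerate(waves, start=1):
    print(number, wave)
```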


Azkaban


What is Azkaban?

- “cron on steroids”

- A workflow scheduler can be seen as a combination of the cron and make Unix utilities, combined with a friendly UI.


- Azkaban was implemented at LinkedIn to solve the problem of Hadoop job dependencies.

- Azkaban resolves the ordering through job dependencies and provides an easy-to-use web user interface to maintain and track your workflows.
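
In Azkaban's classic job format, each job is a small key=value properties file, and a `dependencies` key names the jobs that must finish first. The Python sketch below parses such text to show how an ordering can be derived from it. The job names and commands are hypothetical, and the parsing is an illustration only, not Azkaban's actual implementation.

```python
from graphlib import TopologicalSorter

# Hypothetical job definitions in key=value style; in a real project
# each would typically be its own "<name>.job" file.
JOB_FILES = {
    "load": "type=command\ncommand=echo load raw data\n",
    "transform": "type=command\ncommand=echo transform\ndependencies=load\n",
    "publish": "type=command\ncommand=echo publish\ndependencies=transform\n",
}

def parse_job(text):
    """Parse key=value lines, skipping blanks and '#' comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props

def dependency_graph(job_files):
    """Map each job name to the set of jobs listed under 'dependencies'."""
    graph = {}
    for name, text in job_files.items():
        deps = parse_job(text).get("dependencies", "")
        graph[name] = {d.strip() for d in deps.split(",") if d.strip()}
    return graph

graph = dependency_graph(JOB_FILES)
order = list(TopologicalSorter(graph).static_order())
print(order)
```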


An image is worth a thousand words.


Apache Oozie


What is Apache Oozie?

- Similar to Azkaban.

- Whereas Azkaban uses a series of properties files, Oozie uses an XML file.

- Oozie supports a Java API and command-line methods for workflow submission, in addition to a browser interface/REST API.

- Oozie is part of our Hortonworks environment in our cluster.
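
To illustrate the XML style, the Python sketch below embeds a small workflow in the Oozie `workflow-app` format and lists the transitions it declares. The workflow name, action name, path and schema version are assumptions chosen for illustration; this is not a workflow from the talk.

```python
import xml.etree.ElementTree as ET

# Hypothetical workflow in the Oozie workflow-app style; the names and
# the schema version are assumptions made for this example.
WORKFLOW_XML = """
<workflow-app name="demo-wf" xmlns="uri:oozie:workflow:0.4">
  <start to="load"/>
  <action name="load">
    <fs><mkdir path="${nameNode}/tmp/demo"/></fs>
    <ok to="end"/>
    <error to="fail"/>
  </action>
  <kill name="fail">
    <message>Workflow failed</message>
  </kill>
  <end name="end"/>
</workflow-app>
"""

NS = {"wf": "uri:oozie:workflow:0.4"}

def transitions(xml_text):
    """List (from, outcome, to) edges declared by the workflow."""
    root = ET.fromstring(xml_text.strip())
    edges = [("start", "ok", root.find("wf:start", NS).get("to"))]
    for action in root.findall("wf:action", NS):
        name = action.get("name")
        edges.append((name, "ok", action.find("wf:ok", NS).get("to")))
        edges.append((name, "error", action.find("wf:error", NS).get("to")))
    return edges

edges = transitions(WORKFLOW_XML)
print(edges)
```

Unlike the properties style, where each job names its predecessors, each node here names the node to run next on success and on error.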


Advantages of using a workflow scheduler

- Lets you easily manage dependencies among the various tasks.

- Scheduling of workflows.

- Monitor the progress of your workflow with a nice interface.

- Email alerts on failures and successes.

- Retrying of failed jobs.
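
The retry and alert behaviour can be sketched in Python as below. The function and parameter names are hypothetical, and `notify` merely stands in for an email alert; schedulers typically expose these features through configuration rather than user code.

```python
import time

def run_with_retries(job, retries=2, backoff_seconds=0.0, notify=print):
    """Run job(); on failure retry up to `retries` times, then alert."""
    attempts = retries + 1
    for attempt in range(1, attempts + 1):
        try:
            result = job()
        except Exception as exc:
            if attempt == attempts:
                notify(f"FAILED after {attempt} attempts: {exc}")
                raise
            time.sleep(backoff_seconds)  # wait before the next attempt
        else:
            notify(f"SUCCEEDED on attempt {attempt}")
            return result

# Hypothetical flaky job: fails twice, then succeeds.
calls = {"count": 0}

def flaky_job():
    calls["count"] += 1
    if calls["count"] < 3:
        raise RuntimeError("transient error")
    return "done"

messages = []
result = run_with_retries(flaky_job, retries=2, notify=messages.append)
print(result, messages)
```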


Application of a workflow scheduler

- A real-life example of how and where you might use a workflow scheduler in your big data system architecture.


Thank you

Presentation by Praveen Thirukonda


© 2014 KPMG LLP, a Delaware limited liability partnership and the U.S. member firm of the KPMG network of independent member firms affiliated with KPMG International Cooperative (“KPMG International”), a Swiss entity. All rights reserved.

The KPMG name, logo and “cutting through complexity” are registered trademarks or trademarks of KPMG International.