WEBINAR
How CBS Interactive Uses Cloudera Manager to Effectively Manage their Hadoop Cluster

Wednesday, September 19th, 2012

Manoj Murumkar - Senior Manager, Data Engineering, CBS Interactive
Bala Venkatrao - Director of Products, Cloudera


Agenda

Introductions

CBSi
• Hadoop Use Case
• Operational Challenges
• How Cloudera Manager helps CBSi & Demo
• Benefits of using Cloudera Manager

Cloudera Manager
• Overview & Benefits
• Key Features
• Roadmap

Q&A

©2012 Cloudera, Inc. All Rights Reserved.

Introductions

Manoj Murumkar
Senior Manager, Data Engineering at CBS Interactive

Manoj has been working with data technologies since 1998. His team is currently responsible for providing data infrastructure solutions, and operating them, for the internet division of CBS Corporation. He has been involved with Hadoop for around 3 years, around 2 of them working with Cloudera. His team has built a big data infrastructure from the ground up that supports clickstream analysis using Hadoop Streaming.

Bala Venkatrao
Director, Products at Cloudera

Bala Venkatrao is part of the product management team at Cloudera and leads the efforts around Cloudera Manager. In addition, he is involved in several other initiatives, including customer advocacy, partnership development and marketing.


Building web analytics for Top 10 global web property on Hadoop
235M worldwide monthly unique users

Challenge
• Requires advanced analytics on click stream data in near real time
• Weblog processing time on proprietary platform hit limit while data volumes continuously increased
• Ability/Cost to store historical data for analyses

Solution
• Web analytics platform on Hadoop processes >1B global events/day
• >1PB on Hadoop; 42 nodes
• Tracking clicks, page views, downloads, streaming video events, ad events, etc.
• Hadoop Components: HDFS, Hive, MapReduce, Pig, Hadoop Streaming
• Optimizing what content is placed beside that which user is currently reading

Results
• Reduced processing time by 6+ hrs. to reach SLA
• Accommodates 50% data volume increase per year
• Reduce cost of storing/processing data
• Greater ad revenues achieved

Source: Hadoop World 2012 presentation. Michael Sun, Lead Software Engineer & Manager of DW Operations, CBS Interactive. http://www.cloudera.com/resource/hadoop-world-2012-presentation-slides-building-web-analytics-processing-on-hadoop-at-cbs-interactive/
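
The case study above cites a 50% data volume increase per year on a cluster already holding more than 1 PB. As a rough back-of-the-envelope sketch of what that compounding implies, the snippet below assumes a starting point of exactly 1 PB and a constant 50% annual rate. Both are simplifying assumptions: the slide only says ">1PB" and gives no projection horizon. The function name `project_volume_pb` is made up for illustration.

```python
from typing import List


def project_volume_pb(start_pb: float, yearly_growth: float, years: int) -> List[float]:
    """Return the projected data volume (in PB) at the end of each year.

    Growth is compounded once per year at a constant rate, which is an
    assumption for illustration, not a figure from the presentation.
    """
    volumes = []
    current = start_pb
    for _ in range(years):
        current *= 1.0 + yearly_growth
        volumes.append(current)
    return volumes


# Assumed inputs: 1 PB today, 50% growth per year, 5-year horizon.
projection = project_volume_pb(start_pb=1.0, yearly_growth=0.5, years=5)
for year, volume in enumerate(projection, start=1):
    print(f"year {year}: {volume:.2f} PB")
```

Under these assumptions the stored volume passes 3 PB by the end of year three, which gives a sense of why the cost of storing historical data appears among the challenges.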


CBSi Hadoop Operational Challenges
Prior to Cloudera Manager

Lack of…
• Holistic view
• Configuration control

No audit trail/history of changes

Existing solutions were…
• Ganglia, Hadoop web UI pages and custom scripts
• Difficult to maintain

No visibility into activity failures
• Reactive to user complaints on failed/long running jobs


How Cloudera Manager helps CBSi with Hadoop Operations

Intuitive visual interface
• Can manage and monitor the whole cluster

Overall health status/dashboard
• Ability to drill down from services > roles > hosts

Service Monitoring and Alerting
• Makes Hadoop operations pro-active
• Heatmaps provide an easy way to identify outliers

Activity Monitoring
• Helps identify failed or slow running jobs
• Notify end-users on failed jobs and manage SLAs

Workflows
• Simple to add new ‘data nodes’, hosts, clients etc.
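
The activity monitoring points above (spot failed or slow running jobs, notify end-users, manage SLAs) can be illustrated with a small sketch. This is not Cloudera Manager code and does not use its API. The `JobRun` record, the status labels and the sample jobs are all hypothetical, standing in for whatever job history an operations team has on hand.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class JobRun:
    """Hypothetical job record; field names are illustrative only."""
    name: str
    owner: str
    status: str             # assumed labels: "SUCCEEDED", "FAILED", "RUNNING"
    runtime_minutes: float
    sla_minutes: float


def jobs_needing_attention(runs: List[JobRun]) -> List[str]:
    """Return one message per job that failed or ran past its SLA."""
    messages = []
    for run in runs:
        if run.status == "FAILED":
            messages.append(f"{run.name}: failed, notify {run.owner}")
        elif run.runtime_minutes > run.sla_minutes:
            overrun = run.runtime_minutes - run.sla_minutes
            messages.append(
                f"{run.name}: over SLA by {overrun:.0f} min, notify {run.owner}"
            )
    return messages


# Made-up sample data for illustration.
sample_runs = [
    JobRun("clickstream_sessionize", "alice", "SUCCEEDED", 95.0, 120.0),
    JobRun("ad_events_rollup", "bob", "FAILED", 12.0, 60.0),
    JobRun("video_events_daily", "carol", "RUNNING", 150.0, 120.0),
]

for message in jobs_needing_attention(sample_runs):
    print(message)
```

Running such a check on a schedule is the proactive alternative to waiting for user complaints about failed or long running jobs.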


CLOUDERA MANAGER

CBSi Demo

Key Benefits of Using Cloudera Manager at CBSi

Lowers the barrier for Hadoop administration
• Do not need to rely on experts solely

Makes life easier – saves money & time
• Avoid licensing costs associated with managing multiple tools
• Cuts technical and human resource costs
• Reduces time to manage and maintain the cluster

Provides a “one-stop” holistic view
• Easy to understand how the overall cluster is performing

Helps create repeatable processes & workflows for Hadoop operations

Improves efficiency of the Operations team


The 6 Characteristics of Enterprise Grade Hadoop

Why You Need Cloudera Manager

1. COMPLEXITY: HADOOP IS MORE THAN A DOZEN SERVICES RUNNING ACROSS MANY MACHINES
• HUNDREDS OF HARDWARE COMPONENTS
• THOUSANDS OF SETTINGS
• LIMITLESS PERMUTATIONS

2. CONTEXT: HADOOP IS A SYSTEM, NOT JUST A COLLECTION OF PARTS
• EVERYTHING IS INTERRELATED
• RAW DATA ABOUT INDIVIDUAL PIECES IS NOT ENOUGH
• MUST EXTRACT WHAT’S IMPORTANT

3. EFFICIENCY: MANAGING HADOOP WITH MULTIPLE TOOLS & MANUAL PROCESSES TAKES LONGER
• COMPLICATED, ERROR PRONE WORKFLOWS
• LONGER ISSUE RESOLUTION
• LACK OF CONSISTENT AND REPEATABLE PROCESSES

Cloudera Manager Provides End-to-End CDH Administration

1. DEPLOY: INSTALL, CONFIGURE AND START YOUR CLUSTER IN 3 SIMPLE STEPS

2. CONFIGURE & OPTIMIZE: ENSURE OPTIMAL SETTINGS FOR ALL HOSTS AND SERVICES

3. MONITOR, DIAGNOSE & REPORT: FIND AND FIX PROBLEMS QUICKLY, VIEW CURRENT AND HISTORICAL ACTIVITY AND RESOURCE USAGE

Managing Complexity
One Tool For Everything

• DEPLOYMENT & CONFIGURATION
• MONITORING
• WORKFLOWS
• EVENTS & ALERTS
• LOG SEARCH
• DIAGNOSTICS
• REPORTING
• ACTIVITY MONITORING

CLOUDERA ENTERPRISE

+

DO-IT-YOURSELF

“In a recent Cloudera survey, >95% of respondents emphasized the need for a single end-to-end tool to manage their Hadoop Operations”

Providing Context

Raw Data vs. Hadoop Intelligence

1. SMART CONFIGURATION: AUTO-SETS CONFIGURATIONS & GUARDS AGAINST USER ERROR

2. WORKFLOWS: ENSURES THAT MULTI-STEP TASKS ARE ACCOMPLISHED COMPLETELY & IN THE CORRECT SEQUENCE

3. DEPENDENCIES: AWARE OF HOW A PARTICULAR ACTION AFFECTS THE REST OF THE CLUSTER & MANAGES THE IMPACT

4. EVENTS & ALERTS: MAKES YOU AWARE OF WHAT’S IMPORTANT AT A HADOOP SYSTEM LEVEL

5. HISTORY: COMPARES CURRENT & PAST ACTIVITIES FOR CONTEXT

Cloudera Manager Key Features

Automated Deployment
• Installs the complete Hadoop stack in minutes via a wizard-based interface

Centralized Management
• Gives you complete, end-to-end visibility and control over your Hadoop cluster from a single interface

Multi-Cluster Management
• Allows you to manage multiple clusters from a single instance of Cloudera Manager

LDAP Authentication
• Integrate Cloudera Manager with Active Directory

Global Time Control
• Establishes the time context globally for almost all views
• Correlates jobs, activities, logs, system changes, configuration changes and service metrics along a single timeline to simplify diagnosis

Service & Configuration Management
• Set server roles, configure services and manage security across the cluster
• Gracefully start, stop and restart services as needed

Role-Based Administration
• Supports Administrator and Read-Only users

Audit Trails
• Maintains a complete record of configuration changes with the ability to roll back to previous states

Proactive Health Checks
• Monitors dozens of service performance metrics and alerts you when you approach critical thresholds
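
The proactive health check idea above, alerting as a metric approaches a critical threshold rather than after it is crossed, can be sketched with a two-level threshold. This is a generic illustration, not Cloudera Manager's implementation. The metric names, threshold values and the "OK"/"WARNING"/"CRITICAL" labels are assumptions chosen for the example.

```python
from typing import Dict, Tuple


def classify_metric(value: float, warning: float, critical: float) -> str:
    """Map a metric reading to a health label using two thresholds.

    Assumes higher readings are worse and that warning <= critical.
    """
    if value >= critical:
        return "CRITICAL"
    if value >= warning:
        return "WARNING"
    return "OK"


# Hypothetical thresholds per metric: (warning, critical).
thresholds: Dict[str, Tuple[float, float]] = {
    "hdfs_capacity_used_pct": (75.0, 90.0),
    "datanode_heap_used_pct": (80.0, 95.0),
}

# Hypothetical current readings.
readings: Dict[str, float] = {
    "hdfs_capacity_used_pct": 82.0,
    "datanode_heap_used_pct": 61.5,
}

health = {
    name: classify_metric(readings[name], *thresholds[name])
    for name in readings
}
for name, label in health.items():
    print(f"{name}: {label}")
```

The warning band is what makes the check proactive: operators hear about the first metric while there is still headroom before the critical level.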


Cloudera Manager Key Features

Intelligent Log Management
• Gather, view and search Hadoop logs collected from across the cluster
• Scans Hadoop logs for irregularities and warns you before they impact the cluster

Event Management
• Creates and aggregates relevant Hadoop events pertaining to system health, log messages, user services and activities, and makes them available for alerting and searching

Alerting
• Generates email alerts when certain events occur

Activity Monitoring
• Consolidates all cluster activity into a single, real-time view

Host Level Monitoring
• View information pertaining to hosts in your cluster including status, resident memory, virtual memory and roles

Heatmaps
• Visualize health status and metrics across the cluster to quickly identify problem nodes and take action

Operational Reports
• Visualize current and historical disk usage by user, group and directory
• Track MapReduce activity on the cluster by job or user

Support Integration
• Takes a snapshot of the cluster state and automatically sends it to Cloudera support to assist with resolution

Comprehensive API
• Easily integrate Cloudera Manager with your existing enterprise-wide management and monitoring tools
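
The Heatmaps feature above is described as a way to quickly identify problem nodes. The slides do not say how outliers are determined, so the sketch below swaps in a common generic technique: the modified z-score based on the median absolute deviation (MAD), with the conventional 0.6745 scaling constant and 3.5 cutoff. The host names and load values are made up, and `find_outlier_hosts` is a hypothetical helper, not part of Cloudera Manager.

```python
from statistics import median
from typing import Dict, List


def find_outlier_hosts(metric_by_host: Dict[str, float], cutoff: float = 3.5) -> List[str]:
    """Return hosts whose metric deviates strongly from the cluster median.

    Uses the modified z-score: 0.6745 * |x - median| / MAD.
    Returns an empty list when MAD is zero (no spread to compare against).
    """
    values = list(metric_by_host.values())
    center = median(values)
    mad = median(abs(v - center) for v in values)
    if mad == 0:
        return []
    return sorted(
        host
        for host, value in metric_by_host.items()
        if 0.6745 * abs(value - center) / mad > cutoff
    )


# Made-up per-host load averages for illustration.
load_by_host = {
    "node01": 2.1,
    "node02": 2.4,
    "node03": 1.9,
    "node04": 2.2,
    "node05": 9.8,
}

print(find_outlier_hosts(load_by_host))
```

A median-based score is used here instead of mean and standard deviation because a single badly behaving node would otherwise inflate the spread and hide itself.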


Cloudera Manager Roadmap

Maintenance mode

Platform Support
• Manage additional services like Flume, Hive etc.

Monitoring
• ZooKeeper monitoring
• Advanced HBase monitoring

Rolling Upgrades

Usability enhancements
• Improved error handling
• Log search enhancements
• Enhanced charting


Why Enterprises are Standardizing on Cloudera Manager

1. SIMPLE: END-TO-END HADOOP ADMINISTRATION IN A SINGLE TOOL

2. INTELLIGENT: MANAGES HADOOP AT THE SYSTEM LEVEL - CLOUDERA’S EXPERIENCE REALIZED IN SOFTWARE

3. EFFICIENT: SIMPLIFIES COMPLEX WORKFLOWS & MAKES ADMINISTRATORS MORE EFFICIENT

4. BEST-IN-CLASS: THE ONLY ENTERPRISE-GRADE HADOOP MANAGEMENT APPLICATION AVAILABLE

Next Steps

• Try out FREE edition of Cloudera Manager
• Download from: http://www.cloudera.com/products-services/tools/
• Support available via [email protected]
• For Cloudera Enterprise subscriptions, please contact: [email protected]

Q & A
For more information go to www.cloudera.com

THANK YOU!
We appreciate your time and interest in Cloudera!

For more information: www.cloudera.com
Sales: (888)789-1488