CLOUDERA: CDH4 MELBOURNE HUG

Presented by Angus Klein, Sr. Director of Support, and Linden Hillenbrand, Field Support Engineer

files.meetup.com/2808892/Melbourne_HUG.pdf

ABOUT CLOUDERA
1. STARTED IN ’08
2. > 250 EMPLOYEES
3. > 1,200 PATCHES IN 2011
4. PARTNER PROGRAM
5. CERTIFIED & COMPATIBLE

CLOUDERA TIMELINE

2008 CLOUDERA FOUNDED BY MIKE OLSON, AMR AWADALLAH & JEFF HAMMERBACHER
2009 HADOOP CREATOR DOUG CUTTING JOINS CLOUDERA
2009 CDH: FIRST COMMERCIAL APACHE HADOOP DISTRIBUTION
2010 CLOUDERA MANAGER: FIRST MANAGEMENT APPLICATION FOR HADOOP
2011 CLOUDERA REACHES 100 PRODUCTION CUSTOMERS
2011 CLOUDERA UNIVERSITY EXPANDS TO 140 COUNTRIES
2012 CLOUDERA ENTERPRISE 4: THE STANDARD FOR HADOOP IN THE ENTERPRISE
2012 CLOUDERA CONNECT REACHES 300 PARTNERS
BEYOND… TRANSFORMING HOW COMPANIES THINK ABOUT DATA

CDH, CLOUDERA MANAGER, CLOUDERA ENTERPRISE 4

CHANGING THE WORLD ONE PETABYTE AT A TIME

Cloudera’s proactive, production-level support gives you the expertise and responsiveness you need to run Apache Hadoop in production.

Support centers and support presence (map): Palo Alto, CA · San Francisco, CA · Seattle, WA · Tucson, AZ · Austin, TX · RTP, NC · Boston · New York · UK · Stuttgart, Germany · Chennai, India · Tokyo · Melbourne, Australia

©2011 Cloudera, Inc. All Rights Reserved. Confidential. Reproduction or redistribution without written permission is prohibited.

THE FUTURE: CDH4

What does the enterprise need?

CDH4

THE STANDARD FOR HADOOP IN THE ENTERPRISE

100% OPEN SOURCE HADOOP DISTRIBUTION

CLOUDERA ENTERPRISE 4.0

CLOUDERA MANAGER 4

PRODUCTION SUPPORT

• Scalable and extensible
• High Availability
• Integration with the rest of IT
• Secure
• Simplified configuration and deployment
• Global support and services

Highly Available NameNode
Function: A second NameNode runs as a hot standby, ready for failover.
Benefit: Eliminates the only remaining single point of failure in HDFS.

Heterogeneous Clusters
Function: Different nodes in a cluster can run different Hadoop versions.
Benefit: Lower downtime, because code changes can be staged gradually onto selected nodes of a cluster.

Increased usability for mission-critical use cases and applications.
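To make the hot-standby idea concrete, here is a minimal Python sketch of client-side failover between an active and a standby NameNode. It is a toy model, not HDFS code: the `NameNode` class, its `state` field and `StandbyError` are hypothetical. In a real CDH4 HA cluster, the HDFS client's failover proxy provider retries against the other configured NameNode, and administrators drive transitions with `hdfs haadmin`.

```python
from dataclasses import dataclass


class StandbyError(Exception):
    """Raised when a request reaches a NameNode that is not active (hypothetical)."""


@dataclass
class NameNode:
    # Hypothetical stand-in for a NameNode endpoint; not an HDFS API.
    name: str
    state: str  # "active" or "standby"

    def get_file_info(self, path: str) -> str:
        if self.state != "active":
            raise StandbyError(f"{self.name} is in {self.state} state")
        return f"{self.name} served {path}"


def call_with_failover(namenodes, path):
    """Try each configured NameNode in turn until an active one answers."""
    last_error = None
    for nn in namenodes:
        try:
            return nn.get_file_info(path)
        except StandbyError as err:
            last_error = err  # skip the standby and try the next node
    raise RuntimeError("no active NameNode available") from last_error


nn1 = NameNode("nn1", "active")
nn2 = NameNode("nn2", "standby")
print(call_with_failover([nn1, nn2], "/user/data"))  # nn1 served /user/data

# Simulate a failover: nn1 drops to standby and nn2 is promoted to active.
nn1.state, nn2.state = "standby", "active"
print(call_with_failover([nn1, nn2], "/user/data"))  # nn2 served /user/data
```

The client keeps the same list of NameNodes before and after the failover; only the node that answers changes, which is why applications do not need reconfiguring.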

Store more sensitive data in CDH and get granular access control to facilitate multi-tenancy.

HBase Table & Column Permissions
Function: Control which users and groups have access to HBase tables and columns.
Benefit: Allows sensitive data to be stored in HBase and facilitates multi-tenancy.

Fair Scheduler ACLs
Function: Control which groups can administer or submit jobs to each Fair Scheduler pool.
Benefit: Makes a multi-tenant cluster easier to administer.
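The difference between table-level and column-level permissions can be pictured with a small in-memory model. This is an illustration under stated assumptions, not HBase's access controller: grants are keyed by user, table and an optional column family, and a table-wide grant covers every family. Group grants, qualifier-level grants and Fair Scheduler pool ACLs are left out. On a real cluster, grants are issued with the HBase shell `grant` command.

```python
# Toy model of table- and column-family-level grants; not HBase code.
# Keys are (user, table, family); family=None means the grant covers the whole table.
grants = {}


def grant(user, actions, table, family=None):
    """Record a grant; actions is a string such as "R" or "RW"."""
    grants.setdefault((user, table, family), set()).update(actions)


def is_allowed(user, action, table, family):
    """Allow the action if the user holds it table-wide or on this family."""
    table_wide = grants.get((user, table, None), set())
    per_family = grants.get((user, table, family), set())
    return action in table_wide or action in per_family


grant("analyst", "R", "customers", "public")  # read access to one family only
grant("etl", "RW", "customers")               # read/write on the whole table

print(is_allowed("analyst", "R", "customers", "public"))  # True
print(is_allowed("analyst", "R", "customers", "pii"))     # False
print(is_allowed("etl", "W", "customers", "pii"))         # True
```

Keeping sensitive columns in their own family is what makes the per-family check useful: the `analyst` user can share the `customers` table without seeing the `pii` family.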

CDH extends its leadership and ability to solve a broader range of problems than other data management systems.

Co-processors
Function: Create and run custom programs that operate on data as it changes, in real time.
Benefit: Developers can create more sophisticated real-time applications on top of HBase.

Open Resource Management (a.k.a. MR2)
Function: Enables multiple data processing frameworks to run on the same Hadoop cluster.
Benefit: Saves cost by running more applications on the same storage and cluster resources.
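HBase coprocessors are Java classes loaded by the RegionServers; observer coprocessors implement hooks such as `prePut` and `postPut` that run around each write. The Python sketch below only mimics that observer pattern under simplifying assumptions: `Region`, `RowCounter`, `pre_put` and `post_put` are hypothetical names, and there is no RPC, write-ahead log or region splitting.

```python
class Region:
    """Toy stand-in for an HBase region holding row -> value data."""

    def __init__(self, observers=()):
        self.rows = {}
        self.observers = list(observers)

    def put(self, row, value):
        # Each observer can inspect or veto the write before it is applied...
        for obs in self.observers:
            obs.pre_put(row, value)
        self.rows[row] = value
        # ...and react after the data has changed.
        for obs in self.observers:
            obs.post_put(row, value)


class RowCounter:
    """Hypothetical observer: rejects empty values and counts successful writes."""

    def __init__(self):
        self.writes = 0

    def pre_put(self, row, value):
        if value is None:
            raise ValueError(f"refusing to store None for row {row!r}")

    def post_put(self, row, value):
        self.writes += 1


counter = RowCounter()
region = Region([counter])
region.put("user1", "alice")
region.put("user2", "bob")
print(counter.writes)  # 2
```

Because the hooks run inside the write path, logic such as validation or maintaining counters happens as data changes rather than in a later batch job.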

• Common compression codec (Snappy)
• Common file format (Avro)
• REST over HTTP access to HDFS
• Manage limitless file counts with NameNode federation
• Web shell for Pig, HBase and Flume
• Slot-less resource manager
• Faster, more usable web access to Hadoop systems for users
• Support for concurrent queries over Hive
• 100% gain in filesystem I/O performance
• 100% speedup in HBase random reads
• 200% improvement in Flume data ingest rate
• 30% faster MapReduce shuffle
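REST over HTTP access to HDFS is provided by WebHDFS (and the HttpFS gateway). Below is a minimal Python sketch of building WebHDFS request URLs, assuming WebHDFS is enabled on the cluster (`dfs.webhdfs.enabled`), the NameNode web port is the CDH4-era default 50070, and simple `user.name` authentication is used rather than Kerberos. The host name is hypothetical.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import urlopen


def webhdfs_url(host, path, op, user=None, port=50070):
    """Build a WebHDFS URL: http://<host>:<port>/webhdfs/v1/<path>?op=<OP>."""
    if not path.startswith("/"):
        raise ValueError("HDFS paths must be absolute")
    query = {"op": op}
    if user is not None:
        query["user.name"] = user  # simple (pseudo) authentication, no Kerberos
    return f"http://{host}:{port}/webhdfs/v1{quote(path)}?{urlencode(query)}"


def list_status(host, path, user):
    """List a directory; requires a reachable NameNode with WebHDFS enabled."""
    with urlopen(webhdfs_url(host, path, "LISTSTATUS", user)) as resp:
        return json.load(resp)["FileStatuses"]["FileStatus"]


print(webhdfs_url("namenode.example.com", "/user/alice", "LISTSTATUS", user="alice"))
# http://namenode.example.com:50070/webhdfs/v1/user/alice?op=LISTSTATUS&user.name=alice
```

Since the interface is plain HTTP and JSON, any language or tool with an HTTP client can read HDFS metadata without the Hadoop Java libraries.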

BUT HOW DO I MANAGE THIS? CM4

Cloudera Enterprise is the easiest Hadoop solution to deploy and manage.

3-Step HA Configuration
Function: Enable high availability for the NameNode in 3 steps.
Benefit: Guided setup reduces more than a dozen manual procedures to 3 simple steps.

Backward Compatible
Function: Support CDH3 and CDH4 clusters with Cloudera Manager 4.
Benefit: Flexibility in management.

Multi-Cluster Management
Function: Manage multiple clusters from a single instance of Cloudera Manager.
Benefit: Central administration for your entire CDH environment.

Cloudera Manager provides rich visualizations and sophisticated automation to reliably manage large-scale clusters.

Heatmaps
Function: Visualizes health status and metrics across the cluster.
Benefit: Quickly identify problem nodes within large clusters and take action.

Federated NameNode Management
Function: Configure and manage NameNode federation.
Benefit: Simplifies growing CDH to billions of files across thousands of nodes.
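With federation, several independent NameNodes each own part of the namespace, and clients commonly see a single tree through a client-side mount table (ViewFs). The sketch below shows the longest-prefix lookup such a mount table performs. The mount points and NameNode host names are hypothetical, and real ViewFs mount tables are defined in client configuration rather than in code.

```python
# Hypothetical mount table: path prefix -> NameNode owning that part of the namespace.
MOUNTS = {
    "/user": "hdfs://nn1.example.com:8020",
    "/data": "hdfs://nn2.example.com:8020",
    "/data/archive": "hdfs://nn3.example.com:8020",
}


def resolve(path, mounts=MOUNTS):
    """Map an absolute path to a fully qualified URI via longest-prefix match."""
    best = None
    for prefix in mounts:
        # Compare whole path components so "/database" does not match "/data".
        if path == prefix or path.startswith(prefix + "/"):
            if best is None or len(prefix) > len(best):
                best = prefix
    if best is None:
        raise KeyError(f"no mount point covers {path}")
    return mounts[best] + path


print(resolve("/data/archive/2012/trades.avro"))
# hdfs://nn3.example.com:8020/data/archive/2012/trades.avro
```

The longest match wins, so `/data/archive` can be moved to its own NameNode without changing the paths that applications use.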

Cloudera Enterprise fits seamlessly into existing infrastructures and processes.

Cloudera Manager API
Function: Provides API calls for all features in Cloudera Manager.
Benefit: Easily integrate Cloudera Manager with your existing enterprise-wide management and monitoring tools.

Broader Support & Packaging
Function: Cloudera Manager packages for Debian and Ubuntu; support for Oracle 11g and PostgreSQL as backend databases.
Benefit: Increased flexibility in deployment.

LDAP Authentication
Function: Authenticate administrator logins against Active Directory.
Benefit: A single set of login credentials for administrators.
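As an illustration of integrating with the Cloudera Manager API, the sketch below builds an authenticated request for the list of clusters. It assumes Cloudera Manager's default web port 7180, API version `v1` (assumed here to be the version shipped with Cloudera Manager 4.0), a `clusters` resource and HTTP basic authentication; the host name and credentials are placeholders, so confirm the resource paths against the API documentation for your release.

```python
import base64
import json
from urllib.request import Request, urlopen


def cm_request(host, resource, user, password, port=7180, version="v1"):
    """Build an authenticated GET request for a Cloudera Manager API resource."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    url = f"http://{host}:{port}/api/{version}/{resource.lstrip('/')}"
    return Request(url, headers={"Authorization": f"Basic {token}",
                                 "Accept": "application/json"})


def list_clusters(host, user, password):
    """Fetch cluster names; requires a reachable Cloudera Manager server."""
    with urlopen(cm_request(host, "clusters", user, password)) as resp:
        return [cluster["name"] for cluster in json.load(resp)["items"]]


req = cm_request("cm.example.com", "clusters", "admin", "admin")
print(req.full_url)  # http://cm.example.com:7180/api/v1/clusters
```

Keeping request construction separate from the network call makes it easy to reuse the same helper from an existing monitoring script or to test it offline.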

• Syncing of client configurations
• More granular host monitoring
• Simpler interface for alert management
• Improved quota management
• Improved logic for auto-configuration and validation checks
• New health checks for various services
• Manage and configure multiple processing frameworks
• Better support integration
• Support for WebHDFS/HttpFS

CLOUDERA USE CASES

• Customer Risk Analysis
• Surveillance and Fraud Detection
• Central Data Repository
• Personalization and Asset Management
• Market Risk Modeling
• Trade Performance Analytics

• Genomics
• Utilities and Power Grid
• Smart Meters
• Biodiversity Indexing
• Preventing Network Failures
• Seismic Data

• Customer Churn Analysis
• Brand and Sentiment Analysis
• POS Transaction Analysis
• Pricing Models
• Customer Loyalty
• Targeted Offers

• Online Media
• Mobile
• Online Gaming
• Search Quality
• Recommendations
• Influence

• Financial data is messy because many systems interact.
• Personal data is obfuscated for security, and records get out of sync.
• Trades need to be “sessionized” into accounts and products.
• Discrepancies are difficult to reconcile, and corrections need to be tracked.

• Hadoop is a centralized platform for data collection.
• Single source for data; processing happens on the platform.
• Metadata is used to track the information lifecycle.
• Workflows run and monitor data transformation pipelines.

• Data is served via APIs or in batch.
• A single version of the truth: data is processed and cleansed centrally.
• A clear audit trail of data dependencies and usage.
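The “sessionizing” step can be sketched as grouping trades by account and product while keeping superseded versions of corrected trades as an audit trail. This is an in-memory Python illustration with assumed record fields (`trade_id`, `account`, `product`, `ts`, `quantity`) and an assumed rule that a later record with the same `trade_id` is a correction; on Hadoop the same grouping would typically become the key of a MapReduce job or a Hive `GROUP BY`.

```python
from collections import defaultdict
from typing import NamedTuple


class Trade(NamedTuple):
    trade_id: str
    account: str
    product: str
    ts: int        # event time, e.g. epoch seconds
    quantity: int


def sessionize(trades):
    """Group trades by (account, product); a later record with the same trade_id
    is treated as a correction and replaces the earlier version."""
    latest = {}
    corrections = defaultdict(list)  # trade_id -> superseded versions (audit trail)
    for t in sorted(trades, key=lambda t: t.ts):
        if t.trade_id in latest:
            corrections[t.trade_id].append(latest[t.trade_id])
        latest[t.trade_id] = t
    sessions = defaultdict(list)
    for t in sorted(latest.values(), key=lambda t: t.ts):
        sessions[(t.account, t.product)].append(t)
    return dict(sessions), dict(corrections)


trades = [
    Trade("t1", "A", "FX", 1, 100),
    Trade("t2", "A", "FX", 2, 50),
    Trade("t3", "B", "EQ", 3, 10),
    Trade("t1", "A", "FX", 4, 120),  # correction of t1
]
sessions, corrections = sessionize(trades)
print([t.quantity for t in sessions[("A", "FX")]])  # [50, 120]
```

Returning the superseded versions alongside the cleaned sessions is what gives the "single version of the truth" while still leaving an audit trail of corrections.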

• The power grid is aging and maintained incrementally.
• Failures are hard to predict and can have cascading effects.
• Vibration of transformers is examined over time to find patterns.

Predicting failure of grid equipment
• Supervised learning scans time-series data for fuzzy patterns.
• Equipment that is likely to fail is identified for targeted replacement.

Hadoop-based tools to model equipment behavior
• openPDC project: http://openpdc.codeplex.com
• Lumberyard: indexes time-series data for low-latency fuzzy queries
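The slides do not say which supervised learning method was used, so the sketch below swaps in one of the simplest: a 1-nearest-neighbour classifier that labels each sliding window of a vibration series by its closest labelled example. The window size, the `normal` and `fault` labels and the training windows are all hypothetical.

```python
import math

# Hypothetical labelled training windows (vibration amplitude samples).
LABELLED = [
    ([1, 1, 1], "normal"),
    ([1, 4, 1], "fault"),
]


def windows(series, size):
    """Yield (start_index, window) pairs for every full window in the series."""
    for i in range(len(series) - size + 1):
        yield i, series[i:i + size]


def distance(a, b):
    """Euclidean distance between two equal-length windows."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def classify(window, labelled=LABELLED):
    """1-nearest-neighbour: return the label of the closest training window."""
    return min(labelled, key=lambda example: distance(window, example[0]))[1]


def scan(series, size=3, labelled=LABELLED):
    """Return the start index of every window classified as a fault."""
    return [i for i, w in windows(series, size) if classify(w, labelled) == "fault"]


# The window starting at index 3 is noisy ([2, 4, 1]) but still nearest the fault pattern.
print(scan([1, 1, 1, 2, 4, 1, 1]))  # [3]
```

Matching on distance rather than exact equality is what makes the pattern "fuzzy": small deviations from the labelled signature still classify the same way.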

OPERATIONAL ROADMAP: ARCHITECTING A CENTER OF EXCELLENCE

• Pace of Business Offers Little Room for Risk
• Disruptive Technologies Require New Skills
• New Processes are Error Prone

A Center of Excellence (COE) is where organizations:
• Identify Technologies
• Learn New Skills
• Develop and Test Processes to Lower Risk

[Diagram: Center of Excellence stages: Learn, Develop, Operate, Deploy, Publish, Research]

An integrated, turn-key system for Big Data management

Continually first to market with ground-breaking features

CDH is 100% Apache licensed with no forks or proprietary underpinnings

More enterprises run and more vendors certify with Cloudera than all other solutions combined.

4th generation product with multi-year track record of predictable releases, low-risk upgrades and strong compatibility guarantees

cloudera.com +1 (888) 789-1488

[email protected]

twitter.com/cloudera

facebook.com/cloudera

Thank You!

For questions, please contact:
Angus Klein: [email protected]
Linden Hillenbrand: [email protected]

HUGs All Around

3 Years of Steady Innovation

Q3 2009
• The first commercial Apache Hadoop distro
• HDFS and MapReduce for storage and computation
• Hive and Pig for data processing/analysis
• Packaged for Red Hat Enterprise Linux
• Amazon EC2 support

Q1 2010
• Improved stability and performance
• Added packages for CentOS and Ubuntu
• Rackspace and SoftLayer support

Q2 2011
• Added HBase for real-time read/write access
• Added ZooKeeper and Oozie for coordination
• Added Flume and Sqoop for data integration
• Added Hue for web-based access
• Integrated authentication throughout the stack

Q3 2011
• Apache-compatible licensed compression
• Web shell for Hue
• Flume/HBase integration
• HBase bulk loading

Q4 2011
• Added Mahout for machine learning
• Improved Avro file format support in Sqoop, Flume, MapReduce, Pig, Hive and Hue

Q1 2012
• Performance improvements to HBase and HDFS
• Faster HBase recoverability from region failure
• Improved HDD failure handling

AVAILABLE NOW
• NameNode high availability
• HBase table and column permissions for increased security
• HBase co-processors for real-time applications
• Support for multiple data processing frameworks (MR1 and MR2)

Installs the complete Hadoop stack in minutes via a wizard-based interface

Gives you complete, end-to-end visibility and control over your Hadoop cluster from a single interface

Allows you to manage multiple clusters from a single instance of Cloudera Manager

Integrate Cloudera Manager with Active Directory

Establishes the time context globally for almost all views

Correlates jobs, activities, logs, system changes, configuration changes and service metrics along a single timeline to simplify diagnosis

Set server roles, configure services and manage security across the cluster

Gracefully starts, stops and restarts services as needed

Supports Administrator and Read-Only users

Maintains a complete record of configuration changes with the ability to roll back to previous states

Monitors dozens of service performance metrics and alerts you when you approach critical thresholds
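Threshold-based alerting of the kind described above can be sketched with a warning level and a critical level per metric. The metric names and threshold values below are hypothetical, not Cloudera Manager defaults, and the sketch covers only the classification step, not metric collection or notification.

```python
from typing import NamedTuple


class Threshold(NamedTuple):
    warning: float
    critical: float


# Hypothetical thresholds; metric names are illustrative, not Cloudera Manager's.
THRESHOLDS = {
    "hdfs_capacity_used_pct": Threshold(warning=80.0, critical=95.0),
    "heap_used_pct": Threshold(warning=85.0, critical=95.0),
}


def evaluate(metrics, thresholds=THRESHOLDS):
    """Return {metric: "WARNING" | "CRITICAL"} for metrics at or above a threshold."""
    alerts = {}
    for name, value in metrics.items():
        t = thresholds.get(name)
        if t is None:
            continue  # no threshold configured for this metric
        if value >= t.critical:
            alerts[name] = "CRITICAL"
        elif value >= t.warning:
            alerts[name] = "WARNING"
    return alerts


print(evaluate({"hdfs_capacity_used_pct": 82.5, "heap_used_pct": 97.0}))
# {'hdfs_capacity_used_pct': 'WARNING', 'heap_used_pct': 'CRITICAL'}
```

The warning level is what lets an operator react while a metric is still approaching, rather than at, its critical value.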

Gather, view and search Hadoop logs collected from across the cluster

Scans Hadoop logs for irregularities and warns you before they impact the cluster

Creates and aggregates relevant Hadoop events pertaining to system health, log messages, user services and activities, and makes them available for alerting and searching

Generates email alerts when certain events occur

Consolidates all cluster activity into a single, real-time view

View information pertaining to hosts in your cluster including status, resident memory, virtual memory and roles

Visualize health status and metrics across the cluster to quickly identify problem nodes and take action

Visualize current and historical disk usage by user, group and directory

Track MapReduce activity on the cluster by job or user

Takes a snapshot of the cluster state and automatically sends it to Cloudera support to assist with resolution

Easily integrate Cloudera Manager with your existing enterprise-wide management and monitoring tools
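Disk usage by user, group and directory is essentially an aggregation over a file listing. The sketch below assumes a hypothetical `FileEntry` record (path, owner, group, size in bytes), such as might be derived from an HDFS directory listing, and sums sizes three ways. Historical tracking would mean storing these totals per snapshot, which is omitted here.

```python
from collections import Counter
from typing import NamedTuple


class FileEntry(NamedTuple):
    path: str
    owner: str
    group: str
    size: int  # bytes


def usage_report(entries, depth=2):
    """Sum file sizes by owner, by group and by parent directory (first `depth` levels)."""
    by_owner, by_group, by_dir = Counter(), Counter(), Counter()
    for e in entries:
        by_owner[e.owner] += e.size
        by_group[e.group] += e.size
        # Drop the file name, then keep at most `depth` directory components.
        dirs = [p for p in e.path.split("/") if p][:-1][:depth]
        by_dir["/" + "/".join(dirs)] += e.size
    return by_owner, by_group, by_dir


entries = [
    FileEntry("/user/alice/a.csv", "alice", "analysts", 100),
    FileEntry("/user/alice/logs/b.log", "alice", "analysts", 50),
    FileEntry("/user/bob/c.csv", "bob", "etl", 30),
    FileEntry("/tmp/x", "bob", "etl", 5),
]
owners, groups, dirs = usage_report(entries)
print(dirs.most_common(1))  # [('/user/alice', 150)]
```

Rolling deeper paths up to a fixed depth keeps the directory view readable on large clusters; raising `depth` trades that summary for finer detail.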