View Hadoop Administration Course at www.edureka.co/hadoop-admin Secure your Hadoop Cluster with Kerberos

Secure Hadoop Cluster With Kerberos

Page 1: Secure Hadoop Cluster With Kerberos

Page 2: Secure Hadoop Cluster With Kerberos

www.edureka.co/hadoop-admin

Twitter @edurekaIN, Facebook /edurekaIN, use #askEdureka for Questions

Objectives

At the end of this module, you will be familiar with:

» Hadoop cluster introduction
» Recommended configuration for a cluster
» Hadoop cluster running modes
» Hadoop security with Kerberos
» Hadoop admin responsibilities
» Demo on Kerberos

Page 3: Secure Hadoop Cluster With Kerberos

Hadoop Core Components

Hadoop 2.x Core Components

HDFS (Storage)
» Master: NameNode, Secondary NameNode
» Slave: DataNode

YARN (Processing)
» Master: Resource Manager
» Slave: Node Manager

Page 4: Secure Hadoop Cluster With Kerberos

Hadoop Cluster: A Typical Use Case

Active NameNode
RAM: 64 GB, Hard disk: 1 TB, Processor: Xeon with 8 cores, Ethernet: 3 x 10 GB/s, OS: 64-bit CentOS, Power: Redundant Power Supply

Secondary NameNode
RAM: 32 GB, Hard disk: 1 TB, Processor: Xeon with 4 cores, Ethernet: 3 x 10 GB/s, OS: 64-bit CentOS, Power: Redundant Power Supply

StandBy NameNode (Optional)
RAM: 64 GB, Hard disk: 1 TB, Processor: Xeon with 8 cores, Ethernet: 3 x 10 GB/s, OS: 64-bit CentOS, Power: Redundant Power Supply

DataNode (each)
RAM: 16 GB, Hard disk: 6 x 2 TB, Processor: Xeon with 2 cores, Ethernet: 3 x 10 GB/s, OS: 64-bit CentOS

Page 5: Secure Hadoop Cluster With Kerberos

Cluster Growth Based On Storage Capacity

Planning cluster growth around storage capacity is often a good method to use!

» Data grows by approximately 5 TB per week
» HDFS is set up to replicate each block three times
» Thus, 15 TB of extra storage space is required per week
» Assuming machines with 5 x 3 TB hard drives, this equates to one new machine required each week
» Assume overheads to be 30%
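The arithmetic on this slide can be sketched in a few lines of Python. The function name `machines_per_week` is hypothetical, and the way the 30% overhead is applied (as the fraction of each machine's raw disk that is not available to HDFS) is an assumption; the slide does not say how the overhead should be counted.

```python
def machines_per_week(growth_tb, replication, disks, disk_tb, overhead=0.0):
    """Estimate how many new machines are needed per week."""
    # Raw storage consumed per week: every block is stored `replication` times.
    raw_needed_tb = growth_tb * replication
    # Assumption: `overhead` is the share of each machine's disk lost to non-HDFS use.
    usable_per_machine_tb = disks * disk_tb * (1.0 - overhead)
    return raw_needed_tb / usable_per_machine_tb


# Figures from the slide: 5 TB/week, replication 3, machines with 5 x 3 TB drives.
no_overhead = machines_per_week(5, 3, disks=5, disk_tb=3)
with_overhead = machines_per_week(5, 3, disks=5, disk_tb=3, overhead=0.30)
print(f"without overhead: {no_overhead:.2f} machines/week")
print(f"with 30% overhead: {with_overhead:.2f} machines/week")
```

With no overhead this reproduces the slide's figure of one machine per week; under the assumed reading of the 30% overhead, the estimate rises to roughly 1.4 machines per week.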

Page 6: Secure Hadoop Cluster With Kerberos

Slave Nodes: Recommended Configuration

Higher-performance vs lower-performance components: Save the Money, Buy more Nodes!

“A cluster with more nodes performs better than one with fewer, slightly faster nodes”

General Configuration

General 'base' configuration for a slave node (depends on requirements):

» 4 x 1 TB or 2 TB hard drives, in a JBOD (Just a Bunch Of Disks) configuration
» Do not use RAID!
» 2 x quad-core CPUs
» 24-32 GB RAM
» Gigabit Ethernet

Special Configuration

Multiples of (1 hard drive + 2 cores + 6-8 GB RAM) generally work well for many types of applications
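The "multiples of a unit" rule on this slide can be written down as a small helper. This is only a sketch: `slave_node_spec` is a hypothetical name, and the unit values are taken directly from the rule (1 hard drive + 2 cores + 6-8 GB RAM).

```python
def slave_node_spec(units):
    """Scale the (1 hard drive + 2 cores + 6-8 GB RAM) unit by `units`."""
    if units < 1:
        raise ValueError("units must be at least 1")
    return {
        "hard_drives": units * 1,
        "cores": units * 2,
        "ram_gb_min": units * 6,
        "ram_gb_max": units * 8,
    }


spec = slave_node_spec(4)
print(spec)
```

Four units give 4 hard drives, 8 cores and 24-32 GB RAM, which lines up with the base configuration listed on the slide (4 drives, 2 x quad-core CPUs, 24-32 GB RAM).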

Page 7: Secure Hadoop Cluster With Kerberos

Slave Nodes: More Details (RAM)

Slave Nodes (RAM)

» Generally, each Map or Reduce task will take 1 GB to 2 GB of RAM
» Slave nodes should not be using virtual memory
» RULE OF THUMB! Total number of tasks = 1.5 x number of processor cores
» Ensure enough RAM is present to run all tasks, plus the DataNode and TaskTracker (NodeManager in Hadoop 2.x) daemons, plus the operating system
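As a rough worked example of the rule of thumb, the sketch below estimates the RAM a slave node needs. The function name and the default allowances for the daemons (`daemons_gb`) and the operating system (`os_gb`) are assumptions chosen for illustration; the slide gives no figures for them.

```python
def ram_estimate_gb(cores, gb_per_task=2.0, daemons_gb=2.0, os_gb=2.0):
    """Apply the rule of thumb: total tasks = 1.5 x number of processor cores."""
    tasks = int(1.5 * cores)
    # RAM for all tasks, plus assumed allowances for the daemons and the OS.
    total_gb = tasks * gb_per_task + daemons_gb + os_gb
    return tasks, total_gb


tasks, total_gb = ram_estimate_gb(cores=8)
print(f"{tasks} tasks, about {total_gb:.0f} GB RAM")
```

For an 8-core node at 2 GB per task this gives 12 tasks and about 28 GB, so the node can stay out of virtual memory only if it has at least that much physical RAM.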

Page 8: Secure Hadoop Cluster With Kerberos

Master Node Hardware Recommendations

A master node requires:

» Carrier-class hardware (not commodity hardware)
» Dual power supplies
» Dual Ethernet cards (bonded to provide failover)
» Hard drives in a RAID configuration
» At least 32 GB of RAM

Page 9: Secure Hadoop Cluster With Kerberos

Hadoop Cluster Modes

Hadoop can run in any of the following three modes:

Standalone (or Local) Mode
» No daemons, everything runs in a single JVM
» Suitable for running MapReduce programs during development
» Has no DFS

Pseudo-Distributed Mode
» Hadoop daemons run on the local machine

Fully-Distributed Mode
» Hadoop daemons run on a cluster of machines

Page 10: Secure Hadoop Cluster With Kerberos

Configuration Files

Configuration filename: Description

hadoop-env.sh, yarn-env.sh: Settings for the Hadoop daemons' process environment.

core-site.xml: Configuration settings for Hadoop Core, such as I/O settings that are common to both HDFS and YARN.

hdfs-site.xml: Configuration settings for the HDFS daemons, the NameNode and the DataNodes.

yarn-site.xml: Configuration settings for the Resource Manager and Node Manager.

mapred-site.xml: Configuration settings for MapReduce applications.

slaves: A list of machines (one per line) that each run a DataNode and a Node Manager.
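The `*-site.xml` files share one layout: a `<configuration>` root holding `<property>` entries, each with a `<name>` and a `<value>`. The sketch below renders such a file from a dictionary. The helper name `render_site_xml` is hypothetical, and the host name and port in the sample value are placeholders rather than values from this deck.

```python
import xml.etree.ElementTree as ET


def render_site_xml(properties):
    """Render a dict as a Hadoop-style <configuration> document."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    if hasattr(ET, "indent"):  # pretty-printing helper, Python 3.9+
        ET.indent(root)
    return ET.tostring(root, encoding="unicode")


# Placeholder value: replace the host and port with those of your NameNode.
core_site = render_site_xml({"fs.defaultFS": "hdfs://namenode.example.com:8020"})
print(core_site)
```

The same helper could be used for hdfs-site.xml, yarn-site.xml or mapred-site.xml, since only the property names differ between them.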

Page 11: Secure Hadoop Cluster With Kerberos

Hadoop 2.x Configuration Files – Apache Hadoop

Core: core-site.xml
HDFS: hdfs-site.xml
YARN: yarn-site.xml
MapReduce: mapred-site.xml

Page 12: Secure Hadoop Cluster With Kerberos

Security

The Hadoop ecosystem has only partially adopted Kerberos; many services remain unprotected and use trivial authentication systems.

Service-level authorization and web proxy capabilities in YARN.

Most security tools fail to scale and perform with big data environments.

Page 13: Secure Hadoop Cluster With Kerberos

Security – Simple Flow

[Diagram: Client, Resource Manager, two Node Managers each running a Task, HDFS]

Security Risks

» Insufficient authentication: users and services are not authenticated
» No privacy and no integrity: insecure network transport, no message-level security
» Arbitrary code execution: no user verification for MapReduce code execution, so malicious users could submit a job

Page 14: Secure Hadoop Cluster With Kerberos

Kerberos to the rescue

Network authentication protocol

Developed at MIT in the mid-1980s

Available as open source or in supported commercial software

Page 15: Secure Hadoop Cluster With Kerberos

Kerberos Design Requirements

Interactions between hosts and clients should be encrypted.

Must be convenient for users (or they won’t use it).

Protect against intercepted credentials.

Kerberos is based on the secret-key distribution model:

» Keys are the basis of authentication in Kerberos
» A key is typically a short sequence of bytes
» The same key is used to both encrypt and decrypt

Encryption: plaintext + encryption key = ciphertext

Decryption: ciphertext + decryption key = plaintext
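The two relations above can be illustrated with a toy secret-key cipher in which one shared key both encrypts and decrypts. This is a teaching sketch only: the XOR keystream construction, the function names and the sample key are assumptions made for illustration, and it is not the cipher Kerberos uses (real deployments rely on vetted ciphers such as AES).

```python
import hashlib


def keystream(key, length):
    """Derive `length` pseudo-random bytes from the key (toy construction)."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(4, "big")).digest()
        counter += 1
    return out[:length]


def xor_with_key(key, data):
    """XOR data with the key-derived stream; applying it twice restores the input."""
    return bytes(a ^ b for a, b in zip(data, keystream(key, len(data))))


secret_key = b"shared-secret"  # hypothetical key known to both parties
plain_txt = b"hello hadoop"
cipher_txt = xor_with_key(secret_key, plain_txt)  # encryption
recovered = xor_with_key(secret_key, cipher_txt)  # decryption with the same key
print(cipher_txt.hex(), recovered)
```

Anyone holding the key can recover the plaintext, and anyone without it cannot, which is why the rest of the protocol is concerned with distributing keys safely.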

Page 16: Secure Hadoop Cluster With Kerberos

Kerberos to the rescue

Kerberos Integration

» User authentication
» User and group access control lists at cluster level
» Tokens: Delegation, Job, Block Access
» Simple Authentication and Security Layer (SASL) with RPC digest mechanism

Kerberos flow between the Client, the Kerberos Key Distribution Center (Authentication Server and Ticket Granting Server) and the Server:

1: Authentication: the client gets a Ticket Granting Ticket (TGT) from the Authentication Server
2: Authorization: the client gets a service ticket from the Ticket Granting Server
3: Service request: the client starts a service session with the Server
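The three-step exchange (get a TGT, get a service ticket, start a service session) can be mimicked with a toy model. This sketch is heavily simplified and is not the Kerberos wire protocol: real Kerberos encrypts tickets and uses session keys and authenticators, whereas here an HMAC tag merely stands in for "only the key holder could have issued this". All class names, function names, principals and keys are hypothetical.

```python
import hashlib
import hmac
import json
import time


def seal(key, payload):
    """Attach an HMAC tag so only holders of `key` can create or verify the ticket."""
    body = json.dumps(payload, sort_keys=True).encode()
    return {"body": body, "tag": hmac.new(key, body, hashlib.sha256).digest()}


def unseal(key, ticket):
    """Verify the tag and expiry, then return the ticket payload."""
    expected = hmac.new(key, ticket["body"], hashlib.sha256).digest()
    if not hmac.compare_digest(expected, ticket["tag"]):
        raise ValueError("ticket was not issued with this key")
    payload = json.loads(ticket["body"])
    if payload["expires"] < time.time():
        raise ValueError("ticket expired")
    return payload


class ToyKDC:
    """Stand-in for the Key Distribution Center (Authentication Server + TGS)."""

    def __init__(self, user_keys, service_keys):
        self.user_keys = user_keys
        self.service_keys = service_keys
        self.tgs_key = b"tgs-secret"  # hypothetical key private to the KDC

    def get_tgt(self, user, proof):
        # Step 1: the Authentication Server checks the user's proof and issues a TGT.
        expected = hmac.new(self.user_keys[user], user.encode(), hashlib.sha256).digest()
        if not hmac.compare_digest(expected, proof):
            raise ValueError("authentication failed")
        return seal(self.tgs_key, {"user": user, "expires": time.time() + 3600})

    def get_service_ticket(self, tgt, service):
        # Step 2: the Ticket Granting Server validates the TGT and issues a service ticket.
        user = unseal(self.tgs_key, tgt)["user"]
        payload = {"user": user, "service": service, "expires": time.time() + 300}
        return seal(self.service_keys[service], payload)


# Hypothetical principals and keys, for illustration only.
alice_key, hdfs_key = b"alice-secret", b"hdfs-secret"
kdc = ToyKDC({"alice": alice_key}, {"hdfs": hdfs_key})

proof = hmac.new(alice_key, b"alice", hashlib.sha256).digest()
tgt = kdc.get_tgt("alice", proof)
service_ticket = kdc.get_service_ticket(tgt, "hdfs")
# Step 3: the server checks the ticket with its own key and starts the session.
session = unseal(hdfs_key, service_ticket)
print(session["user"], "->", session["service"])
```

The point of the design is that the service never sees the user's key: it trusts the ticket because only the KDC shares the service's key and could have issued it.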

Page 17: Secure Hadoop Cluster With Kerberos

Kerberos Applications

Authentication

Authorization

Confidentiality

Within networks and small sets of networks

Page 18: Secure Hadoop Cluster With Kerberos

DEMO

Page 19: Secure Hadoop Cluster With Kerberos

Hadoop Admin Responsibilities

Responsible for implementation and administration of Hadoop infrastructure.

Testing HDFS, Hive, Pig and MapReduce access for Applications.

Cluster maintenance tasks like Backup, Recovery, Upgrade, Patching.

Performance tuning and Capacity planning for Clusters.

Monitor Hadoop cluster and deploy security.

Page 20: Secure Hadoop Cluster With Kerberos

How it Works?

» LIVE Online Class
» Class Recording in LMS
» 24/7 Post Class Support
» Module Wise Quiz
» Project Work
» Verifiable Certificate

Page 21: Secure Hadoop Cluster With Kerberos

Questions

Page 22: Secure Hadoop Cluster With Kerberos

Course Topics

Module 1 » Hadoop Cluster Administration

Module 2 » Hadoop Architecture and Cluster setup

Module 3 » Hadoop Cluster: Planning and Managing

Module 4 » Backup, Recovery and Maintenance

Module 5 » Hadoop 2.0 and High Availability

Module 6 » Advanced Topics: QJM, HDFS Federation and Security

Module 7 » Oozie, HCatalog/Hive and HBase Administration

Module 8 » Project: Hadoop Implementation
