21

Click here to load reader

Federated HDFS

  • Upload
    huguk

  • View
    10.439

  • Download
    1

Embed Size (px)

Citation preview

Page 1: Federated HDFS

Apache Hadoop

India Summit 2011

HDFS FederationSanjay Radia, Hadoop Architect

Yahoo! Inc

1

Page 2: Federated HDFS

Outline

• HDFS - Quick overview

• Scaling HDFS - FederationHDFS Distributed file

system

MapReduce Distributed

computation

HBase Column store

Pig Dataflow language

Hive Data warehouse

Zookeeper Distributed

coordination

Avro Data Serialization

Oozie Workflow

Hadoop Components

Page 3: Federated HDFS

3

Page 4: Federated HDFS

4

HDFS

b1

b2

b3 b1

b5

b3 b3

b5

b2

b4b5

b6b2

b3

b4

Namenode

Namespace Metadata &

Journal

Namespace

State

Block

Map

Heartbeats & Block Reports

Block ID Block Locations

Datanodes

Block ID Data

Backup

Namenode

Hierarchal NamespaceFile Name BlockIDs

Horizontally Scale IO and Storage

Page 5: Federated HDFS

5

HDFSClient reads and writes

b1

b2

b3 b1

b5

b3 b3

b5

b2

b4b5

b6b2

b3

b4

Client Client

Namenode1 open

2 read2 write

1 create

writewrite

Datanodes

Namespace

State

Block

Map

End-to-end checksum

Page 6: Federated HDFS

6

HDFS Architecture :

Computation close to the data

DataData data data data data

Data data data data data

Data data data data data

Data data data data data

Data data data data data

Data data data data data

Data data data data data

Data data data data data

Data data data data data

Data data data data data

Data data data data data

Data data data data data

ResultsData data data data

Data data data data

Data data data data

Data data data data

Data data data data

Data data data data

Data data data data

Data data data data

Data data data data

Hadoop Cluster

Block 1

Block 1

Block 2

Block 2

Block 2

Block 1

MAP

MAP

MAP

Reduce

Block 3

Block 3

Block 3

Page 7: Federated HDFS

Quiz: What Is the Common Attribute?

7

Page 8: Federated HDFS

HDFSActively maintain data reliability

b1

b2

b3 b1

b5

b3 b3

b5

b2

b4b5

b6b2

b3

b4

Namenode

2. copy

3.

blockReceived

1.

replicate

Datanodes

Bad/lost

block replica

Periodically

check block

checksums

Namespace

State

Block

Map

Page 9: Federated HDFS

Hadoop at Yahoo!

9

99.85

99.47

99.69

99.2 99.3 99.4 99.5 99.6 99.7 99.8 99.9

Production

Research

Sandbox

Availability SLA

0

50,000

100,000

150,000

200,000

250,000

2006 -Qtr1

2006 -Qtr2

2006 -Qtr3

2006 -Qtr4

2007 -Qtr1

2007 -Qtr2

2007 -Qtr3

2007 -Qtr4

2008 -Qtr1

2008 -Qtr2

2008 -Qtr3

2008 -Qtr4

2009 -Qtr1

2009 -Qtr2

2009 -Qtr3

2009 -Qtr4

2010 -Qtr1

2010 -Qtr2

2010 -Qtr3

Total Nodes = 43,936

Total Storage = 206 PB

13,687

22,334

7,803

0 5000 10000 15000 20000 25000

Production

Research

Sandbox

Nodes running Hadoop at Yahoo!

Over 43,000 nodes running Hadoop

1M+ Monthly Hadoop Jobs

Page 10: Federated HDFS

Scaling Hadoop

Early Gains

• Simple design allowed rapid improvements

• Namespace is all in RAM, simpler locking

• Improved memory usage in 0.16, JVM Heap configuration (Suresh Srinivas)

Growth of number of files and storage is limited by adding RAM to namenode

• 50G heap = 200M “fs objects” = 100M names + 100MBlocks

• 14PB of storage (50MB blocksize)

• 4K nodes

- Job Tracker carries out both job lifecycle management and scheduling

Yahoo’s Response:

• HDFS Federation: horizontal scaling of namespace (0.22)

• Next Generation of Map-Reduce - Complete overhaul of job tracker/task tracker

Goal:

• Clusters of 6000 nodes, 100,000 cores & 10k concurrent jobs, 100 PB raw storage per cluster

6 May 201010

Page 11: Federated HDFS

11

Scaling the Name Service:

Options

Partial

NS (Cache)

in memory

Multiple

Namespace

volumes

All NS

in memoryArchives

# names

# clients

100M 10B200M 1B 2B

1x

4x

20x

50x

Partial

NS in memory

With Namespace

volumes

Not to scale

100x Good isolation

properties

Separate Bmaps from NN

20B

Block-reports for Billions of

blocks requires rethinking

block layer

Distributed NNs

Page 12: Federated HDFS

Opportunity:Vertical & Horizontal scaling

Horizontal scaling/federation benefits:

– Scale

– Isolation, Stability, Availability

– Flexibility

– Other Namenode implementations or non-HDFS namespaces

12

Namenode Horizontal: Federation

Vertical scalingMore RAM, Efficiency in memory usage

First class archives (tar/zip like)

Partial namespace in main memory

Page 13: Federated HDFS

Block (Object) Storage Subsystem

13

NS1 Foreign NS n

... ...NS k

Na

me

sp

ace

Datanode 1 Datanode 2 Datanode m... ... ...

Balancer

Block Pools

Pools nPools kPools 1

Blo

ck s

tora

ge

Block (Object) Storage Subsystem

• Shared storage provided as pools of blocks

• Namespaces (HDFS, others) use one or more block-pools

• Note: HDFS has 2 layers today – we are generalizing/extending it.

Page 14: Federated HDFS

1st Phase:

B-Pool management inside Namenode

14

Datanode 1 Datanode 2 Datanode m... ... ...

NS1 Foreign NS n

... ...NS k

Balancer

Block Pools

Pools nPools kPools 1

NN-1 NN-k NN-n

Future: Move Block

mgt into separate

nodes

Page 15: Federated HDFS

Future:

Move block management out

15

Datanode 1 Datanode 2 Datanode m... ... ...

Balancer

Block Pools

Pools nPools kPools 1

NS1Foreign

NS n... ...

NS k

Block Managerclient

1. Open

2. getBlockLocations

3. ReadBlock

Easier to scale horizontally

than the name server

Page 16: Federated HDFS

What is a HDFS Cluster

Current• HDFS Cluster

– 1 Namespace

– A set of blocks

• Implemented as

– 1 Namenode

– Set of DNs

New• HDFS Cluster

– N Namespaces

– Set of block-pools• Each block-pool is set of blocks

• Phase 1: 1 BP per NS

– Implies N block-pools

• Implemented as

– N Namenode

– Set of DNs• Each DN stores the blocks for

each block-pool

16

Page 17: Federated HDFS

Managing Namespaces

• Federation has multiple namespaces – don’t you need a single global namespace?

– Key is to share the data and the names used to access the shared data.

• A global namespace is one way to do that – but even there we talk of several large “global” namespaces

• Client-side mount table is another way to share

– Shared mount-table => “global” shared view

– Personalized mount-table => per-application view

• Share the data that matter by mounting it

hom

eprojectdata tmp

/ Client-side

mount-table

Page 18: Federated HDFS

18

HDFS Federation Across Clusters

home

projectdata

tmp

/

Cluster 1

home

projectdata

tmp

/

Cluster 2

Application

mount-

table in

Cluster 1

Application

mount-

table in

Cluster 2

Page 19: Federated HDFS

Nameserver as container for namespaces

• Nameserver as a container for namespaces

• Each namespace with its own separate state• Persistent state in shared storage (e.g. Book Keeper)

• Each nameserver serves a set of namespaces

• Selected based on isolation and capacity

• A namespace can be moved between nameserver

19

Shared persistent storage for namespace metadata(e.g. Book keeper)

Nameserver Nameserver

Page 20: Federated HDFS

Summary

Federated HDFS (Jira HDFS-1052)

• Scale by adding independent Namenodes

• Preserves the robustness of the Namenodes

• Not much code change to the Namenode

• Generalizes the Block storage layer

• Analogous to Sans & Luns

• Can add other implementations of the Namenodes

• Even other name services (HBase?)

• Could move the Block management out of the Namenode in the future

• But to truly scale to 10s or 100s Bilions of blocks we need to rethink the block map and block reports

• Benefits

• Scale number of file names and blocks

• Improved isolation and hence availability

6 May 201020

Page 21: Federated HDFS

Q & A

21