HDFS HA using Journal Nodes

04/11/2023 Copyright 2013 Trend Micro Inc.

Introducing Journal Nodes

Manual Failover

Architecture

[Figure: three JournalNodes (JN1, JN2, JN3), an Active NameNode, a Standby NameNode, and four DataNodes (DN); labeled "Block locations map"]

JournalNodes’ job

• When the Active NameNode performs any namespace modification, it durably logs a record of the modification to a majority of the JNs

• The Standby reads the edits from the JNs and applies them to its own namespace

[Figure: edits flow from the Active NameNode to JN1, JN2, and JN3, and from the JNs to the Standby NameNode; labeled "SafeMode"]
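The flow on this slide can be sketched as a toy in-memory model. This is only an illustration under simplifying assumptions: the class names, the `mkdir` operation, and the transaction-id bookkeeping are hypothetical and are not Hadoop's actual API, and a real JournalNode persists edits to disk rather than to a dictionary.

```python
class JournalNode:
    """Toy JournalNode: keeps edits in memory (hypothetical, not Hadoop's API)."""

    def __init__(self, name, up=True):
        self.name = name
        self.up = up
        self.edits = {}  # txid -> operation

    def journal(self, txid, op):
        # A JournalNode that is down cannot acknowledge the edit.
        if not self.up:
            return False
        self.edits[txid] = op
        return True


class ActiveNameNode:
    def __init__(self, journal_nodes):
        self.jns = journal_nodes
        self.namespace = set()
        self.last_txid = 0

    def mkdir(self, path):
        txid = self.last_txid + 1
        # Log the modification to the JNs; a majority must acknowledge it.
        acks = sum(jn.journal(txid, ("mkdir", path)) for jn in self.jns)
        if acks <= len(self.jns) // 2:
            raise IOError("edit not acknowledged by a majority of JournalNodes")
        self.namespace.add(path)
        self.last_txid = txid


class StandbyNameNode:
    def __init__(self, journal_nodes):
        self.jns = journal_nodes
        self.namespace = set()
        self.last_applied = 0

    def tail_edits(self):
        # Read edits from the reachable JNs and apply the new ones in order.
        available = {}
        for jn in self.jns:
            if jn.up:
                available.update(jn.edits)
        for txid in sorted(available):
            if txid > self.last_applied:
                kind, path = available[txid]
                if kind == "mkdir":
                    self.namespace.add(path)
                self.last_applied = txid
```

With three JournalNodes and one of them down, `ActiveNameNode.mkdir` still succeeds and `StandbyNameNode.tail_edits` catches up to the same namespace; with two down, the write raises.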

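JournalNodes’ storage

• Specify a path on each JournalNode's local disk where it stores the edits (dfs.journalnode.edits.dir)

• With N JournalNodes, the system can tolerate at most (N - 1) / 2 failures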
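The (N - 1) / 2 rule can be checked with a few lines of arithmetic. This is a minimal sketch that assumes the division is integer division; the function names are illustrative only.

```python
def max_tolerated_failures(n):
    """Number of JournalNodes that may fail while a majority survives."""
    return (n - 1) // 2


def quorum_size(n):
    """Smallest number of JournalNodes that forms a majority of n."""
    return n // 2 + 1


for n in (3, 4, 5):
    print(n, "JournalNodes:", "quorum", quorum_size(n),
          "tolerates", max_tolerated_failures(n), "failure(s)")
```

Note that going from 3 to 4 JournalNodes does not increase the number of tolerated failures, which is why an odd number of JournalNodes is the usual choice.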

JournalNodes’ fencing

• JournalNodes allow only a single NameNode to be a writer at a time.

• As a result, there is no potential for corrupting the file system metadata in a split-brain scenario.

[Figure: the Active NameNode WRITEs to JN1, JN2, and JN3; the Standby NameNode READs from them]

JournalNodes’ fencing

• Whenever a NameNode becomes active, it first generates an epoch number.

• The first active NameNode after the namespace is initialized starts with epoch number 1.

• Any failover or restart results in an increment of the epoch number.

JournalNodes’ fencing

• When a new NameNode becomes active, it has an epoch number higher than any previous NameNode

• It calls the JournalNodes to increment their promised epochs

• Fencing:
– A JN that receives a newer epoch updates its promised epoch and accepts; the new NameNode must update a majority of the JNs' promised epochs
– A JN that receives an older epoch rejects the request
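The epoch rules from the fencing slides can be sketched as follows. This is a simplified model, not Hadoop's implementation: the class and function names are hypothetical, and it assumes the new epoch is chosen as one more than the highest epoch promised by any JournalNode.

```python
class JournalNode:
    """Toy JournalNode that tracks a promised epoch (hypothetical model)."""

    def __init__(self):
        self.promised_epoch = 0
        self.edits = []

    def new_epoch(self, epoch):
        # Accept only an epoch newer than anything promised so far.
        if epoch <= self.promised_epoch:
            return False
        self.promised_epoch = epoch
        return True

    def journal(self, epoch, edit):
        # Reject writers that carry an older epoch: they have been fenced.
        if epoch < self.promised_epoch:
            return False
        self.edits.append(edit)
        return True


def become_active(jns):
    """Pick a higher epoch and get a majority of JNs to promise it."""
    epoch = max(jn.promised_epoch for jn in jns) + 1
    acks = sum(jn.new_epoch(epoch) for jn in jns)
    if acks <= len(jns) // 2:
        raise RuntimeError("could not update a majority of promised epochs")
    return epoch


def write(jns, epoch, edit):
    """Return True if a majority of JNs accepted the edit."""
    acks = sum(jn.journal(epoch, edit) for jn in jns)
    return acks > len(jns) // 2


jns = [JournalNode() for _ in range(3)]
first = become_active(jns)            # first active NameNode: epoch 1
write(jns, first, "mkdir /a")         # accepted
second = become_active(jns)           # after a failover: epoch 2
stale_ok = write(jns, first, "mkdir /b")   # old writer is rejected
fresh_ok = write(jns, second, "mkdir /c")  # new writer is accepted
```

In this sketch the previous Active NameNode only finds out it has been fenced when it next tries to write, which is the gap the following slide points out.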

But…

• The previous Active NameNode could still serve read requests to clients, and those results may be out of date, until it attempts a write

• You can specify a fencing method to prevent this from happening

Fencing Method

Fencing Method

• sshfence: SSH to the Active NameNode and kill the process

Fencing Method

• shell: run a shell command to fence the Active NameNode

• The script can read the Hadoop configuration properties from its environment, with the '_' character replacing any '.', e.g. dfs_namenode_rpc-address
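The naming rule for those variables is a plain character substitution. The sketch below only illustrates that mapping; the helper name and the example address are hypothetical.

```python
def fencing_env(conf):
    """Map configuration keys to the names a fencing script would see.

    Only '.' is replaced with '_'; other characters such as '-' are kept.
    """
    return {key.replace(".", "_"): value for key, value in conf.items()}


# Hypothetical configuration value, used only for illustration.
env = fencing_env({"dfs.namenode.rpc-address": "nn1.example.com:8020"})
print(env)
```

Because the hyphen is kept, a name like `dfs_namenode_rpc-address` is not a valid identifier in most shells, so a script may need to read it through a tool such as `env` or `printenv` rather than with `$name` syntax.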

Fencing Method

• Additional environment variables

Automatic Failover

[Figure: diagram showing JournalNodes JN1, JN2, and JN3]

ZKFailoverController

• Health monitoring
– The ZKFC pings its local NameNode on a periodic basis with a health-check command. (healthy/unhealthy)

• ZooKeeper session management
– When the local NameNode is healthy, the ZKFC holds a session open in ZooKeeper.
– If the local NameNode is active, it also holds a special "lock" znode.
– If the session expires, the lock node will be automatically deleted.

ZKFailoverController

• ZooKeeper-based election
– If the local NameNode is healthy, and no other node currently holds the lock znode, it will itself try to acquire the lock.
– If it succeeds, then it has "won the election".

• Failover
– The previous active is fenced
– The local NameNode transitions to active state.
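The election and failover behaviour described on the two ZKFailoverController slides can be sketched with an in-memory stand-in for ZooKeeper. Everything here is a hypothetical model: `FakeZooKeeper`, `ZKFC.tick`, and the `last_active` record are assumptions made for illustration and are not the real ZooKeeper or Hadoop APIs.

```python
class FakeZooKeeper:
    """In-memory stand-in for the lock znode (not the real ZooKeeper API)."""

    def __init__(self):
        self.lock_owner = None   # who currently holds the lock znode
        self.last_active = None  # assumed record of the previous active node

    def try_acquire(self, owner):
        if self.lock_owner is not None:
            return False
        self.lock_owner = owner
        return True

    def expire_session(self, owner):
        # When a session expires, its lock znode is deleted automatically.
        if self.lock_owner == owner:
            self.lock_owner = None


class ZKFC:
    def __init__(self, name, zk, fenced):
        self.name = name
        self.zk = zk
        self.fenced = fenced     # list recording which nodes were fenced
        self.healthy = True      # result of the periodic health check
        self.state = "standby"

    def tick(self):
        """One monitoring round: health check, then election if possible."""
        if not self.healthy:
            self.zk.expire_session(self.name)
            self.state = "standby"
            return
        if self.state == "active":
            return
        if self.zk.try_acquire(self.name):
            previous = self.zk.last_active
            if previous is not None and previous != self.name:
                self.fenced.append(previous)  # fence the previous active
            self.zk.last_active = self.name
            self.state = "active"             # won the election


zk, fenced = FakeZooKeeper(), []
nn1, nn2 = ZKFC("nn1", zk, fenced), ZKFC("nn2", zk, fenced)
nn1.tick(); nn2.tick()   # nn1 wins the first election
nn1.healthy = False      # nn1's health check starts failing
nn1.tick(); nn2.tick()   # lock is released, nn2 takes over
```

After the second round `nn2` is active and `fenced` contains `"nn1"`, mirroring the order on the slide: fence the previous active, then transition to active.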

[Figure: numbered sequence (steps 1 to 7) involving JN1, JN2, JN3 and the Active NameNode]

Client Side

Client Failover

• Clients connect to the Active NameNode via a proxy

• When the Active NameNode goes down, the client receives an exception, retries, and sends the RPC to the other NameNode (implemented by ConfiguredFailoverProxyProvider)
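The retry behaviour can be illustrated with a small Python analogy. ConfiguredFailoverProxyProvider itself is a Java class in Hadoop; the classes and method names below (`FakeNameNode`, `FailoverProxy`, `list_status`) are hypothetical and only mimic the idea of trying the next configured NameNode when a call fails.

```python
class StandbyError(Exception):
    """Raised by a NameNode that is in standby state (hypothetical)."""


class FakeNameNode:
    def __init__(self, name, state):
        self.name = name
        self.state = state  # "active", "standby", or "down"

    def list_status(self, path):
        if self.state == "down":
            raise ConnectionError(self.name + " is unreachable")
        if self.state == "standby":
            raise StandbyError(self.name + " is in standby state")
        return self.name + ":" + path


class FailoverProxy:
    """Sends each call to the current NameNode, failing over on errors."""

    def __init__(self, namenodes):
        self.namenodes = list(namenodes)
        self.current = 0

    def invoke(self, method, *args):
        # Try each configured NameNode at most once per call.
        for _ in range(len(self.namenodes)):
            nn = self.namenodes[self.current]
            try:
                return getattr(nn, method)(*args)
            except (ConnectionError, StandbyError):
                # Fail over to the next configured NameNode and retry.
                self.current = (self.current + 1) % len(self.namenodes)
        raise ConnectionError("no active NameNode reachable")


nn1 = FakeNameNode("nn1", "active")
nn2 = FakeNameNode("nn2", "standby")
proxy = FailoverProxy([nn1, nn2])
before = proxy.invoke("list_status", "/")
nn1.state, nn2.state = "down", "active"   # failover happens
after = proxy.invoke("list_status", "/")
```

The caller never names a specific NameNode; it keeps using the same proxy, and the proxy remembers which NameNode answered last.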

Steps to Apply HDFS HA

Converting a non-HA-enabled cluster to be HA-enabled

• If setting up a fresh HDFS cluster, format the NameNode: hdfs namenode -format

• Copy over the contents of your NameNode metadata directories to the other NameNode: hdfs namenode -bootstrapStandby (./format-failover-namenode.sh)

• Run hdfs namenode -initializeSharedEdits to initialize the edit log in the JournalNodes

• Start up both NameNodes