Wednesday, September 28, 2022

Fencing-HA

 

Fencing

A fencing method is a method by which one node can forcibly prevent another node from making continued progress. This might be implemented by killing a process on the other node, by denying the other node's access to shared storage, or by accessing a PDU to cut the other node's power.
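
As a minimal sketch of the idea, assuming a made-up FenceMethod interface rather than Hadoop's real HA classes, a fencing method is a step that must positively succeed before a standby takes over:

    // Illustrative sketch of a fencing step; the names here are
    // hypothetical, not Hadoop's real org.apache.hadoop.ha classes.
    public class FencingSketch {

        // A fencing method must positively confirm that the old active
        // node can no longer make progress before returning true.
        interface FenceMethod {
            boolean tryFence(String oldActiveHost);
        }

        // Example: fence by running a command over SSH that kills the
        // NameNode process on the old active host.
        static class SshKillFence implements FenceMethod {
            public boolean tryFence(String oldActiveHost) {
                try {
                    Process p = new ProcessBuilder(
                            "ssh", oldActiveHost, "pkill -9 -f namenode").start();
                    return p.waitFor() == 0; // exit 0 => the process was killed
                } catch (Exception e) {
                    return false; // fencing failed: do NOT promote the standby
                }
            }
        }

        public static void main(String[] args) {
            FenceMethod fence = new SshKillFence();
            if (fence.tryFence("old-active.example.com")) {
                System.out.println("Fenced; safe to promote the standby.");
            } else {
                System.out.println("Fencing failed; the standby must not take over.");
            }
        }
    }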

Hadoop NameNode High Availability Architecture



In the HDFS NameNode High Availability architecture, two NameNodes run at the same time. We can implement the Active and Standby NameNode configuration in the following two ways:

 

- Using Quorum Journal Nodes

- Using Shared Storage

Using Quorum Journal Nodes

The Quorum Journal Manager (QJM) is an HDFS implementation designed to maintain edit logs and share them between the active NameNode and the standby NameNode.

For High Availability, the standby NameNode communicates and synchronizes with the active NameNode through a group of daemons called "Journal Nodes". The QJM runs as a group of journal nodes, and there should be at least three of them.

 

For N journal nodes, the system can tolerate at most (N-1)/2 failures and continue to function. So, for three journal nodes, the system can tolerate the failure of one of them, since (3-1)/2 = 1.

When the active NameNode performs any namespace modification, it logs a record of the modification to the journal nodes; the write is considered committed once a majority of them acknowledge it.
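
Below is a minimal sketch of that majority-write idea, assuming a hypothetical JournalNode interface rather than Hadoop's real QJM classes:

    import java.util.List;
    import java.util.concurrent.*;

    // Simplified sketch of a quorum write: an edit is sent to all
    // JournalNodes and is considered durable once a majority (N/2 + 1)
    // acknowledge it, which is why (N-1)/2 failures can be tolerated.
    public class QuorumWriteSketch {

        interface JournalNode {           // hypothetical stand-in
            boolean logEdit(String edit); // true on successful ack
        }

        static boolean quorumWrite(List<JournalNode> journals, String edit)
                throws InterruptedException {
            ExecutorService pool = Executors.newFixedThreadPool(journals.size());
            CompletionService<Boolean> acks = new ExecutorCompletionService<>(pool);
            for (JournalNode jn : journals) {
                acks.submit(() -> jn.logEdit(edit));
            }
            int needed = journals.size() / 2 + 1; // majority
            int got = 0;
            for (int i = 0; i < journals.size() && got < needed; i++) {
                try {
                    if (acks.take().get()) got++;
                } catch (ExecutionException e) { /* that journal node failed */ }
            }
            pool.shutdownNow();
            return got >= needed; // durable only with a majority of acks
        }

        public static void main(String[] args) throws InterruptedException {
            // Three in-memory journal nodes, one of which always fails:
            List<JournalNode> jns = List.of(
                    e -> true,
                    e -> true,
                    e -> { throw new RuntimeException("journal node down"); });
            System.out.println("committed: " + quorumWrite(jns, "OP_MKDIR /a"));
        }
    }

With three journal nodes the write still commits here, because two of the three acknowledge it.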

The standby node reads the edits from the journal nodes and continuously applies them to its own namespace. In the case of a failover, the standby will ensure that it has read all the edits from the journal nodes before promoting itself to the Active state. This ensures that the namespace state is completely synchronized before a failover occurs.

To provide a fast failover, the standby node must have up-to-date information about the location of data blocks in the cluster. For this to happen, the addresses of both NameNodes are made available to all the DataNodes, which send block location information and heartbeats to both NameNodes.
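
As a rough illustration (the reportTo helper and the addresses are made up, not Hadoop's RPC API), a DataNode-side loop might look like this:

    import java.util.List;

    // Simplified sketch: a DataNode reports heartbeats and block locations
    // to BOTH NameNodes, so the standby's block map is already warm when a
    // failover happens. reportTo() is an illustrative stand-in.
    public class DualHeartbeatSketch {

        public static void main(String[] args) throws InterruptedException {
            heartbeatLoop(
                    List.of("nn1.example.com:8020", "nn2.example.com:8020"),
                    List.of(1001L, 1002L));
        }

        static void heartbeatLoop(List<String> nameNodeAddrs,
                                  List<Long> blockIds) throws InterruptedException {
            while (true) {
                for (String nn : nameNodeAddrs) {   // active AND standby
                    reportTo(nn, blockIds);
                }
                Thread.sleep(3_000); // HDFS's default heartbeat interval is 3 seconds
            }
        }

        static void reportTo(String nameNode, List<Long> blockIds) {
            System.out.println("heartbeat + blocks -> " + nameNode + ": " + blockIds);
        }
    }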

Using Shared Storage

The standby and active NameNodes synchronize with each other by using a "shared storage device". For this implementation, both the active NameNode and the standby NameNode must have access to a particular directory on the shared storage device (e.g., a network file system mount).

 

When the active NameNode performs any namespace modification, it logs a record of the modification to an edit log file stored in the shared directory. The standby NameNode watches this directory for edits, and when edits occur, it applies them to its own namespace. In the case of a failover, the standby NameNode will ensure that it has read all the edits from the shared storage before promoting itself to the Active state. This ensures that the namespace state is completely synchronized before failover occurs.
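
A minimal sketch of that "watch and apply" loop, assuming an NFS mount at /mnt/shared/hdfs/edits (real HDFS tails edit log segments rather than watching for whole new files):

    import java.nio.file.*;

    // Simplified sketch: the standby watches the shared edits directory
    // and replays each new edit file into its own namespace.
    public class SharedEditsTailer {

        public static void main(String[] args) throws Exception {
            Path sharedDir = Paths.get("/mnt/shared/hdfs/edits"); // assumed mount point
            WatchService watcher = FileSystems.getDefault().newWatchService();
            sharedDir.register(watcher, StandardWatchEventKinds.ENTRY_CREATE);

            while (true) {
                WatchKey key = watcher.take(); // block until something new appears
                for (WatchEvent<?> event : key.pollEvents()) {
                    Path edit = sharedDir.resolve((Path) event.context());
                    applyToNamespace(edit); // replay into the standby's namespace
                }
                key.reset();
            }
        }

        static void applyToNamespace(Path editFile) {
            System.out.println("applying " + editFile);
        }
    }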

 

To prevent the "split-brain scenario", in which the namespace state deviates between the two NameNodes, an administrator must configure at least one fencing method for the shared storage.
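
In Hadoop this is done through the dfs.ha.fencing.methods property (normally set in hdfs-site.xml); methods are tried in order until one succeeds. The sketch below sets it programmatically, and the script path and key location are illustrative:

    import org.apache.hadoop.conf.Configuration;

    // Fencing methods are configured via dfs.ha.fencing.methods: sshfence
    // kills the old active's process over SSH, while shell(...) runs an
    // arbitrary fencing script, e.g. one that revokes access to the shared
    // storage or cuts the node's power.
    public class FencingConfigSketch {
        public static void main(String[] args) {
            Configuration conf = new Configuration();
            conf.set("dfs.ha.fencing.methods",
                     "sshfence\nshell(/path/to/fence-script.sh)"); // script path is illustrative
            conf.set("dfs.ha.fencing.ssh.private-key-files",
                     "/home/hdfs/.ssh/id_rsa");                    // assumed key location
            System.out.println(conf.get("dfs.ha.fencing.methods"));
        }
    }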

Both HDFS Federation and NameNode High Availability, discussed below, pertain to HDFS in Hadoop.

HDFS (Hadoop Distributed File System) is the storage layer of Hadoop, which stores large datasets as files distributed across multiple nodes in a Hadoop cluster. A file is divided into blocks, and these blocks are stored across multiple nodes in the cluster; for example, with the default 128 MB block size, a 512 MB file is stored as four blocks, each replicated (three times by default) across DataNodes. HDFS has a master-slave architecture. The NameNode is the master, which stores the metadata pertaining to the files, blocks, their locations, etc., and the DataNodes are the slaves, which store the actual data in the form of blocks.

HDFS Federation

The NameNode keeps all metadata in main memory as an in-memory image. As the number of files in the cluster grows, it becomes impractical to host the entire file system namespace in the main memory of a single machine. In such a case, the file system namespace is partitioned into namespace volumes. For example, all paths starting with /usr or /home can be treated as two different volumes managed by two separate NameNodes. Each NameNode manages one namespace volume and holds the metadata for that namespace, along with a block pool containing all the blocks for the files in its namespace. The NameNodes operate in isolation and do not communicate with one another, so the failure of one NameNode does not affect the others. The DataNodes, however, communicate with all the NameNodes in the cluster, and the blocks on a specific DataNode can map to the block pools of any NameNode.
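
To make the volume idea concrete, here is a toy client-side router in the spirit of ViewFs mount tables; the mount points and NameNode addresses are invented for the example:

    import java.util.Map;
    import java.util.TreeMap;

    // Toy sketch of federation routing: each namespace volume is served by
    // its own NameNode, and a client-side mount table maps a path to the
    // right one (ViewFs plays this role in real HDFS).
    public class FederationRouterSketch {

        public static void main(String[] args) {
            TreeMap<String, String> mounts = new TreeMap<>();
            mounts.put("/usr",  "hdfs://namenode1:8020"); // volume 1
            mounts.put("/home", "hdfs://namenode2:8020"); // volume 2

            System.out.println(resolve(mounts, "/home/alice/data.txt"));
            System.out.println(resolve(mounts, "/usr/logs/app.log"));
        }

        static String resolve(TreeMap<String, String> mounts, String path) {
            // Check mount points in reverse sorted order so that a nested
            // mount (e.g. /usr/local) would win over its parent (/usr).
            for (Map.Entry<String, String> e : mounts.descendingMap().entrySet()) {
                if (path.startsWith(e.getKey())) {
                    return e.getValue() + path;
                }
            }
            throw new IllegalArgumentException("No NameNode serves " + path);
        }
    }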

HA (High Availability) For NameNode

Prior to Hadoop 2.x, only one instance of the NameNode was supported, and hence it was a Single Point Of Failure: if the NameNode crashed, the entire HDFS was unavailable until the namespace was reconstructed from the latest FSImage checkpoint (e.g., the one kept by the Secondary NameNode) and a NameNode instance was restarted. This introduced considerable downtime, and hence the system was not highly available.

From Hadoop 2.x, support for an HA NameNode was introduced: there is a pair of NameNodes in the system, of which one is active and the other is passive. Only the active NameNode manages the HDFS cluster. The passive NameNode keeps monitoring the active NameNode and makes sure it stays in sync with it. If the active NameNode crashes, the passive NameNode takes over as the active NameNode and continues the functions of HDFS without any downtime, since the passive NameNode was always in sync with the active one.
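
In real deployments this monitoring is done by a ZooKeeper Failover Controller (ZKFC) on each NameNode host; the loop below is only a simplified stand-in, with isHealthy(), fence(), and promoteStandby() as made-up helpers:

    // Simplified sketch of active-NameNode monitoring and takeover.
    public class FailoverMonitorSketch {

        public static void main(String[] args) throws InterruptedException {
            monitor();
        }

        static void monitor() throws InterruptedException {
            while (true) {
                if (!isHealthy("active-nn.example.com")) {
                    // Fence first so the old active can no longer write,
                    // and only then let the standby take over.
                    if (fence("active-nn.example.com")) {
                        promoteStandby("standby-nn.example.com");
                        return;
                    }
                }
                Thread.sleep(1_000); // poll interval (illustrative)
            }
        }

        // Stand-ins: a real deployment checks NameNode health over RPC.
        static boolean isHealthy(String host) { return false; } // simulate a crash
        static boolean fence(String host)     { return true;  } // e.g. sshfence
        static void promoteStandby(String host) {
            System.out.println(host + " is now active");
        }
    }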

In the failover scenario, it is important to make sure that only one NameNode is active. At no point in time should there be more than one active NameNode; more than one active NameNode results in a split-brain scenario.

Fencing is a technique that ensures only one NameNode stays active. The most popular approach is to use Quorum Journal Nodes to implement fencing: the QJM allows only one NameNode at a time to write to the journal nodes.
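
The QJM achieves this with epoch numbers: each time a NameNode becomes active it obtains a new, higher epoch, and journal nodes reject writes stamped with an older one. A simplified sketch (not the real QJM code):

    // Simplified sketch of QJM-style fencing with epoch numbers. A deposed
    // active cannot commit edits even if it is still running, because its
    // epoch is stale.
    public class EpochFencingSketch {

        static class JournalNode {
            private long promisedEpoch = 0;

            synchronized void newEpoch(long proposed) {
                if (proposed <= promisedEpoch) {
                    throw new IllegalStateException("stale epoch " + proposed);
                }
                promisedEpoch = proposed;
            }

            synchronized void logEdit(long writerEpoch, String edit) {
                if (writerEpoch < promisedEpoch) {
                    throw new IllegalStateException("writer fenced: epoch "
                            + writerEpoch + " < " + promisedEpoch);
                }
                System.out.println("accepted (epoch " + writerEpoch + "): " + edit);
            }
        }

        public static void main(String[] args) {
            JournalNode jn = new JournalNode();
            jn.newEpoch(1);
            jn.logEdit(1, "OP_MKDIR /a");     // old active, epoch 1: accepted
            jn.newEpoch(2);                   // new active takes over
            jn.logEdit(2, "OP_MKDIR /b");     // accepted
            try {
                jn.logEdit(1, "OP_MKDIR /c"); // old active is fenced
            } catch (IllegalStateException e) {
                System.out.println(e.getMessage());
            }
        }
    }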

 

Hardware resources

In order to deploy an HA cluster, you should prepare the following:

 

NameNode machines - the machines on which you run the Active and Standby NameNodes should have equivalent hardware to each other, and equivalent hardware to what would be used in a non-HA cluster.

 

Shared storage - you will need to have a shared directory which both NameNode machines can have read/write access to. Typically this is a remote filer which supports NFS and is mounted on each of the NameNode machines. Currently only a single shared edits directory is supported. Thus, the availability of the system is limited by the availability of this shared edits directory, and therefore in order to remove all single points of failure there needs to be redundancy for the shared edits directory: specifically, multiple network paths to the storage, and redundancy in the storage itself (disk, network, and power). Because of this, it is recommended that the shared storage server be a high-quality dedicated NAS appliance rather than a simple Linux server.

Note that, in an HA cluster, the Standby NameNode also performs checkpoints of the namespace state, and thus it is not necessary to run a Secondary NameNode, CheckpointNode, or BackupNode in an HA cluster. In fact, to do so would be an error. This also allows one who is reconfiguring a non-HA-enabled HDFS cluster to be HA-enabled to reuse the hardware which they had previously dedicated to the Secondary NameNode.

YouTube link: Hadoop series 1 - YouTube

