Fencing
A fencing method is a
method by which one node can forcibly prevent another node from making
continued progress. This might be implemented by killing a process on the other
node, by denying the other node's access to shared storage, or by accessing a
PDU to cut the other node's power.
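As a concrete illustration, the sketch below sets the standard HDFS fencing properties through Hadoop's Configuration API (the same keys would normally live in hdfs-site.xml). The property names are the usual Hadoop ones; the script path, key file, and timeout are placeholder values, not recommendations. The configured methods are tried in order until one succeeds.

import org.apache.hadoop.conf.Configuration;

public class FencingConfigSketch {
    // Minimal sketch: configure two fencing methods, tried in order.
    public static Configuration fencingConfig() {
        Configuration conf = new Configuration();
        // First try sshfence (SSH into the other NameNode host and kill the process);
        // if that fails, fall back to a custom shell script, e.g. one that talks to a
        // PDU or revokes access to shared storage. The script path is a placeholder.
        conf.set("dfs.ha.fencing.methods", "sshfence\nshell(/path/to/fence-via-pdu.sh)");
        // Private key the sshfence method uses to reach the other NameNode host.
        conf.set("dfs.ha.fencing.ssh.private-key-files", "/home/hdfs/.ssh/id_rsa");
        conf.set("dfs.ha.fencing.ssh.connect-timeout", "30000"); // milliseconds
        return conf;
    }
}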
Hadoop NameNode High Availability Architecture
In the HDFS NameNode High Availability architecture, two NameNodes run at the same time. We can implement the Active and Standby NameNode configuration in the following two ways:
- Using Quorum Journal Nodes
- Using Shared Storage
Using Quorum Journal Nodes
The Quorum Journal Manager (QJM) is an HDFS-specific implementation designed to share edit logs between the active NameNode and the standby NameNode.
For High Availability, the standby NameNode communicates and synchronizes with the active NameNode through a group of separate daemons called "Journal Nodes". The QJM writes to and reads from this group of Journal Nodes, of which there should be at least three. For N Journal Nodes, the system can tolerate at most (N-1)/2 failures (rounded down) and continue to function, which is why N should be odd. So, with three Journal Nodes, the system can tolerate the failure of one of them, since (3-1)/2 = 1.
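To make the arithmetic concrete, here is a tiny Java sketch (illustration only, not part of Hadoop) that computes how many Journal Node failures a quorum of a given size can survive.

public class JournalQuorum {
    // Maximum number of Journal Node failures a quorum of size n can tolerate.
    static int tolerableFailures(int n) {
        return (n - 1) / 2;   // integer division == floor((n-1)/2)
    }

    public static void main(String[] args) {
        for (int n : new int[] {3, 5, 7}) {
            System.out.println(n + " journal nodes tolerate "
                    + tolerableFailures(n) + " failure(s)");
        }
        // Output:
        // 3 journal nodes tolerate 1 failure(s)
        // 5 journal nodes tolerate 2 failure(s)
        // 7 journal nodes tolerate 3 failure(s)
    }
}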
When the active NameNode performs any namespace modification, it logs a record of the modification to a majority of these Journal Nodes.
The standby NameNode constantly watches the Journal Nodes, reads the edits from them, and applies them to its own namespace. In the case of a failover, the standby will ensure that it has read all of the edits from the Journal Nodes before promoting itself to the Active state. This ensures that the namespace state is fully synchronized before a failover occurs.
To provide a fast failover, the standby NameNode must also have up-to-date information about the location of data blocks in the cluster. To achieve this, the DataNodes are configured with the addresses of both NameNodes, and they send block location information and heartbeats to both.
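Putting the pieces together, the sketch below shows a typical set of QJM-based HA properties, again set through the Configuration API rather than hdfs-site.xml. The property names are the standard ones; the nameservice ID, host names, ports, and directories are placeholders for illustration.

import org.apache.hadoop.conf.Configuration;

public class QjmHaConfigSketch {
    public static Configuration qjmHaConfig() {
        Configuration conf = new Configuration();
        // Logical name of the HA pair; "mycluster" and the host names are placeholders.
        conf.set("dfs.nameservices", "mycluster");
        conf.set("dfs.ha.namenodes.mycluster", "nn1,nn2");
        conf.set("dfs.namenode.rpc-address.mycluster.nn1", "namenode1.example.com:8020");
        conf.set("dfs.namenode.rpc-address.mycluster.nn2", "namenode2.example.com:8020");
        // Both NameNodes write and read edits through the quorum of Journal Nodes.
        conf.set("dfs.namenode.shared.edits.dir",
                "qjournal://jn1.example.com:8485;jn2.example.com:8485;jn3.example.com:8485/mycluster");
        // Local directory where each Journal Node daemon stores its copy of the edits.
        conf.set("dfs.journalnode.edits.dir", "/data/hdfs/journal");
        // Lets HDFS clients discover which NameNode is currently active.
        conf.set("dfs.client.failover.proxy.provider.mycluster",
                "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider");
        return conf;
    }
}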
Using Shared Storage
The standby and active NameNodes keep in sync with each other through a shared storage device. For this implementation, both the active and the standby NameNode must have access to a particular directory on the shared storage device (e.g., a Network File System mount).
When the active NameNode performs any namespace modification, it logs a record of the modification to an edit log file stored in the shared directory. The standby NameNode watches this directory for edits and, as they appear, applies them to its own namespace. In the case of a failover, the standby NameNode will ensure that it has read all of the edits from the shared storage before promoting itself to the Active state. This ensures that the namespace state is fully synchronized before the failover completes.
To prevent the "split-brain scenario", in which the namespace state deviates between the two NameNodes, an administrator must configure at least one fencing method for the shared storage.
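For the shared-storage variant, the only structural difference on the NameNodes is that the shared edits directory points at the NFS mount instead of a qjournal:// URI, and a fencing method (such as the one sketched earlier) is mandatory. The mount point below is a placeholder.

import org.apache.hadoop.conf.Configuration;

public class NfsSharedEditsSketch {
    public static Configuration sharedStorageConfig() {
        Configuration conf = new Configuration();
        // Directory on the NFS filer, mounted at the same path on both
        // NameNode machines; the mount point is a placeholder.
        conf.set("dfs.namenode.shared.edits.dir", "file:///mnt/nn-shared/edits");
        // At least one fencing method is mandatory with shared storage;
        // otherwise a split-brain could corrupt the shared edit log.
        conf.set("dfs.ha.fencing.methods", "sshfence");
        conf.set("dfs.ha.fencing.ssh.private-key-files", "/home/hdfs/.ssh/id_rsa");
        return conf;
    }
}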
HDFS Federation and NameNode High Availability (HA) both pertain to HDFS in Hadoop.
HDFS (Hadoop Distributed File System) is the storage layer of Hadoop. It stores large datasets as files in a distributed manner across multiple nodes in a Hadoop cluster: a file is divided into blocks, and these blocks are stored across multiple nodes in the cluster. HDFS has a master-slave architecture. The NameNode is the master, which stores the metadata pertaining to the files, the blocks, and their locations, and the DataNodes are the slaves, which store the actual data in the form of blocks.
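For readers new to HDFS, the short client sketch below shows this division of labour in practice: metadata operations go to the NameNode, while the block data is streamed to and from the DataNodes. It uses the standard Hadoop FileSystem API; the file path and the file contents are placeholders.

import java.nio.charset.StandardCharsets;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsClientSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration(); // picks up core-site.xml / hdfs-site.xml
        FileSystem fs = FileSystem.get(conf);

        // Write: the client asks the NameNode for metadata (which blocks, which
        // DataNodes), then streams the block data to the DataNodes.
        Path file = new Path("/user/demo/hello.txt");   // path is a placeholder
        try (FSDataOutputStream out = fs.create(file, true)) {
            out.write("hello hdfs".getBytes(StandardCharsets.UTF_8));
        }

        // Read: block locations come from the NameNode, bytes from the DataNodes.
        try (FSDataInputStream in = fs.open(file)) {
            byte[] buf = new byte[10];
            in.readFully(buf);
            System.out.println(new String(buf, StandardCharsets.UTF_8));
        }
    }
}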
HDFS Federation
The NameNode keeps all metadata in main memory as an in-memory image. As the number of files in the cluster grows, it becomes impractical to host the entire file system namespace in the main memory of a single machine. In such a case, the file system namespace is partitioned into namespace volumes. For example, all paths starting with /usr or /home can be treated as two different volumes managed by two separate NameNodes. Each NameNode manages one namespace volume and holds the metadata for that namespace. Each NameNode also manages a block pool containing all the blocks for the files in its namespace. There is no communication between the NameNodes; they operate in isolation, and the failure of one NameNode does not affect the operation of the others. The DataNodes communicate with all the NameNodes in the cluster, and the blocks on a specific DataNode can belong to the block pools of any NameNode.
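As an illustration, the sketch below configures two federated nameservices and, optionally, a client-side ViewFS mount table that stitches the /home and /usr volumes back into a single namespace. The nameservice IDs, host names, and mount-table name are placeholders.

import org.apache.hadoop.conf.Configuration;

public class FederationConfigSketch {
    public static Configuration federationConfig() {
        Configuration conf = new Configuration();
        // Two independent namespace volumes, each served by its own NameNode.
        // The nameservice IDs and host names are placeholders.
        conf.set("dfs.nameservices", "nsHome,nsUsr");
        conf.set("dfs.namenode.rpc-address.nsHome", "nn-home.example.com:8020");
        conf.set("dfs.namenode.rpc-address.nsUsr",  "nn-usr.example.com:8020");
        // A ViewFS mount table can present the two volumes to clients as one
        // namespace: /home -> nsHome, /usr -> nsUsr.
        conf.set("fs.defaultFS", "viewfs://clusterX");
        conf.set("fs.viewfs.mounttable.clusterX.link./home",
                "hdfs://nn-home.example.com:8020/home");
        conf.set("fs.viewfs.mounttable.clusterX.link./usr",
                "hdfs://nn-usr.example.com:8020/usr");
        return conf;
    }
}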
HA (High Availability) for NameNode
Prior to Hadoop 2.x, only one instance of the NameNode was supported, and hence it was a Single Point Of Failure: if the NameNode crashed, the entire HDFS was unavailable until the FsImage was restored (for example, from a checkpoint held by the Secondary NameNode) and the NameNode instance was restarted. This introduces considerable downtime, and hence the system was not highly available.
From Hadoop 2.x onwards, support for an HA NameNode was introduced: there is a pair of NameNodes in the system, one of which is active and the other passive (standby). Only the active NameNode manages the HDFS cluster. The passive NameNode keeps monitoring the active NameNode and makes sure it stays in sync with it. If the active NameNode crashes, the passive NameNode takes over as the active NameNode and continues the functions of HDFS without downtime, since the passive NameNode was always in sync with the active one.
In the failover scenario, it is important to make sure that only one NameNode is active. At no point in time should there be more than one active NameNode; more than one active NameNode results in a split-brain scenario.
Fencing is a technique that ensures only one NameNode is active at a time. The most popular approach is to use Quorum Journal Nodes, which only ever accept writes from a single NameNode, to implement fencing.
Hardware resources
In order to deploy an
HA cluster, you should prepare the following:
NameNode machines - the
machines on which you run the Active and Standby NameNodes should have
equivalent hardware to each other, and equivalent hardware to what would be
used in a non-HA cluster.
Shared storage - you
will need to have a shared directory which both NameNode machines can have
read/write access to. Typically this is a remote filer which supports NFS and
is mounted on each of the NameNode machines. Currently only a single shared
edits directory is supported. Thus, the availability of the system is limited
by the availability of this shared edits directory, and therefore in order to
remove all single points of failure there needs to be redundancy for the shared
edits directory. Specifically, there should be multiple network paths to the storage, and redundancy in the storage itself (disk, network, and power). Because of this,
it is recommended that the shared storage server be a high-quality dedicated
NAS appliance rather than a simple Linux server.
Note
that, in an HA cluster, the Standby NameNode also performs checkpoints of the
namespace state, and thus it is not necessary to run a Secondary NameNode,
CheckpointNode, or BackupNode in an HA cluster. In fact, to do so would be an
error. This also allows one who is reconfiguring a non-HA-enabled HDFS cluster
to be HA-enabled to reuse the hardware which they had previously dedicated to
the Secondary NameNode.