Sunday, September 11, 2022

Rack awareness in Hadoop

Hadoop components are rack aware. For example, HDFS block placement will use rack awareness for fault tolerance by placing one block replica on a different rack. This provides data availability in the event of a network switch failure or partition within the cluster.
 Hadoop master daemons obtain the rack ID of the cluster workers by invoking either an external script or a Java class, as specified in the configuration files. Whether a Java class or an external script is used for topology, the output must adhere to the Java org.apache.hadoop.net.DNSToSwitchMapping interface. The interface expects a one-to-one correspondence to be maintained, with topology information in the format ‘/myrack/myhost’, where ‘/’ is the topology delimiter, ‘myrack’ is the rack identifier, and ‘myhost’ is the individual host. Assuming a single /24 subnet per rack, one could use the format ‘/192.168.100.0/192.168.100.5’ as a unique rack-host topology mapping.
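 The /24 example above can be sketched in a few lines of Python. This is only an illustration of the naming scheme, not code from Hadoop; the function name rack_host_path and the prefix_len parameter are made up for this post, and it assumes every rack really does map to exactly one subnet.

```python
import ipaddress


def rack_host_path(ip: str, prefix_len: int = 24) -> str:
    """Build a '/<rack>/<host>' path, using the subnet's network address as the rack ID.

    Assumption: one subnet of size `prefix_len` per rack (a /24 by default).
    """
    # strict=False lets us pass a host address and still get its enclosing network
    network = ipaddress.ip_network(f"{ip}/{prefix_len}", strict=False)
    return f"/{network.network_address}/{ip}"


print(rack_host_path("192.168.100.5"))  # /192.168.100.0/192.168.100.5
```

 If the racks do not line up with subnets, a lookup table maintained by the administrator is probably the safer choice.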
 To use a Java class for topology mapping, specify the class name with the net.topology.node.switch.mapping.impl parameter in the configuration file. An example, NetworkTopology.java, is included with the Hadoop distribution and can be customized by the Hadoop administrator. Using a Java class instead of an external script has a performance benefit: Hadoop doesn’t need to fork an external process when a new worker node registers itself.
 If an external script is used, it is specified with the net.topology.script.file.name parameter in the configuration files. Unlike the Java class, the external topology script is not included with the Hadoop distribution and must be provided by the administrator. Hadoop passes multiple IP addresses as arguments (ARGV) when forking the topology script. The number of IP addresses sent per invocation is controlled by net.topology.script.number.args and defaults to 100. If net.topology.script.number.args were changed to 1, a topology script would be forked for each IP submitted by DataNodes and/or NodeManagers.
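 To get a feel for what net.topology.script.number.args changes, here is a rough model of the batching in Python. It is not Hadoop's own code; the helper name batches is hypothetical, and it assumes all the IP addresses are submitted for resolution together.

```python
def batches(ips, number_args=100):
    """Split a list of IPs into the groups handed to one script invocation each.

    `number_args` stands in for net.topology.script.number.args (default 100).
    """
    if number_args < 1:
        raise ValueError("number_args must be at least 1")
    return [ips[i:i + number_args] for i in range(0, len(ips), number_args)]


# 250 made-up worker addresses
ips = [f"10.0.0.{i + 1}" for i in range(250)]

print(len(batches(ips)))     # 3 script invocations with the default of 100
print(len(batches(ips, 1)))  # 250 invocations, one fork per IP
```

 Under these assumptions, the default keeps the number of forked processes small, while a value of 1 forks once per address.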
 If net.topology.script.file.name or net.topology.node.switch.mapping.impl is not set, the rack ID ‘/default-rack’ is returned for any passed IP address. While this behavior appears desirable, it can cause issues with HDFS block replication: the default behavior is to write one replicated block off rack, which is impossible when there is only a single rack named ‘/default-rack’.
 By default, Hadoop stores one replica of a block on the local rack and keeps at least one copy on a different rack for fault tolerance. If one rack goes down (or one data center, when racks map to separate data centers), another copy of the block is still available elsewhere.
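 The idea of one local copy plus one off-rack copy, and why a lone ‘/default-rack’ gets in the way, can be shown with a deliberately simplified Python sketch. This is not Hadoop's actual block placement policy; pick_replica_racks is a hypothetical helper that only decides which racks two replicas would land on.

```python
def pick_replica_racks(local_rack, all_racks):
    """Return (local_rack, remote_rack) for two replicas of a block.

    remote_rack is None when no other rack is known, which is the situation
    when every node reports '/default-rack'. What HDFS does in that case is
    not modeled here.
    """
    remote_candidates = sorted(r for r in set(all_racks) if r != local_rack)
    remote_rack = remote_candidates[0] if remote_candidates else None
    return local_rack, remote_rack


print(pick_replica_racks("/rack1", ["/rack1", "/rack2"]))
# ('/rack1', '/rack2')
print(pick_replica_racks("/default-rack", ["/default-rack"]))
# ('/default-rack', None)
```

 With two racks there is somewhere else to put the second copy; with only ‘/default-rack’ there is not.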
 Configuring HDFS rack awareness
The NameNode in an HDFS cluster maintains the rack IDs of all the DataNodes. The NameNode uses this information about the distribution of DataNodes among the racks in the cluster to select closer DataNodes for effective block placement during read/write operations. This practice of selecting closer DataNodes based on their location in the cluster is termed rack awareness. Rack awareness helps maintain fault tolerance in the event of a failure.
Configuring rack awareness on an HDP cluster involves creating a rack topology script, adding the script to core-site.xml, restarting HDFS, and verifying the rack awareness.
1. Create a rack topology script
HDFS uses topology scripts to determine the rack location of nodes and uses this information to replicate block data to redundant racks.
2. Add the topology script property to core-site.xml
Assign the name of the topology script to the net.topology.script.file.name property in core-site.xml.
3. Restart HDFS and MapReduce services
After adding the topology script property to core-site.xml, you must restart the HDFS and MapReduce services.
4. Verify rack awareness
You must perform a series of checks to verify that rack awareness is activated on the cluster.
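
For step 1, a topology script can be as small as the Python sketch below. Treat it as a starting point under stated assumptions rather than a finished script: the RACKS table and its addresses are invented for illustration, and the script assumes Hadoop passes host IPs as command-line arguments and reads back one rack path per argument, in the same order.

```python
#!/usr/bin/env python3
"""Minimal topology script sketch: maps each argument to a rack path."""
import sys

DEFAULT_RACK = "/default-rack"

# Hypothetical host-to-rack table; replace with the real layout of the cluster.
RACKS = {
    "192.168.100.5": "/rack1",
    "192.168.100.6": "/rack1",
    "192.168.101.5": "/rack2",
}


def resolve(args):
    """Return one rack path per argument, falling back to DEFAULT_RACK."""
    return [RACKS.get(arg, DEFAULT_RACK) for arg in args]


if __name__ == "__main__":
    # One rack path per line, in the same order as the arguments received
    for rack in resolve(sys.argv[1:]):
        print(rack)
```

The script has to be executable by the user running the Hadoop daemons, and its full path is what goes into net.topology.script.file.name in step 2. After the restart in step 3, running hdfs dfsadmin -printTopology is one way to check that DataNodes show up under the expected racks.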

 Follow 👉 syed ashraf quadri👈 for awesome stuff 

        


