Sunday, July 2, 2023

Architecture of ZooKeeper

 ZooKeeper

ZooKeeper is an open-source, centralized coordination service for distributed applications. It follows a client-server architecture and is designed to be highly available and fault-tolerant. Let's explore the architecture of ZooKeeper:

 

Ensemble of Servers:

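ZooKeeper operates in an ensemble mode where multiple ZooKeeper servers form a cluster. Each server in the ensemble contributes to the overall availability and fault-tolerance of the system. An ensemble typically consists of an odd number of servers (e.g., 3, 5, or 7) because it stays available only while a majority of its servers is up. Adding a fourth server to a three-server ensemble, for example, does not increase the number of failures it can tolerate.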
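The majority rule can be checked with a little arithmetic. The sketch below is only an illustration; the function names `quorum_size` and `tolerated_failures` are made up for this post and are not part of any ZooKeeper API.

```python
def quorum_size(ensemble_size: int) -> int:
    """Smallest number of servers that forms a strict majority."""
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size: int) -> int:
    """Servers that can fail while a majority is still reachable."""
    return ensemble_size - quorum_size(ensemble_size)


# Print ensemble size, quorum size, and tolerated failures side by side.
for n in (3, 4, 5, 6, 7):
    print(n, quorum_size(n), tolerated_failures(n))
```

Ensembles of 3 and 4 servers both tolerate a single failure, and ensembles of 5 and 6 both tolerate two, which is why the even sizes are rarely worth the extra machine.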

 

Leader-Follower Model:

Within the ZooKeeper ensemble, one server is elected as the leader, while the rest of the servers function as followers. The leader is responsible for ordering and coordinating all write requests, while the followers replicate the leader's state and serve read requests from clients. If the leader fails, the remaining servers elect a new leader from among themselves.

 

Data Model:

ZooKeeper provides a hierarchical data model similar to a file system, known as a "ZooKeeper tree" or "namespace." The namespace is organized as a tree-like structure, with each node referred to as a "znode." Znodes can be used to store data and also serve as synchronization primitives.
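To make the tree idea concrete, here is a toy in-memory namespace. It is a sketch under simplifying assumptions: the class `ZNodeTree` and its methods are hypothetical names for this post, not the ZooKeeper client API, and it ignores versions, ACLs, and ephemeral or sequential znodes.

```python
class ZNodeTree:
    """Toy znode namespace: maps absolute paths to data bytes."""

    def __init__(self):
        self.nodes = {"/": b""}  # the root znode always exists

    def create(self, path, data=b""):
        # A znode can only be created under an existing parent.
        parent = path.rsplit("/", 1)[0] or "/"
        if parent not in self.nodes:
            raise KeyError(f"parent {parent} does not exist")
        if path in self.nodes:
            raise KeyError(f"{path} already exists")
        self.nodes[path] = data

    def get(self, path):
        return self.nodes[path]

    def children(self, path):
        # Direct children only: exactly one more path component.
        prefix = path.rstrip("/") + "/"
        return sorted(
            p[len(prefix):]
            for p in self.nodes
            if p != path and p.startswith(prefix) and "/" not in p[len(prefix):]
        )


tree = ZNodeTree()
tree.create("/app")
tree.create("/app/config", b"timeout=30")
tree.create("/app/workers")
print(tree.children("/app"))
print(tree.get("/app/config"))
```

Unlike a file system, every znode here can hold data and have children at the same time, which matches how znodes behave.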

 

Write Requests and Consensus:

When a client sends a write request to the ZooKeeper ensemble, it is forwarded to the leader. The leader assigns the request a position in a single global order, proposes it to the followers, and commits it once a majority of the servers have acknowledged it. To ensure consistency, ZooKeeper uses a consensus protocol called ZAB (ZooKeeper Atomic Broadcast). The ZAB protocol ensures that all changes are applied in the same order on each server, so every server moves through the same sequence of states.
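The write path can be pictured with a heavily simplified model. This is not the real ZAB protocol: there is no leader election, recovery, or networking, and the classes `Server` and `Leader` are hypothetical. A plain counter stands in for the transaction id (zxid). It only shows the two properties described above: a write commits only with majority acknowledgement, and every server applies commits in the same order.

```python
class Server:
    def __init__(self, name, reachable=True):
        self.name = name
        self.reachable = reachable
        self.log = []  # committed (zxid, operation) pairs, in commit order

    def ack(self, zxid, op):
        # An unreachable follower never acknowledges a proposal.
        return self.reachable


class Leader(Server):
    def __init__(self, name, followers):
        super().__init__(name)
        self.followers = followers
        self.next_zxid = 1

    def write(self, op):
        zxid = self.next_zxid
        # The leader counts its own acknowledgement.
        acks = 1 + sum(f.ack(zxid, op) for f in self.followers)
        ensemble_size = 1 + len(self.followers)
        if acks <= ensemble_size // 2:
            return None  # no majority, so the write is not committed
        self.next_zxid += 1
        for server in [self] + [f for f in self.followers if f.reachable]:
            server.log.append((zxid, op))
        return zxid


followers = [Server("f1"), Server("f2", reachable=False)]
leader = Leader("leader", followers)
print(leader.write("set /app/config a"))
print(leader.write("set /app/config b"))
print(followers[0].log == leader.log)
```

With one of three servers down the writes still commit, because two acknowledgements are a majority. If a second server became unreachable, `write` would return `None`.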

 

Read Requests and Follower Synchronization:

Read requests from clients can be served by any server in the ensemble, not just the leader; each server answers reads from its own local copy of the data. Followers keep that copy current by applying the updates they receive from the leader, in the order the leader issued them. Because a follower may briefly lag behind the leader, a read can return slightly stale data; a client that needs the latest value can issue a sync request before reading.

 

Watches and Event Notifications:

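ZooKeeper supports a watch mechanism where clients can register watches on znodes. A watch is triggered when the data of a watched znode changes, when the znode is created or deleted, or when its list of children changes, depending on how the watch was set. ZooKeeper then sends a notification to the client that registered the watch, allowing it to react to the change quickly. A standard watch fires only once, so a client that wants further notifications must register it again.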
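The one-time nature of standard watches is easy to miss, so here is a small model of it. The class `WatchedStore` is a hypothetical stand-in written for this post, not the ZooKeeper client API; it only covers data watches on a single process.

```python
class WatchedStore:
    """Toy key-value store with one-shot data watches."""

    def __init__(self):
        self.data = {}
        self.watches = {}  # path -> list of callbacks waiting to fire

    def get(self, path, watch=None):
        # Registering a watch is a side effect of reading.
        if watch is not None:
            self.watches.setdefault(path, []).append(watch)
        return self.data.get(path)

    def set(self, path, value):
        self.data[path] = value
        # pop() removes the registrations, so each watch fires at most once.
        for callback in self.watches.pop(path, []):
            callback(path)


events = []
store = WatchedStore()
store.get("/config", watch=events.append)
store.set("/config", "v1")  # fires the watch
store.set("/config", "v2")  # no watch left, nothing fires
print(events)
```

To keep receiving notifications, the callback would need to call `get` with a watch again, which is the usual pattern in client code.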

 

Client Libraries and Sessions:

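ZooKeeper provides client libraries in various programming languages that enable applications to interact with the ensemble. A client establishes a session with the ensemble and maintains a connection to one of the servers. If the connection is lost, the client library reconnects to another server, and the session continues as long as the reconnection happens within the session timeout. If the session expires, its ephemeral znodes are deleted, and the client must open a new session and re-create any state it needs.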

 

By providing distributed coordination and synchronization primitives, ZooKeeper enables applications to implement common coordination patterns, such as distributed locks, leader election, configuration management, and more. Its architecture provides high availability, fault-tolerance, and consistently ordered updates, making it a reliable foundation for building distributed applications.
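As one example of such a pattern, a common lock recipe has each client create a sequentially numbered znode and treats the client with the lowest number as the lock holder. The sketch below models only that ordering idea in a single process; `LockQueue` is a hypothetical name, and a real implementation would use ephemeral sequential znodes and watches through a client library.

```python
class LockQueue:
    """Toy model of the lowest-sequence-number-wins lock recipe."""

    def __init__(self):
        self.counter = 0
        self.waiting = {}  # sequence number -> client name

    def join(self, client):
        # Each contender gets the next sequence number.
        seq = self.counter
        self.counter += 1
        self.waiting[seq] = client
        return seq

    def holder(self):
        # The contender with the lowest sequence number holds the lock.
        if not self.waiting:
            return None
        return self.waiting[min(self.waiting)]

    def leave(self, seq):
        # Releasing the lock (or a session expiring) removes the entry.
        self.waiting.pop(seq, None)


queue = LockQueue()
a = queue.join("client-a")
b = queue.join("client-b")
print(queue.holder())
queue.leave(a)
print(queue.holder())
```

The same structure gives leader election: whoever holds the lowest number acts as leader until its entry disappears.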

This topic is very important from an interview point of view for the Hadoop administrator profile.
Follow 👉https://lnkd.in/dDYk_vQs 👈 for awesome stuff
YouTube channel : https://lnkd.in/d4JjCKZ4


Saturday, July 1, 2023

Speculative Execution

Speculative execution in Hadoop is a feature that addresses the problem of slow-running tasks, known as stragglers, in a MapReduce job. When enabled (through the mapreduce.map.speculative and mapreduce.reduce.speculative properties), Hadoop identifies tasks that are progressing noticeably more slowly than their counterparts and launches additional copies of those tasks on different nodes. The goal is to complete the job faster by having multiple attempts running in parallel and using the result of whichever attempt succeeds first.
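The idea of spotting a straggler can be sketched with a simple rule: flag any task whose progress rate is well below the average. This is an illustrative heuristic only, not Hadoop's actual estimator; the function name, the `threshold` parameter, and the task ids are all assumptions made for this post.

```python
from statistics import mean


def find_stragglers(progress_rates, threshold=0.5):
    """Return ids of tasks running slower than threshold * average rate."""
    average = mean(progress_rates.values())
    return sorted(
        task for task, rate in progress_rates.items()
        if rate < threshold * average
    )


# Progress rate per task, e.g. fraction of input processed per minute.
rates = {"map_0": 1.0, "map_1": 0.9, "map_2": 0.2, "map_3": 1.1}
print(find_stragglers(rates))
```

Here only `map_2` falls below half the average rate, so it would be the candidate for a second attempt.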

 

The speculative task attempts run concurrently with the original attempts. Hadoop monitors the progress of every attempt. Once any attempt of a task completes successfully, all other attempts of that same task are killed, and the output of the successful attempt is used as the final result.
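The "first successful attempt wins" rule can be worked through with numbers. The sketch below is a plain calculation, not Hadoop code; the function name and the attempt ids are hypothetical, and it assumes every attempt succeeds.

```python
def winning_attempt(attempts):
    """attempts: list of (attempt_id, start_time, duration) tuples.

    Returns the winner, its finish time, and the attempts that get killed.
    """
    finish = {aid: start + duration for aid, start, duration in attempts}
    winner = min(finish, key=finish.get)
    killed = sorted(aid for aid in finish if aid != winner)
    return winner, finish[winner], killed


# The original attempt is slow; a backup starts at t=40s on a healthy node.
attempts = [
    ("attempt_0", 0, 100),  # would finish at t=100s
    ("attempt_1", 40, 30),  # finishes at t=70s
]
print(winning_attempt(attempts))
```

Even though the backup started 40 seconds late, it finishes 30 seconds earlier than the original, so the task completes at 70 seconds and the original attempt is killed.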


 

 

The purpose of speculative execution is to reduce job completion time. Stragglers can be caused by factors such as degraded hardware, an overloaded node, or network issues, and a single straggler can hold up an entire job. Launching a second attempt on a different node allows the job to make progress even if some tasks are running significantly slower than expected. The trade-off is that duplicate attempts consume extra cluster resources, and speculation helps little when the slowness comes from data skew, because the backup attempt has to process the same oversized input.

 

Overall, speculative execution is a technique employed by Hadoop to optimize job execution in a distributed computing environment by identifying and addressing slow-running tasks. It helps improve the efficiency and reliability of data processing in Hadoop clusters.




YouTube channel : https://youtu.be/RJdXBT7f2U8