Wednesday, June 29, 2022

Impala architecture

 


Impala Daemon:


Impala daemon: It is represented by the impalad process and runs on every node in the CDH cluster. It accepts queries from various interfaces, such as the Impala shell and the Hue browser, and processes them. Whenever any Impala node in the cluster creates, alters, or drops an object, or processes a statement such as INSERT or LOAD DATA, every daemon also receives the broadcast message about the change.


Whenever a query is submitted to an impalad on a particular node, that node serves as the “coordinator node” for that query.

To get the mapping between tables and files, this daemon uses the Hive metastore.

It also uses the HDFS NameNode (NN) to get the mapping between files and blocks. Therefore, to locate and process the data, Impala relies on both the Hive metastore and the NameNode.
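
The two lookups can be pictured with a small Python sketch. This is only a toy model: plain dictionaries stand in for the Hive metastore and the NameNode, and the table name, paths, block IDs, and host names are all made up for illustration. It does not use any real Impala or HDFS API.

```python
# Toy model only: dictionaries stand in for the Hive metastore and the NameNode.
# Every table name, path, block ID, and host name here is hypothetical.

# Hive metastore role: table -> data files
METASTORE = {
    "sales": ["/warehouse/sales/part-0", "/warehouse/sales/part-1"],
}

# NameNode role: file -> list of (block ID, hosts holding a replica)
NAMENODE = {
    "/warehouse/sales/part-0": [("blk_1", ["node1", "node2"])],
    "/warehouse/sales/part-1": [
        ("blk_2", ["node2", "node3"]),
        ("blk_3", ["node1", "node3"]),
    ],
}


def locate_blocks(table):
    """Resolve a table to its blocks in two steps: table -> files -> blocks."""
    blocks = []
    for path in METASTORE[table]:               # step 1: ask the "metastore"
        for block_id, hosts in NAMENODE[path]:  # step 2: ask the "NameNode"
            blocks.append((path, block_id, hosts))
    return blocks
```

Calling locate_blocks("sales") returns one entry per block, each with the file it belongs to and the hosts that store it, which is the information a daemon needs before it can decide where to read the data.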

We can say that all impalads are equivalent: any of them can accept a query and act as its coordinator.

An impalad has 3 major components:



Query Planner: The Query Planner is responsible for parsing the query. Planning occurs in 2 parts.

 1) First, a single-node plan is made, as if all the data in the cluster resided on just one node.

 

 2) Afterwards, on the basis of the location of various data sources in the cluster, this single node plan is converted to a distributed plan (thereby leveraging data locality).
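
The two planning steps can be sketched in Python as follows. This is a simplified illustration, not Impala's actual planner: the block IDs and host names are hypothetical, and the rule used here (give each block to the least-loaded host that stores a replica) is an assumption chosen to keep the example short.

```python
# Hypothetical block locations: block ID -> hosts that hold a replica.
BLOCK_HOSTS = {
    "blk_1": ["node1", "node2"],
    "blk_2": ["node2", "node3"],
    "blk_3": ["node1", "node3"],
}


def single_node_plan(blocks):
    """Step 1: plan as if every block lived on a single node."""
    return {"localhost": sorted(blocks)}


def distribute(plan, block_hosts):
    """Step 2: move each scan to a host that actually stores the block.

    Assumed rule: among the replica holders, pick the host with the fewest
    scans assigned so far (ties broken by host name).
    """
    distributed = {}
    for blocks in plan.values():
        for block in blocks:
            host = min(
                block_hosts[block],
                key=lambda h: (len(distributed.get(h, [])), h),
            )
            distributed.setdefault(host, []).append(block)
    return distributed
```

With the sample data, the single-node plan lists all three blocks under one host, and the distributed plan assigns one block to each of node1, node2, and node3, so every scan runs where its data is stored.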



Coordinator: The Query Coordinator is responsible for coordinating the execution of the entire query. To read and process data, it sends requests to various executors. Afterward, it receives the data back from these executors and streams it back to the client via JDBC/ODBC.

 

Executor: The Executor is responsible for aggregating data. It works on data that is read locally or, if the data is not available locally, on data streamed from the executors of other Impala daemons.
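
The division of work between the coordinator and the executors can be sketched as a scatter-gather pattern. The Python below is a toy, single-process illustration: the node names, the sample rows, and the function names are all hypothetical, and real executors run in parallel on different hosts rather than in a loop.

```python
# Hypothetical rows stored on each node: (region, amount).
NODE_DATA = {
    "node1": [("east", 10), ("west", 5)],
    "node2": [("east", 7)],
    "node3": [("west", 3), ("north", 4)],
}


def execute_fragment(rows):
    """Executor role: aggregate the locally read rows into partial sums."""
    partial = {}
    for region, amount in rows:
        partial[region] = partial.get(region, 0) + amount
    return partial


def coordinate(node_data):
    """Coordinator role: hand work to each executor, then merge the partials."""
    final = {}
    for rows in node_data.values():
        for region, subtotal in execute_fragment(rows).items():
            final[region] = final.get(region, 0) + subtotal
    return final
```

Here coordinate(NODE_DATA) plays the part of a query like "sum of amount per region": each executor reduces its own rows first, so only small partial results travel back to the coordinator.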



Impala Catalog server: The Impala Catalog server is installed on 1 host of the cluster. It distributes metadata to the Impala daemons via the statestore.

It is physically represented by a daemon process named catalogd. You only need such a process on one host in a cluster.



Impala statestore: The name of the Impala statestore daemon process is statestored.

The Impala statestore is installed on one host of the cluster. It checks on the health of the Impala daemons on all the DataNodes.

We can say the statestore daemon is a name service that monitors the availability of Impala services across the cluster.

It also handles situations such as nodes becoming unavailable or becoming available again. The statestore keeps track of which impalads are up and running and relays this information to all the impalads in the cluster, so they are aware of it when distributing tasks to other impalads.

If a node fails for any reason, the statestore notifies all other nodes about the failure. Once the other impalads have received this notification, no Impala daemon assigns further queries to the affected node.
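
The membership tracking described above can be sketched as a small publish/subscribe model in Python. The class and method names (ToyStatestore, ToyDaemon, mark_up, mark_down) are invented for this illustration and do not correspond to any real Impala API; the real statestore detects failures itself through health checks, while here a failure is reported by an explicit call.

```python
class ToyStatestore:
    """Tracks which daemons are alive and pushes the list to subscribers."""

    def __init__(self):
        self.alive = set()
        self.subscribers = []

    def subscribe(self, callback):
        self.subscribers.append(callback)

    def _broadcast(self):
        snapshot = sorted(self.alive)
        for callback in self.subscribers:
            callback(snapshot)

    def mark_up(self, node):
        """A node became available: record it and tell every subscriber."""
        self.alive.add(node)
        self._broadcast()

    def mark_down(self, node):
        """A node failed: drop it and tell every subscriber."""
        self.alive.discard(node)
        self._broadcast()


class ToyDaemon:
    """Keeps the latest membership list it received from the statestore."""

    def __init__(self, name, statestore):
        self.name = name
        self.known_alive = []
        statestore.subscribe(self.on_update)

    def on_update(self, alive_nodes):
        self.known_alive = list(alive_nodes)

    def pick_workers(self):
        """Only nodes currently reported alive are eligible for new work."""
        return list(self.known_alive)
```

After a daemon subscribes, every mark_up or mark_down call on the statestore refreshes its list, so a node marked as down no longer appears in pick_workers() and receives no new work.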


   



