Hive Architecture
Hive CLI - The Hive CLI (Command Line Interface) is a shell where we can execute Hive queries and commands.
Hive Web User Interface - The Hive Web UI is an alternative to the Hive CLI. It provides a web-based GUI for executing Hive queries and commands.
Hive Server - It is also referred to as the Apache Thrift Server, because it is built on Apache Thrift. It accepts requests from different clients and passes them to the Hive Driver.
HiveServer2 is the successor of HiveServer1. HiveServer1 could not handle concurrent requests from more than one client, which is why it was replaced by HiveServer2. HiveServer2 allows multiple clients to submit queries to Hive and retrieve the final results.
HiveServer2 is designed to provide better support for open API clients like JDBC and ODBC.
Hive Driver - It receives queries from different sources such as the web UI, CLI, Thrift, and JDBC/ODBC drivers. It transfers the queries to the compiler.
Hive Compiler - The purpose of the compiler is to parse the query and perform semantic analysis on the different query blocks and expressions. It converts HiveQL statements into MapReduce jobs.
Hive Optimizer - The optimizer transforms the plan for efficiency and produces it in the form of a DAG of MapReduce tasks and HDFS tasks.
Hive Execution Engine - The execution engine executes the incoming tasks in the order of their dependencies, so a task starts only after the tasks it depends on have finished.
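To make "in the order of their dependencies" concrete, here is a minimal Python sketch of dependency-ordered execution over a DAG (a topological sort). It is only an illustration, not Hive's actual scheduler: the function execution_order and the task names in plan are hypothetical and were chosen just for this example.

```python
from collections import deque


def execution_order(dependencies):
    """Return task names so that every task comes after its dependencies.

    `dependencies` maps a task name to the list of tasks it depends on.
    Hypothetical helper for illustration; this is not Hive code.
    """
    # Collect every task, whether it appears as a key or only as a dependency.
    tasks = set(dependencies)
    for deps in dependencies.values():
        tasks.update(deps)

    # Count unfinished dependencies per task and record who waits on whom.
    remaining = {t: len(set(dependencies.get(t, []))) for t in tasks}
    dependents = {t: [] for t in tasks}
    for task, deps in dependencies.items():
        for dep in set(deps):
            dependents[dep].append(task)

    # Tasks with no dependencies can run first (sorted for a stable order).
    ready = deque(sorted(t for t in tasks if remaining[t] == 0))
    order = []
    while ready:
        task = ready.popleft()
        order.append(task)
        # Finishing a task may unblock the tasks that depend on it.
        for nxt in sorted(dependents[task]):
            remaining[nxt] -= 1
            if remaining[nxt] == 0:
                ready.append(nxt)

    if len(order) != len(tasks):
        raise ValueError("the plan contains a cycle, so it is not a DAG")
    return order


# Hypothetical plan: two scans feed a join, then an aggregate, then a write.
plan = {
    "scan_orders": [],
    "scan_customers": [],
    "join": ["scan_orders", "scan_customers"],
    "aggregate": ["join"],
    "write_to_hdfs": ["aggregate"],
}
print(execution_order(plan))
```

In this sketch both scans run before the join, and the write to HDFS always comes last, because each task waits for everything it depends on.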
Hive MetaStore - It is a central repository that stores the structure information of the tables and partitions in the warehouse. This includes column names and their types, the serializers and deserializers used to read and write data, and the HDFS files where the data is stored.
This metastore is generally a relational database.
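As a rough picture of what "metadata in a relational database" can look like, here is a toy Python sketch that uses an in-memory SQLite database. The table layout and the helper names (build_toy_metastore, register_table, describe_table) are assumptions made for this example, and the HDFS path is made up. The real Hive metastore schema is different and much larger.

```python
import sqlite3


def build_toy_metastore():
    """Create an in-memory database with a simplified, hypothetical layout."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE tables (
            table_id INTEGER PRIMARY KEY,
            name     TEXT NOT NULL UNIQUE,
            serde    TEXT NOT NULL,   -- class used to read and write rows
            location TEXT NOT NULL    -- HDFS directory holding the data
        );
        CREATE TABLE columns (
            table_id INTEGER NOT NULL REFERENCES tables(table_id),
            position INTEGER NOT NULL,
            name     TEXT NOT NULL,
            type     TEXT NOT NULL
        );
        """
    )
    return conn


def register_table(conn, name, serde, location, columns):
    """Store one table's metadata; `columns` is a list of (name, type) pairs."""
    cur = conn.execute(
        "INSERT INTO tables (name, serde, location) VALUES (?, ?, ?)",
        (name, serde, location),
    )
    table_id = cur.lastrowid
    conn.executemany(
        "INSERT INTO columns (table_id, position, name, type) VALUES (?, ?, ?, ?)",
        [(table_id, i, col, typ) for i, (col, typ) in enumerate(columns)],
    )
    return table_id


def describe_table(conn, name):
    """Return the (column name, column type) pairs of a table, in order."""
    return conn.execute(
        "SELECT c.name, c.type FROM columns c "
        "JOIN tables t ON t.table_id = c.table_id "
        "WHERE t.name = ? ORDER BY c.position",
        (name,),
    ).fetchall()


conn = build_toy_metastore()
register_table(
    conn,
    name="orders",
    serde="org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe",
    location="hdfs://namenode:8020/user/hive/warehouse/orders",  # made-up path
    columns=[("order_id", "bigint"), ("amount", "double")],
)
print(describe_table(conn, "orders"))
```

The point of the sketch is that only the description of the data lives in the relational database, while the rows themselves stay in files on HDFS.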