Friday, May 27, 2022

Theory of Hive Partition

Partitions in Hive

Partitioning is the optimization technique in Hive which improves the performance significantly. It is a way of dividing a table into related parts based on the values of partitioned columns. Using partition, it is easy to query a portion of the data. Tables or partitions are sub-divided into buckets, to provide extra structure to the data so that  data may be used for more efficient querying. Bucketing works based on the value of hash function of some column of a table.


hive> create table employee (id int, name string, age int,  department string)   

partitioned by (department string)  

row format delimited  

fields terminated by ','; 

The partitioning in Hive can be executed in two ways -

1) Static partitioning : In static , it is required to pass the values of partitioned columns manually while loading the data into the table

hive> load data local inpath '/home/codegyani/hive/employee_details' into table employee

partition((department = "hr");  

2) Dynamic partitioning : In dynamic partitioning, the values of partitioned columns exist within the table. So, it is not required to pass the values of partitioned columns manually

we can’t perform alter on the Dynamic partition.

Enable the dynamic partition by using the following commands: 

hive> set hive.exec.dynamic.partition=true;    

hive> set hive.exec.dynamic.partition.mode=nonstrict;  


Create a dummy table to store the data


create table employee_dummy(id int, name string, age int,  department string)   

row format delimited  

fields terminated by ','; 


Now, load the data into the table.

hive> load data local inpath '/home/codegyani/hive/employee_details' into table employee_dummy;


Now, insert the data of dummy table into the partition table  

insert into employee

partition(department)  

select id, name, age,department  

from employee_dummy;

---------------- adding partition to  existing table ---------------------------

ALTER TABLE <Table_Name> ADD 

[IF NOT EXISTS] PARTITION 

<partition_name> LOCATION 'hdfs_path_of_directory'

----------------Rename Partition ---------------------------------------

Renaming a Partition

ALTER TABLE employee PARTITION (department='software-engineer') RENAME TO PARTITION (department='developer');

Manually Renaming Partitions on HDFS

hdfs dfs -mv /user/hive/warehouse/employee/department=software-engineer   /user/hive/warehouse/employee/department=developer

----------------Dropping a Partition ---------------------------------------

Dropping a Partition

>ALTER TABLE employee DROP IF EXISTS PARTITION (department='developer');

Manually Dropping Partitions on HDFS

hdfs dfs -rm -R /user/hive/warehouse/employee/department=developer

Show All Partitions on Hive Table

SHOW PARTITIONS employee;

How to Filter Partitions

> SHOW PARTITIONS employee PARTITION(department='developer');

No comments:

Post a Comment