Partitions in Hive
Partitioning is the optimization technique in Hive which improves the performance significantly. It is a way of dividing a table into related parts based on the values of partitioned columns. Using partition, it is easy to query a portion of the data. Tables or partitions are sub-divided into buckets, to provide extra structure to the data so that data may be used for more efficient querying. Bucketing works based on the value of hash function of some column of a table.
hive> create table employee (id int, name string, age int, department string)
partitioned by (department string)
row format delimited
fields terminated by ',';
The partitioning in Hive can be executed in two ways -
1) Static partitioning : In static , it is required to pass the values of partitioned columns manually while loading the data into the table
hive> load data local inpath '/home/codegyani/hive/employee_details' into table employee
partition((department = "hr");
2) Dynamic partitioning : In dynamic partitioning, the values of partitioned columns exist within the table. So, it is not required to pass the values of partitioned columns manually
we can’t perform alter on the Dynamic partition.
Enable the dynamic partition by using the following commands:
hive> set hive.exec.dynamic.partition=true;
hive> set hive.exec.dynamic.partition.mode=nonstrict;
Create a dummy table to store the data
create table employee_dummy(id int, name string, age int, department string)
row format delimited
fields terminated by ',';
Now, load the data into the table.
hive> load data local inpath '/home/codegyani/hive/employee_details' into table employee_dummy;
Now, insert the data of dummy table into the partition table
insert into employee
partition(department)
select id, name, age,department
from employee_dummy;
---------------- adding partition to existing table ---------------------------
ALTER TABLE <Table_Name> ADD
[IF NOT EXISTS] PARTITION
<partition_name> LOCATION 'hdfs_path_of_directory'
----------------Rename Partition ---------------------------------------
Renaming a Partition
ALTER TABLE employee PARTITION (department='software-engineer') RENAME TO PARTITION (department='developer');
Manually Renaming Partitions on HDFS
hdfs dfs -mv /user/hive/warehouse/employee/department=software-engineer /user/hive/warehouse/employee/department=developer
----------------Dropping a Partition ---------------------------------------
Dropping a Partition
>ALTER TABLE employee DROP IF EXISTS PARTITION (department='developer');
Manually Dropping Partitions on HDFS
hdfs dfs -rm -R /user/hive/warehouse/employee/department=developer
Show All Partitions on Hive Table
SHOW PARTITIONS employee;
How to Filter Partitions
> SHOW PARTITIONS employee PARTITION(department='developer');