'Hive Default Partition' in details
We will use below sample data for the task.
This is the sample data of employee details. Some employees are the member of company’s sports team like Cricket, Football, Basketball, Tennis, Badminton. Some employees are not a member of any sports team that records are highlighted in the above picture.
You can download the sample data from here.
Solution
There are 2 types of partitions in hive – Static and Dynamic. We do assign a value to a partition column in static partition table whereas, in the dynamic partition, the value gets assigned to the partitioned column dynamically based on the data available in the table for the defined partition column
Step 1: Set Up
In this step, we are creating a hive table for loading the sample data. If you have already loaded the data into a table then skip this step and jump to the next step.
Let’s say we are having a hive table named emp_sports which stores employees details (sample data). We can create the table using below DDL and load the data into it:
CREATE TABLE db_bdpbase.emp_sports(
id INT,
firstname STRING,
lastname STRING,
sports STRING,
city STRING,
country STRING
) ROW FORMAT DELIMITED
FIELDS TERMINATED BY ‘,’
STORED AS TEXTFILE
TBLPROPERTIES(“skip.header.line.count”=“1”);
LOAD DATA INPATH ‘/usr/bdp/hive/sample_data.csv’ INTO db_bdpbase.emp_sports;
Here, we have created a table and loaded the sample data.
Let’s check the data in this table:
Step 2: Create Partitioned Table
In this step, we are going to create a hive partition table which is partitioned by Sports column.
The DDL will look like:
CREATE TABLE DB_BDPBASE.DEFAULT_PARTITION_TEST(
ID INT,
FIRSTNAME STRING,
LASTNAME STRING,
CITY STRING,
COUNTRY STRING
) PARTITIONED BY (
SPORTS STRING
) ROW FORMAT DELIMITED
FIELDS TERMINATED BY ‘,’;
Step 3: Load data into Partitioned Table
In this step, we are inserting data from the base table into the created hive partitioned table.
In this step, we are inserting data from the base table into the created hive partitioned table.
Here, total 6 partitions have created which are Badminton, Basketball, Cricket, Football, Tennis, and HIVE_DEFAULT_PARTITION. The HIVE_DEFAULT_PARTITION partition has been created because of NULL value of the partitioned column. The records which are having a NULL value for sports are getting stored in HIVE_DEFAULT_PARTITION partition.
Let’s check records of this default partition:
Wrapping Up
In this post, we understood what is the default partition in hive. This gets create when we load data into partitioned table dynamically and partition column has a NULL value.