Apache Hive is a data warehousing system built on Hadoop for analytics and MapReduce jobs. Partitioning divides large datasets into smaller parts for faster queries. Dynamic partitioning automatically determines partition values from the data being inserted, unlike static partitioning where values are manually specified.
Static vs Dynamic Partitioning
Enabling Dynamic Partitioning
SET hive.exec.dynamic.partition = true; SET hive.exec.dynamic.partition.mode = nonstrict;
Complete Example
Create a source table, load data, then dynamically partition it ?
-- Step 1: Create source table CREATE TABLE sales_raw ( id INT, product STRING, amount DOUBLE, sale_date STRING, country STRING ) ROW FORMAT DELIMITED FIELDS TERMINATED BY ','; -- Step 2: Load data LOAD DATA LOCAL INPATH '/home/data/sales.csv' INTO TABLE sales_raw; -- Step 3: Create partitioned table CREATE TABLE sales_partitioned ( id INT, product STRING, amount DOUBLE, sale_date STRING ) PARTITIONED BY (country STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY ','; -- Step 4: Insert with dynamic partitioning INSERT INTO TABLE sales_partitioned PARTITION (country) SELECT id, product, amount, sale_date, country FROM sales_raw;
Hive automatically creates partition directories based on distinct country values in the source data.
Managing Partitions
-- View partitions SHOW PARTITIONS sales_partitioned; -- Query specific partition (partition pruning) SELECT * FROM sales_partitioned WHERE country = 'India'; -- Drop a partition ALTER TABLE sales_partitioned DROP PARTITION (country = 'India');
Conclusion
Dynamic partitioning in Hive automatically creates partitions from data values during INSERT, eliminating the need to manually specify each partition. It is ideal for large datasets with many distinct partition values, improving query performance through partition pruning while reducing manual effort.