Table Partitioning for Large Datasets
Partitioning splits large tables into smaller, manageable pieces based on column values. This improves query performance, simplifies maintenance, and enables efficient data archiving.
Range Partitioning (Most Common)
-- PostgreSQL example
CREATE TABLE orders (
order_id SERIAL,
order_date DATE,
customer_id INT,
total DECIMAL(10,2)
) PARTITION BY RANGE (order_date);
-- Create partitions for each year
CREATE TABLE orders_2023 PARTITION OF orders
FOR VALUES FROM ('2023-01-01') TO ('2024-01-01');
CREATE TABLE orders_2024 PARTITION OF orders
FOR VALUES FROM ('2024-01-01') TO ('2025-01-01');
List Partitioning
-- Partition by region
CREATE TABLE sales (
sale_id INT,
region VARCHAR(20),
amount DECIMAL(10,2)
) PARTITION BY LIST (region);
CREATE TABLE sales_north PARTITION OF sales
FOR VALUES IN ('North', 'Northeast');
CREATE TABLE sales_south PARTITION OF sales
FOR VALUES IN ('South', 'Southeast');
Hash Partitioning
-- Distribute data evenly across partitions
CREATE TABLE customers (
customer_id INT,
name VARCHAR(100)
) PARTITION BY HASH (customer_id);
CREATE TABLE customers_p0 PARTITION OF customers
FOR VALUES WITH (MODULUS 4, REMAINDER 0);
CREATE TABLE customers_p1 PARTITION OF customers
FOR VALUES WITH (MODULUS 4, REMAINDER 1);
Querying Partitioned Tables
-- Query entire table (searches all partitions)
SELECT * FROM orders WHERE customer_id = 123;
-- Query with partition key (only searches relevant partition)
SELECT * FROM orders
WHERE order_date BETWEEN '2024-01-01' AND '2024-12-31';
-- Only scans orders_2024 partition!
Benefits of Partitioning
- Faster queries: Scan only relevant partitions
- Easy archiving: Drop old partitions instead of DELETE
- Parallel operations: Query multiple partitions simultaneously
- Better maintenance: Rebuild indexes on one partition at a time
When to Partition
- Tables larger than 100GB
- Time-series data (logs, transactions, events)
- Regular archiving/purging requirements
- Queries frequently filter by specific columns
Pro Tip: Always include the partition key in your WHERE clause for maximum performance. Partition by date for time-series data, and set up automatic partition creation for ongoing tables.
← Back to SQL Tips