Mastering CTE: Escape Subquery Hell Once and For All
SOTAAZ·

Mastering CTE: Escape Subquery Hell Once and For All
One WITH clause transforms unreadable queries into clear, logical steps. Recursive CTEs handle hierarchies with ease.
TL;DR
- CTE (Common Table Expression): Temporary named result set within a query
- Readability: Break complex subqueries into named, logical steps
- Reusability: Reference the same CTE multiple times
- Recursive CTE: Handle hierarchies, generate date series, and more
1. What is a CTE?
The Subquery Problem
-- Hard to read nested subqueries
SELECT
user_id,
total_amount,
user_avg,
total_amount - user_avg as diff
FROM (
SELECT
user_id,
SUM(amount) as total_amount,
AVG(SUM(amount)) OVER () as user_avg
FROM (
SELECT
user_id,
amount
FROM orders
WHERE status = 'completed'
AND created_at >= '2024-01-01'
) filtered_orders
GROUP BY user_id
) user_totals
WHERE total_amount > user_avg;What's going on here?
With CTE
WITH filtered_orders AS (
SELECT user_id, amount
FROM orders
WHERE status = 'completed'
AND created_at >= '2024-01-01'
),
user_totals AS (
SELECT
user_id,
SUM(amount) as total_amount,
AVG(SUM(amount)) OVER () as user_avg
FROM filtered_orders
GROUP BY user_id
)
SELECT
user_id,
total_amount,
user_avg,
total_amount - user_avg as diff
FROM user_totals
WHERE total_amount > user_avg;Each step is crystal clear!
2. CTE Basic Syntax
Single CTE
WITH cte_name AS (
SELECT ...
)
SELECT * FROM cte_name;Multiple CTEs
WITH
first_cte AS (
SELECT ...
),
second_cte AS (
SELECT * FROM first_cte -- Can reference previous CTEs
WHERE ...
),
third_cte AS (
SELECT * FROM second_cte
JOIN first_cte ON ... -- Can join multiple CTEs
)
SELECT * FROM third_cte;3. Practical Example: Revenue Analysis
Problem: Monthly revenue with month-over-month growth rate
WITH monthly_sales AS (
-- Step 1: Aggregate monthly revenue
SELECT
DATE_TRUNC('month', order_date) as month,
SUM(amount) as revenue
FROM orders
WHERE status = 'completed'
GROUP BY DATE_TRUNC('month', order_date)
),
sales_with_growth AS (
-- Step 2: Get previous month's revenue
SELECT
month,
revenue,
LAG(revenue) OVER (ORDER BY month) as prev_revenue
FROM monthly_sales
)
-- Step 3: Calculate growth rate
SELECT
month,
revenue,
prev_revenue,
ROUND((revenue - prev_revenue) / prev_revenue * 100, 2) as growth_rate
FROM sales_with_growth
WHERE prev_revenue IS NOT NULL
ORDER BY month;Result:
4. CTE vs Subquery vs Temp Table
When to Use What?
- Subquery: Simple filtering, EXISTS checks
- CTE: Complex multi-step processing, readability matters
- Temp Table: Large datasets, need indexes, multiple query reuse
5. Recursive CTE: Hierarchical Data
Organization Chart Example
-- Employees table
CREATE TABLE employees (
id INT PRIMARY KEY,
name VARCHAR(100),
manager_id INT -- Manager's id (NULL for CEO)
);
-- Sample data
INSERT INTO employees VALUES
(1, 'Alice CEO', NULL),
(2, 'Bob VP', 1),
(3, 'Carol Dir', 2),
(4, 'Dave Mgr', 3),
(5, 'Eve Staff', 4),
(6, 'Frank VP', 1),
(7, 'Grace Dir', 6);Build Org Chart with Recursive CTE
WITH RECURSIVE org_tree AS (
-- Base case: Top level (CEO)
SELECT
id,
name,
manager_id,
1 as level,
name as path
FROM employees
WHERE manager_id IS NULL
UNION ALL
-- Recursive case: Subordinates
SELECT
e.id,
e.name,
e.manager_id,
ot.level + 1,
ot.path || ' → ' || e.name
FROM employees e
JOIN org_tree ot ON e.manager_id = ot.id
)
SELECT
REPEAT(' ', level - 1) || name as org_chart,
level,
path
FROM org_tree
ORDER BY path;Result:
6. Recursive CTE: Generate Date Series
Problem: Show zero for dates with no orders
-- Dates with no orders don't appear
SELECT DATE(order_date) as date, COUNT(*) as orders
FROM orders
GROUP BY DATE(order_date);Solution: Generate date series, then LEFT JOIN
WITH RECURSIVE date_series AS (
-- Start date
SELECT DATE '2024-01-01' as date
UNION ALL
-- Increment by one day
SELECT date + INTERVAL '1 day'
FROM date_series
WHERE date < '2024-01-31'
),
daily_orders AS (
SELECT DATE(order_date) as date, COUNT(*) as orders
FROM orders
WHERE order_date >= '2024-01-01' AND order_date < '2024-02-01'
GROUP BY DATE(order_date)
)
SELECT
ds.date,
COALESCE(do.orders, 0) as orders
FROM date_series ds
LEFT JOIN daily_orders do ON ds.date = do.date
ORDER BY ds.date;Now zero-order days appear too!
7. Recursive CTE: Running Totals
Daily Cumulative Revenue
WITH RECURSIVE daily_sales AS (
SELECT
DATE(order_date) as date,
SUM(amount) as daily_amount
FROM orders
GROUP BY DATE(order_date)
),
cumulative AS (
-- First day
SELECT
date,
daily_amount,
daily_amount as cumulative_amount,
1 as day_num
FROM daily_sales
WHERE date = (SELECT MIN(date) FROM daily_sales)
UNION ALL
-- Subsequent days
SELECT
ds.date,
ds.daily_amount,
c.cumulative_amount + ds.daily_amount,
c.day_num + 1
FROM daily_sales ds
JOIN cumulative c ON ds.date = c.date + INTERVAL '1 day'
)
SELECT * FROM cumulative ORDER BY date;💡 In practice, `SUM() OVER (ORDER BY date)` window function is more efficient, but this demonstrates recursive CTE mechanics.
8. Category Hierarchy
Get Full Category Path
-- Categories table (self-referencing)
CREATE TABLE categories (
id INT PRIMARY KEY,
name VARCHAR(100),
parent_id INT
);
-- Sample data
INSERT INTO categories VALUES
(1, 'Electronics', NULL),
(2, 'Computers', 1),
(3, 'Laptops', 2),
(4, 'Gaming Laptops', 3),
(5, 'Clothing', NULL),
(6, 'Men', 5);
-- Get full paths
WITH RECURSIVE category_path AS (
SELECT
id,
name,
parent_id,
name as full_path,
1 as depth
FROM categories
WHERE parent_id IS NULL
UNION ALL
SELECT
c.id,
c.name,
c.parent_id,
cp.full_path || ' > ' || c.name,
cp.depth + 1
FROM categories c
JOIN category_path cp ON c.parent_id = cp.id
)
SELECT id, name, full_path, depth
FROM category_path
ORDER BY full_path;Result:
9. Preventing Infinite Loops
Recursive CTEs can loop forever if written incorrectly.
Safety Net 1: LIMIT
WITH RECURSIVE bad_cte AS (
SELECT 1 as n
UNION ALL
SELECT n + 1 FROM bad_cte -- No termination condition!
)
SELECT * FROM bad_cte
LIMIT 100; -- Safety netSafety Net 2: Depth Limit
WITH RECURSIVE safe_tree AS (
SELECT id, name, parent_id, 1 as depth
FROM categories
WHERE parent_id IS NULL
UNION ALL
SELECT c.id, c.name, c.parent_id, st.depth + 1
FROM categories c
JOIN safe_tree st ON c.parent_id = st.id
WHERE st.depth < 10 -- Max 10 levels
)
SELECT * FROM safe_tree;Safety Net 3: Visited Check (Prevent Cycles)
WITH RECURSIVE safe_tree AS (
SELECT
id,
name,
parent_id,
ARRAY[id] as visited -- Array of visited ids
FROM categories
WHERE parent_id IS NULL
UNION ALL
SELECT
c.id,
c.name,
c.parent_id,
st.visited || c.id
FROM categories c
JOIN safe_tree st ON c.parent_id = st.id
WHERE NOT c.id = ANY(st.visited) -- Skip if already visited
)
SELECT * FROM safe_tree;10. Pro Tips
1. Use Clear CTE Names
-- Bad
WITH a AS (...), b AS (...), c AS (...)
-- Good
WITH daily_orders AS (...),
user_totals AS (...),
top_customers AS (...)2. Debug Step by Step
WITH step1 AS (...),
step2 AS (...),
step3 AS (...)
-- Debug: Check intermediate step
SELECT * FROM step2; -- Check step2 instead of step33. Consider Performance
-- PostgreSQL: MATERIALIZED hint to store CTE result
WITH MATERIALIZED expensive_calc AS (
-- Complex calculation
)
SELECT * FROM expensive_calc a
JOIN expensive_calc b ON ...; -- Computed only once despite two referencesConclusion
CTE Key Points:
- WITH clause separates queries into logical steps
- Good naming alone improves readability 200%
- Recursive CTEs handle hierarchies and sequences
- Infinite loop prevention is essential!
References
- PostgreSQL Documentation - WITH Queries
- MySQL 8.0 - Recursive CTE
- SQL Server - Common Table Expressions