PostgreSQL Window Functions

Advanced analytics with window functions

🪟 What are Window Functions?

Window functions perform calculations across rows related to the current row without grouping them. They provide powerful analytics like running totals, rankings, and moving averages while keeping individual rows intact.


-- Basic window function
SELECT 
    name,
    salary,
    AVG(salary) OVER() AS avg_salary
FROM employees;
                                    

Output:

name salary avg_salary
John 70000 65000

Common Window Functions

🏆

ROW_NUMBER

Assign unique row numbers

SELECT ROW_NUMBER() OVER(ORDER BY salary);
📊

RANK

Rank rows with gaps

SELECT RANK() OVER(ORDER BY score DESC);

Running Totals

Calculate cumulative sums

SELECT SUM(amount) OVER(ORDER BY date);
🔄

LAG / LEAD

Access previous/next rows

SELECT LAG(price) OVER(ORDER BY date);

🔹 OVER() Clause Basics

The OVER clause defines the window of rows for the function to operate on. It can include PARTITION BY to divide rows into groups and ORDER BY to define row sequence within each partition.

-- Window over all rows
SELECT 
    name,
    salary,
    AVG(salary) OVER() AS overall_avg
FROM employees;

-- Window with partition
SELECT 
    name,
    department,
    salary,
    AVG(salary) OVER(PARTITION BY department) AS dept_avg
FROM employees;

-- Window with ordering
SELECT 
    name,
    salary,
    ROW_NUMBER() OVER(ORDER BY salary DESC) AS salary_rank
FROM employees;
                            

Output:

name department salary dept_avg
John Engineering 70000 75000

🔹 ROW_NUMBER()

Assign a unique sequential number to each row within a partition. ROW_NUMBER always produces unique values even for ties. Perfect for pagination, deduplication, or creating unique identifiers within groups.

-- Simple row numbering
SELECT 
    name,
    salary,
    ROW_NUMBER() OVER(ORDER BY salary DESC) AS row_num
FROM employees;

-- Row number within partitions
SELECT 
    name,
    department,
    salary,
    ROW_NUMBER() OVER(PARTITION BY department ORDER BY salary DESC) AS dept_rank
FROM employees;

-- Get top N per group
SELECT * FROM (
    SELECT 
        name,
        department,
        salary,
        ROW_NUMBER() OVER(PARTITION BY department ORDER BY salary DESC) AS rn
    FROM employees
) ranked
WHERE rn <= 3;
                            

Output:

name salary row_num
Alice 90000 1
Bob 80000 2

🔹 RANK() and DENSE_RANK()

Assign rankings to rows with handling for ties. RANK leaves gaps after ties (1, 2, 2, 4), while DENSE_RANK doesn't (1, 2, 2, 3). Use these for leaderboards, competitions, or any ranking scenario.

-- RANK with gaps
SELECT 
    name,
    score,
    RANK() OVER(ORDER BY score DESC) AS rank
FROM students;
-- Ties get same rank, next rank skips

-- DENSE_RANK without gaps
SELECT 
    name,
    score,
    DENSE_RANK() OVER(ORDER BY score DESC) AS dense_rank
FROM students;
-- Ties get same rank, next rank continues

-- Compare both
SELECT 
    name,
    score,
    RANK() OVER(ORDER BY score DESC) AS rank,
    DENSE_RANK() OVER(ORDER BY score DESC) AS dense_rank,
    ROW_NUMBER() OVER(ORDER BY score DESC) AS row_num
FROM students;
                            

Output:

name score rank dense_rank
Alice 95 1 1
Bob 90 2 2
Carol 90 2 2
Dave 85 4 3

🔹 Running Totals with SUM()

Calculate cumulative sums that grow with each row. Running totals show accumulated values over time or sequence. Essential for financial reports, inventory tracking, or any cumulative analysis.

-- Basic running total
SELECT 
    order_date,
    amount,
    SUM(amount) OVER(ORDER BY order_date) AS running_total
FROM orders;

-- Running total by partition
SELECT 
    order_date,
    customer_id,
    amount,
    SUM(amount) OVER(
        PARTITION BY customer_id 
        ORDER BY order_date
    ) AS customer_running_total
FROM orders;

-- Running average
SELECT 
    date,
    sales,
    AVG(sales) OVER(ORDER BY date) AS running_avg
FROM daily_sales;
                            

Output:

order_date amount running_total
2024-01-01 100 100
2024-01-02 150 250
2024-01-03 200 450

🔹 LAG() and LEAD()

Access values from previous or next rows without self-joins. LAG looks backward, LEAD looks forward. Perfect for comparing consecutive rows, calculating changes, or analyzing trends over time.

-- Get previous row value
SELECT 
    date,
    price,
    LAG(price) OVER(ORDER BY date) AS previous_price,
    price - LAG(price) OVER(ORDER BY date) AS price_change
FROM stock_prices;

-- Get next row value
SELECT 
    date,
    sales,
    LEAD(sales) OVER(ORDER BY date) AS next_day_sales
FROM daily_sales;

-- Compare with offset
SELECT 
    month,
    revenue,
    LAG(revenue, 1) OVER(ORDER BY month) AS last_month,
    LAG(revenue, 12) OVER(ORDER BY month) AS same_month_last_year
FROM monthly_revenue;
                            

Output:

date price previous_price price_change
2024-01-01 100 NULL NULL
2024-01-02 105 100 5

🔹 FIRST_VALUE() and LAST_VALUE()

Get the first or last value in a window frame. These functions access boundary values within your defined window. Useful for comparing current values against starting or ending points.

-- Get first value in partition
SELECT 
    name,
    department,
    salary,
    FIRST_VALUE(salary) OVER(
        PARTITION BY department 
        ORDER BY salary DESC
    ) AS highest_salary_in_dept
FROM employees;

-- Get last value (need frame clause)
SELECT 
    date,
    sales,
    LAST_VALUE(sales) OVER(
        ORDER BY date
        ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
    ) AS final_sales
FROM daily_sales;

-- Compare to first
SELECT 
    month,
    revenue,
    FIRST_VALUE(revenue) OVER(ORDER BY month) AS first_month_revenue,
    revenue - FIRST_VALUE(revenue) OVER(ORDER BY month) AS growth
FROM monthly_revenue;
                            

Output:

name department salary highest_salary
Alice Engineering 90000 90000

🔹 NTILE()

Divide rows into a specified number of equal groups or buckets. NTILE assigns a bucket number from 1 to N. Perfect for creating quartiles, percentiles, or dividing data into equal segments for analysis.

-- Divide into 4 quartiles
SELECT 
    name,
    salary,
    NTILE(4) OVER(ORDER BY salary) AS salary_quartile
FROM employees;

-- Create percentiles
SELECT 
    student_name,
    score,
    NTILE(100) OVER(ORDER BY score) AS percentile
FROM test_scores;

-- Divide by department
SELECT 
    name,
    department,
    salary,
    NTILE(3) OVER(PARTITION BY department ORDER BY salary) AS dept_tier
FROM employees;
                            

Output:

name salary salary_quartile
John 40000 1
Alice 90000 4

🔹 Moving Averages

Calculate averages over a sliding window of rows. Moving averages smooth out fluctuations and reveal trends. Specify the frame using ROWS BETWEEN to define how many preceding and following rows to include.

-- 3-day moving average
SELECT 
    date,
    sales,
    AVG(sales) OVER(
        ORDER BY date
        ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING
    ) AS moving_avg_3day
FROM daily_sales;

-- 7-day moving average
SELECT 
    date,
    sales,
    AVG(sales) OVER(
        ORDER BY date
        ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
    ) AS moving_avg_7day
FROM daily_sales;

-- Moving sum
SELECT 
    date,
    orders,
    SUM(orders) OVER(
        ORDER BY date
        ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
    ) AS rolling_3day_total
FROM daily_orders;
                            

Output:

date sales moving_avg_3day
2024-01-01 100 125
2024-01-02 150 150

🧠 Test Your Knowledge

What does ROW_NUMBER() do?