MySQL GROUP BY

Grouping rows with aggregate functions

📊 What is GROUP BY?

GROUP BY groups rows that have the same values in specified columns into summary rows. It's commonly used with aggregate functions like COUNT, SUM, AVG, MAX, and MIN to perform calculations on each group of data.


-- Simple GROUP BY example
SELECT department, COUNT(*) AS employee_count
FROM employees
GROUP BY department;
                                    

Output:

department employee_count
Sales 15
IT 12
HR 8

Key GROUP BY Concepts

🔢

Aggregate Functions

Works with COUNT, SUM, AVG, MAX, MIN

SELECT dept, COUNT(*)
GROUP BY dept;
📦

Groups Rows

Combines rows with same values

GROUP BY category
📋

Summary Data

Creates summary reports from data

SELECT city, SUM(sales)
GROUP BY city;
🔗

Multiple Columns

Can group by multiple columns

GROUP BY dept, year

🔹 Basic GROUP BY Syntax

GROUP BY organizes rows into groups based on column values, then applies aggregate functions to each group. The SELECT clause can only include grouped columns and aggregate functions. Any column in SELECT that isn't an aggregate function must appear in the GROUP BY clause.

-- Basic GROUP BY syntax
SELECT column1, aggregate_function(column2)
FROM table_name
GROUP BY column1;

-- GROUP BY with multiple columns
SELECT column1, column2, COUNT(*)
FROM table_name
GROUP BY column1, column2;

Example Result:

One row per unique value or combination of values

🔹 GROUP BY with COUNT

COUNT is the most common aggregate function used with GROUP BY. It counts the number of rows in each group, helping you understand data distribution. This is essential for generating reports like customer counts per region, orders per month, or products per category.

-- Create sample table
CREATE TABLE orders (
    order_id INT,
    customer_name VARCHAR(100),
    product VARCHAR(100),
    quantity INT,
    order_date DATE
);

-- Insert sample data
INSERT INTO orders VALUES
(1, 'John', 'Laptop', 1, '2024-01-15'),
(2, 'Jane', 'Mouse', 2, '2024-01-16'),
(3, 'John', 'Keyboard', 1, '2024-01-17'),
(4, 'Mike', 'Monitor', 1, '2024-01-18'),
(5, 'Jane', 'Laptop', 1, '2024-01-19');

-- Count orders per customer
SELECT 
    customer_name,
    COUNT(*) AS total_orders,
    COUNT(DISTINCT product) AS unique_products
FROM orders
GROUP BY customer_name
ORDER BY total_orders DESC;

Output:

customer_name total_orders unique_products
John 2 2
Jane 2 2
Mike 1 1

🔹 GROUP BY with SUM and AVG

SUM calculates totals for each group while AVG computes averages. These functions are crucial for financial reports, sales analysis, and performance metrics. You can use multiple aggregate functions in the same query to get comprehensive group statistics in a single result set.

-- Calculate sales statistics by product
SELECT 
    product,
    COUNT(*) AS order_count,
    SUM(quantity) AS total_quantity,
    AVG(quantity) AS avg_quantity_per_order
FROM orders
GROUP BY product
ORDER BY total_quantity DESC;

-- Sales by month
SELECT 
    DATE_FORMAT(order_date, '%Y-%m') AS month,
    COUNT(*) AS orders,
    SUM(quantity) AS items_sold
FROM orders
GROUP BY DATE_FORMAT(order_date, '%Y-%m');

Output:

product order_count total_quantity avg_quantity_per_order
Laptop 2 2 1.00
Mouse 1 2 2.00
Keyboard 1 1 1.00

🔹 GROUP BY Multiple Columns

Grouping by multiple columns creates subgroups within groups, providing more detailed analysis. This is useful for multi-dimensional reports like sales by region and product category, or employee counts by department and job title. Each unique combination of values creates a separate group.

-- Group by customer and product
SELECT 
    customer_name,
    product,
    COUNT(*) AS purchase_count,
    SUM(quantity) AS total_quantity
FROM orders
GROUP BY customer_name, product
ORDER BY customer_name, purchase_count DESC;

-- Group by year and month
SELECT 
    YEAR(order_date) AS year,
    MONTH(order_date) AS month,
    COUNT(*) AS orders,
    SUM(quantity) AS items
FROM orders
GROUP BY YEAR(order_date), MONTH(order_date)
ORDER BY year, month;

Output:

customer_name product purchase_count total_quantity
Jane Laptop 1 1
Jane Mouse 1 2
John Keyboard 1 1

🔹 Real-World Use Cases

GROUP BY is fundamental for business intelligence and reporting. Common applications include sales dashboards showing revenue by region, inventory reports with stock levels per warehouse, customer analytics with purchase patterns, employee reports with headcount by department, and financial statements with expenses grouped by category and time period.

-- Use Case 1: Sales report by region
SELECT 
    region,
    COUNT(DISTINCT customer_id) AS customers,
    COUNT(*) AS total_orders,
    SUM(order_amount) AS total_revenue,
    AVG(order_amount) AS avg_order_value
FROM sales
WHERE order_date >= DATE_SUB(CURDATE(), INTERVAL 1 YEAR)
GROUP BY region
ORDER BY total_revenue DESC;

-- Use Case 2: Product performance analysis
SELECT 
    category,
    product_name,
    SUM(quantity_sold) AS units_sold,
    SUM(revenue) AS total_revenue,
    AVG(rating) AS avg_rating
FROM product_sales
GROUP BY category, product_name
HAVING SUM(quantity_sold) > 100
ORDER BY category, total_revenue DESC;

🧠 Test Your Knowledge

Which aggregate function counts the number of rows in each group?