DISTINCT vs GROUP BY in SQL
SQL (Structured Query Language) is used to manage and manipulate the data in relational databases. It can be used for tasks such as database querying, data editing, database and table creation and deletion, and granting user permissions.
We can use the DISTINCT keyword or the GROUP BY clause when we want to obtain the unique records of a table in SQL. Although both can serve this purpose, they are written differently and they work differently.
DISTINCT Keyword
The DISTINCT keyword retrieves unique values or records from a table, eliminating duplicates. It is applied in the SELECT statement to obtain the unique values of one or more columns. Consider a sales table with order ID, product, customer name, and price. To find the unique products in the sales table, we can use the DISTINCT keyword.
Syntax:
SELECT DISTINCT column(s) FROM table_name;
GROUP BY
The GROUP BY clause groups rows that share the same values in the specified columns and is typically used with aggregate functions such as AVG(), SUM(), MAX(), MIN(), and COUNT(). It is part of the SELECT statement and is positioned after the WHERE clause, so rows are filtered before they are grouped. Unlike DISTINCT, GROUP BY does not directly remove duplicate records; instead, it collects matching rows into groups, returns one row per group, and applies the aggregate functions to each group.
Syntax:
SELECT column_name1, aggregate_function(column_name) FROM table_name GROUP BY column_name1;
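To illustrate the clause order described above (WHERE first, then GROUP BY), here is a minimal sketch that runs the query pattern through Python's built-in sqlite3 module. It uses the sample sales rows from the examples section of this article. The in-memory SQLite database (standing in for MySQL), the `Price >= 500` filter, and the added ORDER BY are assumptions made for illustration only.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for a MySQL server (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales (
    OrderID INT PRIMARY KEY,
    Product VARCHAR(50),
    Name VARCHAR(50),
    Price INT
);
INSERT INTO sales (OrderID, Product, Name, Price) VALUES
    (1, 'Laptop', 'John', 1200),
    (2, 'Smartphone', 'Alice', 800),
    (3, 'Tablet', 'John', 500),
    (4, 'Laptop', 'Bob', 1200),
    (5, 'Laptop', 'Alice', 1200),
    (6, 'Tablet', 'Emily', 400);
""")

# WHERE removes rows first; GROUP BY then groups whatever is left.
# The Price >= 500 condition is a hypothetical filter, not from the article.
filtered_totals = conn.execute("""
    SELECT Product, SUM(Price)
    FROM sales
    WHERE Price >= 500
    GROUP BY Product
    ORDER BY Product
""").fetchall()

print(filtered_totals)
# [('Laptop', 3600), ('Smartphone', 800), ('Tablet', 500)]
```

The 400 Tablet order is dropped by WHERE before grouping, so the Tablet total is 500 rather than 900.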
Examples of DISTINCT and GROUP BY in SQL
Let’s take a closer look at the functionality and use cases of GROUP BY and DISTINCT using a simple example in MySQL.
Sales Table:
-- Create sales table
-- (MySQL requires a length for VARCHAR columns)
CREATE TABLE sales (
    OrderID INT PRIMARY KEY,
    Product VARCHAR(50),
    Name VARCHAR(50),
    Price INT
);
To add rows to the sales table:
-- Insert data into sales table
INSERT INTO sales (OrderID, Product, Name, Price) VALUES
    (1, 'Laptop', 'John', 1200),
    (2, 'Smartphone', 'Alice', 800),
    (3, 'Tablet', 'John', 500),
    (4, 'Laptop', 'Bob', 1200),
    (5, 'Laptop', 'Alice', 1200),
    (6, 'Tablet', 'Emily', 400);
DISTINCT keyword
Example 1: Unique Products from Sales Table
SELECT DISTINCT Product FROM sales;
Output:
Product |
---|
Laptop |
Smartphone |
Tablet |
Explanation: The result contains each value of the Product column exactly once, i.e. without duplicates, even though Laptop and Tablet appear in several rows of the table.
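One way to check this query without a MySQL server is the sketch below, which loads the same rows into an in-memory SQLite database through Python's built-in sqlite3 module. SQLite standing in for MySQL and the added ORDER BY (used only to make the row order reproducible) are assumptions of this sketch.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for a MySQL server (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales (
    OrderID INT PRIMARY KEY,
    Product VARCHAR(50),
    Name VARCHAR(50),
    Price INT
);
INSERT INTO sales (OrderID, Product, Name, Price) VALUES
    (1, 'Laptop', 'John', 1200),
    (2, 'Smartphone', 'Alice', 800),
    (3, 'Tablet', 'John', 500),
    (4, 'Laptop', 'Bob', 1200),
    (5, 'Laptop', 'Alice', 1200),
    (6, 'Tablet', 'Emily', 400);
""")

# Six rows go in, but only three different product names exist.
total_rows = conn.execute("SELECT COUNT(*) FROM sales").fetchone()[0]
rows = conn.execute(
    "SELECT DISTINCT Product FROM sales ORDER BY Product"
).fetchall()
products = [row[0] for row in rows]

print(total_rows)  # 6
print(products)    # ['Laptop', 'Smartphone', 'Tablet']
```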
Example 2: Find the distinct product and price combinations
SELECT DISTINCT Product, Price FROM sales;
Output:
Product | Price |
---|---|
Laptop | 1200 |
Smartphone | 800 |
Tablet | 500 |
Tablet | 400 |
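Explanation: When DISTINCT is used with multiple columns, it applies to the combination of all selected columns rather than to each column separately. Tablet therefore appears twice, because it was sold at two different prices (500 and 400), while the three identical Laptop rows collapse into one. For this reason, list only the columns that are needed to obtain the desired result.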
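The multi-column behaviour can be checked with a small sketch, again using Python's built-in sqlite3 module with an in-memory SQLite database as an assumed stand-in for MySQL. The ORDER BY clause is added here only to fix the row order and is not part of the original query.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for a MySQL server (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales (
    OrderID INT PRIMARY KEY,
    Product VARCHAR(50),
    Name VARCHAR(50),
    Price INT
);
INSERT INTO sales (OrderID, Product, Name, Price) VALUES
    (1, 'Laptop', 'John', 1200),
    (2, 'Smartphone', 'Alice', 800),
    (3, 'Tablet', 'John', 500),
    (4, 'Laptop', 'Bob', 1200),
    (5, 'Laptop', 'Alice', 1200),
    (6, 'Tablet', 'Emily', 400);
""")

# DISTINCT compares whole (Product, Price) pairs, not each column on its own.
pairs = conn.execute("""
    SELECT DISTINCT Product, Price
    FROM sales
    ORDER BY Product, Price DESC
""").fetchall()

print(pairs)
# [('Laptop', 1200), ('Smartphone', 800), ('Tablet', 500), ('Tablet', 400)]
```

Adding a column to the SELECT list can only keep or increase the number of rows DISTINCT returns, which is why selecting just `Product` gives three rows and selecting `Product, Price` gives four.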
GROUP BY clause
Example 1: Find the total sales of each product
SELECT Product, SUM(Price) FROM sales GROUP BY Product;
Output:
Product | SUM(Price) |
---|---|
Laptop | 3600 |
Smartphone | 800 |
Tablet | 900 |
Explanation: This SQL query retrieves the total sales for each unique product from the “sales” table. The output displays product names alongside their corresponding aggregated sum of prices.
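As a rough check of the per-product totals, the sketch below runs the same GROUP BY query through Python's built-in sqlite3 module. The in-memory SQLite database is an assumed substitute for MySQL, and the ORDER BY is added only for a predictable row order.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for a MySQL server (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales (
    OrderID INT PRIMARY KEY,
    Product VARCHAR(50),
    Name VARCHAR(50),
    Price INT
);
INSERT INTO sales (OrderID, Product, Name, Price) VALUES
    (1, 'Laptop', 'John', 1200),
    (2, 'Smartphone', 'Alice', 800),
    (3, 'Tablet', 'John', 500),
    (4, 'Laptop', 'Bob', 1200),
    (5, 'Laptop', 'Alice', 1200),
    (6, 'Tablet', 'Emily', 400);
""")

# One output row per product; SUM adds up the prices inside each group.
totals = conn.execute("""
    SELECT Product, SUM(Price)
    FROM sales
    GROUP BY Product
    ORDER BY Product
""").fetchall()

print(totals)
# [('Laptop', 3600), ('Smartphone', 800), ('Tablet', 900)]
```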
Example 2: Find the total sales and quantity of each product
SELECT Product, SUM(Price) AS total_sales, COUNT(*) AS quantity FROM sales GROUP BY Product;
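Output:
Product | total_sales | quantity |
---|---|---|
Laptop | 3600 | 3 |
Smartphone | 800 | 1 |
Tablet | 900 | 2 |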
Explanation: This SQL query calculates the total sales and the quantity (the number of order rows) for each unique product in the “sales” table. The output presents the product names, their total sales, and the respective quantities under the aliases total_sales and quantity.
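The two aggregates can be verified together with a short sketch based on Python's built-in sqlite3 module. As before, the in-memory SQLite database is an assumed stand-in for MySQL and the ORDER BY is added only to make the output order reproducible.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for a MySQL server (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales (
    OrderID INT PRIMARY KEY,
    Product VARCHAR(50),
    Name VARCHAR(50),
    Price INT
);
INSERT INTO sales (OrderID, Product, Name, Price) VALUES
    (1, 'Laptop', 'John', 1200),
    (2, 'Smartphone', 'Alice', 800),
    (3, 'Tablet', 'John', 500),
    (4, 'Laptop', 'Bob', 1200),
    (5, 'Laptop', 'Alice', 1200),
    (6, 'Tablet', 'Emily', 400);
""")

# Several aggregates can be computed over the same groups in one query.
summary = conn.execute("""
    SELECT Product, SUM(Price) AS total_sales, COUNT(*) AS quantity
    FROM sales
    GROUP BY Product
    ORDER BY Product
""").fetchall()

for product, total_sales, quantity in summary:
    print(product, total_sales, quantity)
# Laptop 3600 3
# Smartphone 800 1
# Tablet 900 2
```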
DISTINCT vs GROUP BY in SQL
Feature | DISTINCT | GROUP BY |
---|---|---|
Used for | Unique values, or unique combinations of values, from one or more columns | Grouped data (by one or more columns), usually along with aggregate function calculations |
Syntax | SELECT DISTINCT column1 FROM table_name; (multiple columns can be added) | SELECT column1, aggregate_function(column_name) FROM table_name GROUP BY column1; (multiple columns can be added) |
Goal | Removes duplicate rows from the result | Groups rows based on the specified columns and applies aggregate functions to each group |
Columns in SELECT | Include only the column(s) for which uniqueness is desired | Include the columns specified in the GROUP BY clause as well as the columns to which aggregate functions are applied |
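To make the overlap between the two concrete, the sketch below compares DISTINCT with a GROUP BY that uses no aggregate function on the article's sample rows. It runs through Python's built-in sqlite3 module against an in-memory SQLite database, which is an assumed stand-in for MySQL. The ORDER BY is added so that both results come back in the same order.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for a MySQL server (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales (
    OrderID INT PRIMARY KEY,
    Product VARCHAR(50),
    Name VARCHAR(50),
    Price INT
);
INSERT INTO sales (OrderID, Product, Name, Price) VALUES
    (1, 'Laptop', 'John', 1200),
    (2, 'Smartphone', 'Alice', 800),
    (3, 'Tablet', 'John', 500),
    (4, 'Laptop', 'Bob', 1200),
    (5, 'Laptop', 'Alice', 1200),
    (6, 'Tablet', 'Emily', 400);
""")

# DISTINCT removes duplicate Product values from the result.
distinct_rows = conn.execute(
    "SELECT DISTINCT Product FROM sales ORDER BY Product"
).fetchall()

# GROUP BY without an aggregate returns one row per group, i.e. per product.
grouped_rows = conn.execute(
    "SELECT Product FROM sales GROUP BY Product ORDER BY Product"
).fetchall()

print(distinct_rows == grouped_rows)  # True
```

When no aggregate is needed, both forms return the same rows here; GROUP BY becomes the necessary choice once a per-group calculation such as SUM() or COUNT() is involved.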
Conclusion
DISTINCT and GROUP BY can both be used to obtain unique records, but they are applied differently. DISTINCT is suitable for obtaining unique values, or unique combinations of values, from one or more columns, while GROUP BY groups rows by one or more columns so that aggregate functions can be calculated for each group. Understanding when to use each is important for writing efficient and effective SQL queries.