Group By Vs Distinct in PostgreSQL
The GROUP BY and DISTINCT clauses are essential in PostgreSQL for efficient data operations. The DISTINCT clause is used to retrieve unique values from a designated column or combination of columns within a result set, while the GROUP BY clause is used with aggregate functions to organize the result set based on one or more columns.
Distinct is good for retrieving unique values from a column, while GROUP BY is used to summarize the data. The GROUP BY clause is slower than the DISTINCT clause in large data sets due to aggregation. Understanding the differences between these two clauses is crucial for identifying patterns in datasets and optimizing database performance.
Understanding GROUP BY Clause
The GROUP BY clause is used with aggregate functions such as SUM, AVG, COUNT, MIN, and MAX to organize the result set based on one or more columns. By defining conditions, it arranges data into groups.
Syntax:
SELECT column1, aggregate_function(column2)
FROM table_name
GROUP BY column1;
In the Syntax,
- column1: Specifies the column by which the result set will be grouped.
- aggregate_function(column2): Applies an aggregate function (e.g., SUM, AVG, COUNT, MIN, MAX) to column2 within each group.
- table_name: Refers to the name of the table from which data is queried.
- GROUP BY column1: Organizes the result set into groups based on the values in column1.
Exploring DISTINCT Clause
The DISTINCT clause is a useful tool in PostgreSQL that helps retrieve unique values from a particular column or combination of columns within a result set. It filters out duplicate values and improves the quality of data. The DISTINCT clause plays an important role in efficient data processing and retrieval, which leads to faster query execution times, especially when working with large datasets.
Syntax:
SELECT DISTINCT column1, column2
FROM table_name;
Syntax:
- column1: Specifies the first column from which unique values will be retrieved.
- column2: Specifies the second column from which unique values will be retrieved, if applicable.
- table_name: Refers to the name of the table from which data is queried.
- DISTINCT: Filters out duplicate combinations of values from column1 and column2.
Setting Up Environment
To understand DISTINCT and GROUP BY Clause in PostgreSQL, we will first create a table name “student” which contains StudentID, Name, Age, the various and Grade as columns, then insert some values. We will perform various operations to understand it.
Query:
CREATE TABLE student (
StudentID SERIAL PRIMARY KEY,
Name VARCHAR(100),
Age INT,
Grade VARCHAR(10)
);
INSERT INTO student (Name, Age, Grade) VALUES ('Rohan', 18, 'A');
INSERT INTO student (Name, Age, Grade) VALUES ('Priya', 17, 'B');
INSERT INTO student (Name, Age, Grade) VALUES ('Aarav', 18, 'A');
INSERT INTO student (Name, Age, Grade) VALUES ('Isha', 17, 'B');
INSERT INTO student (Name, Age, Grade) VALUES ('Ananya', 18, 'A');
SELECT * FROM STUDENT;
After inserting values the table will be:
Example of DISTINCT Clause
Example 1: Obtaining Unique Grades
Suppose we want to see the unique grades in the table as there are duplicate grades also. We will run the below query.
Query:
SELECT DISTINCT Grade
FROM student;
Output:
Explanation: This query return the unique grades obtained by students. Each grade appears only once in the output.
Example 2: Obtaining Unique Ages
To see the distinct ages of the students, we can you the below query.
Query:
SELECT DISTINCT Age
FROM student;
Output:
Explanation: This query return the unique ages of the students. All the ages are listed only once which occurs once or more than once in the student table.
Example of GROUP BY Clause
Example 1: Total number of students in each Grade
Suppose we want to see the total number of students having the same grade for different grades, we can run the below query.
Query:
SELECT Grade, COUNT(*) AS total_students
FROM student
GROUP BY Grade;
Output:
Explanation: This query calculates the total number of students in each grade by adding the students having the same grade.
Example 2: Average age of students in each Grade
Suppose we want to know the average age of students who obtained the same grade. We can use the below query,
Query:
SELECT Grade, AVG(Age) AS avg_age
FROM student
GROUP BY Grade;
Output:
Explanation: This query calculates the average age (using an aggregate function) of the students having the same Grade.
GROUP BY Vs DISTINCT in PostgreSQL
The Difference between “DISTINCT” and “GROUP BY” in PostgreSQL:
Feature |
DISTINCT |
GROUP BY |
---|---|---|
Syntax |
SELECT DISTINCT column1, column2 FROM table_name; |
SELECT column1, aggregate_function(column2) FROM table_name GROUP BY column1; |
Used For |
To get Unique values from a single column |
To get Grouped data (by one or more columns) with aggregate function calculation. |
Columns in SELECT |
One Column for which we want the unique values |
Columns mentioned in Group By Clause and the columns on which aggregate functions are applied. |
Speed |
Faster Comparison to Group By |
Slower than Distinct in large data sets due to aggregation |
Example usage |
To get the unique name of products or customer ID |
Used for identifying patterns in a dataset |
Conclusion
Understanding the differences between DISTINCT and GROUP BY clauses in PostgreSQL is very important for efficient data operations in PostgreSQL. DISTINCT is used to get the Unique value from a column while GROUP BY is used to group the rows according to columns according to a condition by using aggregate functions such as sum, avg. DISTINCT is good for just retrieving the unique values from the column while GROUP BY is used to summarize the data.