SQL – Statistical Functions
SQL statistical functions are essential tools for extracting meaningful insights from databases. These functions, such as AVG(), SUM(), COUNT(), MIN(), MAX(), STDDEV(), and VAR(), enable users to perform statistical calculations on numeric data. Whether determining averages, sums, counts, or measures of variability, these functions empower efficient data analysis within the SQL environment.
What is Statistics?
Statistics is a branch of mathematics that deals with the collection, organization, analysis, interpretation, and presentation of data. It uses mathematical techniques to extract meaningful information from data. Statistics is widely used in fields such as business, economics, social science, medicine, and engineering.
Where Do We Use Statistics?
In a database, we apply statistics through statistical functions. A statistical function is a mathematical function that processes and analyzes data to provide meaningful information about a dataset, for example the mean, sum, minimum, maximum, or standard deviation.
Statistical Functions in SQL
Here are some common statistical functions in SQL:
| Function | Output |
|---|---|
| AVG() | Calculates the average value of a numeric column. |
| SUM() | Calculates the sum of values in a numeric column. |
| COUNT() | Counts the number of rows in a result set or the number of non-null values in a column. |
| MIN() | Returns the minimum value in a column. |
| MAX() | Returns the maximum value in a column. |
| VAR() / VARIANCE() | Calculates the variance of a numeric column (population or sample, depending on the database system). |
| STDDEV() / STDDEV_POP() | Calculates the standard deviation of a numeric column. STDDEV_POP() always returns the population value; STDDEV() returns the population or sample value depending on the database system. |
| CORR() | Calculates the correlation coefficient between two numeric columns. |
| COVAR_POP() | Calculates the population covariance between two numeric columns. |
| PERCENTILE_CONT() | Calculates a specified percentile value for a numeric column. |
Statistical Functions With Examples
We have four tables in our database: ‘studentDetails’, ‘employees’, ‘sales_data’, and ‘financial_data’. (The pictures are displayed below.)
employees Table:
sales_data:
financial_data:
1. AVG() Function
Calculates the average (arithmetic mean) of a group of numbers or a numeric column.
Syntax:
SELECT AVG(column_name) FROM table_name;
Example Query:
SELECT AVG(marks) AS average_marks FROM studentDetails;
Output:
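The result depends on the rows in the table. As a minimal sketch, assuming a small made-up studentDetails table (the rows below are hypothetical, not the article's data), the query can be run with Python's built-in sqlite3 module. It also shows that AVG() ignores NULL values and equals SUM() divided by the count of non-NULL values:

```python
import sqlite3

# Hypothetical stand-in for the article's studentDetails table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE studentDetails (studentID INTEGER, marks REAL)")
conn.executemany(
    "INSERT INTO studentDetails VALUES (?, ?)",
    [(1, 80.0), (2, 90.0), (3, 70.0), (4, None)],  # student 4 has no marks yet
)

# AVG() skips the NULL row, so it divides by 3 here, not by 4.
average_marks = conn.execute(
    "SELECT AVG(marks) AS average_marks FROM studentDetails"
).fetchone()[0]

# The same value rebuilt from SUM() and the count of non-NULL marks.
total_marks, marks_present = conn.execute(
    "SELECT SUM(marks), COUNT(marks) FROM studentDetails"
).fetchone()

print(average_marks)                # 80.0
print(total_marks / marks_present)  # 80.0
```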
2. SUM() Function
Calculates the total of all values in a numeric column or group.
Syntax:
SELECT SUM(column_name) FROM table_name;
Example Query:
SELECT SUM(marks) AS total_marks FROM studentDetails;
Output:
3. COUNT() Function
Counts the number of rows in a result set (COUNT(*)) or the number of non-null values in a column (COUNT(column_name)).
Syntax:
SELECT COUNT(*) FROM table_name;
SELECT COUNT(column_name) FROM table_name;
Example Query:
SELECT COUNT(studentID) AS total_students FROM studentDetails;
Output:
Example Query:
SELECT COUNT(*) FROM studentDetails;
Output:
COUNT() can also return the number of rows that meet a specified condition, for example by adding a WHERE clause to the query.
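To make the difference between the COUNT() forms concrete, here is a small sketch using Python's sqlite3 module. The table rows and the 75-mark threshold are made up for illustration. It counts all rows, the rows with a non-NULL value, and the rows that meet a condition in a single query:

```python
import sqlite3

# Hypothetical rows; student 4 has a NULL in the marks column.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE studentDetails (studentID INTEGER, marks REAL)")
conn.executemany(
    "INSERT INTO studentDetails VALUES (?, ?)",
    [(1, 80.0), (2, 90.0), (3, 70.0), (4, None)],
)

all_rows, rows_with_marks, rows_above_75 = conn.execute(
    """
    SELECT COUNT(*),                                -- every row
           COUNT(marks),                            -- only non-NULL marks
           COUNT(CASE WHEN marks > 75 THEN 1 END)   -- rows meeting a condition
    FROM studentDetails
    """
).fetchone()

print(all_rows, rows_with_marks, rows_above_75)  # 4 3 2
```

The CASE expression yields NULL when the condition is not met, and COUNT() skips NULLs, which is why it behaves as a conditional count.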
4. MAX() Function
Returns the highest numeric value in a group of numbers.
Syntax:
SELECT MAX(column_name) FROM table_name;
Example Query:
SELECT MAX(marks) AS highest_marks FROM studentDetails;
Output:
5. MIN() Function
Returns the lowest numeric value in a group of numbers.
Syntax:
SELECT MIN(column_name) FROM table_name;
Example Query:
SELECT MIN(marks) AS lowest_marks FROM studentDetails;
Output:
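MAX() and MIN() can also be combined in one query. The following sketch assumes a made-up studentDetails table (the rows are hypothetical) and uses Python's sqlite3 module. The NULL row is ignored by both functions:

```python
import sqlite3

# Hypothetical rows; the NULL value takes no part in MIN() or MAX().
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE studentDetails (studentID INTEGER, marks REAL)")
conn.executemany(
    "INSERT INTO studentDetails VALUES (?, ?)",
    [(1, 80.0), (2, 90.0), (3, 70.0), (4, None)],
)

lowest_marks, highest_marks = conn.execute(
    "SELECT MIN(marks) AS lowest_marks, MAX(marks) AS highest_marks "
    "FROM studentDetails"
).fetchone()

print(lowest_marks, highest_marks)  # 70.0 90.0
```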
6. VAR() / VARIANCE() Function
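Calculates the variance of a numeric column. Both the function name and the kind of variance depend on the database system: in MySQL, VARIANCE() returns the population variance; in PostgreSQL and Oracle, VARIANCE() returns the sample variance; in SQL Server, VAR() returns the sample variance and VARP() returns the population variance.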
Syntax:
SELECT VAR(column_name) FROM table_name;
Example Query:
SELECT VARIANCE(marks) AS variance_marks FROM studentDetails;
Output:
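Because the result depends on whether the population or the sample formula is used, the sketch below computes both for a made-up set of marks (hypothetical values, not the article's data). SQLite has no built-in variance function, so the query builds it from basic aggregates, and Python's statistics module is used as a cross-check. The "mean of squares minus square of the mean" formula is fine for illustration but can lose precision on large values:

```python
import sqlite3
import statistics

marks = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical marks out of 10

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE studentDetails (marks REAL)")
conn.executemany("INSERT INTO studentDetails VALUES (?)", [(m,) for m in marks])

# Population variance divides by n; sample variance divides by n - 1.
var_pop, var_samp = conn.execute(
    """
    SELECT AVG(marks * marks) - AVG(marks) * AVG(marks),
           (AVG(marks * marks) - AVG(marks) * AVG(marks))
               * COUNT(marks) / (COUNT(marks) - 1)
    FROM studentDetails
    """
).fetchone()

print(var_pop)                       # 4.0
print(var_samp)                      # 32/7, about 4.57
print(statistics.pvariance(marks))   # population variance, same as var_pop
print(statistics.variance(marks))    # sample variance, same as var_samp
```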
7. STDDEV() / STDDEV_POP() Function
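Calculates the standard deviation of a numeric column. STDDEV_POP() returns the population standard deviation. STDDEV() depends on the database system: MySQL treats it as the population standard deviation, while PostgreSQL and Oracle treat it as the sample standard deviation.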
Syntax:
SELECT STDDEV(column_name) FROM table_name;
Example Query:
SELECT STDDEV(marks) AS stddev_marks FROM studentDetails;
Output:
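As an illustration of the population versus sample distinction, the sketch below registers two user-defined aggregates in SQLite through Python's sqlite3 module, since SQLite does not ship a standard deviation function. The class names and the sample marks are assumptions made for this example:

```python
import math
import sqlite3


class StddevPop:
    """Population standard deviation (divides by n)."""
    ddof = 0

    def __init__(self):
        self.values = []

    def step(self, value):
        # Like the built-in aggregates, skip NULLs.
        if value is not None:
            self.values.append(value)

    def finalize(self):
        n = len(self.values)
        if n - self.ddof <= 0:
            return None  # not enough rows to compute a result
        mean = sum(self.values) / n
        squared_deviations = sum((v - mean) ** 2 for v in self.values)
        return math.sqrt(squared_deviations / (n - self.ddof))


class StddevSamp(StddevPop):
    """Sample standard deviation (divides by n - 1)."""
    ddof = 1


conn = sqlite3.connect(":memory:")
conn.create_aggregate("STDDEV_POP", 1, StddevPop)
conn.create_aggregate("STDDEV_SAMP", 1, StddevSamp)
conn.execute("CREATE TABLE studentDetails (marks REAL)")
conn.executemany(
    "INSERT INTO studentDetails VALUES (?)",
    [(m,) for m in [2, 4, 4, 4, 5, 5, 7, 9]],  # hypothetical marks
)

stddev_pop, stddev_samp = conn.execute(
    "SELECT STDDEV_POP(marks), STDDEV_SAMP(marks) FROM studentDetails"
).fetchone()

print(stddev_pop)   # 2.0
print(stddev_samp)  # slightly larger, because it divides by n - 1
```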
8. PERCENTILE_CONT() Function
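Calculates a specified percentile value for a numeric column, interpolating between adjacent values when the percentile falls between two rows. A fraction of 0.5 returns the median.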
Syntax:
SELECT PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY column_name) FROM table_name;
Example Query:
SELECT PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY salary) AS median_salary
FROM employees;
Output:
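The interpolation behind PERCENTILE_CONT() can be sketched in plain Python. The helper name percentile_cont and the salary values are made up for this example, and the function is meant as an illustration of the usual linear-interpolation definition rather than any particular database's implementation:

```python
import math


def percentile_cont(values, fraction):
    """Interpolated percentile of the non-None values, for 0 <= fraction <= 1."""
    ordered = sorted(v for v in values if v is not None)
    if not ordered:
        return None
    # Zero-based fractional row position within the sorted values.
    position = fraction * (len(ordered) - 1)
    lower = math.floor(position)
    upper = math.ceil(position)
    weight = position - lower
    # Linear interpolation between the two neighbouring rows.
    return ordered[lower] + weight * (ordered[upper] - ordered[lower])


salaries = [60000, 30000, 50000, 40000]  # hypothetical employees.salary values

median_salary = percentile_cont(salaries, 0.5)
first_quartile = percentile_cont(salaries, 0.25)

print(median_salary)   # 45000.0
print(first_quartile)  # 37500.0
```

With an even number of rows, the median falls halfway between the two middle values, which is why the result (45000.0) does not appear in the input.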
9. CORR() Function
Calculates the correlation coefficient between two numeric columns.
Syntax:
SELECT CORR(column1, column2) FROM table_name;
Example Query:
SELECT CORR(sales, profit) AS correlation_coefficient
FROM sales_data;
Output:
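For readers whose database lacks CORR(), the Pearson correlation coefficient it computes can be sketched in plain Python. The helper name corr and the sales and profit figures are hypothetical. Pairs in which either value is missing are skipped:

```python
import math


def corr(xs, ys):
    """Pearson correlation over pairs where neither value is None."""
    pairs = [(x, y) for x, y in zip(xs, ys) if x is not None and y is not None]
    n = len(pairs)
    if n == 0:
        return None
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    co_deviation = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    spread_x = sum((x - mean_x) ** 2 for x, _ in pairs)
    spread_y = sum((y - mean_y) ** 2 for _, y in pairs)
    if spread_x == 0 or spread_y == 0:
        return None  # undefined when one column is constant
    return co_deviation / math.sqrt(spread_x * spread_y)


sales = [100, 200, 300, 400]   # hypothetical sales_data.sales values
profit = [10, 25, 28, 45]      # hypothetical sales_data.profit values

correlation_coefficient = corr(sales, profit)
print(correlation_coefficient)     # close to 1: profit rises with sales
print(corr([1, 2, 3], [2, 4, 6]))  # 1.0, a perfect positive linear relationship
```

The result always lies between -1 and 1. Values near 1 indicate that the two columns rise together, values near -1 that one falls as the other rises, and values near 0 that there is little linear relationship.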
10. COVAR_POP() Function
Calculates the population covariance between two numeric columns.
Syntax:
SELECT COVAR_POP(column1, column2) FROM table_name;
Example Query:
SELECT COVAR_POP(revenue, expenses) AS population_covariance
FROM financial_data;
Output:
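Population covariance is the mean of the products minus the product of the means. The sketch below, using Python's sqlite3 module and made-up revenue and expense figures, rebuilds it from AVG() because SQLite has no COVAR_POP() function:

```python
import sqlite3

# Hypothetical stand-in for the article's financial_data table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE financial_data (revenue REAL, expenses REAL)")
conn.executemany(
    "INSERT INTO financial_data VALUES (?, ?)",
    [(100, 60), (200, 90), (300, 150)],
)

# COVAR_POP(x, y) = AVG(x * y) - AVG(x) * AVG(y)
population_covariance = conn.execute(
    """
    SELECT AVG(revenue * expenses) - AVG(revenue) * AVG(expenses)
    FROM financial_data
    """
).fetchone()[0]

print(population_covariance)  # 3000.0
```

A positive value means revenue and expenses tend to move in the same direction. Unlike the correlation coefficient, covariance is not normalized, so its size depends on the units of the two columns.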
Conclusion
In SQL, statistical functions help to analyze and summarize the data in a database and to extract meaningful information from it. Whether you are counting occurrences, calculating totals, finding averages, or measuring the variance in a dataset, statistical functions play a vital role. Overall, they extend SQL's capabilities and make it a valuable tool for businesses and analysts who need actionable insight from their relational databases.