One Sample Kolmogorov-Smirnov Test
The one-sample Kolmogorov-Smirnov (KS) test is used to determine whether a sample comes from a specified continuous distribution. It is particularly useful when the assumption of normality is in question or when dealing with small sample sizes.
The test statistic, denoted as $D_n$, measures the maximum absolute difference between two cumulative distribution functions: the empirical CDF of the sample and the CDF of the reference distribution.
Empirical Distribution Function
The empirical distribution function $F_n(x)$ at the value x is the proportion of data points in the sample that are less than or equal to x. It can be defined as:
$$F_n(x) = \frac{1}{n} \sum_{i=1}^{n} I(X_i \le x)$$
where,
- n is the number of observations in the sample
- $X_i$ (for i = 1, ..., n) represents the individual observations
- $I(X_i \le x)$ is an indicator function that equals 1 if $X_i \le x$ and 0 otherwise. Summing it over all observations counts how many of them are less than or equal to x.
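The definition translates almost directly into code. The minimal sketch below sums the indicator over the sample and divides by n. The function name `ecdf` and the sample values are illustrative only and do not come from any library.

```python
def ecdf(sample, x):
    """Return F_n(x), the fraction of observations in `sample` that are <= x."""
    # Sum the indicator I(X_i <= x) over all observations, then divide by n
    return sum(1 for xi in sample if xi <= x) / len(sample)

# Hypothetical sample of n = 5 observations
sample = [2.1, 0.4, 1.7, 3.3, 0.9]
print(ecdf(sample, 1.7))  # 3 of the 5 values are <= 1.7, so this prints 0.6
```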
Kolmogorov–Smirnov Statistic
The Kolmogorov–Smirnov statistic for a given cumulative distribution function $F(x)$ is defined as:
$$D_n = \sup_x \left| F_n(x) - F(x) \right|$$
where,
- sup stands for supremum, the least upper bound of $|F_n(x) - F(x)|$ over all possible values of x. Informally, it is the largest difference between the two functions.
- $F(x)$ is the theoretical (reference) cumulative distribution function.
- $F_n(x)$ is the empirical cumulative distribution function of the sample (calculated as described above).
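Because $F_n$ is a step function that jumps only at the sample points, for a continuous reference CDF the supremum can be found by checking each sorted observation $x_{(i)}$ on both sides of its jump: $D_n = \max_i \max\big(i/n - F(x_{(i)}),\ F(x_{(i)}) - (i-1)/n\big)$. The sketch below assumes that F is continuous. The name `ks_statistic` and the uniform reference distribution are illustrative choices.

```python
def ks_statistic(sample, cdf):
    """Compute D_n = sup_x |F_n(x) - F(x)| for a continuous reference CDF."""
    data = sorted(sample)
    n = len(data)
    d = 0.0
    for i, x in enumerate(data, start=1):
        f = cdf(x)
        # F_n jumps from (i-1)/n to i/n at x, so the largest gap occurs
        # either just before the jump or right at it
        d = max(d, i / n - f, f - (i - 1) / n)
    return d

def uniform_cdf(x):
    # CDF of the uniform distribution on [0, 1]: F(x) = x inside the interval
    return min(max(x, 0.0), 1.0)

print(ks_statistic([0.1, 0.4, 0.7], uniform_cdf))
```

For this small sample the largest gap appears at x = 0.7, where $F_n$ reaches 1 while $F(0.7) = 0.7$, giving $D_n \approx 0.3$.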
Example
Let's say you have a sample of n observations. You want to test whether this sample comes from a normal distribution with mean $\mu$ and standard deviation $\sigma$. The null hypothesis is that the sample follows the specified distribution. The steps of the test are:
- Compute the Empirical Distribution Function
- Specify the Reference Distribution
  - In this case, the cumulative distribution function of the normal distribution with mean $\mu$ and standard deviation $\sigma$ is used.
- Calculate the Kolmogorov–Smirnov Statistic
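- Compare the KS statistic with the critical value, or compute the p-value. Reject the null hypothesis if $D_n$ exceeds the critical value (equivalently, if the p-value is below the chosen significance level).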
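The steps above can be sketched end to end using only the Python standard library. This is a minimal illustration under several assumptions. The sample data, $\mu = 5$ and $\sigma = 0.5$ are hypothetical. The critical value $1.36/\sqrt{n}$ for $\alpha = 0.05$ is a large-sample approximation. The p-value comes from the asymptotic Kolmogorov distribution, $P(K > t) = 2\sum_{k \ge 1} (-1)^{k-1} e^{-2k^2 t^2}$ with $t = \sqrt{n}\,D_n$. Both approximations are crude for a sample as small as n = 10. All function names are illustrative.

```python
import math

def normal_cdf(x, mu, sigma):
    # CDF of N(mu, sigma^2), written with the error function
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def ks_statistic(sample, cdf):
    # D_n = sup_x |F_n(x) - F(x)|, checked on both sides of every jump of F_n
    data = sorted(sample)
    n = len(data)
    return max(max(i / n - cdf(x), cdf(x) - (i - 1) / n)
               for i, x in enumerate(data, start=1))

def ks_pvalue_asymptotic(d, n, terms=100):
    # Large-sample p-value P(K > sqrt(n) * D_n), where K follows the
    # Kolmogorov distribution, via its alternating series
    t = math.sqrt(n) * d
    if t < 0.3:
        # P(K > 0.3) is within 1e-5 of 1, and the series converges slowly here
        return 1.0
    s = sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * t * t)
            for k in range(1, terms + 1))
    return min(max(2.0 * s, 0.0), 1.0)

def ks_test_normal(sample, mu, sigma):
    """One-sample KS test of `sample` against N(mu, sigma^2) at alpha = 0.05."""
    n = len(sample)
    # Steps 1-3: compare the empirical CDF with the reference normal CDF
    d = ks_statistic(sample, lambda x: normal_cdf(x, mu, sigma))
    # Step 4: approximate critical value and asymptotic p-value
    critical = 1.36 / math.sqrt(n)
    p = ks_pvalue_asymptotic(d, n)
    return d, critical, p

sample = [4.8, 5.3, 5.1, 4.6, 5.9, 5.0, 4.4, 5.5, 5.2, 4.9]  # hypothetical data
d, critical, p = ks_test_normal(sample, mu=5.0, sigma=0.5)
print(f"D_n = {d:.3f}, critical value ~ {critical:.3f}, p ~ {p:.3f}")
print("reject H0" if d > critical else "fail to reject H0")
```

In practice, `scipy.stats.kstest(sample, 'norm', args=(mu, sigma))` runs the same test and provides more accurate small-sample p-values. Note also that these critical values assume $\mu$ and $\sigma$ are fixed in advance. If they are estimated from the same sample, the standard KS test becomes conservative, and a modified procedure such as the Lilliefors test is normally used instead.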
Kolmogorov-Smirnov Test (KS Test)
The Kolmogorov-Smirnov (KS) test is a non-parametric method for comparing distributions and is applied in many different fields.
In this article, we will look at this non-parametric test, which can be used to determine whether two distributions have the same shape.