Expression for mean and variance in a running stream
Suppose we have a running stream of numbers $x_1, x_2, x_3, \ldots, x_n$.
The formulas for the mean and variance of the numbers seen so far are:
- Mean = $E(x) = \mu = \frac{1}{n}\sum_{i=1}^{n} x_i$
- Standard deviation = $\sigma = \sqrt{\frac{1}{n}\sum_{i=1}^{n} (x_i - \mu)^2}$
- Variance = $\sigma^2 = \frac{1}{n}\sum_{i=1}^{n} (x_i - \mu)^2$
However, recomputing these sums by looping over all the numbers every time a new one arrives takes O(n) time per update and requires storing the whole stream, which becomes slow for long streams.
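For contrast, here is a minimal sketch of that slow approach, assuming the stream is kept in a Python list; `naive_mean_var` is a hypothetical helper name, not part of the original code. Every call walks the whole list twice, so each new number costs O(n) work and the list itself grows without bound.

```python
def naive_mean_var(xs):
    """Recompute mean and variance from scratch over every number seen so far."""
    n = len(xs)
    mean = sum(xs) / n                          # (1/n) * sum of x_i
    var = sum((x - mean) ** 2 for x in xs) / n  # (1/n) * sum of (x_i - mean)^2
    return mean, var


stream = []
for x in [1, 2, 5, 4, 3]:
    stream.append(x)                     # store every number: O(n) memory
    mean, var = naive_mean_var(stream)   # O(n) work for each new number
    print("Mean :", mean, " Variance :", var)
```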
Efficient solution
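$$
\begin{aligned}
\sigma^2 &= \frac{1}{n}\sum_{i=1}^{n}(x_i-\mu)^2
= \frac{1}{n}\left(\sum_{i=1}^{n}x_i^2 + \sum_{i=1}^{n}\mu^2 - 2\mu\sum_{i=1}^{n}x_i\right)\\
&= \frac{1}{n}\left(\sum x_i^2 + n\mu^2 - 2\mu\sum x_i\right)
= \frac{\sum x_i^2}{n} + \mu^2 - 2\mu\,\frac{\sum x_i}{n}\\
&= \frac{\sum x_i^2}{n} + \mu^2 - 2\mu^2 \qquad \left(\text{since } \tfrac{1}{n}\textstyle\sum x_i = \mu\right)\\
&= \frac{\sum x_i^2}{n} - \mu^2 = E(x^2) - \mu^2 = E(x^2) - [E(x)]^2
\end{aligned}
$$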
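The identity can be checked exactly on the same numbers used in the sample run (1, 2, 5, 4, 3). This sketch uses Python's standard `fractions.Fraction` so that floating-point rounding cannot hide a mismatch; the variable names are illustrative only.

```python
from fractions import Fraction

xs = [1, 2, 5, 4, 3]
n = len(xs)

mu = Fraction(sum(xs), n)                            # E(x)
var_definition = sum((x - mu) ** 2 for x in xs) / n  # (1/n) * sum of (x_i - mu)^2
e_x2 = Fraction(sum(x * x for x in xs), n)           # E(x^2)
var_identity = e_x2 - mu ** 2                        # E(x^2) - [E(x)]^2

print(var_definition, var_identity)  # both are exactly 2
```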
Therefore, we only need to maintain three variables: a running sum of all numbers seen so far (for the mean), a running sum of their squares (for $E(x^2)$), and the count $n$ of numbers seen. Each new number updates all three in constant time.
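The same bookkeeping can be wrapped in a small reusable object. This is a minimal sketch; the class name `RunningStats` and its `push`, `mean` and `variance` methods are hypothetical, not part of the original code. Like the formulas above, it computes the population variance (dividing by n).

```python
class RunningStats:
    """Track mean and population variance of a stream in O(1) per update."""

    def __init__(self):
        self.total = 0  # running sum of x_i
        self.sumsq = 0  # running sum of x_i^2
        self.n = 0      # count of numbers seen

    def push(self, x):
        """Add one number from the stream."""
        self.n += 1
        self.total += x
        self.sumsq += x * x

    def mean(self):
        return self.total / self.n  # E(x)

    def variance(self):
        m = self.mean()
        return self.sumsq / self.n - m * m  # E(x^2) - [E(x)]^2


stats = RunningStats()
for x in [1, 2, 5, 4, 3]:
    stats.push(x)
print(stats.mean(), stats.variance())  # 3.0 2.0
```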
Python code for the implementation:
```python
total = 0  # running sum of the stream (avoids shadowing the built-in sum)
sumsq = 0  # running sum of squares of the stream
n = 0      # count of numbers seen so far

while True:
    try:
        x = int(input("Enter a number : "))
    except (EOFError, ValueError):
        break  # stop on end of input or a non-integer entry
    n += 1
    total += x
    sumsq += x * x

    mean = total / n                   # E(x)
    var = (sumsq / n) - (mean * mean)  # E(x^2) - [E(x)]^2

    print("Mean :", mean)
    print("Variance :", var)
    print()
```
Input and corresponding output:

```
Enter a number : 1
Mean : 1.0
Variance : 0.0

Enter a number : 2
Mean : 1.5
Variance : 0.25

Enter a number : 5
Mean : 2.6666666666666665
Variance : 2.8888888888888893

Enter a number : 4
Mean : 3.0
Variance : 2.5

Enter a number : 3
Mean : 3.0
Variance : 2.0
```
Thus, we can compute the mean and variance of a running stream at any point in O(1) time per new number, using only O(1) extra memory.
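One caveat: with floating-point data whose mean is large compared with its spread, $E(x^2) - [E(x)]^2$ subtracts two nearly equal numbers and can lose much of its precision (it can even come out slightly negative). A commonly used alternative with the same O(1) cost per update is Welford's online algorithm, which updates the mean and the sum of squared deviations directly. The sketch below is that swapped-in technique, not the method of this article; the generator name `welford` is hypothetical.

```python
def welford(numbers):
    """Yield (mean, population variance) after each number using Welford's update."""
    n = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations from the current mean
    for x in numbers:
        n += 1
        delta = x - mean          # deviation from the old mean
        mean += delta / n         # updated mean
        m2 += delta * (x - mean)  # old deviation times new deviation
        yield mean, m2 / n


for mean, var in welford([1, 2, 5, 4, 3]):
    print("Mean :", mean, " Variance :", var)
```

Dividing `m2` by `n` keeps the same population variance as the formulas above; the running state is still just three numbers.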