How to calculate a rolling average in R

In R, a rolling average (also called a moving average) is a series of averages computed over successive, fixed-size subsets of a dataset. It is especially useful for smoothing fluctuations in time-ordered data so that underlying trends can be seen more clearly.

Concept of Rolling Average

A rolling average is computed by averaging the data points inside a fixed-size window, then sliding the window one step along the dataset and repeating. Each window position yields one average, so the influence of random fluctuations is reduced and longer-term patterns in the data are highlighted.
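The sliding-window idea can be sketched in base R without any packages. The helper name rolling_mean_manual and the sample vector are illustrative assumptions, not part of any library; the sketch returns NA wherever a full window is not yet available.

```r
# Hypothetical helper: trailing rolling mean computed one window at a time.
rolling_mean_manual <- function(x, k) {
  n <- length(x)
  out <- rep(NA_real_, n)               # positions without a full window stay NA
  if (k > n) return(out)
  for (i in k:n) {
    out[i] <- mean(x[(i - k + 1):i])    # average of the k values ending at position i
  }
  out
}

values <- c(2, 4, 6, 8, 10)             # made-up data
rolling_mean_manual(values, k = 3)
# returns: NA NA 4 6 8
```

A loop keeps the logic easy to follow; for long series a vectorised or package-based approach is usually faster.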

Benefits of Using Rolling Average

  1. Smoothing: Rolling averages help in smoothing out short-term fluctuations, making it easier to identify long-term trends.
  2. Noise Reduction: By averaging data over a period, rolling averages dampen the effect of isolated outliers and random noise in the dataset.
  3. Forecasting: Rolling averages are commonly used in forecasting future values based on historical trends.
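The smoothing and noise-reduction effect can be illustrated with a small simulation. This is only a sketch on made-up data: the trend, noise level, seed, and 7-point window are arbitrary assumptions, and base R's stats::filter() is used here in place of a package function.

```r
set.seed(42)                                      # arbitrary seed for reproducibility
day <- 1:100
noisy <- 0.5 * day + rnorm(100, sd = 2)           # made-up linear trend plus random noise

# Trailing 7-point moving average (sides = 1 uses current and past values only)
smoothed <- as.numeric(stats::filter(noisy, rep(1 / 7, 7), sides = 1))

# Spread of the day-to-day changes before and after smoothing
sd(diff(noisy))
sd(diff(smoothed), na.rm = TRUE)                  # first 6 values are NA, so drop them
```

The second standard deviation comes out smaller than the first, which is the noise reduction described above.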

Methods to Calculate Rolling Average in R

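rollmean() Function from the zoo Package: The rollmean() function is provided by the zoo package. Because xts builds on zoo, the function also works on xts objects. It computes rolling means and takes arguments for the window size (k), the window alignment (align), and the padding of positions that lack a complete window (fill).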
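Missing values inside the input need separate care. A minimal sketch, assuming the zoo package is installed: the zoo documentation notes that the default rollmean() method does not handle inputs containing NA and points to rollapply() instead. The vector readings below is made-up data.

```r
library(zoo)

# Made-up series with one missing observation
readings <- c(1, 2, NA, 4, 5)

# rollapply() passes na.rm = TRUE on to mean(), so each window
# averages only the values that are present
rolling_na <- rollapply(readings, width = 3, FUN = mean, na.rm = TRUE,
                        align = "right", fill = NA)
rolling_na
# NA NA 1.5 3.0 4.5
```

Note that a window with a missing value is averaged over fewer points, so those averages are based on less data than the others.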

Step 1: Installing and Loading Necessary Packages

Make sure that the zoo and xts packages are installed and loaded into the R environment before continuing.

R
# Install the packages (only needed once)
install.packages("zoo")
install.packages("xts")

# Load the packages into the current session
library(zoo)
library(xts)

Step 2: Create example dataset

Let’s construct a fictitious dataset with daily temperature values over a month:

R
# Create example dataset
dates <- seq(as.Date("2024-04-01"), by = "day", length.out = 30)
temperatures <- c(18, 20, 22, 23, 24, 25, 26, 27, 28, 29, 30, 32, 33, 34, 
                  35, 36, 35, 33, 30, 28, 26, 25, 24, 23, 22, 20, 19, 18, 17, 16)

# Combine into a dataframe
temperature_data <- data.frame(date = dates, temperature = temperatures)
head(temperature_data)

Output:

        date temperature
1 2024-04-01          18
2 2024-04-02          20
3 2024-04-03          22
4 2024-04-04          23
5 2024-04-05          24
6 2024-04-06          25

Step 3: Calculate rolling average using rollmean() function

Next, we’ll use the rollmean() function from the zoo package to calculate the rolling average with a 7-day window. The function accepts the xts object directly.

R
# Convert dataframe to xts object
temperature_xts <- xts(temperature_data$temperature, order.by = temperature_data$date)

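# Calculate the 7-day rolling average using rollmean()
# align = "right" ends each window on the current day;
# fill = NA pads the days that do not yet have a full window
rolling_avg <- rollmean(temperature_xts, k = 7, align = "right", fill = NA)

rolling_avg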

Output:

               [,1]
2024-04-01       NA
2024-04-02       NA
2024-04-03       NA
2024-04-04       NA
2024-04-05       NA
2024-04-06       NA
2024-04-07 22.57143
2024-04-08 23.85714
2024-04-09 25.00000
2024-04-10 26.00000
2024-04-11 27.00000
2024-04-12 28.14286
2024-04-13 29.28571
2024-04-14 30.42857
2024-04-15 31.57143
2024-04-16 32.71429
2024-04-17 33.57143
2024-04-18 34.00000
2024-04-19 33.71429
2024-04-20 33.00000
2024-04-21 31.85714
2024-04-22 30.42857
2024-04-23 28.71429
2024-04-24 27.00000
2024-04-25 25.42857
2024-04-26 24.00000
2024-04-27 22.71429
2024-04-28 21.57143
2024-04-29 20.42857
2024-04-30 19.28571

The output contains one row for each day from April 1 to April 30:

  • From April 1 to April 6, the output is NA because there aren’t enough data points to calculate a 7-day average.
  • From April 7 onward, the rolling average reflects the average temperature of the past 7 days, ending on the current day.
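The placement of the NA values depends on how the window is aligned. The sketch below contrasts a trailing window (comparable to align = "right") with a centered one (comparable to align = "center") using base R's stats::filter(); the 3-point window is an assumption chosen to keep the output short, and the vector reuses the first seven temperatures from the dataset.

```r
x <- c(18, 20, 22, 23, 24, 25, 26)   # first seven temperatures from the dataset
w <- rep(1 / 3, 3)                   # equal weights for a 3-point average

# Trailing window: each average ends on the current observation
trailing <- as.numeric(stats::filter(x, w, sides = 1))

# Centered window: each average is placed at the middle observation
centered <- as.numeric(stats::filter(x, w, sides = 2))

trailing   # NA values at the first two positions
centered   # NA values at the first and last positions
```

Both versions contain the same averages; alignment only changes which date each average is attached to.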

The values from April 7 to April 12 show an increasing trend, rising from 22.57143 to 28.14286. The rolling average peaks at 34.00000 on April 18 and then declines to 19.28571 by April 30.
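The first two non-missing values can be checked by hand. The sketch below uses only base R and the first eight temperatures from the dataset; the variable names avg_apr7 and avg_apr8 are illustrative.

```r
temperatures <- c(18, 20, 22, 23, 24, 25, 26, 27)  # April 1 to April 8

# 7-day average ending April 7: the first seven observations
avg_apr7 <- mean(temperatures[1:7])
avg_apr7            # 22.57143

# Sliding one day forward drops April 1 (18) and adds April 8 (27)
avg_apr8 <- avg_apr7 + (temperatures[8] - temperatures[1]) / 7
avg_apr8            # 23.85714
```

The second step shows why consecutive rolling averages change gradually: each new value differs from the previous one only by the change in the two endpoints divided by the window size.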

Conclusion

Rolling averages are a useful tool in data analysis because they reduce noise and volatility and reveal patterns and trends. In R, analysts can compute them efficiently with functions such as rollmean() to better understand time-series data.