Downsampling
Downsampling decreases the time frequency of the data. It is an aggregation procedure in which observations recorded at a finer granularity are summarized at a coarser one, for example from days to months, from hours to days, or from seconds to hours. Downsampling shrinks the data, by an amount that depends on the target frequency. If D is the size of the original data and D’ is the size of the downsampled data, then D’ < D.
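The size relation D’ < D can be checked on a small made-up series. The sketch below assumes a hypothetical daily series covering the first half of 2023 (not the car-sales data) and uses the month-start alias "MS" so that it behaves the same across pandas versions.

```python
import numpy as np
import pandas as pd

# Hypothetical daily series: one value per day for the first half of 2023.
days = pd.date_range("2023-01-01", "2023-06-30", freq="D")
daily = pd.Series(np.arange(len(days), dtype=float), index=days)

# Downsample from days to months; "MS" labels each bin by its month start.
monthly = daily.resample("MS").mean()

# D is the number of daily rows, D' the number of monthly rows.
print(len(daily), "->", len(monthly))
```

Each monthly value here is the mean of that month's daily values, so one row replaces roughly thirty.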
For example, suppose car sales data records the daily sales value for the first six months, and the task is to predict quarterly sales. Because the data is daily and the target is quarterly, the data must first be downsampled.
The practice dataset used in this implementation is car-sales.csv.
Example:
Python3
# import the pandas library
import pandas as pd

# read the data using the pandas read_csv() function;
# the squeeze=True argument was removed in pandas 2.0, so the
# single data column is squeezed into a Series afterwards
data = pd.read_csv("car-sales.csv", header=0, index_col=0,
                   parse_dates=True).squeeze("columns")

# print the first 6 rows of the dataset
print(data.head(6))
Output:
We can use the quarterly resampling frequency ‘Q’ to aggregate the data by quarter. From pandas 2.2 onward, the alias ‘Q’ is deprecated in favor of ‘QE’ (quarter end).
Python3
# use the resample function to downsample days
# to quarters, taking the mean sales of each quarter
downsampled = data.resample('Q').mean()

# print the downsampled data
print(downsampled)
Output:
Now, this downsampled data can be used for predicting quarterly sales.
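Whether the mean or the sum is the right aggregate depends on what "quarterly sales" should mean: the mean gives the average daily sales within a quarter, while the sum gives the quarter's total. A minimal sketch on made-up data, assuming a constant 10 units sold per day (the series and its values are hypothetical, not taken from car-sales.csv):

```python
import pandas as pd

# Hypothetical daily sales: 10 units every day for the first half of 2023.
days = pd.date_range("2023-01-01", "2023-06-30", freq="D")
daily_sales = pd.Series(10.0, index=days)

# "QS" labels each quarter by its start date and works across pandas versions.
# Compute both aggregates side by side for comparison.
quarterly = daily_sales.resample("QS").agg(["sum", "mean"])

print(quarterly)
```

The two quarters have the same mean but different totals, because the first quarter of 2023 has 90 days and the second has 91.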
How to Resample Time Series Data in Python?
In time series analysis, data consistency is of prime importance, and resampling ensures that the data is distributed at a consistent frequency. Resampling can also offer a different perspective on the data; in other words, it can reveal additional insights depending on the resampling frequency.
resample() function: It is primarily used for time series data.
Syntax:
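# import the pandas library
import pandas as pd

# syntax for the resample function;
# closed=None resolves to 'left' for most frequencies and to 'right'
# for the month-end, quarter-end, year-end and weekly ones
pd.Series.resample(rule, axis=0, closed=None, convention='start', kind=None, offset=None, origin='start_day')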
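Besides rule, two arguments of resample() that often matter in practice are closed, which decides which edge of each bin is inclusive, and label, which decides which edge names the bin. The sketch below illustrates them on a made-up nine-minute series; the timestamps and values are hypothetical.

```python
import pandas as pd

# Hypothetical series: the values 0..8, one per minute.
idx = pd.date_range("2000-01-01", periods=9, freq="min")
series = pd.Series(range(9), index=idx)

# Default for minute-based rules: bins include and are labelled by their left edge.
left = series.resample("3min").sum()

# Same rule, but each bin includes its right edge and is labelled by it.
right = series.resample("3min", closed="right", label="right").sum()

print(left)
print(right)
```

With left-closed bins the nine values fall into three bins; with right-closed bins the first observation sits alone in a bin ending at 00:00, so four bins result.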
Resampling primarily involves changing the time frequency of the original observations. The two popular methods of resampling time series are as follows:
- Upsampling
- Downsampling
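The two directions can be contrasted on one small series. This is a sketch on hypothetical data: three daily values are upsampled to 12-hour steps (the new rows are filled by carrying the last known value forward, which is only one of several possible fill strategies) and downsampled to two-day bins.

```python
import pandas as pd

# Hypothetical daily series with three observations.
idx = pd.date_range("2023-01-01", periods=3, freq="D")
daily = pd.Series([1.0, 2.0, 3.0], index=idx)

# Upsampling: more rows than the original; gaps are forward-filled.
upsampled = daily.resample(pd.Timedelta(hours=12)).ffill()

# Downsampling: fewer rows than the original; values in a bin are summed.
downsampled = daily.resample("2D").sum()

print(upsampled)
print(downsampled)
```

Upsampling has to invent values for the new timestamps, so the fill method is a modelling choice, whereas downsampling only needs an aggregation function.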