Python | Working with date and time using Pandas
Time series data is very common when working with data, and Pandas is a useful tool for handling it.
Pandas provides a set of tools for performing the necessary tasks on date-time data. Let’s walk through them with the examples below.
Working with Dates in Pandas
The date class in Python's datetime module deals with dates in the Gregorian calendar. It accepts three integer arguments: year, month, and day.
Python3
from datetime import date

d = date(2000, 9, 17)
print(d)
print(type(d))
Output:
2000-09-17
<class 'datetime.date'>
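A datetime.date can also be handed to Pandas directly. The sketch below reuses the date from the example above and wraps it in a pd.Timestamp; the variable name ts is illustrative only.

```python
from datetime import date

import pandas as pd

d = date(2000, 9, 17)

# pd.Timestamp accepts a datetime.date; the time part defaults to midnight
ts = pd.Timestamp(d)
print(ts)         # 2000-09-17 00:00:00

# .date() converts back to a plain datetime.date
print(ts.date())  # 2000-09-17
```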
Year, month, and day extraction
Retrieve the year, month, and day components from a Timestamp object.
Python3
import pandas as pd

# Creating a Timestamp object
timestamp = pd.Timestamp('2023-10-04 15:30:00')

# Extracting and printing the year
year = timestamp.year
print(year)

# Extracting and printing the month
month = timestamp.month
print(month)

# Extracting and printing the day
day = timestamp.day
print(day)
Output:
2023
10
4
Weekdays and quarters
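Retrieve the hour and minute, then determine the weekday and quarter associated with the same Timestamp. The weekday() method counts from Monday as 0, so a Wednesday is reported as 2.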
Python3
# Extracting and printing the hour
hour = timestamp.hour
print(hour)

# Extracting and printing the minute
minute = timestamp.minute
print(minute)

# Extracting and printing the weekday (Monday = 0)
weekday = timestamp.weekday()
print(weekday)

# Extracting and printing the quarter
quarter = timestamp.quarter
print(quarter)
Output:
15
30
2
4
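Numeric weekdays are compact but not very readable. As a small sketch on the same timestamp value, the attributes below return names and boundary flags instead of integers.

```python
import pandas as pd

timestamp = pd.Timestamp('2023-10-04 15:30:00')

# dayofweek matches weekday(): Monday = 0
print(timestamp.dayofweek)         # 2

# day_name() and month_name() return readable labels
print(timestamp.day_name())        # Wednesday
print(timestamp.month_name())      # October

# Boolean flags for period boundaries
print(timestamp.is_quarter_start)  # False
print(timestamp.is_month_end)      # False
```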
Working with Time in Pandas
Another class in the datetime module is time, which represents a time of day. It takes integer arguments for the hour, minute, second, and microsecond, and returns a datetime.time object:
Python3
from datetime import time

t = time(12, 50, 12, 40)
print(t)
print(type(t))
Output:
12:50:12.000040
<class 'datetime.time'>
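A date and a time are often needed together. The sketch below, using the same values as the two examples above, joins them with datetime.combine from the standard library and passes the result to pd.Timestamp; the variable names are illustrative.

```python
from datetime import date, datetime, time

import pandas as pd

d = date(2000, 9, 17)
t = time(12, 50, 12, 40)

# datetime.combine joins a date and a time into one datetime
dt = datetime.combine(d, t)

# pd.Timestamp accepts a datetime and keeps the microseconds
ts = pd.Timestamp(dt)
print(ts)         # 2000-09-17 12:50:12.000040
print(ts.time())  # 12:50:12.000040
```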
Time periods and date offsets
Create custom time periods and date offsets for flexible date manipulation.
Python3
# Creating a monthly time period object
time_period = pd.Period('2023-10-04', freq='M')

# Extracting and printing the year of the time period
year = time_period.year
print(year)

# Extracting and printing the month of the time period
month = time_period.month
print(month)

# Extracting and printing the quarter of the time period
quarter = time_period.quarter
print(quarter)

# Creating a date offset object
date_offset = pd.DateOffset(years=2, months=3, days=10)

# Adding the date offset to the Timestamp created earlier
new_timestamp = timestamp + date_offset
print(new_timestamp)
Output:
2023
10
4
2026-01-14 15:30:00
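To make the difference between periods, calendar offsets, and fixed durations more concrete, here is a minimal sketch. The dates are chosen only for illustration.

```python
import pandas as pd

period = pd.Period('2023-10', freq='M')

# A monthly Period spans the whole month
print(period.start_time)       # 2023-10-01 00:00:00
print(period.end_time.date())  # 2023-10-31

# Integer arithmetic moves by whole periods
print(period + 1)              # 2023-11

# DateOffset follows the calendar: one month after January 31
# is clipped to the last valid day of February
jan_31 = pd.Timestamp('2023-01-31')
print(jan_31 + pd.DateOffset(months=1))  # 2023-02-28 00:00:00

# Timedelta, by contrast, adds a fixed duration
print(jan_31 + pd.Timedelta(days=30))    # 2023-03-02 00:00:00
```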
Handling Time Zones
Time zones play a crucial role in date and time data. Pandas provides mechanisms to handle time zones effectively:
- UTC and time zone conversion: Convert between UTC (Coordinated Universal Time) and local time zones.
- Time zone-aware data manipulation: Work with time zone-aware data, ensuring accurate date and time interpretations.
- Custom time zone settings: Specify custom time zone settings for data analysis and visualization.
Python3
import pandas as pd

# Creating a Timestamp object with a specific time zone
timestamp = pd.Timestamp('2023-10-04 15:30:00', tz='America/New_York')
print('Original Timestamp:', timestamp)

# Converting the Timestamp to UTC
utc_timestamp = timestamp.tz_convert('UTC')
print('UTC Timestamp:', utc_timestamp)

# Converting the UTC timestamp back to the original time zone
# (tz_convert is used because the value is already time zone-aware;
# tz_localize only applies to naive values)
original_timestamp = utc_timestamp.tz_convert('America/New_York')
print('Original Timestamp (Back to America/New_York):', original_timestamp)

# Creating a DatetimeIndex with a specific time zone
datetime_index = pd.DatetimeIndex(['2023-10-04', '2023-10-11', '2023-10-18'], tz='Asia/Shanghai')
print('Original DatetimeIndex:', datetime_index)

# Converting the DatetimeIndex to UTC
utc_datetime_index = datetime_index.tz_convert('UTC')
print('UTC DatetimeIndex:', utc_datetime_index)

# Converting the UTC DatetimeIndex back to the original time zone
original_datetime_index = utc_datetime_index.tz_convert('Asia/Shanghai')
print('Original DatetimeIndex (Back to Asia/Shanghai):', original_datetime_index)
Output:
Original Timestamp: 2023-10-04 15:30:00-04:00
UTC Timestamp: 2023-10-04 19:30:00+00:00
Original Timestamp (Back to America/New_York): 2023-10-04 15:30:00-04:00
Original DatetimeIndex: DatetimeIndex(['2023-10-04 00:00:00+08:00', '2023-10-11 00:00:00+08:00',
'2023-10-18 00:00:00+08:00'],
dtype='datetime64[ns, Asia/Shanghai]', freq=None)
UTC DatetimeIndex: DatetimeIndex(['2023-10-03 16:00:00+00:00', '2023-10-10 16:00:00+00:00',
'2023-10-17 16:00:00+00:00'],
dtype='datetime64[ns, UTC]', freq=None)
Original DatetimeIndex (Back to Asia/Shanghai): DatetimeIndex(['2023-10-04 00:00:00+08:00', '2023-10-11 00:00:00+08:00',
'2023-10-18 00:00:00+08:00'],
dtype='datetime64[ns, Asia/Shanghai]', freq=None)
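The same operations are available on a Series through the .dt accessor. The sketch below uses made-up sample values to show the difference between tz_localize, which attaches a zone to naive data, and tz_convert, which shifts data that already has a zone.

```python
import pandas as pd

# A naive Series: no time zone attached yet
naive = pd.Series(pd.to_datetime(['2023-10-04 15:30:00', '2023-10-11 09:00:00']))

# tz_localize attaches a zone without shifting the clock time
new_york = naive.dt.tz_localize('America/New_York')

# tz_convert shifts time zone-aware values to another zone
utc = new_york.dt.tz_convert('UTC')
print(utc.iloc[0])  # 2023-10-04 19:30:00+00:00

# Calling tz_localize with a zone on already-aware data raises TypeError
try:
    new_york.dt.tz_localize('UTC')
except TypeError as exc:
    print('TypeError:', exc)

# tz_localize(None) drops the zone and keeps the local clock time
print(new_york.dt.tz_localize(None).iloc[0])  # 2023-10-04 15:30:00
```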
Working with Date and Time in Pandas
Pandas provides convenient ways to extract specific date and time components from Timestamp objects and datetime columns. The steps below walk through them.
Step-1: Create a range of hourly dates
Python3
import pandas as pd

# Create a DatetimeIndex of 10 hourly timestamps
# (pandas 2.2 and later deprecate the uppercase alias 'H' in favor of 'h')
data = pd.date_range('1/1/2011', periods=10, freq='H')
data
Output:
DatetimeIndex(['2011-01-01 00:00:00', '2011-01-01 01:00:00',
'2011-01-01 02:00:00', '2011-01-01 03:00:00',
'2011-01-01 04:00:00', '2011-01-01 05:00:00',
'2011-01-01 06:00:00', '2011-01-01 07:00:00',
'2011-01-01 08:00:00', '2011-01-01 09:00:00'],
dtype='datetime64[ns]', freq='H')
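date_range is not limited to hourly steps or to a fixed number of periods. As a rough sketch, the calls below use a start and end date, a month-start frequency, and a frequency with a multiplier; the dates are arbitrary.

```python
import pandas as pd

# Start and end instead of a number of periods (both ends are included)
daily = pd.date_range(start='2011-01-01', end='2011-01-10', freq='D')
print(len(daily))  # 10

# Month-start frequency
monthly = pd.date_range('2011-01-01', periods=3, freq='MS')
print(list(monthly.strftime('%Y-%m-%d')))  # ['2011-01-01', '2011-02-01', '2011-03-01']

# A frequency with a multiplier: every 15 minutes
quarter_hours = pd.date_range('2011-01-01', periods=4, freq='15min')
print(quarter_hours[-1])  # 2011-01-01 00:45:00
```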
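Step-2: Create a range of dates and show basic features of the current date
Python3
# Create a range of hourly dates
data = pd.date_range('1/1/2011', periods=10, freq='H')

# Get the current date and time
# (pd.datetime is no longer available in current pandas releases)
x = pd.Timestamp.now()
x.month, x.year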
Output:
(9, 2018)
Datetime features can be divided into two categories. The first is the position of a moment within a period, such as the hour of the day or the month of the year. The second is the time elapsed since a particular moment. Both can be very useful for understanding patterns in the data.
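Both categories can be sketched on a small example. The column names hour_of_day, day_of_week, and hours_elapsed are illustrative, and the first row is assumed to be the reference moment for the elapsed time.

```python
import pandas as pd

df = pd.DataFrame({'date': pd.date_range('2011-01-01 22:00', periods=4, freq='60min')})

# Category 1: position of the moment within a period
df['hour_of_day'] = df['date'].dt.hour
df['day_of_week'] = df['date'].dt.dayofweek  # Monday = 0

# Category 2: time elapsed since a reference moment (here, the first row)
elapsed = df['date'] - df['date'].iloc[0]
df['hours_elapsed'] = elapsed.dt.total_seconds() / 3600

print(df)
```

The hour of day wraps around at midnight (22, 23, 0, 1), while the elapsed hours keep increasing (0, 1, 2, 3).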
Step-3: Divide a given date into features –
pandas.Series.dt.year returns the year of the date time.
pandas.Series.dt.month returns the month of the date time.
pandas.Series.dt.day returns the day of the date time.
pandas.Series.dt.hour returns the hour of the date time.
pandas.Series.dt.minute returns the minute of the date time.
For the full list of datetime properties, refer to the pandas documentation for the Series.dt accessor.
Break date and time into separate features
Python3
# Create a DataFrame with a column of hourly dates
rng = pd.DataFrame()
rng['date'] = pd.date_range('1/1/2011', periods=72, freq='H')

# Show the first five rows
rng[:5]

# Create features for year, month, day, hour, and minute
rng['year'] = rng['date'].dt.year
rng['month'] = rng['date'].dt.month
rng['day'] = rng['date'].dt.day
rng['hour'] = rng['date'].dt.hour
rng['minute'] = rng['date'].dt.minute

# Show the first three rows with the new feature columns
rng.head(3)
Output:
                 date  year  month  day  hour  minute
0 2011-01-01 00:00:00  2011      1    1     0       0
1 2011-01-01 01:00:00  2011      1    1     1       0
2 2011-01-01 02:00:00  2011      1    1     2       0
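The .dt accessor can also produce formatted strings and derived flags. The following sketch formats dates as dd-mm-yy strings and adds a weekend flag; the column names formatted, day_name, and is_weekend are illustrative.

```python
import pandas as pd

rng = pd.DataFrame({'date': pd.date_range('2011-01-01', periods=3, freq='D')})

# Format each date as a dd-mm-yy string
rng['formatted'] = rng['date'].dt.strftime('%d-%m-%y')

# Day name and a weekend flag (Saturday = 5, Sunday = 6)
rng['day_name'] = rng['date'].dt.day_name()
rng['is_weekend'] = rng['date'].dt.dayofweek >= 5

print(rng)
```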
Step-4: To get the present time, use Timestamp.now(). The Timestamp can then be converted to a Python datetime, and its year, month, or day can be accessed directly.
Python3
# Get the present date and time as a Timestamp
t = pd.Timestamp.now()
t
Output:
Timestamp('2018-09-18 17:18:49.101496')
Python3
# Convert the Timestamp to a Python datetime
t.to_pydatetime()
Output:
datetime.datetime(2018, 9, 18, 17, 18, 49, 101496)
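The conversion works in both directions. The sketch below uses a fixed Timestamp equal to the value shown above, so the result is reproducible; to_pydatetime() is the method name in current pandas releases.

```python
from datetime import datetime

import pandas as pd

ts = pd.Timestamp('2018-09-18 17:18:49.101496')

# Timestamp -> Python datetime
py_dt = ts.to_pydatetime()
print(repr(py_dt))  # datetime.datetime(2018, 9, 18, 17, 18, 49, 101496)

# Timestamp -> NumPy datetime64
np_dt = ts.to_datetime64()
print(type(np_dt))

# Python datetime -> Timestamp
print(pd.Timestamp(py_dt) == ts)  # True
```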
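Step-5: Extract specific components of the Timestamp, such as the year, month, day, hour, minute, and second, for further analysis. The values below correspond to the Timestamp shown in Step-4.
Python3
# Directly access and print the features
print(t.year)
print(t.month)
print(t.day)
print(t.hour)
print(t.minute)
print(t.second)
Output:
2018
9
18
17
18
49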
Exploring UFO Sightings Over Time
Let’s apply these techniques to a real dataset, uforeports.
Python3
import pandas as pd

url = 'http://bit.ly/uforeports'

# Read the CSV file
df = pd.read_csv(url)
df.head()
Output:
                   City  Colors Reported  Shape Reported  State             Time
0                Ithaca              NaN        TRIANGLE     NY   6/1/1930 22:00
1           Willingboro              NaN           OTHER     NJ  6/30/1930 20:00
2               Holyoke              NaN            OVAL     CO  2/15/1931 14:00
3               Abilene              NaN            DISK     KS   6/1/1931 13:00
4  New York Worlds Fair              NaN           LIGHT     NY  4/18/1933 19:00
The following code converts the Time column of the DataFrame to the datetime format.
Python3
# Convert the Time column to datetime format
df['Time'] = pd.to_datetime(df.Time)
df.head()
Output:
City Colors Reported Shape Reported State \
0 Ithaca NaN TRIANGLE NY
1 Willingboro NaN OTHER NJ
2 Holyoke NaN OVAL CO
3 Abilene NaN DISK KS
4 New York Worlds Fair NaN LIGHT NY
Time
0 1930-06-01 22:00:00
1 1930-06-30 20:00:00
2 1931-02-15 14:00:00
3 1931-06-01 13:00:00
4 1933-04-18 19:00:00
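When the input format is known, it can be passed explicitly. The sketch below uses two values from the table above plus a hypothetical bad entry ('unknown', not part of the dataset) to show how errors='coerce' handles values that cannot be parsed.

```python
import pandas as pd

# 'unknown' is a made-up value added for illustration
raw = pd.Series(['6/1/1930 22:00', '6/30/1930 20:00', 'unknown'])

# An explicit format avoids month/day ambiguity;
# errors='coerce' turns unparseable values into NaT instead of raising
parsed = pd.to_datetime(raw, format='%m/%d/%Y %H:%M', errors='coerce')
print(parsed)

# Count the values that could not be parsed
print(parsed.isna().sum())  # 1
```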
The following code displays the data type of each column in the DataFrame.
Python3
# Show the data type of each column
df.dtypes
Output:
City object
Colors Reported object
Shape Reported object
State object
Time datetime64[ns]
dtype: object
The following code extracts the hour from the Time column.
Python3
# Get the hour from the time data
df.Time.dt.hour.head()
Output:
0 22
1 20
2 14
3 13
4 19
Name: Time, dtype: int64
The following code retrieves the weekday name for each value in the Time column.
Python3
# Get the weekday name of each date
# (day_name() replaces the weekday_name attribute, which was removed from pandas)
df.Time.dt.day_name().head()
Output:
0 Sunday
1 Monday
2 Sunday
3 Monday
4 Tuesday
Name: Time, dtype: object
The following code retrieves the ordinal day of the year for each value in the Time column.
Python3
# Get the ordinal day of the year
df.Time.dt.dayofyear.head()
Output:
0 152
1 181
2 46
3 152
4 108
Name: Time, dtype: int64
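Once the column is in datetime format, it can also be used for counting and filtering. The sketch below rebuilds only the five rows shown in the head() output above, so it runs without downloading the file; per_year is an illustrative name.

```python
import pandas as pd

# The five rows shown in the head() output above
df = pd.DataFrame({
    'City': ['Ithaca', 'Willingboro', 'Holyoke', 'Abilene', 'New York Worlds Fair'],
    'Time': pd.to_datetime(
        ['6/1/1930 22:00', '6/30/1930 20:00', '2/15/1931 14:00',
         '6/1/1931 13:00', '4/18/1933 19:00'],
        format='%m/%d/%Y %H:%M',
    ),
})

# Count sightings per year
per_year = df['Time'].dt.year.value_counts().sort_index()
print(per_year)

# Filter rows by comparing the column against a Timestamp
recent = df[df['Time'] >= pd.Timestamp('1931-01-01')]
print(recent['City'].tolist())  # ['Holyoke', 'Abilene', 'New York Worlds Fair']
```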
Create a visualization to explore the frequency of UFO sightings by hour of the day.
Python3
import matplotlib.pyplot as plt

# Convert the 'Time' column to datetime format
df['Time'] = pd.to_datetime(df.Time)

# Extract the hour of the day from the 'Time' column
df['Hour'] = df['Time'].dt.hour

# Create a histogram to visualize UFO sightings by hour
plt.figure(figsize=(10, 6))
plt.hist(df['Hour'], bins=24, range=(0, 24), edgecolor='black', alpha=0.7)
plt.xlabel('Hour of the Day')
plt.ylabel('Number of UFO Sightings')
plt.title('UFO Sightings by Hour of the Day')
plt.xticks(range(0, 25))
plt.grid(True)
plt.show()
Output:
[Figure: histogram titled "UFO Sightings by Hour of the Day"; x-axis: Hour of the Day, y-axis: Number of UFO Sightings]
Conclusion
Working with date and time data is an essential skill for data analysts and scientists. Pandas provides a comprehensive set of tools and techniques for effectively handling date and time information, enabling insightful analysis of time-dependent data. By mastering these techniques, you can gain valuable insights from time series data and make informed decisions in various domains.