Lesson 6 - DateTime Fundamentals
Working with Dates and Times
You can perform operations on numerical data. Now you will learn to work with dates and times—essential skills for analyzing time-series data, tracking trends, and handling temporal information.
By the end of this lesson, you will be able to:
- Convert strings to datetime objects with
pd.to_datetime() - Extract date components (year, month, day, weekday)
- Perform date arithmetic and calculate time differences
- Filter data by date ranges
- Format datetime objects for display
- Handle different date formats
Working with dates is crucial for financial data, sales analysis, web analytics, and any time-based measurements. Let’s master these fundamental skills.
Why DateTime Matters
Dates and times appear everywhere in data analysis:
- Financial data: Stock prices, transaction timestamps
- Sales data: Order dates, delivery dates, seasonal trends
- Web analytics: User activity, session timestamps
- Scientific data: Measurement timestamps, experiment dates
Pandas provides powerful tools for working with temporal data through specialized datetime types and methods.
Converting Strings to DateTime
Most datasets store dates as strings. You need to convert them to proper datetime objects to unlock date-specific operations.
Creating Sample Data
import pandas as pd
import numpy as np
# Create sample sales data with date strings
sales = pd.DataFrame({
'order_id': [1001, 1002, 1003, 1004, 1005, 1006, 1007, 1008],
'customer': ['Ali', 'Sara', 'Reza', 'Maryam', 'Hassan', 'Zahra', 'Fatima', 'Leila'],
'order_date': ['2024-01-15', '2024-01-20', '2024-02-03', '2024-02-14',
'2024-03-01', '2024-03-15', '2024-04-05', '2024-04-20'],
'delivery_date': ['2024-01-20', '2024-01-25', '2024-02-10', '2024-02-19',
'2024-03-08', '2024-03-22', '2024-04-12', '2024-04-27'],
'amount': [1200, 850, 2300, 1500, 980, 1750, 3200, 1100]
})
print("Sales Data:")
print(sales)
print(f"\nData types:\n{sales.dtypes}")Output shows dates are stored as object (string) type, not datetime.
Using pd.to_datetime()
# Convert date columns to datetime
sales['order_date'] = pd.to_datetime(sales['order_date'])
sales['delivery_date'] = pd.to_datetime(sales['delivery_date'])
print("After conversion:")
print(f"Data types:\n{sales.dtypes}")
print(f"\nFirst order date: {sales['order_date'].iloc[0]}")
print(f"Type: {type(sales['order_date'].iloc[0])}")Output:
Data types:
order_id int64
customer object
order_date datetime64[ns]
delivery_date datetime64[ns]
amount int64
dtype: object
First order date: 2024-01-15 00:00:00
Type: <class 'pandas._libs.tslibs.timestamps.Timestamp'>Now the dates are proper datetime objects!
Handling Different Date Formats
Pandas automatically parses many date formats:
# Different date formats
date_examples = pd.Series([
'2024-01-15', # ISO format
'15/01/2024', # European format
'Jan 15, 2024', # Text month
'15-Jan-2024', # Day-Month-Year
'2024-01-15 14:30:00' # With time
])
print("Different formats:")
print(date_examples)
# Pandas parses automatically
parsed_dates = pd.to_datetime(date_examples)
print("\nAll parsed to datetime:")
print(parsed_dates)For faster parsing when you know the format, specify it explicitly:
# Specify format for better performance
# %Y=year, %m=month, %d=day, %H=hour, %M=minute, %S=second
date_str = '15-Jan-2024 14:30:00'
parsed = pd.to_datetime(date_str, format='%d-%b-%Y %H:%M:%S')
print(f"Parsed with format: {parsed}")Extracting Date Components
Once you have datetime objects, use the .dt accessor to extract components.
The .dt Accessor
# Extract date components
sales['year'] = sales['order_date'].dt.year
sales['month'] = sales['order_date'].dt.month
sales['day'] = sales['order_date'].dt.day
sales['day_of_week'] = sales['order_date'].dt.day_name()
sales['month_name'] = sales['order_date'].dt.month_name()
print("Sales with date components:")
print(sales[['customer', 'order_date', 'year', 'month', 'day', 'day_of_week', 'month_name']])Output:
customer order_date year month day day_of_week month_name
0 Ali 2024-01-15 2024 1 15 Monday January
1 Sara 2024-01-20 2024 1 20 Saturday January
2 Reza 2024-02-03 2024 2 3 Saturday February
...More Date Components
# Additional components
sales['quarter'] = sales['order_date'].dt.quarter
sales['week_of_year'] = sales['order_date'].dt.isocalendar().week
sales['day_of_year'] = sales['order_date'].dt.dayofyear
sales['is_weekend'] = sales['order_date'].dt.dayofweek >= 5 # Saturday=5, Sunday=6
print("Additional date components:")
print(sales[['customer', 'order_date', 'quarter', 'week_of_year', 'day_of_year', 'is_weekend']])Output shows quarter (1-4), week number, day of year (1-365/366), and weekend flag.
Date Arithmetic
DateTime objects support arithmetic operations for calculating time differences and adding/subtracting periods.
Calculate Time Differences
Subtract datetime objects to get time differences (timedelta):
# Calculate delivery time in days
sales['delivery_days'] = (sales['delivery_date'] - sales['order_date']).dt.days
print("Delivery time calculation:")
print(sales[['customer', 'order_date', 'delivery_date', 'delivery_days']])
print(f"\nAverage delivery time: {sales['delivery_days'].mean():.1f} days")
print(f"Fastest delivery: {sales['delivery_days'].min()} days")
print(f"Slowest delivery: {sales['delivery_days'].max()} days")Output:
customer order_date delivery_date delivery_days
0 Ali 2024-01-15 2024-01-20 5
1 Sara 2024-01-20 2024-01-25 5
2 Reza 2024-02-03 2024-02-10 7
...
Average delivery time: 6.1 days
Fastest delivery: 5 days
Slowest delivery: 7 daysAdd or Subtract Time Periods
# Calculate follow-up date (30 days after order)
sales['followup_date'] = sales['order_date'] + pd.Timedelta(days=30)
print("Follow-up dates (30 days after order):")
print(sales[['customer', 'order_date', 'followup_date']])Different time periods:
# Various time operations
today = pd.Timestamp('2024-05-01')
print(f"Today: {today}")
print(f"Tomorrow: {today + pd.Timedelta(days=1)}")
print(f"Next week: {today + pd.Timedelta(weeks=1)}")
print(f"Next month: {today + pd.DateOffset(months=1)}")
print(f"One year ago: {today - pd.DateOffset(years=1)}")Output:
Today: 2024-05-01 00:00:00
Tomorrow: 2024-05-02 00:00:00
Next week: 2024-05-08 00:00:00
Next month: 2024-06-01 00:00:00
One year ago: 2023-05-01 00:00:00Days Since Reference Date
# Set a reference date
reference_date = pd.Timestamp('2024-05-01')
# Calculate days since order
sales['days_since_order'] = (reference_date - sales['order_date']).dt.days
print(f"Days since order (as of {reference_date.date()}):")
print(sales[['customer', 'order_date', 'days_since_order']])Filtering by Date
Filter DataFrames using date conditions just like numeric filtering.
Filter by Specific Date
# Orders after February 1, 2024
after_feb = sales[sales['order_date'] > '2024-02-01']
print("Orders after February 1, 2024:")
print(after_feb[['customer', 'order_date', 'amount']])Filter by Date Range
# Q1 2024 orders (January - March)
q1_orders = sales[
(sales['order_date'] >= '2024-01-01') &
(sales['order_date'] < '2024-04-01')
]
print("Q1 2024 Orders:")
print(q1_orders[['customer', 'order_date', 'amount']])
print(f"\nQ1 Total Revenue: ${q1_orders['amount'].sum():,}")Using .between():
# March orders
march_orders = sales[sales['order_date'].between('2024-03-01', '2024-03-31')]
print("March 2024 Orders:")
print(march_orders[['customer', 'order_date', 'amount']])Filter by Month or Year
# All February orders
february_orders = sales[sales['order_date'].dt.month == 2]
print("All February orders:")
print(february_orders[['customer', 'order_date', 'amount']])
# Orders on Mondays
monday_orders = sales[sales['order_date'].dt.day_name() == 'Monday']
print(f"\nOrders placed on Mondays: {len(monday_orders)}")Formatting Dates for Display
Use .dt.strftime() to format dates as strings in custom formats.
Custom Date Formats
# Different date formats
sales['formatted_date'] = sales['order_date'].dt.strftime('%d/%m/%Y')
sales['long_format'] = sales['order_date'].dt.strftime('%B %d, %Y') # January 15, 2024
sales['short_format'] = sales['order_date'].dt.strftime('%d-%b-%y') # 15-Jan-24
print("Different date formats:")
print(sales[['order_date', 'formatted_date', 'long_format', 'short_format']].head())Output:
order_date formatted_date long_format short_format
0 2024-01-15 15/01/2024 January 15, 2024 15-Jan-24
1 2024-01-20 20/01/2024 January 20, 2024 20-Jan-24
...Common Format Codes
| Code | Meaning | Example |
|---|---|---|
%Y | 4-digit year | 2024 |
%y | 2-digit year | 24 |
%m | Month (01-12) | 03 |
%B | Full month name | March |
%b | Month abbreviation | Mar |
%d | Day (01-31) | 15 |
%A | Full day name | Monday |
%a | Day abbreviation | Mon |
%H | Hour (00-23) | 14 |
%M | Minute (00-59) | 30 |
%S | Second (00-59) | 45 |
Practice Exercises
Apply datetime operations with these exercises.
Exercise 1: Employee Tenure Analysis
Given employee hire dates, calculate years of service.
employees = pd.DataFrame({
'name': ['Ali', 'Sara', 'Reza', 'Maryam', 'Hassan'],
'hire_date': ['2020-03-15', '2019-07-01', '2021-01-10', '2018-11-20', '2022-05-03'],
'department': ['IT', 'Sales', 'IT', 'HR', 'Sales']
})Tasks:
- Convert hire_date to datetime
- Calculate years of service (from 2024-05-01)
- Find who was hired in 2020
- Determine the longest-serving employee
# Your code hereExercise 2: Project Deadline Analysis
Track project deadlines and identify overdue projects.
Tasks:
- Create a DataFrame with projects and deadlines
- Add a column showing days until deadline (from 2024-04-15)
- Identify overdue projects
- Find projects due in April 2024
# Your code hereHint
A negative “days until deadline” value means the project is overdue!
Exercise 3: Sales Trends by Day of Week
Using the sales DataFrame, analyze which days generate the most revenue.
Tasks:
- Group by day of week
- Calculate total revenue per day
- Find the best day for sales
# Your code hereSummary
You now master datetime operations in pandas. Let’s review the key concepts.
Key Concepts
Converting to DateTime
- Always convert date strings with
pd.to_datetime() - Pandas auto-detects most formats
- Specify format for better performance
Extracting Components
- Use
.dtaccessor for date/time parts - Extract year, month, day, weekday, quarter, etc.
- Create derived features like is_weekend
Date Arithmetic
- Subtract dates to get time differences
- Use
.dt.daysto get days from timedelta - Add/subtract periods with
pd.Timedeltaorpd.DateOffset
Filtering by Date
- Filter dates like numbers with
>,<,.between() - Filter by month/year using extracted components
- Combine date conditions with
&and|
Syntax Reference
# Convert to datetime
df['date'] = pd.to_datetime(df['date'])
pd.to_datetime(df['date'], format='%Y-%m-%d')
# Extract components
df['year'] = df['date'].dt.year
df['month'] = df['date'].dt.month
df['day'] = df['date'].dt.day
df['day_name'] = df['date'].dt.day_name()
df['month_name'] = df['date'].dt.month_name()
df['quarter'] = df['date'].dt.quarter
df['dayofweek'] = df['date'].dt.dayofweek # 0=Monday, 6=Sunday
# Date arithmetic
df['days_diff'] = (df['end'] - df['start']).dt.days
df['future'] = df['date'] + pd.Timedelta(days=30)
df['past'] = df['date'] - pd.DateOffset(months=1)
# Filtering
df[df['date'] > '2024-01-01']
df[df['date'].between('2024-01-01', '2024-12-31')]
df[df['date'].dt.month == 3]
df[df['date'].dt.year == 2024]
# Formatting
df['formatted'] = df['date'].dt.strftime('%d/%m/%Y')
df['long_fmt'] = df['date'].dt.strftime('%B %d, %Y')Important Reminders
- Convert first: Use
pd.to_datetime()before any date operations - Use .dt accessor: Required for extracting date components from Series
- Timedelta for differences: Subtract dates, then use
.dt.days - Timedelta vs DateOffset: Use Timedelta for fixed periods, DateOffset for variable periods (months)
- Filter naturally: Date filtering works like numeric filtering
- Format for display: Use
.dt.strftime()for custom date strings
Next Steps
You can now work with dates and times effectively. In the next lesson, you will learn Boolean filtering—applying NumPy-style filtering to DataFrames for powerful data selection.
Continue to Lesson 7 - Boolean Filtering
Learn to filter DataFrame rows based on conditions
Back to Lesson 5 - Series Operations & Value Counts
Review Series arithmetic and categorical analysis
Master Temporal Data Analysis
DateTime operations are essential for time-series analysis, trend detection, and temporal filtering. You now have the skills to convert, extract, calculate, and filter dates effectively.
Use these datetime skills to unlock time-based insights in your data!