Lesson 6 - DateTime Fundamentals

Working with Dates and Times

You can perform operations on numerical data. Now you will learn to work with dates and times—essential skills for analyzing time-series data, tracking trends, and handling temporal information.

By the end of this lesson, you will be able to:

  • Convert strings to datetime objects with pd.to_datetime()
  • Extract date components (year, month, day, weekday)
  • Perform date arithmetic and calculate time differences
  • Filter data by date ranges
  • Format datetime objects for display
  • Handle different date formats

Working with dates is crucial for financial data, sales analysis, web analytics, and any time-based measurements. Let’s master these fundamental skills.


Why DateTime Matters

Dates and times appear everywhere in data analysis:

  • Financial data: Stock prices, transaction timestamps
  • Sales data: Order dates, delivery dates, seasonal trends
  • Web analytics: User activity, session timestamps
  • Scientific data: Measurement timestamps, experiment dates

Pandas provides powerful tools for working with temporal data through specialized datetime types and methods.


Converting Strings to DateTime

Most datasets store dates as strings. You need to convert them to proper datetime objects to unlock date-specific operations.

Creating Sample Data

import pandas as pd
import numpy as np

# Create sample sales data with date strings
sales = pd.DataFrame({
    'order_id': [1001, 1002, 1003, 1004, 1005, 1006, 1007, 1008],
    'customer': ['Ali', 'Sara', 'Reza', 'Maryam', 'Hassan', 'Zahra', 'Fatima', 'Leila'],
    'order_date': ['2024-01-15', '2024-01-20', '2024-02-03', '2024-02-14',
                   '2024-03-01', '2024-03-15', '2024-04-05', '2024-04-20'],
    'delivery_date': ['2024-01-20', '2024-01-25', '2024-02-10', '2024-02-19',
                      '2024-03-08', '2024-03-22', '2024-04-12', '2024-04-27'],
    'amount': [1200, 850, 2300, 1500, 980, 1750, 3200, 1100]
})

print("Sales Data:")
print(sales)
print(f"\nData types:\n{sales.dtypes}")

Output shows dates are stored as object (string) type, not datetime.

Using pd.to_datetime()

# Convert date columns to datetime
sales['order_date'] = pd.to_datetime(sales['order_date'])
sales['delivery_date'] = pd.to_datetime(sales['delivery_date'])

print("After conversion:")
print(f"Data types:\n{sales.dtypes}")
print(f"\nFirst order date: {sales['order_date'].iloc[0]}")
print(f"Type: {type(sales['order_date'].iloc[0])}")

Output:

Data types:
order_id                int64
customer               object
order_date     datetime64[ns]
delivery_date  datetime64[ns]
amount                  int64
dtype: object

First order date: 2024-01-15 00:00:00
Type: <class 'pandas._libs.tslibs.timestamps.Timestamp'>

Now the dates are proper datetime objects!

Handling Different Date Formats

Pandas automatically parses many date formats:

# Different date formats
date_examples = pd.Series([
    '2024-01-15',           # ISO format
    '15/01/2024',           # European format
    'Jan 15, 2024',         # Text month
    '15-Jan-2024',          # Day-Month-Year
    '2024-01-15 14:30:00'   # With time
])

print("Different formats:")
print(date_examples)

# Pandas parses automatically
parsed_dates = pd.to_datetime(date_examples)
print("\nAll parsed to datetime:")
print(parsed_dates)

For faster parsing when you know the format, specify it explicitly:

# Specify format for better performance
# %Y=year, %m=month, %d=day, %H=hour, %M=minute, %S=second
date_str = '15-Jan-2024 14:30:00'
parsed = pd.to_datetime(date_str, format='%d-%b-%Y %H:%M:%S')

print(f"Parsed with format: {parsed}")

Extracting Date Components

Once you have datetime objects, use the .dt accessor to extract components.

The .dt Accessor

# Extract date components
sales['year'] = sales['order_date'].dt.year
sales['month'] = sales['order_date'].dt.month
sales['day'] = sales['order_date'].dt.day
sales['day_of_week'] = sales['order_date'].dt.day_name()
sales['month_name'] = sales['order_date'].dt.month_name()

print("Sales with date components:")
print(sales[['customer', 'order_date', 'year', 'month', 'day', 'day_of_week', 'month_name']])

Output:

  customer order_date  year  month  day day_of_week month_name
0      Ali 2024-01-15  2024      1   15      Monday    January
1     Sara 2024-01-20  2024      1   20    Saturday    January
2     Reza 2024-02-03  2024      2    3    Saturday   February
...

More Date Components

# Additional components
sales['quarter'] = sales['order_date'].dt.quarter
sales['week_of_year'] = sales['order_date'].dt.isocalendar().week
sales['day_of_year'] = sales['order_date'].dt.dayofyear
sales['is_weekend'] = sales['order_date'].dt.dayofweek >= 5  # Saturday=5, Sunday=6

print("Additional date components:")
print(sales[['customer', 'order_date', 'quarter', 'week_of_year', 'day_of_year', 'is_weekend']])

Output shows quarter (1-4), week number, day of year (1-365/366), and weekend flag.


Date Arithmetic

DateTime objects support arithmetic operations for calculating time differences and adding/subtracting periods.

Calculate Time Differences

Subtract datetime objects to get time differences (timedelta):

# Calculate delivery time in days
sales['delivery_days'] = (sales['delivery_date'] - sales['order_date']).dt.days

print("Delivery time calculation:")
print(sales[['customer', 'order_date', 'delivery_date', 'delivery_days']])

print(f"\nAverage delivery time: {sales['delivery_days'].mean():.1f} days")
print(f"Fastest delivery: {sales['delivery_days'].min()} days")
print(f"Slowest delivery: {sales['delivery_days'].max()} days")

Output:

  customer order_date delivery_date  delivery_days
0      Ali 2024-01-15    2024-01-20              5
1     Sara 2024-01-20    2024-01-25              5
2     Reza 2024-02-03    2024-02-10              7
...

Average delivery time: 6.1 days
Fastest delivery: 5 days
Slowest delivery: 7 days

Add or Subtract Time Periods

# Calculate follow-up date (30 days after order)
sales['followup_date'] = sales['order_date'] + pd.Timedelta(days=30)

print("Follow-up dates (30 days after order):")
print(sales[['customer', 'order_date', 'followup_date']])

Different time periods:

# Various time operations
today = pd.Timestamp('2024-05-01')

print(f"Today: {today}")
print(f"Tomorrow: {today + pd.Timedelta(days=1)}")
print(f"Next week: {today + pd.Timedelta(weeks=1)}")
print(f"Next month: {today + pd.DateOffset(months=1)}")
print(f"One year ago: {today - pd.DateOffset(years=1)}")

Output:

Today: 2024-05-01 00:00:00
Tomorrow: 2024-05-02 00:00:00
Next week: 2024-05-08 00:00:00
Next month: 2024-06-01 00:00:00
One year ago: 2023-05-01 00:00:00

Days Since Reference Date

# Set a reference date
reference_date = pd.Timestamp('2024-05-01')

# Calculate days since order
sales['days_since_order'] = (reference_date - sales['order_date']).dt.days

print(f"Days since order (as of {reference_date.date()}):")
print(sales[['customer', 'order_date', 'days_since_order']])

Filtering by Date

Filter DataFrames using date conditions just like numeric filtering.

Filter by Specific Date

# Orders after February 1, 2024
after_feb = sales[sales['order_date'] > '2024-02-01']

print("Orders after February 1, 2024:")
print(after_feb[['customer', 'order_date', 'amount']])

Filter by Date Range

# Q1 2024 orders (January - March)
q1_orders = sales[
    (sales['order_date'] >= '2024-01-01') &
    (sales['order_date'] < '2024-04-01')
]

print("Q1 2024 Orders:")
print(q1_orders[['customer', 'order_date', 'amount']])
print(f"\nQ1 Total Revenue: ${q1_orders['amount'].sum():,}")

Using .between():

# March orders
march_orders = sales[sales['order_date'].between('2024-03-01', '2024-03-31')]

print("March 2024 Orders:")
print(march_orders[['customer', 'order_date', 'amount']])

Filter by Month or Year

# All February orders
february_orders = sales[sales['order_date'].dt.month == 2]

print("All February orders:")
print(february_orders[['customer', 'order_date', 'amount']])

# Orders on Mondays
monday_orders = sales[sales['order_date'].dt.day_name() == 'Monday']
print(f"\nOrders placed on Mondays: {len(monday_orders)}")

Formatting Dates for Display

Use .dt.strftime() to format dates as strings in custom formats.

Custom Date Formats

# Different date formats
sales['formatted_date'] = sales['order_date'].dt.strftime('%d/%m/%Y')
sales['long_format'] = sales['order_date'].dt.strftime('%B %d, %Y')  # January 15, 2024
sales['short_format'] = sales['order_date'].dt.strftime('%d-%b-%y')   # 15-Jan-24

print("Different date formats:")
print(sales[['order_date', 'formatted_date', 'long_format', 'short_format']].head())

Output:

  order_date formatted_date      long_format short_format
0 2024-01-15     15/01/2024  January 15, 2024     15-Jan-24
1 2024-01-20     20/01/2024  January 20, 2024     20-Jan-24
...

Common Format Codes

CodeMeaningExample
%Y4-digit year2024
%y2-digit year24
%mMonth (01-12)03
%BFull month nameMarch
%bMonth abbreviationMar
%dDay (01-31)15
%AFull day nameMonday
%aDay abbreviationMon
%HHour (00-23)14
%MMinute (00-59)30
%SSecond (00-59)45

Practice Exercises

Apply datetime operations with these exercises.

Exercise 1: Employee Tenure Analysis

Given employee hire dates, calculate years of service.

employees = pd.DataFrame({
    'name': ['Ali', 'Sara', 'Reza', 'Maryam', 'Hassan'],
    'hire_date': ['2020-03-15', '2019-07-01', '2021-01-10', '2018-11-20', '2022-05-03'],
    'department': ['IT', 'Sales', 'IT', 'HR', 'Sales']
})

Tasks:

  1. Convert hire_date to datetime
  2. Calculate years of service (from 2024-05-01)
  3. Find who was hired in 2020
  4. Determine the longest-serving employee
# Your code here

Exercise 2: Project Deadline Analysis

Track project deadlines and identify overdue projects.

Tasks:

  1. Create a DataFrame with projects and deadlines
  2. Add a column showing days until deadline (from 2024-04-15)
  3. Identify overdue projects
  4. Find projects due in April 2024
# Your code here

Hint

A negative “days until deadline” value means the project is overdue!

Using the sales DataFrame, analyze which days generate the most revenue.

Tasks:

  1. Group by day of week
  2. Calculate total revenue per day
  3. Find the best day for sales
# Your code here

Summary

You now master datetime operations in pandas. Let’s review the key concepts.

Key Concepts

Converting to DateTime

  • Always convert date strings with pd.to_datetime()
  • Pandas auto-detects most formats
  • Specify format for better performance

Extracting Components

  • Use .dt accessor for date/time parts
  • Extract year, month, day, weekday, quarter, etc.
  • Create derived features like is_weekend

Date Arithmetic

  • Subtract dates to get time differences
  • Use .dt.days to get days from timedelta
  • Add/subtract periods with pd.Timedelta or pd.DateOffset

Filtering by Date

  • Filter dates like numbers with >, <, .between()
  • Filter by month/year using extracted components
  • Combine date conditions with & and |

Syntax Reference

# Convert to datetime
df['date'] = pd.to_datetime(df['date'])
pd.to_datetime(df['date'], format='%Y-%m-%d')

# Extract components
df['year'] = df['date'].dt.year
df['month'] = df['date'].dt.month
df['day'] = df['date'].dt.day
df['day_name'] = df['date'].dt.day_name()
df['month_name'] = df['date'].dt.month_name()
df['quarter'] = df['date'].dt.quarter
df['dayofweek'] = df['date'].dt.dayofweek  # 0=Monday, 6=Sunday

# Date arithmetic
df['days_diff'] = (df['end'] - df['start']).dt.days
df['future'] = df['date'] + pd.Timedelta(days=30)
df['past'] = df['date'] - pd.DateOffset(months=1)

# Filtering
df[df['date'] > '2024-01-01']
df[df['date'].between('2024-01-01', '2024-12-31')]
df[df['date'].dt.month == 3]
df[df['date'].dt.year == 2024]

# Formatting
df['formatted'] = df['date'].dt.strftime('%d/%m/%Y')
df['long_fmt'] = df['date'].dt.strftime('%B %d, %Y')

Important Reminders

  • Convert first: Use pd.to_datetime() before any date operations
  • Use .dt accessor: Required for extracting date components from Series
  • Timedelta for differences: Subtract dates, then use .dt.days
  • Timedelta vs DateOffset: Use Timedelta for fixed periods, DateOffset for variable periods (months)
  • Filter naturally: Date filtering works like numeric filtering
  • Format for display: Use .dt.strftime() for custom date strings

Next Steps

You can now work with dates and times effectively. In the next lesson, you will learn Boolean filtering—applying NumPy-style filtering to DataFrames for powerful data selection.

Continue to Lesson 7 - Boolean Filtering

Learn to filter DataFrame rows based on conditions

Back to Lesson 5 - Series Operations & Value Counts

Review Series arithmetic and categorical analysis


Master Temporal Data Analysis

DateTime operations are essential for time-series analysis, trend detection, and temporal filtering. You now have the skills to convert, extract, calculate, and filter dates effectively.

Use these datetime skills to unlock time-based insights in your data!