Pandas: Time Series Analysis

Working with Datetime Data

Time series data is everywhere—from stock prices to climate data—and handling it without Pandas is like trying to read a calendar from ancient Egypt. Let’s make it easier:

  • Converting Strings to Datetime Format
import pandas as pd
df = pd.DataFrame({"date": ["2023-01-01", "2023-02-01", "2023-03-01"]})
df["date"] = pd.to_datetime(df["date"])
print(df.dtypes)

pd.to_datetime() ensures that your dates behave like actual dates instead of stubborn strings.
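Real-world date columns often contain malformed entries. The sketch below, which uses made-up sample strings, shows one way to handle them: passing an explicit format and errors="coerce" so that unparseable values become NaT (pandas' missing-timestamp marker) instead of raising an exception.

```python
import pandas as pd

# Hypothetical raw input: the last entry is not a valid date.
raw = pd.Series(["2023-01-01", "2023-02-01", "not a date"])

# An explicit format avoids ambiguous parsing; errors="coerce" converts
# anything that does not match into NaT instead of raising an error.
parsed = pd.to_datetime(raw, format="%Y-%m-%d", errors="coerce")

print(parsed)
print("Unparseable entries:", parsed.isna().sum())
```

Counting the NaT values afterwards is a quick check on how much of the column failed to parse.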

  • Extracting Date Components
df["year"] = df["date"].dt.year
df["month"] = df["date"].dt.month
df["weekday"] = df["date"].dt.day_name()
print(df.head())

The .dt accessor extracts useful parts of a date, saving you from unnecessary string manipulations.
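Beyond year, month, and weekday name, the .dt accessor exposes several other attributes that are handy for feature engineering. A small sketch on hypothetical sample dates:

```python
import pandas as pd

# Hypothetical sample dates chosen to show different cases.
dates = pd.Series(pd.to_datetime(["2023-01-01", "2023-02-01", "2023-03-31"]))

parts = pd.DataFrame({
    "quarter": dates.dt.quarter,            # 1 to 4
    "day_of_year": dates.dt.dayofyear,      # 1 to 365 (366 in leap years)
    "is_month_end": dates.dt.is_month_end,  # True on the last day of a month
    "is_weekend": dates.dt.dayofweek >= 5,  # Monday=0 ... Sunday=6
})
print(parts)
```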

  • Generating Date Ranges
date_range = pd.date_range(start="2023-01-01", periods=10, freq="D")
print(date_range)

pd.date_range() is useful when you need a regular sequence of dates, for example to build a forecast horizon or a complete calendar against which missing timestamps can be detected.
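The freq argument accepts more than daily steps. As a rough illustration with arbitrary start dates, the following generates business days, month starts, and weekly Mondays:

```python
import pandas as pd

# Business days only (Monday to Friday); weekends are skipped.
business_days = pd.date_range(start="2023-01-02", periods=7, freq="B")

# The first day of every month between two endpoints.
month_starts = pd.date_range(start="2023-01-01", end="2023-06-30", freq="MS")

# Every Monday, starting from the first Monday on or after the start date.
mondays = pd.date_range(start="2023-01-01", periods=4, freq="W-MON")

print(business_days)
print(month_starts)
print(mondays)
```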

Indexing, Resampling, and Rolling Window Calculations

  • Setting Datetime as Index
df.set_index("date", inplace=True)

Time-based indexing makes operations like filtering and resampling a breeze.
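To make the filtering claim concrete, here is a minimal sketch on a hypothetical 90-day series (the ts frame and its "value" column are invented for illustration). With a DatetimeIndex, .loc accepts partial date strings and date slices directly:

```python
import pandas as pd

# Hypothetical daily series: 90 days of integer values indexed by date.
idx = pd.date_range(start="2023-01-01", periods=90, freq="D")
ts = pd.DataFrame({"value": range(90)}, index=idx)

# Partial-string indexing: select every row in February 2023.
february = ts.loc["2023-02"]

# Slice between two dates; both endpoints are included.
mid_january = ts.loc["2023-01-10":"2023-01-15"]

print(len(february), len(mid_january))
```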

  • Resampling Data
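# numeric_only=True skips text columns such as "weekday", which cannot be averaged
# On pandas 2.2+, use "ME" (month end) in place of the deprecated "M" alias
monthly_data = df.resample("M").mean(numeric_only=True)
print(monthly_data)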

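Resampling aggregates data at a different time frequency, for example 'D' for daily, 'M' for monthly, or 'Y' for yearly. In pandas 2.2 and later, the month-end and year-end aliases are spelled 'ME' and 'YE', and the older 'M' and 'Y' spellings are deprecated.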
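A self-contained sketch of resampling on a hypothetical daily series follows. It uses the "MS" (month start) alias, which is spelled the same way across pandas versions, and computes several aggregations in one call:

```python
import pandas as pd

# Hypothetical daily series covering January through March 2023.
idx = pd.date_range(start="2023-01-01", periods=90, freq="D")
ts = pd.DataFrame({"value": range(90)}, index=idx)

# Downsample to one row per month, labelled by the first day of the month.
monthly = ts["value"].resample("MS").agg(["mean", "sum", "count"])

# Downsample to weekly bins and keep the maximum of each week.
weekly_max = ts["value"].resample("W").max()

print(monthly)
print(weekly_max.head())
```

The count column is a useful sanity check: a month with fewer observations than expected usually points to missing days in the source data.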

  • Applying Rolling Statistics
# Assumes df has a numeric "value" column (not created in the earlier snippets)
df["rolling_mean"] = df["value"].rolling(window=3).mean()

Rolling averages smooth out noisy data to reveal trends more clearly.
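One detail worth seeing in practice: a fixed-size window yields NaN until it has enough observations. The sketch below, on a tiny made-up series, compares the default behaviour with min_periods=1 and with a time-based window:

```python
import pandas as pd

# Tiny hypothetical series so the results are easy to verify by hand.
s = pd.Series(
    [1, 2, 3, 4, 5],
    index=pd.date_range(start="2023-01-01", periods=5, freq="D"),
)

# Default: the first window-1 results are NaN.
strict = s.rolling(window=3).mean()

# min_periods=1 averages whatever observations are available so far.
lenient = s.rolling(window=3, min_periods=1).mean()

# Time-based window: everything within the last 3 days of each timestamp.
by_time = s.rolling("3D").mean()

print(pd.DataFrame({"strict": strict, "lenient": lenient, "by_time": by_time}))
```

A time-based window requires a datetime index, and it adapts to irregular spacing, which a fixed row count does not.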

Handling Missing Dates in Time Series Data

  • Identifying Missing Timestamps
print(pd.date_range(start=df.index.min(), end=df.index.max()).difference(df.index))
  • Filling Missing Time-Based Data
df = df.asfreq("D", method="ffill")

asfreq("D") inserts a row for every missing calendar day, and method="ffill" (forward-fill) gives each new row the last known value, so downstream time series models see a regular index without gaps.
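The two steps above can be sketched end to end on a small hypothetical series with a two-day gap. Time-based interpolation is included as an alternative to forward-fill for comparison; which one is appropriate depends on the data.

```python
import pandas as pd

# Hypothetical series with a gap: January 3 and 4 are missing.
idx = pd.to_datetime(["2023-01-01", "2023-01-02", "2023-01-05"])
ts = pd.DataFrame({"value": [10.0, 12.0, 18.0]}, index=idx)

# Compare the observed index against a complete daily calendar.
full_range = pd.date_range(start=ts.index.min(), end=ts.index.max(), freq="D")
missing = full_range.difference(ts.index)
print("Missing dates:", list(missing.strftime("%Y-%m-%d")))

# Option 1: carry the last known value forward.
filled_ffill = ts.asfreq("D", method="ffill")

# Option 2: insert the missing rows as NaN, then interpolate linearly in time.
filled_interp = ts.asfreq("D").interpolate(method="time")

print(filled_ffill)
print(filled_interp)
```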

Analyzing Trends and Seasonality

  • Detecting Trends with Rolling Averages
df["trend"] = df["value"].rolling(window=12).mean()
  • Identifying Seasonality with Visualization
import matplotlib.pyplot as plt
df["value"].plot()
df["trend"].plot()
plt.legend(["Original Data", "Trend"])
plt.show()
  • Decomposing Time Series Data
from statsmodels.tsa.seasonal import seasonal_decompose
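# period=12 assumes a 12-step seasonal cycle (e.g. monthly data); at least 24 observations are required
decomposition = seasonal_decompose(df["value"], model="additive", period=12)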
decomposition.plot()
plt.show()
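To build intuition for what an additive decomposition separates, here is a simplified pandas-only approximation on synthetic monthly data. It is not the statsmodels implementation, and the series, its coefficients, and all variable names are invented for illustration. The idea: estimate the trend with a centered moving average, estimate seasonality as the average detrended value for each calendar month, and treat whatever remains as the residual.

```python
import numpy as np
import pandas as pd

# Synthetic monthly data: a linear trend plus a 12-month sine cycle (no noise).
idx = pd.date_range(start="2020-01-01", periods=48, freq="MS")
t = np.arange(48)
series = pd.Series(100 + 0.5 * t + 10 * np.sin(2 * np.pi * t / 12),
                   index=idx, name="value")

# Trend: a centered 12-month moving average smooths out the yearly cycle.
# The edges are NaN because a full window is not available there.
trend = series.rolling(window=12, center=True).mean()

# Seasonal component: average detrended value for each calendar month,
# broadcast back to every row of that month.
detrended = series - trend
seasonal = detrended.groupby(detrended.index.month).transform("mean")

# Residual: what is left after removing trend and seasonality.
residual = series - trend - seasonal

print(pd.DataFrame({"trend": trend, "seasonal": seasonal,
                    "residual": residual}).dropna().head())
```

Because the synthetic series contains no noise, the residual is essentially zero wherever the trend is defined. On real data, a large or patterned residual suggests that the additive model or the chosen period does not fit well.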

Hands-On Exercise

  1. Convert and Extract Date Information: Load a dataset with date columns, convert to datetime format, and extract components.
  2. Resample and Aggregate Time Series Data: Resample a dataset at different time intervals and compute aggregations.
  3. Apply Rolling Window Calculations: Perform moving average smoothing on time series data.
  4. Handle Missing Dates: Identify and fill missing timestamps in a dataset.
  5. Analyze Trends and Seasonality: Visualize time series trends and decompose data into its components.
