Pandas: Time Series Analysis
Working with Datetime Data
Time series data is everywhere—from stock prices to climate data—and handling it without Pandas is like trying to read a calendar from ancient Egypt. Let’s make it easier:
- Converting Strings to Datetime Format
import pandas as pd
df = pd.DataFrame({"date": ["2023-01-01", "2023-02-01", "2023-03-01"]})
df["date"] = pd.to_datetime(df["date"])
print(df.dtypes)
pd.to_datetime() ensures that your dates behave like actual dates instead of stubborn strings.
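Real-world date columns often contain a few values that don't parse. As a minimal sketch (the `raw` series and its bad entry are made up for illustration), passing an explicit `format` together with `errors="coerce"` converts unparseable entries to `NaT` instead of raising an error:

```python
import pandas as pd

# Hypothetical raw column: the last entry is not a valid date.
raw = pd.Series(["2023-01-01", "2023-02-01", "not a date"])

# An explicit format avoids format guessing; errors="coerce" turns
# anything that does not match into NaT instead of raising.
parsed = pd.to_datetime(raw, format="%Y-%m-%d", errors="coerce")

print(parsed)
print(parsed.isna().sum())  # number of rows that failed to parse
```

Counting the `NaT` values right after conversion is a cheap way to see how much of the column failed to parse before moving on.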
- Extracting Date Components
df["year"] = df["date"].dt.year
df["month"] = df["date"].dt.month
df["weekday"] = df["date"].dt.day_name()
print(df.head())
The .dt accessor extracts useful parts of a date, saving you from unnecessary string manipulation.
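The .dt accessor exposes more than year and month. The sketch below uses a small stand-alone frame called `dates` (a name chosen for this example) to derive a quarter, a month name, and a weekend flag:

```python
import pandas as pd

dates = pd.DataFrame(
    {"date": pd.to_datetime(["2023-01-01", "2023-02-01", "2023-03-01"])}
)

dates["quarter"] = dates["date"].dt.quarter
dates["month_name"] = dates["date"].dt.month_name()
# dayofweek counts from Monday=0 to Sunday=6, so 5 and 6 are the weekend.
dates["is_weekend"] = dates["date"].dt.dayofweek >= 5

print(dates)
```

Boolean columns like `is_weekend` can be used directly as filters, for example `dates[dates["is_weekend"]]`.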
- Generating Date Ranges
date_range = pd.date_range(start="2023-01-01", periods=10, freq="D")
print(date_range)
pd.date_range() is useful when you need a regular sequence of dates, for example to build a forecasting horizon or a complete index to compare against when timestamps are missing.
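The `freq` argument accepts more than plain days. A short sketch with a few other frequency aliases (the specific dates are arbitrary):

```python
import pandas as pd

# Business days (Monday to Friday); weekends are skipped.
business_days = pd.date_range(start="2023-01-02", periods=5, freq="B")

# start + end instead of periods; "W-MON" is weekly, anchored on Mondays.
mondays = pd.date_range(start="2023-01-01", end="2023-01-31", freq="W-MON")

# "MS" gives the first day of each month.
month_starts = pd.date_range(start="2023-01-01", periods=3, freq="MS")

print(business_days)
print(mondays)
print(month_starts)
```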
Indexing, Resampling, and Rolling Window Calculations
- Setting Datetime as Index
df.set_index("date", inplace=True)
Time-based indexing makes operations like filtering and resampling a breeze.
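One payoff of a datetime index is filtering by date strings. The sketch below builds a hypothetical 90-day frame named `ts` and selects rows with `.loc`:

```python
import pandas as pd

# Hypothetical daily data: 90 days starting 2023-01-01.
ts = pd.DataFrame(
    {"value": range(90)},
    index=pd.date_range("2023-01-01", periods=90, freq="D"),
)

# A partial date string selects every row in that month.
february = ts.loc["2023-02"]

# Slicing with date strings includes both endpoints.
mid_january = ts.loc["2023-01-10":"2023-01-15"]

print(len(february), len(mid_january))
```

Unlike integer slicing, label-based date slices are inclusive on both ends, which is easy to forget when counting rows.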
- Resampling Data
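monthly_data = df.resample("M").mean(numeric_only=True)  # skip text columns such as "weekday"
print(monthly_data)
Resampling aggregates data at a different time frequency ('D' for daily, 'M' for month-end, 'Y' for year-end, etc.). In pandas 2.2 and later, 'M' and 'Y' are deprecated in favor of 'ME' and 'YE'.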
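To see resampling on data where the result is easy to verify, here is a sketch with a made-up daily series covering January and February 2023. It uses the month-start alias "MS" and computes several aggregations at once with `.agg()`:

```python
import pandas as pd

# Hypothetical daily values 1..59: all of January and February 2023.
daily = pd.DataFrame(
    {"value": range(1, 60)},
    index=pd.date_range("2023-01-01", periods=59, freq="D"),
)

# One row per month, labeled by the first day of the month.
monthly = daily["value"].resample("MS").agg(["mean", "sum", "count"])

print(monthly)
```

The `count` column is worth including when resampling real data, since it shows how many observations went into each aggregate.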
- Applying Rolling Statistics
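df["rolling_mean"] = df["value"].rolling(window=3).mean()  # assumes df has a numeric "value" column
Rolling averages smooth out noisy data to reveal trends more clearly. The first two rows of rolling_mean are NaN because a full 3-row window isn't available yet.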
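The window can be tuned in a couple of ways. The sketch below, on a small invented series `s`, compares the default behavior with `min_periods=1` and with a time-based window such as "3D":

```python
import pandas as pd

s = pd.Series(
    [10.0, 12.0, 11.0, 15.0, 14.0],
    index=pd.date_range("2023-01-01", periods=5, freq="D"),
)

# Default: NaN until three observations are available.
strict = s.rolling(window=3).mean()

# min_periods=1 averages whatever is available so far.
lenient = s.rolling(window=3, min_periods=1).mean()

# An offset window covers the last 3 calendar days, however many rows that is.
by_time = s.rolling("3D").mean()

print(pd.DataFrame({"strict": strict, "lenient": lenient, "by_time": by_time}))
```

On evenly spaced daily data the "3D" window matches the 3-row window with `min_periods=1`; the two differ once dates are missing, because the offset window counts days rather than rows.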
Handling Missing Dates in Time Series Data
- Identifying Missing Timestamps
print(pd.date_range(start=df.index.min(), end=df.index.max()).difference(df.index))
- Filling Missing Time-Based Data
df = df.asfreq("D", method="ffill")
Forward-fill (ffill) fills missing dates with the last known value, preventing gaps in time series models.
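Forward-fill is not the only option. As a sketch on a hypothetical series `obs` with two days missing, the example below finds the gaps, makes them explicit with `reindex`, and compares forward-fill with time-based linear interpolation:

```python
import pandas as pd

# Hypothetical observations: 2023-01-03 and 2023-01-04 are missing.
obs = pd.Series(
    [1.0, 2.0, 5.0],
    index=pd.to_datetime(["2023-01-01", "2023-01-02", "2023-01-05"]),
)

# Every day between the first and last observation.
full_index = pd.date_range(obs.index.min(), obs.index.max(), freq="D")
missing = full_index.difference(obs.index)

reindexed = obs.reindex(full_index)                    # missing days become NaN
filled_ffill = reindexed.ffill()                       # repeat the last known value
filled_linear = reindexed.interpolate(method="time")   # straight line between neighbors

print(missing)
print(pd.DataFrame({"ffill": filled_ffill, "linear": filled_linear}))
```

Which fill is appropriate depends on the data: forward-fill suits values that hold until they change (such as a price list), while interpolation suits quantities that vary smoothly.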
Analyzing Trends and Seasonality
- Detecting Trends with Rolling Averages
df["trend"] = df["value"].rolling(window=12).mean()
- Identifying Seasonality with Visualization
import matplotlib.pyplot as plt
df["value"].plot()
df["trend"].plot()
plt.legend(["Original Data", "Trend"])
plt.show()
- Decomposing Time Series Data
from statsmodels.tsa.seasonal import seasonal_decompose
decomposition = seasonal_decompose(df["value"], model="additive", period=12)
decomposition.plot()
plt.show()
Hands-On Exercise
- Convert and Extract Date Information: Load a dataset with date columns, convert to datetime format, and extract components.
- Resample and Aggregate Time Series Data: Resample a dataset at different time intervals and compute aggregations.
- Apply Rolling Window Calculations: Perform moving average smoothing on time series data.
- Handle Missing Dates: Identify and fill missing timestamps in a dataset.
- Analyze Trends and Seasonality: Visualize time series trends and decompose data into its components.
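For the trend and seasonality exercise, a synthetic series with a known trend and a known yearly cycle makes it easy to check your results. The sketch below uses only pandas and NumPy; the data is invented for illustration, and the steps are a rough hand-rolled approximation of an additive decomposition, not the procedure seasonal_decompose itself uses:

```python
import numpy as np
import pandas as pd

# Synthetic monthly data (an assumption for illustration):
# a linear trend plus a 12-month sine cycle, five years long.
idx = pd.date_range("2018-01-01", periods=60, freq="MS")
t = np.arange(60)
series = pd.Series(0.5 * t + 10 * np.sin(2 * np.pi * t / 12), index=idx)

# Trend estimate: 12-month rolling mean, labeled near the window's center.
# Averaging over one full cycle cancels the seasonal component.
trend = series.rolling(window=12, center=True).mean()

# Seasonal estimate: average detrended value for each calendar month.
detrended = series - trend
seasonal_by_month = detrended.groupby(detrended.index.month).mean()

print(seasonal_by_month.round(2))
```

Because the cycle was built to peak in April and bottom out in October, the printed monthly averages should show the same pattern, which gives you a baseline to compare against before trying real data.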