Time Series Analysis Basics
Time series analysis examines data points collected over time to identify patterns, trends, and seasonality. It's essential for forecasting, anomaly detection, and understanding temporal behaviors.
Key Components of Time Series
Trend
# Long-term increase or decrease
# Example: Sales growing 10% year-over-year
import pandas as pd
import matplotlib.pyplot as plt
# Calculate moving average to see trend
df['trend'] = df['sales'].rolling(window=12).mean()
plt.plot(df['date'], df['sales'], label='Actual')
plt.plot(df['date'], df['trend'], label='Trend')
plt.legend()
Seasonality
# Repeating patterns at regular intervals
# Example: Retail sales spike every December
# Detect seasonal patterns
from statsmodels.tsa.seasonal import seasonal_decompose
decomposition = seasonal_decompose(df['sales'], model='additive', period=12)
decomposition.plot()
Noise (Residual)
# Random variation not explained by trend or seasonality
# Important for understanding data quality
residual = decomposition.resid
print(f"Residual std dev: {residual.std()}")
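A quick sanity check: if the decomposition captured the structure, the residual should look like white noise. A minimal sketch, using synthetic noise as a stand-in for decomposition.resid:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
# Stand-in for decomposition.resid: pure noise, for illustration only
resid = pd.Series(rng.normal(0, 1, 100))

# Lag-1 autocorrelation near zero suggests the residual is close to
# white noise, i.e. the decomposition captured trend and seasonality
print(f"Lag-1 autocorrelation: {resid.autocorr(lag=1):.3f}")
```

A residual with strong remaining autocorrelation means the model for trend or seasonality missed structure in the data.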
Checking for Stationarity
Augmented Dickey-Fuller Test
from statsmodels.tsa.stattools import adfuller
# Test if series is stationary
result = adfuller(df['sales'])
print(f'ADF Statistic: {result[0]}')
print(f'p-value: {result[1]}')
# p-value < 0.05: reject the unit-root null hypothesis (series is stationary)
if result[1] < 0.05:
    print("Series is stationary")
else:
    print("Series is non-stationary - needs differencing")
Making Data Stationary
Differencing
# Remove trend by subtracting previous value
df['sales_diff'] = df['sales'].diff()
# Second differencing if needed
df['sales_diff2'] = df['sales_diff'].diff()
Log Transformation
import numpy as np
# Stabilize variance (requires strictly positive values)
df['sales_log'] = np.log(df['sales'])
# Then difference if needed
df['sales_log_diff'] = df['sales_log'].diff()
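Keep in mind that forecasts made on a transformed series live on the transformed scale. Both transforms above are invertible: cumulative summation undoes differencing and np.exp undoes the log. A minimal sketch with hypothetical sales figures:

```python
import numpy as np
import pandas as pd

# Hypothetical sales with multiplicative (10%) growth
sales = pd.Series([100.0, 110.0, 121.0, 133.1, 146.41])

log_sales = np.log(sales)    # stabilize variance
log_diff = log_sales.diff()  # remove trend

# Invert: cumulative sum restores the log series (given the first value),
# then exp restores the original scale
reconstructed = np.exp(log_diff.cumsum() + log_sales.iloc[0])
reconstructed.iloc[0] = sales.iloc[0]  # first value is lost to differencing

print(np.allclose(reconstructed, sales))  # → True
```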
Simple Forecasting Methods
Moving Average
# Forecast = average of the last N observations
window = 3
# shift(1) so each forecast uses only past values (no lookahead)
df['forecast_ma'] = df['sales'].rolling(window=window).mean().shift(1)
Exponential Smoothing
from statsmodels.tsa.holtwinters import ExponentialSmoothing
# Accounts for trend and seasonality
model = ExponentialSmoothing(df['sales'],
                             trend='add',
                             seasonal='add',
                             seasonal_periods=12)
fit = model.fit()
forecast = fit.forecast(steps=12) # Next 12 periods
ARIMA Model
from statsmodels.tsa.arima.model import ARIMA
# ARIMA: AutoRegressive Integrated Moving Average
# order = (p, d, q) = (autoregressive lags, differencing order, moving-average lags)
model = ARIMA(df['sales'], order=(1, 1, 1))
fit = model.fit()
forecast = fit.forecast(steps=12)
Evaluating Forecasts
from sklearn.metrics import mean_absolute_error, mean_squared_error
from statsmodels.tsa.arima.model import ARIMA
# Hold out the last observations for testing
train = df['sales'][:80]
test = df['sales'][80:]
# Refit the model on training data only, then forecast the test period
model = ARIMA(train, order=(1, 1, 1))
fit = model.fit()
predictions = fit.forecast(steps=len(test))
# Calculate errors
mae = mean_absolute_error(test, predictions)
rmse = np.sqrt(mean_squared_error(test, predictions))
# Note: MAPE is undefined if the test set contains zeros
mape = np.mean(np.abs((test - predictions) / test)) * 100
print(f"MAE: {mae:.2f}")
print(f"RMSE: {rmse:.2f}")
print(f"MAPE: {mape:.2f}%")
Detecting Anomalies
# Flag points more than 3 standard deviations from the mean
# (a global threshold assumes the series is roughly stationary;
# detrend or deseasonalize first if it is not)
mean = df['sales'].mean()
std = df['sales'].std()
df['anomaly'] = (df['sales'] < mean - 3*std) | (df['sales'] > mean + 3*std)
anomalies = df[df['anomaly']]
print(f"Found {len(anomalies)} anomalies")
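The global threshold above works best on stationary data. A common variant is a rolling z-score, which compares each point to a local window instead of the whole series; a sketch with synthetic data and one injected spike (the window size and seed are illustrative):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
# Synthetic series: stable demand plus one injected spike at index 30
sales = pd.Series(rng.normal(100, 5, 60))
sales.iloc[30] = 200.0

# Rolling z-score: compare each point to the 12 observations before it
# (shift(1) keeps the point itself out of its own window)
window = 12
rolling_mean = sales.rolling(window).mean().shift(1)
rolling_std = sales.rolling(window).std().shift(1)
z = (sales - rolling_mean) / rolling_std
anomalies = sales[z.abs() > 3]
print(f"Found {len(anomalies)} anomalies")
```

Because the window adapts to local level, this variant tolerates trend and slow seasonality better than a single global mean and standard deviation.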
Common Applications
- Sales forecasting: Predict future revenue
- Demand planning: Inventory optimization
- Stock prices: Financial modeling
- Website traffic: Capacity planning
- Energy consumption: Resource allocation
Key Takeaways
- Decompose a series into trend, seasonal, and residual components
- Check for stationarity before modeling
- Use appropriate differencing to remove trends
- Validate forecasts on holdout test data
- Consider multiple models and compare performance
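On the last point, comparing models can be as simple as computing the same error metric for each candidate on the same holdout. A sketch comparing a naive forecast against a 3-period moving average on synthetic trending data (the series, split point, and window are illustrative assumptions):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
# Synthetic series: linear trend plus noise
sales = pd.Series(50 + np.arange(48) * 2.0 + rng.normal(0, 3, 48))

train, test = sales[:36], sales[36:]

# Naive: tomorrow equals today; MA(3): average of the last 3 observations.
# Both are one-step-ahead forecasts evaluated over the test period.
naive_pred = sales.shift(1)[36:]
ma_pred = sales.rolling(3).mean().shift(1)[36:]

mae_naive = (test - naive_pred).abs().mean()
mae_ma = (test - ma_pred).abs().mean()
print(f"Naive MAE: {mae_naive:.2f}, MA(3) MAE: {mae_ma:.2f}")
```

A naive forecast is a useful baseline: any model worth keeping should beat it on the holdout.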
Pro Tip: Start with simple methods (moving average, exponential smoothing) before jumping to complex models like ARIMA. Plot your data first to visually identify trends and seasonality!