Useful Data Tips

Time Series Analysis Basics

⏱️ 30 sec read 📈 Data Analysis

Time series analysis examines data points collected over time to identify patterns, trends, and seasonality. It's essential for forecasting, anomaly detection, and understanding temporal behaviors.
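The snippets below all assume a DataFrame `df` with `date` and `sales` columns. One way to build a synthetic series to follow along (the column names, date range, and magnitudes are illustrative assumptions, not from any real dataset):

```python
import numpy as np
import pandas as pd

# Hypothetical monthly sales: trend + yearly seasonality + noise
rng = np.random.default_rng(42)
dates = pd.date_range("2015-01-01", periods=96, freq="MS")  # 8 years, monthly
trend = np.linspace(100, 200, len(dates))                   # long-term growth
seasonality = 15 * np.sin(2 * np.pi * dates.month / 12)     # yearly cycle
noise = rng.normal(0, 5, len(dates))                        # random variation

df = pd.DataFrame({"date": dates, "sales": trend + seasonality + noise})
print(df.head())
```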

Key Components of Time Series

Trend

# Long-term increase or decrease
# Example: Sales growing 10% year-over-year

import pandas as pd
import matplotlib.pyplot as plt

# A 12-period moving average smooths out seasonality to reveal the trend
df['trend'] = df['sales'].rolling(window=12).mean()
plt.plot(df['date'], df['sales'], label='Actual')
plt.plot(df['date'], df['trend'], label='Trend')
plt.legend()

Seasonality

# Repeating patterns at regular intervals
# Example: Retail sales spike every December

# Detect seasonal patterns
from statsmodels.tsa.seasonal import seasonal_decompose

# period=12 assumes monthly data with a yearly cycle
decomposition = seasonal_decompose(df['sales'], model='additive', period=12)
decomposition.plot()

Noise (Residual)

# Random variation not explained by trend or seasonality
# Important for understanding data quality

residual = decomposition.resid
print(f"Residual std dev: {residual.std()}")

Checking for Stationarity

Augmented Dickey-Fuller Test

from statsmodels.tsa.stattools import adfuller

# Test if series is stationary
result = adfuller(df['sales'])
print(f'ADF Statistic: {result[0]}')
print(f'p-value: {result[1]}')

# p-value < 0.05: reject the unit-root null hypothesis
if result[1] < 0.05:
    print("Series is stationary")
else:
    print("Series is non-stationary - needs differencing")

Making Data Stationary

Differencing

# Remove trend by subtracting previous value
df['sales_diff'] = df['sales'].diff()

# Second differencing if needed
df['sales_diff2'] = df['sales_diff'].diff()

Log Transformation

import numpy as np

# Stabilize variance (requires strictly positive values)
df['sales_log'] = np.log(df['sales'])

# Then difference if needed
df['sales_log_diff'] = df['sales_log'].diff()

Simple Forecasting Methods

Moving Average

# Average of the last N observations; shift(1) ensures each
# forecast uses only past values
window = 3
df['forecast_ma'] = df['sales'].rolling(window=window).mean().shift(1)

Exponential Smoothing

from statsmodels.tsa.holtwinters import ExponentialSmoothing

# Accounts for trend and seasonality
model = ExponentialSmoothing(df['sales'],
                             trend='add',
                             seasonal='add',
                             seasonal_periods=12)
fit = model.fit()
forecast = fit.forecast(steps=12)  # Next 12 periods

ARIMA Model

from statsmodels.tsa.arima.model import ARIMA

# ARIMA = AutoRegressive Integrated Moving Average
# order=(p, d, q): AR order, degree of differencing, MA order
model = ARIMA(df['sales'], order=(1, 1, 1))
fit = model.fit()
forecast = fit.forecast(steps=12)

Evaluating Forecasts

from sklearn.metrics import mean_absolute_error, mean_squared_error

# Hold out the last portion of the series for testing
train = df['sales'][:80]
test = df['sales'][80:]

# Refit on the training data only, then forecast the held-out period
# (reusing a model fit on the full series would leak test data)
model = ARIMA(train, order=(1, 1, 1))
fit = model.fit()
predictions = fit.forecast(steps=len(test))

# Calculate errors
mae = mean_absolute_error(test, predictions)
rmse = np.sqrt(mean_squared_error(test, predictions))
mape = np.mean(np.abs((test - predictions) / test)) * 100  # undefined if test contains zeros

print(f"MAE: {mae:.2f}")
print(f"RMSE: {rmse:.2f}")
print(f"MAPE: {mape:.2f}%")

Detecting Anomalies

# Points outside 3 standard deviations
mean = df['sales'].mean()
std = df['sales'].std()

df['anomaly'] = (df['sales'] < mean - 3*std) | (df['sales'] > mean + 3*std)
anomalies = df[df['anomaly']]

print(f"Found {len(anomalies)} anomalies")
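A global mean/std threshold can flag normal points (or miss real spikes) on trending or seasonal data. A rolling variant judges each point against its recent history instead; the sketch below uses a synthetic series with an injected spike, and the 12-point window is an illustrative choice:

```python
import numpy as np
import pandas as pd

# Synthetic series: mild upward trend plus noise, with one injected spike
rng = np.random.default_rng(7)
sales = pd.Series(np.linspace(100, 130, 120) + rng.normal(0, 5, 120))
sales.iloc[60] += 60  # an obvious anomaly

# Rolling statistics shifted by one step, so each point is compared
# against the 12 observations *before* it, not including itself
roll_mean = sales.rolling(window=12).mean().shift(1)
roll_std = sales.rolling(window=12).std().shift(1)
z = (sales - roll_mean) / roll_std
anomalies = sales[z.abs() > 3]
print(f"Anomalies at index: {list(anomalies.index)}")
```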

Key Takeaways

Pro Tip: Start with simple methods (moving average, exponential smoothing) before jumping to complex models like ARIMA. Plot your data first to visually identify trends and seasonality!
