Learn Without Walls

Module 21: Macroeconomics for Statisticians

GDP, inflation, unemployment, central bank policy, and business cycles through a statistical lens

Part V of 5 · Module 21 of 22

Macroeconomics studies the economy as a whole — aggregate output, price levels, employment, and the policy levers that influence them. For a statistician, macroeconomics is a treasure trove of time series, ratio estimators, index numbers, structural breaks, and causal inference challenges. This module translates the core macro concepts into the statistical language you already speak.

21.1 — GDP as a Time Series

Gross Domestic Product (GDP) is the single most important macroeconomic indicator. It measures the total market value of all final goods and services produced in a country over a given period. From a statistical perspective, GDP is a quarterly time series with trend, seasonal, and cyclical components.

Finance Term

Nominal GDP: GDP measured in current prices. Real GDP: GDP adjusted for inflation (measured in constant prices of a base year). The difference matters enormously — nominal GDP can grow 5% while real GDP grows only 2% if inflation is 3%.

The GDP Deflator and Chain-Weighting

The conversion from nominal to real GDP requires a price index. The GDP deflator is implicitly defined as:

GDP Deflator = (Nominal GDP / Real GDP) × 100

Real GDP = Nominal GDP / (GDP Deflator / 100)

Modern GDP calculations use chain-weighting, which updates the base-year weights every period rather than fixing them. This avoids the substitution bias inherent in fixed-weight indices (Laspeyres bias).

Stats Bridge

The chain-weighted GDP is analogous to a Fisher ideal index — the geometric mean of a Laspeyres (base-period weights) and Paasche (current-period weights) index. This is a well-known result in index number theory: the Fisher index satisfies the time-reversal test and the factor-reversal test, making it "ideal" in the axiomatic approach to index numbers.
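To make the Fisher construction concrete, here is a toy two-good example (all prices and quantities are made up for illustration):

```python
import numpy as np

# Toy two-good economy: hypothetical prices p and quantities q
p0 = np.array([10.0, 4.0])   # base-period prices
p1 = np.array([12.0, 3.0])   # current-period prices
q0 = np.array([5.0, 20.0])   # base-period quantities
q1 = np.array([4.0, 25.0])   # current-period quantities

laspeyres = (p1 @ q0) / (p0 @ q0)          # base-period weights
paasche   = (p1 @ q1) / (p0 @ q1)          # current-period weights
fisher    = np.sqrt(laspeyres * paasche)   # geometric mean of the two

print(f"Laspeyres: {laspeyres:.4f}")
print(f"Paasche:   {paasche:.4f}")
print(f"Fisher:    {fisher:.4f}")

# Time-reversal test: the Fisher index from period 1 back to period 0
# is exactly the reciprocal of the forward index
laspeyres_rev = (p0 @ q1) / (p1 @ q1)
paasche_rev   = (p0 @ q0) / (p1 @ q0)
fisher_rev    = np.sqrt(laspeyres_rev * paasche_rev)
print(f"Forward x backward Fisher: {fisher * fisher_rev:.6f}")
```

Note that the Laspeyres index exceeds the Paasche index here, reflecting the substitution bias discussed above.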

Seasonal Adjustment: The X-13 ARIMA-SEATS Filter

Raw GDP data has strong seasonal patterns (e.g., holiday spending in Q4). The Bureau of Economic Analysis (BEA) uses the Census Bureau's X-13 ARIMA-SEATS method to remove seasonal components. This is a sophisticated decomposition:

Y_t = T_t × S_t × I_t

where T_t = trend-cycle, S_t = seasonal, I_t = irregular component

The seasonally adjusted series removes S_t, leaving the trend-cycle and irregular components. The annualized quarter-over-quarter growth rate that gets reported in the news is:

g_t = [(GDP_t / GDP_{t−1})^4 − 1] × 100

Common Pitfall

The annualization exponent (raising to the 4th power) amplifies noise. A quarterly growth rate of 0.5% becomes an annualized rate of approximately 2.0%, but a quarterly rate of 0.1% becomes 0.4%. Small measurement errors in the quarterly figure produce large swings in the annualized headline number. GDP is also subject to multiple revisions: advance, second, and third estimates can differ substantially.
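A quick numeric check of how the annualization exponent amplifies errors (the function below simply implements the growth formula above):

```python
# How measurement error in the quarterly growth rate propagates to the
# annualized headline number, g = [(1 + q)^4 - 1] * 100
def annualize(q_pct):
    """Annualized growth (%) implied by a quarterly growth rate (%)."""
    return ((1 + q_pct / 100) ** 4 - 1) * 100

for q in [0.1, 0.4, 0.5, 0.6]:
    print(f"quarterly {q:.1f}% -> annualized {annualize(q):.2f}%")

# A +/-0.1pp error band around a 0.5% quarterly estimate spans
# roughly 0.8pp in annualized terms
band = annualize(0.6) - annualize(0.4)
print(f"annualized spread across 0.4%-0.6% quarterly: {band:.2f}pp")
```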

Trend vs. Cycle: The Hodrick-Prescott Filter

Separating the long-run trend from the business cycle is a fundamental problem in macroeconometrics. The most widely used (and most criticized) method is the Hodrick-Prescott (HP) filter.

The HP filter solves the optimization problem:

min_τ ∑_{t=1}^{T} (y_t − τ_t)² + λ ∑_{t=2}^{T−1} [(τ_{t+1} − τ_t) − (τ_t − τ_{t−1})]²

where τ_t is the trend, y_t is the observed series, and λ controls the smoothness of the trend. For quarterly data, the standard choice is λ = 1600.

Stats Bridge

The HP filter is a penalized regression (ridge-like regularization on the second differences of the trend). The penalty term penalizes changes in the slope of the trend, forcing it to be smooth. This is identical in spirit to a cubic smoothing spline. Hamilton (2018) has argued that the HP filter produces spurious cycles and recommends a simple regression-based alternative: regress y_t on y_{t−8} (for quarterly data) and use the residual as the cycle component. (Hamilton's full specification uses the four most recent lags, y_{t−8} through y_{t−11}; the single-lag version is a simplified variant.)

Python
# Download and decompose real GDP using FRED
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from pandas_datareader import data as pdr

# Download Real GDP from FRED
gdp = pdr.get_data_fred("GDPC1", start="1970-01-01")
gdp.columns = ["real_gdp"]
gdp["log_gdp"] = np.log(gdp["real_gdp"])

# Hodrick-Prescott Filter
from statsmodels.tsa.filters.hp_filter import hpfilter

cycle, trend = hpfilter(gdp["log_gdp"].dropna(), lamb=1600)

# Hamilton (2018) alternative: regress y_t on y_{t-8}
from statsmodels.api import OLS, add_constant
y = gdp["log_gdp"].dropna()
y_lag8 = y.shift(8).dropna()
y_aligned = y.loc[y_lag8.index]
X = add_constant(y_lag8)
hamilton_model = OLS(y_aligned, X).fit()
hamilton_cycle = hamilton_model.resid

# Compute annualized growth rates
gdp["growth_qoq"] = gdp["real_gdp"].pct_change()
gdp["growth_annual"] = ((1 + gdp["growth_qoq"]) ** 4 - 1) * 100

print("Real GDP Time Series Analysis")
print("=" * 50)
print(f"Sample: {gdp.index[0].date()} to {gdp.index[-1].date()}")
print(f"Observations: {len(gdp)}")
print(f"Mean annualized growth: {gdp['growth_annual'].mean():.1f}%")
print(f"Std of annualized growth: {gdp['growth_annual'].std():.1f}%")
print(f"\nHP cycle std: {cycle.std():.4f}")
print(f"Hamilton cycle std: {hamilton_cycle.std():.4f}")

21.2 — Inflation: The CPI as a Weighted Price Index

Inflation measures the rate of change of the general price level. The Consumer Price Index (CPI) is the most widely followed measure, but its construction involves substantial statistical methodology that most people never consider.

Finance Term

CPI (Consumer Price Index): A weighted average of prices for a basket of consumer goods and services, measured by the Bureau of Labor Statistics (BLS). Core CPI: CPI excluding food and energy (volatile components). PCE (Personal Consumption Expenditures): The Fed's preferred inflation measure, broader than CPI and chain-weighted.

CPI Construction: A Laspeyres Index

The CPI is fundamentally a modified Laspeyres price index:

CPI_t = ∑_i w_{i,0} · (p_{i,t} / p_{i,0}) × 100

where w_{i,0} are base-period expenditure weights and p_{i,t} / p_{i,0} are the price relatives. The weights come from the Consumer Expenditure Survey (CE survey) and are updated approximately every two years.
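A minimal sketch of the Laspeyres calculation with a hypothetical three-item basket (weights and prices are invented for illustration):

```python
import numpy as np

# Hypothetical three-item basket: base-period expenditure weights and prices
weights = np.array([0.5, 0.3, 0.2])      # w_{i,0}, sum to 1
p_base  = np.array([100.0, 50.0, 20.0])  # p_{i,0}
p_now   = np.array([104.0, 51.0, 26.0])  # p_{i,t}

relatives = p_now / p_base               # price relatives p_{i,t} / p_{i,0}
cpi = (weights * relatives).sum() * 100  # modified Laspeyres index

print(f"Price relatives: {relatives}")
print(f"CPI: {cpi:.1f}  (inflation since base period: {cpi - 100:.1f}%)")
```

Even the small-weight item (a 30% price jump with a 0.2 weight) contributes substantially to the index.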

| CPI Category              | Approximate Weight (%) | Volatility           |
|---------------------------|------------------------|----------------------|
| Housing (shelter)         | ~36%                   | Low (sticky, lagged) |
| Transportation            | ~16%                   | High (fuel prices)   |
| Food                      | ~13%                   | Moderate to high     |
| Medical care              | ~8%                    | Low to moderate      |
| Education & communication | ~6%                    | Low                  |
| Energy                    | ~7%                    | Very high            |
| Other goods & services    | ~14%                   | Mixed                |

Stats Bridge

The CPI is a weighted composite estimator with known biases: (1) Substitution bias — consumers switch to cheaper alternatives, but fixed weights do not adjust; (2) Quality bias — if a computer costs the same but is twice as fast, the quality-adjusted price fell; (3) New goods bias — new products enter the basket with a lag; (4) Outlet substitution bias — consumers shift to discount retailers. The Boskin Commission (1996) estimated total CPI bias at ~1.1 percentage points per year.

Core vs. Headline: Signal Extraction

Core CPI excludes food and energy because they are volatile and driven by supply shocks rather than underlying demand. From a statistical perspective, core CPI is a trimmed or filtered estimator that attempts to extract the persistent signal from the noisy headline measure.

Even more aggressive filtering exists: the Cleveland Fed Median CPI (the median component price change) and the Cleveland Fed 16% Trimmed-Mean CPI (excludes the top and bottom 8% of component price changes). These are robust estimators of central tendency applied to the cross-section of price changes.

Headline CPI: π_t = ∑_i w_i Δp_{i,t}

Core CPI: π_t^core = ∑_{i ∉ {food, energy}} w_i′ Δp_{i,t}

Median CPI: π_t^med = median({Δp_{i,t}})

Trimmed-Mean CPI: π_t^trim = weighted mean of the middle 84% of Δp_{i,t}
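A toy illustration of these robust estimators on a made-up cross-section of component price changes. Equal weights and a 10% trim are used for simplicity (the Cleveland measures use expenditure weights and an 8% trim):

```python
import numpy as np
from scipy import stats

# Hypothetical one-month price changes (%) for ten CPI components;
# two outliers (an energy-like spike and a steep decline) distort the mean
dp = np.array([0.2, 0.3, 0.1, 0.4, 0.2, 3.5, -2.0, 0.3, 0.25, 0.15])

headline = dp.mean()                # mean, pulled around by the outliers
median_cpi = np.median(dp)          # robust to both tails
trimmed = stats.trim_mean(dp, 0.1)  # drop the top and bottom 10% here

print(f"Mean (headline-style): {headline:.3f}%")
print(f"Median:                {median_cpi:.3f}%")
print(f"Trimmed mean:          {trimmed:.3f}%")
```

The mean sits well above the median and trimmed mean because the two outliers dominate it, which is exactly the signal-extraction argument for the Cleveland measures.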

CPI vs. PCE: Why the Fed Prefers PCE

| Feature           | CPI                             | PCE                                                  |
|-------------------|---------------------------------|------------------------------------------------------|
| Source            | BLS (survey-based)              | BEA (national accounts)                              |
| Weighting         | Fixed (Laspeyres)               | Chain-weighted (Fisher-like)                         |
| Coverage          | Out-of-pocket consumer spending | All consumption, including employer-paid health care |
| Substitution bias | Higher (fixed weights)          | Lower (chain-weighted)                               |
| Typical level     | ~0.3pp higher than PCE          | Slightly lower                                       |
| Fed's target      | Not the official target         | 2% PCE inflation is the official target              |

Python
# Download and compare inflation measures from FRED
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from pandas_datareader import data as pdr

# FRED series codes
series = {
    "CPI (All Items)": "CPIAUCSL",
    "Core CPI": "CPILFESL",
    "PCE": "PCEPI",
    "Core PCE": "PCEPILFE",
}

inflation = pd.DataFrame()
for name, code in series.items():
    df = pdr.get_data_fred(code, start="2000-01-01")
    # Compute year-over-year percent change
    inflation[name] = df.iloc[:, 0].pct_change(12) * 100

inflation = inflation.dropna()

print("Inflation Measures Comparison")
print("=" * 55)
print(inflation.describe().round(2).to_string())

print(f"\nCorrelation Matrix:")
print(inflation.corr().round(3).to_string())

# Difference between CPI and PCE
print(f"\nMean CPI - PCE gap: {(inflation['CPI (All Items)'] - inflation['PCE']).mean():.2f}pp")
print(f"Mean Core CPI - Core PCE gap: {(inflation['Core CPI'] - inflation['Core PCE']).mean():.2f}pp")

21.3 — The Unemployment Rate: A Ratio Estimator

The unemployment rate is one of the most politically sensitive and widely reported macroeconomic indicators. But few people understand that it is a survey-based ratio estimator with a specific sampling design, definitions, and margin of error.

Finance Term

Unemployment rate: The number of unemployed persons divided by the civilian labor force, expressed as a percentage. A person is "unemployed" if they (1) do not have a job, (2) are available for work, and (3) have actively searched for work in the past 4 weeks.

The Current Population Survey (CPS)

The unemployment rate comes from the Current Population Survey (CPS), a monthly household survey of approximately 60,000 households conducted by the Census Bureau for the Bureau of Labor Statistics. Key statistical features:

Unemployment Rate = U / L = U / (U + E)

where U = unemployed, E = employed, L = labor force

Margin of error on the monthly rate ≈ ±0.1 to 0.2 percentage points (90% CI)

Stats Bridge

The unemployment rate is a ratio estimator R̂ = Ŷ / X̂, where both the numerator (unemployed count) and denominator (labor force) are estimated from the survey. The variance of a ratio estimator is approximately: Var(R̂) ≈ (1/X²)[Var(Ŷ) + R²·Var(X̂) − 2R·Cov(Ŷ, X̂)]
This means a 0.1pp change in the unemployment rate (e.g., 4.0% to 3.9%) is often within the margin of error and may not represent a real change.
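A simulation sketch of the ratio estimator's sampling error, assuming simple random sampling with invented population parameters (the real CPS has a complex design, so its design effects are larger):

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated household survey: hypothetical parameters, one person per household
n = 60_000
true_unemp, true_part = 0.04, 0.63           # unemployment and participation rates
in_lf = rng.random(n) < true_part            # in the labor force?
unemployed = in_lf & (rng.random(n) < true_unemp)

def rate(idx):
    """Ratio estimator: unemployed count / labor force count."""
    return unemployed[idx].sum() / in_lf[idx].sum()

# Bootstrap the sampling distribution of the estimated rate
boots = np.array([rate(rng.integers(0, n, n)) for _ in range(500)])
print(f"Estimated unemployment rate: {rate(np.arange(n)) * 100:.2f}%")
print(f"Bootstrap SE: {boots.std() * 100:.2f}pp")
```

Even under this idealized design, the standard error is on the order of 0.1pp, so a one-tick monthly move is statistically ambiguous.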

Alternative Unemployment Measures: U-1 through U-6

| Measure | Definition                                          | Typical Level (relative to U-3) |
|---------|-----------------------------------------------------|---------------------------------|
| U-1     | Persons unemployed 15 weeks or longer               | Much lower (long-term only)     |
| U-2     | Job losers and persons who completed temporary jobs | Lower                           |
| U-3     | Official unemployment rate                          | The headline number             |
| U-4     | U-3 + discouraged workers                           | Slightly higher                 |
| U-5     | U-4 + all marginally attached workers               | Higher                          |
| U-6     | U-5 + part-time for economic reasons                | Roughly double U-3              |

Common Pitfall

The headline unemployment rate (U-3) excludes discouraged workers (people who have stopped looking for work) and the underemployed (people working part-time who want full-time work). The U-6 "real" unemployment rate can be nearly double U-3. When politicians cite "record low unemployment," ask which measure they are using and what is happening to the labor force participation rate.

Python
# Download and analyze unemployment data from FRED
import pandas as pd
import numpy as np
from pandas_datareader import data as pdr
import matplotlib.pyplot as plt

# Key labor market series
series = {
    "U-3 (Official)": "UNRATE",
    "U-6 (Broad)": "U6RATE",
    "Labor Force Participation": "CIVPART",
}

labor = pd.DataFrame()
for name, code in series.items():
    df = pdr.get_data_fred(code, start="1994-01-01")
    labor[name] = df.iloc[:, 0]

labor = labor.dropna()

print("Unemployment Rate: Statistical Summary")
print("=" * 55)
print(labor.describe().round(2).to_string())

# Margin of error analysis
latest_u3 = labor["U-3 (Official)"].iloc[-1]
se = 0.12  # approximate standard error
print(f"\nLatest U-3: {latest_u3:.1f}%")
print(f"90% CI: [{latest_u3 - 1.645*se:.1f}%, {latest_u3 + 1.645*se:.1f}%]")
print(f"A 0.1pp monthly change is often within sampling error.")

# U-6 / U-3 ratio over time
labor["U6_U3_ratio"] = labor["U-6 (Broad)"] / labor["U-3 (Official)"]
print(f"\nMean U-6/U-3 ratio: {labor['U6_U3_ratio'].mean():.2f}")
print(f"The broad measure is typically ~{labor['U6_U3_ratio'].mean():.1f}x the headline rate.")

21.4 — The Phillips Curve: A Scatterplot with History

The Phillips Curve posits an inverse relationship between inflation and unemployment: when unemployment is low, inflation tends to rise (and vice versa). This is one of the most debated empirical relationships in economics, and it provides an excellent case study in the instability of statistical relationships.

π_t = π_t^e − β(u_t − u_t^*) + ε_t

where π_t^e = expected inflation, u_t^* = natural rate of unemployment (NAIRU), β > 0

Stats Bridge

The Phillips Curve is a bivariate regression with a time-varying parameter. The slope β has varied dramatically across decades: steep in the 1960s, nearly flat in the 2010s. This is a textbook case of parameter instability and the danger of treating a historical correlation as a stable structural relationship. The Lucas Critique (1976) formalizes this: policy changes alter the reduced-form relationships.

Testing the Phillips Curve

Python
# Test the Phillips Curve: inflation vs unemployment
import pandas as pd
import numpy as np
from pandas_datareader import data as pdr
import matplotlib.pyplot as plt
from scipy import stats

# Download data
unemp = pdr.get_data_fred("UNRATE", start="1960-01-01")
cpi = pdr.get_data_fred("CPIAUCSL", start="1960-01-01")
inflation = cpi.pct_change(12) * 100

# Merge on date
df = pd.DataFrame({
    "unemployment": unemp.iloc[:, 0],
    "inflation": inflation.iloc[:, 0]
}).dropna()

# Full sample regression
slope_full, intercept_full, r_full, p_full, se_full = stats.linregress(
    df["unemployment"], df["inflation"]
)

print("Phillips Curve: Full Sample (1960-present)")
print("=" * 50)
print(f"Slope: {slope_full:.3f} (se = {se_full:.3f})")
print(f"R-squared: {r_full**2:.3f}")
print(f"p-value: {p_full:.4f}")

# By decade: show parameter instability
decades = [
    ("1960s", "1960", "1970"),
    ("1970s", "1970", "1980"),
    ("1980s", "1980", "1990"),
    ("1990s", "1990", "2000"),
    ("2000s", "2000", "2010"),
    ("2010s", "2010", "2020"),
]

print(f"\n{'Decade':>8}  {'Slope':>8}  {'R-sq':>6}  {'p-val':>8}")
print("-" * 40)
for label, start, end in decades:
    subset = df.loc[start:end]
    if len(subset) < 10:
        continue
    s, i, r, p, se = stats.linregress(subset["unemployment"], subset["inflation"])
    print(f"{label:>8}  {s:>8.3f}  {r**2:>6.3f}  {p:>8.4f}")

print("\nThe slope changes dramatically by decade — parameter instability!")

21.5 — Central Bank Policy as Intervention Analysis

Central banks (the Federal Reserve in the US, ECB in Europe, Bank of Japan, etc.) are the most powerful economic actors. Their primary tool is the policy interest rate — the rate at which banks borrow overnight from each other. Changes in this rate ripple through the entire economy.

Finance Term

Federal Funds Rate: The interest rate at which depository institutions lend reserve balances to each other overnight. The Fed sets a target range (e.g., 5.25–5.50%) and uses open market operations to keep the effective rate within that range. Monetary policy transmission: Fed rate → bank lending rates → consumer/business borrowing costs → spending → GDP & inflation.

Stats Bridge

A rate change is a treatment intervention in a time series. In the interrupted time series (ITS) framework, you model the outcome before and after the intervention, controlling for pre-existing trends. The challenge: monetary policy is endogenous (the Fed cuts rates because the economy is weakening), creating a classic simultaneity bias. This is why Romer & Romer (2004) used narrative identification — reading FOMC minutes to isolate "exogenous" rate changes from systematic responses.

The Taylor Rule: A Predictive Model for Policy

The Taylor Rule (1993) is a simple regression-based prescription for setting the federal funds rate:

i_t = r^* + π_t + 0.5(π_t − π^*) + 0.5(y_t − y_t^*)

where r^* = equilibrium real rate (≈ 2%), π^* = target inflation (2%), y_t − y_t^* = output gap

The Taylor Rule is essentially a linear regression of the policy rate on the inflation gap and the output gap. When the actual rate deviates from the Taylor Rule prescription, it indicates that the Fed is being more hawkish (rate above Taylor) or dovish (rate below).

Quantitative Easing (QE): When Rates Hit Zero

When the policy rate reaches zero (the "zero lower bound"), the conventional tool is exhausted. Central banks then turn to Quantitative Easing (QE): purchasing large quantities of bonds to directly lower long-term interest rates and increase the money supply.

| QE Mechanism           | Channel                                              | Statistical Evidence                                            |
|------------------------|------------------------------------------------------|-----------------------------------------------------------------|
| Buy government bonds   | Lowers long-term yields (portfolio balance)          | Event studies show a 50-100bp drop in the 10Y yield per $1T of purchases |
| Signal commitment      | Signals rates will stay low (forward guidance)       | Term premium decomposition models capture this                  |
| Increase bank reserves | More reserves → more lending capacity                | Money multiplier has been unstable; less clear evidence         |
| Wealth effect          | Rising asset prices → consumer confidence → spending | Marginal propensity to consume out of wealth is ~3-5 cents per dollar |

Python
# Compute the Taylor Rule and compare to actual Fed Funds Rate
import pandas as pd
import numpy as np
from pandas_datareader import data as pdr
import matplotlib.pyplot as plt

# Download data from FRED
fed_funds = pdr.get_data_fred("FEDFUNDS", start="1990-01-01")
cpi = pdr.get_data_fred("CPIAUCSL", start="1989-01-01")
gdp = pdr.get_data_fred("GDPC1", start="1989-01-01")
pot_gdp = pdr.get_data_fred("GDPPOT", start="1989-01-01")

# Compute inflation (YoY CPI)
inflation = cpi.pct_change(12).dropna() * 100
inflation.columns = ["inflation"]

# Compute output gap
output_gap = ((gdp.iloc[:, 0] - pot_gdp.iloc[:, 0]) / pot_gdp.iloc[:, 0] * 100).dropna()

# Resample to quarterly and merge
infl_q = inflation.resample('QS').last()
ff_q = fed_funds.resample('QS').last()

# Taylor Rule parameters
r_star = 2.0
pi_star = 2.0

# Build merged DataFrame
taylor = pd.DataFrame({
    "fed_funds": ff_q.iloc[:, 0],
    "inflation": infl_q.iloc[:, 0],
    "output_gap": output_gap
}).dropna()

taylor["taylor_rule"] = (r_star + taylor["inflation"]
                          + 0.5 * (taylor["inflation"] - pi_star)
                          + 0.5 * taylor["output_gap"])

taylor["taylor_rule_clamped"] = taylor["taylor_rule"].clip(lower=0)

print("Taylor Rule vs Actual Fed Funds Rate")
print("=" * 50)
gap = taylor["fed_funds"] - taylor["taylor_rule_clamped"]
print(f"Mean gap (actual - Taylor): {gap.mean():.2f}pp")
print(f"Std gap: {gap.std():.2f}pp")
print(f"Correlation: {taylor['fed_funds'].corr(taylor['taylor_rule_clamped']):.3f}")

21.6 — Leading, Coincident, and Lagging Indicators

Macroeconomic indicators can be classified by their timing relative to the business cycle. This classification is fundamentally about the cross-correlation function between each indicator and aggregate economic activity.

| Type       | Definition                      | Examples                                                                       | Statistical Analogue                                          |
|------------|---------------------------------|--------------------------------------------------------------------------------|---------------------------------------------------------------|
| Leading    | Turns before the business cycle | Building permits, stock prices, yield curve, new orders, consumer expectations | Predictive variables (Granger-cause GDP)                      |
| Coincident | Moves with the business cycle   | GDP, industrial production, employment, personal income                        | Concurrent measures (proxy variables for the latent "state")  |
| Lagging    | Turns after the business cycle  | Unemployment rate, CPI, bank lending, labor costs                              | Retrospective confirmation (backward-looking moving averages) |

Stats Bridge

The Conference Board's Leading Economic Index (LEI) is a composite index (weighted average of 10 leading indicators). Its construction is analogous to principal component analysis: each component captures a different facet of "future economic activity," and the composite attempts to extract the common factor. The weights are based on standardizing each component and weighting by historical predictive power.
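A sketch of composite-index construction on simulated components, using equal weights after standardization (the Conference Board's actual weighting scheme differs; the latent factor and noise levels here are invented):

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated components: each is a noisy read on a common latent factor
T = 200
factor = rng.normal(0, 1, T)                       # latent "future activity"
noise_sd = [0.5, 1.0, 2.0]
components = np.array([factor + rng.normal(0, s, T) for s in noise_sd])

# Composite: standardize each component, then take an equal-weighted average
z = ((components - components.mean(axis=1, keepdims=True))
     / components.std(axis=1, keepdims=True))
composite = z.mean(axis=0)

for s, comp in zip(noise_sd, z):
    print(f"corr(component sd={s}, factor): {np.corrcoef(comp, factor)[0, 1]:.3f}")
print(f"corr(composite, factor):        {np.corrcoef(composite, factor)[0, 1]:.3f}")
```

Averaging cancels the idiosyncratic noise, so the composite tracks the common factor better than the noisier components do.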

Key Insight

The unemployment rate is a lagging indicator. It peaks after a recession has ended, sometimes by many months. This means that if you wait for unemployment to rise before concluding a recession has started, you are already deep into it. Similarly, unemployment may still be falling as the economy is already beginning to slow.

21.7 — The Yield Curve as a Recession Predictor

The yield curve is a plot of interest rates (yields) on government bonds of different maturities. Normally, longer-maturity bonds have higher yields (compensation for duration risk). When the curve inverts — short-term rates exceed long-term rates — it has historically preceded recessions with remarkable accuracy.

Finance Term

Yield curve: The relationship between bond yields and their maturities. Term spread: The difference between long-term and short-term yields (e.g., 10-year minus 2-year Treasury yield). Inversion: When the term spread goes negative. An inverted yield curve has preceded every US recession since the 1960s, with only one false signal (1966).

Term Spread = y_{10Y} − y_{2Y}

P(recession within 12 months | spread < 0) ≈ 0.60-0.80 (historical average)

P(recession within 12 months | spread > 0) ≈ 0.10-0.15

Stats Bridge

The yield curve inversion is a binary classifier for recession. You can evaluate it with standard classification metrics: sensitivity (the fraction of recessions that were preceded by inversion — very high), specificity (the fraction of non-recessions where the curve was not inverted), precision, and the area under the ROC curve. Estrella & Mishkin (1998) showed the term spread has superior predictive power compared to other leading indicators in a probit model.

Why Does the Yield Curve Predict Recessions?

The standard explanation is that an inversion reflects tight current policy at the short end combined with market expectations of future rate cuts, and markets expect rate cuts precisely when they anticipate a downturn.

Python
# Yield Curve as Recession Predictor
import pandas as pd
import numpy as np
from pandas_datareader import data as pdr
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

# Download yield curve data and recession indicators
y10 = pdr.get_data_fred("GS10", start="1976-01-01")
y2 = pdr.get_data_fred("GS2", start="1976-01-01")
rec = pdr.get_data_fred("USREC", start="1976-01-01")

# Compute term spread
spread = pd.DataFrame({
    "spread": y10.iloc[:, 0] - y2.iloc[:, 0],
    "recession": rec.iloc[:, 0]
}).dropna()

# Create forward-looking recession indicator (recession within 12 months)
spread["recession_12m"] = spread["recession"].rolling(12).max().shift(-12)
spread = spread.dropna()

# Classification performance
spread["inverted"] = (spread["spread"] < 0).astype(int)

tp = ((spread["inverted"] == 1) & (spread["recession_12m"] == 1)).sum()
fp = ((spread["inverted"] == 1) & (spread["recession_12m"] == 0)).sum()
fn = ((spread["inverted"] == 0) & (spread["recession_12m"] == 1)).sum()
tn = ((spread["inverted"] == 0) & (spread["recession_12m"] == 0)).sum()

print("Yield Curve Inversion as Recession Classifier")
print("=" * 50)
print(f"Sensitivity (recall): {tp/(tp+fn):.1%}")
print(f"Specificity:          {tn/(tn+fp):.1%}")
print(f"Precision:            {tp/(tp+fp):.1%}")

# Probit model (logistic approximation)
X = spread["spread"].values.reshape(-1, 1)
y_target = spread["recession_12m"].values
model = LogisticRegression()
model.fit(X, y_target)
y_prob = model.predict_proba(X)[:, 1]
auc = roc_auc_score(y_target, y_prob)
print(f"ROC AUC (logit model): {auc:.3f}")

21.8 — Business Cycle Dating: Change-Point Detection

In the US, the official arbiter of business cycle dates is the National Bureau of Economic Research (NBER) Business Cycle Dating Committee. They determine when recessions begin (peaks) and end (troughs). From a statistical perspective, this is a change-point detection problem.

Stats Bridge

NBER business cycle dating is formally equivalent to multiple change-point detection in a multivariate time series. The committee examines several coincident indicators (employment, industrial production, real income, real sales) and identifies structural breaks. Algorithmic approaches include: (1) Markov-switching models (Hamilton, 1989) — a hidden Markov model with two states (expansion/recession); (2) Bai-Perron structural break tests; (3) CUSUM tests for parameter stability.

Hamilton's Markov-Switching Model

Hamilton's (1989) seminal model treats the business cycle as a latent state variable St ∈ {0, 1} (expansion or recession) that follows a Markov chain:

y_t = μ_{S_t} + φ(y_{t−1} − μ_{S_{t−1}}) + ε_t

P(S_t = j | S_{t−1} = i) = p_{ij}

Transition matrix: P = [[p_00, p_01], [p_10, p_11]]

Python
# Hamilton Markov-Switching Model for Business Cycle Dating
import pandas as pd
import numpy as np
from pandas_datareader import data as pdr
import statsmodels.api as sm

# Download GDP growth
gdp = pdr.get_data_fred("GDPC1", start="1960-01-01")
gdp_growth = gdp.pct_change().dropna() * 100
gdp_growth.columns = ["growth"]

# Fit Markov-Switching AR(1) model
mod = sm.tsa.MarkovAutoregression(
    gdp_growth["growth"].dropna(),
    k_regimes=2,
    order=1,
    switching_ar=False,
    switching_variance=True
)
res = mod.fit(search_reps=20)

# Map parameter names to values (res.params is a plain array; the free
# transition parameters are p[0->0] and p[1->0], so p[1->1] = 1 - p[1->0])
params = dict(zip(mod.param_names, res.params))

# Regime labels are not guaranteed by the fit; call the lower-mean
# regime "recession" and the other "expansion"
means = [params["const[0]"], params["const[1]"]]
rec = int(np.argmin(means))
expn = 1 - rec
p_stay = [params["p[0->0]"], 1 - params["p[1->0]"]]

print("Markov-Switching Model for Business Cycle")
print("=" * 50)
print(f"Expansion-regime mean growth: {means[expn]:.2f}%")
print(f"Recession-regime mean growth: {means[rec]:.2f}%")

# Transition probabilities
print("\nTransition probabilities:")
print(f"  P(stay in expansion): {p_stay[expn]:.3f}")
print(f"  P(stay in recession): {p_stay[rec]:.3f}")

# Expected duration of regime i is 1 / (1 - p_ii)
dur_exp = 1 / (1 - p_stay[expn])
dur_rec = 1 / (1 - p_stay[rec])
print("\nExpected duration:")
print(f"  Expansion: {dur_exp:.1f} quarters ({dur_exp/4:.1f} years)")
print(f"  Recession: {dur_rec:.1f} quarters ({dur_rec/4:.1f} years)")

# Smoothed recession probabilities
recession_prob = res.smoothed_marginal_probabilities[rec]
print(f"\nRecession probability > 50% in {(recession_prob > 0.5).sum()} quarters")

21.9 — Fiscal vs. Monetary Policy: Two Levers

Macroeconomic stabilization has two main policy levers: monetary policy (controlled by the central bank) and fiscal policy (controlled by the government through taxation and spending). Understanding the difference is essential for interpreting macroeconomic data.

| Feature                           | Monetary Policy              | Fiscal Policy                       |
|-----------------------------------|------------------------------|-------------------------------------|
| Decision maker                    | Central bank (Fed, ECB, BoJ) | Government (Congress + President)   |
| Primary tool                      | Interest rates, money supply | Government spending, taxes          |
| Speed of implementation           | Fast (FOMC meets 8x/year)    | Slow (legislative process)          |
| Transmission lag                  | 6-18 months to full effect   | Variable; direct spending is faster |
| Political independence            | Designed to be independent   | Inherently political                |
| Effectiveness at zero lower bound | Limited (hence QE)           | Potentially more effective          |
| Key multiplier                    | Money multiplier             | Fiscal multiplier (ΔGDP / ΔG)       |

Stats Bridge

Estimating the fiscal multiplier (how much GDP changes per dollar of government spending) is one of the hardest causal inference problems in economics. The challenge is endogeneity: government spending increases during recessions, creating a negative correlation between spending and GDP that masks the causal effect. Researchers use instrumental variables (Ramey, 2011 uses military spending news shocks), local projections (Jorda, 2005), and SVAR models (Blanchard & Perotti, 2002) to identify the causal effect.

Key Insight

The fiscal multiplier is state-dependent: it is larger during recessions (when there is slack in the economy) and smaller during expansions (when the economy is near capacity). Estimates range from 0.5 to 2.5 depending on the state of the economy, the type of spending, and the monetary policy regime. This is a textbook example of heterogeneous treatment effects.

21.10 — Putting It All Together: A Macro Dashboard

Python
# Comprehensive macro data download and summary
import pandas as pd
import numpy as np
from pandas_datareader import data as pdr

# Define key macro series
macro_series = {
    "Real GDP Growth (Q/Q Ann.)": "A191RL1Q225SBEA",
    "CPI Inflation (YoY)": "CPIAUCSL",
    "Core PCE Inflation (YoY)": "PCEPILFE",
    "Unemployment Rate": "UNRATE",
    "Fed Funds Rate": "FEDFUNDS",
    "10Y Treasury Yield": "GS10",
    "2Y Treasury Yield": "GS2",
}

# Download all
data_dict = {}
for name, code in macro_series.items():
    try:
        df = pdr.get_data_fred(code, start="2020-01-01")
        data_dict[name] = df.iloc[:, 0]
    except Exception:
        print(f"Could not download {name}")

# Current snapshot
print("Current Macroeconomic Snapshot")
print("=" * 55)
for name, series in data_dict.items():
    latest = series.dropna().iloc[-1]
    date = series.dropna().index[-1].date()
    print(f"  {name:35s}: {latest:>7.2f}  ({date})")

# Yield curve status
if "10Y Treasury Yield" in data_dict and "2Y Treasury Yield" in data_dict:
    spread = (data_dict["10Y Treasury Yield"].dropna().iloc[-1] -
              data_dict["2Y Treasury Yield"].dropna().iloc[-1])
    status = "INVERTED (recession signal)" if spread < 0 else "Normal"
    print(f"\n  Yield Curve Spread (10Y-2Y): {spread:.2f}pp — {status}")

print("\nThis data forms the macro context for any investment decision.")

21.11 — Summary

This module has translated the core concepts of macroeconomics into statistical language:

  1. GDP is a quarterly time series with trend, seasonal, and cyclical components, decomposed using HP filters or Hamilton regressions.
  2. Inflation (CPI) is a weighted Laspeyres price index with known biases. Core measures are robust estimators of underlying price trends.
  3. Unemployment is a ratio estimator from a survey (CPS) with a margin of error that most people ignore.
  4. The Phillips Curve is an unstable regression with time-varying coefficients — a cautionary tale about structural change.
  5. Central bank policy is an intervention in a time series, with the Taylor Rule as a predictive model and QE as an unconventional treatment.
  6. Leading indicators are predictive variables; coincident indicators are concurrent measures; lagging indicators are retrospective.
  7. The yield curve is a binary classifier for recession with impressive historical ROC performance.
  8. Business cycle dating is a change-point detection problem, solvable with Markov-switching models.
  9. Fiscal multipliers are heterogeneous treatment effects estimated with IV, local projections, or SVARs.

Stats Bridge

Macroeconomics is applied time series analysis, causal inference, and signal extraction — all areas where your statistical training gives you a substantial advantage. The key difference from laboratory statistics is that macro data is observational, non-stationary, subject to structural breaks, and revised after initial release. Treat every macro number with the same rigor you would apply to any other data source.