Learn Without Walls

Module 17: Risk Management & Stress Testing

Quantifying the worst that could happen using VaR, Expected Shortfall, EVT, and stress testing

Part IV of V, Module 17 of 22

Introduction: Quantifying the Worst That Could Happen

Risk management is applied statistics with life-or-death stakes. The 2008 financial crisis was, at its core, a failure of risk measurement: institutions vastly underestimated the probability and magnitude of extreme losses. As a statistician, you already have the mathematical toolkit — distributions, quantiles, tail behavior, simulation. This module shows how that toolkit is deployed to answer the most important question in finance: how much could we lose?

Stats Bridge

Risk management translates directly into statistical estimation of tail quantiles and conditional expectations. Value at Risk is a quantile. Expected Shortfall is a conditional expectation. Stress testing is scenario analysis. Extreme Value Theory is tail modeling. Every concept in this module has a direct statistical counterpart.

1. Value at Risk (VaR): The Industry Standard

1.1 Definition

Value at Risk answers the question: “What is the maximum loss we expect over a given time horizon, at a given confidence level?”

Finance Term

Value at Risk (VaR) — The (1 − α)-quantile of the portfolio loss distribution over a specified holding period. A 1-day 95% VaR of $1 million means there is a 5% probability that the portfolio will lose more than $1 million in one day.

VaRα = FL−1(1 − α) = −FR−1(α)

where FL is the CDF of losses, FR is the CDF of returns, and α is typically 0.01 or 0.05
Stats Bridge

VaR is simply the α-quantile of the return distribution. If you have computed quantiles in R or Python, you have computed VaR. The only subtlety is the sign convention: losses are positive, so VaR = −quantile(α) of returns. A 99% VaR uses α = 0.01 (the 1st percentile of returns).
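
A minimal sketch of this sign convention on simulated returns (hypothetical data, not the module's portfolio):

```python
import numpy as np

rng = np.random.default_rng(0)
daily_returns = rng.normal(loc=0.0005, scale=0.01, size=10_000)  # simulated daily returns

alpha = 0.05
var_95 = -np.quantile(daily_returns, alpha)  # 95% VaR = negative of the 5% return quantile

# Sanity check: about 5% of days should lose more than the VaR
breach_rate = np.mean(daily_returns < -var_95)
print(f"95% VaR: {var_95:.4%}  |  breach rate: {breach_rate:.1%}")
```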

1.2 VaR Parameters

| Parameter | Common Values | Choice Depends On |
| --- | --- | --- |
| Confidence level | 95%, 99%, 99.5% | Regulatory requirement; internal risk appetite |
| Holding period | 1 day, 10 days | Asset liquidity; regulatory standard |
| Lookback window | 250 days, 500 days, full history | Stationarity assumption; regime sensitivity |

2. Historical Simulation VaR

2.1 Method

The simplest VaR approach: use the empirical distribution of historical returns and take the α-quantile directly. No distributional assumptions required.

  1. Collect the last N daily returns (e.g., N = 500).
  2. Sort them from worst to best.
  3. The VaR at confidence level (1 − α) is the magnitude of the ⌈α·N⌉-th worst return (for N = 500 and α = 0.01, the 5th worst).
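
The three steps above, sketched on simulated data (a stand-in for the last N = 500 historical returns):

```python
import numpy as np

rng = np.random.default_rng(1)
returns = rng.standard_t(df=5, size=500) * 0.01  # stand-in for 500 daily returns

alpha = 0.01                                # 99% confidence level
sorted_returns = np.sort(returns)           # step 2: worst (most negative) first
k = int(np.ceil(alpha * len(returns)))      # step 3: the ceil(alpha*N)-th worst return
var_hist = -sorted_returns[k - 1]

print(f"99% historical VaR: {var_hist:.4%}")
```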

2.2 Pros and Cons

| Pros | Cons |
| --- | --- |
| No distributional assumptions | Depends heavily on lookback window |
| Captures fat tails, skewness naturally | Cannot extrapolate beyond observed losses |
| Simple to implement and explain | Ghost effect: extreme events enter/exit window abruptly |
| Works for any portfolio | Assumes past distribution equals future distribution |
Common Pitfall

Historical VaR cannot predict losses larger than any previously observed. If the worst day in your lookback window was −5%, your 99% VaR will never exceed 5%. This is a fundamental limitation: the method is blind to unprecedented events. The 2008 crash was “unprecedented” until it happened.

3. Parametric (Analytical) VaR

3.1 Normal Distribution VaR

Assume returns follow a normal distribution with mean μ and standard deviation σ:

VaRα = −(μ + σ · Φ−1(α))

For 99% VaR: VaR0.01 = −(μ − 2.326 σ)
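
A quick check of this formula with scipy (μ = 0 and σ = 1% assumed for illustration):

```python
from scipy import stats

mu, sigma, alpha = 0.0, 0.01, 0.01              # zero mean, 1% daily vol, 99% level
var_99 = -(mu + sigma * stats.norm.ppf(alpha))  # Phi^{-1}(0.01) is about -2.326
print(f"99% normal VaR: {var_99:.4%}")          # about 2.33% of portfolio value
```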

3.2 Student-t Distribution VaR

The normal distribution underestimates tail risk. The Student-t distribution, with its heavier tails controlled by the degrees-of-freedom parameter ν, provides a better fit:

VaRα = −(μ + σ · tν−1(α) · √((ν − 2)/ν))

Typical estimates for daily equity returns give ν ≈ 4–6, meaning tails are dramatically heavier than the normal distribution.

3.3 Impact of Distributional Assumptions

| Distribution | 99% VaR (σ = 1%) | 99.5% VaR | Tail Behavior |
| --- | --- | --- | --- |
| Normal | 2.33% | 2.58% | Exponential decay (too thin) |
| Student-t (ν=5) | 2.61% | 3.12% | Power-law decay (realistic) |
| Student-t (ν=3) | 2.62% | 3.37% | Very heavy tails (gap widens further out) |
Key Insight

The choice of distribution matters most deep in the tail. With variances matched (the √((ν − 2)/ν) factor above), a Student-t with ν=5 gives a 99% VaR about 12% higher than the normal, and the gap widens rapidly at more extreme levels: at 99.9% (relevant for regulatory capital), the t with ν=5 is roughly 50% above the normal figure and the t with ν=3 is nearly double it. This is why the distributional assumption is the most consequential modeling decision in risk management.

4. Monte Carlo VaR

4.1 The Simulation Approach

Monte Carlo VaR generates thousands (or millions) of possible future scenarios by simulating from a model, then takes the quantile of the simulated loss distribution.

  1. Specify a model for portfolio returns (e.g., multivariate normal, GARCH, copula).
  2. Estimate model parameters from historical data.
  3. Simulate N scenarios (e.g., N = 100,000) from the fitted model.
  4. Compute portfolio loss for each scenario.
  5. Take the α-quantile of the simulated losses.
Stats Bridge

Monte Carlo VaR is a direct application of the parametric bootstrap. You estimate a model, simulate from it, and use the simulation distribution to compute the statistic of interest (in this case, a quantile). The standard error of the Monte Carlo estimate decreases as 1/√N, so you need many simulations for precise tail estimates.

4.2 Advantages of Monte Carlo

Monte Carlo addresses the main weaknesses of historical simulation: it can generate losses larger than anything in the historical record, price nonlinear instruments (options, structured products) scenario by scenario, and capture time-varying volatility and complex dependence through the chosen model (GARCH, copulas). The cost is model risk: the simulated tail is only as good as the fitted model.

4.3 Precision and Computational Cost

SE(VaRα) ≈ √(α(1 − α) / n) / f(VaRα)

where f is the density at the VaR quantile and n is the number of simulations. For α = 0.01 (99% VaR), you need approximately 100,000 simulations to get stable estimates. For α = 0.001 (99.9%), you need millions.
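
The 1/√n convergence can be verified by repeating the simulation many times and watching the spread of the VaR estimates shrink (a sketch with simulated normal returns; all parameters illustrative):

```python
import numpy as np

rng = np.random.default_rng(42)
alpha = 0.01  # 99% VaR

def mc_var(n):
    """One Monte Carlo estimate of 99% VaR from n simulated N(0, 1%) daily returns."""
    return -np.quantile(rng.normal(0.0, 0.01, size=n), alpha)

se = {}
for n in [1_000, 10_000, 100_000]:
    estimates = [mc_var(n) for _ in range(200)]   # 200 independent runs per size
    se[n] = np.std(estimates)
    print(f"n={n:>7,}: mean VaR {np.mean(estimates):.4%}, SE {se[n]:.4%}")
```

Each tenfold increase in n should cut the standard error by roughly √10 ≈ 3.2.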

5. Expected Shortfall (CVaR): Beyond VaR

5.1 The Limitation of VaR

VaR tells you the threshold of extreme losses, but nothing about what happens beyond that threshold. Two portfolios can have the same VaR but vastly different tail risk profiles. Expected Shortfall addresses this by averaging the losses in the tail.

Finance Term

Expected Shortfall (ES), also called Conditional VaR (CVaR) or Tail VaR — The expected loss given that the loss exceeds VaR. Formally: ESα = E[L | L > VaRα].

ESα = E[−R | R ≤ −VaRα] = (1/α) ∫0α VaRu du
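
Both expressions can be checked against each other numerically: the average loss beyond the VaR threshold should match the average of VaRu over u ∈ (0, α] (simulated heavy-tailed returns; the integral is discretized, so agreement is approximate):

```python
import numpy as np

rng = np.random.default_rng(7)
returns = rng.standard_t(df=5, size=200_000) * 0.01  # simulated heavy-tailed returns
alpha = 0.05

# ES as a conditional expectation: average loss beyond the VaR threshold
var_a = -np.quantile(returns, alpha)
es_cond = -returns[returns <= -var_a].mean()

# ES as the average of VaR_u over u in (0, alpha]
us = np.linspace(alpha / 1000, alpha, 1000)
es_avg = -np.quantile(returns, us).mean()

print(f"ES as tail average:     {es_cond:.4%}")
print(f"ES as quantile average: {es_avg:.4%}")  # agree up to discretization error
```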

5.2 VaR vs Expected Shortfall

| Property | VaR | Expected Shortfall |
| --- | --- | --- |
| What it measures | Quantile (threshold) | Average loss in the tail |
| Tail sensitivity | None (ignores tail shape) | Captures tail severity |
| Coherence | Not subadditive | Coherent risk measure |
| Optimization | Non-convex | Convex (easier to optimize) |
| Estimation difficulty | Moderate | Higher (averaging over fewer observations) |
| Regulatory status | Basel II standard | Basel III / FRTB standard |

6. Coherent Risk Measures

6.1 The Axioms

Artzner et al. (1999) proposed four axioms that a “sensible” risk measure ρ should satisfy:

| Axiom | Formula | Meaning |
| --- | --- | --- |
| Subadditivity | ρ(X + Y) ≤ ρ(X) + ρ(Y) | Diversification should not increase risk |
| Monotonicity | If X ≤ Y a.s., then ρ(X) ≥ ρ(Y) | Worse outcomes mean higher risk |
| Positive homogeneity | ρ(λX) = λρ(X), λ > 0 | Doubling a position doubles risk |
| Translation invariance | ρ(X + c) = ρ(X) − c | Adding cash reduces risk by that amount |
Key Insight

VaR fails subadditivity. It is possible to construct two portfolios where VaR(A + B) > VaR(A) + VaR(B), meaning the “diversified” portfolio appears riskier than holding both positions separately. This is a mathematical absurdity for a risk measure. Expected Shortfall satisfies all four axioms, making it a coherent risk measure.
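
A standard textbook counterexample (not from this module's data) uses two independent defaultable bonds:

```python
# Two independent bonds, each losing 100 with probability 4% (zero otherwise)
p_default, loss, alpha = 0.04, 100.0, 0.05  # alpha = 0.05 -> 95% VaR

# Standalone: P(loss > 0) = 4% < 5%, so each bond's 95% VaR is 0
var_single = 0.0

# Combined: P(at least one default) = 1 - 0.96^2 = 7.84% > 5%,
# so the 95% loss quantile of the two-bond portfolio is a full 100
p_any_default = 1 - (1 - p_default) ** 2
var_combined = loss if p_any_default > alpha else 0.0

print(f"VaR(A) + VaR(B) = {2 * var_single:.0f}")  # 0
print(f"VaR(A + B)      = {var_combined:.0f}")    # 100: subadditivity fails
```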

Stats Bridge

The subadditivity failure of VaR reflects a basic fact about quantiles: the quantile of a sum is not controlled by the sum of the quantiles. Expected Shortfall can be written as an average of VaRu over all levels u ≤ α, and this tail averaging is what restores subadditivity — a single quantile can jump pathologically when positions are combined, but the average of the worst α-fraction of outcomes cannot.

7. Python: Computing VaR and ES All Three Ways

Python
import numpy as np
import pandas as pd
from scipy import stats
import yfinance as yf

# Download portfolio data
tickers = ["SPY", "TLT", "GLD"]
weights = np.array([0.6, 0.3, 0.1])  # 60/30/10 portfolio
# auto_adjust=False preserves the "Adj Close" column (newer yfinance adjusts by default)
data = yf.download(tickers, start="2010-01-01", end="2023-12-31", auto_adjust=False)
returns = data["Adj Close"].pct_change().dropna()

# Portfolio returns (index by ticker so columns align with the weights;
# yfinance returns columns in alphabetical order, not the order requested)
port_returns = (returns[tickers] * weights).sum(axis=1)
portfolio_value = 1_000_000  # $1M portfolio
alpha = 0.01  # 99% confidence

# ──────────────────────────────────────────────
# METHOD 1: Historical Simulation
# ──────────────────────────────────────────────
var_hist = -np.percentile(port_returns, alpha * 100)
# ES: average of returns below the VaR threshold
tail_returns = port_returns[port_returns <= -var_hist]
es_hist = -tail_returns.mean()

print("=== Historical Simulation ===")
print(f"99% VaR: {var_hist:.4%} (${var_hist * portfolio_value:,.0f})")
print(f"99% ES:  {es_hist:.4%} (${es_hist * portfolio_value:,.0f})")

# ──────────────────────────────────────────────
# METHOD 2: Parametric (Normal and Student-t)
# ──────────────────────────────────────────────
mu = port_returns.mean()
sigma = port_returns.std()

# Normal
var_norm = -(mu + sigma * stats.norm.ppf(alpha))
es_norm = -(mu - sigma * stats.norm.pdf(stats.norm.ppf(alpha)) / alpha)

# Student-t (fit degrees of freedom)
nu, loc, scale = stats.t.fit(port_returns)
var_t = -(loc + scale * stats.t.ppf(alpha, nu))
# ES for Student-t
t_pdf_at_var = stats.t.pdf(stats.t.ppf(alpha, nu), nu)
es_t = -(loc - scale * (t_pdf_at_var / alpha) * (nu + stats.t.ppf(alpha, nu)**2) / (nu - 1))

print("\n=== Parametric (Normal) ===")
print(f"99% VaR: {var_norm:.4%} (${var_norm * portfolio_value:,.0f})")
print(f"99% ES:  {es_norm:.4%} (${es_norm * portfolio_value:,.0f})")

print(f"\n=== Parametric (Student-t, nu={nu:.1f}) ===")
print(f"99% VaR: {var_t:.4%} (${var_t * portfolio_value:,.0f})")
print(f"99% ES:  {es_t:.4%} (${es_t * portfolio_value:,.0f})")

# ──────────────────────────────────────────────
# METHOD 3: Monte Carlo Simulation
# ──────────────────────────────────────────────
n_simulations = 100_000

# Simulate from fitted Student-t
np.random.seed(42)
simulated_returns = stats.t.rvs(nu, loc=loc, scale=scale, size=n_simulations)

var_mc = -np.percentile(simulated_returns, alpha * 100)
es_mc = -simulated_returns[simulated_returns <= -var_mc].mean()

print("\n=== Monte Carlo (100K simulations) ===")
print(f"99% VaR: {var_mc:.4%} (${var_mc * portfolio_value:,.0f})")
print(f"99% ES:  {es_mc:.4%} (${es_mc * portfolio_value:,.0f})")

# ──────────────────────────────────────────────
# Comparison summary
# ──────────────────────────────────────────────
summary = pd.DataFrame({
    "Historical": [var_hist, es_hist],
    "Normal": [var_norm, es_norm],
    "Student-t": [var_t, es_t],
    "Monte Carlo": [var_mc, es_mc],
}, index=["99% VaR", "99% ES"])

print("\n=== Summary (as % of portfolio) ===")
print((summary * 100).round(3))

8. Stress Testing: Beyond Statistical Models

8.1 The Philosophy

VaR and ES assume that the future resembles the past (through either historical data or fitted distributions). Stress testing asks a different question: what if something truly extreme happens? Stress tests apply specific hypothetical or historical scenarios to the portfolio and compute the resulting loss.

8.2 Types of Stress Tests

| Type | Description | Example |
| --- | --- | --- |
| Historical scenario | Replay a past crisis | Apply 2008 GFC market moves to current portfolio |
| Hypothetical scenario | Design a plausible but unprecedented event | US Treasury default; simultaneous equity and bond crash |
| Sensitivity analysis | Shock one risk factor at a time | What if rates rise 200bps? What if oil doubles? |
| Reverse stress test | Find the scenario that causes a specific loss | What market conditions would cause a $10M loss? |

8.3 Python: Stress Testing a Portfolio

Python
import numpy as np
import pandas as pd

# Current portfolio: 60% stocks, 30% bonds, 10% gold
weights = {"Equities": 0.60, "Bonds": 0.30, "Gold": 0.10}
portfolio_value = 1_000_000

# Historical crisis scenarios (approximate peak-to-trough moves)
scenarios = {
    "2008 GFC": {"Equities": -0.50, "Bonds": 0.20, "Gold": 0.05},
    "2020 COVID Crash": {"Equities": -0.34, "Bonds": 0.15, "Gold": -0.03},
    "2022 Rate Hikes": {"Equities": -0.25, "Bonds": -0.18, "Gold": 0.00},
    "Dot-Com Bust 2000": {"Equities": -0.45, "Bonds": 0.10, "Gold": -0.05},
    "Hypothetical: Stagflation": {"Equities": -0.30, "Bonds": -0.15, "Gold": 0.25},
    "Hypothetical: Everything Crash": {"Equities": -0.40, "Bonds": -0.20, "Gold": -0.10},
}

print(f"Portfolio: ${portfolio_value:,.0f}")
print(f"Weights: {weights}\n")

results = []
for scenario_name, shocks in scenarios.items():
    port_return = sum(weights[asset] * shocks[asset] for asset in weights)
    dollar_loss = port_return * portfolio_value
    results.append({
        "Scenario": scenario_name,
        "Equities": f"{shocks['Equities']:+.0%}",
        "Bonds": f"{shocks['Bonds']:+.0%}",
        "Gold": f"{shocks['Gold']:+.0%}",
        "Portfolio Return": f"{port_return:+.1%}",
        "Dollar P&L": f"${dollar_loss:+,.0f}",
    })

stress_df = pd.DataFrame(results)
print(stress_df.to_string(index=False))
Common Pitfall

The 2022 scenario is the most dangerous for a traditional 60/40 portfolio: stocks AND bonds fell simultaneously, destroying the diversification that investors assumed would protect them. Stress testing should always include scenarios where traditional correlations break down. The correlation between stocks and bonds is not a constant — it flips during certain macro regimes.

9. Extreme Value Theory (EVT): Modeling Only the Tails

9.1 Why EVT?

Standard distributions model the entire return distribution, but risk management cares primarily about the tails. Extreme Value Theory provides a rigorous framework for modeling only the extreme observations, using distributions justified by mathematical limit theorems (analogous to how the CLT justifies using the normal distribution for means).

Stats Bridge

EVT is to tail modeling what the Central Limit Theorem is to the normal distribution. The Fisher-Tippett-Gnedenko theorem shows that the distribution of the maximum of a large sample converges to one of three types: Gumbel, Fréchet, or Weibull — unified in the Generalized Extreme Value (GEV) distribution. For exceedances over a threshold, the analogous result gives the Generalized Pareto Distribution (GPD).
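
The block-maxima side of this theorem can be illustrated by fitting a GEV to monthly maxima of simulated heavy-tailed losses (a sketch; note that scipy's genextreme parameterizes the shape as c = −ξ):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)

# 480 "months" of 21 simulated daily losses each, heavy-tailed (Student-t, nu=4)
daily_losses = rng.standard_t(df=4, size=(480, 21))
block_maxima = daily_losses.max(axis=1)        # worst daily loss in each month

# Fit the GEV to the block maxima; scipy's shape parameter c equals -xi
c, loc, scale = stats.genextreme.fit(block_maxima)
xi = -c
print(f"Estimated GEV shape xi = {xi:.3f}")    # xi > 0 indicates a Frechet (heavy) tail
```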

9.2 Two Approaches

| Approach | Method | When to Use |
| --- | --- | --- |
| Block Maxima | Take the maximum loss in each block (month/quarter); fit GEV | Seasonal patterns; sufficient data per block |
| Peaks Over Threshold (POT) | Take all observations exceeding a threshold u; fit GPD | More data-efficient; standard choice in finance |

9.3 The Generalized Pareto Distribution

P(X > x | X > u) = (1 + ξ(x − u)/β)^(−1/ξ)

where ξ is the shape parameter (tail index) and β > 0 is the scale parameter; the ξ = 0 case is read as the limit exp(−(x − u)/β)

The shape parameter ξ determines the tail behavior:

| ξ Value | Tail Type | Examples |
| --- | --- | --- |
| ξ > 0 | Heavy tail (Pareto-type) | Financial returns, insurance claims |
| ξ = 0 | Exponential tail | Normal distribution |
| ξ < 0 | Bounded tail | Uniform distribution; rare in finance |

9.4 Choosing the Threshold

The threshold u is the key modeling choice in POT. Too low: you include non-extreme observations, violating the GPD assumption. Too high: you have too few exceedances for reliable estimation. Diagnostic tools include the mean excess plot (approximately linear above a valid threshold), parameter stability plots (fitted ξ and β should be roughly constant across nearby thresholds), and QQ plots of exceedances against the fitted GPD. A common pragmatic starting point is the 90th–95th percentile of losses.

9.5 Python: Fitting GPD to Portfolio Losses

Python
import numpy as np
import pandas as pd
from scipy import stats
import matplotlib.pyplot as plt

# Use portfolio returns from earlier
losses = -port_returns  # Convention: positive losses

# Step 1: Choose threshold (90th percentile of losses)
threshold = losses.quantile(0.90)
exceedances = losses[losses > threshold] - threshold
n_total = len(losses)
n_exceed = len(exceedances)

print(f"Threshold: {threshold:.4%}")
print(f"Exceedances: {n_exceed} out of {n_total} ({n_exceed/n_total:.1%})")

# Step 2: Fit GPD to exceedances
xi, loc, beta = stats.genpareto.fit(exceedances, floc=0)
print(f"GPD parameters: xi={xi:.4f}, beta={beta:.6f}")
print(f"Tail index xi={xi:.4f}: {'Heavy tail' if xi > 0 else 'Thin tail'}")

# Step 3: Compute VaR and ES using GPD
def gpd_var(alpha, n_total, n_exceed, threshold, xi, beta):
    """VaR from GPD tail estimate."""
    p_exceed = n_exceed / n_total
    return threshold + (beta / xi) * ((alpha / p_exceed)**(-xi) - 1)

def gpd_es(var_alpha, xi, beta, threshold):
    """ES from GPD tail estimate."""
    return var_alpha / (1 - xi) + (beta - xi * threshold) / (1 - xi)

for alpha in [0.05, 0.01, 0.005, 0.001]:
    var_gpd = gpd_var(alpha, n_total, n_exceed, threshold, xi, beta)
    es_gpd = gpd_es(var_gpd, xi, beta, threshold)
    print(f"{1-alpha:.1%} VaR (GPD): {var_gpd:.4%}  |  ES: {es_gpd:.4%}")

# Step 4: Mean Excess Plot
thresholds = np.linspace(losses.quantile(0.70), losses.quantile(0.98), 50)
mean_excess = [losses[losses > u].mean() - u for u in thresholds]

plt.figure(figsize=(10, 5))
plt.plot(thresholds * 100, mean_excess, "b.-")
plt.axvline(x=threshold * 100, color="red", linestyle="--",
            label=f"Chosen threshold ({threshold:.2%})")
plt.xlabel("Threshold (%)")
plt.ylabel("Mean Excess (%)")
plt.title("Mean Excess Plot (should be ~linear above good threshold)")
plt.legend()
plt.grid(True, alpha=0.3)
plt.show()

10. The Basel Regulatory Framework

10.1 Why Regulation Matters

Banks are required by regulators to hold capital reserves proportional to their risk. The Basel framework (Basel Committee on Banking Supervision) specifies how banks must measure and report risk. Understanding Basel is essential because it determines which statistical methods are mandated by law.

10.2 Evolution of Basel Risk Measures

| Framework | Risk Measure | Confidence | Horizon |
| --- | --- | --- | --- |
| Basel II (1996–2016) | VaR | 99% | 10-day |
| Basel III / FRTB (2016+) | Expected Shortfall | 97.5% | Variable (10–120 days by risk factor) |
Finance Term

FRTB (Fundamental Review of the Trading Book) — The Basel III market risk framework that replaced VaR with Expected Shortfall as the primary risk measure for regulatory capital calculations. It also introduced liquidity-adjusted horizons, where less liquid risk factors require longer holding periods.

10.3 Backtesting VaR Models

Regulators require banks to backtest their VaR models by comparing predicted VaR to realized losses. If actual losses exceed VaR more often than expected (more than 1% of the time for 99% VaR), the model is penalized and the bank must hold more capital.

Stats Bridge

VaR backtesting is a binomial test. Under a correct 99% VaR model, the number of exceptions (days where loss exceeds VaR) in N days follows Binomial(N, 0.01). The Basel “traffic light” system maps the observed number of exceptions to regulatory penalties: green zone (0–4 exceptions in 250 days), yellow zone (5–9), red zone (10+).
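
The binomial logic and the zone boundaries can be reproduced in a few lines (a sketch; scipy.stats.binom supplies the tail probabilities):

```python
from scipy import stats

n_days, p = 250, 0.01  # one year of daily 99% VaR forecasts, 1% expected exception rate

# Probability of observing at least k exceptions if the model is correct
for k in [5, 10]:  # yellow-zone and red-zone boundaries
    print(f"P(>= {k:>2} exceptions | correct model) = {stats.binom.sf(k - 1, n_days, p):.4f}")

def basel_zone(exceptions):
    """Map an exception count over 250 days to its Basel traffic-light zone."""
    if exceptions <= 4:
        return "green"
    return "yellow" if exceptions <= 9 else "red"

print(basel_zone(3), basel_zone(7), basel_zone(11))  # green yellow red
```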

11. Chapter Summary

| Statistics Concept | Risk Management Application | Key Practical Note |
| --- | --- | --- |
| Quantile estimation | Value at Risk | Not coherent; replaced by ES in Basel III |
| Conditional expectation | Expected Shortfall | Coherent; harder to estimate and backtest |
| Parametric bootstrap | Monte Carlo VaR | Need 100K+ simulations for 99% VaR |
| Empirical quantile | Historical simulation VaR | Cannot extrapolate beyond observed data |
| Extreme Value Theory | Tail risk modeling (GPD) | Threshold choice is critical |
| Scenario analysis | Stress testing | Test correlation breakdowns |
| Binomial test | VaR backtesting | Regulatory penalties for model failures |
Key Insight

Risk management is not about getting the number right — it is about understanding the model risk in every risk estimate. Every VaR number comes with hidden assumptions about distributions, stationarity, and correlations. The best risk managers use multiple methods (historical, parametric, Monte Carlo, EVT, stress tests) and pay attention to where they disagree. Disagreement between methods is the most valuable signal in risk management.