Learn Without Walls

Module 17: Risk Management & Stress Testing

Quantifying the worst that could happen using VaR, Expected Shortfall, EVT, and stress testing

Part IV of V, Module 17 of 22

Introduction: Quantifying the Worst That Could Happen

Risk management is applied statistics with life-or-death stakes. The 2008 financial crisis was, at its core, a failure of risk measurement: institutions vastly underestimated the probability and magnitude of extreme losses. As a statistician, you already have the mathematical toolkit — distributions, quantiles, tail behavior, simulation. This module shows how that toolkit is deployed to answer the most important question in finance: how much could we lose?

Stats Bridge

Risk management translates directly into statistical estimation of tail quantiles and conditional expectations. Value at Risk is a quantile. Expected Shortfall is a conditional expectation. Stress testing is scenario analysis. Extreme Value Theory is tail modeling. Every concept in this module has a direct statistical counterpart.

1. Value at Risk (VaR): The Industry Standard

1.1 Definition

Value at Risk answers the question: “What is the maximum loss we expect over a given time horizon, at a given confidence level?”

Finance Term

Value at Risk (VaR) — The (1 − α)-quantile of the portfolio loss distribution over a specified holding period. A 1-day 95% VaR of $1 million means there is a 5% probability that the portfolio will lose more than $1 million in one day.

VaRα = FL−1(1 − α) = −FR−1(α)

where FL is the CDF of losses, FR is the CDF of returns, and α is typically 0.01 or 0.05
Stats Bridge

VaR is simply the α-quantile of the return distribution. If you have computed quantiles in R or Python, you have computed VaR. The only subtlety is the sign convention: losses are positive, so VaR = −quantile(α) of returns. A 99% VaR uses α = 0.01 (the 1st percentile of returns).
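
A minimal sketch of this sign convention on simulated returns (hypothetical data, not the module's portfolio):

```python
import numpy as np

rng = np.random.default_rng(0)
daily_returns = rng.normal(loc=0.0005, scale=0.01, size=10_000)  # simulated daily returns

alpha = 0.05
var_95 = -np.quantile(daily_returns, alpha)  # 95% VaR = negative of the 5% return quantile

# Sanity check: about 5% of days should lose more than the VaR
breach_rate = np.mean(daily_returns < -var_95)
print(f"95% VaR: {var_95:.4%}  |  breach rate: {breach_rate:.1%}")
```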

1.2 VaR Parameters

| Parameter | Common Values | Choice Depends On |
| --- | --- | --- |
| Confidence level | 95%, 99%, 99.5% | Regulatory requirement; internal risk appetite |
| Holding period | 1 day, 10 days | Asset liquidity; regulatory standard |
| Lookback window | 250 days, 500 days, full history | Stationarity assumption; regime sensitivity |

2. Historical Simulation VaR

2.1 Method

The simplest VaR approach: use the empirical distribution of historical returns and take the α-quantile directly. No distributional assumptions required.

  1. Collect the last N daily returns (e.g., N = 500).
  2. Sort them from worst to best.
  3. The VaR at confidence level (1 − α) is the magnitude of the ⌈α·N⌉-th worst return (for N = 500 and α = 0.01, the 5th worst).
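
The three steps above, sketched on simulated data (a stand-in for the last N = 500 historical returns):

```python
import numpy as np

rng = np.random.default_rng(1)
returns = rng.standard_t(df=5, size=500) * 0.01  # stand-in for 500 daily returns

alpha = 0.01                                # 99% confidence level
sorted_returns = np.sort(returns)           # step 2: worst (most negative) first
k = int(np.ceil(alpha * len(returns)))      # step 3: the ceil(alpha*N)-th worst return
var_hist = -sorted_returns[k - 1]

print(f"99% historical VaR: {var_hist:.4%}")
```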

2.2 Pros and Cons

| Pros | Cons |
| --- | --- |
| No distributional assumptions | Depends heavily on lookback window |
| Captures fat tails, skewness naturally | Cannot extrapolate beyond observed losses |
| Simple to implement and explain | Ghost effect: extreme events enter/exit window abruptly |
| Works for any portfolio | Assumes past distribution equals future distribution |
Common Pitfall

Historical VaR cannot predict losses larger than any previously observed. If the worst day in your lookback window was −5%, your 99% VaR will never exceed 5%. This is a fundamental limitation: the method is blind to unprecedented events. The 2008 crash was “unprecedented” until it happened.

3. Parametric (Analytical) VaR

3.1 Normal Distribution VaR

Assume returns follow a normal distribution with mean μ and standard deviation σ:

VaRα = −(μ + σ · Φ−1(α))

For 99% VaR: VaR0.01 = −(μ − 2.326 σ)
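
A quick check of this formula with scipy (μ = 0 and σ = 1% assumed for illustration):

```python
from scipy import stats

mu, sigma, alpha = 0.0, 0.01, 0.01              # zero mean, 1% daily vol, 99% level
var_99 = -(mu + sigma * stats.norm.ppf(alpha))  # Phi^{-1}(0.01) is about -2.326
print(f"99% normal VaR: {var_99:.4%}")          # about 2.33% of portfolio value
```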

3.2 Student-t Distribution VaR

The normal distribution underestimates tail risk. The Student-t distribution, with its heavier tails controlled by the degrees-of-freedom parameter ν, provides a better fit:

VaRα = −(μ + σ · tν−1(α) · √((ν − 2)/ν))

Typical estimates for daily equity returns give ν ≈ 4–6, meaning tails are dramatically heavier than the normal distribution.

3.3 Impact of Distributional Assumptions

| Distribution | 99% VaR (σ = 1%) | 99.5% VaR | Tail Behavior |
| --- | --- | --- | --- |
| Normal | 2.33% | 2.58% | Exponential decay (too thin) |
| Student-t (ν=5) | 2.61% | 3.12% | Power-law decay (realistic) |
| Student-t (ν=3) | 2.62% | 3.37% | Very heavy tails (gap widens further out) |
Key Insight

The choice of distribution matters most deep in the tail. With variances matched (the √((ν − 2)/ν) factor above), a Student-t with ν=5 gives a 99% VaR about 12% higher than the normal, and the gap widens rapidly at more extreme levels: at 99.9% (relevant for regulatory capital), the t with ν=5 is roughly 50% above the normal figure and the t with ν=3 is nearly double it. This is why the distributional assumption is the most consequential modeling decision in risk management.

4. Monte Carlo VaR

4.1 The Simulation Approach

Monte Carlo VaR generates thousands (or millions) of possible future scenarios by simulating from a model, then takes the quantile of the simulated loss distribution.

  1. Specify a model for portfolio returns (e.g., multivariate normal, GARCH, copula).
  2. Estimate model parameters from historical data.
  3. Simulate N scenarios (e.g., N = 100,000) from the fitted model.
  4. Compute portfolio loss for each scenario.
  5. Take the α-quantile of the simulated losses.
Stats Bridge

Monte Carlo VaR is a direct application of the parametric bootstrap. You estimate a model, simulate from it, and use the simulation distribution to compute the statistic of interest (in this case, a quantile). The standard error of the Monte Carlo estimate decreases as 1/√N, so you need many simulations for precise tail estimates.

4.2 Advantages of Monte Carlo

Monte Carlo addresses the main weaknesses of historical simulation: it can generate losses larger than anything in the historical record, price nonlinear instruments (options, structured products) scenario by scenario, and capture time-varying volatility and complex dependence through the chosen model (GARCH, copulas). The cost is model risk: the simulated tail is only as good as the fitted model.

4.3 Precision and Computational Cost

SE(VaRα) ≈ √(α(1 − α) / n) / f(VaRα)

where f is the density at the VaR quantile and n is the number of simulations. For α = 0.01 (99% VaR), you need approximately 100,000 simulations to get stable estimates. For α = 0.001 (99.9%), you need millions.
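
The 1/√n convergence can be verified by repeating the simulation many times and watching the spread of the VaR estimates shrink (a sketch with simulated normal returns; all parameters illustrative):

```python
import numpy as np

rng = np.random.default_rng(42)
alpha = 0.01  # 99% VaR

def mc_var(n):
    """One Monte Carlo estimate of 99% VaR from n simulated N(0, 1%) daily returns."""
    return -np.quantile(rng.normal(0.0, 0.01, size=n), alpha)

se = {}
for n in [1_000, 10_000, 100_000]:
    estimates = [mc_var(n) for _ in range(200)]   # 200 independent runs per size
    se[n] = np.std(estimates)
    print(f"n={n:>7,}: mean VaR {np.mean(estimates):.4%}, SE {se[n]:.4%}")
```

Each tenfold increase in n should cut the standard error by roughly √10 ≈ 3.2.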

5. Expected Shortfall (CVaR): Beyond VaR

5.1 The Limitation of VaR

VaR tells you the threshold of extreme losses, but nothing about what happens beyond that threshold. Two portfolios can have the same VaR but vastly different tail risk profiles. Expected Shortfall addresses this by averaging the losses in the tail.

Finance Term

Expected Shortfall (ES), also called Conditional VaR (CVaR) or Tail VaR — The expected loss given that the loss exceeds VaR. Formally: ESα = E[L | L > VaRα].

ESα = E[−R | R ≤ −VaRα] = (1/α) ∫0α VaRu du
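
Both expressions can be checked against each other numerically: the average loss beyond the VaR threshold should match the average of VaRu over u ∈ (0, α] (simulated heavy-tailed returns; the integral is discretized, so agreement is approximate):

```python
import numpy as np

rng = np.random.default_rng(7)
returns = rng.standard_t(df=5, size=200_000) * 0.01  # simulated heavy-tailed returns
alpha = 0.05

# ES as a conditional expectation: average loss beyond the VaR threshold
var_a = -np.quantile(returns, alpha)
es_cond = -returns[returns <= -var_a].mean()

# ES as the average of VaR_u over u in (0, alpha]
us = np.linspace(alpha / 1000, alpha, 1000)
es_avg = -np.quantile(returns, us).mean()

print(f"ES as tail average:     {es_cond:.4%}")
print(f"ES as quantile average: {es_avg:.4%}")  # agree up to discretization error
```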

5.2 VaR vs Expected Shortfall

| Property | VaR | Expected Shortfall |
| --- | --- | --- |
| What it measures | Quantile (threshold) | Average loss in the tail |
| Tail sensitivity | None (ignores tail shape) | Captures tail severity |
| Coherence | Not subadditive | Coherent risk measure |
| Optimization | Non-convex | Convex (easier to optimize) |
| Estimation difficulty | Moderate | Higher (averaging over fewer observations) |
| Regulatory status | Basel II standard | Basel III / FRTB standard |

6. Coherent Risk Measures

6.1 The Axioms

Artzner et al. (1999) proposed four axioms that a “sensible” risk measure ρ should satisfy:

| Axiom | Formula | Meaning |
| --- | --- | --- |
| Subadditivity | ρ(X + Y) ≤ ρ(X) + ρ(Y) | Diversification should not increase risk |
| Monotonicity | If X ≤ Y a.s., then ρ(X) ≥ ρ(Y) | Worse outcomes mean higher risk |
| Positive homogeneity | ρ(λX) = λρ(X), λ > 0 | Doubling a position doubles risk |
| Translation invariance | ρ(X + c) = ρ(X) − c | Adding cash reduces risk by that amount |
Key Insight

VaR fails subadditivity. It is possible to construct two portfolios where VaR(A + B) > VaR(A) + VaR(B), meaning the “diversified” portfolio appears riskier than holding both positions separately. This is a mathematical absurdity for a risk measure. Expected Shortfall satisfies all four axioms, making it a coherent risk measure.
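
A standard textbook counterexample (not from this module's data) uses two independent defaultable bonds:

```python
# Two independent bonds, each losing 100 with probability 4% (zero otherwise)
p_default, loss, alpha = 0.04, 100.0, 0.05  # alpha = 0.05 -> 95% VaR

# Standalone: P(loss > 0) = 4% < 5%, so each bond's 95% VaR is 0
var_single = 0.0

# Combined: P(at least one default) = 1 - 0.96^2 = 7.84% > 5%,
# so the 95% loss quantile of the two-bond portfolio is a full 100
p_any_default = 1 - (1 - p_default) ** 2
var_combined = loss if p_any_default > alpha else 0.0

print(f"VaR(A) + VaR(B) = {2 * var_single:.0f}")  # 0
print(f"VaR(A + B)      = {var_combined:.0f}")    # 100: subadditivity fails
```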

Stats Bridge

The subadditivity failure of VaR reflects a basic fact about quantiles: the quantile of a sum is not controlled by the sum of the quantiles. Expected Shortfall can be written as an average of VaRu over all levels u ≤ α, and this tail averaging is what restores subadditivity — a single quantile can jump pathologically when positions are combined, but the average of the worst α-fraction of outcomes cannot.

7. Python: Computing VaR and ES All Three Ways

Python
import numpy as np
import pandas as pd
from scipy import stats
import yfinance as yf

# Download portfolio data
tickers = ["SPY", "TLT", "GLD"]
weights = np.array([0.6, 0.3, 0.1])  # 60/30/10 portfolio
# auto_adjust=False preserves the "Adj Close" column (newer yfinance adjusts by default)
data = yf.download(tickers, start="2010-01-01", end="2023-12-31", auto_adjust=False)
returns = data["Adj Close"].pct_change().dropna()

# Portfolio returns (index by ticker so columns align with the weights;
# yfinance returns columns in alphabetical order, not the order requested)
port_returns = (returns[tickers] * weights).sum(axis=1)
portfolio_value = 1_000_000  # $1M portfolio
alpha = 0.01  # 99% confidence

# ──────────────────────────────────────────────
# METHOD 1: Historical Simulation
# ──────────────────────────────────────────────
var_hist = -np.percentile(port_returns, alpha * 100)
# ES: average of returns below the VaR threshold
tail_returns = port_returns[port_returns <= -var_hist]
es_hist = -tail_returns.mean()

print("=== Historical Simulation ===")
print(f"99% VaR: {var_hist:.4%} (${var_hist * portfolio_value:,.0f})")
print(f"99% ES:  {es_hist:.4%} (${es_hist * portfolio_value:,.0f})")

# ──────────────────────────────────────────────
# METHOD 2: Parametric (Normal and Student-t)
# ──────────────────────────────────────────────
mu = port_returns.mean()
sigma = port_returns.std()

# Normal
var_norm = -(mu + sigma * stats.norm.ppf(alpha))
es_norm = -(mu - sigma * stats.norm.pdf(stats.norm.ppf(alpha)) / alpha)

# Student-t (fit degrees of freedom)
nu, loc, scale = stats.t.fit(port_returns)
var_t = -(loc + scale * stats.t.ppf(alpha, nu))
# ES for Student-t
t_pdf_at_var = stats.t.pdf(stats.t.ppf(alpha, nu), nu)
es_t = -(loc - scale * (t_pdf_at_var / alpha) * (nu + stats.t.ppf(alpha, nu)**2) / (nu - 1))

print("\n=== Parametric (Normal) ===")
print(f"99% VaR: {var_norm:.4%} (${var_norm * portfolio_value:,.0f})")
print(f"99% ES:  {es_norm:.4%} (${es_norm * portfolio_value:,.0f})")

print(f"\n=== Parametric (Student-t, nu={nu:.1f}) ===")
print(f"99% VaR: {var_t:.4%} (${var_t * portfolio_value:,.0f})")
print(f"99% ES:  {es_t:.4%} (${es_t * portfolio_value:,.0f})")

# ──────────────────────────────────────────────
# METHOD 3: Monte Carlo Simulation
# ──────────────────────────────────────────────
n_simulations = 100_000

# Simulate from fitted Student-t
np.random.seed(42)
simulated_returns = stats.t.rvs(nu, loc=loc, scale=scale, size=n_simulations)

var_mc = -np.percentile(simulated_returns, alpha * 100)
es_mc = -simulated_returns[simulated_returns <= -var_mc].mean()

print("\n=== Monte Carlo (100K simulations) ===")
print(f"99% VaR: {var_mc:.4%} (${var_mc * portfolio_value:,.0f})")
print(f"99% ES:  {es_mc:.4%} (${es_mc * portfolio_value:,.0f})")

# ──────────────────────────────────────────────
# Comparison summary
# ──────────────────────────────────────────────
summary = pd.DataFrame({
    "Historical": [var_hist, es_hist],
    "Normal": [var_norm, es_norm],
    "Student-t": [var_t, es_t],
    "Monte Carlo": [var_mc, es_mc],
}, index=["99% VaR", "99% ES"])

print("\n=== Summary (as % of portfolio) ===")
print((summary * 100).round(3))

8. Stress Testing: Beyond Statistical Models

8.1 The Philosophy

VaR and ES assume that the future resembles the past (through either historical data or fitted distributions). Stress testing asks a different question: what if something truly extreme happens? Stress tests apply specific hypothetical or historical scenarios to the portfolio and compute the resulting loss.

8.2 Types of Stress Tests

| Type | Description | Example |
| --- | --- | --- |
| Historical scenario | Replay a past crisis | Apply 2008 GFC market moves to current portfolio |
| Hypothetical scenario | Design a plausible but unprecedented event | US Treasury default; simultaneous equity and bond crash |
| Sensitivity analysis | Shock one risk factor at a time | What if rates rise 200bps? What if oil doubles? |
| Reverse stress test | Find the scenario that causes a specific loss | What market conditions would cause a $10M loss? |

8.3 Python: Stress Testing a Portfolio

Python
import numpy as np
import pandas as pd

# Current portfolio: 60% stocks, 30% bonds, 10% gold
weights = {"Equities": 0.60, "Bonds": 0.30, "Gold": 0.10}
portfolio_value = 1_000_000

# Historical crisis scenarios (approximate peak-to-trough moves)
scenarios = {
    "2008 GFC": {"Equities": -0.50, "Bonds": 0.20, "Gold": 0.05},
    "2020 COVID Crash": {"Equities": -0.34, "Bonds": 0.15, "Gold": -0.03},
    "2022 Rate Hikes": {"Equities": -0.25, "Bonds": -0.18, "Gold": 0.00},
    "Dot-Com Bust 2000": {"Equities": -0.45, "Bonds": 0.10, "Gold": -0.05},
    "Hypothetical: Stagflation": {"Equities": -0.30, "Bonds": -0.15, "Gold": 0.25},
    "Hypothetical: Everything Crash": {"Equities": -0.40, "Bonds": -0.20, "Gold": -0.10},
}

print(f"Portfolio: ${portfolio_value:,.0f}")
print(f"Weights: {weights}\n")

results = []
for scenario_name, shocks in scenarios.items():
    port_return = sum(weights[asset] * shocks[asset] for asset in weights)
    dollar_loss = port_return * portfolio_value
    results.append({
        "Scenario": scenario_name,
        "Equities": f"{shocks['Equities']:+.0%}",
        "Bonds": f"{shocks['Bonds']:+.0%}",
        "Gold": f"{shocks['Gold']:+.0%}",
        "Portfolio Return": f"{port_return:+.1%}",
        "Dollar P&L": f"${dollar_loss:+,.0f}",
    })

stress_df = pd.DataFrame(results)
print(stress_df.to_string(index=False))
Common Pitfall

The 2022 scenario is the most dangerous for a traditional 60/40 portfolio: stocks AND bonds fell simultaneously, destroying the diversification that investors assumed would protect them. Stress testing should always include scenarios where traditional correlations break down. The correlation between stocks and bonds is not a constant — it flips during certain macro regimes.

9. Extreme Value Theory (EVT): Modeling Only the Tails

9.1 Why EVT?

Standard distributions model the entire return distribution, but risk management cares primarily about the tails. Extreme Value Theory provides a rigorous framework for modeling only the extreme observations, using distributions justified by mathematical limit theorems (analogous to how the CLT justifies using the normal distribution for means).

Stats Bridge

EVT is to tail modeling what the Central Limit Theorem is to the normal distribution. The Fisher-Tippett-Gnedenko theorem shows that the distribution of the maximum of a large sample converges to one of three types: Gumbel, Fréchet, or Weibull — unified in the Generalized Extreme Value (GEV) distribution. For exceedances over a threshold, the analogous result gives the Generalized Pareto Distribution (GPD).
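
The block-maxima side of this theorem can be illustrated by fitting a GEV to monthly maxima of simulated heavy-tailed losses (a sketch; note that scipy's genextreme parameterizes the shape as c = −ξ):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)

# 480 "months" of 21 simulated daily losses each, heavy-tailed (Student-t, nu=4)
daily_losses = rng.standard_t(df=4, size=(480, 21))
block_maxima = daily_losses.max(axis=1)        # worst daily loss in each month

# Fit the GEV to the block maxima; scipy's shape parameter c equals -xi
c, loc, scale = stats.genextreme.fit(block_maxima)
xi = -c
print(f"Estimated GEV shape xi = {xi:.3f}")    # xi > 0 indicates a Frechet (heavy) tail
```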

9.2 Two Approaches

| Approach | Method | When to Use |
| --- | --- | --- |
| Block Maxima | Take the maximum loss in each block (month/quarter); fit GEV | Seasonal patterns; sufficient data per block |
| Peaks Over Threshold (POT) | Take all observations exceeding a threshold u; fit GPD | More data-efficient; standard choice in finance |

9.3 The Generalized Pareto Distribution

P(X > x | X > u) = (1 + ξ(x − u)/β)^(−1/ξ)

where ξ is the shape parameter (tail index) and β > 0 is the scale parameter; the ξ = 0 case is read as the limit exp(−(x − u)/β)

The shape parameter ξ determines the tail behavior:

| ξ Value | Tail Type | Examples |
| --- | --- | --- |
| ξ > 0 | Heavy tail (Pareto-type) | Financial returns, insurance claims |
| ξ = 0 | Exponential tail | Normal distribution |
| ξ < 0 | Bounded tail | Uniform distribution; rare in finance |

9.4 Choosing the Threshold

The threshold u is the key modeling choice in POT. Too low: you include non-extreme observations, violating the GPD assumption. Too high: you have too few exceedances for reliable estimation. Diagnostic tools include the mean excess plot (approximately linear above a valid threshold), parameter stability plots (fitted ξ and β should be roughly constant across nearby thresholds), and QQ plots of exceedances against the fitted GPD. A common pragmatic starting point is the 90th–95th percentile of losses.

9.5 Python: Fitting GPD to Portfolio Losses

Python
import numpy as np
import pandas as pd
from scipy import stats
import matplotlib.pyplot as plt

# Use portfolio returns from earlier
losses = -port_returns  # Convention: positive losses

# Step 1: Choose threshold (90th percentile of losses)
threshold = losses.quantile(0.90)
exceedances = losses[losses > threshold] - threshold
n_total = len(losses)
n_exceed = len(exceedances)

print(f"Threshold: {threshold:.4%}")
print(f"Exceedances: {n_exceed} out of {n_total} ({n_exceed/n_total:.1%})")

# Step 2: Fit GPD to exceedances
xi, loc, beta = stats.genpareto.fit(exceedances, floc=0)
print(f"GPD parameters: xi={xi:.4f}, beta={beta:.6f}")
print(f"Tail index xi={xi:.4f}: {'Heavy tail' if xi > 0 else 'Thin tail'}")

# Step 3: Compute VaR and ES using GPD
def gpd_var(alpha, n_total, n_exceed, threshold, xi, beta):
    """VaR from GPD tail estimate."""
    p_exceed = n_exceed / n_total
    return threshold + (beta / xi) * ((alpha / p_exceed)**(-xi) - 1)

def gpd_es(var_alpha, xi, beta, threshold):
    """ES from GPD tail estimate."""
    return var_alpha / (1 - xi) + (beta - xi * threshold) / (1 - xi)

for alpha in [0.05, 0.01, 0.005, 0.001]:
    var_gpd = gpd_var(alpha, n_total, n_exceed, threshold, xi, beta)
    es_gpd = gpd_es(var_gpd, xi, beta, threshold)
    print(f"{1-alpha:.1%} VaR (GPD): {var_gpd:.4%}  |  ES: {es_gpd:.4%}")

# Step 4: Mean Excess Plot
thresholds = np.linspace(losses.quantile(0.70), losses.quantile(0.98), 50)
mean_excess = [losses[losses > u].mean() - u for u in thresholds]

plt.figure(figsize=(10, 5))
plt.plot(thresholds * 100, mean_excess, "b.-")
plt.axvline(x=threshold * 100, color="red", linestyle="--",
            label=f"Chosen threshold ({threshold:.2%})")
plt.xlabel("Threshold (%)")
plt.ylabel("Mean Excess (%)")
plt.title("Mean Excess Plot (should be ~linear above good threshold)")
plt.legend()
plt.grid(True, alpha=0.3)
plt.show()

10. The Basel Regulatory Framework

10.1 Why Regulation Matters

Banks are required by regulators to hold capital reserves proportional to their risk. The Basel framework (Basel Committee on Banking Supervision) specifies how banks must measure and report risk. Understanding Basel is essential because it determines which statistical methods are mandated by law.

10.2 Evolution of Basel Risk Measures

| Framework | Risk Measure | Confidence | Horizon |
| --- | --- | --- | --- |
| Basel II (1996–2016) | VaR | 99% | 10-day |
| Basel III / FRTB (2016+) | Expected Shortfall | 97.5% | Variable (10–120 days by risk factor) |
Finance Term

FRTB (Fundamental Review of the Trading Book) — The Basel III market risk framework that replaced VaR with Expected Shortfall as the primary risk measure for regulatory capital calculations. It also introduced liquidity-adjusted horizons, where less liquid risk factors require longer holding periods.

10.3 Backtesting VaR Models

Regulators require banks to backtest their VaR models by comparing predicted VaR to realized losses. If actual losses exceed VaR more often than expected (more than 1% of the time for 99% VaR), the model is penalized and the bank must hold more capital.

Stats Bridge

VaR backtesting is a binomial test. Under a correct 99% VaR model, the number of exceptions (days where loss exceeds VaR) in N days follows Binomial(N, 0.01). The Basel “traffic light” system maps the observed number of exceptions to regulatory penalties: green zone (0–4 exceptions in 250 days), yellow zone (5–9), red zone (10+).
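
The binomial logic and the zone boundaries can be reproduced in a few lines (a sketch; scipy.stats.binom supplies the tail probabilities):

```python
from scipy import stats

n_days, p = 250, 0.01  # one year of daily 99% VaR forecasts, 1% expected exception rate

# Probability of observing at least k exceptions if the model is correct
for k in [5, 10]:  # yellow-zone and red-zone boundaries
    print(f"P(>= {k:>2} exceptions | correct model) = {stats.binom.sf(k - 1, n_days, p):.4f}")

def basel_zone(exceptions):
    """Map an exception count over 250 days to its Basel traffic-light zone."""
    if exceptions <= 4:
        return "green"
    return "yellow" if exceptions <= 9 else "red"

print(basel_zone(3), basel_zone(7), basel_zone(11))  # green yellow red
```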

11. Chapter Summary

| Statistics Concept | Risk Management Application | Key Practical Note |
| --- | --- | --- |
| Quantile estimation | Value at Risk | Not coherent; replaced by ES in Basel III |
| Conditional expectation | Expected Shortfall | Coherent; harder to estimate and backtest |
| Parametric bootstrap | Monte Carlo VaR | Need 100K+ simulations for 99% VaR |
| Empirical quantile | Historical simulation VaR | Cannot extrapolate beyond observed data |
| Extreme Value Theory | Tail risk modeling (GPD) | Threshold choice is critical |
| Scenario analysis | Stress testing | Test correlation breakdowns |
| Binomial test | VaR backtesting | Regulatory penalties for model failures |
Key Insight

Risk management is not about getting the number right — it is about understanding the model risk in every risk estimate. Every VaR number comes with hidden assumptions about distributions, stationarity, and correlations. The best risk managers use multiple methods (historical, parametric, Monte Carlo, EVT, stress tests) and pay attention to where they disagree. Disagreement between methods is the most valuable signal in risk management.