Module 17: Risk Management & Stress Testing
Quantifying the worst that could happen using VaR, Expected Shortfall, EVT, and stress testing
Introduction: Quantifying the Worst That Could Happen
Risk management is applied statistics with life-or-death stakes. The 2008 financial crisis was, at its core, a failure of risk measurement: institutions vastly underestimated the probability and magnitude of extreme losses. As a statistician, you already have the mathematical toolkit — distributions, quantiles, tail behavior, simulation. This module shows how that toolkit is deployed to answer the most important question in finance: how much could we lose?
Risk management translates directly into statistical estimation of tail quantiles and conditional expectations. Value at Risk is a quantile. Expected Shortfall is a conditional expectation. Stress testing is scenario analysis. Extreme Value Theory is tail modeling. Every concept in this module has a direct statistical counterpart.
1. Value at Risk (VaR): The Industry Standard
1.1 Definition
Value at Risk answers the question: “What is the maximum loss we expect over a given time horizon, at a given confidence level?”
Value at Risk (VaR) — The (1 − α)-quantile of the portfolio loss distribution over a specified holding period, where α is the tail probability. A 1-day 95% VaR of $1 million means there is a 5% probability that the portfolio will lose more than $1 million in one day.
VaRα = FL⁻¹(1 − α) = inf{ℓ : FL(ℓ) ≥ 1 − α}, where FL is the CDF of losses and α is typically 0.01 or 0.05
VaR is simply the α-quantile of the return distribution. If you have computed quantiles in R or Python, you have computed VaR. The only subtlety is the sign convention: losses are positive, so VaR = −quantile(α) of returns. A 99% VaR uses α = 0.01 (the 1st percentile of returns).
1.2 VaR Parameters
| Parameter | Common Values | Choice Depends On |
|---|---|---|
| Confidence level | 95%, 99%, 99.5% | Regulatory requirement; internal risk appetite |
| Holding period | 1 day, 10 days | Asset liquidity; regulatory standard |
| Lookback window | 250 days, 500 days, full history | Stationarity assumption; regime sensitivity |
2. Historical Simulation VaR
2.1 Method
The simplest VaR approach: use the empirical distribution of historical returns and take the α-quantile directly. No distributional assumptions required.
- Collect the last N daily returns (e.g., N = 500).
- Sort them from worst to best.
- The VaR at confidence level (1 − α) is the negative of the ⌈α·N⌉-th worst return (e.g., the 5th worst of 500 daily returns for 99% VaR).
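These steps amount to a single quantile computation. A minimal sketch, where the `returns` array is a simulated stand-in for 500 days of real P&L:

```python
import numpy as np

rng = np.random.default_rng(0)
returns = rng.normal(0.0005, 0.01, 500)  # stand-in for 500 daily returns

alpha = 0.01
# Sort worst to best and take the ceil(alpha * N)-th worst return
sorted_returns = np.sort(returns)
k = int(np.ceil(alpha * len(returns)))   # 5th worst of 500
var_99 = -sorted_returns[k - 1]

# Equivalent (up to interpolation) to the empirical quantile
var_99_q = -np.quantile(returns, alpha)
print(var_99, var_99_q)
```

Both numbers agree to within interpolation error; `np.quantile` simply automates the sort-and-pick procedure.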
2.2 Pros and Cons
| Pros | Cons |
|---|---|
| No distributional assumptions | Depends heavily on lookback window |
| Captures fat tails, skewness naturally | Cannot extrapolate beyond observed losses |
| Simple to implement and explain | Ghost effect: extreme events enter/exit window abruptly |
| Works for any portfolio | Assumes past distribution equals future distribution |
Historical VaR cannot predict losses larger than any previously observed. If the worst day in your lookback window was −5%, your 99% VaR will never exceed 5%. This is a fundamental limitation: the method is blind to unprecedented events. The 2008 crash was “unprecedented” until it happened.
3. Parametric (Analytical) VaR
3.1 Normal Distribution VaR
Assume returns follow a normal distribution with mean μ and standard deviation σ:
VaRα = −(μ + σ·zα), where zα = Φ⁻¹(α) is the standard normal α-quantile. For 99% VaR (z0.01 ≈ −2.326): VaR0.01 = −(μ − 2.326 σ)
3.2 Student-t Distribution VaR
The normal distribution underestimates tail risk. The Student-t distribution, with its heavier tails controlled by the degrees-of-freedom parameter ν, provides a better fit:
Typical estimates for daily equity returns give ν ≈ 4–6, meaning the tails are dramatically heavier than those of the normal distribution.
3.3 Impact of Distributional Assumptions
| Distribution | 99% VaR (σ = 1%) | 99.5% VaR | Tail Behavior |
|---|---|---|---|
| Normal | 2.33% | 2.58% | Exponential decay (too thin) |
| Student-t (ν=5) | 3.36% | 4.03% | Power-law decay (realistic) |
| Student-t (ν=3) | 4.54% | 5.84% | Very heavy tails |
The choice of distribution has a massive impact on risk estimates. At the 99% level, a Student-t with ν=5 gives VaR about 44% higher than the normal. For the 99.9% level (relevant for regulatory capital), the difference can be 2–3x. This is why the distributional assumption is the most consequential modeling decision in risk management.
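The table entries are raw distribution quantiles scaled by σ (i.e., σ is treated as the scale parameter of each distribution), so they can be checked directly with scipy:

```python
from scipy import stats

sigma = 0.01  # scale parameter, 1%
for alpha in (0.01, 0.005):
    # VaR = -(scale * alpha-quantile); mu assumed zero for this comparison
    print(f"alpha={alpha}: "
          f"normal {-stats.norm.ppf(alpha) * sigma:.4%}, "
          f"t(5) {-stats.t.ppf(alpha, 5) * sigma:.4%}, "
          f"t(3) {-stats.t.ppf(alpha, 3) * sigma:.4%}")
```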
4. Monte Carlo VaR
4.1 The Simulation Approach
Monte Carlo VaR generates thousands (or millions) of possible future scenarios by simulating from a model, then takes the quantile of the simulated loss distribution.
- Specify a model for portfolio returns (e.g., multivariate normal, GARCH, copula).
- Estimate model parameters from historical data.
- Simulate N scenarios (e.g., N = 100,000) from the fitted model.
- Compute portfolio loss for each scenario.
- Take the α-quantile of the simulated losses.
Monte Carlo VaR is a direct application of the parametric bootstrap. You estimate a model, simulate from it, and use the simulation distribution to compute the statistic of interest (in this case, a quantile). The standard error of the Monte Carlo estimate decreases as 1/√N, so you need many simulations for precise tail estimates.
4.2 Advantages of Monte Carlo
- Flexibility: Can incorporate any model — fat tails, asymmetry, time-varying volatility, copula dependence.
- Non-linear portfolios: Essential for portfolios with options, where losses are non-linear functions of underlying risk factors.
- Extrapolation: Can generate scenarios more extreme than historical experience.
- Full distribution: Produces the entire loss distribution, not just a single quantile.
4.3 Precision and Computational Cost
SE(VaRα) ≈ √(α(1 − α)/n) / f(VaRα), where f is the density at the VaR quantile and n is the number of simulations. For α = 0.01 (99% VaR), you need approximately 100,000 simulations to get stable estimates. For α = 0.001 (99.9%), you need millions.
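A quick illustrative experiment (standard normal returns, nothing portfolio-specific) confirms the 1/√n scaling: re-estimating the 99% VaR many times at several sample sizes shows the standard error shrinking by roughly √10 for each tenfold increase in n.

```python
import numpy as np

rng = np.random.default_rng(42)
alpha = 0.01
ses = {}

for n in (1_000, 10_000, 100_000):
    # Re-estimate the 99% VaR 200 times to measure its sampling variability
    estimates = [-np.percentile(rng.standard_normal(n), alpha * 100)
                 for _ in range(200)]
    ses[n] = np.std(estimates)
    print(f"n={n:>7,}: VaR ~ {np.mean(estimates):.3f}, SE ~ {ses[n]:.4f}")
```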
5. Expected Shortfall (CVaR): Beyond VaR
5.1 The Limitation of VaR
VaR tells you the threshold of extreme losses, but nothing about what happens beyond that threshold. Two portfolios can have the same VaR but vastly different tail risk profiles. Expected Shortfall addresses this by averaging the losses in the tail.
Expected Shortfall (ES), also called Conditional VaR (CVaR) or Tail VaR — The expected loss given that the loss exceeds VaR. Formally: ESα = E[L | L > VaRα].
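A simulated illustration of the "same VaR, different tail" point: normal losses versus rescaled Student-t(3) losses, where the t losses are deliberately scaled so that both portfolios share the same 95% VaR.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, alpha = 1_000_000, 0.05

# Portfolio A: standard normal losses
loss_a = rng.standard_normal(n)

# Portfolio B: Student-t(3) losses, rescaled so the 95% quantile matches A's
scale = stats.norm.ppf(1 - alpha) / stats.t.ppf(1 - alpha, df=3)
loss_b = scale * rng.standard_t(3, size=n)

def var_es(losses, alpha):
    var = np.percentile(losses, 100 * (1 - alpha))
    es = losses[losses >= var].mean()  # average loss beyond the VaR threshold
    return var, es

var_a, es_a = var_es(loss_a, alpha)
var_b, es_b = var_es(loss_b, alpha)
print(f"A: VaR={var_a:.3f}, ES={es_a:.3f}")
print(f"B: VaR={var_b:.3f}, ES={es_b:.3f}")
```

VaR reports the two portfolios as equally risky; ES correctly flags the heavy-tailed one as substantially worse.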
5.2 VaR vs Expected Shortfall
| Property | VaR | Expected Shortfall |
|---|---|---|
| What it measures | Quantile (threshold) | Average loss in the tail |
| Tail sensitivity | None (ignores tail shape) | Captures tail severity |
| Coherence | Not subadditive | Coherent risk measure |
| Optimization | Non-convex | Convex (easier to optimize) |
| Estimation difficulty | Moderate | Higher (averaging over fewer observations) |
| Regulatory status | Basel II standard | Basel III / FRTB standard |
6. Coherent Risk Measures
6.1 The Axioms
Artzner et al. (1999) proposed four axioms that a “sensible” risk measure ρ should satisfy:
| Axiom | Formula | Meaning |
|---|---|---|
| Subadditivity | ρ(X + Y) ≤ ρ(X) + ρ(Y) | Diversification should not increase risk |
| Monotonicity | If X ≤ Y a.s., then ρ(X) ≥ ρ(Y) | Worse outcomes mean higher risk |
| Positive homogeneity | ρ(λX) = λρ(X), λ > 0 | Doubling position doubles risk |
| Translation invariance | ρ(X + c) = ρ(X) − c | Adding cash reduces risk by that amount |
VaR fails subadditivity. It is possible to construct two portfolios where VaR(A + B) > VaR(A) + VaR(B), meaning the “diversified” portfolio appears riskier than holding both positions separately. This is a mathematical absurdity for a risk measure. Expected Shortfall satisfies all four axioms, making it a coherent risk measure.
The subadditivity failure of VaR reflects a basic fact about quantiles: the quantile of a sum of random variables can exceed the sum of their quantiles. Expected Shortfall, by contrast, can be written as an average of all quantiles beyond level α — ESα = (1/α) ∫₀^α VaRu du — and averaging over the entire tail, rather than reading off a single point of it, is what delivers subadditivity.
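A classic counterexample (with illustrative numbers): two independent loans, each defaulting with 4% probability. At the 95% level each loan's VaR is zero, because its default probability is below 5%, yet the pooled portfolio's VaR is a full loan's loss:

```python
import numpy as np

rng = np.random.default_rng(7)
n = 1_000_000

# Each loan loses 100 with probability 4%, otherwise nothing
loss_a = np.where(rng.random(n) < 0.04, 100.0, 0.0)
loss_b = np.where(rng.random(n) < 0.04, 100.0, 0.0)

def var_95(losses):
    return np.percentile(losses, 95)  # 95% VaR, losses positive

print(var_95(loss_a), var_95(loss_b))  # each 0: P(default) = 4% < 5%
print(var_95(loss_a + loss_b))         # 100: P(any default) = 7.84% > 5%
```

So VaR(A + B) = 100 > 0 = VaR(A) + VaR(B): the diversified pool looks strictly riskier. Expected Shortfall computed on the same samples does not exhibit this pathology.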
7. Python: Computing VaR and ES All Three Ways
```python
import numpy as np
import pandas as pd
from scipy import stats
import yfinance as yf

# Download portfolio data (auto_adjust=False keeps the "Adj Close" column)
tickers = ["SPY", "TLT", "GLD"]
weights = np.array([0.6, 0.3, 0.1])  # 60/30/10 portfolio

data = yf.download(tickers, start="2010-01-01", end="2023-12-31",
                   auto_adjust=False)
# Reorder columns to match the weights (yfinance sorts tickers alphabetically)
returns = data["Adj Close"][tickers].pct_change().dropna()

# Portfolio returns
port_returns = (returns * weights).sum(axis=1)

portfolio_value = 1_000_000  # $1M portfolio
alpha = 0.01                 # 99% confidence

# ──────────────────────────────────────────────
# METHOD 1: Historical Simulation
# ──────────────────────────────────────────────
var_hist = -np.percentile(port_returns, alpha * 100)

# ES: average of returns below the VaR threshold
tail_returns = port_returns[port_returns <= -var_hist]
es_hist = -tail_returns.mean()

print("=== Historical Simulation ===")
print(f"99% VaR: {var_hist:.4%} (${var_hist * portfolio_value:,.0f})")
print(f"99% ES:  {es_hist:.4%} (${es_hist * portfolio_value:,.0f})")

# ──────────────────────────────────────────────
# METHOD 2: Parametric (Normal and Student-t)
# ──────────────────────────────────────────────
mu = port_returns.mean()
sigma = port_returns.std()

# Normal
var_norm = -(mu + sigma * stats.norm.ppf(alpha))
es_norm = -(mu - sigma * stats.norm.pdf(stats.norm.ppf(alpha)) / alpha)

# Student-t (fit degrees of freedom)
nu, loc, scale = stats.t.fit(port_returns)
var_t = -(loc + scale * stats.t.ppf(alpha, nu))

# ES for Student-t
t_q = stats.t.ppf(alpha, nu)
es_t = -(loc - scale * (stats.t.pdf(t_q, nu) / alpha)
         * (nu + t_q**2) / (nu - 1))

print("\n=== Parametric (Normal) ===")
print(f"99% VaR: {var_norm:.4%} (${var_norm * portfolio_value:,.0f})")
print(f"99% ES:  {es_norm:.4%} (${es_norm * portfolio_value:,.0f})")
print(f"\n=== Parametric (Student-t, nu={nu:.1f}) ===")
print(f"99% VaR: {var_t:.4%} (${var_t * portfolio_value:,.0f})")
print(f"99% ES:  {es_t:.4%} (${es_t * portfolio_value:,.0f})")

# ──────────────────────────────────────────────
# METHOD 3: Monte Carlo Simulation
# ──────────────────────────────────────────────
n_simulations = 100_000

# Simulate from the fitted Student-t
np.random.seed(42)
simulated_returns = stats.t.rvs(nu, loc=loc, scale=scale, size=n_simulations)

var_mc = -np.percentile(simulated_returns, alpha * 100)
es_mc = -simulated_returns[simulated_returns <= -var_mc].mean()

print("\n=== Monte Carlo (100K simulations) ===")
print(f"99% VaR: {var_mc:.4%} (${var_mc * portfolio_value:,.0f})")
print(f"99% ES:  {es_mc:.4%} (${es_mc * portfolio_value:,.0f})")

# ──────────────────────────────────────────────
# Comparison summary
# ──────────────────────────────────────────────
summary = pd.DataFrame({
    "Historical": [var_hist, es_hist],
    "Normal": [var_norm, es_norm],
    "Student-t": [var_t, es_t],
    "Monte Carlo": [var_mc, es_mc],
}, index=["99% VaR", "99% ES"])

print("\n=== Summary (as % of portfolio) ===")
print((summary * 100).round(3))
```
8. Stress Testing: Beyond Statistical Models
8.1 The Philosophy
VaR and ES assume that the future resembles the past (through either historical data or fitted distributions). Stress testing asks a different question: what if something truly extreme happens? Stress tests apply specific hypothetical or historical scenarios to the portfolio and compute the resulting loss.
8.2 Types of Stress Tests
| Type | Description | Example |
|---|---|---|
| Historical scenario | Replay a past crisis | Apply 2008 GFC market moves to current portfolio |
| Hypothetical scenario | Design a plausible but unprecedented event | US Treasury default; simultaneous equity and bond crash |
| Sensitivity analysis | Shock one risk factor at a time | What if rates rise 200bps? What if oil doubles? |
| Reverse stress test | Find the scenario that causes a specific loss | What market conditions would cause $10M loss? |
8.3 Python: Stress Testing a Portfolio
```python
import numpy as np
import pandas as pd

# Current portfolio: 60% stocks, 30% bonds, 10% gold
weights = {"Equities": 0.60, "Bonds": 0.30, "Gold": 0.10}
portfolio_value = 1_000_000

# Historical crisis scenarios (approximate peak-to-trough moves)
scenarios = {
    "2008 GFC": {"Equities": -0.50, "Bonds": 0.20, "Gold": 0.05},
    "2020 COVID Crash": {"Equities": -0.34, "Bonds": 0.15, "Gold": -0.03},
    "2022 Rate Hikes": {"Equities": -0.25, "Bonds": -0.18, "Gold": 0.00},
    "Dot-Com Bust 2000": {"Equities": -0.45, "Bonds": 0.10, "Gold": -0.05},
    "Hypothetical: Stagflation": {"Equities": -0.30, "Bonds": -0.15, "Gold": 0.25},
    "Hypothetical: Everything Crash": {"Equities": -0.40, "Bonds": -0.20, "Gold": -0.10},
}

print(f"Portfolio: ${portfolio_value:,.0f}")
print(f"Weights: {weights}\n")

results = []
for scenario_name, shocks in scenarios.items():
    port_return = sum(weights[asset] * shocks[asset] for asset in weights)
    dollar_loss = port_return * portfolio_value
    results.append({
        "Scenario": scenario_name,
        "Equities": f"{shocks['Equities']:+.0%}",
        "Bonds": f"{shocks['Bonds']:+.0%}",
        "Gold": f"{shocks['Gold']:+.0%}",
        "Portfolio Return": f"{port_return:+.1%}",
        "Dollar P&L": f"${dollar_loss:+,.0f}",
    })

stress_df = pd.DataFrame(results)
print(stress_df.to_string(index=False))
```
The 2022 scenario is the most dangerous for a traditional 60/40 portfolio: stocks AND bonds fell simultaneously, destroying the diversification that investors assumed would protect them. Stress testing should always include scenarios where traditional correlations break down. The correlation between stocks and bonds is not a constant — it flips during certain macro regimes.
9. Extreme Value Theory (EVT): Modeling Only the Tails
9.1 Why EVT?
Standard distributions model the entire return distribution, but risk management cares primarily about the tails. Extreme Value Theory provides a rigorous framework for modeling only the extreme observations, using distributions justified by mathematical limit theorems (analogous to how the CLT justifies using the normal distribution for means).
EVT is to tail modeling what the Central Limit Theorem is to the normal distribution. The Fisher-Tippett-Gnedenko theorem shows that the distribution of the maximum of a large sample converges to one of three types: Gumbel, Fréchet, or Weibull — unified in the Generalized Extreme Value (GEV) distribution. For exceedances over a threshold, the analogous result gives the Generalized Pareto Distribution (GPD).
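For reference, the GEV cdf that unifies the three types (ξ > 0 Fréchet, ξ < 0 Weibull, ξ = 0 Gumbel as a limit) is:

```latex
G_{\xi,\mu,\sigma}(x) = \exp\left\{ -\left[ 1 + \xi\,\frac{x - \mu}{\sigma} \right]^{-1/\xi} \right\},
\qquad 1 + \xi\,\frac{x - \mu}{\sigma} > 0,
\qquad\text{with } \xi \to 0 \text{ limit } \;
G_{0,\mu,\sigma}(x) = \exp\left\{ -e^{-(x-\mu)/\sigma} \right\}.
```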
9.2 Two Approaches
| Approach | Method | When to Use |
|---|---|---|
| Block Maxima | Take the maximum loss in each block (month/quarter); fit GEV | Seasonal patterns; sufficient data per block |
| Peaks Over Threshold (POT) | Take all observations exceeding a threshold u; fit GPD | More data-efficient; standard choice in finance |
9.3 The Generalized Pareto Distribution
Gξ,β(x) = 1 − (1 + ξx/β)^(−1/ξ) for ξ ≠ 0 (and 1 − e^(−x/β) for ξ = 0), where ξ is the shape parameter (tail index) and β > 0 is the scale parameter
The shape parameter ξ determines the tail behavior:
| ξ Value | Tail Type | Examples |
|---|---|---|
| ξ > 0 | Heavy tail (Pareto-type) | Financial returns, insurance claims |
| ξ = 0 | Exponential tail | Normal distribution |
| ξ < 0 | Bounded tail | Uniform distribution; rare in finance |
9.4 Choosing the Threshold
The threshold u is the key modeling choice in POT. Too low: you include non-extreme observations, violating the GPD assumption. Too high: you have too few exceedances for reliable estimation. Diagnostic tools include:
- Mean Excess Plot: Plot E[X − u | X > u] vs u. Should be approximately linear for GPD data.
- Parameter stability plot: Estimate GPD parameters for a range of thresholds. The parameters should stabilize above the correct threshold.
- Rule of thumb: Use the 90th–95th percentile of losses as the threshold.
9.5 Python: Fitting GPD to Portfolio Losses
```python
import numpy as np
from scipy import stats
import matplotlib.pyplot as plt

# Use portfolio returns from earlier
losses = -port_returns  # Convention: positive losses

# Step 1: Choose threshold (90th percentile of losses)
threshold = losses.quantile(0.90)
exceedances = losses[losses > threshold] - threshold
n_total = len(losses)
n_exceed = len(exceedances)
print(f"Threshold: {threshold:.4%}")
print(f"Exceedances: {n_exceed} out of {n_total} ({n_exceed/n_total:.1%})")

# Step 2: Fit GPD to exceedances (location pinned at 0)
xi, loc, beta = stats.genpareto.fit(exceedances, floc=0)
print(f"GPD parameters: xi={xi:.4f}, beta={beta:.6f}")
print(f"Tail index xi={xi:.4f}: {'Heavy tail' if xi > 0 else 'Thin tail'}")

# Step 3: Compute VaR and ES using the GPD tail
def gpd_var(alpha, n_total, n_exceed, threshold, xi, beta):
    """VaR from GPD tail estimate."""
    p_exceed = n_exceed / n_total
    return threshold + (beta / xi) * ((alpha / p_exceed) ** (-xi) - 1)

def gpd_es(var_alpha, xi, beta, threshold):
    """ES from GPD tail estimate."""
    return var_alpha / (1 - xi) + (beta - xi * threshold) / (1 - xi)

for alpha in [0.05, 0.01, 0.005, 0.001]:
    var_gpd = gpd_var(alpha, n_total, n_exceed, threshold, xi, beta)
    es_gpd = gpd_es(var_gpd, xi, beta, threshold)
    print(f"{1 - alpha:.1%} VaR (GPD): {var_gpd:.4%} | ES: {es_gpd:.4%}")

# Step 4: Mean Excess Plot (both axes in %)
thresholds = np.linspace(losses.quantile(0.70), losses.quantile(0.98), 50)
mean_excess = np.array([losses[losses > u].mean() - u for u in thresholds])

plt.figure(figsize=(10, 5))
plt.plot(thresholds * 100, mean_excess * 100, "b.-")
plt.axvline(x=threshold * 100, color="red", linestyle="--",
            label=f"Chosen threshold ({threshold:.2%})")
plt.xlabel("Threshold (%)")
plt.ylabel("Mean Excess (%)")
plt.title("Mean Excess Plot (should be ~linear above good threshold)")
plt.legend()
plt.grid(True, alpha=0.3)
plt.show()
```
10. The Basel Regulatory Framework
10.1 Why Regulation Matters
Banks are required by regulators to hold capital reserves proportional to their risk. The Basel framework (Basel Committee on Banking Supervision) specifies how banks must measure and report risk. Understanding Basel is essential because it determines which statistical methods are mandated by law.
10.2 Evolution of Basel Risk Measures
| Framework | Risk Measure | Confidence | Horizon |
|---|---|---|---|
| Basel I/II (1996 Market Risk Amendment–2016) | VaR | 99% | 10-day |
| Basel III / FRTB (2016+) | Expected Shortfall | 97.5% | Variable (10–120 days by risk factor) |
FRTB (Fundamental Review of the Trading Book) — The Basel III market risk framework that replaced VaR with Expected Shortfall as the primary risk measure for regulatory capital calculations. It also introduced liquidity-adjusted horizons, where less liquid risk factors require longer holding periods.
10.3 Backtesting VaR Models
Regulators require banks to backtest their VaR models by comparing predicted VaR to realized losses. If actual losses exceed VaR more often than expected (more than 1% of the time for 99% VaR), the model is penalized and the bank must hold more capital.
VaR backtesting is a binomial test. Under a correct 99% VaR model, the number of exceptions (days where loss exceeds VaR) in N days follows Binomial(N, 0.01). The Basel “traffic light” system maps the observed number of exceptions to regulatory penalties: green zone (0–4 exceptions in 250 days), yellow zone (5–9), red zone (10+).
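The traffic-light zone probabilities follow directly from this binomial model; a quick check of how often a correctly calibrated 99% VaR model lands in each zone over a 250-day year:

```python
from scipy import stats

n_days, p = 250, 0.01  # one trading year, 99% VaR

# Probability of each Basel traffic-light zone under a correct model
p_green  = stats.binom.cdf(4, n_days, p)            # 0-4 exceptions
p_yellow = stats.binom.cdf(9, n_days, p) - p_green  # 5-9 exceptions
p_red    = 1 - stats.binom.cdf(9, n_days, p)        # 10+ exceptions

print(f"green: {p_green:.3f}, yellow: {p_yellow:.4f}, red: {p_red:.6f}")
```

Note that even a perfectly calibrated model lands in the yellow zone roughly one year in ten, a false-positive rate the framework accepts by design, while a red-zone outcome under a correct model is vanishingly unlikely.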
11. Chapter Summary
| Statistics Concept | Risk Management Application | Key Practical Note |
|---|---|---|
| Quantile estimation | Value at Risk | Not coherent; replaced by ES in Basel III |
| Conditional expectation | Expected Shortfall | Coherent; harder to estimate and backtest |
| Parametric bootstrap | Monte Carlo VaR | Need 100K+ simulations for 99% VaR |
| Empirical quantile | Historical simulation VaR | Cannot extrapolate beyond observed data |
| Extreme Value Theory | Tail risk modeling (GPD) | Threshold choice is critical |
| Scenario analysis | Stress testing | Test correlation breakdowns |
| Binomial test | VaR backtesting | Regulatory penalties for model failures |
Risk management is not about getting the number right — it is about understanding the model risk in every risk estimate. Every VaR number comes with hidden assumptions about distributions, stationarity, and correlations. The best risk managers use multiple methods (historical, parametric, Monte Carlo, EVT, stress tests) and pay attention to where they disagree. Disagreement between methods is the most valuable signal in risk management.