Learn Without Walls

Module 03: Fat Tails & Stylized Facts

Universal patterns in financial returns that every statistician should know


1. Introduction: Universal Patterns in Financial Returns

In Module 02 we established that returns, not prices, are the proper object of statistical analysis. We also hinted that returns are not normally distributed. This module dives deep into how returns deviate from normality and introduces the stylized facts — a set of empirical regularities that appear across nearly all financial assets, markets, and time periods.

Finance Term
Stylized Facts: Empirical regularities observed so consistently across different markets, asset classes, and historical periods that they are considered universal properties of financial returns. Any realistic model of financial data should reproduce these patterns. The term was popularized by Cont (2001) in a seminal review paper.

As a statistician, you will recognize these as distributional properties and serial dependence structures. What makes them remarkable is their universality: the same patterns appear in equities, currencies, commodities, and interest rates, across New York, London, Tokyo, and Mumbai, from the 1920s to today.

Stats Bridge
Think of stylized facts as the financial equivalent of the empirical laws you encounter in other fields: Zipf’s law in linguistics, Benford’s law in accounting, or Taylor’s law in ecology. They are not derived from first principles but are so robustly observed that they serve as constraints on any credible model.

2. Heavy Tails: Extreme Events Are Not Rare

2.1 Kurtosis: Measuring Tail Heaviness

The kurtosis of a distribution measures the weight of its tails relative to the normal distribution. The normal distribution has a kurtosis of 3 (or an excess kurtosis of 0). Financial returns typically have excess kurtosis between 5 and 50 for daily data.

Excess Kurtosis = E[(X − μ)⁴] / σ⁴ − 3

An excess kurtosis of 10 does not mean the distribution is “slightly” non-normal. It means the probability of a 5-sigma event is orders of magnitude larger than the normal distribution predicts.
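To make "orders of magnitude" concrete, here is a quick comparison of tail probabilities under a normal distribution and a unit-variance Student-t with ν = 4 (an illustrative value in the typical fitted range):

```python
from scipy import stats

nu = 4
# Rescale the t so both distributions have unit variance: Var(t_nu) = nu/(nu-2)
t_scale = (nu / (nu - 2)) ** -0.5

for k in [3, 4, 5]:
    p_norm = stats.norm.sf(k)             # P(X > k sigma) under the normal
    p_t = stats.t.sf(k / t_scale, df=nu)  # same event under the unit-variance t(4)
    print(f"{k}-sigma: normal={p_norm:.2e}, t(4)={p_t:.2e}, ratio={p_t / p_norm:,.0f}x")
```

The ratio grows without bound as you move further into the tail, which is the defining property of heavy tails.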

import yfinance as yf
import numpy as np
import pandas as pd
import scipy.stats as stats

# Download several assets to show universality
tickers = {
    "S&P 500": "^GSPC",
    "Apple": "AAPL",
    "Euro/USD": "EURUSD=X",
    "Gold": "GC=F",
    "10Y Treasury": "^TNX",
    "Bitcoin": "BTC-USD",
}

results = []
for name, ticker in tickers.items():
    data = yf.download(ticker, start="2015-01-01", end="2025-01-01",
                       auto_adjust=False, progress=False)
    # auto_adjust=False keeps the "Adj Close" column (newer yfinance versions adjust
    # in place by default); squeeze() guards against MultiIndex columns
    prices = data["Adj Close"].squeeze()
    ret = np.log(prices / prices.shift(1)).dropna()
    results.append({
        "Asset": name,
        "N": len(ret),
        "Mean (bps)": ret.mean() * 10000,
        "Std (%)": ret.std() * 100,
        "Skewness": ret.skew(),
        "Excess Kurtosis": ret.kurtosis(),
        "Min (%)": ret.min() * 100,
        "Max (%)": ret.max() * 100,
    })

df = pd.DataFrame(results)
print(df.to_string(index=False))
# Notice: EVERY asset has excess kurtosis >> 0

2.2 How Much More Likely Are Extreme Events?

Under a normal distribution, a daily return exceeding 4 standard deviations should occur roughly once every 126 years. In practice, it happens roughly once or twice per year. This table quantifies the discrepancy:

| Event Size | Normal Prob (one-tail) | Expected Freq (daily) | Actual Freq (S&P 500) | Ratio |
| 3σ move | 0.135% | Once per ~3 years | Several per year | ~5–10x |
| 4σ move | 0.003% | Once per ~126 years | Once per 1–2 years | ~100x |
| 5σ move | 0.00003% | Once per ~13,800 years | Once per 5–10 years | ~1,000x |
| 6σ move | ~10⁻⁹ | Once per ~4 million years | Several times in 100 years | ~10,000x+ |
Key Insight
The Black Monday crash of October 19, 1987 was a 22-sigma event under the normal distribution. The probability of a 22-sigma event is approximately 10⁻¹⁰⁷. There are only about 10¹⁷ seconds in the age of the universe. The normal distribution does not merely underestimate tail risk — it misses it by a factor that is cosmologically absurd.
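The 10⁻¹⁰⁷ figure can be checked with the log survival function, since the plain survival function underflows at this magnitude:

```python
import numpy as np
from scipy import stats

# log10 of the one-tailed normal probability of a 22-sigma move;
# logsf avoids the underflow that stats.norm.sf(22) would produce
log10_p = stats.norm.logsf(22) / np.log(10)
print(f"P(Z > 22) = 10^{log10_p:.1f}")
```

This prints an exponent near −107, confirming the figure above.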

2.3 Comparing to the Student-t Distribution

A much better model for the unconditional distribution of returns is the Student-t distribution with low degrees of freedom (ν ≈ 3–7). This naturally produces heavier tails.

import matplotlib.pyplot as plt

# Fit a t-distribution to AAPL log returns
# Fit a t-distribution to AAPL log returns
aapl = yf.download("AAPL", start="2015-01-01", end="2025-01-01",
                   auto_adjust=False, progress=False)  # keep the "Adj Close" column
prices = aapl["Adj Close"].squeeze()
log_ret = np.log(prices / prices.shift(1)).dropna()

# Fit Student-t (returns df, loc, scale)
t_params = stats.t.fit(log_ret)
print(f"Fitted t-distribution: df={t_params[0]:.2f}, loc={t_params[1]:.6f}, scale={t_params[2]:.6f}")

# Compare fits visually
fig, axes = plt.subplots(1, 2, figsize=(14, 5))

x = np.linspace(log_ret.min(), log_ret.max(), 500)

# Linear scale
axes[0].hist(log_ret, bins=150, density=True, alpha=0.6, color='#3182ce', label='Data')
axes[0].plot(x, stats.norm.pdf(x, log_ret.mean(), log_ret.std()),
            'r-', lw=2, label='Normal')
axes[0].plot(x, stats.t.pdf(x, *t_params),
            'g-', lw=2, label=f'Student-t (df={t_params[0]:.1f})')
axes[0].set_title('Density Comparison (Linear Scale)')
axes[0].legend()

# Log scale — crucial for seeing tail behavior
axes[1].hist(log_ret, bins=200, density=True, alpha=0.6, color='#3182ce', label='Data')
axes[1].plot(x, stats.norm.pdf(x, log_ret.mean(), log_ret.std()),
            'r-', lw=2, label='Normal')
axes[1].plot(x, stats.t.pdf(x, *t_params),
            'g-', lw=2, label=f'Student-t (df={t_params[0]:.1f})')
axes[1].set_yscale('log')
axes[1].set_title('Density Comparison (Log Scale — Shows Tails)')
axes[1].legend()

plt.tight_layout()
plt.show()
Stats Bridge
The Student-t distribution is a natural choice because you already know it from hypothesis testing with small samples. When ν → ∞, the t-distribution converges to the normal. When ν is small (3–7), the tails are much heavier. A t-distribution with ν ≤ 4 has an infinite fourth moment, so the sample kurtosis does not converge to any finite value — another reason why moment-based tests can behave oddly with financial data.
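A quick Monte Carlo illustrates the point: for t-distributed data with ν = 4, the sample excess kurtosis never settles down, no matter how large the sample. The repetition count and sample size below are arbitrary.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)

# Sample excess kurtosis of t(4) draws across independent repetitions:
# because the population fourth moment is infinite, the estimates do not
# concentrate around any finite value
estimates = []
for _ in range(4):
    sample = rng.standard_t(df=4, size=100_000)
    k = stats.kurtosis(sample)
    estimates.append(k)
    print(f"n=100,000 sample excess kurtosis: {k:8.2f}")
```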

3. QQ Plots: The Visual Signature of Fat Tails

3.1 Reading a QQ Plot

The quantile-quantile (QQ) plot is the single most informative diagnostic for distributional assumptions. It plots the quantiles of your data against the quantiles of a theoretical distribution. Points on the 45-degree line mean perfect agreement.
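Mechanically, a QQ plot is nothing more than sorted data plotted against theoretical quantiles evaluated at plotting positions. A minimal hand-rolled construction, run on simulated t-distributed data purely to show the mechanics:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
x = np.sort(rng.standard_t(df=4, size=2000))   # heavy-tailed sample, sorted

# Plotting positions (i - 0.5)/n, then the theoretical normal quantiles there
p = (np.arange(1, len(x) + 1) - 0.5) / len(x)
q_norm = stats.norm.ppf(p)

# On the 45-degree line, empirical and theoretical quantiles match; for
# heavy-tailed data the extreme empirical quantiles run far outside
print("largest data point:", x[-1].round(2), " vs normal quantile:", q_norm[-1].round(2))
```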

3.2 QQ Plots Against Different Reference Distributions

fig, axes = plt.subplots(1, 3, figsize=(16, 5))

# 1. QQ plot against Normal
stats.probplot(log_ret, dist="norm", plot=axes[0])
axes[0].set_title('QQ Plot: Data vs Normal')
axes[0].get_lines()[0].set_markerfacecolor('#3182ce')
axes[0].get_lines()[0].set_markersize(2)

# 2. QQ plot against t-distribution with fitted df
stats.probplot(log_ret, dist=stats.t, sparams=(t_params[0],), plot=axes[1])
axes[1].set_title(f'QQ Plot: Data vs t(df={t_params[0]:.1f})')
axes[1].get_lines()[0].set_markerfacecolor('#38a169')
axes[1].get_lines()[0].set_markersize(2)

# 3. QQ plot against Laplace (double exponential)
stats.probplot(log_ret, dist=stats.laplace, plot=axes[2])
axes[2].set_title('QQ Plot: Data vs Laplace')
axes[2].get_lines()[0].set_markerfacecolor('#d69e2e')
axes[2].get_lines()[0].set_markersize(2)

plt.tight_layout()
plt.show()

# The normal QQ will show clear S-shaped deviation
# The t-distribution QQ will be much closer to the diagonal
# The Laplace may overshoot the tails slightly
Key Insight
The QQ plot against the normal distribution is the “classic financial plot.” It is included in virtually every empirical finance paper and every risk management report. The S-shaped deviation in the tails is so characteristic that an experienced analyst can identify financial return data from the QQ plot alone. If your QQ plot does not show this pattern, double-check your data.

4. Volatility Clustering: Large Moves Follow Large Moves

4.1 The Phenomenon

If you plot the time series of daily returns, you will notice something striking: large moves (positive or negative) tend to cluster together, and calm periods tend to cluster together. This is called volatility clustering.

Mandelbrot described it: large changes tend to be followed by large changes — of either sign — and small changes tend to be followed by small changes.

Stats Bridge
Volatility clustering means the returns process is conditionally heteroscedastic. While the unconditional variance may be constant (stationary returns), the conditional variance — the variance at time t given information up to t−1 — varies over time. This is exactly the kind of structure that ARCH/GARCH models are designed to capture. In statistical terms, the series is a martingale difference sequence but not an i.i.d. sequence: E[r_t | F_{t−1}] ≈ 0, but Var(r_t | F_{t−1}) is time-varying.
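A small simulation makes this concrete. The model below is a toy stochastic-volatility process, r_t = σ_t·z_t with log σ_t following a persistent AR(1); all parameter values are illustrative:

```python
import numpy as np

rng = np.random.default_rng(42)
n = 20_000

# Toy stochastic-volatility model: log sigma follows a persistent AR(1)
log_sigma = np.zeros(n)
for t in range(1, n):
    log_sigma[t] = 0.98 * log_sigma[t - 1] + 0.1 * rng.standard_normal()
r = np.exp(log_sigma) * rng.standard_normal(n)   # r_t = sigma_t * z_t

def acf1(x):
    # Lag-1 sample autocorrelation
    x = x - x.mean()
    return (x[:-1] * x[1:]).mean() / x.var()

print(f"lag-1 ACF of r:   {acf1(r):+.3f}")        # near zero
print(f"lag-1 ACF of |r|: {acf1(np.abs(r)):+.3f}")  # clearly positive
```

The returns are serially uncorrelated even though they are far from independent: the dependence lives entirely in the volatility.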

4.2 Visualizing Volatility Clustering

fig, axes = plt.subplots(3, 1, figsize=(14, 10), sharex=True)

# Panel 1: Returns
axes[0].plot(log_ret.index, log_ret.values, color='#1a365d', linewidth=0.4)
axes[0].set_ylabel('Log Return')
axes[0].set_title('AAPL: Returns and Volatility Clustering')
axes[0].axhline(y=0, color='gray', linestyle='--', linewidth=0.5)

# Panel 2: Absolute returns (proxy for volatility)
axes[1].plot(log_ret.index, log_ret.abs().values, color='#e53e3e', linewidth=0.4)
axes[1].set_ylabel('|Log Return|')
axes[1].set_title('Absolute Returns (Volatility Proxy)')

# Panel 3: Squared returns (another volatility proxy)
axes[2].plot(log_ret.index, (log_ret ** 2).values, color='#d69e2e', linewidth=0.4)
axes[2].set_ylabel('Return²')
axes[2].set_title('Squared Returns (Variance Proxy)')

plt.tight_layout()
plt.show()

# The clustering in panels 2 and 3 is visually obvious

4.3 ARCH Effects: Testing for Volatility Clustering

The Engle ARCH test formally tests whether squared residuals exhibit serial correlation — the statistical signature of volatility clustering.

from statsmodels.stats.diagnostic import het_arch

# Engle's ARCH test on residuals (or demeaned returns)
demeaned = log_ret - log_ret.mean()
arch_test = het_arch(demeaned, nlags=10)

print("=== Engle's ARCH Test ===")
print(f"LM statistic: {arch_test[0]:.4f}")
print(f"p-value:      {arch_test[1]:.2e}")
print(f"F-statistic:  {arch_test[2]:.4f}")
print(f"F p-value:    {arch_test[3]:.2e}")
print(f"Conclusion:   {'ARCH effects present' if arch_test[1] < 0.05 else 'No ARCH effects'}")
# Will almost certainly show significant ARCH effects
Common Pitfall
Volatility clustering means that standard OLS standard errors in financial return regressions are wrong, even if the returns themselves are stationary. The conditional heteroscedasticity violates the homoscedasticity assumption. You need heteroscedasticity-robust (White) or HAC (Newey-West) standard errors, or explicit GARCH modeling, to get correct inference.

5. Autocorrelation: Returns vs Absolute Returns

5.1 The Key Stylized Fact

Returns themselves show little or no serial correlation. This is consistent with weak-form market efficiency. However, absolute returns and squared returns show strong, persistent positive autocorrelation that decays slowly. This is the autocorrelation signature of volatility clustering.

Stats Bridge
This dual autocorrelation structure is the hallmark of a white noise process viewed through a stochastic volatility filter. If r_t = σ_t · z_t where z_t is i.i.d. and σ_t is a slowly varying process, then r_t itself is serially uncorrelated, while |r_t| and r_t² inherit the persistence of σ_t. This is precisely the structure that GARCH(1,1) and stochastic volatility models are designed to capture.

5.2 Autocorrelation Plots

from statsmodels.graphics.tsaplots import plot_acf
from statsmodels.tsa.stattools import acf

fig, axes = plt.subplots(2, 2, figsize=(14, 10))

# ACF of returns
plot_acf(log_ret, lags=50, ax=axes[0, 0], title='ACF of Returns',
         color='#1a365d', alpha=0.05)

# ACF of absolute returns
plot_acf(log_ret.abs(), lags=50, ax=axes[0, 1], title='ACF of |Returns|',
         color='#e53e3e', alpha=0.05)

# ACF of squared returns
plot_acf(log_ret ** 2, lags=50, ax=axes[1, 0], title='ACF of Returns²',
         color='#d69e2e', alpha=0.05)

# Compare ACF decay: absolute vs squared
acf_abs = acf(log_ret.abs(), nlags=100)
acf_sq = acf(log_ret ** 2, nlags=100)
axes[1, 1].plot(acf_abs, label='|Returns|', color='#e53e3e')
axes[1, 1].plot(acf_sq, label='Returns²', color='#d69e2e')
axes[1, 1].set_title('ACF Comparison: |r| vs r²')
axes[1, 1].set_xlabel('Lag')
axes[1, 1].legend()

plt.tight_layout()
plt.show()

# Key observation: |returns| has HIGHER autocorrelation than returns²
# This is called the "Taylor effect" — another stylized fact

5.3 The Ljung-Box Test for Serial Correlation

from statsmodels.stats.diagnostic import acorr_ljungbox

# Test serial correlation in returns
lb_returns = acorr_ljungbox(log_ret, lags=[5, 10, 20], return_df=True)
print("=== Ljung-Box Test: Returns ===")
print(lb_returns)

print()

# Test serial correlation in squared returns
lb_squared = acorr_ljungbox(log_ret ** 2, lags=[5, 10, 20], return_df=True)
print("=== Ljung-Box Test: Squared Returns ===")
print(lb_squared)

# Returns: may or may not show significance
# Squared returns: will almost certainly show HIGHLY significant autocorrelation

5.4 Long Memory in Volatility

The autocorrelation of absolute returns decays very slowly — much slower than the exponential decay of a GARCH(1,1) model. This is evidence of long memory in volatility, sometimes modeled with fractional integration (FIGARCH) or hyperbolic decay functions.
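The practical difference between the two decay shapes is stark at long lags; the parameter values below are purely illustrative:

```python
import numpy as np

lags = np.array([1, 10, 50, 100])

acf_exp = 0.3 * 0.95 ** lags   # GARCH-style exponential decay (illustrative)
acf_hyp = 0.3 * lags ** -0.3   # hyperbolic (power-law) decay, as seen in |returns|

for k, e, h in zip(lags, acf_exp, acf_hyp):
    print(f"lag {k:3d}: exponential {e:.4f}   hyperbolic {h:.4f}")
# At lag 100 the exponential ACF is essentially zero while the
# hyperbolic ACF is still around 0.08
```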

Key Insight
The autocorrelation of |rt| remains statistically significant at lags of 100 or more trading days — that is nearly half a year. This persistence means that a period of high volatility reliably predicts continued high volatility for months, not just days. This has profound implications for risk management: if you are in a high-volatility regime, you should expect it to persist.

6. Formal Normality Tests for Financial Data

6.1 Overview of Tests

| Test | Tests What | Best For | Limitations |
| Jarque-Bera | Skewness + kurtosis jointly | Large samples, financial data | Only uses 3rd and 4th moments |
| Shapiro-Wilk | Overall distributional shape | Small to medium samples (n < 5000) | Not valid for n > 5000 |
| Anderson-Darling | Tail behavior (more weight on tails) | Detecting tail deviations | More conservative than other tests |
| Lilliefors | Overall CDF comparison | When parameters are estimated | Less powerful than AD for tails |
| D’Agostino-Pearson | Skewness + kurtosis omnibus | General purpose | Requires n > 20 |

6.2 Running All Tests

# Comprehensive normality testing
print("=== Normality Tests on AAPL Log Returns ===")
print(f"Sample size: {len(log_ret)}\n")

# 1. Jarque-Bera (ideal for financial data)
jb_stat, jb_p = stats.jarque_bera(log_ret)
print(f"Jarque-Bera:      stat={jb_stat:>10.2f}, p={jb_p:.2e}")

# 2. D'Agostino-Pearson omnibus
dp_stat, dp_p = stats.normaltest(log_ret)
print(f"D'Agostino-K2:    stat={dp_stat:>10.2f}, p={dp_p:.2e}")

# 3. Anderson-Darling
ad_result = stats.anderson(log_ret, dist='norm')
print(f"Anderson-Darling: stat={ad_result.statistic:>10.4f}")
for sl, cv in zip(ad_result.significance_level, ad_result.critical_values):
    reject = "REJECT" if ad_result.statistic > cv else "fail to reject"
    print(f"  At {sl}% level: critical={cv:.4f} -> {reject}")

# 4. Shapiro-Wilk (subsample for large n)
sw_sample = log_ret.sample(min(5000, len(log_ret)), random_state=42)
sw_stat, sw_p = stats.shapiro(sw_sample)
print(f"Shapiro-Wilk:     stat={sw_stat:>10.6f}, p={sw_p:.2e} (n={len(sw_sample)})")

# 5. Lilliefors (KS test with estimated parameters)
from statsmodels.stats.diagnostic import lilliefors
lf_stat, lf_p = lilliefors(log_ret, dist='norm')
print(f"Lilliefors:       stat={lf_stat:>10.6f}, p={lf_p:.4f}")

print("\nAll tests reject normality. This is expected: for daily returns, non-normality is itself a stylized fact.")
Common Pitfall
Do not use the standard Kolmogorov-Smirnov test (stats.kstest) for normality testing when you estimate μ and σ from the data. The KS test assumes the parameters are known a priori; estimating them from the same sample makes the test overly conservative (reduced power). Use the Lilliefors modification instead, which corrects for parameter estimation.
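A small simulation demonstrates the pitfall. With truly normal data and parameters estimated from the sample, the naive KS test almost never rejects at the nominal 5% level, while Lilliefors rejects at roughly the advertised rate. The repetition count and sample size are arbitrary.

```python
import numpy as np
from scipy import stats
from statsmodels.stats.diagnostic import lilliefors

rng = np.random.default_rng(0)

# Simulate truly normal data, then test with parameters ESTIMATED from the sample.
# Under H0, p-values should be uniform; KS with plugged-in estimates is not.
ks_p, lf_p = [], []
for _ in range(200):
    x = rng.normal(size=500)
    ks_p.append(stats.kstest(x, 'norm', args=(x.mean(), x.std())).pvalue)
    lf_p.append(lilliefors(x, dist='norm')[1])

ks_rate = np.mean(np.array(ks_p) < 0.05)
lf_rate = np.mean(np.array(lf_p) < 0.05)
print(f"KS rejection rate at 5%:         {ks_rate:.3f}")   # far below 0.05
print(f"Lilliefors rejection rate at 5%: {lf_rate:.3f}")   # close to 0.05
```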

7. The Leverage Effect: Asymmetric Volatility

7.1 What Is the Leverage Effect?

The leverage effect is the empirical observation that negative returns tend to increase future volatility more than positive returns of the same magnitude. When a stock drops 5%, subsequent volatility tends to be higher than after a 5% gain.

Finance Term
Leverage Effect: The asymmetric relationship between returns and volatility. Named because a decline in stock price increases the firm’s financial leverage (debt-to-equity ratio), theoretically making the stock riskier. Whether this financial explanation is correct remains debated, but the empirical pattern is robust.

7.2 Measuring the Leverage Effect

# Cross-correlation between returns and future squared returns
# Negative returns at time t should predict higher squared returns at t+1, t+2, ...

max_lag = 20
leverage_corr = []
for k in range(1, max_lag + 1):
    corr = log_ret.corr(log_ret.shift(-k) ** 2)
    leverage_corr.append(corr)

fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# Left: cross-correlation (leverage effect)
axes[0].bar(range(1, max_lag + 1), leverage_corr, color='#e53e3e')
axes[0].set_xlabel('Lag k (days ahead)')
axes[0].set_ylabel('Corr(r_t, r²_{t+k})')
axes[0].set_title('Leverage Effect: Return vs Future Volatility')
axes[0].axhline(y=0, color='gray', linestyle='--')

# Right: scatter of returns vs next-day absolute returns
axes[1].scatter(log_ret.iloc[:-1].values, log_ret.abs().iloc[1:].values,
               alpha=0.15, s=5, color='#1a365d')
axes[1].set_xlabel('Return today')
axes[1].set_ylabel('|Return| tomorrow')
axes[1].set_title('Asymmetric Volatility Response')

# Add binned means to show the asymmetry
bins = pd.qcut(log_ret.iloc[:-1], 20)
binned_mean = log_ret.abs().iloc[1:].groupby(bins.values).mean()
bin_centers = [interval.mid for interval in binned_mean.index]
axes[1].plot(bin_centers, binned_mean.values, 'r-o', linewidth=2, markersize=4)

plt.tight_layout()
plt.show()
# The cross-correlations should be negative (negative return -> higher future vol)
# The scatter should show a "smirk" — higher abs returns following negative returns
Stats Bridge
The leverage effect introduces asymmetry in the volatility dynamics. In GARCH terms, a standard GARCH(1,1) model treats positive and negative shocks symmetrically. The GJR-GARCH and EGARCH models extend this to allow different volatility responses to positive vs negative returns. This is analogous to modeling heterogeneous treatment effects in clinical trials: the “treatment” (a price shock) has different effects depending on its sign.

8. Summary: The Complete List of Stylized Facts

Here is the canonical list of stylized facts, each mapped to its statistical interpretation:

| # | Stylized Fact | Statistical Description | Implication |
| 1 | Heavy tails | Excess kurtosis >> 0; power-law tails | Normal distribution underestimates extreme events |
| 2 | Absence of return autocorrelation | ACF(r_t) ≈ 0 for lags k ≥ 1 | Returns are approximately unpredictable |
| 3 | Volatility clustering | ACF(|r_t|) > 0, slowly decaying | Conditional heteroscedasticity; need GARCH |
| 4 | Leverage effect | Corr(r_t, σ²_{t+1}) < 0 | Negative returns increase volatility more |
| 5 | Volume-volatility correlation | Corr(V_t, |r_t|) > 0 | High volume accompanies large price moves |
| 6 | Aggregational Gaussianity | Kurtosis decreases at longer horizons | CLT applies; monthly returns are more normal than daily |
| 7 | Taylor effect | ACF(|r_t|) > ACF(r_t²) | Absolute returns are more predictable than squared |
| 8 | Gain/loss asymmetry | Negative skewness in stock returns | Large drops more likely than equivalent large gains |
| 9 | Long memory in volatility | ACF of |r_t| decays hyperbolically, not exponentially | GARCH(1,1) may be insufficient; consider FIGARCH |
Key Insight
Any model you build for financial returns should reproduce at least stylized facts 1–4. A model that assumes i.i.d. normal returns violates all of them. The GARCH(1,1) model with t-distributed errors captures facts 1–3. Adding an asymmetric component (GJR-GARCH or EGARCH) captures fact 4 as well. This is why GARCH models are the workhorses of financial econometrics.
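Stylized fact 6 (aggregational Gaussianity) is easy to reproduce in simulation: aggregate heavy-tailed daily returns over longer horizons and watch the excess kurtosis shrink toward zero. The t(5) choice and horizon lengths below are illustrative.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
daily = rng.standard_t(df=5, size=252 * 200)   # ~200 years of simulated daily returns

kurts = {}
for horizon in [1, 5, 21, 63]:
    n = len(daily) // horizon
    agg = daily[: n * horizon].reshape(n, horizon).sum(axis=1)   # h-day returns
    kurts[horizon] = stats.kurtosis(agg)
    print(f"{horizon:3d}-day returns: excess kurtosis = {kurts[horizon]:6.2f}")
# Kurtosis shrinks toward 0 (normality) as the horizon grows: the CLT at work
```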

In the next module, we will turn to the multivariate structure of financial data: how correlations between assets behave, why they break down during crises, and why Pearson correlation is often the wrong tool for the job.