Learn Without Walls

Module 04: Correlation in Finance

Why Pearson correlation is problematic and what to use instead

Part 1 of 5 · Module 04 of 22

1. Introduction: Why Correlation Matters in Finance

Correlation is the foundation of modern portfolio theory. Harry Markowitz won the Nobel Prize in Economics for showing that the risk of a portfolio depends not just on the risk of individual assets, but on the correlations between them. Diversification — the only “free lunch” in finance — works because correlations between assets are less than 1.

As a statistician, you understand correlation deeply. But financial correlation has peculiarities that can trip up even seasoned analysts: it changes over time, it breaks down exactly when you need it most, and the standard Pearson measure is often the wrong tool for the data you are working with.

Stats Bridge
In a standard statistics course, correlation is typically estimated from a single sample and treated as a fixed parameter. In finance, correlation is best thought of as a time-varying latent process — something that must be estimated with rolling windows or dynamic models (like DCC-GARCH). The “true” correlation between two assets today may be very different from what it was five years ago.

2. Why Pearson Correlation Is Problematic for Financial Data

2.1 The Assumptions Behind Pearson’s r

Pearson’s correlation coefficient measures the linear association between two variables. It is the maximum likelihood estimator of the correlation parameter when the data come from a bivariate normal distribution. For financial returns, this assumption fails on multiple fronts: returns are heavy-tailed, their dependence varies over time, and their co-movement is often concentrated in the tails.

Common Pitfall
Pearson correlation = 0 does not mean independence for non-normal data. Two assets can have zero linear correlation but strong tail dependence — they move independently in normal times but crash together in crises. This is precisely the pattern that destroyed portfolios in 2008: assets that appeared uncorrelated suddenly moved in lockstep during the crisis.
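This mechanism is easy to reproduce. The simulation below is purely illustrative (the two-state volatility mixture and its parameters are my own, not from any market data): the two series share a volatility regime but have independent shocks, so Pearson correlation is near zero — yet extreme days cluster together.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Shared volatility regime: both assets are calm (90%) or turbulent (10%) together
sigma = rng.choice([0.5, 3.0], size=n, p=[0.9, 0.1])
x = sigma * rng.standard_normal(n)
y = sigma * rng.standard_normal(n)

r = np.corrcoef(x, y)[0, 1]
print(f"Pearson r: {r:.3f}")  # near zero: the shocks are independent

# ...but extremes cluster: compare P(|y| extreme) with and without conditioning
big = np.quantile(np.abs(x), 0.99)
p_uncond = np.mean(np.abs(y) > big)
p_cond = np.mean(np.abs(y[np.abs(x) > big]) > big)
print(f"P(|y| extreme):               {p_uncond:.3f}")
print(f"P(|y| extreme | |x| extreme): {p_cond:.3f}")
```

On a day when one asset moves violently, the other is far more likely to move violently too — dependence that Pearson’s r simply does not see.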

2.2 Sensitivity to Outliers

Python
import yfinance as yf
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Download two stocks (auto_adjust=False keeps the "Adj Close" column,
# which newer versions of yfinance drop by default)
data = yf.download(["AAPL", "MSFT"], start="2018-01-01", end="2025-01-01", auto_adjust=False)
adj = data["Adj Close"].dropna()
ret = np.log(adj / adj.shift(1)).dropna()

# Full sample Pearson correlation
pearson_full = ret["AAPL"].corr(ret["MSFT"])

# Remove the 5% most extreme days (by absolute return of either stock)
max_abs = ret.abs().max(axis=1)
threshold = max_abs.quantile(0.95)
ret_trimmed = ret[max_abs <= threshold]
pearson_trimmed = ret_trimmed["AAPL"].corr(ret_trimmed["MSFT"])

print(f"Full sample Pearson r:    {pearson_full:.4f}  (n={len(ret)})")
print(f"Trimmed (95%) Pearson r:  {pearson_trimmed:.4f}  (n={len(ret_trimmed)})")
print(f"Change:                   {pearson_full - pearson_trimmed:+.4f}")
print(f"Removing 5% of data changed correlation by {abs(pearson_full - pearson_trimmed) / pearson_full * 100:.1f}%")
Key Insight
The fact that removing 5% of observations substantially changes the correlation estimate tells you that a few extreme days drive a large part of the measured correlation. This is a direct consequence of the heavy-tailed distribution: the squared deviations from the mean that enter the Pearson formula are dominated by extreme observations.
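The same point can be isolated in a toy example (illustrative parameters, not market data): appending a single joint-crash observation to a mildly correlated sample moves Pearson’s r substantially, while the rank-based Spearman measure introduced in the next section barely reacts, because one observation can shift each rank by at most one position.

```python
import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(1)
n = 500

# Mildly correlated "normal-times" returns
z = rng.standard_normal(n)
x = 0.5 * z + rng.standard_normal(n)
y = 0.5 * z + rng.standard_normal(n)

# Append a single joint-crash observation
x_out = np.append(x, -15.0)
y_out = np.append(y, -15.0)

print(f"Pearson  without outlier: {np.corrcoef(x, y)[0, 1]:.3f}")
print(f"Pearson  with outlier:    {np.corrcoef(x_out, y_out)[0, 1]:.3f}")
print(f"Spearman without outlier: {spearmanr(x, y)[0]:.3f}")
print(f"Spearman with outlier:    {spearmanr(x_out, y_out)[0]:.3f}")
```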

3. Robust Alternatives: Spearman and Kendall

3.1 Spearman Rank Correlation

Spearman’s rank correlation replaces the raw values with their ranks before computing Pearson’s r. This makes it robust to outliers and capable of capturing monotonic nonlinear relationships.

ρ_S = 1 − (6 Σ d_i²) / (n(n² − 1))

where d_i = rank(x_i) − rank(y_i).
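As a sanity check on illustrative simulated data (continuous draws, so no ties), the rank-difference formula, Pearson’s r applied to the ranks, and scipy.stats.spearmanr all give the same number:

```python
import numpy as np
from scipy.stats import rankdata, spearmanr

rng = np.random.default_rng(7)
x = rng.standard_normal(200)
y = 0.6 * x + rng.standard_normal(200)  # continuous data, so no ties

rx, ry = rankdata(x), rankdata(y)
n = len(x)

# Rank-difference formula (valid when there are no ties)
d = rx - ry
rho_formula = 1 - 6 * np.sum(d ** 2) / (n * (n ** 2 - 1))

# Equivalent: Pearson's r computed on the ranks
rho_ranks = np.corrcoef(rx, ry)[0, 1]

rho_scipy, _ = spearmanr(x, y)
print(f"{rho_formula:.6f}  {rho_ranks:.6f}  {rho_scipy:.6f}")  # all three agree
```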

Stats Bridge
Spearman’s ρ is Pearson’s r computed on the copula of the data (the rank-transformed observations). It measures the strength of the monotonic association between two variables, regardless of their marginal distributions. For financial data with heavy tails, this is often a more honest measure of co-movement than Pearson’s r.

3.2 Kendall’s Tau

Kendall’s τ is based on the proportion of concordant versus discordant pairs of observations. A pair of observations (x_i, y_i) and (x_j, y_j) is concordant if the two variables order them the same way — that is, if (x_i − x_j)(y_i − y_j) > 0 — and discordant if that product is negative.

τ = (C − D) / (n(n − 1)/2)

where C is the number of concordant pairs and D the number of discordant pairs.
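The definition can be checked directly by brute force on illustrative simulated data: count concordant and discordant pairs over all O(n²) combinations and compare with scipy.stats.kendalltau (which uses a faster algorithm and, for tie-free data, gives the same value).

```python
import numpy as np
from itertools import combinations
from scipy.stats import kendalltau

rng = np.random.default_rng(3)
x = rng.standard_normal(100)
y = 0.5 * x + rng.standard_normal(100)  # continuous, tie-free

# Brute-force O(n^2) count of concordant and discordant pairs
C = D = 0
for i, j in combinations(range(len(x)), 2):
    s = (x[i] - x[j]) * (y[i] - y[j])
    if s > 0:
        C += 1
    elif s < 0:
        D += 1

n = len(x)
tau_manual = (C - D) / (n * (n - 1) / 2)
tau_scipy, _ = kendalltau(x, y)
print(f"{tau_manual:.6f}  {tau_scipy:.6f}")  # agree on tie-free data
```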

3.3 Comparing All Three Measures

Python
from scipy.stats import spearmanr, kendalltau

# Download a broader set of assets
tickers = ["AAPL", "MSFT", "GOOG", "JPM", "XOM", "GLD"]
data = yf.download(tickers, start="2018-01-01", end="2025-01-01", progress=False, auto_adjust=False)  # keep "Adj Close"
ret = np.log(data["Adj Close"] / data["Adj Close"].shift(1)).dropna()

# Compute all three correlation matrices
pearson_corr = ret.corr(method="pearson")
spearman_corr = ret.corr(method="spearman")
kendall_corr = ret.corr(method="kendall")

# Display differences
print("=== Pearson Correlation ===")
print(pearson_corr.round(3))
print("\n=== Spearman Rank Correlation ===")
print(spearman_corr.round(3))
print("\n=== Kendall's Tau ===")
print(kendall_corr.round(3))

# Compute the maximum absolute difference across pairs
diff_ps = (pearson_corr - spearman_corr).abs()
np.fill_diagonal(diff_ps.values, 0)
print(f"\nMax |Pearson - Spearman|: {diff_ps.max().max():.4f}")

3.4 Visualizing the Three Correlation Matrices

Python
import matplotlib.pyplot as plt

fig, axes = plt.subplots(1, 3, figsize=(18, 5))

for ax, corr_mat, title in zip(axes,
    [pearson_corr, spearman_corr, kendall_corr],
    ["Pearson", "Spearman", "Kendall"]):

    im = ax.imshow(corr_mat.values, cmap='RdBu_r', vmin=-1, vmax=1)
    ax.set_xticks(range(len(tickers)))
    ax.set_yticks(range(len(tickers)))
    ax.set_xticklabels(tickers, rotation=45)
    ax.set_yticklabels(tickers)
    ax.set_title(title)

    # Add correlation values as text
    for i in range(len(tickers)):
        for j in range(len(tickers)):
            ax.text(j, i, f"{corr_mat.iloc[i, j]:.2f}",
                    ha="center", va="center", fontsize=8,
                    color="white" if abs(corr_mat.iloc[i, j]) > 0.6 else "black")

fig.colorbar(im, ax=axes, shrink=0.8)
plt.suptitle('Correlation Matrices: Three Measures Compared', fontsize=14)
plt.tight_layout()
plt.show()
| Measure  | Robust to Outliers? | Captures Nonlinear? | Computation | Typical Use in Finance |
|----------|---------------------|---------------------|-------------|------------------------|
| Pearson  | No                  | Only linear         | O(n)        | Portfolio optimization (by convention) |
| Spearman | Yes                 | Monotonic           | O(n log n)  | Robust analysis, rank-based strategies |
| Kendall  | Yes                 | Monotonic           | O(n²)       | Copula calibration, concordance |

4. Rolling Correlations: Time-Varying Dependence

4.1 Why Static Correlation Is Misleading

Computing a single correlation over ten years of data assumes the relationship between two assets has been constant the entire time. In practice, correlations change — sometimes gradually (due to shifting economic regimes) and sometimes abruptly (due to crises or policy changes).
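A quick simulation (purely illustrative — two spliced bivariate-normal regimes) makes the danger concrete: when the true correlation shifts halfway through the sample, the single full-sample estimate describes neither regime.

```python
import numpy as np

rng = np.random.default_rng(5)

def correlated_pair(n, rho):
    # n bivariate normal draws with the given correlation
    z1 = rng.standard_normal(n)
    z2 = rho * z1 + np.sqrt(1 - rho ** 2) * rng.standard_normal(n)
    return z1, z2

# First regime: weakly correlated; second regime: strongly correlated
x1, y1 = correlated_pair(1000, 0.2)
x2, y2 = correlated_pair(1000, 0.9)
x, y = np.concatenate([x1, x2]), np.concatenate([y1, y2])

print(f"Regime 1 correlation:    {np.corrcoef(x1, y1)[0, 1]:.3f}")
print(f"Regime 2 correlation:    {np.corrcoef(x2, y2)[0, 1]:.3f}")
print(f"Full-sample correlation: {np.corrcoef(x, y)[0, 1]:.3f}")  # describes neither regime
```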

Finance Term
Regime Change: A structural shift in the statistical properties of financial data. For example, the correlation between US and European stocks may be 0.5 during normal markets but 0.9 during a global crisis. These regimes correspond to different states of the economy or market sentiment.

4.2 Computing Rolling Correlations

Python
# Rolling 63-day (quarterly) correlation
window = 63
rolling_corr = ret["AAPL"].rolling(window).corr(ret["MSFT"])

fig, axes = plt.subplots(2, 1, figsize=(14, 8), sharex=True)

# Top: individual returns (to see where crises are)
axes[0].plot(ret.index, ret["AAPL"], alpha=0.5, linewidth=0.4, label='AAPL', color='#3182ce')
axes[0].plot(ret.index, ret["MSFT"], alpha=0.5, linewidth=0.4, label='MSFT', color='#e53e3e')
axes[0].set_ylabel('Log Return')
axes[0].legend()
axes[0].set_title('Returns and Rolling Correlation: AAPL vs MSFT')

# Bottom: rolling correlation
axes[1].plot(rolling_corr.index, rolling_corr.values, color='#1a365d', linewidth=1)
axes[1].axhline(y=pearson_full, color='red', linestyle='--',
               label=f'Full-sample r = {pearson_full:.3f}')
axes[1].axhline(y=0, color='gray', linestyle=':')
axes[1].set_ylabel(f'{window}-Day Rolling Correlation')
axes[1].set_ylim(-0.2, 1.0)
axes[1].legend()
axes[1].fill_between(rolling_corr.index, 0, rolling_corr.values,
                     where=rolling_corr.values > 0, alpha=0.15, color='blue')
axes[1].fill_between(rolling_corr.index, 0, rolling_corr.values,
                     where=rolling_corr.values < 0, alpha=0.15, color='red')

plt.tight_layout()
plt.show()

4.3 Rolling Correlation Heatmap

Python
# Compute the 63-day rolling correlation for all pairs,
# then resample to monthly to keep the heatmap tractable

window = 63
pairs = []
pair_labels = []

for i in range(len(tickers)):
    for j in range(i + 1, len(tickers)):
        rc = ret[tickers[i]].rolling(window).corr(ret[tickers[j]])
        pairs.append(rc)
        pair_labels.append(f"{tickers[i]}/{tickers[j]}")

rolling_df = pd.DataFrame(dict(zip(pair_labels, pairs))).dropna()

# Resample to monthly for cleaner visualization
monthly_corr = rolling_df.resample('ME').last()  # use 'M' on pandas < 2.2

fig, ax = plt.subplots(figsize=(16, 8))
im = ax.pcolormesh(monthly_corr.index, range(len(pair_labels)),
                   monthly_corr.T.values, cmap='RdBu_r', vmin=-0.5, vmax=1.0)
ax.set_yticks(range(len(pair_labels)))
ax.set_yticklabels(pair_labels, fontsize=8)
ax.set_title('Rolling 63-Day Correlation Heatmap')
fig.colorbar(im, ax=ax, label='Correlation')
plt.tight_layout()
plt.show()

# Look for vertical bands of high correlation — those are crises
Stats Bridge
Rolling correlation is a kernel estimator with a rectangular kernel. The window size is the bandwidth parameter. A shorter window (21 days) is more responsive but noisier; a longer window (252 days) is smoother but lags behind structural changes. This is the classic bias-variance tradeoff. Exponentially weighted moving correlation (EWMA) uses an exponential kernel, which gives more weight to recent observations and is the standard in industry risk management (RiskMetrics methodology).
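pandas exposes the exponentially weighted version directly via .ewm(). Here is a minimal sketch on simulated data (the series and their factor loadings are illustrative), using the RiskMetrics daily decay λ = 0.94, i.e. α = 1 − λ:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(11)
n = 500

# Two simulated return series with a common factor (true correlation ~ 0.26)
z = rng.standard_normal(n)
a = pd.Series(0.6 * z + rng.standard_normal(n))
b = pd.Series(0.6 * z + rng.standard_normal(n))

# RiskMetrics-style decay lambda = 0.94  ->  alpha = 1 - lambda
ewma_corr = a.ewm(alpha=1 - 0.94).corr(b)

# Rectangular-kernel equivalent: a 63-day rolling window
roll_corr = a.rolling(63).corr(b)

print(f"Last EWMA correlation:    {ewma_corr.iloc[-1]:.3f}")
print(f"Last rolling correlation: {roll_corr.iloc[-1]:.3f}")
```

The EWMA estimate never "forgets" a crisis abruptly the way a rectangular window does when the crisis day finally drops out of the window — the old observation's weight decays smoothly instead.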

5. Correlation Breakdown During Crises

5.1 The Worst-Case Scenario for Diversification

One of the most dangerous properties of financial correlation is that it increases during crises. Precisely when you need diversification most — when markets are crashing — correlations spike toward 1 and the diversification benefit evaporates.

Key Insight
The correlation between asset classes during the 2008 crisis and the 2020 COVID crash rose dramatically. Stocks, commodities, corporate bonds, and even some “safe haven” assets fell together. This means that a portfolio constructed using normal-period correlations will appear well-diversified but will actually be exposed to much higher risk during exactly the scenarios where risk management matters most.

5.2 Conditional Correlation Analysis

Python
# Compare correlations in different market regimes
# Define regimes by S&P 500 performance

sp500 = yf.download("^GSPC", start="2018-01-01", end="2025-01-01", progress=False, auto_adjust=False)
sp_close = sp500["Adj Close"].squeeze()  # squeeze: single-ticker downloads may return a one-column frame
sp_ret = np.log(sp_close / sp_close.shift(1)).dropna()

# Align all data
common_idx = ret.index.intersection(sp_ret.index)
ret_aligned = ret.loc[common_idx]
sp_aligned = sp_ret.loc[common_idx]

# Define regimes: crisis = bottom 10% of S&P days, calm = middle 80%, rally = top 10%
q10 = sp_aligned.quantile(0.10)
q90 = sp_aligned.quantile(0.90)

crisis_days = sp_aligned[sp_aligned <= q10].index
calm_days = sp_aligned[(sp_aligned > q10) & (sp_aligned < q90)].index
rally_days = sp_aligned[sp_aligned >= q90].index

regimes = {
    "Crisis (bottom 10%)": crisis_days,
    "Calm (middle 80%)": calm_days,
    "Rally (top 10%)": rally_days,
    "Full sample": common_idx
}

print(f"{'Regime':<25} {'AAPL/MSFT':>10} {'AAPL/JPM':>10} {'AAPL/XOM':>10} {'AAPL/GLD':>10} {'N':>6}")
print("-" * 75)

for regime_name, days in regimes.items():
    r = ret_aligned.loc[days]
    corrs = [
        r["AAPL"].corr(r["MSFT"]),
        r["AAPL"].corr(r["JPM"]),
        r["AAPL"].corr(r["XOM"]),
        r["AAPL"].corr(r["GLD"]),
    ]
    print(f"{regime_name:<25} {corrs[0]:>10.3f} {corrs[1]:>10.3f} {corrs[2]:>10.3f} {corrs[3]:>10.3f} {len(days):>6}")

# You will typically see: crisis correlations > calm correlations
Common Pitfall
Be careful interpreting conditional correlations. Forbes and Rigobon (2002) showed that conditioning on high-volatility periods mechanically inflates the measured correlation, even if the underlying dependence structure has not changed. The adjusted (unconditional) correlation is ρ_adj = ρ_cond / √(1 + δ(1 − ρ_cond²)), where ρ_cond is the correlation measured in the high-volatility subsample and δ is the relative increase in variance. Always correct for this bias before concluding that “correlations increase during crises.”
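That the bias is purely mechanical can be seen in a simulation with constant true correlation (illustrative, ρ = 0.5): conditioning on high-variance days inflates the measured correlation even though the dependence never changes, and the adjustment recovers roughly the true value.

```python
import numpy as np

rng = np.random.default_rng(8)
n = 200_000
rho = 0.5

# Bivariate normal with CONSTANT true correlation rho
x = rng.standard_normal(n)
y = rho * x + np.sqrt(1 - rho ** 2) * rng.standard_normal(n)

# Condition on "turbulent" days: the largest 10% of |x|
hi = np.abs(x) > np.quantile(np.abs(x), 0.90)
rho_cond = np.corrcoef(x[hi], y[hi])[0, 1]

# Forbes-Rigobon adjustment: delta = relative increase in Var(x)
delta = x[hi].var() / x.var() - 1
rho_adj = rho_cond / np.sqrt(1 + delta * (1 - rho_cond ** 2))

print(f"True rho:              {rho:.3f}")
print(f"Conditional rho (raw): {rho_cond:.3f}")  # inflated, despite constant dependence
print(f"FR-adjusted rho:       {rho_adj:.3f}")   # recovers roughly the true value
```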

5.3 The Forbes-Rigobon Correction

Python
def forbes_rigobon_correction(rho_conditional, var_ratio):
    """
    Correct conditional correlation for the variance-inflation bias.

    Parameters:
        rho_conditional: correlation estimated in the high-vol subsample
        var_ratio: (variance in subsample) / (variance in full sample) - 1

    Returns:
        Corrected correlation
    """
    numerator = rho_conditional
    denominator = np.sqrt(1 + var_ratio * (1 - rho_conditional ** 2))
    return numerator / denominator

# Example: correct the crisis correlation
# (strictly, delta should be the variance inflation of the conditioning
#  variable; AAPL's variance serves as a simple proxy here)
crisis_ret = ret_aligned.loc[crisis_days]
full_ret = ret_aligned

var_ratio = (crisis_ret["AAPL"].var() / full_ret["AAPL"].var()) - 1
rho_crisis = crisis_ret["AAPL"].corr(crisis_ret["MSFT"])
rho_corrected = forbes_rigobon_correction(rho_crisis, var_ratio)

print(f"Crisis correlation (raw):       {rho_crisis:.4f}")
print(f"Crisis correlation (corrected): {rho_corrected:.4f}")
print(f"Full sample correlation:        {pearson_full:.4f}")
print(f"Variance inflation ratio:       {var_ratio:.2f}")

6. Spurious Correlations and Causation

6.1 Spurious Correlation from Nonstationarity

As we covered in Module 02, correlating two price levels (rather than returns) produces spurious results. Two random walks with no connection to each other will often show large correlations — |r| above 0.5 is common — purely by chance, even over samples of just a few years.

Python
# Demonstrate spurious correlation
np.random.seed(42)
n = 1000

# Two COMPLETELY INDEPENDENT random walks
rw1 = np.cumsum(np.random.normal(0, 1, n))
rw2 = np.cumsum(np.random.normal(0, 1, n))

corr_levels = np.corrcoef(rw1, rw2)[0, 1]
corr_changes = np.corrcoef(np.diff(rw1), np.diff(rw2))[0, 1]

print(f"Correlation of LEVELS:  {corr_levels:.4f}  (SPURIOUS — they are independent!)")
print(f"Correlation of CHANGES: {corr_changes:.4f}  (correct — near zero)")

# Monte Carlo: repeat 1000 times to show the distribution of spurious correlations
spurious_corrs = []
for _ in range(1000):
    a = np.cumsum(np.random.normal(0, 1, n))
    b = np.cumsum(np.random.normal(0, 1, n))
    spurious_corrs.append(np.corrcoef(a, b)[0, 1])

spurious_corrs = np.array(spurious_corrs)
print(f"\nMonte Carlo (1000 pairs of independent random walks):")
print(f"  Mean |correlation|: {np.abs(spurious_corrs).mean():.4f}")
print(f"  Fraction with |r| > 0.5: {(np.abs(spurious_corrs) > 0.5).mean():.1%}")
print(f"  Fraction with |r| > 0.8: {(np.abs(spurious_corrs) > 0.8).mean():.1%}")

6.2 The Third-Variable Problem

Even with returns, observed correlations may be driven by a common factor rather than a direct relationship. The most common confound in finance is market risk: most stocks are positively correlated because they all respond to the overall market. The “true” stock-specific correlation is the residual after removing the market factor.

Stats Bridge
This is a textbook confounding variable problem. The market return is a confounder that drives both stocks. To estimate the “direct” relationship, you need to condition on (or partial out) the market factor — exactly as you would use partial correlation or include a control variable in a regression. In finance, this is formalized as the factor model: r_i,t = α_i + β_i r_m,t + ε_i,t. The residual correlations — Corr(ε_i, ε_j) — measure true stock-specific co-movement.
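The factor-model logic can be sketched with simulated data (the betas and sample size are arbitrary illustrative choices): regress each return on the market, then correlate the residuals. The raw correlation is high purely because of the shared market factor; the residual correlation is near zero, matching how the data were generated.

```python
import numpy as np

rng = np.random.default_rng(21)
n = 2000

# One-factor world: both stocks load on the market; idiosyncratic shocks independent
m = rng.standard_normal(n)             # market return
r1 = 1.0 * m + rng.standard_normal(n)  # beta = 1.0
r2 = 1.2 * m + rng.standard_normal(n)  # beta = 1.2

def market_residual(r, m):
    # OLS fit of r = a + b*m, returning the residual
    b, a = np.polyfit(m, r, 1)
    return r - (a + b * m)

res1 = market_residual(r1, m)
res2 = market_residual(r2, m)

print(f"Raw correlation:      {np.corrcoef(r1, r2)[0, 1]:.3f}")      # driven by the market
print(f"Residual correlation: {np.corrcoef(res1, res2)[0, 1]:.3f}")  # near zero
```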

6.3 Correlation Does Not Imply Prediction

A high contemporaneous correlation between two asset returns does not mean that one predicts the other. If AAPL and MSFT have a 0.75 daily correlation, that means they tend to move together on the same day. It says nothing about whether today’s AAPL return predicts tomorrow’s MSFT return.

Python
# Contemporaneous vs predictive correlation
contemporaneous = ret["AAPL"].corr(ret["MSFT"])
predictive = ret["AAPL"].corr(ret["MSFT"].shift(-1))  # AAPL today vs MSFT tomorrow
reverse = ret["MSFT"].corr(ret["AAPL"].shift(-1))    # MSFT today vs AAPL tomorrow

print(f"Contemporaneous: Corr(AAPL_t, MSFT_t)   = {contemporaneous:.4f}")
print(f"Predictive:      Corr(AAPL_t, MSFT_t+1) = {predictive:.4f}")
print(f"Reverse:         Corr(MSFT_t, AAPL_t+1) = {reverse:.4f}")
print(f"\nContemporaneous correlation is large; predictive is near zero.")
print(f"This is consistent with efficient markets.")

7. Beyond Correlation: A Brief Introduction to Copulas

7.1 Why Correlation Alone Is Insufficient

Correlation captures only one number about the joint distribution of two variables. Two joint distributions can have the same Pearson correlation but completely different dependence structures — especially in the tails.

Finance Term
Copula: A function that joins (couples) marginal distributions to form a multivariate distribution. The copula captures the dependence structure separate from the marginal distributions. Sklar’s theorem guarantees that any multivariate distribution can be decomposed into its marginals and a copula. This separation is extremely useful because you can model margins and dependence independently.

7.2 Tail Dependence

The key concept that copulas capture but correlation misses is tail dependence: the probability that one variable is extreme given that the other is extreme.

λ_L = lim_{u→0⁺} P(Y ≤ F_Y⁻¹(u) | X ≤ F_X⁻¹(u))

The Gaussian copula has zero tail dependence: in the limit, extremes occur independently, so the conditional probability that one variable crashes given that the other has crashed goes to zero. The Clayton copula has lower tail dependence (joint crashes are more likely than a Gaussian copula with the same correlation would imply). The Gumbel copula has upper tail dependence (joint booms).

| Copula Family | Lower Tail Dependence | Upper Tail Dependence | Best For |
|---------------|-----------------------|-----------------------|----------|
| Gaussian      | 0                     | 0                     | Baseline model; often inadequate for finance |
| Student-t     | > 0 (symmetric)       | > 0 (symmetric)       | Symmetric tail dependence; a good default for finance |
| Clayton       | > 0                   | 0                     | Modeling joint crashes (lower tail events) |
| Gumbel        | 0                     | > 0                   | Modeling joint booms (upper tail events) |
| Frank         | 0                     | 0                     | Symmetric dependence without tail dependence |
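The contrast between zero and positive tail dependence can be seen empirically with an illustrative simulation (ρ = 0.5 and df = 3 are arbitrary choices): at the same correlation, the Student-t copula produces far more joint 1% tail events than the Gaussian.

```python
import numpy as np
from scipy.stats import norm, t

rng = np.random.default_rng(13)
n = 200_000
rho, df = 0.5, 3

def correlated_normals(n, rho):
    z1 = rng.standard_normal(n)
    z2 = rho * z1 + np.sqrt(1 - rho ** 2) * rng.standard_normal(n)
    return z1, z2

# Gaussian copula sample: correlated normals pushed through the normal CDF
z1, z2 = correlated_normals(n, rho)
u_g, v_g = norm.cdf(z1), norm.cdf(z2)

# Student-t copula sample: a shared chi-square mixing variable fattens both tails together
z1, z2 = correlated_normals(n, rho)
w = np.sqrt(df / rng.chisquare(df, n))
u_t, v_t = t.cdf(z1 * w, df), t.cdf(z2 * w, df)

# Empirical co-crash probability: P(V <= 1% | U <= 1%)
q = 0.01
p_g = np.mean((u_g <= q) & (v_g <= q)) / q
p_t = np.mean((u_t <= q) & (v_t <= q)) / q
print(f"Gaussian copula:  P(V<=1% | U<=1%) = {p_g:.3f}")
print(f"Student-t copula: P(V<=1% | U<=1%) = {p_t:.3f}")
```

The shared mixing variable w is what creates the tail dependence: turbulent draws hit both margins at once, exactly the "crash together" behavior the Gaussian copula rules out asymptotically.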
Key Insight
The misuse of the Gaussian copula was a contributing factor in the 2008 financial crisis. The model was widely used to price collateralized debt obligations (CDOs), but its assumption of zero tail dependence meant it drastically underestimated the probability of widespread simultaneous defaults. When defaults became correlated during the crisis, the losses far exceeded what the Gaussian copula predicted.

7.3 Empirical Copula Visualization

Python
from scipy.stats import rankdata

# Create the empirical copula (rank-transformed data)
u = rankdata(ret["AAPL"]) / (len(ret) + 1)
v = rankdata(ret["MSFT"]) / (len(ret) + 1)

fig, axes = plt.subplots(1, 2, figsize=(12, 5))

# Left: raw returns scatter
axes[0].scatter(ret["AAPL"], ret["MSFT"], alpha=0.2, s=3, color='#1a365d')
axes[0].set_xlabel('AAPL Return')
axes[0].set_ylabel('MSFT Return')
axes[0].set_title('Raw Returns')

# Right: empirical copula (rank space)
axes[1].scatter(u, v, alpha=0.2, s=3, color='#e53e3e')
axes[1].set_xlabel('AAPL Rank (uniform scale)')
axes[1].set_ylabel('MSFT Rank (uniform scale)')
axes[1].set_title('Empirical Copula (Rank Space)')
axes[1].set_xlim(0, 1)
axes[1].set_ylim(0, 1)

# Add reference lines at corners to highlight tail dependence
axes[1].axhline(y=0.05, color='gray', linestyle='--', alpha=0.5)
axes[1].axvline(x=0.05, color='gray', linestyle='--', alpha=0.5)
axes[1].axhline(y=0.95, color='gray', linestyle='--', alpha=0.5)
axes[1].axvline(x=0.95, color='gray', linestyle='--', alpha=0.5)

plt.tight_layout()
plt.show()

# Look at the lower-left corner: if points cluster there, there is lower tail dependence
# (both stocks crash together more often than the Gaussian copula would predict)

# Quantify tail dependence empirically
threshold = 0.05
lower_tail = np.mean((u <= threshold) & (v <= threshold)) / threshold
upper_tail = np.mean((u >= 1 - threshold) & (v >= 1 - threshold)) / threshold
print(f"Empirical lower tail dependence (5%): {lower_tail:.4f}")
print(f"Empirical upper tail dependence (5%): {upper_tail:.4f}")
print(f"Under independence, both would be:    {threshold:.4f}")
Stats Bridge
The empirical copula is simply the bivariate rank plot scaled to [0, 1]. If you have used probability integral transform plots in model diagnostics (checking if residuals are uniform after applying the CDF), you are already familiar with this idea. The copula is the joint distribution of the probability integral transforms of each marginal.

8. Chapter Summary

| Concept | Key Takeaway | Practical Action |
|---------|--------------|------------------|
| Pearson correlation | Sensitive to outliers; only captures linear dependence | Use alongside Spearman; interpret with caution |
| Spearman / Kendall | Robust to outliers; capture monotonic dependence | Prefer for exploratory analysis of financial data |
| Rolling correlation | Financial correlations are time-varying | Always check stability before using a static estimate |
| Crisis correlation | Correlations increase during market stress | Stress-test portfolios with crisis-period correlations |
| Forbes-Rigobon | Conditioning on volatility inflates correlation mechanically | Correct for variance inflation before concluding contagion |
| Copulas | Correlation is one number; dependence structure is richer | Use t-copulas or Clayton copulas for tail risk |

You now have a nuanced understanding of how dependence works in financial data. In the next module, we will turn to a different kind of statistical problem: missing data and selection bias, which in finance manifests as the notorious survivorship bias.