Module 08: CAPM & Factor Models = Regression
Asset pricing models are regressions you already know how to run
1. The Capital Asset Pricing Model Is a Regression
The Capital Asset Pricing Model (CAPM), developed independently by Sharpe (1964), Lintner (1965), and Mossin (1966), is the single most important model in asset pricing. For a statistician, it has a beautifully simple interpretation: it is a simple linear regression.
Written out, the model fit to time-series data is

ri,t − rf,t = αi + βi (rm,t − rf,t) + εi,t

This is an OLS regression where:
| Finance Symbol | Regression Symbol | Meaning |
|---|---|---|
| ri,t − rf,t | Yt | Dependent variable: the stock's excess return over the risk-free rate |
| rm,t − rf,t | Xt | Independent variable: the market's excess return |
| αi | Intercept | Excess return not explained by market exposure (“alpha”) |
| βi | Slope | Sensitivity to market movements (“beta”) |
| εi,t | Residual | Idiosyncratic (stock-specific) return |
The CAPM regression is exactly what you'd write in statsmodels:
Y ~ X. Beta is the OLS slope, alpha is the OLS intercept, epsilon is the
residual, and R-squared tells you what fraction of the stock's return variance is
explained by the market. Every diagnostic you know — residual plots,
heteroscedasticity tests, influence measures — applies directly.
1.1 The CAPM Prediction
The theoretical CAPM makes a strong prediction: αi = 0 for all assets. If markets are efficient and CAPM is the correct model, no asset should earn a return above or below what its market exposure (β) predicts. A non-zero alpha means the asset is mispriced.
- α > 0: The asset earns more than its beta warrants — it's underpriced (a good buy).
- α < 0: The asset earns less than its beta warrants — it's overpriced (avoid or short).
- α = 0: The asset is fairly priced — it earns exactly the return that compensates for its systematic risk.
Alpha (α): The holy grail of active management. Fund managers are judged by whether they generate “alpha” — returns in excess of what a passive market exposure would produce. If a fund earns 12% but its beta to the market is 1.2 and the market returned 10%, then (taking the risk-free rate as zero for simplicity) the CAPM-predicted return is 1.2 × 10% = 12%. Alpha = 12% − 12% = 0%. The manager added no value beyond market exposure.
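The arithmetic in that example, with the risk-free rate taken as zero for simplicity, is a one-liner:

```python
fund_return = 0.12
market_return = 0.10
beta = 1.2

predicted = beta * market_return   # CAPM-predicted return given market exposure
alpha = fund_return - predicted    # what's left over is alpha
print(f"predicted = {predicted:.0%}, alpha = {alpha:.0%}")  # predicted = 12%, alpha = 0%
```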
2. Beta: The Regression Slope Coefficient
Beta has a clean formula that should be immediately recognizable:

βi = Cov(ri − rf, rm − rf) / Var(rm − rf)

This is the formula for the OLS slope coefficient in simple linear regression: β̂ = Cov(Y, X) / Var(X). You've derived this a hundred times. Beta is the regression coefficient of the stock's excess return on the market's excess return. It measures how many percentage points the stock moves for each 1% move in the market.
2.1 Interpreting Beta Values
| Beta Value | Interpretation | Example |
|---|---|---|
| β = 1.0 | Moves with the market | S&P 500 index fund (by definition) |
| β > 1.0 | More volatile than market (amplifies movements) | Tech stocks, small-cap growth stocks |
| 0 < β < 1.0 | Less volatile than market (dampens movements) | Utilities, consumer staples |
| β = 0 | Uncorrelated with market | Market-neutral hedge fund (by design) |
| β < 0 | Moves opposite to market | Gold (sometimes), volatility products |
2.2 Beta Decomposition of Total Risk
The CAPM regression decomposes total variance into two components:

Var(ri) = βi² Var(rm) + Var(εi)

Total Risk = Systematic Risk + Idiosyncratic Risk

The fraction of variance explained by the market is the R-squared:

R² = βi² Var(rm) / Var(ri)
For a typical individual stock, R2 is only 20–40%, meaning 60–80% of the stock's return variance is idiosyncratic (unexplained by the market). For a diversified portfolio, R2 is much higher (often 90%+) because diversification averages away idiosyncratic risk. This is why β matters more for portfolio management than for individual stock analysis.
3. Running the CAPM Regression in Python
Let's run the CAPM regression for Apple (AAPL) against the market (S&P 500), with full regression diagnostics.
```python
import numpy as np
import pandas as pd
import yfinance as yf
import statsmodels.api as sm
import matplotlib.pyplot as plt

# ── Download data ──────────────────────────────────────────
# Stock: Apple; Market proxy: S&P 500 ETF (SPY)
tickers = {"AAPL": "Apple", "SPY": "S&P 500 (SPY)"}
data = yf.download(list(tickers.keys()), start="2019-01-01",
                   end="2024-01-01", auto_adjust=False)["Adj Close"]

# Compute daily returns
returns = data.pct_change().dropna()

# Risk-free rate: approximated here as a constant 2% annual rate
# (in practice, use the 3-month T-bill series from FRED)
rf_daily = 0.02 / 252

# Excess returns
excess_stock = returns["AAPL"] - rf_daily
excess_market = returns["SPY"] - rf_daily

# ── Run the CAPM regression ───────────────────────────────
X = sm.add_constant(excess_market)  # adds intercept column
model = sm.OLS(excess_stock, X).fit()
print(model.summary())

# Extract key parameters
alpha = model.params["const"]
beta = model.params["SPY"]
r_squared = model.rsquared
alpha_tstat = model.tvalues["const"]
alpha_pval = model.pvalues["const"]
beta_tstat = model.tvalues["SPY"]
beta_se = model.bse["SPY"]

print(f"\n{'='*50}")
print(" CAPM Results for AAPL")
print(f"{'='*50}")
print(f" Alpha (daily):       {alpha:.6f}")
print(f" Alpha (annualized):  {alpha*252*100:.2f}%")
print(f" Alpha t-stat:        {alpha_tstat:.4f}")
print(f" Alpha p-value:       {alpha_pval:.4f}")
print(f" Alpha significant?   {'Yes' if alpha_pval < 0.05 else 'No'}")
print(f" {'─'*48}")
print(f" Beta:                {beta:.4f}")
print(f" Beta SE:             {beta_se:.4f}")
print(f" Beta 95% CI:         [{beta-1.96*beta_se:.4f}, {beta+1.96*beta_se:.4f}]")
print(f" Beta t-stat:         {beta_tstat:.4f}")
print(f" {'─'*48}")
print(f" R-squared:           {r_squared:.4f}")
print(f" Residual std (daily): {model.resid.std():.6f}")
print(f" Residual std (ann.):  {model.resid.std()*np.sqrt(252)*100:.2f}%")
print(f"{'='*50}")
```
3.1 The Security Characteristic Line
The scatter plot of the stock's excess return versus the market's excess return, with the OLS regression line, is called the Security Characteristic Line (SCL).
```python
# Plot the Security Characteristic Line
fig, ax = plt.subplots(figsize=(10, 7))

# Scatter plot of excess returns
ax.scatter(excess_market*100, excess_stock*100, alpha=0.3,
           s=10, color='#1a365d', label='Daily returns')

# Regression line
x_range = np.linspace(excess_market.min(), excess_market.max(), 100)
y_pred = (alpha + beta * x_range) * 100
ax.plot(x_range*100, y_pred, color='#e53e3e', linewidth=2,
        label=f'SCL: y = {alpha*100:.4f} + {beta:.2f}x')

# Reference lines through the origin
ax.axhline(y=0, color='gray', linewidth=0.5, linestyle='-')
ax.axvline(x=0, color='gray', linewidth=0.5, linestyle='-')

# Labels and annotations
ax.set_xlabel("Market Excess Return (%)")
ax.set_ylabel("AAPL Excess Return (%)")
ax.set_title(f"Security Characteristic Line: AAPL\n"
             f"Beta = {beta:.3f}, Alpha (ann.) = {alpha*252*100:.2f}%, "
             f"R² = {r_squared:.3f}")
ax.legend()
ax.grid(True, alpha=0.3)
plt.tight_layout()
plt.savefig("scl_aapl.png", dpi=150, bbox_inches='tight')
plt.show()
```
Beta is not constant. A stock's sensitivity to the market changes over time as the company's leverage, business mix, and market conditions change. A rolling 60-day beta for AAPL can range from 0.8 to 1.8 depending on the period. Always report the estimation window alongside the beta estimate.
4. Testing Alpha: Is the Manager Skilled or Lucky?
The central question of active management is whether a fund's alpha is statistically significantly different from zero. This is a t-test on the regression intercept.
t = α̂ / SE(α̂)
4.1 The Power Problem
Even if a manager truly generates 2% annual alpha, detecting it is extremely difficult. With daily data, the daily alpha is 2%/252 = 0.008%, while daily idiosyncratic volatility might be 1.5%. The signal-to-noise ratio is abysmal.
```python
from scipy.stats import t as t_dist

def alpha_power_analysis(true_alpha_annual, idio_vol_annual,
                         n_years_list=[1, 3, 5, 10, 20]):
    """How many years of data do we need to detect alpha?"""
    true_alpha_daily = true_alpha_annual / 252
    idio_vol_daily = idio_vol_annual / np.sqrt(252)
    print(f"True alpha: {true_alpha_annual*100:.1f}% annual")
    print(f"Idiosyncratic volatility: {idio_vol_annual*100:.1f}% annual")
    print(f"\n{'Years':>6s} {'T':>6s} {'t-stat':>8s} {'Power':>8s}")
    print(f"{'-'*30}")
    for n_years in n_years_list:
        T = int(n_years * 252)
        # Expected t-statistic under the alternative
        se_alpha = idio_vol_daily / np.sqrt(T)
        expected_t = true_alpha_daily / se_alpha
        # Power: probability of rejecting H0 at the 5% level
        critical_value = t_dist.ppf(0.975, T - 2)
        power = 1 - t_dist.cdf(critical_value - expected_t, T - 2)
        print(f"{n_years:>6d} {T:>6d} {expected_t:>8.2f} {power:>8.1%}")

# Scenario 1: Strong alpha (3% per year), typical stock vol
print("Scenario 1: Strong alpha, individual stock")
alpha_power_analysis(0.03, 0.25)
print("\n")

# Scenario 2: Moderate alpha (1% per year), diversified fund
print("Scenario 2: Moderate alpha, diversified fund")
alpha_power_analysis(0.01, 0.08)
```
To detect a 2% annual alpha with 80% power at the 5% significance level, you need roughly 15–20 years of daily data for a diversified fund, and even more for individual stocks. This means that even genuinely skilled managers will look indistinguishable from luck for many years. The “is this alpha real?” question is fundamentally a low-power hypothesis test.
4.2 Multiple Testing and Survivorship Bias
When evaluating thousands of fund managers, the multiple comparisons problem is severe. If 1,000 managers have zero alpha and you test each at α = 5%, you expect 50 false positives. These “star managers” will be profiled in magazines, given more capital, and then revert to the mean.
This is the multiple testing problem (Bonferroni, BH-FDR). The finance industry needs to apply False Discovery Rate control when evaluating fund performance. Harvey, Liu, and Zhu (2016) argued that the t-statistic threshold for a “new factor” should be 3.0, not 2.0, to account for the hundreds of factors that have been tested. This is the FDR correction applied to asset pricing.
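A small simulation makes the point concrete: 1,000 managers whose true alpha is zero, each tested with a naive t-test and then with Benjamini-Hochberg FDR control via statsmodels' `multipletests`:

```python
import numpy as np
from scipy import stats
from statsmodels.stats.multitest import multipletests

rng = np.random.default_rng(3)
n_managers, T = 1000, 1260          # 1,000 managers, ~5 years of daily data
# Every manager's TRUE alpha is zero: pure noise returns
returns = rng.normal(0.0, 0.01, size=(n_managers, T))

# t-test of mean excess return = 0 for each manager
t_stats = returns.mean(axis=1) / (returns.std(axis=1, ddof=1) / np.sqrt(T))
p_vals = 2 * stats.t.sf(np.abs(t_stats), df=T - 1)

naive_hits = (p_vals < 0.05).sum()  # expect ~50 false "star managers"
bh_hits = multipletests(p_vals, alpha=0.05, method='fdr_bh')[0].sum()
print(f"naive rejections: {naive_hits}, BH-FDR rejections: {bh_hits}")
```

Naive testing flags roughly 50 "skilled" managers out of 1,000 pure-noise ones; BH-FDR flags essentially none.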
5. Fama-French Three-Factor Model: Multiple Regression
The CAPM says the only risk that matters is market risk. But empirically, two other factors have significant explanatory power: size (small stocks outperform large) and value (cheap stocks outperform expensive). The Fama-French three-factor model (1993) extends the CAPM to a multiple regression:

ri,t − rf,t = αi + βi,MKT (rm,t − rf,t) + βi,SMB SMBt + βi,HML HMLt + εi,t
- SMB (Small Minus Big): The return difference between portfolios of small and large stocks. Captures the size premium.
- HML (High Minus Low): The return difference between portfolios of high book-to-market (value) and low book-to-market (growth) stocks. Captures the value premium.
This is multiple linear regression with three predictors. The interpretation is identical to any multiple regression: each β measures the partial effect of that factor, controlling for the others. The model's R2 is typically 5–15 percentage points higher than the single-factor CAPM.
5.1 Extended Factor Models
The factor zoo has expanded considerably since Fama-French:
| Model | Factors | Regression Analogue |
|---|---|---|
| CAPM (1964) | Market | Simple regression (1 predictor) |
| Fama-French 3 (1993) | Market + SMB + HML | Multiple regression (3 predictors) |
| Carhart 4 (1997) | + Momentum (UMD) | Multiple regression (4 predictors) |
| Fama-French 5 (2015) | + Profitability (RMW) + Investment (CMA) | Multiple regression (5 predictors) |
| q-factor (2015) | Market + Size + Investment + Profitability | Multiple regression (4 predictors) |
```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

# ── Download Fama-French factors from Ken French's website ─
# Uses pandas_datareader; falls back to synthetic factors if
# it is not installed (for demonstration only)
try:
    import pandas_datareader.data as web
    ff_factors = web.DataReader('F-F_Research_Data_Factors_daily',
                                'famafrench',
                                start='2019-01-01',
                                end='2024-01-01')[0]
    ff_factors = ff_factors / 100  # Convert from percent to decimal
except ImportError:
    print("Note: using synthetic factor data for illustration")
    dates = returns.index
    np.random.seed(42)
    ff_factors = pd.DataFrame({
        'Mkt-RF': excess_market,
        'SMB': np.random.normal(0.0002, 0.005, len(dates)),
        'HML': np.random.normal(0.0001, 0.005, len(dates)),
        'RF': rf_daily * np.ones(len(dates))
    }, index=dates)

# ── Merge stock returns with factors ──────────────────────
merged = pd.concat([
    excess_stock.rename('AAPL_excess'),
    ff_factors[['Mkt-RF', 'SMB', 'HML']]
], axis=1).dropna()

# ── Run the Fama-French 3-factor regression ───────────────
Y = merged['AAPL_excess']
X_capm = sm.add_constant(merged[['Mkt-RF']])
X_ff3 = sm.add_constant(merged[['Mkt-RF', 'SMB', 'HML']])
model_capm = sm.OLS(Y, X_capm).fit()
model_ff3 = sm.OLS(Y, X_ff3).fit()

# ── Compare models ────────────────────────────────────────
print("="*60)
print(" Model Comparison: CAPM vs Fama-French 3-Factor")
print("="*60)
print(f"\n{'Metric':<30s} {'CAPM':>12s} {'FF3':>12s}")
print(f"{'-'*54}")
print(f"{'Alpha (daily)':<30s} {model_capm.params['const']:>12.6f} "
      f"{model_ff3.params['const']:>12.6f}")
print(f"{'Alpha (ann. %)':<30s} "
      f"{model_capm.params['const']*252*100:>12.2f} "
      f"{model_ff3.params['const']*252*100:>12.2f}")
print(f"{'Alpha t-stat':<30s} {model_capm.tvalues['const']:>12.4f} "
      f"{model_ff3.tvalues['const']:>12.4f}")
print(f"{'Alpha p-value':<30s} {model_capm.pvalues['const']:>12.4f} "
      f"{model_ff3.pvalues['const']:>12.4f}")
print(f"{'Beta (Market)':<30s} {model_capm.params['Mkt-RF']:>12.4f} "
      f"{model_ff3.params['Mkt-RF']:>12.4f}")
print(f"{'Beta (SMB)':<30s} {'—':>12s} "
      f"{model_ff3.params['SMB']:>12.4f}")
print(f"{'Beta (HML)':<30s} {'—':>12s} "
      f"{model_ff3.params['HML']:>12.4f}")
print(f"{'R-squared':<30s} {model_capm.rsquared:>12.4f} "
      f"{model_ff3.rsquared:>12.4f}")
print(f"{'Adj. R-squared':<30s} {model_capm.rsquared_adj:>12.4f} "
      f"{model_ff3.rsquared_adj:>12.4f}")
print(f"{'AIC':<30s} {model_capm.aic:>12.1f} "
      f"{model_ff3.aic:>12.1f}")
print(f"{'BIC':<30s} {model_capm.bic:>12.1f} "
      f"{model_ff3.bic:>12.1f}")
```
6. Factor Models as Dimension Reduction (PCA Connection)
Factor models decompose the N-dimensional space of asset returns into a low-dimensional factor structure plus idiosyncratic noise. This is conceptually identical to Principal Component Analysis (PCA):

r = B f + ε

where r is the N×1 return vector, B is the N×K factor loading matrix, f is the K×1 factor vector, and ε is the idiosyncratic noise. The covariance matrix then factors as:

Σ = B Σf Bᵀ + D

where Σf is the K×K factor covariance and D is a diagonal matrix of idiosyncratic variances. Instead of estimating N(N+1)/2 parameters, you estimate NK + K(K+1)/2 + N parameters — a massive reduction.
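The parameter counting is worth making concrete. For a 500-stock universe with a 5-factor model:

```python
def covariance_params(N):
    # Free parameters in a full N x N sample covariance matrix
    return N * (N + 1) // 2

def factor_model_params(N, K):
    # Loadings (N*K) + factor covariance (K(K+1)/2) + idio variances (N)
    return N * K + K * (K + 1) // 2 + N

N, K = 500, 5
print(covariance_params(N), factor_model_params(N, K))  # 125250 vs 3015
```

The factor structure cuts the estimation problem from 125,250 parameters to 3,015 — a roughly 40-fold reduction, which is why factor models are the standard tool for large covariance matrices.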
The statistical factor model (PCA on the return covariance matrix) often recovers factors that look like the Fama-French factors. The first principal component is almost always a “market” factor (all loadings positive, roughly equal). The second often looks like a “size” factor (small vs. large loadings). The third resembles a “value” factor. The economic factor models and statistical factor models converge.
```python
from sklearn.decomposition import PCA

# Download a broader set of stocks
broad_tickers = ["AAPL", "MSFT", "GOOGL", "AMZN", "META",
                 "JPM", "BAC", "WFC", "GS", "C",
                 "JNJ", "PFE", "UNH", "MRK", "ABBV",
                 "XOM", "CVX", "COP", "SLB", "BKR",
                 "PG", "KO", "PEP", "WMT", "COST"]
broad_data = yf.download(broad_tickers, start="2020-01-01",
                         end="2024-01-01", auto_adjust=False)["Adj Close"]
broad_returns = broad_data.pct_change().dropna()

# Run PCA on the return panel
pca = PCA(n_components=10)
pca.fit(broad_returns.values)

# Variance explained
var_explained = pca.explained_variance_ratio_
cumvar = np.cumsum(var_explained)
print("Principal Components - Variance Explained:")
print(f"{'PC':>4s} {'Var Explained':>14s} {'Cumulative':>12s}")
print(f"{'-'*32}")
for i in range(10):
    print(f"{'PC'+str(i+1):>4s} {var_explained[i]:>14.4f} {cumvar[i]:>12.4f}")

# Plot variance explained
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 5))
ax1.bar(range(1, 11), var_explained * 100, color='#1a365d')
ax1.set_xlabel("Principal Component")
ax1.set_ylabel("Variance Explained (%)")
ax1.set_title("Scree Plot")
ax2.plot(range(1, 11), cumvar * 100, 'o-', color='#1a365d')
ax2.axhline(y=80, color='#e53e3e', linestyle='--',
            label='80% threshold')
ax2.set_xlabel("Number of Components")
ax2.set_ylabel("Cumulative Variance Explained (%)")
ax2.set_title("Cumulative Variance Explained")
ax2.legend()
plt.tight_layout()
plt.savefig("pca_factors.png", dpi=150, bbox_inches='tight')
plt.show()

# Examine the first 3 factor loadings
# Note: index by broad_returns.columns, not broad_tickers —
# yfinance returns columns in alphabetical order
loadings = pd.DataFrame(
    pca.components_[:3].T,
    columns=['PC1 (Market?)', 'PC2 (Sector?)', 'PC3 (Value?)'],
    index=broad_returns.columns
)
print("\nFactor Loadings (first 3 PCs):")
print(loadings.round(4).to_string())
```
Typically, the first 3–5 principal components explain 60–80% of the variance in a panel of stock returns. This means that the effective dimensionality of stock returns is much lower than N. Factor models exploit this low-rank structure, just as PCA compresses high-dimensional data into a few latent dimensions.
7. Regression Diagnostics for Factor Models
Since the CAPM and factor models are regressions, all standard regression diagnostics apply. Financial returns, however, have special properties that make some diagnostics particularly important.
7.1 Heteroscedasticity
Returns exhibit volatility clustering (ARCH/GARCH effects), which means the residual variance is not constant over time. This doesn't bias OLS estimates, but it invalidates the usual standard errors.
```python
# ── Regression diagnostics for the CAPM model ─────────────
from statsmodels.stats.diagnostic import het_breuschpagan
from statsmodels.stats.diagnostic import acorr_ljungbox
from scipy.stats import jarque_bera

residuals = model_capm.resid

# 1. Heteroscedasticity test (Breusch-Pagan)
bp_stat, bp_pval, _, _ = het_breuschpagan(residuals, X_capm)
print(f"Breusch-Pagan test: stat={bp_stat:.4f}, p={bp_pval:.4f}")
if bp_pval < 0.05:
    print(" => Heteroscedasticity detected! Use HAC standard errors.")

# 2. Autocorrelation of residuals (Ljung-Box)
lb_result = acorr_ljungbox(residuals, lags=[5, 10, 20],
                           return_df=True)
print("\nLjung-Box test for residual autocorrelation:")
print(lb_result)

# 3. Normality of residuals (Jarque-Bera)
jb_stat, jb_pval = jarque_bera(residuals)
print(f"\nJarque-Bera normality test: stat={jb_stat:.2f}, p={jb_pval:.6f}")

# 4. Re-run with Heteroscedasticity-and-Autocorrelation-
#    Consistent (HAC, i.e. Newey-West) standard errors
model_hac = sm.OLS(Y, X_capm).fit(cov_type='HAC',
                                  cov_kwds={'maxlags': 10})
print("\n\nComparison: OLS vs HAC (Newey-West) standard errors:")
print(f"{'Parameter':<12s} {'OLS SE':>10s} {'HAC SE':>10s} {'Ratio':>8s}")
print(f"{'-'*42}")
for param in ['const', 'Mkt-RF']:
    ols_se = model_capm.bse[param]
    hac_se = model_hac.bse[param]
    print(f"{param:<12s} {ols_se:>10.6f} {hac_se:>10.6f} "
          f"{hac_se/ols_se:>8.2f}")
```
Always use HAC (Newey-West) standard errors for financial regressions. Volatility clustering means OLS standard errors are typically too small (by 10–30%), making alpha and beta look more significant than they really are. In finance, the correction matters for inference, even though the point estimates are unbiased.
7.2 Rolling Betas
```python
# Rolling beta estimation: Cov/Var over a moving window
window = 126  # ~6 months of trading days
rolling_beta = excess_stock.rolling(window).cov(excess_market) / \
               excess_market.rolling(window).var()
rolling_alpha = excess_stock.rolling(window).mean() - \
                rolling_beta * excess_market.rolling(window).mean()

fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(12, 8), sharex=True)
ax1.plot(rolling_beta, color='#1a365d', linewidth=1)
ax1.axhline(y=beta, color='#e53e3e', linestyle='--',
            label=f'Full-sample beta = {beta:.3f}')
ax1.axhline(y=1, color='gray', linestyle=':', alpha=0.5)
ax1.set_ylabel("Beta")
ax1.set_title(f"AAPL Rolling {window}-Day Beta")
ax1.legend()
ax1.grid(True, alpha=0.3)
ax2.plot(rolling_alpha * 252 * 100, color='#38a169', linewidth=1)
ax2.axhline(y=0, color='gray', linestyle='-', alpha=0.5)
ax2.set_ylabel("Annualized Alpha (%)")
ax2.set_title(f"AAPL Rolling {window}-Day Alpha")
ax2.grid(True, alpha=0.3)
plt.tight_layout()
plt.savefig("rolling_beta_alpha.png", dpi=150, bbox_inches='tight')
plt.show()
```
8. Practical Applications of Factor Models
8.1 Performance Attribution
Factor models decompose a portfolio's return into factor exposures and alpha:

rp,t − rf,t = αp + βMKT (rm,t − rf,t) + βSMB SMBt + βHML HMLt + εp,t
This answers the question: “Did the fund manager generate returns through skill (α), or just by loading on known risk factors?”
| Component | Source | Should You Pay For It? |
|---|---|---|
| βMKT contribution | Market exposure (easy to replicate with index fund) | No — costs 0.03% via Vanguard |
| βSMB contribution | Small-cap tilt (replicable with small-cap ETF) | No — costs 0.05% via small-cap ETF |
| βHML contribution | Value tilt (replicable with value ETF) | No — costs 0.06% via value ETF |
| α | Genuine skill (if statistically significant) | Maybe — if α > fee |
8.2 Risk Decomposition
For a portfolio with factor exposures β (a K×1 vector), the variance decomposes as:

σp² = βᵀ Σf β + σε²

= (Systematic Risk) + (Idiosyncratic Risk)
This tells you exactly where the portfolio's risk comes from: how much is from market exposure, how much from size tilts, how much from value tilts, and how much is stock-specific.
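As a sketch with made-up exposures and a hypothetical factor covariance matrix (none of these numbers come from real data; they are chosen only to show the mechanics):

```python
import numpy as np

# Hypothetical exposures to Market, SMB, HML (illustrative values)
beta = np.array([1.05, 0.30, -0.10])

# Hypothetical annualized factor covariance matrix (symmetric, PSD)
Sigma_f = np.array([
    [0.0225, 0.0030, 0.0015],
    [0.0030, 0.0100, 0.0010],
    [0.0015, 0.0010, 0.0080],
])
idio_var = 0.0050                   # idiosyncratic variance (annualized)

systematic = beta @ Sigma_f @ beta  # beta' Sigma_f beta
total = systematic + idio_var
print(f"systematic share: {systematic/total:.1%}, "
      f"idiosyncratic share: {idio_var/total:.1%}")
```

Replacing `beta @ Sigma_f @ beta` with per-factor terms (e.g. β₁² times the market variance plus the cross terms) attributes the systematic piece to each individual factor.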
Factor models reveal that many “hedge funds” and “active managers” are simply selling exposure to well-known factors at high fees. A fund that loads heavily on the size and value factors is not demonstrating skill — it's selling something you can buy for 5 basis points in an ETF. True alpha is the residual after accounting for all known factor exposures.
9. Chapter Summary
The CAPM and factor models are regressions — the most fundamental tool in a statistician's toolkit, applied to the most important problem in finance:
- CAPM is a simple linear regression of a stock's excess return on the market's excess return. Beta is the slope; alpha is the intercept.
- Beta = Cov(ri, rm) / Var(rm) — the regression slope, measuring systematic risk exposure.
- Alpha is the intercept, measuring excess return not explained by factor exposure. Testing α = 0 is a standard t-test on the intercept.
- R-squared tells you what fraction of the stock's variance the model explains (typically 20–40% for individual stocks, 90%+ for diversified portfolios).
- Multi-factor models (Fama-French) are multiple regression with additional predictors, and each new factor comes with diminishing returns and the risk of overfitting.
- Factor models are dimension reduction, closely related to PCA. The first few factors capture the bulk of cross-sectional return variation.
- Regression diagnostics matter: always use HAC standard errors, check for time-varying betas, and beware the multiple testing problem when evaluating alpha.
The entire field of empirical asset pricing is, at its core, applied regression analysis with careful attention to standard errors, model selection, and the multiple testing problem. Your training in regression, hypothesis testing, and model diagnostics is directly transferable — finance just uses different variable names.