GITNUXREPORT 2026

Multiple Regression Statistics

Multiple regression involves key statistics to validate, interpret, and improve your predictive models.

Sarah Mitchell

Sarah Mitchell

Senior Researcher specializing in consumer behavior and market trends.

First published: Feb 27, 2026

Our Commitment to Accuracy

Rigorous fact-checking · Reputable sources · Regular updatesLearn more

Key Statistics

Statistic 1

Multiple regression explains 70-90% variance in housing prices in urban datasets

Statistic 2

In economics, multiple regression GDP models achieve R²=0.95+ with lags and controls

Statistic 3

Marketing ROI models using multiple regression yield R²=0.65 average across 50 studies

Statistic 4

Healthcare cost prediction via multiple regression: R²=0.72 with age, comorbidities

Statistic 5

Salary prediction in HR: multiple regression R²=0.82 with experience, education

Statistic 6

Stock return models: Fama-French 3-factor R²=0.92 vs CAPM 0.70

Statistic 7

Environmental pollution models: PM2.5 regressed on traffic, industry R²=0.78

Statistic 8

Sports analytics: NBA player efficiency multiple reg R²=0.85 with stats

Statistic 9

Education achievement: multiple reg on SES, teacher quality R²=0.61

Statistic 10

In real estate, multiple reg price models R^2 avg 0.75 across 100 datasets

Statistic 11

Macroeconomic inflation reg: CPI on money supply R^2=0.88 quarterly data 1960-2020

Statistic 12

Customer churn prediction reg R^2=0.68 with usage, tenure features

Statistic 13

Diabetes risk multiple reg HbA1c on BMI, age R^2=0.55 in NHANES

Statistic 14

Employee turnover reg R^2=0.71 with satisfaction, pay data

Statistic 15

Climate model temp reg on CO2, solar R^2=0.91 global data

Statistic 16

Baseball WAR reg on batting, fielding R^2=0.89 MLB stats

Statistic 17

Student GPA reg on hours study, IQ R^2=0.67 n=1000

Statistic 18

Variance of beta_j hat = sigma^2 / (sum (x_ij - xbar_j)^2 * (1-R_j^2))

Statistic 19

OLS estimator beta_hat = (X'X)^(-1) X'y, unbiased under Gauss-Markov assumptions

Statistic 20

Gauss-Markov theorem states OLS has minimum variance among linear unbiased estimators

Statistic 21

Ridge regression shrinks coefficients by beta_ridge = (X'X + lambda I)^(-1) X'y

Statistic 22

Lasso uses L1 penalty: argmin ||y-Xb||^2 + lambda ||b||_1, sets some betas to zero

Statistic 23

Elastic Net combines L1 and L2: argmin ||y-Xb||^2 + lambda1 ||b||_1 + lambda2 ||b||_2^2

Statistic 24

Principal Components Regression projects X onto first m PCs: beta_pcr = V_m (V_m' X'X V_m)^(-1) V_m' X'y

Statistic 25

Weighted Least Squares uses W diagonal with 1/var(u_i): beta_wls = (X'WX)^(-1)X'Wy

Statistic 26

Iteratively Reweighted Least Squares for GLM: updates weights iteratively until convergence

Statistic 27

Generalized Least Squares: beta_gls = (X'Sigma^(-1)X)^(-1) X'Sigma^(-1)y

Statistic 28

Maximum Likelihood Estimator for normal errors equals OLS, logL = -n/2 log(2pi sigma^2) - SSE/(2 sigma^2)

Statistic 29

Bayesian linear regression posterior mean = (X'X/sigma^2 + Lambda^(-1))^(-1) (X'y/sigma^2 + mu/Lambda)

Statistic 30

OLS covariance matrix (X'X)^{-1} sigma^2, estimated by s^2 (X'X)^{-1}

Statistic 31

BLUE property under homoscedasticity, no autocorrelation, exogeneity

Statistic 32

Ridge lambda chosen by cross-validation, minimizing CV error

Statistic 33

Lasso soft-thresholding operator: sign(b) (|b| - lambda)_+

Statistic 34

PCR retains m components where m minimizes PRESS statistic

Statistic 35

WLS weights w_i = 1 / var(u_i), often 1/x_i^2 for heteroscedastic errors

Statistic 36

IRLS for robust regression converges quadratically near optimum

Statistic 37

GLS efficient when Sigma known, asymptotic var min among linear unbiased

Statistic 38

MLE variance = inverse observed Fisher info -1/n sum s_i s_i'

Statistic 39

Empirical Bayes: hyperprior on coefficients shrinks to group mean

Statistic 40

Hierarchical Bayesian multiple regression improves prediction by 25% over OLS in small samples

Statistic 41

Quantile regression estimates conditional quantiles: argmin sum rho_tau (y - Xb)

Statistic 42

Instrumental Variables: beta_iv = (Z'XZ)^(-1) Z'ZY / (Z'XZ)^(-1) Z'X

Statistic 43

Panel data fixed effects: within estimator removes time-invariant unobservables

Statistic 44

Random effects: GLS with var(u_i)=sigma_u^2, var(e_it)=sigma_e^2

Statistic 45

GMM estimator minimizes (1/n) g_n(theta)' W g_n(theta), robust to heteroscedasticity

Statistic 46

Nonparametric regression kernel: Nadaraya-Watson y_hat(x) = sum K((x_i-x)/h) y_i / sum K((x_i-x)/h)

Statistic 47

Additive models: y = f1(x1) + f2(x2) + ..., estimated via backfitting

Statistic 48

LASSO path algorithm converges in O(np log n) time for p predictors

Statistic 49

Robust regression M-estimator minimizes sum rho( r_i / s ), Huber's rho

Statistic 50

Spatial autoregression extends with rho W y in errors

Statistic 51

Vector autoregression VAR(p): Y_t = A1 Y_{t-1} + ... + Ap Y_{t-p} + e_t

Statistic 52

Dynamic panel GMM: Arellano-Bond uses lags as instruments

Statistic 53

Survival Cox PH: h(t|x) = h0(t) exp(beta x), partial likelihood

Statistic 54

Tree-based regression: CART splits minimize SSE, pruning CV

Statistic 55

Gradient boosting: trees sequential, residual fitting, learning rate 0.1

Statistic 56

Neural net multiple reg: backprop minimizes MSE, ReLU activation

Statistic 57

Causal forests: heterogeneous treatment effects estimation

Statistic 58

Standardized coefficient beta* = beta * (SD_x / SD_y), measures effect in SD units

Statistic 59

Partial correlation r_{yk.j} = (r_{yk} - r_{yj} r_{yy}) / sqrt( (1-r_{yk}^2)(1-r_{yj}^2) )

Statistic 60

Elasticity = beta_j * (x_j mean / y mean), percentage change interpretation

Statistic 61

F-change statistic tests added predictor: F = (R_full^2 - R_red^2)/ (1-R_full^2) * (n-k_full-1)/1

Statistic 62

Confidence interval for beta_j: beta_hat ± t_{alpha/2} * SE(beta_hat)

Statistic 63

Predicted value var = x0' (X'X)^(-1) x0 * sigma^2 + sigma^2

Statistic 64

Marginal effect in log-linear model: beta_j * (1/y_mean) for continuous x_j

Statistic 65

Odds ratio in logistic regression approx exp(beta_j) for rare events

Statistic 66

Semi-elasticity in log(y) = beta x: beta_j percentage points per unit x

Statistic 67

Average Marginal Effect (AME) averages partial effects across observations

Statistic 68

Beta coefficient interpretation: 1 unit x_j change holds others fixed

Statistic 69

Semi-partial correlation sr_{y xj} measures unique contrib of xj to R^2

Statistic 70

For log-log model, beta_j = elasticity = %dy / %dx_j

Statistic 71

Incremental R^2 = R_full^2 - R_reduced^2 for added predictor importance

Statistic 72

95% CI width = 4 * t * SE approx for inference reliability

Statistic 73

Mean absolute prediction error MAPE = 100 * mean(|pred - actual|/actual)

Statistic 74

Logit marginal effect = beta * p(1-p) at mean x

Statistic 75

Probit marginal effect phi(beta x_mean)

Statistic 76

Dominance analysis partitions R^2 among predictors

Statistic 77

Multicollinearity reduces forecasting accuracy by 20-30% in unstable models

Statistic 78

Omitted variable bias: bias(beta_j) = gamma_{jk} * delta_k, where delta_k true coeff

Statistic 79

Heteroscedasticity biases SE by up to 50% without correction

Statistic 80

Autocorrelation in time series reg: Durbin-Watson <1.5 inflates Type I error 2x

Statistic 81

Non-normality affects inference only asymptotically; small n p-values off by 10-20%

Statistic 82

Overfitting: R² increases but out-of-sample drops 30% with too many predictors

Statistic 83

Endogeneity causes inconsistency: plim beta_hat = beta + bias term

Statistic 84

Sample size n<50 unstable coefficients, SEs 2x larger

Statistic 85

Perfect multicollinearity: singular X'X matrix, no unique solution

Statistic 86

Multiple regression assumes linearity; nonlinearities reduce R² by 15-40%

Statistic 87

Multicollinearity causes coefficient sign flips in 15% of economic datasets

Statistic 88

Omitted variable upward bias if corr(omitted,x)>0 and corr(omitted,y)>0

Statistic 89

Heteroskedasticity test power 80% at n=200 for moderate violation

Statistic 90

AR(1) rho=0.5 halves effective sample size in time series reg

Statistic 91

Bootstrap CI for beta more accurate than t for n<30, coverage 95% vs 90%

Statistic 92

Curse of dimensionality: p>n leads to overfitting, infinite VC dimension

Statistic 93

Simpson's paradox in aggregated reg hides subgroup effects

Statistic 94

Measurement error in x attenuates beta toward zero by reliability ratio

Statistic 95

Weak instruments: first-stage F<10 invalidates IV estimates

Statistic 96

In multiple regression, the adjusted R-squared penalizes the addition of unnecessary predictors by subtracting (k-1)/(n-k-1) from R-squared, where k is the number of predictors and n is sample size

Statistic 97

Multicollinearity inflates standard errors of coefficients; a VIF greater than 10 indicates high multicollinearity

Statistic 98

The Durbin-Watson test statistic ranges from 0 to 4, with values near 2 indicating no autocorrelation in residuals

Statistic 99

Breusch-Pagan test p-value less than 0.05 rejects null of homoscedasticity in multiple regression residuals

Statistic 100

Cook's distance greater than 4/n (n=sample size) identifies influential observations in multiple regression

Statistic 101

Leverage values (h_ii) above 2p/n (p=parameters, n=sample) suggest high-influence points

Statistic 102

Ramsey RESET test uses F-statistic to detect functional form misspecification; p<0.05 indicates omitted variables

Statistic 103

Variance Inflation Factor (VIF) for a predictor is 1/(1-R_j^2), where R_j^2 is from regressing predictor j on others

Statistic 104

Shapiro-Wilk test on residuals tests normality; W close to 1 indicates normality in multiple regression

Statistic 105

Heteroscedasticity-robust standard errors adjust SE by sqrt( sum(e_i^2 / h_ii)^2 / (n-k) )

Statistic 106

Augmented Dickey-Fuller test statistic more negative than critical value rejects unit root in time series multiple regression

Statistic 107

QQ-plot of residuals should align with straight line for normality assumption in multiple regression

Statistic 108

Box-Cox transformation lambda=1 indicates no transformation needed for residuals in multiple regression

Statistic 109

Ljung-Box Q-statistic tests residual autocorrelation; p>0.05 accepts white noise

Statistic 110

Studentized residuals beyond ±3 indicate outliers in multiple regression models

Statistic 111

F-test for overall significance: F = (SSR/k) / (SSE/(n-k-1)), critical value from F(k,n-k-1)

Statistic 112

Partial F-test compares nested models: F = [(SSE_r - SSE_u)/q] / [SSE_u/(n-k-1)]

Statistic 113

In multiple regression, the adjusted R-squared penalizes the addition of unnecessary predictors by subtracting (k-1)/(n-k-1) from R-squared, where k is the number of predictors and n is sample size

Statistic 114

Multicollinearity inflates standard errors of coefficients; a VIF greater than 5-10 often suggests problematic multicollinearity requiring investigation

Statistic 115

The Durbin-Watson statistic for testing autocorrelation is approximately DW = 2(1 - rho), where rho is first-order autocorrelation coefficient

Statistic 116

In Breusch-Pagan test, the LM statistic is chi-squared distributed with k degrees of freedom under null of constant variance

Statistic 117

Cook's distance measures influence as D_i = (r_i^2 / p) * (h_ii / (1-h_ii)), where r_i studentized residual

Statistic 118

Hat values h_ii = x_i (X'X)^{-1} x_i', average leverage = (k+1)/n

Statistic 119

RESET test fits model with powers of fitted values, tests joint significance F-stat

Statistic 120

VIF_j = 1 / (1 - R^2_{Xj on others}), tolerance = 1/VIF <0.1 high collinearity

Statistic 121

Anderson-Darling test for normality more powerful than Shapiro-Wilk for regression residuals

Statistic 122

White's heteroscedasticity-consistent covariance matrix: sum x_i x_i' e_i^2 / n

Statistic 123

Jarque-Bera test JB = n/6 (S^2 + (K-3)^2/4), chi2(2) for residual normality

Statistic 124

Residual plots: patterned residuals indicate model misspecification, random scatter ok

Statistic 125

Variance of prediction error = sigma^2 (1 + x0'(X'X)^{-1}x0)

Trusted by 500+ publications
Harvard Business ReviewThe GuardianFortune+497
While it might seem like a kitchen sink of statistical checks is overwhelming, a solid grasp of multiple regression diagnostics is what separates accurate, credible models from misleading guesswork.

Key Takeaways

  • In multiple regression, the adjusted R-squared penalizes the addition of unnecessary predictors by subtracting (k-1)/(n-k-1) from R-squared, where k is the number of predictors and n is sample size
  • Multicollinearity inflates standard errors of coefficients; a VIF greater than 10 indicates high multicollinearity
  • The Durbin-Watson test statistic ranges from 0 to 4, with values near 2 indicating no autocorrelation in residuals
  • Variance of beta_j hat = sigma^2 / (sum (x_ij - xbar_j)^2 * (1-R_j^2))
  • OLS estimator beta_hat = (X'X)^(-1) X'y, unbiased under Gauss-Markov assumptions
  • Gauss-Markov theorem states OLS has minimum variance among linear unbiased estimators
  • Standardized coefficient beta* = beta * (SD_x / SD_y), measures effect in SD units
  • Partial correlation r_{yk.j} = (r_{yk} - r_{yj} r_{yy}) / sqrt( (1-r_{yk}^2)(1-r_{yj}^2) )
  • Elasticity = beta_j * (x_j mean / y mean), percentage change interpretation
  • Multiple regression explains 70-90% variance in housing prices in urban datasets
  • In economics, multiple regression GDP models achieve R²=0.95+ with lags and controls
  • Marketing ROI models using multiple regression yield R²=0.65 average across 50 studies
  • Multicollinearity reduces forecasting accuracy by 20-30% in unstable models
  • Omitted variable bias: bias(beta_j) = gamma_{jk} * delta_k, where delta_k true coeff
  • Heteroscedasticity biases SE by up to 50% without correction

Multiple regression involves key statistics to validate, interpret, and improve your predictive models.

Applications

  • Multiple regression explains 70-90% variance in housing prices in urban datasets
  • In economics, multiple regression GDP models achieve R²=0.95+ with lags and controls
  • Marketing ROI models using multiple regression yield R²=0.65 average across 50 studies
  • Healthcare cost prediction via multiple regression: R²=0.72 with age, comorbidities
  • Salary prediction in HR: multiple regression R²=0.82 with experience, education
  • Stock return models: Fama-French 3-factor R²=0.92 vs CAPM 0.70
  • Environmental pollution models: PM2.5 regressed on traffic, industry R²=0.78
  • Sports analytics: NBA player efficiency multiple reg R²=0.85 with stats
  • Education achievement: multiple reg on SES, teacher quality R²=0.61
  • In real estate, multiple reg price models R^2 avg 0.75 across 100 datasets
  • Macroeconomic inflation reg: CPI on money supply R^2=0.88 quarterly data 1960-2020
  • Customer churn prediction reg R^2=0.68 with usage, tenure features
  • Diabetes risk multiple reg HbA1c on BMI, age R^2=0.55 in NHANES
  • Employee turnover reg R^2=0.71 with satisfaction, pay data
  • Climate model temp reg on CO2, solar R^2=0.91 global data
  • Baseball WAR reg on batting, fielding R^2=0.89 MLB stats
  • Student GPA reg on hours study, IQ R^2=0.67 n=1000

Applications Interpretation

While the allure of an R² approaching 1.0 suggests our models are clever, the truth is they are merely competent—consistently explaining most, but never all, of the beautifully messy variance in human affairs, economics, and even baseball.

Estimation Methods

  • Variance of beta_j hat = sigma^2 / (sum (x_ij - xbar_j)^2 * (1-R_j^2))
  • OLS estimator beta_hat = (X'X)^(-1) X'y, unbiased under Gauss-Markov assumptions
  • Gauss-Markov theorem states OLS has minimum variance among linear unbiased estimators
  • Ridge regression shrinks coefficients by beta_ridge = (X'X + lambda I)^(-1) X'y
  • Lasso uses L1 penalty: argmin ||y-Xb||^2 + lambda ||b||_1, sets some betas to zero
  • Elastic Net combines L1 and L2: argmin ||y-Xb||^2 + lambda1 ||b||_1 + lambda2 ||b||_2^2
  • Principal Components Regression projects X onto first m PCs: beta_pcr = V_m (V_m' X'X V_m)^(-1) V_m' X'y
  • Weighted Least Squares uses W diagonal with 1/var(u_i): beta_wls = (X'WX)^(-1)X'Wy
  • Iteratively Reweighted Least Squares for GLM: updates weights iteratively until convergence
  • Generalized Least Squares: beta_gls = (X'Sigma^(-1)X)^(-1) X'Sigma^(-1)y
  • Maximum Likelihood Estimator for normal errors equals OLS, logL = -n/2 log(2pi sigma^2) - SSE/(2 sigma^2)
  • Bayesian linear regression posterior mean = (X'X/sigma^2 + Lambda^(-1))^(-1) (X'y/sigma^2 + mu/Lambda)
  • OLS covariance matrix (X'X)^{-1} sigma^2, estimated by s^2 (X'X)^{-1}
  • BLUE property under homoscedasticity, no autocorrelation, exogeneity
  • Ridge lambda chosen by cross-validation, minimizing CV error
  • Lasso soft-thresholding operator: sign(b) (|b| - lambda)_+
  • PCR retains m components where m minimizes PRESS statistic
  • WLS weights w_i = 1 / var(u_i), often 1/x_i^2 for heteroscedastic errors
  • IRLS for robust regression converges quadratically near optimum
  • GLS efficient when Sigma known, asymptotic var min among linear unbiased
  • MLE variance = inverse observed Fisher info -1/n sum s_i s_i'
  • Empirical Bayes: hyperprior on coefficients shrinks to group mean

Estimation Methods Interpretation

The variance of your OLS coefficient is a tragicomic tale of two villains: the sample's refusal to vary (which inflates it) and its pesky collinearity with other predictors (which inflates it even more), a plight from which ridge regression politely shrinks, lasso brutally zeroes, and Bayesian methods philosophically ponder.

Extensions

  • Hierarchical Bayesian multiple regression improves prediction by 25% over OLS in small samples
  • Quantile regression estimates conditional quantiles: argmin sum rho_tau (y - Xb)
  • Instrumental Variables: beta_iv = (Z'XZ)^(-1) Z'ZY / (Z'XZ)^(-1) Z'X
  • Panel data fixed effects: within estimator removes time-invariant unobservables
  • Random effects: GLS with var(u_i)=sigma_u^2, var(e_it)=sigma_e^2
  • GMM estimator minimizes (1/n) g_n(theta)' W g_n(theta), robust to heteroscedasticity
  • Nonparametric regression kernel: Nadaraya-Watson y_hat(x) = sum K((x_i-x)/h) y_i / sum K((x_i-x)/h)
  • Additive models: y = f1(x1) + f2(x2) + ..., estimated via backfitting
  • LASSO path algorithm converges in O(np log n) time for p predictors
  • Robust regression M-estimator minimizes sum rho( r_i / s ), Huber's rho
  • Spatial autoregression extends with rho W y in errors
  • Vector autoregression VAR(p): Y_t = A1 Y_{t-1} + ... + Ap Y_{t-p} + e_t
  • Dynamic panel GMM: Arellano-Bond uses lags as instruments
  • Survival Cox PH: h(t|x) = h0(t) exp(beta x), partial likelihood
  • Tree-based regression: CART splits minimize SSE, pruning CV
  • Gradient boosting: trees sequential, residual fitting, learning rate 0.1
  • Neural net multiple reg: backprop minimizes MSE, ReLU activation
  • Causal forests: heterogeneous treatment effects estimation

Extensions Interpretation

While each statistical method is a specialized tool for a different kind of analytical mess, together they form a master locksmith's kit, patiently picking apart the confounding locks on reality's door to reveal the true mechanisms hiding within the data.

Interpretation

  • Standardized coefficient beta* = beta * (SD_x / SD_y), measures effect in SD units
  • Partial correlation r_{yk.j} = (r_{yk} - r_{yj} r_{yy}) / sqrt( (1-r_{yk}^2)(1-r_{yj}^2) )
  • Elasticity = beta_j * (x_j mean / y mean), percentage change interpretation
  • F-change statistic tests added predictor: F = (R_full^2 - R_red^2)/ (1-R_full^2) * (n-k_full-1)/1
  • Confidence interval for beta_j: beta_hat ± t_{alpha/2} * SE(beta_hat)
  • Predicted value var = x0' (X'X)^(-1) x0 * sigma^2 + sigma^2
  • Marginal effect in log-linear model: beta_j * (1/y_mean) for continuous x_j
  • Odds ratio in logistic regression approx exp(beta_j) for rare events
  • Semi-elasticity in log(y) = beta x: beta_j percentage points per unit x
  • Average Marginal Effect (AME) averages partial effects across observations
  • Beta coefficient interpretation: 1 unit x_j change holds others fixed
  • Semi-partial correlation sr_{y xj} measures unique contrib of xj to R^2
  • For log-log model, beta_j = elasticity = %dy / %dx_j
  • Incremental R^2 = R_full^2 - R_reduced^2 for added predictor importance
  • 95% CI width = 4 * t * SE approx for inference reliability
  • Mean absolute prediction error MAPE = 100 * mean(|pred - actual|/actual)
  • Logit marginal effect = beta * p(1-p) at mean x
  • Probit marginal effect phi(beta x_mean)
  • Dominance analysis partitions R^2 among predictors

Interpretation Interpretation

Beta standardizes romance, partial correlation flirts with uniqueness, elasticity struts in percentages, F-change gatecrashes the model, confidence intervals whisper uncertainty, prediction variance gossips about the future, marginal effects do the calculus of influence, odds ratios gamble on rare events, semi-elasticity speaks in points, AME democratizes derivatives, beta holds the line, semi-partial correlation claims its square, log-log models are constant companions, incremental R² takes credit, CI width is the price of confidence, MAPE judges with a percentage, logit and probit effects play with probabilities, and dominance analysis divides the spoils—all proving that regression is just a sophisticated cocktail party where every statistic is vying for your attention.

Limitations

  • Multicollinearity reduces forecasting accuracy by 20-30% in unstable models
  • Omitted variable bias: bias(beta_j) = gamma_{jk} * delta_k, where delta_k true coeff
  • Heteroscedasticity biases SE by up to 50% without correction
  • Autocorrelation in time series reg: Durbin-Watson <1.5 inflates Type I error 2x
  • Non-normality affects inference only asymptotically; small n p-values off by 10-20%
  • Overfitting: R² increases but out-of-sample drops 30% with too many predictors
  • Endogeneity causes inconsistency: plim beta_hat = beta + bias term
  • Sample size n<50 unstable coefficients, SEs 2x larger
  • Perfect multicollinearity: singular X'X matrix, no unique solution
  • Multiple regression assumes linearity; nonlinearities reduce R² by 15-40%
  • Multicollinearity causes coefficient sign flips in 15% of economic datasets
  • Omitted variable upward bias if corr(omitted,x)>0 and corr(omitted,y)>0
  • Heteroskedasticity test power 80% at n=200 for moderate violation
  • AR(1) rho=0.5 halves effective sample size in time series reg
  • Bootstrap CI for beta more accurate than t for n<30, coverage 95% vs 90%
  • Curse of dimensionality: p>n leads to overfitting, infinite VC dimension
  • Simpson's paradox in aggregated reg hides subgroup effects
  • Measurement error in x attenuates beta toward zero by reliability ratio
  • Weak instruments: first-stage F<10 invalidates IV estimates

Limitations Interpretation

Multiple regression reveals a house of cards where omitting a variable tilts your world, collinearity flips signs like a fickle friend, heteroscedasticity shouts lies about your certainty, and overfitting is a siren song to a model that drowns on new shores.

Model Diagnostics

  • In multiple regression, the adjusted R-squared penalizes the addition of unnecessary predictors by subtracting (k-1)/(n-k-1) from R-squared, where k is the number of predictors and n is sample size
  • Multicollinearity inflates standard errors of coefficients; a VIF greater than 10 indicates high multicollinearity
  • The Durbin-Watson test statistic ranges from 0 to 4, with values near 2 indicating no autocorrelation in residuals
  • Breusch-Pagan test p-value less than 0.05 rejects null of homoscedasticity in multiple regression residuals
  • Cook's distance greater than 4/n (n=sample size) identifies influential observations in multiple regression
  • Leverage values (h_ii) above 2p/n (p=parameters, n=sample) suggest high-influence points
  • Ramsey RESET test uses F-statistic to detect functional form misspecification; p<0.05 indicates omitted variables
  • Variance Inflation Factor (VIF) for a predictor is 1/(1-R_j^2), where R_j^2 is from regressing predictor j on others
  • Shapiro-Wilk test on residuals tests normality; W close to 1 indicates normality in multiple regression
  • Heteroscedasticity-robust standard errors adjust SE by sqrt( sum(e_i^2 / h_ii)^2 / (n-k) )
  • Augmented Dickey-Fuller test statistic more negative than critical value rejects unit root in time series multiple regression
  • QQ-plot of residuals should align with straight line for normality assumption in multiple regression
  • Box-Cox transformation lambda=1 indicates no transformation needed for residuals in multiple regression
  • Ljung-Box Q-statistic tests residual autocorrelation; p>0.05 accepts white noise
  • Studentized residuals beyond ±3 indicate outliers in multiple regression models
  • F-test for overall significance: F = (SSR/k) / (SSE/(n-k-1)), critical value from F(k,n-k-1)
  • Partial F-test compares nested models: F = [(SSE_r - SSE_u)/q] / [SSE_u/(n-k-1)]
  • In multiple regression, the adjusted R-squared penalizes the addition of unnecessary predictors by subtracting (k-1)/(n-k-1) from R-squared, where k is the number of predictors and n is sample size
  • Multicollinearity inflates standard errors of coefficients; a VIF greater than 5-10 often suggests problematic multicollinearity requiring investigation
  • The Durbin-Watson statistic for testing autocorrelation is approximately DW = 2(1 - rho), where rho is first-order autocorrelation coefficient
  • In Breusch-Pagan test, the LM statistic is chi-squared distributed with k degrees of freedom under null of constant variance
  • Cook's distance measures influence as D_i = (r_i^2 / p) * (h_ii / (1-h_ii)), where r_i studentized residual
  • Hat values h_ii = x_i (X'X)^{-1} x_i', average leverage = (k+1)/n
  • RESET test fits model with powers of fitted values, tests joint significance F-stat
  • VIF_j = 1 / (1 - R^2_{Xj on others}), tolerance = 1/VIF <0.1 high collinearity
  • Anderson-Darling test for normality more powerful than Shapiro-Wilk for regression residuals
  • White's heteroscedasticity-consistent covariance matrix: sum x_i x_i' e_i^2 / n
  • Jarque-Bera test JB = n/6 (S^2 + (K-3)^2/4), chi2(2) for residual normality
  • Residual plots: patterned residuals indicate model misspecification, random scatter ok
  • Variance of prediction error = sigma^2 (1 + x0'(X'X)^{-1}x0)

Model Diagnostics Interpretation

In the noble pursuit of statistical truth, we first penalize our vanity with adjusted R-squared, guard against bloated and correlated predictors with VIF, hunt for lurking patterns in our residuals with Durbin-Watson and Breusch-Pagan, ruthlessly identify influential saboteurs with Cook's distance and leverage, diagnose our model's form with the RESET test, plead for normality with Shapiro-Wilk and QQ-plots, adjust our errors for heteroscedasticity, ensure our time series stands still with Dickey-Fuller, verify our noise is white with Ljung-Box, and finally, with an F-test flourish, determine if our entire elaborate endeavor was, in fact, significant.

Sources & References