Key Takeaways
- In multiple regression, adjusted R-squared penalizes the addition of unnecessary predictors: adjusted R^2 = 1 - (1 - R^2)(n - 1)/(n - k - 1), where k is the number of predictors and n is sample size
- Multicollinearity inflates standard errors of coefficients; a VIF greater than 10 indicates high multicollinearity
- The Durbin-Watson test statistic ranges from 0 to 4, with values near 2 indicating no autocorrelation in residuals
- Variance of beta_j hat = sigma^2 / (sum (x_ij - xbar_j)^2 * (1-R_j^2))
- OLS estimator beta_hat = (X'X)^(-1) X'y, unbiased under Gauss-Markov assumptions
- Gauss-Markov theorem states OLS has minimum variance among linear unbiased estimators
- Standardized coefficient beta* = beta * (SD_x / SD_y), measures effect in SD units
- Partial correlation r_{yk.j} = (r_{yk} - r_{yj} r_{kj}) / sqrt( (1-r_{yj}^2)(1-r_{kj}^2) )
- Elasticity = beta_j * (x_j mean / y mean), percentage change interpretation
- Multiple regression explains 70-90% variance in housing prices in urban datasets
- In economics, multiple regression GDP models achieve R²=0.95+ with lags and controls
- Marketing ROI models using multiple regression yield R²=0.65 average across 50 studies
- Multicollinearity reduces forecasting accuracy by 20-30% in unstable models
- Omitted variable bias: bias(beta_hat_j) = delta_k * gamma_{jk}, where delta_k is the true coefficient on the omitted variable and gamma_{jk} comes from regressing the omitted variable on x_j
- Heteroscedasticity biases SE by up to 50% without correction
Multiple regression rests on a core set of statistics for validating, interpreting, and improving predictive models.
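The two rules of thumb that recur throughout this document, adjusted R-squared and the variance inflation factor, reduce to one-line formulas. A minimal sketch (the n, k, and R^2 values below are arbitrary illustrations, not data from the studies cited here):

```python
def adjusted_r2(r2, n, k):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - k - 1)."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

def vif(r2_j):
    """VIF_j = 1 / (1 - R_j^2), R_j^2 from regressing predictor j on the others."""
    return 1 / (1 - r2_j)

print(round(adjusted_r2(0.80, 100, 5), 4))  # slightly below the raw R^2 of 0.80
print(round(vif(0.90), 2))  # R_j^2 = 0.90 gives VIF = 10, the usual warning threshold
```

Note how the adjustment grows with k: adding predictors only raises adjusted R^2 if they improve fit more than chance would.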
Applications
- Multiple regression explains 70-90% variance in housing prices in urban datasets
- In economics, multiple regression GDP models achieve R²=0.95+ with lags and controls
- Marketing ROI models using multiple regression yield R²=0.65 average across 50 studies
- Healthcare cost prediction via multiple regression: R²=0.72 with age, comorbidities
- Salary prediction in HR: multiple regression R²=0.82 with experience, education
- Stock return models: Fama-French 3-factor R²=0.92 vs CAPM 0.70
- Environmental pollution models: PM2.5 regressed on traffic, industry R²=0.78
- Sports analytics: NBA player efficiency multiple reg R²=0.85 with stats
- Education achievement: multiple reg on SES, teacher quality R²=0.61
- In real estate, multiple reg price models R^2 avg 0.75 across 100 datasets
- Macroeconomic inflation reg: CPI on money supply R^2=0.88 quarterly data 1960-2020
- Customer churn prediction reg R^2=0.68 with usage, tenure features
- Diabetes risk multiple reg HbA1c on BMI, age R^2=0.55 in NHANES
- Employee turnover reg R^2=0.71 with satisfaction, pay data
- Climate model temp reg on CO2, solar R^2=0.91 global data
- Baseball WAR reg on batting, fielding R^2=0.89 MLB stats
- Student GPA reg on hours study, IQ R^2=0.67 n=1000
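R^2 figures like those above can be reproduced in spirit on synthetic data. A sketch assuming made-up "housing-style" predictors (size, age) with a chosen noise level, not any real dataset:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
size = rng.normal(150, 30, n)   # hypothetical floor area
age = rng.normal(20, 8, n)      # hypothetical building age
price = 2.0 * size - 1.5 * age + rng.normal(0, 20, n)  # synthetic prices

# OLS fit via least squares, then R^2 = 1 - SSE/SST
X = np.column_stack([np.ones(n), size, age])
beta_hat = np.linalg.lstsq(X, price, rcond=None)[0]
resid = price - X @ beta_hat
r2 = 1 - resid @ resid / np.sum((price - price.mean()) ** 2)
print(f"R^2 = {r2:.2f}")  # high by construction of the noise level
```

The point is that a reported R^2 reflects the signal-to-noise ratio of the data-generating process as much as the model itself.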
Estimation Methods
- Variance of beta_j hat = sigma^2 / (sum (x_ij - xbar_j)^2 * (1-R_j^2))
- OLS estimator beta_hat = (X'X)^(-1) X'y, unbiased under Gauss-Markov assumptions
- Gauss-Markov theorem states OLS has minimum variance among linear unbiased estimators
- Ridge regression shrinks coefficients by beta_ridge = (X'X + lambda I)^(-1) X'y
- Lasso uses L1 penalty: argmin ||y-Xb||^2 + lambda ||b||_1, sets some betas to zero
- Elastic Net combines L1 and L2: argmin ||y-Xb||^2 + lambda1 ||b||_1 + lambda2 ||b||_2^2
- Principal Components Regression projects X onto first m PCs: beta_pcr = V_m (V_m' X'X V_m)^(-1) V_m' X'y
- Weighted Least Squares uses W diagonal with 1/var(u_i): beta_wls = (X'WX)^(-1)X'Wy
- Iteratively Reweighted Least Squares for GLM: updates weights iteratively until convergence
- Generalized Least Squares: beta_gls = (X'Sigma^(-1)X)^(-1) X'Sigma^(-1)y
- Maximum Likelihood Estimator for normal errors equals OLS, logL = -n/2 log(2pi sigma^2) - SSE/(2 sigma^2)
- Bayesian linear regression with prior beta ~ N(mu, Lambda): posterior mean = (X'X/sigma^2 + Lambda^(-1))^(-1) (X'y/sigma^2 + Lambda^(-1) mu)
- OLS covariance matrix (X'X)^{-1} sigma^2, estimated by s^2 (X'X)^{-1}
- BLUE property under homoscedasticity, no autocorrelation, exogeneity
- Ridge lambda chosen by cross-validation, minimizing CV error
- Lasso soft-thresholding operator: sign(b) (|b| - lambda)_+
- PCR retains m components where m minimizes PRESS statistic
- WLS weights w_i = 1 / var(u_i), often 1/x_i^2 for heteroscedastic errors
- IRLS for robust regression typically converges in a few iterations once near the optimum
- GLS efficient when Sigma known, asymptotic var min among linear unbiased
- MLE asymptotic covariance = inverse Fisher information, estimated from the observed information or the outer product of scores sum s_i s_i'
- Empirical Bayes: hyperprior on coefficients shrinks to group mean
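The OLS and ridge closed forms above are direct to implement. A sketch comparing the two on synthetic data (lambda = 10 is an arbitrary illustration, not a tuned value; in practice it would be chosen by cross-validation as noted above):

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 100, 3
X = rng.normal(size=(n, p))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(size=n)

lam = 10.0
beta_ols = np.linalg.solve(X.T @ X, X.T @ y)                      # (X'X)^{-1} X'y
beta_ridge = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)  # (X'X + lambda I)^{-1} X'y

# Ridge shrinks the coefficient vector toward zero relative to OLS
print(np.linalg.norm(beta_ridge) < np.linalg.norm(beta_ols))
```

Adding lam * np.eye(p) also keeps the system solvable when X'X is nearly singular, which is why ridge is a standard remedy for multicollinearity.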
Extensions
- Hierarchical Bayesian multiple regression improves prediction by 25% over OLS in small samples
- Quantile regression estimates conditional quantiles: argmin_b sum_i rho_tau(y_i - x_i'b), with rho_tau the check function
- Instrumental variables (just identified): beta_iv = (Z'X)^(-1) Z'y; overidentified models use 2SLS, beta_2sls = (X'P_Z X)^(-1) X'P_Z y with P_Z = Z(Z'Z)^(-1)Z'
- Panel data fixed effects: within estimator removes time-invariant unobservables
- Random effects: GLS with var(u_i)=sigma_u^2, var(e_it)=sigma_e^2
- GMM estimator minimizes (1/n) g_n(theta)' W g_n(theta), robust to heteroscedasticity
- Nonparametric regression kernel: Nadaraya-Watson y_hat(x) = sum K((x_i-x)/h) y_i / sum K((x_i-x)/h)
- Additive models: y = f1(x1) + f2(x2) + ..., estimated via backfitting
- The LARS algorithm computes the entire lasso coefficient path at roughly the cost of a single OLS fit
- Robust regression M-estimator minimizes sum rho( r_i / s ), Huber's rho
- Spatial regression adds a spatial lag rho W y (SAR) or spatially correlated errors lambda W u (SEM), with W a spatial weights matrix
- Vector autoregression VAR(p): Y_t = A1 Y_{t-1} + ... + Ap Y_{t-p} + e_t
- Dynamic panel GMM: Arellano-Bond uses lags as instruments
- Survival Cox PH: h(t|x) = h0(t) exp(beta x), partial likelihood
- Tree-based regression: CART splits minimize SSE, pruning CV
- Gradient boosting: trees sequential, residual fitting, learning rate 0.1
- Neural net multiple reg: backprop minimizes MSE, ReLU activation
- Causal forests: heterogeneous treatment effects estimation
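The Nadaraya-Watson estimator above is short enough to implement directly. A sketch with a Gaussian kernel on noiseless sin(x) data (the bandwidth h = 0.2 is an arbitrary choice for illustration):

```python
import numpy as np

def nadaraya_watson(x_train, y_train, x0, h):
    """Kernel-weighted average: sum K((x_i - x0)/h) y_i / sum K((x_i - x0)/h)."""
    w = np.exp(-0.5 * ((x_train - x0) / h) ** 2)  # Gaussian kernel weights
    return (w @ y_train) / w.sum()

x = np.linspace(0, 2 * np.pi, 200)
y = np.sin(x)
est = nadaraya_watson(x, y, np.pi / 2, h=0.2)
print(f"{est:.2f}")  # close to sin(pi/2) = 1
```

The small downward bias at the peak (roughly h^2/2 times the curvature) illustrates the usual kernel-regression bias-variance trade-off in h.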
Interpretation
- Standardized coefficient beta* = beta * (SD_x / SD_y), measures effect in SD units
- Partial correlation r_{yk.j} = (r_{yk} - r_{yj} r_{kj}) / sqrt( (1-r_{yj}^2)(1-r_{kj}^2) )
- Elasticity = beta_j * (x_j mean / y mean), percentage change interpretation
- F-change test for q added predictors: F = [(R_full^2 - R_red^2)/q] / [(1 - R_full^2)/(n - k_full - 1)]
- Confidence interval for beta_j: beta_hat ± t_{alpha/2} * SE(beta_hat)
- Prediction-error variance for a new observation at x0: sigma^2 (1 + x0'(X'X)^(-1) x0); the variance of the fitted mean omits the added sigma^2
- Marginal effect in a log-linear model (log y on x): dy/dx_j = beta_j * y, approximately beta_j * y_mean at the mean
- In logistic regression exp(beta_j) is the odds ratio, which approximates the relative risk when the outcome is rare
- Semi-elasticity in log(y) = beta x: a one-unit change in x_j changes y by approximately 100*beta_j percent
- Average Marginal Effect (AME) averages partial effects across observations
- Coefficient interpretation: beta_j is the expected change in y for a one-unit change in x_j, holding the other predictors fixed
- Squared semi-partial correlation sr_{y xj}^2 measures the unique contribution of x_j to R^2
- For log-log model, beta_j = elasticity = %dy / %dx_j
- Incremental R^2 = R_full^2 - R_reduced^2 for added predictor importance
- 95% CI width = 2 * t_{alpha/2} * SE, approximately 4 * SE for large n
- Mean absolute percentage error MAPE = 100 * mean(|pred - actual| / |actual|)
- Logit marginal effect = beta * p(1-p) at mean x
- Probit marginal effect = phi(x_mean' beta) * beta_j, with phi the standard normal density
- Dominance analysis partitions R^2 among predictors
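The standardized-coefficient formula beta* = beta * (SD_x / SD_y) puts predictors measured on different scales on a common footing. A sketch on synthetic data constructed so that both predictors have the same standardized effect despite very different raw coefficients:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 500
x1 = rng.normal(0, 10, n)   # large-scale predictor, small raw coefficient
x2 = rng.normal(0, 1, n)    # small-scale predictor, large raw coefficient
y = 0.3 * x1 + 3.0 * x2 + rng.normal(0, 1, n)

X = np.column_stack([np.ones(n), x1, x2])
b = np.linalg.lstsq(X, y, rcond=None)[0]

# beta* = beta * SD(x) / SD(y): effect of a 1-SD move in x, in SDs of y
beta_star = np.array([b[1] * x1.std() / y.std(), b[2] * x2.std() / y.std()])
print(np.round(beta_star, 2))  # roughly equal despite raw coefficients 0.3 vs 3.0
```

Comparing raw coefficients here would wrongly suggest x2 matters ten times more; the standardized values show the 1-SD effects are the same.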
Limitations
- Multicollinearity reduces forecasting accuracy by 20-30% in unstable models
- Omitted variable bias: bias(beta_hat_j) = delta_k * gamma_{jk}, where delta_k is the true coefficient on the omitted variable and gamma_{jk} comes from regressing the omitted variable on x_j
- Heteroscedasticity biases SE by up to 50% without correction
- Autocorrelation in time series reg: Durbin-Watson <1.5 inflates Type I error 2x
- Non-normal errors matter mainly in small samples; inference is valid asymptotically via the CLT, but with small n p-values can be off by 10-20%
- Overfitting: R² increases but out-of-sample drops 30% with too many predictors
- Endogeneity causes inconsistency: plim beta_hat = beta + bias term
- Small samples (n<50) produce unstable coefficients, with standard errors roughly twice as large
- Perfect multicollinearity: singular X'X matrix, no unique solution
- Multiple regression assumes linearity; nonlinearities reduce R² by 15-40%
- Multicollinearity causes coefficient sign flips in 15% of economic datasets
- Omitted variable upward bias if corr(omitted,x)>0 and corr(omitted,y)>0
- Heteroskedasticity test power 80% at n=200 for moderate violation
- AR(1) errors cut the effective sample size to roughly n(1 - rho)/(1 + rho), about n/3 at rho = 0.5
- Bootstrap CI for beta more accurate than t for n<30, coverage 95% vs 90%
- Curse of dimensionality: with p > n, X'X is singular and OLS can interpolate the data, so unregularized fits overfit badly
- Simpson's paradox in aggregated reg hides subgroup effects
- Measurement error in x attenuates beta toward zero by reliability ratio
- Weak instruments: first-stage F<10 invalidates IV estimates
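The attenuation claim in the measurement-error bullet above is easy to verify by simulation. A sketch in which the reliability ratio is 0.5 by construction, so the true slope of 2.0 should attenuate toward 1.0:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 100_000
x_true = rng.normal(0, 1, n)
y = 2.0 * x_true + rng.normal(0, 1, n)

# Measurement error with the same variance as x: reliability = 1/(1+1) = 0.5
x_obs = x_true + rng.normal(0, 1, n)

# Simple regression slope of y on the noisy x
slope = np.cov(x_obs, y)[0, 1] / np.var(x_obs, ddof=1)
print(f"{slope:.2f}")  # near 2.0 * 0.5 = 1.0, not the true 2.0
```

The estimated slope is the true coefficient times the reliability ratio var(x)/(var(x)+var(error)), which is why noisy predictors bias effects toward zero.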
Model Diagnostics
- In multiple regression, adjusted R-squared penalizes the addition of unnecessary predictors: adjusted R^2 = 1 - (1 - R^2)(n - 1)/(n - k - 1), where k is the number of predictors and n is sample size
- Multicollinearity inflates standard errors of coefficients; a VIF above 5-10 usually signals problematic multicollinearity worth investigating
- The Durbin-Watson test statistic ranges from 0 to 4, with values near 2 indicating no autocorrelation in residuals
- Breusch-Pagan test p-value less than 0.05 rejects null of homoscedasticity in multiple regression residuals
- Cook's distance greater than 4/n (n=sample size) identifies influential observations in multiple regression
- Leverage values (h_ii) above 2p/n (p=parameters, n=sample) suggest high-influence points
- Ramsey RESET test uses F-statistic to detect functional form misspecification; p<0.05 indicates omitted variables
- Variance Inflation Factor (VIF) for a predictor is 1/(1-R_j^2), where R_j^2 is from regressing predictor j on others
- Shapiro-Wilk test on residuals tests normality; W close to 1 indicates normality in multiple regression
- Heteroscedasticity-robust (White) standard errors come from the sandwich (X'X)^(-1) (sum e_i^2 x_i x_i') (X'X)^(-1); HC3 rescales e_i^2 by 1/(1 - h_ii)^2
- Augmented Dickey-Fuller test statistic more negative than critical value rejects unit root in time series multiple regression
- QQ-plot of residuals should align with straight line for normality assumption in multiple regression
- Box-Cox transformation of the response: estimated lambda near 1 indicates no transformation is needed
- Ljung-Box Q-statistic tests residual autocorrelation; p>0.05 fails to reject white noise
- Studentized residuals beyond ±3 indicate outliers in multiple regression models
- F-test for overall significance: F = (SSR/k) / (SSE/(n-k-1)), critical value from F(k,n-k-1)
- Partial F-test compares nested models: F = [(SSE_r - SSE_u)/q] / [SSE_u/(n-k-1)]
- The Durbin-Watson statistic for testing autocorrelation is approximately DW = 2(1 - rho), where rho is first-order autocorrelation coefficient
- In Breusch-Pagan test, the LM statistic is chi-squared distributed with k degrees of freedom under null of constant variance
- Cook's distance measures influence as D_i = (r_i^2 / p) * (h_ii / (1-h_ii)), where r_i studentized residual
- Hat values h_ii = x_i (X'X)^{-1} x_i', average leverage = (k+1)/n
- RESET test fits model with powers of fitted values, tests joint significance F-stat
- VIF_j = 1 / (1 - R^2_{Xj on others}); tolerance = 1/VIF_j, with tolerance < 0.1 signaling high collinearity
- The Anderson-Darling normality test weights the distribution tails heavily, making it sensitive to tail departures in regression residuals
- White's heteroscedasticity-consistent covariance matrix: (X'X)^(-1) (sum x_i x_i' e_i^2) (X'X)^(-1)
- Jarque-Bera test JB = n/6 (S^2 + (K-3)^2/4), chi2(2) for residual normality
- Residual plots: patterned residuals indicate model misspecification, random scatter ok
- Variance of prediction error = sigma^2 (1 + x0'(X'X)^{-1}x0)
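Two of the diagnostics above, the VIF and the Durbin-Watson statistic, take only a few lines to compute. A sketch on synthetic data with deliberately near-collinear predictors (the data-generating choices are arbitrary illustrations):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 300
x1 = rng.normal(size=n)
x2 = 0.95 * x1 + 0.1 * rng.normal(size=n)   # nearly collinear with x1
y = x1 + x2 + rng.normal(size=n)

def vif(X, j):
    """VIF_j = 1 / (1 - R_j^2), R_j^2 from regressing column j on the others."""
    others = np.delete(X, j, axis=1)
    A = np.column_stack([np.ones(len(X)), others])
    fitted = A @ np.linalg.lstsq(A, X[:, j], rcond=None)[0]
    r2 = 1 - np.sum((X[:, j] - fitted) ** 2) / np.sum((X[:, j] - X[:, j].mean()) ** 2)
    return 1 / (1 - r2)

X = np.column_stack([x1, x2])
print(vif(X, 0) > 10)  # flags high multicollinearity by the rule of thumb

# Durbin-Watson on the residuals of y ~ x1 + x2
A = np.column_stack([np.ones(n), X])
e = y - A @ np.linalg.lstsq(A, y, rcond=None)[0]
dw = np.sum(np.diff(e) ** 2) / np.sum(e ** 2)
print(f"DW = {dw:.1f}")  # near 2: no residual autocorrelation here
```

In practice libraries such as statsmodels provide these diagnostics directly; the hand-rolled versions here just make the formulas concrete.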