What Is Multiple Linear Regression?
Multiple linear regression (MLR) predicts one continuous outcome from two or more predictors, estimating each predictor's effect after accounting for every other predictor in the model. The phrase "after accounting for every other predictor" does the real work here. A simple correlation between salary and years of experience tells you they move together. Multiple regression tells you how much experience matters once you also control for education level, industry, and company size. That separation of effects is why MLR appears in virtually every quantitative field — from clinical trials to housing economics to machine learning pipelines.
Ordinary least squares (OLS) fits the equation by finding the coefficients that minimize the sum of squared differences between observed and predicted Y values. When the Gauss-Markov conditions hold (linearity, independent errors, constant error variance; normality is not required for this result), OLS estimates are BLUE: Best Linear Unbiased Estimators. Violate those assumptions and the estimates may still be usable — but you need to know what you are working with. See Statistics Fundamentals for the underlying probability theory.
- Equation: Y = β₀ + β₁X₁ + … + βₙXₙ + ε — one intercept, one coefficient per predictor, one error term
- Estimation method: Ordinary Least Squares (OLS) minimizes the sum of squared residuals
- Outcome requirement: Y must be continuous. For binary outcomes, use logistic regression instead
- Sample size rule of thumb: 10–20 observations per predictor variable to avoid overfitting
- Multicollinearity check: VIF < 5 is acceptable; VIF > 10 is a serious problem requiring action
- Model fit metrics: R-squared (variance explained), Adjusted R-squared (penalizes extra predictors), F-statistic (overall significance)
Key Terminology at a Glance
These terms appear in every regression output, and confusing them is the most common beginner mistake.
| Term | Symbol | Plain Meaning |
|---|---|---|
| Dependent variable | Y | The outcome being predicted (price, score, risk score) |
| Independent variable | X₁, X₂, … | The predictors you feed into the model |
| Regression coefficient | β₁, β₂, … | Change in Y per 1-unit increase in that X, all else held constant |
| Intercept | β₀ | Predicted Y when all X values equal zero |
| Residual | ε | Observed Y minus predicted Y — the model's error for each data point |
| R-squared | R² | Fraction of Y's variance that the model explains (0 = none, 1 = perfect) |
Multiple Linear Regression — Fitting a Plane Through Data
The Multiple Linear Regression Formula
The standard MLR equation has one term for each predictor, plus a constant and an error term:

Y = β₀ + β₁X₁ + β₂X₂ + … + βₙXₙ + ε

where:
- Y = outcome variable (continuous)
- β₀ = intercept (constant)
- β₁…βₙ = partial regression coefficients
- X₁…Xₙ = predictor variables
- ε = error term (residual)
The word "partial" in "partial regression coefficient" matters. β₁ measures how Y changes with X₁ while holding X₂, X₃, and all other predictors fixed. Take a salary model with predictors experience (X₁) and education level (X₂). If β₁ = 3,200, each additional year of experience predicts a $3,200 salary increase among people who share the same education level. That is a fundamentally different number from the raw correlation between experience and salary.
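To make the distinction concrete, here is a small simulation (all numbers hypothetical) contrasting the simple one-predictor slope with the partial coefficient when experience and education are correlated:

```python
import numpy as np

rng = np.random.default_rng(7)
educ = rng.integers(0, 2, size=500).astype(float)   # 0/1 education dummy
exper = rng.normal(size=500) + 2.0 * educ           # experience correlated with education
salary = 40 + 3.2 * exper + 9.0 * educ + rng.normal(size=500)

# Simple one-predictor slope: mixes the education premium into "experience"
simple_slope = np.polyfit(exper, salary, 1)[0]

# Partial coefficient for experience from the two-predictor normal equations
X = np.column_stack([np.ones(500), exper, educ])
partial = np.linalg.solve(X.T @ X, X.T @ salary)[1]

print(round(simple_slope, 2), round(partial, 2))
```

The simple slope comes out well above the true experience effect of 3.2 because it absorbs part of the education premium; the partial coefficient recovers it.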
Matrix Form: β̂ = (XᵀX)⁻¹Xᵀy
For more than a few predictors, the math becomes unwieldy in scalar notation. In matrix form, the OLS solution compresses to one compact expression:

β̂ = (XᵀX)⁻¹Xᵀy

where:
- X = design matrix (n rows, p+1 columns including a column of ones for β₀)
- y = vector of observed outcomes
- β̂ = vector of estimated coefficients
- Xᵀ = transpose of X
- (XᵀX)⁻¹ = inverse of XᵀX
This is exactly what Python, R, and every statistical package computes internally when you call a linear regression function. The matrix formulation is computationally efficient and forms the basis of weighted least squares, ridge regression, and all other OLS extensions. For the theory behind why this estimator is unbiased, see the sampling distributions guide.
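As a sketch with made-up numbers, the normal-equation solution can be computed directly in NumPy. `np.linalg.solve` is used instead of forming the inverse explicitly, which is the numerically preferred route:

```python
import numpy as np

# Hypothetical data: 6 observations, 2 predictors
X_raw = np.array([[1.0, 2.0],
                  [2.0, 1.0],
                  [3.0, 4.0],
                  [4.0, 3.0],
                  [5.0, 6.0],
                  [6.0, 5.0]])
y = np.array([5.0, 6.0, 11.0, 12.0, 17.0, 18.0])   # generated as 1 + 2*X1 + 1*X2

# Design matrix: prepend a column of ones for the intercept beta_0
X = np.column_stack([np.ones(len(y)), X_raw])

# Normal-equation solution beta_hat = (X'X)^{-1} X'y
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

print(beta_hat)  # [intercept, beta_1, beta_2]
```

Because the toy outcome is an exact linear function of the predictors, the solver recovers the coefficients [1, 2, 1] exactly.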
R-Squared and Adjusted R-Squared
Two numbers from every regression output require different interpretations.
| Metric | Formula | What It Measures | Practical Range |
|---|---|---|---|
| R-squared (R²) | 1 − SSres/SStot | Proportion of variance in Y explained by the model | 0 to 1; higher = better fit (domain-dependent) |
| Adjusted R² | 1 − (1−R²)(n−1)/(n−p−1) | R² penalized for number of predictors | Lower than R²; only rises when new variable truly helps |
| F-statistic | MSreg / MSres | Whether the full model beats a no-predictor baseline | p < 0.05 means at least one predictor is significant |
| Root MSE (RMSE) | √(SSres / (n−p−1)) | Average prediction error in Y's original units | Lower = better; same units as Y |
Never report R² alone in multiple regression. Every predictor you add — even a random number — increases R² slightly, because adding noise still explains a tiny fraction of variance by chance. Adjusted R² penalizes for that inflation. If adding a variable lowers adjusted R², the variable is not earning its place in the model.
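Both formulas are easy to compute by hand; this minimal sketch uses hypothetical fitted values for a 3-predictor model:

```python
import numpy as np

def r_squared_stats(y, y_hat, p):
    """R-squared and adjusted R-squared for a model with p predictors."""
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    n = len(y)
    r2 = 1 - ss_res / ss_tot
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - p - 1)
    return r2, adj_r2

# Hypothetical observed and fitted values, 10 observations, 3 predictors
y     = np.array([10., 12., 14., 15., 18., 20., 21., 24., 25., 28.])
y_hat = np.array([11., 12., 13., 16., 17., 20., 22., 23., 26., 27.])

r2, adj_r2 = r_squared_stats(y, y_hat, p=3)
print(round(r2, 3), round(adj_r2, 3))   # adjusted R^2 is always the smaller one
```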
Step-by-Step Guide: How to Run Multiple Linear Regression
These seven steps follow the actual sequence a working analyst uses — not the order a textbook lists topics. Skip step 3 and you risk building a model on violated assumptions. Skip step 6 and you may report R² from a model that would collapse on any new dataset.
Define Your Research Question
State precisely what Y you want to predict (or explain) and which X variables you have theoretical or practical reasons to include. The predictors should have a plausible relationship to Y — regression can find spurious correlations in any large dataset. Write the question down in one sentence before touching the data.
Collect and Prepare Your Data
Check for missing values and decide whether to impute or drop. Encode categorical variables as dummy variables (k−1 dummies for k categories — forgetting this creates the dummy variable trap). Check for obvious data entry errors. Aim for at least 10–20 observations per predictor; 200 observations with 5 predictors is solid.
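In Python, pandas handles the k−1 dummy encoding for you; the data below is made up for illustration:

```python
import pandas as pd

# Hypothetical data with a 3-level categorical predictor
df = pd.DataFrame({
    "salary": [55, 62, 48, 70, 65],
    "experience": [3, 5, 2, 8, 6],
    "dept": ["sales", "tech", "sales", "tech", "ops"],
})

# k categories -> k-1 dummies: drop_first=True omits the reference category
# ("ops" here, alphabetically first), avoiding the dummy variable trap.
encoded = pd.get_dummies(df, columns=["dept"], drop_first=True)
print(encoded.columns.tolist())
```

The omitted category becomes the baseline: each dummy coefficient is then read as the difference from "ops", all else held constant.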
Check Assumptions Before Fitting
Run scatter plots of each X against Y (linearity check). Compute pairwise correlations and VIF scores between predictors (multicollinearity check). Plot residuals against fitted values from a preliminary model (homoscedasticity and independence check). Violating assumptions here means the entire output needs qualifying or fixing.
Fit the OLS Regression Model
In Python, use statsmodels.OLS for full statistical output or sklearn.LinearRegression for a quick prediction model. In R, lm(Y ~ X1 + X2 + X3, data = df) is the standard call. In Excel, use Data → Data Analysis → Regression from the Analysis Toolpak. The software solves the normal equations β̂ = (XᵀX)⁻¹Xᵀy automatically.
Interpret the Output
Read each coefficient as: "for a one-unit increase in Xₖ, Y changes by βₖ, holding all other predictors constant." Check the p-value for each predictor (p < 0.05 = statistically significant at 95% confidence). Read the overall F-test to confirm the model beats a null (intercept-only) baseline. Check adjusted R² to see how much variance the predictors explain collectively.
Validate With Residual Plots and Cross-Validation
Plot residuals vs. fitted values (should look like random scatter — any funnel shape indicates heteroscedasticity). Run a Q-Q plot on residuals (should follow a straight diagonal line if normally distributed). For prediction tasks, use k-fold cross-validation to estimate out-of-sample performance. Check Cook's Distance to flag observations that are disproportionately steering the coefficients.
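A quick sketch of k-fold cross-validation with scikit-learn, on simulated data (the data-generating process is made up for the example):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))                    # three hypothetical predictors
y = 2.0 + 1.5 * X[:, 0] - 0.8 * X[:, 1] + rng.normal(scale=0.5, size=200)

# 5-fold cross-validated R^2 estimates out-of-sample fit, not training fit
scores = cross_val_score(LinearRegression(), X, y, cv=5, scoring="r2")
print(scores.round(3), round(scores.mean(), 3))
```

If the cross-validated R² is far below the training R², the model is overfitting.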
Report Your Results
Present the regression equation with actual coefficient values and standard errors. Report standardized beta coefficients if comparing predictor importance across different measurement scales. Include R², adjusted R², the F-statistic, degrees of freedom, and p-value. Describe any assumption violations and how you handled them.
Four Real-World Examples of Multiple Linear Regression
These examples use concrete numbers throughout. The coefficients are illustrative, but they sit in a realistic range for each domain.
Example 1 — Predicting House Sale Price
Predictors: square footage (X₁), neighborhood quality score on a 10-point scale (X₂), property age in years (X₃). Outcome: sale price in $thousands. OLS fit on 500 residential transactions.
Fitted equation: Price = 42.3 + 0.12(sqft) + 18.7(neighborhood) − 0.9(age)
Read β₁ = 0.12: Each additional square foot adds $120 to predicted price, holding neighborhood quality and age constant. This is the pure size effect after controlling for location.
Read β₃ = −0.9: Each additional year of age reduces predicted price by $900, holding size and location constant. A 30-year-old house sells for roughly $27,000 less than an otherwise identical new construction.
Predict a specific house: 1,800 sqft, neighborhood score 7, 15 years old: Ŷ = 42.3 + 0.12(1800) + 18.7(7) − 0.9(15) = 42.3 + 216 + 130.9 − 13.5 = $375,700
✓ R² = 0.81 — the three predictors together explain 81% of sale price variance. Adjusted R² = 0.79. F(3, 496) = 703, p < 0.001.
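The prediction step is just arithmetic on the fitted equation; as a sketch:

```python
def predict_price(sqft, neighborhood, age):
    """Predicted sale price ($ thousands) from the fitted house-price equation."""
    return 42.3 + 0.12 * sqft + 18.7 * neighborhood - 0.9 * age

# The 1,800 sqft, score-7, 15-year-old house from the example above
print(round(predict_price(1800, 7, 15), 1))
```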
Example 2 — Forecasting Employee Salary
Predictors: years of experience (X₁), education level as coded dummy (0 = bachelor's, 1 = master's/PhD) (X₂), job function encoded as two dummies (X₃, X₄). Outcome: annual salary in $thousands.
Fitted equation: Salary = 48.2 + 3.4(experience) + 9.1(grad_degree) + 6.2(tech_role) + 4.1(management_role)
Read β₁ = 3.4: Each additional year of experience predicts $3,400 more salary — among employees with the same education level and job function. This separates experience from the education premium.
Read β₂ = 9.1: Having a graduate degree predicts $9,100 more than a bachelor's degree, holding experience and role type constant. A recruiter can now separate the education premium from the raw career-length effect.
✓ This is exactly how compensation analytics teams detect pay equity gaps: if gender or race predicts salary after controlling for experience, education, and role — a coefficient that should equal zero does not.
Example 3 — Healthcare: Cardiovascular Risk Scoring
Predictors: age in years (X₁), systolic blood pressure in mmHg (X₂), LDL cholesterol in mg/dL (X₃), smoking status dummy (X₄). Outcome: 10-year cardiovascular risk score (percentage).
Fitted equation: Risk% = −12.4 + 0.31(age) + 0.08(SBP) + 0.04(LDL) + 7.2(smoker)
Practical reading: Smoking adds 7.2 percentage points of cardiovascular risk after controlling for age, blood pressure, and cholesterol. A physician can now present that isolated number to a patient — not a correlation muddied by the fact that smokers also tend to have higher blood pressure.
✓ This application — and variants of it — appear in the Framingham Heart Study, one of the longest-running cardiovascular research datasets. The Framingham Risk Score is a direct application of MLR built from 30+ years of data. Source: Wilson et al., Circulation, 1998.
Example 4 — Marketing Mix Modeling
Predictors: TV advertising spend ($000s, X₁), digital advertising spend ($000s, X₂), print advertising spend ($000s, X₃), seasonal index (1 for high season, 0 otherwise, X₄). Outcome: weekly revenue ($000s).
Fitted equation: Revenue = 180 + 2.1(TV) + 3.8(Digital) + 0.7(Print) + 42.3(Season)
Key finding: β for Digital (3.8) is almost twice β for TV (2.1). Each $1,000 of digital spend returns $3,800 in revenue vs. $2,100 for TV, holding other channels and season constant. The budget allocation decision practically makes itself from this output.
Print coefficient is low (0.7) and p = 0.18 — not statistically significant. That means print cannot be reliably distinguished from zero effect. It may stay in the model for theoretical reasons, but its coefficient is unreliable.
✓ Marketing mix modeling (MMM) like this was used by Nielsen and Analytic Partners before machine learning tools existed — and still runs in organizations where interpretability matters more than the last 0.3% of predictive accuracy.
The 5 Assumptions of Multiple Linear Regression
Regression assumptions are not fine print — they are the conditions under which the OLS estimator gives you the most reliable possible answer from your data. Each violation has a specific consequence and a specific fix.
1. Linearity — X and Y have a linear relationship.
2. Independence — observations don't affect each other.
3. Homoscedasticity — residuals have constant spread.
4. Normality — residuals are normally distributed.
5. No multicollinearity — predictors are not highly correlated with each other.
| Assumption | What It Means | How to Test It | What To Do If Violated |
|---|---|---|---|
| 1. Linearity | Each predictor has a linear (straight-line) relationship with Y | Scatter plots of each X vs Y; partial regression plots; component-plus-residual plots | Transform X or Y (log, square root, Box-Cox); add polynomial terms |
| 2. Independence | Residuals are not correlated with each other across observations | Durbin-Watson test (value near 2 = no autocorrelation); plot residuals in observation order | Use time-series models (ARIMA); add lagged variables; use clustered standard errors |
| 3. Homoscedasticity | Residual variance is constant across all predicted values | Residuals vs. fitted values plot (random scatter = good); Breusch-Pagan test | Log-transform Y; use Weighted Least Squares (WLS); use heteroscedasticity-robust standard errors |
| 4. Normality of residuals | Residuals follow a normal distribution | Q-Q plot (points should fall on diagonal line); Shapiro-Wilk test for small samples | Transform Y; remove outliers after investigation; for large samples, CLT reduces this concern |
| 5. No multicollinearity | Predictors are not highly correlated with each other | VIF for each predictor (VIF < 5 = OK; > 10 = severe); correlation matrix between predictors | Remove one correlated variable; combine correlated predictors into a composite; use ridge regression |
Multicollinearity: The Most Commonly Overlooked Assumption
Violations of linearity, normality, and homoscedasticity show up visually in diagnostic plots — most analysts catch them. Multicollinearity hides. The model fits. R² looks fine. Individual coefficients, though, are inflated in standard error, sometimes flip sign, and become highly sensitive to small changes in the dataset. A model that produces a significant p-value for "experience" on one sample but a non-significant one on a very similar sample usually has a multicollinearity problem.
The Variance Inflation Factor (VIF) measures how much the variance of each coefficient is inflated by its correlation with the others. VIF = 1/(1 − R²ₖ), where R²ₖ is obtained by regressing predictor k on all other predictors. VIF values above 5 warrant attention; above 10 means the coefficient for that predictor should not be interpreted individually.
Interpreting Multiple Regression Output
Real regression output contains more numbers than most guides explain. Here is what each component tells you and when to worry.
A typical output table shows: Coefficients (β values), Standard Errors (uncertainty in each β), t-statistics (coefficient/SE), p-values (significance of each predictor), 95% Confidence Intervals for each β, plus overall R², Adjusted R², F-statistic, and model p-value.
The F-Statistic and What "Overall Model Significance" Means
The F-test checks whether your model as a whole beats the null hypothesis that all coefficients equal zero. A significant F-test (p < 0.05) means at least one predictor is meaningfully related to Y — but it does not tell you which one. Individual t-tests on each coefficient answer the "which predictor" question. A common mistake: ignoring the F-test and reporting only individual p-values. If the F-test fails, individual p-values are not reliable.
Statistical Significance vs. Practical Importance
With 10,000 observations, a coefficient of 0.003 can easily reach p < 0.05. Whether a salary premium of $3 per additional year of experience is practically meaningful is a separate judgment from whether it is statistically distinguishable from zero. Use standardized beta coefficients (multiply each β by the ratio of its predictor's standard deviation to Y's standard deviation) to compare the relative practical importance of predictors measured on different scales.
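The conversion is a one-liner; the names and numbers below are hypothetical:

```python
def standardized_betas(betas, x_sds, y_sd):
    """Standardized beta for each predictor: raw beta * (SD of X / SD of Y)."""
    return [b * sx / y_sd for b, sx in zip(betas, x_sds)]

# Hypothetical salary model: experience (SD = 4 yrs), company size (SD = 900 staff)
raw_betas = [3.4, 0.002]     # $k per year, $k per employee
x_sds = [4.0, 900.0]
y_sd = 15.0                  # salary SD in $k

out = standardized_betas(raw_betas, x_sds, y_sd)
print([round(b, 3) for b in out])
```

On the standardized scale the two predictors become directly comparable: a one-SD change in experience moves salary by far more SDs than a one-SD change in company size.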
Multiple Linear Regression vs. Top Alternatives
The question isn't "is MLR good?" — it's "is MLR right for this problem?" Here are the situations where you want something else.
| Method | Best For | Key Difference from MLR |
|---|---|---|
| Simple Linear Regression | One predictor, one outcome | No control for confounders — coefficients mix all effects into one number |
| Logistic Regression | Binary outcome (yes/no, pass/fail) | Models log-odds via sigmoid function; predicted values stay between 0 and 1 |
| Polynomial Regression | Curved (non-linear) X–Y relationship | Adds X², X³ terms — still linear in parameters; remains in the OLS framework |
| Ridge Regression | Multicollinearity; all predictors are relevant | Adds L2 penalty (λΣβ²) to shrink coefficients; never sets them exactly to zero |
| LASSO Regression | High-dimensional data; sparse solution | Adds L1 penalty (λΣ\|β\|); forces some coefficients to exactly zero — automatic variable selection |
| Elastic Net | Correlated predictors + many irrelevant ones | Combines L1 and L2 penalties; better than LASSO when predictors are grouped |
| Random Forest | Non-linear relationships; high predictive accuracy | Non-parametric; no linearity assumption; much harder to interpret coefficients |
| ANOVA | Comparing group means (categorical predictors only) | Special case of the general linear model — MLR with only dummy predictors reproduces ANOVA |
If your outcome is binary (disease yes/no), ordinal (1–5 rating), count (number of events), or time-to-event data — MLR will give biologically or statistically impossible predictions. Binary outcomes need logistic regression. Count outcomes need Poisson or negative binomial regression. Survival data needs Cox proportional hazards. The statistical test selector can help you choose.
Running Multiple Linear Regression: Python, R, and Excel
All three environments solve the same OLS normal equations. They differ in how much statistical output they show by default and in how they handle missing values.
Python: statsmodels (Full Statistical Output)
Use statsmodels when you need p-values, confidence intervals, and the full F-test. This is the right choice for any academic or research context.
Python: scikit-learn (Prediction Focus)
Use sklearn when prediction accuracy matters more than p-values, or when you're embedding the model in a machine learning pipeline.
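The sklearn equivalent, sketched with a train/test split since prediction is the goal here (data simulated for the example):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = 1.0 + 0.5 * X[:, 0] - 2.0 * X[:, 2] + rng.normal(scale=0.3, size=200)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = LinearRegression().fit(X_train, y_train)
print(model.intercept_, model.coef_)   # no p-values: sklearn is prediction-focused
print(round(model.score(X_test, y_test), 3))  # out-of-sample R^2
```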
R: lm() Function
R's built-in lm() function gives comprehensive output in one call. The summary() method adds p-values, F-statistics, and R-squared.
Excel: Data Analysis Toolpak
For a quick analysis without code: Data tab → Data Analysis → Regression. Select your Y range, then your X range (multiple columns work). Check "Labels" if your first row has headers. Tick "Residuals" to get a residual plot. The output pastes to a new sheet with the full table of coefficients, R-squared, and F-statistic.
Excel's regression tool does not compute VIF or Cook's Distance. For diagnostic checks beyond the basic output, use Python (statsmodels) or R. Excel also lacks cross-validation — suitable for one-off analyses, not for models going into production.
7 Common Multiple Regression Mistakes
These are the errors that appear most often in published papers, student projects, and professional analyses. Each one has a tell-tale symptom.
| # | Mistake | Symptom | Fix |
|---|---|---|---|
| 1 | Including two highly correlated predictors (VIF > 10) | High R² but individual predictors show huge SEs and non-significant p-values; coefficients flip sign between similar datasets | Remove one correlated variable, create a composite, or switch to ridge regression |
| 2 | Skipping assumption checks entirely | Funnel-shaped residual plot (heteroscedasticity) or S-curve residual vs. fitted plot (non-linearity) — both invalidate standard errors | Always plot residuals vs. fitted before reporting any results |
| 3 | Treating R-squared as the sole measure of model quality | Model with 20 random predictors shows R² = 0.85; model with 3 genuine predictors shows R² = 0.70 but generalizes far better | Report adjusted R², RMSE, and cross-validation R² alongside raw R² |
| 4 | Forgetting dummy variable encoding for categorical predictors | Regression treats "category A = 1, B = 2, C = 3" as an ordinal numeric variable, implying equal distance between categories | Create k−1 dummy variables; never feed raw category codes as continuous predictors |
| 5 | Overfitting: too many predictors relative to sample size | Training R² = 0.92; test/validation R² = 0.51 — the model memorized noise | Use the 10–20 observations-per-predictor guideline; validate with holdout data or k-fold CV |
| 6 | Applying MLR to a binary or categorical outcome | Predicted values fall below 0 or above 1 for binary Y; the normality assumption is structurally violated | Use logistic regression for binary outcomes; ordinal logistic for ordinal; Poisson for counts |
| 7 | Confusing statistical significance with practical importance | In a dataset of n = 100,000, a $2 salary difference is "significant" at p < 0.0001 but irrelevant to compensation decisions | Report effect sizes, standardized coefficients, and confidence intervals — not just p-values |
When to Use Multiple Linear Regression — and When to Step Away
Knowing when a method is the wrong tool is as valuable as knowing how to use it correctly.
Good candidate for MLR
Continuous outcome (salary, price, score). Predictors have theoretical justification. Sample size ≥ 100 with ≤ 10 predictors. Interpretability of coefficients matters to your audience. Relationships are plausibly linear.
MLR with caution
Mild non-linearity (transform variables first). Some multicollinearity (use ridge). Time-series data (add autocorrelation corrections). Small sample (< 50) with few predictors — results may not replicate.
Wrong tool — use something else
Binary outcome → logistic regression. Count outcome → Poisson. Ordinal outcome → ordinal logistic. Severe non-linearity → tree methods. Clustered or repeated-measures data → mixed effects models.
A note on prediction vs. explanation. When your goal is to explain or quantify relationships, coefficient interpretability matters and MLR is excellent. When your goal is to predict with the highest possible accuracy, a gradient-boosted tree or neural network will often outperform MLR on real-world datasets with non-linear structure. Make that trade-off explicitly rather than by default.
Frequently Asked Questions About Multiple Linear Regression
What is multiple linear regression?
Multiple linear regression predicts one continuous outcome using two or more input variables simultaneously. The equation Y = β₀ + β₁X₁ + β₂X₂ + ε assigns a coefficient to each predictor measuring its unique effect on Y after statistically holding the other predictors constant. It extends simple regression to handle real-world problems where outcomes are shaped by multiple factors at once.

What is the multiple linear regression formula?
Y = β₀ + β₁X₁ + β₂X₂ + … + βₙXₙ + ε. Y is the outcome variable, β₀ is the intercept, β₁ through βₙ are regression coefficients, X₁ through Xₙ are predictor variables, and ε represents unexplained variation. In matrix notation: β̂ = (XᵀX)⁻¹Xᵀy — the closed-form ordinary least squares solution.

How does multiple regression differ from simple linear regression?
Simple linear regression uses one predictor variable, while multiple linear regression uses two or more predictors simultaneously. The major advantage of multiple regression is that it estimates the unique effect of each predictor after statistically controlling for all other predictors in the model.

What are the assumptions of multiple linear regression?
Multiple linear regression assumes: (1) linear relationships between predictors and outcome, (2) independence of observations, (3) homoscedasticity or constant residual variance, (4) normally distributed residuals, and (5) low multicollinearity among predictors. Violating these assumptions can bias coefficients and significance tests.

What does R-squared mean in multiple regression?
R-squared measures the proportion of variation in the outcome variable explained by the predictors together. An R² of 0.78 means the model explains 78% of the variance in Y. Adjusted R² is preferred in multiple regression because it penalizes unnecessary predictors.

What is multicollinearity and why does it matter?
Multicollinearity occurs when predictor variables are highly correlated with one another. This makes coefficient estimates unstable and inflates standard errors. Variance Inflation Factor (VIF) is commonly used to detect it, with values above 10 usually considered problematic.

When should I use logistic regression instead?
Logistic regression should be used when the outcome variable is binary, such as yes/no, success/failure, or disease/no disease. Multiple linear regression is designed for continuous outcomes and can produce invalid predictions outside the 0–1 probability range.

How many predictors can I include?
There is no strict mathematical limit, but too many predictors relative to sample size increases overfitting risk. A common guideline is at least 10–20 observations per predictor variable to maintain stable estimates and reliable generalization.

What is the difference between ridge and LASSO regression?
Ridge regression uses an L2 penalty that shrinks coefficients toward zero but keeps all predictors in the model. LASSO uses an L1 penalty that can reduce some coefficients exactly to zero, effectively performing variable selection automatically.

How is multiple linear regression used in machine learning?
Multiple linear regression is one of the foundational supervised learning algorithms in machine learning. It is commonly used for prediction, feature importance analysis, baseline model comparison, and interpretable decision systems in finance, healthcare, and business analytics.
Multiple Linear Regression: Quick Reference Cheat Sheet
Everything in this guide compressed into one scannable table — optimized for quick review before an exam, analysis, or interview.
| Concept | Formula / Value | When It Applies | Plain Interpretation |
|---|---|---|---|
| MLR Equation | Y = β₀ + β₁X₁ + … + βₙXₙ + ε | Any continuous outcome with ≥ 2 predictors | Each β is a partial slope — Y's change per unit X, others fixed |
| OLS Solution (Matrix) | β̂ = (XᵀX)⁻¹Xᵀy | Estimating coefficients from data | Minimizes total squared prediction error across all observations |
| R-squared | 1 − SSres/SStot | Measuring model fit | Fraction of Y's variance the model explains |
| Adjusted R² | 1 − (1−R²)(n−1)/(n−p−1) | Comparing models with different # of predictors | R² penalized for each added predictor — preferred over raw R² |
| F-statistic | MSreg / MSres | Testing overall model significance | p < 0.05 means ≥ 1 predictor is significantly related to Y |
| VIF | 1 / (1 − R²ₖ) | Checking multicollinearity | VIF < 5 = acceptable; > 10 = serious problem requiring action |
| Durbin-Watson | Range 0–4; near 2 = OK | Testing independence of residuals | Values near 0 or 4 signal autocorrelation in residuals |
| Cook's Distance | Cᵢ > 4/n → investigate | Identifying influential observations | Points steering coefficients disproportionately — check for errors |
| Dummy variable encoding | k categories → k−1 dummies | Categorical predictors in regression | Reference group: the omitted category; avoid dummy variable trap |
| Sample size guideline | n ≥ 10–20 per predictor | Determining minimum data needed | Too few observations per predictor = overfitting and unreliable results |
| Ridge vs LASSO | Ridge: λΣβ²; LASSO: λΣ\|β\| | Regularization against overfitting | Ridge shrinks all β; LASSO can zero some out (variable selection) |
| Gauss-Markov (BLUE) | Linearity, independence, homoscedasticity hold | Theoretical justification for OLS | OLS is Best Linear Unbiased Estimator when the assumptions hold |
Continue Learning at Statistics Fundamentals
Explore Related Topics
Multiple linear regression connects to a broad set of statistical concepts — the guides below cover what to learn before MLR, what to learn alongside it, and where to go next.
- Simple Linear Regression — The one-predictor foundation that MLR extends
- Logistic Regression — For binary outcomes; the most common alternative to MLR in practice
- Hypothesis Testing — The F-test, t-tests, and p-value framework that regression output uses
- Confidence Intervals — How the 95% CIs on regression coefficients are constructed
- Sampling Distributions — Why the OLS estimator is unbiased (the theoretical foundation)
- Normal Distribution — The distribution residuals should follow; the basis of regression inference
- ANOVA — A special case of the general linear model; deeply related to regression
- Statistical Test Selector — Not sure whether to use MLR? This tool walks you through the decision
- Correlation Calculator — Check pairwise predictor correlations before running your MLR
- F-Distribution Table — Look up critical values for the overall F-test in your regression output
- Data Visualization — How to build scatter plots, residual plots, and regression diagnostic visuals
- Statistics Calculators — Full suite of calculation tools for every statistical test
- Penn State STAT 501 — Regression Methods (Lesson 5) — Academic course notes with proofs, matrix formulation, and worked examples
- statsmodels OLS Regression Documentation — Official Python reference for all OLS output metrics and diagnostic tests
- scikit-learn Linear Models Documentation — Covers LinearRegression, Ridge, LASSO, and ElasticNet with API reference
- NIST/SEMATECH Engineering Statistics Handbook — Multiple Linear Regression — Government reference covering assumptions, diagnostics, and case studies