BY: Statistics Fundamentals Team
Reviewed By: Minsa A (Senior Statistics Editor)

Multiple Linear Regression: The Complete Guide — Definition, Formula, Assumptions & Code

A house sells for $425,000. Another, 200 feet down the same street, sells for $389,000. Square footage explains part of that gap. Property age handles some more. School district rating accounts for the rest. None of those three factors tells the whole story alone — but together, plugged into one equation, they come remarkably close. That equation is multiple linear regression.

This guide walks through the full MLR equation, tests each of the five core assumptions, runs four worked examples with real numbers, and provides copy-ready code in Python, R, and Excel.

What You Will Learn
  • ✓ The MLR equation in full — plus matrix form (β = (XᵀX)⁻¹Xᵀy) for advanced readers
  • ✓ All five regression assumptions and exactly how to test each one
  • ✓ Four worked examples with actual numbers (housing, salary, healthcare, marketing)
  • ✓ Step-by-step code in Python (statsmodels + sklearn), R (lm), and Excel
  • ✓ A comparison table: MLR vs logistic regression, LASSO, ridge, and random forests
  • ✓ The seven most common regression mistakes and how to catch each one early

What Is Multiple Linear Regression?

Definition — Multiple Linear Regression (MLR)
Multiple linear regression is a statistical method that models the relationship between one continuous outcome variable and two or more predictor variables by fitting a linear equation to the observed data. Each predictor gets its own coefficient measuring its unique effect on the outcome after accounting for every other predictor simultaneously.
Y = β₀ + β₁X₁ + β₂X₂ + … + βₙXₙ + ε

The phrase "after accounting for every other predictor" does the real work here. A simple correlation between salary and years of experience tells you they move together. Multiple regression tells you how much experience matters once you also control for education level, industry, and company size. That separation of effects is why MLR appears in virtually every quantitative field — from clinical trials to housing economics to machine learning pipelines.

Ordinary least squares (OLS) fits the equation by finding the coefficients that minimize the sum of squared differences between observed and predicted Y values. When the five core assumptions hold, OLS estimates are BLUE: Best Linear Unbiased Estimators (the Gauss-Markov theorem). Violate those assumptions and the estimates may still be usable — but you need to know what you are working with. See the full statistics and probability foundation at Statistics Fundamentals for the underlying probability theory.

⚡ Quick Reference — MLR Key Facts
  • Equation: Y = β₀ + β₁X₁ + … + βₙXₙ + ε — one intercept, one coefficient per predictor, one error term
  • Estimation method: Ordinary Least Squares (OLS) minimizes the sum of squared residuals
  • Outcome requirement: Y must be continuous. For binary outcomes, use logistic regression instead
  • Sample size rule of thumb: 10–20 observations per predictor variable to avoid overfitting
  • Multicollinearity check: VIF < 5 is acceptable; VIF > 10 is a serious problem requiring action
  • Model fit metrics: R-squared (variance explained), Adjusted R-squared (penalizes extra predictors), F-statistic (overall significance)

Key Terminology at a Glance

The terms below appear in every regression output, and confusing them is the single most common beginner mistake.

Term | Symbol | Plain Meaning
Dependent variable | Y | The outcome being predicted (price, score, risk score)
Independent variable | X₁, X₂, … | The predictors you feed into the model
Regression coefficient | β₁, β₂, … | Change in Y per 1-unit increase in that X, all else held constant
Intercept | β₀ | Predicted Y when all X values equal zero
Residual | ε | Observed Y minus predicted Y — the model's error for each data point
R-squared | R² | Fraction of Y's variance that the model explains (0 = none, 1 = perfect)

Multiple Linear Regression — Fitting a Plane Through Data

[Figure: a regression plane Ŷ = β₀ + β₁X₁ + β₂X₂ fitted through the data; OLS finds the line (or plane) that minimizes the total squared residuals, shown as vertical distances from each point to the plane.]

The Multiple Linear Regression Formula

The standard MLR equation has one term for each predictor, plus a constant and an error term:

Multiple Linear Regression — General Form
Y = β₀ + β₁X₁ + β₂X₂ + … + βₙXₙ + ε
OLS finds the β values that minimize Σ(Yᵢ − Ŷᵢ)²
  • Y = outcome variable (continuous)
  • β₀ = intercept (constant)
  • β₁…βₙ = partial regression coefficients
  • X₁…Xₙ = predictor variables
  • ε = error term (residual)

The word "partial" in "partial regression coefficient" matters. β₁ measures how Y changes with X₁ while holding X₂, X₃, and all other predictors fixed. Take a salary model with predictors experience (X₁) and education level (X₂). If β₁ = 3,200, each additional year of experience predicts a $3,200 salary increase among people who share the same education level. That is a fundamentally different number from the raw correlation between experience and salary.

Matrix Form: β = (XᵀX)⁻¹Xᵀy

For more than a few predictors, the math becomes unwieldy in scalar notation. In matrix form, the OLS solution compresses to one compact expression:

Matrix Form — OLS Normal Equations
β̂ = (XᵀX)⁻¹Xᵀy
Where X is the n × (p+1) design matrix and y is the n × 1 response vector
  • X = design matrix (n rows, p+1 columns, including a column of ones for β₀)
  • y = vector of observed outcomes
  • β̂ = vector of estimated coefficients
  • Xᵀ = transpose of X
  • (XᵀX)⁻¹ = inverse of XᵀX

This is exactly what Python, R, and every statistical package computes internally when you call a linear regression function. The matrix formulation is computationally efficient and forms the basis of weighted least squares, ridge regression, and all other OLS extensions. For the theory behind why this estimator is unbiased, see the sampling distributions guide.
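Here is a minimal NumPy sketch of that computation with made-up numbers — solving the normal equations directly, then confirming the result against np.linalg.lstsq, the numerically safer routine libraries actually prefer:

import numpy as np

# Toy data: 6 observations, 2 predictors (sqft, neighborhood score), price in $thousands
X_raw = np.array([[1200, 5], [1500, 7], [1800, 6],
                  [2100, 8], [2400, 4], [2700, 9]], dtype=float)
y = np.array([210.0, 265.0, 280.0, 340.0, 330.0, 410.0])

# Design matrix: prepend a column of ones so that β₀ gets estimated
X = np.column_stack([np.ones(len(y)), X_raw])

# Closed-form OLS: β̂ = (XᵀX)⁻¹ Xᵀ y
beta_hat = np.linalg.inv(X.T @ X) @ X.T @ y

# Same coefficients from the library least-squares solver
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

print(beta_hat)
print(np.allclose(beta_hat, beta_lstsq))   # True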

R-Squared and Adjusted R-Squared

Two numbers from every regression output require different interpretations.

Metric | Formula | What It Measures | Practical Range
R-squared (R²) | 1 − SSres/SStot | Proportion of variance in Y explained by the model | 0 to 1; higher = better fit (domain-dependent)
Adjusted R² | 1 − (1−R²)(n−1)/(n−p−1) | R² penalized for number of predictors | Lower than R²; only rises when a new variable truly helps
F-statistic | MSreg / MSres | Whether the full model beats a no-predictor baseline | p < 0.05 means at least one predictor is significant
Root MSE (RMSE) | √(SSres / (n−p−1)) | Average prediction error in Y's original units | Lower = better; same units as Y

Never report R² alone in multiple regression. Every predictor you add — even a random number — increases R² slightly, because adding noise still explains a tiny fraction of variance by chance. Adjusted R² penalizes for that inflation. If adding a variable lowers adjusted R², the variable is not earning its place in the model.
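As a quick sketch, the fit metrics above can be computed directly from observed values y and predictions y_hat (the function name and p argument are illustrative, not from any particular library):

import numpy as np

def fit_metrics(y, y_hat, p):
    """R², adjusted R², and root MSE for a model with p predictors (intercept not counted in p)."""
    y, y_hat = np.asarray(y, dtype=float), np.asarray(y_hat, dtype=float)
    n = len(y)
    ss_res = np.sum((y - y_hat) ** 2)          # residual sum of squares
    ss_tot = np.sum((y - y.mean()) ** 2)       # total sum of squares
    r2 = 1 - ss_res / ss_tot
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - p - 1)
    rmse = np.sqrt(ss_res / (n - p - 1))       # in Y's original units
    return r2, adj_r2, rmse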

Step-by-Step Guide: How to Run Multiple Linear Regression

These seven steps follow the actual sequence a working analyst uses — not the order a textbook lists topics. Skip step 3 and you risk building a model on violated assumptions. Skip step 6 and you may report R² from a model that would collapse on any new dataset.

Step 1: Define Your Research Question

State precisely what Y you want to predict (or explain) and which X variables you have theoretical or practical reasons to include. The predictors should have a plausible relationship to Y — regression can find spurious correlations in any large dataset. Write the question down in one sentence before touching the data.

Step 2: Collect and Prepare Your Data

Check for missing values and decide whether to impute or drop. Encode categorical variables as dummy variables (k−1 dummies for k categories — forgetting this creates the dummy variable trap). Check for obvious data entry errors. Aim for at least 10–20 observations per predictor; 200 observations with 5 predictors is solid.
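A minimal pandas sketch of k−1 dummy encoding (the column names here are hypothetical):

import pandas as pd

df = pd.DataFrame({
    "salary": [72, 95, 61, 88],
    "experience": [4, 9, 2, 7],
    "job_function": ["tech", "management", "sales", "tech"],   # k = 3 categories
})

# drop_first=True keeps k-1 dummies; "management" becomes the omitted reference category
encoded = pd.get_dummies(df, columns=["job_function"], drop_first=True, dtype=float)
print(encoded.columns.tolist())
# ['salary', 'experience', 'job_function_sales', 'job_function_tech']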

Step 3: Check Assumptions Before Fitting

Run scatter plots of each X against Y (linearity check). Compute pairwise correlations and VIF scores between predictors (multicollinearity check). Plot residuals against fitted values from a preliminary model (homoscedasticity and independence check). Violating assumptions here means the entire output needs qualifying or fixing.

Step 4: Fit the OLS Regression Model

In Python, use statsmodels.OLS for full statistical output or sklearn.LinearRegression for a quick prediction model. In R, lm(Y ~ X1 + X2 + X3, data = df) is the standard call. In Excel, use Data → Data Analysis → Regression from the Analysis ToolPak. The software solves the normal equations β̂ = (XᵀX)⁻¹Xᵀy automatically.

Step 5: Interpret the Output

Read each coefficient as: "for a one-unit increase in Xₖ, Y changes by βₖ, holding all other predictors constant." Check the p-value for each predictor (p < 0.05 = statistically significant at 95% confidence). Read the overall F-test to confirm the model beats a null (intercept-only) baseline. Check adjusted R² to see how much variance the predictors explain collectively.

Step 6: Validate With Residual Plots and Cross-Validation

Plot residuals vs. fitted values (should look like random scatter — any funnel shape indicates heteroscedasticity). Run a Q-Q plot on residuals (should follow a straight diagonal line if normally distributed). For prediction tasks, use k-fold cross-validation to estimate out-of-sample performance. Check Cook's Distance to flag observations that are disproportionately steering the coefficients.
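A hedged sketch of those checks with statsmodels and matplotlib, assuming `model` is a fitted OLS results object like the one built in the Python section later in this guide:

import numpy as np
import matplotlib.pyplot as plt
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_breuschpagan

resid = model.resid
fitted = model.fittedvalues

# Residuals vs. fitted: should look like random scatter with no funnel shape
plt.scatter(fitted, resid, alpha=0.5)
plt.axhline(0, color="red")
plt.xlabel("Fitted values"); plt.ylabel("Residuals")
plt.show()

# Q-Q plot: points should track the diagonal if residuals are roughly normal
sm.qqplot(resid, line="45", fit=True)
plt.show()

# Breusch-Pagan test: a small p-value suggests heteroscedasticity
bp_stat, bp_pvalue, _, _ = het_breuschpagan(resid, model.model.exog)
print(f"Breusch-Pagan p-value: {bp_pvalue:.4f}")

# Cook's Distance: flag observations above the common 4/n cutoff
cooks_d = model.get_influence().cooks_distance[0]
print("Influential rows:", np.where(cooks_d > 4 / len(cooks_d))[0])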

Step 7: Report Your Results

Present the regression equation with actual coefficient values and standard errors. Report standardized beta coefficients if comparing predictor importance across different measurement scales. Include R², adjusted R², the F-statistic, degrees of freedom, and p-value. Describe any assumption violations and how you handled them.

Four Real-World Examples of Multiple Linear Regression

These examples use concrete numbers throughout. The coefficients are illustrative, but their magnitudes are in line with published estimates in each domain.

Example 1 — Predicting House Sale Price

Worked Example 1 — Housing Market

Predictors: square footage (X₁), neighborhood quality score on a 10-point scale (X₂), property age in years (X₃). Outcome: sale price in $thousands. OLS fit on 500 residential transactions.

1. Fitted equation: Price = 42.3 + 0.12(sqft) + 18.7(neighborhood) − 0.9(age)

2. Read β₁ = 0.12: Each additional square foot adds $120 to predicted price, holding neighborhood quality and age constant. This is the pure size effect after controlling for location.

3. Read β₃ = −0.9: Each additional year of age reduces predicted price by $900, holding size and location constant. A 30-year-old house sells for roughly $27,000 less than an otherwise identical new construction.

4. Predict a specific house (1,800 sqft, neighborhood score 7, 15 years old): Ŷ = 42.3 + 0.12(1800) + 18.7(7) − 0.9(15) = 42.3 + 216 + 130.9 − 13.5 = 375.7, i.e. about $375,700

✓ R² = 0.81 — the three predictors together explain 81% of sale price variance. Adjusted R² = 0.79. F(3, 496) = 703, p < 0.001.
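The same prediction takes three lines in Python, using the coefficients above:

b0, b_sqft, b_neigh, b_age = 42.3, 0.12, 18.7, -0.9
price_thousands = b0 + b_sqft * 1800 + b_neigh * 7 + b_age * 15
print(f"{price_thousands:.1f}")   # 375.7, i.e. about $375,700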

Example 2 — Forecasting Employee Salary

Worked Example 2 — HR Analytics

Predictors: years of experience (X₁), education level as coded dummy (0 = bachelor's, 1 = master's/PhD) (X₂), job function encoded as two dummies (X₃, X₄). Outcome: annual salary in $thousands.

1. Fitted equation: Salary = 48.2 + 3.4(experience) + 9.1(grad_degree) + 6.2(tech_role) + 4.1(management_role)

2. Read β₁ = 3.4: Each additional year of experience predicts $3,400 more salary — among employees with the same education level and job function. This separates experience from the education premium.

3. Read β₂ = 9.1: Having a graduate degree predicts $9,100 more than a bachelor's degree, holding experience and role type constant. A recruiter can now separate the education premium from the raw career-length effect.

✓ This is exactly how compensation analytics teams detect pay equity gaps: if gender or race predicts salary after controlling for experience, education, and role — a coefficient that should equal zero does not.

Example 3 — Healthcare: Cardiovascular Risk Scoring

Worked Example 3 — Clinical Research

Predictors: age in years (X₁), systolic blood pressure in mmHg (X₂), LDL cholesterol in mg/dL (X₃), smoking status dummy (X₄). Outcome: 10-year cardiovascular risk score (percentage).

1. Fitted equation: Risk% = −12.4 + 0.31(age) + 0.08(SBP) + 0.04(LDL) + 7.2(smoker)

2. Practical reading: Smoking adds 7.2 percentage points of cardiovascular risk after controlling for age, blood pressure, and cholesterol. A physician can now present that isolated number to a patient — not a correlation muddied by the fact that smokers also tend to have higher blood pressure.

✓ This application — and variants of it — appears in the Framingham Heart Study, one of the longest-running cardiovascular research programs. The Framingham Risk Score is built on multivariable regression models estimated from 30+ years of follow-up data. Source: Wilson et al., Circulation, 1998.

Example 4 — Marketing Mix Modeling

Worked Example 4 — Marketing Analytics

Predictors: TV advertising spend ($000s, X₁), digital advertising spend ($000s, X₂), print advertising spend ($000s, X₃), seasonal index (1 for high season, 0 otherwise, X₄). Outcome: weekly revenue ($000s).

1. Fitted equation: Revenue = 180 + 2.1(TV) + 3.8(Digital) + 0.7(Print) + 42.3(Season)

2. Key finding: β for Digital (3.8) is almost twice β for TV (2.1). Each $1,000 of digital spend returns $3,800 in revenue vs. $2,100 for TV, holding other channels and season constant. The budget allocation decision practically makes itself from this output.

3. Print coefficient is low (0.7) and p = 0.18 — not statistically significant. That means print cannot be reliably distinguished from zero effect. It may stay in the model for theoretical reasons, but its coefficient is unreliable.

✓ Marketing mix modeling (MMM) like this was used by Nielsen and Analytic Partners before machine learning tools existed — and still runs in organizations where interpretability matters more than the last 0.3% of predictive accuracy.


The 5 Assumptions of Multiple Linear Regression

Regression assumptions are not fine print — they are the conditions under which the OLS estimator gives you the most reliable possible answer from your data. Each violation has a specific consequence and a specific fix.

📋
The Five Assumptions — One-Line Summary

1. Linearity — X and Y have a linear relationship. 2. Independence — observations don't affect each other. 3. Homoscedasticity — residuals have constant spread. 4. Normality — residuals are normally distributed. 5. No multicollinearity — predictors are not highly correlated with each other.

Assumption | What It Means | How to Test It | What To Do If Violated
1. Linearity | Each predictor has a linear (straight-line) relationship with Y | Scatter plots of each X vs Y; partial regression plots; component-plus-residual plots | Transform X or Y (log, square root, Box-Cox); add polynomial terms
2. Independence | Residuals are not correlated with each other across observations | Durbin-Watson test (value near 2 = no autocorrelation); plot residuals in observation order | Use time-series models (ARIMA); add lagged variables; use clustered standard errors
3. Homoscedasticity | Residual variance is constant across all predicted values | Residuals vs. fitted values plot (random scatter = good); Breusch-Pagan test | Log-transform Y; use Weighted Least Squares (WLS); use heteroscedasticity-robust standard errors
4. Normality of residuals | Residuals follow a normal distribution | Q-Q plot (points should fall on the diagonal line); Shapiro-Wilk test for small samples | Transform Y; remove outliers after investigation; for large samples, the CLT reduces this concern
5. No multicollinearity | Predictors are not highly correlated with each other | VIF for each predictor (VIF < 5 = OK; > 10 = severe); correlation matrix between predictors | Remove one correlated variable; combine correlated predictors into a composite; use ridge regression

Multicollinearity: The Most Commonly Overlooked Assumption

Violations of linearity, normality, and homoscedasticity show up visually in diagnostic plots — most analysts catch them. Multicollinearity hides. The model fits. R² looks fine. Individual coefficients, though, are inflated in standard error, sometimes flip sign, and become highly sensitive to small changes in the dataset. A model that produces a significant p-value for "experience" on one sample but a non-significant one on a very similar sample usually has a multicollinearity problem.

The Variance Inflation Factor (VIF) measures how much the variance of each coefficient is inflated by its correlation with the others. VIF = 1/(1 − R²ₖ), where R²ₖ is obtained by regressing predictor k on all other predictors. VIF values above 5 warrant attention; above 10 means the coefficient for that predictor should not be interpreted individually.
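The formula translates directly into code. This sketch computes a single VIF by hand (the helper name is illustrative), assuming X is a pandas DataFrame of predictors without a constant column; the statsmodels variance_inflation_factor helper used in the Python section below computes the same quantity:

import pandas as pd
import statsmodels.api as sm

def vif_by_hand(X: pd.DataFrame, column: str) -> float:
    """VIF_k = 1 / (1 - R²_k), where R²_k comes from regressing predictor k on all the others."""
    others = sm.add_constant(X.drop(columns=[column]))
    r2_k = sm.OLS(X[column], others).fit().rsquared
    return 1.0 / (1.0 - r2_k)

# Example usage on a hypothetical predictor DataFrame:
# vif_scores = {col: vif_by_hand(X, col) for col in X.columns}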

  • VIF < 5: acceptable range
  • VIF 5–10: moderate concern
  • VIF > 10: severe, act on it
  • Durbin-Watson ≈ 2: no autocorrelation

Interpreting Multiple Regression Output

Real regression output contains more numbers than most guides explain. Here is what each component tells you and when to worry.

📊
Reading a Full Regression Table

A typical output table shows: Coefficients (β values), Standard Errors (uncertainty in each β), t-statistics (coefficient/SE), p-values (significance of each predictor), 95% Confidence Intervals for each β, plus overall R², Adjusted R², the F-statistic, and the model p-value.

The F-Statistic and What "Overall Model Significance" Means

The F-test checks whether your model as a whole beats the null hypothesis that all coefficients equal zero. A significant F-test (p < 0.05) means at least one predictor is meaningfully related to Y — but it does not tell you which one. Individual t-tests on each coefficient answer the "which predictor" question. A common mistake is ignoring the F-test and reporting only individual p-values: if the overall F-test is not significant, treat any individually significant coefficient with suspicion, since with many predictors a few will cross p < 0.05 by chance alone.

Statistical Significance vs. Practical Importance

With 10,000 observations, a coefficient of 0.003 can easily reach p < 0.05. Whether a salary premium of $3 per additional year of experience is practically meaningful is a separate judgment from whether it is statistically distinguishable from zero. Use standardized beta coefficients (each raw β multiplied by the ratio of that predictor's standard deviation to Y's standard deviation) to compare the relative practical importance of predictors measured on different scales.
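A minimal sketch of that calculation, assuming a fitted statsmodels results object `model` and the original DataFrame `df` (the column names are the hypothetical housing ones used elsewhere in this guide):

# Standardized beta_k = raw beta_k × (SD of X_k / SD of Y)
predictors = ['sqft', 'neighborhood_score', 'property_age']
std_betas = {
    col: model.params[col] * df[col].std() / df['sale_price'].std()
    for col in predictors
}
print(std_betas)   # comparable across predictors regardless of measurement scale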

Multiple Linear Regression vs. Top Alternatives

The question isn't "is MLR good?" — it's "is MLR right for this problem?" Here are the situations where you want something else.

Method | Best For | Key Difference from MLR
Simple Linear Regression | One predictor, one outcome | No control for confounders — coefficients mix all effects into one number
Logistic Regression | Binary outcome (yes/no, pass/fail) | Models log-odds via the sigmoid function; predicted probabilities stay between 0 and 1
Polynomial Regression | Curved (non-linear) X–Y relationship | Adds X², X³ terms — still linear in the parameters; remains in the OLS framework
Ridge Regression | Multicollinearity; all predictors are relevant | Adds an L2 penalty (λΣβ²) to shrink coefficients; never sets them exactly to zero
LASSO Regression | High-dimensional data; sparse solution | Adds an L1 penalty (λΣ|β|); forces some coefficients to exactly zero — automatic variable selection
Elastic Net | Correlated predictors + many irrelevant ones | Combines L1 and L2 penalties; better than LASSO when predictors are grouped
Random Forest | Non-linear relationships; high predictive accuracy | Non-parametric; no linearity assumption; no coefficients to interpret directly
ANOVA | Comparing group means (categorical predictors only) | Special case of the general linear model — MLR with only dummy predictors reproduces ANOVA
⚠️
When Multiple Linear Regression Is the Wrong Tool

If your outcome is binary (disease yes/no), ordinal (1–5 rating), count (number of events), or time-to-event data — MLR will give biologically or statistically impossible predictions. Binary outcomes need logistic regression. Count outcomes need Poisson or negative binomial regression. Survival data needs Cox proportional hazards. The statistical test selector can help you choose.

Running Multiple Linear Regression: Python, R, and Excel

All three environments solve the same OLS normal equations. They differ in how they handle missing values by default and in how much statistical output they report out of the box.

Python: statsmodels (Full Statistical Output)

Use statsmodels when you need p-values, confidence intervals, and the full F-test. This is the right choice for any academic or research context.

import statsmodels.api as sm
import pandas as pd
from statsmodels.stats.outliers_influence import variance_inflation_factor

# Load data (example: house price dataset)
df = pd.read_csv("houses.csv")

# Define predictors and outcome
X = df[['sqft', 'neighborhood_score', 'property_age']]
y = df['sale_price']

# Add a constant column for the intercept β₀
X = sm.add_constant(X)

# Fit OLS model
model = sm.OLS(y, X).fit()

# Full output: coefficients, SE, t-stats, p-values, R², F-stat
print(model.summary())

# VIF for multicollinearity check
vif_data = pd.DataFrame()
vif_data["Feature"] = X.columns
vif_data["VIF"] = [variance_inflation_factor(X.values, i) for i in range(X.shape[1])]
print(vif_data)

Python: scikit-learn (Prediction Focus)

Use sklearn when prediction accuracy matters more than p-values, or when you're embedding the model in a machine learning pipeline.

from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.metrics import r2_score, mean_squared_error
import numpy as np

# Predictors without a constant column — sklearn adds the intercept automatically
X = df[['sqft', 'neighborhood_score', 'property_age']]
y = df['sale_price']

# Train / test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Fit model
model = LinearRegression()
model.fit(X_train, y_train)

print("Intercept:", model.intercept_)
print("Coefficients:", model.coef_)   # [β₁, β₂, β₃]

# Evaluate on the held-out test set
y_pred = model.predict(X_test)
print(f"Test R²: {r2_score(y_test, y_pred):.4f}")
print(f"RMSE: {np.sqrt(mean_squared_error(y_test, y_pred)):.2f}")

# 5-fold cross-validation R²
cv_scores = cross_val_score(model, X, y, cv=5, scoring='r2')
print(f"CV R² mean: {cv_scores.mean():.4f} ± {cv_scores.std():.4f}")

R: lm() Function

R's built-in lm() function gives comprehensive output in one call. The summary() method adds p-values, F-statistics, and R-squared.

# Load data
df <- read.csv("houses.csv")

# Fit OLS regression
model <- lm(sale_price ~ sqft + neighborhood_score + property_age, data = df)

# Full output with coefficients, SE, t-stats, p-values, R², F-stat
summary(model)

# Check VIF for multicollinearity (requires the 'car' package)
library(car)
vif(model)

# Diagnostic plots — 4 plots in one window
par(mfrow = c(2, 2))
plot(model)

# Predict for new data
new_house <- data.frame(sqft = 1800, neighborhood_score = 7, property_age = 15)
predict(model, newdata = new_house, interval = "confidence")

Excel: Data Analysis ToolPak

For a quick analysis without code: Data tab → Data Analysis → Regression. Select your Y range, then your X range (multiple columns work). Check "Labels" if your first row has headers. Tick "Residuals" to get a residual plot. The output pastes to a new sheet with the full table of coefficients, R-squared, and F-statistic.

💡
Excel Limitation to Know

Excel's regression tool does not compute VIF or Cook's Distance. For diagnostic checks beyond the basic output, use Python (statsmodels) or R. Excel also lacks cross-validation — suitable for one-off analyses, not for models going into production.

7 Common Multiple Regression Mistakes

These are the errors that appear most often in published papers, student projects, and professional analyses. Each one has a tell-tale symptom.

# | Mistake | Symptom | Fix
1 | Including two highly correlated predictors (VIF > 10) | High R² but individual predictors show huge SEs and non-significant p-values; coefficients flip sign between similar datasets | Remove one correlated variable, create a composite, or switch to ridge regression
2 | Skipping assumption checks entirely | Funnel-shaped residual plot (heteroscedasticity) or S-curved residual vs. fitted plot (non-linearity) — both invalidate standard errors | Always plot residuals vs. fitted values before reporting any results
3 | Treating R-squared as the sole measure of model quality | A model with 20 random predictors shows R² = 0.85; a model with 3 genuine predictors shows R² = 0.70 but generalizes far better | Report adjusted R², RMSE, and cross-validation R² alongside raw R²
4 | Forgetting dummy variable encoding for categorical predictors | Regression treats "category A = 1, B = 2, C = 3" as an ordinal numeric variable, implying equal distances between categories | Create k−1 dummy variables; never feed raw category codes as continuous predictors
5 | Overfitting: too many predictors relative to sample size | Training R² = 0.92; test/validation R² = 0.51 — the model memorized noise | Use the 10–20 observations-per-predictor guideline; validate with holdout data or k-fold CV
6 | Applying MLR to a binary or categorical outcome | Predicted values fall below 0 or above 1 for binary Y; the normality assumption is structurally violated | Use logistic regression for binary outcomes; ordinal logistic for ordinal; Poisson for counts
7 | Confusing statistical significance with practical importance | In a dataset of n = 100,000, a $2 salary difference is "significant" at p < 0.0001 but irrelevant to compensation decisions | Report effect sizes, standardized coefficients, and confidence intervals — not just p-values

When to Use Multiple Linear Regression — and When to Step Away

Knowing when a method is the wrong tool is as valuable as knowing how to use it correctly.

Good candidate for MLR

Continuous outcome (salary, price, score). Predictors have theoretical justification. Sample size ≥ 100 with ≤ 10 predictors. Interpretability of coefficients matters to your audience. Relationships are plausibly linear.

⚠️ MLR with caution

Mild non-linearity (transform variables first). Some multicollinearity (use ridge). Time-series data (add autocorrelation corrections). Small sample (< 50) with few predictors — results may not replicate.

Wrong tool — use something else

Binary outcome → logistic regression. Count outcome → Poisson. Ordinal outcome → ordinal logistic. Severe non-linearity → tree methods. Clustered or repeated-measures data → mixed effects models.

A note on prediction vs. explanation. When your goal is to explain or quantify relationships, coefficient interpretability matters and MLR is excellent. When your goal is purely predictive accuracy, a gradient-boosted tree or neural network will often outperform MLR, especially when the relationships are non-linear or involve interactions. Make that choice explicitly and deliberately, not by default.

Frequently Asked Questions About Multiple Linear Regression

Q: What is multiple linear regression in simple terms?
Multiple linear regression predicts one continuous outcome using two or more input variables simultaneously. The equation Y = β₀ + β₁X₁ + β₂X₂ + ε assigns a coefficient to each predictor measuring its unique effect on Y after statistically holding the other predictors constant. It extends simple regression to handle real-world problems where outcomes are shaped by multiple factors at once.

Q: What is the multiple linear regression formula?
Y = β₀ + β₁X₁ + β₂X₂ + … + βₙXₙ + ε. Y is the outcome variable, β₀ is the intercept, β₁ through βₙ are regression coefficients, X₁ through Xₙ are predictor variables, and ε represents unexplained variation. In matrix notation: β̂ = (XᵀX)⁻¹Xᵀy — the closed-form ordinary least squares solution.

Q: How does multiple linear regression differ from simple linear regression?
Simple linear regression uses one predictor variable, while multiple linear regression uses two or more predictors simultaneously. The major advantage of multiple regression is that it estimates the unique effect of each predictor after statistically controlling for all other predictors in the model.

Q: What are the assumptions of multiple linear regression?
Multiple linear regression assumes: (1) linear relationships between predictors and outcome, (2) independence of observations, (3) homoscedasticity or constant residual variance, (4) normally distributed residuals, and (5) low multicollinearity among predictors. Violating these assumptions can bias coefficients and significance tests.

Q: What does R-squared mean in multiple regression?
R-squared measures the proportion of variation in the outcome variable explained by the predictors together. An R² of 0.78 means the model explains 78% of the variance in Y. Adjusted R² is preferred in multiple regression because it penalizes unnecessary predictors.

Q: What is multicollinearity and why does it matter?
Multicollinearity occurs when predictor variables are highly correlated with one another. This makes coefficient estimates unstable and inflates standard errors. Variance Inflation Factor (VIF) is commonly used to detect it, with values above 10 usually considered problematic.

Q: When should I use logistic regression instead?
Logistic regression should be used when the outcome variable is binary, such as yes/no, success/failure, or disease/no disease. Multiple linear regression is designed for continuous outcomes and can produce invalid predictions outside the 0–1 probability range.

Q: How many predictors can I include in a multiple regression model?
There is no strict mathematical limit, but too many predictors relative to sample size increases overfitting risk. A common guideline is at least 10–20 observations per predictor variable to maintain stable estimates and reliable generalization.

Q: What is the difference between ridge and LASSO regression?
Ridge regression uses an L2 penalty that shrinks coefficients toward zero but keeps all predictors in the model. LASSO uses an L1 penalty that can reduce some coefficients exactly to zero, effectively performing variable selection automatically.

Q: Is multiple linear regression a machine learning method?
Multiple linear regression is one of the foundational supervised learning algorithms in machine learning. It is commonly used for prediction, feature importance analysis, baseline model comparison, and interpretable decision systems in finance, healthcare, and business analytics.

Multiple Linear Regression: Quick Reference Cheat Sheet

Everything in this guide compressed into one scannable table — optimized for quick review before an exam, analysis, or interview.

Concept | Formula / Value | When It Applies | Plain Interpretation
MLR Equation | Y = β₀ + β₁X₁ + … + βₙXₙ + ε | Any continuous outcome with ≥ 2 predictors | Each β is a partial slope — Y's change per unit X, others fixed
OLS Solution (Matrix) | β̂ = (XᵀX)⁻¹Xᵀy | Estimating coefficients from data | Minimizes total squared prediction error across all observations
R-squared | 1 − SSres/SStot | Measuring model fit | Fraction of Y's variance the model explains
Adjusted R² | 1 − (1−R²)(n−1)/(n−p−1) | Comparing models with different numbers of predictors | R² penalized for each added predictor — preferred over raw R²
F-statistic | MSreg / MSres | Testing overall model significance | p < 0.05 means ≥ 1 predictor is significantly related to Y
VIF | 1 / (1 − R²ₖ) | Checking multicollinearity | VIF < 5 = acceptable; > 10 = serious problem requiring action
Durbin-Watson | Range 0–4; near 2 = OK | Testing independence of residuals | Values near 0 or 4 signal autocorrelation in residuals
Cook's Distance | Cᵢ > 4/n → investigate | Identifying influential observations | Points steering coefficients disproportionately — check for errors
Dummy variable encoding | k categories → k−1 dummies | Categorical predictors in regression | Reference group is the omitted category; avoid the dummy variable trap
Sample size guideline | n ≥ 10–20 per predictor | Determining minimum data needed | Too few observations per predictor = overfitting and unreliable results
Ridge vs LASSO | Ridge: λΣβ²; LASSO: λΣ|β| | Regularization against overfitting | Ridge shrinks all β; LASSO can zero some out (variable selection)
Gauss-Markov (BLUE) | Conditions: linearity, independence, homoscedasticity | Theoretical justification for OLS | OLS is the Best Linear Unbiased Estimator when the assumptions hold
