What Is Simple Linear Regression?
Simple linear regression models the straight-line relationship between one predictor variable (X) and one outcome variable (Y).
The word "simple" here means one predictor, not that the method is trivial. This distinguishes it from multiple linear regression, which handles two or more predictors. The regression line is the single straight line that sits as close as possible to all observed data points simultaneously — and "closeness" is defined by squared vertical distances.
Regression was introduced by Francis Galton in the late 19th century studying the inheritance of height — he noticed that tall parents tend to have children who are tall but closer to average, a phenomenon he called "regression to the mean." The statistical machinery that makes prediction precise came later through the work of Karl Pearson and Ronald Fisher. For related foundations, the descriptive statistics guide and the statistics and probability guide at Statistics Fundamentals cover the mean, variance, and probability concepts that underpin everything here.
- Equation: Y = β₀ + β₁X + ε — where β₀ is the intercept, β₁ is the slope, ε is the error term
- Goal: Minimize SSE = Σ(yᵢ − ŷᵢ)² — the sum of squared residuals
- Slope formula: β₁ = Σ(xᵢ − x̄)(yᵢ − ȳ) / Σ(xᵢ − x̄)²
- Intercept formula: β₀ = ȳ − β₁x̄ (the line always passes through (x̄, ȳ))
- Model fit: R² = SSR / SST = 1 − SSE/SST — proportion of variance explained
- Assumptions: Use the LINE Check — Linearity, Independence, Normality, Equal variance
The Regression Equation: Y = β₀ + β₁X + ε
Every simple linear regression model has the same structure. Breaking it down term by term removes the mystery.
Y = outcome variable (dependent)
X = predictor variable (independent)
β₀ = intercept (Y when X = 0)
β₁ = slope (change in Y per unit X)
ε = error term (unexplained variation)
What Is β₁? Interpreting the Slope Coefficient
β₁ is the average change in Y for each one-unit increase in X. (With a single predictor there is nothing else to hold constant; the "holding other variables constant" caveat belongs to multiple regression.)
If β₁ = 3.2 in a model predicting exam score (Y) from study hours (X), that means each additional hour of study per week is associated with an average 3.2-point increase in exam score. The coefficient is expressed in the units of Y per unit of X — so if Y is in dollars and X is in years, β₁ is in dollars per year.
The sign of β₁ tells you direction: positive means X and Y move together (more X → more Y); negative means they move in opposite directions. A slope of exactly zero means X provides no linear information about Y.
What Is β₀? Interpreting the Intercept
β₀ is the predicted value of Y when X equals zero.
Sometimes this is meaningful — if X is age (years) and Y is blood pressure, β₀ would be predicted blood pressure at birth. Often the intercept has no practical interpretation because X = 0 is outside the observed data range or physically impossible. In those cases, β₀ is a mathematical anchor that positions the line correctly, not a value you should report or interpret on its own.
What Is ε? The Error Term Explained
ε captures all variation in Y that X does not explain — noise, unmeasured variables, and genuine randomness.
In practice, you never observe ε directly. What you observe are residuals — the differences between actual Y values and the fitted line's predictions: eᵢ = yᵢ − ŷᵢ. Examining the pattern of residuals is the primary way to check whether your regression model is valid. Residuals should look random; any systematic pattern signals a problem with the model.
Ordinary Least Squares (OLS): How the Line Is Fitted
OLS finds the slope and intercept that minimize the Sum of Squared Errors (SSE) — the total squared distance between observed Y values and the fitted line.
Why squared? Squaring prevents positive and negative errors from canceling out, and it penalizes large errors more than small ones — which makes outliers influential. The minimization problem has a closed-form solution, meaning you do not need an iterative algorithm. The formulas below give the exact answers in one calculation.
β̂₁ = Σ(xᵢ − x̄)(yᵢ − ȳ) / Σ(xᵢ − x̄)²
β̂₀ = ȳ − β̂₁x̄
where:
x̄, ȳ = sample means of X and Y
Σ = sum over all n observations
β̂₁ = estimated slope
β̂₀ = estimated intercept
Under the four LINE assumptions, OLS is the Best Linear Unbiased Estimator (BLUE) — meaning no other linear unbiased estimator produces lower-variance estimates of β₀ and β₁. This theorem, proved by Carl Friedrich Gauss and Andrey Markov, is the theoretical foundation that justifies using OLS for inference, not just description. See the Gauss-Markov theorem for the formal proof.
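To make the closed form concrete, here is a minimal NumPy sketch of the two estimators (the function and variable names are ours, not from any particular library):

```python
import numpy as np

def ols_fit(x, y):
    """Closed-form OLS for one predictor. Returns (beta0_hat, beta1_hat)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    beta1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
    beta0 = y.mean() - beta1 * x.mean()
    return beta0, beta1
```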
Worked Example: Calculating Slope, Intercept, and R² by Hand
The best way to understand what OLS is doing is to run the numbers once manually. The dataset below tracks weekly study hours (X) and exam scores (Y) for five students — small enough to follow step by step.
| Student | Hours / week (X) | Exam Score (Y) | xᵢ − x̄ | yᵢ − ȳ | (xᵢ−x̄)(yᵢ−ȳ) | (xᵢ−x̄)² |
|---|---|---|---|---|---|---|
| A | 2 | 50 | −4 | −22 | 88 | 16 |
| B | 4 | 60 | −2 | −12 | 24 | 4 |
| C | 6 | 72 | 0 | 0 | 0 | 0 |
| D | 8 | 82 | 2 | 10 | 20 | 4 |
| E | 10 | 96 | 4 | 24 | 96 | 16 |
| Mean | x̄ = 6 | ȳ = 72 | — | — | Σ = 228 | Σ = 40 |
Find the regression equation and R² for the study hours vs. exam score data above.
Calculate β̂₁ (slope): β̂₁ = Σ(xᵢ − x̄)(yᵢ − ȳ) / Σ(xᵢ − x̄)² = 228 / 40 = 5.7
Calculate β̂₀ (intercept): β̂₀ = ȳ − β̂₁ · x̄ = 72 − (5.7 × 6) = 72 − 34.2 = 37.8
Write the equation: Ŷ = 37.8 + 5.7X — so a student studying 7 hours would be predicted to score 37.8 + (5.7 × 7) = 77.7 points
Calculate SST: SST = Σ(yᵢ − ȳ)² = (−22)² + (−12)² + 0² + 10² + 24² = 484 + 144 + 0 + 100 + 576 = 1,304
Calculate SSE: Compute each ŷᵢ = 37.8 + 5.7xᵢ; the residuals are 0.8, −0.6, 0, −1.4, and 1.2, so SSE = Σ(yᵢ − ŷᵢ)² = 0.64 + 0.36 + 0 + 1.96 + 1.44 = 4.4 (a nearly perfect fit — the data was designed to be linear)
Calculate R²: R² = 1 − SSE/SST = 1 − 4.4/1304 ≈ 0.9966 — the model explains 99.7% of the variance in exam scores
✓ Regression equation: Ŷ = 37.8 + 5.7X. Interpretation: each additional hour of study per week is associated with an average 5.7-point increase in exam score. R² ≈ 0.997 indicates the line fits nearly perfectly.
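If you want to verify the arithmetic, here is a short NumPy cross-check (np.polyfit with degree 1 performs the same least-squares fit):

```python
import numpy as np

x = np.array([2, 4, 6, 8, 10])       # weekly study hours
y = np.array([50, 60, 72, 82, 96])   # exam scores

slope, intercept = np.polyfit(x, y, deg=1)   # least-squares fit, degree 1
y_hat = intercept + slope * x

sse = np.sum((y - y_hat) ** 2)          # 4.4
sst = np.sum((y - y.mean()) ** 2)       # 1304
ssr = np.sum((y_hat - y.mean()) ** 2)   # 1299.6, and SST = SSR + SSE

print(intercept, slope)   # ≈ 37.8, 5.7
print(1 - sse / sst)      # R² ≈ 0.9966
```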
How to Interpret R-Squared (R²)
R-squared measures the proportion of variance in Y that the model explains — it ranges from 0 to 1, where 1 means a perfect fit.
The three variance components that define R² are worth knowing by name:
SST = Σ(yᵢ − ȳ)² — Total Sum of Squares (total variation in Y)
SSR = Σ(ŷᵢ − ȳ)² — Regression Sum of Squares (variation explained by X)
SSE = Σ(yᵢ − ŷᵢ)² — Error Sum of Squares (unexplained residual variation)
SST is fixed — it is the total variation in your outcome data regardless of any model. OLS partitions SST into the part the model explains (SSR) and the part it doesn't (SSE). R² is simply SSR's share of SST.
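For readers who want the algebra behind the partition, here is the expansion in standard notation (a sketch; the cross-term argument is the key step):

```latex
\mathrm{SST} = \sum_i (y_i - \bar{y})^2
  = \sum_i \bigl[(y_i - \hat{y}_i) + (\hat{y}_i - \bar{y})\bigr]^2
  = \underbrace{\sum_i (y_i - \hat{y}_i)^2}_{\mathrm{SSE}}
  + \underbrace{\sum_i (\hat{y}_i - \bar{y})^2}_{\mathrm{SSR}}
  + 2\sum_i (y_i - \hat{y}_i)(\hat{y}_i - \bar{y})
```

The cross-term is zero because OLS residuals sum to zero and are uncorrelated with the fitted values (both facts follow from the normal equations), which leaves SST = SSR + SSE.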
A dataset can violate the assumptions and still post a high R². Anscombe's Quartet — four datasets with identical R², slope, and intercept but wildly different shapes — demonstrates that summary statistics alone cannot certify a model. Always plot residuals. R² measures fit, not correctness. A high R² with an invalid model produces unreliable predictions. For the relationship between R² and the Pearson correlation, see the scatter plots and correlation guide.
Pearson r vs. R²: The Exact Relationship
In simple linear regression (one predictor only), R² equals the square of the Pearson correlation coefficient: R² = r². This means r = √R² with the sign of β₁. If R² = 0.64, then |r| = 0.80. This relationship does not hold in multiple regression, where R² and the individual pairwise correlations diverge.
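A quick numerical check of the identity on the worked-example data (a sketch using NumPy only):

```python
import numpy as np

x = np.array([2, 4, 6, 8, 10])
y = np.array([50, 60, 72, 82, 96])

r = np.corrcoef(x, y)[0, 1]                  # Pearson r
slope, intercept = np.polyfit(x, y, deg=1)
y_hat = intercept + slope * x
r2 = 1 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)

print(round(r ** 2, 4), round(r2, 4))        # identical: r² = R²
```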
The LINE Check: 4 Regression Assumptions and How to Test Them
The four assumptions of simple linear regression can be remembered with the LINE Check: Linearity, Independence of errors, Normality of residuals, and Equal variance (Homoscedasticity).
The LINE Check is a complete diagnostic protocol, not just a memory aid. Each letter maps to a specific assumption, a specific diagnostic test, and a specific fix when the test fails. Run all four checks before trusting any regression output.
Linearity — The relationship between X and Y must be approximately straight
Plot Y versus X before fitting the model. A curved pattern means linear regression will produce biased coefficients — the estimated β₁ will be wrong, not just imprecise. This is the most fundamental check; if linearity fails, no amount of fixing the other assumptions saves the model.
Diagnostic: Ramsey RESET test | Plot: Y vs X scatter, residuals vs fitted | Fix: Log-transform X or Y, add a polynomial term (X²), or use nonlinear regression
Independence — Residuals must not be correlated with each other
Independence is violated when observations are collected over time (autocorrelation) or clustered in groups. When residuals are correlated, standard errors are underestimated and p-values become unreliable. This assumption matters most for time-series or longitudinal data; for cross-sectional data from simple random samples it is usually less of a concern.
Diagnostic: Durbin-Watson statistic (target: 1.5–2.5) | Plot: residuals in collection order | Fix: Add a lagged residual term, use Generalized Least Squares (GLS), or use cluster-robust standard errors
Normality — Residuals (not Y itself) should be approximately normally distributed
A common error is checking whether Y is normal. That is the wrong variable. The normality assumption applies to ε — the residuals. This assumption matters most for small samples (n < 30) where inference relies on the normal distribution. For large samples, the Central Limit Theorem makes this assumption less critical because the sampling distribution of β̂₁ converges to normal regardless.
Diagnostic: Shapiro-Wilk test | Plot: Q-Q plot of residuals | Fix: Increase sample size, apply robust regression, or transform Y (log, square root)
Equal Variance (Homoscedasticity) — Residual variance must be constant across all X values
Heteroscedasticity produces a fan-shaped residual plot — residuals spread wider as X increases. This is the most common violation in practice, especially in salary, income, or financial data where variability grows with the level. When present, OLS estimates are still unbiased but no longer efficient — standard errors are wrong, so hypothesis tests and confidence intervals cannot be trusted.
Diagnostic: Breusch-Pagan test | Plot: residuals vs fitted values (fan = fail) | Fix: Log-transform Y, use Weighted Least Squares (WLS), or apply heteroscedasticity-robust (HC) standard errors
Violating Linearity → biased coefficients. Violating Independence → wrong standard errors (usually too small). Violating Normality → unreliable p-values (mainly in small samples). Violating Equal Variance → inefficient estimates with incorrect standard errors. Each violation has a distinct consequence and a different fix.
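Three of the four formal tests above are available directly in Python; linearity is usually judged from a residuals-vs-fitted plot. A sketch, assuming statsmodels and SciPy are installed; the simulated dataset is illustrative and satisfies all four assumptions by construction:

```python
import numpy as np
import statsmodels.api as sm
from scipy import stats
from statsmodels.stats.stattools import durbin_watson
from statsmodels.stats.diagnostic import het_breuschpagan

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 200)
y = 3 + 2 * x + rng.normal(0, 1, 200)   # simulated data built to satisfy LINE

X = sm.add_constant(x)                  # adds the intercept column
resid = sm.OLS(y, X).fit().resid

print("Durbin-Watson:", durbin_watson(resid))          # ≈ 2 means no autocorrelation
print("Shapiro-Wilk p:", stats.shapiro(resid).pvalue)  # > .05 means normality plausible
print("Breusch-Pagan p:", het_breuschpagan(resid, X)[1])  # > .05 means equal variance plausible
```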
What Is the Difference Between Correlation and Simple Linear Regression?
Correlation measures the strength and direction of a linear relationship; regression estimates a prediction equation with specific coefficients in real-world units.
| Feature | Pearson Correlation (r) | Simple Linear Regression |
|---|---|---|
| What it produces | A single number (r) from −1 to +1 | An equation: Ŷ = β₀ + β₁X |
| Direction of relationship | Symmetric: r(X,Y) = r(Y,X) | Asymmetric: X predicts Y, not the reverse |
| Units | Unitless (standardized) | β₁ in units of Y per unit of X |
| Prediction | No — measures association only | Yes — gives ŷ for any new X value |
| Inference | Tests H₀: ρ = 0 | Tests H₀: β₁ = 0 (and provides CI for β₁) |
| Connection | In simple regression: R² = r² | The t-test for β₁ is equivalent to the t-test for r |
The key practical difference: if a hiring manager wants to know "what salary should we offer someone with 8 years of experience?", correlation cannot answer that question. Regression can, by plugging X = 8 into the fitted equation. Correlation only tells you that experience and salary are positively related — it gives no equation for translation.
Simple Linear Regression in Python, R, and Excel
All three tools produce the same coefficients. The differences are in how they display output and what additional diagnostics they expose automatically.
Python: sklearn and statsmodels Side by Side
Two packages dominate Python regression: sklearn for machine learning pipelines (minimal output) and statsmodels for statistical inference (full output with standard errors, p-values, and confidence intervals).
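A minimal side-by-side sketch on the worked-example data (illustrative, not a production pipeline):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
import statsmodels.api as sm

X = np.array([2, 4, 6, 8, 10]).reshape(-1, 1)   # sklearn expects a 2-D X
y = np.array([50, 60, 72, 82, 96])

# sklearn: coefficients and R², nothing else
skl = LinearRegression().fit(X, y)
print(skl.intercept_, skl.coef_[0], skl.score(X, y))   # ≈ 37.8, 5.7, 0.997

# statsmodels: the full inferential table (SEs, t, p, 95% CIs, F-statistic)
res = sm.OLS(y, sm.add_constant(X)).fit()
print(res.summary())
```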
R: lm() Output Decoded
R's lm() function runs OLS and returns a model object. The summary() call shows the full table including standard errors, t-statistics, p-values, and the F-statistic for overall model significance.
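If you work in Python but need to read or reproduce lm() output, statsmodels' formula interface accepts the same formula syntax. A sketch, where the data frame and column names are ours:

```python
import pandas as pd
import statsmodels.formula.api as smf

df = pd.DataFrame({"hours": [2, 4, 6, 8, 10],
                   "score": [50, 60, 72, 82, 96]})

# "score ~ hours" is the same formula you would write in R: lm(score ~ hours, data = df)
res = smf.ols("score ~ hours", data=df).fit()
print(res.summary())   # analogous to summary(lm(...)): coefficients, SEs, t, p, R², F
```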
Excel: Data Analysis ToolPak Walkthrough
Enable the ToolPak once via File → Options → Add-ins → Manage: Excel Add-ins → Analysis ToolPak, then run Data → Data Analysis → Regression. Set Input Y Range to the outcome column and Input X Range to the predictor column. The output reports the intercept and slope coefficients with standard errors, t statistics, p-values, and R Square, the same numbers Python and R produce.
Where Simple Linear Regression Is Used in Practice
Simple linear regression is not an academic exercise. It produces decision-relevant numbers across industries, and understanding how to read those numbers is a transferable skill.
Salary Benchmarking
HR teams fit experience (X) vs. salary (Y) to set offers at specific tenure levels. The slope gives average pay increase per year of experience.
Real Estate Pricing
Property valuation models start with simple regression — square footage predicting sale price — before adding room count, location, and other variables in multiple regression.
Sales Forecasting
Advertising spend (X) vs. revenue (Y) lets marketing teams predict returns on budget increases. The slope is the marginal return on each advertising dollar.
Clinical Dosing
Drug concentration in blood (Y) predicted from dosage (X) establishes linear pharmacokinetic models. Research published by the U.S. Food and Drug Administration describes regression-based dose-response analysis used in drug approval.
Climate Science
CO₂ concentration (X) vs. global temperature anomaly (Y). Simple linear regression was used in early climate modeling; more complex forms are used today but the interpretation of slope coefficients remains the same.
Education Research
Class size (X) predicting test scores (Y). A classic study from Harvard's Center for Education Policy Research used regression to estimate the effect of class size reduction on student achievement.
What a Regression Line Looks Like on a Scatter Plot
The scatter plot is the first tool in any regression analysis — before any coefficient is calculated. It answers two prior questions: Is the relationship approximately linear? Are there obvious outliers that might distort the fit?
Regression Line Anatomy — Residuals, Fitted Values, and the Line of Best Fit
The regression line always passes through (x̄, ȳ) — the point of sample means. The vertical distance from each observed point to the line is that point's residual (eᵢ = yᵢ − ŷᵢ), and OLS chooses the line that minimizes the sum of squared residual lengths.
Simple vs. Multiple Linear Regression: When to Use Each
Simple linear regression uses one predictor; multiple linear regression uses two or more — with each coefficient representing an effect while holding all other predictors constant.
The decision is not about preference but about your research question. If you want to know how study hours alone predict exam score, simple regression is appropriate. If you suspect that sleep, prior GPA, and study hours all matter and you want to isolate each effect, multiple regression is necessary — and omitting important predictors from a simple model produces biased coefficients (omitted variable bias).
If X and an omitted variable Z are both correlated with Y, the simple regression coefficient β̂₁ absorbs Z's effect — producing a number that does not reflect X's true causal impact. This is omitted variable bias, and it is the main reason simple regression is preliminary analysis, not a final model, in applied research. See the hypothesis testing guide for how p-values and t-tests apply to each coefficient.
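A small simulation makes the bias visible. The coefficients and sample size below are arbitrary choices for illustration:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(42)
n = 10_000
z = rng.normal(size=n)                        # the omitted variable
x = 0.8 * z + rng.normal(size=n)              # X is correlated with Z
y = 1.0 * x + 2.0 * z + rng.normal(size=n)    # true coefficient on X is 1.0

simple = sm.OLS(y, sm.add_constant(x)).fit()                          # omits Z
multiple = sm.OLS(y, sm.add_constant(np.column_stack([x, z]))).fit()  # controls for Z

print(simple.params[1])     # ≈ 1.98: absorbs Z's effect (biased)
print(multiple.params[1])   # ≈ 1.0: controlling for Z recovers the truth
```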
Frequently Asked Questions
What is simple linear regression?
Simple linear regression is a statistical method that models the straight-line relationship between one predictor variable (X) and one outcome variable (Y) using the equation Y = β₀ + β₁X + ε. It fits a line through data by minimizing the Sum of Squared Errors (SSE) via Ordinary Least Squares (OLS). The slope β₁ shows how much Y changes per one-unit increase in X; the intercept β₀ is the predicted Y when X equals zero.
What is the difference between correlation and simple linear regression?
Correlation (Pearson r) measures the strength and direction of a linear relationship — it is symmetric and unitless. Simple linear regression goes further: it estimates a prediction equation (Ŷ = β₀ + β₁X) with coefficients in real units, allows you to predict Y for new X values, and quantifies uncertainty through confidence intervals. In simple regression, R² = r², but regression provides the equation that correlation cannot.
What are the four assumptions of simple linear regression?
The four assumptions are captured by the LINE Check: Linearity (X and Y have a straight-line relationship — test with RESET), Independence of errors (residuals not autocorrelated — test with Durbin-Watson), Normality of residuals (errors approximately normal — test with Shapiro-Wilk + Q-Q plot), and Equal variance / Homoscedasticity (constant residual spread — test with Breusch-Pagan). Each violation has a distinct consequence and fix.
What does R-squared mean?
R-squared (R²) is the proportion of variance in Y that the regression model explains. An R² of 0.78 means the model accounts for 78% of the variation in the outcome. R² ranges from 0 (model explains nothing) to 1 (perfect fit). A high R² does not confirm causation or validate model assumptions — always inspect residual plots alongside R².
How do you interpret the slope coefficient β₁?
β₁ is the average change in Y for each one-unit increase in X. If β₁ = 5.7 with Y in points and X in hours, then each additional hour of study per week is associated with a 5.7-point increase in exam score. The sign gives direction (positive = X and Y increase together) and the magnitude gives the rate of change in Y-units per X-unit.
What is Ordinary Least Squares (OLS)?
Ordinary Least Squares (OLS) is the algorithm that finds the regression line by minimizing the Sum of Squared Errors (SSE) = Σ(yᵢ − ŷᵢ)². The closed-form solution gives: β̂₁ = Σ(xᵢ−x̄)(yᵢ−ȳ) / Σ(xᵢ−x̄)² and β̂₀ = ȳ − β̂₁x̄. The Gauss-Markov Theorem proves that under the LINE assumptions, OLS is the Best Linear Unbiased Estimator (BLUE).
When should you use multiple regression instead of simple regression?
Use simple linear regression when you have one predictor and want to understand its isolated relationship with Y. Use multiple linear regression when two or more variables affect Y and you need to estimate each effect while controlling for the others. If an important predictor is omitted from a simple model, the coefficient you get is biased — it absorbs the omitted variable's effect.
Can simple linear regression predict values for new data?
Yes — once you have the equation Ŷ = β̂₀ + β̂₁X, you can plug in any new X value to get a predicted Y. However, predictions within the observed X range (interpolation) are much more reliable than predictions outside it (extrapolation). As X moves beyond the data range, confidence intervals widen rapidly — a phenomenon sometimes called the "Extrapolation Cliff." Always report a prediction interval alongside any point prediction.
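As a sketch of that last point, statsmodels can produce prediction intervals directly (the X values below are illustrative):

```python
import numpy as np
import statsmodels.api as sm

x = np.array([2, 4, 6, 8, 10])
y = np.array([50, 60, 72, 82, 96])
res = sm.OLS(y, sm.add_constant(x)).fit()

# X = 7 interpolates; X = 20 extrapolates far beyond the data
new_X = sm.add_constant(np.array([7.0, 20.0]), has_constant="add")
frame = res.get_prediction(new_X).summary_frame(alpha=0.05)
print(frame[["mean", "obs_ci_lower", "obs_ci_upper"]])   # interval is far wider at X = 20
```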
Sources & Further Reading
- Montgomery, D.C., Peck, E.A., & Vining, G.G. (2021). Introduction to Linear Regression Analysis, 6th ed. Wiley. Standard graduate-level reference for OLS theory and diagnostics.
- James, G., Witten, D., Hastie, T., & Tibshirani, R. (2023). An Introduction to Statistical Learning, 2nd ed. Springer. Free PDF at statlearning.com. Chapter 3 covers simple and multiple linear regression.
- Penn State STAT 501 (2024). Regression Methods. Department of Statistics, Pennsylvania State University. Available at online.stat.psu.edu/stat501. Covers OLS assumptions and diagnostics with worked examples.
- UCLA Statistical Consulting Group. Simple Linear Regression. University of California, Los Angeles. Available at stats.oarc.ucla.edu.
- Gauss, C.F. (1809). Theoria motus corporum coelestium. First formal derivation of the least squares method.
- U.S. Food and Drug Administration. (2019). Statistical Tools for Drug Evaluation. FDA Science and Research. fda.gov.
Continue Learning
Related Topics at Statistics Fundamentals
Simple linear regression connects to several topics covered across Statistics Fundamentals. The scatter plots and correlation guide shows how to explore data before fitting a model. Hypothesis testing covers the t-test used to evaluate whether β₁ is significantly different from zero. Confidence intervals explains how to build the uncertainty range around each coefficient estimate. The normal distribution is the theoretical basis for residual normality. Sampling distributions explains why β̂₁ varies from sample to sample and what that variation implies for inference.