What Is Simple Linear Regression?
Simple linear regression models the straight-line relationship between one predictor variable (X) and one outcome variable (Y).
The word "simple" here means one predictor, not that the method is trivial. This distinguishes it from multiple linear regression, which handles two or more predictors. The regression line is the single straight line that sits as close as possible to all observed data points simultaneously — and "closeness" is defined by squared vertical distances.
Regression was introduced by Francis Galton in the late 19th century studying the inheritance of height — he noticed that tall parents tend to have children who are tall but closer to average, a phenomenon he called "regression to the mean." The statistical machinery that makes prediction precise came later through the work of Karl Pearson and Ronald Fisher. For related foundations, the descriptive statistics guide and the statistics and probability guide at Statistics Fundamentals cover the mean, variance, and probability concepts that underpin everything here.
- Equation: Y = β₀ + β₁X + ε — where β₀ is the intercept, β₁ is the slope, ε is the error term
- Goal: Minimize SSE = Σ(yᵢ − ŷᵢ)² — the sum of squared residuals
- Slope formula: β₁ = Σ(xᵢ − x̄)(yᵢ − ȳ) / Σ(xᵢ − x̄)²
- Intercept formula: β₀ = ȳ − β₁x̄ (the line always passes through (x̄, ȳ))
- Model fit: R² = SSR / SST = 1 − SSE/SST — proportion of variance explained
- Assumptions: Use the LINE Check — Linearity, Independence, Normality, Equal variance
The Regression Equation: Y = β₀ + β₁X + ε
Every simple linear regression model has the same structure. Breaking it down term by term removes the mystery.
Y = outcome variable (dependent)
X = predictor variable (independent)
β₀ = intercept (Y when X = 0)
β₁ = slope (change in Y per unit X)
ε = error term (unexplained variation)
What Is β₁? Interpreting the Slope Coefficient
β₁ is the average change in Y for each one-unit increase in X. (With a single predictor there is nothing else to hold constant; the "holding other variables constant" caveat belongs to multiple regression.)
If β₁ = 3.2 in a model predicting exam score (Y) from study hours (X), that means each additional hour of study per week is associated with an average 3.2-point increase in exam score. The coefficient is expressed in the units of Y per unit of X — so if Y is in dollars and X is in years, β₁ is in dollars per year.
The sign of β₁ tells you direction: positive means X and Y move together (more X → more Y); negative means they move in opposite directions. A slope of exactly zero means X provides no linear information about Y.
What Is β₀? Interpreting the Intercept
β₀ is the predicted value of Y when X equals zero.
Sometimes this is meaningful — if X is age (years) and Y is blood pressure, β₀ would be predicted blood pressure at birth. Often the intercept has no practical interpretation because X = 0 is outside the observed data range or physically impossible. In those cases, β₀ is a mathematical anchor that positions the line correctly, not a value you should report or interpret on its own.
What Is ε? The Error Term Explained
ε captures all variation in Y that X does not explain — noise, unmeasured variables, and genuine randomness.
In practice, you never observe ε directly. What you observe are residuals — the differences between actual Y values and the fitted line's predictions: eᵢ = yᵢ − ŷᵢ. Examining the pattern of residuals is the primary way to check whether your regression model is valid. Residuals should look random; any systematic pattern signals a problem with the model.
Ordinary Least Squares (OLS): How the Line Is Fitted
OLS finds the slope and intercept that minimize the Sum of Squared Errors (SSE) — the total squared distance between observed Y values and the fitted line.
Why squared? Squaring prevents positive and negative errors from canceling out, and it penalizes large errors more than small ones — which makes outliers influential. The minimization problem has a closed-form solution, meaning you do not need an iterative algorithm. The formulas below give the exact answers in one calculation.
β̂₁ = Σ(xᵢ − x̄)(yᵢ − ȳ) / Σ(xᵢ − x̄)²
β̂₀ = ȳ − β̂₁x̄
where:
x̄, ȳ = sample means of X and Y
Σ = sum over all n observations
β̂₁ = estimated slope
β̂₀ = estimated intercept
Under the four LINE assumptions, OLS is the Best Linear Unbiased Estimator (BLUE) — meaning no other linear unbiased estimator produces lower-variance estimates of β₀ and β₁. This theorem, proved by Carl Friedrich Gauss and Andrey Markov, is the theoretical foundation that justifies using OLS for inference, not just description. See the Gauss-Markov theorem for the formal proof.
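To make the closed form concrete, here is a minimal NumPy sketch of the two estimators (the function and variable names are ours, not from any particular library):

```python
import numpy as np

def ols_fit(x, y):
    """Closed-form OLS for one predictor. Returns (beta0_hat, beta1_hat)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    beta1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
    beta0 = y.mean() - beta1 * x.mean()
    return beta0, beta1
```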
Worked Example: Calculating Slope, Intercept, and R² by Hand
The best way to understand what OLS is doing is to run the numbers once manually. The dataset below tracks weekly study hours (X) and exam scores (Y) for five students — small enough to follow step by step.
| Student | Hours / week (X) | Exam Score (Y) | xᵢ − x̄ | yᵢ − ȳ | (xᵢ−x̄)(yᵢ−ȳ) | (xᵢ−x̄)² |
|---|---|---|---|---|---|---|
| A | 2 | 50 | −4 | −22 | 88 | 16 |
| B | 4 | 60 | −2 | −12 | 24 | 4 |
| C | 6 | 72 | 0 | 0 | 0 | 0 |
| D | 8 | 82 | 2 | 10 | 20 | 4 |
| E | 10 | 96 | 4 | 24 | 96 | 16 |
| Mean | x̄ = 6 | ȳ = 72 | — | — | Σ = 228 | Σ = 40 |
Find the regression equation and R² for the study hours vs. exam score data above.
Calculate β̂₁ (slope): β̂₁ = Σ(xᵢ − x̄)(yᵢ − ȳ) / Σ(xᵢ − x̄)² = 228 / 40 = 5.7
Calculate β̂₀ (intercept): β̂₀ = ȳ − β̂₁ · x̄ = 72 − (5.7 × 6) = 72 − 34.2 = 37.8
Write the equation: Ŷ = 37.8 + 5.7X — so a student studying 7 hours would be predicted to score 37.8 + (5.7 × 7) = 77.7 points
Calculate SST: SST = Σ(yᵢ − ȳ)² = (−22)² + (−12)² + 0² + 10² + 24² = 484 + 144 + 0 + 100 + 576 = 1,304
Calculate SSE: Compute each ŷᵢ = 37.8 + 5.7xᵢ; the residuals are 0.8, −0.6, 0, −1.4, and 1.2, so SSE = Σ(yᵢ − ŷᵢ)² = 0.64 + 0.36 + 0 + 1.96 + 1.44 = 4.4 (a nearly perfect fit — the data was designed to be linear)
Calculate R²: R² = 1 − SSE/SST = 1 − 4.4/1304 ≈ 0.9966 — the model explains 99.7% of the variance in exam scores
✓ Regression equation: Ŷ = 37.8 + 5.7X. Interpretation: each additional hour of study per week is associated with an average 5.7-point increase in exam score. R² ≈ 0.997 indicates the line fits nearly perfectly.
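If you want to verify the arithmetic, here is a short NumPy cross-check (np.polyfit with degree 1 performs the same least-squares fit):

```python
import numpy as np

x = np.array([2, 4, 6, 8, 10])       # weekly study hours
y = np.array([50, 60, 72, 82, 96])   # exam scores

slope, intercept = np.polyfit(x, y, deg=1)   # least-squares fit, degree 1
y_hat = intercept + slope * x

sse = np.sum((y - y_hat) ** 2)          # 4.4
sst = np.sum((y - y.mean()) ** 2)       # 1304
ssr = np.sum((y_hat - y.mean()) ** 2)   # 1299.6, and SST = SSR + SSE

print(intercept, slope)   # ≈ 37.8, 5.7
print(1 - sse / sst)      # R² ≈ 0.9966
```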
How to Interpret R-Squared (R²)
R-squared measures the proportion of variance in Y that the model explains — it ranges from 0 to 1, where 1 means a perfect fit.
The three variance components that define R² are worth knowing by name:
SST = Σ(yᵢ − ȳ)² — Total Sum of Squares (total variation in Y)
SSR = Σ(ŷᵢ − ȳ)² — Regression Sum of Squares (variation explained by X)
SSE = Σ(yᵢ − ŷᵢ)² — Error Sum of Squares (unexplained residual variation)
SST is fixed — it is the total variation in your outcome data regardless of any model. OLS partitions SST into the part the model explains (SSR) and the part it doesn't (SSE). R² is simply SSR's share of SST.
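For readers who want the algebra behind the partition, here is the expansion in standard notation (a sketch; the cross-term argument is the key step):

```latex
\mathrm{SST} = \sum_i (y_i - \bar{y})^2
  = \sum_i \bigl[(y_i - \hat{y}_i) + (\hat{y}_i - \bar{y})\bigr]^2
  = \underbrace{\sum_i (y_i - \hat{y}_i)^2}_{\mathrm{SSE}}
  + \underbrace{\sum_i (\hat{y}_i - \bar{y})^2}_{\mathrm{SSR}}
  + 2\sum_i (y_i - \hat{y}_i)(\hat{y}_i - \bar{y})
```

The cross-term is zero because OLS residuals sum to zero and are uncorrelated with the fitted values (both facts follow from the normal equations), which leaves SST = SSR + SSE.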
A dataset can violate the assumptions and still post a high R². Anscombe's Quartet — four datasets with identical R², slope, and intercept but wildly different shapes — demonstrates that summary statistics alone cannot certify a model. Always plot residuals. R² measures fit, not correctness. A high R² with an invalid model produces unreliable predictions. For the relationship between R² and the Pearson correlation, see the scatter plots and correlation guide.
Pearson r vs. R²: The Exact Relationship
In simple linear regression (one predictor only), R² equals the square of the Pearson correlation coefficient: R² = r². This means r = √R² with the sign of β₁. If R² = 0.64, then |r| = 0.80. This relationship does not hold in multiple regression, where R² and the individual pairwise correlations diverge.
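A quick numerical check of the identity on the worked-example data (a sketch using NumPy only):

```python
import numpy as np

x = np.array([2, 4, 6, 8, 10])
y = np.array([50, 60, 72, 82, 96])

r = np.corrcoef(x, y)[0, 1]                  # Pearson r
slope, intercept = np.polyfit(x, y, deg=1)
y_hat = intercept + slope * x
r2 = 1 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)

print(round(r ** 2, 4), round(r2, 4))        # identical: r² = R²
```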
The LINE Check: 4 Regression Assumptions and How to Test Them
The four assumptions of simple linear regression can be remembered with the LINE Check: Linearity, Independence of errors, Normality of residuals, and Equal variance (Homoscedasticity).
The LINE Check is a complete diagnostic protocol, not just a memory aid. Each letter maps to a specific assumption, a specific diagnostic test, and a specific fix when the test fails. Run all four checks before trusting any regression output.
Linearity — The relationship between X and Y must be approximately straight
Plot Y versus X before fitting the model. A curved pattern means linear regression will produce biased coefficients — the estimated β₁ will be wrong, not just imprecise. This is the most fundamental check; if linearity fails, no amount of fixing the other assumptions saves the model.
Diagnostic: Ramsey RESET test | Plot: Y vs X scatter, residuals vs fitted | Fix: Log-transform X or Y, add a polynomial term (X²), or use nonlinear regression
Independence — Residuals must not be correlated with each other
Independence is violated when observations are collected over time (autocorrelation) or clustered in groups. When residuals are correlated, standard errors are underestimated and p-values become unreliable. This assumption matters most for time-series or longitudinal data; for cross-sectional data from simple random samples it is usually less of a concern.
Diagnostic: Durbin-Watson statistic (target: 1.5–2.5) | Plot: residuals in collection order | Fix: Add a lagged residual term, use Generalized Least Squares (GLS), or use cluster-robust standard errors
Normality — Residuals (not Y itself) should be approximately normally distributed
A common error is checking whether Y is normal. That is the wrong variable. The normality assumption applies to ε — the residuals. This assumption matters most for small samples (n < 30) where inference relies on the normal distribution. For large samples, the Central Limit Theorem makes this assumption less critical because the sampling distribution of β̂₁ converges to normal regardless.
Diagnostic: Shapiro-Wilk test | Plot: Q-Q plot of residuals | Fix: Increase sample size, apply robust regression, or transform Y (log, square root)
Equal Variance (Homoscedasticity) — Residual variance must be constant across all X values
Heteroscedasticity produces a fan-shaped residual plot — residuals spread wider as X increases. This is the most common violation in practice, especially in salary, income, or financial data where variability grows with the level. When present, OLS estimates are still unbiased but no longer efficient — standard errors are wrong, so hypothesis tests and confidence intervals cannot be trusted.
Diagnostic: Breusch-Pagan test | Plot: residuals vs fitted values (fan = fail) | Fix: Log-transform Y, use Weighted Least Squares (WLS), or apply heteroscedasticity-robust (HC) standard errors
Violating Linearity → biased coefficients. Violating Independence → wrong standard errors (usually too small). Violating Normality → unreliable p-values (mainly in small samples). Violating Equal Variance → inefficient estimates with incorrect standard errors. Each violation has a distinct consequence and a different fix.
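Three of the four formal tests above are available directly in Python; linearity is usually judged from a residuals-vs-fitted plot. A sketch, assuming statsmodels and SciPy are installed; the simulated dataset is illustrative and satisfies all four assumptions by construction:

```python
import numpy as np
import statsmodels.api as sm
from scipy import stats
from statsmodels.stats.stattools import durbin_watson
from statsmodels.stats.diagnostic import het_breuschpagan

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 200)
y = 3 + 2 * x + rng.normal(0, 1, 200)   # simulated data built to satisfy LINE

X = sm.add_constant(x)                  # adds the intercept column
resid = sm.OLS(y, X).fit().resid

print("Durbin-Watson:", durbin_watson(resid))          # ≈ 2 means no autocorrelation
print("Shapiro-Wilk p:", stats.shapiro(resid).pvalue)  # > .05 means normality plausible
print("Breusch-Pagan p:", het_breuschpagan(resid, X)[1])  # > .05 means equal variance plausible
```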
What Is the Difference Between Correlation and Simple Linear Regression?
Correlation measures the strength and direction of a linear relationship; regression estimates a prediction equation with specific coefficients in real-world units.
| Feature | Pearson Correlation (r) | Simple Linear Regression |
|---|---|---|
| What it produces | A single number (r) from −1 to +1 | An equation: Ŷ = β₀ + β₁X |
| Direction of relationship | Symmetric: r(X,Y) = r(Y,X) | Asymmetric: X predicts Y, not the reverse |
| Units | Unitless (standardized) | β₁ in units of Y per unit of X |
| Prediction | No — measures association only | Yes — gives ŷ for any new X value |
| Inference | Tests H₀: ρ = 0 | Tests H₀: β₁ = 0 (and provides CI for β₁) |
| Connection | In simple regression: R² = r² | The t-test for β₁ is equivalent to the t-test for r |
The key practical difference: if a hiring manager wants to know "what salary should we offer someone with 8 years of experience?", correlation cannot answer that question. Regression can, by plugging X = 8 into the fitted equation. Correlation only tells you that experience and salary are positively related — it gives no equation for translation.
Simple Linear Regression in Python, R, and Excel
All three tools produce the same coefficients. The differences are in how they display output and what additional diagnostics they expose automatically.
Python: sklearn and statsmodels Side by Side
Two packages dominate Python regression: sklearn for machine learning pipelines (minimal output) and statsmodels for statistical inference (full output with standard errors, p-values, and confidence intervals).
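A minimal side-by-side sketch on the worked-example data (illustrative, not a production pipeline):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
import statsmodels.api as sm

X = np.array([2, 4, 6, 8, 10]).reshape(-1, 1)   # sklearn expects a 2-D X
y = np.array([50, 60, 72, 82, 96])

# sklearn: coefficients and R², nothing else
skl = LinearRegression().fit(X, y)
print(skl.intercept_, skl.coef_[0], skl.score(X, y))   # ≈ 37.8, 5.7, 0.997

# statsmodels: the full inferential table (SEs, t, p, 95% CIs, F-statistic)
res = sm.OLS(y, sm.add_constant(X)).fit()
print(res.summary())
```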
R: lm() Output Decoded
R's lm() function runs OLS and returns a model object. The summary() call shows the full table including standard errors, t-statistics, p-values, and the F-statistic for overall model significance.
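If you work in Python but need to read or reproduce lm() output, statsmodels' formula interface accepts the same formula syntax. A sketch, where the data frame and column names are ours:

```python
import pandas as pd
import statsmodels.formula.api as smf

df = pd.DataFrame({"hours": [2, 4, 6, 8, 10],
                   "score": [50, 60, 72, 82, 96]})

# "score ~ hours" is the same formula you would write in R: lm(score ~ hours, data = df)
res = smf.ols("score ~ hours", data=df).fit()
print(res.summary())   # analogous to summary(lm(...)): coefficients, SEs, t, p, R², F
```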
Excel: Data Analysis ToolPak Walkthrough
Enable the ToolPak once via File → Options → Add-ins → Manage: Excel Add-ins → Analysis ToolPak, then run Data → Data Analysis → Regression. Set Input Y Range to the outcome column and Input X Range to the predictor column. The output reports the intercept and slope coefficients with standard errors, t statistics, p-values, and R Square, the same numbers Python and R produce.
Where Simple Linear Regression Is Used in Practice
Simple linear regression is not an academic exercise. It produces decision-relevant numbers across industries, and understanding how to read those numbers is a transferable skill.
Salary Benchmarking
HR teams fit experience (X) vs. salary (Y) to set offers at specific tenure levels. The slope gives average pay increase per year of experience.
Real Estate Pricing
Property valuation models start with simple regression — square footage predicting sale price — before adding room count, location, and other variables in multiple regression.
Sales Forecasting
Advertising spend (X) vs. revenue (Y) lets marketing teams predict returns on budget increases. The slope is the marginal return on each advertising dollar.
Clinical Dosing
Drug concentration in blood (Y) predicted from dosage (X) establishes linear pharmacokinetic models. Research published by the U.S. Food and Drug Administration describes regression-based dose-response analysis used in drug approval.
Climate Science
CO₂ concentration (X) vs. global temperature anomaly (Y). Simple linear regression was used in early climate modeling; more complex forms are used today but the interpretation of slope coefficients remains the same.
Education Research
Class size (X) predicting test scores (Y). A classic study from Harvard's Center for Education Policy Research used regression to estimate the effect of class size reduction on student achievement.
What a Regression Line Looks Like on a Scatter Plot
The scatter plot is the first tool in any regression analysis — before any coefficient is calculated. It answers two prior questions: Is the relationship approximately linear? Are there obvious outliers that might distort the fit?
Regression Line Anatomy — Residuals, Fitted Values, and the Line of Best Fit
The regression line always passes through (x̄, ȳ) — the point of sample means. The vertical distance from each observed point to the line is that point's residual (eᵢ = yᵢ − ŷᵢ), and OLS chooses the line that minimizes the sum of squared residual lengths.
Simple vs. Multiple Linear Regression: When to Use Each
Simple linear regression uses one predictor; multiple linear regression uses two or more — with each coefficient representing an effect while holding all other predictors constant.
The decision is not about preference but about your research question. If you want to know how study hours alone predict exam score, simple regression is appropriate. If you suspect that sleep, prior GPA, and study hours all matter and you want to isolate each effect, multiple regression is necessary — and omitting important predictors from a simple model produces biased coefficients (omitted variable bias).
If X and an omitted variable Z are both correlated with Y, the simple regression coefficient β̂₁ absorbs Z's effect — producing a number that does not reflect X's true causal impact. This is omitted variable bias, and it is the main reason simple regression is preliminary analysis, not a final model, in applied research. See the hypothesis testing guide for how p-values and t-tests apply to each coefficient.
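A small simulation makes the bias visible. The coefficients and sample size below are arbitrary choices for illustration:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(42)
n = 10_000
z = rng.normal(size=n)                        # the omitted variable
x = 0.8 * z + rng.normal(size=n)              # X is correlated with Z
y = 1.0 * x + 2.0 * z + rng.normal(size=n)    # true coefficient on X is 1.0

simple = sm.OLS(y, sm.add_constant(x)).fit()                          # omits Z
multiple = sm.OLS(y, sm.add_constant(np.column_stack([x, z]))).fit()  # controls for Z

print(simple.params[1])     # ≈ 1.98: absorbs Z's effect (biased)
print(multiple.params[1])   # ≈ 1.0: controlling for Z recovers the truth
```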
Frequently Asked Questions
What is simple linear regression?
Simple linear regression is a statistical method that models the straight-line relationship between one predictor variable (X) and one outcome variable (Y) using the equation Y = β₀ + β₁X + ε. It fits a line through data by minimizing the Sum of Squared Errors (SSE) via Ordinary Least Squares (OLS). The slope β₁ shows how much Y changes per one-unit increase in X; the intercept β₀ is the predicted Y when X equals zero.
What is the difference between correlation and simple linear regression?
Correlation (Pearson r) measures the strength and direction of a linear relationship — it is symmetric and unitless. Simple linear regression goes further: it estimates a prediction equation (Ŷ = β₀ + β₁X) with coefficients in real units, allows you to predict Y for new X values, and quantifies uncertainty through confidence intervals. In simple regression, R² = r², but regression provides the equation that correlation cannot.
What are the four assumptions of simple linear regression?
The four assumptions are captured by the LINE Check: Linearity (X and Y have a straight-line relationship — test with RESET), Independence of errors (residuals not autocorrelated — test with Durbin-Watson), Normality of residuals (errors approximately normal — test with Shapiro-Wilk + Q-Q plot), and Equal variance / Homoscedasticity (constant residual spread — test with Breusch-Pagan). Each violation has a distinct consequence and fix.
What does R-squared mean?
R-squared (R²) is the proportion of variance in Y that the regression model explains. An R² of 0.78 means the model accounts for 78% of the variation in the outcome. R² ranges from 0 (model explains nothing) to 1 (perfect fit). A high R² does not confirm causation or validate model assumptions — always inspect residual plots alongside R².
How do you interpret the slope coefficient β₁?
β₁ is the average change in Y for each one-unit increase in X. If β₁ = 5.7 with Y in points and X in hours, then each additional hour of study per week is associated with a 5.7-point increase in exam score. The sign gives direction (positive = X and Y increase together) and the magnitude gives the rate of change in Y-units per X-unit.
What is Ordinary Least Squares (OLS)?
Ordinary Least Squares (OLS) is the algorithm that finds the regression line by minimizing the Sum of Squared Errors (SSE) = Σ(yᵢ − ŷᵢ)². The closed-form solution gives: β̂₁ = Σ(xᵢ−x̄)(yᵢ−ȳ) / Σ(xᵢ−x̄)² and β̂₀ = ȳ − β̂₁x̄. The Gauss-Markov Theorem proves that under the LINE assumptions, OLS is the Best Linear Unbiased Estimator (BLUE).
When should you use multiple regression instead of simple regression?
Use simple linear regression when you have one predictor and want to understand its isolated relationship with Y. Use multiple linear regression when two or more variables affect Y and you need to estimate each effect while controlling for the others. If an important predictor is omitted from a simple model, the coefficient you get is biased — it absorbs the omitted variable's effect.
Can simple linear regression predict values for new data?
Yes — once you have the equation Ŷ = β̂₀ + β̂₁X, you can plug in any new X value to get a predicted Y. However, predictions within the observed X range (interpolation) are much more reliable than predictions outside it (extrapolation). As X moves beyond the data range, confidence intervals widen rapidly — a phenomenon sometimes called the "Extrapolation Cliff." Always report a prediction interval alongside any point prediction.
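As a sketch of that last point, statsmodels can produce prediction intervals directly (the X values below are illustrative):

```python
import numpy as np
import statsmodels.api as sm

x = np.array([2, 4, 6, 8, 10])
y = np.array([50, 60, 72, 82, 96])
res = sm.OLS(y, sm.add_constant(x)).fit()

# X = 7 interpolates; X = 20 extrapolates far beyond the data
new_X = sm.add_constant(np.array([7.0, 20.0]), has_constant="add")
frame = res.get_prediction(new_X).summary_frame(alpha=0.05)
print(frame[["mean", "obs_ci_lower", "obs_ci_upper"]])   # interval is far wider at X = 20
```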
Sources & Further Reading
- Montgomery, D.C., Peck, E.A., & Vining, G.G. (2021). Introduction to Linear Regression Analysis, 6th ed. Wiley. Standard graduate-level reference for OLS theory and diagnostics.
- James, G., Witten, D., Hastie, T., & Tibshirani, R. (2023). An Introduction to Statistical Learning, 2nd ed. Springer. Free PDF at statlearning.com. Chapter 3 covers simple and multiple linear regression.
- Penn State STAT 501 (2024). Regression Methods. Department of Statistics, Pennsylvania State University. Available at online.stat.psu.edu/stat501. Covers OLS assumptions and diagnostics with worked examples.
- UCLA Statistical Consulting Group. Simple Linear Regression. University of California, Los Angeles. Available at stats.oarc.ucla.edu.
- Gauss, C.F. (1809). Theoria motus corporum coelestium. First formal derivation of the least squares method.
- U.S. Food and Drug Administration. (2019). Statistical Tools for Drug Evaluation. FDA Science and Research. fda.gov.
Continue Learning
Related Topics at Statistics Fundamentals
Simple linear regression connects to several topics covered across Statistics Fundamentals. The scatter plots and correlation guide shows how to explore data before fitting a model. Hypothesis testing covers the t-test used to evaluate whether β₁ is significantly different from zero. Confidence intervals explains how to build the uncertainty range around each coefficient estimate. The normal distribution is the theoretical basis for residual normality. Sampling distributions explains why β̂₁ varies from sample to sample and what that variation implies for inference.