What is a regression line?

A regression line (line of best fit) is the straight line that minimizes the sum of squared residuals — the vertical distances between each data point and the line. It is defined by ŷ = b₀ + b₁x, where b₀ is the y-intercept and b₁ is the slope.

What is R² in regression?

R² (coefficient of determination) measures the proportion of variance in Y explained by the linear regression model. R² = 1 − SSE/SST. An R² of 0.85 means 85% of the variation in Y is explained by the model; the remaining 15% is unexplained.

What is the difference between a confidence interval and a prediction interval?

A confidence interval for the mean response gives a range for the average Y at a given X. A prediction interval gives a range for an individual new observation at X. At the same X and confidence level, a prediction interval is always wider than the confidence interval because it also accounts for the variability of individual observations around the mean.

Regression Line Scatter Plot — Free Linear Regression Graph Maker

Regression Scatter Plot Maker

Equation: ŷ = b₀ + b₁x. Least squares minimizes Σ(yᵢ − ŷᵢ)².


CI for mean response: ŷ ± t* · Sₑ · √(1/n + (x − x̄)²/Sxx)


Regression Equation

Reported values: slope (b₁), y-intercept (b₀), equation, Pearson r, R² (coefficient of determination), adjusted R².

Standard Errors & Tests

Reported values: standard error of estimate (Sₑ), standard error of slope (Sb₁), t-statistic for slope, p-value (H₁: slope ≠ 0), significance, degrees of freedom.

Descriptive Statistics

Reported values: sample size (n), mean of X (x̄), mean of Y (ȳ), standard deviation of X (sₓ), standard deviation of Y (sᵧ), Sxx = Σ(xᵢ−x̄)².

Sum of Squares

Reported values: SST (total), SSR (regression), SSE (error/residual), R² = SSR/SST, Sxy = Σ(xᵢ−x̄)(yᵢ−ȳ), Syy = Σ(yᵢ−ȳ)².

ANOVA Table

Table columns: Source, SS, df, MS, F, p-value.

Residuals Analysis

Reported values: maximum residual, minimum residual, mean residual, standard deviation of residuals, sum of residuals, sum of squared residuals.

The first 20 observations are shown. Residual = yᵢ − ŷᵢ. Standardized residual = eᵢ / Sₑ.

Table columns: #, x, y, ŷ, residual (e), standardized residual.

Key Formulas

Slope (b₁): b₁ = Sxy / Sxx

Intercept (b₀): b₀ = ȳ − b₁x̄

R² (coeff. of det.): R² = SSR/SST = 1 − SSE/SST

Standard error (Sₑ): Sₑ = √(SSE/(n−2))

SE of slope: Sb₁ = Sₑ / √Sxx

t-test for slope: t = b₁ / Sb₁, df = n−2

R² Interpretation

R² = 0.9+ → Very strong fit
R² = 0.7–0.9 → Strong fit
R² = 0.5–0.7 → Moderate fit
R² = 0.3–0.5 → Weak fit
R² < 0.3 → Poor linear fit
These cutoffs are rough rules of thumb. Always check the residual plot for patterns.


What Is Simple Linear Regression?

Simple linear regression fits a straight line through a set of X,Y data points that minimizes the sum of squared residuals — the squared vertical distances between each observed Y and the predicted Ŷ on the line. The resulting line is called the least-squares regression line, and its equation is ŷ = b₀ + b₁x, where b₀ is the y-intercept and b₁ is the slope.

According to the NIST Engineering Statistics Handbook, linear regression is one of the most widely used statistical methods in science, engineering, economics, and the social sciences. It serves two primary purposes: explaining the relationship between variables (inference) and predicting future Y values for given X values (prediction).

Understanding the Regression Equation

Slope (b₁): The amount Y changes for every 1-unit increase in X. A slope of 2.3 means "on average, Y increases by 2.3 units when X increases by 1." Calculated as b₁ = Sxy/Sxx, where Sxy = Σ(xᵢ−x̄)(yᵢ−ȳ) and Sxx = Σ(xᵢ−x̄)².

Y-intercept (b₀): The predicted value of Y when X = 0. Calculated as b₀ = ȳ − b₁x̄. The intercept is not always meaningful — if X = 0 is outside the data range, the intercept is a mathematical artifact and should not be interpreted literally.
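As a rough illustration of the slope and intercept formulas, here is a minimal Python sketch. The function name `fit_line` and the five data points are hypothetical and chosen only so the arithmetic is easy to follow by hand; this is not the tool's own code.

```python
def fit_line(xs, ys):
    """Return (b0, b1) for the least-squares line y-hat = b0 + b1 * x."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two (x, y) pairs of equal length")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sxx = sum of squared deviations of x; Sxy = sum of cross-deviations
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    if sxx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    b1 = sxy / sxx            # slope
    b0 = y_bar - b1 * x_bar   # intercept
    return b0, b1


# Hypothetical example data (not from the tool)
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
b0, b1 = fit_line(xs, ys)
print(f"y-hat = {b0:.2f} + {b1:.2f}x")  # prints "y-hat = 2.20 + 0.60x"
```

For this data x̄ = 3, ȳ = 4, Sxx = 10 and Sxy = 6, so b₁ = 6/10 = 0.6 and b₀ = 4 − 0.6 · 3 = 2.2.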

R² (coefficient of determination): The proportion of variance in Y explained by the linear model. R² = SSR/SST = 1 − SSE/SST. An R² of 0.84 means 84% of the variation in Y is explained by the regression model. The remaining 16% is unexplained residual variation.

Standard error of estimate (Sₑ): The typical distance between observed Y values and the regression line, in the same units as Y. Sₑ = √(SSE/(n−2)), the square root of the average squared residual with n−2 degrees of freedom in the denominator. A smaller Sₑ means the data lie closer to the line. Sₑ is also used to build confidence intervals and prediction intervals.
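The sums of squares, R², and Sₑ can be sketched together in a few lines of Python. The helper name `fit_summary` and the data are hypothetical, and the sketch assumes at least three points with some variation in both X and Y.

```python
import math


def fit_summary(xs, ys):
    """Return SST, SSR, SSE, R^2 and S_e for a simple linear regression."""
    n = len(xs)
    if n < 3 or n != len(ys):
        raise ValueError("need at least three (x, y) pairs of equal length")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    # SSE: squared residuals; SST: total squared deviation of y from its mean
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    sst = sum((y - y_bar) ** 2 for y in ys)
    ssr = sst - sse
    return {
        "SST": sst,
        "SSR": ssr,
        "SSE": sse,
        "R2": 1 - sse / sst,
        "Se": math.sqrt(sse / (n - 2)),
    }


# Hypothetical example data (not from the tool)
summary = fit_summary([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
for name, value in summary.items():
    print(f"{name}: {value:.4f}")
```

For this data SST = 6, SSE = 2.4 and SSR = 3.6, so R² = 0.6 and Sₑ = √(2.4/3) ≈ 0.894.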

Confidence Bands vs Prediction Intervals

Interval type	Estimates	Width	Formula
Confidence band (CI)	Average Y for all observations at X = x*	Narrower; approaches 0 as n→∞	ŷ ± t* · Sₑ · √(1/n + (x*−x̄)²/Sxx)
Prediction interval (PI)	Individual new observation at X = x*	Always wider than CI	ŷ ± t* · Sₑ · √(1 + 1/n + (x*−x̄)²/Sxx)

Both intervals widen as X moves further from the mean x̄ — this is called the "bowtie" shape of the confidence band. The further you predict from the center of the data, the less reliable the estimate.
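To make the two formulas concrete, the sketch below evaluates both half-widths at the mean of X and at the edge of the data. It is an illustration under stated assumptions: the function name and data are hypothetical, and the critical value t* = 3.182 (95%, two-sided, df = 3) is passed in from a standard t-table rather than computed.

```python
import math


def mean_and_prediction_intervals(xs, ys, x_star, t_star):
    """Return (y_hat, ci_half_width, pi_half_width) at x_star."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    s_e = math.sqrt(sse / (n - 2))
    y_hat = b0 + b1 * x_star
    # Shared term: grows as x_star moves away from x_bar
    core = 1 / n + (x_star - x_bar) ** 2 / sxx
    ci_half = t_star * s_e * math.sqrt(core)      # mean response
    pi_half = t_star * s_e * math.sqrt(1 + core)  # individual observation
    return y_hat, ci_half, pi_half


# Hypothetical example data (not from the tool)
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
T_STAR = 3.182  # assumed: 95% two-sided critical t for df = n - 2 = 3

at_mean = mean_and_prediction_intervals(xs, ys, 3, T_STAR)
at_edge = mean_and_prediction_intervals(xs, ys, 5, T_STAR)
for label, (y_hat, ci_half, pi_half) in (("x* = 3", at_mean), ("x* = 5", at_edge)):
    print(f"{label}: y-hat = {y_hat:.2f}, CI ± {ci_half:.3f}, PI ± {pi_half:.3f}")
```

With this data the confidence half-width is about 1.27 at x* = 3 (the mean) and about 2.20 at x* = 5, while the prediction half-width is larger than the confidence half-width at both points.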

Testing Whether the Slope Is Significant

The t-test for the slope tests H₀: β₁ = 0 (no linear relationship) against H₁: β₁ ≠ 0. The test statistic is t = b₁/Sb₁, where Sb₁ = Sₑ/√Sxx is the standard error of the slope. With df = n−2, compare to the critical t value. If |t| exceeds the critical value (or p < α), reject H₀ and conclude the slope is significantly different from zero — evidence of a linear relationship.
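The test can be sketched as follows. The function name and data are hypothetical, and the critical value is supplied by the caller (here 3.182 for α = 0.05, two-sided, df = 3, taken from a t-table), so no p-value is computed.

```python
import math


def slope_t_test(xs, ys, t_crit):
    """Return (t_stat, df, reject_h0) for H0: slope = 0."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    df = n - 2
    s_e = math.sqrt(sse / df)       # standard error of estimate
    s_b1 = s_e / math.sqrt(sxx)     # standard error of the slope
    t_stat = b1 / s_b1
    return t_stat, df, abs(t_stat) > t_crit


# Hypothetical example data (not from the tool)
t_stat, df, reject = slope_t_test([1, 2, 3, 4, 5], [2, 4, 5, 4, 5], t_crit=3.182)
print(f"t = {t_stat:.3f}, df = {df}, reject H0: {reject}")
```

Here t ≈ 2.121 is below the critical value 3.182, so with only five points the slope of 0.6 is not significantly different from zero at the 5% level.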

Regression Assumptions (LINE)

L — Linearity: The relationship between X and Y is truly linear. Check by looking at the scatter plot — is a straight line reasonable? A curved pattern suggests a non-linear model (polynomial, logarithmic) may fit better.

I — Independence: Observations are independent of each other. Violated in time-series data (autocorrelation). Check with the Durbin-Watson test or by plotting residuals in order.

N — Normality of residuals: Residuals eᵢ = yᵢ − ŷᵢ should follow a normal distribution. Check with a normal probability plot (Q-Q plot) of residuals or the Shapiro-Wilk test.

E — Equal variance (homoscedasticity): The spread of residuals should be roughly constant across all values of X. Fan-shaped patterns in the residual plot indicate heteroscedasticity — a violation requiring transformation or weighted regression.
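As one example of an assumption check, the sketch below computes residuals and the Durbin-Watson statistic mentioned under Independence, using its standard definition d = Σ(eₜ − eₜ₋₁)² / Σeₜ². The function names and data are hypothetical, and the observations are assumed to be listed in time order. The statistic ranges from 0 to 4, and values near 2 suggest little first-order autocorrelation.

```python
def residuals(xs, ys):
    """Return the residuals e_i = y_i - y_hat_i of the least-squares line."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return [y - (b0 + b1 * x) for x, y in zip(xs, ys)]


def durbin_watson(res):
    """Durbin-Watson statistic for residuals given in observation order."""
    num = sum((res[i] - res[i - 1]) ** 2 for i in range(1, len(res)))
    den = sum(e ** 2 for e in res)
    return num / den


# Hypothetical example data, assumed to be in time order
res = residuals([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
dw = durbin_watson(res)
print("residuals:", [round(e, 2) for e in res])
print(f"Durbin-Watson: {dw:.3f}")
```

A formal test compares the statistic against tabulated bounds for the given n; this sketch only computes the statistic itself.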
