How do you calculate residuals?

Subtract the predicted value from the observed value: e = y − ŷ. For example, if a model predicts 50 but the actual value is 55, the residual is 55 − 50 = 5.

What is a residual plot?

A residual plot graphs residuals on the y-axis against fitted values (or the predictor) on the x-axis. A good model produces random scatter with no visible pattern; systematic patterns indicate assumption violations.

What is the difference between residuals and errors?

An error (ε) is the unobservable true deviation between a data point and the population regression line. A residual (e) is the observable estimate of that error, computed from the fitted sample line.

What are standardized residuals?

Standardized residuals are raw residuals divided by their standard deviation: r = e / s_e. Values beyond ±3 are typically treated as outlier candidates.

Residuals in Regression: Complete Guide to Analysis & Diagnostics

What Are Residuals?

Definition — Residual

A residual is the difference between an observed value and the predicted value produced by a regression or statistical model. It measures how far the model missed the actual outcome for a single observation.

e_i = y_i − ŷ_i

When you fit a regression line through data, the line does not pass through every point. The vertical distance from each data point to the regression line is that observation's residual. A point sitting above the line has a positive residual; a point below has a negative residual; a point sitting exactly on the line has a zero residual.

The word residual comes from the Latin residuus, meaning "remaining." In modeling terms, it is what remains unexplained after the model accounts for the relationship between the predictor and the outcome. That unexplained portion contains two things: genuine random variation in the data and any systematic patterns the model failed to capture. Residual analysis is the process of examining those leftovers to decide which of those two it is.

Residuals are the observable estimates of the true error term, ε, in the population regression equation. You can never see ε directly because the true population line is unknown. What you can see, and study, are the residuals from your fitted sample line. Everything described on this page uses the simple linear regression and multiple linear regression framework developed across Statistics Fundamentals.

Quantity	Value
Basic residual formula	e = y − ŷ
Sum of OLS residuals (model with intercept)	0
Standardized outlier threshold	±3
Residual degrees of freedom	n − p − 1

Actual vs Predicted Values

Every regression prediction generates two numbers for each observation: the actual (observed) value y and the predicted value ŷ (pronounced "y-hat"). The actual value is what you measured. The predicted value is what your fitted model outputs when you plug in the predictor values for that observation.

Think of a model predicting apartment rents based on square footage. An apartment with 800 sq ft might rent for $1,450. The model, based on the regression line, predicts $1,380 for that size. The residual is $1,450 − $1,380 = $70. The model underestimated rent by $70 for this unit. That $70 is unexplained by square footage alone — perhaps because of location, floor level, or renovation quality. Examining many residuals together tells you whether your single predictor is enough or whether you need more variables.

Why Residuals Matter

Residuals are not just a byproduct of fitting a model. They are a diagnostic tool that directly tells you whether the core assumptions of ordinary least squares regression hold. Those assumptions are linearity, constant variance (homoscedasticity), independence, and normality of errors. When they hold, statistical inference from the model — p-values, confidence intervals, predictions — is valid. When they do not, the inference can mislead you badly.

Patterns in residuals reveal specific problems. A curved pattern suggests the true relationship is nonlinear. A funnel shape where residuals spread out as fitted values increase signals heteroscedasticity. Runs of same-signed residuals in time order suggest autocorrelation. A few very large residuals point to outliers or data entry errors. Identifying and addressing these issues leads to more reliable models and better predictions.

📌

Featured Snippet — What Are Residuals?

Residuals are the differences between observed values and predicted values in a regression model. The formula is e = y − ŷ. Positive residuals indicate underprediction; negative residuals indicate overprediction. Residual analysis tests whether model assumptions hold and whether the model fits the data adequately.

Residual Formula Library

Basic Residual Formula

e_i = y_i − ŷ_i

where e_i = residual for observation i; y_i = observed (actual) value; ŷ_i = predicted value from the model

This is the foundational formula. For any regression model, whether simple or multiple, linear or polynomial, the raw residual is always actual minus predicted. Note the direction: it is always observed minus predicted, never predicted minus observed. This convention means a positive residual tells you the model undershot the actual value, and a negative residual tells you it overshot.
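As a minimal sketch of the formula in code (Python, standard library only; the function names `residual` and `residuals` are illustrative and do not come from any package), the subtraction order is fixed as observed minus predicted:

```python
def residual(observed, predicted):
    """Raw residual: observed minus predicted, never the reverse."""
    return observed - predicted


def residuals(observed_values, predicted_values):
    """Residuals for paired sequences of observed and predicted values."""
    if len(observed_values) != len(predicted_values):
        raise ValueError("observed and predicted must have the same length")
    return [residual(y, y_hat) for y, y_hat in zip(observed_values, predicted_values)]


# Apartment example from this page: actual rent 1450, predicted rent 1380.
rent_residual = residual(1450, 1380)   # positive, so the model undershot
print(rent_residual)                   # 70
```

Swapping the two arguments flips the sign of every residual, which is exactly the mistake the convention is meant to prevent.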

Standardized Residual Formula

Standardized Residual

r_i = e_i / s_e

where r_i = standardized residual; e_i = raw residual; s_e = residual standard error

Dividing the raw residual by the residual standard error puts all residuals on a common scale, regardless of the units of the response variable. A standardized residual of 2.5 tells you that observation is 2.5 standard deviations away from the fitted value — regardless of whether your response is measured in dollars, kilograms, or percentages. This makes outlier screening consistent across datasets.

Studentized (Externally Studentized) Residual Formula

Externally Studentized Residual

t_i = e_i / (s_(i) × √(1 − h_ii))

where s_(i) = model SE with observation i deleted; h_ii = leverage (hat value) for observation i; t_i follows a t-distribution with n − p − 2 df

Studentized residuals improve on standardized residuals by accounting for the fact that deleting an influential observation would actually change the model's error estimate. They follow a t-distribution under the null hypothesis that no outliers are present, so you can compute exact p-values for outlier tests. These are the residuals preferred in formal regression diagnostics software.
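The sketch below shows one way to compute externally studentized residuals for a single-predictor model by refitting the line with each observation deleted, which mirrors the definition of s_(i) above. It is an illustration under stated assumptions, not how any particular package does it: the helper names are made up for this example, the data are the study-hours values from Worked Example 1 later on this page, and the leverage shortcut h_ii = 1/n + (x_i − x̄)² / Σ(x − x̄)² is the standard result for simple linear regression, which the text above does not state.

```python
import math


def fit_line(x, y):
    """Closed-form OLS for one predictor; returns (intercept, slope)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def residual_se(x, y, intercept, slope):
    """Residual standard error with n - p - 1 = n - 2 degrees of freedom."""
    ssr = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    return math.sqrt(ssr / (len(x) - 2))


def leverage(x):
    """Hat values h_ii for simple linear regression (assumed shortcut formula)."""
    n = len(x)
    x_bar = sum(x) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return [1 / n + (xi - x_bar) ** 2 / sxx for xi in x]


def externally_studentized(x, y):
    """t_i = e_i / (s_(i) * sqrt(1 - h_ii)), refitting without observation i."""
    intercept, slope = fit_line(x, y)
    h = leverage(x)
    t_values = []
    for i in range(len(x)):
        e_i = y[i] - (intercept + slope * x[i])
        x_del, y_del = x[:i] + x[i + 1:], y[:i] + y[i + 1:]
        s_del = residual_se(x_del, y_del, *fit_line(x_del, y_del))
        t_values.append(e_i / (s_del * math.sqrt(1 - h[i])))
    return t_values


hours = [3, 5, 7, 9, 10]
scores = [60, 73, 80, 85, 93]
t_values = externally_studentized(hours, scores)
for x_i, t_i in zip(hours, t_values):
    print(x_i, round(t_i, 2))
```

Refitting n times is wasteful on large datasets; diagnostics software typically uses an algebraic shortcut that gives the same values from a single fit.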

Sum of Squared Residuals (SSR)

Sum of Squared Residuals — SSR

SSR = Σ (y_i − ŷ_i)²

SSR is also called RSS or SSE in some texts. OLS minimizes this quantity. It is used to compute R², MSE, and the F-statistic.

Ordinary least squares (OLS) estimation finds the regression coefficients that minimize the sum of squared residuals. Squaring serves two purposes: it penalizes large errors more than small ones, and it eliminates the sign so positive and negative residuals do not cancel. The SSR feeds directly into R-squared, mean squared error, and the F-test for overall model significance. For more on how these connect, see the guide to ANOVA in regression.

Residual Standard Error and Residual Variance

Residual Variance and Standard Error

s² = SSR / (n − p − 1) → s_e = √s²

where n = number of observations; p = number of predictors; n − p − 1 = residual degrees of freedom

The residual standard error (often reported as RSE or s) is the typical distance between observations and the fitted regression line, expressed in the same units as the response variable. More precisely, it is the square root of the average squared residual, with the average taken over the residual degrees of freedom. A model predicting house prices with RSE = $25,000 typically misses by about $25,000. Smaller RSE means a closer fit. The denominator n − p − 1 corrects for the degrees of freedom lost when estimating the regression coefficients. See the degrees of freedom guide for details on why this correction matters.
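The three quantities chain together, so a short sketch may help. The function name `residual_summary` is illustrative, and the numbers are the scores and predictions from Worked Example 1 later on this page (one predictor, five observations):

```python
import math


def residual_summary(observed, predicted, n_predictors):
    """Return (SSR, residual variance s^2, residual standard error s_e)."""
    n = len(observed)
    df = n - n_predictors - 1          # residual degrees of freedom
    if df <= 0:
        raise ValueError("need more observations than estimated coefficients")
    ssr = sum((y - y_hat) ** 2 for y, y_hat in zip(observed, predicted))
    variance = ssr / df
    return ssr, variance, math.sqrt(variance)


scores = [60, 73, 80, 85, 93]
predicted = [62, 70, 78, 86, 90]
ssr, s2, s_e = residual_summary(scores, predicted, n_predictors=1)
print(ssr, s2, s_e)   # 27 9.0 3.0
```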

Formula	Use Case	Key Property
e = y − ŷ	Calculate any individual residual	Signed; sum = 0 in OLS
r = e / s_e	Standardized outlier screening	Unitless; compare across models
t_i = e / (s_(i)√(1−h_ii))	Formal outlier significance test	Follows t-distribution
SSR = Σ(y − ŷ)²	Measure total unexplained variation	Minimized by OLS
s² = SSR/(n−p−1)	Residual variance estimate	Used in s_e, adjusted R², F-test

How to Calculate Residuals — Step by Step

Calculating a residual is straightforward once you have a fitted model. The six steps below walk through the process from start to finish, whether you are doing it by hand or checking software output.

Obtain the Observed Value (y)

Record the actual measured outcome for each observation. This is your raw data — the thing you measured or recorded, before any modeling. In a dataset of student exam scores, y is the actual score each student received.

Obtain the Predicted Value (ŷ)

Plug each observation's predictor values into the fitted regression equation to generate ŷ. For simple linear regression ŷ = β̂₀ + β̂₁x. For multiple regression, substitute all predictor values. Statistical software computes these automatically when you run a regression.

Subtract: e = y − ŷ

Apply the formula. Always observed minus predicted. A student who scored 88 when the model predicted 80 has a residual of 88 − 80 = +8. Never reverse the subtraction — the sign carries meaning about the direction of the miss.

Interpret the Direction

A positive residual means the model underestimated — the actual value exceeded the prediction. A negative residual means the model overestimated. A zero residual means the prediction was exact. Direction matters for identifying systematic bias in the model.

Evaluate the Magnitude

Compare the size of the residual to the residual standard error. A residual of 5 in a model with s_e = 1 is very large (5 SDs away). The same residual of 5 in a model with s_e = 20 is unremarkable (0.25 SDs). Context from the standardized residual is necessary for any meaningful size judgment.

Examine All Residuals Together

Plot residuals against fitted values. Look for random scatter (good) vs. patterns (bad). A single large residual might be an outlier or a data error. A systematic pattern across all residuals is a model specification problem. The residual plot guide below covers all the patterns and what they mean.

⚠️

Common Calculation Mistake

The most frequent error is computing ŷ − y instead of y − ŷ. This reverses the sign of every residual and inverts the interpretation. Always check: positive residual = model underpredicted = actual was higher than predicted.
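The subtract, interpret, and evaluate steps above can be sketched as one small function. This is an illustrative helper, not part of any library: the name `describe_residual` and the wording of the flags are assumptions, and the ±2 and ±3 cutoffs are the screening thresholds used elsewhere on this page.

```python
def describe_residual(observed, predicted, s_e=None):
    """Subtract, read the sign, and (if s_e is given) judge the size."""
    e = observed - predicted                      # always observed minus predicted
    if e > 0:
        direction = "underestimated"              # actual was higher than predicted
    elif e < 0:
        direction = "overestimated"               # actual was lower than predicted
    else:
        direction = "exact"
    result = {"residual": e, "direction": direction}

    if s_e is not None:
        if s_e <= 0:
            raise ValueError("s_e must be positive")
        r = e / s_e                               # standardized residual
        if abs(r) > 3:
            flag = "strong outlier candidate"
        elif abs(r) > 2:
            flag = "review the observation"
        else:
            flag = "typical"
        result.update({"standardized": r, "flag": flag})
    return result


print(describe_residual(88, 80))                  # {'residual': 8, 'direction': 'underestimated'}
print(describe_residual(55, 50, s_e=1)["flag"])   # strong outlier candidate
print(describe_residual(55, 50, s_e=20)["flag"])  # typical
```

The last two calls repeat the comparison from the magnitude step: the same raw residual of 5 is extreme when s_e = 1 and unremarkable when s_e = 20.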

Residual Interpretation Guide

Residual Value	Direction	Meaning	Action
Positive (e > 0)	Upward miss	Model underestimated; actual was higher than predicted	No action if random
Negative (e < 0)	Downward miss	Model overestimated; actual was lower than predicted	No action if random
Near zero	On target	Model predicted accurately for this observation	Good sign
Large positive (standardized > +2)	Significant underestimate	Possible outlier or missing predictor	Investigate observation
Large negative (standardized < −2)	Significant overestimate	Possible outlier or data error	Investigate observation
Standardized beyond ±3	Extreme deviation	Strong outlier candidate	Check for data entry error; assess Cook's D

Residual interpretation works at two levels: the individual level and the collective level. At the individual level, a large residual for a single observation raises a flag that you investigate specifically — was the data recorded correctly? Is this a genuinely unusual case? At the collective level, you are looking at the distribution and pattern of all residuals together. The collective view is more diagnostic; it tells you about the model, not just about individual data points.

One property of OLS regression worth remembering: the residuals always sum to zero when the model includes an intercept. This means the model cannot be consistently biased upward or downward across all observations in the fitting sample. Individual residuals can be large in either direction, but they balance out. This also means the mean residual is zero by construction. If the mean of your residuals differs from zero by more than floating-point rounding, either the model has no intercept or something is wrong with how it was estimated.

Residual properties: Draper, N.R. & Smith, H. (1998). Applied Regression Analysis (3rd ed.). Wiley. Chapter 2: Fitting a Straight Line by Least Squares. Purdue Statistics — Regression Residuals Reference.

Worked Examples

Example 1 — Simple Linear Regression

Worked Example 1 — Simple Linear Regression

Dataset: Hours studied (x) vs Exam score (y). Fitted regression: ŷ = 50 + 4x. Calculate residuals for all five students and the SSR.

Student	Hours (x)	Score (y)	Predicted ŷ = 50 + 4x	Residual e = y − ŷ	e²
A	3	60	62	−2	4
B	5	73	70	+3	9
C	7	80	78	+2	4
D	9	85	86	−1	1
E	10	93	90	+3	9
Sum (Σ):				+5	27

Note: The sum of residuals is +5, not zero. This occurs because the coefficients shown (50, 4) are not the exact OLS solution for this particular dataset — they are illustrative. The actual OLS fit would force Σe = 0 exactly.

SSR calculation: SSR = 4 + 9 + 4 + 1 + 9 = 27

Residual variance: s² = SSR / (n − p − 1) = 27 / (5 − 1 − 1) = 27/3 = 9

Residual standard error: s_e = √9 = 3 points. The model's predictions are off by about 3 exam score points on average.

✅ The residuals are small and mixed in sign, suggesting a reasonable fit. With s_e = 3, predictions for this range of study hours are accurate to within roughly ±3 exam points.
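To check the note about Σe, the sketch below fits the actual least-squares line to the same five students using the closed-form formulas for one predictor and compares it with the illustrative line ŷ = 50 + 4x. The helper name `fit_line` is made up for this example.

```python
def fit_line(x, y):
    """Closed-form OLS for one predictor; returns (intercept, slope)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


hours = [3, 5, 7, 9, 10]
scores = [60, 73, 80, 85, 93]

# Residuals from the illustrative line used in the table.
illustrative_residuals = [y - (50 + 4 * x) for x, y in zip(hours, scores)]

# Residuals from the least-squares line fitted to these five points.
intercept, slope = fit_line(hours, scores)
ols_residuals = [y - (intercept + slope * x) for x, y in zip(hours, scores)]
ols_ssr = sum(e ** 2 for e in ols_residuals)

print(sum(illustrative_residuals))              # 5
print(round(intercept, 2), round(slope, 2))     # 48.93 4.3
print(abs(sum(ols_residuals)) < 1e-9)           # True
print(ols_ssr < 27)                             # True
```

The fitted line has a slightly lower intercept and a slightly steeper slope than the illustrative one. Its residuals cancel to within rounding, and its SSR is below 27, as it must be because OLS minimizes that quantity.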

Example 2 — House Price Prediction

Worked Example 2 — Real Estate Regression

A model predicts house prices using square footage: ŷ = 80,000 + 150x. A 1,400 sq ft house sold for $310,000. What is the residual?

Applying the Residual Formula

e = 310,000 − [80,000 + 150(1,400)]

Compute ŷ: ŷ = 80,000 + 150 × 1,400 = 80,000 + 210,000 = $290,000

Compute residual: e = 310,000 − 290,000 = +$20,000

Interpret: The model underpredicted by $20,000. This house sold for more than predicted based on size alone — likely because of location, renovation quality, or other features not in the model.

✅ Residual = +$20,000 (positive = underestimated). The house commanded a $20,000 premium above what the model predicted from square footage alone.

Example 3 — Sales Forecasting

Worked Example 3 — Sales Forecasting

A time-series regression model predicts monthly sales: ŷ = 12,000 + 800t (where t = month number). In month 6, actual sales were $15,200. Calculate and interpret the residual.

Predicted sales in month 6: ŷ = 12,000 + 800(6) = 12,000 + 4,800 = $16,800

Residual: e = 15,200 − 16,800 = −$1,600

Interpretation: The model overestimated sales by $1,600 in month 6. If this pattern (model consistently overpredicting in certain months) repeats, it signals seasonality that the linear trend model is not capturing. The model needs a seasonal component.

✅ Residual = −$1,600 (negative = overestimated). Investigate whether negative residuals cluster in particular months — that would signal a seasonal pattern the model is missing.

Methodology follows NIST/SEMATECH e-Handbook of Statistical Methods. NIST — Residuals and Influence in Regression. National Institute of Standards and Technology.

Residual Plots Explained

A residual plot is a scatterplot with fitted values (ŷ) on the x-axis and residuals (e) on the y-axis. It is the single most useful diagnostic tool in regression analysis. Reading a residual plot requires knowing what good looks like versus what each type of bad pattern signals.

The Six Residual Plot Patterns

✓ Good Pattern

Random Scatter

Residuals scattered randomly above and below zero with no trend and roughly constant spread. This is what you want to see: it is consistent with linearity, constant variance, and independence.

✗ Problem Pattern

Funnel (Fan) Shape

Residuals spread out as fitted values increase (or decrease). This signals heteroscedasticity — the variance of errors is not constant. Consider log-transforming the response variable or using weighted least squares.

✗ Problem Pattern

Curved (U or Arch) Pattern

A systematic curve in residuals means the true relationship is nonlinear but you fit a linear model. Add a quadratic term (x²) or transform the predictor. This is a model specification problem, not a data problem.

✗ Problem Pattern

Upward or Downward Trend

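A steady rise or fall in the residuals usually shows up when you plot them against time, observation order, or a variable left out of the model, and it suggests a relevant predictor is missing. Against the fitted values of an OLS model with an intercept, residuals are uncorrelated by construction, so a straight-line trend there points to a calculation or fitting error instead. Adding the missing variable typically removes the trend.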

✗ Investigate

Isolated Outlier

One or a few points sit far from the rest. Large standardized residuals (beyond ±3) warrant individual investigation. The observation may have been recorded incorrectly, or it may be a genuinely unusual case worth studying separately.

✗ Problem Pattern

Clustering

Residuals cluster into distinct groups rather than distributing evenly. This often indicates a categorical variable (such as group membership or time period) that was not included in the model as a predictor.

✅

Residual Plot Checklist

How to Read a Residual Plot

When examining a residual plot, look at the plot three ways. First, draw a mental horizontal reference line at zero and ask whether the cloud of points has a roughly equal density above and below it throughout the range of fitted values. Second, check whether the vertical spread of points stays consistent from left to right, or whether it fans out or compresses. Third, look for any points that sit far outside the main cloud, either very high or very low.

Most statistical software packages, including R, Python's statsmodels, SPSS, and Stata, can generate residual plots after fitting a regression. In R, plot(model) produces four diagnostic plots including the residuals vs fitted plot and the normal Q-Q plot of residuals. In Python, you can visualize residuals using plt.scatter(model.fittedvalues, model.resid), where model is the fitted results object returned by statsmodels. For interactive visual tools, the regression scatter plot tool on this site lets you explore model fit visually.

Standardized Residuals

Raw residuals have units tied to the response variable, which makes it hard to compare them across observations or models. Standardized residuals solve this by dividing the raw residual by the residual standard error, producing a unitless score. Think of it like a z-score for model errors.

Standardized Residual Value	Interpretation	Typical Action
0 to ±1	Typical observation — close to the regression line	No action needed
±1 to ±2	Moderate deviation — within normal range	No action; note for pattern review
±2 to ±3	Potential issue — unusually large miss	Review the observation
Beyond ±3	Strong outlier candidate	Investigate for data error; compute Cook's D

About 95% of standardized residuals from a well-fitted model should fall within ±2. If you see more than 5% of your observations outside that range, the model may be misfitting or the residuals may not be normally distributed. For the normal distribution context, see the empirical rule and the normal distribution guide.
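A quick way to apply the 5% rule of thumb is to count how many standardized residuals fall outside the band. The function name `share_beyond` and the twenty values below are hypothetical, chosen only to illustrate the check:

```python
def share_beyond(standardized_residuals, threshold=2.0):
    """Fraction of standardized residuals whose absolute value exceeds the threshold."""
    n = len(standardized_residuals)
    return sum(1 for r in standardized_residuals if abs(r) > threshold) / n


# Hypothetical standardized residuals for 20 observations.
std_resid = [0.3, -1.1, 0.8, -0.4, 1.6, -0.9, 0.2, 2.4, -1.7, 0.5,
             -0.2, 1.0, -0.6, 0.1, -1.3, 0.7, -0.8, 1.2, -0.1, 0.4]

print(share_beyond(std_resid))        # 0.05
print(share_beyond(std_resid, 3.0))   # 0.0
```

One value in twenty beyond ±2 is in line with what a well-fitted model would produce. In small samples the share is coarse, so treat it as a prompt to look at the plot, not as a test.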

Studentized Residuals

Studentized residuals (also called externally studentized residuals or jackknife residuals) take the standardization one step further: they account for each observation's leverage. Leverage measures how far a predictor value sits from the mean of all predictor values. An observation with high leverage has a large effect on the regression line and would, if deleted, substantially change the estimated coefficients and the error variance.

The key distinction from standardized residuals is that the denominator uses s_(i) — the residual standard error estimated from the model with observation i removed — rather than the full-sample s_e. This makes studentized residuals more sensitive to influential observations because a truly influential outlier inflates s_e, making standardized residuals look smaller than they should. Studentized residuals reveal those cases properly.

Aspect	Standardized Residual	Studentized Residual
Formula denominator	Full-sample s_e	Leave-one-out s_(i) × √(1−h_ii)
Distribution under H₀	Approximately N(0,1)	t(n−p−2)
Outlier threshold	±3 (informal)	Bonferroni-corrected t critical value (formal)
Better for	Quick screening	Formal outlier significance testing
Available in	All regression software	R (rstudent()), Python (get_influence())

Residual Diagnostics Framework

Residual diagnostics is the systematic process of examining model residuals to check whether the four assumptions of linear regression hold. Each assumption has a dedicated diagnostic method. The framework below covers all four, plus outlier detection and influential observation analysis.

Detecting Nonlinearity

Plot residuals against each predictor separately, and also against fitted values. If a curved pattern appears — a U-shape, an arch, or an S-curve — the assumption of linearity is violated for that predictor. The fix is usually to add a polynomial term (x²) or apply a transformation such as log(x) or √x. If you fit a nonlinear relationship with a linear model, every prediction will be wrong in a predictable direction, which is exactly the kind of error residual analysis catches.

Detecting Heteroscedasticity

Heteroscedasticity means the variance of residuals is not constant across the range of fitted values. It shows as a funnel or fan shape in the residuals vs fitted plot. It does not bias the coefficient estimates, but it makes standard errors incorrect, which means p-values and confidence intervals from the model cannot be trusted. Formal tests include the Breusch-Pagan test and the White test. The practical fix is usually to log-transform the response variable or use heteroscedasticity-consistent (robust) standard errors. For more on why the accuracy of confidence intervals depends on constant variance, see the confidence intervals guide.
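For intuition about what such a test computes, here is a hedged sketch of the studentized (Koenker) form of the Breusch-Pagan statistic for a single predictor: regress the squared residuals on x and use LM = n · R², compared against a chi-square distribution with 1 degree of freedom. The function name and the two residual series are invented for illustration and do not come from a fitted model; in practice you would pass residuals from your own fit or use a library routine.

```python
import math


def koenker_bp_test(x, residuals):
    """Return (LM statistic, p-value) from regressing squared residuals on x."""
    n = len(x)
    u = [e * e for e in residuals]
    x_bar, u_bar = sum(x) / n, sum(u) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    suu = sum((ui - u_bar) ** 2 for ui in u)
    sxu = sum((xi - x_bar) * (ui - u_bar) for xi, ui in zip(x, u))
    # R^2 of a one-predictor regression is the squared correlation.
    r_squared = 0.0 if suu == 0 else (sxu * sxu) / (sxx * suu)
    lm = n * r_squared
    # Upper tail of a chi-square with 1 df: P(Z^2 > lm) = erfc(sqrt(lm / 2)).
    p_value = math.erfc(math.sqrt(lm / 2))
    return lm, p_value


x = list(range(1, 21))
# Synthetic funnel: residual size grows with x, signs alternate.
funnel = [xi if i % 2 == 0 else -xi for i, xi in enumerate(x)]
# Synthetic constant spread: every residual has the same size.
flat = [2.0 if i % 2 == 0 else -2.0 for i in range(len(x))]

lm_funnel, p_funnel = koenker_bp_test(x, funnel)
lm_flat, p_flat = koenker_bp_test(x, flat)
print(round(lm_funnel, 2), p_funnel < 0.001)
print(lm_flat, p_flat)
```

A small p-value for the funnel series says the squared residuals move with x, which is the numerical counterpart of the fan shape in the plot.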

Detecting Autocorrelation

In time series or spatial data, residuals from one observation may be correlated with residuals from nearby observations. This violates the independence assumption. Plotting residuals in time order (residuals vs observation sequence) often reveals it directly as a wave pattern. The Durbin-Watson statistic is the standard test: values near 2 indicate no autocorrelation, values near 0 or 4 signal positive or negative autocorrelation respectively. If detected, consider adding lagged predictors or switching to a time series model.
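The Durbin-Watson statistic is simple enough to compute by hand: the sum of squared differences between consecutive residuals divided by the sum of squared residuals. The sketch below uses two short made-up residual sequences in time order to show both ends of the scale.

```python
def durbin_watson(residuals):
    """DW = sum of (e_t - e_{t-1})^2 divided by sum of e_t^2, residuals in time order."""
    numerator = sum((residuals[t] - residuals[t - 1]) ** 2
                    for t in range(1, len(residuals)))
    denominator = sum(e * e for e in residuals)
    return numerator / denominator


# Made-up sequences: a slow wave (neighbors alike) and a zigzag (neighbors opposite).
wave = [1, 1, 1, -1, -1, -1]
zigzag = [1, -1, 1, -1, 1, -1]

print(round(durbin_watson(wave), 2))    # 0.67
print(round(durbin_watson(zigzag), 2))  # 3.33
```

The wave lands well below 2 (positive autocorrelation) and the zigzag well above 2 (negative autocorrelation), matching the reading described above.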

Checking Normality of Residuals

The OLS estimator does not require normally distributed residuals for unbiased coefficient estimates — it only requires them for valid hypothesis tests on small samples. For large samples, the Central Limit Theorem makes the normality assumption less critical. The standard check is the normal Q-Q plot of residuals: if residuals are normally distributed, points fall along a straight diagonal line. Departures at the tails indicate heavy or light tails. A histogram of residuals provides a complementary view. For background on normal distributions and QQ plots, see the QQ plots guide and the normal distribution reference.

Detecting Outliers and Influential Observations

An outlier is an observation with an unusually large residual — it sits far from the regression line. An influential observation is one whose inclusion substantially changes the estimated coefficients. These two concepts are related but distinct: an outlier is not always influential, and an influential observation does not always have a large residual. Cook's Distance combines residual size and leverage to measure overall influence. An observation with Cook's D greater than 1 (or more conservatively, 4/n) deserves scrutiny. For the full treatment, see how outliers are handled in descriptive statistics — many of the same principles apply here.
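A sketch of Cook's Distance for a one-predictor model follows. It uses the standard closed form D_i = (e_i² / (k · s²)) · h_ii / (1 − h_ii)², where k is the number of estimated coefficients (2 here: intercept and slope). That formula is not spelled out in the text above, so treat it and the helper names as assumptions of this example. The data are the five students from Worked Example 1.

```python
def fit_line(x, y):
    """Closed-form OLS for one predictor; returns (intercept, slope)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def cooks_distance(x, y):
    """Cook's D for each observation in a simple linear regression."""
    n, k = len(x), 2                                  # k = intercept + slope
    intercept, slope = fit_line(x, y)
    resid = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    s2 = sum(e * e for e in resid) / (n - k)          # residual variance
    x_bar = sum(x) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    h = [1 / n + (xi - x_bar) ** 2 / sxx for xi in x]  # leverage
    return [(e * e / (k * s2)) * (hi / (1 - hi) ** 2)
            for e, hi in zip(resid, h)]


hours = [3, 5, 7, 9, 10]
scores = [60, 73, 80, 85, 93]
distances = cooks_distance(hours, scores)
cutoff = 4 / len(hours)                               # conservative 4/n rule
flagged = [i for i, d in enumerate(distances) if d > cutoff]
print([round(d, 3) for d in distances], flagged)
```

With only five observations, any cutoff is crude and the points at the ends of the x range carry a lot of leverage, so a flag here is a reason to look at the observation, not a verdict.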

Assumption Violated	Residual Plot Signal	Formal Test	Common Fix
Linearity	Curved pattern vs fitted values or x	RESET test	Add polynomial term; transform predictor
Homoscedasticity	Funnel / fan shape	Breusch-Pagan, White	Log transform response; robust SE
Independence	Wave pattern vs time or sequence	Durbin-Watson	Lagged predictors; time series model
Normality of errors	Curved QQ plot; skewed histogram	Shapiro-Wilk, K-S	Transform response; rely on the CLT when n is large (rule of thumb: n > 30)
Outliers	Isolated extreme points	Studentized residual t-test	Investigate; winsorize; robust regression

Diagnostics framework: Montgomery, D.C., Peck, E.A. & Vining, G.G. (2012). Introduction to Linear Regression Analysis (5th ed.). Wiley. Chapter 4: Model Adequacy Checking. Also see statsmodels Regression Diagnostics documentation.

Residuals vs Related Concepts

Residuals vs Errors

The error term ε in the population model y = β₀ + β₁x + ε represents the true unobservable deviation between each data point and the population regression line. You never see ε because you never know the true β₀ and β₁. What you see are residuals, which are the estimates of ε computed from your fitted line ŷ = β̂₀ + β̂₁x. Residuals are observable; errors are not. Residuals change with the sample and the fitted line; errors are defined relative to the fixed, unknown population line. This distinction matters for understanding what residual analysis can and cannot tell you about the underlying population.

Residuals vs Prediction Errors

A training residual measures how far the model misses on observations it was fitted on. A prediction error (also called a test error) measures how far the model misses on new, unseen observations. Training residuals are typically smaller on average than prediction errors because the model was explicitly optimized to minimize training residuals, although an individual new observation can still be predicted well by chance. Cross-validation uses held-out data to estimate true prediction errors, which is why it gives a more honest picture of model performance than training residuals alone.

Residuals vs R-Squared

R-squared (the coefficient of determination) summarizes model fit as a single number: the proportion of variance in y explained by the model. It has a direct relationship to residuals: R² = 1 − SSR/SST, where SST is the total sum of squares. A model with small residuals has small SSR and therefore high R². A model with large residuals has large SSR and low R². R² alone does not reveal whether assumptions hold — residual plots are needed for that. See the simple linear regression page for the full decomposition.
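The relationship R² = 1 − SSR/SST can be checked directly. The sketch below uses the scores and predictions from Worked Example 1; the function name is illustrative.

```python
def r_squared(observed, predicted):
    """R^2 = 1 - SSR/SST, with SST measured around the mean of the observed values."""
    y_bar = sum(observed) / len(observed)
    ssr = sum((y - y_hat) ** 2 for y, y_hat in zip(observed, predicted))
    sst = sum((y - y_bar) ** 2 for y in observed)
    return 1 - ssr / sst


scores = [60, 73, 80, 85, 93]
predicted = [62, 70, 78, 86, 90]      # from y-hat = 50 + 4x
print(round(r_squared(scores, predicted), 3))   # 0.957
```

Here SSR is 27 against an SST of about 626.8, so the line leaves roughly 4% of the variation in scores unexplained.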

Real-World Applications

🏥

Healthcare Research

Residual analysis in clinical trials checks whether linear models for treatment effects hold across patient subgroups. Large residuals for specific age ranges or comorbidities indicate that subgroup-specific models may be needed.

📈

Financial Modeling

In factor models for asset returns, residuals represent idiosyncratic (stock-specific) risk. Analyzing residuals helps separate explained systematic risk from unexplained firm-level variance — critical for portfolio optimization.

📦

Operations & Forecasting

Demand forecasting models use residual patterns to identify when models break down — seasonal dips, promotional spikes, or supply shocks create recognizable residual patterns that trigger model retraining.

🤖

Machine Learning

Gradient boosting algorithms learn by iteratively fitting new trees to the residuals of the current ensemble. Each new tree models the unexplained portion of the previous model — this is literally residual learning.

🔬

Scientific Research

Residuals from regression on experimental data show whether the model captures the full mechanism. A curved residual pattern in a dose-response study, for example, often means the model needs a pharmacokinetic component.

🏗️

Engineering

In quality control and process engineering, residuals from process models are monitored on control charts. Residuals trending in one direction signal process drift that needs corrective action before products go out of spec.

Residuals in Machine Learning

The concept of a residual maps directly onto error analysis in machine learning. For any regression algorithm — linear regression, gradient boosting, random forest regression, or neural network regression — the training residual for an observation is y − ŷ, exactly the same formula as in classical statistics. The difference lies in how these algorithms use residuals.

Gradient boosting models make residual learning explicit. The algorithm fits an initial model, computes the residuals, fits a new weak learner (typically a shallow decision tree) to those residuals, and adds a fraction of that tree to the ensemble. The next iteration fits a tree to the new residuals (the portion still unexplained). This process iterates until residuals are small enough or a stopping criterion is met. XGBoost, LightGBM, and CatBoost all work this way. The residuals at each stage are more precisely called pseudo-residuals or negative gradients of the loss function, but for squared error loss they are identical to ordinary residuals.
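The loop described above can be sketched in a few lines for squared error loss, where the pseudo-residuals are just y − ŷ. This is a toy illustration with one-split trees (stumps) on a single feature, written from scratch; it is not how XGBoost, LightGBM, or CatBoost are implemented, and the function names, learning rate, and round count are arbitrary choices for the example. It assumes at least two distinct x values.

```python
def fit_stump(x, r):
    """Least-squares one-split tree fitted to residuals r; returns a predict function."""
    best = None
    cut_points = sorted(set(x))
    for lo, hi in zip(cut_points, cut_points[1:]):
        split = (lo + hi) / 2
        left = [ri for xi, ri in zip(x, r) if xi <= split]
        right = [ri for xi, ri in zip(x, r) if xi > split]
        left_mean, right_mean = sum(left) / len(left), sum(right) / len(right)
        sse = (sum((ri - left_mean) ** 2 for ri in left)
               + sum((ri - right_mean) ** 2 for ri in right))
        if best is None or sse < best[0]:
            best = (sse, split, left_mean, right_mean)
    _, split, left_mean, right_mean = best
    return lambda xi: left_mean if xi <= split else right_mean


def boost(x, y, n_rounds=20, learning_rate=0.5):
    """Start from the mean, then repeatedly fit a stump to the current residuals."""
    predictions = [sum(y) / len(y)] * len(y)
    ssr_history = []
    for _ in range(n_rounds):
        resid = [yi - pi for yi, pi in zip(y, predictions)]
        ssr_history.append(sum(e * e for e in resid))
        stump = fit_stump(x, resid)
        # Add only a fraction of the new tree to the ensemble.
        predictions = [pi + learning_rate * stump(xi)
                       for xi, pi in zip(x, predictions)]
    ssr_history.append(sum((yi - pi) ** 2 for yi, pi in zip(y, predictions)))
    return predictions, ssr_history


hours = [3, 5, 7, 9, 10]
scores = [60, 73, 80, 85, 93]
predictions, ssr_history = boost(hours, scores)
print(round(ssr_history[0], 1), round(ssr_history[-1], 4))
```

Each round shrinks the training SSR a little. Left running, the ensemble will eventually fit these five points almost exactly, which is why real implementations pair this loop with validation data and early stopping.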

In supervised machine learning generally, monitoring residuals on validation data over the training process helps detect overfitting: training residuals keep shrinking while validation residuals stop improving or grow. This is the classic overfitting diagnostic, and it is fundamentally a comparison of training versus test residuals.

Frequently Asked Questions

What are residuals in statistics?

A residual in statistics is the difference between an observed data value and the value predicted by a model: e = y − ŷ. In regression analysis, residuals represent the portion of the observed value that the model could not explain. They are used to evaluate model fit, detect outliers, and test whether regression assumptions are satisfied.

What does residuals mean in everyday terms?

Think of a model as making a prediction about each observation. The residual is simply how wrong it was. If a model predicts you'll score 70 on a test and you score 78, the residual is 78 − 70 = +8. The model underestimated by 8 points. If you score 65, the residual is −5 — the model overestimated by 5 points. The word "residual" means what remains after the model's explanation is accounted for.

How do you find residuals from a regression equation?

Plug each observation's predictor value into the fitted regression equation to get ŷ, then subtract it from the actual y. For ŷ = 10 + 3x and an observation with x = 5, y = 27: ŷ = 10 + 3(5) = 25; residual = 27 − 25 = 2. Most statistical software calculates and stores residuals automatically. In R use residuals(model); in Python/statsmodels use model.resid on the fitted results object.

Can residuals be negative?

Residuals are frequently negative. A negative residual means the model overestimated — the actual value was less than predicted. In ordinary least squares regression, the residuals must sum to zero (when the model includes an intercept), which means there are always both positive and negative residuals, and they balance out exactly.

What is a good residual value?

There is no single threshold for a "good" residual because the scale depends entirely on your outcome variable's units. What you can assess is the size relative to the residual standard error. In a well-fitted model, about 95% of standardized residuals fall within ±2, so values in that range are typical. The overall goodness of fit is better assessed by looking at all residuals together: small, randomly scattered residuals indicate a good fit.

Why should residuals be randomly scattered?

Random scatter in the residual plot confirms that the model has captured the systematic relationship between the predictor and response. If scatter is truly random, there is nothing left for the model to exploit — the remaining variation is noise. A non-random pattern means there is still structure in the data that the model missed, which signals a modeling problem worth addressing.

What is the difference between standardized and studentized residuals?

Standardized residuals divide the raw residual by the full-sample residual standard error. Studentized (externally studentized) residuals divide by the residual standard error estimated with that specific observation left out, and also account for leverage. Studentized residuals are more sensitive to influential observations and follow a known t-distribution, enabling formal statistical outlier tests. Standardized residuals are faster to compute and fine for initial screening.

What causes heteroscedasticity in residual plots?

Heteroscedasticity occurs when the variance of the residuals changes across the range of fitted values, often seen as a funnel or fan pattern. Common causes include omitted variables that are correlated with the error variance, the response variable having a multiplicative rather than additive error structure (common with income, prices, and counts), or a misspecified functional form. Log-transforming the response variable often corrects it because multiplicative relationships become additive on the log scale.

What is the sum of squared residuals?

The sum of squared residuals (SSR, also called RSS or SSE) is SSR = Σ(yᵢ − ŷᵢ)². It is the quantity that ordinary least squares minimizes to find the best-fitting line. A smaller SSR means the model fits the data more closely. SSR is used to compute R², the F-statistic for overall model significance, and the residual standard error. It is also the numerator of mean squared error in machine learning contexts: MSE = SSR / n.

What are residuals in linear regression specifically?

In simple linear regression, the residual for each point is the vertical distance between the data point and the fitted regression line. For the model y = β̂₀ + β̂₁x + e, the residual eᵢ = yᵢ − (β̂₀ + β̂₁xᵢ). The OLS method finds β̂₀ and β̂₁ by minimizing the sum of squared residuals. In multiple linear regression, the same formula applies but ŷ involves multiple predictors. See the simple linear regression guide for the full OLS derivation.

How are residuals used in ANOVA?

In ANOVA, residuals are the differences between each individual observation and its group mean (the predicted value for that observation under the model). They represent within-group variation. The sum of squared residuals in ANOVA is the within-groups sum of squares (SSW), which is compared against the between-groups sum of squares (SSB) in the F-test. Residual diagnostics for ANOVA follow the same logic as for regression. See the ANOVA guide for details.