What Are Residuals?
When you fit a regression line through data, the line does not pass through every point. The vertical distance from each data point to the regression line is that observation's residual. A point sitting above the line has a positive residual; a point below has a negative residual; a point sitting exactly on the line has a zero residual.
The word residual comes from the Latin residuus, meaning "remaining." In modeling terms, it is what remains unexplained after the model accounts for the relationship between the predictor and the outcome. That unexplained portion contains two things: genuine random variation in the data and any systematic patterns the model failed to capture. Residual analysis is the process of examining those leftovers to decide which of those two it is.
Residuals are the observable estimates of the true error term, ε, in the population regression equation. You can never see ε directly because the true population line is unknown. What you can see, and study, are the residuals from your fitted sample line. Everything described on this page uses the simple linear regression and multiple linear regression framework developed across Statistics Fundamentals.
Actual vs Predicted Values
Every observation in a regression comes with two numbers: the actual (observed) value y and the predicted value ŷ (pronounced "y-hat"). The actual value is what you measured. The predicted value is what your fitted model outputs when you plug in the predictor values for that observation.
Think of a model predicting apartment rents based on square footage. An apartment with 800 sq ft might rent for $1,450. The model, based on the regression line, predicts $1,380 for that size. The residual is $1,450 − $1,380 = $70. The model underestimated rent by $70 for this unit. That $70 is unexplained by square footage alone — perhaps because of location, floor level, or renovation quality. Examining many residuals together tells you whether your single predictor is enough or whether you need more variables.
Why Residuals Matter
Residuals are not just a byproduct of fitting a model. They are a diagnostic tool that directly tells you whether the core assumptions of ordinary least squares regression hold. Those assumptions are linearity, constant variance (homoscedasticity), independence, and normality of errors. When they hold, statistical inference from the model — p-values, confidence intervals, predictions — is valid. When they do not, the inference can mislead you badly.
Patterns in residuals reveal specific problems. A curved pattern suggests the true relationship is nonlinear. A funnel shape where residuals spread out as fitted values increase signals heteroscedasticity. Runs of same-signed residuals in time order suggest autocorrelation. A few very large residuals point to outliers or data entry errors. Identifying and addressing these issues leads to more reliable models and better predictions.
Residuals are the differences between observed values and predicted values in a regression model. The formula is e = y − ŷ. Positive residuals indicate underprediction; negative residuals indicate overprediction. Residual analysis tests whether model assumptions hold and whether the model fits the data adequately.
Residual Formula Library
Basic Residual Formula
ei = yi − ŷi
ei = residual for observation i
yi = observed (actual) value
ŷi = predicted value from the model
This is the foundational formula. For any regression model, whether simple or multiple, linear or polynomial, the raw residual is always actual minus predicted. Note the direction: it is always observed minus predicted, never predicted minus observed. This convention means a positive residual tells you the model undershot the actual value, and a negative residual tells you it overshot.
Standardized Residual Formula
ri = ei / se
ri = standardized residual
ei = raw residual
se = residual standard error
Dividing the raw residual by the residual standard error puts all residuals on a common scale, regardless of the units of the response variable. A standardized residual of 2.5 tells you that observation is 2.5 standard deviations away from the fitted value — regardless of whether your response is measured in dollars, kilograms, or percentages. This makes outlier screening consistent across datasets.
Studentized (Externally Studentized) Residual Formula
ti = ei / (s(i) √(1 − hii))
s(i) = model SE with observation i deleted
hii = leverage (hat value) for obs i
ti follows t-distribution with n−p−2 df
Studentized residuals improve on standardized residuals by accounting for the fact that deleting an influential observation would actually change the model's error estimate. They follow a t-distribution under the null hypothesis that no outliers are present, so you can compute exact p-values for outlier tests. These are the residuals preferred in formal regression diagnostics software.
Sum of Squared Residuals (SSR)
SSR = Σ(yi − ŷi)²
SSR is also called RSS or SSE in some texts
OLS minimizes this quantity
Used to compute R², MSE, and F-statistic
Ordinary least squares (OLS) estimation finds the regression coefficients that minimize the sum of squared residuals. Squaring serves two purposes: it penalizes large errors more than small ones, and it eliminates the sign so positive and negative residuals do not cancel. The SSR feeds directly into R-squared, mean squared error, and the F-test for overall model significance. For more on how these connect, see the guide to ANOVA in regression.
Residual Standard Error and Residual Variance
s² = SSR / (n − p − 1)
se = √s² = √(SSR / (n − p − 1))
n = number of observations
p = number of predictors
n − p − 1 = residual degrees of freedom
The residual standard error (often reported as RSE or s) measures the typical distance between observations and the fitted regression line, expressed in the same units as the response variable. Strictly, it is a root-mean-square distance with a degrees-of-freedom correction rather than a simple average of absolute misses. A model predicting house prices with RSE = $25,000 typically misses by roughly $25,000. Smaller RSE means a closer fit. The denominator n − p − 1 corrects for the degrees of freedom lost when estimating the regression coefficients. See the degrees of freedom guide for details on why this correction matters.
| Formula | Use Case | Key Property |
|---|---|---|
| e = y − ŷ | Calculate any individual residual | Signed; sum = 0 in OLS |
| r = e / se | Standardized outlier screening | Unitless; compare across models |
| ti = e / (s(i)√(1−hii)) | Formal outlier significance test | Follows t-distribution |
| SSR = Σ(y − ŷ)² | Measure total unexplained variation | Minimized by OLS |
| s² = SSR/(n−p−1) | Residual variance estimate | Used in se, R², F-test |
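The formulas in this library translate almost line for line into code. The sketch below is a minimal illustration using only the Python standard library; the function names and the four-observation dataset are hypothetical and chosen only to keep the arithmetic easy to follow.

```python
import math

def residuals(observed, predicted):
    """Raw residuals: observed minus predicted, one per observation."""
    return [y - y_hat for y, y_hat in zip(observed, predicted)]

def sum_squared_residuals(resid):
    """SSR = sum of e_i^2, the quantity OLS minimizes."""
    return sum(e * e for e in resid)

def residual_variance(resid, n_predictors):
    """s^2 = SSR / (n - p - 1)."""
    dof = len(resid) - n_predictors - 1
    if dof <= 0:
        raise ValueError("need more observations than estimated coefficients")
    return sum_squared_residuals(resid) / dof

def residual_standard_error(resid, n_predictors):
    """se = sqrt(s^2), in the units of the response variable."""
    return math.sqrt(residual_variance(resid, n_predictors))

def standardized_residuals(resid, n_predictors):
    """r_i = e_i / se, a unitless version of each residual."""
    se = residual_standard_error(resid, n_predictors)
    return [e / se for e in resid]

# Hypothetical observed and predicted values from a one-predictor model
observed = [10.0, 12.0, 15.0, 19.0]
predicted = [11.0, 12.0, 14.0, 20.0]

e = residuals(observed, predicted)       # [-1.0, 0.0, 1.0, -1.0]
ssr = sum_squared_residuals(e)           # 3.0
s2 = residual_variance(e, 1)             # 3.0 / (4 - 1 - 1) = 1.5
r = standardized_residuals(e, 1)
```

The studentized residual is left out here because it needs the leverage values and a leave-one-out refit, which depend on the predictor data and not only on the residuals.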
How to Calculate Residuals — Step by Step
Calculating a residual is straightforward once you have a fitted model. The six steps below walk through the process from start to finish, whether you are doing it by hand or checking software output.
Obtain the Observed Value (y)
Record the actual measured outcome for each observation. This is your raw data — the thing you measured or recorded, before any modeling. In a dataset of student exam scores, y is the actual score each student received.
Obtain the Predicted Value (ŷ)
Plug each observation's predictor values into the fitted regression equation to generate ŷ. For simple linear regression ŷ = β̂₀ + β̂₁x. For multiple regression, substitute all predictor values. Statistical software computes these automatically when you run a regression.
Subtract: e = y − ŷ
Apply the formula. Always observed minus predicted. A student who scored 88 when the model predicted 80 has a residual of 88 − 80 = +8. Never reverse the subtraction — the sign carries meaning about the direction of the miss.
Interpret the Direction
A positive residual means the model underestimated — the actual value exceeded the prediction. A negative residual means the model overestimated. A zero residual means the prediction was exact. Direction matters for identifying systematic bias in the model.
Evaluate the Magnitude
Compare the size of the residual to the residual standard error. A residual of 5 in a model with se = 1 is very large (5 SDs away). The same residual of 5 in a model with se = 20 is unremarkable (0.25 SDs). Context from the standardized residual is necessary for any meaningful size judgment.
Examine All Residuals Together
Plot residuals against fitted values. Look for random scatter (good) vs. patterns (bad). A single large residual might be an outlier or a data error. A systematic pattern across all residuals is a model specification problem. The residual plot guide below covers all the patterns and what they mean.
The most frequent error is computing ŷ − y instead of y − ŷ. This reverses the sign of every residual and inverts the interpretation. Always check: positive residual = model underpredicted = actual was higher than predicted.
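The first five steps can be sketched as a small function. This is an illustration only: the coefficients (50 and 4), the study time of 7.5 hours, and the residual standard error of 4.0 are assumed values picked so that the prediction matches the 88 versus 80 student example above.

```python
def predict(b0, b1, x):
    """Step 2: plug the predictor value into the fitted line."""
    return b0 + b1 * x

def describe_residual(observed, predicted, se):
    """Steps 3 to 5: subtract, read the direction, judge the magnitude."""
    residual = observed - predicted           # always observed minus predicted
    if residual > 0:
        direction = "model underestimated"
    elif residual < 0:
        direction = "model overestimated"
    else:
        direction = "exact prediction"
    return {
        "residual": residual,
        "direction": direction,
        "standardized": residual / se,        # size relative to the typical miss
    }

# Step 1: the observed score. All other numbers are hypothetical.
observed_score = 88
predicted_score = predict(b0=50, b1=4, x=7.5)   # 80.0
result = describe_residual(observed_score, predicted_score, se=4.0)
```

Step 6, examining all residuals together, is a plotting task and is not covered by this sketch.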
Residual Interpretation Guide
| Residual Value | Direction | Meaning | Action |
|---|---|---|---|
| Positive (e > 0) | Upward miss | Model underestimated; actual was higher than predicted | No action if random |
| Negative (e < 0) | Downward miss | Model overestimated; actual was lower than predicted | No action if random |
| Near zero | On target | Model predicted accurately for this observation | Good sign |
| Large positive (standardized > +2) | Significant underestimate | Possible outlier or missing predictor | Investigate observation |
| Large negative (standardized < −2) | Significant overestimate | Possible outlier or data error | Investigate observation |
| Standardized beyond ±3 | Extreme deviation | Strong outlier candidate | Check for data entry error; assess Cook's D |
Residual interpretation works at two levels: the individual level and the collective level. At the individual level, a large residual for a single observation raises a flag that you investigate specifically — was the data recorded correctly? Is this a genuinely unusual case? At the collective level, you are looking at the distribution and pattern of all residuals together. The collective view is more diagnostic; it tells you about the model, not just about individual data points.
One property of OLS regression worth remembering: the residuals always sum to zero when the model includes an intercept. This means the model cannot be consistently biased upward or downward across all observations. Individual residuals can be large in either direction, but they balance out. This also means the mean residual is zero by construction. If you find the mean of your residuals is not zero, something is wrong with how the model was estimated.
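A quick numerical check makes the role of the intercept concrete. The sketch below fits the same hypothetical five-point dataset twice, once with an intercept and once forced through the origin, using the textbook closed-form least squares formulas. Only the intercept model is guaranteed to have residuals that sum to zero.

```python
def fit_with_intercept(x, y):
    """Closed-form simple OLS: returns (intercept, slope)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

def fit_through_origin(x, y):
    """Least squares slope for the no-intercept model y = b * x."""
    return sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)

# Hypothetical data
x = [1, 2, 3, 4, 5]
y = [3.1, 4.9, 7.2, 8.8, 11.5]

b0, b1 = fit_with_intercept(x, y)
resid_with = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]

b = fit_through_origin(x, y)
resid_without = [yi - b * xi for xi, yi in zip(x, y)]

sum_with = sum(resid_with)         # zero up to floating-point rounding
sum_without = sum(resid_without)   # not zero for this dataset
```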
Worked Examples
Example 1 — Simple Linear Regression
Dataset: Hours studied (x) vs Exam score (y). Fitted regression: ŷ = 50 + 4x. Calculate residuals for all five students and the SSR.
| Student | Hours (x) | Score (y) | Predicted ŷ = 50 + 4x | Residual e = y − ŷ | e² |
|---|---|---|---|---|---|
| A | 3 | 60 | 62 | −2 | 4 |
| B | 5 | 73 | 70 | +3 | 9 |
| C | 7 | 80 | 78 | +2 | 4 |
| D | 9 | 85 | 86 | −1 | 1 |
| E | 10 | 93 | 90 | +3 | 9 |
| Sum (Σ) | | | | +5 | 27 |
Note: The sum of residuals is +5, not zero. This occurs because the coefficients shown (50, 4) are not the exact OLS solution for this particular dataset — they are illustrative. The actual OLS fit would force Σe = 0 exactly.
SSR calculation: SSR = 4 + 9 + 4 + 1 + 9 = 27
Residual variance: s² = SSR / (n − p − 1) = 27 / (5 − 1 − 1) = 27/3 = 9
Residual standard error: se = √9 = 3 points. The model's predictions are off by about 3 exam score points on average.
✅ The residuals are small and mixed in sign, suggesting a reasonable fit. With se = 3, a typical prediction in this range of study hours misses by roughly 3 exam points.
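The table can be reproduced in a few lines, which is a useful habit when checking hand calculations. The sketch below recomputes Example 1 with the illustrative line ŷ = 50 + 4x and then, as an added comparison that is not part of the original example, fits the actual least squares line to the same five points to show that its residuals sum to zero and its SSR is smaller.

```python
import math

hours = [3, 5, 7, 9, 10]
scores = [60, 73, 80, 85, 93]

# Illustrative line from the example: y_hat = 50 + 4x
predicted = [50 + 4 * x for x in hours]
resid = [y - y_hat for y, y_hat in zip(scores, predicted)]
ssr = sum(e ** 2 for e in resid)
s2 = ssr / (len(scores) - 1 - 1)      # n - p - 1 with p = 1 predictor
se = math.sqrt(s2)

# Actual OLS line for the same data (closed-form simple regression)
n = len(hours)
x_bar = sum(hours) / n
y_bar = sum(scores) / n
sxx = sum((x - x_bar) ** 2 for x in hours)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(hours, scores))
slope = sxy / sxx
intercept = y_bar - slope * x_bar
ols_resid = [y - (intercept + slope * x) for x, y in zip(hours, scores)]
ols_ssr = sum(e ** 2 for e in ols_resid)

print(resid, sum(resid), ssr, se)     # [-2, 3, 2, -1, 3] 5 27 3.0
```

The least squares slope comes out a little steeper than 4 and the intercept a little below 50, which is why the illustrative line leaves a net positive residual sum.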
Example 2 — House Price Prediction
A model predicts house prices using square footage: ŷ = 80,000 + 150x. A 1,400 sq ft house sold for $310,000. What is the residual?
Compute ŷ: ŷ = 80,000 + 150 × 1,400 = 80,000 + 210,000 = $290,000
Compute residual: e = 310,000 − 290,000 = +$20,000
Interpret: The model underpredicted by $20,000. This house sold for more than predicted based on size alone — likely because of location, renovation quality, or other features not in the model.
✅ Residual = +$20,000 (positive = underestimated). The house commanded a $20,000 premium above what the model predicted from square footage alone.
Example 3 — Sales Forecasting
A time-series regression model predicts monthly sales: ŷ = 12,000 + 800t (where t = month number). In month 6, actual sales were $15,200. Calculate and interpret the residual.
Predicted sales in month 6: ŷ = 12,000 + 800(6) = 12,000 + 4,800 = $16,800
Residual: e = 15,200 − 16,800 = −$1,600
Interpretation: The model overestimated sales by $1,600 in month 6. If this pattern (model consistently overpredicting in certain months) repeats, it signals seasonality that the linear trend model is not capturing. The model needs a seasonal component.
✅ Residual = −$1,600 (negative = overestimated). Investigate whether negative residuals cluster in particular months — that would signal a seasonal pattern the model is missing.
Residual Plots Explained
A residual plot is a scatterplot with fitted values (ŷ) on the x-axis and residuals (e) on the y-axis. It is the single most useful diagnostic tool in regression analysis. Reading a residual plot requires knowing what good looks like versus what each type of bad pattern signals.
The Six Residual Plot Patterns
Random Scatter
Residuals scattered randomly above and below zero with no trend and roughly constant spread. This is what you want to see: it is consistent with linearity, constant variance, and independence.
Funnel (Fan) Shape
Residuals spread out as fitted values increase (or decrease). This signals heteroscedasticity — the variance of errors is not constant. Consider log-transforming the response variable or using weighted least squares.
Curved (U or Arch) Pattern
A systematic curve in residuals means the true relationship is nonlinear but you fit a linear model. Add a quadratic term (x²) or transform the predictor. This is a model specification problem, not a data problem.
Upward or Downward Trend
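When residuals systematically increase or decrease when plotted against time, observation order, or a variable left out of the model, a relevant predictor is likely missing. Adding the missing variable typically removes the trend. Note that OLS residuals are uncorrelated with the fitted values by construction, so a straight-line trend against ŷ itself usually means the line was not fitted by least squares.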
Isolated Outlier
One or a few points sit far from the rest. Large standardized residuals (beyond ±3) warrant individual investigation. The observation may have been recorded incorrectly, or it may be a genuinely unusual case worth studying separately.
Clustering
Residuals cluster into distinct groups rather than distributing evenly. This often indicates a categorical variable (such as group membership or time period) that was not included in the model as a predictor.
Random scatter? (Yes) | Visible trend? (No) | Constant spread? (Yes) | Clustering? (No) | Curvature? (No) | Extreme isolated points? (Minimal) — All six conditions met = assumptions satisfied.
How to Read a Residual Plot
When examining a residual plot, look at the plot three ways. First, draw a mental horizontal reference line at zero and ask whether the cloud of points has a roughly equal density above and below it throughout the range of fitted values. Second, check whether the vertical spread of points stays consistent from left to right, or whether it fans out or compresses. Third, look for any points that sit far outside the main cloud, either very high or very low.
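The first two visual checks have a rough numerical counterpart: sort the observations by fitted value, cut them into a few bins, and compare the share of positive residuals and the spread in each bin. The sketch below is one simple way to do that and is not a formal test. The function name, the choice of three bins, and the data are all hypothetical.

```python
import statistics

def summarize_by_bin(fitted, resid, n_bins=3):
    """Share of positive residuals and residual spread per bin of fitted values."""
    pairs = sorted(zip(fitted, resid))        # order by fitted value
    size = len(pairs) // n_bins
    summary = []
    for b in range(n_bins):
        start = b * size
        end = (b + 1) * size if b < n_bins - 1 else len(pairs)
        chunk = [e for _, e in pairs[start:end]]
        summary.append({
            "share_positive": sum(e > 0 for e in chunk) / len(chunk),
            "spread": statistics.pstdev(chunk),
        })
    return summary

# Hypothetical residuals whose spread grows with the fitted value (a funnel)
fitted = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
resid = [0.1, -0.1, 0.2, -0.2, 0.5, -0.6, 0.7, -0.5, 1.5, -1.8, 2.0, -1.6]

summary = summarize_by_bin(fitted, resid)
for row in summary:
    print(row)
```

Here every bin is balanced around zero, but the spread grows steadily from the first bin to the last, which is the numerical signature of a funnel shape.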
Most statistical software packages (R, Python's statsmodels, SPSS, Stata) can generate residual plots with a single command after regression. In R, plot(model) produces four diagnostic plots including the residuals vs fitted plot and the normal Q-Q plot of residuals. In Python, you can visualize residuals using plt.scatter(model.fittedvalues, model.resid), where model is the fitted results object returned by statsmodels. For interactive visual tools, the regression scatter plot tool on this site lets you explore model fit visually.
Standardized Residuals
Raw residuals have units tied to the response variable, which makes it hard to compare them across observations or models. Standardized residuals solve this by dividing the raw residual by the residual standard error, producing a unitless score. Think of it like a z-score for model errors.
| Standardized Residual Value | Interpretation | Typical Action |
|---|---|---|
| 0 to ±1 | Typical observation — close to the regression line | No action needed |
| ±1 to ±2 | Moderate deviation — within normal range | No action; note for pattern review |
| ±2 to ±3 | Potential issue — unusually large miss | Review the observation |
| Beyond ±3 | Strong outlier candidate | Investigate for data error; compute Cook's D |
About 95% of standardized residuals from a well-fitted model should fall within ±2. If you see more than 5% of your observations outside that range, the model may be misfitting or the residuals may not be normally distributed. For the normal distribution context, see the empirical rule and the normal distribution guide.
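The screening bands in the table and the 5% rule of thumb are easy to automate. The sketch below assumes you already have a list of standardized residuals; the ten values shown are hypothetical and deliberately include a few large ones.

```python
def band(r):
    """Map a standardized residual to the screening bands in the table above."""
    size = abs(r)
    if size <= 1:
        return "typical"
    if size <= 2:
        return "moderate"
    if size <= 3:
        return "review"
    return "strong outlier candidate"

def share_outside(standardized, cutoff=2.0):
    """Fraction of observations whose standardized residual exceeds the cutoff."""
    return sum(abs(r) > cutoff for r in standardized) / len(standardized)

# Hypothetical standardized residuals
standardized = [0.3, -1.2, 2.4, -0.8, 3.5, 0.1, -2.1, 0.9, 1.1, -0.4]

labels = [band(r) for r in standardized]
outside = share_outside(standardized)   # 0.3 here, well above the 5% guideline
```

With only ten observations the 5% guideline is crude, so a result like this is a prompt to look at the flagged observations, not a verdict on the model.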
Studentized Residuals
Studentized residuals (also called externally studentized residuals or jackknife residuals) take the standardization one step further: they account for each observation's leverage. Leverage measures how far a predictor value sits from the mean of all predictor values. An observation with high leverage has a large effect on the regression line and would, if deleted, substantially change the estimated coefficients and the error variance.
The key distinction from standardized residuals is that the denominator uses s(i) — the residual standard error estimated from the model with observation i removed — rather than the full-sample se. This makes studentized residuals more sensitive to influential observations because a truly influential outlier inflates se, making standardized residuals look smaller than they should. Studentized residuals reveal those cases properly.
| Aspect | Standardized Residual | Studentized Residual |
|---|---|---|
| Formula denominator | Full-sample se | Leave-one-out s(i) × √(1−hii) |
| Distribution under H₀ | Approximately N(0,1) | t(n−p−2) |
| Outlier threshold | ±3 (informal) | Bonferroni-corrected t critical value (formal) |
| Better for | Quick screening | Formal outlier significance testing |
| Available in | All regression software | R (rstudent()), Python (get_influence()) |
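To make the leave-one-out idea concrete, the sketch below computes externally studentized residuals for a simple (one-predictor) regression by literally refitting the model without each observation. It is a minimal illustration under stated assumptions: the helper names and the eight data points are hypothetical, the leverage formula used is the one for simple regression, and real software uses an algebraic shortcut instead of n separate refits.

```python
import math

def ols_fit(x, y):
    """Closed-form simple OLS: returns (intercept, slope)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

def ssr_of_fit(x, y):
    """Sum of squared residuals of the OLS line fitted to (x, y)."""
    b0, b1 = ols_fit(x, y)
    return sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))

def studentized_residuals(x, y):
    """t_i = e_i / (s(i) * sqrt(1 - h_ii)) for a one-predictor model."""
    n, p = len(x), 1
    b0, b1 = ols_fit(x, y)
    x_bar = sum(x) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    out = []
    for i in range(n):
        e_i = y[i] - (b0 + b1 * x[i])                 # residual from the full fit
        h_ii = 1 / n + (x[i] - x_bar) ** 2 / sxx      # leverage in simple regression
        x_del, y_del = x[:i] + x[i + 1:], y[:i] + y[i + 1:]
        # s(i): residual standard error of the refit without observation i,
        # which has (n - 1) - p - 1 = n - p - 2 degrees of freedom
        s_del = math.sqrt(ssr_of_fit(x_del, y_del) / (n - p - 2))
        out.append(e_i / (s_del * math.sqrt(1 - h_ii)))
    return out

# Hypothetical data: the last point sits well above an otherwise clean line
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 22.0]

t = studentized_residuals(x, y)
```

Because the other seven points lie almost exactly on a line, removing the last point shrinks s(i) dramatically, and its studentized residual is far larger than its standardized residual would suggest.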
Residual Diagnostics Framework
Residual diagnostics is the systematic process of examining model residuals to check whether the four assumptions of linear regression hold. Each assumption has a dedicated diagnostic method. The framework below covers all four, plus outlier detection and influential observation analysis.
Detecting Nonlinearity
Plot residuals against each predictor separately, and also against fitted values. If a curved pattern appears — a U-shape, an arch, or an S-curve — the assumption of linearity is violated for that predictor. The fix is usually to add a polynomial term (x²) or apply a transformation such as log(x) or √x. If you fit a nonlinear relationship with a linear model, every prediction will be wrong in a predictable direction, which is exactly the kind of error residual analysis catches.
Detecting Heteroscedasticity
Heteroscedasticity means the variance of residuals is not constant across the range of fitted values. It shows as a funnel or fan shape in the residuals vs fitted plot. It does not bias the coefficient estimates, but it makes standard errors incorrect, which means p-values and confidence intervals from the model cannot be trusted. The formal test is the Breusch-Pagan test or the White test. The practical fix is usually to log-transform the response variable or use heteroscedasticity-consistent (robust) standard errors. For more on confidence intervals and why their accuracy depends on constant variance, see the confidence intervals guide.
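The idea behind the Breusch-Pagan test can be sketched without any statistics package: regress the squared residuals on the predictor and ask whether that auxiliary regression explains anything. The version below is Koenker's studentized variant (LM = n × R² of the auxiliary regression), restricted to one predictor so that the statistic has 1 degree of freedom and its p-value can be obtained from math.erfc. The function names and the data are hypothetical, and for real work a library implementation is the safer choice.

```python
import math

def ols_fit(x, y):
    """Closed-form simple OLS: returns (intercept, slope)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

def r_squared(x, y):
    """R^2 = 1 - SSR/SST for the OLS line of y on x."""
    b0, b1 = ols_fit(x, y)
    y_bar = sum(y) / len(y)
    ssr = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    sst = sum((yi - y_bar) ** 2 for yi in y)
    return 1 - ssr / sst

def koenker_bp_test(x, y):
    """LM statistic and p-value for one predictor (chi-square, 1 df)."""
    b0, b1 = ols_fit(x, y)
    squared = [(yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y)]
    lm = max(len(x) * r_squared(x, squared), 0.0)   # guard against rounding below 0
    p_value = math.erfc(math.sqrt(lm / 2))          # P(chi-square with 1 df > lm)
    return lm, p_value

# Hypothetical data: the noise term grows in proportion to x
x = list(range(1, 21))
y = [2 * xi + 0.3 * xi * (-1) ** xi for xi in x]

lm, p_value = koenker_bp_test(x, y)
```

A large LM statistic with a small p-value says the squared residuals move with the predictor, which is the numerical counterpart of the funnel shape.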
Detecting Autocorrelation
In time series or spatial data, residuals from one observation may be correlated with residuals from nearby observations. This violates the independence assumption. Plotting residuals in time order (residuals vs observation sequence) often reveals it directly as a wave pattern. The Durbin-Watson statistic is the standard test: values near 2 indicate no autocorrelation, values near 0 or 4 signal positive or negative autocorrelation respectively. If detected, consider adding lagged predictors or switching to a time series model.
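The Durbin-Watson statistic is simple enough to compute by hand: it is the sum of squared differences between consecutive residuals divided by the sum of squared residuals. The sketch below applies that definition to two hypothetical residual sequences, one that drifts in a slow wave and one that flips sign at every step.

```python
def durbin_watson(resid):
    """DW = sum((e_t - e_{t-1})^2) / sum(e_t^2), residuals in time order."""
    numerator = sum((resid[t] - resid[t - 1]) ** 2 for t in range(1, len(resid)))
    denominator = sum(e * e for e in resid)
    return numerator / denominator

# Hypothetical residuals in time order
wave = [1, 2, 3, 2, 1, -1, -2, -3, -2, -1]          # neighbours look alike
alternating = [1, -1, 1, -1, 1, -1, 1, -1, 1, -1]   # neighbours have opposite signs

dw_wave = durbin_watson(wave)                 # 12 / 38, close to 0
dw_alternating = durbin_watson(alternating)   # 36 / 10, close to 4
```

The wave pattern lands near 0 (positive autocorrelation) and the alternating pattern lands near 4 (negative autocorrelation), matching the interpretation above.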
Checking Normality of Residuals
The OLS estimator does not require normally distributed residuals for unbiased coefficient estimates — it only requires them for valid hypothesis tests on small samples. For large samples, the Central Limit Theorem makes the normality assumption less critical. The standard check is the normal Q-Q plot of residuals: if residuals are normally distributed, points fall along a straight diagonal line. Departures at the tails indicate heavy or light tails. A histogram of residuals provides a complementary view. For background on normal distributions and QQ plots, see the QQ plots guide and the normal distribution reference.
Detecting Outliers and Influential Observations
An outlier is an observation with an unusually large residual — it sits far from the regression line. An influential observation is one whose inclusion substantially changes the estimated coefficients. These two concepts are related but distinct: an outlier is not always influential, and an influential observation does not always have a large residual. Cook's Distance combines residual size and leverage to measure overall influence. An observation with Cook's D greater than 1 (or more conservatively, 4/n) deserves scrutiny. For the full treatment, see how outliers are handled in descriptive statistics — many of the same principles apply here.
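For a one-predictor model, Cook's Distance can be computed directly from each residual and its leverage using the standard formula D = e² · h / (k · s² · (1 − h)²), where k is the number of estimated coefficients (2 here: intercept and slope). The sketch below is illustrative only; the function name and the data are hypothetical, and the leverage formula is the one for simple regression.

```python
def cooks_distances(x, y):
    """Cook's D for each observation in a simple (one-predictor) OLS fit."""
    n, k = len(x), 2                          # k = intercept + slope
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    resid = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    s2 = sum(e * e for e in resid) / (n - k)  # residual variance
    distances = []
    for xi, e in zip(x, resid):
        h = 1 / n + (xi - x_bar) ** 2 / sxx   # leverage in simple regression
        distances.append(e * e * h / (k * s2 * (1 - h) ** 2))
    return distances

# Hypothetical data: the last point has both high leverage and a large residual
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 22.0]

d = cooks_distances(x, y)
flagged = [i for i, di in enumerate(d) if di > 4 / len(x)]   # 4/n screening rule
```

The first observation has the same leverage as the last one but a small residual, so its Cook's D stays low. That illustrates why leverage alone does not make a point influential.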
| Assumption Violated | Residual Plot Signal | Formal Test | Common Fix |
|---|---|---|---|
| Linearity | Curved pattern vs fitted values or x | RESET test | Add polynomial term; transform predictor |
| Homoscedasticity | Funnel / fan shape | Breusch-Pagan, White | Log transform response; robust SE |
| Independence | Wave pattern vs time or sequence | Durbin-Watson | Lagged predictors; time series model |
| Normality of errors | Curved QQ plot; skewed histogram | Shapiro-Wilk, K-S | Transform response; use n > 30 (CLT) |
| Outliers | Isolated extreme points | Studentized residual t-test | Investigate; winsorize; robust regression |
Residuals vs Related Concepts
Residuals vs Errors
The error term ε in the population model y = β₀ + β₁x + ε represents the true unobservable deviation between each data point and the population regression line. You never see ε because you never know the true β₀ and β₁. What you see are residuals, which are the estimates of ε computed from your fitted line ŷ = β̂₀ + β̂₁x. Residuals are observable; errors are not. Residuals depend on the coefficients estimated from your specific sample; errors are defined relative to the fixed population line and do not change with the fit. This distinction matters for understanding what residual analysis can and cannot tell you about the underlying population.
Residuals vs Prediction Errors
A training residual measures how far the model misses on observations it was fitted on. A prediction error (also called a test error) measures how far the model misses on new, unseen observations. Training residuals are typically smaller than prediction errors because the model was explicitly optimized to minimize training residuals. Cross-validation uses held-out data to estimate true prediction errors, which is why it gives a more honest picture of model performance than training residuals alone.
Residuals vs R-Squared
R-squared (the coefficient of determination) summarizes model fit as a single number: the proportion of variance in y explained by the model. It has a direct relationship to residuals: R² = 1 − SSR/SST, where SST is the total sum of squares. A model with small residuals has small SSR and therefore high R². A model with large residuals has large SSR and low R². R² alone does not reveal whether assumptions hold — residual plots are needed for that. See the simple linear regression page for the full decomposition.
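The relationship R² = 1 − SSR/SST can be checked with the five-student dataset from Example 1. The sketch below uses that example's illustrative line ŷ = 50 + 4x; because those coefficients are not the exact least squares solution, the result is slightly lower than the R² the true OLS fit would give.

```python
def r_squared(observed, predicted):
    """R^2 = 1 - SSR/SST."""
    y_bar = sum(observed) / len(observed)
    ssr = sum((y - y_hat) ** 2 for y, y_hat in zip(observed, predicted))
    sst = sum((y - y_bar) ** 2 for y in observed)
    return 1 - ssr / sst

hours = [3, 5, 7, 9, 10]
scores = [60, 73, 80, 85, 93]
predicted = [50 + 4 * x for x in hours]   # illustrative line from Example 1

r2 = r_squared(scores, predicted)         # SSR = 27, SST = 626.8, R^2 is about 0.957
```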
Real-World Applications
Healthcare Research
Residual analysis in clinical trials checks whether linear models for treatment effects hold across patient subgroups. Large residuals for specific age ranges or comorbidities indicate that subgroup-specific models may be needed.
Financial Modeling
In factor models for asset returns, residuals represent idiosyncratic (stock-specific) risk. Analyzing residuals helps separate explained systematic risk from unexplained firm-level variance — critical for portfolio optimization.
Operations & Forecasting
Demand forecasting models use residual patterns to identify when models break down — seasonal dips, promotional spikes, or supply shocks create recognizable residual patterns that trigger model retraining.
Machine Learning
Gradient boosting algorithms learn by iteratively fitting new trees to the residuals of the current ensemble. Each new tree models the unexplained portion of the previous model — this is literally residual learning.
Scientific Research
Residuals from regression on experimental data show whether the model captures the full mechanism. A curved residual pattern in a dose-response study, for example, often means the model needs a pharmacokinetic component.
Engineering
In quality control and process engineering, residuals from process models are monitored on control charts. Residuals trending in one direction signal process drift that needs corrective action before products go out of spec.
Residuals in Machine Learning
The concept of a residual maps directly onto error analysis in machine learning. For any regression algorithm — linear regression, gradient boosting, random forest regression, or neural network regression — the training residual for an observation is y − ŷ, exactly the same formula as in classical statistics. The difference lies in how these algorithms use residuals.
Gradient boosting models make residual learning explicit. The algorithm fits an initial model, computes the residuals, fits a new weak learner (typically a shallow decision tree) to those residuals, and adds a fraction of that tree to the ensemble. The next iteration fits a tree to the new residuals (the portion still unexplained). This process iterates until residuals are small enough or a stopping criterion is met. XGBoost, LightGBM, and CatBoost all work this way. The residuals at each stage are more precisely called pseudo-residuals or negative gradients of the loss function, but for squared error loss they are identical to ordinary residuals.
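The loop described above can be sketched in plain Python for squared error loss. This is a toy illustration of the idea, not how XGBoost, LightGBM, or CatBoost are implemented: the weak learner is a single-split regression stump on one feature, and the data, learning rate, and number of rounds are all hypothetical.

```python
def fit_stump(x, target):
    """Best single-split stump (least squares) predicting target from x."""
    best = None
    values = sorted(set(x))
    for lo, hi in zip(values, values[1:]):
        threshold = (lo + hi) / 2
        left = [t for xi, t in zip(x, target) if xi <= threshold]
        right = [t for xi, t in zip(x, target) if xi > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((t - left_mean) ** 2 for t in left)
               + sum((t - right_mean) ** 2 for t in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return lambda xi: left_mean if xi <= threshold else right_mean

def boost(x, y, n_rounds=20, learning_rate=0.5):
    """Repeatedly fit a stump to the current residuals and add a fraction of it."""
    prediction = [sum(y) / len(y)] * len(y)      # initial model: the mean of y
    history = []                                 # training SSR before each round
    for _ in range(n_rounds):
        resid = [yi - pi for yi, pi in zip(y, prediction)]
        history.append(sum(e * e for e in resid))
        stump = fit_stump(x, resid)              # weak learner fitted to residuals
        prediction = [pi + learning_rate * stump(xi)
                      for pi, xi in zip(prediction, x)]
    return prediction, history

# Hypothetical nonlinear data
x = list(range(10))
y = [xi ** 2 for xi in x]

prediction, history = boost(x, y)
```

The history list shows the training sum of squared residuals shrinking round after round, which is the residual learning described in the paragraph above. It says nothing about validation error, so it cannot by itself detect overfitting.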
In supervised machine learning generally, monitoring residuals on validation data over the training process helps detect overfitting: training residuals keep shrinking while validation residuals stop improving or grow. This is the classic overfitting diagnostic, and it is fundamentally a comparison of training versus test residuals.
Frequently Asked Questions
A residual in statistics is the difference between an observed data value and the value predicted by a model: e = y − ŷ. In regression analysis, residuals represent the portion of the observed value that the model could not explain. They are used to evaluate model fit, detect outliers, and test whether regression assumptions are satisfied.
Think of a model as making a prediction about each observation. The residual is simply how wrong it was. If a model predicts you'll score 70 on a test and you score 78, the residual is 78 − 70 = +8. The model underestimated by 8 points. If you score 65, the residual is −5 — the model overestimated by 5 points. The word "residual" means what remains after the model's explanation is accounted for.
Plug each observation's predictor value into the fitted regression equation to get ŷ, then subtract from the actual y. For ŷ = 10 + 3x and an observation with x = 5, y = 27: ŷ = 10 + 3(5) = 25; residual = 27 − 25 = 2. Most statistical software calculates and stores residuals automatically — in R use residuals(model), in Python/statsmodels use model.resid.
Residuals are frequently negative. A negative residual means the model overestimated — the actual value was less than predicted. In ordinary least squares regression, the residuals must sum to zero (when the model includes an intercept), which means there are always both positive and negative residuals, and they balance out exactly.
There is no single threshold for a "good" residual because the scale depends entirely on your outcome variable's units. What you can assess is the size relative to the residual standard error. Standardized residuals within ±2 are considered typical for 95% of observations. The overall goodness of fit is better assessed by looking at all residuals together: small, randomly scattered residuals = good fit.
Random scatter in the residual plot confirms that the model has captured the systematic relationship between the predictor and response. If scatter is truly random, there is nothing left for the model to exploit — the remaining variation is noise. A non-random pattern means there is still structure in the data that the model missed, which signals a modeling problem worth addressing.
Standardized residuals divide the raw residual by the full-sample residual standard error. Studentized (externally studentized) residuals divide by the residual standard error estimated with that specific observation left out, and also account for leverage. Studentized residuals are more sensitive to influential observations and follow a known t-distribution, enabling formal statistical outlier tests. Standardized residuals are faster to compute and fine for initial screening.
Heteroscedasticity occurs when the variance of the residuals changes across the range of fitted values, often seen as a funnel or fan pattern. Common causes include omitted variables that are correlated with the error variance, the response variable having a multiplicative rather than additive error structure (common with income, prices, and counts), or a misspecified functional form. Log-transforming the response variable often corrects it because multiplicative relationships become additive on the log scale.
The sum of squared residuals (SSR, also called RSS or SSE) is SSR = Σ(yᵢ − ŷᵢ)². It is the quantity that ordinary least squares minimizes to find the best-fitting line. A smaller SSR means the model fits the data more closely. SSR is used to compute R², the F-statistic for overall model significance, and the residual standard error. It is also the numerator of mean squared error (MSE = SSR/n) in machine learning contexts.
In simple linear regression, the residual for each point is the vertical distance between the data point and the fitted regression line. For the model y = β̂₀ + β̂₁x + e, the residual eᵢ = yᵢ − (β̂₀ + β̂₁xᵢ). The OLS method finds β̂₀ and β̂₁ by minimizing the sum of squared residuals. In multiple linear regression, the same formula applies but ŷ involves multiple predictors. See the simple linear regression guide for the full OLS derivation.
In ANOVA, residuals are the differences between each individual observation and its group mean (the predicted value for that observation under the model). They represent within-group variation. The sum of squared residuals in ANOVA is the within-groups sum of squares (SSW), which is compared against the between-groups sum of squares (SSB) in the F-test. Residual diagnostics for ANOVA follow the same logic as for regression. See the ANOVA guide for details.