Regression Model Evaluation Coefficient of Determination 22 min read June 10, 2026
BY: Statistics Fundamentals Team
Reviewed By: Minsa A (Senior Statistics Editor)

R-Squared (R²): Formula, Interpretation, and Worked Examples

A sales manager builds a model to predict revenue from advertising spend. A data scientist trains a regression on housing prices. An economist forecasts GDP from macroeconomic indicators. After fitting the model, each of them asks the same question: how much of the outcome does this model actually explain? The answer is R-Squared.

R² (the coefficient of determination) is a single number between 0 and 1 that tells you the proportion of variance in your dependent variable captured by your regression model. This guide covers the formula from first principles, three fully worked examples, the adjusted R² correction for multiple predictors, interpretation benchmarks by field, and an interactive calculator you can use on your own data.

What You'll Learn
  • ✓ The exact definition of R² and the coefficient of determination
  • ✓ The R² formula using RSS and TSS — derived step by step
  • ✓ Three fully worked numerical examples (sales, housing, social science)
  • ✓ What counts as a good R² value in different fields
  • ✓ Adjusted R² and when to use it over R²
  • ✓ R² vs RMSE, MAE, and correlation — knowing which metric to use when
  • ✓ Interactive calculator: enter actual and predicted values to get R² instantly

What Is R-Squared? (Definition)

Definition — R-Squared (Coefficient of Determination)
R-Squared (R²), also called the coefficient of determination, is a statistical measure representing the proportion of variance in a dependent variable that is explained by one or more independent variables in a regression model. It ranges from 0 to 1: an R² of 0 means the model explains none of the variation in the outcome; an R² of 1 means it explains all of it.
R² = 1 − (RSS / TSS)

The name "coefficient of determination" reflects exactly what it measures: how much of the dependent variable is determined (explained) by the predictors. In simple linear regression with one predictor, R² equals the square of the Pearson correlation coefficient r between x and y, which is where the "R" in R-Squared comes from.

R² answers a concrete question: if you used your model's predictions instead of just predicting the mean every time, by how much would your total squared error shrink? An R² of 0.72 means your model's total squared error is 72% smaller than that of the naive baseline of always predicting ȳ.

The concept was formalized within the broader framework of simple linear regression and is now used across nearly every quantitative discipline. The underlying mathematics of variance decomposition connects directly to ANOVA (Analysis of Variance), which partitions total variation into the same explained and unexplained components.

R² = 0: No variance explained; the model is no better than the mean
R² = 0.5: 50% of variance explained by the predictors
R² = 1: Perfect fit; all variance explained, zero residuals
R² = r²: In simple linear regression, R² equals the squared Pearson r
⚡ Quick Reference — R² Key Facts
  • Full name: Coefficient of determination
  • Symbol: R² (also written as R-squared, R-squared value, or r-squared)
  • Range in OLS training data (model with an intercept): 0 to 1 (can be negative on test/out-of-sample data)
  • Interpretation: Proportion of total variance in y explained by the regression model
  • Formula: R² = 1 − RSS/TSS = ESS/TSS
  • Limitation: Never decreases as you add predictors, even useless ones. Use Adjusted R² to compare models with different numbers of predictors
  • In simple regression: R² = r² (squared Pearson correlation coefficient)

The R-Squared Formula Explained

R-Squared is built from three quantities that partition total variance into explained and unexplained parts. Understanding these three sums of squares is the foundation for interpreting any regression model.

R-Squared Formula (Coefficient of Determination)
R² = 1 − RSS / TSS
RSS = Residual Sum of Squares = Σ(yᵢ − ŷᵢ)²
TSS = Total Sum of Squares = Σ(yᵢ − ȳ)²
yᵢ = actual value
ŷᵢ = predicted value
ȳ = mean of actual values

The formula has a clear geometric meaning. TSS measures the total spread in your outcome variable — how much the actual values vary around their mean. RSS measures how much variation remains after fitting the model — the squared distances between actual and predicted values. The ratio RSS/TSS is the fraction of variance the model fails to explain. Subtracting that from 1 gives the fraction it does explain.
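As a minimal sketch of the formula above, the following Python function computes R² from paired actual and predicted values. The function name `r_squared` and the sample numbers are illustrative assumptions, not part of the original guide.

```python
def r_squared(actual, predicted):
    """Return R² = 1 - RSS/TSS for two equal-length sequences of numbers."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    mean_y = sum(actual) / len(actual)
    # TSS: total spread of the actual values around their mean.
    tss = sum((y - mean_y) ** 2 for y in actual)
    # RSS: squared error left over after using the model's predictions.
    rss = sum((y - p) ** 2 for y, p in zip(actual, predicted))
    if tss == 0:
        raise ValueError("TSS is zero: all actual values are identical")
    return 1 - rss / tss


# Hypothetical data, chosen only for illustration.
actual = [3.0, 5.0, 7.0, 9.0]
predicted = [2.5, 5.5, 7.0, 9.0]
print(round(r_squared(actual, predicted), 3))  # TSS = 20, RSS = 0.5 -> 0.975
```

Predicting the mean for every observation makes RSS equal to TSS, so the same function returns 0 for that baseline.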

The Three Sums of Squares

For an ordinary least squares model that includes an intercept, the total variation in y decomposes exactly into explained and unexplained parts:

Variance Decomposition Identity
TSS = ESS + RSS
TSS = Σ(yᵢ − ȳ)² (Total Sum of Squares)
ESS = Σ(ŷᵢ − ȳ)² (Explained Sum of Squares)
RSS = Σ(yᵢ − ŷᵢ)² (Residual Sum of Squares)

Because TSS = ESS + RSS, the formula R² = 1 − RSS/TSS is equivalent to R² = ESS/TSS. Both give the same value. The ESS/TSS form makes the interpretation more direct: R² is the explained share of total variance. The 1 − RSS/TSS form is more commonly used in computation because residuals (yᵢ − ŷᵢ) are already calculated during model fitting.
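The decomposition can be checked numerically. The sketch below fits a one-predictor least-squares line in closed form and confirms that TSS = ESS + RSS for that fit, then shows the identity failing once the predictions are shifted away from the least-squares solution. The helper names and the data are hypothetical, chosen only for illustration.

```python
def fit_simple_ols(x, y):
    """Closed-form least-squares intercept and slope for one predictor."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def sums_of_squares(y, y_hat):
    """Return (TSS, ESS, RSS) for actual values y and predictions y_hat."""
    mean_y = sum(y) / len(y)
    tss = sum((yi - mean_y) ** 2 for yi in y)
    ess = sum((pi - mean_y) ** 2 for pi in y_hat)
    rss = sum((yi - pi) ** 2 for yi, pi in zip(y, y_hat))
    return tss, ess, rss


# Hypothetical data for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

intercept, slope = fit_simple_ols(x, y)
y_hat = [intercept + slope * xi for xi in x]
tss, ess, rss = sums_of_squares(y, y_hat)
print(f"OLS fit: TSS = {tss:.4f}, ESS + RSS = {ess + rss:.4f}")
print(f"1 - RSS/TSS = {1 - rss / tss:.4f}, ESS/TSS = {ess / tss:.4f}")

# Adding a constant to every prediction moves away from the OLS solution,
# and the identity no longer holds.
shifted = [p + 1.0 for p in y_hat]
tss_s, ess_s, rss_s = sums_of_squares(y, shifted)
print(f"Shifted: TSS = {tss_s:.4f}, ESS + RSS = {ess_s + rss_s:.4f}")
```

This is why 1 − RSS/TSS is the safer form to compute when the predictions might not come from a least-squares fit with an intercept.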

The Adjusted R-Squared Formula

Standard R² has a known flaw: adding any predictor to a model, even a useless random variable, will either increase R² or leave it unchanged. This means comparing R² across models with different numbers of predictors is misleading. Adjusted R² corrects this by penalizing model complexity:

Adjusted R-Squared Formula
Adj. R² = 1 − [(1 − R²)(n − 1) / (n − k − 1)]
n = sample size
k = number of independent predictors
R² = standard R-Squared

When you add a predictor that genuinely improves the model, Adjusted R² increases. When you add a predictor that contributes little, Adjusted R² decreases or stays flat. This makes it the correct metric for comparing multiple linear regression models with different numbers of predictors.

⚠️
When R² and Adjusted R² Diverge

If R² increases but Adjusted R² decreases when you add a predictor, that predictor is adding noise rather than signal. The denominator term (n − k − 1) becomes small when k approaches n, causing Adjusted R² to collapse — which is why you need far more observations than predictors in any regression model.
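A small sketch of the adjustment, assuming a hypothetical function name `adjusted_r_squared`. To isolate the penalty, R² is held fixed at 0.90 with n = 20 while k grows; in a real model R² would also change as predictors are added.

```python
def adjusted_r_squared(r2, n, k):
    """Adjusted R² = 1 - (1 - R²)(n - 1)/(n - k - 1)."""
    if n - k - 1 <= 0:
        raise ValueError("need more observations than k + 1")
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


# Hypothetical scenario: same R², same sample size, more and more predictors.
for k in (1, 5, 10, 15, 18):
    print(f"k = {k:2d}: adjusted R² = {adjusted_r_squared(0.90, 20, k):.3f}")
```

As k approaches n, the denominator n − k − 1 shrinks toward zero and the adjusted value drops sharply, eventually below zero.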

How to Calculate R-Squared: Step-by-Step

Calculating R² from scratch requires only basic arithmetic. The procedure below works for any regression model, whether simple linear or multiple.

1

Collect Actual Values and Fit the Model

Record your n actual outcomes: y₁, y₂, …, yₙ. Fit your regression model to get the corresponding predicted values: ŷ₁, ŷ₂, …, ŷₙ. In ordinary least squares regression, these predictions minimize RSS by definition.

2

Compute the Mean of Actual Values (ȳ)

Calculate ȳ = (y₁ + y₂ + ⋯ + yₙ) / n. This is the baseline: a model that always predicts ȳ has RSS equal to TSS, so its R² = 1 − TSS/TSS = 0.

3

Calculate TSS (Total Sum of Squares)

TSS = Σ(yᵢ − ȳ)². For each observation, subtract the mean from the actual value, square it, and sum all n squared differences. TSS represents the total spread in y independent of any model.

4

Calculate RSS (Residual Sum of Squares)

RSS = Σ(yᵢ − ŷᵢ)². For each observation, subtract the predicted value from the actual value (this is the residual), square it, and sum. RSS measures the model's remaining unexplained error. You can verify residuals using the regression scatter plot tool.

5

Apply the Formula: R² = 1 − RSS / TSS

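Divide RSS by TSS to get the unexplained fraction, then subtract from 1. The result is R². As a check for an OLS fit with an intercept: ESS = TSS − RSS, and R² should also equal ESS/TSS. If the predictions were rounded or come from another kind of model, the two forms can disagree, and 1 − RSS/TSS is the one to report.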

6

Interpret the Result

Compare the R² to field-appropriate benchmarks (see the "R² Benchmarks by Field" table below). Check whether the residuals follow the regression assumptions. A high R² with non-random residual patterns still indicates a misspecified model.
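The computational steps above (Steps 1 to 5) can be sketched as one function that keeps every intermediate quantity, which makes hand calculations easy to audit. The function name, the dictionary keys, and the data are assumptions made for illustration.

```python
def r_squared_steps(actual, predicted):
    """Follow the steps above and return each intermediate quantity."""
    n = len(actual)                                   # Step 1: paired values
    mean_y = sum(actual) / n                          # Step 2: mean of actual values
    tss_terms = [(y - mean_y) ** 2 for y in actual]   # Step 3: squared deviations
    rss_terms = [(y - p) ** 2 for y, p in zip(actual, predicted)]  # Step 4: squared residuals
    tss = sum(tss_terms)
    rss = sum(rss_terms)
    r2 = 1 - rss / tss                                # Step 5: apply the formula
    return {"n": n, "mean": mean_y, "tss_terms": tss_terms,
            "rss_terms": rss_terms, "tss": tss, "rss": rss, "r2": r2}


# Hypothetical data for illustration only.
actual = [10, 12, 14, 16, 18]
predicted = [11, 12, 13, 16, 18]
result = r_squared_steps(actual, predicted)

print("actual  predicted  (y - mean)^2  (y - yhat)^2")
for y, p, t, r in zip(actual, predicted, result["tss_terms"], result["rss_terms"]):
    print(f"{y:6}  {p:9}  {t:12.1f}  {r:12.1f}")
print(f"mean = {result['mean']}, TSS = {result['tss']}, RSS = {result['rss']}")
print(f"R² = {result['r2']:.3f}")
```

Step 6, interpretation, is deliberately left out of the code because it depends on the field and on residual diagnostics rather than on arithmetic.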

Worked Examples — R-Squared Step by Step

Each example below shows the full numerical calculation so you can follow every arithmetic step. These examples span different domains to show how R² is used and interpreted across fields.

Example 1 — Advertising Spend vs. Sales Revenue

Worked Example 1 — Simple Linear Regression

Problem: A retailer fits a regression model to predict monthly sales (y, in $000s) from advertising spend (x, in $000s) using five months of data: (x: 2, 4, 6, 8, 10) → (y: 50, 75, 80, 90, 120). The model produces predicted values ŷ = (52, 68, 84, 100, 116). Calculate R².

R² Formula
R² = 1 − RSS / TSS
TSS = Σ(yᵢ − ȳ)²
RSS = Σ(yᵢ − ŷᵢ)²
1

Compute ȳ: ȳ = (50 + 75 + 80 + 90 + 120) / 5 = 415 / 5 = 83

2

Calculate TSS:
(50 − 83)² = (−33)² = 1089
(75 − 83)² = (−8)² = 64
(80 − 83)² = (−3)² = 9
(90 − 83)² = (7)² = 49
(120 − 83)² = (37)² = 1369
TSS = 1089 + 64 + 9 + 49 + 1369 = 2580

3

Calculate RSS:
(50 − 52)² = (−2)² = 4
(75 − 68)² = (7)² = 49
(80 − 84)² = (−4)² = 16
(90 − 100)² = (−10)² = 100
(120 − 116)² = (4)² = 16
RSS = 4 + 49 + 16 + 100 + 16 = 185

4

Apply formula: R² = 1 − 185/2580 = 1 − 0.0717 = 0.928

✅ R² = 0.928. The regression model explains 92.8% of the variance in monthly sales. For a simple two-variable model in a business context, this is a strong result. Verify the regression line using the simple linear regression calculator.

Example 2 — Housing Price Prediction (Multiple Regression)

Worked Example 2 — Multiple Linear Regression

Problem: A real estate analyst fits a multiple regression model using house size (sq ft) and age (years) to predict sale price ($000s). For 6 homes the actual prices are: y = [210, 185, 240, 195, 270, 220]. The model's predicted values are: ŷ = [218, 192, 235, 200, 265, 215]. Calculate R² and Adjusted R² (k = 2 predictors).

1

Compute ȳ: ȳ = (210 + 185 + 240 + 195 + 270 + 220) / 6 = 1320 / 6 = 220

2

TSS: (210−220)²+(185−220)²+(240−220)²+(195−220)²+(270−220)²+(220−220)² = 100 + 1225 + 400 + 625 + 2500 + 0 = 4850

3

RSS: (210−218)²+(185−192)²+(240−235)²+(195−200)²+(270−265)²+(220−215)² = 64 + 49 + 25 + 25 + 25 + 25 = 213

4

R²: R² = 1 − 213/4850 = 1 − 0.0439 = 0.956

5

Adjusted R² (n = 6, k = 2):
Adj. R² = 1 − [(1 − 0.956)(6 − 1) / (6 − 2 − 1)]
= 1 − [(0.044 × 5) / 3]
= 1 − [0.22 / 3]
= 1 − 0.0733 = 0.927

✅ R² = 0.956, Adjusted R² = 0.927. The model accounts for 95.6% of variance in housing prices in this sample. The Adjusted R² of 0.927 is only slightly lower, which suggests the fit is not simply inflated by the second predictor. With only six observations, though, neither figure says much about out-of-sample accuracy. For deeper regression diagnostics, see the multiple linear regression guide.

Example 3 — Social Science (High R² in a Small Sample)

Worked Example 3 — Social Science Context

Problem: A researcher models exam scores (y, out of 100) from study hours per week (x) for 5 students. Actual scores: y = [65, 72, 58, 80, 70]. Predicted scores: ŷ = [68, 70, 62, 76, 69]. Calculate R² and interpret it for a social science context.

1

ȳ: (65 + 72 + 58 + 80 + 70) / 5 = 345 / 5 = 69

2

TSS: (65−69)²+(72−69)²+(58−69)²+(80−69)²+(70−69)² = 16 + 9 + 121 + 121 + 1 = 268

3

RSS: (65−68)²+(72−70)²+(58−62)²+(80−76)²+(70−69)² = 9 + 4 + 16 + 16 + 1 = 46

4

R²: 1 − 46/268 = 1 − 0.172 = 0.828

✅ R² = 0.828. Study hours explain about 83% of the variation in exam scores in this sample. In social science research, where human behavior introduces substantial unexplained variance, an R² above 0.80 is unusually high; with only five students, treat it as illustrative rather than typical. The remaining 17% reflects factors outside the model, such as motivation, prior knowledge, and test anxiety.
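The arithmetic in the three worked examples can be re-checked with a few lines of Python. The data below are copied from the examples; the function and variable names are illustrative assumptions.

```python
def sums_and_r2(actual, predicted):
    """Return (TSS, RSS, R²) for paired actual and predicted values."""
    mean_y = sum(actual) / len(actual)
    tss = sum((y - mean_y) ** 2 for y in actual)
    rss = sum((y - p) ** 2 for y, p in zip(actual, predicted))
    return tss, rss, 1 - rss / tss


examples = {
    "Example 1 (sales)": ([50, 75, 80, 90, 120], [52, 68, 84, 100, 116]),
    "Example 2 (housing)": ([210, 185, 240, 195, 270, 220],
                            [218, 192, 235, 200, 265, 215]),
    "Example 3 (exam scores)": ([65, 72, 58, 80, 70], [68, 70, 62, 76, 69]),
}

results = {name: sums_and_r2(y, y_hat) for name, (y, y_hat) in examples.items()}
for name, (tss, rss, r2) in results.items():
    print(f"{name}: TSS = {tss:.0f}, RSS = {rss:.0f}, R² = {r2:.3f}")

# Adjusted R² for Example 2, using n = 6 observations and k = 2 predictors.
n, k = 6, 2
r2_housing = results["Example 2 (housing)"][2]
adj_r2_housing = 1 - (1 - r2_housing) * (n - 1) / (n - k - 1)
print(f"Example 2 adjusted R² = {adj_r2_housing:.3f}")
```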

R-squared methodology follows Draper, N. R., & Smith, H. (1998). Applied Regression Analysis (3rd ed.). Wiley. Variance decomposition notation consistent with the NIST/SEMATECH e-Handbook of Statistical Methods.

R-Squared Interpretation Guide

R² has no universal "good" or "bad" threshold. The same value of 0.40 might be excellent in a social science study of behavioral outcomes and inadequate in a calibrated engineering measurement system. Interpretation always depends on the field, the model's purpose, and what other metrics show.

📊
R² Interpretation

R² = 0 means the model has no explanatory power (equivalent to always predicting the mean). R² = 1 means perfect fit. Values between 0 and 1 indicate what fraction of outcome variance the model captures. Whether a given value is "good" depends on the field and the alternative benchmarks available.

Interpretation by Value Range

0.00–0.20: Very Weak
0.20–0.40: Weak
0.40–0.60: Moderate
0.60–0.80: Strong
0.80–1.00: Very Strong

These ranges are a starting point, not rules. Use them as orientation while applying the field-specific context below.

R² Benchmarks by Field

Field / Application | Typical R² Range | Notes
Physics / Engineering (measurement) | 0.95 – 1.00 | Controlled experiments with low noise
Chemistry / Materials Science | 0.90 – 0.99 | Well-characterized physical relationships
Financial Forecasting | 0.60 – 0.90 | Depends on asset class and timeframe
Econometrics / Macroeconomics | 0.50 – 0.85 | GDP models, inflation predictors
Machine Learning Regression | 0.70 – 0.95 | Varies by task; check on test set
Marketing / Business Analytics | 0.40 – 0.80 | Consumer behavior adds noise
Psychology / Social Science | 0.25 – 0.60 | Complex human outcomes; 0.40+ often accepted
Clinical / Epidemiological Research | 0.20 – 0.60 | Biological variability is large
Cross-sectional Survey Data | 0.10 – 0.40 | Observational noise from unmeasured confounders
⚠️
High R² Does Not Mean a Good Model

A model can have R² = 0.97 and still be badly misspecified. Overfitting on training data, non-linear relationships fitted with a linear model, or influential outliers driving a spurious fit all produce high R² with misleading predictions. Always examine residual plots and evaluate on held-out data. Use the regression scatter plot tool to check residual patterns visually.

R-Squared Calculator

Enter your actual values and predicted values below (comma-separated or space-separated). The calculator computes R², Adjusted R², TSS, RSS, and ESS, and gives an interpretation based on the value.
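For readers working offline, a rough Python stand-in for the calculator is sketched below. It is not the page's own implementation: the function names, the returned dictionary keys, and the `k` argument (number of predictors, needed for Adjusted R² and defaulting to 1) are all assumptions made for illustration.

```python
import re


def parse_numbers(text):
    """Split text on commas and/or whitespace and convert each token to float."""
    return [float(tok) for tok in re.split(r"[,\s]+", text.strip()) if tok]


def r2_report(actual_text, predicted_text, k=1):
    """Compute TSS, RSS, ESS, R² and Adjusted R² from two pasted lists.

    k is the assumed number of predictors in the model that made the predictions.
    """
    actual = parse_numbers(actual_text)
    predicted = parse_numbers(predicted_text)
    if not actual:
        raise ValueError("no values supplied")
    if len(actual) != len(predicted):
        raise ValueError("both lists must have the same length")
    n = len(actual)
    mean_y = sum(actual) / n
    tss = sum((y - mean_y) ** 2 for y in actual)
    if tss == 0:
        raise ValueError("TSS is zero: all actual values are identical")
    rss = sum((y - p) ** 2 for y, p in zip(actual, predicted))
    ess = sum((p - mean_y) ** 2 for p in predicted)
    r2 = 1 - rss / tss
    # Adjusted R² is undefined unless n > k + 1.
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1) if n > k + 1 else None
    return {"tss": tss, "rss": rss, "ess": ess, "r2": r2, "adj_r2": adj_r2}


# Inputs taken from Worked Example 3; one list uses commas, the other spaces.
report = r2_report("65, 72, 58, 80, 70", "68 70 62 76 69")
for name, value in report.items():
    print(name, "undefined" if value is None else round(value, 3))
```

ESS is reported as Σ(ŷᵢ − ȳ)², so it will match TSS − RSS only when the predictions come from a least-squares fit with an intercept.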


Visual Understanding: High vs Low R²

The scatter plots below show the same number of data points under two different R² conditions. When points cluster tightly around the regression line, R² is high. When they scatter widely, R² is low — the line captures less of the story in the data.

[Figure: Scatter Plot Comparison: High R² vs Low R². Panels: High R² Model (R² = 0.93) and Low R² Model (R² = 0.18).]

R² vs Adjusted R²: When to Use Each

Choosing between R² and Adjusted R² depends on your goal. For describing how well a fixed model fits on training data, standard R² is fine. For comparing models or building models with feature selection, Adjusted R² is the correct choice.

Dimension | R² (Standard) | Adjusted R²
Formula | 1 − RSS/TSS | 1 − [(1−R²)(n−1)/(n−k−1)]
Effect of adding predictors | Never decreases | Decreases if predictor adds noise
Use case | Describing a single model's fit | Comparing models with different k
Penalizes overfitting | No | Yes
Interpretation | Proportion of variance explained | Adjusted proportion accounting for model complexity
Range | 0 to 1 (on training data) | Can be below 0 with many predictors and small n
Simple regression (k = 1) | Equals r², the squared Pearson correlation | 1 − (1−R²)(n−1)/(n−2), slightly below R² unless R² = 1
Preferred in academia for | Reporting a final model | Model selection, feature comparison

R² vs RMSE, MAE, and Correlation

R² is one member of a broader toolkit of regression evaluation metrics. Each answers a different question about model performance, and strong practice is to report several together rather than relying on R² alone.

Metric | Formula | What It Measures | Range | Best Used When
R² (Coefficient of Determination) | 1 − RSS/TSS | Proportion of variance explained | 0 to 1 (training) | Describing overall model fit; comparing to a mean baseline
Adjusted R² | 1 − [(1−R²)(n−1)/(n−k−1)] | Variance explained, penalized for k | Can be < 0 | Comparing models with different numbers of predictors
RMSE (Root Mean Squared Error) | √(RSS/n) | Average prediction error in outcome units | 0 to ∞ | When large errors are especially costly (penalizes outliers)
MAE (Mean Absolute Error) | Σ|yᵢ − ŷᵢ| / n | Average absolute error in outcome units | 0 to ∞ | When all error magnitudes matter equally; robust to outliers
Pearson r (Correlation) | Cov(x,y) / (σₓσᵧ) | Strength and direction of linear relationship | −1 to 1 | Simple linear regression only; R² = r²
Best Practice: Report R² Alongside RMSE

R² tells you the relative performance (compared to always predicting the mean), while RMSE tells you the absolute error in the units of y. Together they give a fuller picture than either metric alone. For model comparison, also run a hypothesis test on whether the improvement is statistically significant.
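The relative-versus-absolute distinction can be seen by computing the three metrics side by side. The sketch below reuses the sales data from Worked Example 1, once in $000s and once converted to dollars; the function name `regression_metrics` is an assumption made for illustration.

```python
import math


def regression_metrics(actual, predicted):
    """Return R², RMSE and MAE for paired actual and predicted values."""
    n = len(actual)
    mean_y = sum(actual) / n
    residuals = [y - p for y, p in zip(actual, predicted)]
    rss = sum(e ** 2 for e in residuals)
    tss = sum((y - mean_y) ** 2 for y in actual)
    return {
        "r2": 1 - rss / tss,                         # unit-free, relative to the mean baseline
        "rmse": math.sqrt(rss / n),                  # in the units of y
        "mae": sum(abs(e) for e in residuals) / n,   # in the units of y
    }


sales_thousands = [50, 75, 80, 90, 120]
pred_thousands = [52, 68, 84, 100, 116]

in_thousands = regression_metrics(sales_thousands, pred_thousands)
in_dollars = regression_metrics([1000 * y for y in sales_thousands],
                                [1000 * p for p in pred_thousands])

for label, m in (("$000s", in_thousands), ("dollars", in_dollars)):
    print(f"{label}: R² = {m['r2']:.3f}, RMSE = {m['rmse']:.2f}, MAE = {m['mae']:.2f}")
```

Changing the unit of y leaves R² untouched but rescales RMSE and MAE, which is why the two kinds of metric answer different questions.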

Real-World Applications of R-Squared

R-Squared appears in virtually every field that uses regression modeling. Below are the most common use cases across industry and research.

📈

Financial Modeling

Analysts use R² to evaluate how much of the variation in a stock's returns is explained by market movements, in the same regression whose slope is the stock's beta. In factor models like the Fama-French model, R² shows how much of portfolio variance is captured by systematic risk factors.

🏠

Real Estate Valuation

Property valuation models use R² to assess how well predictors like size, location, and age explain price variation. Automated valuation models (AVMs) used by lenders typically report R² on held-out test samples.

🤖

Machine Learning

R² is the standard regression score in scikit-learn (r2_score). Crucially, it is computed on the test set in ML contexts, where R² can be negative if the model is worse than the mean predictor on unseen data.

🏥

Clinical Research

Epidemiologists report R² when modeling continuous outcomes like blood pressure, weight change, or biomarker levels. In clinical settings, even an R² of 0.30 can be meaningful because biological outcomes have high inherent variability.

🎯

Marketing Analytics

Marketing mix models use R² to evaluate how well advertising spend, promotions, and seasonality predict sales. R² values of 0.70–0.90 are typical in well-specified marketing mix models.

⚙️

Engineering & Quality Control

In process engineering and calibration, R² is used to validate measurement instruments and process models. R² above 0.99 is often required for calibration curves in analytical chemistry.
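The Machine Learning point above, that R² can turn negative on unseen data, is easy to reproduce. The sketch below stays in plain Python rather than calling scikit-learn: it fits a one-predictor least-squares line on training data, then scores it on held-out data where the trend has flattened, using the mean of the held-out actual values as the baseline. All data and helper names are hypothetical.

```python
def fit_line(x, y):
    """Closed-form least-squares intercept and slope for one predictor."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    slope = (sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
             / sum((xi - mean_x) ** 2 for xi in x))
    return mean_y - slope * mean_x, slope


def r_squared(actual, predicted):
    """R² = 1 - RSS/TSS, with TSS taken around the mean of `actual`."""
    mean_y = sum(actual) / len(actual)
    tss = sum((y - mean_y) ** 2 for y in actual)
    rss = sum((y - p) ** 2 for y, p in zip(actual, predicted))
    return 1 - rss / tss


# Hypothetical training data: a steady upward trend.
x_train = [1, 2, 3, 4, 5]
y_train = [2.1, 3.9, 6.2, 7.8, 10.1]

# Hypothetical held-out data: the trend has flattened out.
x_test = [6, 7, 8]
y_test = [9.0, 8.5, 9.5]

intercept, slope = fit_line(x_train, y_train)
train_r2 = r_squared(y_train, [intercept + slope * x for x in x_train])
test_r2 = r_squared(y_test, [intercept + slope * x for x in x_test])

print(f"train R² = {train_r2:.3f}")
print(f"test R²  = {test_r2:.3f}")  # negative: worse than predicting the test mean
```

A negative value here means the extrapolated line has a larger squared error on the held-out points than simply predicting their mean.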

How to Calculate R-Squared in Excel

Excel provides three direct methods to obtain R² in a spreadsheet, each requiring only a few clicks.

A

Using the RSQ Function

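Place actual values in column A and predicted values in column B. In any empty cell, enter: =RSQ(A2:A20, B2:B20), substituting your own ranges. Note: Excel's RSQ function computes the squared Pearson correlation between the two arrays. That equals 1 − RSS/TSS when the predictions come from a least-squares fit with an intercept, but it can overstate R² for predictions from any other source, because correlation ignores bias and scale errors. To apply the definition directly, use =1-SUMXMY2(A2:A20,B2:B20)/DEVSQ(A2:A20). To fit a multiple regression inside Excel, use Method B.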

B

Using the LINEST Function (Multiple Regression)

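Select a range of cells, type =LINEST(y_range, x_range, TRUE, TRUE), and press Ctrl+Shift+Enter (array formula; in versions of Excel with dynamic arrays, pressing Enter is enough and the results spill automatically). Excel returns a block of statistics; R² is in row 3, column 1 of the output, so =INDEX(LINEST(y_range, x_range, TRUE, TRUE), 3, 1) returns it in a single cell. This method works for any number of predictors.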

C

Using a Trendline on a Chart

Create an XY scatter chart from your x and y data. Right-click on the data series, choose "Add Trendline," select Linear (or another model type), and check both "Display Equation on chart" and "Display R-squared value on chart." Excel adds R² directly to the chart. This method applies only to the chart's data range.
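Because Method A relies on a squared correlation, it is worth seeing how that quantity can differ from 1 − RSS/TSS. The Python sketch below compares the two on the Worked Example 1 data, and again after adding a constant bias of 20 to every prediction. The function names and the biased predictions are hypothetical, introduced only for illustration.

```python
def squared_pearson(a, b):
    """Squared Pearson correlation between two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov ** 2 / (var_a * var_b)


def r_squared(actual, predicted):
    """R² from the definition: 1 - RSS/TSS."""
    mean_y = sum(actual) / len(actual)
    tss = sum((y - mean_y) ** 2 for y in actual)
    rss = sum((y - p) ** 2 for y, p in zip(actual, predicted))
    return 1 - rss / tss


actual = [50, 75, 80, 90, 120]
predicted = [52, 68, 84, 100, 116]       # predictions from Worked Example 1
biased = [p + 20 for p in predicted]     # hypothetical: every prediction 20 too high

for label, preds in (("Example 1 predictions", predicted), ("biased predictions", biased)):
    print(f"{label}: squared correlation = {squared_pearson(actual, preds):.3f}, "
          f"1 - RSS/TSS = {r_squared(actual, preds):.3f}")
```

The squared correlation is unchanged by the constant bias, while 1 − RSS/TSS drops sharply, so the two should not be treated as interchangeable for arbitrary predictions.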

Formula Glossary

All key terms and formulas from this guide, collected in one reference block.

R-Squared (R²)
Proportion of variance in the dependent variable explained by the regression model. Also called the coefficient of determination.
R² = 1 − RSS/TSS
Adjusted R²
R² penalized for the number of predictors k. Decreases if adding a predictor does not improve fit beyond what chance would produce.
1 − [(1−R²)(n−1)/(n−k−1)]
TSS — Total Sum of Squares
Total variance in y relative to its mean. Baseline against which the model is evaluated.
TSS = Σ(yᵢ − ȳ)²
RSS — Residual Sum of Squares
Unexplained variance remaining after fitting the model. Also called SSE (Sum of Squared Errors).
RSS = Σ(yᵢ − ŷᵢ)²
ESS — Explained Sum of Squares
Variance captured by the regression model. ESS = TSS − RSS.
ESS = Σ(ŷᵢ − ȳ)²
Residual (e)
The difference between an actual and predicted value for observation i.
eᵢ = yᵢ − ŷᵢ
Pearson r
Correlation coefficient measuring linear association. In simple linear regression, R² = r².
r = Cov(x,y)/(σₓσᵧ)
RMSE
Root Mean Squared Error — average prediction error in the units of y. Measures absolute error magnitude.
RMSE = √(RSS/n)

External references: James et al., An Introduction to Statistical Learning (ISLR), a widely used textbook reference for R² in regression models; scikit-learn r2_score documentation, covering how R² is implemented in Python's leading ML library; NIST Engineering Statistics Handbook, a technical definition of the coefficient of determination.