What is R-Squared in statistics?

R-Squared (R²), also called the coefficient of determination, is a statistical measure that shows the proportion of variance in a dependent variable that is explained by independent variables in a regression model. It ranges from 0 to 1, where 0 means the model explains none of the variance and 1 means it explains all of it.

What is the R-Squared formula?

The R-Squared formula is: R² = 1 − (RSS / TSS), where RSS is the Residual Sum of Squares (Σ(yᵢ − ŷᵢ)²) and TSS is the Total Sum of Squares (Σ(yᵢ − ȳ)²). Equivalently, R² = ESS / TSS where ESS is the Explained Sum of Squares.

What is a good R-Squared value?

A good R-Squared value depends on the field. In physical sciences, values above 0.90 are common. In social sciences, 0.40–0.60 is often acceptable. In machine learning regression, 0.70–0.90 is generally considered strong. There is no universal threshold — R² must be interpreted alongside other metrics like RMSE and in the context of the field.

What is the difference between R² and adjusted R²?

R² always increases or stays the same when you add more predictors, even if they add no real predictive value. Adjusted R² corrects for this by penalizing model complexity: Adjusted R² = 1 − [(1 − R²)(n − 1) / (n − k − 1)], where n is the sample size and k is the number of predictors. Use adjusted R² when comparing models with different numbers of predictors.

Can R-Squared be negative?

Yes. R² can be negative when a model is evaluated on out-of-sample (test) data, or whenever its predictions have a larger squared error than simply predicting the mean. In ordinary least squares regression with an intercept, R² computed on the training data is always between 0 and 1 by construction.

How do you calculate R-Squared step by step?

Step 1: Collect actual values (y) and compute the mean (ȳ). Step 2: Fit the regression model to get predicted values (ŷ). Step 3: Calculate TSS = Σ(yᵢ − ȳ)². Step 4: Calculate RSS = Σ(yᵢ − ŷᵢ)². Step 5: Apply R² = 1 − RSS/TSS. Step 6: Interpret the result using field-appropriate benchmarks.

R-Squared (R²) Explained: Formula, Interpretation & Uses (2026)

What Is R-Squared? (Definition)

Definition — R-Squared (Coefficient of Determination)

R-Squared (R²), also called the coefficient of determination, is a statistical measure representing the proportion of variance in a dependent variable that is explained by one or more independent variables in a regression model. For a least-squares model with an intercept, evaluated on the data it was fitted to, it ranges from 0 to 1: an R² of 0 means the model explains none of the variation in the outcome; an R² of 1 means it explains all of it.

R² = 1 − (RSS / TSS)

The name "coefficient of determination" reflects exactly what it measures: how much of the dependent variable is determined (explained) by the predictors. In simple linear regression with one predictor, R² equals the square of the Pearson correlation coefficient r between x and y, which is where the "R" in R-Squared comes from.
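The equality R² = r² can be checked numerically. The sketch below uses a small made-up dataset (the values of `x` and `y` are assumptions for illustration, not data from this guide), fits a least-squares line with an intercept, and compares the squared Pearson correlation with 1 − RSS/TSS.

```python
from math import sqrt

# Hypothetical illustration data (not from the article).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Sums of squares and cross-products around the means.
s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_yy = sum((yi - y_bar) ** 2 for yi in y)  # this is TSS
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

# Pearson correlation coefficient r.
r = s_xy / sqrt(s_xx * s_yy)

# Ordinary least squares line with an intercept.
slope = s_xy / s_xx
intercept = y_bar - slope * x_bar
y_hat = [intercept + slope * xi for xi in x]

# R^2 from the residuals of that line.
rss = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
r_squared = 1 - rss / s_yy

print(round(r ** 2, 6), round(r_squared, 6))  # the two values agree
```

The agreement depends on the line being the least-squares fit; for predictions from any other line, r² and 1 − RSS/TSS generally differ.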

R² answers a concrete question: if you used your model's predictions instead of just predicting the mean every time, by how much would your total squared error shrink? An R² of 0.72 means your model cuts the unexplained variance by 72% compared to the naive baseline of always predicting ȳ.

The concept was formalized within the broader framework of simple linear regression and is now used across nearly every quantitative discipline. The underlying mathematics of variance decomposition connects directly to ANOVA (Analysis of Variance), which partitions total variation into the same explained and unexplained components.

R² = 0: no variance explained; the model is no better than the mean
R² = 0.5: 50% of variance explained by the predictors
R² = 1: perfect fit; all variance explained, zero residuals
R² = r²: in simple linear regression, R² equals the squared Pearson r

⚡ Quick Reference — R² Key Facts

Full name: Coefficient of determination
Symbol: R² (also written as R-squared, R-squared value, or r-squared)
Range in OLS training data: 0 to 1 (can be negative on test/out-of-sample data)
Interpretation: Proportion of total variance in y explained by the regression model
Formula: R² = 1 − RSS/TSS = ESS/TSS
Limitation: Never decreases when you add predictors, so use Adjusted R² to compare models with different numbers of predictors
In simple regression: R² = r² (squared Pearson correlation coefficient)

The R-Squared Formula Explained

R-Squared is built from three quantities that partition total variance into explained and unexplained parts. Understanding these three sums of squares is the foundation for interpreting any regression model.

R-Squared Formula (Coefficient of Determination)

R² = 1 − RSS / TSS

RSS = Residual Sum of Squares = Σ(yᵢ − ŷᵢ)²
TSS = Total Sum of Squares = Σ(yᵢ − ȳ)²
yᵢ = actual value
ŷᵢ = predicted value
ȳ = mean of actual values

The formula has a clear geometric meaning. TSS measures the total spread in your outcome variable — how much the actual values vary around their mean. RSS measures how much variation remains after fitting the model — the squared distances between actual and predicted values. The ratio RSS/TSS is the fraction of variance the model fails to explain. Subtracting that from 1 gives the fraction it does explain.
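As a minimal sketch of the formula, the function below computes 1 − RSS/TSS for two equal-length lists. The function name `r_squared` and the sample values are illustrative assumptions; the two calls show the boundary cases described in the text.

```python
def r_squared(y, y_hat):
    """Return 1 - RSS/TSS for paired actual and predicted values."""
    if len(y) != len(y_hat):
        raise ValueError("y and y_hat must have the same length")
    y_bar = sum(y) / len(y)
    tss = sum((yi - y_bar) ** 2 for yi in y)
    rss = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
    if tss == 0:
        raise ValueError("TSS is zero: y is constant, so R^2 is undefined")
    return 1 - rss / tss

y = [3.0, 5.0, 7.0, 9.0]   # hypothetical actual values
mean_only = [6.0] * 4      # always predicting the mean of y

print(r_squared(y, y))          # perfect predictions -> 1.0
print(r_squared(y, mean_only))  # mean baseline -> 0.0
```

The explicit check for TSS = 0 is a design choice: when every actual value is identical there is no variance to explain, and the ratio is undefined.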

The Three Sums of Squares

For an ordinary least squares model that includes an intercept, the total variation in y decomposes exactly into explained and unexplained parts:

Variance Decomposition Identity

TSS = ESS + RSS

TSS = Σ(yᵢ − ȳ)² (Total Sum of Squares)
ESS = Σ(ŷᵢ − ȳ)² (Explained Sum of Squares)
RSS = Σ(yᵢ − ŷᵢ)² (Residual Sum of Squares)

Because TSS = ESS + RSS, the formula R² = 1 − RSS/TSS is equivalent to R² = ESS/TSS. Both give the same value. The ESS/TSS form makes the interpretation more direct: R² is the explained share of total variance. The 1 − RSS/TSS form is more commonly used in computation because residuals (yᵢ − ŷᵢ) are already calculated during model fitting.
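The identity can be derived by splitting each deviation from the mean into a residual part and a fitted part. The derivation below is a sketch that assumes a least-squares fit with an intercept, which is what makes the cross term vanish.

```latex
\begin{aligned}
\mathrm{TSS} &= \sum_i (y_i - \bar{y})^2
  = \sum_i \bigl[(y_i - \hat{y}_i) + (\hat{y}_i - \bar{y})\bigr]^2 \\
 &= \underbrace{\sum_i (y_i - \hat{y}_i)^2}_{\mathrm{RSS}}
  + \underbrace{\sum_i (\hat{y}_i - \bar{y})^2}_{\mathrm{ESS}}
  + 2\sum_i (y_i - \hat{y}_i)(\hat{y}_i - \bar{y})
\end{aligned}
```

For a least-squares fit with an intercept, the residuals sum to zero and are uncorrelated with the fitted values, so the last sum is zero and TSS = RSS + ESS. For predictions that do not come from such a fit, the cross term need not vanish, and ESS/TSS can differ from 1 − RSS/TSS.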

The Adjusted R-Squared Formula

Standard R² has a known flaw: adding any predictor to a model, even a useless random variable, will either increase R² or leave it unchanged. This means comparing R² across models with different numbers of predictors is misleading. Adjusted R² corrects this by penalizing model complexity:

Adjusted R-Squared Formula

Adj. R² = 1 − [(1 − R²)(n − 1) / (n − k − 1)]

n = sample size
k = number of independent predictors
R² = standard R-Squared

When you add a predictor that genuinely improves the model, Adjusted R² increases. When you add a predictor that contributes little, Adjusted R² decreases or stays flat. This makes it the correct metric for comparing multiple linear regression models with different numbers of predictors.

⚠️

When R² and Adjusted R² Diverge

If R² increases but Adjusted R² decreases when you add a predictor, that predictor is adding noise rather than signal. The denominator term (n − k − 1) becomes small when k approaches n, causing Adjusted R² to collapse — which is why you need far more observations than predictors in any regression model.
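The penalty can be seen by holding R² fixed and increasing k. The sketch below implements the Adjusted R² formula from this section; the function name `adjusted_r_squared` and the scenario (R² = 0.90, n = 12) are assumptions chosen for illustration.

```python
def adjusted_r_squared(r2, n, k):
    """Adjusted R^2 = 1 - (1 - R^2)(n - 1) / (n - k - 1)."""
    if n - k - 1 <= 0:
        raise ValueError("need n > k + 1 for adjusted R^2 to be defined")
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# Hypothetical scenario: R^2 held fixed at 0.90 with n = 12 observations.
for k in (1, 3, 6, 9, 10):
    print(k, round(adjusted_r_squared(0.90, 12, k), 3))
```

With one predictor the adjustment is small (about 0.89), but at k = 10 the denominator n − k − 1 is 1 and the adjusted value falls to about −0.10, even though R² itself is unchanged.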

How to Calculate R-Squared: Step-by-Step

Calculating R² from scratch requires only basic arithmetic. The procedure below works for any regression model, whether simple linear or multiple.

Collect Actual Values and Fit the Model

Record your n actual outcomes: y₁, y₂, …, yₙ. Fit your regression model to get the corresponding predicted values: ŷ₁, ŷ₂, …, ŷₙ. In ordinary least squares regression, these predictions minimize RSS by definition.

Compute the Mean of Actual Values (ȳ)

Calculate ȳ = (y₁ + y₂ + ⋯ + yₙ) / n. This is the baseline: a model that always predicts ȳ has R² = 0, because its RSS equals TSS and so 1 − RSS/TSS = 0.

Calculate TSS (Total Sum of Squares)

TSS = Σ(yᵢ − ȳ)². For each observation, subtract the mean from the actual value, square it, and sum all n squared differences. TSS represents the total spread in y independent of any model.

Calculate RSS (Residual Sum of Squares)

RSS = Σ(yᵢ − ŷᵢ)². For each observation, subtract the predicted value from the actual value (this is the residual), square it, and sum. RSS measures the model's remaining unexplained error. You can verify residuals using the regression scatter plot tool.

Apply the Formula: R² = 1 − RSS / TSS

Divide RSS by TSS to get the unexplained fraction, then subtract from 1. The result is R². As a check, compute ESS = Σ(ŷᵢ − ȳ)² directly: for a least-squares fit with an intercept, ESS/TSS should give the same R², up to rounding of the predictions.

Interpret the Result

Compare the R² to field-appropriate benchmarks (see the R-Squared Interpretation Guide below). Check whether the residuals follow the regression assumptions. A high R² with non-random residual patterns still indicates a misspecified model.
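The arithmetic steps above can be sketched as a single function that returns every intermediate quantity. The name `r_squared_steps` and the four-point dataset are assumptions for illustration; the predictions stand in for the output of whatever model was fitted.

```python
def r_squared_steps(y, y_hat):
    """Compute the mean, TSS, RSS and R^2 in the order the steps describe."""
    n = len(y)
    y_bar = sum(y) / n                                     # mean of actual values
    tss = sum((yi - y_bar) ** 2 for yi in y)               # total sum of squares
    rss = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))  # residual sum of squares
    r2 = 1 - rss / tss                                     # apply the formula
    return {"n": n, "y_bar": y_bar, "TSS": tss, "RSS": rss, "R2": r2}

# Hypothetical inputs: four observations and the predictions of some fitted model.
actual = [10.0, 12.0, 15.0, 19.0]
predicted = [9.0, 13.0, 15.0, 19.0]

result = r_squared_steps(actual, predicted)
print(result)
```

Here ȳ = 14, TSS = 46, and RSS = 2, so R² = 1 − 2/46, or roughly 0.957. Interpretation, the final step, remains a judgment that the code does not make.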

Worked Examples — R-Squared Step by Step

Each example below shows the full numerical calculation so you can follow every arithmetic step. These examples span different domains to show how R² is used and interpreted across fields.

Example 1 — Advertising Spend vs. Sales Revenue

Worked Example 1 — Simple Linear Regression

Problem: A retailer fits a regression model to predict monthly sales (y, in $000s) from advertising spend (x, in $000s) using five months of data: (x: 2, 4, 6, 8, 10) → (y: 50, 75, 80, 90, 120). The model produces predicted values ŷ = (52, 68, 84, 100, 116). Calculate R².

R² Formula

R² = 1 − RSS / TSS

TSS = Σ(yᵢ − ȳ)²
RSS = Σ(yᵢ − ŷᵢ)²

Compute ȳ: ȳ = (50 + 75 + 80 + 90 + 120) / 5 = 415 / 5 = 83

Calculate TSS:
(50 − 83)² = (−33)² = 1089
(75 − 83)² = (−8)² = 64
(80 − 83)² = (−3)² = 9
(90 − 83)² = (7)² = 49
(120 − 83)² = (37)² = 1369
TSS = 1089 + 64 + 9 + 49 + 1369 = 2580

Calculate RSS:
(50 − 52)² = (−2)² = 4
(75 − 68)² = (7)² = 49
(80 − 84)² = (−4)² = 16
(90 − 100)² = (−10)² = 100
(120 − 116)² = (4)² = 16
RSS = 4 + 49 + 16 + 100 + 16 = 185

Apply formula: R² = 1 − 185/2580 = 1 − 0.0717 = 0.928

✅ R² = 0.928. The regression model explains 92.8% of the variance in monthly sales. For a simple two-variable model in a business context, this is a strong result. Verify the regression line using the simple linear regression calculator.
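The sketch below reproduces this calculation and, for comparison, fits the exact least-squares line to the same five points. The predictions stated in the problem follow the line ŷ = 36 + 8x, which appears to be a rounded version of the least-squares line, so the two R² values are close but not identical. Variable names are illustrative.

```python
x = [2, 4, 6, 8, 10]            # advertising spend ($000s), from the example
y = [50, 75, 80, 90, 120]       # monthly sales ($000s), from the example
y_hat = [52, 68, 84, 100, 116]  # predictions stated in the example

n = len(y)
x_bar = sum(x) / n
y_bar = sum(y) / n

# R^2 for the stated predictions.
tss = sum((yi - y_bar) ** 2 for yi in y)
rss = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
r2_given = 1 - rss / tss
print(tss, rss, round(r2_given, 3))

# Exact least-squares line through the same five points, for comparison.
s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
slope = s_xy / s_xx
intercept = y_bar - slope * x_bar
ols_hat = [intercept + slope * xi for xi in x]
rss_ols = sum((yi - fi) ** 2 for yi, fi in zip(y, ols_hat))
r2_ols = 1 - rss_ols / tss
print(slope, intercept, round(r2_ols, 3))
```

The stated predictions give R² ≈ 0.928, matching the hand calculation. The exact least-squares line is ŷ = 36.5 + 7.75x with R² ≈ 0.931, which is the value a regression calculator or a squared-correlation function would report for these data.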

Example 2 — Housing Price Prediction (Multiple Regression)

Worked Example 2 — Multiple Linear Regression

Problem: A real estate analyst fits a multiple regression model using house size (sq ft) and age (years) to predict sale price ($000s). For 6 homes the actual prices are: y = [210, 185, 240, 195, 270, 220]. The model's predicted values are: ŷ = [218, 192, 235, 200, 265, 215]. Calculate R² and Adjusted R² (k = 2 predictors).

Compute ȳ: ȳ = (210 + 185 + 240 + 195 + 270 + 220) / 6 = 1320 / 6 = 220

TSS: (210−220)²+(185−220)²+(240−220)²+(195−220)²+(270−220)²+(220−220)² = 100 + 1225 + 400 + 625 + 2500 + 0 = 4850

RSS: (210−218)²+(185−192)²+(240−235)²+(195−200)²+(270−265)²+(220−215)² = 64 + 49 + 25 + 25 + 25 + 25 = 213

R²: R² = 1 − 213/4850 = 1 − 0.0439 = 0.956

Adjusted R² (n = 6, k = 2):
Adj. R² = 1 − [(1 − 0.956)(6 − 1) / (6 − 2 − 1)]
= 1 − [(0.044 × 5) / 3]
= 1 − [0.22 / 3]
= 1 − 0.0733 = 0.927

✅ R² = 0.956, Adjusted R² = 0.927. The model accounts for 95.6% of variance in housing prices in this sample. The Adjusted R² of 0.927 stays close to R², so the fit is not simply an artifact of using two predictors, although with only six homes both figures should be read cautiously. For deeper regression diagnostics, see the multiple linear regression guide.
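The same numbers can be reproduced in a few lines. This is a sketch that takes the actual and predicted prices from the problem as given; it does not refit the regression, since the house sizes and ages are not listed.

```python
y = [210, 185, 240, 195, 270, 220]      # actual prices ($000s), from the example
y_hat = [218, 192, 235, 200, 265, 215]  # predicted prices, from the example
n, k = len(y), 2                        # six homes, two predictors

y_bar = sum(y) / n
tss = sum((yi - y_bar) ** 2 for yi in y)
rss = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))

r2 = 1 - rss / tss
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)

print(tss, rss, round(r2, 3), round(adj_r2, 3))
```

Carrying full precision through the calculation, rather than rounding R² to 0.956 first, still gives an Adjusted R² that rounds to 0.927.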

Worked Example 3 — Social Science Context

Problem: A researcher models exam scores (y, out of 100) from study hours per week (x) for 5 students. Actual scores: y = [65, 72, 58, 80, 70]. Predicted scores: ŷ = [68, 70, 62, 76, 69]. Calculate R² and interpret it for a social science context.

ȳ: (65 + 72 + 58 + 80 + 70) / 5 = 345 / 5 = 69

TSS: (65−69)²+(72−69)²+(58−69)²+(80−69)²+(70−69)² = 16 + 9 + 121 + 121 + 1 = 268

RSS: (65−68)²+(72−70)²+(58−62)²+(80−76)²+(70−69)² = 9 + 4 + 16 + 16 + 1 = 46

R²: 1 − 46/268 = 1 − 0.172 = 0.828

✅ R² = 0.828. Study hours explain about 83% of the variation in exam scores in this sample. In social science research, where human behavior introduces substantial unexplained variance and values of 0.25–0.60 are more typical, an R² above 0.80 is unusually strong, and with only five students it should be treated with caution. The remaining 17% reflects factors outside the model, such as motivation, prior knowledge, or test anxiety.

R-squared methodology follows Draper, N. R., & Smith, H. (1998). Applied Regression Analysis (3rd ed.). Wiley. Variance decomposition notation consistent with the NIST/SEMATECH e-Handbook of Statistical Methods.

R-Squared Interpretation Guide

R² has no universal "good" or "bad" threshold. The same value of 0.40 might be excellent in a social science study of behavioral outcomes and inadequate in a calibrated engineering measurement system. Interpretation always depends on the field, the model's purpose, and what other metrics show.


R² = 0 means the model has no explanatory power (equivalent to always predicting the mean). R² = 1 means perfect fit. Values between 0 and 1 indicate what fraction of outcome variance the model captures. Whether a given value is "good" depends on the field and the alternative benchmarks available.

Interpretation by Value Range

R² Range	Strength of Fit
0.00–0.20	Very Weak
0.20–0.40	Weak
0.40–0.60	Moderate
0.60–0.80	Strong
0.80–1.00	Very Strong

These ranges are a starting point, not rules. Use them as orientation while applying the field-specific context below.

R² Benchmarks by Field

Field / Application	Typical R² Range	Notes
Physics / Engineering (measurement)	0.95 – 1.00	Controlled experiments with low noise
Chemistry / Materials Science	0.90 – 0.99	Well-characterized physical relationships
Financial Forecasting	0.60 – 0.90	Depends on asset class and timeframe
Econometrics / Macroeconomics	0.50 – 0.85	GDP models, inflation predictors
Machine Learning Regression	0.70 – 0.95	Varies by task; check on test set
Marketing / Business Analytics	0.40 – 0.80	Consumer behavior adds noise
Psychology / Social Science	0.25 – 0.60	Complex human outcomes; 0.40+ often accepted
Clinical / Epidemiological Research	0.20 – 0.60	Biological variability is large
Cross-sectional Survey Data	0.10 – 0.40	Observational noise from unmeasured confounders

⚠️

High R² Does Not Mean a Good Model

A model can have R² = 0.97 and still be badly misspecified. Overfitting on training data, non-linear relationships fitted with a linear model, or influential outliers driving a spurious fit all produce high R² with misleading predictions. Always examine residual plots and evaluate on held-out data. Use the regression scatter plot tool to check residual patterns visually.
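A small sketch of the misspecification case: a straight line fitted to data that are exactly quadratic. The dataset (y = x² for x = 1 to 10) is a made-up illustration, not data from this guide.

```python
x = list(range(1, 11))
y = [xi ** 2 for xi in x]   # hypothetical data with an exactly quadratic pattern

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Straight-line least-squares fit, which is the wrong functional form here.
s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
slope = s_xy / s_xx
intercept = y_bar - slope * x_bar
y_hat = [intercept + slope * xi for xi in x]

residuals = [yi - fi for yi, fi in zip(y, y_hat)]
tss = sum((yi - y_bar) ** 2 for yi in y)
rss = sum(e ** 2 for e in residuals)
r2 = 1 - rss / tss

print(round(r2, 3))
print([round(e, 1) for e in residuals])  # positive, then negative, then positive
```

R² comes out at about 0.95, yet the residuals form a clear U shape: positive at both ends and negative in the middle. That systematic pattern is what a residual plot would reveal and what the single R² number hides.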


Visual Understanding: High vs Low R²

The scatter plots below show the same number of data points under two different R² conditions. When points cluster tightly around the regression line, R² is high. When they scatter widely, R² is low — the line captures less of the story in the data.

[Figure: Scatter Plot Comparison, High R² vs Low R². Left panel: High R² Model (R² = 0.93). Right panel: Low R² Model (R² = 0.18).]

R² vs Adjusted R²: When to Use Each

Choosing between R² and Adjusted R² depends on your goal. For describing how well a fixed model fits on training data, standard R² is fine. For comparing models or building models with feature selection, Adjusted R² is the correct choice.

Dimension	R² (Standard)	Adjusted R²
Formula	1 − RSS/TSS	1 − [(1−R²)(n−1)/(n−k−1)]
Effect of adding predictors	Never decreases	Decreases if predictor adds noise
Use case	Describing a single model's fit	Comparing models with different k
Penalizes overfitting	No	Yes
Interpretation	Proportion of variance explained	Adjusted proportion accounting for model complexity
Range	0 to 1 (on training data)	Can be below 0 with many predictors and small n
Simple regression (k = 1)	Slightly above Adj. R² unless R² = 1	1 − (1 − R²)(n − 1)/(n − 2); equals R² only when R² = 1
Preferred in academia for	Reporting a final model	Model selection, feature comparison

R² vs RMSE, MAE, and Correlation

R² is one member of a broader toolkit of regression evaluation metrics. Each answers a different question about model performance, and strong practice is to report several together rather than relying on R² alone.

Metric	Formula	What It Measures	Range	Best Used When
R² (Coefficient of Determination)	1 − RSS/TSS	Proportion of variance explained	0 to 1 (training)	Describing overall model fit; comparing to a mean baseline
Adjusted R²	1 − [(1−R²)(n−1)/(n−k−1)]	Variance explained, penalized for k	Can be < 0	Comparing models with different numbers of predictors
RMSE (Root Mean Squared Error)	√(RSS/n)	Average prediction error in outcome units	0 to ∞	When large errors are especially costly (penalizes outliers)
MAE (Mean Absolute Error)	Σ|yᵢ − ŷᵢ| / n	Average absolute error in outcome units	0 to ∞	When all error magnitudes matter equally; robust to outliers
Pearson r (Correlation)	Cov(x,y) / (σₓσᵧ)	Strength and direction of linear relationship	−1 to 1	Simple linear regression only; R² = r²

✅

Best Practice: Report R² Alongside RMSE

R² tells you the relative performance (compared to always predicting the mean), while RMSE tells you the absolute error in the units of y. Together they give a fuller picture than either metric alone. For model comparison, also run a hypothesis test on whether the improvement is statistically significant.
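The difference between a relative and an absolute metric shows up when the outcome is rescaled. The sketch below computes R², RMSE, and MAE for the same hypothetical prices expressed first in $000s and then in dollars; the function name `fit_metrics` and all values are assumptions for illustration.

```python
from math import sqrt

def fit_metrics(y, y_hat):
    """Return R^2, RMSE and MAE for paired actual and predicted values."""
    n = len(y)
    y_bar = sum(y) / n
    tss = sum((yi - y_bar) ** 2 for yi in y)
    rss = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
    mae = sum(abs(yi - fi) for yi, fi in zip(y, y_hat)) / n
    return {"R2": 1 - rss / tss, "RMSE": sqrt(rss / n), "MAE": mae}

# Hypothetical prices in $000s, then the same data expressed in dollars.
y_k = [200.0, 250.0, 300.0, 350.0]
pred_k = [210.0, 240.0, 310.0, 340.0]
y_usd = [v * 1000 for v in y_k]
pred_usd = [v * 1000 for v in pred_k]

metrics_k = fit_metrics(y_k, pred_k)
metrics_usd = fit_metrics(y_usd, pred_usd)
print(metrics_k)
print(metrics_usd)
```

R² is 0.968 in both cases because it is unit-free, while RMSE and MAE scale with the units (10 in $000s, 10,000 in dollars). Reporting both lets a reader judge relative fit and practical error size.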

Real-World Applications of R-Squared

R-Squared appears in virtually every field that uses regression modeling. Below are the most common use cases across industry and research.

📈

Financial Modeling

Analysts use R² to evaluate how much of the variance in a stock's returns is explained by market movements, using the same regression on market returns that estimates the stock's beta. In factor models like the Fama-French model, R² shows how much of portfolio variance is captured by systematic risk factors.

🏠

Real Estate Valuation

Property valuation models use R² to assess how well predictors like size, location, and age explain price variation. Automated valuation models (AVMs) used by lenders typically report R² on held-out test samples.

🤖

Machine Learning

R² is the standard regression score in scikit-learn (r2_score). Crucially, it is computed on the test set in ML contexts, where R² can be negative if the model is worse than the mean predictor on unseen data.
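A negative test-set R² is easy to produce by hand. The sketch below uses plain Python rather than scikit-learn so that it runs without dependencies; the helper `r_squared` and the three test points are assumptions for illustration.

```python
def r_squared(y, y_hat):
    """1 - RSS/TSS, with TSS taken around the mean of the supplied y values."""
    y_bar = sum(y) / len(y)
    tss = sum((yi - y_bar) ** 2 for yi in y)
    rss = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
    return 1 - rss / tss

# Hypothetical held-out targets and predictions from a model that generalizes badly.
y_test = [1.0, 2.0, 3.0]
y_pred = [3.0, 3.0, 3.0]

score = r_squared(y_test, y_pred)
print(score)  # -1.5: RSS (5.0) is larger than TSS (2.0)
```

As far as its documentation describes, scikit-learn's `r2_score(y_true, y_pred)` follows the same 1 − RSS/TSS definition with the mean taken over the true test values, so it would be expected to report the same negative score for these inputs.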

🏥

Clinical Research

Epidemiologists report R² when modeling continuous outcomes like blood pressure, weight change, or biomarker levels. In clinical settings, even an R² of 0.30 can be meaningful because biological outcomes have high inherent variability.

🎯

Marketing Analytics

Marketing mix models use R² to evaluate how well advertising spend, promotions, and seasonality predict sales. R² values of 0.70–0.90 are typical in well-specified marketing mix models.

⚙️

Engineering & Quality Control

In process engineering and calibration, R² is used to validate measurement instruments and process models. R² above 0.99 is often required for calibration curves in analytical chemistry.

How to Calculate R-Squared in Excel

Excel provides three direct methods to obtain R² in a spreadsheet, each requiring only a few clicks.

Using the RSQ Function

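Place actual values in column A and predicted values in column B. In any empty cell, enter: =RSQ(A2:A20, B2:B20), substituting your own ranges. Note: RSQ computes the squared Pearson correlation between the two arrays. That equals 1 − RSS/TSS when the predictions come from a least-squares fit with an intercept (and, applied to y and a single x, it is the R² of the simple linear regression); for predictions from other sources the two can differ. To fit a multiple regression in Excel, use the LINEST function described next.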

Using the LINEST Function (Multiple Regression)

Select an empty range five rows tall and k + 1 columns wide (k = number of predictors), type =LINEST(y_range, x_range, TRUE, TRUE), and press Ctrl+Shift+Enter to enter it as an array formula. Excel returns a block of statistics; R² is in row 3, column 1 of the output. This method works for any number of predictors.

Using a Trendline on a Chart

Create an XY scatter chart from your x and y data. Right-click on the data series, choose "Add Trendline," select Linear (or another model type), and check both "Display Equation on chart" and "Display R-squared value on chart." Excel adds R² directly to the chart. This method applies only to the chart's data range.

Formula Glossary

All key terms and formulas from this guide, collected in one reference block.

R-Squared (R²)

Proportion of variance in the dependent variable explained by the regression model. Also called the coefficient of determination.

R² = 1 − RSS/TSS

Adjusted R²

R² penalized for the number of predictors k. Decreases if adding a predictor does not improve fit beyond what chance would produce.

1 − [(1−R²)(n−1)/(n−k−1)]

TSS — Total Sum of Squares

Total variance in y relative to its mean. Baseline against which the model is evaluated.

TSS = Σ(yᵢ − ȳ)²

RSS — Residual Sum of Squares

Unexplained variance remaining after fitting the model. Also called SSE (Sum of Squared Errors).

RSS = Σ(yᵢ − ŷᵢ)²

ESS — Explained Sum of Squares

Variance captured by the regression model. ESS = TSS − RSS.

ESS = Σ(ŷᵢ − ȳ)²

Residual (e)

The difference between an actual and predicted value for observation i.

eᵢ = yᵢ − ŷᵢ

Pearson r

Correlation coefficient measuring linear association. In simple linear regression, R² = r².

r = Cov(x,y)/(σₓσᵧ)

RMSE

Root Mean Squared Error — average prediction error in the units of y. Measures absolute error magnitude.

RMSE = √(RSS/n)

Frequently Asked Questions

External references: James et al., An Introduction to Statistical Learning (ISLR), a widely used textbook treatment of R² in regression models; scikit-learn r2_score documentation, which describes how R² is implemented in Python's leading ML library; NIST Engineering Statistics Handbook, an authoritative technical reference for the coefficient of determination.