What Is R-Squared? (Definition)
The name "coefficient of determination" reflects what it measures: how much of the variation in the dependent variable is determined (explained) by the predictors. In simple linear regression with one predictor, R² equals the square of the Pearson correlation coefficient r between x and y, which is where the "R" in R-Squared comes from.
R² answers a concrete question: if you used your model's predictions instead of just predicting the mean every time, by how much would your total squared error shrink? An R² of 0.72 means your model's squared error is 72% smaller than that of the naive baseline that always predicts ȳ.
The concept was formalized within the broader framework of simple linear regression and is now used across nearly every quantitative discipline. The underlying mathematics of variance decomposition connects directly to ANOVA (Analysis of Variance), which partitions total variation into the same explained and unexplained components.
- Full name: Coefficient of determination
- Symbol: R² (also written as R-squared, R-squared value, or r-squared)
- Range in OLS training data (model with an intercept): 0 to 1 (can be negative on test/out-of-sample data)
- Interpretation: Proportion of total variance in y explained by the regression model
- Formula: R² = 1 − RSS/TSS = ESS/TSS
- Limitation: Never decreases as you add predictors, even useless ones; use Adjusted R² to compare models with different numbers of predictors
- In simple regression: R² = r² (squared Pearson correlation coefficient)
The R-Squared Formula Explained
R-Squared is built from three quantities that partition total variance into explained and unexplained parts. Understanding these three sums of squares is the foundation for interpreting any regression model.
R² = 1 − RSS / TSS
RSS = Residual Sum of Squares = Σ(yᵢ − ŷᵢ)²
TSS = Total Sum of Squares = Σ(yᵢ − ȳ)²
yᵢ = actual value
ŷᵢ = predicted value
ȳ = mean of actual values
The formula has a clear geometric meaning. TSS measures the total spread in your outcome variable — how much the actual values vary around their mean. RSS measures how much variation remains after fitting the model — the squared distances between actual and predicted values. The ratio RSS/TSS is the fraction of variance the model fails to explain. Subtracting that from 1 gives the fraction it does explain.
The Three Sums of Squares
For an ordinary least squares model that includes an intercept, the total variation in y decomposes exactly into explained and unexplained parts:
TSS = Σ(yᵢ − ȳ)² — Total Sum of Squares
ESS = Σ(ŷᵢ − ȳ)² — Explained Sum of Squares
RSS = Σ(yᵢ − ŷᵢ)² — Residual Sum of Squares
Because TSS = ESS + RSS, the formula R² = 1 − RSS/TSS is equivalent to R² = ESS/TSS. Both give the same value. The ESS/TSS form makes the interpretation more direct: R² is the explained share of total variance. The 1 − RSS/TSS form is more commonly used in computation because residuals (yᵢ − ŷᵢ) are already calculated during model fitting.
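The decomposition above can be checked numerically. The following is a minimal sketch, assuming a simple one-predictor OLS fit computed in closed form; the data values and variable names (`s_xy`, `s_xx`, `y_hat`) are illustrative choices, not part of any library.

```python
# Illustrative data (assumed for this sketch): one predictor x, outcome y.
x = [2, 4, 6, 8, 10]
y = [50, 75, 80, 90, 120]

n = len(y)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Closed-form OLS slope and intercept for simple linear regression.
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
s_xx = sum((xi - x_bar) ** 2 for xi in x)
slope = s_xy / s_xx
intercept = y_bar - slope * x_bar
y_hat = [intercept + slope * xi for xi in x]

# The three sums of squares.
tss = sum((yi - y_bar) ** 2 for yi in y)
ess = sum((fi - y_bar) ** 2 for fi in y_hat)
rss = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))

print(f"TSS = {tss:.1f}, ESS = {ess:.1f}, RSS = {rss:.1f}")
print(f"1 - RSS/TSS = {1 - rss / tss:.4f}")
print(f"ESS/TSS     = {ess / tss:.4f}")
```

For this data the two forms agree because the predictions come from a least-squares fit with an intercept. With predictions from another source, ESS + RSS need not equal TSS, and only the 1 − RSS/TSS form is safe to use.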
The Adjusted R-Squared Formula
Standard R² has a known flaw: adding any predictor to a model, even a useless random variable, will either increase R² or leave it unchanged. This means comparing R² across models with different numbers of predictors is misleading. Adjusted R² corrects this by penalizing model complexity:
Adjusted R² = 1 − [(1 − R²)(n − 1) / (n − k − 1)]
n = sample size
k = number of predictors (not counting the intercept)
R² = standard R-Squared
When you add a predictor that genuinely improves the model, Adjusted R² increases. When you add a predictor that contributes little, Adjusted R² decreases or stays flat. This makes it better suited than plain R² for comparing multiple linear regression models with different numbers of predictors.
If R² increases but Adjusted R² decreases when you add a predictor, that predictor is adding noise rather than signal. The denominator term (n − k − 1) becomes small when k approaches n, causing Adjusted R² to collapse — which is why you need far more observations than predictors in any regression model.
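A small sketch of the penalty described above, assuming the formula Adjusted R² = 1 − (1 − R²)(n − 1)/(n − k − 1). The function name `adjusted_r2` and the numbers in the loop are hypothetical, chosen only to show how the value collapses as k approaches n.

```python
def adjusted_r2(r2: float, n: int, k: int) -> float:
    """Adjusted R² = 1 - (1 - R²)(n - 1) / (n - k - 1).

    r2: standard R², n: sample size, k: number of predictors.
    """
    if n - k - 1 <= 0:
        raise ValueError("need n > k + 1 observations")
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


# Hypothetical scenario: R² stays at 0.90 with n = 10 while k grows.
for k in (1, 3, 5, 8):
    print(f"k = {k}: adjusted R² = {adjusted_r2(0.90, 10, k):.3f}")
```

With R² fixed at 0.90 and only ten observations, the adjusted value drops from roughly 0.89 with one predictor to roughly 0.10 with eight, which is the collapse the denominator term produces.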
How to Calculate R-Squared: Step-by-Step
Calculating R² from scratch requires only basic arithmetic. The procedure below works for any regression model, whether simple linear or multiple.
Collect Actual Values and Fit the Model
Record your n actual outcomes: y₁, y₂, …, yₙ. Fit your regression model to get the corresponding predicted values: ŷ₁, ŷ₂, …, ŷₙ. In ordinary least squares regression, these predictions minimize RSS by definition.
Compute the Mean of Actual Values (ȳ)
Calculate ȳ = (y₁ + y₂ + ⋯ + yₙ) / n. This is the baseline: a model that always predicts ȳ has R² = 0 by definition, because TSS − RSS = 0.
Calculate TSS (Total Sum of Squares)
TSS = Σ(yᵢ − ȳ)². For each observation, subtract the mean from the actual value, square it, and sum all n squared differences. TSS represents the total spread in y independent of any model.
Calculate RSS (Residual Sum of Squares)
RSS = Σ(yᵢ − ŷᵢ)². For each observation, subtract the predicted value from the actual value (this is the residual), square it, and sum. RSS measures the model's remaining unexplained error. You can verify residuals using the regression scatter plot tool.
Apply the Formula: R² = 1 − RSS / TSS
Divide RSS by TSS to get the unexplained fraction, then subtract from 1. The result is R². As a check for an OLS fit with an intercept: compute ESS = Σ(ŷᵢ − ȳ)² directly; it should equal TSS − RSS, and ESS/TSS should give the same R².
Interpret the Result
Compare the R² to field-appropriate benchmarks (see "R² Benchmarks by Field" below). Check whether the residuals follow the regression assumptions. A high R² with non-random residual patterns still indicates a misspecified model.
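The steps above can be sketched as a short function. This is one possible way to write it, using only the standard library; the name `r_squared` and the sample numbers are assumptions made for illustration.

```python
def r_squared(actual, predicted):
    """Compute R² = 1 - RSS/TSS from actual and predicted values."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if not actual:
        raise ValueError("at least one observation is required")

    # Mean of actual values (the baseline predictor).
    y_bar = sum(actual) / len(actual)

    # Total sum of squares: spread of actual values around their mean.
    tss = sum((y - y_bar) ** 2 for y in actual)
    if tss == 0:
        raise ValueError("TSS is zero: all actual values are identical")

    # Residual sum of squares: squared prediction errors.
    rss = sum((y - f) ** 2 for y, f in zip(actual, predicted))

    return 1 - rss / tss


# Hypothetical values: TSS = 20 and RSS = 1, so R² is 0.95.
actual = [3, 5, 7, 9]
predicted = [2.5, 5.5, 6.5, 9.5]
print(round(r_squared(actual, predicted), 4))
```

The guard on TSS matters in practice: if every actual value is the same, the ratio RSS/TSS is undefined and R² has no meaning.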
Worked Examples — R-Squared Step by Step
Each example below shows the full numerical calculation so you can follow every arithmetic step. These examples span different domains to show how R² is used and interpreted across fields.
Example 1 — Advertising Spend vs. Sales Revenue
Problem: A retailer fits a regression model to predict monthly sales (y, in $000s) from advertising spend (x, in $000s) using five months of data: (x: 2, 4, 6, 8, 10) → (y: 50, 75, 80, 90, 120). The model produces predicted values ŷ = (52, 68, 84, 100, 116). Calculate R².
TSS = Σ(yᵢ − ȳ)²
RSS = Σ(yᵢ − ŷᵢ)²
Compute ȳ: ȳ = (50 + 75 + 80 + 90 + 120) / 5 = 415 / 5 = 83
Calculate TSS:
(50 − 83)² = (−33)² = 1089
(75 − 83)² = (−8)² = 64
(80 − 83)² = (−3)² = 9
(90 − 83)² = (7)² = 49
(120 − 83)² = (37)² = 1369
TSS = 1089 + 64 + 9 + 49 + 1369 = 2580
Calculate RSS:
(50 − 52)² = (−2)² = 4
(75 − 68)² = (7)² = 49
(80 − 84)² = (−4)² = 16
(90 − 100)² = (−10)² = 100
(120 − 116)² = (4)² = 16
RSS = 4 + 49 + 16 + 100 + 16 = 185
Apply formula: R² = 1 − 185/2580 = 1 − 0.0717 = 0.928
✅ R² = 0.928. The regression model explains 92.8% of the variance in monthly sales. For a simple two-variable model in a business context, this is a strong result. Verify the regression line using the simple linear regression calculator.
Example 2 — Housing Price Prediction (Multiple Regression)
Problem: A real estate analyst fits a multiple regression model using house size (sq ft) and age (years) to predict sale price ($000s). For 6 homes the actual prices are: y = [210, 185, 240, 195, 270, 220]. The model's predicted values are: ŷ = [218, 192, 235, 200, 265, 215]. Calculate R² and Adjusted R² (k = 2 predictors).
Compute ȳ: ȳ = (210 + 185 + 240 + 195 + 270 + 220) / 6 = 1320 / 6 = 220
TSS: (210−220)²+(185−220)²+(240−220)²+(195−220)²+(270−220)²+(220−220)² = 100 + 1225 + 400 + 625 + 2500 + 0 = 4850
RSS: (210−218)²+(185−192)²+(240−235)²+(195−200)²+(270−265)²+(220−215)² = 64 + 49 + 25 + 25 + 25 + 25 = 213
R²: R² = 1 − 213/4850 = 1 − 0.0439 = 0.956
Adjusted R² (n = 6, k = 2):
Adj. R² = 1 − [(1 − 0.956)(6 − 1) / (6 − 2 − 1)]
= 1 − [(0.044 × 5) / 3]
= 1 − [0.22 / 3]
= 1 − 0.0733 = 0.927
✅ R² = 0.956, Adjusted R² = 0.927. The model accounts for 95.6% of variance in housing prices in this sample. The Adjusted R² of 0.927 remains high after the penalty for two predictors, which suggests the fit is not simply inflated by the extra predictor, although six observations are too few to draw firm conclusions. For deeper regression diagnostics, see the multiple linear regression guide.
Example 3 — Social Science (Study Hours vs. Exam Scores)
Problem: A researcher models exam scores (y, out of 100) from study hours per week (x) for 5 students. Actual scores: y = [65, 72, 58, 80, 70]. Predicted scores: ŷ = [68, 70, 62, 76, 69]. Calculate R² and interpret it for a social science context.
ȳ: (65 + 72 + 58 + 80 + 70) / 5 = 345 / 5 = 69
TSS: (65−69)²+(72−69)²+(58−69)²+(80−69)²+(70−69)² = 16 + 9 + 121 + 121 + 1 = 268
RSS: (65−68)²+(72−70)²+(58−62)²+(80−76)²+(70−69)² = 9 + 4 + 16 + 16 + 1 = 46
R²: 1 − 46/268 = 1 − 0.172 = 0.828
✅ R² = 0.828. Study hours explain about 83% of the variation in exam scores in this sample. In social science research, where human behavior introduces substantial unexplained variance, an R² above 0.80 is unusually high, and with only five students it should be read as an illustration of the arithmetic rather than a typical result. The remaining 17% reflects factors outside the model: motivation, prior knowledge, test anxiety.
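The arithmetic in the three worked examples can be re-checked in a few lines. This sketch copies the actual and predicted values from the examples; the helper name `r_squared` and the `examples` dictionary are illustrative.

```python
def r_squared(actual, predicted):
    """R² = 1 - RSS/TSS for equal-length sequences of numbers."""
    y_bar = sum(actual) / len(actual)
    tss = sum((y - y_bar) ** 2 for y in actual)
    rss = sum((y - f) ** 2 for y, f in zip(actual, predicted))
    return 1 - rss / tss


# (actual, predicted) pairs copied from the worked examples above.
examples = {
    "Example 1": ([50, 75, 80, 90, 120], [52, 68, 84, 100, 116]),
    "Example 2": ([210, 185, 240, 195, 270, 220],
                  [218, 192, 235, 200, 265, 215]),
    "Example 3": ([65, 72, 58, 80, 70], [68, 70, 62, 76, 69]),
}

results = {name: r_squared(a, p) for name, (a, p) in examples.items()}
for name, r2 in results.items():
    print(f"{name}: R² = {r2:.3f}")
# Prints 0.928, 0.956 and 0.828, matching the hand calculations.
```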
R-Squared Interpretation Guide
R² has no universal "good" or "bad" threshold. The same value of 0.40 might be excellent in a social science study of behavioral outcomes and inadequate in a calibrated engineering measurement system. Interpretation always depends on the field, the model's purpose, and what other metrics show.
R² = 0 means the model has no explanatory power (equivalent to always predicting the mean). R² = 1 means perfect fit. Values between 0 and 1 indicate what fraction of outcome variance the model captures. Whether a given value is "good" depends on the field and the alternative benchmarks available.
Interpretation by Value Range
These ranges are a starting point, not rules. Use them as orientation while applying the field-specific context below.
R² Benchmarks by Field
| Field / Application | Typical R² Range | Notes |
|---|---|---|
| Physics / Engineering (measurement) | 0.95 – 1.00 | Controlled experiments with low noise |
| Chemistry / Materials Science | 0.90 – 0.99 | Well-characterized physical relationships |
| Financial Forecasting | 0.60 – 0.90 | Depends on asset class and timeframe |
| Econometrics / Macroeconomics | 0.50 – 0.85 | GDP models, inflation predictors |
| Machine Learning Regression | 0.70 – 0.95 | Varies by task; check on test set |
| Marketing / Business Analytics | 0.40 – 0.80 | Consumer behavior adds noise |
| Psychology / Social Science | 0.25 – 0.60 | Complex human outcomes; 0.40+ often accepted |
| Clinical / Epidemiological Research | 0.20 – 0.60 | Biological variability is large |
| Cross-sectional Survey Data | 0.10 – 0.40 | Observational noise from unmeasured confounders |
A model can have R² = 0.97 and still be badly misspecified. Overfitting on training data, non-linear relationships fitted with a linear model, or influential outliers driving a spurious fit all produce high R² with misleading predictions. Always examine residual plots and evaluate on held-out data. Use the regression scatter plot tool to check residual patterns visually.
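To make the warning concrete, here is a minimal sketch that fits a straight line to data that is actually quadratic. The data is synthetic and assumed only for illustration; the fit uses the closed-form OLS formulas for one predictor.

```python
# Synthetic data: y is exactly quadratic in x, but we fit a straight line.
x = list(range(1, 11))
y = [xi ** 2 for xi in x]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Closed-form OLS slope and intercept for one predictor.
slope = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
         / sum((xi - x_bar) ** 2 for xi in x))
intercept = y_bar - slope * x_bar

residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
rss = sum(e ** 2 for e in residuals)
tss = sum((yi - y_bar) ** 2 for yi in y)
r2 = 1 - rss / tss

print(f"R² = {r2:.3f}")
print("residuals:", [round(e, 1) for e in residuals])
```

R² comes out near 0.95, yet the residuals are positive at both ends and negative in the middle. That U-shaped pattern is the signature of a missing non-linear term, and R² alone would not reveal it.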
Visual Understanding: High vs Low R²
The scatter plots below show the same number of data points under two different R² conditions. When points cluster tightly around the regression line, R² is high. When they scatter widely, R² is low — the line captures less of the story in the data.
[Figure: Scatter Plot Comparison: High R² vs Low R²]
R² vs Adjusted R²: When to Use Each
Choosing between R² and Adjusted R² depends on your goal. For describing how well a fixed model fits on training data, standard R² is fine. For comparing models or building models with feature selection, Adjusted R² is the better choice.
| Dimension | R² (Standard) | Adjusted R² |
|---|---|---|
| Formula | 1 − RSS/TSS | 1 − [(1−R²)(n−1)/(n−k−1)] |
| Effect of adding predictors | Never decreases | Decreases if predictor adds noise |
| Use case | Describing a single model's fit | Comparing models with different k |
| Penalizes overfitting | No | Yes |
| Interpretation | Proportion of variance explained | Adjusted proportion accounting for model complexity |
| Range | 0 to 1 (on training data) | Can be below 0 with many predictors and small n |
| Simple regression (k = 1) | Equals r² (squared Pearson correlation) | Slightly below R²: 1 − (1 − R²)(n − 1)/(n − 2); the two coincide only when R² = 1 |
| Preferred in academia for | Reporting a final model | Model selection, feature comparison |
R² vs RMSE, MAE, and Correlation
R² is one member of a broader toolkit of regression evaluation metrics. Each answers a different question about model performance, and strong practice is to report several together rather than relying on R² alone.
| Metric | Formula | What It Measures | Range | Best Used When |
|---|---|---|---|---|
| R² (Coefficient of Determination) | 1 − RSS/TSS | Proportion of variance explained | 0 to 1 (training) | Describing overall model fit; comparing to a mean baseline |
| Adjusted R² | 1 − [(1−R²)(n−1)/(n−k−1)] | Variance explained, penalized for k | Can be < 0 | Comparing models with different numbers of predictors |
| RMSE (Root Mean Squared Error) | √(RSS/n) | Average prediction error in outcome units | 0 to ∞ | When large errors are especially costly (penalizes outliers) |
| MAE (Mean Absolute Error) | Σ\|yᵢ − ŷᵢ\| / n | Average absolute error in outcome units | 0 to ∞ | When all error magnitudes matter equally; robust to outliers |
| Pearson r (Correlation) | Cov(x,y) / (σₓσᵧ) | Strength and direction of linear relationship | −1 to 1 | Simple linear regression only; R² = r² |
R² tells you the relative performance (compared to always predicting the mean), while RMSE tells you the absolute error in the units of y. Together they give a fuller picture than either alone. For model comparison, also run a hypothesis test on whether the improvement is statistically significant.
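As a sketch of reporting several metrics together, the snippet below computes R², RMSE, and MAE for the advertising data from Example 1, using the formulas in the table. Variable names are illustrative.

```python
import math

# Actual and predicted monthly sales ($000s) from Example 1.
actual = [50, 75, 80, 90, 120]
predicted = [52, 68, 84, 100, 116]

n = len(actual)
y_bar = sum(actual) / n
rss = sum((y - f) ** 2 for y, f in zip(actual, predicted))
tss = sum((y - y_bar) ** 2 for y in actual)

r2 = 1 - rss / tss                      # relative to the mean baseline
rmse = math.sqrt(rss / n)               # in units of y, penalizes large errors
mae = sum(abs(y - f) for y, f in zip(actual, predicted)) / n  # in units of y

print(f"R² = {r2:.3f}, RMSE = {rmse:.2f}, MAE = {mae:.2f}")
```

Here R² is about 0.93, while RMSE (about 6.1) and MAE (5.4) say the typical miss is around $5,000 to $6,000 per month. Whether that is acceptable is a business question that R² cannot answer on its own.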
Real-World Applications of R-Squared
R-Squared appears in virtually every field that uses regression modeling. Below are the most common use cases across industry and research.
Financial Modeling
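Analysts use R² to evaluate how much of the variation in a stock's returns is explained by market factors. Beta is the slope of that regression; R² indicates how closely the stock's returns track the market. In factor models like the Fama-French model, R² shows how much of portfolio variance is captured by systematic risk factors.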
Real Estate Valuation
Property valuation models use R² to assess how well predictors like size, location, and age explain price variation. Automated valuation models (AVMs) used by lenders typically report R² on held-out test samples.
Machine Learning
R² is the standard regression score in scikit-learn (r2_score). Crucially, it is computed on the test set in ML contexts, where R² can be negative if the model is worse than the mean predictor on unseen data.
Clinical Research
Epidemiologists report R² when modeling continuous outcomes like blood pressure, weight change, or biomarker levels. In clinical settings, even an R² of 0.30 can be meaningful because biological outcomes have high inherent variability.
Marketing Analytics
Marketing mix models use R² to evaluate how well advertising spend, promotions, and seasonality predict sales. R² values of 0.70–0.90 are typical in well-specified marketing mix models.
Engineering & Quality Control
In process engineering and calibration, R² is used to validate measurement instruments and process models. R² above 0.99 is often required for calibration curves in analytical chemistry.
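The Machine Learning note above says R² can be negative on unseen data. The sketch below shows how that happens, using made-up test values and three hypothetical sets of predictions. It is written in plain Python; scikit-learn's `r2_score`, which the text mentions, is based on the same 1 − RSS/TSS definition.

```python
def r_squared(actual, predicted):
    """R² = 1 - RSS/TSS, with TSS taken around the mean of `actual`."""
    y_bar = sum(actual) / len(actual)
    tss = sum((y - y_bar) ** 2 for y in actual)
    rss = sum((y - f) ** 2 for y, f in zip(actual, predicted))
    return 1 - rss / tss


# Hypothetical held-out outcomes and three candidate prediction sets.
y_test = [10, 12, 14, 16]
good_preds = [11, 12, 13, 17]    # close to the truth
mean_preds = [13, 13, 13, 13]    # always predicts the test mean
bad_preds = [16, 14, 12, 10]     # trend reversed

r2_good = r_squared(y_test, good_preds)
r2_mean = r_squared(y_test, mean_preds)
r2_bad = r_squared(y_test, bad_preds)

print(r2_good, r2_mean, r2_bad)
```

The close predictions score 0.85, the mean predictor scores exactly 0, and the reversed predictions score −3: their squared error is four times that of simply predicting the mean.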
How to Calculate R-Squared in Excel
Excel provides three direct methods to obtain R² in a spreadsheet, each requiring only a few clicks.
Using the RSQ Function
Place actual values in column A and predicted values in column B. In any empty cell, enter =RSQ(A2:A20, B2:B20), substituting your own ranges. RSQ returns the squared Pearson correlation between the two arrays. That equals R² when the predicted values come from a least-squares fit with an intercept (or when you pass y and x directly for a simple linear regression); for predictions from any other source it can differ from 1 − RSS/TSS. For multiple regression, use the LINEST function described next.
Using the LINEST Function (Multiple Regression)
Select a range of cells, type =LINEST(y_range, x_range, TRUE, TRUE), and press Ctrl+Shift+Enter (array formula); in versions of Excel with dynamic arrays, pressing Enter is enough. Excel returns a block of statistics; R² is in row 3, column 1 of the output. This method works for any number of predictors: x_range can span several adjacent columns.
Using a Trendline on a Chart
Create an XY scatter chart from your x and y data. Right-click on the data series, choose "Add Trendline," select Linear (or another model type), and check both "Display Equation on chart" and "Display R-squared value on chart." Excel adds R² directly to the chart. This method applies only to the chart's data range.
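For readers who want to cross-check a spreadsheet result, the sketch below reproduces what RSQ computes (a squared Pearson correlation) and compares it with 1 − RSS/TSS. It uses the data from Example 1; the helper `pearson_r` is an illustrative name, not an Excel or library function.

```python
import math


def pearson_r(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(a)
    a_bar = sum(a) / n
    b_bar = sum(b) / n
    cov = sum((ai - a_bar) * (bi - b_bar) for ai, bi in zip(a, b))
    var_a = sum((ai - a_bar) ** 2 for ai in a)
    var_b = sum((bi - b_bar) ** 2 for bi in b)
    return cov / math.sqrt(var_a * var_b)


# Data from Example 1: ad spend, actual sales, and the stated predictions.
x = [2, 4, 6, 8, 10]
y = [50, 75, 80, 90, 120]
predicted = [52, 68, 84, 100, 116]

# RSQ-style value: squared correlation between y and x.
rsq_xy = pearson_r(y, x) ** 2

# Direct definition applied to the stated predictions.
y_bar = sum(y) / len(y)
rss = sum((yi - fi) ** 2 for yi, fi in zip(y, predicted))
tss = sum((yi - y_bar) ** 2 for yi in y)
r2_pred = 1 - rss / tss

print(f"squared correlation = {rsq_xy:.4f}")
print(f"1 - RSS/TSS         = {r2_pred:.4f}")
```

The squared correlation is about 0.9312, while 1 − RSS/TSS for the stated predictions is about 0.9283. The small gap arises because those predictions are close to, but not exactly, the least-squares line. The squared correlation describes the best possible linear fit; 1 − RSS/TSS describes the predictions you actually have.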
Formula Glossary
All key terms and formulas from this guide, collected in one reference block.