July 3, 2026
BY: Statistics Fundamentals Team
Reviewed By: Minsa A (Senior Statistics Editor)

Forecasting Sales with Regression

A regional bakery chain has 24 months of sales records and a hunch that colder weeks sell more bread. Regression turns that hunch into a number: for every 10-degree drop in temperature, weekly sales rise by a specific, testable amount. That number is what lets an operations manager order the right amount of flour instead of guessing.

This is a reference guide to sales forecasting with regression, part of the Statistics Fundamentals library. It covers the regression equation, slope and intercept, R² and residuals, three worked business examples with real arithmetic, a step-by-step build process, and common mistakes to avoid. Every formula shown here is reproduced in the interactive forecast calculator below, so you can test it against your own sales numbers.

What You'll Learn
  • ✓ What regression analysis is and why it works for sales forecasting
  • ✓ How to read the regression equation, slope, intercept, R², and residuals
  • ✓ Three worked examples: monthly sales trend, ad spend, and seasonal retail demand
  • ✓ A three-phase, twelve-step process for building your own forecast from raw data
  • ✓ How regression compares to moving averages, exponential smoothing, and ARIMA
  • ✓ A free calculator that forecasts future sales from your own historical numbers

What Is Sales Forecasting?

Definition — Sales Forecasting
Sales forecasting is the practice of estimating future sales using historical data, market conditions, and statistical or judgmental methods. A forecast is not a guess; it is a number attached to evidence, built so a business can plan inventory, staffing, and budget before the revenue actually arrives.

Every business runs on a forecast, whether that forecast is written down or not. A café owner who orders more milk before a holiday weekend is forecasting. A finance team building next year's budget is forecasting. The difference between an informal forecast and a statistical one is that a statistical forecast can be checked: you can measure how wrong it was last time and adjust the method, something a gut feeling does not let you do.

Forecasting methods generally split into two families. Judgmental methods rely on expert opinion, sales team estimates, or market research, and work well when little historical data exists, such as for a new product launch. Quantitative methods, including regression, moving averages, and time series models, use historical numbers directly and work best once a business has at least a year of consistent sales records. Most companies blend both: a regression model sets the baseline, and a sales manager adjusts it for known events the model cannot see, such as a planned price change or a competitor closing a nearby store.

Forecasts feed directly into decisions with real cost: how much inventory to hold, how many staff to schedule, what revenue to promise investors, and how much to spend on marketing next quarter. An inaccurate forecast is not a rounding error. It shows up as unsold stock, missed sales from stockouts, or a budget that has to be revised mid-year.

What Is Regression Analysis?

Definition — Regression Analysis
Regression analysis is a statistical method that measures the relationship between a variable you want to predict, such as sales, and one or more variables that might explain it, such as advertising spend, price, or the season. It fits a line, or a more complex curve, through historical data points and uses that line to estimate values you have not observed yet, including future sales.

Think of regression as a rule of thumb, except the rule is tested against every past data point instead of pulled from a hunch. If sales have risen by roughly $4,500 for every $1,000 spent on advertising over the past two years, regression finds that $4,500 figure formally, reports how reliable it is, and lets you plug in next month's planned ad budget to get a specific predicted sales number rather than a vague sense that "more advertising helps."

The technique dates back to work by Francis Galton in the 1880s on the heights of parents and children, and it remains one of the most widely used approaches to structured forecasting in business, economics, and the sciences. For a beginner-friendly walkthrough of the underlying math, Khan Academy's statistics and probability course covers the concepts from scratch, and the Statistics Fundamentals guide to simple linear regression goes deeper into the mechanics used throughout this article.

Regression answers a specific question: given what has happened before, what is the single best straight-line estimate of what happens next? It does not know about a competitor's surprise product launch or a supply chain disruption. It only knows the pattern in the data you feed it, which is why the examples later in this guide pair every calculation with a business judgment about what the number does and does not capture.

Why Regression Is Useful for Sales Forecasting

A sales manager who says "I think Q4 will be strong" is stating an opinion. A sales manager who says "based on the last three years, Q4 sales run 42% above the quarterly average, and this model accounts for 91% of that pattern" is stating a claim that can be checked, defended in a budget meeting, and improved next year. That shift, from opinion to checkable claim, is the practical value regression brings to a business.

Where Regression Forecasting Pays Off
  • Data-driven decisions: replaces "I think" with a number backed by every past observation, not just the most memorable one.
  • Trend identification: separates a real upward or downward trend from ordinary week-to-week noise.
  • Revenue planning: gives finance a defensible baseline for budgets, investor updates, and hiring plans.
  • Inventory optimization: ties stock orders to a specific predicted demand number instead of a round guess.
  • Budget forecasting: lets a business test "what if we spent $5,000 more on ads" against a model fitted to actual past spend.
  • Marketing planning: quantifies which channels or campaigns actually move sales, not just which ones felt busy.

A 2015 Harvard Business Review piece on regression analysis makes a related point worth keeping in mind: most managers will never run the regression themselves, but they do need to read and question a colleague's model, including what variables it includes, what its R² actually means, and where it might be overstating certainty. This guide is written with that reader in mind as much as the analyst building the model.

Types of Regression Used in Sales Forecasting

Most business forecasts start with one of two models: simple linear regression, when a single factor drives the prediction, or multiple linear regression, when several factors act together. A handful of specialized variants exist for specific situations, covered in the table below.

Simple Linear Regression

Simple linear regression predicts an outcome from a single input, most often time (to capture a trend) or one spend figure such as advertising budget. It is the starting point for almost every forecasting project because it is easy to compute, easy to explain to a non-technical stakeholder, and a useful baseline against which to judge more complex models.

Simple Linear Regression Equation
ŷ = b0 + b1x
where ŷ = predicted sales, b0 = intercept, b1 = slope, x = predictor (e.g. time period)

Multiple Linear Regression

Multiple linear regression predicts sales from two or more inputs at once, such as advertising spend, price, and a seasonal indicator. It usually captures more of the real story behind sales movements than a single-variable model, since sales rarely depend on just one thing, but it needs more historical data and more care to avoid including variables that do not actually add explanatory power. The multiple linear regression guide covers how to add and interpret additional predictors.

Multiple Linear Regression Equation
ŷ = b0 + b1x1 + b2x2 + ... + bnxn
where x1, x2 ... = each predictor variable, and b1, b2 ... = each predictor's coefficient
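As a rough illustration of how the multiple regression equation turns inputs into a forecast, the sketch below evaluates ŷ = b0 + b1x1 + ... + bnxn for one period. The function name `predict_sales` and every coefficient value are hypothetical, chosen only to make the arithmetic easy to follow; they are not fitted to any dataset in this guide.

```python
def predict_sales(intercept, coefficients, predictors):
    """Evaluate y_hat = b0 + b1*x1 + ... + bn*xn for a single period."""
    if len(coefficients) != len(predictors):
        raise ValueError("need exactly one coefficient per predictor")
    return intercept + sum(b * x for b, x in zip(coefficients, predictors))


# Hypothetical coefficients, for illustration only.
b0 = 50.0                      # baseline sales ($000)
coefs = [4.0, -2.5, 30.0]      # ad spend ($000), price ($), Q4 dummy (0 or 1)
next_month = [12.0, 9.0, 1.0]  # planned ad spend, planned price, month is in Q4

forecast = predict_sales(b0, coefs, next_month)
print(f"Predicted sales: ${forecast:.1f}K")
```

With a single coefficient and a single predictor, the same function reduces to the simple linear form ŷ = b0 + b1x.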

Four other regression variants come up often enough in forecasting work to be worth knowing by name, even if you never run one yourself.

Type | What It Models | Business Use Case
Polynomial Regression | A curved relationship, using powers of x (x², x³) instead of a straight line | Product life-cycle sales that rise, peak, and decline rather than moving in one direction
Logistic Regression | The probability of a binary outcome, such as yes/no, rather than a continuous number | Predicting whether an individual lead converts to a sale, not how much they will spend
Ridge Regression | A regularized linear model that shrinks coefficients to reduce overfitting | Multiple regression with many correlated predictors, such as several overlapping marketing channels
Lasso Regression | A regularized linear model that can shrink some coefficients to exactly zero | Automatically dropping weak predictors from a large forecasting model to keep it simple

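Python's scikit-learn library covers all six of these: its linear_model module provides LinearRegression, LogisticRegression, Ridge, and Lasso, and polynomial regression is typically built by pairing LinearRegression with PolynomialFeatures from the preprocessing module. That breadth is one reason Python has become a common choice once a forecasting project outgrows a spreadsheet.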

Key Statistical Concepts Behind Regression

Every regression forecast rests on the same handful of building blocks. Understanding what each one means, in plain business terms, is what separates reading a regression output from actually trusting it.

Dependent and Independent Variables

The dependent variable is the outcome you are trying to predict: in this guide, sales. The independent variable, or predictor, is the factor you believe influences that outcome, such as time, ad spend, or price. Sales depends on ad spend; ad spend does not depend on sales in the same model, which is why the direction matters when you set up the calculation.

Slope and Intercept

The slope (b1) tells you how much the predicted value changes for every one-unit increase in the predictor. The intercept (b0) tells you the predicted value when the predictor equals zero, which is a useful anchor point even when zero itself is not a realistic scenario.

ℹ️
Reading the slope in plain terms

A slope of 2.51 in a monthly sales trend means sales are rising by about 2.51 units, in whatever unit you measured, for every month that passes. A slope of 4.56 against ad spend in thousands means every additional $1,000 in ad spend is associated with roughly $4,560 in additional sales. The full mechanics live in the slope and intercept guide.

Correlation, R², and Adjusted R²

Correlation measures how tightly two variables move together, on a scale from −1 (perfectly opposite) to +1 (perfectly aligned). R², the coefficient of determination, describes how much of the variation in sales the model explains; in simple linear regression it equals the square of the correlation coefficient r. Adjusted R² starts from R² but penalizes a model for adding predictors that do not genuinely improve the fit, which matters once you move from simple to multiple regression.

Adjusted R² Formula
Adj. R² = 1 − [(1 − R²)(n − 1) / (n − k − 1)]
where n = number of observations, k = number of predictors

R² Range | General Read | Caveat
0.70 – 1.00 | Strong fit; the model explains most of the movement in sales | Check for overfitting if predictors were added freely
0.40 – 0.69 | Moderate fit; useful as a directional guide, less reliable for precise numbers | Combine with judgment and known upcoming events
Below 0.40 | Weak fit; other unmeasured factors are driving most of the variation | Look for a missing predictor before trusting the forecast

These ranges are a general guide, not a fixed rule. Messy, real-world business data is usually noisier than data from a physical experiment, so an R² of 0.55 in a marketing model can be genuinely useful, while the same 0.55 might be considered weak in a manufacturing quality-control context. See the R² guide and the correlation calculator for hands-on practice, and the Pearson correlation guide for the correlation math behind R².
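The adjusted R² formula above is simple enough to check by hand, and a short sketch makes the penalty visible. The first call uses the R² of 0.977 and n = 12 from Example 1 later in this guide; the second call, with four predictors and an R² of 0.980, is a hypothetical comparison invented for illustration.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R² = 1 - (1 - R²)(n - 1) / (n - k - 1)."""
    if n - k - 1 <= 0:
        raise ValueError("need more observations than predictors plus one")
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


# Example 1 in this guide: one predictor (time), 12 months, R² = 0.977.
simple = adjusted_r2(0.977, n=12, k=1)

# Hypothetical: three extra predictors nudge raw R² up to 0.980.
padded = adjusted_r2(0.980, n=12, k=4)

print(f"1 predictor:  adjusted R² = {simple:.4f}")
print(f"4 predictors: adjusted R² = {padded:.4f}")
```

In this hypothetical comparison the four-predictor model has the higher raw R² but the lower adjusted R², which is the situation the adjustment is designed to expose.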

Residuals and the Least Squares Method

A residual is the gap between what actually happened and what the model predicted: residual = actual − predicted. The least squares method is how the regression line itself gets chosen: out of every possible line, it picks the one that minimizes the sum of the squared residuals, which is why it is sometimes called ordinary least squares, or OLS.

Least Squares Objective
Minimize Σ(actual − predicted)²

Squaring the residuals before summing them does two things: it makes every gap positive so they cannot cancel out, and it penalizes large misses more heavily than small ones. For a deeper technical treatment, the NIST/SEMATECH Engineering Statistics Handbook walks through the derivation of the least squares formulas in full. On this site, the residuals guide, RMSE guide, and influential points guide cover how to use residuals to check whether a model is trustworthy.
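To see the least squares objective in action, the sketch below fits a line to the 12 months of sales from Example 1 and then compares its sum of squared residuals with a few slightly nudged lines. The helper names `fit_line` and `sse` and the nudge sizes are illustrative choices, not part of any library.

```python
months = list(range(1, 13))
sales = [42, 45, 47, 50, 55, 53, 58, 62, 60, 65, 68, 70]  # $000, Example 1


def fit_line(xs, ys):
    """Return (b0, b1) from the closed-form least squares formulas."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_xx = sum(x * x for x in xs)
    b1 = (n * sum_xy - sum_x * sum_y) / (n * sum_xx - sum_x ** 2)
    b0 = sum_y / n - b1 * sum_x / n
    return b0, b1


def sse(b0, b1, xs, ys):
    """Sum of squared residuals, where residual = actual - predicted."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))


b0, b1 = fit_line(months, sales)
best_sse = sse(b0, b1, months, sales)

# Nudge the intercept or the slope and recompute the objective each time.
nudges = [(1.0, 0.0), (-1.0, 0.0), (0.0, 0.2), (0.0, -0.2)]
nudged_sses = [sse(b0 + d0, b1 + d1, months, sales) for d0, d1 in nudges]

print(f"fitted line SSE: {best_sse:.2f}")
for (d0, d1), value in zip(nudges, nudged_sses):
    print(f"intercept {d0:+.1f}, slope {d1:+.1f}: SSE = {value:.2f}")
```

Every nudged line produces a larger sum of squared residuals than the fitted one, which is what "minimize" means in the objective above.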

Prediction Interval vs. Confidence Interval

These two terms are often used interchangeably in casual conversation but mean different things statistically, which makes them a common source of confusion in a forecasting report.

Property | Confidence Interval | Prediction Interval
What it estimates | A range for the average, or expected, sales value at a given point | A range for one individual future sales figure
Typical width | Narrower | Wider, because it adds the natural spread of individual outcomes
Business use | "What is our average expected trend line for next quarter?" | "What range should we plan for in next month's actual number?"

The confidence interval for the mean guide and margin of error guide cover the underlying calculation, which extends directly to regression once you are estimating an interval around a predicted value rather than a plain average.
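The difference in width between the two intervals is easy to see numerically. The sketch below uses the standard textbook formulas for simple linear regression, which this guide does not spell out, so treat them as an assumption: the half-width is t × s × √(1/n + (x0 − x̄)²/Sxx) for the confidence interval, with an extra 1 under the square root for the prediction interval. The critical value 2.228 is the two-sided 95% value of Student's t with 10 degrees of freedom, hardcoded from a t-table, and the data is the 12-month series from Example 1.

```python
import math

months = list(range(1, 13))
sales = [42, 45, 47, 50, 55, 53, 58, 62, 60, 65, 68, 70]  # $000, Example 1

# Two-sided 95% critical value of Student's t with n - 2 = 10 degrees of
# freedom, from a standard t-table. It must change if n or the level changes.
T_CRIT_95_DF10 = 2.228


def interval_half_widths(xs, ys, x0, t_crit):
    """Return (y_hat, CI half-width, PI half-width) at predictor value x0."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    ss_res = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    s = math.sqrt(ss_res / (n - 2))  # residual standard error
    spread = 1 / n + (x0 - mean_x) ** 2 / sxx
    y_hat = b0 + b1 * x0
    ci_half = t_crit * s * math.sqrt(spread)      # uncertainty in the mean
    pi_half = t_crit * s * math.sqrt(1 + spread)  # plus individual scatter
    return y_hat, ci_half, pi_half


y_hat, ci_half, pi_half = interval_half_widths(months, sales, 13, T_CRIT_95_DF10)
print(f"month 13 forecast: {y_hat:.1f}")
print(f"95% confidence interval: {y_hat - ci_half:.1f} to {y_hat + ci_half:.1f}")
print(f"95% prediction interval: {y_hat - pi_half:.1f} to {y_hat + pi_half:.1f}")
```

The prediction interval is always the wider of the two at the same x0, because of the extra 1 under the square root.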

⚠️
Check the assumptions before you trust the numbers

Linear regression assumes the relationship is roughly linear, the residuals are independent of each other, the spread of residuals stays roughly constant across the range of predictions (homoscedasticity), and the residuals are approximately normally distributed. The assumptions guide and statistical interpretation guide explain how to test each one and what to do if a check fails.

Real Example 1: Monthly Sales Forecast

A small e-commerce brand has 12 months of sales data and wants a baseline forecast for month 13, before layering on anything else. This is the simplest possible use case for regression: one predictor, time, and one outcome, sales.

Free Worksheet

Sales forecasting template

Copy the table below into a spreadsheet with your own monthly figures in place of month and sales, and follow the same four steps to produce your own forecast. It doubles as a reusable regression worksheet for any single-variable trend.

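Month (x) | Actual Sales ($000) (y) | x × y | x²
1 | 42 | 42 | 1
2 | 45 | 90 | 4
3 | 47 | 141 | 9
4 | 50 | 200 | 16
5 | 55 | 275 | 25
6 | 53 | 318 | 36
7 | 58 | 406 | 49
8 | 62 | 496 | 64
9 | 60 | 540 | 81
10 | 65 | 650 | 100
11 | 68 | 748 | 121
12 | 70 | 840 | 144
Σ (sum): 78 | 675 | 4,746 | 650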

[Figure: 12 months of sales with the fitted regression line. x-axis: Month (1 to 12); y-axis: sales in $000 (35 to 75).]

Dark points are actual monthly sales ($000); the line is the fitted regression. Build your own from any dataset with the regression scatter plot tool or the scatter plot maker.

Worked Example — Simple Linear Regression

Fitting the line and forecasting month 13

1. Calculate the slope: b1 = (nΣxy − ΣxΣy) / (nΣx² − (Σx)²) = (12 × 4,746 − 78 × 675) / (12 × 650 − 78²) = 4,302 / 1,716 = 2.51.

2. Calculate the intercept: b0 = mean(y) − b1 × mean(x) = 56.25 − 2.51 × 6.5 = 39.95. The fitted equation is ŷ = 39.95 + 2.51x.

3. Check the fit: R² works out to 0.977, meaning the linear trend explains about 98% of the month-to-month variation in sales, a strong fit for a 12-month series.

4. Forecast month 13: ŷ = 39.95 + 2.51 × 13 = $72.6K. For month 6, the model predicted $55.0K against an actual of $53K, a residual of −$2.0K, a normal amount of noise around a strong trend line.

✓ Result: With no other information, month 13 is forecast at roughly $72,600. Because R² is high and the residuals are small and scattered rather than following a pattern, this baseline is solid enough to plan around, with room to adjust for anything the model could not see, such as a planned promotion.
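The four steps of this worked example can be reproduced from the same column sums used in the worksheet. The sketch below is one way to do that in plain Python; the variable names are illustrative. Because it carries full precision instead of the rounded 2.51 and 39.95, its month 13 forecast comes out at about $72.5K, a shade under the $72.6K obtained from the rounded coefficients.

```python
months = list(range(1, 13))
sales = [42, 45, 47, 50, 55, 53, 58, 62, 60, 65, 68, 70]  # $000
n = len(months)

# Column sums, matching the worksheet: Σx, Σy, Σxy, Σx².
sum_x = sum(months)
sum_y = sum(sales)
sum_xy = sum(x * y for x, y in zip(months, sales))
sum_xx = sum(x * x for x in months)

# Step 1: slope. Step 2: intercept.
b1 = (n * sum_xy - sum_x * sum_y) / (n * sum_xx - sum_x ** 2)
b0 = sum_y / n - b1 * sum_x / n

# Step 3: R² = 1 - SS_residual / SS_total.
mean_y = sum_y / n
ss_res = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(months, sales))
ss_tot = sum((y - mean_y) ** 2 for y in sales)
r2 = 1 - ss_res / ss_tot

# Step 4: forecast month 13 and check the month 6 residual.
forecast_13 = b0 + b1 * 13
residual_6 = sales[5] - (b0 + b1 * 6)  # actual - predicted

print(f"slope = {b1:.2f}, intercept = {b0:.2f}, R² = {r2:.3f}")
print(f"month 13 forecast = {forecast_13:.1f} ($000)")
print(f"month 6 residual  = {residual_6:.1f} ($000)")
```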

Real Example 2: Advertising Spend vs. Sales

A direct-to-consumer brand wants to know whether its advertising budget is actually moving sales, and if so, by how much. Eight months of matched ad spend and sales figures give enough data for a first pass.

Ad Spend ($000) | Sales ($000)
5 | 60
8 | 75
10 | 85
12 | 95
15 | 110
18 | 125
20 | 130
25 | 150

Worked Example — Regression Output and Marketing Insight

Turning ad spend into a sales prediction

1. Fit the model: The same least squares formulas give b1 = 4.56 and b0 = 39.40, so ŷ = 39.40 + 4.56x, where x is ad spend in thousands of dollars.

2. Interpret the slope: Every additional $1,000 in ad spend is associated with roughly $4,560 in additional sales across this dataset, an implied 4.6x return before accounting for other costs.

3. Check the fit: R² is about 0.99. Real advertising data rarely fits this cleanly; a correlation this strong is more typical of a controlled test than broad historical spend, so treat the near-perfect fit here as illustrative of the method, not a promise about your own numbers.

4. Predict a new budget: At a planned spend of $22K, ŷ = 39.40 + 4.56 × 22 = $139.7K in predicted sales.

✓ Result: The marketing team gets a specific, defensible number to bring into a budget conversation: roughly $139.7K in sales at a $22K spend level, with the caveat that this model has not been tested against a period where spend was cut, so it says nothing about what happens below the observed range.
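The caveat about the observed range can be built into the prediction step itself. The sketch below fits the eight ad spend and sales pairs from this example and returns, alongside each prediction, a flag saying whether the requested spend falls inside the range the model was fitted on. The function names and the $2K "budget cut" scenario are hypothetical, added only to show the flag changing.

```python
ad_spend = [5, 8, 10, 12, 15, 18, 20, 25]       # $000
sales = [60, 75, 85, 95, 110, 125, 130, 150]    # $000


def fit_line(xs, ys):
    """Return (b0, b1) from the closed-form least squares formulas."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_xx = sum(x * x for x in xs)
    b1 = (n * sum_xy - sum_x * sum_y) / (n * sum_xx - sum_x ** 2)
    b0 = sum_y / n - b1 * sum_x / n
    return b0, b1


def predict_with_range_check(x_new, xs, b0, b1):
    """Return (prediction, True if x_new lies within the fitted range)."""
    inside = min(xs) <= x_new <= max(xs)
    return b0 + b1 * x_new, inside


b0, b1 = fit_line(ad_spend, sales)

planned, planned_ok = predict_with_range_check(22, ad_spend, b0, b1)
cut, cut_ok = predict_with_range_check(2, ad_spend, b0, b1)  # hypothetical cut

print(f"fitted line: y_hat = {b0:.2f} + {b1:.2f}x")
print(f"$22K spend -> {planned:.1f} ($000), inside observed range: {planned_ok}")
print(f"$2K spend  -> {cut:.1f} ($000), inside observed range: {cut_ok}")
```

The model will happily return a number for a $2K budget, but the flag marks it as an extrapolation that the eight observed months cannot support.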

🚫
Correlation is not causation

A strong relationship between ad spend and sales does not prove the ads caused the sales. Both could be rising because of a broader seasonal trend, a new product launch, or a competitor's exit, each moving alongside your ad budget without being caused by it. Confirming causation usually needs a controlled test, such as a holdout region with no ad spend, not just a regression line.

Real Example 3: Retail Demand Forecasting with Seasonality

A retailer has two years of quarterly sales and wants a forecast for the next quarter. This example shows what happens when a straight line is fit to data that has a strong seasonal pattern, and why that matters.

Quarter | Actual Sales ($000) | Trend-Only Prediction | Residual
Year 1, Q1 | 120 | 124.6 | −4.6
Year 1, Q2 | 135 | 134.3 | +0.7
Year 1, Q3 | 140 | 144.0 | −4.0
Year 1, Q4 | 210 | 153.7 | +56.3
Year 2, Q1 | 130 | 163.4 | −33.4
Year 2, Q2 | 148 | 173.0 | −25.0
Year 2, Q3 | 155 | 182.7 | −27.7
Year 2, Q4 | 230 | 192.4 | +37.6

Worked Example — When a Straight Line Isn't Enough

Spotting seasonality in the residuals

1. Fit a trend-only model: Using quarter index 1 through 8, the trend-only equation is ŷ = 114.9 + 9.69x. It captures the general upward direction but treats every quarter the same.

2. Read the residual pattern: Both Q4s are underpredicted, by about $56K and $38K, while the Q1 through Q3 quarters are almost all overpredicted, by as much as $33K in Year 2 (Year 1 Q2 is the one exception, at +$0.7K). A pattern like this in the residuals, rather than random scatter, is the clearest sign that an important variable is missing from the model.

3. Add a seasonal term: The fix is to move from simple to multiple regression: ŷ = b0 + b1(quarter index) + b2(Q4 dummy), where the Q4 dummy variable equals 1 for fourth-quarter rows and 0 otherwise. This lets the model learn a separate holiday-season lift instead of forcing one straight line through every quarter.

4. Validate against next year: Once the seasonal term is added, residuals should shrink and lose their quarterly pattern. If they do not, the retailer may need a full seasonal decomposition or a dedicated time series model instead.

⚠ Lesson: A high overall R² can hide a systematic seasonal error. Always plot residuals against the calendar, not just against the predictor, before trusting a forecast that spans more than one season.
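The dummy-variable fix from step 3 can be tried directly on the eight quarters in the table. The sketch below is a minimal, dependency-free version: it builds the normal equations for ŷ = b0 + b1(quarter index) + b2(Q4 dummy) and solves them with a small Gaussian elimination routine. The helper names are illustrative, and in practice a library such as statsmodels or scikit-learn would normally do this step.

```python
def solve_linear_system(a, b):
    """Solve a·x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_ols(rows, ys):
    """Least squares coefficients [b0, b1, ...] via the normal equations."""
    design = [[1.0] + list(row) for row in rows]  # prepend intercept column
    p = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(p)] for i in range(p)]
    xty = [sum(d[i] * y for d, y in zip(design, ys)) for i in range(p)]
    return solve_linear_system(xtx, xty)


def predict(coefs, row):
    return coefs[0] + sum(c * v for c, v in zip(coefs[1:], row))


def sse(coefs, rows, ys):
    return sum((y - predict(coefs, row)) ** 2 for row, y in zip(rows, ys))


quarters = list(range(1, 9))
sales = [120, 135, 140, 210, 130, 148, 155, 230]  # $000
q4_dummy = [1 if q % 4 == 0 else 0 for q in quarters]

trend_rows = [[q] for q in quarters]
seasonal_rows = [[q, d] for q, d in zip(quarters, q4_dummy)]

trend_coefs = fit_ols(trend_rows, sales)
seasonal_coefs = fit_ols(seasonal_rows, sales)

sse_trend = sse(trend_coefs, trend_rows, sales)
sse_seasonal = sse(seasonal_coefs, seasonal_rows, sales)

# Forecast Year 3, Q1: quarter index 9, not a fourth quarter.
next_quarter = predict(seasonal_coefs, [9, 0])

print("trend only:", [round(c, 2) for c in trend_coefs], f"SSE = {sse_trend:.0f}")
print("with Q4 dummy:", [round(c, 2) for c in seasonal_coefs], f"SSE = {sse_seasonal:.0f}")
print(f"Year 3 Q1 forecast: {next_quarter:.1f} ($000)")
```

On this data the Q4 coefficient absorbs the holiday lift, the underlying trend slope drops to roughly half its trend-only value, and the sum of squared residuals falls sharply. With only eight quarters and three coefficients, this is an illustration of the technique, not a model to rely on.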

Step-by-Step: Build a Sales Forecast Using Regression

The following process turns raw sales data into a working forecast. It applies whether you are using a spreadsheet, Python, or the calculator in the next section.

Phase 1: Prepare the Data

  • Collect at least 12 to 24 periods of historical sales figures
  • Clean the dataset: fix missing values, remove duplicates, flag one-time events
  • Choose which variables belong in the model: time alone, or additional predictors
  • Plot the data first to confirm the relationship looks roughly linear

Phase 2: Fit the Model

  • Calculate the slope and intercept using the least squares method
  • Write out the fitted regression equation
  • Evaluate R² and, for multiple regression, adjusted R²
  • Check residuals for patterns, especially seasonal ones

Phase 3: Validate and Forecast

  • Test the model against the most recent periods it has not seen
  • Predict future sales by plugging future period values into the equation
  • Add a prediction interval, not just a single point estimate
  • Monitor actual results each period and refit as new data arrives
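The validation step in Phase 3 can be sketched as a simple holdout test: fit on the earlier periods, predict the most recent ones the model has not seen, and measure the error. The example below reuses the 12-month series from Example 1 and holds out the last three months; the split size and the function names are illustrative assumptions, not a fixed rule.

```python
import math

sales = [42, 45, 47, 50, 55, 53, 58, 62, 60, 65, 68, 70]  # $000, Example 1
HOLDOUT = 3  # assumption: keep the last three months out of the fit


def fit_line(xs, ys):
    """Return (b0, b1) from the closed-form least squares formulas."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_xx = sum(x * x for x in xs)
    b1 = (n * sum_xy - sum_x * sum_y) / (n * sum_xx - sum_x ** 2)
    b0 = sum_y / n - b1 * sum_x / n
    return b0, b1


periods = list(range(1, len(sales) + 1))
train_x, test_x = periods[:-HOLDOUT], periods[-HOLDOUT:]
train_y, test_y = sales[:-HOLDOUT], sales[-HOLDOUT:]

# Fit only on the training months.
b0, b1 = fit_line(train_x, train_y)

# Score the model on months it has not seen.
errors = [actual - (b0 + b1 * x) for x, actual in zip(test_x, test_y)]
rmse = math.sqrt(sum(e ** 2 for e in errors) / len(errors))

print(f"fitted on months 1-{train_x[-1]}: y_hat = {b0:.2f} + {b1:.2f}x")
for x, actual, err in zip(test_x, test_y, errors):
    print(f"month {x}: actual {actual}, predicted {actual - err:.1f}, error {err:+.1f}")
print(f"holdout RMSE: {rmse:.2f} ($000)")
```

A holdout error that is small relative to typical monthly sales supports refitting on all 12 months and projecting forward; a large one is a signal to revisit the earlier phases first.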

Sales Forecast Regression Calculator

Enter your own historical sales figures, in order, and choose how many future periods to project. The calculator fits a simple linear regression using the same least squares formulas shown in Example 1 above.

Values should be entered oldest to newest, one number per period, separated by commas.

This calculator fits a single-variable trend line and does not account for seasonality on its own; for seasonal data, use the dummy-variable approach shown in Example 3, or the full simple linear regression calculator for a complete statistical output including confidence intervals.
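For readers who prefer code to a web form, the calculation described here can be approximated in a few lines. This is a sketch of the same idea, not the calculator's actual implementation: the function name, the dictionary it returns, and the choice to number periods 1 through n are all assumptions.

```python
def forecast_from_history(raw_values, horizon):
    """Fit a trend line to comma-separated sales and project future periods.

    raw_values: sales figures as text, oldest to newest, e.g. "42, 45, 47".
    horizon:    number of future periods to forecast.
    """
    ys = [float(v) for v in raw_values.split(",") if v.strip()]
    n = len(ys)
    if n < 2:
        raise ValueError("need at least two historical values to fit a line")
    xs = list(range(1, n + 1))  # periods are numbered 1..n

    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_xx = sum(x * x for x in xs)
    slope = (n * sum_xy - sum_x * sum_y) / (n * sum_xx - sum_x ** 2)
    intercept = sum_y / n - slope * sum_x / n

    forecasts = [intercept + slope * (n + step) for step in range(1, horizon + 1)]
    return {"slope": slope, "intercept": intercept, "forecasts": forecasts}


result = forecast_from_history(
    "42, 45, 47, 50, 55, 53, 58, 62, 60, 65, 68, 70", horizon=3
)
print(f"slope = {result['slope']:.2f}, intercept = {result['intercept']:.2f}")
for step, value in enumerate(result["forecasts"], start=1):
    print(f"period +{step}: {value:.1f}")
```

Like the calculator, this fits a single trend line and knows nothing about seasonality.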

Regression vs. Other Forecasting Methods

Regression is one tool among several. The right choice depends on how much data you have, whether other explanatory variables are available, and how far into the future you need to forecast.

Method | How It Works | Strength vs. Regression | Choose It Instead When
Moving Average | Averages the last few periods to smooth out noise | Simpler, needs no predictor variables, easy to explain | Data is short, stable, and you have no candidate predictor
Exponential Smoothing | Weights recent periods more heavily than older ones | Reacts faster to recent shifts in the level or trend | Sales patterns change gradually and recency matters more than a fixed-slope trend
Time Series / ARIMA | Models sales from its own past values, trend, and seasonality directly | Handles seasonality and autocorrelation natively | You have long, regular history and no reliable external predictor
Machine Learning Models | Learns complex, non-linear patterns from many variables at once | Can capture interactions regression misses | You have a large dataset and accuracy matters more than interpretability

In practice, regression and time series methods are often combined rather than pitted against each other: a regression model with a time trend and seasonal dummy variables is, structurally, a simple form of time series forecasting. The scatter plots and correlation guide is a useful next step for confirming which relationship, if any, is strong enough to justify a regression-based approach in the first place.
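To make the comparison in the table concrete, the sketch below produces a next-period forecast for the Example 1 series three ways: a moving average, simple exponential smoothing, and the regression trend line. The three-period window and the smoothing weight of 0.5 are arbitrary illustrative settings, and the exponential smoothing here is the basic single-parameter form with no trend or seasonal component.

```python
sales = [42, 45, 47, 50, 55, 53, 58, 62, 60, 65, 68, 70]  # $000, Example 1


def moving_average_forecast(ys, window=3):
    """Forecast the next period as the mean of the last `window` values."""
    return sum(ys[-window:]) / window


def exponential_smoothing_forecast(ys, alpha=0.5):
    """Simple exponential smoothing: recent values get more weight."""
    level = ys[0]
    for y in ys[1:]:
        level = alpha * y + (1 - alpha) * level
    return level


def trend_forecast(ys):
    """Forecast the next period from a least squares line on periods 1..n."""
    n = len(ys)
    xs = list(range(1, n + 1))
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_xx = sum(x * x for x in xs)
    b1 = (n * sum_xy - sum_x * sum_y) / (n * sum_xx - sum_x ** 2)
    b0 = sum_y / n - b1 * sum_x / n
    return b0 + b1 * (n + 1)


ma = moving_average_forecast(sales)
ses = exponential_smoothing_forecast(sales)
trend = trend_forecast(sales)

print(f"moving average (3 periods):   {ma:.1f}")
print(f"exponential smoothing (0.5):  {ses:.1f}")
print(f"regression trend line:        {trend:.1f}")
```

On a steadily rising series like this one, both smoothing forecasts land below the latest month's $70K because they average over older, lower values, while the trend line projects above it. On flat or choppy data that ranking can easily reverse.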

Common Mistakes in Regression Forecasting

Mistake | What Goes Wrong | What To Do Instead
Using poor-quality data | Missing periods, duplicate entries, or mixed units (weekly and monthly data combined) quietly distort the slope and intercept. | Audit the dataset for gaps and consistent units before fitting anything.
Ignoring outliers | A single unusual month, such as a warehouse outage or a one-time bulk order, can pull the whole line off course. | Investigate large residuals individually; decide whether to exclude, adjust, or explicitly model the event.
Confusing correlation with causation | A strong R² is treated as proof that a predictor drives sales, when both could be moving together for an unrelated reason. | Treat regression as evidence of a pattern, and confirm causal claims with a controlled test where possible.
Overfitting | Adding predictor after predictor pushes R² toward 1.0 while the model becomes unreliable on new, unseen data. | Compare adjusted R², not raw R², and validate against a holdout period the model has not seen.
Ignoring seasonality | A single straight line fit across full calendar years underpredicts peak seasons and overpredicts slow ones, as shown in Example 3. | Add seasonal dummy variables or deseasonalize the data before fitting a trend line.
Using too few observations | A model fit on 4 or 5 data points can produce a high R² purely by chance, with no real predictive value. | Aim for at least 12 periods for a simple trend, more for multiple regression or seasonal data.
Misinterpreting R² | A high R² is read as proof the forecast will be accurate, ignoring that it only describes fit to past data, not future stability. | Pair R² with out-of-sample validation and a prediction interval before quoting a single forecast number.

Best Tools for Regression-Based Sales Forecasting

Tool | Best For | Regression Capabilities | Consideration
Microsoft Excel | Small teams, quick baseline forecasts | Analysis ToolPak Regression tool, SLOPE, INTERCEPT, RSQ, TREND, FORECAST.LINEAR | Not built for professional-grade statistics at scale; see Microsoft's Analysis ToolPak documentation
Google Sheets | Collaborative small-team forecasting | SLOPE, INTERCEPT, LINEST, TREND, and chart trendlines with displayed R² | Fewer built-in diagnostics than Excel's ToolPak; no native residual plots
Python | Data teams, automation, custom scoring | scikit-learn LinearRegression, statsmodels OLS with full statistical output | Requires programming; most flexible option for multiple regression and validation
R | Statistical teams, academic rigor | lm(), summary(), plot() for residual diagnostics | Strong statistical output; steeper learning curve for spreadsheet-first teams
Power BI | Business dashboards with a forecasting layer | Built-in forecasting visual and trend lines on charts | Limited access to raw regression coefficients compared to Excel or Python
Tableau | Visual exploration and stakeholder dashboards | Trend lines with R² and p-values available on hover | Best for presenting results, not for building complex multiple regression models
SPSS | Academic and market research teams | Full linear regression module with diagnostics and assumption tests | License cost can be a barrier for small businesses
SAS | Large enterprises, regulated industries | PROC REG and PROC GLM with extensive validation options | Steep learning curve; typically requires a dedicated analyst
JMP | Quality and manufacturing-adjacent forecasting | Interactive fit platform with live residual and diagnostic plots | Smaller user community than Python or R for general business use
Google Looker Studio | Sharing forecast dashboards across a company | Trend lines on time series charts; relies on external tools for the underlying model | Not a regression engine itself, best paired with Python or Sheets for the calculation

For most small businesses, the practical path is Excel or Google Sheets for the first model, moving to Python's scikit-learn once the forecast needs to run automatically or include several predictors at once. The site's own simple linear regression calculator and full calculator library are useful for checking a spreadsheet formula against an independent result.

Regression Cheat Sheet

Term | Formula / Rule | Quick Read
Regression equation | ŷ = b0 + b1x | The fitted line used to generate every prediction
Slope (b1) | (nΣxy − ΣxΣy) / (nΣx² − (Σx)²) | Change in sales per one-unit change in x
Intercept (b0) | mean(y) − b1 × mean(x) | Predicted value when x = 0
Correlation (r) | Sxy / √(Sxx × Syy) | Strength and direction of the relationship, −1 to +1
R² | r², or 1 − (SSresidual / SStotal) | Share of variation in sales explained by the model
Adjusted R² | 1 − [(1 − R²)(n − 1) / (n − k − 1)] | R² penalized for extra predictors
Residual | actual − predicted | The size of a single miss; check for patterns, not just size
Least squares rule | Minimize Σ(residual²) | How the "best" line is chosen among all possible lines
RMSE | √(Σ(residual²) / n) | Typical forecast error, in the original sales units
Assumptions to check | Linearity, independence, constant variance, normal residuals | See the assumptions guide before trusting a model
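Several rows of the cheat sheet can be checked against each other in code. The sketch below computes r, R², and RMSE for the Example 1 series using the formulas in the table, with Sxx, Syy, and Sxy taken as sums of squared or cross deviations from the means. The function name and returned dictionary are illustrative.

```python
import math


def regression_summary(xs, ys):
    """Compute the cheat-sheet quantities for a simple linear regression."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)  # also SS_total
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))

    b1 = sxy / sxx                    # slope
    b0 = mean_y - b1 * mean_x         # intercept
    r = sxy / math.sqrt(sxx * syy)    # correlation

    residuals = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]
    ss_res = sum(e ** 2 for e in residuals)
    r2 = 1 - ss_res / syy             # coefficient of determination
    rmse = math.sqrt(ss_res / n)      # typical miss, in sales units

    return {"b0": b0, "b1": b1, "r": r, "r2": r2, "rmse": rmse}


months = list(range(1, 13))
sales = [42, 45, 47, 50, 55, 53, 58, 62, 60, 65, 68, 70]  # $000, Example 1

summary = regression_summary(months, sales)
for name, value in summary.items():
    print(f"{name:>4} = {value:.3f}")
```

For a simple linear regression the two R² routes in the table agree: squaring r gives the same value as 1 − SSresidual / SStotal.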

Sales Forecasting and Regression Glossary

Term | Plain-English Definition | Role in Sales Forecasting
Sales Forecasting | Estimating future sales from historical data and known conditions | The overall goal every method in this guide serves
Regression Analysis | A statistical method that measures and uses the relationship between variables | The core technique used to turn history into a prediction
Linear Regression | A regression model that fits a straight line to the data | The default starting model for a time-based sales trend
Multiple Regression | A regression model using two or more predictor variables | Used once more than one factor, such as price and season, drives sales
Dependent Variable | The outcome being predicted | Sales, in every example in this guide
Independent Variable | A factor believed to influence the outcome | Time, ad spend, price, or any other predictor
Regression Equation | The formula describing the fitted line: ŷ = b0 + b1x | What actually produces the forecasted number
Correlation | A measure of how closely two variables move together | Checked before fitting a model to confirm a relationship exists
R² (Coefficient of Determination) | The share of variation in the outcome explained by the model | The headline number for judging model fit
Residual | The gap between an actual value and the model's prediction | Used to spot outliers, seasonality, and model weaknesses
Least Squares Method | The technique that fits the line by minimizing squared residuals | How every coefficient in this guide was calculated
Prediction Interval | A range for one individual future observation | Sets realistic expectations around a single forecasted number
Confidence Interval | A range for the average or expected value at a point | Describes uncertainty in the trend line itself
Trend Analysis | Examining the general direction of data over time | What a time-based simple regression formally quantifies
Forecast Accuracy | How closely a model's past predictions matched actual results | Tracked over time to decide whether to keep or refit a model

Frequently Asked Questions

What is regression analysis in sales forecasting?
Regression analysis in sales forecasting is a statistical method that measures how sales have moved together with one or more other variables in the past, such as time, advertising spend, or price, and uses that measured relationship to estimate sales for future periods.

How does linear regression forecast sales?
Linear regression fits a straight line through historical sales data using the least squares method, which finds the line that minimizes the total squared distance between the line and every actual data point. Once the line's slope and intercept are known, you plug in a future time period or predictor value to get a forecasted sales figure.

What data do you need for a regression sales forecast?
At minimum, you need a consistent series of historical sales figures, ideally 12 to 24 periods or more, recorded at a regular interval such as weekly or monthly. If you are building a multiple regression model, you also need matching historical values for each predictor, such as ad spend, price, or foot traffic, for the same periods.

What is the regression equation?
The regression equation is the formula that describes the fitted line: ŷ = b0 + b1x, where ŷ is the predicted sales value, b0 is the intercept (the predicted value when x is zero), b1 is the slope (how much sales change for each one-unit increase in x), and x is the predictor, often a time period or spend amount.

What does R² mean in a sales forecast?
R², or the coefficient of determination, is a number between 0 and 1 that shows how much of the variation in sales is explained by the regression model. An R² of 0.80 means the model accounts for 80% of the variation in sales; the remaining 20% is due to factors the model does not capture.

Can Excel be used for regression forecasting?
Yes. Excel can fit a regression model using the Analysis ToolPak's Regression tool, the built-in SLOPE, INTERCEPT, and RSQ functions, or the TREND and FORECAST.LINEAR functions for a quick prediction. All of these use the same least squares method described in this guide.

What is the difference between regression and time series forecasting?
Regression forecasts sales as a function of one or more explanatory variables, including time itself. Time series methods, such as moving averages, exponential smoothing, or ARIMA, forecast sales primarily from the pattern of past sales values, including trend and seasonality, without necessarily requiring an outside predictor. Many practical forecasts combine both.

How accurate is regression forecasting?
Accuracy depends on how strong and stable the underlying relationship is, how much historical data is available, and whether the business conditions that generated that data still hold. A model with a high R² on clean, representative data can be quite accurate in the near term; accuracy typically degrades the further out the forecast extends.

When is regression a good choice for sales forecasting?
Regression is a good fit when there is a measurable, roughly linear relationship between sales and one or more known factors, when at least a year of consistent historical data exists, and when the business wants an interpretable model that explains why sales move, not just a black-box prediction.

What are the limitations of regression forecasting?
Regression assumes the relationship between variables stays stable over time, is sensitive to outliers and poor-quality data, can produce a misleadingly high R² through overfitting, and does not by itself prove that one variable causes another. It also tends to underperform when sales are driven by irregular events a linear model cannot represent.

What is the difference between a confidence interval and a prediction interval?
A confidence interval gives a range for the average, or expected, sales value at a given point. A prediction interval gives a range for a single future observation, which is wider because it accounts for both the uncertainty in the average and the natural variation of an individual outcome around that average.

How much historical data do you need?
As a practical floor, most analysts want at least 12 periods of data for a simple time-based model, and more if the data is noisy, seasonal, or if a multiple regression model with several predictors is being fit. Seasonal patterns generally need two or more full cycles of history to be captured reliably.

What is the difference between simple and multiple linear regression?
Simple linear regression predicts sales from a single variable, most often time or one spend figure. Multiple linear regression predicts sales from two or more variables at once, such as advertising spend, price, and a seasonal indicator, which usually captures more of the real drivers behind sales but requires more data and care to avoid overfitting.

What does a negative slope mean?
A negative slope means the predicted variable moves in the opposite direction of the predictor: as the predictor increases, forecasted sales decrease. For example, a regression of sales against price would typically show a negative slope, since higher prices are usually associated with lower unit sales.

Key sources and further reading: Gallo, A. — "A Refresher on Regression Analysis," Harvard Business Review · NIST/SEMATECH Engineering Statistics Handbook — Linear Least Squares Regression · scikit-learn — Linear Models Documentation · Microsoft Support — Using the Analysis ToolPak · OpenIntro Statistics — open-access textbook covering regression · Khan Academy — Statistics and Probability