Bayesian Statistics Hypothesis Testing Inferential Statistics 30 min read June 15, 2026
BY: Statistics Fundamentals Team
Reviewed By: Minsa A (Senior Statistics Editor)

Bayes Factor: Guide to Bayesian Hypothesis Testing & Interpretation

A p-value tells you how surprising your data would be if the null hypothesis were true. But it says nothing about how much more likely your data are under one model versus another. The Bayes Factor closes that gap. It quantifies the relative evidence the observed data provide for two competing hypotheses, producing a single number that tells you by what factor to update your relative belief in them.

This guide covers the full theory and practice: what the Bayes Factor is, how the formula works, how to interpret BF10 and BF01 with the Jeffreys Scale, how it compares to p-values, worked examples from psychology and A/B testing, and how to compute it in R and Python. The interactive calculator lets you run your own analysis directly on this page.

What You'll Learn
  • ✓ The precise definition of the Bayes Factor and its mathematical derivation
  • ✓ BF10 vs BF01 — what each measures and how they relate
  • ✓ The complete Jeffreys Scale with actionable thresholds
  • ✓ How to calculate a Bayes Factor step by step
  • ✓ Bayes Factor vs p-value — when to use each and why
  • ✓ Fully worked examples in psychology and digital marketing
  • ✓ R and Python code for running Bayesian hypothesis tests
  • ✓ An interactive Bayes Factor calculator with live interpretation

What Is a Bayes Factor?

Definition — Bayes Factor
A Bayes Factor is the ratio of the marginal likelihoods of the observed data under two competing statistical models. It quantifies how much more (or less) probable the data are under one hypothesis compared to another, giving a direct, continuous measure of statistical evidence.
BF₁₀ = P(D | H₁) / P(D | H₀)

The Bayes Factor sits at the heart of Bayesian hypothesis testing. To understand it, start with a basic question any researcher faces after collecting data: given what I observed, which of my two competing explanations does the data favor, and by how much? A p-value answers a different question — it tells you how unlikely your data would be if the null hypothesis were true, but it says nothing about the alternative. The Bayes Factor compares both hypotheses head-to-head against the same data.

Formally, the marginal likelihood P(D | H) is the probability of the observed data integrated across all possible values of the model's parameters, weighted by the prior distribution on those parameters. This integration is what separates the Bayes Factor from a simple likelihood ratio, which evaluates models only at their best-fit parameter values.

Harold Jeffreys developed the framework in the 1930s and 1940s and set it out in his book Theory of Probability (first edition 1939; the third edition of 1961 is the one usually cited). His goal was a method that could confirm a null effect, not merely fail to reject it, which classical significance testing cannot do. Researchers in psychology, medicine, and data science have returned to Bayes Factors with growing frequency because they address inferential needs that frequentist tools leave unmet. The broader context for this sits in the Bayes theorem and conditional probability topics on Statistics Fundamentals.

BF > 1: Data favors H₁
BF = 1: Inconclusive
BF < 1: Data favors H₀
BF = 10: Strong evidence for H₁
⚡ Quick Reference — Bayes Factor Key Facts
  • BF₁₀: Evidence for the alternative hypothesis H₁ relative to H₀ — the most commonly reported direction
  • BF₀₁: Evidence for the null hypothesis H₀ relative to H₁. BF₀₁ = 1 / BF₁₀
  • BF = 3: The minimum threshold many journals consider noteworthy evidence
  • Prior sensitivity: Results depend on the prior distribution specified for effect sizes under H₁
  • Null confirmation: Unlike p-values, Bayes Factors can provide positive evidence for H₀ when BF₀₁ > 1
  • No arbitrary cutoffs: The Bayes Factor is a continuous measure — interpretation is graduated, not binary

The Bayes Factor Formula

The formula derives directly from Bayes' theorem applied to competing models. Start with the relationship between prior odds, posterior odds, and the Bayes Factor:

Core Bayes Factor Formula
BF₁₀ = P(D | H₁) / P(D | H₀)
Posterior Odds = BF₁₀ × Prior Odds
P(D | H₁) = marginal likelihood of data under H₁
P(D | H₀) = marginal likelihood of data under H₀
BF₁₀ = evidence ratio for H₁ vs H₀
BF₀₁ = 1 / BF₁₀
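
As a rough illustration of the relationship Posterior Odds = BF₁₀ × Prior Odds, the sketch below converts a Bayes Factor and a prior probability for H₁ into a posterior probability. The function name `posterior_prob_h1` and the example inputs are illustrative assumptions, not values from any study.

```python
def posterior_prob_h1(bf10: float, prior_prob_h1: float = 0.5) -> float:
    """Posterior P(H1 | D) from BF10 and a prior P(H1).

    Uses posterior odds = BF10 * prior odds, then converts odds to probability.
    """
    if not 0.0 < prior_prob_h1 < 1.0:
        raise ValueError("prior_prob_h1 must be strictly between 0 and 1")
    prior_odds = prior_prob_h1 / (1.0 - prior_prob_h1)
    posterior_odds = bf10 * prior_odds
    return posterior_odds / (1.0 + posterior_odds)


# Equal prior odds: the posterior probability reduces to BF10 / (1 + BF10).
print(round(posterior_prob_h1(10.0), 3))       # 0.909
# A sceptical prior, P(H1) = 0.1, tempers the same evidence.
print(round(posterior_prob_h1(10.0, 0.1), 3))  # 0.526
```

The same Bayes Factor leads to different posterior probabilities under different prior odds, which is why the Bayes Factor is reported separately from any statement about how probable H₁ is.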

The marginal likelihood for each hypothesis is computed by integrating the likelihood function over the prior distribution of the model's parameters:

Marginal Likelihood (Model Evidence)
P(D | H₁) = ∫ P(D | θ, H₁) · P(θ | H₁) dθ
θ = model parameter(s)
P(D | θ, H₁) = likelihood of data given θ
P(θ | H₁) = prior distribution on θ under H₁

Understanding Marginal Likelihoods

The marginal likelihood is often called the "model evidence" because it measures how well a hypothesis predicts the observed data overall — not just at the best-fit parameter value. A model that is flexible enough to fit many possible datasets will not be rewarded as much as a model that makes precise predictions that happen to match what you observed.

This is why the Bayes Factor naturally penalizes overly complex models. If H₁ specifies that a wide range of effect sizes are plausible, its marginal likelihood is spread thin across many possible predictions. A more focused prior — one that predicts more precisely what you found — yields higher marginal likelihood. This automatic complexity penalty is a feature, not a bug: it implements Occam's Razor mathematically.
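
To make the integral concrete, here is a minimal sketch for a case where it has a closed form: binomial data with a point null (θ = 0.5) against a Beta prior on θ. The data (14 successes in 20 trials), the two priors, and all function names are hypothetical choices for illustration only.

```python
from math import comb, exp, lgamma, log


def log_beta(a: float, b: float) -> float:
    """Natural log of the Beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def marginal_point(k: int, n: int, theta0: float) -> float:
    """P(D | H0): binomial likelihood at the single value theta0."""
    return comb(n, k) * theta0**k * (1.0 - theta0) ** (n - k)


def marginal_beta(k: int, n: int, a: float, b: float) -> float:
    """P(D | H1): binomial likelihood averaged over a Beta(a, b) prior on theta."""
    return exp(log(comb(n, k)) + log_beta(k + a, n - k + b) - log_beta(a, b))


k, n = 14, 20  # hypothetical data: 14 successes in 20 trials
m0 = marginal_point(k, n, 0.5)
for a, b in [(1, 1), (7, 3)]:
    # Beta(1, 1) is flat over all rates; Beta(7, 3) concentrates near 0.7
    m1 = marginal_beta(k, n, a, b)
    print(f"Beta({a}, {b}) prior: P(D|H1) = {m1:.4f}, BF10 = {m1 / m0:.2f}")
```

With these inputs the flat prior spreads its predictions over every possible success count, while the focused prior puts more of its mass near the observed rate and so earns the larger marginal likelihood.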

The Savage-Dickey Density Ratio

For nested models — where H₀ is a special case of H₁ with a parameter fixed to a specific value, such as zero — the Bayes Factor has an elegant simplification known as the Savage-Dickey density ratio:

Savage-Dickey Density Ratio (Nested Models)
BF₁₀ = P(θ₀ | H₁) / P(θ₀ | D, H₁)
θ₀ = parameter value under H₀ (often 0)
P(θ₀ | H₁) = prior density at θ₀
P(θ₀ | D, H₁) = posterior density at θ₀

This ratio compares the prior density and posterior density at the null value θ₀. When the data shift probability mass away from θ₀, the posterior density at θ₀ falls below the prior density, so BF₁₀ exceeds 1 and the evidence favors H₁. When the data concentrate probability mass around θ₀, the posterior density at θ₀ rises above the prior density, so BF₁₀ is less than 1 and the evidence favors H₀. JASP's prior-and-posterior plots for Bayesian t-tests display this comparison directly by marking both densities at the null value.

The Savage-Dickey ratio is described in Dickey, J. M. & Lientz, B. P. (1970). The weighted likelihood ratio, sharp hypotheses about chances, the order of a Markov chain. Annals of Mathematical Statistics, 41(1), 214–226. See also BayesFactor package documentation for practical implementation.
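
The identity can be checked numerically in a conjugate setting. The sketch below assumes binomial data with a Beta(a, b) prior on θ under H₁ and a point null θ₀ = 0.5; the data and function names are hypothetical. It computes BF₁₀ once as prior density over posterior density at θ₀, and once as a direct ratio of marginal likelihoods.

```python
from math import exp, lgamma, log


def log_beta(a, b):
    """Natural log of the Beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def beta_pdf(x, a, b):
    """Density of a Beta(a, b) distribution at x, for 0 < x < 1."""
    return exp((a - 1) * log(x) + (b - 1) * log(1 - x) - log_beta(a, b))


def bf10_savage_dickey(k, n, theta0, a=1.0, b=1.0):
    """Prior density at theta0 divided by posterior density at theta0."""
    prior_at_null = beta_pdf(theta0, a, b)
    posterior_at_null = beta_pdf(theta0, a + k, b + n - k)  # conjugate update
    return prior_at_null / posterior_at_null


def bf10_direct(k, n, theta0, a=1.0, b=1.0):
    """Ratio of marginal likelihoods; the binomial coefficient cancels."""
    log_m1 = log_beta(a + k, b + n - k) - log_beta(a, b)
    log_m0 = k * log(theta0) + (n - k) * log(1 - theta0)
    return exp(log_m1 - log_m0)


k, n = 14, 20  # hypothetical data
print(bf10_savage_dickey(k, n, 0.5), bf10_direct(k, n, 0.5))
```

Both routes return the same number up to floating-point error, which is the practical appeal of the shortcut: only the posterior under H₁ is needed, not a separate marginal likelihood for each model.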

BF10 vs BF01: Which Direction to Report

BF₁₀ and BF₀₁ measure the same evidence; they just flip the direction of the comparison. BF₁₀ quantifies how much more strongly the data support H₁ over H₀. BF₀₁ quantifies how much more strongly the data support H₀ over H₁. They are exact mathematical reciprocals:

Reciprocal Relationship
BF₀₁ = 1 / BF₁₀
If BF₁₀ = 5, then BF₀₁ = 0.20
If BF₁₀ = 0.1, then BF₀₁ = 10

Convention in most research fields is to report BF₁₀ when results favor the alternative and BF₀₁ when results favor the null, always choosing the direction greater than 1 for clarity. Some journals and the JASP statistical software report both. When reading research papers, check which subscript the authors used before interpreting the magnitude.

Statistic | Measures Evidence For | Greater Than 1 When | Less Than 1 When
BF₁₀ | H₁ over H₀ | Data supports H₁ | Data supports H₀
BF₀₁ | H₀ over H₁ | Data supports H₀ | Data supports H₁
log(BF₁₀) | H₁ over H₀ (log scale) | Positive values → H₁ | Negative values → H₀

The Log Bayes Factor Scale

Because Bayes Factors can range from near zero to very large numbers, researchers working with extreme evidence often use the natural logarithm, written log(BF). On the log scale, values above zero favor H₁, values below zero favor H₀, and the magnitude indicates strength symmetrically in both directions. A log(BF₁₀) of 2.3 corresponds to BF₁₀ = e^2.3 ≈ 10 — strong evidence for the alternative. Log Bayes Factors also add naturally when combining independent pieces of evidence, making them useful in meta-analytic settings.
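
A short sketch of the log-scale arithmetic follows. The three Bayes Factors are hypothetical, and the final step assumes they were obtained in a way that makes them multiply (for example, each successive factor computed with the prior updated by the earlier data).

```python
import math

bf10_values = [5.0, 0.1, 10.0]  # hypothetical Bayes Factors

for bf10 in bf10_values:
    bf01 = 1.0 / bf10
    # log(BF01) is exactly -log(BF10): the log scale is symmetric around 0
    print(
        f"BF10 = {bf10:>5}: BF01 = {bf01:.2f}, "
        f"log(BF10) = {math.log(bf10):+.2f}, log(BF01) = {math.log(bf01):+.2f}"
    )

# Evidence that multiplies on the BF scale adds on the log scale.
log_total = sum(math.log(bf) for bf in bf10_values)
combined_bf10 = math.exp(log_total)
print(f"combined BF10 = {combined_bf10:.2f}")
```

Working with sums of logs also avoids overflow and underflow when individual factors are extremely large or small.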

The Jeffreys Scale: Interpreting Bayes Factor Strength

Harold Jeffreys proposed a graduated classification of evidence strength in Theory of Probability. His original cut points were half-powers of ten (roughly 3.2, 10, 32, and 100); later authors rounded them to 3, 10, 30, and 100 and relabeled the categories, and these rounded thresholds are now standard across most research fields. The table below uses the rounded thresholds with the labels used by JASP and the University of Amsterdam's Bayesian statistics group.

Jeffreys Scale — Complete Evidence Classification

BF₁₀ Value | Evidence Strength | Favors | Practical Interpretation
> 100 | Extreme | H₁ | Overwhelming support; effect is essentially certain
30 – 100 | Very Strong | H₁ | Reliable experimental result; highly replicable
10 – 30 | Strong | H₁ | Clear evidence; effect deserves publication
3 – 10 | Moderate | H₁ | Noteworthy trend; replication recommended
1 – 3 | Anecdotal | H₁ | Weak, barely distinguishable from chance variation
1 | No Evidence | Neither | Data equally consistent with both hypotheses
1/3 – 1 | Anecdotal | H₀ | Slight trend toward null; not meaningful alone
1/10 – 1/3 | Moderate | H₀ | Data favor null; report BF₀₁ = 3–10
1/30 – 1/10 | Strong | H₀ | Meaningful null confirmation; effect likely absent
1/100 – 1/30 | Very Strong | H₀ | Robust evidence of null; strong replication candidate
< 1/100 | Extreme | H₀ | Decisive null result; effect almost certainly absent
⚠️
Important: Thresholds are guidelines, not decision rules

The Jeffreys Scale categories are interpretive conventions, not statistical laws. A BF₁₀ of 2.9 and a BF₁₀ of 3.1 are not meaningfully different, even though one falls in "anecdotal" and the other in "moderate." Always report the exact numerical value alongside the category label, and interpret in context.
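
For scripts that need to attach a label to a computed value, the table can be encoded as a small lookup. This is a sketch only: the function name `classify_bf10` is hypothetical, and assigning boundary values (3, 10, 30) to the higher category is an assumption, since the table's ranges share endpoints.

```python
def classify_bf10(bf10: float) -> tuple[str, str]:
    """Return (evidence label, favored hypothesis) for a BF10 value."""
    if bf10 <= 0:
        raise ValueError("a Bayes Factor must be positive")
    if bf10 == 1:
        return ("no evidence", "neither")
    favored = "H1" if bf10 > 1 else "H0"
    # Always read strength in the direction greater than 1 (BF10 or BF01)
    strength = bf10 if bf10 > 1 else 1.0 / bf10
    if strength > 100:
        label = "extreme"
    elif strength >= 30:
        label = "very strong"
    elif strength >= 10:
        label = "strong"
    elif strength >= 3:
        label = "moderate"
    else:
        label = "anecdotal"
    return (label, favored)


for value in (18.5, 1 / 22, 2.9, 3.1):
    label, favored = classify_bf10(value)
    print(f"BF10 = {value:.3f}: {label} evidence for {favored}")
```

The last two inputs land in different categories despite being nearly identical, which is the reason to report the number itself alongside any label.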

How to Calculate a Bayes Factor: 6 Steps

📋
Summary: 6-Step Calculation Process

Step 1: Define H₀ and H₁. Step 2: Specify prior distributions on parameters. Step 3: Collect data. Step 4: Compute marginal likelihoods. Step 5: Take the ratio BF₁₀ = P(D|H₁) / P(D|H₀). Step 6: Interpret using the Jeffreys Scale.

1

Define the Competing Hypotheses

Write H₀ as a specific constraint on your parameters — typically that an effect is zero or that two groups are equal: H₀: μ₁ = μ₂ or H₀: δ = 0. Write H₁ as a prior distribution over possible effect sizes: for a t-test, H₁: δ ~ Cauchy(0, 0.707) is the JASP default. The key difference from frequentist testing is that H₁ must be specified precisely, not just as "some difference exists."

2

Specify Prior Distributions

The prior distribution under H₁ is the most consequential design choice in a Bayes Factor analysis. A Cauchy distribution centered at zero with scale r = 0.707 is the standard default for effect sizes in many psychological tests. It is placed on the standardized effect size, so the resulting Bayes Factor does not depend on the measurement units, and its heavy tails give reasonable weight to a wide range of effects. Narrow priors reward precise predictions that turn out to be right; wide priors spread their mass over large effects, which shifts the Bayes Factor toward H₀ when the observed effect is small. Always report the prior you used.

3

Collect Your Data

Unlike frequentist testing, Bayesian inference allows you to update the Bayes Factor continuously as data accumulate. Sequential updating is mathematically valid here — each new data point updates the posterior, which then becomes the prior for the next observation. This property makes Bayes Factors valuable in adaptive research designs where sample size is not fixed in advance.

4

Compute the Marginal Likelihoods

For simple models like the Bayesian t-test or correlation test, the Bayes Factor reduces to a closed-form expression or a one-dimensional integral that software such as R's BayesFactor package or JASP evaluates numerically. For complex models, numerical integration via Monte Carlo methods or bridge sampling is required. For a one-sample t-test, the marginal likelihood under H₀ is the likelihood of the data with the effect size fixed at the null value (the unknown variance is integrated out under both hypotheses); the marginal likelihood under H₁ additionally integrates the likelihood over the Cauchy prior on effect size.

5

Compute the Ratio

Divide P(D | H₁) by P(D | H₀). The result is BF₁₀. If it is greater than 1, the data favor H₁ by that factor. If it is less than 1, the data favor H₀ — in which case report BF₀₁ = 1/BF₁₀ for clarity. Both values carry the same information; convention favors reporting the direction greater than 1 so readers can read the strength directly.

6

Interpret and Report

Map your BF value to the Jeffreys Scale, report the exact number, the prior specification, and the direction of evidence. Example: "A Bayesian independent-samples t-test with a Cauchy prior (r = 0.707) yielded BF₁₀ = 18.5, indicating strong evidence for the alternative hypothesis." Never reduce the result to a binary decision the way p-values are often misused — the Bayes Factor is a continuous measure of evidence.
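
The sequential-updating property mentioned in Step 3 can be illustrated with a conjugate model. The sketch below assumes binary outcomes, a point null θ₀ = 0.5, and a Beta(1, 1) prior under H₁; the outcome sequence and function names are hypothetical. It updates BF₁₀ one observation at a time and compares the final value with a single batch calculation.

```python
from math import exp, lgamma, log


def log_beta(a, b):
    """Natural log of the Beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def sequential_bf10(outcomes, theta0=0.5, a=1.0, b=1.0):
    """Yield BF10 after each 0/1 outcome, updating the Beta prior as data arrive."""
    bf10 = 1.0
    for x in outcomes:
        # Predictive probability of this outcome under H1, given the data so far
        p_success = a / (a + b)
        pred_h1 = p_success if x == 1 else 1.0 - p_success
        pred_h0 = theta0 if x == 1 else 1.0 - theta0
        bf10 *= pred_h1 / pred_h0
        # The posterior Beta(a + x, b + 1 - x) becomes the prior for the next outcome
        a, b = a + x, b + (1 - x)
        yield bf10


def batch_bf10(outcomes, theta0=0.5, a=1.0, b=1.0):
    """BF10 computed once from the full sequence."""
    k, n = sum(outcomes), len(outcomes)
    log_m1 = log_beta(a + k, b + n - k) - log_beta(a, b)
    log_m0 = k * log(theta0) + (n - k) * log(1 - theta0)
    return exp(log_m1 - log_m0)


outcomes = [1, 1, 0, 1, 1, 1, 0, 1, 1, 1]  # hypothetical successes (1) and failures (0)
running = list(sequential_bf10(outcomes))
print([round(v, 3) for v in running])
print(round(batch_bf10(outcomes), 3))
```

The last running value agrees with the batch value because the marginal likelihood factorizes into one-step-ahead predictions, so the order and timing of the updates do not change the final Bayes Factor.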

Bayes Factor Worked Examples

The two examples below follow the 6-step process. Calculations use the standard Cauchy prior with scale r = 0.707, matching the default in JASP and R's BayesFactor package. Both examples show the full reasoning chain from raw data to an interpretable conclusion.

Example 1 — Bayesian Independent-Samples T-Test (Psychology)

Worked Example 1 — Bayesian T-Test

Problem: A psychologist tests whether a memory training intervention improves spatial recall. Control group (n = 40) scores a mean of 52.3 (SD = 8.1). Intervention group (n = 42) scores a mean of 58.7 (SD = 7.9). Prior: Cauchy(0, 0.707) on standardized effect size.

Standardized Effect Size (Cohen's d)
d = (x̄₁ − x̄₂) / s_pooled
s_pooled = pooled standard deviation
df = n₁ + n₂ − 2 = 80
1

Hypotheses: H₀: δ = 0 (no effect) | H₁: δ ~ Cauchy(0, 0.707) — the effect size follows a Cauchy distribution centered at zero

2

Prior: Cauchy with location 0 and scale r = 0.707. This prior assigns 50% probability to |δ| > 0.707, reflecting uncertainty about whether the effect is small or large before seeing the data.

3

Compute Cohen's d:
s_pooled = √[((39 × 8.1²) + (41 × 7.9²)) / 80] = √[(2558.8 + 2558.8) / 80] = √63.97 ≈ 8.0
d = (58.7 − 52.3) / 8.0 = 6.4 / 8.0 = 0.80

4

Marginal likelihoods: The effect size corresponds to t = d / √(1/n₁ + 1/n₂) = 0.80 / 0.221 ≈ 3.62 on 80 degrees of freedom. Evaluating the Rouder et al. (2009) integral for the Bayesian independent-samples t-test with n₁ = 40, n₂ = 42, and r = 0.707 gives BF₁₀ ≈ 54.

5

Interpretation: BF₁₀ ≈ 54 falls in the 30–100 range on the Jeffreys Scale, meaning very strong evidence for the alternative hypothesis. The observed data are about 54 times more probable under H₁ than under H₀.

6

Reporting: "A Bayesian independent-samples t-test with a Cauchy prior (r = 0.707) yielded BF₁₀ ≈ 54, providing very strong evidence that memory training improves spatial recall (d = 0.80)."

✅ Conclusion: The data are about 54 times more likely under the alternative hypothesis than under the null. This constitutes very strong evidence (Jeffreys Scale) that the intervention improved spatial memory. Replication with a pre-registered design is still recommended before drawing causal conclusions.

Formula reference: Rouder, J. N., Speckman, P. L., Sun, D., Morey, R. D., & Iverson, G. (2009). Bayesian t-tests for accepting and rejecting the null hypothesis. Psychonomic Bulletin & Review, 16(2), 225–237. doi:10.3758/PBR.16.2.225
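
The effect-size arithmetic in this example can be reproduced from the summary statistics alone. The sketch below uses only the numbers given in the problem statement; the variable and function names are illustrative, and the conversion from d to t assumes the equal-variance (pooled) t-test.

```python
from math import sqrt


def pooled_sd(n1, sd1, n2, sd2):
    """Pooled standard deviation of two independent groups."""
    return sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2))


# Summary statistics from the problem statement
n_control, mean_control, sd_control = 40, 52.3, 8.1
n_treat, mean_treat, sd_treat = 42, 58.7, 7.9

s_pooled = pooled_sd(n_control, sd_control, n_treat, sd_treat)
d = (mean_treat - mean_control) / s_pooled
# Equal-variance t statistic implied by d and the two sample sizes
t_stat = d / sqrt(1 / n_control + 1 / n_treat)
df = n_control + n_treat - 2

print(f"s_pooled = {s_pooled:.2f}, d = {d:.2f}, t({df}) = {t_stat:.2f}")
# prints: s_pooled = 8.00, d = 0.80, t(80) = 3.62
```

The t statistic and the two sample sizes are all that the Bayesian t-test needs as input, so this step is the bridge between a published summary table and a Bayes Factor.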

Example 2 — Bayesian A/B Test Confirming a Null Effect (Digital Marketing)

Worked Example 2 — Null Confirmation A/B Test

Problem: An e-commerce team runs an A/B test on a checkout redesign. After 10,000 visitors per variant, conversion rates are 4.82% (A) and 4.89% (B) — a difference of 0.07 percentage points. The team wants to know whether to implement variant B or stop the test.

1

Hypotheses: H₀: p_A = p_B (no conversion rate difference) | H₁: p_A ≠ p_B (there is a difference)

2

Prior: Beta distribution priors on the two conversion rates, intended to play a role comparable to the Cauchy prior with r = 0.707 used for standardized effects (the two are not mathematically identical). Under H₁, a wide range of conversion lifts are plausible before seeing the data.

3

Observed effect: Δp = 0.07 pp out of a base rate of ~4.85%. With n = 10,000 per arm, the standard error of the difference is about 0.30 pp, so the observed difference sits only about 0.23 standard errors from zero. Standardized, this is an extremely small effect.

4

Bayes Factor: With n = 10,000 per variant and such a tiny observed difference, the marginal likelihood calculation yields BF₀₁ = 22.0 — meaning BF₁₀ = 1/22.0 ≈ 0.045. The data favor H₀ by a factor of 22.

5

Interpretation: BF₀₁ = 22.0 falls in the 10–30 range — strong evidence for H₀. This is the power of Bayesian analysis: the team has not merely failed to detect a difference; they have positive evidence that no meaningful difference exists.

6

Decision: Stop the test and do not implement variant B. Report: "BF₀₁ = 22.0, providing strong evidence that the checkout redesign has no meaningful effect on conversion rates."

✅ Conclusion: The data are 22 times more likely under the null hypothesis of equivalence than under the alternative. The team can confidently conclude the redesign offers no conversion benefit — a decision a frequentist null result alone (which only says p ≥ 0.05) could not support with the same clarity.
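
A null-favoring Bayes Factor for two conversion rates can be sketched with conjugate Beta priors. The version below is one possible setup, not the calculation behind the figure quoted above: it assumes a uniform Beta(1, 1) prior on every rate, with H₀ using one shared rate and H₁ using two independent rates. The function name `ab_test_bf01` is hypothetical. Because the result depends strongly on the prior, the value it prints should not be expected to match BF₀₁ = 22.0.

```python
from math import exp, lgamma


def log_beta(a, b):
    """Natural log of the Beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def ab_test_bf01(k_a, n_a, k_b, n_b, a=1.0, b=1.0):
    """BF01 for H0: one shared rate vs H1: two independent rates.

    Every rate gets the same Beta(a, b) prior (an assumption of this sketch).
    The binomial coefficients are identical under both models and cancel.
    """
    log_m0 = log_beta(a + k_a + k_b, b + (n_a - k_a) + (n_b - k_b)) - log_beta(a, b)
    log_m1 = (
        log_beta(a + k_a, b + n_a - k_a) - log_beta(a, b)
        + log_beta(a + k_b, b + n_b - k_b) - log_beta(a, b)
    )
    return exp(log_m0 - log_m1)


# 4.82% and 4.89% of 10,000 visitors correspond to 482 and 489 conversions
bf01 = ab_test_bf01(482, 10_000, 489, 10_000)
print(f"BF01 = {bf01:.1f}  (BF10 = {1 / bf01:.4f})")
```

Passing a more informative prior through `a` and `b`, for example one concentrated near the historical conversion rate, changes the answer, which is why the prior belongs in the report alongside the Bayes Factor.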

Bayes Factor vs P-Value: A Direct Comparison

The p-value and the Bayes Factor answer fundamentally different questions. A p-value answers: "If H₀ is true, how likely is it that I would see data at least as extreme as mine?" A Bayes Factor answers: "Given the data I actually observed, how much more evidence do I have for H₁ than for H₀?" These are not the same thing, and conflating them causes real errors in scientific reasoning.

Dimension | Bayes Factor | P-Value (NHST) | Likelihood Ratio
Question answered | How much more does the data support H₁ vs H₀? | How surprising is the data if H₀ is true? | What is the best-fit evidence ratio?
Support for H₀ | Yes: explicitly, via BF₀₁ > 1 | No: can only fail to reject H₀ | No
Prior information | Required under H₁ | Not used | Not used
Sensitivity to sample size | Lower: evidence can favor either H₀ or H₁ as n increases | High: even trivial effects can become statistically significant with very large n | Medium
Binary cutoff | No: continuous evidence scale | Yes: α = 0.05 is treated as a threshold | No
Sequential testing | Valid: update continuously with new data | Inflates Type I error; requires pre-planned n | Requires correction
Typical reporting | BF₁₀ = 18.5 (strong evidence for H₁) | p = 0.023 (significant at α = 0.05) | LR = 12.4 at θ̂

One practical consequence is that, with a very large sample, even a trivially small effect can yield p < 0.05. For example, a drug that lowers systolic blood pressure by only 0.3 mmHg may be declared statistically significant if the sample size is sufficiently large. A Bayes factor can behave differently: if the observed effect is much smaller than the effect sizes predicted under H₁, the evidence may favor H₀, yielding BF₁₀ < 1. In this way, Bayesian inference evaluates not only whether an effect differs from zero but also whether the observed magnitude is consistent with the predictions of the competing hypotheses.

When to use each

Use p-values when following a pre-registered confirmatory protocol with a fixed sample size and you need to control long-run Type I error rates. Use Bayes Factors when you want to quantify evidence strength continuously, explicitly confirm null effects, or run adaptive designs. For more on the frequentist approach, see the p-values guide and hypothesis testing overview on Statistics Fundamentals.

Computing Bayes Factors in R and Python

R: The BayesFactor Package

The BayesFactor package by Morey and Rouder is the standard tool for Bayesian t-tests, ANOVA, and regression in R. Install it once with install.packages("BayesFactor").

R — Bayesian Independent Samples T-Test
library(BayesFactor)

# Example data: control and intervention group scores
control <- c(48, 52, 54, 50, 55, 49, 53, 51)
intervention <- c(57, 61, 59, 63, 58, 60, 62, 64)

# Run Bayesian independent-samples t-test
# rscale = 0.707 is the default "medium" Cauchy prior
bf_result <- ttestBF(x = control, y = intervention, rscale = 0.707)

# Print the result
print(bf_result)
# Output: the Bayes factor for "Alt., r=0.707" against the null, with its error estimate

# Extract the numeric BF10 value
bf_value <- extractBF(bf_result)$bf
cat("BF10 =", bf_value, "\nBF01 =", 1/bf_value, "\n")

# One-sample test: is the mean different from 50?
bf_one_sample <- ttestBF(x = control, mu = 50, rscale = 0.707)
print(bf_one_sample)

R — Bayesian Correlation Test
# Bayesian Pearson correlation test
set.seed(42)  # make the simulated data reproducible
x <- rnorm(50, mean = 5, sd = 2)
y <- x * 0.6 + rnorm(50, mean = 0, sd = 1.5)

bf_corr <- correlationBF(y = y, x = x)
print(bf_corr)
# Interprets evidence for non-zero correlation vs rho = 0
Package documentation: Morey, R. D., & Rouder, J. N. (2022). BayesFactor: Computation of Bayes Factors for Common Designs. R package. cran.r-project.org/package=BayesFactor

Python: Pingouin Library

The pingouin library provides Bayesian t-tests directly. Install with pip install pingouin.

Python — Bayesian T-Test with Pingouin
import pingouin as pg
import numpy as np

# Generate example data
np.random.seed(42)
control = np.random.normal(52, 8, 40)
intervention = np.random.normal(58, 8, 42)

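# Bayesian independent-samples t-test
# The output table includes a BF10 column; r sets the Cauchy prior scale.
# correction=False keeps the pooled-variance t statistic that the default
# Bayesian t-test assumes.
result = pg.ttest(intervention, control, correction=False, r=0.707)
# Column names as in the original example; they can differ between pingouin versions
print(result[["BF10", "dof", "p-val", "cohen-d"]])

# One-sample Bayesian t-test against a reference value of 50
result_one = pg.ttest(control, 50)
print(result_one[["BF10", "T", "p-val"]])

# Convert BF10 to BF01 (pingouin can return BF10 as a formatted string,
# so cast it to float before doing arithmetic)
bf10 = float(result["BF10"].values[0])
bf01 = 1 / bf10
print(f"BF10 = {bf10:.2f}, BF01 = {bf01:.4f}")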
💡
JASP: Free GUI Software for Bayesian Analysis

If you prefer a graphical interface over code, JASP (jasp-stats.org) is a free, open-source statistics program developed at the University of Amsterdam. It computes Bayes Factors for t-tests, ANOVA, regression, and correlations with one click, using the same Rouder priors as the R BayesFactor package. Results include the full posterior distribution and robustness checks across prior specifications.

Bayes Factor Calculator

This calculator implements the marginal likelihood ratio for a one-sample or two-sample scenario using a Cauchy prior on the standardized effect size. Enter your t-statistic, sample sizes, and choose the prior scale to compute BF₁₀ directly. For a full Bayesian t-test with raw data, use the R code above.

🧮 Interactive Bayes Factor Calculator
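
The kind of calculation described above can also be sketched offline in plain Python. This is a minimal, unofficial sketch, assuming the integral form of the default Bayesian t-test given by Rouder et al. (2009); it is not the page's actual calculator code. The function name `jzs_bf10`, the midpoint-rule integration, the step count, and the example inputs are all assumptions of this sketch.

```python
from math import exp, log, log1p, pi


def jzs_bf10(t, n1, n2=None, r=0.707, steps=20_000):
    """Approximate BF10 from a t statistic with a Cauchy(0, r) prior on effect size.

    One-sample or paired design: pass n1 only. Two-sample design: pass n1 and n2.
    """
    if n2 is None:
        n_eff, df = float(n1), n1 - 1
    else:
        n_eff, df = n1 * n2 / (n1 + n2), n1 + n2 - 2

    # H0: Student-t kernel evaluated at the observed t
    null_kernel = (1.0 + t * t / df) ** (-(df + 1) / 2.0)

    # H1: average the kernel over g ~ InverseGamma(1/2, r^2 / 2), the mixing
    # distribution implied by a Cauchy(0, r) prior on the effect size.
    # Midpoint rule on x in (0, 1) with g = x / (1 - x) covers g in (0, inf).
    total = 0.0
    for i in range(steps):
        x = (i + 0.5) / steps
        g = x / (1.0 - x)
        a = 1.0 + n_eff * g
        log_h = (
            -0.5 * log(a)
            - 0.5 * (df + 1) * log1p(t * t / (a * df))
            + log(r) - 0.5 * log(2.0 * pi)
            - 1.5 * log(g) - r * r / (2.0 * g)
        )
        total += exp(log_h) / (1.0 - x) ** 2  # Jacobian dg/dx
    alt_kernel = total / steps
    return alt_kernel / null_kernel


# Hypothetical inputs: t = 2.5 from two groups of 30
bf10 = jzs_bf10(2.5, 30, 30)
print(f"BF10 = {bf10:.2f}, BF01 = {1 / bf10:.2f}")
```

Re-running the function with different values of `r` gives a quick robustness check on the prior scale. For production analyses, the R BayesFactor package or JASP remain the safer choice, since they handle edge cases and report integration error.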

Real-World Applications of the Bayes Factor

🧠

Psychology & Replication

Bayes Factors are widely used in replication studies in psychology. They can show not just that a replication failed to reach significance, but that the data actively support the null, a distinction a non-significant p-value alone cannot make.

💊

Clinical Trials

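In adaptive trial designs, Bayes Factors allow researchers to update evidence continuously, and their interpretation does not depend on when data collection stops. This makes them useful for interim analyses where early stopping decisions must be made, although a trial that must also control frequentist error rates still needs those rates checked for the chosen stopping rule.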

📈

A/B Testing

E-commerce and product teams use Bayesian methods to stop tests early when evidence is decisive (BF > 10) or when there is strong evidence of equivalence (BF₀₁ > 10) — saving engineering resources and revenue.

🔬

Genetics & Biomarkers

Genome-wide association studies use Bayes Factors to evaluate the evidence for genetic association at each locus, naturally balancing the prior probability of association against observed effect sizes.

🤖

Machine Learning

Bayesian model comparison uses Bayes Factors to evaluate whether a more complex model is justified by the data. This prevents overfitting by requiring that added parameters earn their complexity through improved fit.

📊

Economics & Finance

Macroeconomists use Bayes Factors to compare structural models. Financial analysts apply them to test whether new risk factors add predictive value beyond existing ones in factor models.

How to Report a Bayes Factor in Research

Academic reporting of Bayes Factors requires four elements: the specific statistic, its numerical value, the prior specification, and the qualitative interpretation. The examples below follow APA 7th edition style as adapted for Bayesian reporting per Keysers, Gazzola, and Wagenmakers (2020).

Reporting Templates

For evidence in favor of H₁:

"A Bayesian independent-samples t-test using a Cauchy prior on effect size (r = 0.707) showed that [H₁ description]: BF₁₀ = 18.5, which according to the Jeffreys scale constitutes strong evidence for the alternative hypothesis."

For evidence in favor of H₀:

"A Bayesian analysis with a Cauchy prior (r = 0.707) revealed that the data provided strong evidence against a treatment effect: BF₀₁ = 22.0. We conclude that the intervention does not meaningfully alter [outcome variable]."

For inconclusive results:

"The Bayesian t-test yielded BF₁₀ = 1.8, indicating anecdotal and inconclusive evidence. Neither hypothesis is well supported; further data collection is required."

📝
Always report your prior

Bayes Factors depend on the prior distribution chosen for H₁. Two researchers using different priors will get different Bayes Factors from identical data. This is not a flaw, since it makes the influence of prior knowledge explicit, but it requires transparent reporting. State the prior family, its parameters, and the rationale for choosing it. For detailed guidance, see the JASP documentation and the two-part Psychonomic Bulletin & Review series on Bayesian inference for psychology by Wagenmakers et al. (2018).

Bayes Factor Cheat Sheet

Term / Symbol | Definition | Formula / Note
BF₁₀ | Evidence for H₁ over H₀ | P(D|H₁) / P(D|H₀)
BF₀₁ | Evidence for H₀ over H₁ | 1 / BF₁₀
Marginal likelihood | Probability of data under a hypothesis, integrated over all parameters | ∫ P(D|θ,H) · P(θ|H) dθ
Prior odds | Your belief ratio before seeing data | P(H₁) / P(H₀)
Posterior odds | Your belief ratio after seeing data | BF₁₀ × Prior Odds
Cauchy prior r = 0.707 | Default JASP/BayesFactor prior on standardized effect size | Gives 50% probability to |δ| > 0.707
log BF | Natural log of the Bayes Factor, a symmetric evidence scale | log(BF₁₀) > 0 favors H₁
Savage-Dickey ratio | Shortcut for nested models | P(θ₀|H₁) / P(θ₀|D, H₁)
Jeffreys threshold | Minimum BF for "noteworthy" evidence | BF₁₀ ≥ 3
Strong evidence | Jeffreys Scale: strong | BF₁₀ = 10–30

Frequently Asked Questions

What is a Bayes Factor?
A Bayes Factor is a ratio that measures the relative evidence two statistical hypotheses receive from the same observed data. Computed as BF₁₀ = P(D|H₁) / P(D|H₀), it tells you how many times more probable the data are under the alternative hypothesis compared to the null hypothesis. A BF₁₀ of 5 means the data are five times more probable under H₁ than under H₀. The concept was developed by Harold Jeffreys in the 1930s–40s as a principled Bayesian alternative to classical significance testing.
How do you interpret a Bayes Factor of 3?
A BF₁₀ of 3 falls at the lower boundary of "moderate" evidence for H₁ on the Jeffreys Scale. It means the observed data are three times more probable under the alternative hypothesis than under the null. This is generally considered the minimum threshold for noteworthy evidence, but it is weak enough that replication is strongly advisable. Do not interpret it as conclusive support for H₁ on its own.
Can the Bayes Factor prove the null hypothesis?
No statistical method proves any hypothesis — that would require infinite data. What the Bayes Factor can do, and what p-values cannot, is provide positive evidence in favor of the null. When BF₀₁ > 3 (equivalently, BF₁₀ < 1/3), the data actively favor H₀ over H₁. This matters in practice: a team that sees BF₀₁ = 22 after an A/B test has a quantified reason to stop the test and conclude the variants are equivalent, rather than just "failing to reject" a null they cannot positively affirm.
What is the difference between BF10 and BF01?
BF₁₀ measures evidence for H₁ over H₀, while BF₀₁ measures evidence for H₀ over H₁. They are exact mathematical reciprocals: BF₀₁ = 1 / BF₁₀. A BF₁₀ of 10 corresponds to a BF₀₁ of 0.10. Convention recommends reporting whichever direction is greater than 1, so that the magnitude of the number directly communicates the strength of evidence.
How sensitive is the Bayes Factor to the choice of prior?
The Bayes Factor is sensitive to the prior distribution specified for H₁, particularly when evidence is moderate (BF₁₀ between 1 and 10). When evidence is strong (BF₁₀ > 30), the result tends to be robust across reasonable prior choices. To address sensitivity concerns, researchers often perform a "robustness check": reporting Bayes Factors across a range of prior scales (e.g., r = 0.5, 0.707, 1.0) and noting whether the substantive conclusion changes. JASP provides this via its "Bayes factor robustness check" plot.
How is the Bayes Factor related to Bayes' theorem?
The Bayes Factor is the updating multiplier in Bayes' theorem when applied to model comparison. Bayes' theorem in model form says: Posterior Odds = Bayes Factor × Prior Odds. The Bayes Factor is the component that carries the information in the data. It does not depend on the prior odds of the hypotheses (though it does depend on the parameter priors within each model), and it represents exactly how much the evidence should shift your belief from the prior odds to the posterior odds. If you started with equal prior odds (P(H₁) = P(H₀) = 0.5), then your posterior probability of H₁ is BF₁₀ / (1 + BF₁₀). See the Bayes theorem guide for the full derivation.
What software computes Bayes Factors?
The most accessible options are: JASP (free GUI software at jasp-stats.org), the BayesFactor package in R, the pingouin library in Python, and the brms package in R for more complex Bayesian regression models via Stan. For advanced custom models, Stan (called via RStan or PyStan) and PyMC (Python) allow full Bayesian inference; marginal likelihoods for Bayes Factors can then be estimated with methods such as bridge sampling (for example, the bridgesampling R package for Stan and brms fits).

The Bayes Factor connects to a broader set of topics in Bayesian and frequentist statistics. The links below cover the foundational concepts that underpin Bayes Factor analysis, all from Statistics Fundamentals:

🔗

Bayes' Theorem

The mathematical foundation that links prior probability, likelihood, and posterior probability. The Bayes Factor is the likelihood ratio component in the model-comparison form of this theorem.

🔗

Hypothesis Testing

The frequentist framework for making decisions with statistical data. Understanding null hypothesis significance testing helps clarify exactly where and why Bayes Factors offer a different approach.

🔗

P-Values

The frequentist measure of evidence most commonly compared to the Bayes Factor. Reading both guides together reveals the different inferential goals each tool serves.

🔗

Conditional Probability

The marginal likelihood P(D|H) in the Bayes Factor formula is a conditional probability. This guide covers the foundational rules needed to understand how likelihoods are constructed and evaluated.

🔗

Effect Size

Cohen's d and other standardized effect sizes are the parameters the Bayes Factor integrates over when computing the marginal likelihood under H₁. Understanding effect sizes is essential for prior selection.

🔗

Pearson Correlation

The Bayesian correlation test computes a Bayes Factor for whether ρ = 0. This guide covers the frequentist version, which makes a useful comparison point for understanding what the Bayesian test adds.