How do you interpret a p-value?

A p-value is the probability of observing your data, or something more extreme, assuming the null hypothesis is true. A p-value below your significance threshold (commonly 0.05) means the result is statistically significant and you reject the null hypothesis. A p-value at or above that threshold means you fail to reject it.

What is the difference between statistical significance and practical significance?

Statistical significance tells you whether an observed effect is unlikely to be due to random chance (p < α). Practical significance tells you whether that effect is large enough to matter in the real world, measured by effect sizes like Cohen's d or η². A result can be statistically significant but too small to be practically meaningful.

How do you interpret a confidence interval?

A 95% confidence interval gives a range of plausible values for the population parameter. If you were to repeat your study many times, 95% of such intervals would contain the true value. A narrower interval signals a more precise estimate. If the interval excludes the null value (0 for a difference, 1 for a ratio), the result is statistically significant at that level.

How do you interpret regression output?

In regression output, the coefficient (β) tells you the expected change in the outcome for each one-unit increase in a predictor, holding all others constant. The p-value for each coefficient tests whether that predictor adds explanatory power. R-squared shows the proportion of variance in the outcome explained by the model.

Statistical Interpretation: A Guide to Interpreting Results (2026)

Q: What does interpretation mean in statistics?

Statistical interpretation is the process of translating quantitative outputs — test statistics, p-values, confidence intervals, regression coefficients — into meaningful, actionable conclusions about a research question or real-world problem.

What Is Statistical Interpretation?

Definition — Statistical Interpretation

Statistical interpretation is the process of translating quantitative outputs from statistical analyses — test statistics, p-values, confidence intervals, model coefficients — into meaningful, defensible conclusions about a research question or real-world decision. It requires evaluating both statistical evidence and practical relevance.

Running a statistical test and reading the output are two separate acts. A t-test produces a number like t(38) = 2.47, p = 0.018. Interpretation is what comes next: deciding what that result means for the question you were actually asking, and how confident you can be in that answer.

Good interpretation rests on three pillars. First, you need to know what the test was measuring and whether its assumptions were met. Second, you need to assess the evidence against the null hypothesis — that is what the p-value and test statistic tell you. Third, you need to judge the size and direction of the effect, because statistical significance and practical importance are not the same thing.

The Statistics Fundamentals team compiled this guide to cover every layer of the interpretation process, from the first glance at output to communicating results to a non-technical audience.

p < α: Statistical Significance Threshold
Confidence Interval Width = Precision
d, η²: Effect Size = Practical Importance
R²: Variance Explained by Model

Calculation vs. Interpretation: Why the Difference Matters

Statistical software can calculate any test in seconds. What it cannot do is tell you whether the result is meaningful, whether the right test was chosen, or whether the conclusion drawn from the output is justified. Those judgments require interpretation.

Consider two scenarios. In the first, a researcher finds p = 0.04 in a paired-design drug trial with n = 12 patients and an effect size of d = 0.67. In the second, a quality engineer finds p < 0.0001 in a manufacturing process check against a target value with n = 10,000 and d = 0.05. Both results are "statistically significant." The first may represent a real and meaningful clinical effect, but with only 12 patients the estimate is imprecise and needs replication. The second is almost certainly too small to affect production decisions despite the tiny p-value. These distinctions only emerge through interpretation, not calculation.
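The link between sample size and significance can be made concrete with a rough sketch. It assumes a one-sample (or paired) design and a normal approximation, under which the test statistic is roughly d·√n; the function name is hypothetical, and for small samples a t distribution would be used instead.

```python
from math import sqrt
from statistics import NormalDist


def approx_two_sided_p(d, n):
    """Approximate two-sided p-value for a standardized effect d with n observations.

    Assumption: one-sample/paired design, normal approximation (z = d * sqrt(n)).
    """
    z = d * sqrt(n)
    return 2 * (1 - NormalDist().cdf(abs(z)))


# The same trivial effect (d = 0.05) at increasing sample sizes.
for n in (100, 1_000, 10_000):
    print(n, approx_two_sided_p(0.05, n))
```

The effect size is identical on every row; only n changes, and with it the p-value.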

💡

Key Point

Statistical significance tells you about the probability of your data under the null hypothesis. It says nothing about the magnitude of an effect, its real-world importance, or whether the study was well-designed. Interpretation requires all three.

The 6-Step Framework for Interpreting Statistical Results

📋

Featured Snippet — 6-Step Interpretation Framework

Step 1: Identify the statistical method and verify assumptions.
Step 2: Review the research question and hypotheses.
Step 3: Extract the core numerical outputs.
Step 4: Assess statistical significance alongside effect size.
Step 5: Translate findings into practical conclusions.
Step 6: Communicate results with appropriate caveats.

Identify the Statistical Method and Verify Assumptions

Each statistical test rests on assumptions — normality, independence, equal variances, linearity, and others. If assumptions are violated, the p-value and confidence interval may not be trustworthy. For example, a t-test requires that observations be independent and roughly normal (or n be large enough for the Central Limit Theorem to apply). Check these before drawing conclusions. See the statistical assumptions guide for a full checklist by test type.

Review the Research Question and Hypotheses

Restate the null and alternative hypotheses in plain language before reading the output. This anchors interpretation and prevents the common error of answering a different question than the one being tested. Note whether the test is one-tailed or two-tailed, because that changes the p-value and the correct conclusion. The full treatment of hypothesis structure is in the null and alternative hypothesis guide.

Extract and Isolate the Core Numerical Outputs

Identify the test statistic (t, z, F, χ²), degrees of freedom, p-value, and any confidence intervals or effect size measures reported. Write them out explicitly: "t(24) = 3.12, p = 0.005, 95% CI [1.2, 4.8], d = 0.63." This structured summary is the foundation every subsequent interpretation step builds on. The notation is standardized by the APA Style Guide for Statistics.
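One way to keep this summary consistent is to build it from the numbers rather than typing it by hand. The helper below is a minimal sketch; the function name, argument names, and rounding choices are assumptions for illustration, and it mirrors the notation of the example string above rather than any particular journal's rules.

```python
def format_result(stat_name, df, stat, p, ci=None, d=None, level=95):
    """Build a one-line summary such as 't(24) = 3.12, p = 0.005, ...'."""
    # Very small p-values are reported as an inequality instead of 0.000.
    p_text = "p < 0.001" if p < 0.001 else f"p = {p:.3f}"
    parts = [f"{stat_name}({df}) = {stat:.2f}", p_text]
    if ci is not None:
        parts.append(f"{level}% CI [{ci[0]:.1f}, {ci[1]:.1f}]")
    if d is not None:
        parts.append(f"d = {d:.2f}")
    return ", ".join(parts)


print(format_result("t", 24, 3.12, 0.005, ci=(1.2, 4.8), d=0.63))
```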

Assess Statistical Significance Alongside Effect Size

Compare the p-value to your pre-specified α (typically 0.05). If p < α, the result is statistically significant — the null hypothesis is rejected. Then look at effect size. A Cohen's d of 0.2 is small, 0.5 is medium, 0.8 is large. An η² of 0.01 is small, 0.06 is medium, 0.14 is large. These two pieces of information together tell a complete story about the evidence.

Translate Findings into Practical, Real-World Conclusions

Convert the statistical result into domain-specific language. Rather than "H₀ was rejected (p = 0.03)," write "The new training program produced a statistically significant improvement in test scores of approximately 8 points (95% CI [2.1, 13.9]), a medium effect (d = 0.52)." Numbers need context — a reference group, a unit of measurement, and an indication of uncertainty.

Communicate Results with Transparency and Caveats

Report what the analysis cannot prove as clearly as what it can. Acknowledge sample size limitations, whether the study was pre-registered, and any assumptions that were approximately rather than exactly met. This honesty is what separates rigorous statistical communication from overconfident claims. The EQUATOR Network provides reporting standards across disciplines.

Critical Comparisons in Statistics

Statistical Significance vs. Practical Significance

This is the most frequent misunderstanding in applied statistics. Statistical significance is a binary call: p is either below your threshold or it is not. Practical significance asks a different question — does the effect matter in the context of your domain?

Dimension	Statistical Significance	Practical Significance
What it measures	Whether the result is unlikely under H₀	Whether the effect size is meaningful in context
Key metric	p-value vs. α	Cohen's d, η², ω², odds ratio
Affected by sample size	Yes: for any fixed nonzero effect, larger n produces smaller p-values	The true effect size does not depend on n, though small samples estimate it imprecisely
Can be misleading	Yes — trivial effects become "significant" with large n	Yes — large effects can be impractical to achieve
Required for a complete result	Yes	Yes

An illustrative example: suppose a study with n = 100,000 finds that people who drink coffee have a significantly higher resting heart rate (p < 0.001), with a difference of 0.3 beats per minute. The effect is statistically real but practically irrelevant for most clinical decisions. Always report both.

Correlation vs. Causation

A correlation coefficient of r = 0.85 between ice cream sales and drowning rates tells you the two variables move together strongly. It says nothing about whether one causes the other — both are driven by a third variable (hot weather). Causation requires either randomized experimental design or, in observational studies, careful causal modeling using tools like directed acyclic graphs (DAGs) and instrumental variables.

⚠️

Common Mistake

Correlation describes the strength and direction of a linear relationship. It does not establish that changes in X produce changes in Y. Only a randomized controlled experiment, or a valid quasi-experimental design, supports causal language. See the Pearson correlation guide for the correct interpretation of r.

Effect Size vs. P-Value

The p-value answers: "How likely is this data if nothing is happening?" The effect size answers: "How big is what's happening?" They complement each other. A p-value alone cannot tell you whether an effect is worth caring about. An effect size without a p-value cannot tell you whether the effect is real rather than sampling noise.

Scenario	p-value	Effect Size (d)	Correct Interpretation
Large n, tiny effect	0.001	0.05 (trivial)	Statistically significant, not practically meaningful
Small n, medium effect	0.12	0.55 (medium)	Not significant, but may be real — underpowered study
Large n, large effect	0.0001	0.82 (large)	Statistically and practically significant — strongest evidence
Small n, small effect	0.45	0.15 (small)	Not significant; effect too small or study underpowered

Confidence Interval vs. Hypothesis Test

A confidence interval and a hypothesis test answer related but distinct questions. A hypothesis test asks whether a specific null value (e.g., μ = 0) can be rejected. A confidence interval gives the range of plausible values for the parameter consistent with the data. The two are mathematically equivalent for two-sided tests — if the 95% CI excludes the null value, then p < 0.05. But confidence intervals carry more information because they convey both significance and precision.

Quick Reference Tables for Interpretation

P-Value Interpretation Reference

P-Value Range	Evidence Against H₀	Common Decision	Reporting Language
p < 0.001	Very strong	Reject H₀	"Highly statistically significant (p < .001)"
0.001 ≤ p < 0.01	Strong	Reject H₀	"Statistically significant (p = .006)"
0.01 ≤ p < 0.05	Moderate	Reject H₀	"Statistically significant (p = .032)"
0.05 ≤ p < 0.10	Marginal / Weak	Fail to reject H₀	"Marginal trend, p = .07 — not significant at α = .05"
p ≥ 0.10	Little to none	Fail to reject H₀	"No statistically significant effect (p = .34)"

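Note: these ranges are common reporting conventions rather than formal rules. Wasserstein & Lazar (2016), The ASA Statement on p-values, The American Statistician, cautions against basing conclusions solely on whether a p-value crosses a fixed threshold. The full guide to interpreting p-values is on Statistics Fundamentals.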
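The ranges in the table can be encoded as a simple lookup. This is a sketch only: the labels and cut-offs are copied from the table above, and the function name is hypothetical.

```python
from bisect import bisect_right

# Upper bounds of each band in the table; a p-value equal to a cut-off
# falls into the band that starts at that cut-off (e.g. p = 0.05 is "Marginal / Weak").
CUTOFFS = [0.001, 0.01, 0.05, 0.10]
LABELS = ["Very strong", "Strong", "Moderate", "Marginal / Weak", "Little to none"]


def evidence_against_null(p):
    """Return the table's evidence label for a p-value."""
    if not 0 <= p <= 1:
        raise ValueError("p must be between 0 and 1")
    return LABELS[bisect_right(CUTOFFS, p)]


for p in (0.0004, 0.006, 0.032, 0.07, 0.34):
    print(p, evidence_against_null(p))
```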

Correlation Coefficient Strength Guide

|r| Value	Strength	Interpretation	Field Example
0.00 – 0.10	Negligible	Essentially no linear relationship	Random noise in financial data
0.10 – 0.30	Weak	Small but potentially real association	Age and resting heart rate
0.30 – 0.50	Moderate	Consistent association, some scatter	Education level and income
0.50 – 0.70	Strong	Clear linear trend	Height and weight in adults
0.70 – 0.90	Very strong	Strong predictive value	Standardized test scores and GPA
0.90 – 1.00	Near-perfect	Precise linear relationship	Repeated measurement of the same variable

The sign of r (positive or negative) tells you direction. The absolute value tells you strength. For the full derivation of this measure, see the Pearson correlation page.

Effect Size Reference: Cohen's d and η²

Label	Cohen's d	η² (Eta-squared)	Typical Context
Trivial	< 0.20	< 0.01	Detectable only with very large samples
Small	0.20 – 0.49	0.01 – 0.05	Subtle differences in psychology or education
Medium	0.50 – 0.79	0.06 – 0.13	Visible differences between groups
Large	≥ 0.80	≥ 0.14	Obvious, replicable differences

Statistical Interpretation Examples

How to Interpret P-Values

Interpretation Example — P-Value

Output: A two-sample t-test comparing exam scores between two teaching methods gives t(58) = 2.41, p = 0.019.

State what the p-value measures: p = 0.019 means that, if the two teaching methods produced identical mean scores in the population, a t-statistic at least as extreme as 2.41 (in either direction) would occur in about 1.9% of samples.

Compare to threshold: p = 0.019 < α = 0.05, so the result is statistically significant. Reject H₀ that the two means are equal.

State the conclusion correctly: The data provide sufficient evidence, at the 5% significance level, that the two teaching methods produce different mean exam scores. The direction and magnitude of the difference require the effect size and confidence interval to complete the picture.

✅ Plain-English conclusion: Students taught with Method B scored significantly higher on average than those taught with Method A (t(58) = 2.41, p = .019). This result is unlikely to reflect chance sampling variability alone.

🚫

Never Say This

"p = 0.019 means there is a 98.1% chance that Method B is better." The p-value is not a probability that H₀ is true or false. It is a probability about the data, not about the hypothesis itself.
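The p-value in this example can be reproduced from the t-statistic and degrees of freedom. The sketch below uses only the standard library and integrates the Student t density numerically with Simpson's rule; the function names and the number of integration steps are assumptions for illustration, and in practice a statistics package would normally do this step.

```python
from math import exp, lgamma, log, pi


def t_pdf(x, df):
    """Density of the Student t distribution with df degrees of freedom."""
    log_norm = lgamma((df + 1) / 2) - lgamma(df / 2) - 0.5 * log(df * pi)
    return exp(log_norm - (df + 1) / 2 * log(1 + x * x / df))


def two_sided_p(t, df, steps=2000):
    """Two-sided p-value: probability of |T| >= |t| under H0 (steps must be even)."""
    upper = abs(t)
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central = total * h / 3  # Simpson estimate of P(0 < T < |t|)
    return 1 - 2 * central


print(round(two_sided_p(2.41, 58), 3))
```

The result is the share of the null distribution lying at least 2.41 away from zero in either direction, which is the quantity the p-value describes.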

How to Interpret Confidence Intervals

A 95% confidence interval of [3.2, 11.8] for the mean difference in blood pressure (mmHg) between a drug group and a placebo group tells you several things at once.

Interpretation Example — Confidence Interval

Output: 95% CI for mean BP reduction = [3.2, 11.8] mmHg; mean difference = 7.5 mmHg.

Point estimate: The drug reduced blood pressure by 7.5 mmHg on average in this sample.

Precision: The interval [3.2, 11.8] spans 8.6 mmHg — moderately wide, indicating the estimate carries some uncertainty.

Statistical significance: The interval does not include 0 (which would represent no difference), so p < 0.05 for the two-sided test.

Practical significance: A reduction of 3.2 to 11.8 mmHg is clinically meaningful. Even the lower bound falls above typical thresholds for clinically relevant BP reduction, so the drug appears both statistically and practically significant.

✅ Plain-English conclusion: The drug reduced systolic blood pressure by an estimated 7.5 mmHg (95% CI [3.2, 11.8]), a statistically and clinically meaningful reduction. The plausible range of true effects is entirely above zero.
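The reading of this interval can be sketched in code. The helper assumes the interval is symmetric and was built as estimate ± z·SE (a normal approximation; a t-based interval would give a slightly different standard error), and its name and return keys are hypothetical.

```python
from statistics import NormalDist


def summarize_ci(lower, upper, null_value=0.0, level=0.95):
    """Recover point estimate, width, implied SE and approximate p from a CI."""
    nd = NormalDist()
    estimate = (lower + upper) / 2
    width = upper - lower
    se = (width / 2) / nd.inv_cdf(0.5 + level / 2)  # half-width divided by z*
    z = (estimate - null_value) / se
    return {
        "estimate": estimate,
        "width": width,
        "excludes_null": not (lower <= null_value <= upper),
        "approx_p": 2 * (1 - nd.cdf(abs(z))),
    }


print(summarize_ci(3.2, 11.8))
```

Because the interval excludes 0, the approximate two-sided p-value comes out below 0.05, which is the equivalence between a 95% interval and a test at α = 0.05.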

For the full derivation of confidence intervals and how to construct them, see the confidence intervals guide and the specific page on confidence interval for the mean.

How to Interpret Correlation Coefficients

Interpretation Example — Pearson r

Output: r = 0.72, p < 0.001, n = 45 between study hours per week and course grade.

Direction: r = +0.72 is positive. More study hours are associated with higher grades.

Strength: |r| = 0.72 falls in the "very strong" category (0.70–0.90), so grades track study hours closely, with limited scatter around the linear trend.

Variance explained: r² = 0.52, so approximately 52% of the variance in course grades is shared with study hours.

Significance: With n = 45, r = 0.72 corresponds to t(43) ≈ 6.80, so p < 0.001. A correlation this large is very unlikely to arise from sampling variability alone.

Causation: A strong positive correlation does not prove that studying causes higher grades. Other variables (prior knowledge, motivation, test-taking skills) could drive both.

✅ Plain-English conclusion: Study hours and course grades show a strong positive linear relationship (r = .72, p < .001). Students who study more tend to score higher, though the correlation does not establish that studying alone drives the improvement.
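Two of the quantities in this example follow directly from r and n: the variance shared (r²) and the test statistic for H₀: ρ = 0, which is t = r·√(n − 2) / √(1 − r²) with n − 2 degrees of freedom. A minimal sketch, with a hypothetical function name:

```python
from math import sqrt


def correlation_summary(r, n):
    """Return (r squared, t statistic, degrees of freedom) for a Pearson r."""
    r_squared = r * r
    df = n - 2
    t = r * sqrt(df) / sqrt(1 - r_squared)
    return r_squared, t, df


r_squared, t, df = correlation_summary(0.72, 45)
print(round(r_squared, 2), round(t, 2), df)
```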

How to Interpret Regression Output

Simple linear regression output contains multiple pieces of information. Each requires its own interpretation. The example below uses a regression predicting annual salary (in thousands of dollars) from years of experience.

Simple Linear Regression Equation

Ŷ = β₀ + β₁X

β₀ = intercept (predicted Y when X = 0)
β₁ = slope (change in Y per 1-unit increase in X)
Ŷ = predicted outcome
X = predictor variable

Interpretation Example — Simple Linear Regression

Output: Ŷ = 38.2 + 3.7X; β₁ p < 0.001; R² = 0.68; n = 80.

Intercept (38.2): When years of experience = 0, the predicted salary is $38,200. This is meaningful only if X = 0 is realistic — which it is here (someone entering the workforce).

Slope (3.7): Each additional year of experience is associated with a $3,700 increase in predicted annual salary. Because this model has a single predictor, there are no other variables being held constant. The direction is positive (more experience = higher salary).

P-value for β₁ (< 0.001): The slope is highly statistically significant. Years of experience reliably predicts salary in this sample.

R² (0.68): The model explains 68% of the variance in annual salary. About a third of salary variation is attributable to factors outside the model (role type, industry, education, etc.).

✅ Plain-English conclusion: Years of experience is a significant predictor of salary (β₁ = 3.7, p < .001). Each additional year is associated with approximately $3,700 more per year. The model accounts for 68% of the variation in salaries observed in this dataset.
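The pieces of this output can be reproduced on a small scale. The sketch below fits a least-squares line by hand and computes R² as 1 − SS_res/SS_tot; the five data points are made up for illustration (they are not the n = 80 dataset behind the example), and the function names are hypothetical.

```python
from statistics import fmean


def fit_simple_regression(x, y):
    """Least-squares intercept, slope and R-squared for one predictor."""
    x_bar, y_bar = fmean(x), fmean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    return intercept, slope, 1 - ss_res / ss_tot


def predict_salary(years, intercept=38.2, slope=3.7):
    """Predicted salary in $ thousands from the reported equation."""
    return intercept + slope * years


# Hypothetical data: years of experience and salary in $ thousands.
years = [1, 2, 3, 4, 5]
salary = [42, 44, 45, 44, 45]
print(fit_simple_regression(years, salary))
print(predict_salary(5))
```

`predict_salary` uses the coefficients reported in the example, so a prediction at X = 5 is the intercept plus five slopes.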

For the full guide to regression coefficients, residuals, and R², see simple linear regression, R-squared interpretation, and residual analysis.

How to Interpret ANOVA Outputs

ANOVA tests whether at least one group mean differs from the others. The F-statistic is the ratio of variance between groups to variance within groups. A large F means the group differences are large relative to the random variation within each group.

ANOVA F-Statistic

F = MS_between / MS_within

MS = mean square (variance estimate)
df₁ = k − 1 (between groups)
df₂ = N − k (within groups)

Interpretation Example — One-Way ANOVA

Output: F(2, 57) = 8.34, p = 0.001, η² = 0.23. Three fertilizer types tested on crop yield (kg), n = 60.

F-statistic interpretation: F(2, 57) = 8.34 means the between-group mean square is 8.34 times the within-group mean square. The numbers in parentheses are the degrees of freedom: df₁ = 2 (three groups minus one) and df₂ = 57 (60 observations minus 3 groups).

Statistical significance: p = 0.001 < 0.05. At least one fertilizer type produces significantly different mean crop yields than the others.

Effect size: η² = 0.23 is a large effect. About 23% of the total variance in crop yield is explained by fertilizer type — a meaningful agricultural difference.

Follow-up tests: ANOVA only tells you that differences exist. Post-hoc tests (Tukey's HSD, Bonferroni) identify which specific pairs of groups differ. See the ANOVA guide for the full post-hoc procedure.

✅ Plain-English conclusion: Fertilizer type had a statistically significant effect on crop yield (F(2, 57) = 8.34, p = .001, η² = .23). The large effect size indicates that fertilizer choice explains roughly a quarter of the variation in yield. Post-hoc tests are needed to identify which specific fertilizers differ.
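For a one-way ANOVA, η² can be recovered from the F-statistic and its degrees of freedom as η² = F·df₁ / (F·df₁ + df₂). The sketch below also uses a closed-form tail probability that holds only when df₁ = 2, as in this three-group example; the function names are hypothetical, and for other designs a statistics package would be used for the p-value.

```python
def eta_squared_from_f(f, df_between, df_within):
    """Eta-squared for a one-way ANOVA, recovered from F and its df."""
    return f * df_between / (f * df_between + df_within)


def f_test_p_when_df1_is_2(f, df_within):
    """Upper-tail p-value of F(2, df_within); valid only for df_between = 2."""
    return (1 + 2 * f / df_within) ** (-df_within / 2)


print(round(eta_squared_from_f(8.34, 2, 57), 2))
print(round(f_test_p_when_df1_is_2(8.34, 57), 3))
```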

How to Interpret T-Tests

T-tests produce a t-statistic, degrees of freedom, and a p-value. The specific meaning depends on which t-test you ran. The most common types are the independent samples t-test (two separate groups), the paired samples t-test (same subjects measured twice), and the one-sample t-test (one group compared to a known value).

Which T-Test Interpretation Applies?

One group, comparing to a known population mean → One-sample t-test: tests whether the sample mean differs from μ₀

Two independent groups (different people) → Independent samples t-test: tests the difference between the two group means

Same people measured at two time points → Paired t-test: tests whether the mean of the within-subject differences is zero

For detailed procedures and worked examples for each type, see the dedicated pages: one-sample t-test, two-sample t-test, and paired samples t-test.
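The three variants differ mainly in what goes into the numerator and the standard error. The sketch below computes each t-statistic with the standard library; the function names and the tiny data sets are hypothetical, and the independent-samples version assumes equal variances (pooled standard deviation).

```python
from math import sqrt
from statistics import mean, stdev, variance


def one_sample_t(sample, mu0):
    """t and df for H0: population mean equals mu0."""
    n = len(sample)
    return (mean(sample) - mu0) / (stdev(sample) / sqrt(n)), n - 1


def paired_t(first, second):
    """Paired test = one-sample test on the within-subject differences."""
    diffs = [b - a for a, b in zip(first, second)]
    return one_sample_t(diffs, 0.0)


def independent_t(group_a, group_b):
    """Pooled-variance t and df for two independent groups."""
    na, nb = len(group_a), len(group_b)
    pooled_var = ((na - 1) * variance(group_a) + (nb - 1) * variance(group_b)) / (na + nb - 2)
    se = sqrt(pooled_var * (1 / na + 1 / nb))
    return (mean(group_a) - mean(group_b)) / se, na + nb - 2


print(one_sample_t([4, 5, 6], 4))
print(paired_t([10, 10, 10], [11, 12, 13]))
print(independent_t([1, 2, 3], [3, 4, 5]))
```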

How to Interpret Chi-Square Tests

The chi-square test of independence examines whether two categorical variables are associated. The test statistic χ² measures how far observed cell frequencies deviate from what you would expect if the variables were completely independent.

Interpretation Example — Chi-Square Test

Output: χ²(2) = 9.41, p = 0.009, n = 150. Test of association between smoking status (never / former / current) and lung disease diagnosis (yes / no).

Test statistic: χ²(2) = 9.41. The "(2)" is degrees of freedom, calculated as (rows − 1) × (columns − 1) = (3 − 1)(2 − 1) = 2.

P-value: p = 0.009 < 0.05. Reject H₀ of independence. Smoking status and lung disease diagnosis are not independent in this sample.

Effect size: For chi-square, Cramér's V provides effect size. V = √(χ²/(n × df_min)) = √(9.41/(150 × 1)) ≈ 0.25. This is a small-to-moderate association.

✅ Plain-English conclusion: There is a statistically significant association between smoking status and lung disease diagnosis (χ²(2) = 9.41, p = .009, V = .25). Current smokers had a higher rate of lung disease compared to never-smokers, though the association is modest in strength.
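The effect size and p-value in this example can be checked by hand. The sketch below computes Cramér's V from the formula above and uses the fact that a chi-square distribution with exactly 2 degrees of freedom has the closed-form tail probability exp(−χ²/2); the function names are hypothetical, and the p-value shortcut does not apply to other degrees of freedom.

```python
from math import exp, sqrt


def cramers_v(chi2, n, n_rows, n_cols):
    """Cramér's V = sqrt(chi2 / (n * min(rows - 1, cols - 1)))."""
    df_min = min(n_rows - 1, n_cols - 1)
    return sqrt(chi2 / (n * df_min))


def chi2_p_when_df_is_2(chi2):
    """Upper-tail p-value for a chi-square statistic with df = 2 only."""
    return exp(-chi2 / 2)


print(round(cramers_v(9.41, 150, 3, 2), 2))
print(round(chi2_p_when_df_is_2(9.41), 3))
```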

To look up critical values for your test, use the chi-square table. The full step-by-step test procedure is at chi-square test of independence.

How to Interpret Effect Sizes

Cohen's d, the most widely used effect size for comparing two means, expresses the difference in standard deviation units. A d of 0.5 means the two group means are half a standard deviation apart — about the difference between the 50th and 69th percentile in a normal distribution.

Cohen's d Formula

d = (x̄₁ − x̄₂) / s_pooled

x̄₁ − x̄₂ = difference between group (sample) means
s_pooled = pooled standard deviation
d = 0.5 → 50th vs. 69th percentile

For ANOVA and variance-explained contexts, eta-squared (η²) is preferred. It ranges from 0 to 1 and represents the proportion of total variance attributable to the group factor. The Cohen's d guide and the effect size page cover the full range of measures.
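Both the formula and the percentile reading of d can be sketched briefly. The percentile interpretation assumes normally distributed scores with equal spread in both groups; the function names and sample data are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, variance


def cohens_d(group_a, group_b):
    """Difference in sample means divided by the pooled standard deviation."""
    na, nb = len(group_a), len(group_b)
    pooled_sd = sqrt(
        ((na - 1) * variance(group_a) + (nb - 1) * variance(group_b)) / (na + nb - 2)
    )
    return (mean(group_a) - mean(group_b)) / pooled_sd


def percentile_of_shifted_mean(d):
    """Percentile of the comparison group at which the other group's mean falls."""
    return 100 * NormalDist().cdf(d)


print(cohens_d([3, 4, 5], [1, 2, 3]))
print(round(percentile_of_shifted_mean(0.5)))
```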

How to Interpret Residual Diagnostics

Residuals are the differences between observed and predicted values in a regression model. Examining them is the primary way to check whether model assumptions hold.

Residual Plot	What It Checks	Good Pattern	Problem Pattern
Residuals vs. Fitted	Linearity & constant variance	Random scatter around zero	Curve or funnel shape
Normal Q-Q Plot	Normality of residuals	Points on diagonal line	S-curve or heavy tails
Scale-Location	Homoscedasticity	Horizontal band of points	Points spread wider at high fitted values
Cook's Distance	Influential observations	All points below 0.5	Points above 1.0 need investigation

The residuals guide and the page on influential points walk through each diagnostic plot with annotated examples.
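As an illustration of the last row of the table, residuals and Cook's distance for a simple linear regression can be computed directly. This sketch assumes one predictor plus an intercept (so p = 2 parameters) and uses made-up data whose last point is deliberately off the trend; the function name is hypothetical.

```python
from statistics import fmean


def residuals_and_cooks(x, y):
    """Residuals and Cook's distances for a one-predictor least-squares fit."""
    n = len(x)
    x_bar, y_bar = fmean(x), fmean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    intercept = y_bar - slope * x_bar
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    p = 2  # parameters estimated: intercept and slope
    mse = sum(e * e for e in residuals) / (n - p)
    # Leverage of each point: how far its x value sits from the mean of x.
    leverage = [1 / n + (xi - x_bar) ** 2 / sxx for xi in x]
    cooks = [
        (e * e / (p * mse)) * h / (1 - h) ** 2
        for e, h in zip(residuals, leverage)
    ]
    return residuals, cooks


# Hypothetical data: roughly y = 2x, except the final observation.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.0, 13.9, 25.0]
residuals, cooks = residuals_and_cooks(x, y)
for xi, e, c in zip(x, residuals, cooks):
    print(xi, round(e, 2), round(c, 2))
```

The final point combines a large residual with high leverage, so its Cook's distance exceeds the 1.0 rule of thumb from the table while the others stay well below it.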

Interactive P-Value Interpreter

Enter a p-value, your significance level, and an optional effect size (Cohen's d) to receive a structured plain-English interpretation.
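The logic of such an interpreter can be sketched as a small function. This is an illustration only, not the page's actual tool: the function name, the wording of the messages, and the n < 30 cut-off for the low-power caveat are assumptions, while the Cohen's d bands follow the effect size table earlier in this guide.

```python
def interpret(p, alpha=0.05, d=None, n=None):
    """Return a plain-English reading of a p-value, optional d and optional n."""
    if not 0 <= p <= 1:
        raise ValueError("p must be between 0 and 1")
    parts = []
    if p < alpha:
        parts.append(f"p = {p:g} is below α = {alpha:g}: reject H0 (statistically significant).")
    else:
        parts.append(f"p = {p:g} is not below α = {alpha:g}: fail to reject H0.")
    if d is not None:
        size = abs(d)
        if size < 0.2:
            label = "trivial"
        elif size < 0.5:
            label = "small"
        elif size < 0.8:
            label = "medium"
        else:
            label = "large"
        parts.append(f"Cohen's d = {d:g} is a {label} effect.")
    # Assumed rule of thumb: flag possible low power for small, non-significant studies.
    if n is not None and n < 30 and p >= alpha:
        parts.append("With a small sample, a non-significant result may reflect low power.")
    return " ".join(parts)


print(interpret(0.001, d=0.05, n=10_000))
print(interpret(0.12, d=0.55, n=20))
```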

Real-World Applications of Statistical Interpretation

🏥

Clinical Trials

Drug efficacy trials use confidence intervals and p-values to determine whether a treatment differs from placebo. Effect sizes like number needed to treat (NNT) translate statistical findings into clinical decisions.

📊

A/B Testing

In product and marketing experiments, statistical interpretation determines whether one variant outperforms another. Minimum detectable effect sizes and power calculations guard against underpowered tests.

🏭

Quality Control

Manufacturing processes use control charts and hypothesis tests to distinguish natural variation from assignable causes. False alarm rates and detection power are central interpretation concerns.

🤖

Machine Learning Evaluation

Model performance metrics (accuracy, AUC, RMSE) require statistical interpretation to determine whether differences between models are reliable or due to test-set variance. Cross-validation and bootstrap confidence intervals apply here.

📈

Econometrics

Regression coefficients in economic models estimate elasticities and marginal effects. Interpreting these requires understanding coefficient units, holding-constant assumptions, and the limits of observational data for causal inference.

🏛️

Public Policy Analysis

Policy evaluations use quasi-experimental methods (difference-in-differences, regression discontinuity) whose outputs require careful interpretation of local average treatment effects and generalizability.

Common Pitfalls in Statistical Interpretation

Pitfall	What People Say	What They Should Say
Misreading the p-value	"p = 0.04 means there's a 96% chance H₁ is true"	"p = 0.04 means data this extreme occurs only 4% of the time under H₀"
Significance without effect size	"We found a significant effect — the treatment works"	"We found a significant but small effect (d = 0.12) — clinical relevance is questionable"
Confusing r with r²	"r = 0.70 means 70% of variance is explained"	"r = 0.70 means r² = 0.49 — 49% of variance is explained"
Accepting the null	"p = 0.23, so there is no effect"	"p = 0.23 — we fail to reject H₀; this may reflect insufficient power"
P-hacking	Running many tests and reporting only the significant ones	Pre-register hypotheses; apply Bonferroni or FDR correction for multiple comparisons
Overstating R²	"R² = 0.85 proves the model is correct"	"R² = 0.85 means the model explains 85% of variance in the training data — out-of-sample validation is still needed"
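The multiple-comparison corrections named in the p-hacking row can be sketched as follows. Bonferroni compares each p-value to α/m; the Benjamini-Hochberg procedure (one common FDR method) sorts the p-values and rejects up to the largest rank k with p₍ₖ₎ ≤ (k/m)·q. The function names and the example p-values are hypothetical.

```python
def bonferroni(pvalues, alpha=0.05):
    """Reject H0 for each test whose p-value is at most alpha / m."""
    m = len(pvalues)
    return [p <= alpha / m for p in pvalues]


def benjamini_hochberg(pvalues, q=0.05):
    """Benjamini-Hochberg step-up procedure controlling the FDR at level q."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank / m * q:
            cutoff_rank = rank  # keep the largest rank that passes
    reject = [False] * m
    for idx in order[:cutoff_rank]:
        reject[idx] = True
    return reject


# Hypothetical p-values from eight tests run on the same dataset.
pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205]
print(sum(p < 0.05 for p in pvals), "significant without correction")
print(sum(bonferroni(pvals)), "after Bonferroni")
print(sum(benjamini_hochberg(pvals)), "after Benjamini-Hochberg")
```

Bonferroni is the stricter of the two, which is why it typically retains fewer findings than an FDR-based correction.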

⚠️

On Statistical Assumptions

Every statistical test assumes certain data properties. A t-test assumes independence and approximate normality. A Pearson r assumes linearity. Serious violations, especially of independence, can invalidate p-values and confidence intervals, and a larger sample does not fix them. Always check assumptions before interpreting results. See the statistical assumptions guide for a checklist by test type.

Statistical Interpretation Glossary

P-Value

P(data at least as extreme as observed | H₀)

The probability of observing data at least as extreme as the sample, assuming the null hypothesis is true. Small values (below α) indicate the data are inconsistent with H₀.

Confidence Interval

x̄ ± z · (σ/√n)

A range of plausible values for a population parameter, constructed so that a stated percentage of such intervals (e.g., 95%) would contain the true value across repeated samples.

Cohen's d

d = (x̄₁ − x̄₂) / s_p

A standardized effect size for comparing two means. Expresses the difference in units of the pooled standard deviation. Small: 0.2; Medium: 0.5; Large: 0.8.

R-Squared (R²)

R² = 1 − SS_res/SS_tot

The proportion of variance in the dependent variable explained by the regression model. Ranges from 0 to 1; higher values indicate a better-fitting model.

Pearson r

r = Cov(X,Y)/(σ_X · σ_Y)

A measure of linear association between two continuous variables. Ranges from −1 (perfect negative) to +1 (perfect positive). Zero indicates no linear relationship.

Standard Error

SE = s / √n

The standard deviation of the sampling distribution of a statistic (usually the mean). Smaller values indicate more precise estimates. Decreases as n increases.

Statistical Significance

p < α

A result is statistically significant when the p-value falls below the pre-specified significance level α. It means the data are inconsistent with H₀ at that threshold — not that the effect is large or important.

F-Statistic (ANOVA)

F = MS_between / MS_within

The ratio of variance between group means to variance within groups. A large F indicates the groups differ more than expected from chance. Compare to the F-table for the critical value.

Frequently Asked Questions

Q: What does interpretation mean in statistics?

Statistical interpretation is the process of translating quantitative test outputs — p-values, confidence intervals, regression coefficients, effect sizes — into meaningful conclusions about a research question. It goes beyond reading numbers to judging whether those numbers answer the question being asked, whether the assumptions underlying the test were met, and what the findings mean in context. Calculation produces the numbers; interpretation produces the insight.

Q: How do you interpret statistical results?

Follow the 6-step framework above: (1) verify the method and its assumptions; (2) restate the research question and hypotheses; (3) extract the key numbers — test statistic, degrees of freedom, p-value, confidence interval, effect size; (4) compare p to α and evaluate effect magnitude; (5) translate the output into a plain-English, domain-specific conclusion; (6) acknowledge limitations and communicate uncertainty. Avoid reducing results to "significant vs. not significant" without reporting effect sizes.

Q: How do you interpret a p-value?

A p-value is the probability of observing your sample data, or something more extreme, if the null hypothesis were true. It is not the probability that H₀ is true, and it is not the probability that H₁ is true. Compare it to your pre-specified α: if p < α, reject H₀ and call the result statistically significant. If p ≥ α, fail to reject H₀. Remember that a very small p-value in a large sample can coexist with a trivially small effect. Always pair p-value interpretation with an effect size. The full guide is at p-values.

Q: What is the difference between statistical significance and practical significance?

Statistical significance answers whether an observed effect is unlikely to be due to chance (p < α). Practical significance asks whether the effect is large enough to matter in the real world. They can diverge sharply. A drug that reduces pain scores by 0.3 points on a 100-point scale might reach p < 0.001 in a large trial, but a change of 0.3 points is not clinically meaningful. Conversely, a treatment with d = 0.9 in a small pilot study (n = 10) might give p = 0.08 — not statistically significant, but potentially quite important. Effect sizes like Cohen's d, η², and the number needed to treat (NNT) measure practical significance.

Q: How do you interpret a confidence interval?

A 95% confidence interval is a range of values calculated from your sample data. The correct interpretation is: if you repeated the study many times and constructed a 95% CI each time, 95% of those intervals would contain the true population parameter. The interval width reflects precision — narrower intervals come from larger samples or less variability. If the interval excludes the null value (0 for a difference, 1 for a ratio), the test is statistically significant at the corresponding α level. For details and worked examples, see confidence intervals.

Q: How do you interpret regression output?

Each regression output element has a distinct interpretation. The intercept (β₀) is the predicted outcome when all predictors equal zero. Each slope coefficient (β₁, β₂, …) is the expected change in the outcome for a one-unit increase in that predictor, holding all others fixed. The p-value for each coefficient tests whether that predictor contributes beyond chance. R² is the proportion of variance in the outcome explained by the model as a whole. Residual plots check whether model assumptions hold. See simple linear regression and multiple linear regression for full guides.

Q: Why is statistical interpretation important?

Statistical interpretation bridges the gap between raw output and evidence-based decisions. Without it, numbers from a regression or t-test are meaningless sequences of digits. Proper interpretation prevents common errors like accepting the null, overstating the importance of a significant p-value, or acting on effects too small to matter in practice. It also ensures that scientific findings are communicated accurately — reducing the chance that subsequent researchers, policymakers, or clinicians make decisions based on misunderstood data.

Q: What are the most common mistakes in statistical interpretation?

The most frequent errors include: (1) treating p-values as the probability that H₀ is true; (2) ignoring effect sizes and treating "significant" as synonymous with "important"; (3) concluding "no effect" from a non-significant result without considering statistical power; (4) confusing correlation with causation; (5) data dredging or p-hacking — running many tests until one reaches 0.05; (6) confusing r with r² in correlation interpretation; (7) failing to check and report whether model assumptions were met. The misconception table in the pitfalls section above covers each with examples.