Hypothesis Testing Statistical Inference Significance Testing 30 min read June 11, 2026
BY: Statistics Fundamentals Team
Reviewed By: Minsa A (Senior Statistics Editor)

P-Values: Complete Guide to Meaning, Calculation, and Interpretation

A researcher runs a clinical trial and gets p = 0.03. A data scientist tests an A/B variant and sees p = 0.12. A student reads a paper reporting p < 0.001. What does any of this actually mean? The p-value is one of the most used and most misunderstood numbers in all of science — and getting it right matters.

This guide covers the exact definition, the correct interpretation, how to calculate p-values for z-tests, t-tests, chi-square tests, and ANOVA, five fully worked examples, the decision rule, the relationship to confidence intervals and effect size, and the five misconceptions that trip up even experienced researchers. The interactive calculator at the bottom lets you compute a p-value from your own data in seconds.

What You'll Learn
  • ✓ The exact definition of a p-value — what it measures and what it does not
  • ✓ How to interpret p-values correctly using the decision rule p vs α
  • ✓ Formulas and step-by-step calculations for z-test, t-test, chi-square, and ANOVA
  • ✓ Five fully worked examples with real-world scenarios
  • ✓ The five most common misconceptions — and why they are wrong
  • ✓ How p-values relate to confidence intervals and effect size
  • ✓ The ASA statement on p-values and what statisticians actually recommend
  • ✓ An interactive p-value calculator you can use with your own numbers

What Is a P-Value? (Definition)

Definition — P-Value (Probability Value)
A p-value is the probability of obtaining a test statistic at least as extreme as the one calculated from your sample data, assuming the null hypothesis (H₀) is true. It measures how compatible your data are with H₀ — a small p-value means your data would be unusual if H₀ were correct.
p = P(observing a result at least this extreme | H₀ is true)

The core idea is a thought experiment: if the null hypothesis were true and you repeated the same study many times, how often would you see a result as extreme as yours? A p-value of 0.04 means 4% of the time — infrequent enough to raise doubt about H₀. A p-value of 0.60 means 60% of the time — your result is entirely ordinary under H₀ and gives no reason to reject it.
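The thought experiment can be made literal with a short simulation. The sketch below is illustrative only: it borrows the bottling figures from Worked Example 1 later in this guide (μ₀ = 500, σ = 12, n = 64, x̄ = 503.6), assumes a normally distributed population, and uses a hypothetical helper name, `simulated_p_value`.

```python
import random
import statistics


def simulated_p_value(observed_mean, mu0, sigma, n, repetitions=20_000, seed=1):
    """Estimate a two-tailed p-value by re-running the study under H0."""
    rng = random.Random(seed)
    observed_gap = abs(observed_mean - mu0)
    extreme = 0
    for _ in range(repetitions):
        # One imaginary study: n draws from the population that H0 describes
        # (assumed normal for this sketch).
        sample = [rng.gauss(mu0, sigma) for _ in range(n)]
        # Count the study if its mean lands at least as far from mu0
        # as the mean that was actually observed.
        if abs(statistics.fmean(sample) - mu0) >= observed_gap:
            extreme += 1
    return extreme / repetitions


p_hat = simulated_p_value(observed_mean=503.6, mu0=500, sigma=12, n=64)
print(f"simulated two-tailed p is about {p_hat:.3f}")
```

The fraction of simulated studies that land at least as far from μ₀ as the real one is the p-value, up to simulation noise. It should sit close to the analytic value of about 0.016 derived in Worked Example 1.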

That probability is measured against a pre-set threshold called the significance level, written α. The most common value is α = 0.05, established by convention in R.A. Fisher's 1925 work. When p < α, the result is called statistically significant and you reject H₀. When p ≥ α, you fail to reject H₀ — never "accept" it, because absence of evidence is not evidence of absence.

This is covered as part of the broader hypothesis testing framework at Statistics Fundamentals, where the full 6-step procedure is described in detail. The p-value specifically answers step 5 of that procedure.

⚡ P-Value Quick Reference
  • What it measures: Compatibility of your sample data with the null hypothesis
  • Range: Always between 0 and 1. Cannot be negative or greater than 1.
  • Small p-value: Your data would be rare if H₀ were true — evidence against H₀
  • Large p-value: Your data is consistent with H₀ — no reason to reject it
  • Decision rule: Reject H₀ when p < α (usually 0.05)
  • What it does not measure: The probability H₀ is true, the size of an effect, or practical importance
  • 0.05: Standard α threshold
  • 0.01: Conservative α (medicine)
  • 0.001: Very strong evidence
  • p < α: Reject H₀

How to Interpret a P-Value

Interpretation is where most mistakes happen. The p-value tells you one thing: how surprising your data would be if H₀ were true. It tells you nothing else directly. Here is a concrete framework for reading any p-value you encounter.

The Decision Rule
If p < α → Reject H₀  |  If p ≥ α → Fail to Reject H₀
Set α before collecting data. Never adjust α after seeing your p-value.

Evidence Strength Scale

P-Value Interpretation Guide

[Figure: p-value scale with markers at 0.0001, 0.001, 0.01, 0.05, 0.10, 0.50, and 1.0]
p < 0.001 — Very strong evidence against H₀
0.001 ≤ p < 0.01 — Strong evidence against H₀
0.01 ≤ p < 0.05 — Moderate evidence; statistically significant at α = 0.05
0.05 ≤ p < 0.10 — Weak evidence; not significant at α = 0.05
p ≥ 0.10 — Little to no evidence against H₀

These labels are conventions, not hard rules. The American Statistical Association's 2016 statement — Wasserstein & Lazar (2016) — explicitly cautions against treating 0.05 as a bright line between meaningful and meaningless results. Context matters: a p-value of 0.04 in a small exploratory study calls for different weight than the same value in a pre-registered trial with n = 5,000.

One-Tailed vs Two-Tailed P-Values

The direction stated in H₁ determines which tail of the distribution you count. A two-tailed test asks "is there an effect in either direction?" and counts both tails, so for a symmetric null distribution its p-value is twice the one-tailed p-value in the observed direction, which makes significance harder to reach. Always match the tail count to your H₁, chosen before data collection, not after.

Test Type | H₁ | P-Value Calculation | Use When
Two-tailed | μ ≠ μ₀ | p = 2 × P(Z > |z|) | You expect an effect but not its direction
Right-tailed (upper) | μ > μ₀ | p = P(Z > z) | You expect the parameter to be larger
Left-tailed (lower) | μ < μ₀ | p = P(Z < z) | You expect the parameter to be smaller
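The three rows of the table map directly onto three small calculations. The following is a minimal sketch using only Python's standard library; the function name `z_p_value` and its `tail` argument are illustrative choices, not part of any library.

```python
from statistics import NormalDist


def z_p_value(z, tail="two"):
    """P-value for a z statistic; tail is 'two', 'right', or 'left'."""
    std_normal = NormalDist()  # mean 0, standard deviation 1
    if tail == "two":
        # Both tails: twice the area beyond |z|.
        return 2 * (1 - std_normal.cdf(abs(z)))
    if tail == "right":
        # Upper tail only: P(Z > z).
        return 1 - std_normal.cdf(z)
    if tail == "left":
        # Lower tail only: P(Z < z).
        return std_normal.cdf(z)
    raise ValueError("tail must be 'two', 'right', or 'left'")


for tail in ("two", "right", "left"):
    print(tail, round(z_p_value(1.96, tail), 4))
```

Note that a left-tailed test on a positive z gives a p-value close to 1: the data point away from H₁, so they offer no evidence for it.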

P-Value Formula and Calculation

There is no single p-value formula because the calculation depends on the test statistic your study uses. The steps are always the same: compute the test statistic, then find the probability of observing that value or more extreme under the null distribution.

From a Z-Test

Z-Test — Test Statistic
z = (x̄ − μ₀) / (σ / √n)
x̄ = sample mean, μ₀ = null hypothesis mean, σ = known population SD, n = sample size
Two-Tailed P-Value from Z
p = 2 × (1 − Φ(|z|))

Φ is the standard normal CDF. Once you have z, you look up the area in the tail of the normal distribution using a z-table. For |z| = 1.96 the one-tail area is 0.025, so the two-tailed p-value is 0.05 — exactly the conventional cutoff.

Reference: Fisher, R.A. (1925). Statistical Methods for Research Workers. Edinburgh: Oliver and Boyd. Standard normal table values from the NIST Engineering Statistics Handbook.

From a T-Test

T-Test — Test Statistic
t = (x̄ − μ₀) / (s / √n)
s = sample SD, df = n − 1 (one-sample)

Once you have t and df, you find p using the t-distribution with df degrees of freedom. Because the t-distribution has heavier tails than the normal, the same test statistic produces a larger p-value than a z-test would — the extra uncertainty from not knowing σ is accounted for. Use the t-distribution table or software for the exact p-value.
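To see the heavier tails in numbers, the sketch below compares the two-tailed p-value for the same statistic under the normal and t-distributions. It is a rough standard-library illustration, not a replacement for statistical software: the t tail area is obtained by numerically integrating the density with Simpson's rule, and the helper names `t_pdf` and `t_upper_tail` are hypothetical.

```python
import math
from statistics import NormalDist


def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))


def t_upper_tail(t, df, steps=2000):
    """P(T > t), using Simpson's rule on [0, |t|]; steps must be even."""
    if t < 0:
        return 1 - t_upper_tail(-t, df, steps)
    if t == 0:
        return 0.5
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    # Area from 0 to t is total * h / 3; the density is symmetric about 0.
    return 0.5 - total * h / 3


statistic, df = 2.40, 15
p_z = 2 * (1 - NormalDist().cdf(statistic))
p_t = 2 * t_upper_tail(statistic, df)
print(f"normal: p = {p_z:.4f}   t (df = {df}): p = {p_t:.4f}")
```

For any fixed statistic, the t-based p-value is the larger of the two, and the gap shrinks as df grows.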

From a Chi-Square Test

Chi-Square — Test Statistic
χ² = Σ [(O − E)² / E]
O = observed frequency, E = expected frequency, df = (rows − 1)(cols − 1) for a test of independence

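The p-value from a chi-square test is always one-tailed (upper): χ² is never negative, and any gap between observed and expected counts, in either direction, makes it larger. A large χ² means observed and expected counts diverge greatly, which is evidence against independence (or against the hypothesized distribution in a goodness-of-fit test). See the chi-square table for critical values.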

Calculating P-Values in Software

Software | Function / Command | Returns
Excel | =NORM.S.DIST(-ABS(z),TRUE)*2 | Two-tailed z p-value
Excel | =T.DIST.2T(ABS(t),df) | Two-tailed t p-value
Excel | =CHISQ.DIST.RT(chi2,df) | Chi-square p-value
R | 2*pnorm(-abs(z)) | Two-tailed z p-value
R | 2*pt(-abs(t), df) | Two-tailed t p-value
R | pchisq(chi2, df, lower.tail=FALSE) | Chi-square p-value
Python (scipy) | stats.norm.sf(abs(z))*2 | Two-tailed z p-value
Python (scipy) | stats.t.sf(abs(t), df)*2 | Two-tailed t p-value
SPSS | Reported automatically in output tables | Listed as "Sig."

Worked Examples — P-Values Step by Step

Each example follows the same sequence: state hypotheses, set α, compute the test statistic, find the p-value, and draw a conclusion. The arithmetic is shown in full so you can see exactly where the number comes from.

Example 1 — One-Sample Z-Test

Worked Example 1 — Z-Test P-Value

A bottling plant claims its bottles contain μ = 500 mL on average (σ = 12 mL known). Quality control samples 64 bottles and finds x̄ = 503.6 mL. Find the p-value at α = 0.05 for a two-tailed test.

1. Hypotheses: H₀: μ = 500 mL  |  H₁: μ ≠ 500 mL (two-tailed)
2. Significance level: α = 0.05. Critical values: z = ±1.96 for a two-tailed test.
3. Standard error: SE = σ/√n = 12/√64 = 12/8 = 1.5
4. Test statistic: z = (503.6 − 500) / 1.5 = 3.6 / 1.5 = 2.40
5. P-value: The one-tail area for z = 2.40 is P(Z > 2.40) = 0.0082 (from the z-table). Two-tailed p = 2 × 0.0082 = 0.0164 ≈ 0.016
6. Decision: p = 0.016 < α = 0.05 → Reject H₀

✅ Conclusion: At the 5% level, there is sufficient evidence that the mean fill volume differs from 500 mL. The sample mean of 503.6 mL is statistically significantly higher than the claimed average.

Example 2 — One-Sample T-Test

Worked Example 2 — T-Test P-Value

A sleep researcher believes adults sleep less than the recommended 8 hours. A sample of 16 adults yields x̄ = 7.1 hours with s = 1.2 hours. Is there evidence the population mean is below 8 hours? Use α = 0.05, one-tailed.

One-Sample T-Test
t = (7.1 − 8) / (1.2 / √16) = −0.9 / 0.3 = −3.00
df = 16 − 1 = 15; left-tailed test (H₁: μ < 8)

1. Hypotheses: H₀: μ = 8 hours  |  H₁: μ < 8 hours (left-tailed)
2. Significance level: α = 0.05. With df = 15, the one-tailed critical value is t* = −1.753 from the t-distribution table.
3. Test statistic: t = (7.1 − 8) / (1.2/4) = −0.9 / 0.3 = −3.00
4. P-value: P(t₁₅ < −3.00). The left-tail area of the t-distribution with df = 15 below t = −3.00 gives p ≈ 0.0045.
5. Decision: p ≈ 0.0045 < α = 0.05 → Reject H₀. Equivalently, t = −3.00 falls below the critical value t* = −1.753.

✅ Conclusion: At the 5% level, the evidence supports that adults sleep significantly less than 8 hours on average. See the full one-sample t-test guide for the complete methodology.

Example 3 — Two-Sample T-Test

Worked Example 3 — Two-Sample T-Test P-Value

Two teaching methods are compared. Group A (n = 20, x̄ = 78, s = 9) and Group B (n = 20, x̄ = 83, s = 11). Is the mean difference statistically significant at α = 0.05?

1. Hypotheses: H₀: μ_A = μ_B  |  H₁: μ_A ≠ μ_B (two-tailed)
2. Standard error (unpooled, as used by Welch's test; with equal group sizes it equals the pooled SE): SE = √(9²/20 + 11²/20) = √(81/20 + 121/20) = √(10.1) = 3.178
3. Test statistic: t = (78 − 83) / 3.178 = −5 / 3.178 = −1.573
4. P-value: With df ≈ 37 (Welch's approximation), two-tailed p ≈ 0.124. See the two-sample t-test page for the df formula.
5. Decision: p = 0.124 > α = 0.05 → Fail to reject H₀. The 5-point difference does not reach significance with these sample sizes and variances.

⚠️ Conclusion: There is not sufficient evidence at α = 0.05 to conclude the two teaching methods produce different mean scores. Note: "fail to reject" does not mean the methods are identical — it means the data are inconclusive.
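For readers who want to see where df ≈ 37 comes from, here is a small sketch of the Welch–Satterthwaite approximation applied to the summary statistics of this example. The function name `welch_summary` is hypothetical, and the code stops at the test statistic and degrees of freedom; the p-value itself still requires a t-distribution table or software.

```python
import math


def welch_summary(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Return (SE, t, df) for Welch's two-sample t-test from summary stats."""
    var_a = sd_a ** 2 / n_a  # squared standard error of group A's mean
    var_b = sd_b ** 2 / n_b  # squared standard error of group B's mean
    se = math.sqrt(var_a + var_b)
    t = (mean_a - mean_b) / se
    # Welch-Satterthwaite degrees of freedom (usually not a whole number).
    df = (var_a + var_b) ** 2 / (
        var_a ** 2 / (n_a - 1) + var_b ** 2 / (n_b - 1)
    )
    return se, t, df


se, t, df = welch_summary(78, 9, 20, 83, 11, 20)
print(f"SE = {se:.3f}, t = {t:.3f}, df = {df:.2f}")
```

The approximation gives df of roughly 36.6, which the worked example rounds to 37.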

Example 4 — Chi-Square Test of Independence

Worked Example 4 — Chi-Square P-Value

A marketing analyst surveys 200 customers to test whether purchase decision (Yes/No) is independent of ad type (Video/Banner). Observed: Video-Yes=60, Video-No=40, Banner-Yes=45, Banner-No=55. Test at α = 0.05.

1. Hypotheses: H₀: Purchase and ad type are independent  |  H₁: They are not independent
2. Expected frequencies: E(Video-Yes) = (100×105)/200 = 52.5  |  E(Video-No) = 47.5  |  E(Banner-Yes) = 52.5  |  E(Banner-No) = 47.5
3. Test statistic: χ² = (60−52.5)²/52.5 + (40−47.5)²/47.5 + (45−52.5)²/52.5 + (55−47.5)²/47.5 = 1.071 + 1.184 + 1.071 + 1.184 = 4.511
4. P-value: df = (2−1)(2−1) = 1. From the chi-square table, the critical value is χ²(1) = 3.841 at α = 0.05. Our χ² = 4.511 > 3.841, so p < 0.05; the exact upper-tail area is p ≈ 0.034.
5. Decision: p = 0.034 < α = 0.05 → Reject H₀

✅ Conclusion: At α = 0.05, there is significant evidence that purchase decision and ad type are not independent — video ads produced proportionally more purchases.
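The arithmetic in this example can be reproduced with a few lines of standard-library Python. This is a sketch under one important restriction: the closing shortcut `erfc(√(χ²/2))` for the upper-tail area holds only when df = 1, as it does for a 2×2 table. The function name `chi_square_2x2` is illustrative.

```python
import math


def chi_square_2x2(observed):
    """Chi-square statistic and p-value for a 2x2 table (df = 1 only)."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(observed):
        for j, obs in enumerate(row):
            # Expected count under independence: row total x column total / n.
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (obs - expected) ** 2 / expected
    # With df = 1, chi-square is the square of a standard normal variable,
    # so P(chi2 > c) = P(|Z| > sqrt(c)) = erfc(sqrt(c / 2)).
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p


# Rows: Video, Banner. Columns: Yes, No.
chi2, p = chi_square_2x2([[60, 40], [45, 55]])
print(f"chi-square = {chi2:.3f}, p = {p:.3f}")
```

For larger tables the degrees of freedom exceed 1 and the shortcut no longer applies; use a chi-square table or one of the software functions listed earlier.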

Example 5 — P-Value in Simple Linear Regression

Worked Example 5 — Regression P-Value

In a simple linear regression of sales on advertising spend (n = 25), the estimated slope is b₁ = 2.35 with SE(b₁) = 0.78. Is the slope significantly different from zero?

1. Hypotheses: H₀: β₁ = 0 (no linear relationship)  |  H₁: β₁ ≠ 0
2. Test statistic: t = b₁ / SE(b₁) = 2.35 / 0.78 = 3.013
3. Degrees of freedom: df = n − 2 = 25 − 2 = 23
4. P-value: Two-tailed, t(23) = 3.013. The t-table gives critical value 2.069 at α = 0.05 and 2.807 at α = 0.01. Since 3.013 > 2.807, p < 0.01 (more precisely, p ≈ 0.006).
5. Decision: p ≈ 0.006 < α = 0.05 → Reject H₀. The slope is statistically significant.

✅ Conclusion: Advertising spend has a statistically significant positive linear relationship with sales. See the full simple linear regression guide for how slope, intercept, and R² work together.

P-Value vs Significance Level — Key Differences

These two quantities are easy to confuse but they play completely different roles. The significance level is a decision parameter you choose; the p-value is a statistic you calculate.

Aspect | P-Value | Significance Level (α)
What it is | Calculated from your sample data | Set by the researcher before data collection
When determined | After running the test | Before collecting data
Typical values | Anything between 0 and 1 | 0.05 (default), 0.01, or 0.10
What it represents | Evidence against H₀ in your specific sample | Maximum acceptable rate of false positives
Role in decision | Compared to α to reach a decision | The threshold p must beat to reject H₀
Can you change it? | No, it's fixed by your data | Yes, but only before seeing the p-value
⚠️ Never change α after seeing your p-value

Adjusting the significance level after calculating the p-value in order to achieve significance is a form of "p-hacking": it inflates the actual false positive rate above the nominal α. Set α first; then calculate p; then compare.

P-Values and Confidence Intervals

P-values and confidence intervals are two views of the same inference, and they agree whenever both are built from the same test statistic and standard error. If the p-value from a two-tailed test is less than 0.05, the matching 95% confidence interval for the parameter will not contain the null value, and vice versa.

P-Value Result | 95% Confidence Interval | Interpretation
p < 0.05 | Does not contain H₀ value | Statistically significant at α = 0.05
p = 0.05 | Boundary touches H₀ value | Exactly at the threshold
p > 0.05 | Contains H₀ value | Not statistically significant at α = 0.05

The confidence interval adds something the p-value alone cannot provide: the range of plausible values for the parameter. A 95% confidence interval for the mean tells you both whether the result is significant and how large the effect plausibly is. The ASA and most statistical style guides now recommend reporting both, not just the p-value.
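The agreement between the two can be checked numerically. The sketch below reuses the bottling figures from Worked Example 1 (x̄ = 503.6, μ₀ = 500, σ = 12, n = 64) and builds both the interval and the p-value from the same standard error; `z_interval_and_p` is a hypothetical helper name.

```python
from statistics import NormalDist


def z_interval_and_p(sample_mean, mu0, sigma, n, confidence=0.95):
    """Two-sided z confidence interval and two-tailed p-value from one SE."""
    std_normal = NormalDist()
    se = sigma / n ** 0.5
    # Critical value: about 1.96 when confidence = 0.95.
    z_crit = std_normal.inv_cdf(1 - (1 - confidence) / 2)
    interval = (sample_mean - z_crit * se, sample_mean + z_crit * se)
    z = (sample_mean - mu0) / se
    p = 2 * (1 - std_normal.cdf(abs(z)))
    return interval, p


(low, high), p = z_interval_and_p(503.6, 500, 12, 64)
excludes_null = not (low <= 500 <= high)
print(f"95% CI: [{low:.2f}, {high:.2f}]  p = {p:.3f}  excludes 500: {excludes_null}")
```

The interval runs from about 500.66 to 506.54 mL, so it excludes 500 and p falls below 0.05, as the table predicts. The interval also shows what the p-value cannot: the plausible overfill is somewhere between under 1 mL and about 6.5 mL.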

Statistical Significance vs Practical Significance

A p-value tells you how incompatible the data are with H₀; it says nothing about whether the effect is large enough to matter. With a large enough sample, even a trivially small difference becomes statistically significant.

Concrete Example

A drug reduces blood pressure by 0.4 mmHg. Is that meaningful?

With n = 50,000 participants, a 0.4 mmHg reduction might produce p = 0.0001 — highly significant. Clinically, a reduction of less than 1 mmHg is considered negligible. The p-value cannot tell you this. Effect size measures like Cohen's d, odds ratios, or the raw difference in the original units answer the practical question.
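A short calculation makes the sample-size effect visible. The sketch below is a deliberate simplification: it treats the 0.4 mmHg reduction as a one-sample z-test against zero and assumes a standard deviation of 23 mmHg, a hypothetical value chosen so that n = 50,000 lands near p = 0.0001. Neither the SD nor the function name comes from a real trial.

```python
import math
from statistics import NormalDist

EFFECT_MMHG = 0.4        # mean reduction quoted in the example above
ASSUMED_SD_MMHG = 23.0   # hypothetical SD, chosen for illustration only


def two_tailed_p_for_mean_shift(effect, sd, n):
    """Two-tailed z-test p-value for a mean shift of `effect` with known SD."""
    z = effect / (sd / math.sqrt(n))
    return 2 * (1 - NormalDist().cdf(abs(z)))


for n in (100, 1_000, 10_000, 50_000):
    p = two_tailed_p_for_mean_shift(EFFECT_MMHG, ASSUMED_SD_MMHG, n)
    print(f"n = {n:>6}: p = {p:.5f}")

# The standardized effect size does not change with n.
cohens_d = EFFECT_MMHG / ASSUMED_SD_MMHG
print(f"Cohen's d = {cohens_d:.3f}")
```

The effect and its standardized size stay fixed while the p-value falls from far above 0.05 to far below it, purely because n grows.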

Measure | What It Answers | Example
P-value | Is there evidence for an effect? | p = 0.003 → Yes, reject H₀
Effect size (Cohen's d) | How large is the effect? | d = 0.12 → very small
Confidence interval | What range of values is plausible? | 95% CI: [0.1, 0.7] mmHg reduction
Power | What was the chance of detecting a real effect? | 80% power → reasonable study design

See the Cohen's d guide for the standard effect size measure for t-tests, and the Pearson correlation page for effect size in correlation analysis.

Five Common P-Value Misconceptions

The 2016 ASA statement on p-values identified widespread misuse in published research. The five misconceptions below appear repeatedly in textbooks, papers, and statistical reporting.

  • #1 Null probability. What people believe: p-value = P(H₀ is true). What is actually true: p-value = P(data this extreme | H₀ is true). These are completely different conditional probabilities.
  • #2 Proof of hypothesis. What people believe: p < 0.05 proves the alternative hypothesis. What is actually true: p < 0.05 means data are inconsistent with H₀ at this level. It establishes statistical significance, not truth.
  • #3 No effect. What people believe: p > 0.05 means there is no effect. What is actually true: p > 0.05 means the data are insufficient to reject H₀. The effect might exist but the study lacked power to detect it.
  • #4 Significance = importance. What people believe: A statistically significant result is practically important. What is actually true: Statistical significance depends on sample size. A large n can produce p < 0.05 for a negligibly small effect.
  • #5 Replication. What people believe: p = 0.05 means a 95% chance of replication. What is actually true: The replication rate depends on effect size, power, and study design, not directly on the p-value.
📖 ASA Statement on P-Values

The American Statistical Association published a landmark 2016 statement, authored by Wasserstein & Lazar, outlining six principles for proper p-value use; a special issue of The American Statistician followed in 2019. One of the six principles states: "A p-value, or statistical significance, does not measure the size of an effect or the importance of a result." A p-value should be one piece of evidence, not the sole decision criterion.

P-Value Decision Tables

Common Significance Thresholds

P-Value Range | Interpretation | Typical Conclusion
p < 0.001 | Very strong evidence against H₀ | Reject H₀ at α = 0.001, 0.01, and 0.05
0.001 ≤ p < 0.01 | Strong evidence against H₀ | Reject H₀ at α = 0.01 and 0.05
0.01 ≤ p < 0.05 | Moderate evidence; statistically significant | Reject H₀ at α = 0.05 only
0.05 ≤ p < 0.10 | Marginal evidence; not significant at standard α | Fail to reject H₀ at α = 0.05
p ≥ 0.10 | Weak or no evidence against H₀ | Fail to reject H₀ at all common thresholds

Decision Rule by Alpha Level

Significance Level (α) | Reject H₀ when | Common Use Case
α = 0.10 | p < 0.10 | Exploratory research, weak evidence sufficient
α = 0.05 | p < 0.05 | Default in most social sciences and business
α = 0.01 | p < 0.01 | Medical research, clinical trials
α = 0.001 | p < 0.001 | High-stakes decisions, physics experiments

Real-World Applications

P-values appear in every quantitative field. Here is how each domain uses them in practice.

  • 💊 Clinical Research: Phase III trials conventionally test at a two-sided α = 0.05 (equivalently 0.025 one-sided) to demonstrate drug efficacy. Regulatory bodies like the FDA generally expect evidence at this level for approval.
  • 🧪 Psychology: Experimental psychology has moved toward reporting effect sizes and confidence intervals alongside p-values following reproducibility concerns in the 2010s.
  • 📊 A/B Testing: Product teams test whether a new button color, page layout, or pricing change produces a statistically significant improvement in conversion rates.
  • 🏭 Quality Control: Manufacturers test whether process changes produce mean outputs that differ significantly from specification using one-sample or two-sample t-tests.
  • 📈 Economics: Regression coefficients are reported with p-values to test whether variables like income, education, or policy changes have statistically significant effects.
  • 🔬 Genomics: Genome-wide association studies test millions of variants simultaneously, requiring Bonferroni-corrected thresholds as low as p < 5×10⁻⁸ to control the family-wise error rate.
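The genome-wide threshold is simple arithmetic once the number of tests is fixed. As a hedged illustration, the sketch below assumes one million independent tests, the round figure commonly used to motivate 5×10⁻⁸; the function name and the example p-values are hypothetical.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold that keeps the family-wise rate at alpha."""
    return alpha / n_tests


# Assumption for illustration: one million independent tests.
threshold = bonferroni_threshold(alpha=0.05, n_tests=1_000_000)
print(f"per-test threshold: {threshold:.1e}")

# Hypothetical per-variant p-values, checked against the corrected threshold.
for p in (3e-9, 2e-6, 0.01):
    verdict = "significant" if p < threshold else "not significant"
    print(f"p = {p:.0e}: {verdict}")
```

A p-value of 0.01, comfortably significant for a single test, is nowhere near the corrected bar when a million tests are run at once.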

How to Report P-Values

Reporting standards have tightened across journals. The following guidance reflects APA 7th edition and most major statistical style guidelines.

1. Report the exact p-value. Write p = 0.032, not "p < 0.05." Exact values give readers more information and allow independent evaluation. Round to two or three decimal places.
2. Use correct notation for very small values. When p is extremely small, write p < 0.001 rather than p = 0.0000002. Do not write p = 0.000, because this implies zero probability, which is incorrect.
3. Report alongside the test statistic. Give the full result: t(29) = 3.14, p = 0.004 or χ²(2) = 8.71, p = 0.013. This lets readers verify your calculation and assess it in context.
4. Include effect size and confidence interval. P-values alone are insufficient. The APA manual recommends reporting effect size (Cohen's d, η², r) and a confidence interval alongside every significant result.
5. State the conclusion in plain language. Write "there was a significant difference in mean scores, t(38) = 2.67, p = 0.011" rather than just "p was significant." The statistic should support a substantive claim.
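The first three rules are mechanical enough to automate. The sketch below is one possible formatter, written to follow the conventions used in this guide (three decimals, leading zero kept, "p < 0.001" as the floor); the function names `format_p` and `format_t_result` are hypothetical, and strict APA style would drop the leading zero.

```python
def format_p(p, decimals=3):
    """Format a p-value for reporting; never prints 'p = 0.000'."""
    if not 0 <= p <= 1:
        raise ValueError("a p-value must lie between 0 and 1")
    floor = 10 ** -decimals  # smallest value shown exactly, e.g. 0.001
    if p < floor:
        return f"p < {floor:.{decimals}f}"
    return f"p = {p:.{decimals}f}"


def format_t_result(t, df, p):
    """Combine test statistic, degrees of freedom, and p-value in one string."""
    return f"t({df}) = {t:.2f}, {format_p(p)}"


print(format_p(0.0324))
print(format_p(0.0000002))
print(format_t_result(3.14, 29, 0.004))
```

Keeping the formatting in one place avoids the two most common slips: reporting only an inequality when the exact value is known, and printing a rounded "0.000".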

Concept | Relationship to P-Values | Learn More
Null Hypothesis (H₀) | P-value assumes H₀ is true during calculation | Null and Alternative Hypotheses
Confidence Intervals | Dual relationship: p < α ↔ CI excludes H₀ value | Confidence Intervals Guide
Z-Score | Z is the test statistic; p comes from the z-distribution | Z-Score Guide
Normal Distribution | Z-test p-values use the standard normal CDF | Normal Distribution
Sampling Distribution | P-values rely on the sampling distribution of the test statistic | Sampling Distributions
Type I Error | α is the Type I error rate: the probability of rejecting H₀ when H₀ is true | Hypothesis Testing
Cohen's d | Effect size: complements p-value with magnitude | Cohen's d
Degrees of Freedom | Required for t and chi-square p-value calculations | Degrees of Freedom
ANOVA | F-statistic maps to a p-value via the F-distribution | ANOVA Guide

P-Value Calculator

Enter your test parameters below. The calculator supports z-tests and t-tests with one-tailed and two-tailed options. For chi-square tests, use the chi-square calculator. For regression and ANOVA, see the regression calculator and ANOVA calculator.

P-Value Reference Cheat Sheet

Symbol / Term | Meaning | Typical Value
p | Probability of data this extreme under H₀ | 0 to 1
α | Pre-set significance level | 0.05
H₀ | Null hypothesis: the default claim being tested | e.g. μ = 0
H₁ | Alternative hypothesis: what you're testing for | e.g. μ ≠ 0
z | Z test statistic (normal distribution, σ known) | (x̄ − μ₀)/(σ/√n)
t | T test statistic (t-distribution, σ unknown) | (x̄ − μ₀)/(s/√n)
χ² | Chi-square statistic (categorical data) | Σ(O−E)²/E
F | F statistic (ANOVA, ratio of variances) | MS_between/MS_within
df | Degrees of freedom; affects tail area | n−1 (one-sample t)
SE | Standard error of the mean | σ/√n or s/√n
β | Type II error rate (probability of missing real effect) | 0.20 common
Power | Probability of detecting a real effect (1 − β) | 0.80 common

Frequently Asked Questions

What is a p-value?
A p-value is the probability of obtaining a test statistic at least as extreme as the one from your sample data, assuming the null hypothesis is true. It measures how consistent your data are with H₀. A small p-value (say, 0.02) means your data would be unusual if H₀ were true, giving you reason to doubt H₀. It does not tell you the probability that H₀ itself is true.

What does a p-value of 0.05 mean?
A p-value of 0.05 means that, assuming H₀ is true, a result this extreme or more extreme would occur 5% of the time by random sampling variation alone. It sits exactly at the conventional threshold: under the strict rule p < α used in this guide, p = 0.05 falls just short of rejecting H₀ at α = 0.05, although some texts reject when p ≤ α. Either way, this is not strong evidence; it marks the conventional boundary for calling a result statistically significant.

Is a smaller p-value better?
A smaller p-value means stronger evidence against the null hypothesis. Whether "better" is the right word depends on context. For a researcher trying to detect a real effect, a smaller p-value is more convincing. However, a tiny p-value does not mean the effect is large or important; it can arise from a trivially small effect in a very large sample. Always pair p-values with effect sizes and confidence intervals.

How do you calculate a p-value?
Step 1: Compute the test statistic (z, t, χ², or F) from your data. Step 2: Identify the null distribution and degrees of freedom. Step 3: Find the area in the tail(s) beyond your test statistic using a statistical table. For a z-test, look up |z| in the z-table and take the complementary area; double it for two-tailed. For a one-sample t-test, use the t-distribution table with df = n − 1. The interactive calculator on this page automates all of this.

What is the difference between a p-value and the significance level?
The significance level (α) is a threshold you decide on before collecting data, typically 0.05. The p-value is what you calculate from your data after running the test. The decision rule is: if p < α, reject H₀. Think of α as the bar you've set and p as the score your data achieved. You cannot move the bar after seeing the score.

What are the most common misconceptions about p-values?
The five most common: (1) The p-value is the probability H₀ is true. False; it is the probability of the data given H₀. (2) p < 0.05 proves the alternative hypothesis. False; it establishes statistical significance, not truth. (3) A non-significant result means no effect. False; it means insufficient evidence. (4) Statistical significance means the result is important. False; significance is about evidence, not magnitude. (5) A significant p-value will replicate. False; replication depends on power, not just the p-value.

How are p-values and confidence intervals related?
They are two sides of the same inference. A two-tailed test with α = 0.05 produces p < 0.05 if and only if the 95% confidence interval does not contain the null value. The confidence interval adds what the p-value lacks: the range of plausible parameter values. Both should be reported together: the p-value for the decision, the confidence interval for context about magnitude.

What does a p-value mean in regression?
In regression output, each coefficient has an associated p-value testing whether that coefficient is significantly different from zero. A small p-value (below your chosen α) for a slope coefficient means the predictor has a statistically significant linear association with the outcome, controlling for other variables in the model. The overall F-test p-value tests whether the model as a whole explains significant variance. Always also report R² and the confidence intervals for coefficients.

Can a p-value be greater than 1?
No. P-values are probabilities and are always between 0 and 1 inclusive. If a calculation yields a p-value greater than 1, there is an error in the computation. P-values also cannot be negative. A p-value of exactly 0 is theoretically possible only for a perfectly deterministic outcome, which does not occur in practice with real data.

Can you compare two p-values directly?
Direct comparison only makes sense when both p-values come from the same type of test on the same type of data. A smaller p-value indicates stronger evidence against H₀ in that particular test. You cannot conclude that a result with p = 0.01 is "twice as important" as one with p = 0.02, because the p-value scale is not linear in evidence strength. For comparing results across studies, meta-analysis and effect sizes are more informative than raw p-value comparisons.

How should p-values be reported in APA style?
APA 7th edition recommends: (1) Report exact p-values to two or three decimal places, e.g., p = 0.032. (2) For very small values, write p < 0.001. (3) Never write p = 0.000. (4) Include the test statistic, degrees of freedom, and p-value together: t(24) = 2.45, p = 0.022. (5) Omit the leading zero before the decimal since p cannot exceed 1: write p = .032 in APA style, though p = 0.032 is also widely accepted.

Sources and Further Reading

1. Wasserstein, R.L. & Lazar, N.A. (2016). The ASA Statement on p-Values: Context, Process, and Purpose. The American Statistician, 70(2), 129–133. doi:10.1080/00031305.2016.1154108
2. Fisher, R.A. (1925). Statistical Methods for Research Workers. Edinburgh: Oliver and Boyd.
3. Neyman, J. & Pearson, E.S. (1933). On the problem of the most efficient tests of statistical hypotheses. Philosophical Transactions of the Royal Society A, 231, 289–337.
4. Wasserstein, R.L., Schirm, A.L., & Lazar, N.A. (2019). Moving to a World Beyond "p < 0.05". The American Statistician, 73(sup1), 1–19. doi:10.1080/00031305.2019.1583913
5. Cumming, G. (2014). The New Statistics: Why and How. Psychological Science, 25(1), 7–29. doi:10.1177/0956797613504966
6. NIST/SEMATECH Engineering Statistics Handbook. P-values. itl.nist.gov