Probability · Bayes' Theorem · Data Science · AP Statistics · 28 min read · May 29, 2026
By: Statistics Fundamentals Team
Reviewed By: Minsa A (Senior Statistics Editor)

Conditional Probability: Formula, Bayes' Theorem & Worked Examples

You get a positive result on a medical test. What is the actual probability you have the disease? The raw test accuracy alone won't tell you — you need conditional probability: the probability of one event given that another event has already occurred. Written P(A|B), it is the engine behind Bayes' theorem, Naive Bayes classifiers, and nearly every branch of statistical inference.

This guide builds conditional probability from the ground up. It covers the P(A|B) formula, the multiplication rule for dependent events, Bayes' theorem, tree diagrams, contingency tables, and three fully worked examples — including the medical test problem and the Monty Hall paradox — with every step written out. The interactive calculator lets you check your own numbers instantly.

What You'll Learn
  • ✓ The conditional probability formula P(A|B) = P(A∩B) / P(B) with every variable defined
  • ✓ Why the "|" symbol means "given that" and how it shrinks the sample space
  • ✓ The multiplication rule for dependent events and how it differs from independent events
  • ✓ Bayes' theorem: formula, derivation, and step-by-step application
  • ✓ Tree diagrams and contingency tables — two visual tools that make conditional logic click
  • ✓ Three high-stakes worked examples: cards without replacement, medical testing, and Monty Hall
  • ✓ The Prosecutor's Fallacy: why P(A|B) ≠ P(B|A) and why it matters in courts and medicine
  • ✓ An interactive calculator and a printable cheat sheet

What Is Conditional Probability? (Definition + Formula)

Definition — Conditional Probability
Conditional probability is the probability that event A occurs given that event B has already occurred. It is written P(A|B), read as "the probability of A given B." The vertical bar | means "given that." Conditional probability works by restricting the sample space to only the outcomes where B is true, then measuring how many of those outcomes also satisfy A.
P(A|B) = P(A ∩ B) / P(B)   [where P(B) > 0]

Intuitively, conditional probability answers the question: once we know B happened, how does that change what we expect about A? Think of looking out your window on a cloudy morning. The unconditional probability of rain today might be 20%. But the probability of rain given that you see dark clouds might jump to 70%. The clouds are your condition — they shrink your mental sample space and update your estimate.

The formal mechanism is sample space reduction. Before you know anything about B, the universe of possible outcomes is the full sample space S. Once you learn B occurred, only the outcomes inside B are still in play. The conditional probability P(A|B) asks: of the outcomes inside B, what fraction also fall inside A?

⚡ Quick Reference — Conditional Probability Key Facts
  • Formula: P(A|B) = P(A ∩ B) / P(B), where P(B) > 0
  • Reading the notation: P(A|B) = "probability of A given B" — B is the condition, A is the event of interest
  • Sample space effect: Knowing B occurred reduces the effective sample space from S to B
  • Key assumption: The formula requires P(B) > 0 — you cannot condition on an impossible event
  • Independent events: If A and B are independent, P(A|B) = P(A) — knowing B tells you nothing new about A
  • Connection to multiplication rule: Rearranging gives P(A ∩ B) = P(A|B) × P(B)

The Conditional Probability Formula Explained

Breaking Down P(A|B) = P(A ∩ B) / P(B)

The Fundamental Conditional Probability Formula
P(A|B) = P(A ∩ B) / P(B)
where P(B) > 0 (B must be a possible event)
P(A|B) — probability of A given B has occurred
P(A ∩ B) — joint probability that both A and B occur
P(B) — probability of the condition event B (the denominator / new sample space)
| — the "given that" operator (vertical bar)

Each component does a specific job. The numerator P(A ∩ B) is the joint probability — the fraction of all outcomes where both A and B happen simultaneously. The denominator P(B) is the probability of the condition event B. Dividing them rescales the joint probability relative to the new, restricted sample space where B is guaranteed.

Here is a concrete walkthrough before any symbols. Suppose 1,000 students took a test. 400 studied (event B). Of those 400 who studied, 320 passed (event A ∩ B). What is the probability a student passed, given that they studied? Intuitively: 320 out of 400, which is 0.80. The formula confirms it: P(pass | studied) = P(pass ∩ studied) / P(studied) = (320/1000) / (400/1000) = 0.32 / 0.40 = 0.80. The 1,000 denominators cancel, leaving exactly the intuitive answer.

💡
The "|" Symbol Means "Given That"

P(A|B) is read "the probability of A given B" or "the probability of A given that B has occurred." The vertical bar | is not division — it is a conditional operator. The event to the right of | is the condition (what you already know happened). The event to the left is what you are estimating. Order matters enormously: P(A|B) and P(B|A) are generally not equal.

The Multiplication Rule for Dependent Events

Rearranging the conditional probability formula gives the multiplication rule for dependent events, one of the most used formulas in all of probability:

Multiplication Rule — Dependent Events
P(A ∩ B) = P(B) × P(A|B)
equivalently: P(A ∩ B) = P(A) × P(B|A)
For independent events only: P(A ∩ B) = P(A) × P(B)

This formula reads as: "the probability that both A and B occur equals the probability that B occurs, multiplied by the probability that A occurs given B has occurred." For independent events, P(A|B) = P(A), so the formula collapses to the familiar P(A) × P(B).

⚠️
Common Pitfall: Multiplying Unconditional Probabilities for Dependent Events

If events are dependent (e.g., drawing cards without replacement), you cannot simply multiply P(A) × P(B). You must use the conditional form: P(A ∩ B) = P(A) × P(B|A). Using unconditional probabilities for dependent events is one of the most frequent errors in introductory statistics. The test: ask yourself whether the outcome of the first event changes the sample space for the second.

Sample Space Reduction: The Core Intuition

The geometric picture of conditional probability is straightforward and worth internalizing before any calculation. Draw the full sample space S as a large rectangle. Inside it, draw two overlapping circles: circle B (all outcomes where B occurs) and circle A. The overlap — the lens-shaped region where the circles intersect — is A ∩ B.

Without any condition, the denominator is all of S. Once you know B occurred, you discard everything outside circle B. The denominator shrinks to just circle B. The numerator stays the same: the intersection A ∩ B. P(A|B) is therefore the proportion of the B-circle that overlaps with A.

This geometric view makes one thing immediately clear: conditioning on B changes the probability of A only if B carries information about A, that is, only if A takes up a different share of circle B than it does of the whole rectangle S. If A and B are independent, the overlap A ∩ B occupies exactly the fraction P(A) of circle B, the same fraction A occupies in S, so conditioning on B leaves P(A) unchanged.
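The sample-space reduction can be checked by brute-force counting. The sketch below is a minimal Python example that uses a fair six-sided die as an illustrative stand-in for the rectangle S (the die and the event names are assumptions, not from the text). It keeps only the outcomes in B, then measures how much of B lies in A.

```python
from fractions import Fraction

# Sample space S: one roll of a fair six-sided die (all outcomes equally likely).
S = frozenset({1, 2, 3, 4, 5, 6})

def prob(event, space=S):
    """Classical probability: |event ∩ space| / |space|."""
    return Fraction(len(event & space), len(space))

def cond_prob(a, b, space=S):
    """P(A|B) = P(A∩B) / P(B): keep only outcomes in B, then ask how many are in A."""
    if prob(b, space) == 0:
        raise ValueError("cannot condition on an event with P(B) = 0")
    return prob(a & b, space) / prob(b, space)

A = {2, 4, 6}        # "roll is even"
B1 = {1, 2, 3, 4}    # A covers 2 of B1's 4 outcomes: same share as in S
B2 = {1, 2, 3}       # A covers 1 of B2's 3 outcomes: a different share

print(prob(A))           # 1/2
print(cond_prob(A, B1))  # 1/2  (conditioning on B1 leaves P(A) unchanged)
print(cond_prob(A, B2))  # 1/3  (conditioning on B2 changes P(A))
```

B1 is the "proportional overlap" case described above, so A and B1 are independent; B2 is not.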

The four core formulas at a glance:
  • Conditional probability: P(A|B) = P(A∩B) / P(B). Restricts the sample space to B, then asks what fraction of B also satisfies A.
  • Independent events: P(A|B) = P(A). Knowing B occurred gives no information about A; the condition has no effect.
  • Multiplication rule: P(A∩B) = P(A|B)·P(B). The joint probability equals the conditional probability times the probability of the condition.
  • Chain rule (extended): P(A∩B∩C) = P(A)·P(B|A)·P(C|A∩B). For three or more events, each term conditions on all prior events in the chain.
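The chain rule can be evaluated with exact fractions. This is a minimal sketch, assuming a pool of `total` items containing `successes` favorable items drawn one at a time without replacement; the function name `all_successes_without_replacement` is hypothetical. Each loop iteration multiplies in one conditional factor, conditioned on every earlier draw having been a success.

```python
from fractions import Fraction

def all_successes_without_replacement(successes, total, draws):
    """Chain rule P(S1 ∩ ... ∩ Sk) = P(S1)·P(S2|S1)·...·P(Sk|S1 ∩ ... ∩ Sk-1).

    The i-th factor conditions on the previous i draws all being successes,
    so both the favorable count and the pool size have dropped by i.
    """
    if not 0 <= draws <= total:
        raise ValueError("need 0 <= draws <= total")
    p = Fraction(1)
    for i in range(draws):
        p *= Fraction(max(successes - i, 0), total - i)
    return p

print(all_successes_without_replacement(4, 52, 2))  # 1/221  (two aces)
print(all_successes_without_replacement(4, 52, 3))  # 1/5525 (three aces)
```

With `draws=2` this reproduces the 1/221 worked out in Example 1 below; the three-ace case is the three-event chain (4/52)·(3/51)·(2/50).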

Three Worked Examples

Example 1: Drawing Cards Without Replacement

Worked Example 1 — Dependent Events

What is the probability of drawing two aces in a row from a standard 52-card deck without replacement?

This is the classic dependent-event scenario. The second draw's sample space is smaller because one card was removed after the first draw.

  1. Define events. Let A = "first card is an ace." Let B = "second card is an ace." We want P(A ∩ B).
  2. Find P(A). There are 4 aces in 52 cards. P(A) = 4/52 = 1/13 ≈ 0.0769.
  3. Find P(B|A), the conditional probability. Given the first card was an ace, only 51 cards remain, of which 3 are aces. P(B|A) = 3/51 = 1/17 ≈ 0.0588. Notice the sample space shrank from 52 to 51 and the favorable outcomes shrank from 4 to 3.
  4. Apply the multiplication rule for dependent events. P(A ∩ B) = P(A) × P(B|A) = (4/52) × (3/51) = 12/2652 = 1/221.

✓ Answer: P(two aces in a row without replacement) = 1/221 ≈ 0.00452 ≈ 0.45%

If the card were replaced and the deck reshuffled after each draw (with replacement), the events become independent and P = (4/52)² = 16/2704 ≈ 0.59%. Replacement versus no replacement is the key dependent-vs-independent distinction.
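A quick simulation makes the with/without replacement difference tangible. This Monte Carlo sketch only tracks whether each card is an ace; the function name, trial count, and seed are arbitrary choices, not part of the original example. The estimates should land near the exact values 1/221 and 1/169 but will not match them exactly.

```python
import random

def estimate_two_aces(trials=200_000, replace=False, seed=0):
    """Monte Carlo estimate of P(both of two draws are aces)."""
    rng = random.Random(seed)
    deck = ["ace"] * 4 + ["other"] * 48   # only ace vs. non-ace matters here
    hits = 0
    for _ in range(trials):
        if replace:
            # Draw, put the card back, reshuffle: the two draws are independent.
            first, second = rng.choice(deck), rng.choice(deck)
        else:
            # Two distinct cards: the second draw depends on the first.
            first, second = rng.sample(deck, 2)
        if first == "ace" and second == "ace":
            hits += 1
    return hits / trials

print(estimate_two_aces(replace=False), 1 / 221)  # estimate vs. exact ≈ 0.00452
print(estimate_two_aces(replace=True), 1 / 169)   # estimate vs. exact ≈ 0.00592
```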

Example 2: Medical Test — False Positives & Bayes' Theorem

This is one of the most important applications of conditional probability in data science and medicine. A test with 99% sensitivity can still produce mostly false positives when the condition it tests for is rare. The reason is the base rate: the prior probability of the disease in the population. This is where Bayes' theorem is essential.

Worked Example 2 — Bayes' Theorem / False Positive Paradox

A disease affects 1% of the population. A test for it is 99% sensitive (P(positive | disease) = 0.99) and 95% specific (P(negative | no disease) = 0.95). You test positive. What is the probability you actually have the disease?

  1. Identify prior probabilities. P(disease) = 0.01 and P(no disease) = 0.99.
  2. Identify likelihoods. P(positive | disease) = 0.99 (sensitivity). P(positive | no disease) = 1 − 0.95 = 0.05 (false positive rate).
  3. Calculate the total probability of testing positive, P(positive). Using the law of total probability: P(positive) = P(positive|disease)×P(disease) + P(positive|no disease)×P(no disease) = (0.99)(0.01) + (0.05)(0.99) = 0.0099 + 0.0495 = 0.0594.
  4. Apply Bayes' theorem. P(disease | positive) = P(positive | disease) × P(disease) / P(positive) = (0.99 × 0.01) / 0.0594 = 0.0099 / 0.0594 ≈ 0.1667.

✓ Answer: Even with a 99% sensitive test, a positive result means only a ~16.7% chance (exactly 1/6) that you have the disease. The low disease base rate (1%) dominates.

⚠ Why this matters: Ignoring the base rate and concluding "the test is 99% accurate, so I almost certainly have the disease" is a common and dangerous reasoning error. Out of 10,000 people tested: 100 have the disease (99 test positive), 9,900 don't have it (495 test positive falsely). Of 594 total positives, only 99 are true: 99/594 = 16.7%. The contingency table below makes this concrete.
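The four steps above fit in a few lines of Python. This is a minimal sketch; `posterior_positive` is a hypothetical helper name, and it assumes a single binary test characterized only by its sensitivity and specificity.

```python
def posterior_positive(prior, sensitivity, specificity):
    """P(disease | positive test) via Bayes' theorem.

    prior       = P(disease), the base rate
    sensitivity = P(positive | disease)
    specificity = P(negative | no disease)
    """
    false_positive_rate = 1 - specificity                  # P(positive | no disease)
    p_positive = (sensitivity * prior                       # law of total probability
                  + false_positive_rate * (1 - prior))
    return sensitivity * prior / p_positive

print(round(posterior_positive(0.01, 0.99, 0.95), 4))  # 0.1667
```

Raising the prior to 0.10 while keeping the same test gives a posterior of about 0.69, which shows how strongly the answer depends on the base rate.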

Contingency Table: The Medical Test in Numbers

A contingency table cross-classifies outcomes by two variables; here it is a 2×2 grid of test result versus disease status. When the rows are a classifier's predictions and the columns are the true labels, machine learning calls this table a confusion matrix. It is often the clearest way to work through a Bayes' theorem problem, because it replaces abstract fractions with concrete counts.

Using the medical test above with a population of 10,000 people:

| | Has Disease | No Disease | Row Total |
|---|---|---|---|
| Test Positive | 99 (True Positive) | 495 (False Positive) | 594 |
| Test Negative | 1 (False Negative) | 9,405 (True Negative) | 9,406 |
| Column Total | 100 | 9,900 | 10,000 |

Reading off P(disease | positive): look only at the "Test Positive" row (594 people). Of these, 99 truly have the disease. P(disease | positive) = 99/594 ≈ 16.7%. The contingency table makes the sample space reduction immediate: conditioning on "positive test" restricts attention to just the top row.
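The same table can be generated from the inputs as expected counts (natural frequencies). In this sketch, `contingency_counts` is a hypothetical helper, and the counts are expected values rather than observed data. Conditioning on a positive test is literally selecting the "Test Positive" row.

```python
def contingency_counts(population, prevalence, sensitivity, specificity):
    """Expected counts for the 2x2 table of test result vs. disease status."""
    sick = population * prevalence
    healthy = population - sick
    return {
        "TP": sick * sensitivity,           # has disease, tests positive
        "FN": sick * (1 - sensitivity),     # has disease, tests negative
        "FP": healthy * (1 - specificity),  # no disease, tests positive
        "TN": healthy * specificity,        # no disease, tests negative
    }

t = contingency_counts(10_000, 0.01, 0.99, 0.95)
positives = t["TP"] + t["FP"]             # restrict to the "Test Positive" row
print(round(positives), round(t["TP"] / positives, 3))  # 594 0.167
```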

Example 3: The Monty Hall Problem

Worked Example 3 — Classic Paradox

You are on a game show. Three doors hide: a car (prize) behind one, goats behind the other two. You pick door 1. The host, who knows what is behind every door, opens door 3 to reveal a goat. He then offers you a switch to door 2. Should you switch?

The host's action is not random — it is conditional on both where the car is and which door you picked. That is what makes this a conditional probability problem.

  1. Initial probabilities. P(car behind door 1) = P(car behind door 2) = P(car behind door 3) = 1/3.
  2. The host opens door 3 (a goat). The host's action is conditional on the car's location. If the car is behind door 2, the host must open door 3. If the car is behind door 1, the host could open door 2 or door 3 (assume equally likely, probability 1/2 each).
  3. Compute the likelihoods and P(H₃). Let H₃ = "host opens door 3." P(H₃ | car at door 1) = 1/2. P(H₃ | car at door 2) = 1. P(H₃ | car at door 3) = 0. By the law of total probability:
     P(H₃) = (1/3)(1/2) + (1/3)(1) + (1/3)(0) = 1/6 + 1/3 = 1/2.
  4. Update probabilities via Bayes' theorem.
     P(car at door 1 | H₃) = P(H₃ | door 1) × P(door 1) / P(H₃) = (1/2)(1/3) / (1/2) = 1/3.
     P(car at door 2 | H₃) = P(H₃ | door 2) × P(door 2) / P(H₃) = (1)(1/3) / (1/2) = 2/3.
     P(car at door 3 | H₃) = 0, since the host revealed a goat there.

✓ Answer: YES — always switch. Staying with door 1 gives a 1/3 chance of winning. Switching to door 2 gives a 2/3 chance. Switching doubles your winning probability.

Why it feels wrong: Most people intuit "two doors left, 50-50 chance." That would be true if the host opened one of the two unpicked doors at random and it merely happened to show a goat. But the host's choice is driven by knowledge of where the car is. Door 1 keeps its original 1/3, and the 1/3 that belonged to door 3 moves entirely onto door 2. The host's behavior is information, and conditional probability captures that information precisely.
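If the Bayes calculation still feels suspicious, a simulation is a useful sanity check. This sketch assumes the standard rules stated in Example 3: the host always opens an unpicked door hiding a goat and chooses uniformly at random when two such doors are available. The function name, seed, and trial count are arbitrary.

```python
import random

def monty_hall(switch, trials=100_000, seed=1):
    """Return the fraction of games won when always staying or always switching."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(trials):
        car = rng.randrange(3)    # door hiding the car (0, 1, 2)
        pick = rng.randrange(3)   # contestant's first choice
        # The host knows where the car is: he opens an unpicked goat door,
        # choosing uniformly when two goat doors are available.
        opened = rng.choice([d for d in range(3) if d != pick and d != car])
        if switch:
            pick = next(d for d in range(3) if d != pick and d != opened)
        wins += (pick == car)
    return wins / trials

print("stay:  ", monty_hall(switch=False))  # close to 1/3
print("switch:", monty_hall(switch=True))   # close to 2/3
```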

Bayes' Theorem: Formula, Derivation & Application

The Bayes' Theorem Formula

Bayes' theorem is named for the Reverend Thomas Bayes, whose essay on the problem was published posthumously in 1763; Pierre-Simon Laplace later developed and generalized the result. It answers a specific question: "I know P(B|A). How do I find P(A|B)?" It links a conditional probability to its reverse.

Bayes' Theorem
P(A|B) = [ P(B|A) × P(A) ] / P(B)
where P(B) = P(B|A)·P(A) + P(B|A')·P(A')
P(A|B) — posterior probability of A given B (what you want to find)
P(B|A) — likelihood: probability of observing B if A is true
P(A) — prior probability of A (before seeing evidence B)
P(B) — marginal probability of B (normalizing constant)

Where Bayes' Theorem Comes From

Bayes' theorem is not a separate law — it is a direct consequence of the conditional probability formula applied twice. The derivation takes three lines:

  1. P(A∩B) = P(A|B) × P(B)    [definition of conditional probability]
  2. P(A∩B) = P(B|A) × P(A)    [same formula, reversed: A and B are symmetric in the intersection]
  3. ∴ P(A|B) × P(B) = P(B|A) × P(A) → P(A|B) = P(B|A)·P(A) / P(B)

The denominator P(B) is computed using the law of total probability: if A and A' (not-A) partition the sample space, then P(B) = P(B|A)·P(A) + P(B|A')·P(A'). This is the sum of all weighted paths through the tree diagram that result in B.
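For a partition with more than two pieces, the denominator becomes a sum over every branch that leads to B. This minimal sketch (the helper name `bayes_partition` is hypothetical) computes each weighted path, sums them for P(B), and normalizes; `Fraction` keeps the arithmetic exact. The usage line reuses the Monty Hall likelihoods from Example 3, so the two-hypothesis form (A, A') is just the case n = 2.

```python
from fractions import Fraction

def bayes_partition(priors, likelihoods):
    """Posteriors P(A_i | B) for a partition A_1..A_n.

    priors[i]      = P(A_i)      (should sum to 1)
    likelihoods[i] = P(B | A_i)
    Returns (P(B), [P(A_1 | B), ..., P(A_n | B)]).
    """
    joint = [p * l for p, l in zip(priors, likelihoods)]  # P(B ∩ A_i), one per path
    p_b = sum(joint)                                      # law of total probability
    if p_b == 0:
        raise ValueError("P(B) = 0: cannot condition on an impossible event")
    return p_b, [j / p_b for j in joint]

# Monty Hall: A_i = "car behind door i", B = "host opens door 3".
third = Fraction(1, 3)
p_b, post = bayes_partition([third] * 3, [Fraction(1, 2), Fraction(1), Fraction(0)])
print(p_b, [str(x) for x in post])  # 1/2 ['1/3', '2/3', '0']
```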

Bayes' Theorem in Plain English: Updating Beliefs

The most powerful way to read Bayes' theorem is as a belief-updating machine. You start with a prior belief P(A) — your estimate of some hypothesis A before seeing any evidence. You then observe evidence B. The likelihood P(B|A) tells you how probable that evidence would be if A were true. Bayes' theorem combines these to give your posterior belief P(A|B) — your updated, rational estimate of A given the evidence.

The Bayesian Thinking Framework

Posterior ∝ Likelihood × Prior. In words: your updated belief is proportional to how likely the evidence is under your hypothesis, multiplied by how plausible the hypothesis was before you saw the evidence. The proportionality constant is 1/P(B), which is why the likelihood under the alternatives matters too: evidence shifts belief toward A only to the extent that it is more probable under A than under the competing hypotheses. A low prior probability (rare disease, rare event) can overwhelm even a high likelihood, which is exactly the false-positive paradox in Example 2.
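Belief updating can be chained: the posterior from one piece of evidence becomes the prior for the next. The sketch below applies the Example 2 test twice. It assumes, hypothetically, a second positive result from a test with the same sensitivity and false positive rate, and that the two results are conditionally independent given disease status; real repeat tests often violate that assumption.

```python
def update(prior, p_e_given_h, p_e_given_not_h):
    """One Bayesian update: returns P(H | E) from P(H), P(E|H), P(E|not H)."""
    numerator = p_e_given_h * prior
    return numerator / (numerator + p_e_given_not_h * (1 - prior))

belief = 0.01  # prior P(disease): the 1% base rate
# Hypothetical: two positive results, assumed conditionally independent
# given disease status, from a test with P(+|D) = 0.99 and P(+|D') = 0.05.
for n in (1, 2):
    belief = update(belief, 0.99, 0.05)   # the last posterior is the next prior
    print(f"after positive result #{n}: {belief:.3f}")
```

Under those assumptions, the belief rises from about 0.167 after one positive result to about 0.798 after two.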

Tree Diagrams for Conditional Probability

A probability tree diagram is a branching structure where every fork represents a conditional event. The probability along each branch is a conditional probability given the path taken to reach that fork. The probability of any terminal outcome (leaf) equals the product of all branch probabilities along that path — a direct application of the multiplication rule.

Tree Diagram: Medical Test Example (Population of 10,000)

Population: 10,000 people
├── Disease, P(D) = 0.01: 100 people
│   ├── P(+|D) = 0.99 → True Positive: 99, P = 0.01 × 0.99 = 0.0099
│   └── P(−|D) = 0.01 → False Negative: 1, P = 0.01 × 0.01 = 0.0001
└── No Disease, P(D') = 0.99: 9,900 people
    ├── P(+|D') = 0.05 → False Positive: 495, P = 0.99 × 0.05 = 0.0495
    └── P(−|D') = 0.95 → True Negative: 9,405, P = 0.99 × 0.95 = 0.9405
Leaf probability = product of all branch probabilities along its path

Reading the tree: multiply probabilities along each path. The sum of all four leaf probabilities = 0.0099 + 0.0001 + 0.0495 + 0.9405 = 1.0 ✓

📐
How to Build a Probability Tree Diagram

Step 1: Draw the first event's branches from a single starting node. Label each branch with its probability.
Step 2: From each branch endpoint, draw the next conditional event's branches, labeling each with the conditional probability given the path to that point.
Step 3: To find the probability of any complete path (leaf), multiply all branch probabilities along the path.
Step 4: Verify that all leaf probabilities sum to 1.0.
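The four steps translate directly into code for a two-level tree. This sketch hard-codes the medical-test branches from the diagram above (dictionary keys such as `"D'"` are just labels); each leaf is the product of the branch probabilities on its path, and the final check performs Step 4.

```python
# Two-level probability tree for the medical test (values from the diagram).
first_fork = {"D": 0.01, "D'": 0.99}        # Step 1: P(D), P(D')
second_fork = {                             # Step 2: conditional branches
    "D":  {"+": 0.99, "-": 0.01},           # P(+|D),  P(-|D)
    "D'": {"+": 0.05, "-": 0.95},           # P(+|D'), P(-|D')
}

leaves = {}
for status, p_status in first_fork.items():
    for result, p_result in second_fork[status].items():
        # Step 3: a leaf's probability is the product of the branches on its path.
        leaves[(status, result)] = p_status * p_result

for (status, result), p in leaves.items():
    print(f"{status:>2} {result}: {p:.4f}")

# Step 4: the leaves partition the sample space, so they must sum to 1.
assert abs(sum(leaves.values()) - 1.0) < 1e-12
print("sum of leaves:", round(sum(leaves.values()), 10))
```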

P(A|B) vs. P(B|A): The Prosecutor's Fallacy

One of the most consequential errors in applied conditional probability is treating P(A|B) and P(B|A) as interchangeable. They are emphatically not. This error is so common — and so dangerous in legal and medical reasoning — that it has its own name: the Prosecutor's Fallacy.

🚨
The Prosecutor's Fallacy — A Critical Distinction

The error: concluding that P(innocent | evidence) is small because P(evidence | innocent) is small. These are not the same. P(matching DNA | innocent) might be 1 in 1,000,000 — but P(innocent | matching DNA) depends on how many people share that DNA profile, the prior probability of guilt, and all other evidence. Confusing the two has contributed to wrongful convictions. Correct analysis always requires Bayes' theorem.
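A toy calculation shows how far apart the two quantities can be. The model here is entirely hypothetical: exactly one of `n_possible_sources` people is the true source, each is equally likely a priori, the true source always matches, an innocent person matches with probability 1 in 1,000,000, and all other evidence is ignored.

```python
def p_innocent_given_match(n_possible_sources, p_match_if_innocent):
    """Hypothetical toy model of P(innocent | DNA match).

    Assumes exactly one of n_possible_sources is the true source, every
    candidate is equally likely a priori, the true source always matches,
    and all other evidence is ignored.
    """
    prior_guilty = 1 / n_possible_sources
    prior_innocent = 1 - prior_guilty
    # Law of total probability: P(match) over the guilty and innocent branches.
    p_match = 1.0 * prior_guilty + p_match_if_innocent * prior_innocent
    # Bayes' theorem: reverse P(match | innocent) into P(innocent | match).
    return p_match_if_innocent * prior_innocent / p_match

for n in (1_000, 1_000_000, 10_000_000):
    print(f"{n:>10,} possible sources -> P(innocent | match) = "
          f"{p_innocent_given_match(n, 1e-6):.3f}")
```

With about a million plausible sources, the probability that a matching person is innocent comes out near 0.5, not 1 in a million; with ten million, it is about 0.91.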

P(A|B) versus P(B|A), side by side:
  • What it means: P(A|B) is the probability of A, given B has occurred; P(B|A) is the probability of B, given A has occurred.
  • Medical test: P(disease | positive test) = 16.7% (from Example 2), versus P(positive test | disease) = 99% (sensitivity).
  • Legal context: P(guilty | DNA match) is what the court cares about; P(DNA match | guilty) is a likelihood, which says nothing about guilt until it goes through Bayes' theorem.
  • Weather: P(rain | dark clouds) is the updated forecast; P(dark clouds | rain) is how often rain comes with clouds.
  • Relationship: P(A|B) = P(B|A) × P(A) / P(B) (Bayes' theorem).
  • Equal when? Only when P(A) = P(B), i.e., the two events are equally probable, or in the trivial case P(A∩B) = 0, where both conditional probabilities are 0.

Interactive Conditional Probability Calculator

Conditional Probability Calculator

Calculate P(A|B) when you know the joint probability P(A∩B) and the condition probability P(B).

Apply Bayes' Theorem: P(A|B) = P(B|A) × P(A) / P(B). Enter the prior, likelihood, and marginal probability.

Calculate the joint probability P(A∩B) for dependent events using P(A∩B) = P(A) × P(B|A).
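The three calculator modes each reduce to a one-line formula plus input checks. A minimal Python sketch follows; the function names are hypothetical and are not taken from the calculator itself. The guards reject P(B) = 0 and out-of-range or inconsistent inputs, where the formulas are undefined or meaningless.

```python
def _check_probabilities(*values):
    """Reject inputs outside [0, 1]."""
    for v in values:
        if not 0 <= v <= 1:
            raise ValueError(f"probability out of range: {v}")

def conditional_from_joint(p_a_and_b, p_b):
    """Mode 1: P(A|B) = P(A∩B) / P(B), defined only when P(B) > 0."""
    _check_probabilities(p_a_and_b, p_b)
    if p_b == 0:
        raise ValueError("P(B) must be > 0")
    if p_a_and_b > p_b:
        raise ValueError("P(A∩B) cannot exceed P(B)")
    return p_a_and_b / p_b

def bayes(p_b_given_a, p_a, p_b):
    """Mode 2: P(A|B) = P(B|A) · P(A) / P(B)."""
    _check_probabilities(p_b_given_a, p_a, p_b)
    if p_b == 0:
        raise ValueError("P(B) must be > 0")
    posterior = p_b_given_a * p_a / p_b
    if posterior > 1 + 1e-12:  # small tolerance for floating-point rounding
        raise ValueError("inconsistent inputs: P(B) is smaller than P(B|A)·P(A)")
    return posterior

def joint_dependent(p_a, p_b_given_a):
    """Mode 3: P(A∩B) = P(A) · P(B|A)."""
    _check_probabilities(p_a, p_b_given_a)
    return p_a * p_b_given_a

print(round(conditional_from_joint(0.32, 0.40), 4))  # studied/passed example: 0.8
print(round(bayes(0.99, 0.01, 0.0594), 4))           # Example 2: 0.1667
print(round(joint_dependent(4 / 52, 3 / 51), 5))     # Example 1: 0.00452
```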

Conditional Probability & Bayes' Theorem Cheat Sheet

The list below covers every formula you need for conditional probability. It is designed to be printable and copy-pasteable, with each formula written both in standard notation and in plain English.
  • Conditional Probability: P(A|B) = P(A∩B) / P(B). Plain English: the probability of A given B equals the joint probability of A and B divided by the probability of B. When to use: any time you know a condition has occurred and want to update a probability.
  • Multiplication Rule (Dependent): P(A∩B) = P(A) × P(B|A). Plain English: the joint probability of A and B equals the probability of A times the probability of B given A. When to use: drawing without replacement; sequential dependent trials.
  • Multiplication Rule (Independent): P(A∩B) = P(A) × P(B). Plain English: when A and B don't affect each other, multiply their individual probabilities. When to use: coin flips, dice rolls, drawing with replacement.
  • Independence Test: P(A|B) = P(A). Plain English: if knowing B occurred doesn't change P(A), the events are independent. When to use: verifying whether two events influence each other.
  • Bayes' Theorem: P(A|B) = P(B|A)·P(A) / P(B). Plain English: posterior = likelihood × prior, divided by the marginal probability of the evidence. When to use: reversing a known conditional probability; updating beliefs with evidence.
  • Law of Total Probability: P(B) = P(B|A)·P(A) + P(B|A')·P(A'). Plain English: the total probability of B is the sum of the probabilities of reaching B along each path (e.g., with disease and without). When to use: computing the denominator in Bayes' theorem when P(B) is not directly known.
  • Complement Rule: P(A'|B) = 1 − P(A|B). Plain English: the probability of A not occurring given B equals 1 minus the probability of A given B. When to use: finding the probability of a non-event within a conditional context.

Independent vs. Dependent Events

How to Test for Independence

Two events A and B with P(B) > 0 are independent if and only if P(A|B) = P(A). Equivalently, independence holds when P(A ∩ B) = P(A) × P(B), which is the standard definition and also covers events with probability zero. When this product rule holds exactly, knowing that B occurred provides zero information about A: conditioning on B leaves A's probability unchanged.

Events are dependent when P(A|B) ≠ P(A). Drawing cards without replacement is the canonical example: after drawing an ace, the probability of drawing another ace changes because the sample space has changed physically.
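The product-rule test can be run exactly on a small sample space. This sketch enumerates all 36 outcomes of two fair dice (an illustrative choice, not from the text) and compares P(A∩B) with P(A)·P(B) using exact fractions. With estimated frequencies from real data, you would need a tolerance or a statistical test instead of exact equality.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely outcomes (die1, die2) of rolling two fair dice.
S = list(product(range(1, 7), repeat=2))

def P(event):
    """Probability of an event given as a predicate over outcomes."""
    return Fraction(sum(1 for o in S if event(o)), len(S))

def independent(a, b):
    """Product-rule test: A and B are independent iff P(A∩B) == P(A)·P(B)."""
    return P(lambda o: a(o) and b(o)) == P(a) * P(b)

def first_is_6(o):
    return o[0] == 6

def sum_is_7(o):
    return o[0] + o[1] == 7

def sum_is_8(o):
    return o[0] + o[1] == 8

print(independent(first_is_6, sum_is_7))  # True:  1/36 == (1/6)(1/6)
print(independent(first_is_6, sum_is_8))  # False: 1/36 != (1/6)(5/36)
```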

Independent vs. dependent events, feature by feature:
  • Definition. Independent: P(A|B) = P(A), so knowing B gives no information about A. Dependent: P(A|B) ≠ P(A), so knowing B changes the probability of A.
  • Joint probability. Independent: P(A∩B) = P(A) × P(B). Dependent: P(A∩B) = P(A) × P(B|A).
  • Typical examples. Independent: coin flips, dice rolls, drawing with replacement. Dependent: drawing without replacement, disease and test result, dark clouds and rain.
  • Physical reason. Independent: outcomes do not share a physical mechanism. Dependent: the first outcome changes the sample space or state for subsequent outcomes.
  • Test formula. If P(A∩B) = P(A)·P(B), the events are independent; if P(A∩B) ≠ P(A)·P(B), they are dependent.

Entity & Formula Glossary

The list below defines every key term in conditional probability using both formal notation and plain English.
  • Conditional Probability, P(A|B): The probability that event A occurs given that event B has already occurred. Read: "probability of A given B." The vertical bar | means "given that." Example: P(Ace on 2nd draw | Ace on 1st) = 3/51 ≈ 0.059.
  • Joint Probability, P(A∩B): The probability that both events A and B occur. The intersection symbol ∩ means "and both occur." Example: P(two aces without replacement) = 4/52 × 3/51 = 1/221.
  • Prior Probability, P(A): In Bayes' theorem, your belief about the probability of hypothesis A before observing any new evidence. Example: P(disease) = 0.01 (1% base rate in the population).
  • Posterior Probability, P(A|B): In Bayes' theorem, your updated belief about A after incorporating evidence B. Posterior = (likelihood × prior) / evidence. Example: P(disease | positive test) ≈ 0.167 (updated after seeing the evidence).
  • Likelihood, P(B|A): The probability of observing evidence B if hypothesis A is true. In Bayes' theorem it multiplies the prior in the numerator. Example: P(positive test | disease) = 0.99 (test sensitivity).
  • Marginal Probability, P(B): The total probability of evidence B across all hypotheses, computed via the law of total probability. It normalizes the Bayes' theorem calculation. Example: P(positive test) = 0.0594 (from Example 2).
  • Intersection symbol, ∩: The "and" operator in set notation. A ∩ B is the set of outcomes where both A and B occur; P(A∩B) is the joint probability. Example: P(A∩B) = P(A|B) × P(B).
  • Given symbol, |: The conditional operator, read "given that." In P(A|B), the event to the right of | is the known condition; the event to the left is what we estimate. Example: P(rain | clouds) = "probability of rain given clouds are present."
  • Law of Total Probability, P(B) = Σ P(B|Aᵢ)·P(Aᵢ): The total probability of event B equals the sum, over each partition event Aᵢ, of the conditional probability of B given Aᵢ times the probability of Aᵢ. Example: P(positive) = P(+|D)·P(D) + P(+|D')·P(D') = 0.0594.
  • Sensitivity, P(positive | disease): In medical testing, the probability a test correctly identifies a true case. Also called the true positive rate. Example: sensitivity = 0.99 in Example 2.
  • False Positive Rate, P(positive | no disease): The probability a test incorrectly flags a healthy person as positive. Equal to 1 − specificity. Example: false positive rate = 0.05 in Example 2.

Common Mistakes & How to Avoid Them

  • Reversing the condition (Prosecutor's Fallacy). Wrong: P(evidence | innocent) is tiny, therefore P(innocent | evidence) is tiny. Correct: use Bayes' theorem, P(innocent | evidence) = P(evidence | innocent) × P(innocent) / P(evidence).
  • Ignoring the base rate. Wrong: the test is 99% accurate, so a positive test means a 99% chance of disease. Correct: incorporate the prior P(disease); a low base rate (1%) can dominate even a highly accurate test.
  • Multiplying unconditional probabilities for dependent events. Wrong: P(two aces) = (4/52) × (4/52) when drawing without replacement. Correct: use P(A∩B) = P(A) × P(B|A) = (4/52) × (3/51); after one ace is drawn, only 3 remain in 51 cards.
  • Dividing by zero. Wrong: computing P(A|B) when B is impossible (P(B) = 0). Correct: P(A|B) is undefined when P(B) = 0; you cannot condition on an event that cannot happen.
  • Confusing conditional with joint probability. Wrong: P(A|B) = P(A∩B), without dividing by P(B). Correct: P(A|B) = P(A∩B) / P(B); the joint probability must be rescaled by the condition's probability.

Next Topics After Conditional Probability

Conditional probability is the gateway to several advanced areas of statistics and machine learning. The next logical topics from Statistics Fundamentals are:

  • Random Variables (next topic): Conditional probability extends naturally to conditional expectation E[X|Y] and conditional distributions, the foundation of regression analysis.
  • Naive Bayes Classifier (application): A workhorse machine learning algorithm for text classification. It applies Bayes' theorem under the "naive" assumption that features are conditionally independent given the class.
  • Binomial Distribution (next topic): Models the number of successes in n independent Bernoulli trials, each with the same success probability. Its PMF is built from the multiplication rule for independent events together with a count of the possible arrangements of successes.
  • Hypothesis Testing (advanced): P-values are conditional probabilities: P(data at least as extreme as observed | H₀ is true). Understanding conditional probability is essential for interpreting statistical tests.

Academic Sources & Further Reading

The definitions, formulas, and examples in this guide follow the standard treatments of conditional probability and Bayes' theorem found in the references below.

MIT OpenCourseWare — Foundational Reference

MIT 18.05 Introduction to Probability and Statistics, Spring 2022. Covers conditional probability (Unit 3), Bayes' theorem, and the law of total probability with worked examples at the exact level of this guide. ocw.mit.edu/courses/18-05-introduction-to-probability-and-statistics-spring-2022/

Harvard Statistics

Blitzstein, J. K., & Hwang, J. (2019). Introduction to Probability (2nd ed.). CRC Press. The textbook behind Harvard's STAT 110 course, one of the most comprehensive treatments of conditional probability and Bayes' theorem available. Free PDF edition accessible via the authors' course page at Harvard Statistics. projects.iq.harvard.edu/stat110

Khan Academy — Supplementary Learning

Khan Academy's Probability & Statistics course covers conditional probability, the multiplication rule, and Bayes' theorem with interactive exercises. Recommended as a companion practice resource for students at the high school and early college level. khanacademy.org/math/statistics-probability

Stanford Encyclopedia of Philosophy

Talbott, W. (2022). Bayesian Epistemology. Stanford Encyclopedia of Philosophy. Provides the philosophical and historical foundations of Bayesian reasoning, prior and posterior probability, and the role of Bayes' theorem in rational belief updating. plato.stanford.edu/entries/epistemology-bayesian/

National Institute of Standards and Technology (NIST)

NIST/SEMATECH e-Handbook of Statistical Methods, Chapter 1: Exploratory Data Analysis — Probability concepts including conditional probability and Bayes' theorem. A U.S. government reference used in engineering and applied statistics contexts. itl.nist.gov/div898/handbook/

Frequently Asked Questions

What is conditional probability in simple terms?

Conditional probability is the updated probability of an event once you have new information. It answers: "given that I already know B happened, what is the probability A also happens (or will happen)?" The formula P(A|B) = P(A∩B)/P(B) formalizes the sample space reduction — you throw out all outcomes where B didn't occur and recalculate A's probability within the remaining outcomes.

Why does P(A|B) not equal P(B|A)?

Because the conditions differ. Both share the numerator P(A∩B), but P(A|B) divides it by P(B), while P(B|A) divides it by P(A). Unless P(A) = P(B) (or A and B never occur together, making both zero), the two expressions give different values. Example: P(test positive | disease) = 0.99, but P(disease | test positive) ≈ 0.167. The quantities are related by Bayes' theorem: P(A|B) = P(B|A) × P(A) / P(B).

What happens to conditional probability when events are independent?

When A and B are independent, P(A|B) = P(A). Knowing B occurred provides no new information about A, so the condition has no effect. Mathematically: P(A∩B) = P(A)·P(B) for independent events, and substituting into P(A|B) = P(A∩B)/P(B) gives P(A|B) = P(A)·P(B)/P(B) = P(A). This is the formal definition and test for independence.

How is conditional probability used in machine learning?

Conditional probability is foundational to machine learning. The Naive Bayes classifier computes P(class | features) directly using Bayes' theorem. Logistic regression models P(Y=1 | X). Hidden Markov models use chains of conditional probabilities such as P(state | previous state). In deep learning, autoregressive language models predict P(next word | all previous words). Many generative models, including these language models, are learned conditional probability distributions.

What is the law of total probability and when is it used?

The law of total probability states that if events A₁, A₂, …, Aₙ partition the sample space (they are mutually exclusive and collectively exhaustive), then P(B) = P(B|A₁)·P(A₁) + P(B|A₂)·P(A₂) + … + P(B|Aₙ)·P(Aₙ). It is used primarily to compute the marginal probability P(B) that serves as the denominator in Bayes' theorem, whenever P(B) is not directly known but the conditional probabilities given each partition are.