What Is a Probability Tree Diagram?
Think of a probability tree as a highway map where the road keeps splitting. At each fork, the road divides into lanes — one lane per possible outcome — and the probability label on each lane tells you what fraction of traffic takes that route. By the time you reach the final destinations on the far right, every traveler has been sorted into exactly one outcome, and the numbers at each final stop tell you how likely it is that a randomly chosen traveler ended up there.
The Harvard Statistics Department's introductory curriculum describes tree diagrams as the clearest entry point into conditional probability precisely because they make the dependency structure visible: every branch is literally drawn from the outcome that precedes it. Blitzstein & Hwang (2019), whose Introduction to Probability is widely used in university statistics courses, present tree diagrams as the foundational tool for organizing multi-event sample spaces before moving to more abstract notation.
Anatomy of a Probability Tree — The Five Components
Before drawing any numbers, you need to know what every piece of a probability tree is called and what job it does. Here are the five structural components as defined in standard combinatorics and probability references, including Khan Academy's AP Statistics probability module and Wolfram MathWorld.
📐 Fully Labeled Anatomy: Two-Stage Probability Tree
Root Node
The single dot at the far left. It represents the situation before any event has occurred — the full, unfiltered sample space. Every path originates here.
Branches
Lines extending from any node — one for each possible outcome at that stage. All branches from one node together represent 100% of possibilities from that point.
Conditional Branch Probability
The number written on each branch. For second-stage branches, this is a conditional probability: the probability of that outcome given the path taken to reach this node.
Intermediate Nodes
Points where the tree forks for the next stage. Each intermediate node represents one specific first-stage outcome, and spawns its own set of second-stage branches.
Final Joint Outcomes
The probabilities listed at the far right end of each complete path. Calculated by multiplying all branch probabilities along that path. All final outcomes must sum to 1.0.
The Two Golden Rules of Probability Trees
Every calculation you will ever do on a probability tree comes down to two rules: MULTIPLY across, ADD down. Once both are in your head, no tree problem can stop you.
| Direction | Action | Rule Applied | What It Finds | Formula |
|---|---|---|---|---|
| ↔ Across (along a path) | MULTIPLY | Multiplication Rule for joint events | Probability that event A AND event B both occur in sequence | P(A∩B) = P(A) × P(B|A) |
| ↕ Down (combining paths) | ADD | Addition Rule for mutually exclusive outcomes | Probability that any one of several distinct paths leads to the same event | P(B) = P(A∩B) + P(A'∩B) |
To find the probability of path A AND then path B occurring, multiply the probability of branch A by the conditional probability of branch B given A. Read it as: "P of A intersect B equals P of A times P of B given A."
- P(A) = probability of the first branch
- P(B|A) = conditional probability of the second branch given the first
- P(A∩B) = joint probability of the full path
To find the overall probability of event B regardless of which first-stage path was taken, add the joint probabilities of all paths that end at B. Because those paths are mutually exclusive routes to the same destination, addition is correct.
- P(A∩B) = path through A, then B
- P(A'∩B) = path through not-A, then B
- P(B) = total probability of B via any route
Think of the tree as an "and/or roadmap." Walking along a path means "this event AND then that event": you are committing to a specific route, so you multiply the traffic fractions at each junction. Asking whether B happened by any route means "this path OR that path": you are combining separate routes, so you add their traffic shares together. This mental model works at every level of complexity.
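The two rules can be sketched in a few lines of Python. The branch values below (3/10, 1/2, 1/5) are made up purely for illustration and do not come from any example in this guide; `Fraction` keeps the arithmetic exact.

```python
from fractions import Fraction

# Hypothetical branch labels, chosen only to illustrate the two rules.
p_a = Fraction(3, 10)             # P(A): first-stage branch
p_b_given_a = Fraction(1, 2)      # P(B|A): second-stage branch after A
p_b_given_not_a = Fraction(1, 5)  # P(B|A'): second-stage branch after not-A

# MULTIPLY across: joint probability of each complete path that ends in B.
p_a_and_b = p_a * p_b_given_a                # 3/10 * 1/2 = 3/20
p_not_a_and_b = (1 - p_a) * p_b_given_not_a  # 7/10 * 1/5 = 7/50

# ADD down: total probability of B via either route.
p_b = p_a_and_b + p_not_a_and_b              # 3/20 + 7/50 = 29/100

print(p_a_and_b, p_not_a_and_b, p_b)
```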
How to Draw a Probability Tree: 6 Steps
These six steps work for any two-stage probability tree. The process extends naturally to three or more stages by repeating steps 2 through 4 for each additional event.
Six Steps to Build Any Probability Tree Diagram
1. Draw the root node. Place a dot on the left side of your paper or screen. Label it "Start" or just leave it as a dot. This is the moment before any event has happened.
2. Draw first-stage branches. From the root, draw one line for each possible outcome of the first event. Label each line with its probability. The labels on all branches from a single node must add to exactly 1. Place a node (dot) at the tip of each branch.
3. Draw second-stage branches. From each first-stage node, draw a new set of branches, one per possible outcome of the second event. Label each with its conditional probability given the first outcome. Again, all branches from one node must sum to 1. For dependent events, these probabilities will differ between nodes.
4. Calculate joint outcome probabilities. Trace each complete path from left to right. Multiply every branch probability along that path together. Write the result at the far-right end of the path. This is the joint probability P(A ∩ B) for that sequence.
5. List and label all final outcomes. Write a clear description of each final outcome next to its joint probability (e.g., "H, H = 1/4"). You should have one entry per path; for a two-stage tree with two outcomes at each stage, that is four entries.
6. Apply the Sum of 1.0 Rule to verify. Add all final outcome probabilities. The total must equal exactly 1.0. If it does not, re-check your branch probabilities at each node and your multiplication along each path. A total ≠ 1 always points to an error.
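The six steps above can be sketched as a small Python function. The name `build_two_stage_tree` and the dictionary layout are illustrative choices, not part of any standard library. The exact `!= 1` checks assume the branch labels are `Fraction` values; with floats a tolerance would be needed instead.

```python
from fractions import Fraction

def build_two_stage_tree(first_stage, second_stage):
    """Return {(a, b): joint probability} for a two-stage tree.

    first_stage:  {outcome_a: P(a)}
    second_stage: {outcome_a: {outcome_b: P(b | a)}}
    """
    # Steps 2-3: the branches leaving every node must sum to 1.
    if sum(first_stage.values()) != 1:
        raise ValueError("first-stage branches do not sum to 1")
    for a, branches in second_stage.items():
        if sum(branches.values()) != 1:
            raise ValueError(f"branches after {a!r} do not sum to 1")

    # Step 4: multiply along each complete path.
    joint = {
        (a, b): p_a * p_b_given_a
        for a, p_a in first_stage.items()
        for b, p_b_given_a in second_stage[a].items()
    }

    # Step 6: the final column must also sum to 1.
    if sum(joint.values()) != 1:
        raise ValueError("joint probabilities do not sum to 1")
    return joint

half = Fraction(1, 2)
coin_tree = build_two_stage_tree(
    {"H": half, "T": half},
    {"H": {"H": half, "T": half}, "T": {"H": half, "T": half}},
)
for path, p in coin_tree.items():  # Step 5: list each final outcome
    print(path, p)
```

Raising an error as soon as a node fails the check mirrors the advice to verify at every fork rather than only at the end.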
Worked Example 1: Independent Events (Coin Flipped Twice)
Two coin flips are the clearest illustration of independent events. Because the coin has no memory, the second flip's probability is completely unaffected by the first result. The branch probabilities on the second stage are identical no matter which first-stage branch you came from.
🪙 Probability Tree: Flipping a Fair Coin Twice (Independent Events)
Fair Coin Flipped Twice: Full Calculations
Because these are independent events, P(H on flip 2) = 1/2 regardless of what happened on flip 1. The second-stage branch probabilities are identical from both intermediate nodes.
First stage (Flip 1): Two branches leave the root. P(Heads) = 1/2 and P(Tails) = 1/2. Check: 1/2 + 1/2 = 1.0 ✓
Second stage (Flip 2): From each node, two identical branches: P(H|prior) = 1/2, P(T|prior) = 1/2. Independence means these fractions never change. Check each set: 1/2 + 1/2 = 1.0 ✓
Multiply along each path: P(H,H) = 1/2 × 1/2 = 1/4. P(H,T) = 1/2 × 1/2 = 1/4. P(T,H) = 1/2 × 1/2 = 1/4. P(T,T) = 1/2 × 1/2 = 1/4.
Sum of 1.0 check: 1/4 + 1/4 + 1/4 + 1/4 = 4/4 = 1.0 ✓
Now use the ADD rule for a follow-up question: P(exactly one head) = P(H,T) + P(T,H) = 1/4 + 1/4 = 2/4 = 1/2. Two distinct paths both lead to "exactly one head," so we add their joint probabilities.
- The second-stage branch probabilities are identical from every intermediate node.
- The formula for the joint path is P(A ∩ B) = P(A) × P(B), because P(B|A) = P(B) when independent.
- Verification: P(A) × P(B) gives the same result you would get by counting the sample space directly (e.g., 4 equally likely outcomes for two coin flips → each has probability 1/4).
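As a quick check of the last point, the sample space for two flips can be enumerated directly and compared with the tree result. This is a minimal sketch using only the standard library; the variable names are illustrative.

```python
from fractions import Fraction
from itertools import product

# Direct count: every ordered sequence of two flips.
outcomes = list(product("HT", repeat=2))  # HH, HT, TH, TT
p_each = Fraction(1, len(outcomes))       # equally likely, so 1/4 each

# Tree method for comparison: P(A) x P(B) with both branches at 1/2.
p_path = Fraction(1, 2) * Fraction(1, 2)

# ADD rule: count the sequences that contain exactly one head.
one_head = [o for o in outcomes if o.count("H") == 1]
p_exactly_one_head = len(one_head) * p_each

print(p_each == p_path, p_exactly_one_head)  # True 1/2
```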
Worked Example 2: Dependent Events Without Replacement
Drawing items without replacement is the most common source of dependent events on probability trees. Once you remove an item from the pool, the pool shrinks — which changes every probability in the next stage.
When drawing without replacement, the denominator ALWAYS decreases by 1 after the first draw, and the numerator for the type that was drawn decreases by 1 as well. Keeping the denominator at its original value is the single most common error on probability tree problems. After one item is drawn from a pool of n items, the second stage always has n − 1 items available.
Drawing Two Marbles Without Replacement: 4 Red, 6 Blue (10 total)
A bag contains 4 red marbles and 6 blue marbles. You draw one marble, observe its color, and draw a second without returning the first. Find all path probabilities.
First draw (10 marbles in bag): P(Red₁) = 4/10 = 2/5. P(Blue₁) = 6/10 = 3/5. Check: 4/10 + 6/10 = 10/10 = 1.0 ✓
Second draw — path via Red₁ (9 marbles remain, 3 red, 6 blue): P(Red₂ | Red₁) = 3/9 = 1/3. P(Blue₂ | Red₁) = 6/9 = 2/3. Check: 3/9 + 6/9 = 9/9 = 1.0 ✓
Second draw — path via Blue₁ (9 marbles remain, 4 red, 5 blue): P(Red₂ | Blue₁) = 4/9. P(Blue₂ | Blue₁) = 5/9. Check: 4/9 + 5/9 = 9/9 = 1.0 ✓
Multiply along each path:
P(R,R) = 4/10 × 3/9 = 12/90 = 2/15 ≈ 0.1333
P(R,B) = 4/10 × 6/9 = 24/90 = 4/15 ≈ 0.2667
P(B,R) = 6/10 × 4/9 = 24/90 = 4/15 ≈ 0.2667
P(B,B) = 6/10 × 5/9 = 30/90 = 1/3 ≈ 0.3333
Sum of 1.0 check: 12/90 + 24/90 + 24/90 + 30/90 = 90/90 = 1.0 ✓
P(at least one red) = P(R,R) + P(R,B) + P(B,R) = 12/90 + 24/90 + 24/90 = 60/90 = 2/3 ≈ 0.667. Note how the second-stage fractions differ between the two intermediate nodes — this is the hallmark of a dependent event tree.
🔵🔴 Probability Tree: Two Marble Draws Without Replacement (4 Red, 6 Blue)
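One way to see where the denominator of 90 comes from is to list every ordered pair of distinct marbles and count. The sketch below is a brute-force check under the assumption that each marble is equally likely to be drawn; it is a verification aid, not a replacement for the tree.

```python
from collections import Counter
from fractions import Fraction
from itertools import permutations

bag = ["R"] * 4 + ["B"] * 6

# Every ordered pair of two different marbles: 10 x 9 = 90 equally likely draws.
draws = list(permutations(bag, 2))
counts = Counter(draws)  # e.g. counts[("R", "R")] is 4 x 3 = 12

# Joint probabilities by counting, for comparison with the tree.
joint = {path: Fraction(n, len(draws)) for path, n in counts.items()}

# "At least one red" is the complement of "both blue".
p_at_least_one_red = 1 - joint[("B", "B")]

print(len(draws), counts[("R", "R")], p_at_least_one_red)  # 90 12 2/3
```

The complement shortcut gives the same 2/3 as adding the three paths that contain a red marble.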
Worked Example 3: Real-World Medical Screening Tree
Probability trees have direct applications in medicine, data science, and risk analytics. The medical screening scenario, which maps True Positives, False Positives, True Negatives, and False Negatives, is a canonical example in epidemiology curricula, including the CDC's Applied Epidemiology self-study course (SS-1978). It illustrates how overall disease probability (prevalence) and test accuracy interact in non-obvious ways that a tree makes immediately visible.
Real-World Scenario
Medical Screening: Disease Prevalence and Test Accuracy
A disease has a 10% prevalence in a screened population. A diagnostic test has 90% sensitivity (correctly identifies 90% of those with the disease — True Positive Rate) and 85% specificity (correctly identifies 85% of those without the disease — True Negative Rate). Map all four outcomes on a probability tree.
🏥 Medical Screening Probability Tree (Prevalence 10%, Sensitivity 90%, Specificity 85%)
| Outcome | Path | Calculation | Probability | Meaning |
|---|---|---|---|---|
| True Positive (TP) | Disease+ → Test+ | 0.10 × 0.90 | 0.090 (9.0%) | Has disease, correctly identified |
| False Negative (FN) | Disease+ → Test− | 0.10 × 0.10 | 0.010 (1.0%) | Has disease, incorrectly cleared — most dangerous error |
| False Positive (FP) | Disease− → Test+ | 0.90 × 0.15 | 0.135 (13.5%) | No disease, flagged anyway |
| True Negative (TN) | Disease− → Test− | 0.90 × 0.85 | 0.765 (76.5%) | No disease, correctly cleared |
| TOTAL | — | 0.090 + 0.010 + 0.135 + 0.765 | 1.000 ✓ | Sum of 1.0 Rule confirmed |
Across the whole screened population, 13.5% are False Positives and 9.0% are True Positives, so 22.5% of people test positive. Among those who test positive, P(disease | positive test) = 0.090 / (0.090 + 0.135) = 0.40, or 40%, even though the test has 90% sensitivity: false positives outnumber true positives 3 to 2. This result, an application of Bayes' Theorem, surprises most people and is a key reason Bayes' Theorem and probability trees are taught together in medical statistics. See also the treatment in Blitzstein & Hwang (2019, Ch. 2) and the NIH-published review on understanding medical test results.
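The screening tree is easy to recompute for other inputs. In the sketch below, the function names are illustrative and the 1% prevalence case is a hypothetical added only to show how strongly prevalence drives the result; it is not a figure from this guide.

```python
def screening_tree(prevalence, sensitivity, specificity):
    """Joint probabilities for the four paths of a screening tree."""
    return {
        "TP": prevalence * sensitivity,
        "FN": prevalence * (1 - sensitivity),
        "FP": (1 - prevalence) * (1 - specificity),
        "TN": (1 - prevalence) * specificity,
    }

def positive_predictive_value(tree):
    """P(disease | positive test): the TP share of all positive paths."""
    return tree["TP"] / (tree["TP"] + tree["FP"])

article_case = screening_tree(0.10, 0.90, 0.85)
rare_case = screening_tree(0.01, 0.90, 0.85)  # hypothetical 1% prevalence

print(round(positive_predictive_value(article_case), 3))  # 0.4
print(round(positive_predictive_value(rare_case), 3))     # 0.057
```

With the same test, dropping prevalence from 10% to 1% cuts the chance that a positive result is a true positive from 40% to under 6%.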
The Sum of 1.0 Rule: How to Verify Your Tree
The Sum of 1.0 Rule is your built-in error detector. Every correctly drawn probability tree passes two checkpoints automatically. If either fails, there is a calculation error to find.
All branches from one node sum to 1
At every fork in the tree, the branches leaving that node represent all possible outcomes from that point. Together they must cover 100% of the probability from that node. If they do not add to 1, a branch probability is wrong.
All final joint probabilities sum to 1
The final outcome probabilities on the right side of the tree represent the complete sample space. Every possible outcome sequence is accounted for exactly once. Their total must be exactly 1.0. A total below 1 means a path is missing or a probability was written incorrectly; a total above 1 means at least one probability is too large or values were added where they should have been multiplied.
Verification in Formal Probability Theory
The requirement that probabilities across a complete event partition sum to 1 follows directly from Kolmogorov's axioms of probability (1933): the normalization axiom, P(S) = 1, combined with additivity over mutually exclusive events. The UC Berkeley Statistics Department frames this verification as a fundamental habit for any multi-stage probability calculation. Probability rules including the normalization condition are covered in depth in our dedicated guide.
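When the probabilities are decimals rather than fractions, floating-point rounding means an exact comparison with 1.0 can fail even for a correct tree. A minimal sketch of a tolerant check follows; the function name and the default tolerance of 1e-9 are assumptions chosen for illustration.

```python
import math

def passes_sum_rule(probabilities, tol=1e-9):
    """True if every value lies in [0, 1] and the total is 1 within `tol`."""
    values = list(probabilities)
    in_range = all(0.0 <= p <= 1.0 for p in values)
    return in_range and math.isclose(math.fsum(values), 1.0, abs_tol=tol)

screening = [0.090, 0.010, 0.135, 0.765]  # final column of the medical tree
missing_path = [0.090, 0.010, 0.135]      # same tree with one path left out

print(passes_sum_rule(screening), passes_sum_rule(missing_path))  # True False
```

The same function works for either checkpoint: pass it the branches leaving one node, or the full column of final outcomes.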
Independent vs Dependent Events: Side-by-Side Comparison
Recognizing whether events are independent or dependent is the first step in setting up any probability tree correctly. The table below maps the key differences as they appear on a tree diagram.
| Feature | Independent Events | Dependent Events (Without Replacement) |
|---|---|---|
| Second-stage branch probabilities | Identical from all intermediate nodes | Different from each intermediate node |
| Joint probability formula | P(A∩B) = P(A) × P(B) | P(A∩B) = P(A) × P(B|A) |
| Denominator on 2nd stage | Unchanged (sampling with replacement) | Reduced by 1 (item removed) |
| Classic examples | Coin flips, die rolls, cards with replacement | Drawing cards/marbles without replacing them |
| Test for independence | P(B|A) = P(B) — knowing A gives no info about B | P(B|A) ≠ P(B) — knowing A changes P(B) |
| Visual tell on the tree | Parallel second-stage branches are equal | Second-stage branches are clearly different fractions |
| Real-world context | Random sampling with replacement, repeated trials | Lotteries, quality control draws, medical testing |
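The test for independence in the table, P(B|A) = P(B), can be applied mechanically to any two-stage tree. The sketch below assumes the same dictionary layout for branch labels used for illustration elsewhere in this guide's examples; `is_independent` is a hypothetical helper name.

```python
from fractions import Fraction as F

def is_independent(first_stage, second_stage):
    """True if every second-stage label P(b|a) equals the marginal P(b)."""
    # ADD down to get the marginal probability of each second-stage outcome.
    marginal = {}
    for a, p_a in first_stage.items():
        for b, p_b_given_a in second_stage[a].items():
            marginal[b] = marginal.get(b, 0) + p_a * p_b_given_a
    # Independence: knowing the first outcome never changes P(b).
    return all(
        p_b_given_a == marginal[b]
        for branches in second_stage.values()
        for b, p_b_given_a in branches.items()
    )

coin = ({"H": F(1, 2), "T": F(1, 2)},
        {"H": {"H": F(1, 2), "T": F(1, 2)}, "T": {"H": F(1, 2), "T": F(1, 2)}})
marbles = ({"R": F(4, 10), "B": F(6, 10)},
           {"R": {"R": F(3, 9), "B": F(6, 9)}, "B": {"R": F(4, 9), "B": F(5, 9)}})

print(is_independent(*coin), is_independent(*marbles))  # True False
```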
Entity & Formula Glossary — Probability Tree Vocabulary
Every term below appears directly in a probability tree context. This glossary is structured to match the notation used in MIT OpenCourseWare's Statistics for Applications (18.650) and standard AP Statistics curricula.
| Term | Notation | Definition in Context of Probability Trees |
|---|---|---|
| Root Node | S (Start) | The origin point of the tree — represents the moment before any event. All probability flows from here. Total probability at the root is 1.0. |
| Branch | → | A line extending from a node, representing one possible outcome at that stage. Labeled with the probability of that outcome given the path taken to reach this node. |
| Conditional Branch Probability | P(B|A) | "P of B given A." The probability written on a second-stage branch — the likelihood of outcome B, given that outcome A occurred on the previous stage. |
| Joint Outcome Probability | P(A ∩ B) | The probability of event A AND event B both occurring in sequence. Found by multiplying all branch probabilities along the path: P(A ∩ B) = P(A) × P(B|A). Also written P(A and B). |
| Independent Events | P(B|A) = P(B) | Two events are independent if the first outcome does not change the probability of the second. On a tree, the second-stage branch probabilities are identical from all intermediate nodes. P(A ∩ B) = P(A) × P(B). |
| Dependent Events | P(B|A) ≠ P(B) | The first outcome alters the available pool for the second. The second-stage branch probabilities differ between intermediate nodes. Common in without-replacement sampling. P(A ∩ B) = P(A) × P(B|A) and P(B|A) ≠ P(B). |
| Multiplication Rule | P(A∩B) = P(A)×P(B|A) | The rule applied when moving horizontally ACROSS a path in the tree. Multiply consecutive branch probabilities to get the joint probability of that sequence of events. |
| Addition Rule (mutually exclusive paths) | P(B) = ΣP(paths ending in B) | The rule applied when moving vertically DOWN the final outcome column. When event B can be reached by multiple distinct paths, add their joint probabilities to get the total probability of B. |
| Sample Space (Outcome Vector) | S = {all paths} | The complete collection of all possible outcome sequences represented by the tree. For a two-stage tree with two outcomes per stage, |S| = 4. The sum of all final outcome probabilities equals 1.0. |
| Without Replacement | n → n−1 | Sampling where drawn items are not returned to the pool. Denominator decreases by 1 after each draw. Makes events dependent. Second-stage fractions must be recalculated from the reduced pool. |
| Sensitivity (True Positive Rate) | P(T+|D+) | In the medical tree: the probability a test is positive given the patient has the disease. Labels the branch from Disease+ to Test+. Also called the detection rate. |
| Specificity (True Negative Rate) | P(T−|D−) | In the medical tree: the probability a test is negative given no disease. Labels the branch from Disease− to Test−. Equal to 1 minus the false positive rate. |
The 4 Most Common Probability Tree Mistakes
Forgetting to Reduce the Denominator
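After drawing without replacement, the denominator drops by 1. Students often reduce the numerator but keep the original denominator on second-stage branches. The branches at that node then sum to less than 1 (for example, 3/10 + 6/10 = 9/10 instead of 3/9 + 6/9 = 1), and the final total falls short of 1.0. Always recount the pool.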
Adding Across Instead of Multiplying
Some students add the two branch values along a path. This gives the wrong answer and a sum that exceeds 1. Always multiply when moving along a path. Addition is only for combining separate paths.
Using Overall Probabilities as Conditional Probabilities
For dependent events, students sometimes place the overall (marginal) probability of outcome B on all second-stage branches rather than adjusting for the first draw. Each intermediate node gets its own conditional fractions.
Branches at One Node Not Summing to 1
If you write P(H|T) = 0.5 and P(T|T) = 0.6 from a node, those add to 1.1 — impossible. Run the Sum of 1.0 checkpoint at every node as you draw, not only at the end.
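The Sum of 1.0 check catches the first two mistakes numerically. The sketch below reuses the marble numbers (4 red, 6 blue) and compares the final total for a correct tree, a tree whose second-stage denominators were left at 10, and a tree where branch values were added along each path. The helper `final_total` is a hypothetical name.

```python
from fractions import Fraction as F
from operator import add, mul

first = {"R": F(4, 10), "B": F(6, 10)}
correct = {"R": {"R": F(3, 9), "B": F(6, 9)},
           "B": {"R": F(4, 9), "B": F(5, 9)}}
# Mistake: numerators were updated but the denominator stayed at 10.
stale_denominator = {"R": {"R": F(3, 10), "B": F(6, 10)},
                     "B": {"R": F(4, 10), "B": F(5, 10)}}

def final_total(first_stage, second_stage, combine):
    """Sum of the final column, combining branch labels with `combine`."""
    return sum(
        combine(first_stage[a], p)
        for a in first_stage
        for p in second_stage[a].values()
    )

print(final_total(first, correct, mul))            # 1    (correct tree)
print(final_total(first, stale_denominator, mul))  # 9/10 (stale denominator)
print(final_total(first, correct, add))            # 4    (added along paths)
```

Any total other than 1 signals an error; the size and direction of the miss often hints at which mistake was made.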
Frequently Asked Questions About Probability Trees
What is a probability tree diagram?
A probability tree diagram is a visual tool that maps every possible sequence of outcomes for a multi-stage random experiment. Beginning from a root node at the left, it branches right at each event stage — one branch per outcome — labeled with that outcome's probability. Walking one complete path left to right traces one specific outcome combination, and the probability of that path is the product of all its branch labels.
When do you multiply probability tree branches?
Multiply when moving horizontally along a path — that is, when you want the probability that event A occurred AND event B then occurred. The joint probability P(A ∩ B) is always the product of all branch probabilities along that route. This is the Multiplication Rule: P(A ∩ B) = P(A) × P(B|A).
When do you add probability tree outcomes?
Add when combining multiple distinct paths that all lead to the same event — the OR scenario. Because those paths are mutually exclusive routes to the same destination, their probabilities add without overlap. For example: P(exactly one head in two flips) = P(H,T) + P(T,H) = 1/4 + 1/4 = 1/2.
Can a probability tree have three or more stages?
Yes. Add another column of branches for each additional event. A three-stage tree with two outcomes per stage has 2³ = 8 final paths. The rules are identical at every level: multiply across, add down. The number of paths grows exponentially, which is why trees are practical mainly for two or three stages. For more stages, other tools like combinatorial formulas or the binomial distribution are more efficient.
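For trees deeper than two stages, a short recursive walk applies "multiply across" at every level. The nested-dictionary layout below, where each outcome maps to a pair of (branch probability, subtree), is an assumption made for this sketch rather than a standard representation.

```python
from fractions import Fraction

def walk(tree, path=(), prob=Fraction(1)):
    """Yield (path, joint probability) for every leaf of a nested tree."""
    for outcome, (p, subtree) in tree.items():
        if subtree:  # more stages below: keep multiplying
            yield from walk(subtree, path + (outcome,), prob * p)
        else:        # leaf: one complete path
            yield path + (outcome,), prob * p

def coin_tree(depth):
    """Fair-coin tree with `depth` stages (None marks a leaf)."""
    if depth == 0:
        return None
    half = Fraction(1, 2)
    return {"H": (half, coin_tree(depth - 1)), "T": (half, coin_tree(depth - 1))}

paths = dict(walk(coin_tree(3)))
p_two_heads = sum(p for path, p in paths.items() if path.count("H") == 2)

print(len(paths), sum(paths.values()), p_two_heads)  # 8 1 3/8
```

Because each subtree carries its own branch labels, the same walk handles dependent events as well as independent ones.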
How does a probability tree relate to conditional probability?
Every second-stage branch in a probability tree is a conditional probability. The notation P(B|A), read "probability of B given A", is literally what the label on that branch represents. You can also work backwards from the tree to reverse the conditioning: divide the joint probability of a specific path by the total probability of its second-stage outcome (the sum of every path ending in that outcome), which gives P(A|B) = P(A ∩ B) / P(B). This is the structure underlying Bayes' Theorem.
What is the probability tree rule for without replacement sampling?
After drawing one item from a pool of n items without replacement: (1) the total count becomes n − 1 for the second stage, and (2) the count of the specific type drawn decreases by 1 if the drawn item belonged to that type. Write new fractions at each second-stage branch that reflect these reduced counts. Never reuse the original denominator on the second stage when drawing without replacement.
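The rule can be written as a small helper that recomputes the second-stage fractions from item counts. The function name `second_draw_probs` and the dictionary of counts are illustrative; the sketch assumes the pool holds at least two items so that something remains after the first draw.

```python
from fractions import Fraction

def second_draw_probs(counts, first_drawn):
    """Second-stage P(type | first_drawn) after removing one item."""
    if counts.get(first_drawn, 0) < 1:
        raise ValueError(f"no {first_drawn!r} item available to draw")
    remaining = dict(counts)
    remaining[first_drawn] -= 1      # numerator of the drawn type drops by 1
    total = sum(remaining.values())  # denominator drops from n to n - 1
    if total < 1:
        raise ValueError("pool is empty after the first draw")
    return {kind: Fraction(n, total) for kind, n in remaining.items()}

bag = {"red": 4, "blue": 6}
print(second_draw_probs(bag, "red"))   # red 3/9 = 1/3, blue 6/9 = 2/3
print(second_draw_probs(bag, "blue"))  # red 4/9, blue 5/9
```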
Probability Tree Quick-Reference Cheat Sheet
| Situation | Formula or Rule | Direction | Example |
|---|---|---|---|
| Joint probability of one path | P(A∩B) = P(A) × P(B|A) | → MULTIPLY across | P(H,H) = ½ × ½ = ¼ |
| Total probability of an outcome via multiple paths | P(B) = P(A∩B) + P(A'∩B) | ↓ ADD down | P(one H) = ¼ + ¼ = ½ |
| Independent events (with replacement) | P(B|A) = P(B) | Same fractions on all 2nd-stage branches | Coin flips: P(H) always = ½ |
| Dependent events (without replacement) | P(B|A) ≠ P(B); denominator decreases by 1 | Different fractions per 2nd-stage node | Marbles: P(R₂|R₁) = 3/9 ≠ 4/10 |
| Node verification (checkpoint 1) | All branches from one node sum = 1 | Check each node | 3/9 + 6/9 = 9/9 = 1 ✓ |
| Final verification (checkpoint 2) | All final outcome probabilities sum = 1 | Check far-right column | 12/90 + 24/90 + 24/90 + 30/90 = 1 ✓ |
| Conditional probability from tree (Bayes) | P(A|B) = P(A∩B) / P(B) | Divide joint by marginal | P(D+|T+) = 0.09/0.225 = 40% |
Related Probability Topics on Statistics Fundamentals
Probability trees are one tool in a broader toolkit. The topics below connect directly to what you have learned here. Statistics Fundamentals covers each in full detail with the same worked-example, step-by-step approach used in this guide.
Basic Probability
The P(A) = favorable/total formula, the probability scale, sample spaces, and the four core rules that form the foundation probability trees build on.
Conditional Probability
Every second-stage branch on a probability tree is a conditional probability P(B|A). This guide covers the formal definition, notation, and key formulas in depth.
Bayes' Theorem
The natural next topic after probability trees — how to reverse-engineer conditional probability to ask "given the test was positive, what is the actual probability of disease?"
Counting Methods
For larger problems with many stages or outcomes, combinatorial counting methods (permutations, combinations) are more efficient than drawing a full tree.
Binomial Distribution
When a two-outcome experiment is repeated many times under identical, independent conditions, the binomial distribution gives the aggregate probabilities without needing to draw an enormous tree.
Probability Rules
The complete formal treatment of the Multiplication Rule, Addition Rule, Complement Rule, and the conditions under which each applies — all in one reference guide.