20 min read | May 20, 2026
BY: Statistics Fundamentals Team
Reviewed By: Minsa A (Senior Statistics Editor)

Five Number Summary Explained with Quartiles & Box Plots Guide

Two datasets each have an average of 70. But one clusters tightly between 65 and 75, while the other stretches from 30 to 99. The mean tells you nothing about that difference. The five number summary does — in exactly five numbers.

This guide explains what the five number summary is, defines each of the five values with worked examples, shows how to calculate it step by step, connects it to box plots and the IQR, and applies it to real datasets from education, salary analysis, and clinical research.

What You'll Learn
  • ✓ What the five number summary is and why it matters for data analysis
  • ✓ Each of the five values defined precisely: minimum, Q1, median, Q3, maximum
  • ✓ A five-step calculation walkthrough with two real datasets
  • ✓ How to map the five values onto a box plot (box-and-whisker chart)
  • ✓ How the IQR relates to data spread and outlier detection
  • ✓ Real-world case studies from education, salaries, and medicine

What Is a Five Number Summary?

Definition — Descriptive Statistics Summary
The five number summary is a statistical description of a dataset using five values: minimum, first quartile (Q1), median, third quartile (Q3), and maximum. It shows how data is distributed across its full range and is the foundation of box-and-whisker plots.
{ Min, Q1, Median, Q3, Max }

Think of the five number summary as five boundary markers placed along your data. The minimum and maximum mark the outer edges. The median marks the exact center. Q1 and Q3 mark the boundaries of the middle 50% of observations. Together, those five positions give a complete picture of where data concentrates, where it spreads out, and whether the distribution leans in either direction.

The method was formalized by statistician John Tukey in his 1977 book Exploratory Data Analysis, alongside the box plot he invented to display it visually. Tukey argued that understanding a dataset requires looking at its entire shape — not just its average. The National Institute of Standards and Technology confirms this perspective: the NIST/SEMATECH e-Handbook of Statistical Methods lists the five-number summary as a core tool for initial data exploration. For a broader grounding in the field, Statistics Fundamentals covers every major concept from data types to regression.

⚡ Quick Reference — Five Number Summary Key Facts
  • Five values: Minimum, Q1 (25th percentile), Median (50th percentile), Q3 (75th percentile), Maximum
  • First step: Always sort data in ascending order before calculating anything
  • IQR: Q3 − Q1 measures the spread of the middle 50% and is resistant to outliers
  • Box plot connection: Each of the five values maps directly onto a specific part of a box-and-whisker plot
  • Outlier rule: Values below Q1 − 1.5 × IQR or above Q3 + 1.5 × IQR are potential outliers
  • Invented by: John Tukey (1977) as part of Exploratory Data Analysis methodology

The Five Core Values Explained

Each value in the five number summary marks a specific position in your sorted dataset. Position matters more than magnitude here — a value of 45 is only meaningful as "the minimum" or "Q1" once you know where it sits relative to everything else.

Min (Minimum): The smallest value. Marks the start of your data's range.
Q1 (1st Quartile): The 25th percentile. 25% of observations fall at or below this point.
Med (Median): The 50th percentile. Half the data is below, half above.
Q3 (3rd Quartile): The 75th percentile. 75% of observations fall at or below this point.
Max (Maximum): The largest value. Marks the end of your data's range.

Minimum and Maximum: Data Boundaries

The minimum is the smallest observation in your dataset; the maximum is the largest. Together they define the full range (Max − Min). Both are simple to identify once data is sorted — the minimum is the first value, the maximum is the last. Their limitation is sensitivity to outliers: a single extreme observation can make the range misleading. That is why the five number summary pairs them with Q1 and Q3, which are far more resistant to extreme values.

Median: The True Center of Data

The median is the value that divides the sorted dataset exactly in half. For datasets with an odd number of observations, the median is the single middle value. For an even number, it is the arithmetic mean of the two middle values. Because the median depends only on rank order, not on the actual distances between values, making the largest observation even more extreme has no effect on it. This makes the median a more reliable measure of center than the mean when data contains outliers or is skewed. The median guide on this site covers the calculation in full detail, and the broader descriptive statistics section places it in context alongside other summary measures.
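As a small illustration of that resistance, the sketch below takes the exam scores used later in this guide and replaces the top score with a deliberately absurd value. The value 900 is a hypothetical chosen only to make the effect obvious.

```python
from statistics import mean, median

scores = [45, 52, 55, 60, 63, 65, 70, 72, 75, 80, 85, 90]

# Hypothetical edit: swap the top score (90) for an extreme value.
stretched = scores[:-1] + [900]

# The median depends only on the two middle ranks, so it does not move.
print(median(scores), median(stretched))                      # 67.5 67.5

# The mean uses every value, so one extreme observation drags it upward.
print(round(mean(scores), 1), round(mean(stretched), 1))      # 67.7 135.2
```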

Q1 and Q3: The Quartile Boundaries

The quartiles split sorted data into four equal groups. Q1 (the first quartile) is the median of the lower half of the data, and Q3 (the third quartile) is the median of the upper half. This guide follows the exclusive (median-excluded) convention: when the number of observations is odd, the overall median itself is left out of both halves before Q1 and Q3 are computed. When the number of observations is even, the data simply splits into two halves of equal size.

💡
The Quartile Splitting Rule

When finding Q1 and Q3, exclude the overall median before splitting. If your dataset has 11 values and the median is the 6th, the lower half for Q1 is values 1–5 and the upper half for Q3 is values 7–11. For an even-count dataset of 12 values with two middle values, the lower half for Q1 is values 1–6 and the upper half for Q3 is values 7–12.
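The splitting rule can be sketched with list slicing. The helper name `split_halves` is an illustrative assumption, and the integers 1 to 11 and 1 to 12 simply stand in for positions in a sorted dataset.

```python
def split_halves(sorted_values):
    """Return (lower, upper) halves, leaving out the middle value when n is odd."""
    n = len(sorted_values)
    lower = sorted_values[: n // 2]          # first n // 2 values
    upper = sorted_values[(n + 1) // 2 :]    # last n // 2 values
    return lower, upper

# Positions stand in for sorted data values.
odd_lower, odd_upper = split_halves(list(range(1, 12)))    # n = 11
even_lower, even_upper = split_halves(list(range(1, 13)))  # n = 12

print(odd_lower, odd_upper)    # positions 1-5 and 7-11 (position 6 is the median)
print(even_lower, even_upper)  # positions 1-6 and 7-12
```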

How to Calculate a Five Number Summary: Step by Step

📋
The 5-Step Method — Works for Any Dataset

Step 1: Sort all values in ascending order. Step 2: Identify the minimum (first) and maximum (last) values. Step 3: Find the median (middle value, or average of two middle values). Step 4: Find Q1 as the median of the lower half. Step 5: Find Q3 as the median of the upper half.
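The five steps translate fairly directly into code. The following is a minimal sketch, assuming the median-excluded splitting rule described above; the function name `five_number_summary` is illustrative, and the two datasets are the ones used in the worked examples that follow.

```python
from statistics import median

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using medians of the two halves."""
    data = sorted(values)                      # Step 1: sort ascending
    n = len(data)
    if n < 2:
        raise ValueError("need at least two values")
    minimum, maximum = data[0], data[-1]       # Step 2: first and last values
    med = median(data)                         # Step 3: overall median
    q1 = median(data[: n // 2])                # Step 4: median of the lower half
    q3 = median(data[(n + 1) // 2 :])          # Step 5: median of the upper half
    return minimum, q1, med, q3, maximum

exam_scores = [45, 52, 55, 60, 63, 65, 70, 72, 75, 80, 85, 90]
wait_times = [8, 12, 15, 18, 22, 27, 31, 38, 45]

print(five_number_summary(exam_scores))  # (45, 57.5, 67.5, 77.5, 90)
print(five_number_summary(wait_times))   # (8, 13.5, 22, 34.5, 45)
```

Other quartile conventions exist, so library functions may return slightly different Q1 and Q3 values for the same data.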

Worked Example 1 — Exam Scores (12 Students)

Worked Example 1 — Even Dataset (n = 12)

Exam scores: 45, 52, 55, 60, 63, 65, 70, 72, 75, 80, 85, 90. Find the five number summary.

Step 1: Sort ascending: 45, 52, 55, 60, 63, 65, 70, 72, 75, 80, 85, 90 ✓ already sorted. n = 12.

Step 2: Minimum and Maximum: Min = 45 | Max = 90

Step 3: Median (n = 12, even): The middle two values are the 6th and 7th: 65 and 70. Median = (65 + 70) / 2 = 67.5

Step 4: Q1 (lower half): The six values below the median are 45, 52, 55, 60, 63, 65. Q1 is their median: (55 + 60) / 2 = 57.5

Step 5: Q3 (upper half): The six values above the median are 70, 72, 75, 80, 85, 90. Q3 is their median: (75 + 80) / 2 = 77.5

✓ Five number summary: Min = 45 | Q1 = 57.5 | Median = 67.5 | Q3 = 77.5 | Max = 90 | IQR = 77.5 − 57.5 = 20

Minimum: 45 | Q1 (25th percentile): 57.5 | Median (50th percentile): 67.5 | Q3 (75th percentile): 77.5 | Maximum: 90

The data card below shows how to read this result meaningfully:

📊 Interpretation — Exam Score Dataset

What the five numbers actually reveal

The bottom quarter of students scored between 45 and 57.5. Half scored below 67.5. The middle 50% of scores spanned from 57.5 to 77.5, an IQR of 20 points, which indicates reasonable consistency in the mid-range. The median (67.5) sits exactly 10 points from each quartile and 22.5 points from both the minimum (45) and the maximum (90), so the five numbers give no sign of skew: the scores are spread evenly on either side of the center.

Worked Example 2 — Odd Dataset (n = 9)

Patient wait times (minutes): 8, 12, 15, 18, 22, 27, 31, 38, 45

Step 1: Sort: 8, 12, 15, 18, 22, 27, 31, 38, 45 ✓ n = 9

Step 2: Min = 8 | Max = 45

Step 3: Median (n = 9, odd): 5th value = 22

Step 4: Q1 (lower half, median excluded): 8, 12, 15, 18. Q1 = (12 + 15) / 2 = 13.5

Step 5: Q3 (upper half, median excluded): 27, 31, 38, 45. Q3 = (31 + 38) / 2 = 34.5

✓ Five number summary: Min = 8 | Q1 = 13.5 | Median = 22 | Q3 = 34.5 | Max = 45 | IQR = 21

How the Five Number Summary Connects to Box Plots

A box-and-whisker plot converts the five number summary into a visual. Every element of the box plot corresponds directly to one of the five values:

Box Plot Structure — Five Number Summary Mapping

[Figure: horizontal box plot labeled Min = 45, Q1 = 57.5, Median = 67.5, Q3 = 77.5, Max = 90, with IQR = 20 marked across the box]

Box plot built from the exam score example (n = 12). The box spans Q1 to Q3; the line inside marks the median; whiskers extend to the minimum and maximum.

Mapping Each Value to Its Visual Element

Five Number Summary Value | Box Plot Element | What It Shows
Minimum | Left whisker end (tip) | The smallest non-outlier data point
Q1 (First Quartile) | Left edge of the box | Lower boundary of the middle 50%
Median | Line inside the box | The center of the data
Q3 (Third Quartile) | Right edge of the box | Upper boundary of the middle 50%
Maximum | Right whisker end (tip) | The largest non-outlier data point

When outliers exist, many box plot implementations shorten the whiskers so that each one ends at the most extreme data value lying within 1.5 × IQR of its quartile. In that version, outliers appear as individual dots beyond the whiskers. This is the modified Tukey box plot, and it is the default in most statistics software. The data visualization guide on this site covers the full range of chart types for displaying distributions.
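To make the modified whiskers concrete, here is a rough sketch that finds where each whisker would end and which points would be drawn as dots. The helper name `whisker_ends` and the extra score of 15 are hypothetical additions for illustration, and the quartiles use the median-of-halves method from this guide, so plotting libraries with other quartile conventions may place the fences slightly differently.

```python
from statistics import median

def whisker_ends(values, k=1.5):
    """Return (low whisker end, high whisker end, outliers) for a modified box plot."""
    data = sorted(values)
    n = len(data)
    q1 = median(data[: n // 2])
    q3 = median(data[(n + 1) // 2 :])
    iqr = q3 - q1
    low_fence, high_fence = q1 - k * iqr, q3 + k * iqr
    # Whiskers stop at the most extreme observations still inside the fences.
    inside = [x for x in data if low_fence <= x <= high_fence]
    outliers = [x for x in data if x < low_fence or x > high_fence]
    return inside[0], inside[-1], outliers

# Exam scores plus one hypothetical very low score of 15.
scores_with_low = [15, 45, 52, 55, 60, 63, 65, 70, 72, 75, 80, 85, 90]

print(whisker_ends(scores_with_low))  # (45, 90, [15])
```

Note that the fences are recomputed from the 13 values, so the left whisker ends at 45 and the score of 15 is drawn separately.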

Interquartile Range (IQR) and Why It Matters

📐 IQR Formula
IQR = Q3 − Q1

The IQR measures the spread of the middle 50% of a dataset. Because it relies only on the central half of the data, it is unaffected by extreme values at either end — making it the preferred measure of spread when data contains outliers or is asymmetric.

In the exam score example, IQR = 77.5 − 57.5 = 20. That single number answers the question: "How spread out are typical scores?" A student in the 25th percentile scored roughly 20 points below a student in the 75th percentile — a meaningful gap, but not an extreme one.

Using IQR for Outlier Detection

The 1.5 × IQR rule, also developed by Tukey, defines outlier boundaries directly from the five number summary. The method is described in Penn State's STAT 200 course materials and is the default whisker rule for box plots in R, in Python's matplotlib (which pandas uses for plotting), and in SPSS.

Outlier Detection — Tukey's 1.5 × IQR Rule
Lower fence = Q1 − 1.5 × IQR
Upper fence = Q3 + 1.5 × IQR
Any observation outside these fences is a potential outlier
IQR = Q3 − Q1
1.5 × IQR = standard multiplier for mild outliers
3 × IQR = used for extreme outliers

For the exam score dataset: Lower fence = 57.5 − 1.5(20) = 57.5 − 30 = 27.5. Upper fence = 77.5 + 1.5(20) = 77.5 + 30 = 107.5. Since all scores fall between 45 and 90, the dataset contains no outliers. If a student had scored 15, that value (below the lower fence of 27.5) would appear as an outlier dot in a box plot.
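The fence arithmetic above can be reproduced with a short sketch. The function name `tukey_fences` and the parameter `k` are illustrative assumptions, and quartiles are computed with the median-of-halves method from this guide.

```python
from statistics import median

def tukey_fences(values, k=1.5):
    """Return (lower fence, upper fence) as Q1 - k*IQR and Q3 + k*IQR."""
    data = sorted(values)
    n = len(data)
    q1 = median(data[: n // 2])
    q3 = median(data[(n + 1) // 2 :])
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

exam_scores = [45, 52, 55, 60, 63, 65, 70, 72, 75, 80, 85, 90]

lower, upper = tukey_fences(exam_scores)
print(lower, upper)                       # 27.5 107.5

# No score falls outside the fences.
flagged = [x for x in exam_scores if x < lower or x > upper]
print(flagged)                            # []

# A hypothetical score of 15 would fall below the lower fence.
print(15 < lower)                         # True

# The stricter 3 x IQR version for extreme outliers.
print(tukey_fences(exam_scores, k=3))     # (-2.5, 137.5)
```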

Compare this to outlier detection using z-scores, which requires assuming a normal distribution. The IQR method makes no such assumption — it works on any distribution shape.
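One practical consequence can be seen on a small dataset. In the sketch below, a single extreme value inflates both the mean and the standard deviation enough that its own z-score stays under 3, while the IQR fences still flag it. The |z| > 3 cutoff is a commonly used convention assumed here for illustration, not something this guide prescribes, and the seven values are the ones used later under "When Range Can Mislead".

```python
from statistics import mean, median, stdev

data = [10, 11, 12, 13, 14, 15, 95]

# z-score rule: flag values more than 3 sample standard deviations from the mean.
m, s = mean(data), stdev(data)
z_flagged = [x for x in data if abs(x - m) / s > 3]

# 1.5 x IQR rule with median-of-halves quartiles (Q1 = 11, Q3 = 15).
ordered = sorted(data)
n = len(ordered)
q1 = median(ordered[: n // 2])
q3 = median(ordered[(n + 1) // 2 :])
iqr = q3 - q1
iqr_flagged = [x for x in data if x < q1 - 1.5 * iqr or x > q3 + 1.5 * iqr]

print(z_flagged)    # []
print(iqr_flagged)  # [95]
```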

How to Read and Interpret the Five Number Summary

Computing the five values is straightforward. Reading what they mean together takes practice. Three questions guide interpretation:

Is the Distribution Symmetric or Skewed?

Check the distance between the median and each quartile. In a symmetric distribution, the median sits at roughly the same distance from Q1 as it does from Q3. If the median is much closer to Q1 than to Q3, the distribution is right-skewed — a long tail stretches toward larger values. If the median is closer to Q3 than to Q1, the distribution is left-skewed.

Case Study: Reading Skewness from the Five Number Summary

Software engineer salaries at a tech company (n = 40)

Five number summary: Min = $72,000 | Q1 = $95,000 | Median = $108,000 | Q3 = $145,000 | Max = $380,000

The median ($108k) is much closer to Q1 ($95k, a gap of $13k) than to Q3 ($145k, a gap of $37k). The right whisker extends far beyond the box to $380k. This is classic right skew: the middle half of engineers earn between $95k and $145k, but a few senior principals or executives earn dramatically more. The mean salary would be pulled upward by those extreme values, which is precisely why the median is a more accurate representation of what a "typical" engineer earns at this company.
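The gap comparison can be condensed into one number. The sketch below uses Bowley's quartile skewness coefficient, which is not part of the original guide but expresses the same idea: the difference between the upper and lower gaps, divided by the IQR. The mirrored second example is hypothetical.

```python
def bowley_skewness(q1, med, q3):
    """Quartile skewness: positive when the upper gap (Q3 - median) is wider."""
    return ((q3 - med) - (med - q1)) / (q3 - q1)

# Salary case study: gaps of $13k below the median and $37k above it.
print(bowley_skewness(95_000, 108_000, 145_000))   # 0.48

# Hypothetical mirror image: the median sits closer to Q3, so the sign flips.
print(bowley_skewness(95_000, 132_000, 145_000))   # -0.48
```

The coefficient ranges from -1 to 1, with 0 meaning the median sits midway between the quartiles. It ignores the whiskers, so it is best read alongside the minimum and maximum.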

Assessing Spread with IQR vs. Range

The range (Max − Min) describes total spread but a single extreme value can inflate it dramatically. The IQR describes spread for the central 50% of data and is unaffected by outliers. When those two measures diverge sharply — a large range alongside a modest IQR — that is strong evidence of outliers or a heavy-tailed distribution.

⚠️
When Range Can Mislead

A dataset of 10, 11, 12, 13, 14, 15, 95 has Range = 85 and IQR = 4 (Q1 = 11 and Q3 = 15 by the median-excluded method). The range of 85 suggests enormous spread, but the IQR of 4 reveals that six of the seven values cluster within a 5-point window. The outlier of 95 has inflated the range without reflecting where the actual data sits. Always check IQR alongside range.
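The contrast is easy to check with Python's standard library. The exact IQR depends on the quartile convention: in this sketch `statistics.quantiles` with `method="exclusive"` agrees with the median-excluded hand method for this dataset, while `method="inclusive"` interpolates slightly differently. Either way the IQR is tiny next to the range.

```python
from statistics import quantiles

data = [10, 11, 12, 13, 14, 15, 95]

value_range = max(data) - min(data)

# quantiles(..., n=4) returns the three cut points [Q1, median, Q3].
q1_exc, _, q3_exc = quantiles(data, n=4, method="exclusive")
q1_inc, _, q3_inc = quantiles(data, n=4, method="inclusive")

print(value_range)        # 85
print(q3_exc - q1_exc)    # 4.0  (Q1 = 11, Q3 = 15)
print(q3_inc - q1_inc)    # 3.0  (Q1 = 11.5, Q3 = 14.5)
```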

Real-World Case Studies

🎓

Education: Standardized Test Scores

The five number summary reveals whether a school's distribution is driven by a few high achievers or whether competency is broadly distributed. A narrow IQR with a high median indicates consistent, broad achievement; a wide IQR signals polarization between high and low performers.

💰

Finance: Income Distribution

Income data is almost always right-skewed. The five number summary shows this visually — the gap between Q3 and the maximum far exceeds the gap between Q1 and the minimum. Policymakers use IQR-based analysis to measure inequality without letting extreme incomes distort the picture.

🏥

Medicine: Clinical Trial Data

Reporting the five number summary alongside the mean and standard deviation is common practice in clinical research, and many peer-reviewed medical journals expect medians and interquartile ranges for skewed clinical variables. Box plots allow immediate comparison of two treatment arms.

🏭

Manufacturing: Quality Control

In production settings, the IQR defines the "typical" variation band. A batch of components with a small IQR is more consistent than one with a large IQR — even if both batches share the same median. Quality engineers use the five number summary to flag batches for rejection before formal hypothesis tests are run.

🏃

Sports Analytics

Player performance metrics are compared using the five number summary to distinguish consistent performers (small IQR, median close to mean) from volatile performers (large IQR, wide whiskers). This helps coaches make decisions based on reliability rather than peak performance alone.

Five Number Summary in Excel, Python, and R

In Microsoft Excel

Excel does not have a single FIVENUMBERSUMMARY function, but each value can be computed individually. Use these formulas on data in column A (A1:A50 in this example):

=MIN(A1:A50) // Minimum
=QUARTILE.INC(A1:A50,1) // Q1 (25th percentile)
=MEDIAN(A1:A50) // Median
=QUARTILE.INC(A1:A50,3) // Q3 (75th percentile)
=MAX(A1:A50) // Maximum
=QUARTILE.INC(A1:A50,3)-QUARTILE.INC(A1:A50,1) // IQR
💡
QUARTILE.INC vs. QUARTILE.EXC

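QUARTILE.INC and QUARTILE.EXC both interpolate between ranked values instead of taking the median of each half, so neither is guaranteed to reproduce the hand method used in this guide. The names refer to whether the 0th and 100th percentiles are included in the interpolation range, not to whether the median is excluded. For the 12 exam scores, QUARTILE.INC returns Q1 = 58.75 and Q3 = 76.25, and QUARTILE.EXC returns Q1 = 56.25 and Q3 = 78.75, compared with 57.5 and 77.5 by hand. QUARTILE.INC uses the same interpolation as R's summary() and NumPy's default.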

In Python (pandas + NumPy)

import numpy as np
import pandas as pd

data = [45, 52, 55, 60, 63, 65, 70, 72, 75, 80, 85, 90]

# Using NumPy. The default linear interpolation gives Q1 = 58.75 and Q3 = 76.25,
# not the 57.5 and 77.5 found by hand with the median-of-halves method.
q1 = np.percentile(data, 25)
median = np.median(data)
q3 = np.percentile(data, 75)
iqr = q3 - q1

print(f"Min: {min(data)} | Q1: {q1} | Med: {median} | Q3: {q3} | Max: {max(data)} | IQR: {iqr}")

# Using pandas: describe() reports count, mean, std, min, 25%, 50%, 75%, and max
s = pd.Series(data)
print(s.describe())
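Library defaults do not necessarily reproduce the hand calculation from the worked examples. As a rough comparison using only the standard library, the sketch below puts the median-of-halves quartiles next to the two interpolation methods offered by `statistics.quantiles`. The `"inclusive"` method uses the same linear interpolation as NumPy's default.

```python
from statistics import median, quantiles

exam_scores = [45, 52, 55, 60, 63, 65, 70, 72, 75, 80, 85, 90]

data = sorted(exam_scores)
n = len(data)

# Hand method from this guide: medians of the lower and upper halves.
hand = (median(data[: n // 2]), median(data[(n + 1) // 2 :]))

# Interpolating methods: each call returns [Q1, median, Q3].
inc = quantiles(data, n=4, method="inclusive")
exc = quantiles(data, n=4, method="exclusive")

print(hand)             # (57.5, 77.5)
print(inc[0], inc[2])   # 58.75 76.25
print(exc[0], exc[2])   # 56.25 78.75
```

All three agree on the median (67.5); only the quartiles shift, and the gaps narrow as the dataset grows.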

In R

data <- c(45, 52, 55, 60, 63, 65, 70, 72, 75, 80, 85, 90)

# Five number summary in one command
fivenum(data) # returns: Min Q1 Median Q3 Max using Tukey's hinges

# Or use summary() for additional context
summary(data) # adds mean alongside the five number summary

# Box plot — visualizes the five number summary instantly
boxplot(data, main="Five Number Summary", horizontal=TRUE)

Five Number Summary vs. Other Descriptive Statistics

Measure | What It Tells You | Use When
Five number summary | Full distribution shape: center, spread, range, quartile positions | Exploring unknown data; comparing two groups; skewed data
Mean + SD | Center and average deviation; assumes roughly normal distribution | Data is roughly symmetric; no major outliers; parametric tests
Range only | Total spread from min to max | Quick scan; not adequate alone, misleading with outliers
IQR alone | Spread of the middle 50% | When robustness to outliers is the priority
Median alone | Typical center value | Central tendency for skewed data; insufficient without spread info

The variance and standard deviation are the appropriate measures of spread when data is symmetric and outlier-free. When data is skewed or contains extreme values, the IQR from the five number summary is the more informative choice. For data that contains outliers, the outlier detection guide explains both the IQR method and the z-score method in detail.

Concept Glossary

Concept | Symbol / Formula | Simple Definition | Common Mistake
Five number summary | { Min, Q1, Median, Q3, Max } | Five values that describe data distribution and spread | Confusing with mean-based summaries
Minimum | Min | Smallest value in the dataset | Treating as the start of a "normal" range when it may be an outlier
Maximum | Max | Largest value in the dataset | Same as minimum: may be distorted by an outlier
Median | Q2 or M | Middle value; 50th percentile | Confusing with the arithmetic mean
First quartile | Q1 | 25th percentile; lower boundary of middle 50% | Forgetting to exclude the median before computing Q1
Third quartile | Q3 | 75th percentile; upper boundary of middle 50% | Same omission as Q1
IQR | Q3 − Q1 | Spread of the middle 50% of observations | Confusing IQR with the full range
Box plot | (none) | Visual chart that maps all five summary values | Misreading whisker ends as absolute minimum/maximum when outliers are plotted separately
Outlier fence | Q1 − 1.5×IQR ; Q3 + 1.5×IQR | Boundaries beyond which values are flagged as outliers | Assuming all outliers are errors when they may be real, meaningful extremes
Percentile | pth percentile | The value below which p% of observations fall | Confusing percentile rank with the actual value

Common Mistakes When Calculating the Five Number Summary

Mistake | What Goes Wrong | Correct Approach
Forgetting to sort the data first | Q1, median, and Q3 all compute incorrectly; the minimum and maximum may also be wrong | Always sort all values from smallest to largest before touching any calculation
Including the median in both halves when finding Q1 and Q3 | Q1 and Q3 are shifted toward the median, underestimating the true IQR | Exclude the overall median before splitting into lower and upper halves
Averaging the wrong pair of values for even-count datasets | Median and quartiles land at the wrong positions | For n values, the two middle positions are n/2 and (n/2)+1
Confusing IQR with range | Overstating spread when outliers are present | IQR = Q3 − Q1, not Max − Min. Use IQR for robust spread; Range for total extent
Treating the median as the mean | Wrong interpretation of center, especially in skewed data | The median is the middle rank; the mean is the arithmetic average, and they differ in skewed datasets

Frequently Asked Questions

What is the five number summary?
The five number summary describes a dataset using five values: the minimum, first quartile (Q1), median, third quartile (Q3), and maximum. It shows the center, the spread, and where data concentrates, without requiring any assumptions about the shape of the distribution. It was introduced by John Tukey in 1977 and remains one of the most widely used tools in exploratory data analysis.

How do you calculate the five number summary?
Step 1: Sort all values from smallest to largest. Step 2: Record the first value as the minimum and the last as the maximum. Step 3: Find the median, which is the middle value for odd-count datasets or the average of the two middle values for even-count datasets. Step 4: Find Q1 as the median of all values below the overall median (excluding the median itself). Step 5: Find Q3 as the median of all values above the overall median (also excluding it).

What do Q1 and Q3 represent?
Q1 (first quartile) is the 25th percentile: it marks the boundary below which 25% of the data falls. Q3 (third quartile) is the 75th percentile: 75% of the data falls at or below this value. Together, Q1 and Q3 define the interquartile range (IQR = Q3 − Q1), which measures how spread out the central 50% of observations are. A narrow IQR means data is tightly clustered; a wide IQR means substantial variation in the middle of the distribution.

How does the five number summary relate to a box plot?
A box plot maps all five values onto a single diagram: the left whisker extends to the minimum; the left edge of the box marks Q1; the line inside the box marks the median; the right edge of the box marks Q3; and the right whisker extends to the maximum. The box (from Q1 to Q3) covers the middle 50% of the data, and its width gives an immediate visual impression of the IQR. When outliers are present, whiskers are drawn only to the last non-outlier value, and extreme points appear as individual dots.

Can the five number summary be used to detect outliers?
Yes. Tukey's 1.5 × IQR rule flags potential outliers using values directly from the five number summary. Any observation below Q1 − 1.5 × IQR or above Q3 + 1.5 × IQR is a candidate outlier. This method requires no assumption of normality and is more robust than z-score-based outlier detection when data is skewed. A stricter version uses 3 × IQR to identify only the most extreme values.

What is the difference between the range and the five number summary?
Range = Max − Min, giving one number that describes total spread. The five number summary gives five numbers that describe how data is arranged within that spread. The range can be dramatically inflated by a single extreme observation, while the IQR and quartile positions in the five number summary remain stable. For any dataset with potential outliers or non-normal distribution shape, the five number summary is far more informative than the range alone.

Why do different software tools give different quartile values?
R's fivenum() uses Tukey's hinges, which include the median in both halves when the number of observations is odd, so its Q1 and Q3 can differ from the median-excluded method in this guide for odd-sized datasets. R's summary() uses a different quantile algorithm (Type 7 by default), the same linear interpolation used by Excel's QUARTILE.INC and NumPy's default, and it can differ from both fivenum() and the hand method. Differences between methods are typically small and are most noticeable in datasets with few values.

Sources and References:
Tukey, J.W. (1977). Exploratory Data Analysis. Addison-Wesley.
NIST/SEMATECH. (2012). e-Handbook of Statistical Methods: Quantile Plot. itl.nist.gov
Penn State STAT 200. Elementary Statistics: Describing Distributions with Numbers. online.stat.psu.edu
MIT OpenCourseWare. (2016). 18.650 Statistics for Applications. ocw.mit.edu
Agresti, A. & Franklin, C. (2018). Statistics: The Art and Science of Learning from Data (4th ed.). Pearson.
Moore, D.S., McCabe, G.P., & Craig, B.A. (2017). Introduction to the Practice of Statistics (9th ed.). W.H. Freeman.