What is an outlier in statistics?

An outlier is a data point that sits far from the other values in a dataset. The IQR method flags any value below Q1 − 1.5×IQR or above Q3 + 1.5×IQR as an outlier. The Z-score method flags values that lie more than 3 standard deviations from the mean (|Z| > 3). The Modified Z-Score method uses the median and MAD instead, making it more resistant to the distorting effect of extreme values.

What is the difference between IQR and Z-score outlier detection?

The IQR method is non-parametric: it requires no assumption about the distribution shape and works well on skewed data. The Z-Score method assumes a roughly normal distribution; it is fast, but extreme values inflate the mean and SD it relies on, which can hide outliers (the masking effect). The Modified Z-Score replaces the mean and SD with the median and MAD, giving robust performance on small or heavy-tailed samples.

When should I remove outliers from my data?

Remove an outlier only when you have a documented, non-statistical reason: a data-entry error, a calibration fault, or a broken instrument. Statistical significance alone is not sufficient grounds for removal. Always report what you removed and why. For machine learning pipelines, consider capping (Winsorization) rather than deletion to preserve sample size.

How do I detect outliers in a dataset online?

Paste your comma-separated numbers into the input area above, choose a detection method (IQR, Z-Score, or Modified Z-Score), and click Detect Outliers. The tool calculates the relevant thresholds, flags each anomalous value, and renders a box plot and distribution chart so you can see the outliers visually.

What is the Modified Z-Score method?

The Modified Z-Score, proposed by Iglewicz and Hoaglin (1993), replaces the sample mean and standard deviation with the median and Median Absolute Deviation (MAD): Mi = 0.6745 × (Xi − Median) / MAD. Values where |Mi| > 3.5 are classified as outliers. Because the MAD changes very little when a few extreme values are present, this method avoids the masking problem that afflicts the classic Z-Score.

Outlier Detector & Visualizer | Interactive IQR & Z-Score Tool

Quick Reference

IQR: Q3 − Q1

Tukey fences: Q1 − 1.5×IQR (lower), Q3 + 1.5×IQR (upper)

Z-Score: Z = (X − μ) / σ

Modified Z: M = 0.6745 × (X − Med) / MAD

Method Selection Guide

Skewed data: use IQR
Normal dist.: Z-Score works well
Small n (<30): Modified Z is safer
Heavy tails: Modified Z or k=3 IQR
Quality control: IQR or ±3σ rule

What Is an Outlier?

An outlier is a data point that sits noticeably far from the bulk of a dataset. The distance can be measured in different ways depending on the method you choose, but the underlying idea is always the same: the value is so extreme that it warrants a closer look before any analysis continues.

Outliers appear in data for several reasons. A technician may have recorded 1000 where they meant 100. A sensor may have malfunctioned for a single reading. Or the value may be genuine — a salary of $500,000 in a dataset of office salaries is not a recording error; it is real data from a CEO. The statistical test identifies the value as unusual; only subject-matter knowledge tells you what to do next.

The Three Detection Methods This Tool Uses

IQR Method — Tukey's Fences. John Tukey introduced this rule in 1977. First, compute the interquartile range: IQR = Q3 − Q1. Any value below Q1 − 1.5×IQR or above Q3 + 1.5×IQR is flagged. Because the IQR is based on ranks rather than distances from the mean, this method requires no assumption about the shape of the distribution. It is the default choice for exploratory data analysis on skewed or unknown distributions.

Z-Score Method. The Z-score measures how many standard deviations a value lies from the mean: Z = (X − μ) / σ. Values where |Z| > 3 are conventionally flagged as outliers. This method assumes the data comes from a roughly normal distribution. Extreme values inflate both the mean and the SD that every Z-score is measured against, so the test can miss outliers, especially in small or heavily skewed samples. Statisticians call this problem masking.

Modified Z-Score. Proposed by Iglewicz and Hoaglin (1993), this method replaces the mean with the median and the standard deviation with the Median Absolute Deviation (MAD): M = 0.6745 × (X − Median) / MAD. Values where |M| > 3.5 are flagged. Because the median and the MAD change very little when a few extreme values are present, this method avoids masking and is a good choice for small samples or data with heavy tails.

How to Detect Outliers: Step-by-Step

1. Sort your values and compute the five-number summary: minimum, Q1, median, Q3, maximum. This gives you a first visual sense of the spread and where the bulk of the data sits.
2. Choose a method. Use IQR for skewed or unknown distributions. Use Z-Score when you have confirmed normality and a large sample (>30). Use Modified Z-Score for small samples or when extreme values are already present in the data.
3. Calculate the fences or thresholds using the appropriate formula. Any value outside those bounds is a candidate outlier.
4. Plot the data. A box plot makes the fences visible and shows exactly where each flagged point sits relative to the distribution. The dot plot above displays every value individually.
5. Investigate before removing. Check whether the outlier has a documented cause: a data-entry error, a unit mismatch, an equipment fault. Statistical tests alone are not grounds for deletion. Record every decision you make.
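
Steps 1 to 3 can be sketched with Python's standard library. This is a minimal illustration rather than the tool's own implementation: the function names `five_number_summary` and `tukey_candidates` and the sample readings are hypothetical, and the quartiles come from `statistics.quantiles` with `method="inclusive"`, which is only one of several quartile conventions.

```python
import statistics

def five_number_summary(values):
    """Step 1: sort the data and return (min, Q1, median, Q3, max)."""
    ordered = sorted(values)
    q1, med, q3 = statistics.quantiles(ordered, n=4, method="inclusive")
    return ordered[0], q1, med, q3, ordered[-1]

def tukey_candidates(values, k=1.5):
    """Steps 2-3: compute Tukey's fences and list the candidate outliers."""
    _, q1, _, q3, _ = five_number_summary(values)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    candidates = [v for v in values if v < lower or v > upper]
    return lower, upper, candidates

# Hypothetical readings; 100 is the value we expect to be flagged.
readings = [12, 15, 14, 10, 100, 13, 11, 16, 9, 14]
summary = five_number_summary(readings)
lower, upper, candidates = tukey_candidates(readings)
print("five-number summary:", summary)
print("fences:", lower, upper, "candidates:", candidates)
```

The function returns candidates only. Steps 4 and 5 (plotting and investigating) still need a human decision.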

Worked Examples

IQR Example 1 — Student Exam Scores

55, 60, 62, 63, 65, 67, 68, 70, 71, 72, 73, 75, 98

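Q1 = 62, Q3 = 73, IQR = 11. Lower fence = 62 − 1.5 × 11 = 45.5. Upper fence = 73 + 1.5 × 11 = 89.5. The score of 98 exceeds 89.5 and is flagged. All other scores fall within range. Quartile conventions differ between tools: linear interpolation (NumPy's default) gives Q1 = 63 and Q3 = 72, with fences at 49.5 and 85.5, and 98 is flagged either way. Before removing it, confirm whether 98 is a genuine high performance or a recording error.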
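
Quartile values depend on the interpolation convention, so software may report slightly different fences than a hand calculation. The sketch below compares the two conventions offered by Python's `statistics.quantiles` on the exam scores. The helper name `fences` is hypothetical, and neither convention is claimed to be the one this tool uses.

```python
import statistics

scores = [55, 60, 62, 63, 65, 67, 68, 70, 71, 72, 73, 75, 98]

def fences(values, method, k=1.5):
    """Return (Q1, Q3, lower fence, upper fence) for one quartile convention."""
    q1, _, q3 = statistics.quantiles(values, n=4, method=method)
    iqr = q3 - q1
    return q1, q3, q1 - k * iqr, q3 + k * iqr

results = {}
for method in ("inclusive", "exclusive"):
    q1, q3, lower, upper = fences(scores, method)
    flagged = [s for s in scores if s < lower or s > upper]
    results[method] = (q1, q3, lower, upper, flagged)
    print(method, results[method])
```

The fences move by a point or two between conventions, but the score of 98 is flagged under both.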

Z-Score Example 2 — Sensor Temperature Readings (°C)

20.1, 20.3, 19.8, 20.5, 20.2, 19.9, 20.4, 20.1, 45.7, 20.0

Mean = 22.7, sample SD ≈ 8.08. Z-score for 45.7 ≈ 2.85, close to but below the threshold of 3. With Modified Z-Score, the median is 20.15 and MAD = 0.2, giving M ≈ 86 for the 45.7 reading, well above 3.5. The Modified Z-Score catches what Z-Score misses here, because the SD was inflated by the anomalous reading itself.
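
The comparison can be reproduced with a short standard-library sketch. It assumes the sample standard deviation (n − 1 denominator), and the variable names are illustrative. It also computes a known bound (Samuelson's inequality): with the sample SD, no |Z| in a sample of size n can exceed (n − 1)/√n, which shows why a single extreme reading in ten values cannot reach the threshold of 3.

```python
import math
import statistics

temps = [20.1, 20.3, 19.8, 20.5, 20.2, 19.9, 20.4, 20.1, 45.7, 20.0]
anomaly = 45.7

# Classic Z-score: the anomaly inflates the very SD it is measured against.
mean = statistics.fmean(temps)
sd = statistics.stdev(temps)  # sample SD, n - 1 denominator
z = (anomaly - mean) / sd

# Modified Z-score: median and MAD barely react to the anomaly.
median = statistics.median(temps)
mad = statistics.median([abs(t - median) for t in temps])
modified_z = 0.6745 * (anomaly - median) / mad

# Largest |Z| any single value can reach in a sample of this size.
n = len(temps)
max_possible_z = (n - 1) / math.sqrt(n)

print(f"Z = {z:.2f} (upper bound for n={n}: {max_possible_z:.2f})")
print(f"Modified Z = {modified_z:.1f}")
```

The Z-score stays below 3 while the Modified Z-score lands far above 3.5, matching the conclusion of the worked example.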

Modified Z Example 3 — Enterprise Salary Data ($k)

45, 48, 50, 52, 53, 55, 57, 60, 62, 65, 70, 480

The $480k salary (likely a C-suite executive) inflates the mean to ~91.4k and the sample SD to ~122.6k. The Z-score for $480k ≈ 3.17, just barely above the threshold. With a median of 56 and a MAD of 6, the Modified Z-Score gives M ≈ 47.7 for this value, a clear flag. The IQR method also catches it: with Q1 = 51.5 and Q3 = 62.75 (linear interpolation), the fences sit at 34.625 and 79.625. All three methods agree here, which increases confidence that $480k warrants investigation.
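
A method-agreement check like the one described above can be sketched as follows. The three helper functions are hypothetical names, the Z-score uses the sample SD, and the quartiles use linear interpolation via `statistics.quantiles(..., method="inclusive")`. Other conventions would shift the numbers slightly.

```python
import statistics

salaries = [45, 48, 50, 52, 53, 55, 57, 60, 62, 65, 70, 480]

def z_flags(values, cutoff=3.0):
    """Values whose |Z| (sample SD) exceeds the cutoff."""
    mean, sd = statistics.fmean(values), statistics.stdev(values)
    return {v for v in values if abs(v - mean) / sd > cutoff}

def modified_z_flags(values, cutoff=3.5):
    """Values whose |M| exceeds the cutoff; assumes MAD is not zero."""
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    return {v for v in values if abs(0.6745 * (v - med) / mad) > cutoff}

def iqr_flags(values, k=1.5):
    """Values outside Tukey's fences."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return {v for v in values if v < q1 - k * iqr or v > q3 + k * iqr}

results = {
    "z_score": z_flags(salaries),
    "modified_z": modified_z_flags(salaries),
    "iqr": iqr_flags(salaries),
}
agreed = set.intersection(*results.values())
print(results)
print("flagged by all three methods:", agreed)
```

Values flagged by every method are the strongest candidates for investigation. Values flagged by only one method deserve a look at that method's assumptions first.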

Method Comparison Table

Method	Formula	Best for	Assumption	Known limitation
IQR (Tukey's Fences)	Q1 − k×IQR (lower), Q3 + k×IQR (upper)	Skewed data, general EDA, box plots	None (non-parametric)	Can be conservative on small samples
Z-Score	(X − μ) / σ	Large samples, confirmed normal distributions	Roughly normal distribution	Masking: extreme values inflate σ, hiding other outliers
Modified Z-Score	0.6745 × (X − Med) / MAD	Small samples, heavy tails, robust pipelines	Symmetric data (not strictly normal)	MAD = 0 when more than half the values are identical
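
The MAD = 0 limitation in the last row is easy to reproduce. The sketch below is one possible way to handle it: the function name `modified_z_scores`, the choice to return `None` and let the caller fall back to another method, and both sample lists are assumptions for illustration.

```python
import statistics

def modified_z_scores(values):
    """Return modified Z-scores, or None when MAD is zero (score undefined)."""
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    if mad == 0:
        return None  # caller should fall back to another method, e.g. IQR
    return [0.6745 * (v - med) / mad for v in values]

# Hypothetical counts: five of the seven values are identical, so MAD is 0.
mostly_identical = [5, 5, 5, 5, 5, 6, 40]
# Hypothetical counts with enough variety for a non-zero MAD.
varied = [3, 4, 5, 6, 7, 8, 40]

degenerate = modified_z_scores(mostly_identical)
scores = modified_z_scores(varied)
print(degenerate)
print(scores)
```

Returning `None` makes the failure explicit. Silently reporting a score of zero for every value would hide the fact that the method could not be applied.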

When Not to Remove an Outlier

Removing a data point because it tests as an outlier, without any other justification, is a form of selective reporting. This practice can inflate effect sizes, reduce standard errors, and make results look more statistically significant than they actually are. Journals and institutional review boards increasingly require that any excluded observations be reported alongside the reason for exclusion.

If an outlier is genuine but extreme, consider Winsorization (capping the value at the fence) rather than deletion. This approach preserves your sample size while limiting the influence of extreme values on parametric statistics. For machine learning preprocessing, scikit-learn's RobustScaler centres each feature on its median and scales it by the IQR, which makes it a robust alternative to standard normalization.
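
Capping at the fence, as described above, can be sketched in a few lines. The function name `winsorize_at_fences` is hypothetical, the quartiles use linear interpolation, and the default k = 1.5 is an assumption. Classical Winsorization caps at chosen percentiles instead of at Tukey's fences.

```python
import statistics

def winsorize_at_fences(values, k=1.5):
    """Cap values outside Tukey's fences at the nearest fence."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [min(max(v, lower), upper) for v in values]

data = [12, 15, 14, 10, 100, 13, 11, 16, 9, 14]
capped = winsorize_at_fences(data)
print(capped)
print("mean before:", statistics.fmean(data), "after:", statistics.fmean(capped))
```

The sample keeps all ten observations. Only the extreme value is pulled back to the upper fence.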

Python (NumPy)


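import numpy as np

data = np.array([12, 15, 14, 10, 100, 13, 11, 16, 9, 14])

# IQR method (Tukey's fences); np.percentile interpolates linearly by default
q1, q3 = np.percentile(data, [25, 75])
iqr = q3 - q1
lower = q1 - 1.5 * iqr
upper = q3 + 1.5 * iqr
iqr_outliers = data[(data < lower) | (data > upper)]

# Z-score method; ddof=1 gives the sample SD, matching R's scale()
z_scores = (data - np.mean(data)) / np.std(data, ddof=1)
z_outliers = data[np.abs(z_scores) > 3]  # empty here: with n = 10, |Z| cannot exceed 3

# Modified Z-score method (median and MAD)
median = np.median(data)
mad = np.median(np.abs(data - median))

if mad == 0:
    # Modified Z is undefined when MAD is 0; nothing is flagged in that case
    mod_z = np.zeros_like(data, dtype=float)
else:
    mod_z = 0.6745 * (data - median) / mad

mod_outliers = data[np.abs(mod_z) > 3.5]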

R (Base)


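data <- c(12, 15, 14, 10, 100, 13, 11, 16, 9, 14)

# IQR method (Tukey's fences); quantile() and IQR() default to type 7 interpolation
q <- quantile(data, c(0.25, 0.75))
iqr <- IQR(data)

iqr_out <- data[
  data < q[1] - 1.5 * iqr |
  data > q[2] + 1.5 * iqr
]

# Z-score method; scale() centres on the mean and divides by the sample SD
z <- as.vector(scale(data))
z_out <- data[abs(z) > 3]

# Modified Z-score method (median and unscaled MAD)
med <- median(data)
mad_val <- median(abs(data - med))

if (mad_val == 0) {
  # Modified Z is undefined when MAD is 0; nothing is flagged in that case
  mod_z <- rep(0, length(data))
} else {
  mod_z <- 0.6745 * (data - med) / mad_val
}

mod_out <- data[abs(mod_z) > 3.5]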

How Outliers Affect Common Statistics

Statistic	Sensitivity to outliers	Robust alternative
Mean	High — pulled toward extreme values	Median
Standard deviation	High — inflated by extreme distances	MAD
Pearson correlation (r)	High — a single extreme pair can dominate	Spearman rank correlation
Linear regression slope	High — influential points can rotate the line	Theil-Sen estimator
Median	Low — unaffected unless >50% of data is extreme	Already robust
IQR	Low — based on ranks, not raw distances	Already robust
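
The first two rows and the median row of the table can be checked numerically. The sketch below uses hypothetical data and a hypothetical helper `mad`. It compares the mean, median, sample SD, and MAD of a small sample before and after one extreme value is added.

```python
import statistics

def mad(values):
    """Median absolute deviation from the median (unscaled)."""
    med = statistics.median(values)
    return statistics.median([abs(v - med) for v in values])

clean = [12, 15, 14, 10, 13, 11, 16, 9, 14]
contaminated = clean + [100]  # one extreme value added

stats = {}
for label, sample in (("clean", clean), ("with outlier", contaminated)):
    stats[label] = {
        "mean": statistics.fmean(sample),
        "median": statistics.median(sample),
        "sd": statistics.stdev(sample),
        "mad": mad(sample),
    }
    print(label, stats[label])
```

The mean and SD shift substantially once the extreme value is added, while the median moves only slightly and the MAD does not change for this sample.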

Sources and further reading:

Tukey, J.W. (1977). Exploratory Data Analysis. Addison-Wesley. [Introduced the 1.5×IQR rule]
Iglewicz, B. & Hoaglin, D. (1993). How to Detect and Handle Outliers. ASQC Quality Press. [Modified Z-Score method, threshold of 3.5]
NIST Engineering Statistics Handbook — Detection of Outliers
Grubbs, F. E. (1969). Procedures for Detecting Outlying Observations in Samples. Technometrics, 11(1), 1–21.
scikit-learn documentation — Scaling data with outliers

Frequently Asked Questions

What is an outlier in statistics?

An outlier is a data point that falls far outside the pattern set by the rest of the dataset. The IQR method defines this as any value below Q1 − 1.5×IQR or above Q3 + 1.5×IQR. The Z-Score method flags any value more than 3 standard deviations from the mean. The Modified Z-Score flags any value where |M| exceeds 3.5. No single definition is universal: the right threshold depends on the field, sample size, and distribution shape.

What is the difference between IQR and Z-score outlier detection?

The IQR method is non-parametric: it uses the spread of the middle 50% of the data and makes no assumption about the distribution. It is the safer default for skewed or unknown distributions. The Z-Score method is parametric: it measures distance from the mean in units of standard deviation and works best when data is approximately normal and the sample is large enough that a few extreme values do not heavily distort the mean and SD. When in doubt on a small or skewed sample, the Modified Z-Score is more reliable than either.

What is the Modified Z-Score method?

The Modified Z-Score replaces the sample mean with the median and the standard deviation with the Median Absolute Deviation (MAD): M = 0.6745 × (X − Median) / MAD. The constant 0.6745 rescales the MAD so that MAD / 0.6745 approximates the SD when the data is normally distributed. Values where |M| exceeds 3.5 are classified as outliers. Because the median and the MAD change very little when a few extreme values are present, this method avoids the masking problem that afflicts the standard Z-Score test. It was proposed by Iglewicz and Hoaglin in their 1993 ASQC monograph.
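
The constant can be recovered from the normal distribution itself. For normally distributed data, half of the values lie within about 0.6745 standard deviations of the median, so the MAD is roughly 0.6745 × SD. The sketch below uses Python's `statistics.NormalDist`; the variable names are illustrative.

```python
from statistics import NormalDist

# 0.75 quantile of the standard normal: half the mass lies within +/- this value.
q75 = NormalDist().inv_cdf(0.75)

# Equivalent scale factor often quoted in the other direction (SD ~ factor * MAD).
sd_per_mad = 1 / q75

print(round(q75, 4))         # 0.6745
print(round(sd_per_mad, 4))  # 1.4826
```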

When should I remove outliers from my data?

Remove an outlier only when you have a non-statistical justification: a confirmed data-entry error, a documented instrument malfunction, or a clear unit mismatch (e.g. one measurement in feet when all others are in metres). Statistical significance alone does not justify removal. If the value is genuine but extreme, consider Winsorizing it (capping at the fence) rather than deleting the observation. Always report removed values, the reason for their removal, and the effect this had on your results.

How does a box plot show outliers?

A box plot draws a box from Q1 to Q3 with a line at the median. The whiskers extend to the most extreme inlier values, that is, the last data points still inside the IQR fences. Any points beyond the whiskers are plotted individually and are conventionally treated as outliers. This visual representation makes it easy to see both the location of the outlier and the overall shape of the distribution at the same time. Use the box plot this tool generates alongside the flagging table to examine each outlier in context.
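
The pieces of a box plot can be computed without drawing anything. The sketch below is an illustration under stated assumptions, not the tool's plotting code: the function name `box_plot_parts` is hypothetical, the quartiles use linear interpolation, and k = 1.5 is the conventional fence multiplier.

```python
import statistics

def box_plot_parts(values, k=1.5):
    """Return the box, whisker ends, and outlier points of a Tukey box plot."""
    ordered = sorted(values)
    q1, med, q3 = statistics.quantiles(ordered, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    inliers = [v for v in ordered if lower <= v <= upper]
    return {
        "box": (q1, med, q3),
        # Whiskers stop at the most extreme data points inside the fences,
        # not at the fences themselves.
        "whiskers": (inliers[0], inliers[-1]),
        "outliers": [v for v in ordered if v < lower or v > upper],
    }

parts = box_plot_parts([12, 15, 14, 10, 100, 13, 11, 16, 9, 14])
print(parts)
```

Note that the upper whisker ends at 16, the largest inlier, even though the upper fence for this sample sits higher.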

Do outliers affect machine learning models?

Yes. Distance-based algorithms (k-NN, SVM, k-means) are sensitive to outliers because the extreme values distort the distance calculations. Linear and logistic regression coefficients can be rotated significantly by a small number of high-leverage points. Tree-based models (Random Forest, XGBoost) are generally more resistant but are not immune. Standard preprocessing best practice for machine learning is to detect outliers before training, then decide whether to remove, cap, or log-transform them depending on the volume of affected observations and the nature of the task.
