Descriptive Statistics Research Methods Measurement Scales 26 min read May 20, 2026
BY: Statistics Fundamentals Team
Reviewed By: Minsa A (Senior Statistics Editor)

Types of Data in Statistics: Complete Beginner's Guide

A doctor records a patient's blood type, pain level on a 1–10 scale, body temperature, and weight. Four measurements — but four completely different data types. The method used to analyze each one, the chart used to display it, and the test used to compare groups all depend on getting that classification right first.

This guide covers every major data classification used in statistics and research: qualitative vs. quantitative, discrete vs. continuous, the four measurement scales (nominal, ordinal, interval, ratio), and the practical decision framework for classifying any variable you encounter. Use the interactive classifier below to identify data types instantly.

What You'll Learn
  • ✓ The two main data types (qualitative vs. quantitative) with real examples
  • ✓ Discrete vs. continuous data and the test that tells them apart
  • ✓ All four measurement scales (nominal, ordinal, interval, ratio) and what each allows mathematically
  • ✓ Primary vs. secondary data and why the distinction matters for research
  • ✓ Structured vs. unstructured data and its role in modern analytics
  • ✓ Which statistical tests to use for each data type
  • ✓ An interactive data type classifier tool

What Are Types of Data?

Definition — Data Classification in Statistics
Types of data refer to the different ways information can be classified in statistics and research. Data may describe qualities, categories, rankings, counts, or measurements. The classification determines which mathematical operations make sense, which visualizations communicate effectively, and which statistical tests produce valid results.

Every dataset is made up of variables, and every variable has a type. That type is not a formality — it is the first decision in any analysis. Apply the wrong statistical test to the wrong data type, and your results are meaningless no matter how carefully the rest of the math is done.

The NIST/SEMATECH Engineering Statistics Handbook defines measurement scales as the foundation of statistical analysis, noting that "the distinction between types of data is fundamental to the appropriate choice of statistical methods." Understanding data types also builds the vocabulary shared across research disciplines: a psychologist and an epidemiologist use the same framework to describe their variables, even when the variables themselves differ entirely.

At Statistics Fundamentals, this classification framework connects directly to every other topic on the site. The descriptive statistics you can compute depend on it. The data visualizations appropriate for your data depend on it. And the hypothesis tests available to you depend on it.

Why Data Classification Matters

Consider three variables from a student dataset: student ID number, grade level (Freshman/Sophomore/Junior/Senior), and GPA. All three contain numbers or labels, but they require completely different treatment:

  • Student ID is a nominal label — averaging IDs produces a nonsensical number
  • Grade level is ordinal — Freshman < Sophomore < Junior < Senior, but the gaps are not measurably equal
  • GPA is numeric and is usually treated as interval or ratio scale: you can calculate a meaningful average, although a 4.0 is twice a 2.0 only in grade points, not necessarily in learning

Researchers at Penn State's Department of Statistics note in their online curriculum that choosing the wrong analysis based on a misclassified variable is one of the most common errors in introductory statistical work (Penn State STAT 500).
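To make the student example concrete, here is a minimal sketch using only Python's standard library. The three student records are invented for illustration. It shows that software will compute an average for any column of numbers, so the analyst has to decide whether the result means anything.

```python
from statistics import mean

# Hypothetical student records, invented for illustration.
students = [
    {"student_id": 1001, "grade_level": "Freshman", "gpa": 3.2},
    {"student_id": 1002, "grade_level": "Junior", "gpa": 2.8},
    {"student_id": 1003, "grade_level": "Senior", "gpa": 3.6},
]

# Averaging a numeric measurement gives an interpretable summary.
mean_gpa = mean(s["gpa"] for s in students)

# Python will average the IDs too, but the result describes nothing.
mean_id = mean(s["student_id"] for s in students)

print(f"Mean GPA: {mean_gpa:.2f}")
print(f"'Mean' student ID: {mean_id}")
```

Nothing in the code stops the second calculation. The classification step is what tells you to skip it.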

⚡ Quick Reference — The Main Data Type Categories
  • Qualitative (Categorical): Describes characteristics or group membership — blood type, color, country
  • Quantitative (Numerical): Represents counts or measurements — height, income, test scores
  • Discrete: Countable whole numbers — number of children, defects per batch
  • Continuous: Any value in a range — temperature, weight, distance
  • Nominal: Categories without natural order — gender, product category
  • Ordinal: Ranked categories with unequal gaps — satisfaction ratings, education level
  • Interval: Equal gaps, no true zero — Celsius temperature, calendar years
  • Ratio: Equal gaps with true zero — weight, age, income

The Two Main Types of Data: Qualitative vs. Quantitative

Every variable in a dataset starts with one fundamental question: is it a category or a number? This distinction separates qualitative from quantitative data and determines the entire analytical pathway that follows.

Qualitative Data (Categorical Data)

Qualitative (Categorical) Data: describes groups, labels, or characteristics, not amounts

Qualitative data places observations into named groups. You cannot add, subtract, or calculate an average from it in a meaningful way. Analysis focuses on counting frequencies, comparing proportions, and testing for associations between groups.

Examples: blood type (A, B, AB, O); country of origin; product category; survey response ("Agree"); color; marital status

Qualitative data answers the question "what kind?" It describes membership in a group rather than how much of something exists. A patient's diagnosis, a customer's preferred payment method, the name of a city — these are all qualitative. Note that qualitative data can contain numbers as labels (like ZIP codes or phone numbers) without those numbers being mathematically meaningful.
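The analyses that do apply to qualitative data (frequency counts, proportions, and the most common category) can be sketched in a few lines. The sample of blood types below is invented for illustration, and the variable names are arbitrary.

```python
from collections import Counter

# Hypothetical sample of blood types, invented for illustration.
blood_types = ["O", "A", "A", "B", "O", "AB", "O", "A", "O", "B"]

counts = Counter(blood_types)
total = len(blood_types)

# Proportion of observations in each category.
proportions = {group: n / total for group, n in counts.items()}

# The mode is the most frequent category.
mode_group, mode_count = counts.most_common(1)[0]

print(counts)
print(proportions)
print(f"Mode: {mode_group} ({mode_count} of {total})")
```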

Quantitative Data (Numerical Data)

Quantitative (Numerical) Data: represents measurable amounts where arithmetic makes sense

Quantitative data records actual quantities. You can calculate means, compute differences, and perform the full range of statistical operations. The values carry genuine numeric meaning: the difference between 10 and 20 is the same as the difference between 90 and 100.

Examples: height (cm); annual income ($); exam score; number of purchases; temperature; time in seconds

Quantitative data answers "how many?" or "how much?" The key test is whether arithmetic operations produce meaningful results. The mean of 20 exam scores tells you something real about classroom performance. The "mean" of 20 ZIP codes tells you nothing.

Qualitative vs. Quantitative Data: Comparison

Property | Qualitative Data | Quantitative Data
What it captures | Categories, labels, groups | Amounts, counts, measurements
Answers the question | "What kind?" | "How many?" / "How much?"
Examples | Color, nationality, diagnosis | Weight, age, test score
Can calculate mean? | No | Yes
Subtypes | Nominal, Ordinal | Discrete, Continuous
Common chart types | Bar chart, pie chart | Histogram, scatter plot, line chart
Common analyses | Frequency counts, chi-square test | Mean, standard deviation, regression
Numbers appear? | Sometimes (as labels only) | Always (meaningful values)

[Figure: Diagram comparing qualitative and quantitative data types with examples of each in a side-by-side visual.]

Discrete vs. Continuous Data

Within quantitative data, there is another split that affects both visualization and analysis: whether values are countable whole numbers (discrete) or can take any value in a range (continuous). This distinction determines whether a bar chart or a histogram is more appropriate, and whether a Poisson model or a normal model fits your data better.

Discrete Data

Discrete data can only take specific, countable values. Usually these are non-negative integers, though the defining characteristic is that you cannot have a meaningful value between two adjacent possible values. You can have 3 children or 4, but not 3.6 children. You can receive 0 complaints or 1 complaint, but nothing in between.

Discrete Data: countable values with no meaningful values in between

Each observation is a whole unit. The question to ask: "Can I always insert another possible value between any two values in my dataset?" If the answer is no, the data is discrete.

Examples: number of students (28, 29, 30...); defective units per batch; goals scored in a game; monthly website visits; number of hospital admissions

Continuous Data

Continuous data can take any value within a range, including every fraction and decimal. Between any two possible measurements, there is always another possible value. Height, temperature, time, and weight are all continuous because a person could theoretically be 170.4, 170.41, or 170.413 cm tall — there is no natural stopping point for precision.

Continuous Data: measurable values where any decimal is theoretically possible

Continuous data can always be subdivided further in theory, limited only by measurement instrument precision. The gaps between values are not fixed; they depend on how precisely you measure.

Examples: height (170.4 cm); weight (68.25 kg); temperature (36.8°C); revenue ($1,247.83); reaction time (0.347 s)

Common Classification Mistakes

⚠️
Age Is Continuous — Until It Isn't

Age measured to the day or year is continuous ratio data. But when a study records age as "under 30 / 30–50 / over 50," it becomes ordinal categorical data. The same underlying phenomenon can yield different data types depending on how the measurement is recorded. Always check the measurement instrument and recording method, not just the variable name.
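The change from continuous to ordinal can be sketched as a recoding step. The function name and the choice to place exactly 30 and exactly 50 in the middle bracket are assumptions made here for illustration; a real study would define its boundaries in its codebook.

```python
def age_bracket(age_years: float) -> str:
    """Map a numeric age to one of three ordered brackets."""
    if age_years < 0:
        raise ValueError("age cannot be negative")
    if age_years < 30:
        return "under 30"
    if age_years <= 50:
        return "30–50"
    return "over 50"

# Order of the brackets, lowest to highest; this ordering is what makes them ordinal.
BRACKET_ORDER = ["under 30", "30–50", "over 50"]

# Hypothetical ages, invented for illustration.
ages = [24.5, 30, 41.2, 50, 67]
brackets = [age_bracket(a) for a in ages]
print(brackets)
```

After recoding, ages 30 and 50 are indistinguishable. The differences and ratios that were valid for the raw numbers are no longer available.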

Star ratings (1–5 stars) cause frequent confusion. Technically they are ordinal qualitative data — the gaps between stars are not proven to be equal in the rater's mind. In practice, many researchers treat Likert scale responses as interval data for analysis convenience, but this is an assumption that requires justification. The American Psychological Association's measurement guidelines recommend explicitly stating this assumption when used.

Feature | Discrete Data | Continuous Data
Values | Specific, countable whole units | Any value in a range
Values in between | No (gaps are absolute) | Yes (infinite precision possible)
Test question | "Can I count exact whole numbers?" | "Can I measure to a decimal?"
Common distribution | Binomial, Poisson | Normal, Uniform, Exponential
Best chart | Bar chart | Histogram, density plot
Examples | Children per household, defects | Height, reaction time, temperature
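The chart recommendation in the table follows from how each kind of data is tallied. The sketch below uses invented data and a hypothetical `bin_values` helper. Discrete values are counted as they are, while continuous values are first grouped into equal-width bins.

```python
from collections import Counter

# Hypothetical data, invented for illustration.
children_per_household = [0, 2, 1, 2, 3, 0, 2, 1]        # discrete counts
heights_cm = [158.2, 162.7, 171.4, 169.9, 175.0, 181.3]  # continuous

# Discrete: each distinct value gets its own bar.
bar_heights = Counter(children_per_household)

def bin_values(values, start, width):
    """Count how many values fall in each [lower, lower + width) bin."""
    bins = Counter()
    for v in values:
        lower = start + width * int((v - start) // width)
        bins[lower] += 1
    return dict(sorted(bins.items()))

# Continuous: group values into 10 cm bins starting at 150 cm.
histogram = bin_values(heights_cm, start=150, width=10)

print(dict(sorted(bar_heights.items())))
print(histogram)
```

The bin width is the analyst's choice, which is why two histograms of the same continuous data can look different while a bar chart of counts cannot.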

Levels of Measurement: Nominal, Ordinal, Interval, Ratio

The most precise classification framework in statistics comes from psychologist Stanley Smith Stevens, who introduced the four measurement scales in 1946 in the journal Science. His framework, now a core part of research methods education worldwide, defines nominal, ordinal, interval, and ratio scales based on the mathematical properties each scale possesses. The framework is taught in virtually every undergraduate statistics curriculum, including courses at MIT and Stanford.

Think of the four scales as a ladder of mathematical power. Each rung adds a new property to the ones below it, unlocking additional operations and analyses.

The Four Measurement Scales — From Least to Most Mathematically Powerful

1. Nominal: Labels Only
Categories with no natural order. You can count frequency but cannot rank, add, or compute averages. Operations: = and ≠ only.

2. Ordinal: Ranked Order (Unequal Gaps)
Categories that can be ranked, but the gaps between ranks are not necessarily equal. You can say "more than" or "less than" but not "how much more." Operations: =, ≠, >, <.

3. Interval: Equal Gaps (No True Zero)
Measured values with equal, meaningful intervals between points, but no true zero. Differences are meaningful; ratios are not. You cannot say 20°C is "twice as hot" as 10°C. Operations: =, ≠, >, <, +, −.

4. Ratio: Equal Gaps + True Zero
All properties of interval scale, plus a meaningful zero that represents complete absence. Ratios are valid: 100 kg is genuinely twice 50 kg. The full range of statistical operations applies. Operations: =, ≠, >, <, +, −, ×, ÷.
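The ladder idea, where each scale inherits the operations of the scales below it, can be sketched as a small lookup. The `Scale` enum and `allowed_operations` function are hypothetical names introduced for this illustration.

```python
from enum import IntEnum

class Scale(IntEnum):
    NOMINAL = 1
    ORDINAL = 2
    INTERVAL = 3
    RATIO = 4

# Operations each rung adds on top of the rungs below it.
ADDED_OPERATIONS = {
    Scale.NOMINAL: ["=", "≠"],
    Scale.ORDINAL: [">", "<"],
    Scale.INTERVAL: ["+", "−"],
    Scale.RATIO: ["×", "÷"],
}

def allowed_operations(scale: Scale) -> list[str]:
    """Collect the operations of this scale and every scale below it."""
    ops = []
    for level in Scale:
        if level <= scale:
            ops.extend(ADDED_OPERATIONS[level])
    return ops

for level in Scale:
    print(level.name, allowed_operations(level))
```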

Nominal Data

Nominal data puts observations into named boxes with no implied ranking between them. The word "nominal" comes from the Latin nomen (name) — these are genuinely just names or labels. Changing the order in which you list nominal categories does not change any information about the data.

Nominal Data: unordered categories, names and labels only

There is no sense in which one nominal category is "higher" or "more" than another. You cannot rank blood type A above blood type B; they are simply different groups.

Examples: blood type (A, B, AB, O); country of residence; eye color; product category (Food / Electronics / Clothing); political party affiliation; ZIP code

The appropriate statistics for nominal data are frequency counts, proportions, and the mode. The appropriate test for comparing two groups' nominal distributions is the chi-square test. You cannot calculate a mean for nominal data — "the average blood type is 1.73" means nothing.
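For readers curious what the chi-square test computes, here is a minimal sketch of the Pearson chi-square statistic written out by hand. The counts are invented, and the function name is hypothetical. In practice a statistics library would also supply the p-value, which this sketch does not.

```python
def chi_square_statistic(table):
    """Pearson chi-square statistic for a table of observed counts (rows = groups)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed_count in enumerate(row):
            # Expected count if group and category were independent.
            expected = row_totals[i] * col_totals[j] / grand_total
            stat += (observed_count - expected) ** 2 / expected
    return stat

# Hypothetical counts of blood types A, B, AB, O in two populations.
observed = [
    [30, 20, 10, 40],
    [20, 30, 10, 40],
]
stat = chi_square_statistic(observed)
df = (len(observed) - 1) * (len(observed[0]) - 1)
print(f"chi-square = {stat:.2f}, degrees of freedom = {df}")
```

The statistic uses only counts per category, which is why it remains valid for data that supports nothing beyond "same" or "different."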

Ordinal Data

Ordinal data adds rank order to the nominal properties. You know that one category is "higher" than another, but you cannot measure how much higher. The distance between Rank 1 and Rank 2 is not necessarily the same as the distance between Rank 2 and Rank 3.

Ordinal Data: ranked categories where order matters and gaps do not

The position in the ranking carries information, but the size of the gap between positions does not. A customer who gives a product 5 stars is more satisfied than one who gives 3 stars, but you cannot claim they are exactly "2 units" more satisfied.

Examples: satisfaction rating (1–5 stars); education level (High School / Bachelor's / Master's / PhD); cancer stage (I, II, III, IV); military rank; finish position in a race (1st, 2nd, 3rd)

For ordinal data, median and rank-based statistics are appropriate. You should not calculate a mean for ordinal data, though in practice many researchers do so for Likert scales and note the assumption explicitly. Non-parametric tests like the Mann-Whitney U test or Spearman rank correlation apply directly to ordinal data.
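A median for ordinal data can be sketched by ranking the categories, taking the middle rank, and mapping it back to a label. The responses below are invented for illustration. `statistics.median_low` is used because it always returns an observed value rather than an average of two ranks.

```python
from statistics import median_low

# Explicit ordering; the numeric ranks carry order only, not distance.
EDUCATION_ORDER = ["High School", "Bachelor's", "Master's", "PhD"]
rank_of = {level: rank for rank, level in enumerate(EDUCATION_ORDER)}

# Hypothetical survey responses, invented for illustration.
responses = ["Bachelor's", "High School", "PhD", "Bachelor's", "Master's", "High School"]

ranks = [rank_of[r] for r in responses]

# Map the middle rank back to a real category.
median_level = EDUCATION_ORDER[median_low(ranks)]
print(f"Median education level: {median_level}")
```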

Interval Data

Interval data is where arithmetic genuinely begins to apply. The gaps between values are equal and measurable, so differences carry real meaning. The complication is the lack of a true zero. In the Celsius scale, 0°C was defined arbitrarily as the freezing point of water — it does not mean "no temperature exists." This is why you cannot say 20°C is twice as hot as 10°C.

Interval Data: equal gaps between values, but zero is arbitrary

Differences are meaningful and comparable. Ratios are not. The scale has no natural origin; its zero point was chosen by convention rather than representing the true absence of the thing being measured.

Examples: temperature in °C or °F; calendar year (2000, 2010, 2020); IQ score; standardized test scores (SAT, GRE); pH scale
💡
The Ratio Test for Interval Data

Before claiming ratio statements, apply this test: does zero mean complete absence? For temperature in Celsius, 0°C does not mean "no temperature" — it is just the freezing point of water. You can say the temperature rose by 10 degrees, but not that 20°C is twice as warm as 10°C. Convert to Kelvin (which has a true absolute zero), and ratio statements become valid.
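The Kelvin conversion makes the point numerically. In this short sketch (variable names are arbitrary), the 10-degree difference survives the change of scale, but the apparent "twice as warm" ratio does not.

```python
def celsius_to_kelvin(celsius: float) -> float:
    return celsius + 273.15

cool_c, warm_c = 10.0, 20.0

# Differences are the same on both scales.
diff_c = warm_c - cool_c
diff_k = celsius_to_kelvin(warm_c) - celsius_to_kelvin(cool_c)

# Ratios are not: the Celsius ratio is an artifact of where zero was placed.
ratio_c = warm_c / cool_c
ratio_k = celsius_to_kelvin(warm_c) / celsius_to_kelvin(cool_c)

print(f"Difference: {diff_c:.1f} °C vs {diff_k:.1f} K")
print(f"Ratio: {ratio_c:.3f} (Celsius) vs {ratio_k:.3f} (Kelvin)")
```

On the Kelvin scale, 20°C is only about 3.5% warmer than 10°C.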

Ratio Data

Ratio data has all the properties of interval data, with one addition: a true, meaningful zero that represents the complete absence of the measured quantity. Zero kilograms means no weight. Zero income means no income. A reaction time of zero is never observed in practice, but zero seconds would still mean no elapsed time, so reaction time is ratio scale as well. Because the zero is real, you can make valid ratio comparisons.

Ratio Data: all interval properties plus a true, meaningful zero

The most mathematically powerful data type. Every arithmetic operation is valid. A value of zero means the measured quantity is completely absent, not just a convention or reference point.

Examples: weight (0 kg = no weight); height (cm); income ($0 = no income); age in years; distance (km); temperature in Kelvin

Most physical measurements — weight, height, time elapsed, distance, counts of events — are ratio scale. This is the richest data type because the complete suite of statistical operations applies: mean, standard deviation, ratios, percentages, and all parametric tests.

NOIR Comparison: All Four Scales Side by Side

Property | Nominal | Ordinal | Interval | Ratio
Named categories | Yes | Yes | Yes | Yes
Meaningful order / rank | No | Yes | Yes | Yes
Equal intervals between values | No | No | Yes | Yes
True zero (absence of quantity) | No | No | No | Yes
Can calculate mean | No | No (not recommended) | Yes | Yes
Can make ratio statements (2x) | No | No | No | Yes
Appropriate center measure | Mode | Median | Mean | Mean
Classic example | Blood type | Pain scale | Temperature °C | Weight (kg)

[Figure: Visual diagram of the four measurement scales (nominal, ordinal, interval, and ratio) arranged as a ladder showing increasing mathematical power.]

Interactive Data Type Classifier

Answer the questions below to identify the correct data type for any variable in your dataset. The classifier walks through the same decision logic used in statistical textbooks at institutions like Penn State and Harvard's Department of Statistics.

🔬 Data Type Classifier — What Kind of Data Do You Have?

Answer each question about your variable. The tool will identify the data type and suggest appropriate statistical methods.


Primary vs. Secondary Data

Separate from the nominal/ordinal/interval/ratio classification, researchers also categorize data by its origin: was it collected firsthand for this study, or does it come from an existing source? This distinction matters for research validity, cost, and the types of conclusions you can draw.

Primary Data

Primary data is collected directly by the researcher for the specific research question at hand. The collector controls the measurement instrument, the sampling process, and the variables recorded.

Primary Data: first-hand data collected for your specific study

You define exactly what gets measured, how, and from whom. Primary data is tailored precisely to your research question, but it costs time and money to collect.

Examples: survey responses; laboratory experiment results; clinical trial measurements; direct observation records; structured interviews

Secondary Data

Secondary data was collected by someone else for a different original purpose. The researcher reuses it for a new analysis. Secondary data is faster and cheaper to access, but it may not perfectly match the research question.

Secondary Data: existing data reused from a different original study or source

Government agencies, academic institutions, and businesses collect enormous datasets. Researchers can access these for new analyses without the cost of original data collection.

Examples: U.S. Census Bureau data; WHO health statistics; academic datasets (Harvard Dataverse); company sales records; published research data

The U.S. Census Bureau's data.census.gov and the World Health Organization's Global Health Observatory are among the most widely used secondary data sources in academic research. A researcher studying income inequality does not need to survey millions of households — that data already exists.

Factor | Primary Data | Secondary Data
Who collected it | The researcher themselves | Someone else, for another purpose
Cost | High (surveys, labs, field work) | Low to free (often publicly available)
Time to obtain | Slow (months to years) | Fast (often immediate access)
Tailored to question? | Yes, fully customizable | Rarely; may require adaptation
Reliability | High (you control quality) | Variable (depends on original source)
Examples | Clinical trials, surveys, experiments | Census data, WHO statistics, academic datasets

Structured vs. Unstructured Data

Modern data science adds a third classification dimension that traditional statistics did not need: whether data fits neatly into rows and columns, or exists in some other, less organized form. This matters enormously in the age of machine learning and large-scale analytics.

Structured Data

Structured data is organized into a defined format — typically rows and columns in a spreadsheet or relational database. Each field has a clear data type. SQL databases, spreadsheets, and CSV files all store structured data. This is the kind of data traditional statistical methods were built to handle.

Structured Data: organized in rows and columns with defined fields

Each record has the same set of attributes. Analysis tools can directly parse and query this data. It is searchable, sortable, and immediately ready for statistical methods.

Examples: Excel spreadsheets; SQL database tables; CSV files; financial transaction records; medical records (EHR fields)

Unstructured Data

Unstructured data has no predefined format or schema. Text, images, audio, and video all fall into this category. Estimates from IBM and other technology research organizations suggest that 80–90% of all data generated today is unstructured. Traditional statistics cannot directly analyze it without first converting it to a structured form — through natural language processing, computer vision, or other transformation methods.

Unstructured Data: no predefined format, so it requires processing before analysis

Rich in information but not immediately queryable. Modern machine learning and AI systems are largely built to extract structured insight from unstructured sources.

Examples: customer reviews (free text); emails; medical imaging (X-rays); social media posts; audio recordings; video files

Semi-structured data sits between these two categories. JSON files, XML documents, and HTML pages have some organizational markers (like tags or key-value pairs) but do not conform to the strict row-column format of relational databases. Email headers are structured (To:, From:, Date:), but email body text is unstructured.
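Turning semi-structured records into structured rows usually means picking a fixed set of columns and deciding what to do when a key is missing. The sketch below uses Python's standard `json` module. The record layout and the `to_row` helper are invented for illustration.

```python
import json

# Hypothetical semi-structured records; the second one has no "rating" key.
raw = '''
[
  {"id": 1, "customer": {"region": "EU"}, "rating": 4},
  {"id": 2, "customer": {"region": "US"}}
]
'''

def to_row(record: dict) -> dict:
    """Flatten one nested record into a fixed set of columns."""
    return {
        "id": record["id"],
        "region": record.get("customer", {}).get("region"),
        "rating": record.get("rating"),  # None when the key is absent
    }

rows = [to_row(r) for r in json.loads(raw)]
for row in rows:
    print(row)
```

Every row now has the same three fields, which is the property that makes the data structured.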

Real-World Data Classification Examples

Abstract classifications become concrete when applied to actual datasets. Each example below shows variables drawn from real domains, with the correct data type and the reasoning behind it.

🏥 Healthcare — Patient Admission Record

A hospital records these variables for each patient: patient ID, diagnosis code (ICD-10), pain level (0–10), temperature (°C), blood pressure (mmHg), and discharge status (Discharged / Transferred / Deceased).

Patient ID: Nominal — a label with no mathematical meaning.
Diagnosis code: Nominal — ICD-10 codes are labels organized by taxonomy, not rank.
Pain level (0–10): Ordinal — ranked, but the gap between 3 and 4 is not proven equal to the gap between 7 and 8.
Temperature: Interval — equal gaps, but 0°C is not "no temperature."
Blood pressure: Ratio — mmHg has a true zero (no pressure).
Discharge status: Nominal — unordered categories.

📊 Business — E-commerce Dataset

An online retailer tracks: product category, customer satisfaction rating (1–5), number of items purchased, order value ($), time to delivery (days), and customer region.

Product category: Nominal — Electronics, Clothing, Food are labels without rank.
Satisfaction rating: Ordinal — ranked, but the emotional gap between 3 and 4 stars is not measurably equal.
Items purchased: Discrete ratio — whole numbers with a true zero.
Order value ($): Continuous ratio — $0 means no purchase; $200 is genuinely twice $100.
Delivery time: Continuous ratio — 0 days would mean immediate; 4 days is twice as long as 2 days.
Customer region: Nominal — geographic labels.

🎓 Education — Student Performance Dataset

A university collects: student ID, program (Engineering / Medicine / Law / Arts), year of study (1st / 2nd / 3rd / 4th), GPA, number of credits completed, and pass/fail status.

Student ID: Nominal — an identifier, not a measurement.
Program: Nominal — no natural ordering between programs.
Year of study: Ordinal — 4th year is "further along" than 1st, but the progress isn't necessarily equal per year.
GPA: Continuous, usually treated as ratio: 0.0 means no grade points and 4.0 is twice 2.0 in grade points, though the letter grades behind it are ordinal.
Credits completed: Discrete ratio — whole numbers, 0 means none completed.
Pass/Fail: Nominal — a binary categorical variable.

Data Classification Decision Framework

The "Count, Measure, or Label Framework" below reduces any classification question to four sequential checks. Work through them in order for any variable.

THE COUNT, MEASURE, OR LABEL FRAMEWORK
Step 1: Is the value a label, name, or category? → Qualitative. Then ask: can it be ranked? Yes → Ordinal. No → Nominal.
Step 2: Is the value a number representing a real amount? → Quantitative. Then ask: can you count exact integers only? Yes → Discrete. Can it take decimals? → Continuous.
Step 3: For quantitative data: does zero mean complete absence? Yes → Ratio. No (zero is arbitrary) → Interval.
Step 4: Verify by checking what operations make sense. Ratios valid? → Ratio. Only differences? → Interval. Only ranks? → Ordinal. Only equals/not-equals? → Nominal.

Worked Example: Classifying a Full Survey Dataset

A health survey collects: age, gender, education level, number of doctor visits last year, systolic blood pressure, and health self-rating ("Excellent / Good / Fair / Poor").

Age: Apply Step 2 → it's a number representing a real amount. Step 3 → 0 years means complete absence (newborn). Continuous ratio data.

Gender: Step 1 → it's a label. Can it be ranked? No — male/female/non-binary are different, not ranked. Nominal.

Education level: Step 1 → it's a category. Can it be ranked? Yes — PhD is "more education" than high school diploma. But are the gaps equal? No. Ordinal.

Doctor visits: Step 2 → it's a count. Can it be decimals? No — you visit 3 or 4 times, not 3.7 times. Step 3 → 0 visits means no visits (true zero). Discrete ratio.

Blood pressure: Step 2 → a measurement. Can it take decimals? Yes. Step 3 → 0 mmHg means no pressure (true zero, though physiologically impossible). Continuous ratio.

Health self-rating: Step 1 → a category. Can it be ranked? Yes — Excellent > Good > Fair > Poor. Are gaps equal? No. Ordinal.

✓ The dataset contains nominal (gender), ordinal (education, health rating), discrete ratio (doctor visits), and continuous ratio (age, blood pressure) variables — each requiring different analytical approaches.
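Steps 1 to 3 of the framework can be sketched as a small function. This is an illustration rather than a substitute for judgment: the function name and its parameters are hypothetical, and the yes/no answers still have to come from someone who knows how each variable was measured.

```python
def classify(is_label: bool, can_be_ranked: bool = False,
             integers_only: bool = False, zero_means_absence: bool = False) -> str:
    """Apply Steps 1-3 of the framework to answers supplied by the analyst."""
    # Step 1: labels are qualitative; ranking separates ordinal from nominal.
    if is_label:
        return "ordinal" if can_be_ranked else "nominal"
    # Step 2: counts versus measurements.
    form = "discrete" if integers_only else "continuous"
    # Step 3: a true zero separates ratio from interval.
    scale = "ratio" if zero_means_absence else "interval"
    return f"{form} {scale}"

# Answers taken from the worked survey example above.
survey = {
    "age": classify(is_label=False, zero_means_absence=True),
    "gender": classify(is_label=True),
    "education level": classify(is_label=True, can_be_ranked=True),
    "doctor visits": classify(is_label=False, integers_only=True, zero_means_absence=True),
    "blood pressure": classify(is_label=False, zero_means_absence=True),
    "health self-rating": classify(is_label=True, can_be_ranked=True),
}

for variable, data_type in survey.items():
    print(f"{variable}: {data_type}")
```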

Data Types and Statistical Tests

The data type directly constrains which statistical tests are valid. Using a parametric test on ordinal data, or computing a mean for nominal data, produces results that look precise but are mathematically unsound. The table below maps data types to appropriate analyses, based on the curriculum from Penn State's STAT 501 and the National Institute of Statistical Sciences guidance.

Data Type | Appropriate Analyses | Common Tests | Example Use Case
Nominal | Frequency, proportion, mode | Chi-square test, Fisher's exact test | Is blood type distribution different between two populations?
Ordinal | Median, percentile, rank correlation | Mann-Whitney U, Kruskal-Wallis, Spearman ρ | Do satisfaction scores differ between product versions?
Interval | Mean, standard deviation, correlation | t-test, ANOVA, Pearson r, regression | Do temperature conditions affect IQ test performance?
Ratio | All arithmetic operations, geometric mean, coefficient of variation | t-test, ANOVA, regression, z-test | Does treatment group differ in mean blood pressure from control?
Discrete | Count models, frequency distribution | Poisson regression, binomial test, chi-square | Do defect counts differ between two production lines?
Continuous | Probability density, summary statistics | t-test, ANOVA, KS test, regression | Is height significantly different between two populations?

The choice of test also depends on your study design, sample size, and distributional assumptions — but data type is always the starting constraint. The full decision flowchart for selecting a statistical test is covered in the statistical test selector.
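The mapping from measurement scale to candidate tests can be kept as a simple lookup. The sketch below condenses the four measurement-scale rows of the table. The dictionary and function names are hypothetical, and the output is a starting list that still has to be checked against study design and assumptions.

```python
# Condensed from the table above; a starting point, not a full test selector.
SUGGESTED_TESTS = {
    "nominal": ["chi-square test", "Fisher's exact test"],
    "ordinal": ["Mann-Whitney U", "Kruskal-Wallis", "Spearman rank correlation"],
    "interval": ["t-test", "ANOVA", "Pearson r", "regression"],
    "ratio": ["t-test", "ANOVA", "regression", "z-test"],
}

def suggest_tests(scale: str) -> list[str]:
    """Return candidate tests for a measurement scale, ignoring case and spacing."""
    key = scale.strip().lower()
    if key not in SUGGESTED_TESTS:
        raise ValueError(f"unknown measurement scale: {scale!r}")
    return SUGGESTED_TESTS[key]

print(suggest_tests("Ordinal"))
```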

Common Mistakes When Identifying Data Types

Getting the data type wrong at the start of an analysis corrupts everything downstream. These are the classification errors that appear most frequently in student work and even in published research.

Variable | Common Wrong Classification | Correct Classification | Why It Matters
ZIP codes | Quantitative, because they look like numbers | Nominal: they are geographic labels with no arithmetic meaning | Averaging ZIP codes produces a meaningless result
Star ratings (1–5) | Interval, because the numbers are evenly spaced | Ordinal: gaps between stars are not proven equal in subjective experience | Using a t-test on star ratings requires explicitly stating the interval assumption
Temperature (°C) | Ratio, because it has a zero (0°C) | Interval: 0°C is not the absence of temperature; it's a convention | Saying "40°C is twice as hot as 20°C" is mathematically invalid
Age grouped into brackets | Continuous, because age is normally continuous | Ordinal: once placed into labeled groups (under 30, 30–50, 50+), it becomes ordinal | Applying t-tests to grouped age categories is not valid without additional assumptions
Likert scale responses | Interval, treating "Agree" = 4, "Strongly Agree" = 5 as equal-interval | Ordinal, technically, but often treated as interval with an explicit stated assumption | The assumption affects which parametric tests are defensible
Number of years (2000, 2010, 2020) | Ratio, because they are numbers with equal gaps | Interval: "Year 0" is a calendar convention, not the absence of time | Saying 2020 is "twice the year" of 1010 is meaningless
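One practical guard against the ZIP code mistake is to store such labels as text rather than numbers. In this sketch, with ZIP codes chosen only for illustration, parsing them as integers drops a leading zero and makes averaging possible, while keeping them as strings makes the accidental arithmetic fail loudly.

```python
from statistics import mean

# ZIP codes used purely as example labels.
zip_codes = ["02139", "10001", "94105"]

# Parsing as integers silently drops the leading zero and invites arithmetic.
as_numbers = [int(z) for z in zip_codes]
meaningless_average = mean(as_numbers)  # a number, but not a place

# Keeping them as strings preserves the label and blocks accidental averaging.
try:
    sum(zip_codes) / len(zip_codes)
    averaging_blocked = False
except TypeError:
    averaging_blocked = True

print(as_numbers[0])          # leading zero is gone
print(meaningless_average)
print(averaging_blocked)
```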

Data Types in Real Industries

Every professional field works with a mixture of data types, and the analytical tools each industry uses reflect that mixture directly.

🏥 Healthcare Analytics

Patient records combine nominal (diagnosis codes), ordinal (pain scales), and ratio data (blood pressure, weight). Clinical trials compare continuous outcomes between groups using t-tests and ANOVA, while disease prevalence studies use chi-square tests on nominal categories.

📈 Business Intelligence

Sales data is ratio scale (revenue, units sold), customer segment data is nominal, and NPS scores are ordinal. Analysts use histograms for continuous revenue data and bar charts for categorical product breakdowns.

🎓 Education Research

Test scores are numeric data, usually treated as interval or ratio (enabling mean comparisons across schools), while student self-efficacy surveys produce ordinal Likert data. Graduation status is nominal. Each variable type feeds into a different analysis pipeline.

📱 Social Media Analytics

Like counts are discrete ratio data. Sentiment classifications (positive/neutral/negative) are categorical, usually handled as nominal labels even though they carry a natural order. Engagement rates are continuous ratio. Machine learning on post text starts with unstructured data that gets converted to structured numerical features.

🔬 Scientific Research

Physics experiments typically yield continuous ratio measurements. Biology studies often classify organisms into nominal taxonomic categories. Survey-based psychology research generates primarily ordinal Likert data. The study design determines the data types collected.

Concept Glossary

Term | Plain Definition | Example | Common Misunderstanding
Qualitative Data | Describes categories or labels, not amounts | Eye color, nationality, diagnosis | "Not useful mathematically": it is; chi-square tests use it directly
Quantitative Data | Numerical values where arithmetic makes sense | Income, height, temperature | "All numbers behave the same": ZIP codes are numbers but nominal
Discrete Data | Countable whole-number values | Number of children (3, 4, 5...) | Often confused with nominal data because of whole-number appearance
Continuous Data | Can take any decimal value in a range | Height (170.43 cm) | Assumed to always be precisely measured; precision depends on the instrument
Nominal Data | Unordered categories, labels only | Blood type (A, B, AB, O) | Sometimes mistaken as ordinal when categories seem to have an implied order
Ordinal Data | Ranked categories with unequal gaps | Satisfaction rating (1–5 stars) | Mean is often incorrectly applied without stating the interval assumption
Interval Data | Equal gaps, but zero is arbitrary | Temperature in °C | Confused with ratio because it has a "zero", but that zero is not absence
Ratio Data | Equal gaps with a true zero | Weight in kg, income in $ | Often confused with interval data; the test is whether zero means "none"
Primary Data | Collected firsthand for this study | Your own survey responses | Assumed to always be more reliable than secondary; quality depends on methodology
Secondary Data | Existing data reused from another source | Census records, WHO statistics | Viewed as less valid, but government datasets are often higher quality than small primary studies
Structured Data | Organized into rows and columns | Excel spreadsheet, SQL table | Assumed to be the only valid data type for analysis; unstructured data is increasingly central to AI
Unstructured Data | No fixed format: text, images, audio | Customer reviews, X-ray images | Assumed to be unusable without conversion; ML models process it directly

Frequently Asked Questions

What are the four types of data in statistics?
The four types, classified by measurement scale, are nominal, ordinal, interval, and ratio. Nominal data consists of unordered categories (blood type, country). Ordinal data has ranked categories without equal gaps (satisfaction ratings, education level). Interval data has equal intervals but no true zero (temperature in Celsius). Ratio data has equal intervals and a meaningful zero that represents the complete absence of the quantity (weight, age, income). The four scales come from Stanley Stevens's 1946 classification and are often remembered by the acronym NOIR.
What is the difference between qualitative and quantitative data?
Qualitative data describes categories, labels, or characteristics that cannot be meaningfully added or averaged, such as eye color, nationality, or product category. Quantitative data consists of numbers that represent real counts or measurements where arithmetic makes sense, such as height, number of sales, or exam score. Qualitative data answers "what kind?" while quantitative data answers "how many?" or "how much?" One practical test: if computing an average would be absurd, the data is qualitative.
What type of data is age?
Age is typically continuous ratio data when recorded as a number (e.g., 34 years old). It has a true zero (birth) and equal intervals, and you can validly say one person is twice the age of another. However, if age is recorded as a category ("child," "teenager," "adult"), it becomes ordinal qualitative data. The same underlying variable can change data type depending on how it is measured and recorded. This is one of the most common sources of classification confusion in survey research.
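The shift from ratio to ordinal can be seen in a small sketch. The cutoffs (13 and 20) and the function name `age_group` are assumptions chosen for illustration, not a standard definition of these categories.

```python
def age_group(age_years):
    """Map a numeric age (ratio scale) to an ordered category (ordinal scale)."""
    # Cutoffs below are illustrative assumptions, not an official standard.
    if age_years < 0:
        raise ValueError("age cannot be negative")
    if age_years < 13:
        return "child"
    if age_years < 20:
        return "teenager"
    return "adult"

ages = [8, 15, 34, 68]                      # recorded as numbers: ratio data
groups = [age_group(a) for a in ages]       # recorded as categories: ordinal data

print(ages[3] / ages[2])  # 2.0 -> "twice as old" is meaningful on the ratio scale
print(groups)             # ['child', 'teenager', 'adult', 'adult']
```

Once the ages are binned, 34 and 68 both become "adult": the order of the groups survives, but the size of the gap between two people does not.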
What is the difference between discrete and continuous data?
Discrete data takes specific, countable values with no possible values in between. For example, you can have 28 or 29 students in a class, but not 28.7. Continuous data can take any value within a range, including every decimal: a person might weigh 68.25 kg or 68.3 kg. A practical test: ask whether you can always place another possible value between any two measurements. If the answer is yes, the data is continuous. Both discrete and continuous are subtypes of quantitative data.
What is the difference between interval and ratio data?
Interval scale data has equal, measurable gaps between values, but no true zero. Temperature in Celsius is the standard example: 0°C is an arbitrarily chosen reference point (the freezing point of water), not the complete absence of temperature. Differences are meaningful (10°C is genuinely 10 degrees warmer than 0°C), but ratios are not (20°C is not "twice as hot" as 10°C). Ratio scale data adds a meaningful zero that represents complete absence of the measured quantity. Weight, height, income, and reaction time are all ratio scale because zero means none.
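A quick way to see why Celsius ratios are not meaningful is to convert the same two temperatures to Kelvin, which does have a true zero. The sketch below uses the standard offset of 273.15; the variable names are just for illustration.

```python
def celsius_to_kelvin(celsius):
    """Shift from the Celsius scale (arbitrary zero) to Kelvin (true zero)."""
    return celsius + 273.15

a_c, b_c = 10.0, 20.0
a_k, b_k = celsius_to_kelvin(a_c), celsius_to_kelvin(b_c)

# Differences agree on both scales, so differences are meaningful for interval data.
diff_c = b_c - a_c
diff_k = b_k - a_k
print(round(diff_c, 2), round(diff_k, 2))  # 10.0 10.0

# Ratios disagree, so "twice as hot" is an artifact of where Celsius puts its zero.
ratio_c = b_c / a_c
ratio_k = b_k / a_k
print(f"{ratio_c:.3f} {ratio_k:.3f}")  # 2.000 1.035
```

The difference survives the change of scale, but the ratio drops from 2 to about 1.035. Only the Kelvin ratio reflects the physical quantity.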
Why does data type matter in statistics?
Data type determines which statistical methods are valid. Calculating a mean for nominal data like blood type produces a number that means nothing. A chi-square test works for comparing categorical distributions but is inappropriate for continuous measurements. Misidentifying data type leads to invalid test choices, and invalid tests produce misleading results regardless of how carefully everything else is done. The classification also affects visualization choices: bar charts suit nominal and ordinal data, histograms suit continuous data. Getting the type right is the first step in every legitimate analysis.
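The point about averaging nominal data can be checked directly. In this sketch the blood type sample and both numeric codings are made up; the codings are deliberately arbitrary, which is exactly the problem.

```python
from statistics import mean, mode

blood_types = ["A", "O", "O", "B", "AB", "O", "A"]  # hypothetical sample

# Two equally arbitrary ways to assign numbers to the same labels.
coding_1 = {"A": 1, "B": 2, "AB": 3, "O": 4}
coding_2 = {"O": 1, "A": 2, "B": 3, "AB": 4}

mean_1 = mean(coding_1[b] for b in blood_types)
mean_2 = mean(coding_2[b] for b in blood_types)
print(mean_1, mean_2)  # the "average blood type" changes with the coding

# The mode works on the labels themselves, so it does not depend on any coding.
most_common = mode(blood_types)
print(most_common)  # O
```

Because the mean depends entirely on which numbers were assigned to the labels, it carries no information about the patients. The mode, a frequency-based summary, is valid for nominal data.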
Which statistical tests should I use for each data type?
For nominal data: chi-square test, Fisher's exact test. For ordinal data: Mann-Whitney U test, Kruskal-Wallis test, Spearman rank correlation. For interval and ratio data: t-tests, ANOVA, Pearson correlation, linear regression. The choice also depends on sample size, number of groups, and whether you are comparing means, proportions, or relationships. The statistical test selector on Statistics Fundamentals provides a step-by-step guide for matching your specific situation to the correct test.
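The mapping above can be written as a simple lookup. This is only a sketch of the list in this answer: the names `TESTS_BY_SCALE` and `candidate_tests` are hypothetical, and the result is a starting shortlist, since sample size, number of groups, and the comparison being made still decide the final choice.

```python
# Candidate tests per measurement scale, copied from the list in the text.
TESTS_BY_SCALE = {
    "nominal": ["chi-square test", "Fisher's exact test"],
    "ordinal": ["Mann-Whitney U test", "Kruskal-Wallis test", "Spearman rank correlation"],
    "interval": ["t-test", "ANOVA", "Pearson correlation", "linear regression"],
    "ratio": ["t-test", "ANOVA", "Pearson correlation", "linear regression"],
}

def candidate_tests(scale):
    """Return the shortlist of tests for a measurement scale (case-insensitive)."""
    key = scale.strip().lower()
    if key not in TESTS_BY_SCALE:
        raise ValueError(f"unknown measurement scale: {scale!r}")
    return TESTS_BY_SCALE[key]

print(candidate_tests("Ordinal"))
```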
What is the difference between primary and secondary data?
Primary data is collected directly by the researcher for the specific research question, through surveys, experiments, or direct observation. The researcher controls the measurement process from start to finish. Secondary data comes from existing sources that someone else collected for a different original purpose, such as U.S. Census records, WHO health statistics, or academic datasets. Primary data is expensive and slow to collect but tailored precisely to the question. Secondary data is faster and cheaper to access but may not perfectly match the research needs. Both types can be valid and rigorous, depending on the methodology.
How do I identify the data type of a variable?
Work through the Count, Measure, or Label Framework:
(1) Is the value a name, label, or category? → qualitative. Can it be ranked? → ordinal if yes, nominal if no.
(2) Is the value a number with real meaning? → quantitative. Can it take decimal values? → continuous if yes, discrete if no.
(3) For numbers, does zero mean complete absence? → ratio if yes, interval if no.
Verify by checking which operations produce meaningful results: if ratios make sense, it's ratio scale. If only differences make sense, it's interval. If only rank comparisons make sense, it's ordinal. If only equality checks make sense, it's nominal.
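The three questions can be sketched as a small function. The function name, parameter names, and the returned dictionary are assumptions made for illustration; the answers to each question still have to come from your own judgment about the variable.

```python
def classify_variable(is_label, can_be_ranked=False,
                      allows_decimals=False, zero_means_none=False):
    """Apply the three framework questions and return the resulting classification."""
    # Step 1: names, labels, and categories are qualitative.
    if is_label:
        scale = "ordinal" if can_be_ranked else "nominal"
        return {"kind": "qualitative", "subtype": None, "scale": scale}
    # Step 2: numbers with real meaning are quantitative.
    subtype = "continuous" if allows_decimals else "discrete"
    # Step 3: a zero that means "none" separates ratio from interval.
    scale = "ratio" if zero_means_none else "interval"
    return {"kind": "quantitative", "subtype": subtype, "scale": scale}

# The four measurements from the doctor example at the start of this guide.
blood_type = classify_variable(is_label=True)
pain_level = classify_variable(is_label=True, can_be_ranked=True)
temperature_c = classify_variable(is_label=False, allows_decimals=True)
weight_kg = classify_variable(is_label=False, allows_decimals=True, zero_means_none=True)

for name, result in [("blood type", blood_type), ("pain level", pain_level),
                     ("temperature (°C)", temperature_c), ("weight (kg)", weight_kg)]:
    print(name, result)
```

Treating the 1–10 pain score as a ranked label rather than a true number is itself a judgment call, which is why the function takes the answers as inputs instead of guessing them from the values.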
Sources and References:
NIST/SEMATECH. (2012). e-Handbook of Statistical Methods: Introduction to EDA. itl.nist.gov
Stevens, S.S. (1946). On the theory of scales of measurement. Science, 103(2684), 677–680.
Penn State STAT 500. Applied Statistics: Foundations. online.stat.psu.edu
U.S. Census Bureau. Data.census.gov. data.census.gov
World Health Organization. Global Health Observatory Data Repository. who.int/data/gho
American Psychological Association. Quantitative and Measurement Subfield Overview. apa.org
Moore, D.S., McCabe, G.P., & Craig, B.A. (2017). Introduction to the Practice of Statistics (9th ed.). W.H. Freeman.
MIT OpenCourseWare. (2016). 18.650 Statistics for Applications. ocw.mit.edu