Variable types and examples
This article presents the different variable types from a statistical point of view. To learn about the different data types in R, read “Data types in R”.
In statistics, variables are classified into 4 different types:
A quantitative variable is a variable that reflects a notion of magnitude, that is, if the values it can take are numbers. A quantitative variable represents thus a measure and is numerical.
Quantitative variables are divided into two types: discrete and continuous.
Quantitative discrete variables are variables for which the values it can take are countable and have a finite number of possibilities. The values are often (but not always) integers. Here are some examples of discrete variables:
- Number of children per family
- Number of students in a class
- Number of citizens of a country
Even if it would take a long time to count the citizens of a large country, it is still doable. Moreover, for all examples, the number of possibilities is finite. Whatever the number of children in a family, it will never be 3.58 or 7.912 so the number of possibilities is a finite number and thus countable.
On the other hand, continuous variables are variables for which the values are not countable and have an infinite number of possibilities. For example:
For simplicity, we usually referred to years, kilograms (or pounds) and centimeters (or feet and inches) for age, weight and height respectively. However, a 28-year-old man could actually be 28 years, 7 months, 16 days, 3 hours, 4 minutes, 5 seconds, 31 milliseconds, 9 nanoseconds old. For all measurements, we usually stop at a standard level of granularity, but nothing (except our measurement tools) prevents us from going deeper, leading to an infinite number of potential values. The fact that the values can take an infinite number of possibilities makes it uncountable.
In opposition to quantitative variables, qualitative variables (also referred as categorical variables or factors) are variables that are not numerical and which values fits into categories. In other words, a qualitative variable is a variable which takes as its values modalities, categories or even levels, in contrast to quantitative variables which measure a quantity on each individual.
Qualitative variables are divided into two types: nominal and ordinal.
A nominal variable is a qualitative variable where no ordering is possible or implied in the levels. For example, the variable gender is nominal because there is no order in the levels female/male. Eye color is another example of a nominal variable because there is no order among blue, brown or green eyes. A nominal variable can have between two levels (e.g., do you smoke? Yes/No or what is your gender? Female/Male) and a large number of levels (what is your college major? Each major is a level in that case).
On the other hand, an ordinal variable is a qualitative variable with an order implied in the levels. For instance, the variable “severity of road accidents” is ordinal because there is a clear order in the levels light/moderate/fatal.
There exists two main variable transformations:
- From a continuous to a discrete variable
- From a quantitative to a qualitative variable
From continuous to discrete
Let’s say we are interested in babies’ ages. The data collected is the age of the babies, so a continuous variable. However, we may work with only the number of weeks since birth and thus transforming the age into a discrete variable. The variable age remains a continuous variable but the variable we are working on (i.e., the number of weeks since birth) is a discrete variable.
From quantitative to qualitative
Let’s say we are interested in the Body Mass Index (BMI). For this, a researcher collects data on height and weight of individuals and computes the BMI. The BMI is a continuous variable but the researcher may want to turn it into a qualitative variable by categorizing individuals below a certain threshold as underweighted, above a certain threshold as overweighted and the rest as normal weight. The BMI is a continuous variable but it has been transformed to another variable, which is now a qualitative (ordinal) variable.
Same goes for age when age is transformed to an ordinal variable with levels such as minors, adults and seniors. It is also often the case (especially in surveys) that the variable salary (continuous) is transformed into an ordinal variable with different range of salaries (e.g., < 1000€, 1000 - 2000€, > 2000€).
The reason why we often class variables into different types is because not all statistical analyses can be performed on all variable types. For instance, it is impossible to compute the mean of the variable “hair color” as you cannot add brown and blond hair. On the other hand, finding the mode of a continuous variable does not really make any sense because most of the time there will not be two exact same values, so there will be no mode. And even in the case there is a mode, there will be very few observations with this value. As an example, try finding the mode of the height of the students in your class. If you are lucky, a couple of students will have the same size. However, most of the time, every student will have a different size (especially if the measurements include several decimals) and thus there will be no mode. To see what kind of analysis is possible on each type of variable, see “Descriptive statistics by hand” or “Descriptive statistics in R”.
Last but not least, in datasets it is very often the case that numbers are assigned to qualitative variables. For instance, a study may assign the number “1” to women and the number “2” to men (or “0” to the answer “No” and “1” to the answer “Yes”). Despite the numerical classification, the variable gender is still a qualitative variable and not a discrete variable as it may look. The numerical classification is used only for data analysis. It is indeed easier to write “1” or “2” instead of “women” or “men”, and thus less prone to encoding errors. If you face this kind of setup, do not forget to transform your variable into the right type before performing any statistical analyses.
Thanks for reading. I hope this article helped you to understand the different types of variable. If you would like to learn more about the different data types in R, read the article “Data types in R”.
As always, if you have a question or a suggestion related to the topic covered in this article, please add it as a comment so other readers can benefit from the discussion. If you find a mistake or bug, you can inform me by raising an issue on GitHub. For all other requests, you can contact me here.
Get updates every time a new article is published by subscribing to this blog.