How do I identify variables as numeric (quantitative) or categorical (qualitative)?
Answer
A categorical variable takes one of a fixed, limited number of groups (gender, colors of the rainbow, brands of cereal), while a numeric variable is generally something that can be measured (height, weight, miles per hour). Categorical variables are easy to identify when the groups are specified with words, because you can’t perform mathematical operations on a word. However, if the variable is represented numerically, consider the characteristics of the variable instead of automatically assuming it’s numeric.
Here are some criteria to consider:
- Do the numbers represent categories? For example, gender is often coded with “0” and “1” in a dataset, but it’s still a categorical variable.
- Is there a small, fixed set of possible values the variable could take? For example, the variable “number of car doors” will probably only have the values “2” or “4”. In this case, the variable can be treated as categorical.
- Is the variable measured on a continuous scale? Another way to think about this is to ask whether it can be measured. Variables like height and weight are good examples of numeric variables that meet this criterion.
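
The criteria above can be roughly sketched in code. This is a minimal illustration in Python, not a definitive rule: the function name `classify_variable`, the `max_categories` cutoff of 5, and the sample data are all assumptions made for the example. A count of distinct values cannot tell whether numbers are really codes for groups, so the result is only a first guess to check against what the variable actually means.

```python
def classify_variable(values, max_categories=5):
    """Make a first guess at whether a column is categorical or numeric.

    max_categories is an arbitrary cutoff chosen for this sketch.
    """
    # Groups specified with words are categorical: you can't do math on a word.
    if not all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in values):
        return "categorical"
    # Numbers that take only a few distinct values often represent
    # categories (e.g. 0/1 codes, or 2 vs. 4 car doors).
    if len(set(values)) <= max_categories:
        return "categorical"
    # Many distinct numeric values suggest a measured quantity.
    return "numeric"


# Made-up example columns, for illustration only.
columns = {
    "cereal_brand": ["Oaties", "Branz", "Oaties", "Cornos"],
    "gender_code": [0, 1, 1, 0, 1, 0, 0, 1],
    "car_doors": [2, 4, 4, 2, 4, 4, 2, 4],
    "height_cm": [150.2, 161.5, 172.0, 168.3, 180.1, 175.4, 158.9, 165.0],
}

for name, values in columns.items():
    print(name, "->", classify_variable(values))
```

With this sample data, only `height_cm` is guessed to be numeric. A small sample of a measured variable could still fall under the cutoff and be mislabeled, which is why the guess should be checked against the variable's definition.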