Q. What if a categorical predictor in a regression model has more than two groups?
One challenge with using dummy variables in a regression model is that a single dummy variable can only distinguish between two groups. Including gender is easy when it is recorded with only two groups, male and female. But what if you wanted to fit a model using a participant’s education level, religion, political affiliation, or marital status? These predictors have more than two categories.
You can solve this problem by using multiple dummy variables, each of which marks membership in one group. In general, you will have one fewer dummy variable than you have groups: if the model has an intercept, a dummy variable for every group would be perfectly collinear with it, so one group has to be left out. Start by selecting that group as the baseline. In many cases, there is a natural baseline. For example, we might select participants who are atheists (i.e., not religious) as the baseline for a religion variable.
Next, create a dummy variable for each of the other categories. Continuing with the above example, we might have dummy variables for whether or not the participant is Christian, whether or not the participant is Muslim, whether or not the participant is Buddhist, and so on. Participants in the baseline group have a value of 0 on every dummy variable. The coefficient on each dummy variable represents the difference in mean response between the group that dummy variable marks and the baseline group, holding any other predictors in the model constant.
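The following is a minimal sketch of this coding scheme in Python with NumPy, not a prescribed workflow. The data are made up for illustration, and the names (religion, response, baseline, levels, effects) are assumptions rather than anything from a real study. It builds one 0/1 column for each non-baseline group, fits the model by ordinary least squares, and collects the fitted coefficients.

```python
import numpy as np

# Hypothetical toy data: a religion label and a made-up numeric response
# for eight participants (two per group).
religion = ["Atheist", "Christian", "Muslim", "Buddhist",
            "Atheist", "Christian", "Muslim", "Buddhist"]
response = np.array([10.0, 14.0, 11.0, 17.0, 12.0, 16.0, 13.0, 19.0])

# The baseline group gets no dummy variable of its own,
# so k groups produce k - 1 dummy variables.
baseline = "Atheist"
levels = [g for g in sorted(set(religion)) if g != baseline]

# Design matrix: an intercept column of ones, then one 0/1 column
# per non-baseline group.
X = np.column_stack(
    [np.ones(len(religion))]
    + [[1.0 if r == g else 0.0 for r in religion] for g in levels]
)

# Ordinary least squares fit of response on the intercept and dummies.
coef, *_ = np.linalg.lstsq(X, response, rcond=None)

# The intercept estimates the baseline group's mean response; each dummy
# coefficient estimates that group's mean minus the baseline mean.
intercept = coef[0]
effects = dict(zip(levels, coef[1:]))

print("baseline mean:", round(float(intercept), 2))
for group, diff in effects.items():
    print(group, "vs", baseline, ":", round(float(diff), 2))
```

Because this toy model has no predictors other than the dummy variables, the intercept matches the baseline group's sample mean and each coefficient matches the gap between that group's sample mean and the baseline's. With additional predictors in the model, the coefficients would instead be adjusted differences.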