## Use unique() instead of levels() to find the possible values of a factor in R

March 10, 2018

*In a previous version of this blog post, I incorrectly wrote that “Species” is a character variable. Instead, it is a factor. I thank the readers who corrected me in the comments.*

When I first encountered R, I learned to use the levels() function to find the possible values of a categorical variable. However, I recently noticed something very strange about this function.

Consider the built-in data set “iris” and its factor “Species”. Here are the possible values of “Species”, as shown by the levels() function.

    > levels(iris$Species)
    [1] "setosa"     "versicolor" "virginica"

Now, let’s remove all rows containing “setosa”. I will use the table() function to confirm that no rows contain “setosa”, and then I will apply the levels() function to “Species” again.

    > iris2 = subset(iris, Species != 'setosa')
    > table(iris2$Species)
        setosa versicolor  virginica
             0         50         50
    > levels(iris2$Species)
    [1] "setosa"     "versicolor" "virginica"
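The output above shows that levels() reports the levels stored in the factor's definition, not the values that actually occur in the data. As a minimal sketch of the alternative named in the title, the snippet below applies unique() to the same subset. The call to droplevels() is an addition not taken from the original post, and the variable names `declared`, `present`, `iris3`, and `trimmed` are illustrative only.

```r
# Remove all "setosa" rows, as in the example above
iris2 <- subset(iris, Species != "setosa")

# levels() returns every declared level, including ones with zero rows
declared <- levels(iris2$Species)

# unique() returns only the values that still occur in the data;
# as.character() strips the factor's retained level attribute
present <- as.character(unique(iris2$Species))

# droplevels() (an assumption beyond the original post) removes unused
# levels, after which levels() agrees with unique()
iris3 <- droplevels(iris2)
trimmed <- levels(iris3$Species)

print(declared)
print(present)
print(trimmed)
```

Note that unique() applied to a factor still returns a factor that carries the original levels, which is why the sketch converts the result with as.character() before printing.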
