Use unique() instead of levels() to find the possible values of a factor in R

*In a previous version of this blog post, I incorrectly wrote that “Species” is a character variable.  Instead, it is a factor.  I thank the readers who corrected me in the comments.

When I first encountered R, I learned to use the levels() function to find the possible values of a categorical variable.  However, I recently noticed that this function can report values that no longer appear in the data.

Consider the built-in data set “iris” and its factor “Species”.  Here are the possible values of “Species”, as shown by the levels() function.

> levels(iris$Species)

[1] "setosa" "versicolor" "virginica"

Now, let’s remove all rows containing “setosa”.  I will use the table() function to confirm that no rows contain “setosa”, and then I will apply the levels() function to “Species” again.

> iris2 = subset(iris, Species != 'setosa')
> table(iris2$Species)

    setosa versicolor  virginica 
         0         50         50 


> levels(iris2$Species)

[1] "setosa" "versicolor" "virginica"
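
Even though no rows contain “setosa”, levels() still reports it, because subsetting a data frame keeps each factor's original set of levels.  As a minimal sketch of the alternative named in the title, the code below uses unique() to list only the values that actually occur.  It also shows droplevels(), a base R function that this excerpt does not mention and that I include here as an assumption about what a reader might want next.  The names "present" and "iris3" are hypothetical and chosen only for illustration.

```r
# Rebuild the subset used above
iris2 <- subset(iris, Species != "setosa")

# unique() returns only the values that actually occur in the data
present <- unique(iris2$Species)
print(as.character(present))

# The result of unique() is still a factor, so its levels attribute
# still carries the unused level "setosa"
print(levels(present))

# droplevels() discards factor levels that have no observations
iris3 <- droplevels(iris2)
print(levels(iris3$Species))
```

Converting the result of unique() with as.character() gives a plain character vector of the observed values, while droplevels() is the option to consider when the unused level should also disappear from later table() output.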

