February 3, 2015 24 Comments
One feature that I like about R is the ability to access and manipulate the outputs of many functions. For example, you can extract the kernel density estimates from density() and scale them to ensure that the resulting density integrates to 1 over its support set.
I recently needed to get a frequency table of a categorical variable in R, and I wanted the output as a data table that I can access and manipulate. This is a fairly simple and common task in statistics and data analysis, so I thought that there must be a function in Base R that can easily generate this. Sadly, I could not find such a function. In this post, I will explain why the seemingly obvious table() function does not work, and I will demonstrate how the count() function in the ‘plyr’ package can achieve this goal.
The Example Data Set – mtcars
Let’s use the mtcars data set that is built into R as an example. The categorical variable that I want to explore is “gear” – this denotes the number of forward gears in the car – so let’s view the first 6 observations of just the car model and the gear. We can use the subset() function to restrict the data set to show just the row names and “gear”.
> head(subset(mtcars, select = 'gear')) gear Mazda RX4 4 Mazda RX4 Wag 4 Datsun 710 4 Hornet 4 Drive 3 Hornet Sportabout 3 Valiant 3