The advantages of using count() to get N-way frequency tables as data frames in R

Introduction

I recently introduced how to use the count() function in the “plyr” package to produce 1-way frequency tables in R.  Several commenters suggested alternative ways of doing so, and they are all appreciated.  Today, I want to extend that tutorial by demonstrating how count() can be used to produce N-way frequency tables in the list format.  This highlights the advantages of count() over other functions like table() and xtabs().

 

2-Way Frequencies: The Cross-Tabulated Format vs. The List Format

A 2-way frequency table (i.e. a frequency table of the counts of a data set as divided by 2 categorical variables) can be displayed in a cross-tabulated format or in a list format.

In R, the xtabs() function is good for cross-tabulation.  Let’s use the “mtcars” data set again; recall that it is a built-in data set in Base R.

> y = xtabs(~ cyl + gear, mtcars)
> y
          gear
 cyl      3     4     5
 4        1     8     2
 6        2     4     1
 8        12    0     2

This is a nice way to visualize the counts in each of the 9 different categories as divided by the variables “gear” and “cyl”.  You can use the row and column indices of this object to extract a particular value.  For example, to extract the element in the third row and first column,

> y[3,1]
[1] 12

Alternatively, you can use the count() function in the “plyr” package to get the same frequencies in a list format.

> library(plyr)
> x = count(mtcars, c('cyl', 'gear'))
> x
         cyl     gear      freq
 1       4       3         1
 2       4       4         8
 3       4       5         2
 4       6       3         2
 5       6       4         4
 6       6       5         1
 7       8       3         12
 8       8       5         2

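Notice that this object is a data frame.  Its column names are the names of the variables that you counted by, plus “freq” for the counts.  Notice also that it has 8 rows rather than 9; the combination cyl = 8 and gear = 4 has no cars, so it is not listed.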

> class(x)
 [1] "data.frame"
> names(x)
 [1] "cyl"   "gear"   "freq"

You can access any particular element by 2 methods:

  1. Use the row and/or column indices.
  2. Use particular values of “cyl” and “gear”.

For example, to find the number of cars with cyl = 8 and gear = 3, you can do

> x[7, ]$freq
 [1] 12
> subset(x, cyl == 8 & gear == 3)$freq
 [1] 12

I like the second method, because I don’t have to look at the values of the output table to find which row contains that particular combination of “cyl” and “gear”.  This is a key advantage of the list format over the cross-tabulation format.
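Because the result is an ordinary data frame, the usual data-frame tools apply to it directly.  Here is a minimal sketch of a few things that I might do with it; it assumes that the “plyr” package is installed, and the sorting and filtering steps are just illustrations of my own choosing.

```r
library(plyr)  # provides count(df, vars)

# Rebuild the 2-way frequency table in list format.
x <- count(mtcars, c("cyl", "gear"))

# Look up one combination by value, without knowing its row number.
with(x, freq[cyl == 8 & gear == 3])

# Sort the combinations from most to least frequent.
x[order(-x$freq), ]

# Keep only the combinations that occur at least 4 times.
subset(x, freq >= 4)
```

None of these steps requires knowing where a combination sits in the table; they all work from the values of “cyl”, “gear” and “freq”.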

 

N-Way Frequencies: N > 2

Another key advantage of the list format over the cross-tabulation format is in obtaining frequency tables for 3 or more factors.

Cross-tabulations for N-way frequencies are difficult to visualize when N > 2.  If N = 3, the best that you can do is to use multiple tables, one for each value of the third factor.  For example,

> w = xtabs(~ cyl + gear + vs, mtcars)
> w
 , , vs = 0

       gear
 cyl    3  4  5
 4      0  0  1
 6      0  2  1
 8     12  0  2

 , , vs = 1

       gear
 cyl    3  4  5
 4      1  8  1
 6      2  2  0
 8      0  0  0

Moreover, it is now even more cumbersome to access the value of a particular combination of these 3 factors.

In contrast, the list format works in the same way for any value of N, so an N-way frequency table is just as easy to read as a 2-way one.

> t = count(mtcars, c('cyl', 'gear', 'vs'))
> t
        cyl    gear      vs      freq
 1      4      3         1       1
 2      4      4         1       8
 3      4      5         0       1
 4      4      5         1       1
 5      6      3         1       2
 6      6      4         0       2
 7      6      4         1       2
 8      6      5         0       1
 9      8      3         0       12
 10     8      5         0       2
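Accessing a particular combination works just as it did in the 2-way case.  The sketch below assumes that the “plyr” package is installed; the object names t3 and t4 and the choice of “am” as a fourth factor are my own, picked only for illustration.

```r
library(plyr)  # provides count(df, vars)

# The 3-way frequency table in list format.
t3 <- count(mtcars, c("cyl", "gear", "vs"))

# Same lookup pattern as in the 2-way case, with one more condition.
subset(t3, cyl == 4 & gear == 5 & vs == 1)$freq

# count() lists only the combinations that occur in the data, so a
# combination with no cars gives an empty result rather than 0.
subset(t3, cyl == 8 & gear == 4 & vs == 0)$freq

# Adding a fourth factor ("am") needs no change in approach.
t4 <- count(mtcars, c("cyl", "gear", "vs", "am"))
head(t4)
```

One thing to keep in mind is that the list format omits combinations with zero counts, whereas the cross-tabulation shows them explicitly as 0.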

5 Responses to The advantages of using count() to get N-way frequency tables as data frames in R

  1. Tim says:

    As for visualization, there is also the data.frame(table()) method, which produces a nice data.frame object instead of the default array.

  2. Jean Plamondon says:

    If it’s a matter of visualizing in the sense of “concise presentation”, then ftable(xtabs(~ vs + cyl + gear, mtcars)) doesn’t do such a bad job. Always good to revisit the subject anyway. Cheers

  3. Misha says:

    If you like count(), I would recommend switching over to dplyr’s count() verb (http://blog.rstudio.org/2014/10/13/dplyr-0-3-2/) – similar syntax and orders of magnitude faster.
