The advantages of using count() to get N-way frequency tables as data frames in R

February 12, 2015 5 Comments

Introduction

I recently showed how to use the count() function in the “plyr” package to produce 1-way frequency tables in R. Several commenters suggested alternative ways of doing so, and I appreciate all of them. Today, I want to extend that tutorial by demonstrating how count() can produce N-way frequency tables in the list format. This highlights the advantages of count() over other functions like table() and xtabs().

2-Way Frequencies: The Cross-Tabulated Format vs. The List-Format

A 2-way frequency table gives the number of observations in each combination of the levels of 2 categorical variables. You can display it in a cross-tabulated format or in a list format.

In R, the xtabs() function is good for cross-tabulation. Let’s use the “mtcars” data set again; recall that it is a built-in data set in Base R.

> y = xtabs(~ cyl + gear, mtcars)
> y
   gear
cyl  3  4  5
  4  1  8  2
  6  2  4  1
  8 12  0  2

This is a nice way to visualize the counts in each of the 9 combinations of “cyl” and “gear”. You can use the row and column indices of this object to extract a particular value. For example, to extract the element in the third row and first column (cyl = 8, gear = 3):

> y[3,1]
[1] 12
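As a sanity check, a single cell of the cross-tabulation can also be computed directly in base R: it is just the number of rows that satisfy both conditions. A minimal sketch (no packages needed; n_8_3 is an illustrative name):

```r
# A cell of the cross-tabulation is the number of rows of mtcars that satisfy
# both conditions; summing a logical vector counts its TRUE values
n_8_3 <- sum(mtcars$cyl == 8 & mtcars$gear == 3)
print(n_8_3)   # 12, the same value as y[3,1]
```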

Alternatively, you can use the count() function in the “plyr” package to get the same frequencies in a list format.

> library(plyr)
> x = count(mtcars, c('cyl', 'gear'))
> x
  cyl gear freq
1   4    3    1
2   4    4    8
3   4    5    2
4   6    3    2
5   6    4    4
6   6    5    1
7   8    3   12
8   8    5    2

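Notice that this object is a data frame. Its column names come from the grouping variables, plus a “freq” column for the counts. It also has only 8 rows rather than 9: count() lists only the combinations that actually occur, so cyl = 8 with gear = 4 (a count of 0 in the cross-tabulation) does not appear.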

> class(x)
 [1] "data.frame"

> names(x)
 [1] "cyl"   "gear"   "freq"
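For comparison, base R can flatten a cross-tabulation into a similar list with as.data.frame(). The sketch below assumes only base R. Unlike count(), the result names its count column “Freq” and keeps zero-count combinations, so it has 9 rows until the zeros are dropped:

```r
# Cross-tabulate cyl by gear, then flatten the table into one row per
# combination of levels (base R only; plyr is not needed for this sketch)
y <- xtabs(~ cyl + gear, data = mtcars)
y_df <- as.data.frame(y)

print(y_df)          # columns: cyl, gear, Freq (cyl and gear become factors)
print(nrow(y_df))    # all 3 x 3 = 9 combinations, including the zero count

# Dropping zero counts leaves the same 8 combinations that count() reports
y_nonzero <- subset(y_df, Freq > 0)
print(nrow(y_nonzero))
```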

You can access any particular count in 2 ways:

Use the row and/or column indices.
Use particular values of “cyl” and “gear”.

For example, to find the number of cars with cyl = 8 and gear = 3, you can do

> x[7, ]$freq
[1] 12
> subset(x, cyl == 8 & gear == 3)$freq
[1] 12

I prefer the second method because I don’t need to inspect the output table to find which row contains a particular combination of “cyl” and “gear”. This is a key advantage of the list format over the cross-tabulated format.
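The second method generalizes to any number of variables. Below is a hypothetical helper, lookup_freq(), written in base R for illustration (it is not part of plyr). It finds a count by naming the level of each variable and returns 0 when no row matches, since count() omits combinations that never occur. So that the sketch runs without plyr, the list-format table is built with as.data.frame(xtabs()) and filtered to nonzero counts; the data frame returned by count() above should work the same way, because it also has a “freq” column.

```r
# Hypothetical helper (not part of plyr or base R): return the count for one
# combination of levels in a list-format frequency table. Pass the levels as
# name = value pairs, one per grouping variable, for any number of variables.
lookup_freq <- function(freq_df, ..., count_col = "freq") {
  keys <- list(...)
  keep <- rep(TRUE, nrow(freq_df))
  for (nm in names(keys)) {
    # Keep only rows whose value in column nm equals the requested level
    keep <- keep & freq_df[[nm]] == keys[[nm]]
  }
  # No matching row means the combination was never observed: its count is 0
  if (!any(keep)) return(0L)
  sum(freq_df[[count_col]][keep])
}

# Build a list-format table in base R so this sketch runs without plyr
x <- as.data.frame(xtabs(~ cyl + gear, data = mtcars), responseName = "freq")
x <- subset(x, freq > 0)  # mimic count(), which omits zero counts

lookup_freq(x, cyl = 8, gear = 3)  # 12
lookup_freq(x, cyl = 8, gear = 4)  # 0
```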

N-way frequencies: N > 2

Another key advantage of the list format over the cross-tabulation format is in obtaining frequency tables for 3 or more factors.

Cross-tabulations for N-way frequencies are difficult to visualize when N > 2. If N = 3, the best you can do is to display multiple 2-way tables, one for each value of the third factor. For example,

> w = xtabs(~ cyl + gear + vs, mtcars)
> w
, , vs = 0

   gear
cyl  3  4  5
  4  0  0  1
  6  0  2  1
  8 12  0  2

, , vs = 1

   gear
cyl  3  4  5
  4  1  8  1
  6  2  2  0
  8  0  0  0

Moreover, it is now even more cumbersome to access the count for a particular combination of these 3 factors, because you must know the position of each level in each of the 3 dimensions.
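To make the contrast concrete, here is a minimal base-R sketch of positional indexing into the 3-way table. Each index must match the position of a level within its dimension, so you need to know that cyl = 8 is the third row, gear = 3 is the first column, and vs = 0 is the first slice:

```r
# 3-way cross-tabulation: dimensions are cyl (rows), gear (columns), vs (slices)
w <- xtabs(~ cyl + gear + vs, data = mtcars)
print(dim(w))        # 3 3 2

# Positional indexing: cyl = 8 is row 3, gear = 3 is column 1,
# and vs = 0 is slice 1
print(w[3, 1, 1])    # 12
```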

In contrast, the list format works the same way for any N: each combination of levels occupies one row, so an N-way frequency table is as easy to read and query as a 2-way one.

> z = count(mtcars, c('cyl', 'gear', 'vs'))
> z
   cyl gear vs freq
1    4    3  1    1
2    4    4  1    8
3    4    5  0    1
4    4    5  1    1
5    6    3  1    2
6    6    4  0    2
7    6    4  1    2
8    6    5  0    1
9    8    3  0   12
10   8    5  0    2
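For readers without plyr, a comparable 3-way list can be sketched in base R by flattening the xtabs() object and dropping the zero counts (an alternative to count(), not how count() itself is implemented). The lookup then works exactly as in the 2-way case:

```r
# Flatten the 3-way cross-tabulation into one row per (cyl, gear, vs) combination
w_df <- as.data.frame(xtabs(~ cyl + gear + vs, data = mtcars),
                      responseName = "freq")
print(nrow(w_df))    # 3 x 3 x 2 = 18 combinations

# Keep only observed combinations, as count() does
w_obs <- subset(w_df, freq > 0)
print(nrow(w_obs))   # 10, the same number of rows as the count() output

# Looking up one combination works exactly as in the 2-way case
print(subset(w_obs, cyl == 8 & gear == 3 & vs == 0)$freq)   # 12
```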


5 Responses to The advantages of using count() to get N-way frequency tables as data frames in R

Tim says:

February 12, 2015 at 11:32 pm

As for visualization, there is also the data.frame(table()) method, which produces a nice data.frame object instead of the default array.

Jean Plamondon says:

February 13, 2015 at 2:25 am

If it’s a matter of visualizing in the sense of “concise presentation”, then ftable(xtabs(~ vs + cyl + gear, mtcars)) doesn’t do such a bad job. Always good to revisit the subject anyway. Cheers

- Eric Cai - The Chemical Statistician says:
  
  February 13, 2015 at 9:39 am
  
  That’s very useful, Jean! Thanks for sharing!
  
Misha says:

February 13, 2015 at 2:29 pm

If you like count(), I would recommend switching over to dplyr’s count() verb (http://blog.rstudio.org/2014/10/13/dplyr-0-3-2/) – similar syntax and orders of magnitude faster.

- Eric Cai - The Chemical Statistician says:
  
  February 13, 2015 at 2:55 pm
  
  Thanks, Misha! I didn’t know about the faster execution – thanks for sharing!
  
The Chemical Statistician

Eric’s Twitter Feed (@chemstateric)
