Side-by-Side Box Plots with Patterns From Data Sets Stacked by reshape2 and melt() in R

Introduction

A while ago, one of my co-workers asked me to group box plots by plotting them side-by-side within each group, and he wanted to use patterns rather than colours to distinguish between the box plots within a group; the publication that will display his plots prints in black-and-white only.  I gladly investigated how to do this in R, and I want to share my method and an example of what the final result looks like with you.

In generating a fictitious data set for this example, I will also demonstrate how to use the melt() function from the “reshape2” package in R to stack a data set while keeping categorical labels for the individual observations.  For now, here is a sneak peek at what we will produce at the end; the stripes are the harder pattern to produce.

triple box plots with patterns

Read the rest of this post to learn how to generate side-by-side box plots with patterns like the ones above!

Generating the Data

Due to confidentiality, I cannot use my co-worker’s data set on my public blog, so I generated a data set for my example of pollution in 3 cities involving 2 gases.

# set the working directory to your folder of choice
setwd("INSERT YOUR DIRECTORY PATH HERE")

# set seed for random number generation to allow you to replicate my data
set.seed(1)
# set the standard deviation for the data
sd = 3
# generate fictitious pollution data for New York, London, and Los Angeles
co2.ny = rnorm(120, 7, sd)
co2.london = rnorm(120, 9, sd)
co2.la = rnorm(120, 10.5, sd)
ch4.ny = c(rnorm(100, 12, sd), rep(NA, 20))
ch4.london = c(rnorm(100, 15, sd), rep(NA, 20))
ch4.la = c(rnorm(100, 18, sd), rep(NA, 20))

# set vectors for labelling the data with location and pollutant
location = c('New York', 'London', 'Los Angeles', 'New York', 'London', 'Los Angeles')
pollutant = c('CO2', 'CO2', 'CO2', 'CH4', 'CH4', 'CH4')

# combine the data
all.data = data.frame(rbind(co2.ny, co2.london, co2.la, ch4.ny, ch4.london, ch4.la))

# add locations and pollutants to the data
all.data$location = location
all.data$pollutant = pollutant

I used the set.seed() function to allow you to replicate my randomly generated numbers from various normal distributions.  There are 2 important notes about my code above:

  • I added 20 values of “NA” for CH4 for all 3 cities.  This was done to ensure that the resulting data frame has the same number of observations for all 6 of the vectors.  (Try to combine the data into a data frame without doing this, and look at the resulting data frame – it’s not what we want.)
  • The data frame all.data is sorted in the same order as the vectors are listed in the rbind() function when all.data was created.  As a good data manipulation practice, it is always a good idea to explore your data set; specifically, use the head() function to look at a newly generated data table or matrix and check that it is correct or conforms to the correct format.

 

Stacking the Data

I ultimately need a data table that has 3 columns:

  • the concentration of the pollutant (the value to be plotted)
  • the name of the pollutant
  • the city

Thus, I need to stack the data frame all.data while retaining the name of the pollutant and the city.  The melt() function from the “reshape2” package can do this.  (Install it first if you don’t have it yet.)

 

# open the reshape2 library
library(reshape2)
# stack the data while retaining the location and pollutant label by 
stacked.data = melt(all.data, id = c('location', 'pollutant'))
# remove the column that gives the column name of the concentration from all.data
stacked.data = stacked.data[, -3]

 

If you look at the data table stacked.data when it is first created by melt(), it has the extra, unnecessary column that I deleted in the last line of code.

 

Generating the Box Plots

Now that the data set has the needed format, let’s plot the box plots.  Note that my code below

  • creates an object called boxplots.triple for the box plots that I will use later in the code
  • uses the formula “value ~ location + pollutant” to separate the plots into the 2 groups of triplets
  • uses the “at = ” option to specify the locations of the box plots along the horizontal axis – I experimented with the horizontal positions to find the necessary separation for side-by-side box plots
  • suppresses the default horizontal axis with the option xaxt = ‘n’
  • adds my custom axis and title with the axis() and title() functions, respectively

 

png('triple box plots with patterns.png')
boxplots.triple = boxplot(value~location + pollutant, data = stacked.data, at = c(1, 1.8, 2.6, 6, 6.8, 7.6), xaxt='n', ylim = c(min(0, min(co2.ny, co2.london, co2.la)), max(ch4.ny, ch4.london, ch4.la, na.rm = T)), col = c('white', 'white', 'gray'))
axis(side=1, at=c(1.8, 6.8), labels=c('Methane (ppb)\nNumber of Collections = 100', 'Carbon Dioxide (ppb)\nNumber of Collections = 120'), line=0.5, lwd=0)
title('Comparing Pollution in London, Los Angeles, and New York')

 

Note that the resulting box plot from above gives the grey pattern to the right-most box plot (New York) for each pollutant.  That was easy with the “col = ” option in boxplot().  I now have 2 patterns: white and grey.  A natural third pattern would be stripes, and this is the (moderately) hard part.

The code below draws a rectangle with stripes over the middle (Los Angeles) box plots.  Based on the horizontal locations that I set above and the resulting inter-quartile ranges of the 2 box plots for Los Angeles, I specified where these 2 striped rectangles should be plotted.

Let’s look for the interquartile ranges for Los Angeles.  Recall that I created the object boxplots.triple from the boxplot() function.  One of the outputs from this object is the collection of the 5-number summaries underlying the box plots; another output is the name of each 5-number summary as given by the city-pollutant label.

 

Screen Shot 2014-04-10 at 8.13.50 PM

 

 

 

 

 

 

I now have all of the information needed to draw my 2 striped rectangles for Los Angeles.  (Notice that the order of the cities have been changed from the original order in rbind() to alphabetical order in box plots’ statistics.)

  • The left coordinates for the 2 rectangles are the left coordinates of the middle box plots as given in the “at =” option in boxplot() above.  In my example, they are 1.4 and 6.4.
  • The right coordinates for the 2 rectangles are the right coordinates of the middle box plots as given in the “at =” option in boxplot() above.  In my example, they are 2.2 and 7.2.
  • The bottom coordinates for the 2 rectangles are the 1st quartiles of the middle box plots as shown in boxplots.statistics above.  In my example, they are the 2nd and 5th columns in the 2nd row: 15.713810 and 9.228822.
  • The top coordinates for the 2 rectangles are the 3rd quartiles of the middle box plots as shown in boxplots.statistics above.  In my example, they are the 2nd and 5th columns in the 4th row: 20.109108 and 12.926979.

I encourage you to read the documentation of rect() to learn more about how it works.

# draw the rectangles for Los Angeles, specifying the left, bottom, right, and top positions of the box plots, the density of the lines, and the angles of the lines
rect(c(1.4, 6.4), boxplots.triple$stats[2, c(2, 5)], c(2.2, 7.2), boxplots.triple$stats[4, c(2, 5)], density=12, angle=45)
# add city labels near the box plots
text(c(1, 1.8, 2.6, 6, 6.8, 7.6), c(6, 8, 1, 17.75, 20, 15.5), c('London', 'Los\nAngeles', 'New\nYork', 'London', 'Los\nAngeles', 'New\nYork'))
dev.off()

 

For visual ease, I added the labels of the cities directly below or above each box plot.  I chose not to use a legend because

  • it’s tiring to look back and forth between the box plots and the legend to determine which pattern denotes which city.  If there are only 2 cities, then it’s OK, but 3 or more makes it hard to read.
  • The legend() function does not work with both the “fill = ” option and the “density = ” option; it seems like it can only work with 1 of them.  I searched extensively but unsuccessfully for a solution to this problem.  If anybody knows how to solve this problem, please tell me in the comments!

triple box plots with patterns

While this is a nice outcome and likely useful for many other situations with grouped box plots, this approach does seem cumbersome and inelegant.  Do you have any suggestions for how else to plot side-by-side box plots with patterns?  If so, please leave them in the comments below!

 

About these ads

9 Responses to Side-by-Side Box Plots with Patterns From Data Sets Stacked by reshape2 and melt() in R

  1. Not exactly the same, but your post made me go a little crazy:

    mult.boxplot<-function(list,labflip=TRUE){
    g1levels<-levels(as.factor(list[,1]))
    g2levels<-levels(as.factor(list[,2]))
    listyy<-1:(length(levels(as.factor(list[,1])))*length(levels(as.factor(list[,2]))))

    firstsplit<-(length(g1levels)*length(g2levels))/length(levels(as.factor(list[,1])))
    secondsplit<-(firstsplit/(length(levels(as.factor(list[,2])))))*1

    margins<-(listyy*secondsplit)
    margfunc<-function(l){
    if ((l-1)%%(length(g2levels))==0) {margins<<-c(margins[1:l-1],margins[l:length(margins)]+1)}
    }
    fulllist<-2:length(margins)
    bettermargins<-do.call(rbind,lapply(fulllist,margfunc))
    pick<-length(bettermargins[,1])
    finalmargins<-bettermargins[pick,]
    otherlist<-1:length(margins)
    func<-function(i){
    ( (finalmargins[i]+finalmargins[i+length(levels(as.factor(list[,2])))-1])/2)
    }
    funkl<-function(l){
    if (l==1) l
    else if (l%%length(g2levels)-1==0) l
    }
    bens<-do.call(rbind,lapply(otherlist,funkl))
    oddlist<-subset((1:(2*length(g2levels))),(1:(2*length(g2levels)))%%2==1)

    labelplacement<<-do.call(rbind,lapply(bens,func))
    #png('triple box plots with patterns.png')
    boxplots.triple = boxplot(list[,3]~as.factor(list[,1]) + as.factor(list[,2]),
    data = tester,
    at = finalmargins,
    xaxt='n',
    ylim = c(min(y),
    max(y, na.rm = T)),
    col = c('white', if (length(g2levels)==2){'gray'}
    else if (length(g2levels)==3){c('white','gray')}
    else if (length(g2levels)==4){c('white','gray', 'white')}
    else if (length(g2levels)==5){c('white','gray', 'white','white')}))
    par(las=1)
    axis(side=1, at=labelplacement,
    labels=c(g1levels),
    line=3, lwd=0)
    if (labflip==TRUE) par(las=2)
    axis(side=1, at=finalmargins,
    labels=rep(g2levels,length(g1levels)),
    line=-.5, lwd=0)

    title(paste('Comparing', deparse(substitute(y)), 'across levels of', deparse(substitute(factor1)), 'and', deparse(substitute(factor2))))
    if (length(g2levels)==4 & length(g1levels)==2){
    rect(finalmargins[c(2,6)]-.4, boxplots.triple$stats[2, c(2, 6)], finalmargins[c(2,6)]+.4, boxplots.triple$stats[4, c(2, 6)], density=12, angle=45)
    rect(finalmargins[c(4,8)]-.4, boxplots.triple$stats[2, c(4, 8)], finalmargins[c(4,8)]+.4, boxplots.triple$stats[4, c(4, 8)], density=12, angle=135)
    }
    if (length(g2levels)==4 & length(g1levels)==3){
    rect(finalmargins[c(2,6,10)]-.4, boxplots.triple$stats[2, c(2, 6,10)], finalmargins[c(2,6,10)]+.4, boxplots.triple$stats[4, c(2, 6,10)], density=12, angle=45)
    rect(finalmargins[c(4,8,12)]-.4, boxplots.triple$stats[2, c(4, 8,12)], finalmargins[c(4,8,12)]+.4, boxplots.triple$stats[4, c(4, 8,12)], density=12, angle=135)
    }
    if (length(g2levels)==4 & length(g1levels)==4){
    rect(finalmargins[c(2,6,10,14)]-.4, boxplots.triple$stats[2, c(2, 6,10,14)], finalmargins[c(2,6,10,14)]+.4, boxplots.triple$stats[4, c(2, 6,10,14)], density=12, angle=45)
    rect(finalmargins[c(4,8,12,16)]-.4, boxplots.triple$stats[2, c(4, 8,12,16)], finalmargins[c(4,8,12,16)]+.4, boxplots.triple$stats[4, c(4, 8,12,16)], density=12, angle=135)
    }
    if (length(g2levels)==4 & length(g1levels)==5){
    rect(finalmargins[c(2,6,10,14,18)]-.4, boxplots.triple$stats[2, c(2, 6,10,14,18)], finalmargins[c(2,6,10,14,18)]+.4, boxplots.triple$stats[4, c(2, 6,10,14,18)], density=12, angle=45)
    rect(finalmargins[c(4,8,12,16,20)]-.4, boxplots.triple$stats[2, c(4, 8,12,16,20)], finalmargins[c(4,8,12,16,20)]+.4, boxplots.triple$stats[4, c(4, 8,12,16,20)], density=12, angle=135)
    }
    if (length(g2levels)==3 & length(g1levels)==2){
    rect(finalmargins[c(2,5)]-.4, boxplots.triple$stats[2, c(2, 5)], finalmargins[c(2,5)]+.4, boxplots.triple$stats[4, c(2, 5)], density=12, angle=45)
    }
    if (length(g2levels)==3 & length(g1levels)==3){
    rect(finalmargins[c(2,5,8)]-.4, boxplots.triple$stats[2, c(2, 5,8)], finalmargins[c(2,5,8)]+.4, boxplots.triple$stats[4, c(2, 5,8)], density=12, angle=45)
    }
    if (length(g2levels)==3 & length(g1levels)==4){
    rect(finalmargins[c(2,5,8,11)]-.4, boxplots.triple$stats[2, c(2, 5,8,11)], finalmargins[c(2,5,8,11)]+.4, boxplots.triple$stats[4, c(2, 5,8,11)], density=12, angle=45)
    }
    if (length(g2levels)==3 & length(g1levels)==5){
    rect(finalmargins[c(2,5)]-.4, boxplots.triple$stats[2, c(2, 5,8,11,14)],finalmargins[c(2,5,8,11,14)]+.4, boxplots.triple$stats[4, c(2, 5,8,11,14)], density=12, angle=45)
    }
    if (length(g2levels)==5 & length(g1levels)==2){
    rect(finalmargins[c(2,7)]-.4, boxplots.triple$stats[2, c(2, 7)], finalmargins[c(2,7)]+.4, boxplots.triple$stats[4, c(2, 7)], density=12, angle=45)
    rect(finalmargins[c(4,9)]-.4, boxplots.triple$stats[2, c(4, 9)], finalmargins[c(4,9)]+.4, boxplots.triple$stats[4, c(4, 9)], density=12, angle=135)
    rect(finalmargins[c(5,10)]-.4, boxplots.triple$stats[2, c(5, 10)], finalmargins[c(5,10)]+.4, boxplots.triple$stats[4, c(5, 10)], density=12, angle=45)
    rect(finalmargins[c(5,10)]-.4, boxplots.triple$stats[2, c(5, 10)], finalmargins[c(5,10)]+.4, boxplots.triple$stats[4, c(5, 10)], density=12, angle=135)
    }
    if (length(g2levels)==5 & length(g1levels)==3){
    rect(finalmargins[c(2,7,11)]-.4, boxplots.triple$stats[2, c(2, 7,11)], finalmargins[c(2,7,11)]+.4, boxplots.triple$stats[4, c(2, 7,11)], density=12, angle=45)
    rect(finalmargins[c(4,9,14)]-.4, boxplots.triple$stats[2, c(4, 9,14)], finalmargins[c(4,9,14)]+.4, boxplots.triple$stats[4, c(4, 9,14)], density=12, angle=135)
    rect(finalmargins[c(5,10,15)]-.4, boxplots.triple$stats[2, c(5, 10,15)], finalmargins[c(5,10,15)]+.4, boxplots.triple$stats[4, c(5, 10,15)], density=12, angle=45)
    rect(finalmargins[c(5,10,15)]-.4, boxplots.triple$stats[2, c(5, 10,15)], finalmargins[c(5,10,15)]+.4, boxplots.triple$stats[4, c(5, 10,15)], density=12, angle=135)
    }
    if (length(g2levels)==5 & length(g1levels)==4){
    rect(finalmargins[c(2,7,11,16)]-.4, boxplots.triple$stats[2, c(2, 7,11,16)], finalmargins[c(2,7,11,16)]+.4, boxplots.triple$stats[4, c(2, 7,11,16)], density=12, angle=45)
    rect(finalmargins[c(4,9,14,19)]-.4, boxplots.triple$stats[2, c(4, 9,14,19)], finalmargins[c(4,9,14,19)]+.4, boxplots.triple$stats[4, c(4, 9,14,19)], density=12, angle=135)
    rect(finalmargins[c(5,10,15,20)]-.4, boxplots.triple$stats[2, c(5, 10,15,20)], finalmargins[c(5,10,15,20)]+.4, boxplots.triple$stats[4, c(5, 10,15,20)], density=12, angle=45)
    rect(finalmargins[c(5,10,15,20)]-.4, boxplots.triple$stats[2, c(5, 10,15,20)], finalmargins[c(5,10,15,20)]+.4, boxplots.triple$stats[4, c(5, 10,15,20)], density=12, angle=135)
    }
    if (length(g2levels)==5 & length(g1levels)==5){
    rect(finalmargins[c(2,7,11,16,21)]-.4, boxplots.triple$stats[2, c(2, 7,11,16,21)], finalmargins[c(2,7,11,16,21)]+.4, boxplots.triple$stats[4, c(2, 7,11,16,21)], density=12, angle=45)
    rect(finalmargins[c(4,9,14,19,24)]-.4, boxplots.triple$stats[2, c(4, 9,14,19,24)], finalmargins[c(4,9,14,19,24)]+.4, boxplots.triple$stats[4, c(4, 9,14,19,24)], density=12, angle=135)
    rect(finalmargins[c(5,10,15,20,25)]-.4, boxplots.triple$stats[2, c(5, 10,15,20,25)], finalmargins[c(5,10,15,20,25)]+.4, boxplots.triple$stats[4, c(5, 10,15,20,25)], density=12, angle=45)
    rect(finalmargins[c(5,10,15,20,25)]-.4, boxplots.triple$stats[2, c(5, 10,15,20,25)], finalmargins[c(5,10,15,20,25)]+.4, boxplots.triple$stats[4, c(5, 10,15,20,25)], density=12, angle=135)
    }

    }

    factor2<-rbinom(100,1,.5)
    factor21<-rbinom(100,1,.5)
    factor3<-sample(c('Male','Female','Other'),100,replace=TRUE)
    factor31<-sample(c('Catholic','Christian','Atheist'),100,replace=TRUE)
    factor4<-rep(1:4,25)
    factor41<-sample(c('U.S.','Australia','UK','Canada'),100,replace=TRUE)
    factor5<-rep(1:5,20)
    factor51<-sample(c('Caucasian','Af.Amer','Asian Am.','Latin Am.','Canadian'),100,replace=TRUE)
    y<-rnorm(100,53,19)
    tester<-data.frame(factor2,factor21,factor3,factor31,factor4,factor41,factor5,factor51,y)

    list<-data.frame(factor41,factor5,y)
    mult.boxplot(list)

    • Thanks for your helpful script, Andrew! I ran it and saw a nice plot that accomplishes the same goal.

      Could you please briefly explain what aspect of my code drove you crazy? I recognize that my code may not be very elegant, so I welcome any feedback that you can share.

      Thanks,

      Eric

  2. Oh, no. Nothing about it drove me crazy. I just hate using (and seeing) situation specific code. Whenever I go to do a task I always write a general script that can be reused on a new situation with almost zero work. So when I saw your post I went crazy and had to write a more general script. I’m just now starting to test it with ‘real’ data instead of artificial data posted with it and am fixing it up.

    So thank you for giving me the idea for it. A lot of it I felt like I was reinventing the wheel when it came to box spacing and label placement, but it was definitely a fun little puzzle to figure out how to get it all to work no matter the size of the factors you put in (up to 5 levels, because I just didn’t feel like programming anymore).

    • Hi Andrew,

      Yes, I agree that generalizable code is better than situation-specific code. For this task, I just needed to get the desired plot for my co-worker, and that was good enough for this purpose. While it would have been good to spend more time to write a generalizable script, I learned enough new things to write a blog post about it, and I also don’t want to let this one interesting idea take time away from my other blog posts in development.

      Thus, I really appreciate you taking the time to write this for all of us!

      Eric

      • I absolutely agree with all of your points. I never knew how to make patterns on boxplots, and had never investigated changing plot coordinates, etc, so your post was super informative for me. And I had a lull in the work day to start investigating how to generalize it and really enjoyed doing so, so I thought I would share it. What I posted doesn’t actually work with ‘real’ data in it’s current form. I’ve fixed most of it up and can re-post it when I have some more free time to polish it up.

  3. Andrew Taylor says:

    twoway.box<-function(list,data,group1, group2, y,labflip=TRUE){
    attach(data)
    g1levels<-levels(as.factor(group1))
    g2levels<-levels(as.factor(group2))
    numboxes<-1:(length(levels(as.factor(group1)))*length(levels(as.factor(group2))))

    firstsplit<-(length(g2levels)*length(g1levels))/length(levels(as.factor(group2)))
    secondsplit<-(firstsplit/(length(levels(as.factor(group1)))))*1

    margins<-(numboxes*secondsplit)
    margfunc<-function(l){
    if ((l-1)%%(length(g1levels))==0) {margins<<-c(margins[1:l-1],margins[l:length(margins)]+1)}
    }
    fulllist<-2:length(margins)
    bettermargins<-do.call(rbind,lapply(fulllist,margfunc))
    find.depth<-length(bettermargins[,1])
    finalmargins<-bettermargins[find.depth,]
    otherlist<-1:length(margins)
    find.middle<-function(i){
    ( (finalmargins[i]+finalmargins[i+length(levels(as.factor(group1)))-1])/2)
    }
    get.labplace<-function(l){
    if (l==1) l
    else if (l%%length(g1levels)-1==0) l
    }
    firstlabs<-do.call(rbind,lapply(otherlist,get.labplace))
    oddlist<-subset((1:(2*length(g1levels))),(1:(2*length(g1levels)))%%2==1)

    labelplacement<<-do.call(rbind,lapply(firstlabs,find.middle))
    #png('triple box plots with patterns.png')
    boxplots.triple = boxplot(y~as.factor(group1) + as.factor(group2),
    data = data,
    at = finalmargins,
    xaxt='n',
    ylim = c(min(y),
    max(y, na.rm = T)),
    col = c('white', if (length(g2levels)==2){'gray'}
    else if (length(g1levels)==3){c('white','gray')}
    else if (length(g1levels)==4){c('white','gray', 'white')}
    else if (length(g1levels)==5){c('white','gray', 'white','white')}))
    par(las=1)
    axis(side=1, at=labelplacement,
    labels=c(g2levels),
    line=3, lwd=0)
    if (labflip==TRUE) par(las=2)
    axis(side=1, at=finalmargins,
    labels=rep(g1levels,length(g2levels)),
    line=-.5, lwd=0)

    title(paste('Comparing', deparse(substitute(y)), 'across levels of', deparse(substitute(group1)), 'and', deparse(substitute(group2))))
    if (length(g1levels)==4 & length(g2levels)==2){
    rect(finalmargins[c(2,6)]-.4, boxplots.triple$stats[2, c(2, 6)], finalmargins[c(2,6)]+.4, boxplots.triple$stats[4, c(2, 6)], density=12, angle=45)
    rect(finalmargins[c(4,8)]-.4, boxplots.triple$stats[2, c(4, 8)], finalmargins[c(4,8)]+.4, boxplots.triple$stats[4, c(4, 8)], density=12, angle=135)
    }
    if (length(g1levels)==4 & length(g2levels)==3){
    rect(finalmargins[c(2,6,10)]-.4, boxplots.triple$stats[2, c(2, 6,10)], finalmargins[c(2,6,10)]+.4, boxplots.triple$stats[4, c(2, 6,10)], density=12, angle=45)
    rect(finalmargins[c(4,8,12)]-.4, boxplots.triple$stats[2, c(4, 8,12)], finalmargins[c(4,8,12)]+.4, boxplots.triple$stats[4, c(4, 8,12)], density=12, angle=135)
    }
    if (length(g1levels)==4 & length(g2levels)==4){
    rect(finalmargins[c(2,6,10,14)]-.4, boxplots.triple$stats[2, c(2, 6,10,14)], finalmargins[c(2,6,10,14)]+.4, boxplots.triple$stats[4, c(2, 6,10,14)], density=12, angle=45)
    rect(finalmargins[c(4,8,12,16)]-.4, boxplots.triple$stats[2, c(4, 8,12,16)], finalmargins[c(4,8,12,16)]+.4, boxplots.triple$stats[4, c(4, 8,12,16)], density=12, angle=135)
    }
    if (length(g1levels)==4 & length(g2levels)==5){
    rect(finalmargins[c(2,6,10,14,18)]-.4, boxplots.triple$stats[2, c(2, 6,10,14,18)], finalmargins[c(2,6,10,14,18)]+.4, boxplots.triple$stats[4, c(2, 6,10,14,18)], density=12, angle=45)
    rect(finalmargins[c(4,8,12,16,20)]-.4, boxplots.triple$stats[2, c(4, 8,12,16,20)], finalmargins[c(4,8,12,16,20)]+.4, boxplots.triple$stats[4, c(4, 8,12,16,20)], density=12, angle=135)
    }
    if (length(g1levels)==3 & length(g2levels)==2){
    rect(finalmargins[c(2,5)]-.4, boxplots.triple$stats[2, c(2, 5)], finalmargins[c(2,5)]+.4, boxplots.triple$stats[4, c(2, 5)], density=12, angle=45)
    }
    if (length(g1levels)==3 & length(g2levels)==3){
    rect(finalmargins[c(2,5,8)]-.4, boxplots.triple$stats[2, c(2, 5,8)], finalmargins[c(2,5,8)]+.4, boxplots.triple$stats[4, c(2, 5,8)], density=12, angle=45)
    }
    if (length(g1levels)==3 & length(g2levels)==4){
    rect(finalmargins[c(2,5,8,11)]-.4, boxplots.triple$stats[2, c(2, 5,8,11)], finalmargins[c(2,5,8,11)]+.4, boxplots.triple$stats[4, c(2, 5,8,11)], density=12, angle=45)
    }
    if (length(g1levels)==3 & length(g2levels)==5){
    rect(finalmargins[c(2,5)]-.4, boxplots.triple$stats[2, c(2, 5,8,11,14)],finalmargins[c(2,5,8,11,14)]+.4, boxplots.triple$stats[4, c(2, 5,8,11,14)], density=12, angle=45)
    }
    if (length(g1levels)==5 & length(g2levels)==2){
    rect(finalmargins[c(2,7)]-.4, boxplots.triple$stats[2, c(2, 7)], finalmargins[c(2,7)]+.4, boxplots.triple$stats[4, c(2, 7)], density=12, angle=45)
    rect(finalmargins[c(4,9)]-.4, boxplots.triple$stats[2, c(4, 9)], finalmargins[c(4,9)]+.4, boxplots.triple$stats[4, c(4, 9)], density=12, angle=135)
    rect(finalmargins[c(5,10)]-.4, boxplots.triple$stats[2, c(5, 10)], finalmargins[c(5,10)]+.4, boxplots.triple$stats[4, c(5, 10)], density=12, angle=45)
    rect(finalmargins[c(5,10)]-.4, boxplots.triple$stats[2, c(5, 10)], finalmargins[c(5,10)]+.4, boxplots.triple$stats[4, c(5, 10)], density=12, angle=135)
    }
    if (length(g1levels)==5 & length(g2levels)==3){
    rect(finalmargins[c(2,7,11)]-.4, boxplots.triple$stats[2, c(2, 7,11)], finalmargins[c(2,7,11)]+.4, boxplots.triple$stats[4, c(2, 7,11)], density=12, angle=45)
    rect(finalmargins[c(4,9,14)]-.4, boxplots.triple$stats[2, c(4, 9,14)], finalmargins[c(4,9,14)]+.4, boxplots.triple$stats[4, c(4, 9,14)], density=12, angle=135)
    rect(finalmargins[c(5,10,15)]-.4, boxplots.triple$stats[2, c(5, 10,15)], finalmargins[c(5,10,15)]+.4, boxplots.triple$stats[4, c(5, 10,15)], density=12, angle=45)
    rect(finalmargins[c(5,10,15)]-.4, boxplots.triple$stats[2, c(5, 10,15)], finalmargins[c(5,10,15)]+.4, boxplots.triple$stats[4, c(5, 10,15)], density=12, angle=135)
    }
    if (length(g1levels)==5 & length(g2levels)==4){
    rect(finalmargins[c(2,7,11,16)]-.4, boxplots.triple$stats[2, c(2, 7,11,16)], finalmargins[c(2,7,11,16)]+.4, boxplots.triple$stats[4, c(2, 7,11,16)], density=12, angle=45)
    rect(finalmargins[c(4,9,14,19)]-.4, boxplots.triple$stats[2, c(4, 9,14,19)], finalmargins[c(4,9,14,19)]+.4, boxplots.triple$stats[4, c(4, 9,14,19)], density=12, angle=135)
    rect(finalmargins[c(5,10,15,20)]-.4, boxplots.triple$stats[2, c(5, 10,15,20)], finalmargins[c(5,10,15,20)]+.4, boxplots.triple$stats[4, c(5, 10,15,20)], density=12, angle=45)
    rect(finalmargins[c(5,10,15,20)]-.4, boxplots.triple$stats[2, c(5, 10,15,20)], finalmargins[c(5,10,15,20)]+.4, boxplots.triple$stats[4, c(5, 10,15,20)], density=12, angle=135)
    }
    if (length(g1levels)==5 & length(g2levels)==5){
    rect(finalmargins[c(2,7,11,16,21)]-.4, boxplots.triple$stats[2, c(2, 7,11,16,21)], finalmargins[c(2,7,11,16,21)]+.4, boxplots.triple$stats[4, c(2, 7,11,16,21)], density=12, angle=45)
    rect(finalmargins[c(4,9,14,19,24)]-.4, boxplots.triple$stats[2, c(4, 9,14,19,24)], finalmargins[c(4,9,14,19,24)]+.4, boxplots.triple$stats[4, c(4, 9,14,19,24)], density=12, angle=135)
    rect(finalmargins[c(5,10,15,20,25)]-.4, boxplots.triple$stats[2, c(5, 10,15,20,25)], finalmargins[c(5,10,15,20,25)]+.4, boxplots.triple$stats[4, c(5, 10,15,20,25)], density=12, angle=45)
    rect(finalmargins[c(5,10,15,20,25)]-.4, boxplots.triple$stats[2, c(5, 10,15,20,25)], finalmargins[c(5,10,15,20,25)]+.4, boxplots.triple$stats[4, c(5, 10,15,20,25)], density=12, angle=135)
    }
    detach(data)
    }
    test.data<-function(){
    factor2<-rbinom(100,1,.5)
    factor21<-rbinom(100,1,.5)
    factor3<-sample(c('Male','Female','Other'),100,replace=TRUE)
    factor31<-sample(c('Catholic','Christian','Atheist'),100,replace=TRUE)
    factor4<-rep(1:4,25)
    factor41<-sample(c('U.S.','Australia','UK','Canada'),100,replace=TRUE)
    factor5<-rep(1:5,20)
    factor51<-sample(c('Caucasian','Af.Amer','Asian Am.','Latin Am.','Canadian'),100,replace=TRUE)
    y<-rnorm(100,53,19)
    tester<<-data.frame(factor2,factor21,factor3,factor31,factor4,factor41,factor5,factor51,y)
    }
    #test.data()
    #twoway.box(list=datas,labflip=TRUE,data=tester,group1=factor41,group2=factor4,y=y)

    • Andrew Taylor says:

      This should work, and work correctly with real data.

      It now takes your data and variables in a normal way. Also, and much more importantly, it is now plotting the data correctly. The earlier version I had mentally flipped what was getting subgrouped, and so it was blatantly wrong to disastrous effects.

    • Thanks for your dedicated efforts to getting it right, Andrew! This is much appreciated!

      • Thank you for working out how to do this in R and sharing your code. The ‘list’ should have been deleted in the arguments list and call; I wanted to send it before I left work and got sloppy.

        Eh, I got paid to do it during a lull in my work, so thank Henry Ford Hospital, I guess. I hadn’t run into your blog before, but I’m definitely subscribed now, so keep up the informative information, please :)

Your thoughtful comments are much appreciated!

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

Follow

Get every new post delivered to your Inbox.

Join 362 other followers

%d bloggers like this: