Side-by-Side Box Plots with Patterns From Data Sets Stacked by reshape2 and melt() in R


A while ago, one of my co-workers asked me to group box plots by plotting them side-by-side within each group, and he wanted to use patterns rather than colours to distinguish between the box plots within a group; the publication that will display his plots prints in black-and-white only.  I gladly investigated how to do this in R, and I want to share my method and an example of what the final result looks like with you.

In generating a fictitious data set for this example, I will also demonstrate how to use the melt() function from the “reshape2” package in R to stack a data set while keeping categorical labels for the individual observations.  For now, here is a sneak peek at what we will produce at the end; the stripes are the harder pattern to produce.

triple box plots with patterns

Read the rest of this post to learn how to generate side-by-side box plots with patterns like the ones above!

Read more of this post


Exploratory Data Analysis: Combining Box Plots and Kernel Density Plots into Violin Plots for Ozone Pollution Data


Recently, I began a series on exploratory data analysis (EDA), and I have written about descriptive statistics, box plots, and kernel density plots so far.  As previously mentioned in my post on box plots, there is a way to combine box plots and kernel density plots.  This combination results in violin plots, and I will show how to create them in R today.

Continuing from my previous posts on EDA, I will use 2 univariate data sets.  One is the “ozone” data vector that is part of the “airquality” data set that is built into R; this data set contains data on New York’s air pollution.  The other is a simulated data set of ozone pollution in a fictitious city called “Ozonopolis”.  It is important to remember that the ozone data from New York has missing values, and this has created complications that needed to be addressed in previous posts; missing values need to be addressed for violin plots, too, and in a different way than before.  

The vioplot() command in the “vioplot” package creates violin plots; the plotting options in this function are different and less versatile than other plotting functions that I have used in R.  Thus, I needed to be more creative with the plot(), title(), and axis() functions to create the plots that I want.  Read the details carefully to understand and benefit fully from the code.

violin plots

Read further to learn how to create these violin plots that combine box plots with kernel density plots!  Be careful – the syntax is more complicated than usual!

Read more of this post