Exploratory Data Analysis: Variations of Box Plots in R for Ozone Concentrations in New York City and Ozonopolis
May 26, 2013
Last week, I wrote the first post in a series on exploratory data analysis (EDA). I began by calculating summary statistics on a univariate data set of ozone concentrations in New York City, taken from the built-in data set “airquality” in R. In particular, I talked about how to calculate those statistics when the data set has missing values. Today, I continue this series by creating box plots in R and showing different variations and extensions that can be added; be sure to read this post’s R code closely, since it contains some useful tricks. I learned many of them from Robert Kabacoff’s “R in Action” (2011). Robert also has a nice blog called Quick-R that I consult often.
Recall that I extracted a vector called “ozone” from the data set “airquality”.
ozone = airquality$Ozone
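As a quick recap of the missing-value point from last week, here is a minimal sketch of how the NA values in this vector can be counted and then skipped when computing a statistic. The names n.missing, ozone.mean and ozone.median are my own, chosen only for illustration.

```r
# Extract the ozone vector from the built-in airquality data set
ozone = airquality$Ozone

# Count the missing values (illustrative variable name)
n.missing = sum(is.na(ozone))

# na.rm = TRUE drops the NA values before the statistic is computed;
# without it, mean() and median() would both return NA here
ozone.mean = mean(ozone, na.rm = TRUE)
ozone.median = median(ozone, na.rm = TRUE)
```

Note that boxplot() drops missing values on its own, so no na.rm argument is needed in the plotting code that follows.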
Box Plots – What They Represent
png('INSERT YOUR DIRECTORY HERE/box plot ozone.png')
boxplot(ozone, ylab = 'Ozone (ppb)', main = 'Box Plot of Ozone in New York')
dev.off()
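To see the numbers behind the picture, one option is boxplot.stats(), which returns the quantities that a default box plot draws. This is only a sketch under the default settings (coef = 1.5); the name bp is my own. The five values in bp$stats are, in order, the end of the lower whisker, the lower hinge, the median, the upper hinge and the end of the upper whisker. Points lying beyond the whiskers are returned in bp$out and would be drawn individually.

```r
# Extract the ozone vector from the built-in airquality data set
ozone = airquality$Ozone

# Compute the box plot statistics without drawing anything;
# missing values are dropped automatically
bp = boxplot.stats(ozone)

bp$stats  # lower whisker, lower hinge, median, upper hinge, upper whisker
bp$n      # number of non-missing observations used
bp$out    # points beyond the whiskers, plotted individually as outliers
```

By default, each whisker extends to the most extreme data point that lies within 1.5 times the length of the box from the nearer hinge.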