box plot | The Chemical Statistician

Eric’s Enlightenment for Monday, April 27, 2015

April 27, 2015

How much wealth do Canadians have? What types of assets and debts do Canadians have? Here is a very useful table from Statistics Canada that answers those questions.
Here is the Canadian Parliamentary Budget Officer’s assessment of the tax-free savings account (2015-02-24).
Tracey Weissgerber et al. argue persuasively for fundamental changes to the visualization of continuous data in academic publications. Instead of bar plots, use scatter plots, box plots, and histograms, which better allow readers to examine the distribution of the data.
Ram B. Gupta et al. have developed a new way to clean crude oil spills using soy.

Filed under Eric's Enlightenment Tagged with bar plot, box plot, continuous data, Data Visualization, histogram, oil spill, scatter plot, soy, statistics, tax-free savings account, tfsa

Exploratory Data Analysis – All Blog Posts on The Chemical Statistician

December 11, 2014

This series of posts introduced various methods of exploratory data analysis, providing theoretical backgrounds and practical examples. Fully commented and readily usable R scripts are available for all topics for you to copy and paste for your own analysis! Most of these posts involve data visualization and plotting, and I include a lot of detail and comments on how to invoke specific plotting commands in R in these examples.

I will return to this blog post to add new links as I write more tutorials.

Useful R Functions for Exploring a Data Frame

The 5-Number Summary – Two Different Methods in R

Combining Histograms and Density Plots to Examine the Distribution of the Ozone Pollution Data from New York in R

Conceptual Foundations of Histograms – Illustrated with New York’s Ozone Pollution Data

Quantile-Quantile Plots for New York’s Ozone Pollution Data

Kernel Density Estimation and Rug Plots in R on Ozone Data in New York and Ozonopolis

2 Ways of Plotting Empirical Cumulative Distribution Functions in R

Conceptual Foundations of Empirical Cumulative Distribution Functions

Combining Box Plots and Kernel Density Plots into Violin Plots for Ozone Pollution Data

Kernel Density Estimation – Conceptual Foundations

Variations of Box Plots in R for Ozone Concentrations in New York City and Ozonopolis

Computing Descriptive Statistics in R for Data on Ozone Pollution in New York City

How to Get the Frequency Table of a Categorical Variable as a Data Frame in R

The advantages of using count() to get N-way frequency tables as data frames in R

Filed under Applied Statistics, Data Analysis, Data Visualization, Descriptive Statistics, R programming, Statistics Tagged with 5-number summary, applied statistics, box plot, data analysis, Data Visualization, ecdf(), empirical cumulative distribution function, exploratory data analysis, five-number summary, frequency table, histogram, kernel density estimation, kernel density plot, quantile, quantile-quantile plot, R, R programming, violin plot

Exploratory Data Analysis: Combining Box Plots and Kernel Density Plots into Violin Plots for Ozone Pollution Data

June 16, 2013

Introduction

Recently, I began a series on exploratory data analysis (EDA), and I have written about descriptive statistics, box plots, and kernel density plots so far. As previously mentioned in my post on box plots, there is a way to combine box plots and kernel density plots. This combination results in violin plots, and I will show how to create them in R today.

Continuing from my previous posts on EDA, I will use 2 univariate data sets. One is the “Ozone” data vector that is part of the “airquality” data set that is built into R; this data set contains data on New York’s air pollution. The other is a simulated data set of ozone pollution in a fictitious city called “Ozonopolis”. Remember that the ozone data from New York have missing values, which created complications in previous posts; missing values need to be handled for violin plots, too, and in a different way than before.

The vioplot() command in the “vioplot” package creates violin plots; its plotting options differ from, and are less versatile than, those of other plotting functions that I have used in R. Thus, I needed to be more creative with the plot(), title(), and axis() functions to create the plots that I want. Read the details carefully to understand and benefit fully from the code.

Read further to learn how to create these violin plots that combine box plots with kernel density plots! Be careful – the syntax is more complicated than usual!
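As a rough illustration of the approach described above, here is a minimal sketch, not the full script from the post. It assumes the “vioplot” package is installed. The Ozonopolis vector below is a hypothetical stand-in (the gamma parameters and the seed are chosen only for illustration), and draw_violins() is a helper name invented for this sketch.

```r
# New York ozone with the missing values removed first
ozone <- airquality$Ozone[!is.na(airquality$Ozone)]

# ASSUMPTION: illustrative stand-in for the simulated "Ozonopolis" data;
# the distribution, its parameters, and the seed are not from the post.
set.seed(1)
ozonopolis <- rgamma(length(ozone), shape = 2, scale = 20)

# Hypothetical helper: draw two violins side by side on a hand-built canvas
draw_violins <- function(x1, x2, labels) {
  # Empty canvas; the x-axis is suppressed so that it can be labelled by hand
  plot(1, 1, type = "n", xlim = c(0.5, 2.5), ylim = range(x1, x2),
       xaxt = "n", xlab = "", ylab = "Ozone (ppb)")
  # add = TRUE draws the violins on the existing canvas at x = 1 and x = 2
  vioplot::vioplot(x1, x2, add = TRUE, col = "gray")
  axis(1, at = c(1, 2), labels = labels)
  title(main = "Violin Plots of Ozone")
}

# Only draw when the package is available
if (requireNamespace("vioplot", quietly = TRUE)) {
  draw_violins(ozone, ozonopolis, c("New York", "Ozonopolis"))
}
```

Building the canvas with plot() and then adding the violins keeps the axis labels and the title under the control of axis() and title(), which is the kind of workaround the paragraph above refers to.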

Exploratory Data Analysis: Variations of Box Plots in R for Ozone Concentrations in New York City and Ozonopolis

May 26, 2013

Introduction

Last week, I wrote the first post in a series on exploratory data analysis (EDA). I began by calculating summary statistics on a univariate data set of ozone concentration in New York City in the built-in data set “airquality” in R. In particular, I talked about how to calculate those statistics when the data set has missing values. Today, I continue this series by creating box plots in R and showing different variations and extensions that can be added; be sure to read this post’s R code closely for some valuable details. I learned many of these tricks from Robert Kabacoff’s “R in Action” (2011). Robert also has a nice blog called Quick-R that I consult often.

Recall that the “Ozone” vector in the data set “airquality” has missing values. Let’s remove those missing values first before constructing the box plots.

# extract the raw data vector
ozone0 = airquality$Ozone

# remove the missing values (the NA mask must come from the raw vector, ozone0)
ozone = ozone0[!is.na(ozone0)]
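A quick way to confirm that the removal worked is to count the missing values before and after. The sketch below is a small sanity check under that idea; the name n_missing is introduced here only for illustration.

```r
# raw vector, including the missing values
ozone0 <- airquality$Ozone

# count the missing values in the raw vector
n_missing <- sum(is.na(ozone0))

# keep only the observed values; the mask is built from the raw vector itself
ozone <- ozone0[!is.na(ozone0)]

# the cleaned vector should contain no NA and be shorter by exactly n_missing
print(c(raw = length(ozone0), missing = n_missing, kept = length(ozone)))
```

If the mask were built from a vector that does not exist yet, R would stop with an "object not found" error, so the raw vector has to be the one tested with is.na().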

Box Plots – What They Represent

The simplest box plot can be obtained by using the basic settings in the boxplot() command. As usual, I use png() and dev.off() to print the image to a local folder on my computer.

png('INSERT YOUR DIRECTORY HERE/box plot ozone.png')
boxplot(ozone, ylab = 'Ozone (ppb)', main = 'Box Plot of Ozone in New York')
dev.off()

What do the different parts of this box plot mean?
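One way to approach this question is to ask R for the numbers behind the picture. The sketch below uses boxplot.stats(), the base R function that computes the statistics drawn by boxplot(); the names hinges, box_length, fences, and parts are introduced here only for illustration, and the 1.5 multiplier is the function's default coef.

```r
ozone <- airquality$Ozone[!is.na(airquality$Ozone)]

# statistics that boxplot() draws (default coef = 1.5)
bp <- boxplot.stats(ozone)

# bp$stats holds: lower whisker, lower hinge, median, upper hinge, upper whisker
hinges <- bp$stats[c(2, 4)]      # bottom and top edges of the box
box_length <- diff(hinges)       # hinge spread, close to the interquartile range

# points beyond these fences are drawn individually as outliers
fences <- c(hinges[1] - 1.5 * box_length, hinges[2] + 1.5 * box_length)

parts <- list(
  lower_whisker = bp$stats[1],   # smallest value inside the lower fence
  box           = hinges,
  median        = bp$stats[3],   # line inside the box
  upper_whisker = bp$stats[5],   # largest value inside the upper fence
  outliers      = bp$out
)
print(parts)
```

The whiskers therefore do not necessarily reach the minimum and maximum of the data; they stop at the most extreme observations that still lie within the fences.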

Discovering Argon with the 2-Sample t-Test

March 10, 2013

I learned about Lord Rayleigh’s discovery of argon in my 2nd-year analytical chemistry class while reading “Quantitative Chemical Analysis” by Daniel Harris. (William Ramsay was also responsible for this discovery.) This is one of my favourite stories in chemistry; it illustrates how diligence in measurement can lead to an elegant and surprising discovery. I find no evidence that Rayleigh and Ramsay used statistics to confirm their findings; their paper was published 13 years before Gosset published about the t-test. Thus, I will use a 2-sample t-test in R to confirm their result.
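For readers who have not used it, a 2-sample t-test in R is a call to t.test() with two numeric vectors. The sketch below shows only the mechanics. The two vectors are simulated placeholders with arbitrary means and spreads; they are NOT Rayleigh and Ramsay's measurements, and the names sample_a and sample_b are hypothetical.

```r
# ASSUMPTION: simulated placeholder data, not the historical measurements
set.seed(1)
sample_a <- rnorm(7, mean = 10.0, sd = 0.1)
sample_b <- rnorm(8, mean = 10.5, sd = 0.1)

# By default t.test() does not assume equal variances (Welch's test);
# pass var.equal = TRUE for the classical pooled-variance version.
result <- t.test(sample_a, sample_b)

print(result$method)      # which variant of the test was run
print(result$statistic)   # t statistic
print(result$p.value)     # two-sided p-value
print(result$conf.int)    # confidence interval for the difference in means
```

Whether the pooled or the Welch version is appropriate depends on whether the two groups can plausibly share a common variance, which is worth checking before interpreting the p-value.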

Photos of Lord Rayleigh and William Ramsay

Source: Wikimedia Commons
