Exploratory Data Analysis: The 5-Number Summary – Two Different Methods in R


Continuing my recent series on exploratory data analysis (EDA), today’s post focuses on 5-number summaries, which were previously mentioned in the post on descriptive statistics in this series.  I will define and calculate the 5-number summary in 2 different ways that are commonly used in R.  (It turns out that different methods arise from the lack of universal agreement among statisticians on how to calculate quantiles.)  I will show that the fivenum() function uses a simpler and more interpretable method to calculate the 5-number summary than the summary() function.  This post expands on a recent comment that I made to correct an error in the post on box plots.

> y = seq(1, 11, by = 2)
> y
[1]  1  3  5  7  9 11
> fivenum(y)
[1]  1  3  6  9 11
> summary(y)
     Min.   1st Qu.   Median    Mean     3rd Qu.    Max. 
     1.0     3.5       6.0       6.0      8.5       11.0

Why do these 2 methods of calculating the 5–number summary in R give different results?  Read the rest of this post to find out the answer!

Previous posts in this series on EDA include

Read more of this post


Get every new post delivered to your Inbox.

Join 246 other followers