August 12, 2013
Continuing my recent series on exploratory data analysis (EDA), today’s post focuses on 5-number summaries, which were previously mentioned in the post on descriptive statistics in this series. I will define and calculate the 5-number summary in 2 different ways that are commonly used in R. (It turns out that different methods arise from the lack of universal agreement among statisticians on how to calculate quantiles.) I will show that the fivenum() function uses a simpler and more interpretable method to calculate the 5-number summary than the summary() function. This post expands on a recent comment that I made to correct an error in the post on box plots.
    > y = seq(1, 11, by = 2)
    > y
    [1]  1  3  5  7  9 11
    > fivenum(y)
    [1]  1  3  6  9 11
    > summary(y)
       Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
        1.0     3.5     6.0     6.0     8.5    11.0
Why do these 2 methods of calculating the 5-number summary in R give different results? Read the rest of this post to find out the answer!
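As a rough preview, here is a minimal sketch of where the two sets of quartiles come from. It assumes that fivenum() reports hinges (the medians of the lower and upper halves of the sorted data) and that summary() relies on the default method of quantile(), which interpolates linearly between order statistics. The helper names `hinges` and `quantile7` are hypothetical and used only for illustration; they are not functions in base R.

```r
y <- seq(1, 11, by = 2)

# Hinges: medians of the lower and upper halves of the sorted data.
# When n is odd, the overall median is counted in both halves.
hinges <- function(x) {
  x <- sort(x)
  n <- length(x)
  half <- ceiling(n / 2)  # number of points in each half
  c(median(x[1:half]), median(x[(n - half + 1):n]))
}

# Interpolated quantile: locate position h = (n - 1) * p + 1 in the
# sorted data and interpolate linearly between its two neighbours.
quantile7 <- function(x, p) {
  x <- sort(x)
  h <- (length(x) - 1) * p + 1
  lo <- floor(h)
  hi <- ceiling(h)
  x[lo] + (h - lo) * (x[hi] - x[lo])
}

hinges(y)                    # 3 9
quantile7(y, c(0.25, 0.75))  # 3.5 8.5
```

For this data set, the hinges land exactly on the observed values 3 and 9, while the interpolated quartiles fall between observations (3.5 lies a quarter of the way from 3 to 5). That gap is the whole difference between the two outputs above.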
Previous posts in this series on EDA include:
- Descriptive statistics
- Box plots
- The conceptual foundations of kernel density estimation
- How to construct kernel density plots and rug plots in R
- Violin plots
- The conceptual foundations of empirical cumulative distribution functions (CDFs)
- 2 ways of plotting empirical CDFs in R
- Conceptual foundations of histograms and how to plot them in R
- Combining histograms and density plots in R