This post continues my recent series on exploratory data analysis (EDA). Following up on the last post, which covered the conceptual foundations of empirical cumulative distribution functions (CDFs), it shows how to plot empirical CDFs in R. (Previous posts in this series on EDA include descriptive statistics, box plots, kernel density estimation, and violin plots.)
I will plot empirical CDFs in two ways:
- using the built-in ecdf() and plot() functions in R
- calculating and plotting the cumulative probabilities against the ordered data
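Before turning to real data, the two approaches can be sketched on a toy vector. The values in x below are made up purely for illustration, and the variable names are placeholders rather than anything fixed by this series.

```r
# Toy data, invented for illustration only (no ties, no missing values)
x <- c(3.1, 1.4, 2.7, 0.9, 5.0, 4.2)
n <- length(x)

# Way 1: ecdf() returns a step function, and plot() knows how to draw it
x.ecdf <- ecdf(x)
plot(x.ecdf, xlab = "x", ylab = "Cumulative probability",
     main = "Empirical CDF via ecdf()")

# Way 2: sort the data and pair the i-th ordered value with i/n
x.ordered <- sort(x)
cum.prob <- seq_len(n) / n
plot(x.ordered, cum.prob, type = "s",
     xlab = "x", ylab = "Cumulative probability",
     main = "Empirical CDF by hand")
```

With no ties in the data, the two approaches give the same cumulative probability at every observed value. When ties are present, the by-hand version places several points at the same x, and the highest of them is the one that matches ecdf().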
As in the previous posts in this series on EDA, I will use the “Ozone” data from the built-in “airquality” data set in R. Recall that the Ozone variable has missing values; just as before, they need to be handled when constructing plots of the empirical CDFs.
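One possible way to handle the missing values is to drop them up front and count only the readings that remain. This is a minimal sketch; the names ozone.all, ozone and n are placeholders chosen for illustration.

```r
# Extract the Ozone column from the built-in airquality data set
ozone.all <- airquality$Ozone

# Count the missing readings
n.missing <- sum(is.na(ozone.all))

# Keep only the non-missing readings
ozone <- ozone.all[!is.na(ozone.all)]

# Sample size to use as the denominator of the cumulative probabilities
n <- length(ozone)
```

The denominator matters most for the by-hand approach: sort() drops NA values by default, so dividing by length(ozone.all) instead of n would make the cumulative probabilities top out below 1.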
Recall the plot of the empirical CDF of random standard normal numbers in my earlier post on the conceptual foundations of empirical CDFs. I will compare that plot with the plots of the empirical CDF of the ozone data to check whether the ozone data plausibly came from a normal distribution.
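A rough sketch of such a visual comparison might look like the following. The seed, the simulated sample size, and the red reference curve (a normal CDF using the sample mean and standard deviation of the ozone data) are all assumptions added here for illustration, not part of the earlier post.

```r
# Ozone readings with missing values removed
ozone <- airquality$Ozone[!is.na(airquality$Ozone)]

# Simulated standard normal numbers; the seed is arbitrary
set.seed(1)
normal.numbers <- rnorm(length(ozone))

# Draw the two empirical CDFs side by side
old.par <- par(mfrow = c(1, 2))
plot(ecdf(normal.numbers), xlab = "x", ylab = "Cumulative probability",
     main = "Standard normal sample")
plot(ecdf(ozone), xlab = "Ozone (ppb)", ylab = "Cumulative probability",
     main = "Ozone")

# Optional reference: normal CDF with the ozone sample mean and sd
curve(pnorm(x, mean = mean(ozone), sd = sd(ozone)), add = TRUE, col = "red")
par(old.par)
```

Because the two samples sit on different scales, the side-by-side panels only allow a comparison of shape. The overlaid curve puts a normal CDF on the same axis as the ozone data, which makes departures from normality easier to see.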