September 8, 2014
- the conceptual background behind empirical cumulative distribution functions (empirical CDFs)
- how to plot empirical cumulative distribution functions in 2 different ways in R
There is actually an elegant theorem that provides a rigorous basis for using empirical CDFs to estimate the true CDF – and this is true for any probability distribution. It is called the Glivenko-Cantelli theorem, and here is what it states:
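Given a sequence of $n$ independent and identically distributed random variables, $X_1, X_2, \ldots, X_n$, with common CDF $F$ and empirical CDF $\hat{F}_n$,

$$P\left(\lim_{n \to \infty} \sup_{x \in \mathbb{R}} \left|\hat{F}_n(x) - F(x)\right| = 0\right) = 1.$$

In other words, the empirical CDF of $X_1, X_2, \ldots, X_n$ converges uniformly to the true CDF.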
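To make the uniform convergence concrete, here is a minimal R sketch that computes the largest vertical gap between the empirical CDF and the true CDF for increasingly large samples. It is only an illustration under assumptions of my own choosing: the data are simulated from a standard normal distribution, the sample sizes and seed are arbitrary, and `sup_distance` is a hypothetical helper written for this example, not a built-in function.

```r
# Sketch: how far is the empirical CDF from the true CDF, at worst?
set.seed(1)  # arbitrary seed, used only for reproducibility

# Hypothetical helper: exact value of sup_x |F_n(x) - F(x)| for a
# continuous true CDF. The empirical CDF is a step function, so the
# largest gap occurs at a data point, either at the top of the step
# (i/n) or just below it ((i - 1)/n).
sup_distance <- function(x, true_cdf) {
  n <- length(x)
  f <- true_cdf(sort(x))   # true CDF evaluated at the sorted data
  i <- seq_len(n)
  max(pmax(i / n - f, f - (i - 1) / n))
}

# Assumed setup: standard normal samples of increasing size
sample_sizes <- c(10, 100, 1000, 10000, 100000)
distances <- sapply(sample_sizes, function(n) sup_distance(rnorm(n), pnorm))

print(data.frame(n = sample_sizes, sup_distance = distances))
```

The printed distances should shrink toward 0 as `n` grows, which is what the theorem predicts. The same quantity is the two-sided Kolmogorov-Smirnov statistic, so `ks.test(x, "pnorm")$statistic` can be used as a cross-check on any single sample.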
My mathematical statistics professor at the University of Toronto, Keith Knight, told my class that this is often referred to as “The First Theorem of Statistics” or “The Fundamental Theorem of Statistics”. I think that this is a rather subjective title – the central limit theorem is likely more useful and important – but page 261 of John Taylor’s An introduction to measure and probability (Springer, 1997) also gives this title to the Glivenko-Cantelli theorem.