# Mathematical Statistics Lesson of the Day – The Glivenko-Cantelli Theorem

September 8, 2014

In 2 earlier tutorials that focused on exploratory data analysis in statistics, I introduced

- the conceptual background behind empirical cumulative distribution functions (empirical CDFs)
- how to plot empirical cumulative distribution functions in 2 different ways in R

There is actually an elegant theorem that provides a rigorous basis for using empirical CDFs to estimate the true CDF – and this is true for any **probability distribution**. It is called the **Glivenko-Cantelli theorem**, and here is what it states:

Given a sequence of independent and identically distributed random variables $X_1, X_2, X_3, \ldots$ with common cumulative distribution function $F(x)$, let $\hat{F}_n(x) = \frac{1}{n} \sum_{i=1}^{n} \mathbf{1}\{X_i \leq x\}$ denote the empirical CDF based on the first $n$ observations. Then

$$\sup_{x \in \mathbb{R}} \left| \hat{F}_n(x) - F(x) \right| \longrightarrow 0 \quad \text{almost surely as } n \to \infty.$$

In other words, the empirical CDF $\hat{F}_n(x)$ converges uniformly to the true CDF $F(x)$, with probability 1.
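This uniform convergence is easy to see in simulation. Here is a minimal sketch (in Python with NumPy, rather than the R used in my earlier tutorials; the function name is my own, not from any library) that draws Exponential(1) samples of increasing size and computes the largest vertical gap between the empirical CDF and the true CDF $F(x) = 1 - e^{-x}$. The gap should shrink toward 0 as $n$ grows:

```python
import numpy as np

rng = np.random.default_rng(42)

def sup_deviation(sample):
    """Largest vertical gap sup_x |F_n(x) - F(x)| between the empirical CDF
    of an Exponential(1) sample and the true CDF F(x) = 1 - exp(-x).

    Because F_n is a step function, the supremum is attained at a sample
    point, so it suffices to check each order statistic from both sides."""
    x = np.sort(sample)
    n = len(x)
    F = 1.0 - np.exp(-x)            # true CDF evaluated at the order statistics
    i = np.arange(1, n + 1)
    return np.max(np.maximum(i / n - F, F - (i - 1) / n))

for n in (100, 1_000, 10_000, 100_000):
    d = sup_deviation(rng.exponential(1.0, size=n))
    print(f"n = {n:>7}: sup |F_n - F| = {d:.4f}")
```

The printed gaps decrease as the sample size grows, which is exactly what the theorem promises.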

My mathematical statistics professor at the University of Toronto, Keith Knight, told my class that this is often referred to as “The First Theorem of Statistics” or “The Fundamental Theorem of Statistics”. I think that this is a rather subjective title – the central limit theorem is likely more useful and important – but Page 261 of John Taylor’s *An Introduction to Measure and Probability* (Springer, 1997) recognizes this attribution to the Glivenko-Cantelli theorem, too.

There’s actually a stronger result, the [Dvoretzky–Kiefer–Wolfowitz inequality](http://en.wikipedia.org/wiki/Dvoretzky%E2%80%93Kiefer%E2%80%93Wolfowitz_inequality), which gives a probabilistic bound on how far the empirical CDF can be from the true CDF at any given sample size. It deserves to be better known.
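To make the inequality concrete: in the form with Massart’s tight constant, it states that $P\left(\sup_x |\hat{F}_n(x) - F(x)| > \varepsilon\right) \leq 2e^{-2n\varepsilon^2}$ for every $n$ and $\varepsilon > 0$. The sketch below (function names are mine, for illustration only) evaluates this bound and inverts it to find the sample size needed for a given uniform accuracy:

```python
import math

def dkw_bound(n, eps):
    """DKW upper bound (with Massart's constant 2) on the probability that
    the empirical CDF deviates from the true CDF by more than eps anywhere."""
    return min(1.0, 2.0 * math.exp(-2.0 * n * eps ** 2))

def dkw_sample_size(eps, alpha):
    """Smallest n for which the DKW bound guarantees
    P(sup_x |F_n(x) - F(x)| > eps) <= alpha."""
    return math.ceil(math.log(2.0 / alpha) / (2.0 * eps ** 2))

# With n = 1000 observations, the empirical CDF is within 0.05 of the true
# CDF everywhere, except with probability at most 2 * exp(-5) ~ 0.013:
print(dkw_bound(1000, 0.05))

# Sample size needed for uniform accuracy 0.05 with 95% confidence:
print(dkw_sample_size(0.05, 0.05))
```

Note that the bound is distribution-free: it holds for any continuous CDF, at any finite sample size, which is what makes it stronger than the purely asymptotic Glivenko-Cantelli theorem.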
