Mathematical Statistics Lesson of the Day – Markov’s Inequality

Markov’s inequality is an elegant and very useful inequality that relates the probability of an event concerning a non-negative random variable, X, with the expected value of X.  It states that

P(X \geq c) \leq E(X) \div c,

where c > 0.

I find Markov’s inequality to be beautiful for 2 reasons:

  1. It applies to both continuous and discrete random variables.
  2. It applies to any non-negative random variable from any distribution with a finite expected value.

In a later lesson, I will discuss the motivation and intuition behind Markov’s inequality, which have useful implications for understanding a data set.

Applied Statistics Lesson of the Day – The Coefficient of Variation

In my statistics classes, I learned to use the variance or the standard deviation to measure the variability or dispersion of a data set.  However, consider the following 2 hypothetical cases:

  1. the standard deviation for the incomes of households in Canada is $2,000
  2. the standard deviation for the incomes of the 5 major banks in Canada is $2,000

Even though this measure of dispersion has the same value for both sets of income data, $2,000 is a significant amount for a household, whereas $2,000 is not a lot of money for one of the “Big Five” banks.  Thus, the standard deviation alone does not give a fully accurate sense of the relative variability between the 2 data sets.  One way to overcome this limitation is to take the mean of the data sets into account.

A useful statistic for measuring the variability of a data set while scaling by the mean is the sample coefficient of variation:

\text{Sample Coefficient of Variation (} \bar{c_v} \text{)} \ = \ s \ \div \ \bar{x},

where s is the sample standard deviation and \bar{x} is the sample mean.

Analogously, the coefficient of variation for a random variable is

\text{Coefficient of Variation} \ (c_v) \ = \ \sigma \div \ \mu,

where \sigma is the random variable’s standard deviation and \mu is the random variable’s expected value.

The coefficient of variation is a very useful statistic that I, unfortunately, never learned in my introductory statistics classes.  I hope that all students new to statistics get to learn this alternative measure of dispersion.

Exploratory Data Analysis: Combining Histograms and Density Plots to Examine the Distribution of the Ozone Pollution Data from New York in R


This is a follow-up post to my recent introduction of histograms.  Previously, I presented the conceptual foundations of histograms and used a histogram to approximate the distribution of the “Ozone” data from the built-in data set “airquality” in R.  Today, I will examine this distribution in more detail by overlaying the histogram with parametric and non-parametric kernel density plots.  I will finally answer the question that I have asked (and hinted to answer) several times: Are the “Ozone” data normally distributed, or is another distribution more suitable?

histogram and kernel density plot

Read the rest of this post to learn how to combine histograms with density curves like this above plot!

This is another post in my continuing series on exploratory data analysis (EDA).  Previous posts in this series on EDA include

Read more of this post

Exploratory Data Analysis: Conceptual Foundations of Empirical Cumulative Distribution Functions


Continuing my recent series on exploratory data analysis (EDA), this post focuses on the conceptual foundations of empirical cumulative distribution functions (CDFs); in a separate post, I will show how to plot them in R.  (Previous posts in this series include descriptive statistics, box plots, kernel density estimation, and violin plots.)

To give you a sense of what an empirical CDF looks like, here is an example created from 100 randomly generated numbers from the standard normal distribution.  The ecdf() function in R was used to generate this plot; the entire code is provided at the end of this post, but read my next post for more detail on how to generate plots of empirical CDFs in R.

ecdf standard normal

Read to rest of this post to learn what an empirical CDF is and how to produce the above plot!

Read more of this post


Get every new post delivered to your Inbox.

Join 319 other followers