## Mathematical and Applied Statistics Lesson of the Day – The Motivation and Intuition Behind Chebyshev’s Inequality

In 2 recent Statistics Lessons of the Day, I introduced Markov’s inequality and Chebyshev’s inequality.

Chebyshev’s inequality is just a special version of Markov’s inequality; thus, their motivations and intuitions are similar.

$P[|X - \mu| \geq k \sigma] \leq \frac{1}{k^2}$

Markov’s inequality roughly says that a random variable $X$ is most frequently observed near its expected value, $\mu$.  Remarkably, it quantifies just how often $X$ is far away from $\mu$.  Chebyshev’s inequality goes one step further and quantifies that distance between $X$ and $\mu$ in terms of the number of standard deviations away from $\mu$.  It roughly says that the probability of $X$ being at least $k$ standard deviations away from $\mu$ is at most $k^{-2}$.  Notice that this upper bound decreases as $k$ increases, confirming our intuition that it is highly improbable for $X$ to be far away from $\mu$.

As with Markov’s inequality, Chebyshev’s inequality applies to any random variable $X$, as long as $E(X)$ and $V(X)$ are finite.  (Markov’s inequality requires $X$ to be non-negative, but only $E(X)$ to be finite.)  This is quite a marvelous result!
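
To see the inequality at work on actual numbers, here is a minimal Python sketch, assuming NumPy is available; the exponential distribution, the sample size, and the values of $k$ are arbitrary illustrative choices, and any distribution with a finite mean and variance would serve equally well.

```python
import numpy as np

rng = np.random.default_rng(seed=42)

# Simulate a skewed, decidedly non-normal random variable: an exponential
# with mean 1.  Chebyshev's inequality must hold for it regardless of shape.
x = rng.exponential(scale=1.0, size=1_000_000)

mu = x.mean()    # estimate of E(X) = mu
sigma = x.std()  # estimate of the standard deviation of X

for k in [1.5, 2.0, 3.0]:
    empirical = np.mean(np.abs(x - mu) >= k * sigma)  # P[|X - mu| >= k*sigma]
    bound = 1.0 / k**2                                # Chebyshev's upper bound
    print(f"k = {k}: empirical tail = {empirical:.4f} <= bound = {bound:.4f}")
```

Notice how loose the bound can be; Chebyshev’s inequality guarantees a worst case over all distributions with finite variance, not a sharp estimate for any particular one.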

## Mathematical Statistics Lesson of the Day – Chebyshev’s Inequality

The variance of a random variable $X$ is just an expected value of a function of $X$.  Specifically,

$V(X) = E[(X - \mu)^2], \ \text{where} \ \mu = E(X)$.
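
As a quick concrete check, the following Python sketch computes $V(X)$ straight from this definition for a small discrete distribution; the support points and probabilities are arbitrary choices.

```python
# A small discrete random variable X, given by its probability mass function.
# The support points and probabilities are arbitrary; they must sum to 1.
pmf = {0: 0.2, 1: 0.5, 3: 0.3}

mu = sum(x * p for x, p in pmf.items())               # mu = E(X)
var = sum((x - mu) ** 2 * p for x, p in pmf.items())  # V(X) = E[(X - mu)^2]

print(f"E(X) = {mu:.2f}, V(X) = {var:.2f}")  # E(X) = 1.40, V(X) = 1.24
```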

Since $(X - \mu)^2$ is a non-negative random variable, Markov’s inequality applies to it.  Let’s substitute $(X - \mu)^2$ into Markov’s inequality and see what happens.  For convenience and without loss of generality, I will replace the constant $c$ with another constant, $b^2$.

$\text{Let} \ b^2 = c, \ b > 0. \ \ \text{Then,}$

$P[(X - \mu)^2 \geq b^2] \leq \frac{E[(X - \mu)^2]}{b^2}$

$P[ (X - \mu) \leq -b \ \ \text{or} \ \ (X - \mu) \geq b] \leq \frac{V(X)}{b^2}$

$P[|X - \mu| \geq b] \leq \frac{V(X)}{b^2}$

Now, let’s substitute $b$ with $k \sigma$, where $\sigma$ is the standard deviation of $X$.  (I can make this substitution, because $\sigma$ is just another constant.)

$\text{Let} \ k \sigma = b. \ \ \text{Then,}$

$P[|X - \mu| \geq k \sigma] \leq \frac{V(X)}{k^2 \sigma^2}$

Since $V(X) = \sigma^2$, the right-hand side simplifies:

$P[|X - \mu| \geq k \sigma] \leq \frac{\sigma^2}{k^2 \sigma^2} = \frac{1}{k^2}$

This last inequality is known as Chebyshev’s inequality, and it is just a special version of Markov’s inequality.  In a later Statistics Lesson of the Day, I will discuss the motivation and intuition behind it.  (Hint: Read my earlier lesson on the motivation and intuition behind Markov’s inequality.)
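
The derivation can also be checked numerically.  Here is a minimal Python sketch, assuming NumPy; the uniform distribution and the value of $b$ are arbitrary choices.  It confirms that the events $(X - \mu)^2 \geq b^2$ and $|X - \mu| \geq b$ occur with the same frequency and that the Markov-style bound holds.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Any random variable with finite mean and variance will do;
# a uniform distribution on [-2, 6] is an arbitrary choice.
x = rng.uniform(low=-2.0, high=6.0, size=500_000)
mu = x.mean()
b = 3.0  # an arbitrary positive constant

# The two events in the derivation are the same event,
# so their empirical frequencies agree.
freq_squared = np.mean((x - mu) ** 2 >= b**2)  # P[(X - mu)^2 >= b^2]
freq_abs = np.mean(np.abs(x - mu) >= b)        # P[|X - mu| >= b]

# Markov's inequality applied to (X - mu)^2 yields the bound V(X)/b^2.
bound = np.mean((x - mu) ** 2) / b**2

print(f"P[(X - mu)^2 >= b^2] = {freq_squared:.4f}")
print(f"P[|X - mu| >= b]     = {freq_abs:.4f}")
print(f"V(X) / b^2           = {bound:.4f}")
```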

## Mathematical and Applied Statistics Lesson of the Day – The Motivation and Intuition Behind Markov’s Inequality

Markov’s inequality may seem like a rather arbitrary pair of mathematical expressions that are coincidentally related to each other by an inequality sign:

$P(X \geq c) \leq \frac{E(X)}{c},$ where $c > 0$.

However, there is a practical motivation behind Markov’s inequality, and it can be posed in the form of a simple question: How often is the random variable $X$ “far” away from its “centre” or “central value”?

Intuitively, the “central value” of $X$ is the value of $X$ that is most commonly (or most frequently) observed.  Thus, as $X$ deviates farther and farther from its “central value”, we would expect those distant-from-the-centre values to be less frequently observed.

Recall that the expected value, $E(X)$, is a measure of the “centre” of $X$.  Thus, we would expect that the probability of $X$ being very far away from $E(X)$ is very low.  Indeed, Markov’s inequality rigorously confirms this intuition; here is its rough translation:

As $c$ moves farther and farther above $E(X)$, the event $X \geq c$ becomes less and less probable.

You can confirm this by substituting several key values of $c$; a numerical sketch after the list below illustrates the same behaviour.

• If $c = E(X)$, then $P[X \geq E(X)] \leq 1$; this is the largest informative upper bound, since any probability is at most $1$ anyway.  This makes intuitive sense; $X$ is going to be frequently observed near its own expected value.

• If $c \rightarrow \infty$, then the upper bound $\frac{E(X)}{c} \rightarrow 0$, so $P(X \geq c) \rightarrow 0$.  By Kolmogorov’s axioms of probability, any probability must be inclusively between $0$ and $1$, so the limiting probability is exactly $0$.  This makes intuitive sense; there is no possible way that $X$ can be bigger than positive infinity.
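
Here is that behaviour in numbers: a minimal Python sketch, assuming NumPy, with an exponential distribution of mean $2$ as an arbitrary non-negative example.  The bound equals $1$ at $c = E(X)$ and decays toward $0$ as $c$ grows.

```python
import numpy as np

rng = np.random.default_rng(seed=1)

# A non-negative random variable; the exponential with mean 2 is arbitrary.
x = rng.exponential(scale=2.0, size=1_000_000)
ex = x.mean()  # estimate of E(X)

# Evaluate Markov's bound E(X)/c at c = E(X) and at increasingly large c.
for c in [ex, 2 * ex, 5 * ex, 20 * ex]:
    tail = np.mean(x >= c)  # empirical P(X >= c)
    bound = ex / c          # Markov's upper bound
    print(f"c = {c:6.2f}: P(X >= c) = {tail:.4f} <= E(X)/c = {bound:.4f}")
```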

## Mathematical Statistics Lesson of the Day – Markov’s Inequality

Markov’s inequality is an elegant and very useful inequality that relates the probability of an event concerning a non-negative random variable, $X$, with the expected value of $X$.  It states that

$P(X \geq c) \leq \frac{E(X)}{c},$

where $c > 0$.

I find Markov’s inequality to be beautiful for 2 reasons:

1. It applies to both continuous and discrete random variables, as the sketch below illustrates.
2. It applies to any non-negative random variable from any distribution with a finite expected value.
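
Here is a short Python sketch of the first point, assuming NumPy; the Poisson and exponential parameters and the threshold $c$ are arbitrary choices.  It checks the bound for one discrete and one continuous non-negative random variable.

```python
import numpy as np

rng = np.random.default_rng(seed=7)
c = 5.0  # an arbitrary positive threshold

# One discrete and one continuous non-negative random variable;
# the Poisson and exponential parameters are arbitrary choices.
samples = {
    "Poisson(3)": rng.poisson(lam=3.0, size=1_000_000),
    "Exponential(mean 3)": rng.exponential(scale=3.0, size=1_000_000),
}

for name, x in samples.items():
    tail = np.mean(x >= c)  # empirical P(X >= c)
    bound = x.mean() / c    # Markov's bound E(X)/c
    print(f"{name}: P(X >= {c}) = {tail:.4f} <= {bound:.4f}")
```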

In a later lesson, I will discuss the motivation and intuition behind Markov’s inequality, which has useful implications for understanding a data set.