## Mathematical Statistics Lesson of the Day – Sufficient Statistics

*Update on 2014-11-06: Thanks to Christian Robert’s comment, I have removed the sample median as an example of a sufficient statistic.

Suppose that you collected data

$\mathbf{X} = (X_1, X_2, ..., X_n)$

in order to estimate a parameter $\theta$.  Let $f_\theta(\mathbf{X})$ be the joint probability density function (PDF)* of $X_1, X_2, ..., X_n$.

Let

$t = T(\mathbf{X})$

be a statistic based on $\mathbf{X}$.  Let $g_\theta(t)$ be the PDF of $T(\mathbf{X})$.

If the conditional PDF

$h_\theta(\mathbf{X}) = f_\theta(\mathbf{X}) \div g_\theta[T(\mathbf{X})]$

does not depend on $\theta$, then $T(\mathbf{X})$ is a sufficient statistic for $\theta$.  In other words,

$h_\theta(\mathbf{X}) = h(\mathbf{X})$,

and $\theta$ does not appear in $h(\mathbf{X})$.

Intuitively, this means that $T(\mathbf{X})$ contains all of the information in the data that is relevant to estimating $\theta$.  Once you know $T(\mathbf{X})$ (i.e. once you condition the joint PDF of the data on $T(\mathbf{X})$), the rest of the data tells you nothing more about $\theta$.
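
As a worked illustration of the definition (the Bernoulli model here is an assumed example, chosen only because the algebra is short), suppose that $X_1, X_2, ..., X_n$ are independent Bernoulli($\theta$) random variables, and take $T(\mathbf{X}) = \sum_{i=1}^{n} X_i$ with observed value $t$.  The joint probability mass function of the data is

$f_\theta(\mathbf{X}) = \theta^{t} (1 - \theta)^{n - t}$,

and, since $T(\mathbf{X})$ follows a Binomial($n, \theta$) distribution,

$g_\theta(t) = \binom{n}{t} \theta^{t} (1 - \theta)^{n - t}$.

Dividing the first by the second gives

$h_\theta(\mathbf{X}) = f_\theta(\mathbf{X}) \div g_\theta(t) = 1 \div \binom{n}{t}$,

which does not contain $\theta$.  So, under this assumed model, the sum of the observations (or, equivalently, their sample mean) is a sufficient statistic for $\theta$.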

Often, the sufficient statistic for $\theta$ is a summary statistic of $X_1, X_2, ..., X_n$, such as their

• sample mean
• sample median – removed thanks to comment by Christian Robert (Xi’an)
• sample minimum
• sample maximum

If such a summary statistic is sufficient for $\theta$, then knowing this one statistic is just as useful for estimating $\theta$ as knowing all $n$ observations.
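
To make this concrete, here is a minimal Python sketch under an assumed model that is not part of the lesson above: independent normal observations with unknown mean $\mu$ and known standard deviation 1, for which the sample mean is sufficient for $\mu$.  The function name `normal_log_likelihood` and the two small samples are hypothetical, chosen only for illustration.  The two samples differ but share the same size and sample mean, so their log-likelihoods should differ only by a constant that does not involve $\mu$.

```python
import math


def normal_log_likelihood(sample, mu, sigma=1.0):
    """Log-likelihood of i.i.d. Normal(mu, sigma^2) data, with sigma treated as known."""
    n = len(sample)
    squared_error = sum((x - mu) ** 2 for x in sample)
    return -0.5 * n * math.log(2 * math.pi * sigma ** 2) - squared_error / (2 * sigma ** 2)


# Two different (made-up) samples with the same size and the same sample mean of 2.
sample_a = [1.0, 2.0, 3.0]
sample_b = [0.0, 2.0, 4.0]

# Compare the two log-likelihoods at several candidate values of mu.
candidate_mus = [-1.0, 0.0, 0.5, 2.0, 5.0]
differences = [
    normal_log_likelihood(sample_a, mu) - normal_log_likelihood(sample_b, mu)
    for mu in candidate_mus
]

# The difference is the same at every mu, so both samples lead to
# the same conclusions about mu.
for mu, diff in zip(candidate_mus, differences):
    print(f"mu = {mu:5.1f}   log-likelihood difference = {diff:.6f}")
```

Because the difference is constant in $\mu$, the two likelihood functions have the same shape and the same maximizer; everything the data say about $\mu$ enters through the sample mean (together with the sample size).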

*The above definition holds for both discrete and continuous random variables; in the discrete case, the PDFs are replaced by probability mass functions.