Mathematical Statistics Lesson of the Day – Basu’s Theorem

Today’s Statistics Lesson of the Day will discuss Basu’s theorem, which connects the previously discussed concepts of minimally sufficient statistics, complete statistics and ancillary statistics.  As before, I will begin with the following set-up.

Suppose that you collected data

$\mathbf{X} = X_1, X_2, ..., X_n$

in order to estimate a parameter $\theta$.  Let $f_\theta(x)$ be the probability density function (PDF) or probability mass function (PMF) for $X_1, X_2, ..., X_n$.

Let

$t = T(\mathbf{X})$

be a statistic based on $\textbf{X}$.

Basu’s theorem states that, if $T(\textbf{X})$ is a complete and minimal sufficient statistic, then $T(\textbf{X})$ is independent of every ancillary statistic.

Establishing the independence between 2 random variables can be very difficult if their joint distribution is hard to obtain.  This theorem allows the independence between a complete, minimally sufficient statistic and every ancillary statistic to be established without deriving their joint distribution; this is the great utility of Basu’s theorem.
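A classic application is a normal sample with unknown mean and known variance: the sample mean is a complete and sufficient statistic for the mean, the sample variance is ancillary, and so Basu’s theorem implies that the two are independent.  The following is a minimal simulation sketch of that consequence, assuming $X_i \sim N(\mu, \sigma^2)$ with $\sigma$ known.  The function names, the seed and the sample sizes are illustrative choices of mine, and a near-zero correlation is only consistent with independence; it does not prove it.

```python
import random
import statistics


def simulate_mean_and_variance(mu, sigma=1.0, n=10, reps=5000, seed=0):
    """Draw `reps` normal samples of size `n`; return their means and variances."""
    rng = random.Random(seed)
    means, variances = [], []
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        means.append(statistics.fmean(sample))
        variances.append(statistics.variance(sample))  # unbiased sample variance
    return means, variances


def pearson_correlation(xs, ys):
    """Plain Pearson correlation, written out to avoid version-specific helpers."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


# Assumed setting: mu = 3 is arbitrary; sigma = 1 is treated as known.
means, variances = simulate_mean_and_variance(mu=3.0)
corr = pearson_correlation(means, variances)
print(round(corr, 3))  # expected to be close to 0
```

Repeating the experiment with a skewed distribution (for example, exponential data) would typically give a clearly non-zero correlation, since the sample mean and sample variance are not independent in that case.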

However, establishing that a statistic is complete can be a difficult task.  In a later lesson, I will discuss another theorem that will make this task easier for certain cases.

Mathematical Statistics Lesson of the Day – Ancillary Statistics

The set-up for today’s post mirrors my earlier Statistics Lessons of the Day on sufficient statistics and complete statistics.

Suppose that you collected data

$\mathbf{X} = X_1, X_2, ..., X_n$

in order to estimate a parameter $\theta$.  Let $f_\theta(x)$ be the probability density function (PDF) or probability mass function (PMF) for $X_1, X_2, ..., X_n$.

Let

$a = A(\mathbf{X})$

be a statistic based on $\textbf{X}$.

If the distribution of $A(\textbf{X})$ does NOT depend on $\theta$, then $A(\textbf{X})$ is called an ancillary statistic.

An ancillary statistic contains no information about $\theta$; its distribution is fixed and known without any relation to $\theta$.  Why, then, would we care about $A(\textbf{X})$?  I will address this question in later Statistics Lessons of the Day, and I will connect ancillary statistics to sufficient statistics, minimally sufficient statistics and complete statistics.
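As a concrete illustration, consider a location family such as $N(\theta, 1)$.  The sample range, $\max_i X_i - \min_i X_i$, is ancillary there, because shifting every observation by $\theta$ leaves the range unchanged.  The sketch below checks this empirically; the choice of distribution, the two values of $\theta$, the seeds and the function name are assumptions made for illustration only.

```python
import random
import statistics


def simulate_ranges(theta, n=10, reps=5000, seed=0):
    """Simulate the sample range of `reps` samples of size `n` from N(theta, 1)."""
    rng = random.Random(seed)
    ranges = []
    for _ in range(reps):
        sample = [rng.gauss(theta, 1.0) for _ in range(n)]
        ranges.append(max(sample) - min(sample))
    return ranges


# Two very different parameter values, with independent random streams.
ranges_a = simulate_ranges(theta=0.0, seed=1)
ranges_b = simulate_ranges(theta=50.0, seed=2)

mean_a, mean_b = statistics.fmean(ranges_a), statistics.fmean(ranges_b)
sd_a, sd_b = statistics.stdev(ranges_a), statistics.stdev(ranges_b)

# The summaries should agree up to simulation noise, whatever theta is.
print(round(mean_a, 2), round(mean_b, 2))
print(round(sd_a, 2), round(sd_b, 2))
```

By contrast, the sample mean of the same data would shift from roughly 0 to roughly 50, so it is clearly not ancillary.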

Mathematical Statistics Lesson of the Day – Minimally Sufficient Statistics

In using a statistic to estimate a parameter in a probability distribution, it is important to remember that there can be multiple sufficient statistics for the same parameter.  Indeed, the entire data set, $X_1, X_2, ..., X_n$, can be a sufficient statistic – it certainly contains all of the information that is needed to estimate the parameter.  However, using all $n$ variables is not very satisfying as a sufficient statistic, because it doesn’t reduce the information in any meaningful way – and a more compact, concise statistic is better than a complicated, multi-dimensional statistic.  If we can use a lower-dimensional statistic that still contains all necessary information for estimating the parameter, then we have truly reduced our data set without stripping any value from it.

Our saviour for this problem is a minimally sufficient statistic.  This is defined as a statistic, $T(\textbf{X})$, such that

1. $T(\textbf{X})$ is a sufficient statistic
2. if $U(\textbf{X})$ is any other sufficient statistic, then there exists a function $g$ such that

$T(\textbf{X}) = g[U(\textbf{X})].$

Note that, if there exists a one-to-one function $h$ such that

$T(\textbf{X}) = h[U(\textbf{X})],$

then $T(\textbf{X})$ and $U(\textbf{X})$ are equivalent.
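To make condition 2 concrete, here is a small sketch for a Bernoulli($\theta$) sample of size 4, which is my own choice of example.  The total number of successes, the pair of half-sample totals, and the full data vector are all sufficient for $\theta$.  The code checks, by enumerating every possible sample, whether one statistic can be written as a function of another.  The helper `is_function_of` is a hypothetical name, and the check covers only these particular statistics, so it illustrates the definition rather than proving minimality.

```python
from itertools import product


def is_function_of(t_stat, u_stat, samples):
    """Return True if t_stat(x) is fully determined by u_stat(x) on all samples."""
    seen = {}
    for x in samples:
        u, t = u_stat(x), t_stat(x)
        # The first time we see u, remember t; afterwards t must always match.
        if seen.setdefault(u, t) != t:
            return False
    return True


def total(x):
    return sum(x)


def split_totals(x):
    return (sum(x[:2]), sum(x[2:]))


def full_data(x):
    return x


samples = list(product([0, 1], repeat=4))  # every possible Bernoulli sample, n = 4

total_from_split = is_function_of(total, split_totals, samples)  # True
total_from_data = is_function_of(total, full_data, samples)      # True
split_from_total = is_function_of(split_totals, total, samples)  # False

print(total_from_split, total_from_data, split_from_total)
```

The total can be recovered from either of the larger statistics, but not the other way around, which is the sense in which it is the more compact summary.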

Mathematical Statistics Lesson of the Day – Sufficient Statistics

*Update on 2014-11-06: Thanks to Christian Robert’s comment, I have removed the sample median as an example of a sufficient statistic.

Suppose that you collected data

$\mathbf{X} = X_1, X_2, ..., X_n$

in order to estimate a parameter $\theta$.  Let $f_\theta(x)$ be the probability density function (PDF)* for $X_1, X_2, ..., X_n$.

Let

$t = T(\mathbf{X})$

be a statistic based on $\mathbf{X}$.  Let $g_\theta(t)$ be the PDF for $T(\mathbf{X})$.

If the conditional PDF

$h_\theta(\mathbf{X}) = f_\theta(x) \div g_\theta[T(\mathbf{X})]$

is independent of $\theta$, then $T(\mathbf{X})$ is a sufficient statistic for $\theta$.  In other words,

$h_\theta(\mathbf{X}) = h(\mathbf{X})$,

and $\theta$ does not appear in $h(\mathbf{X})$.

Intuitively, this means that $T(\mathbf{X})$ contains everything you need to estimate $\theta$, so knowing $T(\mathbf{X})$ (i.e. conditioning $f_\theta(x)$ on $T(\mathbf{X})$) is sufficient for estimating $\theta$.
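The definition can be checked directly in a small discrete case.  The sketch below assumes a Bernoulli($\theta$) sample of size 4 with $T(\mathbf{X}) = \sum_i X_i$, which is my own choice of example; the function names are illustrative.  It computes the ratio of the joint PMF to the PMF of the sum for several values of $\theta$ and confirms that the ratio always equals $1 / \binom{n}{t}$, which does not involve $\theta$.

```python
from itertools import product
from math import comb, isclose


def joint_pmf(x, theta):
    """Joint PMF of an i.i.d. Bernoulli(theta) sample x."""
    t = sum(x)
    return theta ** t * (1 - theta) ** (len(x) - t)


def pmf_of_sum(t, n, theta):
    """PMF of T = sum of the sample, which is Binomial(n, theta)."""
    return comb(n, t) * theta ** t * (1 - theta) ** (n - t)


def conditional_pmf(x, theta):
    """Ratio f_theta(x) / g_theta(T(x)) from the definition of sufficiency."""
    return joint_pmf(x, theta) / pmf_of_sum(sum(x), len(x), theta)


n = 4
thetas = [0.2, 0.5, 0.9]  # arbitrary parameter values to compare

# For every sample and every theta, the ratio should equal 1 / C(n, t).
theta_free = all(
    isclose(conditional_pmf(x, theta), 1 / comb(n, sum(x)))
    for x in product([0, 1], repeat=n)
    for theta in thetas
)
print(theta_free)  # True
```

Since the ratio is the same for every $\theta$, the sum of the observations is sufficient for $\theta$ in this model.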

Often, the sufficient statistic for $\theta$ is a summary statistic of $X_1, X_2, ..., X_n$, such as their

• sample mean
• sample median – removed thanks to comment by Christian Robert (Xi’an)
• sample minimum
• sample maximum

If such a summary statistic is sufficient for $\theta$, then knowing this one statistic is just as useful as knowing all $n$ data for estimating $\theta$.

*The above definition holds for both discrete and continuous random variables; in the discrete case, replace each PDF with the corresponding PMF.