## Mathematical Statistics Lesson of the Day – An Example of An Ancillary Statistic

Consider 2 independent random variables, $X_1$ and $X_2$, from the normal distribution $\text{Normal}(\mu, \sigma^2)$, where $\mu$ is unknown. Then the statistic

$D = X_1 - X_2$

has the distribution

$\text{Normal}(0, 2\sigma^2)$.
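Assuming $X_1$ and $X_2$ are independent, this follows from three facts: expectation is linear, the variances of independent random variables add (with the coefficient $-1$ squared), and a linear combination of independent normal random variables is itself normal:

$E(D) = E(X_1) - E(X_2) = \mu - \mu = 0,$

$\text{Var}(D) = \text{Var}(X_1) + (-1)^2 \, \text{Var}(X_2) = \sigma^2 + \sigma^2 = 2\sigma^2.$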

The distribution of $D$ does not depend on $\mu$, so $D$ is an ancillary statistic for $\mu$.

Note that, if $\sigma^2$ is unknown, then $D$ is not ancillary for $\sigma^2$, because its distribution, $\text{Normal}(0, 2\sigma^2)$, depends on $\sigma^2$.
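Both claims can be checked by simulation. The sketch below is a Monte Carlo illustration, not a proof. It assumes NumPy is available and that $X_1$ and $X_2$ are independent. The helper `simulate_d` and the parameter values are hypothetical choices made for illustration. As $\mu$ changes, the sample mean and variance of $D$ should stay near $0$ and $2\sigma^2$. As $\sigma$ changes, the variance of $D$ should follow $2\sigma^2$.

```python
import numpy as np


def simulate_d(mu, sigma, n_draws=200_000, seed=0):
    """Draw independent X1, X2 ~ Normal(mu, sigma^2) and return D = X1 - X2.

    Hypothetical helper for illustration; n_draws and seed are arbitrary choices.
    """
    rng = np.random.default_rng(seed)
    x1 = rng.normal(loc=mu, scale=sigma, size=n_draws)
    x2 = rng.normal(loc=mu, scale=sigma, size=n_draws)
    return x1 - x2


# Vary mu with sigma fixed: mean(D) stays near 0 and var(D) near 2 * sigma^2 = 8.
for mu in (-10.0, 0.0, 25.0):
    d = simulate_d(mu, sigma=2.0)
    print(f"mu={mu:6.1f}  mean(D)={d.mean():+.3f}  var(D)={d.var():.3f}")

# Vary sigma with mu fixed: var(D) tracks 2 * sigma^2, so D is not ancillary for sigma^2.
for sigma in (1.0, 3.0):
    d = simulate_d(mu=0.0, sigma=sigma)
    print(f"sigma={sigma:.1f}  var(D)={d.var():.3f}  (2*sigma^2 = {2 * sigma**2:.1f})")
```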

## Mathematical Statistics Lesson of the Day – Ancillary Statistics

The set-up for today’s post mirrors my earlier Statistics Lessons of the Day on sufficient statistics and complete statistics.

Suppose that you collected data

$\mathbf{X} = X_1, X_2, ..., X_n$

in order to estimate a parameter $\theta$.  Let $f_\theta(x)$ be the probability density function (PDF) or probability mass function (PMF) for $X_1, X_2, ..., X_n$.

Let

$a = A(\mathbf{X})$

be a statistic based on $\mathbf{X}$.

If the distribution of $A(\mathbf{X})$ does NOT depend on $\theta$, then $A(\mathbf{X})$ is called an ancillary statistic for $\theta$.

On its own, an ancillary statistic contains no information about $\theta$: its distribution is the same regardless of the value of $\theta$. Why, then, would we care about $A(\mathbf{X})$? I will address this question in later Statistics Lessons of the Day, where I will connect ancillary statistics to sufficient statistics, minimally sufficient statistics and complete statistics.
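The definition above can be explored numerically. The sketch below is a Monte Carlo illustration, not a proof, and its setting is an assumption chosen for illustration rather than part of the original lesson. It takes $X_1, X_2, ..., X_n$ to be independent $\text{Normal}(\theta, 1)$ with $n = 5$, and the helpers `simulated_quantiles`, `sample_range` and `sample_mean` are hypothetical. Shifting every $X_i$ by $\theta$ leaves the sample range $\max_i X_i - \min_i X_i$ unchanged. Its distribution therefore does not depend on $\theta$, which makes it ancillary for $\theta$. The sample mean, by contrast, is not ancillary. Across values of $\theta$, the estimated quantiles of the range should agree up to simulation noise, while those of the mean should shift by $\theta$.

```python
import numpy as np


def simulated_quantiles(statistic, theta, n=5, n_reps=100_000,
                        probs=(0.1, 0.5, 0.9), seed=0):
    """Estimate quantiles of statistic(X) when X_1, ..., X_n are iid Normal(theta, 1).

    Hypothetical helper; the model, n, n_reps and probs are illustrative choices.
    """
    rng = np.random.default_rng(seed)
    samples = rng.normal(loc=theta, scale=1.0, size=(n_reps, n))
    values = statistic(samples)  # one value of the statistic per simulated data set
    return np.quantile(values, probs)


def sample_range(x):
    # max - min within each data set; adding theta to every X_i cancels out
    return x.max(axis=1) - x.min(axis=1)


def sample_mean(x):
    # the mean moves with theta, so its distribution depends on theta
    return x.mean(axis=1)


for theta in (0.0, 5.0):
    print(f"theta={theta}: range quantiles {simulated_quantiles(sample_range, theta)}, "
          f"mean quantiles {simulated_quantiles(sample_mean, theta)}")
```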

## Mathematical Statistics Lesson of the Day – Complete Statistics

The set-up for today’s post mirrors my earlier Statistics Lesson of the Day on sufficient statistics.

Suppose that you collected data

$\mathbf{X} = X_1, X_2, ..., X_n$

in order to estimate a parameter $\theta$.  Let $f_\theta(x)$ be the probability density function (PDF)* for $X_1, X_2, ..., X_n$.

Let

$t = T(\mathbf{X})$

be a statistic based on $\mathbf{X}$.

If

$E_\theta \{g[T(\mathbf{X})]\} = 0, \ \ \forall \ \theta,$

implies that

$P_\theta \{g[T(\mathbf{X})] = 0\} = 1, \ \ \forall \ \theta,$

then $T(\mathbf{X})$ is said to be complete.  To deconstruct this esoteric mathematical statement,

1. let $g(t)$ be a measurable function,
2. if you want to use $g[T(\mathbf{X})]$ to form an unbiased estimator of the zero function,
3. and if every such $g[T(\mathbf{X})]$ is almost surely equal to zero,
4. then $T(\mathbf{X})$ is a complete statistic.
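For a worked illustration, consider the following example, which is chosen here and is not part of the original lesson. Let $X_1$ and $X_2$ be independent $\text{Normal}(\mu, \sigma^2)$ random variables with $\sigma^2$ known. Take $T(\mathbf{X}) = X_1 - X_2$ and $g(t) = t$. Then

$E_\mu \{g[T(\mathbf{X})]\} = E_\mu(X_1 - X_2) = \mu - \mu = 0, \ \ \forall \ \mu,$

but $X_1 - X_2 \sim \text{Normal}(0, 2\sigma^2)$ is a continuous random variable, so

$P_\mu \{g[T(\mathbf{X})] = 0\} = P_\mu(X_1 = X_2) = 0 \neq 1.$

Here $g[T(\mathbf{X})]$ is an unbiased estimator of zero that is not almost surely zero, so $X_1 - X_2$ is not complete for $\mu$.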

I will discuss the intuition behind this bizarre definition in a later Statistics Lesson of the Day.

*The above definition holds for both discrete and continuous random variables; for discrete random variables, $f_\theta(x)$ is the probability mass function (PMF).