Mathematical Statistics Lesson of the Day – Minimally Sufficient Statistics

In using a statistic to estimate a parameter in a probability distribution, it is important to remember that there can be multiple sufficient statistics for the same parameter.  Indeed, the entire data set, X_1, X_2, ..., X_n, can be a sufficient statistic – it certainly contains all of the information that is needed to estimate the parameter.  However, using all n variables is not very satisfying as a sufficient statistic, because it doesn’t reduce the information in any meaningful way – and a more compact, concise statistic is better than a complicated, multi-dimensional statistic.  If we can use a lower-dimensional statistic that still contains all necessary information for estimating the parameter, then we have truly reduced our data set without stripping any value from it.

Our saviour for this problem is a minimally sufficient statistic.  This is defined as a statistic, T(\textbf{X}), such that

  1. T(\textbf{X}) is a sufficient statistic
  2. if U(\textbf{X}) is any other sufficient statistic, then there exists a function g such that

T(\textbf{X}) = g[U(\textbf{X})].

Note that, if there exists a one-to-one function h such that

T(\textbf{X}) = h[U(\textbf{X})],

then T(\textbf{X}) and U(\textbf{X}) are equivalent.