## Mathematical Statistics Lesson of the Day – Basu’s Theorem

Today’s Statistics Lesson of the Day will discuss Basu’s theorem, which connects the previously discussed concepts of minimally sufficient statistics, complete statistics and ancillary statistics.  As before, I will begin with the following set-up.

Suppose that you collected data

$\mathbf{X} = X_1, X_2, ..., X_n$

in order to estimate a parameter $\theta$.  Let $f_\theta(x)$ be the probability density function (PDF) or probability mass function (PMF) for $X_1, X_2, ..., X_n$.

Let

$t = T(\mathbf{X})$

be a statistics based on $\textbf{X}$.

Basu’s theorem states that, if $T(\textbf{X})$ is a complete and minimal sufficient statistic, then $T(\textbf{X})$ is independent of every ancillary statistic.

Establishing the independence between 2 random variables can be very difficult if their joint distribution is hard to obtain.  This theorem allows the independence between minimally sufficient statistic and every ancillary statistic to be established without their joint distribution – and this is the great utility of Basu’s theorem.

However, establishing that a statistic is complete can be a difficult task.  In a later lesson, I will discuss another theorem that will make this task easier for certain cases.

## Eric’s Enlightenment for Thursday, May 14, 2015

1. Alcohol kills more people worldwide than HIV, AIDS, violence and tuberculosis combined.
2. Some crystals don’t recrystallize after heating and cooling, but form amorphous supercooled liquids.  Modifying the molecular structure of diketopyrrolopyrrole using shear forces can induce this type of behaviour.  Here is a video demonstration.  Here is the original paper.
3. How pyrex was born out of an accident in cooking spongecake 100 years ago.  (Hat Tip: Lauren Wolf)
4. Check out David Campbell’s graduate statistical computing course at SFU.  It dives into some cool topics in his research that are not always covered in statistical computing, like approximate Bayesian computation and many computational Bayesian methods.

## Using the Golden Section Search Method to Minimize the Sum of Absolute Deviations

#### Introduction

Recently, I introduced the golden search method – a special way to save computation time by modifying the bisection method with the golden ratio – and I illustrated how to minimize a cusped function with this script.  I also wrote an R function to implement this method and an R script to apply this method with an example.  Today, I will use apply this method to a statistical topic: minimizing the sum of absolute deviations with the median.

While reading Page 148 (Section 6.3) in Michael Trosset’s “An Introduction to Statistical Inference and Its Applications”, I learned 2 basic, simple, yet interesting theorems.

If X is a random variable with a population mean $\mu$ and a population median $q_2$, then

a) $\mu$ minimizes the function $f(c) = E[(X - c)^2]$

b) $q_2$ minimizes the function $h(c) = E(|X - c|)$

I won’t prove these theorems in this blog post (perhaps later), but I want to use the golden section search method to show a result similar to b):

c) The sample median, $\tilde{m}$, minimizes the function

$g(c) = \sum_{i=1}^{n} |X_i - c|$.

This is not surprising, of course, since

$|X - c|$ is just a function of the random variable $X$

– by the law of large numbers,

$\lim_{n\to \infty}\sum_{i=1}^{n} |X_i - c| = E(|X - c|)$

Thus, if the median minimizes $E(|X - c|)$, then, intuitively, it minimizes $\lim_{n\to \infty}\sum_{i=1}^{n} |X_i - c|$.  Let’s show this with the golden section search method, and let’s explore any differences that may arise between odd-numbered and even-numbered data sets.

## Scripts and Functions: Using R to Implement the Golden Section Search Method for Numerical Optimization

In an earlier post, I introduced the golden section search method – a modification of the bisection method for numerical optimization that saves computation time by using the golden ratio to set its test points.  This post contains the R function that implements this method, the R functions that contain the 3 functions that were minimized by this method, and the R script that ran the minimization.

I learned some new R functions while learning this new algorithm.

– the curve() function for plotting curves

– the cat() function for concatenating strings and variables and, hence, for printing debugging statements

## The Golden Section Search Method: Modifying the Bisection Method with the Golden Ratio for Numerical Optimization

#### Introduction

The first algorithm that I learned for root-finding in my undergraduate numerical analysis class (MACM 316 at Simon Fraser University) was the bisection method.  It’s very intuitive and easy to implement in any programming language (I was using MATLAB at the time).  The bisection method can be easily adapted for optimizing 1-dimensional functions with a slight but intuitive modification.  As there are numerous books and web sites on the bisection method, I will not dwell on it in my blog post.

Instead, I will explain a clever and elegant way to modify the bisection method with the golden ratio that results in faster computation; I learned this method while reading “A First Course in Statistical Programming with R” by John Braun and Duncan Murdoch.  Using a script in R to implement this special algorithm, I will illustrate how to minimize a non-differentiable function with the golden section search method.  In a later post (for the sake of brevity), I will use the same method to show that the minimizer of the sum of the absolute deviations from a univariate data set is the median.  The R functions and script for doing everything are in another post.

The Fibonacci spiral approximates the golden spiral, a logarithmic spiral whose growth factor is the golden ratio.

Source: Dicklyon via Wikimedia