Mathematical Statistics Lesson of the Day – Basu’s Theorem

Today’s Statistics Lesson of the Day will discuss Basu’s theorem, which connects the previously discussed concepts of minimally sufficient statistics, complete statistics and ancillary statistics.  As before, I will begin with the following set-up.

Suppose that you collected data

\mathbf{X} = X_1, X_2, ..., X_n

in order to estimate a parameter \theta.  Let f_\theta(x) be the probability density function (PDF) or probability mass function (PMF) for X_1, X_2, ..., X_n.

Let

t = T(\mathbf{X})

be a statistic based on \mathbf{X}.

Basu’s theorem states that, if T(\mathbf{X}) is a complete and minimal sufficient statistic, then T(\mathbf{X}) is independent of every ancillary statistic.

Establishing the independence between 2 random variables can be very difficult if their joint distribution is hard to obtain.  This theorem allows the independence between a complete and minimal sufficient statistic and every ancillary statistic to be established without finding their joint distribution – and this is the great utility of Basu’s theorem.
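
To make the utility concrete, below is a quick simulation sketch of the classic example (my own illustration, not part of the original lesson): for a normal sample with a known variance, the sample mean is a complete and minimal sufficient statistic for \mu, and the sample variance is ancillary, so Basu’s theorem implies that the two are independent.

# Simulation sketch: X_1, ..., X_n ~ N(mu, sigma^2) with sigma known.
# The sample mean is complete and sufficient for mu, and the sample
# variance is ancillary, so Basu's theorem implies independence.
set.seed(42)
n <- 10
sims <- replicate(100000, {
     x <- rnorm(n, mean = 3, sd = 2)
     c(xbar = mean(x), s2 = var(x))
})
# Zero correlation does not prove independence, but it is consistent with it.
cor(sims["xbar", ], sims["s2", ])   # approximately 0

Of course, a simulation can only suggest independence; the point of Basu’s theorem is that it establishes independence exactly, with no computation at all.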

However, establishing that a statistic is complete can be a difficult task.  In a later lesson, I will discuss another theorem that will make this task easier for certain cases.

Eric’s Enlightenment for Thursday, May 14, 2015

  1. Alcohol kills more people worldwide than HIV/AIDS, violence, and tuberculosis combined.
  2. Some crystals don’t recrystallize after heating and cooling, but form amorphous supercooled liquids.  Modifying the molecular structure of diketopyrrolopyrrole using shear forces can induce this type of behaviour.  Here is a video demonstration.  Here is the original paper.
  3. How Pyrex was born out of an accident in baking sponge cake 100 years ago.  (Hat Tip: Lauren Wolf)
  4. Check out David Campbell’s graduate statistical computing course at SFU.  It dives into some cool topics from his research that are not always covered in statistical computing courses, like approximate Bayesian computation and other computational Bayesian methods.

Using the Golden Section Search Method to Minimize the Sum of Absolute Deviations

Introduction

Recently, I introduced the golden section search method – a special way to save computation time by modifying the bisection method with the golden ratio – and I illustrated how to minimize a cusped function with this script.  I also wrote an R function to implement this method and an R script to apply it with an example.  Today, I will apply this method to a statistical topic: minimizing the sum of absolute deviations with the median.

While reading Page 148 (Section 6.3) in Michael Trosset’s “An Introduction to Statistical Inference and Its Applications”, I learned 2 simple yet interesting theorems.

If X is a random variable with a population mean \mu and a population median q_2, then

a) \mu minimizes the function f(c) = E[(X - c)^2]

b) q_2 minimizes the function h(c) = E(|X - c|)
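
As a quick numerical sanity check (my own addition, using simulated data rather than anything from Trosset’s book), R’s optimize() function recovers both minimizers for a skewed distribution, where the mean and the median differ:

# Exponential(1): population mean = 1, population median = log(2) ~ 0.693
set.seed(1)
x <- rexp(1000000, rate = 1)
f <- function(c) mean((x - c)^2)    # empirical version of E[(X - c)^2]
h <- function(c) mean(abs(x - c))   # empirical version of E(|X - c|)
optimize(f, interval = c(0, 3))$minimum   # close to 1, the mean
optimize(h, interval = c(0, 3))$minimum   # close to 0.693, the median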

I won’t prove these theorems in this blog post (perhaps later), but I want to use the golden section search method to show a result similar to b):

c) The sample median, \tilde{m}, minimizes the function

g(c) = \sum_{i=1}^{n} |X_i - c|.

This is not surprising, of course, since

|X - c| is just a function of the random variable X

– by the law of large numbers,

\lim_{n\to \infty} \frac{1}{n} \sum_{i=1}^{n} |X_i - c| = E(|X - c|).

Thus, if the median minimizes E(|X - c|), then, intuitively, the sample median approximately minimizes \frac{1}{n} \sum_{i=1}^{n} |X_i - c| for large n; since dividing by n does not change the location of the minimizer, it minimizes g(c) as well.  Let’s show this with the golden section search method, and let’s explore any differences that may arise between odd-numbered and even-numbered data sets.
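
Here is a minimal sketch of how that demonstration can go; the function below is a simplified stand-in for the golden section search function from my earlier post, and the variable and function names are my own.

# Golden section search for the minimum of a unimodal function on [a, b]
golden.section.search <- function(f, a, b, tolerance = 1e-7) {
     golden.ratio <- (sqrt(5) - 1) / 2   # 1/phi, approximately 0.618
     x1 <- b - golden.ratio * (b - a)
     x2 <- a + golden.ratio * (b - a)
     while (abs(b - a) > tolerance) {
          if (f(x1) < f(x2)) {   # minimum lies in [a, x2]
               b <- x2; x2 <- x1; x1 <- b - golden.ratio * (b - a)
          } else {               # minimum lies in [x1, b]
               a <- x1; x1 <- x2; x2 <- a + golden.ratio * (b - a)
          }
     }
     return((a + b) / 2)
}

set.seed(1)
x <- rnorm(11)                              # an odd-numbered data set
g <- function(c) sum(abs(x - c))            # sum of absolute deviations
golden.section.search(g, min(x), max(x))    # should agree with...
median(x)                                   # ...the sample median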


Scripts and Functions: Using R to Implement the Golden Section Search Method for Numerical Optimization

In an earlier post, I introduced the golden section search method – a modification of the bisection method for numerical optimization that saves computation time by using the golden ratio to set its test points.  This post contains the R function that implements this method, the R functions defining the 3 functions that were minimized by this method, and the R script that ran the minimization.

I learned some new R functions while implementing this new algorithm.

– the curve() function for plotting curves

– the cat() function for concatenating strings and variables and, hence, for printing debugging statements
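
For anyone unfamiliar with these 2 functions, here is a tiny usage sketch of my own (not code from the script in this post):

# curve() plots an expression in x over a given interval
curve(abs(x - 1.5), from = -1, to = 4, ylab = "|x - 1.5|")

# cat() concatenates and prints its arguments - handy for debugging output
iteration <- 7
midpoint <- 1.4932
cat("Iteration", iteration, "- current midpoint:", midpoint, "\n")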


The Golden Section Search Method: Modifying the Bisection Method with the Golden Ratio for Numerical Optimization

Introduction

The first algorithm that I learned for root-finding in my undergraduate numerical analysis class (MACM 316 at Simon Fraser University) was the bisection method.  It’s very intuitive and easy to implement in any programming language (I was using MATLAB at the time).  The bisection method can be easily adapted for optimizing 1-dimensional functions with a slight but intuitive modification.  As there are numerous books and web sites on the bisection method, I will not dwell on it in this blog post.

Instead, I will explain a clever and elegant way to modify the bisection method with the golden ratio that results in faster computation; I learned this method while reading “A First Course in Statistical Programming with R” by John Braun and Duncan Murdoch.  Using a script in R to implement this special algorithm, I will illustrate how to minimize a non-differentiable function with the golden section search method.  In a later post (for the sake of brevity), I will use the same method to show that the minimizer of the sum of the absolute deviations from a univariate data set is the median.  The R functions and script for doing everything are in another post.
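
Before the details, here is the key placement rule in equation form (my own summary of the idea, writing \phi = (1 + \sqrt{5})/2 \approx 1.618 for the golden ratio).  Given a bracketing interval [a, b], the 2 interior test points are

x_1 = b - (b - a)/\phi , \quad x_2 = a + (b - a)/\phi .

Because 1/\phi satisfies (1/\phi)^2 = 1 - 1/\phi, when the interval shrinks to [a, x_2] or to [x_1, b], one of the old test points sits exactly where a new test point is needed.  Each iteration therefore requires only 1 new function evaluation – this is the source of the savings over a naive approach that evaluates 2 fresh points per iteration.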

[Figure: Fibonacci spiral]

The Fibonacci spiral approximates the golden spiral, a logarithmic spiral whose growth factor is the golden ratio.

Source: Dicklyon via Wikimedia
