## Trapezoidal Integration – Conceptual Foundations and a Statistical Application in R

#### Introduction

Today, I will begin a series of posts on numerical integration, which has a wide range of applications in many fields, including statistics.  I will introduce trapezoidal integration by discussing its conceptual foundations, write my own R function to implement trapezoidal integration, and use it to check that the Beta(2, 5) probability density function actually integrates to 1 over its support set.  Fully commented and readily usable R code will be provided at the end.

Given a probability density function (PDF) and its support set as vectors in an array programming language like R, how do you integrate the PDF over its support set to ensure that it equals to 1?  Read the rest of this post to view my own R function to implement trapezoidal integration and learn how to use it to numerically approximate integrals.

## Using the Golden Section Search Method to Minimize the Sum of Absolute Deviations

#### Introduction

Recently, I introduced the golden search method – a special way to save computation time by modifying the bisection method with the golden ratio – and I illustrated how to minimize a cusped function with this script.  I also wrote an R function to implement this method and an R script to apply this method with an example.  Today, I will use apply this method to a statistical topic: minimizing the sum of absolute deviations with the median.

While reading Page 148 (Section 6.3) in Michael Trosset’s “An Introduction to Statistical Inference and Its Applications”, I learned 2 basic, simple, yet interesting theorems.

If X is a random variable with a population mean $\mu$ and a population median $q_2$, then

a) $\mu$ minimizes the function $f(c) = E[(X - c)^2]$

b) $q_2$ minimizes the function $h(c) = E(|X - c|)$

I won’t prove these theorems in this blog post (perhaps later), but I want to use the golden section search method to show a result similar to b):

c) The sample median, $\tilde{m}$, minimizes the function

$g(c) = \sum_{i=1}^{n} |X_i - c|$.

This is not surprising, of course, since

$|X - c|$ is just a function of the random variable $X$

– by the law of large numbers,

$\lim_{n\to \infty}\sum_{i=1}^{n} |X_i - c| = E(|X - c|)$

Thus, if the median minimizes $E(|X - c|)$, then, intuitively, it minimizes $\lim_{n\to \infty}\sum_{i=1}^{n} |X_i - c|$.  Let’s show this with the golden section search method, and let’s explore any differences that may arise between odd-numbered and even-numbered data sets.

## Scripts and Functions: Using R to Implement the Golden Section Search Method for Numerical Optimization

In an earlier post, I introduced the golden section search method – a modification of the bisection method for numerical optimization that saves computation time by using the golden ratio to set its test points.  This post contains the R function that implements this method, the R functions that contain the 3 functions that were minimized by this method, and the R script that ran the minimization.

I learned some new R functions while learning this new algorithm.

– the curve() function for plotting curves

– the cat() function for concatenating strings and variables and, hence, for printing debugging statements

## The Golden Section Search Method: Modifying the Bisection Method with the Golden Ratio for Numerical Optimization

#### Introduction

The first algorithm that I learned for root-finding in my undergraduate numerical analysis class (MACM 316 at Simon Fraser University) was the bisection method.  It’s very intuitive and easy to implement in any programming language (I was using MATLAB at the time).  The bisection method can be easily adapted for optimizing 1-dimensional functions with a slight but intuitive modification.  As there are numerous books and web sites on the bisection method, I will not dwell on it in my blog post.

Instead, I will explain a clever and elegant way to modify the bisection method with the golden ratio that results in faster computation; I learned this method while reading “A First Course in Statistical Programming with R” by John Braun and Duncan Murdoch.  Using a script in R to implement this special algorithm, I will illustrate how to minimize a non-differentiable function with the golden section search method.  In a later post (for the sake of brevity), I will use the same method to show that the minimizer of the sum of the absolute deviations from a univariate data set is the median.  The R functions and script for doing everything are in another post.

The Fibonacci spiral approximates the golden spiral, a logarithmic spiral whose growth factor is the golden ratio.

Source: Dicklyon via Wikimedia