## Benjamin Garden on Simple vs. Compound Interest in Finance – The Central Equilibrium – Episode 5

I am so pleased to publish this new episode of “The Central Equilibrium“, featuring Benjamin Garden.  He talked about simple and compound interest in the context of finance and investment, highlighting the power of compound interest to grow your money and to enlarge debt from credit cards.  We compared the formulas for calculating the accrued amounts under simple and compound interest, and we derived the formula for the Rule of 72, a short-cut to estimate the length of time needed to double your investment under compound interest.

Check out Ben’s blog, Twitter account (@GardenBenjamin), and Instagram account (@ben.garden) to get more advice about managing your money!

Part 1:

Part 2:

## Layne Newhouse on representing neural networks – The Central Equilibrium – Episode 4

I am excited to present the first of a multi-episode series on neural networks on my talk show, “The Central Equilibrium”.  My guest in this series in Layne Newhouse, and he talked about how to represent neural networks. We talked about the biological motivations behind neural networks, how to represent them in diagrams and mathematical equations, and a few of the common activation functions for neural networks.

Check it out!

## Video Tutorial – The Moment Generating Function of the Exponential Distribution

In this video tutorial on YouTube, I derive the moment generating function (MGF) of the exponential distribution.  Visit my YouTube channel to watch more video tutorials!

## Arnab Chakraborty on Bayes’ Theorem – The Central Equilibrium – Episode 3

Arnab Chakraborty kindly came to my new talk show, “The Central Equilibrium”, to talk about Bayes’ theorem.  He introduced the concept of conditional probability, stated Bayes’ theorem in its simple and general forms, and showed an example of how to use it in a calculation.

Check it out!

## Christopher Salahub on Markov Chains – The Central Equilibrium – Episode 2

It was a great pleasure to talk to Christopher Salahub about Markov chains in the second episode of my new talk show, The Central Equilibrium!  Chris graduated from the University of Waterloo with a Bachelor of Mathematics degree in statistics.  He just finished an internship in data development at Environics Analytics, and he is starting a Master’s program in statistics at ETH Zurich in Switzerland.

Chris recommends “Introduction to Probability Models” by Sheldon Ross to learn more about probability theory and Markov chains.

The Central Equilibrium is my new talk show about math, science, and economics. It focuses on technical topics that involve explanations with formulas, equations, graphs, and diagrams.  Stay tuned for more episodes in the coming weeks!

You can watch all of my videos on my YouTube channel!

Please watch the video on this blog.  You can also watch it directly on YouTube.

## Neil Seoni on the Fourier Transform and the Sampling Theorem – The Central Equilibrium – Episode 1

I am very excited to publish the very first episode of my new talk show, The Central Equilibrium!  My guest is Neil Seoni, an undergraduate student in electrical and computer engineering at Rice University in Houston, Texas. He has studied data science in his spare time, most notably taking a course on machine learning by Andrew Ng on Coursera. He is finishing his summer job as a Data Science Intern at Environics Analytics in Toronto, Ontario.

Neil recommends reading Don Johnson’s course notes from Rice University and his free text book to learn more about the topics covered in his episode.

The Central Equilibrium is my new talk show about math, science, and economics. It focuses on technical topics that involve explanations with formulas, equations, graphs, and diagrams.  Stay tuned for more episodes in the coming weeks!

You can watch all of my videos on my YouTube channel!

Please watch the video on this blog.  You can also watch it directly on YouTube.

## A Comprehensive Guide for Public Speaking at Scientific Conferences

### Introduction

I served as a judge for some of the student presentations at the 2016 Canadian Statistics Student Conference (CSSC).  The conference was both a learning opportunity and a networking opportunity for statistics students in Canada.  The presentations allowed the students to share their research and course projects with their peers, and it was a chance for them to get feedback about their work and learn new ideas from other students.

Unfortunately, I found most of the presentations to be very bad – not necessarily in terms of the content, but because of the delivery.  Although the students showed much earnestness and eagerness in sharing their work with others, most of them demonstrated poor competence in public speaking.

Public speaking is an important skill in knowledge-based industries, so these opportunities are valuable experiences for anybody to strengthen this skill.  You can learn it only by doing it many times, making mistakes, and learning from those mistakes.  Having delivered many presentations, learned from my share of mistakes, and received much praise for my seminars, I hope that the following tips will help anyone who presents at scientific conferences to improve their public-speaking skills.  In fact, most of these tips apply to public speaking in general.

I spoke at the 2016 Canadian Statistics Student Conference on career advice for students and new graduates in statistics.

Image courtesy of Peter Macdonald on Flickr.

## Mathematical Statistics Lesson of the Day – Basu’s Theorem

Today’s Statistics Lesson of the Day will discuss Basu’s theorem, which connects the previously discussed concepts of minimally sufficient statistics, complete statistics and ancillary statistics.  As before, I will begin with the following set-up.

Suppose that you collected data

$\mathbf{X} = X_1, X_2, ..., X_n$

in order to estimate a parameter $\theta$.  Let $f_\theta(x)$ be the probability density function (PDF) or probability mass function (PMF) for $X_1, X_2, ..., X_n$.

Let

$t = T(\mathbf{X})$

be a statistics based on $\textbf{X}$.

Basu’s theorem states that, if $T(\textbf{X})$ is a complete and minimal sufficient statistic, then $T(\textbf{X})$ is independent of every ancillary statistic.

Establishing the independence between 2 random variables can be very difficult if their joint distribution is hard to obtain.  This theorem allows the independence between minimally sufficient statistic and every ancillary statistic to be established without their joint distribution – and this is the great utility of Basu’s theorem.

However, establishing that a statistic is complete can be a difficult task.  In a later lesson, I will discuss another theorem that will make this task easier for certain cases.

## Mathematical Statistics Lesson of the Day – An Example of An Ancillary Statistic

Consider 2 random variables, $X_1$ and $X_2$, from the normal distribution $\text{Normal}(\mu, \sigma^2)$, where $\mu$ is unknown.  Then the statistic

$D = X_1 - X_2$

has the distribution

$\text{Normal}(0, 2\sigma^2)$.

The distribution of $D$ does not depend on $\mu$, so $D$ is an ancillary statistic for $\mu$.

Note that, if $\sigma^2$ is unknown, then $D$ is not ancillary for $\sigma^2$.

## Eric’s Enlightenment for Friday, May 8, 2015

1. A nice set of tutorials on Microsoft Excel at OfficeTuts by Tomasz Decker.
2. “We had proved that an assertion was indeed true in all of the difficult cases, but it turned out to be false in the simple case. We never bothered to check.”  Are mistakes in academic mathematics being effectively identified and corrected?  Vladimir Voevodsky (2002 Fields Medalist) published a major theorem in 1990, but Carlos Simpson found an error with the theorem in 1998.  It wasn’t until 2013 that Voevodsky finally became convinced that his theorem was wrong.  This motivated him to develop “proof assistants” – computer programs that help to prove mathematical theorems.
3. Synthesizing artificial muscles from gold-plated onion skins
4. Andrew Gelman debriefs his presentation to Princeton’s economics department about unbiasedness and econometrics.

## Mathematical Statistics Lesson of the Day – Minimally Sufficient Statistics

In using a statistic to estimate a parameter in a probability distribution, it is important to remember that there can be multiple sufficient statistics for the same parameter.  Indeed, the entire data set, $X_1, X_2, ..., X_n$, can be a sufficient statistic – it certainly contains all of the information that is needed to estimate the parameter.  However, using all $n$ variables is not very satisfying as a sufficient statistic, because it doesn’t reduce the information in any meaningful way – and a more compact, concise statistic is better than a complicated, multi-dimensional statistic.  If we can use a lower-dimensional statistic that still contains all necessary information for estimating the parameter, then we have truly reduced our data set without stripping any value from it.

Our saviour for this problem is a minimally sufficient statistic.  This is defined as a statistic, $T(\textbf{X})$, such that

1. $T(\textbf{X})$ is a sufficient statistic
2. if $U(\textbf{X})$ is any other sufficient statistic, then there exists a function $g$ such that

$T(\textbf{X}) = g[U(\textbf{X})].$

Note that, if there exists a one-to-one function $h$ such that

$T(\textbf{X}) = h[U(\textbf{X})],$

then $T(\textbf{X})$ and $U(\textbf{X})$ are equivalent.

## Mathematics and Mathematical Statistics Lesson of the Day – Convex Functions and Jensen’s Inequality

Consider a real-valued function $f(x)$ that is continuous on the interval $[x_1, x_2]$, where $x_1$ and $x_2$ are any 2 points in the domain of $f(x)$.  Let

$x_m = 0.5x_1 + 0.5x_2$

be the midpoint of $x_1$ and $x_2$.  Then, if

$f(x_m) \leq 0.5f(x_1) + 0.5f(x_2),$

then $f(x)$ is defined to be midpoint convex.

More generally, let’s consider any point within the interval $[x_1, x_2]$.  We can denote this arbitrary point as

$x_\lambda = \lambda x_1 + (1 - \lambda)x_2,$ where $0 < \lambda < 1$.

Then, if

$f(x_\lambda) \leq \lambda f(x_1) + (1 - \lambda) f(x_2),$

then $f(x)$ is defined to be convex.  If

$f(x_\lambda) < \lambda f(x_1) + (1 - \lambda) f(x_2),$

then $f(x)$ is defined to be strictly convex.

There is a very elegant and powerful relationship about convex functions in mathematics and in mathematical statistics called Jensen’s inequality.  It states that, for any random variable $Y$ with a finite expected value and for any convex function $g(y)$,

$E[g(Y)] \geq g[E(Y)]$.

A function $f(x)$ is defined to be concave if $-f(x)$ is convex.  Thus, Jensen’s inequality can also be stated for concave functions.  For any random variable $Z$ with a finite expected value and for any concave function $h(z)$,

$E[h(Z)] \leq h[E(Z)]$.

In future Statistics Lessons of the Day, I will prove Jensen’s inequality and discuss some of its implications in mathematical statistics.

## Mathematical Statistics Lesson of the Day – The Glivenko-Cantelli Theorem

In 2 earlier tutorials that focused on exploratory data analysis in statistics, I introduced

There is actually an elegant theorem that provides a rigorous basis for using empirical CDFs to estimate the true CDF – and this is true for any probability distribution.  It is called the Glivenko-Cantelli theorem, and here is what it states:

Given a sequence of $n$ independent and identically distributed random variables, $X_1, X_2, ..., X_n$,

$P[\lim_{n \to \infty} \sup_{x \epsilon \mathbb{R}} |\hat{F}_n(x) - F_X(x)| = 0] = 1.$

In other words, the empirical CDF of $X_1, X_2, ..., X_n$ converges uniformly to the true CDF.

My mathematical statistics professor at the University of Toronto, Keith Knight, told my class that this is often referred to as “The First Theorem of Statistics” or the “The Fundamental Theorem of Statistics”.  I think that this is a rather subjective title – the central limit theorem is likely more useful and important – but Page 261 of John Taylor’s An introduction to measure and probability (Springer, 1997) recognizes this attribution to the Glivenko-Cantelli theorem, too.

## Mathematical Statistics Lesson of the Day – Chebyshev’s Inequality

The variance of a random variable $X$ is just an expected value of a function of $X$.  Specifically,

$V(X) = E[(X - \mu)^2], \ \text{where} \ \mu = E(X)$.

Let’s substitute $(X - \mu)^2$ into Markov’s inequality and see what happens.  For convenience and without loss of generality, I will replace the constant $c$ with another constant, $b^2$.

$\text{Let} \ b^2 = c, \ b > 0. \ \ \text{Then,}$

$P[(X - \mu)^2 \geq b^2] \leq E[(X - \mu)^2] \div b^2$

$P[ (X - \mu) \leq -b \ \ \text{or} \ \ (X - \mu) \geq b] \leq V(X) \div b^2$

$P[|X - \mu| \geq b] \leq V(X) \div b^2$

Now, let’s substitute $b$ with $k \sigma$, where $\sigma$ is the standard deviation of $X$.  (I can make this substitution, because $\sigma$ is just another constant.)

$\text{Let} \ k \sigma = b. \ \ \text{Then,}$

$P[|X - \mu| \geq k \sigma] \leq V(X) \div k^2 \sigma^2$

$P[|X - \mu| \geq k \sigma] \leq 1 \div k^2$

This last inequality is known as Chebyshev’s inequality, and it is just a special version of Markov’s inequality.  In a later Statistics Lesson of the Day, I will discuss the motivation and intuition behind it.  (Hint: Read my earlier lesson on the motivation and intuition behind Markov’s inequality.)

## Mathematical Statistics Lesson of the Day – Markov’s Inequality

Markov’s inequality is an elegant and very useful inequality that relates the probability of an event concerning a non-negative random variable, $X$, with the expected value of $X$.  It states that

$P(X \geq c) \leq E(X) \div c,$

where $c > 0$.

I find Markov’s inequality to be beautiful for 2 reasons:

1. It applies to both continuous and discrete random variables.
2. It applies to any non-negative random variable from any distribution with a finite expected value.

In a later lesson, I will discuss the motivation and intuition behind Markov’s inequality, which has useful implications for understanding a data set.

## Mathematics and Applied Statistics Lesson of the Day – The Geometric Mean

Suppose that you invested in a stock 3 years ago, and the annual rates of return for each of the 3 years were

• 5% in the 1st year
• 10% in the 2nd year
• 15% in the 3rd year

What is the average rate of return in those 3 years?

It’s tempting to use the arithmetic mean, since we are so used to using it when trying to estimate the “centre” of our data.  However, the arithmetic mean is not appropriate in this case, because the annual rate of return implies a multiplicative growth of your investment by a factor of $1 + r$, where $r$ is the rate of return in each year.  In contrast, the arithmetic mean is appropriate for quantities that are additive in nature; for example, your average annual salary from the past 3 years is the sum of last 3 annual salaries divided by 3.

If the arithmetic mean is not appropriate, then what can we use instead?  Our saviour is the geometric mean, $G$.  The average factor of growth from the 3 years is

$G = [(1 + r_1)(1 + r_2) ... (1 + r_n)]^{1/n}$,

where $r_i$ is the rate of return in year $i$, $i = 1, 2, 3, ..., n$.  The average annual rate of return is $G - 1$.  Note that the geometric mean is NOT applied to the annual rates of return, but the annual factors of growth.

Returning to our example, our average factor of growth is

$G = [(1 + 0.05) \times (1 + 0.10) \times (1 + 0.15)]^{1/3} = 1.099242$.

Thus, our annual rate of return is $G - 1 = 1.099242 - 1 = 0.099242 = 9.9242\%$.

Here is a good way to think about the difference between the arithmetic mean and the geometric mean.  Suppose that there are 2 sets of numbers.

1. The first set, $S_1$, consists of your data $x_1, x_2, ..., x_n$, and this set has a sample size of $n$.
2. The second, $S_2$,  set also has a sample size of $n$, but all $n$ values are the same – let’s call this common value $y$.
• What number must $y$ be such that the sums in $S_1$ and $S_2$ are equal?  This value of $y$ is the arithmetic mean of the first set.
• What number must $y$ be such that the products in $S_1$ and $S_2$ are equal?  This value of $y$ is the geometric mean of the first set.

Note that the geometric means is only applicable to positive numbers.

## Mathematics and Applied Statistics Lesson of the Day – The Weighted Harmonic Mean

In a previous Statistics Lesson of the Day on the harmonic mean, I used an example of a car travelling at 2 different speeds – 60 km/hr and 40 km/hr.  In that example, the car travelled 120 km at both speeds, so the 2 speeds had equal weight in calculating the harmonic mean of the speeds.

What if the cars travelled different distances at those speeds?  In that case, we can modify the calculation to allow the weight of each datum to be different.  This results in the weighted harmonic mean, which has the formula

$H = \sum_{i = 1}^{n} w_i \ \ \div \ \ \sum_{i = 1}^{n}(w_i \ \div \ x_i)$.

For example, consider a car travelling for 240 kilometres at 2 different speeds and for 2 different distances:

1. 60 km/hr for 100 km
2. 40 km/hr for another 140 km

Then the weighted harmonic mean of the speeds (i.e. the average speed of the whole trip) is

$(100 \text{ km} \ + \ 140 \text{ km}) \ \div \ [(100 \text{ km} \ \div \ 60 \text{ km/hr}) \ + \ (140 \text{ km} \ \div \ 40 \text{ km/hr})]$

$= 46.45 \text{ km/hr}$

Notice that this is exactly the same calculation that we would use if we wanted to calculate the average speed of the whole trip by the formula from kinematics:

$\text{Average Speed} = \Delta \text{Distance} \div \Delta \text{Time}$

## Mathematics and Applied Statistics Lesson of the Day – The Harmonic Mean

The harmonic mean, H, for $n$ positive real numbers $x_1, x_2, ..., x_n$ is defined as

$H = n \div (1/x_1 + 1/x_2 + .. + 1/x_n) = n \div \sum_{i = 1}^{n}x_i^{-1}$.

This type of mean is useful for measuring the average of rates.  For example, consider a car travelling for 240 kilometres at 2 different speeds:

1. 60 km/hr for 120 km
2. 40 km/hr for another 120 km

Then its average speed for this trip is

$S_{avg} = 2 \div (1/60 + 1/40) = 48 \text{ km/hr}$

Notice that the speed for the 2 trips have equal weight in the calculation of the harmonic mean – this is valid because of the equal distance travelled at the 2 speeds.  If the distances were not equal, then use a weighted harmonic mean instead – I will cover this in a later lesson.

To confirm the formulaic calculation above, let’s use the definition of average speed from physics.  The average speed is defined as

$S_{avg} = \Delta \text{distance} \div \Delta \text{time}$

We already have the elapsed distance – it’s 240 km.  Let’s find the time elapsed for this trip.

$\Delta \text{ time} = 120 \text{ km} \times (1 \text{ hr}/60 \text{ km}) + 120 \text{ km} \times (1 \text{ hr}/40 \text{ km})$

$\Delta \text{time} = 5 \text{ hours}$

Thus,

$S_{avg} = 240 \text{ km} \div 5 \text{ hours} = 48 \text { km/hr}$

Notice that this explicit calculation of the average speed by the definition from kinematics is the same as the average speed that we calculated from the harmonic mean!

## Video Tutorial – Useful Relationships Between Any Pair of h(t), f(t) and S(t)

I first started my video tutorial series on survival analysis by defining the hazard function.  I then explained how this definition leads to the elegant relationship of

$h(t) = f(t) \div S(t)$.

In my new video, I derive 6 useful mathematical relationships that exist between any 2 of the 3 quantities in the above equation.  Each relationship allows one quantity to be written as a function of the other.

I am excited to continue adding to my Youtube channel‘s collection of video tutorials.  Please stay tuned for more!

## Video Tutorial – The Hazard Function is the Probability Density Function Divided by the Survival Function

In an earlier video, I introduced the definition of the hazard function and broke it down into its mathematical components.  Recall that the definition of the hazard function for events defined on a continuous time scale is

$h(t) = \lim_{\Delta t \rightarrow 0} [P(t < X \leq t + \Delta t \ | \ X > t) \ \div \ \Delta t]$.

Did you know that the hazard function can be expressed as the probability density function (PDF) divided by the survival function?

$h(t) = f(t) \div S(t)$

In my new Youtube video, I prove how this relationship can be obtained from the definition of the hazard function!  I am very excited to post this second video in my new Youtube channel.