Video Tutorial: Naive Bayes Classifiers

Naive Bayes classifiers are simple but powerful tools for classification in statistics and machine learning.  In this video tutorial, I use a simulated data set and illustrate the mathematical details of how this technique works.
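The mathematical core of the technique is Bayes' rule: the predicted class is the one with the largest posterior probability, and the "naive" assumption treats the features as conditionally independent within each class, so the likelihood factors into a product of per-feature densities.  As a minimal sketch in R (the simulated data, the Gaussian likelihoods, and all names here are my own illustration, not the video's), a naive Bayes classifier with two features might look like:

```r
set.seed(1)
n <- 100
# Simulate two classes with two features each; within a class, the
# features are drawn as independent normals (the "naive" assumption)
a <- data.frame(x1 = rnorm(n, mean = 0, sd = 1), x2 = rnorm(n, mean = 0, sd = 1))
b <- data.frame(x1 = rnorm(n, mean = 3, sd = 1), x2 = rnorm(n, mean = 3, sd = 1))

# Log posterior (up to a constant) = log prior + sum of per-feature
# log densities, with each density estimated from the training sample
log.post <- function(obs, train, prior) {
  log(prior) +
    dnorm(obs[1], mean(train$x1), sd(train$x1), log = TRUE) +
    dnorm(obs[2], mean(train$x2), sd(train$x2), log = TRUE)
}

# Assign the class with the larger log posterior (equal priors here)
classify <- function(obs) {
  if (log.post(obs, a, 0.5) > log.post(obs, b, 0.5)) "A" else "B"
}

classify(c(0.2, -0.5))  # near class A's centre
classify(c(2.8, 3.4))   # near class B's centre
```

With two well-separated classes, observations near each class's centre are assigned to that class; working on the log scale avoids numerical underflow when many features are multiplied together.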

In my recent episode on The Central Equilibrium about word embeddings and text classification, Mandy Gu used naive Bayes classifiers to determine if a sentence is toxic or non-toxic – a very common objective when moderating discussions in online forums.  If you are not familiar with naive Bayes classifiers, then I encourage you to watch this video first before watching Mandy’s episode on The Central Equilibrium.

Mandy Gu on Word Embeddings and Text Classification – The Central Equilibrium – Episode 9

I am so grateful to Mandy Gu for being a guest on The Central Equilibrium to talk about word embeddings and text classification.  She began by showing how data from text can be encoded in vectors and matrices, and then she used a naive Bayes classifier to classify sentences as toxic or non-toxic – a very common problem for moderating discussions in online forums.  I learned a lot from her in this episode, and you can learn more from Mandy on her Medium blog.

If you are not familiar with naive Bayes classifiers, then I encourage you to watch my video tutorial about this topic first.

Christopher Salahub on Markov Chains – The Central Equilibrium – Episode 2

It was a great pleasure to talk to Christopher Salahub about Markov chains in the second episode of my new talk show, The Central Equilibrium!  Chris graduated from the University of Waterloo with a Bachelor of Mathematics degree in statistics.  He just finished an internship in data development at Environics Analytics, and he is starting a Master’s program in statistics at ETH Zurich in Switzerland.

Chris recommends “Introduction to Probability Models” by Sheldon Ross to learn more about probability theory and Markov chains.

The Central Equilibrium is my new talk show about math, science, and economics. It focuses on technical topics that involve explanations with formulas, equations, graphs, and diagrams.  Stay tuned for more episodes in the coming weeks!

You can watch all of my videos on my YouTube channel!

Please watch the video on this blog.  You can also watch it directly on YouTube.

Neil Seoni on the Fourier Transform and the Sampling Theorem – The Central Equilibrium – Episode 1

I am very excited to publish the very first episode of my new talk show, The Central Equilibrium!  My guest is Neil Seoni, an undergraduate student in electrical and computer engineering at Rice University in Houston, Texas. He has studied data science in his spare time, most notably taking a course on machine learning by Andrew Ng on Coursera. He is finishing his summer job as a Data Science Intern at Environics Analytics in Toronto, Ontario.

Neil recommends reading Don Johnson’s course notes from Rice University and his free textbook to learn more about the topics covered in his episode.


Rectangular Integration (a.k.a. The Midpoint Rule) – Conceptual Foundations and a Statistical Application in R

Introduction

Continuing the recently started series on numerical integration, this post will introduce rectangular integration.  I will describe the concept behind rectangular integration, provide an R function that implements it, and use it to check that the $Beta(2, 5)$ distribution actually integrates to 1 over its support set.  This post follows from my previous post on trapezoidal integration.

Image courtesy of Qef.

Conceptual Background of Rectangular Integration (a.k.a. The Midpoint Rule)

Rectangular integration is a numerical integration technique that approximates the area under a curve with rectangles.  Here are its features:

• The rectangle’s width is determined by the interval of integration.
• One rectangle could span the width of the interval of integration and approximate the entire integral.
• Alternatively, the interval of integration could be sub-divided into $n$ smaller intervals of equal length, and $n$ rectangles would be used to approximate the integral; each smaller rectangle has the width of the smaller interval.
• The rectangle’s height is the function’s value at the midpoint of its base.
• Within a fixed interval of integration, the approximation becomes more accurate as more rectangles are used; each rectangle becomes narrower, and the height of the rectangle better captures the values of the function within that interval.
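The fully commented R function is provided in the full post; as a minimal sketch of the bullet points above (the function name `rectangular.integration` and its defaults are my own assumptions, not the post's code), the midpoint rule might be implemented as:

```r
# Midpoint (rectangular) rule: approximate the integral of f over [a, b]
# using n rectangles of equal width, each with height f(midpoint of its base)
rectangular.integration <- function(f, a, b, n = 1000) {
  width <- (b - a) / n
  midpoints <- seq(a + width / 2, b - width / 2, by = width)
  sum(f(midpoints)) * width
}

# Check that the Beta(2, 5) density integrates to 1 over its support, [0, 1]
beta.2.5 <- function(x) dbeta(x, shape1 = 2, shape2 = 5)
rectangular.integration(beta.2.5, 0, 1, n = 1000)  # close to 1
```

Increasing `n` narrows each rectangle, so its height better captures the function's values over its base, and the approximation improves.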

Trapezoidal Integration – Conceptual Foundations and a Statistical Application in R

Introduction

Today, I will begin a series of posts on numerical integration, which has a wide range of applications in many fields, including statistics.  I will introduce trapezoidal integration by discussing its conceptual foundations, write my own R function to implement trapezoidal integration, and use it to check that the Beta(2, 5) probability density function actually integrates to 1 over its support set.  Fully commented and readily usable R code will be provided at the end.

Given a probability density function (PDF) and its support set as vectors in an array programming language like R, how do you integrate the PDF over its support set to ensure that it equals 1?  Read the rest of this post to view my own R function to implement trapezoidal integration and learn how to use it to numerically approximate integrals.
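My full R function is not reproduced in this excerpt, but for a PDF and its support set supplied as vectors, a minimal trapezoidal-rule sketch (the names here are my own) might look like:

```r
# Trapezoidal rule for a function tabulated at sorted points x with values y:
# each pair of adjacent points defines a trapezoid with area
# 0.5 * (x[i+1] - x[i]) * (y[i] + y[i+1])
trapezoidal.integration <- function(x, y) {
  if (length(x) != length(y)) stop("x and y must have the same length")
  idx <- 2:length(x)
  sum(0.5 * (x[idx] - x[idx - 1]) * (y[idx] + y[idx - 1]))
}

# Integrate the Beta(2, 5) density over its support set, [0, 1]
support <- seq(0, 1, by = 0.001)
trapezoidal.integration(support, dbeta(support, 2, 5))  # close to 1
```

Because the inputs are plain vectors, this works for any density evaluated on a grid, not just ones with a closed-form antiderivative.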

Using the Golden Section Search Method to Minimize the Sum of Absolute Deviations

Introduction

Recently, I introduced the golden section search method – a special way to save computation time by modifying the bisection method with the golden ratio – and I illustrated how to minimize a cusped function with it.  I also wrote an R function to implement this method and an R script to apply it to an example.  Today, I will apply this method to a statistical topic: minimizing the sum of absolute deviations with the median.

While reading Page 148 (Section 6.3) in Michael Trosset’s “An Introduction to Statistical Inference and Its Applications”, I learned 2 basic yet interesting theorems.

If $X$ is a random variable with a population mean $\mu$ and a population median $q_2$, then

a) $\mu$ minimizes the function $f(c) = E[(X - c)^2]$

b) $q_2$ minimizes the function $h(c) = E(|X - c|)$

I won’t prove these theorems in this blog post (perhaps later), but I want to use the golden section search method to show a result similar to b):

c) The sample median, $\tilde{m}$, minimizes the function

$g(c) = \sum_{i=1}^{n} |X_i - c|$.

This is not surprising, of course, since $|X - c|$ is just a function of the random variable $X$, so, by the law of large numbers,

$\lim_{n\to \infty}\frac{1}{n}\sum_{i=1}^{n} |X_i - c| = E(|X - c|)$

Thus, if the median minimizes $E(|X - c|)$, then, intuitively, it minimizes $\frac{1}{n}\sum_{i=1}^{n} |X_i - c|$ for large $n$; since $n$ is fixed for a given data set, minimizing this average is equivalent to minimizing the sum $g(c)$.  Let’s show this with the golden section search method, and let’s explore any differences that may arise between odd-numbered and even-numbered data sets.
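My own golden section search function from the earlier post is not reproduced here; as a stand-in sketch, R's built-in `optimize()` (which itself combines golden section search with successive parabolic interpolation) can minimize $g(c)$ for both an odd-numbered and an even-numbered data set; the data sets below are my own toy examples:

```r
# Sum of absolute deviations, g(c), for a data vector x
sum.abs.dev <- function(c, x) sum(abs(x - c))

odd.sample  <- c(2, 7, 1, 9, 4)     # odd n: the median, 4, is unique
even.sample <- c(2, 7, 1, 9, 4, 6)  # even n: any c in [4, 6] minimizes g(c)

# Minimize g(c) over the range of each data set
optimize(sum.abs.dev, range(odd.sample), x = odd.sample)$minimum   # close to 4
optimize(sum.abs.dev, range(even.sample), x = even.sample)$minimum # in [4, 6]
```

For an odd-numbered data set, the minimizer is the unique sample median; for an even-numbered one, $g(c)$ is flat between the two middle order statistics, so the numerical minimizer can land anywhere in that interval.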

Scripts and Functions: Using R to Implement the Golden Section Search Method for Numerical Optimization

In an earlier post, I introduced the golden section search method – a modification of the bisection method for numerical optimization that saves computation time by using the golden ratio to set its test points.  This post contains the R function that implements this method, the R functions for the 3 functions that were minimized by this method, and the R script that ran the minimizations.

I learned some new R functions while implementing this new algorithm.

– the curve() function for plotting curves

– the cat() function for concatenating strings and variables and, hence, for printing debugging statements
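As a quick, standalone illustration of these two functions (the cusped function and the printed values are my own examples):

```r
# curve() plots an expression in x over a given interval
curve(abs(x - 2), from = 0, to = 4, ylab = "|x - 2|")

# cat() concatenates strings and variables and prints them without quotes,
# which makes it handy for debugging statements inside a loop
iteration <- 7
midpoint <- 2.0
cat("iteration", iteration, ": midpoint =", midpoint, "\n")
```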

The Golden Section Search Method: Modifying the Bisection Method with the Golden Ratio for Numerical Optimization

Introduction

The first algorithm that I learned for root-finding in my undergraduate numerical analysis class (MACM 316 at Simon Fraser University) was the bisection method.  It’s very intuitive and easy to implement in any programming language (I was using MATLAB at the time).  The bisection method can be easily adapted for optimizing 1-dimensional functions with a slight but intuitive modification.  As there are numerous books and web sites on the bisection method, I will not dwell on it in my blog post.

Instead, I will explain a clever and elegant way to modify the bisection method with the golden ratio that results in faster computation; I learned this method while reading “A First Course in Statistical Programming with R” by John Braun and Duncan Murdoch.  Using a script in R to implement this special algorithm, I will illustrate how to minimize a non-differentiable function with the golden section search method.  In a later post (for the sake of brevity), I will use the same method to show that the minimizer of the sum of the absolute deviations from a univariate data set is the median.  The R functions and script for doing everything are in another post.
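This is not Braun and Murdoch's exact code, and my own commented functions are in the companion post; as a minimal sketch, a golden section search for minimization (reusing one of the two interior evaluations at every iteration, which is the source of the computational savings) might look like:

```r
# Golden section search: minimize a unimodal function f on [lower, upper]
golden.section.search <- function(f, lower, upper, tol = 1e-7) {
  gr <- (sqrt(5) - 1) / 2          # reciprocal of the golden ratio, ~0.618
  x1 <- upper - gr * (upper - lower)
  x2 <- lower + gr * (upper - lower)
  f1 <- f(x1)
  f2 <- f(x2)
  while (upper - lower > tol) {
    if (f1 < f2) {                 # minimum lies in [lower, x2]
      upper <- x2
      x2 <- x1; f2 <- f1           # reuse the old evaluation at x1
      x1 <- upper - gr * (upper - lower)
      f1 <- f(x1)
    } else {                       # minimum lies in [x1, upper]
      lower <- x1
      x1 <- x2; f1 <- f2           # reuse the old evaluation at x2
      x2 <- lower + gr * (upper - lower)
      f2 <- f(x2)
    }
  }
  (lower + upper) / 2
}

# Minimize a non-differentiable (cusped) function on [1, 5]
f <- function(x) abs(x - 2.5)
golden.section.search(f, 1, 5)     # converges to 2.5
```

The golden ratio makes the two interior test points of each shrunken interval coincide with one of the previous test points, so only one new function evaluation is needed per iteration, unlike a naive trisection, which needs two.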

The Fibonacci spiral approximates the golden spiral, a logarithmic spiral whose growth factor is the golden ratio.

Source: Dicklyon via Wikimedia