## Video Tutorial: Naive Bayes Classifiers

Naive Bayes classifiers are simple but powerful tools for classification in statistics and machine learning.  In this video tutorial, I use a simulated data set and illustrate the mathematical details of how this technique works.
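As a rough illustration of the mechanics (not the data set or code from the video), here is a minimal sketch of a naive Bayes classifier in Python. It assumes two continuous features that are Gaussian within each class; the simulated data, the function names and the small variance floor are all assumptions made for this example.

```python
import math
import random
from collections import defaultdict

def fit_gaussian_nb(X, y):
    """Estimate the class priors and the per-feature means and variances."""
    by_class = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)
    model = {}
    for label, rows in by_class.items():
        n = len(rows)
        means = [sum(col) / n for col in zip(*rows)]
        variances = [
            sum((v - m) ** 2 for v in col) / n + 1e-9  # small floor avoids a zero variance
            for col, m in zip(zip(*rows), means)
        ]
        model[label] = (n / len(X), means, variances)
    return model

def log_gaussian(x, mean, var):
    """Log density of a normal distribution with the given mean and variance."""
    return -0.5 * math.log(2 * math.pi * var) - (x - mean) ** 2 / (2 * var)

def predict(model, row):
    """Pick the class that maximizes log prior + sum of per-feature log likelihoods."""
    scores = {
        label: math.log(prior)
        + sum(log_gaussian(x, m, v) for x, m, v in zip(row, means, variances))
        for label, (prior, means, variances) in model.items()
    }
    return max(scores, key=scores.get)

# Hypothetical simulated data: two classes centred at (0, 0) and (3, 3).
random.seed(0)
X = [[random.gauss(0, 1), random.gauss(0, 1)] for _ in range(200)]
X += [[random.gauss(3, 1), random.gauss(3, 1)] for _ in range(200)]
y = [0] * 200 + [1] * 200

model = fit_gaussian_nb(X, y)
print(predict(model, [0.1, -0.2]), predict(model, [2.9, 3.3]))
```

The "naive" part is the sum inside `predict`: the features are treated as conditionally independent given the class, so their log likelihoods are simply added.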

In my recent episode on The Central Equilibrium about word embeddings and text classification, Mandy Gu used naive Bayes classifiers to determine if a sentence is toxic or non-toxic – a very common objective when moderating discussions in online forums.  If you are not familiar with naive Bayes classifiers, then I encourage you to watch this video first before watching Mandy’s episode on The Central Equilibrium.

## Mandy Gu on Word Embeddings and Text Classification – The Central Equilibrium – Episode 9

I am so grateful to Mandy Gu for being a guest on The Central Equilibrium to talk about word embeddings and text classification.  She began by showing how data from text can be encoded in vectors and matrices, and then she used a naive Bayes classifier to classify sentences as toxic or non-toxic – a very common problem for moderating discussions in online forums.  I learned a lot from her in this episode, and you can learn more from Mandy on her Medium blog.

If you are not familiar with naive Bayes classifiers, then I encourage you to watch my video tutorial about this topic first.

## Mitchell Boggs on Game Theory in Behavioural Ecology – The Central Equilibrium – Episode 8

Mitchell Boggs kindly talked about game theory in behavioural ecology on my talk show, “The Central Equilibrium”!  He talked about 2 key examples:

• when animals choose to share or fight for food
• when parents choose to care for their offspring or seek new mates to produce more offspring

These examples illustrate why seemingly disadvantageous behaviours can persist or even dominate in the animal kingdom.

Mitch recommends a book called “Are We Smart Enough to Know How Smart Animals Are?” by Frans de Waal.

Thanks for being such a great guest, Mitchell!

## David Veitch on Rational vs. Irrational Numbers and Countability – The Central Equilibrium – Episode 7

I am so grateful that David Veitch appeared on my talk show, “The Central Equilibrium”, to talk about rational vs. irrational numbers.  While defining irrational numbers, he proved that $\sqrt{2}$ is an irrational number.  He then talked about the concept of bijections while defining countability, and he showed that rational numbers are countable.
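To make the countability claim concrete, here is a small Python sketch that lists the positive rational numbers one at a time without repeats. It uses one standard enumeration (walking along diagonals where the numerator and denominator have a fixed sum), which is not necessarily the bijection used in the episode; the function name is made up for this example.

```python
from fractions import Fraction
from itertools import count, islice
from math import gcd

def positive_rationals():
    """Yield every positive rational exactly once, diagonal by diagonal."""
    for total in count(2):              # total = numerator + denominator
        for num in range(1, total):
            den = total - num
            if gcd(num, den) == 1:      # skip unreduced duplicates such as 2/4
                yield Fraction(num, den)

first_ten = list(islice(positive_rationals(), 10))
print(first_ten)
```

Because every positive rational appears at some finite position in this list, the positions 1, 2, 3, ... pair off with the positive rationals, which is the kind of bijection with the natural numbers that countability requires.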

David used to work as a bond trader for Bank of America.  He writes a personal blog, and you can follow him on Twitter (@daveveitch).  He recently earned admission into the Master of Science program in statistics at the University of Toronto, and he will begin that program soon.  Congratulations, David!  Thanks for being a guest on my show!

Part 1

Part 2

## Arnab Chakraborty on The Monty Hall Problem and Bayes’ Theorem – The Central Equilibrium – Episode 6

I am pleased to welcome Arnab Chakraborty back to my talk show, “The Central Equilibrium”, to talk about the Monty Hall Problem and Bayes’ theorem.  In this episode, he shows 2 solutions to this classic puzzle in probability, and invokes Bayes’ theorem for the second solution.
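For readers who want to check the Bayes’ theorem solution numerically, here is a minimal Python sketch. It assumes the usual rules of the puzzle: the player picks door 1, the host knows where the car is, always opens a door with a goat, and chooses at random when both unpicked doors hide goats. The door numbering and variable names are assumptions made for this example.

```python
from fractions import Fraction

# Prior: the car is equally likely to be behind each of the 3 doors.
prior = {1: Fraction(1, 3), 2: Fraction(1, 3), 3: Fraction(1, 3)}

# Likelihood of the host opening door 3, given the car's location,
# when the player has picked door 1.
likelihood = {1: Fraction(1, 2), 2: Fraction(1), 3: Fraction(0)}

# Bayes' theorem: posterior = prior * likelihood / evidence.
evidence = sum(prior[d] * likelihood[d] for d in prior)
posterior = {d: prior[d] * likelihood[d] / evidence for d in prior}

print(posterior)
```

Under these assumptions, the posterior probability for door 2 is 2/3, so switching doubles the chance of winning compared to staying with door 1.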

If you have not watched Arnab’s first episode on Bayes’ theorem, then I encourage you to do that first.

Marilyn vos Savant provided a solution to this problem in PARADE Magazine in 1990-1991.  Thousands of readers disagreed with her solution and criticized her vehemently (and incorrectly) for what they believed was an error.  Some of these critics were mathematicians!  She included some of those replies and provided alternative perspectives that led to the same conclusion.  Although I am dismayed by the disrespect that some people showed in their letters to her, I am glad that a magazine column on probability was able to attract so much readership and interest.  Arnab and I referred to one of her solutions in our episode.  Thank you, Marilyn!

Enjoy this episode of “The Central Equilibrium”!

## Benjamin Garden on Simple vs. Compound Interest in Finance – The Central Equilibrium – Episode 5

I am so pleased to publish this new episode of “The Central Equilibrium”, featuring Benjamin Garden.  He talked about simple and compound interest in the context of finance and investment, highlighting the power of compound interest to grow your money and to enlarge debt from credit cards.  We compared the formulas for calculating the accrued amounts under simple and compound interest, and we derived the formula for the Rule of 72, a short-cut to estimate the length of time needed to double your investment under compound interest.
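Here is a short Python sketch of the two formulas and the Rule of 72. It assumes annual compounding and a rate expressed as a decimal (0.06 for 6%); the function names and the example numbers are hypothetical, not taken from the episode.

```python
import math

def simple_amount(principal, rate, years):
    """Accrued amount under simple interest: P(1 + rt)."""
    return principal * (1 + rate * years)

def compound_amount(principal, rate, years):
    """Accrued amount under annual compounding: P(1 + r)^t."""
    return principal * (1 + rate) ** years

def exact_doubling_time(rate):
    """Solve (1 + r)^t = 2 for t."""
    return math.log(2) / math.log(1 + rate)

def rule_of_72(rate):
    """Approximate doubling time: 72 divided by the rate in percent."""
    return 72 / (rate * 100)

print(simple_amount(1000, 0.06, 10), compound_amount(1000, 0.06, 10))
print(exact_doubling_time(0.06), rule_of_72(0.06))
```

At 6% per year, the exact doubling time is a little under 12 years, so the short-cut of 72 / 6 = 12 years is a close estimate.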

Part 1:

Part 2:

## Layne Newhouse on representing neural networks – The Central Equilibrium – Episode 4

I am excited to present the first of a multi-episode series on neural networks on my talk show, “The Central Equilibrium”.  My guest in this series is Layne Newhouse, and he talked about how to represent neural networks. We talked about the biological motivations behind neural networks, how to represent them in diagrams and mathematical equations, and a few of the common activation functions for neural networks.
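As a small illustration of the mathematical representation, here is a sketch of a single artificial neuron in Python: a weighted sum of the inputs plus a bias, passed through an activation function. The sigmoid, hyperbolic tangent and ReLU are widely used activation functions, but they are assumptions here and may not be the ones covered in the episode; the names and numbers are hypothetical.

```python
import math

def sigmoid(z):
    """Squashes any real number into the interval (0, 1)."""
    return 1 / (1 + math.exp(-z))

def relu(z):
    """Rectified linear unit: passes positive values, zeroes out the rest."""
    return max(0.0, z)

def neuron(inputs, weights, bias, activation):
    """Output of one neuron: activation(w . x + b)."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return activation(z)

x = [1.0, 2.0]
w = [0.5, -0.25]
print(neuron(x, w, 0.0, sigmoid), neuron(x, w, 0.0, math.tanh), neuron(x, w, 0.0, relu))
```

A full network stacks many such neurons in layers, with the outputs of one layer serving as the inputs of the next.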

Check it out!

## Video Tutorial – Obtaining the Expected Value of the Exponential Distribution Using the Moment Generating Function

In this video tutorial on YouTube, I use the exponential distribution’s moment generating function (MGF) to obtain the expected value of this distribution.  Visit my YouTube channel to watch more video tutorials!
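As a brief written outline of the calculation, assume the rate parametrization with PDF $f(x) = \lambda e^{-\lambda x}$ for $x \geq 0$ (the video may use a different parametrization). The MGF is then $M(t) = \lambda \div (\lambda - t)$ for $t < \lambda$, and the expected value is the first derivative of the MGF evaluated at $t = 0$:

$$E(X) = M'(0) = \left. \frac{\lambda}{(\lambda - t)^2} \right|_{t = 0} = \frac{1}{\lambda}$$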

## Video Tutorial – The Moment Generating Function of the Exponential Distribution

In this video tutorial on YouTube, I derive the moment generating function (MGF) of the exponential distribution.  Visit my YouTube channel to watch more video tutorials!
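As a brief written outline of the derivation, assume the rate parametrization with PDF $f(x) = \lambda e^{-\lambda x}$ for $x \geq 0$ (the video may use a different parametrization). Then

$$M(t) = E(e^{tX}) = \int_0^{\infty} e^{tx} \lambda e^{-\lambda x} \, dx = \lambda \int_0^{\infty} e^{-(\lambda - t)x} \, dx = \frac{\lambda}{\lambda - t}, \quad t < \lambda$$

The restriction $t < \lambda$ is needed for the integral to converge.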

## Arnab Chakraborty on Bayes’ Theorem – The Central Equilibrium – Episode 3

Arnab Chakraborty kindly came to my new talk show, “The Central Equilibrium”, to talk about Bayes’ theorem.  He introduced the concept of conditional probability, stated Bayes’ theorem in its simple and general forms, and showed an example of how to use it in a calculation.
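For reference, the two forms can be written as follows; the symbols $A$, $B$ and $A_1, ..., A_k$ are notation chosen here and may differ from the notation in the episode. For events $A$ and $B$ with $P(B) > 0$, the simple form is

$$P(A | B) = \frac{P(B | A) P(A)}{P(B)}$$

If $A_1, ..., A_k$ are mutually exclusive events that together cover the whole sample space, then the general form is

$$P(A_i | B) = \frac{P(B | A_i) P(A_i)}{\sum_{j=1}^{k} P(B | A_j) P(A_j)}$$

The denominator in the general form is $P(B)$ expanded with the law of total probability.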

Check it out!

## Christopher Salahub on Markov Chains – The Central Equilibrium – Episode 2

It was a great pleasure to talk to Christopher Salahub about Markov chains in the second episode of my new talk show, The Central Equilibrium!  Chris graduated from the University of Waterloo with a Bachelor of Mathematics degree in statistics.  He just finished an internship in data development at Environics Analytics, and he is starting a Master’s program in statistics at ETH Zurich in Switzerland.

Chris recommends “Introduction to Probability Models” by Sheldon Ross to learn more about probability theory and Markov chains.

The Central Equilibrium is my new talk show about math, science, and economics. It focuses on technical topics that involve explanations with formulas, equations, graphs, and diagrams.  Stay tuned for more episodes in the coming weeks!

You can watch all of my videos on my YouTube channel!

Please watch the video on this blog.  You can also watch it directly on YouTube.

## Neil Seoni on the Fourier Transform and the Sampling Theorem – The Central Equilibrium – Episode 1

I am very excited to publish the very first episode of my new talk show, The Central Equilibrium!  My guest is Neil Seoni, an undergraduate student in electrical and computer engineering at Rice University in Houston, Texas. He has studied data science in his spare time, most notably taking a course on machine learning by Andrew Ng on Coursera. He is finishing his summer job as a Data Science Intern at Environics Analytics in Toronto, Ontario.

Neil recommends reading Don Johnson’s course notes from Rice University and his free text book to learn more about the topics covered in his episode.


## Video Tutorial – Calculating Expected Counts in a Contingency Table Using Joint Probabilities

In an earlier video, I showed how to calculate expected counts in a contingency table using marginal proportions and totals.  (Recall that expected counts are needed to conduct hypothesis tests of independence between categorical random variables.)  Today, I want to share a second video of calculating expected counts – this time, using joint probabilities.  This method uses the definition of independence between 2 random variables to form estimators of the joint probabilities for each cell in the contingency table.  Once the joint probabilities are estimated, the expected counts are simply the joint probabilities multiplied by the grand total of the entire sample.  This method gives a more direct and deeper connection between the null hypothesis of a test of independence and the calculation of expected counts.
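The joint-probability method can be sketched in a few lines of Python. The 2-by-2 table below is made-up data, not the data set from the video, and the function name is an assumption made for this example.

```python
def expected_counts_joint(table):
    """Expected counts under independence: n * P(row) * P(column) for each cell."""
    n = sum(sum(row) for row in table)
    row_props = [sum(row) / n for row in table]        # estimated marginal probabilities of rows
    col_props = [sum(col) / n for col in zip(*table)]  # estimated marginal probabilities of columns
    # Under independence, the joint probability of a cell is the product of its marginals.
    return [[n * r * c for c in col_props] for r in row_props]

observed = [[10, 20],
            [30, 40]]
expected = expected_counts_joint(observed)
print(expected)
```

The product `r * c` is the estimated joint probability of a cell under the null hypothesis of independence, and multiplying it by the grand total `n` gives the expected count.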

I encourage you to watch both of my videos on expected counts in my YouTube channel to gain a deeper understanding of how and why they can be calculated.  Please note that the expected counts are slightly different in the 2 videos due to round-off error; if you want to be convinced about this, I encourage you to do the calculations in the 2 different orders as I presented in the 2 videos – you will eventually see where the differences arise.

## Video Tutorial – Allelic Frequencies Remain Constant From Generation to Generation Under the Hardy-Weinberg Equilibrium

The Hardy-Weinberg law is a fundamental principle in statistical genetics.  If its 7 assumptions are fulfilled, then it predicts that the allelic frequency of a genetic trait will remain constant from generation to generation.  In this new video tutorial in my YouTube channel, I explain the math behind the Hardy-Weinberg theorem.  In particular, I clarify the origin of the connection between allelic frequencies and genotypic frequencies in the second generation – I have not found a single textbook or web site on this topic that explains this calculation, so I hope that my explanation is helpful to you.
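As a compact written version of the key step, consider one gene with 2 alleles, $A$ and $a$, with frequencies $p$ and $q = 1 - p$ in the first generation; this notation is an assumption made here and may differ from the video. Under random mating, the genotypic frequencies of $AA$, $Aa$ and $aa$ in the second generation are $p^2$, $2pq$ and $q^2$. Each $AA$ individual carries 2 copies of $A$ and each $Aa$ individual carries 1 copy, so the frequency of $A$ in the second generation is

$$p' = \frac{2 \cdot p^2 + 1 \cdot 2pq}{2} = p^2 + pq = p(p + q) = p$$

Hence, the allelic frequency is unchanged from one generation to the next.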

## Video Tutorial – Calculating Expected Counts in Contingency Tables Using Marginal Proportions and Marginal Totals

A common task in statistics and biostatistics is performing hypothesis tests of independence between 2 categorical random variables.  The data for such tests are best organized in contingency tables, which allow expected counts to be calculated easily.  In this video tutorial in my YouTube channel, I demonstrate how to calculate expected counts using marginal proportions and marginal totals.  In a later video, I will introduce a second method for calculating expected counts using joint probabilities and marginal probabilities.
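In symbols, one way to write this calculation is as follows, where $R_i$ is the total of row $i$, $C_j$ is the total of column $j$, and $n$ is the grand total; this notation is an assumption made here and may differ from the video. The expected count for the cell in row $i$ and column $j$ is the marginal proportion of the row multiplied by the marginal total of the column:

$$E_{ij} = \frac{R_i}{n} \times C_j = \frac{R_i \, C_j}{n}$$

For example, in a hypothetical table with $R_1 = 30$, $C_1 = 40$ and $n = 100$, the expected count for the first cell is $0.3 \times 40 = 12$.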

In a later tutorial, I will illustrate how to implement the chi-squared test of independence on the same data set in R and SAS – stay tuned!

## Video Tutorial – Useful Relationships Between Any Pair of h(t), f(t) and S(t)

I started my video tutorial series on survival analysis by defining the hazard function.  I then explained how this definition leads to the elegant relationship of $h(t) = f(t) \div S(t)$.

In my new video, I derive 6 useful mathematical relationships that exist between any 2 of the 3 quantities in the above equation.  Each relationship allows one quantity to be written as a function of the other.
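For reference, here is one common way to write the 6 relationships, assuming a continuous time scale that starts at 0 and that $S(t) = P(X > t)$; the video may present them in a different but equivalent form.

$$h(t) = -\frac{d}{dt} \ln S(t), \qquad S(t) = \exp\left(-\int_0^t h(u) \, du\right)$$

$$f(t) = -\frac{d}{dt} S(t), \qquad S(t) = \int_t^{\infty} f(u) \, du$$

$$f(t) = h(t) \exp\left(-\int_0^t h(u) \, du\right), \qquad h(t) = \frac{f(t)}{\int_t^{\infty} f(u) \, du}$$

Each row pairs 2 of the 3 quantities and expresses each one in terms of the other.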

I am excited to continue adding to my YouTube channel’s collection of video tutorials.  Please stay tuned for more!

## Video Tutorial – Rolling 2 Dice: An Intuitive Explanation of The Central Limit Theorem

According to the central limit theorem, if

• $n$ random variables, $X_1, ..., X_n$, are independent and identically distributed,
• $n$ is sufficiently large,

then the distribution of their sample mean, $\bar{X}_n$, is approximately normal, and this approximation improves as $n$ increases.

One of the most remarkable aspects of the central limit theorem (CLT) is its validity for any parent distribution of $X_1, ..., X_n$ that has a finite variance.  In my new YouTube channel, you will find a video tutorial that provides an intuitive explanation of why this is true by considering a thought experiment of rolling 2 dice.  This video focuses on the intuition rather than the mathematics of the CLT.  In a later video, I will discuss the technical details of the CLT and how it applies to this example.
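The thought experiment can also be checked by brute force. The Python sketch below enumerates every outcome of rolling fair six-sided dice and tabulates the exact distribution of the sum; the function name and the text histogram are assumptions made for this example, not material from the video.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

def sum_distribution(num_dice):
    """Exact distribution of the sum of fair six-sided dice, by enumeration."""
    counts = Counter(sum(faces) for faces in product(range(1, 7), repeat=num_dice))
    total = 6 ** num_dice
    return {s: Fraction(c, total) for s, c in sorted(counts.items())}

two_dice = sum_distribution(2)
for s, p in two_dice.items():
    # One '#' per outcome (out of 36) that produces this sum.
    print(f"{s:2d} {'#' * (p.numerator * 36 // p.denominator)}")
```

A single die has a flat distribution, but the sum of 2 dice already peaks in the middle at 7, because more combinations of faces produce middle values than extreme values. Calling `sum_distribution` with more dice shows the shape moving closer to a bell curve.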

## Video Tutorial – The Hazard Function is the Probability Density Function Divided by the Survival Function

In an earlier video, I introduced the definition of the hazard function and broke it down into its mathematical components.  Recall that the definition of the hazard function for events defined on a continuous time scale is $h(t) = \lim_{\Delta t \rightarrow 0} [P(t < X \leq t + \Delta t \ | \ X > t) \ \div \ \Delta t]$.

Did you know that the hazard function can be expressed as the probability density function (PDF) divided by the survival function? $h(t) = f(t) \div S(t)$

In my new YouTube video, I show how this relationship can be obtained from the definition of the hazard function!  I am very excited to post this second video in my new YouTube channel.
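In outline, and assuming that $F(t) = P(X \leq t)$ is the cumulative distribution function with $S(t) = 1 - F(t)$, the argument can be sketched as follows; the steps in the video may be organized differently. By the definition of conditional probability,

$$P(t < X \leq t + \Delta t \ | \ X > t) = \frac{P(t < X \leq t + \Delta t)}{P(X > t)} = \frac{F(t + \Delta t) - F(t)}{S(t)}$$

Dividing by $\Delta t$ and taking the limit gives

$$h(t) = \frac{1}{S(t)} \lim_{\Delta t \rightarrow 0} \frac{F(t + \Delta t) - F(t)}{\Delta t} = \frac{F'(t)}{S(t)} = \frac{f(t)}{S(t)}$$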

## Video Tutorial: Breaking Down the Definition of the Hazard Function

The hazard function is a fundamental quantity in survival analysis.  For an event occurring at some time on a continuous time scale, the hazard function, $h(t)$, for that event is defined as $h(t) = \lim_{\Delta t \rightarrow 0} [P(t < X \leq t + \Delta t \ | \ X > t) \ \div \ \Delta t]$,

where

• $t$ is the time,
• $X$ is the time of the occurrence of the event.

However, what does this actually mean?  In this YouTube video, I break down the mathematics of this definition into its individual components and explain the intuition behind each component.
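One way to get a feel for the definition is to evaluate it numerically with a small but non-zero $\Delta t$. The Python sketch below does this for an exponential distribution with rate 0.5, for which the hazard function is known to be constant and equal to the rate. The choice of distribution, the function names and the step size are assumptions made for this example.

```python
import math

def survival(t, rate):
    """S(t) = P(X > t) for an exponential distribution with the given rate."""
    return math.exp(-rate * t)

def approx_hazard(t, rate, dt=1e-6):
    """Approximate h(t) by using a small dt instead of taking the limit."""
    # P(t < X <= t + dt | X > t) = (S(t) - S(t + dt)) / S(t)
    cond_prob = (survival(t, rate) - survival(t + dt, rate)) / survival(t, rate)
    return cond_prob / dt

for t in (0.5, 2.0, 10.0):
    print(t, approx_hazard(t, rate=0.5))
```

The approximations stay close to 0.5 at every time point, which matches the constant hazard of the exponential distribution.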

I am very excited about the release of this first video in my new YouTube channel!  This is yet another mode of expansion of The Chemical Statistician since the beginning of 2014.  As always, your comments are most appreciated!