Detecting Unfair Dice in Casinos with Bayes’ Theorem

October 30, 2013

Introduction

While reading a bioinformatics textbook, I came across an interesting problem that can be solved with Bayes’ Theorem and some simple R programming. I will discuss the math behind the solution in detail, and I will illustrate some very useful R plotting functions that visualize the solution effectively.

The Problem

The following question is a slightly modified version of Exercise #1.2 on Page 8 in “Biological Sequence Analysis” by Durbin, Eddy, Krogh and Mitchison.

An occasionally dishonest casino uses 2 types of dice. Of its dice, 97% are fair but 3% are unfair, and a “five” comes up 35% of the time for these unfair dice. If you pick a die randomly and roll it, how many “fives” in a row would you need to see before it was most likely that you had picked an unfair die?

Read on to learn how to create a plot that visualizes the solution and how it invokes Bayes’ Theorem to solve the above problem!

Translating the Problem into Math

To translate this problem into the language of probability,

let $R$ denote the outcome of rolling any die. The possible outcomes are $\{1, 2, 3, 4, 5, 6\}$ .
let $D$ denote whether the die is fair or not. The possible outcomes are $\{\text{Fair}, \text{Unfair}\}$ .
let $X$ denote the number of consecutive “fives” observed when rolling a die. The possible outcomes are all non-negative integers.

We know that

$P(\text{D = Fair}) = 0.97$
$P(\text{D = Unfair}) = 0.03$
$P(\text{R = 5} | \text{D = Fair}) = 1/6$
$P(\text{R = 5} | \text{D = Unfair}) = 0.35$

The die that you picked is either fair or unfair, regardless of how many times you roll it. Thus, given that we observe $\text{X = x}$ consecutive “fives”, a sensible rule is to decide that the die is unfair if the probability of the die being unfair given $\text{X = x}$ is higher than the probability of the die being fair given $\text{X = x}$ . Mathematically, we seek the smallest value of $\text{x}$ such that

$P(\text{D = Unfair} | \text{X = x}) > 0.5. \ \ \ \ \ (1)$

To see why this inequality formulates our problem accurately, let’s assume that it is true. Combining Inequality $(1)$ with the fact that $D$ is a Bernoulli random variable, so that the probabilities of its 2 outcomes sum to 1,

$P(\text{D = Unfair} | \text{X = x}) = 1 - P(\text{D = Fair} | \text{X = x}) > 0.5$

$1 - 0.5 = 0.5 > P(\text{D = Fair} | \text{X = x})$

$P(\text{D = Unfair} | \text{X = x}) > 0.5 > P(\text{D = Fair} | \text{X = x})$

$P(\text{D = Unfair} | \text{X = x}) > P(\text{D = Fair} | \text{X = x})$

and this is the premise of the question.
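
Before deriving this probability with Bayes’ Theorem, it can be estimated by brute-force simulation. The sketch below is only an illustration under stated assumptions: the rolls are independent, “x consecutive fives” is taken to mean that all of the first x rolls are “fives”, and the number of trials, the seed, and the choice of x = 3 are arbitrary values that are not part of the original problem.

```r
# Monte Carlo estimate of P(D = Unfair | X = x) and P(D = Fair | X = x)
set.seed(1)        # arbitrary seed, used only for reproducibility
n.trials = 1e6     # hypothetical number of simulated die picks
x = 3              # number of rolls per trial (illustrative choice)

# pick a die for each trial; TRUE means unfair, with probability 0.03
is.unfair = runif(n.trials) < 0.03

# probability of a "five" on any single roll, depending on the die picked
p.five = ifelse(is.unfair, 0.35, 1/6)

# count the "fives" in x independent rolls of each picked die
n.fives = rbinom(n.trials, size = x, prob = p.five)

# a trial shows x consecutive "fives" when all x rolls are "fives"
all.fives = n.fives == x

# among those trials, the fractions that used an unfair die and a fair die
est.unfair = mean(is.unfair[all.fives])
est.fair = mean(!is.unfair[all.fives])
print(c(unfair = est.unfair, fair = est.fair))
```

The 2 estimates always sum to 1, which is the Bernoulli property used above, so comparing the 2 conditional probabilities is the same as comparing either one with 0.5.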

Using Bayes’ Theorem

To calculate $P(\text{D = Unfair} | \text{X})$ , let’s use Bayes’ Theorem. I always “work out” Bayes’ Theorem by

stating the definition of conditional probability,
applying the law of total probability.

To apply the definition of conditional probability for our problem,

$P(\text{D = Unfair} | \text{X = x}) = P(\text{D = Unfair} \cap \text{X = x}) \ \div \ P(\text{X = x})$ .

Now, re-write the joint probability in the numerator by applying the definition of conditional probability again, but reversing the conditionality.

$P(\text{D = Unfair} | \text{X = x}) = P(\text{X = x} | \text{D = Unfair})P(\text{D = Unfair}) \ \div \ P(\text{X = x})$ .

Finally, apply the law of total probability to the denominator. (Because of the difficulties of writing LaTeX in WordPress, the entire denominator is written on the second line. Please excuse the awkward line placement.)

$P(\text{D = Unfair} | \text{X = x}) = P(\text{X = x} | \text{D = Unfair})P(\text{D = Unfair}) \ \div$

$[P(\text{X = x} | \text{D = Unfair})P(\text{D = Unfair}) + P(\text{X = x} | \text{D = Fair})P(\text{D = Fair})]$

By Inequality $(1)$ , we require the above expression to be greater than 0.5.

$0.5 < P(\text{X = x} | \text{D = Unfair})P(\text{D = Unfair}) \ \div$

$[P(\text{X = x} | \text{D = Unfair})P(\text{D = Unfair}) + P(\text{X = x} | \text{D = Fair})P(\text{D = Fair})] \ \ \ (2)$

Given the type of die, the probability of getting $\text{x}$ consecutive “fives” is the joint probability of getting a “five” on each of $\text{x}$ individual rolls; this holds whether the die is fair or not. Because the rolls of the die are independent, this joint probability is just the product of the probabilities of the $\text{x}$ individual “fives”. Since these probabilities are all the same, the product is simply that probability raised to the power of $\text{x}$ . Mathematically,

$P(\text{X = x} | \text{D = Fair}) = P(\text{R = 5} | \text{D = Fair})^\text{x}$

$P(\text{X = x} | \text{D = Unfair}) = P(\text{R = 5} | \text{D = Unfair})^\text{x}$
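
The pieces above can be collected into a small R function. This is a minimal sketch rather than the code used for the plot in this post; the function name posterior.unfair and its argument names are hypothetical, and the default values are the probabilities stated in the problem.

```r
# posterior probability that the die is unfair after x consecutive "fives"
# (posterior.unfair and its argument names are illustrative, not from the textbook)
posterior.unfair = function(x, prior.unfair = 0.03, p5.unfair = 0.35, p5.fair = 1/6) {
  like.unfair = p5.unfair^x    # P(X = x | D = Unfair), by independence of the rolls
  like.fair = p5.fair^x        # P(X = x | D = Fair), by independence of the rolls
  numerator = like.unfair * prior.unfair
  # the denominator is the law of total probability over the 2 types of dice
  numerator / (numerator + like.fair * (1 - prior.unfair))
}

# with no rolls observed, the posterior is just the prior probability of an unfair die
print(posterior.unfair(0))

# the function is vectorized, so several values of x can be evaluated at once
print(round(posterior.unfair(1:6), 4))
```

Passing the probabilities as arguments makes it easy to see how the answer changes for a casino with a different share of unfair dice.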

Plotting the Solution in R

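To solve the problem, we need to find the minimum value of $\text{x}$ such that Inequality $(2)$ is true. Multiplying both sides of $(2)$ by the denominator and substituting the known probabilities reduces it to $(0.35 \times 6)^\text{x} > 0.97/0.03$ , so $\text{x} > \ln(32.33)/\ln(2.1) \approx 4.69$ and the smallest integer solution is $\text{x} = 5$ . To visualize this result, I computed the right-hand side of $(2)$ for $\text{x} = 1, 2, \dots, 20$ in R. Below is the resulting plot. Many functions were used to generate it.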

plot() to generate the basic plot of points; note the use of the ifelse() function to give the point at $\text{X = 5}$ its own colour
text() to add custom text with a green colour ('forestgreen') inside the plot
abline() to add a custom horizontal line in the colour 'firebrick1'
axis() to add custom tick marks
mtext() to add a custom tick label at 0.5 in the colour 'firebrick1' to the vertical axis

I encourage you to study the code carefully to learn all of the useful functions and build your own plot with custom axes, tick marks, and tick labels.

Notice that I needed to use the axis() function twice to add all of the tick marks that I wanted. For some reason, I could not add ‘0’ and ‘1’ among the first set of tick marks, but I could add them separately in a second call to axis().

I also found a document by Tian Zheng from Columbia University that lists the names of various colours in R. It turns out that there are many more colours than the standard ones that I usually use from the colours of a rainbow!
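
The same colour names can also be looked up from within R. The short sketch below uses only base R functions, colors() and col2rgb(); the variable name all.colours is just for illustration.

```r
# colors() returns the names of all of R's built-in colours
all.colours = colors()
print(length(all.colours))

# confirm that the colour names used in this post are built-in colours
print(c("forestgreen", "firebrick1") %in% all.colours)

# list the built-in colour names that contain "firebrick"
print(grep("firebrick", all.colours, value = TRUE))

# col2rgb() shows the red, green and blue components of a named colour
print(col2rgb("firebrick1"))
```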

As the plot shows, if you see 5 or more consecutive “fives”, you have reason to suspect that the die is unfair.
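
The threshold can also be read off numerically without the plot. This is a small sketch that evaluates the same expression as the plotting code; the name x.min is a hypothetical helper variable.

```r
# possible numbers of consecutive "fives"
x = 1:20

# P(D = Unfair | X = x) for each value of x
prob.loaded = (0.35^x)*0.03/((0.35^x)*0.03 + ((1/6)^x)*0.97)

# smallest x for which the die is more likely unfair than fair
x.min = min(x[prob.loaded > 0.5])
print(x.min)

# the posterior probabilities just below and at the threshold
print(round(prob.loaded[x %in% c(x.min - 1, x.min)], 4))
```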

##### Detecting an Unfair Die with Bayes' Theorem
##### By Eric Cai - The Chemical Statistician

# set the possible values of the number of consecutive "fives" observed from rolling a die
x = 1:20

# calculate the probability of the die being unfair given the observed number of consecutive "fives"
prob.loaded = (0.35^x)*0.03/((0.35^x)*0.03 + ((1/6)^x)*0.97)

# export the plot as a PNG image file
png('INSERT YOUR DIRECTORY PATH HERE/unfair die plot.png')

# plot the calculated probabilities
# notice that the "yaxt = 'n'" option suppresses the vertical axis; I want to add my own custom axis later
plot(x, prob.loaded, col=ifelse(x == 5, "forestgreen", "black"), ylab = 'P(Die is Unfair | X)', ylim = c(-0.05, 1.05), xlab = 'X - Number of Consecutive Fives Observed', yaxt='n', main = "Using Bayes' Theorem to Detect an Unfair Die")

# add a custom horizontal line to show where P(Die = Unfair | X = x) = 0.5
abline(0.5, 0, col = 'firebrick1')

# add a custom tick mark with the colour 'firebrick1'
# the label is left intentionally blank here, because the "col" option colours only the axis line and tick mark
# use mtext() to set the tick label's colour (the "col.axis" option of axis() is an alternative)
axis(2, at = c(0.5), col = 'firebrick1', labels = ' ')

# add a custom tick label at P(Die = Unfair | X = x) = 0.5
# distinguish it with the colour 'firebrick1'
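# "at = 0.5" places the label at the tick mark instead of relying on the middle of the axis
mtext(text = '0.5', side = 2, line = 1, at = 0.5, col = "firebrick1")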

# add text to denote the point of interest
text(9, 0.55, 'P(Die is loaded | X = 5) = 0.5581', col = 'forestgreen')

# add some other tick marks to show the other values along the vertical axis
axis(2, at = c(0.1, 0.3, 0.7, 0.9), labels = c('0.1', '0.3', '0.7', '0.9'))
axis(2, at = c(0, 1), labels = c('0', '1'))

dev.off()

Reference

Durbin, R., Eddy, S. R., Krogh, A., & Mitchison, G. (1998). Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids. Cambridge University Press.
