# Checking the Goodness of Fit of the Poisson Distribution in R for Alpha Decay by Americium-241

#### Introduction

Today, I will discuss the alpha decay of americium-241 and use R to model the number of emissions from a real data set with the Poisson distribution.  I was especially intrigued in learning about the use of Am-241 in smoke detectors, and I will elaborate on this clever application.  I will then use the Pearson chi-squared test to check the goodness of fit of my model.  The R script for the full analysis is given at the end of the post; there is a particularly useful code for superscripting the mass number of a chemical isotope in the title of a plot.  While there are many examples of superscripts in plot titles and axes that can be found on the web, none showed how to put the superscript before a text.  I hope that this and other tricks in this script are of use to you.

Smoke Detector with Americium-241

#### Americium-241: Detecting Smoke Through Alpha Decay

Americium-241 is a radioactive isotope that is commonly found in nuclear waste.  (Read my very first blog post to learn what an isotope is.)  Roughly 1% of spent nuclear fuel is plutonium, 12% of which is plutonium-241; this radioactive isotope has a half-life of 14 years, and it decays by beta emission to form americium-241.  (Read my recent blog post about how to estimate the half-life of a chemical species that decays exponentially using log-transformed ordinary least-squares regression.)   Americium-241 has a half-life of 432 years, and it decays by emitting an alpha particle and gamma radiation, losing 2 protons and 2 neutrons to become neptunium-237.  (An alpha particle is really just a helium nucleus with 2 neutrons.)

241Am   →   237Np   +   4He

I was surprised to learn that americium-241 is key to the proper functioning of smoke detectors through its radioactivity.  Smoke detectors have a small amount of americium-241 in the form of americium dioxide.  As Am-241 slowly undergoes alpha decay, the alpha particles collide with the nitrogen and oxygen molecules to form ions in the air inside the detector’s ionization chamber.  These ions are collected by the chamber via a low-level electric voltage that is applied across the chamber, resulting in a small but steady electric current.

If smoke enters the smoke detector’s ionization chamber, the smoke would attach onto the ions and neutralize them.  This decreases the number of ions being collected by the chamber and, hence, decreases the electric current.  The smoke detector’s alarm is activated upon detecting this decrease of the electric current.  (I thank the World Nuclear Association for enlightening me on this subject.)

#### Modelling Alpha Decay with the Poisson Distribution

In the 1988 edition of “Mathematical Statistics and Data Analysis” by John Rice, an example data set of alpha decay of Am-241 was presented in Section 8.2 on Page 223.  The full data set is generously provided by the Virtual Laboratories in Probability and Statistics, a project by the Department of Mathematical Sciences at the University of Alabama in Huntsville.

There were 1,207 time intervals in this study, each with a length of 10 seconds.  The number of alpha particles emitted in each time interval was recorded; this number ranged from 0 to 19.  The second column in the following data table shows the observed counts for each number of alpha emissions.  For example, there was 1 interval with 0 emissions, 4 intervals with 1 emission, and 13 intervals with 2 emissions.

     Original Data - Alpha Decay of Americium-241 (Berkson, 1966)
Emissions Observed Counts Expected Counts
[1,]         0               1            0.27
[2,]         1               4            2.30
[3,]         2              13            9.63
[4,]         3              28           26.95
[5,]         4              56           56.54
[6,]         5             105           94.90
[7,]         6             126          132.73
[8,]         7             146          159.12
[9,]         8             164          166.92
[10,]         9             161          155.64
[11,]        10             123          130.62
[12,]        11             101           99.65
[13,]        12              74           69.69
[14,]        13              53           44.99
[15,]        14              23           26.97
[16,]        15              15           15.09
[17,]        16               9            7.91
[18,]        17               3            3.91
[19,]        18               1            1.82
[20,]        19               1            0.80

In his textbook, Rice wrote that the mean emission rate was calculated by dividing the total number of emissions by the total time, obtaining 0.8392 emissions per second.  I checked this rate from the data using the following steps:

– I multiplied the first column by the second column element-wise and then adding the products together (i.e. a dot product of the 2 vectors), obtaining 10,099 emissions in total.

– I then multiplied 1,207 by 10 seconds to get the total time in seconds, obtaining 12,070 seconds.

– Finally, I divided 10,099 emissions by 12,070 seconds, obtaining 0.8367 emissions per second.

Thus, I’m puzzled by how Rice (or perhaps Berkson) obtained their mean emission rate.

Assuming that Rice is correct, we can model the number of emissions using the Poisson distribution.  Using the probability mass function (PMF), I calculated the probability of observing each number of emissions, then multiplied it by the total number of intervals (1,207) to calculate the number of intervals in which I expected to contain each number of emissions.  This column is the 3rd column in the above table, titled “Expected Counts”.

$f_X(x) = \frac{\lambda^x exp(-\lambda)}{x!}, \ x = 0, 1, 2, ..., 19$

$\text{Expected Count} = 1207f_X(x)$

Since the chi-squared goodness-of-fit test requires an expected cell count of at least 5 (see Section 1.3.5.15 in the NIST/SEMATECH e-Handbook of Statistical Methods), the first 3 and last 3 bins were combined into 2 bins.  The re-binned table is below.

Data After Re-b
Emissions Observed Counts Expected Counts Chi-squared
1        0-2              18           12.20       2.757
2          3              28           26.95       0.041
3          4              56           56.54       0.005
4          5             105           94.90       1.075
5          6             126          132.73       0.341
6          7             146          159.12       1.082
7          8             164          166.92       0.051
8          9             161          155.64       0.185
9         10             123          130.62       0.445
10        11             101           99.65       0.018
11        12              74           69.69       0.267
12        13              53           44.99       1.426
13        14              23           26.97       0.584
14        15              15           15.09       0.001
15        16               9            7.91       0.150
16       17+               5            6.53       0.358

#### Testing the Goodness of Fit of the Poisson Distribution

Now that the expected counts are calculated for each cell, the chi-squared test statistic values for the individual cells can be calculated, and they are shown above in the last column.  The sum of all of these values is the Pearson chi-squared test statistic, which I obtained to be 8.786, slightly different from Rice’s 8.99 – perhaps due to rounding error.

$\chi^2 = \sum_{i=1}^{n} [\text{Observed}_i - \text{Expected}_i]^2 \div \text{Expected}_i$

The number of degrees of freedom is calculated as follows:

df = number of cells – number of independent parameters fitted – 1

df = 16 – 1 – 1 = 14

Based on the chi-squared distribution with 14 degrees of freedom, the p-value of the test statistic is 0.8445.  Thus, there is insufficient evidence to suggest that the Poisson distribution is a bad fit.  Flipping that double negative, the Poisson distribution seems like a good fit.  This is confirmed by the scatter plot of the observed counts as proportions of the total number of counts; it is close to the Poisson PMF (plotted with dpois() in R) with rate parameter 8.392 (0.8392 emissions/second multiplied by 10 seconds per interval).

I learned some useful plotting commands on this discussion thread on Stack Overflow to write the superscript for Americium.  The full R code of my analysis is here:

##### Analyzing the Goodness of Fit of the Poission Distribution to Alpha  Emissions of Americium-241
##### by Eric Cai - The Chemical Statistician

lambda = 8.392

emissions = 0:19
observed = c(1, 4, 13, 28, 56, 105, 126, 146, 164, 161, 123, 101, 74, 53, 23, 15, 9, 3, 1, 1)
total = sum(observed)
expected = total*((lambda^emissions)*exp(-lambda))/factorial(emissions)
chi.squared = (observed - expected)^2/expected

expected = round(expected, 2)
chi.squared = round(chi.squared, 3)

goodness.of.fit = cbind(emissions, observed, expected)
colnames(goodness.of.fit) = c('Emissions', 'Observed Counts', 'Expected Counts')
goodness.of.fit

chi.squared.statistic = sum(chi.squared)
p.value = pchisq(chi.squared.statistic, length(observed)-2, lower.tail = F)

png('INSERT YOUR OWN DIRECTORY PATH HERE/poisson fit.png')
### Notice the use of the expression() function to write the title and include the superscripted '241'
plot(emissions, observed/total, xlim = c(0, 20), xlab = 'Number of Emissions in a 10-Second Interval', ylab = 'Proportion of Observed Counts', main = expression("Fitting the Poission Distribution to Alpha Emissions of "^241*"Am"))
lines(emissions, dpois(emissions, lambda))
dev.off()

### Re-bin the emissions so that no cell has less than 5 counts.
emissions.rebinned = c('0-2', '3', '4', '5', '6', '7', '8', '9', '10', '11', '12', '13', '14', '15', '16', '17+')
observed.rebinned = c(sum(observed[1:3]), observed[4:17], sum(observed[18:20]))
expected.rebinned = c(sum(expected[1:3]), expected[4:17], sum(expected[18:20]))
chi.squared.rebinned = (observed.rebinned - expected.rebinned)^2/expected.rebinned

expected.rebinned = round(expected.rebinned, 2)
chi.squared.rebinned = round(chi.squared.rebinned, 3)

### I used the data.frame() function to create a data table for the re-binned data
### instead of using the cbind() function to create a matrix as I did above.
### The re-binnded data's first column is a vector of strings, and the matrix showed
### the quotation marks, but the data table does not.
goodness.of.fit.rebinned = data.frame(emissions.rebinned, observed.rebinned, expected.rebinned, chi.squared.rebinned)
colnames(goodness.of.fit.rebinned) = c('Emissions', 'Observed Counts', 'Expected Counts', 'Chi-squared')
goodness.of.fit.rebinned

chi.squared.statistic.rebinned = sum(chi.squared.rebinned)
p.value.rebinned = pchisq(chi.squared.statistic.rebinned, length(observed.rebinned)-2, lower.tail = F)

#### References

World Nuclear Association: Smoke Detectors and Americium

Rice, John A. Mathematical statistics and data analysis. Duxbury press, 1988.  (See Chapter 8)

NIST/SEMATECH e-Handbook of Statistical Methods: Section 1.3.5.15 – Chi-Square Goodness-of-Fit Test

University of Alabama – Huntsville, Department of Mathematical Sciences.  Virtual Laboratories in Probability and Statistics: Alpha Particle Emissions Data