## Eric’s Enlightenment for Wednesday, May 6, 2015

1. Moldova has mysteriously lost one-eighth of its GDP, possibly to fraudulent loans.
2. Kai Brothers was diagnosed with HIV in 1989, but did not show any symptoms for 25 years.  Does he have a natural defense against HIV?  Now that he is starting to show symptoms, should he start taking anti-retroviral drugs and deny scientists the chance to look for that natural defense in his blood?
3. Use the VVALUE function in SAS to convert formatted values of a variable into new values of a character variable.
4. Alex Reinhart diligently compiled and explained a list of major “egregious statistical fallacies regularly committed in the name of science”.  Check them out on his web site and in his book entitled “Statistics Done Wrong“.  I highly recommend reading the section entitled “The p value and the base rate fallacy“.

## The Chi-Squared Test of Independence – An Example in Both R and SAS

#### Introduction

The chi-squared test of independence is one of the most basic and common hypothesis tests in the statistical analysis of categorical data.  Given 2 categorical random variables, $X$ and $Y$, the chi-squared test of independence determines whether or not there exists a statistical dependence between them.  Formally, it is a hypothesis test with the following null and alternative hypotheses:

$H_0: X \perp Y \ \ \ \ \ \text{vs.} \ \ \ \ \ H_a: X \not \perp Y$

If you’re not familiar with probabilistic independence and how it manifests in categorical random variables, watch my video on calculating expected counts in contingency tables using joint and marginal probabilities.  For your convenience, here is another video that gives a gentler and more practical understanding of calculating expected counts using marginal proportions and marginal totals.

Today, I will continue from those 2 videos and illustrate how the chi-squared test of independence can be implemented in both R and SAS with the same example.