A macro to execute PROC TTEST for multiple binary grouping variables in SAS (and sorting t-test statistics by their absolute values)

In SAS, you can perform PROC TTEST for multiple numeric variables in the same procedure.  Here is an example using the built-in data set SASHELP.BASEBALL; I will compare the number of at-bats and number of walks between the American League and the National League.

proc ttest
     data = sashelp.baseball;
     class League;
     var nAtBat nBB; 
     ods select ttests;
run;

Here are the resulting tables.

Method Variances DF t Value Pr > |t|
Pooled Equal 320 2.05 0.0410
Satterthwaite Unequal 313.66 2.06 0.04

Method Variances DF t Value Pr > |t|
Pooled Equal 320 0.85 0.3940
Satterthwaite Unequal 319.53 0.86 0.3884

 

What if you want to perform PROC TTEST for multiple grouping (a.k.a. classification) variables?  You cannot put more than one variable in the CLASS statement, so you would have to run PROC TTEST separately for each binary grouping variable.  If you do put LEAGUE and DIVISION in the same CLASS statement, here is the resulting log.

1303 proc ttest
1304 data = sashelp.baseball;
1305 class league division;
 --------
 22
 202
ERROR 22-322: Expecting ;.
ERROR 202-322: The option or parameter is not recognized and will be ignored.
1306 var natbat;
1307 ods select ttests;
1308 run;

 

There is no syntax in PROC TTEST to use multiple grouping variables at the same time, so this tutorial provides a macro to do so.  There are several nice features about my macro:

  1. It allows you to use multiple grouping variables at the same time.
  2. It sorts the t-test statistics by their absolute values within each grouping variable.
  3. It shows the name of each continuous variable in the output table, unlike the above output.

Here is its basic skeleton.

Read more of this post

Advertisements

Applied Statistics Lesson of the Day – The Independent 2-Sample t-Test with Unequal Variances (Welch’s t-Test)

A common problem in statistics is determining whether or not the means of 2 populations are equal.  The independent 2-sample t-test is a popular parametric method to answer this question.  (In an earlier Statistics Lesson of the Day, I discussed how data collected from a completely randomized design with 1 binary factor can be analyzed by an independent 2-sample t-test.  I also discussed its possible use in the discovery of argon.)  I have learned 2 versions of the independent 2-sample t-test, and they differ on the variances of the 2 samples.  The 2 possibilities are

  • equal variances
  • unequal variances

Most statistics textbooks that I have read elaborate at length about the independent 2-sample t-test with equal variances (also called Student’s t-test).  However, the assumption of equal variances needs to be checked using the chi-squared test before proceeding with the Student’s t-test, yet this check does not seem to be universally done in practice.  Furthermore, conducting one test based on the results of another can inflate the probability of committing a Type 1 error (Ruxton, 2006).

Some books give due attention to the independent 2-sample t-test with unequal variances (also called Welch’s t-test), but some barely mention its value, and others do not even mention it at all.  I find this to be puzzling, because the assumption of equal variances is often violated in practice, and Welch’s t-test provides an easy solution to this problem.  There is a seemingly intimidating but straightforward calculation to approximate the number of degrees of freedom for Welch’s t-test, and this calculation is automatically incorporated in most software, including R and SAS.  Finally, Welch’s t-test removes the need to check for equal variances, and it is almost as powerful as Student’s t-test when the variances are equal (Ruxton, 2006).

For all of these reasons, I recommend Welch’s t-test when using the parametric approach to compare the means of 2 populations.

Reference

Graeme D. Ruxton.  “The unequal variance t-test is an underused alternative to Student’s t-test and the Mann–Whitney U test“.  Behavioral Ecology (July/August 2006) 17 (4): 688-690 first published online May 17, 2006