A macro to execute PROC TTEST for multiple binary grouping variables in SAS (and sorting t-test statistics by their absolute values)

In SAS, you can perform PROC TTEST for multiple numeric variables in the same procedure.  Here is an example using the built-in data set SASHELP.BASEBALL; I will compare the number of at-bats and number of walks between the American League and the National League.

proc ttest
     data = sashelp.baseball;
     class League;
     var nAtBat nBB; 
     ods select ttests;
run;

Here are the resulting tables; the first is for nAtBat and the second is for nBB, following the order in the VAR statement.

Method         Variances  DF      t Value  Pr > |t|
Pooled         Equal      320     2.05     0.0410
Satterthwaite  Unequal    313.66  2.06     0.0404

Method         Variances  DF      t Value  Pr > |t|
Pooled         Equal      320     0.85     0.3940
Satterthwaite  Unequal    319.53  0.86     0.3884

 

What if you want to perform PROC TTEST for multiple grouping (a.k.a. classification) variables?  You cannot put more than one variable in the CLASS statement, so you would have to run PROC TTEST separately for each binary grouping variable.  If you do put LEAGUE and DIVISION in the same CLASS statement, here is the resulting log.

1303  proc ttest
1304       data = sashelp.baseball;
1305       class league division;
                        --------
                        22
                        202
ERROR 22-322: Expecting ;.
ERROR 202-322: The option or parameter is not recognized and will be ignored.
1306       var natbat;
1307       ods select ttests;
1308  run;
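
Without a macro, the workaround is one PROC TTEST step per grouping variable. Here is a minimal sketch for LEAGUE and DIVISION with a single numeric variable; it only illustrates the repetition that the macro is meant to remove.

```sas
/* one PROC TTEST step per binary grouping variable */
proc ttest
     data = sashelp.baseball;
     class League;
     var nAtBat;
     ods select ttests;
run;

proc ttest
     data = sashelp.baseball;
     class Division;
     var nAtBat;
     ods select ttests;
run;
```

With only two grouping variables this is manageable, but the code grows by one step for every extra grouping variable.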

 

There is no syntax in PROC TTEST to use multiple grouping variables at the same time, so this tutorial provides a macro to do so.  There are several nice features about my macro:

  1. It allows you to use multiple grouping variables at the same time.
  2. It sorts the t-test statistics by their absolute values within each grouping variable.
  3. It shows the name of each continuous variable in the output table, unlike the above output.

Here is its basic skeleton.

  1. Create a macro variable called FIRST to flag the first iteration of the loop.  This will become useful at the end.
  2. Count the number of grouping variables.
  3. Use a DO-loop to iterate through each grouping variable.
  4. In each iteration, execute PROC TTEST with the ith grouping variable.  Produce the output data set containing the results.
  5. Keep only the results that assume unequal variances, calculate the absolute value of each t-test statistic, and sort the resulting data set by that absolute value in descending order.
  6. Use PROC SQL to extract the key information:
    • the grouping variable’s name
    • the numeric variable’s name
    • the t-test statistic
    • the absolute value of the t-test statistic
    • the P-value of the 2-sample t-test (assuming unequal variances)
  7. Create a new data set from the results of the first iteration; use the aforementioned FIRST macro variable to determine if it is the first iteration.  Append the results of the subsequent iterations to this data set using PROC APPEND.
  8. After the loop, delete the intermediate data sets that were created within the macro: TTEST_RESULTS1, TTEST_RESULTS2 and TTEST_RESULTS3.
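
The counting and looping steps can be tried on their own before any statistics are involved. The sketch below is only an illustration: the macro name LIST_WORDS and the macro variables WORDS, N and WORD are hypothetical and are not part of the macro in this tutorial.

```sas
%macro list_words(words);
     %local i n word;

     /* count the space-separated words in the list */
     %let n = %sysfunc(countw(&words., %str( )));

     /* extract each word in turn and write it to the log */
     %do i = 1 %to &n.;
          %let word = %scan(&words., &i., %str( ));
          %put Word &i. of &n.: &word.;
     %end;
%mend list_words;

%list_words(League Division);
```

Running it should write one line to the log for each word in the list, which is an easy way to check that a list of grouping variables is being split as intended.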

 

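Here is the code for the macro.  Note that you can feed more than one numeric variable into this macro.  Do not give the output data set a name that begins with TTEST_RESULTS, because the macro deletes every data set in the WORK library with that prefix when it finishes.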

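%macro ttest_by_class(ds, class_vars, numeric_vars, output);

* keep the working macro variables local so that they cannot overwrite global ones;
%local first num_class_vars i ClassVar;

* create a flag that will be used later for appending data sets;
%let first = 1;

* find the number of class variables fed into the macro;
%let num_class_vars = %sysfunc(countw(&class_vars., %str( )));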

%put There are &num_class_vars. grouping variables to process.;

***** loop through all class variables;
%do i = 1 %to &num_class_vars.;

     * extract the ith class variable;
     %let ClassVar = %scan(&class_vars., &i., %str( ));
     %put Starting variable &i. of &num_class_vars., &ClassVar.;

     * create an output data set containing the statistical results of PROC TTEST;
     * suppress printing of output using ODS EXCLUDE ALL;
     ods exclude all;
     proc ttest 
          data = &ds.;
          class &ClassVar.;
          var &numeric_vars.;
          ods output 
               ttests = ttest_results1;
     run;
     ods exclude none;

     * choose the method using unequal variances;
     * calculate the absolute value of the t-test statistics;
     data ttest_results2;
          set ttest_results1;
          if variances = "Unequal";
          abstValue = abs(tValue);
     run;

     * sort the data set by the absolute value of the t-test statistics in descending order;
     proc sort
          data = ttest_results2;
          by 
               descending abstValue;
     run;

     * create a data set of the label, t-test statistic, absolute value of the t-test statistic, and P-value for the ith grouping variable;
     proc sql 
               noprint;
          create table ttest_results3 as
          select 
               "&ClassVar." 
                    as Grouping_Variable 
                    label = 'Grouping Variable'
                    length = 100 
                    format = $100., 
               Variable 
                    as Numeric_Variable
                    label = 'Numeric Variable'
                    length = 100
                    format = $100.,
               tValue 
                    label = "t-Test Statistic"
                    format = 8.4, 
               abstValue
                    label = "Absolute Value of t-Test Statistic"
                    format = 8.4,
               Probt 
                    as PValue
                    label = "P-Value" 
                    format = 8.4
          from ttest_results2;
     quit;
 
     * append the data sets as each new result is generated;
     %if &first. 
          %then %do;
               data &output.;
                    set ttest_results3;
               run;
               %let first = 0;
          %end;

     %else %do;
               proc append 
                    base = &output. 
                    data = ttest_results3;
               run;
     %end;
%end;

* delete the intermediate data sets that were created within the macro;
proc datasets 
     library = work
          noprint;
     delete ttest_results: ;
run;

%mend;

 

Let’s try it with the SASHELP.BASEBALL data set again!

%ttest_by_class(sashelp.baseball, League Division, nAtBat nHits nHome nRuns nRBI nBB, baseball_ttests);

proc print
     data = baseball_ttests
          noobs
          label;
run;

Here is the output from PROC PRINT.

Grouping Variable  Numeric Variable  t-Test Statistic  Absolute Value of t-Test Statistic  P-Value
League             nHome             3.2134            3.2134                              0.0014
League             nRuns             2.8408            2.8408                              0.0048
League             nRBI              2.6692            2.6692                              0.0080
League             nAtBat            2.0582            2.0582                              0.0404
League             nHits             1.9186            1.9186                              0.0559
League             nBB               0.8637            0.8637                              0.3884
Division           nRBI              1.6386            1.6386                              0.1023
Division           nRuns             1.5652            1.5652                              0.1186
Division           nHits             1.4758            1.4758                              0.1410
Division           nBB               1.2131            1.2131                              0.2260
Division           nAtBat            0.9442            0.9442                              0.3458
Division           nHome             0.5238            0.5238                              0.6008

 

Within my macro, notice that I used ODS EXCLUDE ALL to suppress the printing of the output from PROC TTEST, and ODS EXCLUDE NONE to turn printing back on afterwards.  This is very important, because displaying the full output of PROC TTEST for every grouping variable can take a long time.  Furthermore, I used ODS OUTPUT to save only the one table that I want (TTESTS) as a data set, so nothing that I don’t want is displayed or stored.
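
The same ODS pattern can be used outside the macro. Here is a minimal sketch that captures the t-test table for one grouping variable without displaying anything from PROC TTEST; the data set name LEAGUE_TTESTS_RAW is a hypothetical name chosen for this illustration.

```sas
/* suppress all displayed output from the next step */
ods exclude all;

proc ttest
     data = sashelp.baseball;
     class League;
     var nAtBat nBB;
     /* save only the t-test table as a data set */
     ods output
          ttests = league_ttests_raw;
run;

/* turn displayed output back on */
ods exclude none;

/* inspect the captured table, which has one row per variable and method */
proc print
     data = league_ttests_raw
          noobs;
run;
```

The captured data set contains the columns that the macro relies on, namely Variable, Variances, tValue and Probt.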

As I mentioned before, this macro also sorts the results by the absolute values of the t-test statistics within each grouping variable, so it is useful even if ranking the numeric variables is your only goal.  In fact, you can use it with just one grouping variable and multiple continuous variables, and you will get a nice table of the results that are indexed by the names of the continuous variables.
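
For example, a call with LEAGUE as the only grouping variable might look like the sketch below. It assumes that the macro above has already been compiled in the current session, and the output data set name LEAGUE_TTESTS is a hypothetical name chosen for this illustration.

```sas
/* one grouping variable, several numeric variables */
%ttest_by_class(sashelp.baseball, League, nAtBat nHits nHome nRuns nRBI nBB, league_ttests);

proc print
     data = league_ttests
          noobs
          label;
run;
```

The printed table should contain one row per numeric variable, ordered from the largest to the smallest absolute t-test statistic.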

I thank Cyrus Bradford from SAS Technical Support for his help with the above macro.  Although he did not write the exact macro above, he helped me with a very similar macro for a slightly different purpose, and he wrote most of the code.  My main contribution was expanding it to allow multiple numeric variables to be fed into the macro.

Your thoughtful comments are much appreciated!
