A macro to automate the creation of indicator variables in SAS

In a recent blog post, I introduced an easy and efficient way to create indicator variables from categorical variables in SAS.  This method appears to run a logistic regression, but it really uses PROC LOGISTIC only to obtain the design matrix under dummy-variable coding.  I shared the SAS code for doing so, step by step.

I am writing this follow-up post to provide a macro that executes all of those steps in one line.  If you have not read my previous post on this topic, then I strongly encourage you to do that first.  Don’t use this macro blindly.

Here is the macro.  The key steps are

  1. Run PROC LOGISTIC to get the design matrix (which has the indicator variables)
  2. Merge the original data with the newly created indicator variables
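  3. Delete the “INDICATORS” data set, which was created in an intermediate step

%macro create_indicators(input_data, target, covariates, output_data);

/* Step 1: use PROC LOGISTIC only to write the design matrix to WORK.INDICATORS */
/* PARAM = GLM gives one 0/1 indicator per level of each CLASS variable         */
proc logistic
     data = &input_data
          noprint
          outdesign = indicators;
     class &covariates / param = glm;
     model &target = &covariates;
run;


/* Step 2: one-to-one merge (no BY statement), so rows are matched by position */
/* The target is dropped from INDICATORS because it is already in the input    */
data &output_data;
      merge    &input_data
               indicators (drop = Intercept &target);
run;


/* Step 3: delete the intermediate data set; QUIT ends PROC DATASETS */
proc datasets 
     library = work
          noprint;
     delete indicators;
run;
quit;

%mend create_indicators;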
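
Since the logistic regression itself is never used, one possible variation is to ask PROC LOGISTIC to skip the model fitting altogether.  The sketch below is not the macro from this post; it is a hedged alternative that adds the OUTDESIGNONLY option to the PROC LOGISTIC statement, which creates the OUTDESIGN= data set without fitting the model.  The macro name CREATE_INDICATORS_NOFIT is hypothetical and used only for illustration; the parameters are the same as above.

/* Sketch only: CREATE_INDICATORS_NOFIT is a hypothetical name, not the macro from this post */
%macro create_indicators_nofit(input_data, target, covariates, output_data);

/* OUTDESIGNONLY writes the design matrix and suppresses the model fitting */
proc logistic
     data = &input_data
          noprint
          outdesign = indicators
          outdesignonly;
     class &covariates / param = glm;
     model &target = &covariates;
run;

/* Same one-to-one merge as in the original macro */
data &output_data;
      merge    &input_data
               indicators (drop = Intercept &target);
run;

/* Same clean-up of the intermediate data set */
proc datasets
     library = work
          noprint;
     delete indicators;
run;
quit;

%mend create_indicators_nofit;

This variant should produce the same indicator variables, and it may run faster when the covariates have many levels, because no model is estimated.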

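I will use the built-in data set SASHELP.CARS to illustrate the use of my macro.  The macro accepts multiple categorical variables as inputs for creating indicator variables; I will do that here for the variables TYPE, MAKE, and ORIGIN.  The target, DRIVETRAIN here, is needed only because the MODEL statement in PROC LOGISTIC requires a response variable; it does not affect the indicator variables.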

%create_indicators(sashelp.cars, DriveTrain, Type Make Origin, cars1);

By executing this one line, I created the data set CARS1, which has the indicator variables for all of the levels within TYPE, MAKE, and ORIGIN.
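
If you want to check the new variables before going further, one simple approach is to compare the frequency of each level with the sum of its indicator variable; the two should agree.  This is only a sketch of such a check, shown for ORIGIN; the indicator names OriginAsia, OriginEurope, and OriginUSA are the ones that the macro creates in CARS1.

/* Frequency of each level of ORIGIN in CARS1 */
proc freq
     data = cars1;
     tables Origin;
run;

/* Sum of each 0/1 indicator; each sum should match the frequency of its level */
proc means
     data = cars1
          sum;
     var OriginAsia OriginEurope OriginUSA;
run;

The same comparison can be repeated for TYPE and MAKE by listing their indicator variables in the VAR statement.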

Here is some code to take a random sample of CARS1 using PROC SURVEYSELECT; I included a seed for you to replicate my results.

proc surveyselect
     data = cars1
          noprint
          n = 10
          seed = 265
          out = cars2;
run;

proc print
     data = cars2
          noobs;
     var type:;
run;

proc print
     data = cars2
          noobs;
     var origin:;
run;

Here are the results from the two PROC PRINT steps.

Type    TypeHybrid  TypeSUV  TypeSedan  TypeSports  TypeTruck  TypeWagon
SUV     0           1        0          0           0          0
Sedan   0           0        1          0           0          0
Sports  0           0        0          1           0          0
Wagon   0           0        0          0           0          1
SUV     0           1        0          0           0          0
Sedan   0           0        1          0           0          0
SUV     0           1        0          0           0          0
Sedan   0           0        1          0           0          0
Wagon   0           0        0          0           0          1
Sedan   0           0        1          0           0          0

Origin  OriginAsia  OriginEurope  OriginUSA
USA     0           0             1
USA     0           0             1
USA     0           0             1
USA     0           0             1
Asia    1           0             0
Asia    1           0             0
USA     0           0             1
Asia    1           0             0
Asia    1           0             0
USA     0           0             1

I encourage you to print the entire data set in SAS for your viewing, and I also encourage you to try this macro for your own data set!

Your thoughtful comments are much appreciated!