A macro to automate the creation of indicator variables in SAS

In a recent blog post, I introduced an easy and efficient way to create indicator variables from categorical variables in SAS.  This method pretends to run logistic regression, but it really is using PROC LOGISTIC to get the design matrix based on dummy-variable coding.  I shared SAS code for how to do so, step-by-step.

I write this follow-up post to provide a macro that you can use to execute all of those steps in one line.  If you have not read my previous post on this topic, then I strongly encourage you to do that first.  Don’t use this macro blindly.

Here is the macro.  The key steps are

  1. Run PROC LOGISTIC to get the design matrix (which has the indicator variables)
  2. Merge the original data with the newly created indicator variables
  3. Delete the “INDICATORS” data set, which was created in an intermediate step
%macro create_indicators(input_data, target, covariates, output_data);

proc logistic
     data = &input_data
          noprint
          outdesign = indicators;
     class &covariates / param = glm;
     model &target = &covariates;
run;


data &output_data;
      merge    &input_data
               indicators (drop = Intercept &target);
run;


proc datasets 
     library = work
          noprint;
     delete indicators;
run;

%mend;

I will use the built-in data set SASHELP.CARS to illustrate the use of my macro.  As you can see, my macro can accept multiple categorical variables as inputs for creating indicator variables.  I will do that here for the variables TYPE, MAKE, and ORIGIN.

Read more of this post

An easy and efficient way to create indicator variables (a.k.a. dummy variables) from a categorical variable in SAS

Introduction

In statistics and biostatistics, the creation of binary indicators is a very useful practice.

  • They can be useful predictor variables in statistical models.
  • They can reduce the amount of memory required to store the data set.
  • They can treat a categorical covariate as a continuous covariate in regression, which has certain mathematical conveniences.

However, the creation of indicator variables can be a long, tedious, and error-prone process.  This is especially true if there are many categorical variables, or if a categorical variable has many categories.  In this tutorial, I will show an easy and efficient way to create indicator variables in SAS.  I learned this technique from SAS usage note #23217: Saving the coded design matrix of a model to a data set.

The Example Data Set

Let’s consider the PRDSAL2 data set that is built into the SASHELP library.  Here are the first 5 observations; due to a width constraint, I will show the first 5 columns and the last 6 columns separately.  (I encourage you to view this data set using PROC PRINT in SAS by yourself.)

COUNTRY STATE COUNTY ACTUAL PREDICT
U.S.A. California $987.36 $692.24
U.S.A. California $1,782.96 $568.48
U.S.A. California $32.64 $16.32
U.S.A. California $1,825.12 $756.16
U.S.A. California $750.72 $723.52

 

PRODTYPE PRODUCT YEAR QUARTER MONTH MONYR
FURNITURE SOFA 1995 1 Jan JAN95
FURNITURE SOFA 1995 1 Feb FEB95
FURNITURE SOFA 1995 1 Mar MAR95
FURNITURE SOFA 1995 2 Apr APR95
FURNITURE SOFA 1995 2 May MAY95

Read more of this post