A SAS macro to automatically label variables using another data set

Introduction

When I write SAS programs, I usually export the analytical results into an output that a client will read.  I often cannot show the original variable names in these outputs; there are 2 reasons for this:

  • The maximal length of a SAS variable’s name is 32 characters, whereas the description of the variable can be much longer.  This is the case for my current job in marketing analytics.
  • Only letters, numbers, and underscores are allowed in a SAS variable’s name.  Spaces and special characters are not allowed.  Thus, if a variable’s name is quite long and complicated to describe, then the original variable name would be not suitable for presentation or awkward to read.  It may be so abbreviated that it is devoid of practical meaning.

This is why labelling variables can be a good idea.  However, I usually label variables manually in a DATA step or within PROC SQL, which can be very slow and prone to errors.  I recently worked on a data set with 193 variables, most of which require long descriptions to understand what they mean.  Labelling them individually and manually was not a realistic method, so I sought an automated or programmatic way to do so.

The Code

Jerry Leonard from SAS Technical Support shared a useful snippet of code with me.  The key part of this code invokes PROC DATASETS, which allows me to access and manipulate attributes of a SAS data set.  The labels of the variables is an attribute that PROC DATASETS can alter.  In this code, the labels are available as observations of a character column in another data set.  There is a very convenient feature of this code: the labels in this second data set DO NOT need to be sorted in the same order as the display of variables in your first data set!

I have converted Jerry’s code into a macro, but the core of it is entirely his creation, so I thank him profusely for his help.

First, let’s create a data set with a character variable containing the descriptions of the columns in the built-in SASHELP.CLASS data set.  I will also create a temporary copy of the CLASS data in the WORK library.  Notice that the variable names in “mylabels” are sorted differently from the display of the variables in “class”.  I did this on purpose to demonstrate the flexibility of the macro to handle this non-alignment.

* create a data set of descriptions 
* I will use the built-in SASHELP.CLASS data set as an example;
data mylabels; 
     input 
     Variable $ @8 
     Description $20.; 
     cards; 
height Height (inches)
sex    Gender of Student 
name   Name of Student
weight Weight (pounds) 
age    Age (years)
; 
run; 


* create a temporary copy of the CLASS data set; 
data class; 
     set sashelp.class; 
run;

 

Next, I will create a macro to add the labels to the CLASS data set.  This invokes PROC DATASETS to manipulate the labels.  This macro has 5 arguments.

  • the data set to be labelled
  • the library containing this data set
  • the data set containing the labels in a character variable
  • the name of the column containing the variable names in this second data set
  • the name of the column containing the labels in this second data set
%macro label_data(data_set, library, ds_labels, column_name, column_label);

* create distinct macro variables for each variable name and label; 
data _null_; 
     set &ds_labels; 
     call symput('var' || trim(left(_N_)), trim(left(&column_name))); 
     call symput('label' || trim(left(_N_)), trim(left(&column_label))); 
     call symput('nobs', trim(left(_N_))); 
run;


* use PROC DATASETS to change the labels;
proc datasets 
     library = &library 
          memtype = data
          nolist; 
     modify &data_set; 
     label 
          %do i = 1 %to &nobs; 
               &&var&i = &&label&i 
          %end; 
     ; 
     quit; 
run; 

%mend;

 

Using the macro

Now, let’s implement this SAS macro for our example; you can view the results using PROC CONTENTS.  I sorted the output by the variable number.

* run the label_data macro to add the labels;
%label_data(class, work, mylabels, Variable, Description);

* view the resulting labels in the data;
proc contents 
      data = class 
           varnum; 
run;

 

Here are the results:

Variables in Creation Order
# Variable Type Len Label
1 Name Char 8 Name of Student
2 Sex Char 1 Gender
3 Age Num 8 Age of Student (Years)
4 Height Num 8 Height (inches)
5 Weight Num 8 Weight (lbs)

 

Reaping the benefits of the macro

Let’s use PROC LOGISTIC to perform logistic regression, with gender as the target variable and age, height, and weight as the predictor variables.  The PARMLABEL option in the MODEL statement adds the labels to the output.  For brevity, I will show only the table of the maximum likelihood estimates, and I will create an output data set of this table, too.

* run logistic regression to check if the labels are shown in an output;
ods output
     parameterestimates = pe;
proc logistic
     data = class;
     model sex = age height weight
     / parmlabel;
     ods select parameterestimates;
run;


* show the table of maximum likelihood estimates with the labels;
proc print
     data = pe (drop = _ESTTYPE_);
run;

 

Here are the results! First, here is the table that is produced by PROC LOGISTIC.

Analysis of Maximum Likelihood Estimates
Parameter DF Estimate Standard
Error
Wald
Chi-Square
Pr > ChiSq Label
Intercept 1 -1.1926 11.9216 0.0100 0.9203 Intercept: Sex=F
Age 1 1.7647 0.9534 3.4261 0.0642 Age (years)
Height 1 -0.1517 0.2902 0.2734 0.6011 Height (inches)
Weight 1 -0.1326 0.0869 2.3277 0.1271 Weight (pounds)

Finally, here is the table that was made by the ODS OUTPUT statement before PROC LOGISTIC.

Obs Variable DF Estimate StdErr WaldChiSq ProbChiSq Label
1 Intercept 1 -1.1926 11.9216 0.0100 0.9203 Intercept: Sex=F
2 Age 1 1.7647 0.9534 3.4261 0.0642 Age (years)
3 Height 1 -0.1517 0.2902 0.2734 0.6011 Height (inches)
4 Weight 1 -0.1326 0.0869 2.3277 0.1271 Weight (pounds)

 

You can export the data set “pe” into your report, and you won’t need to manually add the labels in the report.

In this above example, I manually created the second data set with the labels.  However, I often have the second data set of labels available from another input file.  If this is the case, then I would not ever need to manually type the labels.

Advertisements

2 Responses to A SAS macro to automatically label variables using another data set

  1. groju says:

    Thank you very much for your article. It was very hepful !

Your thoughtful comments are much appreciated!

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

w

Connecting to %s

%d bloggers like this: