Getting the names, types, formats, lengths, and labels of variables in a SAS data set

After reading my blog post on getting the variable names of a SAS data set, a reader named Robin asked how to get the formats as well.  I asked SAS Technical Support for help, and a consultant named Jerry Leonard provided a beautiful solution using PROC SQL.  Besides the names and formats of the variables, it also gives the types, lengths, and labels.  Here is an example of how to do so with the CLASS data set in the built-in SASHELP library.

* add formats and labels to 3 of the variables in the CLASS data set;
data class;                                                      
       set sashelp.class;                                            
            age 8.  
            weight height 8.2 
            name $15.;          
            age = 'Age'
            weight = 'Weight'
            height = 'Height';

* extract the variable information using PROC SQL; 
proc sql 
       create table class_info as 
       select libname as library, 
              memname as data_set, 
              name as variable_name, 
       from dictionary.columns                                       
       where libname = 'WORK' and memname = 'CLASS';                     
       /* libname and memname values must be upper case  */         
* print the resulting table;
proc print 
       data = class_info;                                            

Here is the result of that PROC PRINT step in the Results Viewer.  Notice that it also has the type, length, format, and label of each variable.

Obs library data_set variable_name type length format label
1 WORK CLASS Name char 8 $15.
2 WORK CLASS Sex char 1
3 WORK CLASS Age num 8 8. Age
4 WORK CLASS Height num 8 8.2 Height
5 WORK CLASS Weight num 8 8.2 Weight

Thank you, Jerry, for sharing your tip!

Adding Labels to Points in a Scatter Plot in R

What’s the Scatter?

A scatter plot displays the values of 2 variables for a set of data, and it is a very useful way to visualize data during exploratory data analysis, especially (though not exclusively) when you are interested in the relationship between a predictor variable and a target variable.  Sometimes, such data come with categorical labels that have important meanings, and the visualization of the relationship can be enhanced when these labels are attached to the data.

It is common practice to use a legend to label data that belong to a group, as I illustrated in a previous post on bar charts and pie charts.  However, what if every datum has a unique label, and there are many data in the scatter plot?  A legend would add unnecessary clutter in such situations.  Instead, it would be useful to write the label of each datum near its point in the scatter plot. I will show how to do this in R, illustrating the code with a built-in data set called LifeCycleSavings.

Read more of this post