Use the LENGTH statement to pre-set the lengths of character variables in SAS – with a comparison to R

I often create character variables (i.e. variables with strings of text as their values) in SAS, and they sometimes don’t render as expected.  Here is an example involving the built-in data set SASHELP.CLASS.

Here is the code:

data c1;
     set sashelp.class;
     * define a new character variable to classify someone as tall or short;
     if height > 60
     then height_class = 'Tall';
          else height_class = 'Short';
run;

* print the results for the first 5 rows;
proc print
     data = c1 (obs = 5);
run;

Here is the result:

Obs  Name     Sex  Age  Height  Weight  height_class
1    Alfred   M    14   69.0    112.5   Tall
2    Alice    F    13   56.5    84.0    Shor
3    Barbara  F    13   65.3    98.0    Tall
4    Carol    F    14   62.8    102.5   Tall
5    Henry    M    14   63.5    102.5   Tall

What happened?  Why does the word “Short” render as “Shor”?
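
In brief, and as a minimal sketch rather than the full explanation: SAS fixes the length of a character variable the first time the variable appears in the DATA step. Here, that first appearance is the assignment of 'Tall', which has 4 characters, so any longer value such as 'Short' is truncated to 4 characters. As the title of this post suggests, a LENGTH statement placed before the first assignment pre-sets the length. The data set name c2 and the length of 5 below are illustrative choices, and the sketch assumes that nothing else in the DATA step defines height_class.

```sas
data c2;
     set sashelp.class;
     * pre-set the length to 5 characters, enough to hold 'Short';
     length height_class $ 5;
     if height > 60
     then height_class = 'Tall';
          else height_class = 'Short';
run;

* print the results for the first 5 rows;
proc print
     data = c2 (obs = 5);
run;
```

With the length pre-set to 5, the second row should show “Short” in full.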


Getting the names, types, formats, lengths, and labels of variables in a SAS data set

After reading my blog post on getting the variable names of a SAS data set, a reader named Robin asked how to get the formats as well.  I asked SAS Technical Support for help, and a consultant named Jerry Leonard provided a beautiful solution using PROC SQL.  Besides the names and formats of the variables, it also gives the types, lengths, and labels.  Here is an example of how to do so with the CLASS data set in the built-in SASHELP library.

* add formats to 4 of the variables and labels to 3 of them in the CLASS data set;
data class;
       set sashelp.class;
       format
            age 8.
            weight height 8.2
            name $15.;
       label
            age = 'Age'
            weight = 'Weight'
            height = 'Height';
run;

* extract the variable information using PROC SQL;
proc sql;
       create table class_info as
       select libname as library,
              memname as data_set,
              name as variable_name,
              type,
              length,
              format,
              label
       from dictionary.columns
       /* libname and memname values must be upper case */
       where libname = 'WORK' and memname = 'CLASS';
quit;

* print the resulting table;
proc print
       data = class_info;
run;

Here is the result of that PROC PRINT step in the Results Viewer.  Notice that it also has the type, length, format, and label of each variable.

Obs  library  data_set  variable_name  type  length  format  label
1    WORK     CLASS     Name           char  8       $15.
2    WORK     CLASS     Sex            char  1
3    WORK     CLASS     Age            num   8       8.      Age
4    WORK     CLASS     Height         num   8       8.2     Height
5    WORK     CLASS     Weight         num   8       8.2     Weight
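
The same metadata can also be read without PROC SQL. As a minimal sketch, assuming the WORK.CLASS data set created above, the SASHELP.VCOLUMN view exposes the contents of DICTIONARY.COLUMNS to a DATA step. This is an alternative that I am adding for comparison, not part of Jerry's solution, and the output data set name class_info2 is a hypothetical choice.

```sas
* read the column metadata through the SASHELP.VCOLUMN view;
data class_info2;
     set sashelp.vcolumn;
     * libname and memname values must be upper case;
     where libname = 'WORK' and memname = 'CLASS';
     keep libname memname name type length format label;
run;

* print the resulting table;
proc print
     data = class_info2;
run;
```

Unlike the PROC SQL version, this sketch keeps the original column names (libname, memname, name) instead of renaming them.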

Thank you, Jerry, for sharing your tip!

Physical Chemistry Lesson of the Day: Pressure-Volume Work

In chemistry, a common type of work is the expansion or compression of a gas under constant pressure.  Recall from physics that pressure is defined as force applied per unit of area.

P = F \div A

P \times A = F

Consider a chemical reaction that releases a gas as its product inside a sealed cylinder with a movable piston.



Image from Dpumroy via Wikimedia.

As the gas expands inside the cylinder, it pushes against the piston, and work is done by the system on the surroundings.  The atmospheric pressure on the piston remains constant while the gas expands, and the volume enclosed by the cylinder increases as a result.  The volume of the gas at any given point is the area of the piston times the length of the gas-filled part of the cylinder.  The change in volume is therefore equal to the area of the piston times the distance along which the piston was pushed by the expanding gas, so \Delta V = A \times \Delta L.

w = -P \times \Delta V

w = -P \times A \times \Delta L

w = -F \times \Delta L

Note that this last line is just the definition of work under constant force in the same direction as the displacement, multiplied by the negative sign to follow the sign convention in chemistry.
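
As a quick numerical illustration with hypothetical values (not taken from any particular reaction), suppose that the gas expands by 2.0 L against a constant external pressure of 1.0 atm.  Converting to SI units, 1.0 atm is 101325 Pa, and 2.0 L is 2.0 \times 10^{-3} m^3.

w = -P \times \Delta V

w = -(101325 \ \text{Pa}) \times (2.0 \times 10^{-3} \ \text{m}^3)

w \approx -203 \ \text{J}

The negative sign indicates that the system loses about 203 J of energy by doing work on the surroundings.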

Exploratory Data Analysis – Computing Descriptive Statistics in R for Data on Ozone Pollution in New York City


This is the first of a series of posts on exploratory data analysis (EDA).  This post will calculate the common summary statistics of a univariate continuous data set – the data on ozone pollution in New York City that is part of the built-in “airquality” data set in R.  This is a particularly good data set to work with, since it has missing values – a common problem in many real data sets.  In later posts, I will continue this series by exploring other methods in EDA, including box plots and kernel density plots.
