Use the LENGTH statement to pre-set the lengths of character variables in SAS – with a comparison to R

I often create character variables (i.e. variables with strings of text as their values) in SAS, and they sometimes don’t render as expected.  Here is an example involving the built-in data set SASHELP.CLASS.

Here is the code:

data c1;
     set sashelp.class;
 
     * define a new character variable to classify someone as tall or short;
     if height > 60
     then height_class = 'Tall';
          else height_class = 'Short';
run;


* print the results for the first 5 rows;
proc print
     data = c1 (obs = 5);
run;

Here is the result:

Obs Name Sex Age Height Weight height_class
1 Alfred M 14 69.0 112.5 Tall
2 Alice F 13 56.5 84.0 Shor
3 Barbara F 13 65.3 98.0 Tall
4 Carol F 14 62.8 102.5 Tall
5 Henry M 14 63.5 102.5 Tall

What happened?  Why does the word “Short” render as “Shor”?

This occurred because SAS sets the length of a new character variable to the length of the first value assigned to it in the code.  My code assigns the value “Tall” first, which has a length of 4.  Thus, “height_class” was created as a character variable with a length of 4.  Any longer value assigned to it afterwards, such as “Short”, is truncated to 4 characters.
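If you want to confirm the length that SAS assigned, the following is a minimal sketch of two ways to check it.  The data set name “c1_check” and the variable name “declared_length” are my own placeholders for illustration.  For the data set “c1” created above, both checks should report a length of 4 for “height_class”.

```sas
* list the type and length of every variable in c1;
proc contents
     data = c1;
run;


* alternatively, store the declared length of height_class in a numeric variable;
data c1_check;
     set c1 (obs = 5);

     * VLENGTH returns the length allocated to the variable, not the length of its current value;
     declared_length = vlength(height_class);
run;


proc print
     data = c1_check;
     var name height height_class declared_length;
run;
```

PROC CONTENTS is the quicker check when you only want to look at the variable attributes; the VLENGTH approach is handy when you want the length available as a value inside a DATA step.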

How can we circumvent this?  You can pre-set the length of a new character variable with the LENGTH statement, placed before the first statement that assigns a value to it.  In the revised code below, I correct the problem by setting the length of “height_class” to 5 before defining its possible values.

data c2;
     set sashelp.class;
 
     * define a new character variable to classify someone as tall or short;
     length height_class $ 5;
     if height > 60
     then height_class = 'Tall';
          else height_class = 'Short';
run;


* print the results for the first 5 rows;
proc print
     data = c2 (obs = 5);
run;

Here is the result:

Obs Name Sex Age Height Weight height_class
1 Alfred M 14 69.0 112.5 Tall
2 Alice F 13 56.5 84.0 Short
3 Barbara F 13 65.3 98.0 Tall
4 Carol F 14 62.8 102.5 Tall
5 Henry M 14 63.5 102.5 Tall

 

Notice that “height_class” for Alice is “Short”, as it should be.
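The position of the LENGTH statement matters.  The sketch below is a deliberately incorrect variation in which the LENGTH statement comes after the assignments; the data set name “c2_late” is a placeholder of mine.  As far as I know, SAS fixes the length of “height_class” at 4 when it first encounters the assignment of 'Tall', so the later LENGTH statement should not change it, and I would expect a warning in the log and “Shor” in the output again.

```sas
data c2_late;
     set sashelp.class;

     * the first assignment in the code fixes the length of height_class at 4;
     if height > 60
     then height_class = 'Tall';
          else height_class = 'Short';

     * too late - the length of height_class has already been set above;
     length height_class $ 5;
run;


* print the results for the first 5 rows;
proc print
     data = c2_late (obs = 5);
run;
```

Placing the LENGTH statement near the top of the DATA step, before any assignment to the new variable, avoids this pitfall.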

An alternative solution is to re-write the code so that the first instance of “height_class” is the longest possible value.  This does not require the use of the LENGTH statement.

data c3;
     set sashelp.class;
 
     * define a new character variable to classify someone as tall or short;
     * use <= so that a height of exactly 60 is still classified as short, as in the code above;
     if height <= 60
     then height_class = 'Short';
          else height_class = 'Tall';
run;

 

By the way, I don’t notice this problem in R.  Here is some code to illustrate this observation.

> set.seed(235)
> 
> # randomly generate 3 values
> x = rnorm(3, 60, 5)
> 
> # add a value to the beginning of "x" so that the first value is above 60
> # add a value to the end of "x" so that the last value is below 60
> x = c(63, x, 57)
> x
[1] 63.00000 70.68902 61.36082 56.62601 57.00000
> 
> # pre-allocate an empty character vector for classifying "x" as "tall" or "short"
> y = character(length(x))
> 
> 
> for (i in 1:length(x))
+ {
+   if (x[i] > 60)
+   {
+     y[i] = 'Tall'
+   }
+   else
+   {
+     y[i] = 'Short'
+   }
+ }
> 
> 
> # display "y"
> y
[1] "Tall"  "Tall"  "Tall"  "Short" "Short"

Notice that the value “Short” renders fully with a length of 5.  I did not need to pre-set the length of “y” first, because the elements of a character vector in R are not restricted to a fixed width.

5 Responses to Use the LENGTH statement to pre-set the lengths of character variables in SAS – with a comparison to R

  4. Ista Zahn says:

    You can replace your for-loop in R with one of

    y = ifelse(x > 60, "tall", "short")
    

    ,

    factor(x > 60, labels = c("short", "tall"))
    

    or

    y <- ""
    y[x > 60] <- "tall"
    y[x <= 60] <- "short"
    
