Beware of accidental replacement of data sets with PROC SORT in SAS

PROC SORT is a very useful procedure in SAS.  Not only can you sort a data set on one or more variables with it, but you can sort each variable in ascending or descending order, and you can use it to obtain unique or duplicated observations.  However, one feature of PROC SORT can be dangerous and deserves emphasis: if you are not careful, you can accidentally replace an existing, valuable data set.

Suppose that you wish to use PROC SORT to get only the duplicated records of a data set.  Here is an example of how to do it.

/* Create a small example data set; Name is a character variable. */
data heights;
     input Name $ Age Height;
     datalines;
Amy 15 174
Amy 16 177
Bob 14 172
Cam 13 163
Cam 17 181
;
run;

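/* NOUNIQUEKEY keeps only records whose Name occurs more than once; with no OUT=, HEIGHTS is overwritten. */
proc sort
     data = heights
     nouniquekey;
     by Name;
run;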

proc print
     data = heights;
run;

Obs  Name  Age  Height
  1  Amy    15     174
  2  Amy    16     177
  3  Cam    13     163
  4  Cam    17     181

Note that the record for “Bob” is gone from HEIGHTS: it was the only observation with that Name, so the PROC SORT step above removed it, and because no OUT= data set was specified, the result overwrote HEIGHTS itself.

If the original data set is valuable, then this loss can be very damaging, especially if it took a lot of work and time to obtain the original data set.  This shows the danger of accidental replacement of a data set in SAS when using PROC SORT.
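
One way to avoid this loss is to name a separate output data set with the OUT= option, so that PROC SORT leaves the input data set untouched.  The following is only a sketch: it assumes SAS 9.3 or later (where the NOUNIQUEKEY option is available) and that HEIGHTS has been re-created with all 5 records.  The data set name HEIGHTS_DUPS is hypothetical and chosen purely for illustration.

```sas
/* Sketch: write the duplicated records to a new data set instead of */
/* overwriting HEIGHTS.  HEIGHTS_DUPS is a hypothetical name.        */
proc sort
     data = heights
     out  = heights_dups
     nouniquekey;
     by Name;
run;

/* HEIGHTS is left as it was; HEIGHTS_DUPS holds only the Amy and Cam records. */
proc print
     data = heights_dups;
run;
```

With OUT= specified, the original data set remains available even if the sort options turn out to be wrong.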


Sort a data set by ascending or descending variables using PROC SORT in SAS

Consider the built-in data set SASHELP.CLASS in SAS.  Here are the first 5 observations from PROC PRINT.

Obs  Name    Sex  Age  Height  Weight
  1  Joyce   F     11    51.3    50.5
  2  Thomas  M     11    57.5    85.0
  3  James   M     12    57.3    83.0
  4  Jane    F     12    59.8    84.5
  5  John    M     12    59.0    99.5

As you can clearly see, they are NOT sorted by weight.  Here is how you can sort the data set by weight using PROC SORT.
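
A minimal sketch of such a sort follows.  It assumes that the SASHELP library is read-only, as it typically is, so the sorted copy is written to a separate data set with OUT=.  The names CLASS_BY_WEIGHT and CLASS_BY_WEIGHT_DESC are hypothetical.

```sas
/* Ascending order is the default for a BY variable. */
proc sort
     data = sashelp.class
     out  = class_by_weight;
     by Weight;
run;

/* The DESCENDING keyword goes before the variable it applies to. */
proc sort
     data = sashelp.class
     out  = class_by_weight_desc;
     by descending Weight;
run;

/* Print the first 5 observations of the ascending sort. */
proc print
     data = class_by_weight (obs = 5);
run;
```

To mix directions across several variables, place DESCENDING before each variable that should be sorted in descending order; variables without the keyword are sorted in ascending order.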
