Beware of accidental replacement of data sets with PROC SORT in SAS
June 28, 2018
PROC SORT is a very useful procedure in SAS. Not only can you sort a data set on one or more variables with it, but you can also sort each variable in ascending or descending order and use it to obtain unique or duplicated observations. However, one feature of PROC SORT can be dangerous and deserves emphasis: if you do not name an output data set with the OUT= option, PROC SORT writes its result back to the input data set, so you can accidentally replace an existing, valuable data set.
Suppose that you wish to use PROC SORT to get only the duplicated records of a data set. Here is an example of how to do it.
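
    data heights;
        input Name $ Age Height;
        datalines;
    Amy 15 174
    Amy 16 177
    Bob 14 172
    Cam 13 163
    Cam 17 181
    ;
    run;

    /* Keep only observations whose BY value (Name) occurs more than once. */
    /* No OUT= option is given, so the result overwrites HEIGHTS.          */
    proc sort data = heights nouniquekey;
        by Name;
    run;

    proc print data = heights;
    run;
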
Obs | Name | Age | Height |
---|---|---|---|
1 | Amy | 15 | 174 |
2 | Amy | 16 | 177 |
3 | Cam | 13 | 163 |
4 | Cam | 17 | 181 |
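Note that the record for “Bob” is gone from HEIGHTS. It was the only observation with that value of Name, so the NOUNIQUEKEY option removed it, and because the PROC SORT step did not specify an OUT= data set, the reduced result replaced the original HEIGHTS.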
If the original data set is valuable, then this loss can be very damaging, especially if it took a lot of work and time to obtain the original data set. This shows the danger of accidental replacement of a data set in SAS when using PROC SORT.
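One way to guard against this is to send the result to a separate data set with the OUT= option, which leaves the input untouched. The following is a minimal sketch that assumes the HEIGHTS data set from the example has been re-created with all five records; the data set names HEIGHTS_DUPS and HEIGHTS_UNIQUE are hypothetical and chosen only for illustration.

```sas
/* Sketch: write the duplicated records to a new data set    */
/* instead of overwriting the input. HEIGHTS_DUPS and        */
/* HEIGHTS_UNIQUE are hypothetical names for this example.   */
proc sort data      = heights
          nouniquekey
          out       = heights_dups      /* observations with repeated Name values */
          uniqueout = heights_unique;   /* observations removed by NOUNIQUEKEY    */
    by Name;
run;

/* HEIGHTS itself is not modified by the step above. */
proc print data = heights;
run;

proc print data = heights_dups;
run;
```

With OUT= specified, HEIGHTS should still contain the record for “Bob”, while HEIGHTS_DUPS holds only the records for “Amy” and “Cam”. The UNIQUEOUT= option is optional; it is included here to show that the removed observations can be kept as well.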