Separating Unique and Duplicated Observations Using PROC SORT in SAS 9.3 and Newer Versions
April 10, 2015 5 Comments
As Fareeza Khurshed commented in my previous blog post, there is a new option in SAS 9.3 and later versions that allows sorting and the identification of duplicates to be done in one step. My previous trick uses FIRST.variable and LAST.variable to separate the unique observations from the duplicated observations, but that requires sorting the data set first before using the DATA step to do the separation. If you have SAS 9.3 or a newer version, here is an example of doing it in one step using PROC SORT.
There is a data set called ADOMSG in the SASHELP library that is built into SAS. It has an identifier called MSGID, and there are duplicates by MSGID. Let’s create 2 data sets out of SASHELP.ADOMSG:
- DUPLICATES for storing the duplicated observations
- SINGLES for storing the unique observations
proc sort data = sashelp.adomsg out = duplicates uniqueout = singles nouniquekey; by msgid; run;
Here is the log:
NOTE: There were 459 observations read from the data set SASHELP.ADOMSG. NOTE: 300 observations with unique key values were deleted. NOTE: The data set WORK.DUPLICATES has 159 observations and 6 variables. NOTE: The data set WORK.SINGLES has 300 observations and 6 variables. NOTE: PROCEDURE SORT used (Total process time): real time 0.28 seconds cpu time 0.00 seconds
Note that the number of observations in WORK.DUPLICATES and WORK.SINGLES add to 459, the total number of observations in the original data set.
In addition to Fareeza, I also thank CB for sharing this tip.
My thanks as well for the first post and for this tip – it worked flawlessly with my data. I’m relatively new to writing SAS code, and I was able to show this tip to my SAS mentor and teach him a cool new trick (for once). Much appreciated!
You’re welcome, Ron! Thanks for visiting my blog!
Great tips. Thanks Eric!
You’re welcome, Alex!
Thanks for writing this blog. It is very much informative and at the same time useful for me