← Separating Unique and Duplicated Observations Using PROC SORT in SAS 9.3 and Newer Versions

Eric’s Enlightenment for Monday, April 20, 2015 →

Using PROC SQL to Find Uncommon Observations Between 2 Data Sets in SAS

April 17, 2015 2 Comments

A common task in data analysis is to compare 2 data sets and determine the uncommon rows between them. By “uncommon rows”, I mean rows whose identifier value exists in one data set but not the other. In this tutorial, I will demonstrate how to do so using PROC SQL.

Let’s create 2 data sets.

data dataset1;
      input id $ group $ gender $ age;
      cards;
      111 A Male 11
      111 B Male 11
      222 D Male 12
      333 E Female 13
      666 G Female 14
      999 A Male 15
      999 B Male 15
      999 C Male 15
      ;
run;

data dataset2;
      input id $ group $ gender $ age;
      cards;
      111 A Male 11
      999 C Male 15
      ;
run;

First, let’s identify the observations in dataset1 whose ID variable values don’t exist in dataset2. I will export this set of observations into a data set called mismatches1, and I will print it for your viewing. The logic of the code is simple – find the IDs in dataset1 that are not in the IDs in dataset2. The code “select *” ensures that all columns from dataset1 are used to create the data set in mismatches1.

proc sql;
      create table mismatches1 as
      select * from dataset1
      where id
      not in
      (
           select id from dataset2 
      );
quit;

proc print
      data = mismatches1;
      title 'Mismatches on Agency ID';
run;

Here is the resulting data set. As expected, the observations with ID values “222”, “333” and “666” are uncommon between the 2 data sets, so they were exported into mismatches1.

Mismatches on Agency ID

Obs	id	group	gender	age
1	222	D	Male	12
2	333	E	Female	13
3	666	G	Female	14

Sometimes, I have 2 identifier variables, and I want to find the uncommon observations as identified by both of those variables. Here is an example of how to do this with the variables “ID” and “GROUP” in our 2 data sets. I will export this second set of observations into a data set called mismatches2, and I will print it for your viewing. Here is the logic of the code:

find the IDs in dataset2 such that ID and GROUP are common between dataset1 and dataset2 – let’s call this set of IDs X
find the IDs in dataset1 such that do not exist in X

proc sql;
      create table mismatches2 as
      select * from dataset1 d1
      where id not in
      (
           select id from dataset2 d2 
           where d1.id = d2.id
                and 
           d1.group = d2.group
      );
quit;

proc print
      data = mismatches2;
      title 'Mismatches on Agency ID and Group';
run;

Here is the resulting data set. Notice that, as expected, 111-B, 999-A and 999-B are also mismatches.

Mismatches on Agency ID and Group

Obs	id	group	gender	age
1	111	B	Male	11
2	222	D	Male	12
3	333	E	Female	13
4	666	G	Female	14
5	999	A	Male	15
6	999	B	Male	15

Filed under Data Analysis, SAS Programming, Tutorials Tagged with data analysis, data manipulation, PROC SQL, SAS, SQL, uncommon observations, uncommon rows

2 Responses to Using PROC SQL to Find Uncommon Observations Between 2 Data Sets in SAS

wbkrebs says:

May 3, 2015 at 3:40 pm

Your method for finding unique observations won’t work if you have two keys and there are values of the first key which are in both data sets for some values of the second key and only the first data set for other values.

However, the following will work for that case:

proc sql;
create table mismatches2 as
select * from dataset1 d1
( select id, group from dataset1
except
select id, group from dataset2 ) gl
where d1.id = gl.id and d1.group = gl.group;
quit;

Reply
- Eric Cai - The Chemical Statistician says:
  
  May 5, 2015 at 10:17 am
  
  Hi Wbkrebs,
  
  Thanks for taking the time to point this out and share the code. Could you please provide an example of this problem and how your code solves it? I struggle to understand what the problem is, though I am very interested in learning the limitations of my code.
  
  Many thanks,
  
  Eric
  
  Reply

	Eric Cai - The Chemi… on Convert multiple variables bet…
	Jack on Convert multiple variables bet…
	Eric Cai - The Chemi… on Getting the names, types, form…
	Emily V on Getting the names, types, form…
	Lauren McClain on Convert multiple variables bet…
	Eric Cai - The Chemi… on Convert multiple variables bet…
	Lauren McClain on Convert multiple variables bet…
	Eric Cai - The Chemi… on Exploratory Data Analysis: Com…
	CK on Exploratory Data Analysis: Com…
	Eric Cai - The Chemi… on Video Tutorial: Breaking Down…

The Chemical Statistician

Using PROC SQL to Find Uncommon Observations Between 2 Data Sets in SAS

2 Responses to Using PROC SQL to Find Uncommon Observations Between 2 Data Sets in SAS

Your thoughtful comments are much appreciated! Cancel reply

Eric’s Twitter Feed (@chemstateric)

Recent Comments

Popular Topics

Recent Posts

About Eric

Blogs and Web Sites That I Like to Read

Archives

Categories

The Chemical Statistician

Using PROC SQL to Find Uncommon Observations Between 2 Data Sets in SAS

Share this:

Related

2 Responses to Using PROC SQL to Find Uncommon Observations Between 2 Data Sets in SAS

Your thoughtful comments are much appreciated! Cancel reply

Eric’s Twitter Feed (@chemstateric)

Recent Comments

Popular Topics

Recent Posts

About Eric

Blogs and Web Sites That I Like to Read

Archives

Categories