# Calculating the sum or mean of a numeric (continuous) variable by a group (categorical) variable in SAS

#### Introduction

A common task in data analysis and statistics is to calculate the sum or mean of a continuous variable.  If that variable can be categorized into 2 or more classes, you may want to get the sum or mean for each class.

This sounds like a simple task, yet I took a surprisingly long time to learn how to do this in SAS and get exactly what I want – a new data with with each category as the identifier and the calculated sum/mean as the value of a second variable.  Here is an example to show you how to do it using PROC MEANS.

Read more to see an example data set and get the SAS code to calculate the sum or mean of a continuous variable by a categorical variable!

#### The Example Data

Let’s use the data set “class” in the “sashelp” library, which should already be installed and available for you when you open SAS.  If you run the code

```proc print
data = sashelp.class noobs;
run;```

you will see this data table:

Name Sex Age Height Weight
Joyce F 11 51.3 50.5
Jane F 12 59.8 84.5
Louise F 12 56.3 77.0
Alice F 13 56.5 84.0
Barbara F 13 65.3 98.0
Carol F 14 62.8 102.5
Judy F 14 64.3 90.0
Janet F 15 62.5 112.5
Mary F 15 66.5 112.0
Thomas M 11 57.5 85.0
James M 12 57.3 83.0
John M 12 59.0 99.5
Robert M 12 64.8 128.0
Jeffrey M 13 62.5 84.0
Alfred M 14 69.0 112.5
Henry M 14 63.5 102.5
Ronald M 15 67.0 133.0
William M 15 66.5 112.0
Philip M 16 72.0 150.0

#### The SAS Code – 2 Methods Involving PROC MEANS

Let’s calculate the average height by gender.  Here is the code to do it using PROC MEANS.  Since we will categorize the means by gender using the “BY” statement in PROC MEANS, we first need to sort the data by the variable “sex”.

```proc sort
data = sashelp.class;
by sex;
run;

proc means
data = sashelp.class
noprint
mean;
var height;
by sex;
output
out = mean_of_heights_by_sex (
drop = _TYPE_
_FREQ_
)
mean = Mean_Height;
run;```

There are 2 variables that I don’t want in the new data set: _TYPE_ and _FREQ_, so I drop them while the output data set is created.  The “mean” option in the “output” statement specifies the name of variable of that contains the mean.  You can leave it blank, and it will just use the original variable’s name in the new data set – in our case, “height”.  I want to specify that this isn’t just any height, but the mean height, so I declare a new variable name: Mean_Height.

Update on Wednesday, October 29, 2014: Thanks to a helpful comment below from Guillaume, here is another code to get the same result using the “CLASS” statement.  Note that I have added the “NWAY” option in the “PROC MEANS” statement to ensure that the overall mean for the entire data set is NOT printed.  This option does not require sorting ahead of time, because I am not using the “BY” statement.

```proc means
data = sashelp.class
nway
noprint
mean;
var height;
class sex;
output
out = mean_of_heights_by_sex (
drop = _TYPE_
_FREQ_
)
mean = Mean_Height;
run;```

If you are interested in learning more about the differences between using the “BY” statement and using the “CLASS” statement,

#### Viewing the Result

Now, let’s print the resulting data set to see what we get.  Both of the above methods will yield the same output data set.

```proc print
data = mean_of_heights_by_sex
noobs label;
label mean_height = 'Average Height';
run;```

Here is the output data set!

Sex Mean Height
F 60.5889
M 63.9100

### 3 Responses to Calculating the sum or mean of a numeric (continuous) variable by a group (categorical) variable in SAS

1. Guillaume says:

Hi Eric,

One little detail is that if you use the “by” statement, your input dataset has to be sorted. Replacing “by” by “class” allows you to obtain the same result without having to sort your dataset beforehand ;-). Apparently class uses more computational time than by but unless you have millions of raw I’m sure you won’t notice the difference.

• Eric Cai - The Chemical Statistician says:

Hi Guillaume,

Thanks for your insightful comment! I have updated my post to reflect your suggestion. Note the use of NWAY to avoid printing the overall mean.

Merci pour ta suggestion utile!

2. Psych n Stats Tutor says:

Reblogged this on Psychology & Statistics Tutor:Mentor and commented:
I have not used SAS myself~