# Calculating the sum or mean of a numeric (continuous) variable by a group (categorical) variable in SAS

October 28, 2014 3 Comments

#### Introduction

A common task in **data analysis** and **statistics** is to calculate the **sum** or **mean** of a continuous variable. If that variable can be categorized into 2 or more classes, you may want to **get the sum or mean for each class**.

This sounds like a simple task, yet I took a surprisingly long time to learn how to do this in SAS and get exactly what I want – a new data with with each category as the identifier and the calculated sum/mean as the value of a second variable. Here is an example to show you how to do it using PROC MEANS.

**Read more to see an example data set and get the SAS code to calculate the sum or mean of a continuous variable by a categorical variable!**

#### The Example Data

Let’s use the data set “class” in the “sashelp” library, which should already be installed and available for you when you open SAS. If you run the code

proc print data = sashelp.class noobs; run;

you will see this data table:

Name | Sex | Age | Height | Weight |
---|---|---|---|---|

Joyce | F | 11 | 51.3 | 50.5 |

Jane | F | 12 | 59.8 | 84.5 |

Louise | F | 12 | 56.3 | 77.0 |

Alice | F | 13 | 56.5 | 84.0 |

Barbara | F | 13 | 65.3 | 98.0 |

Carol | F | 14 | 62.8 | 102.5 |

Judy | F | 14 | 64.3 | 90.0 |

Janet | F | 15 | 62.5 | 112.5 |

Mary | F | 15 | 66.5 | 112.0 |

Thomas | M | 11 | 57.5 | 85.0 |

James | M | 12 | 57.3 | 83.0 |

John | M | 12 | 59.0 | 99.5 |

Robert | M | 12 | 64.8 | 128.0 |

Jeffrey | M | 13 | 62.5 | 84.0 |

Alfred | M | 14 | 69.0 | 112.5 |

Henry | M | 14 | 63.5 | 102.5 |

Ronald | M | 15 | 67.0 | 133.0 |

William | M | 15 | 66.5 | 112.0 |

Philip | M | 16 | 72.0 | 150.0 |

#### The SAS Code – 2 Methods Involving PROC MEANS

Let’s calculate the average height by gender. Here is the code to do it using PROC MEANS. Since we will categorize the means by gender using the “BY” statement in PROC MEANS, we first need to sort the data by the variable “sex”.

proc sort data = sashelp.class; by sex; run; proc means data = sashelp.class noprint mean; var height; by sex; output out = mean_of_heights_by_sex ( drop = _TYPE_ _FREQ_ ) mean = Mean_Height; run;

There are 2 variables that I don’t want in the new data set: _TYPE_ and _FREQ_, so I drop them while the output data set is created. The “mean” option in the “output” statement specifies the name of variable of that contains the mean. You can leave it blank, and it will just use the original variable’s name in the new data set – in our case, “height”. I want to specify that this isn’t just any height, but the mean height, so I declare a new variable name: Mean_Height.

Update on Wednesday, October 29, 2014: Thanks to a helpful comment below from Guillaume, here is another code to get the same result using the “CLASS” statement. Note that I have added the “NWAY” option in the “PROC MEANS” statement to ensure that the overall mean for the entire data set is NOT printed. This option does not require sorting ahead of time, because I am not using the “BY” statement.

proc means data = sashelp.class nway noprint mean; var height; class sex; output out = mean_of_heights_by_sex ( drop = _TYPE_ _FREQ_ ) mean = Mean_Height; run;

If you are interested in learning more about the differences between using the “BY” statement and using the “CLASS” statement,

- here is a discussion on the SAS Support Communities forum on this topic
- here is the SAS documentation on the CLASS statement; scroll down to the bottom to read a paragraph on this topic

#### Viewing the Result

Now, let’s print the resulting data set to see what we get. **Both of the above methods will yield the same output data set**.

proc print data = mean_of_heights_by_sex noobs label; label mean_height = 'Average Height'; run;

Here is the output data set!

Sex | Mean Height |
---|---|

F | 60.5889 |

M | 63.9100 |

Hi Eric,

One little detail is that if you use the “by” statement, your input dataset has to be sorted. Replacing “by” by “class” allows you to obtain the same result without having to sort your dataset beforehand😉. Apparently class uses more computational time than by but unless you have millions of raw I’m sure you won’t notice the difference.

Hi Guillaume,

Thanks for your insightful comment! I have updated my post to reflect your suggestion. Note the use of NWAY to avoid printing the overall mean.

Merci pour ta suggestion utile!

Reblogged this on Psychology & Statistics Tutor:Mentor and commented:

I have not used SAS myself~