Free Seminar on Sports Analytics + Free Food @ Boston Pizza in New Westminster: 7 pm, Friday, January 23, 2015

I will attend the following seminar at 7 pm on Friday, January 23, 2015.  It will be held in a private dining room at Boston Pizza (1045 Columbia Street in New Westminster, British Columbia).  This seminar is part of Café Scientifique, an ongoing series of public lectures from my alma mater, Simon Fraser University.

If you will attend, please come and say “Hello”!

Reserve your free seat by emailing:
**Note that there is no accent above the “e” in this address.

SFU Café Scientifique

Friday, January 23, 2015

Speaker: Dr. Tim Swartz, Professor, Department of Statistics & Actuarial Science, Simon Fraser University

Research interest: My general interest is statistical computing. Most of my work attempts to take advantage of the power of modern computing machinery to solve real statistical problems. The area where I have devoted a lot of attention is the integration problem arising in Bayesian applications. Lately, my interest in statistics in sport has grown to consume a fair bit of my time, perhaps too much of my time.

Topic: Sports Analytics

Sports analytics has become an important area of emphasis for professional sports teams in their attempt to obtain a competitive edge. The discussion will revolve around recent work that Dr. Swartz has conducted in sports analytics, such as the optimal time to pull a goalie in hockey, insights into home team advantage, and the value of draft positions in Major League Soccer.

Café Scientifique is a series of informal discussions connecting research to important issues of interest to the community.  Enjoy light snacks and refreshments while engaging with cutting-edge, award-winning researchers from Simon Fraser University’s (SFU) Faculty of Science.

Using PROC SGPLOT to Produce Box Plots with Contrasting Colours in SAS

I previously explained the statistics behind box plots and how to produce them in R in a very detailed tutorial.  I also illustrated how to produce side-by-side box plots with contrasting patterns in R.

Here is an example of how to make box plots in SAS using the VBOX statement in PROC SGPLOT.  I modified the built-in data set SASHELP.CLASS to generate one that suits my needs.

The PROC TEMPLATE statement specifies the contrasting colours to be used for different classes.  I also include code for exporting the result into a PDF file using ODS PDF.  (I used varying shades of grey to allow the contrast to be shown when printed in black and white.)
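The approach described above might be sketched as follows.  This is only a minimal illustration, not the code from the full post: the style name "styles.grey_boxes", the two grey shades, and the output file name "boxplots.pdf" are all assumptions, and it uses SASHELP.CLASS unmodified rather than the modified data set mentioned above.

```sas
/* Hypothetical style: override the fill colours of the first two
   group classes with contrasting shades of grey */
proc template;
     define style styles.grey_boxes;
          parent = styles.printer;
          style GraphData1 from GraphData1 / color = cxFFFFFF;   /* white fill */
          style GraphData2 from GraphData2 / color = cx999999;   /* grey fill */
     end;
run;

/* Send the plot to a PDF file (hypothetical file name) using the new style */
ods pdf file = 'boxplots.pdf' style = styles.grey_boxes;

/* One box per sex; GROUP= makes each box take its own GraphDataN colour */
proc sgplot data = sashelp.class;
     vbox height / category = sex group = sex;
run;

ods pdf close;
```

With more than two classes, further GraphData3, GraphData4, ... elements would need their own shades.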



Getting a List of the Variable Names of a SAS Data Set

Update on 2017-04-15: I have written a new blog post that obtains the names, types, formats, lengths, and labels of variables in a SAS data set.  This uses PROC SQL instead of PROC CONTENTS.  I thank Robin for suggesting this topic in the comments and Jerry Leonard from SAS Technical Support for teaching me this method.
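For readers curious about the PROC SQL route, one common way to get this information is to query the DICTIONARY.COLUMNS table.  The sketch below is an assumption about how such a query could look and may differ from the method in the newer post; the output table name "variable_info" is hypothetical.

```sas
/* Query SAS's dictionary table of column metadata.
   LIBNAME and MEMNAME values are stored in upper case. */
proc sql;
     create table variable_info as
     select name, type, format, length, label
     from dictionary.columns
     where libname = 'SASHELP'
       and memname = 'CLASS';
quit;
```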


Getting a list of the variable names of a data set is a fairly common and useful task in data analysis and manipulation, but there is actually no procedure or function that will do this directly in SAS.  After some diligent searching on the Internet, I found a few tricks within PROC CONTENTS that accomplish this task.

Here is an example involving the built-in data set SASHELP.CLASS.  The ultimate goal is to create a new data set called “variable_names” that contains the variable names in one column.

The results of PROC CONTENTS can be exported into a new data set.  I will call this data set “data_info”, and I will keep only the 2 variables that we need: “name” and “varnum”.
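A minimal sketch of these steps is shown below.  It is an illustration under the names used above ("data_info" and "variable_names"), not necessarily the exact code from the full post; sorting by "varnum" is my assumption for restoring the variables' original order.

```sas
/* Export the metadata of SASHELP.CLASS into data_info,
   keeping only the variable names and their positions */
proc contents
     data = sashelp.class
     noprint
     out = data_info (keep = name varnum);
run;

/* Restore the order in which the variables appear in SASHELP.CLASS */
proc sort data = data_info;
     by varnum;
run;

/* Keep only the column of variable names */
data variable_names;
     set data_info;
     keep name;
run;
```

The NOPRINT option suppresses the usual printed report, so only the output data set is created.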


Getting All Duplicates of a SAS Data Set


A common task in data manipulation is to obtain all observations that appear multiple times in a data set – in other words, to obtain the duplicates.  It turns out that there is no procedure or function that will directly provide the duplicates of a data set in SAS*.

*Update: As Fareeza Khurshed kindly commented, the NOUNIQUEKEY option in PROC SORT is available in SAS 9.3+ to directly obtain duplicates and unique observations.  I have written a new blog post to illustrate her solution.
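A short sketch of the NOUNIQUEKEY approach follows.  The test data set "class_dup" and the output names "all_duplicates" and "unique_obs" are hypothetical; I build the duplicates by appending a second copy of the female students to SASHELP.CLASS.

```sas
/* Hypothetical test data: every female student's name appears twice */
data class_dup;
     set sashelp.class
         sashelp.class (where = (sex = 'F'));
run;

/* NOUNIQUEKEY keeps every observation whose BY value is repeated;
   UNIQUEOUT= collects the observations whose BY value appears only once */
proc sort data = class_dup
          nouniquekey
          out = all_duplicates
          uniqueout = unique_obs;
     by name;
run;
```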

The Wrong Way to Obtain Duplicates in SAS

You may think that PROC SORT can accomplish this task with the nodupkey and the dupout options.  However, the output data set from such a procedure does not have the first of each set of duplicates.  Here is an example.
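The sketch below illustrates the problem under some assumptions of my own: the test data set "class_dup" and the output names "class_nodup" and "class_dupout" are hypothetical, and the duplicates are created by appending a second copy of the female students to SASHELP.CLASS.

```sas
/* Hypothetical test data: every female student's name appears twice */
data class_dup;
     set sashelp.class
         sashelp.class (where = (sex = 'F'));
run;

/* NODUPKEY keeps the first observation of each NAME in OUT=
   and sends only the later repeats to DUPOUT= */
proc sort data = class_dup
          nodupkey
          out = class_nodup
          dupout = class_dupout;
     by name;
run;
```

Here, "class_dupout" holds just one observation for each repeated name, because the first member of each pair stays in "class_nodup".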
