Beware of accidental replacement of data sets with PROC SORT in SAS

PROC SORT is a very useful procedure in SAS.  Not only can you sort a data set on one or more variables with it, but you can sort each variable in ascending or descending order, and you can use it to obtain unique observations or duplicated observations.  However, there is a feature of PROC SORT that can be dangerous and deserves emphasis: If you are not careful, you can accidentally replace an existing, valuable data set.

Suppose that you wish to use PROC SORT to get only the duplicated records of a data set.  Here is an example of how to do it.

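data heights;
     /* Name is a character variable; Age and Height are numeric */
     input Name $ Age Height;
     datalines;
Amy 15 174
Amy 16 177
Bob 14 172
Cam 13 163
Cam 17 181
;
run;

/* NOUNIQUEKEY keeps only the observations whose BY value occurs more than once. */
/* No OUT= data set is given, so the result overwrites HEIGHTS.                  */
proc sort
     data = heights
          nouniquekey;
     by Name;
run;

proc print
     data = heights;
run;

Obs Name Age Height
1 Amy 15 174
2 Amy 16 177
3 Cam 13 163
4 Cam 17 181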

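Note that the record for “Bob” is gone from HEIGHTS: it was a unique observation and, thus, was removed by the above PROC SORT step.  Because no OUT= data set was specified, PROC SORT wrote its result back to HEIGHTS and replaced the original data.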

If the original data set is valuable, then this loss can be very damaging, especially if it took a lot of work and time to obtain the original data set.  This shows the danger of accidental replacement of a data set in SAS when using PROC SORT.
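One way to guard against this is to give PROC SORT an OUT= data set, so that the result is written somewhere other than the input.  The following is a minimal sketch, assuming that the duplicates are being extracted with the NOUNIQUEKEY option.  The name HEIGHTS_DUPS is hypothetical and chosen only for illustration.

```sas
/* Sketch: send the duplicated records to a separate data set.       */
/* HEIGHTS_DUPS is a hypothetical name; HEIGHTS is left unchanged.   */
proc sort
     data = heights
     out  = heights_dups
          nouniquekey;
     by Name;
run;

/* Print the new data set, not the original */
proc print
     data = heights_dups;
run;
```

With OUT= specified, HEIGHTS keeps all 5 records, including the one for “Bob”, and HEIGHTS_DUPS holds the 4 duplicated records.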



Layne Newhouse on representing neural networks – The Central Equilibrium – Episode 4

I am excited to present the first of a multi-episode series on neural networks on my talk show, “The Central Equilibrium”.  My guest in this series is Layne Newhouse, and he talked about how to represent neural networks. We talked about the biological motivations behind neural networks, how to represent them in diagrams and mathematical equations, and a few of the common activation functions for neural networks.

Check it out!

My Silver Medal from the Canadian Society for Chemistry – Reflections After 10 Years

In June, 2008, I received an email from Dr. Ken MacFarlane, then the Undergraduate Advisor in the Department of Chemistry at Simon Fraser University (SFU).  He wrote to inform me that I had won the Canadian Society for Chemistry’s Silver Medal, given to the top undergraduate student in chemistry entering their final year of study at each Canadian university.

I won the Canadian Society for Chemistry’s Silver Medal for being the top fourth-year student in the Department of Chemistry at Simon Fraser University in 2008.

Later in November of that year, I received this medal at a dinner banquet, which honoured all of the award winners from the universities and colleges in the Vancouver Section of the Chemical Institute of Canada (CIC).  (Awards were given to the top students in their second year, third year, and fourth year of study.)  Here is a photo of me receiving my medal from Dr. Daniel Leznoff; he was then the Chair of the Vancouver Section of the CIC and a professor specializing in inorganic chemistry at SFU.

Eric getting medal from Dr. Leznoff

I received the Canadian Society for Chemistry’s Silver Medal from Dr. Daniel Leznoff at a dinner banquet in November, 2008.

The CIC publishes a magazine called Canadian Chemical News, and it covered the above award banquet in January, 2009.  You can find a photo of the award winners from that night on Page 29.

Dr. Cameron Forde succeeded Dr. MacFarlane as our Undergraduate Advisor in 2009.  In an email to me in October, 2009, Dr. Forde wrote that 100-120 students were eligible for the CSC’s Silver Medal in our department in 2008.

This is one of the greatest achievements of my life.  I am even more excited about it today than I was at that banquet, because I now have 10 years of perspective about how this medal has benefited my career.  In this retrospective article, I write to share my reflections about the impact that this medal has had on my professional trajectory – which has been unusual, to say the least.


Include a professional photo of yourself in business attire in your LinkedIn profile

One of the easiest ways to polish your LinkedIn profile is to post a photo of yourself in business attire.  I strongly encourage every LinkedIn user to spend several hours taking at least 100 such photos.  Ask a friend or family member to take these photographs if they are willing.  Alternatively, you can hire a professional photographer.  After this session, you will have a large stock of photos that you can use for various purposes.

Position yourself with many backgrounds, and take those photos from many angles.  You should always look straight into the camera, smile, and maintain an upright posture.  Here is what my LinkedIn profile looks like.

When you build a professional network and an online brand, people need to know who you are and what you look like.  When they meet you in person, your photo allows them to visually connect you with the online profile that they saw on LinkedIn.

This is especially crucial for people with common names; showing your photo allows others to easily distinguish you from others who share your name.  It turns out that there is another person named Eric Cai who works as a data scientist!  Not only do we share the same name, but we also have the same profession.  Without photographs, it would be quite difficult to distinguish between us in a professional setting.


Common Mistakes

I recently spoke at the Canadian Statistics Student Conference and at the University of Toronto’s Biostatistics Research Day, and I talked about this with students at both events.  Here are the common mistakes that I see in LinkedIn profile photos, and I urge you to avoid all of them.

  • Not having a profile photo
  • Not wearing professional attire
  • Not smiling
  • Covering your eyes with sunglasses
  • Looking away from the camera

Remember: This photo is for your professional branding, and your future employers or clients will look at it.  It is not for Facebook, Tinder, Grindr, or other social networks that are personal in nature.  Do not try to be cute, funny, sexy, or controversial – be professional.

Highlighting cells to quickly view Average, Count, and Sum in Excel

I recently needed to check my answers after some data analysis in Alteryx.  I computed many averages using a formula in Alteryx, and I wanted to check those results by calculating the average for a few randomly selected rows.  I did this by invoking a helpful tool in Microsoft Excel.  I will illustrate this functionality with some random data.

In Column E, I used a formula to calculate the average of the 3 populations in Columns B, C, and D.  To manually check that the formula is correct, I highlighted the cells in those 3 columns for ID #125.  In the status bar at the bottom right, Excel displays the average; it’s difficult to see in the picture below, but Excel confirms that the average is 707,154.

Excel average

Whenever you highlight a range of cells containing numeric data, Excel will provide the average, count, and sum of the selected cells.  I did not know about this functionality when I first began working as a statistician, and I am very glad that I did learn it eventually – it is very useful for checking answers by analyzing a few randomly selected rows in Excel!

Career-Advice Panel at Biostatistics Research Day – SORA-TABA Workshop – Dalla Lana School of Public Health – University of Toronto – Friday, June 15, 2018

I will speak on the career-advice panel at Biostatistics Research Day, an event jointly hosted by

  • the Dalla Lana School of Public Health at the University of Toronto
  • the Southern Ontario Regional Association (SORA) of the Statistical Society of Canada (SSC)
  • the Toronto Applied Biostatistics Association (TABA)

This one-day conference will be held on Friday, June 15, from 8:00 am to 5:00 pm.  The location is HS610 in the Dalla Lana School of Public Health (155 College Street in Toronto, Ontario).  This room is a large auditorium on the 6th floor.

The career panel will be held from 12:15 pm to 2:00 pm.  If you will be at this conference, please feel free to come and say “Hello”!