← Organic and Inorganic Chemistry Lesson of the Day – Conformational Isomers (or Conformers)

Organic Chemistry Lesson of the Day – The 2 Conformational Isomers of Ethane →

Performing Logistic Regression in R and SAS

November 24, 2014 3 Comments

Introduction

My statistics education focused a lot on normal linear least-squares regression, and I was even told by a professor in an introductory statistics class that 95% of statistical consulting can be done with knowledge learned up to and including a course in linear regression. Unfortunately, that advice has turned out to vastly underestimate the variety and depth of problems that I have encountered in statistical consulting, and the emphasis on linear regression has not paid dividends in my statistics career so far. Wisdom from veteran statisticians and my own experience combine to suggest that logistic regression is actually much more commonly used in industry than linear regression. I have already started a series of short lessons on binary classification in my Statistics Lesson of the Day and Machine Learning Lesson of the Day. In this post, I will show how to perform logistic regression in both R and SAS. I will discuss how to interpret the results in a later post.

The Data Set

The data set that I will use is slightly modified from Michael Brannick’s web page that explains logistic regression. I copied and pasted the data from his web page into Excel, modified the data to create a new data set, then saved it as an Excel spreadsheet called heart attack.xlsx.

In R, I imported it using the “XLConnect” package.
In SAS, I imported it using PROC IMPORT.

This data set has 3 variables (I have renamed them for convenience in my R programming).

ha2 – Whether or not a patient had a second heart attack. If ha2 = 1, then the patient had a second heart attack; otherwise, if ha2 = 0, then the patient did not have a second heart attack. This is the response variable.
treatment – Whether or not the patient completed an anger control treatment program.
anxiety – A continuous variable that scores the patient’s anxiety level. A higher score denotes higher anxiety.

Read the rest of this post to get the full scripts and view the full outputs of this logistic regression model in both R and SAS!

R Script for Implementing Logistic Regression

##### Interpreting the Results of a Logistic Regression Model in R
##### By Eric Cai - The Chemical Statistician

# clear all variables in the workspace
rm(list=ls(all=TRUE))

library(XLConnect)
heart.attack <- readWorksheet(loadWorkbook("YOUR DIRECTORY PATH HERE/heart attack.xlsx"), sheet = 1)

# perform logistic regression and assign the output object to the variable "logistic.ha"
logistic.ha = glm(ha2 ~ treatment + anxiety, family = binomial, data = heart.attack)

# use the summary() function to view the results of the model
summary(logistic.ha)

R Output of Logistic Regression Model

Here is the output from summary(logistic.ha).

> summary(logistic.ha)

Call:
glm(formula = ha2 ~ treatment + anxiety, family = binomial, data = heart.attack)

Deviance Residuals: 
Min             1Q             Median        3Q          Max 
-2.06190        -0.51429       -0.02087      0.50417     2.11830 

Coefficients:
Estimate        Std.           Error       z value      Pr(>|z|) 
(Intercept)     -6.38342       2.50468     -2.549       0.01082 * 
treatment       -2.73309       1.00548     -2.718       0.00656 **
anxiety         0.13970        0.04819     2.899        0.00374 **
---
Signif. codes: 0 ‘***’    0.001 ‘**’    0.01 ‘*’    0.05 ‘.’    0.1 ‘ ’    1

(Dispersion parameter for binomial family taken to be 1)

Null deviance: 55.452 on 39 degrees of freedom
Residual deviance: 29.753 on 37 degrees of freedom
AIC: 35.753

Number of Fisher Scoring iterations: 5

SAS Script for Implementing Logistic Regression

Here is the SAS script for performing the same logistic regression analysis. The code at the beginning is useful for clearing the log, the output file and the results viewer.

/* 
Useful Options For Every SAS Program 
- With Some Tips Learned From Dr. Jerry Brunner
by Eric Cai - The Chemical Statistician
*/

dm 'cle log; cle out;';
ods html close; 
ods html;

dm 'odsresults; clear';
ods listing close;
ods listing;

options 
          noovp
          nodate
          linesize = 105
          formdlim = '-'
          pageno = min;

title 'Worcester Heart Attack Study';

* import the data;
proc import 
          datafile = "INSERT YOUR DIRECTORY PATH HERE\heart attack.xlsx" 
                     dbms = xlsx 
                     out = heart_attack_raw
                     replace;
run;


* create formats for the categorical variables;
proc format;
          value ha      1 = 'Yes'
                         0 = 'No';

          value treatment  1 = 'Yes'
                           0 = 'No';
run;


* add formats and labels to variables in the data set; 
data heart_attack;
           set heart_attack_raw;
           format        ha2 ha.
                         treatment treatment.;
           label         ha2 = '2nd Heart Attack'
                         treatment = 'Received Treatment for Anger'
                         anxiety = 'Anxiety Score';
run;


* export the logistic regression output to a PDF file;
ods graphics on;
ods pdf file = "INSERT YOUR DIRECTORY PATH HERE\SAS output - logistic regression of heart attacks.pdf";

proc logistic
           data = heart_attack;
           class treatment(ref = 'No') / param = ref;
           model ha2(event = 'Yes') = treatment anxiety / parmlabel;
run;

quit;
ods pdf close;

SAS Output of Logistic Regression Model

Here is the output as seen in the results viewer. As you can see in my above code, I also used ods graphics and ods pdf to export the output into a PDF file for easy viewing and reporting.

Worcester Heart Attack Study

The LOGISTIC Procedure

Model Information
Data Set	WORK.HEART_ATTACK
Response Variable	ha2	2nd Heart Attack
Number of Response Levels	2
Model	binary logit
Optimization Technique	Fisher’s scoring

Number of Observations Read	40
Number of Observations Used	40

Response Profile
Ordered Value	ha2	Total Frequency
1	No	20
2	Yes	20

Probability modeled is ha2=’Yes’.

Class Level Information
Class	Value	Design Variables
treatment	No	0
	Yes	1

Model Convergence Status
Convergence criterion (GCONV=1E-8) satisfied.

Model Fit Statistics
Criterion	Intercept Only	Intercept and Covariates
AIC	57.452	35.753
SC	59.141	40.820
-2 Log L	55.452	29.753

Testing Global Null Hypothesis: BETA=0
Test	Chi-Square	DF	Pr > ChiSq
Likelihood Ratio	25.6988	2	<.0001
Score	20.3024	2	<.0001
Wald	11.3897	2	0.0034

Type 3 Analysis of Effects
Effect	DF	Wald Chi-Square	Pr > ChiSq
treatment	1	7.3879	0.0066
anxiety	1	8.4033	0.0037

Analysis of Maximum Likelihood Estimates
Parameter		DF	Estimate	Standard Error	Wald Chi-Square	Pr > ChiSq	Label
Intercept		1	-6.3834	2.5048	6.4949	0.0108	Intercept: ha2=No
treatment	Yes	1	-2.7331	1.0055	7.3879	0.0066	Received Treatment for Anger Yes
anxiety		1	0.1397	0.0482	8.4033	0.0037	Anxiety Score

Odds Ratio Estimates
Effect	Point Estimate	95% Wald Confidence Limits
treatment Yes vs No	0.065	0.009	0.467
anxiety	1.150	1.046	1.264

Association of Predicted Probabilities and Observed Responses
Percent Concordant	89.5	Somers’ D	0.823
Percent Discordant	7.3	Gamma	0.850
Percent Tied	3.3	Tau-a	0.422
Pairs	400	c	0.911

Filed under Applied Statistics, Biostatistics, Categorical Data Analysis, R programming, Statistics, Tutorials Tagged with applied statistics, binary classification, deviance residual, deviance residuals, Excel, fisher scoring, fisher scoring algorithm, logistic regression, null deviance, ods graphics, ods pdf, proc import, proc logistic, R, R programming, regression, residual deviance, SAS, sas programming, statistics

3 Responses to Performing Logistic Regression in R and SAS

nishant analyst says:

November 24, 2014 at 11:04 pm

Reblogged this on nishant@analyst.

Reply
Rehan says:

February 23, 2018 at 2:08 pm

Can you please explain why the coefficients are different?

Reply
- Eric Cai - The Chemical Statistician says:
  
  February 23, 2018 at 2:31 pm
  
  Hi Rehan,
  
  I think that the differences are due to round-off error.
  
  Eric
  
  Reply

	Eric Cai - The Chemi… on Convert multiple variables bet…
	Jack on Convert multiple variables bet…
	Eric Cai - The Chemi… on Getting the names, types, form…
	Emily V on Getting the names, types, form…
	Lauren McClain on Convert multiple variables bet…
	Eric Cai - The Chemi… on Convert multiple variables bet…
	Lauren McClain on Convert multiple variables bet…
	Eric Cai - The Chemi… on Exploratory Data Analysis: Com…
	CK on Exploratory Data Analysis: Com…
	Eric Cai - The Chemi… on Video Tutorial: Breaking Down…

The Chemical Statistician

Performing Logistic Regression in R and SAS

Introduction

The Data Set

R Script for Implementing Logistic Regression

R Output of Logistic Regression Model

SAS Script for Implementing Logistic Regression

SAS Output of Logistic Regression Model

3 Responses to Performing Logistic Regression in R and SAS

Leave a reply to Eric Cai - The Chemical Statistician Cancel reply

Eric’s Twitter Feed (@chemstateric)

Recent Comments

Popular Topics

Recent Posts

About Eric

Blogs and Web Sites That I Like to Read

Archives

Categories

The Chemical Statistician

Performing Logistic Regression in R and SAS

Introduction

The Data Set

R Script for Implementing Logistic Regression

R Output of Logistic Regression Model

SAS Script for Implementing Logistic Regression

SAS Output of Logistic Regression Model

Share this:

Related

3 Responses to Performing Logistic Regression in R and SAS

Leave a reply to Eric Cai - The Chemical Statistician Cancel reply

Eric’s Twitter Feed (@chemstateric)

Recent Comments

Popular Topics

Recent Posts

About Eric

Blogs and Web Sites That I Like to Read

Archives

Categories