My Own R Function and Script for Simple Linear Regression – An Illustration with Exponential Decay of DDT in Trout

Here is the function that I wrote for doing simple linear regression, as alluded to in my blog post about simple linear regression on log-transformed data on the decay of DDT concentration in trout in Lake Michigan.  My goal was to replicate the 4 columns of the output from applying summary() to the output of lm().

To use this file and this script,

1) I saved this file as “simple linear regression.r”.

2) In the same folder, I saved a script called “DDT trout regression.r” that used this function to implement simple linear regression on the log-transformed DDT data.

3) I used setwd() to change the working directory to the folder containing the function and the script.

4) I made sure “DDT trout regression.r” used the source() function to call my user-defined function for simple linear regression.

5) I ran “DDT trout regression.r”.

##### A function for simple linear regression
##### Written by Eric Cai - The Chemical Statistician
simple.linear.regression = function(predictor, target)
{
     sample.size = length(target)

     mean.predictor = mean(predictor)
     mean.target = mean(target)

     sxx = sum((predictor - mean.predictor)^2)
     sxy = sum((predictor - mean.predictor)*(target - mean.target))

     beta1.hat = sxy/sxx
     beta0.hat = mean.target - beta1.hat*mean.predictor

     target.fitted = beta0.hat + beta1.hat*predictor
     target.residuals = target - target.fitted
     sum.squared.residuals = sum(target.residuals^2)

     degree.freedom = sample.size - 2
     mean.squared.error = sum.squared.residuals/degree.freedom
     standard.error = sqrt(mean.squared.error)

     beta0.hat.sample.variance = mean.squared.error*(1/sample.size + mean.predictor^2/sxx)
     beta0.hat.standard.error = sqrt(beta0.hat.sample.variance)

     beta1.hat.sample.variance = mean.squared.error/sxx
     beta1.hat.standard.error = sqrt(beta1.hat.sample.variance)

     t.beta0.hat = beta0.hat/beta0.hat.standard.error
     t.beta1.hat = beta1.hat/beta1.hat.standard.error

     p.value.beta0.hat = 2*pt(abs(t.beta0.hat), degree.freedom, lower.tail = F)
     p.value.beta1.hat = 2*pt(abs(t.beta1.hat), degree.freedom, lower.tail = F)

     # aggregate inferential results into a table, similar to output of summary(lm(y~x))
     beta0.inference = c(beta0.hat, beta0.hat.standard.error, t.beta0.hat, p.value.beta0.hat)
     beta1.inference = c(beta1.hat, beta1.hat.standard.error, t.beta1.hat, p.value.beta1.hat)

     # combine inferential results of beta0 and beta1 into a matrix
     regression.inference = rbind(beta0.inference, beta1.inference)

     # provide names to columns and rows for table of inferential results
     colnames(regression.inference) = c('Estimated Coefficient', 'Standard Error', 't-statistic', 'p-value')
     rownames(regression.inference) = c('beta0', 'beta1')

     # display table
     regression.inference
}

 

Here is “DDT in trout regression.r”.  In this script, I entered the data and conducted simple linear regression.

##### Linear Regression of DDT Decay in Michigan Trout
##### Data from "Elements of Environmental Chemistry" by Ronald A. Hites, Page 49
##### Script written by Eric Cai - The Chemical Statistician

# clear all variables in the workspace
rm(list=ls(all=TRUE))

# set working directory 
setwd('INSERT YOUR DIRECTORY PATH HERE')
# call my user-defined function, 'simple linear regression.r'
source('simple linear regression.r')

year = c(1970:1982, seq(1984, 1992, by=2), 1995, 1998) 
DDT = c(19.19, 13.00, 11.31, 9.96, 8.42, 7.50, 5.65, 6.34, 4.58, 6.91, 4.74, 3.22, 2.74, 2.22, 1.10, 1.44, 1.39, 1.16, 0.98, 0.85)

DDT.data = cbind(year, DDT)
colnames(DDT.data) = c('Time (Year)', 'DDT Concentration')
DDT.data

# let's plot the untransformed DDT concentration data with respect to time 
png("INSERT YOUR DIRECTORY HERE/DDT in trout.png")
plot(year, DDT, xlab = 'Year', ylab = 'DDT Concentration in Trout (ppm)', main = 'DDT Concentration in Trout in Lake Michigan')
dev.off()

# let's conduct simple linear regression on the untransformed data
DDT.regression = simple.linear.regression(year, DDT)
DDT.regression

# now transform the data using natural logarithm to linearize the DDT-time relationship 
ln.DDT = log(DDT)

# let's plot the log-transformed DDT concentration data with respect to time 
 png("INSERT YOUR DIRECTORY HERE/ln(DDT) in trout.png")
plot(year, ln.DDT, xlab = 'Year', ylab = 'Natural Logarithm of DDT Concentration in Trout in Lake', main = 'Natural Logarithm of DDT Concentration in Trout in Lake Michigan')
dev.off()

# conduct simple linear regression on the log-transformed data 
ln.DDT.regression = simple.linear.regression(year, ln.DDT)
ln.DDT.regression

One Response to My Own R Function and Script for Simple Linear Regression – An Illustration with Exponential Decay of DDT in Trout

Your thoughtful comments are much appreciated!

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: