## Sorting correlation coefficients by their magnitudes in a SAS macro

March 21, 2017

#### Theoretical Background

Many statisticians and data scientists use the **correlation coefficient** to study the linear relationship between two variables. For two random variables, $X$ and $Y$, the correlation coefficient between them, $\rho_{XY}$, is defined as their covariance divided by the product of their standard deviations. Algebraically, this can be expressed as

$$\rho_{XY} = \frac{\operatorname{Cov}(X, Y)}{\sigma_X \sigma_Y}.$$
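To make the definition concrete, here is a minimal Python sketch that evaluates the population correlation coefficient for a small discrete joint distribution. The post itself works in SAS; Python is used here only for illustration. The function name `population_correlation` and the probabilities are hypothetical. Because the code takes expectations with respect to known probabilities, it computes the true (population) quantity, not an estimate from data.

```python
import math

def population_correlation(pmf):
    """Correlation of a discrete joint distribution (illustrative helper).

    pmf: dict mapping (x, y) pairs to probabilities that sum to 1.
    """
    # Expected values E[X] and E[Y]
    mean_x = sum(p * x for (x, _), p in pmf.items())
    mean_y = sum(p * y for (_, y), p in pmf.items())
    # Cov(X, Y) = E[(X - E[X]) (Y - E[Y])]
    cov = sum(p * (x - mean_x) * (y - mean_y) for (x, y), p in pmf.items())
    # Var(X) = E[(X - E[X])^2], and likewise for Y
    var_x = sum(p * (x - mean_x) ** 2 for (x, _), p in pmf.items())
    var_y = sum(p * (y - mean_y) ** 2 for (_, y), p in pmf.items())
    # Divide the covariance by the product of the standard deviations
    return cov / (math.sqrt(var_x) * math.sqrt(var_y))

# Hypothetical joint distribution in which X and Y tend to move together
pmf = {(0, 0): 0.4, (1, 1): 0.4, (0, 1): 0.1, (1, 0): 0.1}
print(round(population_correlation(pmf), 4))  # prints 0.6
```

Here $\operatorname{Cov}(X, Y) = 0.15$ and both standard deviations equal $0.5$, so the correlation is $0.15 / 0.25 = 0.6$.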

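In practice, you can never know the true correlation coefficient, but you can estimate it from data. The most common estimator of $\rho_{XY}$ is the **Pearson correlation coefficient**, $r$, which is defined as the sample covariance between $X$ and $Y$ divided by the product of their sample standard deviations. Given $n$ paired observations $(X_1, Y_1), \ldots, (X_n, Y_n)$ with sample means $\bar{X}$ and $\bar{Y}$, there is a common factor of

$$\frac{1}{n-1}$$

in the numerator and the denominator (in the denominator, it comes from the factor $\sqrt{1/(n-1)}$ in each sample standard deviation). These factors cancel each other out, so the formula simplifies to

$$r = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum_{i=1}^{n} (X_i - \bar{X})^2}\,\sqrt{\sum_{i=1}^{n} (Y_i - \bar{Y})^2}}.$$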
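As a check on the cancellation argument, the following Python sketch computes the Pearson correlation coefficient twice: once with the simplified formula, and once as the sample covariance divided by the product of the sample standard deviations, keeping the $1/(n-1)$ factors. The function names and the data are hypothetical; this is an illustration, not the SAS code the post builds toward.

```python
import math

def pearson_r(xs, ys):
    """Pearson r via the simplified formula (the 1/(n-1) factors cancelled)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    # Numerator: sum of cross-products of deviations from the means
    sxy = sum(a * b for a, b in zip(dx, dy))
    # Denominator: square roots of the sums of squared deviations
    sxx = sum(a * a for a in dx)
    syy = sum(b * b for b in dy)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant variable")
    return sxy / (math.sqrt(sxx) * math.sqrt(syy))

def pearson_r_unsimplified(xs, ys):
    """Same quantity, keeping the 1/(n-1) factors: sample cov / (s_x * s_y)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)
    s_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / (n - 1))
    s_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / (n - 1))
    return cov / (s_x * s_y)

# Hypothetical sample data
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
print(pearson_r(xs, ys))
print(pearson_r_unsimplified(xs, ys))
```

Both functions should agree up to floating-point rounding; for this data, $r = 6/\sqrt{60} \approx 0.775$.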

In predictive modelling, you may want to find the covariates that are most correlated with the response variable before building a regression model. You can do this by

- computing the correlation coefficient between each covariate and the response
- obtaining their absolute values
- sorting the covariates by these absolute values in descending order.
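The post implements these steps as a SAS macro; the Python sketch below only illustrates the same logic. The helper `rank_by_abs_correlation`, the covariate names `x1`, `x2` and `x3`, and the data are all hypothetical.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    sxx = sum(a * a for a in dx)
    syy = sum(b * b for b in dy)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant variable")
    return sum(a * b for a, b in zip(dx, dy)) / (math.sqrt(sxx) * math.sqrt(syy))

def rank_by_abs_correlation(covariates, response):
    """Return (name, r) pairs sorted by |r|, largest first.

    covariates: dict mapping a covariate name to its list of values.
    response:   list of response values, aligned with each covariate.
    """
    # Step 1: correlation of each covariate with the response
    results = [(name, pearson_r(values, response))
               for name, values in covariates.items()]
    # Steps 2 and 3: sort by absolute value, in descending order
    return sorted(results, key=lambda item: abs(item[1]), reverse=True)

# Hypothetical data set: three covariates and one response
response = [1.0, 2.0, 3.0, 4.0, 5.0]
covariates = {
    "x1": [2.0, 1.0, 4.0, 3.0, 5.0],
    "x2": [5.0, 4.0, 3.0, 2.0, 1.0],
    "x3": [1.0, 3.0, 2.0, 3.0, 1.0],
}
for name, r in rank_by_abs_correlation(covariates, response):
    print(f"{name}: r = {r:+.3f}, |r| = {abs(r):.3f}")
```

Here `x2` has $r = -1$, yet it ranks first, ahead of `x1` ($r = 0.8$) and `x3` ($r = 0$). Sorting by absolute value treats strong negative and strong positive relationships alike, which is the point of the second step.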

## Recent Comments