## Sorting correlation coefficients by their magnitudes in a SAS macro

#### Theoretical Background

Many statisticians and data scientists use the correlation coefficient to study the relationship between 2 variables.  For 2 random variables, $X$ and $Y$, the correlation coefficient between them is defined as their covariance scaled by the product of their standard deviations.  Algebraically, this can be expressed as

$\rho_{X, Y} = \frac{Cov(X, Y)}{\sigma_X \sigma_Y} = \frac{E[(X - \mu_X)(Y - \mu_Y)]}{\sigma_X \sigma_Y}$.

In real life, you can never know what the true correlation coefficient is, but you can estimate it from data.  The most common estimator for $\rho$ is the Pearson correlation coefficient, which is defined as the sample covariance between $X$ and $Y$ divided by the product of their sample standard deviations.  Since there is a common factor of

$\frac{1}{n - 1}$

in the numerator and the denominator, they cancel out each other, so the formula simplifies to

$r_P = \frac{\sum_{i = 1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i = 1}^{n}(x_i - \bar{x})^2 \sum_{i = 1}^{n}(y_i - \bar{y})^2}}$.

In predictive modelling, you may want to find the covariates that are most correlated with the response variable before building a regression model.  You can do this by

1. computing the correlation coefficients
2. obtaining their absolute values
3. sorting them by their absolute values.