## Analytical Chemistry Lesson of the Day – Specificity in Method Validation and Quality Assurance

In pharmaceutical chemistry, one of the requirements for method validation is specificity, the ability of an analytical method to distinguish the analyte from other chemicals in the sample.  The specificity of the method may be assessed by deliberately adding impurities into a sample containing the analyte and testing how well the method can identify the analyte.

Statistics is an important tool in analytical chemistry, and, ideally, there is no overlap in the vocabulary that is used between the 2 fields.  Unfortunately, the above definition of specificity is different from that in statistics.  In a previous Machine Learning and Applied Statistics Lesson of the Day, I introduced the concepts of sensitivity and specificity in binary classification.  In the context of assessing the predictive accuracy of a binary classifier, its specificity is the proportion of truly negative cases among the classified negative cases.

## Machine Learning and Applied Statistics Lesson of the Day – Positive Predictive Value and Negative Predictive Value

For a binary classifier,

• its positive predictive value (PPV) is the proportion of positively classified cases that were truly positive.

$\text{PPV} = \text{(Number of True Positives)} \ \div \ \text{(Number of True Positives} \ + \ \text{Number of False Positives)}$

• its negative predictive value (NPV) is the proportion of negatively classified cases that were truly negative.

$\text{NPV} = \text{(Number of True Negatives)} \ \div \ \text{(Number of True Negatives} \ + \ \text{Number of False Negatives)}$

In a later Statistics and Machine Learning Lesson of the Day, I will discuss the differences between PPV/NPV and sensitivity/specificity in assessing the predictive accuracy of a binary classifier.

(Recall that sensitivity and specificity can also be used to evaluate the performance of a binary classifier.  Based on those 2 statistics, we can construct receiver operating characteristic (ROC) curves to assess the predictive accuracy of the classifier, and a minimum standard for a good ROC curve is being better than the line of no discrimination.)

## Machine Learning and Applied Statistics Lesson of the Day – The Line of No Discrimination in ROC Curves

After training a binary classifier, calculating its various values of sensitivity and specificity, and constructing its receiver operating characteristic (ROC) curve, we can use the ROC curve to assess the predictive accuracy of the classifier.

A minimum standard for a good ROC curve is being better than the line of no discrimination.  On a plot of

$\text{Sensitivity}$

on the vertical axis and

$1 - \text{Specificity}$

on the horizontal axis, the line of no discrimination is the line that passes through the points

$(\text{Sensitivity} = 0, 1 - \text{Specificity} = 0)$

and

$(\text{Sensitivity} = 1, 1 - \text{Specificity} = 1)$.

In other words, the line of discrimination is the diagonal line that runs from the bottom left to the top right.  This line shows the performance of a binary classifier that predicts the class of the target variable purely by the outcome of a Bernoulli random variable with 0.5 as its probability of attaining the “Success” category.  Such a classifier does not use any of the predictors to make the prediction; instead, its predictions are based entirely on random guessing, with the probabilities of predicting the “Success” class and the “Failure” class being equal.

If we did not have any predictors, then we can rely on only random guessing, and a random variable with the distribution $\text{Bernoulli}(0.5)$ is the best that we can use for such guessing.  If we do have predictors, then we aim to develop a model (i.e. the binary classifier) that uses the information from the predictors to make predictions that are better than random guessing.  Thus, a minimum standard of a binary classifier is having an ROC curve that is higher than the line of no discrimination.  (By “higher“, I mean that, for a given value of $1 - \text{Specificity}$, the $\text{Sensitivity}$ of the binary classifier is higher than the $\text{Sensitivity}$ of the line of no discrimination.)

## Machine Learning and Applied Statistics Lesson of the Day – How to Construct Receiver Operating Characteristic Curves

A receiver operating characteristic (ROC) curve is a 2-dimensional plot of the $\text{Sensitivity}$ (the true positive rate) versus $1 - \text{Specificity}$ (1 minus the true negative rate) of a binary classifier while varying its discrimination threshold.  In statistics and machine learning, a basic and popular tool for binary classification is logistic regression, and an ROC curve is a useful way to assess the predictive accuracy of the logistic regression model.

To illustrate with an example, let’s consider the Bernoulli response variable $Y$ and the covariates $X_1, X_2, ..., X_p$.  A logistic regression model takes the covariates as inputs and returns $P(Y = 1)$.  You as the user of the model must decide above which value of $P(Y = 1)$ you will predict that $Y = 1$; this value is the discrimination threshold.  A common threshold is $P(Y = 1) = 0.5$.

Once you finish fitting the model with a training set, you can construct an ROC curve by following these steps below:

1. Set a discrimination threshold.
2. Use the covariates to predict $Y$ for each observation in a validation set.
3. Since you have the actual response values in the validation set, you can then calculate the sensitivity and specificity for your logistic regression model at that threshold.
4. Repeat Steps 1-3 with a new threshold.
5. Plot the values of $\text{Sensitivity}$ versus $1 - \text{Specificity}$ for all thresholds.  The result is your ROC curve.

The use of a validation set to assess the predictive accuracy of a model is called validation, and it is a good practice for supervised learning.  If you have another fresh data set, it is also good practice to use that as a test set to assess the predictive accuracy of your model.

Note that you can perform Steps 2-5 for the training set, too – this is often done in statistics when you don’t have many data to work with, and the best that you can do is to assess the predictive accuracy of your model on the data set that you used to fit the model.

## Machine Learning and Applied Statistics Lesson of the Day – Sensitivity and Specificity

To evaluate the predictive accuracy of a binary classifier, two useful (but imperfect) criteria are sensitivity and specificity.

Sensitivity is the proportion of truly positives cases that were classified as positive; thus, it is a measure of how well your classifier identifies positive cases.  It is also known as the true positive rate.  Formally,

$\text{Sensitivity} = \text{(Number of True Positives)} \ \div \ \text{(Number of True Positives + Number of False Negatives)}$

Specificity is the proportion of truly negative cases that were classified as negative; thus, it is a measure of how well your classifier identifies negative cases.  It is also known as the true negative rate.  Formally,

$\text{Specificity} = \text{(Number of True Negatives)} \ \div \ \text{(Number of True Negatives + Number of False Positives)}$