ROC, or Receiver Operating Characteristic, analysis is used to examine the performance of two or more diagnostic tests over a range of decision levels (medical decision points). Performance is the test's ability to correctly identify normal and abnormal (diseased) cases. Individual diagnostic tests can be evaluated further using ROC curve, and individual decision levels using Qualitative (Sensitivity / Specificity).
The requirements of the test are:
Data in existing Excel worksheets can be used, and should be arranged in the List dataset layout. The dataset must contain a nominal-scale variable indicating the true state of each case, positive or negative, and two to six continuous-scale variables containing the observations of the diagnostic tests.
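The layout described above can be pictured as one column for the true state and one column per diagnostic test, with a row per case. A minimal sketch (the column names and values are hypothetical, not part of any actual dataset):

```python
# Hypothetical example of the List dataset layout: a nominal true-state
# column plus one continuous column for each diagnostic test (2 to 6 tests).
header = ("True state", "Test A", "Test B")
rows = [
    ("Positive", 132.0, 5.1),
    ("Negative",  88.0, 2.3),
    ("Positive", 150.0, 4.8),
    ("Negative",  95.0, 2.9),
]
```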
When entering new data we recommend using New Dataset to create a new test performance dataset.
To start the test:
Excel 97, 2000, 2002 & 2003:
Select any cell in the range containing the dataset to analyse, then click Analyse on the Analyse-it toolbar, click Test performance then click Compare ROC curves.
The report shows the number of observations analysed, how many missing values were listwise excluded, and the number of normal and abnormal cases.
The area under the curve (AUC) is a measure of the ability of the diagnostic test to correctly identify cases. Diagnostic tests with higher AUCs are generally better. The AUC should always be above 0.5; otherwise the test is no better at diagnosing than chance (guessing the diagnosis).
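A useful way to understand the AUC is as the probability that a randomly chosen abnormal case scores higher on the test than a randomly chosen normal case (with ties counting half). A minimal sketch of that calculation, assuming higher test values indicate disease:

```python
def auc(pos_scores, neg_scores):
    """AUC as the probability that a random positive case outscores a
    random negative case; ties count half. An AUC of 0.5 is chance."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))
```

For example, a test whose positive cases always score above its negative cases has an AUC of 1.0, while complete overlap gives 0.5.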
When multiple diagnostic tests are compared, each pair of tests is compared to determine whether one test is significantly better than the other. The difference between the AUCs is shown, and a hypothesis test is used to decide whether they are significantly different. A significant p-value implies that the test with the higher AUC (see the table above) diagnoses significantly better than the other.
METHOD The DeLong, DeLong & Clarke-Pearson method is used to compare the curves.
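The DeLong, DeLong & Clarke-Pearson comparison works from the placement values behind each AUC: each case's placement value is the fraction of cases in the other group it outscores, and the covariance of the two tests' placement values (the tests are measured on the same cases) gives the variance of the AUC difference. A simplified sketch for two paired tests, not the add-in's implementation:

```python
import math

def _placements(pos, neg):
    """Placement values for one test: for each positive case, the fraction
    of negatives it outscores (ties count half), and vice versa."""
    m, n = len(pos), len(neg)
    psi = lambda x, y: 1.0 if x > y else (0.5 if x == y else 0.0)
    v10 = [sum(psi(x, y) for y in neg) / n for x in pos]
    v01 = [sum(psi(x, y) for x in pos) / m for y in neg]
    return v10, v01

def delong_test(pos, neg):
    """Compare two paired ROC curves (DeLong et al., 1988 approach).
    pos/neg: lists of (score_test1, score_test2) tuples for the abnormal
    and normal cases. Returns (auc1, auc2, z, two-sided p)."""
    m, n = len(pos), len(neg)
    v10, v01, auc = [], [], []
    for k in (0, 1):
        a, b = _placements([c[k] for c in pos], [c[k] for c in neg])
        v10.append(a); v01.append(b); auc.append(sum(a) / m)
    def cov(u, v):
        mu, mv = sum(u) / len(u), sum(v) / len(v)
        return sum((ui - mu) * (vi - mv) for ui, vi in zip(u, v)) / (len(u) - 1)
    # variance of the AUC difference from the placement-value covariances
    var = 0.0
    for vs, size in ((v10, m), (v01, n)):
        var += (cov(vs[0], vs[0]) + cov(vs[1], vs[1]) - 2 * cov(vs[0], vs[1])) / size
    z = (auc[0] - auc[1]) / math.sqrt(var)
    p = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return auc[0], auc[1], z, p
```

Because the same cases feed both tests, the covariance term reduces the variance of the difference, making the paired comparison more powerful than comparing two independent AUCs.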
The ROC plot (see below) shows the false positive rate (1 - specificity), the probability of incorrectly diagnosing a negative case as positive, on the X axis, against the true positive rate (sensitivity), the probability of correctly diagnosing a positive case, on the Y axis, across all decision levels of the diagnostic tests. Ideally the curve for a diagnostic test climbs quickly toward the top-left, meaning the test identifies positive cases without incorrectly diagnosing negative cases. The diagonal grey line is a guideline for a test that is no better than chance at correctly identifying cases.
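The points on the curve come from sweeping the decision level across every observed value and recording sensitivity and 1 - specificity at each one. A minimal sketch, assuming higher test values indicate disease:

```python
def roc_points(pos, neg):
    """(false positive rate, true positive rate) at every decision level,
    sweeping from the strictest level (nothing called positive) down."""
    thresholds = sorted(set(pos) | set(neg), reverse=True)
    points = [(0.0, 0.0)]
    for t in thresholds:
        tpr = sum(1 for x in pos if x >= t) / len(pos)  # sensitivity
        fpr = sum(1 for y in neg if y >= t) / len(neg)  # 1 - specificity
        points.append((fpr, tpr))
    return points
```

A test with perfect separation traces the left and top edges of the plot; one with no discrimination tracks the diagonal grey line.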
As values on the X axis increase, the chance of incorrectly diagnosing negative cases (as positive) increases, though whether this matters has to be evaluated in terms of the cost of missing a positive case versus the cost of treating a negative case (see below for how to factor in misdiagnosis costs).
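The trade-off above can be made concrete by weighting the two kinds of error. A minimal sketch of an expected-cost calculation at one decision level; the cost figures and prevalence are hypothetical and would come from the clinical context:

```python
def expected_cost(sensitivity, specificity, prevalence, cost_fn, cost_fp):
    """Expected misdiagnosis cost per case at one decision level.
    False negatives are missed positive cases; false positives are
    negative cases treated unnecessarily."""
    fn_rate = (1 - sensitivity) * prevalence        # missed positives
    fp_rate = (1 - specificity) * (1 - prevalence)  # over-treated negatives
    return fn_rate * cost_fn + fp_rate * cost_fp
```

Evaluating this at each candidate decision level, with costs reflecting how much worse it is to miss a positive case than to treat a negative one, identifies the level with the lowest expected cost.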