ROC, or Receiver Operating Characteristic, analysis is used to examine the performance of two or more diagnostic tests over a range of decision levels (medical decision points). Performance is a test's ability to correctly identify normal and abnormal (diseased) cases. An individual diagnostic test can be evaluated in more detail using ROC curve, and an individual decision level evaluated using Qualitative (Sensitivity / Specificity).
The requirements of the test are:
Data in existing Excel worksheets can be used and should be arranged in the List dataset layout. The dataset must contain a nominal scale variable indicating the true state of each case, positive or negative, and two to six continuous scale variables containing the observations of the diagnostic tests.
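For illustration, here is a minimal sketch of a dataset arranged in that layout, built in Python with pandas. The column names (Disease, Test A, Test B) and the values are hypothetical examples, not part of Analyse-it.

```python
import pandas as pd

# One row per case: a nominal variable giving the true state, plus the
# continuous observations of each diagnostic test being compared.
dataset = pd.DataFrame({
    "Disease": ["Positive", "Negative", "Positive", "Negative", "Positive"],
    "Test A":  [6.1, 2.3, 5.8, 3.0, 7.2],
    "Test B":  [5.5, 4.1, 6.0, 2.8, 6.9],
})
print(dataset)
```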
When entering new data, we recommend using New Dataset to create a new test performance dataset.
To start the test:
Excel 97, 2000, 2002 & 2003: Select any cell in the range containing the dataset to analyse, then click Analyse on the Analyse-it toolbar, click Test performance, then click Compare ROC curves.
The report shows the number of observations analysed, how many missing values were listwise excluded, and the number of normal and abnormal cases.
The area under the curve (AUC) is a measure of the ability of the diagnostic test to correctly identify cases. Diagnostic tests with higher AUCs are generally better. The AUC should always be above 0.5; a test with an AUC of 0.5 diagnoses no better than chance (guessing the diagnosis).
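To make the AUC concrete, the sketch below computes it directly from its probabilistic definition: the probability that a randomly chosen abnormal case scores higher than a randomly chosen normal case (the Mann-Whitney interpretation). This is an illustrative Python sketch, not Analyse-it's implementation.

```python
import numpy as np

def auc(pos_scores, neg_scores):
    """AUC = P(abnormal score > normal score) + 0.5 * P(tie)."""
    pos = np.asarray(pos_scores, float)
    neg = np.asarray(neg_scores, float)
    # Compare every abnormal score against every normal score.
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (pos.size * neg.size)

print(auc([6.1, 5.8, 7.2], [2.3, 3.0, 4.1]))  # 1.0: perfect separation
```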
When multiple diagnostic tests are compared, each pair of tests is compared to determine if one test is significantly better than the other. The difference between the AUCs is shown, and a hypothesis test is used to determine if they are significantly different. A significant p-value implies that the test with the higher AUC (see the table above) diagnoses significantly better than the other.
METHOD The DeLong, DeLong & Clarke-Pearson method (see [2]) is used to compare the curves.
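For readers interested in the mechanics, here is a hedged Python sketch of the DeLong, DeLong & Clarke-Pearson comparison for two tests measured on the same cases. It follows the published method's structure (per-case placements, their covariance, and a z test on the AUC difference) but is illustrative only, not Analyse-it's code.

```python
import numpy as np
from scipy.stats import norm

def delong_compare(pos, neg):
    """pos, neg: shape (2, n_abnormal) and (2, n_normal); one row per test,
    columns paired across tests (the same cases measured by both tests)."""
    pos, neg = np.asarray(pos, float), np.asarray(neg, float)
    m, n = pos.shape[1], neg.shape[1]
    # psi[k, i, j] = 1 if test k ranks abnormal case i above normal case j,
    # 0.5 on ties, 0 otherwise.
    psi = (pos[:, :, None] > neg[:, None, :]) + 0.5 * (pos[:, :, None] == neg[:, None, :])
    v10 = psi.mean(axis=2)            # placements of the abnormal cases, (2, m)
    v01 = psi.mean(axis=1)            # placements of the normal cases, (2, n)
    auc = v10.mean(axis=1)            # AUC of each test
    s10, s01 = np.cov(v10), np.cov(v01)
    var_diff = (s10[0, 0] + s10[1, 1] - 2 * s10[0, 1]) / m \
             + (s01[0, 0] + s01[1, 1] - 2 * s01[0, 1]) / n
    z = (auc[0] - auc[1]) / np.sqrt(var_diff)
    return auc, auc[0] - auc[1], 2 * norm.sf(abs(z))  # AUCs, difference, two-sided p
```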
The ROC plot (see below) shows the false positive rate (1-specificity) on the X axis, the probability of incorrectly diagnosing a case as positive when it is actually negative, against the true positive rate (sensitivity) on the Y axis, the probability of correctly diagnosing a positive case, across all decision levels for each diagnostic test. Ideally the curve for a diagnostic test climbs quickly toward the top-left, meaning the test identifies positive cases without incorrectly diagnosing negative cases as positive. The diagonal grey line is a guideline for a test that diagnoses no better than chance.
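To show where the plotted points come from, this sketch sweeps the decision level across every observed value and computes (1 - specificity, sensitivity) at each level. It assumes higher test values indicate disease; the direction may be reversed for some tests.

```python
import numpy as np

def roc_points(pos, neg):
    """pos, neg: test values for the abnormal and normal cases."""
    pos, neg = np.asarray(pos, float), np.asarray(neg, float)
    # Sweep the decision level from high to low; a case is called positive
    # when its value is at or above the level.
    levels = np.unique(np.concatenate([pos, neg]))[::-1]
    fpr = np.array([(neg >= t).mean() for t in levels])  # 1 - specificity
    tpr = np.array([(pos >= t).mean() for t in levels])  # sensitivity
    return fpr, tpr
```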
As values on the X axis increase, the chance of incorrectly diagnosing negative cases as positive increases. Whether this matters has to be evaluated in terms of the cost of missing a positive case versus the cost of treating a negative case (see below for how to factor in misdiagnosis costs).
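As one common way to weigh those costs (a sketch only, not necessarily the method described later in this guide), the decision level can be chosen to minimise the expected cost of false positives and false negatives, given the two costs and the disease prevalence. All parameter names here are hypothetical.

```python
import numpy as np

def min_cost_level(pos, neg, cost_fp, cost_fn, prevalence):
    """Decision level minimising expected misdiagnosis cost, assuming a
    case is called positive when its value is at or above the level."""
    pos, neg = np.asarray(pos, float), np.asarray(neg, float)
    levels = np.unique(np.concatenate([pos, neg]))[::-1]
    fpr = np.array([(neg >= t).mean() for t in levels])  # false positive rate
    fnr = np.array([(pos < t).mean() for t in levels])   # false negative rate
    cost = cost_fp * fpr * (1 - prevalence) + cost_fn * fnr * prevalence
    return levels[np.argmin(cost)]
```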