A residual plot shows the difference between the observed response and the fitted response values.
The ideal residual plot, called the null residual plot, shows a random scatter of points forming a band of approximately constant width around the horizontal zero line.
It is important to check the fit of the model and its assumptions (constant variance, normality, and independence of the errors) using the residual plot together with the normal, sequence, and lag plots.
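As a rough illustration of the quantities a residual plot displays, the following minimal sketch fits a straight line by ordinary least squares and computes the fitted values and residuals. The data and the helper names `fit_line` and `residuals` are made up for this example and are not part of Analyse-it.

```python
from statistics import mean

def fit_line(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope

def residuals(x, y):
    """Return (fitted values, residuals), where residual = observed - fitted."""
    intercept, slope = fit_line(x, y)
    fitted = [intercept + slope * xi for xi in x]
    resid = [yi - fi for yi, fi in zip(y, fitted)]
    return fitted, resid

# Hypothetical data, for illustration only
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.0]

fitted, resid = residuals(x, y)
for f, r in zip(fitted, resid):
    # Each (fitted, residual) pair is one point on the residual plot
    print(f"fitted={f:6.2f}  residual={r:+.2f}")
```

Plotting each residual against its fitted value gives the residual plot. For a model with an intercept, the residuals sum to zero, so the points scatter above and below zero.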
If the residuals show a systematic pattern, such as a curve, the model does not fit the data well. You might be able to transform variables or add polynomial and interaction terms to remove the pattern.
If the width of the band of residuals changes, the variance is not constant. Consider transforming the response variable or incorporating weights into the model. When the variance increases as a percentage of the response, you can use a log transform, although you should check that it does not produce a poorly fitting model.
Even with non-constant variance, the parameter estimates remain unbiased, though somewhat inefficient. However, the hypothesis tests and confidence intervals are inaccurate.
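The effect of a log transform on non-constant variance can be sketched with made-up data in which the error is a fixed percentage of the response. The example compares the average size of the residuals in the upper half of the data with that in the lower half, before and after the transform. The data, the `spread_ratio` helper, and the choice to log-transform the predictor as well (so that the transformed model stays linear) are assumptions made for illustration.

```python
import math
from statistics import mean

def fit_line(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

def spread_ratio(x, y):
    """Mean |residual| in the upper half of x divided by that in the lower half.

    Assumes x is sorted in ascending order. A ratio near 1 suggests a band of
    roughly constant width; a ratio well above 1 suggests increasing variance.
    """
    intercept, slope = fit_line(x, y)
    resid = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    half = len(resid) // 2
    lower = mean(abs(r) for r in resid[:half])
    upper = mean(abs(r) for r in resid[half:])
    return upper / lower

# Hypothetical data: the error alternates between +10% and -10% of the response
x = [float(i) for i in range(1, 21)]
y = [2.0 * xi * (1.1 if i % 2 == 0 else 0.9) for i, xi in enumerate(x)]

raw_ratio = spread_ratio(x, y)
log_ratio = spread_ratio([math.log(xi) for xi in x], [math.log(yi) for yi in y])

print(f"spread ratio, original scale: {raw_ratio:.2f}")
print(f"spread ratio, log scale:      {log_ratio:.2f}")
```

On the original scale the residuals grow with the response, so the ratio is well above 1. On the log scale the ratio is close to 1. Real data is rarely this clean, so the residual plot of the transformed model should still be inspected.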
Violation of the normality assumption only becomes an issue with small sample sizes. For large sample sizes, the assumption is less important due to the central limit theorem, and the fact that the F- and t-tests used for hypothesis tests and forming confidence intervals are quite robust to modest departures from normality.
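A normal plot compares the ordered residuals with the values expected from a normal distribution. The sketch below computes the plot coordinates using Python's standard `statistics.NormalDist`. The plotting position `(i + 0.5) / n` is one common convention and is an assumption here, as are the residual values and the helper name `normal_plot_points`. Analyse-it may use a different formula.

```python
from statistics import NormalDist

def normal_plot_points(resid):
    """Return (theoretical quantile, ordered residual) pairs for a normal plot."""
    n = len(resid)
    ordered = sorted(resid)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    # Plotting positions (i + 0.5) / n lie strictly between 0 and 1
    theoretical = [std_normal.inv_cdf((i + 0.5) / n) for i in range(n)]
    return list(zip(theoretical, ordered))

# Hypothetical residuals, for illustration only
resid = [0.3, -1.2, 0.8, -0.1, 1.5, -0.6, 0.2, -0.9, 0.4, 0.0]

points = normal_plot_points(resid)
for q, r in points:
    print(f"normal quantile={q:+.2f}  residual={r:+.2f}")
```

If the residuals are approximately normal, the points fall close to a straight line. Curvature or isolated points far from the line suggest a departure from normality.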
Examine a sequence plot of the residuals against the order of observation to identify any dependency between the residuals and time.
Examine a lag-1 plot of each residual against the previous residual to identify serial correlation, where observations are not independent and each observation is correlated with the one before it.
Time-series analysis may be more suitable to model data where serial correlation is present.
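The points on a lag-1 plot, and a numeric summary of them, can be sketched as follows. The example pairs each residual with the previous one and computes the Pearson correlation between the two. The residual sequences and the helper names `lag1_pairs` and `lag1_correlation` are made up for illustration, and the residuals are assumed to be listed in the order the observations were made.

```python
from statistics import mean

def lag1_pairs(resid):
    """Pair each residual with the one before it: (previous, current)."""
    return list(zip(resid[:-1], resid[1:]))

def lag1_correlation(resid):
    """Pearson correlation between consecutive residuals."""
    prev, curr = resid[:-1], resid[1:]
    p_bar, c_bar = mean(prev), mean(curr)
    cov = sum((p - p_bar) * (c - c_bar) for p, c in zip(prev, curr))
    var_p = sum((p - p_bar) ** 2 for p in prev)
    var_c = sum((c - c_bar) ** 2 for c in curr)
    return cov / (var_p * var_c) ** 0.5

# Hypothetical residuals in observation order
drifting = [-2.0, -1.5, -1.1, -0.4, 0.2, 0.9, 1.3, 1.8, 1.2, 0.5, -0.3, -0.9]
alternating = [1.0, -1.1, 0.9, -1.0, 1.2, -0.8, 1.1, -1.2, 0.9, -1.0]

print(lag1_pairs(drifting)[:3])  # first few points of the lag-1 plot
print(f"drifting:    r = {lag1_correlation(drifting):+.2f}")
print(f"alternating: r = {lag1_correlation(alternating):+.2f}")
```

A slowly drifting sequence gives a positive lag-1 correlation, and a sequence that flips sign at each step gives a negative one. Independent errors would give a value near zero, which corresponds to a lag-1 plot with no visible trend.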
For a model with many terms, it can be difficult to identify specific problems using the residual plot. A non-null residual plot indicates that there are problems with the model, but not necessarily what these are.